Generative Models for Biological Sequences
Overview
Designing biological sequences requires generating candidates that respect the physical laws governing molecular structure and function. I study how to incorporate biophysical constraints such as thermodynamic stability, stereochemistry, and binding geometry into diffusion, autoregressive, and flow-based generative models for molecules, RNA, and proteins, so that designed sequences are not just plausible but experimentally viable.
This work involves encoding 3D structural constraints into sequence-level models, steering generation toward desired properties in data-scarce regimes, and developing pretraining strategies that help foundation models internalize the physical rules of biology.