My research is broadly in the area of computational biology, with particular emphasis on models and algorithms for studying complex systems in biology. My group is currently pursuing two major areas:
Genetic variation analysis. Since the human genome was first sequenced in the early 2000s, the field of human genetics has turned its attention to identifying the millions of small differences that distinguish one human being from another. Most of these differences are single DNA bases that vary from one person to another, called single nucleotide polymorphisms (SNPs). My group has worked on models and algorithms to analyze large datasets of SNPs and infer evolutionary trees (phylogenies) and more complex population models that tell us how modern human populations arose from the earliest human ancestors and how our genome has evolved over that time. Our largest area of research in recent years has been the development of similar phylogenetic and population genetic methods to study the evolution of cell populations in tumors.
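To make the idea of inferring a phylogeny from SNP data concrete, here is a minimal illustrative sketch of a textbook check, the four-gamete test; it is not one of the group's methods. It assumes SNPs are coded as binary values (0 or 1), one row per haplotype and one column per site, and that each site mutated at most once. Under those assumptions, the data fit a tree with no repeated or reverse mutations exactly when no pair of sites shows all four combinations 00, 01, 10, and 11. The function name and the input layout are hypothetical, chosen only for this example.

```python
from itertools import combinations


def admits_perfect_phylogeny(haplotypes):
    """Four-gamete test on a binary SNP matrix.

    haplotypes: list of equal-length rows of 0/1 values
    (rows = haplotypes, columns = SNP sites). Hypothetical layout.
    Returns True if no pair of sites exhibits all four gametes.
    """
    n_sites = len(haplotypes[0]) if haplotypes else 0
    for i, j in combinations(range(n_sites), 2):
        # Collect the distinct (site i, site j) allele pairs that occur.
        gametes = {(row[i], row[j]) for row in haplotypes}
        if len(gametes) == 4:
            # 00, 01, 10 and 11 all present: no single tree explains
            # both sites with one mutation each.
            return False
    return True
```

Real SNP and tumor datasets routinely violate these assumptions through recombination, recurrent mutation, and sequencing error, which is one reason more complex population models are needed.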
Algorithms for macromolecular assembly simulation. One of the recurring features of molecular biology is self-assembly, a process by which isolated molecules spontaneously join together to build structures or molecular machines. Self-assembly is required for nearly every important function a cell performs, including division, movement, shape control, and synthesis and degradation of DNA, RNA, and proteins. Biological self-assembly systems are also an important model for the development of novel nanotechnology. They are, however, very challenging for standard methods of simulating biochemistry because of their large size and the long time scales on which they operate. My group develops algorithms to accelerate stochastic models of these assembly systems, builds simulation systems based on these algorithms, and applies them to investigate properties of assembly systems that are difficult to explore through laboratory experiment. In recent years, we have been particularly interested in combining such models with numerical optimization algorithms to fit stochastic models to experimental data and, in the process, learn how these systems function at much finer scales than can be measured experimentally.
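As a rough illustration of what a stochastic model of assembly looks like, the sketch below runs the standard Gillespie stochastic simulation algorithm on a toy system in which two monomers reversibly bind to form a dimer. It is the plain baseline method, not one of the accelerated algorithms described above, and the function name, the rate constants k_on and k_off, and the two-species system are all assumptions made for the example.

```python
import random


def simulate_dimerization(n_monomers, k_on, k_off, t_end, seed=0):
    """Gillespie simulation of the toy reaction 2 M <-> D.

    Returns a list of (time, monomers, dimers) tuples, one per event,
    starting from n_monomers free monomers at time 0.
    All names and parameters here are hypothetical.
    """
    rng = random.Random(seed)
    t = 0.0
    monomers, dimers = n_monomers, 0
    trajectory = [(t, monomers, dimers)]
    while True:
        # Propensities: number of distinct monomer pairs, number of dimers.
        a_bind = k_on * monomers * (monomers - 1) / 2.0
        a_unbind = k_off * dimers
        a_total = a_bind + a_unbind
        if a_total == 0.0:
            break  # no reaction can fire
        # Waiting time to the next event is exponentially distributed.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Choose which reaction fires, in proportion to its propensity.
        if rng.random() * a_total < a_bind:
            monomers -= 2
            dimers += 1
        else:
            monomers += 2
            dimers -= 1
        trajectory.append((t, monomers, dimers))
    return trajectory
```

Because every binding and unbinding event is simulated individually, the cost grows with the number of events. That becomes prohibitive for large assemblies over long time scales, which is the difficulty that motivates faster algorithms.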
In addition to these two core areas, we are involved in many side projects, usually in collaboration with experimentalists. Over the past few years, these projects have included work on modeling biomedical systems, developing more realistic models of biochemistry in the cell, and developing new methods for deconvolving complex genomic datasets from heterogeneous cell populations, primarily with application to cancer genomics. These projects draw on a wide variety of computational tools from discrete algorithms, operations research, applied mathematics, statistics, and machine learning.