Gene regulation is central to orchestrating life. A single-celled zygote must give rise, from a single genome, to a complex multicellular organism. As cells take on different functions, they retain the same genetic material, yet they express different genes to make specialized proteins. Thus, somewhere in our DNA there must be knobs that change gene expression, along with instructions for turning those knobs to reliably build organs, tissues, and specialized cells. This information is encoded in DNA through cis-regulatory elements (CREs) and is decoded by transcription factors (TFs), proteins that bind to CREs and recruit machinery that increases or decreases gene expression. Genetic variation that alters gene expression has long been hypothesized to be a major determinant of phenotypic variation. Indeed, changes to gene regulation underlie the loss of limbs in snakes, the domestication of almonds, wing pigmentation in Drosophila species, differences in sporulation efficiency among yeast populations, phenotypic diversity among humans, and most of the heritable component of human disease.
Intense efforts are underway to construct computational atlases of the regulatory genome and thereby obtain a molecular understanding of the principles underlying development, disease, and evolution. A crucial step beyond building an atlas is interrogating it for falsifiable hypotheses that can be tested in the lab. However, this raises a difficult problem of choice. A single hypothesis can generate multiple predictions, and we may want to test many hypotheses. How do we decide which experiments are worth the investment of time, labor, and money? Which experiments would most advance our capacity to understand gene regulation? In other words, which experiments are both feasible and interesting?
I aim to answer these questions by building computational systems that engage in an active dialogue with experimental biology. By explicitly treating computational models as part of the scientific method, we can borrow ideas from active learning to design the experiments that best serve our goals. Currently, I am applying these ideas to study zebrafish development at single-cell resolution.