DNA architecture plays a key role in determining spatial and temporal patterns of gene expression. This architecture encompasses both the nucleotide sequence (i.e., the information content) and the physical state of the DNA, such as its spatial organization and mechanical properties. We study several regulatory motifs in E. coli using a three-pronged approach: theoretical modeling, in vitro single-molecule experiments, and in vivo single-cell experiments. Through systematic experimentation we show that we can account for the effect of varying the different relevant "knobs" governing a repression regulatory motif, such as the concentration of the transcription factor and the strength of its binding to DNA. The result is a framework that predicts the regulatory outcome of any mutant of this regulatory architecture, which we show can be tested in a variety of different ways. We also present our recent experimental efforts aimed at dissecting repression by DNA looping and the sequence-dependent flexibility associated with the mechanical code of the DNA.
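As an illustration only, the effect of the two "knobs" named above can be sketched with a thermodynamic expression that is commonly used for simple repression in the weak-promoter limit, fold-change = 1 / (1 + (R / N_NS) exp(-E / kT)). This expression, the function name fold_change, and the default number of non-specific sites (taken as roughly the E. coli genome length in base pairs) are assumptions and are not stated in the abstract.

```python
import math


def fold_change(repressors, binding_energy_kT, nonspecific_sites=4.6e6):
    """Fold-change in expression for simple repression (assumed model).

    repressors        -- repressor copy number per cell (knob 1)
    binding_energy_kT -- operator binding energy in units of kT relative to
                         non-specific DNA; more negative is stronger (knob 2)
    nonspecific_sites -- assumed size of the non-specific reservoir
    """
    occupancy_weight = (repressors / nonspecific_sites) * math.exp(-binding_energy_kT)
    return 1.0 / (1.0 + occupancy_weight)


# Sweep the repressor copy number at a fixed, hypothetical binding energy.
for copies in (10, 100, 1000):
    print(copies, fold_change(copies, binding_energy_kT=-15.0))
```

In this sketch, raising the copy number or making the binding energy more negative both lower the fold-change, which is the qualitative behavior the abstract refers to.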
Knowledge of the sequence binding preferences of transcription factors (TFs) is central to the understanding of gene regulation and genome function. However, much information is lacking regarding the sequence specificities of TFs, even in well-studied organisms. Similar DNA binding domains often have similar sequence preferences, and in some cases rules have been derived for inferring sequence specificities within TF families.
Our aims are three-fold. First, we are using high-resolution DNA-binding data (e.g. from Protein Binding Microarrays) to refine and test rules for inference of TF sequence specificity. Second, we are generating the data needed to produce accurate "pfam-wide" inferences of sequence specificity for as many eukaryotic DNA-binding domain classes as possible. Third, we have constructed a database (CIS-BP: Catalog of Inferred Sequence Binding Preferences) to house both known and inferred sequence preferences in the form of 8-mer binding scores, position weight matrices, and IUPAC consensus motifs.
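For readers unfamiliar with position weight matrices, the following is a minimal sketch of how one can be built from aligned binding sites and used to score a sequence. It is not the CIS-BP pipeline; the function names, the pseudocount, the uniform background, and the example sites are all hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Return one {base: log2-odds score} dict per motif position."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        pwm.append({
            base: math.log2((counts[base] + pseudocount) / total / background)
            for base in BASES
        })
    return pwm


def score(pwm, seq):
    """Sum of per-position log-odds scores for a sequence of motif length."""
    return sum(column[base] for column, base in zip(pwm, seq))


def best_window(pwm, seq):
    """Return (score, start) of the highest-scoring window in seq."""
    width = len(pwm)
    return max((score(pwm, seq[i:i + width]), i)
               for i in range(len(seq) - width + 1))


# Hypothetical aligned sites, for illustration only.
example_sites = ["TGACGTCA", "TGACGTCA", "TGACGTAA", "TGATGTCA"]
example_pwm = build_pwm(example_sites)
print(best_window(example_pwm, "CCCCTGACGTCACCCC"))
```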
Using our current dataset, we provide a summary of knowledge of eukaryotic TF binding preferences. We also show that, for multiple DBD classes, simple sequence rules (e.g. percent amino acid identity) can readily identify TFs with new and different DNA-binding activities. Using these sequence rules to guide selection of proteins for further analysis, we have experimentally obtained distinctive sequence specificities for DBDs from a wide variety of eukaryotes.
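A percent-identity rule of the kind mentioned above can be sketched as follows. The choice to count identity only over columns where neither sequence has a gap, the threshold value, and the function names are assumptions made for illustration; the abstract does not give the thresholds actually used for any DBD class.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent amino acid identity over ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def likely_similar_specificity(aligned_a, aligned_b, threshold=70.0):
    """Hypothetical rule: DBDs at or above the threshold are assumed to bind alike."""
    return percent_identity(aligned_a, aligned_b) >= threshold


print(percent_identity("RKRGR", "RKAGR"))
```

Pairs that fall below such a threshold would be the candidates for new DNA-binding activities and hence for experimental follow-up.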
(Joint work with L. Cotta-Ramusino and R. Manning)
DNA interactions with proteins frequently involve looping, in which the location and orientation of the two ends of a DNA segment are prescribed. I will show how path integral methods can be used to obtain a sequence-dependent formula for the probability of loop formation, including the case of minicircle cyclization. The expression involves the minimal energy path, a nonlinear computation, with a correction for fluctuations in terms of certain Jacobi fields, a linear computation.
Genomic DNA is packaged into chromatin in eukaryotic cells. The fundamental building block of chromatin is the nucleosome, a 147 bp-long DNA segment wrapped around the surface of a histone octamer. Nucleosomes function to compact genomic DNA and to regulate access to it both by physical occlusion and by providing the substrate for numerous covalent epigenetic tags. We have studied intrinsic sequence specificity of histone-DNA interactions by using a high-throughput map of nucleosomes assembled in vitro on yeast and E. coli genomic DNA.
We have inferred free energies of nucleosome formation genome-wide using a biophysical model that rigorously takes steric exclusion between neighboring nucleosomes into account. Surprisingly, most S. cerevisiae nucleosomes do not appear to be positioned by periodic dinucleotide patterns or by exclusion of longer sequence motifs such as poly(dA:dT) tracts; rather, their locations are simply controlled by the dinucleotide content of the underlying DNA sequence. Similar nucleosome positioning rules emerge from the studies of C. elegans chromatin and even from nucleosome-free control experiments, likely because histone sequence preferences are correlated with those revealed by sonicating nucleosome-free genomic DNA or digesting it with MNase.
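The role of steric exclusion can be illustrated with a standard one-dimensional hard-rod calculation: given a formation energy for every possible nucleosome start position, forward and backward partition functions give the probability of each start and the per-base-pair occupancy. This is a sketch of that textbook technique, not the authors' inference procedure; the function name, the chemical potential mu, and the use of energies in units of kT are assumptions. For genome-scale inputs the sums would need to be carried out in log space to avoid overflow.

```python
import math


def nucleosome_occupancy(energies, width=147, mu=0.0):
    """Hard-rod occupancy from per-start energies (in kT), no overlaps allowed.

    energies[i] is the energy of a nucleosome covering bp i .. i + width - 1,
    so the DNA length is len(energies) + width - 1.
    Returns (start_prob, occupancy).
    """
    n_starts = len(energies)
    length = n_starts + width - 1
    weights = [math.exp(mu - e) for e in energies]

    # fwd[i]: partition function of the first i base pairs.
    fwd = [1.0] * (length + 1)
    for i in range(1, length + 1):
        fwd[i] = fwd[i - 1]
        if i >= width:
            fwd[i] += fwd[i - width] * weights[i - width]

    # bwd[i]: partition function of base pairs i .. length - 1.
    bwd = [1.0] * (length + 1)
    for i in range(length - 1, -1, -1):
        bwd[i] = bwd[i + 1]
        if i < n_starts:
            bwd[i] += weights[i] * bwd[i + width]

    z = fwd[length]
    start_prob = [fwd[i] * weights[i] * bwd[i + width] / z
                  for i in range(n_starts)]

    # Occupancy of bp j: sum of start probabilities over the covering window.
    occupancy = [0.0] * length
    running = 0.0
    for j in range(length):
        if j < n_starts:
            running += start_prob[j]
        if j - width >= 0:
            running -= start_prob[j - width]
        occupancy[j] = running
    return start_prob, occupancy
```

Because overlapping configurations are excluded from the partition function, a favorable site lowers the occupancy of its neighbors, which is why energies cannot simply be read off from occupancy at each position independently.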
Our findings suggest that the nature of the nucleosome positioning code is fairly simple. Nucleosome energetics based on dinucleotide biases would make it easier to evolve and maintain nucleosome positioning sequences in eukaryotic genomes. Such sequences could then be refined and strengthened with 10-11 bp periodic dinucleotide patterns.
The chromatin state of a gene helps encode the regulatory information that controls its expression. ATP-dependent chromatin-remodeling motors catalyze the changes in chromatin structure that are required for both rapid exposure and long-term occlusion of DNA sites. The mechanisms by which these motors function, however, are not well understood. The challenges in studying chromatin-remodeling motors are intimately linked to the complex nature of their task. Even the smallest movement of the histone octamer across DNA requires a highly coordinated process of breaking and reforming the many histone-DNA contacts. We have developed quantitative approaches to study chromatin-specific conformational changes. These methods have allowed us to apply conceptual advances from established motor fields to discover mechanistic features that are unique to chromatin-remodeling motors. An illustrative example is our work on ACF, the main chromatin-remodeling complex involved in generating the evenly spaced nucleosomes required to generate condensed chromatin structures. We have found that ACF functions as a dimeric motor in which the two ATPases face each other and take turns to engage either side of a nucleosome. This dimeric motor mechanism differs from that of other dimeric motors studied to date, such as kinesin and dimeric helicases, whose biological functions require translocation along a uniform polymeric substrate. The biological functions of chromatin-remodeling enzymes like ACF appear to have placed very different demands on motor architecture. While kinesin and other dimeric motors studied to date use two motors facing in the same direction to processively move in one direction, the opposing polarity of the two motors in ACF enables ACF to rapidly and processively change the direction of nucleosome movement in order to achieve a defined spacing.
We show how to calculate the probability of DNA loop formation mediated by regulatory proteins such as Lac repressor, using a mathematical model of DNA elasticity. Our approach has new features enabling us to compute quantities directly observable in Tethered Particle Motion (TPM) experiments; e.g. it accounts for all the entropic forces present in such experiments. Our model has no free parameters; it characterizes DNA elasticity using information obtained in other kinds of experiments. It can compute both the "looping J factor" (or equivalently, looping free energy) for various DNA construct geometries and repressor concentrations, as well as the detailed probability density function of bead excursions. We also show how to extract the same quantities from recent experimental data on tethered particle motion, and compare to our model's predictions. In particular, we present a new method to correct observed data for finite camera shutter time.
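The equivalence between the looping J factor and a looping free energy can be made concrete under one common convention, in which the free energy is referred to a 1 M standard concentration: dG_loop / kT = -ln(J / 1 M). Whether the work described here uses exactly this convention is an assumption, and the function names below are hypothetical.

```python
import math

REFERENCE_MOLAR = 1.0  # assumed 1 M reference concentration


def j_to_free_energy_kT(j_molar):
    """Looping free energy in kT from a J factor given in molar units."""
    if j_molar <= 0:
        raise ValueError("J factor must be positive")
    return -math.log(j_molar / REFERENCE_MOLAR)


def free_energy_to_j(dg_kT):
    """Inverse conversion: J factor in molar units from a free energy in kT."""
    return REFERENCE_MOLAR * math.exp(-dg_kT)


# A hypothetical J factor of 1 nM corresponds to about 20.7 kT.
print(j_to_free_energy_kT(1e-9))
```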
The model successfully reproduces the detailed distributions of bead excursion, including their surprising three-peak structure, without any fit parameters and without invoking any alternative conformation of the repressor tetramer. However, for short DNA loops (around 95 bp) the experiments show more looping than is predicted by the linear-elasticity model, echoing other recent experimental results. Because the experiments we study are done in vitro, this anomalously high looping cannot be rationalized as resulting from the presence of DNA-bending proteins or other cellular machinery. We also show that it is unlikely to be the result of a hypothetical "open" conformation of the repressor.
Making sense of gene expression in living systems requires understanding of the looping properties of DNA in crowded, multi-component systems. The presence of non-specific binding proteins that introduce sharp bends, localized untwisting, and/or dislocation of the DNA double-helical axis stabilizes functional repression loops ranging from as few as 65 base pairs to as many as tens of thousands of base pairs. As a first step in the analysis of such looping, we have investigated the effects of various proteins on the configurational properties of fragments of DNA, treating the DNA at the level of base-pair steps and incorporating the known effects of various proteins on DNA double-helical structure. The presentation will highlight some of the new models and computational techniques that we have developed to generate the three-dimensional configurations of protein-mediated DNA loops and illustrate new insights gained from this work about the effects of various proteins on DNA topology and the apparent contributions of the non-specific binding proteins to gene expression.
Many different kinds of data are available for modeling the specificity of a DNA-binding protein, and the quality of the model depends on both the type of data used and the algorithms for estimating binding energies. We discuss our approaches for modeling from several different types of data, and assess the accuracy of each based on experimental measurements. Given specificities for many proteins of a specific class, one can also predict the binding specificities of novel proteins, allowing for the design of new proteins with unique specificities. We describe our current approaches to this challenging problem.
Advances in optical imaging and molecular manipulation techniques have made it possible to observe individual enzymes and record molecular movies that provide new insight into their dynamics and reaction mechanisms. In a biological context, most of these enzymes function in concert with other enzymes in multi-protein complexes, so an important future direction will be the utilization of single-molecule techniques to unravel the orchestration of large macromolecular assemblies. Our group is developing the single-molecule tools that will make it possible to study biochemical pathways of arbitrary complexity at the single-molecule level.
I will discuss results of single-molecule experiments on the replisome, the molecular machinery that is responsible for replication of DNA. We stretch individual DNA molecules and use their elastic properties to obtain dynamic information on the proteins that unwind the double helix and copy its genetic information. We also use DNA length as a probe to measure the dynamic formation and release of replication loops. Further, I will present new results of experiments that combine the observation of replisome activity by DNA length with the fluorescence imaging of individual components of the replication complex. These measurements allow us to track the composition of the replisome while monitoring its unwinding and synthesis activities.
Eukaryotic genomes are packaged into nucleosome particles that occlude the DNA from interacting with most DNA binding proteins. We have discovered that genomes care where their nucleosomes are located on average, and that genomes manifest this care by encoding an additional layer of genetic information, superimposed on top of other kinds of regulatory and coding information that were previously recognized. We have developed a partial ability to read this nucleosome positioning code and predict the in vivo locations of nucleosomes. Most recently, we showed that the distribution of nucleosomes reconstituted on yeast genomic DNA in a purified in vitro system closely resembles that in vivo, implying that much of the in vivo nucleosome organization is explicitly encoded in the genomic DNA sequence itself, through the nucleosomes' DNA sequence preferences. Comparisons across diverse organisms suggest that basic aspects of this nucleosome positioning code may be conserved from archaebacteria to man. Our results suggest that genomes utilize the nucleosome positioning code to facilitate specific chromosome functions, including delineating functional versus nonfunctional binding sites for key gene regulatory proteins and defining the next higher level of chromosome structure. The physical basis of the nucleosome DNA sequence preferences lies in the sequence-dependent mechanics of DNA itself.
The stochasticity of chromosome organization was investigated by fluorescently labeling genetic loci in live E. coli cells. In spite of the common assumption that the chromosome is well-modeled by an unstructured polymer, measurements of the locus distributions reveal that the E. coli chromosome is precisely organized into a nucleoid filament with a linear order. Loci in the body of the nucleoid show a precision of positioning within the cell of better than 10% of the cell length. The precision of inter-locus distance of genomically proximate loci was better than 4% of the cell length. The measured dependence of the precision of inter-locus distance on genomic distance singles out intra-nucleoid interactions as the mechanism responsible for chromosome organization. From the magnitude of the variance, we infer the existence of an as-yet uncharacterized higher-order DNA organization in bacteria. We demonstrate that both the stochastic and average structures of the nucleoid are captured by a fluctuating elastic filament model.