Workshop 5: Mathematical and experimental approaches to dynamics of protein-DNA interactions

(March 8, 2010 - March 11, 2010)

Organizers


Jane Kondev
Physics Department, Brandeis University
Hao Li
Dept. of Biochemistry & Biophysics, University of California, San Francisco

The elucidation of the structure of DNA was a watershed event in the history of biology. In one fell swoop it provided the molecular basis of the gene and explained how genes are propagated from mother to daughter cell. Its crowning achievement is the central dogma of molecular biology, which describes the flow of information from DNA to protein via messenger RNA. More than fifty years of DNA research since then have led to a detailed understanding of practically all the molecular players of the central dogma and their mutual interactions. More recently, a new view of the information encoded by DNA has emerged, one that goes beyond DNA as the physical embodiment of the gene. Unlike genomic information, which is encoded in the base sequence, this information is carried by the interactions of DNA with DNA-binding proteins.

This workshop will bring together researchers from the mathematical, physical and biological sciences interested in protein-DNA interactions and how they steer cellular processes such as transcription, replication and DNA packing. The workshop will attempt to bridge scales, from single molecules and macromolecular complexes all the way to whole cells, and to highlight the fundamental mathematical problems posed at each scale.

This workshop will attempt to take a snapshot of the field of DNA-protein interactions and examine it from the viewpoints offered by different length and time scales. These will include single-molecule studies of DNA-protein interactions, such as those involving DNA motors and DNA-packaging proteins (histones in eukaryotes and histone-like proteins in prokaryotes), as well as whole-genome studies that seek to uncover the regulatory motifs bound by transcription factors.

One of the main thrusts of the workshop will be to highlight opportunities for mathematicians and physicists interested in applying ideas from statistics, stochastic equations, and statistical and continuum mechanics to the burgeoning field of protein-DNA interactions.

Accepted Speakers

Zev Bryant
Department of Bioengineering, Stanford University
Hernan Garcia
Physics Department, California Institute of Technology
Timothy Hughes
Department of Medical Research, University of Toronto
John Maddocks
EPFL SB IMB LCVMM, École Polytechnique Fédérale de Lausanne
John Marko
Depts. Biochemistry, Molecular Biology & Cell Biology, Northwestern University
Leonid Mirny
Health Sciences and Technology and Physics, Massachusetts Institute of Technology
Alexandre Morozov
Department of Physics & Astronomy, Rutgers University at New Brunswick
Geeta Narlikar
Biochemistry and Biophysics, UCSF
Phil Nelson
Physics and Astronomy, University of Pennsylvania
Wilma Olson
Chemistry & Chemical Biology, Rutgers, The State University of New Jersey
Eran Segal
Department of Computer Science And Applied Mathematics, Weizmann Institute of Science
Gary Stormo
Department of Genetics, Washington University Medical School
Antoine van Oijen
Dept. of Biological Chemistry and Molecular Pharmacology, Harvard Medical School
Jon Widom
Department of Biochemistry, Molecular Biology and Cell Biology, Northwestern University Medical School
Paul Wiggins
Whitehead Institute for Biomedical Research
Monday, March 8, 2010
Time Session
09:00 AM - 10:00 AM
Zev Bryant - Zev Bryant Lecture
10:30 AM - 11:30 AM
John Maddocks - DNA Looping Probabilities and Semi-Classical Path Integrals
(Joint work with L. Cotta-Ramusino and R. Manning)

DNA interactions with proteins frequently involve looping, in which the location and orientation of the two ends of a DNA segment are prescribed. I will show how path integral methods can be used to obtain a sequence-dependent formula for the probability of loop formation, including the case of minicircle cyclization. The expression combines the minimal-energy configuration, which requires a nonlinear computation, with a correction for fluctuations expressed in terms of certain Jacobi fields, which requires only a linear computation.
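As a rough illustration of the "minimal energy" ingredient only, the sketch below assumes a homogeneous, isotropic, inextensible elastic rod (no sequence dependence, twist ignored) closed into a circle, whose bending energy is E = 2 pi^2 A kT / L for persistence length A and contour length L. The persistence length of 50 nm and the rise of 0.34 nm per base pair are typical literature values assumed here; the fluctuation (Jacobi-field) correction described above is not computed.

```python
import math

BP_NM = 0.34  # assumed rise per base pair (nm)


def circle_bending_energy_kT(length_nm, persistence_nm=50.0):
    """Closed-form bending energy, in kT, of a uniform elastic rod bent into
    a circle: E = (A/2) * (2*pi/L)**2 * L = 2*pi**2*A/L."""
    return 2.0 * math.pi ** 2 * persistence_nm / length_nm


def discrete_circle_energy_kT(length_nm, persistence_nm=50.0, n_joints=1000):
    """Same energy from a discretised rod: n_joints hinges, each bent by
    2*pi/n_joints over a segment of length L/n_joints, costing A*theta**2/(2*ds)."""
    ds = length_nm / n_joints
    theta = 2.0 * math.pi / n_joints
    return n_joints * persistence_nm * theta ** 2 / (2.0 * ds)


if __name__ == "__main__":
    for n_bp in (100, 200, 500):
        e = circle_bending_energy_kT(n_bp * BP_NM)
        # Boltzmann weight of the minimal-energy loop only; a real looping
        # probability also needs the fluctuation prefactor omitted here.
        print(n_bp, round(e, 1), math.exp(-e))
```

The steep growth of E for short lengths is why the minimal-energy term dominates cyclization of short minicircles, while the fluctuation correction sets the prefactor.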
01:30 PM - 02:30 PM
Antoine van Oijen - Single-molecule studies of DNA replication
Advances in optical imaging and molecular manipulation techniques have made it possible to observe individual enzymes and record molecular movies that provide new insight into their dynamics and reaction mechanisms. In a biological context, most of these enzymes function in concert with other enzymes in multi-protein complexes, so an important future direction will be the utilization of single-molecule techniques to unravel the orchestration of large macromolecular assemblies. Our group is developing the single-molecule tools that will make it possible to study biochemical pathways of arbitrary complexity at the single-molecule level.

I will discuss results of single-molecule experiments on the replisome, the molecular machinery that is responsible for replication of DNA. We stretch individual DNA molecules and use their elastic properties to obtain dynamic information on the proteins that unwind the double helix and copy its genetic information. We also use DNA length as a probe to measure the dynamic formation and release of replication loops. Further, I will present new results of experiments that combine the observation of replisome activity by DNA length with the fluorescence imaging of individual components of the replication complex. These measurements allow us to track the composition of the replisome while monitoring its unwinding and synthesis activities.
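Using DNA length as a readout of replisome activity amounts to converting a rate of length change into a rate of nucleotides processed. A minimal sketch, assuming a linear segment of a length-versus-time trace and a known length change per nucleotide; `nm_per_nt` is a hypothetical calibration constant (it depends on the applied force and on which conversion, e.g. single- to double-stranded DNA, is being monitored), not a value from the talk.

```python
def least_squares_slope(t, y):
    """Ordinary least-squares slope of y versus t."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    cov = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    var = sum((ti - t_mean) ** 2 for ti in t)
    return cov / var


def replication_rate_nt_per_s(times_s, lengths_nm, nm_per_nt):
    """Rate of nucleotide incorporation implied by a linear length change.

    nm_per_nt is a placeholder: it must be calibrated from force-extension
    data at the force used in the experiment."""
    return abs(least_squares_slope(times_s, lengths_nm)) / nm_per_nt
```

In practice the slope would be fitted only within segments of continuous activity, since replisome traces typically show pauses.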
03:00 PM - 04:00 PM
Wilma Olson - DNA Mechanics and Gene Expression
Making sense of gene expression in living systems requires an understanding of the looping properties of DNA in crowded, multi-component systems. The presence of non-specific binding proteins that introduce sharp bends, localized untwisting, and/or dislocation of the DNA double-helical axis stabilizes functional repression loops ranging from as few as 65 base pairs to tens of thousands of base pairs. As a first step in the analysis of such looping, we have investigated the effects of various proteins on the configurational properties of DNA fragments, treating the DNA at the level of base-pair steps and incorporating the known effects of these proteins on DNA double-helical structure. The presentation will highlight some of the new models and computational techniques that we have developed to generate the three-dimensional configurations of protein-mediated DNA loops, and will illustrate new insights from this work into the effects of various proteins on DNA topology and the apparent contributions of non-specific binding proteins to gene expression.
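To illustrate what treating DNA "at the level of base-pair steps" means, the sketch below builds a chain of base-pair origins from per-step twist and roll angles. It is a deliberately reduced version of the rigid base-pair step description (tilt, shift and slide are set to zero, and the 0.34 nm rise is an assumed typical value); the step angles used in the tests are illustrative, not data from the talk.

```python
import math


def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def build_chain(steps, rise_nm=0.34):
    """Return base-pair origins for a list of (twist_deg, roll_deg) steps.

    Each step rotates the frame by Rz(twist/2) Ry(roll) Rz(twist/2); the rise
    is applied along the z axis of the mid-step frame Rz(twist/2) Ry(roll/2)."""
    frame = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    origin = [0.0, 0.0, 0.0]
    origins = [tuple(origin)]
    for twist_deg, roll_deg in steps:
        half_twist = rot_z(math.radians(twist_deg) / 2.0)
        mid = matmul(frame, matmul(half_twist, rot_y(math.radians(roll_deg) / 2.0)))
        # Translate by the rise along the mid-step frame's z axis (third column).
        origin = [origin[i] + rise_nm * mid[i][2] for i in range(3)]
        step = matmul(half_twist, matmul(rot_y(math.radians(roll_deg)), half_twist))
        frame = matmul(frame, step)
        origins.append(tuple(origin))
    return origins


def end_to_end(origins):
    """Distance between the first and last base-pair origins (nm)."""
    return math.dist(origins[0], origins[-1])
```

A uniform twist with zero roll gives a straight rod advancing 0.34 nm per step; adding roll makes the step directions non-collinear, so the end-to-end distance falls below the contour length. Protein-induced kinks would enter as large local roll or twist values at specific steps.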
Tuesday, March 9, 2010
Time Session
09:00 AM - 10:00 AM
Hernan Garcia - DNA Architecture and Transcriptional Regulation: The Physics of Genome Management
DNA architecture plays a key role in determining spatial and temporal patterns of gene expression. This architecture encompasses both the nucleotide sequence (i.e., the information content) and the physical state of the DNA, such as its spatial organization and mechanical properties. We study several regulatory motifs in E. coli using a three-pronged approach: theoretical modeling, in vitro single-molecule experiments, and in vivo single-cell experiments. Through systematic experimentation we show that we can account for the effect of varying the relevant "knobs" governing a repression regulatory motif, such as the concentration of the transcription factor and the strength of its binding to DNA. The result is a framework that predicts the regulatory outcome of any mutant of this regulatory architecture, which we show can be tested in a variety of ways. We also present our recent experimental efforts aimed at dissecting repression by DNA looping and the sequence-dependent flexibility associated with the mechanical code of DNA.
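The two "knobs" named above can be illustrated with the widely used thermodynamic model of simple repression in the weak-promoter limit, fold-change = 1 / (1 + (R / N_NS) exp(-Δε_RD / kT)), where R is the number of repressors, N_NS the number of non-specific genomic sites, and Δε_RD the operator binding energy relative to non-specific DNA. This is a sketch of that generic model, not necessarily the exact framework of the talk; N_NS ≈ 4.6 × 10^6 (roughly the E. coli genome length in bp) and the energies below are illustrative assumptions.

```python
import math

N_NS = 4.6e6  # assumed number of non-specific sites (~E. coli genome, bp)


def fold_change(repressors, binding_energy_kT, n_ns=N_NS):
    """Fold-change in expression for simple repression (weak promoter limit).

    binding_energy_kT: operator binding energy relative to non-specific DNA,
    in kT; more negative means tighter binding."""
    return 1.0 / (1.0 + (repressors / n_ns) * math.exp(-binding_energy_kT))


if __name__ == "__main__":
    # Illustrative binding energies, scanned against repressor copy number.
    for eps in (-10.0, -14.0, -17.0):
        print(eps, [round(fold_change(r, eps), 3) for r in (10, 100, 1000)])
```

Turning either knob (more repressors or a stronger operator) lowers the fold-change, and the model predicts by how much without additional fitting once Δε_RD is known.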
10:30 AM - 11:30 AM
Gary Stormo - Modeling and predicting DNA-binding specificity
Many different kinds of data are available for modeling the specificity of a DNA-binding protein, and the quality of the model depends on both the type of data used and the algorithms for estimating binding energies. We discuss our approaches for modeling from several different types of data, and assess the accuracy of each based on experimental measurements. Given specificities for many proteins of a specific class one can also predict the binding specificities of novel proteins, allowing for the design of new proteins with unique specificities. We describe our current approaches to this challenging problem.
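One common way to express a quantitative specificity model is an additive energy matrix, in which each position of the site contributes independently to the binding energy, and occupancy follows from a Boltzmann factor. The sketch below assumes that independence; the 4-position matrix is made up for illustration and does not describe any real protein.

```python
import math

BASES = "ACGT"


def site_energy(seq, energy_matrix):
    """Additive binding energy of a site, in kT.

    energy_matrix[i][b] is the contribution of base b at position i."""
    if len(seq) != len(energy_matrix):
        raise ValueError("site length must match the matrix width")
    return sum(energy_matrix[i][b] for i, b in enumerate(seq.upper()))


def occupancy(seq, energy_matrix, mu_kT):
    """Probability that the site is bound at chemical potential mu (kT)."""
    return 1.0 / (1.0 + math.exp(site_energy(seq, energy_matrix) - mu_kT))


def best_site(energy_matrix):
    """Lowest-energy sequence under the additive model."""
    return "".join(min(BASES, key=lambda b: col[b]) for col in energy_matrix)


# Hypothetical energy matrix (kT), for illustration only.
toy_matrix = [
    {"A": 0.0, "C": 2.0, "G": 1.5, "T": 2.0},
    {"A": 2.0, "C": 0.0, "G": 2.0, "T": 1.0},
    {"A": 1.0, "C": 2.0, "G": 0.0, "T": 2.0},
    {"A": 2.0, "C": 1.5, "G": 2.0, "T": 0.0},
]
```

The accuracy of such a model depends on how the matrix entries are estimated from the data, which is the question the talk addresses.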
01:30 PM - 02:30 PM
John Marko - Micromechanical study of protein-DNA interactions and chromosome structure
03:00 PM - 04:00 PM
Phil Nelson - First-principles calculation of DNA looping in tethered particle experiments
We show how to calculate the probability of DNA loop formation mediated by regulatory proteins such as Lac repressor, using a mathematical model of DNA elasticity. Our approach has new features that enable us to compute quantities directly observable in Tethered Particle Motion (TPM) experiments; e.g., it accounts for all the entropic forces present in such experiments. Our model has no free parameters; it characterizes DNA elasticity using information obtained from other kinds of experiments. It can compute both the "looping J factor" (or, equivalently, the looping free energy) for various DNA construct geometries and repressor concentrations, and the detailed probability density function of bead excursions. We also show how to extract the same quantities from recent experimental data on tethered particle motion, and compare them to our model's predictions. In particular, we present a new method to correct observed data for finite camera shutter time.
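To make the link between the J factor and looping concrete, here is a deliberately simplified two-state sketch (an assumption, not the model of the talk): a repressor already bound at one operator closes the loop by binding a second operator with dissociation constant K_d, and the J factor acts as the effective local concentration of that second operator. The loop equilibrium constant is then K = J / K_d, the looping probability is K / (1 + K), and, in one common convention, the looping free energy is -kT ln(J / 1 M).

```python
import math


def looping_probability(j_factor_M, kd_M):
    """Two-state looping probability: K = J / Kd, P(loop) = K / (1 + K)."""
    k_loop = j_factor_M / kd_M
    return k_loop / (1.0 + k_loop)


def looping_free_energy_kT(j_factor_M, standard_conc_M=1.0):
    """Looping free energy in kT relative to a standard concentration."""
    return -math.log(j_factor_M / standard_conc_M)
```

When J equals K_d the loop is formed half of the time; real TPM analyses must also account for the additional bead and tether states that this two-state picture ignores.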

The model successfully reproduces the detailed distributions of bead excursion, including their surprising three-peak structure, without any fit parameters and without invoking any alternative conformation of the repressor tetramer. However, for short DNA loops (around 95 bp) the experiments show more looping than is predicted by the linear-elasticity model, echoing other recent experimental results. Because the experiments we study are done in vitro, this anomalously high looping cannot be rationalized as resulting from the presence of DNA-bending proteins or other cellular machinery. We also show that it is unlikely to be the result of a hypothetical "open" conformation of the repressor.
Wednesday, March 10, 2010
Time Session
09:00 AM - 10:00 AM
Timothy Hughes - Pfam-wide determination and inference of transcription factor DNA sequence specificities
Knowledge of the sequence binding preferences of transcription factors (TFs) is central to the understanding of gene regulation and genome function. However, much information is lacking regarding the sequence specificities of TFs, even in well-studied organisms. Similar DNA-binding domains often have similar sequence preferences, and in some cases rules have been derived for inferring sequence specificities within TF families.

Our aims are three-fold. First, we are using high-resolution DNA-binding data (e.g. from Protein Binding Microarrays) to refine and test rules for inference of TF sequence specificity. Second, we are generating the data needed to produce accurate "pfam-wide" inferences of sequence specificity for as many eukaryotic DNA-binding domain (DBD) classes as possible. Third, we have constructed a database (CIS-BP: Catalog of Inferred Sequence Binding Preferences) to house both known and inferred sequence preferences in the form of 8-mer binding scores, position weight matrices, and IUPAC consensus motifs.
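Summarizing a position weight matrix as an IUPAC consensus motif requires a thresholding rule. The sketch below uses a Cavener-style heuristic (a single base if its frequency exceeds 0.5 and twice that of the runner-up; a two-base code if the top two bases together exceed 0.75; otherwise N). These thresholds are an assumption, and CIS-BP's own procedure may differ.

```python
# Two-base IUPAC ambiguity codes.
IUPAC_PAIRS = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("GC"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def iupac_consensus(pwm):
    """pwm: list of dicts mapping 'A','C','G','T' to frequencies per position."""
    out = []
    for col in pwm:
        ranked = sorted(col.items(), key=lambda kv: kv[1], reverse=True)
        (b1, p1), (b2, p2) = ranked[0], ranked[1]
        if p1 > 0.5 and p1 > 2 * p2:
            out.append(b1)  # one dominant base
        elif p1 + p2 > 0.75:
            out.append(IUPAC_PAIRS[frozenset(b1 + b2)])  # two-base ambiguity
        else:
            out.append("N")  # weakly specified position
    return "".join(out)
```

Consensus strings are lossy; the 8-mer scores and PWMs retain the quantitative information that the IUPAC form discards.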

Using our current dataset, we provide a summary of knowledge of eukaryotic TF binding preferences. We also show that, for multiple DBD classes, simple sequence rules (e.g. percent amino acid identity) can readily identify TFs with new and different DNA-binding activities. Using these sequence rules to guide selection of proteins for further analysis, we have experimentally obtained distinctive sequence specificities for DBDs from a wide variety of eukaryotes.
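The simple sequence rule mentioned above, percent amino acid identity between DBDs, can be computed from a pairwise alignment. A minimal sketch, assuming the two DBD sequences are already aligned with "-" as the gap character and that identity is measured over columns where neither sequence has a gap (other denominators are also in use); any identity cutoff used to predict shared specificity would have to be calibrated per DBD class.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)
```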
10:30 AM - 11:30 AM
Eran Segal - Reading and writing transcriptional behaviors using DNA sequence
01:30 PM - 02:30 PM
Geeta Narlikar - Mechanisms of Chromatin Remodeling Motors
The chromatin state of a gene helps encode the regulatory information that controls its expression. ATP-dependent chromatin-remodeling motors catalyze the changes in chromatin structure that are required for both rapid exposure and long-term occlusion of DNA sites. The mechanisms by which these motors function, however, are not well understood. The challenges in studying chromatin-remodeling motors are intimately linked to the complex nature of their task. Even the smallest movement of the histone octamer across DNA requires a highly coordinated process of breaking and reforming the many histone-DNA contacts. We have developed quantitative approaches to study chromatin-specific conformational changes. These methods have allowed us to apply conceptual advances from established motor fields to discover mechanistic features that are unique to chromatin-remodeling motors. An illustrative example is our work on ACF, the main chromatin-remodeling complex involved in generating the evenly spaced nucleosomes required to form condensed chromatin structures. We have found that ACF functions as a dimeric motor in which the two ATPases face each other and take turns to engage either side of a nucleosome. This dimeric motor mechanism differs from that of other dimeric motors studied to date, such as kinesin and dimeric helicases, whose biological functions require translocation along a uniform polymeric substrate. The biological functions of chromatin-remodeling enzymes like ACF appear to have placed very different demands on motor architecture. While kinesin and other dimeric motors studied to date use two motors facing in the same direction to move processively in one direction, the opposing polarity of the two motors in ACF enables ACF to rapidly and processively change the direction of nucleosome movement in order to achieve a defined spacing.
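Why bidirectional sliding can produce a defined spacing can be seen in a toy model. The sketch below assumes, hypothetically, that a nucleosome between two fixed neighbors is moved 1 bp at a time toward a given side with probability proportional to the free linker DNA on that side (a length-sensing rule not stated in this abstract). The result is the Ehrenfest urn model, whose stationary distribution is centered on equal linkers.

```python
import random


def simulate_sliding(left_bp, right_bp, n_steps, seed=0):
    """Toy bidirectional sliding of one nucleosome between fixed neighbors.

    Returns the left-linker length after each step. Hypothetical rule: the
    chance of sliding toward a side is proportional to that side's linker,
    so the nucleosome tends to move toward the longer linker."""
    rng = random.Random(seed)
    total = left_bp + right_bp
    trace = []
    for _ in range(n_steps):
        if rng.random() < right_bp / total:
            left_bp, right_bp = left_bp + 1, right_bp - 1  # slide right
        else:
            left_bp, right_bp = left_bp - 1, right_bp + 1  # slide left
        trace.append(left_bp)
    return trace
```

Starting from linkers of 10 and 50 bp, the time-averaged left linker settles near 30 bp: a motor that can reverse direction samples both sides and relaxes toward even spacing, whereas a unidirectional motor would push the nucleosome to one end.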
03:00 PM - 04:00 PM
Alexandre Morozov - A simple biophysical model of nucleosome positioning and energetics
Genomic DNA is packaged into chromatin in eukaryotic cells. The fundamental building block of chromatin is the nucleosome, a 147 bp-long DNA segment wrapped around the surface of a histone octamer. Nucleosomes function to compact genomic DNA and to regulate access to it, both by physical occlusion and by providing the substrate for numerous covalent epigenetic tags. We have studied the intrinsic sequence specificity of histone-DNA interactions by using a high-throughput map of nucleosomes assembled in vitro on yeast and E. coli genomic DNA.

We have inferred free energies of nucleosome formation genome-wide using a biophysical model that rigorously takes steric exclusion between neighboring nucleosomes into account. Surprisingly, most S. cerevisiae nucleosomes do not appear to be positioned by periodic dinucleotide patterns or by exclusion of longer sequence motifs such as poly(dA:dT) tracts; rather, their locations are simply controlled by the dinucleotide content of the underlying DNA sequence. Similar nucleosome positioning rules emerge from studies of C. elegans chromatin and even from nucleosome-free control experiments, likely because histone sequence preferences are correlated with those revealed by sonicating nucleosome-free genomic DNA or digesting it with MNase.
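Steric exclusion between neighboring nucleosomes can be written as a one-dimensional lattice gas of non-overlapping 147-bp particles, whose exact equilibrium probabilities follow from a forward-backward (transfer-matrix) recursion. The sketch below is a generic version of this class of model, not the authors' code: it assumes the free energy of a nucleosome at each start position is already known and uses a single chemical potential `mu_kT` for histone availability.

```python
import math


def nucleosome_occupancy(energies_kT, footprint=147, mu_kT=0.0):
    """Hard-core lattice gas of nucleosomes on a 1D sequence.

    energies_kT[i]: free energy (kT) of a nucleosome starting at bp i.
    Nucleosomes cover `footprint` bp and may not overlap.
    Returns (start probabilities, per-bp occupancy)."""
    n_starts = len(energies_kT)
    length = n_starts + footprint - 1
    w = [math.exp(mu_kT - e) for e in energies_kT]
    # fwd[j]: partition function of configurations confined to bp [0, j).
    fwd = [1.0] * (length + 1)
    for j in range(1, length + 1):
        fwd[j] = fwd[j - 1]
        if j >= footprint:
            fwd[j] += fwd[j - footprint] * w[j - footprint]
    # bwd[j]: partition function of configurations confined to bp [j, length).
    bwd = [1.0] * (length + 1)
    for j in range(length - 1, -1, -1):
        bwd[j] = bwd[j + 1]
        if j + footprint <= length:
            bwd[j] += w[j] * bwd[j + footprint]
    z = fwd[length]
    p_start = [fwd[i] * w[i] * bwd[i + footprint] / z for i in range(n_starts)]
    occ = [0.0] * length
    for i, p in enumerate(p_start):
        for k in range(i, i + footprint):
            occ[k] += p
    return p_start, occ
```

For genome-length inputs the recursion should be carried out in log space to avoid overflow; the structure of the calculation is unchanged.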

Our findings suggest that the nature of the nucleosome positioning code is fairly simple. Nucleosome energetics based on dinucleotide biases would make it easier to evolve and maintain nucleosome positioning sequences in eukaryotic genomes. Such sequences could then be refined and strengthened with 10-11 bp periodic dinucleotide patterns.
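A position-independent "dinucleotide content" energy, as opposed to a periodic one, can be sketched as a weighted count of dinucleotides in each 147-bp window. The weights passed in are hypothetical placeholders (the fitted values are not given here); the resulting per-start energies are the kind of input a steric-exclusion occupancy model takes.

```python
from itertools import product

DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_counts(window):
    """Count the 16 dinucleotides in a window; pairs containing N are skipped."""
    counts = dict.fromkeys(DINUCS, 0)
    for i in range(len(window) - 1):
        d = window[i:i + 2]
        if d in counts:
            counts[d] += 1
    return counts


def content_energies(seq, weights_kT, footprint=147):
    """Energy for every nucleosome start position from composition alone:
    E = sum over dinucleotides d of weights_kT[d] * count_d (no periodicity)."""
    seq = seq.upper()
    energies = []
    for start in range(len(seq) - footprint + 1):
        counts = dinucleotide_counts(seq[start:start + footprint])
        energies.append(sum(weights_kT.get(d, 0.0) * c for d, c in counts.items()))
    return energies
```

Adding 10-11 bp periodic terms would require position-dependent weights within the window, which this composition-only sketch deliberately omits.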
Thursday, March 11, 2010
Time Session
09:00 AM - 10:00 AM
Jon Widom - Nucleosome positioning and chromosome structure from archaebacteria to man
Eukaryotic genomes are packaged into nucleosome particles that occlude the DNA from interacting with most DNA binding proteins. We have discovered that genomes care where their nucleosomes are located on average, and that genomes manifest this care by encoding an additional layer of genetic information, superimposed on top of other kinds of regulatory and coding information that were previously recognized. We have developed a partial ability to read this nucleosome positioning code and predict the in vivo locations of nucleosomes. Most recently, we showed that the distribution of nucleosomes reconstituted on yeast genomic DNA in a purified in vitro system closely resembles that in vivo, implying that much of the in vivo nucleosome organization is explicitly encoded in the genomic DNA sequence itself, through the nucleosomes' DNA sequence preferences. Comparisons across diverse organisms suggest that basic aspects of this nucleosome positioning code may be conserved from archaebacteria to man. Our results suggest that genomes use the nucleosome positioning code to facilitate specific chromosome functions, including delineating functional versus nonfunctional binding sites for key gene regulatory proteins and defining the next higher level of chromosome structure. The physical basis of the nucleosome DNA sequence preferences lies in the sequence-dependent mechanics of DNA itself.
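A classic signature associated with nucleosome positioning sequences is the roughly 10 bp periodic recurrence of AA/TT/TA dinucleotides, which follows the helical repeat of DNA wrapped on the octamer. The sketch below tallies spacings between such dinucleotides in a single sequence; the choice of dinucleotides and the maximum lag are assumptions, and this is only a diagnostic, not the positioning-code model described in the talk.

```python
from collections import Counter


def ww_spacing_histogram(seq, max_lag=50, dinucs=("AA", "TT", "TA")):
    """Histogram of distances (bp) between occurrences of the given dinucleotides."""
    seq = seq.upper()
    pos = [i for i in range(len(seq) - 1) if seq[i:i + 2] in dinucs]
    hist = Counter()
    for idx, a in enumerate(pos):
        for b in pos[idx + 1:]:
            d = b - a
            if d > max_lag:
                break  # positions are sorted, so later ones are farther away
            hist[d] += 1
    return hist
```

Applied to many aligned nucleosomal sequences, peaks at 10, 20, 30 bp indicate phased dinucleotides; a flat histogram with a content bias alone would point instead to a composition-based code.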
10:30 AM - 11:30 AM
Paul Wiggins - Structuring a prokaryotic chromosome
The stochasticity of chromosome organization was investigated by fluorescently labeling genetic loci in live E. coli cells. In spite of the common assumption that the chromosome is well modeled by an unstructured polymer, measurements of the locus distributions reveal that the E. coli chromosome is precisely organized into a nucleoid filament with a linear order. Loci in the body of the nucleoid are positioned within the cell with a precision better than 10% of the cell length. The precision of the inter-locus distance between genomically proximate loci was better than 4% of the cell length. The measured dependence of the precision of inter-locus distance on genomic distance singles out intra-nucleoid interactions as the mechanism responsible for chromosome organization. From the magnitude of the variance, we infer the existence of an as-yet uncharacterized higher-order DNA organization in bacteria. We demonstrate that both the stochastic and the average structure of the nucleoid are captured by a fluctuating elastic filament model.
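A simple analogue of a fluctuating elastic filament, assumed here for illustration and not necessarily the model of the talk, is a one-dimensional chain of N identical harmonic springs with both ends pinned (e.g. to the cell poles). Its equilibrium fluctuations form a discrete Brownian bridge, so the variance of a locus k springs from one end is s2 k (N - k) / N, and the variance of the distance between two loci d springs apart has the same form, s2 d (N - d) / N, where s2 is the variance per spring. The sketch checks the formula by sampling.

```python
import random


def bridge_variance(k, n_springs, var_per_spring=1.0):
    """Variance of a locus k springs from a pinned end (equivalently, of the
    separation of two loci k springs apart) in a chain pinned at both ends."""
    return var_per_spring * k * (n_springs - k) / n_springs


def sample_bridge(n_springs, var_per_spring=1.0, rng=None):
    """One equilibrium configuration: Gaussian spring extensions conditioned on
    a fixed total length, built as W_k - (k/N) * W_N (rest lengths dropped)."""
    if rng is None:
        rng = random.Random()
    w = [0.0]
    for _ in range(n_springs):
        w.append(w[-1] + rng.gauss(0.0, var_per_spring ** 0.5))
    total = w[-1]
    return [w[k] - k / n_springs * total for k in range(n_springs + 1)]
```

In this picture the precision of a locus or of an inter-locus distance is set by the filament's elasticity and its length, which is the kind of genomic-distance dependence the measurements probe.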
11:30 AM - 12:30 PM
Leonid Mirny - Bird's-eye view of protein-DNA binding and DNA packing
Name Email Affiliation
Avsaroglu, Baris baris@brandeis.edu Physics Department, Brandeis University
Betterton, Meredith mdb@colorado.edu Physics Department, University of Colorado at Boulder
Blatti, Charles blatti@illinois.edu Computer Science, University of Illinois at Urbana-Champaign
Bryant, Zev zevry_1999@yahoo.com Department of Bioengineering, Stanford University
Bundschuh, Ralf bundschuh@mbi.osu.edu Department of Physics, The Ohio State University
Cao, Xiaoyi xcao3@illinois.edu Center for Biophysics and Computational Biology, University of Illinois, Urbana-Champaign
Chen, Chieh-Chun cchen63@illinois.edu Bioengineering, University of Illinois, Urbana-Champaign
Chen, Duan chenduan@msu.edu Mathematics, Michigan State University
DeWille, Jim dewille.1@osu.edu Veterinary Biosciences, The Ohio State University
Dougherty, Julie dougherty.138@buckeyemail.osu.edu IBGP/MVIMG, The Ohio State University
Fan, Yue yue@bu.edu Mathematics and Statistics, Boston University
Fok, Pak-Wing pakwing@udel.edu Mathematical Sciences, University of Delaware
Forties, Robert forties@mps.ohio-state.edu Physics, The Ohio State University
Garcia, Hernan hggarcia@gmail.com Physics Department, California Institute of Technology
He, Xin xin.he@ucsf.edu Biochemistry and Biophysics, UCSF
Huang, Tim tim.huang@osumc.edu Human Cancer Genetics Program, The Ohio State University
Hughes, Timothy t.hughes@utoronto.ca Department of Medical Research, University of Toronto
Imakaev, Maxim imakaev@mit.edu Physics, Massachusetts Institute of Technology
Kondev, Jane kondev@brandeis.edu Physics Department, Brandeis University
Krishnan, Sanjeevi sanjeevi.krishnan@gmail.com Informatic Phenomena Group, NRL, National Academy of Sciences of Ukraine
LaRussa, Mary alarussa24@gmail.com Mathematics, University of Alabama at Birmingham (UAB)
Li, Hao haoli@genome.ucsf.edu Dept. of Biochemistry & Biophysics, University of California, San Francisco
Lim, Sookkyung limsk@math.uc.edu Mathematical Sciences, University of Cincinnati
Maddocks, John john.maddocks@epfl.ch EPFL SB IMB LCVMM, École Polytechnique Fédérale de Lausanne
Manning, Robert rmanning@haverford.edu Mathematics, Haverford College
Marko, John john-marko@northwestern.edu Depts. Biochemistry, Molecular Biology & Cell Biology, Northwestern University
Mirny, Leonid leonid@mit.edu Health Sciences and Technology and Physics, Massachusetts Institute of Technology
Mitra, Indranil imitra@clemson.edu Mathematical Sciences, Clemson University
Mooney, Alex alexmmooney@gmail.com Physics, The Ohio State University
Morozov, Alexandre morozov@physics.rutgers.edu Department of Physics & Astronomy, Rutgers University at New Brunswick
Narlikar, Geeta Geeta.narlikar@ucsf.edu Biochemistry and Biophysics, UCSF
Nelson, Phil nelson@physics.upenn.edu Physics and Astronomy, University of Pennsylvania
Olson, Wilma wilma.olson@rutgers.edu Chemistry & Chemical Biology, Rutgers, The State University of New Jersey
Oluyede, Broderick boluyede@georgiasouthern.edu Mathematical Sciences, Georgia Southern University
Orlova, Tatiana tatiana.orlova@fulbrightmail.org Mathematics, University of South Carolina, Columbia
Poirier, Michael mpoirier@mps.ohio-state.edu Department of Physics, The Ohio State University
Pospisil, Cameron cpospisi@ucla.edu EEB, UCLA
Riano-Pachon, Diego Mauricio riano-pachon.1@osu.edu Plant, Cellular, and Molecular Biology, The Ohio State University
Segal, Eran eran.segal@weizmann.ac.il Department of Computer Science And Applied Mathematics, Weizmann Institute of Science
Spakowitz, Andrew vhlee@stanford.edu Department of Chemical Engineering, Stanford University
Stormo, Gary stormo@wustl.edu Department of Genetics, Washington University Medical School
Swigon, David swigon@pitt.edu Department of Mathematics, University of Pittsburgh
Taslim, Cenny taslim.2@osu.edu Statistics/Comprehensive Cancer Ctr., The Ohio State University
Ucar, Duygu ucarduygu@gmail.com Internal Medicine, University of Iowa
Underhill, Patrick underhill@rpi.edu Department of Chemical and Biological Engineering, Rensselaer Polytechnic Institute
van Oijen, Antoine antoine_van_oijen@hms.harvard.edu Dept. of Biological Chemistry and Molecular Pharmacology, Harvard Medical School
Welch, Lonnie welch@ohio.edu Bioinformatics Laboratory, Ohio University
Widom, Jon j-widom@northwestern.edu Department of Biochemistry, Molecular Biology and Cell Biology, Northwestern University Medical School
Wiggins, Paul wiggins@wi.mit.edu Whitehead Institute for Biomedical Research
Williams, Mark mark@neu.edu Department of Physics, Northeastern University
Xie, Dan danxie2@illinois.edu Bioengineering, University of Illinois at Urbana-Champaign
Yan, Koon-Kiu koon-kiu.yan@yale.edu Molecular Biophysics and Biochemistry, Yale University
Yu, Pengfei yu68@illinois.edu Biophysics and Computational Biology, University of Illinois, Urbana-Champaign
Zhong, Sheng szhong@ad.uiuc.edu Bioengineering, University of Illinois at Urbana-Champaign
Zhu, Yali zhu.87@osu.edu MVIMG, The Ohio State University
Zev Bryant Lecture
Zev Bryant Lecture
DNA Architecture and Transcriptional Regulation: The Physics of Genome Management
DNA architecture plays a key role in determining spatial and temporal patterns of gene expression. This architecture encompasses both the nucleotide sequence (i.e., the information content) and the physical state of the DNA such as its spatial organization and mechanical properties. We study several regulatory motifs in E. coli using a three pronged approach: theoretical modeling, in vitro single molecule experiments, and in vivo single cell experiments. Through systematic experimentation we show that we can account for the effect of varying the different relevant "knobs" governing a repression regulatory motif such as the concentration of transcription factor and the strength of their binding to DNA. The result is a framework that predicts the regulatory outcome of any mutant of this regulatory architecture, which we show can be tested in a variety of different ways. We also present our recent experimental efforts aimed at dissecting repression by DNA looping and the sequence-dependent flexibility associated with the mechanical code of the DNA.
Pfam-wide determination and inference of transcription factor DNA sequence specificities
Knowledge of the sequence binding preferences of transcription factors (TFs) is central to the understanding of gene regulation and genome function. However, much information is lacking regarding the sequence specificities of TFs, even in well-studied organisms. Similar DNA binding domains often have similar sequence preferences, and in some cases rules have been derived for inferring sequence specificities within TF families.

Our aims are three-fold. First, we are using high-resolution DNA-binding data (e.g. from Protein Binding Microarrays) to refine and test rules for inference of TF sequence specificity. Second, we are generating the data needed to produce accurate "pfam-wide" inferences of sequence specificity for as many eukaryotic DNA-binding domain classes as possible. Third, we have constructed a database (CIS-BP: Catalog of Inferred Sequence Binding Preferences) to house both known and inferred sequence preferences in the form of 8-mer binding scores, position weight matrices, and IUPAC consensus motifs.

Using our current dataset, we provide a summary of knowledge of eukaryotic TF binding preferences. We also show that, for multiple DBD classes, simple sequence rules (e.g. percent amino acid identity) can readily identify TFs with new and different DNA-binding activities. Using these sequence rules to guide selection of proteins for further analysis, we have experimentally obtained distinctive sequence specificities for DBDs from a wide variety of eukaryotes.
"Pfam-wide" determination and inference of transcription factor DNA sequence specificities

Knowledge of the sequence binding preferences of transcription factors (TFs) is central to the understanding of gene regulation and genome function. However, much information is lacking regarding the sequence specificities of TFs, even in well-studied organisms. Similar DNA binding domains often have similar sequence preferences, and in some cases rules have been derived for inferring sequence specificities within TF families.


Our aims are three-fold. First, we are using high-resolution DNA-binding data (e.g. from Protein Binding Microarrays) to refine and test rules for inference of TF sequence specificity. Second, we are generating the data needed to produce accurate "pfam-wide" inferences of sequence specificity for as many eukaryotic DNA-binding domain classes as possible. Third, we have constructed a database (CIS-BP: Catalog of Inferred Sequence Binding Preferences) to house both known and inferred sequence preferences in the form of 8-mer binding scores, position weight matrices, and IUPAC consensus motifs.


Using our current dataset, we provide a summary of knowledge of eukaryotic TF binding preferences. We also show that, for multiple DBD classes, simple sequence rules (e.g. percent amino acid identity) can readily identify TFs with new and different DNA-binding activities. Using these sequence rules to guide selection of proteins for further analysis, we have experimentally obtained distinctive sequence specificities for DBDs from a wide variety of eukaryotes.

DNA Looping Probabilities and Semi-Classical Path Integrals
(Joint work with L. Cotta-Ramusino and R. Manning)

DNA interactions with proteins frequently involves looping in which the location and orientation of the two ends of a DNA segment are prescribed. I will show how path integral methods can be used to obtain a sequence-dependent formula for the probability of loop formation, including the case of minicircle cyclization. The expression involves the minimal energy path, a nonlinear computation, with a correction for fluctuations in terms of certain Jacobi fields, a linear computation.
Micromechanical study of protein-DNA interactions and chromosome structure
Micromechanical study of protein-DNA interactions and chromosome structure
Birds eye view at protein-DNA binding and DNA packing
Birds eye view at protein-DNA binding and DNA packing
A simple biophysical model of nucleosome positioning and energetics
Genomic DNA is packaged into chromatin in eukaryotic cells. The fundamental building block of chromatin is the nucleosome, a 147 bp-long DNA segment wrapped around the surface of a histone octamer. Nucleosomes function to compact genomic DNA and to regulate access to it both by physical occlusion and by providing the substrate for numerous covalent epigenetic tags. We have studied intrinsic sequence specificity of histone-DNA interactions by using a high-throughput map of nucleosomes assembled in vitro on yeast and E.coli genomic DNA.

We have inferred free energies of nucleosome formation genome-wide using a biophysical model that rigorously takes steric exclusion between neighboring nucleosomes into account. Surprisingly, most S.cerevisiae nucleosomes do not appear to be positioned by periodic dinucleotide patterns or by exclusion of longer sequence motifs such as poly(dA:dT) tracts - rather, their locations are simply controlled by the dinucleotide content of the underlying DNA sequence. Similar nucleosome positioning rules emerge from the studies of C.elegans chromatin and even from nucleosome-free control experiments, likely because histone sequence preferences are correlated with those revealed by sonicating nucleosome-free genomic DNA or digesting it with MNase.

Our findings suggest that the nature of the nucleosome positioning code is fairly simple. Nucleosome energetics based on dinucleotide biases would make it easier to evolve and maintain nucleosome positioning sequences in eukaryotic genomes. Such sequences could then be refined and strengthened with 10-11 bp periodic dinucleotide patterns.
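A dinucleotide-content model of the kind described here can be sketched in a few lines: each window's energy is a weighted sum of its overlapping dinucleotide counts, with no dependence on where in the window each dinucleotide sits. The weights in the example are made-up placeholders rather than fitted values from this work, and `window_energies` is a hypothetical helper name.

```python
from collections import Counter
from typing import Dict, List


def dinucleotide_counts(seq: str) -> Counter:
    """Count overlapping dinucleotides, e.g. 'AATT' -> AA, AT, TT."""
    seq = seq.upper()
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def window_energies(seq: str, weights: Dict[str, float], footprint: int = 147) -> List[float]:
    """Energy (k_B T units) of every footprint-long window as a sum of
    per-dinucleotide weights; dinucleotides missing from `weights` contribute 0.
    The weights are hypothetical placeholders, not fitted parameters."""
    energies = []
    for start in range(len(seq) - footprint + 1):
        counts = dinucleotide_counts(seq[start:start + footprint])
        energies.append(sum(weights.get(d, 0.0) * c for d, c in counts.items()))
    return energies
```

Adding the 10-11 bp periodic refinement mentioned above would require making the weights depend on position within the window.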
Mechanisms of Chromatin Remodeling Motors
The chromatin state of a gene helps encode the regulatory information that controls its expression. ATP-dependent chromatin-remodeling motors catalyze the changes in chromatin structure that are required for both rapid exposure and long-term occlusion of DNA sites. The mechanisms by which these motors function, however, are not well understood. The challenges in studying chromatin-remodeling motors are intimately linked to the complex nature of their task: even the smallest movement of the histone octamer across DNA requires a highly coordinated process of breaking and reforming the many histone-DNA contacts. We have developed quantitative approaches to study chromatin-specific conformational changes. These methods have allowed us to apply conceptual advances from established motor fields to discover mechanistic features that are unique to chromatin-remodeling motors.

An illustrative example is our work on ACF, the main chromatin-remodeling complex involved in generating the evenly spaced nucleosomes required to form condensed chromatin structures. We have found that ACF functions as a dimeric motor in which the two ATPases face each other and take turns engaging either side of a nucleosome. This mechanism differs from that of other dimeric motors studied to date, such as kinesin and dimeric helicases, whose biological functions require translocation along a uniform polymeric substrate. The biological functions of chromatin-remodeling enzymes like ACF appear to have placed very different demands on motor architecture. Whereas kinesin and other dimeric motors studied to date use two motors facing in the same direction to move processively in one direction, the opposing polarity of the two motors in ACF enables it to rapidly and processively change the direction of nucleosome movement in order to achieve a defined spacing.
First-principles calculation of DNA looping in tethered particle experiments
We show how to calculate the probability of DNA loop formation mediated by regulatory proteins such as Lac repressor, using a mathematical model of DNA elasticity. Our approach has new features that enable us to compute quantities directly observable in Tethered Particle Motion (TPM) experiments; for example, it accounts for all the entropic forces present in such experiments. Our model has no free parameters: it characterizes DNA elasticity using information obtained in other kinds of experiments. It can compute both the "looping J factor" (or, equivalently, the looping free energy) for various DNA construct geometries and repressor concentrations, and the detailed probability density function of bead excursions. We also show how to extract the same quantities from recent experimental data on tethered particle motion and compare them with our model's predictions. In particular, we present a new method to correct observed data for finite camera shutter time.
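The equivalence between the J factor and the looping free energy is a change of units against a reference concentration. The sketch below assumes the common convention dG_loop = -k_B T ln(J / c0) with c0 = 1 M; the function names are illustrative.

```python
import math


def looping_free_energy(j_factor_molar: float, reference_molar: float = 1.0) -> float:
    """Looping free energy in units of k_B T: dG = -ln(J / c0).

    c0 = 1 M is a conventional reference concentration (an assumption here).
    """
    if j_factor_molar <= 0:
        raise ValueError("J factor must be positive")
    return -math.log(j_factor_molar / reference_molar)


def j_factor_from_free_energy(dg_kt: float, reference_molar: float = 1.0) -> float:
    """Inverse conversion: J = c0 * exp(-dG / k_B T), returned in molar."""
    return reference_molar * math.exp(-dg_kt)
```

With this convention, a J factor of 100 nM corresponds to a looping free energy of about 16.1 k_B T.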

The model successfully reproduces the detailed distributions of bead excursions, including their surprising three-peak structure, without any fit parameters and without invoking any alternative conformation of the repressor tetramer. However, for short DNA loops (around 95 bp), the experiments show more looping than the linear-elasticity model predicts, echoing other recent experimental results. Because the experiments we study are done in vitro, this anomalously high looping cannot be attributed to DNA-bending proteins or other cellular machinery. We also show that it is unlikely to be the result of a hypothetical "open" conformation of the repressor.
DNA Mechanics and Gene Expression
Making sense of gene expression in living systems requires an understanding of the looping properties of DNA in crowded, multi-component systems. The presence of non-specific binding proteins that introduce sharp bends, localized untwisting, and/or dislocation of the DNA double-helical axis stabilizes functional repression loops ranging from as few as 65 base pairs to as many as tens of thousands of base pairs. As a first step in analyzing such looping, we have investigated the effects of various proteins on the configurational properties of DNA fragments, treating the DNA at the level of base-pair steps and incorporating the known effects of various proteins on DNA double-helical structure. The presentation will highlight some of the new models and computational techniques that we have developed to generate three-dimensional configurations of protein-mediated DNA loops, and will illustrate new insights gained from this work into the effects of various proteins on DNA topology and the apparent contributions of non-specific binding proteins to gene expression.
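Treating DNA at the level of base-pair steps amounts to chaining one rigid-body transformation per step to build a three-dimensional configuration; a protein-induced kink then enters as a large local bending angle. The sketch uses a deliberately simplified step convention (roll about x, then twist about z, then rise along the rotated z axis). That convention is an assumption for illustration and differs from the standard step-parameter definitions used by tools such as 3DNA; the parameter values are illustrative (B-DNA twist of about 36 degrees, rise of about 3.4 Å).

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 4x4 homogeneous transformation matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def step_transform(twist_deg: float, roll_deg: float, rise: float) -> Matrix:
    """Simplified base-pair step (NOT the 3DNA convention): roll about the local
    x axis, then twist about the local z axis, then advance `rise` along the
    resulting local z axis."""
    t, r = math.radians(twist_deg), math.radians(roll_deg)
    rx = [[1.0, 0.0, 0.0, 0.0],
          [0.0, math.cos(r), -math.sin(r), 0.0],
          [0.0, math.sin(r), math.cos(r), 0.0],
          [0.0, 0.0, 0.0, 1.0]]
    rz = [[math.cos(t), -math.sin(t), 0.0, 0.0],
          [math.sin(t), math.cos(t), 0.0, 0.0],
          [0.0, 0.0, 1.0, 0.0],
          [0.0, 0.0, 0.0, 1.0]]
    advance = [[1.0, 0.0, 0.0, 0.0],
               [0.0, 1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0, rise],
               [0.0, 0.0, 0.0, 1.0]]
    return matmul(matmul(rx, rz), advance)


def build_path(steps: Sequence[Tuple[float, float, float]]) -> List[Tuple[float, float, float]]:
    """Origins of successive base-pair frames; each step is (twist_deg, roll_deg, rise)."""
    frame: Matrix = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    origins = [(0.0, 0.0, 0.0)]
    for twist, roll, rise in steps:
        frame = matmul(frame, step_transform(twist, roll, rise))  # compose in the local frame
        origins.append((frame[0][3], frame[1][3], frame[2][3]))
    return origins
```

For example, ten ideal steps of (36, 0, 3.4) complete one helical turn and place the last frame 34 Å along z, whereas a single 90-degree roll turns the rest of the path sideways.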
Reading and writing transcriptional behaviors using DNA sequence
Modeling and predicting DNA-binding specificity
Many different kinds of data are available for modeling the specificity of a DNA-binding protein, and the quality of the model depends on both the type of data used and the algorithms for estimating binding energies. We discuss our approaches for modeling from several different types of data, and assess the accuracy of each against experimental measurements. Given specificities for many proteins of a specific class, one can also predict the binding specificities of novel proteins, allowing for the design of new proteins with unique specificities. We describe our current approaches to this challenging problem.
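One widely used model family for such specificity data is the additive energy matrix, in which each position of a site contributes independently and occupancy follows a two-state Boltzmann (Fermi-Dirac) form. The sketch below is that generic model, not necessarily the one used in this work; the matrix values and function names are hypothetical.

```python
import math
from typing import Dict, List

# energy_matrix[position][base]: hypothetical energies in k_B T (lower = tighter binding)
EnergyMatrix = List[Dict[str, float]]


def site_energy(site: str, energy_matrix: EnergyMatrix) -> float:
    """Additive binding energy: each position contributes independently."""
    if len(site) != len(energy_matrix):
        raise ValueError("site length must equal the matrix width")
    return sum(column[base] for column, base in zip(energy_matrix, site.upper()))


def occupancy(energy_kt: float, mu_kt: float) -> float:
    """Two-state bound probability 1 / (1 + exp(E - mu)), computed without overflow."""
    x = energy_kt - mu_kt
    if x >= 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def scan(sequence: str, energy_matrix: EnergyMatrix, mu_kt: float) -> List[float]:
    """Bound probability for every window on the given strand (reverse complement not scanned)."""
    width = len(energy_matrix)
    return [occupancy(site_energy(sequence[i:i + width], energy_matrix), mu_kt)
            for i in range(len(sequence) - width + 1)]
```

The additive assumption is the main design choice: it keeps the parameter count at four per position, but it cannot capture dependencies between positions, which is one reason model quality depends on both the data and the fitting algorithm.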
Single-molecule studies of DNA replication
Advances in optical imaging and molecular manipulation techniques have made it possible to observe individual enzymes and record molecular movies that provide new insight into their dynamics and reaction mechanisms. In a biological context, most of these enzymes function in concert with other enzymes in multi-protein complexes, so an important future direction will be the utilization of single-molecule techniques to unravel the orchestration of large macromolecular assemblies. Our group is developing the single-molecule tools that will make it possible to study biochemical pathways of arbitrary complexity at the single-molecule level.

I will discuss results of single-molecule experiments on the replisome, the molecular machinery that is responsible for replication of DNA. We stretch individual DNA molecules and use their elastic properties to obtain dynamic information on the proteins that unwind the double helix and copy its genetic information. We also use DNA length as a probe to measure the dynamic formation and release of replication loops. Further, I will present new results of experiments that combine the observation of replisome activity by DNA length with the fluorescence imaging of individual components of the replication complex. These measurements allow us to track the composition of the replisome while monitoring its unwinding and synthesis activities.
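Using DNA length as a probe rests on simple bookkeeping: at constant force, each nucleotide converted from single-stranded to double-stranded form changes the tether length by a fixed, force-dependent amount. In the sketch below, the per-nucleotide lengths are inputs that would come from force-extension data at the applied force; the values in the usage note are placeholders, and the function names are hypothetical.

```python
from typing import Sequence


def nucleotides_from_length_change(delta_length_nm: float,
                                   ds_nm_per_nt: float,
                                   ss_nm_per_nt: float) -> float:
    """Nucleotides converted from ssDNA to dsDNA implied by a tether-length change.

    Both per-nucleotide lengths must be taken at the same applied force; they are
    inputs here, not constants. A positive result means net ss -> ds conversion.
    """
    per_nt = ds_nm_per_nt - ss_nm_per_nt
    if per_nt == 0:
        raise ValueError("ss and ds lengths coincide at this force; length carries no signal")
    return delta_length_nm / per_nt


def synthesis_rate_nt_per_s(times_s: Sequence[float], lengths_nm: Sequence[float],
                            ds_nm_per_nt: float, ss_nm_per_nt: float) -> float:
    """Least-squares slope of length vs. time, converted to nucleotides per second."""
    n = len(times_s)
    if n < 2 or n != len(lengths_nm):
        raise ValueError("need at least two paired (time, length) points")
    mean_t = sum(times_s) / n
    mean_l = sum(lengths_nm) / n
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    if sxx == 0:
        raise ValueError("times must not all be equal")
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in zip(times_s, lengths_nm))
    return nucleotides_from_length_change(sxy / sxx, ds_nm_per_nt, ss_nm_per_nt)
```

With placeholder values of 0.34 and 0.24 nm per nucleotide, a 10 nm lengthening corresponds to about 100 nucleotides. The sign and size of the per-nucleotide difference depend on the applied force and the assay geometry, so these numbers are for illustration only.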
Nucleosome positioning and chromosome structure from archaebacteria to man
Eukaryotic genomes are packaged into nucleosome particles that occlude the DNA from interacting with most DNA-binding proteins. We have discovered that genomes care where their nucleosomes are located on average, and that genomes manifest this care by encoding an additional layer of genetic information, superimposed on top of other kinds of regulatory and coding information that were previously recognized. We have developed a partial ability to read this nucleosome positioning code and predict the in vivo locations of nucleosomes. Most recently, we showed that the distribution of nucleosomes reconstituted on yeast genomic DNA in a purified in vitro system closely resembles that in vivo, implying that much of the in vivo nucleosome organization is explicitly encoded in the genomic DNA sequence itself, through the nucleosomes' DNA sequence preferences. Comparisons across diverse organisms suggest that basic aspects of this nucleosome positioning code may be conserved from archaebacteria to man. Our results suggest that genomes use the nucleosome positioning code to facilitate specific chromosome functions, including delineating functional versus nonfunctional binding sites for key gene regulatory proteins and defining the next higher level of chromosome structure. The physical basis of the nucleosome DNA sequence preferences lies in the sequence-dependent mechanics of DNA itself.
Structuring a prokaryotic chromosome
The stochasticity of chromosome organization was investigated by fluorescently labeling genetic loci in live E. coli cells. Despite the common assumption that the chromosome is well modeled by an unstructured polymer, measurements of the locus distributions reveal that the E. coli chromosome is precisely organized into a nucleoid filament with a linear order. Loci in the body of the nucleoid are positioned within the cell with a precision better than 10% of the cell length, and the inter-locus distance of genomically proximate loci is precise to better than 4% of the cell length. The measured dependence of the precision of inter-locus distance on genomic distance singles out intra-nucleoid interactions as the mechanism responsible for chromosome organization. From the magnitude of the variance, we infer the existence of an as-yet uncharacterized higher-order DNA organization in bacteria. We demonstrate that both the stochastic and the average structure of the nucleoid are captured by a fluctuating elastic filament model.
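A minimal way to see how a fluctuating elastic filament yields a separation-dependent precision of inter-locus distance is a chain of identical harmonic segments with both ends pinned (a discretized Brownian bridge). For that toy model, the variance of the distance between loci separated by s of N segments is sigma^2 * s * (N - s) / N. The sketch below samples the toy chain and compares it with that formula; it is an illustrative assumption rather than the authors' model, and `sigma` and `n_segments` are hypothetical parameters.

```python
import random
from typing import List


def sample_pinned_chain(n_segments: int, sigma: float, rng: random.Random) -> List[float]:
    """Fluctuations of bead positions x_0..x_N about their mean, with x_0 = x_N = 0.

    Each segment's extension fluctuates as N(0, sigma^2); pinning both ends is
    imposed with the Brownian-bridge construction x_m = W_m - (m / N) W_N.
    """
    walk = [0.0]
    for _ in range(n_segments):
        walk.append(walk[-1] + rng.gauss(0.0, sigma))
    total = walk[-1]
    return [w - (m / n_segments) * total for m, w in enumerate(walk)]


def predicted_distance_variance(separation: int, n_segments: int, sigma: float) -> float:
    """Analytic variance of x_j - x_i for j - i = separation in the pinned chain."""
    return sigma ** 2 * separation * (n_segments - separation) / n_segments


def sampled_distance_variance(i: int, j: int, n_segments: int, sigma: float,
                              n_samples: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of Var(x_j - x_i) from independent chain samples."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_samples):
        x = sample_pinned_chain(n_segments, sigma, rng)
        diffs.append(x[j] - x[i])
    mean = sum(diffs) / n_samples
    return sum((d - mean) ** 2 for d in diffs) / (n_samples - 1)
```

In this toy chain the variance grows with genomic separation along a parabola in s; comparing the measured curve with predictions of this kind is the type of analysis the abstract uses to discriminate between organizing mechanisms.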

First-principles calculation of DNA looping in tethered particle experiments
Phil Nelson

We show how to calculate the probability of DNA loop formation mediated by regulatory proteins such as Lac repressor, using a mathematical model of DNA elasticity. Our approach has new features enabling us to compute quantities directly observable in

A simple biophysical model of nucleosome positioning and energetics
Alexandre Morozov

Genomic DNA is packaged into chromatin in eukaryotic cells. The fundamental building block of chromatin is the nucleosome, a 147 bp-long DNA segment wrapped around the surface of a histone octamer. Nucleosomes function to compact genomic DNA and to r

DNA Looping Probabilities and Semi-Classical Path Integrals
John Maddocks (Joint work with L. Cotta-Ramusino and R. Manning)

DNA interactions with proteins frequently involve looping, in which the location and orientation of the two ends of a DNA segment are prescribed. I will show how path integral methods

DNA Architecture and Transcriptional Regulation: The Physics of Genome Management
Hernan Garcia

DNA architecture plays a key role in determining spatial and temporal patterns of gene expression. This architecture encompasses both the nucleotide sequence (i.e., the information content) and the physical state of the DNA such as its spatial organi

Modeling and predicting DNA-binding specificity
Gary Stormo

Many different kinds of data are available for modeling the specificity of a DNA-binding protein, and the quality of the model depends on both the type of data used and the algorithms for estimating binding energies. We discuss our approaches for mod