Signal transduction networks integrate protein-protein interactions and biochemical reactions in a manner that is not currently amenable to high-throughput experimental interaction assays. Novel theoretical and computational methods are thus needed to integrate disparate and often indirect information into a consistent network, and to gain insight into the dynamic processes supported by this network. This talk will present methods to synthesize signal transduction networks from indirect causal evidence as obtained from knockout or overexpression experiments, and to extend graph-theoretical analysis to incorporate negative regulation and synergistic regulation by several components.
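As a rough illustration of what negative and synergistic regulation mean in a network model, the sketch below uses a Boolean network. This is one common formalism and is assumed here, not taken from the talk. The node names (signal, inhibitor, A, B, output) and the rules are hypothetical: B is negatively regulated by the inhibitor, and the output requires A and B together (synergy). A knockout is modeled by forcing a node to stay off.

```python
# Hypothetical toy network: each node has a Boolean rule over the current state.
RULES = {
    "signal": lambda s: s["signal"],                    # external input, held fixed
    "inhibitor": lambda s: s["inhibitor"],              # external input, held fixed
    "A": lambda s: s["signal"],                         # activated by the signal
    "B": lambda s: s["signal"] and not s["inhibitor"],  # negative regulation
    "output": lambda s: s["A"] and s["B"],              # synergy: needs A AND B
}


def step(state, rules):
    """Synchronous update: every node is recomputed from the previous state."""
    return {node: rule(state) for node, rule in rules.items()}


def steady_state(state, rules, max_steps=50):
    """Iterate until the state stops changing; return None if it never does."""
    for _ in range(max_steps):
        nxt = step(state, rules)
        if nxt == state:
            return state
        state = nxt
    return None


def simulate(signal, inhibitor, knockout=None):
    """Run the toy network to a fixed point, optionally knocking out one node."""
    rules = dict(RULES)
    if knockout is not None:
        rules[knockout] = lambda s: False  # knocked-out node is always off
    start = {"signal": signal, "inhibitor": inhibitor,
             "A": False, "B": False, "output": False}
    return steady_state(start, rules)


if __name__ == "__main__":
    for signal, inhibitor, knockout in [(True, False, None),
                                        (True, True, None),
                                        (True, False, "A")]:
        result = simulate(signal, inhibitor, knockout)
        print(signal, inhibitor, knockout, "->", result["output"])
```

Comparing the wild-type run with the knockout run mimics, in miniature, the kind of indirect causal evidence the abstract refers to: removing A abolishes the output even though A does not act on the output alone.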
A large class of proteins called enzymes carries out the majority of chemical processes in cellular metabolism. The biological and chemical functions of these enzymes are closely connected with their 3D structures, in particular localized regions called active sites. The parts of a gene that code for a catalytic active site tend to be highly conserved in evolution even when the gene as a whole has experienced extensive sequence changes. Furthermore, the active-site amino acids are typically spread out across a protein sequence (or occasionally across multiple protein sequences); finding these (3D) active-site residues from a protein (1D) sequence is challenging for well-conserved proteins, and nearly impossible for distantly related proteins. A network built from the active-site structural similarity of enzymes offers a new approach for the large-scale investigation of the evolution of protein function. Here, I will present key results from our network analysis of all metalloprotein structures (>10,000) deposited in the Protein Data Bank (PDB).
A network of disorders and disease genes linked by known disorder-gene associations offers a platform to explore, in a single graph-theoretic framework, all known phenotype and disease gene associations, indicating the common genetic origin of many diseases. We find that the vast majority of disease genes are nonessential and show no tendency to encode hub proteins, and their expression pattern indicates that they are localized in the functional periphery of the network. We also study the evolution of patient illness using a network summarizing the disease associations extracted from 32 million Medicare claims, demonstrating that the cellular-level links between disease-causing proteins are amplified in the population as comorbidity patterns.
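A minimal sketch of how a disorder network can be derived from disorder-gene associations: two disorders are linked when they share at least one associated gene. The association pairs below are hypothetical placeholders, not real data, and the function name is introduced only for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical placeholder associations (disorder, gene); not real data.
ASSOCIATIONS = [
    ("disorder_1", "gene_a"),
    ("disorder_1", "gene_b"),
    ("disorder_2", "gene_b"),
    ("disorder_2", "gene_c"),
    ("disorder_3", "gene_c"),
]


def project_disorders(pairs):
    """Project the bipartite graph onto disorders.

    Returns {(disorder_x, disorder_y): number of shared genes} for every
    pair of disorders that shares at least one gene.
    """
    genes_of = defaultdict(set)
    for disorder, gene in pairs:
        genes_of[disorder].add(gene)

    edges = {}
    for d1, d2 in combinations(sorted(genes_of), 2):
        shared = genes_of[d1] & genes_of[d2]
        if shared:
            edges[(d1, d2)] = len(shared)
    return edges


if __name__ == "__main__":
    for (d1, d2), weight in project_disorders(ASSOCIATIONS).items():
        print(d1, "--", d2, "shared genes:", weight)
```

Swapping the roles of the two columns gives the complementary projection, in which genes are linked when they are associated with the same disorder.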
The lecture introduces a novel and integrative method family, the ModuLand method (www.linklgroup.hu/modules.php), which uses the entire network topology to determine the overlapping modules. The modularization helps us to
The disintegration of the modular structure in stress (during the decrease of system resources) fits well with the series of topological phase transitions of networks (Csermely, 2009) and helps the system by 'quarantining' the damage, decreasing noise propagation and allowing a larger independence of various units, thus expanding the response-space. The re-integration of the modular structure after stress offers a chance for learning, adaptation and modular evolution (Csermely, 2008; Korcsmaros et al., 2007).
Selected References (all downloadable from: www.linkgroup.hu)
My talk will be concerned with understanding protein function on a genomic scale. My lab approaches this through the prediction and analysis of biological networks, focusing on protein-protein interaction and transcription-factor-target ones. I will describe how these networks can be determined through integration of many genomic features and how they can be analyzed in terms of various topological statistics. In particular, I will discuss a number of recent analyses: (1) Improving the prediction of molecular networks through systematic training-set expansion; (2) Showing how the analysis of pathways across environments potentially allows them to act as biosensors; (3) Showing how integrating gene expression data with regulatory networks identifies transient hubs for characterizing proteins of unknown function; (4) Analyzing the structure of the regulatory network to show that it has a hierarchical layout with the "middle-managers" acting as information bottlenecks; (5) Showing that most human variation occurs on the periphery of the protein interaction network; and (6) Developing useful web-based tools for the analysis of networks (TopNet and tYNA).
The tools of operations research (OR) have proven invaluable to the field of computational biology, but in addition to yielding biological insights, the problems have themselves become research entities within OR. This gives a symbiotic relationship between biology, computer science, and mathematics. Each discipline approaches problems from its own perspective, and our goal is to present some modern work in OR that is motivated by problems in computational biology. In particular, we will discuss how flux balance analysis and the problem of inferring haplotypes have acquired a research importance within OR. Advances in modelling, solving, and analysis broaden OR's repertoire, making its study more robust as it addresses other problems, and they add to the original biological intent of the research.
Pathways and complexes can be considered fundamental units of cell biology, but their relationship to each other is difficult to define. Comprehensive tagging and purification experiments have generated networks of interactions that represent most stable protein complexes in yeast cells. We describe this work, and show how the analysis of pairwise epistatic relationships between genes complements the physical interaction data and, furthermore, can be used to classify gene products into parallel and linear pathways.
Recent developments in high-throughput technology in model organisms have enabled the unprecedented construction and phenotyping of hundreds of thousands of combinatorial mutants. Such analyses have produced large collections of genetic interactions in yeast, which have proven to be a powerful means of defining gene function, identifying protein complexes, and even ordering linear pathways. However, due to the sheer number of possible mutant combinations, earlier genetic interaction studies were limited in their coverage, as they typically measured less than 5% of even the pairwise interaction network. Thus, we have so far been unable to assess the global structure of the network, and several questions remain about the fundamental mechanisms governing genetic interactions.
Based on improvements in the throughput of Synthetic Genetic Array technology, we have compiled the largest genetic interaction network to date, based on the construction of more than 5 million double mutant combinations. I will discuss technological obstacles overcome in constructing this network, including normalization and modeling techniques that allowed us to measure reliable, quantitative interactions. I will also describe several striking properties revealed by our mining of this global network, and demonstrate its utility both for characterizing specific pathway-level functions and for providing a broader picture of cellular organization. Genome-scale studies of genetic interactions should enable us to understand fundamental properties underlying these relationships in yeast as well as higher organisms, and I will discuss our progress in addressing this question.
Advances in high-throughput technologies have transformed microbiology from a primarily "reductionist" field of science with a focus on one specific cellular process, to one which analyzes the behavior of an entire system. To this end, computational modeling of biochemical processes is vital to the successful assimilation of biological information into system-wide descriptions. A dearth of kinetic measurements has been a considerable impediment to the development of fully dynamic models. However, this barrier can be partially overcome through the use of constraint-based modeling, where the most widely used method is flux balance analysis (FBA). This tutorial will briefly cover the fundamentals of developing genome-scale FBA-based models as well as the various uses of these tools toward elucidation of a cell's phenotypic behavior. We will also touch upon different optimization principles as well as some recent efforts, such as the development of multi-cellular models and thermodynamics-based metabolic flux analysis.
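For readers new to constraint-based modeling, FBA is commonly written as a linear program. The notation below is assumed here for illustration and is not taken from the tutorial: S is the stoichiometric matrix (metabolites by reactions), v the vector of reaction fluxes, c a vector of objective weights (for example, selecting a biomass reaction), and v^lb, v^ub the lower and upper flux bounds.

```latex
\begin{aligned}
\max_{v} \quad & c^{\mathsf{T}} v \\
\text{subject to} \quad & S\,v = 0, \\
& v^{\mathrm{lb}} \le v \le v^{\mathrm{ub}}.
\end{aligned}
```

The equality constraint expresses the steady-state assumption that each internal metabolite is produced as fast as it is consumed, which is why no kinetic parameters are needed. The bounds encode reaction reversibility and measured or assumed uptake limits.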
The genomics revolution has led to the generation of an enormous amount of data on the composition, regulation, and physiology of cellular networks. There is a need to integrate this information into a computational framework so that testable predictions can be made with an accounting of the complexity inherent in cellular systems. Recent advances on the integration of transcriptional regulatory network models with metabolic network reconstructions will be presented. The resultant genome-scale models have been used to make experimentally testable predictions. Novel methods to identify ideal drug targets and mechanisms of pathogenicity will also be discussed, with results presented from two important human pathogens, Leishmania major and Pseudomonas aeruginosa. These systems biology approaches hold the promise of revolutionizing drug discovery efforts to tackle challenges in many human diseases as well as address fundamental questions in biology.
Two trends are driving innovation and discovery in biological sciences: technologies that allow holistic surveys of genes, proteins, and metabolites, and a realization that biological processes are driven by complex networks of interacting biological molecules. However, there is a gap between the gene lists emerging from genome sequencing projects and the network diagrams that are essential if we are to understand the link between genotype and phenotype. 'Omic technologies such as DNA microarrays were once heralded as providing a window into those networks, but so far their success has been limited, in large part because the high-dimensional data they produce cannot be fully constrained by the limited number of measurements and in part because the data themselves represent only a small part of the complete story. To circumvent these limitations, we have developed methods that combine 'omic data with other sources of information in an effort to leverage, more completely, the compendium of information that we have been able to amass. Here we will present a number of approaches we have developed, including an integrated database that collects clinical, research, and public domain data and synthesizes it to drive discovery, and an application of seeded Bayesian Network analysis to gene expression data that deduces predictive models of network response. Looking forward, we will examine more abstract state-space models that may have the potential to lead us to a more general predictive, theoretical biology.
One domain in which considerable progress has been made in developing genome-scale network models is metabolism, a central tenet of life. This talk will begin with a brief primer on constraint-based modeling of metabolism. I shall then describe the human metabolic model that was published by the Palsson lab in 2007, and proceed to present two recently published studies from my lab: (1) Developing and testing descriptions of the metabolism of specific human tissues, including the brain, heart, liver and kidney, and studying the role of post-transcriptional regulation in determining tissue metabolism (NBT08), and (2) An in silico investigation of Inborn Error Metabolic disorders, generating predictions of metabolic profiles in biofluids for hundreds of these diseases (MSB09). Finally, I shall describe some of our ongoing projects, developing a generic approach for building tissue-specific metabolic models and providing a computational account of metabolic alterations in cancer.
In this talk I will present a graph-theoretical approach to the analysis of the Universe of protein folds and will show that Protein Domain Universe Graphs (PDUG), where nodes represent structural domains and edges represent the degree of structural similarity between them, exhibit unusual scale-free properties: the probability of finding a node connected to other nodes by k edges scales as a power law of k with exponent -1.6. This is in sharp contrast with a "null model" of a random graph, where such dependence is expected to follow a Poisson distribution. A search into the origin of these unusual global properties of the PDUG reveals a "Big Bang" scenario in which the entire Protein Universe evolved from a small number of original genes via duplication and divergence. Further analysis revealed a deep connection between the properties of gene families (their size and relation to other families) and the structural properties that they encode. The PDUG approach makes possible a robust structure-based construction of phylogenetic trees. Further, we present a microscopic, physics-based model of fold discovery and evolution that allows us to visualize and quantitatively explain the Big Bang process, including the exponents of the scale-free PDUG.
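The contrast between a power-law and a Poisson degree distribution can be sketched as follows. This is an illustrative sketch, not the authors' analysis: the function names are introduced here, and a least-squares fit on log-log axes is only a crude exponent estimate (maximum-likelihood fitting is generally preferred for real data).

```python
import math
from collections import Counter


def degree_distribution(edges):
    """Return {k: fraction of nodes with degree k} for an undirected edge list.

    Nodes that appear in no edge (degree 0) are not counted.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    return {k: c / n for k, c in sorted(Counter(degree.values()).items())}


def loglog_slope(pk):
    """Least-squares slope of log P(k) against log k (crude exponent estimate)."""
    pts = [(math.log(k), math.log(p)) for k, p in pk.items() if k > 0 and p > 0]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    num = sum((x - mx) * (y - my) for x, y in pts)
    den = sum((x - mx) ** 2 for x, _ in pts)
    return num / den


def poisson_pmf(k, mean_degree):
    """Degree probability expected for a random-graph null model."""
    return math.exp(-mean_degree) * mean_degree ** k / math.factorial(k)


if __name__ == "__main__":
    # Synthetic, unnormalized power law with the exponent quoted in the abstract.
    power_law = {k: k ** -1.6 for k in range(1, 50)}
    print("fitted slope:", loglog_slope(power_law))
    # A Poisson tail decays much faster than any power law.
    for k in (1, 5, 20):
        print(k, power_law[k], poisson_pmf(k, 2.0))
```

On log-log axes a power law appears as a straight line whose slope is the exponent, whereas the Poisson probabilities fall off faster than any straight line at large k. That difference is what distinguishes the two cases visually.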
Genome sequencing dramatically increased our ability to understand cellular response to perturbation and facilitated the development of cell-wide measurements of cellular biomolecules. Integrating such (transcriptional, proteomic, metabolic and other) measurements with networks of protein-protein interactions and transcription factor binding data has revealed critical insights into cellular behavior. The potential of these systems biology approaches can be significantly enhanced by complementing the above measurements with data of metabolic fluxes. Fluxes are a most informative indicator of cellular physiological state as they describe what the cell does at a particular point in time. In combination with metabolite and transcriptional data they form a powerful set that can be used to generate a much more complete picture of cellular physiology.
In this talk I will summarize methods for the high-resolution determination of metabolic fluxes using stable isotopic labeling methods. I will then show how metabolic fluxes can be applied to identify rate-controlling steps in metabolic networks and thus direct the modulation of metabolism at the genetic level in order to amplify fluxes for the overproduction of fuels and chemicals. In another example, fluxes will be used, along with transcriptional and metabolite data from steady-state yeast cultures, to elucidate the functions of the yeast global regulator Gcn4p. While mRNA expression alone did not directly predict metabolic response, this correlation improved through incorporating a network-based model of amino-acid biosynthesis (from r = 0.07 to 0.80 for mRNA-flux agreement). The model also revealed some general biological principles: rewiring of metabolic flux by transcriptional regulation and metabolite-enzyme interaction density as a key biosynthetic control determinant. These results underline the importance of fluxes as a critical indicator of the state of cellular metabolism and an irreplaceable guide for metabolic engineering.
For complex cellular networks, limited mechanistic knowledge, conflicting hypotheses, and relatively scarce experimental data hamper the development of mathematical models as systems analysis tools. The talk focuses on two approaches for dealing with this combination of complexity and uncertainty. They combine theory development and applications to specific biological examples. Firstly, network reaction stoichiometries are relatively well-characterized and therefore suitable starting points for pathway analysis. Pathway analysis allows one to investigate the space of a (metabolic) network's feasible states. Applications are becoming possible for genome-scale networks, and they range from investigating the effects of network perturbations to predicting cellular control features. Moreover, recent theory extensions connect the approach to systems dynamics, for instance, to identify key mechanisms in cellular decision processes. Secondly, and more mechanistically, we propose to cast hypotheses into a library of dynamic mathematical models, evaluate these against experimental observations, and design pivotal experiments to discriminate between alternatives. For TOR signaling in yeast, this strategy identified key control mechanisms that are quantitatively consistent with all available experimental data, and systematic extension of the approach to larger networks is a current challenge. Overall, the importance of network structures seems to outweigh the fine-tuning of parameters. Structure-oriented analysis of biological systems thus provides challenging theory problems as well as broad perspectives for uncovering the organization and functionality of cellular networks.
Cellular processes are typically controlled by gene regulatory circuits that are comprised of interactions among genes and proteins. However, the functional importance of a particular pattern of interactions (architecture) that constitutes a genetic circuit remains poorly understood. To investigate this problem, we compared the circuit that controls differentiation of Bacillus subtilis cells into the state of competence to a seemingly equivalent engineered counterpart with an alternative architecture. The architectures of the native and synthetic circuits differed primarily in the order of successive activation and repression reactions, but retained the same overall feedback structure. Comparative analysis showed that the reversed order of positive and negative reactions between natural and synthetic circuits gives rise to distinct levels of temporal variability in single-cell dynamics (noise). This noise difference in turn controlled the physiological response range of competence to varying extracellular DNA concentrations. These results demonstrate a noise-mediated tradeoff between temporal precision and physiological reliability that is encoded into the architecture of a cellular differentiation circuit.
For over half a century it has been conjectured that macromolecules form complex networks of functionally interacting components, and that the molecular mechanisms underlying most biological processes correspond to particular steady states adopted by such cellular networks. However, until recently, systems-level theoretical conjectures remained largely unappreciated, mainly because of lack of supporting experimental data.
To generate the information necessary to eventually address how complex cellular networks relate to biology, we initiated, at the scale of the whole proteome, an integrated approach for modeling protein-protein interaction or "interactome" networks. Our main questions are: How are interactome networks organized at the scale of the whole cell? How can we uncover local and global features underlying this organization, and how are interactome networks modified in human disease, such as cancer?