### Teaching Discrete and Algebraic Mathematical Biology to Undergraduates (July 29 - August 2, 2013)

#### Organizers: Raina Robeva (Sweet Briar College), Matt Macauley (Clemson University), Terrell Hodge (Western Michigan University)

Discrete dynamic modeling of signaling networks (part 1 and part 2)
Reka Albert, Penn State University

The use of differential equation based modeling frameworks for intra-and inter-cellular signaling networks is greatly hampered by the sparsity of known kinetic detail for the interactions and processes involved in these networks. As an alternative, discrete dynamic and algebraic methods are gaining acceptance as the basis of successful predictive models of signal transduction, and a tool for inferring regulatory mechanisms. For example, Boolean models have been fruitfully used to model signaling networks related to embryonic development, to plant responses to their environment, and to immunological disorders. The construction of a Boolean model starts with a synthesis of the nodes (components) and edges (interactions) of the signaling network, followed by a distillation of the edges incident on each node into a Boolean regulatory function. The analysis of the model consists of finding its attractors (e.g. steady states), and the basin of attraction of each attractor (the initial states that converge into that attractor). The model can be used to look at "what if" scenarios, to analyze the effects of perturbations (e.g. node disruptions), and thus to predict which nodes are critical for the normal behavior of the network.

Part one of this presentation will review the basics of Boolean modeling, with special attention to models that allow different timescales in the system (i.e. asynchronous models). I will then present an asynchronous Boolean model of the signaling network that governs plants' response to drought conditions. This model synthesizes a large number of independent observations into a coherent system, reproduces known normal and perturbed responses, and predicts the effects of perturbations in network components. Two of these predictions were validated experimentally. Part two of this presentation will present an asynchronous Boolean model of the signaling network that is responsible for the activation induced cell death of T cells (a type of white blood cell). Perturbations of this network were identified as the root cause of the disease T-LGL leukemia, wherein T cells aberrantly survive and then attack normal cells. The model integrates interactions and information on certain components' abnormal state, explains all the observed abnormal states, and predicts manipulations that can abolish the T-LGL survival state. Several of these predictions were validated experimentally. I will finish by presenting two methods for extracting useful predictions from Boolean models of signal transduction networks without extensive simulations.

Developing one day to one week drop-in modules for undergraduate classrooms In Mathematical Biology
Margaret Cozzens, Rutgers University

The highly interdisciplinary nature of the modern day work force is one of the greatest challenges facing high school and college education. Rarely is knowledge of just a single discipline adequate for today’s job market: a reality that is forcing students to explore means to better equip themselves with an understanding of the full gambit of disciplines needed to make an effective and valuable contribution. Attempting to bridge the current gap in the curriculum, between seemingly disparate subjects, The Center for Discrete Mathematics and Theoretical Computer Science (DIMACS) is carrying out a portfolio of projects to improve student engagement in STEM fields and versatility for their transition from education to career. Through three series of curricular modules, DIMACS offers students insights into mathematics, computer sciences, and the sciences in general, and their interrelationships. In particular, one of these projects is the Challenge of Interdisciplinary Mathematics and Biology (IMB), which pioneered the linking of biology and mathematics in high schools. IMB includes twenty individual modules, which can be used in either biology or mathematics classes, as well as a senior-year (online book) on biomathematics, already in use across the country. A second project focuses on sustainability drop-in modules for college and high school classes. In both cases, many of the modules develop and utilize discrete and algebraic tools to solve specific problems at the interface of mathematics and biology. This talk will focus on how one develops curricular modules and tests their use in classrooms. Midge will help you draft an outline for modules to be developed this week, and will be available to assist participants as the week goes on.

Food Webs, Competition Graphs, and Habitat Formation
Margaret Cozzens, Rutgers University

One interesting example of a discrete mathematical model used in biology is a food web. The first biology courses in high school and in college present the fundamental nature of a food web, one that is understandable by students at all levels. But food webs as part of a larger system are often not addressed. This talk presents materials that can be used in undergraduate classes in biology and mathematics which provide students with the opportunity to explore mathematical models of predator-prey relationships, determine trophic levels, dominant species, stability of the ecosystem, competition graphs, interval graphs, and even confront problems that would appear to have logical answers that are as yet unsolved. Fundamental goals are for participants to be able to:

• Recognize various relationships between organisms, and look for patterns in food webs.
• Use graphs and directed graphs to model complex trophic relationships.
• Determine trophic levels and status within a food web, and the significance of these levels in calculating the relative importance of each species (vertices) and each relationship (arcs) in a food web.
• Use the food web and corresponding competition graph to determine the dimension of the community’s habitat.
Fitness landscapes: geometric and discrete approaches
Kristina Crona, University of California, Merced

Fitness landscapes and epistasis are central in evolutionary theory. The surface of a fitness landscape consists of genotypes, the height coordinates represent fitness, and adaptation can be illustrated as an uphill walk in the landscape. Epistasis means that the effect of a mutation depends on background. For instance, a double mutant may have low fitness, even if it combines two beneficial single mutations. Epistasis is relevant for most aspects of evolutionary biology, including recombination, speciation, evolutionary predictability, and maintenance of genetic variation in nature. Epistasis is also of practical importance. One application is antimicrobial drug resistance, where the goal is to predict, prevent and manage drug resistance problems.

The talk describes two recent approaches to epistasis using discrete methods. Fitness graphs are useful for describing coarse properties of a landscape, such as mutational trajectories and the number of peaks. The vertices of the graphs represent genotypes and the arrows point toward the more fit genotype. The geometric theory of gene interactions, on the other hand, is the most fine-scaled approach to epistasis. The theory is relevant for recombination and it depends on triangulations of polytopes, or shapes. In the biallelic case, the genotypes can be represented as vertices in a hypercube. The shape of a fitness landscape is the triangulation of the hypercube induced by the fitness landscape. Shapes and graphs provide complementary information.

BioModel Engineering - a Petri Net Perspective
Monika Heiner, Brandenburg University of Technology, Cottbus, Germany

The talk describes a Petri net-based framework to model and analyse biochemical networks, such as gene regulatory, signal transduction or metabolic networks. The framework comprises the qualitative, stochastic, and continuous modeling paradigms. Each perspective adds its contribution to the understanding of the system.

Recent extensions of the framework focus on multiscale issues. Multiscale modeling at multiple temporal scales is supported by hybrid Petri nets combining the stochastic and continuous modeling paradigms, while multiscale modeling at multiple spatial scales is assisted by colored Petri nets. Colored Petri nets combine (qualitative, stochastic, continuous, hybrid) Petri nets with (finite, discrete) data types, which permits concise representations of regularly structured nets, e.g. to statically encode finite, discrete space, possibly hierarchically structured.

The methodology is supported by a sophisticated toolkit, consisting of (1) Snoopy - tool for modeling and animation/simulation of hierarchical qualitative, stochastic, continuous, and hybrid Petri nets, and their colored counterparts; (2) Charlie - analysis tool of standard Petri net properties; and (3) Marcie - symbolic reachability analysis of qualitative Petri nets, analytical and simulative model checking of stochastic Petri nets.

Graph theory applications in biology
John Jungck, University of Delaware

Graph Theory has enormous applicability to biology. Each of the following graph theoretic concepts will be instantiated with biological applications: (1) ordering problems with One-dimensional Interval Graphs, Maximal Cliques, Transitive Orientability, Adjacency matrices, Incidence matrices; (2) Planar Graphs with Two-dimensional Voronoi Tessellations, Delaunay Triangulations, Lewis, Desch, and Aboav-Weaire Laws, Topological charge; (3) Three-dimensional Polytopes, Polyhedral Graphs, Delaunay tetrahedra, Euler on vertices, edges, and faces, T-numbers, quasi-crystals, Kepler's face-centered cubic packing, voxels; (4) x.y-dimensional Graphs, Fractals; (5) Graph Grammars' Lindermayer Systems; (6) Trees: binary branching trees, minimal spanning trees, Ulam trees, polytomies; (7) Networks: Random, Scale-free, Small World, diameter, connectance, complexity, clustering coefficients, hubs, fragility, robustness, Maximal Cliques and other sub-graphs; (8) Paths and Circuits: Eulerian and Hamiltonian, de Bruijn graphs; and (9) Bipartite Graphs: Matching, Pancaking flipping problem, Hungarian algorithm, Gale-Shapley algorithm & 2012 Nobel Prize. In many cases we have developed software for handling biological data using these graph theoretic techniques: Ka-me: A Voronoi Image Analyzer, javaBENZER, 3D FractaL Tree, Split Decomposition, and BioGrapher. Basically I will argue that many biological problems are best addressed with a discrete perspective and that topological reduction is a good way to investigate complex data sets.

Dynamics of Disease Transmission Networks
Winfried Just, Ohio University

The most basic models of transmission of infectious diseases assume a partition of the host population into a small number of compartments such as S (susceptible), I (infectious), and R (removed) and conceptualize the number or proportion of individuals in each compartment as variables in an ODE system. But disease transmission is inherently a stochastic event that may occur during a contact between a susceptible and an infectious individual. In real population, this probability will be different for different pairs of individuals. While classical compartment-based ODE models have only a limited capacity for dealing with these heterogeneities, it is, at least in principle, possible to build models of disease transmission based on the underlying contact networks.

This contribution will introduce models of disease transmission dynamics on a given contact network and the problem of comparing the predictions of such models with coarser, compartment-based ODE models. In other words, we want to know which network properties most significantly influence the course of an epidemic. These questions are currently gaining prominence in research on disease dynamics. Meaningful numerical explorations are feasible at a level suitable for undergraduate research and will constitute a large part of the module. The topic is also well suited for exploring how the choice of modeling assumptions influences the model's predictions. Moreover, the model will introduce some strategies for collecting data on contact networks and building models of such networks based on limited and somewhat unreliable data.

Algebraic models in systems biology
Reinhard Laubenbacher,University of Connecticut Health Center

Progress in systems biology relies on the use of mathematical and statistical models for system level studies of biological processes. Several different modeling frameworks have been used successfully. This talk will focus on several types of discrete models, and illustrate their construction and use via two case studies, one related to breast cancer, the other to respiratory fungal infections. The common theme connecting them is the role of iron metabolism.

Reduction of discrete models

Due to the size and complexity of biological systems and models, it is sometimes necessary to consider a "core" model that keeps the key dynamical properties of the original model, but is simple enough for analysis. In this talk we will present a systematic way to reduce discrete models while keeping important dynamical features such as steady states. We will focus on Boolean models of biological systems. A similar approach for discrete-time continuous-space models may be discussed.

Combinatorics of RNA secondary structures
Svetlana Poznanovik, Clemson University

Determining the structure of the RNA molecule for a given RNA sequence is an important problem in molecular and computational biology. The molecule structure depends on different types of intra-sequence base pairings. Unfortunately, many aspects of the 3D interactions between the nucleotides are still not well-understood. Since the prediction of the 3D structure is currently out of our reach, a lot of effort has been devoted to the prediction of the secondary structure as an important intermediate step towards the complete solution. Various combinatorial methods and models have been used to compare, classify, and otherwise analyze secondary structures and thereby contribute to our understanding of the base pairing of RNA sequences. This talk will survey some of the graph-theoretic and formal language approaches that are used to analyze secondary structures with emphasis on those accessible to undergraduate students.

Methods to reverse engineer biological networks
Brandilyn Stigler, Mathematics, Southern Methodist University

Gene regulatory networks are ubiquitous in molecular systems biology and contribute to the control of major biological processes including metabolism and development. Given the abundance of gene data sets, the ability to reconstruct the regulatory network underlying the data has become one of the prime objectives in systems biology research. We will introduce various methods for reverse-engineering or inferring the structure of gene regulatory networks. These methods use techniques from computational algebra to build models of polynomial dynamical systems, which provide a rich backdrop within which to perform network analyses. We will build some algebraic models and simulate how one could use them to design biologically meaningful models in an experimental setting.

Algebraic methods for molecular phylogenetics
Ruriko Yoshida, University of Kentucky

Recently there have been much work and collaborations between modern biology and higher mathematics. A number of important connections have been established between computational biology and the emerging field of "algebraic statistics", which applies tools from combinatorics, computational algebra, and polyhedral geometry to statistical computational problems and statistical modeling.

Phylogenetics has provided an abundant source of applications for algebraic statistics, with research areas including phylogenetic invariants, the geometry of tree space, and analysis of Phylogenetic reconstruction.

The purpose of this review is to provide the audience with an introduction to this subject, a non-comprehensive guide to further reading, and a collection of more detailed case studies that provide examples of how algebraic methods have been used in this the context of molecular phylogeny.

#### Posters

Modelling the Tryptophan Operon in E. coli
Jennifer Galovich, Mathematics, St. John's University

This poster introduces a discrete mathematical model of the tryptophan operon in E. coli using a polynomial dynamical system (PDS) and Analysis of Dynamic Algebraic Models (ADAM) for analysis. The goal of this work is to develop a discrete model of a biological system to serve as a starting point for further research by undergraduates.

Synchronous Dynamics of Boolean Models for Three- and Four-Gene Regulatory Networks with Multiple Feedback Loops
Dan Hrozencik, Chicago State University and Tim Comar, Benedictine University

We model the dynamics of a gene regulatory network with a synchronous updating Boolean model. Our study focuses on gene regulatory networks with two feedback circuits, one of which is Hamiltonian, and has one gene X acted upon by two genes in the network. We show how the structure of the state transition graph is determined by logical gate at the gene X. In particular, if the logical gate is XOR, the state transition graph consists of fixed points and state cycles, and if the logical gate is OR or AND, there is at least one state that does not have a predecessor in the state transition graph.

A Hybrid Method of Multiple Linear Regression and Population PK for the Improvement of AUC Estimation Using Limited Sampling Strategies
Leila Kheibarshekan, Mathematics, St. John's University

Pharmacometrics have been used to assist clinical protocol design and evaluate drug therapy performance. In a mathematical way, it gives a clear numerical relationship between drug intake, exposure and clinical outcomes, which are usually expressed through surrogates such as the area under the concentration-time curve (AUC). AUC generally represents the amount of drug reaching the bloodstream for a dose. Thus it is an important index to estimate the bioavailability, the percentage absorbed following the drug administration. The reliability of AUC estimation highly depends on effective blood samplings. Because of the inconvenience of dense sampling for pediatric patients, financial costs and labor to clinical staffs, limited sampling strategies (LSS) have been proposed. LSS searches not only the most effective sampling times (as few as possible, usually<4) for accuracy but also inexpensive method for the estimation.

Usual approaches for the estimation of AUC are classified as compartmental approaches or non-compartmental ones. Population pharmacokinetic (pop-PK) methods use the compartmental approaches and require data fittings for the PK model, whereas non-compartmental methods only use measured concentrations, with a predetermined formula for the calculation. Although non-compartmental methods are easy to manipulate, their lack of considerations for the non-linear nature and variability exhibited by biopharmaceutical processes often induces unstable results with completely misleading conclusions. On the other hand, compartmental approaches are rather difficult to manipulate and can be computationally expensive.

We propose here a hybrid method taking advantages of both multiple linear regression (MLR) and pop-PK. This new-fashioned multiple linear regression (hybrid MLR-PopPK) aims a more realistic target compared to the classical multiple-linear regression (MLR) method in LSS. In classical MLRs, the predictors are measured concentrations and the prediction is the observed AUC, which is the AUC of the observed concentrations. Thus, LSS tries to find those limited sampling times (best design, D*DV) that give rise to the best approximation of the observed AUC. Since the observed concentrations are in fact not the actual concentrations, which may better reflect the clinical indication, these results of LSS can be distorted and unreliable. In the hybrid MLR-PopPK, the target of prediction is changed to the AUC of actual concentrations. To validate our propose, NONMEM7 is used to estimate the actual full concentration-time curve (IPRED) from measured concentrations (DV). The estimation method implemented in NONMEM enables us to estimate the actual AUC as accurate as possible. The model is then fitted with these simulated data such that the difference between actual and predicted AUC is minimized. This approach is then repeated for different designs in the second step to find those limited sampling times (best design, D*NONMEM) which give rise to the best approximation of actual AUC. Using this new-fashioned MLR we could even improve the estimation of AUC by 7 percent or more compared to the classical MLR for one-compartment kinetic model with first order absorption.

Fireflies, finches, and digraphs
Drew LaMar, Mathematical Biology, College of William and Mary

Connections will be drawn between how some species of fireflies synchronize their flashing, how to test if species of finches on an archipelago are in competition with each other, and a class of directed graphs called split digraphs.

Mean-Field Boolean Network Model of a Signal Transduction Network
Dora Matache, Mathematics, University of Nebraska, Ohmaha

We provide a mean-field Boolean network model for a signal transduction network of a generic fibroblast cell. The network consists of several main signaling pathways, including the receptor tyrosine kinase (RTK), the G-protein coupled receptor, and the Integrin signaling pathway. The network contains 130 nodes, each of which representing a signaling molecule (mainly proteins). Nodes are governed by Boolean dynamics including canalizing functions as well as totalistic Boolean functions that depend only on the overall fraction of active nodes. We categorize the Boolean functions into several different classes. Using a mean-field approach that assumes a large enough network, we can ignore local correlations and generate a mathematical formula for the probability of a node becoming active at any time step. The model is shown to be a good match for the actual network. This is done by iterating both the actual network and the model and compare the results numerically. The model can be used to predict the network behavior under various parameter combinations, and to explore the robustness of the network to node mutations.

On the Sensitivity to Noise of a Boolean Function
Dora Matache, Mathematics, University of Nebraska, Ohmaha

We generate upper and lower bounds for the sensitivity to noise of a Boolean function. The robustness of a Boolean network to noisy inputs is related to the average sensitivity of that function, which measures how sensitive to changes in the inputs the output of the function is. We give an exact formula relating the sensitivity to noise and the average sensitivity of a Boolean function. It is observed that, for certain parameter combinations, the upper estimates in this paper are sharper than other estimates in the literature and that the lower estimates are very close to the actual values of the sensitivity to noise of the selected Boolean functions.

Graph theory in DNA self-assembly: an early entry point for interdisciplinary student research
Greta Pangborn, Computer Science, Saint Michael's College

For several years we have run an undergraduate research program, both summers and AY, using graph theoretical techniques to provide design strategies for DNA self-assembly. A working team frequently includes students from mathematics, computer science, and biology, as the project genuinely needs expertise from these different fields, thus requiring students from these different disciplines to use their backgrounds collaboratively. Furthermore, we accept students quite early in their careers, often the summer between their first and second year. We will highlight some of our students' results.

Machine Learning for Phylogenetic Invariants
Joe Rusinko, Mathematics, Winthrop University

We build on Eriksson and Yao's work, which uses machine learning to optimize the power of phylogenetic invariants to reconstruct evolutionary trees. To do this, we solve a semidefinite programming problem, using GNU Octave and CSDP, a solver for semidefinite programming problems. Our method includes inequalities arising from the study of the real points on the phylogenetic variety into the metric learning algorithm. Previous work focused on selecting a good set of invariants for the construction of quartet trees, we extend this work to more taxa by testing the accuracy on twelve taxa trees.