CTW: Recent Advances in Statistical Inference for Mathematical Biology

(February 20, 2012 - February 24, 2012)

Organizers


Mark Girolami
Statistical Science,
Giles Hooker
Biological Statistics and Computational Biology, Cornell University
Theodore Kypraios
Mathematical Sciences, University of Nottingham
Simon Preston
Mathematical Sciences, University of Nottingham

The era of quantitative biology, and abundant data, calls for theoreticians and experimentalists to address a fundamental scientific question: how can we learn as much as possible about the biological system we are studying, and make justified inferential statements about it, by combining theoretical models and experimental data? Related questions include: how to identify model parameters or test hypotheses given experimental data; how to evaluate model adequacy and inform model refinement; how to choose amongst a set of candidate models; and how to determine optimal experimental design to maximise information in the data.

Exciting progress is being made on these challenging issues. This workshop will bring together foremost researchers from the fields of biology, applied mathematics, statistics, and computer science to discuss recent advances in statistical inference for mathematical biology.

Accepted Speakers

Douglas Bates
Statistics, University of Wisconsin
Richard Boys
School of Mathematics & Statistics, Newcastle University
Nicolas Brunel
departement de mathematiques, University of Chicago
Nigel Burroughs
Systems Biology Centre,
Ben Calderhead
Statistical Science, University College London
David Campbell
Statistics and Actuarial Science,
Carson Chow
NIH/NIDDK/LBM
Vanja Dukic
Applied Mathematics, University of Colorado
Barbel Finkenstadt
Statistics,
Colin Gillespie
School of Mathematics & Statistics, Newcastle University
Andrew Golightly
School of Mathematics & Statistics, Newcastle University
John Guckenheimer
Department of Mathematics, Cornell University
Ed Ionides
Statistics, University of Michigan
Clemens Kreutz
Institute of Physics, University of Freiburg
Subhash Lele
Mathematical and Statistical Sciences, University of Alberta
Christina Leslie
Computational Biology Program, Memorial Sloan-Kettering Cancer Center
Sofia Olhede
Department of Statistical Science,
Dennis Prangle
Mathematics and Statistics, Lancaster University
Dov Stekel
School of Biosciences,
Joe Tien
Department of Mathematics, The Ohio State University
Tina Toni
Department of Biological Engineering, Massachusetts Institute of Technology
Mark Transtrum
Bioinformatics and Computational Biology, University of Texas M. D. Anderson Cancer Center
Hulin Wu
Biostatistics and Computational Biology, University of Rochester
Monday, February 20, 2012
Time Session
09:00 AM - 09:50 AM
Douglas Bates - Inference in Mixed-Effects (and other) Models Through Profiling the Objective
Douglas Bates, Department of Statistics, University of Wisconsin-Madison
Presentation (slides version): http://mbi.osu.edu/2011/rasmaterials/ProfilingD.pdf
Presentation (notes version): http://mbi.osu.edu/2011/rasmaterials/ProfilingN.pdf

The use of Markov chain Monte Carlo methods for Bayesian inference has increased awareness of the need to view the posterior distribution of the parameter (in the Bayesian sense), or the distribution of the parameter estimator (for those who prefer non-Bayesian techniques). I will concentrate on non-Bayesian inference, although the techniques can also be applied to the posterior density in Bayesian methods. For many statistical models, including linear and generalized linear mixed-effects models, parameter estimates are defined as the optimizer of an objective function (e.g., the MLEs maximize the log-likelihood), and inference is based upon the location of the optimizer and a local approximation at the optimizer, without assessing the validity of the approximation. This made sense when fitting a single model could involve waiting many days for answers from shared computer systems. It doesn't make sense when models can be fit in a few seconds. By repeatedly fitting a model while holding a particular parameter fixed, we can build up a profile of the objective with respect to that parameter and use the information to produce profile-based confidence intervals. But perhaps the most important aspect of the technique is the graphical presentation of the results, which forces us to consider the behavior of the estimator beyond the estimate and can cast doubt on many of the principles of inference and simulation that we hold dear.
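The profiling idea can be illustrated with a minimal sketch. It assumes a simple i.i.d. normal model with hypothetical data rather than a mixed-effects model, so that the inner refit of sigma with mu held fixed has a closed form; the function names are illustrative and not taken from any particular package. The interval is read off where the signed square root of the profile deviance (here called `zeta`) crosses the normal quantile on each side of the estimate.

```python
import math
from statistics import NormalDist

def profile_deviance(mu, xs):
    """Profile deviance for the mean mu of an i.i.d. normal sample.

    With mu held fixed, the MLE of sigma^2 is mean((x - mu)^2), so the
    "refit" step has a closed form in this toy model.
    """
    n = len(xs)
    xbar = sum(xs) / n
    s2_hat = sum((x - xbar) ** 2 for x in xs) / n   # unconstrained MLE of sigma^2
    s2_mu = sum((x - mu) ** 2 for x in xs) / n      # MLE of sigma^2 with mu fixed
    return n * math.log(s2_mu / s2_hat)             # 2 * (max log-lik - profiled log-lik)

def zeta(mu, xs):
    """Signed square root of the profile deviance."""
    xbar = sum(xs) / len(xs)
    return math.copysign(math.sqrt(max(profile_deviance(mu, xs), 0.0)), mu - xbar)

def profile_interval(xs, level=0.95, tol=1e-10):
    """Find where |zeta| reaches the normal quantile on each side of the MLE."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    n = len(xs)
    xbar = sum(xs) / n
    scale = math.sqrt(sum((x - xbar) ** 2 for x in xs) / n)
    ends = []
    for direction in (-1.0, 1.0):
        lo, hi = 0.0, scale
        while abs(zeta(xbar + direction * hi, xs)) < z:   # expand the bracket
            hi *= 2.0
        while hi - lo > tol:                              # bisect on distance from the MLE
            mid = 0.5 * (lo + hi)
            if abs(zeta(xbar + direction * mid, xs)) < z:
                lo = mid
            else:
                hi = mid
        ends.append(xbar + direction * 0.5 * (lo + hi))
    return ends[0], ends[1]

xs = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.6, 4.4]  # hypothetical data
lower, upper = profile_interval(xs)
print(f"95% profile interval for mu: ({lower:.3f}, {upper:.3f})")
```

Plotting `zeta` against `mu` over a grid gives the kind of graphical check the abstract emphasises: a straight line indicates that the local quadratic approximation is adequate, while curvature indicates that it is not.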
09:50 AM - 10:40 AM
Ed Ionides - Inference for partially observed stochastic dynamic systems
Ed Ionides, Statistics, University of Michigan
Presentation: http://mbi.osu.edu/2011/rasmaterials/mbi12_ionides.pdf

Characteristic features of biological dynamic systems include stochasticity, nonlinearity, measurement error, unobserved variables, unknown system parameters, and even unknown system mechanisms. I will consider the resulting inferential challenges, with particular reference to pathogen/host systems (i.e., disease transmission). I will focus on statistical inference methodology which is based on simulations from a numerical model; such methodology is said to have the plug-and-play property. Plug-and-play methodology frees the modeler from an obligation to work with models for which transition probabilities are analytically tractable. A recent advance in plug-and-play likelihood-based inference for general partially observed Markov process models has been provided by the iterated filtering algorithm. I will discuss the theory and practice of iterated filtering.
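To make the plug-and-play property concrete, here is a minimal sketch of a bootstrap particle filter, the simulation-based likelihood estimator that iterated filtering builds on; iterated filtering itself, which also perturbs and updates the parameters, is not shown. The latent process is accessed only through the simulator functions `init` and `step`, never through a transition density. The linear-Gaussian model and its parameter values are hypothetical, chosen so that the exact Kalman-filter log-likelihood is available for comparison.

```python
import math
import random

def bootstrap_pf_loglik(ys, init, step, obs_logdens, n_particles, rng):
    """Particle-filter estimate of log p(y_1:T).

    The latent process enters only through `init` and `step`, which simulate;
    their transition densities are never evaluated (plug-and-play).
    """
    particles = [init(rng) for _ in range(n_particles)]
    loglik = 0.0
    for y in ys:
        particles = [step(x, rng) for x in particles]          # simulate forward
        logw = [obs_logdens(y, x) for x in particles]          # weight by measurement model
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]                  # numerically stabilised weights
        loglik += m + math.log(sum(w) / n_particles)           # log of the mean weight
        particles = rng.choices(particles, weights=w, k=n_particles)  # multinomial resampling
    return loglik

# Hypothetical test model: x_t = A x_{t-1} + N(0, Q), y_t = x_t + N(0, R), x_0 ~ N(0, P0)
A, Q, R, P0 = 0.8, 0.5, 1.0, 1.0

def init(rng):
    return rng.gauss(0.0, math.sqrt(P0))

def step(x, rng):
    return A * x + rng.gauss(0.0, math.sqrt(Q))

def obs_logdens(y, x):
    return -0.5 * (math.log(2 * math.pi * R) + (y - x) ** 2 / R)

def kalman_loglik(ys):
    """Exact log-likelihood of the same linear-Gaussian model, for comparison."""
    m, p, ll = 0.0, P0, 0.0
    for y in ys:
        m, p = A * m, A * A * p + Q          # predict
        s = p + R                            # innovation variance
        ll += -0.5 * (math.log(2 * math.pi * s) + (y - m) ** 2 / s)
        k = p / s                            # Kalman gain
        m, p = m + k * (y - m), (1 - k) * p  # update
    return ll

rng = random.Random(1)
x, ys = init(rng), []
for _ in range(30):                          # simulate hypothetical observations
    x = step(x, rng)
    ys.append(x + rng.gauss(0.0, math.sqrt(R)))

pf = bootstrap_pf_loglik(ys, init, step, obs_logdens, 2000, rng)
kf = kalman_loglik(ys)
print(f"particle filter: {pf:.3f}   exact (Kalman): {kf:.3f}")
```

Swapping `step` for any other simulator (for example, a stochastic epidemic model) requires no change to the filter, which is the point of the plug-and-play approach.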
11:10 AM - 12:00 PM
Nicolas Brunel - Estimation of ordinary differential equations with orthogonal conditions
Parameter inference for ordinary differential equations from noisy data can be seen as a nonlinear regression problem in a parametric setting. Classical statistical methods such as Nonlinear Least Squares (NLS) give rise to difficult and heavy optimization problems, because the corresponding inverse problem is ill-posed. Gradient Matching algorithms use a smooth (nonparametric) estimate of the solution, from which a nonparametric estimate of the derivative is derived; this gives rise to a natural criterion that is easier to optimize than NLS. We introduce here a new class of criteria based on a weak formulation of the ODE. The resulting estimator can be viewed as a generalized moment estimator, which has good statistical and computational properties. Finally, we consider several examples that illustrate the efficiency and versatility of the proposed method.
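The weak-formulation idea can be sketched for a single scalar ODE. This is a simplified illustration under stated assumptions, not the estimator from the talk: the model x' = -theta x, the sine test functions, the trapezoidal quadrature and the synthetic data are all hypothetical choices. Multiplying the ODE by a test function phi that vanishes at both ends and integrating by parts moves the derivative onto phi, giving integral(phi' x) = theta * integral(phi x), so the data never need to be differentiated.

```python
import math
import random

def trapezoid(values, ts):
    """Trapezoidal rule for samples on a grid."""
    return sum(0.5 * (values[i] + values[i + 1]) * (ts[i + 1] - ts[i])
               for i in range(len(ts) - 1))

def weak_form_estimate(ts, ys, n_test=3):
    """Estimate theta in x' = -theta * x without differentiating the data.

    For phi(t) = sin(k*pi*(t - t0)/T), which vanishes at both ends, integration
    by parts gives integral(phi' x) = theta * integral(phi x): one linear
    moment equation a_k = theta * b_k per test function.
    """
    t0, span = ts[0], ts[-1] - ts[0]
    a, b = [], []
    for k in range(1, n_test + 1):
        w = k * math.pi / span
        phi = [math.sin(w * (t - t0)) for t in ts]
        dphi = [w * math.cos(w * (t - t0)) for t in ts]
        a.append(trapezoid([d * y for d, y in zip(dphi, ys)], ts))
        b.append(trapezoid([p * y for p, y in zip(phi, ys)], ts))
    # Least-squares solution of a_k = theta * b_k over the test functions.
    return sum(ak * bk for ak, bk in zip(a, b)) / sum(bk * bk for bk in b)

theta_true, x0 = 0.7, 2.0                       # hypothetical true values
ts = [0.01 * i for i in range(501)]             # grid on [0, 5]
clean = [x0 * math.exp(-theta_true * t) for t in ts]
rng = random.Random(0)
noisy = [x + rng.gauss(0.0, 0.05) for x in clean]
print(weak_form_estimate(ts, clean), weak_form_estimate(ts, noisy))
```

Because the criterion is linear in theta here, the estimate is a closed-form ratio; for nonlinear right-hand sides the same moment equations would be solved numerically.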
02:00 PM - 02:50 PM
Sofia Olhede - Temporal inhomogeneity and dependence of brain networks
Electroencephalography (EEG) measurements yield information about the electrical activity of the brain via measured voltage fluctuations. Measurements are normally made at several locations on the scalp, and the network of activity is inferred by analysing multiple time series that are neither linear nor stationary. We shall discuss how modern time series analysis methods can be adapted and extended to model, and make inferences about, the complicated joint time-varying properties of such observations.

This is joint work with Maria Fitzgerald (UCL) and Hernando Ombao (Brown).
03:20 PM - 04:10 PM
Nigel Burroughs - Verification of a biophysical surface protein patternation model: MCMC analysis of dual fluorescence data
In a number of biological systems, a complex spatio-temporal orchestration of protein relocation is observed at cell:cell contact interfaces, predominantly a separation of small receptors from large ones, i.e., a segregation by size. This patterned structure is called the immunological synapse, and it occurs predominantly between cells of the immune system. Such structures can be realised in model experimental systems in 2D, or visualised in 3D z-stacks. A segregation mechanism that explains patternation by a minimisation of the free energy, using a model of bond stretching, has been extensively examined theoretically within a number of modelling frameworks; all these models agree that such a mechanism is able to explain the observed patterns within certain parameter regimes. However, it has not yet been shown that the system is actually in the parameter regime conducive to separation. We use Bayesian inference to fit a model of fluorescence intensity to single-cell data, extracting key parameters of the patternation, specifically estimates of local free energies for proteins. By quantifying degrees of protein exclusion, we determine the degree to which the patterns are consistent with the size-dependent thermodynamic model of segregation.
Tuesday, February 21, 2012
Time Session
09:00 AM - 09:50 AM
Barbel Finkenstadt - Modeling and inference for gene expression time series data (an overview)
A central challenge in computational modeling of dynamic biological systems is parameter inference from experimental time course measurements. Here we present an overview of modeling approaches based on stochastic population dynamic models and their approximations. For an application on the mesoscopic scale, we present a two-dimensional continuous-time Bayesian hierarchical diffusion model which has the potential to address the different sources of variability that are relevant to the stochastic modelling of transcriptional and translational processes at the molecular level, namely: intrinsic noise due to the stochastic nature of the birth and death processes involved in chemical reactions; extrinsic noise arising from cell-to-cell variation of the kinetic parameters associated with these processes; and noise associated with the measurement process. Inference is complicated by the fact that only the protein, and rarely other molecular species, is observed, which typically entails problems of parameter identification in dynamical systems.

For an application on the macroscopic scale, we introduce a mechanistic 'switch' model for encoding a continuous transcriptional profile of genes over time, with the aim of identifying the timing properties of mRNA synthesis. Synthesis is assumed to switch between periods of transcriptional activity and inactivity, each switch leading to a transition to a new steady state, while mRNA degradation is an ongoing linear process. The model is rich enough to capture a wide variety of expression behaviours, including periodic genes. Finally, I will also give a brief introduction to some recent work on inferring the periodicity of the expression of circadian and other oscillating genes.

Joint work with: Maria Costa, Dan Woodcock, Dafyd Jenkins, David Rand (all Warwick Systems Biology), Michal Komorowski (Imperial College London).
09:50 AM - 10:40 AM
Dov Stekel - Inferring the gap between mechanism and phenotype in dynamical models of gene regulation
Dynamical (differential equation) models in molecular biology are often cast in terms of biological mechanisms such as transcription, translation and protein-protein and protein-DNA interactions. However, most molecular biological measurements are at the phenotypic level, such as levels of gene or protein expression in wild type and chemically or genetically perturbed systems. Mechanistic parameters are often difficult or impossible to measure. We have been combining dynamical models with statistical inference as a means to integrate phenotypic data with mechanistic hypotheses. In doing so, we are able to identify key parameters that determine system behaviour, as well as parameters for which there is insufficient evidence for estimation, and thus make informed predictions for further experimental work. We are also able to use inferred parameters to build stochastic and multi-scale models to investigate behaviour at the single-cell level. We apply these ideas to two systems in microbiology: global gene regulation in the antibiotic-resistance-bearing RK2 plasmids, and zinc uptake and efflux regulation in Escherichia coli.
11:10 AM - 12:00 PM
Hulin Wu - Statistical Methods for High-Dimensional ODE Models for Dynamic Gene Regulatory Networks
Gene regulation is a complicated process. The interaction of many genes and their products forms an intricate biological network. Identification of this dynamic network will help us understand the biological process in a systematic way. However, the construction of such a dynamic network is very challenging for a high-dimensional system. We propose to use a set of ordinary differential equations (ODE), coupled with dimensional reduction by clustering and mixed-effects modeling techniques, to model the dynamic gene regulatory network (GRN). The ODE models allow us to quantify both positive and negative gene regulations as well as feedback effects of one set of genes in a functional module on the dynamic expression changes of the genes in another functional module, which results in a directed graph network. A five-step procedure, Clustering, Smoothing, regulation Identification, parameter Estimates refining and Function enrichment analysis (CSIEF) is developed to identify the ODE-based dynamic GRN. In the proposed CSIEF procedure, a series of cutting-edge statistical methods and techniques are employed. We apply the proposed method to identify the dynamic GRN for yeast cell cycle progression data and immune response to influenza infection. We are able to annotate the identified modules through function enrichment analyses. Some interesting biological findings are discussed. The proposed procedure is a promising tool for constructing a general dynamic GRN and more complicated dynamic networks.
02:00 PM - 02:50 PM
Mark Transtrum - Sloppy Models, Information Geometry, and Data Fitting
Parameter estimation by nonlinear least squares minimization is a ubiquitous problem that has an elegant geometric interpretation: all possible parameter values induce a manifold embedded within the space of data. The minimization problem is then to find the point on the manifold closest to the data. By interpreting nonlinear models as a generalized interpolation scheme, we find that the manifolds of many models, known as sloppy models, have boundaries and that their widths form a hierarchy. We describe this universal structure as a hyper-ribbon. The hyper-ribbon structure explains many of the difficulties associated with fitting nonlinear models and suggests improvements to standard algorithms. We add a "geodesic acceleration" correction to the standard Levenberg-Marquardt algorithm and observe a dramatic increase in success rate and convergence speed on many fitting problems.
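A minimal sketch of a Levenberg-Marquardt iteration with a geodesic-acceleration correction, for a hypothetical two-parameter exponential-decay model. The velocity v is the usual damped Gauss-Newton step; the acceleration a is computed from a finite-difference second directional derivative of the residuals along v, and the proposed step is v + a/2. The identity damping matrix, the finite-difference step h = 0.1, the acceptance ratio alpha = 0.75 and the lambda update factors are illustrative assumptions, not necessarily the settings used in the talk.

```python
import math

def residuals(theta, ts, ys):
    """Residuals of the hypothetical model y = amp * exp(-rate * t)."""
    amp, rate = theta
    return [amp * math.exp(-rate * t) - y for t, y in zip(ts, ys)]

def jacobian(theta, ts):
    """Rows are d(residual_i)/d(amp, rate)."""
    amp, rate = theta
    return [[math.exp(-rate * t), -amp * t * math.exp(-rate * t)] for t in ts]

def solve2(m, g):
    """Solve the 2x2 linear system m @ x = g."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(m[1][1] * g[0] - m[0][1] * g[1]) / det,
            (m[0][0] * g[1] - m[1][0] * g[0]) / det]

def lm_geodesic(theta, ts, ys, lam=1e-2, h=0.1, alpha=0.75, max_iter=200):
    """Levenberg-Marquardt with a geodesic-acceleration correction (sketch)."""
    def cost(th):
        return 0.5 * sum(r * r for r in residuals(th, ts, ys))

    for _ in range(max_iter):
        r = residuals(theta, ts, ys)
        J = jacobian(theta, ts)
        JtJ = [[sum(row[i] * row[j] for row in J) for j in range(2)] for i in range(2)]
        Jtr = [sum(row[i] * ri for row, ri in zip(J, r)) for i in range(2)]
        M = [[JtJ[0][0] + lam, JtJ[0][1]], [JtJ[1][0], JtJ[1][1] + lam]]  # damped normal matrix
        v = [-x for x in solve2(M, Jtr)]                  # velocity: damped Gauss-Newton step
        if math.hypot(*v) < 1e-12 * (1.0 + math.hypot(*theta)):
            break
        # Second directional derivative of the residuals along v (finite difference).
        r_h = residuals([theta[0] + h * v[0], theta[1] + h * v[1]], ts, ys)
        Jv = [row[0] * v[0] + row[1] * v[1] for row in J]
        r_vv = [(2.0 / h) * ((rh - ri) / h - jv) for rh, ri, jv in zip(r_h, r, Jv)]
        Jt_rvv = [sum(row[i] * rv for row, rv in zip(J, r_vv)) for i in range(2)]
        a = [-x for x in solve2(M, Jt_rvv)]               # geodesic acceleration
        trial = [theta[0] + v[0] + 0.5 * a[0], theta[1] + v[1] + 0.5 * a[1]]
        # Accept only if the acceleration is small relative to the velocity and the cost drops.
        if 2.0 * math.hypot(*a) <= alpha * math.hypot(*v) and cost(trial) < cost(theta):
            theta, lam = trial, lam / 3.0
        else:
            lam *= 2.0
    return theta

ts = [0.5 * i for i in range(21)]
ys = [3.0 * math.exp(-0.5 * t) for t in ts]     # hypothetical noiseless data
fit = lm_geodesic([2.0, 1.0], ts, ys)
print(fit)
```

The acceptance test on the ratio of acceleration to velocity rejects steps where the second-order correction is large, i.e. where the local quadratic picture of the model manifold is unreliable.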
02:50 PM - 03:40 PM
Ben Calderhead - Riemannian Manifolds and Statistical Models: The use of Differential Geometry in Probabilistic Modelling
As mathematical models of biological systems increase in size and complexity, so too does the need for sophisticated statistical methodology that can consistently evaluate and update the evidence in favour of each model, given the limited and highly variable experimental data that is often available. A probabilistic approach based on Bayesian statistics offers a mathematically consistent way of characterising this uncertainty, both within the parameters and the models themselves.

There are many computational challenges associated with a Bayesian approach to model ranking and inference. The procedure is equivalent to evaluating high-dimensional integrals generally involving highly nonlinear functions. In more than three or four dimensions deterministic approaches are no longer feasible and we may resort to stochastic integration using simulation-based Monte Carlo techniques. The main challenge often lies in drawing samples from analytically intractable posterior probability distributions, which exhibit strong nonlinear correlation structures, high dimensionality and non-identifiability of parameters.

In this talk I will discuss how Markov chain Monte Carlo methodology based on the natural differential geometric structure of the parameter space of a statistical model may alleviate many of the simulation issues by making proposals based on local sensitivity information. Along the way, I shall highlight the deep link between the sensitivity analysis of such statistical models and the underlying Riemannian geometry of the induced posterior probability distributions. I shall demonstrate how this methodology may be used to rank two differential equation models describing circadian rhythms in the plant Arabidopsis thaliana.
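As a minimal one-dimensional sketch of a geometry-aware proposal, the following implements a simplified manifold MALA step: a Langevin proposal whose drift and variance are scaled by the inverse of a position-dependent metric, here the expected Fisher information n/lam of a Poisson likelihood. The Poisson-Gamma model, prior and data are hypothetical, chosen because the exact posterior mean is known; the full Riemann manifold methods discussed in the talk (including the metric-derivative terms and Hamiltonian variants) are not shown.

```python
import math
import random

ys = [2, 4, 3, 1, 5, 3, 2, 4, 3, 3]   # hypothetical Poisson counts
alpha0, beta0 = 1.0, 1.0             # hypothetical Gamma(shape, rate) prior
n, total = len(ys), sum(ys)

def log_post(lam):
    """Unnormalised log posterior of the Poisson rate (a Gamma kernel)."""
    if lam <= 0.0:
        return -math.inf
    return (alpha0 - 1 + total) * math.log(lam) - (beta0 + n) * lam

def grad_log_post(lam):
    return (alpha0 - 1 + total) / lam - (beta0 + n)

def metric(lam):
    """Expected Fisher information of the Poisson likelihood, used as the metric G."""
    return n / lam

def log_q(to, frm, eps):
    """Log density of the Langevin proposal from `frm`, scaled by G(frm)^-1."""
    g = metric(frm)
    mean = frm + 0.5 * eps ** 2 * grad_log_post(frm) / g
    var = eps ** 2 / g
    return -0.5 * (math.log(2 * math.pi * var) + (to - mean) ** 2 / var)

def manifold_mala(lam, eps, n_iter, rng):
    samples, accepted = [], 0
    for _ in range(n_iter):
        g = metric(lam)
        prop = lam + 0.5 * eps ** 2 * grad_log_post(lam) / g + eps * rng.gauss(0.0, 1.0) / math.sqrt(g)
        if prop > 0.0:  # proposals outside the support are rejected
            # Metropolis-Hastings correction for the asymmetric, position-dependent proposal.
            log_ratio = (log_post(prop) + log_q(lam, prop, eps)
                         - log_post(lam) - log_q(prop, lam, eps))
            if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
                lam, accepted = prop, accepted + 1
        samples.append(lam)
    return samples, accepted / n_iter

rng = random.Random(42)
samples, acc_rate = manifold_mala(1.0, 1.0, 20000, rng)
kept = samples[1000:]                                       # discard burn-in
post_mean = sum(kept) / len(kept)
print(post_mean, (alpha0 + total) / (beta0 + n), acc_rate)  # MCMC vs exact posterior mean
```

Because the proposal variance eps^2 * lam / n tracks the local scale of the posterior, a single step size works across the parameter space; with a constant-variance proposal, the step size would have to be tuned to one region.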
Wednesday, February 22, 2012
Time Session
09:00 AM - 09:50 AM
John Guckenheimer - Dynamical Models of Periodic Processes
Normal forms of dynamical systems can be used as nonlinear models for time series data. This talk describes the use of Floquet theory as a normal form for a stable periodic orbit. Primary goals of this joint research with Shai Revzen are to estimate the spectrum of Lyapunov exponents and to develop reduced order models based upon weakly stable modes of the system. The methods have been applied to animal locomotion, notably data from running cockroaches.
09:50 AM - 10:40 AM
Subhash Lele - Errors in variables models: Diagnosing parameter estimability and MCMC convergence using empirical characteristic functions
Most ecological models are constructed to understand the relationship between environmental variables and an ecological response, be it site occupancy, population abundance, or changes in them. The usual regression models take into account the environmental variation in the response, but in many cases the environmental variables themselves are measured with error. This is called an errors-in-variables model. Measurement error in the covariates leads to substantial issues with parameter estimability, and likelihood-based inference is computationally challenging. Bayesian inference using Markov chain Monte Carlo methods also runs into trouble because of convergence issues with the MCMC algorithm. These convergence issues are especially severe with non-informative priors.

Errors-in-variables models, linear and non-linear, can be formulated as hierarchical models. Data cloning is a recently developed computational technique for conducting likelihood-based analysis of general hierarchical models. In this paper, we show that data cloning coupled with informative priors can circumvent the convergence issues with MCMC. We develop a new testing procedure to compare multivariate distributions using empirical characteristic functions and show its usefulness in diagnosing convergence of MCMC algorithms in these tricky situations. More importantly, we show that data cloning not only facilitates parameter estimation but also helps diagnose which parameters are estimable and which are not. This is essential for drawing scientifically meaningful inferences. We illustrate the method using various linear and non-linear regression models useful in ecology. We report a somewhat surprising result: a widely used population dynamics model, the Hassell model, is non-identifiable, but a closely related Generalized Beverton-Holt model is identifiable.

Work done in collaboration with Khurram Nadeem.
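The core of data cloning can be sketched on a deliberately non-identifiable toy model. Cloning the data K times multiplies the log-likelihood by K; for an estimable parameter the resulting posterior variance shrinks roughly like 1/K, while for a non-estimable one it levels off. The model y_i ~ Poisson(a + b), the Exponential(1) priors and the data are hypothetical, and a brute-force grid stands in for the MCMC sampler that data cloning would normally use; the empirical characteristic function test from the talk is not shown.

```python
import math

# Hypothetical data and model: y_i ~ Poisson(a + b), independent Exponential(1) priors on a, b.
# Only the sum s = a + b is identifiable from y.
ys = [2, 4, 3, 1, 5, 3, 2, 4, 3, 3]
n, total = len(ys), sum(ys)

def cloned_posterior_variances(k_clones, h=0.04, n_grid=200):
    """Posterior variances of s = a + b and of a when the data are cloned k times.

    Cloning multiplies the log-likelihood by k. A grid over (a, b) stands in
    for the MCMC sampler that data cloning would normally use.
    """
    grid = [(i + 0.5) * h for i in range(n_grid)]
    logp, pts = [], []
    for a in grid:
        for b in grid:
            s = a + b
            loglik = total * math.log(s) - n * s     # Poisson log-likelihood up to a constant
            logp.append(k_clones * loglik - a - b)   # cloned likelihood times Exp(1) priors
            pts.append((a, s))
    m = max(logp)
    w = [math.exp(lp - m) for lp in logp]
    z = sum(w)

    def var(idx):
        mean = sum(wi * p[idx] for wi, p in zip(w, pts)) / z
        return sum(wi * (p[idx] - mean) ** 2 for wi, p in zip(w, pts)) / z

    return var(1), var(0)   # (variance of a + b, variance of a)

results = {k: cloned_posterior_variances(k) for k in (1, 10, 100)}
for k, (var_s, var_a) in results.items():
    print(f"K={k:3d}  var(a+b)={var_s:.4f}  var(a)={var_a:.4f}")
```

Here only a + b is estimable: its cloned posterior variance should fall roughly like 1/K (with K times the variance approaching the inverse Fisher information, mean(y)/n), whereas var(a) stays roughly constant, flagging a as non-estimable.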
11:10 AM - 12:00 PM
Tina Toni - Modeling and inference framework for studying noise and cell-to-cell variability in synthetic biology
Synthetic biology faces many challenges in its bold attempt to construct functioning and programmable devices, circuits, and higher-order constructs. Chief amongst these may be selecting among multiple design possibilities for partially characterized biological systems, and accounting for the effects of biological noise and cell-to-cell variability. Here we report progress on developing modeling and inference techniques that incorporate noise and cell-to-cell variability to guide design principles in synthetic biology.
12:00 PM - 12:50 PM
Clemens Kreutz - Likelihood-based observability analysis and confidence intervals for model predictions
Dynamic models of biochemical networks contain unknown parameters such as the reaction rates and the initial concentrations of the compounds. The large number of parameters, as well as their nonlinear impact on the model responses, hampers the determination of confidence regions for parameter estimates. At the same time, classical approaches that translate the uncertainty of the parameters into confidence intervals for model predictions are hardly feasible.

We present the so-called prediction profile likelihood, which is used to generate reliable confidence intervals for model predictions. The prediction confidence intervals of the dynamic states are exploited for a data-based observability analysis. Moreover, a validation profile likelihood is introduced that can be applied when judging noisy validation experiments.

The presented approaches are also applicable if there are non-identifiable parameters. Such ambiguities yield insufficiently specified model predictions that can be interpreted as non-observability. The properties and applicability of the approach are demonstrated with two examples: a small but instructive ODE model, and a model of the MAP kinase signal transduction pathway.

Work done in collaboration with A. Raue and J. Timmer. This project has been funded by the BMBF grants VirtualLiver 0315766 and FRISYS 0313921.
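The prediction profile likelihood can be sketched in the simplest setting where the result can be checked by hand: a linear model with a known noise level, where the profile is exactly quadratic and the interval coincides with the classical one. The prediction z at a hypothetical time `T_STAR` is held fixed, the likelihood is maximized over the remaining parameter, and the interval is the set of z for which the profile stays below the chi-square(1) threshold. The data, `SIGMA` and `T_STAR` are hypothetical; for the nonlinear ODE models in the talk, the inner maximization would have to be done numerically.

```python
import math
from statistics import NormalDist

# Hypothetical linear model y = b0 + b1 * t + N(0, SIGMA^2) with SIGMA known.
ts = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2]
SIGMA = 0.3
T_STAR = 8.0   # time at which a prediction is wanted (outside the data range)

def rss_given_prediction(z):
    """Minimum residual sum of squares subject to b0 + b1 * T_STAR = z.

    Substituting b0 = z - b1 * T_STAR leaves a one-parameter least-squares
    problem in b1, solved exactly here.
    """
    d = [t - T_STAR for t in ts]
    b1 = sum(di * (y - z) for di, y in zip(d, ys)) / sum(di * di for di in d)
    return sum((y - z - b1 * di) ** 2 for di, y in zip(d, ys))

def prediction_profile_interval(level=0.95):
    n = len(ts)
    tbar, ybar = sum(ts) / n, sum(ys) / n
    sxx = sum((t - tbar) ** 2 for t in ts)
    b1 = sum((t - tbar) * (y - ybar) for t, y in zip(ts, ys)) / sxx
    z_hat = ybar + b1 * (T_STAR - tbar)                   # unconstrained prediction
    rss_min = rss_given_prediction(z_hat)
    crit = NormalDist().inv_cdf(0.5 + level / 2) ** 2     # chi-square(1) quantile

    def profile(z):
        return (rss_given_prediction(z) - rss_min) / SIGMA ** 2

    ends = []
    for direction in (-1.0, 1.0):
        lo, hi = 0.0, SIGMA
        while profile(z_hat + direction * hi) < crit:     # bracket the crossing
            hi *= 2.0
        for _ in range(100):                              # bisection
            mid = 0.5 * (lo + hi)
            if profile(z_hat + direction * mid) < crit:
                lo = mid
            else:
                hi = mid
        ends.append(z_hat + direction * 0.5 * (lo + hi))
    return z_hat, ends[0], ends[1]

z_hat, lower, upper = prediction_profile_interval()
print(f"prediction at t={T_STAR}: {z_hat:.3f}, 95% interval ({lower:.3f}, {upper:.3f})")
```

In this linear case the interval equals z_hat plus or minus 1.96 * SIGMA * sqrt(1/n + (T_STAR - tbar)^2 / sxx), which makes the sketch easy to verify; in nonlinear models the profile is generally asymmetric, which is where the approach pays off.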
Thursday, February 23, 2012
Time Session
09:00 AM - 09:50 AM
Carson Chow - Using Bayesian and MCMC approaches for parameter estimation and model evaluation in physiology
Presentation: http://mbi.osu.edu/2011/rasmaterials/mbibayes20121_chow.pdf
Differential equations are often used to model biological and physiological systems. An important and difficult problem is how to estimate parameters and how to decide which of several candidate models is best. I will show in several examples how Bayesian and Markov chain Monte Carlo approaches provide a self-consistent framework for both tasks. In particular, Bayesian parameter estimation provides a natural measure of parameter sensitivity, and Bayesian model comparison automatically evaluates models by rewarding fit to the data while penalizing the number of parameters.
09:50 AM - 10:40 AM
Andrew Golightly - Particle MCMC for Stochastic Kinetic Models
Andrew Golightly, School of Mathematics & Statistics, Newcastle University
Presentation: http://mbi.osu.edu/2011/rasmaterials/AGmbi12.pdf

We consider the problem of performing Bayesian inference for the rate constants governing stochastic kinetic models. As well as considering inference for the resulting Markov jump process (MJP) we consider working with a diffusion approximation obtained by matching the infinitesimal mean and variance of the MJP to the drift and diffusion coefficients of a stochastic differential equation (SDE). We sample from the posterior distribution of the model parameters given observations at discrete times via recently proposed particle MCMC methods. In the case of the diffusion approximation we increase the efficiency of the inference algorithm by exploiting the structure of the SDE. We present results from two toy examples: a Lotka-Volterra system and a simple model of prokaryotic autoregulation.
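The diffusion approximation described above can be made concrete for the Lotka-Volterra example: with stoichiometry matrix S and hazard vector h(x), the chemical Langevin equation has drift S h(x) and diffusion matrix S diag(h(x)) S^T, which match the infinitesimal mean and variance of the Markov jump process. The sketch below, with illustrative rate constants, computes these coefficients and takes Euler-Maruyama steps; the particle MCMC scheme itself is not shown, and clipping the state at zero is an ad hoc assumption.

```python
import math
import random

# Lotka-Volterra reactions (illustrative):
#   R1: X1 -> 2 X1        hazard c1 * x1        (prey reproduction)
#   R2: X1 + X2 -> 2 X2   hazard c2 * x1 * x2   (predation)
#   R3: X2 -> 0           hazard c3 * x2        (predator death)
STOICH = [[1, -1, 0],    # net change in X1 for R1, R2, R3
          [0, 1, -1]]    # net change in X2

def hazards(x, c):
    x1, x2 = x
    c1, c2, c3 = c
    return [c1 * x1, c2 * x1 * x2, c3 * x2]

def cle_coefficients(x, c):
    """Drift S h(x) and diffusion S diag(h(x)) S^T of the chemical Langevin equation."""
    h = hazards(x, c)
    n_species, n_reactions = len(STOICH), len(h)
    drift = [sum(STOICH[i][j] * h[j] for j in range(n_reactions)) for i in range(n_species)]
    diffusion = [[sum(STOICH[i][j] * h[j] * STOICH[k][j] for j in range(n_reactions))
                  for k in range(n_species)] for i in range(n_species)]
    return drift, diffusion

def euler_maruyama_step(x, c, dt, rng):
    """One Euler-Maruyama step of the CLE (2 species, via a 2x2 Cholesky factor)."""
    drift, d = cle_coefficients(x, c)
    l11 = math.sqrt(d[0][0]) if d[0][0] > 0 else 0.0
    l21 = d[1][0] / l11 if l11 > 0 else 0.0
    l22 = math.sqrt(max(d[1][1] - l21 * l21, 0.0))
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    sq = math.sqrt(dt)
    new = [x[0] + drift[0] * dt + sq * l11 * z1,
           x[1] + drift[1] * dt + sq * (l21 * z1 + l22 * z2)]
    return [max(v, 0.0) for v in new]   # ad hoc: keep populations non-negative

c = (0.5, 0.0025, 0.3)                   # illustrative rate constants
drift, diffusion = cle_coefficients((50.0, 100.0), c)
print(drift, diffusion)
rng = random.Random(3)
x = [50.0, 100.0]
for _ in range(100):
    x = euler_maruyama_step(x, c, 0.01, rng)
print(x)
```

Within a particle MCMC scheme, a simulator like `euler_maruyama_step` would be used to propagate particles between observation times, so the SDE transition density is never needed in closed form.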
11:10 AM - 12:00 PM
Dennis Prangle - Summary statistics for ABC model choice
Dennis Prangle, Mathematics & Statistics, Lancaster University
Presentation: http://mbi.osu.edu/2011/rasmaterials/MBI_DennisPrangle.pdf

ABC (approximate Bayesian computation) is a powerful method for inference in statistical models with intractable likelihoods. Recently there has been much interest in using ABC for model choice, and concerns have been raised that the results are not robust to the choice of summary statistics. We propose a method for choosing useful summary statistics and apply it to population genetic and epidemiological examples.
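Sensitivity to the choice of summary statistics can be illustrated with a minimal ABC rejection sketch for choosing between two hypothetical models, Poisson and geometric, given the same prior on the mean. It keeps the simulations whose summaries lie closest to the observed ones and reports the fraction from the Poisson model as an estimate of its posterior probability. The models, priors, sample sizes and acceptance fraction are assumptions, and the summary-selection method proposed in the talk is not reproduced; the statistics here are hand-picked.

```python
import math
import random

def sample_poisson(mu, rng):
    """Knuth's multiplication method (fine for small means)."""
    limit, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def sample_geometric(mu, rng):
    """Number of failures before the first success, with mean mu."""
    q = mu / (1.0 + mu)                          # failure probability
    return int(math.log(1.0 - rng.random()) / math.log(q))

def summaries(xs, which):
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return (m, v) if which == "mean+var" else (m,)

def abc_model_choice(observed, which, n_sims, accept_frac, rng):
    """ABC rejection estimate of P(Poisson | data) from the nearest simulations."""
    s_obs = summaries(observed, which)
    sims = []
    for _ in range(n_sims):
        model = rng.choice(("poisson", "geometric"))   # equal prior model probabilities
        mu = rng.uniform(0.1, 10.0)                     # same prior on the mean in both models
        draw = sample_poisson if model == "poisson" else sample_geometric
        data = [draw(mu, rng) for _ in observed]
        sims.append((math.dist(summaries(data, which), s_obs), model))
    sims.sort(key=lambda pair: pair[0])
    kept = sims[: max(1, int(accept_frac * n_sims))]
    return sum(1 for _, model in kept if model == "poisson") / len(kept)

rng = random.Random(7)
observed = [sample_poisson(3.0, rng) for _ in range(30)]  # hypothetical data
p_mean = abc_model_choice(observed, "mean", 5000, 0.02, rng)
p_both = abc_model_choice(observed, "mean+var", 5000, 0.02, rng)
print(f"P(Poisson) using mean only: {p_mean:.2f}; using mean and variance: {p_both:.2f}")
```

Both models can reproduce the sample mean, so with the mean alone the estimated probability should stay near its prior value of 0.5; adding the variance, which differs between the models (mu versus mu(1 + mu)), should favour the Poisson model.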
02:00 PM - 02:50 PM
Colin Gillespie - Bayesian inference for generalized stochastic population growth models with application to aphids
In this talk I will analyse the effects of various treatments on cotton aphids (Aphis gossypii). The standard analysis of count data on cotton aphids determines parameter values by assuming a deterministic growth model, and combines these with the corresponding stochastic model to make predictions of population sizes depending on treatment. Here, we use an integrated stochastic model to capture the intrinsic stochasticity of both observed aphid counts and unobserved cumulative population size for all treatment combinations simultaneously.

Unlike previous approaches, this allows us to explicitly explore, and more accurately assess, treatment interactions. Markov chain Monte Carlo methods and the moment closure technique are used within a Bayesian framework to integrate over the uncertainty associated with the unobserved cumulative population size and to estimate the resulting twenty-eight parameters. We restrict attention to data on aphid counts in the Texas High Plains obtained for three different levels of irrigation water, nitrogen fertiliser and block, but note that the methods we develop can be applied to a wide range of problems in population ecology.
02:50 PM - 03:40 PM
Richard Boys - Linking systems biology models to data: a stochastic kinetic model of p53 oscillations
This talk considers the assessment and refinement of a dynamic stochastic process model of the cellular response to DNA damage. The proposed model is a complex nonlinear continuous-time latent stochastic process. It is compared to time course data on the levels of two key proteins involved in this response, captured at the level of individual cells in a human cancer cell line. The primary goal is to "calibrate" the model by finding parameters of the model (kinetic rate constants) that are most consistent with the experimental data. Significant amounts of prior information are available for the model parameters. It is therefore most natural to consider a Bayesian analysis of the problem, using sophisticated MCMC methods to overcome the formidable computational challenges.
04:10 PM - 05:00 PM
Vanja Dukic - Bayesian Modeling of Smoking Exposure During Pregnancy
Studies assessing the effects of prenatal exposure to cigarettes frequently acquire both self-reports and biological assays of maternal smoking. The most common biological assays measure cotinine, a metabolite of nicotine, in urine or serum. Each of these measures has its own sources of information and bias. Single bioassay measures alone cannot reflect the metabolic mechanism over time, while self-report may have serious recall, topographic, and metabolic biases. In this project we present a Bayesian statistical model for describing in utero smoking exposure based on the combined biological and self-report information. The model takes into account heterogeneity among women and metabolism during pregnancy. The model is applied to data from the East Boston Family Study.
Friday, February 24, 2012
Time Session
09:00 AM - 09:50 AM
Joe Tien - Parameter estimation for bursting neural models
Bursting is a ubiquitous phenomenon in neuroscience which involves multiple time scales (fast spikes vs. long quiescent intervals). Parameter estimation for bursting models is difficult due to these multiple scales. I will describe an approach to parameter estimation for these models which utilizes the geometry underlying bursting. This is joint work with John Guckenheimer.
10:20 AM - 11:10 AM
Christina Leslie - Inferring transcriptional and microRNA-mediated regulatory programs in cancer
Large-scale cancer genomics projects are profiling hundreds of tumors at multiple molecular layers, including copy number, mRNA and miRNA expression, but the mechanistic relationships between these layers are often excluded from computational models. We developed a sparse regression framework for integrating molecular profiles with regulatory elements to reveal mechanisms of dysregulation of gene expression in cancer, including miRNA-mediated expression changes. We applied our approach to 320 glioblastoma tumors and identified key miRNAs and transcription factors as common or subtype-specific regulators. We confirmed that target gene expression signatures for proneural subtype regulators were consistent with in vivo expression changes in a relevant mouse model. We tested two predicted proneural drivers, miR-124 and miR-132, both underexpressed in proneural tumors, by overexpression in neurospheres and observed a partial reversal of corresponding tumor expression changes. Computationally dissecting the role of miRNAs in cancer may ultimately lead to small RNA therapeutics tailored to subtype or individual.
Name Email Affiliation
Allen, Linda linda.j.allen@ttu.edu Department of Mathematics and Statistics, Texas Tech University
Bates, Douglas bates@stat.wisc.edu Statistics, University of Wisconsin
Bortz, David dmbortz@Colorado.edu Applied Mathematics, University of Colorado
Boys, Richard richard.boys@ncl.ac.uk School of Mathematics & Statistics, Newcastle University
Brunel, Nicolas nicolas.brunel@ibisc.fr departement de mathematiques, University of Chicago
Burroughs, Nigel N.J.Burroughs@warwick.ac.uk Systems Biology Centre,
Calderhead, Ben ben@stats.ucl.ac.uk Statistical Science, University College London
Campbell, David dac5@stat.sfu.ca Statistics and Actuarial Science,
Chen, Hegang hchen003@umaryland.edu Biostatistics and Bioinformatics, University of Maryland
Chen, Iris sinuiris@gmail.com Biostatistics and Computational Biology, University of Rochester
Chkrebtii, Oksana ochkrebt@sfu.ca Statistics & Actuarial Science, Simon Fraser University
Chow, Carson carsonc@niddk.nih.gov NIH/NIDDK/LBM
Cui, Xinping xinping.cui@ucr.edu Statistics, University of California, Riverside
Dukic, Vanja vanja.dukic@colorado.edu Applied Mathematics, University of Colorado
Dworkin, Michael dworkin.11@osu.edu Mathematics, The Ohio State University
Finkenstadt, Barbel b.f.finkenstadt@warwick.ac.uk Statistics,
Giannoulatou, Eleni giannoul@stats.ox.ac.uk Weatherall Institute of Molecular Medicine, University of Oxford
Gillespie, Colin colin.gillespie@ncl.ac.uk School of Mathematics & Statistics, Newcastle University
Girolami, Mark girolami@stats.ucl.ac.uk Statistical Science,
Golightly, Andrew a.golightly@ncl.ac.uk School of Mathematics & Statistics, Newcastle University
Guckenheimer, John jmg16@cornell.edu Department of Mathematics, Cornell University
Herman, Dorota DXH885@bham.ac.uk Centre for Systems Biology, University of Birmingham
Hirt, Bartholomaeus Barth.Hirt@gmx.net School of Mathematical Sciences, University of Nottingham
Hooker, Giles gjh27@cornell.edu Biological Statistics and Computational Biology, Cornell University
Ionides, Ed ionides@umich.edu Statistics, University of Michigan
Jabbari, Sara sara.jabbari@nottingham.ac.uk School of Molecular and Medical Sciences, University of Nottingham
King, Aaron kingaa@umich.edu Ecology and Evolutionary Biology/Mathematics, University of Michigan
Kramer, Peter kramep@rpi.edu Mathematical Sciences, Rensselaer Polytechnic Institute
Kreutz, Clemens ckreutz@fdm.uni-freiburg.de Institute of Physics, University of Freiburg
Kypraios, Theodore theodore.kypraios@nottingham.ac.uk Mathematical Sciences, University of Nottingham
Lele, Subhash slele@ualberta.ca Mathematical and Statistical Sciences, University of Alberta
Leslie, Christina cleslie@cbio.mskcc.org Computational Biology Program, Memorial Sloan-Kettering Cancer Center
Ma, Ping pingma@illinois.edu Department of Statistics and Institute for Genomic Biology, University of Illinois at Urbana-Champaign
Manolopoulou, Ioanna im30@stat.duke.edu Statistical Science, Duke University
Nadeem, Khurram knadeem@ualberta.ca Mathematical & Statistical Sciences, University of Alberta
Nanda, Seema nanda@math.tifrbng.res.in Mathematics,
Ngonghala, Calistus ngonghala@yahoo.com Mathematics, West Virginia University
Oganyan, Anna aoganyan@georgiasouthern.edu Mathematical Sciences, Georgia Southern University
Olhede, Sofia s.olhede@ucl.ac.uk Department of Statistical Science,
Pawlikowska, Iwona ipiwona@yahoo.com Center for Biotechnology and Genomic Medicine, Georgia Health Sciences University
Peacock, Stephanie stephanie.peacock@ualberta.ca Biological Sciences, University of Alberta
Prangle, Dennis d.prangle@lancaster.ac.uk Mathematics and Statistics, Lancaster University
Preston, Simon simon.preston@nottingham.ac.uk Mathematical Sciences, University of Nottingham
Ratmann, Oliver oliver.ratmann@duke.edu Biology, Duke University
Seweryn, Michal mseweryn@math.uni.lodz.pl Biostatistics, Georgia Health Sciences University
Stanhope, Shelby srs114@pitt.edu Mathematics, University of Pittsburgh
Stekel, Dov dov.stekel@nottingham.ac.uk School of Biosciences, University of Nottingham
Tien, Joe jtien@math.ohio-state.edu Department of Mathematics, The Ohio State University
Toni, Tina ttoni@mit.edu Department of Biological Engineering, Massachusetts Institute of Technology
Transtrum, Mark mkt26@cornell.edu Bioinformatics and Computational Biology, University of Texas M. D. Anderson Cancer Center
Wesolowski, Sergiusz wesserg@gmail.com Mathematics, Informatics, Mechanics, University of Warsaw
Wu, Hulin Hulin_Wu@urmc.rochester.edu Biostatistics and Computational Biology, University of Rochester
Wu, Jialiang gtg337v@mail.gatech.edu Biomedical Engineering, Georgia Institute of Technology
Wu, Shuang shuang_wu@urmc.rochester.edu Department of Biostatistics and Computational Biology, University of Rochester
Wyse, Jason jason@stats.ucl.ac.uk
Xiao, Zhen zxiao001@ucr.edu Department of Statistics, University of California, Riverside
Inference in Mixed-Effects (and other) Models Through Profiling the Objective
Douglas Bates, Department of Statistics, University of Wisconsin - Madison
Presentation (slides version): http://mbi.osu.edu/2011/rasmaterials/ProfilingD.pdf
Presentation (notes version): http://mbi.osu.edu/2011/rasmaterials/ProfilingN.pdf

The use of Markov chain Monte Carlo methods for Bayesian inference has increased awareness of the need to examine the whole posterior distribution of a parameter (in the Bayesian setting) or the distribution of its estimator (for those who prefer non-Bayesian techniques). I will concentrate on non-Bayesian inference, although the techniques can also be applied to the posterior density in Bayesian methods. For many statistical models, including linear and generalized linear mixed-effects models, parameter estimates are defined as the optimizer of an objective function (e.g. the MLEs maximize the log-likelihood), and inference is based on the location of the optimizer and on a local approximation at the optimizer, without assessing the validity of that approximation. This made sense when fitting a single model could involve waiting many days for answers from shared computer systems. It does not make sense when models can be fit in a few seconds. By repeatedly refitting a model while holding a particular parameter fixed, we can build up a profile of the objective with respect to that parameter and use it to produce profile-based confidence intervals. Perhaps the most important aspect of the technique, however, is the graphical presentation of the results, which forces us to consider the behavior of the estimator beyond the estimate and can cast doubt on many of the principles of inference and simulation that we hold dear.
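A minimal sketch of profiling, assuming the simplest possible model (an i.i.d. normal sample with unknown mean mu and standard deviation sigma) rather than a mixed-effects model. With mu held fixed, sigma is re-optimized; here that step is closed form, because the conditional MLE of sigma^2 is the mean squared deviation from mu, whereas in a mixed-effects model it would be a numerical refit. The increase in deviance is then converted to a signed square root, zeta. The function names and the bisection search are illustrative choices, not the implementation of any particular package.

```python
import math

def profile_deviance(x, mu):
    """Deviance (-2 * log-likelihood) of a normal sample with the mean held
    at mu and sigma re-optimized; for fixed mu the conditional MLE of
    sigma^2 is the mean squared deviation from mu."""
    n = len(x)
    s2 = sum((xi - mu) ** 2 for xi in x) / n
    return n * (math.log(2 * math.pi * s2) + 1)

def zeta(x, mu):
    """Signed square root of the increase in deviance relative to the MLE."""
    mu_hat = sum(x) / len(x)
    d = profile_deviance(x, mu) - profile_deviance(x, mu_hat)
    return math.copysign(math.sqrt(max(d, 0.0)), mu - mu_hat)

def profile_interval(x, z=1.959964, span=100.0, tol=1e-10):
    """Profile-based interval: the mu values where zeta(mu) = -z and +z.
    zeta is increasing in mu for this model, so bisection finds each end."""
    mu_hat = sum(x) / len(x)

    def solve(target, lo, hi):
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if zeta(x, mid) < target:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    return solve(-z, mu_hat - span, mu_hat), solve(z, mu_hat, mu_hat + span)

data = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7]
lo, hi = profile_interval(data)
print(lo, hi)
```

For this model the profile is symmetric about the estimate, and the interval is always at least as wide as the Wald interval mu_hat +/- 1.96 sigma_hat / sqrt(n), because exp(z^2/n) - 1 >= z^2/n. The gap is noticeable for small n, which is the kind of behaviour a profile plot makes visible.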
Linking systems biology models to data: a stochastic kinetic model of p53 oscillations
This talk considers the assessment and refinement of a dynamic stochastic process model of the cellular response to DNA damage. The proposed model is a complex nonlinear continuous-time latent stochastic process. It is compared to time-course data on the levels of two key proteins involved in this response, captured at the level of individual cells in a human cancer cell line. The primary goal is to "calibrate" the model by finding the model parameters (kinetic rate constants) that are most consistent with the experimental data. Significant amounts of prior information are available for the model parameters. It is therefore most natural to consider a Bayesian analysis of the problem, using sophisticated MCMC methods to overcome the formidable computational challenges.
Estimation of ordinary differential equations with orthogonal conditions
Parameter inference for ordinary differential equations from noisy data can be seen as a nonlinear regression problem within a parametric setting. A classical statistical method such as Nonlinear Least Squares (NLS) gives rise to difficult and heavy optimization problems, because the corresponding inverse problem is ill-posed. Gradient matching algorithms use a smooth (nonparametric) estimate of the solution, from which a nonparametric estimate of the derivative is derived; this gives rise to a natural criterion that is easier to optimize than NLS. We introduce here a new class of criteria based on a weak formulation of the ODE. The resulting estimator can be viewed as a generalized moment estimator with good statistical and computational properties. Finally, we consider several examples that illustrate the efficiency and versatility of the proposed method.
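As a hedged illustration of the weak-formulation idea (not the estimator from the talk, which works with a nonparametric smooth of the data), consider the scalar ODE x' = -theta x. Multiplying by a test function phi with phi(t0) = phi(T) = 0 and integrating by parts gives the integral of phi' x equal to theta times the integral of phi x. This condition is linear in theta and never requires differentiating the data. The sketch uses sine test functions and the trapezoidal rule on the sampled trajectory; both choices are assumptions made for brevity.

```python
import math

def trapezoid(values, t):
    """Trapezoidal rule for samples `values` on the grid `t`."""
    return sum(0.5 * (values[i] + values[i + 1]) * (t[i + 1] - t[i])
               for i in range(len(t) - 1))

def weak_form_theta(t, x, n_test=3):
    """Estimate theta in x' = -theta * x from samples x(t) on [t0, T].

    For phi_k(s) = sin(k*pi*(s - t0)/L), which vanishes at both ends, the weak
    form gives a_k = theta * b_k with a_k = int phi_k' x and b_k = int phi_k x;
    theta is the least-squares solution over the k test functions."""
    t0, length = t[0], t[-1] - t[0]
    num = den = 0.0
    for k in range(1, n_test + 1):
        w = k * math.pi / length
        phi = [math.sin(w * (s - t0)) for s in t]
        dphi = [w * math.cos(w * (s - t0)) for s in t]
        a = trapezoid([p * xi for p, xi in zip(dphi, x)], t)
        b = trapezoid([p * xi for p, xi in zip(phi, x)], t)
        num += a * b
        den += b * b
    return num / den

# Noise-free trajectory with theta = 0.7, sampled on [0, 5]
t = [5.0 * i / 2000 for i in range(2001)]
x = [math.exp(-0.7 * s) for s in t]
theta_est = weak_form_theta(t, x)
print(theta_est)
```

With noisy samples the same code applies unchanged; the integrals tend to average measurement noise, whereas finite-difference derivatives would amplify it.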
Verification of a biophysical surface protein patternation model: MCMC analysis of dual fluorescence data
In a number of biological systems, a complex spatio-temporal orchestration of protein relocation is observed at cell-cell contact interfaces, predominantly a separation of small receptors from large ones, i.e. a segregation by size. This patterned structure is called the immunological synapse and occurs predominantly between cells of the immune system. Such structures can be realised in model experimental systems in 2D, or visualised in 3D z-stacks. A segregation mechanism that explains patternation by a minimisation of the free energy, using a model of bond stretching, has been extensively examined theoretically within a number of modelling frameworks; all these models agree that such a mechanism is able to explain the observed patterns within certain parameter regimes. However, it has not been demonstrated that the system actually lies in the parameter regime conducive to separation. We use Bayesian inference to fit a model of fluorescence intensity to single-cell data, extracting key parameters of the patternation, specifically estimates of the local free energies of the proteins. By quantifying degrees of protein exclusion, we determine the degree to which the patterns are consistent with the size-dependent thermodynamic model of segregation.
Riemannian Manifolds and Statistical Models: The use of Differential Geometry in Probabilistic Modelling
As mathematical models of biological systems increase in size and complexity, so too does the need for sophisticated statistical methodology that can consistently evaluate and update the evidence in favour of each model, given the limited and highly variable experimental data that are often available. A probabilistic approach based on Bayesian statistics offers a mathematically consistent way of characterising this uncertainty, both within the parameters and in the models themselves.

There are many computational challenges associated with a Bayesian approach to model ranking and inference. The procedure amounts to evaluating high-dimensional integrals, generally of highly nonlinear functions. In more than three or four dimensions, deterministic approaches are no longer feasible, and we may resort to stochastic integration using simulation-based Monte Carlo techniques. The main challenge often lies in drawing samples from analytically intractable posterior probability distributions, which exhibit strong nonlinear correlation structures, high dimensionality and non-identifiability of parameters.

In this talk I will discuss how Markov chain Monte Carlo methodology based on the natural differential geometric structure of the parameter space of a statistical model may alleviate many of the simulation issues by making proposals based on local sensitivity information. Along the way, I shall highlight the deep link between the sensitivity analysis of such statistical models and the underlying Riemannian geometry of the induced posterior probability distributions. I shall demonstrate how this methodology may be used to rank two differential equation models describing circadian rhythms in the plant Arabidopsis thaliana.
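A minimal sketch of how local geometry can enter a proposal, assuming a toy one-parameter problem rather than the circadian ODE models from the talk. It is a simplified manifold MALA sampler for the standard deviation sigma of zero-mean normal data under a flat prior on sigma > 0. The metric is the expected Fisher information G(sigma) = 2n / sigma^2, so both the drift and the step size of the Langevin proposal scale with the local curvature. Because the metric depends on position, the proposal is asymmetric and the Metropolis-Hastings ratio must include both proposal densities.

```python
import math
import random

def log_post(sigma, n, ss):
    """Log posterior of sigma (up to a constant) for n zero-mean normal
    observations with sum of squares ss, under a flat prior on sigma > 0."""
    if sigma <= 0:
        return -math.inf
    return -n * math.log(sigma) - ss / (2 * sigma ** 2)

def grad_log_post(sigma, n, ss):
    return -n / sigma + ss / sigma ** 3

def proposal(sigma, n, ss, eps):
    """Mean and variance of the simplified mMALA proposal from sigma, using the
    expected Fisher information G(sigma) = 2n / sigma^2 as the metric."""
    g_inv = sigma ** 2 / (2 * n)
    mean = sigma + 0.5 * eps ** 2 * g_inv * grad_log_post(sigma, n, ss)
    return mean, eps ** 2 * g_inv

def log_normal_pdf(y, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - (y - mean) ** 2 / (2 * var)

def smmala(data, sigma0=1.0, eps=1.0, n_iter=5000, seed=1):
    rng = random.Random(seed)
    n, ss = len(data), sum(v * v for v in data)
    sigma, chain, accepted = sigma0, [], 0
    for _ in range(n_iter):
        m_fwd, v_fwd = proposal(sigma, n, ss, eps)
        cand = rng.gauss(m_fwd, math.sqrt(v_fwd))
        lp_cand = log_post(cand, n, ss)
        if lp_cand > -math.inf:
            # position-dependent metric => asymmetric proposal, so the
            # reverse-move density enters the Metropolis-Hastings ratio
            m_rev, v_rev = proposal(cand, n, ss, eps)
            log_ratio = (lp_cand + log_normal_pdf(sigma, m_rev, v_rev)
                         - log_post(sigma, n, ss) - log_normal_pdf(cand, m_fwd, v_fwd))
            if math.log(1.0 - rng.random()) < log_ratio:
                sigma, accepted = cand, accepted + 1
        chain.append(sigma)
    return chain, accepted / n_iter

rng = random.Random(0)
data = [rng.gauss(0.0, 1.5) for _ in range(200)]
chain, acc_rate = smmala(data)
print(acc_rate, sum(chain[1000:]) / len(chain[1000:]))
```

The "simplified" variant drops the terms involving derivatives of the metric from the drift; it remains a valid Metropolis-Hastings sampler because the exact forward and reverse proposal densities are used in the acceptance ratio.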
Using Bayesian and MCMC approaches for parameter estimation and model evaluation in physiology
Presentation: http://mbi.osu.edu/2011/rasmaterials/mbibayes20121_chow.pdf
Differential equations are often used to model biological and physiological systems. An important and difficult problem is how to estimate their parameters and decide which of several candidate models is best. I will show in several examples how Bayesian and Markov chain Monte Carlo approaches provide a self-consistent framework for doing both tasks. In particular, Bayesian parameter estimation provides a natural measure of parameter sensitivity, and Bayesian model comparison automatically evaluates models by rewarding fit to the data while penalizing the number of parameters.
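The built-in penalty on extra parameters can be seen in a deliberately tiny example, assumed here for illustration rather than taken from the talk: a fair-coin model M0 (no free parameters) compared with a model M1 whose success probability p has a Uniform(0, 1) prior. The marginal likelihood of M1 averages the likelihood over the prior, and the integral of p^k (1 - p)^(n - k) over [0, 1] equals 1 / ((n + 1) * C(n, k)). When the data are compatible with the simpler model, M1 spreads prior mass over values the data do not support, which lowers its marginal likelihood.

```python
from math import comb

def bayes_factor_fair_vs_uniform(k, n):
    """Bayes factor M0 : M1 for k successes in n Bernoulli trials.

    M0 fixes p = 1/2; M1 gives p a Uniform(0, 1) prior. Both marginal
    likelihoods refer to the observed sequence, so the binomial coefficient
    for the count cancels from the ratio."""
    m0 = 0.5 ** n
    m1 = 1.0 / ((n + 1) * comb(n, k))
    return m0 / m1

print(bayes_factor_fair_vs_uniform(5, 10))  # about 2.71: the simpler model is favoured
print(bayes_factor_fair_vs_uniform(9, 10))  # about 0.11: the extra parameter earns its place
```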
Bayesian Modeling of Smoking Exposure During Pregnancy
Studies trying to assess the effects of prenatal exposure to cigarettes frequently acquire both self-reports and biological assays of maternal smoking. The most common biological assays measure cotinine, a metabolite of nicotine, in urine or serum. Each of these measures has its own sources of information and bias. Single bioassay measures alone cannot reflect the metabolic mechanism over time, while self-report may have serious recall, topographic, and metabolic biases. In this project we present a Bayesian statistical model for describing in utero smoking exposure based on the combined biological and self-report information. The model takes into account heterogeneity among women and metabolism during pregnancy. The model is applied to data from the East Boston Family Study.
Modeling and inference for gene expression time series data (an overview)
A central challenge in computational modeling of dynamic biological systems is parameter inference from experimental time-course measurements. Here we present an overview of modeling approaches based on stochastic population dynamic models and their approximations. For an application on the mesoscopic scale, we present a two-dimensional continuous-time Bayesian hierarchical diffusion model that has the potential to address the different sources of variability relevant to the stochastic modelling of transcriptional and translational processes at the molecular level, namely: intrinsic noise due to the stochastic nature of the birth and death processes involved in chemical reactions; extrinsic noise arising from the cell-to-cell variation of kinetic parameters associated with these processes; and noise associated with the measurement process. Inference is complicated by the fact that typically only the protein, and rarely other molecular species, is observed, which entails problems of parameter identification in dynamical systems.

For an application on the macroscopic scale, we introduce a mechanistic 'switch' model for encoding a continuous transcriptional profile of genes over time, with the aim of identifying the timing properties of mRNA synthesis. Synthesis is assumed to switch between periods of transcriptional activity and inactivity, each switch leading to a transition to a new steady state, while mRNA degradation is an ongoing linear process. The model is rich enough to capture a wide variety of expression behaviours, including periodic genes. Finally, I will give a brief introduction to some recent work on inferring the periodicity of the expression of circadian and other oscillating genes.

Joint work with: Maria Costa, Dan Woodcock, Dafyd Jenkins, David Rand (all Warwick Systems Biology), Michal Komorowski (Imperial College London).
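Under the switch-model description above (transcription switching between rates, linear degradation), the mRNA level between switch times solves dm/dt = s_j - delta * m, so each segment relaxes exponentially towards the new steady state s_j / delta. A minimal sketch of the resulting piecewise-analytic trajectory follows; the function and argument names and the assumption of piecewise-constant rates are illustrative, not taken from the talk.

```python
import math

def switch_trajectory(t_eval, switch_times, rates, delta, m0):
    """mRNA level m(t) for dm/dt = s(t) - delta * m with piecewise-constant
    transcription rate s(t) = rates[j] on [switch_times[j], switch_times[j+1]).

    switch_times[0] is the start time; t_eval must be sorted and >= it."""
    out = []
    m, t_prev, j = m0, switch_times[0], 0
    for t in t_eval:
        # advance through any switch times passed since the last evaluation
        while j + 1 < len(switch_times) and switch_times[j + 1] <= t:
            tau = switch_times[j + 1]
            steady = rates[j] / delta
            m = steady + (m - steady) * math.exp(-delta * (tau - t_prev))
            t_prev, j = tau, j + 1
        # relax towards the current segment's steady state up to time t
        steady = rates[j] / delta
        m = steady + (m - steady) * math.exp(-delta * (t - t_prev))
        t_prev = t
        out.append(m)
    return out

# Transcription on (rate 10) from t = 0, off from t = 5; degradation rate 0.5
traj = switch_trajectory([5.0, 10.0], [0.0, 5.0], [10.0, 0.0], 0.5, 0.0)
print(traj)
```

Because the exact solution composes across intervals, the state can be advanced from one evaluation time to the next without re-integrating from the start.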
Bayesian inference for generalized stochastic population growth models with application to aphids
In this talk I will analyse the effects of various treatments on cotton aphids (Aphis gossypii). The standard analysis of count data on cotton aphids determines parameter values by assuming a deterministic growth model and combines these with the corresponding stochastic model to make predictions on population sizes, depending on treatment. Here, we use an integrated stochastic model to capture the intrinsic stochasticity of both observed aphid counts and unobserved cumulative population size for all treatment combinations simultaneously.

Unlike previous approaches, this allows us to explicitly explore and more accurately assess treatment interactions. Markov chain Monte Carlo methods and the moment closure technique are used within a Bayesian framework to integrate over uncertainty associated with the unobserved cumulative population size and to estimate the resulting twenty-eight parameters. We restrict attention to data on aphid counts in the Texas High Plains obtained for three different levels of irrigation water, nitrogen fertiliser and block, but note that the methods we develop can be applied to a wide range of problems in population ecology.
Particle MCMC for Stochastic Kinetic Models
Andrew Golightly, School of Mathematics & Statistics, Newcastle University
Presentation: http://mbi.osu.edu/2011/rasmaterials/AGmbi12.pdf

We consider the problem of performing Bayesian inference for the rate constants governing stochastic kinetic models. As well as considering inference for the resulting Markov jump process (MJP) we consider working with a diffusion approximation obtained by matching the infinitesimal mean and variance of the MJP to the drift and diffusion coefficients of a stochastic differential equation (SDE). We sample from the posterior distribution of the model parameters given observations at discrete times via recently proposed particle MCMC methods. In the case of the diffusion approximation we increase the efficiency of the inference algorithm by exploiting the structure of the SDE. We present results from two toy examples: a Lotka-Volterra system and a simple model of prokaryotic autoregulation.
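The moment matching described above has a compact form: for reaction hazards h(x, c) and stoichiometry matrix S, the diffusion approximation (chemical Langevin equation) uses drift S h(x, c) and diffusion matrix S diag(h(x, c)) S^T. A minimal sketch for the Lotka-Volterra system, assuming the usual three-reaction formulation (prey birth, predation, predator death) with mass-action hazards and illustrative rate constants, is:

```python
def lv_hazards(x, c):
    """Mass-action hazards for prey birth, predation and predator death."""
    prey, pred = x
    return [c[0] * prey, c[1] * prey * pred, c[2] * pred]

# Stoichiometry: rows are species (prey, predator), columns are reactions.
S = [[1, -1, 0],
     [0, 1, -1]]

def cle_drift_diffusion(x, c):
    """Drift S h and diffusion S diag(h) S^T of the diffusion approximation."""
    h = lv_hazards(x, c)
    n_species, n_reac = len(S), len(h)
    drift = [sum(S[i][r] * h[r] for r in range(n_reac)) for i in range(n_species)]
    diffusion = [[sum(S[i][r] * h[r] * S[j][r] for r in range(n_reac))
                  for j in range(n_species)] for i in range(n_species)]
    return drift, diffusion

drift, diffusion = cle_drift_diffusion((50, 100), (1.0, 0.005, 0.6))
print(drift)      # [25.0, -35.0]
print(diffusion)  # [[75.0, -25.0], [-25.0, 85.0]]
```

The off-diagonal entries are negative because the predation reaction removes a prey and adds a predator in the same event; this correlation is what the SDE inherits from the jump process.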
Dynamical Models of Periodic Processes
Normal forms of dynamical systems can be used as nonlinear models for time series data. This talk describes the use of Floquet theory as a normal form for a stable periodic orbit. Primary goals of this joint research with Shai Revzen are to estimate the spectrum of Lyapunov exponents and to develop reduced order models based upon weakly stable modes of the system. The methods have been applied to animal locomotion, notably data from running cockroaches.
Inference for partially observed stochastic dynamic system
Ed Ionides, Statistics, University of Michigan
Presentation: http://mbi.osu.edu/2011/rasmaterials/mbi12_ionides.pdf

Characteristic features of biological dynamic systems include stochasticity, nonlinearity, measurement error, unobserved variables, unknown system parameters, and even unknown system mechanisms. I will consider the resulting inferential challenges, with particular reference to pathogen/host systems (i.e., disease transmission). I will focus on statistical inference methodology which is based on simulations from a numerical model; such methodology is said to have the plug-and-play property. Plug-and-play methodology frees the modeler from an obligation to work with models for which transition probabilities are analytically tractable. A recent advance in plug-and-play likelihood-based inference for general partially observed Markov process models has been provided by the iterated filtering algorithm. I will discuss the theory and practice of iterated filtering.
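Iterated filtering itself is beyond a short example, but its basic plug-and-play ingredient is a particle filter that only needs to simulate the latent process and evaluate the measurement density. The sketch below is a bootstrap particle filter estimating the log-likelihood for an assumed toy model: a linear Gaussian AR(1) latent state observed with Gaussian noise, chosen so that the estimate can be checked against an exact Kalman filter. The simulator is passed in as a function and is never asked for transition probabilities.

```python
import math
import random

def bootstrap_loglik(y, sim_init, sim_step, obs_logdens, n_particles, rng):
    """Bootstrap particle filter estimate of log p(y_1:T).

    Only simulation from the latent process (sim_init, sim_step) and the
    measurement log-density are required: the plug-and-play property."""
    particles = [sim_init(rng) for _ in range(n_particles)]
    loglik = 0.0
    for yt in y:
        particles = [sim_step(x, rng) for x in particles]     # propagate
        logw = [obs_logdens(yt, x) for x in particles]        # weight
        top = max(logw)
        w = [math.exp(lw - top) for lw in logw]               # stabilised weights
        loglik += top + math.log(sum(w) / n_particles)
        particles = rng.choices(particles, weights=w, k=n_particles)  # resample
    return loglik

def kalman_loglik(y, a, q, r, m=0.0, p=1.0):
    """Exact log-likelihood of the same linear Gaussian model, for checking."""
    ll = 0.0
    for yt in y:
        m, p = a * m, a * a * p + q                            # predict
        s = p + r
        ll -= 0.5 * (math.log(2 * math.pi * s) + (yt - m) ** 2 / s)
        gain = p / s
        m, p = m + gain * (yt - m), (1 - gain) * p             # update
    return ll

# Toy model (an assumption for illustration): X_0 ~ N(0, 1),
# X_t = a X_{t-1} + N(0, q), Y_t = X_t + N(0, r).
a, q, r = 0.8, 0.5, 1.0
rng = random.Random(42)
x, y = rng.gauss(0, 1), []
for _ in range(25):
    x = a * x + rng.gauss(0, math.sqrt(q))
    y.append(x + rng.gauss(0, math.sqrt(r)))

pf = bootstrap_loglik(
    y,
    sim_init=lambda g: g.gauss(0, 1),
    sim_step=lambda x, g: a * x + g.gauss(0, math.sqrt(q)),
    obs_logdens=lambda yt, x: -0.5 * (math.log(2 * math.pi * r) + (yt - x) ** 2 / r),
    n_particles=5000,
    rng=random.Random(7),
)
print(pf, kalman_loglik(y, a, q, r))
```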
Likelihood-based observability analysis and confidence intervals for model predictions
Dynamic models of biochemical networks contain unknown parameters like the reaction rates and the initial concentrations of the compounds. The large number of parameters as well as their nonlinear impact on the model responses hampers the determination of confidence regions for parameter estimates. At the same time, classical approaches translating the uncertainty of the parameters into confidence intervals for model predictions are hardly feasible.

We present the so-called prediction profile likelihood, which is used to generate reliable confidence intervals for model predictions. The prediction confidence intervals of the dynamic states are exploited for a data-based observability analysis. Moreover, a validation profile likelihood is introduced that can be applied to judge noisy validation experiments.

The presented approaches are also applicable if there are non-identifiable parameters. Such ambiguities yield insufficiently specified model predictions that can be interpreted as non-observability. The properties and applicability of the approach are demonstrated by two examples: a small but instructive ODE model, and a model for the MAP kinase signal transduction pathway.

Work done in collaboration with A. Raue and J. Timmer. This project has been funded by the BMBF grants VirtualLiver 0315766 and FRISYS 0313921.
Errors in variables models: Diagnosing parameter estimability and MCMC convergence using empirical characteristic functions
Most ecological models are constructed to understand the relationship between environmental variables and an ecological response, be it site occupancy, population abundance or changes to them. The usual regression models take into account the environmental variation in the response, but in many cases the environmental variables themselves are measured with error. This is called an errors-in-variables model. Measurement error in the covariates leads to substantial issues with parameter estimability, and likelihood-based inference is computationally challenging. Bayesian inference using Markov chain Monte Carlo methods also runs into trouble because of convergence issues with the MCMC algorithm. These convergence issues are especially severe with non-informative priors.

Errors-in-variables models, linear and non-linear, can be formulated as hierarchical models. Data cloning is a recently developed computational technique for conducting likelihood-based analysis of general hierarchical models. In this paper, we show that data cloning coupled with informative priors can circumvent the convergence issues with MCMC. We develop a new testing procedure to compare multivariate distributions using empirical characteristic functions and show its usefulness in diagnosing convergence of MCMC algorithms in these tricky situations. More importantly, we show that data cloning not only facilitates parameter estimation but also helps diagnose which parameters are estimable and which are not. This is essential for drawing scientifically meaningful inferences. We illustrate the method using various linear and non-linear regression models useful in ecology. We report a somewhat surprising result: a widely used population dynamics model, the Hassell model, is non-identifiable, but a closely related Generalized Beverton-Holt model is identifiable.

Work done in collaboration with Khurram Nadeem.
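The estimability diagnostic of data cloning can be seen without MCMC in a conjugate toy model, assumed here so that everything is available in closed form: observations y_i ~ N(a + b, 1) with independent N(0, 1) priors on a and b, so that only the sum a + b is identifiable. Cloning the n observations K times multiplies the data precision by K. For an estimable quantity the posterior variance then shrinks roughly like 1/K, while for a non-estimable one it levels off.

```python
def cloned_posterior_variances(n, k):
    """Posterior variances of a and of a + b when n observations of
    y ~ N(a + b, 1) are cloned k times, with N(0, 1) priors on a and b.

    The posterior precision is [[1 + N, N], [N, 1 + N]] with N = n * k."""
    big_n = n * k
    det = (1 + big_n) ** 2 - big_n ** 2          # equals 1 + 2N
    var_a = (1 + big_n) / det                    # diagonal of the inverse
    cov_ab = -big_n / det                        # off-diagonal of the inverse
    var_sum = 2 * var_a + 2 * cov_ab             # Var(a + b) = 2 / (1 + 2N)
    return var_a, var_sum

for k in (1, 10, 100, 1000):
    var_a, var_sum = cloned_posterior_variances(n=5, k=k)
    print(k, round(var_a, 4), round(var_sum, 6))
```

Var(a) approaches 1/2 however many clones are used, flagging a as non-estimable, while Var(a + b) keeps falling. In an actual data-cloning analysis these variances would be estimated from MCMC samples of the K-cloned posterior rather than computed exactly.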
Inferring transcriptional and microRNA-mediated regulatory programs in cancer
Large-scale cancer genomics projects are profiling hundreds of tumors at multiple molecular layers, including copy number, mRNA and miRNA expression, but the mechanistic relationships between these layers are often excluded from computational models. We developed a sparse regression framework for integrating molecular profiles with regulatory elements to reveal mechanisms of dysregulation of gene expression in cancer, including miRNA-mediated expression changes. We applied our approach to 320 glioblastoma tumors and identified key miRNAs and transcription factors as common or subtype-specific regulators. We confirmed that target gene expression signatures for proneural subtype regulators were consistent with in vivo expression changes in a relevant mouse model. We tested two predicted proneural drivers, miR-124 and miR-132, both underexpressed in proneural tumors, by overexpression in neurospheres and observed a partial reversal of the corresponding tumor expression changes. Computationally dissecting the role of miRNAs in cancer may ultimately lead to small RNA therapeutics tailored to subtype or individual.
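The abstract does not say which sparse regression method is used. As an assumption for illustration, the sketch below fits the lasso (L1-penalised least squares) by cyclic coordinate descent with soft-thresholding, one standard way of selecting a small set of candidate regulators from many. The synthetic data are hypothetical: only features 0 and 2 influence the response.

```python
import random

def soft_threshold(z, lam):
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_sweeps=200):
    """Minimise (1/2n)||y - X b||^2 + lam * ||b||_1 by coordinate descent.
    X is a list of rows; columns are assumed roughly centred (no intercept)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)                                  # y - X b with b = 0
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            # correlation of feature j with the partial residual excluding j
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != b[j]:
                for i in range(n):
                    resid[i] -= X[i][j] * (new - b[j])
                b[j] = new
    return b

rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(200)]
y = [3.0 * row[0] - 2.0 * row[2] + rng.gauss(0, 0.1) for row in X]
b = lasso_cd(X, y, lam=0.05)
print([round(v, 2) for v in b])
```

The soft-threshold step is what sets irrelevant coefficients exactly to zero; the retained coefficients are shrunk towards zero by roughly lam, the usual price of the L1 penalty.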
Temporal inhomogeneity and dependence of brain networks
Electroencephalography (EEG) measurements yield information about the electrical activity of the brain via measured voltage fluctuations. Measurements are normally made at several locations on the scalp, and the network of activity is inferred by analysing multiple time series that are neither linear nor stationary. We shall discuss how modern time series analysis methods can be adapted and extended to model, and make inferences about, the complicated joint time-varying properties of such observations.

This is joint work with Maria Fitzgerald (UCL) and Hernando Ombao (Brown).
Summary statistics for ABC model choice
Dennis Prangle, Mathematics & Statistics, Lancaster University
Presentation: http://mbi.osu.edu/2011/rasmaterials/MBI_DennisPrangle.pdf

ABC (approximate Bayesian computation) is a powerful method for inference in statistical models with intractable likelihoods. Recently there has been much interest in using ABC for model choice, and concerns have been raised that the results are not robust to the choice of summary statistics. We propose a method of choosing useful summary statistics and apply it to population genetic and epidemiological examples.
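A minimal sketch of ABC rejection for model choice, under purely illustrative assumptions (this is not the statistic-selection method proposed in the talk). Count data are compared under a Poisson model and a geometric model with a common prior on the mean, using the sample mean and variance as summary statistics. The posterior probability of each model is estimated as the fraction of the closest simulations that came from it. The variance summary is informative here because the two models differ in their variance-to-mean ratio; with the mean alone the models could barely be told apart, which is the robustness concern raised above.

```python
import math
import random

def poisson(rng, lam):
    """Knuth's multiplication method; adequate for small lam."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def geometric(rng, mean):
    """Number of failures before the first success, with the given mean."""
    p = 1.0 / (1.0 + mean)
    return int(math.log(1.0 - rng.random()) / math.log(1.0 - p))

def summaries(xs):
    m = sum(xs) / len(xs)
    return m, sum((v - m) ** 2 for v in xs) / (len(xs) - 1)

def abc_model_choice(observed, n_sims=4000, keep=100, seed=3):
    """Approximate P(Poisson | data) by keeping the `keep` simulations whose
    summaries are closest (Euclidean distance, unscaled) to the observed ones."""
    rng = random.Random(seed)
    s_obs = summaries(observed)
    draws = []
    for _ in range(n_sims):
        model = rng.randrange(2)                 # uniform prior over the two models
        mean = rng.uniform(0.5, 10.0)            # prior on the common mean
        sampler = poisson if model == 0 else geometric
        s = summaries([sampler(rng, mean) for _ in range(len(observed))])
        draws.append((math.hypot(s[0] - s_obs[0], s[1] - s_obs[1]), model))
    draws.sort()
    accepted = [m for _, m in draws[:keep]]
    return accepted.count(0) / keep

rng = random.Random(11)
observed = [poisson(rng, 4.0) for _ in range(50)]
p_poisson = abc_model_choice(observed)
print(p_poisson)
```

In practice summaries are usually scaled before computing distances; the unscaled distance here is a simplification that happens to suit this example.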
Inferring the gap between mechanism and phenotype in dynamical models of gene regulation
Dynamical (differential equation) models in molecular biology are often cast in terms of biological mechanisms such as transcription, translation and protein-protein and protein-DNA interactions. However, most molecular biological measurements are at the phenotypic level, such as levels of gene or protein expression in wild type and chemically or genetically perturbed systems. Mechanistic parameters are often difficult or impossible to measure. We have been combining dynamical models with statistical inference as a means to integrate phenotypic data with mechanistic hypotheses. In doing so we are able to identify key parameters that determine system behaviour, and parameters with insufficient evidence to estimate, and thus make informed predictions for further experimental work. We are also able to use inferred parameters to build stochastic and multi-scale models to investigate behaviour at single-cell level. We apply these ideas to two systems in microbiology: global gene regulation in the antibiotic-resistance bearing RK2 plasmids, and zinc uptake and efflux regulation in Escherichia coli.
Parameter estimation for bursting neural models
Bursting is a ubiquitous phenomenon in neuroscience which involves multiple time scales (fast spikes vs. long quiescent intervals). Parameter estimation for bursting models is difficult due to these multiple scales. I will describe an approach to parameter estimation for these models which utilizes the geometry underlying bursting. This is joint work with John Guckenheimer.
Modeling and inference framework for studying noise and cell-to-cell variability in synthetic biology
Synthetic biology faces many challenges in its bold attempt to construct functioning and programmable devices, circuits, and higher constructs. Chief amongst these may be selecting from multiple design possibilities given partially characterized biological systems, effects of biological noise and cell-to-cell variability. Here we report progress on developing modeling and inference techniques that incorporate noise and cell-to-cell variability to guide design principles in synthetic biology.
Sloppy Models, Information Geometry, and Data Fitting
Parameter estimation by nonlinear least squares minimization is a ubiquitous problem that has an elegant geometric interpretation: all possible parameter values induce a manifold embedded within the space of data. The minimization problem is then to find the point on the manifold closest to the data. By interpreting nonlinear models as a generalized interpolation scheme, we find that the manifolds of many models, known as sloppy models, have boundaries and that their widths form a hierarchy. We describe this universal structure as a hyper-ribbon. The hyper-ribbon structure explains many of the difficulties associated with fitting nonlinear models and suggests improvements to standard algorithms. We add a "geodesic acceleration" correction to the standard Levenberg-Marquardt algorithm and observe a dramatic increase in success rate and convergence speed on many fitting problems.
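A minimal sketch of Levenberg-Marquardt with a geodesic-acceleration correction, under stated assumptions: a two-parameter exponential decay y = A exp(-k t), an analytic Jacobian, a finite-difference estimate of the second directional derivative of the residuals, and simple damping updates (divide by 3 on success, multiply by 2 on failure). The acceptance test 2||a|| / ||v|| <= alpha with alpha = 0.75 follows our understanding of the published method; the other constants are arbitrary choices for illustration.

```python
import math

def residuals(theta, t, y):
    amp, rate = theta
    return [amp * math.exp(-rate * ti) - yi for ti, yi in zip(t, y)]

def jacobian(theta, t):
    amp, rate = theta
    return [[math.exp(-rate * ti), -amp * ti * math.exp(-rate * ti)] for ti in t]

def cost(theta, t, y):
    return 0.5 * sum(r * r for r in residuals(theta, t, y))

def damped_solve(J, vec, lam):
    """Solve (J^T J + lam * diag(J^T J)) d = -J^T vec for the 2-vector d."""
    g00 = sum(row[0] * row[0] for row in J) * (1 + lam)
    g11 = sum(row[1] * row[1] for row in J) * (1 + lam)
    g01 = sum(row[0] * row[1] for row in J)
    b0 = -sum(row[0] * v for row, v in zip(J, vec))
    b1 = -sum(row[1] * v for row, v in zip(J, vec))
    det = g00 * g11 - g01 * g01
    return [(g11 * b0 - g01 * b1) / det, (g00 * b1 - g01 * b0) / det]

def lm_geodesic(theta, t, y, lam=1e-3, alpha=0.75, h=0.1, max_iter=500):
    for _ in range(max_iter):
        r, J = residuals(theta, t, y), jacobian(theta, t)
        v = damped_solve(J, r, lam)                  # velocity: the ordinary LM step
        if math.hypot(*v) < 1e-12:
            break
        try:
            # second directional derivative of r along v by finite differences:
            # r(theta + h v) ~ r + h J v + (h^2 / 2) r_vv
            r_h = residuals([theta[0] + h * v[0], theta[1] + h * v[1]], t, y)
            r_vv = [(2.0 / h) * ((rh - ri) / h - (row[0] * v[0] + row[1] * v[1]))
                    for rh, ri, row in zip(r_h, r, J)]
            a = damped_solve(J, r_vv, lam)           # geodesic acceleration
            new = [theta[0] + v[0] + 0.5 * a[0], theta[1] + v[1] + 0.5 * a[1]]
            accept = (2 * math.hypot(*a) <= alpha * math.hypot(*v)
                      and cost(new, t, y) < cost(theta, t, y))
        except OverflowError:                        # wildly large trial step
            accept = False
        if accept:
            theta, lam = new, lam / 3.0
        else:
            lam *= 2.0
    return theta

t = [0.2 * i for i in range(21)]
y = [2.5 * math.exp(-1.3 * ti) for ti in t]
theta_hat = lm_geodesic([1.0, 0.5], t, y)
print(theta_hat)
```

The acceleration term costs one extra residual evaluation per iteration and no extra Jacobian; steps whose acceleration is large relative to their velocity are rejected, which keeps the algorithm from striding off the curved model manifold.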
Statistical Methods for High-Dimensional ODE Models for Dynamic Gene Regulatory Networks
Gene regulation is a complicated process. The interaction of many genes and their products forms an intricate biological network. Identification of this dynamic network will help us understand the biological process in a systematic way. However, the construction of such a dynamic network is very challenging for a high-dimensional system. We propose to use a set of ordinary differential equations (ODE), coupled with dimensional reduction by clustering and mixed-effects modeling techniques, to model the dynamic gene regulatory network (GRN). The ODE models allow us to quantify both positive and negative gene regulations as well as feedback effects of one set of genes in a functional module on the dynamic expression changes of the genes in another functional module, which results in a directed graph network. A five-step procedure, Clustering, Smoothing, regulation Identification, parameter Estimates refining and Function enrichment analysis (CSIEF) is developed to identify the ODE-based dynamic GRN. In the proposed CSIEF procedure, a series of cutting-edge statistical methods and techniques are employed. We apply the proposed method to identify the dynamic GRN for yeast cell cycle progression data and immune response to influenza infection. We are able to annotate the identified modules through function enrichment analyses. Some interesting biological findings are discussed. The proposed procedure is a promising tool for constructing a general dynamic GRN and more complicated dynamic networks.
Bayesian inference for generalized stochastic population growth models with application to aphids
Colin Gillespie

Estimation of ordinary differential equations with orthogonal conditions
Nicolas Brunel

Using Bayesian and MCMC approaches for parameter estimation and model evaluation in physiology
Carson Chow
Presentation: http://mbi.osu.edu/2011/rasmaterials/mbibayes20121_chow.pdf

Inference for partially observed stochastic dynamic system
Ed Ionides, Statistics, University of Michigan
Presentation: http://mbi.osu.edu/2011/rasmaterials/mbi12_ionides.pdf

Inference in Mixed-Effects (and other) Models Through Profiling the Objective
Douglas Bates, Department of Statistics, University of Wisconsin - Madison
Presentation (slides version): http://mbi.osu.edu/2011/rasmaterials/ProfilingD.pdf
Presentation (notes version): http://mbi.osu.edu/2011/rasmaterials/ProfilingN.

Errors in variables models: Diagnosing parameter estimability and MCMC convergence using empirical characteristic functions
Subhash Lele

Particle MCMC for Stochastic Kinetic Models
Andrew Golightly, School of Mathematics & Statistics, Newcastle University
Presentation: http://mbi.osu.edu/2011/rasmaterials/AGmbi12.pdf

Summary statistics for ABC model choice
Dennis Prangle, Mathematics & Statistics, Lancaster University
Presentation: http://mbi.osu.edu/2011/rasmaterials/MBI_DennisPrangle.pdf

Modeling and inference for gene expression time series data (an overview)
Barbel Finkenstadt

Likelihood-based observability analysis and confidence intervals for model predictions
Clemens Kreutz

Parameter estimation for bursting neural models
Joe Tien