The workshop in phylogeography and phylogenetics will focus on the maturation of quantitative techniques that needs to occur in these fields. Analytical development is a challenge for researchers seeking clear and unambiguous inferences because both fields use complicated multiparameterized models. A given pattern of genetic diversity between and among species or populations can usually be explained and produced by different scenarios. Maturation of phylogenetic methodologies will be critical if we hope to study such things as the tree of life, linking phenotypic and historical evolution, ancestral character state reconstruction, viral evolution, and the evolution of regulation in protein expression. Likewise, solving the analytical and computational challenges necessary for phylogeographic inferences will be critical for studying dispersal distances, mating systems, sex-biased dispersal, pathogen history, speciation, selection, local adaptation, hybridization, community history, food web stability, the origin of human pathogens, and the evolutionary history of humans.
|Saturday, November 26, 2005|
|Sunday, November 27, 2005|
|Monday, November 28, 2005|
|Tuesday, November 29, 2005|
|Wednesday, November 30, 2005|
|Allman, Elizabeth||firstname.lastname@example.org||Department of Mathematics & Statistics, University of Southern Maine|
|Baird, Stuart||email@example.com||Campus International de Baillarguet, Centre de Biologie et de Gestion des Populations|
|Barrett, Craig||firstname.lastname@example.org||Evolution, Ecology, and Organismal Biology, The Ohio State University|
|Beaumont, Mark||email@example.com||School of Animal & Microbial Sciences, University of Reading|
|Beerli, Peter||firstname.lastname@example.org||Computational Evolutionary Biology Group, Florida State University|
|Belfiore, Natalia||email@example.com||Museum of Vertebrate Zoology, University of California, Berkeley|
|Best, Janet||firstname.lastname@example.org||Mathematics, The Ohio State University|
|Brandley, Matthew||email@example.com||Museum of Vertebrate Zoology, University of California, Berkeley|
|Calvino, Carolina||firstname.lastname@example.org||Department of Plant Biology, University of California, Berkeley|
|Cannon, Chuck||email@example.com||Department of Biological Sciences, Texas Tech University|
|Carnaval, Ana Carolina||firstname.lastname@example.org||Museum of Vertebrate Zoology, University of California, Berkeley|
|Carstens, Bryan||email@example.com||Ecology and Evolutionary Biology, University of Michigan|
|Corey, Sarah Jean||firstname.lastname@example.org||Evolution, Ecology, and Organismal Biology, The Ohio State University|
|Deckelman, Steven||email@example.com||Mathematical Biosciences Institute, The Ohio State University|
|Degnan, James||firstname.lastname@example.org||Department of Biostatistics, Harvard University|
|Edwards, Scott||email@example.com||Organismic & Evolutionary Biology, Harvard University|
|Enciso, German||German_Enciso@hms.harvard.edu||Mathematics Department, University of California, Irvine|
|Farrington, Heather||firstname.lastname@example.org||Department of Biological Sciences, University of Cincinnati|
|Fuchs de Jesus, Flavia||Genetica & Evolucao, State University of Campinas (UNICAMP)|
|Galovich, Jennifer||email@example.com||Mathematics and Statistics, St. John's University|
|Goel, Pranay||firstname.lastname@example.org||NIDDK, Indian Institute of Science Education and Research|
|Grajdeanu, Paula||email@example.com||Mathematics, Shenandoah University|
|Griffiths, Robert||firstname.lastname@example.org||Department of Statistics, University of Oxford|
|Hickerson, Michael||mhick@socrates.Berkeley.edu||Integrative Biology, University of California, Berkeley|
|Holmes, Susan||email@example.com||Department of Statistics, Stanford University|
|Huelsenbeck, John||firstname.lastname@example.org||Division of Biological Sciences, University of California, San Diego|
|Jolles, Diana||email@example.com||Evolution, Ecology, and Organismal Biology, The Ohio State University|
|Just, Winfried||firstname.lastname@example.org||Mathematical Biosciences Institute, The Ohio State University|
|Juswara, Lina||email@example.com||Evolution, Ecology, and Organismal Biology, The Ohio State University|
|King, Nicole||firstname.lastname@example.org||Department of Molecular & Cell Biology, University of California, Berkeley|
|Knowles, Lacey||email@example.com||Department of Ecology & Evolutionary Biology, University of Michigan|
|Kuhner, Mary||firstname.lastname@example.org||Department of Genome Sciences, University of Washington|
|Larget, Bret||email@example.com||Department of Botany, University of Wisconsin|
|Lim, Sookkyung||firstname.lastname@example.org||Department of Mathematical Sciences, University of Cincinnati|
|Liu, Liang||LiuLiang@stat.ohio-state.edu||Department of Statistics, The Ohio State University|
|Marschall, Elizabeth||email@example.com||Evolution, Ecology, and Organismal Biology, The Ohio State University|
|Martin, Floyd||firstname.lastname@example.org||Evolution, Ecology, and Organismal Biology, The Ohio State University|
|Mateiu, Ligia||Medical Genetics, University of Alberta|
|McLachlan, Jason||email@example.com||Harvard Forest, Harvard University|
|Moritz, Craig||firstname.lastname@example.org||Integrative Biology, University of California, Berkeley|
|Niedzwiecki, John||niedzwjh@UCMAIL.UC.EDU||Department of Biological Sciences, University of Cincinnati|
|Oakley, Todd||email@example.com||Ecology, Evolution, & Marine Biology, University of California, Santa Barbara|
|Pan, Xueliang (Jeff)||firstname.lastname@example.org||Department of Statistics, The Ohio State University|
|Pearl, Dennis||email@example.com||Department of Statistics, The Ohio State University|
|Petren, Kenneth||firstname.lastname@example.org||Department of Biological Sciences, University of Cincinnati|
|Pol, Diego||email@example.com||Independent Researcher, Museo Paleontologico E. Feruglio|
|Pollack, D. Dennis||firstname.lastname@example.org||Molecular Virology, Immunology, & Medical Genetics, The Ohio State University|
|Porter, Mason||email@example.com||Department of Physics, California Institute of Technology|
|Randle, Chris||firstname.lastname@example.org||Ecology and Evolutionary Biology, University of Kansas|
|Rhodes, John||email@example.com||Department of Mathematics, Bates College|
|Rokas, Antonis||firstname.lastname@example.org||HHMI and Laboratory of Molecular Biology, University of Wisconsin|
|Rosenberg, Noah||email@example.com||College of Biological Sciences, University of Southern California|
|Russell, Amy||firstname.lastname@example.org||Arizona Research Laboratories, University of Arizona|
|Salter Kubatko, Laura||email@example.com||Department of Statistics, University of New Mexico|
|Schugart, Richard||firstname.lastname@example.org||Department of Mathematics, Western Kentucky University|
|Srinivasan, Parthasarathy||email@example.com||Department of Mathematics, Cleveland State University|
|Stahl, Eli||firstname.lastname@example.org||Department of Biology, University of Massachusetts|
|Steel, Mike||Math and Statistics, University of Canterbury|
|Stigler, Brandilyn||email@example.com||Department of Mathematics, Southern Methodist University|
|Stubna, Michael||firstname.lastname@example.org||Engineering Team Leader, Pulsar Informatics|
|Suchard, Marc||email@example.com||Biomathematics and Human Genetics, University of California, Los Angeles|
|Tay, David||firstname.lastname@example.org||Evolution, Ecology, and Organismal Biology, The Ohio State University|
|Taylor, Amelia||email@example.com||Department of Math, St. Olaf College|
|Terman, David||firstname.lastname@example.org||Mathematics Department, The Ohio State University|
|Thornton, Kevin||email@example.com||Molecular Biology and Genetics, Cornell University|
|Tian, Jianjun (Paul)||firstname.lastname@example.org||Mathematical Biosciences Institute, The Ohio State University|
|Vakalis, Ignatios||email@example.com||Mathematics & Computer Sc, Capital University|
|Wang, Zailong||firstname.lastname@example.org||Integrated Information Sciences, Novartis|
|Warnow, Tandy||email@example.com||Department of Computer Sciences, University of Texas|
|Webb, Campbell||firstname.lastname@example.org||Ecology and Evolutionary Biology, Yale University|
|Williams, Joseph||email@example.com||Evolution, Ecology, and Organismal Biology, The Ohio State University|
|Yoder, Anne||firstname.lastname@example.org||Ecology and Evolutionary Biology, Duke University|
|Yoshida, Ruriko||email@example.com||Department of Mathematics, Duke University|
|Zhou, Jin||firstname.lastname@example.org||Department of Mathematics, Northern Michigan University|
This talk will highlight recent developments in the study of phylogenetic invariants. In particular, assuming a general model of the mutation process of orthologous sequences, 'most' polynomial relationships in expected pattern frequencies can be explicitly constructed. These constructions are tied to specific topological features (edges and nodes) of a phylogenetic tree. This new understanding of invariants leads to theoretical results on the identifiability of the tree topology for models with increased biological realism, such as the covarion model and certain mixture models.
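As a small illustration of how an invariant can be tied to an edge of the tree, the sketch below uses one well-known family (rank conditions on "flattenings" of the pattern-frequency table); the abstract does not say which constructions the talk covers. It assumes a 2-state general Markov model on the quartet tree ((1,2),(3,4)), and every parameter value is hypothetical. The flattening along the true split 12|34 has rank at most 2, so all of its 3x3 minors vanish, whereas the flattening along the false split 13|24 generically has full rank.

```python
import numpy as np

def transition(p, q):
    """2-state Markov matrix with off-diagonal probabilities p and q."""
    return np.array([[1.0 - p, p], [q, 1.0 - q]])

# Hypothetical parameters for the quartet tree ((1,2),(3,4)),
# rooted at the internal node a; b is the other internal node.
root = np.array([0.4, 0.6])                               # distribution at node a
M1, M2 = transition(0.10, 0.20), transition(0.15, 0.10)   # a -> leaf 1, a -> leaf 2
Mab = transition(0.20, 0.25)                              # internal edge a -> b
M3, M4 = transition(0.10, 0.15), transition(0.20, 0.10)   # b -> leaf 3, b -> leaf 4

# Expected site-pattern frequencies P[i, j, k, l] at leaves 1..4,
# obtained by summing over the hidden states at nodes a and b.
P = np.einsum("a,ai,aj,ab,bk,bl->ijkl", root, M1, M2, Mab, M3, M4)

# Flatten the 2x2x2x2 table into 4x4 matrices along two different splits.
flat_12_34 = P.reshape(4, 4)                         # rows (i, j), columns (k, l)
flat_13_24 = P.transpose(0, 2, 1, 3).reshape(4, 4)   # rows (i, k), columns (j, l)

rank_true_split = np.linalg.matrix_rank(flat_12_34, tol=1e-10)
rank_false_split = np.linalg.matrix_rank(flat_13_24, tol=1e-10)
print("rank along 12|34:", rank_true_split)
print("rank along 13|24:", rank_false_split)
```

With empirical pattern frequencies the ranks are never exact, so in practice one would compare how close each flattening is to low rank rather than test for equality.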
Wright's neighborhood size can be seen as a statement about the probability of coalescence of lineages integrated over space. Moving backwards in time, neighborhood size increases and the probability of coalescence decreases. As such, Wright's neighborhood model could potentially be used for coalescent inference over structured populations parameterised by parent-offspring dispersal and population density. This is in contrast to models parameterised by the size of panmictic units and migration vectors between them. If we wish to use coalescent inference over a study system, and lack prior knowledge of the scale at which panmixis can be assumed, Wright's neighborhood model seems appropriate. Here I show how Wright's neighborhood model can be implemented on a lattice, allowing sampling of the properties of genealogies in space and time for a set of georeferenced field observations. I contrast two sampling approaches that allow Bayesian inference over these genealogies and discuss the implications for inference over recent timescales (gene flow, population structure) and deeper timescales (phylogeographic process).
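For readers unfamiliar with the two parameters mentioned here, the standard textbook form of Wright's neighborhood size in a two-dimensional continuous population is 4πσ²D, where σ is the standard deviation of parent-offspring dispersal along one axis and D is population density. The sketch below only evaluates that formula; it is not the lattice implementation described in the talk, and the function name and example values are hypothetical.

```python
import math

def neighborhood_size(sigma, density):
    """Wright's neighborhood size, 4 * pi * sigma^2 * D, in two dimensions.

    sigma   : std. dev. of parent-offspring dispersal along one axis
    density : individuals per unit area (same length unit as sigma)
    """
    if sigma < 0 or density < 0:
        raise ValueError("sigma and density must be non-negative")
    return 4.0 * math.pi * sigma ** 2 * density

# Hypothetical example: sigma = 50 m, density = 0.002 individuals per m^2.
print(neighborhood_size(50.0, 0.002))
```

For these example values the neighborhood contains roughly 63 individuals.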
I describe a Bayesian method that uses summary statistics measured from microsatellite loci to make inferences about demographic parameters in 2- and 3-population models. Preliminary results with an infinite sites model of sequence evolution are also described. The method can be used to infer effective sizes of current and ancestral populations, immigration rates, splitting times and tree topology (in the 3-population case). A novel method for model selection is introduced. Comparisons are made with the IM program of Hey and Nielsen, and a data set of 19 microsatellite loci from Channel Island foxes is analysed. It is concluded that the method is competitive with IM on 2-population data. There appears to be little scope for accurate inference with microsatellite data unless very large numbers of loci are used.
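One common way to do Bayesian inference from summary statistics is rejection-based approximate Bayesian computation (ABC). The abstract does not spell out the algorithm used, so the sketch below is only a generic toy under stated assumptions: the "data" are Poisson counts, the summary statistic is the sample mean, the prior is uniform, and all names and settings are hypothetical rather than taken from the talk.

```python
import numpy as np

def abc_rejection(observed, n_draws=20000, prior_max=20.0, tolerance=0.2, seed=1):
    """Toy ABC rejection sampler for a Poisson rate.

    Draw a rate from a uniform prior, simulate a data set of the same size
    as `observed`, and keep the rate if the simulated summary statistic
    (the sample mean) is within `tolerance` of the observed one.
    """
    rng = np.random.default_rng(seed)
    obs_stat = observed.mean()
    rates = rng.uniform(0.0, prior_max, size=n_draws)           # prior draws
    sims = rng.poisson(rates[:, None], size=(n_draws, observed.size))
    sim_stats = sims.mean(axis=1)                               # one summary per draw
    keep = np.abs(sim_stats - obs_stat) < tolerance
    return rates[keep]                                          # approximate posterior sample

# Hypothetical "observed" data generated with a true rate of 4.
data_rng = np.random.default_rng(0)
observed = data_rng.poisson(4.0, size=50)

posterior = abc_rejection(observed)
print("accepted draws:", len(posterior))
print("posterior mean:", posterior.mean())
```

In a population-genetic setting the Poisson simulator would be replaced by a coalescent simulator for the 2- or 3-population model, and the sample mean by a vector of summary statistics computed from the loci.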
The results of human action, from the scale of the climate to the niche, will dominate our evolutionary future. Meaningful ways of intersecting theoretical and empirical studies with conservation and management of our natural resources are important. Using tropical tree communities as an example, the application of phylogenetic and biogeographic evidence to mitigating some of this change will be discussed. Emergent questions, with implications for the utility of this data, will be explored. These questions have inspired the development of a DNA microarray based technique for gathering genomic samples of neutral variation in previously unstudied organisms. The approach, called Hyperdispersed Illiterate Primer Screening (HIPS), will be particularly effective for developing a database of genomic signatures that can allow phylogenetically scalable queries, virtual subtractive hybridizations, and the rapid development of simple downstream bioassays for screening large numbers of individuals.
The problem of inferring trees of closely related species from multilocus data sets suffers from a lack of robust implementations of existing theory and from a lack of empirical data to help set priorities for new directions. We have been accumulating multilocus DNA sequence data sets of anonymous, noncoding regions of Australian songbird genomes to examine the historical demography of speciation and population structure. Using two data sets from northern Australia, one from grassfinches (Poephila) and one from treecreepers (Climacteris), I illustrate the potential of anonymous loci to provide a higher resolving power for current and ancestral population parameters than mitochondrial DNA, and for inferring relationships among closely related species when gene trees conflict with one another. However, our studies also pinpoint several gaps in existing software packages that prevent full exploration of the data. In particular, our data reveal the need for an integrated approach to estimating the sequence of speciation events (species phylogeny) from multilocus data sets that does not require a priori assumptions. In addition, the data sets reveal a need for analyses of gene flow that can encompass more than one species even when there is no current gene flow between those species. These studies, like those in Drosophila and humans, show that even phylogeographic analyses focused on single species will in general require analysis of sequence data from multiple species, especially those that continue to share residual polymorphisms with the focal species, and will require implementations of theory that can accommodate multispecies data sets.
A unique gene tree describing the mutation history of a sample of DNA sequences can be constructed as a perfect phylogeny under an assumption of non-recurrent point mutations. An empirical distribution of the stochastic history of the gene tree, conditional on its topology, can be found by an advanced simulation technique of importance sampling on coalescent histories. The distribution of the time to the most recent common ancestor and the ages of mutations in the gene tree, conditional on its topology, can be found from this empirical distribution. This talk will present examples of ancestral inference from gene trees and microsatellite data, and sketch the importance sampling technique.
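For orientation, the sketch below simulates the time to the most recent common ancestor under the standard neutral coalescent, without conditioning on any gene tree. It is therefore not the importance sampling technique of the talk, only the unconditional distribution that such a sampler reweights. Time is measured in coalescent units in which each pair of lineages coalesces at rate 1; the sample size and seed are hypothetical.

```python
import random

def simulate_tmrca(n, rng):
    """Time to the most recent common ancestor of n sampled lineages.

    While k lineages remain, the waiting time to the next coalescence is
    exponential with rate k * (k - 1) / 2 (one unit of rate per pair).
    """
    total = 0.0
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2.0
        total += rng.expovariate(rate)
    return total

rng = random.Random(42)
n = 10
draws = [simulate_tmrca(n, rng) for _ in range(20000)]
mean_tmrca = sum(draws) / len(draws)
expected = 2.0 * (1.0 - 1.0 / n)   # theoretical mean, 2 * (1 - 1/n)

print("simulated mean TMRCA:", mean_tmrca)
print("theoretical mean    :", expected)
```

Conditioning on an observed gene tree changes this distribution, which is why a weighted (importance sampling) scheme is needed in place of plain simulation.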
Conditioning out phylogenetic information in HIV sequences, we performed multivariate studies of eventual drug-resistant mutations using multidimensional scaling and correspondence analysis methods. We propose several approaches to the problem of correlated variables in this context.
Most methods for detecting Darwinian natural selection at the molecular level rely on estimating the rates or numbers of nonsynonymous and synonymous changes in an alignment of protein-coding DNA sequences. In some of these methods, the nonsynonymous rate of substitution is allowed to vary across the sequence, permitting the identification of single amino-acid positions that are under positive natural selection. However, it is unclear which probability distribution should be used to describe how the nonsynonymous rate of substitution varies across the sequence. One widely used solution is to model variation in the nonsynonymous rate across the sequence as a mixture of several discrete or continuous probability distributions. Unfortunately, there is little population genetics theory to inform us of the appropriate probability distribution for among-site variation in the nonsynonymous rate of substitution. Here, we describe an approach to modeling variation in the nonsynonymous rate of substitution using a Dirichlet process mixture model. The Dirichlet process allows there to be a countably infinite number of nonsynonymous rate classes, and is very flexible in accommodating different potential distributions for the nonsynonymous rate of substitution. We implemented the model in a fully Bayesian approach, with all parameters of the model considered as random variables.
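To make the idea of an unbounded number of rate classes concrete, the sketch below draws a partition of sites from the Chinese restaurant process, which is the prior on partitions implied by a Dirichlet process, and then gives each class its own rate. It covers the prior only, not the authors' Bayesian implementation; the concentration parameter `alpha`, the exponential base distribution for the rates, and all names are assumptions made for illustration.

```python
import random

def crp_assignments(n_sites, alpha, rng):
    """Assign sites to rate classes via the Chinese restaurant process.

    Site i joins an existing class with probability proportional to the
    class size, or opens a new class with probability proportional to alpha.
    """
    counts = []   # counts[c] = number of sites currently in class c
    labels = []   # labels[i] = class index of site i
    for i in range(n_sites):
        r = rng.uniform(0.0, i + alpha)   # total weight is i + alpha
        cumulative = 0.0
        for c, size in enumerate(counts):
            cumulative += size
            if r < cumulative:
                counts[c] += 1
                labels.append(c)
                break
        else:
            # No existing class was chosen: open a new one.
            counts.append(1)
            labels.append(len(counts) - 1)
    return labels

rng = random.Random(7)
labels = crp_assignments(200, alpha=1.0, rng=rng)
n_classes = max(labels) + 1

# Hypothetical base distribution: one exponential(1) rate per class.
class_rates = [rng.expovariate(1.0) for _ in range(n_classes)]
site_rates = [class_rates[c] for c in labels]

print("number of rate classes:", n_classes)
```

Larger values of `alpha` favour more classes, so the data, rather than a fixed choice made in advance, can determine how many distinct nonsynonymous rates are supported.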
It is now well known that incomplete lineage sorting can cause serious difficulties for phylogenetic and phylogeographic inference. Yet, little attention has been paid to methods that attempt to overcome these difficulties by explicitly considering the processes that produce them. Here I explore approaches to historical inference designed to consider retention and sorting of ancestral polymorphism. I examine how the reconstructability of species (or population) histories is affected by (a) the number of loci used to estimate the phylogeny and (b) the number of individuals sampled per species (or population). Even in difficult cases with considerable incomplete lineage sorting (divergence times separated by less than 1Ne generations), accurate historical reconstructions are possible, as long as reasonable numbers of individuals and loci are sampled. Moreover, tradeoffs between sampling more loci versus more individuals shift depending on the depth of the species history under study. Taken together, these results demonstrate that gene sequences retain enough signal to achieve an accurate estimate of history despite widespread incomplete lineage sorting. Continued methodological improvements for inference near the species level require not only a statistical framework for evaluating the likelihood of particular gene trees, but also a shift to compound models that consider the molecular evolutionary process of nucleotide substitutions, as well as the population genetics processes of lineage sorting.
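To give a sense of how severe lineage sorting is at these time scales, the sketch below evaluates the classical three-species result: with one lineage sampled per species, the probability that a gene tree matches the species tree is 1 - (2/3)·exp(-T), where T is the internal branch length in coalescent units. The conversion T = t / (2Ne) assumes a diploid, constant-size, panmictic ancestral population with no gene flow; these assumptions and the function name are illustrative and are not taken from the talk.

```python
import math

def prob_concordant(internal_branch_generations, effective_size):
    """Probability that a 3-taxon gene tree matches the species tree.

    Assumes one sampled lineage per species and a diploid ancestral
    population of constant effective size, so the internal branch length
    in coalescent units is t / (2 * Ne).
    """
    t = internal_branch_generations / (2.0 * effective_size)
    return 1.0 - (2.0 / 3.0) * math.exp(-t)

ne = 10000  # hypothetical effective population size
for generations in (0, ne, 4 * ne, 10 * ne):
    print(generations, round(prob_concordant(generations, ne), 3))
```

For an internal branch of 1Ne generations the probability is about 0.60, so roughly four in ten loci are expected to show a discordant gene tree, which is why sampling multiple loci and individuals matters.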
The phylogenetic relationships among most metazoan phyla remain uncertain. Here, we obtained large numbers of gene sequences from metazoans, including key understudied taxa. Despite the amount of data and breadth of taxa analyzed, relationships among most metazoan phyla remained unresolved. In contrast, the same genes robustly resolved phylogenetic relationships within a major clade of Fungi of approximately the same age as the Metazoa. The differences in resolution within the two Kingdoms suggest that the early history of metazoans was a radiation compressed in time, in agreement with paleontological inferences. Furthermore, simulation analyses as well as studies of other radiations in deep time indicate that, given adequate sequence data, the lack of resolution in phylogenetic trees is a signature of closely spaced series of cladogenetic events.
Random models for species formation and loss have played an important role in evolutionary biology since Yule's pioneering work in the 1920s. More recently these models have been investigated for the light they shed on both the topological properties (shape, balance, clade distribution, discrete tree reconstruction, tree rooting) and metric properties (branch length distribution, phylogenetic diversity) of phylogenies. In this talk I describe how these models are relevant for tree reconstruction and rooting, and the distribution of clade sizes, as well as the loss (and optimization) of phylogenetic diversity as taxa go extinct. The talk will include some historical survey, as well as some recent (and new) results.
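One classical clade-size property of the Yule (pure-birth) model can be checked with a few lines of simulation: for a tree with n tips, the number of tips on one side of the root is uniformly distributed on 1, ..., n-1. The sketch below tracks only the two root clades, relying on the fact that under the Yule model every extant tip is equally likely to be the next to split. It is a generic illustration, not one of the new results of the talk, and the names and settings are hypothetical.

```python
import random
from collections import Counter

def yule_root_split(n_tips, rng):
    """Size of the left root clade in a Yule tree with n_tips tips (n_tips >= 2).

    After the first split each side has one tip. Every later speciation
    picks a tip uniformly at random, so it falls on the left side with
    probability left / (left + right).
    """
    left, right = 1, 1
    while left + right < n_tips:
        if rng.random() < left / (left + right):
            left += 1
        else:
            right += 1
    return left

rng = random.Random(3)
n_tips = 10
sizes = [yule_root_split(n_tips, rng) for _ in range(20000)]
frequencies = Counter(sizes)
mean_size = sum(sizes) / len(sizes)

for size in range(1, n_tips):
    print(size, frequencies[size] / len(sizes))
```

Each printed frequency should be close to 1/9 for ten tips, which shows that highly unbalanced root splits are as likely as balanced ones under this model.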
Genomics research is generating vast molecular sequence data ranging from single genes to whole genomes across an increasing number of species. However, a fundamental difficulty in evolutionary studies emerges as the availability of sequences expands. Phylogenetic methods to reconstruct the evolutionary tree relating the sequences traditionally condition on a single, sometimes poorly estimated sequence alignment, where an alignment specifies which residues in the sequences derive from a common origin. This conditioning can cause bias and inappropriate inference in genomic studies, particularly when the sequences are highly diverse. For example, the early branching order of Bacteria, Archaea and Eukaryotes, the three major domains of life, is troublesome to determine.
As a solution, I describe a novel Bayesian model for simultaneously estimating alignments and the phylogenetic trees that relate the sequences. This sidesteps the bias issue inherent in sequential estimation. Joint estimation also allows one to model rate variation between sites when estimating the alignment and to use the evidence in shared insertion/deletions (indels) in the sequences to group sister species in the tree. I base this indel process on a Hidden Markov Model that makes use of affine gap penalties and considers indels of multiple residues.
I develop a Markov chain Monte Carlo (MCMC) method to sample from the posterior of the joint model, estimating the most probable alignment and tree and their support simultaneously. I describe a new MCMC transition kernel based on the Forward-Backward algorithm and a careful choice of parameter marginalization that improves our algorithm's mixing efficiency, allowing the MCMC chains to converge even when started from arbitrary alignments. Finally, my software implementation can estimate alignment uncertainty and I describe a method for summarizing this uncertainty in a single plot.
Phylogenetic trees, also known as evolutionary trees, model the evolution of biological species or genes from a common ancestor. Most computational problems associated with phylogenetic tree reconstruction are very hard (specifically, they are NP-hard, and are practically hard, as real datasets can take years of analysis, without provably optimal solutions being found). Finding ways of speeding up the solutions to these problems is of major importance to systematic biologists. Other approaches take only polynomial time and have provable performance guarantees under Markov models of evolution; however, our recent work shows that the sequence lengths that suffice for these methods to be accurate with high probability grow exponentially in the diameter of the underlying tree.
In this talk, we will describe new dataset decomposition techniques, called the Disk-Covering Methods, for phylogenetic tree reconstruction. This basic algorithmic technique uses interesting graph theory, and can be used to reduce the sequence length requirement of polynomial time methods, so that polynomial length sequences suffice for accuracy with high probability (instead of exponential). We also use this technique to speed up the solution of NP-hard optimization problems, such as maximum likelihood and maximum parsimony.