Workshop 1 Titles and Abstracts
Author: Elizabeth S. Allman, University of
Alaska, Fairbanks / University of Southern Maine (Joint work with
J. Rhodes)
Title: Progress and potential for phylogenetic invariants
Presentation materials: PDF
Streaming Video: Real
Media
This talk will highlight recent developments in the study of phylogenetic
invariants. In particular, assuming a general model of the mutation
process of orthologous sequences, 'most' polynomial relationships
in expected pattern frequencies can be explicitly constructed. These
constructions are tied to specific topological features (edges and
nodes) of a phylogenetic tree. This new understanding of invariants
leads to theoretical results on the identifiability of the tree
topology for models with increased biological realism, such as the
covarion model and certain mixture models.
Author: Stuart Baird, CBGP, Montpellier
Title: A lattice implementation of Wright's neighborhood model
Presentation materials: PPT
Streaming Video: Real
Media
Wright's neighborhood size can be seen as a statement about the
probability of coalescence of lineages integrated over space. Moving
backwards in time, neighborhood size increases and the probability
of coalescence decreases. As such, Wright's neighborhood model could
potentially be used for coalescent inference over structured populations
parameterised by parent-offspring dispersal and population density.
This is in contrast to models parameterised by the size of panmictic
units and migration vectors between them. If we wish to use coalescent
inference over a study system, and lack prior knowledge of the scale
at which panmixis can be assumed, Wright's neighborhood model seems
appropriate. Here I show how Wright's neighborhood model can be
implemented on a lattice, allowing sampling of the properties of
genealogies in space and time for a set of georeferenced field observations.
I contrast two sampling approaches that allow Bayesian inference
over these genealogies and discuss the implications for inference
over recent timescales (geneflow, population structure) and deeper
timescales (phylogeographic process).
Author: Mark A. Beaumont, School of Animal
and Microbial Sciences, The University of Reading, Whiteknights
Title: Joint determination of topology, time of splitting and immigration
in population trees
Presentation materials: PPT
Streaming Video: Real
Media
I describe a Bayesian method that uses summary statistics measured
from microsatellite loci to make inferences about demographic parameters
in 2- and 3-population models. Preliminary results with an infinite
sites model of sequence evolution are also described. The method
can be used to infer effective sizes of current and ancestral populations,
immigration rates, splitting times and tree topology (in the 3-population
case). A novel method for model selection is introduced. Comparisons
are made with the IM program of Hey and Nielsen, and a data set
of 19 microsatellite loci from Channel Island foxes is analysed.
It is concluded that the method is competitive with IM on 2-population
data. There appears to be little scope for accurate inference with
microsatellite data unless very large numbers of loci are used.
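The summary-statistic approach described above can be illustrated with a minimal rejection sampler. This is a sketch of the general idea only, not the author's method: the toy simulator, the uniform prior, the tolerance and all function names are assumptions made here for illustration, and a real analysis would replace `simulate_summary` with a coalescent simulation of microsatellite loci.

```python
import random
import statistics


def simulate_summary(theta, n_loci, rng):
    """Toy stand-in for a coalescent simulator (an assumption, not the real model).

    Returns one summary statistic: the mean over loci of an exponential
    draw whose expectation is theta.
    """
    return statistics.mean(rng.expovariate(1.0 / theta) for _ in range(n_loci))


def abc_rejection(observed, n_loci, n_draws, tolerance, rng):
    """Keep prior draws whose simulated summary lies within tolerance of the data."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.1, 10.0)  # hypothetical flat prior on the parameter
        simulated = simulate_summary(theta, n_loci, rng)
        if abs(simulated - observed) <= tolerance:
            accepted.append(theta)
    return accepted


rng = random.Random(1)
posterior_sample = abc_rejection(
    observed=2.0, n_loci=19, n_draws=20000, tolerance=0.1, rng=rng
)
print(len(posterior_sample), statistics.mean(posterior_sample))
```

The accepted values approximate the posterior of the parameter given the summary statistic. Tightening the tolerance improves the approximation at the cost of fewer accepted draws.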
Author: Chuck Cannon, Texas Tech University
Title: Applying phylogenies to practical problems in SE Asia: data,
methods, and speculation
Presentation materials: PPT
Streaming Video: Real
Media
The results of human action, from the scale of the climate to the
niche, will dominate our evolutionary future. Meaningful ways of
intersecting theoretical and empirical studies with conservation
and management of our natural resources are important. Using tropical
tree communities as an example, the application of phylogenetic
and biogeographic evidence to mitigating some of this change will
be discussed. Emergent questions, with implications for the utility
of this data, will be explored. These questions have inspired the
development of a DNA microarray based technique for gathering genomic
samples of neutral variation in previously unstudied organisms.
The approach, called Hyperdispersed Illiterate Primer Screening
(HIPS), will be particularly effective for developing a database
of genomic signatures that can allow phylogenetically scalable queries,
virtual subtractive hybridizations, and the rapid development of
simple downstream bioassays for screening large numbers of individuals.
Authors: Bryan C. Carstens and L. Lacey Knowles,
Department of Ecology & Evolutionary Biology, Museum of Zoology,
University of Michigan
Title: What multilocus data reveal about estimating species divergence
in Melanoplus grasshoppers: effects of taxon sampling
Streaming Video: Real
Media
Speciation within the western Melanoplus grasshoppers has probably
been influenced by Pleistocene climate change. In order to test
models of speciation in this clade, we used the isolation-with-migration
model to estimate the species divergence between Melanoplus montanus
and M. oregonensis. We created a genomic library, gathered sequence
data from four anonymous nuclear loci and the mitochondrial cytochrome
oxidase I gene, and then estimated population divergence at approximately
200,000 years before present. However, due to widespread incomplete
lineage sorting in western Melanoplus, we cannot be confident that
M. montanus and M. oregonensis are sister taxa. In order to explore
the potential bias introduced by incomplete taxon sampling, we conducted
a simulation study. It suggested that incomplete taxon sampling
can cause the ancestral effective population size to be underestimated
and population divergence to be overestimated, particularly as the
actual population divergence increases.
Author: Scott Edwards, Organismic and Evolutionary
Biology, Harvard University
Title: From gene trees to species trees: empirical data sets from
birds and priorities for new implementations of theory
Presentation materials: PPT
Streaming Video: Real
Media
The problem of inferring trees of closely related species from
multilocus data sets suffers from a lack of robust implementations
of existing theory and from a lack of empirical data to help
set priorities for new directions. We have been accumulating multilocus
DNA sequence data sets of anonymous, noncoding regions of Australian
songbird genomes to examine the historical demography of speciation
and population structure. Using two data sets from northern Australia,
one from grassfinches (Poephila) and one from treecreepers (Climacteris),
I illustrate the potential of anonymous loci to provide a higher
resolving power for current and ancestral population parameters
than mitochondrial DNA, and for inferring relationships among closely
related species when gene trees conflict with one another. However,
our studies also pinpoint several gaps in existing software packages
that prevent full exploration of the data. In particular, our data
reveal the need for an integrated approach to estimating the sequence
of speciation events (species phylogeny) from multilocus data sets
that does not require a priori assumptions. In addition, the data
sets reveal a need for analyses of gene flow that can encompass
more than one species even when there is no current gene flow between
those species. These studies, like those in Drosophila and humans,
show that even phylogeographic analyses focused on single species
in general will require analysis of sequence data from multiple
species, especially those that continue to share residual polymorphisms
with the focal species, and will require implementations of theory
that can accommodate multispecies data sets.
Author: Bob Griffiths, Department of Statistics,
University of Oxford
Title: Ancestral inference from gene trees
Presentation materials: PDF
Streaming Video: Real
Media
A unique gene tree describing the mutation history of a sample
of DNA sequences can be constructed as a perfect phylogeny under
an assumption of non-recurrent point mutations. An empirical distribution
of the stochastic history of the gene tree, conditional on its topology,
can be found by an advanced simulation technique of importance sampling
on coalescent histories. The distribution of the time to the most
recent common ancestor and ages of mutations in the gene tree, conditional
on its topology, can be found from this empirical distribution. This
talk will present examples of ancestral inference from gene trees
and microsatellite data, and will sketch the importance sampling technique.
Authors: Mike Hickerson1, Eli Stahl2,
and Harilaos Lessios3
1University of California, Berkeley,
2University of Massachusetts-Dartmouth, 3Smithsonian
Tropical Research Institute, Republic of Panama
Title: Pipeline for comparative phylogeographic
inference using approximate Bayesian computation (ABC)
Streaming Video: Real
Media
When geological data suggest that co-distributed taxon-pairs arose
simultaneously from allopatric divergence across an emergent biogeographic
barrier, often one finds elevated variability in genetic divergences
across phylogeographic datasets. Assuming there are no undetected
extinctions or large variation in mutation rates, such disparity
in genetic divergences often leads to ecologically deterministic
explanations for the variance in divergence times without accounting
for mutational and coalescent variance. To test for simultaneous
divergence across phylogeographic datasets while accounting for
variability associated with such stochastic processes, we combine
Beaumont's flexible approximate Bayesian computational (ABC) framework
and a finite-sites version of Hudson's coalescent simulator. This
highly parameterized framework is extensively tested across a range
of conditions and is shown to be somewhat accurate with only single
locus mitochondrial data. We use this method to reject a history
of simultaneous vicariance in eight taxon-pairs of tropical sea
urchins thought to have arisen by the rise of the Panamanian Isthmus
~3.1 Mya, with only 3% of the posterior density signifying a history
of simultaneous vicariance. By simulation, two of the taxon-pairs
are suggested to be outliers, and after their removal, the posterior
density suggests a history of concordance in divergence times, resulting
in an estimated CO1 mtDNA mutation rate of 1.07% per million
years.
Author: Susan Holmes, Department of Statistics,
Stanford University (Joint work with Daniel Ford, Robert Shafer
and Soo-Yon Rhee)
Title: Using Multivariate and Phylogenetic decompositions in the
search for Drug Resistant Mutations in HIV
Streaming Video: Real
Media
Conditioning out phylogenetic information in HIV sequences, we
performed multivariate studies of eventual drug-resistant mutations
using multidimensional scaling and correspondence analysis methods.
We propose several approaches to the problem of correlated variables
in this context.
Author: John Huelsenbeck, Division of Biological
Sciences, University of California, San Diego
Title: Detecting positive natural selection in protein-coding DNA
under a Dirichlet process prior
Most methods for detecting Darwinian natural selection at the molecular
level rely on estimating the rates or numbers of nonsynonymous and
synonymous changes in an alignment of protein-coding DNA sequences.
In some of these methods, the nonsynonymous rate of substitution
is allowed to vary across the sequence, permitting the identification
of single amino-acid positions that are under positive natural selection.
However, it is unclear which probability distribution should be
used to describe how the nonsynonymous rate of substitution varies
across the sequence. One widely used solution is to model variation
in the nonsynonymous rate across the sequence as a mixture of several
discrete or continuous probability distributions. Unfortunately,
there is little population genetics theory to inform us of the appropriate
probability distribution for among-site variation in the nonsynonymous
rate of substitution. Here, we describe an approach to modeling
variation in the nonsynonymous rate of substitution using a Dirichlet
process mixture model. The Dirichlet process allows there to be
a countably infinite number of nonsynonymous rate classes, and is
very flexible in accommodating different potential distributions
for the nonsynonymous rate of substitution. We implemented the model
in a fully Bayesian approach, with all parameters of the model considered
as random variables.
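As a rough illustration of how a Dirichlet process yields an open-ended number of rate classes, the sketch below draws a partition of sites with the Chinese restaurant process, which is the partition distribution a Dirichlet process induces. The concentration parameter value, the gamma base distribution for the nonsynonymous/synonymous rate ratio, and all names are assumptions made here, not details taken from the abstract.

```python
import random


def crp_assignments(n_sites, alpha, rng):
    """Assign sites to rate classes with the Chinese restaurant process.

    A site joins existing class k with weight equal to that class's size
    and opens a new class with weight alpha.
    """
    assignments = []
    class_sizes = []
    for _ in range(n_sites):
        weights = class_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(class_sizes):
            class_sizes.append(1)  # open a new rate class
        else:
            class_sizes[k] += 1
        assignments.append(k)
    return assignments, class_sizes


rng = random.Random(0)
assignments, class_sizes = crp_assignments(n_sites=200, alpha=1.0, rng=rng)

# Hypothetical base distribution: each class draws its own rate ratio.
class_rates = [rng.gammavariate(1.0, 1.0) for _ in class_sizes]
site_rates = [class_rates[k] for k in assignments]
print(len(class_sizes), "classes for", len(site_rates), "sites")
```

The number of classes is not fixed in advance; it grows slowly with the number of sites and is controlled by `alpha`.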
Authors: Flavia F. Jesus*, Vera N. Solferini*,
Jon F. Wilkins** & John Wakeley**
* DGE, IB, UNICAMP (Brazil); ** Harvard University
Title: Genetic consequences of glacial cycles: a coalescent approach
Streaming Video: Real
Media
The climatic cycles of the Quaternary have influenced the distribution
of many species. Despite the large amounts of data available, there is
still a need for theoretical studies to help understand the genetic
consequences of those cycles. We have used a coalescent approach
to examine some of these consequences. We have modeled demographic
history as two alternating phases corresponding to glacial and interglacial
periods. An island model was assumed for both phases, and the transition
between them happened in one generation. The number of demes, deme
size and migration rate were allowed to differ between the two phases.
We have examined the effects of cyclic changes of both population
size and structure. For a sample of two alleles from the same or
separate demes we have obtained the distribution of coalescence
times and its mean as functions of these population parameters and
the duration of each type of phase. Deme size and migration rate
were kept smaller in the glacial phases, but the number of demes
could be larger - as when glaciations cause fragmentation. Reduced
deme size during glacials produced peaks in the distribution of
coalescence times and made the mean times small in some cases, whereas
the effect of increased structure during these periods - e.g. from
a smaller migration rate - attenuated these peaks and stretched the
genealogy. These results are in accordance with inferences of long
genealogies for many species, predating the last glaciation. This
approach may help further our understanding of the genetic consequences
of climatic cycles.
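The peaks in the distribution of coalescence times can be seen in a deliberately simplified calculation. The sketch below drops the island structure and migration of the model above and assumes a single panmictic diploid population whose size alternates between an interglacial and a glacial value, with instantaneous transitions. The function name and all parameter values are hypothetical.

```python
def coalescence_time_distribution(
    n_glacial, n_interglacial, len_glacial, len_interglacial, max_generations
):
    """Return P(T = t) for t = 1..max_generations and the leftover tail mass.

    Two lineages are followed backwards in time; in a generation with
    diploid size N they coalesce with probability 1 / (2N).
    """
    cycle = len_interglacial + len_glacial
    probs = []
    surviving = 1.0  # probability that the lineages have not yet coalesced
    for t in range(1, max_generations + 1):
        # Looking backwards, the most recent phase is taken to be interglacial.
        in_interglacial = (t - 1) % cycle < len_interglacial
        size = n_interglacial if in_interglacial else n_glacial
        p = 1.0 / (2 * size)
        probs.append(surviving * p)
        surviving *= 1.0 - p
    return probs, surviving


probs, tail = coalescence_time_distribution(
    n_glacial=100,
    n_interglacial=1000,
    len_glacial=50,
    len_interglacial=200,
    max_generations=5000,
)
# The probability jumps at the first glacial generation (t = 201).
print(probs[199], probs[200])
```

Under these assumptions each glacial phase produces a step up in the probability of coalescence. Adding population structure, as in the abstract, would be expected to soften those peaks.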
Author: Lacey Knowles, Department of Ecology
and Evolutionary Biology, Museum of Zoology, University of Michigan
Title: Inferring species histories despite incomplete lineage sorting
Presentation materials: PPT
Streaming Video: Real
Media
It is now well known that incomplete lineage sorting can cause
serious difficulties for phylogenetic and phylogeographic inference.
Yet, little attention has been paid to methods that attempt to overcome
these difficulties by explicitly considering the processes that
produce them. Here I explore approaches to historical inference
designed to consider retention and sorting of ancestral polymorphism.
I examine how the reconstructability of species (or population)
histories is affected by (a) the number of loci used to estimate
the phylogeny and (b) the number of individuals sampled per species
(or population). Even in difficult cases with considerable incomplete
lineage sorting (divergence times separated by less than 1Ne generations),
accurate historical reconstructions are possible, as long as reasonable
numbers of individuals and loci are sampled. Moreover, tradeoffs
between sampling more loci versus more individuals shift depending
on the depth of the species history under study. Taken together
these results demonstrate that gene sequences retain enough signal
to achieve an accurate estimate of history despite widespread incomplete
lineage sorting. Continued methodological improvements for inference
near the species level require not only a statistical framework
for evaluating the likelihood of particular gene trees, but also
a shift to compound models that consider the molecular evolutionary
process of nucleotide substitutions, as well as the population genetics
processes of lineage sorting.
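To give a sense of how much incomplete lineage sorting is expected when divergences are closely spaced, the sketch below evaluates the standard three-species coalescent result: with one sampled lineage per species, the gene tree matches the species tree with probability 1 - (2/3)exp(-T), where T is the internal branch length in coalescent units. Taking a coalescent unit to be 2Ne generations (diploid) is an assumption made here, as are the function name and the example values.

```python
import math


def prob_gene_tree_matches(internal_branch_generations, effective_size):
    """Probability that a three-taxon gene tree matches the species tree.

    Assumes one lineage per species, a constant diploid effective size on
    the internal branch, and no gene flow after divergence.
    """
    t = internal_branch_generations / (2.0 * effective_size)  # coalescent units
    return 1.0 - (2.0 / 3.0) * math.exp(-t)


ne = 10000
for multiple in (0.1, 0.5, 1.0, 2.0, 5.0):
    p = prob_gene_tree_matches(multiple * ne, ne)
    print(f"internal branch = {multiple} Ne generations: P(match) = {p:.3f}")
```

When the internal branch is short, a single locus is close to uninformative about the branching order, which is why the number of loci and of individuals sampled matters.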
Authors: Liang Liu and Dennis Pearl, Department
of Statistics, The Ohio State University
Title: Reconstructing posterior distributions of a species phylogeny
using estimated gene tree distributions
Streaming Video: Real
Media
It is well known that gene trees need not agree with species trees
because of deep coalescence, gene duplication and loss, and horizontal
transfer. Reconciling these differences to estimate the species tree
has been an active topic in evolutionary molecular biology.
Here, we propose a Bayesian method to estimate the phylogenetic
tree of a group of species using multiple estimated gene tree distributions
such as those that arise in a Bayesian analysis of DNA sequence
data. It is assumed that DNA sequences are conditionally independent
of the species tree given the gene tree. The whole process can be represented
as a 2-step Markov chain, from species tree to gene tree and from
gene tree to DNA sequences. The first step can be explained by coalescent
theory. The process of the second step follows an evolutionary model
at the molecular level. For each group of DNA sequences, MrBayes
is used together with an importance sampling technique to sample
gene trees from the posterior distribution P(gene tree | DNA sequences).
Given those gene trees, species trees are sampled from their posterior
distribution P(species tree | gene tree). The species trees generated
follow the posterior distribution P(species tree | DNA sequences).
The posterior distributions of parameters such as population sizes
of ancestral species are estimated as well. Multiple chains are
used to monitor convergence. Running time is linear in
the number of genes. The method is applied to analyze both simulated
and experimental DNA sequence data.
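The two-step sampling scheme above rests on the conditional independence assumption. One way to write the identity it implies is given below, where the symbols S (species tree), G (the gene tree or collection of gene trees) and D (the DNA sequences) are notation introduced here rather than taken from the abstract.

```latex
\begin{align*}
P(S \mid D) &= \sum_{G} P(S, G \mid D) \\
            &= \sum_{G} P(S \mid G, D)\, P(G \mid D) \\
            &= \sum_{G} P(S \mid G)\, P(G \mid D)
\end{align*}
```

The last line uses the assumption that D and S are conditionally independent given G, so that P(S | G, D) = P(S | G). Drawing G from P(G | D) and then S from P(S | G) therefore yields draws of S from P(S | D).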
Author: Ligia Mateiu, Department of Medical
Genetics, University of Alberta
Title: Inferring Complex DNA Substitution Processes on Phylogenies
Using Uniformization and Data Augmentation
Presentation materials: PDF
Streaming Video: Real
Media
Authors: C. Moritz1, C. Graham2,
S. Williams3, J. MacKenzie1, M. Hickerson1,
G. Dolman4, A. Moussalli4, C. Hoskin4,
A. Hugall4, K. Bell4, M. Tonione1,
and A. Carnaval1
1MVZ, UC Berkeley, 2SUNY
Stony Brook, 3Dept. Tropical Biology, James Cook Univ.,
4School of Integrative Biology, University of Queensland.
Streaming Video: Real
Media
Evolutionary biogeography seeks to determine the effects of fluctuations
in habitat distribution and occupancy on properties of biological
diversity (community, species and genetic), as mediated by spatio-temporal
patterns of speciation, local extinction and dispersal/colonization.
Estimation of population responses to past climate &/or geological
change is of central importance and is most powerful when combined
with independent evidence (or hypotheses) about past habitat distributions.
We present a rich empirical case study concerning the fauna of rainforests
from north-east Australia. Previously, we have developed comparative
mtDNA phylogeographies and qualitatively compared the results to
models of species or rainforest distributions under paleoclimates
inferred from the cool-dry conditions of the Last Glacial Maximum,
through cool-wet (8-6 Kya) and warm-wet (5-3 Kya) phases of the Holocene
to the present. These models and molecular data suggest considerable
variation in the temporal and spatial scale of species responses
to rainforest fluctuation, but are limited in that only a single locus
has been examined (albeit for many species) and by the use of naive
distribution models. We are now generating multi-locus data to improve
precision of parameter estimates, and initial studies reveal both
the promise of such data and significant limitations of current
analytical methods. The greatest challenge is to incorporate spatially
explicit hypotheses about habitat fluctuation directly into parameter
estimation (see also talk by S. Baird; poster by M. Hickerson).
Authors: Jeff Pan1, Dennis K. Pearl1,
and J. Dennis Pollack2; Ohio State University (Departments
of 1Statistics and 2Molecular Virology, Immunology
and Medical Genetics)
Title: Should phylogenetic reconstructions use an amino acid substitution
model that allows for rate variation to depend on spatial location?
Recent studies on the conserved amino acid regions of phosphoglycerate
kinase (PGK) and other examples with known crystallographic structure,
suggest that the rate of their evolution may depend on their location
in the molecule. PGK is a highly conserved enzyme central to the
process of fermentation that is found in all Kingdoms in nature.
While the most conserved amino acids tend to be near the functional
core of the molecule, the least conserved amino acids are mainly
found on the periphery. In this study, we examine the relationship
between the rate variation of each amino acid site and its distance
from its associated metal ion ligand (Mg2+). Based on these results,
a refined model of molecular evolution incorporating rate variation
at each amino acid site that is dependent on the Euclidean distance
to the metal ligand is proposed.
Author: Antonis Rokas, Laboratory of Molecular
Biology, R. M. Bock Labs, University of Wisconsin - Madison
Title: Animal Evolution and the Molecular Signature of Radiations
Compressed in Time
Presentation materials: PPT
Streaming Video: Real
Media
The phylogenetic relationships among most metazoan phyla remain
uncertain. Here, we obtained large numbers of gene sequences from
metazoans, including key understudied taxa. Despite the amount of
data and breadth of taxa analyzed, relationships among most metazoan
phyla remained unresolved. In contrast, the same genes robustly
resolved phylogenetic relationships within a major clade of Fungi
of approximately the same age as the Metazoa. The differences in
resolution within the two Kingdoms suggest that the early history
of metazoans was a radiation compressed in time, in agreement with
paleontological inferences. Furthermore, simulation analyses as
well as studies of other radiations in deep time indicate that,
given adequate sequence data, the lack of resolution in phylogenetic
trees is a signature of closely spaced series of cladogenetic events.
Authors: Amy L. Russell1, Steven
M. Goodman2, Anne D. Yoder3
1University of Arizona, 2Field Museum of Natural
History, 3Duke University
Title: An integrative approach to historical biogeography: picking
up where phylogenetics leaves off
Presentation materials: PDF
Streaming Video: Real Media
New methods for applying genetic data to questions of historical
biogeography have revolutionized our understanding of how organisms
have moved around the planet to occupy their present distributions.
Increasingly sophisticated phylogenetic methods, especially in combination
with divergence time estimation, can reveal biogeographic centers
of origin, differentiate between hypotheses of vicariance and dispersal,
and in the latter case, reveal the directionality of dispersal events.
Despite their power, however, phylogenetic methods often yield patterns
that are compatible with multiple equally well-supported biogeographic
hypotheses. We describe a multi-disciplinary approach to this problem,
using a combination of coalescent, population genetic, and ecological
analyses to discriminate among multiple phylogenetically well-supported
hypotheses. This approach is used to discern which of several dispersal
hypotheses is most probable for a genus of Old-World leaf-nosed
bats, given the available data. From these synthetic analyses of
the data, we are able to conclude that the best-supported hypothesis
involves two independent dispersal events from Africa to Madagascar.
Furthermore, divergence dates estimated with coalescent methods
suggest that the two dispersal events occurred quite recently in
geological time.
Authors: Yoko Satta and Naoyuki Takahata, Department
of Biosystems Science, The Graduate University for Advanced Studies
(Sokendai)
Title: Population structure in sub-Saharan Africa revealed by haplotype
analysis
Recent extensive analyses of human DNA polymorphism reveal that
the time to the most recent common ancestor (TMRCA) at neutral loci
seldom exceeds 2 myr. However, we recently found that the TMRCA
at the CMP-N-acetylneuraminic acid hydroxylase (CMAH) locus is ca.
3 myr. The phylogenetic analysis of CMAH haplotypes shows two distinct
lineages which diverged ca. 3 myr ago: One is represented by a single
descendant haplotype in the sub-Saharan Biaka Pygmy population and
the other by the common ancestral lineage of all other haplotypes
which began to diversify extensively 1 myr ago. For these two distinct
lineages to be maintained this long, one may assume the operation
of balancing selection at this locus. However, because CMAH has lost
its function through Alu-mediated deletion of an exon, such selection
is unlikely. In accord with this, neither Tajima's D nor the HKA
test shows any signature of selection. Here we examine the possibility
that African populations have been relatively large and partially
isolated from each other throughout the Plio-Pleistocene, thereby
contributing to greater genetic diversity than in non-African populations
of young origins. Phylogenetic analyses of a dozen loci for which
reliable haplotype data are available reveal that their ancestral
haplotypes occur almost exclusively in African samples. Computer
simulation confirms that this bias of the occurrence of ancestral
haplotypes in African samples can be observed only under population
structure in Africa and reduction of the effective size of the entire
population since 2 myr ago. We also argue that there must have been
some African populations which were not directly involved in the
Out-of-Africa expansion in the late Pleistocene.
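For readers unfamiliar with the neutrality test mentioned above, Tajima's D can be computed from three quantities: the number of sequences, the number of segregating sites, and the mean number of pairwise differences. The sketch below follows the standard definition of the statistic. The function name and the example values are hypothetical and are not the CMAH data.

```python
import math


def tajimas_d(n_sequences, segregating_sites, mean_pairwise_diff):
    """Tajima's D from summary counts.

    Values near zero are consistent with neutrality at constant size;
    the variance constants are only non-degenerate for n >= 4.
    """
    n, s = n_sequences, segregating_sites
    if n < 4 or s < 1:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Difference between the two estimators of theta, scaled by its
    # estimated standard deviation.
    return (mean_pairwise_diff - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical sample: 20 sequences, 15 segregating sites.
print(tajimas_d(20, 15, 3.0))
```

A negative value indicates an excess of rare variants relative to neutral expectation and a positive value an excess of intermediate-frequency variants. Demographic history can produce either pattern, which is one reason structure is considered as an alternative explanation.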
Author: Mike Steel, Biomathematics Research
Centre, University of Canterbury
Title: Random models of speciation and extinction, and their relevance
for phylogeny
Presentation materials: PDF
Streaming Video: Real
Media
Random models for species formation and loss have played an important
role in evolutionary biology since Yule's pioneering work in the
1920s. More recently these models have been investigated for the
light they shed on both the topological properties (shape, balance,
clade distribution, discrete tree reconstruction, tree rooting) and
metric properties (branch length distribution, phylogenetic diversity)
of phylogenies. In this talk I describe how these models are relevant
for tree reconstruction and rooting, and the distribution of clade
sizes, as well as the loss (and optimization) of phylogenetic diversity
as taxa go extinct. The talk will include some historical survey,
as well as some recent (and new) results.
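A small simulation can make the clade-size behaviour of the Yule model concrete. In the sketch below every extant lineage is equally likely to speciate next, and only the number of leaves descending from one of the two root lineages is tracked. The function name, the tree size and the number of replicates are choices made here for illustration.

```python
import random
from collections import Counter


def yule_root_split(n_leaves, rng):
    """Grow a Yule tree to n_leaves; return the leaf count on one side of the root."""
    left, right = 1, 1  # the two lineages created by the root split
    while left + right < n_leaves:
        # Each extant lineage is equally likely to be the next to split.
        if rng.random() < left / (left + right):
            left += 1
        else:
            right += 1
    return left


rng = random.Random(42)
n_leaves, replicates = 10, 9000
counts = Counter(yule_root_split(n_leaves, rng) for _ in range(replicates))
for size in range(1, n_leaves):
    print(size, counts[size] / replicates)
```

Under the Yule model the size of one root clade is uniformly distributed on 1, ..., n - 1, so very unbalanced root splits are as probable as balanced ones. The simulated frequencies should each be close to 1/9 for a ten-leaf tree.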
Author: Marc A. Suchard, M.D., Ph.D., Departments
of Biomathematics and Human Genetics, David Geffen School of Medicine
at UCLA
Title: Joint inference of alignment and phylogeny from molecular
sequence data
Presentation materials: PDF
Streaming Video: Real
Media
Genomics research is generating vast molecular sequence data ranging
from single genes to whole genomes across an increasing number of
species. However, a fundamental difficulty in evolutionary studies
emerges as the availability of sequences expands. Phylogenetics
methods to reconstruct the evolutionary tree relating the sequences
traditionally condition on a single, sometimes poorly estimated
sequence alignment, where an alignment specifies which residues
in the sequences derive from a common origin. This conditioning
can cause bias and inappropriate inference in genomic studies, particularly
when the sequences are highly diverse. For example, the early branching-order
of Bacteria, Archaea and Eukaryotes, the three major domains of
life, is troublesome to determine.
As a solution, I describe a novel Bayesian model for simultaneously
estimating alignments and the phylogenetic trees that relate the
sequences. This sidesteps the bias issue inherent in sequential
estimation. Joint estimation also allows one to model rate variation
between sites when estimating the alignment and to use the evidence
in shared insertion/deletions (indels) in the sequences to group
sister species in the tree. I base this indel process on a Hidden
Markov Model that makes use of affine gap penalties and considers
indels of multiple residues.
I develop a Markov chain Monte Carlo (MCMC) method to sample from
the posterior of the joint model, estimating the most probable alignment
and tree and their support simultaneously. I describe a new MCMC
transition kernel based on the Forward-Backward algorithm and a
careful choice of parameter marginalization that improves our algorithm's
mixing efficiency, allowing the MCMC chains to converge even when
started from arbitrary alignments. Finally, my software implementation
can estimate alignment uncertainty and I describe a method for summarizing
this uncertainty in a single plot.
Author: Tandy Warnow, Department of Computer
Science, The University of Texas at Austin
Title: The Disk-Covering Method for Phylogenetic Tree Reconstruction
Presentation materials: PPT
Streaming Video: Real
Media
Phylogenetic trees, also known as evolutionary trees, model the
evolution of biological species or genes from a common ancestor.
Most computational problems associated with phylogenetic tree reconstruction
are very hard (specifically, they are NP-hard, and are practically
hard, as real datasets can take years of analysis, without provably
optimal solutions being found). Finding ways of speeding up the
solutions to these problems is of major importance to systematic
biologists. Other approaches take only polynomial time and have
provable performance guarantees under Markov models of evolution;
however, our recent work shows that the sequence lengths that suffice
for these methods to be accurate with high probability grow exponentially
in the diameter of the underlying tree.
In this talk, we will describe new dataset decomposition techniques,
called the Disk-Covering Methods, for phylogenetic tree reconstruction.
This basic algorithmic technique uses interesting graph theory,
and can be used to reduce the sequence length requirement of polynomial
time methods, so that polynomial length sequences suffice for accuracy
with high probability (instead of exponential). We also use this
technique to speed up the solution of NP-hard optimization problems,
such as maximum likelihood and maximum parsimony.