Laura Kubatko
Co-Director of MBI and Professor in the Departments of Statistics and Evolution, Ecology, and Organismal Biology, The Ohio State University
The advent of rapid and inexpensive DNA sequencing technologies has necessitated the development of computationally efficient methods for analyzing sequence data for many genes simultaneously in an evolutionary framework. This is particularly important for systems in which evolution occurs rapidly in response to environmental conditions and for which it is important to quickly diagnose which species are present. The multispecies coalescent is the most commonly used model for estimating species-level phylogenetic trees from multi-locus data, but inference under the coalescent model is computationally daunting in the typical inference frameworks (e.g., the likelihood and Bayesian frameworks) due to the dimensionality of the space of both gene trees and species trees. By viewing the data arising under the phylogenetic coalescent model as a collection of site patterns, the algebraic structure associated with the probability distribution on the site patterns can be used to develop computationally efficient methods for inference. In this talk, I will describe how identifiability results for four-taxon species trees based on site pattern probabilities can be used to build a quartet-based inference algorithm for trees of arbitrary size. The method will be applied to data on viruses that infect cassava plants worldwide and cause losses of approximately US$100 million annually in East Africa alone.
The methods discussed in this talk are the result of joint work with former MBI post-docs Julia Chifman and Colby Long.
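To make "viewing the data as a collection of site patterns" concrete, here is a minimal sketch, not the method presented in the talk, of the first step such a quartet-based approach needs: tabulating the site patterns of a four-taxon alignment and arranging their relative frequencies as a 16 × 16 "flattening" matrix for a chosen split of the taxa into two pairs. Algebraic identifiability arguments of this kind rely on the fact that the flattening for the split displayed by the species tree has lower rank than the flattenings for the two alternative splits. The function names `site_pattern_counts` and `flattening`, the toy alignment, and the choice to skip columns containing gaps or ambiguity codes are all illustrative assumptions.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
# The 16 ordered base pairs ("AA", "AC", ..., "TT") that index rows and columns.
PAIRS = ["".join(p) for p in product(BASES, repeat=2)]


def site_pattern_counts(alignment):
    """Count four-taxon site patterns (one base per taxon) in an alignment.

    alignment: list of four equal-length DNA strings, one per taxon.
    Columns containing anything other than A, C, G, T are skipped (an assumption).
    """
    if len(alignment) != 4:
        raise ValueError("expected exactly four taxa")
    counts = Counter()
    for column in zip(*alignment):
        if all(base in BASES for base in column):
            counts["".join(column)] += 1
    return counts


def flattening(counts, split):
    """Return the 16 x 16 matrix of pattern frequencies for a split like ((0, 1), (2, 3)).

    Rows are indexed by the bases of the first pair of taxa and columns by the
    bases of the second pair, so each site pattern fills exactly one cell.
    """
    (i, j), (k, l) = split
    total = sum(counts.values())
    matrix = [[0.0] * 16 for _ in range(16)]
    for pattern, n in counts.items():
        row = PAIRS.index(pattern[i] + pattern[j])
        col = PAIRS.index(pattern[k] + pattern[l])
        matrix[row][col] += n / total
    return matrix


# Toy alignment for four hypothetical taxa; the last column contains a gap and is skipped.
aln = ["ACGTAAA", "ACGTACC", "ACGAAA-", "ACGAACG"]
counts = site_pattern_counts(aln)
flat_12_34 = flattening(counts, ((0, 1), (2, 3)))
flat_13_24 = flattening(counts, ((0, 2), (1, 3)))
```

Pure Python is used here so the sketch has no dependencies. To compare candidate splits, one would go on to examine the singular values of each flattening (for example with `numpy.linalg.svd(matrix, compute_uv=False)`), favoring the split whose matrix is closest to low rank; how those quartet-level decisions are then combined into a tree of arbitrary size is the subject of the talk and is not sketched here.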