Phylogenetic Analysis of Large Datasets

Diego Pol (Mathematical Biosciences Institute (MBI), The Ohio State University)

(March 17, 2005 10:30 AM - 11:30 AM)

Phylogenetic trees are representations of the evolutionary history of groups of organisms. The leaves of these graphs represent biological species (or higher-level taxonomic units), and the internal nodes are interpreted as hypothetical evolutionary ancestors. Although it was once considered relevant only to taxonomic and evolutionary studies, phylogenetics is becoming a critical tool for numerous disciplines in biology and medicine, providing a unique organizing framework for biological variation and predictive analysis.

Several phylogenetic methods aim to find the optimal tree or trees in the space of all possible trees, evaluating each candidate hypothesis with an objective function. Because this space grows super-exponentially with the number of species, the resulting combinatorial optimization problem (phylogenetic tree search) is compute-bound and must be approached through heuristics for large, biologically interesting datasets. Large phylogenetic problems are of interest to biologists because they provide a rich context of phenotypes and genotypes. Here, I will approach the problem of analyzing datasets with a large number of species (from several hundred to several thousand) using recently developed tree search algorithms and diverse parallelization strategies on Beowulf clusters.
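
To give a rough sense of why exhaustive search over the space of all possible trees is impractical, the following minimal Python sketch counts the distinct unrooted binary tree topologies on n labeled species, using the standard result (2n - 5)!! for n >= 3. The function name is hypothetical and chosen only for illustration; the restriction to unrooted, fully bifurcating trees is an assumption, since the abstract does not specify which tree space is searched.

```python
def num_unrooted_binary_trees(n_leaves: int) -> int:
    """Count unrooted binary tree topologies on n labeled leaves: (2n - 5)!!.

    Illustrative helper (hypothetical name), assuming fully bifurcating,
    unrooted trees.
    """
    if n_leaves < 3:
        return 1  # with fewer than three leaves there is only one topology
    count = 1
    # The k-th leaf can be attached to any of the 2k - 5 edges
    # of a tree that already holds k - 1 leaves.
    for k in range(4, n_leaves + 1):
        count *= 2 * k - 5
    return count


if __name__ == "__main__":
    for n in (4, 5, 10, 500):
        digits = len(str(num_unrooted_binary_trees(n)))
        print(f"{n} species: tree count has {digits} decimal digit(s)")
```

Ten species already yield 2,027,025 topologies, and for 500 species the count runs to more than a thousand decimal digits, which is consistent with the need for heuristic and parallel search strategies at the dataset sizes discussed here.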