Haplotype Inference Using a Bayesian Hidden Markov Model

Shuying Sun
Mathematical Biosciences Institute (MBI), The Ohio State University

(March 1, 2007 10:30 AM - 11:30 AM)

Abstract

Knowledge of haplotypes is useful for understanding block structure in the genome and disease risk associations. Direct measurement of haplotypes in the absence of family data is presently impractical, so several methods have been developed for reconstructing haplotypes from population data. We have developed a new population-based method that uses a Bayesian Hidden Markov Model (HMM) for the source of the ancestral haplotype segments. In our Bayesian model, a higher-order Markov model serves as the prior for ancestral haplotypes to account for linkage disequilibrium. The model includes parameters for the genotyping error rate, the mutation rate, and the recombination rate at each position. Computation is done by Markov chain Monte Carlo (MCMC), using the forward-backward algorithm to sum efficiently over all possible state sequences of the HMM. We have used the model to reconstruct the haplotypes of 129 children at a region on chromosome 5 in the data set of Daly et al. [2001] (for which true haplotypes are obtained from parental genotypes), and of 30 children at selected regions in the CEU and YRI data of the HapMap project. The results are quite close to the family-based reconstructions and comparable to those of the state-of-the-art PHASE program [Stephens et al., 2001; Stephens and Donnelly, 2003; Stephens and Scheet, 2005]. Our haplotype reconstruction method does not require dividing the markers into small blocks of loci. The recombination rates inferred from our model can help to predict haplotype block boundaries and to identify recombination hotspots.
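
To illustrate the step of summing over all possible state sequences of an HMM, here is a minimal sketch of a scaled forward-backward pass for a generic discrete HMM. It is not the model from the talk: the function name `forward_backward`, the parameter layout (`init`, `trans`, `emit`), and the toy numbers are assumptions made for illustration. In a haplotype setting one might loosely read the hidden states as ancestral segment sources, the transitions as recombination, and the emissions as mutation or genotyping error, but that mapping is also only an assumption here.

```python
import math


def forward_backward(init, trans, emit, obs):
    """Scaled forward-backward pass for a discrete HMM (illustrative sketch).

    init[k]     : P(state_0 = k)
    trans[j][k] : P(state_{t+1} = k | state_t = j)
    emit[k][o]  : P(observation = o | state = k)
    obs         : non-empty list of observed symbol indices

    Returns (log_likelihood, posteriors), where posteriors[t][k] is
    P(state_t = k | all observations).
    """
    n_states = len(init)
    n_steps = len(obs)

    # Forward pass: alpha[t][k] is proportional to P(obs[0..t], state_t = k).
    # Each row is rescaled to sum to 1 to avoid numerical underflow.
    alpha, scales = [], []
    for t, o in enumerate(obs):
        if t == 0:
            row = [init[k] * emit[k][o] for k in range(n_states)]
        else:
            prev = alpha[-1]
            row = [
                emit[k][o] * sum(prev[j] * trans[j][k] for j in range(n_states))
                for k in range(n_states)
            ]
        c = sum(row)
        alpha.append([a / c for a in row])
        scales.append(c)

    # The product of the scaling constants is the total probability of obs,
    # i.e. the sum over every hidden state sequence.
    log_likelihood = sum(math.log(c) for c in scales)

    # Backward pass, rescaled with the same constants as the forward pass.
    beta = [[1.0] * n_states for _ in range(n_steps)]
    for t in range(n_steps - 2, -1, -1):
        o_next = obs[t + 1]
        beta[t] = [
            sum(trans[j][k] * emit[k][o_next] * beta[t + 1][k]
                for k in range(n_states)) / scales[t + 1]
            for j in range(n_states)
        ]

    # With this scaling, alpha * beta is the posterior state probability.
    posteriors = [
        [alpha[t][k] * beta[t][k] for k in range(n_states)]
        for t in range(n_steps)
    ]
    return log_likelihood, posteriors


# Toy example (hypothetical numbers): 2 hidden states, 2 observation symbols.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
log_lik, post = forward_backward(init, trans, emit, [0, 1, 0])
```

The cost grows linearly with the sequence length and quadratically with the number of states, whereas enumerating every state sequence explicitly grows exponentially with the length. Within an MCMC scheme, a pass like this could supply the likelihood or the state posteriors needed at each iteration.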