Workshop 6 Abstracts and Lecture Materials:
Author: Carlos Bustamante, Biostatistics and Computational Biology,
Cornell University
Title: Inferring the distribution of selective effects among mutations,
SNPs, and fixed differences using Polymorphism and Divergence Data
I will present recent work on frequentist and Bayesian approaches
to the problem of inferring the distribution of selection coefficients
on newly arising mutations within the context of "shift-models".
Methods that use whole-genome SNP frequency data, polymorphism and
divergence across protein-coding gene data, and combined SNP frequency,
invariant, and divergence data will be presented. Forward simulations
with selection and recombination are used to gauge the sensitivity,
robustness, and accuracy of our models. Lastly, we apply the method
to human polymorphism and divergence data to estimate the proportion
of mutations, SNPs, and nucleotide substitutions in the human genome
that are deleterious, neutral, and adaptive.
[Joint work w/ Rasmus Nielsen, Andy Clark, Scott Williamson, Adi
Fledel-Alon, and Ryan Hernandez]
Author: Ranajit Chakraborty, PhD, Robert A. Kehoe Professor and
Director, Center for Genome Information, Department of Environmental
Health, University of Cincinnati College of Medicine
Title: Effects of Mutation and Population Demography on the Dynamics
of Linkage Disequilibria and their Relevance for Mapping Complex
Disease Genes
Streaming Video: Real Media
Complex diseases constitute the major public health burden in all
societies around the world. However, the success in determining
etiology of such diseases has been rather limited for several reasons.
This presentation starts with a brief outline of possible reasons
for the difficulties involved in elucidating the genetic basis of
complex diseases. From this discussion, we argue that population-based
association studies are more likely to provide insight into the genetic
basis of complex diseases than traditional family-based
study designs. However, since disease-gene association at population
level stems from inter-locus association of alleles, a thorough
understanding of population genetic properties of linkage disequilibrium
(LD) is needed for appropriate genetic interpretation of disease-gene
association data. To this effect, some properties of genome-wide
background LD are examined through a coalescence-based simulation
study. We show that when microsatellite loci are used as genomic
markers for disease-gene association studies, the expectation of
the weighted normalized LD between two loci decreases with recombination
distance between loci. However, the extent and trend of such decay
is dependent on the rate and pattern of mutations as well as on
the demographic history of populations. For example, for any specified
recombination distance, the simulation results show that the power
of detection of LD is larger in populations of constant smaller
size. In a growing population, the power of detecting LD is substantially
reduced, making it comparable to that expected in a constant population
of the largest size reached by the population. In the presence of population
growth, the enhancement of LD detection power with increasing sample
size is less conspicuous than in populations of constant size. Power
of detection of LD is also larger for loci with higher mutation
rate in populations of constant size, although under population
growth, the effect of mutation rate is reversed, particularly for
markers of larger recombination distances. Multistep forward-backward
mutations at microsatellite loci actually increase the power. Finally,
the presence of multiple alleles at microsatellite loci makes such markers
more powerful for detecting LD than the common single nucleotide polymorphism
sites (SNPs) residing at the same recombination distance. (Research
supported by US Public Health Research Grants from the National
Institutes of Health).
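As a point of reference for the LD quantities discussed in this abstract,
the following is a minimal sketch of the standard pairwise LD measures
(D, |D'|, and r^2) for two biallelic loci, together with the textbook
expectation that D decays by a factor (1 - c) per generation under random
mating in a large population with no mutation. It is not the weighted
normalized LD for multiallelic microsatellites used in the talk; the
function names and the haplotype-count input are assumptions made for
illustration.

```python
def pairwise_ld(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, abs_D_prime, r_squared) from counts of the four
    two-locus haplotypes AB, Ab, aB, ab (hypothetical input format)."""
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n          # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / n          # frequency of allele B at locus 2
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    D = p_AB - p_A * p_B             # raw disequilibrium coefficient
    # Largest |D| attainable given the allele frequencies and the sign of D.
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return D, abs(D) / d_max, D * D / denom


def expected_decay(d0, c, generations):
    """Expected D after a number of generations with recombination
    fraction c, assuming random mating, no mutation, and no drift."""
    return d0 * (1 - c) ** generations
```

For example, pairwise_ld(50, 0, 0, 50) describes two loci in complete
association, while pairwise_ld(25, 25, 25, 25) describes two loci in
linkage equilibrium.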
Author: Rick Durrett, Department of Mathematics, Cornell University
Title: The impact of spatial structure on genetic data
We will review recent results on the stepping stone model which show
that it has a much different impact on genetic data than the often
used island model. We will describe theoretical results for coalescing
times done in collaboration with Ted Cox and Iljana Zahle, as well
as simulation results of Arkendra De for the site frequency spectrum
and decay of linkage disequilibrium along a chromosome.
Author: Paul Fearnhead, Departments of Mathematics and Statistics,
Lancaster University
Title: Likelihood-based methods for detecting recombination Hotspots
from Population Data
Presentation materials: PDF
Streaming Video: Real Media
Recent evidence suggests that recombination hotspots are common
across the human genome. We show how (approximate) likelihood-based
methods for estimating recombination rates from population data
can be adapted to the problem of detecting recombination hotspots.
We extend an existing method for detecting recombination hotspots,
which uses likelihood curves for the recombination rate across small
sub-regions of the genome. This new method appears more powerful
than existing methods at detecting hotspots - simulation results
suggest a power of around 60-80% with a false positive rate of 1-5%.
Analysis of Seattle SNP genes suggests that recombination hotspots
are randomly distributed across the genome, with an average spacing
of around 1 per 30-40kb. Many genes contain more than one hotspot.
There is little evidence for hotspots which occur in only one of
the two (European American and African American) populations.
Author: Eran Halperin, International Computer Science Institute,
Berkeley, CA
Title: Estimating haplotype frequencies efficiently
Streaming Video: Real Media
In this talk I will introduce a new method, HAPLOFREQ, to estimate
haplotype frequencies over a short genomic region given the genotypes
or haplotypes with missing data and/or sequencing errors. Our approach
is based on rigorous analysis of the likelihood function, and in
particular the method is guaranteed to efficiently converge to the
global optimum of the likelihood function. Finally, I will discuss
the relations between haplotype frequency estimation and tag SNP
selection.
Author: Paul Joyce, Departments of Mathematics, Statistics, and
Bioinformatics, University of Idaho
Title: Efficient Simulation Methods for a Class of Nonneutral Population
Genetics Models
Presentation materials: PDF
Streaming Video: Real Media
Many of the current methods for uncovering the genetic basis of
common complex diseases in humans aim to exploit linkage disequilibrium
(LD). Patterns of LD depend crucially on the shape of genealogical
trees at the loci involved, so there is considerable interest in
understanding how these would be affected by selection. An algorithm
for exact simulation from the genealogical history of a sample,
for population genetics models with general diploid selection and
parent independent mutation was developed by Stephens and Donnelly
(2003) based on the work of Slade (2000). Central to their approach
is the need to calculate the constant of integration for the $K$
allele model with selection. Donnelly, Nordborg, and Joyce (DNJ
2001) developed methods for likelihood analysis under the $K$ allele
model with selection. Here we present a new method for likelihood
analysis that is substantially more efficient than DNJ (2001) and
can be used to improve the efficiency of Stephens and Donnelly (2003).
The method uses numerical analysis techniques, including fast Fourier
transforms to calculate the intractable constant of integration.
The method provides a perfect simulation approach for directly drawing
allele frequency samples from the distribution under selection.
This research is joint work with Alan Genz, Washington State University.
Author: Rasmus Nielsen, Center for Biostatistics,
Universitetsparken 15
Title: Analysis of ascertained SNP data
Most available human Single Nucleotide Polymorphism (SNP) data
have been obtained through a complicated process in which SNPs are
first discovered in a small sample and then genotyped in a larger
sample. This ascertainment process affects many properties
of the data, including linkage disequilibrium, the frequency spectrum,
and levels of population differentiation. It also implies that
standard population genetic analyses are not applicable to the vast
majority of the human SNP data. I will discuss the ascertainment
process in some of the major SNP data sets available (Perlegen and
HapMap), how the ascertainment process has affected the
data, and how various correction methods can be implemented to allow
valid population genetic inferences.
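To illustrate the kind of correction this abstract refers to, the sketch
below reweights an observed site frequency spectrum by the probability
that each SNP would have been discovered. It rests on a simplifying
assumption that is not taken from the talk: the discovery panel is a
random subset of d chromosomes drawn from the n genotyped chromosomes,
and a site is ascertained exactly when it is variable in that panel. The
real Perlegen and HapMap ascertainment schemes are more complicated, and
the function names here are hypothetical.

```python
from math import comb


def ascertainment_prob(i, n, d):
    """Probability that a site with i derived copies among n chromosomes
    is variable in a random subset of d of those chromosomes."""
    if not 2 <= d <= n:
        raise ValueError("need 2 <= d <= n")
    if not 0 < i < n:
        return 0.0  # a fixed site can never look variable
    # The subset is monomorphic if all d draws are derived or all ancestral.
    return 1.0 - (comb(i, d) + comb(n - i, d)) / comb(n, d)


def correct_spectrum(observed, d):
    """observed[i - 1] is the number of ascertained SNPs with i derived
    copies, for i = 1 .. n - 1. Returns the corrected spectrum as
    proportions, weighting each class by 1 / P(ascertained | i)."""
    n = len(observed) + 1
    weights = [count / ascertainment_prob(i, n, d)
               for i, count in enumerate(observed, start=1)]
    total = sum(weights)
    if total == 0:
        raise ValueError("observed spectrum is empty")
    return [w / total for w in weights]
```

Because rare variants are the least likely to be variable in a small
discovery panel, the correction shifts weight toward the low-frequency
classes, which is the direction of bias described above.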
Author: Jonathan Pritchard, Department of
Human Genetics, University of Chicago
Title: Detecting partial selective sweeps from SNP data
The new HapMap and Perlegen data sets offer the first opportunities
to scan the human genome for signatures of natural selection. One
promising approach is to search for polymorphic variants that have
undergone recent directional selection using patterns of long-range
LD (e.g., as in Sabeti et al 2002). In this talk I outline a new
approach that extends the PAC-likelihood model of Li and Stephens
(2003) in order to test for this type of signal in an approximate
likelihood framework. The new test controls appropriately for local
recombination rate heterogeneity, which may confound simpler approaches.
I discuss applications to the genome-wide data sets.
Author: Susan Ptak, Evolutionary Genetics,
Max Planck Institute for Evolutionary Genetics
Title: Fine-scale recombination patterns differ between chimpanzees
and humans
Presentation materials: PPT
Streaming Video: Real Media
Two recent studies examined single human recombination hotspots
in other primates and found no evidence for an increased rate of
recombination. These findings raised the question of how conserved
recombination rates are among closely related species. To address
this, we estimated recombination rates from 14 Mb of linkage disequilibrium
data in chimpanzees and in humans. The results suggest that recombination
hotspots are not conserved between the two species and that recombination
rates in larger (50 kb) genomic regions are only weakly conserved.
Thus, the recombination landscape has changed dramatically between
the two species.
Author: Noah Rosenberg, University of Michigan
Title: Population structure and homozygosity-based measures of linkage
disequilibrium
Streaming Video: Real Media
Inferences about linkage disequilibrium are often based on haplotypes
estimated from genotype data. To avoid using estimated haplotypes
in measuring pairwise linkage disequilibrium, it is possible to
employ statistics of Ohta (1980) and Sabatti and Risch (2002) that
utilize the difference between the observed proportion of double
homozygotes and the prediction made about double homozygosity from
the homozygosities of individual loci. In this talk, I investigate
some properties of homozygosity-based linkage disequilibrium statistics,
paying particular attention to how the statistics are affected by
population structure.
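The quantity underlying these statistics can be sketched as follows: the
observed proportion of individuals homozygous at both loci, minus the
product of the two single-locus homozygosities. This is only the raw
difference described in the abstract; the normalizations used by Ohta
(1980) and Sabatti and Risch (2002) are not reproduced here, and the
input format and function name are assumptions for illustration.

```python
def homozygosity_excess(genotypes):
    """genotypes is a list with one entry per individual, each entry a
    pair of unphased genotypes ((a1, a2), (b1, b2)) at two loci.
    Returns observed double homozygosity minus the value predicted from
    the single-locus homozygosities under independence."""
    n = len(genotypes)
    if n == 0:
        raise ValueError("no individuals supplied")
    hom_a = sum(1 for a, _ in genotypes if a[0] == a[1]) / n
    hom_b = sum(1 for _, b in genotypes if b[0] == b[1]) / n
    hom_ab = sum(1 for a, b in genotypes
                 if a[0] == a[1] and b[0] == b[1]) / n
    return hom_ab - hom_a * hom_b
```

No haplotype phase is needed, since only the homozygous or heterozygous
state of each individual at each locus enters the calculation.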
Author: Fengzhu Sun, PhD, Center for Computational
and Experimental Genomics, University of Southern California
Title: Haplotype block partition and tag SNP selection and their
applications to association studies
Streaming Video: Real Media
The HapMap project will generate an enormous amount of data on human
genomic variation. Due to high linkage disequilibrium in many regions,
a small fraction of SNPs (tag SNPs) are sufficient to capture most
of the haplotype structure of the human genome. We developed a suite
of dynamic algorithms for haplotype block partition and tag SNP
selection to minimize the total number of tag SNPs across the region
of interest or the whole genome. Our algorithm can be applied to
both haplotype and genotype data as well as any pedigree structures.
We also studied the power issues in association studies related
to tag SNP selection using simulated as well as real data.
Author: Marcy Uyenoyama, Department of Biology, Duke University
Title: Likelihoods from summary statistics
I address methods for inferring population parameters from multiple
summary statistics. We generated a maximum-likelihood estimate of
the rate of recombination between a neutral marker locus and the
target of strong balancing selection to which it shows nearly complete
linkage. Recently, we developed an importance sampling (IS) approximation
to the time-consuming computation of exact likelihoods on which
this method relies. In a study of the demographic history of closely
related Drosophila species, we found that the IS approach yielded
accurate estimates under a much reduced computational burden. I
end with a discussion of ongoing explorations of the effects of
genomic location, including cosegregation with incompatibility genes,
targets of selection, or centromeres, on rates of introgression.
Short Talks Abstracts
Author: Yuguo Chen, Institute
of Statistics and Decision Sciences, Duke University
Title: Stopping-Time Resampling for Sequential Monte Carlo Methods
with Applications to Population Genetics
Motivated by the statistical inference problem in population genetics,
we present a new sequential importance sampling with resampling
strategy. The idea of resampling is key to the recent surge of popularity
of sequential Monte Carlo methods in the statistics and engineering
communities, but existing resampling techniques do not work well
for coalescent-based inference problems in population genetics.
We develop a new method called ``stopping-time resampling,'' which
allows one to compare partially simulated samples at different stages
so as to terminate unpromising partial samples and multiply promising
ones early on.
Author: Graham M. Coop, Department
of Human Genetics, University of Chicago; Simon R. Myers, Department
of Statistics, University of Oxford
Title: Live hot, die young: transmission distortion in recombination
hotspots
Presentation materials: PPT
There is increasing evidence that hotspots of meiotic recombination
in humans, as well as in other organisms, are transient features
of the genome. This observation is commonly believed to be the result
of biased gene conversion in favour of alleles that locally disrupt
hotspots. We investigate the effect of such alleles on the short-term
evolution of hotspots through population genetic models. Our results
indicate that a lack of sharing of intense hotspots between species
is to be expected even if there are few sites where hotspot-disrupting
alleles can arise. Effective population size is found to play a
significant role in the fate of hotspots. The distribution of hotspot
intensities in a population under different models of hotspot genesis
is discussed. The effect of alleles that influence the intensity
of a hotspot on patterns of diversity is explored; we find that
alleles that reduce the intensity of a hotspot leave little trace
of their presence in the patterns found in population data.
Author: Vincent Plagnol,
Department of Computational Biology, University of Southern California
Title: Demographic inference of human populations history
Presentation materials: PDF
We focus in this paper on the issue of model fitting for the history
of two human populations: European and African-American. Most of
the analysis is based on the SeattleSNPs database, but it also
highlights the issue of ascertainment bias when dealing with the
HapMap dataset. We designed a simple model whose features can explain
the pattern of variation observed in the data. After estimating
its parameters and assessing its goodness-of-fit we use our model
to understand the relation between frequency and age of SNPs in
each subpopulation. It illustrates key differences between the genealogical
histories of African and European populations. These findings have
implications for disease mapping and scans for selection.
Author: Kui Zhang, Department
of Biostatistics, University of Alabama at Birmingham
Title: Haplotype Inference for Tightly Linked SNPs in General Pedigrees
Presentation materials: PDF
Haplotype reconstruction for tightly linked markers in general pedigrees
remains a challenging problem. Few methods are available
to efficiently estimate haplotype frequencies and accurately infer
haplotype configurations in general pedigrees with a large number
of tightly linked SNPs, especially in the presence of missing data,
and their performance has not been carefully evaluated. We
have developed an efficient computer program, HAPLORE, for haplotype
frequency estimation and reconstruction in general pedigrees with
tightly linked SNP markers. In this report, we compare and contrast
HAPLORE with two other previously published methods. We review the
methods and point out the differences between them in terms of the
models and computational strategies they use. Their performance
is assessed through simulated haplotypes based on real pedigrees.
Our results indicate that HAPLORE outperforms the other methods.