The field of genetic epidemiology has historically focused on the inheritance of genetic factors and phenotypes within families. However, steadily improving technologies brought a shift from family-based study designs to genome-wide association studies (GWAS) of unrelated individuals. While GWAS have yielded greater knowledge of genomic structure and disease-associated variants, the estimated effect sizes are small and often do not explain a large proportion of disease heritability. One explanation for this missing heritability is that the variants identified by GWAS are common (minor allele frequency > 5%), so an entire class of variation, rare variants, that contributes substantially to disease risk is being missed. Next-generation sequencing made the comprehensive discovery of rare variants feasible; however, the number of unrelated individuals needed to identify associations between these rare variants and disease runs into the thousands (more than 10,000 samples are needed to detect a variant with minor allele frequency 0.1% showing evidence of modest association). Although sequencing costs have decreased, the financial burden is still nontrivial, and sample heterogeneity can easily confound results. Efficient study designs and improved statistical approaches are therefore necessary to untangle the contribution of rare variation to complex disease. Family studies are robust to confounding, notably by population stratification, and have long been a powerful approach for identifying genetic variation. In the age of sequencing, family studies are again an appealing approach for studying the relationship between complex disease and genetic variation.
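A quick calculation helps show why such large samples are needed. The sketch below assumes Hardy-Weinberg equilibrium and unrelated individuals (assumptions not stated above) and computes the expected number of minor alleles and of carriers for a variant with minor allele frequency 0.1%; even among 10,000 people only about 20 carriers are expected.

```python
# Expected counts for a rare variant, assuming Hardy-Weinberg equilibrium
# in a sample of unrelated individuals (illustrative assumption).

def expected_minor_alleles(n_individuals: int, maf: float) -> float:
    """Expected number of minor alleles among the 2N sampled chromosomes."""
    return 2 * n_individuals * maf


def expected_carriers(n_individuals: int, maf: float) -> float:
    """Expected number of individuals carrying at least one minor allele."""
    carrier_prob = 1 - (1 - maf) ** 2
    return n_individuals * carrier_prob


for n in (1_000, 10_000):
    print(n, expected_minor_alleles(n, 0.001), round(expected_carriers(n, 0.001), 2))
```

With so few carriers, a modest effect contributes little signal per sample, which is the arithmetic behind the sample sizes quoted above.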
This workshop will focus on the use of family studies in the hunt for disease-associated genes, including the development of novel methods and statistics for assessing variant-disease relationships, as well as the important role of the family study design in a clinical sequencing setting.
All participants are encouraged to bring posters to display. Posters need not be limited to work on family-based data and can cover any topic in statistical genetics, genomics, and bioinformatics.
This MBI workshop is being co-sponsored by the National Institute of Statistical Sciences.
Organizers
Shili Lin
Statistics
The Ohio State University
shili@stat.ohio-state.edu
Lara Sucheston-Campbell
Pharmacy Practice and Science
The Ohio State University
sucheston-campbell.1@osu.edu
Asuman Turkmen
Department of Statistics
The Ohio State University
turkmen.2@osu.edu
Talks and Participants
Christopher Bartlett:
Using Quantum Mechanical Devices to Perform Genomic Studies in Families: Challenges, Promises, Changes
Video
Applying quantum physics to build quantum devices for computing has recently become reality, with companies such as Google, IBM, and Intel making prototypes for algorithm experimentation. These devices demonstrate that binary computing states (0 vs. 1) can be manipulated using the rules of quantum mechanics, including superposition, entanglement, and wave interference, as fundamentally new avenues for computing algorithms. While quantum algorithms have already shown in-principle speed-ups over classical computation for certain classes of problems, such as factoring large integers, the search for new algorithms for statistical computation, such as machine learning, is ongoing. The key differences between classical and quantum computing will be discussed in the context of addressing genomics questions through simple quantum machine learning examples.
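Superposition and interference can be seen in a tiny classical simulation of one qubit (an illustration of our own, not material from the talk). The state is a pair of complex amplitudes for |0> and |1>; applying a Hadamard gate to |0> gives equal probabilities of measuring 0 or 1, and applying it a second time makes the amplitudes for |1> cancel, so 0 is measured with certainty. Entanglement needs at least two qubits and is not shown.

```python
import math

# Hadamard gate as a 2x2 matrix acting on [amplitude of |0>, amplitude of |1>].
H = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
     [1 / math.sqrt(2), -1 / math.sqrt(2)]]


def apply(gate, state):
    """Multiply a 2x2 gate matrix by a 2-element amplitude vector."""
    return [gate[0][0] * state[0] + gate[0][1] * state[1],
            gate[1][0] * state[0] + gate[1][1] * state[1]]


def probabilities(state):
    """Born rule: measurement probabilities are squared amplitude magnitudes."""
    return [abs(a) ** 2 for a in state]


zero = [1 + 0j, 0 + 0j]            # the classical bit 0 as a qubit state
superposed = apply(H, zero)        # equal superposition of |0> and |1>
interfered = apply(H, superposed)  # |1> amplitudes cancel (destructive interference)

print(probabilities(superposed))   # each outcome has probability 1/2, up to rounding
print(probabilities(interfered))   # 0 is measured with certainty, up to rounding
```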
Saonli Basu (Biostatistics, University of Minnesota):
A Robust and Unified Framework for Estimating Heritability in Twin Studies using Generalized Estimating Equations
Video
The development of a complex disease is an intricate interplay of genetic and environmental factors. "Heritability" is defined as the proportion of total trait variance due to genetic factors within a given population. Studies of monozygotic (MZ) and dizygotic (DZ) twins allow us to estimate heritability by fitting an "ACE" model, which estimates the proportion of trait variance explained by additive genetic (A), common shared environmental (C), and unique non-shared environmental (E) latent effects, thus helping us better understand disease risk and etiology. In this paper, we develop a flexible generalized estimating equations framework ("GEE2") for fitting twin ACE models that requires minimal distributional assumptions; only the first two moments need to be correctly specified. We show that two commonly used methods for estimating heritability, the normal ACE model ("NACE") and Falconer's method, can both be fit within this unified GEE2 framework, which additionally provides robust standard errors. Although the traditional Falconer's method cannot directly adjust for covariates, the corresponding GEE2 version ("GEE2-Falconer") can incorporate covariate effects (e.g., letting heritability vary by sex or age). With non-normal data, these GEE2 models attain significantly better coverage of the true heritability than the traditional NACE and Falconer's methods. Finally, we demonstrate that Falconer's method can consistently estimate heritability when the ACE variance parameters differ between MZ and DZ twins, whereas NACE produces biased estimates in such settings.
Joint work with Jaron Arbet, Department of Biostatistics, University of Minnesota
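For reference, the classical Falconer estimators mentioned in this abstract can be written directly in terms of the MZ and DZ twin correlations. The sketch below assumes a standardized trait with r_MZ = A + C and r_DZ = A/2 + C; it illustrates the traditional method only, not the GEE2 framework or its covariate adjustment, and the correlations in the usage line are made-up values.

```python
def falconer(r_mz: float, r_dz: float) -> dict:
    """Falconer's ACE variance proportions from MZ and DZ twin correlations.

    Assumes a standardized trait (A + C + E = 1) with r_MZ = A + C and
    r_DZ = A/2 + C, so the three equations can be solved directly.
    """
    a = 2 * (r_mz - r_dz)  # additive genetic share, i.e. heritability
    c = 2 * r_dz - r_mz    # common (shared) environment share
    e = 1 - r_mz           # unique (non-shared) environment share
    return {"A": a, "C": c, "E": e}


# Hypothetical twin correlations, for illustration only.
estimates = falconer(r_mz=0.8, r_dz=0.5)
print({k: round(v, 3) for k, v in estimates.items()})
```

The three proportions sum to one by construction, but nothing constrains them to lie in [0, 1]: whenever r_DZ < r_MZ/2, for example, the estimate of C is negative.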
Pamela Brock (Wexner Medical Center, Human Genetics, The Ohio State University):
Gene hunting studies in familial non-medullary thyroid cancer
Video
Abstract not available
Alyssa Clay-Gilmour (Health Science Research / Epidemiology/Biostatistics, Mayo Clinic):
Large-scale Linkage Analysis of Multiple Myeloma (MM) and Monoclonal Gammopathy of Undetermined Significance (MGUS) Families
Video
Multiple myeloma (MM) is a result of a malignant transformation of plasma cells that is preceded by the presence of an asymptomatic clonal plasma cell expansion, a condition referred to as monoclonal gammopathy of undetermined significance (MGUS). We and others have shown familial aggregation of MM and MGUS. Evidence from epidemiologic, family and genome-wide association studies (GWAS) suggests a genetic component underlying MM etiology. GWAS have successfully established 17 common genetic risk loci for MM to date and recently, rare inherited susceptibility variants in the LSD1 / KDM1A and USP45 genes were identified in familial MM / MGUS kindreds. Family-based approaches may be used to elucidate genetic variation contributing to familial MM. Genetic linkage analysis has historically been used to detect the chromosomal location of disease genes. The objective of this study was to conduct a linkage analysis of MM / MGUS families to identify genomic regions for MM / MGUS.
Jonathan Haines (Department of Population and Quantitative Health Sciences, Case Western Reserve University):
Genetic Studies in the Mid-Western Amish
Video
Genetic studies of diseases of aging have been done predominantly in clinic-based case-control datasets drawn from the general population. While these have the advantage of being relatively easy to collect and thus can generate large sample sizes, they are limited by ascertainment bias, differences between case and control ascertainment, and a focus on genetic association analyses. Using special populations, such as the mid-Western Amish, overcomes several of these limitations.
Over the past 15 years, we have worked collaboratively to collect phenotype and genotype information on the Amish of Holmes County in Ohio and Elkhart, LaGrange, and Adams Counties in Indiana. The Amish are culturally and genetically isolated, and their lifestyle tends to be quite homogeneous, making genetic studies in this population particularly valuable. We have focused our efforts on two significant diseases of aging: Alzheimer disease (AD) and age-related macular degeneration (AMD). Our ongoing studies have demonstrated that the genetic architecture of AD and AMD differs significantly from that of the general population, strongly suggesting that novel loci exist in the Amish. Current studies aim to find these novel loci using a combination of genome-wide association and whole-genome sequencing data.
Daniel Kinnamon:
Applying Quantitative Genetics Approaches to Understand the Genetic Etiology of Idiopathic Dilated Cardiomyopathy in the Exome Era
Idiopathic dilated cardiomyopathy (DCM), defined as the presence of systolic dysfunction and left ventricular enlargement in the absence of a non-genetic clinical cause, is a major cause of heart failure. Nearly 40 genes have been established as relevant for idiopathic DCM, typically assuming a Mendelian monogenic disease model with autosomal dominant inheritance. While exome sequencing of probands and affected relatives in families has made it possible to discover rare variants across all of these genes with a single assay, determining the biological relevance of these variants is often difficult. When American College of Medical Genetics variant adjudication criteria are applied, many potentially relevant rare variants are classified as variants of uncertain significance, and the presence of such variants in multiple genes in a single family introduces additional ambiguity. Moreover, because of variable age at onset and disease severity, unaffected relatives provide limited information on the biological relevance of these variants without additional data. To overcome these challenges, gain additional insight into the biological relevance of variants identified through exome sequencing, and evaluate more complex genetic disease models, we have turned to examining familial variation in quantitative endophenotypes using adaptations of the measured genotype model. I will describe the rationale for this approach, an initial successful application, and our plans to apply it more broadly in the cohort of 1,300 families currently being recruited as part of the NHLBI- and NHGRI-funded DCM Precision Medicine Study.
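For readers unfamiliar with the measured genotype model, one common generic form treats the variant genotype as a fixed effect in a variance-components (polygenic) model for a quantitative trait measured in relatives. The notation below is ours, and the adaptations used in the DCM study are not described in the abstract, so this is only a sketch of the general idea: y_ij is the endophenotype of member j of family i, x_ij are covariates, g_ij counts copies of the rare variant, a_ij is a polygenic effect with kinship matrix Phi_i, and e_ij is residual error.

```latex
y_{ij} = \mu + \mathbf{x}_{ij}^{\top}\boldsymbol{\gamma} + \beta\, g_{ij} + a_{ij} + e_{ij},
\qquad
\mathbf{a}_i \sim N\!\left(\mathbf{0},\, 2\boldsymbol{\Phi}_i\,\sigma_a^{2}\right),
\qquad
e_{ij} \sim N\!\left(0,\, \sigma_e^{2}\right)
```

Testing H0: beta = 0 (for example, with a likelihood ratio test) asks whether variant carriers differ in the endophenotype after accounting for the correlation among relatives captured by the polygenic term.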
Daniel Koboldt:
Whole Genome Sequencing and Analysis for Rare Pediatric Conditions
Video
Children with rare inherited conditions are increasingly referred for clinical exome sequencing, which yields a positive finding in only ~25-35% of them. For the remaining as-yet-undiagnosed cases, research sequencing of the proband and available family members has the potential to uncover new genetic etiologies of disease. Our institute has enrolled more than 40 families affected by rare inherited conditions in a research genomics protocol. Using predominantly whole genome sequencing (WGS) of multiple family members, we have identified likely causal variants in 30% of cases and strong candidate variants in another 20%. Here, I describe the workflow of our rare disease genomics research program, including recruitment and case selection, sequencing and analysis strategies, candidate validation, and reporting of results. I will also highlight some solved cases whose underlying etiology or phenotypic association challenges current knowledge of genotype-phenotype relationships.
Catherine Stein:
Genetic Studies of Tuberculosis: Importance of the Family-based Design
Video
Tuberculosis (TB) remains a major public health threat globally, and several studies have demonstrated a role for human genetic factors underlying TB risk. However, exposure to the causal bacterium, Mycobacterium tuberculosis, is a necessary risk factor for TB, and few population-based studies appropriately account for this exposure. In this talk, I will describe how we’ve utilized a family study to examine the genetic epidemiology of TB and address limitations in the extant literature. I will present both key findings and future directions.
William Stewart:
Using Millions of SNPs and a Few Extended Pedigrees to Accelerate Disease-Gene Discovery
For cosegregation studies involving a large number of small affected families and modern SNP arrays, it is difficult to improve upon the DSE, our near-optimal estimator of disease-gene location that averages location estimates from random subsamples of the available dense SNP data. However, for studies involving dense SNPs and a small number of large families, the usual asymptotics no longer apply, so accurately estimating the variance of the DSE is nontrivial. Here, I describe an importance sampling approach that accurately approximates the variance of the DSE. In principle, additional gains in precision are possible by using publicly available reference samples to better account for the correlations between SNPs. I applied my approximate importance sampling approach to dense SNP data simulated under recessive and dominant models. In each setting, the variance of the DSE was accurately estimated, and my confidence intervals (CIs) for disease-gene location were substantially shorter than approximate 95% CIs constructed from existing methods. Researchers with large affected families and dense SNP data should therefore be able to significantly reduce their targeted re-sequencing costs and greatly expedite the rate at which disease genes are found.
Asuman Turkmen (Department of Statistics, The Ohio State University):
Rare Variant Analysis of Autosomal and X Chromosome Genetic Data in Family-based Sequencing Studies
The inability of common variants identified by genome-wide association studies (GWAS) to explain much of the heritability of most complex diseases, together with advances in next-generation sequencing (NGS) technologies, has led to increased interest in investigating the role of rare variants in the etiology of complex disease. Despite the numerous methods proposed for rare variant association over the years, discovery of these variants has remained elusive and mostly restricted to population-based designs. Further, little is known about rare X-linked variants associated with complex diseases. Here we propose a method to test for association of rare variants obtained by sequencing in family-based samples using a variance component test. The proposed approach can be used for both autosomes and the X chromosome. Using simulations, we show that the method performs on par with existing methods for autosomes while being more computationally efficient. Its performance for the X chromosome is also promising, based on results from simulated data and from real data from the University of Miami Study on Genetics of Autism and Related Disorders.
Joint work with Shili Lin, Department of Statistics, The Ohio State University
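The abstract does not give the form of the proposed family-based test. As background only, the sketch below computes the well-known SKAT-style variance-component score statistic for unrelated individuals, Q = sum over variants j of (w_j * S_j)^2, where S_j is the score (genotype times phenotype residual, summed over individuals) and w_j is the Beta(1, 25) density at the variant's MAF. It assumes an intercept-only null model, ignores relatedness and X-chromosome dosage coding, and omits the p-value (which needs a mixture-of-chi-squares approximation); a family-based version would replace the residuals and their covariance with kinship-aware ones. The function names are hypothetical.

```python
def beta_1_25_weight(maf: float) -> float:
    """Beta(1, 25) density at the MAF, 25 * (1 - maf)**24, which upweights rarer variants."""
    return 25.0 * (1.0 - maf) ** 24


def variance_component_q(genotypes, phenotype):
    """SKAT-style score statistic Q for unrelated individuals (illustrative).

    genotypes: n x p nested list of minor-allele counts (rows = individuals).
    phenotype: length-n list of trait values.
    """
    n = len(phenotype)
    mean_y = sum(phenotype) / n
    resid = [y - mean_y for y in phenotype]  # residuals under an intercept-only null model
    q = 0.0
    for j in range(len(genotypes[0])):
        column = [row[j] for row in genotypes]
        freq = sum(column) / (2 * n)
        maf = min(freq, 1 - freq)
        score = sum(g * r for g, r in zip(column, resid))  # per-variant score S_j
        q += (beta_1_25_weight(maf) * score) ** 2
    return q


# Hypothetical data: 4 individuals, 2 variants.
G = [[0, 0], [1, 0], [0, 1], [1, 0]]
y = [0.2, 1.5, 0.4, 1.3]
print(variance_component_q(G, y))
```

Aggregating squared scores lets variants with effects in opposite directions add to the evidence instead of cancelling, which is the usual motivation for variance-component tests of rare variants.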
Veronica Vieland (Pediatrics & Statistics, The Ohio State University):
Linkage Analysis of Complex Traits: Failed Paradigm or Powerful Tool?
One obvious thing that families are good for in human genetic research is linkage analysis (LA): mapping disease genes based on co-segregation within pedigrees (violations of Mendel's second law) between phenotypes of interest and DNA marker genotypes. LA has yielded genes for thousands of Mendelian disorders, but for complex disorders it has fallen out of vogue in favor of GWAS- and NGS-based designs. Psychiatric genetics is often held up as the poster child for the failure of LA to yield meaningful results for complex traits, but in this talk I will use the schizophrenia collection within the data repository of the National Institute of Mental Health (NIMH) to argue that we don't actually know yet whether LA 'works' for psychiatric disorders. I will describe a large-scale study to revisit the families in the NIMH collection and illustrate the need for updating genotypes, phenotypes, and statistical methods before we can assess the efficacy of LA in psychiatric applications.
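Evidence for co-segregation in linkage analysis is usually summarized as a LOD score: the log10 ratio of the likelihood at recombination fraction theta to the likelihood under no linkage (theta = 0.5, i.e. independent assortment). The sketch below covers only the simplest textbook case, phase-known and fully informative meioses, with a grid search for the maximizing theta; complex-trait linkage methods such as those discussed in this talk use far richer pedigree likelihoods. The function names are hypothetical.

```python
import math


def lod(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score for phase-known, fully informative meioses (0 < theta < 0.5)."""
    r, n = recombinants, meioses
    log_l_theta = r * math.log10(theta) + (n - r) * math.log10(1 - theta)
    log_l_null = n * math.log10(0.5)  # free recombination: no linkage
    return log_l_theta - log_l_null


def max_lod(recombinants: int, meioses: int, step: float = 0.01):
    """Return (max LOD, theta) over the grid step, 2*step, ..., just below 0.5."""
    thetas = [k * step for k in range(1, round(0.5 / step))]
    return max((lod(recombinants, meioses, t), t) for t in thetas)


best_lod, best_theta = max_lod(recombinants=2, meioses=20)
print(round(best_lod, 3), best_theta)
```

With 10 meioses and no recombinants, the LOD approaches 10 * log10(2), about 3.0, as theta goes to 0; a LOD of 3 is the conventional threshold for declaring linkage to a Mendelian trait.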
Rosalie Waller:
Shared Genomic Segment Analysis for Complex Traits
Video
High-risk pedigrees (HRPs) are a key design for mapping rare, highly penetrant genes in Mendelian-like diseases. However, success with the HRP design in complex diseases has been modest, in part because standard methods do not adequately address genetic heterogeneity. Novel methods are needed to reinvigorate HRP designs for gene discovery in complex diseases. Extended high-risk pedigrees can contain enough meioses to provide power for gene mapping within a single pedigree; however, intrafamilial heterogeneity may still exist. To address intrafamilial heterogeneity, we expanded on the Shared Genomic Segment (SGS) method, a large-pedigree mapping method, to identify subsets of cases within an extended pedigree that share segregating chromosomal regions. Here, I will describe this strategy and our application to high-risk myeloma pedigrees.
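The core idea of shared-segment analysis can be illustrated with a simplified sketch, assuming genotypes coded as minor-allele counts (0, 1, 2): at each SNP a set of cases shares an allele identical by state (IBS) unless opposite homozygotes are present, and runs of consecutive sharing SNPs form candidate shared segments. The sketch also scans subsets of cases, in the spirit of the extension described above. The statistical significance of a segment's length must be assessed separately (the SGS literature uses pedigree-based simulation), which is not shown; the function names and data are hypothetical.

```python
from itertools import combinations


def shares_allele(genotypes_at_snp):
    """True if all genotypes share at least one allele IBS at this SNP.

    With genotypes coded 0/1/2, sharing fails only when both opposite
    homozygotes (0 and 2) are present.
    """
    return not (0 in genotypes_at_snp and 2 in genotypes_at_snp)


def shared_segments(case_genotypes):
    """Return (start, end) SNP-index runs in which all given cases share IBS."""
    n_snps = len(case_genotypes[0])
    segments, start = [], None
    for s in range(n_snps):
        if shares_allele([g[s] for g in case_genotypes]):
            if start is None:
                start = s  # a new shared run begins
        elif start is not None:
            segments.append((start, s - 1))  # run ended at the previous SNP
            start = None
    if start is not None:
        segments.append((start, n_snps - 1))  # run extends to the last SNP
    return segments


def longest_segment_per_subset(case_genotypes, min_cases):
    """Longest shared run (by SNP span) for every subset of at least min_cases cases."""
    result = {}
    for k in range(min_cases, len(case_genotypes) + 1):
        for subset in combinations(range(len(case_genotypes)), k):
            segs = shared_segments([case_genotypes[i] for i in subset])
            result[subset] = max(segs, key=lambda seg: seg[1] - seg[0], default=None)
    return result


# Toy genotypes for three cases at five SNPs (hypothetical data).
cases = [[0, 1, 2, 2, 0],
         [1, 1, 2, 1, 2],
         [0, 1, 1, 2, 2]]
print(shared_segments(cases))
print(longest_segment_per_subset(cases, min_cases=2)[(1, 2)])
```

Dropping one case lengthens the shared run in this toy example, which is the motivation for searching over subsets; enumerating every subset grows exponentially, so this brute-force version only suits small numbers of cases.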
Meng Wang (Stewart Lab, Nationwide Children's Hospital):
FamLBL: Detecting Rare Haplotype Disease Association Based on Common SNPs Using Case-parent Triads
Video
Motivation: In recent years, there has been an increasing interest in using common single-nucleotide polymorphisms (SNPs) amassed in genome-wide association studies to investigate rare haplotype effects on complex diseases. Evidence has suggested that rare haplotypes may tag rare causal single-nucleotide variants, making SNP-based rare haplotype analysis not only cost effective, but also more valuable for detecting causal variants. Although a number of methods for detecting rare haplotype association have been proposed in recent years, they are population based and thus susceptible to population stratification.
Results: We propose family-triad-based logistic Bayesian Lasso (famLBL) for estimating the effects of haplotypes on complex diseases using SNP data. By choosing an appropriate prior distribution, effect sizes of unassociated haplotypes can be shrunk toward zero, allowing more precise estimation of associated haplotypes, especially rare ones, and thereby achieving greater detection power. We evaluate the type I error and power of famLBL using simulation. Comparison with its population-based counterpart, LBL, highlights famLBL's robustness in the presence of population substructure. Further comparison of famLBL with the Family-Based Association Test (FBAT) reveals its advantage for detecting rare haplotype association.
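As background on how the shrinkage works, the Bayesian Lasso places a double-exponential (Laplace) prior on each effect; with the penalty parameter lambda held fixed, the log posterior equals the log-likelihood minus an L1 penalty, which pulls small effects toward zero. This is the generic form only, in our notation: beta_h is the log-odds effect of haplotype h, and the triad-specific likelihood L used by famLBL is not spelled out in the abstract.

```latex
p(\beta_h \mid \lambda) = \frac{\lambda}{2}\, e^{-\lambda \lvert \beta_h \rvert},
\qquad
\log p(\boldsymbol{\beta} \mid \text{data}, \lambda)
  = \log L(\boldsymbol{\beta}) - \lambda \sum_{h} \lvert \beta_h \rvert + \text{const}
```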
Xiaofei Zhou (Department of Statistics, The Ohio State University):
Detecting Rare Associated Haplotypes from Independent Case, Control and Family Trio Data: Combined Logistic Bayesian Lasso
Genome-wide association studies (GWAS) have brought some successes in understanding the genetic mechanisms of disease development: to date, they have identified more than 5,000 SNPs associated with more than 200 common diseases. However, most SNPs detected by GWAS have moderate or small effect sizes, and they suffer from missing heritability, a problem in which a large portion of the expected heritability is not explained by the observed genomic variants. A popular theory that explains missing heritability is the Common Disease Rare Variant (CDRV) hypothesis, which states that the genetic cause of a disease can comprise multiple rare variants with high penetrance [Manolio et al., 2009]. Associations between diseases and rare variants are usually difficult to detect because of the noise resulting from the variants' low frequencies, but penalization methods such as the Lasso make it possible to identify such associations. Moreover, the variants of interest need not be single nucleotide variants. In fact, haplotypes are of particular interest, because alleles at nearby loci tend to act together. Several methods have been developed in recent years to reveal rare associated haplotypes. Biswas's Logistic Bayesian Lasso (LBL) [Biswas and Lin, 2012] works on independent case/control data, and Wang's family-triad Logistic Bayesian Lasso (famLBL) [Wang and Lin, 2014] caters to family trio data. In real applications, many data sets provide both case/control and family trio data, and focusing on only one type of data means abandoning valuable information. Here, we develop a new logistic Bayesian Lasso method (combined logistic Bayesian Lasso, or cLBL) that can jointly analyze the two data types. Our simulations show that cLBL provides higher statistical power than LBL and famLBL.
When applied to Framingham Heart Study (FHS) data, cLBL detects quite a few associated haplotypes that are not significant under either LBL or famLBL.
Xiaofeng Zhu (Population and Quantitative Health Sciences, Case Western Reserve University):
Analysing Rare Variants Using Families in Large Whole Genome Sequencing Data
Advances in genomics have led to a substantial increase in the availability of whole genome sequencing data, which allows us to identify rare variants associated with complex traits and potentially to seek an alternative explanation for the “missing heritability”. Many statistical methods for analysing rare variant association have been developed, but they have mainly focused on weighting rare variants using genome annotation. In this talk, I will discuss how traditional linkage information from family data, which is independent of genome annotation, can help in prioritizing rare variants. I will use both simulations and real whole genome sequencing data to illustrate that family data can improve statistical power to detect rare variant associations.
Name | Email | Affiliation |
---|---|---|
Christopher Bartlett | christopher.bartlett@nationwidechildrens.org | Mathematical Medicine, Nationwide Children's Hospital |
Saonli Basu | saonli@umn.edu | Biostatistics, University of Minnesota |
Pamela Brock | pamela.brock@osumc.edu | Wexner Medical Center, Human Genetics, The Ohio State University |
Alyssa Clay-Gilmour | Clay.alyssa@mayo.edu | Health Science Research / Epidemiology/Biostatistics, Mayo Clinic |
Daniel Comiskey | daniel.comiskey@osumc.edu | Comprehensive Cancer Center, The Ohio State University |
Albert de la Chapelle | albert.delachapelle@osumc.edu | SBS-Cancer Biology & Genetics, The Ohio State University |
Han Fu | fu.607@osu.edu | Statistics, The Ohio State University |
Kesh Govinder | govinder@ukzn.ac.za | Mathematics, Statistics and Computer Science, University of KwaZulu-Natal |
Jonathan Haines | jlh213@case.edu | Department of Population and Quantitative Health Sciences, Case Western Reserve University |
Chenggong Han | han.1071@osu.edu | Biostatistics, The Ohio State University |
Chenggong Han | han.1071@osu.edu | Statistics, The Ohio State University |
Ezgi Karaesmen | karaesmen.1@osu.edu | College of Pharmacy, The Ohio State University |
Ezgi Karaesmen | karaesmen.1@osu.edu | Pharmaceutics and Pharmaceutical Chemistry, The Ohio State University |
Daniel Kinnamon | Daniel.Kinnamon@osumc.edu | Internal Medicine, The Ohio State University |
Daniel Koboldt | daniel.koboldt@nationwidechildrens.org | Institute for Genomic Medicine, Nationwide Children's Hospital |
Shili Lin | shili@stat.ohio-state.edu | Statistics, The Ohio State University |
Cheryl London | london.20@osu.edu | Department of Veterinary Biosciences, The Ohio State University |
Shuyuan Lou | lou.59@osu.edu | Biostatistics, The Ohio State University |
Hengrui Luo | luo.619@osu.edu | Department of Statistics, The Ohio State University |
Taina Nieminen | taina.nieminen@osumc.edu | Comprehensive Cancer Center, The Ohio State University |
Abbas Razvi | razvi.7@osu.edu | College of Pharmacy, The Ohio State University |
Abbas Rizvi | rizvi.33@buckeyemail.osu.edu | Pharmaceutical Sciences, The Ohio State University |
Xiaoqing Rong-Mullins | rong-mullins.1@osu.edu | COPH-Division of Biostatistics, The Ohio State University |
Michael Sovic | sovic.1@osu.edu | College of Pharmacy, The Ohio State University |
Catherine Stein | catherine.stein@case.edu | Population & Quantitative Health Sciences, Case Western Reserve University |
William Stewart | William.Stewart@nationwidechildrens.org | Pediatrics/Statistics, Nationwide Children's Hospital/The Ohio State University |
Lara Sucheston-Campbell | sucheston-campbell.1@osu.edu | Pharmacy Practice and Science, The Ohio State University |
Neeraja Sundar Rajan | sundarrajan.3@osu.edu | Ctr for Life Science Education, The Ohio State University |
Hancong Tang | tang.889@osu.edu | Department of Biostatistics, The Ohio State University |
Asuman Turkmen | turkmen.2@osu.edu | Department of Statistics, The Ohio State University |
Veronica Vieland | veronica.vieland@nationwidechildrens.org | Pediatrics & Statistics, The Ohio State University |
Rosalie Waller | Rosalie.Waller@hci.utah.edu | Biomedical Informatics, University of Utah School of Medicine |
Meng Wang | wang.1357@osu.edu | Stewart Lab, Nationwide Children's Hospital |
Yiwen Wang | wang.11518@osu.edu | College of Pharmacy, The Ohio State University |
Junke Wang | wang.12157@osu.edu | Pharmacy, Ohio State University |
Yanqiang Wang | yanqiang.wang@osumc.edu | Comprehensive Cancer Center, The Ohio State University |
Hsiu-Chuan Wei | hsiuwei@fcu.edu.tw | Mathematical Biosciences Institute, The Ohio State University |
Qing Xie | xie.735@osu.edu | Department of Statistics, Ohio State University |
Qing Xie | xie.735@osu.edu | Statistics, The Ohio State University |
Pei Yang | yang.1736@osu.edu | Department of Statistics, The Ohio State University |
Ayesha Zafar | zafar.34@osu.edu | Pharmacogenomics, The Ohio State University |
Xiaofei Zhou | zhou.1150@osu.edu | Department of Statistics, The Ohio State University |
Xiaofeng Zhu | xxz10@case.edu | Population and Quantitative Health Sciences, Case Western Reserve University |