Workshop 1: Family-based Genomic Studies

(September 17, 2018 - September 19, 2018)

Organizers


Shili Lin
Statistics, The Ohio State University
Lara Sucheston-Campbell
Pharmacy Practice and Science, The Ohio State University
Asuman Turkmen
Department of Statistics, The Ohio State University

The field of Genetic Epidemiology has historically focused on the inheritance of genetic factors and phenotypes within families. However, steadily improving technologies brought a shift from familial study designs to genome-wide association studies (GWAS) utilizing samples of unrelated individuals. While GWAS have yielded greater knowledge of genomic structure and disease-associated variants, the estimated effect sizes are small and often do not explain a large proportion of disease heritability. One explanation for the missing heritability is that the variants identified in GWAS are common (frequency > 5%), and thus we are missing an entire class of variation (rare) that substantially contributes to disease risk. The innovation of next-generation sequencing technology made the comprehensive discovery of rare variants feasible; however, the sample size of unrelated individuals needed to identify associations between these rare variants and diseases is in the thousands (> 10,000 samples are necessary to detect a variant showing evidence of modest association with minor allele frequency 0.1%). While sequencing costs have decreased, the financial burden is still nontrivial and sample heterogeneity can easily confound results. Thus, efficient study designs and improved statistical approaches are necessary to untangle the contribution of rare variation to complex disease. Family studies have always been robust to confounding and a powerful approach for identifying genetic variation. In the age of sequencing, family studies are again an appealing approach for studying the relationship between complex disease and genetic variation.

This workshop will focus on the use of family studies in the hunt for disease-associated genes, including the development of novel methodologies and statistics for assessing variant-disease relationships as well as the important role of the family study design in a clinical sequencing setting.

 

All participants are encouraged to bring posters to display. Posters should not necessarily be limited to work on family-based data and can cover any topics in statistical genetics, genomics, and bioinformatics.

 

This MBI workshop is being co-sponsored by the National Institute of Statistical Sciences.

 

Poster Presenters

  • Chenggong Han: Differential Methylation Analysis Adjusted for Covariates by Bayesian Curve Credible Bands Approach
  • Alyssa Clay-Gilmour: Rare Germline Variants Segregating in Chronic Lymphocytic Leukemia (CLL) Families
  • Shuyuan Lou: ChIA-Sim - A Simulator of 3D Interactions Data and Evaluation of Methods
  • Catherine Stein: Association Between Immunity Genes, Severity of TB and Interaction with M. Tuberculosis Lineage
  • Lara Sucheston-Campbell: Exomechip Analyses Identify Genes affecting Mortality after HLA-Matched Unrelated Donor Blood and Marrow Transplantation
  • Rosalie Waller: Novel Strategy to Map GWAS Functional Variants Using Sequencing in High-Risk Cases Identifies Putative Risk-Variants in Myeloma

 

Accepted Speakers

Christopher Bartlett
Mathematical Medicine, Nationwide Children's Hospital
Saonli Basu
Biostatistics, University of Minnesota
Pamela Brock
Wexner Medical Center, Human Genetics, The Ohio State University
Alyssa Clay-Gilmour
Health Science Research / Epidemiology/Biostatistics, Mayo Clinic
Jonathan Haines
Department of Population and Quantitative Health Sciences, Case Western Reserve University
Daniel Kinnamon
Internal Medicine, The Ohio State University
Daniel Koboldt
Institute for Genomic Medicine, Nationwide Children's Hospital
Cheryl London
Department of Veterinary Biosciences, The Ohio State University
Catherine Stein
Population & Quantitative Health Sciences, Case Western Reserve University
William Stewart
Pediatrics/Statistics, Nationwide Children's Hospital/The Ohio State University
Asuman Turkmen
Department of Statistics, The Ohio State University
Veronica Vieland
Pediatrics & Statistics, The Ohio State University
Rosalie Waller
Biomedical Informatics, University of Utah School of Medicine
Meng Wang
Stewart Lab, Nationwide Children's Hospital
Xiaofeng Zhu
Population and Quantitative Health Sciences, Case Western Reserve University
Monday, September 17, 2018
Time Session
08:30 AM
09:15 AM

Breakfast and Conference Registration

09:15 AM
09:30 AM

Welcome

09:30 AM
10:00 AM
Catherine Stein - Genetic Studies of Tuberculosis: Importance of the Family-based Design

Tuberculosis (TB) remains a major public health threat globally, and several studies have demonstrated a role for human genetic factors underlying TB risk. However, exposure to the causal bacterium, Mycobacterium tuberculosis, is a necessary risk factor for TB, and few population-based studies appropriately account for this exposure. In this talk, I will describe how we’ve utilized a family study to examine the genetic epidemiology of TB and address limitations in the extant literature. I will present both key findings and future directions.

10:00 AM
10:30 AM
Christopher Bartlett - Using Quantum Mechanical Devices to Perform Genomic Studies in Families: Challenges, Promises, Changes

Applying quantum physics to build quantum devices for computing has recently become reality, with companies such as Google, IBM, and Intel making prototypes for algorithm experimentation. These devices demonstrate that binary computing states (0 vs. 1) can be manipulated using the rules of quantum mechanics to include superposition, entanglement, and wave interference as fundamentally new avenues for computing algorithms. While quantum algorithms have already shown in-principle speed-ups over classical computation for certain classes of problems, such as factoring large integers into primes, finding new algorithms for statistical computation such as machine learning is ongoing. The key differences between classical and quantum computing will be discussed in the context of addressing genomics questions through simple quantum machine learning examples.

10:30 AM
11:00 AM

Break and Refreshments

11:00 AM
11:30 AM
Daniel Koboldt - Whole Genome Sequencing and Analysis for Rare Pediatric Conditions

Children with rare inherited conditions are increasingly referred for clinical exome sequencing, which yields a positive finding in only ~25-35% of them. For the remaining as-yet-undiagnosed cases, research sequencing of the proband and available family members has the potential to uncover new genetic etiologies of disease. Our institute has enrolled more than 40 families suffering from rare inherited conditions into a research genomics protocol. Using predominantly whole genome sequencing (WGS) of multiple family members, we have identified likely causal variants in 30% of cases and strong candidate variants in another 20%. Here, I describe the workflow of our rare disease genomics research program, including recruitment and case selection, sequencing/analysis strategies, candidate validation, and reporting of results. I will also highlight some solved cases whose underlying etiology or phenotypic association challenges the current knowledge of genotype-phenotype relationships.

11:30 AM
12:00 PM
Rosalie Waller - Shared Genomic Segment Analysis for Complex Traits

High-risk pedigrees (HRPs) are a key design in mapping rare and highly-penetrant genes in Mendelian-like diseases. However, success with the HRP design in complex diseases has been modest, in part because standard methods do not adequately address genetic heterogeneity. Novel methods are needed to re-invigorate HRP designs for gene-discovery in complex diseases. Extended high-risk pedigrees can contain sufficient meioses to gain power for gene mapping as single pedigrees; however, intrafamilial heterogeneity may still exist. To address intrafamilial heterogeneity, we expanded on the Shared Genomic Segment (SGS) method, a large pedigree mapping method that identifies subsets of cases within an extended pedigree that share segregating chromosomal regions. Here, I will describe this strategy and our application to high-risk myeloma pedigrees.

12:00 PM
01:30 PM

Lunch and Participant Collaboration

01:30 PM
02:00 PM
Pamela Brock - Gene Hunting Studies in Familial Non-medullary Thyroid Cancer

Abstract not available

02:00 PM
02:30 PM
Jonathan Haines - Genetic Studies in the Mid-Western Amish

Genetic studies of diseases of aging have been done predominantly in clinic-based case-control datasets drawn from the general population. While these have the advantage of being relatively easy to collect and thus can generate large sample sizes, they do have limitations in ascertainment bias, differences in case and control ascertainment, and focus on genetic association analyses. Using special populations, such as the mid-Western Amish, overcomes several of these limitations.

Over the past 15 years, we have worked collaboratively to collect phenotype and genotype information on the Amish of Holmes County in Ohio, and Elkhart, LaGrange, and Adams counties in Indiana. The Amish are culturally and genetically isolated and their lifestyle tends to be quite homogeneous, making genetic studies quite valuable. We have focused our efforts on two significant diseases of aging: Alzheimer disease (AD) and Age Related Macular Degeneration (AMD). Our ongoing studies have demonstrated that the genetic architecture of AD and AMD differs significantly from that of the general population, strongly suggesting that novel loci exist in the Amish. Current studies are aimed at finding these novel loci using a combination of genome-wide association and whole genome sequencing data.

02:30 PM
03:30 PM

Panel Discussion (all speakers for the day)

03:30 PM
04:00 PM

Break and Refreshments

04:00 PM
05:00 PM

Breakout Sessions (spontaneous organization)

Tuesday, September 18, 2018
Time Session
08:30 AM
09:30 AM

Breakfast and Daily Introduction

09:30 AM
10:00 AM
Alyssa Clay - Large-scale Linkage Analysis of Multiple Myeloma (MM) and Monoclonal Gammopathy of Undetermined Significance (MGUS) Families

Multiple myeloma (MM) is a result of a malignant transformation of plasma cells that is preceded by the presence of an asymptomatic clonal plasma cell expansion, a condition referred to as monoclonal gammopathy of undetermined significance (MGUS). We and others have shown familial aggregation of MM and MGUS. Evidence from epidemiologic, family and genome-wide association studies (GWAS) suggests a genetic component underlying MM etiology. GWAS have successfully established 17 common genetic risk loci for MM to date and recently, rare inherited susceptibility variants in the LSD1 / KDM1A and USP45 genes were identified in familial MM / MGUS kindreds. Family-based approaches may be used to elucidate genetic variation contributing to familial MM. Genetic linkage analysis has historically been used to detect the chromosomal location of disease genes. The objective of this study was to conduct a linkage analysis of MM / MGUS families to identify genomic regions for MM / MGUS.

10:00 AM
10:30 AM
Asuman Turkmen - Rare Variant Analysis of Autosomal and X Chromosome Genetic Data in Family-based Sequencing Studies

The inability of common variants identified by genome-wide association studies (GWAS) to explain much of the heritability of most complex diseases, and the advances of next-generation sequencing (NGS) technologies, have led to an increased interest in investigating the etiology of complex disease due to rare variants. Despite numerous methodologies proposed for rare variant associations over the years, discovery of these variants has remained elusive, mostly restricted to population-based designs. Further, little is known about rare X-linked variants associated with complex diseases. Here we propose a method to test for association of rare variants obtained by sequencing in family-based samples utilizing a variance component test. The proposed approach can be used for both autosomes and the X chromosome. Using simulations, we show that the method performs on par with the existing methods for autosomes while being more computationally efficient. Its performance for the X chromosome is also promising based on the results of the simulated and real data from the University of Miami Study on Genetics of Autism and Related Disorders.

Joint work with Shili Lin, Department of Statistics, The Ohio State University

10:30 AM
11:00 AM

Break and Refreshments

11:00 AM
11:30 AM
Veronica Vieland - Linkage Analysis of Complex Traits: Failed Paradigm or Powerful Tool?

One obvious thing that families are good for in human genetic research is linkage analysis (LA): mapping disease genes based on co-segregation within pedigrees (violations of Mendel II) between phenotypes of interest and DNA marker genotypes. LA has yielded genes for thousands of Mendelian disorders, but for complex disorders it has fallen out of vogue in favor of GWAS- and NGS-based designs. Psychiatric genetics is often held out as the poster child for the failure of LA to yield meaningful results for complex traits, but in this talk I will utilize the schizophrenia collection within the data repository of the National Institute of Mental Health (NIMH) to argue that we don't actually know yet whether LA 'works' or not for psychiatric disorders. I will describe a large-scale study to revisit the families in the NIMH collection, and illustrate the need for updating genotypes, phenotypes and statistical methods before we can assess the efficacy of LA in psychiatric applications.
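As a rough illustration of the co-segregation idea behind linkage analysis, the sketch below computes a two-point LOD score in the simplest textbook setting: fully informative, phase-known meioses in which recombinants can be counted directly. This is not the method discussed in the talk; the function names `lod_score` and `max_lod` and the example counts are hypothetical and chosen only for illustration.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score for phase-known, fully informative meioses.

    LOD(theta) = log10( theta^r * (1 - theta)^(n - r) / 0.5^n ),
    where r is the number of recombinants among n meioses.
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    non_recombinants = meioses - recombinants
    # Log-likelihood under linkage at recombination fraction theta.
    log_lik_linked = (recombinants * math.log10(theta)
                      + non_recombinants * math.log10(1.0 - theta))
    # Log-likelihood under no linkage (theta = 0.5).
    log_lik_unlinked = meioses * math.log10(0.5)
    return log_lik_linked - log_lik_unlinked


def max_lod(recombinants: int, meioses: int) -> tuple[float, float]:
    """Grid search over theta = 0.01, 0.02, ..., 0.50 for the largest LOD."""
    grid = [i / 100 for i in range(1, 51)]
    best_theta = max(grid, key=lambda t: lod_score(recombinants, meioses, t))
    return best_theta, lod_score(recombinants, meioses, best_theta)


if __name__ == "__main__":
    # Hypothetical data: 1 recombinant observed in 10 informative meioses.
    theta_hat, lod = max_lod(recombinants=1, meioses=10)
    print(f"theta_hat = {theta_hat:.2f}, max LOD = {lod:.3f}")
```

Real pedigree data are rarely phase-known, so practical analyses rely on likelihood calculations over whole pedigrees; the counting formula above only conveys the intuition.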

11:30 AM
12:45 PM

Lunch and Participant Collaboration

12:45 PM
01:45 PM

Poster Session

01:45 PM
02:00 PM

Break

02:00 PM
02:30 PM
Meng Wang - FamLBL: Detecting Rare Haplotype Disease Association Based on Common SNPs Using Case-parent Triads

Motivation: In recent years, there has been an increasing interest in using common single-nucleotide polymorphisms (SNPs) amassed in genome-wide association studies to investigate rare haplotype effects on complex diseases. Evidence has suggested that rare haplotypes may tag rare causal single-nucleotide variants, making SNP-based rare haplotype analysis not only cost effective, but also more valuable for detecting causal variants. Although a number of methods for detecting rare haplotype association have been proposed in recent years, they are population based and thus susceptible to population stratification.

Results: We propose family-triad-based logistic Bayesian Lasso (famLBL) for estimating effects of haplotypes on complex diseases using SNP data. By choosing an appropriate prior distribution, effect sizes of unassociated haplotypes can be shrunk toward zero, allowing for more precise estimation of associated haplotypes, especially those that are rare, thereby achieving greater detection power. We evaluate famLBL using simulation to gauge its type I error and power. Comparison with its population counterpart, LBL, highlights famLBL’s robustness property in the presence of population substructure. Further investigation by comparing famLBL with the Family-Based Association Test (FBAT) reveals its advantage for detecting rare haplotype association.

02:30 PM
03:00 PM
Xiaofei Zhou - Detecting Rare Associated Haplotypes from Independent Case, Control and Family Trio Data: Combined Logistic Bayesian Lasso

The Genome Wide Association Study (GWAS) has brought some successes regarding the genetic mechanism of disease development: by now, it has identified more than 5,000 SNPs associated with more than 200 common diseases. However, most SNPs detected by GWAS have moderate or small effect sizes, and they suffer from “missing heritability”, a problem where a large portion of expected genomic variability is unexplained by the observed genomic variants. A popular theory that explains missing heritability is the Common Disease Rare Variant (CDRV) hypothesis, which states that the genetic cause of the disease can comprise a multiplicity of rare variants with high penetrance [Manolio et al., 2009]. The association between diseases and rare variants is usually difficult to detect because of the large noise resulting from the variants’ low frequencies, but the introduction of penalization methods such as Lasso makes it possible to identify such associations. Moreover, the variants of interest don’t have to be single nucleotide variants. In fact, haplotype variants are of greater interest, as alleles at nearby loci tend to act together. Several methods have been developed in recent years to reveal rare associated haplotypes. Biswas’s Logistic Bayesian Lasso (LBL) [Biswas and Lin, 2012] works on independent case/control data and Wang’s family triad Logistic Bayesian Lasso (famLBL) [Wang and Lin, 2014] caters to family trio data. In real applications, many data sets provide both case/control and family trio data, and focusing on only one type of data means abandoning valuable information. Here, we develop a new logistic Bayesian Lasso method (called combined logistic Bayesian Lasso, or cLBL) that can jointly analyze the two data types. Our simulation shows cLBL provides higher statistical power compared with LBL and famLBL. When applied to the Framingham Heart Study (FHS), cLBL detects quite a few associated haplotypes that are otherwise insignificant under both LBL and famLBL.

03:00 PM
03:30 PM

Break and Refreshments

03:30 PM
04:00 PM

Panel Discussion (all speakers for the day)

04:00 PM
05:00 PM

Breakout Sessions (spontaneous organization)

Wednesday, September 19, 2018
Time Session
08:30 AM
09:30 AM

Breakfast and Daily Introduction

09:30 AM
10:00 AM
Saonli Basu - A Robust and Unified Framework for Estimating Heritability in Twin Studies using Generalized Estimating Equations

The development of a complex disease is an intricate interplay of genetic and environmental factors. "Heritability" is defined as the proportion of total trait variance due to genetic factors within a given population. Studies with monozygotic and dizygotic twins allow us to estimate heritability by fitting an "ACE" model which estimates the proportion of trait variance explained by additive genetic (A), common shared environment (C), and unique non-shared environmental (E) latent effects, thus helping us better understand disease risk and etiology. In this paper, we develop a flexible generalized estimating equations framework ("GEE2") for fitting twin ACE models that requires minimal distributional assumptions; rather, only the first two moments need to be correctly specified. We show that two commonly used methods for estimating heritability, the normal ACE model ("NACE") and Falconer's method, can both be fit within this unified GEE2 framework, which additionally provides robust standard errors. Although the traditional Falconer's method cannot directly adjust for covariates, the corresponding GEE2 version ("GEE2-Falconer") can incorporate covariate effects (e.g. let heritability vary by sex or age). Given non-normal data, these GEE2 models attain significantly better coverage of the true heritability compared to the traditional NACE and Falconer's methods. Finally, we demonstrate that Falconer's method can consistently estimate heritability when the ACE variance parameters differ between MZ and DZ twins, whereas the NACE will produce biased estimates in such settings.

Joint work with Jaron Arbet, Department of Biostatistics, University of Minnesota
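For readers unfamiliar with Falconer's method mentioned in the abstract, the following minimal sketch shows the classical moment estimates of the ACE variance proportions from the MZ and DZ twin-pair correlations. It illustrates only the traditional formula, not the GEE2 framework proposed in the talk; the function name `falconer_ace` and the example correlations are hypothetical.

```python
def falconer_ace(r_mz: float, r_dz: float) -> dict[str, float]:
    """Classical Falconer estimates of ACE variance proportions.

    r_mz, r_dz: trait correlations within monozygotic and dizygotic
    twin pairs. Under the ACE model, r_mz = A + C and r_dz = A/2 + C,
    with A + C + E = 1, which gives the estimates below.
    """
    a = 2.0 * (r_mz - r_dz)  # additive genetic proportion (heritability)
    c = 2.0 * r_dz - r_mz    # common shared environment proportion
    e = 1.0 - r_mz           # unique non-shared environment proportion
    return {"A": a, "C": c, "E": e}


if __name__ == "__main__":
    # Hypothetical twin correlations, chosen only for illustration.
    estimates = falconer_ace(r_mz=0.8, r_dz=0.5)
    for component, value in estimates.items():
        print(f"{component}: {value:.2f}")
```

Because these are simple moment estimates, they can fall outside the range [0, 1] in finite samples, and they carry no standard errors on their own.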

10:00 AM
10:30 AM
William Stewart - Using Millions of SNPs and a Few Extended Pedigrees to Accelerate Disease-Gene Discovery

For cosegregation studies involving a large number of small affected families and modern SNP arrays, it is difficult to improve upon the DSE -- our near-optimal estimator of disease-gene location that averages location estimates from random subsamples of the available dense SNP data. However, for studies involving dense SNPs and a small number of large families, the usual asymptotics no longer apply. As such, accurate estimation of the variance of the DSE is nontrivial. Here, I describe an importance sampling approach that accurately approximates the variance of the DSE. In principle, additional gains in precision are possible by using publicly available reference samples to better account for the correlations between SNPs. I applied my approximate importance sampling approach to dense SNP data simulated under recessive and dominant models. In each setting, the variance of the DSE was accurately estimated, and relative to approximate 95% confidence intervals (CIs) constructed from existing methods, my CIs for disease-gene location were substantially shorter. As such, researchers with large affected families and dense SNP data should now be able to significantly reduce their targeted re-sequencing costs, and greatly expedite the rate at which disease-genes are found.

10:30 AM
11:00 AM

Break and Refreshments

11:00 AM
11:30 AM
Xiaofeng Zhu - Analysing Rare Variants Using Families in Large Whole Genome Sequencing Data

Advances in genomics have led to a substantial increase in the availability of whole genome sequencing data, which allows us to identify rare variants associated with complex traits and potentially seek an alternative explanation of “missing heritability”. Many statistical methods for analysing rare variant association have been developed, but these statistical methods have mainly focused on weighting rare variants using genome annotation. In this talk, I will discuss the ways in which traditional linkage information from family data can help in prioritizing rare variants, which is independent of genome annotation. I will use both simulations and real whole genome sequencing data to illustrate that family data can improve statistical power to detect rare variant associations.

11:30 AM
12:00 PM
Daniel Kinnamon - Applying Quantitative Genetics Approaches to Understand the Genetic Etiology of Idiopathic Dilated Cardiomyopathy in the Exome Era

Idiopathic dilated cardiomyopathy (DCM), defined as the presence of systolic dysfunction and left ventricular enlargement in the absence of non-genetic clinical cause, is a major cause of heart failure. Nearly 40 genes have been established as relevant for idiopathic DCM, typically assuming a Mendelian monogenic disease model with autosomal dominant inheritance. While exome sequencing of probands and affected relatives in families has made it possible to discover rare variants across all of these genes with a single assay, determining the biological relevance of these variants is often difficult. When applying American College of Medical Genetics variant adjudication criteria, many potentially relevant rare variants are classified as variants of uncertain significance, and the presence of such variants in multiple genes in a single family introduces additional ambiguity. Moreover, due to variable age at onset and disease severity, unaffected relatives provide limited information on the biological relevance of these variants without additional data. To overcome these challenges, gain additional insight into the biological relevance of variants identified through exome sequencing, and evaluate more complex genetic disease models, we have turned to examining familial variation in quantitative endophenotypes using adaptations of the measured genotype model. I will describe the rationale for this approach, an initial successful application, and our plans to apply it more broadly in the cohort of 1300 families currently being recruited as part of the NHLBI- and NHGRI-funded DCM Precision Medicine Study.

12:00 PM
12:30 PM

Panel Discussion (all speakers)

12:30 PM
02:00 PM

Lunch and Participant Collaboration

Name Email Affiliation
Bartlett, Christopher christopher.bartlett@nationwidechildrens.org Mathematical Medicine, Nationwide Children's Hospital
Basu, Saonli saonli@umn.edu Biostatistics, University of Minnesota
Brock, Pamela pamela.brock@osumc.edu Wexner Medical Center, Human Genetics, The Ohio State University
Clay-Gilmour, Alyssa Clay.alyssa@mayo.edu Health Science Research / Epidemiology/Biostatistics, Mayo Clinic
Comiskey, Daniel daniel.comiskey@osumc.edu Comprehensive Cancer Center, The Ohio State University
de la Chapelle, Albert albert.delachapelle@osumc.edu SBS-Cancer Biology & Genetics, The Ohio State University
Fu, Han fu.607@osu.edu Statistics, The Ohio State University
Govinder, Kesh govinder@ukzn.ac.za Mathematics, Statistics and Computer Science, University of KwaZulu-Natal
Haines, Jonathan jlh213@case.edu Department of Population and Quantitative Health Sciences, Case Western Reserve University
Han, Chenggong han.1071@osu.edu Biostatistics, The Ohio State University
Han, Chenggong han.1071@osu.edu Statistics, The Ohio State University
Karaesmen, Ezgi karaesmen.1@osu.edu College of Pharmacy, The Ohio State University
Karaesmen, Ezgi karaesmen.1@osu.edu Pharmaceutics and Pharmaceutical Chemistry, The Ohio State University
Kinnamon, Daniel Daniel.Kinnamon@osumc.edu Internal Medicine, The Ohio State University
Koboldt, Daniel daniel.koboldt@nationwidechildrens.org Institute for Genomic Medicine, Nationwide Children's Hospital
Lin, Shili shili@stat.ohio-state.edu Statistics, The Ohio State University
London, Cheryl london.20@osu.edu Department of Veterinary Biosciences, The Ohio State University
Lou, Shuyuan lou.59@osu.edu Biostatistics, The Ohio State University
Luo, Hengrui luo.619@osu.edu Department of Statistics, The Ohio State University
Nieminen, Taina taina.nieminen@osumc.edu Comprehensive Cancer Center, The Ohio State University
Razvi, Abbas razvi.7@osu.edu College of Pharmacy, The Ohio State University
Rizvi, Abbas rizvi.33@buckeyemail.osu.edu Pharmaceutical Sciences, The Ohio State University
Rong-Mullins, Xiaoqing rong-mullins.1@osu.edu COPH-Division of Biostatistics, The Ohio State University
Sovic, Michael sovic.1@osu.edu College of Pharmacy, The Ohio State University
Stein, Catherine catherine.stein@case.edu Population & Quantitative Health Sciences, Case Western Reserve University
Stewart, William William.Stewart@nationwidechildrens.org Pediatrics/Statistics, Nationwide Children's Hospital/The Ohio State University
Sucheston-Campbell, Lara sucheston-campbell.1@osu.edu Pharmacy Practice and Science, The Ohio State University
Sundar Rajan, Neeraja sundarrajan.3@osu.edu Ctr for Life Science Education, The Ohio State University
Tang, Hancong tang.889@osu.edu Department of Biostatistics, The Ohio State University
Turkmen, Asuman turkmen.2@osu.edu Department of Statistics, The Ohio State University
Vieland, Veronica veronica.vieland@nationwidechildrens.org Pediatrics & Statistics, The Ohio State University
Waller, Rosalie Rosalie.Waller@hci.utah.edu Biomedical Informatics, University of Utah School of Medicine
Wang, Meng wang.1357@osu.edu Stewart Lab, Nationwide Children's Hospital
Wang, Yiwen wang.11518@osu.edu College of Pharmacy, The Ohio State University
Wang, Junke wang.12157@osu.edu Pharmacy, Ohio State University
Wang, Yanqiang yanqiang.wang@osumc.edu Comprehensive Cancer Center, The Ohio State University
Wei, Hsiu-Chuan hsiuwei@fcu.edu.tw Mathematical Biosciences Institute, The Ohio State University
Xie, Qing xie.735@osu.edu Department of Statistics, Ohio State University
Xie, Qing Xie xie.735@osu.edu Statistics, The Ohio State University
Yang, Pei yang.1736@osu.edu Department of Statistics, The Ohio State University
Zafar, Ayesha zafar.34@osu.edu Pharmacogenomics, The Ohio State University
Zhou, Xiaofei zhou.1150@osu.edu Department of Statistics, The Ohio State University
Zhu, Xiaofeng xxz10@case.edu Population and Quantitative Health Sciences, Case Western Reserve University
Using Quantum Mechanical Devices to Perform Genomic Studies in Families: Challenges, Promises, Changes

Applying quantum physics to build quantum devices for computing has recently become reality with companies such as Google, IBM, and Intel making prototypes for algorithm experimentation. These devices demonstrate that binary computing states (0 vs. 1) can be manipulated using the rules of quantum mechanics to include superposition, entanglement, and wave interference as fundamentally new avenues for computing algorithms. While quantum algorithms have already shown in-principle speed-ups over classical computation for certain classes of problems such as factoring prime numbers, finding new algorithms for statistical computation such as machine learning is ongoing. The key differences between classical and quantum computing will be discussed in the context addressing genomics questions through simple quantum machine learning examples.

A Robust and Unified Framework for Estimating Heritability in Twin Studies using Generalized Estimating Equations

The development of a complex disease is an intricate interplay of genetic and environmental factors. "Heritability" is defined as the proportion of total trait variance due to genetic factors within a given population. Studies with monozygotic and dizygotic twins allow us to estimate heritability by fitting an "ACE" model which estimates the proportion of trait variance explained by additive genetic (A), common shared environment (C), and unique non-shared environmental (E) latent effects, thus helping us better understand disease risk and etiology. IIn this paper, we develop a flexible generalized estimating equations framework (``GEE2'') for fitting twin ACE models that requires minimal distributional assumptions; rather only the first two moments need to be correctly specified. We show that two commonly used methods for estimating heritability, the normal ACE model (``NACE'') and Falconer's method, can both be fit within this unified GEE2 framework, which additionally provides robust standard errors. Although the traditional Falconer's method cannot directly adjust for covariates, the corresponding GEE2 version (``GEE2-Falconer'') can incorporate covarimate effects (e.g. let heritability vary by sex or age). Given non-normal data, these GEE2 models attain significantly better coverage of the true heritability compared to the traditional NACE and Falconer's methods. Finally, we demonstrate that Falconer's method can consistently estimate heritability when the ACE variance parameters differ between MZ and DZ twins; whereas the NACE will produce biased estimates in such settings.

Joint work with Jaron Arbet, Department of Biostatistics, University of Minnesota

Gene hunting studies in familial non-medullary thyroid cancer

Abstract not available

Large-scale Linkage Analysis of Multiple Myeloma (MM) and Monoclonal Gammopathy of Undetermined Significance (MGUS) Families

Multiple myeloma (MM) is a result of a malignant transformation of plasma cells that is preceded by the presence of an asymptomatic clonal plasma cell expansion, a condition referred to as monoclonal gammopathy of undetermined significance (MGUS). We and others have shown familial aggregation of MM and MGUS. Evidence from epidemiologic, family and genome-wide association studies (GWAS) suggests a genetic component underlying MM etiology. GWAS have successfully established 17 common genetic risk loci for MM to date and recently, rare inherited susceptibility variants in the LSD1 / KDM1A and USP45 genes were identified in familial MM / MGUS kindreds. Family-based approaches may be used to elucidate genetic variation contributing to familial MM. Genetic linkage analysis has historically been used to detect the chromosomal location of disease genes. The objective of this study was to conduct a linkage analysis of MM / MGUS families to identify genomic regions for MM / MGUS.

Genetic Studies in the Mid-Western Amish

Genetic studies of diseases of aging have been done predominantly in clinic-based case-control datasets drawn from the general population. While these have the advantage of being relatively easy to collect and thus can generate large sample sizes, they have limitations, including ascertainment bias, differences between case and control ascertainment, and a narrow focus on genetic association analyses. Using special populations, such as the mid-Western Amish, overcomes several of these limitations.

Over the past 15 years, we have worked collaboratively to collect phenotype and genotype information on the Amish of Holmes County in Ohio, and Elkhart, LaGrange, and Adams Counties in Indiana. The Amish are culturally and genetically isolated and their lifestyle tends to be quite homogeneous, making genetic studies in this population especially valuable. We have focused our efforts on two significant diseases of aging: Alzheimer disease (AD) and age-related macular degeneration (AMD). Our ongoing studies have demonstrated that the genetic architecture of AD and AMD in the Amish differs significantly from that in the general population, strongly suggesting that novel loci exist in the Amish. Current studies are aimed at finding these novel loci using a combination of genome-wide association and whole genome sequencing data.

Genetic Studies in the Mid-Western Amish

Abstract not available

Applying Quantitative Genetics Approaches to Understand the Genetic Etiology of Idiopathic Dilated Cardiomyopathy in the Exome Era

Idiopathic dilated cardiomyopathy (DCM), defined as the presence of systolic dysfunction and left ventricular enlargement in the absence of non-genetic clinical cause, is a major cause of heart failure. Nearly 40 genes have been established as relevant for idiopathic DCM, typically assuming a Mendelian monogenic disease model with autosomal dominant inheritance. While exome sequencing of probands and affected relatives in families has made it possible to discover rare variants across all of these genes with a single assay, determining the biological relevance of these variants is often difficult. When applying American College of Medical Genetics variant adjudication criteria, many potentially relevant rare variants are classified as variants of uncertain significance, and the presence of such variants in multiple genes in a single family introduces additional ambiguity. Moreover, due to variable age at onset and disease severity, unaffected relatives provide limited information on the biological relevance of these variants without additional data. To overcome these challenges, gain additional insight into the biological relevance of variants identified through exome sequencing, and evaluate more complex genetic disease models, we have turned to examining familial variation in quantitative endophenotypes using adaptations of the measured genotype model. I will describe the rationale for this approach, an initial successful application, and our plans to apply it more broadly in the cohort of 1300 families currently being recruited as part of the NHLBI- and NHGRI-funded DCM Precision Medicine Study.

Whole Genome Sequencing and Analysis for Rare Pediatric Conditions

Children with rare inherited conditions are increasingly referred for clinical exome sequencing, which yields a positive finding in only ~25-35% of them. For the remaining as-yet-undiagnosed cases, research sequencing of the proband and available family members has the potential to uncover new genetic etiologies of disease. Our institute has enrolled more than 40 families affected by rare inherited conditions into a research genomics protocol. Using predominantly whole genome sequencing (WGS) of multiple family members, we have identified likely causal variants in 30% of cases and strong candidate variants in another 20%. Here, I describe the workflow of our rare disease genomics research program, including recruitment and case selection, sequencing/analysis strategies, candidate validation, and reporting of results. I will also highlight some solved cases whose underlying etiology or phenotypic association challenges the current knowledge of genotype-phenotype relationships.

Genetic Studies of Tuberculosis: Importance of the Family-based Design

Tuberculosis (TB) remains a major public health threat globally, and several studies have demonstrated a role for human genetic factors underlying TB risk. However, exposure to the causal bacterium, Mycobacterium tuberculosis, is a necessary risk factor for TB, and few population-based studies appropriately account for this exposure. In this talk, I will describe how we’ve utilized a family study to examine the genetic epidemiology of TB and address limitations in the extant literature. I will present both key findings and future directions.

Using Millions of SNPs and a Few Extended Pedigrees to Accelerate Disease-Gene Discovery

For cosegregation studies involving a large number of small affected families and modern SNP arrays, it is difficult to improve upon the DSE -- our near-optimal estimator of disease-gene location that averages location estimates from random subsamples of the available dense SNP data. However, for studies involving dense SNPs and a small number of large families, the usual asymptotics no longer apply. As such, accurate estimation of the variance of the DSE is nontrivial. Here, I describe an importance sampling approach that accurately approximates the variance of the DSE. In principle, additional gains in precision are possible by using publicly available reference samples to better account for the correlations between SNPs. I applied my approximate importance sampling approach to dense SNP data simulated under recessive and dominant models. In each setting, the variance of the DSE was accurately estimated, and relative to approximate 95% confidence intervals (CIs) constructed from existing methods, my CIs for disease-gene location were substantially shorter. As such, researchers with large affected families and dense SNP data should now be able to significantly reduce their targeted re-sequencing costs, and greatly expedite the rate at which disease-genes are found.

Rare Variant Analysis of Autosomal and X Chromosome Genetic Data in Family-based Sequencing Studies

The inability of common variants identified by genome-wide association studies (GWAS) to explain much of the heritability of most complex diseases, together with advances in next-generation sequencing (NGS) technologies, has led to increased interest in investigating the contribution of rare variants to the etiology of complex disease. Despite the numerous methodologies proposed for rare variant association over the years, discovery of these variants has remained elusive, and the methods are mostly restricted to population-based designs. Further, little is known about rare X-linked variants associated with complex diseases. Here we propose a method to test for association of rare variants obtained by sequencing in family-based samples utilizing a variance component test. The proposed approach can be used for both autosomes and the X chromosome. Using simulations, we show that the method performs on par with existing methods for autosomes while being more computationally efficient. Its performance for the X chromosome is also promising, based on results from simulated data and real data from the University of Miami Study on Genetics of Autism and Related Disorders.

Joint work with Shili Lin, Department of Statistics, The Ohio State University

Linkage Analysis of Complex Traits: Failed Paradigm or Powerful Tool?

One obvious thing that families are good for in human genetic research is linkage analysis (LA): mapping disease genes based on co-segregation within pedigrees (violations of Mendel II) between phenotypes of interest and DNA marker genotypes. LA has yielded genes for thousands of Mendelian disorders, but for complex disorders it has fallen out of vogue in favor of GWAS- and NGS-based designs. Psychiatric genetics is often held out as the poster child for the failure of LA to yield meaningful results for complex traits, but in this talk I will utilize the schizophrenia collection within the data repository of the National Institute of Mental Health (NIMH) to argue that we don't actually know yet whether LA 'works' or not for psychiatric disorders. I will describe a large-scale study to revisit the families in the NIMH collection, and illustrate the need for updating genotypes, phenotypes and statistical methods before we can assess the efficacy of LA in psychiatric applications.

Shared Genomic Segment Analysis for Complex Traits

High-risk pedigrees (HRPs) are a key design in mapping rare and highly-penetrant genes in Mendelian-like diseases. However, success with the HRP design in complex diseases has been modest, in part because standard methods do not adequately address genetic heterogeneity. Novel methods are needed to re-invigorate HRP designs for gene-discovery in complex diseases. Extended high-risk pedigrees can contain sufficient meioses to gain power for gene mapping as single pedigrees; however, intrafamilial heterogeneity may still exist. To address intrafamilial heterogeneity, we expanded on the Shared Genomic Segment (SGS) method, a large pedigree mapping method that identifies subsets of cases within an extended pedigree that share segregating chromosomal regions. Here, I will describe this strategy and our application to high-risk myeloma pedigrees.

FamLBL: Detecting Rare Haplotype Disease Association Based on Common SNPs Using Case-parent Triads

Motivation: In recent years, there has been an increasing interest in using common single-nucleotide polymorphisms (SNPs) amassed in genome-wide association studies to investigate rare haplotype effects on complex diseases. Evidence has suggested that rare haplotypes may tag rare causal single-nucleotide variants, making SNP-based rare haplotype analysis not only cost effective, but also more valuable for detecting causal variants. Although a number of methods for detecting rare haplotype association have been proposed in recent years, they are population based and thus susceptible to population stratification.

Results: We propose the family-triad-based logistic Bayesian Lasso (famLBL) for estimating effects of haplotypes on complex diseases using SNP data. By choosing an appropriate prior distribution, effect sizes of unassociated haplotypes can be shrunk toward zero, allowing for more precise estimation of associated haplotypes, especially those that are rare, thereby achieving greater detection power. We evaluate famLBL using simulation to gauge its type I error and power. Comparison with its population counterpart, LBL, highlights famLBL's robustness in the presence of population substructure. Further comparison of famLBL with the Family-Based Association Test (FBAT) reveals its advantage for detecting rare haplotype association.

Detecting Rare Associated Haplotypes from Independent Case, Control and Family Trio Data: Combined Logistic Bayesian Lasso

The Genome Wide Association Study (GWAS) has brought some successes in understanding the genetic mechanism of disease development: by now, it has identified more than 5,000 SNPs associated with more than 200 common diseases. However, most SNPs detected by GWAS have moderate or small effect sizes, and they suffer from “missing heritability”, a problem where a large portion of the expected genetic variability is unexplained by the observed genomic variants. A popular theory that explains missing heritability is the Common Disease Rare Variant Hypothesis (CDRV), which states that the genetic cause of a disease can comprise a multiplicity of rare variants with high penetrance [Manolio et al., 2009]. The association between diseases and rare variants is usually difficult to detect because of the large noise resulting from the variants’ low frequencies, but the introduction of penalization methods such as the Lasso makes it possible to identify such associations. Moreover, the variants of interest don’t have to be single nucleotide variants. In fact, haplotype variants are of greater interest, as alleles at nearby loci tend to act together. Several methods have been developed in recent years to reveal rare associated haplotypes. Biswas’s Logistic Bayesian Lasso (LBL) [Biswas and Lin, 2012] works on independent case/control data and Wang’s family triad Logistic Bayesian Lasso (famLBL) [Wang and Lin, 2014] caters to family trio data. In real applications, many data sets provide both case/control and family trio data, and focusing on only one type of data means abandoning valuable information. Here, we develop a new logistic Bayesian Lasso method (called combined logistic Bayesian Lasso, or cLBL) that can jointly analyze the two data types. Our simulation shows that cLBL provides higher statistical power than LBL and famLBL. When applied to the Framingham Heart Study (FHS), cLBL detects quite a few associated haplotypes that are not significant under either LBL or famLBL.

Analysing Rare Variants Using Families in Large Whole Genome Sequencing Data

Advances in genomics have led to a substantial increase in the availability of whole genome sequencing data, which allows us to identify rare variants associated with complex traits and potentially find an alternative explanation of “missing heritability”. Many statistical methods for analysing rare variant association have been developed, but these methods have mainly focused on weighting rare variants using genome annotation. In this talk, I will discuss the ways in which traditional linkage information from family data, which is independent of genome annotation, can help in prioritizing rare variants. I will use both simulations and real whole genome sequencing data to illustrate that family data can improve statistical power to detect rare variant associations.