2014 Summer Undergraduate REU Program

The Ohio State University

OSU - Department of Mathematics

OSU - Department of Statistics

Topic 1:

Statistical Epigenetics – Shili Lin

Project Description:

Epigenetics is the study of heritable changes in gene expression or cellular phenotype caused by mechanisms other than changes in the underlying DNA sequence. DNA methylation is one of several epigenetic mechanisms that have been studied. Growing awareness of the contribution of epigenetic processes to genomic function in both health and disease has led to extremely active research on DNA methylation as well as histone methylation, and next-generation sequencing technology has further fueled interest in genome-wide methylation profiling. In this project, students will study the differences between data generated by two types of techniques for genome-wide methylation studies: bisulfite-conversion-based techniques (e.g. locus-specific bisulfite sequencing, the gold standard but expensive and therefore out of reach for most studies) and techniques based on nonconverted DNA (e.g. MethylCap-seq, which is more economical but lacks nucleotide-specific resolution). They will then apply this knowledge to compare existing statistical and computational tools for analyzing MethylCap-seq data, and to develop novel ones, focusing especially on tools that identify genomic regions that are differentially methylated between samples under two different conditions. Students will apply the methods to data from a sample of acute myeloid leukemia patients.
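To make the idea of identifying differentially methylated regions concrete, here is a minimal sketch. It is not one of the tools the project will compare. It assumes that enrichment reads have already been counted in fixed genomic windows for two conditions, and it assumes one pooled library per condition, ignoring replicates and overdispersion. It applies a Fisher exact test per window followed by a Benjamini-Hochberg adjustment. All names and counts (`counts_a`, `library_a`, the 0.05 threshold) are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= observed * (1 + 1e-9))
    return min(1.0, total)

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical read counts per window (made-up numbers, not real data).
counts_a = [10, 12, 40, 9]    # condition A
counts_b = [11, 10, 5, 10]    # condition B
library_a = library_b = 1000  # assumed total reads per condition

p_values = [
    fisher_exact_two_sided(x, library_a - x, y, library_b - y)
    for x, y in zip(counts_a, counts_b)
]
adjusted = benjamini_hochberg(p_values)
# Indices of windows flagged as differentially methylated at the assumed threshold.
flagged = [i for i, q in enumerate(adjusted) if q < 0.05]
print(flagged)
```

A realistic analysis would model variation between replicate samples rather than pooling reads, which is part of what distinguishes the tools compared in the project.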

Topic 2:

Statistical Shape Analysis – Sebastian Kurtek

Project Description:

Shape refers to the external appearance of an object as produced by its outline. Statistical analysis of shapes has played an important role in various biological applications, including disease diagnosis based on brain morphometry, bioinformatics (structural analysis of proteins), and the study of leaf shapes. During this project, students will develop tools for generating comparisons between shapes, computing statistical summaries of shapes such as the mean and covariance, defining probability distributions on shape classes, and studying clustering and classification of shapes. These tools will then be applied to different types of biological data.
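As an illustration of what comparing shapes and computing a mean shape can look like, here is a minimal sketch using classical landmark-based Procrustes analysis in the plane. This is an assumed, simplified stand-in and not necessarily the framework the project will use. Each shape is a list of corresponding landmarks stored as complex numbers x + iy, and all function names and the example triangle are hypothetical.

```python
import cmath
import math

def normalize(points):
    """Remove translation and scale from a list of complex landmarks."""
    centroid = sum(points) / len(points)
    centered = [p - centroid for p in points]
    size = math.sqrt(sum(abs(p) ** 2 for p in centered))
    return [p / size for p in centered]

def align(shape, reference):
    """Rotate a normalized shape to best match a normalized reference."""
    s = sum(a * b.conjugate() for a, b in zip(shape, reference))
    if abs(s) == 0:
        return list(shape)  # no preferred rotation; leave unchanged
    rotation = s.conjugate() / abs(s)  # unit complex number e^{i*theta}
    return [a * rotation for a in shape]

def procrustes_distance(a, b):
    """Distance between two shapes after removing translation, scale, rotation."""
    a, b = normalize(a), normalize(b)
    aligned = align(a, b)
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(aligned, b)))

def mean_shape(shapes, iterations=10):
    """Iteratively align all shapes to the current mean and re-average."""
    normalized = [normalize(s) for s in shapes]
    mean = normalized[0]
    for _ in range(iterations):
        aligned = [align(s, mean) for s in normalized]
        averaged = [sum(column) / len(aligned) for column in zip(*aligned)]
        mean = normalize(averaged)
    return mean

# Hypothetical example: a triangle and a scaled, rotated, shifted copy of it.
base = [0 + 0j, 2 + 0j, 0 + 1j]
moved = [p * cmath.rect(2.5, 0.7) + (3 - 4j) for p in base]
other = [0 + 0j, 1 + 0j, 0 + 3j]

print(procrustes_distance(base, moved))  # close to zero: same shape
print(procrustes_distance(base, other))  # clearly positive: different shape
```

Because the distance ignores position, size, and orientation, `base` and `moved` count as the same shape. Summaries such as a covariance could then be computed from the aligned landmark coordinates around the mean.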