MBI Logo

Workshop 1: Analysis of Gene Expression Data: Principles and Applications (October 11-15, 2004)

Organizers: Terry Speed and Shili Lin

A (protein-coding) gene is determined to be expressed in a cell or group of cells when its transcribed messenger RNA (mRNA), or the resulting protein product, is detected. There is a wide variety of techniques for determining and quantifying gene expression, and most of these have substantial analytical components.

We measure gene expression in order to compare the expression levels of one or more genes in cells from different sources. Comparisons of interest include tumor versus normal cells; cells from a specific organ in a mutant or genetically modified organism versus cells from the same organ in a normal organism of the same strain; and cells before and after an intervention such as a drug treatment.

There are many techniques for measuring gene expression, but perhaps the most common at the moment are those that rely on DNA-RNA or DNA-DNA hybridization. This is the process through which single-stranded DNA and RNA molecules find and base-pair with their complementary sequences amidst a complex mixture of many molecules of the same kind.

The older cell-wide method for measuring gene expression at the protein level was two-dimensional gel (2D-Gel) analysis, in which complex mixtures were separated by pH and size using isoelectric focusing and polyacrylamide gel electrophoresis (PAGE). The technique was combined with mass spectrometry (MS) in the 1990s, and there are now a number of electrophoresis-free, MS-based approaches to measuring protein levels. More recently, protein arrays have been developed, and some of these will be discussed later in the year in Workshop 4.

On what scale do we measure gene expression? Much of the recent interest by statisticians in this area stems from the availability of data sets giving expression measurements on tens of thousands of genes: so-called microarray gene expression data. However, nylon membrane filters with thousands of genes spotted on them have been around for over a decade, and smaller-scale quantitative expression data for much longer. Similarly, 2D-Gel data are quite extensive, and MS techniques, especially when used in conjunction with other separation techniques, can produce up to 10^8 data points per sample. There are many differences between these technologies, but from the analytical viewpoint, many similarities as well.

In this workshop, we will survey some of the computational, mathematical, and statistical models and methods used in analyzing gene expression data. Much of our focus will be on approaches to quantifying mRNA, as that area is the best developed. We shall also present a small sample of the extensive biological and technological background to gene expression analysis.

Schedule

Monday, October 11
8:45-9:15am Coffee and Registration
9:15-9:30am Welcome and Introduction: Avner Friedman, Shili Lin, and Terry Speed
9:30-10:30am Earl Hubbell: Designing estimators for low-level expression analysis
10:30-11:00am Coffee Break
11:00-11:30am M. Kathleen Kerr: Comparison of Affymetrix and quantitative rtPCR measurements of relative gene expression
11:30-2:00pm Lunch Break
2:00-3:00pm David Kreil: From spot to biology: challenges in microarray data analysis
3:00-3:30pm Coffee break
3:30-4:30pm Informal Discussions
5:00-8:00pm Reception
Tuesday, October 12
9:00-10:00am Darlene Goldstein: Strategies for quantifying GeneChip expression for large studies
10:00-10:30am Coffee Break
10:30-11:30am W. Evan Johnson: Adjusting for the batch effect: an empirical Bayes approach to combining microarray data from multiple sources
11:30-2:00pm Lunch break
2:00-3:00pm Raymond Carroll: Efficient estimation of gene-environment interactions in case-control studies with quantitative gene information
3:00-3:30pm Coffee break
3:30-4:40pm Informal Discussions
Wednesday, October 13
9:00-10:00am Jason Hsu: Statistically designing microarray experiments and analyzing gene expression data in a decision-making process
10:00-10:30am Coffee Break
10:30-11:30am Susmita Datta: Significant analysis using P-values for multiple hypotheses testing in microarray experiments
11:30-2:00pm Lunch break
2:00-3:00pm David Allison: Opportunities, challenges, and issues posed by massive multiple inference in high dimensional biology
3:00-3:30pm Coffee break
3:30-4:30pm Eric Schadt: Complex systems to understand complex traits: beyond reagent driven science
4:30-5:00pm Informal Discussion
Thursday, October 14
9:00-10:00am Kim-Anh Do: A Bayesian mixture model for differential gene expression
10:00-10:30am Coffee Break
10:30-11:30am Rainer Spang: Differential co-expression of genes
11:30-2:00pm Lunch break
2:00-3:00pm Ina Hoeschele: Genetical genomics analysis to infer gene regulatory networks
3:00-3:30pm Coffee break
3:30-4:30pm Informal Discussions
6:00-9:00pm Banquet
Friday, October 15
9:00-10:00am Harmen Bussemaker: Inferring regulatory circuitry through model-based analysis of mRNA expression and ChIP data
10:00-10:30am Coffee Break
10:30-11:30am Hongyu Zhao: Integrated statistical analysis of gene expression data
11:30-2:00pm Lunch break
2:00-3:00pm Terry Speed: Overview and open problems in the analysis of gene expression microarray data
3:00-3:30pm Coffee break
3:30-4:30pm Informal Discussions