A (protein-coding) gene is determined to be expressed in a cell or group of cells when its transcribed messenger RNA (mRNA), or the resulting protein product, is detected. There is a wide variety of techniques for determining and quantifying gene expression, and most of these have substantial analytical components.
We measure gene expression in order to compare the expression levels of one or more genes in cells from different sources. Comparisons of interest include tumor versus normal cells; cells from a specific organ in a mutant or genetically modified organism versus cells from the same organ in a normal organism of the same strain; and cells before and after an intervention such as a drug treatment.
There are many techniques for measuring gene expression, but perhaps the most common at the moment are those that rely on DNA-RNA or DNA-DNA hybridization. This is the process through which single-stranded DNA and RNA molecules find and base-pair with their complementary sequences amidst a complex mixture of many molecules of the same kind.
The older cell-wide method for measuring gene expression at the protein level was two-dimensional gel (2D-gel) analysis, in which complex mixtures were separated by pH and size using isoelectric focusing and polyacrylamide gel electrophoresis (PAGE). The technique was combined with mass spectrometry (MS) in the 1990s, and there are now a number of electrophoresis-free, MS-based approaches to measuring protein levels. More recently, protein arrays have been developed, and some of these will be discussed later in the year in Workshop 4.
On what scale do we measure gene expression? Much of the recent interest by statisticians in this area stems from the availability of data sets giving expression measurements on tens of thousands of genes, the so-called microarray gene expression data. However, nylon membrane filters with thousands of genes spotted on them have been around for over a decade, and smaller-scale quantitative expression data for much longer. Similarly, 2D-gel data are quite extensive, and MS techniques, especially when used in conjunction with other separation techniques, can produce up to 10^8 data points per sample. There are many differences between these technologies but, from the analytical viewpoint, many similarities as well.
In this workshop, we will survey some of the computational, mathematical, and statistical models and methods used in analyzing gene expression data. Much of our focus will be on approaches to quantifying mRNA, as that area is the best developed. We shall also present a small sample of the extensive biological and technological background to gene expression analysis.
Schedule

| Time | Event |
| --- | --- |
| Monday, October 11 | |
| 8:45-9:15am | Coffee and Registration |
| 9:15-9:30am | Welcome and Introduction: Avner Friedman, Shili Lin, and Terry Speed |
| 9:30-10:30am | Earl Hubbell: Designing estimators for low-level expression analysis |
| 10:30-11:00am | Coffee Break |
| 11:00-11:30am | M. Kathleen Kerr: Comparison of Affymetrix and quantitative rtPCR measurements of relative gene expression |
| 11:30-2:00pm | Lunch Break |
| 2:00-3:00pm | David Kreil: From spot to biology: challenges in microarray data analysis |
| 3:00-3:30pm | Coffee break |
| 3:30-4:30pm | Informal Discussions |
| 5:00-8:00pm | Reception |
| Tuesday, October 12 | |
| 9:00-10:00am | Darlene Goldstein: Strategies for quantifying GeneChip expression for large studies |
| 10:00-10:30am | Coffee Break |
| 10:30-11:30am | W. Evan Johnson: Adjusting for the batch effect: an empirical Bayes approach to combining microarray data from multiple sources |
| 11:30-2:00pm | Lunch break |
| 2:00-3:00pm | Raymond Carroll: Efficient estimation of gene-environment interactions in case-control studies with quantitative gene information |
| 3:00-3:30pm | Coffee break |
| 3:30-4:40pm | Informal Discussions |
| Wednesday, October 13 | |
| 9:00-10:00am | Jason Hsu: Statistically designing microarray experiments and analyzing gene expression data in a decision-making processes |
| 10:00-10:30am | Coffee Break |
| 10:30-11:30am | Susmita Datta: Significant analysis using P-values for multiple hypotheses testing in microarray experiments |
| 11:30-2:00pm | Lunch break |
| 2:00-3:00pm | David Allison: Opportunities, challenges, and issues posed by massive multiple inference in high dimensional biology |
| 3:00-3:30pm | Coffee break |
| 3:30-4:30pm | Eric Schadt: Complex systems to understand complex traits: beyond reagent driven science |
| 4:30-5:00pm | Informal Discussion |
| Thursday, October 14 | |
| 9:00-10:00am | Kim-Anh Do: A Bayesian mixture model for differential gene expression |
| 10:00-10:30am | Coffee Break |
| 10:30-11:30am | Rainer Spang: Differential co-expression of genes |
| 11:30-2:00pm | Lunch break |
| 2:00-3:00pm | Ina Hoeschele: Genetical genomics analysis to infer gene regulatory networks |
| 3:00-3:30pm | Coffee break |
| 3:30-4:30pm | Informal Discussions |
| 6:00-9:00pm | Banquet |
| Friday, October 15 | |
| 9:00-10:00am | Harmen Bussemaker: Inferring regulatory circuitry through model-based analysis of mRNA expression and ChIP data |
| 10:00-10:30am | Coffee Break |
| 10:30-11:30am | Hongyu Zhao: Integrated statistical analysis of gene expression data |
| 11:30-2:00pm | Lunch break |
| 2:00-3:00pm | Terry Speed: Overview and open problems in the analysis of gene expression microarray data |
| 3:00-3:30pm | Coffee break |
| 3:30-4:30pm | Informal Discussions |