Workshop 5 Abstracts and Lecture Materials:
Authors: Donna Pauler Ankerst (speaker), Chen Chi, Phyllis Goodman,
Catherine Tangen; Sylvia Lawry Center for MS Research
Title: A Verification Bias Adjustment for Inferring Operating Characteristics
of a Biomarker Used to Screen
Calculations of the operating characteristics of a biomarker for
disease are subject to verification bias if the disease status is
only verified for individuals with biomarkers within a specified range,
such as values greater than what is considered the "upper limit
of normal". Such types of data predominate in prospective studies
that employ a biomarker to screen, such as in the Prostate Cancer
Prevention Trial (PCPT), necessitating statistical methods to accommodate
potential biomarker-based verification bias for utilizing samples
from these studies.
The PCPT randomized 18,882 men aged 55 or older with a normal digital
rectal examination (DRE) and prostate-specific antigen (PSA) level
less than or equal to 3 ng per milliliter (ng/mL) to either finasteride
or placebo for seven years. A PSA and DRE were performed annually.
Whenever PSA exceeded 4 ng/mL or the DRE was positive indicating
suspicion of cancer, the participant was referred to biopsy. At
the end of seven years all individuals not previously diagnosed
with cancer were requested to have an end-of-study biopsy. The aim
of our correlative study was to derive the operating characteristics
of PSA for biopsy-detectable prostate cancer using the seven year
screening histories and outcomes from the PCPT placebo arm. We walk
through this case study, illustrating a Markov Chain Monte Carlo
algorithm to adjust for verification bias, and ending with our conclusions
concerning the operating characteristics of PSA and open questions
for the design of future prospective screening studies.
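The effect of biomarker-based verification can be seen in a small worked example. The sketch below is not the Markov Chain Monte Carlo adjustment described in the abstract; it swaps in a simpler inverse-probability-weighting correction, and every count and verification fraction is a hypothetical value chosen for illustration, not PCPT data.

```python
def operating_characteristics(tp, fn, fp, tn):
    """Return (sensitivity, specificity) from a 2x2 table of counts."""
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical full cohort (assumed numbers, not PCPT data).
# Marker-positive: 80 diseased, 120 healthy.
# Marker-negative: 120 diseased, 680 healthy.
full = {"tp": 80, "fn": 120, "fp": 120, "tn": 680}

# Assumed verification fractions: every marker-positive subject is
# biopsied, but only 10% of marker-negative subjects are.
p_verify = {"pos": 1.0, "neg": 0.1}

# Expected counts among verified subjects only.
verified = {
    "tp": full["tp"] * p_verify["pos"],
    "fp": full["fp"] * p_verify["pos"],
    "fn": full["fn"] * p_verify["neg"],
    "tn": full["tn"] * p_verify["neg"],
}

# Naive estimates ignore who was selected for verification.
naive = operating_characteristics(**verified)

# Inverse probability weighting: each verified subject stands in for
# 1 / P(verified | marker result) subjects of the full cohort.
reweighted = {
    "tp": verified["tp"] / p_verify["pos"],
    "fp": verified["fp"] / p_verify["pos"],
    "fn": verified["fn"] / p_verify["neg"],
    "tn": verified["tn"] / p_verify["neg"],
}
corrected = operating_characteristics(**reweighted)
truth = operating_characteristics(**full)

print("naive    :", naive)
print("corrected:", corrected)
print("truth    :", truth)
```

With these assumed numbers the naive sensitivity is about 0.87 against a true value of 0.40, and the naive specificity is about 0.36 against a true value of 0.85; reweighting recovers the true values because the verification fractions are taken as known.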
Authors: Kerry Bemis (Speaker), Jin-Sam You, Sheng-Liu and Mu Wang,
Indiana Centers for Applied Protein Sciences, Indianapolis, IN
Title: Statistical Issues with LC/MS Proteomics for Biomarker Discovery
Examined Using Data from Vinblastine Resistant and Sensitive Ovarian
Cancer Cells
Presentation materials: PPT
The difficult issues for the statistician designing and analyzing
proteomic studies are similar to the issues with genomic studies.
I will discuss the following:
1. Normalization: controlling the systematic biases affecting all
proteins in a sample
2. Control of the number of false positives by estimating the False
Discovery Rate instead of the False Positive Rate.
3. Sample size calculation for controlling the False Discovery Rate.
4. Visualizing the significant results when you have thousands of
proteins to examine
These ideas will be examined using an experiment on vinblastine
resistant and sensitive ovarian cancer cell lines. Lessons learned
may facilitate discovery of biomarkers for vinblastine resistance.
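As an illustration of item 2 above, the sketch below contrasts a fixed per-protein threshold with control of the False Discovery Rate. It uses the Benjamini-Hochberg step-up procedure, which is one common choice and an assumption here, since the abstract does not name the estimator used in the talk. The p-values are made up.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which hypotheses are rejected
    by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(p_values)
    # Indices of the p-values, smallest p-value first.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= k * q / m.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            largest_k = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for idx in order[:largest_k]:
        rejected[idx] = True
    return rejected


# Made-up p-values for ten proteins.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

per_test = [value <= 0.05 for value in p]   # fixed false positive rate
fdr = benjamini_hochberg(p, q=0.05)         # FDR control

print("declared at p <= 0.05:", sum(per_test))
print("declared by BH at q = 0.05:", sum(fdr))
```

On this toy list the fixed threshold declares five proteins significant while the step-up procedure declares two, which shows how the two error criteria can diverge even before thousands of proteins are involved.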
Authors: Chengcheng Hu and Victor De Gruttola (speaker), Department
of Biostatistics, Harvard School of Public Health
Title: Joint Modeling of Progression of HIV Resistance Mutations
Measured with Uncertainty and Time to Virological Failure
Streaming Video: Real Media
Development of HIV resistance mutations is a major cause for failure
of antiretroviral treatment. This article proposes a method for
jointly modeling the processes of viral genetic changes and treatment
failure. Because the viral genome is measured with uncertainty,
a hidden Markov model is used to fit the viral genetic process.
The uncertain viral genotype is included as a time-dependent covariate
in a Cox model for failure time, and an EM algorithm is used to
estimate the model parameters. This model allows simultaneous evaluation
of the sequencing uncertainty and the effect of resistance mutation
on the risk of virological failure. The method is then applied to
data collected in three phase II clinical trials testing antiretroviral
treatments containing the drug efavirenz. Various model checking
tests are provided to assess the appropriateness of the model.
Author: Steven G. Deeks, University of California, San Francisco
Title: Pathogenesis of Drug-Resistant HIV: Implications for Novel
Treatment Strategies
Many patients treated with combination antiretroviral therapy fail
to achieve complete viral suppression. Optimizing individual treatment
strategies requires an understanding of the complex relationship
between replication of drug-resistant virus and the host response.
In particular, the distinction between persistent drug activity,
alterations in replicative capacity ("fitness") and the ability
of a newly emergent variant to cause disease ("virulence") may prove
to be important in designing long-term therapeutic strategies. These
issues will likely become even more relevant with entry inhibitors,
where drug-pressure may select for X4 variants that may be less
fit but more virulent. To address these issues, we have performed
a series of studies focusing on the determinants of disease outcome
in patients with drug-resistant viremia, and have observed the following:
(1) HIV is often constrained in its ability to develop high-level
drug resistance while maintaining replicative capacity, (2) immune
activation is reduced in patients with drug-resistant HIV (after
controlling for the level of viremia) and (3) patients who durably
control HIV replication despite the presence of drug-resistance
exhibit immunologic characteristics comparable to those observed
in long-term non-progressors (e.g., low levels of T cell proliferation
and activation and preserved HIV-specific IL-2 and gamma-interferon-high
producing CD4+ T cells). We have initiated a number of interventional
studies based on the hypothesis that drug-mediated alterations in
HIV fitness/virulence may be clinically useful in patients with
limited therapeutic options.
Supported by NIAID (AI052745, AI055273), the UCSF/Gladstone CFAR
(P30 MH59037), the California AIDS Research Center (CC99-SF, ID01-SF-049)
and the SFGH GCRC (5-MO1-RR00083-37).
Author: Eleftherios Diamandis, Department of Pathology and Laboratory
Medicine, Mount Sinai Hospital
Title: Strategies for discovering new cancer biomarkers: Opportunities
and Pitfalls
Streaming Video: Real Media
Author: Jane Fridlyand, Assistant Professor, Department of Epidemiology
and Biostatistics, University of California, San Francisco
Title: Application of array CGH to the analysis of cancer data
Streaming Video: Real Media
The development of solid tumors is associated with acquisition
of complex genetic alterations, indicating that failures in the
mechanisms that maintain the integrity of the genome contribute
to tumor evolution. Thus, one expects that the particular types
of genomic derangement seen in tumors reflect underlying failures
in maintenance of genetic stability, as well as selection for changes
that provide growth advantage. In order to investigate genomic alterations
we are using microarray-based comparative genomic hybridization
(array CGH). The computational task is to map and characterize the
number and types of copy number alterations present in the tumors,
and so define copy number phenotypes as well as to associate them
with known biological markers and with gene expression data. We
discuss general analytical and visualization approaches applicable
to the array CGH data. We also use an unsupervised hidden Markov model
approach to exploit the spatial coherence between nearby clones.
The clones are partitioned into states that represent the underlying
copy number of each group of clones. The output of the algorithm
is given as an input to higher-level analyses such as testing and
classification. We will also discuss some preliminary results on
joint analysis of the copy number and gene expression data. The
methods are demonstrated on simulated data as well as cell line
and clinical tumor datasets.
Author: Debashis Ghosh, Ph.D., Assistant Professor of Biostatistics,
School of Public Health, University of Michigan
Title: Combining Genomic Data in Cancer Microarray Experiments
Presentation materials: PPT
Streaming Video: Real Media
With the advent of new high-throughput molecular technologies,
consideration of high-dimensional data is becoming more common.
A major role for statisticians to play in the future of this area
of bioinformatics is combining genomic data from different sources.
In this talk, we will discuss two examples of such analyses. The
first is combining gene expression datasets from multiple cancer
studies. The second is using gene expression data to infer chromosomal
alterations.
Author: M. Elizabeth Halloran, Department of Biostatistics, Emory
University
Title: Using validation sets for outcomes with time-to-event data
in vaccine studies
In many vaccine studies, confirmatory diagnosis of a suspected
case is made by doing a culture to confirm that the infectious agent
of interest is present. However, often such cultures are too expensive
or difficult to collect, so that an operational case definition,
such as "any respiratory illness", is used. This leads to many
misclassified cases and serious attenuation of efficacy and effectiveness
estimates. A validation sample can be used to improve the attenuated
estimates. We propose a new method of analysis for validation sets
with time-to-event data in vaccine studies when the baseline hazards
of both the illness of interest and similar, nonspecific illnesses
are changing. We analyze data from an influenza vaccine field study
with these methods.
Authors: Richard E. Higgs (speaker), Michael D. Knierman, John E.
Hale, and Valentina Gelfanova; Genomic and Molecular Informatics,
Eli Lilly
Title: A comprehensive label-free method for the relative quantification
of proteins from biological samples
Global proteomics measurements are rapidly being developed to identify
biomarkers for drug development applications. A major challenge
with this strategy is the analysis of the raw data generated by
high throughput HPLC-MS/MS experiments of protein digests from complex
biological samples. This presentation will focus on a computational
pipeline to automatically process HPLC-MS/MS data including: estimation
of peptide charge and mass, noise filtering of MS/MS spectra, and
peptide identification. Following this pre-processing of individual
study samples we describe methods for chromatographic alignment
and label-free relative quantification using integrated ion current
of peptides from all samples in a biomarker study. Results from
a rat serum variability study will be used to demonstrate how the
method can be applied to biomarker discovery.
Author: Joe Hogan, Community Health - Center for Statistical Sciences,
Brown University
Title: Biomarker Evaluation and Analysis in a Causal Framework
Biomarkers can be used for several purposes, for example as surrogate
markers of treatment effect or as inputs to a diagnostic algorithm.
This talk will describe applications of causal modeling and inference
for both settings, and highlight the role of potential outcomes
for understanding properties of a biomarker.
First, we illustrate the use of instrumental variables and associated
sensitivity analysis for estimating causal treatment effects of
HAART from observational cohort studies. Our focus will be on transparent
representation of underlying assumptions, and on the role of coherent
sensitivity analyses to understand the effects of departures from
those assumptions.
Second, we will describe the role of potential outcomes for assessing
diagnostic utility of a continuous biomarker. An important measure
of diagnostic utility is area under the ROC curve. The area represents
P(X>Y), where X and Y are, respectively, randomly-drawn marker values
from the 'case' and 'non-case' populations. In some observational
studies, the 'case' and 'non-case' populations may be systematically
different, and bias can be introduced by confounders. We propose
a new definition for area under the ROC curve that is written in
terms of potential outcomes, and appeals to a causal interpretation
of diagnostic utility. Standard methods for causal inference can
be used to estimate the area under the curve; the ideas are illustrated
by examining the diagnostic utility of viral load and CD4 as markers
for HIV-related mortality, using inverse probability weighting to
adjust for potential confounders. We also make qualitative and quantitative
comparisons to standard methods.
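The quantity P(X>Y) and the role of weighting can be made concrete with a short sketch. It computes the usual pairwise (Mann-Whitney) estimate of the area under the ROC curve and an optionally weighted version in the spirit of inverse probability weighting. This is not the potential-outcomes estimator proposed in the talk; the function name, the marker values and the weights are all hypothetical.

```python
def auc_pairwise(case_values, noncase_values,
                 case_weights=None, noncase_weights=None):
    """Estimate P(X > Y) over all case/non-case pairs.

    Ties count one half. Optional weights (for example, inverse
    probability weights) rescale each subject's contribution.
    """
    if case_weights is None:
        case_weights = [1.0] * len(case_values)
    if noncase_weights is None:
        noncase_weights = [1.0] * len(noncase_values)

    numerator = 0.0
    denominator = 0.0
    for x, wx in zip(case_values, case_weights):
        for y, wy in zip(noncase_values, noncase_weights):
            pair_weight = wx * wy
            denominator += pair_weight
            if x > y:
                numerator += pair_weight
            elif x == y:
                numerator += 0.5 * pair_weight
    return numerator / denominator


# Made-up marker values.
cases = [3.0, 5.0, 7.0]
noncases = [1.0, 4.0, 6.0]

unweighted_auc = auc_pairwise(cases, noncases)

# Hypothetical weights that up-weight the first case, as might happen
# if subjects like it were under-represented in the observed sample.
weighted_auc = auc_pairwise(cases, noncases, case_weights=[2.0, 1.0, 1.0])

print(unweighted_auc, weighted_auc)
```

Here six of the nine case/non-case pairs are correctly ordered, so the unweighted estimate is 6/9; up-weighting the lowest-valued case pulls the estimate down to 7/12, illustrating how confounding-related imbalance can shift the apparent diagnostic utility.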
Author: Steve Horvath, Assistant Professor, Biostatistics and Human
Genetics, University of California, Los Angeles
Title: Improving Tumor Marker Validation Success Using Random Forest
Clustering and Gene Co-expression Network Methods
Presentation Materials: PPT
Streaming Video: Real Media
Molecular data are widely used to screen for biomarkers that have
prognostic significance for clinical outcomes, e.g. gene expression
data or immuno-histochemical staining data may be used to screen
for biomarkers that could predict post-operative survival time.
A challenge is that such candidate biomarkers sometimes cannot
be validated in independent data sets. Here we will describe 2 different
approaches that we have found to be useful for identifying biomarkers
that have an increased chance of being validated.
The first approach is based on weighted gene co-expression network
analysis. A clustering method is used to identify prognostic gene
modules, i.e. sets of tightly co-expressed genes. Using brain cancer
microarray data, we will show that highly connected prognostic 'hub'
genes in these modules have a substantially increased likelihood
of being validated. The second approach seems to be quite different:
first, it uses random forest clustering to identify high risk patient
clusters. Second, a biomarker based threshold rule is derived for
predicting cluster membership. Using prostate cancer data, we will
provide empirical evidence that these rules can be validated while
traditional approaches may lead to candidate biomarkers that cannot
be validated.
There seems to be a mathematical and biological connection between
these 2 approaches. Both rely on clustering as an essential pre-processing
step to identify "prognostic" clusters. The clusters correspond
to global patterns that are more likely to be found in independent
data sets as well. We provide empirical evidence that biomarker
screening procedures that are based on prognostic clusters have
an increased chance of validation success.
Acknowledgement: The gene co-expression network part was done in
collaboration with Bin Zhang, Paul Mischel, and Stan Nelson. The
random forest part was done in collaboration with Tao Shi, Siavash
Kurdistani, and David Seligson.
Author: Martin McIntosh, Ph.D. PI, Computational Proteomics Laboratory
Fred Hutchinson Cancer Research Center
Title: Comparative profiling of complex protein mixtures with peptide
arrays generated from LC-MS mass spectrometry
Advancements in mass spectrometry (MS) instrumentation, liquid
chromatography (LC) and maturing protein databases are leading many
advances in the field of proteomics. Among the potential uses of
this technology is the identification of predictive protein biological
markers or biomarkers that can differentiate two or more groups
of complex biological samples. Despite its proteome-wide potential,
few clinically relevant discoveries have come forth from these technologies
when applied to complex protein mixtures, such as serum or tissue,
characterized by a high complexity and dynamic range. Current approaches
to profile proteins are dominated by the use of MALDI or LC-MS/MS
mass spectrometry (MS/MS), and both approaches have difficulties
in practice; MALDI can identify a large number of "peaks", but identification
(sequencing) of low-abundance features can be difficult, and MS/MS
lacks sensitivity and has poor reproducibility and low protein coverage
due to its data-dependent sampling. It has been our hypothesis that
greater efficiency of protein/peptide profiling could be obtained
by more efficient use of high resolution LC-MS instrumentation where,
like MALDI approaches, differential peptides are first identified
from the list of potential precursor ions (LC-MS) and then
only those differential peptides are sequenced in subsequent LC-MS
measurements. To evaluate this hypothesis, our group has developed
a suite of software algorithms that produce a peptide array from
a sequence of LC-MS measurements; the peptide array can be evaluated
in much the same way as a transcript array with members identified
by their accurate mass and time tags. Production of the peptide
array requires substantial signal (image) processing, image alignment,
and specialized normalization routines. We demonstrate that we can
identify and compare hundreds or thousands of peptides and proteins
across multiple replicates of biological samples. The algorithms
will be demonstrated using data of increasingly complex biological
samples: bacteria, yeast, and human serum.
Author: Adam B. Olshen, Ph.D., Assistant Attending Biostatistician,
Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering
Cancer Center
Title: Some Statistical Issues in the Analysis of Array CGH Data
Presentation materials: PDF
Streaming Video: Real Media
Cancer progression often involves alterations in DNA sequence copy
number. Multiple microarray platforms now facilitate high-resolution
copy number assessment of entire genomes in single experiments.
This technology is generally referred to as array comparative genomic
hybridization (array CGH). In my talk, I will discuss issues that
have arisen in the analysis of array CGH data. Topics will include
pre-processing and normalization, identification of regions of abnormal
copy number, and determination as to whether copy number abnormalities
can be seen in gene expression data. Our method of identifying abnormal
copy number, which we call circular binary segmentation (CBS), will
be introduced.
This is joint work with E.S. Venkatraman.
Author: Alan S. Perelson, Theoretical Division, Los Alamos National
Laboratory
Title: Modeling Drug Efficacy and HIV Dynamics
Presentation materials: PPT
Streaming Video: Real Media
Simple models of HIV infection and the effects of antiretroviral
therapy have typically assumed that drugs have a constant efficacy.
Here I will summarize some new models that incorporate ideas from
pharmacokinetics and pharmacodynamics such that drug efficacy depends
on drug concentration, which in turn depends on drug dose and time
at which drug is taken. These models allow estimation of the relative
efficacy of different drug combinations and also allow one to explicitly
incorporate the effects of missed drug doses or intentional stopping
of therapy for short periods of time. Effects of drug resistance
can also be incorporated.
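A minimal sketch of the idea of concentration-dependent efficacy follows. It assumes a one-compartment model with instantaneous absorption and first-order elimination, and an Emax-type relation between concentration and efficacy. These functional forms, the function names and all parameter values are assumptions made for illustration; the abstract does not specify the models presented in the talk.

```python
import math


def concentration(t, dose_times, dose=1.0, k_elim=0.1):
    """Drug concentration at time t (hours) from all doses taken so far.

    Assumed model: each dose adds `dose` units instantly and then
    decays exponentially at rate k_elim per hour.
    """
    return sum(dose * math.exp(-k_elim * (t - t_dose))
               for t_dose in dose_times if t_dose <= t)


def efficacy(t, dose_times, ic50=0.5, dose=1.0, k_elim=0.1):
    """Assumed Emax-type efficacy: C / (IC50 + C), between 0 and 1."""
    c = concentration(t, dose_times, dose=dose, k_elim=k_elim)
    return c / (ic50 + c)


# Hypothetical 12-hourly dosing, with and without a missed dose at 24 h.
full_adherence = [0.0, 12.0, 24.0, 36.0]
missed_dose = [0.0, 12.0, 36.0]

eff_full = efficacy(40.0, full_adherence)
eff_missed = efficacy(40.0, missed_dose)

# Drug resistance represented, as an assumption, by a tenfold higher IC50.
eff_resistant = efficacy(40.0, full_adherence, ic50=5.0)

print(eff_full, eff_missed, eff_resistant)
```

Under these assumptions efficacy is no longer a constant: it falls between doses, drops further when a dose is skipped, and is lower throughout for a virus with a higher IC50. A time-varying efficacy of this kind could then replace the constant efficacy term in a viral dynamics model.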
Author: Mark A. Rubin, MD Department of Pathology, Brigham and Women's
Hospital, Boston, Massachusetts
Title: Defining Aggressive Prostate Cancer Biomarkers using a Combination
of High Throughput Technologies
Developing molecular tests to predict prostate cancer progression
requires first defining a meaningful endpoint. There is controversy
regarding the use of PSA or biochemical failure following prostatectomy
or radiation therapy for clinically localized prostate cancer as
a marker of progression. As a consequence, advances in prostate
cancer biomarker development may require using population-based
cohorts or cases from clinical trials to identify meaningful associations.
Whereas the discovery of novel candidate biomarkers was slow 5-10
years ago and often resulted from serendipity, advances in high-throughput
technologies have led to the identification of a large number of
candidate genes. Strategies to identify candidate genes include
the use of novel software for genomic analysis. This presentation
will provide an approach to validation of these candidate genes
using tissue microarrays and other high throughput technologies.
Since a critical factor in the evaluation of tissue markers is reproducibility,
approaches to quantitative protein expression will be presented.
The approaches presented here should be applicable to other tumor
types and disease processes.
Authors: Mark Segal (speaker) and Yuanyuan Xiao, Division of Biostatistics,
University of California
Title: Genomewide Prediction of HIV-1 Epitopes Using Ensemble Classifiers
and Amino Acid Sequence of MHC Binding Peptides
Presentation materials: PDF
Streaming Video: Real Media
Following infection, HIV-1 proteins are digested into short peptides
that bind to major histocompatibility complex (MHC) molecules. Subsequently,
these bound complexes are displayed by antigen presenting cells.
T cells with receptors that recognize the complexes are activated,
triggering an immune response. Peptides with this ability to induce
T cell response are called T cell epitopes -- prediction thereof
is important for vaccine development. Sung and Simon (JCB, 2004)
start with compilations of peptide sequences that {bind/don't bind}
to specific MHC molecules and, using biophysical properties of the
constituent amino acids, develop a classifier. Properties are used
because of the inability of select classifiers to effectively handle
amino acid sequence itself. Tree-structured methods are not so limited
(Segal et al., Biometrics, 2001). Here, we apply these methods,
along with their ensemble extensions (bagging, boosting, random
forests), and show they provide improved accuracy. Both additional
properties (QSAR derived) and classifiers (SVMs, ANNs) are also
investigated. HIV-1 genomewide comparisons with respect to predicted
/ conserved epitopes are also presented.
Author: John Semmes, Ph.D., Scientific Director, Virginia Prostate
Center, Director, Center for Biomedical Proteomics, Professor, Department
of Microbiology and Molecular Cell Biology, Eastern Virginia Medical
School
Title: Tackling Cancer Diagnostic Needs with Clinical Proteomics
Streaming Video: Real Media
Our group utilizes a variety of proteomic approaches to biomarker
discovery for the early detection of cancer. We will discuss
current mass spectrometry-based studies and their application to
solid tissue cancers such as prostate and head and neck. In addition,
recent studies examining serum from patients infected with the Human
T-cell leukemia Virus type 1 will be presented with emphasis on
the utility of the expressed biomarkers in the discrimination of
Adult T-cell leukemia, HAM/TSP and asymptomatic infected individuals.
Author: Steven J. Skates, Massachusetts General Hospital and Harvard
Medical School
Title: Longitudinal Biomarkers in Detection of Ovarian Cancer in
Asymptomatic Women
Streaming Video: Real Media
Detecting ovarian cancer in asymptomatic women through regular
screening tests is an appealing approach to reducing mortality from
this disease due to the large survival difference between early
and late stage disease, and the high proportion of cases detected
at late stage (80%) under usual care. However, due to the low incidence
of the disease, ovarian cancer screening is a delicate balance between
detecting as many cancers as possible while limiting the number
of false positive results per true positive. As the bar is lowered
for declaring a test positive, the proportion of cancers detected
usually increases; however, the number of false positives per cancer
detected also increases. The definitive diagnosis of ovarian cancer
requires invasive pelvic surgery. A method for screening requires
at least one ovarian cancer to be found in ten screen related surgeries,
and at least 70% of the ovarian cancers screen detected to be considered
acceptable.
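The acceptability criterion above (at least one cancer per ten screen-related surgeries, i.e. a positive predictive value of at least 10%, with at least 70% of cancers screen detected) translates into a specificity requirement once a prevalence is fixed. The sketch below works through that arithmetic with Bayes' rule. The prevalence of 1 in 2,500 is an assumed figure used only for illustration and does not come from the abstract; the function names are likewise hypothetical.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def specificity_needed(target_ppv, sensitivity, prevalence):
    """Smallest specificity that achieves target_ppv at the given
    sensitivity and prevalence (obtained by solving ppv() for specificity)."""
    true_pos = sensitivity * prevalence
    # Largest tolerable false positive mass for the target PPV.
    max_false_pos = true_pos * (1.0 - target_ppv) / target_ppv
    return 1.0 - max_false_pos / (1.0 - prevalence)


assumed_prevalence = 1 / 2500   # assumption for illustration only
required_specificity = specificity_needed(0.10, 0.70, assumed_prevalence)

print("required specificity:", required_specificity)
print("check PPV:", ppv(0.70, required_specificity, assumed_prevalence))
```

With the assumed prevalence, the required specificity comes out above 99%, which illustrates why a low-incidence disease places such a heavy demand on the false positive rate of any screening strategy.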
Prospective clinical screening trials with the blood test CA125,
followed by ultrasound for elevated CA125 above a fixed cutpoint,
resulted in a positive predictive value (# cancers at surgery/#
surgeries), or PPV, exceeding 20% and with 70% of ovarian cancers
screen detected, demonstrating this screening method is acceptable.
However only 40% of screen detected cancers were found in early
stage. While this result doubled the percentage found in early stage
under usual care, a greater increase was required before the impact
on mortality would be substantial. A method was required for increasing
the sensitivity while maintaining a sufficiently high PPV. Retrospective
analysis of longitudinal CA125 values indicated that CA125 values
rose exponentially above an individual's baseline level prior to
diagnosis of ovarian cancer, while in most other women the CA125
fluctuated around a baseline level. Incorporating this differential
CA125 behavior into the screening decision for referral to ultrasound
would potentially allow greater sensitivity (rise above a baseline
but prior to achieving a level exceeding the fixed cutpoint) while
maintaining specificity (rule out subjects with elevated yet stable
CA125 levels). Modeling the longitudinal CA125 values in cases with
a hierarchical longitudinal change-point model, and the CA125 in
other women with a hierarchical longitudinal model, provided the
basis for assessing referral to ultrasound with the Bayes factor
calculated for subjects with new CA125 values. This approach has
been used in a prospective randomized ovarian cancer screening trial
in the UK and will be discussed at the workshop.
Author: Jeremy Taylor, Professor of Biostatistics, School of Public
Health, University of Michigan
Title: Statistical issues in cancer biomarker assessment
Presentation materials: PDF
Streaming Video: Real Media
Cancer biomarkers can be used in many different ways in cancer
research. They can be used as surrogate endpoints or auxiliary variables
to help assess new therapies. They can be used for risk stratification
prior to deciding on therapy. A biomarker might suggest responsiveness
to a particular biological agent, and would thus assist in individualizing
therapy. Modern technologies, such as from genomics and proteomics,
are producing high dimensional sets of biomarkers, which give rise
to numerous complex statistical issues. A longitudinal series of
a biomarker can be useful for early detection of disease or for
monitoring disease progression after therapy. There is a general
feeling that combinations of biomarkers, that measure different
aspects of the underlying biology, may be more useful than any single
biomarker. This raises the statistical challenge of how to combine
biomarkers. When using combinations of biomarkers to detect disease
it is frequently appropriate to assume that the probability of disease
is a monotonic function of each biomarker. By incorporating
this monotonicity into the analysis it may be possible to improve
its efficiency. We consider the situation of two ordered categorical
variables and a binary response. The probability of response is
assumed to be monotonic in each of the biomarkers. Two approaches
are considered, one Bayesian in which the monotonicity is built
into the prior distributions and a second in which isotonic regression
in two dimensions is used. When using a biomarker as a surrogate
endpoint in a clinical trial, it is well known that one requires
more than a strong association between the biomarker and the true
endpoint; one also needs the biomarker to explain the effect of
the treatment on the true endpoint. Various measures of the proportion
of treatment effect explained by the surrogate have been proposed.
An alternative approach is to view the biomarker as an auxiliary
variable, and use it to predict the true endpoint, and then perform
inference on the true endpoint. Thus the problem is converted into
one of missing data, for which there are various approaches. We
have developed an approach of multiple imputation, in which the
true endpoint is imputed based on information in the auxiliary variable,
the treatment group and possibly other prognostic factors. This
approach generalizes to more complex situations such as multivariate
biomarkers or longitudinally measured biomarkers. A more general
approach is to formulate and estimate the joint distribution of
the biomarker and the true endpoint; once this is achieved, measures
such as the proportion explained and predictive distributions of
true endpoint values are a natural consequence of the model.
Author: Rodolphe Thiébaut, INSERM E0338 Biostatistics, ISPED,
Université Victor Segalen Bordeaux 2
Title: Issues in longitudinal modelling of HIV markers using mixed
models
Presentation materials: PPT
Streaming Video: Real Media
Plasma HIV RNA and CD4+ T lymphocyte count are major biomarkers
used to decide when to start, change or stop a treatment as well
as to evaluate treatment efficacy in HIV-infected patients. Thus,
repeated measurements of those biomarkers are common in HIV studies.
Those data may be analysed by using models for longitudinal data
such as mixed models. However, the statistical analysis is complicated
by several methodological difficulties. Three of them are of particular
importance: (i) left-censoring of HIV RNA due to a lower quantification
limit; (ii) correlation between CD4+ T lymphocytes and plasma HIV
RNA; (iii) missing data due to informative dropout or disease progression.
I will present a unified approach to deal with those issues by jointly
modelling longitudinal measurement data and event history data.
Likelihood inference can be used to estimate the parameters of such
a model. I will illustrate it by studying HIV marker response to
antiretroviral treatment in randomised clinical trials and observational
cohort studies. This approach might help in studying the change
in markers, their prognostic value and their surrogacy.
Author: Bruce J. Trock, Ph.D., Associate Professor, Departments
of Urology, Epidemiology, and Oncology
Johns Hopkins School of Medicine
Title: Surrogate endpoint biomarkers in chemoprevention: Mathematical
evaluation of feasibility
Presentation materials: PPT
Streaming Video: Real Media
To establish efficacy of cancer chemoprevention agents using cancer
incidence as the endpoint requires very large sample sizes (thousands)
and long follow-up. Surrogate endpoint biomarkers (SEBs) are biomarkers
of (presumably critical) intermediate steps in the carcinogenic pathway
that may permit smaller and more rapid studies. If the chemopreventive
agent modulates the SEB in a manner consistent with blocking or reducing
progression to carcinogenesis it may be possible to infer the reduction
in cancer risk attributable to the agent. However, if there is not
a perfect one-to-one correspondence between the SEB and cancer then
the SEB induces misclassification of the cancer outcome. The extent
of bias in the SEB as a surrogate for cancer is measured by its sensitivity
and specificity. This paper will show that the relative risk (RR)
observed using the SEB as a surrogate for cancer can severely underestimate
the true RR when specificity is less than perfect. Furthermore, if
specificity in the group receiving the chemopreventive agent is less
than that in the untreated group, the RR based on the SEB may even
indicate that the agent increases cancer risk. The performance characteristics
of SEBs as a function of sensitivity, specificity and cancer incidence
will be explored, and criteria to determine if SEBs can realistically
be used will be defined.
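The misclassification argument above can be checked with simple arithmetic. In the sketch below the probability of a positive SEB is taken to be sensitivity × risk + (1 − specificity) × (1 − risk), which is the standard misclassification identity; the cancer risks, sensitivity and specificities are hypothetical values, and the function names are illustrative rather than taken from the paper.

```python
def seb_positive_rate(cancer_risk, sensitivity, specificity):
    """Probability that the SEB is positive, mixing true and false positives."""
    return (sensitivity * cancer_risk
            + (1.0 - specificity) * (1.0 - cancer_risk))


def observed_rr(risk_untreated, risk_treated, sensitivity,
                spec_untreated, spec_treated):
    """Relative risk (treated vs. untreated) as seen through the SEB."""
    treated = seb_positive_rate(risk_treated, sensitivity, spec_treated)
    untreated = seb_positive_rate(risk_untreated, sensitivity, spec_untreated)
    return treated / untreated


# Hypothetical setting: the agent halves cancer risk (true RR = 0.5).
risk_untreated, risk_treated = 0.02, 0.01

rr_perfect = observed_rr(risk_untreated, risk_treated, 1.0, 1.0, 1.0)
rr_equal_spec = observed_rr(risk_untreated, risk_treated, 1.0, 0.90, 0.90)
rr_lower_spec = observed_rr(risk_untreated, risk_treated, 1.0, 0.90, 0.88)

print(rr_perfect, rr_equal_spec, rr_lower_spec)
```

With a perfect SEB the true RR of 0.5 is recovered. With 90% specificity in both arms the observed RR is about 0.92, so most of the protective effect is hidden. When specificity is 88% in the treated arm and 90% in the untreated arm, the observed RR is about 1.09, which would wrongly suggest that the agent increases risk.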
Authors: Mark van der Laan (Speaker), Maya Petersen, Sandra Sinisi,
Division of Biostatistics, UC Berkeley
Title: Interpreting HIV mutations to predict response to antiretroviral
therapy:
The deletion/substitution/addition (DSA) algorithm for the estimation
of direct causal effects
Presentation materials: PDF
Streaming Video: Real Media
Our goal is to estimate the causal effect of mutations detected
in the HIV strains infecting a patient on clinical virologic response
to specific antiretroviral drugs and drug combinations. We consider
the following data structure: 1) viral genotype, which we summarize
as the presence or absence of each viral mutation considered by
the Stanford HIV Database as likely to have some effect on virologic
response to antiretroviral therapy; 2) drug regimen initiated following
assessment of viral genotype (the regimen may involve changing some
or all of the drugs in a patient's previous regimen); and, 3) change
in plasma HIV RNA level (viral load) over baseline at twelve and
twenty-four weeks after starting this regimen.
The effects of a set of mutations on virologic response are heavily
confounded by past treatment. In addition, viral mutation profiles
are often used by physicians to make treatment choices; we are interested
in the direct causal effect of mutations on virologic outcome, not
mediated by choice of other drugs in a patient's regimen. Finally,
the need to consider multiple mutations and treatment history variables,
as well as multi-way interactions between these variables, results
in a high-dimensional modeling problem. This application thus requires
data-adaptive estimation of the direct causal effect of a set of
mutations on viral load under a particular drug, controlling for
confounding and blocking the effect the mutations have on the assignment
of other drugs. We developed such an algorithm based on a mix of
the direct-effect causal inference framework and the data-adaptive
regression deletion/substitution/addition (DSA) algorithm.
Author: Hulin Wu, Ph.D., Department of Biostatistics and Computational
Biology, University of Rochester
Title: Modeling and Prediction of Biomarkers Longitudinally in AIDS
Clinical Studies
Presentation materials: PDF
Streaming Video: Real Media
Although a single event endpoint such as time to virological failure
is simple and easy to use in large AIDS clinical trials, the longitudinal
biomarker data from close monitoring of viral load and CD4+ T
cell counts can provide more detailed information regarding pathogenesis
of HIV infection and characteristics of antiretroviral regimens.
I will present a mechanistic HIV-1 dynamic model that incorporates
information on pharmacokinetics, drug adherence and drug susceptibility
to predict viral load trajectory. A Bayesian approach is proposed
to fit this model to clinical data from ACTG A5055, a study of two
dosage regimens of indinavir (IDV) with ritonavir (RTV) in subjects
failing their first PI treatment. HIV RNA testing was completed
at days 0, 7, 14, 28, 56, 84, 112, 140 and 168. An intensive PK
evaluation was performed on day 14 and multiple trough concentrations
were subsequently collected. Pill counts were used to monitor adherence.
IC50 for IDV and RTV was determined at baseline and at virologic
failure. Viral dynamic model fitting residuals were used to assess
the significance of covariate effects on long-term virologic response.
As univariate predictors, none of the four PK parameters C_trough,
C_12h, C_max and AUC_0-12h was significantly related to virologic
response (p>0.05). When drug susceptibility (IC50), or IC50 and
adherence together, were included, C_trough, C_12h, C_max and AUC_0-12h
were each significantly correlated with long-term virologic response
(p=0.0055, 0.0002, 0.0136, 0.0002 with IC50 and adherence considered).
IC50 and adherence alone were not related to virologic response.
Adherence added no significant information beyond the PK parameters
(p=0.064), drug susceptibility IC50 (p=0.086), or their combination
(p=0.22) in predicting virologic response. Simple regression approaches
did not detect any significant PD relationships. No single factor among
PK, adherence and drug susceptibility made a detectable significant
contribution to long-term virologic response, but an appropriate
combination of these factors within the viral dynamic modeling approach
was a significant predictor of virologic response. Adherence measured
by pill counts and multiple trough drug concentrations did not provide
additional information on virologic response, presumably because of
data quality and noise problems. HIV dynamic modeling is a powerful
tool to establish a PD relationship and correlate other factors
such as adherence and drug susceptibility to long-term virologic
response, since it can appropriately capture the complicated nonlinear
relationships and interactions among multiple covariates. Our findings
may help clinicians better understand the roles of these clinical
factors in antiviral activities and predict the virologic response
of various antiretroviral regimens.
Discussions
Discussants: Colin Begg, Department of Epidemiology & Biostatistics,
Memorial Sloan-Kettering Cancer Center; and Kevin Coombes, Biostatistics-MD
Anderson Cancer Center, University of Texas
Streaming Video: Real Media
Discussants: Elizabeth Slate, Department of Biostatistics, MUSC;
Alex Tsodikov, Biostatistics-Dept. of Public Health Services, University
of California, Davis; Kevin Baggerly, MD Anderson Cancer Center,
University of Texas; Zhen Zhang, Pathology-Center for Biomarker
Discovery, Johns Hopkins Medical Institutions
Streaming Video: Real Media
Poster Titles and Abstracts
Authors: Tarek A. Bismar (1,2), Francesca Demichelis (3,4) (presenter),
Alberto Riva (2,5), Robert Kim (1), Sooryanarayana Varambally (6),
Le He (1), Jeff Kutok (1,2), Jonathan C. Aster (1,2), Jeffery Tang (1,2),
Rainer Kuefer (7), Matthias D. Hofer (1,2), Phillip G. Febbo (2,8),
Arul M. Chinnaiyan (6), and Mark A. Rubin (1,2,8)
(1) Department of Pathology, Brigham and Women's Hospital
(2) Harvard Medical School
(3) Bioinformatics, SRA Division, ITC-irst
(4) Department of Information and Communication Technology, University
of Trento
(5) Children's Hospital Informatics Program, Children's Hospital
(6) Department of Pathology and Urology, University of Michigan
(7) University Hospital of Ulm
(8) Dana Farber Cancer Institute
Title: Defining Aggressive Prostate Cancer Using a 12 Gene Model
Streaming Video: Real Media
Background: The critical clinical challenge in prostate cancer
research is to develop means of distinguishing aggressive from indolent
prostate cancer. Expression array technology has led to the development
of discrete molecular signatures, but the development of a robust
signature to characterize aggressive prostate cancer has yet to
be achieved. We describe a multi-stage approach to develop a model
of prostate cancer progression.
Methods: A recent study from our group employed high-throughput
immunoblotting using antibodies against 1383 distinct proteins or
post-translational modifications in order to interrogate tissue
extracts derived from benign prostate, clinically localized prostate
cancer, and metastatic prostate cancer. An integrative analysis
of this compendium of proteomic alterations and transcriptomic data
derived from 8 prostate cancer profiling studies was used to select
a smaller set of genes that demonstrated concordance between protein
and transcript levels. 41 of these genes could be evaluated on archival
tissue samples. Using a prostate cancer progression tissue microarray,
the protein products of these genes were tested using quantitative
analysis of immunohistochemistry. The best model was validated using
prostate cancer expression array data with associated clinical outcomes
data.
Authors: Annette Molinaro (1), Mark van der Laan (2), Sandrine Dudoit (2)
(1) National Cancer Institute, Rockville, MD
(2) University of California, Berkeley, CA
Title: Prediction of Survival with Regression Trees and Cross-Validation:
Applications in Genomics
Streaming Video: Real Media
Clinicians and researchers collect a tremendous amount of data
on cancer patients in the hopes of finding significant prognostic
factors. Medical studies commonly involve thousands of clinical,
epidemiological, and genomic measurements collected on each patient,
along with a time to the clinical event of interest, such as disease
recurrence or death. At the end of the study, some patients may
have dropped out, been lost to follow-up, or not had the particular
event. In this situation, the last date of follow-up is recorded
and referred to as the censored time to event. These studies are
intended to model time to event by the measured variables for the
purposes of predicting time to event for future patients and identifying
which of the variables are integral in affecting this outcome. We
present a generalization of classification and regression trees
(CART) (Breiman, et al., 1984) in the presence of censoring. This
approach is based on a strategy to generate possible predictors
of time to event, choose the best predictor, and assess its performance.
As this strategy is not limited to CART, a new, more aggressive
algorithm for generating possible predictors is introduced. To illustrate
this approach, both CART and the new algorithm have been applied
to simulation studies as well as example data from Comparative Genomic
Hybridization array analysis. The proposed approach is applicable
to numerous settings, including univariate and multivariate prediction
and density estimation. Thus, this method provides a powerful predictive
tool for linking complex data sets with censored (or non-censored)
outcomes.
Authors: Jeffrey S. Morris (1), Philip J. Brown (2), Kevin R. Coombes
(1), and Keith A. Baggerly (1)
Title: Bayesian Modeling and Inference for Mass Spectrometry Data
using Functional Mixed Models
(1) Department of Biostatistics and Applied Mathematics, The University
of Texas MD Anderson Cancer Center, Houston, TX
(2) Institute of Mathematics and Statistics, University of Kent,
Canterbury, England
Presentation materials: PPT
Streaming Video: Real Media
In this work, we demonstrate how to analyze MALDI-TOF mass spectrometry
data using the wavelet-based functional mixed model approach of
Morris and Carroll (2004), which is a generalization of the linear
mixed model to functional data. This approach models each spectrum
as a function, and is very general, accommodating a wide class of
experimental designs and allowing one to identify protein peaks
related to various outcomes of interest, including dichotomous outcomes,
categorical outcomes, continuous outcomes, and any interactions
among factors. These factors can be conditions of interest (e.g.
cancer/normal) or experimental factors for which we wish to account
(blocking factors). Random effects make it possible to model correlation
between spectra from the same individual or block. The MCMC output
can be used to perform peak detection, find which peaks are related
to factors of interest while controlling the false discovery rate,
and to classify future samples based on their proteomic spectra
without having to search high dimensional spaces. These analyses
are all done while automatically adjusting for nonlinear block effects
that are characteristic of these data. We apply this method to two
MALDI-TOF data sets from experiments run at MD Anderson, one a clinical
study whose goal is diagnosis of pancreatic cancer from blood serum,
and the other an animal study of the serum proteome of mice
injected with one of two cell lines in one of two organs. This methodology
appears promising for the analysis of mass spectrometry data.
Authors: Daniel Normolle
(University of Michigan), David Ransohoff (University of North Carolina),
Richard Drake (Eastern Virginia Medical University) and Dean Brenner
(University of Michigan)
Title: SELDI-TOF as a Screening Tool for Colon Cancer
Streaming Video: Real Media
GLNE 001 is a prospective study conducted by a Clinical Epidemiology
Center of the Early Detection Research Network that collects
serum samples from patients presenting at colonoscopy clinics at
several sites. We are using the samples to assess the utility of
SELDI-TOF to distinguish normal patients from those with adenocarcinoma.
One hundred unblinded samples are used as a training set, and 155
blinded samples are used for validation. Issues in the analysis
include the identification of peaks and the construction of a useful
classifier when there is a multiplicity of candidate markers.
We will discuss the use of wavelets for de-trending and de-noising
the spectra, issues in peak identification and alignment, and a
comparison of several machine learning algorithms for constructing
a classifier. SELDI-TOF is found to have limited capability to classify
sera from normal patients versus those with adenocarcinomas.
Authors: Natasa Rajicic, Dianne Finkelstein, and David Schoenfeld;
Harvard Medical School, Massachusetts General Hospital Biostatistics Center
Title: Survival Analysis of Longitudinal Genearray Data
Presentation materials: PPT
Streaming Video: Real Media
We describe an approach to the survival analysis of longitudinally
collected genomic data. We construct a measure of association between
the survival endpoint and gene expressions collected over time and
find significance levels using permutations. This nonparametric
approach does not depend on any untestable assumptions about the
unknown distributions of gene expressions. The issue of high dimensionality
and dependence present in the genomic data is addressed through
a multiple testing procedure. We also address the missing-data problem
that arises from using permutations on possibly censored,
longitudinal data. Our proposed method is illustrated on a dataset
from a multi-centered research study of inflammation and the host
response to traumatic injury.
Keywords: gene microarrays, survival analysis, longitudinal data,
permutation tests, false discovery rate
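The idea of obtaining significance levels by permutation can be sketched for a single gene as follows. This is only an illustration under simplifying assumptions and is not the authors' procedure: it uses fully observed survival times at a single time point, so it sidesteps the censoring, longitudinal structure and multiple testing that the abstract addresses. The association score (absolute covariance), the function names and the toy data are all hypothetical.

```python
import random


def abs_covariance(x, y):
    """Absolute sample covariance, used here as a simple association score."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return abs(sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1))


def permutation_p_value(expression, outcome, n_perm=2000, seed=0):
    """One-gene permutation p-value: shuffle outcomes, recompute the score."""
    rng = random.Random(seed)
    observed = abs_covariance(expression, outcome)
    shuffled = list(outcome)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs_covariance(expression, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Toy, fully observed data (hypothetical): expression tracks survival time.
expression = [0.2, 0.5, 0.9, 1.4, 1.8, 2.1, 2.7, 3.0]
survival_time = [3.1, 4.0, 5.2, 6.8, 7.1, 8.5, 9.9, 11.0]
p_value = permutation_p_value(expression, survival_time)
print(p_value)
```

Because the permutation distribution is built from the data themselves, no parametric assumption about the distribution of gene expression is needed, which is the property the abstract emphasizes.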
Authors: Ronglai Shen*, Debashis Ghosh*, and Arul M. Chinnaiyan**;
*Department of Biostatistics, University of Michigan, **Department
of Pathology, Urology, and the Comprehensive Cancer Center, University
of Michigan
Title: Prognostic meta-signature of breast cancer developed by two-stage
mixture modeling of microarray Data
Background: An increasing number of studies have profiled tumor
specimens using distinct microarray platforms and analysis techniques.
With the accumulating amount of microarray data, one of the most
intriguing yet challenging tasks is to develop robust statistical
models to integrate the findings.
Results: By applying a two-stage Bayesian mixture modeling strategy,
we were able to assimilate and analyze four independent microarray
studies to derive an inter-study validated "meta-signature" associated
with breast cancer prognosis. Combining multiple studies (n = 305
samples) on a common probability scale, we developed a 90-gene
meta-signature that was strongly associated with survival in breast
cancer patients. Across the independent studies, which used different
microarray platforms including spotted cDNAs, Affymetrix GeneChip, and
inkjet oligonucleotides, the individually identified classifiers
yielded gene sets predictive of survival in each study cohort. The
study-specific gene signatures, however, had minimal overlap with each
other and performed poorly in pairwise cross-validation. The
meta-signature, on the other hand, accommodated such heterogeneity and
achieved comparable or better prognostic performance than the
individual signatures. Furthermore, compared with a global
standardization method, the mixture-model-based data transformation
demonstrated superior properties for data integration and provided a
solid basis for building classifiers at the second stage. Functional
annotation revealed that genes involved in cell cycle and signal
transduction activities were over-represented in the meta-signature.
Conclusion: The mixture modeling approach unifies disparate gene
expression data on a common probability scale, allowing robust,
inter-study validated prognostic signatures to be obtained. With the
emerging utility of microarrays for cancer prognosis, it will be
important to establish paradigms to meta-analyze disparate gene
expression data for prognostic signatures of potential clinical use.
Author: Yu Shyr, Ph.D., Department
of Biostatistics, Vanderbilt University School of Medicine
Title: Recent Development in MALDI-TOF MS Protein Profiling
Streaming Video: Real Media
Matrix-assisted laser desorption-ionization, time-of-flight (MALDI-TOF)
mass spectrometry (MS) is a leading technology in proteomics. This
technology allows direct measurement of the "expression signature" of
tissue, serum, plasma, or other biological specimens. It has tremendous
potential for disease screening, diagnosis and treatment. The goal of
MS data processing is to extract the true information from the raw MS
data effectively and correctly for further statistical analysis.
Two general approaches have been studied recently: the functional data
analysis approach (Morris and Carroll 2004, Billheimer 2004) and
the feature extraction approach (Coombes et al 2004, Chen, Hong
and Shyr 2004). To provide a final peak list for subsequent statistical
analysis, the feature extraction approach usually takes the following
steps: de-noising (smoothing), baseline correction, normalization,
peak detection and alignment. In this talk, we will introduce some
recent progress on MS data processing using mathematical tools and
statistical methods. Some experimental results will be shown using the
data processing software packages developed by the High Dimensional
Data Core at the Vanderbilt-Ingram Cancer
Center.
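The feature extraction steps listed above can be sketched on a single toy spectrum. This is a minimal illustration, not the Vanderbilt software: it assumes a moving average for de-noising, a rolling minimum for the baseline, total-intensity scaling for normalization and strict local maxima for peak detection. Alignment is omitted because it requires several spectra. All function names, window sizes, thresholds and data are hypothetical.

```python
def moving_average(signal, half_window=1):
    """De-noise by averaging each point with its neighbours."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def subtract_baseline(signal, half_window=5):
    """Estimate the baseline as a rolling minimum and subtract it."""
    n = len(signal)
    return [signal[i] - min(signal[max(0, i - half_window):min(n, i + half_window + 1)])
            for i in range(n)]


def normalize_total_intensity(signal):
    """Scale the spectrum so that its intensities sum to 1."""
    total = sum(signal)
    return [v / total for v in signal] if total > 0 else list(signal)


def detect_peaks(signal, min_height):
    """Indices of strict local maxima at or above min_height."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i] > signal[i - 1] and signal[i] > signal[i + 1]
            and signal[i] >= min_height]


# Toy spectrum (hypothetical): slowly falling baseline plus two peaks.
raw = [10, 10, 11, 10, 30, 60, 31, 10, 9, 9, 8, 25, 50, 24, 8, 7, 7]
processed = normalize_total_intensity(subtract_baseline(moving_average(raw)))
peaks = detect_peaks(processed, min_height=0.05)
print(peaks)
```

On this toy input the two peaks are recovered at indices 5 and 12. In practice each step involves tuning choices (window widths, thresholds) that interact, which is one reason the order of the steps matters.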
Author: Sally W. Thurston,
Department of Biostatistics and Computational Biology, University
of Rochester
Title: Modeling Measurement Error in a Biomarker on the Pathway
from Smoking to Lung Cancer
Presentation materials: PDF
Streaming Video: Real Media
Carcinogens derived from cigarette smoke can bind to DNA to form DNA
adducts, and this process is believed to initiate smoking-induced
lung cancer. The goal of this work is to incorporate knowledge of
this process to improve cancer risk estimates. We use data from a
large case-control study of lung cancer conducted at Massachusetts
General Hospital for our models, which also incorporate data on several
DNA repair genes. We face several difficulties including (a) adducts
were only measured on a very small subset of the dataset; (b) for
some individuals, the number of adducts was below the limit of detection;
and (c) DNA adducts in lung tissue can be measured in lung cancer
cases but never in controls. DNA adducts were also measured in blood
mononuclear cells for a small number of cases and controls, and we consider
blood adducts to be measured with error relative to lung adducts.
By introducing a latent variable for true lung DNA adducts, we allow
for measurement error in both types of observed adduct measurements,
but assume greater measurement error in blood adducts. We compare
the performance of models that incorporate DNA adducts versus those
that do not, in predicting the case status of individuals not used
in fitting the models.
|