Many statistical methods have already been developed to select genes that are differentially expressed under different conditions or in different populations. Often the selected genes are subsequently examined for overrepresentation in known pathways, thus implicating activation of the pathway as relevant in explaining the observed differences. Potential pathways are typically not incorporated into the initial discovery process for fear of biasing discovery toward what is already known.
Here we show how covariance can be used to exploit pathway structure without biasing selection in favor of known pathways. Starting with a simple model for differences of expression in a paired-subject experiment, we show that, for large, highly coordinated gene networks, the eigenvectors of the covariance matrix may contain substantial information about which genes are relevant to the differential processes. A similar type of covariance structure is identified for gene expression at different epochs in the reproductive cycle of rainbow trout, and a robust method for feature selection, called SCOOP, is developed to select genes that naturally describe reproductive processes and to suggest the implication of genes with previously unknown function.
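The eigenvector idea can be illustrated with a small simulation. The sketch below is not SCOOP and is not taken from the work described here; it assumes a hypothetical one-factor model in which a single latent factor shifts the paired differences of a coordinated block of genes, and it checks whether the leading eigenvector of the sample covariance matrix loads mainly on that block. The function name, sample sizes, and effect size are all illustrative assumptions.

```python
import numpy as np

def leading_eigvec_loadings(diffs):
    """Absolute loadings of the leading eigenvector of the sample
    covariance of `diffs` (rows: subject pairs, columns: genes)."""
    cov = np.cov(diffs, rowvar=False)
    # eigh returns eigenvalues in ascending order for symmetric matrices,
    # so the last column is the leading eigenvector.
    _, eigvecs = np.linalg.eigh(cov)
    return np.abs(eigvecs[:, -1])

# Simulated paired differences (all sizes are illustrative assumptions).
rng = np.random.default_rng(0)
n_pairs, n_genes, n_network = 40, 200, 30

# Independent noise for every gene.
diffs = rng.normal(size=(n_pairs, n_genes))

# Assumed model: one latent factor per subject pair drives the first
# `n_network` genes together, mimicking a highly coordinated network.
factor = rng.normal(size=(n_pairs, 1))
diffs[:, :n_network] += 2.0 * factor

loadings = leading_eigvec_loadings(diffs)

# Genes with the largest loadings are candidate members of the network.
top = np.argsort(loadings)[::-1][:n_network]
recovered = np.intersect1d(top, np.arange(n_network)).size
print(f"{recovered} of {n_network} network genes among the top loadings")
```

In this toy setting no pathway annotation enters the selection step; the coordinated block is identified from the covariance structure alone, which is the sense in which known pathways need not bias discovery.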
This is joint work with Yushi Liu and Bill Hayton.