Space Oriented Rank-based Data Integration

Shili Lin
Department of Statistics, The Ohio State University

(April 29, 2010 11:30 AM - 12:18 PM)

Abstract

One of the major challenges facing researchers who study complex biological systems is integrating data from multiple omics platforms. Omic-scale data include DNA variations, transcriptome profiles, and RAomics. Choosing an appropriate approach for a data integration task is problem dependent and is dictated primarily by the information contained in the data. When jointly modeling multiple raw data sets is extremely challenging because of their vast differences, rankings derived from each data set provide a common ground on which results can be integrated. Because the underlying spaces of genes (elements) from which the ranked lists come are likely to differ, taking these spaces into consideration is paramount: failing to do so leads to inefficient use of the data and may introduce biases or yield sub-optimal results. However, this important aspect is usually overlooked in the literature on rank-based integration methods for omic-scale data. Even so, although no assumptions about the underlying spaces are stated explicitly, careful dissection of the algorithms reveals implicit assumptions about the spaces, whether or not those assumptions are valid for a particular integration problem. In this talk, I will discuss a number of space oriented methods for integrating ranking data, including Markov chain based heuristic algorithms and optimization based cross-entropy Monte Carlo methods. Examples will be shown to dissect the methods and to demonstrate the effects of assumptions about the underlying spaces.
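To make the role of the underlying spaces concrete, here is a minimal sketch of a Markov chain heuristic in the style of the MC4 rank-aggregation chain of Dwork et al. (2001), adapted so that each list votes only on pairs its space can inform. This is not presented as the speaker's method. The helper names `vote` and `mc4_aggregate`, the space convention, and the uniform teleportation weight `alpha=0.15` are all illustrative assumptions. The convention is that an item inside a list's space but absent from its top-k list ranks below every listed item, while an item outside the space carries no information.

```python
import numpy as np


def vote(pos, space, a, b):
    """Return +1 if one list ranks a above b, -1 if b above a, 0 if uninformative.

    Space convention (assumed for this sketch): an item inside the list's
    underlying space but missing from its top-k list ranks below every listed
    item; an item outside the space yields no information at all.
    """
    if a not in space or b not in space:
        return 0
    in_a, in_b = a in pos, b in pos
    if in_a and in_b:
        return 1 if pos[a] < pos[b] else -1
    if in_a:
        return 1
    if in_b:
        return -1
    return 0  # both in the space but both unranked: no information


def mc4_aggregate(ranked_lists, spaces, alpha=0.15, iters=1000):
    """Space-aware variant of the MC4 Markov chain heuristic (hypothetical helper)."""
    for lst, space in zip(ranked_lists, spaces):
        assert set(lst) <= space, "each ranked list must lie inside its space"
    items = sorted(set().union(*spaces))
    n = len(items)
    positions = [{item: r for r, item in enumerate(lst)} for lst in ranked_lists]

    P = np.zeros((n, n))
    for i, a in enumerate(items):
        for j, b in enumerate(items):
            if i == j:
                continue
            votes = [vote(pos, space, b, a) for pos, space in zip(positions, spaces)]
            informative = [v for v in votes if v != 0]
            # Move a -> b (b proposed with prob. 1/n) only if a strict majority
            # of the informative lists rank b above a.
            if informative and sum(v > 0 for v in informative) > len(informative) / 2:
                P[i, j] = 1.0 / n
        P[i, i] = 1.0 - P[i].sum()  # otherwise stay at a

    # Uniform teleportation (an added assumption) makes the chain ergodic.
    P = (1 - alpha) * P + alpha / n
    pi = np.full(n, 1.0 / n)
    for _ in range(iters):
        pi = pi @ P  # power iteration toward the stationary distribution
    order = np.argsort(-pi, kind="stable")
    return [items[k] for k in order]


lists = [["a", "b"], ["c", "a"]]
# c is in list 1's space but unranked there, so it is treated as below a and b.
print(mc4_aggregate(lists, [{"a", "b", "c"}, {"a", "c"}]))  # ['a', 'b', 'c']
# c is outside list 1's space, so list 1 says nothing about c.
print(mc4_aggregate(lists, [{"a", "b"}, {"a", "c"}]))  # ['c', 'a', 'b']
```

The two calls use identical ranked lists and differ only in whether `c` belongs to the first list's space. Under the first assumption `c` is aggregated last. Under the second assumption it is aggregated first, which illustrates how an unstated assumption about the spaces can change the integrated result.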