Classes of Reciprocal Sequence Homologs

Sam Handelman
Pharmacology, College of Medicine and Public Health, The Ohio State University

(January 21, 2010 10:30 AM - 11:18 AM)

Abstract

Sequence variations that alter protein function are a fundamental driving force in evolution. The rapid proliferation of whole-genome sequence data should provide unprecedented insight into the evolution of protein function, especially in bacteria, for which thousands of complete genome sequences will soon be available, but there are practical obstacles to achieving this potential. New methods to assess the functional similarity between proteins are needed to overcome these obstacles. We have evaluated an orthology-based method to group bacterial proteins based on likely similarity in biochemical function. The method rests on treating the occurrence of multiple homologous proteins in a single microbial organism as evidence of functional diversification among those homologs. The resulting groups of functionally similar proteins are called Classes of Reciprocal Sequence Homologs (CRSHs). Different CRSHs vary tremendously in their degree of sequence conservation in widely diverged organisms (from 25% to 70% identity). However, once this variation is taken into account, a simple model using only the mean evolutionary distance between pairs of microbial organisms accounts for the vast majority of the sequence differences within each CRSH. The likely functional similarity of the proteins in each CRSH is also supported by preservation of gene neighborhood in remotely related microbial organisms, which in turn is strongly correlated with transcriptional co-regulation in the model bacterium E. coli. Furthermore, a CRSH-based metric achieves 30% accuracy in predicting manually validated physical inter-protein interactions in E. coli. A webserver at www.orthology.org provides access to the CRSHs along with related quality-control, gene-neighborhood, and annotation information.
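
The grouping idea in the abstract can be illustrated with a small sketch. This is not the CRSH procedure itself, which the abstract does not specify; it substitutes the generic reciprocal-best-hit heuristic, under the assumption that pairwise similarity scores are already available. The protein labels, scores, and function names (`symmetric`, `best_hits`, `reciprocal_groups`) are all hypothetical. The toy data shows how two homologs in the same organism end up in separate groups because each has its own reciprocal partner in the other organism.

```python
from collections import defaultdict

# Proteins are (organism, name) tuples. All labels and scores are invented.


def symmetric(pairs):
    """Expand (a, b, score) triples into a directed score table, both ways."""
    scores = {}
    for a, b, score in pairs:
        scores[(a, b)] = score
        scores[(b, a)] = score
    return scores


def best_hits(scores):
    """Map (query protein, target organism) to the top-scoring subject."""
    best = {}
    for (query, subject), score in scores.items():
        key = (query, subject[0])
        if key not in best or score > best[key][1]:
            best[key] = (subject, score)
    return {key: hit[0] for key, hit in best.items()}


def reciprocal_groups(scores):
    """Join proteins that are each other's best hit across two organisms."""
    best = best_hits(scores)
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (query, organism), subject in best.items():
        find(query)  # register the protein even if it stays a singleton
        is_cross_organism = query[0] != organism
        if is_cross_organism and best.get((subject, query[0])) == query:
            parent[find(query)] = find(subject)

    groups = defaultdict(set)
    for protein in parent:
        groups[find(protein)].add(protein)
    return sorted(sorted(group) for group in groups.values())


# Organism A carries two homologs (p1, p2); so does organism B (q1, q2).
pairs = [
    (("A", "p1"), ("B", "q1"), 200),
    (("A", "p1"), ("B", "q2"), 80),
    (("A", "p2"), ("B", "q2"), 180),
    (("A", "p2"), ("B", "q1"), 90),
]
groups = reciprocal_groups(symmetric(pairs))
for group in groups:
    print(group)
```

With these invented scores, p1 pairs with q1 and p2 pairs with q2, so the two homologs within each organism fall into different groups. A real pipeline would need alignment scores from a search tool and a policy for ties and for proteins with no reciprocal partner, neither of which this sketch addresses.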