Molecular Function Prediction Using Neighborhood Features

Published: 01 April 2010

Abstract

The recent advent of high-throughput methods has generated large amounts of gene interaction data, which has allowed the construction of genome-wide networks. A significant number of genes in such networks remain uncharacterized, and predicting the molecular function of these genes remains a major challenge. A number of existing techniques assume that genes with similar functions are topologically close in the network. Our hypothesis is that genes with similar functions exhibit similar annotation patterns in their neighborhoods, regardless of the distance between them in the interaction network. We thus predict molecular functions of uncharacterized genes by comparing their functional neighborhoods to those of genes of known function. We propose a two-phase approach. First, we extract functional neighborhood features of a gene using Random Walks with Restarts. We then employ a k-nearest-neighbor (KNN) classifier to predict the function of uncharacterized genes based on the computed neighborhood features. We perform leave-one-out validation experiments on two S. cerevisiae interaction networks and show significant improvements over previous techniques. Our technique provides a natural control of the trade-off between accuracy and coverage of prediction. We further propose and evaluate prediction in sparsely annotated genomes by exploiting features from well-annotated genomes.
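The two-phase idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the restart probability, the feature definition (visiting probabilities summed per annotated function), the Euclidean distance, and the choice of k are all assumptions, and the names `rwr`, `neighborhood_features`, and `knn_predict` are hypothetical. The toy network and annotations are made up for illustration only.

```python
import numpy as np

def rwr(adj, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a random walk with restarts.

    The restart probability of 0.3 is an assumed value, not one from the paper.
    """
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid division by zero for isolated nodes
    W = adj / col_sums                     # column-normalized transition matrix
    e = np.zeros(adj.shape[0])
    e[seed] = 1.0                          # the walk restarts at the seed gene
    p = e.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * e
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

def neighborhood_features(adj, annotations, gene, restart=0.3):
    """Assumed feature: for each function, the visiting probability mass
    that falls on genes annotated with that function."""
    p = rwr(adj, gene, restart)
    p[gene] = 0.0                          # ignore the gene's own annotations
    return p @ annotations

def knn_predict(features, annotations, query, known, k=3):
    """Score each function by the fraction of the k nearest known genes
    (in feature space, Euclidean distance) that carry it."""
    known = np.asarray(known)
    dist = np.linalg.norm(features[known] - features[query], axis=1)
    nearest = known[np.argsort(dist)[:k]]
    return annotations[nearest].mean(axis=0)

# Toy data (hypothetical): 6 genes in two triangles joined by one edge, 2 functions.
adj = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
])
annotations = np.array([
    [1, 0], [1, 0], [0, 1],
    [0, 1], [1, 0], [1, 0],
], dtype=float)

features = np.vstack([neighborhood_features(adj, annotations, g) for g in range(6)])
scores = knn_predict(features, annotations, query=0, known=[1, 2, 3, 4, 5], k=3)
print(scores)  # per-function scores for gene 0
```

Thresholding the per-function scores is one plausible way to trade coverage for accuracy, in the spirit of the trade-off the abstract mentions: a higher threshold yields fewer but more confident predictions.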


Published In

IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 7, Issue 2
April 2010
189 pages

Publisher

IEEE Computer Society Press

Washington, DC, United States

Author Tags

  1. Gene function prediction
  2. classification
  3. feature extraction
  4. functional interaction network
