research-article

Inferring the Functions of Proteins from the Interrelationships between Functional Categories

Published: 01 January 2018

Abstract

This study proposes a new method for determining the functions of an unannotated protein. The proteins and amino acid residues mentioned in biomedical texts associated with an unannotated protein $p$ can be considered characteristic terms of $p$, and they are highly predictive of the potential functions of $p$. Similarly, the proteins and amino acid residues mentioned in biomedical texts associated with proteins annotated with a functional category $f$ can be considered characteristic terms of $f$. In this paper we introduce an information extraction system called IFP_IFC, which predicts the functions of an unannotated protein $p$ by representing $p$ and each functional category $f$ as a vector of weights. Each weight reflects the degree of association between a characteristic term and $p$, or between a characteristic term and $f$. First, IFP_IFC constructs a network whose nodes represent the different functional categories and whose edges represent the interrelationships between them. Then, it determines the functions of $p$ by employing random walks with restarts on this network. The walker is the vector of $p$. Finally, $p$ is assigned to the functional categories of the nodes that the walker visits most often. We evaluated the quality of IFP_IFC by comparing it experimentally with two other systems, and the results showed a marked improvement over both.
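The pipeline described in the abstract can be illustrated with a small sketch of a random walk with restart over a network of functional categories. This is not the IFP_IFC implementation. The abstract does not say how term weights or edge weights are computed, so the sketch assumes that cosine similarity between weight vectors serves both as the edge weight between two categories and as the restart distribution for $p$. The function names, the restart probability of 0.3, and the toy term weights are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def normalize(values):
    """Scale values to sum to 1; fall back to uniform if they are all zero."""
    total = sum(values)
    if total > 0:
        return [x / total for x in values]
    return [1.0 / len(values)] * len(values)


def random_walk_with_restart(category_vectors, protein_vector,
                             restart=0.3, tol=1e-10, max_iter=1000):
    """Return a visiting score per functional category for one protein."""
    names = sorted(category_vectors)
    n = len(names)
    # ASSUMPTION: the edge weight between two categories is the cosine
    # similarity of their characteristic-term vectors (no self-loops).
    weights = [
        [0.0 if i == j else cosine(category_vectors[a], category_vectors[b])
         for j, b in enumerate(names)]
        for i, a in enumerate(names)
    ]
    # Row-normalize so transition[i][j] is the probability of stepping i -> j.
    transition = [normalize(row) for row in weights]
    # ASSUMPTION: the walker restarts at categories in proportion to their
    # similarity to the protein's vector.
    start = normalize([cosine(protein_vector, category_vectors[a]) for a in names])

    scores = start[:]
    for _ in range(max_iter):
        updated = [
            (1 - restart) * sum(scores[i] * transition[i][j] for i in range(n))
            + restart * start[j]
            for j in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(updated, scores))
        scores = updated
        if delta < tol:
            break
    return dict(zip(names, scores))


# Toy data: weights of characteristic terms (proteins and residues).
category_vectors = {
    "kinase activity": {"CDK2": 0.9, "Ser": 0.6, "Thr": 0.4},
    "DNA binding": {"p53": 0.8, "Arg": 0.5, "Lys": 0.3},
    "phosphatase activity": {"PTEN": 0.7, "Ser": 0.5, "Thr": 0.2},
}
protein_vector = {"CDK2": 0.8, "Ser": 0.5}

scores = random_walk_with_restart(category_vectors, protein_vector)
best = max(scores, key=scores.get)
for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name}: {scores[name]:.4f}")
```

In this toy network the category that shares no characteristic terms with the protein or with the other categories is never reached by the walker, so the protein would be assigned to the highest-scoring of the remaining categories. A real system would need a principled weighting scheme and a threshold for how many top-ranked categories to report.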

Published In

IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 15, Issue 1
January 2018
352 pages

Publisher

IEEE Computer Society Press
Washington, DC, United States
