research-article

Applying Monte Carlo simulation to biomedical literature to approximate genetic network

Published: 01 May 2016

Abstract

Biologists often need to identify the genes associated with a given set of genes or with a given disease. In this paper, we propose a classifier system called Monte Carlo for Genetic Network (MCforGN) that constructs genetic networks, identifies functionally related genes, and predicts gene-disease associations. MCforGN identifies functionally related genes based on their co-occurrences in the abstracts of biomedical literature. For a given gene g, the system first extracts the set of genes found in the abstracts associated with g. It then ranks these genes to determine which ones co-occur most frequently with g. It overcomes the limitations of current approaches that employ deterministic analytical algorithms by applying Monte Carlo simulation to approximate genetic networks: it conducts repeated random sampling to obtain numerical results and to optimize them. It then analyzes these results with a series of statistical tests to obtain the probabilities of the different genes' co-occurrences. MCforGN detects gene-disease associations by combining centrality measures, which identify the central genes in disease-specific genetic networks, with Monte Carlo simulation. MCforGN aims to enhance state-of-the-art biological text mining by applying novel extraction techniques. We evaluated MCforGN experimentally against nine other approaches, and the results showed marked improvement.
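The pipeline outlined in the abstract (count co-occurrences, rank partner genes, test them by random sampling, then look at centrality) can be illustrated with a minimal sketch. This is not the MCforGN implementation, whose details the abstract does not give. The toy corpus, the function names, the null model (a gene keeps its number of mentions but is placed in randomly chosen abstracts), and the use of plain degree centrality are all assumptions made for illustration.

```python
import random
from collections import Counter
from itertools import combinations

# Toy corpus: each abstract is reduced to the set of gene symbols it mentions.
# The groupings are invented for illustration and carry no biological meaning.
ABSTRACTS = [
    {"BRCA1", "BRCA2", "TP53"},
    {"BRCA1", "BRCA2"},
    {"BRCA1", "TP53"},
    {"BRCA1", "BRCA2", "ATM"},
    {"TP53", "MDM2"},
    {"ATM", "MDM2"},
]


def cooccurrence_counts(abstracts):
    """Count, for every unordered gene pair, the abstracts mentioning both."""
    counts = Counter()
    for genes in abstracts:
        for pair in combinations(sorted(genes), 2):
            counts[pair] += 1
    return counts


def rank_partners(gene, abstracts):
    """Rank genes co-mentioned with `gene`, most frequent first."""
    partners = Counter()
    for genes in abstracts:
        if gene in genes:
            partners.update(genes - {gene})
    return sorted(partners.items(), key=lambda kv: (-kv[1], kv[0]))


def monte_carlo_p_value(gene_a, gene_b, abstracts, trials=2000, seed=0):
    """Estimate how often chance alone reaches the observed co-occurrence.

    Assumed null model: gene_b keeps its number of mentions, but the
    abstracts that mention it are drawn uniformly at random.
    """
    rng = random.Random(seed)
    has_a = [gene_a in genes for genes in abstracts]
    n_b = sum(gene_b in genes for genes in abstracts)
    observed = sum(gene_a in g and gene_b in g for g in abstracts)
    hits = 0
    for _ in range(trials):
        sample = rng.sample(range(len(abstracts)), n_b)
        if sum(has_a[i] for i in sample) >= observed:
            hits += 1
    # Add-one smoothing keeps the estimate strictly above zero.
    return observed, (hits + 1) / (trials + 1)


def degree_centrality(counts, min_count=1):
    """Degree of each gene in the network of pairs seen at least min_count times."""
    degree = Counter()
    for (a, b), count in counts.items():
        if count >= min_count:
            degree[a] += 1
            degree[b] += 1
    return degree


if __name__ == "__main__":
    counts = cooccurrence_counts(ABSTRACTS)
    print(rank_partners("BRCA1", ABSTRACTS))
    print(monte_carlo_p_value("BRCA1", "BRCA2", ABSTRACTS))
    print(degree_centrality(counts, min_count=2).most_common())
```

On a real corpus the gene sets would come from a gene-mention tagger run over literature abstracts, and the null model and the centrality measure would need to be chosen to match the study design. The sketch only shows how the sampling and ranking steps fit together.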

Published In

IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 13, Issue 3
May/June 2016
200 pages

Publisher

IEEE Computer Society Press
Washington, DC, United States

Author Tags

1. Monte Carlo simulation
2. biological NLP
3. biomedical literature
4. gene regulatory network
5. gene-disease associations
6. information extraction
7. text mining
