research-article

Applying Monte Carlo simulation to biomedical literature to approximate genetic network

Authors:

Dirar Al Homouz,

Murad Qasaimeh

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), Volume 13, Issue 3

Pages 494 - 504

https://doi.org/10.1109/TCBB.2015.2481399

Published: 01 May 2016

Abstract

Biologists often need to know which genes are associated with a given set of genes or with a given disease. In this paper, we propose a classifier system called Monte Carlo for Genetic Network (MCforGN) that constructs genetic networks, identifies functionally related genes, and predicts gene-disease associations. MCforGN identifies functionally related genes based on their co-occurrences in the abstracts of biomedical literature. For a given gene g, the system first extracts the set of genes found within the abstracts associated with g. It then ranks these genes to determine those with high co-occurrence with g. MCforGN overcomes the limitations of current approaches that employ analytical deterministic algorithms by applying Monte Carlo simulation to approximate genetic networks: it conducts repeated random sampling to obtain numerical results and to optimize them. Moreover, it analyzes the results with a series of statistical tests to obtain the probabilities of the genes' co-occurrences. MCforGN detects gene-disease associations by combining centrality measures (which identify the central genes in disease-specific genetic networks) with Monte Carlo simulation. MCforGN aims to enhance state-of-the-art biological text mining by applying novel extraction techniques. We evaluated MCforGN by comparing it experimentally with nine other approaches. The results showed marked improvement.
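The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea it describes: count gene co-occurrences in abstracts, use repeated random sampling to judge which counts are unlikely by chance, and compute a simple centrality score on the resulting network. Everything here is an assumption made for illustration and not the authors' implementation: the toy corpus, the function names, the null model (abstracts mentioning g treated as a random draw from the corpus), and the use of degree centrality as the centrality measure.

```python
import random
from collections import Counter

# Toy corpus (hypothetical): each abstract is reduced to the set of gene
# symbols it mentions.
corpus = (
    [frozenset({"BRCA1", "BARD1"})] * 6
    + [frozenset({"BRCA1", "TP53"})] * 2
    + [frozenset({"TP53", "MDM2"})] * 8
    + [frozenset({"TP53"})] * 4
)

def cooccurrence_counts(abstracts, gene):
    """Count, for each other gene, the abstracts that mention it with `gene`."""
    counts = Counter()
    for genes in abstracts:
        if gene in genes:
            counts.update(genes - {gene})
    return counts

def monte_carlo_pvalues(abstracts, gene, n_trials=2000, seed=0):
    """Estimate an empirical p-value for each gene co-occurring with `gene`.

    Null model (an assumption of this sketch): the abstracts mentioning
    `gene` are a random draw of the same size from the whole corpus.
    """
    rng = random.Random(seed)
    observed = cooccurrence_counts(abstracts, gene)
    n_hits = sum(1 for genes in abstracts if gene in genes)
    exceed = Counter()
    for _ in range(n_trials):
        sample = rng.sample(abstracts, n_hits)  # sampling without replacement
        for other, obs in observed.items():
            simulated = sum(1 for genes in sample if other in genes)
            if simulated >= obs:
                exceed[other] += 1
    # Add-one correction keeps every estimate strictly positive.
    return {g: (exceed[g] + 1) / (n_trials + 1) for g in observed}

def rank_related_genes(abstracts, gene, **kwargs):
    """Rank co-occurring genes: smallest p-value first, then largest count."""
    pvals = monte_carlo_pvalues(abstracts, gene, **kwargs)
    counts = cooccurrence_counts(abstracts, gene)
    return sorted(pvals, key=lambda g: (pvals[g], -counts[g], g))

def degree_centrality(abstracts):
    """Fraction of the other genes that each gene co-occurs with at least once."""
    neighbours = {}
    for genes in abstracts:
        for g in genes:
            neighbours.setdefault(g, set()).update(genes - {g})
    n = len(neighbours)
    return {g: (len(nb) / (n - 1) if n > 1 else 0.0)
            for g, nb in neighbours.items()}

if __name__ == "__main__":
    print(rank_related_genes(corpus, "BRCA1"))
    print(degree_centrality(corpus))
```

In this toy corpus TP53 appears in most abstracts, so its co-occurrence with BRCA1 is unremarkable under the null model, whereas BARD1 appears only alongside BRCA1. Ranking by raw counts alone would not distinguish a frequently mentioned gene from a specifically associated one, which is one plausible motivation for a sampling-based test.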

Index terms: Applied computing; Life and medical sciences

Published In

IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 13, Issue 3, May/June 2016, 200 pages

ISSN: 1545-5963

Publisher

IEEE Computer Society Press, Washington, DC, United States
