DOI:10.1145/2166896.2166918
research-article

Semantics-aware open information extraction in the biomedical domain

Published: 07 December 2011

Abstract

The growing volume of biomedical scientific literature published on the Web demands new tools and methods to automatically process it and extract relevant information. Traditional information extraction has focused on recognizing well-defined entities such as genes or proteins, which forms the basis for extracting relations between the recognized entities. Most work has focused on harvesting domain-specific, pre-specified relations, which usually requires manual labor and heavy machinery. The intrinsic features and scale of the Web demand new approaches able to cope with the diversity of documents, where the number of relations is unbounded and not known in advance. This paper presents a scalable method for extracting biomedical relations from text. The method is not geared to any specific sub-domain (e.g. protein-protein interactions or drug-drug interactions) and requires neither manual input nor deep processing. Moreover, the method uses the extracted relations to infer a set of abstract semantic relations and their signature types, which constitute a valuable source of knowledge when constructing formal knowledge bases. Semantic annotation enables seamless integration of the extracted relations with the available biomedical resources. The proposed approach has been successfully applied to the CALBC corpus (almost a million text documents), with UMLS used as the knowledge resource for semantic annotation.



    Published In

    SWAT4LS '11: Proceedings of the 4th International Workshop on Semantic Web Applications and Tools for the Life Sciences
    December 2011
    139 pages
    ISBN:9781450310765
    DOI:10.1145/2166896

    Sponsors

    • Ontotext
    • Corporate Semantic Web
    • BBRC: Biotechnology and Biological Sciences Research Council
    • NCBO: National Center for BioMedical Ontology
    • BioMed Central

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. biomedical domain
    2. relation extraction
    3. semantic annotation

    Qualifiers

    • Research-article

