research-article

KBPearl: a knowledge base population system supported by joint entity and relation linking

Published: 01 March 2020

Abstract

Most openly available knowledge bases (KBs) are incomplete because they are not kept synchronized with facts emerging in the real world. Knowledge base population (KBP), which extracts knowledge from unstructured text in external data sources to populate KBs, has therefore become a vital task. Recent research proposes two types of solutions that only partially address this problem. The first, dynamic KB construction from unstructured text, requires specifying in advance which predicates are of interest to the KB; this preliminary setup makes it unsuitable for timely population. The second, Open Information Extraction (Open IE) from unstructured text, struggles to produce facts that can be linked directly to the target KB without redundancy and ambiguity. In this paper, we present KBPearl, an end-to-end KBP system that takes an incomplete KB and a large text corpus as input and (1) organizes the noisy extractions from Open IE into canonicalized facts, and (2) populates the KB through joint entity and relation linking, using the context knowledge of the facts and side information inferred from the source text. Extensive experiments on real-world datasets demonstrate the effectiveness and efficiency of KBPearl against state-of-the-art techniques.
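
To make the two steps in the abstract concrete, here is a minimal toy sketch in Python. It is not KBPearl's algorithm, which the abstract does not detail. The alias tables, the identifiers (`Q_Paris_City`, `P_capital_of`, and so on), the type signatures, and the scoring rule are all hypothetical and chosen only for illustration. The sketch maps noisy Open IE triples to KB identifiers and resolves an ambiguous entity mention by scoring entity and relation candidates together, so that duplicate surface forms collapse into a single canonical fact.

```python
from itertools import product

# Hypothetical alias tables: surface phrase -> candidate KB identifiers.
ENTITY_ALIASES = {
    "paris": ["Q_Paris_City", "Q_Paris_Hilton"],
    "france": ["Q_France"],
}
RELATION_ALIASES = {
    "is the capital of": ["P_capital_of"],
    "is capital city of": ["P_capital_of"],
    "was born in": ["P_place_of_birth"],
}

# Hypothetical type information used as context for joint scoring.
ENTITY_TYPES = {
    "Q_Paris_City": "city",
    "Q_Paris_Hilton": "person",
    "Q_France": "country",
}
RELATION_SIGNATURES = {
    "P_capital_of": ("city", "country"),
    "P_place_of_birth": ("person", "city"),
}


def normalize(phrase):
    """Lowercase a surface phrase and collapse whitespace."""
    return " ".join(phrase.lower().split())


def link_triple(subj, rel, obj):
    """Jointly pick the entity and relation candidates that fit best.

    Returns a (subject_id, relation_id, object_id) tuple, or None when
    any of the three phrases has no candidate in the alias tables.
    """
    candidates = product(
        ENTITY_ALIASES.get(normalize(subj), []),
        RELATION_ALIASES.get(normalize(rel), []),
        ENTITY_ALIASES.get(normalize(obj), []),
    )
    best, best_score = None, -1
    for s, r, o in candidates:
        subj_type, obj_type = RELATION_SIGNATURES[r]
        # Score a combination by how many arguments match the
        # relation's expected types (0, 1, or 2).
        score = (ENTITY_TYPES[s] == subj_type) + (ENTITY_TYPES[o] == obj_type)
        if score > best_score:
            best, best_score = (s, r, o), score
    return best


def populate(kb, triples):
    """Return the linked facts that are not already in the KB."""
    new_facts = set()
    for subj, rel, obj in triples:
        linked = link_triple(subj, rel, obj)
        if linked is not None and linked not in kb:
            new_facts.add(linked)
    return new_facts


# Noisy Open IE output: two paraphrases of one fact and one unlinkable triple.
extractions = [
    ("Paris", "is the capital of", "France"),
    ("Paris", "is capital city of", "France"),
    ("Paris", "was born in", "Atlantis"),
]
new_facts = populate(set(), extractions)
print(new_facts)
```

In this toy setting the relation's type signature is what disambiguates "Paris": the city reading satisfies both argument types of `P_capital_of`, while the person reading satisfies only one. The two paraphrased extractions yield the same identifier triple, so only one fact is added, and the triple mentioning "Atlantis" is dropped because no candidate exists for it.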


Published In

Proceedings of the VLDB Endowment, Volume 13, Issue 7
March 2020
194 pages
ISSN: 2150-8097

Publisher

VLDB Endowment