DOI: 10.1145/3366423.3380017
Research article

Multi-Context Attention for Entity Matching

Published: 20 April 2020

Abstract

Entity matching (EM) is a classic research problem of identifying data instances that refer to the same real-world entity. A recent trend in this area is to take advantage of deep learning (DL) to automatically extract discriminative features. DeepER and DeepMatcher have emerged as two pioneering DL models for EM. However, these two state-of-the-art solutions rely on vanilla RNNs and relatively simple attention mechanisms. In this paper, we fully exploit the semantic context of the embedding vectors for a pair of entity text descriptions. In particular, we propose an integrated multi-context attention framework that accounts for three types of context: self-attention, pair-attention, and global-attention. The idea is further extended with attribute attention in order to support structured datasets. We conduct extensive experiments on 7 publicly accessible benchmark datasets. The experimental results clearly show that our framework outperforms DeepER and DeepMatcher on all of them.
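The three attention contexts named in the abstract can be illustrated with plain scaled dot-product attention. The sketch below is not the paper's actual architecture; the function names, the use of NumPy, and the choice to represent global context as a small matrix of corpus-level vectors are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(query, keys, values):
    # Standard scaled dot-product attention over token embeddings.
    scores = query @ keys.T / np.sqrt(keys.shape[-1])
    return softmax(scores) @ values

def multi_context_repr(a, b, g):
    """a, b: token-embedding matrices (n_tokens x d) for the two entity
    descriptions; g: a (k x d) matrix of global context vectors
    (assumed here, e.g. corpus-level summary embeddings)."""
    self_ctx = attend(a, a, a)   # self-attention: a attends to itself
    pair_ctx = attend(a, b, b)   # pair-attention: a attends to the other entity
    glob_ctx = attend(a, g, g)   # global-attention: a attends to global context
    # Concatenate the three context-aware views token-wise.
    return np.concatenate([self_ctx, pair_ctx, glob_ctx], axis=-1)

rng = np.random.default_rng(0)
a = rng.normal(size=(5, 8))   # entity A: 5 tokens, 8-dim embeddings
b = rng.normal(size=(7, 8))   # entity B: 7 tokens
g = rng.normal(size=(3, 8))   # 3 global context vectors
rep = multi_context_repr(a, b, g)
print(rep.shape)  # each of A's tokens now carries a 3 * 8 = 24-dim context view
```

In a full model, such concatenated representations would feed a downstream classification network that predicts match or non-match; that step is omitted here.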



Published In

WWW '20: Proceedings of The Web Conference 2020
April 2020, 3143 pages
ISBN: 9781450370233
DOI: 10.1145/3366423

Publisher

Association for Computing Machinery, New York, NY, United States



          Author Tags

          1. classification network
          2. entity matching
          3. multi-context attention

          Qualifiers

          • Research-article
          • Research
          • Refereed limited

Conference

WWW '20: The Web Conference 2020
April 20–24, 2020
Taipei, Taiwan

          Acceptance Rates

          Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

