DOI: 10.1145/3474085.3476968

Multimodal Relation Extraction with Efficient Graph Alignment

Published: 17 October 2021

Abstract

Relation extraction (RE) is a fundamental step in constructing knowledge graphs. However, previous RE methods suffer a sharp performance decline on short, noisy social media texts because such texts lack context. Fortunately, the visual content that accompanies social media posts (objects and their relations) can supply the missing semantics and help extract relations precisely. We introduce multimodal relation extraction (MRE), a task that identifies textual relations with the help of visual clues. To tackle this problem, we present a large-scale dataset containing more than 15,000 sentences with 23 pre-defined relation categories. Because the visual relations among objects correspond to textual relations, we develop a dual graph alignment method to capture this correlation. Experimental results demonstrate that visual content helps identify relations more precisely than text-only baselines, and that our alignment method finds correlations between vision and language, which further improves performance. Our dataset and code are available at https://github.com/thecharm/Mega.
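The idea of aligning a textual graph with a visual graph can be illustrated with a minimal sketch. This is not the authors' MEGA implementation: it assumes each graph node already has an embedding vector, uses cosine similarity followed by a softmax as a stand-in alignment score, and works on made-up toy vectors. The names `align_nodes`, `text_nodes`, and `visual_nodes` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def align_nodes(text_nodes, visual_nodes):
    """For each text node, return a soft alignment distribution over visual nodes.

    Assumption: similarity of node embeddings is used as the alignment signal.
    """
    return [softmax([cosine(t, v) for v in visual_nodes]) for t in text_nodes]


# Hypothetical toy embeddings: two text-graph nodes, three visual-graph nodes.
text_nodes = [[1.0, 0.0], [0.0, 1.0]]
visual_nodes = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]

alignment = align_nodes(text_nodes, visual_nodes)
# Index of the most strongly aligned visual node for each text node.
best = [max(range(len(row)), key=row.__getitem__) for row in alignment]
```

In this sketch each row of `alignment` is a probability distribution, so it could be used to weight visual node features before combining them with the textual representation. A structural term that compares graph edges as well as node features would be a natural extension, but it is left out here.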

Supplementary Material

ZIP File (mfp2639aux.zip)
Source Code of MEGA Model for Multimodal Relation Extraction

Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN: 9781450386517
DOI: 10.1145/3474085
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. graph alignment
  2. multimodal dataset
  3. multimodal relation extraction

Qualifiers

  • Research-article

Funding Sources

  • Hong Kong Research Grants Council through a General Research Fund
  • National Natural Science Foundation of China
  • Fundamental Research Funds for the Central Universities, SCUT
  • Science and Technology Planning Project of Guangdong Province

Conference

MM '21
MM '21: ACM Multimedia Conference
October 20 - 24, 2021
Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Article Metrics

  • Downloads (last 12 months): 441
  • Downloads (last 6 weeks): 61

Reflects downloads up to 23 Dec 2024

Cited By

  • (2025) A Fine-Grained Network for Joint Multimodal Entity-Relation Extraction. IEEE Transactions on Knowledge and Data Engineering 37:1 (1-14). DOI: 10.1109/TKDE.2024.3485107. Online publication date: Jan-2025
  • (2025) The more quality information the better: Hierarchical generation of multi-evidence alignment and fusion model for multimodal entity and relation extraction. Information Processing & Management 62:1 (103875). DOI: 10.1016/j.ipm.2024.103875. Online publication date: Jan-2025
  • (2025) CE-DCVSI: Multimodal relational extraction based on collaborative enhancement of dual-channel visual semantic information. Expert Systems with Applications 262 (125608). DOI: 10.1016/j.eswa.2024.125608. Online publication date: Mar-2025
  • (2024) RSRNeT: a novel multi-modal network framework for named entity recognition and relation extraction. PeerJ Computer Science 10 (e1856). DOI: 10.7717/peerj-cs.1856. Online publication date: 9-Feb-2024
  • (2024) Towards Complex-query Referring Image Segmentation: A Novel Benchmark. ACM Transactions on Multimedia Computing, Communications, and Applications. DOI: 10.1145/3701733. Online publication date: 4-Nov-2024
  • (2024) A Comprehensive Survey on Relation Extraction: Recent Advances and New Frontiers. ACM Computing Surveys. DOI: 10.1145/3674501. Online publication date: 24-Jun-2024
  • (2024) Generative Multimodal Data Augmentation for Low-Resource Multimodal Named Entity Recognition. Proceedings of the 32nd ACM International Conference on Multimedia (7336-7345). DOI: 10.1145/3664647.3681598. Online publication date: 28-Oct-2024
  • (2024) Shapley Value-based Contrastive Alignment for Multimodal Information Extraction. Proceedings of the 32nd ACM International Conference on Multimedia (5270-5279). DOI: 10.1145/3664647.3681367. Online publication date: 28-Oct-2024
  • (2024) Caption-Aware Multimodal Relation Extraction with Mutual Information Maximization. Proceedings of the 32nd ACM International Conference on Multimedia (1148-1157). DOI: 10.1145/3664647.3681219. Online publication date: 28-Oct-2024
  • (2024) HGOE: Hybrid External and Internal Graph Outlier Exposure for Graph Out-of-Distribution Detection. Proceedings of the 32nd ACM International Conference on Multimedia (1544-1553). DOI: 10.1145/3664647.3681118. Online publication date: 28-Oct-2024