research-article

Multimodal Relation Extraction with Efficient Graph Alignment

Authors:

Changmeng Zheng,

Tao Wang

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

Pages 5298 - 5306

https://doi.org/10.1145/3474085.3476968

Published: 17 October 2021

Abstract

Relation extraction (RE) is a fundamental step in constructing knowledge graphs. However, previous relation extraction methods suffer a sharp performance decline on short, noisy social media texts because such texts lack context. Fortunately, the related visual content in social media posts (objects and their relations) can supply the missing semantics and help extract relations precisely. We introduce multimodal relation extraction (MRE), a task that identifies textual relations with the help of visual clues. To tackle this problem, we present a large-scale dataset containing more than 15,000 sentences with 23 pre-defined relation categories. Considering that the visual relations among objects correspond to textual relations, we develop a dual graph alignment method to capture this correlation for better performance. Experimental results demonstrate that visual content helps identify relations more precisely than text-only baselines. In addition, our alignment method can find correlations between vision and language, resulting in better performance. Our dataset and code are available at https://github.com/thecharm/Mega.
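To make the idea of aligning textual and visual graph nodes more concrete, the following is a minimal sketch of a generic similarity-based soft alignment. It is not the method of the paper: the abstract does not specify how the alignment is computed, so the cosine-similarity scoring, the softmax weighting, the function names (`align`, `visual_context`), and the toy embeddings are all assumptions made purely for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def align(text_nodes, visual_nodes):
    """Return one row of alignment weights per text node (each row sums to 1)."""
    return [softmax([cosine(t, v) for v in visual_nodes]) for t in text_nodes]


def visual_context(weights, visual_nodes):
    """Weighted average of the visual node vectors for each text node."""
    dim = len(visual_nodes[0])
    return [
        [sum(w * v[d] for w, v in zip(row, visual_nodes)) for d in range(dim)]
        for row in weights
    ]


# Hypothetical, hand-made embeddings; real features would come from
# a text encoder and an object detector or scene graph generator.
text_nodes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
visual_nodes = [[0.9, 0.1, 0.0], [0.0, 0.8, 0.2], [0.0, 0.0, 1.0]]

weights = align(text_nodes, visual_nodes)
context = visual_context(weights, visual_nodes)

# Index of the most similar visual node for each text node.
best_match = [row.index(max(row)) for row in weights]
```

In this sketch each text node receives a visual context vector that could be concatenated with its textual representation before relation classification. It aligns nodes by feature similarity only; a structural alignment that also compares graph edges would need additional logic.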

Supplementary Material

ZIP File (mfp2639aux.zip)

Source Code of MEGA Model for Multimodal Relation Extraction


Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

October 2021

5796 pages

ISBN: 9781450386517

DOI: 10.1145/3474085

General Chairs:
Heng Tao Shen, University of Electronic Science and Technology of China, China
Yueting Zhuang, Zhejiang University, China
John R. Smith, IBM, USA

Program Chairs:
Yang Yang, University of Electronic Science and Technology of China, China
Pablo Cesar, CWI & TU Delft, The Netherlands
Florian Metze, Facebook, Inc., USA
Balakrishnan Prabhakaran, University of Texas at Dallas, USA

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

Hong Kong Research Grants Council through a General Research Fund
National Natural Science Foundation of China
Fundamental Research Funds for the Central Universities, SCUT
Science and Technology Planning Project of Guangdong Province

Conference

MM '21

Sponsor:

SIGMM

MM '21: ACM Multimedia Conference

October 20 - 24, 2021

Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Article Metrics

Total Citations: 57
Total Downloads: 1,896
Downloads (Last 12 months): 441
Downloads (Last 6 weeks): 61

Reflects downloads up to 23 Dec 2024
