DOI: 10.1145/3581783.3612181

Few-shot Multimodal Sentiment Analysis Based on Multimodal Probabilistic Fusion Prompts

Published: 27 October 2023

Abstract

Multimodal sentiment analysis has gained significant attention due to the proliferation of multimodal content on social media. However, existing studies in this area rely heavily on large-scale supervised data, which is time-consuming and labor-intensive to collect. Thus, there is a need to address the challenge of few-shot multimodal sentiment analysis. To tackle this problem, we propose a novel method called Multimodal Probabilistic Fusion Prompts (MultiPoint) that leverages diverse cues from different modalities for multimodal sentiment detection in the few-shot scenario. Specifically, we start by introducing a Consistently Distributed Sampling approach called CDS, which ensures that the few-shot dataset has the same category distribution as the full dataset. Unlike previous approaches, which primarily use prompts based on the text modality, we design unified multimodal prompts to reduce discrepancies between different modalities and dynamically incorporate multimodal demonstrations into the context of each multimodal instance. To enhance the model's robustness, we introduce a probabilistic fusion method that fuses the output predictions from multiple diverse prompts for each input. Our extensive experiments on six datasets demonstrate the effectiveness of our approach. First, our method outperforms strong baselines in the multimodal few-shot setting. Furthermore, with the same amount of data (1% of the full dataset), results based on CDS significantly outperform those based on datasets sampled, as in prior work, with an equal number of instances per class.
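To make the sampling and fusion ideas in the abstract concrete, the following minimal Python sketch is offered purely for illustration; it is not the authors' released code, and the function names, the normalized-product fusion rule, and the rounding details are assumptions. It shows (1) a few-shot sampler in the spirit of CDS that preserves the full dataset's label distribution, and (2) a simple probabilistic fusion of class distributions predicted under several different prompts.

```python
# Illustrative sketch only (NOT the paper's implementation).
import random
from collections import defaultdict

def cds_sample(examples, labels, budget, seed=0):
    """Draw ~`budget` examples whose per-class proportions mirror the full set,
    instead of taking an equal number of instances per class."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex, y in zip(examples, labels):
        by_label[y].append(ex)
    total = len(examples)
    subset = []
    for y, items in by_label.items():
        # Keep the class ratio; totals may deviate slightly from `budget`
        # because of rounding and the minimum of one instance per class.
        k = max(1, round(budget * len(items) / total))
        subset.extend(rng.sample(items, min(k, len(items))))
    return subset

def fuse_predictions(prob_dists):
    """Fuse per-prompt class distributions with a normalized product rule
    (one of several reasonable probabilistic fusion choices)."""
    fused = [1.0] * len(prob_dists[0])
    for dist in prob_dists:
        fused = [f * p for f, p in zip(fused, dist)]
    z = sum(fused)
    return [f / z for f in fused]

# Example: three prompts voting over {negative, neutral, positive}.
prompts_out = [[0.2, 0.3, 0.5], [0.1, 0.2, 0.7], [0.3, 0.3, 0.4]]
print(fuse_predictions(prompts_out))  # distribution peaked on "positive"
```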




Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia
October 2023
9913 pages
ISBN: 9798400701085
DOI: 10.1145/3581783

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 October 2023


Author Tags

  1. consistently distributed sampling
  2. multimodal demonstrations
  3. multimodal few-shot
  4. multimodal probabilistic fusion
  5. multimodal sentiment analysis
  6. unified multimodal prompt

Qualifiers

  • Research-article

Conference

MM '23: The 31st ACM International Conference on Multimedia
October 29 - November 3, 2023
Ottawa, ON, Canada

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
