DOI: 10.1145/3343031.3350987

Effective Sentiment-relevant Word Selection for Multi-modal Sentiment Analysis in Spoken Language

Published: 15 October 2019

Abstract

Computational modeling of human spoken language is an emerging research area in multimedia analysis that spans the textual and acoustic modalities. Multi-modal sentiment analysis is one of the most fundamental tasks in human spoken language understanding. In this paper, we propose a novel approach to selecting effective sentiment-relevant words for multi-modal sentiment analysis, focusing on both the textual and acoustic modalities. Unlike the conventional soft attention mechanism, we employ a deep reinforcement learning mechanism that performs sentiment-relevant word selection and entirely removes irrelevant words from each modality. Specifically, we first align the raw text and audio at the word level and extract independent handcrafted features for each modality to yield the textual and acoustic word sequences. Second, we establish two collaborative agents that handle the textual and acoustic modalities of spoken language, respectively. On this basis, we formulate sentiment-relevant word selection in a multi-modal setting as a multi-agent sequential decision problem and solve it with a multi-agent reinforcement learning approach. Detailed evaluations of multi-modal sentiment classification and emotion recognition on three benchmark datasets demonstrate the effectiveness of our approach over several competitive baselines.
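
A minimal sketch may help make the formulation concrete. The PyTorch snippet below (not the authors' released code) gives each modality its own agent that makes a per-word keep/drop decision over the word-aligned features, pools whatever both agents keep into a sentiment classifier, and feeds the classifier's log-likelihood back to both agents as a shared reward via a REINFORCE-style policy gradient. All dimensions, network sizes, and names (SelectionAgent, train_step, and so on) are illustrative assumptions rather than the paper's actual architecture.

```python
# Sketch, assuming: GloVe-like 300-d text features and COVAREP-like 74-d
# acoustic features per aligned word, and a binary sentiment label.
import torch
import torch.nn as nn
import torch.nn.functional as F

TEXT_DIM, AUDIO_DIM, HID, N_CLASSES = 300, 74, 128, 2  # assumed sizes

class SelectionAgent(nn.Module):
    """Per-word binary keep/drop policy for one modality."""
    def __init__(self, in_dim):
        super().__init__()
        self.policy = nn.Sequential(nn.Linear(in_dim, HID), nn.Tanh(),
                                    nn.Linear(HID, 2))   # logits: [drop, keep]

    def forward(self, feats):                  # feats: (seq_len, in_dim)
        dist = torch.distributions.Categorical(logits=self.policy(feats))
        actions = dist.sample()                # 1 = keep this word
        return actions, dist.log_prob(actions)

class SentimentClassifier(nn.Module):
    """Mean-pools the kept features of both modalities, predicts sentiment."""
    def __init__(self):
        super().__init__()
        self.out = nn.Linear(TEXT_DIM + AUDIO_DIM, N_CLASSES)

    def forward(self, text_kept, audio_kept):
        return self.out(torch.cat([text_kept.mean(0), audio_kept.mean(0)]))

def train_step(text_feats, audio_feats, label, agents, clf, opt):
    t_act, t_logp = agents["text"](text_feats)
    a_act, a_logp = agents["audio"](audio_feats)
    # Zero out dropped words rather than deleting rows, so shapes stay fixed.
    logits = clf(text_feats * t_act.unsqueeze(1).float(),
                 audio_feats * a_act.unsqueeze(1).float())
    nll = F.cross_entropy(logits.unsqueeze(0), label)
    reward = -nll.detach()  # one shared reward makes the two agents collaborate
    # REINFORCE term for both policies plus the supervised classifier loss.
    loss = -reward * (t_logp.sum() + a_logp.sum()) + nll
    opt.zero_grad(); loss.backward(); opt.step()
    return nll.item()

agents = {"text": SelectionAgent(TEXT_DIM), "audio": SelectionAgent(AUDIO_DIM)}
clf = SentimentClassifier()
params = [p for m in (clf, *agents.values()) for p in m.parameters()]
opt = torch.optim.Adam(params, lr=1e-3)        # assumed hyper-parameters

# Toy example: one utterance of 12 word-aligned frames with random features.
loss = train_step(torch.randn(12, TEXT_DIM), torch.randn(12, AUDIO_DIM),
                  torch.tensor([1]), agents, clf, opt)
```

A practical implementation would also subtract a learned baseline (or use an actor-critic update) to reduce the variance of the policy gradient; that refinement is omitted here for brevity.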

Published In

MM '19: Proceedings of the 27th ACM International Conference on Multimedia
October 2019
2794 pages
ISBN:9781450368896
DOI:10.1145/3343031

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. multi-modal sentiment analysis
  2. sentiment-relevant word selection
  3. spoken language

Qualifiers

  • Research-article

Funding Sources

Conference

MM '19

Acceptance Rates

MM '19 Paper Acceptance Rate: 252 of 936 submissions, 27%
Overall Acceptance Rate: 2,145 of 8,556 submissions, 25%
