Research article
DOI: 10.1145/3423327.3423669

AAEC: An Adversarial Autoencoder-based Classifier for Audio Emotion Recognition

Published: 15 October 2020

Abstract

In recent years, automatic emotion recognition has attracted researchers' attention because of its broad applications in supporting human activities. Because emotion data are difficult to collect and organize into large databases comparable to text or image datasets, the training set is unlikely to cover the true distribution completely, which limits a model's robustness and generalization in downstream applications. In this paper, we propose the Adversarial Autoencoder-based Classifier (AAEC), a model that not only augments data within the real data distribution but also reasonably extends the boundary of that distribution into a plausible space. Such an extended space better fits the distributions of both the training and testing sets. In addition to comparing against baseline models, we modified the proposed model into different configurations and conducted a comprehensive self-comparison on the audio modality. Experimental results show that the proposed model outperforms the baselines.
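To make the component layout of an adversarial autoencoder-based classifier concrete, the toy sketch below wires up the four roles such a model combines: an encoder, a decoder for reconstruction, a discriminator that judges latent codes against a prior, and a classifier on the latent code. All dimensions, layer shapes, and the use of untrained random linear maps are illustrative assumptions; the paper's actual architecture, losses, and training procedure are not reproduced here.

```python
import random

# Hypothetical dimensions, not taken from the paper.
FEAT, LATENT, CLASSES = 40, 8, 4   # e.g. 40-dim audio features, 4 emotions


def linear(in_dim, out_dim):
    """A random linear map standing in for a trained layer (sketch only)."""
    w = [[random.gauss(0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return lambda x: [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


random.seed(0)
encoder = linear(FEAT, LATENT)        # audio features -> latent code
decoder = linear(LATENT, FEAT)        # latent code -> reconstructed features
discriminator = linear(LATENT, 1)     # latent code vs. sample from the prior
classifier = linear(LATENT, CLASSES)  # latent code -> emotion logits

x = [random.random() for _ in range(FEAT)]  # one synthetic feature vector
z = encoder(x)
x_rec = decoder(z)                    # reconstruction branch
d_score = discriminator(z)[0]         # adversarial branch
logits = classifier(z)                # classification branch

assert len(z) == LATENT and len(x_rec) == FEAT and len(logits) == CLASSES
```

In training, the reconstruction branch would minimize a reconstruction loss, the adversarial branch would push encoded codes toward the prior (which is what allows sampling new latent codes for augmentation), and the classifier would supply the supervised emotion signal.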


    Published In

    MuSe'20: Proceedings of the 1st International on Multimodal Sentiment Analysis in Real-life Media Challenge and Workshop
    October 2020
    59 pages
    ISBN:9781450381574
    DOI:10.1145/3423327
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. audio modality
    2. emotion recognition
    3. neural networks


    Funding Sources

    • JST ERATO
    • Grant-in-Aid for Scientific Research on Innovative Areas

    Conference

    MM '20

    Acceptance Rates

    Overall Acceptance Rate 14 of 17 submissions, 82%



    Cited By

    • Artificial internet of medical things for wearable body sensor data analysis using radial basis autoencoder-based adversarial neural network. Expert Systems with Applications (2024), 125338. DOI: 10.1016/j.eswa.2024.125338
    • An Adversarial Training Based Speech Emotion Classifier With Isolated Gaussian Regularization. IEEE Transactions on Affective Computing 14, 3 (2023), 2361--2374. DOI: 10.1109/TAFFC.2022.3169091
    • Learning Enhanced Acoustic Latent Representation for Small Scale Affective Corpus with Adversarial Cross Corpora Integration. IEEE Transactions on Affective Computing 14, 2 (2023), 1308--1321. DOI: 10.1109/TAFFC.2021.3126145
    • C-CycleTransGAN: A Non-parallel Controllable Cross-gender Voice Conversion Model with CycleGAN and Transformer. In 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). 553--559. DOI: 10.23919/APSIPAASC55919.2022.9979821
    • A Review on Speech Emotion Recognition Using Deep Learning and Attention Mechanism. Electronics 10, 10 (2021), 1163. DOI: 10.3390/electronics10101163
    • MuSe-Toolbox: The Multimodal Sentiment Analysis Continuous Annotation Fusion and Discrete Class Transformation Toolbox. In Proceedings of the 2nd on Multimodal Sentiment Analysis Challenge (2021). 75--82. DOI: 10.1145/3475957.3484451
