
Efficient and effective strategies for cross-corpus acoustic emotion recognition

Published: 31 January 2018

Abstract

Highlights

• We follow a cascaded normalization strategy for acoustic feature adaptation.
• We employ kernel extreme learning machines for efficient and robust learning.
• The proposed framework is validated via the challenging cross-corpus setting.
• Results indicate the efficiency and effectiveness of the proposed framework.

An important research direction in speech technology is robust cross-corpus and cross-language emotion recognition. In this paper, we propose computationally efficient and performance-effective feature normalization strategies for the challenging task of cross-corpus acoustic emotion recognition. We particularly deploy a cascaded normalization approach, combining linear speaker-level, nonlinear value-level and feature-vector-level normalization to minimize speaker- and corpus-related effects as well as to maximize class separability with linear kernel classifiers. We use extreme learning machine classifiers on five corpora representing five languages from different families, namely Danish, English, German, Russian and Turkish. Using a standard set of suprasegmental features, the proposed normalization strategies show superior performance compared to benchmark normalization approaches commonly used in the literature.
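The three-stage cascade described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact pipeline: the choice of `tanh` as the value-level nonlinearity and the small epsilon constants are assumptions made for the sketch, and the function name `cascaded_normalize` is hypothetical.

```python
import numpy as np

def cascaded_normalize(features, speaker_ids):
    """Sketch of cascaded normalization: (1) linear speaker-level
    z-normalization, (2) nonlinear value-level squashing, and
    (3) feature-vector-level L2 normalization."""
    X = np.asarray(features, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    out = np.empty_like(X)

    # 1. Linear speaker-level normalization: z-score each feature
    #    within a speaker to reduce speaker- and corpus-related shifts.
    for spk in np.unique(speaker_ids):
        mask = speaker_ids == spk
        mu = X[mask].mean(axis=0)
        sd = X[mask].std(axis=0) + 1e-8  # avoid division by zero
        out[mask] = (X[mask] - mu) / sd

    # 2. Nonlinear value-level normalization: squash each value to
    #    (-1, 1), limiting the influence of outlier feature values
    #    (tanh used here purely as an illustrative choice).
    out = np.tanh(out)

    # 3. Feature-vector-level normalization: scale each instance to
    #    unit L2 norm, which suits linear-kernel classifiers.
    norms = np.linalg.norm(out, axis=1, keepdims=True) + 1e-8
    return out / norms
```

After the cascade, every instance lies on the unit sphere, so dot products between vectors behave like cosine similarities, which is why this kind of pipeline pairs naturally with linear kernels.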



    Published In

    Neurocomputing  Volume 275, Issue C
    January 2018
    2070 pages

    Publisher

    Elsevier Science Publishers B. V.

    Netherlands


    Author Tags

    1. Acoustic emotion recognition
    2. Cross-corpus adaptation
    3. Extreme learning machines

    Qualifiers

    • Research-article
