
Efficient and effective strategies for cross-corpus acoustic emotion recognition

Published: 31 January 2018

Abstract

Highlights

• We follow a cascaded normalization strategy for acoustic feature adaptation.
• We employ kernel extreme learning machines for efficient and robust learning.
• The proposed framework is validated via the challenging cross-corpus setting.
• Results indicate the efficiency and effectiveness of the proposed framework.

An important research direction in speech technology is robust cross-corpus and cross-language emotion recognition. In this paper, we propose computationally efficient and performance-effective feature normalization strategies for the challenging task of cross-corpus acoustic emotion recognition. We particularly deploy a cascaded normalization approach, combining linear speaker-level, nonlinear value-level and feature-vector-level normalization to minimize speaker- and corpus-related effects as well as to maximize class separability with linear kernel classifiers. We use extreme learning machine classifiers on five corpora representing five languages from different families, namely Danish, English, German, Russian and Turkish. Using a standard set of suprasegmental features, the proposed normalization strategies show superior performance compared to benchmark normalization approaches commonly used in the literature.
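The three-stage cascade described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact pipeline: the choice of `tanh` as the value-level nonlinearity and the small epsilon constants are assumptions made for the sketch, and the function name `cascaded_normalize` is hypothetical.

```python
import numpy as np

def cascaded_normalize(features, speaker_ids):
    """Sketch of cascaded normalization: (1) linear speaker-level
    z-normalization, (2) nonlinear value-level squashing, and
    (3) feature-vector-level L2 normalization."""
    X = np.asarray(features, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    out = np.empty_like(X)

    # 1. Linear speaker-level normalization: z-score each feature
    #    within a speaker to reduce speaker- and corpus-related shifts.
    for spk in np.unique(speaker_ids):
        mask = speaker_ids == spk
        mu = X[mask].mean(axis=0)
        sd = X[mask].std(axis=0) + 1e-8  # avoid division by zero
        out[mask] = (X[mask] - mu) / sd

    # 2. Nonlinear value-level normalization: squash each value to
    #    (-1, 1), limiting the influence of outlier feature values
    #    (tanh used here purely as an illustrative choice).
    out = np.tanh(out)

    # 3. Feature-vector-level normalization: scale each instance to
    #    unit L2 norm, which suits linear-kernel classifiers.
    norms = np.linalg.norm(out, axis=1, keepdims=True) + 1e-8
    return out / norms
```

After the cascade, every instance lies on the unit sphere, so dot products between vectors behave like cosine similarities, which is why this kind of pipeline pairs naturally with linear kernels.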



    Published In

    Neurocomputing  Volume 275, Issue C
    January 2018
    2070 pages

    Publisher

    Elsevier Science Publishers B. V.

    Netherlands


    Author Tags

    1. Acoustic emotion recognition
    2. Cross-corpus adaptation
    3. Extreme learning machines

    Qualifiers

    • Research-article
