
Assessing the effectiveness of ensembles in Speech Emotion Recognition: Performance analysis under challenging scenarios

Published: 01 June 2024

Abstract

Speech Emotion Recognition (SER) is an important application in areas such as online gaming, e-learning, and medical care. However, recognizing emotion in speech is computationally demanding, since it requires an extensive search over feature subsets, algorithm hyperparameters, and algorithm combinations, which makes ensembles an attractive option. Although ensembles are frequently employed in SER, their application has not been explored in depth, and their potential benefits for recognition accuracy and robustness to variability in speech signals have not been fully realized. The purpose of this article is to assess the effectiveness of ensembles in SER by analyzing their performance under challenging scenarios. The experiments in this study evaluated speech samples from several languages using an out-of-date feature set and simple algorithms with default hyperparameters. A basic ensemble technique with decision-level voting was applied, with classifier set selection performed by a rudimentary heuristic.
The results indicated that ensembles of basic classifiers significantly improved the SER rate, with absolute improvements ranging from 0.57% to 9.89%. The proposed ensemble approach outperformed state-of-the-art SER methods, including deep learning-based ones, in recognition rate. These findings justify the use of ensembles in SER applications, particularly when data are scarce or only out-of-date features and algorithms are available, and they motivate further investigation of ensembles to enhance recognition accuracy and robustness to speech signal variability.
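To make the approach concrete, the sketch below shows decision-level (hard) voting over simple classifiers with default hyperparameters, combined with a rudimentary greedy heuristic for classifier set selection, in the spirit of the technique described above. This is a minimal illustrative sketch, not the authors' implementation: the scikit-learn stack, the synthetic feature matrix, and the greedy acceptance rule are all assumptions introduced here for illustration.

```python
# Sketch (assumed, not the paper's code): decision-level "hard" voting
# over simple classifiers with default hyperparameters, plus a
# rudimentary greedy heuristic for classifier set selection.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Stand-in for acoustic feature vectors and emotion labels; a real SER
# pipeline would extract these features from speech recordings.
X, y = make_classification(n_samples=400, n_features=30, n_classes=4,
                           n_informative=12, random_state=0)

candidates = [
    ("tree", DecisionTreeClassifier(random_state=0)),
    ("nb", GaussianNB()),
    ("knn", KNeighborsClassifier()),
]

# Rudimentary heuristic: greedily keep a candidate only if adding it
# improves the mean cross-validated accuracy of the majority vote.
selected, best_score = [], 0.0
for name, clf in candidates:
    trial = selected + [(name, clf)]
    vote = VotingClassifier(estimators=trial, voting="hard")
    score = cross_val_score(vote, X, y, cv=5).mean()
    if score > best_score:
        selected, best_score = trial, score

print([n for n, _ in selected], round(best_score, 3))
```

In a full SER setup, the candidate pool would contain the complete set of base learners under evaluation, and the selection heuristic would be scored on held-out speech data rather than synthetic features.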

Highlights

Assessed the effectiveness of ensembles in SER under challenging scenarios.
Evaluated speech samples from various languages.
The suggested ensemble approach outperformed state-of-the-art SER methods.
The findings justify the use of ensembles in SER applications.
The work recommends further investigation of ensembles.



Published In

Expert Systems with Applications: An International Journal, Volume 243, Issue C, June 2024, 1588 pages

Publisher

Pergamon Press, Inc., United States


Author Tags

  1. Speech
  2. Emotion recognition
  3. Classifier set selection

Qualifiers

  • Research-article
