DOI: 10.1145/2647868.2654904
research-article

Music Emotion Recognition by Multi-label Multi-layer Multi-instance Multi-view Learning

Published: 03 November 2014

Abstract

Music emotion recognition, which aims to automatically recognize the affective content of a piece of music, has become one of the key components of music search, exploration, and social networking applications. Although researchers have paid increasing attention to music emotion recognition, recognition performance has hit a bottleneck in recent years. One major reason is that experts' emotion labels are mostly song-level, whereas music emotion usually varies within a song. Traditional methods treat each song as a single instance and build models on song-level features; as a result, they ignore the dynamics of music emotion and fail to capture accurate emotion-feature correlations. In this paper, we model music emotion recognition as a novel multi-label multi-layer multi-instance multi-view learning problem: music is formulated as a hierarchical multi-instance structure (e.g., song-segment-sentence) in which each of multiple emotion labels corresponds to at least one instance, and each layer has multiple views. We propose the Hierarchical Music Emotion Recognition model (HMER), a novel hierarchical Bayesian model that uses sentence-level music and lyrics features. It captures music emotion dynamics with a song-segment-sentence hierarchical structure. HMER also considers emotion correlations between both music segments and sentences. Experimental results show that HMER outperforms several state-of-the-art methods in terms of $F_1$ score and mean average precision.
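The song-segment-sentence structure described in the abstract can be illustrated with a small data-layout sketch. This is not HMER itself, which is a hierarchical Bayesian model whose details are not given on this page. It only shows one plausible way to represent the hierarchy with two views per sentence (audio and lyrics) and to apply the standard multi-instance assumption that a bag carries a label if at least one of its instances does. All class names, field names, feature values, and emotion labels below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Sentence:
    # Two views of the same sentence (hypothetical feature vectors).
    audio: list[float]
    lyrics: list[float]
    # Sentence-level emotions are normally latent in the multi-instance
    # setting; they are filled in here only to illustrate the assumption.
    emotions: set[str] = field(default_factory=set)


@dataclass
class Segment:
    sentences: list[Sentence]

    def emotions(self) -> set[str]:
        # A segment carries a label if at least one of its sentences does.
        return set().union(*(s.emotions for s in self.sentences))


@dataclass
class Song:
    segments: list[Segment]

    def emotions(self) -> set[str]:
        # A song carries a label if at least one of its segments does.
        return set().union(*(seg.emotions() for seg in self.segments))


# Hypothetical song: two segments with differing emotions.
example_song = Song(
    segments=[
        Segment([
            Sentence(audio=[0.1], lyrics=[0.3], emotions={"sad"}),
            Sentence(audio=[0.2], lyrics=[0.1]),
        ]),
        Segment([
            Sentence(audio=[0.9], lyrics=[0.7], emotions={"angry"}),
        ]),
    ]
)

song_labels = example_song.emotions()
```

In this sketch the song-level label set is the union of the labels found at the lower layers, which is why a single song-level annotation can hide emotion changes between segments.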



Published In

MM '14: Proceedings of the 22nd ACM international conference on Multimedia
November 2014
1310 pages
ISBN:9781450330633
DOI:10.1145/2647868

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. multi-label multi-layer multi-instance multi-view learning
  2. music emotion recognition

Qualifiers

  • Research-article

Conference

MM '14
MM '14: 2014 ACM Multimedia Conference
November 3 - 7, 2014
Orlando, Florida, USA

Acceptance Rates

MM '14 Paper Acceptance Rate 55 of 286 submissions, 19%;
Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
