Short paper
DOI: 10.1145/3660570.3660577

Direct Labelling of Form of Classical-Period Piano Sonata Movements From Audio Recordings

Published: 27 June 2024

Abstract

Musical form is the overall structure of a piece of music. Labelling musical form types from raw audio alone (for example, to support queries against online music databases) is a relatively unexplored area of music information retrieval research. This study investigates self-similarity matrices, computed from features derived from the raw audio, as input to a convolutional neural network that labels eight form types found in movements of Classical-period piano sonatas by Mozart, Beethoven, Haydn, Clementi and Czerny. The focus on pieces for solo piano allows the use of piano roll features, which are generated from the raw audio by state-of-the-art piano transcription software. This work is the first to propose and explore passing the entire self-similarity matrix to a convolutional neural network for overall musical form recognition. By generating form labels directly from the audio, the method circumvents the potential difficulties of inferring form labels in a bottom-up manner from audio segment boundary detection and segment matching. Self-similarity matrices based on velocity piano rolls (whose values relate to the velocity of the notes being played) outperformed the other self-similarity matrix types, achieving a macro-average ROC-AUC score of 0.823 and a coverage score of 2.045 on a custom data set compiled from verified musicological sources. The task is posed as a multi-label rather than a multi-class classification problem because different form labels were found for several of the piano sonata movements.
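The two quantities at the centre of the abstract, the self-similarity matrix and the coverage score, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the similarity measure, frame rate, or evaluation library used, so cosine similarity between frames and the common coverage-error definition for multi-label ranking are assumptions here, and the function names and toy data are hypothetical.

```python
import numpy as np


def self_similarity(features, eps=1e-9):
    """Cosine self-similarity matrix of a (n_features, n_frames) array.

    Assumption: cosine similarity between frames; the abstract does not
    say which similarity measure the study used.
    """
    features = np.asarray(features, dtype=float)
    # L2-normalise each frame (column). Silent all-zero frames stay zero,
    # so they get similarity 0 with every frame, including themselves.
    norms = np.linalg.norm(features, axis=0, keepdims=True)
    unit = features / np.maximum(norms, eps)
    # Entry (i, j) compares frame i with frame j.
    return unit.T @ unit


def coverage_error(y_true, y_score):
    """Average number of top-ranked labels needed to cover all true labels.

    Assumption: this follows the usual multi-label coverage-error
    definition; ties in the scores count against the ranking.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    # Lowest score given to any true label of each sample.
    min_true = np.where(y_true, y_score, np.inf).min(axis=1)
    # Number of labels scored at least that high.
    depth = (y_score >= min_true[:, None]).sum(axis=1)
    return depth.mean()


# Hypothetical toy velocity piano roll: 2 pitches x 3 frames, where the
# first and last frames repeat the same note (an A-B-A pattern).
toy_roll = np.array([[80, 0, 80],
                     [0, 64, 0]])
ssm = self_similarity(toy_roll)

# Hypothetical predictions for two movements and three form labels.
toy_truth = [[1, 0, 0], [0, 0, 1]]
toy_scores = [[0.75, 0.5, 1.0], [1.0, 0.2, 0.1]]
toy_coverage = coverage_error(toy_truth, toy_scores)
```

In the toy matrix, the repeated frames produce high off-diagonal values, which is the kind of repetition pattern a convolutional network can read from the whole matrix. Under the coverage definition assumed here, lower is better, and a score near 2 with eight candidate labels would mean that, on average, about the two highest-ranked labels must be inspected to include every true form label of a movement.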


Published In

DLfM '24: Proceedings of the 11th International Conference on Digital Libraries for Musicology
June 2024
83 pages
ISBN:9798400717208
DOI:10.1145/3660570
Editor: David M. Weigl

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Classical Music
  2. Deep Learning
  3. Form Recognition
  4. Music Information Retrieval
  5. Signal Processing

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

DLfM 2024

Acceptance Rates

Overall Acceptance Rate 27 of 48 submissions, 56%
