Article

LyricAlly: automatic synchronization of acoustic musical signals and textual lyrics

Authors:

Jun Yin

MULTIMEDIA '04: Proceedings of the 12th annual ACM international conference on Multimedia

Pages 212 - 219

https://doi.org/10.1145/1027527.1027576

Published: 10 October 2004

Abstract

We present a prototype that automatically aligns acoustic musical signals with their corresponding textual lyrics, in a manner similar to manually aligned karaoke. We tackle this problem using a multimodal approach, where the appropriate pairing of audio and text processing helps create a more accurate system. Our audio processing technique uses a combination of top-down and bottom-up approaches, combining the strengths of low-level audio features and high-level musical knowledge to determine the hierarchical rhythm structure, singing voice, and chorus sections in the musical audio. Text processing is also employed to approximate the length of the sung passages using the textual lyrics. Results show an average error of less than one bar for per-line alignment of the lyrics on a test bed of 20 songs (sampled from CD audio and carefully selected for variety). We perform holistic and per-component testing and analysis and outline steps for further development.
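Two quantitative ideas in the abstract, estimating how long each lyric line is sung from the text alone and expressing alignment error in bars, can be illustrated with a minimal Python sketch. This is not the LyricAlly implementation. It assumes: a crude vowel-group syllable counter as a stand-in for a pronouncing-dictionary lookup, a constant singing rate within a vocal section (so time is split in proportion to syllable counts), and a fixed tempo in 4/4 meter for the bar conversion. The function names `count_syllables`, `line_weight`, `estimate_line_starts`, and `error_in_bars` are hypothetical.

```python
import re
from typing import List


def count_syllables(word: str) -> int:
    """Crude syllable estimate: number of vowel groups, at least 1.

    Assumption: a stand-in for a pronouncing-dictionary lookup,
    not the method used in the paper.
    """
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def line_weight(line: str) -> int:
    """Total estimated syllables in one lyric line."""
    words = re.findall(r"[A-Za-z']+", line)
    return sum(count_syllables(w) for w in words)


def estimate_line_starts(lines: List[str], section_start: float,
                         section_end: float) -> List[float]:
    """Split a vocal section's time span (seconds) across its lines.

    Assumption: constant singing rate, so each line's share of the
    span is proportional to its estimated syllable count.
    """
    weights = [line_weight(l) for l in lines]
    total = sum(weights)
    if total == 0:
        raise ValueError("lines contain no words")
    span = section_end - section_start
    starts: List[float] = []
    t = section_start
    for w in weights:
        starts.append(t)
        t += span * w / total
    return starts


def error_in_bars(estimated: float, reference: float, bpm: float,
                  beats_per_bar: int = 4) -> float:
    """Absolute timing error (seconds) converted to bars at a fixed tempo."""
    bar_seconds = beats_per_bar * 60.0 / bpm
    return abs(estimated - reference) / bar_seconds


if __name__ == "__main__":
    # Public-domain lyrics; the section boundaries are illustrative only.
    lines = ["Twinkle twinkle little star", "How I wonder what you are"]
    starts = estimate_line_starts(lines, 10.0, 25.0)
    print(starts)  # [10.0, 17.0]
    # At 120 BPM in 4/4 one bar lasts 2 s, so a 1.5 s miss is 0.75 bar.
    print(error_in_bars(starts[1], 18.5, bpm=120))  # 0.75
```

With the crude counter the two lines get 7 and 8 syllables, so the second line is placed 7/15 of the way into the 15-second section. A real system would replace the vowel heuristic with dictionary pronunciations and would take the section boundaries and bar length from the audio analysis rather than fixing them by hand.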

Index Terms

1. Applied computing
  1. Arts and humanities
    1. Sound and music computing
2. Information systems
  1. Information retrieval
    1. Document representation

Recommendations

A Query-by-Singing System for Retrieving Karaoke Music

This paper investigates the problem of retrieving karaoke music using query-by-singing techniques. Unlike regular CD music, where the stereo sound involves two audio channels that usually sound the same, karaoke music encompasses two distinct channels ...
Aligning Incomplete Lyrics of Korean Folk Song Dataset using Whisper
DLfM '23: Proceedings of the 10th International Conference on Digital Libraries for Musicology

In this study, we introduce a method for time-alignment of lyrics in Korean folk song audio using a transformer encoder-decoder model specifically designed to utilize incomplete lyric data. We analyzed the characteristics of Korean folk song lyrics and ...
Exploring Vibrato-Motivated Acoustic Features for Singer Identification

Vibrato is a slightly tremulous effect imparted to vocal or instrumental tone for added warmth and expressiveness through slight variation in pitch. It corresponds to a periodic fluctuation of the fundamental frequency. It is common for a singer to ...

Information

Published In

MULTIMEDIA '04: Proceedings of the 12th annual ACM international conference on Multimedia

October 2004

1028 pages

ISBN:1581138938

DOI:10.1145/1027527

General Chairs: Henning Schulzrinne (Columbia University), Nevenka Dimitrova (Philips Research)

Program Chairs: Angela Sasse (UCL), Sue Moon (KAIST), Rainer Lienhart (U Augsburg)

Copyright © 2004 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

MM04: 2004 12th Annual ACM International Conference on Multimedia

October 10-16, 2004

New York, NY, USA

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics & Citations

Article Metrics

Total Citations: 43
Total Downloads: 495
Downloads (last 12 months): 9
Downloads (last 6 weeks): 0

Reflects downloads up to 27 Jan 2025

Citations

Cited By

Jin Z, Huang S, Nie X, Zhou X, Yi Y, Zhou G (2023). Contrastive Learning-Based Generic Audio-to-Lyrics Alignment. 2023 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), pp. 908-914. DOI: 10.1109/ISPA-BDCloud-SocialCom-SustainCom59178.2023.00152. Online publication date: 21-Dec-2023.
https://doi.org/10.1109/ISPA-BDCloud-SocialCom-SustainCom59178.2023.00152
Durand S, Stoller D, Ewert S (2023). Contrastive Learning-Based Audio to Lyrics Alignment for Multiple Languages. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. DOI: 10.1109/ICASSP49357.2023.10096725. Online publication date: 4-Jun-2023.
https://doi.org/10.1109/ICASSP49357.2023.10096725
Krause M, Weiß C, Müller M (2023). Soft Dynamic Time Warping for Multi-Pitch Estimation and Beyond. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. DOI: 10.1109/ICASSP49357.2023.10095907. Online publication date: 4-Jun-2023.
https://doi.org/10.1109/ICASSP49357.2023.10095907
Gao X, Gupta C, Li H (2022). Automatic Lyrics Transcription of Polyphonic Music With Lyrics-Chord Multi-Task Learning. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 30, pp. 2280-2294. DOI: 10.1109/TASLP.2022.3190742. Online publication date: 2022.
https://doi.org/10.1109/TASLP.2022.3190742
Gui W, Li Y, Zang X, Zhang J (2021). Exploring Channel Properties to Improve Singing Voice Detection with Convolutional Neural Networks. Applied Sciences, 11(24), 11838. DOI: 10.3390/app112411838. Online publication date: 13-Dec-2021.
https://doi.org/10.3390/app112411838
Geng H, Hu Y, Huang H (2020). Monaural Singing Voice and Accompaniment Separation Based on Gated Nested U-Net Architecture. Symmetry, 12(6), 1051. DOI: 10.3390/sym12061051. Online publication date: 24-Jun-2020.
https://doi.org/10.3390/sym12061051
Stoller D, Durand S, Ewert S (2019). End-to-end Lyrics Alignment for Polyphonic Music Using an Audio-to-character Recognition Model. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 181-185. DOI: 10.1109/ICASSP.2019.8683470. Online publication date: May-2019.
https://doi.org/10.1109/ICASSP.2019.8683470
Sharma B, Gupta C, Li H, Wang Y (2019). Automatic Lyrics-to-audio Alignment on Polyphonic Music Using Singing-adapted Acoustic Models. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 396-400. DOI: 10.1109/ICASSP.2019.8682582. Online publication date: May-2019.
https://doi.org/10.1109/ICASSP.2019.8682582
Sutcliffe R, Hovy E, Collins T, Wan S, Crawford T, Root D (2019). Searching for musical features using natural language queries. Language Resources and Evaluation, 53(1), pp. 87-140. DOI: 10.1007/s10579-018-9422-2. Online publication date: 1-Mar-2019.
https://dl.acm.org/doi/10.1007/s10579-018-9422-2
Lin K, Balamurali B, Koh E, Lui S, Herremans D (2018). Singing voice separation using a deep convolutional neural network trained by ideal binary mask and cross entropy. Neural Computing and Applications. DOI: 10.1007/s00521-018-3933-z. Online publication date: 13-Dec-2018.
https://doi.org/10.1007/s00521-018-3933-z