DOI: 10.1145/1027527.1027576
Article

LyricAlly: automatic synchronization of acoustic musical signals and textual lyrics

Published: 10 October 2004

Abstract

We present a prototype that automatically aligns acoustic musical signals with their corresponding textual lyrics, in a manner similar to manually-aligned karaoke. We tackle this problem using a multimodal approach, where the appropriate pairing of audio and text processing helps create a more accurate system. Our audio processing technique uses a combination of top-down and bottom-up approaches, combining the strength of low-level audio features and high-level musical knowledge to determine the hierarchical rhythm structure, singing voice and chorus sections in the musical audio. Text processing is also employed to approximate the length of the sung passages using the textual lyrics. Results show an average error of less than one bar for per-line alignment of the lyrics on a test bed of 20 songs (sampled from CD audio and carefully selected for variety). We perform holistic and per-component testing and analysis and outline steps for further development.
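
The abstract reports accuracy as an average per-line alignment error measured in bars. As a rough illustration of that arithmetic only, the sketch below converts timing differences in seconds into bars. It assumes a constant tempo and a fixed number of beats per bar, and the function names and sample timings are hypothetical; the abstract does not specify how the authors computed their metric.

```python
from statistics import mean


def bar_duration_seconds(tempo_bpm: float, beats_per_bar: int = 4) -> float:
    """Length of one bar in seconds, assuming a constant tempo (an assumption)."""
    if tempo_bpm <= 0 or beats_per_bar <= 0:
        raise ValueError("tempo_bpm and beats_per_bar must be positive")
    return beats_per_bar * 60.0 / tempo_bpm


def mean_alignment_error_in_bars(
    predicted_starts: list[float],
    reference_starts: list[float],
    tempo_bpm: float,
    beats_per_bar: int = 4,
) -> float:
    """Mean absolute difference between predicted and reference line start
    times, expressed in bars rather than seconds."""
    if not predicted_starts or len(predicted_starts) != len(reference_starts):
        raise ValueError("need two non-empty lists of equal length")
    bar = bar_duration_seconds(tempo_bpm, beats_per_bar)
    return mean(
        abs(p - r) / bar for p, r in zip(predicted_starts, reference_starts)
    )


if __name__ == "__main__":
    # Hypothetical start times (seconds) for three lyric lines at 120 BPM in 4/4.
    predicted = [10.0, 21.0, 33.0]
    reference = [10.5, 20.0, 32.5]
    print(mean_alignment_error_in_bars(predicted, reference, tempo_bpm=120))
```

Expressing the error in bars rather than seconds makes results comparable across songs with different tempos, which is presumably why the abstract states its result that way.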

      Published In

      MULTIMEDIA '04: Proceedings of the 12th annual ACM international conference on Multimedia
      October 2004
      1028 pages
      ISBN: 1581138938
      DOI: 10.1145/1027527

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. audio/text synergy
      2. karaoke
      3. lyric alignment
      4. music knowledge
      5. vocal detection

      Qualifiers

      • Article

      Conference

      MM04

      Acceptance Rates

      Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
