Automatic Lyrics Alignment for Cantonese Popular Music

Chi Hang WONG, Wai Man SZETO, Kin Hong WONG
Department of Computer Science and Engineering
The Chinese University of Hong Kong
Shatin, N.T., Hong Kong
{chwong1, wmszeto, khwong}@cse.cuhk.edu.hk

Abstract

From lyrics display on electronic music players and karaoke videos to surtitles for live Chinese opera performances, all of these everyday functions share one feature: temporal synchronization of the written text with its corresponding musical phrase. Our goal is to automate lyrics alignment, a procedure that is still handled manually in the Cantonese popular song (Cantopop) industry.

In our system, a vocal signal enhancement algorithm extracts the vocal signal from a CD recording so that the onsets of the sung syllables can be detected and their pitches determined. The proposed system is designed specifically for Cantonese, in which the contour of the musical melody and the tonal contour of the lyrics must match perfectly. Given this prerequisite, we use a dynamic time warping algorithm to align the lyrics. The robustness of this approach is supported by experimental results: the system was evaluated on 70 twenty-second music segments, and most samples had their lyrics aligned correctly.
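To make the alignment step concrete, the following is a minimal sketch of classic dynamic time warping between two contours. It is an illustration only, not the formulation used in the paper: the function name `dtw_align`, the coding of the lyric tones and detected pitches as plain numbers, and the absolute-difference local cost are all assumptions made for this example.

```python
def dtw_align(lyric_contour, melody_contour):
    """Align two numeric sequences with classic DTW.

    Returns (total_cost, path), where path is a list of
    (lyric_index, melody_index) pairs. The numeric coding of the
    contours and the absolute-difference cost are assumptions
    for illustration, not the paper's definitions.
    """
    n, m = len(lyric_contour), len(melody_contour)
    inf = float("inf")

    # cost[i][j]: minimal cost of aligning the first i lyric items
    # with the first j melody items.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(lyric_contour[i - 1] - melody_contour[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # advance both sequences
                cost[i - 1][j],      # advance lyrics only
                cost[i][j - 1],      # advance melody only
            )

    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


# Hypothetical example: three lyric tone levels against four detected pitches.
total, path = dtw_align([1, 2, 3], [1, 1, 2, 3])
print(total, path)
```

In this toy input the first lyric item is matched to two melody items, which is how DTW absorbs differences in length between the two sequences. A real system would need a cost function suited to relative rather than absolute pitch.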

Appendix

Experimental Results of Onset Detection (pdf)

Demonstrations of the overall system

Explanation of the screen

In each demo video, the upper panel displays the timing of each lyrics sentence as estimated by the system. The lower panel displays the actual timing of each lyrics sentence, determined manually.

Demo videos

Video 1: In-range accuracy = 91.06% (3.9 MB)

Video 2: In-range accuracy = 81.42% (3.9 MB)

Video 3: In-range accuracy = 79.51% (3.9 MB)

Video 4: In-range accuracy = 61.69% (3.9 MB)




With the help of TEX by TTH, version 3.67.
On 1 May 2006.