Automatic Lyrics Alignment for Cantonese Popular Music
Chi Hang WONG, Wai Man SZETO, Kin Hong WONG
Department of Computer Science and Engineering
The Chinese University of Hong Kong
Shatin, N.T., Hong Kong
{chwong1, wmszeto, khwong}@cse.cuhk.edu.hk
Abstract
From lyrics display on electronic music players and Karaoke videos
to surtitles for live Chinese opera performances, one feature is
common to all these everyday functionalities: temporal
synchronization of the written text and its corresponding musical
phrase. Our goal is to automate the process of lyrics alignment, a
procedure which, to date, is still handled manually in the Cantonese
popular song (Cantopop) industry.
In our system, a vocal signal enhancement algorithm is developed to
extract vocal signals from a CD recording in order to detect the
onsets of the syllables sung and to determine the corresponding
pitches. The proposed system is specifically designed for Cantonese,
in which the contour of the musical melody and the tonal contour of
the lyrics must match perfectly. With this prerequisite, we use a
dynamic time warping algorithm to align the lyrics. The robustness
of this approach is supported by experimental results. The system was
evaluated with 70 twenty-second music segments, and most samples had
their lyrics aligned correctly.
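
The alignment step described in the abstract can be illustrated with a minimal dynamic time warping sketch. This is not the authors' implementation: it assumes the lyric tones and the detected sung pitches have already been reduced to comparable numeric levels, and the function name dtw_align and the absolute-difference cost are hypothetical choices made for illustration only.

```python
def dtw_align(lyric_tones, note_pitches):
    """Align two numeric contours with classic dynamic time warping.

    lyric_tones  -- assumed relative tone level per lyric syllable
    note_pitches -- assumed relative pitch level per detected onset
    Returns (total_cost, path), where path is a list of
    (lyric_index, note_index) pairs from start to end.
    """
    n, m = len(lyric_tones), len(note_pitches)
    inf = float("inf")

    # cost[i][j] = best cumulative cost aligning the first i tones
    # with the first j pitches; row/column 0 are boundary cells.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local cost: absolute difference (an illustrative assumption).
            d = abs(lyric_tones[i - 1] - note_pitches[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # advance both sequences
                cost[i - 1][j],      # advance the lyrics only
                cost[i][j - 1],      # advance the melody only
            )

    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


# Toy example: three syllables against four detected onsets.
total, path = dtw_align([1, 3, 2], [1, 1, 3, 2])
print(total, path)
```

In this toy input the first syllable is matched to the first two onsets, which is how the warping path absorbs a spurious extra onset. A real system would need a cost that compares contours rather than absolute levels.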
Appendix
Experimental Results of Onset Detection (pdf)
Demonstrations of the overall system

Explanation of the demo videos.
The upper panel displays the timing of a lyrics sentence as estimated by the system.
The lower panel displays the actual timing of a lyrics sentence, determined manually.
Demo videos
Video 1: In-range accuracy = 91.06% (3.9 MB)
Video 2: In-range accuracy = 81.42% (3.9 MB)
Video 3: In-range accuracy = 79.51% (3.9 MB)
Video 4: In-range accuracy = 61.69% (3.9 MB)
With the help of TeX by TTH, version 3.67, on 1 May 2006.