Prof. K.H. Wong's MPhil/PhD research projects

K.H.
Wong,
Publications
2011
- New projects for research students. Write to me (Prof. K.H. Wong)
if you are interested.
- Title: 3-D display machine for viewing without spectacles.
Summary: We propose a new 3-D display interface using multiple
projectors projecting on the surface of a white sphere, which can be
held and
played with by a user. If the images projected onto the sphere are
changing
according to its position, we can perceive a 3-D object inside the
sphere. It
is a natural 3-D graphics interface for viewing without spectacles. The
product
can be used in education, games and medical applications for showing
3-D models
easily at low cost. The work involves the tracking of the sphere and
the head
of the user, and generating suitable images for the projectors. The
proposal is based on two previous projects: (i) "A
Projector-based Hand-held Display System" by us (YouTube demo: http://www.youtube.com/watch?v=oyxR_RT4NNc),
and (ii) "Head Tracking for Desktop VR Displays using the WiiRemote" by
Johnny Lee (YouTube demo: http://www.youtube.com/watch?v=Jd3-eiid-Uw).
- Title: Face Expression tracking by computer
vision. Summary: Real time tracking of facial expressions
has many
applications in game and movie making. Accurate real time face
expression
tracking is currently achievable but some systems require a tedious
task of wearing
dots on the user’s face. We propose to mount multiple cameras on the
head, so that each camera observes a particular area of the face at
close range (about 10 cm) for accurate tracking. The cameras have these
features: (i) The cameras face the chin and eyes and are fixed on a head mount.
(ii) The relative positions of the cameras and the face are not changed by head
motion, since they are all fixed on the head mount. (iii) The head (with the head
mount) is tracked by an external camera about 1 meter away from the head.
By combining all the tracked results, the pose of the head as well as
the detailed expression of the face can be found, and it can be used to
drive the animation of a virtual human figure. The
tracking tasks are distributed, so accuracy and efficiency can be
enhanced. For
more information about face tracking, see http://www.captivemotion.com/
and http://www.image-metrics.com/projects
- Research Grants:
- Title: A Visual Surveillance System for Sea Search and Rescue Missions 電腦視覺應用於飛行拯救預警系統
(from 30 Dec 2009, for two years)
- Abstract:
The aim of this research is to develop a vision based system
to assist the aircrew in search and rescue operations at sea. During
the stormy season, the Government Flying Service of Hong Kong
sends out fixed-wing planes to search for survivors after a maritime
accident is reported. The search can last up to 5 hours and is currently
conducted with the naked eye, looking for life rafts from a height of 500 ft.
It is a fatiguing job and targets may be missed because of human error.
Therefore the Government Flying Service has initiated a project on using computer
vision techniques to assist the aircrew in detecting life rafts at sea. We have
completed some preliminary tests and have the following conclusions and
suggestions. (1)
One camera is not enough because the field of view is too narrow for
such an
application. We suggest a divide-and-conquer method of using 3 cameras
(each
attached to a computer) aligned vertically to increase the field of
view and
enhance data communication/processing speed. (2) Using one frame for
the detection has a major problem of misinterpreting waves as targets.
Therefore we suggest using tracking techniques such as the Kalman or particle
filter to solve the problem, so that only objects that last long enough in an image
sequence are accepted as potential targets. (3) We can further enhance the accuracy
by using projective geometry and the calibrated parameters of the on-board cameras
to form a 3-D Kalman object tracker. This ensures that only objects that
move plausibly on the surface of the sea are considered as possible targets.
- This work was supported by a Direct Grant (Project Code 2050455) from the Faculty of Engineering,
The Chinese University of Hong Kong.
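Suggestion (2) of the abstract above can be illustrated with a small sketch. A constant-velocity Kalman filter follows one candidate detection from frame to frame, and the candidate is accepted only if it is re-detected close to the predicted position in enough consecutive frames. The state model, the noise values, the gate radius and the frame threshold are all illustrative assumptions, not values from the project, and the sketch works in 2-D image coordinates rather than with the 3-D tracker of suggestion (3).

```python
import numpy as np


class ConstantVelocityKalman:
    """Constant-velocity Kalman filter over image coordinates (pixels)."""

    def __init__(self, xy, dt=1.0, process_var=1.0, meas_var=4.0):
        # State is [x, y, vx, vy]; all noise levels are illustrative guesses.
        self.x = np.array([xy[0], xy[1], 0.0, 0.0], dtype=float)
        self.P = np.eye(4) * 100.0
        self.F = np.eye(4)
        self.F[0, 2] = dt
        self.F[1, 3] = dt
        self.H = np.array([[1.0, 0.0, 0.0, 0.0],
                           [0.0, 1.0, 0.0, 0.0]])
        self.Q = np.eye(4) * process_var
        self.R = np.eye(2) * meas_var

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2].copy()

    def update(self, z):
        innovation = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P


def is_persistent(detections, gate=15.0, min_hits=5):
    """Return True if a candidate is re-detected near the Kalman prediction
    in at least `min_hits` consecutive frames.

    `detections` holds one (x, y) per frame, or None when nothing was found.
    """
    if not detections or detections[0] is None:
        return False
    kf = ConstantVelocityKalman(detections[0])
    hits = 1
    for z in detections[1:]:
        predicted = kf.predict()
        if z is not None and np.linalg.norm(np.asarray(z) - predicted) <= gate:
            kf.update(z)
            hits += 1
        else:
            hits = 0  # the run of consecutive confirmations is broken
        if hits >= min_hits:
            return True
    return False


raft = [(100 + 3 * k, 50 + k) for k in range(10)]      # steady drift
wave = [(200, 80), (203, 82), None, None, None, None]  # short-lived glint
print(is_persistent(raft), is_persistent(wave))
```

The steadily drifting track is accepted, while the detection that disappears after two frames is rejected.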
- Title: Music nuance extraction from audio signals 從音頻信號中提取音樂色調的細微差別 (from 1 Feb 2011, for two years)
- Abstract: What makes a good
musical performance? An expressive music performance owes its emotive
power to
the performer’s skills in shaping the music with nuances. For the
purpose of
performance analysis, nuance can be defined as any subtle manipulation
of sound
parameters including attack, timing, pitch, loudness and timbre. Our
research
goal is to understand the origin of expressiveness in a performance by
uncovering nuances from the recorded signal of the performance. One
difficulty of quantifying nuances is that extracting the relevant sound parameters
from audio signals is often nontrivial. For instance, in a piano performance,
multiple keys are often struck together, and the partials of their notes
overlap in the combined signal, which makes finding the exact onset time and
strength of each individual key very difficult. Previous works on
sound parameter extraction from audio signals using general source
separation
techniques have not yielded robust and accurate separations, largely
because of
the overlapping partials described above. Here, we propose a general
analytic
paradigm for extracting sound parameters using a source separation
method that
incorporates the following in its formulation: (1) Prior knowledge of
the
instrument so that the signal generation model can be properly
constrained, and
(2) The sequence of music notes played in the recorded piece. Our
proposed technique will allow us to find the exact relative strength,
onset time and other parameters of individual piano keys in a performance, thereby
enabling us to further understand the musical basis of performance
expressiveness.
- This work was supported by a Direct Grant (Project Code 2050486) from the
Faculty of Engineering, The Chinese University of Hong Kong.
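As a toy illustration of why the two kinds of prior knowledge named in the abstract help, the sketch below fits note strengths by least squares. It assumes that the score tells us which notes sound, and that a crude "instrument model" gives every note the same fixed set of partial amplitudes. Both assumptions, and all names and numbers, are hypothetical simplifications; onset times, phases and decay are ignored, and this is not the formulation proposed in the project.

```python
import numpy as np


def note_template(f0, t, partial_weights):
    """Unit-strength note: harmonics of f0 with an assumed fixed envelope."""
    return sum(w * np.sin(2 * np.pi * f0 * (k + 1) * t)
               for k, w in enumerate(partial_weights))


def estimate_strengths(mixture, f0s, t, partial_weights):
    """Least-squares strength of each score-given note in the mixture."""
    A = np.column_stack([note_template(f0, t, partial_weights) for f0 in f0s])
    strengths, *_ = np.linalg.lstsq(A, mixture, rcond=None)
    return strengths


sample_rate = 8000
t = np.arange(2048) / sample_rate
weights = [1.0, 0.5, 0.25]   # assumed relative amplitudes of partials 1 to 3
score = [220.0, 440.0]       # an octave apart, so partials overlap at 440 Hz
true_strengths = np.array([0.8, 0.3])

mixture = sum(s * note_template(f0, t, weights)
              for s, f0 in zip(true_strengths, score))
print(estimate_strengths(mixture, score, t, weights))
```

Although the two notes share a partial at 440 Hz, the remaining partials are distinct, so the constrained fit can still separate the two strengths in this noise-free case.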
Vision Research Demos
3D DEMOS
Mphil projects
- Title: Virtual reality 3D display on a flat screen
The aim of this project is to display 3D objects on a flat display
screen. The method is based on the fact that if the display changes
according to the position of the head relative to the screen, we can create a 3D
perception on screen. There are many methods to track the head motion,
such as using a stationary camera to view and track the head, or mounting a
camera on the head. The first step of the project is to investigate how
to use a webcam to track the head position. We recommend using the open-source
OpenCV computer vision library (http://opencvlibrary.sourceforge.net/)
for the processing. The next step is to create 3D objects to be
displayed on screen according to the head motion. There are many
interesting applications of such systems, such as interactive games and
virtual reality display systems.
Reference: Johnny Chung Lee - Projects - Wii http://www.youtube.com/watch?v=Jd3-eiid-Uw
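The second step of the project above, drawing according to the head position, is commonly done with an off-axis (asymmetric) viewing frustum. The sketch below is one minimal way to compute it, assuming the screen lies in the plane z = 0 centred at the origin and the tracked eye position is given in the same units. The function name and the coordinate convention are assumptions for illustration. The four returned values are the left, right, bottom and top bounds at the near plane, in the form an OpenGL-style frustum call expects.

```python
def off_axis_frustum(eye, screen_width, screen_height, near):
    """Near-plane frustum bounds for an eye at `eye` = (ex, ey, ez).

    The screen is assumed to be centred at the origin in the plane z = 0,
    and the eye must be in front of it (ez > 0).
    """
    ex, ey, ez = eye
    if ez <= 0:
        raise ValueError("the eye must be in front of the screen (ez > 0)")
    scale = near / ez  # similar triangles: screen edge projected to near plane
    left = (-screen_width / 2 - ex) * scale
    right = (screen_width / 2 - ex) * scale
    bottom = (-screen_height / 2 - ey) * scale
    top = (screen_height / 2 - ey) * scale
    return left, right, bottom, top


# A 0.40 m x 0.30 m screen viewed from 0.60 m, first centred, then with the
# head moved 0.10 m to the right.
print(off_axis_frustum((0.0, 0.0, 0.6), 0.4, 0.3, near=0.1))
print(off_axis_frustum((0.1, 0.0, 0.6), 0.4, 0.3, near=0.1))
```

When the head is centred the frustum is symmetric; when the head moves to the right, the left bound grows and the right bound shrinks, which is what produces the motion parallax.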
Funded Projects
RGC funded project (from Aug 2004,
project
no. CUHK4204/04E, HK$339,414) :
Title: Model reconstruction for computer game
development
The work described in this page was supported by
a grant from the Research Grants Council of the Hong Kong Special Administrative
Region. (Project Number CUHK4202/04E)
- Abstract:
The aim of this research
is to help game developers capture real scenes to be used in their virtual
site construction. Many game players prefer interactive games situated
in real-world scenes because they give a realistic feeling. However, capturing
real scenes and modeling them is time consuming and expensive. Making
some procedures automatic will save a lot of time and cost. Apart from game
development there are other applications; for example, users can develop
virtual walkthrough web pages for Internet users. Also, home game players
can capture their favorite environments and objects to be displayed
interactively in three dimensions. Since a full-scale environment
requires multiple image sequences to cover the whole area,
our research will concentrate on combining multiple sequences for
model reconstruction. The research issues are as follows. (1) We
propose
a tracking technique called Interacting Multiple Model Methods (IMM)
for
tracking the camera motion for a long image sequence to reduce error.
(2)
If the distance between the camera and the object in one sequence is
very
different from the others there is a multi-scale problem in combining
them.
We will try to solve this by a coarse-to-fine global optimization
technique.
(3) Additional user inputs may be used to assist the software in
finding the correct model. After solving the above problems, we hope to
develop an efficient system that can capture the 3D model of a large
environment with localized details to be used in game development.
Direct Grant from
The Engineering Faculty, CUHK (from 1 Nov 2005,
project
ID. 2050350 HK$65,360) :
Title: Camera array for model reconstruction and pose estimation
- Abstract:
The aim of this research is to investigate how to utilize an array of low
cost cameras for model reconstruction and pose estimation. The input of
the system is a set of cameras and the output is the 3D model of the
objects being viewed and the pose of the cameras. The results are
useful for virtual walkthrough systems, robot navigation and 3D computer
game development. The problem has been investigated by many researchers
before; however, applying an array of cameras (8 x 8 or more, arranged
on a 1 meter square plane) is relatively new. Fundamentally, we believe
that a camera array can provide a solution to the occlusion
problem in computer vision. For example, if some of the cameras are
blocked, the others can fill in the information. Mathematical methods
for a single camera or camera pairs can be extended to a camera array,
and we propose the following formulations. (1) Extend the traditional
formulations, such as the fundamental matrix or trifocal tensor, to an
array of cameras. (2) Use a tracking method (Kalman filtering) for
the multiple image sequences obtained by the camera array. Since China
is a major manufacturing base of low-cost digital cameras, our work in
using camera arrays can become a lucrative new product line for our
computer manufacturing and trading business.
Current projects of myself and graduate students
- 3D Model reconstruction and pose acquisition
from
images
Finding the pose and structure of an unknown object
from
an image sequence has many applications in graphics, virtual reality
and
multimedia processing. In this paper we address this problem by using a
two-stage iterative method. Starting from an initial guess of the
structure,
the first stage estimates the pose of the object. The second stage uses
the estimated pose information to refine the structure. This process is
repeated until the difference between the observed data and data
re-projected
from the estimated model is minimized. This method is a variation of
the
classical bundle adjustment method, but is faster in execution and is
simpler
to implement. We used the KLT (Kanade-Lucas-Tomasi) feature tracker for
obtaining the image features. Synthetic and real data have been tested
with good results. On line demonstrations can be found at
http://www.cse.cuhk.edu.hk/~khwong/demo
- Recursive 3D Model Reconstruction Based on
Kalman
Filtering
In this project we investigate how to use a recursive
two-step method to recover structure and motion from image sequences
based
on Kalman filtering. The algorithm consists of two major steps. The
first
step is an extended Kalman filter for the estimation of the object’s
pose.
The second step is a set of extended Kalman filters, one for each model
point, for the refinement of the positions of the model features in
3D space. These two steps alternate from frame to frame. The initial
model converges to the final structure as the image sequence is scanned
sequentially. The performance of the algorithm is demonstrated with
both
synthetic data and real world objects. Analytical and empirical
comparisons
are made among our approach, the interleaved bundle adjustment method
and
the Kalman filtering based recursive algorithm by Azarbayejani and
Pentland.
Our approach outperformed the other two algorithms in terms of
computation
speed without loss in the quality of model reconstruction. On line
demonstrations
can be found at http://www.cse.cuhk.edu.hk/~khwong/demo
- Merging Artificial Objects with Marker-less
Video
Sequences
Based on the Interacting Multiple Model Method
Inserting synthetically generated objects into real-world
environments has gained much interest in recent years. Fast and robust
vision-based algorithms are necessary to make such an application
possible.
Traditional pose tracking schemes using recursive structure from motion
techniques adopt one Kalman filter and thus favour only a certain type of
camera motion. We propose a robust simultaneous pose tracking and structure
recovery algorithm using the Interacting Multiple Model (IMM) to tackle the
problem.
A set of three extended Kalman filters (EKFs), each describing a
frequently occurring camera motion in real situations (general, pure translation,
pure rotation), is applied within the IMM framework to track the pose
of a scene. Another set of EKFs, one filter for each model point, is used
to refine the positions of the model features in the 3D space. The
filters
for pose tracking and structure refinement are executed in an
interleaved
manner. The results are used for inserting virtual objects into the
original
video footage. The performance of the algorithm is demonstrated with
both
synthetic and real data. Comparisons with different approaches have
been
performed and show that our method is more efficient and accurate. On
line
demonstrations can be found at http://www.cse.cuhk.edu.hk/~khwong/demo
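To make the role of the IMM framework in the last project more concrete, the sketch below shows only the standard IMM model-probability update for the three motion models named above. Each model's probability is first propagated through a Markov switching matrix and then re-weighted by how well that model's filter explained the latest measurement. The switching matrix and the likelihood values are made-up numbers for illustration, and the EKFs themselves are not implemented here.

```python
import numpy as np

MODELS = ("general", "pure translation", "pure rotation")


def update_model_probabilities(mu, transition, likelihoods):
    """One IMM model-probability update.

    mu          -- current model probabilities, shape (n,)
    transition  -- transition[i, j] = P(model j now | model i before)
    likelihoods -- measurement likelihood reported by each model's filter
    """
    mu = np.asarray(mu, dtype=float)
    transition = np.asarray(transition, dtype=float)
    likelihoods = np.asarray(likelihoods, dtype=float)
    predicted = transition.T @ mu           # c_j = sum_i p_ij * mu_i
    unnormalised = likelihoods * predicted  # weight by measurement fit
    return unnormalised / unnormalised.sum()


mu = np.array([1 / 3, 1 / 3, 1 / 3])
transition = np.array([[0.90, 0.05, 0.05],
                       [0.05, 0.90, 0.05],
                       [0.05, 0.05, 0.90]])
likelihoods = np.array([0.1, 2.0, 0.1])  # the translation filter fits best

new_mu = update_model_probabilities(mu, transition, likelihoods)
print(dict(zip(MODELS, new_mu.round(3))))
```

In a full IMM tracker these probabilities would also be used to mix the filters' states before each prediction and to combine their outputs into a single pose estimate.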
Others
I am also interested in a number of projects related to the
Internet, computer vision and system programming. Here are some
examples:
- Internet information tracking: A technique called "condensation"
has been found to be successful in tracking objects for computer vision
systems. This project is to apply this technique to Internet information
tracking. The result can be used for making smarter Internet image search
engines (for example http://images.google.com/).
- Internet image database and search engines
- Face retrieval on the web -- I have developed a color and
statistics based face retrieval system; it can be used for building up
the database of a face search engine.
- Image database using color search keys.
- Biometric research, such as face recognition using color
analysis, fingerprint recognition etc.
- Networking, visual programming for PDAs (Personal Digital
Assistants) -- Modern microprocessors used in making PDAs
and home appliances will be equipped with networking
modules; how to use them is big business. There are plenty of
software and system research projects around this issue.
- Mobile phone positioning -- try to locate the
positions of
cellular phones. There are unlimited applications of such a system in
E-commerce.
For example if the telephone company knows where the callers are,
callers
can receive more relevant information related to their physical
locations.
- Education and robotics -- The toy company LEGO has produced
a series of educational robots that have proved to be very successful
financially.
It also proves that education and play can be closely linked. Some
interesting projects are going on along this line of thinking.
These titles are just some suggestions. I urge you to contact me
(Rm907, X8397, khwong) in person to
discuss possible projects.
Computer vision and virtual reality research
Title (MPhil or PhD)
- Automatic building of virtual reality walk through
environments
The aim is to develop a system to create a 3D virtual reality
walk-through
environment without explicitly constructing the 3D map. The input of
the
system is a set of video pictures of the environment; the system will
calculate and construct the geometric structures of the scene. This is a very
interesting and useful project for World Wide Web applications and games.
However, it is difficult: the input pictures are all in 2D but
the system requires 3D information to function, so we have to
develop and use complex mathematical models to map 2D pictures onto 3D
structures.
- Virtual Reality input devices
In many virtual reality systems, users are required to wear special
interfaces, such as data gloves, magnetic sensors etc. The objective of
this project is to develop a computer vision based human machine
interface,
so that users are free from those intrusive devices which may
hinder
their movements. Computer vision based hand gesture recognition and
head
movement detection are examples of such approaches. They can be used in
games and other virtual reality applications. We have already developed
some mathematical techniques for object tracking; students are required
to implement these approaches on PCs.
- Low bit rate motion
picture compression
Researchers designing future MPEG standards are looking for novel,
high compression ratio techniques for image transmission through low bit
rate connections (e.g. 64 kbps). The model-based scheme for human
head image compression is one of the approaches. This system tracks the
pose of a human head and produces a few parameters describing the
essential
features, for example rotation and translation of the head. These
parameters
will be sent to the receiver for the reproduction of the original
image.
This project integrates many different techniques, such as pose
estimation,
Internet programming etc., to make low bit rate video transmission
possible.
It is useful in virtual reality, game and Internet based systems.
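A rough calculation shows why sending pose parameters instead of pixels suits a 64 kbps link. The figures below (six pose parameters for rotation and translation, 16 bits each, 25 frames per second) are assumptions chosen for illustration, not values from the project.

```python
def pose_stream_bitrate(num_params=6, bits_per_param=16, frames_per_second=25):
    """Bits per second needed to send the pose parameters of every frame."""
    return num_params * bits_per_param * frames_per_second


CHANNEL_BPS = 64_000  # the low bit rate connection mentioned above

rate = pose_stream_bitrate()
print(rate, "bit/s, i.e.", rate / CHANNEL_BPS, "of the channel")
```

Under these assumptions the parameter stream needs 2400 bit/s, a small fraction of the channel, which leaves room for the additional data a receiver would need to reproduce the head image.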
Hardware and robotics
Title (MPhil or PhD)
- Evolutionary robot farm: I am now developing a
number
of
mobile robots and will use evolutionary techniques to see how the
behavior
of these robots will evolve over time under different environmental
conditions.
- Ultrasonic imaging and robot navigation: In the
animal kingdom, ultrasonic sound is used extensively, for example by bats, for
navigation and for sensing and locating prey in 3D. In the past, ultrasonic
radar was developed for robots just for avoiding obstacles ahead, but we
believe ultrasound methods can be more useful than a point range
detector.
Thus, we have devised a mathematical technique for reconstructing the
surface of an object by ultrasonic scanning, and hope it can be used
in robots to improve their sensory capability. Moreover, this technique
can be extended to other scanning applications where 3D imaging is needed.
For example, it can be used to reconstruct the surface of objects or a
human face to be used for automatic 3D modeling. This project can be
merged
with the "Automatic building of virtual reality walk through
environments"
mentioned above to form an integrated system for automatic 3D walk
through
model building.
- Legged robots: We are building a series of six-, four- and
two-legged robots. They involve interesting control techniques similar
to those used by animals and insects, and the aim is to have walking
robots that can climb stairs, which cannot be achieved by wheeled robots.
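The ultrasonic imaging item above contrasts a point range detector with a scanner that recovers a surface. The sketch below shows the simplest possible step between the two: each echo's round-trip time is converted to a range, and a sweep of such ranges at known angles becomes a set of 2-D points on the object's outline. This naive scan conversion is only an illustration and is not the mathematical reconstruction technique mentioned in the item. The function names are hypothetical, and the speed of sound is taken as 343 m/s, its approximate value in air at 20 degrees C.

```python
import math

SPEED_OF_SOUND_M_PER_S = 343.0  # dry air at about 20 degrees C


def echo_to_range(round_trip_s, speed=SPEED_OF_SOUND_M_PER_S):
    """Distance to the reflector: the pulse travels there and back."""
    return speed * round_trip_s / 2.0


def scan_to_points(angles_rad, round_trips_s):
    """Convert a sweep of (angle, echo time) pairs to (x, y) points."""
    points = []
    for angle, t in zip(angles_rad, round_trips_s):
        r = echo_to_range(t)
        points.append((r * math.cos(angle), r * math.sin(angle)))
    return points


# Two echoes from reflectors 1 m away, straight ahead and 90 degrees to the side.
one_metre_echo = 2.0 / SPEED_OF_SOUND_M_PER_S
print(scan_to_points([0.0, math.pi / 2], [one_metre_echo, one_metre_echo]))
```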
Music and audio signal processing
Since I am a music lover and play a few musical instruments,
it is natural that I try to involve my research with music-related
projects so as to play and work at the same time, and why not? I hope other
music lovers will join me to work in, as well as enjoy, the wonderful
world of music.
Titles: (Mphil or PhD)
- Internet Music and MP3 extension: The
MP3 standard has become the hottest music distribution medium on
the Internet, and it looks set to revolutionize the whole music
industry in the years to come. With this new medium, one can think of many
interesting applications. Here are some examples.
- 3D surround sound: we can combine a number of MP3 files to form
3D surround sound; however, signal synchronization becomes a problem
which requires extensive research.
- Virtual bands: It would be ideal if music lovers could play music
together through the Internet; however, the time lag in the network prevents us
from doing so. One simple idea is to have a standard file system so that one
can add or remove channels (tracks), and record and play at the same
time.
It is like a karaoke system in which the computer plays one part and
the user plays another; the computer records both tracks for
future replay. We can use MP3 as the backbone
and develop extensions to it to fulfill the above requirement. The
research issues are data compression, synchronization etc. With this file
standard, a recording (an extended MP3 file) of one musician may start a snowball
and invite other artists to join, and it may evolve into many
different forms of performance. The result may become another standard for
music distribution in the future.
- Music signal analysis: Music signals are commonly analyzed by the
Fourier transform, the wavelet transform and other time-frequency analyses
(Wigner transforms). A new time-domain analysis is being developed here
at our lab, and further investigation and analysis are needed. This
project will concentrate on techniques for showing the time-varying
spectral information within a signal for various western and Chinese musical
instruments.
It is suitable for those who love music and would like to
know more about the signal processing side of it. The result can be
used in developing more efficient methods for musical signal compression in
the future.
- Music synthesis by wavelet transform: There are three major
techniques in music
synthesis for electronic musical instruments: (1) frequency modulation
(FM), (2) wavetable and (3) waveguide. The FM technique is only a
simple engineering solution and cannot generate authentic sounds of
traditional instruments, e.g. the violin, flute etc. The wavetable technique is
good but tone variation is difficult to achieve. The waveguide technique can
generate the sound of plucked or woodwind instruments but is not suitable for
fiddle instruments. Recently the wavelet technique has been found to be useful in
data compression, but it has not been used for music synthesis, for which I think
it is a very suitable candidate. The methodology is simple: use the
wavelet transform to analyze the recording of a sound and use the parameters
obtained for regeneration. By adjusting the parameters, I hope we can achieve a
wide range of sound expressions and effects. See
also this web page for a tutorial on music synthesis.
- Environmental sound compression: Current sound compression
algorithms only compress sound signals, without considering the
positioning of sound sources and the environment in which the recording was made. If
this information is included, we will not only obtain good surround sound
replay but also be able to manipulate sound positions according to our taste.
It is similar to 3D graphics generation, but in the audio
signal domain. I expect one can generate exciting results since this
idea has not been explored in detail by others.
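The analyze, adjust and regenerate loop described in the wavelet synthesis item above can be sketched with the simplest wavelet, the Haar wavelet, written directly in NumPy. The choice of the Haar wavelet, the number of levels and the particular adjustment (halving the finest detail coefficients) are illustrative assumptions, not the method proposed in the project.

```python
import numpy as np


def haar_forward(signal, levels):
    """Multi-level Haar analysis: returns (approximation, [details...])."""
    approx = np.asarray(signal, dtype=float)
    if len(approx) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # finest level first
        approx = (even + odd) / np.sqrt(2.0)
    return approx, details


def haar_inverse(approx, details):
    """Rebuild the signal from the coefficients of haar_forward."""
    approx = np.asarray(approx, dtype=float)
    for detail in reversed(details):  # coarsest level first
        out = np.empty(2 * len(approx))
        out[0::2] = (approx + detail) / np.sqrt(2.0)
        out[1::2] = (approx - detail) / np.sqrt(2.0)
        approx = out
    return approx


sample_rate = 8000
t = np.arange(1024) / sample_rate
tone = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 1760 * t)

approx, details = haar_forward(tone, levels=3)
rebuilt = haar_inverse(approx, details)            # unchanged parameters
details[0] = 0.5 * details[0]                      # soften the finest detail
softened = haar_inverse(approx, details)           # adjusted parameters

print(np.allclose(rebuilt, tone), np.allclose(softened, tone))
```

With unchanged coefficients the tone is reproduced exactly; scaling one band of coefficients gives a different sound from the same analysis, which is the kind of parameter adjustment the item has in mind.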