Prof. K.H. Wong's MPhil/PhD research projects
K.H. Wong, Publications
July 2013
- Projects for research students (PhD, MPhil, MSc). Write to me (Prof. K.H. Wong) if you are interested.
- Recent demos: (summer 2012)
- Computer vision for Google Glasses: this new product will become popular and inspire a new generation of computer vision applications. Here are some ideas:
- Text translation of what the user is looking at: The idea is to develop an automatic translator of what the user is seeing. If the user is wearing Google Glasses, what he is seeing is captured by a camera. Using optical character recognition (OCR) technology, the machine can translate the foreign text he/she is looking at into a language he/she understands, then display the result on the Google Glasses or speak it to the user through the earphones.
- Tourist navigation: Using computer vision and GPS technology, the system can display information about the area, advertising material and other information to assist tourists.
- Requirement: interest in computer vision and programming.
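As a toy illustration of the translate-and-display stage (the word list below is a made-up stand-in for a real OCR engine plus machine translation service, not part of the project's actual design):

```python
# Hypothetical sketch: assume the OCR stage has already turned the
# camera image into text. A tiny made-up word list stands in for a
# real machine translation back end.
LEXICON = {"sortie": "exit", "gare": "station", "billet": "ticket"}

def translate(ocr_text):
    # Translate known words; pass unknown words through unchanged.
    return " ".join(LEXICON.get(word.lower(), word)
                    for word in ocr_text.split())

print(translate("Gare sortie"))   # -> station exit
```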
- Intelligent desk: Many people still prefer to read printed papers and write on paper. If a camera hanging over a desk monitors the activities on it, the system can store and translate the text into a database for later use. This project involves recognizing books and the text printed on them, and locating where your finger is pointing for user input. We have already finished part of the project, and new students can continue what we have already built.
- Requirement: interest in computer vision and programming.
- 3D reconstruction using the KINECT sensor and computer vision techniques. Kinect is a new range sensing device for finding the 3D range image of an environment in real time at very low cost. At CUHK, it is now being used in a number of interesting applications, such as human counting and human tracking for security and surveillance. It is also used in medical applications. For example, it is being used to monitor the development of new muscles after a patient has undergone surgery in which damaged muscles were removed. The objective measurement can quantify the development and help doctors decide on further treatment. This is a collaborative project with the Prince of Wales Hospital, Hong Kong.
- Requirement: interest in computer vision and programming.
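To give a feel for the geometry involved, here is a minimal sketch of how a Kinect-style depth pixel maps to a 3D point under the pinhole camera model (the intrinsic parameters below are illustrative defaults, not calibrated Kinect values):

```python
def depth_to_point(u, v, z, fx=525.0, fy=525.0, cx=319.5, cy=239.5):
    """Map a depth-image pixel (u, v) with depth z (metres) to a 3D
    point via the pinhole model. The intrinsics are illustrative
    defaults, not the sensor's calibrated parameters."""
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return (x, y, z)

# A pixel at the principal point lies on the optical axis:
print(depth_to_point(319.5, 239.5, 2.0))   # -> (0.0, 0.0, 2.0)
```

Converting every depth pixel this way yields the point cloud that the tracking and counting applications above operate on.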
- A music playing robot is being developed. It is pioneering work on making a robot that can play the Chinese flute. The system integrates various mechanical control systems to produce an air jet that blows at the embouchure hole of a flute, emulating the way a human flutist plays the instrument. A sound feedback system is implemented to make sure the sound is rich and appealing to the ears. It is an attempt to research a new paradigm of computer music production: the computer-generated sound is produced by an authentic musical instrument rather than by a simulated waveform played through a loudspeaker. (See our demo: http://www.youtube.com/watch?v=NJ7wv2z8Wgk&list=UUfy2EumiHMeoUorMFR0woZA&index=1&feature=plcp.) We are also interested in investigating techniques for building Chinese flutes, such as making them more in tune, more expressive and easier to play.
- Requirement: interest in music, programming and signal processing.
- Other projects:
- Face expression tracking by computer vision. Summary: Real-time tracking of facial expressions has many applications in games and movie making. Accurate real-time face expression tracking is currently achievable, but some systems require the tedious task of wearing dots on the user's face. We propose to mount multiple cameras on the head, so each camera observes a particular area of the face at close-up distance (about 10 cm) for accurate tracking. The cameras have these features: (i) they face the chin, eyes, etc., and are fixed on a head mount; (ii) the relative positions of the cameras and the face are not changed by head motion, since they are all fixed on the head mount; (iii) the head (with the head mount) is tracked by an external camera about 1 meter away. By combining all the tracked results, the pose of the head as well as the detailed expression of the face can be found, and this can be used to drive the animation of a virtual human figure. The tracking tasks are distributed, so accuracy and efficiency can be enhanced. For more information about face tracking see: http://www.captivemotion.com/ and http://www.image-metrics.com/projects
- Hardware accelerated computer vision systems for virtual reality applications. In many virtual reality applications, such as projector-camera systems, the demand for computation is high and usually not suitable for portable devices. Prof. K.H. Wong's team is now implementing many useful computer vision algorithms, such as the Hough transform and feature tracking algorithms, on low cost embedded systems and FPGAs. As a result, sophisticated computer vision algorithms can be implemented on consumer grade portable devices.
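As a sketch of the kind of algorithm being ported, here is a minimal Hough line transform in plain Python; the doubly nested voting loop is exactly the structure that an FPGA implementation can parallelize:

```python
import math

def strongest_line(points, n_theta=180):
    """Minimal Hough transform: every edge point votes for all
    (rho, theta) lines passing through it; the accumulator bin with
    the most votes is the dominant line in the image."""
    acc = {}
    for (x, y) in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
            acc[(rho, t)] = acc.get((rho, t), 0) + 1
    return max(acc, key=acc.get)        # (rho, theta index in degrees)

# Points on the horizontal line y = 5 -> rho = 5, theta = 90 degrees.
rho, t = strongest_line([(x, 5) for x in range(60)])
```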
- Research Grants:
- Title: A Visual Surveillance System for Sea Search and Rescue Missions 電腦視覺應用於飛行拯救預警系統 (from 30 Dec 2009, for two years)
- Abstract: The aim of this research is to develop a vision based system to assist the aircrew in search and rescue operations at sea. During the stormy season, the Government Flying Service of Hong Kong sends out fixed wing planes to search for survivors after a maritime accident is reported. The search can last up to 5 hours and is now conducted by the naked eye, looking for life rafts from a height of 500 ft. It is a fatiguing job, and targets may be missed because of human error. Therefore the Government Flying Service has initiated a project on using computer vision techniques to assist the aircrew in detecting life rafts at sea. We have completed some preliminary tests and have the following conclusions and suggestions. (1) One camera is not enough because the field of view is too narrow for such an application. We suggest a divide-and-conquer method using 3 cameras (each attached to a computer) aligned vertically to increase the field of view and enhance data communication/processing speed. (2) Using one frame for detection has a major problem of misinterpreting waves as targets. Therefore we suggest using tracking techniques such as the Kalman or particle filter, so that only objects lasting long enough in an image sequence are accepted as potential targets. (3) We can further enhance the accuracy by using projective geometry and the calibrated parameters of the cameras on board to form a 3-D Kalman object tracker. This makes sure that only objects moving plausibly on the surface of the sea are considered as possible targets.
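A minimal sketch of the idea behind suggestion (2): a scalar Kalman filter (a 1-D toy, not the proposed 3-D tracker), in which a target detected in every frame converges to a stable position estimate, while an isolated wave blip barely moves it:

```python
def kalman_1d(measurements, q=1e-3, r=0.5):
    """Scalar Kalman filter (a 1-D toy of the proposed tracker).
    q: process noise, r: measurement noise. Repeated, consistent
    detections pull the estimate steadily toward the true position."""
    x, p = measurements[0], 1.0
    for z in measurements[1:]:
        p += q                 # predict: uncertainty grows between frames
        k = p / (p + r)        # Kalman gain
        x += k * (z - x)       # correct with the new measurement
        p *= 1.0 - k
    return x

est = kalman_1d([10.0, 10.2, 9.9, 10.1, 10.0])   # hovers near 10
```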
- This work was supported by a Direct Grant (Project Code 2050455) from the Faculty of Engineering, The Chinese University of Hong Kong.
- Title: Music nuance extraction from audio signals 從音頻信號中提取音樂色調的細微差別 (from 1 Feb 2011, for two years)
- Abstract: What makes a good musical performance? An expressive music performance owes its emotive power to the performer's skills in shaping the music with nuances. For the purpose of performance analysis, nuance can be defined as any subtle manipulation of sound parameters, including attack, timing, pitch, loudness and timbre. Our research goal is to understand the origin of expressiveness in a performance by uncovering nuances from the recorded signal of the performance. One difficulty of quantifying nuances is that extracting the relevant sound parameters from audio signals is often nontrivial. For instance, in a piano performance, multiple keys are very often struck throughout; key combinations make finding out exactly the onset time and strength of each individual key from the combined signal very difficult. Previous work on sound parameter extraction from audio signals using general source separation techniques has not yielded robust and accurate separations, largely because of the overlapping partials described above. Here, we propose a general analytic paradigm for extracting sound parameters using a source separation method that incorporates the following in its formulation: (1) prior knowledge of the instrument, so that the signal generation model can be properly constrained, and (2) the sequence of music notes played in the recorded piece. Our proposed technique will allow us to find out exactly the relative strength, onset time and other parameters of individual piano keys in a performance, thereby enabling us to further understand the musical basis of performance expressiveness.
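One of the simplest sound parameters mentioned above is onset timing. Here is a toy energy-based onset detector for the single-note case (real piano onsets overlap and need the score-constrained separation described above; this only illustrates the basic idea):

```python
def onsets(signal, frame=64, threshold=2.0):
    """Flag frames whose short-time energy jumps to more than
    `threshold` times the previous frame's energy -- a crude
    note-onset detector for isolated notes."""
    energies = [sum(s * s for s in signal[i:i + frame])
                for i in range(0, len(signal) - frame + 1, frame)]
    return [i for i in range(1, len(energies))
            if energies[i] > threshold * energies[i - 1] + 1e-9]

# Near-silence followed by a tone: the onset lands in frame 2.
print(onsets([0.01] * 128 + [1.0] * 192))   # -> [2]
```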
- This work was supported by a Direct Grant (Project Code 2050486) from the Faculty of Engineering, The Chinese University of Hong Kong.
Vision Research Demos
3D DEMOS
MPhil projects
- Title: Virtual reality 3D display on a flat screen
The aim of this project is to display 3D objects on a flat display screen. The method is based on the fact that if the display changes according to the relative position of the head to the screen, we can create a 3D perception on screen. There are many methods to track head motion, such as using a stationary camera to view and track the head, or mounting a camera on the head to track its motion. The first step of the project is to investigate how to use a webcam to track the head position. It is recommended that the open-source computer vision library OpenCV (http://opencvlibrary.sourceforge.net/ ) be used for the processing. The next step is to create 3D objects to be displayed on screen according to the head motion. There are many interesting applications of such systems, such as interactive games and virtual reality display systems.
Reference: Johnny Chung Lee - Projects - Wii http://www.youtube.com/watch?v=Jd3-eiid-Uw
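The core geometry of head-coupled display can be sketched in a few lines: project each 3D point onto the screen plane along the ray from the viewer's eye (coordinates and units below are illustrative):

```python
def screen_position(point, head, screen_z=0.0):
    """Head-coupled ('fishtank VR') projection: intersect the ray from
    the viewer's eye through a 3D point with the screen plane z = 0.
    Units and coordinates are illustrative."""
    px, py, pz = point
    hx, hy, hz = head
    t = (screen_z - hz) / (pz - hz)   # where the ray crosses the screen
    return (hx + t * (px - hx), hy + t * (py - hy))

# Moving the head right makes a behind-screen point slide left on
# screen -- the motion parallax that creates the 3D illusion.
x, y = screen_position((0.0, 0.0, 1.0), (0.2, 0.0, -1.0))
```

Re-rendering with this projection every time the tracked head moves is what produces the depth effect on an ordinary flat screen.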
Funded Projects
RGC funded project (from Aug 2004, project no. CUHK4204/04E, HK$339,414):
Title: Model reconstruction for computer game development
The work described in this page was supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region. (Project Number CUHK4202/04E)
- Abstract:
The aim of this research is to help game developers capture real scenes to be used in their virtual site construction. Many game players prefer interactive games situated in real-world scenes, which give them a realistic feeling. However, capturing real scenes and modeling them is time consuming and expensive; making some procedures automatic will save a lot of time and cost. Apart from game development there are other applications: for example, users can develop virtual walkthrough web pages for Internet users, and home game players can capture their favorite environments and objects to be displayed interactively in three dimensions. Since a full scale environment requires multiple image sequences to cover the whole area, our research will concentrate on combining multiple sequences for model reconstruction. The research issues are as follows. (1) We propose a tracking technique called the Interacting Multiple Model method (IMM) for tracking the camera motion over a long image sequence to reduce error. (2) If the distance between the camera and the object in one sequence is very different from the others, there is a multi-scale problem in combining them. We will try to solve this with a coarse-to-fine global optimization technique. (3) Additional user inputs may be used to assist the software in finding the correct model. After solving the above problems, we hope to develop an efficient system that can capture the 3D model of a large environment with localized details to be used in game development.
Direct Grant from The Engineering Faculty, CUHK (from 1 Nov 2005, project ID 2050350, HK$65,360):
Title: Camera array for model reconstruction and pose estimation
- Abstract:
The aim of this research is to investigate how to utilize an array of low cost cameras for model reconstruction and pose estimation. The input of the system is a set of cameras and the output is the 3D model of the objects being viewed together with the pose of the cameras. The results are useful for virtual walkthrough systems, robot navigation and 3D computer game development. The problem has been investigated by many researchers before; however, applying an array of cameras (8 x 8 or more, arranged on a 1 meter square plane) is relatively new. Fundamentally, we believe that by utilizing the camera array we can provide a solution to the occlusion problem in computer vision: for example, if some of the cameras are blocked, the others can fill in the information. Mathematical methods for a single camera or camera pairs can be extended to a camera array, and we propose the following formulations. (1) Extend the traditional formulations, such as the fundamental matrix or trifocal tensor, to an array of cameras. (2) Use a tracking method (Kalman filtering) for the multiple image sequences obtained by the camera array. Since China is a major manufacturing base for low cost digital cameras, our work on camera arrays could become a lucrative new product line for our computer manufacturing and trading business.
Current projects of myself and my graduate students
- 3D model reconstruction and pose acquisition from images
Finding the pose and structure of an unknown object from an image sequence has many applications in graphics, virtual reality and multimedia processing. In this project we address the problem using a two-stage iterative method. Starting from an initial guess of the structure, the first stage estimates the pose of the object. The second stage uses the estimated pose information to refine the structure. This process is repeated until the difference between the observed data and the data re-projected from the estimated model is minimized. The method is a variation of the classical bundle adjustment method, but is faster in execution and simpler to implement. We used the KLT (Kanade-Lucas-Tomasi) feature tracker to obtain the image features. Synthetic and real data have been tested with good results. Online demonstrations can be found at
http://www.cse.cuhk.edu.hk/~khwong/demo
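The two-stage alternation can be illustrated with a 1-D toy (a deliberately simplified stand-in for the real projection model: each measurement is assumed to be structure plus a per-frame camera translation):

```python
def alternate(meas, iters=10):
    """Toy 1-D two-stage refinement: meas[f][i] is assumed to equal
    structure[i] + pose[f]. Stage 1 re-estimates each frame's pose
    from the current structure; stage 2 refines the structure from
    the poses. pose[0] is pinned to 0 to fix the gauge freedom."""
    F, N = len(meas), len(meas[0])
    structure, pose = [0.0] * N, [0.0] * F
    for _ in range(iters):
        pose = [sum(meas[f][i] - structure[i] for i in range(N)) / N
                for f in range(F)]
        pose = [p - pose[0] for p in pose]   # gauge fix
        structure = [sum(meas[f][i] - pose[f] for f in range(F)) / F
                     for i in range(N)]
    return pose, structure

pose, structure = alternate([[1.0, 2.0, 3.0], [1.5, 2.5, 3.5]])
# recovers structure [1, 2, 3] and poses [0, 0.5]
```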
- Recursive 3D model reconstruction based on Kalman filtering
In this project we investigate how to use a recursive two-step method to recover structure and motion from image sequences based on Kalman filtering. The algorithm consists of two major steps. The first step is an extended Kalman filter for the estimation of the object's pose. The second step is a set of extended Kalman filters, one for each model point, for the refinement of the positions of the model features in 3D space. These two steps alternate from frame to frame. The initial model converges to the final structure as the image sequence is scanned sequentially. The performance of the algorithm is demonstrated with both synthetic data and real world objects. Analytical and empirical comparisons are made among our approach, the interleaved bundle adjustment method, and the Kalman filtering based recursive algorithm of Azarbayejani and Pentland. Our approach outperformed the other two algorithms in computation speed without loss of quality in the model reconstruction. Online demonstrations can be found at
http://www.cse.cuhk.edu.hk/~khwong/demo
- Merging artificial objects with marker-less video sequences based on the Interacting Multiple Model method
Inserting synthetically generated objects into real world environments has gained much interest in recent years. Fast and robust vision-based algorithms are necessary to make such applications possible. Traditional pose tracking schemes using recursive structure-from-motion techniques adopt one Kalman filter and thus only favour a certain type of camera motion. We propose a robust simultaneous pose tracking and structure recovery algorithm using the Interacting Multiple Model (IMM) to tackle the problem. A set of three extended Kalman filters (EKFs), each describing a frequently occurring camera motion in real situations (general, pure translation, pure rotation), is applied within the IMM framework to track the pose of a scene. Another set of EKFs, one filter for each model point, is used to refine the positions of the model features in 3D space. The filters for pose tracking and structure refinement are executed in an interleaved manner. The results are used to insert virtual objects into the original video footage. The performance of the algorithm is demonstrated with both synthetic and real data. Comparisons with different approaches show that our method is more efficient and accurate. Online demonstrations can be found at
http://www.cse.cuhk.edu.hk/~khwong/demo
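The heart of the IMM framework is the model probability update. A minimal sketch (the transition matrix and likelihoods below are made-up numbers for three motion models, e.g. general / pure translation / pure rotation):

```python
def imm_update(prob, likelihood, trans):
    """One IMM model-probability step: mix the prior model
    probabilities through the Markov transition matrix, then reweight
    by each model's measurement likelihood and renormalize."""
    n = len(prob)
    mixed = [sum(trans[j][i] * prob[j] for j in range(n)) for i in range(n)]
    post = [mixed[i] * likelihood[i] for i in range(n)]
    total = sum(post)
    return [p / total for p in post]

# Three motion models; the measurement strongly favours model 0,
# so its probability rises from 1/3 toward 9/11.
prob = imm_update([1 / 3, 1 / 3, 1 / 3],
                  [0.9, 0.1, 0.1],
                  [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])
```

The filter bank's combined state estimate is then a weighted sum of the individual EKF estimates using these probabilities.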
Others
I am also interested in a number of projects related to the Internet, computer vision and system programming. Here are some examples:
- Internet information tracking: A technique called "condensation" has been found successful in tracking objects in computer vision systems. This project is to apply the technique to Internet information tracking. The result can be used to build smarter Internet image search engines. (For example http://images.google.com/)
- Internet image databases and search engines
- Face retrieval on the web -- I have developed a color- and statistics-based face retrieval system; it can be used to build up the database of a face search engine.
- Image databases using color search keys.
- Biometric research, such as face recognition using color analysis, fingerprint recognition, etc.
- Networking and visual programming for PDAs (Personal Digital Assistants) -- Modern microprocessors used in PDAs and home appliances will be equipped with networking modules, and how to use them is a big business. There are plenty of software and system research projects around this issue.
- Mobile phone positioning -- try to locate the positions of cellular phones. There are unlimited applications of such a system in e-commerce. For example, if the telephone company knows where callers are, the callers can receive more relevant information related to their physical locations.
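A minimal sketch of one positioning approach, trilateration from range estimates to three base stations (coordinates and ranges are made-up; real systems must also cope with measurement noise):

```python
import math

def trilaterate(p1, r1, p2, r2, p3, r3):
    """Position from ranges to three known stations: subtracting the
    circle equations pairwise removes the quadratic terms, leaving a
    2x2 linear system in (x, y)."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = r1 ** 2 - r2 ** 2 + x2 ** 2 - x1 ** 2 + y2 ** 2 - y1 ** 2
    a2, b2 = 2 * (x3 - x2), 2 * (y3 - y2)
    c2 = r2 ** 2 - r3 ** 2 + x3 ** 2 - x2 ** 2 + y3 ** 2 - y2 ** 2
    det = a1 * b2 - a2 * b1
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

# A phone at (3, 4) measured from stations at (0,0), (10,0), (0,10):
x, y = trilaterate((0, 0), 5.0, (10, 0), math.sqrt(65), (0, 10), math.sqrt(45))
```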
- Education and robotics -- The toy company LEGO has produced a series of educational robots that have proved very successful financially. This also shows that education and play can be closely linked. Some interesting projects are going on along this line of thinking.
These titles are just suggestions; I urge you to contact me (Rm 907, X8397, khwong) in person to discuss possible projects.
Computer vision and virtual reality research
Titles (MPhil or PhD)
- Automatic building of virtual reality walk-through environments
The aim is to develop a system to create a 3D virtual reality walk-through environment without explicitly constructing the 3D map. The input of the system is a set of video pictures of the environment; the system will calculate and construct the geometric structures of the scene. This is a very interesting and useful project for World Wide Web applications and games. However, it is difficult: the input pictures are all 2D while the system requires 3D information to function, so we have to develop and use complex mathematical models to map 2D pictures onto 3D structures.
- Virtual reality input devices
In many virtual reality systems, users are required to wear special interfaces, such as data gloves, magnetic sensors, etc. The objective of this project is to develop a computer vision based human-machine interface, so that users are free from those intrusive devices, which may hinder their movements. Computer vision based hand gesture recognition and head movement detection are examples of such approaches. They can be used in games and other virtual reality applications. We have already developed some mathematical techniques for object tracking; students are required to implement these approaches on PCs.
- Low bit rate motion picture compression
Researchers designing future MPEG standards are looking for novel, high compression ratio techniques for image transmission through low bit rate connections (e.g. 64 kbps). The model based scheme for human head image compression is one such approach. The system tracks the pose of a human head and produces a few parameters describing the essential features, for example the rotation and translation of the head. These parameters are sent to the receiver for the reproduction of the original image. This project integrates many different techniques, such as pose estimation and Internet programming, to make low bit rate video transmission possible. It is useful in virtual reality, game and Internet based systems.
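A back-of-envelope calculation shows why the model based approach suits low bit rate links (the numbers below are illustrative, not measured MPEG figures):

```python
# Back-of-envelope: sending pose parameters instead of pixels.
# All numbers are illustrative, not measured figures.
raw_bps = 176 * 144 * 8 * 10   # QCIF greyscale pixels at 10 frames/s
param_bps = 6 * 32 * 10        # 3 rotation + 3 translation floats/frame
gain = raw_bps / param_bps     # over 1000x before any entropy coding
print(round(gain))             # -> 1056
```

Even after adding residual texture data, transmitting a handful of pose parameters per frame leaves plenty of headroom within a 64 kbps channel.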
Hardware and robotics
Titles (MPhil or PhD)
- Evolutionary robot farm: I am now developing a
number of mobile robots and will use evolutionary techniques to
see how the behavior of these robots will evolve over time under
different environmental conditions.
- Ultrasonic imaging and robot navigation: In the animal kingdom (for example, bats), ultrasonic sound is used extensively for navigation and for locating prey in 3D. In the past, ultrasonic radar was developed for robots merely to avoid obstacles ahead, but we believe ultrasound methods should be more useful than a point range detector. Thus, we have devised a mathematical technique for reconstructing the surface of an object by ultrasonic scanning, and hope it can be used in robots to improve their sensory capability. Moreover, this technique can be extended to other scanning applications where 3D imaging is needed. For example, it can be used to reconstruct the surface of objects or a human face for automatic 3D modeling. This project can be merged with the "Automatic building of virtual reality walk through environments" project mentioned above to form an integrated system for automatic 3D walk-through model building.
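The basic range measurement underlying any ultrasonic scanner is time-of-flight. A minimal sketch, using a standard linear approximation for the speed of sound in air:

```python
def echo_distance(round_trip_s, temp_c=20.0):
    """Range from an ultrasonic echo. The speed of sound in air is
    approximated as 331.3 + 0.606 * T (m/s, T in Celsius); the
    round-trip time is halved to get the one-way distance."""
    c = 331.3 + 0.606 * temp_c   # m/s
    return c * round_trip_s / 2.0

d = echo_distance(0.01)          # a 10 ms echo at 20 C -> about 1.72 m
```

Surface reconstruction then combines many such range samples taken from known transducer positions and angles.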
- Legged robots: We are building a series of six-, four- and two-legged robots. They involve interesting control techniques similar to those used by animals and insects, and the aim is to have walking robots that can climb stairs, which cannot be achieved by wheeled robots.
Music and audio signal processing
Since I am a music lover myself and play a few musical instruments, it is natural that I try to involve my research in music related projects, so as to play and work at the same time, and why not? I hope other music lovers will join me to work, and enjoy themselves, in the wonderful world of music.
Titles: (MPhil or PhD)
- Internet music and MP3 extensions: The MP3 standard has become the hottest music distribution medium on the Internet, and it looks set to revolutionize the whole music industry in the years to come. With this new medium, one can think of many interesting applications. Here are some examples.
- 3D surround sound: we can combine a number of MP3 files to form 3D surround sound; however, signal synchronization becomes a problem that requires extensive research.
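A basic tool for that synchronization problem is cross-correlation: slide one channel against the other and pick the lag with the best match (a brute-force toy; real systems refine this to sub-sample accuracy):

```python
import math

def best_lag(a, b, max_lag=10):
    """Brute-force cross-correlation: the lag (in samples) at which
    channel b lines up best with channel a."""
    def score(lag):
        return sum(a[i] * b[i + lag]
                   for i in range(len(a))
                   if 0 <= i + lag < len(b))
    return max(range(-max_lag, max_lag + 1), key=score)

a = [math.sin(0.7 * i) for i in range(100)]
lag = best_lag(a, [0.0] * 3 + a)   # b is a delayed by 3 samples -> lag 3
```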
- Virtual bands: It would be ideal if music lovers could play music together through the Internet; however, the time lag in the network prevents us from doing so. One simple idea is to have a standard file system so that one can add or remove channels (tracks), and record and play at the same time. It is like a karaoke system in which the computer plays one part and the user plays another; the computer records both tracks for future replay. We can use MP3 as the backbone and develop extensions to it to fulfill the above requirement. The research issues are data compression, synchronization, etc. With this file standard, a recording (an extended MP3 file) by one musician may start a snowball, inviting other artists to join, and it may evolve into many different forms of performances. The result may become another standard for music distribution in the future.
- Music signal analysis: Music signals are commonly analyzed by the Fourier transform, the wavelet transform and other time-frequency methods (e.g. Wigner transforms). A new time-domain analysis is being developed here at our lab, and further investigation and analysis are needed. This project would concentrate on techniques for showing the time varying spectral information within a signal for various Western and Chinese musical instruments. It is suitable for those who love music and would like to know more about the signal processing side of it. The result can be used to develop more efficient methods for musical signal compression in the future.
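As a baseline for the frequency side of such analysis, here is a naive DFT peak-picker in plain Python (O(n^2), fine only for short frames; real analysis would use an FFT, and this says nothing about the lab's time-domain method):

```python
import cmath, math

def dominant_freq(signal, sample_rate):
    """Naive DFT peak-picker: return the frequency of the strongest
    spectral bin. O(n^2), so only suitable for short frames."""
    n = len(signal)
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2):
        s = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
        if abs(s) > best_mag:
            best_k, best_mag = k, abs(s)
    return best_k * sample_rate / n

# A 440 Hz sine sampled at 8 kHz:
sig = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(800)]
print(dominant_freq(sig, 8000))   # -> 440.0
```

Applying this frame by frame gives the time varying spectral picture that the project would study across instruments.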
- Music synthesis by wavelet transform: There are three major techniques in music synthesis for electronic musical instruments: (1) frequency modulation (FM), (2) wavetable and (3) waveguide. The FM technique is only a simple engineering solution and cannot generate the authentic sounds of traditional instruments, e.g. the violin, flute, etc. The wavetable technique is good, but tone variation is difficult to achieve. The waveguide technique can generate the sound of plucked or woodwind instruments but is not suitable for bowed instruments. Recently the wavelet technique has been found useful in data compression, but it has not been used for music synthesis, for which I think it is a very suitable candidate. The methodology is simple: just use the wavelet transform to analyze the recording of a sound, and use the parameters obtained for regeneration. By adjusting the parameters, I hope we can achieve a wide range of sound expressions and effects. See also this web page for a tutorial on music synthesis.
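The analyze-then-regenerate idea can be sketched with the simplest wavelet, the Haar transform: split the signal into averages and differences, optionally adjust the coefficients, and invert (a one-level toy, not a full synthesis engine):

```python
def haar_forward(x):
    """One level of the Haar wavelet transform: pairwise averages
    (the coarse tone) and differences (the detail)."""
    avg = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    dif = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    return avg, dif

def haar_inverse(avg, dif):
    """Perfect reconstruction; scaling `dif` before inverting is the
    kind of parameter adjustment proposed for synthesis."""
    out = []
    for a, d in zip(avg, dif):
        out += [a + d, a - d]
    return out

x = [1.0, 2.0, 3.0, 5.0]
print(haar_inverse(*haar_forward(x)))   # -> [1.0, 2.0, 3.0, 5.0]
```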
- Environmental sound compression: Current sound compression algorithms compress sound signals without considering the positioning of the sound sources or the environment in which the recording was made. If this information is included, we will not only obtain good surround sound replay but also be able to manipulate sound positions according to our taste. It is similar to 3D graphics generation, but in the audio domain. I expect exciting results, since this idea has not been explored in detail by others.