Prof. K.H. Wong's MPhil/PhD research projects
K.H. Wong, Publications
July 2013
- Projects for research students (PhD, MPhil, MSc). Write to me (Prof. K.H. Wong) if you are interested.
- Recent demos: (summer 2012)
- Computer vision for Google Glasses: this new product will become popular and inspire a new generation of computer vision applications. Here are some ideas:
- Text translation of what the user is looking at: The idea is to develop an automatic translator of what the user is seeing. If the user is wearing Google Glasses, what he is seeing is captured by a camera. Using optical character recognition (OCR) technology, the machine can translate the foreign text he/she is looking at into a language he/she understands, then display the result on the Google Glasses or speak it to the user through the earphones.
- Tourist navigation: Using computer vision and GPS technology, the system can display information about the area, advertising material and other information to assist tourists.
- Requirement: interest in computer vision and programming.
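As a toy illustration of the translate-and-display stage (the word list below is a made-up stand-in for a real OCR engine plus machine translation service, not part of the project's actual design):

```python
# Hypothetical sketch: assume the OCR stage has already turned the
# camera image into text. A tiny made-up word list stands in for a
# real machine translation back end.
LEXICON = {"sortie": "exit", "gare": "station", "billet": "ticket"}

def translate(ocr_text):
    # Translate known words; pass unknown words through unchanged.
    return " ".join(LEXICON.get(word.lower(), word)
                    for word in ocr_text.split())

print(translate("Gare sortie"))   # -> station exit
```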
- Intelligent desk: Many people still prefer to read printed papers and write on paper. If a camera hanging over a desk monitors the activities on it, the system can store and translate the text into a database for later use. This project involves recognizing books and the text printed on them, and locating where your finger is pointing for user input. We have already finished part of the project, and new students can continue what we have already built.
- Requirement: interest in computer vision and programming.
- 3D reconstruction using the KINECT sensor and computer vision techniques. Kinect is a new range sensing device for finding the 3D range image of an environment in real time at very low cost. At CUHK, it is now being used in a number of interesting applications, such as human counting and human tracking for security and surveillance. It is also used in medical applications. For example, it is being used to monitor the development of new muscles after a patient has undergone surgery in which damaged muscles were removed. The objective measurement can quantify the development and help doctors decide on further treatment. This is a collaborative project with the Prince of Wales Hospital, Hong Kong.
- Requirement: interest in computer vision and programming.
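To give a feel for the geometry involved, here is a minimal sketch of how a Kinect-style depth pixel maps to a 3D point under the pinhole camera model (the intrinsic parameters below are illustrative defaults, not calibrated Kinect values):

```python
def depth_to_point(u, v, z, fx=525.0, fy=525.0, cx=319.5, cy=239.5):
    """Map a depth-image pixel (u, v) with depth z (metres) to a 3D
    point via the pinhole model. The intrinsics are illustrative
    defaults, not the sensor's calibrated parameters."""
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return (x, y, z)

# A pixel at the principal point lies on the optical axis:
print(depth_to_point(319.5, 239.5, 2.0))   # -> (0.0, 0.0, 2.0)
```

Converting every depth pixel this way yields the point cloud that the tracking and counting applications above operate on.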
- A music playing robot is being developed. It is pioneering work on making a robot that can play the Chinese flute. The system integrates various mechanical control systems to produce an air jet that blows at the embouchure hole of a flute, emulating the way a human flutist plays the instrument. A sound feedback system is implemented to make sure the sound is rich and appealing to the ears. It is an attempt to research a new paradigm of computer music production: the computer-generated sound is produced by an authentic musical instrument rather than by a simulated waveform played through a loudspeaker. (See our demo: http://www.youtube.com/watch?v=NJ7wv2z8Wgk&list=UUfy2EumiHMeoUorMFR0woZA&index=1&feature=plcp.) We are also interested in investigating techniques for building Chinese flutes, such as making them more in tune, more expressive and easier to play.
- Requirement: interest in music, programming and signal processing.
- Other projects:
- Face expression tracking by computer vision. Summary: Real-time tracking of facial expressions has many applications in games and movie making. Accurate real-time face expression tracking is currently achievable, but some systems require the tedious task of wearing dots on the user's face. We propose to mount multiple cameras on the head, so each camera observes a particular area of the face at close-up distance (about 10 cm) for accurate tracking. The cameras have these features: (i) they face the chin, eyes, etc., and are fixed on a head mount; (ii) the relative positions of the cameras and the face are not changed by head motion, since they are all fixed on the head mount; (iii) the head (with the head mount) is tracked by an external camera about 1 meter away. By combining all the tracked results, the pose of the head as well as the detailed expression of the face can be found, and this can be used to drive the animation of a virtual human figure. The tracking tasks are distributed, so accuracy and efficiency can be enhanced. For more information about face tracking see: http://www.captivemotion.com/ and http://www.image-metrics.com/projects
- Hardware accelerated computer vision systems for virtual reality applications. In many virtual reality applications, such as projector-camera systems, the demand for computation is high and usually not suitable for portable devices. Prof. K.H. Wong's team is now implementing many useful computer vision algorithms, such as the Hough transform and feature tracking algorithms, on low cost embedded systems and FPGAs. As a result, sophisticated computer vision algorithms can be implemented on consumer grade portable devices.
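As a sketch of the kind of algorithm being ported, here is a minimal Hough line transform in plain Python; the doubly nested voting loop is exactly the structure that an FPGA implementation can parallelize:

```python
import math

def strongest_line(points, n_theta=180):
    """Minimal Hough transform: every edge point votes for all
    (rho, theta) lines passing through it; the accumulator bin with
    the most votes is the dominant line in the image."""
    acc = {}
    for (x, y) in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
            acc[(rho, t)] = acc.get((rho, t), 0) + 1
    return max(acc, key=acc.get)        # (rho, theta index in degrees)

# Points on the horizontal line y = 5 -> rho = 5, theta = 90 degrees.
rho, t = strongest_line([(x, 5) for x in range(60)])
```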
- Research Grants:
- Title: A Visual Surveillance System for Sea Search and Rescue Missions 電腦視覺應用於飛行拯救預警系統 (from 30 Dec 2009, for two years)
- Abstract: The aim of this research is to develop a vision based system to assist the aircrew in search and rescue operations at sea. During the stormy season, the Government Flying Service of Hong Kong sends out fixed wing planes to search for survivors after a maritime accident is reported. The search can last up to 5 hours and is now conducted by the naked eye, looking for life rafts from a height of 500 ft. It is a fatiguing job, and targets may be missed because of human error. Therefore the Government Flying Service has initiated a project on using computer vision techniques to assist the aircrew in detecting life rafts at sea. We have completed some preliminary tests and have the following conclusions and suggestions. (1) One camera is not enough because the field of view is too narrow for such an application. We suggest a divide-and-conquer method using 3 cameras (each attached to a computer) aligned vertically to increase the field of view and enhance data communication/processing speed. (2) Using one frame for detection has a major problem of misinterpreting waves as targets. Therefore we suggest using tracking techniques such as the Kalman or particle filter, so that only objects lasting long enough in an image sequence are accepted as potential targets. (3) We can further enhance the accuracy by using projective geometry and the calibrated parameters of the cameras on board to form a 3-D Kalman object tracker. This makes sure that only objects moving plausibly on the surface of the sea are considered as possible targets.
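A minimal sketch of the idea behind suggestion (2): a scalar Kalman filter (a 1-D toy, not the proposed 3-D tracker), in which a target detected in every frame converges to a stable position estimate, while an isolated wave blip barely moves it:

```python
def kalman_1d(measurements, q=1e-3, r=0.5):
    """Scalar Kalman filter (a 1-D toy of the proposed tracker).
    q: process noise, r: measurement noise. Repeated, consistent
    detections pull the estimate steadily toward the true position."""
    x, p = measurements[0], 1.0
    for z in measurements[1:]:
        p += q                 # predict: uncertainty grows between frames
        k = p / (p + r)        # Kalman gain
        x += k * (z - x)       # correct with the new measurement
        p *= 1.0 - k
    return x

est = kalman_1d([10.0, 10.2, 9.9, 10.1, 10.0])   # hovers near 10
```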
- This work was supported by a Direct Grant (Project Code 2050455) from the Faculty of Engineering, The Chinese University of Hong Kong.
- Title: Music nuance extraction from audio signals 從音頻信號中提取音樂色調的細微差別 (from 1 Feb 2011, for two years)
- Abstract: What makes a good musical performance? An expressive music performance owes its emotive power to the performer's skills in shaping the music with nuances. For the purpose of performance analysis, nuance can be defined as any subtle manipulation of sound parameters, including attack, timing, pitch, loudness and timbre. Our research goal is to understand the origin of expressiveness in a performance by uncovering nuances from the recorded signal of the performance. One difficulty of quantifying nuances is that extracting the relevant sound parameters from audio signals is often nontrivial. For instance, in a piano performance, multiple keys are very often struck throughout; key combinations make finding out exactly the onset time and strength of each individual key from the combined signal very difficult. Previous work on sound parameter extraction from audio signals using general source separation techniques has not yielded robust and accurate separations, largely because of the overlapping partials described above. Here, we propose a general analytic paradigm for extracting sound parameters using a source separation method that incorporates the following in its formulation: (1) prior knowledge of the instrument, so that the signal generation model can be properly constrained, and (2) the sequence of music notes played in the recorded piece. Our proposed technique will allow us to find out exactly the relative strength, onset time and other parameters of individual piano keys in a performance, thereby enabling us to further understand the musical basis of performance expressiveness.
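One of the simplest sound parameters mentioned above is onset timing. Here is a toy energy-based onset detector for the single-note case (real piano onsets overlap and need the score-constrained separation described above; this only illustrates the basic idea):

```python
def onsets(signal, frame=64, threshold=2.0):
    """Flag frames whose short-time energy jumps to more than
    `threshold` times the previous frame's energy -- a crude
    note-onset detector for isolated notes."""
    energies = [sum(s * s for s in signal[i:i + frame])
                for i in range(0, len(signal) - frame + 1, frame)]
    return [i for i in range(1, len(energies))
            if energies[i] > threshold * energies[i - 1] + 1e-9]

# Near-silence followed by a tone: the onset lands in frame 2.
print(onsets([0.01] * 128 + [1.0] * 192))   # -> [2]
```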
- This work was supported by a Direct Grant (Project Code 2050486) from the Faculty of Engineering, The Chinese University of Hong Kong.
Vision Research Demos
3D DEMOS
MPhil projects
- Title: Virtual reality 3D display on a flat screen
The aim of this project is to display 3D objects on a flat display screen. The method is based on the fact that if the display changes according to the relative position of the head to the screen, we can create a 3D perception on screen. There are many methods to track head motion, such as using a stationary camera to view and track the head, or mounting a camera on the head to track its motion. The first step of the project is to investigate how to use a webcam to track the head position. It is recommended that the open-source computer vision library OpenCV (http://opencvlibrary.sourceforge.net/ ) be used for the processing. The next step is to create 3D objects to be displayed on screen according to the head motion. There are many interesting applications of such systems, such as interactive games and virtual reality display systems.
Reference: Johnny Chung Lee - Projects - Wii http://www.youtube.com/watch?v=Jd3-eiid-Uw
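The core geometry of head-coupled display can be sketched in a few lines: project each 3D point onto the screen plane along the ray from the viewer's eye (coordinates and units below are illustrative):

```python
def screen_position(point, head, screen_z=0.0):
    """Head-coupled ('fishtank VR') projection: intersect the ray from
    the viewer's eye through a 3D point with the screen plane z = 0.
    Units and coordinates are illustrative."""
    px, py, pz = point
    hx, hy, hz = head
    t = (screen_z - hz) / (pz - hz)   # where the ray crosses the screen
    return (hx + t * (px - hx), hy + t * (py - hy))

# Moving the head right makes a behind-screen point slide left on
# screen -- the motion parallax that creates the 3D illusion.
x, y = screen_position((0.0, 0.0, 1.0), (0.2, 0.0, -1.0))
```

Re-rendering with this projection every time the tracked head moves is what produces the depth effect on an ordinary flat screen.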
Funded Projects
RGC funded project (from Aug 2004, project no. CUHK4204/04E, HK$339,414):
Title: Model reconstruction for computer game development
The work described in this page was supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region. (Project Number CUHK4202/04E)
- Abstract:
The aim of this research is to help game developers capture real scenes to be used in their virtual site construction. Many game players prefer interactive games situated in real-world scenes, which give them a realistic feeling. However, capturing real scenes and modeling them is time consuming and expensive; making some procedures automatic will save a lot of time and cost. Apart from game development there are other applications: for example, users can develop virtual walkthrough web pages for Internet users, and home game players can capture their favorite environments and objects to be displayed interactively in three dimensions. Since a full scale environment requires multiple image sequences to cover the whole area, our research will concentrate on combining multiple sequences for model reconstruction. The research issues are as follows. (1) We propose a tracking technique called the Interacting Multiple Model method (IMM) for tracking the camera motion over a long image sequence to reduce error. (2) If the distance between the camera and the object in one sequence is very different from the others, there is a multi-scale problem in combining them. We will try to solve this with a coarse-to-fine global optimization technique. (3) Additional user inputs may be used to assist the software in finding the correct model. After solving the above problems, we hope to develop an efficient system that can capture the 3D model of a large environment with localized details to be used in game development.
Direct Grant from The Engineering Faculty, CUHK (from 1 Nov 2005, project ID 2050350, HK$65,360):
Title: Camera array for model reconstruction and pose estimation
- Abstract:
The aim of this research is to investigate how to utilize an array of low cost cameras for model reconstruction and pose estimation. The input of the system is a set of cameras and the output is the 3D model of the objects being viewed together with the pose of the cameras. The results are useful for virtual walkthrough systems, robot navigation and 3D computer game development. The problem has been investigated by many researchers before; however, applying an array of cameras (8 x 8 or more, arranged on a 1 meter square plane) is relatively new. Fundamentally, we believe that by utilizing the camera array we can provide a solution to the occlusion problem in computer vision: for example, if some of the cameras are blocked, the others can fill in the information. Mathematical methods for a single camera or camera pairs can be extended to a camera array, and we propose the following formulations. (1) Extend the traditional formulations, such as the fundamental matrix or trifocal tensor, to an array of cameras. (2) Use a tracking method (Kalman filtering) for the multiple image sequences obtained by the camera array. Since China is a major manufacturing base for low cost digital cameras, our work on camera arrays could become a lucrative new product line for our computer manufacturing and trading business.
Current projects of myself and my graduate students
- 3D model reconstruction and pose acquisition from images
Finding the pose and structure of an unknown object from an image sequence has many applications in graphics, virtual reality and multimedia processing. In this project we address the problem using a two-stage iterative method. Starting from an initial guess of the structure, the first stage estimates the pose of the object. The second stage uses the estimated pose information to refine the structure. This process is repeated until the difference between the observed data and the data re-projected from the estimated model is minimized. The method is a variation of the classical bundle adjustment method, but is faster in execution and simpler to implement. We used the KLT (Kanade-Lucas-Tomasi) feature tracker to obtain the image features. Synthetic and real data have been tested with good results. Online demonstrations can be found at
http://www.cse.cuhk.edu.hk/~khwong/demo
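The two-stage alternation can be illustrated with a 1-D toy (a deliberately simplified stand-in for the real projection model: each measurement is assumed to be structure plus a per-frame camera translation):

```python
def alternate(meas, iters=10):
    """Toy 1-D two-stage refinement: meas[f][i] is assumed to equal
    structure[i] + pose[f]. Stage 1 re-estimates each frame's pose
    from the current structure; stage 2 refines the structure from
    the poses. pose[0] is pinned to 0 to fix the gauge freedom."""
    F, N = len(meas), len(meas[0])
    structure, pose = [0.0] * N, [0.0] * F
    for _ in range(iters):
        pose = [sum(meas[f][i] - structure[i] for i in range(N)) / N
                for f in range(F)]
        pose = [p - pose[0] for p in pose]   # gauge fix
        structure = [sum(meas[f][i] - pose[f] for f in range(F)) / F
                     for i in range(N)]
    return pose, structure

pose, structure = alternate([[1.0, 2.0, 3.0], [1.5, 2.5, 3.5]])
# recovers structure [1, 2, 3] and poses [0, 0.5]
```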
- Recursive 3D model reconstruction based on Kalman filtering
In this project we investigate how to use a recursive two-step method to recover structure and motion from image sequences based on Kalman filtering. The algorithm consists of two major steps. The first step is an extended Kalman filter for the estimation of the object's pose. The second step is a set of extended Kalman filters, one for each model point, for the refinement of the positions of the model features in 3D space. These two steps alternate from frame to frame. The initial model converges to the final structure as the image sequence is scanned sequentially. The performance of the algorithm is demonstrated with both synthetic data and real world objects. Analytical and empirical comparisons are made among our approach, the interleaved bundle adjustment method, and the Kalman filtering based recursive algorithm of Azarbayejani and Pentland. Our approach outperformed the other two algorithms in computation speed without loss of quality in the model reconstruction. Online demonstrations can be found at
http://www.cse.cuhk.edu.hk/~khwong/demo
- Merging artificial objects with marker-less video sequences based on the Interacting Multiple Model method
Inserting synthetically generated objects into real world environments has gained much interest in recent years. Fast and robust vision-based algorithms are necessary to make such applications possible. Traditional pose tracking schemes using recursive structure-from-motion techniques adopt one Kalman filter and thus only favour a certain type of camera motion. We propose a robust simultaneous pose tracking and structure recovery algorithm using the Interacting Multiple Model (IMM) to tackle the problem. A set of three extended Kalman filters (EKFs), each describing a frequently occurring camera motion in real situations (general, pure translation, pure rotation), is applied within the IMM framework to track the pose of a scene. Another set of EKFs, one filter for each model point, is used to refine the positions of the model features in 3D space. The filters for pose tracking and structure refinement are executed in an interleaved manner. The results are used to insert virtual objects into the original video footage. The performance of the algorithm is demonstrated with both synthetic and real data. Comparisons with different approaches show that our method is more efficient and accurate. Online demonstrations can be found at
http://www.cse.cuhk.edu.hk/~khwong/demo
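The heart of the IMM framework is the model probability update. A minimal sketch (the transition matrix and likelihoods below are made-up numbers for three motion models, e.g. general / pure translation / pure rotation):

```python
def imm_update(prob, likelihood, trans):
    """One IMM model-probability step: mix the prior model
    probabilities through the Markov transition matrix, then reweight
    by each model's measurement likelihood and renormalize."""
    n = len(prob)
    mixed = [sum(trans[j][i] * prob[j] for j in range(n)) for i in range(n)]
    post = [mixed[i] * likelihood[i] for i in range(n)]
    total = sum(post)
    return [p / total for p in post]

# Three motion models; the measurement strongly favours model 0,
# so its probability rises from 1/3 toward 9/11.
prob = imm_update([1 / 3, 1 / 3, 1 / 3],
                  [0.9, 0.1, 0.1],
                  [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])
```

The filter bank's combined state estimate is then a weighted sum of the individual EKF estimates using these probabilities.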
Others
I am also interested in a number of projects related to the Internet, computer vision and system programming. Here are some examples:
- Internet information tracking: A technique called "condensation" has been found successful in tracking objects in computer vision systems. This project is to apply the technique to Internet information tracking. The result can be used to build smarter Internet image search engines. (For example http://images.google.com/)
- Internet image databases and search engines
- Face retrieval on the web -- I have developed a color- and statistics-based face retrieval system; it can be used to build up the database of a face search engine.
- Image databases using color search keys.
- Biometric research, such as face recognition using color analysis, fingerprint recognition, etc.
- Networking and visual programming for PDAs (Personal Digital Assistants) -- Modern microprocessors used in PDAs and home appliances will be equipped with networking modules, and how to use them is a big business. There are plenty of software and system research projects around this issue.
- Mobile phone positioning -- try to locate the positions of cellular phones. There are unlimited applications of such a system in e-commerce. For example, if the telephone company knows where callers are, the callers can receive more relevant information related to their physical locations.
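A minimal sketch of one positioning approach, trilateration from range estimates to three base stations (coordinates and ranges are made-up; real systems must also cope with measurement noise):

```python
import math

def trilaterate(p1, r1, p2, r2, p3, r3):
    """Position from ranges to three known stations: subtracting the
    circle equations pairwise removes the quadratic terms, leaving a
    2x2 linear system in (x, y)."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = r1 ** 2 - r2 ** 2 + x2 ** 2 - x1 ** 2 + y2 ** 2 - y1 ** 2
    a2, b2 = 2 * (x3 - x2), 2 * (y3 - y2)
    c2 = r2 ** 2 - r3 ** 2 + x3 ** 2 - x2 ** 2 + y3 ** 2 - y2 ** 2
    det = a1 * b2 - a2 * b1
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

# A phone at (3, 4) measured from stations at (0,0), (10,0), (0,10):
x, y = trilaterate((0, 0), 5.0, (10, 0), math.sqrt(65), (0, 10), math.sqrt(45))
```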
- Education and robotics -- The toy company LEGO has produced a series of educational robots that have proved very successful financially. This also shows that education and play can be closely linked. Some interesting projects are going on along this line of thinking.
These titles are just suggestions; I urge you to contact me (Rm 907, X8397, khwong) in person to discuss possible projects.
Computer vision and virtual reality research
Titles (MPhil or PhD)
- Automatic building of virtual reality walk-through environments
The aim is to develop a system to create a 3D virtual reality walk-through environment without explicitly constructing the 3D map. The input of the system is a set of video pictures of the environment; the system will calculate and construct the geometric structures of the scene. This is a very interesting and useful project for World Wide Web applications and games. However, it is difficult: the input pictures are all 2D while the system requires 3D information to function, so we have to develop and use complex mathematical models to map 2D pictures onto 3D structures.
- Virtual reality input devices
In many virtual reality systems, users are required to wear special interfaces, such as data gloves, magnetic sensors, etc. The objective of this project is to develop a computer vision based human-machine interface, so that users are free from those intrusive devices, which may hinder their movements. Computer vision based hand gesture recognition and head movement detection are examples of such approaches. They can be used in games and other virtual reality applications. We have already developed some mathematical techniques for object tracking; students are required to implement these approaches on PCs.
- Low bit rate motion picture compression
Researchers designing future MPEG standards are looking for novel, high compression ratio techniques for image transmission through low bit rate connections (e.g. 64 kbps). The model based scheme for human head image compression is one such approach. The system tracks the pose of a human head and produces a few parameters describing the essential features, for example the rotation and translation of the head. These parameters are sent to the receiver for the reproduction of the original image. This project integrates many different techniques, such as pose estimation and Internet programming, to make low bit rate video transmission possible. It is useful in virtual reality, game and Internet based systems.
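A back-of-envelope calculation shows why the model based approach suits low bit rate links (the numbers below are illustrative, not measured MPEG figures):

```python
# Back-of-envelope: sending pose parameters instead of pixels.
# All numbers are illustrative, not measured figures.
raw_bps = 176 * 144 * 8 * 10   # QCIF greyscale pixels at 10 frames/s
param_bps = 6 * 32 * 10        # 3 rotation + 3 translation floats/frame
gain = raw_bps / param_bps     # over 1000x before any entropy coding
print(round(gain))             # -> 1056
```

Even after adding residual texture data, transmitting a handful of pose parameters per frame leaves plenty of headroom within a 64 kbps channel.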
Hardware and robotics
Titles (MPhil or PhD)
- Evolutionary robot farm: I am now developing a
number of mobile robots and will use evolutionary techniques to
see how the behavior of these robots will evolve over time under
different environmental conditions.
- Ultrasonic imaging and robot navigation: In the animal kingdom (for example, bats), ultrasonic sound is used extensively for navigation and for locating prey in 3D. In the past, ultrasonic radar was developed for robots merely to avoid obstacles ahead, but we believe ultrasound methods should be more useful than a point range detector. Thus, we have devised a mathematical technique for reconstructing the surface of an object by ultrasonic scanning, and hope it can be used in robots to improve their sensory capability. Moreover, this technique can be extended to other scanning applications where 3D imaging is needed. For example, it can be used to reconstruct the surface of objects or a human face for automatic 3D modeling. This project can be merged with the "Automatic building of virtual reality walk through environments" project mentioned above to form an integrated system for automatic 3D walk-through model building.
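The basic range measurement underlying any ultrasonic scanner is time-of-flight. A minimal sketch, using a standard linear approximation for the speed of sound in air:

```python
def echo_distance(round_trip_s, temp_c=20.0):
    """Range from an ultrasonic echo. The speed of sound in air is
    approximated as 331.3 + 0.606 * T (m/s, T in Celsius); the
    round-trip time is halved to get the one-way distance."""
    c = 331.3 + 0.606 * temp_c   # m/s
    return c * round_trip_s / 2.0

d = echo_distance(0.01)          # a 10 ms echo at 20 C -> about 1.72 m
```

Surface reconstruction then combines many such range samples taken from known transducer positions and angles.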
- Legged robots: We are building a series of six-, four- and two-legged robots. They involve interesting control techniques similar to those used by animals and insects, and the aim is to have walking robots that can climb stairs, which cannot be achieved by wheeled robots.
Music and audio signal processing
Since I am a music lover myself and play a few musical instruments, it is natural that I try to involve my research in music related projects, so as to play and work at the same time, and why not? I hope other music lovers will join me to work, and enjoy themselves, in the wonderful world of music.
Titles: (MPhil or PhD)
- Internet music and MP3 extensions: The MP3 standard has become the hottest music distribution medium on the Internet, and it looks set to revolutionize the whole music industry in the years to come. With this new medium, one can think of many interesting applications. Here are some examples.
- 3D surround sound: we can combine a number of MP3 files to form 3D surround sound; however, signal synchronization becomes a problem that requires extensive research.
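A basic tool for that synchronization problem is cross-correlation: slide one channel against the other and pick the lag with the best match (a brute-force toy; real systems refine this to sub-sample accuracy):

```python
import math

def best_lag(a, b, max_lag=10):
    """Brute-force cross-correlation: the lag (in samples) at which
    channel b lines up best with channel a."""
    def score(lag):
        return sum(a[i] * b[i + lag]
                   for i in range(len(a))
                   if 0 <= i + lag < len(b))
    return max(range(-max_lag, max_lag + 1), key=score)

a = [math.sin(0.7 * i) for i in range(100)]
lag = best_lag(a, [0.0] * 3 + a)   # b is a delayed by 3 samples -> lag 3
```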
- Virtual bands: It would be ideal if music lovers could play music together through the Internet; however, the time lag in the network prevents us from doing so. One simple idea is to have a standard file system so that one can add or remove channels (tracks), and record and play at the same time. It is like a karaoke system in which the computer plays one part and the user plays another; the computer records both tracks for future replay. We can use MP3 as the backbone and develop extensions to it to fulfill the above requirement. The research issues are data compression, synchronization, etc. With this file standard, a recording (an extended MP3 file) by one musician may start a snowball, inviting other artists to join, and it may evolve into many different forms of performances. The result may become another standard for music distribution in the future.
- Music signal analysis: Music signals are commonly analyzed by the Fourier transform, the wavelet transform and other time-frequency methods (e.g. Wigner transforms). A new time-domain analysis is being developed here at our lab, and further investigation and analysis are needed. This project would concentrate on techniques for showing the time varying spectral information within a signal for various Western and Chinese musical instruments. It is suitable for those who love music and would like to know more about the signal processing side of it. The result can be used to develop more efficient methods for musical signal compression in the future.
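As a baseline for the frequency side of such analysis, here is a naive DFT peak-picker in plain Python (O(n^2), fine only for short frames; real analysis would use an FFT, and this says nothing about the lab's time-domain method):

```python
import cmath, math

def dominant_freq(signal, sample_rate):
    """Naive DFT peak-picker: return the frequency of the strongest
    spectral bin. O(n^2), so only suitable for short frames."""
    n = len(signal)
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2):
        s = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
        if abs(s) > best_mag:
            best_k, best_mag = k, abs(s)
    return best_k * sample_rate / n

# A 440 Hz sine sampled at 8 kHz:
sig = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(800)]
print(dominant_freq(sig, 8000))   # -> 440.0
```

Applying this frame by frame gives the time varying spectral picture that the project would study across instruments.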
- Music synthesis by wavelet transform: There are three major techniques in music synthesis for electronic musical instruments: (1) frequency modulation (FM), (2) wavetable and (3) waveguide. The FM technique is only a simple engineering solution and cannot generate the authentic sounds of traditional instruments, e.g. the violin, flute, etc. The wavetable technique is good, but tone variation is difficult to achieve. The waveguide technique can generate the sound of plucked or woodwind instruments but is not suitable for bowed instruments. Recently the wavelet technique has been found useful in data compression, but it has not been used for music synthesis, for which I think it is a very suitable candidate. The methodology is simple: just use the wavelet transform to analyze the recording of a sound, and use the parameters obtained for regeneration. By adjusting the parameters, I hope we can achieve a wide range of sound expressions and effects. See also this web page for a tutorial on music synthesis.
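The analyze-then-regenerate idea can be sketched with the simplest wavelet, the Haar transform: split the signal into averages and differences, optionally adjust the coefficients, and invert (a one-level toy, not a full synthesis engine):

```python
def haar_forward(x):
    """One level of the Haar wavelet transform: pairwise averages
    (the coarse tone) and differences (the detail)."""
    avg = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    dif = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    return avg, dif

def haar_inverse(avg, dif):
    """Perfect reconstruction; scaling `dif` before inverting is the
    kind of parameter adjustment proposed for synthesis."""
    out = []
    for a, d in zip(avg, dif):
        out += [a + d, a - d]
    return out

x = [1.0, 2.0, 3.0, 5.0]
print(haar_inverse(*haar_forward(x)))   # -> [1.0, 2.0, 3.0, 5.0]
```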
- Environmental sound compression: Current sound compression algorithms compress sound signals without considering the positioning of the sound sources or the environment in which the recording was made. If this information is included, we will not only obtain good surround sound replay but also be able to manipulate sound positions according to our taste. It is similar to 3D graphics generation, but in the audio domain. I expect exciting results, since this idea has not been explored in detail by others.