M.Sc. projects to be supervised by Prof. Kin Hong Wong, year 2025-26
Title 1: 3D rendering development for virtual reality and game developers
We would like to develop a mobile app based on Gaussian Splatting for
real-time 3D reconstruction. Users capture images or videos, and the app
converts them into high-quality 3D models with photo-realistic rendering
in real time. Such tools should have significant commercialization
potential in gaming and virtual reality. You may see the live demo at
https://poly.cam/tools/gaussian-splatting or get the base code from
https://github.com/graphdeco-inria/gaussian-splatting .
Title 2: AI guide dog for the visually impaired
In this project we investigate the development of an AI-powered guide
robot/dog to assist visually impaired users in navigating indoor and
outdoor environments. Using computer vision and embedded AI, the robot
will analyze images of its surroundings and make decisions for safe
navigation. We may consider using mobile edge devices, such as NVIDIA
Jetson or Raspberry Pi, which offer low power consumption and enhance
mobility.
Title 3: Large Language Model (LLM) tool development for elderly users
This project aims to develop an LLM-based tool tailored for elderly
users, built on a large language base model such as
https://huggingface.co/blog/stackllama . The tool will assist users with
daily tasks, memory recall, and conversational engagement. Face
recognition and voice recognition may be added. The goal is to support
individuals experiencing memory decline and related challenges, helping
them to handle daily tasks independently.
Guidelines for students in CMSC5720/1
Normally, meet the supervisor weekly (or biweekly), face to face or over
Zoom. The Zoom link can be found on the course page at
https://blackboard.cuhk.edu.hk .
Send a 1-2 page weekly report (through the "Weekly report assignment"
in the course content of the course page at
https://blackboard.cuhk.edu.hk) to the supervisor to discuss your ideas
and the work achieved.
---------------------------------------------------------------------------
CMSC7250 (Term1): total 13 weeks
4 weeks: Define details of the project and write plans for the rest
of the term.
4 weeks: Literature search and testing of open source programs
related to the work.
4 weeks: Develop your own program, which can be an integration of open
source routines/libraries from others.
1 week: Report writing and presentation preparation.
-----------------------------------------------------------------------
CMSC7251 (Term2): total 13 weeks
3 weeks: Improve the work done in the first term. Add original features
that differ from others' work to the project.
3 weeks: Enhance efficiency, add extra capabilities and features to
the project.
4 weeks: Testing of the system and analysis of data. Increase
robustness and accuracy.
3 weeks: Final report writing, presentation preparation and rounding
up.
The Chinese University of Hong Kong holds the copyright of this report.
Any person(s) intending to use a part or whole of the materials in the
report in a proposed publication must seek copyright release from the
Dean of the Graduate School.
------------ Previous projects -------------------------
2023-24 MSc projects to be supervised by Prof. Kin Hong Wong, year
2023-24, 2023.6.2
Artificial Intelligence (AI) projects: The purpose of these projects is
to train students in AI theory and programming. They may involve the use
of tools such as TensorFlow, Keras, Generative Pre-trained Transformer
(GPT), etc. Students are free to choose one of the following topics and
applications.
1) Detect the type of sea vessels (ships) from pictures taken from
300 m to 1000 m away. We also need to measure the travel speed and
location of the vessels. This industrial project may be supported by a
science park company, which can provide us with the data set and
computation power.
2) Extraction of vital information from the sound and images in videos.
Using modern AI methods, we may be able to extract useful information
from video sources. This is an industrial project to extract useful
information (product type, weights, and code) from working videos taken
by mobile phones. A sample input video can be found at
http://www.cse.cuhk.edu.hk/~khwong/www2/cmsc5720/fish_market.mp4 .
3) Free projects: students can propose projects that involve modern AI
techniques, such as experiments with Large Language Models (LLMs) and
Generative Pre-trained Transformers (GPT).
Details of the projects can be found at
http://www.cse.cuhk.edu.hk/~khwong/www2/cmsc5720/cmsc5720.html .
2022-23 MSc projects to be supervised by Prof. Kin Hong Wong, year
2022-23, 2022.5.30
a) Computer vision processing research:
i) Generate a sentence from an image:
https://www.youtube.com/watch?v=c_bVBYxX5EU . The idea is to tell a
story from a picture. Humans can do it effortlessly; now Artificial
Intelligence may be applied to solve the problem. We have done some
preliminary work, and the sentences produced are accurate but short and
fragmented. We would like to improve the performance in the coming
term. It can be applied to help visually impaired persons read pictures
or better understand their surroundings.
ii) Vision based gesture or sign language recognition: This is useful
for automatically recognizing hand gesture languages to assist
communication between people who can speak and those who have lost the
ability to speak. It can also be used to develop exercise tutorial
systems for training tutors. Demos:
https://www.youtube.com/watch?v=vTC0QKR_uM0 or
https://www.youtube.com/watch?v=doDUihpj6ro .
b) Computer audio processing research:
i) Voice cloning and applications:
https://www.youtube.com/watch?v=1WN8Jhfd4uM . This interesting demo may
inspire new applications of audio sound generation for the music
industry.
ii) Audio synthesis and tone changes:
https://anonymous84654.github.io/RAVE_anonymous/ . Many non-native
English speakers speak English with a local accent. The idea is to turn
a non-native English recording into speech that sounds as if it were
spoken by a native speaker. The idea can be applied to other source and
target languages, which is useful for students and travelers.
Project 1: Convert 2D films into 3D videos automatically or
semi-automatically
3D movies have become more and more popular recently. However, most
films produced in the past 100 years were made in 2D. This project
investigates techniques to turn these 2D films into 3D videos
automatically or semi-automatically. Students need to study the methods
of 3D computer vision and investigate different approaches to achieve
the goal.
Google Glass is expected to become popular and should inspire a new
generation of computer vision applications. Here are some ideas:
(i) Text translation of what the user is looking at. The idea is to
develop automatic translation of what the user is seeing. When the user
is wearing Google Glass, what he/she sees is captured by a camera.
Using optical character recognition (OCR) technology, the machine can
translate the foreign text he/she is looking at into a language he/she
understands, then display the result on the glasses or speak it to the
user through earphones.
(ii) Tourist navigation: Using computer vision and GPS technology, the
system can display information about the area, advertising materials,
etc. to the user.
Projects for research students (PhD, MPhil, MSc): write to me (Prof.
K.H. Wong) if you are interested.
Computer vision based intelligent desk development (project webpage):
Many people still prefer to read printed papers and write on paper. If
a camera hangs over a desk and monitors the activities on it, the
system can capture the text and store it in a database for later use.
This project involves recognizing books and the text printed on them,
and locating where your finger is pointing for user input. We have
already finished part of the project, and new students can continue
from what we have built. Requirement: interested in computer vision and
programming (demo). Demos of our similar projects can be found at
http://www.youtube.com/watch?gl=TW&hl=zh-TW&v=DDfNqXUK6Uk
3D reconstruction using the Kinect sensor: Kinect is a new device for
finding the 3D range image of an environment in real time at a very low
cost. At CUHK, it is now being used in a number of interesting
applications, such as human counting and human tracking for security
applications. It is also used in medical applications, for example to
monitor the development of new muscles in a patient after surgery to
remove damaged muscles. The objective measurements can quantify muscle
development and help doctors decide on further treatment. This is a
collaborative project with the Prince of Wales Hospital, Hong Kong.
Requirement: interested in computer vision and programming.
Music playing robot development: A music playing robot is being
developed. It is a pioneering effort to make a robot that can play the
Chinese flute. The system integrates various mechanical control systems
to produce an air jet that blows at the embouchure hole of a flute,
emulating the way a human flutist plays the instrument. A sound based
feedback system is implemented to make sure the sound is rich and
appealing to the ear. It is an attempt to find a new paradigm of
computer music production: the music is produced by an authentic
musical instrument rather than by combining pre-recorded sounds as in
previous approaches (see our demos:
http://www.youtube.com/watch?v=lDSA3s55iQc,
http://www.youtube.com/watch?v=NJ7wv2z8Wgk&list=UUfy2EumiHMeoUorMFR0woZA&index=1&feature=plcp).
Requirement: interested in music, programming and signal processing.