Research
I work in computational biology and bioinformatics (CBB), a fascinating and rapidly growing research field that uses computational methods to study biological and medical phenomena. A primary driver of CBB is the huge amount of experimental data awaiting analysis and interpretation. For example,
As you can see, CBB research touches on many areas of computer science, including data mining, machine learning, database management and algorithm design. It has many real-world applications, and new research is urgently needed to solve the many new problems that keep emerging.
Here are some of my recent projects:
Whole-genome identification of sequence elements
The genomes of humans and many other organisms have been sequenced, yet understanding what the different parts of the DNA do and what roles they play in the overall system is still an ongoing endeavor. The ENCODE (Encyclopedia of DNA Elements) and modENCODE (ENCODE for model organisms) consortia combine the efforts of many academic and research institutes worldwide, with the aim of identifying and characterizing the functional elements in genomes. We have been participating in their analysis working groups. Specific projects include identifying non-coding RNAs, transcription factor binding sites and enhancers, and studying their functional roles.
Computational problems and techniques: discriminative learning
Selected publications:
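To illustrate what discriminative learning means for identifying sequence elements, here is a minimal sketch, assuming each DNA sequence is summarized by its overlapping k-mer frequencies and using plain logistic regression as the discriminative model. The function names (`kmer_features`, `train_logistic`, `predict_proba`) and the toy GC-rich versus AT-rich training sequences are hypothetical and do not represent the actual ENCODE/modENCODE pipelines, which use far richer features and real labeled regions.

```python
import math
from itertools import product

# Hypothetical representation: a DNA sequence becomes a vector of
# overlapping k-mer frequencies over the fixed alphabet ACGT.
def kmer_features(seq, k=2):
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = [0.0] * len(kmers)
    for i in range(len(seq) - k + 1):
        km = seq[i:i + k]
        if km in index:  # skip k-mers containing N or other symbols
            counts[index[km]] += 1.0
    total = sum(counts) or 1.0
    return [c / total for c in counts]  # counts -> frequencies

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Logistic regression fitted by stochastic gradient descent on the log-loss.
def train_logistic(X, y, lr=0.5, epochs=500):
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            grad = p - yi  # derivative of the log-loss w.r.t. the logit
            w = [wj - lr * grad * xj for wj, xj in zip(w, xi)]
            b -= lr * grad
    return w, b

def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Toy labels (hypothetical): 1 = "element" (GC-rich), 0 = background (AT-rich).
train_seqs = ["GCGCGCGCGC", "CGCGGCGCCG", "ATATATATAT", "TATAATTATA"]
labels = [1, 1, 0, 0]
X = [kmer_features(s) for s in train_seqs]
w, b = train_logistic(X, labels)
print(round(predict_proba(w, b, kmer_features("GCGCGGCGCG")), 3))
```

The model learns a decision boundary directly from positive and negative examples rather than modeling how each class generates sequences, which is the defining trait of a discriminative approach.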
Reconstruction of biological networks
Biological objects do not work alone; they interact with other objects to form complex networks. Cataloging these interactions is a first step toward understanding the functions of individual objects and of the biological system as a whole. Some specific interactions have been identified and studied thoroughly, but high-throughput techniques for probing whole networks have not yet reached the data quality required for in-depth analyses. We have been using computational methods that integrate different types of data, with the goal of reconstructing the not-yet-fully-known biological networks with high precision and coverage.
Computational problems and techniques: graph learning, data integration, kernel methods, probabilistic modeling, time-series analysis
Selected publications:
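One simple way to combine data integration with kernel methods is to turn each data type into a gene-by-gene similarity matrix (kernel) and score candidate interactions by a weighted sum of those kernels. A nonnegative weighted sum of kernels is itself a kernel. The sketch below assumes two hypothetical data sources, expression profiles (compared by Pearson correlation) and annotation sets (compared by Jaccard similarity), and simply thresholds the combined score. This "direct" scoring is a simplification; a full kernel method would, for example, train a classifier on known interactions using the combined kernel. All names, weights and data here are illustrative.

```python
from itertools import combinations
from statistics import mean, pstdev

def correlation_kernel(profiles):
    """Pearson correlation of expression profiles, i.e. a linear kernel on
    z-scored vectors. Assumes every profile is non-constant."""
    z = {}
    for gene, values in profiles.items():
        m, s = mean(values), pstdev(values)
        z[gene] = [(v - m) / s for v in values]
    n = len(next(iter(profiles.values())))
    return {(a, b): sum(x * y for x, y in zip(z[a], z[b])) / n
            for a, b in combinations(sorted(profiles), 2)}

def jaccard_kernel(annotations):
    """Jaccard similarity of annotation sets; assumes non-empty sets."""
    return {(a, b): len(annotations[a] & annotations[b]) / len(annotations[a] | annotations[b])
            for a, b in combinations(sorted(annotations), 2)}

def predict_edges(kernels, weights, threshold):
    """Score each gene pair by a weighted sum of kernels; keep high scorers."""
    scores = {pair: sum(w * k[pair] for w, k in zip(weights, kernels))
              for pair in kernels[0]}
    return sorted(pair for pair, s in scores.items() if s >= threshold)

# Hypothetical toy data for three genes.
profiles = {"g1": [1.0, 2.0, 3.0, 4.0],
            "g2": [2.0, 4.0, 6.0, 8.1],
            "g3": [4.0, 3.0, 2.0, 1.0]}
annotations = {"g1": {"ribosome", "translation"},
               "g2": {"translation"},
               "g3": {"transport"}}
kernels = [correlation_kernel(profiles), jaccard_kernel(annotations)]
edges = predict_edges(kernels, weights=[0.5, 0.5], threshold=0.5)
print(edges)  # [('g1', 'g2')]
```

Here g1 and g2 are supported by both data types (co-expression and a shared annotation), so only that pair passes the threshold. In practice the kernel weights would be learned rather than fixed by hand.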
Prediction of functionally coupled objects through co-evolutionary analysis
Biological objects that are functionally coupled, such as interacting protein domains, are constrained against independent evolutionary changes that would disrupt their interaction. Instead, they may co-evolve: the fitness loss caused by a change in one object is restored by a compensating change in the other. Reversing this reasoning, we can search for objects that display co-evolutionary patterns to discover functionally coupled objects and advance our understanding of the corresponding biological pathways.
Computational problems and techniques: statistical modeling, information theory, correlation analysis
Selected publications:
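A common information-theoretic way to look for co-evolutionary patterns is to measure the mutual information between pairs of columns in a multiple sequence alignment. Columns whose residues change together carry information about each other. The following is a minimal sketch of that idea, not necessarily the statistical model used in our projects. The toy alignment and the function names are hypothetical, and real analyses additionally correct for shared phylogeny and small numbers of sequences.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a, count_b = Counter(col_a), Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (x, y), c in count_ab.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((count_a[x] / n) * (count_b[y] / n)))
    return mi

def rank_column_pairs(alignment):
    """Rank all column pairs of an alignment by mutual information, highest first."""
    columns = list(zip(*alignment))
    scored = [((i, j), mutual_information(columns[i], columns[j]))
              for i, j in combinations(range(len(columns)), 2)]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical alignment: column 0 (A/G) and column 2 (K/E) change together,
# while column 1 (L) is fully conserved.
alignment = ["ALK", "ALK", "GLE", "GLE"]
print(rank_column_pairs(alignment)[0])  # ((0, 2), 1.0)
```

The co-varying pair scores 1 bit (knowing one column fully determines the other), whereas any pair involving the conserved column scores 0. Conservation alone therefore does not register as coupling.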
Mining useful information from data with uncertainty
Many types of data are uncertain because of factors such as low measurement resolution, noise and staleness. This uncertainty can hinder or mislead the mining of useful information. Taking data uncertainty, and related information such as repeated measurements, into account during mining can help uncover hidden patterns. We have been designing algorithms that handle data uncertainty for a variety of data mining problems.
Computational problems and techniques: clustering, classification, pattern mining, data structures, data pruning
Selected publications:
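As an illustration of why repeated measurements matter in clustering, the sketch below represents an uncertain object by its repeated measurements, treated as equally likely outcomes. It compares two ways of assigning the object to a cluster: collapsing the measurements to their mean, or using the expected distance to each centroid, in the spirit of expected-distance clustering of uncertain objects. The data, centroids and function names are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math

def mean_point(samples):
    dim = len(samples[0])
    return tuple(sum(s[i] for s in samples) / len(samples) for i in range(dim))

def expected_distance(samples, centroid):
    """Expected Euclidean distance to a centroid, treating each repeated
    measurement of the object as an equally likely outcome."""
    return sum(math.dist(s, centroid) for s in samples) / len(samples)

def assign_by_mean(samples, centroids):
    """Baseline: collapse measurements to their mean, then pick the nearest centroid."""
    m = mean_point(samples)
    return min(centroids, key=lambda name: math.dist(m, centroids[name]))

def assign_by_expected_distance(samples, centroids):
    """Uncertainty-aware: pick the centroid with the smallest expected distance."""
    return min(centroids, key=lambda name: expected_distance(samples, centroids[name]))

# Hypothetical object measured four times; one reading is far from the rest.
samples = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (10.0, 0.0)]
centroids = {"c1": (2.5, 0.0), "c2": (0.0, 0.0)}
print(assign_by_mean(samples, centroids))               # c1
print(assign_by_expected_distance(samples, centroids))  # c2
```

The two rules disagree: the mean lands exactly on c1, but three of the four measurements sit on c2, giving an expected distance of 2.5 to c2 versus 3.75 to c1. The sketch uses plain rather than squared Euclidean distance on purpose. With squared distance, the expected distance equals the squared distance of the mean plus a variance term that is the same for every centroid, so both rules would always agree.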
Tool development
We have also developed tools that the community can use to perform computational analysis: