New AI Approach by CUHK Engineering Investigates Multiple Gene Regulatory Mechanisms Concurrently for the Advancement of Biomedical Research

The following article was originally published in CUHK Press Releases on 30 August 2020.

Kevin Yip

A research team from the Department of Computer Science and Engineering (CSE) at The Chinese University of Hong Kong (CUHK) has developed a new Gene Expression Embedding frameworK (GEEK), which uses artificial intelligence technologies in machine learning and natural language processing to study the regulation of gene expression. In contrast to previous works that focused on one or a few regulatory mechanisms at a time, this new framework can study the joint effects of many mechanisms simultaneously. A research article describing this new study has been published in the renowned international science journal Nature Machine Intelligence. The framework may help study the causes of cancers and treatment methods.

Each human body contains tens of trillions of cells. While they mostly share the same DNA sequences, their gene activities can be markedly different. Such activities, referred to as “gene expression”, are affected by many regulatory mechanisms, such as transcription factor binding and protein interactions. In 2017, Prof. Kevin Yip from CUHK CSE and his research team studied one of the mechanisms that involves regulatory elements called enhancers. They investigated how enhancers are related to gene expression, and applied the results to discover three genes potentially related to liver cancer. This and other similar studies considered only individual gene regulatory mechanisms, and therefore could not fully understand the complex interplay between different mechanisms.

Prof. Yip used a metaphor to explain the intricate relationships among gene regulatory mechanisms. He said, “If you fail to turn on an electronic appliance using a remote controller, it seems like there is a problem with the controller, but the problem may also lie with the receiver or compatibility issues between the two. If we have a tool that can analyse the different components at the same time, it would be much easier to identify the root cause of the problem.”

The GEEK framework proposed by Prof. Yip’s team makes use of machine learning and natural language processing methods, treating genes as “words” to capture their relationships in “sentences”. In the published study, GEEK was used to study several diverse gene regulatory mechanisms, including contacts in three-dimensional genome architecture, protein interactions, genomic neighborhoods and broad chromatin accessibility domains. The results showed that gene expression could be better explained when these mechanisms were modeled together than when they were considered separately.
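The "genes as words" idea can be illustrated with a minimal sketch. This is not the GEEK implementation: the gene names, the groupings and the use of simple co-occurrence counts in place of a learned embedding are all assumptions made for illustration. Each "sentence" lists genes linked by one regulatory mechanism, and genes that share sentences with similar partners end up with similar vectors.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical "sentences": each inner list groups genes linked by one mechanism.
# Gene names and groupings are invented for illustration only.
sentences = {
    "3d_contact": [["G1", "G2", "G3"], ["G4", "G5"]],
    "protein_interaction": [["G1", "G2"], ["G4", "G5", "G6"]],
    "genomic_neighborhood": [["G2", "G3"], ["G5", "G6"]],
}

def cooccurrence_vectors(sentence_groups):
    """For each gene, count how often every other gene shares a sentence with it."""
    vectors = {}
    for group in sentence_groups:
        for sentence in group:
            for a, b in combinations(sorted(set(sentence)), 2):
                vectors.setdefault(a, Counter())[b] += 1
                vectors.setdefault(b, Counter())[a] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Pool the sentences from all mechanisms so they are modeled jointly.
joint = cooccurrence_vectors(sentences.values())

print(cosine(joint["G1"], joint["G2"]))  # genes sharing partners: positive similarity
print(cosine(joint["G1"], joint["G4"]))  # genes with no shared partners: zero
```

Pooling the sentences from all mechanisms into one set of vectors is the simplest way to mimic joint modeling; passing a single mechanism's sentences to `cooccurrence_vectors` would give the "considered separately" view for comparison.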

Cancer is caused by mutations that lead to abnormal cell proliferation. “GEEK represents a novel way to study gene expression in different types of cells, including cancer cells,” said Prof. Yip. “We will work closely with medical experts to try to explain some causes of liver cancer using GEEK. In the long run, we hope to extend our research to other cancer types and contribute to the development of new prevention and treatment methods.”

Among cancer treatments, immunotherapies are receiving a lot of attention due to their much greater efficacy in some cancer types. Yet the treatment outcome varies from patient to patient. Prof. Yip hopes that artificial intelligence can be used in the future to predict patients’ responses to immunotherapies, which would improve treatment precision and reduce the burden on patients.

The research project was supported by the General Research Fund of the Research Grants Council. Prof. Yip’s team took one and a half years to produce the results. Prof. Yip has more than ten years of experience in gene regulation research, and he was one of the first to use machine learning and natural language processing to study gene regulation.


This article is reproduced from a press release issued by The Chinese University of Hong Kong on 30 August 2020.

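A research team from the Department of Computer Science and Engineering at The Chinese University of Hong Kong (CUHK) has applied artificial intelligence technologies such as machine learning and natural language processing to the study of gene expression regulation, developing a new Gene Expression Embedding frameworK (GEEK). It can study the effects of multiple regulatory mechanisms on gene expression simultaneously, breaking from the traditional research approach of considering only one or a few mechanisms. The paper has been published in the authoritative international science journal Nature Machine Intelligence, and the findings may be extended to exploring the causes and treatment of cancer, advancing medical development.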






A schematic showing how machine learning and natural language processing are used to investigate multiple gene regulatory mechanisms.