RNA (ribonucleic acid) plays a pivotal role in the life sciences, extending far beyond its function as a messenger of genetic information. It is essential in gene regulation, protein synthesis, and various other cellular processes. Understanding how RNA folds into its precise tertiary (3D) structure is crucial because this conformation determines the molecule’s function and interactions. Despite substantial advances in artificial intelligence, particularly in protein biology, predicting RNA tertiary structures and designing RNA sequences that fold into specific structures (inverse folding) remain major challenges. The difficulty arises largely from RNA’s greater structural flexibility and the scarcity of three-dimensional structural data compared with proteins.
Although RNA 3D structural data are scarce, RNA sequence data are abundant in public databases. To leverage this wealth of sequence information and address the challenges in RNA structure prediction, Professor Li Yu’s team developed the RNA-FM language model. The model is trained with a masked language modeling objective on over 23 million RNA sequences, extracting both evolutionary and structural information to facilitate downstream tasks. Building on this foundation, the team introduced RhoFold+, a deep learning model that uses an end-to-end framework for RNA 3D structure prediction. By integrating evolutionary information and multiple sequence alignments, RhoFold+ significantly improves prediction accuracy.
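The masked language modeling idea mentioned above can be illustrated with a minimal sketch. This is not the RNA-FM implementation: the function name `mask_sequence`, the `<mask>` token, and the 15% masking rate are assumptions borrowed from common BERT-style setups. The sketch only shows how training examples might be built, with some nucleotides hidden and kept aside as the targets a model would learn to recover.

```python
import random

MASK = "<mask>"  # hypothetical mask token, not taken from RNA-FM


def mask_sequence(seq, mask_rate=0.15, seed=None):
    """Hide a random subset of nucleotides.

    Returns (tokens, targets): tokens is the sequence with some positions
    replaced by MASK, and targets maps each masked position to the
    nucleotide that was hidden there.
    """
    rng = random.Random(seed)
    tokens = list(seq.upper())
    if not tokens:
        return [], {}
    # Mask at least one position; 15% is an assumed, BERT-style rate.
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = rng.sample(range(len(tokens)), n_mask)
    targets = {i: tokens[i] for i in positions}
    for i in positions:
        tokens[i] = MASK
    return tokens, targets


example = "GGGAAACUUCGGUUUCCC"
masked, targets = mask_sequence(example, seed=0)
print("".join("_" if t == MASK else t for t in masked))
print(targets)
```

During training, a model would see `masked` and be scored on how well it predicts the nucleotides stored in `targets`, which is how unlabeled sequences can supply a learning signal without any structural annotation.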
RhoFold+ excels in RNA tertiary structure prediction, achieving strong results in benchmark tests. Retrospective analysis showed that RhoFold+ outperformed other methods, including AlphaFold3, on natural RNA targets from the community-wide CASP15 challenge. It also achieved an average RMSD of under 4 Å on RNA-Puzzles, another well-known benchmark, surpassing methods such as FARFAR2 as well as expert groups. Testing on 77 newly determined single-stranded RNA structures from the PDB confirmed its generalizability, with RhoFold+ outperforming models such as DeepFoldRNA and AlphaFold3. In cross-validation, RhoFold+ maintained an average RMSD of 4.65 Å across various structures, again demonstrating strong generalization. Beyond tertiary structures, it can infer secondary structures and inter-helical angles, and it supports MSA (multiple sequence alignment) sampling to explore multiple conformations, which enhances its practical applicability. RhoFold+ represents a notable advance in RNA tertiary structure prediction, offering a fully automated framework for accurate predictions directly from sequences. Although challenges remain, particularly with complex structures, RhoFold+ provides strong support for experimental RNA structure determination. Together, these capabilities should broaden its applicability in RNA biology research and accelerate the exploration of RNA functions.
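The RMSD (root-mean-square deviation) figures quoted above measure the average distance, in Å, between corresponding atoms of a predicted and a reference structure. The following is a minimal sketch of that calculation, assuming the two structures have already been superimposed and that atoms are listed in the same order. Real evaluations first compute an optimal superposition, which is omitted here, and the function name `rmsd` and the toy coordinates are illustrative only.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) coordinates.

    Assumes the structures are already superimposed and that the i-th
    atom in coords_a corresponds to the i-th atom in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy coordinates: every reference atom is displaced by (0, 3, 4),
# so each atom pair is exactly 5 units apart.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
reference = [(0.0, 3.0, 4.0), (1.0, 3.0, 4.0), (2.0, 3.0, 4.0)]
print(rmsd(predicted, reference))
```

A lower value means the predicted atoms sit closer to their experimentally determined positions.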
With the success of RhoFold+ in RNA tertiary structure prediction, Professor Li Yu’s team further expanded their research to tackle the inverse folding problem. By integrating the predictive capabilities of RhoFold+, they developed RhoDesign, a structure-to-sequence deep learning platform for the de novo generative design of RNA aptamers. This innovative approach is capable of designing RNA aptamers that are structurally similar yet sequence-dissimilar to known light-up aptamers that fluoresce in the presence of small molecules.
RhoDesign not only ensures structural fidelity but also allows for the in silico optimization of these aptamers to enhance their fluorescent activity. Experimental validation confirmed that several of the generated RNA aptamers exhibit fluorescence mechanisms similar to those of known light-up aptamers, demonstrating how structural predictions can guide the targeted and resource-efficient design of new RNA sequences. This result underscores the potential of deep learning-driven structural prediction to enable novel biomedical applications.
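The design goal described above, sequences that are structurally similar yet sequence-dissimilar to a known aptamer, can be expressed as a simple filter. The sketch below is not part of RhoDesign: the function names, the equal-length comparison without alignment, and both thresholds (`max_identity`, `max_rmsd`) are hypothetical choices made for illustration.

```python
def sequence_identity(seq_a, seq_b):
    """Fraction of matching positions between two equal-length sequences.

    Assumes no alignment is needed; sequences of different lengths
    would have to be aligned first, which is not done here.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("need two non-empty sequences of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return matches / len(seq_a)


def is_novel_design(designed, known, structure_rmsd,
                    max_identity=0.6, max_rmsd=4.0):
    """Keep designs that differ in sequence but stay close in structure.

    structure_rmsd is the RMSD (in Å) between the predicted structure of
    the design and the known aptamer structure, computed elsewhere.
    Both thresholds are illustrative assumptions.
    """
    dissimilar = sequence_identity(designed, known) <= max_identity
    similar_fold = structure_rmsd <= max_rmsd
    return dissimilar and similar_fold


print(sequence_identity("GGAUCC", "GGAACC"))
print(is_novel_design("AAAA", "AAUU", structure_rmsd=2.0))
```

In a pipeline of this kind, the structural score would come from a structure predictor applied to each candidate, so that only candidates passing both checks move on to experimental testing.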
In summary, the advancements achieved by Professor Li Yu’s team with RhoFold+ and RhoDesign mark milestones in RNA research. Together, these developments not only enhance our understanding of RNA structure-function relationships but also open new avenues for targeted RNA design in therapeutics and biotechnology, showcasing the transformative potential of deep learning in solving complex biological challenges.
The full text of each research paper can be found at the links below:
Accurate RNA 3D structure prediction using a language model-based deep learning approach – Nature Methods
https://www.nature.com/articles/s41592-024-02487-0
Large language modeling and deep learning shed light on RNA structure prediction – Nature Methods
https://www.nature.com/articles/s41592-024-02488-z
Deep generative design of RNA aptamers using structural predictions – Nature Computational Science
https://www.nature.com/articles/s43588-024-00720-6