Accurate RNA 3D Structure Prediction Using a Language Model-based Deep Learning Approach

RNA (ribonucleic acid) plays a pivotal role in the life sciences, extending far beyond its function as a messenger of genetic information. It is essential to gene regulation, protein synthesis, and many other cellular processes. Understanding how RNA folds into its tertiary (3D) structure is crucial because this conformation determines the molecule’s function and interactions. Despite major advances in artificial intelligence, particularly in protein biology, predicting RNA tertiary structures and designing RNA sequences that fold into specific structures (inverse folding) remain difficult problems. The difficulty stems largely from RNA’s greater structural flexibility and from the scarcity of three-dimensional structural data compared with proteins.

Figure 1. The RNA 3D structure and inverse folding problem.

 

Although RNA 3D structural data are scarce, RNA sequence data are abundant in databases. To exploit this sequence information for RNA structure prediction, Professor Li Yu’s team developed the RNA-FM language model. The model is trained with a masked language modeling objective on over 23 million RNA sequences, extracting both evolutionary and structural information to support downstream tasks. Building on this foundation, the team introduced RhoFold+, a deep learning model that predicts RNA 3D structure within an end-to-end framework. By integrating evolutionary information and multiple sequence alignments (MSAs), RhoFold+ significantly improves prediction accuracy.
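To make the masked language modeling idea concrete, the sketch below shows only the data-preparation step: some nucleotides are hidden, and the hidden originals become the targets a model would be trained to recover. It is a simplified illustration, not the RNA-FM implementation. The mask token name, the 15% masking rate, the function name `mask_sequence`, and the example sequence are all assumptions made for this example.

```python
import random

MASK = "<mask>"  # hypothetical mask token name, chosen for illustration


def mask_sequence(seq, mask_prob=0.15, seed=None):
    """Randomly hide nucleotides and return (masked_tokens, targets).

    `targets` maps each masked position to the original nucleotide
    that a model would be trained to predict at that position.
    The default rate of 0.15 is an assumed value, not taken from the article.
    """
    rng = random.Random(seed)
    tokens = list(seq)
    targets = {}
    for i, nucleotide in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = nucleotide
            tokens[i] = MASK
    return tokens, targets


# Made-up example sequence; any string over A, C, G, U works.
masked, targets = mask_sequence("GGGAAACUUCGGUUUCCC", mask_prob=0.15, seed=0)
print(masked)
print(targets)
```

During training, a model would see the masked tokens and be scored on how well it recovers the entries in `targets`, which is what lets it learn from unannotated sequences alone.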

Figure 2. The architecture of the RNA foundation model RNA-FM, which uses masked language modeling and is trained on 23 million unannotated sequences to generate rich evolutionary and structural embeddings.

 

 

Figure 3. The architecture of RhoFold+, which takes advantage of several deep learning techniques to accurately predict RNA 3D structures.

 

RhoFold+ excels in RNA tertiary structure prediction, achieving strong results in benchmark tests. A retrospective analysis showed that RhoFold+ outperformed other methods, including AlphaFold3, on the natural RNA targets of the community-wide CASP15 challenge. On RNA-Puzzles, another well-known challenge, RhoFold+ achieved an average RMSD of under 4 Å, surpassing methods such as FARFAR2 and other expert groups. Tests on 77 newly determined single-stranded RNA structures from the PDB database confirmed its generalizability, with RhoFold+ outperforming models such as DeepFoldRNA and AlphaFold3. In cross-validation, RhoFold+ maintained an average RMSD of 4.65 Å across various structures.

Beyond tertiary structures, RhoFold+ can infer secondary structures and inter-helical angles, and it supports MSA sampling to predict multiple conformations, which broadens its practical use. RhoFold+ thus represents a notable advance in RNA tertiary structure prediction, offering a fully automated framework for accurate predictions directly from sequence. Although challenges remain, particularly for complex structures, RhoFold+ provides strong support for experimental RNA structure determination and should help accelerate the exploration of RNA functions.
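The accuracy figures above are reported as RMSD (root-mean-square deviation) in Å, where lower values mean the predicted atoms lie closer to the experimentally determined ones. For N matched atoms, RMSD is the square root of the mean squared distance between corresponding positions. The sketch below computes it under the assumption that the two structures have already been superimposed; the optimal superposition step used in practice is omitted. The function name and the coordinates are made up for illustration and do not come from the paper.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) coordinates in Å.

    Assumes the structures are already superimposed and that the i-th
    entries of both lists refer to the same atom.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy coordinates for three atoms (illustrative values only).
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 2.0)]
print(f"{rmsd(predicted, reference):.3f} Å")
```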

Figure 4. Results of RhoFold+ on the community-wide challenges CASP15 and RNA-Puzzles, and on additional benchmarks testing its generalizability.

 

Following the success of RhoFold+ in RNA tertiary structure prediction, Professor Li Yu’s team expanded their research to tackle the inverse folding problem. By integrating the predictive capabilities of RhoFold+, they developed RhoDesign, a structure-to-sequence deep learning platform for the de novo generative design of RNA aptamers. This approach can design RNA aptamers that are structurally similar yet sequence-dissimilar to known light-up aptamers, which fluoresce in the presence of small molecules.
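One simple way to quantify "sequence-dissimilar" is the fraction of positions at which two sequences carry the same nucleotide. The sketch below computes that fraction for two equal-length sequences. It is an illustrative measure only, not the metric used in the RhoDesign study, and both sequences are made up for the example rather than taken from any real aptamer.

```python
def sequence_identity(seq_a, seq_b):
    """Fraction of positions with the same nucleotide in both sequences.

    Assumes equal-length sequences compared position by position;
    aligning sequences of different lengths is out of scope here.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and equal in length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)


# Hypothetical sequences, invented for illustration.
known = "GGAUCCGAAAGGAUCC"
designed = "GCAUGCGAAAGCAUGC"
print(sequence_identity(known, designed))
```

A designed sequence with low identity to a known aptamer but a similar predicted 3D structure would match the "structurally similar yet sequence-dissimilar" description above.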

Figure 5. The RhoDesign pipeline and experiments conducted.

 

RhoDesign not only ensures structural fidelity but also allows in silico optimization of these aptamers to enhance their fluorescent activity. Experimental validation confirmed that several of the generated RNA aptamers exhibit fluorescence mechanisms similar to those of known light-up aptamers. This result demonstrates how structural predictions can guide the targeted and resource-efficient design of new RNA sequences, and it underscores the potential of deep learning-driven structural prediction for novel biomedical applications.

Figure 6. The results of RhoDesign.

 

In summary, the advancements achieved by Professor Li Yu’s team with RhoFold+ and RhoDesign mark milestones in RNA research. Together, these developments not only enhance our understanding of RNA structure-function relationships but also open new avenues for targeted RNA design in therapeutics and biotechnology, showcasing the transformative potential of deep learning in solving complex biological challenges.

 

The full text of the research papers can be found at the links below:

Accurate RNA 3D structure prediction using a language model-based deep learning approach – Nature Methods

https://www.nature.com/articles/s41592-024-02487-0

 

Large language modeling and deep learning shed light on RNA structure prediction – Nature Methods

https://www.nature.com/articles/s41592-024-02488-z

 

Deep generative design of RNA aptamers using structural predictions – Nature Computational Science

https://www.nature.com/articles/s43588-024-00720-6