All-at-once RNA folding with 3D motif prediction framed by evolutionary information – Nature Methods
Researchers have unveiled CaCoFold-R3D, a method that predicts RNA secondary structure and 3D motifs in one integrated pass—guided by evolutionary covariation. Instead of bolting motif searches onto a precomputed fold, the approach uses statistical grammars to jointly infer canonical helices (including pseudoknots) and recurrent 3D motifs such as GNRA tetraloops, K-turns, Loop E, and multi-helix junctions. The result: faster, more accurate models of RNA architecture at family scale.
How it works
CaCoFold-R3D takes a single RNA sequence or, preferably, a multiple-sequence alignment as input. It first applies R-scape to detect “positive” base pairs, which covary significantly more than expected from phylogeny alone, and “negative” pairs, which vary in the alignment without covarying and are therefore unlikely to form. These constraints meaningfully boost prediction accuracy and, crucially, help localize where 3D motifs can occur.
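To make the covariation signal concrete, here is a minimal Python sketch that scores a pair of alignment columns by mutual information (MI). It is a deliberately simpler stand-in: R-scape itself uses average-product-corrected covariation statistics evaluated against a phylogeny-aware null to assign E-values, not raw MI with a fixed cutoff. The `mi_threshold` value and the `classify_pair` labels are hypothetical illustrations of the positive/negative idea, not R-scape's calibrated procedure.

```python
import math
from collections import Counter


def column(alignment: list[str], k: int) -> str:
    """Return column k of an alignment given as equal-length strings."""
    return "".join(seq[k] for seq in alignment)


def column_mi(col_i: str, col_j: str) -> float:
    """Mutual information (bits) between two columns, skipping rows with a gap in either."""
    pairs = [(a, b) for a, b in zip(col_i, col_j) if a != "-" and b != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


def varies(col: str) -> bool:
    """True if the column contains more than one distinct non-gap residue."""
    return len(set(col) - {"-"}) > 1


def classify_pair(col_i: str, col_j: str, mi_threshold: float = 1.0) -> str:
    """Hypothetical labels: 'positive' = strong covariation; 'negative' = both vary, little covariation."""
    if column_mi(col_i, col_j) >= mi_threshold:
        return "positive"
    if varies(col_i) and varies(col_j):
        return "negative"
    return "uninformative"


aln = ["GAAC", "CAAG", "AAAU", "UAAA"]  # columns 0 and 3 pair by Watson-Crick rules
print(column_mi(column(aln, 0), column(aln, 3)))  # 2.0
print(classify_pair(column(aln, 0), column(aln, 3)))  # positive
```

An invariant column scores zero MI against any other column, which is why highly conserved motif nucleotides carry no covariation signal of their own.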
To integrate everything in one model, the method introduces a stochastic context-free grammar (SCFG) called RGBJ3J4-R3D. It layers covarying base pairs: the first layer captures a maximum set of nested pairs for the main secondary structure; subsequent layers accommodate pseudoknotted helices and additional tertiary interactions that have covariation support. Within the primary layer, RGBJ3J4-R3D predicts both helices and loop-embedded 3D motifs through maximum-probability dynamic programming.
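The layering idea can be sketched as repeatedly extracting a largest mutually nested (non-crossing) subset from a set of covarying pairs, using a Nussinov-style dynamic program restricted to those pairs. This is a simplification under stated assumptions: the input is taken to be already-called covarying pairs, and in CaCoFold-R3D the later layers are folded with their own constrained grammars rather than simply peeled off as below. Function names are hypothetical.

```python
def max_nested_subset(pairs: list[tuple[int, int]], n: int) -> list[tuple[int, int]]:
    """Largest subset of the given pairs (i < j, indices < n) with no two pairs crossing."""
    partners: dict[int, list[int]] = {}
    for i, j in pairs:
        partners.setdefault(i, []).append(j)

    # best[i][j] = most mutually nested pairs inside the interval [i, j]
    best = [[0] * n for _ in range(n)]

    def get(i: int, j: int) -> int:
        return best[i][j] if 0 <= i <= j < n else 0  # empty interval scores 0

    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            value = get(i + 1, j)  # option 1: residue i stays unpaired
            for k in partners.get(i, []):
                if k <= j:  # option 2: use pair (i, k), then solve inside and after it
                    value = max(value, 1 + get(i + 1, k - 1) + get(k + 1, j))
            best[i][j] = value

    # Traceback: recover one optimal set of pairs.
    chosen, stack = [], [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if best[i][j] == get(i + 1, j):
            stack.append((i + 1, j))
            continue
        for k in partners.get(i, []):
            if k <= j and best[i][j] == 1 + get(i + 1, k - 1) + get(k + 1, j):
                chosen.append((i, k))
                stack.extend([(i + 1, k - 1), (k + 1, j)])
                break
    return sorted(chosen)


def layer_pairs(pairs: list[tuple[int, int]], n: int) -> list[list[tuple[int, int]]]:
    """Peel off nested layers: layer 1 is a maximum nested subset; later layers hold crossing pairs."""
    remaining = sorted({(min(i, j), max(i, j)) for i, j in pairs})
    layers = []
    while remaining:
        layer = max_nested_subset(remaining, n)
        layers.append(layer)
        taken = set(layer)
        remaining = [p for p in remaining if p not in taken]
    return layers


helix = [(0, 9), (1, 8), (2, 7)]
pseudoknot = [(3, 12), (4, 11)]  # each of these crosses every helix pair
print(layer_pairs(helix + pseudoknot, 13))  # [[(0, 9), (1, 8), (2, 7)], [(3, 12), (4, 11)]]
```

The three stacked pairs form the first, nested layer; the two pairs that cross them land in a second, pseudoknot layer.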
The grammar explicitly models the most common multiloop junctions—three-way (J3) and four-way (J4)—and incorporates a library of R3D motif grammars across six loop classes: hairpin (HL), bulge (BL), internal (IL), J3, J4, and branch segments (BS) in higher-order junctions. Motifs are parameterized with profile HMMs to capture sequence signatures and with SCFG states to capture structural correlations. Rather than modeling each non–Watson-Crick interaction directly (which is complex and non-nested), the method models correlated residue groups as segments, enabling efficient integration into standard dynamic programming.
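A minimal Python sketch of the sequence side of motif parameterization: each loop segment is scored in log-odds bits against a gapless position-specific profile (no insert or delete states, so this is a position-specific scoring matrix rather than a full profile HMM), and a multi-segment motif is scored as the sum over its segments. The GNRA probabilities, the uniform background, and the function names are hypothetical; summing segments independently also ignores the cross-segment correlations that the SCFG states are there to capture.

```python
import math

BACKGROUND = 0.25  # uniform A/C/G/U background, an assumption for this sketch

# Hypothetical per-position emission probabilities for a 4-nt GNRA hairpin segment.
GNRA_PROFILE = [
    {"G": 0.85, "A": 0.05, "C": 0.05, "U": 0.05},  # G
    {"G": 0.25, "A": 0.25, "C": 0.25, "U": 0.25},  # N (any nucleotide)
    {"G": 0.45, "A": 0.45, "C": 0.05, "U": 0.05},  # R (purine)
    {"G": 0.05, "A": 0.85, "C": 0.05, "U": 0.05},  # A
]


def segment_log_odds(segment: str, profile: list[dict[str, float]]) -> float:
    """Log-odds score (bits) of one loop segment against a gapless position-specific profile."""
    if len(segment) != len(profile):
        raise ValueError("segment and profile lengths differ")
    return sum(math.log2(col[nt] / BACKGROUND) for nt, col in zip(segment, profile))


def motif_log_odds(segments: list[str], profiles: list[list[dict[str, float]]]) -> float:
    """A multi-segment motif (e.g. an internal-loop motif) scores as the sum over its segments."""
    if len(segments) != len(profiles):
        raise ValueError("one profile per segment is required")
    return sum(segment_log_odds(s, p) for s, p in zip(segments, profiles))


for loop in ["GAAA", "GCGA", "CUUC"]:
    print(loop, round(segment_log_odds(loop, GNRA_PROFILE), 2))
```

GAAA and GCGA both fit the GNRA pattern and score identically and positively, while CUUC scores negatively.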
Because comprehensive motif-annotated datasets are limited, the probability of using a motif rather than a generic loop is currently set by curation for each loop class and then split evenly across the motifs in that class (a maximum-entropy choice). Critically, CaCoFold-R3D restricts motif searches to loop segments framed by helices with covariation support, which substantially cuts false positives for small, information-sparse motifs.
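Both ideas in this paragraph are easy to state in code. Below is a minimal Python sketch with hypothetical names and an illustrative motif probability of 0.3 (not a value from the paper): `loop_class_priors` splits a curated per-class motif probability evenly across that class's motifs, and `hairpin_search_allowed` opens a hairpin for motif search only if at least one pair in its closing helix is covariation-supported. Limiting the check to hairpins and to perfectly stacked helices is a simplification.

```python
def loop_class_priors(p_motif: float, motif_names: list[str]) -> dict[str, float]:
    """Maximum-entropy split: every motif in a loop class gets an equal share of p_motif."""
    if not 0.0 <= p_motif <= 1.0:
        raise ValueError("p_motif must be a probability")
    if not motif_names:
        return {"generic_loop": 1.0}
    share = p_motif / len(motif_names)
    priors = {name: share for name in motif_names}
    priors["generic_loop"] = 1.0 - p_motif
    return priors


def closing_helix(i: int, j: int, pairs: set[tuple[int, int]]) -> list[tuple[int, int]]:
    """Collect stacked pairs (i, j), (i-1, j+1), ... outward from a loop's closing pair."""
    helix = []
    while (i, j) in pairs:
        helix.append((i, j))
        i, j = i - 1, j + 1
    return helix


def hairpin_search_allowed(i: int, j: int, pairs: set[tuple[int, int]],
                           covarying: set[tuple[int, int]]) -> bool:
    """Search hairpin loop i+1..j-1 for motifs only if its closing helix has covariation support."""
    return any(p in covarying for p in closing_helix(i, j, pairs))


print(loop_class_priors(0.3, ["GNRA", "UNCG", "T-loop"]))  # illustrative hairpin-class library
helix = {(0, 11), (1, 10), (2, 9), (3, 8)}
print(hairpin_search_allowed(3, 8, helix, covarying={(1, 10)}))  # True
print(hairpin_search_allowed(3, 8, helix, covarying=set()))      # False
```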
What’s new compared to prior tools
- All-at-once prediction: secondary structure and a large catalog of 3D motifs are inferred under a single probabilistic model.
- Broad motif coverage: handles motifs in HL, BL, IL, J3, J4, and branch segments, including topological variants.
- Alignment-aware: folds entire alignments, using evolutionary signal to guide both helices and motif placement.
- Scalable: same time complexity as mainstream SCFG-based secondary structure predictors; runs on ribosomal RNAs.
- Customizable: a descriptor file defines the motif library; users can add or refine motifs without reengineering the model.
- Discovery-ready: evolutionary framing plus flexible grammars help surface recurring, previously overlooked motifs.
By contrast, MC-Fold/RNA-MoIP and RNAwolf extend base-pair types but don’t identify specific 3D motifs; JAR3D and RMfam require predefined loop inputs and search one motif at a time; RMDetect and BayesPairing2 model specific motifs but are limited in scope, require pre-specified secondary structures, and can be computationally heavy or need windowing for long RNAs. CaCoFold-R3D subsumes these steps and scales to large families with a single unified run.
Evidence and accuracy
A prototype that modeled two motifs (GNRA tetraloop and K-turn) showed how covariation boosts reliability: adding covariation raised motif detection sensitivity from 84% to 95% and reduced false positives, even though motifs themselves often lack internal covariation. This led to the full RGBJ3J4-R3D implementation with 51 motif architectures (96 nonredundant variants in testing).
Run across all Rfam seed alignments, CaCoFold-R3D recovers known motifs in their expected families—e.g., K-turns in U3 snoRNA, U4 snRNA, the SAM riboswitch, and newly reported bacterial examples; GNRA tetraloops and Loop E motifs in 5S rRNA; junction motifs in magnesium and TPP riboswitches; and multiple domain IV motifs in metazoan SRP. Of 44 literature-documented motifs assessed, the method detects all but three.
Overall, the tool predicts 2,124 motifs across families, with 1,460 supported by covariation (at least one bounding helix covaries) spanning 591 Rfam families. As a control, column-permuted alignments (which preserve base composition but disrupt covariation) yield 121 covariation-supported motifs versus 1,460 in the real data, an estimated false discovery rate of 121/1,460 ≈ 8.3% for supported calls. For motifs lacking covariation support, the estimated FDR rises to 25.4%, underscoring the value of evolutionary framing.
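The permutation control can be sketched as follows. The assumption here (the text does not spell out the procedure) is that "column-permuted" means shuffling residues independently within each column across sequences, which keeps every column's composition but breaks any correlation between columns. The FDR estimate is then the number of calls on the shuffled data divided by the number on the real data. Function names are hypothetical.

```python
import random


def shuffle_columns(alignment: list[str], rng: random.Random) -> list[str]:
    """Permute residues within each column independently: composition kept, covariation destroyed."""
    columns = [list(col) for col in zip(*alignment)]
    for col in columns:
        rng.shuffle(col)
    return ["".join(row) for row in zip(*columns)]


def permutation_fdr(null_calls: int, real_calls: int) -> float:
    """Estimated FDR: calls expected by chance (from shuffled data) over calls made on real data."""
    if real_calls <= 0:
        raise ValueError("need at least one real call")
    return null_calls / real_calls


aln = ["GAAC", "CAAG", "AAAU", "UAAA"]
print(shuffle_columns(aln, random.Random(0)))
print(f"{permutation_fdr(121, 1460):.1%}")  # 8.3%
```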
Discovery in action
The team used CaCoFold-R3D to revisit a loop in group II intron RNAs that had been annotated as a left bulge. Covariation revealed a conserved three-way junction framed by three helices (one of them a lone base pair), later confirmed by crystal structures showing coaxial stacking and a non–Watson-Crick pair. Using the alignment-derived consensus, the researchers added a new J3 descriptor to the motif library; it emerged as the most frequent J3 junction in Rfam and one of the top five motifs overall, highlighting the method’s utility for motif discovery.
Performance and scale
On an Apple M3 Max (128 GB), 98% of Rfam families finish end-to-end in under 60 seconds; 95% in under 30 seconds. Even massive alignments are tractable: eukaryotic SSU rRNA (~1,978 columns, 90 sequences) completes in ~32 minutes; eukaryotic LSU rRNA (~3,680 columns, 88 sequences) in ~2.9 hours. Crucially, all 96 motifs and the secondary structure are predicted in a single pass—no per-motif searches, no separate folding step.
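As a rough consistency check on these timings, assume run time is dominated by the O(L³) cost of CYK-style SCFG dynamic programming over alignment length L. This assumption rests on the "same time complexity as mainstream SCFG-based" claim above and ignores the number of sequences and constant factors. Under it, the SSU timing can be scaled to the LSU alignment length; the function name is hypothetical.

```python
def cubic_extrapolation(known_minutes: float, known_length: int, target_length: int) -> float:
    """Scale a measured run time by (target_length / known_length) ** 3, assuming O(L^3) cost."""
    return known_minutes * (target_length / known_length) ** 3


predicted = cubic_extrapolation(32.0, 1978, 3680)  # SSU: ~32 min at ~1,978 columns
print(f"{predicted:.0f} min, about {predicted / 60:.1f} h")  # 206 min, about 3.4 h
```

A pure cubic extrapolation gives about 3.4 hours, somewhat above the reported ~2.9 hours, so the measured scaling between these two alignments is, if anything, slightly gentler than cubic. Because the inputs are approximate, this is only a ballpark check.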
Why it matters
CaCoFold-R3D shows that RNA folding and motif recognition can be solved jointly and at scale by leveraging evolution. The method’s alignment-aware grammars, explicit modeling of common junctions, and covariation-bounded motif search improve precision (an estimated 8.3% false discovery rate for covariation-supported motif calls versus 25.4% for unsupported ones) while keeping run times practical, even for the largest RNAs. For labs building or curating RNA alignments, the tool doubles as an engine for hypothesis generation: it flags recurrent 3D solutions where covariation points the way, accelerating both annotation and discovery across the RNA world.