Published by Mervyn McDowell. Modified over 8 years ago.
1
Rapid ab initio RNA Folding Including Pseudoknots via Graph Tree Decomposition
Jizhen Zhao, Liming Cai (Computer Science)
Russell Malmberg (Plant Biology)
University of Georgia
2
Why another RNA folding algorithm?
The need for RNA analysis tools has increased because of the number of recently found functional RNAs (i.e., ncRNAs).
RNA folding algorithms are not completely satisfactory, in spite of having been intensively studied for more than 25 years.
3
Increased number of ncRNAs
ncRNAs have functions other than coding for proteins, e.g., as structural, catalytic, and regulatory factors.
ncRNA genes do not have strong statistical features such as ORFs or polyadenylation, except that transcribed ncRNA molecules can fold into stable (and unique) secondary or tertiary structures.
4
Increased number of ncRNAs
rRNAs and tRNAs
RNA maturation: snRNA in recognizing splicing sites
RNA modification: snoRNA converting uridine to pseudo-uridine
Regulation of gene expression and translation: e.g., miRNAs
DNA replication: e.g., telomerase RNAs, the template for addition of telomeric repeats
Etc.
Located in introns, intergenic regions, or 5' and 3' UTRs
5
Increased number of ncRNAs (Bompfunewerer et al., 2005)
Class; Size; Function; Phylogenetic distribution
tRNA; 70-80; translation; ubiquitous
rRNA (16S/18S, 28S+5.8S/23S, 5S); 1.5K, 3K, 130; translation; ubiquitous
RNase P, MRP; 220-440, 250-350; tRNA maturation; ubiquitous, eukarya
snoRNA, telomerase; 130, 400-550; pseudouridinylation, addition of repeats
snRNA U1 ~ U6; 100-600, 130-140; spliceosome, mRNA maturation; Eukarya; Eukarya, archaea
U7, 7SK; ~65, ~300; histone mRNA maturation, translational regulation; eukaryotes, vertebrata
tmRNA; 300-400; tags protein for proteolysis; bacteria
miRNA; ~22; post-transcriptional regulation; multicellular organisms
6
Long history of RNA folding
First simple RNA folding algorithm (Nussinov 1978)
Thermodynamics-based (Zuker & Stiegler 1981)
Zuker's (1989) mFOLD 3.2
RNAfold (part of Vienna Package 1.6.1)
Not all that accurate on a single sequence
Inherent computational complexity from DP
Unable to predict pseudoknots
7
Background
Base pairings allow RNA to fold
Watson-Crick base pairs: A-U, C-G
Wobble pair: G-U
Non-canonical pairs are also possible
8
[Figure: chemical structures of cytosine, guanine, uracil, and adenine, with the example strand 5'-u-u-c-c-g-a-a-g-c-u-c-a-a-c-g-g-g-a-a-a-u-g-a-g-c-u-3']
9
Secondary structure is important to tertiary structure
10
[Figure: secondary structure elements: hairpin loop, junction (multiloop), bulge loop, single-stranded region, interior loop, stem, pseudoknot. Image: Wuchty]
11
[Figure: example RNA secondary structure]
Stem (double helix): stacked base pairs
Loop: strand of unpaired bases
12
[Figure: example RNA structure with a pseudoknot]
Pseudoknots: crossing patterns of stems
14
Terminates translation errors
Bacterial tmRNA consensus structure (Felden et al. 2001, NAR 29)
15
Pseudoknots in TMV 3' UTR
Promotes efficient translation
Binds EF1A, cooperates with 5' UTR
(Leathers et al. 1993, MCB 13; Zeenko et al. 2002, JVI 76)
16
Previous work (Nussinov's)
Maximizing the number of base pairs (Nussinov et al., 1978)
Simple case: every base pair (i, j) contributes a score of 1
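The base-pair maximization idea can be sketched in a few lines of Python. This is a minimal illustration of the Nussinov-style recurrence, not code from the presentation: the function names, the inclusion of the G-U wobble pair, and the `min_loop` parameter (minimum number of unpaired bases inside a hairpin) are assumptions made for the example.

```python
# Sketch of Nussinov-style base-pair maximization (nested pairs only).
# Assumptions: G-U wobble pairs count, and a hairpin loop needs at least
# `min_loop` unpaired bases; neither detail is stated on the slide.

ALLOWED = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    """Watson-Crick pairs plus the G-U wobble pair."""
    return (a, b) in ALLOWED


def nussinov(seq, min_loop=3):
    """Return the maximum number of nested base pairs in seq."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = maximum number of pairs within seq[i..j]
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = max(best[i + 1][j], best[i][j - 1])  # i or j unpaired
            if can_pair(seq[i], seq[j]):
                score = max(score, best[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # split into two independent parts
                score = max(score, best[i][k] + best[k + 1][j])
            best[i][j] = score
    return best[0][n - 1]


print(nussinov("GGGAAAUCC"))
```

The triple loop gives O(n^3) time at the nucleotide level, and the recurrence only ever combines nested or side-by-side substructures, which is why this family of methods cannot represent pseudoknots.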
17
Previous work (Zuker's)
Thermodynamic energy-based method (Zuker and Stiegler 1981)
Energy minimization algorithm: find the secondary structure that minimizes the free energy (ΔG)
ΔG is calculated as the sum of individual contributions of:
–loops
–base pairs
–secondary structure elements
18
Previous work (Zuker's)
Free-energy values (kcal/mole at 37 °C)
Energies of stems are calculated as stacking contributions between neighboring base pairs
19
Previous work (Zuker’s)
20
Previous work (Zuker's)
MFOLD: computing loop-dependent energies
21
Difficult issues
Energy associated with any position is only influenced by local sequence and structure
mFOLD does not predict pseudoknots
PKnots (Eddy and Rivas 1999): predicts restricted cases of pseudoknots, O(n^6) time and O(n^4) space
Minimum-energy-based pseudoknot prediction is NP-hard (Lyngsø and Pedersen 2000)
22
Pseudoknots drastically increase the complexity
23
Heuristic RNA folding algorithms
ILM (Ruan et al. 2004)
HotKnots (Ren et al. 2005)
Fast, sometimes slow
Unlimited class of pseudoknots
Do not guarantee the optimality of the predicted structure
24
This work
Graph-theoretic approach, avoids nucleotide-level DP
Unlimited pseudoknot structures
Optimal solutions
Fast
Comparable performance in accuracy
25
This work (summary)
1. Model: similar to ILM, without loop energy
2. Approach: find all stable stems, construct a stem graph; reduce folding to the independent set problem
3. Techniques: tree-decompose the stem graph; DP to obtain the optimal solution
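The first step, collecting candidate stems, might look roughly like the sketch below. The slides do not say how "stable" is decided, so this example substitutes a simple length threshold for an energy criterion; `find_stems`, `min_len`, and `min_loop` are hypothetical names chosen for illustration.

```python
# Sketch: enumerate maximal stems, i.e. runs of stacked base pairs
# (i, j), (i+1, j-1), ... that cannot be extended outward.
# Assumption: a stem is kept when it has at least `min_len` pairs; the
# presentation's actual stability (energy) criterion is not shown.

PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def find_stems(seq, min_len=3, min_loop=3):
    """Return stems as (i, j, length): pairs (i+k, j-k) for k < length."""
    seq = seq.upper()
    n = len(seq)
    stems = []
    for i in range(n):
        for j in range(i + 1, n):
            if (seq[i], seq[j]) not in PAIRS:
                continue
            # Skip starts that merely continue a longer stem from outside.
            if i > 0 and j < n - 1 and (seq[i - 1], seq[j + 1]) in PAIRS:
                continue
            length = 0
            # Extend inward while bases pair and the enclosed loop stays
            # longer than min_loop.
            while ((j - length) - (i + length) > min_loop
                   and (seq[i + length], seq[j - length]) in PAIRS):
                length += 1
            if length >= min_len:
                stems.append((i, j, length))
    return stems


print(find_stems("GGGAAAUCC"))
```

Each stem found this way would become one vertex of the stem graph, so the later optimization works over stems rather than individual nucleotides.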
26
This work (approach)
27
A set of non-overlapping stems corresponds to an independent set of the stem graph. The weight of each vertex is related to the energy of the corresponding stem.
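To make the correspondence concrete, here is a small sketch that builds a stem graph and finds a maximum weight independent set by exhaustive search. It assumes stems are given as hypothetical `(i, j, length)` tuples, that two stems conflict when they share a nucleotide, and that the number of base pairs stands in for the energy-derived weight; the exhaustive search is only for illustration and is not the method used in the presentation.

```python
from itertools import combinations


def positions(stem):
    """Nucleotide positions covered by a stem (i, j, length)."""
    i, j, length = stem
    return set(range(i, i + length)) | set(range(j - length + 1, j + 1))


def build_stem_graph(stems):
    """Adjacency sets: two stems are adjacent when they share a nucleotide."""
    adj = {v: set() for v in range(len(stems))}
    for u, v in combinations(range(len(stems)), 2):
        if positions(stems[u]) & positions(stems[v]):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def brute_force_mwis(adj, weight):
    """Exhaustive maximum weight independent set (small graphs only)."""
    best_weight, best_set = 0, frozenset()
    vertices = list(adj)
    for r in range(1, len(vertices) + 1):
        for subset in combinations(vertices, r):
            if all(v not in adj[u] for u, v in combinations(subset, 2)):
                total = sum(weight[v] for v in subset)
                if total > best_weight:
                    best_weight, best_set = total, frozenset(subset)
    return best_weight, best_set


# Made-up stems for illustration; weights are stem lengths (an assumption).
stems = [(0, 20, 4), (2, 15, 3), (6, 30, 3), (22, 40, 5)]
adj = build_stem_graph(stems)
weight = [length for _, _, length in stems]
print(brute_force_mwis(adj, weight))
```

In this toy input the selected stems 0 and 2 cross each other (0 < 6 < 20 < 30), which is a pseudoknot: the independent set formulation only forbids shared nucleotides, not crossing stems.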
28
This work (techniques) A tree decomposition of the stem graph Tree width t = 4
29
This work (techniques)
A tree decomposition of the stem graph
Tree width t = 4
Find an approximate tree decomposition of width t
MWIS (maximum weight independent set) can be found in time O(2^t N), N = O(n^2), by DP over the tree
Time can be improved to O(e^(t/e)) = O(1.44^t)
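The DP over the tree can be sketched as follows. This is a generic textbook-style dynamic program for maximum weight independent set on a given tree decomposition, written under the assumption that the decomposition is valid (every edge lies inside some bag, and the bags containing any vertex form a connected subtree); it is not the authors' implementation, and finding the decomposition itself is not shown.

```python
from itertools import combinations


def independent_subsets(bag, adj):
    """Yield every independent subset of a bag as a frozenset."""
    bag = sorted(bag)
    for r in range(len(bag) + 1):
        for subset in combinations(bag, r):
            if all(v not in adj[u] for u, v in combinations(subset, 2)):
                yield frozenset(subset)


def mwis_tree_decomposition(bags, tree, adj, weight, root=0):
    """bags: {bag_id: set of vertices}; tree: {bag_id: neighbouring bag_ids}."""

    def solve(node, parent):
        # table[S] = best weight in this subtree when bag `node` selects S
        table = {s: sum(weight[v] for v in s)
                 for s in independent_subsets(bags[node], adj)}
        for child in tree[node]:
            if child == parent:
                continue
            child_table = solve(child, node)
            shared = bags[node] & bags[child]
            # Best child value per choice on the shared vertices, with the
            # shared weight removed so it is not counted twice.
            best = {}
            for s_child, value in child_table.items():
                key = s_child & shared
                extra = value - sum(weight[v] for v in key)
                if extra > best.get(key, float("-inf")):
                    best[key] = extra
            for s in table:
                table[s] += best[s & shared]
        return table

    return max(solve(root, None).values())


# Toy example: a 4-cycle 0-1-2-3-0 with a width-2 decomposition.
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
weight = {0: 1, 1: 5, 2: 1, 3: 5}
bags = {0: {0, 1, 2}, 1: {0, 2, 3}}
tree = {0: {1}, 1: {0}}
print(mwis_tree_decomposition(bags, tree, adj, weight))
```

Each bag of width t has at most 2^(t+1) subsets to tabulate, which is where the exponential dependence on t (rather than on the sequence length) comes from.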
30
This work (experimental results)
Data sets:
50 tRNAs (length 71 - 79)
50 pseudoknots (23 - 113)
11 large RNAs (210 - 412)
Compared with:
PKnots (DP, optimal, restricted pseudoknots)
ILM (heuristic, unrestricted)
HotKnots (heuristic, unrestricted)
Measures:
sensitivity = TP / real total
specificity = TP / (TP + FP)
Time
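The two accuracy measures can be computed from sets of base pairs as in the short sketch below. The function name and the representation of a structure as a set of `(i, j)` pairs are assumptions for illustration; note that TP/(TP+FP), called specificity on the slide, is the quantity often referred to elsewhere as positive predictive value.

```python
def pair_accuracy(predicted, real):
    """Return (sensitivity, specificity) as defined on the slide.

    sensitivity = TP / number of real base pairs
    specificity = TP / (TP + FP)
    """
    predicted, real = set(predicted), set(real)
    tp = len(predicted & real)   # correctly predicted pairs
    fp = len(predicted - real)   # predicted pairs absent from the real structure
    sensitivity = tp / len(real) if real else 0.0
    specificity = tp / (tp + fp) if predicted else 0.0
    return sensitivity, specificity


# Made-up base pairs, for illustration only.
real = {(0, 8), (1, 7), (2, 6), (10, 20)}
predicted = {(0, 8), (1, 7), (3, 5)}
print(pair_accuracy(predicted, real))
```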
31
This work (experimental results)
32
Conclusion
A new graph-theoretic algorithm for RNA folding
Performance comparable with the best in both accuracy and speed
Much room for improvement remains
Applications in multiple structure alignment as well as in folding single sequences
Part of an NIH project for ncRNA gene search