RNA Secondary Structure Prediction Dynamic Programming Approaches Sarah Aerni http://www.tbi.univie.ac.at/
Outline RNA folding Dynamic programming for RNA secondary structure prediction
RNA Basics RNA bases A,C,G,U Canonical Base Pairs A-U G-C Bases can only pair with one other base. Image: http://www.bioalgorithms.info/
Basics of RNA Structure
RNA Secondary Structure Pseudoknot Stem Interior Loop Single-Stranded Bulge Loop Junction (Multiloop) Hairpin loop Image– Wuchty
Nesting
Circle Plot All loops must have at least 3 bases in them Images – David Mount Linear RNA strand folded back on itself to create secondary structure Circularized representation uses this requirement Arcs represent base pairing All loops must have at least 3 bases in them Equivalent to having 3 base pairs between all arcs Exception: Location where the beginning and end of RNA come together in circularized representation
Pseudoknots
Trouble with Pseudoknots Images – David Mount Pseudoknots cause a breakdown in the Dynamic Programming Algorithm. In order to form a pseudoknot, checks must be made to ensure base is not already paired – this breaks down the recurrence relations
Base Pair Maximization C C Problem: Find the RNA structure with the maximum (weighted) number of nested pairings A G C C G G C A U A U U A A A C U G U G A C A C A A A A C U C G G C U G U G U C G G A G C C U U G A G G C G G A G C G A U G C A U C A A U U G A ACCACGCUUAAGACACCUAGCUUGUGUCCUGGAGGUCUAUAAGUCAGACCGCGAGAGGGAAGACUCGUAUAAGCG
Base Pair Maximization – Dynamic Programming Algorithm S(i,j) is the folding of the subsequence of the RNA strand from index i to index j which results in the highest number of base pairs Simple Example: Maximizing Base Pairing Bifurcation Unmatched at i Umatched at j Base pair at i and j Images – Sean Eddy
S(i, j – 1) S(i + 1, j) S(i + 1, j – 1) +1 Base Pair Maximization – Dynamic Programming Algorithm Alignment Method Align RNA strand to itself Score increases for feasible base pairs Each score independent of overall structure Bifurcation adds extra dimension S(i, j – 1) S(i + 1, j) Initialize first two diagonal arrays to 0 Fill in squares sweeping diagonally Bases cannot pair, similar to unmatched alignment Bases can pair, similar to matched alignment Dynamic Programming – possible paths S(i + 1, j – 1) +1 Images – Sean Eddy
k = 0 : Bifurcation max in this case Base Pair Maximization – Dynamic Programming Algorithm Alignment Method Align RNA strand to itself Score increases for feasible base pairs Each score independent of overall structure Bifurcation adds extra dimension k = 0 : Bifurcation max in this case S(i,k) + S(k + 1, j) Reminder: For all k S(i,k) + S(k + 1, j) Reminder: For all k S(i,k) + S(k + 1, j) Initialize first two diagonal arrays to 0 Fill in squares sweeping diagonally Bases cannot pair, similar Bases can pair, similar to matched alignment Dynamic Programming – possible paths Bifurcation – add values for all k Images – Sean Eddy
Example
Base Pair Maximization - Drawbacks Base pair maximization will not necessarily lead to the most stable structure May create structure with many interior loops or hairpins which are energetically unfavorable Not biologically reasonable
References How Do RNA Folding Algorithms Work?. S.R. Eddy. Nature Biotechnology, 22:1457-1458, 2004.