Sparse RNA Folding: Time and Space Efficient algorithms

1 Sparse RNA Folding: Time and Space Efficient algorithms
Speaker: Shay Zakov. Joint work with: Rolf Backofen, Dekel Tsur, Michal Ziv-Ukelson.

2 Outline
Background; Problem definition; Standard algorithm; Space reduction technique; Time reduction technique; Summary

4 What is RNA?
A biological molecule composed of a sequence over 4 types of building blocks called bases or nucleotides. The different base types are denoted by the letters A, G, C, and U.

5 RNA's Structure
Primary Structure, Secondary Structure, Tertiary Structure. Structure determines Function.
Speaker notes (translated from Hebrew): RNA has structure, and we distinguish three levels of it. The first level, the primary structure, is the sequence of bases in the molecule, that is, which base follows which. The second level, the secondary structure, defines the bonds between bases that are not adjacent in the sequence but interact with each other. It is characterized by elements such as loops and stems (a stem is a base-paired region). The third level, the tertiary structure, defines the position of every atom of the molecule in three-dimensional space. At this level one can also see additional bonds between distant regions of the molecule, called pseudoknots. Structure is an important property of RNA because the structure of the molecule determines and influences its function. Researchers working on RNA molecules usually look at the secondary structure without going into the tertiary structure. The main reasons are the lack of experimental data on tertiary structure and the fact that predicting tertiary structure by computer is a computationally heavy task.
Additional notes: The primary structure of a biological molecule is the exact specification of its atomic composition and the chemical bonds connecting those atoms. Secondary structure is the general three-dimensional form of local segments of biopolymers such as proteins and nucleic acids (DNA/RNA). It does not, however, describe specific atomic positions in three-dimensional space, which are considered to be tertiary structure. The tertiary structure of a protein or any other macromolecule is its three-dimensional structure, as defined by the atomic coordinates. In short: 1, the sequence; 2, loops and stems (base-paired regions).

6 The goal: computationally predict the foldings of given RNA strings.
Example string: CAGUUUUUUCAGGUCUUUGGGGAACAUUCAACGCUGUCGGU
Main tools: Mfold, Vienna package

7 Approaches for folding prediction
Single strand folding (Waterman and Smith 78, Nussinov and Jacobson 80, Zuker and Stiegler 81, Wexler, Zilberstein and Ziv-Ukelson 07). Multiple strand simultaneous alignment with folding (Sankoff 85, Ziv-Ukelson, Gat-Viks, Wexler and Shamir 09).

9 (Simplified) Problem Definition
An RNA molecule is a string over the alphabet {A, C, G, U}. Each base (= letter) in the string may form a bond with at most one other base, where A can pair with U, and C with G. The base-pairs are nested: AAGUCUUGUCAGGCC A set of nested base-pairs is called a secondary structure, or a folding of the string.
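
The definition above can be made concrete with a small validity check. This is only a sketch under the simplified model of this slide (A pairs with U, C pairs with G, each base in at most one pair, no two pairs cross). The function name `is_folding` and the 0-based `(i, j)` pair representation are assumptions made for illustration and are not part of the talk.

```python
# Complementary bases under the simplified model (no G-U pairs assumed).
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def is_folding(s, pairs):
    """Return True if `pairs` (0-based tuples (i, j), i < j) is a folding of s."""
    used = set()
    for i, j in pairs:
        if not (0 <= i < j < len(s)):
            return False  # indices out of range or not ordered
        if (s[i], s[j]) not in PAIRS:
            return False  # the two bases are not complementary
        if i in used or j in used:
            return False  # a base may bond with at most one other base
        used.update((i, j))
    # Nested means that no two pairs cross: i < k < j < l is forbidden.
    for i, j in pairs:
        for k, l in pairs:
            if i < k < j < l:
                return False
    return True
```

For example, on `ACAGUUGCA` the pairs `(0, 4), (1, 3), (5, 8), (6, 7)` form a folding of size 4, while `(0, 4), (2, 5)` do not, because the two pairs cross.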

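11 A recursive solution
Co-terminus foldings: foldings in which the first and last bases of the string are paired with each other. Partitionable foldings: foldings that split at some point q into a folding of S_{i,q-1} and a folding of S_{q,j}.
L^c(i,j): the maximum size of a co-terminus folding of S_{i,j}.
L^p(i,j): the maximum size of a partitionable folding of S_{i,j}.
L(i,j): the maximum size of a folding of S_{i,j} (the objective function).
Recurrence:
L(i,j) = max{ L^c(i,j), L^p(i,j) }
L^c(i,j) = L(i+1,j-1) + 1 if s_i and s_j can pair, and -∞ otherwise
L^p(i,j) = max over i < q ≤ j of { L(i,q-1) + L(q,j) }
[Figure: example string A U C A U G G C A U]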

12 The Nussinov–Jacobson algorithm
A DP algorithm which performs a bottom-up computation of the recurrence. Uses a table M which stores solutions for substrings: M[i,j] = L(i,j). Upon reaching M[i,j], all entries which are needed for the computation of L(i,j) have already been computed and stored in M. [Figure: table M with rows indexed by i and columns by j]
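
A minimal sketch of this bottom-up computation, assuming the simplified base-pairing maximization model (A-U and C-G pairs only, no minimum loop length). The names `nussinov_table` and `max_folding_size` and the 0-based indexing are assumptions made here for illustration. Rows are filled from the bottom up and each row from left to right, so every entry read is already stored.

```python
# Complementary bases under the simplified model.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def nussinov_table(s):
    """Return M with M[i][j] = L(i, j) for the 0-based substring s[i..j]."""
    n = len(s)
    M = [[0] * n for _ in range(n)]  # length-1 substrings fold to size 0

    def get(i, j):
        # An empty substring (i > j) has a folding of size 0.
        return M[i][j] if i <= j else 0

    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            # Co-terminus case: s[i] pairs with s[j].
            lc = get(i + 1, j - 1) + 1 if (s[i], s[j]) in PAIRS else float("-inf")
            # Partitionable case: try every split point i < q <= j.
            lp = max(get(i, q - 1) + get(q, j) for q in range(i + 1, j + 1))
            M[i][j] = max(lc, lp)
    return M


def max_folding_size(s):
    """Maximum number of base pairs in a folding of s."""
    return nussinov_table(s)[0][-1] if s else 0
```

The inner `max` over all split points is what gives the cubic running time; the table itself takes quadratic space.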

13-23 The Nussinov–Jacobson algorithm
[Animation frames on the example string A C A G U U G C A: the co-terminus case, followed by the split points q = 2, 3, ..., 9 with i < q ≤ j, each combining the entries for S_{i,q-1} and S_{q,j}.]

24 The Nussinov–Jacobson algorithm
Each entry examines all split points i < q ≤ j. Space complexity: O(n^2). Time complexity: O(n^3).

25-26 The result of Wexler, Zilberstein and Ziv-Ukelson
Optimally co-terminus strings (OCTs): strings of length 1, and strings for which every optimal folding is co-terminus (L^c > L^p). [Figure: example string A A G C C U]

27 The result of Wexler, Zilberstein and Ziv-Ukelson
Claim (WZZ-07): there is a maximum-size partitionable folding of S_{i,j} whose split point q* is such that S_{q*,j} is an OCT. Triangle inequality property: L(i,j) ≥ L(i,q-1) + L(q,j) for every i < q ≤ j.

28 The result of Wexler, Zilberstein and Ziv-Ukelson
L^p(i,j) = max over i < q ≤ j of { L(i,q-1) + L(q,j) } = max over q ∈ Q_{i,j} of { L(i,q-1) + L(q,j) }
Q_{i,j} = {i < q ≤ j : S_{q,j} is an OCT}

29-30 The result of Wexler, Zilberstein and Ziv-Ukelson
L^p(i,j) = max over q ∈ Q_{i,j} of { L(i,q-1) + L(q,j) }, where Q_{i,j} = {i < q ≤ j : S_{q,j} is an OCT}.
Example on A C A G U U G C A: Q_{1,9} = {6, 9}, so only the split points q = 6 and q = 9 are examined.

31 The result of Wexler, Zilberstein and Ziv-Ukelson
L^p(i,j) = max over q ∈ Q_{i,j} of { L(i,q-1) + L(q,j) }, where Q_{i,j} = {i < q ≤ j : S_{q,j} is an OCT}.
Z = Z(S): the number of OCT substrings of S, n ≤ Z ≤ n^2. Time complexity: O(n^3) → O(nZ).

33 Reducing space complexity
L^c(i,j) reads L(i+1,j-1); L^p(i,j) = max over q ∈ Q_{i,j} of { L(i,q-1) + L(q,j) }.
Observation: the recurrence examines only solutions for substrings that either start at index i, start at index i+1, or are OCT substrings. Space complexity: O(n^2) → O(Z).

35-36 Further branch point elimination
STEP strings: strings for which in all optimal foldings the first base is paired. [Figure: example string A G U U G C]

37-38 Further branch point elimination
Claim: either L^p(i,j) = L(i+1,j), or L^p(i,j) = L(i,q-1) + L(q,j) for some q such that S_{i,q-1} is a STEP and S_{q,j} is an OCT. [Figure: example string A C U G G C G C G, split into a STEP prefix and an OCT suffix]

39 Further branch point elimination
P_{i,j} = {i+1} ∪ {i+1 < q ≤ j : S_{i,q-1} is a STEP and S_{q,j} is an OCT}
L^p(i,j) = max over q ∈ P_{i,j} of { L(i,q-1) + L(q,j) }

40 Efficient traversal of P_{i,j}
P_{i,j} = {i+1} ∪ {i+1 < q ≤ j : S_{i,q-1} is a STEP and S_{q,j} is an OCT}
Problem: how to efficiently traverse P_{i,j}? Solution: forward dynamic programming. Invariant: upon reaching M[i,j], the entry already contains the value of L^p(i,j).
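
The sketch below does not implement the forward dynamic programming traversal itself. It only illustrates the candidate set P_{i,j} by computing it from full tables and lets one check that restricting the split points to P_{i,j} leaves L^p unchanged. It relies on two characterizations that follow from the definitions: S_{i,j} is an OCT when it has length 1 or L^c(i,j) > L^p(i,j), and S_{i,j} is a STEP when L(i,j) > L(i+1,j). The names `fold_tables` and `candidate_set` and the 0-based indexing are assumptions made here.

```python
NEG_INF = float("-inf")
# Complementary bases under the simplified model.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def fold_tables(s):
    """Return tables (L, Lc, Lp) for all 0-based substrings s[i..j]."""
    n = len(s)
    L = [[0] * n for _ in range(n)]
    Lc = [[NEG_INF] * n for _ in range(n)]
    Lp = [[0] * n for _ in range(n)]

    def get(i, j):
        return L[i][j] if i <= j else 0  # empty substring

    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            if (s[i], s[j]) in PAIRS:
                Lc[i][j] = get(i + 1, j - 1) + 1
            Lp[i][j] = max(get(i, q - 1) + get(q, j) for q in range(i + 1, j + 1))
            L[i][j] = max(Lc[i][j], Lp[i][j])
    return L, Lc, Lp


def candidate_set(i, j, L, Lc, Lp):
    """Return P_{i,j} as a sorted list of 0-based split points (requires i < j)."""

    def is_oct(a, b):
        return a == b or Lc[a][b] > Lp[a][b]

    def is_step(a, b):
        rest = L[a + 1][b] if a + 1 <= b else 0
        return L[a][b] > rest  # dropping the first base loses a pair

    return [i + 1] + [q for q in range(i + 2, j + 1)
                      if is_step(i, q - 1) and is_oct(q, j)]
```

On `ACAGUUGCA`, `candidate_set(0, 8, *fold_tables("ACAGUUGCA"))` gives the 0-based split points 1, 5 and 8, that is, q = 2, 6 and 9 in the 1-based numbering of the slides.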

41-45 Efficient traversal of P_{i,j}
[Animation frames of the forward dynamic programming table. The last two frames show i = 3, q = 6, j = 6 and i = 3, q = 6, j = 9.]

46 Time complexity
Computation of L^p: O(nZ) → O(LZ). Traversing all entries and the computation of L^c: O(n^2). Total time: O(n^2 + LZ). Time complexity may be further reduced to O(LZ) by applying step encoding.

48 Results: single strand folding
Space complexity: O(n^2) → O(Z). Time complexity (base pairing maximization variant): O(nZ) → O(n^2 + LZ) → O(LZ).

49 Results: alignment with folding
Space complexity: O(n^2 m^2) → O(nm^2 + Z').

50 Thank you!

