1 Lecture 11. RNA Secondary Structure Prediction
The Chinese University of Hong Kong CSCI3220 Algorithms for Bioinformatics

2 Lecture outline
From sequences to functions
RNA secondary structures
Last update: 21-Nov-2015 CSCI3220 Algorithms for Bioinformatics | Kevin Yip-cse-cuhk | Fall 2015

3 From Sequences to Functions
Part 1 From Sequences to Functions

4 From sequences to functions
One of the biggest questions in biology: can one tell the function of a molecule (DNA/RNA/protein) from its sequence alone?
Sometimes, but usually not
Easier if we also know the structure
Common belief: sequence → structure → function
Of course, function also depends on the environment

5 Molecular structures
Four levels:
Primary structures: the sequence
Secondary structures: first formed, local
Tertiary structures: global; “folds”, “domains”
Quaternary structures: multiple molecules

6 Primary structures
Connections (strong covalent bonds vs. weak hydrogen bonds):
Which molecules are connected
Which atoms are connected
First-level constraints on the possible structures
Example: molecules close in primary structure must also be close in secondary, tertiary and quaternary structures
Image credit: Wikibooks

7 Primary structures
Orientation:
DNA, RNA: 5’-3’
Amino acids: amino (N) terminus to carboxyl (C) terminus
“Residue”: what remains after a water molecule is expelled

8 DNA secondary structures
Double helix
A-DNA (dehydrated samples): right-handed, 11bp per turn
Most common: B-DNA, 10.5bp per turn
Z-DNA (some methylated DNA): left-handed, 12bp per turn
Image credit: Wikipedia

9 DNA secondary structures
[Figure: A-DNA, B-DNA and Z-DNA. Image credit: Wikipedia]

10 RNA secondary structures
Can largely be projected onto a 2D plane
Elements: stem/hairpin loop, stacking pairs, bulge, internal loop, multi-loop, exterior loop, dangling nucleotides, less stable pair, coaxial stacking

11 RNA secondary structures
Pseudoknots: complex structures
Image credit: Wikipedia; Sperschneider and Datta, RNA 14(4), 2008

12 Protein secondary structures
Three main types:
α-helices
β-sheets
Coils (connectors)

13 DNA tertiary structures
Wrapped around nucleosomes formed by histone proteins
Condensed form at the beginning of mitosis and meiosis
Image credit: Wikipedia

14 RNA tertiary structures
Overall structure of an RNA
More studied for RNAs that are not translated into proteins (“non-coding” RNAs)
Example: tRNA
Image credit: Wikipedia

15 Protein tertiary structures
Complex structures
Mainly caused by weak forces (hydrogen bonds and hydrophobic interactions)
Occasionally stronger forces (disulfide bonds between cysteines)
The CATH hierarchy:
Class: composition of secondary structures
Architecture: overall shape
Topology: connection of secondary structures
Homologous superfamily: with common ancestor
Image credit: CATH

16 Quaternary structures
Types:
Protein subunit-protein subunit
Protein-protein
Protein-DNA
Protein-RNA
(Protein-small molecules)
RNA-RNA
...
[Figures: protein-subunit interaction (hemoglobin); protein-DNA interaction. Image credit: Wikipedia]

17 Structure and function
Why does function depend on structure?
Structure itself is the function (e.g., tubulins)
Binding:
Complementarity of interacting structures
Formation of special bonds

18 Structure and function
Why does function depend on structure? (cont’d)
Functional group (e.g., catalytic site)
Determining localization (e.g., transporter membrane proteins)
Image credit: Spudich, Science 288(5470), 2000

19 RNA Secondary Structures
Part 2 RNA Secondary Structures

20 Important RNA classes
Coding:
Messenger RNAs (mRNAs): for translating into proteins
Non-coding:
Ribosomal RNAs (rRNAs): parts of the ribosome complex
Transfer RNAs (tRNAs): delivering free amino acids during translation
Micro RNAs (miRNAs): binding mRNA targets to promote RNA degradation or repress translation
Small nucleolar RNAs (snoRNAs): guiding chemical modifications of other RNAs
Small nuclear RNAs (snRNAs): involved in mRNA splicing
Long non-coding RNAs (lncRNAs): some involved in gene regulation
...

21 Importance of RNA structures
Structure is important to many classes of RNA
Examples: tRNA, snoRNA

22 Representing RNA secondary structures
Formats:
Dot-bracket format
Stockholm format
...

23 Dot-bracket format
Sequence (nucleotides 10, 20, 30, etc. marked in red on the slide):
GUGAAUGAUGAAUUUAAUUCUUUGGUCCGUGUUUAUGAUGGGAAGUAAGACCCCCGAUAUGAGUGACAAAAGAGAUGUGGUUGACUAUCACAGUAUCUGACG
Structure:
......((((.......((((((.(((....((((((.((((..........)))).)))))).))).)))))).((((((.....)))))).)))).....
Image credit: Xihao Hu
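A dot-bracket string can be turned back into a list of base pairs with a stack: each opening bracket is pushed, and each closing bracket pairs with the most recently opened one. The sketch below is only an illustration; the function name `dot_bracket_to_pairs` and the toy structure are assumptions, not part of the lecture.

```python
def dot_bracket_to_pairs(structure):
    """Return the base pairs (1-based positions) encoded by a dot-bracket string."""
    stack = []   # positions of '(' that are still waiting for a partner
    pairs = []
    for pos, symbol in enumerate(structure, start=1):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


# Toy example (not the structure from the slide): a 3-pair stem with a 4-base hairpin loop.
print(dot_bracket_to_pairs("(((....)))"))
```

Because the brackets are matched like nested parentheses, this plain format can only describe structures without pseudoknots.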

24 Predicting RNA secondary structures
A basic assumption in structure prediction: the real structure has the lowest free energy
In a simplified view, more stable bonds → lower free energy
In the case of RNA secondary structures:
Good to form more pairs: A-U, C-G, sometimes G-U (a “wobble base pair”)
Good to form more stable pairs: C-G > A-U > G-U
Good to have stable sub-structures, e.g., stacking pairs

25 Predicting RNA secondary structures
We will assume there are no pseudoknots
With pseudoknots, there is currently no known algorithm that can find the optimal solution efficiently
We need two things:
A thermodynamic model for computing the free energy of a structure
A method for finding the structure with the minimum free energy
Does this setting sound familiar?
[Figure: a pseudoknot. Image credit: Wikipedia]

26 Further assumptions
The free energy of a secondary structure is the sum of the free energies of its sub-structures
Not the sum over individual bases/base pairs, as one base pair can participate in multiple sub-structures
The free energies of the sub-structures are independent

27 Problem definition
Given an RNA sequence, find a set of base pairs so that each base is paired at most once
Example:
Input sequence: GUGAAUGAUGAAUUU...ACG
Output set of base pairs: (7, 97) (8, 96) ... (18, 74) (81, 87)
Image credit: Xihao Hu
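The constraints on a candidate answer can be checked mechanically. The following is a minimal sketch, assuming the pairing rules from the earlier slide (A-U, C-G and the G-U wobble pair) and the no-pseudoknot assumption; the name `is_valid_pair_set` and the toy sequences are hypothetical.

```python
# Allowed base pairs, stored as unordered sets so that ("A", "U") and ("U", "A") both match.
ALLOWED = {frozenset("AU"), frozenset("CG"), frozenset("GU")}


def is_valid_pair_set(seq, pairs):
    """Check a list of 1-based base pairs (i, j) against the problem definition."""
    used = set()
    for i, j in pairs:
        if not (1 <= i < j <= len(seq)):
            return False                      # positions out of range or out of order
        if frozenset((seq[i - 1], seq[j - 1])) not in ALLOWED:
            return False                      # these two bases cannot pair
        if i in used or j in used:
            return False                      # a base is paired more than once
        used.update((i, j))
    # No pseudoknots: two pairs must not cross each other.
    for a, (i, j) in enumerate(pairs):
        for k, l in pairs[a + 1:]:
            if i < k < j < l or k < i < l < j:
                return False
    return True


print(is_valid_pair_set("GGGAAAACCC", [(1, 10), (2, 9), (3, 8)]))
```

Validity alone does not make a structure good; the free energy model decides which valid set is preferred.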

28 Linear view
 1 ...  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 ...
 . ...  (  (  (  (  .  .  .  .  .  .  .  (  (  (  (  (  (  .  ( ...

... 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 ...
...  )  .  )  )  )  )  )  )  .  (  (  (  (  (  (  .  .  .  .  .  )  )  )  )  )  )  .  )  )  )  ) ...

29 Thermodynamic model
We will consider four types of sub-structures here:
Stacking pairs: both (i, j) and (i+1, j-1) are in the set
Hairpin loop: there is a pair (i, j), where all bases from i+1 to j-1 are not paired
Bulge/Internal loop: there are two pairs (i, j) and (i1, j1), where i<i1<j1<j, and all bases from i+1 to i1-1 and from j1+1 to j-1 are not paired
Multi-loop: there are pairs (i, j), (i1, j1), ..., (ik, jk), where i<i1<j1<...<ik<jk<j, and all bases from i+1 to i1-1, from j1+1 to i2-1, ..., from j(k-1)+1 to ik-1, and from jk+1 to j-1 are unpaired
One base pair can participate in multiple structures

30 Stacking pairs
Both (i, j) and (i+1, j-1) are in the set
E.g., i:20, j:72
[Linear view with positions i, i+1, j-1 and j marked]

31 Hairpin loop
There is a pair (i, j), where all bases from i+1 to j-1 are not paired
E.g., i:81, j:87
[Linear view with positions i and j marked]

32 Bulge/Internal loop
Internal loop: there are two pairs (i, j) and (i1, j1), where i<i1<j1<j, and all bases from i+1 to i1-1 and from j1+1 to j-1 are not paired
Called a bulge if only one side has unpaired bases
E.g., i:23, j:69, i1:25, j1:67
[Linear view with positions i, i1, j1 and j marked]

33 Multi-loop
Multi-loop: there are pairs (i, j), (i1, j1), ..., (ik, jk), where i<i1<j1<...<ik<jk<j, and all bases from i+1 to i1-1, from j1+1 to i2-1, ..., from j(k-1)+1 to ik-1, and from jk+1 to j-1 are unpaired
E.g., k=2, i:10, j:94, i1:18, j1:74, i2:76, j2:92
[Linear view with positions i, i1, j1, i2, j2 and j marked]
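The four definitions can be applied mechanically: for a pair (i, j), collect the pairs directly enclosed by it and look at how many there are and where they sit. The sketch below does this for the 102-nucleotide example. The runs of unpaired bases that are missing from the transcript of the structure are filled in from the linear view, which is an assumption, and the names `partner_table` and `classify` are hypothetical.

```python
# Dot-bracket structure of the 102-nt example, written as runs so the positions are easy to check.
STRUCTURE = (
    "......" "((((" "......." "((((((" "." "(((" "...." "((((((" "." "(((("
    ".........."
    "))))" "." "))))))" "." ")))" "." "))))))" "."
    "((((((" "....." "))))))" "." "))))" "....."
)


def partner_table(structure):
    """Map every paired position (1-based) to the position it pairs with."""
    stack, partner = [], {}
    for pos, symbol in enumerate(structure, start=1):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            i = stack.pop()
            partner[i], partner[pos] = pos, i
    return partner


def classify(structure, i, j):
    """Name the sub-structure closed by the pair (i, j)."""
    partner = partner_table(structure)
    if partner.get(i) != j:
        raise ValueError(f"({i}, {j}) is not a base pair")
    inner = []            # pairs directly enclosed by (i, j)
    p = i + 1
    while p < j:
        q = partner.get(p)
        if q is not None and q > p:
            inner.append((p, q))
            p = q + 1     # skip everything enclosed by this inner pair
        else:
            p += 1        # unpaired base
    if not inner:
        return "hairpin loop"
    if len(inner) >= 2:
        return "multi-loop"
    i1, j1 = inner[0]
    if (i1, j1) == (i + 1, j - 1):
        return "stacking pair"
    if i1 == i + 1 or j1 == j - 1:
        return "bulge"
    return "internal loop"


for i, j in [(20, 72), (81, 87), (23, 69), (10, 94)]:
    print((i, j), classify(STRUCTURE, i, j))
```

For the four pairs used as examples on the slides, this labels (20, 72) as a stacking pair, (81, 87) as a hairpin loop, (23, 69) as an internal loop and (10, 94) as a multi-loop.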

34 One possible thermodynamic model
Unpaired bases have 0 free energy, and all the terms below have negative free energy:
eS(i, j): for the stacking pairs (i, j) and (i+1, j-1)
eH(i, j): for the hairpin loop closed at (i, j)
eBI(i, j, i1, j1): for a bulge or internal loop enclosed by the pairs (i, j) and (i1, j1)
eM(i, j, i1, j1, ..., ik, jk): for a multi-loop that consists of the pairs (i, j), (i1, j1), ..., (ik, jk) and satisfies i<i1<j1<...<ik<jk<j
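Combined with the additivity assumption stated earlier, the total free energy of a structure S can be written as one sum per sub-structure type. The symbol E(S) is introduced here for illustration and does not appear on the slides.

```latex
E(S) \;=\; \sum_{\text{stacking pairs}} eS(i,j)
\;+\; \sum_{\text{hairpin loops}} eH(i,j)
\;+\; \sum_{\text{bulges/internal loops}} eBI(i,j,i_1,j_1)
\;+\; \sum_{\text{multi-loops}} eM(i,j,i_1,j_1,\ldots,i_k,j_k)
```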

35 Finding the optimal structure
Dynamic programming
Let s be the RNA sequence with n nucleotides
Tables:
V(j): free energy of the optimal structure for s[1..j]; the final answer is based on V(n)
VP(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair
VBI(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair that closes a bulge or internal loop
VM(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair that closes a multi-loop

36 Update formulas
V(j): free energy of the optimal structure for s[1..j]
For j > 1, there are two cases:
j is unpaired: s[1..j-1] followed by the unpaired base j
j pairs with i: s[1..i-1] followed by the paired region s[i..j]
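The formula itself did not survive in the transcript. A recurrence consistent with the two cases, written as a sketch and assuming the base case V(0) = V(1) = 0 (an empty or single-base prefix has no pairs), would be:

```latex
V(j) \;=\; \min\Bigl\{\, V(j-1),\ \min_{1 \le i < j} \bigl[\, V(i-1) + VP(i,j) \,\bigr] \Bigr\},
\qquad V(0) = V(1) = 0
```

The first term covers the case where j is unpaired, and the inner minimum tries every possible partner i of j.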

37 Update formulas
VP(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair
We require that i < j
Cases shown on the slide:
Stacking pairs: (i, j) and (i+1, j-1) are both paired
Hairpin loop: all bases between i and j are unpaired
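The formula is missing from the transcript. Given the four sub-structure types of the thermodynamic model, one consistent form is the following sketch, which assumes VP(i, j) is set to +∞ whenever bases i and j cannot pair:

```latex
VP(i,j) \;=\; \min\bigl\{\, eH(i,j),\ \ eS(i,j) + VP(i+1,j-1),\ \ VBI(i,j),\ \ VM(i,j) \,\bigr\}
```

Each term corresponds to one kind of sub-structure that the pair (i, j) may close: a hairpin loop, a stacking pair, a bulge or internal loop, or a multi-loop.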

38 Update formulas
VBI(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair that closes a bulge or internal loop (i.e., i and j take the roles of i1 and j1)
Bulge or internal loop: an inner pair (i1, j1) with i < i1 < j1 < j, where all bases between i and i1 and between j1 and j are unpaired
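The formula is missing from the transcript. A form consistent with the definition of a bulge or internal loop, offered as a sketch, minimizes over every possible inner pair and excludes the stacking case, which is scored by eS instead:

```latex
VBI(i,j) \;=\; \min_{\substack{i < i_1 < j_1 < j \\ (i_1,j_1) \neq (i+1,\,j-1)}}
\bigl[\, eBI(i,j,i_1,j_1) + VP(i_1,j_1) \,\bigr]
```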

39 Update formulas
VM(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair that closes a multi-loop
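The formula is missing from the transcript. Following the definition of a multi-loop, a consistent form is sketched below; requiring k ≥ 2 inner pairs is an assumption made here to separate multi-loops from bulges and internal loops.

```latex
VM(i,j) \;=\; \min_{\substack{k \ge 2 \\ i < i_1 < j_1 < \cdots < i_k < j_k < j}}
\Bigl[\, eM(i,j,i_1,j_1,\ldots,i_k,j_k) + \sum_{h=1}^{k} VP(i_h,j_h) \Bigr]
```

The minimum ranges over all choices of k inner pairs, which is where the dependence of the running time on k comes from.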

40 Time and space requirements
V: n entries, each takes O(n) time
VP: O(n^2) entries, each takes constant time

41 Time and space requirements
VBI: O(n^2) entries, each takes O(n^2) time
VM: O(n^2) entries, each takes O(n^(2k)) time

42 Time and space requirements
Summary:
V: n entries, each takes O(n) time
VP: O(n^2) entries, each takes constant time
VBI: O(n^2) entries, each takes O(n^2) time
VM: O(n^2) entries, each takes O(n^(2k)) time
Total: O(n^2) space, O(n^(2k+2)) time
Exponential if k is unbounded
Some approximations could bring the time down to O(n^4): still huge for large n, but feasible for small or medium n
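To make the table structure concrete, here is a deliberately simplified sketch of the dynamic program. It is not the lecture's full algorithm: multi-loops are omitted entirely, every stacking pair, hairpin loop and bulge/internal loop gets a constant toy energy, and a hairpin loop is required to contain at least three unpaired bases. All of these choices, and the name `min_free_energy`, are assumptions made for illustration.

```python
from functools import lru_cache

# Bases that may pair: A-U, C-G and the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}
INF = float("inf")


def min_free_energy(s, e_stack=-2, e_hairpin=-1, e_bi=-1, min_loop=3):
    """Minimum free energy of s under a toy model without multi-loops."""
    n = len(s)

    @lru_cache(maxsize=None)
    def vp(i, j):
        """Best energy for s[i..j] (1-based) with i and j forming a pair."""
        if j - i - 1 < min_loop or (s[i - 1], s[j - 1]) not in PAIRS:
            return INF
        best = e_hairpin                               # (i, j) closes a hairpin loop
        best = min(best, e_stack + vp(i + 1, j - 1))   # (i, j) stacks on (i+1, j-1)
        for i1 in range(i + 1, j - 1):                 # (i, j) closes a bulge/internal loop
            for j1 in range(i1 + 1, j):
                if (i1, j1) != (i + 1, j - 1):
                    best = min(best, e_bi + vp(i1, j1))
        return best

    v = [0] * (n + 1)                                  # v[j]: best energy for s[1..j]
    for j in range(1, n + 1):
        v[j] = v[j - 1]                                # j is unpaired
        for i in range(1, j):                          # j pairs with i
            v[j] = min(v[j], v[i - 1] + vp(i, j))
    return v[n]


print(min_free_energy("GGGAAAACCC"))
```

With these toy energies the best structure for `GGGAAAACCC` is the three-pair stem closing a four-base hairpin loop, with two stacking terms and one hairpin term. The double loop over (i1, j1) inside each of the O(n^2) entries of `vp` is what gives the O(n^4) running time of this simplified version.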

43 Some remarks
If we allow general pseudoknots, there is currently no efficient way to find the RNA secondary structure with the minimum free energy
Other methods to predict RNA secondary structures:
Conservation and covariation, e.g., in the alignment
12345
ACGGU
ACUGU
CCAGG
UCCGA
High conservation: columns 2 and 4
Strong covariation: columns 1 and 5
Experimental methods (e.g., RNA footprinting)
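The conservation and covariation signals in the small alignment can be found with a few lines of code. The sketch below uses a simple working definition that is an assumption of this example rather than the lecture's: a column is conserved if all sequences agree, and two columns covary if their bases differ across sequences but always form a Watson-Crick pair.

```python
from itertools import combinations

WATSON_CRICK = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def conserved_columns(alignment):
    """1-based indices of columns where every sequence has the same base."""
    columns = list(zip(*alignment))
    return [c + 1 for c, col in enumerate(columns) if len(set(col)) == 1]


def covarying_columns(alignment):
    """1-based column pairs that vary across sequences yet always stay complementary."""
    columns = list(zip(*alignment))
    result = []
    for a, b in combinations(range(len(columns)), 2):
        observed = set(zip(columns[a], columns[b]))
        if len(observed) > 1 and observed <= WATSON_CRICK:
            result.append((a + 1, b + 1))
    return result


alignment = ["ACGGU", "ACUGU", "CCAGG", "UCCGA"]
print(conserved_columns(alignment))
print(covarying_columns(alignment))
```

On the alignment from the slide this reports columns 2 and 4 as conserved and columns 1 and 5 as covarying, which suggests that positions 1 and 5 form a base pair.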

44 Representing pseudoknots
Without pseudoknots, RNA secondary structures can be unambiguously represented by dots (unpaired bases) and brackets (base pairs)
What if there are pseudoknots? We need more types of brackets
Example:
GAAGUACAAUAUGUAACCG
.{.((((.....))})).. 
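With several bracket types, each type is matched on its own stack, so pairs written with different brackets are allowed to cross. The sketch below parses such a string and lists the crossing pairs. The set of bracket types beyond the `()` and `{}` used in the example is an assumption, and the function names are hypothetical.

```python
# Closing bracket -> matching opening bracket; each type is matched independently.
BRACKETS = {")": "(", "]": "[", "}": "{", ">": "<"}


def parse_extended_dot_bracket(structure):
    """Return sorted 1-based base pairs; any other symbol (e.g. '.') is unpaired."""
    stacks = {opener: [] for opener in BRACKETS.values()}
    pairs = []
    for pos, symbol in enumerate(structure, start=1):
        if symbol in stacks:
            stacks[symbol].append(pos)
        elif symbol in BRACKETS:
            # Raises IndexError if the string is malformed (closing without opening).
            pairs.append((stacks[BRACKETS[symbol]].pop(), pos))
    return sorted(pairs)


def crossing_pairs(pairs):
    """All pairs of base pairs (i, j), (k, l) with i < k < j < l."""
    return [((i, j), (k, l)) for (i, j) in pairs for (k, l) in pairs if i < k < j < l]


pairs = parse_extended_dot_bracket(".{.((((.....))})).." )
print(pairs)
print(crossing_pairs(pairs))
```

In the example, the pair written with curly brackets, (2, 15), crosses the pairs (4, 17) and (5, 16), which is exactly what a single bracket type cannot express.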

45 Case Study, Summary and Further Readings
Epilogue Case Study, Summary and Further Readings

46 Case study: Drug finding/design
Drugs are mostly chemicals with a specific structure that interacts with some biological objects
Examples:
Inhibiting the activities of an important protein of bacteria
Blocking the interaction between a virus and the receptors of the host cell
Stimulating the production of a hormone

47 Case study: Drug finding/design
To identify or design a chemical that targets a particular object (e.g., a protein), we need to make sure that the two bind tightly, which is checked through a process called docking

48 Case study: Drug finding/design
Computational problem:
Input: a target protein and a list of chemicals
Goal: find a chemical that binds the target well
Try different locations and orientations
Binding depends on structure and chemistry
Output: one or more chemicals that bind the target well
Difficulties:
Computational complexity: large search space for each protein-chemical combination, and many chemicals to try
Need to ensure specificity (not to target other proteins and cause side effects)

49 Case study: Drug finding/design
There is a game called FoldIt in which players try to fold proteins
Score based on free energy
Real-time update of scores and ranks
Players can discuss and share solutions
Resulted in some amazingly good folds compared with automatic predictions by computer programs

50 Summary
Functions depend on structures
Different levels of structures:
Primary (sequence)
Secondary (local)
Tertiary (global)
Quaternary (interactions)
RNA secondary structures can be predicted by dynamic programming based on a thermodynamic model
Important sub-structures:
Stacking pairs
Hairpin loops
Internal loops/bulges
Multi-loops
Pseudoknots

51 Further readings
Chapter 11 of Algorithms in Bioinformatics: A Practical Introduction
Speed-up of the algorithm
Algorithm for RNA structure prediction with pseudoknots
Free slides available
Parts VII and VIII of Fundamental Concepts of Bioinformatics
Protein folding and protein structure prediction
Docking

