1
Lecture 11. RNA Secondary Structure Prediction
The Chinese University of Hong Kong CSCI3220 Algorithms for Bioinformatics
2
Lecture outline
  From sequences to functions
  RNA secondary structures
Last update: 21-Nov-2015 CSCI3220 Algorithms for Bioinformatics | Kevin Yip-cse-cuhk | Fall 2015
3
Part 1: From Sequences to Functions
4
From sequences to functions
One of the biggest questions in biology: can one tell the function of a molecule (DNA/RNA/protein) from its sequence alone?
  Sometimes, but usually not
  Easier if we also know the structure
Common belief: sequence → structure → function
  Of course, function also depends on the environment
5
Molecular structures
Four levels:
  Primary structures: the sequence
  Secondary structures: first formed, local
  Tertiary structures: global; “folds”, “domains”
  Quaternary structures: multiple molecules
6
Primary structures
Connections (strong covalent bonds vs. weak hydrogen bonds)
  Which molecules are connected
  Which atoms are connected
First-level constraints on the possible structures
  Example: molecules close in primary structure must also be close in secondary, tertiary and quaternary structures
Image credit: Wikibooks
7
Primary structures
Orientation:
  DNA, RNA: 5’-3’
  Amino acids: amino (N) terminus to carboxyl (C) terminus
“Residue”: what remains after a water molecule is expelled
8
DNA secondary structures
Double helix
  A-DNA (dehydrated samples): right-handed, 11 bp per turn
  B-DNA (most common): 10.5 bp per turn
  Z-DNA (some methylated DNA): left-handed, 12 bp per turn
Image credit: Wikipedia
9
DNA secondary structures
[Figure: A-DNA, B-DNA and Z-DNA. Image credit: Wikipedia]
10
RNA secondary structures
Can largely be projected onto a 2D plane
Structural elements: stem/hairpin loop, stacking pairs, bulge, internal loop, multi-loop, exterior loop, dangling nucleotides, less stable pair, coaxial stacking
11
RNA secondary structures
Pseudoknots: complex structures
Image credit: Wikipedia; Sperschneider and Datta, RNA 14(4), 2008
12
Protein secondary structures
Three main types:
  α-helices
  β-sheets
  Coils (connectors)
13
DNA tertiary structures
Wrapped around nucleosomes formed by histone proteins
Condensed form at the beginning of mitosis and meiosis
Image credit: Wikipedia
14
RNA tertiary structures
Overall structure of an RNA
More studied for RNAs that are not translated into proteins (“non-coding” RNAs)
  Example: tRNA
Image credit: Wikipedia
15
Protein tertiary structures
Complex structures
  Mainly caused by weak forces (hydrogen bonds and hydrophobic interactions)
  Occasionally stronger forces (disulfide bonds between cysteines)
The CATH hierarchy
  Class: composition of secondary structures
  Architecture: overall shape
  Topology: connection of secondary structures
  Homologous superfamily: with common ancestor
Image credit: CATH
16
Quaternary structures
Types:
  Protein subunit-protein subunit
  Protein-protein
  Protein-DNA
  Protein-RNA
  (Protein-small molecules)
  RNA-RNA
  ...
Examples: protein-subunit interaction (hemoglobin), protein-DNA interaction
Image credit: Wikipedia
17
Structure and function
Why does function depend on structure?
  Structure itself is the function (e.g., tubulins)
  Binding
    Complementarity of interacting structures
    Formation of special bonds
18
Structure and function
Why does function depend on structure? (cont’d)
  Functional group (e.g., catalytic site)
  Determining localization (e.g., transporter membrane proteins)
Image credit: Spudich, Science 288(5470), 2000
19
Part 2: RNA Secondary Structures
20
Important RNA classes
Coding:
  Messenger RNAs (mRNAs): for translating into proteins
Non-coding:
  Ribosomal RNAs (rRNAs): parts of the ribosome complex
  Transfer RNAs (tRNAs): delivering free amino acids during translation
  Micro RNAs (miRNAs): binding mRNA targets to promote RNA degradation or repress translation
  Small nucleolar RNAs (snoRNAs): guiding chemical modifications of other RNAs
  Small nuclear RNAs (snRNAs): involved in mRNA splicing
  Long non-coding RNAs (lncRNAs): some involved in gene regulation
  ...
21
Importance of RNA structures
Structure is important to many classes of RNA
  Examples: tRNA, snoRNA
22
Representing RNA secondary structures
Formats:
  Dot-bracket format
  Stockholm format
  ...
23
Dot-bracket format
Sequence:
GUGAAUGAUGAAUUUAAUUCUUUGGUCCGUGUUUAUGAUGGGAAGUAAGACCCCCGAUAUGAGUGACAAAAGAGAUGUGGUUGACUAUCACAGUAUCUGACG
Structure:
(((( ((((((.(((....((((((.(((( )))).)))))).))).)))))).((((((.....)))))).)))).....
Image credit: Xihao Hu
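To make the notation concrete, here is a minimal sketch of how a dot-bracket string can be turned into a list of base pairs with a stack. The function name `parse_dot_bracket` and the short toy string are illustrative assumptions; the toy string is not the structure shown on the slide.

```python
def parse_dot_bracket(structure):
    """Return a sorted list of 1-based base pairs (i, j) from a dot-bracket string."""
    stack = []   # positions of '(' that are still waiting for a partner
    pairs = []
    for pos, symbol in enumerate(structure, start=1):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            # The most recent unmatched '(' is the partner of this ')'
            pairs.append((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


# Toy example (hypothetical, chosen only to keep the output short)
toy_pairs = parse_dot_bracket("((..((...))..))")
print(toy_pairs)
```

Each `(` is matched with the nearest later unmatched `)`, which is why a single bracket type can only describe nested (pseudoknot-free) structures.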
24
Predicting RNA secondary structures
A basic assumption in structure prediction: the real structure has the lowest free energy
  In a simplified view, more stable bonds → lower free energy
In the case of RNA secondary structures:
  Good to form more pairs
    A-U
    C-G
    Sometimes G-U (a “wobble base pair”)
  Good to form more stable pairs
    C-G > A-U > G-U
  Good to have stable sub-structures
    E.g., stacking pairs
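The pairing rules above can be written down as a small lookup. This is only a sketch: the numeric ranks 3, 2 and 1 are hypothetical and encode nothing more than the ordering C-G > A-U > G-U; they are not free energies.

```python
# Hypothetical stability ranks; only the ordering C-G > A-U > G-U comes from the slide.
PAIR_RANK = {
    frozenset("CG"): 3,
    frozenset("AU"): 2,
    frozenset("GU"): 1,  # wobble pair
}


def pair_rank(a, b):
    """Return the rank of the pair (a, b), or 0 if the two bases cannot pair."""
    return PAIR_RANK.get(frozenset((a, b)), 0)


def can_pair(a, b):
    """True for A-U, C-G and G-U pairs, in either order."""
    return pair_rank(a, b) > 0
```

Using `frozenset` makes the lookup order-independent, so `pair_rank("G", "C")` and `pair_rank("C", "G")` agree.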
25
Predicting RNA secondary structures
We will assume there are no pseudoknots
  With pseudoknots, there is currently no known algorithm that can find the optimal solution efficiently
We need two things:
  A thermodynamic model for computing the free energy of a structure
  A method for finding the structure with the minimum free energy
  Does this setting sound familiar?
[Figure: a pseudoknot. Image credit: Wikipedia]
26
Further assumptions
The free energy of a secondary structure is the sum of the free energies of its sub-structures
  Not the sum over individual bases/base pairs, as one base pair can participate in multiple sub-structures
The free energies of the sub-structures are independent
27
Problem definition
Given an RNA sequence, find a set of base pairs such that each base is paired at most once
Example:
  Input sequence: GUGAAUGAUGAAUUU...ACG
  Output set of base pairs: (7, 97) (8, 96) ... (18, 74) (81, 87)
Image credit: Xihao Hu
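As a sketch of what a valid output looks like, the check below verifies that each base is paired at most once and, following the no-pseudoknot assumption stated earlier, that no two pairs cross. The function name and the sequence length `n = 100` are assumptions for illustration; the example pairs are the four listed explicitly on the slide.

```python
def is_valid_structure(pairs, n):
    """Check a set of 1-based base pairs against the problem constraints."""
    used = set()
    for i, j in pairs:
        if not (1 <= i < j <= n):
            return False          # positions must lie inside the sequence
        if i in used or j in used:
            return False          # a base may be paired at most once
        used.update((i, j))
    # Two pairs (i, j) and (k, l) cross, i.e. form a pseudoknot, when i < k < j < l
    for i, j in pairs:
        for k, l in pairs:
            if i < k < j < l:
                return False
    return True


n = 100  # assumed length; it only needs to be at least 97 for this example
print(is_valid_structure([(7, 97), (8, 96), (18, 74), (81, 87)], n))
```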
28
Linear view
Position   Symbol
1          .
...
7-10       ((((
11-17      .......
18-23      ((((((
24         .
25         (
...
67         )
68         .
69-74      ))))))
75         .
76-81      ((((((
82-86      .....
87-92      ))))))
93         .
94-97      ))))
...
29
Thermodynamic model
We will consider four types of sub-structures here:
  Stacking pairs: both (i, j) and (i+1, j-1) are in the set
  Hairpin loop: there is a pair (i, j), where all bases from i+1 to j-1 are unpaired
  Bulge/internal loop: there are two pairs (i, j) and (i_1, j_1), where i < i_1 < j_1 < j, and all bases from i+1 to i_1-1 and from j_1+1 to j-1 are unpaired
  Multi-loop: there are pairs (i, j), (i_1, j_1), ..., (i_k, j_k), where i < i_1 < j_1 < ... < i_k < j_k < j, and all bases from i+1 to i_1-1, from j_1+1 to i_2-1, ..., from j_(k-1)+1 to i_k-1, and from j_k+1 to j-1 are unpaired
One base pair can participate in multiple sub-structures
30
Stacking pairs
Both (i, j) and (i+1, j-1) are in the set
  E.g., i = 20, j = 72 (so i+1 = 21 and j-1 = 71)
31
Hairpin loop
There is a pair (i, j), where all bases from i+1 to j-1 are unpaired
  E.g., i = 81, j = 87 (bases 82 to 86 are unpaired)
32
Bulge/internal loop
Internal loop: there are two pairs (i, j) and (i_1, j_1), where i < i_1 < j_1 < j, and all bases from i+1 to i_1-1 and from j_1+1 to j-1 are unpaired
  Called a bulge if only one side has unpaired bases
  E.g., i = 23, j = 69, i_1 = 25, j_1 = 67
33
Multi-loop
There are pairs (i, j), (i_1, j_1), ..., (i_k, j_k), where i < i_1 < j_1 < ... < i_k < j_k < j, and all bases from i+1 to i_1-1, from j_1+1 to i_2-1, ..., from j_(k-1)+1 to i_k-1, and from j_k+1 to j-1 are unpaired
  E.g., k = 2, i = 10, j = 94, i_1 = 18, j_1 = 74, i_2 = 76, j_2 = 92
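The four definitions can be applied mechanically to a pseudoknot-free set of pairs. The sketch below labels each pair by the sub-structure it closes, based on how many pairs sit directly inside it. The function name and the toy pair set are hypothetical; the toy set is not the structure from the slides.

```python
def classify_loops(pairs):
    """Map each pair (i, j) to the type of sub-structure it closes.

    Assumes the pairs are nested (no pseudoknots).
    """
    pairs = sorted(pairs)
    labels = {}
    for i, j in pairs:
        # All pairs strictly inside (i, j)
        inside = [(a, b) for a, b in pairs if i < a and b < j]
        # Keep only pairs that are not nested inside another inner pair
        direct = [(a, b) for a, b in inside
                  if not any(c < a and b < d for c, d in inside)]
        if not direct:
            labels[(i, j)] = "hairpin"            # everything inside is unpaired
        elif len(direct) == 1:
            if direct[0] == (i + 1, j - 1):
                labels[(i, j)] = "stacking"       # (i, j) stacks on (i+1, j-1)
            else:
                labels[(i, j)] = "bulge/internal"
        else:
            labels[(i, j)] = "multi-loop"         # two or more inner pairs
    return labels


# Toy structure: (1,20) stacks on (2,19), which closes a multi-loop
toy = [(1, 20), (2, 19), (3, 10), (5, 9), (12, 18)]
for pair, label in classify_loops(toy).items():
    print(pair, label)
```

Note that a pair such as (2, 19) is counted twice here: it is the inner pair of a stacking pair and the closing pair of a multi-loop, which matches the remark that one base pair can participate in multiple sub-structures.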
34
One possible thermodynamic model
Unpaired bases have 0 free energy, and all the terms below have negative free energy:
  eS(i, j): for the stacking pairs (i, j) and (i+1, j-1)
  eH(i, j): for the hairpin loop closed at (i, j)
  eBI(i, j, i_1, j_1): for a bulge or internal loop enclosed by the pairs (i, j) and (i_1, j_1)
  eM(i, j, i_1, j_1, ..., i_k, j_k): for a multi-loop that consists of the pairs (i, j), (i_1, j_1), ..., (i_k, j_k) and satisfies i < i_1 < j_1 < ... < i_k < j_k < j
35
Finding the optimal structure
Dynamic programming
Let s be the RNA sequence with n nucleotides
Tables:
  V(j): free energy of the optimal structure for s[1..j]
    The final answer is based on V(n)
  VP(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair
  VBI(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair that closes a bulge or internal loop
  VM(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair that closes a multi-loop
36
Update formulas
V(j): free energy of the optimal structure for s[1..j]
For j > 1, there are two cases:
  j is unpaired: the structure is the optimal one for s[1..j-1]
  j pairs with some i < j: the structure combines the optimal structure for s[1..i-1] with the optimal structure for s[i..j] in which i and j form a pair
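The formula itself did not survive in this transcript. A recurrence consistent with the two cases and with the table definitions would be the following; the base cases V(0) = V(1) = 0 are an assumption.

```latex
V(j) = \min
\begin{cases}
  V(j-1) & \text{($j$ is unpaired)} \\[4pt]
  \displaystyle \min_{1 \le i < j} \bigl[\, V(i-1) + VP(i, j) \,\bigr] & \text{($j$ pairs with $i$)}
\end{cases}
\qquad V(0) = V(1) = 0
```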
37
Update formulas
VP(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair
We require that i < j
Cases for the pair (i, j):
  Stacking pairs: (i+1, j-1) is also a pair
  Hairpin loop: all bases from i+1 to j-1 are unpaired
  Bulge/internal loop and multi-loop: covered by the tables VBI(i, j) and VM(i, j)
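The original formula is missing from this transcript. Under the thermodynamic model defined earlier, a recurrence consistent with the listed cases would take the minimum over the four sub-structure types that the pair (i, j) can close. Treating VP(i, j) as +∞ when s[i] and s[j] cannot pair is an assumption.

```latex
VP(i, j) = \min
\begin{cases}
  eH(i, j) & \text{(hairpin loop)} \\
  eS(i, j) + VP(i+1, j-1) & \text{(stacking pairs)} \\
  VBI(i, j) & \text{(bulge or internal loop)} \\
  VM(i, j) & \text{(multi-loop)}
\end{cases}
```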
38
Update formulas
VBI(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair that closes a bulge or internal loop (i.e., i and j take the roles of i_1 and j_1)
Bulge or internal loop between the pairs (i, j) and (i_1, j_1): all bases from i+1 to i_1-1 and from j_1+1 to j-1 are unpaired
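The formula is missing from this transcript. A recurrence consistent with the definition of eBI would try every possible inner pair. Excluding the stacking case (i+1, j-1) from the minimum is an assumption, made here so that stacking is handled only by eS.

```latex
VBI(i, j) = \min_{\substack{i < i_1 < j_1 < j \\ (i_1, j_1) \ne (i+1,\, j-1)}}
  \bigl[\, eBI(i, j, i_1, j_1) + VP(i_1, j_1) \,\bigr]
```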
39
Update formulas
VM(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair that closes a multi-loop
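The formula is missing from this transcript. A recurrence consistent with the definition of eM would minimise over every choice of k inner pairs. The condition k ≥ 2 is an assumption, chosen so that the single-inner-pair case stays with the stacking and bulge/internal loop terms.

```latex
VM(i, j) = \min_{\substack{k \ge 2 \\ i < i_1 < j_1 < \dots < i_k < j_k < j}}
  \Bigl[\, eM(i, j, i_1, j_1, \dots, i_k, j_k) + \sum_{h=1}^{k} VP(i_h, j_h) \,\Bigr]
```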
40
Time and space requirements
V: n entries, each takes O(n) time
VP: O(n^2) entries, each takes constant time (given the VBI and VM entries)
41
Time and space requirements
VBI: O(n^2) entries, each takes O(n^2) time
VM: O(n^2) entries, each takes O(n^(2k)) time
42
Time and space requirements
Summary:
  V: n entries, each takes O(n) time
  VP: O(n^2) entries, each takes constant time
  VBI: O(n^2) entries, each takes O(n^2) time
  VM: O(n^2) entries, each takes O(n^(2k)) time
Total: O(n^2) space, O(n^(2k+2)) time
  Exponential if k is unbounded
  Some approximations could bring the time down to O(n^4), which is still huge for large n but feasible for small or medium n
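To see where the O(n^4) term comes from, here is a minimal sketch of the V, VP and VBI tables with the multi-loop case left out entirely, which is a simplification and not one of the approximations the slide refers to. The energy functions use made-up constants, and the minimum hairpin loop size of 3 is also an assumption; none of these values come from the lecture.

```python
import math

CAN_PAIR = {"AU", "UA", "CG", "GC", "GU", "UG"}
MIN_HAIRPIN = 3  # assumed minimum number of unpaired bases in a hairpin loop


# Hypothetical toy energies (all negative, as in the model above)
def eH(i, j):
    return -1.0 if j - i - 1 >= MIN_HAIRPIN else math.inf


def eS(i, j):
    return -2.0


def eBI(i, j, i1, j1):
    return -0.5


def fold(s):
    """Return the minimum free energy of s under the toy model (no multi-loops)."""
    n = len(s)
    seq = " " + s  # shift to 1-based indexing
    VP = [[math.inf] * (n + 2) for _ in range(n + 2)]
    for length in range(2, n + 1):           # fill shorter intervals first
        for i in range(1, n - length + 2):
            j = i + length - 1
            if seq[i] + seq[j] not in CAN_PAIR:
                continue                      # VP stays infinite
            best = eH(i, j)                   # hairpin loop closed by (i, j)
            if i + 1 < j - 1:                 # stacking on (i+1, j-1)
                best = min(best, eS(i, j) + VP[i + 1][j - 1])
            # VBI: try every inner pair, O(n^2) work per entry
            for i1 in range(i + 1, j - 1):
                for j1 in range(i1 + 1, j):
                    if (i1, j1) == (i + 1, j - 1):
                        continue              # already handled as stacking
                    best = min(best, eBI(i, j, i1, j1) + VP[i1][j1])
            VP[i][j] = best
    V = [0.0] * (n + 1)                       # V[0] = 0 for the empty prefix
    for j in range(1, n + 1):
        V[j] = V[j - 1]                       # j is unpaired
        for i in range(1, j):                 # j pairs with i
            V[j] = min(V[j], V[i - 1] + VP[i][j])
    return V[n]


print(fold("GGGAAACCC"))
```

With O(n^2) entries in VP and an O(n^2) inner search for each, this restricted version runs in O(n^4) time and O(n^2) space.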
43
Some remarks
If we allow general pseudoknots, there is currently no known efficient way to find the RNA secondary structure with the minimum free energy
Other methods to predict RNA secondary structures:
  Conservation and covariation
    Example alignment (columns 1 to 5):
      12345
      ACGGU
      ACUGU
      CCAGG
      UCCGA
    High conservation: columns 2 and 4
    Strong covariation: columns 1 and 5
  Experimental methods (e.g., RNA footprinting)
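The conservation and covariation observations can be reproduced with a simple check on the four-sequence alignment from the slide. This is only a sketch: the rule used for covariation (two columns that vary across sequences yet always form an A-U, C-G or G-U pair) is an assumed heuristic, not a standard statistic such as mutual information.

```python
alignment = ["ACGGU", "ACUGU", "CCAGG", "UCCGA"]
CANONICAL = {"AU", "UA", "CG", "GC", "GU", "UG"}


def conserved_columns(rows):
    """1-based columns in which every sequence has the same base."""
    return [c + 1 for c in range(len(rows[0]))
            if len({row[c] for row in rows}) == 1]


def covarying_columns(rows):
    """1-based column pairs that vary yet always form a canonical pair."""
    width = len(rows[0])
    hits = []
    for a in range(width):
        for b in range(a + 1, width):
            observed = {row[a] + row[b] for row in rows}
            # More than one distinct pair, and every one of them can base-pair
            if len(observed) > 1 and observed <= CANONICAL:
                hits.append((a + 1, b + 1))
    return hits


print(conserved_columns(alignment))   # columns 2 and 4
print(covarying_columns(alignment))   # columns 1 and 5
```

Columns 1 and 5 read A-U, A-U, C-G and U-A across the four sequences: the bases change, but the ability to pair is preserved, which suggests that the two positions are paired in the structure.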
44
Representing pseudoknots
Without pseudoknots, RNA secondary structures can be unambiguously represented by dots (unpaired bases) and brackets (base pairs)
What if there are pseudoknots?
  Need more types of brackets
  Example:
    GAAGUACAAUAUGUAACCG
    .{.((((.....))}))..
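A sketch of how several bracket types can be parsed: one stack per bracket type, so pairs of different types are allowed to cross. The function names and the particular set of bracket characters are assumptions; the example string is the one from the slide.

```python
def parse_extended_dot_bracket(structure):
    """Return sorted 1-based base pairs from a string with several bracket types."""
    openers = {"(": ")", "[": "]", "{": "}", "<": ">"}
    closers = {close: open_ for open_, close in openers.items()}
    stacks = {open_: [] for open_ in openers}   # one stack per bracket type
    pairs = []
    for pos, symbol in enumerate(structure, start=1):
        if symbol in openers:
            stacks[symbol].append(pos)
        elif symbol in closers:
            stack = stacks[closers[symbol]]
            if not stack:
                raise ValueError(f"unmatched {symbol!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


def has_pseudoknot(pairs):
    """True if two pairs (i, j) and (k, l) cross, i.e. i < k < j < l."""
    return any(i < k < j < l for i, j in pairs for k, l in pairs)


pairs = parse_extended_dot_bracket(".{.((((.....))}))..")
print(pairs)
print(has_pseudoknot(pairs))
```

In the example, the curly-bracket pair (2, 15) crosses the round-bracket pairs (4, 17) and (5, 16), which is exactly what a single bracket type cannot express.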
45
Epilogue: Case Study, Summary and Further Readings
46
Case study: Drug finding/design
Drugs are mostly chemicals with a specific structure that interacts with some biological objects
Examples:
  Inhibiting the activities of an important bacterial protein
  Blocking the interaction between a virus and the receptors of the host cell
  Stimulating the production of a hormone
47
Case study: Drug finding/design
To identify or design a chemical that targets a particular object (e.g., a protein), we need to make sure that the two bind tightly; this is checked through a process called docking
48
Case study: Drug finding/design
Computational problem:
  Input: a target protein and a list of chemicals
  Goal: find a chemical that binds the target well
    Try different locations and orientations
    Binding depends on structure and chemistry
  Output: one or more chemicals that bind the target well
Difficulties:
  Computational complexity
    Large search space for each protein-chemical combination
    Need to try many chemicals
  Need to ensure specificity (not to target other proteins and cause side effects)
49
Case study: Drug finding/design
There is a game called FoldIt in which players try to fold proteins
  Score based on free energy
  Real-time update of scores and ranks
  Players can discuss and share solutions
  Resulted in some remarkably good folds compared with automatic predictions by computer programs
50
Summary
Functions depend on structures
Different levels of structures:
  Primary (sequence)
  Secondary (local)
  Tertiary (global)
  Quaternary (interactions)
RNA secondary structures can be predicted by dynamic programming based on a thermodynamic model
  Important sub-structures
    Stacking pairs
    Hairpin loops
    Internal loops/bulges
    Multi-loops
    Pseudoknots
51
Further readings
Chapter 11 of Algorithms in Bioinformatics: A Practical Introduction
  Speed-up of the algorithm
  Algorithm for RNA structure prediction with pseudoknots
  Free slides available
Parts VII and VIII of Fundamental Concepts of Bioinformatics
  Protein folding and protein structure prediction
  Docking