Lecture 11. RNA Secondary Structure Prediction

Name: Lecture 11. RNA Secondary Structure Prediction
Uploaded: 2017-08-04T05:23:53+00:00
Duration: PTM45S8
Channel: Beryl Green
Description: Lecture 11. RNA Secondary Structure Prediction

Lecture 11. RNA Secondary Structure Prediction
The Chinese University of Hong Kong CSCI3220 Algorithms for Bioinformatics

Lecture outline: From sequences to functions; RNA secondary structures
Last update: 21-Nov-2015 CSCI3220 Algorithms for Bioinformatics | Kevin Yip-cse-cuhk | Fall 2015

From Sequences to Functions
Part 1 From Sequences to Functions

From sequences to functions
One of the biggest questions in biology: can one tell the function of a molecule (DNA/RNA/protein) from its sequence alone?
  Sometimes, but usually not
  Easier if we also know the structure
Common belief: sequence → structure → function
  Of course, function also depends on the environment

Molecular structures
Four levels:
  Primary structures: the sequence
  Secondary structures: first formed, local
  Tertiary structures: global; “folds”, “domains”
  Quaternary structures: multiple molecules

Primary structures
Connections (strong covalent bonds vs. weak hydrogen bonds)
  Which molecules are connected
  Which atoms are connected
First-level constraints on the possible structures
  Example: molecules close in primary structure must also be close in secondary, tertiary and quaternary structures
Image credit: Wikibooks

Primary structures
Orientation:
  DNA, RNA: 5’-3’
  Amino acids: amino (N) terminus to carboxyl (C) terminus
“Residue”: what remains after a water molecule is expelled

DNA secondary structures
Double helix
  A-DNA (dehydrated samples): right-handed, 11 bp per turn
  B-DNA (most common): 10.5 bp per turn
  Z-DNA (some methylated DNA): left-handed, 12 bp per turn
Image credit: Wikipedia

DNA secondary structures
[Figure: A-DNA, B-DNA and Z-DNA. Image credit: Wikipedia]

RNA secondary structures
Can largely be projected onto a 2D plane
Elements labelled in the figure: stem/hairpin loop, stacking pairs, bulge, internal loop, multi-loop, exterior loop, dangling nucleotides, less stable pair, coaxial stacking

RNA secondary structures
Pseudoknots: complex structures
Image credit: Wikipedia; Sperschneider and Datta, RNA 14(4), 2008

Protein secondary structures
Three main types: α-helices, β-sheets, coils (connectors)

DNA tertiary structures
Wrapped around nucleosomes formed by histone proteins
Condensed form at the beginning of mitosis and meiosis
Image credit: Wikipedia

RNA tertiary structures
Overall structure of an RNA
More studied for RNAs that are not translated into proteins -- “non-coding” RNAs
  Example: tRNA
Image credit: Wikipedia

Protein tertiary structures
Complex structures
  Mainly caused by weak forces (hydrogen bonds and hydrophobic interactions)
  Occasionally stronger forces (disulfide bonds between cysteines)
The CATH hierarchy
  Class: composition of secondary structures
  Architecture: overall shape
  Topology: connection of secondary structures
  Homologous superfamily: with common ancestor
Image credit: CATH

Quaternary structures
Types: protein subunit-protein subunit, protein-protein, protein-DNA, protein-RNA, (protein-small molecules), RNA-RNA, ...
[Figures: protein-subunit interaction (hemoglobin); protein-DNA interaction. Image credit: Wikipedia]

Structure and function
Why does function depend on structure?
  Structure itself is the function (e.g., tubulins)
  Binding
    Complementarity of interacting structures
    Formation of special bonds

Structure and function
Why does function depend on structure? (cont’d)
  Functional group (e.g., catalytic site)
  Determining localization (e.g., transporter membrane proteins)
Image credit: Spudich, Science 288(5470), 2000

RNA Secondary Structures
Part 2 RNA Secondary Structures

Important RNA classes
Coding:
  Messenger RNAs (mRNAs): for translating into proteins
Non-coding:
  Ribosomal RNAs (rRNAs): parts of the ribosome complex
  Transfer RNAs (tRNAs): delivering free amino acids during translation
  Micro RNAs (miRNAs): binding mRNA targets to promote RNA degradation or repress translation
  Small nucleolar RNAs (snoRNAs): guiding chemical modifications of other RNAs
  Small nuclear RNAs (snRNAs): involved in mRNA splicing
  Long non-coding RNAs (lncRNAs): some involved in gene regulation
  ...

Importance of RNA structures
Structure is important to many classes of RNA
Examples: tRNA, snoRNA

Representing RNA secondary structures
Formats: dot-bracket format, Stockholm format, ...

Dot-bracket format
Sequence:
GUGAAUGAUGAAUUUAAUUCUUUGGUCCGUGUUUAUGAUGGGAAGUAAGACCCCCGAUAUGAGUGACAAAAGAGAUGUGGUUGACUAUCACAGUAUCUGACG
Structure:
......((((.......((((((.(((....((((((.((((..........)))).)))))).))).)))))).((((((.....)))))).)))).....
Image credit: Xihao Hu
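A dot-bracket string can be turned back into a list of base pairs with a simple stack. The sketch below is only an illustration under stated assumptions: the function name `parse_dot_bracket` is hypothetical, positions are 1-based as in the slides, and the short structure in the usage line is a toy example rather than the one shown above.

```python
def parse_dot_bracket(structure):
    """Return the sorted list of 1-based base pairs (i, j), i < j."""
    stack = []   # positions of '(' that are still waiting for a partner
    pairs = []
    for pos, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected symbol {ch!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


# Toy structure: two stacked stems separated by unpaired bases.
print(parse_dot_bracket("((..((...))..))"))
```

Because every `)` is matched with the most recent unmatched `(`, the pairs produced this way are always nested, which is why plain dots and brackets cannot express pseudoknots.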

Predicting RNA secondary structures
A basic assumption in structure predictions: the real structure has the lowest free energy
  In a simplified view, more stable bonds → lower free energy
In the case of RNA secondary structures:
  Good to form more pairs: A-U, C-G, and sometimes G-U (a “wobble base pair”)
  Good to form more stable pairs: C-G > A-U > G-U
  Good to have stable sub-structures, e.g., stacking pairs

Predicting RNA secondary structures
We will assume there are no pseudoknots
  With pseudoknots, there is currently no known algorithm that can find the optimal solution efficiently
We need two things:
  A thermodynamic model for computing the free energy of a structure
  A method for finding the structure with the minimum free energy
Does this setting sound familiar?
[Figure: a pseudoknot. Image credit: Wikipedia]

Further assumptions
The free energy of a secondary structure is the sum of the free energies of its sub-structures
  Not the sum over individual bases/base pairs, as one base pair can participate in multiple sub-structures
The free energies of the sub-structures are independent

Problem definition
Given an RNA sequence, find a set of base pairs with minimum free energy such that each base is paired at most once
Example:
  Input sequence: GUGAAUGAUGAAUUU...ACG
  Output set of base pairs: (7, 97) (8, 96) ... (18, 74) (81, 87)
Image credit: Xihao Hu
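The constraint that each base is paired at most once is easy to check mechanically. The following sketch assumes 1-based positions and pairs written as (i, j) with i < j; the function name `is_valid_pairing` is hypothetical, and the check covers only this constraint, not free energy or pseudoknots.

```python
def is_valid_pairing(pairs, n):
    """True if every pair lies within 1..n and no base appears in two pairs."""
    used = set()
    for i, j in pairs:
        if not (1 <= i < j <= n):
            return False          # out of range, or not written as i < j
        if i in used or j in used:
            return False          # a base would be paired twice
        used.update((i, j))
    return True


# Four of the pairs listed in the example, for a sequence of length 102.
print(is_valid_pairing([(7, 97), (8, 96), (18, 74), (81, 87)], 102))
```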

Linear view
Positions: 1 ... 7-25 ... 67-97 ...
Structure: . ... ((((.......((((((.( ... ).)))))).((((((.....)))))).)))) ...

Thermodynamic model
We will consider four types of sub-structures here:
  Stacking pairs: both (i, j) and (i+1, j-1) are in the set
  Hairpin loop: there is a pair (i, j), where all bases from i+1 to j-1 are not paired
  Bulge/internal loop: there are two pairs (i, j) and (i1, j1), where i<i1<j1<j, and all bases from i+1 to i1-1 and from j1+1 to j-1 are not paired
  Multi-loop: there are pairs (i, j), (i1, j1), ..., (ik, jk), where i<i1<j1<...<ik<jk<j, and all bases from i+1 to i1-1, from j1+1 to i2-1, ..., from j(k-1)+1 to ik-1 and from jk+1 to j-1 are unpaired
One base pair can participate in multiple sub-structures

Stacking pairs
Both (i, j) and (i+1, j-1) are in the set
E.g., i = 20, j = 72: (20, 72) and (21, 71) are both pairs

Hairpin loop
There is a pair (i, j), where all bases from i+1 to j-1 are not paired
E.g., i = 81, j = 87

Bulge/Internal loop
Internal loop: there are two pairs (i, j) and (i1, j1), where i<i1<j1<j, and all bases from i+1 to i1-1 and from j1+1 to j-1 are not paired
Called a bulge if only one side has unpaired bases
E.g., i = 23, j = 69, i1 = 25, j1 = 67 (positions 24 and 68 are unpaired)

Multi-loop
There are pairs (i, j), (i1, j1), ..., (ik, jk), where i<i1<j1<...<ik<jk<j, and all bases from i+1 to i1-1, from j1+1 to i2-1, ..., from j(k-1)+1 to ik-1 and from jk+1 to j-1 are unpaired
E.g., k = 2, i = 10, j = 94, i1 = 18, j1 = 74, i2 = 76, j2 = 92
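The four sub-structure types can be recognised from a set of base pairs by looking, for each pair (i, j), at the pairs directly enclosed by it. The sketch below is one possible way to do this, assuming a pseudoknot-free set of 1-based pairs with i < j; the function name `classify_loops` and the toy pair set are hypothetical, and bulges are reported separately from internal loops only for readability.

```python
def classify_loops(pairs):
    """Map each pair (i, j) to the type of sub-structure it closes."""
    partner = {}
    for i, j in pairs:
        partner[i], partner[j] = j, i

    kinds = {}
    for i, j in sorted(pairs):
        # Walk from i+1 to j-1 and collect the pairs directly inside (i, j),
        # jumping over everything enclosed by each inner pair.
        inner = []
        p = i + 1
        while p < j:
            q = partner.get(p)
            if q is not None and q > p:
                inner.append((p, q))
                p = q + 1
            else:
                p += 1

        if not inner:
            kinds[(i, j)] = "hairpin loop"
        elif len(inner) > 1:
            kinds[(i, j)] = "multi-loop"
        else:
            i1, j1 = inner[0]
            if (i1, j1) == (i + 1, j - 1):
                kinds[(i, j)] = "stacking pairs"
            elif i1 == i + 1 or j1 == j - 1:
                kinds[(i, j)] = "bulge"          # unpaired bases on one side only
            else:
                kinds[(i, j)] = "internal loop"  # unpaired bases on both sides
    return kinds


# Toy structure, not the one from the slides.
toy = [(1, 30), (2, 29), (3, 10), (4, 9), (12, 20), (13, 18), (22, 28), (24, 26)]
for pair, kind in classify_loops(toy).items():
    print(pair, kind)
```

Note that the same pair can appear as the inner pair of one sub-structure and the closing pair of another, which matches the remark that one base pair can participate in multiple sub-structures.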

One possible thermodynamic model
Unpaired bases have 0 free energy and all the terms below have negative free energy:
  eS(i, j): for the stacking pairs (i, j) and (i+1, j-1)
  eH(i, j): for the hairpin loop closed at (i, j)
  eBI(i, j, i1, j1): for a bulge or internal loop enclosed by the pairs (i, j) and (i1, j1)
  eM(i, j, i1, j1, ..., ik, jk): for a multi-loop that consists of the pairs (i, j), (i1, j1), ..., (ik, jk) satisfying i<i1<j1<...<ik<jk<j
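Combined with the assumption that sub-structure energies are additive and independent, the free energy of a whole structure S can be written as a sum over its sub-structures. The notation ΔG(S) is an assumption introduced here for illustration; the four energy terms are the ones named in the model.

```latex
\Delta G(S) \;=\;
  \sum_{\text{stacking pairs}} eS(i, j)
  \;+\; \sum_{\text{hairpin loops}} eH(i, j)
  \;+\; \sum_{\text{bulges / internal loops}} eBI(i, j, i_1, j_1)
  \;+\; \sum_{\text{multi-loops}} eM(i, j, i_1, j_1, \ldots, i_k, j_k)
```

Unpaired bases outside any loop contribute 0, so they do not appear in the sum.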

Finding the optimal structure
Dynamic programming
Let s be the RNA sequence with n nucleotides
Tables:
  V(j): free energy of the optimal structure for s[1..j]
    The final answer is based on V(n)
  VP(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair
  VBI(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair that closes a bulge or internal loop
  VM(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair that closes a multi-loop

Update formulas
V(j): free energy of the optimal structure for s[1..j]
For j > 1, there are two cases:
  j is unpaired: an optimal structure for s[1..j-1], followed by the unpaired base j
  j pairs with some i < j: an optimal structure for s[1..i-1], followed by an optimal structure for s[i..j] in which i and j are paired

Update formulas
VP(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair
We require that i < j
Cases shown on the slide:
  Stacking pairs: (i, j) and (i+1, j-1) are both pairs
  Hairpin loop: all bases between i and j are unpaired
The remaining cases, where (i, j) closes a bulge, internal loop or multi-loop, are covered by VBI(i, j) and VM(i, j)

Update formulas
VBI(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair that closes a bulge or internal loop (i.e., i and j take the roles of i1 and j1)
Bulge or internal loop: ... i ... i1 ... j1 ... j ..., with all bases between i and i1 and all bases between j1 and j unpaired

Update formulas
VM(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair that closes a multi-loop
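The formulas themselves did not survive in this transcript. The recurrences below are a reconstruction from the table definitions and the energy terms eS, eH, eBI and eM, not a copy of the slides; they follow the diagram labels in taking (i, j) as the closing (outer) pair. The base cases V(0) = V(1) = 0 and the convention that VP(i, j) = +∞ when s[i] and s[j] cannot pair are assumptions.

```latex
\begin{aligned}
V(j) &= \min\Bigl\{\, V(j-1),\ \min_{1 \le i < j} \bigl[ V(i-1) + VP(i, j) \bigr] \Bigr\},
        \qquad V(0) = V(1) = 0 \\[4pt]
VP(i, j) &= \min\bigl\{\, eH(i, j),\ eS(i, j) + VP(i+1, j-1),\ VBI(i, j),\ VM(i, j) \,\bigr\} \\[4pt]
VBI(i, j) &= \min_{\substack{i < i_1 < j_1 < j \\ (i_1, j_1) \ne (i+1, j-1)}}
        \bigl[ eBI(i, j, i_1, j_1) + VP(i_1, j_1) \bigr] \\[4pt]
VM(i, j) &= \min_{\substack{k \ge 2 \\ i < i_1 < j_1 < \cdots < i_k < j_k < j}}
        \Bigl[ eM(i, j, i_1, j_1, \ldots, i_k, j_k) + \sum_{h=1}^{k} VP(i_h, j_h) \Bigr]
\end{aligned}
```

As a sanity check, these forms match the per-entry costs of the tables: O(n) choices of i for V(j), a constant number of terms for VP(i, j), O(n^2) choices of (i1, j1) for VBI(i, j), and O(n^(2k)) choices of the k inner pairs for VM(i, j).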

Time and space requirements
V: n entries, each takes O(n) time
VP: O(n^2) entries, each takes constant time

VBI: O(n^2) entries, each takes O(n^2) time
VM: O(n^2) entries, each takes O(n^(2k)) time

Summary:
  V: n entries, each takes O(n) time
  VP: O(n^2) entries, each takes constant time
  VBI: O(n^2) entries, each takes O(n^2) time
  VM: O(n^2) entries, each takes O(n^(2k)) time
Total: O(n^2) space, O(n^(2k+2)) time
  Exponential if k is unbounded
  Some approximations could bring the time down to O(n^4): still huge for large n, but feasible for small or medium n
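To make the table-filling order and the O(n^4) cost concrete, here is a deliberately restricted sketch of the dynamic program. It is not the lecture's algorithm in full: multi-loops (the VM table) are left out entirely, the bulge/internal-loop search is folded into the VP computation, and the three energy constants, the minimum hairpin size and the function name `fold_energy` are all hypothetical values chosen only so that the example runs.

```python
import math

# Hypothetical toy parameters, NOT real thermodynamic values.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}
MIN_HAIRPIN = 3              # assumed minimum number of unpaired bases in a hairpin
E_STACK = -2.0               # stands in for eS(i, j)
E_HAIRPIN = -0.5             # stands in for eH(i, j)
E_BULGE_INTERNAL = -1.0      # stands in for eBI(i, j, i1, j1)


def fold_energy(s):
    """Minimum free energy of s under the toy model, without multi-loops."""
    n = len(s)
    INF = math.inf
    # VP[i][j] (1-based): best energy of s[i..j] with i and j paired.
    VP = [[INF] * (n + 2) for _ in range(n + 2)]

    for span in range(MIN_HAIRPIN + 1, n):       # span = j - i, short spans first
        for i in range(1, n - span + 1):
            j = i + span
            if (s[i - 1], s[j - 1]) not in PAIRS:
                continue
            best = E_HAIRPIN                     # hairpin: everything inside unpaired
            if VP[i + 1][j - 1] < INF:           # stacking on (i+1, j-1)
                best = min(best, E_STACK + VP[i + 1][j - 1])
            # Bulge / internal loop: one inner pair (i1, j1) that is not stacked.
            for i1 in range(i + 1, j - 1):
                for j1 in range(i1 + 1, j):
                    if (i1, j1) != (i + 1, j - 1) and VP[i1][j1] < INF:
                        best = min(best, E_BULGE_INTERNAL + VP[i1][j1])
            VP[i][j] = best

    # V[j]: best energy of the prefix s[1..j]; V[0] = 0 for the empty prefix.
    V = [0.0] * (n + 1)
    for j in range(1, n + 1):
        V[j] = V[j - 1]                          # case: j is unpaired
        for i in range(1, j):                    # case: j pairs with i
            if VP[i][j] < INF:
                V[j] = min(V[j], V[i - 1] + VP[i][j])
    return V[n]


print(fold_energy("GGGAAACCC"))
```

The two nested loops over (i1, j1) inside the loops over (i, j) are where the O(n^4) running time comes from, mirroring the O(n^2) entries times O(n^2) work per entry listed for VBI. Adding multi-loops with a general eM term would add the O(n^(2k)) factor discussed above, which is why practical methods approximate that term.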

Some remarks
If we allow general pseudoknots, there is currently no efficient way to find the optimal RNA secondary structure with the minimum free energy
Other methods to predict RNA secondary structures:
  Conservation and covariation, e.g., in the alignment
    12345
    ACGGU
    ACUGU
    CCAGG
    UCCGA
    High conservation: columns 2 and 4
    Strong covariation: columns 1 and 5
  Experimental methods (e.g., RNA footprinting)
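The conservation and covariation signals in the small alignment can be computed directly. The sketch below uses a simple, assumed definition: a column is conserved if all sequences share the same base, and two columns covary if every sequence has a Watson-Crick pair at those columns and at least two different pairs occur. Real covariation analyses use statistical scores; the function names here are hypothetical.

```python
from itertools import combinations

WATSON_CRICK = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def conserved_columns(alignment):
    """1-based columns in which every sequence has the same base."""
    width = len(alignment[0])
    return [c + 1 for c in range(width)
            if len({row[c] for row in alignment}) == 1]


def covarying_columns(alignment):
    """1-based column pairs that stay complementary while the bases change."""
    width = len(alignment[0])
    result = []
    for a, b in combinations(range(width), 2):
        column_pairs = [(row[a], row[b]) for row in alignment]
        all_complementary = all(p in WATSON_CRICK for p in column_pairs)
        varies = len(set(column_pairs)) > 1
        if all_complementary and varies:
            result.append((a + 1, b + 1))
    return result


alignment = ["ACGGU", "ACUGU", "CCAGG", "UCCGA"]
print(conserved_columns(alignment))   # columns 2 and 4
print(covarying_columns(alignment))   # columns 1 and 5
```

Columns 2 and 4 are always C and G, so they are complementary but carry no covariation signal; columns 1 and 5 change together (A-U, A-U, C-G, U-A), which is the kind of evidence that suggests a conserved base pair.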

Representing pseudoknots
Without pseudoknots, RNA secondary structures can be unambiguously represented by dots (unpaired bases) and brackets (base pairs)
Positions: 1 ... 7-25 ... 67-97 ...
Structure: . ... ((((.......((((((.( ... ).)))))).((((((.....)))))).)))) ...
What if there are pseudoknots? More types of brackets are needed:
GAAGUACAAUAUGUAACCG
.{.((((.....))})).. 
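With several bracket types, each type can be matched on its own stack, and a pseudoknot shows up as two pairs that cross. The sketch below assumes four bracket types and 1-based positions; the function names are hypothetical, and the input is the 19-nucleotide example above.

```python
CLOSE_TO_OPEN = {")": "(", "]": "[", "}": "{", ">": "<"}


def parse_extended_dot_bracket(structure):
    """Sorted 1-based base pairs from a multi-bracket structure string."""
    stacks = {opener: [] for opener in CLOSE_TO_OPEN.values()}
    pairs = []
    for pos, ch in enumerate(structure, start=1):
        if ch in stacks:
            stacks[ch].append(pos)
        elif ch in CLOSE_TO_OPEN:
            stack = stacks[CLOSE_TO_OPEN[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected symbol {ch!r} at position {pos}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


def crossing_pairs(pairs):
    """All (p, q) with p = (i, j), q = (i1, j1) and i < i1 < j < j1."""
    return [(p, q) for p in pairs for q in pairs if p[0] < q[0] < p[1] < q[1]]


pairs = parse_extended_dot_bracket(".{.((((.....))})).." )
print(pairs)
print(crossing_pairs(pairs))
```

In this example the pair written with braces, (2, 15), crosses the two outermost round-bracket pairs, which is exactly what a single bracket type could not express.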

Case Study, Summary and Further Readings
Epilogue Case Study, Summary and Further Readings

Case study: Drug finding/design
Drugs are mostly chemicals with a specific structure that interacts with some biological objects
Examples:
  Inhibiting the activities of an important protein of bacteria
  Blocking the interaction between a virus and receptors of the host cell
  Stimulating the production of a hormone

To identify or design a chemical that targets a particular object (e.g., a protein), we need to make sure that the two bind tightly, which is assessed through a process called docking

Computational problem:
  Input: a target protein and a list of chemicals
  Goal: find a chemical that binds the target well
    Try different locations and orientations
    Binding depends on structure and chemistry
  Output: one or more chemicals that bind the target well
Difficulties:
  Computational complexity
    Large search space for each protein-chemical combination
    Need to try many chemicals
  Need to ensure specificity (not to target other proteins and cause side effects)

There is a game called FoldIt in which players try to fold proteins
  Score based on free energy
  Real-time update of scores and ranks
  Players can discuss and share solutions
  Resulted in some amazingly good folds as compared to automatic predictions by computer programs

Summary
Functions depend on structures
Different levels of structures:
  Primary (sequence)
  Secondary (local)
  Tertiary (global)
  Quaternary (interactions)
RNA secondary structures can be predicted by dynamic programming based on a thermodynamic model
Important sub-structures:
  Stacking pairs
  Hairpin loops
  Internal loops/bulges
  Multi-loops
  Pseudoknots

Further readings
Chapter 11 of Algorithms in Bioinformatics: A Practical Introduction
  Speed-up of the algorithm
  Algorithm for RNA structure prediction with pseudoknots
  Free slides available
Parts VII and VIII of Fundamental Concepts of Bioinformatics
  Protein folding and protein structure prediction
  Docking
