Lecture 21 RNA Secondary Structure Prediction

1 Lecture 21 RNA Secondary Structure Prediction
CS5263 Bioinformatics Lecture 21 RNA Secondary Structure Prediction

2 Road map
Biological roles for RNA
What is “secondary structure”?
How is it represented?
Why is it important?
How to predict it?

3 Central dogma
The flow of genetic information: DNA → RNA → Protein
Replication: DNA → DNA
Transcription: DNA → RNA
Translation: RNA → Protein

4 Classical Roles for RNA
mRNA - messenger RNA
tRNA - transfer RNA (~61 kinds, ~75 nt)
rRNA - ribosomal RNA (~4 kinds, 120-5k nt)
[Figure: RNA, protein, ribosome]

5 Classical Roles for RNA
[Figure: mRNA, tRNA, rRNA, ribosome]

6 “Semi-classical” RNA
snRNA - small nuclear RNA (splicing: U1, etc, nt)
RNaseP - tRNA processing (~300 nt)
SRP - signal recognition particle; membrane targeting (~ nt)
tmRNA - resetting stalled ribosomes, destroying aberrant mRNA
Telomerase - ( nt)
snoRNA - small nucleolar RNA (many varieties; nt)

7 New Roles for RNA
Riboswitch: an mRNA regulates its own activity
siRNA (Nobel Prize 2006, Fire & Mello)
microRNAs
saRNA: small activating RNA
Hundreds of families
Rfam release 1, 1/2003: 25 families, 55k instances
Rfam release 7, 3/2005: 503 families, 300k instances

8 Example: Riboswitch

9 Non-coding RNAs
Dramatic discoveries in the last 5 years
100s of new families
Many roles: regulation, transport, stability, catalysis, …
1% of DNA codes for protein, but 30% of it is copied into RNA, i.e. ncRNA >> mRNA

10 Take-home message
RNAs play many important roles in the cell beyond the classical roles, many of which are yet to be discovered
RNA functions are determined by structures

11 RNA structure
Primary: sequence
Secondary: base-pairing
Tertiary: 3D shape

12 RNA base-pairing
Watson-Crick pairing: C-G ~3 kcal/mole, A-U ~2 kcal/mole
“Wobble pair”: G-U ~1 kcal/mole
Non-canonical pairs

13 tRNA structure

14 Secondary structure prediction
Given: CAUUUGUGUACCU…
Goal: its secondary structure
How can we compute that?

15 Terminology
[Figure labels: hairpin loops, interior loops, bulge loop, multi-branched loop, stems]

16 Pseudoknot
5’- ucgacuguaaaaaagcgggcgacuuucagucgcucuuuuugucgcgcgc -3’
Makes structure prediction hard. Not considered in most algorithms.

17 The Nussinov algorithm
Goal: maximize the number of base pairs
Idea: dynamic programming
Loop matching
Nussinov, Pieczenik, Griggs, Kleitman ’78
Too simple for accurate prediction, but a stepping-stone for later algorithms

18 The Nussinov algorithm
Problem: find the RNA structure with the maximum (weighted) number of nested pairings
Nested: no pseudoknots
ACCACGCUUAAGACACCUAGCUUGUGUCCUGGAGGUCUAUAAGUCAGACCGCGAGAGGGAAGACUCGUAUAAGCG
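
The "nested" condition can be checked mechanically. The sketch below is only an illustration, assuming a structure is given as a list of 1-based (i, j) pairs; the function name `is_nested` and both example structures are made up for this purpose. Two pairs cross, and so form a pseudoknot, when i < k < j < l.

```python
from itertools import combinations

def is_nested(pairs):
    """Return True if no two base pairs cross (i < k < j < l)."""
    normalized = [tuple(sorted(p)) for p in pairs]
    for (i, j), (k, l) in combinations(normalized, 2):
        # Put the pair with the smaller left end first before testing.
        if i > k:
            i, j, k, l = k, l, i, j
        if i < k < j < l:
            return False
    return True

# Hypothetical examples: a plain stem, and two stems that interleave.
hairpin = [(1, 9), (2, 8), (3, 7)]
pseudoknot = [(1, 10), (2, 9), (6, 15), (7, 14)]

print(is_nested(hairpin))     # nested stem
print(is_nested(pseudoknot))  # crossing stems
```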

19 The Nussinov algorithm
Given sequence $X = x_1 \dots x_N$
Define DP matrix: $F(i, j)$ = maximum number of base-pairs if $x_i \dots x_j$ folds optimally
Matrix is symmetric, so let $i < j$

20 The Nussinov algorithm
Can be summarized into two cases:
(i, j) paired: optimal score is $1 + F(i+1, j-1)$
(i, j) unpaired: optimal score is $\max_k \left[ F(i, k) + F(k+1, j) \right]$
There are a number of other ways to summarize the cases, all equivalent

21 The Nussinov algorithm
$$F(i, i) = 0$$
$$F(i, j) = \max \begin{cases} F(i+1, j-1) + S(x_i, x_j) \\ \max_{i \le k < j} \left[ F(i, k) + F(k+1, j) \right] \end{cases}$$
$S(x_i, x_j) = 1$ if $x_i, x_j$ can form a base-pair, and 0 otherwise
Generalize: S(A, U) = 2, S(C, G) = 3, S(G, U) = 1
Or other types of scores (later)
F(1, N) gives the optimal score for the whole sequence

22 How to fill in the DP matrix?
$$F(i, j) = \max \begin{cases} F(i+1, j-1) + S(x_i, x_j) \\ \max_{i \le k < j} \left[ F(i, k) + F(k+1, j) \right] \end{cases}$$
[Figure: DP matrix with rows i, i+1 and columns j–1, j marked around cell (i, j)]

23 How to fill in the DP matrix?
Diagonal j – i = 1

24 How to fill in the DP matrix?
Diagonal j – i = 2

25 How to fill in the DP matrix?
Diagonal j – i = 3

26 How to fill in the DP matrix?
Diagonal j – i = N - 1
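
The diagonal-by-diagonal order can be written down as a small generator. This is a sketch only; the name `fill_order` is introduced here for illustration, and indices are 1-based to match the slides.

```python
def fill_order(n):
    """Yield the (i, j) cells above the main diagonal, one diagonal at a time."""
    for span in range(1, n):              # span = j - i, from 1 to N - 1
        for i in range(1, n - span + 1):  # i runs from 1 to N - span
            yield i, i + span

order = list(fill_order(4))
print(order)
# [(1, 2), (2, 3), (3, 4), (1, 3), (2, 4), (1, 4)]
```

Every cell that F(i, j) depends on lies on a shorter diagonal, so it is already filled when (i, j) is reached.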

27 Minimum Loop length
Sharp turns unlikely
Let minimum length of hairpin loop be 1
F(i, j) = 0 for j – i < 2
[Figure: example hairpin stem with base pairs U-A, G-C, C-G]

28 Algorithm
Initialization: F(i, i) = 0 for i = 1 to N
Iteration:
For L = 1 to N-1
  For i = 1 to N – L
    j = min(i + L, N)
    $F(i, j) = \max \{\, F(i+1, j-1) + S(x_i, x_j),\ \max_{i \le k < j} \left[ F(i, k) + F(k+1, j) \right] \,\}$
Termination: best score is given by F(1, N)
(Need to trace back; refer to the Durbin book)
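
As a minimal sketch of the fill step, the pseudocode translates to Python roughly as follows. Several things here are assumptions for illustration: 0-based indices, the helper names `can_pair` and `nussinov_fill`, unit scores for A-U, C-G and G-U, and a `min_loop` parameter that applies the rule F(i, j) = 0 for j – i < 2.

```python
def can_pair(a, b):
    """Watson-Crick pairs plus the G-U wobble pair."""
    return {a, b} in ({"A", "U"}, {"C", "G"}, {"G", "U"})

def nussinov_fill(seq, min_loop=1):
    """Return the DP matrix F (0-based) for the Nussinov recurrence."""
    n = len(seq)
    F = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):   # cells with j - i <= min_loop stay 0
        for i in range(n - span):
            j = i + span
            # Case 1: x_i pairs with x_j (score 1 if they can pair, else 0).
            best = F[i + 1][j - 1] + (1 if can_pair(seq[i], seq[j]) else 0)
            # Case 2: split the interval at k, i <= k < j.
            for k in range(i, j):
                best = max(best, F[i][k] + F[k + 1][j])
            F[i][j] = best
    return F

F = nussinov_fill("GGGAAAUCC")
print(F[0][-1])  # best score for the whole sequence
```

The top-right cell `F[0][n-1]` plays the role of F(1, N).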

29 Complexity
For L = 1 to N-1
  For i = 1 to N – L
    j = min(i + L, N)
    $F(i, j) = \max \{\, F(i+1, j-1) + S(x_i, x_j),\ \max_{i \le k < j} \left[ F(i, k) + F(k+1, j) \right] \,\}$
Time complexity: $O(N^3)$
Memory: $O(N^2)$
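
A short count makes the cubic bound concrete. It assumes that each of the N – L cells on diagonal L scans its L split points k (i ≤ k < j) at unit cost:

```latex
\sum_{L=1}^{N-1} (N-L)\,L
  \;=\; N\cdot\frac{N(N-1)}{2} \;-\; \frac{(N-1)N(2N-1)}{6}
  \;=\; \frac{N(N-1)(N+1)}{6}
  \;=\; O(N^3)
```

For N = 9 this gives 120 split evaluations. The matrix itself has N × N cells, which is where the $O(N^2)$ memory comes from.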

30 Example
RNA sequence: GGGAAAUCC
Only count # of base-pairs: A-U = 1, G-C = 1, G-U = 1
Minimum hairpin loop length = 1
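
The example can be reproduced with a small script. This is a sketch under stated assumptions rather than the lecture's own code: 0-based indices, unit scores for A-U, G-C and G-U, and a simple stack-based traceback that reports one optimal structure in dot-bracket notation. The names `can_pair` and `nussinov` are illustrative.

```python
def can_pair(a, b):
    return {a, b} in ({"A", "U"}, {"C", "G"}, {"G", "U"})

def nussinov(seq, min_loop=1):
    """Return (max number of pairs, one optimal structure in dot-bracket form)."""
    n = len(seq)
    F = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = F[i + 1][j - 1] + int(can_pair(seq[i], seq[j]))
            for k in range(i, j):
                best = max(best, F[i][k] + F[k + 1][j])
            F[i][j] = best

    # Traceback: revisit the choice that produced each optimal cell.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue                      # too short to hold a pair
        if can_pair(seq[i], seq[j]) and F[i][j] == F[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
            continue
        for k in range(i, j):
            if F[i][j] == F[i][k] + F[k + 1][j]:
                stack.extend([(i, k), (k + 1, j)])
                break
    return (F[0][n - 1] if n else 0), "".join(structure)

score, structure = nussinov("GGGAAAUCC")
print(score, structure)  # 3 (((...)))
```

Other structures with three pairs exist (for example with an A-U pair in place of the inner G-U pair). The traceback simply returns the first one it finds.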

31-34 [Figure: DP matrix for GGGAAAUCC, rows and columns labelled with the sequence, filled in diagonal by diagonal; the first nonzero entries have score 1]

35-38 [Figure: completed DP matrix for GGGAAAUCC with scores 1, 2, 3 and the traceback; the candidate hairpins shown use G-C, G-U and A-U pairs around AAA or AA loops]

39 Energy minimization
For L = 1 to N-1
  For i = 1 to N – L
    j = min(i + L, N)
    $E(i, j) = \min \{\, E(i+1, j-1) + e(x_i, x_j),\ \min_{i \le k < j} \left[ E(i, k) + E(k+1, j) \right] \,\}$
$e(x_i, x_j)$ is the energy of $x_i$ base-pairing with $x_j$
Energies are negative values, hence minimization rather than maximization
More complex energy rules: energy depends on neighboring bases
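
A minimal sketch of the energy version follows. It assumes the approximate pair energies from the base-pairing slide with a negative sign (C-G = -3, A-U = -2, G-U = -1 kcal/mole), e = 0 for bases that cannot pair, and a minimum hairpin loop length of 1. The names `PAIR_ENERGY`, `pair_energy` and `min_energy` are illustrative, and real energy rules are considerably richer.

```python
# Toy pair energies in kcal/mole (assumed; lower is more stable).
PAIR_ENERGY = {frozenset("CG"): -3, frozenset("AU"): -2, frozenset("GU"): -1}

def pair_energy(a, b):
    """e(a, b): energy of the pair, or 0 if the two bases cannot pair."""
    return PAIR_ENERGY.get(frozenset((a, b)), 0)

def min_energy(seq, min_loop=1):
    """Minimum total pair energy over nested structures (0-based DP)."""
    n = len(seq)
    E = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = E[i + 1][j - 1] + pair_energy(seq[i], seq[j])
            for k in range(i, j):
                best = min(best, E[i][k] + E[k + 1][j])
            E[i][j] = best
    return E[0][n - 1] if n else 0

print(min_energy("GGGAAAUCC"))  # -8
```

With these weights the optimum for GGGAAAUCC uses two G-C pairs and one A-U pair, so it differs from the structure that merely maximizes the pair count with an inner G-U pair.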

40 Terminology
[Figure labels: hairpin loops, interior loops, bulge loop, multi-branched loop, stems]

41 The Zuker algorithm – main ideas
Instead of base pairs, score pairs of base pairs (more accurate)
Separate score for bulges
Separate scores for loops of different size and composition
Separate score for interactions between stem and beginning of loop
Use an additional matrix to remember the current state, similar to affine-gap alignment

42 Two popular implementations
mFold by Zuker
RNAfold in the Vienna package (Hofacker)
Includes several useful utilities, such as structure comparison, searching, base-pairing probability from partition functions, etc.

43 Accuracy
50-70% for sequences up to 300 nt
Not perfect, but useful
Possible reasons:
Energy rule not perfect: 5-10% error
Many alternative structures within this error range
Alternative structures do exist
Structure may change in presence of other molecules

44 Comparative structure prediction
Given K homologous aligned RNA sequences:
Human aagacuucggaucuggcgacaccc
Mouse uacacuucggaugacaccaaagug
Worm  aggucuucggcacgggcaccauuc
Fly   ccaacuucggauuuugcuaccaua
Orc   aagccuucggagcgggcguaacuc
If the bases at positions i and j are always complementary and covary, then the two positions are likely to be paired

45 Mutual information
$f_{ab}(i, j)$: fraction of sequences with the pair a, b at positions i, j
$f_a(i)$: fraction of sequences with base a at position i
aagacuucggaucuggcgacaccc
uacacuucggaugacaccaaagug
aggucuucggcacgggcaccauuc
ccaacuucggauuuugcuaccaua
aagccuucggagcgggcguaacuc
$f_{gc}(3,13) = 3/5$, $f_{cg}(3,13) = 1/5$, $f_{au}(3,13) = 1/5$
$f_g(3) = 3/5$, $f_c(3) = 1/5$, $f_a(3) = 1/5$
$f_c(13) = 3/5$, $f_g(13) = 1/5$, $f_u(13) = 1/5$
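
The formula for M did not survive in this transcript. The block below assumes the standard definition of mutual information between two alignment columns (as used, for example, in the Durbin book), with base-2 logarithms, and applies it to the frequencies listed for positions 3 and 13.

```latex
M_{ij} \;=\; \sum_{a,b} f_{ab}(i,j)\,\log_2 \frac{f_{ab}(i,j)}{f_a(i)\,f_b(j)}

M_{3,13} \;=\; \tfrac{3}{5}\log_2\frac{3/5}{(3/5)(3/5)}
        \;+\; 2\cdot\tfrac{1}{5}\log_2\frac{1/5}{(1/5)(1/5)}
        \;=\; \tfrac{3}{5}\log_2\tfrac{5}{3} + \tfrac{2}{5}\log_2 5
        \;\approx\; 1.37 \text{ bits}
```

Because the two columns covary perfectly, this value equals the entropy of either column on its own.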

46 Mutual information
Also called covariance score
M is high if base a in position i is always accompanied by base b in position j
Does not require a to base-pair with b
Advantage: can detect non-canonical base-pairs
However, M = 0 if there is no mutation at all, even if the bases pair perfectly
aagacuucggaucuggcgacaccc
uacacuucggaugacaccaaagug
aggucuucggcacgggcaccauuc
ccaacuucggauuuugcuaccaua
aagccuucggagcgggcguaacuc
One way around this is to combine covariance and energy scores
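
Both properties can be checked on the five-sequence alignment. The sketch below assumes the usual mutual-information formula with base-2 logarithms and 1-based column indices; the names `ALIGNMENT` and `mutual_information` are introduced here for illustration.

```python
from collections import Counter
from math import log2

ALIGNMENT = [
    "aagacuucggaucuggcgacaccc",  # Human
    "uacacuucggaugacaccaaagug",  # Mouse
    "aggucuucggcacgggcaccauuc",  # Worm
    "ccaacuucggauuuugcuaccaua",  # Fly
    "aagccuucggagcgggcguaacuc",  # Orc
]

def mutual_information(alignment, i, j):
    """Mutual information in bits between 1-based columns i and j."""
    n = len(alignment)
    col_i = [s[i - 1] for s in alignment]
    col_j = [s[j - 1] for s in alignment]
    f_i, f_j = Counter(col_i), Counter(col_j)
    f_ij = Counter(zip(col_i, col_j))
    return sum(
        (c / n) * log2((c / n) / ((f_i[a] / n) * (f_j[b] / n)))
        for (a, b), c in f_ij.items()
    )

print(round(mutual_information(ALIGNMENT, 3, 13), 3))  # covarying columns
print(mutual_information(ALIGNMENT, 5, 10))            # conserved c and g columns
```

Columns 3 and 13 covary and score about 1.37 bits. Columns 5 and 10 hold c and g in every sequence, a perfectly conserved potential pair, and score 0.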

47 Comparative structure prediction
Given a multiple alignment, we can infer the structure that maximizes the sum of mutual information, by DP
However, alignment is hard, since structure is often more important than sequence

48 Comparative structure prediction
In practice:
1. Get multiple alignment
2. Find covarying bases – deduce structure
3. Improve multiple alignment (by hand)
4. Go to 2
A manual EM process!!

49 Comparative structure prediction
Align then fold
Align and fold
Fold then align

50 Context-free Grammar for RNA Secondary Structure
S = SS | aSu | cSg | uSa | gSc | L
L = aL | cL | gL | uL | ε
[Figure: parse tree for the example sequence ag u cg aaacgg ugcc]

51 Stochastic Context-free Grammar (SCFG)
Probabilistic context-free grammar
Probabilities can be converted into weights
CFG vs SCFG is similar to regular grammar (RG) vs HMM
S = SS
S = aSu | uSa | L    (weight 2)
S = cSg | gSc | L    (weight 3)
S = uSg | gSu | L    (weight 1)
L = aL | cL | gL | uL | ε
$$F(i, j) = \max \begin{cases} e(x_i, x_j) + F(i+1, j-1) \\ L(i, j) \\ \max_k \left[ F(i, k) + F(k+1, j) \right] \end{cases}$$
$$L(i, j) = 0$$

52 SCFG Decoding
Decoding: given a grammar (SCFG/HMM) and a sequence, find the best parse (highest probability or score)
CYK algorithm (the counterpart of Viterbi)
The Nussinov and Zuker algorithms are essentially special cases of CYK
CYK and SCFGs are also used in other domains (NLP, compilers, etc.)

53 SCFG Evaluation
Given a sequence and an SCFG model, estimate P(seq is generated by model), summing over all possible parses
Inside-outside algorithm, analogous to forward-backward
Inside: bottom-up parsing, $P(x_i \dots x_j)$
Outside: top-down parsing, $P(x_1 \dots x_{i-1}\; x_{j+1} \dots x_N)$
Can calculate base-pairing probability, analogous to posterior decoding
Essentially the same idea is implemented in the Vienna RNAfold package
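
The "inside" pass can be illustrated on a toy model. The sketch below is not RNAfold's energy model and not a full SCFG: it assumes each pair contributes an independent Boltzmann weight exp(-e/RT) with made-up energies, and it uses an unambiguous decomposition (the last base is either unpaired or paired with some base k) so that every nested structure is summed exactly once. The names `pair_weight` and `inside` are hypothetical.

```python
from math import exp

# Toy pair energies in kcal/mole (assumed values for illustration only).
PAIR_ENERGY = {frozenset("CG"): -3.0, frozenset("AU"): -2.0, frozenset("GU"): -1.0}
RT = 0.616  # kcal/mole at about 37 degrees C

def pair_weight(a, b, rt=RT):
    """Boltzmann weight of a pair, or 0.0 if the bases cannot pair."""
    energy = PAIR_ENERGY.get(frozenset((a, b)))
    return 0.0 if energy is None else exp(-energy / rt)

def inside(seq, weight=pair_weight, min_loop=1):
    """Z[i][j] = total weight of all nested structures on seq[i..j] (0-based)."""
    n = len(seq)
    Z = [[1.0] * n for _ in range(n)]

    def z(i, j):
        return Z[i][j] if i <= j else 1.0    # empty interval: one empty structure

    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            total = z(i, j - 1)              # base j is unpaired
            for k in range(i, j - min_loop): # base j pairs with base k
                w = weight(seq[k], seq[j])
                if w:
                    total += z(i, k - 1) * w * z(k + 1, j - 1)
            Z[i][j] = total
    return Z

seq = "GGGAAAUCC"
Z = inside(seq)
# Probability that the first and last bases pair: weight of that pair times
# everything that can happen inside it, divided by the total.
p_outer = pair_weight(seq[0], seq[-1]) * Z[1][len(seq) - 2] / Z[0][len(seq) - 1]
print(p_outer)
```

With all pair weights set to 1, the same recursion counts the nested structures. Probabilities of interior pairs additionally need the outside pass.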

54 SCFG Learning
Covariance model: similar to profile HMMs
Given a set of sequences with common structures, simultaneously learn SCFG parameters and optimally parse sequences into states
EM on SCFG
Inside-outside algorithm
Efficiency is a bottleneck
Has been successfully applied to predict tRNA genes and structures: tRNAScan

55 Future directions
Structure prediction: secondary, tertiary
Structural comparison tools
Structural alignment
Structure search tools: “RNA-BLAST”
Structural motif finding: “RNA-MEME”

