1
RNA Secondary Structure
  aagacuucggaucuggcgacaccc
  uacacuucggaugacaccaaagug
  aggucuucggcacgggcaccauuc
  ccaacuucggauuuugcuaccaua
  aagccuucggagcgggcguaacuc
2
CS262 Lecture 19, Win07, Batzoglou RNA and Translation
3
CS262 Lecture 19, Win07, Batzoglou RNA and Splicing
4
CS262 Lecture 19, Win07, Batzoglou Hairpin Loops Stems Bulge loop Interior loops Multi-branched loop
5
CS262 Lecture 19, Win07, Batzoglou Secondary Structure Tertiary Structure
6
CS262 Lecture 19, Win07, Batzoglou
8
Modeling RNA Secondary Structure: Context-Free Grammars
9
CS262 Lecture 19, Win07, Batzoglou A Context-Free Grammar
Nonterminals: S, A, B
Terminals: a, b, c, d
Production rules (5 rules):
  S → AB
  A → aAc | a
  B → bBd | b
Derivation: start from the nonterminal S; apply any production rule, replacing a nonterminal by the rule's right-hand side, until no nonterminals remain:
  S → AB → aAcB → … → aaaacccB → aaaacccbBd → … → aaaacccbbbbbdddd
Produces all strings a^(i+1) c^i b^(j+1) d^j, for i, j ≥ 0
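As a small illustration (not part of the lecture), here is a minimal Python sketch of this grammar: it applies the recursive rules a chosen number of times and checks the claimed pattern a^(i+1) c^i b^(j+1) d^j. The function names derive_A, derive_B, derive_S are invented for this example.

# Sketch of the example grammar S -> AB, A -> aAc | a, B -> bBd | b.
def derive_A(i):
    # apply A -> aAc exactly i times, then A -> a: yields a^(i+1) c^i
    return "a" * i + "a" + "c" * i

def derive_B(j):
    # apply B -> bBd exactly j times, then B -> b: yields b^(j+1) d^j
    return "b" * j + "b" + "d" * j

def derive_S(i, j):
    # S -> AB
    return derive_A(i) + derive_B(j)

# The derivation shown above uses i = 3, j = 4:
assert derive_S(3, 4) == "aaaacccbbbbbdddd"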
10
CS262 Lecture 19, Win07, Batzoglou Example: modeling a stem loop
  S → a W1 u
  W1 → c W2 g
  W2 → g W3 c
  W3 → g L c
  L → agugc
What if the stem loop can have other letters in place of the ones shown?
(Figure: a stem-loop with stem ACGG paired to UGCC and its loop.)
11
CS262 Lecture 19, Win07, Batzoglou Example: modeling a stem loop
  S → a W1 u | g W1 u
  W1 → c W2 g
  W2 → g W3 c | g W3 u
  W3 → g L c | a L u
  L → agucg | agccg | cugugc
More general: any 4-long stem, 3-5-long loop:
  S → aW1u | gW1u | gW1c | cW1g | uW1g | uW1a
  W1 → aW2u | gW2u | gW2c | cW2g | uW2g | uW2a
  W2 → aW3u | gW3u | gW3c | cW3g | uW3g | uW3a
  W3 → aLu | gLu | gLc | cLg | uLg | uLa
  L → aL1 | cL1 | gL1 | uL1
  L1 → aL2 | cL2 | gL2 | uL2
  L2 → a | c | g | u | aa | … | uu | aaa | … | uuu
(Figure: three example stem-loops, including ACGG/UGCC, GCGA/UGCU, and GCGA/UGUU stems with their loops.)
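As an aside (not from the slides), the language of the "any 4-long stem, 3-5-long loop" grammar can also be checked directly: the first 4 bases must pair with the last 4 in reverse (Watson-Crick or G-U), and the middle must be 3-5 nucleotides long. A minimal Python sketch, with the invented helper name is_stem_loop:

ALLOWED_PAIRS = {("a", "u"), ("u", "a"), ("g", "c"), ("c", "g"), ("g", "u"), ("u", "g")}

def is_stem_loop(seq):
    """True if seq is a 4-bp stem (a/u, g/c, or g/u pairs) closing a 3-5 nt loop."""
    n = len(seq)
    if not 11 <= n <= 13:                       # 4 stem + loop of 3..5 + 4 stem
        return False
    for k in range(4):                          # base k pairs with base n-1-k
        if (seq[k], seq[n - 1 - k]) not in ALLOWED_PAIRS:
            return False
    return True

print(is_stem_loop("acggagucgccgu"))   # True: 4-bp stem around a 5-nt loop
print(is_stem_loop("acggagucgaaaa"))   # False: the last four bases do not pair with acgg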
12
CS262 Lecture 19, Win07, Batzoglou A parse tree: alignment of CFG to sequence
Grammar:
  S → a W1 u
  W1 → c W2 g
  W2 → g W3 c
  W3 → g L c
  L → agucg
(Figure: parse tree rooted at S, with W1, W2, W3 spanning the stem pairs and L spanning the loop of the sequence A C G G A G U G C C C G U.)
13
CS262 Lecture 19, Win07, Batzoglou Alignment scores for parses!
We can define each rule X → s, where s is a string, to have a score.
Example:
  W → g W' c : 3   (G·C forms 3 hydrogen bonds)
  W → a W' u : 2   (A·U forms 2 hydrogen bonds)
  W → g W' u : 1   (G·U wobble pair, weaker)
  W → x W' z : -1, when (x, z) is not an a/u, g/c, or g/u pair
Questions:
  - How do we best align a CFG to a sequence? (DP)
  - How do we set the parameters? (Stochastic CFGs)
14
CS262 Lecture 19, Win07, Batzoglou The Nussinov Algorithm
Let's forget CFGs for a moment.
Problem: Find the RNA structure with the maximum (weighted) number of nested pairings.
(Figure: the example sequence ACCACGCUUAAGACACCUAGCUUGUGUCCUGGAGGUCUAUAAGUCAGACCGCGAGAGGGAAGACUCGUAUAAGCGA and one of its folded structures.)
15
CS262 Lecture 19, Win07, Batzoglou The Nussinov Algorithm
Given sequence X = x_1…x_N, define the DP matrix:
  F(i, j) = maximum weighted number of bonds if x_i…x_j folds optimally
Two cases, if i < j:
  1. x_i is paired with x_j:
     F(i, j) = s(x_i, x_j) + F(i+1, j-1)
  2. x_i is not paired with x_j:
     F(i, j) = max over {k : i ≤ k < j} of F(i, k) + F(k+1, j)
16
CS262 Lecture 19, Win07, Batzoglou The Nussinov Algorithm
Initialization:
  F(i, i-1) = 0   for i = 2 to N
  F(i, i) = 0     for i = 1 to N
Iteration:
  For l = 2 to N:
    For i = 1 to N - l + 1:
      j = i + l - 1
      F(i, j) = max of:
        F(i+1, j-1) + s(x_i, x_j)
        max over {k : i ≤ k < j} of F(i, k) + F(k+1, j)
Termination:
  Best structure is given by F(1, N)
  (Need to trace back; refer to the Durbin book)
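A minimal Python sketch of the Nussinov fill plus a simple traceback (not the lecture's own code), assuming the pair scores from the scoring slide above; all names here are illustrative.

PAIR_SCORE = {("g", "c"): 3, ("c", "g"): 3,
              ("a", "u"): 2, ("u", "a"): 2,
              ("g", "u"): 1, ("u", "g"): 1}

def s(a, b):
    return PAIR_SCORE.get((a, b), -1)          # -1 for non a/u, g/c, g/u pairs

def nussinov(x):
    """Fill F(i, j) = max weighted number of nested pairings in x_i..x_j (1-based)."""
    N = len(x)
    F = [[0] * (N + 1) for _ in range(N + 1)]  # F(i, i-1) = F(i, i) = 0 by construction
    for l in range(2, N + 1):                  # span length
        for i in range(1, N - l + 2):
            j = i + l - 1
            best = F[i + 1][j - 1] + s(x[i - 1], x[j - 1])   # x_i paired with x_j
            for k in range(i, j):                            # x_i not paired with x_j
                best = max(best, F[i][k] + F[k + 1][j])
            F[i][j] = best
    return F

def traceback(F, x, i, j, pairs):
    """Recover one optimal set of base pairs from the filled matrix."""
    if i >= j:
        return
    if F[i][j] == F[i + 1][j - 1] + s(x[i - 1], x[j - 1]):
        pairs.append((i, j))
        traceback(F, x, i + 1, j - 1, pairs)
        return
    for k in range(i, j):
        if F[i][j] == F[i][k] + F[k + 1][j]:
            traceback(F, x, i, k, pairs)
            traceback(F, x, k + 1, j, pairs)
            return

seq = "gggaaauccc"
F = nussinov(seq)
pairs = []
traceback(F, seq, 1, len(seq), pairs)
print(F[1][len(seq)], pairs)                   # best score and one optimal structure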
17
CS262 Lecture 19, Win07, Batzoglou The Nussinov Algorithm and CFGs
Define the following grammar, with scores:
  S → g S c : 3 | c S g : 3
      a S u : 2 | u S a : 2
      g S u : 1 | u S g : 1
      S S : 0 | a S : 0 | c S : 0 | g S : 0 | u S : 0 | ε : 0
Note: ε is the empty string
Then, the Nussinov algorithm finds the optimal parse of a string with this grammar
18
CS262 Lecture 19, Win07, Batzoglou The Nussinov Algorithm
Initialization:
  F(i, i-1) = 0   for i = 2 to N
  F(i, i) = 0     for i = 1 to N                         [S → a | c | g | u]
Iteration:
  For l = 2 to N:
    For i = 1 to N - l + 1:
      j = i + l - 1
      F(i, j) = max of:
        F(i+1, j-1) + s(x_i, x_j)                        [S → a S u | …]
        max over {k : i ≤ k < j} of F(i, k) + F(k+1, j)  [S → S S]
Termination:
  Best structure is given by F(1, N)
19
CS262 Lecture 19, Win07, Batzoglou Stochastic Context-Free Grammars
In analogy to HMMs, we can assign probabilities to transitions:
Given grammar
  X_1 → s_11 | … | s_1n
  …
  X_m → s_m1 | … | s_mn
we can assign a probability to each rule, such that
  P(X_i → s_i1) + … + P(X_i → s_in) = 1
20
CS262 Lecture 19, Win07, Batzoglou Example
  S → a S b : ½
      a : ¼
      b : ¼
Probability distribution over all strings x:
  If x = a^n b^(n+1), then P(x) = 2^(-n) · ¼ = 2^-(n+2)
  If x = a^(n+1) b^n, same
  Otherwise: P(x) = 0
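A quick check of this claim (an illustration, not from the lecture): because only S → aSb can generate a string longer than one symbol, each string has at most one parse, and its probability can be computed by peeling off the outer a…b pairs. The helper name string_prob is invented here.

def string_prob(x):
    # Grammar: S -> aSb (1/2) | a (1/4) | b (1/4)
    if x in ("a", "b"):
        return 0.25
    if len(x) >= 3 and x[0] == "a" and x[-1] == "b":
        return 0.5 * string_prob(x[1:-1])
    return 0.0                                  # not generated by the grammar

n = 5
x = "a" * n + "b" * (n + 1)
print(string_prob(x), 2.0 ** -(n + 2))          # both print 0.0078125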
21
CS262 Lecture 19, Win07, Batzoglou Computational Problems
  - Calculate an optimal alignment of a sequence and an SCFG (DECODING)
  - Calculate Prob[ sequence | grammar ] (EVALUATION)
  - Given a set of sequences, estimate the parameters of an SCFG (LEARNING)
22
CS262 Lecture 19, Win07, Batzoglou Normal Forms for CFGs
Chomsky Normal Form:
  X → YZ
  X → a
All productions are either to 2 nonterminals, or to 1 terminal.
Theorem (technical): Every CFG has an equivalent grammar in Chomsky Normal Form
(the grammar in normal form produces exactly the same set of strings).
23
CS262 Lecture 19, Win07, Batzoglou Example of converting a CFG to C.N.F.
Original grammar:
  S → ABC
  A → Aa | a
  B → Bb | b
  C → CAc | c
Converting:
  S → AS'
  S' → BC
  A → AA | a
  B → BB | b
  C → DC' | c
  C' → c
  D → CA
(Figure: parse trees of the same string under the original grammar and under the converted grammar.)
24
CS262 Lecture 19, Win07, Batzoglou Another example
Original grammar:
  S → ABC
  A → C | aA
  B → bB | b
  C → cCd | c
Converting:
  S → AS'
  S' → BC
  A → C'C'' | c | A'A
  A' → a
  B → B'B | b
  B' → b
  C → C'C'' | c
  C' → c
  C'' → CD
  D → d
25
CS262 Lecture 19, Win07, Batzoglou Decoding: the CYK algorithm
Given x = x_1…x_N and an SCFG G, find the most likely parse of x (the most likely alignment of G to x).
Dynamic programming variable:
  γ(i, j, V): likelihood of the most likely parse of x_i…x_j, rooted at nonterminal V
Then γ(1, N, S) is the likelihood of the most likely parse of x by the grammar.
26
CS262 Lecture 19, Win07, Batzoglou The CYK algorithm (Cocke-Younger-Kasami)
Initialization:
  For i = 1 to N, for any nonterminal V:
    γ(i, i, V) = log P(V → x_i)
Iteration:
  For i = N - 1 down to 1:
    For j = i + 1 to N:
      For any nonterminal V:
        γ(i, j, V) = max_X max_Y max_{i ≤ k < j} γ(i, k, X) + γ(k+1, j, Y) + log P(V → XY)
Termination:
  log P(x, π* | θ) = γ(1, N, S)
  where π* is the optimal parse tree (if traced back appropriately from above)
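A minimal Python sketch of CYK over an SCFG in Chomsky Normal Form, using log probabilities; the tiny grammar (S → AB, A → a, B → b, all with probability 1) and the variable names are made up for this example, and traceback pointers are omitted.

import math

term_rules = {("A", "a"): 1.0, ("B", "b"): 1.0}        # P(V -> terminal)
binary_rules = {("S", "A", "B"): 1.0}                  # P(V -> X Y)
nonterminals = ["S", "A", "B"]

def cyk(x):
    N = len(x)
    NEG = float("-inf")
    gamma = {(i, i, V): (math.log(term_rules[(V, x[i - 1])])
                         if (V, x[i - 1]) in term_rules else NEG)
             for i in range(1, N + 1) for V in nonterminals}
    for i in range(N - 1, 0, -1):                      # i = N-1 down to 1
        for j in range(i + 1, N + 1):
            for V in nonterminals:
                best = NEG
                for (W, X, Y), p in binary_rules.items():
                    if W != V:
                        continue
                    for k in range(i, j):
                        best = max(best, gamma[(i, k, X)] + gamma[(k + 1, j, Y)]
                                   + math.log(p))
                gamma[(i, j, V)] = best
    return gamma[(1, N, "S")]                          # log-likelihood of the best parse

print(cyk("ab"))                                       # 0.0, i.e. probability 1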
27
CS262 Lecture 19, Win07, Batzoglou An SCFG for predicting RNA structure
  S → a S | c S | g S | u S |
      S a | S c | S g | S u |
      a S u | c S g | g S u | u S g | g S c | u S a |
      S S
Adjust the probability parameters to reflect bond strength, etc.
There is no distinction between non-paired bases, bulges, and loops; the grammar can be modified to model these events:
  L: loop nonterminal
  H: hairpin nonterminal
  B: bulge nonterminal
  etc.
28
CS262 Lecture 19, Win07, Batzoglou CYK for RNA folding
Initialization:
  γ(i, i-1) = log P(ε)
Iteration:
  For i = N down to 1:
    For j = i to N:
      γ(i, j) = max of:
        γ(i+1, j-1) + log P(S → x_i S x_j)
        γ(i, j-1) + log P(S → S x_j)
        γ(i+1, j) + log P(S → x_i S)
        max_{i < k < j} γ(i, k) + γ(k+1, j) + log P(S → S S)
29
CS262 Lecture 19, Win07, Batzoglou Evaluation
Recall HMMs:
  Forward:  f_l(i) = P(x_1…x_i, π_i = l)
  Backward: b_k(i) = P(x_{i+1}…x_N | π_i = k)
  Then, P(x) = Σ_k f_k(N) a_k0 = Σ_k a_0k e_k(x_1) b_k(1)
Analogue in SCFGs:
  Inside:  a(i, j, V) = P(x_i…x_j is generated by nonterminal V)
  Outside: b(i, j, V) = P(x, excluding x_i…x_j, is generated by S and the excluded part is rooted at V)
30
CS262 Lecture 19, Win07, Batzoglou The Inside Algorithm
To compute a(i, j, V) = P(x_i…x_j is produced by V):
  a(i, j, V) = Σ_X Σ_Y Σ_k a(i, k, X) a(k+1, j, Y) P(V → XY)
31
CS262 Lecture 19, Win07, Batzoglou Algorithm: Inside
Initialization:
  For i = 1 to N, for each nonterminal V:
    a(i, i, V) = P(V → x_i)
Iteration:
  For i = N - 1 down to 1:
    For j = i + 1 to N:
      For each nonterminal V:
        a(i, j, V) = Σ_X Σ_Y Σ_{i ≤ k < j} a(i, k, X) a(k+1, j, Y) P(V → XY)
Termination:
  P(x | θ) = a(1, N, S)
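A minimal Python sketch of the Inside algorithm (illustrative only). The grammar here is a hand-made CNF rewrite of the earlier toy grammar S → aSb (½) | a (¼) | b (¼), with helper nonterminals A, B, T introduced for this example; for the string "aab" the inside value a(1, 3, S) should equal ½ · ¼ = 0.125.

term_rules = {("S", "a"): 0.25, ("S", "b"): 0.25,             # P(V -> terminal)
              ("A", "a"): 1.0, ("B", "b"): 1.0}
binary_rules = {("S", "A", "T"): 0.5, ("T", "S", "B"): 1.0}   # P(V -> X Y)
nonterminals = ["S", "T", "A", "B"]

def inside(x):
    N = len(x)
    a = {(i, i, V): term_rules.get((V, x[i - 1]), 0.0)
         for i in range(1, N + 1) for V in nonterminals}
    for i in range(N - 1, 0, -1):                      # i = N-1 down to 1
        for j in range(i + 1, N + 1):
            for V in nonterminals:
                total = 0.0
                for (W, X, Y), p in binary_rules.items():
                    if W != V:
                        continue
                    for k in range(i, j):
                        total += a[(i, k, X)] * a[(k + 1, j, Y)] * p
                a[(i, j, V)] = total
    return a

a = inside("aab")
print(a[(1, 3, "S")])                                  # 0.125 = P("aab" | grammar)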
32
CS262 Lecture 19, Win07, Batzoglou The Outside Algorithm
  b(i, j, V) = Prob(x_1…x_{i-1}, x_{j+1}…x_N, where the “gap” x_i…x_j is rooted at V)
In the case where V is the right-hand nonterminal of a production Y → XV:
  b(i, j, V) = Σ_X Σ_Y Σ_{k < i} a(k, i-1, X) b(k, j, Y) P(Y → XV)
33
CS262 Lecture 19, Win07, Batzoglou Algorithm: Outside
Initialization:
  b(1, N, S) = 1
  For any other V, b(1, N, V) = 0
Iteration:
  For i = 1 to N - 1:
    For j = N down to i:
      For each nonterminal V:
        b(i, j, V) = Σ_X Σ_Y Σ_{k < i} a(k, i-1, X) b(k, j, Y) P(Y → XV)
                   + Σ_X Σ_Y Σ_{k > j} a(j+1, k, X) b(i, k, Y) P(Y → VX)
Termination:
  It is true for any i that: P(x | θ) = Σ_X b(i, i, X) P(X → x_i)
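Continuing the inside() sketch above (same toy grammar; this reuses nonterminals, binary_rules, term_rules and the filled table a), here is a hedged Python sketch of the Outside algorithm, filling spans from longest to shortest; the termination identity should reproduce the same P("aab") = 0.125 at every position i.

def outside(x, a):
    N = len(x)
    b = {(i, j, V): 0.0 for i in range(1, N + 1)
         for j in range(i, N + 1) for V in nonterminals}
    b[(1, N, "S")] = 1.0
    for span in range(N - 1, 0, -1):                   # shorter spans need longer ones
        for i in range(1, N - span + 2):
            j = i + span - 1
            for V in nonterminals:
                total = 0.0
                for (parent, left, right), p in binary_rules.items():
                    if right == V:                     # rule parent -> left V
                        for k in range(1, i):
                            total += a[(k, i - 1, left)] * b[(k, j, parent)] * p
                    if left == V:                      # rule parent -> V right
                        for k in range(j + 1, N + 1):
                            total += a[(j + 1, k, right)] * b[(i, k, parent)] * p
                b[(i, j, V)] = total
    return b

b = outside("aab", a)
for i in (1, 2, 3):                                    # termination check, any i works
    print(sum(b[(i, i, X)] * term_rules.get((X, "aab"[i - 1]), 0.0)
              for X in nonterminals))                  # each line: 0.125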
34
CS262 Lecture 19, Win07, Batzoglou Learning for SCFGs
We can now estimate
  c(V) = expected number of times V is used in the parse of x_1…x_N
       = (1 / P(x | θ)) Σ_{1 ≤ i ≤ N} Σ_{i ≤ j ≤ N} a(i, j, V) b(i, j, V)
  c(V → XY) = (1 / P(x | θ)) Σ_{1 ≤ i ≤ N} Σ_{i < j ≤ N} Σ_{i ≤ k < j} b(i, j, V) a(i, k, X) a(k+1, j, Y) P(V → XY)
35
CS262 Lecture 19, Win07, Batzoglou Learning for SCFGs
Then, we can re-estimate the parameters with EM, by:
  P_new(V → XY) = c(V → XY) / c(V)
  P_new(V → a) = c(V → a) / c(V)
               = Σ_{i: x_i = a} b(i, i, V) P(V → a)  /  Σ_{1 ≤ i ≤ N} Σ_{i ≤ j ≤ N} a(i, j, V) b(i, j, V)
36
CS262 Lecture 19, Win07, Batzoglou Summary: SCFG and HMM algorithms
  GOAL               HMM algorithm     SCFG algorithm
  Optimal parse      Viterbi           CYK
  Estimation         Forward           Inside
                     Backward          Outside
  Learning           EM: Fw/Bck        EM: Ins/Outs
  Memory complexity  O(N K)            O(N^2 K)
  Time complexity    O(N K^2)          O(N^3 K^3)
where K = # of states in the HMM, or # of nonterminals in the SCFG
37
CS262 Lecture 19, Win07, Batzoglou The Zuker algorithm – main ideas
Models energy of a fold in terms of specific features:
  1. Pairs of base pairs (stacked pairs)
  2. Bulges
  3. Loops (size, composition)
  4. Interactions between stem and loop
(Figures: a stacked pair at positions i and j; a bulge of length l; a loop between positions i, j, and j'.)