1
CS 5263 Bioinformatics Lecture 5: Affine Gap Penalties
2
Last lecture
Local Sequence Alignment
Bounded Dynamic Programming
Linear Space Sequence Alignment
3
The Smith-Waterman algorithm
Initialization: $F(0, j) = F(i, 0) = 0$
Iteration:
$$F(i, j) = \max \begin{cases} 0 \\ F(i-1, j) - d \\ F(i, j-1) - d \\ F(i-1, j-1) + s(x_i, y_j) \end{cases}$$
4
The Smith-Waterman algorithm
Termination:
1. If we want the best local alignment: $F_{OPT} = \max_{i,j} F(i, j)$
2. If we want all local alignments scoring > t: for all i, j with $F(i, j) > t$, trace back
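The recurrence and the first termination rule can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: the function name `smith_waterman` and the default scores (`match=2`, `mismatch=-1`, linear gap penalty `d=2`) are assumptions chosen for the example.

```python
def smith_waterman(x, y, match=2, mismatch=-1, d=2):
    """Best local alignment of x and y with a linear gap penalty d (d > 0).

    Returns (best_score, aligned_x, aligned_y). The scoring values are
    illustrative assumptions, not values taken from the slides.
    """
    M, N = len(x), len(y)
    # Initialization: F(0, j) = F(i, 0) = 0
    F = [[0] * (N + 1) for _ in range(M + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(0,
                          F[i - 1][j] - d,
                          F[i][j - 1] - d,
                          F[i - 1][j - 1] + s)
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)

    # Termination (case 1): trace back from the maximum cell until a 0 is hit.
    i, j = best_pos
    ax, ay = [], []
    while i > 0 and j > 0 and F[i][j] > 0:
        s = match if x[i - 1] == y[j - 1] else mismatch
        if F[i][j] == F[i - 1][j - 1] + s:
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif F[i][j] == F[i - 1][j] - d:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


print(smith_waterman("TTACGTT", "GGACGGG"))
```

For the second termination rule, the same traceback would be started from every cell whose score exceeds t rather than only from the maximum.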
5
Bounded Dynamic Programming
O(kM) time
O(kN) memory
[Figure: DP matrix with $x_1 \ldots x_M$ along one axis and $y_1 \ldots y_N$ along the other; only a band of width k is filled.]
6
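Linear-space alignment
O(M+N) memory
2MN time
[Figure: DP matrix split at column M/2; the crossing row k* divides the other axis into parts of length k* and N-k*.]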
7
Homework Problem 5 hints
Dot matrix for visualizing seq similarities
Seq1: x[1..m]
Seq2: y[1..n]
$A(i, j) = 1$ if $\delta(x_i, y_j) = 1$
$A(i, j) = 1$ if $\sum_{k=1}^{10} \delta(x_{i+k}, y_{j+k}) > 7$
$A(i, j) = 1$ if $\sum_{k=1}^{20} \delta(x_{i+k}, y_{j+k}) > 15$
Here $\delta(a, b)$ is 1 when the two characters match and 0 otherwise.
A dot matrix does not do any alignment (global or local). It helps to detect strongly conserved regions.
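A windowed dot matrix of this kind could be computed roughly as below. This is only a sketch: the function name `dot_matrix` and its parameters are hypothetical, and it uses 0-based indices with the window covering offsets 0..window-1, whereas the slide counts k from 1.

```python
def dot_matrix(x, y, window=1, threshold=0):
    """Return A where A[i][j] = 1 if more than `threshold` of the `window`
    character pairs (x[i+k], y[j+k]), k = 0..window-1, are identical.

    window=1, threshold=0 gives the plain dot matrix; window=10, threshold=7
    and window=20, threshold=15 correspond to the two filtered variants.
    """
    rows = max(len(x) - window + 1, 0)
    cols = max(len(y) - window + 1, 0)
    A = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            matches = sum(x[i + k] == y[j + k] for k in range(window))
            if matches > threshold:
                A[i][j] = 1
    return A


for row in dot_matrix("GCAC", "GCC"):
    print(row)
```

No alignment is computed here: each cell only counts identical characters along a short diagonal, which is why long runs of 1s hint at conserved regions.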
8
[Figure: dot matrix plot with Seq1 on one axis and Seq2 on the other.]
9
Today
How to model gaps more accurately?
Statistics of alignments
–Where does $s(x_i, y_j)$ come from?
–Are two aligned sequences actually related? – not today
10
What’s a better alignment?
GACGCCGAACG
|||||   |||
GACGC---ACG
GACGCCGAACG
|||| | | ||
GACG-C-A-CG
Score = 8 x m – 3 x d (for both alignments)
However, gaps usually occur in bunches.
-During evolution, chunks of DNA may be lost entirely
-Aligning genomic sequences vs. cDNAs (reverse complementary to mRNAs)
11
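Model gaps more accurately
Current model:
–Gap of length n incurs penalty $n \times d$
General:
–Convex function
–E.g. $\gamma(n) = c \cdot \sqrt{n}$
[Figure: gap penalty $\gamma(n)$ plotted against gap length n for the two models.]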
12
General gap dynamic programming
Initialization: same
Iteration:
$$F(i, j) = \max \begin{cases} F(i-1, j-1) + s(x_i, y_j) \\ \max_{k=0 \ldots i-1} F(k, j) - \gamma(i-k) \\ \max_{k=0 \ldots j-1} F(i, k) - \gamma(j-k) \end{cases}$$
Termination: same
Running time: O((M+N)MN) (cubic)
Space: O(NM) (linear-space algorithm not applicable)
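To make the cubic cost concrete, here is a rough Python sketch of the general-gap recurrence for global alignment. The name `general_gap_align`, the `match`/`mismatch` defaults and the handling of boundary cells (the same recurrence is applied to row 0 and column 0, starting from F(0, 0) = 0) are assumptions made for this example; `gamma` is any function giving the penalty of a gap of length n as a positive number.

```python
import math


def general_gap_align(x, y, gamma, match=1, mismatch=-1):
    """Global alignment score under an arbitrary gap penalty gamma(n) >= 0.

    Every cell scans its whole row and column, hence O((M+N)MN) time.
    """
    M, N = len(x), len(y)
    NEG = float("-inf")
    F = [[NEG] * (N + 1) for _ in range(M + 1)]
    F[0][0] = 0.0
    for i in range(M + 1):
        for j in range(N + 1):
            if i == 0 and j == 0:
                continue
            best = NEG
            if i > 0 and j > 0:
                s = match if x[i - 1] == y[j - 1] else mismatch
                best = F[i - 1][j - 1] + s
            # gap of length i-k ending at row i (x characters unpaired)
            for k in range(i):
                best = max(best, F[k][j] - gamma(i - k))
            # gap of length j-k ending at column j (y characters unpaired)
            for k in range(j):
                best = max(best, F[i][k] - gamma(j - k))
            F[i][j] = best
    return F[M][N]


# Example with a square-root penalty, c = 3 (an arbitrary choice)
print(general_gap_align("GACGCCGAACG", "GACGCACG",
                        lambda n: 3 * math.sqrt(n), match=2, mismatch=-2))
```

The two inner loops over k are what the affine model removes: once the penalty depends only on whether a gap is being opened or extended, looking one cell back is enough.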
13
Compromise: affine gaps
$\gamma(n) = d + (n - 1) e$, where d is the gap-open penalty and e is the gap-extension penalty
[Figure: $\gamma(n)$ plotted against n, with d and e marked.]
Match: 2, Gap open: -5, Gap extension: -1
GACGCCGAACG
|||||   |||
GACGC---ACG
Score: 8x2-5-2 = 9
GACGCCGAACG
|||| | | ||
GACG-C-A-CG
Score: 8x2-3x5 = 1
We want to find the optimal alignment with affine gap penalty in
–O(MN) time
–O(MN) or better O(M+N) memory
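The two scores on this slide can be checked with a short Python sketch that scores an already-gapped alignment under the affine model. The function name `affine_alignment_score` is hypothetical, and the mismatch score of -2 is an assumption (this slide does not give one, and neither example alignment contains a mismatch).

```python
def affine_alignment_score(a, b, match=2, mismatch=-2, d=-5, e=-1):
    """Score two gapped strings of equal length column by column.

    d is charged for the first gap column of a run, e for each further
    column of the same run. d and e are negative and are added.
    """
    if len(a) != len(b):
        raise ValueError("aligned strings must have equal length")
    score = 0
    prev_gap_row = None  # which row carried '-' in the previous column
    for ca, cb in zip(a, b):
        if ca == "-" or cb == "-":
            row = "a" if ca == "-" else "b"
            score += e if prev_gap_row == row else d
            prev_gap_row = row
        else:
            score += match if ca == cb else mismatch
            prev_gap_row = None
    return score


print(affine_alignment_score("GACGCCGAACG", "GACGC---ACG"))  # one gap of length 3
print(affine_alignment_score("GACGCCGAACG", "GACG-C-A-CG"))  # three gaps of length 1
```

Under a linear penalty both alignments would score the same; the affine model separates them because three gap openings cost more than one opening plus two extensions.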
14
Allowing affine gap penalties
Still three cases
–$x_i$ aligned with $y_j$
–$x_i$ aligns to a gap: are we continuing a gap in x? (if no, start is more expensive)
–$y_j$ aligns to a gap: are we continuing a gap in y? (if no, start is more expensive)
We can use a finite state machine to represent the three cases as three states
–The machine has two heads, reading the chars on the two strings separately
–At every step, each head reads 0 or 1 char from each sequence
–Depending on what it reads, goes to a different state, and produces different scores
15
Finite State Machine
F: have just read 1 char from each seq ($x_i$ aligned to $y_j$)
Ix: have read 0 char from x ($y_j$ aligned to a gap)
Iy: have read 0 char from y ($x_i$ aligned to a gap)
[Figure: three states F, Ix, Iy; each transition is labelled Input / Output.]
16
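[Figure: FSM with start state F and states Ix, Iy. Transition labels (Input / Output): (x_i, y_j) / s(x_i, y_j); (x_i, -) / d; (x_i, -) / e; (-, y_j) / d; (-, y_j) / e.]
Current state | Input      | Output      | Next state
F             | (x_i, y_j) | s(x_i, y_j) | F
F             | (-, y_j)   | d           | Ix
F             | (x_i, -)   | d           | Iy
Ix            | (-, y_j)   | e           | Ix
…             | …          | …           | …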
17
Sequences: AAC and ACT
AAC
|||
ACT
State path: F-F-F-F
AAC-
 ||
-ACT
State path: F-Iy-F-F-Ix
AAC-
| |
A-CT
State path: F-F-Iy-F-Ix
[Figure: the FSM with start state F and states Ix, Iy, with the same Input / Output transition labels as before.]
Given a pair of sequences, an alignment (not necessarily optimal) corresponds to a state path in the FSM.
Optimal alignment: find a state path to read the two sequences such that the total output score is the highest
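The correspondence between an alignment and a state path can be sketched in Python as below. The function names `state_path` and `path_score` are hypothetical, and the scoring values used in the example call (match 2, mismatch -2, d = -5, e = -1) are assumptions borrowed for illustration. Charging d for a direct move between Ix and Iy is also an assumption, since the transition table is only partly shown.

```python
def state_path(ax, ay):
    """Map a gapped alignment (two equal-length strings) to its FSM state path.

    The path starts in F; each column then contributes one state:
    F if both heads read a char, Ix if nothing was read from x
    (y_j against a gap), Iy if nothing was read from y (x_i against a gap).
    """
    path = ["F"]  # start state
    for cx, cy in zip(ax, ay):
        if cx == "-":
            path.append("Ix")
        elif cy == "-":
            path.append("Iy")
        else:
            path.append("F")
    return path


def path_score(ax, ay, s, d, e):
    """Sum the FSM outputs along the state path of the alignment."""
    path = state_path(ax, ay)
    total = 0
    for prev, cur, cx, cy in zip(path, path[1:], ax, ay):
        if cur == "F":
            total += s(cx, cy)               # output s(x_i, y_j)
        else:
            total += e if prev == cur else d  # extend vs. open
    return total


score_fn = lambda a, b: 2 if a == b else -2   # assumed scoring function
for ax, ay in [("AAC", "ACT"), ("AAC-", "-ACT"), ("AAC-", "A-CT")]:
    print("-".join(state_path(ax, ay)), path_score(ax, ay, score_fn, -5, -1))
```

Finding the optimal alignment then amounts to searching over all such paths, which is what the three-matrix dynamic program does without enumerating them.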
18
Dynamic programming
We encode this information in three different matrices
For each element (i, j) we use three variables
–$F(i, j)$: best alignment (score) of $x_1 \ldots x_i$ & $y_1 \ldots y_j$ if $x_i$ aligns to $y_j$
–$I_x(i, j)$: best alignment of $x_1 \ldots x_i$ & $y_1 \ldots y_j$ if $y_j$ aligns to gap
–$I_y(i, j)$: best alignment of $x_1 \ldots x_i$ & $y_1 \ldots y_j$ if $x_i$ aligns to gap
[Figure: the three cases side by side, labelled F(i, j), Ix(i, j) and Iy(i, j), each showing how $x_i$ and $y_j$ end the alignment.]
19
[Figure: the FSM with states F, Ix, Iy and its Input / Output labels; case shown: $x_i$ aligned to $y_j$.]
$$F(i, j) = \max \begin{cases} F(i-1, j-1) + s(x_i, y_j) \\ I_x(i-1, j-1) + s(x_i, y_j) \\ I_y(i-1, j-1) + s(x_i, y_j) \end{cases}$$
20
[Figure: the FSM with states F, Ix, Iy and its Input / Output labels; case shown: Ix(i, j), $y_j$ aligned to a gap.]
$$I_x(i, j) = \max \begin{cases} F(i, j-1) + d \\ I_x(i, j-1) + e \end{cases}$$
21
[Figure: the FSM with states F, Ix, Iy and its Input / Output labels; case shown: Iy(i, j), $x_i$ aligned to a gap.]
$$I_y(i, j) = \max \begin{cases} F(i-1, j) + d \\ I_y(i-1, j) + e \end{cases}$$
22
$$F(i, j) = s(x_i, y_j) + \max \begin{cases} F(i-1, j-1) & \text{continuing alignment} \\ I_x(i-1, j-1) & \text{closing gaps in x} \\ I_y(i-1, j-1) & \text{closing gaps in y} \end{cases}$$
$$I_x(i, j) = \max \begin{cases} F(i, j-1) + d & \text{opening a gap in x} \\ I_x(i, j-1) + e & \text{gap extension in x} \end{cases}$$
$$I_y(i, j) = \max \begin{cases} F(i-1, j) + d & \text{opening a gap in y} \\ I_y(i-1, j) + e & \text{gap extension in y} \end{cases}$$
23
Data dependency
[Figure: cell (i, j) of F depends on cell (i-1, j-1) of F, Ix and Iy.]
24
Data dependency
[Figure: dependencies of cell (i, j) in Iy and Ix on neighbouring cells of F and of the same matrix.]
25
If we stack all three matrices
–No cyclic dependency
–Therefore, we can fill in all three matrices in order
26
Algorithm
for i = 1:M
    for j = 1:N
        Fill in F(i, j), Ix(i, j), Iy(i, j)
    end
end
Optimal score = max (F(M, N), Ix(M, N), Iy(M, N))
Time: O(MN)
Space: O(MN), or O(N) when combined with the linear-space algorithm
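The loop above might look as follows in Python. This is a sketch under stated assumptions: the names `affine_fill` and `affine_score` are hypothetical; d and e are negative and added, as in the affine recurrences; and the boundary values F(0, 0) = 0, Ix(0, j) = d + (j-1)e, Iy(i, 0) = d + (i-1)e, with every other boundary cell set to minus infinity, are an assumed initialization because this slide does not state one.

```python
NEG = float("-inf")


def affine_fill(x, y, m=2, s=-2, d=-5, e=-1):
    """Fill F, Ix and Iy for global alignment with affine gaps.

    m / s are the match / mismatch scores; d / e are the (negative)
    gap-open / gap-extension scores. Returns the three matrices.
    """
    M, N = len(x), len(y)
    F = [[NEG] * (N + 1) for _ in range(M + 1)]
    Ix = [[NEG] * (N + 1) for _ in range(M + 1)]
    Iy = [[NEG] * (N + 1) for _ in range(M + 1)]
    F[0][0] = 0
    for j in range(1, N + 1):          # leading y characters against gaps
        Ix[0][j] = d + (j - 1) * e
    for i in range(1, M + 1):          # leading x characters against gaps
        Iy[i][0] = d + (i - 1) * e

    for i in range(1, M + 1):
        for j in range(1, N + 1):
            sc = m if x[i - 1] == y[j - 1] else s
            F[i][j] = sc + max(F[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1])
            Ix[i][j] = max(F[i][j - 1] + d, Ix[i][j - 1] + e)
            Iy[i][j] = max(F[i - 1][j] + d, Iy[i - 1][j] + e)
    return F, Ix, Iy


def affine_score(x, y, **scores):
    """Optimal score: the best of the three matrices at cell (M, N)."""
    F, Ix, Iy = affine_fill(x, y, **scores)
    return max(F[-1][-1], Ix[-1][-1], Iy[-1][-1])


print(affine_score("GACGCCGAACG", "GACGCACG"))
```

Each cell reads only cells from the previous row or the previous column, so a plain row-by-row sweep respects every dependency, and the work per cell is constant.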
27
Exercise
x = GCAC
y = GCC
m = 2 (match), s = -2 (mismatch), d = -5 (gap open), e = -1 (gap extension)
28
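[Figure: three matrices with x = GCAC on one axis and y = GCC on the other. F: aligned on both; Ix: insertion on x; Iy: insertion on y.]
Initial values: F(0, 0) = 0; Iy(i, 0) = -5, -6, -7, -8 for i = 1..4; Ix(0, j) = -5, -6, -7 for j = 1..3; all other boundary cells are marked --.
F(i, j) is computed from F(i-1, j-1), Ix(i-1, j-1) and Iy(i-1, j-1) plus $s(x_i, y_j)$; Ix(i, j) from F(i, j-1) + d and Ix(i, j-1) + e; Iy(i, j) from F(i-1, j) + d and Iy(i-1, j) + e.
m = 2 s = -2 d = -5 e = -1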
29-45
[Animation: the cells of F, Ix and Iy are filled in one at a time, each slide highlighting the cells the new value depends on. The first steps give F(1, 1) = 2 + 0 = 2, F(1, 2) = -2 + (-5) = -7, F(1, 3) = -2 + (-6) = -8, Ix(1, 2) = 2 + (-5) = -3, Ix(1, 3) = -3 + (-1) = -4, and F(2, 2) = 2 + 2 = 4. m = 2 s = -2 d = -5 e = -1]
46
[Figure: the three completed matrices F, Ix and Iy for x = GCAC, y = GCC, with the traceback path marked.]
Optimal alignment:
x = GCAC
    || |
y = GC-C
Score: 2 + 2 - 5 + 2 = 1
m = 2 s = -2 d = -5 e = -1
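The exercise can be reproduced with a short Python sketch that fills the three matrices and traces back through stored predecessor states. The name `affine_align`, the dictionary layout and the tie-breaking (prefer F, and prefer opening over extending) are choices made for this example, not details from the slides. The boundary values follow the initial state of the exercise.

```python
NEG = float("-inf")


def affine_align(x, y, m=2, s=-2, d=-5, e=-1):
    """Global alignment with affine gaps; returns (score, aligned_x, aligned_y)."""
    M, N = len(x), len(y)
    states = ("F", "Ix", "Iy")
    S = {k: [[NEG] * (N + 1) for _ in range(M + 1)] for k in states}   # scores
    P = {k: [[None] * (N + 1) for _ in range(M + 1)] for k in states}  # previous state
    S["F"][0][0] = 0
    for j in range(1, N + 1):
        S["Ix"][0][j] = d + (j - 1) * e
        P["Ix"][0][j] = "F" if j == 1 else "Ix"
    for i in range(1, M + 1):
        S["Iy"][i][0] = d + (i - 1) * e
        P["Iy"][i][0] = "F" if i == 1 else "Iy"

    for i in range(1, M + 1):
        for j in range(1, N + 1):
            sc = m if x[i - 1] == y[j - 1] else s
            prev = max(states, key=lambda k: S[k][i - 1][j - 1])
            S["F"][i][j] = sc + S[prev][i - 1][j - 1]
            P["F"][i][j] = prev
            # Ix: y_j against a gap, coming from column j-1
            if S["F"][i][j - 1] + d >= S["Ix"][i][j - 1] + e:
                S["Ix"][i][j], P["Ix"][i][j] = S["F"][i][j - 1] + d, "F"
            else:
                S["Ix"][i][j], P["Ix"][i][j] = S["Ix"][i][j - 1] + e, "Ix"
            # Iy: x_i against a gap, coming from row i-1
            if S["F"][i - 1][j] + d >= S["Iy"][i - 1][j] + e:
                S["Iy"][i][j], P["Iy"][i][j] = S["F"][i - 1][j] + d, "F"
            else:
                S["Iy"][i][j], P["Iy"][i][j] = S["Iy"][i - 1][j] + e, "Iy"

    # Termination: best of the three matrices at (M, N), then trace back.
    state = max(states, key=lambda k: S[k][M][N])
    score = S[state][M][N]
    i, j, ax, ay = M, N, [], []
    while i > 0 or j > 0:
        prev = P[state][i][j]
        if state == "F":
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif state == "Ix":
            ax.append("-"); ay.append(y[j - 1]); j -= 1
        else:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        state = prev
    return score, "".join(reversed(ax)), "".join(reversed(ay))


print(affine_align("GCAC", "GCC"))
```

With the exercise parameters the traceback passes through F(4, 3), Iy(3, 2), F(2, 2) and F(1, 1), which spells out GCAC over GC-C.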