1
Sequence Alignment Kun-Mao Chao (趙坤茂)
Department of Computer Science and Information Engineering National Taiwan University, Taiwan WWW:
2
GenBank 200.0
3
GenBank 215.0
4
GenBank 220.0
5
orz’s sequence evolution
orz (kid), OTZ (adult), Orz (big head), Crz (motorcycle driver), on_ (soldier), or2 (bottom up), oΩ (back high), STO (the other way around), Oroz (me)
The origin? Their evolutionary relationships? Their putative functional relationships?
6
What? The truth is more important than the facts. THETR UTHIS MOREI
7
Dot Matrix
9
Pairwise Alignment
Sequence A: CTTAACT
Sequence B: CGGATCAT
An alignment of A and B:
Sequence A: C---TTAACT
Sequence B: CGGATCA--T
10
Pairwise Alignment
Sequence A: CTTAACT
Sequence B: CGGATCAT
An alignment of A and B:
C---TTAACT
CGGATCA--T
The columns of an alignment are of four kinds: match, mismatch, deletion gap, and insertion gap.
11
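Alignment Graph
Sequence A: CTTAACT
Sequence B: CGGATCAT
[Figure: alignment graph with the letters of B (C G G A T C A T) across the top and the letters of A (C T T A A C T) down the side; the highlighted path corresponds to the alignment below.]
C---TTAACT
CGGATCA--T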
12
A simple scoring scheme
Match: +8 (w(x, y) = 8, if x = y)
Mismatch: -5 (w(x, y) = -5, if x ≠ y)
Each gap symbol: -3 (w(-,x) = w(x,-) = -3)
C - - - T T A A C T
C G G A T C A - - T
Alignment score = +12
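The scoring scheme can be checked mechanically. The following is a minimal sketch in Python; the function name `alignment_score` and the convention of passing the alignment as two equal-length strings with `-` for gap symbols are assumptions made for illustration, not part of the slides.

```python
def alignment_score(top, bottom, match=8, mismatch=-5, gap=-3):
    """Sum the column scores of an alignment given as two equal-length rows."""
    if len(top) != len(bottom):
        raise ValueError("both rows of an alignment must have the same length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += gap        # w(-, x) = w(x, -) = -3
        elif x == y:
            total += match      # w(x, y) = 8 if x = y
        else:
            total += mismatch   # w(x, y) = -5 if x != y
    return total


# The alignment from the slide: 4 matches, 1 mismatch, 5 gap symbols.
print(alignment_score("C---TTAACT", "CGGATCA--T"))
```

For the slide's alignment this adds up to 4·8 - 5 - 5·3 = 12.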
13
An optimal alignment -- the alignment of maximum score
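Let A = a1a2…am and B = b1b2…bn, and let $S_{i,j}$ be the score of an optimal alignment between a1a2…ai and b1b2…bj. With the initializations $S_{0,0} = 0$, $S_{i,0} = S_{i-1,0} + w(a_i, -)$ and $S_{0,j} = S_{0,j-1} + w(-, b_j)$, $S_{i,j}$ can be computed as follows:
$$S_{i,j} = \max \begin{cases} S_{i-1,j} + w(a_i, -) \\ S_{i,j-1} + w(-, b_j) \\ S_{i-1,j-1} + w(a_i, b_j) \end{cases}$$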
14
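Computing Si,j
[Figure: the (m+1) × (n+1) table with rows indexed by i and columns by j. Cell (i, j) is reached diagonally with weight w(ai,bj), from the cell above with weight w(ai,-), and from the cell to the left with weight w(-,bj). The optimal score Sm,n is in the bottom-right corner.]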
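The tabular computation can be sketched as follows. This is an illustrative Python version under the simple scoring scheme of the slides (match 8, mismatch -5, gap symbol -3); the function name `global_alignment_table` is hypothetical, and rows are indexed by A and columns by B as in the slides.

```python
def global_alignment_table(a, b, match=8, mismatch=-5, gap=-3):
    """Fill the table S, where S[i][j] is the best score for a[:i] versus b[:j]."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    # Initializations: a prefix aligned against nothing costs one gap symbol per letter.
    for i in range(1, m + 1):
        S[i][0] = S[i - 1][0] + gap
    for j in range(1, n + 1):
        S[0][j] = S[0][j - 1] + gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(
                S[i - 1][j] + gap,     # a_i aligned with a gap symbol
                S[i][j - 1] + gap,     # b_j aligned with a gap symbol
                S[i - 1][j - 1] + w,   # a_i aligned with b_j
            )
    return S


table = global_alignment_table("CTTAACT", "CGGATCAT")
print(table[3][5], table[7][8])
```

The table has (m+1)(n+1) cells and each cell takes constant time, so both time and space are O(mn).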
15
Initializations
Match: 8, Mismatch: -5, Gap symbol: -3
          C    G    G    A    T    C    A    T
     0   -3   -6   -9  -12  -15  -18  -21  -24
C   -3
T   -6
T   -9
A  -12
A  -15
C  -18
T  -21
16
S3,5 = ?
Match: 8, Mismatch: -5, Gap symbol: -3
          C    G    G    A    T    C    A    T
     0   -3   -6   -9  -12  -15  -18  -21  -24
C   -3    8    5    2   -1   -4   -7  -10  -13
T   -6    5    3    0   -3    7    4    1   -2
T   -9    2    0   -2   -5    ?
17
S3,5 = ?
Match: 8, Mismatch: -5, Gap symbol: -3
From above: S2,5 - 3 = 7 - 3 = 4
Diagonal: S2,4 + 8 = -3 + 8 = 5
From the left: S3,4 - 3 = -5 - 3 = -8
18
S3,5 = 5
Match: 8, Mismatch: -5, Gap symbol: -3
          C    G    G    A    T    C    A    T
     0   -3   -6   -9  -12  -15  -18  -21  -24
C   -3    8    5    2   -1   -4   -7  -10  -13
T   -6    5    3    0   -3    7    4    1   -2
T   -9    2    0   -2   -5    5    2   -1    9
A  -12   -1   -3   -5    6    3    0   10    7
A  -15   -4   -6   -8    3    1   -2    8    5
C  -18   -7   -9  -11    0   -2    9    6    3
T  -21  -10  -12  -14   -3    8    6    4   14
Optimal score: 14 (bottom-right cell)
19
C T T A A C - T
C G G A T C A T
8 - 5 - 5 + 8 - 5 + 8 - 3 + 8 = 14
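Recovering an alignment of score 14 from the table amounts to walking back from the bottom-right cell along moves that explain each entry. The sketch below assumes the same scoring scheme; the function name `global_align` and the tie-breaking order (diagonal first, then up, then left) are choices made here for illustration, so other optimal alignments may exist.

```python
def global_align(a, b, match=8, mismatch=-5, gap=-3):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        S[i][0] = i * gap
    for j in range(1, n + 1):
        S[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + w, S[i - 1][j] + gap, S[i][j - 1] + gap)

    # Traceback from (m, n) to (0, 0), collecting columns in reverse order.
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            w = match if a[i - 1] == b[j - 1] else mismatch
            if S[i][j] == S[i - 1][j - 1] + w:
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and S[i][j] == S[i - 1][j] + gap:
            top.append(a[i - 1])      # a_i against a gap symbol
            bottom.append("-")
            i -= 1
        else:
            top.append("-")           # b_j against a gap symbol
            bottom.append(b[j - 1])
            j -= 1
    return S[m][n], "".join(reversed(top)), "".join(reversed(bottom))


score, row_a, row_b = global_align("CTTAACT", "CGGATCAT")
print(score)
print(row_a)
print(row_b)
```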
20
Now try this example in class
Sequence A: CAATTGA
Sequence B: GAATCTGC
Their optimal alignment?
21
Initializations
Match: 8, Mismatch: -5, Gap symbol: -3
          G    A    A    T    C    T    G    C
     0   -3   -6   -9  -12  -15  -18  -21  -24
C   -3
A   -6
A   -9
T  -12
T  -15
G  -18
A  -21
22
S4,2 = ?
Match: 8, Mismatch: -5, Gap symbol: -3
          G    A    A    T    C    T    G    C
     0   -3   -6   -9  -12  -15  -18  -21  -24
C   -3   -5   -8  -11  -14   -4   -7  -10  -13
A   -6   -8    3    0   -3   -6   -9  -12  -15
A   -9  -11    0   11    8    5    2   -1   -4
T  -12  -14    ?
23
S4,2 = ?
Match: 8, Mismatch: -5, Gap symbol: -3
From above: S3,2 - 3 = 0 - 3 = -3
Diagonal: S3,1 - 5 = -11 - 5 = -16
From the left: S4,1 - 3 = -14 - 3 = -17
24
S5,5 = ?
Match: 8, Mismatch: -5, Gap symbol: -3
          G    A    A    T    C    T    G    C
     0   -3   -6   -9  -12  -15  -18  -21  -24
C   -3   -5   -8  -11  -14   -4   -7  -10  -13
A   -6   -8    3    0   -3   -6   -9  -12  -15
A   -9  -11    0   11    8    5    2   -1   -4
T  -12  -14   -3    8   19   16   13   10    7
T  -15  -17   -6    5   16    ?
25
S5,5 = ?
Match: 8, Mismatch: -5, Gap symbol: -3
From above: S4,5 - 3 = 16 - 3 = 13
Diagonal: S4,4 - 5 = 19 - 5 = 14
From the left: S5,4 - 3 = 16 - 3 = 13
26
S5,5 = 14
Match: 8, Mismatch: -5, Gap symbol: -3
          G    A    A    T    C    T    G    C
     0   -3   -6   -9  -12  -15  -18  -21  -24
C   -3   -5   -8  -11  -14   -4   -7  -10  -13
A   -6   -8    3    0   -3   -6   -9  -12  -15
A   -9  -11    0   11    8    5    2   -1   -4
T  -12  -14   -3    8   19   16   13   10    7
T  -15  -17   -6    5   16   14   24   21   18
G  -18   -7   -9    2   13   11   21   32   29
A  -21  -10    1   -1   10    8   18   29   27
Optimal score: 27 (bottom-right cell)
27
C A A T - T G A
G A A T C T G C
-5 + 8 + 8 + 8 - 3 + 8 + 8 - 5 = 27
29
Global Alignment vs. Local Alignment
30
Maximum-sum interval
Given a sequence of real numbers a1a2…an, find a consecutive subsequence with the maximum sum.
9 –3 1 7 – –4 2 –7 6 –
For each position, we can compute the maximum-sum interval ending at that position in O(n) time. Therefore, a naive algorithm runs in O(n^2) time.
31
Computing a segment sum in O(1) time?
Input: a sequence of real numbers a1a2…an Query: the sum of ai ai+1…aj
32
Computing a segment sum in O(1) time
prefix-sum(i) = a1 + a2 + … + ai
All n prefix sums are computable in O(n) time.
sum(i, j) = prefix-sum(j) - prefix-sum(i-1)
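A small sketch of the prefix-sum idea in Python. The helper names `build_prefix_sums` and `segment_sum` and the sample sequence are made up for illustration; positions are 1-based as in the slides, with prefix-sum(0) = 0 stored at index 0.

```python
from itertools import accumulate


def build_prefix_sums(values):
    """Return [prefix-sum(0), prefix-sum(1), ..., prefix-sum(n)] in O(n) time."""
    return [0] + list(accumulate(values))


def segment_sum(prefix, i, j):
    """sum(i, j) = prefix-sum(j) - prefix-sum(i-1), for 1 <= i <= j <= n, in O(1) time."""
    return prefix[j] - prefix[i - 1]


values = [2, -5, 4, -1, 3, -6, 2]   # hypothetical input, not the slide's sequence
prefix = build_prefix_sums(values)
print(segment_sum(prefix, 3, 5))    # sum of 4, -1, 3
```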
33
Maximizing sum(i, j)
O(n)-time Method 1
sum(i, j) = prefix-sum(j) - prefix-sum(i-1)
For each location j, prefix-sum(j) is fixed. The maximum-sum interval ending at position j can therefore be found by taking the minimum prefix sum before position j, that is, the minimum of prefix-sum(i-1) over all i <= j.
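Method 1 can be sketched as a single left-to-right pass that keeps the running prefix sum and the smallest prefix sum seen so far. The function name `max_sum_interval_by_prefix` and the sample input are assumptions for illustration; positions are 1-based and the input is assumed to be non-empty.

```python
def max_sum_interval_by_prefix(values):
    """Return (best_sum, i, j) maximizing prefix-sum(j) - prefix-sum(i-1)."""
    best_sum, best_i, best_j = float("-inf"), 0, 0
    prefix = 0          # prefix-sum(j) for the current j
    min_prefix = 0      # smallest prefix-sum(k) over k < j, starting with prefix-sum(0) = 0
    min_index = 0       # the k that attains min_prefix
    for j, value in enumerate(values, start=1):
        prefix += value
        if prefix - min_prefix > best_sum:
            best_sum, best_i, best_j = prefix - min_prefix, min_index + 1, j
        if prefix < min_prefix:
            min_prefix, min_index = prefix, j
    return best_sum, best_i, best_j


print(max_sum_interval_by_prefix([2, -5, 4, -1, 3, -6, 2]))
```

Each position is visited once with constant work, which gives the O(n) bound.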
34
Maximum-sum interval (The recurrence relation)
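O(n)-time Method 2
Define S(i) to be the maximum sum of the intervals ending at position i, with S(0) = 0. If S(i-1) < 0, concatenating ai with its previous interval gives a smaller sum than ai itself, so the best interval ending at i starts at i; otherwise it extends the best interval ending at i-1:
$$S(i) = a_i + \max\{S(i-1),\ 0\}$$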
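The recurrence for S(i) translates directly into a short loop. This is a sketch in Python; the function name `max_sum_table` and the sample sequence are hypothetical.

```python
def max_sum_table(values):
    """Return [S(1), ..., S(n)], where S(i) is the best sum of an interval ending at i."""
    table = []
    previous = 0                              # S(0) = 0
    for value in values:
        previous = value + max(previous, 0)   # drop the earlier interval if it is negative
        table.append(previous)
    return table


table = max_sum_table([2, -5, 4, -1, 3, -6, 2])
print(table, max(table))
```

The maximum sum over all intervals is the largest entry of the table.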
35
Maximum-sum interval (Tabular computation)
[Table: the input sequence with S(i) computed below each element from left to right; the largest S(i) is the maximum sum.]
36
Maximum-sum interval (Traceback)
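[Table: the same sequence and S(i) values; tracing back from the largest S(i) to the position where its interval started gives the maximum-sum interval.]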
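The traceback step can be sketched as follows: locate the largest S(i), then move left for as long as the preceding S value is positive, since those are the positions where the interval was extended rather than restarted. The function name `max_sum_interval`, the 1-based return convention, and the sample input are assumptions; the input is assumed to be non-empty.

```python
def max_sum_interval(values):
    """Return (best_sum, start, end) with 1-based, inclusive positions."""
    table = []
    previous = 0
    for value in values:
        previous = value + max(previous, 0)
        table.append(previous)

    end = max(range(len(table)), key=table.__getitem__)   # position of the largest S(i)
    start = end
    # The interval ending at `start` extends its predecessor only if that S value is positive.
    while start > 0 and table[start - 1] > 0:
        start -= 1
    return table[end], start + 1, end + 1


print(max_sum_interval([2, -5, 4, -1, 3, -6, 2]))
```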
37
An optimal local alignment
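$S_{i,j}$: the score of an optimal local alignment ending at (i, j) between a1a2…ai and b1b2…bj. With the initializations $S_{i,0} = S_{0,j} = 0$, $S_{i,j}$ can be computed as follows:
$$S_{i,j} = \max \begin{cases} 0 \\ S_{i-1,j} + w(a_i, -) \\ S_{i,j-1} + w(-, b_j) \\ S_{i-1,j-1} + w(a_i, b_j) \end{cases}$$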
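A sketch of the local-alignment table in Python, using the same scoring scheme as the global examples. The only changes from the global version are the zero initialization, the extra 0 option in the maximum, and taking the best score over all cells. The function name `local_alignment_table` is hypothetical.

```python
def local_alignment_table(a, b, match=8, mismatch=-5, gap=-3):
    """Return (best_score, best_cell, S) for local alignment of a and b."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]   # row 0 and column 0 stay at 0
    best_score, best_cell = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(
                0,                     # start a fresh local alignment here
                S[i - 1][j] + gap,
                S[i][j - 1] + gap,
                S[i - 1][j - 1] + w,
            )
            if S[i][j] > best_score:
                best_score, best_cell = S[i][j], (i, j)
    return best_score, best_cell, S


best, cell, _ = local_alignment_table("CTTAACT", "CGGATCAT")
print(best, cell)
```

A traceback would start at `best_cell` and stop at the first cell whose value is 0.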
38
local alignment
Match: 8, Mismatch: -5, Gap symbol: -3
          C    G    G    A    T    C    A    T
     0    0    0    0    0    0    0    0    0
C    0    8    5    2    0    0    8    5    2
T    0    5    3    0    0    8    5    3   13
T    0    2    0    0    0    8    5    2   11
A    0    0    0    0    8    5    3    ?
39
local alignment
Match: 8, Mismatch: -5, Gap symbol: -3
S4,7 = max{ 0, 2 - 3, 5 + 8, 3 - 3 }
From above: S3,7 - 3 = 2 - 3 = -1
Diagonal: S3,6 + 8 = 5 + 8 = 13
From the left: S4,6 - 3 = 3 - 3 = 0
40
local alignment
Match: 8, Mismatch: -5, Gap symbol: -3
          C    G    G    A    T    C    A    T
     0    0    0    0    0    0    0    0    0
C    0    8    5    2    0    0    8    5    2
T    0    5    3    0    0    8    5    3   13
T    0    2    0    0    0    8    5    2   11
A    0    0    0    0    8    5    3   13   10
A    0    0    0    0    8    5    2   11    8
C    0    8    5    2    5    3   13   10    7
T    0    5    3    0    2   13   10    8   18
The best score: 18
41
A - C - T
A T C A T
8 - 3 + 8 - 3 + 8 = 18
The best score: 18
42
Now try this example in class
Sequence A: CAATTGA
Sequence B: GAATCTGC
Their optimal local alignment?
43
Did you get it right?
          G    A    A    T    C    T    G    C
     0    0    0    0    0    0    0    0    0
C    0    0    0    0    0    8    5    2    8
A    0    0    8    8    5    5    3    0    5
A    0    0    8   16   13   10    7    4    2
T    0    0    5   13   24   21   18   15   12
T    0    0    2   10   21   19   29   26   23
G    0    8    5    7   18   16   26   37   34
A    0    5   16   13   15   13   23   34   32
44
A A T - T G
A A T C T G
8 + 8 + 8 - 3 + 8 + 8 = 37
45
Osamu Gotoh
46
Affine gap penalties
C - - - T T A A C T
C G G A T C A - - T
Match: +8 (w(a, b) = 8, if a = b)
Mismatch: -5 (w(a, b) = -5, if a ≠ b)
Each gap symbol: -3 (w(-,b) = w(a,-) = -3)
Each gap is charged an extra gap-open penalty: -4.
The alignment has two gaps, so it is charged -4 twice. Its symbol-by-symbol score is +12.
Alignment score: 12 - 4 - 4 = 4
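The score of a given alignment under this scheme can be checked with a short routine that charges the gap-open penalty once per maximal run of gap symbols in either row. This is a sketch; the function name `affine_alignment_score` and the two-string input convention are assumptions for illustration.

```python
def affine_alignment_score(top, bottom, match=8, mismatch=-5,
                           gap_symbol=-3, gap_open=-4):
    """Score an alignment where every gap pays gap_open once plus gap_symbol per symbol."""
    if len(top) != len(bottom):
        raise ValueError("both rows of an alignment must have the same length")
    total = 0
    previous = None                 # which row held a gap symbol in the previous column
    for x, y in zip(top, bottom):
        if x == "-":
            state = "top"
        elif y == "-":
            state = "bottom"
        else:
            state = None
        if state is None:
            total += match if x == y else mismatch
        else:
            total += gap_symbol
            if state != previous:   # first symbol of a new gap
                total += gap_open
        previous = state
    return total


print(affine_alignment_score("C---TTAACT", "CGGATCA--T"))
```

Setting `gap_symbol=0` in the same routine charges only a constant amount per gap.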
47
Affine gap penalties
A gap of length k is penalized x + k·y, where x is the gap-open penalty and y is the gap-symbol penalty.
This is the same as the scoring scheme that penalizes the first symbol x + y and an extended symbol y.
Three cases for alignment endings:
1. ...x over ...x: an aligned pair
2. ...x over ...-: a deletion
3. ...- over ...x: an insertion
48
Affine gap penalties
Let D(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with a deletion.
Let I(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with an insertion.
Let S(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj.
49
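Affine gap penalties (A gap of length k is penalized x + k·y.)
$$D(i,j) = \max\{\, D(i-1,j) - y,\ S(i-1,j) - x - y \,\}$$
$$I(i,j) = \max\{\, I(i,j-1) - y,\ S(i,j-1) - x - y \,\}$$
$$S(i,j) = \max\{\, S(i-1,j-1) + w(a_i,b_j),\ D(i,j),\ I(i,j) \,\}$$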
50
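Affine gap penalties
[Figure: transitions among the three tables S, I, and D. Extending a gap (D to D, I to I) costs -y, opening a gap from S (S to D, S to I) costs -x-y, and an aligned pair into S contributes w(ai,bj).]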
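The three-table computation can be sketched in Python as follows. The function name `affine_global_score`, the boundary initialization (a leading gap of length k costs x + k·y), and the use of negative infinity for impossible states are assumptions of this sketch; penalties are passed as positive numbers x and y and subtracted.

```python
NEG_INF = float("-inf")


def affine_global_score(a, b, match=8, mismatch=-5, gap_open=4, gap_symbol=3):
    """Best global alignment score when a gap of length k is penalized x + k*y."""
    m, n = len(a), len(b)
    S = [[NEG_INF] * (n + 1) for _ in range(m + 1)]   # any ending
    D = [[NEG_INF] * (n + 1) for _ in range(m + 1)]   # ends with a_i against a gap symbol
    I = [[NEG_INF] * (n + 1) for _ in range(m + 1)]   # ends with b_j against a gap symbol
    S[0][0] = 0
    for i in range(1, m + 1):
        D[i][0] = S[i][0] = -gap_open - i * gap_symbol
    for j in range(1, n + 1):
        I[0][j] = S[0][j] = -gap_open - j * gap_symbol
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Either extend the current gap (-y) or open a new one (-x - y).
            D[i][j] = max(D[i - 1][j] - gap_symbol,
                          S[i - 1][j] - gap_open - gap_symbol)
            I[i][j] = max(I[i][j - 1] - gap_symbol,
                          S[i][j - 1] - gap_open - gap_symbol)
            w = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + w, D[i][j], I[i][j])
    return S[m][n]


print(affine_global_score("ACGT", "AT"))   # A--T style alignment: 8 + 8 - (4 + 2*3)
```

With `gap_open=0` the computation reduces to the simple scheme in which each gap symbol costs 3.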
51
Constant gap penalties
Match: +8 (w(a, b) = 8, if a = b)
Mismatch: -5 (w(a, b) = -5, if a ≠ b)
Each gap symbol: 0 (w(-,b) = w(a,-) = 0)
Each gap is charged a constant penalty: -4.
C - - - T T A A C T
C G G A T C A - - T
The alignment has two gaps, so it is charged -4 twice. Its symbol-by-symbol score is +27.
Alignment score: 27 - 4 - 4 = 19
52
Constant gap penalties
Let D(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with a deletion.
Let I(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with an insertion.
Let S(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj.
53
Constant gap penalties
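One way to write recurrences that match the definitions of D, I, and S under a constant gap penalty is sketched below. It assumes that every gap is charged x once (x = 4 in the example), that extending an existing gap costs nothing further, and that i, j ≥ 1; this is a reconstruction under those assumptions, not necessarily the form used on the slide.

```latex
\begin{aligned}
D(i,j) &= \max\{\, D(i-1,j),\; S(i-1,j) - x \,\} \\
I(i,j) &= \max\{\, I(i,j-1),\; S(i,j-1) - x \,\} \\
S(i,j) &= \max\{\, S(i-1,j-1) + w(a_i,b_j),\; D(i,j),\; I(i,j) \,\}
\end{aligned}
```

In each of the first two lines, the first term continues a gap that is already open and the second term opens a new gap.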
54
Restricted affine gap penalties
A gap of length k is penalized x + f(k)·y, where f(k) = k for k <= c and f(k) = c for k > c.
Five cases for alignment endings:
1. ...x over ...x: an aligned pair
2. ...x over ...-: a deletion
3. ...- over ...x: an insertion
4 and 5. for long gaps
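The penalty function itself is easy to state in code. In the sketch below, the function name `restricted_affine_penalty` and the cap value c = 2 are hypothetical (the slides do not give a value for c); x = 4 and y = 3 are borrowed from the affine example.

```python
def restricted_affine_penalty(k, gap_open=4, gap_symbol=3, cap=2):
    """Penalty x + f(k)*y with f(k) = k for k <= c and f(k) = c for k > c."""
    if k <= 0:
        raise ValueError("gap length must be positive")
    f = k if k <= cap else cap
    return gap_open + f * gap_symbol


# Gaps longer than the cap cost the same as a gap of length c.
for k in range(1, 6):
    print(k, restricted_affine_penalty(k))
```

Because f(k) is the smaller of k and c, subtracting this penalty from a score s gives the larger of s - x - k·y and s - x - c·y.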
55
Restricted affine gap penalties
56
D(i, j) vs. D’(i, j)
Case 1: the best alignment ending at (i, j) with a deletion at the end has the last deletion gap of length <= c. Then D(i, j) >= D’(i, j).
Case 2: the best alignment ending at (i, j) with a deletion at the end has the last deletion gap of length >= c. Then D(i, j) <= D’(i, j).
57
Max{S(i,j)-x-ky, S(i,j)-x-cy}