Published by Alicia Thompson. Modified over 9 years ago.
1
SMAWK
2
REVISE
3
Global alignment (Revise)
Alignment graph for S = aacgacga, T = ctacgaga
Complexity: O(n²)
V(i,j) = max { V(i-1,j-1) + δ(S[i], T[j]), V(i-1,j) + δ(S[i], -), V(i,j-1) + δ(-, T[j]) }
where δ is the scoring function and "-" denotes a gap.
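The recurrence above can be sketched directly as a table fill. This is a minimal illustration, not the presenter's code: the function names and the scoring scheme `simple_delta` (match +2, mismatch or gap -1) are assumptions made for the example.

```python
def global_alignment_score(S, T, delta):
    """Fill V for strings S and T and return V(n, m); '-' denotes a gap."""
    n, m = len(S), len(T)
    V = [[0] * (m + 1) for _ in range(n + 1)]
    # First column and first row: align a prefix against gaps only.
    for i in range(1, n + 1):
        V[i][0] = V[i - 1][0] + delta(S[i - 1], "-")
    for j in range(1, m + 1):
        V[0][j] = V[0][j - 1] + delta("-", T[j - 1])
    # Interior cells: the three cases of the recurrence.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            V[i][j] = max(
                V[i - 1][j - 1] + delta(S[i - 1], T[j - 1]),
                V[i - 1][j] + delta(S[i - 1], "-"),
                V[i][j - 1] + delta("-", T[j - 1]),
            )
    return V[n][m]


def simple_delta(x, y):
    # Assumed scoring scheme (not from the slides): match +2, otherwise -1.
    return 2 if x == y and x != "-" else -1


print(global_alignment_score("aacgacga", "ctacgaga", simple_delta))
```

The two nested loops visit every cell once, which is where the O(n²) bound on the slide comes from.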
4
DIST and OUT matrix (Revise)
Block – sub-sequences "acg", "ag"
[Figure: alignment block with input border I and output border O, shown next to its 6 × 6 DIST matrix and OUT matrix]
Input border: I0 = 1, I1 = 2, I2 = 3, I3 = 2, I4 = 1, I5 = 3
Output border (column maxima of OUT): O0 = 1, O1 = 3, O2 = 3, O3 = 4, O4 = 2, O5 = 3
5
Compute O without explicit OUT
Block – sub-sequences "acg", "ag"
[Figure: the same alignment block with its DIST matrix only]
Input border: I0 = 1, I1 = 2, I2 = 3, I3 = 2, I4 = 1, I5 = 3
Output border, computed with SMAWK: O0 = 1, O1 = 3, O2 = 3, O3 = 4, O4 = 2, O5 = 3
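A small sketch of what "without explicit OUT" means, under the assumption (standard for the DIST/OUT framework, but not spelled out on the slide) that OUT[i][j] = I[i] + DIST[i][j] and that O[j] is the maximum of column j of OUT. The function name and the 2 × 2 example values are hypothetical; undefined DIST entries are modelled as minus infinity.

```python
NEG_INF = float("-inf")  # stands in for an undefined DIST entry (assumption)


def output_border(I, DIST):
    """O[j] = max over i of I[i] + DIST[i][j]; OUT is never stored."""
    return [
        max(I[i] + DIST[i][j] for i in range(len(I)))
        for j in range(len(DIST[0]))
    ]


# Hypothetical 2 x 2 block, not the matrix from the slide.
print(output_border([1, 2], [[0, -1], [NEG_INF, 0]]))
```

This brute-force version inspects every entry of the implicit OUT matrix; SMAWK replaces the inner maximum search and inspects only a linear number of entries.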
6
Aggarwal, Park and Schmidt observed that DIST and OUT matrices are Monge arrays.
Definition: a matrix M[0…m, 0…n] is totally monotone if either condition 1 or 2 below holds for all a, b = 0…m; c, d = 0…n; a < b and c < d
1. Convex condition: M[a,c] ≥ M[b,c] ⇒ M[a,d] ≥ M[b,d].
2. Concave condition: M[a,c] ≤ M[b,c] ⇒ M[a,d] ≤ M[b,d].
7
SMAWK
Aggarwal et al. gave a recursive algorithm, called SMAWK, which can find all row and column maxima of a totally monotone matrix by querying only O(n) elements of the matrix.
8
Presentation Outline
What are Monge arrays? – Monge ⇒ totally monotone
Why is the DIST alignment matrix a Monge array?
How to compute on totally monotone arrays efficiently? – SMAWK: given a totally monotone array, compute all column maxima in O(n)
9
MONGE AND TOTALLY MONOTONE PROPERTIES
10
Monge
A matrix M[0…m, 0…n] is Monge if either condition 1 or 2 below holds for all a, b = 0…m; c, d = 0…n; a < b and c < d
1. M[a,c] + M[b,d] ≤ M[a,d] + M[b,c]
2. M[a,c] + M[b,d] ≥ M[a,d] + M[b,c]

        c        d       z
a    M[a,c]   M[a,d]     …
b    M[b,c]   M[b,d]
x       …        …
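The definition quantifies over all a < b and c < d, but it is enough to test adjacent rows and columns, because the general inequality is a sum of adjacent ones. A minimal checker along those lines, assuming condition 1 is the "≤" form and condition 2 the "≥" form; the function name and the `concave` flag are illustrative choices.

```python
def is_monge(M, concave=False):
    """Check condition 1 (default) or condition 2 (concave=True) on all
    adjacent 2 x 2 blocks, which implies it for every a < b and c < d."""
    for a in range(len(M) - 1):
        for c in range(len(M[0]) - 1):
            lhs = M[a][c] + M[a + 1][c + 1]
            rhs = M[a][c + 1] + M[a + 1][c]
            violated = lhs < rhs if concave else lhs > rhs
            if violated:
                return False
    return True


squares = [[(a - c) ** 2 for c in range(4)] for a in range(3)]
products = [[a * c for c in range(4)] for a in range(3)]
print(is_monge(squares), is_monge(products, concave=True))
```

The matrix of squared differences satisfies condition 1 and the matrix of products satisfies condition 2, so both checks in the usage line succeed.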
11
Totally monotone
A matrix M[0…m, 0…n] is totally monotone if either condition 1 or 2 below holds for all a, b = 0…m; c, d = 0…n; a < b and c < d
1. Convex condition: M[a,c] ≥ M[b,c] ⇒ M[a,d] ≥ M[b,d]
2. Concave condition: M[a,c] ≤ M[b,c] ⇒ M[a,d] ≤ M[b,d]
Monge ⇒ totally monotone

        c        d       z
a    M[a,c]   M[a,d]     …
b    M[b,c]   M[b,d]
x       …        …
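Why every Monge matrix is totally monotone can be seen in one step. The derivation below is a sketch for the convex case, assuming the convex Monge condition reads M[a,c] + M[b,d] ≤ M[a,d] + M[b,c] and the convex monotone condition reads M[a,c] ≥ M[b,c] ⇒ M[a,d] ≥ M[b,d]; the concave case is identical with every inequality reversed.

```latex
\begin{align*}
&\text{Let } a<b,\ c<d \text{ and assume } M[a,c] \ge M[b,c].\\
&\text{Monge (convex): } M[a,c] + M[b,d] \le M[a,d] + M[b,c]\\
&\quad\Longrightarrow\quad M[b,d] - M[a,d] \;\le\; M[b,c] - M[a,c] \;\le\; 0\\
&\quad\Longrightarrow\quad M[a,d] \ge M[b,d].
\end{align*}
```

The converse does not hold in general: total monotonicity only constrains the order of entries, not their differences.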
12
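Intuition
Monge is the quadrangle inequality: for a < b and c < d it compares the sum M[a,c] + M[b,d] with the sum M[a,d] + M[b,c].
[Figure: points a, b, c, d, x, z joined as a quadrangle]

        c        d       z
a    M[a,c]   M[a,d]     …
b    M[b,c]   M[b,d]
x       …        …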
13
History
Computational Geometry
All nearest neighbor problem – Shamos and Hoey proved Θ(n log n) in 1975
All farthest neighbor problem – F. P. Preparata proved Θ(n log n) in 1977
All farthest neighbor problem in convex polygon – Lee and Preparata proved O(n) in 1978
14
SMAWK
Aggarwal et al. proved O(n) for the all-farthest-neighbor problem in a convex polygon in 1987.
Aggarwal et al. gave a recursive algorithm, called SMAWK, which can find all row and column maxima of a totally monotone matrix by querying only O(n) elements of the matrix.
15
DIST AND OUT MATRICES
16
Assumption – row and column maxima of a totally monotone matrix can be computed in O(n)
Why are the DIST and OUT matrices of the alignment problem totally monotone?
17
DIST and OUT matrix (Revise)
[Figure: the block for sub-sequences "acg", "ag" with its DIST and OUT matrices, repeated from slide 4]
18
Compute O without explicit OUT
[Figure: the block for sub-sequences "acg", "ag" with its DIST matrix and SMAWK, repeated from slide 5]
19
DIST is Monge
[Figure: alignment block with input border I and output border O]
20
DIST is a Monge array
Monge: M[a,c] + M[b,d] ≥ M[a,d] + M[b,c]
Totally monotone by the concave condition: M[a,c] ≤ M[b,c] ⇒ M[a,d] ≤ M[b,d]
21
Comment on this approach Advantages – Easy to parallelize – Easy to combine Disadvantages – Need to compute/keep more information
22
Applications
Parallel sequence alignment – O(log m log n) time – Using O(mn / log m) processors (CREW PRAM)
Best non-overlapping alignment score – O(n² log² n) time
Tandem approximate repeat – O(n² log n) time
Common Substring Alignment
23
SMAWK
24
Find all column minima of the following totally monotone array

       1    2    3    4    5    6    7    8    9
 1    25   42   57   78   90  103  123  142  151
 2    21   35   48   65   76   85  105  123  130
 3    13   26   35   51   58   67   86  100  104
 4    10   20   28   42   48   56   75   86   88
 5    20   29   33   44   49   55   73   82   80
 6    13   21   24   35   39   44   59   65   59
 7    19   25   28   38   42   44   57   61   52
 8    35   37   40   48   48   49   62   62   49
 9    37   36   37   42   39   39   51   50   37
10    41   39   37   42   35   33   44   43   29
11    58   56   54   55   47   41   50   47   29
12    66   64   61   61   51   44   52   45   24
13    82   76   72   70   56   49   55   46   23
14    99   91   83   80   63   56   59   46   20
15   124  116  107  100   80   71   72   58   28
16   133  125  113  106   86   75   74   59   25
17   156  146  131  120   97   84   80   65   31
18   178  164  146  135  110   96   92   73   39

For every 2 × 2 sub-array with top row [a b] and bottom row [c d]:
b < d ⇒ a < c
b = d ⇒ a ≤ c
25
Find all column minima of the following totally monotone array
[Figure: the 18 × 9 array of slide 24]
For every 2 × 2 sub-array with top row [a b] and bottom row [c d]:
b < d ⇒ a < c
b = d ⇒ a ≤ c
Equivalently:
a > c ⇒ b > d
a = c ⇒ b ≥ d
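The conditions on this slide can be checked mechanically, and a brute-force scan gives the answer that SMAWK is expected to reproduce. The sketch below transcribes the slide's array; the entries at row 8 (columns 5 and 8), row 9 (column 6) and row 12 (column 4) are assumptions, taken equal to their left neighbours. Function names are illustrative.

```python
A = [
    [25, 42, 57, 78, 90, 103, 123, 142, 151],
    [21, 35, 48, 65, 76, 85, 105, 123, 130],
    [13, 26, 35, 51, 58, 67, 86, 100, 104],
    [10, 20, 28, 42, 48, 56, 75, 86, 88],
    [20, 29, 33, 44, 49, 55, 73, 82, 80],
    [13, 21, 24, 35, 39, 44, 59, 65, 59],
    [19, 25, 28, 38, 42, 44, 57, 61, 52],
    [35, 37, 40, 48, 48, 49, 62, 62, 49],   # 48 (col 5), 62 (col 8) assumed
    [37, 36, 37, 42, 39, 39, 51, 50, 37],   # 39 (col 6) assumed
    [41, 39, 37, 42, 35, 33, 44, 43, 29],
    [58, 56, 54, 55, 47, 41, 50, 47, 29],
    [66, 64, 61, 61, 51, 44, 52, 45, 24],   # 61 (col 4) assumed
    [82, 76, 72, 70, 56, 49, 55, 46, 23],
    [99, 91, 83, 80, 63, 56, 59, 46, 20],
    [124, 116, 107, 100, 80, 71, 72, 58, 28],
    [133, 125, 113, 106, 86, 75, 74, 59, 25],
    [156, 146, 131, 120, 97, 84, 80, 65, 31],
    [178, 164, 146, 135, 110, 96, 92, 73, 39],
]


def satisfies_slide_conditions(M):
    """Test b < d => a < c and b = d => a <= c on every 2 x 2 sub-array."""
    for r in range(len(M)):
        for s in range(r + 1, len(M)):
            for p in range(len(M[0])):
                for q in range(p + 1, len(M[0])):
                    a, b, c, d = M[r][p], M[r][q], M[s][p], M[s][q]
                    if (b < d and not a < c) or (b == d and not a <= c):
                        return False
    return True


def column_minima_bruteforce(M):
    """Row number (1-based, as on the slide) of the minimum of each column."""
    return [min(range(len(M)), key=lambda r: M[r][j]) + 1
            for j in range(len(M[0]))]


print(satisfies_slide_conditions(A))   # True
print(column_minima_bruteforce(A))     # [4, 4, 6, 6, 10, 10, 10, 10, 14]
```

The row numbers of the minima never decrease from left to right, which is the structure SMAWK exploits. None of the assumed entries is a column minimum, so the second result does not depend on them.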
26
Observation 1
[Figure: the 18 × 9 array and 2 × 2 conditions of slide 25]
27
Observation 2
[Figure: the 18 × 9 array and 2 × 2 conditions of slide 25]
28
SMAWK is a recursive algorithm of 2 steps – REDUCE – INTERPOLATE
[Figure: the 18 × 9 array and 2 × 2 conditions of slide 25]
29
SMAWK is a recursive algorithm of 2 steps – REDUCE – INTERPOLATE
REDUCE removes rows; INTERPOLATE removes half of the columns
[Figure: the 18 × 9 array and 2 × 2 conditions of slide 25]
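The two steps can be sketched as one short recursive function. This is an illustrative implementation for column minima under the 2 × 2 conditions of the slides, not the presenter's code: the names are hypothetical, indices are 0-based, and ties are resolved in favour of the lower row.

```python
def smawk_column_minima(M):
    """Return, for each column, the (0-based) row index of its minimum.
    Assumes M satisfies b < d => a < c and b = d => a <= c on every
    2 x 2 sub-array with top row [a b] and bottom row [c d]."""
    argmin = [None] * len(M[0])

    def solve(rows, cols):
        if not cols:
            return
        # REDUCE: keep at most len(cols) rows. The row at stack position k
        # is already known not to be needed in columns cols[0..k-1].
        stack = []
        for r in rows:
            while stack:
                c = cols[len(stack) - 1]
                if M[stack[-1]][c] < M[r][c]:
                    break          # top row still wins in its own column
                stack.pop()        # top row can never hold a minimum
            if len(stack) < len(cols):
                stack.append(r)
        rows = stack

        # Recurse on every second column (columns 2, 4, ... on the slides).
        solve(rows, cols[1::2])

        # INTERPOLATE: the minimum of a remaining column lies between the
        # minima rows of its left and right neighbours.
        pos = 0
        for k in range(0, len(cols), 2):
            c = cols[k]
            last = argmin[cols[k + 1]] if k + 1 < len(cols) else rows[-1]
            best = rows[pos]
            while rows[pos] != last:
                pos += 1
                if M[rows[pos]][c] <= M[best][c]:
                    best = rows[pos]
            argmin[c] = best

    solve(list(range(len(M))), list(range(len(M[0]))))
    return argmin


# Small usage example: squared differences form a Monge array.
example = [[(i - j) ** 2 for j in range(4)] for i in range(5)]
print(smawk_column_minima(example))   # [0, 1, 2, 3]
```

Each call spends time linear in the number of rows and columns it receives, and the recursion halves the number of columns, which gives the linear total bound quoted on the earlier slides.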
30–53
REDUCE (animation frames)
REDUCE scans the rows from top to bottom and removes the rows that cannot contain a column minimum. In the example the rows are removed in the order 1, 2, 3, 5, 8, 9, 15, 17, 18. A row that survives also loses its leading entries once they are known not to be minima. The 9 remaining rows and their remaining entries:

       1    2    3    4    5    6    7    8    9
 4    10   20   28   42   48   56   75   86   88
 6         21   24   35   39   44   59   65   59
 7              28   38   42   44   57   61   52
10                   42   35   33   44   43   29
11                        47   41   50   47   29
12                             44   52   45   24
13                                  55   46   23
14                                       46   20
16                                            25
54
INTERPOLATE
Remove all odd-indexed columns (1, 3, 5, 7, 9) from the reduced array.
[Figure: the reduced array of slide 53]
55
INTERPOLATE
The reduced array after removing the odd-indexed columns:

       2    4    6    8
 4    20   42   56   86
 6    21   35   44   65
 7         38   44   61
10         42   33   43
11              41   47
12              44   45
13                   46
14                   46
16
56
RECURSIVE
Find all column minima of the array of slide 55 (even-indexed columns only) by a recursive call.
57–60
INTERPOLATE (animation frames)
The odd-indexed columns are put back. The recursive call has located the minima of the even-indexed columns (column 2 in row 4, column 4 in row 6, columns 6 and 8 in row 10), so the minimum of each odd-indexed column only needs to be searched between the rows of its neighbours' minima. The entries that remain to be inspected:

       1    2    3    4    5    6    7    8    9
 4    10   20   28
 6              24   35   39
 7                        42
10                        35   33   44   43   29
11                                            29
12                                            24
13                                            23
14                                            20
16                                            25
61
[Figure: the original 18 × 9 array of slide 24]
62
APPROXIMATE TANDEM REPEAT Application of DIST and SMAWK
63
Tandem repeat IRQI QLWLR QIWIR LRQL
64
Social City
65
Observation
Approximate tandem repeat – With the mid-point c – Alignments start at column c and end at row c
[Figure: alignment grid with axes 0…n and the mid-point c marked on both axes]
66
4 cases – Cross column n/2 – Cross row n/2 – Inside sub-triangle [0, n/2] – Inside sub-triangle [n/2, n]
67
Algorithm
1. Find all repeats that cross – row n/2 – column n/2
2. Recursively solve the – sub-array [0..n/2, 0..n/2] – sub-array [n/2..n, n/2..n]
[Figure: grid split at n/2 with mid-points c1, c2, c3]
68
Cross column n/2
Combine – Best path from column c to (k, n/2) – Best path from (k, n/2) to row c
[Figure: alignment grid with axes 0…n, the mid-point c and column n/2 marked]
69
Cross column n/2
Sub-problems: – DIST_col(c, n/2)[i, j] – DIST_row(c, n/2)[i, j]
[Figure: grid split at n/2 with mid-points c1, c2]
70
Cross column n/2
DIST_col(c, n/2)[i, j]: O(n³) words
Encode in an array of binary trees, using O(n² log n) words
– B[j, c] is a binary tree
– B[j, c](i) is a leaf of the tree
Read an entry of DIST_col(c, n/2)[i, j] in O(log n)
[Figure: grid split at n/2 with mid-points c1, c2]
71
Algorithm
1. Find all repeats that cross, in O(n² log n) – row n/2 – column n/2
2. Recursively solve the – sub-array [0..n/2, 0..n/2] – sub-array [n/2..n, n/2..n]
[Figure: grid split at n/2 with mid-points c1, c2, c3]
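The total running time follows from the recurrence of this divide-and-conquer scheme. The derivation below is a sketch that assumes the crossing step costs at most c·n² log n for some constant c, as stated in step 1, and that both recursive calls work on sub-arrays of side n/2.

```latex
\begin{align*}
T(n) &= 2\,T(n/2) + c\,n^{2}\log n \\
     &\le \sum_{k \ge 0} 2^{k} \cdot c \left(\frac{n}{2^{k}}\right)^{2} \log n
      \;=\; c\,n^{2}\log n \sum_{k \ge 0} 2^{-k}
      \;\le\; 2c\,n^{2}\log n \\
     &= O(n^{2}\log n).
\end{align*}
```

The cost per level shrinks geometrically, so the top level dominates and the recursion adds only a constant factor. This agrees with the O(n² log n) bound listed for tandem approximate repeats on the Applications slide.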
72
References
Aggarwal, A. and Park, J. Notes on Searching in Multidimensional Monotone Arrays. IEEE.
Schmidt, Jeanette P. All highest scoring paths in weighted grid graphs and their application to finding all approximate repeats in strings. SIAM.
Larmore, Lawrence L. The SMAWK Algorithm. UNLV.
Apostolico, A., Atallah, M.J., Larmore, L.L. and McFaddin, S. Efficient Parallel Algorithms for String Editing and Related Problems. SIAM J. Comput.
Landau, G.M. and Ziv-Ukelson, M. On the Common Substring Alignment Problem. J. of Algorithms.