
1 SMAWK

2 REVISE

3 Global alignment (Revise) Alignment graph for S = aacgacga, T = ctacgaga. Complexity: $O(n^2)$. Recurrence: $V(i,j) = \max \{\, V(i-1,j-1) + \delta(S[i], T[j]),\; V(i-1,j) + \delta(S[i], -),\; V(i,j-1) + \delta(-, T[j]) \,\}$
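The recurrence on this slide can be sketched in Python as follows. This is only an illustration: the slide does not give the scoring function, so `simple_delta` (match +1, mismatch -1, gap -1) is an assumption, and both function names are hypothetical.

```python
def simple_delta(x, y):
    """Hypothetical scoring: +1 match, -1 mismatch, -1 gap ('-')."""
    if x == "-" or y == "-":
        return -1
    return 1 if x == y else -1


def global_alignment_score(S, T, delta=simple_delta):
    """Fill V with the slide's recurrence and return V(|S|, |T|)."""
    n, m = len(S), len(T)
    V = [[0] * (m + 1) for _ in range(n + 1)]
    # Borders: a prefix aligned against nothing but gaps.
    for i in range(1, n + 1):
        V[i][0] = V[i - 1][0] + delta(S[i - 1], "-")
    for j in range(1, m + 1):
        V[0][j] = V[0][j - 1] + delta("-", T[j - 1])
    # Interior cells: the three cases of the recurrence.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            V[i][j] = max(
                V[i - 1][j - 1] + delta(S[i - 1], T[j - 1]),
                V[i - 1][j] + delta(S[i - 1], "-"),
                V[i][j - 1] + delta("-", T[j - 1]),
            )
    return V[n][m]


print(global_alignment_score("aacgacga", "ctacgaga"))
```

The two nested loops visit every cell once, which is where the quadratic cost comes from.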

4 DIST and OUT matrix (Revise) [Figure: alignment block for the sub-sequences "acg" and "ag", with its DIST matrix and OUT matrix; the matrix entries are not recoverable from the transcript.] I (input borders): $I_0 = 1,\ I_1 = 2,\ I_2 = 3,\ I_3 = 2,\ I_4 = 1,\ I_5 = 3$. O (output borders), the column maxima of OUT: $O_0, \dots, O_5 = 1, 3, 3, 4, 2, 3$.

5 Compute O without explicit OUT [Figure: the same block ("acg", "ag") with its DIST matrix; the matrix entries are not recoverable from the transcript.] I (input borders): $I_0 = 1,\ I_1 = 2,\ I_2 = 3,\ I_3 = 2,\ I_4 = 1,\ I_5 = 3$. The output borders $O_0, \dots, O_5 = 1, 3, 3, 4, 2, 3$ are computed from I and DIST with SMAWK.
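A minimal sketch of what "without explicit OUT" can mean. It assumes the usual definition OUT[i][j] = I[i] + DIST[i][j], which the transcript does not spell out, and treats undefined DIST entries as minus infinity, which is also an assumption. The 3 x 3 data and the name `output_border` are hypothetical, not the slide's matrix. This version still scans every entry; SMAWK would replace the inner `max` so that only a linear number of entries is queried.

```python
NEG_INF = float("-inf")  # assumed stand-in for undefined DIST entries


def output_border(I, DIST):
    """Return O with O[j] = max over i of (I[i] + DIST[i][j]).

    OUT is never stored: each entry is recomputed on demand from I and DIST.
    """
    n_out = len(DIST[0])
    return [max(I[i] + DIST[i][j] for i in range(len(I))) for j in range(n_out)]


# Hypothetical example data.
I = [1, 2, 3]
DIST = [
    [0, -2, NEG_INF],
    [-2, 0, -1],
    [NEG_INF, -2, 0],
]
print(output_border(I, DIST))
```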

6 Aggarwal, Park and Schmidt observed that DIST and OUT matrices are Monge arrays. Definition: a matrix M[0…m,0…n] is totally monotone if either condition 1 or 2 below holds for all a,b=0…m; c,d=0…n; a<b and c<d 1. Convex condition: $M[a,c] \ge M[b,c] \Rightarrow M[a,d] \ge M[b,d]$. 2. Concave condition: $M[a,c] \le M[b,c] \Rightarrow M[a,d] \le M[b,d]$.

7 SMAWK Aggarwal et al. gave a recursive algorithm, called SMAWK, which can find all row and column maxima of a totally monotone matrix by querying only $O(n)$ elements of the matrix.

8 Presentation Outline What are Monge arrays? – Monge $\Rightarrow$ totally monotone. Why is the DIST alignment matrix a Monge array? How to compute on totally monotone arrays efficiently? – SMAWK: given a totally monotone array, compute all column maxima in $O(n)$.

9 MONGE AND TOTALLY MONOTONE PROPERTIES

10 Monge A matrix M[0…m, 0…n] is Monge if either condition 1 or 2 below holds for all a,b=0…m; c,d=0…n; a<b and c<d 1. $M[a,c] + M[b,d] \le M[a,d] + M[b,c]$ 2. $M[a,c] + M[b,d] \ge M[a,d] + M[b,c]$
         c        d        z
    a    M[a,c]   M[a,d]   …
    b    M[b,c]   M[b,d]   …
    x    …        …        …
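A brute-force check of the two Monge conditions, as a sketch. It assumes condition 1 is the "less than or equal" form and condition 2 the "greater than or equal" form; the names `is_monge`, `"leq"` and `"geq"` are hypothetical labels. The test matrix M[i][j] = i*j is an illustrative choice: for it, M[a,c] + M[b,d] - M[a,d] - M[b,c] = (b-a)(d-c) > 0, so only the second condition holds.

```python
from itertools import combinations


def is_monge(M, kind):
    """Check condition 1 (kind='leq') or condition 2 (kind='geq') on all a<b, c<d."""
    if kind not in ("leq", "geq"):
        raise ValueError("kind must be 'leq' or 'geq'")
    rows, cols = range(len(M)), range(len(M[0]))
    for a, b in combinations(rows, 2):      # every pair of rows with a < b
        for c, d in combinations(cols, 2):  # every pair of columns with c < d
            lhs = M[a][c] + M[b][d]
            rhs = M[a][d] + M[b][c]
            if kind == "leq" and lhs > rhs:
                return False
            if kind == "geq" and lhs < rhs:
                return False
    return True


product = [[i * j for j in range(4)] for i in range(4)]
print(is_monge(product, "geq"), is_monge(product, "leq"))
```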

11 Totally monotone A matrix M[0…m, 0…n] is totally monotone if either condition 1 or 2 below holds for all a,b=0…m; c,d=0…n; a<b and c<d 1. Convex condition: $M[a,c] \ge M[b,c] \Rightarrow M[a,d] \ge M[b,d]$ 2. Concave condition: $M[a,c] \le M[b,c] \Rightarrow M[a,d] \le M[b,d]$ Monge $\Rightarrow$ Totally monotone
         c        d        z
    a    M[a,c]   M[a,d]   …
    b    M[b,c]   M[b,d]   …
    x    …        …        …
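The two conditions can be tested directly on small matrices. This sketch assumes the convex condition reads "M[a,c] >= M[b,c] implies M[a,d] >= M[b,d]" and the concave condition is the same with "<="; the function name is hypothetical. The matrix M[i][j] = i*j, which satisfies the "greater than or equal" Monge inequality, passes the concave test, in line with "Monge implies totally monotone".

```python
from itertools import combinations


def is_totally_monotone(M, kind):
    """Brute-force test of the convex or concave condition for all a<b, c<d."""
    if kind not in ("convex", "concave"):
        raise ValueError("kind must be 'convex' or 'concave'")
    rows, cols = range(len(M)), range(len(M[0]))
    for a, b in combinations(rows, 2):
        for c, d in combinations(cols, 2):
            if kind == "convex":
                # premise holds but conclusion fails -> not totally monotone
                if M[a][c] >= M[b][c] and not M[a][d] >= M[b][d]:
                    return False
            else:
                if M[a][c] <= M[b][c] and not M[a][d] <= M[b][d]:
                    return False
    return True


product = [[i * j for j in range(4)] for i in range(4)]
print(is_totally_monotone(product, "concave"))
print(is_totally_monotone([[2, 1], [1, 2]], "convex"))
```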

12 Intuition Monge: quadrangle inequality on rows a, b and columns c, d of the grid below: $M[a,c] + M[b,d] \le M[a,d] + M[b,c]$
         c        d        z
    a    M[a,c]   M[a,d]   …
    b    M[b,c]   M[b,d]   …
    x    …        …        …

13 History Computational Geometry. All nearest neighbor problem – Shamos and Hoey proved $\Theta(n \log n)$ in 1975. All farthest neighbor problem – F. P. Preparata proved $\Theta(n \log n)$ in 1977. All farthest neighbor problem in convex polygon – Lee and Preparata proved $O(n)$ in 1978.

14 SMAWK Aggarwal et al. proved $O(n)$ for the all farthest neighbor problem in a convex polygon in 1987. Aggarwal et al. gave a recursive algorithm, called SMAWK, which can find all row and column maxima of a totally monotone matrix by querying only $O(n)$ elements of the matrix.

15 DIST AND OUT MATRICES

16 Assumption – row and column maxima of a totally monotone matrix can be computed in $O(n)$. Why are the DIST and OUT matrices of the alignment problem totally monotone?

17 DIST and OUT matrix (Revise) [Figure: alignment block for the sub-sequences "acg" and "ag", with its DIST matrix and OUT matrix; the matrix entries are not recoverable from the transcript.] I (input borders): $I_0 = 1,\ I_1 = 2,\ I_2 = 3,\ I_3 = 2,\ I_4 = 1,\ I_5 = 3$. O (output borders), the column maxima of OUT: $O_0, \dots, O_5 = 1, 3, 3, 4, 2, 3$.

18 Compute O without explicit OUT [Figure: the same block ("acg", "ag") with its DIST matrix; the matrix entries are not recoverable from the transcript.] I (input borders): $I_0 = 1,\ I_1 = 2,\ I_2 = 3,\ I_3 = 2,\ I_4 = 1,\ I_5 = 3$. The output borders $O_0, \dots, O_5 = 1, 3, 3, 4, 2, 3$ are computed from I and DIST with SMAWK.

19 DIST is Monge [Figure: the alignment block for "acg" and "ag" with input border I and output border O, border positions numbered 0 to 5.]

20 DIST is Monge array Monge: $M[a,c] + M[b,d] \ge M[a,d] + M[b,c]$. Totally monotone by the concave condition: $M[a,c] \le M[b,c] \Rightarrow M[a,d] \le M[b,d]$
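The step from the Monge inequality to the concave condition can be spelled out in one line. This is a short derivation, assuming the "greater than or equal" form of the Monge inequality and $a < b$, $c < d$:

```latex
\begin{align*}
M[a,c] + M[b,d] &\ge M[a,d] + M[b,c] \\
\Longrightarrow\quad M[b,d] - M[a,d] &\ge M[b,c] - M[a,c].
\end{align*}
% If M[a,c] <= M[b,c], the right-hand side is non-negative, so
% M[b,d] - M[a,d] >= 0, that is, M[a,d] <= M[b,d].
```

The converse does not hold in general: a matrix can satisfy the concave condition without satisfying the Monge inequality.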

21 Comment on this approach Advantages – Easy to parallelize – Easy to combine Disadvantages – Need to compute/keep more information

22 Applications Parallel sequence alignment – $O(\log m \log n)$ time – using $O(mn / \log m)$ processors (CREW PRAM). Best non-overlapping alignment score – $O(n^2 \log^2 n)$ time. Tandem approximate repeat – $O(n^2 \log n)$ time. Common Substring Alignment

23 SMAWK

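24 Find all column minima of the following totally monotone array (rows 1 to 18, columns 1 to 9):
           1    2    3    4    5    6    7    8    9
      1   25   42   57   78   90  103  123  142  151
      2   21   35   48   65   76   85  105  123  130
      3   13   26   35   51   58   67   86  100  104
      4   10   20   28   42   48   56   75   86   88
      5   20   29   33   44   49   55   73   82   80
      6   13   21   24   35   39   44   59   65   59
      7   19   25   28   38   42   44   57   61   52
      8   35   37   40   48   48   49   62   62   49
      9   37   36   37   42   39   39   51   50   37
     10   41   39   37   42   35   33   44   43   29
     11   58   56   54   55   47   41   50   47   29
     12   66   64   61   61   51   44   52   45   24
     13   82   76   72   70   56   49   55   46   23
     14   99   91   83   80   63   56   59   46   20
     15  124  116  107  100   80   71   72   58   28
     16  133  125  113  106   86   75   74   59   25
     17  156  146  131  120   97   84   80   65   31
     18  178  164  146  135  110   96   92   73   39
For every 2×2 submatrix [a b; c d] (a, b in the upper row; c, d in the lower row): $b < d \Rightarrow a < c$ and $b = d \Rightarrow a \le c$.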

25 Find all column minima of the following totally monotone array (same 18 × 9 matrix as on slide 24). For every 2×2 submatrix [a b; c d]: $a > c \Rightarrow b > d$ and $a = c \Rightarrow b \ge d$; equivalently $b < d \Rightarrow a < c$ and $b = d \Rightarrow a \le c$.

26 Observation 1 (same matrix and 2×2 conditions as on slide 25).

27 Observation 2 (same matrix and 2×2 conditions as on slide 25).

28 SMAWK is a recursive algorithm of 2 steps – REDUCE – INTERPOLATE (same matrix and 2×2 conditions as on slide 25).

29 SMAWK is a recursive algorithm of 2 steps – REDUCE – INTERPOLATE. REDUCE removes rows. INTERPOLATE removes half of the columns (same matrix and 2×2 conditions as on slide 25).

30-31 REDUCE (full 18 × 9 matrix, as on slide 24; columns are numbered 1 to 9)

32-33 REDUCE Row 1 removed.

34-35 REDUCE Rows 2 and 3 removed.

36-37 REDUCE Row 5: entry in column 1 (20) removed.

38-39 REDUCE Row 5 removed.

40-41 REDUCE Row 6: entry in column 1 (13) removed.

42-43 REDUCE Row 7: entries in columns 1 and 2 (19, 25) removed.

44-45 REDUCE Row 8: entries in columns 1 to 3 (35, 37, 40) removed.

46 REDUCE Row 8 removed.

47 REDUCE Row 9: entries in columns 1 to 3 (37, 36, 37) removed.

48 REDUCE Row 9 removed. Rows 10, 11, 12, 13, 14 and 15 are trimmed to columns 4-9, 5-9, 6-9, 7-9, 8-9 and 9 respectively.

49 REDUCE Row 15 removed. Row 16 is trimmed to column 9 (25).

50-52 REDUCE Rows 17 and 18 removed.

53 REDUCE Surviving rows and entries:
           1    2    3    4    5    6    7    8    9
      4   10   20   28   42   48   56   75   86   88
      6        21   24   35   39   44   59   65   59
      7             28   38   42   44   57   61   52
     10                  42   35   33   44   43   29
     11                       47   41   50   47   29
     12                            44   52   45   24
     13                                 55   46   23
     14                                      46   20
     16                                           25
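The REDUCE trace above can be reproduced with a short stack-based sketch. The stack discipline used here (pop the top row when it is not strictly smaller than the incoming row in the column matching its stack position) is inferred from the animation frames, so treat it as one plausible reading rather than the presenter's code; `reduce_rows` is a hypothetical name and rows are 0-based inside the function.

```python
MATRIX = [
    [25, 42, 57, 78, 90, 103, 123, 142, 151],
    [21, 35, 48, 65, 76, 85, 105, 123, 130],
    [13, 26, 35, 51, 58, 67, 86, 100, 104],
    [10, 20, 28, 42, 48, 56, 75, 86, 88],
    [20, 29, 33, 44, 49, 55, 73, 82, 80],
    [13, 21, 24, 35, 39, 44, 59, 65, 59],
    [19, 25, 28, 38, 42, 44, 57, 61, 52],
    [35, 37, 40, 48, 48, 49, 62, 62, 49],
    [37, 36, 37, 42, 39, 39, 51, 50, 37],
    [41, 39, 37, 42, 35, 33, 44, 43, 29],
    [58, 56, 54, 55, 47, 41, 50, 47, 29],
    [66, 64, 61, 61, 51, 44, 52, 45, 24],
    [82, 76, 72, 70, 56, 49, 55, 46, 23],
    [99, 91, 83, 80, 63, 56, 59, 46, 20],
    [124, 116, 107, 100, 80, 71, 72, 58, 28],
    [133, 125, 113, 106, 86, 75, 74, 59, 25],
    [156, 146, 131, 120, 97, 84, 80, 65, 31],
    [178, 164, 146, 135, 110, 96, 92, 73, 39],
]


def reduce_rows(M):
    """REDUCE for column minima: keep at most len(M[0]) candidate rows."""
    n_cols = len(M[0])
    stack = []  # surviving row indices, top of stack is the last element
    for r in range(len(M)):
        while stack:
            c = len(stack) - 1  # column tested for the current stack top
            if M[stack[-1]][c] < M[r][c]:
                break            # the top row still wins in its own column
            stack.pop()          # the top row cannot hold a needed minimum
        if len(stack) < n_cols:
            stack.append(r)      # otherwise row r itself is discarded
    return stack


print([r + 1 for r in reduce_rows(MATRIX)])  # 1-based row numbers
```

Each row is pushed at most once and popped at most once, so the number of matrix entries queried is linear in the number of rows.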

54 INTERPOLATE Remove all odd-indexed columns (matrix as on slide 53).

55 INTERPOLATE Remaining columns 2, 4, 6, 8:
           2    4    6    8
      4   20   42   56   86
      6   21   35   44   65
      7        38   44   61
     10        42   33   43
     11             41   47
     12             44   45
     13                  46
     14                  46
     16

56 RECURSIVE Find all column minima of the reduced matrix on slide 55.

57-58 (Reduced matrix with all nine columns, as on slide 53.)

59-60 Entries still to be compared for the odd-indexed columns:
      4   columns 1-3:  10 20 28
      6   columns 3-5:  24 35 39
      7   column 5:     42
     10   columns 5-9:  35 33 44 43 29
     11   column 9:     29
     12   column 9:     24
     13   column 9:     23
     14   column 9:     20
     16   column 9:     25

61 (Full 18 × 9 matrix, as on slide 24.)
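Putting REDUCE, the recursive call and INTERPOLATE together gives the following sketch for column minima. It is a reconstruction under stated assumptions, not the presenter's code: the matrix must satisfy the 2×2 conditions from the slides, ties are resolved toward the lower row, and the name `column_minima` is hypothetical. Rows and columns are 0-based inside the function.

```python
MATRIX = [
    [25, 42, 57, 78, 90, 103, 123, 142, 151],
    [21, 35, 48, 65, 76, 85, 105, 123, 130],
    [13, 26, 35, 51, 58, 67, 86, 100, 104],
    [10, 20, 28, 42, 48, 56, 75, 86, 88],
    [20, 29, 33, 44, 49, 55, 73, 82, 80],
    [13, 21, 24, 35, 39, 44, 59, 65, 59],
    [19, 25, 28, 38, 42, 44, 57, 61, 52],
    [35, 37, 40, 48, 48, 49, 62, 62, 49],
    [37, 36, 37, 42, 39, 39, 51, 50, 37],
    [41, 39, 37, 42, 35, 33, 44, 43, 29],
    [58, 56, 54, 55, 47, 41, 50, 47, 29],
    [66, 64, 61, 61, 51, 44, 52, 45, 24],
    [82, 76, 72, 70, 56, 49, 55, 46, 23],
    [99, 91, 83, 80, 63, 56, 59, 46, 20],
    [124, 116, 107, 100, 80, 71, 72, 58, 28],
    [133, 125, 113, 106, 86, 75, 74, 59, 25],
    [156, 146, 131, 120, 97, 84, 80, 65, 31],
    [178, 164, 146, 135, 110, 96, 92, 73, 39],
]


def column_minima(M):
    """Return {column: row} with a row holding the minimum of each column."""

    def solve(rows, cols):
        if not cols:
            return {}
        # REDUCE: keep at most len(cols) candidate rows.
        stack = []
        for r in rows:
            while stack:
                c = cols[len(stack) - 1]
                if M[stack[-1]][c] < M[r][c]:
                    break
                stack.pop()
            if len(stack) < len(cols):
                stack.append(r)
        rows = stack
        # Recurse on every second column (2nd, 4th, ... of the current list).
        result = solve(rows, cols[1::2])
        # INTERPOLATE: fill the remaining columns between known answers.
        k = 0
        for i in range(0, len(cols), 2):
            c = cols[i]
            last = result[cols[i + 1]] if i + 1 < len(cols) else rows[-1]
            best = rows[k]
            while rows[k] != last:
                k += 1
                if M[rows[k]][c] <= M[best][c]:  # prefer the lower row on ties
                    best = rows[k]
            result[c] = best
        return result

    return solve(list(range(len(M))), list(range(len(M[0]))))


answer = column_minima(MATRIX)
for c in sorted(answer):
    print("column", c + 1, "row", answer[c] + 1, "value", MATRIX[answer[c]][c])
```

On a matrix that does not satisfy the 2×2 conditions the interpolation step may scan past the end of the candidate list, so the sketch should only be used on totally monotone input.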

62 APPROXIMATE TANDEM REPEAT Application of DIST and SMAWK

63 Tandem repeat IRQI QLWLR QIWIR LRQL

64 Social City

65 Observation Approximate tandem repeat – With the mid-point c – Alignments start at column c and end at row c [Figure: alignment grid with axes from 0 to n, column c and row c marked.]

66 4 cases – Cross column n/2 – Cross row n/2 – Inside sub-triangle [0,n/2] – Inside sub-triangle [n/2,n]

67 Algorithm 1. Find all repeats that cross – row n/2 – column n/2 2. Recursively solve the – sub-array [0..n/2, 0..n/2] – sub-array [n/2..n, n/2..n] [Figure: grid with mid-points $c_1$, $c_2$, $c_3$ marked relative to 0 and n/2.]

68 Cross column n/2 Combine – Best path from column c to (k,n/2) – Best path from (k,n/2) to row c [Figure: grid from 0 to n with column c, row c and column n/2 marked.]

69 Cross column n/2 Sub-problems: – DIST_col (c,n/2) [i,j] – DIST_row (c,n/2) [i,j] [Figure: grid with mid-points $c_1$, $c_2$ marked relative to 0 and n/2.]

70 Cross column n/2 DIST_col (c,n/2) [i,j]: $O(n^3)$ words. Encode in an array of binary trees using $O(n^2 \log n)$ words. B[j,c] is a binary tree. B[j,c](i) is a leaf of the tree. Read an entry of DIST_col (c,n/2) [i,j] in $O(\log n)$. [Figure: grid with mid-points $c_1$, $c_2$ marked relative to 0 and n/2.]

71 Algorithm 1. Find all repeats, in $O(n^2 \log n)$, that cross – row n/2 – column n/2 2. Recursively solve the – sub-array [0..n/2, 0..n/2] – sub-array [n/2..n, n/2..n] [Figure: grid with mid-points $c_1$, $c_2$, $c_3$ marked relative to 0 and n/2.]
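The overall bound can be checked against the divide-and-conquer structure. As a sketch, assume the crossing step on a sub-array of side $n$ costs at most $c\,n^2 \log n$ for some constant $c$ (the slide's $O(n^2 \log n)$) and that both recursive calls work on sub-arrays of side $n/2$:

```latex
\begin{align*}
T(n) &= 2\,T(n/2) + c\,n^2 \log n \\
     &\le \sum_{k \ge 0} 2^k \, c \left(\frac{n}{2^k}\right)^{2} \log n
      = c\,n^2 \log n \sum_{k \ge 0} 2^{-k} \\
     &\le 2c\,n^2 \log n = O(n^2 \log n).
\end{align*}
```

The work per level halves as the recursion deepens, so the top level dominates and the total matches the $O(n^2 \log n)$ figure quoted for tandem approximate repeats.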

72 References
Aggarwal, A. and Park, J. Notes on Searching in Multidimensional Monotone Arrays. IEEE.
Schmidt, J. P. All highest scoring paths in weighted grid graphs and their application to finding all approximate repeats in strings. SIAM.
Larmore, L. L. The SMAWK Algorithm. UNLV.
Apostolico, A., Atallah, M. J., Larmore, L. L. and McFaddin, S. Efficient Parallel Algorithms for String Editing and Related Problems. SIAM J. Comput.
Landau, G. M. and Ziv-Ukelson, M. On the Common Substring Alignment Problem. J. of Algorithms.

