Published by Lewis Tucker. Modified over 9 years ago.
1
23 Jan, 2008. SOFSEM 2008. A New Model to Solve the Swap Matching Problem and Efficient Algorithms for Short Patterns. Costas Iliopoulos, M. Sohel Rahman
2
Classic Pattern Matching. Input: a string T of length n (the text) and a string P of length m (the pattern), both from alphabet Σ. Output: whether P occurs in T (existence query), or Occ = {i | P = T[i..i + m − 1]} (computation of the occurrence set).
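The two outputs defined on this slide can be sketched by brute force. This is only an illustration of the definition of Occ with 1-based positions; the function names and the sample text are made up for the example and do not come from the talk.

```python
def occurrences(text: str, pattern: str) -> list[int]:
    """Return Occ = {i | P = T[i..i+m-1]} with 1-based positions."""
    n, m = len(text), len(pattern)
    # Compare the pattern against every window of length m.
    return [i + 1 for i in range(n - m + 1) if text[i:i + m] == pattern]


def occurs(text: str, pattern: str) -> bool:
    """Existence query: does P occur anywhere in T?"""
    return bool(occurrences(text, pattern))


# Hypothetical sample text, chosen so that GAC starts at positions 3 and 12.
sample_text = "ATGACTTTTCAGACA"
print(occurrences(sample_text, "GAC"))
print(occurs(sample_text, "GAC"))
```

The scan costs O(nm) time in the worst case; it is meant as a reference for the definitions, not as an efficient matcher.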
3
Example. P = GAC. We have GAC at positions 3 and 12, so Occ = {3, 12}. (The slide also shows Occ = {5, 14}; the text string itself is not preserved.)
4
Swap Matching. P = ACGCT. [Figure: a text of length 13 over {A, C, G, T}, positions 1 to 13, with the 5-letter pattern aligned beneath it.]
5
Swap Matching. P = ACGCT. [Figure: the same text of length 13, with the pattern aligned at each position where it swap matches.] Occ = {1, 5, 6}
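A brute-force check can make the notion of a swap match concrete. The sketch below assumes the usual definition, which the slides do not spell out: P swap matches a window of T if P can be turned into the window by swapping pairwise disjoint pairs of adjacent characters. The function names and the sample strings are illustrative only.

```python
def swap_match_at(window: str, pattern: str) -> bool:
    """True if pattern can become window via disjoint adjacent swaps."""
    m = len(pattern)
    if len(window) != m:
        return False
    i = 0
    while i < m:
        if pattern[i] == window[i]:
            # Position i is left in place.
            i += 1
        elif (i + 1 < m and pattern[i] == window[i + 1]
              and pattern[i + 1] == window[i]):
            # Positions i and i+1 are swapped; both are now used up.
            i += 2
        else:
            return False
    return True


def swap_occurrences(text: str, pattern: str) -> list[int]:
    """1-based start positions where pattern swap matches text."""
    n, m = len(text), len(pattern)
    return [i + 1 for i in range(n - m + 1)
            if swap_match_at(text[i:i + m], pattern)]


# Made-up example: acabb (one swap), cabba (two swaps), acbab (no swap).
print(swap_occurrences("acabbacbab", "acbab"))
```

The left-to-right scan is safe because a swap is only ever needed where the pattern and the window disagree, so keeping a matching character in place never rules out a valid set of swaps. The cost is O(nm) overall.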
6
Motivation. Swap errors are common in typing. Swaps also occur in gene mutations and duplications.
7
Existing results. 2000: Amir, Aumann, Landau, Lewenstein, Lewenstein: O(n m^(1/3) log m log σ). 1998: Amir, Landau, Lewenstein, Lewenstein: O(n log^2 m) (some very special cases). 2003: Amir, Cole, Hariharan, Lewenstein, Porat: O(n log m log σ). Here σ = min(m, |Σ|). All results use FFT.
8
Existing results. Some related variants are also investigated in the literature. Approximate version: Amir, Lewenstein, Porat (2002). Weighted version: Zhang, Guo, Iliopoulos (2004).
9
Our Contribution. A new graph-theoretic model. An algorithm running in O((m/w) n log m) time, where w is the word size. For word-size patterns: O(n log m). The first efficient non-FFT algorithm for swap matching.
10
The New Model
11
T-Graph. [Figure: a text T of length 15 over {a, b, c}, positions 1 to 15, and its T-Graph, whose nodes carry the labels a c a a b c a c a b a c c b c.]
12
P-Graph. P = acbab. [Figure: the P-Graph of P, with columns 1 to 5.]
13
P-Graph. P = accab. [Figure: the P-Graph of P, with columns 1 to 5.]
14
So… P swap matches T if and only if the P-Graph swap matches the T-Graph.
15
An Efficient Algorithm
16
Degenerate strings. Let Σ = {A, C, G, T}. Then we can get 2^4 − 1 = 15 non-empty sets of letters. At each position of a degenerate string we have one of those sets.
17
Degenerate strings… The 15 non-empty sets: {A}, {C}, {G}, {T}, {A,C}, {A,G}, {A,T}, {C,G}, {C,T}, {G,T}, {A,C,G}, {A,C,T}, {A,G,T}, {C,G,T}, {A,C,G,T}.
18
Degenerate strings… [Figure: an example degenerate string X of length 7, with a set of letters at each of positions 1 to 7.]
19
Degenerate strings: Equality/Match. [Figure: the degenerate string X of length 7 and a shorter degenerate string Y.] X[3] =d Y[1]. Why? Because X[3] ∩ Y[1] = {A}, which is non-empty. Y =d X[1..3], Y =d X[3..5], Y =d X[4..6].
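The match relation =d can be sketched directly from the rule on this slide: two positions match when their letter sets intersect, and two degenerate strings match when they have equal length and match position by position. The strings X and Y below are made up for the illustration, since the slide's own example is not preserved, and the function names are hypothetical.

```python
def deg_equal(x: list[set[str]], y: list[set[str]]) -> bool:
    """x =d y: same length and a non-empty intersection at every position."""
    return len(x) == len(y) and all(a & b for a, b in zip(x, y))


def deg_occurrences(x: list[set[str]], y: list[set[str]]) -> list[int]:
    """1-based positions i such that y =d x[i..i+len(y)-1]."""
    n, m = len(x), len(y)
    return [i + 1 for i in range(n - m + 1) if deg_equal(x[i:i + m], y)]


# Hypothetical degenerate strings over {A, C, G, T}.
X = [{"T"}, {"C"}, {"A", "C"}, {"T"}, {"C"}, {"A"}, {"A", "C"}]
Y = [{"A", "T"}, {"C"}, {"A"}]

print(X[2] & Y[2])            # intersection of X[3] and Y[3] in 1-based terms
print(deg_occurrences(X, Y))
```

An ordinary string is the special case where every set has exactly one letter, so the same function also covers matching a degenerate pattern against a plain text.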
20
P-Graph => Degenerate String. [Figure: the P-Graph of P = acbab, columns 1 to 5.] Taking the set of letters in each column gives the degenerate string {a,c} {a,b,c} {a,b,c} {a,b} {a,b}.
21
Swap Match vs Deg. Match. [Figure: the degenerate string derived from P aligned against a text T of length 10.] According to degenerate matching: OK! According to swap matching: NOT OK!
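The gap between the two notions can be reproduced with a small sketch. It assumes that position i of the degenerate string holds the letters P[i−1], P[i] and P[i+1] (clipped at the ends), which agrees with the letter sets shown for P = acbab, and it assumes the usual disjoint-adjacent-swaps definition of a swap match. The window accab is an inferred illustration, and all function names are hypothetical.

```python
def degenerate_from_pattern(p: str) -> list[set[str]]:
    """Set at position i = {p[i-1], p[i], p[i+1]}, clipped at both ends."""
    return [set(p[max(0, i - 1):i + 2]) for i in range(len(p))]


def deg_match(window: str, sets: list[set[str]]) -> bool:
    """Plain window against degenerate string: each letter must be in its set."""
    return len(window) == len(sets) and all(c in s for c, s in zip(window, sets))


def swap_match(window: str, p: str) -> bool:
    """True if p becomes window via pairwise disjoint adjacent swaps."""
    if len(window) != len(p):
        return False
    i = 0
    while i < len(p):
        if p[i] == window[i]:
            i += 1
        elif i + 1 < len(p) and p[i] == window[i + 1] and p[i + 1] == window[i]:
            i += 2
        else:
            return False
    return True


P = "acbab"
sets = degenerate_from_pattern(P)
for window in ("cabba", "accab"):
    print(window, deg_match(window, sets), swap_match(window, P))
```

Every swap match is also a degenerate match, because a swapped position only ever receives a neighbouring pattern letter. The converse fails: accab passes the degenerate test, yet it would need position 2 to take the letter from position 1 while position 3 takes the letter from position 2, and those two swaps overlap.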
22
Why Doesn't It Work? [Figure: the text T of length 10, the degenerate string {a,c} {a,b,c} {a,b,c} {a,b} {a,b}, and the P-Graph with the path spelling a c c a b over columns 1 to 5.]
23
Forbidden Graph. [Figure: the forbidden graph for the example pattern.]
24
Our Algorithm. Two ingredients: the Shift-Or algorithm and the concept of the Forbidden Graph.
25
D-Mask. P = accab => degenerate string {a,c} {a,c} {a,c} {a,b,c} {a,b}. [Table: the D-masks for a, b, c and X, one bit per pattern position 1 to 5.]
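For orientation, here is a sketch of the classical Shift-Or scan driven by per-letter masks built from a degenerate pattern. It is not the authors' full algorithm: it leaves out the F-mask step, so it reports degenerate matches, which are a superset of the swap matches. The convention that bit i of a mask is 0 when the letter is allowed at pattern position i+1 is the standard Shift-Or one and is an assumption about the slide's table. The sample text reuses the label string from the T-Graph slide purely as input data, and all names are hypothetical.

```python
def build_d_masks(sets: list[set[str]]) -> dict[str, int]:
    """Mask per letter: bit i is 0 iff the letter is allowed at position i."""
    full = (1 << len(sets)) - 1
    masks: dict[str, int] = {}
    for i, allowed in enumerate(sets):
        for ch in allowed:
            masks[ch] = masks.get(ch, full) & ~(1 << i)
    return masks


def shift_or_degenerate(text: str, sets: list[set[str]]) -> list[int]:
    """1-based start positions where the degenerate pattern matches text."""
    m = len(sets)
    if m == 0:
        return []
    full = (1 << m) - 1
    d = build_d_masks(sets)
    r = full                      # all ones: no prefix matched yet
    hits = []
    for j, ch in enumerate(text):
        # Shift in a 0 (a new match may start here), then OR with the mask.
        r = ((r << 1) | d.get(ch, full)) & full
        if r & (1 << (m - 1)) == 0:
            hits.append(j - m + 2)
    return hits


p = "accab"
sets = [set(p[max(0, i - 1):i + 2]) for i in range(len(p))]
print(shift_or_degenerate("acaabcacabaccbc", sets))
```

With Python integers the mask width is unbounded, so the O(m/w) factor from the talk does not show up explicitly here; in a fixed-width language each mask would occupy ⌈m/w⌉ machine words.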
26
F-Mask. [Figure: the forbidden graph, and a table of F-masks indexed by the letter pairs (a,a), (a,b), (b,b), (c,c), (c,a) and (X,X), one bit per pattern position 1 to 5.]
27
Computing R matrix. [Figure: text positions 1 to 15 against pattern positions 1 to 5; the first column of R is obtained by Shift, then Or with D_a and F(X,a).]
28
Computing R matrix. [Figure: the next column of R is obtained by Shift, then Or with D_c and F(a,c).]
29
Computing R matrix. [Figure: the next column of R is obtained by Shift, then Or with D_a and F(c,a).]
30
Computing R matrix. [Figure: a further column of R is obtained by Shift, then Or with D_b and F(c,b).]
31
Computing R matrix. [Figure: the completed R matrix, with one column per text position 1 to 15 and one row per pattern position 1 to 5.]
32
Running Time. Computing D-masks: O((m/w)(m + |Σ|)). Computing F-masks: O((m/w) m log m). Computing R values: O((m/w) n log m). Overall: O((m/w) n log m), which is O(n log m) for short patterns (m ~ w).
33
Future Work. Explore the possibilities of using graph pattern matching. Experimental work: a forthcoming paper contains experiments using biological examples.
34
The End. Thank you very much.