1 String Matching of Bit Parallel Suffix Automata.



2 Suffix Automata
- Based on the Directed Acyclic Word Graph (DAWG)
- Makes it easier to compare equivalent suffixes
- Nondeterministic suffix automaton
- Deterministic suffix automaton, obtained by subset construction

3 Suffix Automata Search
- Also called Backward DAWG Matching (BDM)
- Build the automaton that recognizes the factors x of the pattern p
  - endpos(x): the set of all pattern positions where an occurrence of x ends
  - Example: pattern = baabbaa, endpos(aa) = {3, 7}
- The window size equals the pattern length, and the window moves over the text from left to right
- When the text read in the window is no longer a factor of the pattern, the match fails and the window is shifted
- The shift is safe because no equivalent suffix occurs in the pattern
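The endpos definition above can be checked with a few lines of Python. This is only a brute-force sketch for illustration; the function name `endpos` and the 1-based position convention are assumptions chosen to match the slide's example, not code from the presentation.

```python
def endpos(pattern, x):
    """Return the 1-based positions in `pattern` where an occurrence of `x` ends."""
    positions = set()
    for start in range(len(pattern) - len(x) + 1):
        # str.startswith with an offset tests for an occurrence at `start`.
        if pattern.startswith(x, start):
            positions.add(start + len(x))
    return positions


print(endpos("baabbaa", "aa"))  # the slide's example
```

Two factors with the same endpos set are the equivalent suffixes that a suffix automaton merges into one state.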

4 BDM Algorithm: build the automaton; test whether the final state was reached

5 Suffix Automata Search Example
1. Build the deterministic suffix automaton of the reversed pattern.
2. Use endpos(x) to find a factor.
3. On failing to find a factor, do a safe shift.

[Figure, repeated on slides 6 to 16: deterministic suffix automaton of the reversed pattern p^r. States are labeled with position sets {0,1,2,3,4,5,6,7}, {1,4,5}, {2,6}, {4}, {5}, {6}, {2,3,6,7}, {7}, {3,7}; transitions are labeled a or b.]

6 Suffix Automata Search Example, step 1. T = [abbaba a]bbaab. a is a factor of p^r and a reverse prefix of p. last = 6.

7 Suffix Automata Search Example, step 2. T = [abbab aa]bbaab. aa is a factor of p^r and a reverse prefix of p. last = 5.

8 Suffix Automata Search Example, step 3. T = [abba baa]bbaab. aab is a factor of p^r.

9 Suffix Automata Search Example, step 4. T = [abb abaa]bbaab. We fail to recognize the next a, so we shift the window to last. We search again at T = abbab[aabbaab]. last = 7.

10 Suffix Automata Search Example, step 5. T = abbab[aabbaa b]. b is a factor of p^r.

11 Suffix Automata Search Example, step 6. T = abbab[aabba ab]. ba is a factor of p^r.

12 Suffix Automata Search Example, step 7. T = abbab[aabb aab]. baa is a factor of p^r and a reverse prefix of p. last = 4.

13 Suffix Automata Search Example, step 8. T = abbab[aab baab]. baab is a factor of p^r.

14 Suffix Automata Search Example, step 9. T = abbab[aa bbaab]. baabb is a factor of p^r.

15 Suffix Automata Search Example, step 10. T = abbab[a abbaab]. baabba is a factor of p^r.

16 Suffix Automata Search Example, step 11. T = abbab[aabbaab]. We recognize the word aabbaab and report an occurrence.
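The window logic of this example can be replayed with a short Python sketch. It is a simplification, not the presentation's algorithm: instead of a suffix automaton it uses a plain set of all factors of the pattern as a stand-in oracle (quadratic memory, fine only for illustration). The name `bdm_search` is hypothetical, and the pattern aabbaab is inferred from the final step of the example.

```python
def bdm_search(pattern, text):
    """BDM-style search; a factor set stands in for the suffix automaton."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    # Stand-in oracle: every non-empty substring (factor) of the pattern.
    factors = {pattern[i:j] for i in range(m) for j in range(i + 1, m + 1)}
    occurrences = []
    pos = 0  # 0-based start of the current window
    while pos <= len(text) - m:
        j = m      # the window suffix text[pos+j : pos+m] has been read so far
        last = m   # safe shift for this window
        while j > 0:
            candidate = text[pos + j - 1:pos + m]  # extend the suffix leftwards
            if candidate not in factors:
                break  # not a factor of the pattern: stop reading
            j -= 1
            if pattern.startswith(candidate):
                if j > 0:
                    last = j  # a pattern prefix starts at window offset j
                else:
                    occurrences.append(pos)  # the whole window matched
        pos += last
    return occurrences


print(bdm_search("aabbaab", "abbabaabbaab"))
```

On the example text, the first window stops after reading baa with last = 5, the window is shifted by 5, and the second window matches completely, which agrees with steps 1 to 11.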

17 BNDM Algorithm
- Backward Nondeterministic DAWG Matching (BNDM)
- Handles character classes, multiple patterns, and matching with errors
- Uses bit parallelism, combining Shift-Or and BDM
- Reported to be 20% to 25% faster than BDM and 10% to 40% faster than BM (Boyer-Moore)
- Update function

18 BNDM Algorithm

19 BNDM Example

20 BNDM Example
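Since the transcript keeps only the titles of the BNDM slides, here is a minimal Python sketch of the basic algorithm as it is commonly described for the Navarro and Raffinot method. The function name `bndm_search`, the bit layout (highest bit for the first pattern character), and the use of Python's unbounded integers in place of a machine word are assumptions of this sketch.

```python
def bndm_search(pattern, text):
    """Bit-parallel backward search: D simulates the nondeterministic suffix automaton."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    # B[c] has bit (m - 1 - i) set when pattern[i] == c.
    B = {}
    for i, c in enumerate(pattern):
        B[c] = B.get(c, 0) | (1 << (m - 1 - i))
    mask = (1 << m) - 1   # keep D within m bits
    high = 1 << (m - 1)   # set when the text read so far is a prefix of the pattern
    occurrences = []
    pos = 0
    while pos <= len(text) - m:
        j, last = m, m
        D = mask  # every pattern position is still a candidate
        while D != 0:
            D &= B.get(text[pos + j - 1], 0)  # read one character, right to left
            j -= 1
            if D & high:
                if j > 0:
                    last = j  # remember where a pattern prefix starts
                else:
                    occurrences.append(pos)  # whole window matched
                    break
            D = (D << 1) & mask
        pos += last
    return occurrences


print(bndm_search("aabbaab", "abbabaabbaab"))
```

The control flow is the same as in BDM; only the automaton is replaced by the bit vector D, which is why the pattern has to fit in one machine word in a real implementation.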

21 BNDM Further Improvement
- Handle long patterns
  - Partition the pattern p into subpatterns p_i
  - Build an array of D and B, and process each part with the basic algorithm
  - If p_i is found, then process p_(i+1), and so on
- Handle classes
  - Only the B table is modified: the ith bit is set for all characters belonging to the ith position in the pattern
- Multiple patterns, two methods
  - Interleave the patterns and shift r bits on each D update
  - Just concatenate the patterns and shift 1 bit, with the modified update D = (D << 1) & (1^(m-1) 0)^r, where the mask is m-1 ones followed by a zero, repeated r times
  - Here r is the number of patterns
- Approximate matching
  - Use Wu's method
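The class-handling point above can be illustrated with a small Python sketch: only the construction of the B table changes, and the search loop stays the same as in the basic algorithm. The names `build_class_table` and `bndm_class_search`, and the representation of a pattern as a list of strings (one string of allowed characters per position), are assumptions made for this example.

```python
def build_class_table(classes):
    """B table for a pattern whose i-th position accepts any character in classes[i]."""
    m = len(classes)
    table = {}
    for i, allowed in enumerate(classes):
        for c in allowed:
            # Set the bit of position i for every character of the class.
            table[c] = table.get(c, 0) | (1 << (m - 1 - i))
    return table


def bndm_class_search(classes, text):
    """Basic BNDM loop, unchanged except that it uses the class-aware table."""
    m = len(classes)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    table = build_class_table(classes)
    mask, high = (1 << m) - 1, 1 << (m - 1)
    hits = []
    pos = 0
    while pos <= len(text) - m:
        j, last, d = m, m, mask
        while d:
            d &= table.get(text[pos + j - 1], 0)
            j -= 1
            if d & high:
                if j > 0:
                    last = j
                else:
                    hits.append(pos)
                    break
            d = (d << 1) & mask
        pos += last
    return hits


# Pattern a[bc]a: matches "aba" and "aca".
print(bndm_class_search(["a", "bc", "a"], "xabaacay"))
```

Because a class only adds bits to existing table entries, it costs nothing extra at search time.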

22 Performance Comparison (times in hundredths of a second per megabyte)

23 References
- Gonzalo Navarro and Mathieu Raffinot. A Bit-parallel Approach to Suffix Automata: Fast Extended String Matching. In M. Farach (editor), Proc. CPM'98, LNCS 1448, pages 14-33, 1998.
- Gonzalo Navarro and Mathieu Raffinot. Fast and Flexible String Matching by Combining Bit-parallelism and Suffix Automata. 1998.

24 Reverse Pattern?

