1 String Matching with Bit-Parallel Suffix Automata
2 Suffix Automata
- Based on a Deterministic Acyclic Word Graph (DAWG)
- Makes it easy to compare which suffix strings are equivalent
- Nondeterministic suffix automaton
- Deterministic suffix automaton, obtained by subset construction
3 Suffix Automata Search
- Also called Backward DAWG Matching (BDM)
- Build the automaton that recognizes every factor x of the pattern p
- endpos(x): the set of all pattern positions where an occurrence of x ends
- Ex: Pattern = baabbaa, endpos(aa) = {3,7}
- A shift is safe if the string read so far has no equivalent suffix in the pattern
- The window moves over the text from left to right
- Failing to match a factor triggers a window shift
- Window size = pattern length
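The endpos definition can be checked directly with a brute-force sketch. The function name `endpos` and the 1-based position convention are taken from the slide's example; the implementation itself is only an illustration, not how a suffix automaton computes these sets.

```python
def endpos(pattern, x):
    """Return the 1-based positions in `pattern` where an occurrence of `x` ends."""
    return {
        i + len(x)                      # 1-based end position of the occurrence at i
        for i in range(len(pattern) - len(x) + 1)
        if pattern.startswith(x, i)
    }

print(endpos("baabbaa", "aa"))   # {3, 7}
print(endpos("baabbaa", "ba"))   # {2, 6}
```

Two factors with the same endpos set end at exactly the same places in the pattern, which is why the deterministic automaton can represent them with a single state.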
4 BDM Algorithm
- Build the automaton
- Test whether the final state has been reached
5 Suffix Automata Search Example
1. Build the deterministic suffix automaton of the reversed pattern
2. Use endpos(x) to find a factor
3. On failing to find a factor, do a safe shift
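The three steps can be sketched in Python as follows. This is a simplified illustration under one loud assumption: instead of a real deterministic suffix automaton, it uses the explicit set of all factors of the pattern as a stand-in, which gives the same answers but costs quadratic space. The names `bdm_search`, `factors`, and `last` are illustrative.

```python
def bdm_search(pattern, text):
    """BDM-style search; returns 0-based start positions of all occurrences."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # ASSUMPTION: a plain set of factors replaces the suffix automaton
    # of the reversed pattern (same language, far less efficient).
    factors = {pattern[i:j] for i in range(m) for j in range(i + 1, m + 1)}
    occurrences = []
    pos = 0                               # left end of the current window
    while pos <= n - m:
        j = m                             # window characters not yet read
        last = m                          # safe shift if no prefix is seen
        # Read the window right to left while the part read is a factor.
        while j > 0 and text[pos + j - 1:pos + m] in factors:
            j -= 1
            # Analogue of reaching a final state: the part read is a prefix of p.
            if pattern.startswith(text[pos + j:pos + m]):
                if j > 0:
                    last = j              # remember where that prefix starts
                else:
                    occurrences.append(pos)   # whole window read: a match
        pos += last                       # safe shift
    return occurrences

print(bdm_search("aabbaab", "abbabaabbaab"))   # [5]
```

With the pattern aabbaab and the text abbabaabbaab, the first window fails after three characters and the window moves to the position recorded in `last`, where the occurrence is found.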
6 1. T = [abbaba a]bbaab: a is a factor of p^r and a reverse prefix of p. last = 6
7 2. T = [abbab aa]bbaab: aa is a factor of p^r and a reverse prefix of p. last = 5
8 3. T = [abba baa]bbaab: aab is a factor of p^r
9 4. T = [abb abaa]bbaab: we fail to recognize the next a, so we shift the window to last. We search again at position T = abbab[aabbaab]. last = 5
10 5. T = abbab[aabbaa b]: b is a factor of p^r
11 6. T = abbab[aabba ab]: ba is a factor of p^r
12 7. T = abbab[aabb aab]: baa is a factor of p^r and a reverse prefix of p. last = 4
13 8. T = abbab[aab baab]: baab is a factor of p^r
14 9. T = abbab[aa bbaab]: baabb is a factor of p^r
15 10. T = abbab[a abbaab]: baabba is a factor of p^r
16 11. T = abbab[aabbaab]: we recognize the word aabbaab and report an occurrence
17 BNDM Algorithm
- Backward Nondeterministic DAWG Matching (BNDM)
- Handles character classes, multiple patterns, and matching with errors
- Uses bit parallelism; combines Shift-Or and BDM
- 20% to 25% faster than BDM, 10% to 40% faster than BM
- Update function
18 BNDM Algorithm
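A minimal Python sketch of BNDM, following the usual formulation in which the nondeterministic suffix automaton of the reversed pattern is simulated in a bit vector D, with a table B of per-character bit masks. The function name `bndm_search` and the variable names are illustrative. Python integers are unbounded, so the sketch does not show the machine-word limit that a C implementation would have.

```python
def bndm_search(pattern, text):
    """BNDM search; returns 0-based start positions of all occurrences."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    mask = (1 << m) - 1                  # keeps D within m bits
    high = 1 << (m - 1)                  # bit that signals "prefix of p recognized"
    # B[c] has bit (m-1-i) set when pattern[i] == c.
    B = {}
    for i, c in enumerate(pattern):
        B[c] = B.get(c, 0) | (1 << (m - 1 - i))
    occurrences = []
    pos = 0
    while pos <= n - m:
        j, last = m, m
        D = mask                         # all states active
        while D != 0:
            D &= B.get(text[pos + j - 1], 0)   # read one character, right to left
            j -= 1
            if D & high:                 # the part read is a prefix of the pattern
                if j > 0:
                    last = j             # candidate start for the next window
                else:
                    occurrences.append(pos)
            D = (D << 1) & mask          # advance every active state
        pos += last
    return occurrences

print(bndm_search("aabbaab", "abbabaabbaab"))   # [5]
```

The inner loop stops as soon as D becomes zero, which corresponds to the text read so far no longer being a factor of the pattern.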
19 BNDM Example
20 BNDM Example
21 BNDM Further Improvement
Handle long patterns
- Partition pattern p into subpatterns p_i
- Build an array of D and B; process each part with the basic algorithm
- If p_i is found, then process p_(i+1), and so on
Handle classes
- Only the B table is modified
- Set the ith bit for all characters belonging to the ith position in the pattern
Multiple patterns (two methods)
- Interleave the patterns; shift r bits for each D update
- Just concatenate; shift 1 bit, but use the modified update D = (D << 1) & (1^(m-1) 0)^r, where r is the number of patterns
Approximate matching
- Use Wu's method
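The class extension can be illustrated with a small Python sketch in which only the construction of B differs from plain BNDM. The representation of the pattern as a list of strings, one per position and each listing the characters allowed there, is an assumption made for this example, as is the name `bndm_search_classes`.

```python
def bndm_search_classes(classes, text):
    """BNDM with character classes.

    classes[i] is a string holding every character accepted at position i.
    Returns 0-based start positions of all matching windows.
    """
    m, n = len(classes), len(text)
    if m == 0 or m > n:
        return []
    mask = (1 << m) - 1
    high = 1 << (m - 1)
    # Only B changes: bit (m-1-i) is set for every character of class i.
    B = {}
    for i, cls in enumerate(classes):
        for c in cls:
            B[c] = B.get(c, 0) | (1 << (m - 1 - i))
    occurrences = []
    pos = 0
    while pos <= n - m:                  # search loop identical to plain BNDM
        j, last = m, m
        D = mask
        while D != 0:
            D &= B.get(text[pos + j - 1], 0)
            j -= 1
            if D & high:
                if j > 0:
                    last = j
                else:
                    occurrences.append(pos)
            D = (D << 1) & mask
        pos += last
    return occurrences

# Position 0 accepts 'a', position 1 accepts 'a' or 'b', position 2 accepts 'b'.
print(bndm_search_classes(["a", "ab", "b"], "xaabyabb"))   # [1, 5]
```

Here the pattern matches both aab and abb, and the search loop needs no change because the classes are absorbed entirely into the masks.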
22 Performance Comparison
Times are in hundredths of a second per megabyte
23 Reference
- Gonzalo Navarro and Mathieu Raffinot. A Bit-Parallel Approach to Suffix Automata: Fast Extended String Matching. In M. Farach (editor), Proc. CPM'98, LNCS, pages 14-33.
- Gonzalo Navarro and Mathieu Raffinot. Fast and Flexible String Matching by Combining Bit-Parallelism and Suffix Automata (1998).
24 Reverse Pattern?