Backward Nondeterministic DAWG Matching Algorithm




1 Backward Nondeterministic DAWG Matching Algorithm
A Bit-parallel Approach to Suffix Automata: Fast Extended String Matching, Navarro, G. and Raffinot, M., Lecture Notes in Computer Science, Vol. 1448, 1998, pp
Advisor: Prof. R. C. T. Lee
Speaker: L. C. Chen

2 Problem Definition:
Input: A text T and a pattern P.
Output: All the locations where P occurs in T.
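To pin down the expected output, here is a minimal brute-force reference in Python (the slides give no code). It compares P against every alignment and returns 0-based starting positions. The function name naive_search and the 0-based convention are our own assumptions; this is only a correctness baseline, not part of BNDM.

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Return every 0-based position i where pattern occurs in text."""
    m = len(pattern)
    # Try every alignment of the pattern against the text.
    return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]


print(naive_search("GCATCGACAGACTATACAGTACG", "ACAG"))  # [6, 15]
```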

3 This algorithm uses Rule 1, the Suffix to Prefix Rule:
For a later window that overlaps the current one to have any chance of matching the pattern, the overlapping part, which is a suffix of the current window, must be equal to a prefix of the pattern.

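4 Find the longest suffix U of the window that is equal to some prefix of P. Then shift the pattern so that its prefix U is aligned with the suffix U of the window; for a window of length m, this is a shift of m − |U|.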

5 Example: T = GCA TCGACAGAC TATACAGTACG (the current window is TCGACAGAC), P = GACGGATCA. Because the longest suffix of the window that is equal to a prefix of P is “GAC”, we slide the window by 9 − 3 = 6:
T = GCATCGACAGACTATACAGTACG
P =          GACGGATCA
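The shift rule of slides 4 and 5 can be checked with a short sketch that tests every candidate suffix directly (quadratic work per window, unlike BNDM). The helper names longest_suffix_prefix and shift_for are hypothetical; we assume the window has the same length m as the pattern and only consider suffixes shorter than m, so the shift is always at least 1.

```python
def longest_suffix_prefix(window: str, pattern: str) -> int:
    """Length of the longest suffix of `window`, shorter than m, that is a prefix of `pattern`.

    Assumes len(window) == len(pattern) == m.
    """
    m = len(pattern)
    for k in range(m - 1, 0, -1):  # try the longest candidates first
        if window[m - k:] == pattern[:k]:
            return k
    return 0


def shift_for(window: str, pattern: str) -> int:
    """Shift that aligns the prefix U of the pattern with the suffix U of the window."""
    return len(pattern) - longest_suffix_prefix(window, pattern)


print(longest_suffix_prefix("TCGACAGAC", "GACGGATCA"))  # 3 ("GAC")
print(shift_for("TCGACAGAC", "GACGGATCA"))              # 6
```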

6 We now give an example showing how this algorithm finds the longest suffix of the window that is equal to a prefix of P.

7 Text : ABDDCCDBADEGGGGJJ
Example: Pattern: BADADCEAD. We want to find the longest suffix of the window “BDDCCDBAD” that is also a prefix of the pattern.

8 First, we read “D”, the last character of the window.

9 We find all the occurrences of the substring “D” in the pattern.

10 We read the next character, “A”. For each occurrence of “D”, we check whether the character immediately to its left in the pattern is “A”.

11 Thus, we find all the occurrences of the substring “AD” in the pattern.

12 We read the next character, “B”. For each occurrence of “AD”, we check whether the character immediately to its left in the pattern is “B”.

13 We find that the substring “BAD” occurs in the pattern. Note that “BAD” is also a prefix of P.

14 We read the next character, “D”. In the pattern, nothing lies to the left of “BAD” (it is a prefix), so “DBAD” is not a substring of P and the scan stops. We report that “BAD” is the longest suffix of “BDDCCDBAD” that is equal to a prefix of P, so the window can be shifted by 9 − 3 = 6.
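The backward scan illustrated in slides 7 to 14 can be simulated by keeping an explicit set of the positions in P where the suffix read so far occurs. This is a hedged, non-bit-parallel sketch: the function name backward_scan_sets, its (u, s) return pair, and 0-based start positions are our own choices.

```python
def backward_scan_sets(window: str, pattern: str) -> tuple[int, int]:
    """Simulate the backward scan of one window using explicit sets of positions.

    Returns (u, s):
      u = length of the longest suffix of the window, shorter than the pattern,
          that is a prefix of the pattern (0 if there is none);
      s = length of the longest suffix of the window that is a substring of the pattern.
    """
    m = len(pattern)
    # 0-based start positions in `pattern` where the suffix read so far occurs.
    # The empty suffix occurs at every position 0..m.
    starts = set(range(m + 1))
    u = s = 0
    for k in range(1, len(window) + 1):
        c = window[-k]  # read the window from right to left
        # Keep only occurrences whose left neighbour in the pattern is c,
        # and move their start one position to the left.
        starts = {i - 1 for i in starts if i > 0 and pattern[i - 1] == c}
        if not starts:
            break  # the suffix is no longer a substring of the pattern
        s = k
        if 0 in starts and k < m:
            u = k  # the suffix starts at position 0, so it is a prefix of P
    return u, s


print(backward_scan_sets("BDDCCDBAD", "BADADCEAD"))  # (3, 3): "BAD"
```

On the second example (slides 15 to 22), backward_scan_sets("BDDCCDDAD", "ACDADCEAD") returns (0, 3): “DAD” is a substring of P, but no suffix of the window is a prefix of P.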

15 Text : ABDDCCDDADEGGGGJJ
Another example: Pattern: ACDADCEAD. We scan the suffixes of the window “BDDCCDDAD” that are substrings of the pattern, looking for the longest one that is also a prefix of the pattern.

16 First, we read “D” and find all the occurrences of “D” in the pattern.

17 We read the next character, “A”, and look for all the occurrences of “AD” in the pattern. The “D” at position 3 of the pattern is preceded by “C” (a mismatch), so it is discarded.

18 We have found all the occurrences of “AD” in the pattern (positions 4-5 and 8-9).

19 We read the next character, “D”, and look for all the occurrences of “DAD” in the pattern. The “AD” at positions 8-9 is preceded by “E” (a mismatch), so it is discarded.

20 We find that “DAD” occurs in the pattern at positions 3-5.

21 We read the next character, “D”, and look for all the occurrences of “DDAD” in the pattern. The “DAD” at positions 3-5 is preceded by “C” (a mismatch).

22 There is no substring “DDAD” in the pattern, so the scan stops. None of the suffixes “D”, “AD”, and “DAD” is a prefix of P, so no suffix of “BDDCCDDAD” is equal to a prefix of P, and the window can be shifted by its full length, 9.

23 The idea explained above is the main idea of this algorithm. Next, we use the bit-parallel method to implement it.

24 We use bits to store the positions of each character in P.
Example: P = CABBCAD. For each character, the i-th bit from the left is 1 exactly when the i-th character of P is that character:
A: 0100010
B: 0011000
C: 1000100
D: 0000001
*: 0000000 (characters that do not exist in P)
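A sketch of the mask construction from slide 24, assuming the convention that the leftmost bit of an m-bit mask corresponds to the first character of P (so bit m−1−i stands for P[i] with 0-based i). The function name build_masks is hypothetical; characters missing from P fall back to 0 via dict.get.

```python
def build_masks(pattern: str) -> dict[str, int]:
    """Bit mask per character: bit (m-1-i) is set iff pattern[i] == c.

    Written as an m-bit string, most significant bit first, each mask lines up
    with the pattern. Characters absent from the pattern are not stored;
    look them up with masks.get(c, 0).
    """
    m = len(pattern)
    masks: dict[str, int] = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << (m - 1 - i))
    return masks


masks = build_masks("CABBCAD")
for c in "ABCD":
    print(c, format(masks.get(c, 0), "07b"))
# A 0100010
# B 0011000
# C 1000100
# D 0000001
```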

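25 Here, we explain how to use bit parallelism to find the substrings of the pattern that are equal to a suffix of the window.
Text: ABCABCABA, Σ = {A, B, C, D}, Pattern: CABBCAD, window: ABCABCA
A: 0100010, B: 0011000, C: 1000100, D: 0000001, other: 0000000
We use a mask D to record where the suffix read so far occurs in the pattern: after each AND, the i-th bit of D from the left is 1 exactly when that suffix occurs in P starting at position i. Initially, D = 1111111.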

26 We read the last character of the window, “A”. (“<<1” denotes a left shift by one bit; the bit shifted out on the left is discarded.)
D = 1111111 AND 0100010 (mask of A) = 0100010
D <<1 = 1000100

27 We read the next character, “C”.
D = 1000100 AND 1000100 (mask of C) = 1000100
The leftmost bit of D is 1, so “CA” is a suffix of the window that is equal to a prefix of the pattern.
D <<1 = 0001000

28 We read the next character, “B”.
D = 0001000 AND 0011000 (mask of B) = 0001000
We know “BCA” is a substring of the pattern (it starts at position 4).
D <<1 = 0010000

29 We read the next character, “A”.
D = 0010000 AND 0100010 (mask of A) = 0000000
Since D = 0, there is no substring “ABCA” in the pattern, and the scan stops.

30 Text: ABCABCABA, Σ = {A, B, C, D}, Pattern: CABBCAD
“CA”, a suffix of “BCA”, is the longest suffix of the window that is equal to a prefix of the pattern, so the window can be shifted by 7 − 2 = 5.
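The bit-parallel scan of slides 25 to 30 fits in one loop: AND the mask of the character read, record a prefix when the leftmost bit is 1, stop when D becomes 0, and shift left by one bit. This is a minimal Python sketch under the leftmost-bit-is-first-character convention; the name bndm_window, the trace flag, and the (u, s) return pair are our own illustrative choices, and Python's unbounded integers are masked back to m bits to imitate a machine word.

```python
def bndm_window(window: str, pattern: str, trace: bool = False) -> tuple[int, int]:
    """Bit-parallel backward scan of one window (len(window) == len(pattern) == m).

    Returns (u, s): u = longest suffix of the window, shorter than m, that is a
    prefix of the pattern; s = longest suffix that is a substring of the pattern.
    """
    m = len(pattern)
    full = (1 << m) - 1       # keeps D at m bits: bits shifted past the left end are dropped
    top = 1 << (m - 1)        # leftmost bit: "the suffix starts at position 1 of P"
    masks: dict[str, int] = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << (m - 1 - i))
    d = full                  # D = 11...1 before anything is read
    u = s = 0
    for k in range(1, m + 1):
        c = window[-k]        # read the window from right to left
        d &= masks.get(c, 0)  # AND with the mask of the character read
        if trace:
            print(c, format(d, f"0{m}b"))
        if d == 0:
            break             # the suffix read so far is not a substring of P
        s = k
        if d & top and k < m:
            u = k             # leftmost bit set: the suffix is a prefix of P
        d = (d << 1) & full   # "<<1": left shift one bit
    return u, s


print(bndm_window("ABCABCA", "CABBCAD", trace=True))
```

The trace prints D after each AND (0100010, 1000100, 0001000, 0000000) and the call returns (2, 3): “CA” is the longest prefix found and “BCA” the longest substring. For the example of slides 31 to 38, bndm_window("ABCABCC", "ACBCCBD") returns (0, 3).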

31 We take another example:
Text: ABCABCCBA, Σ = {A, B, C, D}, Pattern: ACBCCBD

32 First, we build the masks for Pattern: ACBCCBD
A: 1000000, B: 0010010, C: 0101100, D: 0000001, others: 0000000

33 The window is ABCABCC, and we initialize D = 1111111.


35 We read “C”.
D = 1111111 AND 0101100 (mask of C) = 0101100
Wherever there is a “1”, the substring “C” starts at that position of the pattern.
D <<1 = 1011000

36 We read the next character, “C”.
D = 1011000 AND 0101100 (mask of C) = 0001000
Wherever there is a “1”, the substring “CC” starts at that position of the pattern.
D <<1 = 0010000

37 We read the next character, “B”.
D = 0010000 AND 0010010 (mask of B) = 0010000
Wherever there is a “1”, the substring “BCC” starts at that position of the pattern.
D <<1 = 0100000

38 We read the next character, “A”.
D = 0100000 AND 1000000 (mask of A) = 0000000
There is no substring “ABCC” in the pattern. Since the leftmost bit of D was never 1 during the scan, no suffix of the window is equal to a prefix of the pattern, and the window can be shifted by its full length, 7.
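Putting the pieces together, the whole search repeats the window scan and shifts by m minus the length of the longest proper prefix found, reporting an occurrence when all m characters have been read with the leftmost bit still set. The sketch is modeled on the BNDM search loop as we understand it from Navarro and Raffinot, but the function name bndm_search, 0-based positions, and the edge-case handling (an empty pattern, or one longer than the text, returns no matches) are our own assumptions.

```python
def bndm_search(text: str, pattern: str) -> list[int]:
    """Return the 0-based positions of all occurrences of pattern in text (BNDM sketch)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    full = (1 << m) - 1      # D is kept to m bits
    top = 1 << (m - 1)       # leftmost bit = "suffix read is a prefix of P"
    masks: dict[str, int] = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << (m - 1 - i))
    occurrences: list[int] = []
    pos = 0                  # left end of the current window
    while pos <= n - m:
        j = m                # window characters not yet read
        last = m             # shift: m minus length of longest proper prefix found
        d = full
        while d:
            d &= masks.get(text[pos + j - 1], 0)  # read the window right to left
            j -= 1
            if d & top:
                if j > 0:
                    last = j                 # text[pos+j : pos+m] is a prefix of P
                else:
                    occurrences.append(pos)  # all m characters matched
            d = (d << 1) & full
        pos += last
    return occurrences


print(bndm_search("GCATCGACAGACTATACAGTACG", "ACAG"))  # [6, 15]
```

The inner loop always ends with j ≥ 0: after m characters have been read, only the leftmost bit of D can still be 1, and the final shift clears it.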

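39 Time Complexity: If the length of the text is n and the length of the pattern is m, the time complexity of this algorithm is O(mn) in the worst case, assuming m is at most the computer word length so that each operation on D takes constant time.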
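As a worked check of the O(mn) bound, consider the input T = A^n and P = A^m (our own choice of example). Every window is an occurrence, and the longest proper prefix recognized in each window has length m − 1, so each window costs m character reads while the shift is only m − (m − 1) = 1:

```latex
\text{windows} = n - m + 1, \qquad
\text{reads per window} = m, \qquad
\text{total reads} = m\,(n - m + 1)
```

With n = 2m this is m(m + 1), which is within a constant factor of mn = 2m².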

40 References
[BG92] Baeza-Yates, R. and Gonnet, G. H., A new approach to text searching, Communications of the ACM, Vol. 35, 1992, pp
[BEH89] Blumer, A., Ehrenfeucht, A. and Haussler, D., Average sizes of suffix trees and DAWGs, Discrete Applied Mathematics, Vol. 24, 1989, pp
[BM77] Boyer, R. S. and Moore, J. S., A fast string searching algorithm, Communications of the ACM, Vol. 20, 1977, pp
[GM98] Navarro, G. and Raffinot, M., A Bit-Parallel Approach to Suffix Automata: Fast Extended String Matching, In Proceedings of the 9th Annual Symposium on Combinatorial Pattern Matching, Lecture Notes in Computer Science 1448, Springer-Verlag, Berlin, 1998, pp

