1 Author: Ricardo A. Baeza-Yates, Gaston H. Gonnet
Publisher: Communications of the ACM, 1992
Presenter: Yuen-Shuo Li
Date: 2013/08/14


2 String searching is a very important component of many problems, including text editing, bibliographic retrieval, and symbol manipulation.

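3 Pattern ababc, written right to left as c b a b a so that the leftmost bit of each table entry belongs to the last pattern character. A bit is 0 where the pattern has that character and 1 elsewhere:
T[a] = 11010
T[b] = 10101
T[c] = 01111
T[d] = 11111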
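The table can be built mechanically. This Python sketch uses a helper name, `build_table`, that is our own and not the paper's. Each entry is stored as an integer whose bit i is 0 exactly when 0-based pattern position i holds that character. Printing the entries as 5-bit binary strings reproduces the rows above.

```python
# Sketch only: build_table is an illustrative name, not from the paper.
def build_table(pattern: str, alphabet: str) -> dict:
    """Return T where bit i of T[x] is 0 iff pattern[i] == x (i is 0-based).

    Bit 0 (the rightmost bit) belongs to the first pattern character, so the
    leftmost printed bit belongs to the last one, as on the slide.
    """
    m = len(pattern)
    full = (1 << m) - 1  # all ones: x matches no pattern position
    table = {}
    for x in alphabet:
        mask = full
        for i, ch in enumerate(pattern):
            if ch == x:
                mask &= ~(1 << i)  # clear the bit of a matching position
        table[x] = mask
    return table


T = build_table("ababc", "abcd")
for x in "abcd":
    print(f"T[{x}] = {T[x]:05b}")
```

Storing the first pattern character in bit 0 is what makes a plain left shift move every partial match one position further along the pattern.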

4-11 Text: cbbabababcaba… (table T as above). Each digit of the state is the number of mismatches between a prefix of the pattern and the last characters read. The rightmost digit is for the first pattern character, and the leftmost digit is for the whole pattern.
State after reading cbbabab: 50301
Shift one digit to the left (the new rightmost state starts at 0): 03010. Read a and add T[a] = 11010: 14020
Shift: 40200. Read b and add T[b] = 10101: 50301
Shift: 03010. Read c and add T[c] = 01111: 04121
The leftmost digit 0 means that ababc matches the last five characters read with no mismatches.
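To see what these digits mean, the states can be recomputed directly from the definition, with no shifting. In this Python sketch, `mismatch_state` is a hypothetical helper. Digit i, counted from the right, is the number of mismatches between the first i pattern characters and the last i text characters read. The sketch assumes that at least m characters have been read and that m ≤ 9, so each count fits in one digit.

```python
# Sketch only: mismatch_state is an illustrative helper, not from the paper.
def mismatch_state(pattern: str, text: str, j: int) -> str:
    """State after reading text[:j], computed by definition (leftmost digit first).

    Digit i (from the right, starting at 1) counts mismatches between
    pattern[:i] and text[j-i:j]. Assumes j >= len(pattern) and len(pattern) <= 9.
    """
    m = len(pattern)
    if j < m:
        raise ValueError("this sketch needs j >= len(pattern)")
    digits = []
    for i in range(m, 0, -1):  # whole pattern first, single character last
        window = text[j - i:j]
        digits.append(sum(p != t for p, t in zip(pattern[:i], window)))
    return "".join(str(d) for d in digits)


text = "cbbabababcaba"
for j in range(7, 11):
    print(j, text[j - 1], mismatch_state("ababc", text, j))
```

For j = 7 to 10 this prints the four states traced on these slides: 50301, 14020, 50301 and 04121.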

12 To update the state after reading a new character of the text, we first shift the state vector b bits to the left, which reflects that we have advanced one position in the text. We then update the individual states according to the new character.
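With b bits per individual state, the update takes two integer operations. First comes a left shift by b bits, masked to m·b bits. Then comes the addition of a table entry whose field i is 1 when pattern position i differs from the character read. In this Python sketch we take b = 3 so that each field prints as one octal digit. The names `build_count_table` and `step` are our own. Starting from state 50301 and reading a, b and c gives 14020, 50301 and 04121.

```python
# Sketch only: names are illustrative; B = 3 is chosen so fields print as octal digits.
B = 3  # bits per individual state


def build_count_table(pattern: str, alphabet: str, b: int = B) -> dict:
    """Field i of T[x] (b bits wide, counted from the right) is 1 iff pattern[i] != x."""
    table = {}
    for x in alphabet:
        word = 0
        for i, ch in enumerate(pattern):
            if ch != x:
                word |= 1 << (i * b)
        table[x] = word
    return table


def step(state: int, x: str, table: dict, m: int, b: int = B) -> int:
    """Shift every individual state one field to the left, then add T[x].

    No carry crosses a field boundary here: a count never exceeds m,
    and m = 5 < 2**3 in this example.
    """
    full = (1 << (m * b)) - 1  # keep only m fields
    return ((state << b) & full) + table[x]


T = build_count_table("ababc", "abcd")
state = 0o50301  # state after reading "cbbabab"
for x in "abc":
    state = step(state, x, T, m=5)
    print(x, f"{state:05o}")
```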

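13 The value of each individual state is the number of mismatches between the corresponding pattern prefix and the last characters of the text read so far.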

14 For exact matching we only need to know whether that number is 0 or not, so one bit per state is enough: b = 1.
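When b = 1, adding a mismatch indicator must saturate at 1, and a bitwise OR does exactly that. This is why the one-bit variant is usually called shift-or. As a sanity check, this Python sketch runs the one-bit version on the text of slides 4-11. It confirms that each bit is 0 exactly when the corresponding mismatch count is 0, at every position where the whole pattern fits in the text read so far. The helper names are our own.

```python
# Sketch only: helper names are illustrative, not from the paper.
def shift_or_states(pattern: str, text: str) -> list:
    """One-bit states: state = ((state << 1) | T[x]) restricted to m bits."""
    m = len(pattern)
    full = (1 << m) - 1
    table = {}
    for i, ch in enumerate(pattern):
        table[ch] = table.get(ch, full) & ~(1 << i)  # 0 bit: position i holds ch
    state, states = full, []  # start with all ones: nothing matched yet
    for x in text:
        state = ((state << 1) | table.get(x, full)) & full
        states.append(state)
    return states


def zero_or_one(pattern: str, text: str, j: int) -> int:
    """Bit i-1 is 1 iff pattern[:i] has at least one mismatch with text[j-i:j]."""
    bits = 0
    for i in range(1, len(pattern) + 1):
        if any(p != t for p, t in zip(pattern[:i], text[j - i:j])):
            bits |= 1 << (i - 1)
    return bits


pattern, text = "ababc", "cbbabababcaba"
states = shift_or_states(pattern, text)
for j in range(len(pattern), len(text) + 1):
    assert states[j - 1] == zero_or_one(pattern, text, j)
print("one-bit states agree with the zero/non-zero counts")
```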

15 Let {a, b, c, d} be the alphabet, and ababc the pattern.
T[a] = 11010
T[b] = 10101
T[c] = 01111
T[d] = 11111

16-28 Text: abdabababc (table T as above). The initial state is 11111. For each character read, the state is shifted one bit to the left and ORed with T of that character:
Read    State
start   11111
a       11110
b       11101
d       11111
a       11110
b       11101
a       11010
b       10101
a       11010
b       10101
c       01111
The match at the end of the text is indicated by the value 0 in the leftmost bit of the state.
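The whole exact search fits in a few lines. This Python sketch repeats the steps above on abdabababc and records each state as a 5-bit string, so the trace can be compared with the slides. It reports 1-based end positions of matches. The function name `shift_or_search` is our own.

```python
# Sketch only: shift_or_search is an illustrative name, not from the paper.
def shift_or_search(pattern: str, text: str) -> tuple:
    """Return (1-based end positions of matches, state after each character)."""
    m = len(pattern)
    full = (1 << m) - 1
    table = {}
    for i, ch in enumerate(pattern):
        table[ch] = table.get(ch, full) & ~(1 << i)  # 0 bit where pattern has ch
    leftmost = 1 << (m - 1)  # this bit is 0 when the whole pattern matched
    state = full             # initial state: all ones
    matches, trace = [], []
    for pos, x in enumerate(text, start=1):
        state = ((state << 1) | table.get(x, full)) & full
        trace.append(f"{state:0{m}b}")
        if not state & leftmost:
            matches.append(pos)
    return matches, trace


matches, trace = shift_or_search("ababc", "abdabababc")
print(trace)
print(matches)  # [10]
```

Characters that do not occur in the pattern fall back to an all-ones entry, as T[d] = 11111 does on the slide.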


30 m: pattern size; w: word size (in bits)
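This slide only defines m and w. Here is one hedged reading of why they matter together. If the pattern is longer than a machine word, an m-bit shift-or state must be spread over ⌈m/w⌉ words. Every text character then costs one shift-and-OR per word, with the top bit of each word carried into the next. The Python sketch below assumes that layout. The name `multiword_shift_or` and the deliberately tiny default w = 4 are our own choices, made so that short patterns already need several words.

```python
# Sketch only: assumes the state is split into ceil(m / w) words; names are illustrative.
def multiword_shift_or(pattern: str, text: str, w: int = 4) -> list:
    """Shift-or whose m-bit state is split into ceil(m / w) words of w bits.

    Word j holds pattern positions j*w .. j*w + w - 1 (word 0 is the least
    significant). Returns 1-based end positions of exact matches.
    """
    m = len(pattern)
    n_words = -(-m // w)  # ceil(m / w) without floating point
    word_mask = (1 << w) - 1

    def table_for(x: str) -> list:
        # All ones, except a 0 bit at every position where the pattern has x.
        words = [word_mask] * n_words
        for i, ch in enumerate(pattern):
            if ch == x:
                words[i // w] &= ~(1 << (i % w))
        return words

    tables = {x: table_for(x) for x in set(text)}
    state = [word_mask] * n_words          # nothing matched yet
    top_word, top_bit = divmod(m - 1, w)   # where the "whole pattern" bit lives
    matches = []
    for pos, x in enumerate(text, start=1):
        t = tables[x]
        carry = 0  # a left shift brings a 0 into the lowest bit
        for j in range(n_words):
            out = state[j] >> (w - 1)  # top bit moves into the next word
            state[j] = (((state[j] << 1) | carry) & word_mask) | t[j]
            carry = out
        if not (state[top_word] >> top_bit) & 1:
            matches.append(pos)
    return matches


print(multiword_shift_or("abababc", "ababababcxabababc"))  # [9, 17]
```

The inner loop runs ⌈m/w⌉ times per character, so under this layout the work per text character grows with ⌈m/w⌉ rather than with m.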


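32 T[a] = 11000
T[b] = 10011
T[c] = 11101
T[d] = 01101
Read with the same convention (rightmost bit = first pattern position, 0 = character allowed), this table describes a five-character pattern whose positions accept {a}, {a, c, d}, {a, b}, {b} and {d}, respectively.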
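When a pattern position accepts a set of characters, only the table changes. The bit of that position is cleared in T[x] for every x in the set, and the search loop stays the same. This Python sketch rebuilds the slide's table from the five position sets read off it, then runs an ordinary shift-or loop. The helper names and the sample text acbbd are our own.

```python
# Sketch only: helper names and the sample text are illustrative.
def build_class_table(classes: list, alphabet: str) -> dict:
    """Bit i of T[x] is 0 iff x is in the set of characters allowed at position i."""
    m = len(classes)
    full = (1 << m) - 1
    table = {}
    for x in alphabet:
        mask = full
        for i, allowed in enumerate(classes):
            if x in allowed:
                mask &= ~(1 << i)
        table[x] = mask
    return table


def search(table: dict, m: int, text: str) -> list:
    """Plain shift-or loop; returns 1-based end positions of matches."""
    full = (1 << m) - 1
    state, matches = full, []
    for pos, x in enumerate(text, start=1):
        state = ((state << 1) | table.get(x, full)) & full
        if not state & (1 << (m - 1)):
            matches.append(pos)
    return matches


# Position sets read off the slide's table, first pattern position first.
classes = [{"a"}, {"a", "c", "d"}, {"a", "b"}, {"b"}, {"d"}]
T = build_class_table(classes, "abcd")
for x in "abcd":
    print(f"T[{x}] = {T[x]:05b}")
print(search(T, len(classes), "acbbd"))  # [5]
```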


34 We allow up to k characters of the pattern to mismatch the corresponding characters of the text. For example, with k = 2 the pattern mismatch gives:
mismatch (0 mismatches: match)
dispatch (2 mismatches: match)
respatch (3 mismatches: no match)
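The criterion is a Hamming-distance test on each text window of length m. This brute-force Python sketch only checks the slide's three examples, and the helper name `count_mismatches` is our own. The bit-parallel method on the following slides answers the same question for every window in a single pass.

```python
# Sketch only: count_mismatches is an illustrative name.
def count_mismatches(pattern: str, window: str) -> int:
    """Number of positions where pattern and an equal-length text window differ."""
    if len(pattern) != len(window):
        raise ValueError("pattern and window must have the same length")
    return sum(p != t for p, t in zip(pattern, window))


k = 2
for word in ("mismatch", "dispatch", "respatch"):
    d = count_mismatches("mismatch", word)
    print(word, d, "match" if d <= k else "no match")
```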


36 At each step we record the overflow bits (the highest bit of each b-bit individual state) in an overflow state, and then reset the overflow bits of all individual states.
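The following Python sketch shows this bookkeeping. It assumes that each individual state is a b-bit field and that its overflow bit is the field's top bit. The formula b = ⌈log2(k + 1)⌉ + 1 is also our assumption, although it does give the b = 3 used for k = 2 on the next slide. With b = 3 each field is one octal digit, so the overflow mask for m = 5 prints as 44444. The sample state 40312 is made up for illustration.

```python
# Sketch only: names and the formula for b are assumptions, not from the paper.
def field_width_and_mask(m: int, k: int) -> tuple:
    """Return (b, overflow mask) for m individual states allowing k mismatches."""
    b = k.bit_length() + 1            # equals ceil(log2(k + 1)) + 1
    high = 1 << (b - 1)               # the top (overflow) bit of one field
    overflow_mask = sum(high << (i * b) for i in range(m))
    return b, overflow_mask


def record_and_reset(state: int, overflow: int, overflow_mask: int) -> tuple:
    """Copy the state's overflow bits into the overflow word, then clear them."""
    overflow |= state & overflow_mask
    state &= ~overflow_mask
    return state, overflow


b, mask = field_width_and_mask(5, 2)
print(b, f"{mask:05o}")                    # 3 44444
state, overflow = record_and_reset(0o40312, 0, mask)
print(f"{state:05o} {overflow:05o}")       # 00312 40000
```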

37 We want to search for all occurrences of ababc with at most 2 mismatches. Because b is 3 for 2 mismatches, every position in the state is represented by a number in the range 0 to 4.
Initial state: 00000
Initial overflow: 44444
We report a match when the sum of the leftmost digits of the state and the overflow is less than 3, that is, at most 2.
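The next Python sketch assembles slides 36 and 37 into a k-mismatch search. The name `shift_add_search` and the choice of b are our assumptions. Each step shifts both words by b bits, adds the mismatch table, moves newly set overflow bits into the overflow word, and compares the sum against k + 1 placed in the leftmost field. In the b = 3 case, every state field is at most 3 after the reset and every overflow field is 0 or 4, so the per-field sums never carry. One integer comparison therefore tests only the leftmost digits.

```python
# Sketch only: the function name and the formula for b are assumptions.
def shift_add_search(pattern: str, text: str, k: int) -> list:
    """1-based end positions where pattern matches text with at most k mismatches."""
    m = len(pattern)
    b = k.bit_length() + 1                      # assumed: ceil(log2(k + 1)) + 1
    full = (1 << (m * b)) - 1                   # m fields of b bits
    high = sum((1 << (b - 1)) << (i * b) for i in range(m))  # overflow bits
    # Field i of T[x] is 1 where pattern[i] != x (a mismatch adds 1).
    table = {x: sum(1 << (i * b) for i, ch in enumerate(pattern) if ch != x)
             for x in set(text)}
    limit = (k + 1) << ((m - 1) * b)            # k + 1 in the leftmost field
    # All-zero state; every overflow bit set (00000 and 44444 when m = 5, b = 3).
    state, overflow = 0, high
    matches = []
    for pos, x in enumerate(text, start=1):
        state = ((state << b) & full) + table[x]
        overflow = ((overflow << b) & full) | (state & high)
        state &= ~high                          # reset overflow bits in the state
        if state + overflow < limit:
            matches.append(pos)
    return matches


print(shift_add_search("ababc", "cbbabababcaba", 2))  # [6, 8, 10]
```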




41 Experimental results for searching 100 times for all matches of a pattern in a 50,000-character English text (a legal document)

42 BMH: Boyer-Moore, as suggested by Horspool

43 Execution time for searching 1,000 words chosen at random from the same English text
