Published by Aileen Greer. Modified over 9 years ago.
1
Author: Ricardo A. Baeza-Yates, Gaston H. Gonnet
Publisher: 1992 Communications of the ACM
Presenter: Yuen-Shuo Li
Date: 2013/08/14
2
String searching is a very important component of many problems, including text editing, bibliographic retrieval, and symbol manipulation.
3
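T[a] = 11010
T[b] = 10101
T[c] = 01111
T[d] = 11111
cbaba (the pattern ababc written right to left) lines up with T[a] = 11010: a 0 marks each position where the pattern character is a.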
4
Text: cbbabababcaba…
T[a] = 11010 T[b] = 10101 T[c] = 01111 T[d] = 11111
State: 50301
5
State: 0301 (shifted left, leading digit dropped)
6
State: 14020
7
State: 14020
8
State: 14020
9
State: 4020 (shifted left, leading digit dropped)
10
State: 50301
11
State: 04121
12
To update the state after reading a new character of the text, we must (1) shift the state vector b bits to the left, to reflect that we have advanced one position in the text, and (2) update the individual states according to the new character.
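The update step can be sketched in Python as follows. This is only an illustration under stated assumptions: b = 3 bits per individual state is chosen here so that counts up to 5 fit in one field, the new character is taken into account by adding its T entry, and the helper names `pack`, `unpack` and `update` are hypothetical. Starting from the state 50301 and reading a, b, c reproduces the slide states 14020, 50301 and 04121.

```python
def pack(digits: str, b: int) -> int:
    """Pack a string of decimal digits into b-bit fields (leftmost digit is highest)."""
    value = 0
    for d in digits:
        value = (value << b) | int(d)
    return value


def unpack(value: int, m: int, b: int) -> str:
    """Inverse of pack: read m fields of b bits each, highest field first."""
    field = (1 << b) - 1
    return "".join(str((value >> (i * b)) & field) for i in reversed(range(m)))


def update(state: int, t_entry: int, m: int, b: int) -> int:
    """Shift the state b bits to the left, then add the entry for the new character."""
    mask = (1 << (m * b)) - 1  # keep only the m individual states
    return ((state << b) + t_entry) & mask


M, B = 5, 3  # pattern length 5; B = 3 is an assumption (any b with 2**b > 5 works)
T = {
    ch: pack(bits, B)
    for ch, bits in {"a": "11010", "b": "10101", "c": "01111", "d": "11111"}.items()
}

state = pack("50301", B)
trace = []
for ch in "abc":
    state = update(state, T[ch], M, B)
    trace.append(unpack(state, M, B))

print(trace)  # ['14020', '50301', '04121']
```

The leftmost digit of the last state is 0, meaning the five most recent text characters match the pattern with no mismatches.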
13
The number of mismatches
14
For exact matching, each individual state is 0 or 1, so b = 1.
15
Let {a, b, c, d} be the alphabet, and ababc the pattern. T[a] = 11010 T[b] = 10101 T[c] = 01111 T[d] = 11111
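A minimal sketch of how such a table could be built, assuming bit i (counted from the right, starting at 0) is 0 when the (i+1)-th pattern character equals the symbol and 1 otherwise. The function name `build_table` is hypothetical.

```python
def build_table(pattern: str, alphabet: str) -> dict:
    """Return T[ch] as an integer: bit i is 1 where pattern[i] differs from ch."""
    table = {}
    for ch in alphabet:
        entry = 0
        for i, p in enumerate(pattern):
            if p != ch:
                entry |= 1 << i  # mismatch at pattern position i
        table[ch] = entry
    return table


table = build_table("ababc", "abcd")
rows = {ch: format(value, "05b") for ch, value in table.items()}
print(rows)  # {'a': '11010', 'b': '10101', 'c': '01111', 'd': '11111'}
```

Printed as 5-bit strings, the leftmost bit corresponds to the last pattern character, which is why the entries read like the pattern reversed.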
16
The initial state is 11111.
Text: abdabababc
State: 11111
17
State: 11111
18
State: 1111 (shifted left, leading bit dropped)
19
State: 11110
20
State: 11101
21
State: 11111
22
State: 11110
23
State: 11101
24
State: 11010
25
State: 10101
26
State: 11010
27
State: 10101
28
State: 01111
The match at the end of the text is indicated by the value 0 in the leftmost bit of the state.
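The walkthrough above can be reproduced with a short Python sketch. It assumes the per-character update is a one-bit left shift followed by a bitwise OR with the table entry for the character read, truncated to m bits; the function name `shift_or_trace` and the convention of reporting 0-based start positions are choices made for this illustration.

```python
def shift_or_trace(pattern: str, text: str):
    """Return the state after each text character and the match start positions."""
    m = len(pattern)
    mask = (1 << m) - 1
    # T[ch]: bit i is 0 where pattern[i] == ch, 1 otherwise
    table = {
        ch: sum(1 << i for i, p in enumerate(pattern) if p != ch)
        for ch in set(pattern) | set(text)
    }
    state = mask  # initial state: all ones
    states, matches = [], []
    for pos, ch in enumerate(text):
        state = ((state << 1) | table[ch]) & mask
        states.append(format(state, f"0{m}b"))
        if (state & (1 << (m - 1))) == 0:  # leftmost bit is 0: full match
            matches.append(pos - m + 1)
    return states, matches


states, matches = shift_or_trace("ababc", "abdabababc")
print(states)
# ['11110', '11101', '11111', '11110', '11101',
#  '11010', '10101', '11010', '10101', '01111']
print(matches)  # [5]
```

The ten states agree with the slides after the initial frames, and the single match starts at index 5, that is, at the last five characters of the text.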
30
m: pattern size; w: word size
32
T[a] = 11000 T[b] = 10011 T[c] = 11101 T[d] = 01101
34
We allow up to k characters of the pattern to mismatch with the corresponding text. For example, with k = 2 and the pattern mismatch: mismatch is a match (0 mismatches), dispatch is a match (2 mismatches), and respatch is not a match (3 mismatches).
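The example can be checked with a few lines of Python that count differing positions between two strings of equal length. The function name `mismatches` is hypothetical and this direct count is only a reference check, not the bit-parallel method of the slides.

```python
def mismatches(pattern: str, candidate: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(pattern) != len(candidate):
        raise ValueError("strings must have the same length")
    return sum(p != c for p, c in zip(pattern, candidate))


k = 2
counts = {word: mismatches("mismatch", word) for word in ("mismatch", "dispatch", "respatch")}
accepted = [word for word, n in counts.items() if n <= k]
print(counts)    # {'mismatch': 0, 'dispatch': 2, 'respatch': 3}
print(accepted)  # ['mismatch', 'dispatch']
```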
36
At each step we record the overflow bits in an overflow state, and we reset the overflow bits of all individual states.
37
We want to search for all occurrences of ababc with at most 2 mismatches. Because the value of b is 3 for 2 mismatches, every position in the state is represented by a number in the range 0-4.
Initial state: 00000
Initial overflow: 44444
We report a match when the sum of the leftmost digits of the state and the overflow is less than 3.
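A possible Python sketch of this search with an overflow state follows. Several details are assumptions made for illustration: b is computed as ceil(log2(k + 1)) + 1 (which gives 3 for k = 2), the highest bit of each b-bit field serves as the overflow bit, the text abdabababc from the exact-matching example is reused, and the function name `search_with_mismatches` is hypothetical. Matches are reported as 0-based end positions.

```python
from math import ceil, log2


def search_with_mismatches(pattern: str, text: str, k: int) -> list:
    """Return end positions where pattern occurs in text with at most k mismatches."""
    m = len(pattern)
    b = ceil(log2(k + 1)) + 1                 # bits per individual state (assumption)
    full = (1 << (m * b)) - 1                 # keeps m fields of b bits
    ov_mask = sum(1 << (i * b + b - 1) for i in range(m))  # overflow bit of each field
    # T[ch]: field i holds 1 where pattern[i] differs from ch, else 0
    table = {
        ch: sum(1 << (i * b) for i, p in enumerate(pattern) if p != ch)
        for ch in set(pattern) | set(text)
    }
    limit = (k + 1) << ((m - 1) * b)          # leftmost field must stay below k + 1
    state, overflow = 0, ov_mask              # 00000 and 44444 for m = 5, b = 3
    ends = []
    for pos, ch in enumerate(text):
        state = ((state << b) + table[ch]) & full
        overflow = ((overflow << b) | (state & ov_mask)) & full  # record overflow bits
        state &= ~ov_mask                     # reset overflow bits in the state
        if (state | overflow) < limit:
            ends.append(pos)
    return ends


ends = search_with_mismatches("ababc", "abdabababc", 2)
print(ends)  # [7, 9]
```

Position 7 ends the window ababa (1 mismatch) and position 9 ends the window ababc (0 mismatches); every other window of length 5 in this text has at least 3 mismatches.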
41
Experimental results for searching 100 times for all possible matches of a pattern in a 50,000-character English text (a legal document).
42
BMH: Boyer-Moore, as suggested by Horspool
43
Execution time when searching for 1,000 words chosen at random from the same English text.