1
KMP algorithm
2
KMP algorithm
public class KMP {
    private final int R;        // alphabet size (radix)
    private final String pat;   // the pattern
    private final int[][] dfa;  // dfa[c][j] = next state from state j on character c

    public KMP(String pat) {
        this.R = 256;
        this.pat = pat;
        // build DFA from pattern
        int m = pat.length();
        dfa = new int[R][m];
        dfa[pat.charAt(0)][0] = 1;
        for (int x = 0, j = 1; j < m; j++) {
            for (int c = 0; c < R; c++)
                dfa[c][j] = dfa[c][x];      // Copy mismatch cases.
            dfa[pat.charAt(j)][j] = j+1;    // Set match case.
            x = dfa[pat.charAt(j)][x];      // Update restart state.
        }
    }

    public int search(String txt) {
        // simulate operation of DFA on text
        int m = pat.length();
        int n = txt.length();
        int i, j;
        for (i = 0, j = 0; i < n && j < m; i++) {
            j = dfa[txt.charAt(i)][j];
        }
        if (j == m) return i - m;    // found
        return n;                    // not found
    }
}
3
General idea
Avoid backing up in the text string on a mismatch.
For example, given a text and a pattern: when we find a mismatch, how could we move forward in the text?
Is there a cleverer way than brute force?
How do we analyze the pattern?
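For comparison, here is a minimal sketch of the brute-force search the slide alludes to. The class name BruteForceSearch and its method are hypothetical, introduced only for illustration; the "return the text length when not found" convention is borrowed from the search method on slide 2.

```java
// Hypothetical helper for illustration; not part of the original slides.
final class BruteForceSearch {
    // Returns the index of the first occurrence of pat in txt,
    // or txt.length() if pat does not occur.
    static int search(String pat, String txt) {
        int m = pat.length();
        int n = txt.length();
        for (int i = 0; i + m <= n; i++) {
            int j = 0;
            // Compare the pattern against the text starting at position i.
            while (j < m && txt.charAt(i + j) == pat.charAt(j)) {
                j++;
            }
            if (j == m) {
                return i; // all m characters matched
            }
            // On a mismatch the next iteration starts at i + 1, so text
            // characters already examined are read again (the "backing up").
        }
        return n; // not found
    }
}
```

With the text and pattern used later in the slides, BruteForceSearch.search("ABABAC", "ABCABABABACA") returns 5. The worst case costs on the order of m * n character comparisons, which is what the DFA approach avoids.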
4
How? Build a DFA
DFA: deterministic finite-state automaton
DFA = States + Transitions
States
For a pattern with m characters, there are (m + 1) states in the DFA, numbered 0 to m.
`At state j` means the first j characters of the pattern are matched.
The last state indicates ACCEPT (AC), i.e. all characters in the pattern are matched.
5
How? Build a DFA
DFA: deterministic finite-state automaton
DFA = States + Transitions
Transitions
At each state there are R possible transitions, where R is the number of possible characters (the alphabet size).
Formalize transitions as dfa[next_char][current_state] = next_state
6
How? Build a DFA
Explanation: dfa[next_char][current_state] = next_state
Suppose we are now at current_state. If the next character is next_char, we move to next_state.
Therefore dfa[R][m] is a 2-dimensional table that exhaustively enumerates all possible cases.
7
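How? Build a DFA
Explanation: dfa[next_char][current_state] = next_state
Pattern: ABABAC (assume R = 3 and the only characters are A, B, C)
2D array representation and directed graph representation (figures not reproduced)
States: 0, 1(A), 2(B), 3(A), 4(B), 5(A), 6(C)
Match transitions: dfa[A][0] = 1, dfa[B][1] = 2, dfa[A][2] = 3, dfa[B][3] = 4, dfa[A][4] = 5, dfa[C][5] = 6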
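To see every cell of the table, including the mismatch transitions, the following sketch builds it for ABABAC over the three-letter alphabet and prints one row per character. It reuses the construction loop from slide 2; the class name DfaTableDemo, the mapping of characters to rows via c - 'A', and the print layout are assumptions made for this illustration.

```java
// Hypothetical demo class; the alphabet is assumed to be R consecutive
// letters starting at 'A', so a character's row index is (c - 'A').
final class DfaTableDemo {
    static int[][] build(String pat, int R) {
        int m = pat.length();
        int[][] dfa = new int[R][m];
        dfa[pat.charAt(0) - 'A'][0] = 1;
        for (int x = 0, j = 1; j < m; j++) {
            for (int c = 0; c < R; c++) {
                dfa[c][j] = dfa[c][x];           // copy mismatch cases
            }
            dfa[pat.charAt(j) - 'A'][j] = j + 1; // set match case
            x = dfa[pat.charAt(j) - 'A'][x];     // update restart state
        }
        return dfa;
    }

    public static void main(String[] args) {
        int[][] dfa = build("ABABAC", 3);
        for (int c = 0; c < 3; c++) {
            StringBuilder row = new StringBuilder();
            row.append((char) ('A' + c)).append(':');
            for (int j = 0; j < dfa[c].length; j++) {
                row.append(' ').append(dfa[c][j]);
            }
            System.out.println(row);
        }
        // Prints (columns are states 0..5):
        // A: 1 1 3 1 5 1
        // B: 0 2 0 4 0 4
        // C: 0 0 0 0 0 6
    }
}
```

Each column j holds the next state for every character; the entry for the pattern character at position j is always j + 1.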
8
How to use DFA?
Example
Text: ABCABABABACA
Pattern: ABABAC
9
How to use DFA?
Start with state 0
Text: ABCABABABACA
Read txt[0] = A: go to state 1
10
How to use DFA?
Current state 1
Text: ABCABABABACA
Read txt[1] = B: go to state 2
11
How to use DFA?
Current state 2
Text: ABCABABABACA
Read txt[2] = C (mismatch): go to state 0
12
How to use DFA?
Current state 0
Text: ABCABABABACA
Read txt[3] = A: go to state 1
13
How to use DFA?
Current state 1
Text: ABCABABABACA
Read txt[4] = B: go to state 2
14
How to use DFA?
Current state 2
Text: ABCABABABACA
Read txt[5] = A: go to state 3
15
How to use DFA?
Current state 3
Text: ABCABABABACA
Read txt[6] = B: go to state 4
16
How to use DFA?
Current state 4
Text: ABCABABABACA
Read txt[7] = A: go to state 5
17
How to use DFA?
Current state 5
Text: ABCABABABACA
Read txt[8] = B (mismatch, C was expected): go to state 4
18
How to use DFA?
Current state 4
Text: ABCABABABACA
Read txt[9] = A: go to state 5
19
How to use DFA?
Current state 5
Text: ABCABABABACA
Read txt[10] = C: go to state 6
20
How to use DFA?
Current state 6 (ACCEPT)
Text: ABCABABABACA
The loop stops with j == m and i = 11, so search returns i - m = 11 - 6 = 5, the index in the text where the pattern starts.
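The walk through the text can be reproduced with a small sketch that records every state the search loop visits. The class DfaTraceDemo and its trace method are hypothetical names used only for this illustration; the construction and the loop condition follow the code on slide 2.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class that logs the states visited by the search loop.
final class DfaTraceDemo {
    static int[][] build(String pat) {
        int R = 256;
        int m = pat.length();
        int[][] dfa = new int[R][m];
        dfa[pat.charAt(0)][0] = 1;
        for (int x = 0, j = 1; j < m; j++) {
            for (int c = 0; c < R; c++) {
                dfa[c][j] = dfa[c][x];     // copy mismatch cases
            }
            dfa[pat.charAt(j)][j] = j + 1; // set match case
            x = dfa[pat.charAt(j)][x];     // update restart state
        }
        return dfa;
    }

    // Returns the visited states, beginning with the start state 0.
    static List<Integer> trace(String pat, String txt) {
        int[][] dfa = build(pat);
        int m = pat.length();
        List<Integer> states = new ArrayList<>();
        int j = 0;
        states.add(j);
        for (int i = 0; i < txt.length() && j < m; i++) {
            j = dfa[txt.charAt(i)][j]; // one table lookup per text character
            states.add(j);
        }
        return states;
    }

    public static void main(String[] args) {
        System.out.println(trace("ABABAC", "ABCABABABACA"));
        // Prints: [0, 1, 2, 0, 1, 2, 3, 4, 5, 4, 5, 6]
    }
}
```

The text index i only ever increases: each character is read exactly once, and a mismatch changes the state rather than moving back in the text.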
21
How to build DFA?
If we could match the next character: if we see the expected character, go to the next state.
Pattern: ABABAC (assume R = 3 and the only characters are A, B, C)
States: 0, 1(A), 2(B), 3(A), 4(B), 5(A), 6(C)
Match transitions: dfa[A][0] = 1, dfa[B][1] = 2, dfa[A][2] = 3, dfa[B][3] = 4, dfa[A][4] = 5, dfa[C][5] = 6
We only need dfa[R][m] since there is no transition information for the last state.
22
How to build DFA?
If we could match the next character
In the constructor this is the match case: dfa[pat.charAt(0)][0] = 1 for state 0, and dfa[pat.charAt(j)][j] = j+1 for every later state j.
(Graph figure with states 1 to 6 and edges labelled A, B, C not reproduced.)
23
How to build DFA?
If we failed to match the next character:
Copy data from column x, i.e. mimic the transitions of state x (dfa[c][j] = dfa[c][x] in the constructor).
This is similar to saying `I am now in state x` or `restart from state x`; x is the restart state.
Then update the restart state x (x = dfa[pat.charAt(j)][x] in the constructor).
24
How to build DFA? j=0
An example (ABABAC)
j (current state): 0
States: 0, 1(A), 2(B), 3(A), 4(B), 5(A), 6(C)
Match transitions so far: dfa[A][0] = 1
25
How to build DFA? j=1
An example (ABABAC)
j (current state): 1
x (restart state): 0
Process
Copy dfa[][0] to dfa[][1]
dfa[`B`][1] ← 2
x ← dfa[`B`][0] = 0
Match transitions so far: dfa[A][0] = 1, dfa[B][1] = 2
26
How to build DFA? j=1
An example (ABABAC)
j (current state): 1
x (restart state): 0
Understand restart state x
You are actually at state 1, but if the next character is `A` or `C`, just suppose you are currently at state 0. Recall the meaning of states in the DFA: state 0 means you have matched nothing.
27
How to build DFA? j=2
An example (ABABAC)
j (current state): 2
x (restart state): 0
Process
Copy dfa[][0] to dfa[][2]
dfa[`A`][2] ← 3
x ← dfa[`A`][0] = 1
Match transitions so far: dfa[A][0] = 1, dfa[B][1] = 2, dfa[A][2] = 3
28
How to build DFA? j=2
An example (ABABAC)
j (current state): 2
x (restart state): 0
Understand restart state x
At state 2 you have matched `AB`, but if the next character is `B` or `C`, you have to start from the very beginning (state 0).
29
How to build DFA? j=2
An example (ABABAC)
j (current state): 2
x (restart state): 0
Understand restart state x
x ← dfa[`A`][0] = 1, why?
At current state 2 the expected character is `A`. If we then fail to match at the next state 3, we do not need to start from the very beginning, since at least we have `A` matched.
30
How to build DFA? j=3
An example (ABABAC)
j (current state): 3
x (restart state): 1
Process
Copy dfa[][1] to dfa[][3]
dfa[`B`][3] ← 4
x ← dfa[`B`][1] = 2
Match transitions so far: dfa[A][0] = 1, dfa[B][1] = 2, dfa[A][2] = 3, dfa[B][3] = 4
31
How to build DFA? j=3
An example (ABABAC)
j (current state): 3
x (restart state): 1
Understand restart state x
We restart from state 1 if we fail to match the expected `B`. The reason is that we know we have at least an `A` already matched.
32
How to build DFA? j=3
An example (ABABAC)
j (current state): 3
x (restart state): 1
Understand restart state x
x ← dfa[`B`][1] = 2, why?
At current state 3 the expected character is `B`, and restart state 1 tells us that `A` is already matched in the pattern. Thus if we fail at the next state 4, `AB` is already matched, i.e. we can update the restart state x to 2.
33
How to build DFA? j=4
An example (ABABAC)
j (current state): 4
x (restart state): 2
Process
Copy dfa[][2] to dfa[][4]
dfa[`A`][4] ← 5
x ← dfa[`A`][2] = 3
Match transitions so far: dfa[A][0] = 1, dfa[B][1] = 2, dfa[A][2] = 3, dfa[B][3] = 4, dfa[A][4] = 5
34
How to build DFA? j=4
An example (ABABAC)
j (current state): 4
x (restart state): 2
Understand restart state x
Explanation: at state 4 you have already matched `ABAB`. If you fail to match the next `A`, you assume you have still matched `AB`, since the restart state is 2. This assumption is implemented by copying column 2 for the failed cases (`B` and `C`).
35
How to build DFA? j=5
An example (ABABAC)
j (current state): 5
x (restart state): 3
Process
Copy dfa[][3] to dfa[][5]
dfa[`C`][5] ← 6
x ← dfa[`C`][3] = 0
Match transitions so far: dfa[A][0] = 1, dfa[B][1] = 2, dfa[A][2] = 3, dfa[B][3] = 4, dfa[A][4] = 5, dfa[C][5] = 6
36
How to build DFA? j=5
An example (ABABAC)
j (current state): 5
x (restart state): 3
Understand restart state x
Explanation: we have already matched `ABABA`. If we fail to match the expected `C`, we assume we have matched `ABA`, since the restart state is 3.
37
Understand state x
Updating x while building the DFA is similar to the state transition performed when matching the pattern against the text: compare x = dfa[pat.charAt(j)][x] in the constructor with j = dfa[txt.charAt(i)][j] in search.
38
Understand state x
The transition of state x: match the pattern against itself using the partially constructed DFA table.
Build the next state of the DFA: we need to know the restart state x.
An example (ABABAC)
x ← 0
x ← dfa[`B`][0] = 0
x ← dfa[`A`][0] = 1
x ← dfa[`B`][1] = 2
x ← dfa[`A`][2] = 3
x ← dfa[`C`][3] = 0
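The sequence of x values listed above can be checked with a short sketch. It records x during construction and then, separately, feeds the pattern without its first character through the finished DFA; under the reading given in these slides the two sequences should coincide. The class RestartStateDemo and its method names are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class comparing restart states with a DFA simulation.
final class RestartStateDemo {
    // Builds the DFA as on slide 2 and appends every value of x to xLog.
    static int[][] build(String pat, List<Integer> xLog) {
        int R = 256;
        int m = pat.length();
        int[][] dfa = new int[R][m];
        dfa[pat.charAt(0)][0] = 1;
        int x = 0;
        xLog.add(x);                       // initial restart state
        for (int j = 1; j < m; j++) {
            for (int c = 0; c < R; c++) {
                dfa[c][j] = dfa[c][x];     // copy mismatch cases
            }
            dfa[pat.charAt(j)][j] = j + 1; // set match case
            x = dfa[pat.charAt(j)][x];     // update restart state
            xLog.add(x);
        }
        return dfa;
    }

    // Restart states in the order they are assigned during construction.
    static List<Integer> restartStates(String pat) {
        List<Integer> xLog = new ArrayList<>();
        build(pat, xLog);
        return xLog;
    }

    // States visited when pat.substring(1) is run through the finished DFA.
    static List<Integer> shiftedStates(String pat) {
        int[][] dfa = build(pat, new ArrayList<>());
        List<Integer> states = new ArrayList<>();
        int s = 0;
        states.add(s);
        for (int i = 1; i < pat.length(); i++) {
            s = dfa[pat.charAt(i)][s];
            states.add(s);
        }
        return states;
    }

    public static void main(String[] args) {
        System.out.println(restartStates("ABABAC")); // [0, 0, 1, 2, 3, 0]
        System.out.println(shiftedStates("ABABAC")); // [0, 0, 1, 2, 3, 0]
    }
}
```

The simulation never indexes past the last column, because after reading k characters the state is at most k, and only m - 1 characters are read.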
39
Conclusion
Updating x while building the DFA is similar to the state transition performed when matching the pattern against the text.
Building the DFA is the same process as matching the pattern against itself.
By analyzing the pattern, we know how to move forward in the text when a character fails to match.