Princeton University, COS 226, Algorithms and Data Structures, Spring
Knuth-Morris-Pratt
Reference: Chapter 19, Algorithms in C by R. Sedgewick. Addison-Wesley, 1990.
2 Knuth-Morris-Pratt
KMP algorithm.
- Use knowledge of how search pattern repeats itself.
- Build FSA from pattern.
- Run FSA on text.
Search pattern: aabaaa
[Figure: FSA for aabaaa with states 0 to 6. The match transitions a, a, b, a, a, a lead forward from state 0 to state 6, the accept state; the mismatch transitions are then added.]
4 Knuth-Morris-Pratt
KMP algorithm.
- Use knowledge of how search pattern repeats itself.
- Build FSA from pattern.
- Run FSA on text.
Search pattern: aabaaa
Search text: aaabaabaaab
[Figure: FSA for aabaaa with match and mismatch transitions, stepped through the search text one character at a time until it reaches state 6, the accept state.]
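The "run FSA on text" step can be sketched as a short trace routine. This is only an illustration under stated assumptions: the name fsa_trace is hypothetical, the search text is read from the slide as the single string aaabaabaaab, and the mismatch table for aabaaa was worked out by hand for this sketch rather than taken from the slides.

```c
#include <assert.h>
#include <string.h>

/* Mismatch table for the pattern "aabaaa" over the alphabet {a, b}.
   Assumption: these values were worked out by hand for this sketch. */
static const int NEXT_AABAAA[6] = {0, 0, 2, 0, 0, 3};

/* Runs the FSA over text t and records the state reached after each
   character in states[]. Stops as soon as the accept state M is reached.
   Returns the number of text characters consumed. */
int fsa_trace(const char *p, const int *next, const char *t, int *states)
{
    int M = (int)strlen(p);
    int N = (int)strlen(t);
    int i, j = 0;

    assert(M > 0);                    /* an empty pattern has no FSA here */
    for (i = 0; i < N && j < M; i++) {
        if (t[i] == p[j]) j++;        /* match: go forward one state */
        else j = next[j];             /* mismatch: follow next[] */
        states[i] = j;
    }
    return i;
}
```

With the pattern aabaaa and the text aaabaabaaab, the recorded states are 1, 2, 2, 3, 4, 5, 3, 4, 5, 6: the accept state is reached after ten characters, so the match starts at text index 4.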
15 FSA Representation
FSA used in KMP has special property.
- Upon character match, go forward one state.
- Only need to keep track of where to go upon character mismatch.
  - go to state next[j] if character mismatches in state j
Search pattern: aabaaa
[Figure: FSA for aabaaa with match and mismatch transitions, shown next to its next[] table; state 6 is the accept state.]
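To see how little the compact form leaves out, the next[] table can be expanded back into a full transition table. The sketch below assumes the two-letter alphabet {a, b} of the slide's example, since a single next[j] entry per state pins down only one mismatch move. The names expand_fsa and MAX_STATES are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_STATES 64   /* hypothetical bound chosen for this sketch */

/* Expands the compact next[] table into a full transition table
   delta[j][c] for the two-letter alphabet {a, b}: column 0 is the move
   on 'a', column 1 the move on 'b'. Assumes p holds only 'a' and 'b'. */
void expand_fsa(const char *p, const int *next, int delta[][2])
{
    int M = (int)strlen(p);
    int j;

    assert(M > 0 && M <= MAX_STATES);
    for (j = 0; j < M; j++) {
        int match = (p[j] == 'a') ? 0 : 1;  /* column of the matching char */
        delta[j][match] = j + 1;            /* match: go forward one state */
        delta[j][1 - match] = next[j];      /* mismatch: go to next[j] */
    }
}
```

For aabaaa with the hand-derived table next = {0, 0, 2, 0, 0, 3}, state 2 moves to 2 on a and to 3 on b, and state 5 moves to 6 on a and to 3 on b.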
16 KMP Algorithm
Two key differences from brute force.
- Text pointer i never backs up.
- Need to precompute next[] table.

    #include <string.h>

    int kmpsearch(char p[], char t[], int next[]) {
        int i, j = 0;
        int M = strlen(p);                 // pattern length
        int N = strlen(t);                 // text length

        for (i = 0; i < N; i++) {
            if (t[i] == p[j]) j++;         // char match
            else j = next[j];              // char mismatch
            if (j == M) return i - M + 1;  // found
        }
        return N;                          // not found
    }

KMP String Search
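For contrast, here is a minimal brute-force sketch in which the text pointer does back up. It is not from the slides: the name brutesearch is an assumption, and the return convention (N when the pattern is not found) is chosen to mirror kmpsearch.

```c
#include <assert.h>
#include <string.h>

/* Brute-force search, for contrast with KMP. Returns the index of the
   first occurrence of p in t, or N (the text length) if there is none. */
int brutesearch(const char *p, const char *t)
{
    int M = (int)strlen(p);
    int N = (int)strlen(t);
    int i, j;

    assert(M > 0);
    for (i = 0, j = 0; i < N && j < M; i++) {
        if (t[i] == p[j]) {
            j++;                  /* char match: advance in the pattern */
        } else {
            i -= j;               /* text pointer backs up j positions */
            j = 0;                /* and the pattern restarts */
        }
    }
    return (j == M) ? i - M : N;
}
```

The line i -= j is exactly what KMP avoids: instead of rewinding the text, KMP keeps i moving forward and lets next[] decide where in the pattern to resume.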
17 FSA Construction for KMP
FSA construction for KMP.
- FSA builds itself!
Example. Building FSA for aabaaabb.
State 6. p[0..5] = aabaaa
  - assume you know state for p[1..5] = abaaa: X = 2
  - if next char is b (match): go forward, 6 + 1 = 7
  - if next char is a (mismatch): go to state for abaaaa, X + 'a' = 2
  - update X to state for p[1..6] = abaaab: X + 'b' = 3
[Figure: FSA for aabaaabb with states 0 to 6, before the transitions out of state 6 are added.]
18 FSA Construction for KMP
FSA construction for KMP.
- FSA builds itself!
Example. Building FSA for aabaaabb.
[Figure: FSA for aabaaabb with states 0 to 7, after the a and b transitions out of state 6 are added.]
19 FSA Construction for KMP
FSA construction for KMP.
- FSA builds itself!
Example. Building FSA for aabaaabb.
State 7. p[0..6] = aabaaab
  - assume you know state for p[1..6] = abaaab: X = 3
  - if next char is b (match): go forward, 7 + 1 = 8
  - if next char is a (mismatch): go to state for abaaaba, X + 'a' = 4
  - update X to state for p[1..7] = abaaabb: X + 'b' = 0
[Figure: FSA for aabaaabb with states 0 to 7, before the transitions out of state 7 are added.]
20 FSA Construction for KMP
FSA construction for KMP.
- FSA builds itself!
Example. Building FSA for aabaaabb.
[Figure: completed FSA for aabaaabb with states 0 to 8, after the a and b transitions out of state 7 are added.]
21 FSA Construction for KMP
FSA construction for KMP.
- FSA builds itself!
Crucial insight.
- To compute transitions for state n of FSA, it suffices to have:
  - FSA for states 0 to n-1
  - state X that FSA ends up in with input p[1..n-1]
- To compute state X' that FSA ends up in with input p[1..n], it suffices to have:
  - FSA for states 0 to n-1
  - state X that FSA ends up in with input p[1..n-1]
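The insight can be written as a single construction step. The sketch below is an illustration, not the lecture's code: the name kmp_build_state is hypothetical, and it assumes a two-letter alphabet, so the one character that mismatches in state j is determined by p[j].

```c
#include <assert.h>

/* One step of FSA construction over a two-letter alphabet.
   Inputs: next[0..j-1] already filled in, and *X = the state the FSA
   reaches on input p[1..j-1]. Fills in next[j] and updates *X to the
   state reached on input p[1..j]. */
void kmp_build_state(const char *p, int next[], int j, int *X)
{
    int x = *X;

    assert(j >= 1 && x >= 0 && x < j);  /* X lags behind the state being built */
    if (p[x] == p[j]) {
        next[j] = next[x];        /* the mismatch at j also mismatches at X */
        *X = x + 1;               /* p[j] matches at X: go forward */
    } else {
        next[j] = x + 1;          /* the mismatch at j is the match at X */
        *X = next[x];             /* p[j] mismatches at X: follow next[] */
    }
}
```

Applied to aabaaabb with next[0..5] = {0, 0, 2, 0, 0, 3} (worked out by hand) and X = 2, the step for state 6 gives next[6] = 2 and X = 3, and the step for state 7 then gives next[7] = 4 and X = 0, in line with the worked example for states 6 and 7.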
22 FSA Construction for KMP
Search pattern: aabaaabb
[Figure: FSA for aabaaabb built one state at a time, from states 0 and 1 up to the full FSA with states 0 to 8, alongside the table below. X is the state the FSA ends up in with input pattern[1..j].]

    j   pattern[1..j]   X   next
    0   (empty)         0   0
    1   a               1   0
    2   ab              0   2
    3   aba             1   0
    4   abaa            2   0
    5   abaaa           2   3
    6   abaaab          3   2
    7   abaaabb         0   4
31 FSA Construction for KMP
Code for FSA construction in KMP algorithm.

    #include <string.h>

    void kmpinit(char p[], int next[]) {
        int j, X = 0, M = strlen(p);

        next[0] = 0;
        for (j = 1; j < M; j++) {
            if (p[X] == p[j]) {        // char match
                next[j] = next[X];
                X = X + 1;
            } else {                   // char mismatch
                next[j] = X + 1;
                X = next[X];
            }
        }
    }

FSA Construction for KMP
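One way to gain confidence in a next[] table is to recompute it slowly, straight from what each entry means. The sketch below is a swapped-in brute-force technique, not the slide's method. It assumes a pattern over {a, b} and uses the fact that the FSA state after reading a string is the length of the longest pattern prefix that is a suffix of that string. The names kmpinit_slow and longest_prefix_suffix, and the 64-character bound, are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Length of the longest prefix of p that is a suffix of s[0..len-1]. */
static int longest_prefix_suffix(const char *p, const char *s, int len)
{
    int M = (int)strlen(p);
    int k = (len < M) ? len : M;

    for (; k > 0; k--)
        if (memcmp(p, s + len - k, (size_t)k) == 0) return k;
    return 0;
}

/* Fills next[] directly from the definition, for a pattern over {a, b}:
   next[j] is the state reached after reading p[0..j-1] followed by the
   character that is not p[j]. Brute force, roughly cubic in M. */
void kmpinit_slow(const char *p, int next[])
{
    char s[64];                       /* hypothetical bound: M < 64 */
    int M = (int)strlen(p);
    int j;

    assert(M > 0 && M < 64);
    for (j = 0; j < M; j++) {
        memcpy(s, p, (size_t)j);                /* p[0..j-1] */
        s[j] = (p[j] == 'a') ? 'b' : 'a';       /* the mismatching char */
        next[j] = longest_prefix_suffix(p, s, j + 1);
    }
}
```

For aabaaabb this yields 0, 0, 2, 0, 0, 3, 2, 4, the same values a hand trace of kmpinit produces; the difference is that kmpinit gets there in a single pass over the pattern.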