KMP algorithm.


KMP algorithm

    public class KMP {
        private final int R;        // alphabet size
        private final String pat;   // the pattern
        private final int[][] dfa;  // dfa[c][j] = next state from state j on character c

        public KMP(String pat) {
            this.R = 256;
            this.pat = pat;
            // build DFA from pattern
            int m = pat.length();
            dfa = new int[R][m];
            dfa[pat.charAt(0)][0] = 1;
            for (int x = 0, j = 1; j < m; j++) {
                for (int c = 0; c < R; c++)
                    dfa[c][j] = dfa[c][x];      // Copy mismatch cases.
                dfa[pat.charAt(j)][j] = j + 1;  // Set match case.
                x = dfa[pat.charAt(j)][x];      // Update restart state.
            }
        }

        public int search(String txt) {
            // simulate operation of DFA on text
            int m = pat.length();
            int n = txt.length();
            int i, j;
            for (i = 0, j = 0; i < n && j < m; i++) {
                j = dfa[txt.charAt(i)][j];
            }
            if (j == m) return i - m;  // found
            return n;                  // not found
        }
    }

General idea
Avoid backing up in the text string on a mismatch.
For example:
Text: 00000000000000000000000000000000000001
Pattern: 000000001
When we find a mismatch, how can we move forward in the text without re-reading characters? Is there a cleverer way than brute force? The answer depends on analyzing the pattern in advance.
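For comparison, here is a minimal sketch of the brute-force search that the question above refers to. It is not taken from the slides: the class name BruteForceSketch and the method signature are illustrative assumptions. It returns txt.length() when the pattern is absent, mirroring the convention of the slides' search method.

```java
final class BruteForceSketch {
    /** Returns the index of the first occurrence of pat in txt, or txt.length() if absent. */
    static int search(String pat, String txt) {
        int m = pat.length();
        int n = txt.length();
        for (int i = 0; i + m <= n; i++) {
            int j = 0;
            while (j < m && txt.charAt(i + j) == pat.charAt(j)) {
                j++;
            }
            if (j == m) {
                return i; // all m characters matched at alignment i
            }
            // Mismatch after j matched characters: the next alignment starts at
            // i + 1, so the text position effectively backs up by j characters.
        }
        return n; // not found
    }
}
```

On inputs like the one above (a long run of 0s ending in 1), almost every alignment matches most of the pattern before failing, so the same text characters are compared again and again. That repeated work is what the DFA approach avoids.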

How? Build a DFA
DFA: deterministic finite-state automaton
DFA = States + Transitions
States
For a pattern with m characters, there are (m + 1) states in the DFA, numbered 0 to m.
Being at state j means the first j characters of the pattern are matched; state 0 means nothing is matched yet.
The last state, m, indicates ACCEPT (AC), i.e. all characters in the pattern are matched.

How? Build a DFA
DFA: deterministic finite-state automaton
DFA = States + Transitions
Transitions
At each state there are R possible transitions, where R is the number of possible characters (the alphabet size).
Formalize the transitions as dfa[next_char][current_state] = next_state.

How? Build a DFA
Explanation: dfa[next_char][current_state] = next_state
Suppose we are now at current_state. If the next character is next_char, we move to next_state.
Therefore dfa[R][m] is a 2-dimensional table that exhaustively enumerates all possible cases.

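How? Build a DFA
Explanation: dfa[next_char][current_state] = next_state
Pattern: ABABAC (assume R = 3 and the only characters are A, B, C)

2D array representation (only the match transitions are shown; a dash marks a mismatch entry):

    state j:        0  1  2  3  4  5
    pat.charAt(j):  A  B  A  B  A  C
    dfa['A'][j]:    1  -  3  -  5  -
    dfa['B'][j]:    -  2  -  4  -  -
    dfa['C'][j]:    -  -  -  -  -  6

Directed graph representation: states 0, 1(A), 2(B), 3(A), 4(B), 5(A), 6(C), each labeled with the character that leads to it.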
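To see every entry of the table, including the mismatch transitions, the constructor from the slides can be run on ABABAC and the result printed. The sketch below makes some assumptions that are not in the slides: the class name DfaTableSketch, the ALPHABET string that maps A, B, C to rows 0, 1, 2 (so R = 3 instead of 256), and a non-empty pattern made only of those three characters.

```java
final class DfaTableSketch {
    // Assumption: a three-letter alphabet, as in the slide's R = 3 example.
    static final String ALPHABET = "ABC";

    /** Builds dfa[c][j], where c is the index of a character in ALPHABET. */
    static int[][] build(String pat) {
        int r = ALPHABET.length();
        int m = pat.length();
        int[][] dfa = new int[r][m];
        dfa[ALPHABET.indexOf(pat.charAt(0))][0] = 1;
        for (int x = 0, j = 1; j < m; j++) {
            for (int c = 0; c < r; c++) {
                dfa[c][j] = dfa[c][x];          // copy mismatch cases
            }
            int match = ALPHABET.indexOf(pat.charAt(j));
            dfa[match][j] = j + 1;              // set match case
            x = dfa[match][x];                  // update restart state
        }
        return dfa;
    }

    public static void main(String[] args) {
        int[][] dfa = build("ABABAC");
        for (int c = 0; c < dfa.length; c++) {
            StringBuilder row = new StringBuilder();
            row.append(ALPHABET.charAt(c)).append(':');
            for (int next : dfa[c]) {
                row.append(' ').append(next);
            }
            System.out.println(row);
        }
        // prints:
        // A: 1 1 3 1 5 1
        // B: 0 2 0 4 0 4
        // C: 0 0 0 0 0 6
    }
}
```

Each column j lists where the automaton goes from state j on A, B, and C. The entries equal to j + 1 are the match transitions; the rest are the mismatch transitions.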

How to use the DFA?
Example
Text: ABCABABABACA
Pattern: ABABAC

Each step below is one iteration of the loop in search(), j = dfa[txt.charAt(i)][j]:
Start with state 0. Read txt[0] = A: go to state 1.
Current state 1. Read txt[1] = B: go to state 2.
Current state 2. Read txt[2] = C: go to state 0.
Current state 0. Read txt[3] = A: go to state 1.
Current state 1. Read txt[4] = B: go to state 2.
Current state 2. Read txt[5] = A: go to state 3.
Current state 3. Read txt[6] = B: go to state 4.
Current state 4. Read txt[7] = A: go to state 5.
Current state 5. Read txt[8] = B: go to state 4.
Current state 4. Read txt[9] = A: go to state 5.
Current state 5. Read txt[10] = C: go to state 6.
Current state 6 (ACCEPT). The loop stops with i = 11, and search() returns i - m = 11 - 6 = 5, the index where the pattern starts.
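The walk through the text can be reproduced with a short sketch that records every state visited. The transition table is hard-coded here; it is the table that the slides' constructor produces for ABABAC when A, B, C are mapped to rows 0, 1, 2. The class name DfaTraceSketch and the trace method are illustrative assumptions, and the text may contain only the characters A, B, C.

```java
import java.util.ArrayList;
import java.util.List;

final class DfaTraceSketch {
    // DFA[c][j] for pattern ABABAC; rows are A, B, C and columns are states 0..5.
    static final int[][] DFA = {
        {1, 1, 3, 1, 5, 1},
        {0, 2, 0, 4, 0, 4},
        {0, 0, 0, 0, 0, 6},
    };

    /** Returns the states visited, beginning with state 0 and stopping once state m is reached. */
    static List<Integer> trace(String txt) {
        int m = DFA[0].length;
        List<Integer> states = new ArrayList<>();
        int j = 0;
        states.add(j);
        for (int i = 0; i < txt.length() && j < m; i++) {
            j = DFA[txt.charAt(i) - 'A'][j]; // one table lookup per text character
            states.add(j);
        }
        return states;
    }

    public static void main(String[] args) {
        System.out.println(trace("ABCABABABACA"));
        // prints [0, 1, 2, 0, 1, 2, 3, 4, 5, 4, 5, 6]
    }
}
```

The index i only ever increases, so each text character is read once, which is the "no backing up" property from the general idea.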

How to build the DFA? If we can match the next character
If we see the expected character, go to the next state. In the constructor this is the line dfa[pat.charAt(j)][j] = j+1 (set match case).
Pattern: ABABAC (assume R = 3 and the only characters are A, B, C)

    state j:        0  1  2  3  4  5
    pat.charAt(j):  A  B  A  B  A  C
    dfa['A'][j]:    1  -  3  -  5  -
    dfa['B'][j]:    -  2  -  4  -  -
    dfa['C'][j]:    -  -  -  -  -  6

We only need dfa[R][m], with m columns rather than m + 1, since there are no transitions out of the last state.

How to build the DFA? If we fail to match the next character
Copy the data from column x: dfa[c][j] = dfa[c][x] for every character c (copy mismatch cases).
State j then mimics the transitions of state x, as if to say 'I am now in state x' or 'restart from state x'.
x is the restart state.
After column j is filled, update the restart state: x = dfa[pat.charAt(j)][x].
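One iteration of the construction loop can be isolated as a helper, which makes the roles of the copy and of x easier to see. This is a sketch under assumptions: the class DfaStepSketch, the method fillColumn, and its parameters are hypothetical names, and characters are passed as row indices (A = 0, B = 1, C = 2).

```java
final class DfaStepSketch {
    /**
     * Fills column j of dfa and returns the restart state for column j + 1.
     * x is the restart state for column j (x < j); match is the row of pat.charAt(j).
     */
    static int fillColumn(int[][] dfa, int j, int x, int match) {
        for (int c = 0; c < dfa.length; c++) {
            dfa[c][j] = dfa[c][x]; // on a mismatch, behave as state x would
        }
        dfa[match][j] = j + 1;     // the expected character advances to state j + 1
        return dfa[match][x];      // state x also consumes the expected character
    }

    public static void main(String[] args) {
        // Columns 0..2 of the ABABAC table are already filled; columns 3..5 are still zero.
        int[][] dfa = {
            {1, 1, 3, 0, 0, 0},
            {0, 2, 0, 0, 0, 0},
            {0, 0, 0, 0, 0, 0},
        };
        int x = fillColumn(dfa, 3, 1, 1); // j = 3, restart state 1, expected character B
        System.out.println(x);                                             // prints 2
        System.out.println(dfa[0][3] + " " + dfa[1][3] + " " + dfa[2][3]); // prints 1 4 0
    }
}
```

Column 3 starts as a copy of column 1 (1, 2, 0), and then the B entry is overwritten with 4.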

How to build the DFA? j = 0
An example (ABABAC)
j (current state): 0
Process
dfa['A'][0] ← 1 (the line dfa[pat.charAt(0)][0] = 1, before the loop). Every other entry in column 0 stays 0, because a new Java int array is zero-filled.
x (restart state) starts at 0.

How to build the DFA? j = 1
An example (ABABAC)
j (current state): 1
x (restart state): 0
Process
Copy dfa[][0] to dfa[][1]
dfa['B'][1] ← 2
x ← dfa['B'][0] = 0
Column 1 is now A → 1, B → 2, C → 0.

How to build the DFA? j = 1
An example (ABABAC)
Understand restart state x
You are actually at state 1, but if the next character is A or C, act as if you were at state 0. Recall the meaning of states in the DFA: state 0 means you have matched nothing.

How to build the DFA? j = 2
An example (ABABAC)
j (current state): 2
x (restart state): 0
Process
Copy dfa[][0] to dfa[][2]
dfa['A'][2] ← 3
x ← dfa['A'][0] = 1
Column 2 is now A → 3, B → 0, C → 0.

How to build the DFA? j = 2
An example (ABABAC)
j (current state): 2
x (restart state): 0
Understand restart state x
At state 2 you have matched 'AB', but if the next character is 'B' or 'C', you have to start from the very beginning (state 0).

How to build the DFA? j = 2
An example (ABABAC)
j (current state): 2
x (restart state): 0
Understand restart state x
x ← dfa['A'][0] = 1, why?
At current state 2 the expected character is 'A'. If we then fail to match at the next state, 3, we do not need to start from the very beginning, since we have at least that 'A' matched.

How to build the DFA? j = 3
An example (ABABAC)
j (current state): 3
x (restart state): 1
Process
Copy dfa[][1] to dfa[][3]
dfa['B'][3] ← 4
x ← dfa['B'][1] = 2
Column 3 is now A → 1, B → 4, C → 0.

How to build the DFA? j = 3
An example (ABABAC)
j (current state): 3
x (restart state): 1
Understand restart state x
We restart from state 1 if we fail to match the expected 'B'. The reason is that we know at least one 'A' is already matched.

How to build the DFA? j = 3
An example (ABABAC)
j (current state): 3
x (restart state): 1
Understand restart state x
x ← dfa['B'][1] = 2, why?
At current state 3 the expected character is 'B', and restart state 1 tells us that 'A' is already matched. So if we fail at the next state, 4, 'AB' is already matched, i.e. we can update the restart state x to 2.

How to build the DFA? j = 4
An example (ABABAC)
j (current state): 4
x (restart state): 2
Process
Copy dfa[][2] to dfa[][4]
dfa['A'][4] ← 5
x ← dfa['A'][2] = 3
Column 4 is now A → 5, B → 0, C → 0.

How to build the DFA? j = 4
An example (ABABAC)
j (current state): 4
x (restart state): 2
Understand restart state x
Explanation: at state 4 you have already matched 'ABAB'. If you fail to match the next 'A', you act as if you had matched only 'AB', since the restart state is 2. This is achieved by copying column 2 for the failed cases ('B' and 'C').

How to build the DFA? j = 5
An example (ABABAC)
j (current state): 5
x (restart state): 3
Process
Copy dfa[][3] to dfa[][5]
dfa['C'][5] ← 6
x ← dfa['C'][3] = 0
Column 5 is now A → 1, B → 4, C → 6.

How to build the DFA? j = 5
An example (ABABAC)
j (current state): 5
x (restart state): 3
Understand restart state x
Explanation: we have already matched 'ABABA'. If we fail to match the expected 'C', we act as if we had matched 'ABA', since the restart state is 3.

Understand state x
Updating x while we build the DFA, x = dfa[pat.charAt(j)][x] in the constructor, is the same kind of step as the state transition while we match the pattern in the text, j = dfa[txt.charAt(i)][j] in search().

Understand state x
The transition of state x: match the pattern against itself using the partially constructed DFA table.
To build the next state of the DFA, we need to know the restart state x.
An example (ABABAC)
x ← 0
x ← dfa['B'][0] = 0
x ← dfa['A'][0] = 1
x ← dfa['B'][1] = 2
x ← dfa['A'][2] = 3
x ← dfa['C'][3] = 0
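The sequence of x values can be checked in code. The sketch below records x during construction, then feeds the pattern without its first character to the finished DFA and records the states reached. The two lists should agree. The names RestartStateSketch, build, and simulate are illustrative assumptions, as is the three-letter ALPHABET mapping.

```java
import java.util.ArrayList;
import java.util.List;

final class RestartStateSketch {
    static final String ALPHABET = "ABC"; // assumption: R = 3, as in the slides' example

    /** Builds the DFA for pat and appends every value x takes to xs. */
    static int[][] build(String pat, List<Integer> xs) {
        int r = ALPHABET.length();
        int m = pat.length();
        int[][] dfa = new int[r][m];
        dfa[ALPHABET.indexOf(pat.charAt(0))][0] = 1;
        int x = 0;
        xs.add(x);
        for (int j = 1; j < m; j++) {
            for (int c = 0; c < r; c++) {
                dfa[c][j] = dfa[c][x];
            }
            int match = ALPHABET.indexOf(pat.charAt(j));
            dfa[match][j] = j + 1;
            x = dfa[match][x];
            xs.add(x);
        }
        return dfa;
    }

    /** Feeds pat.charAt(1), pat.charAt(2), ... to the finished DFA, starting in state 0. */
    static List<Integer> simulate(int[][] dfa, String pat) {
        List<Integer> states = new ArrayList<>();
        int s = 0;
        states.add(s);
        for (int j = 1; j < pat.length(); j++) {
            s = dfa[ALPHABET.indexOf(pat.charAt(j))][s];
            states.add(s);
        }
        return states;
    }

    public static void main(String[] args) {
        List<Integer> xs = new ArrayList<>();
        int[][] dfa = build("ABABAC", xs);
        System.out.println(xs);                      // prints [0, 0, 1, 2, 3, 0]
        System.out.println(simulate(dfa, "ABABAC")); // prints [0, 0, 1, 2, 3, 0]
    }
}
```

The simulation only ever reads columns that were already complete when the matching x update happened during construction, which is why the partially built table is enough.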

Conclusion
Updating x while we build the DFA is the same kind of step as the state transition while we match the pattern in the text.
Building the DFA is the same process as matching the pattern against itself.
By analyzing the pattern in advance, we know how to move forward in the text when a character fails to match.