Knuth-Morris-Pratt algorithm Presented by Sathyasathish
Agenda: Problem/issue; Conventional solution (compare/shift by one) and its Θ; KMP solution and its Θ
Pattern Matching Problem/issue: Find an occurrence of a pattern (string) P in a string S, and report the position in S where the match occurs. Source: www.cs.pitt.edu/~kirk/cs1501/notes/cs1501.ppt
Conventional Solution: Compare each character of P with S; if they match, continue, otherwise shift P one position to the right. [Figure: pattern P (a b ...) aligned under string S (a b c ...)] Source: www.cs.pitt.edu/~kirk/cs1501/notes/cs1501.ppt
Comparison Step 2: compare p[2] with S[2]. [Figure: pattern P (a b ...) aligned under string S (a b c ...)] Source: www.cs.pitt.edu/~kirk/cs1501/notes/cs1501.ppt
Comparison Step 3: compare p[3] with S[3]. A mismatch occurs here. "Since a mismatch is detected, shift P one position to the right and repeat the matching procedure, performing steps analogous to steps 1 to 3." Source: www.cs.pitt.edu/~kirk/cs1501/notes/cs1501.ppt
Conventional match program
// T is the text and P the pattern, both char[]
int attempts = 0, comparisons = 0;
for (int i = 0; i + P.length <= T.length; i++) {
    attempts++;                       // one alignment of P against T
    int j = 0;
    while (j < P.length) {
        comparisons++;                // one character comparison
        if (T[i + j] != P[j]) break;  // mismatch: shift P by one position
        j++;
    }
    if (j == P.length)
        System.out.println("found a match at " + (i + 1));
}
System.out.println("Program character comparisons: " + comparisons + "\nNumber of attempts: " + attempts);
Source: http://www.ics.uci.edu/~eppstein/161/960227.html, migrated from C to Java by Sathya
Θ of Conventional: The outer loop runs up to n times (n = length of string S). The inner loop runs up to m times (m = length of pattern P). Code: for (n) { for (m); } Worst case: Θ(mn)
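To make the Θ(mn) worst case concrete, here is a minimal Java sketch that counts the character comparisons of the shift-by-one matcher. The class name NaiveCount and the test strings are illustrative assumptions, not part of the original program. With a text of 20 a's and the pattern aaab, every one of the n - m + 1 = 17 alignments needs all m = 4 comparisons.

```java
final class NaiveCount {
    // Counts character comparisons made by the shift-by-one matcher.
    // (Hypothetical helper for illustration only.)
    static long countComparisons(String text, String pattern) {
        long comparisons = 0;
        for (int i = 0; i + pattern.length() <= text.length(); i++) {
            for (int j = 0; j < pattern.length(); j++) {
                comparisons++;                                      // one comparison
                if (text.charAt(i + j) != pattern.charAt(j)) break; // mismatch: shift by one
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        String text = "aaaaaaaaaa" + "aaaaaaaaaa";  // n = 20
        String pattern = "aaab";                    // m = 4
        // 17 alignments * 4 comparisons each
        System.out.println(countComparisons(text, pattern)); // prints 68
    }
}
```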
KMP: The conventional algorithm can be improved in the following area. It never keeps track of characters of the string that it has already seen; when a partial match ends in a mismatch, it compares those characters all over again. KMP uses what it learnt from the partial match in the string, together with the overlap within the pattern, during comparison, and we will see how much efficiency this delivers.
Example (pattern P = nano)
        0  1  2  3  4  5  6  7  8  9 10 11
    T:  b  a  n  a  n  a  n  o  b  a  n  o
  i=0:  X
  i=1:     X
  i=2:        n  a  n  X
  i=3:           X
  i=4:              n  a  n  o
  i=5:                 X
  i=6:                    n  X
  i=7:                       X
  i=8:                          X
  i=9:                             X
 i=10:                                n  X
After investing a lot of work making comparisons in the inner loop of the code, a lot is known about what is in the text: after a partial match of j characters starting at position i, you know what is in positions T[i]...T[i+j-1]. KMP uses this learning. http://www.ics.uci.edu/~eppstein/161/960227.html
KMP Solution
Issue with the conventional algorithm:
i=2: n a n
i=3: n a n o (invalid or wasted shift)
KMP first optimization step, skipping in the outer loop:
i=2: n a n X
i=4: n a n o (valid or learnt shift)
KMP second optimization step, skipping in the inner loop:
i=2: n a n X
Comparison KMP http://www.inf.fh-flensburg.de/lang/algorithmen/pattern/kmpen.htm
KMP Algorithm: It differs from the conventional algorithm when a partial match ends in a mismatch. How it differs we will see in a while! First we have to understand proper prefixes and proper suffixes. Example: S = "nano". Proper prefixes: n, na, nan (but not nano itself). Proper suffixes: o, no, ano (but not nano itself). Why do we need to know this?
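As a small illustration of these two definitions, the following Java sketch lists the non-empty proper prefixes and proper suffixes of a string. The class and method names are made up for this example, and the empty string is left out, matching the slide (some textbooks count it as a proper prefix and suffix too).

```java
import java.util.ArrayList;
import java.util.List;

final class ProperAffixes {
    // Non-empty prefixes of s that are shorter than s itself.
    static List<String> properPrefixes(String s) {
        List<String> out = new ArrayList<>();
        for (int len = 1; len < s.length(); len++) {
            out.add(s.substring(0, len));
        }
        return out;
    }

    // Non-empty suffixes of s that are shorter than s itself.
    static List<String> properSuffixes(String s) {
        List<String> out = new ArrayList<>();
        for (int len = 1; len < s.length(); len++) {
            out.add(s.substring(s.length() - len));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(properPrefixes("nano")); // prints [n, na, nan]
        System.out.println(properSuffixes("nano")); // prints [o, no, ano]
    }
}
```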
Suffix Prefix
Take:
String  :- abcdabfxxxxx
Pattern :- abcdabe
Start next comparison from:
String  :- abcdabfxxxxx
Pattern :-     abcdabe
How KMP achieves this: First it preprocesses the pattern, independently of the string it will be compared with, and identifies every proper prefix that is also a proper suffix; such an overlap is called a border or window. When there is a mismatch, it retries with the widest border of the matched part and, if that fails too, with the next-widest one. Example: ABAMABA (borders: ABA and A) http://www.inf.fh-flensburg.de/lang/algorithmen/pattern/kmpen.htm
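The preprocessing step can be sketched in Java as follows. This is a minimal sketch under one assumption about the table layout: border[i] holds the width of the widest border of the first i + 1 pattern characters. The cited page may index its table differently, and the names KmpBorders and borderTable are illustrative.

```java
import java.util.Arrays;

final class KmpBorders {
    // border[i] = width of the widest proper prefix of p[0..i]
    // that is also a proper suffix of p[0..i] (assumed table layout).
    static int[] borderTable(String p) {
        int[] border = new int[p.length()];
        int k = 0; // width of the border currently being extended
        for (int i = 1; i < p.length(); i++) {
            // On a mismatch, fall back to the next-widest border.
            while (k > 0 && p.charAt(i) != p.charAt(k)) {
                k = border[k - 1];
            }
            if (p.charAt(i) == p.charAt(k)) {
                k++;
            }
            border[i] = k;
        }
        return border;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(borderTable("ABAMABA")));
        // prints [0, 0, 1, 0, 1, 2, 3]

        // After matching the 6 characters "abcdab" of "abcdabe",
        // the pattern can be shifted by 6 minus the border width.
        int[] b = borderTable("abcdabe");
        System.out.println(6 - b[5]); // prints 4
    }
}
```

Each step either extends the current border or falls back to a narrower one, which is why the table can be built in a single pass over the pattern.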
Preprocessing
Preprocessing & window width table
String and Pattern matching
Θ of KMP: The table can be computed in Θ(m). The searching phase can be performed in O(m+n) time. The Knuth-Morris-Pratt algorithm performs at most 2n-1 text character comparisons during the searching phase. Since m ≤ n, the overall running time is Θ(n). http://www-igm.univ-mlv.fr/~lecroq/string/node8.html#SECTION0080
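To see the linear behaviour in practice, here is a hedged Java sketch of a KMP search that also counts text character comparisons. It is one common formulation, not necessarily the exact variant analysed in the cited source; the class name KmpSearch, the static counter, and the table layout (border[i] = widest border of the first i + 1 pattern characters) are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

final class KmpSearch {
    // Text character comparisons made by the most recent call to search().
    static long comparisons;

    // border[i] = width of the widest border of p[0..i] (assumed layout).
    static int[] borderTable(String p) {
        int[] border = new int[p.length()];
        int k = 0;
        for (int i = 1; i < p.length(); i++) {
            while (k > 0 && p.charAt(i) != p.charAt(k)) k = border[k - 1];
            if (p.charAt(i) == p.charAt(k)) k++;
            border[i] = k;
        }
        return border;
    }

    // Returns the 0-based start positions of all occurrences of pattern in text.
    static List<Integer> search(String text, String pattern) {
        List<Integer> hits = new ArrayList<>();
        comparisons = 0;
        if (pattern.isEmpty()) return hits;
        int[] border = borderTable(pattern);
        int k = 0; // number of pattern characters matched so far
        for (int i = 0; i < text.length(); i++) {   // i never moves backwards
            while (true) {
                comparisons++;
                if (text.charAt(i) == pattern.charAt(k)) { k++; break; }
                if (k == 0) break;                  // no border left to try
                k = border[k - 1];                  // reuse the learnt overlap
            }
            if (k == pattern.length()) {
                hits.add(i - k + 1);
                k = border[k - 1];                  // continue after a full match
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(search("banananobano", "nano")); // prints [4]
        System.out.println(comparisons);
    }
}
```

The text index i only moves forward, so no text character is revisited after the algorithm has moved past it; this is the property behind the linear bound.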
Thank you. Questions?