1 HEINZ NIXDORF INSTITUTE University of Paderborn Algorithms and Complexity Christian Schindelhauer Search Algorithms Winter Semester 2004/2005 18 Oct.


Search Algorithms, Winter Semester 2004/2005
2nd Lecture, 18 Oct 2004
Christian Schindelhauer, schindel@upb.de
HEINZ NIXDORF INSTITUTE, University of Paderborn, Algorithms and Complexity

Organization
• Register for the exercise classes!
  – http://studinfo.upb.de/cgi-bin/go?c=searchalg_2004ws
• Sign up in time for the presentation of an exercise!

Chapter I, Part II: Searching Text (18 Oct 2004)

Searching Text (Overview)
• The task of string matching
  – Easy as pie
• The naive algorithm
  – How would you do it?
• The Rabin-Karp algorithm
  – Ingenious use of primes and number theory
• The Knuth-Morris-Pratt algorithm
  – Let a (finite) automaton do the job
  – This is optimal
• The Boyer-Moore algorithm
  – Bad letters allow us to jump through the text
  – This is even better than optimal (in practice)
• Literature
  – Cormen, Leiserson, Rivest, “Introduction to Algorithms”, chapter 36, string matching, The MIT Press, 1989, 853-885.

The task of string matching
• Given
  – A text T of length n over a finite alphabet Σ: T[1..n]
  – A pattern P of length m over the finite alphabet Σ: P[1..m]
• Output
  – All occurrences of P in T, i.e. all shifts s with T[s+1..s+m] = P[1..m]
[Figure: the pattern “ptai” (P[1]..P[m]) aligned at shift s under the text “amnmaaanptaiiptpii” (T[1]..T[n])]

The Naive Algorithm
Naive-String-Matcher(T,P)
  n ← length(T)
  m ← length(P)
  for s ← 0 to n-m do
    if P[1..m] = T[s+1..s+m] then
      print “Pattern occurs with shift” s
    fi
  od
Fact:
• The naive string matcher has worst-case running time O((n-m+1) m)
• For n = 2m this is O(n²)
• The naive string matcher is not optimal, since string matching can be done in time O(m + n)
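The naive matcher can be sketched in Python as follows. This is only an illustration: the function name `naive_string_matcher` and the sample strings are assumptions, and shifts are reported 0-based, so a shift s here corresponds to T[s+1..s+m] in the slide's 1-based notation.

```python
def naive_string_matcher(text: str, pattern: str) -> list[int]:
    """Return every shift s with text[s:s+m] == pattern."""
    n, m = len(text), len(pattern)
    shifts = []
    for s in range(n - m + 1):          # s = 0 .. n-m, as in the pseudocode
        if text[s:s + m] == pattern:    # compares up to m characters
            shifts.append(s)
    return shifts


print(naive_string_matcher("abababa", "aba"))   # [0, 2, 4]
```

The slice comparison inside the loop is what costs up to m character comparisons per shift, which gives the O((n-m+1) m) bound.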

The Rabin-Karp Algorithm
• Idea: Compute
  – a checksum for the pattern P and
  – a checksum for each substring of T of length m
• Only shifts where the two checksums agree need to be compared letter by letter: a valid hit is a real occurrence, a spurious hit is a checksum collision
[Figure: the checksums of all length-4 substrings of the text “amnmaaanptaiiptpii” compared with the checksum 3 of the pattern “ptai”; one valid hit and one spurious hit are marked]

Performance of Rabin-Karp
• The worst-case running time of the Rabin-Karp algorithm is O(m (n-m+1)), the same as the worst-case running time of the naive algorithm
• The expected running time of Rabin-Karp is O(n + m (v + n/q)), where v is the number of valid shifts (hits) and q is the modulus of the checksum
• If we choose q ≥ m and have only a constant number of hits, then the expected running time of Rabin-Karp is O(n + m)
• However, if v and m are large, then the running time is O(n²)
Today we will learn to do this in time O(n+m)
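A minimal sketch of the checksum idea, assuming the usual polynomial rolling checksum modulo q. The radix `d = 256`, the default modulus `q = 101` and the function name are illustrative choices, not values taken from the slides. The function also counts spurious hits, so the effect of a badly chosen q is visible.

```python
def rabin_karp(text: str, pattern: str, q: int = 101, d: int = 256):
    """Return (valid_shifts, spurious_hits) using a rolling checksum mod q."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], 0
    h = pow(d, m - 1, q)                 # weight of the leading character
    p = t = 0
    for i in range(m):                   # checksums of P and of the first window
        p = (d * p + ord(pattern[i])) % q
        t = (d * t + ord(text[i])) % q
    valid, spurious = [], 0
    for s in range(n - m + 1):
        if p == t:                       # checksum hit: verify letter by letter
            if text[s:s + m] == pattern:
                valid.append(s)
            else:
                spurious += 1
        if s < n - m:                    # roll the window one position right
            t = (d * (t - ord(text[s]) * h) + ord(text[s + m])) % q
    return valid, spurious


print(rabin_karp("abababa", "aba"))      # valid shifts 0, 2, 4
print(rabin_karp("abcabc", "bc", q=1))   # q = 1: every window is a hit
```

With q = 1 every window has checksum 0, so each of the n-m+1 shifts is verified explicitly; this is the degenerate case behind the O(m (n-m+1)) worst-case bound.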

Total recall: Finite Automata (very formal)
Definition: A deterministic finite automaton M is a 5-tuple (Q, q_0, A, Σ, δ), where
– Q is a finite set of states
– q_0 ∈ Q is the start state
– A ⊆ Q is a distinguished set of accepting states
– Σ is a finite input alphabet
– δ: Q × Σ → Q is called the transition function of M
Let φ: Σ* → Q be the final-state function defined as:
– For the empty string ε we have: φ(ε) := q_0
– For all a ∈ Σ, w ∈ Σ* define φ(wa) := δ(φ(w), a)
M accepts w if and only if φ(w) ∈ A

Example (I)-(XI)
– Q = {0, 1, 2, 3, 4} is a finite set of states
– q_0 = 0 ∈ Q is the start state
– A = {4} ⊆ Q is the set of accepting states
– Σ = {a, b}: input alphabet
– δ: Q × Σ → Q: transition function, given by the table

  state   input a   input b
  0       1         0
  1       1         2
  2       1         3
  3       4         0
  4       1         2

[Figure: state diagram with the states 0, 1, 2, 3, 4 and one edge per table entry, labelled a or b]

The slides then feed the input to the automaton one letter per slide and record the state after each letter:
  States: 0, 1, 2, 1, 2, 3, 4, 2, 3, 4, 1
  Input (read off from the state sequence and the table): ababbabbaa

Finite-Automaton-Matcher
• The example automaton accepts at the end of each occurrence of the pattern abba
• For every pattern of length m there exists an automaton with m+1 states that solves the pattern matching problem with the following algorithm:
Finite-Automaton-Matcher(T, δ, P)
  n ← length(T)
  m ← length(P)
  q ← 0
  for i ← 1 to n do
    q ← δ(q, T[i])
    if q = m then
      s ← i - m
      print “Pattern occurs with shift” s
    fi
  od
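As a sketch, the matcher can be run on the transition table of the example automaton for the pattern abba. The dictionary layout `DELTA`, the function name and the input string "abbabba" are assumptions made for this illustration; the input is not the one used on the slides.

```python
# Transition table of the example automaton (pattern "abba"), one row per state.
DELTA = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 3},
    3: {"a": 4, "b": 0},
    4: {"a": 1, "b": 2},
}


def finite_automaton_matcher(text, delta, m):
    """Return (shifts, states): match shifts and the state after each letter."""
    q = 0
    shifts, states = [], [q]
    for i, ch in enumerate(text, start=1):   # i is the 1-based text position
        q = delta[q][ch]
        states.append(q)
        if q == m:                           # accepting state: a match ends at i
            shifts.append(i - m)
    return shifts, states


shifts, states = finite_automaton_matcher("abbabba", DELTA, 4)
print(shifts)   # [0, 3]
print(states)   # [0, 1, 2, 3, 4, 2, 3, 4]
```

Each text letter costs exactly one table lookup, so the matching phase runs in O(n) once δ is available. The two matches overlap, which the transition δ(4, b) = 2 handles without rereading any text.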

Computing the Transition Function: The Idea!
[Figure: the pattern “mmaa” shown at seven successive alignments under the text]

How to Compute the Transition Function?
• A string u is a prefix of a string v if there exists a string w such that: uw = v
• A string u is a suffix of a string v if there exists a string w such that: wu = v
• Let P_k denote the prefix of P consisting of its first k letters
Compute-Transition-Function(P, Σ)
  m ← length(P)
  for q ← 0 to m do
    for each character a ∈ Σ do
      k ← 1 + min(m, q+1)
      repeat k ← k-1
      until P_k is a suffix of P_q a
      δ(q,a) ← k
    od
  od

Example
Pattern P = abaabaaaa, state q = 7, next text letter a:
– P_7 a = abaabaaa
– P_8 = abaabaaa is a suffix of P_7 a, so δ(7, a) = 8

Example
Pattern P = abaabaaaa, state q = 7, next text letter b:
– P_7 b = abaabaab
– P_8 = abaabaaa, P_7 = abaabaa and P_6 = abaaba are not suffixes of P_7 b
– P_5 = abaab is a suffix of P_7 b, so δ(7, b) = 5

Running time of Compute-Transition-Function
Compute-Transition-Function(P, Σ)
  m ← length(P)
  for q ← 0 to m do                      Factor: m+1
    for each character a ∈ Σ do          Factor: |Σ|
      k ← 1 + min(m, q+1)
      repeat k ← k-1                     Factor: m
      until P_k is a suffix of P_q a     Time for check of equality: m
      δ(q,a) ← k
    od
  od
Running time of procedure: O(m³ |Σ|)
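The brute-force construction translates almost line by line into Python. This is a sketch under the assumption that the alphabet is passed as a string of characters; the function name and the list-of-dicts representation of δ are illustrative.

```python
def compute_transition_function(pattern: str, alphabet: str):
    """Build delta[q][a] by the brute-force search, O(m^3 |alphabet|) time."""
    m = len(pattern)
    delta = []
    for q in range(m + 1):
        row = {}
        for a in alphabet:
            k = min(m, q + 1)
            # Decrease k until P_k is a suffix of P_q a; k = 0 always succeeds.
            while not (pattern[:q] + a).endswith(pattern[:k]):
                k -= 1
            row[a] = k
        delta.append(row)
    return delta


for q, row in enumerate(compute_transition_function("abba", "ab")):
    print(q, row)
```

For the pattern abba this prints the same table as the example automaton (for instance δ(3, a) = 4 and δ(4, b) = 2). The three nested levels (states, letters, candidate values of k) together with the suffix check of length up to m account for the factors (m+1), |Σ|, m and m in the running time.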

From Finite Automata to Knuth-Morris-Pratt
• The combined running time of
  – Compute-Transition-Function and
  – Finite-Automaton-Matcher
  is O(n + m³ |Σ|)
• Used memory space: O(m |Σ|) for the transition function
  – for large alphabets quite a lot
• Reduce memory consumption by using the following function:
  π[q] := max {k : k < q and P_k is a suffix of P_q}
• Example for the pattern abaabaaaa: π[7] = 4, since P_4 = abaa is the longest proper prefix of P that is a suffix of P_7 = abaabaa

π[q] := max {k : k < q and P_k is a suffix of P_q}
Pattern: abaabaaaa

  q   P_q         π[q]   P_π[q]
  1   a           0      (empty)
  2   ab          0      (empty)
  3   aba         1      a
  4   abaa        1      a
  5   abaab       2      ab
  6   abaaba      3      aba
  7   abaabaa     4      abaa
  8   abaabaaa    1      a
  9   abaabaaaa   1      a

Knuth-Morris-Pratt Pattern Matching
KMP-Matcher(T,P)
  n ← length(T)
  m ← length(P)
  π ← Compute-Prefix-Function(P)
  q ← 0
  for i ← 1 to n do
    while q > 0 and P[q+1] ≠ T[i] do
      q ← π[q]
    od
    if P[q+1] = T[i] then
      q ← q+1
    fi
    if q = m then
      print “Pattern occurs with shift” i-m
      q ← π[q]
    fi
  od
Annotations:
• while loop: If P_{q+1} does not fit (this is indicated by its last letter, think about this), shift the pattern to the next reasonable position (given by π)
• if P[q+1] = T[i]: If the letter fits, then increment the position (otherwise we have q = 0)
• if q = m: We have matched the whole pattern: “Heureka”
• q ← π[q] after a match: We don’t forget to shift the pattern for the next occurrence

Knuth-Morris-Pratt Pattern Matching
[Figure: KMP-Matcher run with the pattern “mmaa” on the example text; the matched prefixes m, mm and mma are shown while the pattern is shifted along the text]

The Running Time of KMP
• q ← q+1 (after a fitting letter): This happens at most n times
• q ← π[q] (in the while loop): Here q is decreased by at least 1 if q > 0, since π[q] < q
• q never becomes negative, so over the whole run it cannot be decreased more often than it is increased
• Run time is O(n)
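The amortized argument can be observed with an instrumented sketch of the KMP-Matcher that counts how often the while loop executes q ← π[q]. The function names, the counter `fallbacks` and the sample strings are assumptions for this illustration; arrays are padded so that `pi[q]` keeps the slide's 1-based meaning.

```python
def prefix_function(p: str) -> list[int]:
    """pi[q] for q = 1..m (pi[0] is unused padding)."""
    m = len(p)
    pi = [0] * (m + 1)
    k = 0
    for q in range(2, m + 1):
        while k > 0 and p[k] != p[q - 1]:   # p[k] is P[k+1] in 1-based notation
            k = pi[k]
        if p[k] == p[q - 1]:
            k += 1
        pi[q] = k
    return pi


def kmp_matcher(text: str, pattern: str):
    """Return (shifts, fallbacks); fallbacks counts q <- pi[q] in the while loop."""
    n, m = len(text), len(pattern)
    if m == 0:
        return [], 0
    pi = prefix_function(pattern)
    q, shifts, fallbacks = 0, [], 0
    for i in range(1, n + 1):
        while q > 0 and pattern[q] != text[i - 1]:
            q = pi[q]                       # shift the pattern
            fallbacks += 1
        if pattern[q] == text[i - 1]:
            q += 1                          # the letter fits
        if q == m:
            shifts.append(i - m)
            q = pi[q]                       # prepare for the next occurrence
    return shifts, fallbacks


text = "aaabaaaab"
shifts, fallbacks = kmp_matcher(text, "aaaab")
print(shifts, fallbacks, len(text))
```

Whatever the input, `fallbacks` never exceeds `len(text)`, because every fallback undoes at least one earlier increment of q.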

Computing π
Compute-Prefix-Function(P)
  m ← length(P)
  π[1] ← 0
  k ← 0
  for q ← 2 to m do
    while k > 0 and P[k+1] ≠ P[q] do
      k ← π[k]
    od
    if P[k+1] = P[q] then
      k ← k+1
    fi
    π[q] ← k
  od
Annotations:
• while loop: If P_{k+1} is not a suffix of P_q, shift the pattern to the next reasonable position (given by smaller values of π)
• if P[k+1] = P[q]: If the letter fits, then increment the position (otherwise k = 0)
• π[q] ← k: We have found the position such that π[q] = max {k : k < q and P_k is a suffix of P_q}

Simulation of Computing π
[Figure: step-by-step simulation of Compute-Prefix-Function on the example pattern, showing the matched prefix after each step]
Run time analysis: Analogous to KMP: O(m)
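A short sketch that runs Compute-Prefix-Function and cross-checks it against the definition π[q] = max {k : k < q and P_k is a suffix of P_q}. It assumes the example pattern is abaabaaaa, the 9-letter pattern whose prefix-function values agree with the π values listed on the slides; the function names are illustrative.

```python
def compute_prefix_function(p: str) -> list[int]:
    """Return [pi[1], ..., pi[m]] for the pattern p in O(m) time."""
    m = len(p)
    pi = [0] * (m + 1)                  # 1-based; pi[0] is unused
    k = 0
    for q in range(2, m + 1):
        while k > 0 and p[k] != p[q - 1]:
            k = pi[k]
        if p[k] == p[q - 1]:
            k += 1
        pi[q] = k
    return pi[1:]


def prefix_function_by_definition(p: str) -> list[int]:
    """pi[q] = max{k < q : P_k is a suffix of P_q}, checked by brute force."""
    return [max(k for k in range(q) if p[:q].endswith(p[:k]))
            for q in range(1, len(p) + 1)]


pattern = "abaabaaaa"
print(compute_prefix_function(pattern))   # [0, 0, 1, 1, 2, 3, 4, 1, 1]
```

The brute-force version is only there as a check; it takes cubic time, whereas the loop above is linear by the same amortized argument as for the matcher.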

Conclusion
• The Knuth-Morris-Pratt algorithm works like the Finite-Automaton-Matcher
• The computation of the prefix function π needs time O(m)
  – while the computation of the automaton's transition function needs time O(m³ |Σ|)
• Amortized analysis shows that the KMP-Matcher is, up to a constant factor, as fast as the Finite-Automaton-Matcher
• This gives a running time of O(m+n) for the KMP-Matcher
• This is optimal!
• Can we do better?

Boyer-Moore: The ideas!
[Figure: the pattern “ptii” is aligned under the text “amnmaaanptaiiptpii” and compared from right to left; the callouts below accompany the successive alignments]
• Start comparing at the end
• What’s this? There is no “a” in the search pattern
  – We can shift m+1 letters
• An “a” again...
• First wrong letter! Do a large shift!
• Bingo! Do another large shift!
• That’s it! 10 letters compared and ready!

Boyer-Moore-Matcher(T,P,Σ)
  n ← length(T)
  m ← length(P)
  λ ← Compute-Last-Occurrence-Function(P,m,Σ)
  γ ← Compute-Good-Suffix-Function(P,m)
  s ← 0
  while s ≤ n-m do
    j ← m
    while j > 0 and P[j] = T[s+j] do
      j ← j-1
    od
    if j = 0 then
      print “Pattern occurs with shift” s
      s ← s + γ[0]
    else
      s ← s + max(γ[j], j - λ[T[s+j]])
    fi
  od

Compute-Last-Occurrence-Function(P,m,Σ)
  for each character a ∈ Σ do
    λ[a] ← 0
  od
  for j ← 1 to m do
    λ[P[j]] ← j
  od
  return λ
Running time: O(|Σ| + m)
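In Python the last-occurrence function is a small dictionary. The sketch below assumes the alphabet is given as a string; the function name and the sample alphabet "aimnpt" are illustrative.

```python
def compute_last_occurrence(pattern: str, alphabet: str) -> dict[str, int]:
    """lam[a] = largest 1-based index j with P[j] = a, or 0 if a is not in P."""
    lam = {a: 0 for a in alphabet}
    for j, ch in enumerate(pattern, start=1):
        lam[ch] = j                     # later occurrences overwrite earlier ones
    return lam


lam = compute_last_occurrence("ptai", "aimnpt")
print(lam["n"], lam["p"], lam["i"])     # 0 1 4
```

With these values, a mismatch at pattern position j = 4 against the text letter n gives the bad-letter shift j - λ[n] = 4 - 0 = 4, i.e. the whole pattern jumps past that letter.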

Compute-Good-Suffix-Function(P,m)
  π ← Compute-Prefix-Function(P)
  P’ ← reverse(P)
  π’ ← Compute-Prefix-Function(P’)
  for j ← 0 to m do
    γ[j] ← m - π[m]
  od
  for l ← 1 to m do
    j ← m - π’[l]
    if γ[j] > l - π’[l] then
      γ[j] ← l - π’[l]
    fi
  od
  return γ
Running time: O(m)
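Putting the pieces together, the following is a hedged sketch of the Boyer-Moore-Matcher with both heuristics, following the pseudocode with 1-based indices mapped onto Python's 0-based strings. The function names are assumptions, and the last-occurrence table is built only from the pattern (letters not in the pattern default to 0) instead of iterating over an explicit alphabet.

```python
def prefix_function(p: str) -> list[int]:
    """pi[q] for q = 1..m (pi[0] is unused padding)."""
    m = len(p)
    pi = [0] * (m + 1)
    k = 0
    for q in range(2, m + 1):
        while k > 0 and p[k] != p[q - 1]:
            k = pi[k]
        if p[k] == p[q - 1]:
            k += 1
        pi[q] = k
    return pi


def compute_good_suffix(p: str) -> list[int]:
    """gamma[j] for j = 0..m, built from pi of P and of the reversed pattern."""
    m = len(p)
    pi = prefix_function(p)
    pi_rev = prefix_function(p[::-1])
    gamma = [m - pi[m]] * (m + 1)
    for l in range(1, m + 1):
        j = m - pi_rev[l]
        gamma[j] = min(gamma[j], l - pi_rev[l])
    return gamma


def boyer_moore_matcher(text: str, pattern: str) -> list[int]:
    """Return all shifts at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []
    lam = {ch: j for j, ch in enumerate(pattern, start=1)}   # last occurrence
    gamma = compute_good_suffix(pattern)
    shifts, s = [], 0
    while s <= n - m:
        j = m
        while j > 0 and pattern[j - 1] == text[s + j - 1]:   # compare right to left
            j -= 1
        if j == 0:
            shifts.append(s)
            s += gamma[0]
        else:
            s += max(gamma[j], j - lam.get(text[s + j - 1], 0))
    return shifts


print(boyer_moore_matcher("aabbabba", "abba"))   # [1, 4]
```

Every entry of γ is at least 1, so the outer loop always advances even when the bad-letter term j - λ[T[s+j]] is zero or negative.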

Thanks for your attention
End of 2nd lecture
Next lecture: Mo 25 Oct 2004, 11 am, FU 116
Next exercise class: Mo 18 Oct 2004, 1 pm, F0.530 or We 20 Oct 2004, 1 pm, F1.110

