Published by Rodger Robinson. Modified over 9 years ago.

Search Algorithms
Winter Semester 2004/2005
2nd Lecture, 18 Oct 2004
Christian Schindelhauer, schindel@upb.de
HEINZ NIXDORF INSTITUTE, University of Paderborn, Algorithms and Complexity

Organization
Register for the exercise classes!
– http://studinfo.upb.de/cgi-bin/go?c=searchalg_2004ws
Sign up in time for the presentation of an exercise!

Chapter I, Part II: Searching Text (18 Oct 2004)

Searching Text (Overview)
The task of string matching
– Easy as pie
The naive algorithm
– How would you do it?
The Rabin-Karp algorithm
– Ingenious use of primes and number theory
The Knuth-Morris-Pratt algorithm
– Let a (finite) automaton do the job
– This is optimal
The Boyer-Moore algorithm
– Bad letters allow us to jump through the text
– This is even better than optimal (in practice)
Literature
– Cormen, Leiserson, Rivest, "Introduction to Algorithms", chapter 36, string matching, The MIT Press, 1989, pp. 853-885.

The task of string matching
Given
– A text T of length n over a finite alphabet Σ: T[1..n]
– A pattern P of length m over the same finite alphabet Σ: P[1..m]
Output
– All occurrences of P in T, i.e. all shifts s with T[s+1..s+m] = P[1..m]
Example: for T = amnmaaanptaiiptpii and P = ptai, the pattern occurs with shift s = 8.

The Naive Algorithm
Naive-String-Matcher(T, P)
    n ← length(T)
    m ← length(P)
    for s ← 0 to n-m do
        if P[1..m] = T[s+1..s+m] then
            print "Pattern occurs with shift" s
        fi
    od
Fact: The naive string matcher has worst-case running time O((n-m+1) m)
– For n = 2m this is O(n²)
– The naive string matcher is not optimal, since string matching can be done in time O(m + n)

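The naive matcher can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: the function name and the choice to collect the shifts in a list are assumptions, and Python's 0-based slice text[s:s+m] plays the role of T[s+1..s+m].

```python
def naive_string_matcher(text: str, pattern: str) -> list[int]:
    """Return every shift s such that text[s:s+m] equals the pattern."""
    n, m = len(text), len(pattern)
    shifts = []
    for s in range(n - m + 1):            # s = 0 .. n-m, as in the pseudocode
        if text[s:s + m] == pattern:      # costs up to m character comparisons
            shifts.append(s)
    return shifts


print(naive_string_matcher("amnmaaanptaiiptpii", "ptai"))  # [8]
```

Each of the n-m+1 shifts may cost m comparisons, which is where the O((n-m+1) m) bound comes from.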
The Rabin-Karp Algorithm
Idea: Compute
– a checksum for the pattern P and
– a checksum for each substring of T of length m
Compare characters only where the checksums agree: such a position is a valid hit if the substring equals P and a spurious hit otherwise.
[Figure: text amnmaaanptaiiptpii with substring checksums 4 2 3 1 4 2 3 1 1 3 2 3 1 1 0 and pattern ptai with checksum 3; a valid hit and a spurious hit are marked.]

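One way to realise the checksum idea is a rolling hash modulo q, sketched below. The slide does not specify the checksum, so the radix d = 256, the default modulus q = 101 and the function name are assumptions made for illustration only.

```python
def rabin_karp(text: str, pattern: str, q: int = 101, d: int = 256) -> list[int]:
    """Return all shifts of pattern in text using a rolling checksum mod q."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    h = pow(d, m - 1, q)                  # weight of the leading character
    p = t = 0
    for i in range(m):                    # checksums of P and of T[1..m]
        p = (d * p + ord(pattern[i])) % q
        t = (d * t + ord(text[i])) % q
    shifts = []
    for s in range(n - m + 1):
        # Equal checksums are only a hit; the comparison rules out spurious hits.
        if p == t and text[s:s + m] == pattern:
            shifts.append(s)
        if s < n - m:                     # roll the window one letter to the right
            t = (d * (t - ord(text[s]) * h) + ord(text[s + m])) % q
    return shifts


print(rabin_karp("amnmaaanptaiiptpii", "ptai"))  # [8]
```

With q = 1 every position becomes a hit, and the sketch degrades to the naive algorithm while still returning the correct shifts.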
Performance of Rabin-Karp
The worst-case running time of the Rabin-Karp algorithm is O(m (n-m+1)), the same as the worst-case running time of the naive algorithm
The expected running time of Rabin-Karp is O(n + m (v + n/q)), where v is the number of valid shifts (hits) and q is the modulus used for the checksums
– If we choose q ≥ m and have only a constant number of hits, then the expected running time of Rabin-Karp is O(n + m)
– However, if v and m are both large, then the running time is O(n²)
Today we will learn to do this in time O(n + m)

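The step from the general bound to O(n + m) can be written out in one line. This is only a sketch under the slide's own assumptions, namely v = O(1) and q ≥ m:

```latex
O\bigl(n + m\,(v + n/q)\bigr)
  = O\bigl(n + m\,v + m\,n/q\bigr)
  = O\bigl(n + m + n\bigr)
  = O(n + m),
\qquad \text{because } v = O(1) \text{ and } q \ge m \Rightarrow m\,n/q \le n .
```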
Total recall: Finite Automata (very formal)
Definition
A deterministic finite automaton M is a 5-tuple (Q, q_0, A, Σ, δ), where
– Q is a finite set of states
– q_0 ∈ Q is the start state
– A ⊆ Q is a distinguished set of accepting states
– Σ is a finite input alphabet
– δ: Q × Σ → Q is called the transition function of M
Let φ: Σ* → Q be the final-state function defined as:
– For the empty string ε we have: φ(ε) := q_0
– For all a ∈ Σ, w ∈ Σ* define φ(wa) := δ(φ(w), a)
M accepts w if and only if φ(w) ∈ A

Example (I to XI)
– Q is a finite set of states: Q = {0, 1, 2, 3, 4}
– q_0 ∈ Q is the start state: q_0 = 0
– A ⊆ Q is a set of accepting states: A = {4}
– Σ is the input alphabet: Σ = {a, b}
– δ: Q × Σ → Q is the transition function:

    state   input a   input b
    0       1         0
    1       1         2
    2       1         3
    3       4         0
    4       1         2

The slides step through the input one letter at a time:

    input:     a b a b b a b b a a
    states:  0 1 2 1 2 3 4 2 3 4 1

The automaton is in the accepting state 4 after the 6th and after the 9th letter.

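The run of the example automaton can be replayed with a few lines of Python. The transition table is the one from the slides; the input string ababbabbaa is an assumption, reconstructed from the state sequence 0 1 2 1 2 3 4 2 3 4 1 shown on the last example slide, and the names DELTA, ACCEPTING and run are hypothetical.

```python
# Transition table of the example automaton: DELTA[state][letter] -> next state.
DELTA = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 3},
    3: {"a": 4, "b": 0},
    4: {"a": 1, "b": 2},
}
ACCEPTING = {4}


def run(word: str, start: int = 0) -> list[int]:
    """Return the sequence of states visited while reading word."""
    states = [start]
    for letter in word:
        states.append(DELTA[states[-1]][letter])
    return states


states = run("ababbabbaa")
print(states)                                                # [0, 1, 2, 1, 2, 3, 4, 2, 3, 4, 1]
print([i for i, q in enumerate(states) if q in ACCEPTING])   # [6, 9]
```

The last element of the returned list is the value of the final-state function φ for the whole word.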
Finite-Automaton-Matcher
The example automaton accepts at the end of every occurrence of the pattern abba
For every pattern of length m there exists an automaton with m+1 states that solves the pattern matching problem with the following algorithm:
Finite-Automaton-Matcher(T, δ, P)
    n ← length(T)
    m ← length(P)
    q ← 0
    for i ← 1 to n do
        q ← δ(q, T[i])
        if q = m then
            s ← i - m
            print "Pattern occurs with shift" s
        fi
    od

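A minimal Python sketch of the matcher, assuming the transition function is given as a nested dictionary. The table DELTA_ABBA is the automaton for abba from the example slides; the text ababbabbaa and the function name are illustrative assumptions.

```python
def finite_automaton_matcher(text: str, delta: dict, m: int) -> list[int]:
    """Return all shifts at which the automaton reaches its accepting state m."""
    q = 0
    shifts = []
    for i, letter in enumerate(text, start=1):   # i = 1 .. n, as in the pseudocode
        q = delta[q][letter]
        if q == m:
            shifts.append(i - m)
    return shifts


# Automaton for the pattern abba (m = 4), copied from the example slides.
DELTA_ABBA = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 3},
    3: {"a": 4, "b": 0},
    4: {"a": 1, "b": 2},
}

print(finite_automaton_matcher("ababbabbaa", DELTA_ABBA, 4))  # [2, 5]
```

Every text letter is read exactly once, so the matching phase takes O(n) steps once δ is available.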
Computing the Transition Function: The Idea!
[Figure: the pattern mmaa is shown at successive shifts below the text amnmaaamptaiipt.]

How to Compute the Transition Function?
A string u is a prefix of a string v if there exists a string w such that uw = v
A string u is a suffix of a string v if there exists a string w such that wu = v
Let P_k denote the prefix of P consisting of its first k letters
Compute-Transition-Function(P, Σ)
    m ← length(P)
    for q ← 0 to m do
        for each character a ∈ Σ do
            k ← 1 + min(m, q+1)
            repeat k ← k-1
            until P_k is a suffix of P_q a
            δ(q, a) ← k
        od
    od

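The procedure translates almost literally into Python. The sketch below assumes the alphabet is passed as a string of letters and returns δ as a list of dictionaries indexed by state; both choices, and the function name, are illustrative.

```python
def compute_transition_function(pattern: str, alphabet: str) -> list[dict]:
    """Return delta with delta[q][a] = largest k such that P_k is a suffix of P_q a."""
    m = len(pattern)
    delta = []
    for q in range(m + 1):
        row = {}
        for a in alphabet:
            k = min(m, q + 1)                       # first candidate tried by the repeat loop
            # pattern[:k] is P_k; the loop stops at k = 0 at the latest,
            # because the empty string is a suffix of every string.
            while not (pattern[:q] + a).endswith(pattern[:k]):
                k -= 1
            row[a] = k
        delta.append(row)
    return delta


delta = compute_transition_function("abba", "ab")
print(delta[3])  # {'a': 4, 'b': 0}
```

For the pattern abba this reproduces the transition table of the example automaton.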
Example
Pattern P = abaabaaaa, state q = 7, next text character a:
    P_7 a = abaabaaa
    P_8   = abaabaaa   is a suffix of P_7 a
Hence δ(7, a) = 8

Example
Pattern P = abaabaaaa, state q = 7, next text character b:
    P_7 b = abaabaab
    P_8   = abaabaaa   is not a suffix of P_7 b
    P_7   = abaabaa    is not a suffix of P_7 b
    P_6   = abaaba     is not a suffix of P_7 b
    P_5   = abaab      is a suffix of P_7 b
Hence δ(7, b) = 5

Running time of Compute-Transition-Function
– Loop over the states q: factor m+1
– Loop over the characters a ∈ Σ: factor |Σ|
– Repeat loop over k: factor m
– Time for each check whether P_k is a suffix of P_q a: m
Running time of the procedure: O(m³ |Σ|)

From Finite Automata to Knuth-Morris-Pratt
The combined running time of
– Compute-Transition-Function and
– Finite-Automaton-Matcher
is O(n + m³ |Σ|)
Used memory space: O(m |Σ|) for the transition function
– for large alphabets quite a lot
Reduce memory consumption by using the following function:
π[q] := max {k : k < q and P_k is a suffix of P_q}
Example for the pattern P = abaabaaaa: π[7] = 4, because P_4 = abaa is the longest proper prefix of P_7 = abaabaa that is also a suffix of P_7

π[q] := max {k : k < q and P_k is a suffix of P_q}
Pattern: abaabaaaa

    q   P_q         longest P_k   π[q]
    1   a           (empty)       0
    2   ab          (empty)       0
    3   aba         a             1
    4   abaa        a             1
    5   abaab       ab            2
    6   abaaba      aba           3
    7   abaabaa     abaa          4
    8   abaabaaa    a             1
    9   abaabaaaa   a             1

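The table can be checked directly against the definition of π. The brute-force sketch below is deliberately slow; it is meant only to mirror the definition, and the function name and the 1-based list layout are assumptions.

```python
def prefix_function_by_definition(pattern: str) -> list[int]:
    """pi[q] = max{k : k < q and P_k is a suffix of P_q}; pi[0] is unused."""
    m = len(pattern)
    pi = [0] * (m + 1)
    for q in range(1, m + 1):
        # k = 0 always qualifies, because the empty prefix is a suffix of everything.
        pi[q] = max(k for k in range(q) if pattern[:q].endswith(pattern[:k]))
    return pi


print(prefix_function_by_definition("abaabaaaa")[1:])  # [0, 0, 1, 1, 2, 3, 4, 1, 1]
```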
Knuth-Morris-Pratt Pattern Matching
KMP-Matcher(T, P)
    n ← length(T)
    m ← length(P)
    π ← Compute-Prefix-Function(P)
    q ← 0
    for i ← 1 to n do
        while q > 0 and P[q+1] ≠ T[i] do
            q ← π[q] od
        if P[q+1] = T[i] then
            q ← q+1 fi
        if q = m then
            print "Pattern occurs with shift" i-m
            q ← π[q] fi
    od
Comments:
– while loop: if P_{q+1} does not fit (this is indicated by the last letter; think about this), shift the pattern to the next reasonable position (given by π)
– if P[q+1] = T[i]: if the letter fits, then increment the position (otherwise we have q = 0)
– if q = m: we have matched the whole pattern: "Eureka"
– final q ← π[q]: we don't forget to shift the pattern for the next occurrence

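A runnable Python sketch of the matcher follows. It keeps the 1-based meaning of q and π from the slides, so P[q+1] becomes pattern[q] and T[i] becomes text[i-1]; the function names and the returned list of shifts are assumptions, and the prefix function is included only to keep the block self-contained.

```python
def compute_prefix_function(pattern: str) -> list[int]:
    """pi[q] for q = 1..m (pi[0] unused), computed in O(m)."""
    m = len(pattern)
    pi = [0] * (m + 1)
    k = 0
    for q in range(2, m + 1):
        while k > 0 and pattern[k] != pattern[q - 1]:   # P[k+1] != P[q]
            k = pi[k]
        if pattern[k] == pattern[q - 1]:
            k += 1
        pi[q] = k
    return pi


def kmp_matcher(text: str, pattern: str) -> list[int]:
    """Return all shifts at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []
    pi = compute_prefix_function(pattern)
    q = 0                                   # number of pattern letters matched so far
    shifts = []
    for i in range(1, n + 1):
        while q > 0 and pattern[q] != text[i - 1]:      # P[q+1] != T[i]
            q = pi[q]                       # shift to the next reasonable position
        if pattern[q] == text[i - 1]:
            q += 1
        if q == m:                          # whole pattern matched
            shifts.append(i - m)
            q = pi[q]                       # prepare for the next occurrence
    return shifts


print(kmp_matcher("amnmaaanptaiiptpii", "ptai"))  # [8]
```

Only the array π of size m is stored, in contrast to the O(m |Σ|) transition table of the automaton.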
Knuth-Morris-Pratt Pattern Matching (example run)
[Figure: KMP-Matcher run with the pattern mmaa on the text amnmaaampa, showing the matched part of the pattern after each step.]

The Running Time of KMP
Consider the main loop of KMP-Matcher:
– In q ← π[q], q is decreased by at least 1 if q > 0, since π[k] ≤ k - 1
– The increment q ← q+1 happens at most n times
Therefore q can be decreased at most n times in total, and the run time is O(n)

Computing π
Compute-Prefix-Function(P)
    m ← length(P)
    π[1] ← 0
    k ← 0
    for q ← 2 to m do
        while k > 0 and P[k+1] ≠ P[q] do
            k ← π[k] od
        if P[k+1] = P[q] then
            k ← k+1 fi
        π[q] ← k
    od
Comments:
– while loop: if P_{k+1} is not a suffix of P_q, shift the pattern to the next reasonable position (given by smaller values of π)
– if P[k+1] = P[q]: if the letter fits, then increment the position (otherwise k = 0)
– π[q] ← k: we have found the position such that π[q] = max {k : k < q and P_k is a suffix of P_q}

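The following Python sketch mirrors Compute-Prefix-Function and additionally counts how often the while loop falls back, to make the amortized argument visible. The counter, the function name and the return type are additions for illustration and not part of the lecture's procedure.

```python
def compute_prefix_function(pattern: str) -> tuple[list[int], int]:
    """Return (pi, fallbacks): pi[q] for q = 1..m and the number of k <- pi[k] steps."""
    m = len(pattern)
    pi = [0] * (m + 1)                      # pi[0] is unused, pi[1] = 0
    k = 0
    fallbacks = 0
    for q in range(2, m + 1):
        while k > 0 and pattern[k] != pattern[q - 1]:   # P[k+1] != P[q]
            k = pi[k]
            fallbacks += 1
        if pattern[k] == pattern[q - 1]:
            k += 1
        pi[q] = k
    return pi, fallbacks


pi, fallbacks = compute_prefix_function("abaabaaaa")
print(pi[1:])                              # [0, 0, 1, 1, 2, 3, 4, 1, 1]
print(fallbacks <= len("abaabaaaa"))       # True
```

Since k grows by at most one per iteration of the for loop and every fallback lowers it by at least one, the number of fallbacks cannot exceed m.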
Simulation of Computing π
[Figure: Compute-Prefix-Function run on the pattern abaabaaaa, showing the compared prefix after each step.]
Run time analysis: analogous to KMP, O(m)

Conclusion
The Knuth-Morris-Pratt algorithm works like the Finite-Automaton-Matcher
The computation of the prefix function needs time O(m)
– while the computation of the automaton needs time O(m³ |Σ|)
Amortized analysis shows that the KMP-Matcher is, up to a constant factor, as fast as the Finite-Automaton-Matcher
This gives a run time of O(m + n) for the KMP-Matcher
This is optimal!
Can we do better?

Boyer-Moore: The ideas!
[Figure: the pattern ptii is shown at successive shifts below the text amnmaaanptaiiptpii, with the following comments.]
– Start comparing at the end
– What's this? There is no "a" in the search pattern
– We can shift m+1 letters
– An "a" again...
– First wrong letter! Do a large shift!
– Bingo! Do another large shift!
– That's it! 10 letters compared and ready!

Boyer-Moore-Matcher(T, P, Σ)
    n ← length(T)
    m ← length(P)
    λ ← Compute-Last-Occurrence-Function(P, m, Σ)
    γ ← Compute-Good-Suffix-Function(P, m)
    s ← 0
    while s ≤ n-m do
        j ← m
        while j > 0 and P[j] = T[s+j] do
            j ← j-1 od
        if j = 0 then
            print "Pattern occurs with shift" s
            s ← s + γ[0]
        else
            s ← s + max(γ[j], j - λ[T[s+j]]) fi
    od

Compute-Last-Occurrence-Function(P, m, Σ)
    for each character a ∈ Σ do
        λ[a] ← 0 od
    for j ← 1 to m do
        λ[P[j]] ← j od
    return λ
Running time: O(|Σ| + m)

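In Python the last-occurrence function fits in a few lines. The sketch assumes the alphabet is given as a string of letters; the function name and the sample alphabet aimnpt are illustrative.

```python
def compute_last_occurrence(pattern: str, alphabet: str) -> dict[str, int]:
    """last[a] = largest 1-based index j with P[j] = a, or 0 if a is not in P."""
    last = {a: 0 for a in alphabet}
    for j, letter in enumerate(pattern, start=1):
        last[letter] = j                  # later positions overwrite earlier ones
    return last


print(compute_last_occurrence("ptii", "aimnpt"))
# {'a': 0, 'i': 4, 'm': 0, 'n': 0, 'p': 1, 't': 2}
```

A letter with value 0, such as "a" for the pattern ptii, never occurs in the pattern, which is what allows the large jumps.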
Compute-Good-Suffix-Function(P, m)
    π ← Compute-Prefix-Function(P)
    P' ← reverse(P)
    π' ← Compute-Prefix-Function(P')
    for j ← 0 to m do
        γ[j] ← m - π[m] od
    for l ← 1 to m do
        j ← m - π'[l]
        if γ[j] > l - π'[l] then
            γ[j] ← l - π'[l] fi
    od
    return γ
Running time: O(m)

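Putting the pieces together, the Boyer-Moore-Matcher can be sketched in Python as below. This is an illustration under assumptions rather than a reference implementation: the function names are hypothetical, the result is returned as a list of shifts, and the last-occurrence table is built from the pattern only, with missing letters treated as 0 instead of iterating over Σ.

```python
def prefix_function(p: str) -> list[int]:
    """pi[q] for q = 1..m (pi[0] unused)."""
    m = len(p)
    pi = [0] * (m + 1)
    k = 0
    for q in range(2, m + 1):
        while k > 0 and p[k] != p[q - 1]:
            k = pi[k]
        if p[k] == p[q - 1]:
            k += 1
        pi[q] = k
    return pi


def good_suffix(p: str) -> list[int]:
    """gamma[j] for j = 0..m, following Compute-Good-Suffix-Function."""
    m = len(p)
    pi = prefix_function(p)
    pi_rev = prefix_function(p[::-1])       # prefix function of the reversed pattern
    gamma = [m - pi[m]] * (m + 1)
    for l in range(1, m + 1):
        j = m - pi_rev[l]
        gamma[j] = min(gamma[j], l - pi_rev[l])
    return gamma


def boyer_moore(text: str, pattern: str) -> list[int]:
    """Return all shifts at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []
    last = {c: j for j, c in enumerate(pattern, start=1)}   # last occurrence, 1-based
    gamma = good_suffix(pattern)
    shifts = []
    s = 0
    while s <= n - m:
        j = m
        while j > 0 and pattern[j - 1] == text[s + j - 1]:  # compare from the end
            j -= 1
        if j == 0:
            shifts.append(s)
            s += gamma[0]
        else:
            bad_letter = text[s + j - 1]
            s += max(gamma[j], j - last.get(bad_letter, 0))
    return shifts


print(boyer_moore("amnmaaanptaiiptpii", "ptai"))  # [8]
```

Every entry of gamma is at least 1, so the shift s always advances and the loop terminates.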
Thanks for your attention
End of 2nd lecture
Next lecture: Mo 25 Oct 2004, 11 am, FU 116
Next exercise class: Mo 18 Oct 2004, 1 pm, F0.530 or We 20 Oct 2004, 1 pm, F1.110