String Matching 1. Martin Kay, Stanford University.

1 String Matching 1. Martin Kay, Stanford University

2 Naive Search (1)

    naive_search(Pattern, Text, 1) :-
        append(Pattern, _, Text).
    naive_search(Pattern, [_ | Text], N) :-
        naive_search(Pattern, Text, N0),
        N is N0+1.

    | ?- naive_search("is", "mississippi", N).
    N = 2 ? ;
    N = 5 ? ;
    no
    | ?-
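For readers who do not use Prolog, the same left-to-right scan can be sketched in Python. This is only an illustration: the name `naive_search` is borrowed from the slide, while the generator form and the 1-based numbering are assumptions chosen here to mirror the Prolog answers.

```python
def naive_search(pattern, text):
    """Yield every 1-based position at which pattern is a prefix of the remaining text."""
    for start in range(len(text) - len(pattern) + 1):
        # Counterpart of append(Pattern, _, Text): is pattern a prefix at this point?
        if text[start:start + len(pattern)] == pattern:
            yield start + 1  # the slide counts positions from 1


print(list(naive_search("is", "mississippi")))  # [2, 5]
```

Like the Prolog version, this re-examines text characters at every alignment, which is the cost the later slides set out to avoid.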

3 pref: A Prefix Predicate

    pref(P, T) :- assert(stat(T, P)), fail.
    pref([], _).
    pref([H | P], [H | T]) :- pref(P, T).

The first clause makes an entry in the database every time the predicate is called, then fails so that the remaining clauses perform the actual prefix test.

4 Search using pref

    naive_search1(Pattern, Text, 1) :-
        pref(Pattern, Text).
    naive_search1(Pattern, [_ | Text], N) :-
        naive_search1(Pattern, Text, N0),
        N is N0+1.

    | ?- naive_search1([i,s], [m,i,s,s,i,s,s,i,p,p,i], N).
    N = 2 ? ;
    N = 5 ? ;
    no
    | ?-

5 The Statistics

    | ?- listing(stat).
    stat([m,i,s,s,i,s,s,i,p,p,i], [i,s]).
    stat([i,s,s,i,s,s,i,p,p,i], [i,s]).
    stat([s,s,i,s,s,i,p,p,i], [s]).
    stat([s,i,s,s,i,p,p,i], []).
    stat([s,s,i,s,s,i,p,p,i], [i,s]).
    stat([s,i,s,s,i,p,p,i], [i,s]).
    stat([i,s,s,i,p,p,i], [i,s]).
    stat([s,s,i,p,p,i], [s]).
    stat([s,i,p,p,i], []).
    stat([s,s,i,p,p,i], [i,s]).
    stat([s,i,p,p,i], [i,s]).
    stat([i,p,p,i], [i,s]).
    stat([p,p,i], [s]).
    stat([p,p,i], [i,s]).
    stat([p,i], [i,s]).
    stat([i], [i,s]).
    stat([], [s]).
    stat([], [i,s]).

18 entries, 11 alignments
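The counts in this listing can be reproduced with a small Python sketch that records every call of a prefix test, in the spirit of the `assert(stat(T, P))` clause. The helper names `pref` and `count_calls` and the list-of-tuples log are assumptions made for this illustration, not part of the slides.

```python
def pref(pattern, text, calls):
    """Prefix test that logs each call as (text, pattern), like stat(T, P)."""
    calls.append((text, pattern))
    if not pattern:
        return True
    if text and pattern[0] == text[0]:
        return pref(pattern[1:], text[1:], calls)
    return False


def count_calls(pattern, text):
    """Try the pattern at every suffix of text, including the empty suffix."""
    calls = []
    for start in range(len(text) + 1):
        pref(pattern, text[start:], calls)
    return calls


calls = count_calls("is", "mississippi")
print(len(calls))                                    # 18
print(sum(1 for t, p in calls if p == "is" and t))   # 11
```

The second figure counts only calls made with the whole pattern against a non-empty remainder of the text; the final call on the empty remainder accounts for the last entry in the listing.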

6 Observe

If the pattern “mississippi” matched part of the way, we can move over all the characters matched, because none of them can be an “m”, which is what we need to start a new match.

    Text:    m i s s i o n a r y ....
    Pattern: m i s s i s s i p p i
                       ^ Mismatch

Annotations on the diagram: No “m” here; or maybe even here; So move to here!

7 Observe further

    Text:    p e r p e n d i c u l a r ...
    Pattern: p e r p e t r a t e
                       ^ Mismatch

The second “p e” of the matched text is a prefix of the pattern. So try this:

    Text:    p e r p e n d i c u l a r ...
    Pattern:       p e r p e t r a t e

8 Observe yet further

    Text:    p e r p e t u a l .....
    Pattern: p e r p e t r a t e
                         ^ Mismatch

No (shorter) prefix of the pattern ends here. So move to here:

    Text:    p e r p e t u a l .....
    Pattern:             p e r p e t r a t e
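The three observations can be read as one rule: after a mismatch, shift the pattern by the number of characters matched minus the length of the longest proper prefix of the matched part that is also its suffix. The Python sketch below illustrates that reading; the function names are hypothetical, and the brute-force `longest_border` is chosen for clarity rather than speed.

```python
def longest_border(s):
    """Length of the longest proper prefix of s that is also a suffix of s."""
    for k in range(len(s) - 1, 0, -1):
        if s[:k] == s[-k:]:
            return k
    return 0


def shift_after_mismatch(pattern, matched):
    """Safe shift of the pattern, using only the `matched` characters that agreed."""
    if matched == 0:
        return 1  # nothing matched: simply try the next text position
    return matched - longest_border(pattern[:matched])


print(shift_after_mismatch("mississippi", 5))  # 5: "missi" matched in "missionary"
print(shift_after_mismatch("perpetrate", 5))   # 3: "perpe" matched in "perpendicular"
print(shift_after_mismatch("perpetrate", 6))   # 6: "perpet" matched in "perpetual"
```

The sketch ignores the mismatching text character itself, so it does not capture the "or maybe even here" remark on the first of the three slides.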

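9 Overlaps

Search for

    a b a c a b a d a b a c a b a

in the text

    a b a b a c a b a d a b a c a b a d a b a c a b a b a

The pattern occurs twice, and the two occurrences overlap:

    a b a b a c a b a d a b a c a b a d a b a c a b a b a
        a b a c a b a d a b a c a b a
                        a b a c a b a d a b a c a b a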

10 Déjà vu

Search for

    a b a c a b a d a b a c a b a

in the text

    a b a b a c a b a d a b a c a b a d a b a c a b a b a

11 On-line search

We have seen this much of the text so far:

    [Figure not recoverable from the transcript; surviving fragment: c a c c c]

We are looking for the pattern cacao. We have some number (0 or more) of searches in progress and are waiting for the next character to see which ones continue, and maybe to start a new one.

12 Search for

    a b a c a b a d a b a c a b a

in the text

    a b a b a c a b a d a b a c a b a d a b a c a b a b a

    Position  Character  Pointers
     0        a          [0]
     1        b          [0, 1]
     2        a          [0, 2]
     3        b          [0, 1, 3]
     4        a          [0, 2]
     5        c          [0, 1, 3]
     6        a          [0, 4]
     7        b          [0, 1, 5]
     8        a          [0, 2, 6]
     9        d          [0, 1, 3, 7]
    10        a          [0, 8]
    11        b          [0, 1, 9]
    12        a          [0, 2, 10]
    13        c          [0, 1, 3, 11]
    14        a          [0, 4, 12]
    15        b          [0, 1, 5, 13]
    16        a          [0, 2, 6, 14]    result 2
    17        d          [0, 1, 3, 7]
    18        a          [0, 8]
    19        b          [0, 1, 9]
    20        a          [0, 2, 10]
    21        c          [0, 1, 3, 11]
    22        a          [0, 4, 12]
    23        b          [0, 1, 5, 13]
    24        a          [0, 2, 6, 14]    result 10
    25        b          [0, 1, 3, 7]
    26        a          [0, 2]

1. The rightmost pointer always moves.
2. Other pointers move if they can do so over the same character.
3. A new ‘0’ is introduced on the left.

A pointer in a given position always has pointers in the same set of positions to its left. These are properties of the pattern only. Therefore they can be cached or precompiled.
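The pointer sets in this trace can be regenerated with a short Python simulation. It is a sketch under stated assumptions: the function name `online_search` and the returned `(trace, results)` pair are invented for the illustration, the pattern is assumed to be non-empty, and each trace entry is the pointer set just before the character at that position is read.

```python
def online_search(pattern, text):
    """Scan text once, keeping one pointer into pattern per partial match in progress."""
    pointers = [0]
    trace, results = [], []
    for i, ch in enumerate(text):
        trace.append(list(pointers))       # pointer set before reading text[i]
        advanced = [0]                     # a fresh 0 pointer enters on the left
        for p in pointers:
            if pattern[p] == ch:           # this partial match survives the character
                if p + 1 == len(pattern):
                    results.append(i - p)  # whole pattern matched: report its start
                else:
                    advanced.append(p + 1)
        pointers = advanced
    return trace, results


trace, results = online_search("abacabadabacaba", "ababacabadabacabadabacababa")
print(trace[9])   # [0, 1, 3, 7]
print(results)    # [2, 10]
```

Keeping every pointer explicitly costs time proportional to the size of the set at each step; the observation that the set is determined by its rightmost member is what allows it to be replaced by a precomputed table.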

13 The same search and pointer trace as on slide 12, with the annotation: If this matches... then so will these

14 The same search and pointer trace as on slide 12, with the annotation: So try these only if this fails!

15 The failure function

    Character  Pointers
    a          [0]
    b          [0, 1]
    a          [0, 2]
    b          [0, 1, 3]
    a          [0, 2]
    c          [0, 1, 3]
    a          [0, 4]
    b          [0, 1, 5]
    a          [0, 2, 6]
    d          [0, 1, 3, 7]
    a          [0, 8]
    b          [0, 1, 9]
    a          [0, 2, 10]
    c          [0, 1, 3, 11]
    a          [0, 4, 12]

    Pattern:    a  b  a  c  a  b  a  d  a  b  a  c  a ...
    Position:   0  1  2  3  4  5  6  7  8  9 10 11 12 ...
    Failure:       0  0  1  0  1  2  3  0  1  2  3  4 ...

17 The Failure Function

      a  b  c  a  b  c  a  b  c
     -1  0  0  0  1  2  3  4  5

18 The Failure Function

      a  b  a  c  a  b  a  d  a  b  a  c  a  b  a
     -1  0  0  1  0  1  2  3  0  1  2  3  4  5  6

19 The Failure Function

     -1  0  0  1  0  1  2  3  0  1  2  3  4  5  6
      a  b  a  c  a  b  a  d  a  b  a  c  a  b  a

20 Substring, Prefix, Suffix

Part of a string S (even if it covers the whole of S) is a substring of S. If it includes the first (last) character of S, it is a prefix (suffix) of S. If it does not cover the whole of S, it is a proper substring (prefix, suffix) of S.

Example: S = ababac

Some substrings: ababac, ab, b, bab, ac, ε (only ababac is not proper)
Some prefixes: ababac, a, aba, ε (only ababac is not proper)
Some suffixes: ababac, abac, c, ε (only ababac is not proper)

ε is the empty string.

21 Borders

If B is a proper prefix and a proper suffix of a string S, it is a border of S. Note that ε is a border of every non-empty string.

Examples:
abcabcabc has borders abc, abcabc, ε
abacabadabacaba has borders abacaba, aba, a, ε
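The definition can be checked directly against the two examples with a brute-force Python sketch. The function name `borders` is an assumption for this illustration, and the quadratic scan is chosen for clarity, not efficiency.

```python
def borders(s):
    """All borders of s, longest first: proper prefixes of s that are also suffixes."""
    # k runs over every proper prefix length, down to 0 for the empty string.
    return [s[:k] for k in range(len(s) - 1, -1, -1) if s.endswith(s[:k])]


print(borders("abcabcabc"))        # ['abcabc', 'abc', '']
print(borders("abacabadabacaba"))  # ['abacaba', 'aba', 'a', '']
```

The empty string at the end of each list is ε. An empty input has no proper prefix at all, so `borders("")` returns an empty list.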

22 Borders

      a  b  c  a  b  c  a  b  c
     -1  0  0  0  1  2  3  4  5

23 border in Prolog

    border(Pattern, Border) :-
        append([_ | _], Border, Pattern),
        append(Border, _, Pattern).

24 Borders in Linear-time

Borders at position i+1 extend borders at position i.

      a  b  a  c  a  b  a  d  a  b
     -1  0  0  1  0  1  2  3  0  1

    % Base case, not shown on the slide: position 0 has value -1, as in the table.
    border(0, _, -1) :- !.
    border(I, Pattern, Q) :-
        J is I-1,
        border(J, Pattern, P),
        nth0(J, Pattern, C),
        extend(C, P, Pattern, Q).

    extend(_, -1, _, 0).
    extend(C, P, Pattern, Q) :-
        nth0(P, Pattern, C), !,
        Q is P+1.
    extend(C, P0, Pattern, R) :-
        border(P0, Pattern, Q),
        extend(C, Q, Pattern, R).

25 Building A Table

    % Base case, not shown on the slide: position 0 has value -1.
    border(0, _, -1) :- !.
    border(I, Pattern, Q) :-
        J is I-1,
        border(J, Pattern, P),
        nth0(J, Pattern, C),
        extend(C, P, Pattern, Q).

    extend(_, -1, _, 0).
    extend(C, P, Pattern, Q) :-
        nth0(P, Pattern, C), !,
        Q is P+1.
    extend(C, P0, Pattern, R) :-
        border(P0, Pattern, Q),
        extend(C, Q, Pattern, R).

    make_table(Pattern) :-
        retractall(border_table(_, _)),
        % The slide asserts border_table(0, 0); -1 is used here to agree
        % with the failure-function tables and the first clause of extend/4.
        assert(border_table(0, -1)),
        assert(border_table(1, 0)),
        length(Pattern, PL),
        make_table(Pattern, 2, PL).

    make_table(_, I, N) :- I>N, !.
    make_table(Pattern, I, N) :-
        border(I, Pattern, K),
        assert(border_table(I, K)),
        J is I+1,
        make_table(Pattern, J, N).

26 Building A Table

    % border/3 and extend/4 now look up earlier values in border_table/2
    % instead of recomputing them.
    border(I, Pattern, Q) :-
        J is I-1,
        border_table(J, P),
        nth0(J, Pattern, C),
        extend(C, P, Pattern, Q).

    extend(_, -1, _, 0).
    extend(C, P, Pattern, Q) :-
        nth0(P, Pattern, C), !,
        Q is P+1.
    extend(C, P0, Pattern, R) :-
        border_table(P0, Q),
        extend(C, Q, Pattern, R).

    make_table(Pattern) :-
        retractall(border_table(_, _)),
        % The slide asserts border_table(0, 0). With that entry the third
        % clause of extend/4 keeps looking up position 0 and never reaches -1,
        % so -1 is used here, as in the failure-function tables.
        assert(border_table(0, -1)),
        assert(border_table(1, 0)),
        length(Pattern, PL),
        make_table(Pattern, 2, PL).

    make_table(_, I, N) :- I>N, !.
    make_table(Pattern, I, N) :-
        border(I, Pattern, K),
        assert(border_table(I, K)),
        J is I+1,
        make_table(Pattern, J, N).
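The table construction may be easier to follow as a single loop. The Python sketch below is an illustrative counterpart of `make_table` and `extend`, not a translation endorsed by the slides: the name `make_border_table` is hypothetical, and the value -1 at position 0 follows the failure-function tables shown earlier.

```python
def make_border_table(pattern):
    """Return table where table[i] is the length of the longest border of pattern[:i].

    table[0] is -1 by convention; the last entry describes the whole pattern.
    """
    table = [-1]
    k = -1                                # border length carried over from position i-1
    for i in range(1, len(pattern) + 1):
        c = pattern[i - 1]                # character the border has to be extended over
        while k >= 0 and pattern[k] != c:
            k = table[k]                  # fall back to the next shorter border
        k += 1                            # border extended, or restarted from -1
        table.append(k)
    return table


print(make_border_table("abcabcabc")[:-1])  # [-1, 0, 0, 0, 1, 2, 3, 4, 5]
```

Each fallback in the inner loop shortens `k`, and `k` grows by at most one per position, so the total work is linear in the length of the pattern.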

27 Searching

    % common_prefix/3 and advance/3 are used below but are not defined on the slides.

    search(Pattern, Text, N) :-
        % Build the table
        make_table(Pattern),
        retract(border_table(0, _)),
        assert(border_table(0, 0)),
        length(Pattern, PL),
        % Do the search
        search(Pattern, PL, Text, N).

    search(Pattern, PL, Text, N) :-
        common_prefix(Pattern, Text, CPL),
        search(CPL, Pattern, PL, Text, N).

    search(CPL, _, CPL, _, 0).
    search(CPL, Pattern, PL, Text0, N) :-
        border_table(CPL, BL),
        M is CPL-BL,
        advance(Text0, M, Text),
        search(Pattern, PL, Text, N0),
        N is N0+M.
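Because `common_prefix/3` and `advance/3` are not shown, the search cannot be run from the slides alone. As a stand-in, here is a sketch of the standard Knuth-Morris-Pratt loop in Python. It is not a translation of the `search` predicates above: it never re-reads a text character, it reports 0-based positions, it assumes a non-empty pattern, and the names `border_table` and `kmp_search` are hypothetical.

```python
def border_table(pattern):
    """table[i] = length of the longest border of pattern[:i]; table[0] = -1."""
    table, k = [-1], -1
    for i in range(1, len(pattern) + 1):
        while k >= 0 and pattern[k] != pattern[i - 1]:
            k = table[k]
        k += 1
        table.append(k)
    return table


def kmp_search(pattern, text):
    """Yield the 0-based start of every occurrence of a non-empty pattern in text."""
    table = border_table(pattern)
    matched = 0                        # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while matched >= 0 and pattern[matched] != ch:
            matched = table[matched]   # fall back to the next shorter border
        matched += 1
        if matched == len(pattern):
            yield i - matched + 1
            matched = table[matched]   # keep the overlap for the next occurrence


print(list(kmp_search("abacabadabacaba", "ababacabadabacabadabacababa")))  # [2, 10]
```

The two results agree with the "result 2" and "result 10" entries in the pointer trace, and the second is found by reusing the seven characters that the two occurrences share.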

28 Reference

Donald E. Knuth, James H. Morris, Jr., and Vaughan R. Pratt. Fast pattern matching in strings. SIAM Journal on Computing, 6(2):323-350, June 1977.

