Published by Daniella Gibbs. Modified over 9 years ago.
1
Permuted Scaled Matching
Ayelet Butman, Noa Lewenstein, Ian Munro
2
Scaled matching
Input: Text T = t_1, …, t_n; Pattern P = p_1, …, p_m
Scaling: P^[i] = p_1…p_1 p_2…p_2 … p_m…p_m, where each character is repeated i times
Output: All text locations j for which there exists i such that P^[i] matches at j.
3
Scaled matching [figure: example of a scaled pattern occurrence in a text]
4
Permutation matching
Input: Text T = t_1, …, t_n; Pattern P = p_1, …, p_m
Permutation (of pattern): p_π(1) p_π(2) … p_π(m), where π is a permutation on [m]
Output: All text locations j where a pattern permutation occurs.
5
Permutation matching [figure: example text with occurrences of pattern permutations marked]
7
Easy to solve in O(n) time (linear size alphabets). The pattern matching version of Jumbled Indexing.
8
Scaled permutation matching Match: First Permutation and then Scaling.
9
Scaled permutation matching [figure: example of a pattern that is first permuted and then scaled, matched against a text]
10
Match: First Permutation and then Scaling.
B-Eres-Landau[04]: Scaled Permutation Matching in O(n) time.
Open: Can one do the reverse efficiently, i.e., scaling and then permutation? Is it hard? How can we solve it?
First: Naïve algorithm
11
Permuted scaled matching
Input: Text T = t_1, …, t_n; Pattern P = p_1, …, p_m
Output: All text locations j where a permuted scaled match exists, i.e., where a permutation of some scaling of P occurs.
12
Permuted scaled matching [figure: example of a pattern that is first scaled and then permuted, matched against a text]
13
Naïve algorithm
Example: T = aabcaaaccbacb, P = aacb [animation: candidate windows are highlighted for k = 1 and then k = 2]
16
Naïve algorithm
1. Construct a table R of size (n+1)×|Σ| such that R(i, j) = #σ_j(T[0, i]) for i ≥ 0 and R(−1, j) = 0, where #σ_j(S) is the number of occurrences of character σ_j in S.
2. For every 0 ≤ i < j ≤ n−1 such that j − i + 1 = km for some natural number k ≥ 1 do:
   a. Let r(l) = (R(j, l) − R(i−1, l)) / #σ_l(P).
   b. If r(l) = k for each l, 0 ≤ l ≤ |Σ| − 1, then announce that i is a k-scaled appearance.
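The two steps above can be sketched in Python as follows. This is a minimal illustration, not the presenters' code: the function name `naive_permuted_scaled_matches` is made up here, the pattern is assumed to be non-empty, and the alphabet is taken to be every character of the text or the pattern (a character that is absent from the pattern must then occur zero times in a matching window, which avoids the division in step 2a).

```python
def naive_permuted_scaled_matches(text, pattern):
    """Return (i, j, k) for every window text[i..j] that is a permuted k-scaling of pattern."""
    n, m = len(text), len(pattern)
    sigma = sorted(set(text) | set(pattern))
    col = {ch: c for c, ch in enumerate(sigma)}
    p_count = [pattern.count(ch) for ch in sigma]

    # R[i + 1][c] = occurrences of sigma[c] in text[0..i]; R[0] plays the role of R(-1, .).
    R = [[0] * len(sigma)]
    for ch in text:
        row = R[-1][:]
        row[col[ch]] += 1
        R.append(row)

    matches = []
    for i in range(n):
        # Only windows whose length is a multiple k*m of the pattern length are candidates.
        for k in range(1, (n - i) // m + 1):
            j = i + k * m - 1
            window = [R[j + 1][c] - R[i][c] for c in range(len(sigma))]
            # Multiplying instead of dividing: each count must be exactly k times the pattern's.
            if all(w == k * p for w, p in zip(window, p_count)):
                matches.append((i, j, k))
    return matches
```

Calling `naive_permuted_scaled_matches("aabcaaaccbacb", "aacb")` runs the check on the example strings used in the slides. The loops visit roughly n²/m windows and spend O(|Σ|) time on each.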
17
Naïve algorithm [animation: the table R of prefix counts for the characters a, b, c is filled in location by location for T = aabcaaaccbacb; P = aacb]
28
Naïve algorithm [animation: using the completed table R, windows of T = aabcaaaccbacb are tested against P = aacb (#a = 2, #b = #c = 1) for K = 1 and then K = 2]
37
Naïve algorithm
38
Better? Properties
39
Mod-equivalent
Mod-Equivalency: locations i and j are mod-equivalent if, for every character σ (with frequency c in P):
(#σ in T[0, i]) mod c = (#σ in T[0, j]) mod c
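As a small illustration of the definition, the sketch below computes the residues for one location and compares two locations. The helper names `mod_signature` and `mod_equivalent` are invented for this example, and only characters that occur in the pattern are considered.

```python
from collections import Counter

def mod_signature(text, pattern, i):
    """Count of each pattern character in text[0..i], reduced modulo its frequency in pattern."""
    freq = Counter(pattern)
    prefix = Counter(text[: i + 1])
    return tuple(prefix[ch] % freq[ch] for ch in sorted(freq))

def mod_equivalent(text, pattern, i, j):
    """True if locations i and j have the same residues for every pattern character."""
    return mod_signature(text, pattern, i) == mod_signature(text, pattern, j)
```

For P = aacb the characters b and c have frequency 1, so their residues are always 0, and only the parity of the number of a's seen so far distinguishes locations.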
40
Mod-equivalent [animation: prefix counts of a, b, c for T = cbbccaaccbacb with P = aacb (#a = 2, #b = #c = 1) are compared modulo the pattern frequencies at pairs of locations; the last frame uses T = cbbcaaaccbaab]
52
Equal-quotients
53
Equal-quotients [animation: prefix-count tables are compared at pairs of locations for T = cbbcaaaccbaab and for T = cbbccaaccbacb with P = aacb, followed by an example over the alphabet {a, b} with a text of the form aaaabbaaaaaa…b]
63
Theorem
T[i + 1, j] is a permuted k-scaling of P for some k iff
1. Locations i and j of T are mod-equivalent.
2. Locations i and j of T satisfy the equal-quotients property for each pair of characters.
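The theorem can be sanity-checked by brute force on a small input. The sketch below rests on several assumptions that are not spelled out on the slides: the equal-quotients property is taken to mean that, for each pair of characters, the difference of the quotients ⌊count / frequency⌋ is the same at both locations; both conditions are read on the prefixes T[0, i] and T[0, j], so the window in question is T[i+1, j] as in the final step of the algorithm; location −1 stands for the empty prefix; and every text character is assumed to occur in the pattern. All function names are hypothetical.

```python
from collections import Counter

def is_permuted_scaling(window, pattern):
    """Direct check: window holds exactly k copies of each pattern character, for some k >= 1."""
    if not window or len(window) % len(pattern):
        return False
    k = len(window) // len(pattern)
    scaled = Counter({ch: k * c for ch, c in Counter(pattern).items()})
    return Counter(window) == scaled

def conditions_hold(text, pattern, i, j):
    """Mod-equivalence and equal quotients for prefix locations i < j (i = -1 is the empty prefix)."""
    freq = Counter(pattern)
    sigma = sorted(freq)
    ci, cj = Counter(text[: i + 1]), Counter(text[: j + 1])
    mod_ok = all(ci[ch] % freq[ch] == cj[ch] % freq[ch] for ch in sigma)
    qi = [ci[ch] // freq[ch] for ch in sigma]
    qj = [cj[ch] // freq[ch] for ch in sigma]
    quot_ok = all(
        qi[a] - qi[b] == qj[a] - qj[b]
        for a in range(len(sigma))
        for b in range(a + 1, len(sigma))
    )
    return mod_ok and quot_ok

def theorem_agrees(text, pattern):
    """Compare both characterisations on every pair of locations -1 <= i < j < n."""
    n = len(text)
    return all(
        conditions_hold(text, pattern, i, j) == is_permuted_scaling(text[i + 1 : j + 1], pattern)
        for i in range(-1, n)
        for j in range(i + 1, n)
    )
```

The intuition is that equal residues make each count difference a multiple of the character's frequency, and equal quotient differences force that multiple to be the same k for every character.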
64
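[figure: the vector kept for each of two locations i and j, with one mod-equivalent entry per character (a, b, c, d, e, f) followed by equal-quotients entries for adjacent character pairs (a-b, b-c, c-d, d-e, e-f)]
66
[figure: example vectors for T = cbbccaaccbacb and P = bcaaaca, with mod-equivalent entries a, b, c and equal-quotients entries a-b, b-c]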
67
Putting it together
68
Build a table R of size n×2|Σ|+1. [figure: one vector per location, with mod-equivalent entries for each character followed by equal-quotients entries for adjacent character pairs]
69
Each vector is associated with its location i.
71
Sort the vectors using Radix sort.
72
Group the vectors into equivalence classes according to their prefix of length 2|Σ|−1.
73
For each equivalence class containing locations i_1, i_2, ..., i_l, announce appearances T[i + 1, j] for each i, j ∈ {i_1, i_2, ..., i_l} s.t. i < j.
74
Putting it all together
75
Putting it together
3. Each vector is associated with its location i.
4. Sort the vectors using Radix sort.
5. Group the vectors into equivalence classes according to their prefix of length 2|Σ|−1.
6. For each equivalence class containing locations i_1, i_2, ..., i_l, announce appearances T[i + 1, j] for each i, j ∈ {i_1, i_2, ..., i_l} s.t. i < j.
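A rough Python sketch of these steps is given below. It departs from the slides in ways that should be read as assumptions: Python's built-in comparison sort stands in for Radix sort, the empty prefix is included as location −1 so that matches starting at position 0 are found, the quotient entries are differences of ⌊count / frequency⌋ for adjacent characters, and one extra entry counts characters that do not occur in the pattern so that no announced window can span such a character. The names `location_vectors` and `all_matches` are made up for the example.

```python
from collections import Counter
from itertools import combinations, groupby

def location_vectors(text, pattern):
    """Return [(vector, location)] for locations -1 (empty prefix), 0, ..., n-1."""
    freq = Counter(pattern)
    sigma = sorted(freq)
    counts = dict.fromkeys(sigma, 0)
    foreign = 0  # characters absent from the pattern (an addition made for this sketch)

    def vector():
        mods = [counts[ch] % freq[ch] for ch in sigma]
        quots = [counts[ch] // freq[ch] for ch in sigma]
        diffs = [a - b for a, b in zip(quots, quots[1:])]
        return tuple(mods + diffs + [foreign])

    rows = [(vector(), -1)]
    for loc, ch in enumerate(text):
        if ch in counts:
            counts[ch] += 1
        else:
            foreign += 1
        rows.append((vector(), loc))
    return rows

def all_matches(text, pattern):
    """Return every (start, end) such that text[start..end] is a permuted scaling of pattern."""
    rows = sorted(location_vectors(text, pattern))  # comparison sort in place of Radix sort
    matches = []
    for _, group in groupby(rows, key=lambda row: row[0]):
        locations = [loc for _, loc in group]  # ascending, because rows are sorted
        # Every pair i < j in one class yields the appearance T[i + 1, j].
        matches.extend((i + 1, j) for i, j in combinations(locations, 2))
    return sorted(matches)
```

Rebuilding the whole vector at every location costs O(|Σ|) per step, which matches the n|Σ| term of the stated running time; the pairwise enumeration inside each class accounts for the occ term.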
76
Theorem The running time of the permuted scaled matching algorithm is: O(n|Σ|+occ).
77
Output representation
The output of the algorithm, whose size we denote by occ, may be as large as O(n^2/m).
Example:
o Text a^n.
o Pattern a^m.
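The bound in this example can be worked out directly. The short derivation below is a sketch under the assumption that every window of length km in a^n counts as one appearance, with q = ⌊n/m⌋ (a symbol introduced here only for the calculation).

```latex
\begin{align*}
\mathrm{occ}
  &= \sum_{k=1}^{q} (n - km + 1)
   = q(n+1) - \frac{m\,q(q+1)}{2},
   \qquad q = \lfloor n/m \rfloor. \\
\intertext{When $m$ divides $n$, so that $q = n/m$:}
\mathrm{occ}
  &= \frac{n^{2}}{2m} - \frac{n}{2} + \frac{n}{m}
   = \frac{n(n-m)}{2m} + \frac{n}{m}.
\end{align*}
```

For m ≤ n/2 the first term is at least n²/(4m), so occ grows as n²/m in this example.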
78
Output representation
To reduce the large number of appearances, set the output to the shortest match at each text location i.
Example: T = abbcaaaaabaab, P = aba
80
Claim
Let i < j < h be three text locations. Assume T[i, j] is a permuted scaled appearance of P. Then T[i, h] is a permuted scaled appearance of P iff T[j + 1, h] is a permuted scaled appearance of P.
Example: T = abbcaaaaabaab, P = aba
83
Putting it all together
84
Putting it together
3. Each vector is associated with its location i.
4. Sort the vectors using Radix sort.
5. Group the vectors into equivalence classes according to their prefix of length 2|Σ|−1.
6. For each entry q’ containing linked list i_1, i_2, ..., i_l, announce appearances T[i_r + 1, i_{r+1}] for each i_r ∈ {i_1, i_2, ..., i_l}.
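The shortest-match variant can be sketched with a single left-to-right pass. This version swaps in a Python dictionary keyed by the vector in place of Radix sort plus linked lists, which gives the same consecutive pairs i_r, i_{r+1}. As further assumptions of this sketch, the empty prefix is location −1, the quotient entries are differences of ⌊count / frequency⌋ for adjacent characters, and an extra entry counts characters missing from the pattern. The name `shortest_matches` is hypothetical.

```python
from collections import Counter

def shortest_matches(text, pattern):
    """Return (start, end) of the shortest permuted scaled match at each start that has one."""
    freq = Counter(pattern)
    sigma = sorted(freq)
    counts = dict.fromkeys(sigma, 0)
    foreign = 0  # characters absent from the pattern (an addition made for this sketch)

    def vector():
        mods = [counts[ch] % freq[ch] for ch in sigma]
        quots = [counts[ch] // freq[ch] for ch in sigma]
        diffs = [a - b for a, b in zip(quots, quots[1:])]
        return tuple(mods + diffs + [foreign])

    last_seen = {vector(): -1}  # most recent location of each equivalence class
    result = []
    for j, ch in enumerate(text):
        if ch in counts:
            counts[ch] += 1
        else:
            foreign += 1
        v = vector()
        if v in last_seen:
            # The previous member i_r of the class and the current j = i_{r+1}.
            result.append((last_seen[v] + 1, j))
        last_seen[v] = j
    return result
```

On T = abbcaaaaabaab and P = aba, the longer appearance T[7, 12] is not reported directly, but by the claim it can be recovered by chaining the reported appearances T[7, 9] and T[10, 12].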
85
Running Time Permuted Scaled Matching: The running time is: O(n|Σ|).
86
For efficiency Need to generate the vectors quickly. Need to compare vectors quickly. Idea: hash
87
Need hash on vectors that can be modified quickly if vector changes very little. Use: hash – similar to Karp-Rabin
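One way such a hash could look is a polynomial fingerprint whose value is patched when a single entry changes. The sketch below is only in the spirit of Karp-Rabin and is not taken from the presentation: the modulus, the base, and the class name `VectorHash` are all choices made for illustration.

```python
MOD = (1 << 61) - 1   # a Mersenne prime, chosen here for illustration
BASE = 1_000_003      # arbitrary base, also an assumption of this sketch

class VectorHash:
    """Fingerprint sum(values[p] * BASE**p) mod MOD, updatable in O(1) per changed entry."""

    def __init__(self, length):
        self.values = [0] * length
        self.powers = [pow(BASE, p, MOD) for p in range(length)]
        self.hash = 0

    def set(self, pos, new_value):
        # Only the contribution of the changed entry is added or removed.
        delta = new_value - self.values[pos]
        self.hash = (self.hash + delta * self.powers[pos]) % MOD
        self.values[pos] = new_value

def full_hash(values):
    """Recompute the same fingerprint from scratch, for comparison."""
    return sum(v * pow(BASE, p, MOD) for p, v in enumerate(values)) % MOD
```

Because only a constant number of vector entries change from one location to the next, each step would cost a constant number of `set` calls instead of a pass over all entries. Equal fingerprints do not guarantee equal vectors, so a randomized scheme of this kind has a small collision probability.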
88
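[figure: the vectors at locations i and i+1, with mod-equivalent entries for a, b, c, d, e, f and equal-quotients entries for a-b, b-c, c-d, d-e, e-f]
Moving from location i to i+1, at most 1 entry changes in the mod-equivalent part and at most 2 entries change in the equal-quotients part.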
89
[figure: the vectors at locations 8 and 9 for T = cbbccaaccbacb and P = bcaaaca, showing which entries change between the two locations]
91
The running time can be improved to:
o Deterministic: O(n log |Σ|)
o Randomized: O(n)