2 Dimensional Parameterized Matching. Carmit Hazay, Moshe Lewenstein, Dekel Tsur.


1 1 2 Dimensional Parameterized Matching. Carmit Hazay, Moshe Lewenstein, Dekel Tsur

2 2 CPM 2005


14 14 Parameterized Matching Input: two strings s and t, |s|=|t|, over alphabets Σ_s and Σ_t. s parameterize-matches t if there is a bijection π: Σ_s → Σ_t such that π(s) = t. Example: s = aabbb, t = xxyyy, with π(a)=x, π(b)=y.
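The definition above can be checked directly by building the character mapping while scanning both strings. The following is a minimal sketch, not taken from the slides; the function name `p_match` is hypothetical.

```python
def p_match(s, t):
    """Return True if some bijection between the characters of s and the
    characters of t maps s onto t position by position."""
    if len(s) != len(t):
        return False
    forward = {}   # character of s -> character of t
    backward = {}  # character of t -> character of s
    for a, b in zip(s, t):
        # setdefault stores the first pairing seen and returns the stored
        # pairing afterwards, so any later conflict shows up as inequality.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True
```

For the slide's example, `p_match("aabbb", "xxyyy")` holds, while `p_match("ab", "xx")` does not because two different characters would map to the same one.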

15 15 Parameterized Matching Input: two strings T, P; |T|=n, |P|=m. Output: all text locations i such that π(P) = T_i … T_{i+m-1} for some bijection π.

16 16 2D Parameterized Matching Input: text T and pattern P; |T|=n×n, |P|=m×m. Output: all text locations (i,j) such that π(P) = T_{i,j} … T_{i+m-1,j+m-1} for some bijection π. Example (rows separated by /): T = a b c / a a b / b b b, P = x y z / x x y / y y y, with π(x)=a, π(y)=b, π(z)=c.

17 17 2D Parameterized Matching ‘A horse is a horse, it ain’t make a difference what color it is’ (John Wayne)

18 18 Parameterized Matching History Introduced by Brenda Baker [Baker93]. Others: [AFM94], [Bak95], [Bak97]. Two dimensions: [AACLP03], [this work]. Used in scaled matching [ABL99]. Periodicity of parameterized matching [ApostolicoGiancarlo]. Approximate parameterized matching [AEL], [HLS04].

19 19 Naïve Algorithm For every location (i,j) of the text, check whether P parameterized-matches at (i,j): 1. For each a ∈ alphabet of P, check that all a’s of P align with the same text character. 2. For each b ∈ alphabet of T, check that all b’s of T align with the same pattern character.

20 20 Naïve Algorithm Time analysis: if done properly, O(n^2 m^2).
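One way to read "done properly" is to keep two dictionaries per text location, so each of the m^2 aligned cells costs expected constant time. The sketch below makes that assumption; the function name and the 0-based coordinates are choices made here, not taken from the slides.

```python
def naive_2d_p_match(text, pattern):
    """text and pattern are lists of equal-length strings (rows).
    Returns the 0-based (row, column) corners where pattern p-matches."""
    n_rows, n_cols = len(text), len(text[0])
    m_rows, m_cols = len(pattern), len(pattern[0])
    hits = []
    for i in range(n_rows - m_rows + 1):
        for j in range(n_cols - m_cols + 1):
            p_to_t, t_to_p = {}, {}   # the two checks of the naive algorithm
            ok = True
            for r in range(m_rows):
                for c in range(m_cols):
                    a, b = pattern[r][c], text[i + r][j + c]
                    # Step 1: all a's of P align with the same text character.
                    # Step 2: all b's of T align with the same pattern character.
                    if p_to_t.setdefault(a, b) != b or t_to_p.setdefault(b, a) != a:
                        ok = False
                        break
                if not ok:
                    break
            if ok:
                hits.append((i, j))
    return hits
```

With the example grids from the 2D definition, `naive_2d_p_match(["abc", "aab", "bbb"], ["xyz", "xxy", "yyy"])` reports the single corner `(0, 0)`.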

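21 21 Mismatch pairs A pair of locations on which the two strings disagree in the parameterized sense: the characters at the two locations are equal in one string but different in the other. Example: a a b a a a and x x y x z y; locations 4 and 5 hold a, a in the first string but x, z in the second.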
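To make the notion concrete, the sketch below lists all mismatch pairs between two equal-length strings. It assumes the usual definition (two locations whose characters are equal in one string and different in the other); the function name and the quadratic enumeration are illustrative only.

```python
from itertools import combinations

def mismatch_pairs(s, t):
    """Return the 1-based location pairs (i, j), i < j, on which s and t
    disagree: equal characters in one string, different in the other."""
    if len(s) != len(t):
        raise ValueError("strings must have equal length")
    pairs = []
    for i, j in combinations(range(len(s)), 2):
        if (s[i] == s[j]) != (t[i] == t[j]):
            pairs.append((i + 1, j + 1))
    return pairs
```

Two strings p-match exactly when this list is empty, which is why a single mismatch pair is enough to rule out a match.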

22 22 1D Encoding Encode every text location by its predecessor location: the position of the first occurrence of the same character to its left, or 0 if there is none. Example (figure): the a’s of T, at positions 1 3 6 13 14 15 16 17 18, are encoded as 0 1 3 6 13 14 15 16 17.

23 23 1D Encoding Two p-matching strings have the same encoded strings. S = a b b c b a a c b b c b a; T = x y y z y x x z y y z y x; Encoded S = Encoded T = 0 0 2 0 3 1 6 4 5 9 8 10 7.

24 24 1D Encoding Hence, to check whether two strings p-match, it is enough to compare their encoded strings: a reduction to the exact matching problem. S = a b b c b b a c b b c b a; Encoded S = 0 0 2 0 3 5 1 4 6 9 8 10 7. T = x y y z y x x z y y z y x; Encoded T = 0 0 2 0 3 1 6 4 5 9 8 10 7. The encodings differ, so S and T do not p-match.
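The predecessor encoding takes one left-to-right pass with a table of last-seen positions. A minimal sketch, using 1-based positions and 0 for "no predecessor" as on the slides; the name `pred_encode` is hypothetical.

```python
def pred_encode(s):
    """Replace every position by the 1-based position of the previous
    occurrence of the same character, or 0 if there is none."""
    last = {}      # character -> most recent 1-based position
    encoded = []
    for pos, ch in enumerate(s, start=1):
        encoded.append(last.get(ch, 0))
        last[ch] = pos
    return encoded
```

Comparing `pred_encode("abbcbaacbbcba")` with `pred_encode("xyyzyxxzyyzyx")` then decides the p-match with an ordinary equality test.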

25 25 2D Mismatch Pairs Same as 1D mismatch pairs, but with 2D strings. Example: a b a b a b and x y x y y y.

26 26 2D Encoding First idea: encode the linearization of the text and the pattern. [Figure: a 2D text box reading "As you will see this box frames the text that it contains. That is 2D text all in this little box.", shown together with its linearization.]


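28 28 2D Encoding First idea: encode the linearization of the text and the pattern. Overflow problem!! In the linearized text, the predecessor of a location can fall outside the window aligned with the pattern, so the encodings of text and pattern no longer correspond. [Figure labels: b b b; "Different character than b"; a a.]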

29 29 2D Encoding Second idea: use strips. Strip: a substring of T of size n×m. The i-th strip of T is the n×m substring T[1:n, i:i+m-1].

30 30 Second Solution For the pattern P, compute predecessors on its linearization. For each strip of T, compute predecessors on its linearization. Do pattern matching for each strip. Time: O(n^2 m). Can we do better?
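The strip idea can be sketched as follows, under several assumptions made here: rows are concatenated to linearize, predecessors are stored as distances, and a text predecessor that falls before the start of the current window is treated as "none". The window comparison below is deliberately simple and costs O(m^2) per window, so this sketch does not reach the O(n^2 m) bound; that would need a linear-time matcher on the encoded strings for each strip. All names are hypothetical.

```python
def pred_positions(seq):
    """0-based position of the previous equal character, or -1 if none."""
    last, out = {}, []
    for pos, ch in enumerate(seq):
        out.append(last.get(ch, -1))
        last[ch] = pos
    return out

def strip_p_match(text, pattern):
    """text: n strings of length n; pattern: m strings of length m.
    Returns 0-based (row, column) corners where pattern p-matches."""
    n, m = len(text), len(pattern)
    p_lin = "".join(pattern)
    # Pattern encoding: distance back to the predecessor, 0 if there is none.
    p_enc = [k - p if p >= 0 else 0 for k, p in enumerate(pred_positions(p_lin))]
    hits = []
    for j in range(n - m + 1):                          # strip = columns j..j+m-1
        strip = "".join(row[j:j + m] for row in text)   # row-by-row linearization
        s_pred = pred_positions(strip)
        for i in range(n - m + 1):                      # window = rows i..i+m-1
            start = i * m                               # window is contiguous in the strip
            ok = True
            for k in range(m * m):
                p = s_pred[start + k]
                # Predecessors before the window start do not count.
                d = start + k - p if p >= start else 0
                if d != p_enc[k]:
                    ok = False
                    break
            if ok:
                hits.append((i, j))
    return hits
```

Because each strip is exactly m columns wide, every m×m window is a contiguous block of the strip's linearization, which is what removes the overflow problem of linearizing the whole text.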

31 31 A Faster Solution Set into the duel-and-sweep setting. Needs special care for both duel and sweep. Especially difficult: pattern preprocessing. Desired time: O(n^2 + poly(m)). We achieve: O(n^2 + m^2.5 polylog m).

32 32 Remember… Observation: T p-matches P ⇔ no text location and its predecessor form a mismatch pair, and the number of distinct characters in P and in T is equal.
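The observation can be tried out on 1D strings, which stand in for a linearized pattern and text window. The sketch assumes a mismatch pair means two locations whose characters are equal in one string and different in the other; the function name is hypothetical.

```python
def p_match_by_predecessors(p, t):
    """Decide whether t p-matches p by checking only (location, predecessor)
    pairs of t, plus the number of distinct characters."""
    if len(p) != len(t):
        return False
    last = {}  # character of t -> its most recent position
    for i, ch in enumerate(t):
        j = last.get(ch)
        # t[i] == t[j] by construction, so (j, i) is a mismatch pair
        # exactly when the pattern characters at j and i differ.
        if j is not None and p[i] != p[j]:
            return False
        last[ch] = i
    # Equal text characters now imply equal pattern characters; equal
    # alphabet sizes rule out two text characters sharing one pattern character.
    return len(set(p)) == len(set(t))
```

The count condition is not redundant: for `p = "aa"` and `t = "xy"` no text location has a predecessor, so only the differing alphabet sizes reveal that the strings do not p-match.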

33 33 Algorithm Outline Duel-and-sweep paradigm: 1. Find candidates (dueling). 2. Divide candidates by strips. 3. Update predecessors for every new strip. 4. Check new predecessors (sweep). Assume the pattern witness table is given.

34 34 Witness A witness is a mismatch pair between P and a copy of P aligned at location (a,b), that is, shifted by (+a, +b).

35 35 Set Candidates Using duels: any two text locations that have a witness within their alignment cannot both be matches, so a duel eliminates at least one of them. Apply the algorithm of [ABF94] and return the list of candidates. Time: O(n^2).

36 36 Sweep Technique Observation: all candidates agree with each other. Hence, a mismatch pair eliminates all candidates containing it. Therefore, for every predecessor, it is enough to find one candidate that contains it.

37 37 Sweep Technique How to find? Create a new 2m×2m array A such that A[i,j] = the largest row among candidates that start at column j and overlap row i.

38 38 Sweep Technique For every predecessor pair (i,j), (x,y), use a range minimum query to find the highest candidate that contains the pair.
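The slides do not say which range-minimum structure is used. As a generic illustration only, the sketch below is a standard sparse table over a 1D list: O(k log k) preprocessing for k values and O(1) per query. How the rows of the array A are fed into such a structure is not specified in the slides, and the class name is hypothetical.

```python
class SparseTableRMQ:
    """Static range-minimum queries over a non-empty list of comparable values."""

    def __init__(self, values):
        # table[k][i] holds min(values[i : i + 2**k]).
        self.table = [list(values)]
        n, span = len(values), 1
        while 2 * span <= n:
            prev = self.table[-1]
            self.table.append(
                [min(prev[i], prev[i + span]) for i in range(n - 2 * span + 1)]
            )
            span *= 2

    def query(self, lo, hi):
        """Minimum of values[lo..hi], both ends inclusive, 0-based."""
        level = (hi - lo + 1).bit_length() - 1   # largest power of two that fits
        row = self.table[level]
        # Two overlapping blocks of length 2**level cover the whole range.
        return min(row[lo], row[hi - (1 << level) + 1])
```

A query for the smallest row index in a column range corresponds to the highest candidate when rows are numbered from the top, which is one plausible reading of "highest" here.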

39 39 Sweep Technique In case of a mismatch pair, eliminate all candidates containing it. How? Use a mismatch vector. Every mismatch pair translates into a range, and all candidates within this range are eliminated. For each new strip, delete the old mismatches and add the new ones.

40 40 Sweep Technique Reminder: T p-matches P ⇔ no text location and its predecessor form a mismatch pair, and the number of distinct characters in P and in T is equal. Left to do? Count the distinct characters for every candidate. Use the algorithm of Amir and Cole, time O(m^2).

41 41 Overview Checking all predecessors takes linear time. Total time: O(n^2).

42 42 Pattern Preprocessing Witness: a mismatch pair between P and a copy of P aligned at location (a,b), that is, shifted by (+a, +b).

43 43 Pattern Preprocessing Find the witness table for P in time O(m^2.5 polylog m). For every pattern location (i,j), create a list of O( ) pointers. Pointer i is predecessor in lines above (i,j). Reduce to exact matching with don’t cares.

44 44 Pattern Preprocessing End cases, multiple cases. [Figure: regions A1, A2, A3, A4 and B1, B2, B3, B4; label "Less than".]

45 45 Open Questions Can the algorithm’s time complexity be reduced to O(n^2 + m^2)?

