2-Dimensional Parameterized Matching
Carmit Hazay, Moshe Lewenstein, Dekel Tsur

CPM 2005

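Parameterized Matching
Input: two strings s and t with |s| = |t|, over alphabets $\Sigma_s$ and $\Sigma_t$. We say that s parameterize-matches (p-matches) t if there is a bijection $\pi: \Sigma_s \to \Sigma_t$ such that $\pi(s) = t$, with $\pi$ applied character by character.
Example: s = aabbb and t = xxyyy p-match via $\pi(a) = x$, $\pi(b) = y$.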
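A minimal Python sketch of the definition, assuming the strings are given as Python `str` values. The name `p_matches` is hypothetical. It builds $\pi$ and its inverse while scanning and fails as soon as either one stops being a function, which is exactly the bijection condition.

```python
def p_matches(s: str, t: str) -> bool:
    """True iff some bijection pi satisfies pi(s) = t, character by character."""
    if len(s) != len(t):
        return False
    fwd: dict = {}  # pi: alphabet of s -> alphabet of t
    bwd: dict = {}  # pi^{-1}: alphabet of t -> alphabet of s
    for a, x in zip(s, t):
        # setdefault records the first pairing and returns the stored value,
        # so a conflict with an earlier pairing shows up as an inequality.
        if fwd.setdefault(a, x) != x or bwd.setdefault(x, a) != a:
            return False
    return True


print(p_matches("aabbb", "xxyyy"))  # True: pi(a) = x, pi(b) = y
print(p_matches("ab", "xx"))        # False: pi would not be injective
```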

Parameterized Matching
Input: two strings T and P with |T| = n and |P| = m.
Output: all text locations i such that $\pi(P) = T_i \cdots T_{i+m-1}$ for some bijection $\pi$.

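2D Parameterized Matching
Input: a text T of size n×n and a pattern P of size m×m.
Output: all text locations (i,j) such that $\pi(P) = T[i..i+m-1,\; j..j+m-1]$ for some bijection $\pi$.
Example: the pattern P with rows xyz, xxy, yyy p-matches the text block with rows abc, aab, bbb via $\pi(x) = a$, $\pi(y) = b$, $\pi(z) = c$.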

2D Parameterized Matching
'A horse is a horse, it ain't make a difference what color it is' (John Wayne)

Parameterized Matching History
Introduced by Brenda Baker [Baker93]; further work in [AFM94], [Bak95], [Bak97].
Two dimensions: [AACLP03] and this work.
Used in scaled matching [ABL99].
Periodicity of parameterized matching [ApostolicoGiancarlo].
Approximate parameterized matching [AEL], [HLS04].

Naïve Algorithm
For every text location (i,j), check whether P p-matches the m×m window at (i,j):
1. For each character a in the alphabet of P, check that all occurrences of a in P align with the same text character.
2. For each character b in the alphabet of the text window, check that all occurrences of b align with the same pattern character.

Naïve Algorithm
Time analysis: if done properly, $O(m^2)$ per text location, so $O(n^2 m^2)$ in total.
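A direct Python sketch of the naïve algorithm. It assumes 0-indexed coordinates and rows given as strings, and the function name is hypothetical. Both checks from the slide are done in one pass with two dictionaries, so each location costs $O(m^2)$ expected time.

```python
def p_match_2d_naive(T: list, P: list) -> list:
    """All 0-indexed (i, j) where the m x m window of the n x n text T at (i, j)
    p-matches the m x m pattern P. Rows are strings (or lists)."""
    n, m = len(T), len(P)
    hits = []
    for i in range(n - m + 1):
        for j in range(n - m + 1):
            fwd, bwd = {}, {}  # pattern char -> text char, text char -> pattern char
            ok = True
            for r in range(m):
                for c in range(m):
                    p, t = P[r][c], T[i + r][j + c]
                    # Check 1: all occurrences of p align with one text character.
                    # Check 2: all occurrences of t align with one pattern character.
                    if fwd.setdefault(p, t) != t or bwd.setdefault(t, p) != p:
                        ok = False
                        break
                if not ok:
                    break
            if ok:
                hits.append((i, j))
    return hits


print(p_match_2d_naive(["abc", "aab", "bbb"], ["xyz", "xxy", "yyy"]))  # [(0, 0)]
```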

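Mismatch Pairs
A mismatch pair is a pair of locations whose characters are equal in one string but different in the other, so no bijection can map one string onto the other at those locations.
Example: s = aabaaa and t = xxyxzy. Locations 1 and 5 hold a and a in s but x and z in t. Locations 3 and 6 hold b and a in s but y and y in t. Both are mismatch pairs.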
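A brute-force Python sketch that lists all mismatch pairs. It is 0-indexed, so the slide's locations 1 and 5 become (0, 4). The name `mismatch_pairs` is hypothetical. Two equal-length strings p-match exactly when this list is empty.

```python
from itertools import combinations


def mismatch_pairs(s: str, t: str) -> list:
    """All index pairs (i, j), i < j, equal in one string but not in the other."""
    assert len(s) == len(t)
    return [(i, j) for i, j in combinations(range(len(s)), 2)
            if (s[i] == s[j]) != (t[i] == t[j])]


print(mismatch_pairs("aabaaa", "xxyxzy"))
# [(0, 4), (0, 5), (1, 4), (1, 5), (2, 5), (3, 4), (3, 5), (4, 5)]
```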

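1D Encoding
Encode every text location by its predecessor, the nearest location to its left that holds the same character (for example, an a points to the first a to its left).
[Figure: a text T, arrows from each location to its predecessor, and the encoded T.]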

1D Encoding
Two strings that p-match have the same encoding.
Example: S = abbcbaacbbcba and T = xyyzyxxzyyzyx p-match via a→x, b→y, c→z, so their encodings are identical.
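A Python sketch of the encoding. It stores the distance back to the predecessor rather than the predecessor's absolute location. This is an equivalent choice (the location minus the distance recovers the predecessor) that makes codes comparable across strings. The name `prev_encode` is hypothetical, and 0 marks a location with no predecessor. Because a bijection preserves equality of characters, p-matching strings get identical encodings.

```python
def prev_encode(s) -> list:
    """Replace each location by the distance back to its predecessor (the
    nearest equal character to its left), or by 0 if it has none."""
    last = {}  # character -> location of its latest occurrence so far
    out = []
    for i, c in enumerate(s):
        out.append(i - last[c] if c in last else 0)
        last[c] = i
    return out


S, T = "abbcbaacbbcba", "xyyzyxxzyyzyx"
print(prev_encode(S))                    # [0, 0, 1, 0, 2, 5, 1, 4, 4, 1, 3, 2, 6]
print(prev_encode(S) == prev_encode(T))  # True: S and T p-match
```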

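1D Encoding
Hence, to check whether two strings p-match, it is enough to compare their encodings. This reduces parameterized matching to an exact matching problem.
The slide's second pair, S = abbcbbacbbcba and T = xyyzyxxzyyzyx, differs from the previous pair at location 6. There S pairs b with x, although location 2 pairs b with y, so S and T do not p-match and their encodings differ.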
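Scanning a text with this reduction needs one adjustment. A predecessor that lies before the start of the current window must be read as "no predecessor". The Python sketch below plugs that adjusted comparison into KMP, in the spirit of the KMP-style algorithm of [AFM94], and runs in O(n + m) time. The function names are hypothetical, and this is not claimed to be the construction used in the slides.

```python
def prev_encode(s) -> list:
    """Distance back to the previous occurrence of the same symbol, 0 if none."""
    last, out = {}, []
    for i, c in enumerate(s):
        out.append(i - last[c] if c in last else 0)
        last[c] = i
    return out


def p_match_1d(text, pattern) -> list:
    """0-indexed starts i where pattern p-matches text[i:i+m] (KMP over encodings)."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    pp, pt = prev_encode(pattern), prev_encode(text)

    def cap(v: int, q: int) -> int:
        # Inside a window whose current length is q, a predecessor more than
        # q positions back lies outside the window, so it counts as "none".
        return v if v <= q else 0

    # fail[j]: length of the longest proper prefix of pattern[:j+1] that
    # p-matches the suffix of pattern[:j+1] of the same length.
    fail, k = [0] * m, 0
    for j in range(1, m):
        while k > 0 and cap(pp[j], k) != pp[k]:
            k = fail[k - 1]
        if cap(pp[j], k) == pp[k]:
            k += 1
        fail[j] = k

    hits, q = [], 0  # q: length of the currently p-matched prefix
    for i in range(len(text)):
        while q > 0 and cap(pt[i], q) != pp[q]:
            q = fail[q - 1]
        if cap(pt[i], q) == pp[q]:
            q += 1
        if q == m:
            hits.append(i - m + 1)
            q = fail[q - 1]
    return hits


print(p_match_1d("aabccd", "xxy"))  # [0, 3]: windows "aab" and "ccd"
```

The failure function is valid because p-matching is an equivalence relation that is preserved by taking aligned substrings. Every p-border of a prefix is therefore reached by following `fail`, exactly as in ordinary KMP.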

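2D Mismatch Pairs
Defined as in 1D, but over 2D arrays.
Example: the arrays a b a b a b and x y x y y y, with cells listed in reading order. The first and fifth cells hold a and a in the first array but x and y in the second, so they form a mismatch pair.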

2D Encoding
First idea: encode the linearizations (row-by-row concatenations) of the text and the pattern.
[Illustration: the sentence "As you will see this box frames the text that it contains. That is 2D text all in this little box." shown framed in a 2D box and then read row by row.]

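2D Encoding
First idea, continued: linearizing the whole text causes an overflow problem. In the linearized text, the predecessor of a location inside an m×m window may lie outside the window, in the part of a row beyond the window's columns. The encoding then no longer describes the window alone.
[Figure: a b whose predecessor b falls outside the window, while the window holds a different character than b at that point.]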

2D Encoding
Second idea: use strips. A strip is an n×m subarray of T. The i-th strip of T is $T[1..n,\; i..i+m-1]$, that is, all rows and columns i through i+m-1.

Second Solution
Compute predecessors on the linearization of P. For each strip of T, compute predecessors on the strip's linearization, then run 1D pattern matching on that strip. There are n−m+1 strips of size nm each, so the total time is $O(n^2 m)$. Can we do better?
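A Python sketch of the strip-based solution under stated assumptions: 0-indexed coordinates, rows given as strings, and a 1D step done with a KMP-style p-matcher over predecessor-distance encodings. The slides do not say which 1D algorithm is used, so that part is an assumption. Only matches that start on a row boundary of the strip's linearization correspond to m×m windows. Each strip costs O(nm), for $O(n^2 m)$ in total. All names are hypothetical.

```python
def _prev(seq) -> list:
    last, out = {}, []
    for i, c in enumerate(seq):
        out.append(i - last[c] if c in last else 0)  # 0 = no predecessor
        last[c] = i
    return out


def _p_match_1d(text, pat) -> list:
    """Starts where pat p-matches text (KMP over predecessor distances)."""
    m = len(pat)
    pp, pt = _prev(pat), _prev(text)

    def cap(v: int, q: int) -> int:
        return v if v <= q else 0  # predecessor outside the window -> none

    fail, k = [0] * m, 0
    for j in range(1, m):
        while k and cap(pp[j], k) != pp[k]:
            k = fail[k - 1]
        if cap(pp[j], k) == pp[k]:
            k += 1
        fail[j] = k
    hits, q = [], 0
    for i in range(len(text)):
        while q and cap(pt[i], q) != pp[q]:
            q = fail[q - 1]
        if cap(pt[i], q) == pp[q]:
            q += 1
        if q == m:
            hits.append(i - m + 1)
            q = fail[q - 1]
    return hits


def p_match_2d_strips(T: list, P: list) -> list:
    """All 0-indexed (i, j) where P p-matches T, one strip at a time."""
    n, m = len(T), len(P)
    flat_p = [ch for row in P for ch in row]  # row-major linearization of P
    hits = []
    for j in range(n - m + 1):  # strip = all rows, columns j..j+m-1
        strip = [ch for row in T for ch in row[j:j + m]]
        for start in _p_match_1d(strip, flat_p):
            if start % m == 0:  # window aligned with a strip row = m x m block
                hits.append((start // m, j))
    return sorted(hits)


print(p_match_2d_strips(["abab", "abab", "cccc", "dede"], ["xy", "xy"]))
# [(0, 0), (0, 1), (0, 2)]
```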

A Faster Solution
Cast the problem in the duel-and-sweep setting. Both the duel and the sweep need special care, and the pattern preprocessing is especially difficult.
Desired time: $O(n^2 + \mathrm{poly}(m))$.
We achieve: $O(n^2 + m^{2.5} \operatorname{polylog} m)$.

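Remember…
Observation: a text window p-matches P if and only if (1) no text location and its predecessor form a mismatch pair, and (2) P and the window contain the same number of distinct characters. Condition (1) makes the map from text characters to pattern characters well defined, and condition (2) makes it a bijection.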
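The Python sketch below checks the observation on one aligned window. It assumes that "predecessor" means the previous cell of the window, in row-major order, that holds the same text character. The slides define predecessors on linearizations, so that reading is an assumption. The name `window_p_matches` is hypothetical. The second check is what rejects windows whose characters collapse onto fewer pattern characters.

```python
def window_p_matches(W: list, P: list) -> bool:
    """Check the slide's two conditions for an m x m text window W against P."""
    flat_w = [ch for row in W for ch in row]  # row-major linearization
    flat_p = [ch for row in P for ch in row]
    last = {}  # text character -> index of its latest occurrence so far
    for k, ch in enumerate(flat_w):
        # (1) A location and its predecessor hold the same text character, so
        #     they form a mismatch pair exactly when the pattern characters differ.
        if ch in last and flat_p[last[ch]] != flat_p[k]:
            return False
        last[ch] = k
    # (2) Equal numbers of distinct characters make the induced map a bijection.
    return len(set(flat_w)) == len(set(flat_p))


print(window_p_matches(["ab", "ab"], ["xx", "xx"]))  # False: fails only condition (2)
```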

Algorithm Outline
Duel-and-sweep paradigm, assuming the pattern's witness table is given:
1. Find candidates by dueling.
2. Divide the candidates by strips.
3. Update the predecessors for every new strip.
4. Check the new predecessors (sweep).

Witness
A witness for offset (a,b) is a mismatch pair between P and a copy of P shifted by (a,b), taken inside their overlap.
[Figure: P and its copy shifted by +a rows and +b columns.]
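A brute-force Python sketch that finds a witness for one offset in $O(m^2)$ time. It assumes 0-indexed coordinates in which cell (r, c) of the unshifted copy lies under cell (r − a, c − b) of the shifted copy, and offsets may be negative. It returns the mismatch pair in unshifted coordinates, or `None` when the overlap p-matches. This is only an illustration: running it for every offset costs $O(m^4)$, far from the $O(m^{2.5} \operatorname{polylog} m)$ preprocessing claimed in the slides. The name `find_witness` is hypothetical.

```python
def find_witness(P: list, a: int, b: int):
    """Mismatch pair between P and its copy shifted by (a, b), or None.

    Cell (r, c) of the unshifted copy lies under cell (r - a, c - b) of the
    shifted copy; the pair is returned in unshifted coordinates."""
    m = len(P)
    fwd = {}  # char in unshifted copy -> (aligned char in shifted copy, cell)
    bwd = {}  # char in shifted copy -> (aligned char in unshifted copy, cell)
    for r in range(max(0, a), min(m, m + a)):
        for c in range(max(0, b), min(m, m + b)):
            x, y = P[r][c], P[r - a][c - b]
            if x in fwd and fwd[x][0] != y:
                return fwd[x][1], (r, c)  # equal in unshifted, different in shifted
            if y in bwd and bwd[y][0] != x:
                return bwd[y][1], (r, c)  # equal in shifted, different in unshifted
            fwd.setdefault(x, (y, (r, c)))
            bwd.setdefault(y, (x, (r, c)))
    return None


print(find_witness(["xx", "xy"], 0, 1))  # ((0, 1), (1, 1))
```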

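Set Candidates
Use dueling. If the alignments of two text locations have a witness in their overlap, at most one of them can be a match. Comparing the text at the witness locations eliminates one of the two. Apply the algorithm of [ABF94] to obtain the list of surviving candidates. Time: $O(n^2)$.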
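A Python sketch of a single duel with 0-indexed coordinates. It assumes c2 = c1 + (a, b) and that the witness (u, v) is given in the coordinates of the copy of P at c1. The two copies require different relations (equal versus not equal) at the witness cells, while the text has only one. Exactly one candidate therefore disagrees with the text and is eliminated. The loser cannot be a match, and the winner is merely not yet refuted. The name `duel` is hypothetical.

```python
def duel(T: list, P: list, c1: tuple, c2: tuple, witness: tuple) -> tuple:
    """Return the surviving candidate of c1 and c2 = c1 + (a, b).

    `witness` = (u, v): cells in the coordinates of the copy of P at c1 that
    the copy at c2 relates differently (equal versus not equal)."""
    (i1, j1), (i2, j2) = c1, c2
    a, b = i2 - i1, j2 - j1
    (ur, uc), (vr, vc) = witness
    rel1 = P[ur][uc] == P[vr][vc]                  # relation required by c1
    rel2 = P[ur - a][uc - b] == P[vr - a][vc - b]  # relation required by c2
    assert rel1 != rel2, "not a witness for this offset"
    text_rel = T[i1 + ur][j1 + uc] == T[i1 + vr][j1 + vc]
    # The text satisfies exactly one of the two relations; the other candidate loses.
    return c1 if text_rel == rel1 else c2


print(duel(["aaa", "aab"], ["xx", "xy"], (0, 0), (0, 1), ((0, 1), (1, 1))))  # (0, 1)
```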

Sweep Technique
Observation: after dueling, all remaining candidates agree with each other on the locations they share. Hence a mismatch pair eliminates every candidate that contains it. Therefore, for every (location, predecessor) pair, it is enough to find one candidate that contains both.

Sweep Technique
How do we find such a candidate? Create a new 2m×2m array A such that A[i,j] is the largest row among the candidates that start at column j and overlap row i.

Sweep Technique
For every location (i,j) and its predecessor (x,y), use a range-minimum query to find the highest candidate that contains both.
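The slides do not say which range-minimum structure is used. A common choice is a sparse table, sketched below in Python, with O(k log k) preprocessing on an array of length k and O(1) time per query. How it is laid over the rows of A is left to the algorithm, and the class name is hypothetical.

```python
class SparseTableMin:
    """Static range-minimum queries: O(k log k) build, O(1) per query."""

    def __init__(self, values: list):
        # table[L][i] holds min(values[i : i + 2**L]).
        self.table = [list(values)]
        span = 1
        while 2 * span <= len(values):
            prev = self.table[-1]
            self.table.append([min(prev[i], prev[i + span])
                               for i in range(len(values) - 2 * span + 1)])
            span *= 2

    def query(self, lo: int, hi: int):
        """Minimum of values[lo..hi] (inclusive); requires 0 <= lo <= hi < k."""
        level = (hi - lo + 1).bit_length() - 1  # largest L with 2**L <= length
        row = self.table[level]
        # Two (possibly overlapping) blocks of length 2**level cover [lo, hi].
        return min(row[lo], row[hi - (1 << level) + 1])


st = SparseTableMin([5, 2, 7, 1, 9, 3])
print(st.query(2, 5))  # 1
```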

Sweep Technique
If a mismatch pair is found, eliminate all candidates containing it. How? Use a mismatch vector: every mismatch pair translates into a range of candidates, and all candidates within that range are eliminated. For each new strip, delete the old mismatches and add the new ones.

Sweep Technique
Reminder: a text window p-matches P if and only if no text location and its predecessor form a mismatch pair, and P and the window contain the same number of distinct characters.
What is left? Count the distinct characters of every candidate, using the algorithm of Amir and Cole, in time $O(m^2)$.
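For small inputs, the distinct-character counts can also be computed with a simple sliding window, sketched below in Python. This is not the algorithm of Amir and Cole cited on the slide. It costs $O(n^2 m)$ time on an n×n text, because each one-column slide touches m cells. The function name is hypothetical, and coordinates are 0-indexed.

```python
from collections import Counter


def distinct_counts(T: list, m: int) -> list:
    """D[i][j] = number of distinct characters in the m x m window of T at (i, j)."""
    n = len(T)
    D = []
    for i in range(n - m + 1):
        cnt = Counter()
        for r in range(i, i + m):  # window at column 0 of this band of rows
            cnt.update(T[r][:m])
        row = [len(cnt)]
        for j in range(1, n - m + 1):  # slide one column to the right
            for r in range(i, i + m):
                old, new = T[r][j - 1], T[r][j + m - 1]
                cnt[old] -= 1
                if cnt[old] == 0:
                    del cnt[old]  # keep len(cnt) equal to the distinct count
                cnt[new] += 1
            row.append(len(cnt))
        D.append(row)
    return D


print(distinct_counts(["abab", "abab", "cccc", "dede"], 2))
# [[2, 2, 2], [3, 3, 3], [3, 3, 3]]
```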

Overview
Checking all predecessors takes time linear in the text size, so the text phase takes $O(n^2)$ time in total.

Pattern Preprocessing
Recall that a witness for offset (a,b) is a mismatch pair between P and its copy shifted by (a,b). The preprocessing computes a witness for every offset.

Pattern Preprocessing
Find the witness table of P in time $O(m^{2.5} \operatorname{polylog} m)$. For every pattern location (i,j), create a list of O( ) pointers, where the k-th pointer is a predecessor of (i,j) in the rows above it. Then reduce the problem to exact matching with don't cares.
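The slide does not spell out how the pointer lists become a don't-care matching instance, so the Python sketch below only illustrates the target problem: exact matching where a wildcard matches anything, in either string. It uses the sum $\sum_j p_j t_{i+j} (p_j - t_{i+j})^2$, with wildcards encoded as 0 and every other symbol as a positive integer. Each term is non-negative and vanishes exactly when a wildcard is involved or the symbols agree. Expanding the square gives three correlations that can be computed with FFTs in O(n log m) time. Here the sum is evaluated directly, in O(nm), for clarity. The wildcard character `?` and the function name are assumptions.

```python
def match_with_dont_cares(text: str, pattern: str, wild: str = "?") -> list:
    """0-indexed starts i where pattern matches text[i:i+m]; `wild` matches anything."""
    def enc(s: str) -> list:
        # Wildcard -> 0; any other character -> its code point (assumed nonzero).
        return [0 if ch == wild else ord(ch) for ch in s]

    t, p = enc(text), enc(pattern)
    n, m = len(t), len(p)
    hits = []
    for i in range(n - m + 1):
        # sum_j p_j t_{i+j} (p_j - t_{i+j})^2 = sum p^3 t - 2 p^2 t^2 + p t^3,
        # i.e. three correlations; computed directly here instead of with FFTs.
        total = sum(p[j] * t[i + j] * (p[j] - t[i + j]) ** 2 for j in range(m))
        if total == 0:
            hits.append(i)
    return hits


print(match_with_dont_cares("a?cab", "ab"))  # [0, 3]
```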

Pattern Preprocessing
End cases and multiple cases.
[Figure: regions A1, A2, A3, A4 and B1, B2, B3, B4, annotated "less than".]

Open Questions
Can the time complexity of the algorithm be reduced to $O(n^2 + m^2)$?