Function Matching Amihood Amir Yonatan Aumann Moshe Lewenstein Ely Porat Bar Ilan University.

Baker's Parameterized Matching
Prog.c:
    int a,b;
    a=1;
    a = g(a)*5+f(a);
    b=2;
    a = func(a,b);
    a = a*g(b);
    b=1;
    b = g(b)*5+f(b);
    ….
Pattern:
    c=1;
    c = g(c)*5+f(c);
Baker's work: pdup, dupstat, psearch (SICOMP 1997, JCSS 1996)

Two dimensional parameterized matching
Pattern
'A horse is a horse, it ain't make a difference what color it is' - John Wayne

Parameterized Matching
Input: $P = p_1 \dots p_m$ over alphabet $\Sigma_P$, $T = t_1 \dots t_n$ over alphabet $\Sigma_T$
Output: locations $i$ of $T$ for which a bijection $\pi: \Sigma_P \to \Sigma_T$ exists s.t. $\pi(P) = \pi(p_1)\pi(p_2)\dots\pi(p_m) = t_i \dots t_{i+m-1}$

One dimensional:
Baker 1996, JCSS - Suffix Trees
Baker 1997, SICOMP - Boyer Moore
Amir, Farach, Muthu 1995, IPL - Knuth-Morris-Pratt
Two dimensional: Regular methods fail!!

Function Matching
Input: $P = p_1 \dots p_m$ over alphabet $\Sigma_P$, $T = t_1 \dots t_n$ over alphabet $\Sigma_T$
Output: locations $i$ of $T$ where a function $f: \Sigma_P \to \Sigma_T$ exists s.t. $f(P) = f(p_1)f(p_2)\dots f(p_m) = t_i \dots t_{i+m-1}$
Example: P = h e h a e h, T = a b c b a c b a d a b d a d d a d
Location 2: f(h) = b, f(e) = c, f(a) = a
Location 8: f(h) = a, f(e) = d, f(a) = b
Location 12: f(h) = d, f(e) = a, f(a) = d
Other locations: f(h) = ?? no match!

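Function Matching vs. Parameterized Matching
P p-matches $t_i \dots t_{i+m-1}$ iff
1. P f-matches $t_i \dots t_{i+m-1}$, and
2. # of distinct symbols in $t_i \dots t_{i+m-1}$ = # of distinct symbols in P
Example: P = h e h a e h, T = a b c b a c b a d a b d a d d a d
f(h) = d, f(e) = a, f(a) = d: an f-match only (three pattern symbols over two text symbols)
f(h) = b, f(e) = c, f(a) = a: an f-match and also a p-match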
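The characterization above can be checked mechanically on the slide's example. This is a minimal Python sketch, assuming plain strings stand for the pattern and a text window; the helper names `is_function_match`, `is_parameterized_match` and `p_match_via_f_match` are illustrative and do not come from the slides.

```python
def is_function_match(window, pattern):
    """True if some function f on pattern symbols maps pattern onto window."""
    image = {}
    # setdefault records the first text symbol seen under each pattern symbol
    return len(window) == len(pattern) and all(
        image.setdefault(p, t) == t for p, t in zip(pattern, window)
    )

def is_parameterized_match(window, pattern):
    """Direct definition: the mapping is one-to-one on the symbols that occur."""
    forward, backward = {}, {}
    return len(window) == len(pattern) and all(
        forward.setdefault(p, t) == t and backward.setdefault(t, p) == p
        for p, t in zip(pattern, window)
    )

def p_match_via_f_match(window, pattern):
    """The slide's test: f-match plus an equal number of distinct symbols."""
    return (is_function_match(window, pattern)
            and len(set(window)) == len(set(pattern)))

P = "hehaeh"
T = "abcbacbadabdaddad"
for i in range(len(T) - len(P) + 1):
    w = T[i:i + len(P)]
    # both ways of deciding a p-match agree on every window of the example
    assert is_parameterized_match(w, P) == p_match_via_f_match(w, P)

print(is_function_match("daddad", P), is_parameterized_match("daddad", P))  # True False
```

The window `d a d d a d` is the slide's f-match that is not a p-match, since both h and a are sent to d.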

Naïve Algorithm
At each location i of text T, check if the pattern f-matches.
Check: for each letter 'a' in the pattern, are the text elements aligned with the pattern 'a's all the same?
No? Declare 'no match'.
All letters "OK": declare 'match'.
Running time: O(nm), where m = |P| and n = |T|
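The check described above fits in a few lines. The following is a sketch in Python rather than the presenters' code; `naive_function_match` is a hypothetical name and locations are reported 0-based.

```python
def naive_function_match(text, pattern):
    """Return all 0-based locations i where pattern f-matches text[i:i+m]."""
    n, m = len(text), len(pattern)
    matches = []
    for i in range(n - m + 1):
        image = {}  # text symbol currently assigned to each pattern letter
        for j, letter in enumerate(pattern):
            aligned = text[i + j]
            if image.setdefault(letter, aligned) != aligned:
                break  # two occurrences of `letter` see different text symbols
        else:
            matches.append(i)  # all letters "OK"
    return matches

print(naive_function_match("abcbacbadabdaddad", "hehaeh"))  # [1, 7, 11]
```

On the running example this reports the 0-based locations 1, 7 and 11, that is, locations 2, 8 and 12 when counting from 1.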

Function Matching with Don't Cares
Input: $P = p_1 \dots p_m$ over alphabet $\Sigma_P \cup \{?\}$, $T = t_1 \dots t_n$ over alphabet $\Sigma_T$
P = h e ? ? e h, T = a b c b a c b c d b c d a d d a d
Output: locations $i$ of $T$ where a function $f: \Sigma_P \to \Sigma_T$ exists s.t. $f(P) = f(p_1)f(p_2)\dots f(p_m) = t_i \dots t_{i+m-1}$, with f(?) a wildcard

Why do we need don't cares?
[Figure: Pattern and Text]

Linearize Text and Pattern
T = Line 1, Line 2, …, Line 5, Line 6, … (the rows of the n × n text, concatenated)
P = Line 1 ????…? Line 2 ????…? … (the rows of the m × m pattern, each followed by n-m don't cares)

Polynomial Multiplication - Convolutions
The text $t_1 t_2 \dots t_n$ is multiplied by the reversed pattern $p_m p_{m-1} \dots p_1$ as in long multiplication: row $j$ holds the products $p_j t_1, p_j t_2, \dots, p_j t_n$, shifted one place relative to the previous row, so each column sums the products for one alignment of $p_1 p_2 \dots p_m$ against the text.
Running time: O(n log m)

Convolutions: Fischer-Paterson [1974]
Sliding $p_1 p_2 p_3 p_4 \dots p_m$ along the text, the value for location $i$ is $\sum_{j=1}^{m} p_j \, t_{i+j-1}$.
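To make the connection between polynomial multiplication and sliding the pattern concrete, here is a small pure-Python sketch. It is an illustration under stated assumptions, not the method of the slides: `fft`, `multiply` and `correlate` are hypothetical names, the inputs are assumed to be integers, and one FFT over the whole text costs O(n log n); the O(n log m) bound would additionally require cutting the text into overlapping pieces of length about 2m.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return a[:]
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = -1 if invert else 1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def multiply(p, q):
    """Coefficients of the product of two integer polynomials."""
    size = 1
    while size < len(p) + len(q) - 1:
        size *= 2
    fa = fft([complex(x) for x in p] + [0j] * (size - len(p)))
    fb = fft([complex(x) for x in q] + [0j] * (size - len(q)))
    prod = fft([x * y for x, y in zip(fa, fb)], invert=True)
    # the inverse transform still needs the 1/size scaling
    return [round((v / size).real) for v in prod[:len(p) + len(q) - 1]]

def correlate(t, p):
    """result[i] = sum_j p[j] * t[i + j], read off the product t * reversed(p)."""
    n, m = len(t), len(p)
    prod = multiply(t, p[::-1])
    return [prod[i + m - 1] for i in range(n - m + 1)]

print(multiply([1, 2, 3], [4, 5]))      # [4, 13, 22, 15]
print(correlate([1, 2, 3, 4], [1, 1]))  # [3, 5, 7]
```

The value for location i sits at index i+m-1 of the product with the reversed pattern, which is the index used on the later slides.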

How does this help for Function Matching?
The property that needs to be checked is: beneath each symbol from the pattern alphabet, all text characters must be the same.

Example - h in P vs. a in T
T = a b c b a c b a c a b d a d d a d e a
P = h e h a e h ? e
$P^R$ = e ? h e a h e h
$T_a$ = 1 0 0 0 1 0 0 1 0 1 0 0 1 0 0 1 0 0 1 (1 wherever T has a)
$P^R_h$ = 0 0 1 0 0 1 0 1 (1 wherever $P^R$ has h)
Multiplying $T_a$ by $P^R_h$ counts, for every alignment of P, how many h's have an a beneath them => in O(n log m) time!!
Repeating for h - b, h - c, h - d gives Match(h) => in $O(|\Sigma_T| n \log m)$ time!!

In general - the Algorithm
For each character 'a' in $\Sigma_P$ create $P_a$
For each character 'b' in $\Sigma_T$ create $T_b$
For all $P_a$ and $T_b$, multiply them and construct Match(a) for each 'a' in $\Sigma_P$
Announce each location i of T as a 'match' if Match(a)[i] = 1 for all a's in P
=> in $O(|\Sigma_P||\Sigma_T| n \log m)$ time.
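The steps above can be sketched as follows. This is an illustrative Python sketch, not the authors' code. The function names are hypothetical, `correlate` evaluates the products directly (an FFT-based multiplication would be needed for the stated running time), and Match(a)[i] is taken to be 1 when every occurrence of 'a' lies above one and the same text symbol 'b', which is an assumed reading of the slide.

```python
def correlate(t, p):
    """result[i] = sum_j p[j] * t[i + j] (direct evaluation, no FFT)."""
    n, m = len(t), len(p)
    return [sum(p[j] * t[i + j] for j in range(m)) for i in range(n - m + 1)]

def function_match_indicator(text, pattern, wildcard="?"):
    """0-based locations where pattern f-matches, via 0/1 vectors P_a and T_b."""
    positions = max(len(text) - len(pattern) + 1, 0)
    ok = [True] * positions
    for a in sorted(set(pattern) - {wildcard}):
        p_a = [1 if c == a else 0 for c in pattern]
        occurrences = sum(p_a)
        match_a = [False] * positions  # Match(a)
        for b in sorted(set(text)):
            t_b = [1 if c == b else 0 for c in text]
            counts = correlate(t_b, p_a)  # how many a's sit above a b
            for i in range(positions):
                if counts[i] == occurrences:
                    match_a[i] = True  # every a is above the same symbol b
        ok = [x and y for x, y in zip(ok, match_a)]
    return [i for i in range(positions) if ok[i]]

print(function_match_indicator("abcbacbacabdaddadea", "hehaeh?e"))  # [1, 11]
```

The wildcard is simply skipped when building the vectors, so whatever lies beneath '?' never affects the outcome.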

Improvement
Lemma: Let $a_1, \dots, a_k$ be natural numbers; then $k \sum_{i=1}^{k} a_i^2 = \left(\sum_{i=1}^{k} a_i\right)^2$ iff for all i,j, $a_i = a_j$
Idea: Encode the text with numbers for symbols and, for each pattern character, compute the sum of the text numbers beneath it and separately their sum of squares.
Example: T = a b c b a c b a c a b d a d d a d e a, P = h e h a e h ? e
With $T_\#$ the numeric encoding of T and $P_e$ = 0 1 0 0 1 0 0 1 marking the e's of P:
multiplying $T_\#$ by $P_e$ computes the sum of the text chars beneath "e";
multiplying $T_\#^2$ by $P_e$ computes the sum of squares beneath "e".
Running Time: Two convolutions for each pattern character. $O(|\Sigma_P| n \log m)$
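A short sketch of how the lemma replaces the per-text-symbol loop. Assumptions are labeled: the numeric encoding (1, 2, 3, … in sorted symbol order) is arbitrary and chosen here only for illustration, the function names are hypothetical, and `correlate` is evaluated directly instead of by FFT, so the sketch shows the test itself and not the running time.

```python
def correlate(t, p):
    """result[i] = sum_j p[j] * t[i + j] (direct evaluation, no FFT)."""
    n, m = len(t), len(p)
    return [sum(p[j] * t[i + j] for j in range(m)) for i in range(n - m + 1)]

def function_match_sums(text, pattern, wildcard="?"):
    """0-based f-match locations using a sum and a sum of squares per pattern symbol."""
    # Assumed encoding: distinct positive integers, one per text symbol.
    code = {c: idx + 1 for idx, c in enumerate(sorted(set(text)))}
    t_num = [code[c] for c in text]
    t_sq = [v * v for v in t_num]
    positions = max(len(text) - len(pattern) + 1, 0)
    ok = [True] * positions
    for a in sorted(set(pattern) - {wildcard}):
        p_a = [1 if c == a else 0 for c in pattern]
        k = sum(p_a)                # number of occurrences of a in P
        s1 = correlate(t_num, p_a)  # sum of the text numbers beneath a
        s2 = correlate(t_sq, p_a)   # sum of their squares
        for i in range(positions):
            # Lemma: k * (sum of squares) == (sum)^2 iff all k numbers are equal
            if k * s2[i] != s1[i] ** 2:
                ok[i] = False
    return [i for i in range(positions) if ok[i]]

print(function_match_sums("abcbacbacabdaddadea", "hehaeh?e"))  # [1, 11]
```

All arithmetic is on integers, so the equality test is exact.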

Can we do better for big alphabets?
We have seen 2 algorithms for Function Matching:
1. $O(nm)$ - naïve algorithm
2. $O(|\Sigma_P| n \log m)$ - convolution based
We will see:
1. $O(n \log^2 m)$ - randomized, convolution based
2. Lower bound of $\Omega(nm)$ for deterministic convolution-based methods

Def: A pattern is 2-charactered if every character appears at most twice in the pattern.
Example: P = a b c b c c b b
$P_1 = a_1 b_1 c_1 b_1 c_1 c_2 b_2 b_2$ (even pairs)
$P_2 = a_1 b_1 c_1 b_2 c_2 c_2 b_2 b_3$ (odd pairs)
Lemma: Let P be a pattern and T a text. There exist 2-charactered patterns $P_1$ and $P_2$ s.t. at loc. i of T, P f-matches iff $P_1$ and $P_2$ f-match.
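The construction in the example can be written down directly. A sketch, assuming the pairing rule read off the example: $P_1$ pairs the 1st with the 2nd occurrence of a character, the 3rd with the 4th, and so on, while $P_2$ pairs the 2nd with the 3rd, the 4th with the 5th, and so on. The name `split_two_charactered` is hypothetical.

```python
def split_two_charactered(pattern, wildcard="?"):
    """Relabel pattern symbols so that each label occurs at most twice."""
    seen = {}  # occurrences of each character met so far
    p1, p2 = [], []
    for c in pattern:
        if c == wildcard:
            p1.append(c)
            p2.append(c)
            continue
        k = seen.get(c, 0)  # 0-based occurrence number of this c
        seen[c] = k + 1
        p1.append(f"{c}{k // 2 + 1}")        # occurrences (0,1), (2,3), ...
        p2.append(f"{c}{(k + 1) // 2 + 1}")  # occurrences (1,2), (3,4), ...
    return p1, p2

p1, p2 = split_two_charactered("abcbccbb")
print(" ".join(p1))  # a1 b1 c1 b1 c1 c2 b2 b2
print(" ".join(p2))  # a1 b1 c1 b2 c2 c2 b2 b3
```

Between them the two patterns link every occurrence of a character to the next one, so requiring equal text symbols under each pair forces equal text symbols under all occurrences.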

Situation: An algorithm for Function Matching with 2-charactered patterns $\Rightarrow$ a general algorithm for Function Matching.
So, all that needs to be checked is that each pair in P has equal text symbols beneath it.

New Randomized Algorithm
1. For each character:
   - a in T, randomly choose $r_a$ in {0, 1}; replace all a's in T with $r_a$; get T'
   - b in P, randomly choose $s_b$ in {1, 2}; set the first b to be $s_b$ and the second b to be $-s_b$; get P'
2. Convolve T' and $P'^R$
3. For each location i for which the convolution $T' * P'^R$ equals 0, declare a 'match'
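A sketch of one way to run these steps, for 2-charactered patterns only. Several details are assumptions rather than statements from the slides: characters that occur once and don't cares are given the value 0 so that they cannot disturb the sum, the rounds are repeated with fresh random values and a location is kept only if every round gives 0, and the convolution is evaluated directly instead of by FFT. All function names are hypothetical.

```python
import random
from collections import Counter

def correlate(t, p):
    """result[i] = sum_j p[j] * t[i + j] (direct evaluation, no FFT)."""
    n, m = len(t), len(p)
    return [sum(p[j] * t[i + j] for j in range(m)) for i in range(n - m + 1)]

def randomized_round(text, pattern, rng, wildcard="?"):
    """One random encoding T', P' and the resulting sum for every location."""
    r = {c: rng.randint(0, 1) for c in sorted(set(text))}
    t_num = [r[c] for c in text]
    counts = Counter(pattern)
    first = {}  # s_b chosen at the first occurrence of b
    p_num = [0] * len(pattern)
    for j, c in enumerate(pattern):
        if c == wildcard or counts[c] == 1:
            continue  # assumption: singletons and don't cares contribute 0
        if counts[c] > 2:
            raise ValueError("pattern is not 2-charactered")
        if c not in first:
            first[c] = rng.randint(1, 2)
            p_num[j] = first[c]
        else:
            p_num[j] = -first[c]
    return correlate(t_num, p_num)

def randomized_function_match(text, pattern, rounds=64, seed=0):
    """Locations whose sum is 0 in every round (false positives are possible)."""
    rng = random.Random(seed)
    positions = max(len(text) - len(pattern) + 1, 0)
    alive = [True] * positions
    for _ in range(rounds):
        sums = randomized_round(text, pattern, rng)
        alive = [a and s == 0 for a, s in zip(alive, sums)]
    return [i for i in range(positions) if alive[i]]

print(randomized_function_match("cddcxyyxcdcd", "abba"))
```

A true f-match is never discarded, because each pair contributes $s_b r - s_b r = 0$. A non-matching location can survive a round by chance, which is why the rounds are repeated.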

Example: P = v q v u q u ? s, T = a b a a b a b a c a b d a b c b d b a
[Slide figure: random values f(a), f(b), f(c), f(d) for the text and g(v), g(q), g(u) for the pattern, with the rows f(T) and g(P) aligned at three locations]
At the first location the products sum to 0, with h(v) = a, h(q) = b, h(u) = a, h(s) = a; at a non-matching location the products sum to -2.

Correctness:
If P f-matches at location i of T, then $(f(T) * g(P)^R)[i+m-1]$ is trivially always equal to 0.
If P does not f-match at location i of T, then for each convolution $(f(T) * g(P)^R)[i+m-1]$ equals 0 with probability ½; with k rounds of amplification the probability is $(½)^k$.
Running Time:
$O(nk \log m)$ with error probability $2^{-k}$
$O(n \log^2 m)$ with error probability 1/m (taking k = log m)

Limitation of the Convolutions Model
Can we do the same deterministically? No!
To show this we use the model of communication complexity: Alice holds x, Bob holds y, and together they compute f(x,y).

Limitation of the Convolutions Model
Known: for x,y in $\{0,1\}^k$ the communication complexity of equals(x,y) is $\Omega(k)$.
Take pattern $P = a_1 a_2 a_3 \dots a_m a_1 a_2 a_3 \dots a_m$, where $a_i \neq a_j$ for all $i \neq j$.
Given a collection of convolutions, the convolution value at location i is $(g(P) * f(T))[i+m-1] = \sum_{j=1}^{m} g(a_j) \cdot f(t_{i+j-1}) + \sum_{j=1}^{m} g(a_j) \cdot f(t_{i+j+m-1})$.
Since we are in essence comparing $t_i \dots t_{i+m-1}$ to $t_{i+m} \dots t_{i+2m-1}$, the convolutions must supply the equality information.
This is lower bounded by $\Omega(m)$ for each location; in general, $\Omega(nm)$.

Another Application for Function Matching Protein Folding detection: P = …

Questions
1. Can Function Matching be solved deterministically in o(nm) time for big alphabets?
2. Are there special cases of Function Matching that are easier (other than Parameterized Matching and other trivial ones)?
3. Does 2-dimensional Parameterized Matching need to be solved with function matching?