S C A L E D Pattern Matching. Amihood Amir, Ayelet Butman, Moshe Lewenstein. Bar-Ilan University and Johns Hopkins University.


Motivation: Searching for templates in aerial photographs. Input: an aerial photo and a template. Task: find all locations where the template appears in the image.

Model: Low level (pixel level), avoiding costly processing. Asymptotically efficient solutions. Serial, exact algorithms.

Types of Approximations. Local errors: level of detail, occlusion, noise. Results: O(n² log m) for mismatches; O(n²k²) for edit distance with k errors, rectangular patterns; O(n²k√(m log m) √(k log k)) for edit distance with k errors, half-rectangular patterns (AL-88, AF-95).

Types of Approximation. Orientation, results: O(n²m ) (FU-98), O(n²m³) (ACL-98). Scaling, natural scales, results: O(n) for 1-d (EV-88), O(n² log |Σ|) for 2-d (ALV-92), O(n²) for dictionary (AC-96). Scaling, real scales, this result: O(n) for 1-d, with truncation.

It seems daunting, but…

CPM 2003: Morelia, Mexico

Problem: inherently inexact. What if the occurrence is 1½ times bigger? What is the meaning of “½ a pixel”? Solutions until now: natural scales, i.e. consider only discrete scales: 1, 2, 3, 4, 5, ...

Definition: Text: an n × n array. Pattern: an m × m array. Find all occurrences of the pattern in the text in all discrete sizes.

Discrete exact Scaled Matching. [Figure: a text array T and a pattern array P over the symbols A and C.]

Discrete exact Scaled Matching. P:
Z U Y
K V S
X E T
P³ (P scaled to 3; every symbol becomes a 3 × 3 block, so each row below appears three times):
Z Z Z U U U Y Y Y
K K K V V V S S S
X X X E E E T T T
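Discrete scaling of an array can be sketched in a few lines of Python. This is only an illustration: the function name `scale_array` and the 3 × 3 layout of P are assumptions read off the slide, not part of the original talk.

```python
def scale_array(pattern, s):
    """Scale a 2-d array by the natural number s.

    Every symbol is replaced by an s x s block of that symbol.
    """
    if s < 1:
        raise ValueError("scale must be a positive integer")
    scaled = []
    for row in pattern:
        # Repeat every symbol s times horizontally ...
        wide = [sym for sym in row for _ in range(s)]
        # ... and repeat the widened row s times vertically.
        scaled.extend(list(wide) for _ in range(s))
    return scaled


# Assumed 3 x 3 layout of the pattern shown on the slide.
P = [list("ZUY"), list("KVS"), list("XET")]
P3 = scale_array(P, 3)
for row in P3:
    print(" ".join(row))
```

The result has 9 rows of 9 symbols; rows 1 to 3 are identical, as are rows 4 to 6 and rows 7 to 9.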

Idea: Fix a scale s. Do a constant amount of work for each s × s square (s-block) of the n × n text; there are n/s such blocks along each side.

Algorithm time. Time for scale s: O((n/s)²) = O(n²/s²). Total time: Σ_s n²/s² = n² · Σ_s 1/s², where Σ_s 1/s² converges to a constant, making the total time O(n²).
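A quick numeric check of the time bound, under two assumptions that are not spelled out on the slide: a fixed scale s costs about (n/s)² block operations (a constant per s-block), and the scales run from 1 to n/m. The function name `total_block_work` is hypothetical.

```python
def total_block_work(n, m):
    """Sum of (n/s)^2 over all scales s = 1 .. n/m (integer division)."""
    return sum((n // s) ** 2 for s in range(1, n // m + 1))


n, m = 1000, 10
work = total_block_work(n, m)
# The series 1/1 + 1/4 + 1/9 + ... stays below 1.645, so work < 1.645 * n^2.
print(work, work / n ** 2)
```

The ratio work / n² stays below a fixed constant however large n grows, which is what makes the total O(n²).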

Problem: Real scales. Was open even for strings … How do we define it? Example: aabcccbb. Scaled to 2: aaaabbccccccbbbb. Scaled to 1½: aaa b cccc bbb, where the ½b and the ½c are truncated.

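Formally: Denote a^r = aaa...a (r times). Problem Definition 1: Input: Pattern P = a_1^{r_1} a_2^{r_2} ... a_l^{r_l} and Text T. Output: All text locations where a_1^{⌊α·r_1⌋} a_2^{⌊α·r_2⌋} ... a_l^{⌊α·r_l⌋} appears, for some α ≥ 1.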
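Scaling a string by a real factor with truncation, as in the aabcccbb example, can be sketched as follows. The function name `scale_string` is hypothetical, and exact rational arithmetic (`fractions.Fraction`) is a choice made here to avoid floating-point rounding when taking the floor.

```python
import math
from fractions import Fraction
from itertools import groupby


def scale_string(s, alpha):
    """Scale every run a^r of s to a^floor(alpha * r)."""
    alpha = Fraction(alpha)
    parts = []
    for sym, run in groupby(s):
        r = sum(1 for _ in run)                     # run length
        parts.append(sym * math.floor(alpha * r))   # truncate the fraction
    return "".join(parts)


print(scale_string("aabcccbb", 2))               # aaaabbccccccbbbb
print(scale_string("aabcccbb", Fraction(3, 2)))  # aaabccccbbb
```

With this definition, a text location is an occurrence if the scaled string appears there for some scale α ≥ 1.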

Remark: α ≥ 1 means we only scale “up”. Reasons: Avoid the conceptual problem of loss of resolution. From “far enough” away everything looks the same. By our definition, for α < 1/m every run is truncated to length 0, so there is a match at every text location.

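Simplify definition. Definition 2: Look for the scaled pattern in the text so that it starts and ends at run boundaries, i.e. its first and last runs are whole runs of the text. Example: P = aabcccbbbb. Match by definition 2: daaabccccbbbbbbe. Match by definition 1 but not by def 2: daaaabccccbbbbbbbe.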

Why are definitions equivalent? Split text and pattern into a symbol part T_s, P_s and a length part T_L, P_L. Example: P = aabcccbbbb, P_s = abcb, P_L = 2 1 3 4. T = daaabccccbbbbbbe, T_s = dabcbe, T_L = 1 3 1 4 6 1.
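The split into a symbol part and a length part is a run-length encoding. A minimal sketch, with the hypothetical name `split_runs`, reproduces the example on this slide:

```python
from itertools import groupby


def split_runs(s):
    """Return (symbol part, length part) of the run-length encoding of s."""
    symbols, lengths = [], []
    for sym, run in groupby(s):
        symbols.append(sym)
        lengths.append(sum(1 for _ in run))
    return "".join(symbols), lengths


print(split_runs("aabcccbbbb"))        # ('abcb', [2, 1, 3, 4])
print(split_runs("daaabccccbbbbbbe"))  # ('dabcbe', [1, 3, 1, 4, 6, 1])
```

The split is a single left-to-right pass, so it takes time linear in the length of the string.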

Time. Time for split: O(n+m). Finding P_s in T_s: O(n+m) (e.g. KMP). HARD PART: Finding P_L in T_L.

Definitions are Equivalent. Claim: Solving def 2 in time O(f(n)) ⇒ solving def 1 in time O(f(n)). Why? - Find the pattern without its first and last runs, by def 2, in time O(f(n)). - For each match, verify the 1st and last symbol in constant time in T_s and T_L. Total time: O(f(n)+n) = O(f(n)).

Naïve algorithm for matching P_L in T_L. For each text location, position the pattern starting at that location and calculate the interval [t/p, (t+1)/p) for each resulting pair of a pattern number p and a text number t. This is the interval of possible scales, since: (t/p)·p = t, and for every α < t/p, ⌊αp⌋ < t; ((t+1)/p)·p = t+1, and for every α ≥ (t+1)/p, ⌊αp⌋ > t.

Check intersection. If the intersection of all intervals is not empty then there is a match. Time: O(nm). Example: P_L: 2 1 ..., T_L: 2 4 ..., giving the intervals [1,3/2) and [4,5). The intersection is empty, thus there is no scaled match in location 1. But…

Check intersection (cont.). Example: P_L: 2 1 2 3 2, T_L from location 2: 4 2 4 7 4, giving the intervals [2,5/2), [2,3), [2,5/2), [7/3,8/3), [2,5/2). The intersection is [7/3,5/2), thus there is a scaled match in location 2.
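The naïve O(nm) interval check can be sketched as below. The names are hypothetical, and the run lengths `P_L` and `T_L` are assumptions: they are the unique values that reproduce the intervals listed in the two examples. Indices are 0-based, so location 2 of the slides is index 1.

```python
from fractions import Fraction


def scale_interval(p_lengths, t_lengths, start):
    """Intersect [t/p, (t+1)/p) over all pairs, together with alpha >= 1.

    Returns (lo, hi), meaning all scales in [lo, hi), or None if empty.
    """
    if not p_lengths:
        raise ValueError("pattern must be non-empty")
    lo, hi = Fraction(1), None
    for k, p in enumerate(p_lengths):
        t = t_lengths[start + k]
        lo = max(lo, Fraction(t, p))
        upper = Fraction(t + 1, p)
        hi = upper if hi is None else min(hi, upper)
        if lo >= hi:              # intersection already empty
            return None
    return lo, hi


def naive_scaled_matches(p_lengths, t_lengths):
    """Map every matching start index to its interval of possible scales."""
    found = {}
    for start in range(len(t_lengths) - len(p_lengths) + 1):
        interval = scale_interval(p_lengths, t_lengths, start)
        if interval is not None:
            found[start] = interval
    return found


P_L = [2, 1, 2, 3, 2]      # assumed, reconstructed from the intervals
T_L = [2, 4, 2, 4, 7, 4]   # assumed, reconstructed from the intervals
print(naive_scaled_matches(P_L, T_L))
```

On this input the only match is at index 1, with scales in [7/3, 5/2).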

Improvement – Parameterized Matching. Introduced: Baker. Motivation: “copying” code.

Parameterized Matching. Input: two strings s and t, |s| = |t|, over alphabets Σ_s and Σ_t. s parameterize-matches t if there exists a bijection π: Σ_s → Σ_t such that π(s) = t. Example: s = aabbb, t = xxyyy, with π(a) = x, π(b) = y.

Parameterized Matching. Claim (AFM-94): For Σ that can be sorted in linear time (e.g. Σ = {1, ..., n}), parameterized matching can be done in time O(n).
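The definition of a parameterized match can be checked directly by building the bijection symbol by symbol. The sketch below is a naïve O(nm) search meant only to illustrate the definition; it is not the linear-time AFM-94 algorithm, and the function names are hypothetical.

```python
def p_match(s, t):
    """True if some bijection between the alphabets maps s onto t."""
    if len(s) != len(t):
        return False
    forward, backward = {}, {}
    for a, b in zip(s, t):
        # a must always map to b, and b must always come from a.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


def naive_p_matches(pattern, text):
    """All start indices (0-based) where pattern p-matches the text."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1)
            if p_match(pattern, text[i:i + m])]


print(p_match("aabbb", "xxyyy"))   # True
print(p_match("ab", "xx"))         # False: not a bijection
print(naive_p_matches([2, 1, 2, 3, 2], [2, 4, 2, 4, 7, 4]))
```

The same functions work on lists of numbers, which is how they would be applied to the length parts P_L and T_L.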

The reduction. Lemma: There exists an α for which P_L matches T_L at location i scaled to α only if P_L p-matches T_L at i. Proof: Assume P_L does not p-match T_L at location i. The possible situations are:

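Possibility 1: two equal numbers b, b in P_L face two different numbers a and c ≠ a in T_L. W.l.o.g. c ≥ a+1. For c = a+1 (smallest possible) the intervals are [a/b, (a+1)/b) and [(a+1)/b, (a+2)/b), which do not intersect; a larger c only moves the second interval further away.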

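Possibility 2: two different numbers b and c ≠ b in P_L face two equal numbers a, a in T_L. W.l.o.g. c ≥ b+1. The intervals are [a/b, (a+1)/b) and [a/c, (a+1)/c). For c = b+1 (smallest possible) the intersection is not empty only if (a+1)/(b+1) > a/b, i.e. ab+b > ab+a, i.e. b > a. But this can never happen if α ≥ 1, since then a ≥ b.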

Algorithm for Real Scaled String Matching. Let {P_{i_1}, P_{i_2}, ..., P_{i_j}} be the different numbers in P_L.
1. P-match P_L in T_L.
2. For each match, check the intersection of the intervals between P_{i_1}, ..., P_{i_j} and the corresponding symbols in T_L.
End Algorithm
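The two steps can be sketched end to end as follows. This is an illustration under stated assumptions: the p-matching step here is the naïve O(m)-per-location check rather than a linear-time parameterized matcher, so the sketch runs in O(nm) overall; all function names and the sample run lengths are hypothetical.

```python
from fractions import Fraction


def p_match_at(p_lengths, t_lengths, i):
    """Step 1 (naive): does P_L p-match T_L at index i?"""
    forward, backward = {}, {}
    for k, p in enumerate(p_lengths):
        t = t_lengths[i + k]
        if forward.setdefault(p, t) != t or backward.setdefault(t, p) != p:
            return False
    return True


def first_occurrences(p_lengths):
    """Positions of the first appearance of each different number in P_L."""
    seen, positions = set(), []
    for k, p in enumerate(p_lengths):
        if p not in seen:
            seen.add(p)
            positions.append(k)
    return positions


def real_scaled_matches(p_lengths, t_lengths):
    """Return (index, lo, hi) for every match; scales lie in [lo, hi)."""
    if not p_lengths:
        raise ValueError("pattern must be non-empty")
    firsts = first_occurrences(p_lengths)
    matches = []
    for i in range(len(t_lengths) - len(p_lengths) + 1):
        if not p_match_at(p_lengths, t_lengths, i):
            continue
        # Step 2: only the j different numbers need an interval check,
        # because equal pattern numbers face equal text numbers.
        lo, hi = Fraction(1), None
        for k in firsts:
            p, t = p_lengths[k], t_lengths[i + k]
            lo = max(lo, Fraction(t, p))
            upper = Fraction(t + 1, p)
            hi = upper if hi is None else min(hi, upper)
        if lo < hi:
            matches.append((i, lo, hi))
    return matches


print(real_scaled_matches([2, 1, 2, 3, 2], [2, 4, 2, 4, 7, 4]))
```

Restricting step 2 to first occurrences is what brings the verification cost per p-match down from the pattern length to the number of different numbers in P_L.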

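Example: the different numbers in P_L are P_{i_1} = 2 and P_{i_2} = 3. P_L p-matches T_L, and the intervals of the different numbers intersect, so there is a scaled match. [Figure: the strings P_L and T_L.]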

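Important Fact: P_{i_1} + P_{i_2} + ... + P_{i_j} ≤ m, and j different positive integers sum to at least j(j+1)/2. So there are at most O(√m) different P_{i_k}'s. Time: O(n) for parameterized matching (Σ = {1, 2, …, n}). O(√m) verification for each location. Total: O(n√m).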

Tighter analysis. Upper bound the number of possible p-matches. Lemma: Let |P| = m, |T| = n, and let {P_{i_1}, P_{i_2}, ..., P_{i_j}} be the different numbers in P_L. Then there are at most 2n/j p-matches of P_L in T_L. Meaning: Since verification time is O(j) per p-match, the lemma implies that the total verification time is O((2n/j) · j) = O(n).

Proof of Lemma: Consider the 1st appearance of each of P_{i_1}, ..., P_{i_j} in P_L. In a p-match they face j different numbers a_1, a_2, ..., a_j in T_L.

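Lemma’s proof (cont.). Let x be the total number of p-matches in the text. In each p-match the j text numbers facing the 1st occurrences are different positive integers, so their sum is ≥ j(j+1)/2 ≥ j²/2. Hence the sum, over all p-matches, of the text elements that match 1st occurrences of P_{i_k}'s in the pattern is ≥ (xj²)/2. But: There are overlaps! How many?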

Lemma’s proof (cont.). For each text location, at most j matches will count it. Therefore the total count without overlaps is ≥ (xj²/2)/j = x·j/2. The numbers of T_L sum to at most n, so x·j/2 ≤ n, thus x ≤ (2n)/j.

Open Problem: Give a 1-d algorithm that is linear in the run-length compressed text and pattern.