Two-dimensional pattern matching
M.G.W.H. van de Rijdt, 23 August 2005

Introduction
Problem description
Naive algorithm
Filter-based algorithms
– A simple filter function
– Takaoka-Zhu
– Baker-Bird
Baeza-Yates & Régnier
Polcar
Conclusions
Future work
Questions

Problem description
One-dimensional pattern matching: finding all occurrences of a pattern string in a text string.
Two-dimensional pattern matching: finding all occurrences of a 2D pattern matrix in a 2D text matrix.
Applications: image processing, ...
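
Stated formally, under assumed conventions (0-based indices, a text T with M rows and N columns, a pattern P with m rows and n columns), the problem asks for every position (i, j) such that:

```latex
P \text{ occurs at } (i, j) \iff T[i+k][j+l] = P[k][l]
\quad \text{for all } 0 \le k < m,\ 0 \le l < n,
\qquad \text{where } 0 \le i \le M - m,\ 0 \le j \le N - n.
```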

Naive algorithm
Simply check, for each position in the text, whether the pattern occurs there.
This is the most straightforward, but an inefficient, solution.
Better algorithms
– use gathered information to skip a larger area of the text at once, and/or
– precompute information to decide more quickly whether a match exists at a position in the text.
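
A minimal Python sketch of the naive algorithm, assuming a matrix is given as a list of equal-length row strings and each occurrence is reported by its top-left corner. The 5×5 text is a made-up example, not the text from the slides. In the worst case it performs about M·N·m·n symbol comparisons for an M×N text and an m×n pattern.

```python
from typing import List, Tuple

Matrix = List[str]  # a matrix as a list of equal-length row strings


def naive_2d_match(text: Matrix, pattern: Matrix) -> List[Tuple[int, int]]:
    """Return the top-left corners (row, col) of all occurrences of pattern in text."""
    M, N = len(text), len(text[0])
    m, n = len(pattern), len(pattern[0])
    hits = []
    for i in range(M - m + 1):          # every candidate top row
        for j in range(N - n + 1):      # every candidate left column
            # compare the m partial rows; all() stops at the first mismatching row
            if all(text[i + k][j:j + n] == pattern[k] for k in range(m)):
                hits.append((i, j))
    return hits


# Hypothetical example: the slides' pattern in a small made-up text.
P = ["aaa", "bab", "aaa"]
T = ["aaaab",
     "babba",
     "aaaaa",
     "babab",
     "aaaaa"]
print(naive_2d_match(T, P))  # [(0, 0), (2, 0), (2, 2)]
```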

Filter-based algorithms (0)
Define a "filter function", which maps each row of the pattern matrix to a single value.
Using this function, reduce the pattern matrix to a single (column) vector.
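
As an illustration, a sketch of the pattern reduction with two filter functions: the first-symbol filter used on a later slide, and a hypothetical `row_numbering` filter that gives every distinct pattern row its own number, so that different rows never map to the same value.

```python
from typing import Callable, Dict, Hashable, List


def reduce_pattern(pattern: List[str], f: Callable[[str], Hashable]) -> List[Hashable]:
    """Apply the row filter f to every pattern row, giving a column vector."""
    return [f(row) for row in pattern]


def row_numbering(pattern: List[str]) -> Dict[str, int]:
    """Hypothetical injective filter: number the distinct pattern rows 0, 1, 2, ..."""
    ids: Dict[str, int] = {}
    for row in pattern:
        ids.setdefault(row, len(ids))   # only inserts rows not seen before
    return ids


P = ["aaa", "bab", "aaa"]
ids = row_numbering(P)
print(reduce_pattern(P, lambda row: row[0]))   # ['a', 'b', 'a']
print(reduce_pattern(P, lambda row: ids[row]))  # [0, 1, 0]
```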

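Filter-based algorithms (1)
Apply the filter function to partial rows of the text matrix, i.e. to the pieces of text rows that are as wide as the pattern.
An occurrence is only possible where the pattern's column vector occurs in the reduced text (if the filter function maps different rows to the same value, such a position is only a candidate and must still be checked).
Use 1D pattern matching to find those occurrences.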
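
The two steps can be sketched as follows, assuming the pattern has width n and using Knuth-Morris-Pratt as the 1D matcher on each column of the reduced text. The function names are illustrative only. Every candidate is checked against the full pattern, because the filter function may give different rows the same value.

```python
from typing import Callable, Hashable, List, Sequence, Tuple


def kmp_find(seq: Sequence, pat: Sequence) -> List[int]:
    """Start indices of all occurrences of pat in seq (Knuth-Morris-Pratt)."""
    fail = [0] * len(pat)  # fail[q]: length of the longest proper border of pat[:q + 1]
    k = 0
    for q in range(1, len(pat)):
        while k and pat[q] != pat[k]:
            k = fail[k - 1]
        if pat[q] == pat[k]:
            k += 1
        fail[q] = k
    hits, k = [], 0
    for i, x in enumerate(seq):
        while k and x != pat[k]:
            k = fail[k - 1]
        if x == pat[k]:
            k += 1
        if k == len(pat):              # a full match ends at position i
            hits.append(i - len(pat) + 1)
            k = fail[k - 1]
    return hits


def filter_match(text: List[str], pattern: List[str], f: Callable[[str], Hashable]
                 ) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Return (candidates, matches), both as sorted top-left corners (row, col)."""
    M, N = len(text), len(text[0])
    m, n = len(pattern), len(pattern[0])
    reduced_pattern = [f(row) for row in pattern]
    candidates, matches = [], []
    for j in range(N - n + 1):
        # column j of the reduced text: f applied to the partial rows text[i][j:j+n]
        column = [f(text[i][j:j + n]) for i in range(M)]
        for i in kmp_find(column, reduced_pattern):
            candidates.append((i, j))
            # f may give different rows the same value, so check the candidate itself
            if all(text[i + k][j:j + n] == pattern[k] for k in range(m)):
                matches.append((i, j))
    return sorted(candidates), sorted(matches)


P = ["aaa", "bab", "aaa"]
T = ["aaaab", "babba", "aaaaa", "babab", "aaaaa"]  # made-up example text
print(filter_match(T, P, lambda row: row[0]))
# ([(0, 0), (0, 2), (2, 0), (2, 2)], [(0, 0), (2, 0), (2, 2)])
```

With the first-symbol filter, the candidate (0, 2) is rejected by the final check; with an injective filter (for example the identity) every candidate is a match.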

Filter-based algorithms: a simple filter function
A simple example of a filter function: f(x) = x[0], the first symbol of a row.
Pattern: the rows aaa, bab, aaa are reduced to the column vector (a, b, a).
[Figure: an example text matrix and its reduced form]
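
A small sketch of this filter on a made-up 5×5 text (not the slides' text). It shows that with f(x) = x[0] the reduced text is just the text with its last n − 1 columns dropped, so the filter lets through false candidates. The helper names are hypothetical.

```python
from typing import List, Tuple


def first_symbol(x: str) -> str:
    """The slide's filter function f(x) = x[0]."""
    return x[0]


def reduce_text(text: List[str], n: int) -> List[List[str]]:
    """Entry (i, j) is f(text[i][j:j+n]) for the pattern width n."""
    return [[first_symbol(row[j:j + n]) for j in range(len(row) - n + 1)] for row in text]


def candidate_positions(reduced: List[List[str]], reduced_pattern: List[str]) -> List[Tuple[int, int]]:
    """Top-left corners where the reduced pattern occurs in a column of the reduced text."""
    m = len(reduced_pattern)
    return [(i, j)
            for j in range(len(reduced[0]))
            for i in range(len(reduced) - m + 1)
            if [reduced[i + k][j] for k in range(m)] == reduced_pattern]


P = ["aaa", "bab", "aaa"]
T = ["aaaab", "babba", "aaaaa", "babab", "aaaaa"]
R = reduce_text(T, len(P[0]))
print(R[1])  # ['b', 'a', 'b']
print(candidate_positions(R, [first_symbol(row) for row in P]))
# [(0, 0), (2, 0), (0, 2), (2, 2)]
```

Here (0, 2) is a false candidate: text row 0 contains aab at that position, not aaa.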

Filter-based algorithms: Takaoka-Zhu
Filter function: the hash function from the (1D) Karp-Rabin algorithm.
[Figure: the example text matrix]
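
A sketch of the Karp-Rabin row hash used as a filter function. The radix `BASE` and modulus `MOD` are assumed values, and the column search is a plain window comparison standing in for a proper 1D algorithm. Positions with equal hashes are double-checked, because different rows can collide.

```python
from typing import List, Tuple

BASE = 256            # assumed radix (one "digit" per symbol)
MOD = 1_000_000_007   # assumed prime modulus


def window_hashes(row: str, n: int) -> List[int]:
    """Karp-Rabin hashes of all length-n windows of row, using a rolling update."""
    high = pow(BASE, n - 1, MOD)   # weight of the symbol that leaves the window
    h = 0
    for c in row[:n]:
        h = (h * BASE + ord(c)) % MOD
    hashes = [h]
    for j in range(n, len(row)):
        # drop row[j - n], shift, append row[j]; Python's % keeps the result non-negative
        h = ((h - ord(row[j - n]) * high) * BASE + ord(row[j])) % MOD
        hashes.append(h)
    return hashes


def hash_filter_match(text: List[str], pattern: List[str]) -> List[Tuple[int, int]]:
    """Filter-based search with the Karp-Rabin row hash as the filter function."""
    m, n = len(pattern), len(pattern[0])
    reduced_pattern = [window_hashes(row, n)[0] for row in pattern]
    reduced_text = [window_hashes(row, n) for row in text]  # entry (i, j) hashes text[i][j:j+n]
    hits = []
    for j in range(len(text[0]) - n + 1):
        column = [reduced_text[i][j] for i in range(len(text))]
        # simple window comparison as the 1D matcher (a stand-in for e.g. KMP)
        for i in range(len(text) - m + 1):
            if column[i:i + m] == reduced_pattern and all(
                    text[i + k][j:j + n] == pattern[k] for k in range(m)):  # rule out collisions
                hits.append((i, j))
    return sorted(hits)


P = ["aaa", "bab", "aaa"]
T = ["aaaab", "babba", "aaaaa", "babab", "aaaaa"]  # made-up example text
print(hash_filter_match(T, P))  # [(0, 0), (2, 0), (2, 2)]
```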

Filter-based algorithms: Baker-Bird (0)
Based on the Aho-Corasick automaton
– Aho-Corasick is an algorithm for (1D) multipattern matching
– It uses a special automaton, built from the pattern strings
Filter function for Baker-Bird: the state of the Aho-Corasick automaton built from the pattern's rows

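Filter-based algorithms: Baker-Bird (1)
Pattern: aaa / bab / aaa
Trie based on the pattern rows {aaa, bab}: from the root q0, the path a, a, a leads through q1 and q2 to q3 (aaa), and the path b, a, b leads through q4 and q5 to q6 (bab).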

Filter-based algorithms: Baker-Bird (2)
Pattern: aaa / bab / aaa
Aho-Corasick automaton based on the pattern rows {aaa, bab}:
[Figure: the Aho-Corasick automaton on states q0 to q6]
Reduced pattern: the rows aaa, bab, aaa end in states q3, q6, q3.
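
A sketch of the Aho-Corasick construction for the rows {aaa, bab}, assuming states are numbered in the order they are created while inserting the words into the trie; this reproduces the numbering q0 to q6 used on these slides. `build_aho_corasick` and `run` are illustrative names.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


def build_aho_corasick(words: List[str], alphabet: str
                       ) -> Tuple[List[Dict[str, int]], List[Optional[str]]]:
    """Return (delta, out): delta[q][c] is the complete transition function and
    out[q] is the word recognised in state q (None if q is not final).
    Outputs are not merged along failure links, which is fine when no word is a
    proper suffix of another (e.g. all words have the same length)."""
    goto: List[Dict[str, int]] = [{}]        # trie edges; state 0 is the root
    out: List[Optional[str]] = [None]
    for w in words:
        q = 0
        for c in w:
            if c not in goto[q]:
                goto.append({})
                out.append(None)
                goto[q][c] = len(goto) - 1
            q = goto[q][c]
        out[q] = w
    fail = [0] * len(goto)                   # depth-1 states fail to the root
    delta: List[Dict[str, int]] = [{} for _ in goto]
    queue = deque()
    for c in alphabet:                       # root: missing edges loop back to the root
        delta[0][c] = goto[0].get(c, 0)
        if c in goto[0]:
            queue.append(goto[0][c])
    while queue:                             # breadth-first, so delta[fail[q]] is complete
        q = queue.popleft()
        for c in alphabet:
            if c in goto[q]:
                r = goto[q][c]
                fail[r] = delta[fail[q]][c]
                delta[q][c] = r
                queue.append(r)
            else:
                delta[q][c] = delta[fail[q]][c]
    return delta, out


def run(delta: List[Dict[str, int]], s: str) -> int:
    """State reached from q0 after reading s."""
    q = 0
    for c in s:
        q = delta[q][c]
    return q


delta, out = build_aho_corasick(["aaa", "bab"], "ab")
print([run(delta, row) for row in ["aaa", "bab", "aaa"]])  # [3, 6, 3]
print(delta[5]["a"], delta[6]["a"], delta[3]["b"])        # 2 5 4
```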

Filter-based algorithms: Baker-Bird (3)
Pattern: aaa / bab / aaa, reduced to (q3, q6, q3)
Text: [Figure: the example text matrix, annotated with the automaton state reached at each position]
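
Putting the pieces together, a sketch of the Baker-Bird idea: the Aho-Corasick state reached at each text position serves as the filter value, and Knuth-Morris-Pratt searches each column of states for (q3, q6, q3). The block repeats a compact Aho-Corasick construction and a KMP search so that it is self-contained, and runs on a made-up 5×5 text rather than the slides' text. Because all pattern rows have the same length, a final state identifies its row exactly, so no verification step is needed.

```python
from collections import deque
from typing import Dict, List, Optional, Sequence, Tuple


def build_aho_corasick(words: List[str], alphabet: str):
    """Complete transition function delta and final-state labels out (see previous sketch)."""
    goto: List[Dict[str, int]] = [{}]
    out: List[Optional[str]] = [None]
    for w in words:
        q = 0
        for c in w:
            if c not in goto[q]:
                goto.append({}); out.append(None); goto[q][c] = len(goto) - 1
            q = goto[q][c]
        out[q] = w
    fail, delta, queue = [0] * len(goto), [{} for _ in goto], deque()
    for c in alphabet:
        delta[0][c] = goto[0].get(c, 0)
        if c in goto[0]:
            queue.append(goto[0][c])
    while queue:
        q = queue.popleft()
        for c in alphabet:
            if c in goto[q]:
                fail[goto[q][c]] = delta[fail[q]][c]
                delta[q][c] = goto[q][c]
                queue.append(goto[q][c])
            else:
                delta[q][c] = delta[fail[q]][c]
    return delta, out


def kmp_find(seq: Sequence, pat: Sequence) -> List[int]:
    """Start indices of all occurrences of pat in seq (Knuth-Morris-Pratt)."""
    fail, k = [0] * len(pat), 0
    for q in range(1, len(pat)):
        while k and pat[q] != pat[k]:
            k = fail[k - 1]
        k += pat[q] == pat[k]
        fail[q] = k
    hits, k = [], 0
    for i, x in enumerate(seq):
        while k and x != pat[k]:
            k = fail[k - 1]
        k += x == pat[k]
        if k == len(pat):
            hits.append(i - k + 1)
            k = fail[k - 1]
    return hits


def baker_bird(text: List[str], pattern: List[str]) -> List[Tuple[int, int]]:
    """Top-left corners of all occurrences, via AC states per row and KMP per column."""
    n = len(pattern[0])
    rows = list(dict.fromkeys(pattern))              # distinct rows, first-appearance order
    alphabet = "".join(sorted(set("".join(text + pattern))))
    delta, out = build_aho_corasick(rows, alphabet)
    final = {w: q for q, w in enumerate(out) if w is not None}
    reduced_pattern = [final[row] for row in pattern]  # (q3, q6, q3) for the slides' pattern
    # states[i][j]: state after reading text[i][:j + 1]; since every pattern row has
    # length n, it is final(w) exactly when text[i][j - n + 1:j + 1] == w
    states = []
    for row in text:
        q, line = 0, []
        for c in row:
            q = delta[q][c]
            line.append(q)
        states.append(line)
    hits = []
    for j in range(n - 1, len(text[0])):             # j: right-most column of a window
        column = [states[i][j] for i in range(len(text))]
        for i in kmp_find(column, reduced_pattern):
            hits.append((i, j - n + 1))
    return sorted(hits)


P = ["aaa", "bab", "aaa"]
T = ["aaaab", "babba", "aaaaa", "babab", "aaaaa"]   # made-up example text
print(baker_bird(T, P))  # [(0, 0), (2, 0), (2, 2)]
```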

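Baeza-Yates & Régnier (0)
Say our pattern has m rows.
In the text (rows numbered from 0), each occurrence of the pattern intersects exactly one row of the form i * m − 1, because an occurrence covers m consecutive rows.
[Figure: text rows 0, m−1, 2*m−1 and 3*m−1 marked]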
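
A quick check of this claim, assuming 0-based row numbers; `searched_rows` is a hypothetical helper name. It also shows that only about M/m of the M text rows need to be scanned.

```python
from typing import List


def searched_rows(num_text_rows: int, m: int) -> List[int]:
    """0-based text rows of the form i * m - 1 (i >= 1): m - 1, 2m - 1, 3m - 1, ..."""
    return list(range(m - 1, num_text_rows, m))


# Check the claim: every block of m consecutive rows [r, r + m) contains exactly one of them.
M, m = 10, 3
rows = searched_rows(M, m)
print(rows)  # [2, 5, 8]
assert all(sum(r <= x < r + m for x in rows) == 1 for r in range(M - m + 1))
```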

Baeza-Yates & Régnier (1)
Algorithm idea:
– use 1D multipattern matching to search these rows of the text for occurrences of any pattern row
– where such a match occurs, check whether the entire pattern matches in the surrounding area
[Figure: the pattern aaa / bab / aaa and the example text]
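
A sketch of both steps, assuming 0-based rows. A dictionary lookup of each width-n window serves as a simple stand-in for a real 1D multipattern matcher such as Aho-Corasick. For every hit on pattern row k in text row r, the candidate occurrence starts at row r − k and is checked in full. The function name is illustrative, and the text is made up.

```python
from typing import Dict, List, Tuple


def baeza_yates_regnier(text: List[str], pattern: List[str]) -> List[Tuple[int, int]]:
    """Top-left corners of all occurrences, scanning only rows m-1, 2m-1, ..."""
    m, n = len(pattern), len(pattern[0])
    M, N = len(text), len(text[0])
    where: Dict[str, List[int]] = {}          # row string -> indices of equal pattern rows
    for k, row in enumerate(pattern):
        where.setdefault(row, []).append(k)
    hits = set()
    for r in range(m - 1, M, m):              # only the rows of the form i * m - 1
        for j in range(N - n + 1):
            # stand-in multipattern matching: look up the width-n window
            for k in where.get(text[r][j:j + n], []):
                i = r - k                     # candidate top row of the occurrence
                if 0 <= i <= M - m and all(
                        text[i + t][j:j + n] == pattern[t] for t in range(m)):
                    hits.add((i, j))
    return sorted(hits)


P = ["aaa", "bab", "aaa"]
T = ["aaaab", "babba", "aaaaa", "babab", "aaaaa"]  # made-up example text
print(baeza_yates_regnier(T, P))  # [(0, 0), (2, 0), (2, 2)]
```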

Polcar (0)
In some 1D pattern matching algorithms, we view an occurrence of the pattern as a suffix of a prefix of the text.
For Polcar, we do the same in two dimensions.
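
A sketch of this view, assuming that a 2D prefix is a top-left submatrix and a 2D suffix is a bottom-right submatrix. The pattern then occurs with its bottom-right corner at (i − 1, j − 1) exactly when it is the m×n suffix of the i×j prefix of the text. The helper names and the text are illustrative.

```python
from typing import List, Tuple

Matrix = List[str]


def prefix(a: Matrix, r: int, c: int) -> Matrix:
    """Top-left r x c submatrix of a."""
    return [row[:c] for row in a[:r]]


def suffix(a: Matrix, r: int, c: int) -> Matrix:
    """Bottom-right r x c submatrix of a."""
    return [row[len(row) - c:] for row in a[len(a) - r:]]


def occurrences_via_prefixes(text: Matrix, pattern: Matrix) -> List[Tuple[int, int]]:
    """Top-left corners of occurrences, found as m x n suffixes of i x j text prefixes."""
    m, n = len(pattern), len(pattern[0])
    hits = []
    for i in range(m, len(text) + 1):
        for j in range(n, len(text[0]) + 1):
            if suffix(prefix(text, i, j), m, n) == pattern:
                hits.append((i - m, j - n))
    return hits


P = ["aaa", "bab", "aaa"]
T = ["aaaab", "babba", "aaaaa", "babab", "aaaaa"]  # made-up example text
print(occurrences_via_prefixes(T, P))  # [(0, 0), (2, 0), (2, 2)]
```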

Polcar (1)
For each prefix A of the text, we compute the set of suffixes of A that are also prefixes of the pattern.
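
A brute-force sketch of these sets, assuming the same prefix/suffix definitions (top-left and bottom-right submatrices) and ignoring empty matrices. Each set is represented by the sizes (r, c) of its elements. It only spells out what is computed, straight from the definition; it is not Polcar's algorithm. A full occurrence of an m×n pattern shows up as (m, n) in a set.

```python
from typing import Dict, List, Set, Tuple

Matrix = List[str]


def prefix(a: Matrix, r: int, c: int) -> Matrix:
    """Top-left r x c submatrix of a."""
    return [row[:c] for row in a[:r]]


def suffix(a: Matrix, r: int, c: int) -> Matrix:
    """Bottom-right r x c submatrix of a."""
    return [row[len(row) - c:] for row in a[len(a) - r:]]


def polcar_sets(text: Matrix, pattern: Matrix) -> Dict[Tuple[int, int], Set[Tuple[int, int]]]:
    """For each text prefix A = prefix(text, i, j), the sizes (r, c) of the non-empty
    suffixes of A that are also prefixes of the pattern (brute force)."""
    m, n = len(pattern), len(pattern[0])
    sets = {}
    for i in range(1, len(text) + 1):
        for j in range(1, len(text[0]) + 1):
            A = prefix(text, i, j)
            sets[(i, j)] = {(r, c)
                            for r in range(1, min(i, m) + 1)
                            for c in range(1, min(j, n) + 1)
                            if suffix(A, r, c) == prefix(pattern, r, c)}
    return sets


P = ["aaa", "bab", "aaa"]
T = ["aaaab", "babba", "aaaaa", "babab", "aaaaa"]  # made-up example text
sets = polcar_sets(T, P)
print(sets[(2, 1)])  # {(2, 1)}
print(sorted((i - 3, j - 3) for (i, j), s in sets.items() if (3, 3) in s))
# [(0, 0), (2, 0), (2, 2)]
```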

Polcar (2)
In the derivations of the corresponding 1D pattern matching algorithms, a set of prefixes of the pattern is represented by its element of maximum length.
In 2D there is not always a unique maximum.
But these sets of matrices can be represented by their maximal elements.
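
A sketch of why the maximal elements suffice, under the same assumed definitions (sizes (r, c), top-left prefixes, bottom-right suffixes). If (R, C) is in the set for a text prefix A, the R×C suffix of A equals the R×C prefix of the pattern, so whether a smaller (r, c) below it belongs to the set depends only on the pattern. The hypothetical `expand` uses this to rebuild the whole set from its maximal elements. The small example with two incomparable maxima is made up.

```python
from typing import List, Set, Tuple

Matrix = List[str]
Size = Tuple[int, int]


def prefix(a: Matrix, r: int, c: int) -> Matrix:
    """Top-left r x c submatrix of a."""
    return [row[:c] for row in a[:r]]


def suffix(a: Matrix, r: int, c: int) -> Matrix:
    """Bottom-right r x c submatrix of a."""
    return [row[len(row) - c:] for row in a[len(a) - r:]]


def suffix_prefix_set(A: Matrix, pattern: Matrix) -> Set[Size]:
    """Sizes of the non-empty suffixes of A that are also prefixes of the pattern."""
    m, n = len(pattern), len(pattern[0])
    return {(r, c)
            for r in range(1, min(len(A), m) + 1)
            for c in range(1, min(len(A[0]), n) + 1)
            if suffix(A, r, c) == prefix(pattern, r, c)}


def maximal(sizes: Set[Size]) -> Set[Size]:
    """Elements not dominated component-wise by another element."""
    return {p for p in sizes
            if not any(q != p and q[0] >= p[0] and q[1] >= p[1] for q in sizes)}


def expand(maxima: Set[Size], pattern: Matrix) -> Set[Size]:
    """Rebuild the full set from its maximal elements, using only the pattern."""
    result = set()
    for R, C in maxima:
        B = prefix(pattern, R, C)           # equals the R x C suffix of A
        for r in range(1, R + 1):
            for c in range(1, C + 1):
                if suffix(B, r, c) == prefix(pattern, r, c):
                    result.add((r, c))
    return result


S = suffix_prefix_set(["aa", "ab"], ["ab", "bb"])
print(sorted(maximal(S)))                    # [(1, 2), (2, 1)]: no unique maximum
print(expand(maximal(S), ["ab", "bb"]) == S)  # True
```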

Conclusions
Presentation of several 2D pattern matching algorithms
All of them have been formally derived
– the derivation is a formal proof
– the derivations show the major design decisions
Similarities between the filter-based algorithms
Several improvements to existing algorithms
– most notably: in Polcar's algorithm, sets of matrices can be represented by their maximal elements

Future work
Derive other existing algorithms
Construct a taxonomy
Find new algorithms
Expand existing pattern matching toolkits (SPARE Time / SPARE Parts) or create a new 2D pattern matching toolkit
Thorough performance analysis
Further generalisations of the 2D pattern matching problem
– Multipattern matching
– More than two dimensions
– Approximate 2D pattern matching
– Patterns of non-rectangular shapes
– ...

Questions