Parallel String Matching Algorithm(s) Using Associative Processors Original work by Mary Esenwein and Dr. Johnnie Baker Presented by Shannon Steinfadt.

Presentation transcript:

Parallel String Matching Algorithm(s) Using Associative Processors
Original work by Mary Esenwein and Dr. Johnnie Baker
Presented by Shannon Steinfadt
April 18, 2007

2 String Matching Problem
/ A.k.a. pattern matching or string searching
/ Useful in many applications such as text editing, information retrieval, DNA analysis, and Homeland Security

3 What are we doing?
/ Given a pattern and some text, find out if the pattern is IN the text
/ Is pattern AB in the text ABAA? If so, where?

4 What’s the notation?
/ P is a pattern string of length m
/ T is a text string of length n, usually n ≥ m

5 Goal of String Matching
/ To find all occurrences of a pattern string in the text string
/ Locate all positions i in T such that T[i+j-1] = P[j] for all j, 1 ≤ j ≤ m
Why use P[j]? How does it relate to T[i+j-1]?
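The definition above can be checked directly with a sequential sketch. This is not one of the ASC algorithms from the slides; it is a minimal naive matcher, with the hypothetical name naive_match, that keeps the slide's 1-based i and j so the relation between P[j] and T[i+j-1] is visible.

```python
def naive_match(text, pattern):
    """Return all 1-based positions i with T[i+j-1] == P[j] for every j in 1..m."""
    n, m = len(text), len(pattern)
    positions = []
    for i in range(1, n - m + 2):          # candidate start positions 1..n-m+1
        # Python strings are 0-based, so T[i+j-1] is text[i + j - 2]
        # and P[j] is pattern[j - 1].
        if all(text[i + j - 2] == pattern[j - 1] for j in range(1, m + 1)):
            positions.append(i)
    return positions


print(naive_match("ABAA", "AB"))           # pattern AB starts at position 1
print(naive_match("ABBBABBBABA", "BBA"))   # positions 3 and 7
```

For a fixed start i, the index j walks through the pattern while i+j-1 walks through the text in step with it, which is why the two subscripts appear together in the definition.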

6 Pattern Variations
/ An exact pattern
/ A “Don’t Care” character (*) in pattern
/ Flexibility in matching
/ * indicates character(s) of the text that are irrelevant to the matching process

7 General “Don’t Care” Character’s (*) Characteristics
/ Single character of text
/ Multiple consecutive text characters
/ No characters
/ Combination of above three
Example:
/ Pattern AB*CD could match ABBCD, ABBBBBCD, or ABCD (* is null)
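One way to experiment with the general "don't care" semantics is to translate the pattern into a regular expression. The sketch below is only an illustration using Python's standard re module, not the associative algorithm from the slides; the helper name dont_care_to_regex is made up here, and it assumes "*" is the only special character in the pattern.

```python
import re


def dont_care_to_regex(pattern):
    """Translate a pattern with variable-length '*' don't cares into a regex.

    Each '*' becomes '.*' (zero or more arbitrary characters); every other
    character is escaped so it is matched literally.
    """
    literal_segments = pattern.split("*")
    return re.compile(".*".join(re.escape(s) for s in literal_segments), re.DOTALL)


matcher = dont_care_to_regex("AB*CD")
for candidate in ["ABBCD", "ABBBBBCD", "ABCD", "ABD"]:
    # fullmatch requires the whole candidate string to fit the pattern
    print(candidate, bool(matcher.fullmatch(candidate)))
```

The three strings from the example are accepted, while ABD is rejected because the literal segment CD never appears.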

8 String Matching using ASC
/ Three parallel algorithms using associative computing (using 1-D mesh)
/ String matching for exact match
/ String matching with fixed length “don’t care”
  / I.e., exactly 1 character
/ String matching with variable length “don’t care”
  / a “don’t care” can have any length or be null

9 ASC Exact Match Algorithm
for (j = patt_length - 1; j >= 0; j--) {
    Responders are text[$] == patt_string[j] and counter[$] == patt_counter;
    Responders add 1 to counter[$] and store result in counter[$] of preceding cell;
    patt_counter++;
}
/* When pattern has been processed */
Responders are counter[$] == patt_length;
Responders set match[$] = 1 in next cell;

Pattern: BBA   Text: ABBBABBBABA
m = pattern length, n = text length, j = pattern index, i = text index
patt_counter = 0, patt_length = 3

Text[$]  Match[$]  Counter[$]
@        0         0
A        0         0
B        0         0
B        0         0
B        0         0
A        0         0
B        0         0
B        0         0
B        0         0
A        0         0
B        0         0
A        0         0


Final State of Exact Match Algorithm
Pattern: BBA   Text: ABBBABBBABA
m = pattern length, n = text length, j = pattern index, i = text index

Text[$]  Match[$]  Counter[$]
@        0         1
A        0         0
B        0         3
B        1         2
B        0         1
A        0         0
B        0         3
B        1         2
B        0         1
A        0         2
B        0         1
A        0         0
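The exact match algorithm can be simulated sequentially to check the final state above. The sketch below is a plain Python stand-in for the parallel cells, not ASC code: each list position plays the role of one cell, the name asc_exact_match is made up for illustration, and the '@' sentinel in the first cell is taken from the trace tables (it is assumed not to occur in the pattern).

```python
def asc_exact_match(text, pattern, sentinel="@"):
    """Simulate the ASC exact match algorithm; returns (match, counter) per cell."""
    cells = [sentinel] + list(text)        # cell 1 holds the sentinel, cell 2 holds T[1]
    counter = [0] * len(cells)
    match = [0] * len(cells)
    patt_counter = 0
    for j in range(len(pattern) - 1, -1, -1):
        # Responders are found first, then updated, to mimic one parallel step.
        responders = [c for c in range(len(cells))
                      if cells[c] == pattern[j] and counter[c] == patt_counter]
        for c in responders:
            if c > 0:                      # the first cell has no preceding cell
                counter[c - 1] = patt_counter + 1
        patt_counter += 1
    # When the pattern has been processed: counter == pattern length marks the
    # cell just before a match, so the flag is set in the next cell.
    for c in range(len(cells) - 1):
        if counter[c] == len(pattern):
            match[c + 1] = 1
    return match, counter


match, counter = asc_exact_match("ABBBABBBABA", "BBA")
for text_char, m, cnt in zip("@ABBBABBBABA", match, counter):
    print(text_char, m, cnt)
```

The loop over the pattern runs m times regardless of the text length, which mirrors the O(m) step count of the associative version; the inner list comprehensions stand in for work that all cells would do simultaneously.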

13 Algorithm for unit length "don't cares" using ASC
for (j = patt_length - 1; j >= 0; j--) {
    if (pattern[j] == '*')
        Responders are counter[$] == patt_counter;
    else /* pattern[j] is not the “don’t care” character */
        Responders are text[$] == pattern[j] and counter[$] == patt_counter;
    If no Responders are detected, exit;
    Responders add 1 to counter[$] and store result in counter[$] of preceding cell;
    patt_counter++;
}
/* When pattern has been processed */
Responders are counter[$] == patt_length;
Responders set match[$] = 1 in next cell;

14 ASC Exact Match Algorithm (again)
for (j = patt_length - 1; j >= 0; j--) {
    Responders are text[$] == patt_string[j] and counter[$] == patt_counter;
    Responders add 1 to counter[$] and store result in counter[$] of preceding cell;
    patt_counter++;
}
/* When pattern has been processed */
Responders are counter[$] == patt_length;
Responders set match[$] = 1 in next cell;

Pattern: B*A   Text: ABBBABBBABA
m = pattern length, n = text length, j = pattern index, i = text index
patt_counter = 0, patt_length = 3

Text[$]  Match[$]  Counter[$]
@        0         0
A        0         0
B        0         0
B        0         0
B        0         0
A        0         0
B        0         0
B        0         0
B        0         0
A        0         0
B        0         0
A        0         0


Final State of Exact Match Algorithm
Pattern: B*A   Text: ABBBABBBABA
m = pattern length, n = text length, j = pattern index, i = text index

Text[$]  Match[$]  Counter[$]
@        0         1
A        0         0
B        0         3
B        1         2
B        0         1
A        0         0
B        0         3
B        1         2
B        0         1
A        0         2
B        0         1
A        0         0
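The unit length "don't care" variant can be simulated the same way. The following is a sequential Python sketch, not ASC code: the name asc_unit_dont_care is hypothetical, the '@' sentinel cell follows the trace tables, and "exit" in the pseudocode is interpreted here as returning with no matches.

```python
def asc_unit_dont_care(text, pattern, sentinel="@"):
    """Simulate the unit length don't care algorithm; '*' matches exactly one character."""
    cells = [sentinel] + list(text)
    counter = [0] * len(cells)
    match = [0] * len(cells)
    patt_counter = 0
    for j in range(len(pattern) - 1, -1, -1):
        if pattern[j] == "*":
            # A don't care ignores the text character; only the counter is tested.
            responders = [c for c in range(len(cells)) if counter[c] == patt_counter]
        else:
            responders = [c for c in range(len(cells))
                          if cells[c] == pattern[j] and counter[c] == patt_counter]
        if not responders:
            return match, counter          # no partial match can be extended
        for c in responders:
            if c > 0:                      # the first cell has no preceding cell
                counter[c - 1] = patt_counter + 1
        patt_counter += 1
    for c in range(len(cells) - 1):
        if counter[c] == len(pattern):
            match[c + 1] = 1
    return match, counter


match, counter = asc_unit_dont_care("ABBBABBBABA", "B*A")
print(match)
print(counter)
```

For this text, B*A ends in the same final state as BBA because the character under every "*" position happens to be B.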

18 VLDC Algorithm (added)
/ Works on each “segment” of the pattern broken up by the * character
  / AB*BB*A has three sections
/ Consecutive ** characters not necessary, not allowed
/ This VLDC algorithm is unique
  / Provides information to find all continuation points of all matches following each “*”

19 VLDC ALGORITHM USING ASC
int patt_length = m;
int maxcell = n + 2;
/* Special handling for '*' at end of pattern */
if (pattern[m-1] == '*') {
    Responders are cell index > 1;
    Responders set segment$[0] = 1;
    patt_counter = 1;
    k = 1; /* Reset initial segment index */
}
while ((patt_length -= patt_counter) > 0 && maxcell > 0) {
    patt_counter = 0;
    for (I = patt_length - 1; I >= 0 && pattern[I] != '*'; I--) {
        Responders are text$ == pattern[I] and counter$ == patt_counter and cell index < maxcell;
        Responders add 1 to counter$ and store result in counter$ of preceding cell;
        patt_counter++;
    }
    Responders are counter$ == patt_counter;

20 VLDC continued
    Responders set segment$[k] = patt_counter in next cell;
    Responders are segment$[k] > 0;
    maxcell = maximum cell index value of Responders
        else if no Responders maxcell = 0;
    All cells become Responders and set counter$ = 0;
    patt_counter++;
    k++;
}
/* When pattern has been processed */
Responders are segment$[--k] > 0;
Responders set match$ = 1;
/* Special handling for '*' at start of pattern */
if (pattern[0] == '*') {
    Responders are cell index 1;
    Responders set match$ = 1;
}

After third pattern segment in VLDC
Pattern: AB*BB*A   Text: ABBBABBBABA
[Trace table: per-cell columns T$, M$, C$, S0$, S1$, S2$, Responder$; scalars Maxcell and Patt_counter]

After second pattern segment in VLDC
Pattern: AB*BB*A   Text: ABBBABBBABA
[Trace table: per-cell columns T$, M$, Counter$, S0$, S1$, S2$, Responder$; scalars Maxcell and Patt_counter]
Maxcell: 13 → 12 (used to keep pattern segments in order, i.e. AB occurs before BB)

After first pattern segment in VLDC
Pattern: AB*BB*A   Text: ABBBABBBABA
[Trace table: per-cell columns T$, M$, Counter$, S0$, S1$, S2$, Responder$; scalars Maxcell and Patt_counter]
Maxcell: 13 → 12 → 8 (used to keep pattern segments in order, i.e. AB occurs before BB)

Final State in VLDC
Pattern: AB*BB*A   Text: ABBBABBBABA
Maxcell: 13 → 12 → 8 → 6 (used to keep pattern segments in order, i.e. AB occurs before BB)

Index  T$  M$  Counter$  S0$  S1$  S2$
1      @   0   0         0    0    0
2      A   1   0         1    0    2
3      B   0   0         0    2    0
4      B   0   0         0    2    0
5      B   0   0         0    0    0
6      A   1   0         1    0    2
7      B   0   0         0    2    0
8      B   0   0         0    2    0
9      B   0   0         0    0    0
10     A   0   0         1    0    0
11     B   0   0         0    0    0
12     A   0   0         1    0    0

Responder$ is Y in the two cells where M$ = 1.
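The VLDC pseudocode from slides 19 and 20 can also be traced with a sequential simulation. This is a sketch under stated assumptions rather than the authors' ASC program: patt_counter and k are assumed to start at 0 (their declarations are not shown on the slides), cell 1 is assumed to hold the '@' sentinel as in the exact match traces, and the special cases for a pattern that starts or ends with "*" are left out because the slide text for them is incomplete. The name vldc_match is hypothetical.

```python
def vldc_match(text, pattern, sentinel="@"):
    """Simulate the VLDC main loop; returns (match_cells, segment) with 1-based cells."""
    if pattern.startswith("*") or pattern.endswith("*") or "**" in pattern:
        raise ValueError("leading, trailing, and consecutive '*' are not handled in this sketch")
    cells = [sentinel] + list(text)            # list position c is cell index c + 1
    n_cells = len(cells)
    counter = [0] * n_cells
    segment = [[0] * n_cells for _ in range(pattern.count("*") + 1)]
    patt_length, maxcell = len(pattern), len(text) + 2
    patt_counter, k = 0, 0                     # assumed initial values
    while True:
        patt_length -= patt_counter
        if patt_length <= 0 or maxcell <= 0:
            break
        patt_counter = 0
        i = patt_length - 1
        while i >= 0 and pattern[i] != "*":    # process one segment right to left
            responders = [c for c in range(n_cells)
                          if cells[c] == pattern[i] and counter[c] == patt_counter
                          and c + 1 < maxcell]
            for c in responders:
                if c > 0:
                    counter[c - 1] = patt_counter + 1
            patt_counter += 1
            i -= 1
        # Cells whose counter reached the segment length precede a segment match.
        for c in [c for c in range(n_cells) if counter[c] == patt_counter]:
            if c + 1 < n_cells:
                segment[k][c + 1] = patt_counter
        hits = [c + 1 for c in range(n_cells) if segment[k][c] > 0]
        maxcell = max(hits) if hits else 0     # later segments bound earlier ones
        counter = [0] * n_cells
        patt_counter += 1                      # also skips the '*' to the left
        k += 1
    k -= 1
    match_cells = [c + 1 for c in range(n_cells) if segment[k][c] > 0]
    return match_cells, segment


match_cells, segment = vldc_match("ABBBABBBABA", "AB*BB*A")
print(match_cells)
for x, row in enumerate(segment):
    print("S%d$" % x, [c + 1 for c, v in enumerate(row) if v > 0])
```

Because the pattern is consumed from the right, segment[0] describes the last pattern segment (A) and the highest-numbered segment array describes the first one (AB), which is where the match flags come from.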

25 Finding All Continuation Points
/ Match starts where M$ = 1
/ Match to any pattern segment begins where S$[x] == segment length
  / i.e. where any S$[x] > 0
/ Continuation of match is any entry > 0 in S$[x-1] whose cell/PE index is >= (cell/PE index of the S$[x] entry) + (segment size stored in S$[x])

Using the Final State in VLDC
Pattern: AB*BB*A   Text: ABBBABBBABA
/ Start with index 2, where there’s a match (M$ = 1)
/ Work from S2$ down and left: count down 2 values and move into S1$, count down 2 values and move to S0$
/ That produces:
  2 → 4 → 6    ABBBA
/ Any index >= 4 in S1$ whose value is > 0 will also produce a correct match:
  2 → 7 → 10   ABBBABBBA
  2 → 8 → 10   ABBBABBBA
/ Some of the additional matches are:
  2 → 4 → 10   ABBBABBBA
  2 → 4 → 12   ABBBABBBABA
  2 → 8 → 12   ABBBABBBABA
  6 → 8 → 10   ABBBA
  6 → 8 → 12   ABBBABA
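The continuation rule can be turned into a small enumeration. The sketch below is illustrative only: the function name continuation_chains is made up, and the cell indices passed in are assumptions obtained by tracing the VLDC pseudocode by hand for this pattern and text (they agree with the chains listed above, but they are not copied from a slide).

```python
def continuation_chains(segment_cells, segment_lengths):
    """Enumerate every chain of segment start cells that forms a full match.

    segment_cells[x]   -- cell indices where S$[x] > 0 (x = 0 is the LAST pattern segment)
    segment_lengths[x] -- length of the pattern segment recorded in S$[x]
    """
    top = len(segment_cells) - 1
    chains = []

    def extend(chain, x):
        if x < 0:                              # every segment has been placed
            chains.append(tuple(chain))
            return
        # The next segment may start no earlier than the previous start
        # plus the previous segment's length.
        earliest = chain[-1] + segment_lengths[x + 1]
        for cell in segment_cells[x]:
            if cell >= earliest:
                extend(chain + [cell], x - 1)

    for start in segment_cells[top]:           # matches begin in the highest S$[x]
        extend([start], top - 1)
    return chains


# Assumed final state for pattern AB*BB*A on text ABBBABBBABA:
# S0$ records segment A, S1$ records BB, S2$ records AB.
chains = continuation_chains([[2, 6, 10, 12], [3, 4, 7, 8], [2, 6]], [1, 2, 2])
for chain in chains:
    print(" -> ".join(str(cell) for cell in chain))
```

The S1$ entry at cell 3 never appears in a chain, since a BB starting there would overlap the AB that starts at cell 2; this is the ordering constraint that the index comparison enforces.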

27 Existing Algorithms
/ Sequential Algorithms
  / Naïve algorithm: O(mn)
  / Knuth, Morris, & Pratt, or Boyer-Moore: O(m+n)
/ Parallel Algorithms
  / A PRAM exact string matching: O(n)
  / On a reconfigurable mesh: O(1) on n(n-m+1) PEs
  / On a SIMD hypercube (limited to {0,1}): O(lg n) on n/lg n PEs
  / On a neural network: O(1) on nm PEs
  / ASC algorithms: O(m) time on O(n) PEs

28 Question to consider / The “don’t care” character allows non-matching for an arbitrary length. This is discussed on slide 13. Instead, consider “*” to allow a non-match for two characters and make necessary changes in trace in Slide