Hannu Peltola Jorma Tarhio Aalto University Finland Variations of Forward-SBNDM.

Presentation transcript:

Aug. 29, 2011
Aims
Tuning algorithms for exact string matching.
Studying the effect of reading 2 bytes simultaneously.

SBNDM: Simple Backward Nondeterministic DAWG Matching
SBNDM [18] is a simplification of BNDM [17]. Both are bit-parallel algorithms.
Text T = t_1 ... t_n, pattern P = p_1 ... p_m.
At each alignment window of P in T, scan T from right to left until the suffix of the window is not a factor of P or an occurrence of P is found.

Shift of SBNDM
No factor: m
P found: 1
Else: the next alignment starts at the last factor

SBNDM, example
P = banana, T = antanabadbanana... (brackets mark the current alignment window)
Alignment: [antana]badbanana. Factors read from right to left: a, na, ana. Not a factor: tana.
Next alignment: ant[anabad]banana. Not a factor: d.
Next alignment: antanabad[banana]
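The scan and the three shift rules can be sketched in Python as follows. This is a minimal illustration, not the authors' tuned implementation: the function name `sbndm_search`, the 0-based positions, and the bit layout (bit m - 1 - i of B[c] is set when the i-th pattern character equals c) are assumptions made for this sketch. It also returns the list of window start positions so that the alignments of the banana example can be followed.

```python
def sbndm_search(pattern, text):
    """Return (occurrences, alignments) as 0-based window start positions."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return [], []
    # Occurrence vectors (assumed layout): bit (m - 1 - i) of B[c] is set
    # iff pattern[i] == c. Characters absent from the pattern map to 0.
    B = {}
    for i, c in enumerate(pattern):
        B[c] = B.get(c, 0) | (1 << (m - 1 - i))

    occurrences, alignments = [], []
    pos = 0
    while pos <= n - m:
        alignments.append(pos)
        D = (1 << m) - 1   # the empty suffix is trivially a factor
        k = 0              # length of the window suffix known to be a factor
        while k < m:
            E = D & B.get(text[pos + m - 1 - k], 0)
            if E == 0:
                break      # the suffix of length k + 1 is not a factor
            k += 1
            D = E << 1     # the overflow bit is cleared by the next AND
        if k == m:
            occurrences.append(pos)
            pos += 1       # P found: shift 1
        else:
            pos += m - k   # start at the last factor (shift m when k == 0)
    return occurrences, alignments


print(sbndm_search("banana", "antanabadbanana"))
```

For the example text the windows start at positions 0, 3 and 9, and the only occurrence is at position 9.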

SBNDMq
SBNDMq [6] is a tuned version of SBNDM. Processing of an alignment starts with checking a q-gram.
Let q = 4 and consider an alignment at antana. Instead of testing the four suffixes a, na, ana, and tana, only tana is tested.
Testing is done in a fast loop.
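A rough Python sketch of the q-gram test follows. The name `sbndmq_search` and the bit layout are assumptions of this sketch, and Python gives no real "fast loop"; the point is only the control flow: the last q characters of the window are combined first, and when they do not form a factor of P the window moves m - q + 1 positions.

```python
def sbndmq_search(pattern, text, q):
    """Return 0-based start positions of pattern in text (sketch of SBNDMq)."""
    m, n = len(pattern), len(text)
    if not 1 <= q <= m:
        raise ValueError("q must satisfy 1 <= q <= len(pattern)")
    # Assumed layout: bit (m - 1 - i) of B[c] is set iff pattern[i] == c.
    B = {}
    for i, c in enumerate(pattern):
        B[c] = B.get(c, 0) | (1 << (m - 1 - i))

    occurrences = []
    pos = 0
    while pos <= n - m:
        end = pos + m - 1
        # q-gram test: combine the last q characters of the window at once.
        D = B.get(text[end], 0)
        for j in range(1, q):
            D = (D << 1) & B.get(text[end - j], 0)
        if D == 0:
            pos += m - q + 1       # the q-gram is not a factor of P
            continue
        k = q                      # the last k characters form a factor
        while k < m:
            E = (D << 1) & B.get(text[end - k], 0)
            if E == 0:
                break
            D = E
            k += 1
        if k == m:
            occurrences.append(pos)
            pos += 1
        else:
            pos += m - k
    return occurrences


print(sbndmq_search("banana", "antanabadbanana", 4))
```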

Forward-SBNDM
Forward-SBNDM (FSB for short) by Faro & Lecroq [7] is a lookahead version of SBNDM2.
Both FSB and SBNDM2 read a 2-gram x_1 x_2 before a factor test.
In SBNDM2, x_1 x_2 is matched with the end of P.
In FSB, only x_1 is matched with the end of P, and x_2 is a lookahead character following the current alignment.
FSB is faster than SBNDM2 for large alphabets.

Generalization of FSB: FSB(q,f)
FSB(q,f) (= Forward-SBNDM(q,f)) is SBNDMq with f lookahead characters, f = 0, 1, ..., q - 1.
FSB(2,1) = FSB and FSB(q,0) = SBNDMq.
Motivation: SBNDMq works well on modern processors also for q > 2.

FSB(q,f)
Let UV be a q-gram, where |V| = f. After reading UV there are 3 alternatives:
i. If U is a suffix of P, reading continues leftwards.
ii. Else if UV is a factor of P, reading continues leftwards.
iii. Else the state vector is zero and P is shifted m - q + f + 1 positions (f positions more than in SBNDMq).
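One way to realize the three alternatives with a single bit-vector test is to treat the f lookahead positions as wildcards appended to P, so that every occurrence vector carries f extra low bits that are always set. The sketch below follows that idea; the name `fsb_search`, the bit layout, and the handling of lookahead positions beyond the end of the text (treated as matching only the wildcards) are assumptions of this sketch and not details taken from the slides.

```python
def fsb_search(pattern, text, q, f):
    """Return 0-based start positions of pattern in text (sketch of FSB(q,f))."""
    m, n = len(pattern), len(text)
    if not (1 <= q <= m and 0 <= f < q):
        raise ValueError("need 1 <= q <= len(pattern) and 0 <= f <= q - 1")
    M = m + f                    # pattern length plus f lookahead positions
    extra = (1 << f) - 1         # extra bits: set for every character
    # Assumed layout: bit (M - 1 - i) of B[c] is set iff pattern[i] == c.
    B = {}
    for i, c in enumerate(pattern):
        B[c] = B.get(c, extra) | (1 << (M - 1 - i))

    def vec(i):
        # Positions past the end of the text can only match lookahead slots.
        return B.get(text[i], extra) if i < n else extra

    occurrences = []
    pos = 0
    while pos <= n - m:
        end = pos + M - 1        # last lookahead character of this alignment
        D = vec(end)
        for j in range(1, q):    # read the q-gram UV, rightmost character first
            D = (D << 1) & vec(end - j)
        if D == 0:
            pos += m - q + f + 1     # alternative iii
            continue
        k = q                    # alternatives i and ii: continue leftwards
        while k < M:
            E = (D << 1) & vec(end - k)
            if E == 0:
                break
            D = E
            k += 1
        if k == M:
            occurrences.append(pos)
            pos += 1
        else:
            pos += M - k
    return occurrences


print(fsb_search("banana", "antanabadbanana", 4, 2))
```

With f = 0 the extra bits vanish and the sketch behaves like the SBNDMq scan; with q = 2 and f = 1 it corresponds to the one-character lookahead of FSB.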

Occurrence vectors in FSB(q,2)
Example: P = banana
[Figure: bit vectors B[n] for SBNDMq, and B[n], B[a], B[x] for FSB(q,2) with the extra bits marked; the bit values are not preserved in this transcript.]

State vectors in FSB(q,2) for q = 4
The 4-gram nanx is read from right to left: x, n, a, n.
4-gram   Conclusion
nanx     na is a suffix of P
xana     not a factor
anan     factor of P
nanx is not a factor.
[The state vector column of the table is not preserved in this transcript.]
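Since the vector values did not survive, the small sketch below recomputes occurrence and state vectors for P = banana under an assumed bit layout: the leftmost pattern character maps to the highest bit and the f extra bits occupy the lowest positions. The bit order on the original slides may differ, and the helper names `fsb_vectors` and `state_vector` are hypothetical.

```python
def fsb_vectors(pattern, f):
    """Occurrence vectors with f extra low bits set for every character."""
    m = len(pattern)
    extra = (1 << f) - 1
    B = {}
    for i, c in enumerate(pattern):
        B[c] = B.get(c, extra) | (1 << (m + f - 1 - i))
    return B, extra


def state_vector(B, extra, gram):
    """State vector after reading gram from right to left."""
    D = B.get(gram[-1], extra)
    for c in reversed(gram[:-1]):
        D = (D << 1) & B.get(c, extra)
    return D


B, extra = fsb_vectors("banana", 2)
for c in "nax":
    print("B[%s] =" % c, format(B.get(c, extra), "08b"))
for gram in ("nanx", "xana", "anan"):
    print(gram, format(state_vector(B, extra, gram), "08b"))
```

Under this layout nanx yields a nonzero vector because na matches the end of P and nx falls on the extra bits, xana yields zero, and anan yields a nonzero vector, which agrees with the conclusions listed on the slide.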

Benefits / drawbacks of lookahead characters and extra bits
Benefits:
Longer shifts → more speed
Combined suffix / factor test
Drawback:
More q-grams accepted → less speed

Greedy skip loop for SBNDM2 (GSB2 = Greedy-SBNDM2)
Factor tests of two 2-grams are done in one round.
Let B2[x, y] denote the combined occurrence vector of characters x and y: B2[x, y] = B[x] & (B[y] << 1)
next: D ← B2[t_i, t_{i+1}]
      if D = 0 then
          if B2[t_{i+m-1}, t_{i+m}] = 0 then
              i ← i + 2*m - 2
              goto next
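The slide shows only the skip loop. The sketch below completes it into a full search in one plausible way, which is an assumption rather than the authors' code: when the first 2-gram is not a factor, the 2-gram m - 1 positions further is tested in the same round, and if that one is a factor the ordinary SBNDM2 scan continues from there. The name `gsb2_search`, the dictionary-based B2 table, and the 0-based indexing are choices made for this illustration.

```python
def gsb2_search(pattern, text):
    """Return 0-based start positions of pattern in text (sketch of GSB2)."""
    m, n = len(pattern), len(text)
    if m < 2:
        raise ValueError("this sketch needs a pattern of length >= 2")
    # Assumed layout: bit (m - 1 - i) of B[c] is set iff pattern[i] == c.
    B = {}
    for i, c in enumerate(pattern):
        B[c] = B.get(c, 0) | (1 << (m - 1 - i))
    # Preprocessing: B2[x, y] = B[x] & (B[y] << 1); missing pairs count as 0.
    B2 = {(x, y): B[x] & (B[y] << 1) for x in B for y in B}

    occurrences = []
    i = m - 2        # text[i], text[i + 1] are the last two window characters
    while i + 1 < n:
        D = B2.get((text[i], text[i + 1]), 0)
        if D == 0:
            i += m - 1           # first 2-gram is not a factor
            if i + 1 >= n:
                break
            D = B2.get((text[i], text[i + 1]), 0)
            if D == 0:
                i += m - 1       # second one neither: 2*m - 2 in total
                continue
        k = 2                    # the last k window characters form a factor
        while k < m:
            E = (D << 1) & B.get(text[i + 1 - k], 0)
            if E == 0:
                break
            D = E
            k += 1
        if k == m:
            occurrences.append(i + 2 - m)
            i += 1
        else:
            i += m - k
    return occurrences


print(gsb2_search("banana", "antanabadbanana"))
```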

2-byte read
Read two characters (= 2 bytes = 16 bits) in one instruction (in a skip loop).
This suits q-gram algorithms with even q well.
For the experiments we made two versions of the algorithms:
Standard (1-byte read)
b-version using 2-byte read

2-byte read (cont.)
Advantage: a part of the computation can be moved to the preprocessing phase.
Example: B2[x, y] = B[x] & (B[y] << 1)
The speed-up factor can be even more than 2.
Drawback: an extra 0.1 ms for preprocessing.
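The idea of moving the B2 computation into preprocessing can be illustrated with a 65536-entry table indexed by the 16-bit value of two adjacent bytes. The sketch below is only a model of the data layout: the helper names `build_b2_table` and `two_gram_vector` are hypothetical, a little-endian 16-bit read is assumed, and Python will not show the speed-up that a single machine instruction gives in C.

```python
import struct


def build_b2_table(pattern: bytes):
    """Precompute B2 for every 16-bit key (assumes a little-endian 2-byte read)."""
    m = len(pattern)
    B = [0] * 256
    for i, c in enumerate(pattern):
        B[c] |= 1 << (m - 1 - i)       # assumed layout, leftmost char = high bit
    table = [0] * 65536                # 2^16 entries, zero for non-factors
    chars = set(pattern)
    for x in chars:                    # x is the first byte of the 2-gram
        for y in chars:                # y is the second byte
            table[x | (y << 8)] = B[x] & (B[y] << 1)
    return table


def two_gram_vector(table, text: bytes, i: int) -> int:
    """Look up B2[text[i], text[i + 1]] with one 2-byte read."""
    (key,) = struct.unpack_from("<H", text, i)
    return table[key]


table = build_b2_table(b"banana")
print(format(two_gram_vector(table, b"xxanxx", 2), "06b"))
```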

4-byte read?
Many border crossings happen => slowdown
Tables of size 2^32 are too big in practice

Experimental results / KJV Bible
In the recent comparison by S. Faro and T. Lecroq, The Exact String Matching Problem: a Comprehensive Experimental Evaluation (2010), the algorithms EBOM and Hash3 were the fastest on the Bible text for m = 4, ...
[Table residue: Hash3, EBOM; the values and the upper bound of m are not preserved in this transcript.]

KJV: EBOM & Hash3 (on ThinkPad X61s)

KJV: EBOMb & Hash3b (with 2-byte read) added

KJV: SBNDM2b = FSB(2,0)b added

KJV: GSB2b added

KJV: FSB(4,i)b added, i = 0, 1, 2

KJV: Speed-up factors of 2-byte read
Algorithm   Speed-up
GSB2        1.32
FSB(2,0)    1.34
FSB(2,1)    1.24
FSB(4,0)    1.72
FSB(4,1)    2.15
FSB(4,2)    2.03
Hash3       1.05
EBOM        1.17

Other experiments
DNA and binary data were also tested.
The gain from lookahead characters or the greedy loop was smaller than with the Bible data.
The gain from 2-byte read was smaller with 64-bit code than with 32-bit code.

Conclusions
Two new algorithms were presented: FSB(q,f) and GSB2.
The new algorithms are faster than earlier algorithms on English data:
GSB2 for m = 4, ..., 8
FSB(q,f) for m = 8, ..., 20
2-byte read makes most string algorithms faster.

Web site for practical speed comparison: cse.aalto.fi/stringmatching