MA/CSSE 473 Day 24 Student questions Quadratic probing proof

Slides:



Advertisements
Similar presentations
Chapter 7 Space and Time Tradeoffs Copyright © 2007 Pearson Addison-Wesley. All rights reserved.
Advertisements

Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley.
Space-for-Time Tradeoffs
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
Data Structures Using C++ 2E
Chapter 7 Space and Time Tradeoffs. Space-for-time tradeoffs Two varieties of space-for-time algorithms: b input enhancement — preprocess the input (or.
1 CSC 421: Algorithm Design & Analysis Spring 2013 Space vs. time  space/time tradeoffs  examples: heap sort, data structure redundancy, hashing  string.
Boyer Moore Algorithm String Matching Problem Algorithm 3 cases Searching Timing.
Dept of Computer Science, University of Bristol. COMS Chapter 5.2 Slide 1 Chapter 5.2 String Searching - Part 2 Boyer-Moore Algorithm Rabin-Karp.
1 A simple fast hybrid pattern- matching algorithm Department of Computer Science and Information Engineering National Cheng Kung University, Taiwan R.O.C.
1 Prof. Dr. Th. Ottmann Theory I Algorithm Design and Analysis (12 - Text search, part 1)
Design and Analysis of Algorithms - Chapter 71 Space-time tradeoffs For many problems some extra space really pays off (extra space in tables - breathing.
1 Hashing (Walls & Mirrors - end of Chapter 12). 2 I hate quotations. Tell me what you know. – Ralph Waldo Emerson.
Boyer-Moore string search algorithm Book by Dan Gusfield: Algorithms on Strings, Trees and Sequences (1997) Original: Robert S. Boyer, J Strother Moore.
1 CSE 326: Data Structures Hash Tables Autumn 2007 Lecture 14.
Boyer-Moore Algorithm 3 main ideas –right to left scan –bad character rule –good suffix rule.
Chapter 7 Space and Time Tradeoffs Copyright © 2007 Pearson Addison-Wesley. All rights reserved.
COMP 171 Data Structures and Algorithms Tutorial 10 Hash Tables.
Data Structures Using C++ 2E Chapter 9 Searching and Hashing Algorithms.
Hash Tables. Container of elements where each element has an associated key Each key is mapped to a value that determines the table cell where element.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
MA/CSSE 473 Day 28 Hashing review B-tree overview Dynamic Programming.
String Matching. Problem is to find if a pattern P[1..m] occurs within text T[1..n] Simple solution: Naïve String Matching –Match each position in the.
Data Structures and Algorithm Analysis Hashing Lecturer: Jing Liu Homepage:
Chapter 7 Space and Time Tradeoffs James Gain & Sonia Berman
  ;  E       
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
Theory of Algorithms: Space and Time Tradeoffs James Gain and Edwin Blake {jgain | Department of Computer Science University of Cape.
Chapter 6 Transform-and-Conquer Copyright © 2007 Pearson Addison-Wesley. All rights reserved.
Application: String Matching By Rong Ge COSC3100
MA/CSSE 473 Day 27 Hash table review Intro to string searching.
1 CSE 326: Data Structures: Hash Tables Lecture 12: Monday, Feb 3, 2003.
MA/CSSE 473 Day 23 Student questions Space-time tradeoffs Hash tables review String search algorithms intro.
Searching Given distinct keys k 1, k 2, …, k n and a collection of n records of the form »(k 1,I 1 ), (k 2,I 2 ), …, (k n, I n ) Search Problem - For key.
LECTURE 35: COLLISIONS CSC 212 – Data Structures.
Design and Analysis of Algorithms - Chapter 71 Space-time tradeoffs For many problems some extra space really pays off: b extra space in tables (breathing.
String Searching CSCI 2720 Spring 2007 Eileen Kraemer.
Hashing Basis Ideas A data structure that allows insertion, deletion and search in O(1) in average. A data structure that allows insertion, deletion and.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
1 Hashing by Adlane Habed School of Computer Science University of Windsor May 6, 2005.
Fundamental Data Structures and Algorithms
Hashing. Hashing is the transformation of a string of characters into a usually shorter fixed-length value or key that represents the original string.
Hash Tables Ellen Walker CPSC 201 Data Structures Hiram College.
Design and Analysis of Algorithms – Chapter 71 Space-Time Tradeoffs: String Matching Algorithms* Dr. Ying Lu RAIK 283: Data Structures.
MA/CSSE 473 Day 25 Student questions Boyer-Moore.
CSC 421: Algorithm Design & Analysis
1/39 COMP170 Tutorial 13: Pattern Matching T: P:.
CSG523/ Desain dan Analisis Algoritma
MA/CSSE 473 Day 26 Student questions Boyer-Moore B Trees.
CSC 421: Algorithm Design & Analysis
Data Structures Using C++ 2E
MA/CSSE 473 Day 24 Student questions Space-time tradeoffs
Data Structures Using C++ 2E
Advanced Associative Structures
CSCE350 Algorithms and Data Structure
Instructor: Lilian de Greef Quarter: Summer 2017
Space-for-time tradeoffs
Collision Resolution Neil Tang 02/18/2010
Chapter 7 Space and Time Tradeoffs
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
CSC 421: Algorithm Design & Analysis
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
Space-for-time tradeoffs
Space-for-time tradeoffs
Pseudorandom number, Universal Hashing, Chaining and Linear-Probing
Collision Resolution Neil Tang 02/21/2008
Space-for-time tradeoffs
Space-for-time tradeoffs
MA/CSSE 473 Day 27 Student questions Leftovers from Boyer-Moore
CSE 373: Data Structures and Algorithms
Presentation transcript:

MA/CSSE 473 Day 24 Student questions Quadratic probing proof String search Horspool Don't include the second Space-time tradeoffs slide in the on-line slides

Finish the Hashing discussion: quadratic probing

Collision Resolution: Quadratic probing With linear probing, if there is a collision at H, we try H, H+1, H+2, H+3, ... (all modulo the table size) until we find an empty spot. Causes (primary) clustering With quadratic probing, we try H, H+12. H+22, H+32, ... Eliminates primary clustering, but can cause secondary clustering. Is it possible that it misses some available array positions? I.e it repeats the same positions over and over, while never probing some other positions?

Hints for quadratic probing Choose a prime number for the array size, then … If the array is not more than half full, finding a place to do an insertion is guaranteed , and no cell is probed twice before finding it Suppose the array size is P, a prime number greater than 3 Show by contradiction that if i and j are ≤ ⌊P/2⌋, and i≠j, then H + i2 (mod P) ≢ H + j2 (mod P). Use an algebraic trick to calculate next index Replaces mod and general multiplication with subtraction and a bit shift Difference between successive probes: H + (i+1)2 = H + i2 + (2i+1) [can use a bit-shift for the multiplication]. nextProbe = nextProbe + (2i+1); if (nextProbe >= P) nextProbe -= P; Suppose i≠j, and H + i2 (mod P)  H + j2 (mod P). Then (H + i2) -(H + j2)  0 (mod P) (i-j)(i+j)  0 (mod P). But neither 0 < (i + j) < P, so i=j, a contradiction.

Quadratic probing analysis No one has been able to analyze it Experimental data shows that it works well Provided that the array size is prime, and is the table is less than half full

Brute Force String Search Example The problem: Search for the first occurrence of a pattern of length m in a text of length n. Usually, m is much smaller than n. What makes brute force so slow? When we find a mismatch, we can shift the pattern by only one character position in the text. Text: abracadabtabradabracadabcadaxbrabbracadabraxxxxxxabracadabracadabra Pattern: abracadabra abracadabra abracadabra abracadabra abracadabra

Faster String Searching Was a HW problem Brute force: worst case m(n-m+1) A little better: but still Ѳ(mn) on average Short-circuit the inner loop

What we want to do When we find a character mismatch Shift the pattern as far right as we can Without the possibility of skipping over a match.

Horspool's Algorithm A simplified version of the Boyer-Moore algorithm A good bridge to understanding Boyer-Moore Published in 1980 Recall: What makes brute force so slow? When we find a mismatch, we can only shift the pattern to the right by one character position in the text. Text: abracadabtabradabracadabcadaxbrabbracadabraxxxxxxabracadabracadabra Pattern: abracadabra abracadabra abracadabra abracadabra Can we sometimes shift farther? Like Boyer-Moore, Horspool does the comparisons in a counter-intuitive order (moves right-to-left through the pattern) We mostly look at Horspool as a bridge to understanding Boyer-Moore

Horspool's Main Question If there is a character mismatch, how far can we shift the pattern, with no possibility of missing a match within the text? What if the last character in the pattern is compared to a character in the text that does not occur anywhere in the pattern? Text: ... ABCDEFG ... Pattern: CSSE473

How Far to Shift? Look at first (rightmost) character in the part of the text that is compared to the pattern: The character is not in the pattern .....C.......... {C not in pattern) BAOBAB The character is in the pattern (but not the rightmost) .....O..........(O occurs once in pattern) BAOBAB .....A..........(A occurs twice in pattern) The rightmost characters do match .....B...................... Tell students to think about how far we can

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Shift Table Example Shift table is indexed by text and pattern alphabet E.g., for BAOBAB: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 1 2 6 6 6 6 6 6 6 6 6 6 6 6 3 6 6 6 6 6 6 6 6 6 6 6

Example of Horspool’s Algorithm A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 1 2 6 6 6 6 6 6 6 6 6 6 6 6 3 6 6 6 6 6 6 6 6 6 6 6 _ 6 BARD LOVED BANANAS (this is the text) BAOBAB (this is the pattern) BAOBAB BAOBAB (unsuccessful search)

Horspool Code

Continued on next slide Horspool Example pattern = abracadabra text = abracadabtabradabracadabcadaxbrabbracadabraxxxxxxabracadabracadabra shiftTable: a3 b2 r1 a3 c6 a3 d4 a3 b2 r1 a3 x11 abracadabtabradabracadabcadaxbrabbracadabraxxxxxxabracadabracadabra abracadabra Continued on next slide

Horspool Example Continued pattern = abracadabra text = abracadabtabradabracadabcadaxbrabbracadabraxxxxxxabracadabracadabra shiftTable: a3 b2 r1 a3 c6 a3 d4 a3 b2 r1 a3 x11 abracadabtabradabracadabcadaxbrabbracadabraxxxxxxabracadabracadabra abracadabra 49 Using brute force, we would have to compare the pattern to 50 different positions in the text before we find it; with Horspool, only 13 positions are tried.

Boyer Moore Intro When determining how far to shift after a mismatch Horspool only uses the text character corresponding to the rightmost pattern character Can we do better? Often there is a partial match (on the right end of the pattern) before a mismatch occurs Boyer-Moore takes into account k, the number of matched characters before a mismatch occurs. If k=0, same shift as Horspool.

Boyer-Moore Algorithm Based on two main ideas: compare pattern characters to text characters from right to left precompute the shift amounts in two tables bad-symbol table indicates how much to shift based on the text’s character that causes a mismatch good-suffix table indicates how much to shift based on matched part (suffix) of the pattern Details next session

Bad-symbol shift in Boyer-Moore If the rightmost character of the pattern does not match, Boyer-Moore algorithm acts much like Horspool’s If the rightmost character of the pattern does match, BM compares preceding characters right to left until either all pattern’s characters match, or a mismatch on text’s character c is encountered after k > 0 matches text pattern bad-symbol shift: How much should we shift by? d1 = max{t1(c ) - k, 1} , where t1(c) is the value from the Horspool shift table.  k matches Q7

Boyer-Moore Algorithm After successfully matching 0 < k < m characters, the algorithm shifts the pattern right by d = max {d1, d2} where d1 = max{t1(c) - k, 1} is the bad-symbol shift d2(k) is the good-suffix shift Remaining question: How to compute good-suffix shift table? d2[k] = ???

Good-suffix Shift in Boyer-Moore Good-suffix shift d2 is applied after the k last characters of the pattern are successfully matched 0 < k < m How can we take advantage of this? As in the bad suffix table, we want to pre-compute some information based on the characters in the suffix. We create a good suffix table whose indices are k = 1...m-1, and whose values are how far we can shift after matching a k-character suffix (from the right). Spend some time talking with one or two other students. Try to come up with criteria for how far we can shift. Example patterns: CABABA AWOWWOW WOWWOW ABRACADABRA First, look (right to left) for a previous substring that matches the suffix. What if the previous character (before the matching part) is the same as c? Shifting that much definitely will not be a match. So we want the rightmost match that is not preceded by c. What if there isn't one. We'd think that we could just shift by m? But what if a prefix of the pattern matches a suffix (length less than k)? Q8-10

Boyer-Moore Example On Moore's home page http://www.cs.utexas.edu/users/moore/best-ideas/string-searching/fstrpos-example.html