1 CSC 421: Algorithm Design & Analysis Spring 2013 Space vs. time  space/time tradeoffs  examples: heap sort, data structure redundancy, hashing  string.

Slides:



Advertisements
Similar presentations
Chapter 7 Space and Time Tradeoffs Copyright © 2007 Pearson Addison-Wesley. All rights reserved.
Advertisements

Space-for-Time Tradeoffs
Data Structures Using C++ 2E
Chapter 7 Space and Time Tradeoffs. Space-for-time tradeoffs Two varieties of space-for-time algorithms: b input enhancement — preprocess the input (or.
Tries Standard Tries Compressed Tries Suffix Tries.
Design and Analysis of Algorithms - Chapter 71 Space-time tradeoffs For many problems some extra space really pays off (extra space in tables - breathing.
Goodrich, Tamassia String Processing1 Pattern Matching.
Boyer-Moore string search algorithm Book by Dan Gusfield: Algorithms on Strings, Trees and Sequences (1997) Original: Robert S. Boyer, J Strother Moore.
Chapter 7 Space and Time Tradeoffs Copyright © 2007 Pearson Addison-Wesley. All rights reserved.
Data Structures Using C++ 2E Chapter 9 Searching and Hashing Algorithms.
A Fast Algorithm for Multi-Pattern Searching Sun Wu, Udi Manber May 1994.
1 Exact Set Matching Charles Yan Exact Set Matching Goal: To find all occurrences in text T of any pattern in a set of patterns P={p 1,p 2,…,p.
Maps A map is an object that maps keys to values Each key can map to at most one value, and a map cannot contain duplicate keys KeyValue Map Examples Dictionaries:
1 Pattern Matching Using n-grams With Algebraic Signatures Witold Litwin[1], Riad Mokadem1, Philippe Rigaux1 & Thomas Schwarz[2] [1] Université Paris Dauphine.
String Matching. Problem is to find if a pattern P[1..m] occurs within text T[1..n] Simple solution: Naïve String Matching –Match each position in the.
Comp 249 Programming Methodology Chapter 15 Linked Data Structure - Part B Dr. Aiman Hanna Department of Computer Science & Software Engineering Concordia.
Chapter 7 Space and Time Tradeoffs James Gain & Sonia Berman
Heapsort CSC Why study Heapsort? It is a well-known, traditional sorting algorithm you will be expected to know Heapsort is always O(n log n)
MA/CSSE 473 Day 24 Student questions Quadratic probing proof
Theory of Algorithms: Space and Time Tradeoffs James Gain and Edwin Blake {jgain | Department of Computer Science University of Cape.
1 Pattern Matching Using n-gram Sampling Of Cumulative Algebraic Signatures : Preliminary Results Witold Litwin[1], Riad Mokadem1, Philippe Rigaux1 & Thomas.
Application: String Matching By Rong Ge COSC3100
MA/CSSE 473 Day 27 Hash table review Intro to string searching.
CSC 211 Data Structures Lecture 13
1 CSC 427: Data Structures and Algorithm Analysis Fall 2008 Algorithm analysis, searching and sorting  best vs. average vs. worst case analysis  big-Oh.
Can’t provide fast insertion/removal and fast lookup at the same time Vectors, Linked Lists, Stack, Queues, Deques 4 Data Structures - CSCI 102 Copyright.
MA/CSSE 473 Day 23 Student questions Space-time tradeoffs Hash tables review String search algorithms intro.
CSC 427: Data Structures and Algorithm Analysis
Design and Analysis of Algorithms - Chapter 71 Space-time tradeoffs For many problems some extra space really pays off: b extra space in tables (breathing.
String Searching CSCI 2720 Spring 2007 Eileen Kraemer.
Chapter 5: Hashing Part I - Hash Tables. Hashing  What is Hashing?  Direct Access Tables  Hash Tables 2.
1 CSC 427: Data Structures and Algorithm Analysis Fall 2011 Space vs. time  space/time tradeoffs  hashing  hash table, hash function  linear probing.
Hashing Chapter 7 Section 3. What is hashing? Hashing is using a 1-D array to implement a dictionary o This implementation is called a "hash table" Items.
Exact String Matching Algorithms Presented By Dr. Shazzad Hosain Asst. Prof. EECS, NSU.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
CS261 Data Structures Hash Tables Open Address Hashing.
CHAPTER 9 HASH TABLES, MAPS, AND SKIP LISTS ACKNOWLEDGEMENT: THESE SLIDES ARE ADAPTED FROM SLIDES PROVIDED WITH DATA STRUCTURES AND ALGORITHMS IN C++,
Week 15 – Wednesday.  What did we talk about last time?  Review up to Exam 1.
Design and Analysis of Algorithms – Chapter 71 Space-Time Tradeoffs: String Matching Algorithms* Dr. Ying Lu RAIK 283: Data Structures.
Amortized Analysis and Heaps Intro David Kauchak cs302 Spring 2013.
MA/CSSE 473 Day 25 Student questions Boyer-Moore.
CSC 421: Algorithm Design & Analysis
CSC 143T 1 CSC 143 Highlights of Tables and Hashing [Chapter 11 p (Tables)] [Chapter 12 p (Hashing)]
CSG523/ Desain dan Analisis Algoritma
Tries 4/16/2018 8:59 AM Presentation for use with the textbook Data Structures and Algorithms in Java, 6th edition, by M. T. Goodrich, R. Tamassia, and.
MA/CSSE 473 Day 26 Student questions Boyer-Moore B Trees.
CSC 427: Data Structures and Algorithm Analysis
CSC 427: Data Structures and Algorithm Analysis
CSC 421: Algorithm Design & Analysis
Hash table CSC317 We have elements with key and satellite data
CSC 222: Object-Oriented Programming
Mark Redekopp David Kempe
MA/CSSE 473 Day 24 Student questions Space-time tradeoffs
CSC 421: Algorithm Design Analysis
13 Text Processing Hongfei Yan June 1, 2016.
Hash tables Hash table: a list of some fixed size, that positions elements according to an algorithm called a hash function … hash function h(element)
CSCE350 Algorithms and Data Structure
Space-for-time tradeoffs
Chapter 7 Space and Time Tradeoffs
CSC 421: Algorithm Design & Analysis
Space-for-time tradeoffs
Space-for-time tradeoffs
CSC 421: Algorithm Design Analysis
CSC 421: Algorithm Design Analysis
CSC 427: Data Structures and Algorithm Analysis
Amortized Analysis and Heaps Intro
Space-for-time tradeoffs
Space-for-time tradeoffs
Sequences 5/17/ :43 AM Pattern Matching.
Lecture-Hashing.
Presentation transcript:

1 CSC 421: Algorithm Design & Analysis Spring 2013 Space vs. time  space/time tradeoffs  examples: heap sort, data structure redundancy, hashing  string matching brute force, Horspool algorithm, Boyer-Moore algorithm

Space vs. time for many applications, it is possible to trade memory for speed  i.e., additional storage can allow for a more efficient algorithm  ArrayList wastes space when it expands (doubles in size) each expansion yields a list with (slightly less than) half empty improves overall performance since only log N expansions when adding N items  linked structures sacrifice space (reference fields) for improved performance e.g., a linked list takes twice the space, but provides O(1) add/remove from either end  heap sort can perform an efficient sort of a list by building a heap then extracting  hash tables can obtain O(1) add/remove/search if willing to waste space to minimize the chance of collisions, must keep the load factor low HashSet & HashMap resize (& rehash) if load factor every reaches 75% 2

Space vs. time in fact, you have made these space/time tradeoffs often in your code  AnagramFinder (find all anagrams of a given word) can preprocess the dictionary & build a map of words, keyed by sorted letters then, each anagram lookup is O(1)  BinaryTree (implement a binary tree & eventually extend to BST) kept track of the number of nodes in a field (updated on each add/remove) turned the O(N) size operation into O(1)  OKReads (store book reviews & look up by reviewer or title) to allow for efficient searches by either, needed to build redundant data structures  other examples? 3

String matching many useful applications involve searching text for patterns  word processing (e.g., global find and replace)  data mining (e.g., looking for common parameters)  genomics (e.g., searching for gene sequences) TTAAGGACCGCATGCCCTCGAATAGGCTTGAGCTTGCAATTAACGCCCAC… in general  given a (potentially long) string S and need to find (relatively small) pattern P  an obvious, brute force solution exists: O(|S|  |P|)  however, utilizing preprocessing and additional memory can reduce this to O(|S|) in practice, it is often < |S| HOW CAN THIS BE?!? 4

Brute force String: FOOBARBIZBAZ Pattern: BIZ the brute force approach would shift the pattern along, looking for a match: FOOBARBIZBAZ 5 at worst: |S|-|P|+1 passes through the pattern each pass requires at most |P| comparisons  O(|S|  |P|)

Smart shifting FOOBARBIZBAZ BIZ suppose we look at the rightmost letter in the attempted match  here, compare Z in the pattern with O in the string since there is no O in the pattern, can skip the next two comparisons 6 FOOBARBIZBBAZ BIZ again, no R in BIZ, so can skip another two comparisons FOOBARBIZBBAZ BIZ

Horspool algorithm given a string S and pattern P, 1.Construct a shift table containing each letter of the alphabet. if the letter appears in P (other than in the last index)  store distance from end otherwise, store |P| e.g., P = BIZ 2.Align the pattern with the start of the string S, i.e., with S 0 S 1 …S |P|-1 3.Repeatedly, compare from right-to-left with the aligned sequence S i S i+1… S i+|P|-1 if all |P| letters match, then FOUND. if not, then shift P to the right by shiftTable[S i+|P|-1 ] if the shift falls off the end, then NOT FOUND 7 ABC…I…XYZ 323…1…333

Horspool example 1 S = FOOBARBIZBAZP=BIZ shiftTable for "BIZ" FOOBARBIZBAZ BIZ Z and O do not match, so shift shiftTable[O] = 3 spots FOOBARBIZBAZ BIZ Z and R do not match, so shift shiftTable[R] = 3 spots FOOBARBIZBAZ BIZ pattern is FOUND  total number of comparisons = 5 8 ABC…I…XYZ 323…1…333

Horspool example 2 S = "FOZIZBARBIZBAZ"P="BIZ" shiftTable for "BIZ" FOZIZBARBIZBAZ BIZ Z and Z match, but not I and O, so shift shiftTable[Z] = 3 spots FOZIZBARBIZBAZ BIZ Z and B do not match, so shift shiftTable[B] = 2 spots FOZIZBARBIZBAZ BIZ Z and R do not match, so shift shiftTable[R] = 3 spots FOZIZBARBIZBAZ BIZ pattern is FOUND  total number of comparisons = 7 9 ABC…I…XYZ 323…1…333 suppose we are looking for BIZ in a string with no Z's how many comparisons?

Horspool analysis space & time  requires storing shift table whose size is the alphabet  since the alphabet is usually fixed, table requires O(1) space  worst case is O(|S|  |P|) this occurs when skips are infrequent & close matches to the pattern appear often  for random data, however, only O(|S|) 10 Horspool algorithm is a simplification of a more complex (and well-known) algorithm: Boyer-Moore algorithm  in practice, Horspool is often faster  however, Boyer-Moore has O(|S|) worst case, instead of O(|S|  |P|)

Boyer-Moore algorithm based on two kinds of shifts (both compare right-to-left, find first mismatch)  the first is bad-symbol shift (based on the symbol that caused the mismatch) BIZFIZIBIZFIZBIZ FIZBIZ F and B don't match, shift to align F BIZFIZIBIZFIZBIZ FIZBIZ I and Z don't match, shift to align I BIZFIZIBIZFIZBIZ FIZBIZ I and Z don't match, shift to align I BIZFIZIBIZFIZBIZ FIZBIZ F and Z don't match, shift to align F BIZFIZIBIZFIZBIZ FIZBIZ FOUND 11

Bad symbol shift bad symbol table is |alphabet|  |P|  kth row contains shift amount if mismatch occurred at index k 12 ABC…F…I…YZ 0626…5…1… …4…-… …3…2… …2…1… …1…-…22 bad symbol table for FIZBIZ: BIZFIZIBIZFIZBIZ FIZBIZ F and B don't match  badSymbolTable(F,2) = 3 BIZFIZIBIZFIZBIZ FIZBIZ I and Z don't match  badSymbolTable(I, 0) = 1 BIZFIZIBIZFIZBIZ FIZBIZ I and Z don't match  badSymbolTable(I, 3) = 1 BIZFIZIBIZFIZBIZ FIZBIZ F and Z don't match  badSymbolTable(F, 0) = 5 BIZFIZIBIZFIZBIZ FIZBIZ FOUND

Good suffix shift find the longest suffix that matches  if that suffix appears to the left in P preceded by a different char, shift to align  if not, then shift the entire length of the word BIZFIZIBIZFIZBIZ FIZBIZ IZ suffix matches, IZ appears to left so shift to align BIZFIZIBIZFIZBIZ FIZBIZ no suffix match, so shift 1 spot BIZFIZIBIZFIZBIZ FIZBIZB IZ suffix matches, doesn't appear again so full shift BIZFIZIBIZFIZBIZ FIZBIZ FOUND 13

Good suffix shift good suffix shift table is |P|  assume that suffix matches but char to the left does not  if that suffix appears to the left preceded by a different char, shift to align  if not, then can shift the entire length of the word 14 IZBIZZBIZBIZIZZ good suffix table for FIZBIZ BIZFIZIBIZFIZBIZ FIZBIZ IZ suffix matches  goodSuffixTable(IZ) = 3 BIZFIZIBIZFIZBIZ FIZBIZ no suffix match  goodSuffixTable() = 1 BIZFIZIBIZFIZBIZ FIZBIZ BIZ suffix matches  goodSuffixTable(BIZ) = 6 BIZFIZIBIZFIZBIZ FIZBIZ FOUND

Boyer-Moore string search algorithm 1.calculate the bad symbol and good suffix shift tables 2.while match not found and not off the edge a)compare pattern with string section b)shift1 = bad symbol shift of rightmost non-matching char c)shift2 = good suffix shift of longest matching suffix d)shift string section for comparison by max(shift1, shift2) 15 the algorithm has been proven to require at most 3*|S| comparisons  so, O(|S|)  in practice, can require fewer than |S| comparisons  requires storing O(|P|) bad symbol shift table and O(|P|) good suffix shift table