Design and Analysis of Algorithms - Chapter 7


Space-time tradeoffs

For many problems some extra space really pays off (extra space in tables gives breathing room):
- input enhancement
  - non-comparison-based sorting
  - auxiliary tables (shift tables for pattern matching)
- prestructuring
  - hashing
  - indexing schemes (e.g., B-trees)
- tables of information that do all the work
  - dynamic programming

Sorting by Counting

Algorithm ComparisonCountingSort(A[0..n-1])
    for i ← 0 to n-1 do Count[i] ← 0
    for i ← 0 to n-2 do
        for j ← i+1 to n-1 do
            if A[i] < A[j] then Count[j] ← Count[j] + 1
            else Count[i] ← Count[i] + 1
    for i ← 0 to n-1 do S[Count[i]] ← A[i]

Example: for A = [62, 31, 84, 96, 19, 47], the counts are [3, 1, 4, 5, 0, 2], so S = [19, 31, 47, 62, 84, 96].
Efficiency: Θ(n²) comparisons — no better than selection sort, and it uses extra space for Count and S.
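The pseudocode above can be rendered as a minimal Python sketch (the function name and sample array are illustrative, not from the slides):

```python
def comparison_counting_sort(a):
    """Sort by counting, for each element, how many elements are smaller.

    With distinct keys (as the pseudocode assumes), an element's count
    is exactly its final position in sorted order.
    """
    n = len(a)
    count = [0] * n
    for i in range(n - 1):
        for j in range(i + 1, n):
            if a[i] < a[j]:
                count[j] += 1          # a[j] is larger, so it comes after a[i]
            else:
                count[i] += 1          # a[i] is larger (or equal), so it comes after a[j]
    s = [None] * n
    for i in range(n):
        s[count[i]] = a[i]             # place each element at its final position
    return s
```

The nested loops perform n(n-1)/2 comparisons, which is where the Θ(n²) efficiency comes from.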

Sorting by Counting (2)

Algorithm DistributionCountingSort(A[0..n-1])
    // A contains integer values between l and u
    for j ← 0 to u-l do D[j] ← 0
    for i ← 0 to n-1 do D[A[i]-l] ← D[A[i]-l] + 1
    for j ← 1 to u-l do D[j] ← D[j-1] + D[j]
    for i ← n-1 downto 0 do
        j ← A[i]-l;  S[D[j]-1] ← A[i];  D[j] ← D[j]-1

Example: for A = [13, 11, 12, 13, 12, 12] with l = 11 and u = 13, the frequencies are [1, 3, 2] and the distribution (prefix sums) is [1, 4, 6].
Efficiency: Θ(n + (u-l)) — linear when the range of values is small relative to n.
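A runnable Python sketch of the distribution version (names are illustrative; lo and hi play the roles of l and u):

```python
def distribution_counting_sort(a, lo, hi):
    """Stable counting sort for integers known to lie in [lo, hi]."""
    d = [0] * (hi - lo + 1)
    for x in a:                        # count frequencies of each value
        d[x - lo] += 1
    for j in range(1, len(d)):         # prefix sums: one past the last slot of each value
        d[j] += d[j - 1]
    s = [None] * len(a)
    for x in reversed(a):              # right-to-left pass keeps the sort stable
        d[x - lo] -= 1
        s[d[x - lo]] = x
    return s
```

Processing the input right to left, as in the pseudocode, is what makes the sort stable.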

String matching

- pattern: a string of m characters to search for
- text: a (long) string of n characters to search in

Brute-force algorithm:
1. Align the pattern at the beginning of the text.
2. Moving from left to right, compare each character of the pattern to the corresponding character in the text until either all characters are found to match (successful search) or a mismatch is detected.
3. While the pattern is not found and the text is not yet exhausted, realign the pattern one position to the right and repeat step 2.
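The three steps above can be sketched in Python (the function name and test strings are illustrative):

```python
def brute_force_match(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):         # step 1/3: try each alignment left to right
        k = 0
        while k < m and text[i + k] == pattern[k]:
            k += 1                     # step 2: compare characters left to right
        if k == m:                     # all m characters matched
            return i
    return -1                          # text exhausted without a match
```

Worst-case cost is Θ(nm) character comparisons, which motivates the input-enhancement algorithms on the following slides.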

String searching - History

- 1970: Cook shows (using finite-state machines) that the problem can be solved in time proportional to n+m.
- 1976: Knuth and Pratt find an algorithm based on Cook's idea; Morris independently discovers the same algorithm in an attempt to avoid "backing up" over the text.
- At about the same time, Boyer and Moore find an algorithm that examines only a fraction of the text in most cases (by comparing characters in pattern and text from right to left instead of left to right).
- 1980: Another algorithm, proposed by Rabin and Karp, virtually always runs in time proportional to n+m, has the advantage of extending easily to two-dimensional pattern matching, and is almost as simple as the brute-force method.

Horspool's Algorithm

A simplified version of the Boyer-Moore algorithm that retains its key insights:
- compare pattern characters to text from right to left
- given a pattern, create a shift table that determines how much to shift the pattern when a mismatch occurs (input enhancement)

How far to shift?

Look at the first (rightmost) character in the text that was compared. Three cases:

- The character is not in the pattern — shift by the full pattern length m:
      .....c..........        (c not in the pattern)
      BAOBAB
            BAOBAB            (shift by 6)
- The character is in the pattern (but not in the rightmost position) — shift to align its rightmost occurrence among the first m-1 pattern characters with the text character:
      .....O..........        (O occurs once in the pattern)
      BAOBAB
         BAOBAB               (shift by 3)
      .....A..........        (A occurs twice in the pattern)
      BAOBAB
       BAOBAB                 (shift by 1)
- The rightmost characters produced a match — act as in the cases above, still keyed on the text character aligned with the pattern's last character:
      .....B..........
      BAOBAB
        BAOBAB                (shift by 2)

Shift table: stores the number of characters to shift by, depending on the first (rightmost) character compared.

Shift table

- Constructed by scanning the pattern before the search begins.
- All entries are initialized to the length of the pattern, m.
- For each character c occurring in the pattern (other than in its last position), the table entry is updated to the distance from the rightmost occurrence of c to the end of the pattern.

Algorithm ShiftTable(P[0..m-1])
    for i ← 0 to size-1 do Table[i] ← m
    for j ← 0 to m-2 do Table[P[j]] ← m-1-j
    return Table
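A Python sketch of the table construction; using a dict rather than an array indexed by the whole alphabet, so characters absent from the pattern get the default shift m via dict.get (the function name is illustrative):

```python
def shift_table(pattern):
    """Horspool shift table: distance from the rightmost occurrence of each
    character (excluding the last position) to the end of the pattern."""
    m = len(pattern)
    table = {}
    for j in range(m - 1):               # the last character is deliberately skipped
        table[pattern[j]] = m - 1 - j    # later occurrences overwrite earlier ones
    return table                          # look up shifts with table.get(c, m)
```

Scanning left to right means the rightmost occurrence of each character is the one that survives in the table.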

Shift table

Example for pattern BAOBAB (m = 6):

    character c:   A   B   O   all other characters
    shift t(c):    1   2   3   6

The Algorithm

Algorithm HorspoolMatching(P[0..m-1], T[0..n-1])
    ShiftTable(P[0..m-1])      // generate the table of shifts
    i ← m-1                    // position of the pattern's right end in the text
    while i <= n-1 do
        k ← 0                  // number of matched characters
        while k <= m-1 and P[m-1-k] = T[i-k] do
            k ← k+1
        if k = m
            return i-m+1
        else i ← i + Table[T[i]]
    return -1
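A self-contained Python version of the pseudocode, with the shift-table construction from the previous slide inlined (names and test strings are illustrative):

```python
def horspool(text, pattern):
    """Horspool string search: index of the first occurrence of pattern, or -1."""
    m, n = len(pattern), len(text)
    # Shift table: distance of each character's rightmost occurrence (excluding
    # the last position) from the pattern's end; absent characters shift by m.
    table = {}
    for j in range(m - 1):
        table[pattern[j]] = m - 1 - j
    i = m - 1                              # pattern's right end, aligned in the text
    while i <= n - 1:
        k = 0                              # characters matched, scanning right to left
        while k <= m - 1 and pattern[m - 1 - k] == text[i - k]:
            k += 1
        if k == m:
            return i - m + 1               # full match found
        i += table.get(text[i], m)         # shift keyed on the text character T[i]
    return -1
```

Note that the shift is always keyed on the text character aligned with the pattern's last position, regardless of where the mismatch occurred.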

Boyer-Moore algorithm

Based on the same two ideas:
- compare pattern characters to text from right to left
- given a pattern, create a shift table that determines how much to shift the pattern when a mismatch occurs (input enhancement)

Uses an additional shift table built with the same idea applied to the number of matched characters.

The bad-symbol shift

Based on the Horspool idea of using an extra table; however, this table is applied differently:
- If c, the text character corresponding to the last pattern character, is not in the pattern, shift by m characters, exactly as in Horspool's algorithm (c is a "bad symbol").
- If the mismatching character (the bad symbol) does not appear in the pattern, shift to move the pattern past it.
- If the mismatching character (the bad symbol) does appear in the pattern, shift to align its rightmost occurrence in the pattern (to the left of the mismatch position) with the text character.

The bad-symbol shift - example

The bad symbol is NOT in the pattern:
    ...S E R...
    B A R B E R           (S not in pattern; ER matched, k = 2)
            B A R B E R   shift 4 positions

The bad symbol IS in the pattern:
    ...A E R...
    B A R B E R           (A in pattern; ER matched, k = 2)
        B A R B E R       shift 2 positions

This shift is given by d1 = max(t1(c) - k, 1), where t1 is the Horspool table, c is the bad symbol, and k is the number of matched characters (the distance of the mismatch position from the end of the pattern).
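The formula d1 = max(t1(c) - k, 1) can be checked directly in Python (a small illustrative helper, not part of the slides):

```python
def bad_symbol_shift(pattern, c, k):
    """Boyer-Moore bad-symbol shift d1 = max(t1(c) - k, 1), where t1 is the
    Horspool shift table and k is the number of characters already matched."""
    m = len(pattern)
    t1 = {}
    for j in range(m - 1):               # Horspool table, as on earlier slides
        t1[pattern[j]] = m - 1 - j
    return max(t1.get(c, m) - k, 1)      # max(..., 1) guarantees forward progress
```

For BARBER with two matched characters (k = 2): t1(S) = 6 gives a shift of 4, and t1(A) = 4 gives a shift of 2, matching the two examples above.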

The good-suffix shift - example

- What happens if a matched suffix appears again in the pattern (e.g., in ABRACADABRA)?
- It is important to find another occurrence of the suffix preceded by a different character; the shift is then the distance between the two occurrences of the suffix.
- If there is no such occurrence, it is important to find the longest prefix of size l < k that matches a suffix of size l; the shift is then the distance between that suffix and prefix.

The good-suffix shift – example (2)

    pattern ABCBAB:   k  = 1, 2, 3, 4, 5
                      d2 = 2, 4, 4, 4, 4

    pattern BAOBAB:   k  = 1, 2, 3, 4, 5
                      d2 = 2, 5, 5, 5, 5
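The two rules from the previous slide can be sketched as a brute-force (quadratic) Python construction of the d2 table; real implementations precompute this faster, but the sketch reproduces the example tables (the function name is illustrative):

```python
def good_suffix_table(pattern):
    """d2[k] for k = 1..m-1: the Boyer-Moore shift when the last k characters matched."""
    m = len(pattern)
    d2 = {}
    for k in range(1, m):
        suffix = pattern[m - k:]
        shift = None
        # Rule 1: rightmost other occurrence of the suffix whose preceding
        # character differs (or which sits at the very start of the pattern).
        for start in range(m - k - 1, -1, -1):
            if pattern[start:start + k] == suffix:
                if start == 0 or pattern[start - 1] != pattern[m - k - 1]:
                    shift = (m - k) - start
                    break
        if shift is None:
            # Rule 2: longest prefix of length l < k matching a suffix of the pattern.
            shift = m
            for l in range(k - 1, 0, -1):
                if pattern[:l] == pattern[m - l:]:
                    shift = m - l
                    break
        d2[k] = shift
    return d2
```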

Final rule for Boyer-Moore

Calculate the shift as

    d = d1              if k = 0
    d = max(d1, d2)     if k > 0

where d1 = max(t1(c) - k, 1).

Example: searching for pattern BAOBAB in text BESS_KNEW_ABOUT_BAOBABS.

Hashing

A very efficient method for implementing a dictionary, i.e., a set with the operations:
- insert
- find
- delete

Applications:
- databases
- symbol tables

Hash tables and hash functions

- Hash table: an array with indices that correspond to buckets.
- Hash function: determines the bucket for each record.
- Example: student records with key = SSN and hash function h(k) = k mod m (k is a key and m is the number of buckets). If m = 1000, where is a record with a given SSN stored? In bucket k mod 1000, i.e., the bucket indexed by the last three digits of the SSN.
- A hash function must:
  - be easy to compute
  - distribute keys evenly throughout the table
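The division-method hash function from the example is one line of Python (the name h mirrors the slide's notation):

```python
def h(k, m):
    """Division-method hash: bucket index for integer key k with m buckets."""
    return k % m

# With m = 1000, k % 1000 simply keeps the last three digits of the key.
```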

Collisions

- If h(k1) = h(k2), then there is a collision.
- Good hash functions result in fewer collisions.
- Collisions can never be completely eliminated.
- Two schemes handle collisions differently:
  - Open hashing: each bucket points to a linked list of all keys hashing to it.
  - Closed hashing: one key per bucket; in case of collision, find another bucket for one of the keys:
    - linear probing: use the next bucket
    - double hashing: use a second hash function to compute the increment

Open hashing

- If the hash function distributes keys uniformly, the average length of a linked list will be α = n/m.
- Average number of probes: S = 1 + α/2 for a successful search, U = α for an unsuccessful search.
- The worst case is still linear!
- Open hashing still works if n > m.
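A minimal sketch of open hashing (separate chaining) in Python; it assumes integer keys and uses the slide's h(k) = k mod m, with Python lists standing in for the linked lists (the class name is illustrative):

```python
class ChainedHashTable:
    """Open hashing: each bucket holds a list of the keys hashing to it."""
    def __init__(self, m):
        self.m = m
        self.buckets = [[] for _ in range(m)]

    def insert(self, key):
        b = self.buckets[key % self.m]   # h(k) = k mod m
        if key not in b:
            b.append(key)

    def find(self, key):
        return key in self.buckets[key % self.m]

    def delete(self, key):
        b = self.buckets[key % self.m]
        if key in b:
            b.remove(key)
```

Because colliding keys simply extend a bucket's list, the table keeps working even when n > m; searches just scan longer chains.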

Closed hashing

- Does not work if n > m.
- Avoids pointers.
- Deletions are not straightforward.
- The number of probes to insert/find/delete a key depends on the load factor α = n/m (hash table density):
  - successful search: (1/2)(1 + 1/(1-α))
  - unsuccessful search: (1/2)(1 + 1/(1-α)²)
- As the table gets filled (α approaches 1), the number of probes increases dramatically.
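A minimal sketch of closed hashing with linear probing, again assuming integer keys and h(k) = k mod m (the class name is illustrative; deletion is omitted since, as noted above, it is not straightforward):

```python
class LinearProbingTable:
    """Closed hashing with linear probing: at most one key per bucket."""
    def __init__(self, m):
        self.m = m
        self.slots = [None] * m

    def insert(self, key):
        i = key % self.m
        for _ in range(self.m):          # probe at most m buckets
            if self.slots[i] is None or self.slots[i] == key:
                self.slots[i] = key
                return i                 # index where the key was placed
            i = (i + 1) % self.m         # linear probing: try the next bucket
        raise RuntimeError("table is full: closed hashing requires n <= m")

    def find(self, key):
        i = key % self.m
        for _ in range(self.m):
            if self.slots[i] is None:    # empty slot ends the probe sequence
                return False
            if self.slots[i] == key:
                return True
            i = (i + 1) % self.m
        return False
```

The RuntimeError in insert makes the slide's first bullet concrete: with one key per bucket, the table cannot hold more than m keys.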