Chapter 7 Space and Time Tradeoffs Copyright © 2007 Pearson Addison-Wesley. All rights reserved.

Space-for-time tradeoffs
Two varieties of space-for-time algorithms:
- input enhancement — preprocess the input (or part of it) to store some info to be used later in solving the problem
  - counting sorts
  - string searching algorithms
- prestructuring — preprocess the input to make accessing its elements easier
  - hashing
  - indexing schemes (e.g., B-trees)

Sorting by counting

- The algorithm uses the same number of key comparisons as selection sort
- In addition, it uses a linear amount of extra space
- It is not recommended for practical use
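The slide's comparison-counting idea can be sketched in Python (a minimal sketch, not part of the slides): for each element, count how many elements are smaller than it; that count is its position in the sorted array.

```python
def comparison_counting_sort(a):
    """Sorting by counting (comparison counting).
    Uses the same number of key comparisons as selection sort
    and a linear amount of extra space (the two auxiliary arrays)."""
    n = len(a)
    count = [0] * n                      # count[i] = number of elements smaller than a[i]
    for i in range(n - 1):
        for j in range(i + 1, n):        # each pair is compared exactly once
            if a[i] < a[j]:
                count[j] += 1
            else:
                count[i] += 1
    s = [None] * n
    for i in range(n):
        s[count[i]] = a[i]               # count[i] is a[i]'s final position
    return s
```

As the slide notes, this is a teaching device rather than a practical sort: it does quadratic comparisons and needs extra arrays.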

Review: String searching by brute force
pattern: a string of m characters to search for
text: a (long) string of n characters to search in

Brute-force algorithm:
Step 1  Align the pattern at the beginning of the text
Step 2  Moving from left to right, compare each character of the pattern to the corresponding character in the text until either all characters are found to match (successful search) or a mismatch is detected
Step 3  While a mismatch is detected and the text is not yet exhausted, realign the pattern one position to the right and repeat Step 2
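The three steps above can be sketched directly in Python (a minimal sketch, not from the slides):

```python
def brute_force_match(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1.
    Step 1: align the pattern at the beginning of the text.
    Step 2: compare left to right until a full match or a mismatch.
    Step 3: on mismatch, realign one position to the right and repeat."""
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):           # each possible alignment
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:                        # all m characters matched
            return i
    return -1
```

In the worst case this makes m(n - m + 1) character comparisons, which is what the preprocessing-based algorithms below improve on.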

String searching by preprocessing
Several string searching algorithms are based on the input-enhancement idea of preprocessing the pattern:
- Knuth-Morris-Pratt (KMP) algorithm preprocesses the pattern left to right to get useful information for later searching
- Boyer-Moore algorithm preprocesses the pattern right to left and stores information in two tables
- Horspool's algorithm simplifies the Boyer-Moore algorithm by using just one table

Horspool's Algorithm
A simplified version of the Boyer-Moore algorithm:
- preprocesses the pattern to generate a shift table that determines how much to shift the pattern when a mismatch occurs
- always makes a shift based on the text's character c aligned with the last character in the pattern, according to the shift table's entry for c

How far to shift? Look at the text's character c aligned with the last character of the pattern (examples for the pattern BAOBAB):
- c is not in the pattern (e.g., c = C): shift by the pattern's full length, m = 6
- c is in the pattern but is not its rightmost character (O occurs once in the pattern: shift by 3; A occurs twice: shift by 1)
- the rightmost characters do match (c = B): shift by B's rightmost occurrence among the first m - 1 characters, i.e., by 2

Shift table
Shift sizes can be precomputed by the formula

    t(c) = distance from c's rightmost occurrence among the pattern's first m - 1 characters to the pattern's right end, if c is among those characters;
           the pattern's length m, otherwise

by scanning the pattern before the search begins; the results are stored in a table called the shift table. The shift table is indexed by the characters of the text and pattern alphabet.

E.g., for BAOBAB (m = 6):

    c     A  B  O  all other characters (C, D, E, ..., Z, space)
    t(c)  1  2  3  6

Example of Horspool's algorithm application

    B A R D _ L O V E D _ B A N A N A S
    B A O B A B
                B A O B A B
                    B A O B A B
    (unsuccessful search)

Shift table for BAOBAB: t(A) = 1, t(B) = 2, t(O) = 3, and t(c) = 6 for every other character, including the space (_).
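Horspool's table construction and search can be sketched as follows (a minimal sketch, not part of the slides):

```python
def shift_table(pattern):
    """t(c) = distance from c's rightmost occurrence among the first m-1
    pattern characters to the pattern's right end; m for every other c."""
    m = len(pattern)
    table = {}
    for i in range(m - 1):               # scan the first m-1 characters
        table[pattern[i]] = m - 1 - i    # later occurrences overwrite earlier ones
    return table

def horspool(text, pattern):
    """Horspool's algorithm: index of the first match, or -1."""
    m, n = len(pattern), len(text)
    t = shift_table(pattern)
    i = m - 1                            # text index aligned with the pattern's last char
    while i < n:
        k = 0
        while k < m and pattern[m - 1 - k] == text[i - k]:
            k += 1                       # compare right to left
        if k == m:
            return i - m + 1             # full match
        i += t.get(text[i], m)           # shift by the entry for the aligned text char
    return -1
```

On the slide's example the search fails after three alignments, exactly as shown above.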

Boyer-Moore algorithm
Based on the same two ideas:
- comparing pattern characters to text from right to left
- precomputing shift sizes in two tables:
  - bad-symbol table indicates how much to shift based on the text's character causing a mismatch
  - good-suffix table indicates how much to shift based on the matched part (suffix) of the pattern

Bad-symbol shift in the Boyer-Moore algorithm
- If the rightmost character of the pattern doesn't match, the BM algorithm acts as Horspool's
- If the rightmost character of the pattern does match, BM compares the preceding characters right to left until either all of the pattern's characters match or a mismatch on the text's character c is encountered after k > 0 matches

bad-symbol shift: d1 = max{t1(c) - k, 1}

Good-suffix shift in the Boyer-Moore algorithm
- The good-suffix shift d2 is applied after 0 < k < m last characters were matched
- d2(k) = the distance between the matched suffix of size k and its rightmost occurrence in the pattern that is not preceded by the same character as the suffix
  Example: CABABA, d2(1) = 4
- If there is no such occurrence, match the longest part of the k-character suffix with a corresponding prefix; if there are no such suffix-prefix matches, d2(k) = m
  Example: WOWWOW, d2(2) = 5, d2(3) = 3, d2(4) = 3, d2(5) = 3
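The definition above can be turned directly into a brute-force (quadratic, preprocessing-only) computation of d2. A hedged sketch, following the slide's definition rather than the linear-time construction:

```python
def good_suffix_table(pattern):
    """d2(k) for 0 < k < m, computed brute force from the definition."""
    m = len(pattern)
    d2 = {}
    for k in range(1, m):
        suffix = pattern[m - k:]
        shift = None
        # rightmost occurrence of the suffix not preceded by the same character
        for i in range(m - k - 1, -1, -1):
            if pattern[i:i + k] == suffix and (i == 0 or pattern[i - 1] != pattern[m - k - 1]):
                shift = (m - k) - i
                break
        if shift is None:
            # longest prefix of the pattern matching a tail of the suffix
            shift = m
            for l in range(k - 1, 0, -1):
                if pattern[:l] == pattern[m - l:]:
                    shift = m - l
                    break
        d2[k] = shift
    return d2
```

This reproduces both of the slide's examples: d2(1) = 4 for CABABA, and d2(2..5) = 5, 3, 3, 3 for WOWWOW.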

Boyer-Moore Algorithm
After matching successfully 0 < k < m characters, the algorithm shifts the pattern right by

    d = max{d1, d2}

where d1 = max{t1(c) - k, 1} is the bad-symbol shift and d2(k) is the good-suffix shift.

Example: find the pattern AT_THAT in WHICH_FINALLY_HALTS.__AT_THAT (underscores denote spaces).

Boyer-Moore Algorithm (cont.)
Step 1  Fill in the bad-symbol shift table
Step 2  Fill in the good-suffix shift table
Step 3  Align the pattern against the beginning of the text
Step 4  Repeat until a matching substring is found or the text ends: compare the corresponding characters right to left.
  - If no characters match, retrieve entry t1(c) from the bad-symbol table for the text's character c causing the mismatch and shift the pattern to the right by t1(c).
  - If 0 < k < m characters are matched, retrieve entry t1(c) from the bad-symbol table for the text's character c causing the mismatch and entry d2(k) from the good-suffix table, and shift the pattern to the right by d = max{d1, d2}, where d1 = max{t1(c) - k, 1}.
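The four steps above can be sketched as a self-contained Python implementation (a hedged sketch, not from the slides; the good-suffix table is computed brute force from the slide's definition rather than with the linear-time construction):

```python
def bad_symbol_table(pattern):
    # t1(c): Horspool-style shift table over the first m-1 characters
    m = len(pattern)
    return {pattern[i]: m - 1 - i for i in range(m - 1)}

def good_suffix_table(pattern):
    # d2(k) for 0 < k < m, brute force from the definition
    m = len(pattern)
    d2 = {}
    for k in range(1, m):
        suffix = pattern[m - k:]
        shift = None
        for i in range(m - k - 1, -1, -1):   # rightmost other occurrence of the suffix
            if pattern[i:i + k] == suffix and (i == 0 or pattern[i - 1] != pattern[m - k - 1]):
                shift = (m - k) - i
                break
        if shift is None:                     # longest prefix matching a tail of the suffix
            shift = m
            for l in range(k - 1, 0, -1):
                if pattern[:l] == pattern[m - l:]:
                    shift = m - l
                    break
        d2[k] = shift
    return d2

def boyer_moore(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    t1, d2 = bad_symbol_table(pattern), good_suffix_table(pattern)
    i = m - 1                                 # text index aligned with the pattern's last char
    while i < n:
        k = 0
        while k < m and pattern[m - 1 - k] == text[i - k]:
            k += 1                            # count matched characters, right to left
        if k == m:
            return i - m + 1                  # full match
        c = text[i - k]                       # text character causing the mismatch
        d1 = max(t1.get(c, m) - k, 1)
        i += d1 if k == 0 else max(d1, d2[k]) # shift by t1(c), or by max{d1, d2}
    return -1
```

Run on the deck's two examples, it finds BAOBAB in BESS_KNEW_ABOUT_BAOBABS with the shifts 6, 5, 5 traced on the next slide.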

Example of Boyer-Moore algorithm application

    B E S S _ K N E W _ A B O U T _ B A O B A B S
    B A O B A B
        d1 = t1(K) = 6
                B A O B A B
                d1 = t1(_) - 2 = 4
                d2(2) = 5
                          B A O B A B
                          d1 = t1(_) - 1 = 5
                          d2(1) = 2
                                    B A O B A B  (success)

Bad-symbol table: t1(A) = 1, t1(B) = 2, t1(O) = 3, and t1(c) = 6 for all other characters, including _.

Good-suffix table for BAOBAB:

    k      1  2  3  4  5
    d2(k)  2  5  5  5  5

Hashing
A very efficient method for implementing a dictionary, i.e., a set with the operations:
- find
- insert
- delete

Based on representation-change and space-for-time tradeoff ideas.

Important applications:
- symbol tables
- databases (extendible hashing)

Hash tables and hash functions
The idea of hashing is to map keys of a given file of size n into a table of size m, called the hash table, by using a predefined function, called the hash function:

    h: K → location (cell) in the hash table

Example: student records, key = SSN. Hash function: h(K) = K mod m, where m is some integer (typically, prime). If m = 1000, where is the record with SSN = … stored?

Generally, a hash function should:
- be easy to compute
- distribute keys about evenly throughout the hash table
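A one-line sketch of the slide's hash function (the 9-digit key below is a made-up, hypothetical value, since the slide's SSN was elided):

```python
def h(key, m=1000):
    """The slide's hash function for integer keys: h(K) = K mod m."""
    return key % m

# With m = 1000, the hash is simply the key's last three digits,
# e.g. for a hypothetical SSN-like key 987654321:
print(h(987654321))   # 321
```

Note that m = 1000 makes the example easy to follow, but a prime m generally spreads real-world keys more evenly, as the slide recommends.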

Collisions
If h(K1) = h(K2), there is a collision.
- Good hash functions result in fewer collisions, but some collisions should be expected (birthday paradox)
- Two principal hashing schemes handle collisions differently:
  - Open hashing – each cell is a header of a linked list of all keys hashed to it
  - Closed hashing – one key per cell; in case of collision, finds another cell by
    - linear probing: use the next free bucket
    - double hashing: use a second hash function to compute the increment

Open hashing (Separate chaining)
Keys are stored in linked lists outside a hash table whose elements serve as the lists' headers.

Example: A, FOOL, AND, HIS, MONEY, ARE, SOON, PARTED
h(K) = sum of K's letters' positions in the alphabet MOD 13

    Key   A  FOOL  AND  HIS  MONEY  ARE  SOON  PARTED
    h(K)  1  9     6    10   7      11   11    12

ARE and SOON collide in cell 11. To search for KID, compute h(KID) = (11 + 9 + 4) mod 13 = 11 and examine only cell 11's list.
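The example can be sketched with Python lists standing in for the linked lists (a minimal sketch, not part of the slides):

```python
def h(key, m=13):
    # sum of the key's letters' positions in the alphabet, mod m
    return sum(ord(c) - ord("A") + 1 for c in key) % m

def build_table(keys, m=13):
    table = [[] for _ in range(m)]    # each cell heads a list of colliding keys
    for k in keys:
        table[h(k, m)].append(k)
    return table

keys = ["A", "FOOL", "AND", "HIS", "MONEY", "ARE", "SOON", "PARTED"]
table = build_table(keys)
# To search for KID, hash it and scan only that one cell's list:
found = "KID" in table[h("KID")]      # h("KID") = (11 + 9 + 4) mod 13 = 11
```

The lookup touches only the keys that hashed to cell 11 (ARE and SOON), not the whole table.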

Open hashing (cont.)
- If the hash function distributes keys uniformly, the average length of a linked list will be α = n/m. This ratio is called the load factor.
- Average number of probes in successful (S) and unsuccessful (U) searches:

    S ≈ 1 + α/2,   U = α

- The load factor α is typically kept small (ideally, about 1)
- Open hashing still works if n > m

Closed hashing (Open addressing)
Keys are stored inside the hash table itself, treated as a circular array.

Example: inserting A, FOOL, AND, HIS, MONEY, ARE, SOON, PARTED in order, with h(K) = sum of K's letters' positions in the alphabet MOD 13 and linear probing:

    Key   A  FOOL  AND  HIS  MONEY  ARE  SOON  PARTED
    h(K)  1  9     6    10   7      11   11    12

Final table (cells 0-12):

    0: PARTED  1: A  6: AND  7: MONEY  9: FOOL  10: HIS  11: ARE  12: SOON

SOON (h = 11) collides with ARE and goes to cell 12; PARTED (h = 12) then collides with SOON and wraps around to cell 0.
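Linear probing on the circular array can be sketched as follows (a minimal sketch, not part of the slides):

```python
def h(key, m=13):
    # sum of the key's letters' positions in the alphabet, mod m
    return sum(ord(c) - ord("A") + 1 for c in key) % m

def insert(table, key):
    """Closed hashing with linear probing: on a collision, try the next
    cell, wrapping around the end of the circular array.
    Assumes the table is not full (closed hashing requires n <= m)."""
    i = h(key, len(table))
    while table[i] is not None:
        i = (i + 1) % len(table)
    table[i] = key

table = [None] * 13
for k in ["A", "FOOL", "AND", "HIS", "MONEY", "ARE", "SOON", "PARTED"]:
    insert(table, k)
# SOON (h = 11) collides with ARE and lands in cell 12;
# PARTED (h = 12) then collides with SOON and wraps to cell 0.
```

Searching follows the same probe sequence, which is also why deletion is not straightforward: simply emptying a cell would break later probe chains.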

Closed hashing (cont.)
- Does not work if n > m
- Avoids pointers
- Deletions are not straightforward
- The number of probes to find/insert/delete a key depends on the load factor α = n/m (hash table density) and the collision resolution strategy. For linear probing:

    S = ½ (1 + 1/(1 - α))   and   U = ½ (1 + 1/(1 - α)²)

- As the table gets filled (α approaches 1), the number of probes in linear probing increases dramatically.
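Plugging a few load factors into the two formulas makes the blow-up near α = 1 concrete (a small sketch, not from the slides): at α = 0.5 a search costs a couple of probes, but at α = 0.9 an unsuccessful search already averages about 50.

```python
def successful(alpha):
    # S = (1/2)(1 + 1/(1 - alpha)): expected probes, successful search
    return 0.5 * (1 + 1 / (1 - alpha))

def unsuccessful(alpha):
    # U = (1/2)(1 + 1/(1 - alpha)**2): expected probes, unsuccessful search
    return 0.5 * (1 + 1 / (1 - alpha) ** 2)

for alpha in (0.5, 0.75, 0.9, 0.95):
    print(f"alpha={alpha}: S={successful(alpha):.1f}, U={unsuccessful(alpha):.1f}")
```

This is why closed-hashing tables are usually resized (rehashed into a larger table) well before they fill up.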

7.4 B-Trees
- Extends the idea of using extra space to facilitate faster access
- This is done to access a data set on disk
- Extends the idea of a 2-3 tree
- All data records are stored at the leaves, in increasing order of the keys
- The parental nodes are used for indexing:
  - Each parental node contains n - 1 ordered keys
  - The keys are interposed with n pointers to the node's children
  - All keys in subtree T0 are smaller than K1
  - All keys in subtree T1 are greater than or equal to K1 and smaller than K2, with K1 being equal to the smallest key in T1
  - etc.

B-Trees
- Such a node is an n-node
- All the nodes in a binary search tree are 2-nodes

Structural properties of a B-Tree of order m ≥ 2
- The root is either a leaf or has between 2 and m children
- Each node, except for the root and the leaves, has between ⌈m/2⌉ and m children
  - Hence between ⌈m/2⌉ - 1 and m - 1 keys
- The tree is (perfectly) balanced: all its leaves are at the same level

Searching
- Start at the root
- Follow a chain of pointers to the leaf that may contain the search key
- Search for the search key among the keys of that leaf
  - Keys are in sorted order, so binary search can be used if the number of keys is large
- How many nodes do we have to access during a search for a record with a given key?
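A rough answer to the question above: since every non-root internal node has at least ⌈m/2⌉ children, the number of levels (and hence node accesses) grows roughly as the logarithm of n to the base ⌈m/2⌉. A back-of-the-envelope sketch (an order-of-magnitude estimate, not the textbook's exact bound):

```python
import math

def levels_estimate(n, m):
    """Rough upper estimate of the number of node accesses in a search:
    a B-tree of order m with n keys has height on the order of
    log base ceil(m/2) of n, since non-root nodes have >= ceil(m/2) children."""
    return math.ceil(math.log(n, math.ceil(m / 2)))

print(levels_estimate(10**8, 100))   # about 5 levels for 10^8 keys at order 100
```

This is exactly why B-trees suit disk-resident data: even for 10^8 records, a search touches only a handful of nodes (disk pages).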

Insertion is O(log n)
- Apply the search procedure to the new record's key K to find the appropriate leaf for the new record
- If there is enough room in the leaf, place the record there, in the appropriate sorted key position
- If there is no room for the record:
  - The leaf is split in half by sending the second half of the records to a new node
  - The smallest key in the new node, and the pointer to it, must be inserted into the old leaf's parent, immediately after the key and pointer to the old leaf
  - This recursive procedure may percolate up to the tree's root
    - If the root is full, a new root is created
    - The two halves of the old root's keys are split between the two children of the new root

Insertion in action (figure)