Amortized Analysis of Rehashing

What is rehashing
- Hash table too full → spend a lot of time looking in buckets
- Solution: rehash
  - make hash table twice the size
  - for each item in original hash table, hash to location in bigger table
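The rehash step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table is modeled as a list of buckets (separate chaining), and the function name `rehash` and the use of Python's built-in `hash` are choices made here, not part of the original slides.

```python
def rehash(buckets):
    """Return a new bucket list twice the size, with every key re-inserted.

    Hypothetical helper: `buckets` is a list of lists of hashable keys.
    """
    new_buckets = [[] for _ in range(2 * len(buckets))]
    for bucket in buckets:
        for key in bucket:
            # Each key is hashed again, now modulo the larger table size.
            new_buckets[hash(key) % len(new_buckets)].append(key)
    return new_buckets


# Three keys that all land in the same bucket of a 4-slot table.
old = [[] for _ in range(4)]
for key in (3, 7, 11):
    old[hash(key) % len(old)].append(key)

new = rehash(old)  # 8 slots; the keys now spread over two buckets
```

Every stored item is touched once, which is why the slides treat a rehash as costing time proportional to the number of items.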

Assumptions for Thursday, Feb. 10, 2000
- Rehash whenever the table is 50% full or more
- Just a sequence of inserts (can be generalized for other operations)
- We never get a collision (once dealing with collisions, we’re in average-case analysis territory)
- Hash table starts at size 2

Observations
- How expensive is an insert?
- How expensive is a rehash?
- When will we need to rehash?

Observations
- How expensive is an insert? O(1), assuming no collisions. Say it’s 1.
- How expensive is a rehash? O(N), where N is the number of items currently in the table. Say it’s N.
- When will we need to rehash? Whenever the number of items in the table reaches a power of 2 (1, 2, 4, 8, 16, …)
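As a quick check of the last observation, the sketch below replays a sequence of inserts under the stated assumptions (table starts at size 2, rehash at 50% full or more, table doubles) and records which inserts trigger a rehash. The function name `rehash_points` is made up for this illustration.

```python
def rehash_points(n):
    """Return the 1-based insert numbers that trigger a rehash (hypothetical helper)."""
    size, filled, points = 2, 0, []
    for i in range(1, n + 1):
        filled += 1
        if 2 * filled >= size:   # table is 50% full or more
            points.append(i)
            size *= 2            # rehash into a table twice the size
    return points


points = rehash_points(20)
```

Under these assumptions the rehashes fall on inserts 1, 2, 4, 8 and 16.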

Amortized analysis
Strategy 1: add up operations
- Note: I’ve made assumptions about the constants, but the analysis could be done for any constants.
- Note 2: This is sometimes called the “Aggregate Method”
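A minimal sketch of the "add up operations" idea, assuming the cost model from the observations (an insert costs 1, a rehash of N items costs N, the table starts at size 2 and doubles when 50% full). The name `total_cost` is hypothetical. The total for n inserts is n plus the sum of the powers of 2 that are at most n, which stays below 3n.

```python
def total_cost(n):
    """Total cost of n inserts under the assumed cost model (hypothetical helper)."""
    size, filled, cost = 2, 0, 0
    for _ in range(n):
        filled += 1
        cost += 1                 # the insert itself
        if 2 * filled >= size:    # 50% full or more: rehash
            cost += filled        # rehashing N items costs N
            size *= 2
    return cost


# Average cost per insert stays below 3 for every sequence length checked.
worst_average = max(total_cost(n) / n for n in range(1, 1000))
```

Dividing the total by n gives the amortized cost per insert, a constant, even though individual inserts that trigger a rehash are expensive.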

Amortized analysis
Strategy 2: Accounting Method
- Charge the cost of some operations to other operations
- Each operation gives us some “tokens”, which we can spend on future operations

Accounting Method analysis
- Each insert gives us 3 tokens
  - 1 token for that insert
  - 1 token for rehashing this item the first time
  - 1 token for rehashing another item that has already been rehashed once or more
    (since each rehash doubles the size, # of items never rehashed = # of items already rehashed once or more)

What happens    Tokens added    Tokens used
Insert 1        1A, 1B, 1C      1A
Rehash to 4                     1B – rehash 1
Insert 2        2A, 2B, 2C      2A
Rehash to 8                     2B – rehash 2; 2C – rehash 1
Insert 3        3A, 3B, 3C      3A
Insert 4        4A, 4B, 4C      4A
Rehash to 16                    3B – rehash 3; 4B – rehash 4; 3C – rehash 1; 4C – rehash 2

What happens    Tokens added    Tokens used
Insert 5        5A, 5B, 5C      5A
Insert 6        6A, 6B, 6C      6A
Insert 7        7A, 7B, 7C      7A
Insert 8        8A, 8B, 8C      8A
Rehash to 32                    5B – rehash 5; 6B – rehash 6; 7B – rehash 7; 8B – rehash 8;
                                5C – rehash 1; 6C – rehash 2; 7C – rehash 3; 8C – rehash 4
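The token bookkeeping in these tables can be replayed as a running balance. The sketch below is an illustration, not the lecturer's code: `token_balances` and its `tokens_per_insert` parameter are names invented here, and the cost model is the one assumed earlier (1 token per insert, N tokens to rehash N items).

```python
def token_balances(n, tokens_per_insert=3):
    """Unspent tokens after each of n inserts (hypothetical helper)."""
    size, filled, bank = 2, 0, 0
    history = []
    for _ in range(n):
        bank += tokens_per_insert  # tokens granted by this insert
        bank -= 1                  # token A pays for the insert itself
        filled += 1
        if 2 * filled >= size:     # 50% full or more: rehash
            bank -= filled         # one token per item rehashed
            size *= 2
        history.append(bank)
    return history


with_three = token_balances(8)     # balance never drops below zero
with_two = token_balances(8, 2)    # two tokens per insert run out
```

With 3 tokens per insert the balance after the first eight inserts is 1, 1, 3, 1, 3, 5, 7, 1: the single leftover token is 1C, which the tables never spend. With only 2 tokens per insert the balance goes negative at the second insert, so 2 is not enough under this cost model.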

Potential function
- Potential function: more sophisticated tokens
- The potential is a function of the current state of the data structure
- When an operation costs less, potential goes up
- When an operation costs more, take from the potential
- Never negative (otherwise we’re taking too long)

Tokens as a potential function
- Potential function at operation #i: P(i) = 1 + 2F - S/2
  - F = # of filled (non-empty) hash table slots
  - S = total # of hash table slots
- Actual cost of operation #i: C(i)
- Amortized cost of operation #(i+1): CA(i+1) = C(i+1) + ( P(i+1) - P(i) )

The Math
- Beginning: F=0, S=2. P=0
- For insert with no rehashing
  - C(i+1) = 1
  - P(i+1) - P(i) = 2 (added non-empty slot)
- For insert with rehashing of N items
  - C(i+1) = 1 + N
  - P(i+1) - P(i) = 2 - N (before the rehash, F=N, S=2N. After, S=4N)

Potential method analysis
- P(i) is never negative. Yes. We rehash when F ≥ S/2; right after the rehash, S = 4F, so P = 1. Etc…
- Amortized cost is constant
  - CA(i+1) = C(i+1) + P(i+1) - P(i) = 3, in both cases (previous slide)
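Both claims can be checked numerically. The following sketch assumes the same setup as the slides (start at size 2, rehash at 50% full, insert costs 1, rehash of N items costs N); the function names `potential` and `amortized_costs` are made up for this example.

```python
def potential(filled, size):
    """P = 1 + 2F - S/2; S is always even here, so integer division is exact."""
    return 1 + 2 * filled - size // 2


def amortized_costs(n):
    """Return (amortized costs, potentials) for n inserts (hypothetical helper)."""
    size, filled = 2, 0
    costs, potentials = [], []
    for _ in range(n):
        before = potential(filled, size)
        filled += 1
        actual = 1                  # the insert itself
        if 2 * filled >= size:      # 50% full or more: rehash
            actual += filled        # rehashing N items costs N
            size *= 2
        after = potential(filled, size)
        costs.append(actual + after - before)
        potentials.append(after)
    return costs, potentials


costs, potentials = amortized_costs(100)
```

Every entry of `costs` comes out as 3, whether or not the insert triggered a rehash, and no entry of `potentials` is negative.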