
Hashing (Ch. 14)
Goal: implement a symbol table or dictionary (insert, delete, search).
What if you don’t need ordered keys (pred, succ, sort, select)? Are O(log n) comparisons necessary? (No.)
Hashing basic plan:
- create a big array for the items to be stored
- use a function (the hash function) to compute the storage location from the key
- a collision resolution scheme is necessary

Hashing Example
Simple hash function: treat the key as a large integer K and compute h(K) = K mod M, where M is the table size; let M be a prime number.
Example: suppose we have 101 buckets in the hash table. ‘abcd’ in hex is 0x61626364. Converted to decimal it’s 1633837924, and 1633837924 % 101 = 11. Thus h(‘abcd’) = 11: store the key at location 11.
“dcba” hashes to 57. “abbc” also hashes to 57: a collision. What to do?
If you have billions of possible keys and hundreds of buckets, lots of collisions are possible!
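A minimal sketch of the example above, assuming ASCII keys of at most 8 characters so that the key's big-endian integer value fits in a `uint64_t`. The function name `hash_as_integer` is hypothetical.

```c
#include <stdint.h>

/* Treat the key as a big-endian integer K, one byte per character
   (so "abcd" becomes 0x61626364), and return h(K) = K mod M.
   Assumption: strlen(key) <= 8, so K fits in 64 bits. */
int hash_as_integer(const char *key, int M)
{
    uint64_t K = 0;
    for (; *key != '\0'; key++)
        K = (K << 8) | (unsigned char)*key;   /* append the next byte */
    return (int)(K % (uint64_t)M);
}
```

With M = 101, `hash_as_integer("abcd", 101)` returns 11, and both `"dcba"` and `"abbc"` return 57, reproducing the collision. The 8-character limit is exactly the "very large numbers" problem the next slide addresses.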

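Hashing Strings
h(‘aVeryLongVariableName’)? Instead of dealing with very large numbers, you can use Horner’s method: multiply by 256, add the next character, and take the result % 101 at every step: 256 * … + … = … % 101 = …, and so on, ending at … % 101 = 87.
Scramble by replacing 256 with 117:

    /* Horner's method, reduced mod M at each step */
    int hash(const char *v, int M)
    {
        int h = 0, a = 117;
        for (; *v != '\0'; v++)
            h = (a * h + (unsigned char)*v) % M;
        return h;
    }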
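To check that reducing mod M at each step loses nothing, here is a hedged sketch that makes the multiplier a parameter (the name `hash_horner` and its extra `a` argument are hypothetical, not from the slides). With a = 256 it reproduces the ‘abcd’ example from the previous slide, yet it also handles keys far longer than 8 bytes.

```c
/* Horner's method: h = (a*h + c) mod M for each character c.
   With a = 256 this equals the key's big-integer value mod M, but every
   intermediate value stays below a*M + 256, so nothing overflows as long
   as that fits in an int. */
int hash_horner(const char *v, int a, int M)
{
    int h = 0;
    for (; *v != '\0'; v++)
        h = (a * h + (unsigned char)*v) % M;
    return h;
}
```

`hash_horner("abcd", 256, 101)` gives 11, the same as computing 0x61626364 % 101 directly, while `hash_horner("aVeryLongVariableName", 117, 101)` returns a bucket in 0..100 without ever forming the 21-byte integer.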

Collisions
How likely are collisions? Birthday paradox: with M buckets, the first collision is expected after about sqrt(πM/2) ≈ 1.25 sqrt(M) keys [1.25 sqrt(365) is about 24].
Experiment: generate random numbers … The collision came at the 13th number, as predicted.
What to do about collisions?
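The experiment can be repeated with a short simulation. This is a sketch under stated assumptions: a xorshift64 generator stands in for "random numbers", and `mean_keys_until_collision` is a hypothetical helper. The count includes the colliding key itself, so the average should land slightly above sqrt(πM/2): about 24.6 rather than 24 for M = 365, and about 13 for M = 101.

```c
#include <stdint.h>
#include <string.h>

/* xorshift64: a small deterministic PRNG so runs are repeatable.
   The state must be nonzero. */
static uint64_t xorshift64(uint64_t *s)
{
    uint64_t x = *s;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *s = x;
}

/* Throw random keys into M buckets until one lands in an occupied bucket;
   return how many keys were thrown, including the colliding one.
   Assumes 1 <= M <= 4096. */
static int keys_until_collision(int M, uint64_t *seed)
{
    unsigned char used[4096];
    memset(used, 0, (size_t)M);
    for (int n = 1; ; n++) {
        int b = (int)(xorshift64(seed) % (uint64_t)M);
        if (used[b])
            return n;
        used[b] = 1;
    }
}

/* Average over many trials (seed must be nonzero). */
double mean_keys_until_collision(int M, int trials, uint64_t seed)
{
    long total = 0;
    for (int t = 0; t < trials; t++)
        total += keys_until_collision(M, &seed);
    return (double)total / trials;
}
```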

Separate Chaining
Build a linked list for each bucket; linear search within the list.
 0:
 1: L A A A
 2: M X
 3: N C
 4:
 5: E P E E
 6:
 7: G R
 8: H S
 9: I
10:
Simple, practical, widely used.
Cuts search time by a factor of M over sequential search.
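A minimal separate-chaining sketch, assuming string keys that the caller keeps alive and 11 buckets to match the 0..10 picture (this particular hash will not reproduce the picture's exact placement). `NBUCKETS`, `chain_insert`, and `chain_search` are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 11   /* buckets 0..10, as in the picture */

struct node {
    const char *key;   /* not copied: the caller keeps the string alive */
    int value;
    struct node *next;
};

static struct node *bucket[NBUCKETS];   /* all lists start empty (NULL) */

static int chain_hash(const char *v)
{
    int h = 0;
    for (; *v != '\0'; v++)
        h = (117 * h + (unsigned char)*v) % NBUCKETS;
    return h;
}

/* O(1): push the new item onto the front of its bucket's list.
   Duplicate keys are not checked for. */
void chain_insert(const char *key, int value)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->key = key;
    n->value = value;
    int b = chain_hash(key);
    n->next = bucket[b];
    bucket[b] = n;
}

/* Linear search within one list; returns NULL on a miss. */
struct node *chain_search(const char *key)
{
    for (struct node *n = bucket[chain_hash(key)]; n != NULL; n = n->next)
        if (strcmp(n->key, key) == 0)
            return n;
    return NULL;
}
```

Inserting at the head keeps insertion O(1); a search only ever walks the one list its key hashes to, which is where the factor-of-M saving over sequential search comes from.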

Separate Chaining 2
Insertion time? O(1).
Average search cost, successful search? About N/(2M) (half a list).
Average search cost, unsuccessful search? About N/M (a whole list).
M large: CONSTANT average search time.
Worst case: N (“probabilistically unlikely”).
Keep lists sorted? Insert time becomes about N/(2M); unsuccessful search time drops to about N/(2M).
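For the "keep lists sorted?" question, here is a sketch of one chain kept in increasing key order, assuming integer keys; `sorted_insert` and `sorted_search` are hypothetical names. The search returns the number of nodes it examined so the early exit on a miss is visible.

```c
#include <stdlib.h>

struct snode {
    int key;
    struct snode *next;
};

/* Insert into a list kept in increasing order.  Finding the spot costs
   about half the list on average, so insertion is no longer O(1). */
void sorted_insert(struct snode **head, int key)
{
    while (*head != NULL && (*head)->key < key)
        head = &(*head)->next;
    struct snode *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->key = key;
    n->next = *head;
    *head = n;
}

/* Stop at the first key >= the target, so a miss also examines only about
   half the list on average.  Returns the number of nodes examined and
   stores hit (1) or miss (0) in *found. */
int sorted_search(const struct snode *head, int key, int *found)
{
    int probes = 0;
    for (; head != NULL; head = head->next) {
        probes++;
        if (head->key >= key) {
            *found = (head->key == key);
            return probes;
        }
    }
    *found = 0;
    return probes;
}
```

For the list 10 → 20 → 30, a search for 15 stops after 2 nodes, whereas an unsorted list would have to scan all 3 to report the miss.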

Linear Probing
Or, we could keep everything in the same table.
Insert: upon collision, search for a free spot (the next empty slot, wrapping around).
Search: follow the same probe sequence (if you reach an empty slot, the search fails).
Runtime? Still O(1) if the table is sparse.
But: as the table fills, clustering occurs.
Skipping c spots instead of 1 doesn’t help…
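A minimal linear-probing sketch, assuming non-negative integer keys, h(k) = k mod M, and a fixed table of 13 slots that never fills completely. `LP_M`, `lp_insert`, and `lp_search` are hypothetical names.

```c
#define LP_M 13          /* table size (assumption: a small prime) */
#define LP_EMPTY (-1)    /* keys are assumed to be non-negative ints */

static int lp_table[LP_M];

void lp_init(void)
{
    for (int i = 0; i < LP_M; i++)
        lp_table[i] = LP_EMPTY;
}

/* Insert: start at h(key) and step right, wrapping around, to the first
   free slot.  Assumes the table is not full. */
void lp_insert(int key)
{
    int i = key % LP_M;
    while (lp_table[i] != LP_EMPTY)
        i = (i + 1) % LP_M;
    lp_table[i] = key;
}

/* Search: follow the same probe sequence; reaching an empty slot means the
   key is absent.  Returns the slot index, or -1 on a miss. */
int lp_search(int key)
{
    int i = key % LP_M;
    while (lp_table[i] != LP_EMPTY) {
        if (lp_table[i] == key)
            return i;
        i = (i + 1) % LP_M;
    }
    return -1;
}
```

Keys 5, 18, and 31 all hash to slot 5 and end up in slots 5, 6, and 7: the beginning of a cluster. A miss for 44 (also hashing to 5) must walk the whole cluster before hitting the empty slot 8.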

Clustering
Long clusters tend to get longer. Precise analysis is difficult.
Theorem (Knuth), with load factor N/M:
Insert cost: approx. (1 + 1/(1 - N/M)^2)/2 (50% full: 2.5 probes; 80% full: 13 probes)
Search (hit) cost: approx. (1 + 1/(1 - N/M))/2 (50% full: 1.5 probes; 80% full: 3 probes)
Search (miss): same as insert
Too slow when the table gets 70-80% full.
How to reduce/avoid clustering?
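Evaluating Knuth's two approximations directly reproduces the probe counts on the slide. This is a small sketch; the function names `lp_miss_probes` and `lp_hit_probes` are hypothetical, and `alpha` stands for N/M.

```c
/* Knuth's approximations for linear probing at load factor alpha = N/M
   (valid for 0 <= alpha < 1). */
double lp_miss_probes(double alpha)   /* insert or unsuccessful search */
{
    double r = 1.0 / (1.0 - alpha);
    return (1.0 + r * r) / 2.0;
}

double lp_hit_probes(double alpha)    /* successful search */
{
    return (1.0 + 1.0 / (1.0 - alpha)) / 2.0;
}
```

At 0.5 these give 2.5 and 1.5 probes; at 0.8, about 13 and 3; at 0.9 the miss cost is already about 50.5 probes, which is why the table becomes too slow past 70-80% full.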

Double Hashing
Use a second hash function to compute the increment for the probe sequence.
Analysis is extremely difficult; performance is about like the ideal (random probing).
Theorem (Guibas-Szemerédi):
Insert: approx. 1/(1 - N/M)
Search hit: approx. ln(1/(1 - N/M)) / (N/M)
Search miss: same as insert
Not too slow until the table is about 90% full.
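A double-hashing sketch, assuming non-negative integer keys and a prime table size of 13, so that any increment in 1..M-1 eventually visits every slot. The second hash `1 + key % (M - 1)` is an illustrative choice, not the one from the slides, and all names are hypothetical.

```c
#define DH_M 13          /* prime, so every step in 1..DH_M-1 reaches all slots */
#define DH_EMPTY (-1)    /* keys are assumed to be non-negative ints */

static int dh_table[DH_M];

void dh_init(void)
{
    for (int i = 0; i < DH_M; i++)
        dh_table[i] = DH_EMPTY;
}

static int dh_h1(int key) { return key % DH_M; }

/* Second hash: the probe increment.  It must never be 0. */
static int dh_h2(int key) { return 1 + key % (DH_M - 1); }

void dh_insert(int key)      /* assumes the table is not full */
{
    int i = dh_h1(key), step = dh_h2(key);
    while (dh_table[i] != DH_EMPTY)
        i = (i + step) % DH_M;
    dh_table[i] = key;
}

int dh_search(int key)       /* returns the slot index, or -1 on a miss */
{
    int i = dh_h1(key), step = dh_h2(key);
    while (dh_table[i] != DH_EMPTY) {
        if (dh_table[i] == key)
            return i;
        i = (i + step) % DH_M;
    }
    return -1;
}
```

Keys 5, 18, and 31 all start at slot 5, but their increments are 6, 7, and 8, so they land in slots 5, 12, and 0 instead of piling up in one cluster.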

Dynamic Hash Tables
Suppose you are making a symbol table for a compiler. How big should you make the hash table?
If you don’t know in advance how big a table to make, what to do?
Grow the table when it “fills” (e.g. 50% full):
- make a new table of twice the size
- make a new hash function (for the new size)
- re-hash all of the items into the new table
- dispose of the old table
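The growing steps can be sketched on top of linear probing. This assumes non-negative integer keys, an initial size of at least 1, and "new hash function" meaning `key % newsize`; doubling means the sizes are not prime, which a production table would care about. `struct growtable` and the `grow_*` functions are hypothetical names.

```c
#include <stdlib.h>

#define GROW_EMPTY (-1)   /* keys are assumed to be non-negative ints */

struct growtable {
    int *slot;
    int size;    /* number of slots */
    int count;   /* number of keys stored */
};

/* Linear-probing insert into a raw slot array; the hash depends on size. */
static void raw_insert(int *slot, int size, int key)
{
    int i = key % size;
    while (slot[i] != GROW_EMPTY)
        i = (i + 1) % size;
    slot[i] = key;
}

static int *new_slots(int size)
{
    int *s = malloc((size_t)size * sizeof *s);
    if (s == NULL)
        abort();
    for (int i = 0; i < size; i++)
        s[i] = GROW_EMPTY;
    return s;
}

void grow_init(struct growtable *t, int size)   /* size >= 1 */
{
    t->slot = new_slots(size);
    t->size = size;
    t->count = 0;
}

/* Before an insert that would make the table more than half full: allocate
   a table twice the size, re-hash every key into it, free the old one. */
void grow_insert(struct growtable *t, int key)
{
    if (2 * (t->count + 1) > t->size) {
        int newsize = 2 * t->size;
        int *s = new_slots(newsize);
        for (int i = 0; i < t->size; i++)
            if (t->slot[i] != GROW_EMPTY)
                raw_insert(s, newsize, t->slot[i]);
        free(t->slot);
        t->slot = s;
        t->size = newsize;
    }
    raw_insert(t->slot, t->size, key);
    t->count++;
}

int grow_contains(const struct growtable *t, int key)
{
    int i = key % t->size;
    while (t->slot[i] != GROW_EMPTY) {
        if (t->slot[i] == key)
            return 1;
        i = (i + 1) % t->size;
    }
    return 0;
}
```

Starting from 4 slots, 1000 insertions leave a 2048-slot table that is never more than half full.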

Table Growing Analysis
Worst-case insertion: Θ(n), to re-hash all items.
Can we make any better statements?
Average case? O(1): insertions n through 2n cost O(n) (on average) for the insertions themselves plus O(2n) (on average) for the rehashing, so O(n) total (with 3x the constant), i.e. O(1) per insertion.
Amortized analysis? The result above is actually an amortized result for the rehashing: any sequence of j insertions into an empty table costs O(j) on average for the insertions and O(2j) for the rehashing.
Or, think of it as billing 3 time units for each insertion: spend 1 on the insertion, store 2 in the bank, and withdraw them later to pay for rehashing.
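The "2 units in the bank" bound can be checked by counting. This sketch assumes the 50%-full doubling policy from the previous slide and an initial size of at least 2; `rehash_moves` is a hypothetical helper that counts how many keys get re-hashed.

```c
/* Count the keys moved by rehashing during j insertions into a table that
   starts with `initial` slots (assumed >= 2) and doubles just before it
   would become more than half full. */
long rehash_moves(long j, long initial)
{
    long size = initial, count = 0, moves = 0;
    for (long i = 0; i < j; i++) {
        if (2 * (count + 1) > size) {
            moves += count;   /* every stored key is re-hashed */
            size *= 2;
        }
        count++;              /* the insertion itself */
    }
    return moves;
}
```

The moves form a geometric series (2 + 4 + 8 + … for an initial size of 4) whose sum is less than twice the number of insertions, so the rehashing never costs more than the 2 banked units per insertion.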

Separate Chaining vs. Double Hashing
Assume the same amount of space for keys and links (use pointers for long or variable-length keys).
Separate chaining: 1M buckets, 4M keys, 4M links in nodes: 9M words total; average search time 2.
Double hashing in the same space: 4M items, 9M buckets in the table; average search time 1/(1 - 4/9) = 1.8: 10% faster.
Double hashing in the same time: 4M items, average search time 2; space needed: 8M words (1/(1 - 4/8) = 2): 11% less space.
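The arithmetic behind this comparison, written out under the slides' own cost model: about N/(2M) for a successful search with chaining, and the 1/(1 - N/M) estimate for double hashing.

```latex
\begin{aligned}
\text{Chaining:}\quad & 1\text{M} + 4\text{M} + 4\text{M} = 9\text{M words}, && \frac{N}{2M} = \frac{4\text{M}}{2 \cdot 1\text{M}} = 2 \\
\text{Double hashing, same space:}\quad & \frac{N}{M} = \frac{4}{9}, && \frac{1}{1 - 4/9} = \frac{9}{5} = 1.8 \quad (10\%\text{ fewer probes}) \\
\text{Double hashing, same time:}\quad & \frac{1}{1 - N/M} = 2 \;\Rightarrow\; M = 2N = 8\text{M}, && \frac{8\text{M}}{9\text{M}} \approx 0.89 \quad (11\%\text{ less space})
\end{aligned}
```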

Deletion
How to implement delete() with separate chaining? Simply unlink the unwanted item. Runtime? Same as search().
How to implement delete() with linear probing? Can’t just erase it. (Why not?) Re-hash the rest of the cluster, or mark the item as deleted?
How to delete() with double hashing? Re-hashing the cluster doesn’t work: which “cluster”? Mark as deleted, and every so often re-hash the entire table to prune the “dead wood”.
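Why erasing fails, and the "re-hash the rest of the cluster" fix, in a linear-probing sketch. Assumptions: non-negative integer keys, a fixed 13-slot table that never fills, and hypothetical `del_*` names. Emptying a slot outright can split a probe sequence, leaving keys further along the cluster unreachable; re-inserting everything after the hole repairs that.

```c
#define DEL_M 13
#define DEL_EMPTY (-1)   /* keys are assumed to be non-negative ints */

static int del_table[DEL_M];

void del_init(void)
{
    for (int i = 0; i < DEL_M; i++)
        del_table[i] = DEL_EMPTY;
}

void del_insert(int key)            /* assumes the table is not full */
{
    int i = key % DEL_M;
    while (del_table[i] != DEL_EMPTY)
        i = (i + 1) % DEL_M;
    del_table[i] = key;
}

int del_search(int key)             /* slot index, or -1 on a miss */
{
    for (int i = key % DEL_M; del_table[i] != DEL_EMPTY; i = (i + 1) % DEL_M)
        if (del_table[i] == key)
            return i;
    return -1;
}

/* Empty the key's slot, then take each remaining key in the rest of the
   cluster out and insert it again, so no probe sequence is broken. */
void del_remove(int key)
{
    int i = del_search(key);
    if (i < 0)
        return;
    del_table[i] = DEL_EMPTY;
    for (i = (i + 1) % DEL_M; del_table[i] != DEL_EMPTY; i = (i + 1) % DEL_M) {
        int k = del_table[i];
        del_table[i] = DEL_EMPTY;
        del_insert(k);              /* lands at or before slot i */
    }
}
```

With 5, 18, and 31 (all hashing to 5) plus 6 occupying slots 5 through 8, simply erasing 18 would make 31 unfindable, because a search would stop at the new hole in slot 6. After `del_remove(18)` the cluster is repacked as 5, 31, 6 in slots 5, 6, 7.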

Comparisons and summary
Separate chaining advantages:
- idiot-proof (degrades gracefully)
- no large chunks of memory needed (but is this better?)
Why use hashing?
- constant time search and insert, on average
- easy to implement
Why not use hashing?
- no performance guarantees
- too much arithmetic on long keys: high constant
- uses extra space
- doesn’t support pred, succ, sort, etc.: no notion of order
Where did perl “hashes” get their name?

Hashing Summary
Separate chaining: easiest to deploy.
Linear probing: fastest (but takes more memory).
Double hashing: least memory (but takes more time to compute the second hash function).
Dynamic (grow): handles any number of inserts.
Curious use of hashing: early unix spell checker (back in the days of the 3M machines…).
[Table: Construction and Search Miss times for RB, Chain, Probe, Dbl, and Grow, for N from 5k upward]