DS.H.1 Hashing Chapter 5 Overview The General Idea Hash Functions Separate Chaining Open Addressing Rehashing Extendible Hashing Application Example: Geometric.

Slides:



Advertisements
Similar presentations
Hash Tables CSC220 Winter What is strength of b-tree? Can we make an array to be as fast search and insert as B-tree and LL?
Advertisements

Preliminaries Advantages –Hash tables can insert(), remove(), and find() with complexity close to O(1). –Relatively easy to program Disadvantages –There.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
Quick Review of Apr 10 material B+-Tree File Organization –similar to B+-tree index –leaf nodes store records, not pointers to records stored in an original.
1 CSE 326: Data Structures Hash Tables Autumn 2007 Lecture 14.
FALL 2004CENG 3511 Hashing Reference: Chapters: 11,12.
Hash Tables1 Part E Hash Tables  
Hashing COMP171 Fall Hashing 2 Hash table * Support the following operations n Find n Insert n Delete. (deletions may be unnecessary in some applications)
Hash Tables1 Part E Hash Tables  
Tirgul 7. Find an efficient implementation of a dynamic collection of elements with unique keys Supported Operations: Insert, Search and Delete. The keys.
Hashing General idea: Get a large array
Data Structures Using C++ 2E Chapter 9 Searching and Hashing Algorithms.
Hash Tables. Container of elements where each element has an associated key Each key is mapped to a value that determines the table cell where element.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
Hashing. Hashing as a Data Structure Performs operations in O(c) –Insert –Delete –Find Is not suitable for –FindMin –FindMax –Sort or output as sorted.
Hashing The Magic Container. Interface Main methods: –Void Put(Object) –Object Get(Object) … returns null if not i –… Remove(Object) Goal: methods are.
Hashtables David Kauchak cs302 Spring Administrative Talk today at lunch Midterm must take it by Friday at 6pm No assignment over the break.
1 Chapter 5 Hashing General ideas Methods of implementing the hash table Comparison among these methods Applications of hashing Compare hash tables with.
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture8.
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table.
Comp 335 File Structures Hashing.
1 CSE 326: Data Structures: Hash Tables Lecture 12: Monday, Feb 3, 2003.
Hashing as a Dictionary Implementation Chapter 19.
Hash Tables - Motivation
Hashing - 2 Designing Hash Tables Sections 5.3, 5.4, 5.4, 5.6.
1 Hashing - Introduction Dictionary = a dynamic set that supports the operations INSERT, DELETE, SEARCH Dictionary = a dynamic set that supports the operations.
Chapter 5: Hashing Part I - Hash Tables. Hashing  What is Hashing?  Direct Access Tables  Hash Tables 2.
Chapter 11 Hash Tables © John Urrutia 2014, All Rights Reserved1.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
Chapter 5: Hashing Collision Resolution: Open Addressing Extendible Hashing Mark Allen Weiss: Data Structures and Algorithm Analysis in Java Lydia Sinapova,
Copyright © Curt Hill Hashing A quick lookup strategy.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Hashtables David Kauchak cs302 Spring Administrative Midterm must take it by Friday at 6pm No assignment over the break.
CS6045: Advanced Algorithms Data Structures. Hashing Tables Motivation: symbol tables –A compiler uses a symbol table to relate symbols to associated.
1 Resolving Collision Although collisions should be avoided as much as possible, they are inevitable Need a strategy for resolving collisions. We look.
Fundamental Structures of Computer Science II
CE 221 Data Structures and Algorithms
Hashing (part 2) CSE 2011 Winter March 2018.
Data Structures Using C++ 2E
Hash table CSC317 We have elements with key and satellite data
Hashing Alexandra Stefan.
Hash Tables (Chapter 13) Part 2.
Hashing CENG 351.
Hashing Alexandra Stefan.
Data Structures Using C++ 2E
Hash functions Open addressing
Hash tables Hash table: a list of some fixed size, that positions elements according to an algorithm called a hash function … hash function h(element)
CSE373: Data Structures & Algorithms Lecture 14: Hash Collisions
Hash Tables.
Collision Resolution Neil Tang 02/18/2010
Introduction to Algorithms 6.046J/18.401J
Resolving collisions: Open addressing
CSE373: Data Structures & Algorithms Lecture 14: Hash Collisions
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
CS202 - Fundamental Structures of Computer Science II
Introduction to Algorithms
Database Design and Programming
EE 312 Software Design and Implementation I
Collision Resolution Neil Tang 02/21/2008
Hashing.
DATA STRUCTURES-COLLISION TECHNIQUES
EE 312 Software Design and Implementation I
CSE 326: Data Structures Lecture #14
Chapter 13 Hashing © 2011 Pearson Addison-Wesley. All rights reserved.
Collision Resolution: Open Addressing Extendible Hashing
Lecture-Hashing.
CSE 373: Data Structures and Algorithms
Presentation transcript:

DS.H.1 Hashing Chapter 5 Overview The General Idea Hash Functions Separate Chaining Open Addressing Rehashing Extendible Hashing Application Example: Geometric Hashing

DS.H.2 Why should we consider another indexing technique when B-trees are so great? 1.To avoid the multi-level index structure on disk. 2.To get a small constant search time for equality queries to databases. 3.To provide another structure that is useful for internal memory look-up tables.

DS.H.3 HASHING Given: 1. a relatively large block of storage called the hash table 2. an attribute or key Possible Goals: 1. Insert: store the key and its value in the table. 2. Find: find the key in the table and return its value. 3. Delete: remove the key and its value from the table.

DS.H.4 Hash Function: A hash function maps keys to ‘random’ addresses within the hash table. Example: Let the hash table be N locations long. Suppose the keys are integers or can somehow be converted to integers. f(key) = key % N /* key modulo N */ is the most common, simple hash function. NOTE: in hashing, the potential number of possible keys is much greater than the number of keys in use at any given time.

DS.H.5 Example: Let N = 50 and suppose there are 300 possible keys, but we don’t expect to have more than 50 in the table at once. f(key) = key % 50 Key Table Address collision There are many different ways to resolve collisions.

DS.H.6 First, a few more hash functions Numeric Keys: h1(key) = key % N h2(key) = extract(P, Q, key*C) /* Start at bit position P of key*C and extract Q bits to generate addresses in the correct range */ h3(key) = irand(N, key) /* Feed the key as a seed to a random number generator and normalize to an integer between 0 and N-1 */

DS.H.7 Character String Keys: h4(key) = char_sum(key) % N /* Add together the byte representations of each of the characters and normalize to table size */ h5(key) = extract(P, Q, char_product(key)) /* Multiply together the byte representations of each of the characters and extract Q bits starting at bit P to generate addresses in the correct range */ h6(key) =  key[keysize – i –1] * 32 modulo table size (book’s Fig 5.5 is wrong) i = 0 keysize - 1 i

DS.H.8 Separate Chaining Separate chaining is a collision strategy that uses linked lists to solve the collision problem. A “bin” of the hash table merely points to a linked list that hold all keys that hash to that bin Bin 25 holds 3 different keys. Separate chaining is very flexible, makes insertion and deletion easy, but it requires pointers, which isn’t so good on disk.

DS.H.9 Open Addressing Open addressing uses one big contiguous hash table. When there is a collision, it tries alternate locations, using the function: hi(x) = ( hash(x) + F(i) ) % N It first tries h0(x), then h1(x), etc. until it finds the key in the table or comes to an empty cell to put it in h0(x) = 25 h1(x) = 30 h2(x) = 35 So how do we define F(i)?

DS.H.10 There are several common ways: 1.Linear Probing /* F(i) is a linear function of i */ F(i) = i is most common. It tries the next position for each probe. h(0) = hash(x) h(1) = hash(x) + 1 h(2) = hash(x) + 2 This is simple, but has the problem of primary clustering. Clusters develop in the table and most keys lead to some search.

DS.H Quadratic Probing /* F(i) is a quadratic function of i */ F(i) = i This spreads out the probes more. h(0) = hash(x) h(1) = hash(x) + 1 h(2) = hash(x) + 4 This works better, but some secondary clustering effects have been reported. 2 Theorem: If quadratic probing is used and the table size is prime, then a new element can always be inserted if the table is at least half empty. (proof by contradiction is readable)

DS.H Double hashing /* Use a second hash function */ F(i) = i * hash2(x) This spreads out the probes more. h(0) = hash(x) h(1) = hash(x) + 1 * hash2(x) h(2) = hash(x) + 2 * hash2(x) Variant: use a sequence of hash functions h(i) = x % N i+1 Disadvantage of Open Addressing: deletion is difficult. Why?

DS.H.13 Rehashing /* If the table is getting too full, rebuild it with twice the space. Do this infrequently and at night. */ 70 Louis Smith 45 John Smith 52 Kate Green 97 Ray Finch 94 Craig Mir Louis Smith 52 Kate Green 94 Craig Mir 45 John Smith 97 Ray Finch Why is 45 in bin 1 when 45 % 5 = 0?

DS.H.14 Complexity Analysis Let N be the number of entries in the table at the current time. Let T be the table size. Let = N/T be the load factor. Chaining: N can be larger or smaller than T. e.g. We can have 10 lists of 5 elements each. 1. What is the longest any list can be? 2. What is the shortest any list can be? 3. What is the average length of a list?

DS.H.15 Complexity Analysis N entries, table size T, =N/T Chaining: 1. What is the longest any list can be? N 2. What is the shortest any list can be? 0 3.What is the average length of a list? When =1, N = T, so average 1 element / list. When =2, N = 2T, so average 2 elements / list. General case: the average list has elements. Search time: T(N,T) = c1 + c2( /2) = O( ) = O(N/T) Insertion time: O(1) why? hash search a list

DS.H.16 Open Addressing: N  T, << 1 (full is BAD) Linear Probing: What is the average number of cells probed in a successful search? = percentage of full cells. (1- ) is the percentage of empty cells. 1/(1- ) is the number of cells searched before an empty one is found. Maximum probes is 1 + 1/(1- ). Average probes is ½(1 + 1/(1- ) ).

DS.H.17 Linear Probing: = N/T = percentage of full cells. Successful Search: ½(1 + 1/(1- ) ) Insert or Unsuccessful Search: ½(1 + 1/(1- ) ) Examples: =.25  1 =.99  50 Examples: =.1  1 =.5  3 =.75  9 =.99  5000

DS.H.18 Extendible Hashing Extendible hashing is a fast access method for dynamic files. For data on disk, we don’t want to chase pointers. Suppose that m (key, data) pairs fit in one disk block and that the hash function returns a bit string. - Keep a directory that is organized according to the leading D bits of the hash value. - D changes dynamically as the table grows. - Use only enough bits to distinguish blocks.

DS.H Suppose we add key It belongs in bucket 0, which is full. So we split it into buckets 00 and 01. D=1 (bit) D=2

DS.H.20 In extendible hashing, the index grows as needed. Complexity: N: entries M: block size Expected number of leaves: (N/M) log e Expected directory size: O(N / M) The bigger M is the better /M

DS.H.21 Hashing Applications in compilers: to store and access identifiers in databases: for fast equality queries in image analysis for storing large structures Region Adjacency Graph Construction Geometric Hashing Large number of regions with only a small percentage active at one time. Large number of (object, transform) pairs requiring lots of quick lookups.