Tirgul 9 Hash Tables (continued) Reminder Examples.

Slides:



Advertisements
Similar presentations
Chapter 11. Hash Tables.
Advertisements

Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley.
Hash Tables.
Hash Tables CIS 606 Spring 2010.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
Hashing as a Dictionary Implementation
What we learn with pleasure we never forget. Alfred Mercier Smitha N Pai.
© 2004 Goodrich, Tamassia Hash Tables1  
Hashing21 Hashing II: The leftovers. hashing22 Hash functions Choice of hash function can be important factor in reducing the likelihood of collisions.
CS Section 600 CS Section 002 Dr. Angela Guercio Spring 2010.
11.Hash Tables Hsu, Lih-Hsing. Computer Theory Lab. Chapter 11P Directed-address tables Direct addressing is a simple technique that works well.
© 2006 Pearson Addison-Wesley. All rights reserved13 A-1 Chapter 13 Hash Tables.
1 CSE 326: Data Structures Hash Tables Autumn 2007 Lecture 14.
Tirgul 8 Universal Hashing Remarks on Programming Exercise 1 Solution to question 2 in theoretical homework 2.
Hash Tables1 Part E Hash Tables  
Hash Tables1 Part E Hash Tables  
Hash Tables1 Part E Hash Tables  
Tirgul 7. Find an efficient implementation of a dynamic collection of elements with unique keys Supported Operations: Insert, Search and Delete. The keys.
COMP 171 Data Structures and Algorithms Tutorial 10 Hash Tables.
Tirgul 8 Hash Tables (continued) Reminder Examples.
Lecture 10: Search Structures and Hashing
Tirgul 7 Heaps & Priority Queues Reminder Examples Hash Tables Reminder Examples.
Hashing General idea: Get a large array
Data Structures Using C++ 2E Chapter 9 Searching and Hashing Algorithms.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
Data Structures Hashing Uri Zwick January 2014.
Hashtables David Kauchak cs302 Spring Administrative Talk today at lunch Midterm must take it by Friday at 6pm No assignment over the break.
Spring 2015 Lecture 6: Hash Tables
Symbol Tables Symbol tables are used by compilers to keep track of information about variables functions class names type names temporary variables etc.
Hashing Table Professor Sin-Min Lee Department of Computer Science.
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
Implementing Dictionaries Many applications require a dynamic set that supports dictionary-type operations such as Insert, Delete, and Search. E.g., a.
Data Structures Hash Tables. Hashing Tables l Motivation: symbol tables n A compiler uses a symbol table to relate symbols to associated data u Symbols:
David Luebke 1 10/25/2015 CS 332: Algorithms Skip Lists Hash Tables.
Comp 335 File Structures Hashing.
Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.
Hashing Hashing is another method for sorting and searching data.
Hashing Amihood Amir Bar Ilan University Direct Addressing In old days: LD 1,1 LD 2,2 AD 1,2 ST 1,3 Today: C
Searching Given distinct keys k 1, k 2, …, k n and a collection of n records of the form »(k 1,I 1 ), (k 2,I 2 ), …, (k n, I n ) Search Problem - For key.
Lecture 12COMPSCI.220.FS.T Symbol Table and Hashing A ( symbol) table is a set of table entries, ( K,V) Each entry contains: –a unique key, K,
1 Hashing - Introduction Dictionary = a dynamic set that supports the operations INSERT, DELETE, SEARCH Dictionary = a dynamic set that supports the operations.
Hashing 8 April Example Consider a situation where we want to make a list of records for students currently doing the BSU CS degree, with each.
Chapter 11 Hash Tables © John Urrutia 2014, All Rights Reserved1.
Hashing Basis Ideas A data structure that allows insertion, deletion and search in O(1) in average. A data structure that allows insertion, deletion and.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
Tirgul 11 Notes Hash tables –reminder –examples –some new material.
Introduction to Algorithms 6.046J/18.401J LECTURE7 Hashing I Direct-access tables Resolving collisions by chaining Choosing hash functions Open addressing.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
1 CSCD 326 Data Structures I Hashing. 2 Hashing Background Goal: provide a constant time complexity method of searching for stored data The best traditional.
Hashtables David Kauchak cs302 Spring Administrative Midterm must take it by Friday at 6pm No assignment over the break.
CS6045: Advanced Algorithms Data Structures. Hashing Tables Motivation: symbol tables –A compiler uses a symbol table to relate symbols to associated.
Hashing TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AA Course: Data Structures Lecturer: Haim Kaplan and Uri Zwick.
Hashing Goal Perform inserts, deletes, and finds in constant average time Topics Hash table, hash function, collisions Collision handling Separate chaining.
1 Hash Tables Chapter Motivation Many applications require only: –Insert –Search –Delete Examples –Symbol tables –Memory management mechanisms.
CSC 413/513: Intro to Algorithms Hash Tables. ● Hash table: ■ Given a table T and a record x, with key (= symbol) and satellite data, we need to support:
Data Structures Using C++ 2E
Hash table CSC317 We have elements with key and satellite data
Hashing CSE 2011 Winter July 2018.
Hashing Alexandra Stefan.
Hashing Alexandra Stefan.
Data Structures Using C++ 2E
Hash functions Open addressing
Hash Table.
CSE 2331/5331 Topic 8: Hash Tables CSE 2331/5331.
Introduction to Algorithms 6.046J/18.401J
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
Hashing Alexandra Stefan.
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
Introduction to Algorithms
Hashing Sections 10.2 – 10.3 Lecture 26 CS302 Data Structures
Presentation transcript:

Tirgul 9 Hash Tables (continued) Reminder Examples

Hash Table In a hash table, we allocate an array of size m, which is much smaller than |U| (the set of keys). We use a hash function h() to determine the entry of each key. The crucial point: the hash function should “ spread ” the keys of U equally among all the entries of the array. The division method: If we have a table of size m, we can use the hash function

How to choose hash functions The crucial point: the hash function should “ spread ” the keys of U equally among all the entries of the array. Unfortunately, since we don ’ t know in advance the keys that we ’ ll get from U, this can be done only approximately. Remark: the hash functions usually assume that the keys are numbers. We ’ ll discuss next class what to do if the keys are not numbers.

The division method A good choice example: if we have |U|=2000, and we want each search to take (on average) 3 operations, we can choose the closest primal number to 2000/3, m= , , …

The multiplication method The disadvantage of the division method hash function is: It depends on the size of the table. The way we choose m affect the performance of the hash function. The multiplication method hash function does not depend on m as much as the division method hash function.

The multiplication method The multiplication method: Multiply a constant 0<A<1 with k. The fractional part of kA is taken, and multiplied by m. Formally, The multiplication method does not depends as much on m since A helps randomizing the hash function. In this method the are better choices for A of course …

The multiplication method A bad choice of A, example: if m = 100 and A=1/3, then for k=10, h(k)=33, for k=11, h(k)=66, And for k=12, h(k)=99. This is not a good choice of A, since we ’ ll have only three values of h(k)... The optimal choice of A depends on the keys themselves. Knuth claims that is likely to be a good choice.

The multiplication method A good choice of A, example: if m = 1000 and, then for k=61, h(k)=700, for k=62, h(k)=318, For k=63, h(k)=936 And for k=64, h(k)=554.

What if keys are not numbers? The hash functions we showed only work for numbers. When keys are not numbers,we should first convert them to numbers. A string can be treated as a number in base 256. Each character is a digit between 0 and 255. The string “ key ” will be translated to

Translating long strings to numbers The disadvantage of the method is: A long string creates a large number. Strings longer than 4 characters would exceed the capacity of a 32 bit integer. We can write the integer value of “ word ” as (((w* o)*256 + r)*256 + d) When using the division method the following facts can be used: (a+b) mod n = ((a mod n)+b) mod n (a*b) mod n = ((a mod n)*b) mod n.

Translating long strings to numbers The expression we reach is: ((((((w*256+o)mod m)*256)+r)mod m)*256+d)mod m Using the properties of mod, we get the simple alg.: int hash(String s, int m) int h=s[0] for ( i=1 ; i<s.length ; i++) h = ((h*256) + s[i])) mod m return h Notice that h is always smaller than m. This will also improve the performance of the algorithm.

Collisions What happens when several keys have the same entry? clearly it might happen, since U is much larger than m. Collision. Collisions are more likely to happen when the hash table is almost full. We define the “ load factor ” as Where n is the number of keys in the hash table, And m is the size of the table.

Chaining There are two approaches to handle collisions: Chaining. Open Addressing. Chaining: Each entry in the table is a linked list. The linked list holds all the keys that are mapped to this entry. Search operation on a hash table which applies chaining takes time.

Chaining This complexity is calculated under the assumption of uniform hashing. Notice that in the chaining method, the load factor may be greater than one.

Open addressing In this method, the table itself holds all the keys. We change the hash function to receive two parameters: The first is the key. The second is the probe number. We first try to locate h(k,0) in the table. If it fails we try to locate h(k,1) in the table, and so on.

Open addressing It is required that {h(k, 0),...,h(k,m-1)} will be a permutation of {0,..,m-1}. After m-1 probes we ’ ll definitely find a place to locate k (unless the table is full). Notice that here, the load factor must be smaller than one. There is a problem with deleting keys. What is it?

Open addressing While searching key i and reaching an empty slot, we don ’ t know if: The key i doesn ’ t exist in the table. Or, key i does exist in the table but at the time key i was inserted this slot was occupied, and we should continue our search. We will discuss two ways to implement open addressing: linear probing double hashing

Open addressing Linear probing - h(k,i)=(h(k)+i) mod m The problem: primary clustering. If several consecutive slots are occupied, the next free slot has high probability of being occupied. Search time increases when large clusters are created. The reason for the primary clustering stems from the fact that there are only m different probe sequences.

Open addressing Double hashing – h(k,i)=(h 1 (k)+ih 2 (k)) mod m Better than linear probing. The problem can not have a common divisor with m (besides 1). different probe sequences!

Performance (without proofs) Insertion and unsuccessful search of an element into an open-address hash table requires probes on average. A successful search: the average number of probes is For example: If the table is 50% full then a search will take about 1.4 probes on average. If the table 90% full then the search will take about 2.6 probes on average.

Example for Open Addressing A computer science geek goes to a sibyl. She ask him to scramble the Tarot cards. The geek does not trust the sibyl and he decides to apply open addressing as scrambling technique. The card numbers: 10, 22, 31, 4, 15, 28, 17, 88. He tries Linear probing with m=11 and h1(k)=k mod m. He gets primary clustering which known to be bad luck …

Example for Open Addressing Just before the sibyl looses her patience he tries double hashing with m=11, h2(k)=1+(k mod (m-1)), and h1(k)=k mod m.

When should hash tables be used Hash tables are very useful for implementing dictionaries if we don ’ t have an order on the elements, or we have order but we need only the standard operations. On the other hand, hash tables are less useful if we have order and we need more than just the standard operations. For example, last(), or iterator over all elements, which is problematic if the load factor is very low.

When should hash tables be used We should have a good estimate of the number of elements we need to store For example, the huji has about 30,000 students each year, but still it is a dynamic d.b. Re-hashing: If we don ’ t know a-priori the number of elements, we might need to perform re-hashing, increasing the size of the table and re-assigning all elements.