Theory I Algorithm Design and Analysis (7 Hashing: Open Addressing)

Slides:



Advertisements
Similar presentations
Hash Tables.
Advertisements

Theory I Algorithm Design and Analysis (5 Hashing) Prof. Th. Ottmann.
Hashing as a Dictionary Implementation
What we learn with pleasure we never forget. Alfred Mercier Smitha N Pai.
© 2004 Goodrich, Tamassia Hash Tables1  
Hashing Chapters What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function.
Hashing Techniques.
Hashing CS 3358 Data Structures.
11.Hash Tables Hsu, Lih-Hsing. Computer Theory Lab. Chapter 11P Directed-address tables Direct addressing is a simple technique that works well.
Hash Tables1 Part E Hash Tables  
Hash Tables1 Part E Hash Tables  
Hashing COMP171 Fall Hashing 2 Hash table * Support the following operations n Find n Insert n Delete. (deletions may be unnecessary in some applications)
Tirgul 9 Hash Tables (continued) Reminder Examples.
Theory I Algorithm Design and Analysis (6 Hashing: Chaining) Prof. Th. Ottmann.
Hash Tables1 Part E Hash Tables  
Tirgul 7. Find an efficient implementation of a dynamic collection of elements with unique keys Supported Operations: Insert, Search and Delete. The keys.
COMP 171 Data Structures and Algorithms Tutorial 10 Hash Tables.
Tirgul 8 Hash Tables (continued) Reminder Examples.
Hashing General idea: Get a large array
Data Structures Using C++ 2E Chapter 9 Searching and Hashing Algorithms.
Spring 2015 Lecture 6: Hash Tables
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture8.
DATA STRUCTURES AND ALGORITHMS Lecture Notes 7 Prepared by İnanç TAHRALI.
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
Algorithm Course Dr. Aref Rashad February Algorithms Course..... Dr. Aref Rashad Part: 4 Search Algorithms.
Hash Tables1   © 2010 Goodrich, Tamassia.
Comp 335 File Structures Hashing.
Hashing1 Hashing. hashing2 Observation: We can store a set very easily if we can use its keys as array indices: A: e.g. SEARCH(A,k) return A[k]
Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.
Prof. Amr Goneid, AUC1 CSCI 210 Data Structures and Algorithms Prof. Amr Goneid AUC Part 5. Dictionaries(2): Hash Tables.
Hashing Hashing is another method for sorting and searching data.
© 2004 Goodrich, Tamassia Hash Tables1  
Hashing as a Dictionary Implementation Chapter 19.
Hashing 8 April Example Consider a situation where we want to make a list of records for students currently doing the BSU CS degree, with each.
Chapter 5: Hashing Part I - Hash Tables. Hashing  What is Hashing?  Direct Access Tables  Hash Tables 2.
“Never doubt that a small group of thoughtful, committed people can change the world. Indeed, it is the only thing that ever has.” – Margaret Meade Thought.
Hash Tables. 2 Exercise 2 /* Exercise 1 */ void mystery(int n) { int i, j, k; for (i = 1; i
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
Tirgul 11 Notes Hash tables –reminder –examples –some new material.
Chapter 5: Hashing Collision Resolution: Open Addressing Extendible Hashing Mark Allen Weiss: Data Structures and Algorithm Analysis in Java Lydia Sinapova,
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
COSC 1030 Lecture 10 Hash Table. Topics Table Hash Concept Hash Function Resolve collision Complexity Analysis.
1 Data Structures CSCI 132, Spring 2014 Lecture 33 Hash Tables.
Searching Tables Table: sequence of (key,information) pairs (key,information) pair is a record key uniquely identifies information, so no duplicate records.
CMSC 341 Hashing Readings: Chapter 5. Announcements Midterm II on Nov 7 Review out Oct 29 HW 5 due Thursday CMSC 341 Hashing 2.
1 Hash Tables Chapter Motivation Many applications require only: –Insert –Search –Delete Examples –Symbol tables –Memory management mechanisms.
Chapter 11 (Lafore’s Book) Hash Tables Hwajung Lee.
Prof. Amr Goneid, AUC1 CSCI 210 Data Structures and Algorithms Prof. Amr Goneid AUC Part 5. Dictionaries(2): Hash Tables.
Fundamental Structures of Computer Science II
CSCI 210 Data Structures and Algorithms
Hashing CSE 2011 Winter July 2018.
Hashing Alexandra Stefan.
Week 8 - Wednesday CS221.
Hashing Alexandra Stefan.
Data Structures Using C++ 2E
Hash Tables 3/25/15 Presentation for use with the textbook Data Structures and Algorithms in Java, 6th edition, by M. T. Goodrich, R. Tamassia, and M.
Advanced Associative Structures
Hash Table.
Resolving collisions: Open addressing
Searching Tables Table: sequence of (key,information) pairs
Data Structures and Algorithms
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
Hashing Alexandra Stefan.
Dictionaries 1/17/2019 7:55 AM Hash Tables   4
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
A Hash Table with Chaining
Hashing Sections 10.2 – 10.3 Lecture 26 CS302 Data Structures
Collision Handling Collisions occur when different elements are mapped to the same cell.
CS210- Lecture 16 July 11, 2005 Agenda Maps and Dictionaries Map ADT
Presentation transcript:

Theory I Algorithm Design and Analysis (7 Hashing: Open Addressing) Prof. Th. Ottmann

Hashing: General Framework Set of keys S hash function h Univer- se U of all possible keys 0, …, m-1 hash table T h(s) = hash address h(s) = h(s´) s and s´ are synonyms with respect to h address collision

Possible ways of treating collisions Treatment of collisions: Collisions are treated differently in different methods. A data set with key s is called a colliding element if bucket Bh(s) is already taken by another data set. What can we do with colliding elements? 1. Chaining: Implement the buckets as linked lists. Colliding elements are stored in these lists. 2. Open Addressing: Colliding elements are stored in other vacant buckets. During storage and lookup, these are found through so-called probing.

Hashing by chaining Keys are stored in overflow lists h(k) = k mod 7 0 1 2 3 4 5 6 hash table T pointer 53 12 15 2 43 5 colliding elements 19 This type of chaining is also known as direct chaining.

Open addressing Idea: Store colliding elements in vacant (“open”) buckets of the hash table If T[h(k)] is taken, find a different bucket for k according to a fixed rule Example: Consider the bucket with the next smaller index: (h(k) - 1) mod m General: Consider the sequence (h(k) - j) mod m j = 0, …, m-1 0 1 h(k) m-2 m-1 … ..... .….

Probe sequences Even more general: Consider the probe sequence (h(k) – s(j,k)) mod m j = 0, ..., m-1, for a given function s(j,k) Examples for the function s(j,k) = j (linear probing) s(j,k) = (-1)j * (quadratic probing) s(j,k) = j * h´(k) (double hashing)

Probe sequences Properties of s(j,k) Sequence (h(k) – s(0,k)) mod m, (h(k) – s(1,k)) mod m, (h(k) – s(m-2,k)) mod m, (h(k) – s(m-1,k)) mod m should result in a permutation of 0, ..., m-1. Example: Quadratic probing Critical: Deletion of data sets  mark as deleted (Insert 4, 18, 25; delete 4; lookup 18, 25) 0 1 2 3 4 5 6 h(11) = 4 s(j,k) = -1,1,-4,4,-9,9

Open addressing class OpenHashTable extends HashTable { // in HashTable: TableEntry [] T; private int [] tag; static final int EMPTY = 0; static final int OCCUPIED = 1; static final int DELETED = 2; // Constructor OpenHashTable (int capacity) { super(capacity); tag = new int [capacity]; for (int i = 0; i < capacity; i++) { tag[i] = EMPTY; } } // The hash function protected int h (Object key) {...} // Function s for probe sequence protected int s (int j, Object key) { // quadratic probing if (j % 2 == 0) return ((j + 1) / 2) * ((j + 1) / 2); else return -((j + 1) / 2) * ((j + 1) / 2); }

Open addressing – lookup public int searchIndex (Object key) { /* searches for an entry with the given key in the hash table and returns the respective index or -1 */ int i = h(key); int j = 1; // next index of probing sequence while (tag[i] != EMPTY &&!key.equals(T[i].key)){ // Next entry in probing sequence i = (h(key) - s(j++, key)) % capacity; if (i < 0) i = i + capacity; } if (key.equals(T[i].key) && tag[i] == OCCUPIED) return i; else return -1; } public Object search (Object key) { /* searches for an entry with the given key in the hash table and returns the respective value or NULL */ int i = searchIndex (key); if (i >= 0) return T[i].value; else return null; }

Open addressing – insert public void insert (Object key, Object value) { // inserts an entry with the given key and value int j = 1; // next index of probing sequence int i = h(key); while (tag[i] == OCCUPIED) { i = (h(key) - s(j++, key)) % capacity; if (i < 0) i = i + capacity; } T[i] = new TableEntry(key, value); tag[i] = OCCUPIED; }

Open addressing – delete public void delete (Object key) { // deletes entry with given key from the hash table int i = searchIndex(key); if (i >= 0) { // Successful search tag[i] = DELETED; } }

Test program public class OpenHashingTest { public static void main(String args[]) { Integer[] t= new Integer[args.length]; for (int i = 0; i < args.length; i++) t[i] = Integer.valueOf(args[i]); OpenHashTable h = new OpenHashTable (7); for (int i = 0; i <= t.length - 1; i++) { h.insert(t[i], null);# h.printTable (); } h.delete(t[0]); h.delete(t[1]); h.delete(t[6]); h.printTable(); } } Call: java OpenHashingTest 12 53 5 15 2 19 43 Output (quadratic probing): [ ] [ ] [ ] [ ] [ ] (12) [ ] [ ] [ ] [ ] [ ] (53) (12) [ ] [ ] [ ] [ ] [ ] (53) (12) (5) [ ] (15) [ ] [ ] (53) (12) (5) [ ] (15) (2) [ ] (53) (12) (5) (19) (15) (2) [ ] (53) (12) (5) (19) (15) (2) (43) (53) (12) (5) (19) (15) (2) {43} {53} {12} (5)

Probe sequences – linear probing s(j,k) = j Probe sequence for k: h(k), h(k)-1, ..., 0, m-1, ..., h(k)+1, Problem: “primary clustering” Pr (next object ends at position 2) = 4/7 Pr (next object ends at position 1) = 1/7 Long chains are extended with higher probability than short ones. 0 1 2 3 4 5 6 5 53 12

Efficiency of linear probing Successful search: Failed search: Cn (successful) C´n (failed) 0.50 1.5 2.5 0.90 5.5 50.5 0.95 10.5 200.5 1.00 - - Efficiency of linear probing decreases drastically as soon as the load factor gets close to the value 1.

Quadratic probing s(j,k) = Probe sequence for k: h(k), h(k)+1, h(k)-1, h(k)+4, ... Permutation, if m = 4l + 3 is prime. Problem: secondary clustering, i.e. two synonyms k and k´ always run through the same probe sequence.

Efficiency of quadratic probing Successful search: Failed search: Cn (successful) C´n(failed) 0.50 1.44 2.19 0.90 2.85 11.40 0.95 3.52 22.05 1.00 - -

Uniform probing s(j,k) = πk(j) πk one of the m! permutations of {0, ..., m-1} - only depends on k - same probability for each permutation Cn (successful) C´n(failed) 0.50 1.39 2 0.90 2.56 10 0.95 3.15 20 1.00 - -

Random probing Realization of uniform probing is very complex. Alternative: Random probing s(j,k) = random number depending on k s(j,k) = s(j´,k) possible, but improbable

Double hashing Idea: Choose another hash function h´ s(j,k) = j · h´(k) Probe sequence for k: h(k), h(k)-h´(k), h(k)-2h´(k), ... Requirement: Probing sequence must correspond to a permutation of the hash addresses. Hence: h´(k) ≠ 0 and h´(k) no factor of m, i.e. h´(k) does not divide m. Example: h´(k) = 1 + (k mod (m-2))

Example Hash functions: h(k) = k mod 7 h´(k) = 1 + k mod 5 Insert sequence: 15, 22, 1, 29, 26 In this example we can do with a single probing step almost every time. Double hashing is as efficient as uniform probing. Double hashing is simpler to implement. 0 1 2 3 4 5 6 15 h´(22) = 3 0 1 2 3 4 5 6 15 22 h´(1) = 2 0 1 2 3 4 5 6 15 22 1 h´(29) = 5 0 1 2 3 4 5 6 15 29 22 1 h´(26) = 2

Improving successful search – motivation Hash table of size 11; double hashing with h(k) = k mod 11 and h´(k) = 1 + (k mod (11 – 2)) = 1 + (k mod 9) Already inserted: 22, 10, 37, 47, 17 Yet to be inserted: 6 and 30 h(6) = 6, h´(6) = 1 + 6 = 7 h(30) = 8, h´(30) = 1 + 3 = 4 0 1 2 3 4 5 6 7 8 9 10 22 47 37 17 10 0 1 2 3 4 5 6 7 8 9 10 22 47 37 6 17 10

Improving successful search In general: Insert: - k collides with kold in T[i], i.e. i = h(k) - s(j,k) = h(kold) - s(j´,kold) - kold is already stored in T[i] Idea: Find a vacant bucket for k or kold Two options: (O1) kold remains in T[i] consider new position h(k) - s(j+1,k) for k (O2) k replaces kold consider new position h(kold) - s(j´+1, kold) for kold if (O1) or (O2) finds a vacant bucket then insert the respective key done else follow (O1) or (O2) further

Improving successful search Brent’s method: only follow (O1) k collides with k´ k gives way k´ gives way k collides with k´´ done k gives way k´´ gives ways k collides with k´´´ done k gives way k´´´ gives way k collides with k´´´´ done Binary tree probing: follow (O1) and (O2)

Improving successful search Problem: kold replaced by k  next position in probe sequence for kold? Giving way is simple for kold if: s(j, kold) - s(j -1, kold) = s(1,kold) for all 1 ≤ j ≤ m -1. This is, e.g., true for linear probing and double hashing.

Example Hash functions: h(k) = k mod 7 h´(k) = 1 + k mod 5 Insert sequence: 12, 53, 5, 15, 2, 19 h(5) = 5 occupied by k´= 12 Consider: h´(5) = 1  h(5) -1 · h´(5)  5 pushes 12 from its bucket 0 1 2 3 4 5 6 53 12

Improving successful search Lookup k: k´>k in probe sequence  lookup failed Insert: smaller keys push away greater keys Invariant: All keys in the probe sequence before k are smaller than k (but not necessarily in ascending order) Problems: The “pushing“ process may trigger a “chain reaction” k´ pushed away by k: position of k´ in probe sequence? Required: s(j,k) - s(j -1,k) = s(1,k), 1 ≤ j ≤ m

Ordered hashing Lookup Input: key k Output: Information about data set with key k, or null Begin at i  h(k) while T[i] not empty and T[i] .k < k do i  (i – s(1,k)) mod m end while; if T[i] occupied and T[i] .k = k then search successful else search failed

Ordered hashing Insert Input: key k  Begin at i  h(k) while T[i] not empty and T[i] .k ≠ k do if k < T[i].k then if T[i] is removed then exit while-loop else // k pushes away T[i].k swap T[i].k with k i = (i – s(1,k)) mod m end while; if T[i] is not occupied then insert k in T[i]