Implementation of Linear Probing (continued) Helping method for locating index: private int findIndex(long key) // return -1 if the item with key 'key'

Slides:



Advertisements
Similar presentations
1 Designing Hash Tables Sections 5.3, 5.4, Designing a hash table 1.Hash function: establishing a key with an indexed location in a hash table.
Advertisements

Hash Tables.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
Data Structures Using C++ 2E
Hashing as a Dictionary Implementation
What we learn with pleasure we never forget. Alfred Mercier Smitha N Pai.
© 2004 Goodrich, Tamassia Hash Tables1  
Hashing21 Hashing II: The leftovers. hashing22 Hash functions Choice of hash function can be important factor in reducing the likelihood of collisions.
Using arrays – Example 2: names as keys How do we map strings to integers? One way is to convert each letter to a number, either by mapping them to 0-25.
Log Files. O(n) Data Structure Exercises 16.1.
© 2006 Pearson Addison-Wesley. All rights reserved13 A-1 Chapter 13 Hash Tables.
1 CSE 326: Data Structures Hash Tables Autumn 2007 Lecture 14.
FALL 2004CENG 3511 Hashing Reference: Chapters: 11,12.
CS 206 Introduction to Computer Science II 11 / 17 / 2008 Instructor: Michael Eckmann.
Collision Resolution: Open Addressing
Hash Tables1 Part E Hash Tables  
Hash Tables1 Part E Hash Tables  
Introduction to Hashing CS 311 Winter, Dictionary Structure A dictionary structure has the form: (Key, Data) Dictionary structures are organized.
Hash Tables1 Part E Hash Tables  
Tirgul 8 Hash Tables (continued) Reminder Examples.
Hashing General idea: Get a large array
Data Structures Using C++ 2E Chapter 9 Searching and Hashing Algorithms.
Introducing Hashing Chapter 21 Copyright ©2012 by Pearson Education, Inc. All rights reserved.
Hash Tables. Container of elements where each element has an associated key Each key is mapped to a value that determines the table cell where element.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
CS 206 Introduction to Computer Science II 04 / 06 / 2009 Instructor: Michael Eckmann.
Hashing 1. Def. Hash Table an array in which items are inserted according to a key value (i.e. the key value is used to determine the index of the item).
COSC 2007 Data Structures II
1 Chapter 5 Hashing General ideas Methods of implementing the hash table Comparison among these methods Applications of hashing Compare hash tables with.
Data Structures and Algorithm Analysis Hashing Lecturer: Jing Liu Homepage:
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
Comp 335 File Structures Hashing.
1 CSE 326: Data Structures: Hash Tables Lecture 12: Monday, Feb 3, 2003.
Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.
1 HASHING Course teacher: Moona Kanwal. 2 Hashing Mathematical concept –To define any number as set of numbers in given interval –To cut down part of.
Hashing Hashing is another method for sorting and searching data.
© 2004 Goodrich, Tamassia Hash Tables1  
Hashing as a Dictionary Implementation Chapter 19.
CSE373: Data Structures & Algorithms Lecture 17: Hash Collisions Kevin Quinn Fall 2015.
Searching Given distinct keys k 1, k 2, …, k n and a collection of n records of the form »(k 1,I 1 ), (k 2,I 2 ), …, (k n, I n ) Search Problem - For key.
CS201: Data Structures and Discrete Mathematics I Hash Table.
Hashing - 2 Designing Hash Tables Sections 5.3, 5.4, 5.4, 5.6.
Data Structures and Algorithms Hashing First Year M. B. Fayek CUFE 2010.
LECTURE 35: COLLISIONS CSC 212 – Data Structures.
Chapter 5: Hashing Part I - Hash Tables. Hashing  What is Hashing?  Direct Access Tables  Hash Tables 2.
Chapter 11 Hash Tables © John Urrutia 2014, All Rights Reserved1.
“Never doubt that a small group of thoughtful, committed people can change the world. Indeed, it is the only thing that ever has.” – Margaret Meade Thought.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
CSE 373 Data Structures and Algorithms Lecture 17: Hashing II.
Tirgul 11 Notes Hash tables –reminder –examples –some new material.
Chapter 5: Hashing Collision Resolution: Open Addressing Extendible Hashing Mark Allen Weiss: Data Structures and Algorithm Analysis in Java Lydia Sinapova,
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
Hash Tables © Rick Mercer.  Outline  Discuss what a hash method does  translates a string key into an integer  Discuss a few strategies for implementing.
Hashing. Search Given: Distinct keys k 1, k 2, …, k n and collection T of n records of the form (k 1, I 1 ), (k 2, I 2 ), …, (k n, I n ) where I j is.
CMSC 341 Hashing Readings: Chapter 5. Announcements Midterm II on Nov 7 Review out Oct 29 HW 5 due Thursday CMSC 341 Hashing 2.
Hash Tables Ellen Walker CPSC 201 Data Structures Hiram College.
CS 206 Introduction to Computer Science II 04 / 08 / 2009 Instructor: Michael Eckmann.
Chapter 11 (Lafore’s Book) Hash Tables Hwajung Lee.
Data Structures Using C++ 2E
Hashing Problem: store and retrieving an item using its key (for example, ID number, name) Linked List takes O(N) time Binary Search Tree take O(logN)
Hashing CSE 2011 Winter July 2018.
Data Structures Using C++ 2E
Quadratic probing Double hashing Removal and open addressing Chaining
CSE373: Data Structures & Algorithms Lecture 14: Hash Collisions
Resolving collisions: Open addressing
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
Ch Hash Tables Array or linked list Binary search trees
Collision Handling Collisions occur when different elements are mapped to the same cell.
CSE 373: Data Structures and Algorithms
Presentation transcript:

Implementation of Linear Probing (continued) Helping method for locating index: private int findIndex(long key) // return -1 if the item with key 'key' was not found { int index = h(key); int probe = index; int k = 1; // probe number do { if (table[probe]==null) { // probe sequence has ended break; } if (table[probe].getKey()==key) return probe; probe = (index + step(k)) % table.length; // check next slot k++; } while (probe!=index); return -1; // not found }

Implementation of Linear Probing (continued) Find and Deleting the item: public T find(long key) { int index = findIndex(key); if (index>=0) return (T) table[index]; else return null; // not found } public T delete(long key) { int index = findIndex(key); if (index>=0) { T item = (T) table[index]; table[index] = AVAILABLE; // mark available return item; } else return null; // not found }

Open addressing: Quadratic Probing Designed to prevent primary clustering. It does this by increasing the step by increasingly large amounts as more probes are required to insert a record. This prevents clusters from building up. In quadratic probing the step is equal to the square of the probe number. With linear probing the step values for a sequence of probes would be {1, 2, 3, 4, etc}. For quadratic probing the step values would be {1, 2 2, 3 2, 4 2, etc}, i.e. {1, 4, 9, 16, etc}. Disadvantage of this method: After a number of probes the sequence of steps repeats itself (remember that the step will be probe number 2 mod the size of the hash table). This repetition occurs when the probe number is roughly half the size of the hash table. Secondary clustering.

Open addressing: Quadratic Probing Disadvantage of this method: After a number of probes the sequence of steps repeats itself. => It fails to insert a new item even if there is still a space in the array. Secondary clustering: the sequence of probe steps is the same for any insertion. Secondary clustering refers to the increase in the probe length (that is the number of probes required to find a record) for records where collisions have occurred (the keys are mapped to the same value). Note that this is not as large a problem as primary clustering. However, it is important to realize that in practice these two issues are not significant, given a large hash table and a good hash function it is extremely unlikely that these issues will affect the performance of the hash table, unless it becomes nearly full.

Figure: Quadratic probing with h(x) = x mod 101

Implementation It’s enough to modify the step helping method: private int step(int k) // step function { return k*k // quadratic probing }

Open addressing: Double Hashing Double hashing aims to avoid both primary and secondary clustering and is guaranteed to find a free element in a hash table as long as the table is not full. It achieves these goals by calculating the step value using a second hash function h’. step(k) = k.h’(key) This new hash function h’ should: be different from the original hash function (remember that it was the original hash function that resulted in the collision in the first place) and, not result in zero (as original index + 0 = original index)

Open addressing: Double Hashing The second hash function is usually chosen as follows: h’(key) = q – (key%q), where q is a prime number q<N (N is the size of the array). Remark: It is important that the size of the hash table is a prime number if double hashing is to be used. This guarantees that successive probes will (eventually) try every index in the hash table before an index is repeated (which would indicate that the hash table is full). For other hashings (and for q) we want to use prime numbers to eliminate existing patterns in the data.

Figure: Double hashing during the insertion of 58, 14, and 91

Double Hashing – Implementation public class DoubleHashTable implements HashTableInterface { private KeyedItem[] table; // special values: null = EMPTY, T with key=0 = AVAILABLE private static KeyedItem AVAILABLE = new KeyedItem(0); private int q; // should be a prime number public DoubleHashTable(int size,int q) // size: should be a prime number; // recommended roughly twice bigger // than the expected number of elements // q: recommended to be a prime number, should be smaller than size { table = new KeyedItem[size]; this.q=q; }

Double Hashing – Implementation private int h(long key) // hash function // return index { return (int)(key % table.length); // typecast to int } private int hh(long key) // second hash function // return step multiplicative constant { return (int)(q - key%q); } private int step(int k,long key) // step function { return k*hh(key); }

Double Hashing – Implementation Call step(k,item.getKey()) instead of step(k) public void insert(T item) throws HashTableFullException { int index = h(item.getKey()); int probe = index; int k = 1; do { if (table[probe]==null || table[probe]==AVAILABLE) { // this slot is available table[probe] = item; return; } probe = (index + step(k,item.getKey())) % table.length; // check next slot k++; } while (probe!=index); throw new HashTableFullException("Hash table is full."); }

Open addressing performance The performance of a hash table depends on the load factor of the table. The load factor α is the ratio of the number of data items to the size of the array. Of the three types of open addressing double hashing gives the best performance. Overall, open addressing works very well up to load factors of around 0.5 (when 2 probes on average are required to find a record). For load factors greater than 0.6 performance declines dramatically.

Rehashing If the load factor goes over the safe limit, we should increase the size of the hash table (as for dynamic arrays). This process is called rehashing. Comments: we cannot just double the size of the table, as the size should be a prime number; it will change the main hash function it’s not enough to just copy items Rehashing will take time O(N)

Dealing with Collisions (2 nd approach): Separate Chaining In separate chaining the hash table consists of an array of lists. When a collision occurs the new record is added to the list. Deletion is straightforward as the record can simply be removed from the list. Finally, separate chaining is less sensitive to load factor and it is normal to aim for a load factor of around 1 (but it will work also for load factors over 1).

Figure: Separate chaining (using linked lists). If array-based implementation of list is used: they are called buckets.