Hash Tables Part II: Using Buckets

CS 261 – Data Structures

Hash Tables, Review
Hash tables are similar to Vectors, except that:
- Elements can be indexed by values other than integers.
- A single position may hold more than one element.
Arbitrary values (hash keys) map to integers by means of a hash function. Computing a hash function is usually a two-step process:
1. Transform the value (or key) to an integer.
2. Map that integer to a valid hash table index.
Example: storing names
1. Compute an integer from a name.
2. Map the integer to an index in a table (i.e., a vector, array, etc.).
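
The two-step process can be sketched in C, the language of the slide code. This is a minimal sketch under assumptions. The helper names stringHash and hashIndex are hypothetical. The slides do not say which hash function the course uses, so step 1 borrows the well-known djb2-style "multiply by 33 and add" string hash. Because of that, it will not reproduce the exact bucket assignments in the names example.

```c
#include <assert.h>
#include <stddef.h>

/* Step 1 (hypothetical helper): turn a string key into an unsigned integer
   using a djb2-style "multiply by 33 and add each character" loop.
   Unsigned arithmetic simply wraps around on overflow. */
unsigned long stringHash(const char *key) {
    unsigned long h = 5381;
    for (const char *p = key; *p != '\0'; p++)
        h = h * 33 + (unsigned char) *p;
    return h;
}

/* Step 2: map that integer to a valid index 0 .. tableSize-1. */
size_t hashIndex(const char *key, size_t tableSize) {
    assert(tableSize > 0);  /* modulo by zero is undefined */
    return (size_t) (stringHash(key) % tableSize);
}
```

For example, hashIndex("Angie", 5) always yields the same index in 0..4, which is the property the table relies on. Because the hash is unsigned, no abs() call is needed before taking the remainder.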

Hash Tables
Say we're storing names: Angie, Joe, Abigail, Linda, Mark, Max, Robert, John.
A hash function maps each name to a table index:
  0: Angie, Robert
  1: Linda
  2: Joe, Max, John
  3: (empty)
  4: Abigail, Mark

Hash Tables: Resolving Collisions
There are two general approaches to resolving collisions:
1. Open address hashing: if a spot is full, probe for the next empty spot.
2. Chaining (or buckets): keep a linked list at each table entry.
Today we will look at option 2.

Resolving Collisions: Chaining / Buckets
Maintain a collection at each hash table entry:
- Chaining/buckets: maintain a linked list (or another collection data structure, such as an AVL tree) at each table entry.
  0: Robert -> Angie
  1: Linda
  2: Max -> John -> Joe
  3: (empty)
  4: Mark -> Abigail

Combining arrays and linked lists …
    struct hashTable {
        struct link ** table; /* array of bucket heads, initialized to null pointers */
        int tablesize;        /* number of buckets */
        int count;            /* number of elements stored */
        /* … */
    };

Hash table init
    #include <assert.h>
    #include <stdlib.h>

    void hashTableInit(struct hashTable *ht, int size) {
        int i;
        ht->count = 0;
        ht->table = (struct link **) malloc(size * sizeof(struct link *));
        assert(ht->table != 0);
        ht->tablesize = size;
        for (i = 0; i < size; i++)
            ht->table[i] = 0; /* null pointer: every bucket starts empty */
    }

Adding a value to a hash table
    void add(struct hashTable *ht, EleType newValue) {
        /* find the correct bucket, then add to the front of its list */
        int indx = abs(hashfun(newValue)) % ht->tablesize;
        struct link *newLink = (struct link *) malloc(…);
        assert(newLink != 0);
        newLink->value = newValue;
        newLink->next = ht->table[indx];
        ht->table[indx] = newLink; /* add to bucket */
        ht->count++;
        /* note: next step: reorganize if load factor > 3.0 */
    }

Contains test, remove
- Contains: find the correct bucket, then see whether the element is in its list.
- Remove: slightly more tricky, because you only want to decrement the count if the element is actually in the list.
- Alternative: instead of keeping a count in the hash table, call count on each list. What are the pros and cons of this?
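
Here is one way the contains test and remove could look. This is a minimal sketch under several assumptions. Elements are ints, whereas the slides use a generic EleType. The hypothetical bucketIndex helper takes the non-negative remainder of the value. The names hashTableAdd, hashTableContains and hashTableRemove are illustrative. Remove walks the chain through a pointer to the previous link's next field. That way, removing the first link of a bucket needs no special case, and the count changes only when a link is actually unlinked.

```c
#include <assert.h>
#include <stdlib.h>

typedef int EleType;  /* assumption: int elements */

struct link {
    EleType value;
    struct link *next;
};

struct hashTable {
    struct link **table;  /* one list head per bucket */
    int tablesize;
    int count;
};

/* Hypothetical bucket mapping: the non-negative remainder of the value. */
static int bucketIndex(const struct hashTable *ht, EleType v) {
    int i = v % ht->tablesize;
    return i < 0 ? i + ht->tablesize : i;
}

void hashTableInit(struct hashTable *ht, int size) {
    ht->table = malloc(size * sizeof(struct link *));
    assert(ht->table != NULL);
    for (int i = 0; i < size; i++)
        ht->table[i] = NULL;  /* every bucket starts empty */
    ht->tablesize = size;
    ht->count = 0;
}

void hashTableAdd(struct hashTable *ht, EleType v) {
    int indx = bucketIndex(ht, v);
    struct link *newLink = malloc(sizeof(struct link));
    assert(newLink != NULL);
    newLink->value = v;
    newLink->next = ht->table[indx];  /* add to the front of the bucket */
    ht->table[indx] = newLink;
    ht->count++;
}

/* Contains: find the correct bucket, then run down its linked list. */
int hashTableContains(const struct hashTable *ht, EleType v) {
    for (struct link *p = ht->table[bucketIndex(ht, v)]; p != NULL; p = p->next)
        if (p->value == v)
            return 1;
    return 0;
}

/* Remove one occurrence of v; the count changes only if a link is unlinked. */
void hashTableRemove(struct hashTable *ht, EleType v) {
    struct link **pp = &ht->table[bucketIndex(ht, v)];
    while (*pp != NULL) {
        if ((*pp)->value == v) {
            struct link *dead = *pp;
            *pp = dead->next;  /* splice the link out of its chain */
            free(dead);
            ht->count--;
            return;
        }
        pp = &(*pp)->next;
    }
    /* v was not in the table: count stays the same */
}
```

For example, after adding 1, 5 and 2 to a 4-bucket table, removing 9 leaves the count at 3, while removing 5 drops it to 2.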

Hash Table Size
- Load factor: $\lambda = n / m$, where $n$ is the number of elements and $m$ is the size of the table.
- So the load factor represents the average number of elements at each table entry.
- We want the load factor to remain small.
- We can use the same trick as in open address hashing: if the load factor becomes larger than some fixed limit (say, 3.0), double the table size.
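
The resizing rule can be sketched as follows. This is a minimal sketch under assumptions: elements are ints, bucketIndex is a hypothetical helper, and MAX_LOAD_FACTOR holds the slide's 3.0 limit. The name resizeTable is illustrative. The resize reuses the existing links and relinks each one into its bucket in the doubled table, because an element's index generally changes when m changes.

```c
#include <assert.h>
#include <stdlib.h>

typedef int EleType;  /* assumption: int elements */

struct link {
    EleType value;
    struct link *next;
};

struct hashTable {
    struct link **table;
    int tablesize;
    int count;
};

#define MAX_LOAD_FACTOR 3.0  /* limit taken from the slide */

/* Hypothetical bucket mapping for a table with the given number of buckets. */
static int bucketIndex(EleType v, int tablesize) {
    int i = v % tablesize;
    return i < 0 ? i + tablesize : i;
}

/* Load factor: lambda = n / m. */
double loadFactor(const struct hashTable *ht) {
    return (double) ht->count / ht->tablesize;
}

void hashTableInit(struct hashTable *ht, int size) {
    ht->table = malloc(size * sizeof(struct link *));
    assert(ht->table != NULL);
    for (int i = 0; i < size; i++)
        ht->table[i] = NULL;
    ht->tablesize = size;
    ht->count = 0;
}

/* Double the number of buckets and relink every existing element into
   its bucket in the new table; no elements are copied or reallocated. */
static void resizeTable(struct hashTable *ht) {
    int newSize = 2 * ht->tablesize;
    struct link **newTable = malloc(newSize * sizeof(struct link *));
    assert(newTable != NULL);
    for (int i = 0; i < newSize; i++)
        newTable[i] = NULL;
    for (int i = 0; i < ht->tablesize; i++) {
        struct link *p = ht->table[i];
        while (p != NULL) {
            struct link *next = p->next;  /* save before relinking */
            int j = bucketIndex(p->value, newSize);
            p->next = newTable[j];
            newTable[j] = p;
            p = next;
        }
    }
    free(ht->table);
    ht->table = newTable;
    ht->tablesize = newSize;
}

void hashTableAdd(struct hashTable *ht, EleType v) {
    int indx = bucketIndex(v, ht->tablesize);
    struct link *newLink = malloc(sizeof(struct link));
    assert(newLink != NULL);
    newLink->value = v;
    newLink->next = ht->table[indx];
    ht->table[indx] = newLink;
    ht->count++;
    if (loadFactor(ht) > MAX_LOAD_FACTOR)  /* grow once lambda passes the limit */
        resizeTable(ht);
}
```

For example, a table created with 2 buckets holds 6 elements at λ = 3.0 without resizing. The seventh add pushes λ to 3.5, so the table grows to 4 buckets and λ drops to 1.75.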

Hash Tables: Algorithmic Complexity
Assumptions:
- Time to compute the hash function is constant.
- Chaining uses a linked list.
- Worst-case analysis: all values hash to the same position.
- Best-case analysis: the hash function distributes the values uniformly (all buckets have the same number of objects in them).
Find element operation:
- Worst case for open addressing: O(n)
- Worst case for chaining: O(n) (O(log n) if each bucket is an AVL tree)
- Best case for open addressing: O(1)
- Best case for chaining: O(1)

Hash Tables: Average Case
- Assuming that the hash function distributes elements uniformly (a BIG if), the average case for all operations is $O(\lambda)$, where $\lambda$ is the load factor.
- So you want to keep the load factor relatively small. You can do this by resizing the table (doubling its size) if the load factor is larger than some fixed limit, say 10.
- But that only improves things IF the hash function distributes values uniformly. What happens if the hash value is always zero?
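
The closing question can be checked with a small experiment. This sketch uses two hypothetical hash functions for int keys and counts chain lengths rather than building real lists. With an evenly spreading hash, 100 consecutive keys in 10 buckets give chains of length 10. A hash that always returns zero puts all 100 keys in one chain, and doubling the number of buckets does not shorten it.

```c
#include <assert.h>
#include <stdlib.h>

/* Two hypothetical hash functions for int keys. */
static int constantHash(int key) {
    (void) key;
    return 0;  /* the degenerate case: every key hashes to zero */
}

static int identityHash(int key) {
    return key;  /* spreads consecutive keys evenly across buckets */
}

/* Hash keys 0..n-1 into m buckets, counting chain lengths instead of
   building lists, and return the length of the longest chain. */
int longestChain(int (*hash)(int), int n, int m) {
    int *counts = calloc((size_t) m, sizeof(int));
    assert(counts != NULL);
    int longest = 0;
    for (int key = 0; key < n; key++) {
        int h = hash(key) % m;
        if (h < 0)
            h += m;
        counts[h]++;
        if (counts[h] > longest)
            longest = counts[h];
    }
    free(counts);
    return longest;
}
```

A find in the longest chain costs time proportional to its length. So with the always-zero hash, every lookup degrades to the O(n) worst case no matter how large the table is.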

So when should you use hash tables?
- Your data values must be objects with good hash functions defined (String, Double), or you need to write your own definition of hashCode.
- You need to know that the values are uniformly distributed.
- If you can't guarantee that, then a skip list or AVL tree is often faster.
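
Writing your own hash function usually means combining the hashes of the fields that define equality. This sketch uses a hypothetical struct person key. It applies the same multiply-by-31 recurrence that Java's String.hashCode uses. The struct, its field names, and the seed 17 are assumptions for illustration. Whatever the combination, equal keys must produce equal hashes. That is why personHash folds in exactly the fields that personEquals compares.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical record type used as a hash key. */
struct person {
    const char *name;
    int birthYear;
};

/* Combine every field that equality compares, using h = 31 * h + x. */
unsigned int personHash(const struct person *p) {
    assert(p->name != NULL);
    unsigned int h = 17;
    for (const char *c = p->name; *c != '\0'; c++)
        h = 31 * h + (unsigned char) *c;
    h = 31 * h + (unsigned int) p->birthYear;
    return h;
}

/* Two records are equal when all of their fields match. */
int personEquals(const struct person *a, const struct person *b) {
    return a->birthYear == b->birthYear && strcmp(a->name, b->name) == 0;
}
```

Different records may still collide. A good combination only makes collisions unlikely, which is what keeps the buckets evenly filled.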

Your turn
Now do the worksheet to implement a hash table with buckets:
- Run down the linked list for the contains test.
- Think about how to do remove.
- Keep track of the number of elements.
- Resize the table if the load factor is bigger than 3.0.
Questions?