CS 261 – Data Structures Hash Tables Part II: Using Buckets.

Slides:



Advertisements
Similar presentations
1 Designing Hash Tables Sections 5.3, 5.4, Designing a hash table 1.Hash function: establishing a key with an indexed location in a hash table.
Advertisements

Hashing.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
Hashing as a Dictionary Implementation
Appendix I Hashing. Chapter Scope Hashing, conceptually Using hashes to solve problems Hash implementations Java Foundations, 3rd Edition, Lewis/DePasquale/Chase21.
Hashing21 Hashing II: The leftovers. hashing22 Hash functions Choice of hash function can be important factor in reducing the likelihood of collisions.
Searching Kruse and Ryba Ch and 9.6. Problem: Search We are given a list of records. Each record has an associated key. Give efficient algorithm.
Hashing Techniques.
CS 261 – Data Structures Hash Tables Part 1. Open Address Hashing.
Lecture 10 Sept 29 Goals: hashing dictionary operations general idea of hashing hash functions chaining closed hashing.
Hash Tables and Associative Containers CS-212 Dick Steflik.
1 CSE 326: Data Structures Hash Tables Autumn 2007 Lecture 14.
CS 261 – Data Structures Hash Tables Part II: Using Buckets.
FALL 2004CENG 3511 Hashing Reference: Chapters: 11,12.
CS 206 Introduction to Computer Science II 11 / 17 / 2008 Instructor: Michael Eckmann.
COMP 171 Data Structures and Algorithms Tutorial 10 Hash Tables.
CS 206 Introduction to Computer Science II 11 / 12 / 2008 Instructor: Michael Eckmann.
Lecture 10: Search Structures and Hashing
Hashing General idea: Get a large array
CS 206 Introduction to Computer Science II 04 / 06 / 2009 Instructor: Michael Eckmann.
§3 Separate Chaining ---- keep a list of all keys that hash to the same value struct ListNode; typedef struct ListNode *Position; struct HashTbl; typedef.
CS2110 Recitation Week 8. Hashing Hashing: An implementation of a set. It provides O(1) expected time for set operations Set operations Make the set empty.
COSC 2007 Data Structures II
1 Hash Tables  a hash table is an array of size Tsize  has index positions 0.. Tsize-1  two types of hash tables  open hash table  array element type.
Symbol Tables Symbol tables are used by compilers to keep track of information about variables functions class names type names temporary variables etc.
CS261 Data Structures Hash Tables Concepts. Goals Hash Functions Dealing with Collisions.
1 Chapter 5 Hashing General ideas Methods of implementing the hash table Comparison among these methods Applications of hashing Compare hash tables with.
Data Structures and Algorithm Analysis Hashing Lecturer: Jing Liu Homepage:
CS 202, Spring 2003 Fundamental Structures of Computer Science II Bilkent University1 Hashing CS 202 – Fundamental Structures of Computer Science II Bilkent.
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture8.
1.  We’ll discuss the hash table ADT which supports only a subset of the operations allowed by binary search trees.  The implementation of hash tables.
DATA STRUCTURES AND ALGORITHMS Lecture Notes 7 Prepared by İnanç TAHRALI.
David Luebke 1 10/25/2015 CS 332: Algorithms Skip Lists Hash Tables.
Can’t provide fast insertion/removal and fast lookup at the same time Vectors, Linked Lists, Stack, Queues, Deques 4 Data Structures - CSCI 102 Copyright.
Hashing Hashing is another method for sorting and searching data.
Hashing as a Dictionary Implementation Chapter 19.
Hash Tables - Motivation
Hashing - 2 Designing Hash Tables Sections 5.3, 5.4, 5.4, 5.6.
WEEK 1 Hashing CE222 Dr. Senem Kumova Metin
David Luebke 1 11/26/2015 Hash Tables. David Luebke 2 11/26/2015 Hash Tables ● Motivation: Dictionaries ■ Set of key/value pairs ■ We care about search,
CS 206 Introduction to Computer Science II 11 / 16 / 2009 Instructor: Michael Eckmann.
1 Hashing - Introduction Dictionary = a dynamic set that supports the operations INSERT, DELETE, SEARCH Dictionary = a dynamic set that supports the operations.
Hash Tables CSIT 402 Data Structures II. Hashing Goal Perform inserts, deletes, and finds in constant average time Topics Hash table, hash function, collisions.
Chapter 10 Hashing. The search time of each algorithm depend on the number n of elements of the collection S of the data. A searching technique called.
Chapter 11 Hash Tables © John Urrutia 2014, All Rights Reserved1.
“Never doubt that a small group of thoughtful, committed people can change the world. Indeed, it is the only thing that ever has.” – Margaret Meade Thought.
Hash Tables. 2 Exercise 2 /* Exercise 1 */ void mystery(int n) { int i, j, k; for (i = 1; i
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
Tirgul 11 Notes Hash tables –reminder –examples –some new material.
CS261 Data Structures Hash Tables Open Address Hashing.
A Introduction to Computing II Lecture 11: Hashtables Fall Session 2000.
Hashing Suppose we want to search for a data item in a huge data record tables How long will it take? – It depends on the data structure – (unsorted) linked.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
COSC 1030 Lecture 10 Hash Table. Topics Table Hash Concept Hash Function Resolve collision Complexity Analysis.
Hash Tables © Rick Mercer.  Outline  Discuss what a hash method does  translates a string key into an integer  Discuss a few strategies for implementing.
CS6045: Advanced Algorithms Data Structures. Hashing Tables Motivation: symbol tables –A compiler uses a symbol table to relate symbols to associated.
Hash Tables Ellen Walker CPSC 201 Data Structures Hiram College.
1 the BSTree class  BSTreeNode has same structure as binary tree nodes  elements stored in a BSTree are a key- value pair  must be a class (or a struct)
CS 206 Introduction to Computer Science II 04 / 08 / 2009 Instructor: Michael Eckmann.
CSC 413/513: Intro to Algorithms Hash Tables. ● Hash table: ■ Given a table T and a record x, with key (= symbol) and satellite data, we need to support:
CS Fall 2009 Linked List Introduction. Characteristics of Linked Lists Elements are held in objects termed Links Links are 1-1 with elements, allocated.
CSC317 Selection problem q p r Randomized‐Select(A,p,r,i)
Hash table CSC317 We have elements with key and satellite data
Hashing CSE 2011 Winter July 2018.
Hash Tables Part II: Using Buckets
Data Structures and Algorithms
Data Structures and Algorithms
Hash Tables and Associative Containers
Hash Tables Buckets/Chaining
Hash Tables Open Address Hashing
Presentation transcript:

CS 261 – Data Structures Hash Tables Part II: Using Buckets

Hash Tables, Review Hash tables are similar to Vectors except… –Elements can be indexed by values other than integers –A single position may hold more than one element Arbitrary values (hash keys) map to integers by means of a hash function Computing a hash function is usually a two-step process: 1.Transform the value (or key) to an integer 2.Map that integer to a valid hash table index Example: storing names –Compute an integer from a name –Map the integer to an index in a table (i.e., a vector, array, etc.)

Hash Tables Say we’re storing names: Angie Joe Abigail Linda Mark Max Robert John Hash Function 0 Angie, Robert 1 Linda 2 Joe, Max, John 3 4 Abigail, Mark

Hash Tables: Resolving Collisions There are two general approaches to resolving collisions: 1.Open address hashing: if a spot is full, probe for next empty spot 2.Chaining (or buckets): keep a Linked list at each table entry Today we will look at option 2

0 Robert 1 2 MaxJohn 3 4 Mark Resolving Collisions: Chaining / Buckets Maintain a Collection at each hash table entry: –Chaining/buckets: maintain a linked list (or other collection type data structure, such as an AVL tree) at each table entry Angie Linda Joe Abigail

Combining arrays and linked lists … struct hashTable { struct link ** table; // array initialized to null pointers int tablesize; int dataCount; // count number of elements … };

Hash table init Void hashTableInit (struct hashTable &ht, int size) { int i; ht->count = 0; ht->table = (struct link **) malloc(size * sizeof(struct link *)); assert(ht->table != 0); ht->tablesize = size; for (i = 0; i < size; i++) ht->table[i] = 0; /* null pointer */ }

Adding a value to a hash table public void add (struct hashTable * ht, EleType newValue) { // find correct bucket, add to list int indx = abs(hashfun(newValue)) % table.length; struct link * newLink = (struct link *) malloc(…) assert(newLink != 0); newLink->value = newValue; newLink->next = ht->table[indx]; ht->table[indx] = newLink; /* add to bucket */ ht->count++; // note: next step: reorganize if load factor > 3.0 }

Contains test, remove Contains: Find correct bucket, then see if the element is there Remove: Slightly more tricky, because you only want to decrement the count if the element is actually in the list. Alternatives: instead of keeping count in the hash table, can call count on each list. What are pro/con for this?

Remove - need to change previous Since we have only single links, remove is tricky Solutions: use double links (too much work), or use previous pointers We have seen this before. Keep a pointer to the previous node, trail after current node Prev = 0; For (current = ht->table[indx]; current != 0; current = current->next) { if (EQ(current->value, testValue)) { … remove it } prev = current; }

Two cases, prev is null or not Prev = 0; For (current = ht->table[indx]; current != 0; current = current->next) { if (EQ(current->value, testValue)) { … remove it if (prev == 0) ht->table[indx] = current->next; else prev->next = current->next; free (current); ht->size--; return; } prev = current; }

Hash Table Size Load factor: = n / m –So, load factor represents the average number of elements at each table entry Want the load factor to remain small Can do same trick as open table hashing - if load factor becomes larger than some fixed limit (say, 3.0) then you double the table size Load factor # of elements Size of table

Hash Tables: Algorithmic Complexity Assumptions: –Time to compute hash function is constant –Chaining uses a linked list –Worst case analysis  All values hash to same position –Best case analysis  Hash function uniformly distributes the values (all buckets have the same number of objects in them) Find element operation: –Worst case for open addressing  O( ) –Worst case for chaining  O( ) –Best case for open addressing  O( ) –Best case for chaining  O( ) n 1 1 n  O(log n) if use AVL tree

Hash Tables: Average Case Assuming that the hash function distributes elements uniformly (a BIG if) Then the average case for all operations is O( ) So you want to try and keep the load factor relatively small. You can do this by resizing the table (doubling the size) if the load factor is larger than some fixed limit, say 10 But that only improves things IF the hash function distributes values uniformly. What happens if hash value is always zero?

So when should you use hash tables? Your data values must be objects with good hash functions defined (string, Double) Or you need to write your own definition of hashCode Need to know that the values are uniformly distributed If you can’t guarantee that, then a skip list or AVL tree is often faster

Your turn Now do the worksheet to implement hash table with buckets Run down linked list for contains test Think about how to do remove Keep track of number of elements Resize table if load factor is bigger than 3.0 Questions??