Hash table: another data structure for implementing a map or a set


Quiz: answer each question with true or false
- A hash table stores the map elements in order of their key values.
- A hash table makes use of an array or a vector.
- Searching for an element in a hash table never requires more than 1 comparison.

Quiz answers
- A hash table stores the map elements in order of their key values: false
- A hash table makes use of an array or a vector: true
- Searching for an element in a hash table never requires more than 1 comparison: false

What is a hash table?
- An array (or vector) with a fixed capacity (tsize)
- Add, find, retrieve and remove all start by applying a hash function to an item (or its key)
- The hash function uses the key to compute a value between 0 and tsize - 1: hash(key) → 0 .. tsize - 1
- The result is known as the home address

A simple example
- tsize is 7
- Keys of items to be added are: 345, 617, 963, 712, 366
- Hash function is: key % tsize
- What is the home address of each item to be added?

Array indices: 0 1 2 3 4 5 6

345 % 7 → 2
617 % 7 → 1
963 % 7 → 4
712 % 7 → 5
366 % 7 → 2 (a collision with 345)
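The arithmetic above can be checked with a few lines of C++. This is only a sketch: the function names homeAddress and homeAddresses are made up for illustration and do not come from the slides.

```cpp
#include <vector>

// Hash function from the example: the home address is key % tsize.
// Assumes non-negative keys and tsize > 0.
int homeAddress(int key, int tsize) {
    return key % tsize;
}

// Hypothetical helper: home addresses for a whole list of keys.
std::vector<int> homeAddresses(const std::vector<int>& keys, int tsize) {
    std::vector<int> result;
    for (int key : keys) {
        result.push_back(homeAddress(key, tsize));
    }
    return result;
}
```

Calling homeAddresses({345, 617, 963, 712, 366}, 7) yields 2, 1, 4, 5, 2, so the last key collides with the first.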

How to deal with collisions?
- Collisions are inevitable
- The home address becomes the starting point for:
  - finding where to add a key-value pair
  - finding the item with a given key
  - retrieving the value associated with a given key
  - finding the key-value pair to be removed
- A hash table needs a collision resolution strategy

Two kinds of hash tables
- Open hash tables
  - All items are stored in the array/vector itself: at their home address if it is open, otherwise a probe sequence is followed to find an open spot
- Chained hash tables
  - Items with the same home address are synonyms
  - Synonyms are stored in a list starting at the home address
- For both, all operations start by applying the hash function and then using the home address as the starting point for finding the item or where to store it

An open hash table: add the following items

item   home address
345    2
617    1
963    4
712    5
366    2

Initial array (E = empty):

index  0  1  2  3  4  5  6
state  E  E  E  E  E  E  E

An open hash table: find 712, 366, 49, 50

index  state  key
0      E
1      F      617
2      F      345
3      F      366
4      F      963
5      F      712
6      E

(E = empty, F = full)

An open hash table: remove 345

index  before   after
0      E        E
1      F 617    F 617
2      F 345    R
3      F 366    F 366
4      F 963    F 963
5      F 712    F 712
6      E        E

(E = empty, F = full, R = removed)
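The add, find and remove walkthroughs for the open hash table can be sketched in C++ as follows. The slides do not name the probe sequence; this sketch assumes linear probing (home, home + 1, ... wrapping around), which is consistent with 366 ending up at index 3. The class name OpenHashSet and its member names are hypothetical, and only keys (no values) are stored to keep it short.

```cpp
#include <vector>

// Slot states, matching the E / F / R labels used in the walkthrough.
enum class State { Empty, Full, Removed };

// Minimal open hash table of non-negative int keys (hypothetical class).
// Assumption: linear probing with hash function key % tsize.
class OpenHashSet {
public:
    explicit OpenHashSet(int tsize)
        : keys_(tsize, 0), states_(tsize, State::Empty) {}

    // Returns the index holding key, or -1 if the key is absent.
    int find(int key) const {
        int tsize = static_cast<int>(keys_.size());
        int home = key % tsize;
        for (int i = 0; i < tsize; ++i) {
            int slot = (home + i) % tsize;
            if (states_[slot] == State::Empty) return -1;  // probe sequence ends
            if (states_[slot] == State::Full && keys_[slot] == key) return slot;
            // Removed slots and other keys: keep probing.
        }
        return -1;
    }

    // Adds key; returns false if it is already present or the table is full.
    bool add(int key) {
        if (find(key) != -1) return false;
        int tsize = static_cast<int>(keys_.size());
        int home = key % tsize;
        for (int i = 0; i < tsize; ++i) {
            int slot = (home + i) % tsize;
            if (states_[slot] != State::Full) {  // Empty or Removed can be reused
                keys_[slot] = key;
                states_[slot] = State::Full;
                return true;
            }
        }
        return false;
    }

    // Marks the slot Removed (not Empty) so later probes still pass through it.
    bool remove(int key) {
        int slot = find(key);
        if (slot == -1) return false;
        states_[slot] = State::Removed;
        return true;
    }

private:
    std::vector<int> keys_;
    std::vector<State> states_;
};
```

The Removed state is the reason 366 can still be found after 345 is removed: a search starting at home address 2 passes over the removed slot instead of stopping there.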

A chained hash table: add the following items

item   home address
345    2
617    1
963    4
712    5
366    2

Array indices: 0 1 2 3 4 5 6

A chained hash table: find 712, 345, 49, 50; remove 345

index  synonym list
0
1      617
2      366 → 345
3
4      963
5      712
6
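A chained hash table for the same keys can be sketched with a vector of lists. The class name ChainedHashSet and its members are hypothetical. Inserting at the front of the synonym list is an assumption, chosen because 366 is listed before 345 at home address 2.

```cpp
#include <algorithm>
#include <list>
#include <vector>

// Minimal chained hash table of non-negative int keys (hypothetical class).
class ChainedHashSet {
public:
    explicit ChainedHashSet(int tsize) : buckets_(tsize) {}

    // Searches only the synonym list at the key's home address.
    bool find(int key) const {
        const std::list<int>& chain = buckets_[home(key)];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }

    // New keys go at the front of their synonym list (assumption, see lead-in).
    bool add(int key) {
        if (find(key)) return false;
        buckets_[home(key)].push_front(key);
        return true;
    }

    bool remove(int key) {
        std::list<int>& chain = buckets_[home(key)];
        auto it = std::find(chain.begin(), chain.end(), key);
        if (it == chain.end()) return false;
        chain.erase(it);
        return true;
    }

    // Length of the synonym list stored at a given home address.
    int chainLength(int address) const {
        return static_cast<int>(buckets_[address].size());
    }

private:
    int home(int key) const {
        return key % static_cast<int>(buckets_.size());
    }

    std::vector<std::list<int>> buckets_;
};
```

Unlike the open hash table, removal here needs no special marker: the node is simply unlinked from its list.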

Hash table performance depends on:
- Load factor (n / tsize)
  - must be < 1 for an open hash table
  - can be > 1 for a chained hash table
- Quality of the hash function
  - How uniformly are the home addresses it produces distributed over the range 0 .. tsize - 1?

Chained hash tables
- The hash function determines which synonym list to search
- With a good hash function, synonym lists will be of nearly equal length
- What is the average synonym list length if we add n items to a hash table with a size of tsize?
- Given HT1 and HT2:
  - Both have a load factor of 2
  - HT1 holds 1000 items and HT2 holds 4000 items
  - How does their expected performance compare?
- Time to find an item grows at the same rate as the load factor (not n)
- So, as long as the load factor stays bounded, searching for an item in a chained hash table is O(1) on average
- Search time in an open hash table (open addressing, sometimes called closed hashing) also increases as the load factor grows
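One way to work through the two questions above, assuming the hash function spreads keys uniformly. The table sizes 500 and 2000 are not given on the slide; they follow from the stated load factor of 2.

```latex
\[
\text{average synonym list length} = \frac{n}{\mathit{tsize}} = \text{load factor}
\]
\[
\text{HT1: } \frac{1000}{500} = 2, \qquad \text{HT2: } \frac{4000}{2000} = 2
\]
```

Both tables therefore search lists of about two items on average, so their expected find times should be about the same even though HT2 holds four times as many items.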

Hash functions
- Simplest case: the item being hashed is an integer
  - Return item % tsize
- What if the item being hashed is a string?
  - Create an integer from the string
  - Simplest way: sum the characters (each is stored as an ASCII value)
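The "sum the characters" idea might look like this in C++. The function name hashString is hypothetical, and reducing the sum with % tsize is assumed from the integer case above.

```cpp
#include <string>

// Sums the character codes of key, then reduces the sum to 0 .. tsize - 1.
// Assumes tsize > 0.
int hashString(const std::string& key, int tsize) {
    unsigned long sum = 0;
    for (unsigned char c : key) {  // unsigned char avoids negative codes
        sum += c;
    }
    return static_cast<int>(sum % tsize);
}
```

For example, "abc" sums to 97 + 98 + 99 = 294, and 294 % 7 is 0. Because addition ignores order, "cab" gets the same home address, which is one reason this simplest approach is rarely the best one.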

Hash functions: key → 0 .. tsize - 1
- A good hash function uniformly distributes home addresses over the range 0 .. tsize - 1
- What can be done to make a hash function good?
  - Using a prime number for tsize helps reduce collisions
  - Knowledge about keys can help devise a better hash function for a given set of key values
- Who uses (calls) the hash function?
  - The program that makes use of a set/map?
  - The implementer of the map?
- Who should write the hash function?
  - The program that makes use of the map?

Map user (writes a hash function) | Map interface | Map implementer (uses the hash function)

How can the map user pass the hash function to the map implementer?
- A function (like a variable) has a type and a value
  - The type of a function is its prototype (arguments and return type)
  - The value of a function is a pointer to a block of code to be executed
- typedef int (*hashFunc)(KeyType key, int tsize);
- A pointer to any function that takes a KeyType and an int and returns an int is of type hashFunc
- The user of the map writes a hash function and passes it to the map constructor, which stores it as a data member

MAP IMPLEMENTATION

    typedef int (*hashFunc)(KeyType item, int tsize);

    class Map {
    public:
        Map(int size, hashFunc func);
    private:
        hashFunc hash;  // the hash function
    };

MAP USER

    int hash(KeyType key, int tsize);
    // input: size of the hash table and a key
    // output: returns an int between 0 and tsize - 1

    int main() {
        Map myMap(size, hash);
    }

    int hash(KeyType key, int tsize){
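Putting the two halves together, a compilable sketch could look like the following. Several pieces are assumptions, not taken from the slides: KeyType is fixed to int, the user's function is called modHash, and the Map gets a tsize member and a homeAddress helper purely to show the stored pointer being called.

```cpp
typedef int KeyType;  // assumption: the slides leave KeyType unspecified

// Pointer to a function taking (KeyType, int) and returning int.
typedef int (*hashFunc)(KeyType key, int tsize);

// --- map implementer ---
class Map {
public:
    Map(int size, hashFunc func) : tsize(size), hash(func) {}

    // Hypothetical helper: calls the user-supplied hash function.
    int homeAddress(KeyType key) const { return hash(key, tsize); }

private:
    int tsize;      // capacity of the table
    hashFunc hash;  // the hash function supplied by the map user
};

// --- map user ---
// input: a key and the size of the hash table
// output: an int between 0 and tsize - 1 (for non-negative keys)
int modHash(KeyType key, int tsize) {
    return key % tsize;
}
```

A user would then write Map myMap(7, modHash); and the map can compute myMap.homeAddress(345), which is 2, without knowing how the hash function was written.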

Hash table performance: two key factors
- Load factor (n / tsize)
  - over 0.75 for an open hash table is likely to produce long probe sequences
  - can be more than 1 for a chained hash table
- Quality of the hash function
  - How uniformly are the home addresses it produces distributed over the range 0 .. tsize - 1?
  - Using a prime number for tsize helps reduce collisions
  - Knowledge about keys can help devise a better hash function for a given set of key values
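A small experiment illustrates why a prime tsize can help when keys share a pattern. The keys (multiples of 10) and the function name distinctHomeAddresses are made up for this illustration; the hash function is key % tsize as in the earlier example.

```cpp
#include <set>
#include <vector>

// Counts how many distinct home addresses key % tsize produces for keys.
// Assumes non-negative keys and tsize > 0.
int distinctHomeAddresses(const std::vector<int>& keys, int tsize) {
    std::set<int> used;
    for (int key : keys) {
        used.insert(key % tsize);
    }
    return static_cast<int>(used.size());
}
```

With the keys 10, 20, 30, 40, 50, 60, 70, a tsize of 10 sends every key to home address 0, so the function returns 1. With a tsize of 7 the same keys land on 3, 6, 2, 5, 1, 4, 0, so it returns 7 and there are no collisions.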

Choosing between a BST and a hash table for a map or set
- What do we know about the number of items to be stored?
  - Space needed for a BST is O(n)
  - Space needed for a chained hash table is O(n + tsize)
  - Space needed for an open hash table is O(tsize)
- What do we know about the items or keys to be compared?
  - Need to be comparable for both
  - Need a hash function for a hash table
- Is it necessary to traverse elements in order of items or keys?
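The last question can be seen with the C++ standard library, used here as a stand-in for the two designs: std::map keeps its keys sorted (it is usually implemented as a balanced search tree), while std::unordered_map is a hash table whose iteration order is unspecified. The helper name keysInIterationOrder is hypothetical.

```cpp
#include <map>
#include <unordered_map>
#include <vector>

// Collects the keys of any map-like container in the order iteration visits them.
template <typename MapType>
std::vector<int> keysInIterationOrder(const MapType& m) {
    std::vector<int> keys;
    for (const auto& entry : m) {
        keys.push_back(entry.first);
    }
    return keys;
}
```

For a std::map<int, int> holding the keys 345, 617, 963, 712 and 366, the helper returns them in ascending order. For a std::unordered_map<int, int> with the same keys it returns all five, but in whatever order the hash table happens to store them, so an in-order traversal would need an extra sort.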