CS2420: Lecture 33 Vladimir Kulyukin Computer Science Department Utah State University.

Outline Hash Tables (Chapter 5)

Motivation Recall Big Question 4: –How can I retrieve/search data efficiently? After investigating balanced binary search trees (AVL, red-black), we can ask: –Is it possible to break the log(N) barrier for insertion and deletion?

Hash Tables A hash table is a data structure that was invented as an attempt to break the log(N) insertion and deletion barrier of the balanced binary search trees. Conceptually, a hash table is an array of items plus a hash function that maps arbitrary objects to indices of the array. A hash function first extracts a key from a given object and then maps the key into a legal array index. For example, if an object is an employee record, the key could be the employee’s SSN or the employee’s first and last names. Typical keys are numbers and strings.

Example: A Hash Table [Figure: a hash table whose cells hold the names “Mark”, “Rachel”, “David”, “Deborah”, and “John”.]

Hash Functions [Figure: Object → Key Extraction → Hashing → legal index]

Hash Functions It is impossible to find a hash function that computes distinct indices (two different array cells) for any two distinct keys. Why? Because there are infinitely many possible keys, but only finitely many slots in the table, so some distinct keys must share a cell. Question: What are we to do? Answer: Look for hash functions that distribute keys evenly among the cells.

Three Hashing Problems Choose a hash function: –Simple and fast; –Distributes keys evenly. Choose a table size. Choose a collision resolution strategy (what to do when several keys are mapped to the same index).

Choosing a Hash Function If keys are integers, Key Mod TableSize is a sensible strategy. Caveat: The keys should be reasonably random and should not share a pattern that interacts badly with TableSize. For example, if TableSize = 10 and all keys end in 0, every key hashes to cell 0, so Key Mod TableSize is not a sensible strategy.

Choosing a Table Size To avoid uneven key distributions, TableSize is typically chosen to be a prime number. When the keys are random integers, Key Mod TableSize works fairly well.
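The effect of the table size can be checked with a small experiment. The sketch below is only an illustration: the helper names modHash and cellsUsed are hypothetical and do not come from the lecture, and the keys are assumed to be non-negative integers.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical helper: Key Mod TableSize for a non-negative integer key.
int modHash(int key, int tableSize)
{
    assert(key >= 0 && tableSize > 0);
    return key % tableSize;
}

// Hypothetical helper: count how many distinct cells a group of keys lands in.
int cellsUsed(const std::vector<int>& keys, int tableSize)
{
    std::set<int> cells;
    for (int key : keys) {
        cells.insert(modHash(key, tableSize));
    }
    return static_cast<int>(cells.size());
}
```

With the keys 10, 20, ..., 100, cellsUsed reports a single cell for TableSize = 10 but ten different cells for the prime TableSize = 11.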

A Hash Function: Example 1

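#include <string>
using std::string;
// hash1: add up the character codes of the key, then reduce modulo the table size.
int hash1(const string& key, int tableSize)
{
    int hashVal = 0;
    for (string::size_type i = 0; i < key.length(); i++) {
        hashVal += key[i];
    }
    return hashVal % tableSize;
}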

Comments on hash1 Easy to compute and fast. If the TableSize is large, the function may not distribute keys well. Why? Suppose TableSize = 10,007 (a prime) and all keys are ASCII strings of length 8 or smaller. Each character contributes at most 127, so hash1 only produces values in [0, 127 * 8 = 1016]. At most 1,017 of the 10,007 cells can ever be used. This is NOT an acceptable distribution.
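The range limit is easy to confirm in code. The following is a minimal sketch that restates the character-sum function under the name hash1 used in the comments above; the assert on tableSize is an added assumption, not part of the lecture code.

```cpp
#include <cassert>
#include <string>

// Character-sum hash, as in Example 1 (restated so the snippet is self-contained).
int hash1(const std::string& key, int tableSize)
{
    assert(tableSize > 0);  // assumption: callers pass a positive table size
    int hashVal = 0;
    for (std::string::size_type i = 0; i < key.length(); i++) {
        hashVal += key[i];
    }
    return hashVal % tableSize;
}
```

An 8-character key made of the largest ASCII code (127) hashes to 1016, the top of the range. A further weakness, not stated on the slide, is that the sum ignores character order, so anagrams such as "abc" and "cba" always collide.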

Hash Function: Example 2

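// hash2: treat the key as a polynomial in 37 and reduce modulo the table size.
int hash2(const string& key, int tableSize)
{
    // Unsigned arithmetic wraps around on overflow; signed overflow is undefined in C++.
    unsigned int hashVal = 0;
    for (string::size_type j = 0; j < key.length(); j++) {
        hashVal = 37 * hashVal + key[j];
    }
    return static_cast<int>(hashVal % tableSize);
}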

Comments on hash2 Easy to compute. Fast on relatively short keys. Distributes keys fairly well. Potential problems with very long keys, because there will be many integer overflows and collisions.
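The loop in Example 2 evaluates a polynomial in 37 by Horner's rule, so the position of each character affects the result. The sketch below illustrates this under a hypothetical name, polyHash. It accumulates in unsigned arithmetic so that overflow wraps around, which is an assumption about how overflow should be handled rather than the lecture's exact code.

```cpp
#include <cassert>
#include <string>

// Hypothetical restatement of the 37-based polynomial hash using Horner's rule.
int polyHash(const std::string& key, int tableSize)
{
    assert(tableSize > 0);
    unsigned int hashVal = 0;  // unsigned: overflow wraps instead of being undefined
    for (char c : key) {
        hashVal = 37 * hashVal + static_cast<unsigned char>(c);
    }
    return static_cast<int>(hashVal % static_cast<unsigned int>(tableSize));
}
```

For "abc" the loop computes 97 * 37^2 + 98 * 37 + 99 = 136518, which is 6427 modulo 10,007. The anagram "cba" gives 139254, which is 9163 modulo 10,007, so the two keys no longer collide.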

Collision Resolution A collision occurs when an element is inserted under a key that hashes to a cell that is already occupied by a different element.

Collision Resolution Strategies Separate chaining Open addressing

Separate Chaining Separate chaining keeps a list of all elements whose keys hash to the same index. What does it mean? Under separate chaining, a hash table is an array of lists. The term “lists” is used rather loosely in the previous statement. It can be an array of AVL search trees or an array of hash tables. But the linked list remains the most common choice.
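A separate-chaining table can be sketched in a few lines. The class below is a simplified illustration, not the lecture's CHashTable: the name ChainedStringSet, its member functions, and the choice of a character-sum hash (picked so that anagrams collide on purpose) are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Hypothetical separate-chaining set of strings: an array of linked lists.
class ChainedStringSet {
public:
    explicit ChainedStringSet(std::size_t tableSize) : m_Lists(tableSize)
    {
        assert(tableSize > 0);
    }

    // Insert key into its chain; return false if it is already present.
    bool insert(const std::string& key)
    {
        std::list<std::string>& chain = m_Lists[indexOf(key)];
        if (std::find(chain.begin(), chain.end(), key) != chain.end()) {
            return false;
        }
        chain.push_back(key);
        return true;
    }

    // Search only the chain that the key hashes to.
    bool contains(const std::string& key) const
    {
        const std::list<std::string>& chain = m_Lists[indexOf(key)];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }

    // Number of elements stored in the chain at the given cell.
    std::size_t chainLength(std::size_t index) const
    {
        return m_Lists[index].size();
    }

private:
    // Character-sum hash, chosen here so that anagrams land in the same cell.
    std::size_t indexOf(const std::string& key) const
    {
        std::size_t sum = 0;
        for (char c : key) {
            sum += static_cast<unsigned char>(c);
        }
        return sum % m_Lists.size();
    }

    std::vector<std::list<std::string>> m_Lists;
};
```

With a table of 11 cells, "abc" and "cba" both hash to cell 8, and after inserting both the chain in that cell holds two elements. A lookup then scans only that one chain rather than the whole table.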

Hash Table: Implementation

template <class T>
class CHashTable {
    …
private:
    vector<list<T> > m_Lists;  // one list (chain) per table cell
    int m_Size;
    …
};
int hash(const string &key) { … }

Hash Table: Implementation

class CEmployee {
private:
    string m_Name;
    double m_Salary;
    …
};
// Hash an employee by name, reusing the string hash function.
int hash(const CEmployee &x) { return hash(x.GetName()); }

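Hash Table: Implementation

template <class T>
int CHashTable<T>::hashIndex(const T& x) const
{
    // Work in int: mixing int with the unsigned size() would convert a
    // negative hash value to unsigned before the remainder is taken.
    const int tableSize = static_cast<int>(m_Lists.size());
    int index = hash(x) % tableSize;
    if (index < 0) index += tableSize;
    return index;
}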
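The check for a negative index matters because, in C++, the remainder of a negative value is itself negative or zero. The helper below is a hypothetical stand-alone version of that normalization step (the name toLegalIndex is not from the lecture), assuming both arguments are plain int values.

```cpp
#include <cassert>

// Hypothetical helper: map any int hash value to a legal index in [0, tableSize).
int toLegalIndex(int hashValue, int tableSize)
{
    assert(tableSize > 0);
    int index = hashValue % tableSize;  // takes the sign of hashValue in C++
    if (index < 0) {
        index += tableSize;             // shift negative remainders into range
    }
    return index;
}
```

For example, -7 % 5 evaluates to -2, and adding the table size 5 yields the legal index 3.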