Hash Tables and Associative Containers


Hash Tables and Associative Containers CS-240 Dick Steflik

Hash Tables
A hash table is an array of size Tsize with index positions 0 .. Tsize-1. Keys are used to generate an array index, the home address (0 .. Tsize-1). There are two types of hash tables:
open hash table: the array element type is a <key, value> pair, and all items are stored in the array itself.
chained hash table: the array element type is a pointer to a linked list of nodes containing <key, value> pairs, and items are stored in the linked list nodes.

Faster Searching
"Balanced" search trees guarantee an O(log n) search path by controlling the height of the search tree. Examples are the AVL tree, the 2-3-4 tree, and the red-black tree (typically used by the STL associative container classes).
A hash table allows for O(1) search performance on average: search time does not increase as n increases, as long as collisions stay rare.

Hash Table
A hash table is an array/vector of fixed size with index positions 0 .. Tsize-1. If we could use the keys directly as an index, hashTable[key], we would have O(1) retrieval. Instead, keys are used to generate an array index, the home address (0 .. Tsize-1). The function that does this is called a hash function: hash(key) returns an int value, and hash(key) % Tsize => 0 .. Tsize-1.
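A minimal Python sketch of turning a key into a home address. The names `toy_hash` and `home_address` and the polynomial hash itself are assumptions made for illustration; the slides do not prescribe a particular hash function.

```python
def toy_hash(key: str) -> int:
    """Toy polynomial hash over the characters (illustrative assumption)."""
    h = 0
    for ch in key:
        h = h * 31 + ord(ch)
    return h


def home_address(key: str, tsize: int) -> int:
    """Reduce the hash value to an index in 0 .. tsize-1."""
    return toy_hash(key) % tsize


TSIZE = 7
for key in ["apple", "banana", "cherry"]:
    print(key, "->", home_address(key, TSIZE))
```

Whatever the hash value is, the modulo step keeps the result inside the table bounds.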

Collisions
Collisions occur whenever two keys produce the same index (hash to the same location). Design goal: pick a hash function that produces as few collisions as possible. In practice, collisions are a way of life with hash tables. What do you do?
linear probing: check the next location; if it's empty, use it, otherwise keep stepping forward one slot at a time.
quadratic probing: check 1 away from the home address, then 4 away, then 9 away, ... (offsets 1, 4, 9, ..., i*i).
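The two probing strategies can be compared by listing the slots each would visit. This is a sketch under assumptions: the function names are hypothetical, and the quadratic version uses the common textbook offsets i*i (1, 4, 9, ...), wrapping around the table with a modulo.

```python
def linear_probe_sequence(home: int, tsize: int, count: int) -> list:
    """Slots visited by linear probing: home, home+1, home+2, ..."""
    return [(home + i) % tsize for i in range(count)]


def quadratic_probe_sequence(home: int, tsize: int, count: int) -> list:
    """Slots visited by quadratic probing: home, home+1, home+4, home+9, ..."""
    return [(home + i * i) % tsize for i in range(count)]


print(linear_probe_sequence(3, 7, 4))     # [3, 4, 5, 6]
print(quadratic_probe_sequence(3, 7, 4))  # [3, 4, 0, 5]
```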

A Hash Table of Size 7
The table has slots 0 through 6. Each slot holds an empty flag (T or F), a key, and a value. The collision resolution strategy is open addressing with a linear probe. Some insertions:
hash(K1) % 7 => 3
hash(K2) % 7 => 5
hash(K3) % 7 => 2
hash(K4) % 7 => 3
hash(K5) % 7 => 2
hash(K6) % 7 => 4

Search Performance
What is the average number of probes needed to retrieve the value with key K? With linear probing, the six insertions leave the table like this:

index  empty  key  value
0      F      K6   K6info
1      T
2      F      K3   K3info
3      F      K1   K1info
4      F      K4   K4info
5      F      K2   K2info
6      F      K5   K5info

K    home address  #probes
K1   3             1
K2   5             1
K3   2             1
K4   3             2
K5   2             5
K6   4             4

Total: 14 probes, so the average is 14/6 = 2.33 (successful search). Unsuccessful search?
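The probe counts for the linear-probe example can be checked with a short simulation. This is a sketch, not the course's implementation: the home addresses are copied from the slide into a dictionary (`HOME`) rather than computed by a real hash function, and the helper names are hypothetical.

```python
TSIZE = 7
# Home addresses from the slide example: hash(K) % 7.
HOME = {"K1": 3, "K2": 5, "K3": 2, "K4": 3, "K5": 2, "K6": 4}


def insert(table: list, key: str) -> int:
    """Insert key using linear probing; return the slot it ends up in."""
    index = HOME[key]
    for _ in range(TSIZE):
        if table[index] is None:
            table[index] = key
            return index
        index = (index + 1) % TSIZE
    raise RuntimeError("table is full")


def probes_to_find(table: list, key: str) -> int:
    """Count the slots examined to find key, starting at its home address."""
    index = HOME[key]
    for probes in range(1, TSIZE + 1):
        if table[index] == key:
            return probes
        if table[index] is None:  # an empty slot ends the search
            break
        index = (index + 1) % TSIZE
    raise KeyError(key)


table = [None] * TSIZE
for key in HOME:  # inserted in the order K1 .. K6
    insert(table, key)

counts = {key: probes_to_find(table, key) for key in HOME}
average = sum(counts.values()) / len(counts)
print(table)                      # ['K6', None, 'K3', 'K1', 'K4', 'K2', 'K5']
print(counts, round(average, 2))  # total 14 probes, average 2.33
```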

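Chaining with Separate Lists
The same six insertions:
hash(K1) % 7 => 3
hash(K2) % 7 => 5
hash(K3) % 7 => 2
hash(K4) % 7 => 3
hash(K5) % 7 => 2
hash(K6) % 7 => 4
Each slot points to a linked list of synonyms (keys with the same home address):

index  list
0      (empty)
1      (empty)
2      K3 K3info -> K5 K5info
3      K1 K1info -> K4 K4info
4      K6 K6info
5      K2 K2info
6      (empty)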

Search Performance
What is the average number of probes needed to retrieve the value with key K? With chaining, each probe examines one node of the list at the key's home address:

K    home address  #probes
K1   3             1
K2   5             1
K3   2             1
K4   3             2
K5   2             2
K6   4             1

Total: 8 probes, so the average is 8/6 = 1.33 (successful search). Unsuccessful search?
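The chaining figures can be reproduced the same way. In this sketch Python lists stand in for the linked lists, the home addresses are again copied from the slide, and the names `CHAIN_HOME`, `chains`, and `chain_probes` are hypothetical.

```python
CHAIN_TSIZE = 7
# Home addresses from the slide example: hash(K) % 7.
CHAIN_HOME = {"K1": 3, "K2": 5, "K3": 2, "K4": 3, "K5": 2, "K6": 4}

# One list of (key, value) pairs per slot; new items go at the end.
chains = [[] for _ in range(CHAIN_TSIZE)]
for key, home in CHAIN_HOME.items():
    chains[home].append((key, key + "info"))


def chain_probes(key: str) -> int:
    """Count the nodes examined in the key's chain until it is found."""
    for probes, (stored_key, _value) in enumerate(chains[CHAIN_HOME[key]], start=1):
        if stored_key == key:
            return probes
    raise KeyError(key)


chain_counts = {key: chain_probes(key) for key in CHAIN_HOME}
chain_average = sum(chain_counts.values()) / len(chain_counts)
print(chain_counts, round(chain_average, 2))  # total 8 probes, average 1.33
```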

Where are Hash Tables used?
Databases.
Spelling checkers.
Java uses them all over the place (built into the language).
Most scripting languages (ASP, Perl, PHP) have associative arrays.
Caching schemes: software (browsers, HTTP proxy servers, DNS servers) and hardware (memory caching, instruction caching).

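Deletions?
First search for the item to be deleted.
chained hash table: delete a node from a linked list.
open hash table: just mark the spot as "empty"? No. An empty slot ends a probe sequence, so keys stored beyond it would become unreachable. The vacated spot must be marked as "deleted", which is different from "empty".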
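A small sketch of why an open hash table needs a separate "deleted" marker. The class `OpenTable`, its method names, and the two sentinel objects are assumptions for illustration; the sketch stores keys only, uses linear probing, and assumes a key is never inserted twice.

```python
EMPTY = object()    # slot that has never been used
DELETED = object()  # slot vacated by a deletion


class OpenTable:
    def __init__(self, tsize, hash_fn):
        self.slots = [EMPTY] * tsize
        self.hash_fn = hash_fn

    def _probe(self, key):
        """Yield slot indices in linear-probe order from the home address."""
        start = self.hash_fn(key) % len(self.slots)
        for i in range(len(self.slots)):
            yield (start + i) % len(self.slots)

    def insert(self, key):
        for index in self._probe(key):
            if self.slots[index] is EMPTY or self.slots[index] is DELETED:
                self.slots[index] = key  # deleted slots can be reused
                return
        raise RuntimeError("table is full")

    def contains(self, key):
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is EMPTY:
                return False  # only a never-used slot ends the search
            if slot is not DELETED and slot == key:
                return True
        return False

    def delete(self, key):
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is EMPTY:
                return False
            if slot is not DELETED and slot == key:
                self.slots[index] = DELETED
                return True
        return False


demo = OpenTable(7, lambda k: k)
demo.insert(3)            # home address 3
demo.insert(10)           # also home address 3, lands in slot 4
demo.delete(3)            # slot 3 becomes DELETED, not EMPTY
print(demo.contains(10))  # True: the search steps over the deleted slot
```

If `delete` wrote `EMPTY` instead, the search for 10 would stop at slot 3 and wrongly report the key as missing.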

Hash Functions
A hash function is used to map a key to an array index (home address); the search starts from here. Insert, retrieve, and delete all start by applying the hash function to the key.
Goals for a hash function: it is fast to compute, and it gives an even (uniform) distribution of hash values over the entire collection of keys, with no clustering.
All hash functions produce collisions: multiple keys hash to the same home address.

Some Hash Functions...
Division: H(key) = key mod m, if key is an integer. It works well in most cases as long as the keys are relatively random. It is not good if keys have similar characteristics. Example: with m = 25, all keys divisible by 5 would map into positions 0, 5, 10, 15, 20, causing clustering around those values.
Identity function (return key): good if keys are random.
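The m = 25 example can be checked directly. In this sketch the key set (multiples of 5 below 500) and the comparison against a prime table size of 23 are assumptions added for illustration; a prime m is a common remedy, not something the slide prescribes.

```python
from collections import Counter


def division_hash(key: int, m: int) -> int:
    """Division method: H(key) = key mod m."""
    return key % m


keys = range(0, 500, 5)  # 100 keys, all divisible by 5

used_25 = Counter(division_hash(k, 25) for k in keys)
used_23 = Counter(division_hash(k, 23) for k in keys)

print(sorted(used_25))  # [0, 5, 10, 15, 20]: only 5 of 25 slots are ever used
print(len(used_23))     # 23: every slot is used when m shares no factor with 5
```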

More Hash Functions...
Mid-Square: square the key and use the middle bits of the result as the index. This produces a nearly random distribution of indices. The mid-square technique takes longer to compute but gives a better distribution when keys may have some digits in common.
Convert the key to an octal string: A-Z = 01 to 32 (octal) and 0-9 = 33 to 44 (octal).
Example: key = A1 = 134 (octal). 134 * 134 = 20420 (octal) = 010000100010000 (binary). Using a table of 1024 elements, use the middle 10 bits as the index: dropping the top 2 and bottom 3 bits gives index = 0000100010 (binary) = 34 (decimal).
Note: most collisions will occur for short identifiers.
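A sketch of the mid-square calculation for the key A1. The character encoding follows the slide (A-Z as octal 01 to 32, 0-9 as octal 33 to 44); the function names, and the choice to treat the "middle" 10 bits of the 15-bit square as the bits left after dropping the low 3 bits, are assumptions for illustration.

```python
def char_code(ch: str) -> int:
    """Slide's encoding: A-Z -> 1..26 (octal 01-32), 0-9 -> 27..36 (octal 33-44)."""
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 1
    if "0" <= ch <= "9":
        return ord(ch) - ord("0") + 27
    raise ValueError("unsupported character: " + repr(ch))


def key_to_int(key: str) -> int:
    """Concatenate the two-octal-digit codes (6 bits per character)."""
    value = 0
    for ch in key:
        value = value * 64 + char_code(ch)
    return value


def mid_square(key: str, index_bits: int = 10, low_bits_dropped: int = 3) -> int:
    """Square the key and keep index_bits bits above the dropped low bits."""
    squared = key_to_int(key) ** 2
    return (squared >> low_bits_dropped) & ((1 << index_bits) - 1)


n = key_to_int("A1")
print(oct(n))            # 0o134
print(oct(n * n))        # 0o20420
print(mid_square("A1"))  # 34
```

For longer keys the square has more bits, so `low_bits_dropped` would need to change to stay centred; the fixed default only fits this two-character example.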

More Hash Functions...
Digit Folding: assume a 5-digit decimal string (digits 0-9 only). H(key) = d1 + d2 + d3 + d4 + d5 (sum of digits); this would yield 0 <= h <= 45 for all possible keys. If we were to fold the digits in pairs, H(key) = d1d2 + d3d4 + d5, then 0 <= h <= 207 (99 + 99 + 9).
Double hashing: use two (or more) hash functions serially. This helps overcome the effects of a function that produces a poor distribution of keys.
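Both folding variants are easy to try out. A minimal sketch, assuming the key arrives as a string of decimal digits; the function names are hypothetical.

```python
def digit_sum_hash(key: str) -> int:
    """H(key) = d1 + d2 + d3 + d4 + d5."""
    return sum(int(d) for d in key)


def pair_fold_hash(key: str) -> int:
    """H(key) = d1d2 + d3d4 + d5: add the digits two at a time."""
    return sum(int(key[i:i + 2]) for i in range(0, len(key), 2))


print(digit_sum_hash("99999"))  # 45, the largest digit sum
print(pair_fold_hash("99999"))  # 207 = 99 + 99 + 9
print(pair_fold_hash("12345"))  # 51 = 12 + 34 + 5
```

Folding in pairs spreads the same keys over a wider range of values, which leaves fewer keys sharing each home address.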

Clustering
Clustering is an undesirable characteristic of the hash function selected and the collision resolution strategy: too many keys hash to the same location, causing long strings of keys that need to be searched. It is especially bad when using a division-based function with linear probing; insertion, deletion, and search can approach O(n).
Solutions: pick a different hash function, or pick a different collision resolution strategy.
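One example of a different collision resolution strategy is double hashing used as a probe step in open addressing. The sketch below uses a common textbook form that is an assumption here, not taken from the slides: the first hash gives the home address, a second hash gives a step size that is never zero, and the table size is prime so every slot is eventually visited.

```python
def double_hash_sequence(key: int, tsize: int) -> list:
    """Probe order for key: home, home + step, home + 2*step, ... (mod tsize)."""
    home = key % tsize
    step = 1 + key % (tsize - 1)  # second hash; always between 1 and tsize-1
    return [(home + i * step) % tsize for i in range(tsize)]


# Keys 3 and 10 share home address 3 in a table of size 7,
# but they follow different probe paths after the first collision.
print(double_hash_sequence(3, 7))   # [3, 0, 4, 1, 5, 2, 6]
print(double_hash_sequence(10, 7))  # [3, 1, 6, 4, 2, 0, 5]
```

With linear probing both keys would walk the same run of slots (3, 4, 5, ...), which is how long clusters build up.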

Factors Affecting Search Performance
Quality of the hash function: the uniformity of the distribution depends on the actual data.
Collision resolution strategy used.
Load factor of the hash table, N/Tsize: the lower the load factor, the better the search performance.

Successful Search Performance
Average number of probes for a successful search:

load factor  open addressing    open addressing    chaining
             (linear probing)   (double hashing)
0.5          1.50               1.39               1.25
0.7          2.17               1.72               1.35
0.9          5.50               2.56               1.45
1.0          ----               ----               1.50
2.0          ----               ----               2.00
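The figures in this table agree with the standard textbook approximations for the expected number of probes in a successful search. The slides do not state the formulas, so the ones in this sketch are an assumption, with `a` standing for the load factor N/Tsize.

```python
import math


def linear_probing_successful(a: float) -> float:
    """Approximate probes for open addressing with linear probing (a < 1)."""
    return 0.5 * (1 + 1 / (1 - a))


def double_hashing_successful(a: float) -> float:
    """Approximate probes for open addressing with double hashing (a < 1)."""
    return (1 / a) * math.log(1 / (1 - a))


def chaining_successful(a: float) -> float:
    """Approximate probes for chaining with separate lists (any a)."""
    return 1 + a / 2


for a in (0.5, 0.7, 0.9):
    print(a,
          format(linear_probing_successful(a), ".2f"),
          format(double_hashing_successful(a), ".2f"),
          format(chaining_successful(a), ".2f"))

for a in (1.0, 2.0):  # open addressing cannot reach a load factor of 1
    print(a, "----", "----", format(chaining_successful(a), ".2f"))
```

The open-addressing formulas blow up as the load factor approaches 1, while chaining grows only linearly, which is why the last two rows have entries only in the chaining column.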

Summary of Hash Tables
Search speed depends on the load factor and the quality of the hash function. The load factor should be less than 0.75 for open addressing; it can be more than 1 for chaining.
Items are not kept sorted by key.
Very good for fast access to unordered data with a known upper bound, which is needed to pick a good Tsize.