CS 332: Algorithms Hash Tables David Luebke 1 7/19/2018.



Homework 4
- Programming assignment: implement skip lists
- Evaluate performance
- Extra credit: compare performance to a randomly built BST
- Due Mar 9 (Fri before Spring Break)
- Will post later today

Review: Skip Lists
- The basic idea: keep a doubly-linked list of elements
- Min, max, successor, predecessor: O(1) time
- Delete is O(1) time; Insert is O(1) + Search time
- During insert, add each level-i element to level i+1 with probability p (e.g., p = 1/2 or p = 1/4)
[Figure: a three-level skip list (level 3, level 2, level 1) over the keys 3, 9, 12, 18, 29, 35, 37]

Summary: Skip Lists
- O(1) expected time for most operations
- O(lg n) expected time for insert
- O(n^2) time worst case
- But random, so no particular order of insertion evokes worst-case behavior
- O(n) expected storage requirements
- Easy to code

Review: Hash Tables
- Hash table: given a table T and a record x, with key (= symbol) and satellite data, we need to support:
  - Insert(T, x)
  - Delete(T, x)
  - Search(T, x)
- We want these to be fast, but don’t care about sorting the records
- In this discussion we consider all keys to be (possibly large) natural numbers

Review: Direct Addressing
- Suppose:
  - The range of keys is 0..m-1
  - Keys are distinct
- The idea: set up an array T[0..m-1] in which
  - T[i] = x if x ∈ T and key[x] = i
  - T[i] = NULL otherwise
- This is called a direct-address table
- Operations take O(1) time!
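The direct-address idea can be sketched in a few lines of Python. This is only an illustration: the class name `DirectAddressTable` and its method names are hypothetical, and it stores a value per key rather than a full record x.

```python
class DirectAddressTable:
    """Direct-address table for distinct integer keys in 0..m-1 (illustrative names)."""

    def __init__(self, m):
        # T[i] is None when no record with key i is stored (plays the role of NULL).
        self.slots = [None] * m

    def insert(self, key, value):
        self.slots[key] = value   # O(1): the key is the array index

    def delete(self, key):
        self.slots[key] = None    # O(1)

    def search(self, key):
        return self.slots[key]    # O(1); None means "not present"


table = DirectAddressTable(10)
table.insert(3, "record three")
print(table.search(3))  # record three
print(table.search(4))  # None
```

Note that the array has one slot per possible key, which is exactly what becomes impractical when the key range is large.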

Review: The Problem With Direct Addressing
- Direct addressing works well when the range m of keys is relatively small
- But what if the keys are 32-bit integers?
  - Problem 1: the direct-address table will have 2^32 entries, more than 4 billion
  - Problem 2: even if memory is not an issue, the time to initialize the elements to NULL may be an issue
- Solution: map keys to a smaller range 0..m-1
- This mapping is called a hash function

Hash Functions
- Next problem: collision
[Figure: a hash function h maps the actual keys K (k1 through k5), a subset of the universe of keys U, into table T with slots 0..m-1; k2 and k5 collide because h(k2) = h(k5)]

Resolving Collisions
- How can we solve the problem of collisions?
- Solution 1: chaining
- Solution 2: open addressing

Open Addressing
- Basic idea (details in Section 12.4):
  - To insert: if the slot is full, try another slot, and so on, until an open slot is found (probing)
  - To search: follow the same sequence of probes as would be used when inserting the element
    - If we reach an element with the correct key, return it
    - If we reach a NULL pointer, the element is not in the table
- Good for fixed sets (adding but no deletion)
  - Example: spell checking
- Table needn’t be much bigger than n
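A minimal sketch of insert and search under open addressing follows. The slide does not say which probe sequence to use, so linear probing (try the next slot, wrapping around) and the starting slot `key % m` are assumptions made for illustration, as are the class and method names. Deletion is left out, matching the "adding but no deletion" use case.

```python
class OpenAddressTable:
    """Open addressing with linear probing (an assumed probe sequence)."""

    def __init__(self, m):
        self.m = m
        self.slots = [None] * m  # None marks an open slot

    def _probe(self, key):
        # Assumed probe sequence: start at key mod m, then step by one slot.
        start = key % self.m
        for i in range(self.m):
            yield (start + i) % self.m

    def insert(self, key, value):
        for j in self._probe(key):
            if self.slots[j] is None or self.slots[j][0] == key:
                self.slots[j] = (key, value)
                return j  # slot actually used
        raise RuntimeError("table is full")

    def search(self, key):
        # Follow the same probes as insert; an open slot means "not in table".
        for j in self._probe(key):
            entry = self.slots[j]
            if entry is None:
                return None
            if entry[0] == key:
                return entry[1]
        return None


t = OpenAddressTable(7)
print(t.insert(10, "a"), t.insert(17, "b"), t.insert(24, "c"))  # 3 4 5
print(t.search(17))  # b
```

All three keys start at slot 3, so the second and third inserts probe forward to slots 4 and 5.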

Chaining
- Chaining puts elements that hash to the same slot in a linked list
[Figure: table T whose slots point to linked lists of the actual keys K (k1 through k8, drawn from the universe of keys U) that hash there; empty slots hold NULL]

Chaining
- How do we insert an element?
[Figure: chained hash table T with keys k1 through k8]

Chaining
- How do we delete an element?
  - Do we need a doubly-linked list for efficient delete?
[Figure: chained hash table T with keys k1 through k8]

Chaining
- How do we search for an element with a given key?
[Figure: chained hash table T with keys k1 through k8]
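One possible answer to the three questions (insert, delete, search) is sketched below with singly linked chains. The names `Node` and `ChainedHashTable` are hypothetical, the hash `key % m` is a placeholder, and insert assumes the key is not already present.

```python
class Node:
    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedHashTable:
    """Hash table with chaining; each slot holds the head of a singly linked list."""

    def __init__(self, m):
        self.m = m
        self.heads = [None] * m

    def _slot(self, key):
        return key % self.m  # placeholder hash function

    def insert(self, key, value):
        # O(1): push onto the front of the chain (assumes key is not present).
        j = self._slot(key)
        self.heads[j] = Node(key, value, self.heads[j])

    def search(self, key):
        # Walk the chain for the key's slot.
        node = self.heads[self._slot(key)]
        while node is not None:
            if node.key == key:
                return node.value
            node = node.next
        return None

    def delete(self, key):
        # Walk the chain, remembering the predecessor so the node can be unlinked.
        j = self._slot(key)
        prev, node = None, self.heads[j]
        while node is not None:
            if node.key == key:
                if prev is None:
                    self.heads[j] = node.next
                else:
                    prev.next = node.next
                return True
            prev, node = node, node.next
        return False
```

Deleting by key costs a chain walk here. If delete were instead handed a pointer to the element itself, a doubly linked chain would let it unlink that element in O(1) without finding the predecessor.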


Analysis of Chaining
- Assume simple uniform hashing: each key in the table is equally likely to be hashed to any slot
- Given n keys and m slots in the table, the load factor α = n/m = average # keys per slot
- What will be the average cost of an unsuccessful search for a key? A: O(1 + α)
- What will be the average cost of a successful search? A: O(1 + α/2) = O(1 + α)

Analysis of Chaining Continued
- So the cost of searching = O(1 + α)
- If the number of keys n is proportional to the number of slots in the table, what is α? A: α = O(1)
- In other words, we can make the expected cost of searching constant if we make α constant
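To make the load factor concrete, the sketch below counts how many chain elements a search examines for a given set of keys. The helper names are hypothetical, and the example uses a deliberately even key set with h(k) = k mod m, so it illustrates the quantities involved rather than proving the expected-case bounds.

```python
from collections import Counter


def average_unsuccessful_cost(keys, m, h):
    """Average chain length over all m slots: the elements examined by an
    unsuccessful search whose slot is equally likely to be any of the m."""
    lengths = Counter(h(k) for k in keys)
    return sum(lengths[j] for j in range(m)) / m


def average_successful_cost(keys, m, h):
    """Average elements examined when searching for each stored key once.
    In a chain of length L the keys sit at positions 1..L, total L(L+1)/2."""
    lengths = Counter(h(k) for k in keys)
    total = sum(L * (L + 1) // 2 for L in lengths.values())
    return total / len(keys)


m = 10
keys = list(range(20))          # n = 20, so alpha = n/m = 2
h = lambda k: k % m

print(len(keys) / m)                          # 2.0 (load factor alpha)
print(average_unsuccessful_cost(keys, m, h))  # 2.0
print(average_successful_cost(keys, m, h))    # 1.5
```

Both averages stay below 1 + α = 3, and both stay constant if n grows in proportion to m.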

Choosing A Hash Function
- Clearly choosing the hash function well is crucial
  - What will a worst-case hash function do?
  - What will be the time to search in this case?
- What are desirable features of the hash function?
  - Should distribute keys uniformly into slots
  - Should not depend on patterns in the data

Hash Functions: The Division Method
- h(k) = k mod m
  - In words: hash k into a table with m slots using the slot given by the remainder of k divided by m
- What happens to elements with adjacent values of k?
- What happens if m is a power of 2 (say 2^p)?
- What if m is a power of 10?
- Upshot: pick table size m = prime number not too close to a power of 2 (or 10)
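A short illustration of why a power-of-2 table size is risky with the division method: k mod 2^p keeps only the low p bits of k, so keys that differ only in their high bits all collide. The function name `division_hash` and the sample keys are chosen just for this sketch.

```python
def division_hash(k, m):
    """Division method: the slot is the remainder of k divided by m."""
    return k % m


keys = [16, 32, 48, 64]  # multiples of 16: identical low 4 bits

# m = 16 = 2^4 looks only at the low 4 bits, so every key lands in slot 0.
print([division_hash(k, 16) for k in keys])  # [0, 0, 0, 0]

# m = 13 (a prime) spreads the same keys across different slots.
print([division_hash(k, 13) for k in keys])  # [3, 6, 9, 12]
```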

Hash Functions: The Multiplication Method
- For a constant A, 0 < A < 1:
- h(k) = ⌊m (kA − ⌊kA⌋)⌋
- What does the term kA − ⌊kA⌋ represent?

Hash Functions: The Multiplication Method
- For a constant A, 0 < A < 1:
- h(k) = ⌊m (kA − ⌊kA⌋)⌋, where kA − ⌊kA⌋ is the fractional part of kA
- Choose m = 2^p
- Choose A not too close to 0 or 1
- Knuth: good choice for A = (√5 − 1)/2
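The multiplication method translates almost directly into code. The sketch below uses floating-point arithmetic for clarity, which is an assumption of this example (a production version would more likely use fixed-width integer arithmetic); the function name and the sample key are also illustrative.

```python
import math

# Knuth's suggested constant, (sqrt(5) - 1) / 2, roughly 0.618.
A = (math.sqrt(5) - 1) / 2


def multiplication_hash(k, m, a=A):
    """Multiplication method: floor(m * frac(k * a)), with 0 < a < 1."""
    fractional_part = (k * a) % 1.0   # kA - floor(kA)
    return math.floor(m * fractional_part)


m = 2 ** 14  # table size chosen as a power of 2
print(multiplication_hash(123456, m))  # 67
```

Because the fractional part lies in [0, 1), the result always falls in 0..m-1, whatever the value of m.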

Hash Functions: Worst Case Scenario
- You are given an assignment to implement hashing
- You will self-grade in pairs, testing and grading your partner’s implementation
- In a blatant violation of the honor code, your partner:
  - Analyzes your hash function
  - Picks a sequence of “worst-case” keys, causing your implementation to take O(n) time to search
- What’s an honest CS student to do?

Hash Functions: Universal Hashing
- As before, when attempting to foil a malicious adversary: randomize the algorithm
- Universal hashing: pick a hash function randomly in a way that is independent of the keys that are actually going to be stored
  - Guarantees good performance on average, no matter what keys the adversary chooses
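The slide does not name a particular family of hash functions, so the sketch below assumes one standard textbook choice: h(k) = ((a·k + b) mod p) mod m, with p a prime larger than every key and a, b drawn at random when the table is created. The function name, the default prime 2^31 − 1, and the `rng` parameter are all choices made for this illustration.

```python
import random


def make_universal_hash(m, p=2_147_483_647, rng=random):
    """Return a randomly chosen h(k) = ((a*k + b) mod p) mod m.

    Assumes every key k satisfies 0 <= k < p, where p is prime
    (the default, 2^31 - 1, is a Mersenne prime).
    """
    a = rng.randrange(1, p)   # a in 1..p-1
    b = rng.randrange(0, p)   # b in 0..p-1

    def h(k):
        return ((a * k + b) % p) % m

    return h


# Each call picks a fresh function, independently of the keys to be stored,
# so an adversary who knows the code still cannot predict which keys collide.
h = make_universal_hash(m=10)
print(all(0 <= h(k) < 10 for k in range(1000)))  # True
```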

The End