David Luebke 1 6/7/2014 ITCS 6114 Skip Lists Hashing.

Presentation transcript:

David Luebke 1 6/7/2014 ITCS 6114 Skip Lists Hashing

David Luebke 2 6/7/2014 Red-Black Trees. Red-black trees do what they do very well. What do you think is the worst thing about red-black trees? A: coding them up.

David Luebke 3 6/7/2014 Skip Lists. A relatively recent data structure. A probabilistic alternative to balanced trees: a randomized algorithm with the benefits of r-b trees. O(lg n) expected time for Search and Insert. O(1) time for Min, Max, Succ, Pred. Much easier to code than r-b trees. Fast!

David Luebke 4 6/7/2014 Linked Lists. Think about a linked list as a structure for dynamic sets. What is the running time of Min() and Max()? Successor()? Delete()? (How can we make this O(1)?) Predecessor()? These all take O(1) time in a linked list. Can you think of a way to do these in O(1) time in a red-black tree? What about Search() and Insert()? Goal: make these O(lg n) time in a linked-list setting.

David Luebke 5 6/7/2014 Skip Lists. The basic idea: keep a doubly-linked list of elements. Min, max, successor, predecessor: O(1) time. Delete is O(1) time; Insert is O(1) + Search time. During insert, add each level-i element to level i+1 with probability p (e.g., p = 1/2 or p = 1/4). [Figure: a skip list drawn as three stacked lists, labeled level 1, level 2, and level 3.]

David Luebke 6 6/7/2014 Skip List Search. To search for an element with a given key: find its location in the top list (the top list has O(1) elements with high probability); the location in this list defines a range of items in the next list; drop down a level and recurse. O(1) time per level on average, and O(lg n) levels with high probability. Total time: O(lg n).

David Luebke 7 6/7/2014 Skip List Insert. Do a search for that key. Insert the element in the bottom-level list. With probability p, recurse to insert in the next level. Analysis: expected number of lists = $1 + p + p^2 + \dots = 1/(1-p)$ = O(1) if p is constant. Total time = Search + O(1) = O(lg n) expected. Skip list delete: O(1).
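
The search and insert procedures above can be sketched in Python as follows. This is a minimal illustration under several assumptions that are not in the slides: the class and method names are hypothetical, the lists are singly linked rather than doubly linked (so only search and insert are shown), keys are distinct, and tower height is capped at an arbitrary `MAX_HEIGHT`.

```python
import random


class Node:
    def __init__(self, key, height):
        self.key = key
        # next[i] is this node's successor in the level-i list (0 = bottom).
        self.next = [None] * height


class SkipList:
    MAX_HEIGHT = 32  # assumed cap on the number of levels

    def __init__(self, p=0.5, seed=None):
        self.p = p                        # promotion probability from the slide
        self.rng = random.Random(seed)
        self.head = Node(None, self.MAX_HEIGHT)  # sentinel present on every level
        self.height = 1                   # number of levels currently in use

    def _random_height(self):
        # Promote to the next level with probability p, repeatedly.
        h = 1
        while h < self.MAX_HEIGHT and self.rng.random() < self.p:
            h += 1
        return h

    def _find_predecessors(self, key):
        # Start in the top list, walk right while keys are smaller,
        # then drop down a level; record the last node visited per level.
        preds = [self.head] * self.MAX_HEIGHT
        node = self.head
        for level in range(self.height - 1, -1, -1):
            while node.next[level] is not None and node.next[level].key < key:
                node = node.next[level]
            preds[level] = node
        return preds

    def search(self, key):
        candidate = self._find_predecessors(key)[0].next[0]
        return candidate is not None and candidate.key == key

    def insert(self, key):
        preds = self._find_predecessors(key)
        candidate = preds[0].next[0]
        if candidate is not None and candidate.key == key:
            return  # keys are assumed distinct; ignore duplicates
        h = self._random_height()
        self.height = max(self.height, h)
        node = Node(key, h)
        for level in range(h):
            # Splice the new node into each list it was promoted to.
            node.next[level] = preds[level].next[level]
            preds[level].next[level] = node
```

The bottom-level list always holds every key in sorted order; the higher levels only act as shortcuts for the search.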

David Luebke 8 6/7/2014 Skip Lists. O(1) expected time for most operations. O(lg n) expected time for insert. $O(n^2)$ time worst case (Why?), but random, so no particular order of insertion evokes worst-case behavior. O(n) expected storage requirements (Why?). Easy to code.

David Luebke 9 6/7/2014 Review: Hashing Tables. Motivation: symbol tables. A compiler uses a symbol table to relate symbols to associated data. Symbols: variable names, procedure names, etc. Associated data: memory location, call graph, etc. For a symbol table (also called a dictionary), we care about search, insertion, and deletion. We typically don't care about sorted order.

David Luebke 10 6/7/2014 Review: Hash Tables. More formally: given a table T and a record x, with key (= symbol) and satellite data, we need to support Insert(T, x), Delete(T, x), and Search(T, x). We want these to be fast, but don't care about sorting the records. The structure we will use is a hash table. It supports all of the above in O(1) expected time!

David Luebke 11 6/7/2014 Hashing: Keys In the following discussions we will consider all keys to be (possibly large) natural numbers How can we convert floats to natural numbers for hashing purposes? How can we convert ASCII strings to natural numbers for hashing purposes?
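
One possible answer to the two questions above, sketched in Python. Both conversions are assumptions chosen for illustration, not methods stated in the slides: the string is read as a radix-128 integer (one digit per ASCII character), and the float is reinterpreted through its 64-bit IEEE 754 bit pattern. The function names are hypothetical.

```python
import struct


def string_to_natural(s, radix=128):
    """Interpret an ASCII string as a natural number written in the given radix."""
    n = 0
    for byte in s.encode("ascii"):  # raises UnicodeEncodeError on non-ASCII input
        n = n * radix + byte
    return n


def float_to_natural(x):
    """Reinterpret the 8 bytes of an IEEE 754 double as an unsigned 64-bit integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]
```

For example, `string_to_natural("pt")` gives 112 * 128 + 116 = 14452. Note that the bit-pattern approach maps 0.0 and -0.0 to different numbers even though they compare equal, so a real implementation would need to normalize such cases first.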

David Luebke 12 6/7/2014 Review: Direct Addressing. Suppose the range of keys is 0..m-1 and keys are distinct. The idea: set up an array T[0..m-1] in which T[i] = x if x ∈ T and key[x] = i, and T[i] = NULL otherwise. This is called a direct-address table. Operations take O(1) time! So what's the problem?
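
A direct-address table as described above takes only a few lines. This is a minimal Python sketch; the class name and the choice to store a (key, data) record per slot are illustrative assumptions.

```python
class DirectAddressTable:
    """Array T[0..m-1] indexed directly by key; None plays the role of NULL."""

    def __init__(self, m):
        # Initializing all m slots is the O(m) cost the next slide worries about.
        self.slots = [None] * m

    def insert(self, key, data):
        self.slots[key] = (key, data)

    def delete(self, key):
        self.slots[key] = None

    def search(self, key):
        return self.slots[key]
```

Every operation is a single array access, which is why each takes O(1) time, but the array needs one slot for every possible key whether or not it is used.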

David Luebke 13 6/7/2014 The Problem With Direct Addressing. Direct addressing works well when the range m of keys is relatively small. But what if the keys are 32-bit integers? Problem 1: the direct-address table will have $2^{32}$ entries, more than 4 billion. Problem 2: even if memory is not an issue, the time to initialize the elements to NULL may be. Solution: map keys to a smaller range 0..m-1. This mapping is called a hash function.

David Luebke 14 6/7/2014 Hash Functions. Next problem: collision. [Figure: the actual keys K = {k1, ..., k5}, a subset of the universe of keys U, are mapped by h into slots of the table T[0..m-1]; k2 and k5 collide because h(k2) = h(k5).]

David Luebke 15 6/7/2014 Resolving Collisions How can we solve the problem of collisions? Solution 1: chaining Solution 2: open addressing

David Luebke 16 6/7/2014 Open Addressing. Basic idea (details in Section 12.4): To insert: if the slot is full, try another slot, and so on, until an open slot is found (probing). To search: follow the same sequence of probes as would be used when inserting the element. If you reach an element with the correct key, return it. If you reach a NULL pointer, the element is not in the table. Good for fixed sets (adding but no deletion). Example: spell checking. The table needn't be much bigger than n.
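
The probing idea can be sketched as follows. The slide does not say which probe sequence to use, so this example assumes linear probing (try the next slot, wrapping around) with `key % m` as the starting slot; the class name is hypothetical. As on the slide, deletion is not supported.

```python
class OpenAddressTable:
    """Open addressing with linear probing (an assumed probe sequence)."""

    def __init__(self, m):
        self.m = m
        self.slots = [None] * m  # None marks an open slot

    def _probe_sequence(self, key):
        start = key % self.m
        for i in range(self.m):
            yield (start + i) % self.m

    def insert(self, key):
        # Try slots in probe order until an open one (or the key itself) is found.
        for j in self._probe_sequence(key):
            if self.slots[j] is None or self.slots[j] == key:
                self.slots[j] = key
                return j
        raise OverflowError("hash table is full")

    def search(self, key):
        # Follow the same probe sequence that insert would have used.
        for j in self._probe_sequence(key):
            if self.slots[j] is None:
                return None      # reached an open slot: key is not in the table
            if self.slots[j] == key:
                return j
        return None              # probed every slot without finding the key
```

With m = 7, the keys 10, 17, and 24 all start at slot 3, so they end up in slots 3, 4, and 5 respectively.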

David Luebke 17 6/7/2014 Chaining. Chaining puts elements that hash to the same slot in a linked list. [Figure: the actual keys K = {k1, ..., k8}, a subset of the universe of keys U, hashed into table T; each occupied slot of T points to a linked list of the keys that hash to it.]

David Luebke 18 6/7/2014 Chaining. How do we insert an element? [Figure: the same chained hash table, with keys k1, ..., k8 stored in linked lists hanging off the slots of T.]

David Luebke 19 6/7/2014 Chaining. How do we delete an element? Do we need a doubly-linked list for efficient delete? [Figure: the same chained hash table, with keys k1, ..., k8 stored in linked lists hanging off the slots of T.]

David Luebke 20 6/7/2014 Chaining. How do we search for an element with a given key? [Figure: the same chained hash table, with keys k1, ..., k8 stored in linked lists hanging off the slots of T.]
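
One way to answer the three questions on the chaining slides is sketched below. It is an illustration under stated assumptions, not the slides' own implementation: each chain is a Python list rather than a linked list, the hash function is `key % m`, and the class name is hypothetical.

```python
class ChainedHashTable:
    """Hash table with chaining; each slot holds a list of (key, value) pairs."""

    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]

    def _chain(self, key):
        return self.slots[key % self.m]  # assumed hash function: key mod m

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:                 # key already present: overwrite its value
                chain[i] = (key, value)
                return
        chain.append((key, value))

    def search(self, key):
        # Cost is proportional to the length of the chain for this slot.
        for k, v in self._chain(key):
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False
```

Here delete searches by key, so it costs as much as a search. With a doubly-linked chain and a pointer to the element itself, the unlink step alone would be O(1), which is the point behind the slide's question.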

David Luebke 21 6/7/2014 Analysis of Chaining. Assume simple uniform hashing: each key in the table is equally likely to be hashed to any slot. Given n keys and m slots in the table, the load factor α = n/m = average number of keys per slot. What will be the average cost of an unsuccessful search for a key?

David Luebke 22 6/7/2014 Analysis of Chaining. Assume simple uniform hashing: each key in the table is equally likely to be hashed to any slot. Given n keys and m slots in the table, the load factor α = n/m = average number of keys per slot. What will be the average cost of an unsuccessful search for a key? A: O(1 + α).

David Luebke 23 6/7/2014 Analysis of Chaining. Assume simple uniform hashing: each key in the table is equally likely to be hashed to any slot. Given n keys and m slots in the table, the load factor α = n/m = average number of keys per slot. What will be the average cost of an unsuccessful search for a key? A: O(1 + α). What will be the average cost of a successful search?

David Luebke 24 6/7/2014 Analysis of Chaining. Assume simple uniform hashing: each key in the table is equally likely to be hashed to any slot. Given n keys and m slots in the table, the load factor α = n/m = average number of keys per slot. What will be the average cost of an unsuccessful search for a key? A: O(1 + α). What will be the average cost of a successful search? A: O(1 + α/2) = O(1 + α).

David Luebke 25 6/7/2014 Analysis of Chaining Continued. So the cost of searching = O(1 + α). If the number of keys n is proportional to the number of slots in the table, what is α? A: α = O(1). In other words, we can make the expected cost of searching constant if we make α constant.
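
A small numerical check of the load-factor argument, as a hedged sketch: it models simple uniform hashing by drawing each key's slot uniformly at random (an assumption standing in for a real hash function), then reports the load factor, the average chain length over all slots, and the longest chain. The function name is hypothetical.

```python
import random


def chain_statistics(n, m, seed=0):
    """Throw n keys into m slots uniformly at random and summarize the chains."""
    rng = random.Random(seed)
    chain_lengths = [0] * m
    for _ in range(n):
        chain_lengths[rng.randrange(m)] += 1  # models simple uniform hashing
    alpha = n / m
    # An unsuccessful search scans one whole chain, so its average cost over
    # all slots is the average chain length.
    average_length = sum(chain_lengths) / m
    return alpha, average_length, max(chain_lengths)
```

The average chain length always equals α = n/m, because the n keys are spread over m chains. Keeping n proportional to m therefore keeps the average search cost constant, although an individual chain can still be longer than average.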

David Luebke 26 6/7/2014 Choosing A Hash Function. Clearly, choosing the hash function well is crucial. What will a worst-case hash function do? What will be the time to search in this case? What are desirable features of the hash function? It should distribute keys uniformly into slots. It should not depend on patterns in the data.

David Luebke 27 6/7/2014 Hash Functions: The Division Method. h(k) = k mod m. In words: hash k into a table with m slots using the slot given by the remainder of k divided by m. What happens to elements with adjacent values of k? What happens if m is a power of 2 (say $2^P$)? What if m is a power of 10? Upshot: pick table size m = a prime number not too close to a power of 2 (or 10).
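
A short Python sketch of the division method, with a demonstration of the power-of-2 problem the slide asks about. The function name and the sample keys are illustrative assumptions.

```python
def division_hash(k, m):
    """Division method: the slot is the remainder of k divided by m."""
    return k % m


# Keys that differ only in their high-order bits (multiples of 16).
keys = [0x10, 0x20, 0x30, 0x40]

# With m = 2^4, k mod m keeps only the low 4 bits, so every key collides.
slots_power_of_two = [division_hash(k, 16) for k in keys]

# With a prime m, all bits of k influence the slot.
slots_prime = [division_hash(k, 13) for k in keys]
```

Here `slots_power_of_two` is `[0, 0, 0, 0]`, while `slots_prime` is `[3, 6, 9, 12]`. The same effect occurs with m a power of 10 when keys share their low-order decimal digits.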

David Luebke 28 6/7/2014 Hash Functions: The Multiplication Method. For a constant A, 0 < A < 1: $h(k) = \lfloor m\,(kA - \lfloor kA \rfloor) \rfloor$. What does the term $kA - \lfloor kA \rfloor$ represent?

David Luebke 29 6/7/2014 Hash Functions: The Multiplication Method. For a constant A, 0 < A < 1: $h(k) = \lfloor m\,(kA - \lfloor kA \rfloor) \rfloor$, where $kA - \lfloor kA \rfloor$ is the fractional part of kA. Choose $m = 2^P$. Choose A not too close to 0 or 1. Knuth: a good choice is $A = (\sqrt{5} - 1)/2$.
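
The multiplication method translates almost directly into Python. This sketch uses floating-point arithmetic for readability, which is an assumption of the example: it is only accurate while k is small enough for a double to hold the fractional part of kA precisely, and production implementations typically use fixed-width integer arithmetic instead. The names are hypothetical.

```python
import math

# Knuth's suggested constant, (sqrt(5) - 1) / 2, roughly 0.618.
KNUTH_A = (math.sqrt(5) - 1) / 2


def multiplication_hash(k, m, a=KNUTH_A):
    """Multiplication method: floor(m * fractional_part(k * a))."""
    fractional_part = (k * a) % 1.0   # kA - floor(kA), a value in [0, 1)
    return math.floor(m * fractional_part)
```

Because the fractional part lies in [0, 1), the result always falls in 0..m-1, and the choice of m is not critical; a power of 2 is convenient.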

David Luebke 30 6/7/2014 Hash Functions: Worst Case Scenario. Scenario: You are given an assignment to implement hashing. You will self-grade in pairs, testing and grading your partner's implementation. In a blatant violation of the honor code, your partner analyzes your hash function and picks a sequence of worst-case keys, causing your implementation to take O(n) time to search. What's an honest CS student to do?

David Luebke 31 6/7/2014 Hash Functions: Universal Hashing. As before, when attempting to foil a malicious adversary: randomize the algorithm. Universal hashing: pick a hash function randomly in a way that is independent of the keys that are actually going to be stored. This guarantees good performance on average, no matter what keys the adversary chooses.
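
The slide does not name a particular family of hash functions. As an illustration, the sketch below assumes the standard textbook family h(k) = ((a·k + b) mod p) mod m, with p a prime larger than every key and a, b drawn at random when the table is created. The function name and the default prime 2^31 - 1 are choices made for this example.

```python
import random


def make_universal_hash(m, p=2 ** 31 - 1, rng=random):
    """Pick h(k) = ((a*k + b) mod p) mod m at random from the family.

    Assumes every key k satisfies 0 <= k < p, where p is prime.
    """
    a = rng.randrange(1, p)  # a in 1..p-1
    b = rng.randrange(0, p)  # b in 0..p-1

    def h(k):
        return ((a * k + b) % p) % m

    return h


# The function is chosen once per table, independently of the keys stored.
h = make_universal_hash(100, rng=random.Random(42))
slots = [h(k) for k in (10, 110, 210, 310)]
```

Since a and b are chosen after the adversary has committed to a strategy but are never revealed, no fixed sequence of keys is bad for more than a small fraction of the possible functions.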