1 CSE 326: Data Structures Hash Tables Autumn 2007 Lecture 14.

Slides:



Advertisements
Similar presentations
1 Designing Hash Tables Sections 5.3, 5.4, Designing a hash table 1.Hash function: establishing a key with an indexed location in a hash table.
Advertisements

Hash Tables.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
Hashing as a Dictionary Implementation
Log Files. O(n) Data Structure Exercises 16.1.
Implementation of Linear Probing (continued) Helping method for locating index: private int findIndex(long key) // return -1 if the item with key 'key'
© 2006 Pearson Addison-Wesley. All rights reserved13 A-1 Chapter 13 Hash Tables.
CSE 326: Data Structures: Hash Tables
CSE 326: Data Structures Hashing Ben Lerner Summer 2007.
Introduction to Hashing CS 311 Winter, Dictionary Structure A dictionary structure has the form: (Key, Data) Dictionary structures are organized.
Tirgul 7. Find an efficient implementation of a dynamic collection of elements with unique keys Supported Operations: Insert, Search and Delete. The keys.
COMP 171 Data Structures and Algorithms Tutorial 10 Hash Tables.
Hashing General idea: Get a large array
Data Structures Using C++ 2E Chapter 9 Searching and Hashing Algorithms.
Introducing Hashing Chapter 21 Copyright ©2012 by Pearson Education, Inc. All rights reserved.
Hash Tables. Container of elements where each element has an associated key Each key is mapped to a value that determines the table cell where element.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
CS 206 Introduction to Computer Science II 04 / 06 / 2009 Instructor: Michael Eckmann.
Hashing 1. Def. Hash Table an array in which items are inserted according to a key value (i.e. the key value is used to determine the index of the item).
1 Chapter 5 Hashing General ideas Methods of implementing the hash table Comparison among these methods Applications of hashing Compare hash tables with.
Data Structures and Algorithm Analysis Hashing Lecturer: Jing Liu Homepage:
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture8.
Oct 29, 2001CSE 373, Autumn External Storage For large data sets, the computer will have to access the disk. Disk access can take 200,000 times longer.
1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table.
1 CSE 326: Data Structures: Hash Tables Lecture 12: Monday, Feb 3, 2003.
CSE 326: Data Structures Lecture #13 Bart Niswonger Summer Quarter 2001.
1 HASHING Course teacher: Moona Kanwal. 2 Hashing Mathematical concept –To define any number as set of numbers in given interval –To cut down part of.
Hashing as a Dictionary Implementation Chapter 19.
CSE373: Data Structures & Algorithms Lecture 17: Hash Collisions Kevin Quinn Fall 2015.
CS201: Data Structures and Discrete Mathematics I Hash Table.
Hashing - 2 Designing Hash Tables Sections 5.3, 5.4, 5.4, 5.6.
Chapter 12 Hash Table. ● So far, the best worst-case time for searching is O(log n). ● Hash tables  average search time of O(1).  worst case search.
Data Structures and Algorithms Hashing First Year M. B. Fayek CUFE 2010.
1 Hashing - Introduction Dictionary = a dynamic set that supports the operations INSERT, DELETE, SEARCH Dictionary = a dynamic set that supports the operations.
Hashing Basis Ideas A data structure that allows insertion, deletion and search in O(1) in average. A data structure that allows insertion, deletion and.
Hashing Chapter 7 Section 3. What is hashing? Hashing is using a 1-D array to implement a dictionary o This implementation is called a "hash table" Items.
Hash Table March COP 3502, UCF 1. Outline Hash Table: – Motivation – Direct Access Table – Hash Table Solutions for Collision Problem: – Open.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
CSE 373 Data Structures and Algorithms Lecture 17: Hashing II.
Tirgul 11 Notes Hash tables –reminder –examples –some new material.
H ASH TABLES. H ASHING Key indexed arrays had perfect search performance O(1) But required a dense range of index values Otherwise memory is wasted Hashing.
Hashing Suppose we want to search for a data item in a huge data record tables How long will it take? – It depends on the data structure – (unsorted) linked.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
Chapter 13 C Advanced Implementations of Tables – Hash Tables.
CMSC 341 Hashing Readings: Chapter 5. Announcements Midterm II on Nov 7 Review out Oct 29 HW 5 due Thursday CMSC 341 Hashing 2.
Hash Tables Ellen Walker CPSC 201 Data Structures Hiram College.
Fundamental Structures of Computer Science II
Hashing (part 2) CSE 2011 Winter March 2018.
Hashing Alexandra Stefan.
CSE 373: Hash Tables Hunter Zahn Summer 2016 UW CSE 373, Summer 2016.
Hashing Alexandra Stefan.
Hash tables Hash table: a list of some fixed size, that positions elements according to an algorithm called a hash function … hash function h(element)
Instructor: Lilian de Greef Quarter: Summer 2017
CSE373: Data Structures & Algorithms Lecture 14: Hash Collisions
Data Structures and Algorithms
Collision Resolution Neil Tang 02/18/2010
Resolving collisions: Open addressing
CSE373: Data Structures & Algorithms Lecture 14: Hash Collisions
Searching Tables Table: sequence of (key,information) pairs
Data Structures and Algorithms
CSE 326: Data Structures Hashing
Hashing Alexandra Stefan.
Tree traversal preorder, postorder: applies to any kind of tree
Pseudorandom number, Universal Hashing, Chaining and Linear-Probing
Collision Resolution Neil Tang 02/21/2008
Ch Hash Tables Array or linked list Binary search trees
Ch. 13 Hash Tables  .
Data Structures and Algorithm Analysis Hashing
Chapter 13 Hashing © 2011 Pearson Addison-Wesley. All rights reserved.
CSE 373: Data Structures and Algorithms
Presentation transcript:

1 CSE 326: Data Structures Hash Tables Autumn 2007 Lecture 14

2 Dictionary Implementations So Far Unsorted linked list Sorted Array BSTAVLSplay (amortized) Insert Find Delete

3 Hash Tables Constant time accesses! A hash table is an array of some fixed size, usually a prime number. General idea: key space (e.g., integers, strings) 0 … TableSize –1 hash function: h(K) hash table

4 Example key space = integers TableSize = 10 h(K) = K mod 10 Insert: 7, 18, 41,

5 Another Example key space = integers TableSize = 6 h(K) = K mod 6 Insert: 7, 18, 41,

6 Hash Functions 1.simple/fast to compute, 2.Avoid collisions 3.have keys distributed evenly among cells. Perfect Hash function:

7 Sample Hash Functions: key space = strings s = s 0 s 1 s 2 … s k-1 1.h(s) = s 0 mod TableSize 2.h(s) = mod TableSize 3.h(s) = mod TableSize

8 Collision Resolution Collision: when two keys map to the same location in the hash table. Two ways to resolve collisions: 1.Separate Chaining 2.Open Addressing (linear probing, quadratic probing, double hashing)

9 Separate Chaining Separate chaining: All keys that map to the same hash value are kept in a list (or “bucket”) Insert:

10 Analysis of find Defn: The load factor,, of a hash table is the ratio:  no. of elements  table size For separate chaining, = average # of elements in a bucket Unsuccessful find: Successful find:

11 How big should the hash table be? For Separate Chaining:

12 tableSize: Why Prime? Suppose –data stored in hash table: 7160, 493, 60, 55, 321, 900, 810 –tableSize = 10 data hashes to 0, 3, 0, 5, 1, 0, 0 –tableSize = 11 data hashes to 10, 9, 5, 0, 2, 9, 7 Real-life data tends to have a pattern Being a multiple of 11 is usually not the pattern

13 Open Addressing Insert: Linear Probing: after checking spot h(k), try spot h(k)+1, if that is full, try h(k)+2, then h(k)+3, etc.

14 Terminology Alert! “Open Hashing” equals “Separate Chaining” “Closed Hashing” equals “Open Addressing” Weiss

15 Linear Probing f(i) = i Probe sequence: 0 th probe = h(k) mod TableSize 1 th probe = (h(k) + 1) mod TableSize 2 th probe = (h(k) + 2) mod TableSize... i th probe = (h(k) + i) mod TableSize

16 Linear Probing – Clustering [R. Sedgewick] no collision collision in small cluster collision in large cluster

17 Load Factor in Linear Probing For any < 1, linear probing will find an empty slot Expected # of probes (for large table sizes) –successful search: –unsuccessful search: Linear probing suffers from primary clustering Performance quickly degrades for > 1/2

18 Quadratic Probing f(i) = i 2 Probe sequence: 0 th probe = h(k) mod TableSize 1 th probe = (h(k) + 1) mod TableSize 2 th probe = (h(k) + 4) mod TableSize 3 th probe = (h(k) + 9) mod TableSize... i th probe = (h(k) + i 2 ) mod TableSize Less likely to encounter Primary Clustering

19 Quadratic Probing Insert:

20 Quadratic Probing Example insert(76) 76%7 = 6 insert(40) 40%7 = 5 insert(48) 48%7 = 6 insert(5) 5%7 = 5 insert(55) 55%7 = 6 insert(47) 47%7 = 5 But…

21 Quadratic Probing: Success guarantee for < ½ If size is prime and < ½, then quadratic probing will find an empty slot in size/2 probes or fewer. –show for all 0  i,j  size/2 and i  j (h(x) + i 2 ) mod size  (h(x) + j 2 ) mod size –by contradiction: suppose that for some i  j: (h(x) + i 2 ) mod size = (h(x) + j 2 ) mod size  i 2 mod size = j 2 mod size  (i 2 - j 2 ) mod size = 0  [(i + j)(i - j)] mod size = 0 Because size is prime (i-j) or (i+j) must be zero, and neither can be

22 Quadratic Probing: Properties For any < ½, quadratic probing will find an empty slot; for bigger, quadratic probing may find a slot Quadratic probing does not suffer from primary clustering: keys hashing to the same area are not bad But what about keys that hash to the same spot? –Secondary Clustering!

23 Double Hashing f(i) = i * g(k) where g is a second hash function Probe sequence: 0 th probe = h(k) mod TableSize 1 th probe = (h(k) + g(k)) mod TableSize 2 th probe = (h(k) + 2*g(k)) mod TableSize 3 th probe = (h(k) + 3*g(k)) mod TableSize... i th probe = (h(k) + i*g(k)) mod TableSize

24 Double Hashing Example h(k) = k mod 7 and g(k) = 5 – (k mod 5) Probes

25 Resolving Collisions with Double Hashing Insert these values into the hash table in this order. Resolve any collisions with double hashing: Hash Functions: H(K) = K mod M H 2 (K) = 1 + ((K/M) mod (M-1)) M =

26 Idea: When the table gets too full, create a bigger table (usually 2x as large) and hash all the items from the original table into the new table. When to rehash? –half full ( = 0.5) –when an insertion fails –some other threshold Cost of rehashing? Rehashing

27 Java hashCode() Method Class Object defines a hashCode method –Intent: returns a suitable hashcode for the object –Result is arbitrary int; must scale to fit a hash table (e.g. obj.hashCode() % nBuckets) –Used by collection classes like HashMap Classes should override with calculation appropriate for instances of the class –Calculation should involve semantically “significant” fields of objects

28 hashCode() and equals() To work right, particularly with collection classes like HashMap, hashCode() and equals() must obey this rule: if a.equals(b) then it must be true that a.hashCode() == b.hashCode() –Why? Reverse is not required

29 Hashing Summary Hashing is one of the most important data structures. Hashing has many applications where operations are limited to find, insert, and delete. Dynamic hash tables have good amortized complexity.