CS2110 Recitation Week 8. Hashing Hashing: An implementation of a set. It provides O(1) expected time for set operations Set operations Make the set empty.

Slides:



Advertisements
Similar presentations
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
Advertisements

Data Structures Using C++ 2E
Hashing as a Dictionary Implementation
What we learn with pleasure we never forget. Alfred Mercier Smitha N Pai.
Appendix I Hashing. Chapter Scope Hashing, conceptually Using hashes to solve problems Hash implementations Java Foundations, 3rd Edition, Lewis/DePasquale/Chase21.
Hashing Chapters What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function.
Searching Kruse and Ryba Ch and 9.6. Problem: Search We are given a list of records. Each record has an associated key. Give efficient algorithm.
Using arrays – Example 2: names as keys How do we map strings to integers? One way is to convert each letter to a number, either by mapping them to 0-25.
Maps, Dictionaries, Hashtables
1 Hashing (Walls & Mirrors - end of Chapter 12). 2 I hate quotations. Tell me what you know. – Ralph Waldo Emerson.
1 Chapter 9 Maps and Dictionaries. 2 A basic problem We have to store some records and perform the following: add new record add new record delete record.
Sets and Maps Chapter 9. Chapter 9: Sets and Maps2 Chapter Objectives To understand the Java Map and Set interfaces and how to use them To learn about.
1 CSE 326: Data Structures Hash Tables Autumn 2007 Lecture 14.
CS 206 Introduction to Computer Science II 11 / 17 / 2008 Instructor: Michael Eckmann.
Hash Tables1 Part E Hash Tables  
Hash Tables1 Part E Hash Tables  
Tirgul 7. Find an efficient implementation of a dynamic collection of elements with unique keys Supported Operations: Insert, Search and Delete. The keys.
COMP 171 Data Structures and Algorithms Tutorial 10 Hash Tables.
Hashing General idea: Get a large array
Data Structures Using C++ 2E Chapter 9 Searching and Hashing Algorithms.
Introducing Hashing Chapter 21 Copyright ©2012 by Pearson Education, Inc. All rights reserved.
1 Implementing a queue in an array Hashing We show a neat idea that allows a queue to be stored in an array but takes constant time for both adding and.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
CS 206 Introduction to Computer Science II 04 / 06 / 2009 Instructor: Michael Eckmann.
Hashing 1. Def. Hash Table an array in which items are inserted according to a key value (i.e. the key value is used to determine the index of the item).
(c) University of Washingtonhashing-1 CSC 143 Java Hashing Set Implementation via Hashing.
Hashing CS 105. Hashing Slide 2 Hashing - Introduction In a dictionary, if it can be arranged such that the key is also the index to the array that stores.
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture8.
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.
Hashing Hashing is another method for sorting and searching data.
Hashing as a Dictionary Implementation Chapter 19.
The Map ADT and Hash Tables. 2 The Map ADT  Map: An abstract data type where a value is "mapped" to a unique key  Need a key and a value to insert new.
CS201: Data Structures and Discrete Mathematics I Hash Table.
Chapter 12 Hash Table. ● So far, the best worst-case time for searching is O(log n). ● Hash tables  average search time of O(1).  worst case search.
WEEK 1 Hashing CE222 Dr. Senem Kumova Metin
1 Hashing - Introduction Dictionary = a dynamic set that supports the operations INSERT, DELETE, SEARCH Dictionary = a dynamic set that supports the operations.
Chapter 5: Hashing Part I - Hash Tables. Hashing  What is Hashing?  Direct Access Tables  Hash Tables 2.
Chapter 11 Hash Tables © John Urrutia 2014, All Rights Reserved1.
“Never doubt that a small group of thoughtful, committed people can change the world. Indeed, it is the only thing that ever has.” – Margaret Meade Thought.
Hash Table March COP 3502, UCF 1. Outline Hash Table: – Motivation – Direct Access Table – Hash Table Solutions for Collision Problem: – Open.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
Tirgul 11 Notes Hash tables –reminder –examples –some new material.
Hashing CS 110: Data Structures and Algorithms First Semester,
Building Java Programs Bonus Slides Hashing. 2 Recall: ADTs (11.1) abstract data type (ADT): A specification of a collection of data and the operations.
CS261 Data Structures Hash Tables Open Address Hashing.
A Introduction to Computing II Lecture 11: Hashtables Fall Session 2000.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
Hash Tables © Rick Mercer.  Outline  Discuss what a hash method does  translates a string key into an integer  Discuss a few strategies for implementing.
CS6045: Advanced Algorithms Data Structures. Hashing Tables Motivation: symbol tables –A compiler uses a symbol table to relate symbols to associated.
Hashing O(1) data access (almost) -access, insertion, deletion, updating in constant time (on average) but at a price… references: Weiss, Goodrich & Tamassia,
CMSC 341 Hashing Readings: Chapter 5. Announcements Midterm II on Nov 7 Review out Oct 29 HW 5 due Thursday CMSC 341 Hashing 2.
Sets and Maps Chapter 9. Chapter Objectives  To understand the Java Map and Set interfaces and how to use them  To learn about hash coding and its use.
CS 206 Introduction to Computer Science II 04 / 08 / 2009 Instructor: Michael Eckmann.
CSC 413/513: Intro to Algorithms Hash Tables. ● Hash table: ■ Given a table T and a record x, with key (= symbol) and satellite data, we need to support:
Implementing the Map ADT.  The Map ADT  Implementation with Java Generics  A Hash Function  translation of a string key into an integer  Consider.
Chapter 11 (Lafore’s Book) Hash Tables Hwajung Lee.
Building Java Programs Generics, hashing reading: 18.1.
Sets and Maps Chapter 9.
Hashing Problem: store and retrieving an item using its key (for example, ID number, name) Linked List takes O(N) time Binary Search Tree take O(logN)
Hashing CSE 2011 Winter July 2018.
Slides by Steve Armstrong LeTourneau University Longview, TX
Hash tables Hash table: a list of some fixed size, that positions elements according to an algorithm called a hash function … hash function h(element)
Advanced Associative Structures
Recitation 7 Hashing.
Hashing CS2110.
CS202 - Fundamental Structures of Computer Science II
Hashing Sections 10.2 – 10.3 Lecture 26 CS302 Data Structures
Sets and Maps Chapter 9.
Collision Handling Collisions occur when different elements are mapped to the same cell.
Presentation transcript:

CS2110 Recitation Week 8. Hashing Hashing: An implementation of a set. It provides O(1) expected time for set operations Set operations Make the set empty Add an element to the set Remove an element from the set Get the size of the set (number of elements in it) Tell whether a value is in the set Tell whether the set is empty. Note: We work here with a set of Strings. But the elements of the set could be anything at all; only the “hash function” would change. 1

What’s wrong with using an array? Adding an element requires testing whether it is in the array. Expected time O(n) Removing an element requires moving elements down. Expected time O(n) Testing whether an element is in: expected time O(n) b n "xy" "abc" "1$2" "aaa" … Keep set elements in b[0..n-1] Keeping the array sorted presents its own problems. ArrayList and Vector implemented using arrays; same problems 2

Solution: Let an element appear anywhere in b A hash function says where to put an element b "abc" "235" "aaa" "1$2" "xy" Keep set elements anywhere in b In example to right: "abc" hashes to 0 "1$2" hashes to 100 Collision: string hashes to occupied place Solution: Put in next available space "235" and "aaa" both hash to 60 3

Array elements: null or of type HashEntry /** An instance is an element in hash array */ private static class HashEntry { public String element; // the element public boolean isInSet; // = " element is in set " /** Constructor: an element that is in the set iff b */ public HashEntry( String e, boolean b) { element= e; isInSet= b; } HashEntry object says whether it is in set. To remove an element, set field isInSet to false. 4

0 k Hashing with linear probing To add string " bc " to set: int k= hashCode( "bc"); We discuss hash functions later b … Check elements b[k], b[k+1], … until: null or element containing " bc " is found Probe: checking one element 5

0 k Basic fact int k= hashCode( s); b … Suppose k = hashCode(s). Suppose s is in set. Let b[j] be first (with wraparound) null element at or after b[k]. Then s is in one of elements b[k..j-1] (with wraparound) Basic fact relies on never setting an element to null 6

/** Add s to this set (if not in) */ public void add(String s) { int k= hashCode(s); while (b[k] != null && !b[k].element.equals(s)) k= (k+1) % b.length(); if (b[k] = = null) { b[k]= new HashEntry(s, true); size= size+1; return; } // s is in b[k] –but it may not be in the set if (!b[k].isInSet) { b[k].isInSet= true; size= size + 1; } procedure add % (remainder) gives wraparound 7

/** Remove from this set (if it is in) */ public void remove(String s) { int k= hashCode(s); while (b[k] != null && !b[k].element.equals(s)) k= (k+1) % b.length(); if (b[k] = = null || !b[k].isInSet) { return; // s is in b[k] and is in the set; remove it b[k].isInSet= false; size= size - 1; } procedure remove 8

lf = (number of elements that are not null) / b.length Estimate of how full array is: close to 0: relatively empty Close to 1: too full Somebody proved: Load factor Under certain independence assumptions (about the hash function), the average number of probes in adding an element is 1 / (1 – lf) Array half full? Addition expected to need only 2 probes! E.g. size 2000, 1000 elements are null. Only 2 probes! Wow! 9

/** Rehash array b */ private void rehash( ) { HashEntry[] oldb= b; // copy of array b // Create a new, empty array b= new HashEntry[nextPrime(4 * size)]; size= 0; // Copy active elements from oldb to b for (int i= 0; i != oldb.length; i= i+1) if (oldb[i] !== null && oldb[i].isInSet) b.add(oldb[i].element); } Procedure rehash Size of new array: first prime larger than 4 * (size of set) Why a prime? Next slides 10

Linear probing: Look at k, k+1, k+2 … Clustering: because 2 strings that hash to k, k+1 have almost same probe sequence Instead, use quadratic probing: b[k] b[k+1 2 ] b[k+2 2 ] b[k+3 2 ] … Quadratic probing 11 Removes primary clustering Efficient calculation: (i+1) 2 – i 2 = 2*i – 1 Get to next probe with mult and add

Quadratic probing b[k] b[k+1 2 ] b[k+2 2 ] b[k+3 2 ] … 12 Someone proved: If 1.Size of array is a prime 2.Load factor <= 1/2 Then A new element can be added Probe sequence never probes same elements twice Two facts hold for linear probing even if size is not prime. But quadratic probing requires prime size

Hash function 13 Want a hash function that doesn’t put too many elements at the same position. Class String has a good hash function s.hashCode() The specs define it as (with n the length of s): s[0]*31 n-1 + s[1]*31 n s[n-1] Time is O(n)! Extremely long strings? Create you own hash function, But it’s not easy to create a good one.

Java’s hashcode-equals contract 14 HashCode and equals are implemented in class Object. HashCode in Object: usually implemented by converting internal address (pointer) to an integer General contract for HashCode: During an execution, c.hashCode() should consistently return same value unless info used in calculating c.equals(…) is changed c1.equals(c2) true? Then c1.hashCode() = c2.hashCode() c1.equls(c2) false? For best performance c1.hashCode() != c2.hashCode() —but not required. Override equals? Then override hashCode also if you are going to use it.

Summary 15 We presented basics of hashing, although there are a few other ideas you should be aware of. We summarize, giving references to the text by Carrano and elsewhere for more information. Carrano does Hashing in Chapter 21, 523–546. Describe basic idea of hashing (524–526). hash table (the array) hash function: Given search key, compute a hash code: an integer. Integer is then changed (Carrano says compressed) to be in range of hash table, usually using remainder function. Perfect hash function: maps each search key into a different integer that is an index in the hash table. Good hash function properties: (1) minimize collisions, (2) Be fast to compute.

Summary 16 Java hash functions: String provides hashCode function. It’s >= 0. Each wrapper class provides hashCode function for the values that it wraps; for class Integer, it is the wrapped int, so it can be negative. (page 528–530). Cryptographic hash function (visit Wikipedia). Produce a fixed-size bit string for an arbitrary block of data such that any change to the data will, with high probability, change the hash value. Critical for information security applications, like digital signatures. Not easy to come up with good ones. The widely used MD5 Message-Digest Algorithm (by Ron Rivest of MIT) produces a 16-byte hash value, but it has flaws.

Summary 17 Hash table size n: Best n is a prime > 2. Then compression of hash code using % n provides indices that are distributed uniformly in 0..n-1 (page 531). Be careful with h % n: it not the modulus operation. If h < 0, h% n is in 1-n..0, so add n or use absolute value. Load factor lf: Ratio of number of occupied hash-table elements to size of hash-table. Proved: for linear or quadratic probing, under certain independence conditions, the average number of probes in adding an element is at most 1 / (1 – lf). So if hash table is half full, only 2 probes expected! Keep it at most half full by making bigger hash table when necessary.

Summary 18 Collision: Occurs when two different search keys hash and are then compressed to same index. Two general ways to proceed: 1.Open addressing: Use Linear probing. Has problem of primary clustering Quadratic probing. Hash table size should be a prime With both linear and quadratic probing, don’t remove a deleted element from hash table. It must stay there with a flag indicating it is not in set. 2.Separate chaining: Entry in hash table is head of a linked list of all keys that hash to same index. Takes more space but eliminates many collisions (page 539–542).