COMP 103 Hashing 2013-T2 Lecture 28 Thomas Kuehne School of Engineering and Computer Science, Victoria University of Wellington  Marcus Frean, Lindsay.

Presentation transcript:

COMP 103 Hashing 2013-T2 Lecture 28 Thomas Kuehne School of Engineering and Computer Science, Victoria University of Wellington  Marcus Frean, Lindsay Groves, Peter Andreae and Thomas Kuehne, VUW

2 RECAP-TODAY
RECAP
- Linked structures, including trees and heaps, achieved perfect O(log n) insert/find performance
TODAY
- Mind-blowingly fast sorting
- O(1) insert/find performance!

3 Linear Time Sorting Algorithm
- Constant time per entry to sort

    // Counts how often each value occurs; assumes every value lies in 0..6.
    private int[] hashSort(int[] numbers) {
        int[] present = new int[7];
        for (int i = 0; i < numbers.length; i++)
            present[numbers[i]]++;
        return present;
    }

- Limitations
  - elements must be integers
  - element value range must be limited
  - frequency data structure may be sparsely populated
(Figure: the arrays "numbers" and "present"; cf. BucketSort)
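The frequency array above only counts values; reading the counts back in index order gives the sorted sequence. The following is a minimal sketch of that second step, assuming all values lie in 0..maxValue. The class name, method name and the maxValue parameter are illustrative and do not come from the slides.

```java
/** Illustrative sketch: sorting small non-negative ints by counting occurrences. */
final class CountingSortSketch {

    /** Returns a sorted copy of numbers; every value must lie in 0..maxValue. */
    static int[] sort(int[] numbers, int maxValue) {
        int[] present = new int[maxValue + 1];   // present[v] = how often v occurs
        for (int n : numbers) {
            present[n]++;                        // constant time per entry
        }
        int[] sorted = new int[numbers.length];
        int next = 0;
        for (int value = 0; value < present.length; value++) {
            for (int c = 0; c < present[value]; c++) {
                sorted[next++] = value;          // emit each value as often as it was seen
            }
        }
        return sorted;
    }
}
```

The total cost is proportional to numbers.length + maxValue, which is why the value range must be limited for this to pay off.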

4 Hashing
- Fixing the limitations
  - convert element into an integer
  - use a hash function to assign an integer to an element
- Potential
  - Set, Bag, Maps with constant time insert / find!
- Challenges
  - how to compute the hash code?
  - how to deal with collisions?

5 O(1) Sets with big values?
- We need a way to compute an index for an object: add(“2001 – A Space Odyssey”)
- “Hashing”: compute the “hash code” of an object
(Figure: “2001 – A Space Odyssey” → hash function → 581, an index into an array of ✔/✗ entries of size N.)

6 O(1) Sets with big values?
- But there are too many possible film titles!
- Suppose the hash function always produces a number between 0 and 1000
  ⇒ some film titles must end up with the same number!
  ⇒ “Collision”
(Figure: “Gravity” and “2001 – A Space Odyssey” both pass through HASH into the array of ✔/✗ entries of size N.)

7 Detecting collisions
- Store the item in the array, instead of a boolean
- Questions
  1. How to choose a hash function that minimises collisions?
  2. How to manage collisions when they occur?
(Figure: “Gravity” and “2001 – A Space Odyssey” pass through HASH into an array of size N that stores the items themselves.)

8 A HashSet

    private E[] data;

    public boolean contains(E value) {
        int hash = Math.abs(value.hashCode() % data.length);
        if (data[hash] == null) return false;
        else if (data[hash].equals(value)) return true;
        else // Collision !!!
    }

    public boolean add(E value) {
        int hash = Math.abs(value.hashCode() % data.length);
        if (data[hash] == null) {
            data[hash] = value;
            size++;
            return true;
        }
        else if (data[hash].equals(value)) return false;
        else // Collision !!!
    }

- Cost is independent of the number of items in the Set
- Cost is determined by the cost of hashCode()
- hashCode() and equals(): every class defines these methods
- They must be consistent: a.equals(b) ⇒ a.hashCode() == b.hashCode()

9 Computing Hash Codes
Wish list / summary for a hashCode function
- Should produce an integer
- Should distribute the hash codes evenly through the range (minimises collisions)
- Should be fast to compute
- Should take account of all components of the object
- Must be consistent with equals(): two items that are equal must have the same hash value
Can we avoid clashes altogether? That would be perfect! (a “perfect hash function”)

10 A Simple Hash Function for Strings
- We could add up the codes of all the characters:

    private int hash(String value) {
        int hashCode = 0;
        for (int i = 0; i < value.length(); i++)
            hashCode += value.charAt(i);
        return hashCode;
    }

Why is this not very good?

11 Example: Hashing course codes
418 ← DEAF ← DEAF102 DEAF201
⋮
429 ← BBSC201 MDIA ← ECHI410 MDIA102 MDIA ← ECHI303 JAPA111 JAPA201 MDIA202 MDIA220 MDIA ← ARCH101 ASIA101 BBSC231 BBSC303 BBSC321 CHEM201 ECHI403 ECHI412 JAPA112 JAPA211 JAPA301 MDIA203 MDIA302 MDIA320
⋮
450 ← ANTH412 ARCH389 ARTH111 BIOL228 BIOL327 BIOL372 CHEM489 COML304 COML403 COML421 COMP102 COMP201 CRIM313 CRIM421 DESN215 DESN233 ECON328 ECON409 ECON418 ECON508 EDUC449 EDUC458 EDUC548 EDUC557 ENGL228 ENGL408 ENGL426 ENGL435 ENGL444 ENGL453 FREN124 FREN331 FREN403 FREN412 GEOL362 GEOL407 GERM214 GERM403 GERM412 INFO213 INFO312 INFO402 ITAL206 ITAL215 LALS501 LATI404 LING224 LING323 LING404 MAOR102 MARK304 MARK403 MATH206 MATH314 MATH323 MATH431 MOFI403 PHIL104 PHIL203 PHIL302 PHIL320 PHIL401 PHIL410 RELI321 RELI411 SAMO101
⋮
a lot of collisions!
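The collisions listed for 450 can be reproduced with the additive hash: any two codes with the same letters and digits in a different order get the same sum. A small sketch follows; the class name and the groupByHash helper are illustrative additions, not part of the slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch: why summing character codes collides so often. */
final class AdditiveHashDemo {

    /** The simple hash from the slides: the sum of all character codes. */
    static int additiveHash(String value) {
        int hashCode = 0;
        for (int i = 0; i < value.length(); i++) {
            hashCode += value.charAt(i);
        }
        return hashCode;
    }

    /** Groups course codes by their additive hash value (hypothetical helper). */
    static Map<Integer, List<String>> groupByHash(List<String> codes) {
        Map<Integer, List<String>> buckets = new TreeMap<>();
        for (String code : codes) {
            buckets.computeIfAbsent(additiveHash(code), k -> new ArrayList<>()).add(code);
        }
        return buckets;
    }
}
```

For example, additiveHash("COMP102") and additiveHash("COMP201") both give 450, because the order of the characters does not affect the sum.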

12 Better Hash Functions
- Make the contribution of each character depend on its position:

    private int hash(String course) {
        int k = 257;
        int hashCode = 0;
        for (int i = 0; i < course.length(); i++)
            hashCode = hashCode * k + course.charAt(i);
        return hashCode;
    }

hashCode(s) = k^6 × s_0 + k^5 × s_1 + k^4 × s_2 + k^3 × s_3 + k^2 × s_4 + k^1 × s_5 + s_6
(it is best to use a prime number for the constant k)
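With the positional hash, reordering characters changes the result, so codes such as COMP102 and COMP201 no longer share a hash value. The sketch below repeats the slide's function and adds a hypothetical indexFor helper that uses Math.floorMod to turn a possibly negative hash code (int arithmetic wraps on overflow) into a valid array index; that helper is an assumption, not something the slides prescribe.

```java
/** Illustrative sketch: position-dependent string hash with k = 257. */
final class PositionalHashDemo {

    /** The hash from the slide: each step multiplies by k, then adds the next character. */
    static int hash(String course) {
        int k = 257;
        int hashCode = 0;
        for (int i = 0; i < course.length(); i++) {
            hashCode = hashCode * k + course.charAt(i);   // int overflow wraps around
        }
        return hashCode;
    }

    /** Maps the hash code to an index in 0..tableSize-1, even if the code is negative. */
    static int indexFor(String course, int tableSize) {
        return Math.floorMod(hash(course), tableSize);
    }
}
```

For a two-character string the loop computes k × s_0 + s_1, so hash("AB") is 257 × 65 + 66 = 16771.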

13 Perfect Hash Functions
- Perfect hash function gives no collisions for a given data set
- Example - for VUW courses

    private int hash(String course) {
        int hash = 0;
        for (int i = 0; i < course.length(); i++)
            hash = (hash * 51 + course.charAt(i)) % 72201;
        return hash;
    }

- Building a perfect hash function is
  - very difficult
  - very specific to a particular set of possible values
  - only useful in very specialised circumstances
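Whether a hash function is perfect depends entirely on the key set, so the claim can only be checked against a concrete list of keys. A minimal sketch of such a check is given here; the isCollisionFree helper and the class name are hypothetical, and the full list of VUW course codes is not reproduced.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.function.ToIntFunction;

/** Illustrative sketch: testing a hash function for collisions on a given key set. */
final class PerfectHashCheck {

    /** The hash function shown on the slide. */
    static int slideHash(String course) {
        int hash = 0;
        for (int i = 0; i < course.length(); i++) {
            hash = (hash * 51 + course.charAt(i)) % 72201;
        }
        return hash;
    }

    /** True if no two distinct keys receive the same hash value. */
    static boolean isCollisionFree(Collection<String> keys, ToIntFunction<String> hash) {
        Set<Integer> seen = new HashSet<>();
        for (String key : new HashSet<>(keys)) {        // ignore repeated keys
            if (!seen.add(hash.applyAsInt(key))) {
                return false;                           // two different keys share a value
            }
        }
        return true;
    }
}
```

Calling isCollisionFree(allCourseCodes, PerfectHashCheck::slideHash) on the complete key set would confirm or refute perfection for that set; adding a single new course code can break it.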

14 Dealing with Collisions
- Two approaches
  - Use a collection at each place (“buckets” or “chaining”)
  - Look for an empty place in the hashtable (“probing” or “open addressing”)
(Figure: “2001 – A Space Odyssey” and “Gravity” both pass through HASH into an array of size N.)
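The second approach can be sketched with linear probing, one common form of open addressing: on a collision, try the next cell, wrapping around at the end of the array. This is a simplified illustration under stated assumptions (String elements, fixed capacity, no removal, no resizing); the class and method names are not from the slides.

```java
/** Illustrative sketch: a fixed-size set of Strings using linear probing. */
final class ProbingSetSketch {
    private final String[] data;
    private int size;

    ProbingSetSketch(int capacity) {
        data = new String[capacity];
    }

    /** Index holding value, or the first empty cell met while probing; -1 if the table is full. */
    private int findSlot(String value) {
        int start = Math.floorMod(value.hashCode(), data.length);
        for (int step = 0; step < data.length; step++) {
            int i = (start + step) % data.length;   // next cell, wrapping around
            if (data[i] == null || data[i].equals(value)) {
                return i;
            }
        }
        return -1;
    }

    boolean contains(String value) {
        int i = findSlot(value);
        return i >= 0 && data[i] != null;
    }

    boolean add(String value) {
        int i = findSlot(value);
        if (i < 0) throw new IllegalStateException("table is full");
        if (data[i] != null) return false;          // already present
        data[i] = value;
        size++;
        return true;
    }

    int size() {
        return size;
    }
}
```

Removal is left out on purpose: simply clearing a cell would break the probe sequence for items stored after it, so real implementations need extra bookkeeping.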

15 Collisions: chaining / buckets
- Store a Set in each cell: hash value → which set
- Performance?
  - if the array is of size k, each subset will be about 1/k th of size()
  - cost ≈ cost(hashCode) + cost(subset)
- This is what Java's HashMap does.
- If the sets get too big: rehash (double the array size and reassign the elements)
(Figure: buckets containing ant, fox, hen, dog, bee, kea, cow, elk, owl, pig, sow, tui, ape, bat, bug, cat, eel, gnu, jay, nit, ray, yak, cod, roe)
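A minimal sketch of chaining with rehashing, assuming String elements, list-based buckets, an initial size of 4 and a rehash threshold of two items per bucket on average. These numbers and all names are illustrative choices; they are not the policy of Java's HashMap, which uses its own load factor.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch: a set of Strings using one bucket (list) per array cell. */
final class ChainedSetSketch {
    private List<List<String>> buckets = newBuckets(4);
    private int size;

    private static List<List<String>> newBuckets(int n) {
        List<List<String>> b = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            b.add(new ArrayList<>());
        }
        return b;
    }

    /** The hash value selects which bucket to look in. */
    private List<String> bucketFor(String value) {
        return buckets.get(Math.floorMod(value.hashCode(), buckets.size()));
    }

    boolean contains(String value) {
        return bucketFor(value).contains(value);    // cost(hashCode) + cost(bucket search)
    }

    boolean add(String value) {
        if (contains(value)) return false;
        if (size >= 2 * buckets.size()) rehash();   // assumed threshold: average bucket size 2
        bucketFor(value).add(value);
        size++;
        return true;
    }

    /** Doubles the number of buckets and reassigns every element. */
    private void rehash() {
        List<List<String>> old = buckets;
        buckets = newBuckets(old.size() * 2);
        for (List<String> bucket : old) {
            for (String s : bucket) {
                bucketFor(s).add(s);
            }
        }
    }

    int size() { return size; }
    int bucketCount() { return buckets.size(); }
}
```

Because the bucket count doubles each time, the average bucket stays short, which keeps the bucket search part of the cost small.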

16 Java and hashCode
- All objects have a hashCode method and an equals method, so:
  - you can call equals on any object
  - and you can put any object into a HashSet, HashMap, …
- Many predefined objects (e.g. String) have good equals and hashCode methods defined
- The default equals method:
  - compares references, i.e., equals is ==
  - if this is not what you want, define your own equals method
- The default hashCode
  - returns an integer based on the reference (pointer value)
- If you redefine equals, you should redefine hashCode too!
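A short sketch of redefining both methods together, using a hypothetical CourseKey class (not from the slides). The hash code is derived from the same fields that equals compares, so equal objects always get equal hash codes.

```java
import java.util.Objects;

/** Illustrative value class: two keys are equal if subject and number match. */
final class CourseKey {
    private final String subject;   // e.g. "COMP"
    private final int number;       // e.g. 103

    CourseKey(String subject, int number) {
        this.subject = subject;
        this.number = number;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof CourseKey)) return false;
        CourseKey c = (CourseKey) other;
        return number == c.number && subject.equals(c.subject);
    }

    @Override
    public int hashCode() {
        return Objects.hash(subject, number);   // same fields as equals
    }
}
```

With both methods defined, a HashSet containing new CourseKey("COMP", 103) reports contains(new CourseKey("COMP", 103)) as true. With only equals redefined, the two objects would usually land in different cells and the lookup would fail.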