Appendix I Hashing. Chapter Scope Hashing, conceptually Using hashes to solve problems Hash implementations Java Foundations, 3rd Edition, Lewis/DePasquale/Chase21.

Appendix I Hashing

Chapter Scope
- Hashing, conceptually
- Using hashes to solve problems
- Hash implementations

Hashing
- In previous collections, the location of an element in a collection was either:
  - determined by the order in which elements were added (examples)
  - determined by comparing some key related to the element (examples)
- In hashing, elements are stored in the collection at a location determined by applying a hash function to the value to be stored.
  - That is, the elements are stored in a hash table, with their location determined by a hashing function.
  - Each location is called a cell or a bucket.

Ideally
- In an ideal world, each value would be hashed to a unique address in a 1-to-1 fashion.
- If this were the case, the time to access or store data in a hash table would be O(1).
- Factors that prevent this:
  - A less than perfect hash function
  - Limitations on the size of the address space

Example
- Consider an array that holds 26 elements.
- To store names, we create a simple hashing function that maps the first letter of each name to its own cell, so the first letter of the string determines the cell in which the name is stored.
- Because the access time to a particular element is independent of the number of elements stored, all operations would be O(1).
- But this requires that each element map to a unique position. If this is achieved, we have what is called a perfect hashing function.
- Using our example, under what circumstances would this be a perfect hashing function? Is this realistic?

Less than Perfect
- A collision occurs when two or more elements map to the same location.
  - In our example, two names that begin with the same letter collide.
- Collisions have to be resolved somehow.
  - There are several techniques for storing multiple elements that map to the same bucket, which we look at later.
- Even if a hashing function isn't perfect, a good hashing function can still result in O(1) operations on average.

Hash Table Size
- How large should the table be?
- If we have a dataset of size n and a perfect hashing function, we'd need a table of size n.
- Without a perfect hashing function, a good guideline is to make the table 150% of the dataset size.
- But what if we do not know the size of the dataset?
  - We then rely on dynamic resizing: creating a larger hash table when the demand for space occurs.

Dynamic Resizing
- Deciding when to resize is key.
- One possibility: resize when the table is full.
  - But the performance of a hash table seriously degrades as it becomes full.
- A better approach is to use a load factor: a percentage of occupancy at which the table will be resized.
  - For example, if the load factor is 0.5, the table is resized when 50% of it is filled.
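The load-factor rule can be sketched as a single comparison. This is a minimal illustration, not the textbook's code: the class name `LoadFactorDemo`, the method `needsResize`, and the choice to trigger as soon as occupancy reaches (rather than exceeds) the load factor are all assumptions.

```java
final class LoadFactorDemo {
    /**
     * Returns true when the fraction of occupied cells has reached the load factor.
     * Assumption: "reached" (>=) is the trigger; an implementation could use > instead.
     */
    static boolean needsResize(int count, int capacity, double loadFactor) {
        return count >= capacity * loadFactor;
    }

    public static void main(String[] args) {
        System.out.println(needsResize(4, 10, 0.5)); // 40% occupied: false
        System.out.println(needsResize(5, 10, 0.5)); // 50% occupied: true
    }
}
```

How much larger the new table should be is a separate design choice that the slide leaves open.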

Hashing Functions
- There are many good approaches to hashing functions.
  - The method used in the name example is extraction: part of an element's key value is used to compute the location.

Hashing Function Examples: Extraction
- Use only a part of the element's value or key to compute the location at which to store the element.
- Example on page 1007: extract the first character of the value and calculate its offset from the letter 'A' to determine its location.
  - 'A' maps to 0, 'B' maps to 1, etc.
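The first-letter extraction described above might look like the following sketch. The class and method names are hypothetical, and two details are assumptions that the slide does not specify: lowercase letters are folded to uppercase, and names that are empty or do not start with A-Z are rejected.

```java
final class ExtractionHash {
    /**
     * Maps a non-empty name to a cell 0..25 using only its first letter
     * ('A' -> 0, 'B' -> 1, ...). Case folding and the range check are assumptions.
     */
    static int firstLetterIndex(String name) {
        char first = Character.toUpperCase(name.charAt(0));
        if (first < 'A' || first > 'Z') {
            throw new IllegalArgumentException("name must start with A-Z: " + name);
        }
        return first - 'A';
    }

    public static void main(String[] args) {
        for (String name : new String[] {"Ann", "Bob", "Adam"}) {
            System.out.println(name + " -> cell " + firstLetterIndex(name));
        }
    }
}
```

"Ann" and "Adam" land in the same cell, which is exactly the collision the earlier slides warn about.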

Hashing Function Examples: Division
- The index is calculated as the remainder of the key divided by some positive integer p.
- For a positive integer p, the result is in the range 0 to p-1.
  - Hashcode(key) = Math.abs(key) % p
- Since this yields location indices 0 to p-1, we use the table size as p.
- Example: apply the hash function to a key value of 79 with a table size of 43.
  - Hash table index = Math.abs(79) % 43 = 36
- Good idea: using a prime number p as the table size (that is, as the divisor) can provide a better distribution of keys across the address space 0 to p-1.
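A small sketch of the division method, using the slide's formula directly. The class name `DivisionHash` is hypothetical. The second method is an addition, not from the slides: `Math.abs(Integer.MIN_VALUE)` is still negative in Java, so `Math.floorMod` is one way to keep the index in range for every `int` key (for negative keys it may pick a different, but still valid, index than the `Math.abs` form).

```java
final class DivisionHash {
    /** The slide's formula: Math.abs(key) % p, with the table size used as p. */
    static int hash(int key, int tableSize) {
        return Math.abs(key) % tableSize;
    }

    /** Variant (an assumption, not from the slides) that stays in 0..p-1 for all int keys. */
    static int safeHash(int key, int tableSize) {
        return Math.floorMod(key, tableSize);
    }

    public static void main(String[] args) {
        System.out.println(hash(79, 43));     // 36, as in the slide's example
        System.out.println(safeHash(79, 43)); // 36
    }
}
```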

Hashing Function Examples: Folding
- The index is created by dividing the key into parts and then combining, or folding, the parts together.
- In general, the parts have the same length as the desired index, except perhaps the last part. If the first folding does not result in an index within the desired range, then use either extraction or division to yield a smaller index.
- Shift folding: the parts are added together to create the index. The worked example on the slide yields 1962.
- Boundary folding: a slight variation of shift folding in which some of the parts of the key are reversed before adding. The worked example on the slide yields 1764.
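Both folding variants can be sketched as below. The key digits did not survive in this transcript, so the key `987654321` used here is an assumption, chosen because its three-digit parts reproduce the slide's results of 1962 and 1764. Reversing every second part in boundary folding is also an assumption; the slide only says that "some of the parts" are reversed. Class and method names are hypothetical.

```java
final class FoldingHash {
    /** Shift folding: split the digits into groups of partLength and add the groups. */
    static int shiftFold(String digits, int partLength) {
        int sum = 0;
        for (int i = 0; i < digits.length(); i += partLength) {
            int end = Math.min(i + partLength, digits.length());
            sum += Integer.parseInt(digits.substring(i, end));
        }
        return sum;
    }

    /** Boundary folding: same as shift folding, but every second group is reversed first. */
    static int boundaryFold(String digits, int partLength) {
        int sum = 0;
        boolean reverse = false;
        for (int i = 0; i < digits.length(); i += partLength) {
            int end = Math.min(i + partLength, digits.length());
            String part = digits.substring(i, end);
            if (reverse) {
                part = new StringBuilder(part).reverse().toString();
            }
            sum += Integer.parseInt(part);
            reverse = !reverse;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(shiftFold("987654321", 3));    // 987 + 654 + 321 = 1962
        System.out.println(boundaryFold("987654321", 3)); // 987 + 456 + 321 = 1764
    }
}
```

If the folded sum is larger than the table, it would still need to be reduced, for example with the division method.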

Hashing Function Examples: Mid-Square Method
- The index is calculated by multiplying the key by itself and then using the extraction method on the middle of the product.
- For example, for a key of 4321:
  - Product = 4321 * 4321 = 18671041
  - Extract three digits from the middle: 710
- It is important that the same three digit positions be extracted each time.
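A sketch of the mid-square method that reproduces the slide's example. The class and method names are hypothetical. Which digits count as "the middle" is an assumption: this version always takes the hundreds through ten-thousands digits of the product, which matches the slide's 710 and keeps the extracted positions fixed.

```java
final class MidSquareHash {
    /** Squares the key using long arithmetic so the product cannot overflow. */
    static long square(int key) {
        return (long) key * key;
    }

    /**
     * Drops the two lowest digits of the square, then keeps the next three.
     * Assumption: these fixed positions are "the middle" for keys of this size.
     */
    static int midSquare(int key) {
        return (int) ((square(key) / 100) % 1000);
    }

    public static void main(String[] args) {
        System.out.println(square(4321));    // 18671041
        System.out.println(midSquare(4321)); // 710
    }
}
```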

More Functions
- Review the rest of these functions. In class, we go to slide 18.

Hashing Function Examples: Radix Transformation Method
- Transform the key into another numeric base.
  - If our key is 23 in base 10, we might convert it to 32 in base 7.
- Then use the division method: divide the converted key by the table size and use the remainder as the index.
- Example: Hashcode(23)
  - Convert the key 23 from base 10 to base 7: 23 in base 10 is 32 in base 7.
  - Use the division method to convert to an index: Hash table index = Math.abs(32) % 17 = 15
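The two steps of the radix transformation can be sketched as follows. Names are hypothetical. The sketch assumes a non-negative key and a target base between 2 and 10, so that the converted digits can be reread as an ordinary decimal number; very large keys in small bases would overflow `int` and are out of scope here.

```java
final class RadixHash {
    /**
     * Writes the key in the given base, rereads those digits as a base-10 number,
     * then applies the division method with the table size.
     * Assumptions: key >= 0, 2 <= radix <= 10, and the converted digits fit in an int.
     */
    static int hash(int key, int radix, int tableSize) {
        if (key < 0 || radix < 2 || radix > 10) {
            throw new IllegalArgumentException("unsupported key or radix");
        }
        int converted = Integer.parseInt(Integer.toString(key, radix));
        return converted % tableSize;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toString(23, 7)); // 32
        System.out.println(hash(23, 7, 17));         // 32 % 17 = 15
    }
}
```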

Hashing Functions: Digit Analysis
- In the digit analysis method, the index is formed by extracting and then manipulating specific digits from the key.
- If the digits in positions 2 through 4 of the key are 2, 3, and 4, selecting them yields 234.
- The manipulation could then take many forms:
  - reversing the digits (432)
  - performing a circular shift (423)
  - swapping each pair of digits (324)
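The three manipulations can be tried out with a short sketch. The key itself is missing from this transcript, so `1234567` is a hypothetical key whose digits in positions 2 through 4 (counting from 1) are 234. All class and method names are assumptions, and the circular shift is taken to be a right rotation by one digit because that is what turns 234 into 423.

```java
final class DigitAnalysis {
    /** Selects digits from position 'from' through 'to', counting from 1, inclusive. */
    static String select(String digits, int from, int to) {
        return digits.substring(from - 1, to);
    }

    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    /** Circular shift right by one: the last digit moves to the front. */
    static String rotateRight(String s) {
        return s.charAt(s.length() - 1) + s.substring(0, s.length() - 1);
    }

    /** Swaps each adjacent pair of digits; a trailing unpaired digit stays in place. */
    static String swapPairs(String s) {
        char[] c = s.toCharArray();
        for (int i = 0; i + 1 < c.length; i += 2) {
            char tmp = c[i];
            c[i] = c[i + 1];
            c[i + 1] = tmp;
        }
        return new String(c);
    }

    public static void main(String[] args) {
        String picked = select("1234567", 2, 4); // hypothetical key
        System.out.println(picked);              // 234
        System.out.println(reverse(picked));     // 432
        System.out.println(rotateRight(picked)); // 423
        System.out.println(swapPairs(picked));   // 324
    }
}
```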

Hashing Functions: Length-Dependent Method
- In the length-dependent method, the key and the length of the key are combined in some way to form either the index itself or an intermediate version.
- If our key is 8765, we might multiply the first two digits by the length (87 * 4 = 348) and then divide by the last digit (348 / 5, using integer division), yielding 69.
- If our table size is 43, we would then use the division method to yield an index of 69 % 43 = 26.
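The arithmetic in this example can be checked with a short sketch. The names are hypothetical, and the code assumes a positive key with at least two digits whose last digit is not zero (otherwise the division would fail).

```java
final class LengthDependentHash {
    /**
     * Intermediate value: (first two digits * number of digits) / last digit,
     * using integer division. Assumes key >= 10 and a non-zero last digit.
     */
    static int intermediate(int key) {
        String digits = Integer.toString(key);
        int firstTwo = Integer.parseInt(digits.substring(0, 2));
        int lastDigit = digits.charAt(digits.length() - 1) - '0';
        return firstTwo * digits.length() / lastDigit;
    }

    /** Reduces the intermediate value to a table index with the division method. */
    static int hash(int key, int tableSize) {
        return intermediate(key) % tableSize;
    }

    public static void main(String[] args) {
        System.out.println(intermediate(8765)); // 87 * 4 / 5 = 69
        System.out.println(hash(8765, 43));     // 69 % 43 = 26
    }
}
```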

Java Hashing Functions
- The java.lang.Object hashCode method returns an integer that is typically based on the memory location of the object.
  - This is generally not useful, but it ensures that all objects have a hashCode method.
- A class may override the inherited version of hashCode to provide its own.
- The String and Integer classes define their own hashCode methods.
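As an illustration of overriding the inherited method, here is a minimal sketch with a hypothetical `GridPoint` class. It uses `java.util.Objects.hash` to combine the fields, which is one common choice rather than the only one, and it overrides `equals` as well, because equal objects are expected to produce equal hash codes.

```java
import java.util.Objects;

final class GridPoint {
    private final int x;
    private final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof GridPoint)) {
            return false;
        }
        GridPoint p = (GridPoint) other;
        return x == p.x && y == p.y;
    }

    /** Based on field values, so equal points hash alike regardless of identity. */
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }

    public static void main(String[] args) {
        System.out.println(new GridPoint(1, 2).hashCode() == new GridPoint(1, 2).hashCode()); // true
        System.out.println("abc".hashCode());              // 96354, computed from the characters
        System.out.println(Integer.valueOf(42).hashCode()); // 42, the int value itself
    }
}
```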

Resolving Collisions
- As mentioned, without a perfect hashing function, collisions must be resolved.
- There are several techniques for this as well:
  - Chaining: treat the table as an array of linked lists
  - Open addressing:
    - linear probing
    - quadratic probing
    - double hashing

Chaining with Links or Overflow Area

Open Addressing
- The open addressing method looks for another unused position in the table.
  - The simplest approach is linear probing: if an element hashes to position p and that position is occupied, try position (p+1)%s, where s is the size of the table.
- One problem with linear probing is the development of clusters of occupied cells.
- There are other approaches to open addressing:
  - quadratic probing
  - double hashing
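Linear probing can be sketched as a small insert routine. This is a simplified illustration under stated assumptions: the class name is hypothetical, keys are plain `int` values, the home position comes from the division method, duplicates are not detected, and there is no resizing or deletion.

```java
final class LinearProbingTable {
    private final Integer[] cells; // null marks an unused cell

    LinearProbingTable(int size) {
        cells = new Integer[size];
    }

    /** Inserts the key and returns the index where it was stored. */
    int insert(int key) {
        int s = cells.length;
        int p = Math.floorMod(key, s); // home position
        for (int tries = 0; tries < s; tries++) {
            if (cells[p] == null) {
                cells[p] = key;
                return p;
            }
            p = (p + 1) % s; // occupied: try the next cell, wrapping around
        }
        throw new IllegalStateException("table is full");
    }

    public static void main(String[] args) {
        LinearProbingTable table = new LinearProbingTable(7);
        System.out.println(table.insert(10)); // 3 (home position)
        System.out.println(table.insert(17)); // 4 (3 is occupied)
        System.out.println(table.insert(24)); // 5 (3 and 4 are occupied)
        System.out.println(table.insert(6));  // 6 (home position)
        System.out.println(table.insert(13)); // 0 (6 is occupied, wraps around)
    }
}
```

Keys 10, 17, and 24 all hash to position 3 and end up in adjacent cells, a small instance of the clustering problem noted above.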

Deleting Elements
- Removing an element from a chained implementation falls into one of five cases:
  1. If the element is uniquely mapped, simply remove it.
  2. If the element is stored in the table but has an index into an overflow area, replace the element and the next index value in the table with the element and next index value of the array position pointed to by the element to be removed.
  3. If the element is at the end of the list of elements, set its position to null, set the next pointer of the previous element to null, and add that position to the overflow.
  4. If the element is in the middle of the list, set its position in the overflow to null, and reset the pointer of the previous element to skip it.
  5. If the element is not in the table, throw an exception.
- When using open addressing, deletion is more of a challenge.

Java Collections Hash Tables
- The Java Collections API provides seven implementations of hashing. Three of these are:
  - Hashtable: key-value pairs; the oldest class; synchronized.
  - HashMap: key-value pairs; unsynchronized; permits null values.
  - HashSet: unique values only; unsynchronized; permits null values.
- Note: the chaining method is used to resolve collisions.
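A brief sketch of how the three classes behave in practice. The demo class and its helper methods are hypothetical; the `java.util` classes and methods they call are standard. It shows that `HashSet` keeps only unique values and that `HashMap` accepts a null value while `Hashtable` rejects one with a `NullPointerException`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Hashtable;
import java.util.Map;
import java.util.Set;

final class CollectionsHashingDemo {
    /** Counts distinct values by loading them into a HashSet. */
    static int distinctCount(String... values) {
        Set<String> set = new HashSet<>(Arrays.asList(values));
        return set.size();
    }

    /** Tries to store a null value and reports whether the map accepted it. */
    static boolean acceptsNullValue(Map<String, Integer> map) {
        try {
            map.put("key", null);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(distinctCount("Ann", "Bob", "Ann"));                // 2
        System.out.println(acceptsNullValue(new HashMap<String, Integer>()));   // true
        System.out.println(acceptsNullValue(new Hashtable<String, Integer>())); // false
    }
}
```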

The Hashtable Class

The Hashtable Class (cont)

20.5 – The HashSet Class

20.5 – The HashMap Class