CSE 1302 Lecture 23 Hashing and Hash Tables Richard Gesick.


Lookup tables
A lookup table is not really a search method but a way to avoid having to search at all. The key either is an integer or can be converted to one, and that integer is used as the index, so the record is accessed directly. The mapping of all valid keys to the computed indices must be unambiguous. A lookup table may waste a lot of storage space, but its running time is O(1).

Lookup tables (2)
Consider a 5-digit ZIP code: there are 100,000 possible combinations. Not all of them are used by the postal service, but a lookup table that checks whether a ZIP code is valid (has a locale) must still be sized to hold all 100,000 possible combinations.
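The ZIP code example can be sketched as a direct lookup table. This is only an illustration: the class name, the method names, and the sample ZIP codes are assumptions, not part of the lecture.

```java
// Sketch of a direct lookup table for 5-digit ZIP codes.
// ZipLookupTable, register, and isValid are hypothetical names.
class ZipLookupTable {
    // One slot per possible 5-digit code, 00000 through 99999,
    // even though many codes are never assigned.
    private final boolean[] valid = new boolean[100_000];

    // Mark a ZIP code as one that has a locale.
    void register(int zip) {
        valid[zip] = true;
    }

    // O(1): the key itself is the index, so no search is needed.
    boolean isValid(int zip) {
        return zip >= 0 && zip < valid.length && valid[zip];
    }

    public static void main(String[] args) {
        ZipLookupTable table = new ZipLookupTable();
        table.register(30144);                      // sample value only
        System.out.println(table.isValid(30144));
        System.out.println(table.isValid(12345));
    }
}
```

The array always has 100,000 slots no matter how few codes are registered, which is the wasted space the slide mentions.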

Hashing
Hashing is a method of devising a lookup table that tries to waste less space. The purpose of the hash is to map all possible keys into a smaller range of values and to cover that range with a fairly uniform distribution.

Hashing (2)
A hash table uses a hash function to convert keys into indices. Hashing has a side effect: it loses the one-to-one correspondence between keys and table positions, so two or more keys can be mapped to the same location in the table. These multiple mappings are called collisions.
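A small sketch of how a collision arises. It assumes the simplest possible hash function (the key modulo the table size), a table of 1009 slots, and non-negative integer keys; all names and numbers are chosen only for illustration.

```java
// Sketch: squeezing 100,000 possible keys into a table of 1009 slots.
// ModuloHash and index are hypothetical names; 1009 is an arbitrary prime.
class ModuloHash {
    // Maps a non-negative key into the range 0 .. tableSize - 1.
    static int index(int key, int tableSize) {
        return key % tableSize;
    }

    public static void main(String[] args) {
        int tableSize = 1009;
        // These two keys differ by exactly tableSize, so they share a slot.
        System.out.println(index(30144, tableSize));
        System.out.println(index(31153, tableSize));
    }
}
```

Any two keys that differ by a multiple of the table size land in the same slot, which is why the table also needs a collision-handling strategy.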

Hashing (3)
A hash table requires that the original key also be stored with the record. Then, when we retrieve a record from the hash table, we can verify that it is indeed the proper record. Hash table design relies on handling two elements well: choosing a good hash function and handling collisions.

Hash functions
Properties of a good hash function:
– easy to calculate
– maps all possible key values into a reasonable range
– covers the range uniformly and minimizes collisions
If the key is a string, one option is to convert its characters to their ASCII codes and combine parts of them arithmetically to produce integers from 0 to table size - 1. Too simplistic a method, such as truncating the key or applying modulo division to it, may create clusters of collisions if the incoming data is not uniformly distributed.

Hash functions (2)
There are two primary ways to resolve collisions: chaining and probing.
Chaining: each entry in the hash table is designed as a structure that can hold more than one element. Each entry of the table is referred to as a bucket. A bucket can be implemented as a linked list, a sorted array, or even a binary search tree. Chaining works well on densely populated hash tables.

Hash functions (3)
Probing: the colliding element is stored in a different slot of the same hash table. If the indexed position is already in use, a probing function is used to find a vacant slot. The simplest probing function increments the index by 1; this is known as linear probing. Probing should be used only on relatively sparsely populated tables so that the probing sequences stay fairly short.
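Linear probing can be sketched as follows. This is a minimal illustration under stated assumptions: integer keys, a fixed-capacity table, wrap-around at the end of the array, and no removal (removal needs extra bookkeeping that the slide does not cover). The class and method names are hypothetical.

```java
// Sketch of a set of int keys that resolves collisions by linear probing.
class LinearProbingSet {
    private final Integer[] slots;   // null marks a vacant slot
    private int count = 0;

    LinearProbingSet(int capacity) {
        slots = new Integer[capacity];
    }

    // The slot the hash function indexes before any probing.
    private int home(int key) {
        return Math.floorMod(key, slots.length);
    }

    // Returns true if the key was inserted, false if it was already present.
    boolean add(int key) {
        if (contains(key)) return false;
        if (count == slots.length) throw new IllegalStateException("table is full");
        int i = home(key);
        while (slots[i] != null) {
            i = (i + 1) % slots.length;   // probe: increment the index by 1
        }
        slots[i] = key;
        count++;
        return true;
    }

    // Follows the same probe sequence until the key or a vacant slot is found.
    boolean contains(int key) {
        int i = home(key);
        for (int probes = 0; probes < slots.length; probes++) {
            if (slots[i] == null) return false;
            if (slots[i] == key) return true;
            i = (i + 1) % slots.length;
        }
        return false;
    }
}
```

With a capacity of 7, the keys 3 and 10 both index slot 3, so the second one added is placed in slot 4. The fuller the table, the longer such probe sequences become.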

Hash Tables
Hashing can be used to find elements in a data structure quickly without making a linear search.
A hash table can be used to implement sets and maps.
A hash function computes an integer value (called the hash code) from an object.

Hash Tables
A hash table can be implemented as an array of buckets.
Buckets are sequences of nodes that hold elements with the same hash code.
If there are few collisions, then adding, locating, and removing hash table elements takes constant time.
– Big-Oh notation: O(1)

Hash Tables
A good hash function minimizes collisions, that is, identical hash codes for different objects.
To compute the hash code of object x: int h = x.hashCode();

Hash Tables
For this algorithm to be effective, the bucket sizes must be small.
The table size should be a prime number larger than the expected number of elements.
– An excess capacity of 30% is typically recommended
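One way to apply this sizing rule is sketched below: take the expected number of elements, add 30%, and move up to the next prime. The helper names are assumptions, and the sketch is meant for modest table sizes only.

```java
// Sketch: choose a prime table size with about 30% excess capacity.
// TableSize, isPrime, and recommended are hypothetical names.
class TableSize {
    // Trial division; adequate for the table sizes used in examples.
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Smallest prime that is at least 1.3 times the expected element count.
    static int recommended(int expectedElements) {
        int candidate = (expectedElements * 13 + 9) / 10;  // ceiling of 1.3 * n
        while (!isPrime(candidate)) {
            candidate++;
        }
        return candidate;
    }

    public static void main(String[] args) {
        System.out.println(recommended(100));
        System.out.println(recommended(1000));
    }
}
```

For 100 expected elements this gives 131, and for 1000 it gives 1301.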

Computing Hash Codes
A hash function computes an integer hash code from an object.
Choose a hash function so that different objects are likely to have different hash codes.

Computing Hash Codes
A bad choice of hash function for a string is adding the Unicode values of the characters in the string:
int h = 0; for (int i = 0; i < s.length(); i++) h = h + s.charAt(i);
– This is a poor choice because permutations ("eat" and "tea") would have the same hash code
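The additive hash can be wrapped in a method to see the problem directly. The class and method names are assumptions; the loop is the one from the slide.

```java
// Sketch: the additive string hash, which ignores character order.
class AdditiveHash {
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = h + s.charAt(i);   // add the Unicode value of each character
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(hash("eat"));   // 101 + 97 + 116
        System.out.println(hash("tea"));   // 116 + 101 + 97
    }
}
```

Both calls return 314, because addition does not depend on the order of the characters.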

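Computing Hash Codes
For example, with a better hash function the hash code of "eat" is quite different from the hash code of "tea". (The numeric values shown on the slide are not preserved in this transcript.)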
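The slide does not say which hash function produced its values. As an illustration only, the sketch below assumes the formula that the Java library documents for String.hashCode(), namely s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], and computes it by hand. The class and method names are hypothetical.

```java
// Sketch: an order-sensitive string hash using the multiplier 31,
// following the formula documented for java.lang.String.hashCode().
class StringHashDemo {
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);   // earlier characters get larger weights
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(polynomialHash("eat") + " " + "eat".hashCode());
        System.out.println(polynomialHash("tea") + " " + "tea".hashCode());
    }
}
```

Under that assumption "eat" hashes to 100184 and "tea" to 114704, so the two permutations no longer collide.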

Sample Strings and Their Hash Codes
Strings listed on the slide (the hash code column is not preserved in this transcript): "Adam", "Eve", "Harry", "Jim", "Joe", "Juliet", "Katherine", "Sue"

Simplistic Implementation of a Hash Table
To implement:
– Generate hash codes for objects
– Make an array
– Insert each object at the location of its hash code
To test if an object is contained in the set:
– Compute its hash code
– Check if the array position with that hash code is already occupied

Simplistic Implementation of a Hash Table

Problems with Simplistic Implementation
It is not possible to allocate an array that is large enough to hold all possible integer index positions.
It is possible for two different objects to have the same hash code.

Solutions
Pick a reasonable array size and reduce the hash codes to fall inside the array:
int h = x.hashCode(); if (h < 0) h = -h; h = h % size;
When elements have the same hash code:
– Use a node sequence to store multiple objects in the same array position
– These node sequences are called buckets
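The reduction on the slide negates a negative hash code before taking the remainder. One edge case is worth knowing: for the single value Integer.MIN_VALUE, negation overflows and leaves the value negative, so the remainder can be negative too. A possible alternative, offered only as a sketch, is Math.floorMod, which returns a result in the range 0 to size - 1 for any int when size is positive. The class and method names are assumptions.

```java
// Sketch: reducing any hash code to a valid bucket index.
// BucketIndex and of are hypothetical names.
class BucketIndex {
    // Returns a value in 0 .. size - 1, even for negative hash codes.
    static int of(Object x, int size) {
        return Math.floorMod(x.hashCode(), size);
    }

    public static void main(String[] args) {
        // An Integer's hash code is its own value, which makes the edge case easy to show.
        System.out.println(of(Integer.valueOf(Integer.MIN_VALUE), 101));
        System.out.println(of(Integer.valueOf(-1), 10));
    }
}
```

For negative hash codes this picks a different bucket than the negate-then-remainder approach does, which is fine as long as the same reduction is used for every operation on the table.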

Hash Table with Buckets to Store Elements with Same Hash Code

Algorithm for Finding an Object x in a Hash Table
Get the index h into the hash table:
– Compute the hash code
– Reduce it modulo the table size
Iterate through the elements of the bucket at position h:
– For each element of the bucket, check whether it is equal to x
If a match is found among the elements of that bucket, then x is in the set.
– Otherwise, x is not in the set

Hash Tables
Adding an element is a simple extension of the algorithm for finding an object:
– Compute the hash code to locate the bucket in which the element should be inserted
– Try finding the object in that bucket
– If it is already present, do nothing; otherwise, insert it

Hash Tables
Removing an element is equally simple:
– Compute the hash code to locate the bucket in which the element would be stored
– Try finding the object in that bucket
– If it is present, remove it; otherwise, do nothing
If there are few collisions, adding or removing takes O(1) time.
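The find, add, and remove steps can be put together in a small set class with buckets. This is a sketch under stated assumptions: a fixed table size, a linked list per bucket, and Math.floorMod for the index reduction. The class and method names are hypothetical, not the lecture's implementation.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Sketch of a hash set that stores colliding elements in buckets.
class ChainedHashSet {
    private final List<LinkedList<Object>> buckets = new ArrayList<>();

    ChainedHashSet(int tableSize) {
        for (int i = 0; i < tableSize; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // Compute the hash code, reduce it modulo the table size, pick the bucket.
    private LinkedList<Object> bucketFor(Object x) {
        return buckets.get(Math.floorMod(x.hashCode(), buckets.size()));
    }

    // Find: check each element of the bucket for equality with x.
    boolean contains(Object x) {
        return bucketFor(x).contains(x);
    }

    // Add: if x is already present, do nothing; otherwise insert it.
    boolean add(Object x) {
        LinkedList<Object> bucket = bucketFor(x);
        if (bucket.contains(x)) return false;
        bucket.add(x);
        return true;
    }

    // Remove: if x is present, remove it; otherwise do nothing.
    boolean remove(Object x) {
        return bucketFor(x).remove(x);
    }
}
```

Each operation touches only one bucket, so with few collisions the work per call stays small regardless of how many elements the set holds.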

Creating Hash Codes for your Classes
Use a prime number as the HASH_MULTIPLIER.
Compute the hash codes of each instance field.
For an integer instance field, just use the field value.
Combine the hash codes:
int h = HASH_MULTIPLIER * h1 + h2;
h = HASH_MULTIPLIER * h + h3;
h = HASH_MULTIPLIER * h + h4;
...
return h;
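The combining step generalizes to any number of fields. The sketch below assumes a multiplier of 31 (the slide only asks for a prime) and uses hypothetical names.

```java
// Sketch: combining per-field hash codes with a prime multiplier.
class HashCombiner {
    static final int HASH_MULTIPLIER = 31;   // assumed value; any prime fits the slide's advice

    // Starting from 0, the first pass yields h1, the second HASH_MULTIPLIER * h1 + h2, and so on.
    static int combine(int... fieldHashes) {
        int h = 0;
        for (int fieldHash : fieldHashes) {
            h = HASH_MULTIPLIER * h + fieldHash;
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(combine(1, 2));      // 31 * 1 + 2
        System.out.println(combine(1, 2, 3));   // 31 * (31 * 1 + 2) + 3
    }
}
```

Because each step multiplies the running value before adding the next field, swapping two field values generally changes the result: combine(1, 2) is 33 while combine(2, 1) is 63.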

Creating Hash Codes for your Classes
Your hashCode method must be compatible with the equals method:
– if x.equals(y), then x.hashCode() == y.hashCode()

Creating Hash Codes for your Classes
You get into trouble if your class defines an equals method but not a hashCode method:
– If we forget to define a hashCode method for a class, it inherits the method from the Object superclass
– That method computes a hash code from the memory location of the object
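A sketch of a class that defines both methods compatibly. The Coin class, its fields, and the multiplier 29 are assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: equals and hashCode defined over the same fields.
class Coin {
    private static final int HASH_MULTIPLIER = 29;   // assumed prime
    private final String name;
    private final int cents;

    Coin(String name, int cents) {
        this.name = name;
        this.cents = cents;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Coin)) return false;
        Coin c = (Coin) other;
        return name.equals(c.name) && cents == c.cents;
    }

    // Uses exactly the fields that equals compares, so equal coins get equal hash codes.
    @Override
    public int hashCode() {
        return HASH_MULTIPLIER * name.hashCode() + cents;
    }

    public static void main(String[] args) {
        Set<Coin> coins = new HashSet<>();
        coins.add(new Coin("dime", 10));
        // A distinct but equal object is found because both methods agree.
        System.out.println(coins.contains(new Coin("dime", 10)));
    }
}
```

If the hashCode override were deleted, two equal coins would usually receive different inherited hash codes, so the lookup would search the wrong bucket and the set would most likely report that the coin is missing.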

Hash Maps
In a hash map, only the keys are hashed.
The keys need compatible hashCode and equals methods.
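A short sketch using java.util.HashMap with String keys, which already have compatible hashCode and equals methods. The class name and the sample data are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: only the keys decide where an entry is stored and how it is found.
class FavoriteColors {
    static Map<String, String> build() {
        Map<String, String> favorites = new HashMap<>();
        favorites.put("Juliet", "pink");
        favorites.put("Adam", "red");
        favorites.put("Eve", "blue");
        favorites.put("Adam", "green");   // same key: the old value is replaced
        return favorites;
    }

    public static void main(String[] args) {
        Map<String, String> favorites = build();
        System.out.println(favorites.get("Adam"));
        System.out.println(favorites.size());
    }
}
```

The values play no part in hashing: putting a second value under "Adam" finds the existing entry through the key and overwrites it, leaving three entries.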

Priority Queues
A priority queue collects elements, each of which has a priority.
Example: a collection of work requests, some of which may be more urgent than others.
When removing an element, the element with the highest priority is retrieved.
– It is customary to give low values to high priorities, with priority 1 denoting the highest priority
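A sketch of the work-request example using java.util.PriorityQueue, which removes the smallest element first and so matches the convention that priority 1 is the most urgent. The WorkOrder class and the sample requests are assumptions.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Sketch: work requests retrieved in order of urgency.
class WorkOrder {
    final int priority;          // 1 denotes the highest priority
    final String description;

    WorkOrder(int priority, String description) {
        this.priority = priority;
        this.description = description;
    }

    // A queue ordered by the priority field, lowest value first.
    static PriorityQueue<WorkOrder> newQueue() {
        return new PriorityQueue<>(Comparator.comparingInt((WorkOrder w) -> w.priority));
    }

    public static void main(String[] args) {
        PriorityQueue<WorkOrder> queue = newQueue();
        queue.add(new WorkOrder(3, "Shampoo carpets"));
        queue.add(new WorkOrder(1, "Fix overflowing sink"));
        queue.add(new WorkOrder(2, "Order cleaning supplies"));
        while (!queue.isEmpty()) {
            WorkOrder next = queue.poll();   // removes the most urgent request
            System.out.println(next.priority + " " + next.description);
        }
    }
}
```

The requests come out in the order 1, 2, 3 regardless of the order in which they were added.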