Hashing

1. Def. Hash table - an array in which items are inserted according to a key value (i.e., the key value is used to determine the index of the item). Ex. Student records stored in an array where each student is assigned an ID number and that number is used as the index. Are there any problems with this idea? Gaps develop when students leave, and insertions of new students are limited by the original size of the array. Having to know a student's ID number is also inconvenient. Using the index itself as the key field is not efficient.
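
The student-record idea above can be sketched as a direct-access array in which the ID number is the index. This is a minimal illustration, not code from the slides; the class name StudentTable and its methods are hypothetical. It makes the two problems visible: the array size is fixed at creation, and a removed student leaves an empty cell behind.

```java
// Hypothetical sketch: student names stored directly at index = student ID.
class StudentTable {
    private final String[] names; // index = student ID number

    StudentTable(int maxId) {
        names = new String[maxId + 1]; // size is fixed when the array is created
    }

    void insert(int id, String name) {
        if (id < 0 || id >= names.length) {
            throw new IllegalArgumentException("ID " + id + " is outside the original array size");
        }
        names[id] = name;
    }

    String find(int id) {
        return (id >= 0 && id < names.length) ? names[id] : null;
    }

    void remove(int id) {
        if (id >= 0 && id < names.length) {
            names[id] = null; // leaves a gap; the cell stays reserved for this ID
        }
    }

    int emptyCells() {
        int count = 0;
        for (String n : names) {
            if (n == null) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        StudentTable table = new StudentTable(999); // IDs 0..999
        table.insert(17, "Ana");
        table.insert(503, "Ben");
        table.remove(17);
        System.out.println(table.find(503));    // Ben
        System.out.println(table.emptyCells()); // 999
    }
}
```

With only one student left, 999 of the 1,000 cells are empty, and an ID of 1000 cannot be stored at all without allocating a new array.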

2. Def. Hash function - a function used to convert numbers from a large range into numbers in a small range. (The key field is usually the large range and the index of the array is usually the small range.) Ex. Dictionary of 50,000 words. Use the word itself as the key field, but code it numerically to determine a location to store the word in the array. Let a = 1, b = 2, c = 3, …, z = 26, and let the positions of letters in the word have power-of-ten values: Ex. $\text{dab} = 4 \cdot 10^2 + 1 \cdot 10^1 + 2 \cdot 10^0 = 412$. (Because letter values go up to 26, base 10 does not give every word a unique code: k = 11 and aa = $1 \cdot 10^1 + 1 \cdot 10^0 = 11$.) What size array would be needed to store these 50,000 words if no word is longer than 10 characters?
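
The power-of-ten coding above can be sketched as follows. The class and method names are hypothetical, and the sketch assumes words contain only lowercase letters a-z. It uses Horner's rule (multiply the running total by 10, then add the next letter value), which gives the same result as summing each letter value times its power of ten. A long is used because 10-letter codes do not fit in an int.

```java
// Hypothetical sketch of the slide's numeric word code (a = 1 ... z = 26, powers of ten).
class WordCode {
    static long code(String word) {
        long total = 0;
        for (int i = 0; i < word.length(); i++) {
            int letter = word.charAt(i) - 'a' + 1; // assumes lowercase a-z only
            total = total * 10 + letter;           // shift earlier letters up one power of ten
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(code("dab"));        // 412
        System.out.println(code("zzzzzzzzzz")); // 28888888886
        System.out.println(code("k") == code("aa")); // true: both codes are 11
    }
}
```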

zzzzzzzzzz would have the code $26 \cdot (10^9 + 10^8 + \cdots + 10^0) = 28{,}888{,}888{,}886$! That is too big: it exceeds the largest int (2,147,483,647), and no array could be that big. Also, if locations were chosen this way, there would be many, many empty cells. What size array is needed for this dictionary? About 100,000: a hash table is usually made about twice as large as the number of items, to leave room for collisions (defined shortly). A hash function is needed to convert the numeric code to this smaller range.

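Commonly used hash function: index = largerange % arraysize. Ex. Hash the word gave to find its location in the array dictionary: $\text{gave} = 7 \cdot 10^3 + 1 \cdot 10^2 + 22 \cdot 10^1 + 5 \cdot 10^0 = 7325$, and $7325 \bmod 100{,}000 = 7325$. Ex. Hash the word gaty to find its location in the array dictionary: $\text{gaty} = 7 \cdot 10^3 + 1 \cdot 10^2 + 20 \cdot 10^1 + 25 \cdot 10^0 = 7325$, which also maps to index 7325. COLLISION!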
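
A minimal sketch of index = largerange % arraysize, assuming the 100,000-cell array suggested above and the same power-of-ten word code. The names ModHash, code, hash and ARRAY_SIZE are hypothetical. It reproduces the gave/gaty collision.

```java
// Hypothetical sketch: hash a word by reducing its numeric code modulo the array size.
class ModHash {
    static final int ARRAY_SIZE = 100_000; // about twice the 50,000 words

    static long code(String word) {
        long total = 0;
        for (int i = 0; i < word.length(); i++) {
            total = total * 10 + (word.charAt(i) - 'a' + 1); // a = 1 ... z = 26
        }
        return total;
    }

    static int hash(String word) {
        return (int) (code(word) % ARRAY_SIZE); // index = largerange % arraysize
    }

    public static void main(String[] args) {
        System.out.println(hash("gave"));       // 7325
        System.out.println(hash("gaty"));       // 7325 (collision)
        System.out.println(hash("zzzzzzzzzz")); // 88886
    }
}
```

The modulus keeps every index inside the array, but it cannot separate two words whose codes are already equal, and it can also send different codes to the same index.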

3. Def. Collision - occurs when a key hashes to an index whose cell is already occupied. 4. There are 2 methods to resolve collisions: Def. Open addressing - in case of a collision, search for (or store the item in) some other available cell. Def. Separate chaining - install a linked list at each index of the array and insert all items that hash to that index into its list.

5. Types of open addressing: Linear probe method - if a collision occurs at index x, search locations x+1, x+2, x+3, etc. Ex. gaty would be stored in location 7326 (if available), otherwise in location 7327, or 7328, etc. Note: this resolves collisions, but primary clusters occur. Quadratic probe method - search $x+1^2, x+2^2, x+3^2$, etc. (i.e., x+1, x+4, x+9, ...). Note: this avoids primary clusters, but secondary clusters occur.
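
The two probe sequences above can be compared side by side. This is a sketch with hypothetical names; it assumes indexes wrap around with % size, which the notes introduce a little later.

```java
import java.util.Arrays;

// Hypothetical sketch: the first `count` cells examined after a collision at home index x.
class ProbeSequences {
    static int[] linear(int x, int count, int size) {
        int[] seq = new int[count];
        for (int i = 1; i <= count; i++) {
            seq[i - 1] = (x + i) % size; // x+1, x+2, x+3, ...
        }
        return seq;
    }

    static int[] quadratic(int x, int count, int size) {
        int[] seq = new int[count];
        for (int i = 1; i <= count; i++) {
            seq[i - 1] = (int) ((x + (long) i * i) % size); // x+1, x+4, x+9, ...
        }
        return seq;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(linear(7325, 3, 100_000)));    // [7326, 7327, 7328]
        System.out.println(Arrays.toString(quadratic(7325, 3, 100_000))); // [7326, 7329, 7334]
    }
}
```

Because the quadratic steps grow, items that collide at x spread out quickly instead of filling a solid run of neighbouring cells; keys with the same home index still follow the same sequence, which is the secondary clustering mentioned above.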

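Rehashing (also called double hashing) - when a collision occurs, determine the step used to search for an available cell by hashing the key value again with a different function. Ex. step = 5 - key % 5. What steps result? 5, 4, 3, 2, 1 (never 0). How is this different from the linear and quadratic probe methods? The step is different for different keys. Note: the table size must be prime so that every cell can be probed. (More precisely, the step and the table size must have no common factor; a prime size guarantees this for every step smaller than the size.) Ex. size = 20, step = 5, x = 0: 0, 5, 10, 15, 0, 5, 10, 15, … (only 4 cells are ever probed). Try size = 19, step = 5, x = 0: 0, 5, 10, 15, 1, 6, 11, 16, 2, 7, 12, 17, 3, 8, 13, 18, 4, 9, 14 (all 19 cells).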
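
A sketch of the step function and the size 20 versus size 19 comparison above. The names are hypothetical, and keys are assumed to be non-negative so that key % 5 lies in 0..4.

```java
import java.util.Arrays;

// Hypothetical sketch of double-hashing probe sequences.
class DoubleHashProbe {
    static int step(int key) {
        return 5 - key % 5; // always 1..5 for key >= 0, never 0
    }

    // The first `count` cells visited from start x, moving by `step` with wrap-around.
    static int[] probe(int x, int step, int size, int count) {
        int[] seq = new int[count];
        int idx = x;
        for (int i = 0; i < count; i++) {
            seq[i] = idx;
            idx = (idx + step) % size;
        }
        return seq;
    }

    // How many different cells the sequence reaches before it repeats.
    static int distinctCells(int x, int step, int size) {
        boolean[] seen = new boolean[size];
        int idx = x;
        int count = 0;
        while (!seen[idx]) {
            seen[idx] = true;
            count++;
            idx = (idx + step) % size;
        }
        return count;
    }

    public static void main(String[] args) {
        for (int key = 0; key < 5; key++) System.out.print(step(key) + " "); // 5 4 3 2 1
        System.out.println();
        System.out.println(Arrays.toString(probe(0, 5, 20, 6))); // [0, 5, 10, 15, 0, 5]
        System.out.println(distinctCells(0, 5, 20));             // 4
        System.out.println(distinctCells(0, 5, 19));             // 19
    }
}
```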

Write code to increase a hash value by step: hashVal += step; What do we do if a hash value becomes greater than or equal to the size of the array (i.e., runs past the last index)? Wrap around: hashVal %= arraysize; What do we do about duplicate key values? They should not be allowed. When the first item with a key is found, the search stops, so a second item with the same key would never be found (unless the code is changed). Select a key value that is unique to the item (ex. Social Security number).
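
Putting the step, the wrap-around and the no-duplicates rule together gives a small open-addressing table. This is a sketch under stated assumptions, not the course's code: the class name DoubleHashTable is hypothetical, keys are non-negative ints stored directly, the primary hash is key % size, the secondary hash is the slide's 5 - key % 5, and the size is assumed to be prime. Deletion is left out here.

```java
// Hypothetical sketch: open addressing with double hashing, duplicates rejected.
class DoubleHashTable {
    private final Integer[] table; // null = cell never used
    private final int size;

    DoubleHashTable(int size) { // size assumed prime
        this.size = size;
        this.table = new Integer[size];
    }

    private int hash(int key) { return key % size; }   // assumes key >= 0
    private int step(int key) { return 5 - key % 5; }  // secondary hash from the notes

    // Returns the index used, or -1 for a duplicate key or a full table.
    int insert(int key) {
        int hashVal = hash(key);
        int stepSize = step(key);
        for (int probes = 0; probes < size; probes++) {
            if (table[hashVal] == null) {
                table[hashVal] = key;
                return hashVal;
            }
            if (table[hashVal] == key) return -1; // duplicate keys are not allowed
            hashVal += stepSize;                  // increase the hash value by step
            hashVal %= size;                      // wrap around past the last index
        }
        return -1;
    }

    // Returns the index holding key, or -1 if absent; stops at the first empty cell.
    int find(int key) {
        int hashVal = hash(key);
        int stepSize = step(key);
        for (int probes = 0; probes < size && table[hashVal] != null; probes++) {
            if (table[hashVal] == key) return hashVal;
            hashVal = (hashVal + stepSize) % size;
        }
        return -1;
    }

    public static void main(String[] args) {
        DoubleHashTable t = new DoubleHashTable(19);
        System.out.println(t.insert(3));  // 3
        System.out.println(t.insert(22)); // 6  (22 % 19 = 3 is taken; step 3)
        System.out.println(t.insert(41)); // 7  (41 % 19 = 3 is taken; step 4)
        System.out.println(t.insert(22)); // -1 (duplicate)
        System.out.println(t.find(41));   // 7
    }
}
```

Keys 3, 22 and 41 share home index 3 but move by different steps, which is the difference from linear and quadratic probing noted earlier.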

How do we handle deletions? Replace one field by -1 (marking the item as deleted) rather than replacing the entire object by null. Often the object's information may be needed in the future. Ex. Even when an employee leaves, pension and tax info is needed. However, there is another reason in this code: something undesirable occurs if the object is replaced by null. Demonstrate what, and explain why. What method requires this condition, and why? while (hashRay[hashVal] != null && hashRay[hashVal].iData != -1)
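
One way to demonstrate the problem is to delete the same key both ways and then search for a key that sits further along the same probe path. In this sketch the class and method names are hypothetical, keys are stored directly (so the key itself is replaced by -1 rather than one field of an object), keys are assumed to be non-negative, and linear probing is used for brevity. The insert loop uses the same condition as the line above.

```java
// Hypothetical sketch: deleting with a -1 marker versus deleting with null.
class DeletionDemo {
    static final int DELETED = -1; // marker; assumes real keys are never -1
    final Integer[] table;
    final int size;

    DeletionDemo(int size) {
        this.size = size;
        this.table = new Integer[size];
    }

    int hash(int key) { return key % size; }

    boolean insert(int key) {
        int hashVal = hash(key);
        int probes = 0;
        // Same condition as the notes: keep probing while the cell holds a live item.
        while (table[hashVal] != null && table[hashVal] != DELETED) {
            if (++probes == size) return false; // every cell holds a live item
            hashVal = (hashVal + 1) % size;
        }
        table[hashVal] = key; // reuses an empty or deleted cell
        return true;
    }

    int find(int key) {
        int hashVal = hash(key);
        // Only null ends the search; DELETED cells are skipped over.
        for (int probes = 0; probes < size && table[hashVal] != null; probes++) {
            if (table[hashVal] == key) return hashVal;
            hashVal = (hashVal + 1) % size;
        }
        return -1;
    }

    void deleteWithMarker(int key) { int i = find(key); if (i != -1) table[i] = DELETED; }
    void deleteWithNull(int key)   { int i = find(key); if (i != -1) table[i] = null; }

    public static void main(String[] args) {
        DeletionDemo a = new DeletionDemo(10);
        a.insert(5); a.insert(15); a.insert(25); // cells 5, 6, 7
        a.deleteWithNull(15);
        System.out.println(a.find(25)); // -1: the search stops at the new null in cell 6

        DeletionDemo b = new DeletionDemo(10);
        b.insert(5); b.insert(15); b.insert(25);
        b.deleteWithMarker(15);
        System.out.println(b.find(25)); // 7: the search passes the -1 in cell 6
    }
}
```

With the marker, find keeps going past deleted cells, while insert treats them as free and can reuse them.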

6. Def. Load factor - the ratio of the number of items in a hash table to the size of the table (array). The fuller a table is, the worse clustering becomes. Therefore, hash tables that use open addressing should be designed never to become more than 1/2 to 2/3 full. 7. When separate chaining is used to resolve collisions, is load factor a concern? Much less so: the table never fills up, since n or more items can be placed in a table of size n, and the load factor will then be 1 or more (i.e., some locations hold more than one item in their linked lists). Longer lists do make searches slower, though.
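
A small sketch of the load-factor calculation and the open-addressing guideline above. The names are hypothetical, and using 2/3 as the cut-off is an assumption taken from the upper end of the 1/2 to 2/3 range.

```java
// Hypothetical sketch: load factor = items / array size.
class LoadFactor {
    static double loadFactor(int items, int arraySize) {
        return (double) items / arraySize;
    }

    // Open addressing: flag tables that are more than 2/3 full (assumed threshold).
    static boolean tooFullForOpenAddressing(int items, int arraySize) {
        return loadFactor(items, arraySize) > 2.0 / 3.0;
    }

    public static void main(String[] args) {
        System.out.println(loadFactor(50_000, 100_000));               // 0.5
        System.out.println(tooFullForOpenAddressing(50_000, 100_000)); // false
        System.out.println(tooFullForOpenAddressing(70_000, 100_000)); // true
        System.out.println(loadFactor(150, 100)); // 1.5, only possible with separate chaining
    }
}
```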

How do we handle duplicates with separate chaining? Duplicates are allowed and are stored in the same list. Note: the search process slows, because each list is searched linearly. How do we handle deletions? Items can be deleted from a linked list, if appropriate for the application, without causing the empty-cell problem seen with open addressing.
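
A minimal separate-chaining sketch covering the points above: every index holds a java.util.LinkedList, duplicates go into the same list, a search scans one list linearly, and deletion simply unlinks a node. The class name ChainedHashTable is hypothetical, and keys are assumed to be non-negative ints hashed with key % size.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch: separate chaining with one linked list per index.
class ChainedHashTable {
    private final List<LinkedList<Integer>> buckets = new ArrayList<>();
    private int items = 0;

    ChainedHashTable(int size) {
        for (int i = 0; i < size; i++) buckets.add(new LinkedList<>());
    }

    private LinkedList<Integer> bucketFor(int key) {
        return buckets.get(key % buckets.size()); // assumes key >= 0
    }

    void insert(int key) {
        bucketFor(key).add(key); // duplicates are allowed
        items++;
    }

    int count(int key) { // linear scan of a single list
        int c = 0;
        for (int k : bucketFor(key)) if (k == key) c++;
        return c;
    }

    boolean delete(int key) {
        // remove(Object), not remove(int index): deletes the first matching key
        boolean removed = bucketFor(key).remove(Integer.valueOf(key));
        if (removed) items--;
        return removed;
    }

    double loadFactor() {
        return (double) items / buckets.size();
    }

    public static void main(String[] args) {
        ChainedHashTable t = new ChainedHashTable(5);
        t.insert(3); t.insert(8); t.insert(13); t.insert(8); // all in list 3
        System.out.println(t.count(8));     // 2
        System.out.println(t.delete(8));    // true
        System.out.println(t.count(8));     // 1
        System.out.println(t.loadFactor()); // 0.6
    }
}
```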

8. What is the advantage of a hash table? O(1) complexity, on average, to search for or insert an item (i.e., constant time regardless of the number of items). 9. Disadvantage? The size of the array needed must be known in advance (in Java, arrays cannot be resized, so another, bigger array would be needed and every item reinserted into it). This problem is reduced when separate chaining is used. Also, there is no way to access items in order.
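
Since a Java array cannot grow, "another bigger array" means allocating a new one and reinserting every item, because each index depends on the array size. This sketch is one possible approach, not the course's: the class name ResizableTable is hypothetical, it uses linear probing on non-negative int keys, and both the 1/2 load-factor trigger and the "next prime at least twice the old size" rule are assumptions.

```java
// Hypothetical sketch: grow an open-addressing table by rehashing into a bigger array.
class ResizableTable {
    private Integer[] table;
    private int items = 0;

    ResizableTable(int size) { table = new Integer[size]; }

    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) if (n % d == 0) return false;
        return true;
    }

    static int nextPrime(int n) {
        while (!isPrime(n)) n++;
        return n;
    }

    void insert(int key) {
        if ((double) (items + 1) / table.length > 0.5) grow(); // keep the table at most 1/2 full
        place(table, key);
        items++;
    }

    // Linear probing; the load-factor check guarantees an empty cell exists.
    private static void place(Integer[] arr, int key) {
        int i = key % arr.length; // assumes key >= 0
        while (arr[i] != null) i = (i + 1) % arr.length;
        arr[i] = key;
    }

    private void grow() {
        Integer[] bigger = new Integer[nextPrime(2 * table.length)];
        for (Integer k : table) {
            if (k != null) place(bigger, k); // indexes change with the new size
        }
        table = bigger;
    }

    boolean contains(int key) {
        int i = key % table.length;
        for (int probes = 0; probes < table.length && table[i] != null; probes++) {
            if (table[i] == key) return true;
            i = (i + 1) % table.length;
        }
        return false;
    }

    int capacity() { return table.length; }

    public static void main(String[] args) {
        ResizableTable t = new ResizableTable(5);
        for (int k = 1; k <= 6; k++) t.insert(k); // grows 5 -> 11 -> 23
        System.out.println(t.capacity());         // 23
        System.out.println(t.contains(6));        // true
    }
}
```

Every resize costs time proportional to the number of items, which is why the table size is worth estimating in advance.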