CS 206 Introduction to Computer Science II 04 / 06 / 2009 Instructor: Michael Eckmann.


Michael Eckmann - Skidmore College - CS Spring 2009

Today’s Topics
- Questions? Comments?
- Hash tables
  - Linear probing
  - Quadratic probing
  - Double hashing
  - Separate chaining

Hashes

Hashing is used to allow
- inserting an item
- removing an item
- searching for an item
all in constant time (in the average case).

Hashing does not provide efficient sorting, nor efficient finding of the minimum or maximum item, etc.

We want to insert our items (of any type: String, int, double, etc.) into a structure that allows fast retrieval.

Terms:
- Hash Table: an array of references to objects (items)
  - table_size is the number of places to store items
- Hash Function: calculates a hash value (an integer) based on some key data about the item we are adding to the hash table
- Hash Value: the value returned by the hash function
  - the hash value must be an integer within [0, table_size - 1]
  - this gives us the index in the hash table where we wish to store the item
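The terms above can be sketched in a few lines of Python. The function names and the particular formulas (remainder for ints, sum of character codes for strings) are assumptions chosen for illustration; the slides do not prescribe a specific hash function.

```python
def hash_int(key: int, table_size: int) -> int:
    """Hash value for an int key: the remainder is always in [0, table_size - 1]."""
    return key % table_size


def hash_string(key: str, table_size: int) -> int:
    """Hash value for a String key: add up the character codes, then reduce.

    This is deliberately simple. Anagrams such as "tab" and "bat" get the
    same hash value, so it is not a good choice for real data.
    """
    return sum(ord(ch) for ch in key) % table_size
```

Whatever formula is used, the final reduction by table_size is what guarantees a valid index into the table.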

In our example (assume n=8, add items 24, 3, 17, 31), what if we needed to insert item 11 into our hash? There'd be a collision.

There are several strategies to handle collisions
- Assume the hash value computed was H
- The chosen strategy affects how retrieval is handled too
- Open Addressing (aka Probing hash table)
  - Place item in next open slot (linear probing)
    - H+1, or H+2, or H+3, ...
  - Place item in next open slot along a squared sequence (quadratic probing)
    - H+1^2, or H+2^2, or H+3^2, or H+4^2, ...
  - Wraparound is allowed
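A minimal sketch of both probing strategies on the example above. It assumes the hash value is the item mod n (which is consistent with 11 colliding with 3 when n=8, but is not stated on the slide), and it assumes items are never removed. All names are hypothetical.

```python
def linear(i):
    """Offset of the i-th probe for linear probing: H, H+1, H+2, ..."""
    return i


def quadratic(i):
    """Offset of the i-th probe for quadratic probing: H, H+1^2, H+2^2, ..."""
    return i * i


def probe_insert(table, key, step):
    """Place key in the first open slot along its probe sequence; return the index."""
    n = len(table)
    home = key % n                      # assumed hash function: item mod n
    for i in range(n):
        index = (home + step(i)) % n    # the % n is the wraparound
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("no open slot found")


def probe_contains(table, key, step):
    """Retrieval must follow the same probe sequence that insertion used."""
    n = len(table)
    home = key % n
    for i in range(n):
        index = (home + step(i)) % n
        if table[index] is None:        # an open slot means the key was never stored
            return False
        if table[index] == key:
            return True
    return False


def build_example():
    """The table from the slide: n=8 with items 24, 3, 17, 31 already inserted."""
    table = [None] * 8
    for key in (24, 3, 17, 31):
        probe_insert(table, key, linear)
    return table
```

With this table, 11 hashes to index 3, which already holds 3. Both strategies try index 4 next (H+1 and H+1^2 are the same slot), find it open, and store 11 there. The two sequences only diverge from the second probe onward.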

There are several strategies to handle collisions
- Another technique, besides Open Addressing, is Separate Chaining
  - Each array element stores a linked list
- Examples of these techniques on the board.
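In place of the board example, here is a small sketch of separate chaining. Two assumptions: the hash value is the item mod table size, and a Python list stands in for the linked list in each array element. The class and method names are hypothetical.

```python
class ChainedHashTable:
    """Separate chaining: every array element holds a chain of colliding items."""

    def __init__(self, table_size):
        # One empty chain per slot; a Python list stands in for a linked list.
        self.buckets = [[] for _ in range(table_size)]

    def _index(self, key):
        return key % len(self.buckets)   # assumed hash function

    def insert(self, key):
        bucket = self.buckets[self._index(key)]
        if key not in bucket:            # ignore duplicates
            bucket.append(key)

    def contains(self, key):
        # Only the one chain at the key's index has to be searched.
        return key in self.buckets[self._index(key)]
```

Inserting 24, 3, 17, 31 and then 11 into a table of size 8 leaves the chain at index 3 holding both 3 and 11; no other slot is disturbed by the collision.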

Typically we won't know our keys ahead of time (before we create our hash).

But if we did know all our keys ahead of time, would that help us in any way?

Let's come up with a hash table to store Strings
- We'll need to come up with the size of our table
- We'll need to decide whether we will use separate chaining or open addressing hashing
- We'll need to create a hash function

We'll also allow insertion and retrieval (determine if an item exists in the hash).
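One possible outcome of that exercise, as a sketch rather than the version developed in class. The choices here are all assumptions: a default table size of 11, open addressing with linear probing, a polynomial hash with multiplier 31, and no removal.

```python
class StringHashTable:
    """Open-addressing table of strings using linear probing (no removal)."""

    def __init__(self, table_size=11):
        self.slots = [None] * table_size

    def _hash(self, item):
        # Polynomial hash: character order matters, unlike a plain sum of codes.
        value = 0
        for ch in item:
            value = (value * 31 + ord(ch)) % len(self.slots)
        return value

    def insert(self, item):
        """Return True if item is stored (or already present), False if the table is full."""
        n = len(self.slots)
        home = self._hash(item)
        for i in range(n):
            index = (home + i) % n
            if self.slots[index] is None:
                self.slots[index] = item
                return True
            if self.slots[index] == item:
                return True
        return False

    def contains(self, item):
        """Follow the same probe sequence as insert until the item or an open slot is found."""
        n = len(self.slots)
        home = self._hash(item)
        for i in range(n):
            index = (home + i) % n
            if self.slots[index] is None:
                return False
            if self.slots[index] == item:
                return True
        return False
```

Reducing the running value at every step keeps the intermediate numbers small, which matters in languages with fixed-width ints.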

Strategies for best performance
- We want items to be distributed evenly (uniformly) throughout the hash table, and we want few (or no) collisions
  - that depends on our data items, our choice of hash function and the size of our hash table
- We also need to decide whether to use a probing (linear or quadratic) hash table or a hash table where collisions are handled by adding the item to a list for the index (hash value)
  - another method is called double hashing
- If these choices are made well, retrieval takes constant time on average, but the worst case is O(n)
- We also need to consider the computation needed to insert (computing the hash value). While this is a constant, a poor choice of computation could make it a large constant, which we also wish to avoid

Clustering
- Why is it bad?
- How to avoid it?
  - Quadratic probing (avoids it to some degree)
  - Double hashing can avoid it

Is it possible to never find a spot in the table to place an item when using linear probing? When?

With a poor choice of table size and quadratic probing, it is possible to never find a spot in the table to place an item.

For quadratic probing, if the table size is prime, then a new element can always be inserted if the table is at least half empty.
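To see why table size matters for quadratic probing, the sketch below lists every slot that the sequence H, H+1^2, H+2^2, ... can ever reach. The function name is hypothetical, and the table sizes 8 and 7 are picked only for illustration.

```python
def quadratic_slots(home, table_size):
    """Return the distinct indices visited by probes home + i*i for i = 0, 1, 2, ...

    Trying i from 0 to table_size - 1 is enough, because i*i mod table_size
    repeats with period table_size.
    """
    return sorted({(home + i * i) % table_size for i in range(table_size)})


print(quadratic_slots(3, 8))   # [3, 4, 7]
print(quadratic_slots(3, 7))   # [0, 3, 4, 5]
```

With table size 8 and H=3, only slots 3, 4 and 7 are ever examined. If those three are filled, the insert fails even though five slots are empty. With the prime table size 7, four of the seven slots are reachable, so an open slot is always found as long as no more than three slots are filled.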

Double hashing is another open addressing method. It uses just an array of the items to represent the hash table.
- Create a hash table as an array of items
- Create a hash function hash1 that returns a hash value, which we will call hv1, used to place the item in the table
- If a collision occurs, call a second hash function hash2, which returns an int that we will call hv2. Use hv2 as the number of indices to hop to find another place to put the item

Example: if an item initially hashes to value hv1=5 and this causes a collision, then using the same item we compute hv2 to be 7 using the second hash function. We would then check if there's an open slot at 5+7=12. If it's open, place the item there. If that's not open, look at 12+7=19, and if that's not open continue checking 7 slots away until we find an open slot. Wrap around when necessary (using mod the size of the table).

Remember the goals of hashing:
- we want the data to be distributed well throughout the hash table
- we want few collisions in the average case and the worst case
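The probe sequence in this example can be written as (hv1 + i*hv2) mod table_size for i = 0, 1, 2, .... The slide does not give a table size, so the sketch below assumes 23 purely to show the wraparound; the function name is hypothetical.

```python
def double_hash_probes(hv1, hv2, table_size, count):
    """Return the first `count` indices examined by double hashing."""
    return [(hv1 + i * hv2) % table_size for i in range(count)]


print(double_hash_probes(5, 7, 23, 5))   # [5, 12, 19, 3, 10]
```

After 19, the next hop of 7 would land on 26, which wraps around to 26 mod 23 = 3.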

Another thing that must be taken care of is a situation such as this. Suppose we have a hash table of size 20 and all of the even numbered slots are filled (that is, index 0, 2, ..., 18) and we are trying to insert an item with hv1 of 4. This is a collision. So if we compute hv2 to be, say, 6:

( 4+6)%20=10 is filled,
(10+6)%20=16 is filled,
(16+6)%20= 2 is filled,
( 2+6)%20= 8 is filled,
( 8+6)%20=14 is filled,
(14+6)%20= 0 is filled,
( 0+6)%20= 6 is filled,
( 6+6)%20=12 is filled,
(12+6)%20=18 is filled,
(18+6)%20= 4 is filled,
( 4+6)%20=10 (we're back where we started...)

and this would go on forever, because we came back to the starting index without examining all the slots. Actually, in this case we examined only the filled slots and ignored all the empty slots!
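The cycle can be reproduced with a short sketch that follows the hops until an index repeats. The function name is hypothetical; the numbers are the ones from the example.

```python
from math import gcd


def visited_slots(hv1, hv2, table_size):
    """Follow hops of size hv2 from hv1 and return the indices seen before the first repeat."""
    seen = []
    index = hv1 % table_size
    while index not in seen:
        seen.append(index)
        index = (index + hv2) % table_size
    return seen


print(visited_slots(4, 6, 20))        # [4, 10, 16, 2, 8, 14, 0, 6, 12, 18]
print(len(visited_slots(4, 7, 20)))   # 20
print(gcd(6, 20), gcd(7, 20))         # 2 1
```

A hop of 6 shares the divisor 2 with the table size 20, so only 20/2 = 10 slots are ever visited, and they are exactly the even ones. A hop of 7 shares no divisor with 20 besides 1, so all 20 slots are visited.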

Double hashing with a table size that is relatively prime to the value (hv2) returned by hash2 will guarantee that if there's an empty slot, we will eventually find it. Numbers that are relatively prime are those which have no common divisors besides 1.

You could have a table_size which is an integer p, such that p and p-2 are both prime (twin primes). Then the procedure is to compute hv1 to be some int within the range 0 to table_size-1, inclusive. Then compute hv2 to be some int within the range 1 to table_size-3. This will guarantee that if there's an empty slot, we will find it. This method was devised by Donald Knuth.
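A sketch of double hashing with a twin-prime table size. The slide gives no formulas for hash1 and hash2, so the ones below are assumptions: hv1 = key mod p, and hv2 = 1 + (key mod (p-2)), a formulation commonly attributed to Knuth. That hv2 ranges from 1 to table_size-2, slightly wider than the range on the slide, and every value in it is relatively prime to a prime table_size. The table size 13 (with 11 as its twin) is also chosen only for illustration.

```python
def hash1(key, table_size):
    """First hash value, in the range 0 to table_size - 1."""
    return key % table_size


def hash2(key, table_size):
    """Hop size, in the range 1 to table_size - 2 (never 0, so we always move)."""
    return 1 + key % (table_size - 2)


def insert(table, key):
    """Insert key using double hashing; return the index where it was stored."""
    n = len(table)                  # assumed to be prime, e.g. 13
    hv1, hv2 = hash1(key, n), hash2(key, n)
    for i in range(n):
        index = (hv1 + i * hv2) % n
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("table is full")
```

For example, in an empty table of size 13 the keys 5, 18 and 31 all have hv1=5. Key 5 takes slot 5, key 18 hops by 8 to slot 0, and key 31 hops by 10 to slot 2, so items that collide at the same index still follow different probe sequences.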