Data Structures CSCI 132, Spring 2014, Lecture 34: Analyzing Hash Tables

Presentation transcript:

1 Data Structures CSCI 132, Spring 2014 Lecture 34 Analyzing Hash Tables

2 Recall Hash Tables. A hash table uses an index function (the hash function) that maps each key to a location in the table; many possible keys can map to the same location. If the table is sparse, then most of the time only one key will go to each location. If two records do get assigned to the same location (a collision), we use a method for reassigning the second record (collision resolution).

3 The C++ Hash Table Specification

    const int hash_size = 997;   // a prime number of appropriate size

    class Hash_table {
    public:
        Hash_table( );
        void clear( );
        Error_code insert(const Record &new_entry);
        Error_code retrieve(const Key &target, Record &found) const;
    private:
        Record table[hash_size];
    };

4 Implementation of insert( )

    Error_code Hash_table :: insert(const Record &new_entry)
    {
        Error_code result = success;
        int probe_count,   // Counter to be sure that table is not full.
            increment,     // Increment used for quadratic probing.
            probe;         // Position currently probed in the hash table.
        Key null;          // Null key for comparison purposes.
        null.make_blank( );
        probe = hash(new_entry);   // Find location to insert new_entry.
        probe_count = 0;
        increment = 1;

5 insert( ) continued

        while (table[probe] != null                    // Is the location empty?
               && table[probe] != new_entry            // Duplicate key?
               && probe_count < (hash_size + 1)/2) {   // Has overflow occurred?
            probe_count++;
            probe = (probe + increment) % hash_size;
            increment += 2;   // Prepare increment for next iteration.
        }
        if (table[probe] == null)
            table[probe] = new_entry;    // Insert new entry.
        else if (table[probe] == new_entry)
            result = duplicate_error;
        else
            result = overflow;           // No empty slot found: treat the table as full.
        return result;
    }
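The quadratic probing loop above can be tried out in isolation. The following is a minimal sketch, not the course's Hash_table class: it assumes non-negative int keys, uses -1 as a stand-in for the blank key, and shrinks the table to 11 slots so the probe sequence is easy to follow. The names IntHashTable, InsertResult, kHashSize and kEmpty are hypothetical.

```cpp
#include <array>

// Hypothetical simplified table: int keys, with -1 standing in for the blank key.
constexpr int kHashSize = 11;   // a small prime, chosen only for illustration
constexpr int kEmpty = -1;

enum class InsertResult { success, duplicate_error, overflow };

struct IntHashTable {
    std::array<int, kHashSize> table;

    IntHashTable() { table.fill(kEmpty); }

    // Assumes key >= 0.
    static int hash(int key) { return key % kHashSize; }

    InsertResult insert(int key) {
        int probe = hash(key);   // position currently probed
        int probe_count = 0;     // number of extra probes made so far
        int increment = 1;       // 1, 3, 5, ... gives offsets 1, 4, 9, ...
        while (table[probe] != kEmpty                    // slot occupied?
               && table[probe] != key                    // by a different key?
               && probe_count < (kHashSize + 1) / 2) {   // probes left?
            ++probe_count;
            probe = (probe + increment) % kHashSize;
            increment += 2;
        }
        if (table[probe] == kEmpty) {
            table[probe] = key;
            return InsertResult::success;
        }
        if (table[probe] == key) return InsertResult::duplicate_error;
        return InsertResult::overflow;
    }
};
```

In this sketch the keys 0, 11 and 22 all hash to slot 0 and end up in slots 0, 1 and 4. Because quadratic probing with a prime table size reaches only (hash_size + 1)/2 distinct slots, insert can report overflow while other slots are still empty.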

6 Likelihood of collisions. How many people have to be in a room before the probability that two of them have the same birthday reaches 50%? With m people, $P = 1 - \frac{364}{365}\cdot\frac{363}{365}\cdot\frac{362}{365}\cdots\frac{365-m+1}{365}$, and $P > 0.5$ when $m \ge 23$. The calculation for the probability of a collision in a table is similar. The table does not have to be very full for the probability of a collision to reach at least 50%. Therefore: collisions happen! We must handle them efficiently.
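The birthday calculation can be checked numerically. This sketch assumes each of m keys is assigned independently and uniformly at random to one of t locations (t = 365 for birthdays); the function names are hypothetical.

```cpp
// Probability that at least two of m keys land in the same one of t locations,
// assuming each key is placed independently and uniformly at random.
double collision_probability(int m, int t) {
    if (m > t) return 1.0;   // more keys than locations: a collision is certain
    double all_distinct = 1.0;
    for (int i = 0; i < m; ++i) {
        all_distinct *= static_cast<double>(t - i) / t;
    }
    return 1.0 - all_distinct;
}

// Smallest m for which the collision probability reaches at least 50%.
int smallest_m_for_half(int t) {
    int m = 0;
    while (collision_probability(m, t) < 0.5) ++m;
    return m;
}
```

Calling smallest_m_for_half(365) gives 23, matching the slide. The same function can be called with t set to the table size to see how few entries are needed before a collision becomes likely.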

7 Counting Probes. We can analyze the running time of hash tables by counting comparisons. A comparison takes place each time we "probe" an entry, that is, look at an entry and compare its key to the target. The number of probes depends on how full the table is.
    n = number of entries in the table
    t = total number of positions in the table (= hash_size)
    $\lambda = n/t$ = load factor
    $\lambda = 0$ means there are no entries in the table
    $\lambda = 0.5$ means the table is half full
    $\lambda \le 1$ for a contiguous table without chaining (open addressing)
    $\lambda$ can be greater than 1 when using chaining

8 Number of comparisons for chaining
Unsuccessful searches: If the entries are distributed evenly over the table, then the expected number of entries in each chain is $n/t = \lambda$. An unsuccessful search must probe every entry in the chain, so the average number of probes (or comparisons) is $\lambda$.
Successful searches: The average number of comparisons for a sequential search of a list with k items is $(k+1)/2$. The node we are looking for is in its chain, and the other n-1 nodes are distributed evenly over the table, so the average chain length is $k = (n-1)/t + 1 \approx n/t + 1 = \lambda + 1$. The average number of comparisons is therefore $((\lambda + 1) + 1)/2 = \lambda/2 + 1$.

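9 Open addressing (without chaining). Approximate number of comparisons for evenly distributed entries at load factor $\lambda$:
Random probing:
    Successful case: $\frac{1}{\lambda}\ln\frac{1}{1-\lambda}$
    Unsuccessful case: $\frac{1}{1-\lambda}$
Linear probing:
    Successful case: $\frac{1}{2}\left(1 + \frac{1}{1-\lambda}\right)$
    Unsuccessful case: $\frac{1}{2}\left(1 + \frac{1}{(1-\lambda)^2}\right)$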
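To get a feel for how quickly these estimates grow as the table fills, the four formulas can be evaluated directly. This is a sketch that assumes a load factor strictly between 0 and 1; the function names are hypothetical.

```cpp
#include <cmath>

// All functions assume 0 < lambda < 1 (open addressing, table not full).

// Random probing, successful search: (1/lambda) * ln(1 / (1 - lambda)).
double random_probing_successful(double lambda) {
    return (1.0 / lambda) * std::log(1.0 / (1.0 - lambda));
}

// Random probing, unsuccessful search: 1 / (1 - lambda).
double random_probing_unsuccessful(double lambda) {
    return 1.0 / (1.0 - lambda);
}

// Linear probing, successful search: 0.5 * (1 + 1 / (1 - lambda)).
double linear_probing_successful(double lambda) {
    return 0.5 * (1.0 + 1.0 / (1.0 - lambda));
}

// Linear probing, unsuccessful search: 0.5 * (1 + 1 / (1 - lambda)^2).
double linear_probing_unsuccessful(double lambda) {
    double gap = 1.0 - lambda;
    return 0.5 * (1.0 + 1.0 / (gap * gap));
}
```

At a load factor of 0.5, linear probing is estimated at 1.5 comparisons for a successful search and 2.5 for an unsuccessful one. At 0.9 the unsuccessful estimate rises to about 50.5, which is why open-addressed tables are usually kept well below full.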

10 Theoretical and empirical results

11 Hash Tables vs. Other Methods. The speed of retrieval from a hash table does not depend on the total number of entries, but on the ratio of entries to table size (the load factor $\lambda$). A table of size 40 with 20 entries has the same expected performance as a table of size 4000 with 2000 entries. Sequential search: $\Theta(n)$. Binary search: $\Theta(\lg n)$. Hash table retrieval: $O(1)$ for small $\lambda$. Read section 9.8 on choosing a method for storage and retrieval of data.
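The comparison can be made concrete with rough estimates for a successful search. This sketch is only an illustration of growth rates, not exact operation counts: it uses (n + 1)/2 for sequential search, lg(n) for binary search, and the linear probing estimate for the hash table. The function names are hypothetical.

```cpp
#include <cmath>

// Average comparisons for a successful sequential search of n items.
double sequential_search_estimate(double n) {
    return (n + 1.0) / 2.0;
}

// Order-of-growth estimate for binary search of n items.
double binary_search_estimate(double n) {
    return std::log2(n);
}

// Linear probing estimate for a successful search; it depends only on the
// ratio n/t, not on n itself. Assumes n < t.
double hash_retrieval_estimate(double n, double t) {
    double lambda = n / t;
    return 0.5 * (1.0 + 1.0 / (1.0 - lambda));
}
```

With 20 entries in 40 slots and with 2000 entries in 4000 slots, hash_retrieval_estimate returns the same value, 1.5, while the sequential estimate grows from 10.5 to 1000.5.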

12 Radix sort Radix sort creates a table of queues. Each queue corresponds to a letter of the alphabet. Sort from least significant letter to most significant letter.

13 Implementation of Radix Sort

    const int key_size = 5;     // number of character positions in each key
    const int max_chars = 28;   // number of queues

    template <class Record>
    void Sortable_list<Record> :: radix_sort( )
    {
        Record data;
        Queue queues[max_chars];
        for (int position = key_size - 1; position >= 0; position--) {
            // Loop from the least to the most significant position.
            while (remove(0, data) == success) {
                int queue_number = alphabetic_order(data.key_letter(position));
                queues[queue_number].append(data);   // Queue operation.
            }
            rethread(queues);   // Reassemble the list.
        }
    }
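The same idea can be sketched with standard library containers in place of the course's Sortable_list and Queue classes. This version assumes string keys of at most 5 characters, treats missing positions as blanks, and maps a blank to queue 0, letters to queues 1 through 26 (ignoring case), and anything else to queue 27. That mapping is an assumption standing in for alphabetic_order, whose definition is not shown on the slide.

```cpp
#include <array>
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

constexpr std::size_t kKeySize = 5;    // character positions examined per key
constexpr std::size_t kMaxChars = 28;  // blank, 26 letters, everything else

// Hypothetical stand-in for alphabetic_order: maps a character to a queue index.
std::size_t alphabetic_order(char c) {
    if (c == ' ') return 0;
    if (c >= 'a' && c <= 'z') return static_cast<std::size_t>(c - 'a') + 1;
    if (c >= 'A' && c <= 'Z') return static_cast<std::size_t>(c - 'A') + 1;
    return 27;
}

void radix_sort(std::vector<std::string>& keys) {
    std::array<std::queue<std::string>, kMaxChars> queues;
    // Loop from the least to the most significant position.
    for (std::size_t position = kKeySize; position-- > 0;) {
        // Distribute: append each key to the queue for its character.
        for (const std::string& key : keys) {
            char c = position < key.size() ? key[position] : ' ';
            queues[alphabetic_order(c)].push(key);
        }
        // Reassemble the list by emptying the queues in order.
        keys.clear();
        for (auto& q : queues) {
            while (!q.empty()) {
                keys.push_back(q.front());
                q.pop();
            }
        }
    }
}
```

Each pass is stable because queues preserve arrival order, which is what lets the earlier, less significant passes survive the later ones. Keys longer than kKeySize are ordered by their first five characters only.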