Introduction to Perfect Hashing Schemes


- Perfect Hash Functions
- Perfect Hashing: An Example using Cichelli's Method
- Applications of Hashing

Perfect Hash Functions
The hash tables we have seen so far allow the dynamic insertion and removal of items. In such schemes the possibility of collisions cannot be ruled out. Can we rule out collisions if we know more about the items to be loaded? A perfect hash function is a one-to-one mapping from a fixed set of keys to table slots, so it guarantees the absence of collisions. A perfect hash function that wastes no table space is said to be minimal perfect.

A Perfect Hash Function for Strings
R. J. Cichelli gave an algorithm for finding perfect hash functions for strings. He proposed the hash function:
h(s) = (size + g(s.charAt(0)) + g(s.charAt(size-1))) % n
where size = s.length() and n is the size of the hash table. The whole sum is reduced modulo n, not just the last term. The function g maps letters to integers and is to be constructed so that h(s) is unique for each string s. For h to be a perfect hash function, a proper mapping of letters to integers is needed.
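The formula can be written as a small Java method. This is a minimal sketch, not Cichelli's original code: the class name CichelliHash, the use of a Map for g, and the g-values in main are assumptions made only for illustration.

```java
import java.util.Map;

public class CichelliHash {
    // h(s) = (length + g(first letter) + g(last letter)) mod n
    public static int h(String s, Map<Character, Integer> g, int n) {
        int size = s.length();
        return (size + g.get(s.charAt(0)) + g.get(s.charAt(size - 1))) % n;
    }

    public static void main(String[] args) {
        // Hypothetical g-values, chosen only to exercise the formula.
        Map<Character, Integer> g = Map.of('V', 3, 'R', 3, 'E', 0);
        System.out.println(h("VAR", g, 9));   // (3 + 3 + 3) % 9
        System.out.println(h("ELSE", g, 9));  // (4 + 0 + 0) % 9
    }
}
```

Note the parentheses around the sum: in Java, % binds more tightly than +, so without them only the last g-value would be reduced modulo n.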

Perfect Hashing: Outline of Cichelli's Algorithm
Given a fixed collection of words, Cichelli's algorithm proceeds thus:
1. Count how often each letter occurs as the first or the last letter of a word;
2. For each word, find the sum of the frequencies of its first and last letters;
3. Sort the words in descending order of this sum;
4. Go to the next word (select the next word from the order of Step 3);
5. Choose g-values for any unassigned first/last letters of the current word. If a conflict occurs, backtrack and choose again;
6. If there are more words to process, go to Step 4.
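The steps above can be sketched as a recursive backtracking search in Java. This is an illustrative sketch under stated assumptions, not Cichelli's published program: the names CichelliSearch, search, place and maxG are hypothetical, the table size is taken to be the number of words (a minimal table), and candidate g-values are tried in increasing order, which may differ from the order used in a hand trace.

```java
import java.util.*;

public class CichelliSearch {
    // Returns g-values in [0, maxG] giving every word its own slot, or null if none exist.
    public static Map<Character, Integer> search(List<String> words, int maxG) {
        // Steps 1-2: frequency of each letter as a first or last letter.
        Map<Character, Integer> freq = new HashMap<>();
        for (String w : words) {
            freq.merge(w.charAt(0), 1, Integer::sum);
            freq.merge(w.charAt(w.length() - 1), 1, Integer::sum);
        }
        // Step 3: sort words by descending sum of first/last letter frequencies.
        List<String> ordered = new ArrayList<>(words);
        ordered.sort(Comparator.comparingInt(
            (String w) -> -(freq.get(w.charAt(0)) + freq.get(w.charAt(w.length() - 1)))));
        Map<Character, Integer> g = new HashMap<>();
        boolean[] used = new boolean[words.size()];
        return place(ordered, 0, g, used, maxG) ? g : null;
    }

    // Steps 4-6: place word i, assigning g-values to its unassigned letters.
    private static boolean place(List<String> words, int i, Map<Character, Integer> g,
                                 boolean[] used, int maxG) {
        if (i == words.size()) return true;
        String w = words.get(i);
        char first = w.charAt(0), last = w.charAt(w.length() - 1);
        boolean setFirst = !g.containsKey(first);
        for (int a = 0; a <= (setFirst ? maxG : 0); a++) {
            if (setFirst) g.put(first, a);
            boolean setLast = !g.containsKey(last);
            for (int b = 0; b <= (setLast ? maxG : 0); b++) {
                if (setLast) g.put(last, b);
                int slot = (w.length() + g.get(first) + g.get(last)) % used.length;
                if (!used[slot]) {
                    used[slot] = true;
                    if (place(words, i + 1, g, used, maxG)) return true;
                    used[slot] = false;  // conflict further on: backtrack
                }
            }
            if (setLast) g.remove(last);
        }
        if (setFirst) g.remove(first);
        return false;
    }
}
```

Calling search(words, 3) restricts g-values to {0, 1, 2, 3}. Because the search tries values in its own order, the assignment it returns need not match one found by hand, but any non-null result is collision-free.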

Example 1: Illustrating Perfect Hashing
Use Cichelli's algorithm to build a minimal perfect hash function for the following nine strings:
DO, DOWNTO, ELSE, END, IF, IN, TYPE, VAR, WITH

Example 1: Solution
For Step 1 of the algorithm, we count how often each letter occurs as the first or last letter of a word:
Letter:    D  O  E  I  F  N  T  V  R  W  H
Frequency: 3  2  4  2  1  1  1  1  1  1  1
Next, for each word we add the frequencies of its first and last letters:
DO = 5 (D + O = 3 + 2), DOWNTO = 5, ELSE = 8, END = 7, IF = 3, IN = 3, TYPE = 5, VAR = 2, WITH = 2
Sorting the keywords in decreasing order of this sum yields (ties may be broken in any order):
ELSE, END, DOWNTO, DO, TYPE, IN, IF, VAR, WITH
We are now at Step 5, the heart of the algorithm. We try the words in this order:

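Example 1: Cichelli's Method (cont'd)
In each row, the set in braces lists the slots occupied so far; * marks a collision with an occupied slot.
s = ELSE    g(E) = 0   h(s) = s.length()+g(E)+g(E) = 4    {4}
s = END     g(D) = 0   h(s) = s.length()+g(E)+g(D) = 3    {3, 4}
s = DOWNTO  g(O) = 0   h(s) = s.length()+g(D)+g(O) = 6    {3, 4, 6}
s = DO                 h(s) = s.length()+g(D)+g(O) = 2    {2, 3, 4, 6}
s = TYPE    g(T) = 0   h(s) = s.length()+g(T)+g(E) = 4*   {2, 3, 4, 6}
s = TYPE    g(T) = 1   h(s) = s.length()+g(T)+g(E) = 5    {2, 3, 4, 5, 6}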

Example 1: Cichelli's Method (cont'd)
s = IN   g(I) = 0, g(N) = 0   h(s) = s.length()+g(I)+g(N) = 2*   {2, 3, 4, 5, 6}
s = IN   g(I) = 1, g(N) = 0   h(s) = s.length()+g(I)+g(N) = 3*   {2, 3, 4, 5, 6}
s = IN   g(I) = 2, g(N) = 0   h(s) = s.length()+g(I)+g(N) = 4*   {2, 3, 4, 5, 6}
s = IN   g(I) = 3, g(N) = 0   h(s) = s.length()+g(I)+g(N) = 5*   {2, 3, 4, 5, 6}
s = IN   g(I) = 3, g(N) = 1   h(s) = s.length()+g(I)+g(N) = 6*   {2, 3, 4, 5, 6}
s = IN   g(I) = 3, g(N) = 2   h(s) = s.length()+g(I)+g(N) = 7    {2, 3, 4, 5, 6, 7}

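Example 1: Cichelli's Method (cont'd)
s = IF   g(F) = 0   h(s) = s.length()+g(I)+g(F) = 5*   {2, 3, 4, 5, 6, 7}
s = IF   g(F) = 1   h(s) = s.length()+g(I)+g(F) = 6*   {2, 3, 4, 5, 6, 7}
s = IF   g(F) = 2   h(s) = s.length()+g(I)+g(F) = 7*   {2, 3, 4, 5, 6, 7}
s = IF   g(F) = 3   h(s) = s.length()+g(I)+g(F) = 8    {2, 3, 4, 5, 6, 7, 8}
The steps for VAR and WITH are left as an exercise. You should get g(V) = g(R) = g(W) = g(H) = 3, h(VAR) = 0 and h(WITH) = 1. Here the reduction modulo the table size 9 matters: 3 + 3 + 3 = 9 gives slot 0, and 4 + 3 + 3 = 10 gives slot 1.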

Example 1: Cichelli's Algorithm (cont'd)
With the g-values E = D = O = 0, T = 1, N = 2, I = F = V = R = W = H = 3, h is minimal perfect. Based on these g-values the strings will be stored as shown below:
Slot:  0    1     2   3    4     5     6       7   8
Word:  VAR  WITH  DO  END  ELSE  TYPE  DOWNTO  IN  IF
The hash table above is fully occupied, with no empty slots. Note that if there are empty slots or there is a collision, then the g-value assignments are in error.
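The g-values above can be checked mechanically. The following Java sketch hard-codes them and fills a nine-slot table, throwing if two words land in the same slot. The class and method names (CichelliTable, build) are hypothetical, and the code assumes upper-case keywords only.

```java
import java.util.Arrays;

public class CichelliTable {
    // g-values from the worked example, indexed by letter; E, D and O stay 0.
    static final int[] G = new int[26];
    static {
        for (char c : "IFVRWH".toCharArray()) G[c - 'A'] = 3;
        G['T' - 'A'] = 1;
        G['N' - 'A'] = 2;
    }

    public static int h(String s, int n) {
        return (s.length() + G[s.charAt(0) - 'A'] + G[s.charAt(s.length() - 1) - 'A']) % n;
    }

    // Returns the filled table, or throws if two words collide.
    public static String[] build(String... words) {
        String[] table = new String[words.length];
        for (String w : words) {
            int slot = h(w, table.length);
            if (table[slot] != null) {
                throw new IllegalStateException(w + " collides with " + table[slot]);
            }
            table[slot] = w;
        }
        return table;
    }

    public static void main(String[] args) {
        // prints [VAR, WITH, DO, END, ELSE, TYPE, DOWNTO, IN, IF]
        System.out.println(Arrays.toString(
            build("DO", "DOWNTO", "ELSE", "END", "IF", "IN", "TYPE", "VAR", "WITH")));
    }
}
```

Since nine words are placed into nine slots without any collision, no slot can be left empty, which is exactly the minimal perfect property.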

Cichelli's Algorithm: Comments
- The search process in this algorithm takes exponential time in the worst case.
- The algorithm is therefore applicable only to small sets of strings.
- It does not guarantee that a perfect hash function can be found.
- The program is usually run only once, and its result is incorporated into another program.
- There are extensions to this technique that avoid its limitations.
- For our purposes in this course, Cichelli's algorithm is sufficient.

Hashing: A Birthday Surprise!
Collisions occur more frequently than people normally think! According to the famous Birthday Surprise 'paradox', if there are 23 or more people in a room, there is a >50% chance that two or more of them will have the same birthday. In other words, if the records of 23 people are loaded into a hash table of size 365 using their birthdays as keys, there is a >50% chance of a collision. Moreover, once 47 records are loaded, the chance of a collision is better than 19 out of 20. This justifies the effort spent searching for minimal perfect hash functions!
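These figures can be checked with a short calculation. The sketch below assumes each key is equally likely to land in any of the m slots; the probability that k keys are all distinct is the product of (m - i)/m for i = 0, ..., k - 1, and the collision probability is one minus that. The class and method names are hypothetical.

```java
public class BirthdaySurprise {
    // Probability that at least two of k keys collide when each is placed
    // uniformly at random into one of m slots.
    public static double collisionProbability(int k, int m) {
        double noCollision = 1.0;
        for (int i = 0; i < k; i++) {
            noCollision *= (double) (m - i) / m;
        }
        return 1.0 - noCollision;
    }

    public static void main(String[] args) {
        for (int k : new int[] {22, 23, 24, 47}) {
            System.out.printf("%d keys: %.3f%n", k, collisionProbability(k, 365));
        }
    }
}
```

Running it for a table of size 365 shows the probability crossing one half between 22 and 23 keys and exceeding 0.95 at 47 keys.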

Applications of Hashing
There are many areas where hashing is applicable. Here are some common ones:
- Databases: Efficient retrieval of records.
- Compilers: Symbol tables.
- Games: Looking up a board configuration to find the move that goes with it.
- UNIX shell: Quick command lookup.
- IP routing: Fast IP address lookup.

Exercises
1. In our examples using Cichelli's method, we selected g-values from {0,1,2,3}. Explain how choosing g-values from a bigger set affects the efficiency of the algorithm as compared to its chances of finding a minimal perfect hash function.
2. Use Cichelli's method to build a minimal perfect hash function for the following 11 Java keywords: class, extends, implements, synchronized, throws, import, protected, instanceof, return, abstract, this. Assume that g-values must be integers in the set {0,1,2,3} only.
3. Let A = {a,b,c,d,...,z} be the set of lower-case letters and s = c1c2c3…cn an arbitrary string with characters from A. Then c1*|A|^(n-1) + c2*|A|^(n-2) + c3*|A|^(n-3) + ... + cn*|A|^0 is distinct for each s. This is an ideal hash function for all strings of lower-case letters. Why is it not usable in practice?