Preliminaries
Advantages
– Hash tables can insert(), remove(), and find() with average complexity close to O(1).
– Relatively easy to program.
Disadvantages
– There is no convenient way to traverse a hash table in sorted order.
– At least double the memory is required, because the table should stay no more than about half full.
– If the hash table becomes too full (load factor > 50%), the insert(), remove(), and find() operations slow down and can degrade toward O(N).
– Careful thought must be given to the design of the hash function.
Design of Hash Keys
A hash table is a collection of elements that performs lookups using an appropriately selected hash function.
Definition of a Hash Function
– A function that, when applied to a key value, computes a hash key used as an index to locate the data element.
Design Issue: How Do We Choose Hash Functions?
– Goal: The hash function must compute values that appear random and span the entire hash table.
– Goal: The hash function must be quick to calculate.
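The two goals above (well-spread values, fast to compute) can be illustrated with a small polynomial string hash. This is only a sketch: the function name hash_string, the multiplier 31, and the table size 101 are illustrative assumptions, not choices made in the slides.

```python
def hash_string(key: str, table_size: int) -> int:
    """Polynomial hash of a string, reduced modulo the table size.

    Assumption: 31 is an illustrative multiplier, not one prescribed
    by the slides. Any small odd constant behaves similarly here.
    """
    h = 0
    for ch in key:
        # Multiplying before adding makes character order matter,
        # so anagrams such as "cat" and "act" usually land in different slots.
        h = (h * 31 + ord(ch)) % table_size
    return h


if __name__ == "__main__":
    size = 101  # hypothetical prime table size
    for word in ("cat", "act", "hash"):
        print(word, hash_string(word, size))
```

The loop does one multiply, one add, and one modulo per character, so the cost is proportional to the key length and independent of how many elements the table holds.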
Additional Design Considerations
What if the hash function produces the same index for different keys (a collision)?
– Open Addressing: index = (h1(key) + h2(key, tries)) % tableSize
  Examples: linear probing, secondary probing, quadratic probing, double hashing
– Separate Chaining
How big should the hash table be?
– Answer: At least twice as big as the number of elements the table is to store.
– Answer: A prime length.
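The two sizing answers above can be combined into one rule: take the smallest prime that is at least twice the expected number of elements. A minimal sketch, assuming trial division is fast enough for the table sizes involved; the names is_prime and choose_table_size are hypothetical.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; adequate for table-sized numbers."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def choose_table_size(expected_elements: int) -> int:
    """Smallest prime that is at least twice the expected element count."""
    candidate = max(2, 2 * expected_elements)
    while not is_prime(candidate):
        candidate += 1
    return candidate


if __name__ == "__main__":
    for n in (10, 50, 1000):
        print(n, "elements ->", choose_table_size(n), "slots")
```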
Collision Resolution
Open Addressing
Linear Probing (h2(key, tries) = tries)
– Characteristics: primary clustering; deletions are difficult.
Secondary Probing (h2(key, tries) = constant * tries)
– Characteristics: primary clustering; deletions are difficult.
Quadratic Probing (h2(key, tries) = tries^2)
– Characteristics: secondary clustering (keys that hash to the same index follow the same collision resolution pattern); incomplete use of the hash table; deletions are difficult.
Double Hashing (h2(key, tries) = secondHash(key) * tries)
– Characteristics: eliminates clustering; deletions are difficult.
Clustering: the tendency for sections of the table to fill up, with increasing probability that newly inserted keys hit these areas.
Deletions are difficult in every open addressing scheme because emptying a slot can break the probe path to keys that were stored beyond it.
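The probing schemes above differ only in the h2(key, tries) term, which a short sketch can make concrete. The assumptions here are illustrative: keys are integers, h1(key) = key % table_size, the table size is 11, and the second hash is 7 - (key % 7). None of these come from the slides.

```python
def linear_step(key: int, tries: int) -> int:
    return tries


def quadratic_step(key: int, tries: int) -> int:
    return tries * tries


def double_hash_step(key: int, tries: int) -> int:
    # Assumed second hash: 7 - (key % 7), always in 1..7 and never zero.
    return (7 - key % 7) * tries


def probe_sequence(key: int, table_size: int, h2) -> list:
    """Slots visited by (h1(key) + h2(key, tries)) % table_size."""
    h1 = key % table_size
    return [(h1 + h2(key, tries)) % table_size for tries in range(table_size)]


if __name__ == "__main__":
    size, key = 11, 25
    for name, step in (("linear", linear_step),
                       ("quadratic", quadratic_step),
                       ("double", double_hash_step)):
        seq = probe_sequence(key, size, step)
        print(name, seq, "distinct slots:", len(set(seq)))
```

In this setup linear probing and double hashing each reach all 11 slots, while quadratic probing reaches only 6 of them. That is the "incomplete use of the hash table" noted above.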
Separate Chaining
Compute the hash key.
Insert the key at the front of the chain (linked list) stored at that index; when a collision occurs, the chain simply grows.
Advantages
– The hash table grows as needed.
– Performance is less sensitive to a full hash table.
– Deletion is easy.
– No clustering.
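A minimal sketch of separate chaining as described above, with new keys inserted at the front of a singly linked chain. The class names Node and ChainedHashTable, the default table size of 11, and the use of Python's built-in hash() are assumptions made for illustration.

```python
class Node:
    """One link in a chain."""
    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node


class ChainedHashTable:
    def __init__(self, table_size: int = 11):
        # One chain head per slot; None means the chain is empty.
        self.heads = [None] * table_size

    def _index(self, key) -> int:
        return hash(key) % len(self.heads)

    def find(self, key) -> bool:
        node = self.heads[self._index(key)]
        while node is not None:
            if node.key == key:
                return True
            node = node.next
        return False

    def insert(self, key) -> bool:
        """Insert at the front of the chain; returns False for a duplicate."""
        if self.find(key):
            return False
        i = self._index(key)
        self.heads[i] = Node(key, self.heads[i])
        return True

    def remove(self, key) -> bool:
        """Unlink the node holding key; other keys in the chain are unaffected."""
        i = self._index(key)
        prev, node = None, self.heads[i]
        while node is not None:
            if node.key == key:
                if prev is None:
                    self.heads[i] = node.next
                else:
                    prev.next = node.next
                return True
            prev, node = node, node.next
        return False


if __name__ == "__main__":
    table = ChainedHashTable()
    for k in (3, 14, 25):  # all three collide at index 3 when the size is 11
        table.insert(k)
    table.remove(14)
    print(table.find(3), table.find(14), table.find(25))
```

Removal only unlinks one node, which is why deletion is easy here compared with open addressing.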
Performance
There are charts in the text describing the performance of
– Linear, quadratic, and double hash probing
– Open addressing versus separate chaining
Let F = load factor (fraction of the table that is full).
– Probability of one collision = $F$
– Probability of two collisions = $F^2$
– Expected collisions: $E = F + 2F^2 + 3F^3 + \dots = \sum_{i=0}^{\infty} i F^i = \frac{F}{(1-F)^2}$
– If $F = 0.5$: $E = \frac{1}{2} + 2 \cdot \frac{1}{4} + 3 \cdot \frac{1}{8} + \dots = \frac{1}{2} + \frac{1}{2} + \frac{3}{8} + \frac{4}{16} + \dots = \frac{1/2}{(1 - 1/2)^2} = 2$
– If $F = 0.75$: $E = \frac{F}{(1-F)^2} = \frac{3/4}{1/16} = 12$
– If $F = 0.9$: $E = \frac{F}{(1-F)^2} = \frac{9/10}{1/100} = 90$
Hash tables are often used for file system folders. They also complement B-trees in databases: B-trees support sequential processing, and a hash table supports rapid searching.
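The closed form F/(1-F)^2 from the Performance slide can be checked numerically against a truncated version of the series. This is a sketch under the slide's simple collision model; the function names and the 2000-term cutoff are illustrative assumptions.

```python
def expected_collisions_closed_form(load_factor: float) -> float:
    """E = F / (1 - F)^2, valid for 0 <= F < 1."""
    return load_factor / (1.0 - load_factor) ** 2


def expected_collisions_partial_sum(load_factor: float, terms: int = 2000) -> float:
    """Truncated series: sum of i * F^i for i = 0 .. terms - 1."""
    return sum(i * load_factor ** i for i in range(terms))


if __name__ == "__main__":
    for f in (0.5, 0.75, 0.9):
        print(f,
              expected_collisions_closed_form(f),
              expected_collisions_partial_sum(f))
```

The steep rise between F = 0.5 and F = 0.9 is the reason for keeping the table no more than about half full.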