Collision Handling Collisions occur when different elements are mapped to the same cell.

Collision Handling Collisions occur when different elements are mapped to the same cell

Collision Handling Two classes:
(1) Open hashing (External Hashing), a.k.a. separate chaining (2) Closed hashing (internal Hashing), a.k.a. open addressing Difference has to do with whether collisions are stored outside the table (open hashing) or whether collisions result in storing one of the records at another slot in the table (closed hashing)

Open Hashing Each cell (bucket) in the hash table is the head of a linked list All elements that hash to a particular bucket are placed on that bucket’s linked list Records within a bucket can be ordered in several ways by order of insertion, by key value order, or by frequency-of access order

Open Hashing ... 1 ... 2 3 4 ... D-1

Open Hashing If a collision occurs
Chaining causes search to slow as we now have to search through linked lists. We hope that number of elements per bucket roughly equal in size, so that the lists will be short (Depends on the hash function used) If there are n elements in set, then each bucket will have roughly n/D (called Load factor )

Open Hashing Average time per hash table operation:
D buckets, n elements in dictionary  average n/D elements per bucket (Load factor ) For successful search: 1+α/2 comparisons where α is load factor (assuming uniform hashing) If we can choose D to be about n  constant time

Open Hashing A possible improvement :
use binary search trees instead of linked lists. 1 2 3 4 D-1

Closed Hashing Linear hashing (linear probing) Quadratic probing
Double Hashing

Closed Hashing Associated with closed hashing is a rehash strategy:
“If we try to place x in bucket h(x) and find it occupied, find alternative location h1(x), h2(x), etc. Try each in order, if none empty slot  table is full,” h(x) is called home bucket Simplest rehash strategy is called linear hashing hi(x) = (h(x) + i) % D In general, the collision handling strategy is to generate a sequence of hash table slots (probe sequence) that can hold the record; test each slot until find empty one (probing)

Linear Hashing (Probing)
Table H(k1) Universe of Keys - U H(k4) K1 Actual Keys – K H(k2) = H(k5) K4 K5 H(k5) K2 K3 H(k3) D - 1

Example Linear (Closed) Hashing
D=8, keys a,b,c,d have hash values h(a)=3, h(b)=0, h(c)=4, h(d)=3 Where do we insert d? 3 already filled Probe sequence using linear hashing: h1(d) = (h(d)+1)%8 = 4%8 = 4 h2(d) = (h(d)+2)%8 = 5%8 = 5* h3(d) = (h(d)+3)%8 = 6%8 = 6 etc. 7, 0, 1, 2 Wraps around the beginning of the table! b 1 2 3 a c 4 d 5 6 7

Operations Using Linear Hashing
Test for membership (search function): Examine h(k), h1(k), h2(k), …, until we find k (success) or an empty bucket (fail) If no deletions possible, strategy works! What if deletions? If we reach empty bucket, cannot be sure that k is not somewhere else Solutions: Reorganize the table after every deletion (too expensive) Lazy deletion

Lazy deletion find(58) delete(89) insert(99) d 49 58 9 -1 18 89 a
value state 49 58 9 -1 18 89 a d value state 49 58 9 -1 18 99 a value state Hash function h(89) = 9 h(18) = 8 h(49) = 9 h(58) = 8 h(99) = 9 insert criterion value==-1 OR state==d continue search criterion value!=-1

Linear probing performance
Worst Case Insert and search: (n), n number of elements in table; all n key values have same home bucket No better than linear list for maintaining dictionary! Analysis doesn’t tell us much, let’s look at average case scenario …

Average Case Distinguish between successful and unsuccessful searches Delete = successful search for record to be deleted Insert = unsuccessful search along its probe sequence

Average Case Expected cost of hashing is a function of how full the table is: load factor  = n/D It has been found that average costs under linear hashing (probing) are: Insertion (unsuccessful search): (1 + 1/(1 - )2)/2 Deletion (successful search): (1 + 1/(1 - ))/2 e.g. table half full: - 2.5 probes (insertion) - 1.5 probes (deletion)

Expected number of accesses to hash table  1.0 0.2 0.4 0.6 0.8 1 2 4 5 3 Delete Insert

A problem associated with Linear Probing is called, primary clustering. Primary clustering occurs when many items hash into the same slot and long runs of slots are filled up. This results in increased search times.

Quadratic Probing Same idea as linear probing, but instead of:
h, h+1, h+2, h+3, h+4, ... we explore h, h+1, h+4, h+9, h+16, ... ( h(key) + c i2 ) Note, that we are squaring the number of the probe, not the hash value. Another possibility: h, h+1, h-1, h+4, h-4, h+9,...

Quadratic Probing: Evaluation
Pros: Eliminates primary clustering – more spread Simple to compute Cons: May still cause some clustering (chains, follow same sequence) This is called Secondary Clustering - Two elements x and y with h(x) = h(y) (same home buckets) have same probe sequence.

Quadratic Probing: Evaluation
Improvement: Make the probes follow difference sequences depending on key – use another hash function (Double Hashing)

Double Hashing Double hashing is one of the best methods for dealing with collisions. The slot location is calculated based upon the hash function (H1(k)). If the slot is full, then a second hash function (H2(k))is calculated and combined with the first hash function to determine a new slot. The secondary hash function H2(k) cannot have zero values

Double Hashing H(k5) H(k1) H(k4) K1 H(k2) = H(k5) K4 K5 K2 K3 H(k3)
Table H(k5) H(k1) Universe of Keys - U H(k4) K1 Actual Keys – K H(k2) = H(k5) K4 K5 K2 K3 H(k3) D - 1

Double Hashing Common choice of compression function for the secondary hash function: H2(k) = q - k mod q where q < D q is a prime The possible values for H2(k) are 1, 2, … , q

Double Hashing H1(k)=k mod 23 H2(k)=11 - (k mod 11) probe = H1(k)
322 1 2 3 808 4 161 5 6 250 7 900 8 767 9 786 10 11 471 12 13 14 336 15 912 16 17 18 19 20 319 21 22 H1(k)=k mod 23 H2(k)=11 - (k mod 11) probe = H1(k) probe = (probe + H2(k)) mod 23 Insert: Key Probes 257 4, 11, 18 812 7, 12, 17 256 3, 11, 19 762 3, 11, 19, 4, 12, 20, 5 i k 322 1 2 3 808 4 161 5 762 6 250 7 900 8 767 9 786 10 700 11 471 12 449 13 14 336 15 912 16 17 812 18 257 19 256 20 319 21 22

Double Hashing performance
Average Case It has been found that average costs under double hashing (probing) are: Insertion (unsuccessful search): Deletion (successful search): e.g. table half full: - 2 probes (insertion) probes (deletion)

Summary: Hashing performance
Number of probes depends on the load factor, α Chaining: unlike linear probing or double hashing, chaining does not require α to be bound by 1. if average linked list is of length λ, then: unsuccessful successful Chaining Linear probing Double Hashing

Hash Table Implementation

Table class – Main p.572 A Table: Is a collection of objects
has operations for adding, removing and locating objects these operations are controlled by a unique key for each object

Table class – Main p.572 public class Table { private int manyItems;
private Object[ ] keys; private Object[ ] data; private boolean[ ] hasBeenUsed; …

constructor { if (capacity <= 0) keys = new Object[capacity];
public Table(int capacity) { if (capacity <= 0) throw new IllegalArgumentException("Capacity is negative"); keys = new Object[capacity]; data = new Object[capacity]; hasBeenUsed = new boolean[capacity]; }

Hash function { return Math.abs(key.hashCode()) % data.length; }
private int hash(Object key) { return Math.abs(key.hashCode()) % data.length; }

search for an object by key
public boolean containsKey(Object key) { return findIndex(key) != -1; } private int findIndex(Object key) { int count = 0; int i = hash(key); while (count < data.length && hasBeenUsed[i]) { if (key.equals(keys[i])) return i; count++; i = nextIndex(i); return -1;

wrap around indexing private int nextIndex(int i) {
if (i+1 == data.length) return 0; else return i+1; }

get an object { int index = findIndex(key); if (index == -1)
public Object get(Object key) { int index = findIndex(key); if (index == -1) return null; else return data[index]; }

insert a key and object public Object put(Object key, Object element)
{ int index = findIndex(key); Object answer; if (index != -1) // replace object for key { answer = data[index]; data[index] = element; return answer; } else if (manyItems < data.length) // new key and object { index = hash(key); while (keys[index] != null) index = nextIndex(index); keys[index] = key; hasBeenUsed[index] = true; manyItems++; return null; else // table is full { throw new IllegalStateException("Table is full.");

remove a key and object public Object remove(Object key) {
int index = findIndex(key); Object answer = null; if (index != -1) answer = data[index]; keys[index] = null; data[index] = null; manyItems--; } return answer;

Changing probe strategy double hash
private int findIndex(Object key) { int count = 0; int i = hash1(key); int p = hash2(key); while (count < data.length && hasBeenUsed[i]) { if (key.equals(keys[i])) return i; count++; i = nextIndex(i,p); } return -1; private int nextIndex(int i, int p) { return (i+p)%data.length; }

JAVA’s Hashtable Class
Java has a standard hash table class called: java.util.Hashtabel This class has methods similar to the previous table example Hashtable class grows automatically when it becomes 75% full.

Collision Handling Collisions occur when different elements are mapped to the same cell.

Similar presentations

Presentation on theme: "Collision Handling Collisions occur when different elements are mapped to the same cell."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Collision Handling Collisions occur when different elements are mapped to the same cell.

Similar presentations

Presentation on theme: "Collision Handling Collisions occur when different elements are mapped to the same cell."— Presentation transcript:

Similar presentations

About project

Feedback