Published by Aubrey Hampton. Modified over 6 years ago.
1
Hashing
Problem: storing and retrieving an item using its key (for example, ID number, name)
Linked list takes O(N) time
Binary search tree takes O(log N) time
Array (indexed by key) takes O(1) time
Intro. to Data Structures, Chapter 5: Hashing. Veera Muangsin, Dept. of Computer Engineering, Chulalongkorn University
2
Array
[Figure: an array indexed directly by ID, last index 99999. Two cells are filled: ID 4112041 (Name: Somsri, Faculty: Science) and ID 4163490 (Name: Sompong, Faculty: Engineering).]
Problem: a lot of empty space
3
Hashing
[Figure: a smaller array, last index 9999, holding the same two records: ID 4112041 (Name: Somsri, Faculty: Science) and ID 4163490 (Name: Sompong, Faculty: Engineering).]
Map the key into a number between 0 and ArraySize-1
4
Hashing
Map the key into an array position using a “hash function”
ArrayIndex = hash(key)
Takes O(1) time to access an item
Much less empty space than using a normal array
5
Hash Function
Must return a valid array index.
Should ideally be a one-to-one mapping: if key1 != key2 then hash(key1) != hash(key2).
A collision occurs when two distinct keys hash to the same location in the array.
Should distribute the keys evenly: any key value k is equally likely to hash to any of the m array locations.
6
Simple Hash Function
ArrayIndex = key mod TableSize
[Example figure not preserved]
TableSize should be a prime number for even distribution
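A minimal sketch of the simple hash function, assuming integer keys. The class name ModHashDemo and the table size 10007 (a prime) are illustrative choices, not from the slides; the two keys are the sample student IDs shown earlier.

```java
// Illustrative sketch: ModHashDemo and TABLE_SIZE are assumed names/values.
class ModHashDemo {
    // 10007 is prime, following the advice that TableSize should be prime.
    static final int TABLE_SIZE = 10007;

    // ArrayIndex = key mod TableSize.
    // Math.floorMod keeps the result in 0..tableSize-1 even for negative keys.
    static int hash(int key, int tableSize) {
        return Math.floorMod(key, tableSize);
    }

    public static void main(String[] args) {
        int[] ids = {4112041, 4163490};
        for (int id : ids) {
            System.out.println(id + " -> cell " + hash(id, TABLE_SIZE));
        }
    }
}
```

With this table size the two IDs land in cells 9171 and 578, so a table of about ten thousand cells can replace an array with millions of mostly empty cells.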
7
Another Hash Function
ArrayIndex = (k0 + 37 k1 + 37^2 k2 + ...) mod TableSize
Example: 3-character key
ArrayIndex = (k0 + 37 k1 + 37^2 k2) mod TableSize
ArrayIndex = (k0 + 37 * (k1 + 37 * (k2))) mod TableSize
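The two forms of the 3-character formula can be checked against each other. This is a sketch only: the class PolyHashDemo and its method names are hypothetical, and it assumes keys short enough that the sum does not overflow an int.

```java
// Illustrative sketch: PolyHashDemo, direct and horner are assumed names.
class PolyHashDemo {
    // Direct evaluation of (k0 + 37*k1 + 37^2*k2) mod tableSize
    // for a 3-character key.
    static int direct(String key, int tableSize) {
        int k0 = key.charAt(0);
        int k1 = key.charAt(1);
        int k2 = key.charAt(2);
        return (k0 + 37 * k1 + 37 * 37 * k2) % tableSize;
    }

    // Nested form k0 + 37 * (k1 + 37 * (k2)), evaluated from the
    // innermost term outward, so no explicit powers of 37 are needed.
    static int horner(String key, int tableSize) {
        int sum = 0;
        for (int i = key.length() - 1; i >= 0; i--) {
            sum = 37 * sum + key.charAt(i);
        }
        return sum % tableSize;
    }

    public static void main(String[] args) {
        System.out.println(direct("abc", 10007));
        System.out.println(horner("abc", 10007));
    }
}
```

Both calls give the same index. The nested form needs one multiplication per character, which is why a loop-based hash function is usually written that way.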
8
Hash Function
public static int hash( String key, int tableSize )
{
    int hashVal = 0;

    for( int i = 0; i < key.length( ); i++ )
        hashVal = 37 * hashVal + key.charAt( i );

    hashVal %= tableSize;
    if( hashVal < 0 )   // overflow made the value negative
        hashVal += tableSize;

    return hashVal;
}
9
Collision
When an element is inserted, if it hashes to the same value as an already inserted element, we have a collision.
Collision resolution techniques:
Separate Chaining
Open Addressing: Linear Probing, Quadratic Probing, Double Hashing
10
Separate Chaining
[Figure: separate chaining hash table, last index 999]
11
Separate Chaining
Load factor λ = number of elements / table size
Average length of a list = λ
A successful search costs about 1 + (λ/2) link traversals
Cost depends on λ
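A small sketch of separate chaining that reports the load factor. It is not the SeparateChainingHashTable class from the later slides: ChainingDemo and its methods are hypothetical names, it stores plain int keys, and it uses java.util.LinkedList instead of the course's own list class.

```java
import java.util.LinkedList;

// Illustrative sketch: ChainingDemo is an assumed name, not the slides' class.
class ChainingDemo {
    private final LinkedList<Integer>[] lists;
    private int size;   // number of stored keys

    @SuppressWarnings("unchecked")
    ChainingDemo(int tableSize) {
        lists = new LinkedList[tableSize];
        for (int i = 0; i < tableSize; i++) {
            lists[i] = new LinkedList<>();
        }
    }

    // Adds the key to the front of its list unless it is already present.
    void insert(int key) {
        LinkedList<Integer> list = lists[Math.floorMod(key, lists.length)];
        if (!list.contains(key)) {
            list.addFirst(key);
            size++;
        }
    }

    boolean contains(int key) {
        return lists[Math.floorMod(key, lists.length)].contains(key);
    }

    // Load factor = number of elements / table size.
    double loadFactor() {
        return (double) size / lists.length;
    }

    int longestChain() {
        int longest = 0;
        for (LinkedList<Integer> list : lists) {
            longest = Math.max(longest, list.size());
        }
        return longest;
    }

    public static void main(String[] args) {
        ChainingDemo table = new ChainingDemo(101);
        for (int key = 0; key < 202; key++) {
            table.insert(key);
        }
        System.out.println("load factor   = " + table.loadFactor());
        System.out.println("longest chain = " + table.longestChain());
    }
}
```

With 202 consecutive keys in 101 cells, every list holds exactly two items, so the average list length equals the load factor of 2.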
12
Separate Chaining: evenly distributed
[Figure: table with last index 999, items spread evenly over the cells]
13
Separate Chaining: last digit is zero
[Figure: table with last index 999; only cells such as 10 and 20 are used]
Solution: make TableSize prime
14
Open Addressing
No linked lists: all items are stored in the array itself.
If a collision occurs, alternative locations are tried until an empty cell is found.
Try h0(x), h1(x), h2(x), …
hi(x) = (hash(x) + f(i)) mod TableSize
f(i) is the collision resolution strategy
Requires a bigger table: λ should be below 0.5
15
Linear Probing
If a collision occurs, try the next cell sequentially
f(i) = i
hi(x) = (hash(x) + i) mod TableSize
Try hash(x) mod TableSize, (hash(x) + 1) mod TableSize, (hash(x) + 2) mod TableSize, (hash(x) + 3) mod TableSize, . . .
16
Linear Probing
Insert: 89, 18, 49, 58, 69 (TableSize = 10, hash(x) = x mod 10)
Cell:  0   1   2   3   4   5   6   7   8   9
Item:  49  58  69  -   -   -   -   -   18  89
89 is inserted directly into cell 9
18 is inserted directly into cell 8
49 has a collision at cell 9 and is finally put into cell 0
58 has collisions at cells 8, 9, 0 and is finally put into cell 1
69 has collisions at cells 9, 0, 1 and is finally put into cell 2
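The insertion sequence above can be replayed with a short simulation. This is a sketch under stated assumptions: LinearProbingDemo, EMPTY and build are hypothetical names, keys are non-negative ints, and hash(x) is x mod TableSize.

```java
import java.util.Arrays;

// Illustrative sketch: LinearProbingDemo and its members are assumed names.
class LinearProbingDemo {
    static final int EMPTY = -1;   // marks an unused cell

    // Tries (hash(x) + i) mod TableSize for i = 0, 1, 2, ...
    // Returns the cell used, or -1 if every cell is occupied.
    static int insert(int[] table, int key) {
        int home = key % table.length;   // assumes key >= 0
        for (int i = 0; i < table.length; i++) {
            int pos = (home + i) % table.length;
            if (table[pos] == EMPTY) {
                table[pos] = key;
                return pos;
            }
        }
        return -1;
    }

    // Builds a table of the given size and inserts the keys in order.
    static int[] build(int size, int... keys) {
        int[] table = new int[size];
        Arrays.fill(table, EMPTY);
        for (int key : keys) {
            insert(table, key);
        }
        return table;
    }

    public static void main(String[] args) {
        // prints [49, 58, 69, -1, -1, -1, -1, -1, 18, 89]
        System.out.println(Arrays.toString(build(10, 89, 18, 49, 58, 69)));
    }
}
```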
17
Primary Clustering
Blocks of occupied cells (called clusters) form in the table.
A collision occurs whenever a key hashes to any cell inside a cluster.
Several attempts may then be needed to resolve the collision before a free cell is found.
The new item is added to the cluster.
18
Linear Probing
Problem: primary clustering
Normal deletion cannot be performed: some later find operations would fail because the chain of collisions that leads to the data is cut.
Use lazy deletion instead.
Insertion cost = number of probes to find an empty cell ≈ 1/(fraction of empty cells) = 1/(1 - λ). This estimate ignores clustering; with primary clustering, linear probing needs more probes than this.
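Why normal deletion breaks find can be seen in a small experiment. The sketch below is illustrative only: LazyDeletionDemo, the EMPTY and DELETED markers, and both remove methods are hypothetical, and it assumes non-negative int keys with linear probing.

```java
import java.util.Arrays;

// Illustrative sketch: LazyDeletionDemo and its members are assumed names.
class LazyDeletionDemo {
    static final int EMPTY = -1;     // cell was never used
    static final int DELETED = -2;   // marker left behind by lazy deletion

    final int[] cells;

    LazyDeletionDemo(int size) {
        cells = new int[size];
        Arrays.fill(cells, EMPTY);
    }

    // Linear probing insert into the first EMPTY cell (assumes key >= 0).
    void insert(int key) {
        for (int i = 0; i < cells.length; i++) {
            int pos = (key + i) % cells.length;
            if (cells[pos] == key) return;                          // already stored
            if (cells[pos] == EMPTY) { cells[pos] = key; return; }
        }
    }

    // Returns the cell holding key, or -1. Probing stops only at EMPTY,
    // so DELETED markers are stepped over.
    int find(int key) {
        for (int i = 0; i < cells.length; i++) {
            int pos = (key + i) % cells.length;
            if (cells[pos] == EMPTY) return -1;
            if (cells[pos] == key) return pos;
        }
        return -1;
    }

    // Normal deletion: the cell becomes EMPTY again, cutting the probe chain.
    void removeByClearing(int key) {
        int pos = find(key);
        if (pos >= 0) cells[pos] = EMPTY;
    }

    // Lazy deletion: the cell is only marked, so the probe chain survives.
    void removeLazily(int key) {
        int pos = find(key);
        if (pos >= 0) cells[pos] = DELETED;
    }

    public static void main(String[] args) {
        LazyDeletionDemo normal = new LazyDeletionDemo(10);
        normal.insert(89);
        normal.insert(49);            // collides at cell 9, stored in cell 0
        normal.removeByClearing(89);
        System.out.println(normal.find(49));   // prints -1: 49 can no longer be found

        LazyDeletionDemo lazy = new LazyDeletionDemo(10);
        lazy.insert(89);
        lazy.insert(49);
        lazy.removeLazily(89);
        System.out.println(lazy.find(49));     // prints 0: 49 is still reachable
    }
}
```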
19
Quadratic Probing
Eliminates primary clustering
f(i) = i^2
hi(x) = (hash(x) + i^2) mod TableSize
Try hash(x) mod TableSize, (hash(x) + 1^2) mod TableSize, (hash(x) + 2^2) mod TableSize, (hash(x) + 3^2) mod TableSize, . . .
The table must be at most half full and the table size must be prime; otherwise insertion may fail (every probe hits a collision).
20
Quadratic Probing
Insert: 89, 18, 49, 58, 69 (TableSize = 10, hash(x) = x mod 10)
Cell:  0   1   2   3   4   5   6   7   8   9
Item:  49  -   58  69  -   -   -   -   18  89
Insert 89, try cell 9
Insert 18, try cell 8
Insert 49, try cells 9, 0
Insert 58, try cells 8, 9, 2
Insert 69, try cells 9, 0, 3
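The same five insertions can be replayed in code. This sketch assumes non-negative int keys and hash(x) = x mod TableSize; QuadraticProbingDemo, EMPTY and build are hypothetical names. It moves from probe i-1 to probe i by adding 2i - 1, which is valid because i^2 - (i-1)^2 = 2i - 1.

```java
import java.util.Arrays;

// Illustrative sketch: QuadraticProbingDemo and its members are assumed names.
class QuadraticProbingDemo {
    static final int EMPTY = -1;   // marks an unused cell

    // Probes (hash(x) + i^2) mod TableSize incrementally.
    // Returns the cell used, or -1 after table.length failed probes.
    static int insert(int[] table, int key) {
        int pos = key % table.length;   // assumes key >= 0
        int collisions = 0;
        while (table[pos] != EMPTY) {
            if (++collisions >= table.length) {
                return -1;              // give up: no empty cell was reached
            }
            pos = (pos + 2 * collisions - 1) % table.length;
        }
        table[pos] = key;
        return pos;
    }

    // Builds a table of the given size and inserts the keys in order.
    static int[] build(int size, int... keys) {
        int[] table = new int[size];
        Arrays.fill(table, EMPTY);
        for (int key : keys) {
            insert(table, key);
        }
        return table;
    }

    public static void main(String[] args) {
        // prints [49, -1, 58, 69, -1, -1, -1, -1, 18, 89]
        System.out.println(Arrays.toString(build(10, 89, 18, 49, 58, 69)));
    }
}
```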
21
Quadratic Probing
Insert: 10, 20, 30, 40, 50, 60, 70 (TableSize = 10, hash(x) = x mod 10)
Cell:  0   1   2   3   4   5   6   7   8   9
Item:  10  20  -   -   30  60  50  -   -   40
Insert 10, try cell 0
Insert 20, try cells 0, 1
Insert 30, try cells 0, 1, 4
Insert 40, try cells 0, 1, 4, 9
Insert 50, try cells 0, 1, 4, 9, 6 (16)
Insert 60, try cells 0, 1, 4, 9, 6 (16), 5 (25)
Insert 70, try cells 0, 1, 4, 9, 6 (16), 5 (25), 6 (36), 9 (49), 4 (64), 1 (81), 0 (100), 1 (121), 4 (144), 9 (169), 6 (196), . . .
The insertion of 70 fails: cells 2, 3, 7 and 8 are empty but are never probed.
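The failure to place 70 follows from which cells the probe sequence can reach at all. The sketch below lists them; QuadraticReachDemo and reachable are hypothetical names, and the table sizes 16 and 11 are extra illustrations that do not appear in the slides.

```java
import java.util.TreeSet;

// Illustrative sketch: QuadraticReachDemo and reachable are assumed names.
class QuadraticReachDemo {
    // Cells visited by (home + i^2) mod tableSize for i = 0 .. probes-1.
    static TreeSet<Integer> reachable(int home, int tableSize, int probes) {
        TreeSet<Integer> cells = new TreeSet<>();
        for (int i = 0; i < probes; i++) {
            cells.add((home + i * i) % tableSize);
        }
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(reachable(0, 10, 100));   // prints [0, 1, 4, 5, 6, 9]
        System.out.println(reachable(0, 16, 100));   // prints [0, 1, 4, 9]
        System.out.println(reachable(0, 11, 100));   // prints [0, 1, 3, 4, 5, 9]
    }
}
```

For table size 10 only six cells are reachable from home cell 0, so once those six are occupied no further key with that home cell can be inserted. With size 16 only a quarter of the table is reachable, while the prime size 11 reaches more than half of its cells.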
22
Quadratic Probing
Secondary clustering: elements that hash to the same position probe the same alternative cells and are put into the next available one, forming a cluster.
In the first example, inserting 89, 49, 69 forms a secondary cluster. Inserting 18, 58 forms another secondary cluster.
23
Double Hashing
f(i) = i * hash2(x)
hi(x) = (hash(x) + i * hash2(x)) mod TableSize
Try hash(x) mod TableSize, (hash(x) + hash2(x)) mod TableSize, (hash(x) + 2*hash2(x)) mod TableSize, . . .
Example: hash2(x) = R - (x mod R), where R is a prime number smaller than TableSize
24
Double Hashing
Insert: 89, 18, 49, 58, 69, 23 (TableSize = 10, R = 7)
hash2(49) = 7 - (49 mod 7) = 7
hash2(58) = 7 - (58 mod 7) = 5
hash2(69) = 7 - (69 mod 7) = 1
hash2(23) = 7 - (23 mod 7) = 5
Insert 49, try 9, (9 + 7) mod 10 = 6
Insert 58, try 8, (8 + 5) mod 10 = 3
Insert 69, try 9, (9 + 1) mod 10 = 0
Insert 23, try 3, (3 + 5) mod 10 = 8, (3 + 10) mod 10 = 3, (3 + 15) mod 10 = 8, . . .
Cell:  0   1   2   3   4   5   6   7   8   9
Item:  69  -   -   58  -   -   49  -   18  89
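The worked example can be replayed with a short simulation. It is a sketch under stated assumptions: DoubleHashingDemo, EMPTY and insert are hypothetical names, keys are non-negative ints, hash(x) = x mod TableSize, and R = 7 as in the example.

```java
import java.util.Arrays;

// Illustrative sketch: DoubleHashingDemo and its members are assumed names.
class DoubleHashingDemo {
    static final int EMPTY = -1;   // marks an unused cell
    static final int R = 7;        // prime smaller than the table size

    // hash2(x) = R - (x mod R); the result is always in 1..R, never 0.
    static int hash2(int x) {
        return R - (x % R);
    }

    // Tries (hash(x) + i * hash2(x)) mod TableSize for i = 0, 1, 2, ...
    // Returns the cell used, or -1 if no empty cell is reached.
    static int insert(int[] table, int key) {
        int home = key % table.length;   // assumes key >= 0
        int step = hash2(key);
        for (int i = 0; i < table.length; i++) {
            int pos = (home + i * step) % table.length;
            if (table[pos] == EMPTY) {
                table[pos] = key;
                return pos;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] table = new int[10];
        Arrays.fill(table, EMPTY);
        for (int key : new int[] {89, 18, 49, 58, 69, 23}) {
            System.out.println(key + " -> " + insert(table, key));
        }
        // prints [69, -1, -1, 58, -1, -1, 49, -1, 18, 89]
        System.out.println(Arrays.toString(table));
    }
}
```

The insertion of 23 returns -1 because its step of 5 shares a factor with the table size 10, so the probes only alternate between cells 3 and 8. A prime table size avoids this kind of short cycle.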
25
Rehashing
When the table is too full, create a new table at least twice as big (with a prime size), compute the new hash value of each element, and insert it into the new table.
Rehash when the table is half full, or when an insertion fails, or when a certain load factor is reached.
Because of lazy deletion, deleted cells are also counted when the load factor is calculated.
Rehashing time is O(N), but the cost is shared by the preceding N/2 insertions, so it adds only a constant cost to each insertion.
26
public interface Hashable
{
    int hash( int tableSize );
}

public class MyInteger implements Comparable, Hashable
{
    public int hash( int tableSize )
    {
        // Note: -value is still negative when value is Integer.MIN_VALUE
        if( value < 0 )
            return -value % tableSize;
        else
            return value % tableSize;
    }
    // remaining members are not shown on this slide
}
27
public static void main( String [ ] args )
{
    SeparateChainingHashTable H = new SeparateChainingHashTable( );

    final int NUMS = 4000;
    final int GAP  = 37;

    for( int i = GAP; i != 0; i = ( i + GAP ) % NUMS )
        H.insert( new MyInteger( i ) );
    for( int i = 1; i < NUMS; i += 2 )
        H.remove( new MyInteger( i ) );
    for( int i = 2; i < NUMS; i += 2 )
        if( ((MyInteger)(H.find( new MyInteger( i ) ))).intValue( ) != i )
            System.out.println( "Find fails " + i );
}
28
public class SeparateChainingHashTable
{
    // Class outline only; method bodies are omitted
    private LinkedList[ ] theLists;

    public SeparateChainingHashTable( )
    public SeparateChainingHashTable( int size )
    public void insert( Hashable x )
    public void remove( Hashable x )
    public Hashable find( Hashable x )
    public void makeEmpty( )
    public static int hash( String key, int tableSize )

    private static final int DEFAULT_TABLE_SIZE = 101;
    private static int nextPrime( int n )
    private static boolean isPrime( int n )
}
29
public class SeparateChainingHashTable
{
    public SeparateChainingHashTable( )
    {
        this( DEFAULT_TABLE_SIZE );
    }

    public SeparateChainingHashTable( int size )
    {
        theLists = new LinkedList[ nextPrime( size ) ];
        for( int i = 0; i < theLists.length; i++ )
            theLists[ i ] = new LinkedList( );
    }

    public void makeEmpty( )
    {
        for( int i = 0; i < theLists.length; i++ )
            theLists[ i ].makeEmpty( );
    }
    // remaining members are not shown on this slide
}
30
public static int hash( String key, int tableSize )
{
    int hashVal = 0;

    for( int i = 0; i < key.length( ); i++ )
        hashVal = 37 * hashVal + key.charAt( i );

    hashVal %= tableSize;
    if( hashVal < 0 )
        hashVal += tableSize;

    return hashVal;
}
31
public void insert( Hashable x )
{
    LinkedList whichList = theLists[ x.hash( theLists.length ) ];
    LinkedListItr itr = whichList.find( x );

    // Insert at the front of the list only if x is not already present
    if( itr.isPastEnd( ) )
        whichList.insert( x, whichList.zeroth( ) );
}

public void remove( Hashable x )
{
    theLists[ x.hash( theLists.length ) ].remove( x );
}

public Hashable find( Hashable x )
{
    return (Hashable)theLists[ x.hash( theLists.length ) ].find( x ).retrieve( );
}
32
public class Employee implements Hashable
{
    public int hash( int tableSize )
    {
        return SeparateChainingHashTable.hash( name, tableSize );
    }

    public boolean equals( Object rhs )
    {
        return name.equals( ((Employee)rhs).name );
    }

    private String name;
    private double salary;
    private int seniority;
}
33
public class QuadraticProbingHashTable
{
    // Class outline only; method bodies are omitted
    public static final int DEFAULT_TABLE_SIZE = 11;

    protected HashEntry [ ] array;
    private int currentSize;

    public QuadraticProbingHashTable( )
    public QuadraticProbingHashTable( int size )
    public void makeEmpty( )
    public Hashable find( Hashable x )
    public void insert( Hashable x )
    public void remove( Hashable x )
    public static int hash( String key, int tableSize )
}
34
class HashEntry
{
    Hashable element;   // the element
    boolean  isActive;  // false if deleted

    public HashEntry( Hashable e )
    {
        this( e, true );
    }

    public HashEntry( Hashable e, boolean i )
    {
        element  = e;
        isActive = i;
    }
}
35
public class QuadraticProbingHashTable
{
    public QuadraticProbingHashTable( )
    {
        this( DEFAULT_TABLE_SIZE );
    }

    public QuadraticProbingHashTable( int size )
    {
        allocateArray( size );
        makeEmpty( );
    }

    public void makeEmpty( )
    {
        currentSize = 0;
        for( int i = 0; i < array.length; i++ )
            array[ i ] = null;
    }

    private void allocateArray( int arraySize )
    {
        array = new HashEntry[ arraySize ];
    }
    // remaining members are not shown on this slide
}
36
public Hashable find( Hashable x )
{
    int currentPos = findPos( x );
    return isActive( currentPos ) ? array[ currentPos ].element : null;
}

private int findPos( Hashable x )
{
    int collisionNum = 0;
    int currentPos = x.hash( array.length );

    while( array[ currentPos ] != null &&
           !array[ currentPos ].element.equals( x ) )
    {
        currentPos += 2 * ++collisionNum - 1;   // step to the next quadratic probe
        if( currentPos >= array.length )        // wrap around (the mod)
            currentPos -= array.length;
    }

    return currentPos;
}
37
private boolean isActive( int currentPos )
{
    return array[ currentPos ] != null && array[ currentPos ].isActive;
}

public void insert( Hashable x )
{
    int currentPos = findPos( x );
    if( isActive( currentPos ) )
        return;     // x is already in the table

    array[ currentPos ] = new HashEntry( x, true );

    // Rehash once the table is more than half full
    if( ++currentSize > array.length / 2 )
        rehash( );
}

public void remove( Hashable x )
{
    int currentPos = findPos( x );
    if( isActive( currentPos ) )
        array[ currentPos ].isActive = false;   // lazy deletion
}
38
private void rehash( )
{
    HashEntry [ ] oldArray = array;

    // Create a new double-sized, empty table
    allocateArray( nextPrime( 2 * oldArray.length ) );
    currentSize = 0;

    // Copy table over
    for( int i = 0; i < oldArray.length; i++ )
        if( oldArray[ i ] != null && oldArray[ i ].isActive )
            insert( oldArray[ i ].element );

    return;
}
39
private static int nextPrime( int n )
{
    if( n % 2 == 0 )
        n++;

    for( ; !isPrime( n ); n += 2 )
        ;

    return n;
}

private static boolean isPrime( int n )
{
    if( n == 2 || n == 3 )
        return true;
    if( n == 1 || n % 2 == 0 )
        return false;

    for( int i = 3; i * i <= n; i += 2 )
        if( n % i == 0 )
            return false;

    return true;
}
40
Summary
Insert and find take constant average time
Load factor affects performance
Load factor of separate chaining hashing should be close to 1
Load factor of open addressing hashing should not exceed 0.5
41
Summary
Hashing is good when ordering information is not required
Applications: symbol tables, online spelling checkers