Published by Kimberly O’Brien. Modified over 9 years ago.
1
Hashtables
2
Hashtables: an abstract data type that supports the following operations:
–Insert
–Find
–Remove
Search trees can be used for the same operations, but they require an order relation to be defined on the keys and take logarithmic time. Hashtables do not require an order relation on the elements, and all operations take O(1) time on average.
3
Direct Access Tables Assume that the keys are distinct numbers in the range U = {1, 2, 3, …, m}. Use an array of size m and place the element with key k at index k of the array. O(1) time for all operations. Problem: wasteful for small sets and impractical if m is very large.
4
Hashtables Main idea: instead of using the keys themselves as indices into the table, use a hash function to map keys to indices. Note: U is the set of all possible keys, so it is usually much larger than the table size m.
5
Simple Uniform Hashing We assume a hash function that, given a key, hashes the key into any slot with equal probability. Some reasonable hash functions are given later.
6
Hash functions The hash function is responsible for mapping keys to integers (slot numbers). A good hash function should have the following properties:
–1. Easy to evaluate: h(x) is computed in O(1)
–2. Uniform distribution over all the table slots
–3. Similar keys are mapped to different slots
7
Hash functions The first step is to represent the key as a natural number. For example, if S is a string, we can interpret it as an integer value computed from its characters.
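The conversion step can be sketched as follows. The radix-128 interpretation and the helper name `toSlot` are assumptions chosen for illustration, not taken from the slides: the characters are treated as digits of a base-128 number, and the value is reduced modulo m at every step (Horner's rule) so it never overflows.

```java
final class StringKey {
    // Hypothetical helper: interprets s as a radix-128 number (assumes
    // ASCII-like characters) and reduces modulo m after every digit.
    static int toSlot(String s, int m) {
        long value = 0;
        for (int i = 0; i < s.length(); i++) {
            value = (value * 128 + s.charAt(i)) % m;
        }
        return (int) value;
    }

    public static void main(String[] args) {
        // 'a' = 97, 'b' = 98, so "ab" is 97 * 128 + 98 = 12514.
        System.out.println(toSlot("ab", 11)); // prints 7
    }
}
```

Reducing inside the loop gives the same result as computing the full number first and taking it modulo m once.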
8
Collisions Mapping keys to indices causes a collision when two keys are mapped by the hash function to the same index. Solutions:
–Chaining
–Open addressing
9
Collision resolution - Chaining All keys that have the same hash value are placed in a linked list. Insertion can be done at the beginning of the list in O(1) time. Searching takes time proportional to the length of the list.
10
Collision resolution by chaining Let h be a hash table of 9 slots and h(k) = k mod 9. Insert the elements: 6, 43, 23, 62, 1, 13, 34, 55, 25
h(6) = 6 mod 9 = 6
h(43) = 43 mod 9 = 7
h(23) = 23 mod 9 = 5
h(62) = 62 mod 9 = 8
h(1) = 1 mod 9 = 1
h(13) = 13 mod 9 = 4
h(34) = 34 mod 9 = 7
h(55) = 55 mod 9 = 1
h(25) = 25 mod 9 = 7
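The example above can be replayed with a small chained table. This is a minimal sketch for integer keys; the class name `ChainedTable` and its methods are illustrative, not part of the slides.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

final class ChainedTable {
    private final List<LinkedList<Integer>> slots = new ArrayList<>();

    ChainedTable(int m) {
        for (int i = 0; i < m; i++) slots.add(new LinkedList<>());
    }

    // h(k) = k mod m; floorMod keeps the result non-negative.
    private int h(int k) { return Math.floorMod(k, slots.size()); }

    // Insert at the head of the chain: O(1).
    void insert(int k) { slots.get(h(k)).addFirst(k); }

    // Search walks a single chain, so it costs time proportional to its length.
    boolean find(int k) { return slots.get(h(k)).contains(k); }

    boolean remove(int k) { return slots.get(h(k)).remove(Integer.valueOf(k)); }

    List<Integer> chain(int slot) { return slots.get(slot); }

    public static void main(String[] args) {
        ChainedTable t = new ChainedTable(9);
        for (int k : new int[] {6, 43, 23, 62, 1, 13, 34, 55, 25}) t.insert(k);
        for (int i = 0; i < 9; i++) System.out.println(i + ": " + t.chain(i));
    }
}
```

Because insertion is at the head, slot 7 ends up holding 25, 34, 43 in that order and slot 1 holds 55, 1.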
11
Analysis The load factor α of a hashtable is defined as the number of elements stored in the table, n, divided by the number of slots, m: α = n/m. With chaining, a search takes Θ(1 + α) time on average under the assumption of simple uniform hashing.
12
Division method An appropriate hash function for a hashtable that uses chaining is the division method: h(k) = k mod m. Values of m that are powers of 10 or 2 should be avoided. Good values of m are primes not close to powers of 2.
13
Open Addressing Each element occupies a single slot in the hashtable; no chaining is done. To insert an element, we probe the table according to the hash function until an empty slot is found. The hash function is now a function of both the key and the number of attempts made so far in the insertion process.
14
Hash Insert
HashInsert(T, k) {
    int i, j;
    for (i = 0; i < m; i++) {
        j = h(k, i);
        if (T[j] == null) break;
    }
    if (i < m) T[j] = k;
    else error "hashtable overflow";
}
15
Hash Search
HashSearch(T, k) {
    int i, j;
    for (i = 0; i < m; i++) {
        j = h(k, i);
        if (T[j] == null) return not found;
        else if (T[j] == k) return j;
    }
    return not found;
}
16
Linear probing Linear probing takes an ordinary hash function h’, such as one using the division method, and turns it into: h(k,i) = (h’(k) + i) mod m. If a slot is occupied, we try the subsequent slot, and so on; thus the initial slot determines the entire probing sequence for insertion and search.
17
Linear Probing Easy to implement but suffers from primary clustering: the probability that the next insertion lands in the slot following a run of occupied slots is greater than the probability for any other slot, so runs tend to grow.
18
Linear Probing Given a hash function h’, the linear probing scheme is simply h(k,i) = (h’(k) + i) mod m
19
Exercise You are given a hash table h with 11 slots. Demonstrate inserting the following elements using linear probing and the hash function h’(k) = k mod m
–10, 22, 31, 4, 15, 28, 17, 88, 59
20
Solution
h(10,0) = (10 mod 11 + 0) mod 11 = 10
h(22,0) = (22 mod 11 + 0) mod 11 = 0
h(31,0) = (31 mod 11 + 0) mod 11 = 9
h(4,0) = (4 mod 11 + 0) mod 11 = 4
h(15,0) = (15 mod 11 + 0) mod 11 = 4
h(15,1) = (15 mod 11 + 1) mod 11 = 5
h(28,0) = (28 mod 11 + 0) mod 11 = 6
h(17,0) = (17 mod 11 + 0) mod 11 = 6
h(17,1) = (17 mod 11 + 1) mod 11 = 7
h(88,0) = (88 mod 11 + 0) mod 11 = 0
h(88,1) = (88 mod 11 + 1) mod 11 = 1
h(59,0) = (59 mod 11 + 0) mod 11 = 4
h(59,1) = (59 mod 11 + 1) mod 11 = 5
h(59,2) = (59 mod 11 + 2) mod 11 = 6
h(59,3) = (59 mod 11 + 3) mod 11 = 7
h(59,4) = (59 mod 11 + 4) mod 11 = 8
Resulting table:
Slot:  0   1   2   3   4   5   6   7   8   9   10
Key:   22  88  -   -   4   15  28  17  59  31  10
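The exercise can be checked mechanically. The sketch below assumes h(k,i) = (k mod m + i) mod m, as in the solution, and uses `null` for an empty slot; the class and method names are illustrative.

```java
import java.util.Arrays;

final class LinearProbing {
    // h(k, i) = (k mod m + i) mod m, with h'(k) = k mod m as in the exercise.
    static int probe(int k, int i, int m) {
        return (Math.floorMod(k, m) + i) % m;
    }

    // Inserts the keys in order and returns the table; null marks an empty slot.
    static Integer[] insertAll(int[] keys, int m) {
        Integer[] table = new Integer[m];
        for (int k : keys) {
            int i = 0;
            while (i < m && table[probe(k, i, m)] != null) i++;
            if (i == m) throw new IllegalStateException("hashtable overflow");
            table[probe(k, i, m)] = k;
        }
        return table;
    }

    public static void main(String[] args) {
        int[] keys = {10, 22, 31, 4, 15, 28, 17, 88, 59};
        System.out.println(Arrays.toString(insertAll(keys, 11)));
        // prints [22, 88, null, null, 4, 15, 28, 17, 59, 31, 10]
    }
}
```

Note how 59 needs five probes: slots 4 to 7 already form a cluster, which is the primary clustering effect described earlier.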
21
Quadratic Probing With quadratic probing, the hash function again uses an initial hash function h’, and is now h(k,i) = (h’(k) + c1·i + c2·i²) mod m, where c1 and c2 are constants. The slot chosen after a collision depends on the probe number i. Quadratic probing involves a secondary form of clustering, since the initial probe alone determines the entire probing sequence.
22
Quadratic Probing Given a hash function h’, quadratic probing is done by: h(k,i) = (h’(k) + c1·i + c2·i²) mod m
23
Example You are given a hash table h with 11 slots. Demonstrate inserting the following elements using quadratic probing and the hash function h(k,i) = (k mod m + i + 3i²) mod m
–10, 22, 31, 4, 15, 28, 17, 88, 59
24
h(10,0) = (10 mod 11 + 0) mod 11 = 10
h(22,0) = (22 mod 11 + 0) mod 11 = 0
h(31,0) = (31 mod 11 + 0) mod 11 = 9
h(4,0) = (4 mod 11 + 0) mod 11 = 4
h(15,0) = (15 mod 11 + 0) mod 11 = 4
h(15,1) = (15 mod 11 + 1 + 3) mod 11 = 8
h(28,0) = (28 mod 11 + 0) mod 11 = 6
h(17,0) = (17 mod 11 + 0) mod 11 = 6
h(17,1) = (17 mod 11 + 1 + 3) mod 11 = 10
h(17,2) = (17 mod 11 + 2 + 12) mod 11 = 9
h(17,3) = (17 mod 11 + 3 + 27) mod 11 = 3
h(88,0) = (88 mod 11 + 0) mod 11 = 0
h(88,1) = (88 mod 11 + 1 + 3) mod 11 = 4
h(88,2) = (88 mod 11 + 2 + 12) mod 11 = 3
h(88,3) = (88 mod 11 + 3 + 27) mod 11 = 8
h(88,4) = (88 mod 11 + 4 + 48) mod 11 = 8
h(88,5) = (88 mod 11 + 5 + 75) mod 11 = 3
h(88,6) = (88 mod 11 + 6 + 108) mod 11 = 4
h(88,7) = (88 mod 11 + 7 + 147) mod 11 = 0
h(88,8) = (88 mod 11 + 8 + 192) mod 11 = 2
h(59,0) = (59 mod 11 + 0) mod 11 = 4
h(59,1) = (59 mod 11 + 1 + 3) mod 11 = 8
h(59,2) = (59 mod 11 + 2 + 12) mod 11 = 7
Resulting table:
Slot:  0   1   2   3   4   5   6   7   8   9   10
Key:   22  -   88  17  4   -   28  59  15  31  10
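The same insertion sequence can be replayed in code. This sketch assumes the constants implied by the worked example, h(k,i) = (k mod m + i + 3i²) mod m; the class name is illustrative.

```java
import java.util.Arrays;

final class QuadraticProbing {
    // h(k, i) = (k mod m + i + 3*i^2) mod m, the constants used in the example.
    static int probe(int k, int i, int m) {
        return (int) ((Math.floorMod(k, m) + i + 3L * i * i) % m);
    }

    // Inserts the keys in order and returns the table; null marks an empty slot.
    static Integer[] insertAll(int[] keys, int m) {
        Integer[] table = new Integer[m];
        for (int k : keys) {
            int i = 0;
            while (i < m && table[probe(k, i, m)] != null) i++;
            // Unlike linear probing, m probes need not reach every slot.
            if (i == m) throw new IllegalStateException("no free slot found");
            table[probe(k, i, m)] = k;
        }
        return table;
    }

    public static void main(String[] args) {
        int[] keys = {10, 22, 31, 4, 15, 28, 17, 88, 59};
        System.out.println(Arrays.toString(insertAll(keys, 11)));
        // prints [22, null, 88, 17, 4, null, 28, 59, 15, 31, 10]
    }
}
```

Key 88 probes slots 0, 4, 3, 8, 8, 3, 4, 0 before reaching the free slot 2, which shows that a quadratic sequence can revisit slots.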
25
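Double Hashing Given two hash functions h1 and h2, the probe sequence is h(k,i) = (h1(k) + i·h2(k)) mod m. Problem: h2(k) and m should not have any common divisors other than 1; otherwise the probe sequence does not reach every slot.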
26
Double Hashing Example 1: select m to be a power of 2, and design h2 to produce odd numbers. Example 2: select m to be prime and m’ to be m-1, for instance h1(k) = k mod m and h2(k) = 1 + (k mod m’).
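The requirement on common divisors can be seen in a short experiment. The sketch assumes the usual form h(k,i) = (h1(k) + i·h2(k)) mod m with h1(k) = k mod m and h2(k) = 1 + (k mod (m-1)); these concrete functions and the helper `distinctSlots` are assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Set;

final class DoubleHashing {
    static int h1(int k, int m) { return Math.floorMod(k, m); }

    // Always in 1..m-1, so it shares no divisor with a prime m.
    static int h2(int k, int m) { return 1 + Math.floorMod(k, m - 1); }

    static int probe(int k, int i, int m) {
        return (int) ((h1(k, m) + (long) i * h2(k, m)) % m);
    }

    // Number of distinct slots reached in m probes with a fixed step.
    static int distinctSlots(int start, int step, int m) {
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < m; i++) {
            seen.add((int) ((start + (long) i * step) % m));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // m = 11 is prime: key 59 starts at slot 4 with step 10.
        for (int i = 0; i < 11; i++) System.out.print(probe(59, i, 11) + " ");
        System.out.println();
        System.out.println(distinctSlots(4, 10, 11)); // prints 11
        System.out.println(distinctSlots(0, 2, 8));   // prints 4
    }
}
```

With m = 8 and an even step of 2, only half of the table is ever probed, which is why the step is designed to be odd when m is a power of 2.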
27
Analysis In open addressing the load factor α cannot be more than 1. Under the assumption of uniform hashing, insertion and unsuccessful search require at most 1/(1 - α) probes on average. A successful search takes at most (1/α)·ln(1/(1 - α)) probes on average.
28
Analysis When the table is 50% full, a successful search requires fewer than 1.387 probes on average. When the table is 90% full, a successful search requires fewer than 2.559 probes on average.
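These figures can be reproduced numerically. The sketch assumes the standard uniform-hashing bounds for open addressing: 1/(1 - α) probes for insertion and unsuccessful search, and (1/α)·ln(1/(1 - α)) probes for a successful search. The class name is illustrative.

```java
import java.util.Locale;

final class ProbeBounds {
    // Bound for insertion and unsuccessful search: 1 / (1 - alpha).
    static double unsuccessful(double alpha) {
        return 1.0 / (1.0 - alpha);
    }

    // Bound for successful search: (1 / alpha) * ln(1 / (1 - alpha)).
    static double successful(double alpha) {
        return Math.log(1.0 / (1.0 - alpha)) / alpha;
    }

    public static void main(String[] args) {
        for (double alpha : new double[] {0.5, 0.9}) {
            System.out.println(String.format(Locale.ROOT,
                "alpha=%.1f unsuccessful=%.3f successful=%.3f",
                alpha, unsuccessful(alpha), successful(alpha)));
        }
    }
}
```

The successful-search bound evaluates to about 1.386 at α = 0.5 and about 2.558 at α = 0.9, while the unsuccessful bound grows much faster: 2 probes at α = 0.5 and 10 at α = 0.9.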
29
Problems with open addressing If an element is deleted, we cannot simply empty its slot, since later searches for keys that probed past that slot may fail. Rehashing the table on every deletion would ruin the running time. Solution: mark the slot with a special DELETED value.
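A minimal sketch of the DELETED marker with linear probing is shown below. The class name, the sentinel object, and the choice of linear probing are assumptions for illustration; the sketch also assumes callers do not insert a key that is already present.

```java
final class TombstoneTable {
    private static final Object DELETED = new Object(); // marker for removed keys
    private final Object[] slots;

    TombstoneTable(int m) { slots = new Object[m]; }

    private int probe(int k, int i) {
        return (Math.floorMod(k, slots.length) + i) % slots.length;
    }

    // Reuses the first empty or DELETED slot on the probe sequence.
    boolean insert(int k) {
        for (int i = 0; i < slots.length; i++) {
            int j = probe(k, i);
            if (slots[j] == null || slots[j] == DELETED) {
                slots[j] = k;
                return true;
            }
        }
        return false; // table overflow
    }

    // Skips DELETED slots; only a truly empty slot ends the search early.
    int search(int k) {
        for (int i = 0; i < slots.length; i++) {
            int j = probe(k, i);
            if (slots[j] == null) return -1;
            if (slots[j] != DELETED && slots[j].equals(k)) return j;
        }
        return -1;
    }

    boolean remove(int k) {
        int j = search(k);
        if (j < 0) return false;
        slots[j] = DELETED;
        return true;
    }

    public static void main(String[] args) {
        TombstoneTable t = new TombstoneTable(11);
        t.insert(4); t.insert(15); t.insert(59); // all hash to slot 4
        t.remove(15);                            // slot 5 becomes DELETED
        System.out.println(t.search(59));        // prints 6
    }
}
```

If `remove` set the slot back to empty instead, the search for 59 would stop at slot 5 and wrongly report that the key is missing.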
30
Rehashing If we do not know the number of elements in advance, we use a technique similar to the one used in vectors: once the load factor reaches some predefined threshold, rehash the data into a larger hashtable.
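A sketch of threshold-based rehashing for a chained table follows. The initial size of 4, the threshold of 0.75, and the doubling policy are assumptions chosen for illustration; the slides only say that the threshold is predefined.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

final class GrowingTable {
    private static final double MAX_LOAD = 0.75; // assumed threshold
    private List<LinkedList<Integer>> slots = newSlots(4);
    private int size = 0;

    private static List<LinkedList<Integer>> newSlots(int m) {
        List<LinkedList<Integer>> s = new ArrayList<>();
        for (int i = 0; i < m; i++) s.add(new LinkedList<>());
        return s;
    }

    void insert(int k) {
        // Grow before the load factor would exceed the threshold.
        if ((double) (size + 1) / slots.size() > MAX_LOAD) rehash(2 * slots.size());
        slots.get(Math.floorMod(k, slots.size())).addFirst(k);
        size++;
    }

    // Every key is re-inserted, because h(k) = k mod m changes with m.
    private void rehash(int newM) {
        List<LinkedList<Integer>> old = slots;
        slots = newSlots(newM);
        for (LinkedList<Integer> chain : old) {
            for (int k : chain) slots.get(Math.floorMod(k, newM)).addFirst(k);
        }
    }

    boolean find(int k) {
        return slots.get(Math.floorMod(k, slots.size())).contains(k);
    }

    int capacity() { return slots.size(); }

    public static void main(String[] args) {
        GrowingTable t = new GrowingTable();
        for (int k = 0; k < 10; k++) t.insert(k);
        System.out.println(t.capacity()); // prints 16
    }
}
```

A single rehash costs O(n), but doubling the table makes rehashes rare enough that the amortized cost per insertion stays O(1), as with a growing vector.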
31
Example Given a set S of unique integers and a number z, find x, y ∈ S such that x + y = z
–An efficient worst case algorithm
–An efficient average case algorithm
32
An efficient worst case algorithm 1. Sort all elements in S: O(n log n). 2. For every x in S, search for y = z - x in S using binary search: O(n log n). Total of O(n log n).
33
An efficient average case algorithm 1. Use a hash table where m is of order n; for all x ∈ S execute insert(x). 2. For all x ∈ S execute search(z - x). Total: O(n) in the average case, O(n²) in the worst case.
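The two steps can be sketched with Java's `HashSet` standing in for the hash table. The method name `findPair` is illustrative, and the sketch assumes that x and y must be two different elements of S, which the slide does not state.

```java
import java.util.HashSet;
import java.util.Set;

final class PairSum {
    // Returns {x, y} with x + y == z, or null if no such pair exists.
    static int[] findPair(int[] s, int z) {
        Set<Long> table = new HashSet<>();
        for (int x : s) table.add((long) x);            // step 1: insert(x)
        for (int x : s) {
            long y = (long) z - x;                       // long avoids int overflow
            if (y != x && table.contains(y)) {           // step 2: search(z - x)
                return new int[] {x, (int) y};
            }
        }
        return null;
    }

    public static void main(String[] args) {
        int[] pair = findPair(new int[] {1, 4, 7, 10}, 11);
        System.out.println(pair[0] + " + " + pair[1]); // prints 1 + 10
    }
}
```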
34
Example Given a set S of sortable items, we are asked if all items in S are unique. 1. Sort the elements of S. 2. Iterate over the elements of S, looking for adjacent equal values. Execution time: O(n log n).
35
Example 1. Use a hash table where m is of order n. For all x ∈ S execute insert(x). We modify the insert operation to signal if x already exists in the table (every insert includes a search operation). Execution time: O(n) in the average case.
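In Java, `HashSet.add` already behaves like the modified insert: it returns false when the element is already present. A minimal sketch, with the illustrative name `allUnique`:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class AllUnique {
    // Returns true if no two items are equal according to equals/hashCode.
    static <T> boolean allUnique(Iterable<T> items) {
        Set<T> seen = new HashSet<>();
        for (T item : items) {
            if (!seen.add(item)) return false; // add signals an existing element
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allUnique(Arrays.asList(3, 1, 4, 1, 5))); // prints false
        System.out.println(allUnique(Arrays.asList(3, 1, 4)));       // prints true
    }
}
```

Unlike the sorting approach, this version only needs the items to support equality and hashing, not an order relation.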
36
Java hashCode Each Java object has a method public int hashCode(), which is defined in class Object and is supported for the purposes of hashtables and hashmaps. The default implementation typically returns a number derived from the identity of the object (historically its memory location); it is not guaranteed to be unique. If two objects are equal according to equals(), they must have the same hash code.
37
Java hashCode It is not required that distinct objects have distinct hash codes, but distinct hash codes improve the performance of hashtables. Can the hash code of an object change throughout its life cycle?
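The contract between equals and hashCode can be illustrated with a small value class. `GridPoint` is a hypothetical class written for this example; it is immutable, so its hash code cannot change while it sits in a table.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class GridPoint {
    private final int x;
    private final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GridPoint)) return false;
        GridPoint p = (GridPoint) o;
        return x == p.x && y == p.y;
    }

    // Built from the same fields as equals, so equal points share a hash code.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }

    public static void main(String[] args) {
        Set<GridPoint> set = new HashSet<>();
        set.add(new GridPoint(1, 2));
        System.out.println(set.contains(new GridPoint(1, 2))); // prints true
    }
}
```

Without the hashCode override, the second `GridPoint(1, 2)` would fall back to the default identity-based hash code, and the lookup would most likely fail even though the two points are equal.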