Published by Russell Phelps. Modified over 8 years ago.
1 Resolving collisions
Although collisions should be avoided as much as possible, they are inevitable, so we need a strategy for resolving them. We look at 4 methods:
– Chaining
– Closed hashing
– Rehashing
– Extendible hashing
(We spend more time on chaining than on the other 3!)
ADS2 lecture 21. Chapter 10: Maps and hash tables (contd.)
2 Chaining
Idea: encode a large set as a fixed-size sequence of small sets ("mini-sets").
– Use a hash function h(x) to determine which small set x belongs to.
– This type of hashing is called chaining.
3 Example
A set A of naturals is encoded as a 10-element sequence of mini-sets: each value x in A is placed in the mini-set with index x mod 10.
[Figure: the mini-sets at indices 0 to 9, shown before and after inserting 15 and 27. Since 15 mod 10 = 5 and 27 mod 10 = 7, 15 joins mini-set 5 and 27 joins mini-set 7.]
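A minimal Java sketch of the example above, assuming the mini-sets are java.util.LinkedList objects held in an ArrayList. The class name MiniSetTable and its methods are hypothetical, not the course code.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Sketch (hypothetical class): a set of naturals stored as 10 mini-sets, chosen by x mod 10.
class MiniSetTable {
    private static final int SIZE = 10;
    private final List<LinkedList<Integer>> buckets = new ArrayList<>();

    MiniSetTable() {
        for (int i = 0; i < SIZE; i++) buckets.add(new LinkedList<>());
    }

    // h(x) = x mod 10 picks the mini-set (x assumed to be a natural number)
    private int h(int x) { return x % SIZE; }

    void add(int x) {
        LinkedList<Integer> miniSet = buckets.get(h(x));
        if (!miniSet.contains(x)) miniSet.add(x);   // a set holds no duplicates
    }

    boolean contains(int x) {
        return buckets.get(h(x)).contains(x);       // only one mini-set is searched
    }

    List<Integer> miniSet(int index) { return buckets.get(index); }

    public static void main(String[] args) {
        MiniSetTable a = new MiniSetTable();
        a.add(15);
        a.add(27);
        System.out.println(a.miniSet(5));    // [15]
        System.out.println(a.miniSet(7));    // [27]
        System.out.println(a.contains(25));  // false
    }
}
```

Checking for 25 only scans mini-set 5, which is the whole point of chaining: the other nine mini-sets are never touched.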
4 Search
To find whether x occurs, look only in mini-set x mod 10. If the values are spread evenly over the mini-sets, this reduces the searching work by a factor of 10 on average. To make indexing quick, implement the sequence as a 10-element array.
Map ADT as a hash table
Suppose we already have a set defined (using a linked list). For example, to add the key-value pair (K,V), i.e. put(K,V):
– Use the hash function to find h(K) = i.
– Use the set operations on the set stored at position i of the hash table to insert the new pair.
Code for this implementation will appear in the Maps folder.
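The put(K,V) steps above can be sketched in Java as follows. This is a hypothetical ChainedHashMap, not the code in the Maps folder: each chain is a java.util.LinkedList of key-value entries rather than the course's set class, and h(K) is assumed to be Java's hashCode() reduced into range with Math.floorMod.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Sketch of a Map backed by a chained hash table (hypothetical, not the course code).
class ChainedHashMap<K, V> {
    // One element of a chain: a key-value pair.
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final List<LinkedList<Entry<K, V>>> table = new ArrayList<>();
    private int size = 0;

    ChainedHashMap(int capacity) {
        for (int i = 0; i < capacity; i++) table.add(new LinkedList<>());
    }

    // h(K): Java's hashCode() compressed into 0..capacity-1 (never negative).
    private int h(K key) {
        return Math.floorMod(key.hashCode(), table.size());
    }

    // put(K,V): go to chain i = h(K); overwrite an existing value or append a new pair.
    V put(K key, V value) {
        LinkedList<Entry<K, V>> chain = table.get(h(key));
        for (Entry<K, V> e : chain) {
            if (e.key.equals(key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        chain.add(new Entry<>(key, value));
        size++;
        return null;
    }

    // get(K): only chain h(K) is searched.
    V get(K key) {
        for (Entry<K, V> e : table.get(h(key))) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    int size() { return size; }
}
```

For example, after put("apple", 1), put("pear", 2) and put("apple", 3), get("apple") returns 3 and size() is 2, because the second put for "apple" replaces the value in its chain instead of adding a new pair.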
5 Complexity issues
The larger the hash table, the better the speed-up, but the more space is needed.
– If the elements are natural numbers less than n (so n is also an upper bound on the set's cardinality), the average time complexity of MakeEmpty, AddToSet and ElementOf is O(1) for an n-element hash table. The hash function is h(x) = x.
– The average time complexity is also O(1) for an n/4-element table, but the operations are slower because each mini-set holds up to 4 items. The hash function is h(x) = x mod (n/4).
If we know that the data is well distributed and hash collisions are rare, we can assume that the complexity is O(1).
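A quick check of the n/4 case, assuming the elements are exactly 0, 1, …, n-1 and n is divisible by 4 (BucketLoad and maxBucketSize are hypothetical names): with h(x) = x mod (n/4), every mini-set ends up with 4 items.

```java
// Sketch: count how many of 0..n-1 land in each of n/4 mini-sets under h(x) = x mod (n/4).
class BucketLoad {
    // Returns the size of the largest mini-set (assumes n is divisible by 4).
    static int maxBucketSize(int n) {
        int k = n / 4;                 // table size n/4
        int[] count = new int[k];
        for (int x = 0; x < n; x++) {
            count[x % k]++;            // h(x) = x mod (n/4)
        }
        int max = 0;
        for (int c : count) max = Math.max(max, c);
        return max;
    }

    public static void main(String[] args) {
        System.out.println(maxBucketSize(40));  // 4
    }
}
```

A bucket of constant size 4 still gives O(1) operations; it just multiplies each search by up to 4 comparisons instead of 1.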
6 Sequencing
Writing out all the elements of a set represented as an injective linked list (one with no repeated elements) takes O(n) time, where n is the cardinality. With a k-element hash table the time becomes O(n + k), because all k mini-sets must be visited, including the empty ones.
Minimum
If the set were represented by a sorted sequence (array or linked list), finding the minimum would take O(1) time, but with a k-element hash table the time would become O(n + k).
– If deleting the minimum is a much-used operation, do not hash.
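The O(n + k) cost comes from having to visit every mini-set, even the empty ones, and every element. A sketch in Java, assuming the table is a list of k java.util.LinkedList buckets (HashMinimum is a hypothetical name):

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class HashMinimum {
    // Visits all k mini-sets and all n elements: O(n + k).
    static Integer minimum(List<LinkedList<Integer>> buckets) {
        Integer min = null;
        for (LinkedList<Integer> miniSet : buckets) {   // k iterations, even for empty mini-sets
            for (int x : miniSet) {                     // n elements in total
                if (min == null || x < min) min = x;
            }
        }
        return min;                                     // null for an empty set
    }

    public static void main(String[] args) {
        int k = 10;
        List<LinkedList<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < k; i++) buckets.add(new LinkedList<>());
        for (int x : new int[] {15, 27, 42, 8}) buckets.get(x % k).add(x);
        System.out.println(minimum(buckets));  // 8
    }
}
```

In a sorted linked list the minimum is simply the head, so no scan is needed at all.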
7 Other types of collision resolution…
Closed hashing (open addressing) does not use linked lists to resolve hash collisions. If a collision occurs, alternative cells are tried until an empty cell is found. This requires a larger table, since every item must be stored in the table itself.
– Various collision resolution strategies are available, e.g. linear probing, quadratic probing and double hashing.
– The method is preferred when memory is limited (e.g. a small handheld device or a sensor network), since no space is spent on linked-list nodes.
See next slide.
8 Linear probing
To hash k: if cell h(k) is closed (i.e. full), successively check the next cells along until an open (empty) cell is found, wrapping round if necessary. Provided the amount of data is less than the size of the table, a space will always be found eventually.
To search for a value k, look in cell h(k) and, if necessary, all its successors until either k or an open cell is found (and delete similarly). But will this work? See board.
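A minimal Java sketch of linear probing for non-negative int keys, assuming h(k) = k mod (table size). One well-known pitfall with deletion is that emptying a cell can cut off the probe sequence of keys stored beyond it, so this sketch uses the standard "lazy deletion" fix instead: a deleted cell is marked with a tombstone (DELETED) that searches step over but insertions may reuse. LinearProbingSet and the sentinel values are hypothetical.

```java
import java.util.Arrays;

// Sketch: open addressing with linear probing; keys must be non-negative.
class LinearProbingSet {
    private static final int EMPTY = -1;    // never used: a search stops here
    private static final int DELETED = -2;  // tombstone: a search continues past it
    private final int[] cells;

    LinearProbingSet(int capacity) {
        cells = new int[capacity];
        Arrays.fill(cells, EMPTY);
    }

    private int h(int k) { return k % cells.length; }

    // Returns true if k was added; false if already present or the table is full.
    boolean insert(int k) {
        if (contains(k)) return false;
        int i = h(k);
        for (int probes = 0; probes < cells.length; probes++) {
            if (cells[i] == EMPTY || cells[i] == DELETED) {  // open cell found
                cells[i] = k;
                return true;
            }
            i = (i + 1) % cells.length;                      // next cell, wrapping round
        }
        return false;
    }

    boolean contains(int k) {
        int i = h(k);
        for (int probes = 0; probes < cells.length; probes++) {
            if (cells[i] == EMPTY) return false;             // truly open cell: k is absent
            if (cells[i] == k) return true;
            i = (i + 1) % cells.length;
        }
        return false;
    }

    // Lazy deletion: overwrite with a tombstone rather than EMPTY.
    boolean delete(int k) {
        int i = h(k);
        for (int probes = 0; probes < cells.length; probes++) {
            if (cells[i] == EMPTY) return false;
            if (cells[i] == k) { cells[i] = DELETED; return true; }
            i = (i + 1) % cells.length;
        }
        return false;
    }
}
```

For example, with 10 cells, 15, 25 and 35 all hash to cell 5 and land in cells 5, 6 and 7. After delete(25), contains(35) is still true because the search passes over the tombstone in cell 6; had cell 6 simply been emptied, the search would stop there and wrongly report 35 as absent.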
9 Quadratic probing and double hashing
Quadratic probing: if a value is to be inserted into A[i] and that cell is full, then the cells $A[(i + j^2) \bmod n]$ are checked, j = 0, 1, 2, …, until an empty space is found.
Double hashing: a secondary hash function h' is used. If the original hash function h maps some key k to bucket A[i], with i = h(k), and that bucket is already occupied, then iteratively try the buckets $A[(i + j \cdot h'(k)) \bmod n]$, j = 0, 1, 2, …, until an empty space is found. (h'(k) must never be 0, otherwise every probe returns to A[i].)
Open addressing is preferred when memory is limited: e.g. in programs for small (memory-limited) handheld devices or a node in a sensor network.
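To see how the two probe sequences differ, here is a small Java sketch that lists the first few cells tried. The secondary hash h'(k) = q - (k mod q), with q a prime smaller than n, is an assumed choice (one common textbook option, and it is never 0); ProbeSequences is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: the first few cells tried by quadratic probing and by double hashing.
class ProbeSequences {
    // Quadratic probing: A[(i + j*j) mod n], j = 0, 1, 2, ...
    static List<Integer> quadratic(int i, int n, int probes) {
        List<Integer> cells = new ArrayList<>();
        for (int j = 0; j < probes; j++) cells.add((i + j * j) % n);
        return cells;
    }

    // Double hashing: A[(i + j*d) mod n] with step d = h'(k), which must be non-zero.
    static List<Integer> doubleHashing(int i, int d, int n, int probes) {
        List<Integer> cells = new ArrayList<>();
        for (int j = 0; j < probes; j++) cells.add((i + j * d) % n);
        return cells;
    }

    // Assumed secondary hash h'(k) = q - (k mod q), prime q < n; result is in 1..q.
    static int secondaryHash(int k, int q) { return q - (k % q); }

    public static void main(String[] args) {
        int n = 13, k = 18;
        int i = k % n;                                   // h(k) = 5
        System.out.println(quadratic(i, n, 5));          // [5, 6, 9, 1, 8]
        int d = secondaryHash(k, 7);                     // 7 - 4 = 3
        System.out.println(doubleHashing(i, d, n, 5));   // [5, 8, 11, 1, 4]
    }
}
```

Keys that collide at the same home cell follow the same quadratic sequence, but with double hashing their step size h'(k) usually differs, which spreads them out. When n is prime and the table is at least half empty, quadratic probing is guaranteed to find an empty cell.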
10 Other types of collision resolution (cont.)
Rehashing: when the table becomes too full, the running time of the operations becomes prohibitively large. So build another table twice as large (with a new hash function) and insert all the elements into the new table. See board.
Rehashing is expensive (O(n)) but happens infrequently: there must have been at least n/2 insertions since the previous rehash, so its cost averages out to O(1) per insertion.
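A minimal Java sketch of rehashing on top of linear probing, assuming "too full" means more than half full and that the new hash function is simply h(k) = k mod (new table size). RehashingSet is a hypothetical name, and keys must be non-negative.

```java
import java.util.Arrays;

// Sketch: linear-probing set that doubles its table once it is more than half full.
class RehashingSet {
    private static final int EMPTY = -1;
    private int[] cells = newTable(4);
    private int size = 0;
    private int rehashes = 0;

    private static int[] newTable(int capacity) {
        int[] t = new int[capacity];
        Arrays.fill(t, EMPTY);
        return t;
    }

    // Place k in table t by linear probing (t always has a free cell here).
    private static boolean place(int[] t, int k) {
        int i = k % t.length;
        while (t[i] != EMPTY) {
            if (t[i] == k) return false;         // already present
            i = (i + 1) % t.length;
        }
        t[i] = k;
        return true;
    }

    void insert(int k) {
        if (place(cells, k)) size++;
        if (2 * size > cells.length) rehash();   // more than half full
    }

    // O(n): a table twice as large; h(k) = k mod (new size) is a new hash function.
    private void rehash() {
        int[] bigger = newTable(2 * cells.length);
        for (int k : cells) if (k != EMPTY) place(bigger, k);
        cells = bigger;
        rehashes++;
    }

    boolean contains(int k) {
        int i = k % cells.length;
        while (cells[i] != EMPTY) {
            if (cells[i] == k) return true;
            i = (i + 1) % cells.length;
        }
        return false;
    }

    int capacity() { return cells.length; }
    int rehashes() { return rehashes; }
}
```

Inserting 0 to 99 into the initial 4-cell table triggers 6 rehashes (to 8, 16, 32, 64, 128 and finally 256 cells). Each rehash happens only after the number of items has roughly doubled, which is why the O(n) copying cost is spread thinly over the insertions.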
11 Other types of collision resolution (cont.)
Extendible hashing: used when the amount of data is too large to be stored in main memory. This method allows put and get to be performed using only two disk accesses. The keys to the smaller sets are stored in main memory, and each smaller set holds at most m items. When a smaller set becomes full, it is split and new keys are introduced (one more bit of the hash is used to tell the two halves apart). See board.
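A compact in-memory Java sketch of the splitting idea, assuming non-negative int keys used directly as their own hash and a directory indexed by their low-order bits (textbooks often use the leading bits instead; the idea is the same). Each Bucket stands in for one disk block of at most m keys; there is no actual disk I/O here, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of extendible hashing: an in-memory directory pointing to buckets of at most m keys.
class ExtendibleHashSet {
    // Stands in for one disk block holding at most m keys.
    static final class Bucket {
        int localDepth;                          // hash bits shared by all keys in this bucket
        final List<Integer> keys = new ArrayList<>();
        Bucket(int localDepth) { this.localDepth = localDepth; }
    }

    private final int m;                         // bucket capacity
    private int globalDepth = 0;                 // hash bits used to index the directory
    private List<Bucket> directory = new ArrayList<>();

    ExtendibleHashSet(int m) {
        this.m = m;
        directory.add(new Bucket(0));
    }

    // Low-order globalDepth bits of the key select a directory entry.
    private int index(int key) { return key & ((1 << globalDepth) - 1); }

    boolean contains(int key) { return directory.get(index(key)).keys.contains(key); }

    void put(int key) {
        while (true) {
            Bucket b = directory.get(index(key));
            if (b.keys.contains(key)) return;
            if (b.keys.size() < m) { b.keys.add(key); return; }
            split(b);                            // bucket full: split, then retry
        }
    }

    private void split(Bucket b) {
        if (b.localDepth == globalDepth) {
            // Double the directory: the new half mirrors the old half.
            List<Bucket> doubled = new ArrayList<>(directory);
            doubled.addAll(directory);
            directory = doubled;
            globalDepth++;
        }
        int bit = 1 << b.localDepth;             // the extra bit that separates the two halves
        b.localDepth++;
        Bucket sibling = new Bucket(b.localDepth);
        for (int i = 0; i < directory.size(); i++) {
            if (directory.get(i) == b && (i & bit) != 0) directory.set(i, sibling);
        }
        List<Integer> old = new ArrayList<>(b.keys);
        b.keys.clear();
        for (int k : old) directory.get(index(k)).keys.add(k);  // redistribute
    }

    int globalDepth() { return globalDepth; }
    int directorySize() { return directory.size(); }
}
```

With m = 2, inserting 0 to 7 ends with a directory of 4 entries (2 bits), each pointing to a bucket of 2 keys that share their last two bits, e.g. {0, 4} and {1, 5}.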
12 When should I use hashing?
When we have a large amount of data and only need to do insert, delete and search operations. Example applications:
– compilers (to keep track of declared variables)
– graph theory problems where nodes have real names instead of numbers
– online spell-checkers
– game-playing programs
13 Map implementation
A hash table implementation of the Map ADT. It is rather complex and involves several different files, all explained in a ReadMe file:
Z:\public_html\ADS\CodeFromLectures\MapsAndHashTables\Maps\ReadMe.txt
Important points:
– We use the chaining method of collision resolution.
– We use NodeSet to implement the linked list for each entry.
– We need a new interface, Hashable, to define classes which have a defined hash code.
– We use a hash code that is really a hash code plus a hash function (like the one used in OOSE?).
I wouldn't expect you to reproduce any of this code in the exam, but some of you may find it interesting. In OOSE you may have seen an implementation of a hash table; here we use a hash table to implement a Map. Different!
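A hypothetical sketch of the idea that the "hash code" already includes the hash function: an interface whose method returns an index in 0..tableSize-1 directly. The names Hashable.hash(int) and WordKey are illustrative assumptions, not the interface in the course folder.

```java
// Hypothetical interface: the returned value is already a table index in 0..tableSize-1.
interface Hashable {
    int hash(int tableSize);
}

// Hypothetical key class: polynomial hash of the characters, reduced mod tableSize at each step.
class WordKey implements Hashable {
    private final String word;

    WordKey(String word) { this.word = word; }

    @Override
    public int hash(int tableSize) {
        int h = 0;
        for (int i = 0; i < word.length(); i++) {
            h = (31 * h + word.charAt(i)) % tableSize;  // Horner's rule, kept in range
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof WordKey && ((WordKey) o).word.equals(word);
    }

    @Override
    public int hashCode() { return word.hashCode(); }
}
```

Reducing h below tableSize at every step avoids int overflow for moderate table sizes and never produces a negative index, so a chained Map could use hash(tableSize) directly as the bucket number.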