Published by Teresa Jordan. Modified over 9 years ago.
1
Copyright © 2003-2005 Curt Hill
Hashing
A quick lookup strategy
2
What is a hash?
Hashing is another name for key transformation
The original key is usually a character string or other sparse key
The result is usually a dense integer key
3
Components
Every hash scheme has three pieces
– A hash function
– A collection of buckets
– A collision scheme
The hash function transforms the key into an integer in the range 0 – (N-1)
The N buckets are numbered 0 – (N-1) and hold the resulting data
4
Basics
The idea:
– Have N containers for the data
– Have a function that maps the original key to a number in the range 0 – (N-1)
– One access retrieves the data regardless of the table's size
– When it works, it is the fastest way to look up a value
5
Hash function
Convert a sparse key into a dense key in the form of an integer
The input is the real key, which is a sparse key
– Sparse keys use very few of very many possible key values
A typical key is a character string, while the result must be an integer
6
Typical hash function
Input is a character string
Output is an integer in the range 0 – (N-1)
Sum the ordinal value of each character of the string
Divide the result by N and take the remainder
– This guarantees the right range
Number theory tells us to make N prime
7
Example values with N = 256
“Abcdef” returns 53
“Hi there” returns 233
“ABCDEF” returns 149
“FEDCBA” returns 149
“A character string” returns 197
“A big character string” returns 23
“Zoology” returns 243
See the pattern yet?
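The example values on this slide can be reproduced with a few lines of Python. This is a minimal sketch, assuming that "ordinal value" means the character code returned by `ord`; the name `sum_hash` is made up for illustration and does not come from the slides.

```python
def sum_hash(key, n=256):
    """Sum the ordinal value of each character, then take the remainder mod n."""
    return sum(ord(ch) for ch in key) % n


examples = [
    "Abcdef",
    "Hi there",
    "ABCDEF",
    "FEDCBA",
    "A character string",
    "A big character string",
    "Zoology",
]

for key in examples:
    print(f"{key!r} returns {sum_hash(key)}")
```

Because addition does not care about order, "ABCDEF" and "FEDCBA" necessarily land in the same bucket.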
8
Hash functions
Computing the function is easy
Two problems
– One result produced by two different keys – called a collision
– The order of the original key is scrambled in the result
Very good for an equality test and very bad for a range test
9
Is it that simple?
Not usually
The previous scheme maps all words with exactly the same characters to the same integer value
Many variations exist
– Do a different computation on even and odd characters so a rearrangement will produce different values
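One possible reading of "a different computation on even and odd characters" is sketched below. The weight of 3 for odd positions and the name `parity_hash` are assumptions chosen for illustration; the slides do not specify the computation.

```python
def parity_hash(key, n=256, odd_weight=3):
    """Add even-position characters as-is and weight odd-position ones.

    odd_weight=3 is an arbitrary illustrative choice, not from the slides.
    """
    total = 0
    for position, ch in enumerate(key):
        if position % 2 == 0:
            total += ord(ch)
        else:
            total += odd_weight * ord(ch)
    return total % n


print(parity_hash("ABCDEF"), parity_hash("FEDCBA"))  # no longer equal
print(parity_hash("ABC"), parity_hash("CBA"))        # still equal
```

The second line of output shows the limit of this variation: swapping characters that sit at positions of the same parity still yields a collision.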
10
Collisions
Two keys giving the same integer
– Assume each bucket may only hold one record
Collisions prevent hashing from being the optimal search technique it could be
Almost every hashing scheme needs a collision strategy
There have been many variations
11
Collision Strategies
Linear probing
– Add one to the index until the key or an empty bucket is found
Quadratic probing
– Step away from the original index by successive squares (1, 4, 9, ...) of the probe number
Rehash
– Apply a secondary hash function
Chaining
– Link colliding records in an overflow area
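The two probing strategies differ only in the sequence of buckets they try. The sketch below is illustrative: the keys "a", "b" and "c", their shared home bucket 2, and the function names are all hypothetical. Quadratic probing is written in its usual textbook form, home + i², which is an assumption about what the slide intends.

```python
def probe_sequence(home, n, strategy):
    """Yield the bucket indices to try, starting at the home bucket."""
    if strategy not in ("linear", "quadratic"):
        raise ValueError(f"unknown strategy: {strategy}")
    for i in range(n):
        step = i if strategy == "linear" else i * i
        yield (home + step) % n


def insert(table, key, home, strategy):
    """Place key in the first free bucket of its probe sequence."""
    for idx in probe_sequence(home, len(table), strategy):
        if table[idx] is None or table[idx] == key:
            table[idx] = key
            return idx
    # Quadratic probing may not visit every bucket, so this can
    # happen even when the table still has free space.
    raise RuntimeError("no free bucket found")


linear = [None] * 8
quadratic = [None] * 8
# Three hypothetical keys that all hash to bucket 2.
linear_slots = [insert(linear, k, 2, "linear") for k in "abc"]
quadratic_slots = [insert(quadratic, k, 2, "quadratic") for k in "abc"]
print(linear_slots, quadratic_slots)
```

Linear probing packs the three keys into adjacent buckets, while quadratic probing spreads the third key further away, which reduces clustering around a busy bucket.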
12
Collision Strategies
[Figure: an eight-bucket table (0 – 7) holding the keys d (hash 1), e (hash 2), t (hash 2), c (hash 5), m (hash 6) and y (hash 6), drawn three times under the labels Linear, Quadratic and Chaining]
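Chaining with the keys and hash values from this slide can be sketched as follows. As a simplification, each bucket here holds its own Python list rather than linking into a separate overflow area, and the insertion order is an assumption.

```python
# Key/hash pairs as they appear on the slide: d→1, e→2, c→5, m→6, t→2, y→6.
pairs = [("d", 1), ("e", 2), ("c", 5), ("m", 6), ("t", 2), ("y", 6)]

# One chain (list) per bucket; a simplification of a linked overflow area.
buckets = [[] for _ in range(8)]
for key, index in pairs:
    buckets[index].append(key)

for index, chain in enumerate(buckets):
    print(index, chain)
```

Buckets 2 and 6 each end up with a chain of two keys, and no key is ever displaced into another key's bucket.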
13
Free Space
As the free space decreases, the collisions increase
The best plan is to have about twice as many buckets as will be needed
14
Effect of load using linear probing
Load factor    Number of probes
10%            1.06
25%            1.17
50%            1.5
75%            2.5
90%            5.5
95%            10.5
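The figures in this table agree with the classic approximation for the average number of probes in a successful search under linear probing, ½(1 + 1/(1 − α)), where α is the load factor. The slide does not name its source, so treating this formula as the origin of the numbers is an assumption; the function name is made up for illustration.

```python
def expected_probes(load_factor):
    """Approximate probes per successful search with linear probing.

    Uses 0.5 * (1 + 1 / (1 - load_factor)); assumed, not stated on the slide.
    """
    if not 0 <= load_factor < 1:
        raise ValueError("load factor must be in [0, 1)")
    return 0.5 * (1 + 1 / (1 - load_factor))


for load in (0.10, 0.25, 0.50, 0.75, 0.90, 0.95):
    print(f"{load:.0%}  {expected_probes(load):.2f}")
```

At a load factor of 50% the estimate is 1.5 probes, which fits the advice to allocate about twice as many buckets as will be needed.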
15
Dynamics
A hash table is much harder to grow or shrink than a tree
Increasing or decreasing the number of buckets requires:
– Allocating a new table (larger or smaller)
– Rehashing every item in the old table into the new one
Deleting a record (without resizing) is also a problem because of collisions
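The two resizing steps can be sketched for a chained table as follows. This is a minimal illustration that reuses the sum-of-ordinals hash from the earlier slides; the function names and the sizes 4 and 7 are arbitrary assumptions.

```python
def sum_hash(key, n):
    """Sum of character codes, reduced to the range 0 .. n-1."""
    return sum(ord(ch) for ch in key) % n


def resize(buckets, new_size):
    """Allocate a new table and rehash every key from the old one."""
    new_buckets = [[] for _ in range(new_size)]
    for chain in buckets:
        for key in chain:
            # The bucket index depends on the table size, so every key
            # has to be rehashed; old indices cannot simply be copied.
            new_buckets[sum_hash(key, new_size)].append(key)
    return new_buckets


old = [[] for _ in range(4)]
for key in ["Abcdef", "Hi there", "Zoology"]:
    old[sum_hash(key, 4)].append(key)

new = resize(old, 7)
print(old)
print(new)
```

The cost is proportional to the number of stored keys, which is why resizing is usually done rarely and in large steps.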
16
Deletion of a key
The problem is collisions
If the key has no collisions, just mark the bucket as empty
If it has collisions
– All of the colliding keys need to be rehashed
– Finding them depends on the collision strategy
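For linear probing, one common way to do this is to empty the bucket and then reinsert every key in the run of occupied buckets that follows it. The sketch below assumes that approach; the helper names are hypothetical, and the keys and home buckets (d→1, e→2, t→2, c→5, m→6, y→6) are the ones from the Collision Strategies figure.

```python
def insert(table, key, home_of):
    """Linear probing: place key in the first free bucket from its home."""
    n = len(table)
    idx = home_of(key) % n
    for _ in range(n):
        if table[idx] is None or table[idx] == key:
            table[idx] = key
            return idx
        idx = (idx + 1) % n
    raise RuntimeError("table is full")


def find(table, key, home_of):
    """Return the bucket holding key, or None if it is absent."""
    n = len(table)
    idx = home_of(key) % n
    for _ in range(n):
        if table[idx] is None:
            return None
        if table[idx] == key:
            return idx
        idx = (idx + 1) % n
    return None


def delete(table, key, home_of):
    """Empty the key's bucket, then rehash the rest of its cluster."""
    idx = find(table, key, home_of)
    if idx is None:
        return False
    table[idx] = None
    n = len(table)
    idx = (idx + 1) % n
    while table[idx] is not None:
        # Take the key out and put it back; it may move closer to home.
        moved, table[idx] = table[idx], None
        insert(table, moved, home_of)
        idx = (idx + 1) % n
    return True


homes = {"d": 1, "e": 2, "t": 2, "c": 5, "m": 6, "y": 6}
table = [None] * 8
for key in "detcmy":
    insert(table, key, homes.get)

delete(table, "e", homes.get)
print(table)
```

Without the reinsertion step, "t" would be stranded in bucket 3 behind an empty bucket 2, and a later search for it would stop early and report it missing.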
17
Two more terms
A perfect hash is one that has no collisions
– Only occurs in situations with two conditions:
– Keys are fixed and known in advance
– The hash function is tailored to these keys
A minimal hash has no empty buckets
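Both definitions can be checked mechanically for a fixed key set. The sketch below uses the sum-of-ordinals hash from the earlier slides; the helper names `is_perfect` and `is_minimal` and the key set "a", "b", "c" are assumptions for illustration.

```python
def sum_hash(key, n):
    """Sum of character codes, reduced to the range 0 .. n-1."""
    return sum(ord(ch) for ch in key) % n


def is_perfect(keys, n):
    """True if no two keys share a bucket."""
    indices = [sum_hash(key, n) for key in keys]
    return len(set(indices)) == len(indices)


def is_minimal(keys, n):
    """True if every one of the n buckets is used."""
    return {sum_hash(key, n) for key in keys} == set(range(n))


print(is_perfect(["a", "b", "c"], 3), is_minimal(["a", "b", "c"], 3))
print(is_perfect(["Abcdef", "Hi there", "Zoology"], 256))
print(is_minimal(["Abcdef", "Hi there", "Zoology"], 256))
print(is_perfect(["ABCDEF", "FEDCBA"], 256))
```

For the three single-letter keys the hash is both perfect and minimal, but only because the keys are known in advance and happen to fit the function; adding one more key would break it.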
18
Data Skew
A good hash function spreads the keys uniformly among the integer range
How well does this work when the data is not uniformly distributed?
For example, consider
– Names – there are many more Smiths than Garnjobsts
– Numbers – there are many more courses numbered 101 than any other number
Can the hash function still give a good distribution?
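A quick way to see the effect of skew is to count how many records land in each bucket. In this sketch the sample data (five Smiths and one Garnjobst), the table size of 11 and the helper names are made up for illustration.

```python
from collections import Counter


def sum_hash(key, n):
    """Sum of character codes, reduced to the range 0 .. n-1."""
    return sum(ord(ch) for ch in key) % n


def bucket_loads(keys, n):
    """Count how many keys fall into each bucket."""
    return Counter(sum_hash(key, n) for key in keys)


names = ["Smith"] * 5 + ["Garnjobst"]
loads = bucket_loads(names, 11)
print(loads)
```

Identical keys always hash to the same bucket, so no choice of hash function can spread the five Smith records apart; only the collision strategy can absorb them.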
19
Summary
Hashing works best under two conditions
– Stable index
– Good distribution from the hash function
It is much harder to make dynamic than trees