Copyright © 2003-2005Curt Hill Hashing A quick lookup strategy.

Copyright © 2003-2005Curt Hill What is a hash? Hashing is another name for key transformation The original key is usually a character string or other sparse key The result is usually a dense integer key

Copyright © 2003-2005Curt Hill Components Every hash scheme has three pieces –A hash function –A collection of buckets –Collision scheme The hash function transforms the key into an integer 0 – N The buckets are numbered 0 – N and hold the resulting data

Copyright © 2003-2005Curt Hill Basics The idea: –Have N containers for the data –Have a function that maps the original key to a number in the range 0 – (N-1) –One access retrieves the data regardless of size –When it works it is the fastest way to lookup a value

Copyright © 2003-2005Curt Hill Hash function Convert a sparse key into a dense key in the form of an integer The input is the real key which is a sparse key –Sparse keys use very few of very many possible key values Typical key is a character string while the result needs an integer

Copyright © 2003-2005Curt Hill Typical hash function Input is a character string Output is an integer in range 0 - N Sum the ordinal value of each character of the string Divide the result by N and take the remainder –This guarantees the right range Number theory tells us to make N prime

Copyright © 2003-2005Curt Hill Example values with N = 256 “Abcdef” returns 53 “Hi there” returns 233 “ABCDEF” returns 149 “FEDCBA” returns 149 “A character string” returns 197 “A big character string” returns 23 “Zoology” returns 243 See the pattern yet?

Copyright © 2003-2005Curt Hill Hash functions Computing the function is easy Two problems –One result produced by two different keys – called a collision –The order of the original key is scrambled in the result Very good for an equality test and very bad for a range test

Copyright © 2003-2005Curt Hill Is it that simple? Not usually The previous scheme maps all words with exactly the same characters to the same integer value Many variation –Do a different computation on even and odd characters so a re-arrangment will produce different values

Copyright © 2003-2005Curt Hill Collisions Two keys giving the same integer –Assume each bucket may only hold one record Prevent hashing from the optimal search technique it could be Almost every hashing scheme needs a collision strategy There have been many variations

Copyright © 2003-2005Curt Hill Collision Strategies Linear probing –Add one to index until right one is found Quadratic probing –Square the index when the collision occurs Rehash –Secondary hash function Chaining –Link duplicates in an overflow area

Copyright © 2003-2005Curt Hill Dynamics A hash table is much harder to both grow and shrink Increasing or decreasing the number of buckets requires: –Allocate a new table (larger or smaller) – Rehashing every item in old into the new Deleting a record (without resizing) is also a problem because of collisions

Copyright © 2003-2005Curt Hill Deletion of a key The problem is collisions If the key has no collisions just mark the bucket as empty If it has collisions –All of them need to be rehashed –Finding them depends on the collision strategy

Copyright © 2003-2005Curt Hill Two more terms A perfect hash is one that has no collisions –Only occurs in situations with two conditions: –Keys are fixed and known in advance –The hash function is tailored to these keys A minimal hash has no empty buckets

Copyright © 2003-2005Curt Hill Data Skew A good hash function spreads the keys uniformly among the integer range How well does this work when the data is not uniformly distributed? For example consider –Names - There are many more Smiths than Garnjobsts –Numbers – there are many more 101 courses than most other numbers Can the hash function still give a good distribution?

Copyright © 2003-2005Curt Hill Hashing A quick lookup strategy.

Similar presentations

Presentation on theme: "Copyright © 2003-2005Curt Hill Hashing A quick lookup strategy."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Copyright © 2003-2005Curt Hill Hashing A quick lookup strategy.

Similar presentations

Presentation on theme: "Copyright © 2003-2005Curt Hill Hashing A quick lookup strategy."— Presentation transcript:

Similar presentations

About project

Feedback