Published by Cornelia Lloyd. Modified over 9 years ago.
1
CSC 213 – Large Scale Programming
2
Today's Goal
Consider what will be important when searching
  Why search in the first place? What is its purpose?
  What should we expect & handle when searching?
  What factors matter to our users (and ourselves)?
What is hashing? (Besides a source of bad jokes)
  Why is it important for searching? How can it help?
  What are the critical factors of a good hash function?
  Commonly-used hash function example examined
3
Keys To Map & Dictionary
1. Used to convert a key into its value
2. Values cannot share a key and be in the same Map
3. In searching, failure is normal, not exceptional
4
Entry ADT
Needs 2 pieces: what we have & what we want
  First part is the key: data used in search
  Item we want is the value; the second part of an Entry
Implementations must define 2 methods
  key() & value() return appropriate item
  Usually includes setValue() but NOT setKey()
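A minimal sketch of what such an Entry could look like in Java. The names `Entry` and `SimpleEntry` are illustrative, and having `setValue()` return the old value is an assumption, not something the slide specifies.

```java
// Sketch of the Entry ADT: a key used for searching plus the value we want.
interface Entry<K, V> {
    K key();               // data used in the search
    V value();             // item we want to find
    V setValue(V value);   // assumption: returns the value being replaced
    // Deliberately no setKey(): the key identifies the Entry within the Map.
}

// One possible implementation (hypothetical name).
final class SimpleEntry<K, V> implements Entry<K, V> {
    private final K key;   // final, so the key can never change
    private V value;

    SimpleEntry(K key, V value) {
        this.key = key;
        this.value = value;
    }

    @Override
    public K key() {
        return key;
    }

    @Override
    public V value() {
        return value;
    }

    @Override
    public V setValue(V newValue) {
        V old = value;
        value = newValue;
        return old;
    }
}
```

Making the key field `final` enforces the "NOT setKey()" rule at compile time.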
5
Sequence-Based Map
Sequence's perspective of the Map it holds
(figure labels: Positions, elements)
6
Sequence-Based Map
Outside view of Map and how it is stored
(figure labels: Positions, Entries)
7
Sequence-Based Map
Map implementation's view of data and storage
(figure labels: Positions, Elements/Entries)
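The sequence-based idea can be sketched as follows. This is only an illustration: it uses `java.util.ArrayList` as a stand-in for the course's Sequence ADT (so there are no Positions), and the class name `SequenceMap` is hypothetical. Every operation scans the sequence, which is what makes this approach O(n).

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a Map stored as a sequence of Entries, searched linearly.
class SequenceMap<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;

        Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    // Stand-in for the Sequence; the Map's elements are Entries.
    private final List<Entry<K, V>> sequence = new ArrayList<>();

    // Failure is normal in searching: return null rather than throw.
    public V get(K key) {
        for (Entry<K, V> e : sequence) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    // Values cannot share a key, so an existing key has its value replaced.
    public V put(K key, V value) {
        for (Entry<K, V> e : sequence) {
            if (e.key.equals(key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        sequence.add(new Entry<>(key, value));
        return null;
    }

    public V remove(K key) {
        for (int i = 0; i < sequence.size(); i++) {
            if (sequence.get(i).key.equals(key)) {
                return sequence.remove(i).value;
            }
        }
        return null;
    }

    public int size() {
        return sequence.size();
    }
}
```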
8
Emergency
12
Map Performance
In all seriousness, can be matter of life-or-death
  911 operators immediately need addresses
  Google's search performance in TB/s
  O(log n) time too slow for these uses
Would love to use arrays
  Get O(1) time to add, remove, or look up data
  This HUGE array needs massive RAM purchase
13
Monster Amounts of RAM
Java requires using int as array index
  Limited by int and by RAM available in a machine
  Integer.MAX_VALUE = 2,147,483,647
  8,200,000,000 pages in Google's index (2005)
  In US, possible phone numbers = 10,000,000,000
Must do more for O(1) array usage time
16
Hashing To The Rescue
Hash function turns key into int from 0 to N-1
  Result is usable as index for an array
  Specific for key's type; cannot be reused
Store the Entries in array ("hash table")
  (Great name for shop in Amsterdam, too)
  Begin by computing key's hash value
  Result is array index for that Entry
Now it is possible to use array for O(1) time!
17
Hash Table Example
Example shows table of Entries
Simple hash function is h(x) = x mod 10,000
  x is/from Entry's key
  h(x) computes index to use
  Always is mod array length
Not all locations will be used
  Holes will appear in array
  Empties: set to null -or- use sentinel value

Hash table Entries:
Index   Key          Value
0       (empty)
1       0256120001   "Jay Doe"
2       9811010002   "Bob Doe"
3       (empty)
4       4512290004   "Jill Roe"
...
9997    (empty)
9998    2007519998   "Rhi Smith"
9999    (empty)
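The example hash function can be checked with a few lines of Java. This is a sketch assuming non-negative keys stored as `long`, since 10-digit keys do not fit in an `int`; the class and method names are hypothetical.

```java
// Sketch: h(x) = x mod 10,000 applied to the keys from the example table.
class HashExample {
    static final int TABLE_LENGTH = 10_000;

    // Assumes x is non-negative, so % never yields a negative index.
    static int h(long x) {
        return (int) (x % TABLE_LENGTH);
    }

    public static void main(String[] args) {
        // 0256120001 is written without its leading zero: a leading 0
        // would make Java read the literal as octal.
        long[] keys = {256120001L, 9811010002L, 4512290004L, 2007519998L};
        for (long key : keys) {
            System.out.println(key + " -> index " + h(key));
        }
    }
}
```

Each key lands on the index formed by its last four digits, which matches the rows of the table.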
18
When We Use Hash
19
Hash key to find index
  First step for most calls
  get() - need index to check
  put() - add at that index
  remove() - index to set null
Then check key at index
  At index many keys possible
  Still a Map, so results known
  If you find keys not same, cannot treat as the same!
(Hash table Entries: same table as in Hash Table Example)
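The two steps (hash the key to find the index, then check the key stored there) might look like this in Java. It is a sketch only: the class name is hypothetical, keys are assumed non-negative, and throwing on a collision is a placeholder assumption because these slides do not yet say how to handle two keys sharing an index.

```java
// Sketch of a hash-table Map with no collision handling.
class SimpleHashTableMap {
    private static final class Entry {
        final long key;
        String value;

        Entry(long key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    private final Entry[] table;   // null marks an empty location

    SimpleHashTableMap(int length) {
        table = new Entry[length];
    }

    // Step 1: hash the key to find the index (always mod array length).
    private int index(long key) {
        return (int) (key % table.length);
    }

    // Step 2: check the key at that index; a different key is a miss.
    String get(long key) {
        Entry e = table[index(key)];
        return (e != null && e.key == key) ? e.value : null;
    }

    String put(long key, String value) {
        int i = index(key);
        Entry e = table[i];
        if (e == null) {
            table[i] = new Entry(key, value);
            return null;
        }
        if (e.key == key) {
            String old = e.value;
            e.value = value;
            return old;
        }
        // Placeholder: a different key already hashes to this index.
        throw new IllegalStateException("index " + i + " holds a different key");
    }

    String remove(long key) {
        int i = index(key);
        Entry e = table[i];
        if (e == null || e.key != key) {
            return null;
        }
        table[i] = null;
        return e.value;
    }
}
```

Note that `get(20002L)` on a table holding key 9811010002 returns null even though both hash to index 2: the keys are not the same, so they cannot be treated as the same.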
20
Properties of Good Hash
22
Reliability of Hash Function
Implement Map with a hash table
  To use an Entry, hash its key to easily look up its index
  Must always compute same index for that key
23
Speed of Hash Function
Hash must be computed on each access
  Goal: O(1) efficiency by using an array
  Efficiency of array wasted if hash is slow
If O(1) computation performed by hash function
  It is possible to perform get in O(1) time
  O(1) time for put & remove could also occur
None of this is guaranteed; many problems can occur
24
Use Entire Table (Important)
Hashing takes lots of space because array is used
  When creating, make array big enough to hold all data
  Can copy to larger array, but this is not an O(1) operation
  Use prime number lengths, but these quickly get large
Good hash spreads out Entries equally across entire table
  Further apart they are spread, easier to find an opening
25
Hash Function Analogy
(figure labels: Hash function, Hash table)
28
Examples of Bad Hash
h(x) = 0
  Reliable, fast, little use of table
h(x) = random.nextInt()
  Unreliable, fast, uses entire table
h(x) = current index -or- free index
  Reliable, slow, uses entire table
h(x) = x^34 + 2x^33 + 24x^32 + 10x^31 + …
  Reliable, moderate, too large
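The first two bad examples are easy to sketch in Java. The names here are hypothetical, and `random.nextInt(N)` is used in place of the slide's `random.nextInt()` so the result stays within a table of length N.

```java
import java.util.Random;

// Sketch of two bad hash functions for a table of length N.
class BadHashes {
    static final int N = 10;
    static final Random random = new Random();

    // Reliable and fast, but every key lands on index 0.
    static int constantHash(int x) {
        return 0;
    }

    // Fast and uses the whole table, but unreliable: the same key
    // can get a different index on every call, so get() cannot find it.
    static int randomHash(int x) {
        return random.nextInt(N);
    }

    public static void main(String[] args) {
        System.out.println(constantHash(42) + " " + constantHash(7));
        System.out.println(randomHash(42) + " " + randomHash(42));
    }
}
```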
29
Incredibly Bad Hash
Using only part of key & not whole thing
No matter what, inevitably, you will guess wrong
(figure labels: Part used for hash, Part that matters)
34
Good Hash
Hash must first turn key into int
  Easy for numbers, but rarely that simple in real life
For a String, could add value of each character
  Would hash to same index: "spot", "pots", "stop"
Instead we usually use polynomial code:
  hash = (x_0 * a^(k-1)) + (x_1 * a^(k-2)) + … + (x_(k-2) * a^1) + x_(k-1)
  "spot" = ('s' * a^3) + ('p' * a^2) + ('o' * a^1) + ('t' * a^0)
  "stop" = ('s' * a^3) + ('t' * a^2) + ('o' * a^1) + ('p' * a^0)
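A short Java sketch contrasting the two approaches for a String. The class and method names are hypothetical, and the polynomial code is computed term by term exactly as the formula reads, with no attempt at speed.

```java
// Sketch: additive hash versus polynomial code for a String.
class StringHashes {
    // Adds the value of each character; ignores character order.
    static int additiveHash(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    // Polynomial code: x_0 * a^(k-1) + x_1 * a^(k-2) + ... + x_(k-1).
    // Java int arithmetic silently wraps around if the result overflows.
    static int polynomialHash(String s, int a) {
        int k = s.length();
        int hash = 0;
        for (int i = 0; i < k; i++) {
            int power = 1;
            for (int j = 0; j < k - 1 - i; j++) {
                power *= a;   // builds a^(k-1-i) from scratch each time
            }
            hash += s.charAt(i) * power;
        }
        return hash;
    }

    public static void main(String[] args) {
        for (String word : new String[] {"spot", "pots", "stop"}) {
            System.out.println(word + ": additive = " + additiveHash(word)
                    + ", polynomial = " + polynomialHash(word, 33));
        }
    }
}
```

The additive version gives all three words the same result because they contain the same characters; the polynomial code weights each position differently, so reordering the characters changes the result.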
37
Good, Fast Hash
Polynomial codes very good, but very slow
  Major bummer since we use hash for its speed
  Cause of slowdown: computing a^n takes n operations
Horner's method better by piggybacking work
  Slow approach: "spot" = ('s' * a^3) + ('p' * a^2) + ('o' * a^1) + ('t' * a^0)
  Horner's method: "spot" = 't' + (a * ('o' + (a * ('p' + (a * 's')))))
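Horner's method can be sketched as a single loop in Java. The name `hornerHash` is hypothetical; the loop multiplies the running total by a once per character, so no power of a is ever computed separately.

```java
// Sketch: polynomial code computed with Horner's method.
class HornerHash {
    // Equivalent to 't' + a * ('o' + a * ('p' + a * 's')) for "spot".
    // One multiply and one add per character; int overflow wraps around.
    static int hornerHash(String s, int a) {
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            hash = hash * a + s.charAt(i);
        }
        return hash;
    }

    public static void main(String[] args) {
        System.out.println(hornerHash("spot", 33));
    }
}
```

For a key of k characters this takes k multiplications, where the term-by-term approach needs on the order of k^2.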
38
Compression
Hash's only use is computing array indices
  Useless if larger than table's length: no index exists!
  When a = 33, "spot" hashes to 4,258,502
  Some hashes are too large for an int (like "triskaidekaphobia")
To compress result, work like array-based queue: use + and %
  hash = (result + length) % length
  % returns the modulus (the remainder from division)
  Serves exact same purpose: keeps index within limits
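A sketch of the compression step in Java. It swaps in `Math.floorMod` for the slide's `(result + length) % length`: in Java an overflowed hash can be a large negative int, and adding length once is then not enough to make `%` return a valid index. The class and method names are hypothetical.

```java
// Sketch: compress any int hash result into the range 0 .. length-1.
class Compression {
    // Math.floorMod returns a non-negative result for a positive length,
    // even when hashResult is negative because of int overflow.
    static int compress(int hashResult, int length) {
        return Math.floorMod(hashResult, length);
    }

    public static void main(String[] args) {
        int length = 10_000;
        System.out.println(compress(4_258_502, length));
        System.out.println(compress(-7, length));
    }
}
```

For non-negative results this gives the same index as plain `%`, so it agrees with the slide's formula in the cases the slide has in mind.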
39
Before Next Lecture…
Continue working on week #4 assignment
  Due at usual time Tues., so may want to get cracking
Start thinking of designs & CRC cards for project
  Due in 10 days as projects completed in stages
Read sections 9.2.1 & 9.2.5 – 9.2.7 of the book
  Consider better ways of handling this situation: