1 CSC 213 – Large Scale Programming

2 Today’s Goal
• Consider what will be important when searching
  - Why search in the first place? What is its purpose?
  - What should we expect & handle when searching?
  - What factors matter to our users (and ourselves)?
• (Besides a source of bad jokes) What is hashing?
  - Why is it important for searching? How can it help?
  - What are the critical factors of a good hash function?
  - Commonly used hash function example examined

3 Keys To Map & Dictionary
1. Used to convert the key into value
2. Values cannot share a key and be in the same Map
3. In searching, failure is normal, not exceptional

4 Entry ADT
• Needs 2 pieces: what we have & what we want
  - First part is the key: data used in search
  - Item we want is the value; the second part of an Entry
• Implementations must define 2 methods
  - key() & value() return the appropriate item
  - Usually includes setValue() but NOT setKey()
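A minimal Java sketch of the Entry ADT described on this slide. The method names key(), value(), and setValue() come from the slide; the class name SimpleEntry, the generic parameters, and the choice to return the old value from setValue() are assumptions made for illustration.

```java
// Entry ADT: a key (what we have) paired with a value (what we want).
interface Entry<K, V> {
    K key();
    V value();
    V setValue(V newValue); // assumed convention: returns the previous value
}

// Hypothetical implementation; the name SimpleEntry is not from the lecture.
final class SimpleEntry<K, V> implements Entry<K, V> {
    private final K key; // final, so there is deliberately no setKey()
    private V value;

    SimpleEntry(K key, V value) {
        this.key = key;
        this.value = value;
    }

    @Override
    public K key() {
        return key;
    }

    @Override
    public V value() {
        return value;
    }

    @Override
    public V setValue(V newValue) {
        V old = value;
        value = newValue;
        return old;
    }
}
```

Leaving out setKey() keeps an Entry findable: changing the key after storage would leave it filed under the wrong search data.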

5 Sequence-Based Map
• Sequence’s perspective of the Map that it holds
[Diagram labels: Positions, elements]

6 Sequence-Based Map
• Outside view of Map and how it is stored
[Diagram labels: Positions, Entries]

7 Sequence-Based Map
• Map implementation’s view of data and storage
[Diagram labels: Positions, Elements/Entries]

8 Emergency

12 Map Performance
• In all seriousness, can be a matter of life or death
  - 911 operators immediately need addresses
  - Google’s search performance in TB/s
  - O(log n) time is too slow for these uses
• Would love to use arrays
  - Get O(1) time to add, remove, or look up data
  - This HUGE array needs a massive RAM purchase

13 Monster Amounts of RAM
• Java requires using int as array index
  - Limited by int and by the RAM available in a machine
  - Integer.MAX_VALUE = 2,147,483,647
  - 8,200,000,000 pages in Google’s index (2005)
  - In US, possible phone numbers = 10,000,000,000
• Must do more for O(1) array usage time
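The size problem on this slide can be checked directly. The sketch below assumes a table with one slot per possible key; the class and method names are hypothetical.

```java
class IndexLimitDemo {
    // Number of possible 10-digit US phone numbers, as given on the slide.
    static final long PHONE_NUMBERS = 10_000_000_000L;

    // True when an array with this many slots could exist in Java,
    // since array lengths and indices must be int values.
    static boolean fitsInIntIndex(long slots) {
        return slots <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE);             // 2147483647
        System.out.println(fitsInIntIndex(PHONE_NUMBERS)); // false
        System.out.println(fitsInIntIndex(10_000));        // true
    }
}
```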

14 Monster Amounts of RAM

16 Hashing To The Rescue
• Hash function turns key into an int from 0 to N - 1
  - Result is usable as index for an array
  - Specific to the key’s type; cannot be reused
• Store the Entries in an array (“hash table”)
  - (Great name for a shop in Amsterdam, too)
• Begin by computing key’s hash value
  - Result is array index for that Entry
• Now it is possible to use an array for O(1) time!

17 Hash Table Example
• Example shows table of Entries
• Simple hash function is h(x) = x mod 10,000
  - x is/from Entry’s key
  - h(x) computes index to use
  - Always is mod array length
• Not all locations will be used
  - Holes will appear in array
  - Empties: set to null -or- use sentinel value

Hash table:
Index  Key         Value
0      (empty)
1      0256120001  “Jay Doe”
2      9811010002  “Bob Doe”
3      (empty)
4      4512290004  “Jill Roe”
…
9997   (empty)
9998   2007519998  “Rhi Smith”
9999   (empty)
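The example table can be reproduced with a few lines of Java. This is a sketch only: the class name ModHashDemo is hypothetical, keys are assumed to be non-negative, and they are stored as long because some of the example keys exceed Integer.MAX_VALUE.

```java
class ModHashDemo {
    static final int TABLE_LENGTH = 10_000;

    // h(x) = x mod 10,000; assumes the key is non-negative.
    static int hash(long key) {
        return (int) (key % TABLE_LENGTH);
    }

    public static void main(String[] args) {
        String[] table = new String[TABLE_LENGTH]; // unused slots stay null

        // The first key is written without its leading zero, because a
        // leading 0 would make Java read the literal as octal.
        table[hash(256120001L)] = "Jay Doe";
        table[hash(9811010002L)] = "Bob Doe";
        table[hash(4512290004L)] = "Jill Roe";
        table[hash(2007519998L)] = "Rhi Smith";

        System.out.println(table[1]);    // Jay Doe
        System.out.println(table[9998]); // Rhi Smith
        System.out.println(table[3]);    // null (a hole in the table)
    }
}
```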

18 When We Use Hash

19
• Hash key to find index
  - First step for most calls
  - get() - need index to check
  - Add at that index - put()
  - remove() - index to set null
• Then check key at index
  - Many keys are possible at an index
  - Still a Map, so results known
  - If the keys you find are not the same, cannot treat them as the same!
[Hash table: same Entries as in the slide 17 example]
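The two steps above (hash to an index, then compare keys) might look like the following sketch. The class TinyHashMap and its nested Slot class are hypothetical, keys are long and values are String to match the example table, and rejecting a colliding put() with an exception is purely an assumption of this sketch, since the slides leave collision handling for later.

```java
class TinyHashMap {
    // One stored key/value pair; a null slot in the table means "empty".
    private static final class Slot {
        final long key;
        String value;

        Slot(long key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    private final Slot[] table;

    TinyHashMap(int length) {
        table = new Slot[length];
    }

    // Step 1 for every call: hash the key to find its index.
    private int hash(long key) {
        return (int) Math.floorMod(key, (long) table.length);
    }

    String get(long key) {
        Slot s = table[hash(key)];
        // Step 2: landing on the same index is not enough; keys must match.
        return (s != null && s.key == key) ? s.value : null;
    }

    String put(long key, String value) {
        int i = hash(key);
        Slot s = table[i];
        if (s == null) {
            table[i] = new Slot(key, value);
            return null;
        }
        if (s.key == key) {
            String old = s.value;
            s.value = value;
            return old;
        }
        // Assumption: this sketch simply refuses to store a colliding key.
        throw new IllegalStateException("index " + i + " holds a different key");
    }

    String remove(long key) {
        int i = hash(key);
        Slot s = table[i];
        if (s == null || s.key != key) {
            return null; // failure to find a key is normal, not exceptional
        }
        table[i] = null;
        return s.value;
    }
}
```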

20 Properties of Good Hash

22 Reliability of Hash Function
• Implement Map with a hash table
• To use an Entry, get its key to easily look up its index
• Always computes the same index for that key

23 Speed of Hash Function
• Hash must be computed on each access
  - Goal: O(1) efficiency by using an array
  - Efficiency of array is wasted if hash is slow
• If hash function performs an O(1) computation
  - It is possible to perform get in O(1) time
  - O(1) time for put & remove could also occur
• None of this is guaranteed; many problems can occur

24 Use Entire Table Important
• Hashing takes lots of space because an array is used
  - When creating, make array big enough to hold all data
  - Can copy to a larger array, but this is not an O(1) operation
  - Use prime number lengths, but these quickly get large
• Spread out Entries equally across entire table
  - The further apart they are spread, the easier it is to find an opening

25 Hash Function Analogy

26 Hash table

27 Hash Function Analogy Hash function Hash table

28 Examples of Bad Hash
• h(x) = 0
  - Reliable, fast, little use of table
• h(x) = random.nextInt()
  - Unreliable, fast, uses entire table
• h(x) = current index -or- free index
  - Reliable, slow, uses entire table
• h(x) = x^{34} + 2x^{33} + 24x^{32} + 10x^{31} …
  - Reliable, moderate, too large
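The first two bad examples are easy to try out. In this sketch the table length N = 10 is an arbitrary assumption, the class and method names are hypothetical, and the random hash is bounded to the table length so that its results are usable as indices.

```java
import java.util.Random;

class BadHashDemo {
    static final int N = 10; // table length, chosen arbitrarily
    static final Random RANDOM = new Random();

    // Reliable and fast, but every key lands at index 0.
    static int constantHash(int x) {
        return 0;
    }

    // Fast and spread across the table, but unreliable: the same key
    // can get a different index on every call, so lookups cannot work.
    static int randomHash(int x) {
        return RANDOM.nextInt(N);
    }

    public static void main(String[] args) {
        System.out.println(constantHash(7) == constantHash(42)); // true
        System.out.println(randomHash(7) + " then " + randomHash(7));
    }
}
```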

29 Incredibly Bad Hash

33 Incredibly Bad Hash
• Using only part of key & not whole thing
• No matter what, inevitably, you will guess wrong
[Figure labels: Part used for hash; Part that matters]

36 Good Hash
• Hash must first turn key into int
  - Easy for numbers, but rarely that simple in real life
• For a String, could add value of each character
  - Would hash “spot”, “pots”, “stop” to the same index
• Instead we usually use a polynomial code:
  Censored = (x_{0} * a^{k-1}) + (x_{1} * a^{k-2}) + … + (x_{k-2} * a^{1}) + x_{k-1}
  “spot” = (‘s’ * a^{3}) + (‘p’ * a^{2}) + (‘o’ * a^{1}) + (‘t’ * a^{0})
  “stop” = (‘s’ * a^{3}) + (‘t’ * a^{2}) + (‘o’ * a^{1}) + (‘p’ * a^{0})
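A sketch comparing the two approaches for Strings. The class and method names are hypothetical, and a is left as a parameter. Character values are whatever Java's charAt() returns (Unicode code units, which equal ASCII codes for these letters).

```java
class StringHashDemo {
    // Adds the character values, so the order of the characters is ignored.
    static int additiveHash(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    // Polynomial code computed term by term:
    // x_0 * a^(k-1) + x_1 * a^(k-2) + ... + x_(k-1) * a^0
    static long polynomialHash(String s, int a) {
        long total = 0;
        int k = s.length();
        for (int i = 0; i < k; i++) {
            long power = 1;
            for (int j = 0; j < k - 1 - i; j++) {
                power *= a; // recomputes a^(k-1-i) from scratch each time
            }
            total += s.charAt(i) * power;
        }
        return total;
    }

    public static void main(String[] args) {
        // Anagrams collide under the additive hash...
        System.out.println(additiveHash("spot") == additiveHash("stop")); // true
        // ...but not under the polynomial code.
        System.out.println(polynomialHash("spot", 33) == polynomialHash("stop", 33)); // false
    }
}
```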

37 Good, Fast Hash
• Polynomial codes are good, but very slow
  - Major bummer since we use hash for its speed
  - Cause of slowdown: computing a^{n} takes n operations
• Horner’s method is better by piggybacking work
Slow approach:
  “spot” = (‘s’ * a^{3}) + (‘p’ * a^{2}) + (‘o’ * a^{1}) + (‘t’ * a^{0})
Horner’s method:
  “spot” = ‘t’ + (a * (‘o’ + (a * (‘p’ + (a * ‘s’)))))
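Horner’s method reduces to a single loop with one multiplication and one addition per character. A minimal sketch, with a hypothetical class name and a long result so that short strings do not overflow:

```java
class HornerDemo {
    // Computes x_0 * a^(k-1) + ... + x_(k-1) without ever forming a power of a:
    // each pass multiplies everything accumulated so far by a, then adds the
    // next character.
    static long hornerHash(String s, int a) {
        long result = 0;
        for (int i = 0; i < s.length(); i++) {
            result = result * a + s.charAt(i);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(hornerHash("spot", 33));
        System.out.println(hornerHash("stop", 33));
    }
}
```

For “spot” the loop evaluates (((‘s’ * a + ‘p’) * a + ‘o’) * a) + ‘t’, which is the nested expression on the slide read from the inside out.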

38 Compression
• Hash’s only use is computing array indices
  - Useless if larger than table’s length: no index exists!
  - When a = 33, “spot” hashes to 4,258,502
  - Some hashes are incalculable (like “triskaidekaphobia”)
• To compress result, work like array-based queue
  hash = (result + length) % length
  - % returns the modulus (the remainder from division)
  - Serves exact same purpose: keeps index within limits
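A sketch of the compression step, with hypothetical class and method names. It swaps in Java’s Math.floorMod for the slide’s (result + length) % length. This is a deliberate substitution, not the slide’s formula: adding length once only repairs negative results no smaller than -length, and an overflowed int polynomial code can be far more negative than that.

```java
class CompressionDemo {
    // Horner's method in int arithmetic; long strings overflow silently
    // and the result may wrap around to a negative number.
    static int polynomialCode(String s, int a) {
        int result = 0;
        for (int i = 0; i < s.length(); i++) {
            result = result * a + s.charAt(i);
        }
        return result;
    }

    // Swapped-in technique: floorMod always yields 0..length-1 for length > 0.
    static int compress(int result, int length) {
        return Math.floorMod(result, length);
    }

    public static void main(String[] args) {
        int length = 10_000; // table length, as in the earlier example
        System.out.println(compress(polynomialCode("spot", 33), length));
        System.out.println(compress(polynomialCode("triskaidekaphobia", 33), length));
    }
}
```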

39 Before Next Lecture…
• Continue working on week #4 assignment
  - Due at usual time Tues., so may want to get cracking
• Start thinking of designs & CRC cards for project
  - Due in 10 days as projects are completed in stages
• Read sections 9.2.1 & 9.2.5 – 9.2.7 of the book
• Consider better ways of handling this situation:
