1 the hash table. hash table A hash table consists of two major components …

1 1 the hash table

3 hash table

4 A hash table consists of two major components …

5 hash table … a bucket array

6 hash table … and a hash function

7 hash table Performance is expected to be O(1)

8 bucket array

9 hash table bucket array
A bucket array is an array A of size N
A[i] is a bucket, i.e. a collection of (key, value) pairs
N is the capacity of A
an entry with key k is inserted in A[k], if keys are well distributed between 0..N-1
if keys are unique integers in range 0..N-1 then each bucket holds at most one entry, consequently O(1) for get, insert, delete
downside: space is proportional to N; if N is much larger than n (number of entries) we waste space
downside: keys must be in range 0..N-1; this may not be the case (think matric number)
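
A minimal Java sketch of the idea on this slide, assuming keys are already unique integers in 0..N-1 so that the key itself is the array index. The class name DirectTable and its method names are hypothetical, chosen only for illustration.

```java
// Sketch only: a bucket array indexed directly by integer keys in 0..N-1.
// DirectTable and its method names are illustrative assumptions.
class DirectTable {
    private final String[] bucket;   // bucket[k] holds the value for key k, or null

    DirectTable(int capacity) {
        bucket = new String[capacity];   // space is proportional to N, not to n
    }

    // Each operation is a single array access, hence O(1).
    void insert(int k, String v) { bucket[k] = v; }
    String get(int k)            { return bucket[k]; }
    void delete(int k)           { bucket[k] = null; }
}
```

A key outside 0..N-1 makes the array access throw, which is the second downside listed above.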

10 hash table bucket array
Bucket array of size 11 (buckets 0 to 10) for the entries (1,D), (3,C), (3,F), (6,C) and (7,Q)
bucket 1: (1,D); bucket 3: (3,C), (3,F); bucket 6: (6,C); bucket 7: (7,Q)
If hashed keys are unique integers in range [0..10] then each bucket holds at most one entry. Otherwise we have a collision and need to deal with it.

12 hash table bucket array collision
When two different entries map to the same bucket we have a collision
It’s good to avoid collisions

13 hash function

14 hash table hash function
A hash function maps each key to an integer in the range [0,N-1]
Given entry … h(k) is the index into the bucket array: store entry in A[h(k)]
h is a good hash function if
  h maps keys so as to minimise collisions
  h is easy to compute/program
  h is fast to compute
h(k) has two actions
  1. map k to a hash code
  2. map hash code into range [0,N-1]

15 hash table hash function hash codes in java But care should be taken as this might not be “good”
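
The content of this slide did not survive in the transcript. As a hedged illustration of why a built-in hash code "might not be good": Java's String.hashCode() is documented as a polynomial in base 31, and the two short strings below are a well-known pair that collide under it. The class and method names here are hypothetical.

```java
// Sketch only: using Java's built-in hashCode() as the hash code.
class JavaHashCodes {
    // Map a key's built-in hash code into [0, n-1]; floorMod copes with negative codes.
    static int bucketIndex(Object key, int n) {
        return Math.floorMod(key.hashCode(), n);
    }

    public static void main(String[] args) {
        // Two different keys, same hash code: a collision before compression even starts.
        System.out.println("Aa".hashCode() == "BB".hashCode());
        System.out.println(bucketIndex("spot", 11));
    }
}
```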

16 a bit of maths … that you know (af2)

17 af2
Let A and B be sets
A function is a mapping from elements of A to elements of B and is a subset of AxB, i.e. can be defined by a set of tuples!

18 af2
A is the domain, B is the codomain
f(x) = y: y is the image of x, x is a preimage of y
There may be more than one preimage of y
There is only one image of x, otherwise not a function
There may be an element in the codomain with no preimage
Range of f is the set of all images of A, the set of all results

19-20 af2 Injection (aka one-to-one, 1-1)
[Two mapping diagrams: one labelled "injection", one labelled "not an injection"]
If an injection then preimages are unique
Ideally we want our hash function to
  be injective (no collisions)
  have a small codomain and range
  may need to compress range

21 back to ads2

23 hash code & hash function Just to clear this up (but lets not make too big a deal about it) … We assume hash code is an integer in the codomain Hash function brings hash codes into the range [0,N-1] We will examine just a few hash functions, acting on strings

24 hash code & hash function Polynomial hash codes
Assume we have a key s that is a character String
Here is a really dumb hash code

    public int dumbHash(String s){
        int code = 0;
        for (int i=0;i<s.length();i++)
            code = code + s.charAt(i);
        return code;
    }

What would we get for dumbHash("spot"), dumbHash("pots"), dumbHash("tops"), dumbHash("post")?
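
To answer the question on this slide, here is a small runnable sketch. It wraps the slide's dumbHash in a hypothetical class DumbHashDemo (made static so it can be called from main). Because addition ignores order, all four anagrams get the same code.

```java
// Sketch only: the slide's dumbHash, wrapped in a hypothetical demo class.
class DumbHashDemo {
    static int dumbHash(String s) {
        int code = 0;
        for (int i = 0; i < s.length(); i++)
            code = code + s.charAt(i);   // sum of character codes, position ignored
        return code;
    }

    public static void main(String[] args) {
        // 's'=115, 'p'=112, 'o'=111, 't'=116, so every anagram sums to 454.
        for (String w : new String[] {"spot", "pots", "tops", "post"})
            System.out.println(w + " -> " + dumbHash(w));
    }
}
```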

25 hash code & hash function Polynomial hash codes
Take into consideration the “position” of elements of the key
So, this doesn’t look any different from an every-day number
It’s to the base a and the coefficients are the components of the key

26 hash code & hash function Polynomial hash codes
Good values for a appear to be 33, 37, 39, 41
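
The formula itself is missing from the transcript. The sketch below assumes it is the usual polynomial s[0]*a^(n-1) + s[1]*a^(n-2) + … + s[n-1], evaluated with Horner's rule. The class name PolyHash and the use of a 64-bit long are assumptions, not taken from the slides.

```java
// Sketch only: polynomial hash code in base a, evaluated by Horner's rule.
class PolyHash {
    static long polyHash(String s, long a) {
        long code = 0;
        for (int i = 0; i < s.length(); i++)
            code = a * code + s.charAt(i);   // long arithmetic wraps silently on overflow
        return code;
    }

    public static void main(String[] args) {
        // Unlike a plain sum, position matters, so anagrams get different codes.
        System.out.println(polyHash("spot", 33));
        System.out.println(polyHash("pots", 33));
    }
}
```

Because the arithmetic wraps on overflow, long keys can produce very large positive or negative codes.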

27 hash code & hash function Polynomial hash codes
Small scale experiments on unix dictionary
  a = 33
  25104 words/strings
  minimum hash value -9165468936209580338
  maximum hash value 8952279818009261254
  collision count 7
Yikes! Look at that range!!!!

28 hash code & hash function Cyclic shift hash codes
Start moving bits around

30 hash code & hash function Cyclic shift hash codes
Thanks to Arash Partow
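
The code on the cyclic shift slides did not survive in the transcript. As a stand-in, here is a minimal sketch of one common cyclic shift hash code: rotate the running code left by 5 bits, then add the next character. The 5-bit shift and the class name are assumptions, and this is not necessarily any of the functions shown on the original slides.

```java
// Sketch only: a 5-bit cyclic shift hash code (shift amount is an assumption).
class CyclicShiftHash {
    static int cyclicShiftHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = (h << 5) | (h >>> 27);   // rotate left by 5 bits within 32 bits
            h += s.charAt(i);            // mix in the next character
        }
        return h;
    }
}
```

The rotate line is equivalent to Integer.rotateLeft(h, 5).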

38 hash code & hash function Compression Functions
So, you think you’ve found something that produces a good hash code …
How do we compress its range to fit into our machine?

39 hash code & hash function Compression Functions
Assume we want to limit storage to buckets in range [0,N-1]
The division method

    int i = (int) Math.floorMod(hash(s), (long) N);  // plain % gives a negative index for a negative hash code
    S[i] = s;

… ideally, but there may be collisions
NOTE: keep N prime

40 hash code & hash function Compression Functions
Assume we want to limit storage to buckets in range [0,N-1]
The multiply add and divide (MAD) method
  N is prime
  a > 1 is scaling factor
  b ≥ 0 is a shift
  a % N ≠ 0
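
A sketch of both compression functions. The MAD formula is missing from the transcript, so this assumes the form (a*k + b) mod N, which matches the parameters listed on the slide. Class and method names are hypothetical. Math.floorMod is used instead of % because hash codes can be negative.

```java
// Sketch only: compressing a hash code into the range [0, n-1].
class Compression {
    // Division method: hash code mod n, always non-negative.
    static int divide(long hashCode, int n) {
        return (int) Math.floorMod(hashCode, (long) n);
    }

    // MAD method, assuming the form (a*k + b) mod n with n prime,
    // a > 1, b >= 0 and a % n != 0 as listed on the slide.
    static int mad(long hashCode, long a, long b, int n) {
        return (int) Math.floorMod(a * hashCode + b, (long) n);
    }
}
```

For example, divide(-7, 11) gives 4, whereas -7 % 11 in Java is -7 and would be an invalid array index.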

41 hash tables Collision handling schemes

42 hash tables Collision handling schemes Separate Chaining

43 hash tables Collision handling schemes Separate Chaining
bucket[i] is a small map implemented as a list
bucket[i] should be a short list
It may be sorted
It might be something other than a list

44 hash tables Collision handling schemes Separate Chaining
Let N be number of buckets and n the amount of data stored; load factor is n/N
Upside: simple
Downside: requires auxiliary data structures (to resolve collisions); this may put additional burden on space

45-54 hash tables Collision handling schemes Separate Chaining
A simple view: an array where array elements are linked lists
put(Jon,plumber) hash(Jon) = 3: Jon,plumber goes into list 3
put(Fred,painter) hash(Fred) = 6: Fred,painter goes into list 6
put(Joe,prof) hash(Joe) = 1: Joe,prof goes into list 1
put(Ted,cat) hash(Ted) = 3: collision with Jon, so list 3 now holds both entries
locn  list
0
1     Joe,prof
2
3     Jon,plumber  Ted,cat
4
5
6     Fred,painter
7
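
A minimal Java sketch of separate chaining as described on these slides: an array of linked lists, one list per bucket. The class ChainedMap, the String key and value types, and the use of hashCode() with floorMod as the hash function are all assumptions made for illustration.

```java
import java.util.LinkedList;

// Sketch only: separate chaining with one linked list per bucket.
class ChainedMap {
    private static final class Entry {
        final String key;
        String value;
        Entry(String key, String value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry>[] bucket;
    private int size;   // n, the number of entries stored

    @SuppressWarnings("unchecked")
    ChainedMap(int n) {
        bucket = new LinkedList[n];
        for (int i = 0; i < n; i++) bucket[i] = new LinkedList<>();
    }

    // Assumed hash function: built-in hash code compressed into [0, N-1].
    private int hash(String key) { return Math.floorMod(key.hashCode(), bucket.length); }

    void put(String key, String value) {
        LinkedList<Entry> list = bucket[hash(key)];
        for (Entry e : list)
            if (e.key.equals(key)) { e.value = value; return; }   // key present: update
        list.add(new Entry(key, value));                          // new key: extend the chain
        size++;
    }

    String get(String key) {
        for (Entry e : bucket[hash(key)])
            if (e.key.equals(key)) return e.value;
        return null;   // not found
    }

    double loadFactor() { return (double) size / bucket.length; }   // n/N
}
```

Colliding keys simply share a list, so get and put cost time proportional to the length of one chain.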

55 hash tables Collision handling schemes Open Addressing

56 hash tables Linear Probing Open Addressing

57 hash tables Linear Probing Open Addressing
i = hash(key); bucket[i] != null; collision!
Try next bucket[(i+1) % N]
Try next bucket[(i+2) % N]
…
Try next bucket[(i+N-1) % N]

58-79 hash tables Linear Probing Open Addressing
put(Jon,plumber) hash(Jon) = 3: bucket 3 is free, store there
put(Fred,painter) hash(Fred) = 6: bucket 6 is free, store there
put(Joe,prof) hash(Joe) = 1: bucket 1 is free, store there
put(Ted,cat) hash(Ted) = 3: bucket 3 is taken, bucket 4 is free, store there
put(Jock,dancer) hash(Jock) = 7: bucket 7 is free, store there
put(Burt,poet) hash(Burt) = 0: bucket 0 is free, store there
put(Bob,fish) hash(Bob) = 6: buckets 6, 7, 0 and 1 are taken, bucket 2 is free, store there
locn  key   value
0     Burt  poet
1     Joe   prof
2     Bob   fish
3     Jon   plumber
4     Ted   cat
5
6     Fred  painter
7     Jock  dancer

80 hash tables Linear Probing Open Addressing
What happens with get(key)?
1. i = hash(key);
2. bucket[i] == key … found, return
3. bucket[i] == null … not found, return
4. bucket[i] != null and bucket[i] != key: i = (i+1) % N; goto 2
“Linear Probing” gets its name because accessing a bucket is viewed as a probe

81 hash tables Linear Probing Open Addressing
What happens with remove(key)?
1. i = hash(key);
2. bucket[i] == key … found: bucket[i] = “removed”; return
3. bucket[i] == null … not found, return
4. bucket[i] != null and bucket[i] != key: i = (i+1) % N; goto 2
We have a special marker “removed”

82 hash tables Linear Probing Open Addressing
What happens with put(key,value)?
1. Free location j = -1;
2. i = hash(key);
3. bucket[i] == key … found: update bucket[i]; return
4. bucket[i] == “removed”: j = i; i = (i+1) % N; goto 3
5. bucket[i] != null && bucket[i] != key: i = (i+1) % N; goto 3
6. bucket[i] == null // search stops
   if (j > -1) bucket[j] = (key,value)
   if (j == -1) bucket[i] = (key,value)
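
The get, remove and put steps above can be sketched in Java as follows. The class ProbingMap, the String types, the hashCode()-based hash function and the REMOVED sentinel object are assumptions. Two deliberate departures from the slides, both named here: the sketch remembers the first "removed" slot it passes rather than the last, and it gives up after N probes so that a full table cannot loop forever.

```java
// Sketch only: open addressing with linear probing and a "removed" marker.
class ProbingMap {
    // Unique sentinel object, compared by identity, so it cannot equal a real key.
    private static final String REMOVED = new String("<removed>");
    private final String[] keys;
    private final String[] values;

    ProbingMap(int n) { keys = new String[n]; values = new String[n]; }

    private int hash(String key) { return Math.floorMod(key.hashCode(), keys.length); }

    // Index holding key, or -1 if absent. Probes at most N buckets.
    private int find(String key) {
        int i = hash(key);
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[i] == null) return -1;                          // empty bucket: not found
            if (keys[i] != REMOVED && keys[i].equals(key)) return i; // found
            i = (i + 1) % keys.length;                               // probe the next bucket
        }
        return -1;
    }

    String get(String key) {
        int i = find(key);
        return i < 0 ? null : values[i];
    }

    void remove(String key) {
        int i = find(key);
        if (i >= 0) { keys[i] = REMOVED; values[i] = null; }   // leave a marker, not null
    }

    // Returns false only if the table has no free or removed bucket left.
    boolean put(String key, String value) {
        int i = hash(key), free = -1;
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[i] == null) { if (free < 0) free = i; break; }   // search stops
            if (keys[i] == REMOVED) { if (free < 0) free = i; }       // remember reusable slot
            else if (keys[i].equals(key)) { values[i] = value; return true; }
            i = (i + 1) % keys.length;
        }
        if (free < 0) return false;
        keys[free] = key;
        values[free] = value;
        return true;
    }
}
```

The marker matters: if remove wrote null instead, a later get for a key stored further along the same probe run would stop early and report "not found".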

83 hash tables Linear Probing Open Addressing
So?
Advantages
  saves space as bucket[i] is only a bucket for a single entry, that is, no additional data structures
Disadvantages
  removals are complicated
  put is complicated if there are collisions
  entries might clump together; search can then degenerate from O(1) to O(N)
We might use linear probing when memory is tight and we want FAST access

84 hash tables Quadratic Probing Open Addressing

85 hash tables Quadratic Probing Open Addressing
Quadratic probing: iteratively try bucket[(i + f(j)) % N]
where i = hash(key), j = 0,1,2,… and f(j) = j*j
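
A tiny sketch of the probe sequence this formula generates. The class and method names are hypothetical.

```java
// Sketch only: the j-th bucket tried by quadratic probing.
class QuadraticProbe {
    // i is the home bucket hash(key), j the attempt number, n the table size.
    static int probe(int i, int j, int n) {
        return (i + j * j) % n;
    }

    public static void main(String[] args) {
        // Home bucket 3 in a table of size 8: tries 3, 4, 7, 4, ...
        for (int j = 0; j < 4; j++)
            System.out.println(probe(3, j, 8));
    }
}
```

Note that bucket 4 is tried twice here: unlike linear probing, quadratic probing is not guaranteed to visit every bucket.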

86 hash tables Double Hashing Open Addressing

87 hash tables Double Hashing Open Addressing
We have a secondary hash function (call it g)
i = hash(key) and collision at bucket[i]
Try bucket[(i + g(key)) % N]
where g(key) = q - (key % q)
and q is a prime number < N
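
A sketch of double hashing with the secondary function from this slide. The slide only shows the first retry. Stepping by j*g(key) on the j-th attempt is the usual generalisation and is an assumption here, as are the integer key type, the primary hash key mod N, and the class name.

```java
// Sketch only: probe sequence for double hashing with g(key) = q - (key % q).
class DoubleHash {
    // Secondary hash: always in 1..q, so the step is never zero.
    static int g(int key, int q) {
        return q - Math.floorMod(key, q);
    }

    // j-th bucket tried for key in a table of size n (j = 0 is the home bucket).
    static int probe(int key, int j, int n, int q) {
        int i = Math.floorMod(key, n);                  // assumed primary hash
        return (int) ((i + (long) j * g(key, q)) % n);
    }
}
```

With n = 11, q = 7 and key = 25, the home bucket is 3, g(25) is 3, and the probes visit 3, 6, 9, 1.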

88 hash tables So? Open Addressing

89 hash tables Open Addressing
So?
Open addressing saves space, but is complicated, and may be slower
In experiments chaining is competitive or faster, depending on load factor
If memory is not an issue: recommend using chaining with a low load factor
