Hash C and Data Structure Baojian Hua


1 Hash C and Data Structure Baojian Hua bjhua@ustc.edu.cn

2 Searching: A dictionary-like data structure contains a collection of key-value tuples (k1, v1), (k2, v2), …; keys are comparable and pairwise distinct. It supports these operations: new (), insert (dict, k, v), lookup (dict, k), delete (dict, k).

3 Examples
Application      Purpose      Key        Value
Phone Book       phone        name       phone No.
Bank             transaction  visa       $$$
Dictionary       lookup       word       meaning
compiler         symbol       variable   type
www.google.com   search       key words  contents
…                …            …          …

4 Summary So Far (rep \ op): array, sorted array, linked list, sorted linked list, binary search tree; lookup(): O(n), O(lg n), O(n); insert(): O(n); delete(): O(n)

5 What's the Problem? For every mapping (k, v), after we insert it into the dictionary dict, we don't know its position! Ex: insert (d, "li", 97), (d, "wang", 99), (d, "zhang", 100), … and then lookup (d, "zhang"). Stored entries: ("li", 97), …, ("wang", 99), ("zhang", 100).

6 Basic Plan: Start from the array-based approach: use an array A to hold the elements (k, v). For every key k, if we can compute its position (array index) i from k, then lookup, insert and delete are simple: each is an access to A[i], done in constant time O(1).

7 Example: insert (d, "li", 97), (d, "wang", 99), (d, "zhang", 100), …; and then lookup (d, "zhang"). Problem #1: How to calculate the index from the key?

8 Example: insert (d, "li", 97), (d, "wang", 99), (d, "zhang", 100), …; and then lookup (d, "zhang"). Problem #2: How long should the array be?

9 Basic Plan: Save the (k, v) pairs in an array, each at an index calculated from k. Hash function: a method for computing an index from a given key, e.g. ("li", 97) is stored at index hash ("li").

10 Hash Function: Given any key, compute an index. It should be efficiently computable. Ideal goals: the indexes are uniformly distributed over the keys, and different keys map to different indexes. However, this is a hard problem and an area of thorough research :-( Next, we assume that the array is of infinite length, so the hash function has type: int hash (key k); What follows is a "case analysis" of how different key types affect "hash".

11 Hash Function on "int": If the key is of type "int", the hash function is trivial: int hash (int i) { return i; }

12 Hash Function on "char": If the key is of type "char", the hash function is just a type conversion: int hash (char c) { return c; }

13 Hash Function on "float": Also a type conversion: int hash (float f) { return (int)f; } // how to deal with 0.aaa, say 0.5? Note that the cast truncates toward zero, so every key in (-1, 1) hashes to 0.
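One possible answer to the 0.5 question is to hash the bit pattern of the float instead of its truncated value. The sketch below is not from the slides: the name hash_float_bits is hypothetical, and it assumes that float is a 32-bit IEEE 754 value. It masks the sign bit so the result is non-negative, which means x and -x share a hash code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the slides): hash a float by its bits.
   Assumes float is 32 bits wide (IEEE 754 single precision). */
int hash_float_bits (float f)
{
  uint32_t bits;

  /* memcpy is the well-defined way to reinterpret the bytes of f */
  memcpy (&bits, &f, sizeof bits);

  /* drop the sign bit so the result fits in a non-negative int;
     as a consequence x and -x (and +0.0 and -0.0) hash alike */
  return (int) (bits & 0x7fffffffu);
}
```

With this version 0.5 and 0.25 no longer collapse to the same hash code, unlike with the (int) cast.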

14 Hash Function on "string"
    int hash (char *s)
    {
      int i = 0, sum = 0;
      while (s[i]) {
        sum += s[i];
        i++;
      }
      return sum;
    }
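A character sum ignores the order of the characters, so anagrams such as "ab" and "ba" always collide. The sketch below contrasts the additive hash with a position-sensitive variant that multiplies by 31 at each step. The names hash_sum and hash_poly and the multiplier 31 are choices made for this illustration, not part of the slides; unsigned arithmetic is used so that overflow wraps around instead of being undefined.

```c
/* Additive hash as on the slide, rewritten with unsigned arithmetic. */
unsigned hash_sum (const char *s)
{
  unsigned sum = 0;
  while (*s)
    sum += (unsigned char) *s++;
  return sum;
}

/* Hypothetical alternative: each step multiplies the running value
   by 31 before adding the next character, so order matters. */
unsigned hash_poly (const char *s)
{
  unsigned h = 0;
  while (*s)
    h = 31u * h + (unsigned char) *s++;
  return h;
}
```

For example, hash_sum gives the same value for "ab" and "ba", while hash_poly gives 3105 and 3135 respectively (assuming an ASCII character set).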

15 From "int" Hash to Index: Problems with the "int" hash type: at any time the array is finite, and there is no negative index (say -10). Our goal: map int i ==> [0, N-1]. Aha, that's easy! It's just: abs(i) % N

16 Bug! Note the range of "int": -2^31 ~ 2^31 - 1. So abs(-2^31) = 2^31 (overflow!). The key step is to wipe the sign bit off: int t = i & 0x7fffffff; int hc = t % N; In summary: hc = (i & 0x7fffffff) % N;
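The masking step can be wrapped in a small function to see how it behaves at the edges. This is a minimal sketch; the name hash_to_index is hypothetical, and it assumes a 32-bit two's complement int and n > 0. Note that masking is not the same as taking the absolute value: -10 maps to a different index than 10, which is fine as long as every lookup uses the same rule.

```c
/* Hypothetical wrapper around the formula hc = (i & 0x7fffffff) % N.
   Assumes a 32-bit two's complement int and n > 0. */
int hash_to_index (int i, int n)
{
  int t = i & 0x7fffffff;   /* clear the sign bit: t is in [0, 2^31 - 1] */
  return t % n;             /* fold into the range [0, n - 1] */
}
```

With 8 buckets, the hash value -2^31 lands at index 0, 10 lands at index 2, and -10 lands at index 6.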

17 Collision: Given two keys k1 and k2, we compute two hash values h1, h2 ∈ [0, N-1]. If k1 != k2 but h1 == h2, then a collision occurs: (k1, v1) and (k2, v2) compete for the same index i.

18 Collision Resolution: Open Addressing; Re-hash; Chaining

19 For a collision at index i, we keep a separate linear list (chain) at index i; both (k1, v1) and (k2, v2) are stored in that chain.
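Chaining can be sketched with a fixed array of singly linked lists. Everything in this block is an illustration rather than the course's implementation: the names Node, chain_insert and chain_lookup, the bucket count of 8, and the choice of string keys with int values are all assumptions. Duplicate keys are not detected, and a new node simply goes to the head of its chain.

```c
#include <stdlib.h>
#include <string.h>

#define N_BUCKETS 8   /* fixed size, chosen only for this sketch */

struct Node {
  const char *key;    /* not copied: the caller keeps the string alive */
  int value;
  struct Node *next;
};

/* File-scope array, so every chain starts out empty (NULL). */
static struct Node *buckets[N_BUCKETS];

/* Additive string hash folded into [0, N_BUCKETS - 1]. */
static int index_of (const char *s)
{
  unsigned sum = 0;
  while (*s)
    sum += (unsigned char) *s++;
  return (int) (sum % N_BUCKETS);
}

/* Put (key, value) at the head of its chain.
   Returns 0 on success, -1 if memory runs out. */
int chain_insert (const char *key, int value)
{
  int i = index_of (key);
  struct Node *n = malloc (sizeof *n);
  if (n == NULL)
    return -1;
  n->key = key;
  n->value = value;
  n->next = buckets[i];
  buckets[i] = n;
  return 0;
}

/* Walk the chain for key; return a pointer to its value, or NULL. */
int *chain_lookup (const char *key)
{
  struct Node *n;
  for (n = buckets[index_of (key)]; n != NULL; n = n->next)
    if (strcmp (n->key, key) == 0)
      return &n->value;
  return NULL;
}
```

The keys "li" and "il" have the same character sum, so they collide and share one chain, yet each can still be found by comparing keys along the chain.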

20 Load Factor: loadFactor = numItems / numBuckets. defaultLoadFactor: the default value of the load factor.

21 "hash" ADT: interface
    #ifndef HASH_H
    #define HASH_H

    #define T Hash_t
    typedef struct T *T;

    T Hash_new ();
    T Hash_new2 (double lf);
    void Hash_insert (T h, poly key, poly value);
    poly Hash_lookup (T h, poly key);
    void Hash_delete (T h, poly key);

    #undef T
    #endif

22 Implementation
    #include "linked-list.h"
    #include "hash.h"

    #define EXT_FACTOR 2
    #define INIT_BUCKETS 16

    #define T Hash_t
    struct T
    {
      LinkedList_t *buckets;   // heap-allocated array of chains, so it can grow
      int numBuckets;
      int numItems;
      double defaultLoadFactor;
    };

23 In Figure: h points to the buckets array; each bucket holds a chain of keys (k1, k2, k5, k8, k43 in the figure).

24 "newHash ()"
    T Hash_new ()
    {
      T h;
      NEW (h);
      h->buckets = checkedMalloc (INIT_BUCKETS * sizeof (*h->buckets));
      h->numBuckets = INIT_BUCKETS;
      h->numItems = 0;
      h->defaultLoadFactor = 0.25;
      return h;
    }

25 "newHash2 ()"
    T Hash_new2 (double lf)
    {
      T h;
      NEW (h);
      h->buckets = checkedMalloc (INIT_BUCKETS * sizeof (*h->buckets));
      h->numBuckets = INIT_BUCKETS;
      h->numItems = 0;
      h->defaultLoadFactor = lf;
      return h;
    }

26 "lookup (hash, key)"
    poly Hash_lookup (T h, poly k)
    {
      int i = k->hashCode (); // how do we obtain this?
      int hc = (i & 0x7fffffff) % (h->numBuckets);
      poly t = List_Search ((h->buckets)[hc], k);
      return t;
    }
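C has no methods, so k->hashCode () cannot be written literally for an untyped key. One common workaround is to hand the table a pair of function pointers that know how to hash and compare its keys. The sketch below is only one possible design and is not taken from the course code: KeyOps, string_key_ops, bucket_index and the typedef of poly as void * are all assumptions.

```c
#include <string.h>

typedef void *poly;   /* assumed definition of the generic type */

/* Operations a key type must supply (hypothetical record). */
struct KeyOps {
  int (*hashCode) (poly key);
  int (*equals) (poly a, poly b);
};

/* String keys: additive hash and strcmp-based equality. */
static int str_hash (poly key)
{
  const unsigned char *s = key;
  int sum = 0;
  while (*s)
    sum += *s++;
  return sum;
}

static int str_equals (poly a, poly b)
{
  return strcmp (a, b) == 0;
}

const struct KeyOps string_key_ops = { str_hash, str_equals };

/* The index computation from the slide, with the hash code
   obtained through the function pointer. */
int bucket_index (const struct KeyOps *ops, poly key, int numBuckets)
{
  int i = ops->hashCode (key);
  return (i & 0x7fffffff) % numBuckets;
}
```

A table created for string keys would store a pointer to string_key_ops and call bucket_index inside lookup and insert; other key types would supply their own KeyOps record.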

27 Ex: lookup (ha, k43): hc = (hash (k43) & 0x7fffffff) % 8; // hc = 1

28 Ex: lookup (ha, k43): hc = (hash (k43) & 0x7fffffff) % 8; // hc = 1; compare k43 with k8: not equal, so move on to the next item in the chain

29 Ex: lookup (ha, k43): hc = (hash (k43) & 0x7fffffff) % 8; // hc = 1; compare k43 with k43: found!

30 "insert (hash, key, value)"
    void Hash_insert (T h, poly k, poly v)
    {
      if (1.0 * h->numItems / h->numBuckets >= h->defaultLoadFactor) {
        // buckets extension & items re-hash
      }
      int i = k->hashCode (); // how to do this?
      int hc = (i & 0x7fffffff) % (h->numBuckets);
      Tuple_t x = Tuple_new (k, v);
      List_insertHead ((h->buckets)[hc], x);
      h->numItems++; // keep the load factor up to date
      return;
    }
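The "buckets extension & items re-hash" step is left open on the slide. A minimal sketch of one way to do it follows, under several assumptions: int keys and values, the hypothetical names Table, Item, table_new, table_grow, table_insert and table_lookup, and a growth factor of 2. Every item must be re-indexed because its index depends on the number of buckets.

```c
#include <stdlib.h>

struct Item { int key; int value; struct Item *next; };

struct Table {
  struct Item **buckets;   /* array of chain heads */
  int numBuckets;
  int numItems;
  double loadFactor;       /* grow when numItems / numBuckets reaches it */
};

static int index_of (int key, int numBuckets)
{
  return (key & 0x7fffffff) % numBuckets;
}

struct Table *table_new (int numBuckets, double loadFactor)
{
  struct Table *t = malloc (sizeof *t);
  if (t == NULL)
    return NULL;
  /* calloc zero-fills; this sketch assumes that yields NULL pointers */
  t->buckets = calloc ((size_t) numBuckets, sizeof *t->buckets);
  if (t->buckets == NULL) { free (t); return NULL; }
  t->numBuckets = numBuckets;
  t->numItems = 0;
  t->loadFactor = loadFactor;
  return t;
}

/* Double the bucket array and relink every item into its new chain. */
static int table_grow (struct Table *t)
{
  int b, newSize = t->numBuckets * 2;
  struct Item **fresh = calloc ((size_t) newSize, sizeof *fresh);
  if (fresh == NULL)
    return -1;
  for (b = 0; b < t->numBuckets; b++) {
    struct Item *it = t->buckets[b];
    while (it != NULL) {
      struct Item *next = it->next;        /* save before relinking */
      int i = index_of (it->key, newSize); /* index under the new size */
      it->next = fresh[i];
      fresh[i] = it;
      it = next;
    }
  }
  free (t->buckets);
  t->buckets = fresh;
  t->numBuckets = newSize;
  return 0;
}

/* Returns 0 on success, -1 if memory runs out. */
int table_insert (struct Table *t, int key, int value)
{
  struct Item *it;
  int i;
  if (1.0 * t->numItems / t->numBuckets >= t->loadFactor
      && table_grow (t) != 0)
    return -1;
  it = malloc (sizeof *it);
  if (it == NULL)
    return -1;
  i = index_of (key, t->numBuckets);
  it->key = key;
  it->value = value;
  it->next = t->buckets[i];
  t->buckets[i] = it;
  t->numItems++;
  return 0;
}

/* Returns a pointer to the stored value, or NULL if key is absent. */
int *table_lookup (struct Table *t, int key)
{
  struct Item *it = t->buckets[index_of (key, t->numBuckets)];
  for (; it != NULL; it = it->next)
    if (it->key == key)
      return &it->value;
  return NULL;
}
```

Starting from 2 buckets with a load factor of 1.0, ten insertions trigger three doublings (to 4, 8 and 16 buckets), and every key remains reachable afterwards.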

31 Ex: insert (ha, k13): hc = (hash (k13) & 0x7fffffff) % 8; // suppose hc==4

32 Ex: insert (ha, k13): hc = (hash (k13) & 0x7fffffff) % 8; // suppose hc==4; k13 is inserted at the head of the chain in bucket 4

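33 Complexity (rep \ op): array, sorted array, linked list, sorted linked list, hash; lookup(): O(n), O(lg n), O(n), with hash O(1); insert(): O(n), with hash O(1); delete(): O(n), with hash O(1). The O(1) bounds for hash are expected costs, assuming a well-spread hash function and a bounded load factor.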

