Hash Tables

2 Exercise
/* Exercise 1 */
void mystery(int n) {
    int i, j, k;
    for (i = 1; i <= n - 1; i++) {
        for (j = i + 1; j <= n; j++) {
            for (k = 1; k <= j; k++) {
                /* Some statement taking O(1) time */
            }
        }
    }
}
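One way to check an answer to this exercise is to count how often the O(1) statement runs. The sketch below is not part of the slides: `mystery_count` and `mystery_closed_form` are hypothetical helper names. Each j from 2 to n is reached for j - 1 values of i and runs the inner loop j times, so the count is the sum of j(j - 1), which equals (n - 1)n(n + 1)/3 and grows as n^3.

```c
#include <assert.h>

/* Hypothetical helper: counts how many times the O(1) statement
 * inside mystery(n) would execute, using the same three loops. */
long long mystery_count(int n) {
    assert(n >= 0);
    long long count = 0;
    for (int i = 1; i <= n - 1; i++)
        for (int j = i + 1; j <= n; j++)
            for (int k = 1; k <= j; k++)
                count++;
    return count;
}

/* Closed form of the same sum: sum over j = 2..n of j * (j - 1). */
long long mystery_closed_form(int n) {
    assert(n >= 0);
    long long v = n;
    return (v - 1) * v * (v + 1) / 3;
}
```

For n = 10 both functions give 330, which matches a cubic growth rate.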

3 Exercise
/* Exercise 2 */
void veryodd(int n) {
    int i, j, x, y;
    x = 0;
    y = 0;
    for (i = 1; i <= n; i++) {
        if (i % 2 == 1) {
            for (j = i; j <= n; j++) {
                x = x + 1;
            }
            for (j = 1; j <= i; j++) {
                y = y + 1;
            }
        }
    }
}
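The same counting approach works for this exercise. In the sketch below (the names `veryodd_count` and `veryodd_closed_form` are made up for illustration), every odd i contributes (n - i + 1) + i = n + 1 increments, and there are (n + 1) / 2 odd values of i in 1..n using integer division, so the total is quadratic in n.

```c
#include <assert.h>

/* Hypothetical helper: runs the loops of veryodd(n) and returns the
 * total number of increments, x + y. */
long long veryodd_count(int n) {
    assert(n >= 0);
    long long x = 0, y = 0;
    for (int i = 1; i <= n; i++) {
        if (i % 2 == 1) {
            for (int j = i; j <= n; j++) x++;   /* n - i + 1 steps */
            for (int j = 1; j <= i; j++) y++;   /* i steps */
        }
    }
    return x + y;
}

/* Closed form: (n + 1) increments for each of the (n + 1) / 2 odd i. */
long long veryodd_closed_form(int n) {
    assert(n >= 0);
    return (long long)(n + 1) * ((n + 1) / 2);
}
```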

4 Consider www.google.com
- Efficient searches: look up “laptop” in all web pages
- How many web pages? How fast is the response?

5 Consider www.google.com
- 4 billion pages
- Consider data structures: linked list, sorted linked list, array, sorted array, BST

6 Unsorted Linked List of n elements
List *searchList(List *a, int key) {
    if (a == NULL) return NULL;       // not found
    if (a->data == key) return a;     // found: return the matching node
    return searchList(a->next, key);  // keep searching the rest of the list
}
Best, Average, Worst T(n)?

7 Sorted Linked List of n elements
List *searchList(List *a, int key) {
    if (a == NULL) return NULL;       // not found
    if (a->data == key) return a;     // found: return the matching node
    return searchList(a->next, key);  // keep searching the rest of the list
}
Best, Average, Worst T(n)?

8 Unsorted Array of n elements
int seq_search(int n, int *a, int key) {
    int i = 0;
    while (i < n && a[i] != key) {
        i++;
    }
    return i;  // index of key, or n if key is not in the array
}
Best, Average, Worst T(n)?

9 Sorted Array of n elements
int binary_search(int n, int *a, int key) {
    int lo = -1;
    int hi = n;
    while (hi - lo != 1) {
        int mid = (hi + lo) / 2;
        if (a[mid] <= key) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    return lo;  // index of the last element <= key, or -1 if there is none
}
Best, Average, Worst T(n)?
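Because this binary search returns a position rather than a found/not-found flag, the caller still has to compare the element at that position with the key. The following is a minimal sketch of that usage, assuming the return statement sits after the loop. `contains` is a hypothetical wrapper that does not appear in the slides, and `mid` is computed as `lo + (hi - lo) / 2`, a common variant that avoids overflow for very large n.

```c
#include <assert.h>

/* Returns the index of the last element <= key, or -1 if every element
 * is greater than key. The array a must be sorted in ascending order. */
int binary_search(int n, const int *a, int key) {
    assert(n >= 0);
    int lo = -1;   /* invariant: a[lo] <= key (or lo == -1) */
    int hi = n;    /* invariant: a[hi] >  key (or hi == n)  */
    while (hi - lo != 1) {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] <= key) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    return lo;
}

/* Hypothetical wrapper: nonzero if key occurs in the sorted array. */
int contains(int n, const int *a, int key) {
    int pos = binary_search(n, a, key);
    return pos >= 0 && a[pos] == key;
}
```

The loop halves the interval between `lo` and `hi` on every iteration, so it runs about log2(n) times regardless of where the key is.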

10 How about BST?
- Best: O(1)
- Average: O(log n)
- Worst: O(n), when the tree is very imbalanced (it degenerates to a list)

11 Answer: Hash Tables
- Search complexity is O(1) with a “good” hash function
- Hash Table: a generalization of an array that, under some assumptions, allows O(1) for Insert/Delete/Search

12 Intuition
- How can you store all student numbers in an array? Use an array with range 0 to 999,999,999. This gives O(1) access time, but considering there are approx. 5000 students, you waste lots of array entries!
- Problem: the range of key values (0 to 999,999,999) is too large compared to the number of keys (students)

13 Formal Definition
- Hash tables solve this problem by using a smaller array and mapping keys to array entries with a hash function.
- Given a set of keys K and an array of size m, a hash function h is a function from K to {0, …, m-1}, that is: h : K → {0, …, m-1}

14 Example Hash Function
[Figure: hash table with slots 0 to 7; the keys k = 888999222 and k = 123456789 are mapped into it]

15 Example Hash Function
For example, if we hash the student number keys into a hash table with 8 entries, we could use h(key) = key mod 8
[Figure: hash table with slots 0 to 7; the keys k = 888999222 and k = 123456789 are mapped into it]

16 Problem?
Collisions: two keys hash into the same array entry, e.g. with h(key) = key % 8, h(888888888) = h(000000000) = 0
[Figure: hash table with slots 0 to 7; the keys k = 888999222 and k = 123456789 are mapped into it]
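The mod-8 hash and the collision described in these slides can be reproduced in a few lines. This is only a sketch: `hash_mod` and `collides` are illustrative names, not functions from the lecture.

```c
#include <assert.h>

/* h(key) = key mod m, where m is the table size (8 in the slides). */
unsigned hash_mod(unsigned long key, unsigned m) {
    assert(m > 0);
    return (unsigned)(key % m);
}

/* Nonzero if two keys land in the same slot, i.e. they collide. */
int collides(unsigned long k1, unsigned long k2, unsigned m) {
    return hash_mod(k1, m) == hash_mod(k2, m);
}
```

With m = 8, the two keys from the figure map to different slots (888999222 to slot 6, 123456789 to slot 5), while 888888888 and 0 both map to slot 0.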

17 Solution
- Hashing with Chaining (Open Hashing): every hash table entry contains a pointer to a linked list of the keys that hash to that entry
- Closed Hashing: every hash table entry contains only one key. If a new key hashes to a table entry that is already filled, systematically examine other table entries until you find an empty one in which to place the new key

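18 Hashing with Chaining (Open Hashing)
- h(54) = 54 % 5 = 4 = h(34): the collision is solved by CHAIN-ing
[Figure: table with slots 0 to 4, each pointing to a chain of (key, next) nodes; slot 1 holds 21, slot 2 holds 2, slot 4 holds the chain 54, 34]

19 Hashing with Chaining
[Figure: the same table with slots 0 to 4 and chains 21 (slot 1), 2 (slot 2) and 54, 34 (slot 4)]
Insert 101: where does it hash to?

20 Hashing with Chaining
- h(101) = 101 % 5 = 1
[Figure: the table before and after Insert 101; afterwards 101 is linked into the chain of slot 1 together with 21]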

21 Complexity Analysis
- What is the running time to insert/search/delete?
Insert: it takes O(1) time to compute the hash function and insert at the head of the linked list
Search: proportional to the length of the chain being searched; in the worst case, the maximum linked list length
Delete: same as search
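The three operations can be sketched in C as follows. This is an illustrative implementation, not the lecturer's code: the type and function names (`ChainTable`, `chain_insert`, and so on) are assumptions, the table size is fixed at 5 to match the example, keys are assumed non-negative, and duplicates are not checked on insert.

```c
#include <assert.h>
#include <stdlib.h>

#define M 5  /* table size used in the chaining example */

typedef struct Node { int key; struct Node *next; } Node;
typedef struct { Node *slot[M]; } ChainTable;

static int h(int key) { return key % M; }  /* assumes key >= 0 */

void chain_init(ChainTable *t) {
    for (int i = 0; i < M; i++) t->slot[i] = NULL;
}

/* Insert at the head of the chain: O(1). Returns 0 if malloc fails. */
int chain_insert(ChainTable *t, int key) {
    assert(key >= 0);
    Node *n = malloc(sizeof *n);
    if (n == NULL) return 0;
    n->key = key;
    n->next = t->slot[h(key)];
    t->slot[h(key)] = n;
    return 1;
}

/* Search walks one chain: time proportional to that chain's length. */
Node *chain_search(const ChainTable *t, int key) {
    assert(key >= 0);
    for (Node *p = t->slot[h(key)]; p != NULL; p = p->next)
        if (p->key == key) return p;
    return NULL;
}

/* Delete finds the node like search does, then unlinks and frees it. */
int chain_delete(ChainTable *t, int key) {
    assert(key >= 0);
    Node **pp = &t->slot[h(key)];
    while (*pp != NULL && (*pp)->key != key) pp = &(*pp)->next;
    if (*pp == NULL) return 0;  /* key not present */
    Node *dead = *pp;
    *pp = dead->next;
    free(dead);
    return 1;
}
```

Inserting 2, 21, 34, 54 and then 101 reproduces the example: slot 4 holds the chain 54, 34 and slot 1 holds 101 followed by 21, because each new key goes to the head of its chain.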

22 What is a “good” hash?
- Uniform hashing: each key is equally likely to hash into any of the m slots
Creating a “good” hash function is black magic!
- How about when keys are student names?
- Interpret characters as numbers: (int)'a', (int)'b', (int)'c' are 97, 98, 99
Ex. hash for names:
- Name “abc” hashes to ('a' + 'b' + 'c') % m
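The name hash from this slide can be written directly in C. In this sketch `hash_name` is a hypothetical function name, and the character values assume an ASCII-compatible character set.

```c
#include <assert.h>

/* Sums the character codes of name and reduces the sum modulo m. */
unsigned hash_name(const char *name, unsigned m) {
    assert(m > 0);
    unsigned long sum = 0;
    for (const unsigned char *p = (const unsigned char *)name; *p != '\0'; p++)
        sum += *p;
    return (unsigned)(sum % m);
}
```

For "abc" the sum is 97 + 98 + 99 = 294, so with m = 8 the name hashes to slot 6. Because addition ignores character order, every rearrangement of the same letters (for example "cba") hashes to the same slot, which is one reason a plain character sum is far from uniform.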

25 Closed Hashing
- The key is first mapped to a slot: index = h(k)
- If there is a collision, subsequent probes are performed
- Collision resolution is done as a linear search of the following slots, wrapping around at the end of the table. This is known as linear probing: index = (index + 1) % m

26 Closed Hashing with Linear Probing
[Figure: hash table with 11 slots (0 to 10) and H(k) = k % 11; keys 9537, 1001 and 3016 occupy slots 0 to 2, keys 9874, 2009 and 9875 occupy slots 7 to 9; slots 3 to 6 and 10 are empty]
Insert(1100) → ?

29 Closed Hashing with Linear Probing
[Figure: the same 11-slot table, H(k) = k % 11]
Insert(1100) → slot 3 (1100 % 11 = 0, and slots 0, 1 and 2 are occupied)
Same for keys that hash into 0, 1 or 2; keys that hash into 3 go there directly
Prob(insert_into_3) = ?

30 Closed Hashing with Linear Probing
[Figure: the same 11-slot table, H(k) = k % 11]
Insert(1100) → slot 3
Same for keys that hash into 0, 1 or 2; keys that hash into 3 go there directly
Prob(insert_into_3) = 4/11 (home positions 0, 1, 2 and 3)

31 Closed Hashing with Linear Probing
[Figure: the same 11-slot table, H(k) = k % 11]
Insert(1100) → slot 3
Same for keys that hash into 0, 1 or 2; keys that hash into 3 go there directly
Prob(insert_into_4) = 1/11
Prob(insert_into_3) = 4/11

32 Closed Hashing with Linear Probing
[Figure: the same 11-slot table, now with 1052 in slot 10]
Assume: Insert(1052) → slot 10 (1052 % 11 = 7, and slots 7, 8 and 9 are occupied)
Prob(insert_into_4) = ?
Prob(insert_into_3) = ?

33 Closed Hashing with Linear Probing
[Figure: the same 11-slot table, now with 1052 in slot 10]
Assume: Insert(1052) → slot 10
Prob(insert_into_4) = 1/11
Prob(insert_into_3) = 8/11 (home positions 7 to 10 and 0 to 3)
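The probabilities on these slides can be checked mechanically. Under uniform hashing each of the m home positions is equally likely, so the probability that the next insertion lands in an empty slot s is the number of home positions whose probe sequence ends at s, divided by m. The sketch below assumes the occupancy pattern implied by the slides (slots 0 to 2 and 7 to 9 filled in an 11-slot table); `landing_counts` is a made-up helper name.

```c
#include <assert.h>

/* For every slot s, count[s] becomes the number of home positions
 * from which linear probing would place a new key into s.
 * occupied[i] is nonzero if slot i already holds a key. */
void landing_counts(const int *occupied, int m, int *count) {
    assert(m > 0);
    for (int s = 0; s < m; s++) count[s] = 0;
    for (int home = 0; home < m; home++) {
        int index = home;
        int probes = 0;
        /* Follow index = (index + 1) % m until an empty slot is found,
         * giving up after m probes if the table is full. */
        while (occupied[index] && probes < m) {
            index = (index + 1) % m;
            probes++;
        }
        if (!occupied[index]) count[index]++;
    }
}
```

For the table before Insert(1052), the counts for slots 3, 4, 5, 6 and 10 are 4, 1, 1, 1 and 4. Once slot 10 is filled, the count for slot 3 rises to 8, matching the 8/11 on the slide.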

34 Problem: Clustering
- Even with a good hash function, linear probing has its problems:
The position of the initial mapping i_0 of key k is called the home of k. When several insertions map to the same home position, they end up placed contiguously in the table. This collection of keys with the same home position is called a cluster. As clusters grow, the probability that a key will map to the middle of a cluster increases, increasing the rate of the cluster's growth. As these clusters grow, they merge with other clusters, forming even bigger clusters which grow even faster. This tendency of linear probing to place items together is known as primary clustering.

35 Complexity Analysis – Worst Case
- What is the running time to insert/search/delete?
Insert: same as search
Search: proportional to the maximum number of probes
Delete: same as search
Worst case: O(n)

36 Complexity Analysis
- When the hash table is empty, an insert takes 1 step (into the home position)
- As the table fills up, the probability that a record can be inserted in 1 step decreases. More and more records are likely to be inserted far from their home position

37 Complexity Analysis - Intuition
- The expected (avg.) cost of a hash operation (insert/search/delete) is a function of how full the table is

38 The Load Factor
α = n / m
- n is the number of entries in the hash table that are occupied; m is the size of the hash table
- α = 1 means the table is full, and α = 0 means the table is empty.

39 Complexity Analysis - Average Case
- The load factor is α = n / m, where n is the current number of records
- On average, the probability of finding the home position occupied is α = n / m
- The probability of finding both the home position and the next position occupied is n/m * (n-1)/(m-1)
- The probability of i collisions is n/m * (n-1)/(m-1) * … * (n-i+1)/(m-i+1) ≈ (n/m)^i
- Expected number of probes = 1 + Σ_{i=1 to N} (n/m)^i
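The sum on this slide is a truncated geometric series in α = n/m. As a rough worked step, assuming α < 1 and taking the approximation (n/m)^i at face value, extending the sum to infinity gives a simple upper bound on the expected number of probes:

```latex
\[
1 + \sum_{i=1}^{N} \alpha^{i}
\;\le\; \sum_{i=0}^{\infty} \alpha^{i}
\;=\; \frac{1}{1-\alpha},
\qquad 0 \le \alpha < 1 .
\]
```

For example, a half-full table (α = 0.5) gives a bound of 2 probes, while α = 0.9 gives a bound of 10, which illustrates how quickly the cost rises as the table fills.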

40 Complexity Analysis Average Case
- It can be shown that the number of probes in a successful search, C, and the number of probes in an unsuccessful search, C', are given by:
Linear probing: C ≈ 1/2 * (1 + 1/(1-α)), C' ≈ 1/2 * (1 + 1/(1-α)^2)
Separate chaining: [formulas for C and C' not recoverable from the slide]
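To get a feel for how the cost of linear probing depends on the load factor, the standard textbook approximations C ≈ (1 + 1/(1-α))/2 for a successful search and C' ≈ (1 + 1/(1-α)^2)/2 for an unsuccessful one can be evaluated directly. The function names below are illustrative, and the formulas are the usual approximations rather than exact values for any particular table.

```c
#include <assert.h>

/* Approximate average probes for a successful search with linear
 * probing at load factor alpha, where 0 <= alpha < 1. */
double lp_successful(double alpha) {
    assert(alpha >= 0.0 && alpha < 1.0);
    return 0.5 * (1.0 + 1.0 / (1.0 - alpha));
}

/* Approximate average probes for an unsuccessful search (or an
 * insertion) with linear probing at load factor alpha. */
double lp_unsuccessful(double alpha) {
    assert(alpha >= 0.0 && alpha < 1.0);
    double d = 1.0 - alpha;
    return 0.5 * (1.0 + 1.0 / (d * d));
}
```

At α = 0.5 these give 1.5 and 2.5 probes. At α = 0.9 they give about 5.5 and 50.5, so unsuccessful searches degrade much faster than successful ones as the table approaches full.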

41 [Plot: average number of probes vs. load factor for a successful search; curves for linear probing, double hashing and separate chaining]

42 [Plot: average number of probes vs. load factor for an unsuccessful search; curves for linear probing, double hashing and separate chaining]

43 Insert Implementation
bool HashTable::hashInsert(const Elem &e) {
    int home = h(getkey(e));                       // home position of e
    int index = home;
    for (int i = 1; !is_empty(HT[index]); i++) {
        if (is_equal(e, HT[index])) return false;  // duplicate
        if (i >= m) return false;                  // all m slots probed: table is full
        index = (home + i) % m;                    // follow probes
    }
    HT[index] = e;                                 // empty slot found: store e
    return true;
}

44 Search Implementation
bool HashTable::hashSearch(const Key &k, Elem &e) {
    int home = h(k);                               // home position of k
    int index = home;
    // Follow probes until an empty slot or k is found; i < m stops after one full cycle
    for (int i = 1; i < m && !is_empty(HT[index]) && !is_equal(k, HT[index]); i++)
        index = (home + i) % m;
    if (!is_empty(HT[index]) && is_equal(k, HT[index])) {  // found it
        e = HT[index];
        return true;
    }
    return false;                                  // k is not in the table
}
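For experimenting with the insert and search routines without the surrounding class, here is a self-contained C sketch of closed hashing with linear probing. It is a simplification under stated assumptions, not the HashTable class from the slides: the table stores plain non-negative int keys, -1 marks an empty slot, the size is fixed at 11 with h(k) = k % 11, and the names `ProbeTable`, `probe_insert` and `probe_search` are made up.

```c
#include <assert.h>

#define TABLE_SIZE 11
#define EMPTY (-1)  /* assumed sentinel: valid keys are non-negative */

typedef struct { int slot[TABLE_SIZE]; } ProbeTable;

void probe_init(ProbeTable *t) {
    for (int i = 0; i < TABLE_SIZE; i++) t->slot[i] = EMPTY;
}

static int home_of(int key) { return key % TABLE_SIZE; }

/* Returns the slot where key was stored, or -1 if key is already
 * present or the table is full. */
int probe_insert(ProbeTable *t, int key) {
    assert(key >= 0);
    int home = home_of(key);
    for (int i = 0; i < TABLE_SIZE; i++) {
        int index = (home + i) % TABLE_SIZE;  /* i-th probe */
        if (t->slot[index] == EMPTY) {
            t->slot[index] = key;
            return index;
        }
        if (t->slot[index] == key) return -1; /* duplicate */
    }
    return -1;                                /* table is full */
}

/* Returns the slot holding key, or -1 if key is not in the table. */
int probe_search(const ProbeTable *t, int key) {
    assert(key >= 0);
    int home = home_of(key);
    for (int i = 0; i < TABLE_SIZE; i++) {
        int index = (home + i) % TABLE_SIZE;
        if (t->slot[index] == EMPTY) return -1;  /* hit a gap: absent */
        if (t->slot[index] == key) return index;
    }
    return -1;
}
```

Inserting 1001, 9537, 3016, 9874, 2009 and 9875 (an assumed insertion order) fills slots 0 to 2 and 7 to 9, after which inserting 1100 returns slot 3 and inserting 1052 returns slot 10, as in the linear probing example. Deletion is left out on purpose: simply emptying a slot would break the probe sequence for keys stored beyond it.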
