Data Structures and Algorithms Hashing First Year M. B. Fayek CUFE 2010.


1 Data Structures and Algorithms Hashing First Year M. B. Fayek CUFE 2010

2 Hashing 1. What is Hashing? 2. Problems in Hashing 3. Collision Resolution Strategies

3 1. What is Hashing? Hashing is a quick and efficient searching technique. So far, the efficiency of a search has depended on the number of comparisons. In hashing, the keys themselves point directly to records by applying a hashing function. All possible key values are mapped into slots in the hash table. The hashing function is used for searching as well as for storing.
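As a minimal sketch of the idea that one hashing function serves both storing and searching, the following assumes integer keys, a hypothetical list size of 11, and a modulo-based `hash_address` function (none of these come from the slides). Collisions are deliberately ignored here.

```python
LIST_SIZE = 11                      # hypothetical table size, chosen for illustration
table = [None] * LIST_SIZE          # contiguous hash table; None marks an empty slot


def hash_address(key: int) -> int:
    """Map a key to a slot index (assumed hashing function: key mod list size)."""
    return key % LIST_SIZE


def store(key: int, record: str) -> None:
    """Place the record at the address computed from its key (no collision handling)."""
    table[hash_address(key)] = (key, record)


def search(key: int):
    """Recompute the same address and check that slot; no scan over other keys."""
    entry = table[hash_address(key)]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None
```

The search touches only the slot the key hashes to, which is why its cost does not grow with the number of stored keys as long as collisions are rare.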

4 1. What is Hashing? The hash table is sequential and contiguous. Each slot is called a bucket. Buckets may hold more than one key.

5 1. What is Hashing? Hashing methods: Direct and Subtraction; Modulo-division (or division remainder) using list size (prime, why?); Digit extraction; Midsquare; Folding (fold shift, fold boundary); Pseudo-random (seed).
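Several of the methods listed above can be sketched in a few lines each. The function names, parameters, and digit conventions below are illustrative assumptions, not definitions taken from the slides. On the "prime, why?" question, a commonly given reason is that a prime list size tends to spread keys that share a common factor more evenly across addresses.

```python
def modulo_division(key: int, list_size: int) -> int:
    """Division-remainder hashing: address = key mod list size."""
    return key % list_size


def digit_extraction(key: int, positions: tuple) -> int:
    """Build an address from selected digit positions (1-based, left to right)."""
    digits = str(key)
    return int("".join(digits[p - 1] for p in positions))


def midsquare(key: int, num_digits: int) -> int:
    """Square the key and take num_digits digits from the middle of the square."""
    squared = str(key * key)
    start = (len(squared) - num_digits) // 2
    return int(squared[start:start + num_digits])


def fold_shift(key: int, part_size: int, list_size: int) -> int:
    """Split the key's digits into parts, add the parts, reduce modulo list size."""
    digits = str(key)
    parts = [int(digits[i:i + part_size]) for i in range(0, len(digits), part_size)]
    return sum(parts) % list_size
```

For instance, `fold_shift(123456789, 3, 1000)` adds 123 + 456 + 789 = 1368 and keeps the remainder modulo 1000.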

7 Problems in Hashing A collision occurs whenever a hash function maps two distinct keys to the same bucket. The hashing function must generate bucket addresses quickly and efficiently, with minimum collisions. As the domain of keys is usually larger than the number of buckets, collisions are very likely to happen no matter how efficient the hashing function is.
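To make the collision problem concrete, here is a small sketch that groups keys by their address and reports every address shared by more than one key. It assumes integer keys and modulo-division hashing; `find_collisions` is a hypothetical helper introduced only for this illustration.

```python
def find_collisions(keys, list_size):
    """Return {address: [keys]} for every address that receives more than one key."""
    buckets = {}
    for key in keys:
        buckets.setdefault(key % list_size, []).append(key)
    return {address: ks for address, ks in buckets.items() if len(ks) > 1}
```

With a list size of 7, the keys 10, 17, 3 and 24 all map to address 3, so any table of that size needs a strategy for placing the later arrivals.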

9 Definitions: Load factor = number of elements in the list / list size (often expressed as a percentage). Clustering (primary, secondary).
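A short sketch of the load factor computation follows, taking the load factor in the usual sense of stored elements divided by list size. It assumes the table is a Python list in which `None` marks an empty slot; that representation is an assumption made for this example.

```python
def load_factor(table) -> float:
    """Fraction of slots in use; None is assumed to mark an empty slot."""
    used = sum(1 for slot in table if slot is not None)
    return used / len(table)
```

A table with two of its four slots filled has a load factor of 0.5, i.e. 50%.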

10 3. Collision Resolution Strategies Open Addressing (using prime area): Probing (linear, quadratic); Double Hashing; Pseudo-random; Key offset. Linked Lists (Separate Chaining). (Bucket Hashing). Re-hashing.

11 3. Collision Resolution Strategies Open Addressing: Probing. Linear Probing: search at constant intervals from the collision (typically 1). Quadratic Probing: search at quadratically increasing intervals, i.e. collision function f(i) = i^2; on collision, search the 1st, 4th, 9th, ... location.
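The two probing schemes differ only in the offset applied on the i-th probe, which the sketch below makes explicit. It assumes integer keys, modulo-division for the home address, wrap-around at the end of the table, and `None` for empty slots; the function names are illustrative.

```python
def linear(i: int) -> int:
    """Offset of the i-th probe for linear probing with interval 1."""
    return i


def quadratic(i: int) -> int:
    """Offset of the i-th probe for quadratic probing, f(i) = i^2."""
    return i * i


def probe_sequence(home: int, list_size: int, max_probes: int, step):
    """Addresses tried after a collision at `home`, wrapping around the table."""
    return [(home + step(i)) % list_size for i in range(1, max_probes + 1)]


def insert(table, key: int, step=linear) -> int:
    """Store key at its home address, or at the first free address on its probe sequence."""
    list_size = len(table)
    home = key % list_size
    if table[home] is None:
        table[home] = key
        return home
    for address in probe_sequence(home, list_size, list_size, step):
        if table[address] is None:
            table[address] = key
            return address
    raise RuntimeError("no free slot found along the probe sequence")
```

Starting from home address 3 in a table of size 7, linear probing tries 4, 5, 6, while quadratic probing tries 4, 0, 5.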

12 Linear Probing

14 3. Collision Resolution Strategies Open Addressing: Double Hashing: apply a second hashing function and probe at offsets hash2(x), 2*hash2(x), 3*hash2(x), ... from the collision address.
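A minimal sketch of double hashing is given below. The slides do not specify the second hashing function, so `hash2(x) = prime - (x mod prime)` is an assumption (a common textbook choice that never yields 0); the list size of 7 and the prime 5 used in the example are likewise illustrative.

```python
def hash1(key: int, list_size: int) -> int:
    """Home address (assumed: modulo-division)."""
    return key % list_size


def hash2(key: int, prime: int = 5) -> int:
    """Assumed second hashing function; always in 1..prime, so the step is never 0."""
    return prime - (key % prime)


def double_hash_insert(table, key: int) -> int:
    """Probe home, home + hash2, home + 2*hash2, ... (mod list size) until a free slot."""
    list_size = len(table)
    home = hash1(key, list_size)
    step = hash2(key)
    for i in range(list_size):
        address = (home + i * step) % list_size
        if table[address] is None:
            table[address] = key
            return address
    raise RuntimeError("no free slot found along the probe sequence")
```

Because the step depends on the key, two keys that collide at the same home address usually follow different probe sequences, which is what distinguishes this from linear probing.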

16 3. Collision Resolution Strategies Linked Lists (Separate Chaining): Separate chaining (may be modified by keeping the chain sorted). Modified Hash Table (by eliminating the first probe, hence the hash table becomes an array of records instead of an array of pointers to records).
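Separate chaining with sorted chains can be sketched as follows. As a simplification, each chain is a Python list rather than a true linked list, and keys are assumed to be integers hashed by modulo-division; the class and method names are hypothetical.

```python
import bisect


class ChainedHashTable:
    """Hash table in which each bucket holds a sorted chain of colliding keys."""

    def __init__(self, list_size: int = 7):
        self.buckets = [[] for _ in range(list_size)]

    def _address(self, key: int) -> int:
        return key % len(self.buckets)

    def insert(self, key: int) -> None:
        """Add the key to its chain, keeping the chain in ascending order."""
        bisect.insort(self.buckets[self._address(key)], key)

    def contains(self, key: int) -> bool:
        """Binary-search the one chain the key can be in."""
        chain = self.buckets[self._address(key)]
        i = bisect.bisect_left(chain, key)
        return i < len(chain) and chain[i] == key
```

Keeping the chain sorted lets an unsuccessful search stop early; with a real linked list the stop happens during a sequential scan rather than through binary search.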

17 Linked List (Separate Chaining)

19 3. Collision Resolution Strategies Rehashing: When the table becomes too full, operations start taking too long. Solution: build another hash table of about double the size, with an associated hashing function, and scan down the entire original hash table, inserting each element into the new table. (Figure labels: successful search, unsuccessful search.)

20 3. Collision Resolution Strategies Rehashing: When is the table too full? Rehash when the table is half full. Rehash when an insertion fails. Rehash when the table reaches a certain load factor (best).
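The load-factor trigger for rehashing can be sketched as below. This assumes linear probing, integer keys without duplicates, a threshold below 1 (so a free slot always exists), and a new size equal to the next prime at or above double the old size; all of these choices and names are assumptions for illustration.

```python
def next_prime(n: int) -> int:
    """Smallest prime >= n (simple trial division, fine for small tables)."""
    def is_prime(m: int) -> bool:
        return m >= 2 and all(m % d for d in range(2, int(m ** 0.5) + 1))
    while not is_prime(n):
        n += 1
    return n


class RehashingTable:
    """Linear-probing table that rehashes once an insert would exceed max_load."""

    def __init__(self, list_size: int = 7, max_load: float = 0.5):
        self.slots = [None] * list_size
        self.count = 0
        self.max_load = max_load

    @staticmethod
    def _place(slots, key: int) -> None:
        address = key % len(slots)
        while slots[address] is not None:       # linear probing with wrap-around
            address = (address + 1) % len(slots)
        slots[address] = key

    def _rehash(self) -> None:
        """Build a table of about double the size and re-insert every stored key."""
        new_slots = [None] * next_prime(2 * len(self.slots))
        for key in self.slots:                  # scan down the entire original table
            if key is not None:
                self._place(new_slots, key)
        self.slots = new_slots

    def insert(self, key: int) -> None:
        if (self.count + 1) / len(self.slots) > self.max_load:
            self._rehash()
        self._place(self.slots, key)
        self.count += 1
```

Every key must be re-inserted because its address depends on the list size, so the old positions are not valid in the new table.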

21 End of Hashing

22 Probing Definition: Each calculation of an address and test for success is known as probing.

23 Key offset collision resolution: Offset = key / list size. Address = (Offset + old address) % list size.
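The two formulas above translate directly into code. The sketch assumes integer keys, zero-based addresses, and that the division in the offset is integer division (the slide does not say so explicitly); `key_offset_probe` is a hypothetical name.

```python
def key_offset_probe(key: int, old_address: int, list_size: int) -> int:
    """Next address to try after a collision at old_address."""
    offset = key // list_size                     # assumed: integer quotient
    # Note: if offset is a multiple of list_size, the address does not change.
    return (offset + old_address) % list_size
```

For example, with a list size of 307, key 166702 has home address 166702 % 307 = 1 and offset 166702 // 307 = 543, so the next address tried is (543 + 1) % 307 = 237.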

