
1 External Memory Hashing

2 Hash Tables Hash function h: search key → [0…B-1]. Buckets are blocks, numbered [0…B-1]. Big idea: If a record with search key K exists, then it must be in bucket h(K). - A lookup costs one disk I/O if there is only one block per bucket. Hash-Table Lookup: For record(s) with search key K, compute h(K); search that bucket.

3 Hash-Table Insertion Put the record in bucket h(K) if it fits; otherwise create an overflow block. - Overflow block(s) are part of the bucket. Example: Insert record with search key g.
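The lookup and insertion rules on slides 2 and 3 can be sketched in Python. This is only an in-memory illustration, not an on-disk implementation: the class name `StaticHashTable`, the block capacity of 2 records, and the helper `blocks_read` are assumptions made for the example.

```python
BLOCK_CAPACITY = 2  # records per block (assumed for illustration)


class StaticHashTable:
    """B buckets; each bucket is a primary block plus any overflow blocks."""

    def __init__(self, num_buckets, h):
        self.B = num_buckets
        self.h = h  # hash function: search key -> integer
        # Each bucket starts with one empty primary block.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def _bucket(self, key):
        return self.buckets[self.h(key) % self.B]

    def insert(self, key, value):
        blocks = self._bucket(key)
        for block in blocks:
            if len(block) < BLOCK_CAPACITY:
                block.append((key, value))
                return
        # No room in any block of the bucket: create an overflow block.
        blocks.append([(key, value)])

    def lookup(self, key):
        # A record with this key, if it exists, must be in bucket h(key).
        return [v for block in self._bucket(key) for k, v in block if k == key]

    def blocks_read(self, key):
        # Number of blocks (disk I/Os in a real system) a lookup would touch.
        return len(self._bucket(key))


table = StaticHashTable(num_buckets=4, h=lambda k: k)
for key, value in [(1, "a"), (5, "b"), (9, "c"), (2, "d")]:
    table.insert(key, value)
```

Keys 1, 5 and 9 all hash to bucket 1 here, so the third one lands in an overflow block and a lookup in that bucket touches two blocks instead of one.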

4 What if the File Grows too Large? Efficiency is highest if #records < #buckets × #(records/block). If the file grows, we need a dynamic hashing method to maintain the above relationship. - Extensible hashing: double the number of buckets when needed. - Linear hashing: add one more bucket as appropriate.

5 Dynamic Hashing Framework Hash function h produces a sequence of k bits. Only some of the bits are used at any time to determine placement of keys in buckets. Extensible Hashing (Buckets may share blocks!) Keep a parameter i = the number of bits from the beginning of h(K) that determine the bucket. The bucket array is now an array of pointers to blocks. - A block can serve several buckets. - For each block, a parameter j ≤ i tells how many bits of h(K) determine membership in the block. - I.e., a block represents 2^(i-j) buckets that share the first j bits of their number.
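As a small illustration of the framework above, the following Python sketch extracts the first i bits of a k-bit hash value and counts how many buckets a block with parameter j serves. The function names and the choice k = 4 are assumptions made for the example.

```python
K_BITS = 4  # length k of the hash value in bits (assumed for illustration)


def first_bits(hash_value, i, k=K_BITS):
    """Integer formed by the first (high-order) i bits of a k-bit hash value."""
    return hash_value >> (k - i)


def buckets_per_block(i, j):
    """A block with parameter j <= i serves 2^(i-j) bucket-array entries."""
    return 2 ** (i - j)
```

For example, with h(K) = 1010 the bucket-array index is 1 when i = 1 and 10 (binary) when i = 2; a block with j = 1 in a table with i = 2 is shared by two entries.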

6 Example An extensible hash table when i=1:

7 Extensible Hash-Table Insert If the record with key K fits in the block B pointed to by the bucket-array entry for h(K), put it there. If not, let j be the number of bits that block B represents.
1. j = i:
a. Set i := i+1.
b. Double the bucket array, so that it now has 2^i entries (for the new value of i).
c. Let w be an old array entry. Both new entries, w0 and w1, point to the same block that w used to point to.
d. Split B into two blocks and distribute the records of B according to their (j+1)st bit:
i. set j := j+1 for both blocks;
ii. fix the pointers in the bucket array, so that entries that formerly pointed to B now point either to B or to the new block, depending on the (j+1)st bit (for the old value of j).
2. j < i:
a. Do as in 1.d.
If the record still does not fit after the split, repeat.
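The insertion procedure can be sketched in Python as follows. This is a minimal in-memory sketch, not a definitive implementation: the class names, the choice of k = 4 hash bits and 2 records per block, and the decision to store bare hash values instead of full records are all assumptions made for illustration.

```python
class Block:
    def __init__(self, j):
        self.j = j          # number of leading hash bits shared by all records
        self.records = []   # hash values stored in this block


class ExtensibleHashTable:
    def __init__(self, k=4, capacity=2):
        self.k = k                # hash values are k bits long
        self.capacity = capacity  # records per block
        self.i = 1                # leading bits used by the bucket array
        self.directory = [Block(1), Block(1)]

    def _index(self, h):
        return h >> (self.k - self.i)  # first i bits of h

    def lookup(self, h):
        return h in self.directory[self._index(h)].records

    def insert(self, h):
        while True:
            block = self.directory[self._index(h)]
            if len(block.records) < self.capacity:
                block.records.append(h)
                return
            if block.j == self.k:
                raise OverflowError("all k bits used; an overflow block would be needed")
            if block.j == self.i:
                # Case 1: double the array; entries w0 and w1 both point
                # to the block that w used to point to.
                self.directory = [b for b in self.directory for _ in (0, 1)]
                self.i += 1
            self._split(block)  # step 1.d, also used for case 2 (j < i)

    def _split(self, block):
        block.j += 1
        new_block = Block(block.j)
        shift = self.k - block.j  # position of the bit that now distinguishes
        old = block.records
        block.records = [x for x in old if not (x >> shift) & 1]
        new_block.records = [x for x in old if (x >> shift) & 1]
        # Entries that pointed to the old block and have that bit set
        # are redirected to the new block.
        for idx, b in enumerate(self.directory):
            if b is block and (idx >> (self.i - block.j)) & 1:
                self.directory[idx] = new_block


table = ExtensibleHashTable()
for h in [0b0001, 0b1001, 0b1100]:  # assumed initial contents
    table.insert(h)
for h in [0b1010, 0b0000, 0b0111, 0b1000]:
    table.insert(h)
```

The last four insertions use the hash values from the examples on slides 8 and 9; the three initial records are an assumption, since the figures are not part of this transcript. With these values, inserting 1010 raises i to 2, inserting 0111 splits the block for 0... without changing i, and inserting 1000 raises i to 3.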

8 Example Insert a record with h(K) = 1010. [Figure: the table before and after the insertion.]

9 Example: Next Next: insert records with h(K) = 0000 and h(K) = 0111. - The bucket for 0... gets split, - but i stays at 2. Then: insert a record with h(K) = 1000. - It overflows the bucket for 10... - Raise i to 3. [Figure: the table currently and after the insertions.]

10 Extensible Hash Tables: Advantages: A lookup never searches more than one data block. - We hope that the bucket array fits in main memory. Defects: Doubling the bucket array could make it too large to fit in main memory. Skewed key distributions are a problem. - E.g., let 1 block = 2 records, and suppose that three records have hash values that happen to agree in their first 20 bits. - In that case we would have i = 20 and about one million (2^20) bucket-array entries, even though we have only 3 records!

11 Linear Hashing Use i bits from the right (low-order) end of h(K). Buckets are numbered [0…n-1], where 2^(i-1) < n ≤ 2^i. Let the last i bits of h(K) be m = a_1 a_2 … a_i. 1. If m < n, then the record belongs to bucket m. 2. If n ≤ m < 2^i, then the record belongs to bucket m - 2^(i-1), that is, the bucket we would get if we changed a_1 (which must be 1) to 0. The values i, n (the number of buckets) and r (the number of records) are also part of the structure; in the pictured example, i=1, n=2, r=3.
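The two-case addressing rule above fits in a few lines of Python. The function name `bucket_for` is an assumption made for the example.

```python
def bucket_for(hash_value, i, n):
    """Bucket number for a hash value in a linear hash table with n buckets."""
    m = hash_value & ((1 << i) - 1)  # last i bits of h(K)
    if m < n:
        return m                     # case 1: bucket m exists
    return m - (1 << (i - 1))        # case 2: change the leading bit a_1 to 0
```

With i = 2 and n = 3, a hash value ending in 10 goes to bucket 10, while one ending in 11 goes to bucket 01 because bucket 11 does not exist yet.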

12 Linear Hash-Table Insert Pick an upper limit on capacity, - e.g., 85% (1.7 records/bucket in our example, where a block holds 2 records). If an insertion exceeds the capacity limit, set n := n + 1. - If the new n is 2^i + 1, set i := i + 1. No change in bucket numbers is needed; just imagine a leading 0. - Split bucket n - 2^(i-1), where n is the old value of n and i the updated value of i, because there is now a bucket numbered (old) n.

13 Example Insert records with h(K) = 0000, 1010, 1111, 0101, 0001, 1100. Insert 0000. Before: r=0, n=1, i=1; bucket 0: empty. After: r=1, n=1, i=1; bucket 0: 0000.

14 Example Insert 1010. Before: r=1, n=1, i=1; bucket 0: 0000. After: r=2, n=2, i=1; bucket 0: 0000, 1010; bucket 1: empty. Capacity limit exceeded; increment n.

15 Example Insert 1111. Before: r=2, n=2, i=1; bucket 0: 0000, 1010; bucket 1: empty. After: r=3, n=2, i=1; bucket 0: 0000, 1010; bucket 1: 1111.

16 Example Insert 0101. Before: r=3, n=2, i=1; bucket 0: 0000, 1010; bucket 1: 1111. After: r=4, n=3, i=2; bucket 00: 0000; bucket 01: 1111, 0101; bucket 10: 1010. Capacity limit exceeded; increment n, which causes incrementing i as well.

17 Example Insert 0001. Before: r=4, n=3, i=2; bucket 00: 0000; bucket 01: 1111, 0101; bucket 10: 1010. After: r=5, n=3, i=2; bucket 00: 0000; bucket 01: 1111, 0101, with 0001 in an overflow block; bucket 10: 1010. As long as the capacity limit is not exceeded, we can add overflow blocks.

18 Example Insert 1100. Before: r=5, n=3, i=2; bucket 00: 0000; bucket 01: 1111, 0101, with 0001 in an overflow block; bucket 10: 1010. After: r=6, n=4, i=2; bucket 00: 0000, 1100; bucket 01: 0001, 0101; bucket 10: 1010; bucket 11: 1111. Capacity limit exceeded; increment n.
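The example on slides 13 to 18 can be replayed with a short Python sketch of linear hashing. It is an in-memory illustration under stated assumptions: the class name `LinearHashTable` is made up for the example, a bucket is modelled as a plain list (so overflow blocks are implicit), a block is assumed to hold 2 records, and only bare hash values are stored.

```python
class LinearHashTable:
    def __init__(self, block_capacity=2, max_load=0.85):
        self.i, self.n, self.r = 1, 1, 0
        self.block_capacity = block_capacity  # records per block
        self.max_load = max_load              # capacity limit, e.g. 85%
        self.buckets = [[]]                   # one list of hash values per bucket

    def _address(self, h):
        m = h & ((1 << self.i) - 1)           # last i bits
        return m if m < self.n else m - (1 << (self.i - 1))

    def lookup(self, h):
        return h in self.buckets[self._address(h)]

    def insert(self, h):
        self.buckets[self._address(h)].append(h)
        self.r += 1
        # Capacity limit: r may not exceed max_load * (records/block) * n.
        if self.r > self.max_load * self.block_capacity * self.n:
            self._add_bucket()

    def _add_bucket(self):
        self.n += 1
        if self.n == (1 << self.i) + 1:
            self.i += 1
        self.buckets.append([])               # new bucket, numbered (old) n
        # Split bucket (old n) - 2^(i-1), using the updated i.
        split = (self.n - 1) - (1 << (self.i - 1))
        old, self.buckets[split] = self.buckets[split], []
        for h in old:
            self.buckets[self._address(h)].append(h)


table = LinearHashTable()
for h in [0b0000, 0b1010, 0b1111, 0b0101, 0b0001, 0b1100]:
    table.insert(h)
```

After the six insertions the table has r = 6, n = 4, i = 2, and the buckets hold the same hash values as the final state on slide 18.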

19 Lookup in Linear Hash Table For record(s) with search key K, compute h(K) and search the corresponding bucket, chosen by the same rule as for insertion. If the record we wish to look up isn't there, it can't be anywhere else. E.g., with r=4, n=3, i=2: a key that hashes to 1010 has m = 10 < n, so we search bucket 10; a key that hashes to 1011 has m = 11 ≥ n, so we search bucket 01.

20 Exercise Suppose we want to insert keys with hash values: 0000…1111 in a linear hash table with 100% capacity threshold. Assume that a block can hold three records.

