Presentation is loading. Please wait.

Presentation is loading. Please wait.

DBMS 2001Notes 4.2: Hashing1 Principles of Database Management Systems 4.2: Hashing Techniques Pekka Kilpeläinen (after Stanford CS245 slide originals.

Similar presentations


Presentation on theme: "DBMS 2001Notes 4.2: Hashing1 Principles of Database Management Systems 4.2: Hashing Techniques Pekka Kilpeläinen (after Stanford CS245 slide originals."— Presentation transcript:

1 DBMS 2001Notes 4.2: Hashing1 Principles of Database Management Systems 4.2: Hashing Techniques Pekka Kilpeläinen (after Stanford CS245 slide originals by Hector Garcia-Molina, Jeff Ullman and Jennifer Widom)

2 DBMS 2001Notes 4.2: Hashing2 Hashing? Locating the storage block of a record by the hash value h(k) of its key k Normally really fast –records (often) located by a single disk access

3 DBMS 2001Notes 4.2: Hashing3 key  h(key) Hashing...... Buckets (typically 1 disk block)

4 DBMS 2001Notes 4.2: Hashing4...... Two alternatives records...... key  h(key) (1) Hash value determines the storage block directly to implement a primary index

5 DBMS 2001Notes 4.2: Hashing5 key  h(key) Index record key 1 Two alternatives for a secondary index (2) Records located indirectly via index buckets

6 DBMS 2001Notes 4.2: Hashing6 Example hash function Key = ‘x 1 x 2 … x n ’ n byte character string Have b buckets h = (x 1 + x 2 + … + x n ) mod b  {0, 1, …, b-1}

7 DBMS 2001Notes 4.2: Hashing7  This may not be best function … Good hash Expected number of function: keys/bucket is the same for all buckets  Read Knuth Vol. 3 if you really need to select a good function.

8 DBMS 2001Notes 4.2: Hashing8 Next: example to illustrate inserts, overflows, deletes h(K)

9 DBMS 2001Notes 4.2: Hashing9 EXAMPLE 2 records/bucket INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = 0 01230123 d a c b h(e) = 1 e

10 DBMS 2001Notes 4.2: Hashing10 01230123 a b c e d EXAMPLE: deletion Delete: e f f g maybe move “g” up c d

11 DBMS 2001Notes 4.2: Hashing11 Rule of thumb: Try to keep space utilization between 50% and 80% Utilization = # keys used total # keys that fit If < 50%, wasting space If > 80%, overflows significant depends on how good hash function is & on # keys/bucket

12 DBMS 2001Notes 4.2: Hashing12 How do we cope with growth? Overflows and reorganizations Dynamic hashing: # of buckets may vary Extensible Linear also others...

13 DBMS 2001Notes 4.2: Hashing13 Extensible hashing: two ideas (a) Use i of b bits output by hash function b h(K)  use i  grows over time…. 00110101 For example, b=32

14 DBMS 2001Notes 4.2: Hashing14 (b) Use directory h(K)[i ] to bucket............ Directory contains 2 i pointers to buckets, and stores i. Each bucket stores j, indicating #bits used for placing the records in this block (j  i)

15 DBMS 2001Notes 4.2: Hashing15 Extensible Hashing: Insertion If there's room in bucket h(k)[i], place record there; Otherwise … If j=i, set i=i+1 and double the directory If j<i, split the block in two, distribute records among them now using j+1 bits of h(k); (Repeat until some records end up in the new bucket); Update pointers of bucket array See the next example

16 DBMS 2001Notes 4.2: Hashing16 Example: h(k) is 4 bits; 2 keys/block i = 1 1 1 0001 1001 1100 Insert 1010 1 110010 New directory 2 00 01 10 11 i = 2 2 (j)

17 DBMS 2001Notes 4.2: Hashing17 1 0001 2 1001 10 2 1100 Insert: 0111 0000 00 01 10 11 2 i = Example continued 0111 00 0001 2 2

18 DBMS 2001Notes 4.2: Hashing18 00 01 10 11 2 i = 2 1001 1010 2 1100 2 0111 2 0000 0001 Insert: 1001 Example continued 1001 1010 000 001 010 011 100 101 110 111 3 i = 3 3

19 DBMS 2001Notes 4.2: Hashing19 Extensible hashing: deletion Reverse insert procedure … Example: Walk thru insert example in reverse!

20 DBMS 2001Notes 4.2: Hashing20 Extensible hashing Can handle growing files - without full reorganizations Summary + Indirection (Not bad if directory in memory) Directory doubles in size (First it fits in memory, then it does not  sudden performance degradation) - - Only one data block examined +

21 DBMS 2001Notes 4.2: Hashing21 Linear hashing: grow # of buckets by one  No bucket directory needed Two ideas: (a) Use i low order bits of hash 01110101 grows b i (b) File grows linearly

22 DBMS 2001Notes 4.2: Hashing22 Linear Hashing: Parameters n: number of buckets in use –buckets numbered 0…n-1 i: number of bits of h(k) used to address buckets r: number of records in hash table –ratio r/n limited to fit an avg bucket in a block –next example: r  1.7n, and block holds 2 records => AVG bucket occupancy is  1.7/2 = 0.85 of a block

23 DBMS 2001Notes 4.2: Hashing23 Example: 2 keys/block, b=4 bits, n=2, i =1 00 01 1111 0000 1010 If h(k)[i ] = (a 1 … a i ) 2 < n, then look at bucket h(k)[i ]; else look at bucket h(k)[i ] - 2 i -1 = (0a 2 … a i ) 2 Rule insert 0101 0101 now r=4 >1.7n  get new bucket 10 and distribute keys btw buckets 00 and 10

24 DBMS 2001Notes 4.2: Hashing24  n=3, i =2; distribute keys btw buckets 00 and 10: 00 01 10 0101 1111 00 1010

25 DBMS 2001Notes 4.2: Hashing25 n=3, i =2; insert 0001: 00 01 10 0101 1111 00 0001 10 can have overflow chains!

26 DBMS 2001Notes 4.2: Hashing26 n=3, i =2 00 01 10 0101 1111 00 10 0001 insert 0111 bucket 11 not in use  redirect to 01 0111 now r=6 > 1.7n -> get new bucket 11

27 DBMS 2001Notes 4.2: Hashing27 n=4, i =2; distribute keys btw 01 and 11 00 01 1011 0101 1111 00 10 0001 011111 0001

28 DBMS 2001Notes 4.2: Hashing28 Example Continued: How to grow beyond this? 00 01 1011 111110100101 0000 m = 11 (max used block) i = 2 0000 101 110 111 3... 100 101 0101

29 DBMS 2001Notes 4.2: Hashing29 Linear Hashing Can handle growing files - without full reorganizations No indirection directory of extensible hashing Can have overflow chains - but probability of long chains can be kept low by controlling the r/n fill ratio (?) Summary + + -

30 DBMS 2001Notes 4.2: Hashing30 Hashing - How it works - Dynamic hashing - Extensible - Linear Summary

31 DBMS 2001Notes 4.2: Hashing31 Next: Indexing vs Hashing Index definition in SQL

32 DBMS 2001Notes 4.2: Hashing32 Hashing good for probes given key e.g., SELECT … FROM R WHERE R.A = 5 Indexing vs Hashing

33 DBMS 2001Notes 4.2: Hashing33 INDEXING (Including B-Trees) good for Range Searches: e.g., SELECT FROM R WHERE R.A > 5 Indexing vs Hashing

34 DBMS 2001Notes 4.2: Hashing34 Index definition in SQL Create index name on rel (attr) Create unique index name on rel (attr) defines candidate key Drop INDEX name

35 DBMS 2001Notes 4.2: Hashing35 CANNOT SPECIFY TYPE OF INDEX (e.g. B-tree, Hashing, …) OR PARAMETERS ( e.g. Load Factor, Size of Hash,...)... at least in SQL … Oracle and IBM DB2 UDB provide a PCTFREE clause to inditate the proportion of B-tree blocks initially left unfilled Oracle: “Hash clusters” with built-in or DBA- specified hash function Note

36 DBMS 2001Notes 4.2: Hashing36 The BIG picture…. Chapters 2 & 3: Storage, records, blocks... Chapter 4: Access Mechanisms - Indexes - B trees - Hashing Chapters 6 & 7: Query Processing NEXT


Download ppt "DBMS 2001Notes 4.2: Hashing1 Principles of Database Management Systems 4.2: Hashing Techniques Pekka Kilpeläinen (after Stanford CS245 slide originals."

Similar presentations


Ads by Google