DBMS 2001Notes 4.2: Hashing1 Principles of Database Management Systems 4.2: Hashing Techniques Pekka Kilpeläinen (after Stanford CS245 slide originals.

Slides:



Advertisements
Similar presentations
External Memory Hashing. Model of Computation Data stored on disk(s) Minimum transfer unit: a page = b bytes or B records (or block) N records -> N/B.
Advertisements

CS4432: Database Systems II Hash Indexing 1. Hash-Based Indexes Adaptation of main memory hash tables Support equality searches No range searches 2.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part C Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
Hash-based Indexes CS 186, Spring 2006 Lecture 7 R &G Chapter 11 HASH, x. There is no definition for this word -- nobody knows what hash is. Ambrose Bierce,
1 Hash-Based Indexes Module 4, Lecture 3. 2 Introduction As for any index, 3 alternatives for data entries k* : – Data record with key value k – –Choice.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Hash-Based Indexes Chapter 11.
Quick Review of Apr 10 material B+-Tree File Organization –similar to B+-tree index –leaf nodes store records, not pointers to records stored in an original.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Hash-Based Indexes Chapter 11.
CPSC 404, Laks V.S. Lakshmanan1 Hash-Based Indexes Chapter 11 Ramakrishnan & Gehrke (Sections )
Hashing and Indexing John Ortiz.
File Processing : Hash 2015, Spring Pusan National University Ki-Joune Li.
CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.
Chapter 11 (3 rd Edition) Hash-Based Indexes Xuemin COMP9315: Database Systems Implementation.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree.
Hash Indexes: Chap. 11 CS634 Lecture 6, Feb
Index tuning Hash Index. overview Introduction Hash-based indexes are best for equality selections. –Can efficiently support index nested joins –Cannot.
Dr. Kalpakis CMSC 661, Principles of Database Systems Index Structures [13]
1 Advanced Database Technology Anna Östlin Pagh and Rasmus Pagh IT University of Copenhagen Spring 2004 March 4, 2004 INDEXING II Lecture based on [GUW,
Hash Table indexing and Secondary Storage Hashing.
1 Hash-Based Indexes Yanlei Diao UMass Amherst Feb 22, 2006 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
B+-tree and Hashing.
1 CS143: Index. 2 Topics to Learn Important concepts –Dense index vs. sparse index –Primary index vs. secondary index (= clustering index vs. non-clustering.
1 Indexing and Hashing Indexing and Hashing Basic Concepts Dense and Sparse Indices B+Trees, B-trees Dynamic Hashing Comparison of Ordered Indexing and.
1 Hash-Based Indexes Chapter Introduction  Hash-based indexes are best for equality selections. Cannot support range searches.  Static and dynamic.
CPSC-608 Database Systems Fall 2010 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #8.
FALL 2004CENG 3511 Hashing Reference: Chapters: 11,12.
CPSC-608 Database Systems Fall 2011 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #11.
CPSC-608 Database Systems Fall 2009 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #9.
1 Hash-Based Indexes Chapter Introduction : Hash-based Indexes  Best for equality selections.  Cannot support range searches.  Static and dynamic.
CPSC-608 Database Systems Fall 2008 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #8.
CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.
CS 4432lecture #10 - indexing & hashing1 CS4432: Database Systems II Lecture #10 Professor Elke A. Rundensteiner.
CS 277 – Spring 2002Notes 51 CS 277: Database System Implementation Arthur Keller Notes 5: Hashing and More.
CPSC-608 Database Systems Fall 2011 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #12.
CS CS4432: Database Systems II. CS Index definition in SQL Create index name on rel (attr) (Check online for index definitions in SQL) Drop.
CPSC-608 Database Systems Fall 2008 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #9.
1 CS143: Index. 2 Topics to Learn Important concepts –Dense index vs. sparse index –Primary index vs. secondary index (= clustering index vs. non-clustering.
1 Chapter 12: Indexing and Hashing Indexing Indexing Basic Concepts Basic Concepts Ordered Indices Ordered Indices B+-Tree Index Files B+-Tree Index Files.
Hashing and Hash-Based Index. Selection Queries Yes! Hashing  static hashing  dynamic hashing B+-tree is perfect, but.... to answer a selection query.
DBMS 2001Notes 4.1: B-Trees1 Principles of Database Management Systems 4.1: B-Trees Pekka Kilpeläinen (after Stanford CS245 slide originals by Hector Garcia-Molina,
CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.
CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Hash-Based Indexes Chapter 11 Modified by Donghui Zhang Jan 30, 2006.
1.1 CS220 Database Systems Indexing: Hashing Slides courtesy G. Kollios Boston University via UC Berkeley.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Indexed Sequential Access Method.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Hash-Based Indexes Chapter 10.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Module D: Hashing.
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 111 Database Systems II Index Structures.
Chapter 5 Record Storage and Primary File Organizations
1 Ullman et al. : Database System Principles Notes 5: Hashing and More.
CPSC 8620Notes 61 CPSC 8620: Database Management System Design Notes 6: Hashing and More.
Em Spatiotemporal Database Laboratory Pusan National University File Processing : Hash 2004, Spring Pusan National University Ki-Joune Li.
Access Structures COMP3211 Advanced Databases Dr Nicholas Gibbins
COMP3017 Advanced Databases
CS 245: Database System Principles
Relational Database Systems 2
CPSC-608 Database Systems
CS 245: Database System Principles
External Memory Hashing
Yan Huang - CSCI5330 Database Implementation – Access Methods
External Memory Hashing
External Memory Hashing
CS 245: Database System Principles
Index tuning Hash Index.
Database Design and Programming
Chapter 11: Indexing and Hashing
CPSC-608 Database Systems
CS4432: Database Systems II
Index Structures Chapter 13 of GUW September 16, 2019
Presentation transcript:

DBMS 2001Notes 4.2: Hashing1 Principles of Database Management Systems 4.2: Hashing Techniques Pekka Kilpeläinen (after Stanford CS245 slide originals by Hector Garcia-Molina, Jeff Ullman and Jennifer Widom)

DBMS 2001Notes 4.2: Hashing2 Hashing? Locating the storage block of a record by the hash value h(k) of its key k Normally really fast –records (often) located by a single disk access

DBMS 2001Notes 4.2: Hashing3 key  h(key) Hashing Buckets (typically 1 disk block)

DBMS 2001Notes 4.2: Hashing Two alternatives records key  h(key) (1) Hash value determines the storage block directly to implement a primary index

DBMS 2001Notes 4.2: Hashing5 key  h(key) Index record key 1 Two alternatives for a secondary index (2) Records located indirectly via index buckets

DBMS 2001Notes 4.2: Hashing6 Example hash function Key = ‘x 1 x 2 … x n ’ n byte character string Have b buckets h = (x 1 + x 2 + … + x n ) mod b  {0, 1, …, b-1}

DBMS 2001Notes 4.2: Hashing7  This may not be best function … Good hash Expected number of function: keys/bucket is the same for all buckets  Read Knuth Vol. 3 if you really need to select a good function.

DBMS 2001Notes 4.2: Hashing8 Next: example to illustrate inserts, overflows, deletes h(K)

DBMS 2001Notes 4.2: Hashing9 EXAMPLE 2 records/bucket INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = d a c b h(e) = 1 e

DBMS 2001Notes 4.2: Hashing a b c e d EXAMPLE: deletion Delete: e f f g maybe move “g” up c d

DBMS 2001Notes 4.2: Hashing11 Rule of thumb: Try to keep space utilization between 50% and 80% Utilization = # keys used total # keys that fit If < 50%, wasting space If > 80%, overflows significant depends on how good hash function is & on # keys/bucket

DBMS 2001Notes 4.2: Hashing12 How do we cope with growth? Overflows and reorganizations Dynamic hashing: # of buckets may vary Extensible Linear also others...

DBMS 2001Notes 4.2: Hashing13 Extensible hashing: two ideas (a) Use i of b bits output by hash function b h(K)  use i  grows over time… For example, b=32

DBMS 2001Notes 4.2: Hashing14 (b) Use directory h(K)[i ] to bucket Directory contains 2 i pointers to buckets, and stores i. Each bucket stores j, indicating #bits used for placing the records in this block (j  i)

DBMS 2001Notes 4.2: Hashing15 Extensible Hashing: Insertion If there's room in bucket h(k)[i], place record there; Otherwise … If j=i, set i=i+1 and double the directory If j<i, split the block in two, distribute records among them now using j+1 bits of h(k); (Repeat until some records end up in the new bucket); Update pointers of bucket array See the next example

DBMS 2001Notes 4.2: Hashing16 Example: h(k) is 4 bits; 2 keys/block i = Insert New directory i = 2 2 (j)

DBMS 2001Notes 4.2: Hashing Insert: i = Example continued

DBMS 2001Notes 4.2: Hashing i = Insert: 1001 Example continued i = 3 3

DBMS 2001Notes 4.2: Hashing19 Extensible hashing: deletion Reverse insert procedure … Example: Walk thru insert example in reverse!

DBMS 2001Notes 4.2: Hashing20 Extensible hashing Can handle growing files - without full reorganizations Summary + Indirection (Not bad if directory in memory) Directory doubles in size (First it fits in memory, then it does not  sudden performance degradation) - - Only one data block examined +

DBMS 2001Notes 4.2: Hashing21 Linear hashing: grow # of buckets by one  No bucket directory needed Two ideas: (a) Use i low order bits of hash grows b i (b) File grows linearly

DBMS 2001Notes 4.2: Hashing22 Linear Hashing: Parameters n: number of buckets in use –buckets numbered 0…n-1 i: number of bits of h(k) used to address buckets r: number of records in hash table –ratio r/n limited to fit an avg bucket in a block –next example: r  1.7n, and block holds 2 records => AVG bucket occupancy is  1.7/2 = 0.85 of a block

DBMS 2001Notes 4.2: Hashing23 Example: 2 keys/block, b=4 bits, n=2, i = If h(k)[i ] = (a 1 … a i ) 2 < n, then look at bucket h(k)[i ]; else look at bucket h(k)[i ] - 2 i -1 = (0a 2 … a i ) 2 Rule insert now r=4 >1.7n  get new bucket 10 and distribute keys btw buckets 00 and 10

DBMS 2001Notes 4.2: Hashing24  n=3, i =2; distribute keys btw buckets 00 and 10:

DBMS 2001Notes 4.2: Hashing25 n=3, i =2; insert 0001: can have overflow chains!

DBMS 2001Notes 4.2: Hashing26 n=3, i = insert 0111 bucket 11 not in use  redirect to now r=6 > 1.7n -> get new bucket 11

DBMS 2001Notes 4.2: Hashing27 n=4, i =2; distribute keys btw 01 and

DBMS 2001Notes 4.2: Hashing28 Example Continued: How to grow beyond this? m = 11 (max used block) i =

DBMS 2001Notes 4.2: Hashing29 Linear Hashing Can handle growing files - without full reorganizations No indirection directory of extensible hashing Can have overflow chains - but probability of long chains can be kept low by controlling the r/n fill ratio (?) Summary + + -

DBMS 2001Notes 4.2: Hashing30 Hashing - How it works - Dynamic hashing - Extensible - Linear Summary

DBMS 2001Notes 4.2: Hashing31 Next: Indexing vs Hashing Index definition in SQL

DBMS 2001Notes 4.2: Hashing32 Hashing good for probes given key e.g., SELECT … FROM R WHERE R.A = 5 Indexing vs Hashing

DBMS 2001Notes 4.2: Hashing33 INDEXING (Including B-Trees) good for Range Searches: e.g., SELECT FROM R WHERE R.A > 5 Indexing vs Hashing

DBMS 2001Notes 4.2: Hashing34 Index definition in SQL Create index name on rel (attr) Create unique index name on rel (attr) defines candidate key Drop INDEX name

DBMS 2001Notes 4.2: Hashing35 CANNOT SPECIFY TYPE OF INDEX (e.g. B-tree, Hashing, …) OR PARAMETERS ( e.g. Load Factor, Size of Hash,...)... at least in SQL … Oracle and IBM DB2 UDB provide a PCTFREE clause to inditate the proportion of B-tree blocks initially left unfilled Oracle: “Hash clusters” with built-in or DBA- specified hash function Note

DBMS 2001Notes 4.2: Hashing36 The BIG picture…. Chapters 2 & 3: Storage, records, blocks... Chapter 4: Access Mechanisms - Indexes - B trees - Hashing Chapters 6 & 7: Query Processing NEXT