CPSC-608 Database Systems Fall 2011 Instructor: Jianer Chen Office: HRBB 315C Phone: 845-4259 Notes #11



[Figure: DBMS architecture. Labeled components: database administrator, DDL language, DDL compiler; database programmer, DML (query) language, DML compiler; query execution engine; index/file manager; buffer manager; main memory buffers; file manager; secondary storage (disks) holding tables (relations); transaction manager; concurrency control; lock table; logging & recovery. Part of the diagram is labeled "graduate database".]

The Main Purpose of Index Structures
Speed up the search process. For a selection such as σ_{a=6}(R), the index lets us quickly figure out which disk blocks contain the desired tuples; otherwise we have to scan the entire R.
Example: B+ trees

Another Index Structure: Hash Tables
A hash function h maps a search key k to a bucket number h(k).
A bucket is typically a disk block (possibly with overflow blocks).
h(k), with 0 ≤ h(k) ≤ b-1, gives an easy way to compute the bucket address:
direct: the address is computed from h(k);
indirect: h(k) is the index into a directory.

Example hash function
Key k = ‘x1 x2 … xn’, an n-byte character string; we have b buckets.
Hash function: h(k) = (x1 + x2 + … + xn) mod b
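A minimal Python sketch of the sum-of-bytes function above. The name `string_hash` and the choice of UTF-8 encoding are assumptions made for illustration; the slide only specifies summing the n bytes and reducing mod b.

```python
def string_hash(key: str, b: int) -> int:
    """Sum the byte values of the key and reduce modulo the bucket count b."""
    # Iterating over a bytes object yields integers, so sum() adds the bytes.
    return sum(key.encode("utf-8")) % b


bucket = string_hash("abc", 4)        # bytes 97 + 98 + 99 = 294, and 294 mod 4 = 2
same_bucket = string_hash("cba", 4)   # any permutation of the bytes gives the same sum
```

Because addition ignores byte order, all anagrams of a key collide, which is one reason a function of this kind may not spread keys evenly.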

This may not be the best function…
Read Knuth Vol. 3 if you really need to select a good function.
Good hash function: the expected number of keys/bucket is roughly the same for all buckets.

Within a bucket: Do we keep keys sorted?
Yes, if CPU time is critical and inserts/deletes are not too frequent.

Next: an example to illustrate inserts, overflows, and deletes

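EXAMPLE (two records/bucket)
INSERT: h(a) = 1, h(b) = 2, h(c) = 1, h(d) = (value not recoverable from the transcript), then h(e) = 1
[Figure: the buckets after inserting a, b, c, d, listed in the order d, a, c, b. Bucket 1 already holds a and c, so e, which also hashes to 1, goes into an overflow block.]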
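The insert-with-overflow behaviour in this example can be sketched in Python as follows. This is an illustration under stated assumptions, not the course's implementation: each bucket is modelled as a chain of blocks (lists), the table is assumed to have 4 buckets, and record d is left out because its hash value is not legible in the transcript.

```python
BLOCK_CAPACITY = 2  # two records per block, as in the example


def insert(buckets, h, key):
    """Insert key into buckets[h(key)], a chain of blocks (primary block first)."""
    chain = buckets[h(key)]
    for block in chain:
        if len(block) < BLOCK_CAPACITY:
            block.append(key)
            return
    chain.append([key])  # every block in the chain is full: add an overflow block


# Hash values copied from the example (d omitted, see the note above).
H = {"a": 1, "b": 2, "c": 1, "e": 1}
buckets = [[[]] for _ in range(4)]  # assumption: 4 buckets, each with one empty primary block
for key in "abce":
    insert(buckets, H.__getitem__, key)
```

After these inserts, bucket 1 consists of a full primary block holding a and c plus one overflow block holding e.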

EXAMPLE: deletion
[Figure: a hash table holding the records a, b, c, e, d, f, g; the bucket layout did not survive extraction.]
DELETE: e, f, c
After deleting f: maybe move “g” up.
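One way to realise the optional "move up" step after a deletion is to repack the remaining records of a bucket chain toward the primary block. The sketch below assumes a chain is a Python list of blocks with the primary block first; the function name `delete_from_chain` and the decision to always compact are illustrative choices, since the slide only says "maybe".

```python
def delete_from_chain(chain, key, capacity=2):
    """Remove one occurrence of key from a bucket chain, then compact the chain.

    chain is a list of blocks (lists); chain[0] is the primary block.
    Returns True if the key was found and removed, False otherwise.
    """
    for block in chain:
        if key in block:
            block.remove(key)
            break
    else:
        return False  # key not present in any block of the chain

    # Move records up: repack everything into as few blocks as possible,
    # which also frees overflow blocks that became empty.
    records = [r for block in chain for r in block]
    chain[:] = [records[i:i + capacity] for i in range(0, len(records), capacity)] or [[]]
    return True


chain = [["b", "c"], ["d"]]      # hypothetical bucket with one overflow block
delete_from_chain(chain, "c")    # d moves up into the freed slot
```

Compacting on every delete costs extra writes; leaving the hole and reusing it on a later insert is the cheaper alternative.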

Rule of thumb: Try to keep space utilization between 50% and 80%
Utilization = (# keys used) / (total # keys that fit)
If < 50%, wasting space
If > 80%, overflows significant; how significant depends on how good the hash function is and on # keys/bucket
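The utilization formula and the 50% to 80% rule of thumb can be checked with a few lines of Python. The function names and the assumption that "total # keys that fit" equals number of blocks times keys per block are illustrative.

```python
def utilization(keys_used, num_blocks, keys_per_block):
    """Fraction of record slots in use: keys used / total keys that fit."""
    return keys_used / (num_blocks * keys_per_block)


def advice(u):
    """Apply the rule of thumb to a utilization value u in [0, 1]."""
    if u < 0.5:
        return "wasting space"
    if u > 0.8:
        return "overflows likely significant"
    return "ok"


u = utilization(6, 4, 2)  # 6 keys stored in 4 blocks of 2 keys each
```

Here 6 of 8 slots are used, so the table sits inside the recommended band.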

How do we cope with growth?
Overflows and reorganizations
Dynamic hashing: Extensible, Linear


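Extensible hashing: two ideas
(a) Use i of the b bits output by the hash function: h(k) has b bits, use the first i; i grows over time…
(b) Use a directory: the first i bits of h(k), written h(k)[i], select a directory entry that points to the bucket.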
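Idea (a) amounts to a right shift. The sketch below assumes the hash value is held as a non-negative Python integer of width b bits; the helper name `first_bits` is hypothetical.

```python
def first_bits(hash_value: int, i: int, b: int) -> int:
    """Return the leading i bits of a b-bit hash value as an integer."""
    return hash_value >> (b - i)


h_k = 0b1010  # a 4-bit hash value
prefixes = [first_bits(h_k, i, 4) for i in range(5)]  # i = 0, 1, 2, 3, 4
```

For idea (b), the result is used as the index into the directory, so a directory that uses i bits has 2^i entries.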

Extensible Hashing: General framework
[Figure: key k → hash function h → h(k) → first i bits of h(k) → directory entry (00…00, 00…01, …, 11…11) → bucket. i = # bits used by the directory; j1, j2 = # bits used by the buckets.]

Example: h(k) is 4 bits; 2 keys/bucket
[Figure sequence: a series of inserts into an extensible hash table. The first insert shown requires a new directory with a new value of i. The example then continues with further inserts; the last one shown is Insert: 1000, after which a second value of i appears. The bucket contents, the other inserted values, and the values of i did not survive extraction.]

Extensible hashing: deletion
Two options:
No merging of blocks
Merge blocks and cut directory if possible (reverse of the insert procedure)

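Note: Still need overflow chains
Example: many records with duplicate keys. Splitting cannot separate records whose hash values are identical.
[Figure: an insert into a full block; the slide asks what happens if we split (hash value 1101 shown).]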

Solution: overflow chains
[Figure: insert 1100; add an overflow block to the full bucket.]

Extensible hashing: Searching
Input: a search key k
\\ h is the hash function, D is the directory, i is the current bit number.
1. m = the first i bits of h(k);
2. read in the disk block B with the address D[m].
Summary A.

Extensible hashing: Insertion
Input: a tuple t with search key k
\\ h is the hash function, D is the directory, i is the current bit number.
1. m = the first i bits of h(k);
2. read in the disk block B with address D[m];
3. IF B has room THEN add t to B
4. ELSE let j be the bit number of B;
   IF i = j THEN { double the size of D: let the pointers in the new D[2m'] and D[2m'+1] both equal that in the old D[m'], for 0 ≤ m' ≤ 2^i - 1; then i = i + 1; }
   split B + t into B1 and B2, both with block bit number j+1;
   let the two corresponding pointers in D go to B1 and B2, resp.
Summary B.
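The search and insertion procedures above can be sketched as an in-memory Python class. Several things here are assumptions for illustration rather than part of the slides: keys are taken to be b-bit integers so that h is the identity, the table starts with i = 0 and a single block, the class and method names are hypothetical, and the split is retried in a loop because one split may leave all records on the same side. When a block has already used all b bits, the sketch raises an error where a real system would add an overflow block.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth  # j: number of leading hash bits this block distinguishes
        self.keys = []


class ExtensibleHash:
    def __init__(self, hash_bits=4, capacity=2):
        self.b = hash_bits        # b: width of h(k) in bits
        self.capacity = capacity  # records per block
        self.i = 0                # bits currently used by the directory
        self.directory = [Bucket(0)]

    def _h(self, key):
        # Assumption: keys are b-bit integers, so the hash is the key itself.
        return key & ((1 << self.b) - 1)

    def _index(self, key):
        return self._h(key) >> (self.b - self.i)  # m = first i bits of h(k)

    def search(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        while True:
            bucket = self.directory[self._index(key)]
            if len(bucket.keys) < self.capacity:
                bucket.keys.append(key)
                return
            if bucket.depth == self.b:
                raise OverflowError("all hash bits used; an overflow block is needed")
            if bucket.depth == self.i:
                # Double the directory: new D[2m] and D[2m+1] both copy old D[m].
                self.directory = [d for d in self.directory for _ in range(2)]
                self.i += 1
            self._split(bucket)

    def _split(self, bucket):
        j = bucket.depth + 1
        low, high = Bucket(j), Bucket(j)
        for k in bucket.keys:
            bit = (self._h(k) >> (self.b - j)) & 1  # j-th leading bit of h(k)
            (high if bit else low).keys.append(k)
        for m, d in enumerate(self.directory):
            if d is bucket:
                bit = (m >> (self.i - j)) & 1       # j-th leading bit of the entry index
                self.directory[m] = high if bit else low


table = ExtensibleHash(hash_bits=4, capacity=2)
for key in (0b0001, 0b1001, 0b1100, 0b1010):  # illustrative 4-bit keys
    table.insert(key)
```

With these four keys the directory ends up using 2 bits, and the entries for prefixes 00 and 01 still share one block because that block has only been split on its first bit.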

Extensible hashing
Pros: can handle growing files, with less wasted space and with no full reorganizations.
Cons: indirection (not bad if the directory is in memory); the directory doubles in size (now it fits, now it does not).
Summary C.
