CPSC-608 Database Systems

Slides:



Advertisements
Similar presentations
1 Yet More on Indexes Hash Tables Source: our textbook, slides by Hector Garcia-Molina.
Advertisements

External Memory Hashing. Model of Computation Data stored on disk(s) Minimum transfer unit: a page = b bytes or B records (or block) N records -> N/B.
CS4432: Database Systems II Hash Indexing 1. Hash-Based Indexes Adaptation of main memory hash tables Support equality searches No range searches 2.
Hash-Based Indexes Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY.
1 Hash-Based Indexes Module 4, Lecture 3. 2 Introduction As for any index, 3 alternatives for data entries k* : – Data record with key value k – –Choice.
Quick Review of Apr 10 material B+-Tree File Organization –similar to B+-tree index –leaf nodes store records, not pointers to records stored in an original.
Hash Tables Hash function h: search key  [0…B-1]. Buckets are blocks, numbered [0…B-1]. Big idea: If a record with search key K exists, then it must be.
DBMS 2001Notes 4.2: Hashing1 Principles of Database Management Systems 4.2: Hashing Techniques Pekka Kilpeläinen (after Stanford CS245 slide originals.
Chapter 11 Indexing and Hashing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park.
CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree.
Index tuning Hash Index. overview Introduction Hash-based indexes are best for equality selections. –Can efficiently support index nested joins –Cannot.
Hash Tables Hash function h: search key  [0…B-1]. Buckets are blocks, numbered [0…B-1]. Big idea: If a record with search key K exists, then it must be.
Hash Table indexing and Secondary Storage Hashing.
External Memory Hashing. Hash Tables Hash function h: search key  [0…B-1]. Buckets are blocks, numbered [0…B-1]. Big idea: If a record with search key.
Chapter 13 Hash Tables Section 13.4 CS 257 Dr. T.Y.Lin Abhishek Pandya ID
1 Hash-Based Indexes Chapter Introduction  Hash-based indexes are best for equality selections. Cannot support range searches.  Static and dynamic.
CPSC-608 Database Systems Fall 2010 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #8.
FALL 2004CENG 3511 Hashing Reference: Chapters: 11,12.
CPSC-608 Database Systems Fall 2011 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #11.
CPSC-608 Database Systems Fall 2009 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #9.
Chapter 13.4 Hash Tables Steve Ikeoka ID: 113 CS 257 – Spring 2008.
1 Hash-Based Indexes Chapter Introduction : Hash-based Indexes  Best for equality selections.  Cannot support range searches.  Static and dynamic.
CPSC-608 Database Systems Fall 2008 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #8.
CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.
CS 4432lecture #10 - indexing & hashing1 CS4432: Database Systems II Lecture #10 Professor Elke A. Rundensteiner.
CS 277 – Spring 2002Notes 51 CS 277: Database System Implementation Arthur Keller Notes 5: Hashing and More.
CPSC-608 Database Systems Fall 2011 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #12.
CS CS4432: Database Systems II. CS Index definition in SQL Create index name on rel (attr) (Check online for index definitions in SQL) Drop.
CPSC-608 Database Systems Fall 2008 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #9.
1 Chapter 12: Indexing and Hashing Indexing Indexing Basic Concepts Basic Concepts Ordered Indices Ordered Indices B+-Tree Index Files B+-Tree Index Files.
Hashing and Hash-Based Index. Selection Queries Yes! Hashing  static hashing  dynamic hashing B+-tree is perfect, but.... to answer a selection query.
CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.
CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.
1 CPS216: Advanced Database Systems Notes 05: Operators for Data Access (contd.) Shivnath Babu.
1 Lecture 21: Hash Tables Wednesday, November 17, 2004.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Module D: Hashing.
1 Ullman et al. : Database System Principles Notes 5: Hashing and More.
CPSC 8620Notes 61 CPSC 8620: Database Management System Design Notes 6: Hashing and More.
Access Structures COMP3211 Advanced Databases Dr Nicholas Gibbins
COMP3017 Advanced Databases
CS 245: Database System Principles
Dynamic Hashing (Chapter 12)
Lecture 21: Hash Tables Monday, February 28, 2005.
Hashing CENG 351.
CPSC-608 Database Systems
Database Management Systems (CS 564)
Dynamic Hashing.
CS 245: Database System Principles
External Memory Hashing
Introduction to Database Systems
External Memory Hashing
CS222: Principles of Data Management Notes #8 Static Hashing, Extendible Hashing, Linear Hashing Instructor: Chen Li.
External Memory Hashing
CS222P: Principles of Data Management Notes #8 Static Hashing, Extendible Hashing, Linear Hashing Instructor: Chen Li.
CS 245: Database System Principles
Hashing.
Index tuning Hash Index.
Indexing and Hashing B.Ramamurthy Chapter 11 2/5/2019 B.Ramamurthy.
Database Design and Programming
2018, Spring Pusan National University Ki-Joune Li
CPS216: Advanced Database Systems
Module 12a: Dynamic Hashing
Chapter 11: Indexing and Hashing
CPSC-608 Database Systems
Hash-Based Indexes Chapter 11
CS222/CS122C: Principles of Data Management UCI, Fall 2018 Notes #07 Static Hashing, Extendible Hashing, Linear Hashing Instructor: Chen Li.
CS4432: Database Systems II
Index Structures Chapter 13 of GUW September 16, 2019
Presentation transcript:

CPSC-608 Database Systems Fall 2018 Instructor: Jianer Chen Office: HRBB 315C Phone: 845-4259 Email: chen@cse.tamu.edu Notes #26 Notes #7

Another Index Structure: Hash Tables function h h(x) buckets search key x A bucket is typically a disk block (probably with overflow blocks) h(x), 0 ≤ h(x) ≤ b-1, gives an easy way to compute the bucket address (direct: address from h(x); indirect: h(x) is the index in a directory. Notes #7

How do we cope with growth? Overflows and reorganizations Dynamic hashing Extensible Linear

Extensible Hashing: General framework # bits used by the buckets (block index) # bits used by the directory (hash index) i j1 00…00 00…01 j2 h i x h(x) h(x)i j2 i . 11…11 i directory buckets

Extensible hashing: Searching input: a search key x \\ h is the hash function, H is the directory, i is the current hash index. m = the first i bits of h(x); read in the disk block B with the address H[m].

Extensible hashing: Insertion input: a tuple t with search key x \\ h is the hash function, H is the directory, i is the current hash index. m = the first i bits of h(x); let the block with the address H[m] be B; IF B has room THEN add t in B ELSE let j be the block index of B IF i = j THEN {double the size of H to 2i+1, i = i + 1; let the pointers in the new H[2k] and H[2k+1] both equal to that in the old H[k], 0 ≤ k ≤ 2i-1} split B U{t} into B0 and B1 (with block index j+1) using the j+1st bit, let H[mj0**] point to B0 and H[mj1**] point to B1. b i m h(x) b i mj h(x)

Extensible hashing: Insertion input: a tuple t with search key x \\ h is the hash function, H is the directory, i is the current hash index. m = the first i bits of h(x); let the block with the address H[m] be B; IF B has room THEN add t in B ELSE let j be the block index of B IF i = j THEN {double the size of H to 2i+1, i = i + 1; let the pointers in the new H[2k] and H[2k+1] both equal to that in the old H[k], 0 ≤ k ≤ 2i-1} split B U{t} into B0 and B1 (with block index j+1) using the j+1st bit, let H[mj0**] point to B0 and H[mj1**] point to B1.

Extensible hashing: Insertion input: a tuple t with search key x \\ h is the hash function, H is the directory, i is the current hash index. m = the first i bits of h(x); let the block with the address H[m] be B; IF B has room THEN add t in B ELSE let j be the block index of B IF i = j THEN {double the size of H to 2i+1, i = i + 1; let the pointers in the new H[2k] and H[2k+1] both equal to that in the old H[k], 0 ≤ k ≤ 2i-1} split B U{t} into B0 and B1 (with block index j+1) using the j+1st bit, let H[mj0**] point to B0 and H[mj1**] point to B1. b i m h(x)

Extensible hashing: Insertion input: a tuple t with search key x \\ h is the hash function, H is the directory, i is the current hash index. m = the first i bits of h(x); let the block with the address H[m] be B; IF B has room THEN add t in B ELSE let j be the block index of B IF i = j THEN {double the size of H to 2i+1, i = i + 1; let the pointers in the new H[2k] and H[2k+1] both equal to that in the old H[k], 0 ≤ k ≤ 2i-1} split B U{t} into B0 and B1 (with block index j+1) using the j+1st bit, let H[mj0**] point to B0 and H[mj1**] point to B1. b i m h(x)

Extensible hashing: Insertion input: a tuple t with search key x \\ h is the hash function, H is the directory, i is the current hash index. m = the first i bits of h(x); let the block with the address H[m] be B; IF B has room THEN add t in B ELSE let j be the block index of B IF i = j THEN {double the size of H to 2i+1, i = i + 1; let the pointers in the new H[2k] and H[2k+1] both equal to that in the old H[k], 0 ≤ k ≤ 2i-1} split B U{t} into B0 and B1 (with block index j+1) using the j+1st bit, let H[mj0**] point to B0 and H[mj1**] point to B1. b i m h(x)

Extensible hashing: Insertion input: a tuple t with search key x \\ h is the hash function, H is the directory, i is the current hash index. m = the first i bits of h(x); let the block with the address H[m] be B; IF B has room THEN add t in B ELSE let j be the block index of B \\ B has no room for t IF i = j THEN {double the size of H to 2i+1, i = i + 1; let the pointers in the new H[2k] and H[2k+1] both equal to that in the old H[k], 0 ≤ k ≤ 2i-1} split B U{t} into B0 and B1 (with block index j+1) using the j+1st bit, let H[mj0**] point to B0 and H[mj1**] point to B1. b i m h(x)

Extensible hashing: Insertion input: a tuple t with search key x \\ h is the hash function, H is the directory, i is the current hash index. m = the first i bits of h(x); let the block with the address H[m] be B; IF B has room THEN add t in B ELSE let j be the block index of B \\ B has no room for t IF i = j THEN {double the size of H to 2i+1, i = i + 1; let the pointers in the new H[2k] and H[2k+1] both equal to that in the old H[k], 0 ≤ k ≤ 2i-1} split B U{t} into B0 and B1 (with block index j+1) using the j+1st bit, let H[mj0**] point to B0 and H[mj1**] point to B1. b i m h(x) i > j b i mj h(x)

Extensible hashing: Insertion input: a tuple t with search key x \\ h is the hash function, H is the directory, i is the current hash index. m = the first i bits of h(x); let the block with the address H[m] be B; IF B has room THEN add t in B ELSE let j be the block index of B \\ B has no room for t IF i = j THEN {double the size of H to 2i+1, i = i + 1; let the pointers in the new H[2k] and H[2k+1] both equal to that in the old H[k], 0 ≤ k ≤ 2i-1} split B U{t} into B0 and B1 (with block index j+1) using the j+1st bit, let H[mj0**] point to B0 and H[mj1**] point to B1. b i m h(x) i > j b i mj h(x)

Extensible hashing: Insertion input: a tuple t with search key x \\ h is the hash function, H is the directory, i is the current hash index. m = the first i bits of h(x); let the block with the address H[m] be B; IF B has room THEN add t in B ELSE let j be the block index of B \\ B has no room for t IF i = j THEN {double the size of H to 2i+1, i = i + 1; let the pointers in the new H[2k] and H[2k+1] both equal to that in the old H[k], 0 ≤ k ≤ 2i-1} split B U{t} into B0 and B1 (with block index j+1) using the j+1st bit, let H[mj0**] point to B0 and H[mj1**] point to B1. b i m h(x) i > j b i mj h(x)

Insertion in Extensible Hashing

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 b0001 1 i = 1 1 Insert: c1001 1 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 b0001 1 i = 1 1 Insert: g1010 c1001 1 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 b0001 1 i = 1 1 Insert: g1010 c1001 1 g1010 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 b0001 1 i = 1 1 Insert: g1010 c1001 1 g1010 k1100 Split the block, and increase the block index

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 b0001 1 i = 2 00 01 10 11 i = 1 1 Insert: g1010 c1001 1 g1010 k1100 Split the block, and increase the block index if the block index is equal to the hash index, first double the directory size

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 b0001 1 i = 2 00 01 10 11 i = 1 1 Insert: g1010 2 c1001 1 g1010 k1100 Split the block, and increase the block index if the block index is equal to the hash index, first double the directory size 2

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 b0001 1 i = 2 00 01 10 11 i = 1 1 Insert: g1010 2 c1001 1 g1010 k1100 Split the block, and increase the block index if the block index is equal to the hash index, first double the directory size 2

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 b0001 1 i = 2 00 01 10 11 i = 1 1 Insert: g1010 2 c1001 1 g1010 k1100 Split the block, and increase the block index if the block index is equal to the hash index, first double the directory size 2

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 b0001 1 i = 2 00 01 10 11 i = 1 1 Insert: g1010 2 c1001 1 k1100 g1010 Split the block, and increase the block index if the block index is equal to the hash index, first double the directory size k1100 2

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 b0001 1 i = 2 00 01 10 11 Insert: g1010 2 c1001 g1010 2 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 b0001 1 i = 2 00 01 10 11 Insert: g1010 d0111 2 c1001 g1010 2 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 b0001 1 i = 2 d0111 00 01 10 11 Insert: g1010 d0111 2 c1001 g1010 2 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 b0001 1 i = 2 d0111 00 01 10 11 Insert: g1010 d0111 2 c1001 g1010 2 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 b0001 1 i = 2 d0111 00 01 10 11 Insert: g1010 d0111 e0000 2 c1001 g1010 2 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 b0001 e0000 1 i = 2 d0111 00 01 10 11 Insert: g1010 d0111 e0000 2 c1001 g1010 2 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 Split the block, and increase the block index b0001 e0000 1 i = 2 d0111 00 01 10 11 Insert: g1010 d0111 e0000 2 c1001 g1010 2 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 Split the block, and increase the block index 2 b0001 e0000 1 2 i = 2 d0111 00 01 10 11 Insert: g1010 d0111 e0000 2 c1001 g1010 2 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 Split the block, and increase the block index e0000 2 b0001 d0111 b0001 1 2 i = 2 d0111 00 01 10 11 Insert: g1010 d0111 e0000 2 c1001 g1010 2 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 e0000 2 b0001 d0111 2 i = 2 00 01 10 11 Insert: g1010 d0111 e0000 2 c1001 g1010 2 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 e0000 2 b0001 d0111 2 i = 2 00 01 10 11 Insert: g1010 d0111 e0000 a1000 2 c1001 g1010 2 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 e0000 2 b0001 d0111 2 i = 2 00 01 10 11 Insert: g1010 d0111 e0000 a1000 2 c1001 g1010 a1000 Split the block, and increase the block index 2 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 i = 3 e0000 2 000 001 010 011 100 101 110 111 b0001 d0111 2 i = 2 00 01 10 11 if the block index is equal to the hash index, first double the directory size Insert: g1010 d0111 e0000 a1000 2 c1001 g1010 a1000 Split the block, and increase the block index 2 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 i = 3 e0000 2 000 001 010 011 100 101 110 111 b0001 d0111 2 i = 2 00 01 10 11 3 Insert: g1010 d0111 e0000 a1000 3 2 c1001 g1010 a1000 Split the block, and increase the block index 2 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 i = 3 e0000 2 000 001 010 011 100 101 110 111 b0001 d0111 2 i = 2 00 01 10 11 3 a1000 Insert: g1010 d0111 e0000 a1000 c1001 3 g1010 2 c1001 g1010 2 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 i = 3 e0000 2 000 001 010 011 100 101 110 111 b0001 d0111 2 i = 2 00 01 10 11 3 a1000 Insert: g1010 d0111 e0000 a1000 c1001 g1010 3 2 k1100

Extensible hashing: deletion No merging of blocks Merge blocks and cut directory if possible (Reverse insert procedure)

Note: Still need overflow chains Example: many records with duplicate keys insert 1100 1 1101 1100

Note: Still need overflow chains Example: many records with duplicate keys if we split: insert 1100 2 For 10** 1 1101 1100 2 For 11** 1101 ? 1100 1100

Note: Still need overflow chains Example: many records with duplicate keys if we split: Even further split: 2 For 10** insert 1100 2 For 10** 1 1101 3 For 110* 1100 1101 2 For 11** 1101 1100 1100 ? ? 1100 1100 3 For 111*

Solution: overflow chains insert 1100 add overflow block: 1 1 1101 1100 1101 1101 1100

Extensible hashing Summary. + Can handle growing files - with less wasted space - with no full reorganizations +

Extensible hashing Summary. + - Can handle growing files Indirection - with less wasted space - with no full reorganizations + Indirection (Not bad if directory in memory) Directory doubles in size (Now it fits, now it does not) -