CPSC-608 Database Systems

Slides:

Advertisements

Similar presentations

1 Yet More on Indexes Hash Tables Source: our textbook, slides by Hector Garcia-Molina.

Advertisements

External Memory Hashing. Model of Computation Data stored on disk(s) Minimum transfer unit: a page = b bytes or B records (or block) N records -> N/B.

CS4432: Database Systems II Hash Indexing 1. Hash-Based Indexes Adaptation of main memory hash tables Support equality searches No range searches 2.

Hash-Based Indexes Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY.

1 Hash-Based Indexes Module 4, Lecture 3. 2 Introduction As for any index, 3 alternatives for data entries k* : – Data record with key value k – –Choice.

Quick Review of Apr 10 material B+-Tree File Organization –similar to B+-tree index –leaf nodes store records, not pointers to records stored in an original.

Hash Tables Hash function h: search key  [0…B-1]. Buckets are blocks, numbered [0…B-1]. Big idea: If a record with search key K exists, then it must be.

DBMS 2001Notes 4.2: Hashing1 Principles of Database Management Systems 4.2: Hashing Techniques Pekka Kilpeläinen (after Stanford CS245 slide originals.

Chapter 11 Indexing and Hashing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park.

CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.

©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree.

Index tuning Hash Index. overview Introduction Hash-based indexes are best for equality selections. –Can efficiently support index nested joins –Cannot.

Hash Tables Hash function h: search key  [0…B-1]. Buckets are blocks, numbered [0…B-1]. Big idea: If a record with search key K exists, then it must be.

Hash Table indexing and Secondary Storage Hashing.

External Memory Hashing. Hash Tables Hash function h: search key  [0…B-1]. Buckets are blocks, numbered [0…B-1]. Big idea: If a record with search key.

Chapter 13 Hash Tables Section 13.4 CS 257 Dr. T.Y.Lin Abhishek Pandya ID

1 Hash-Based Indexes Chapter Introduction  Hash-based indexes are best for equality selections. Cannot support range searches.  Static and dynamic.

CPSC-608 Database Systems Fall 2010 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #8.

FALL 2004CENG 3511 Hashing Reference: Chapters: 11,12.

CPSC-608 Database Systems Fall 2011 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #11.

CPSC-608 Database Systems Fall 2009 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #9.

Chapter 13.4 Hash Tables Steve Ikeoka ID: 113 CS 257 – Spring 2008.

1 Hash-Based Indexes Chapter Introduction : Hash-based Indexes  Best for equality selections.  Cannot support range searches.  Static and dynamic.

CPSC-608 Database Systems Fall 2008 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #8.

CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.

CS 4432lecture #10 - indexing & hashing1 CS4432: Database Systems II Lecture #10 Professor Elke A. Rundensteiner.

CS 277 – Spring 2002Notes 51 CS 277: Database System Implementation Arthur Keller Notes 5: Hashing and More.

CPSC-608 Database Systems Fall 2011 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #12.

CS CS4432: Database Systems II. CS Index definition in SQL Create index name on rel (attr) (Check online for index definitions in SQL) Drop.

CPSC-608 Database Systems Fall 2008 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #9.

1 Chapter 12: Indexing and Hashing Indexing Indexing Basic Concepts Basic Concepts Ordered Indices Ordered Indices B+-Tree Index Files B+-Tree Index Files.

Hashing and Hash-Based Index. Selection Queries Yes! Hashing  static hashing  dynamic hashing B+-tree is perfect, but.... to answer a selection query.

CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.

CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.

1 CPS216: Advanced Database Systems Notes 05: Operators for Data Access (contd.) Shivnath Babu.

1 Lecture 21: Hash Tables Wednesday, November 17, 2004.

Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Module D: Hashing.

1 Ullman et al. : Database System Principles Notes 5: Hashing and More.

CPSC 8620Notes 61 CPSC 8620: Database Management System Design Notes 6: Hashing and More.

Access Structures COMP3211 Advanced Databases Dr Nicholas Gibbins

COMP3017 Advanced Databases

CS 245: Database System Principles

Dynamic Hashing (Chapter 12)

Lecture 21: Hash Tables Monday, February 28, 2005.

Hashing CENG 351.

CPSC-608 Database Systems

Database Management Systems (CS 564)

Dynamic Hashing.

CS 245: Database System Principles

External Memory Hashing

Introduction to Database Systems

External Memory Hashing

CS222: Principles of Data Management Notes #8 Static Hashing, Extendible Hashing， Linear Hashing Instructor: Chen Li.

External Memory Hashing

CS222P: Principles of Data Management Notes #8 Static Hashing, Extendible Hashing， Linear Hashing Instructor: Chen Li.

CS 245: Database System Principles

Index tuning Hash Index.

Indexing and Hashing B.Ramamurthy Chapter 11 2/5/2019 B.Ramamurthy.

Database Design and Programming

2018, Spring Pusan National University Ki-Joune Li

CPS216: Advanced Database Systems

Module 12a: Dynamic Hashing

Chapter 11: Indexing and Hashing

CPSC-608 Database Systems

Hash-Based Indexes Chapter 11

CS222/CS122C: Principles of Data Management UCI, Fall 2018 Notes #07 Static Hashing, Extendible Hashing， Linear Hashing Instructor: Chen Li.

CS4432: Database Systems II

Index Structures Chapter 13 of GUW September 16, 2019

Presentation transcript:

CPSC-608 Database Systems Fall 2018 Instructor: Jianer Chen Office: HRBB 315C Phone: 845-4259 Email: chen@cse.tamu.edu Notes #26 Notes #7

Another Index Structure: Hash Tables function h h(x) buckets search key x A bucket is typically a disk block (probably with overflow blocks) h(x), 0 ≤ h(x) ≤ b-1, gives an easy way to compute the bucket address (direct: address from h(x); indirect: h(x) is the index in a directory. Notes #7

How do we cope with growth? Overflows and reorganizations Dynamic hashing Extensible Linear

Extensible Hashing: General framework # bits used by the buckets (block index) # bits used by the directory (hash index) i j1 00…00 00…01 j2 h i x h(x) h(x)i j2 i . 11…11 i directory buckets

Extensible hashing: Searching input: a search key x \\ h is the hash function, H is the directory, i is the current hash index. m = the first i bits of h(x); read in the disk block B with the address H[m].

Extensible hashing: Insertion input: a tuple t with search key x \\ h is the hash function, H is the directory, i is the current hash index. m = the first i bits of h(x); let the block with the address H[m] be B; IF B has room THEN add t in B ELSE let j be the block index of B IF i = j THEN {double the size of H to 2i+1, i = i + 1; let the pointers in the new H[2k] and H[2k+1] both equal to that in the old H[k], 0 ≤ k ≤ 2i-1} split B U{t} into B0 and B1 (with block index j+1) using the j+1st bit, let H[mj0**] point to B0 and H[mj1**] point to B1. b i m h(x) b i mj h(x)

Extensible hashing: Insertion input: a tuple t with search key x \\ h is the hash function, H is the directory, i is the current hash index. m = the first i bits of h(x); let the block with the address H[m] be B; IF B has room THEN add t in B ELSE let j be the block index of B IF i = j THEN {double the size of H to 2i+1, i = i + 1; let the pointers in the new H[2k] and H[2k+1] both equal to that in the old H[k], 0 ≤ k ≤ 2i-1} split B U{t} into B0 and B1 (with block index j+1) using the j+1st bit, let H[mj0**] point to B0 and H[mj1**] point to B1.

Extensible hashing: Insertion input: a tuple t with search key x \\ h is the hash function, H is the directory, i is the current hash index. m = the first i bits of h(x); let the block with the address H[m] be B; IF B has room THEN add t in B ELSE let j be the block index of B IF i = j THEN {double the size of H to 2i+1, i = i + 1; let the pointers in the new H[2k] and H[2k+1] both equal to that in the old H[k], 0 ≤ k ≤ 2i-1} split B U{t} into B0 and B1 (with block index j+1) using the j+1st bit, let H[mj0**] point to B0 and H[mj1**] point to B1. b i m h(x)

Extensible hashing: Insertion input: a tuple t with search key x \\ h is the hash function, H is the directory, i is the current hash index. m = the first i bits of h(x); let the block with the address H[m] be B; IF B has room THEN add t in B ELSE let j be the block index of B IF i = j THEN {double the size of H to 2i+1, i = i + 1; let the pointers in the new H[2k] and H[2k+1] both equal to that in the old H[k], 0 ≤ k ≤ 2i-1} split B U{t} into B0 and B1 (with block index j+1) using the j+1st bit, let H[mj0**] point to B0 and H[mj1**] point to B1. b i m h(x)

Extensible hashing: Insertion input: a tuple t with search key x \\ h is the hash function, H is the directory, i is the current hash index. m = the first i bits of h(x); let the block with the address H[m] be B; IF B has room THEN add t in B ELSE let j be the block index of B IF i = j THEN {double the size of H to 2i+1, i = i + 1; let the pointers in the new H[2k] and H[2k+1] both equal to that in the old H[k], 0 ≤ k ≤ 2i-1} split B U{t} into B0 and B1 (with block index j+1) using the j+1st bit, let H[mj0**] point to B0 and H[mj1**] point to B1. b i m h(x)

Extensible hashing: Insertion input: a tuple t with search key x \\ h is the hash function, H is the directory, i is the current hash index. m = the first i bits of h(x); let the block with the address H[m] be B; IF B has room THEN add t in B ELSE let j be the block index of B \\ B has no room for t IF i = j THEN {double the size of H to 2i+1, i = i + 1; let the pointers in the new H[2k] and H[2k+1] both equal to that in the old H[k], 0 ≤ k ≤ 2i-1} split B U{t} into B0 and B1 (with block index j+1) using the j+1st bit, let H[mj0**] point to B0 and H[mj1**] point to B1. b i m h(x)

Extensible hashing: Insertion input: a tuple t with search key x \\ h is the hash function, H is the directory, i is the current hash index. m = the first i bits of h(x); let the block with the address H[m] be B; IF B has room THEN add t in B ELSE let j be the block index of B \\ B has no room for t IF i = j THEN {double the size of H to 2i+1, i = i + 1; let the pointers in the new H[2k] and H[2k+1] both equal to that in the old H[k], 0 ≤ k ≤ 2i-1} split B U{t} into B0 and B1 (with block index j+1) using the j+1st bit, let H[mj0**] point to B0 and H[mj1**] point to B1. b i m h(x) i > j b i mj h(x)

Extensible hashing: Insertion input: a tuple t with search key x \\ h is the hash function, H is the directory, i is the current hash index. m = the first i bits of h(x); let the block with the address H[m] be B; IF B has room THEN add t in B ELSE let j be the block index of B \\ B has no room for t IF i = j THEN {double the size of H to 2i+1, i = i + 1; let the pointers in the new H[2k] and H[2k+1] both equal to that in the old H[k], 0 ≤ k ≤ 2i-1} split B U{t} into B0 and B1 (with block index j+1) using the j+1st bit, let H[mj0**] point to B0 and H[mj1**] point to B1. b i m h(x) i > j b i mj h(x)

Extensible hashing: Insertion input: a tuple t with search key x \\ h is the hash function, H is the directory, i is the current hash index. m = the first i bits of h(x); let the block with the address H[m] be B; IF B has room THEN add t in B ELSE let j be the block index of B \\ B has no room for t IF i = j THEN {double the size of H to 2i+1, i = i + 1; let the pointers in the new H[2k] and H[2k+1] both equal to that in the old H[k], 0 ≤ k ≤ 2i-1} split B U{t} into B0 and B1 (with block index j+1) using the j+1st bit, let H[mj0**] point to B0 and H[mj1**] point to B1. b i m h(x) i > j b i mj h(x)

Insertion in Extensible Hashing

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 b0001 1 i = 1 1 Insert: c1001 1 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 b0001 1 i = 1 1 Insert: g1010 c1001 1 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 b0001 1 i = 1 1 Insert: g1010 c1001 1 g1010 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 b0001 1 i = 1 1 Insert: g1010 c1001 1 g1010 k1100 Split the block, and increase the block index

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 b0001 1 i = 2 00 01 10 11 i = 1 1 Insert: g1010 c1001 1 g1010 k1100 Split the block, and increase the block index if the block index is equal to the hash index, first double the directory size

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 b0001 1 i = 2 00 01 10 11 i = 1 1 Insert: g1010 2 c1001 1 g1010 k1100 Split the block, and increase the block index if the block index is equal to the hash index, first double the directory size 2

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 b0001 1 i = 2 00 01 10 11 i = 1 1 Insert: g1010 2 c1001 1 g1010 k1100 Split the block, and increase the block index if the block index is equal to the hash index, first double the directory size 2

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 b0001 1 i = 2 00 01 10 11 i = 1 1 Insert: g1010 2 c1001 1 g1010 k1100 Split the block, and increase the block index if the block index is equal to the hash index, first double the directory size 2

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 b0001 1 i = 2 00 01 10 11 i = 1 1 Insert: g1010 2 c1001 1 k1100 g1010 Split the block, and increase the block index if the block index is equal to the hash index, first double the directory size k1100 2

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 b0001 1 i = 2 00 01 10 11 Insert: g1010 2 c1001 g1010 2 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 b0001 1 i = 2 00 01 10 11 Insert: g1010 d0111 2 c1001 g1010 2 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 b0001 1 i = 2 d0111 00 01 10 11 Insert: g1010 d0111 2 c1001 g1010 2 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 b0001 1 i = 2 d0111 00 01 10 11 Insert: g1010 d0111 2 c1001 g1010 2 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 b0001 1 i = 2 d0111 00 01 10 11 Insert: g1010 d0111 e0000 2 c1001 g1010 2 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 b0001 e0000 1 i = 2 d0111 00 01 10 11 Insert: g1010 d0111 e0000 2 c1001 g1010 2 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 Split the block, and increase the block index b0001 e0000 1 i = 2 d0111 00 01 10 11 Insert: g1010 d0111 e0000 2 c1001 g1010 2 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 Split the block, and increase the block index 2 b0001 e0000 1 2 i = 2 d0111 00 01 10 11 Insert: g1010 d0111 e0000 2 c1001 g1010 2 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 Split the block, and increase the block index e0000 2 b0001 d0111 b0001 1 2 i = 2 d0111 00 01 10 11 Insert: g1010 d0111 e0000 2 c1001 g1010 2 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 e0000 2 b0001 d0111 2 i = 2 00 01 10 11 Insert: g1010 d0111 e0000 2 c1001 g1010 2 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 e0000 2 b0001 d0111 2 i = 2 00 01 10 11 Insert: g1010 d0111 e0000 a1000 2 c1001 g1010 2 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 e0000 2 b0001 d0111 2 i = 2 00 01 10 11 Insert: g1010 d0111 e0000 a1000 2 c1001 g1010 a1000 Split the block, and increase the block index 2 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 i = 3 e0000 2 000 001 010 011 100 101 110 111 b0001 d0111 2 i = 2 00 01 10 11 if the block index is equal to the hash index, first double the directory size Insert: g1010 d0111 e0000 a1000 2 c1001 g1010 a1000 Split the block, and increase the block index 2 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 i = 3 e0000 2 000 001 010 011 100 101 110 111 b0001 d0111 2 i = 2 00 01 10 11 3 Insert: g1010 d0111 e0000 a1000 3 2 c1001 g1010 a1000 Split the block, and increase the block index 2 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 i = 3 e0000 2 000 001 010 011 100 101 110 111 b0001 d0111 2 i = 2 00 01 10 11 3 a1000 Insert: g1010 d0111 e0000 a1000 c1001 3 g1010 2 c1001 g1010 2 k1100

Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 h(k) = 1100 i = 3 e0000 2 000 001 010 011 100 101 110 111 b0001 d0111 2 i = 2 00 01 10 11 3 a1000 Insert: g1010 d0111 e0000 a1000 c1001 g1010 3 2 k1100

Extensible hashing: deletion No merging of blocks Merge blocks and cut directory if possible (Reverse insert procedure)

Note: Still need overflow chains Example: many records with duplicate keys insert 1100 1 1101 1100

Note: Still need overflow chains Example: many records with duplicate keys if we split: insert 1100 2 For 10** 1 1101 1100 2 For 11** 1101 ? 1100 1100

Note: Still need overflow chains Example: many records with duplicate keys if we split: Even further split: 2 For 10** insert 1100 2 For 10** 1 1101 3 For 110* 1100 1101 2 For 11** 1101 1100 1100 ? ? 1100 1100 3 For 111*

Solution: overflow chains insert 1100 add overflow block: 1 1 1101 1100 1101 1101 1100

Extensible hashing Summary. + Can handle growing files - with less wasted space - with no full reorganizations +

Extensible hashing Summary. + - Can handle growing files Indirection - with less wasted space - with no full reorganizations + Indirection (Not bad if directory in memory) Directory doubles in size (Now it fits, now it does not) -