CPSC-608 Database Systems


CPSC-608 Database Systems Fall 2018 Instructor: Jianer Chen Office: HRBB 315C Phone: 845-4259 Email: chen@cse.tamu.edu Notes #27 Notes #7

Another Index Structure: Hash Tables
A hash function h maps a search key x to a bucket h(x).
A bucket is typically a disk block (probably with overflow blocks).
h(x), 0 ≤ h(x) ≤ b−1, gives an easy way to compute the bucket address (direct: the address is computed from h(x); indirect: h(x) is the index in a directory).
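The direct/indirect distinction can be sketched in a few lines of Python. This is only an illustration: the block size, the base address, the choice of CRC32 as h, the directory contents, and all function names are assumptions, not part of the notes.

```python
import zlib

BLOCK_SIZE = 4096      # assumed disk block size in bytes
BASE_ADDRESS = 0       # assumed address of bucket 0 on disk

def h(x, b):
    """Hash a search key x to a bucket number in 0..b-1 (CRC32 is an arbitrary choice)."""
    return zlib.crc32(str(x).encode()) % b

def direct_address(x, b):
    """Direct: the bucket address is computed arithmetically from h(x)."""
    return BASE_ADDRESS + h(x, b) * BLOCK_SIZE

def indirect_address(x, directory):
    """Indirect: h(x) is an index into a directory of bucket addresses."""
    return directory[h(x, len(directory))]

# With a directory, buckets may sit in arbitrary (non-contiguous) blocks.
directory = [81920, 4096, 40960, 12288]
```

The indirect form costs one extra lookup but lets buckets be relocated without changing h.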

How do we cope with growth?
- Overflows and reorganizations
- Dynamic hashing
  - Extensible
  - Linear

Linear hashing
Another dynamic hashing scheme
Ideas:
(a) Use the same hash function h;
(b) Use only part of h when the hash table is smaller (use the i low-order bits of h(x)). For example, h(x) = 01110101 has b bits, and i grows as the table grows.
(a) and (b) are similar to extensible hashing.
Main difference:
(c) Hash table size n grows linearly (i = |n| is the number of bits of n);
(d) Use overflow blocks.

Linear hashing
h(x) = 01110101 has b bits; the number i of low-order bits in use grows with the table.
Hash table size n grows linearly. n is a parameter of the hash structure (bucket n is the first unused bucket), and i = |n| is the number of bits of n.
h(x)_i denotes the i low-order bits of h(x).
Where does x go if h(x)_i ≥ n?
Put x in bucket h(x)_i − 2^(i-1) (< n)!!
(h(x)_i − 2^(i-1) = h(x)_i with the leading bit 1 replaced with 0)
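The address rule can be written as a small Python function. This is a sketch under the assumption that h(x) is available as a non-negative integer; the name bucket_address is made up for illustration.

```python
def bucket_address(hx, n):
    """Map a hash value hx = h(x) to a bucket in 0..n-1.

    n >= 1 is the current upper bound (bucket n is the first unused bucket).
    """
    i = n.bit_length()          # i = |n|, the number of bits of n
    m = hx & ((1 << i) - 1)     # h(x)_i: the i low-order bits of h(x)
    if m >= n:                  # bucket m does not exist yet
        m -= 1 << (i - 1)       # same as replacing the leading bit 1 with 0
    return m
```

Because n always has its leading bit set, m ≥ n implies that m also has that bit set, so the subtraction only clears that bit and the result is below n.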

Linear Hashing: Searching
How Do We Search x?
input: a search key x
\\ h is the hash function, n is the current upper bound, i = |n|
m = the last i bits of h(x);
IF m ≥ n THEN m = m − 2^(i-1);
read in the disk block(s) with the address m
\\ you should check overflow blocks in the address m.

Linear Hashing: Insertion
input: a tuple t with search key x
\\ h is the hash function, n is the current upper bound, i = |n|
m = the last i bits of h(x);
IF m ≥ n THEN m = m − 2^(i-1);
insert t in the disk block B with the address m;
\\ If B is full, you need to use an overflow block.

Linear Hashing: Deletion
input: a search key x
\\ h is the hash function, n is the current upper bound, i = |n|
m = the last i bits of h(x);
IF m ≥ n THEN m = m − 2^(i-1);
delete the tuple(s) with search key x from the disk block B with the address m;
\\ you may need to check overflow blocks.
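The search, insertion, and deletion procedures share the same address computation, which the following in-memory Python sketch makes explicit. It is not a disk-based implementation: each bucket is a Python list standing in for a block plus its overflow chain, the identity hash function is an assumption, and the class and method names are invented for illustration.

```python
class LinearHashFile:
    """In-memory stand-in for a linear hash file with n buckets (no resizing here)."""

    def __init__(self, n, h=lambda x: x):
        self.n = n                        # current upper bound
        self.h = h                        # hash function; identity is an assumption
        self.buckets = [[] for _ in range(n)]

    def _address(self, x):
        i = self.n.bit_length()           # i = |n|
        m = self.h(x) & ((1 << i) - 1)    # the last i bits of h(x)
        if m >= self.n:
            m -= 1 << (i - 1)
        return m

    def search(self, x):
        """Return the tuples with search key x (scans the block and its overflow chain)."""
        return [t for (k, t) in self.buckets[self._address(x)] if k == x]

    def insert(self, x, t):
        """Store tuple t under search key x; a long list models overflow blocks."""
        self.buckets[self._address(x)].append((x, t))

    def delete(self, x):
        """Remove all tuples with search key x and return how many were removed."""
        m = self._address(x)
        kept = [(k, t) for (k, t) in self.buckets[m] if k != x]
        removed = len(self.buckets[m]) - len(kept)
        self.buckets[m] = kept
        return removed
```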

How Do We Expand the Hash Table? When Do We Expand the Hash Table?
Keep track of: R = (needed space) / (available space)
If R > threshold (e.g., 80%) then increase n

Linear Hashing: Increasing Hash Table Size
input: the current upper bound n
\\ h is the hash function, i = |n|
read in the disk block(s) B of address n − 2^(i-1);
split (properly) the tuples in B and put them in the block B and the block B' with address n;
n = n + 1;
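A minimal Python sketch of the expansion step follows. Buckets are modeled as a list of lists of integer hash values, and R is computed against primary blocks only; that reading of "available space", the default of 2 keys per block, and the function names are assumptions.

```python
def address(hx, n):
    """Bucket for hash value hx when the upper bound is n."""
    i = n.bit_length()
    m = hx & ((1 << i) - 1)
    return m - (1 << (i - 1)) if m >= n else m

def load_ratio(buckets, keys_per_block):
    """R = needed space / available space (primary blocks only, an assumption)."""
    return sum(len(b) for b in buckets) / (len(buckets) * keys_per_block)

def expand(buckets):
    """Split bucket n - 2^(i-1) into itself and a new bucket n; n becomes n + 1."""
    n = len(buckets)
    i = n.bit_length()
    src = n - (1 << (i - 1))
    old = buckets[src]
    buckets[src] = []
    buckets.append([])                      # block B' with address n
    for hx in old:                          # each tuple lands in B or in B'
        buckets[address(hx, n + 1)].append(hx)

def insert(buckets, hx, keys_per_block=2, threshold=0.8):
    """Insert, then grow by one bucket if R exceeds the threshold."""
    buckets[address(hx, len(buckets))].append(hx)
    if load_ratio(buckets, keys_per_block) > threshold:
        expand(buckets)
```

Only one bucket is split per expansion, so the cost of growing is spread over many insertions and no full reorganization is needed.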

How Do We Shrink the Hash Table?
When? When R is smaller than a threshold (e.g., 50%)

Linear Hashing: Decreasing Hash Table Size
input: the current upper bound n
\\ h is the hash function, i = |n|
n = n − 1;
move the tuples in the block(s) of address n to the block(s) of address n − 2^(i-1) (here i is the length of the new n).
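The shrinking step is the inverse of a split. A short Python sketch, again with buckets modeled as a list of lists of integer hash values and with an invented function name:

```python
def shrink(buckets):
    """Drop the last bucket and merge its tuples into bucket n - 2^(i-1)."""
    if len(buckets) <= 1:
        raise ValueError("cannot shrink below one bucket")
    moved = buckets.pop()           # n = n - 1; 'moved' holds the tuples of block n
    n = len(buckets)                # the new n
    i = n.bit_length()              # i is the length of the new n
    buckets[n - (1 << (i - 1))].extend(moved)
```

Note that i is computed from the new n, as the procedure requires: going from n = 100 to n = 11 (binary) merges bucket 11 into bucket 11 − 10 = 01.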

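Linear Hashing: General framework
[Figure: a key x is hashed by h to a b-bit value h(x); its i low-order bits h(x)_i select a bucket. Buckets 00…00, 00…01, … grow linearly up to n = 1*…** (i bits); buckets from n on hold no tuples.]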

Example: b = 4 bits, 2 keys/bucket
n = 10 (1 + the largest index of the used buckets)
i = |n| = 2 (# used bits)
[Figure: bucket 00 holds 0000, 1010; bucket 01 holds 0101, 1111; buckets from n = 10 on are future growth buckets.]
Rules:
If h(x)_i < n, then look at bucket h(x)_i
If h(x)_i ≥ n, then look at bucket h(x)_i − 2^(i-1) (i.e., replacing the leading bit 1 of h(x)_i by 0)

Insertion: b = 4 bits, 2 keys/bucket, n = 10, i = 2
Insert 1101: h(x)_i = 01 < n, so 1101 goes to bucket 01. That bucket is already full, so an overflow block is needed (can have overflow chains!).

Increase size n: b = 4 bits, n = 10 → 11, i = 2
Bucket 00 (= n − 2^(i-1)) is split: 0000 stays in bucket 00 and 1010 moves to the new bucket 10; then n = 11.
[Figure: bucket 00 holds 0000; bucket 01 holds 0101, 1111; bucket 10 holds 1010; buckets from n = 11 on are future growth buckets.]

Insert: b = 4 bits, n = 11, i = 2
Insert 1101: h(x)_i = 01 < n, so 1101 goes to bucket 01, which already holds 0101 and 1111 (overflow).

Increase size n: b = 4 bits, n = 11 → 100, i = 2 → 3
Bucket 01 (= n − 2^(i-1)) is split: 0101 and 1101 stay in bucket 01 and 1111 moves to the new bucket 11; then n = 100 and i = 3.
[Figure: bucket 00 holds 0000; bucket 01 holds 0101, 1101; bucket 10 holds 1010; bucket 11 holds 1111; n = 100.]
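The growth steps of this example can be replayed in Python to check where each key ends up. This is a sketch: keys are taken to be their own 4-bit hash values, buckets are plain lists, and the helper names are made up.

```python
def address(hx, n):
    """Bucket for hash value hx when the upper bound is n."""
    i = n.bit_length()
    m = hx & ((1 << i) - 1)
    return m - (1 << (i - 1)) if m >= n else m

def split(buckets):
    """Increase n by one, splitting bucket n - 2^(i-1)."""
    n = len(buckets)
    src = n - (1 << (n.bit_length() - 1))
    old, buckets[src] = buckets[src], []
    buckets.append([])
    for hx in old:
        buckets[address(hx, n + 1)].append(hx)

def show(buckets):
    """Render every key as a 4-bit string, one inner list per bucket."""
    return [[format(k, "04b") for k in b] for b in buckets]

buckets = [[0b0000, 0b1010], [0b0101, 0b1111]]          # n = 10, i = 2
split(buckets)                                          # n = 11
state_n3 = show(buckets)
buckets[address(0b1101, len(buckets))].append(0b1101)   # insert 1101 into bucket 01
split(buckets)                                          # n = 100, i = 3
state_n4 = show(buckets)
```

After the second split each bucket holds at most 2 keys again, so the overflow in bucket 01 has been absorbed.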

Linear Hashing Summary
+ Can handle growing files
  - with less wasted space
  - with no full reorganizations
+ No indirection like extensible hashing
‒ Overflow chains

Example: BAD CASE
[Figure: one bucket is very full while the others are very empty. Need to move n here… Would waste space…]

Summary
Hashing
- How it works
- Dynamic hashing
  - Extensible
  - Linear

[Figure: DBMS architecture. The administrator uses the DDL language (DDL compiler); the database programmer uses the DML (query) language (DML compiler). Components: query execution engine, index/file manager, buffer manager, file manager, transaction manager, concurrency control, logging & recovery, lock table. Storage: main memory buffers and secondary storage (disks), holding the graduate database in tables (relations).]

Next
Algorithms implementing the relational algebraic operations:
- Projection and selection
- Set and bag operations
- Join operations
- Grouping, duplicate elimination, sorting

Algorithms Implementing Relational Algebraic Operations
- Projection and selection: π, σ
- Set/bag operations: ∪_S, ∩_S, −_S, ∪_B, ∩_B, −_B
- Join operations: ×, ⋈_C, ⋈
- Extended operations: γ, δ, τ, table-scan
