Lecture 5 Cost Estimation and Data Access Methods
Database Internals Outline
(Architecture diagram; the original slide marks which components were covered last time and which are covered this time.)
Front End: Admission Control, Connection Management
Query System: SQL → Parser → parse tree → Rewriter → parse tree → Planner & Optimizer → query plan → Executor
Storage System: Access Methods, Lock Manager, Buffer Manager, Log Manager
Study Break
Assume the disk can do 100 MB/sec of I/O, each seek takes 10 ms, the page size is 4096 bytes, and each page has an 8-byte footer. Use the following schema:
grades (cid int, g_sid int, grade char(2))
students (sid int, name char(100))
1. Estimate the time to sequentially scan grades, assuming it contains 1M records. (Consider field sizes and headers.)
2. Estimate the time to join these two tables using nested loops, assuming students fits in memory but grades does not, and students contains 10K records. Try switching the table order.
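A back-of-the-envelope sketch of both estimates. It rests on assumptions the slide does not state: ints are 4 bytes, 1 MB = 10^6 bytes, records never span pages, the per-record header size is a hypothetical parameter (defaulting to 0 bytes because no size is given), each full scan costs one seek plus a sequential transfer, and the nested-loop join is tuple-at-a-time, so an inner table that is not cached in memory is rescanned once per outer tuple. The names pages_needed and scan_seconds are illustrative.

```python
import math

# Parameters from the study break. Treating 1 MB as 10**6 bytes is an assumption.
DISK_BYTES_PER_SEC = 100 * 10**6
SEEK_SEC = 0.010
PAGE_BYTES = 4096
FOOTER_BYTES = 8
INT_BYTES = 4  # assumed size of an int field


def pages_needed(n_records: int, record_bytes: int, header_bytes: int = 0) -> int:
    """Pages needed for n_records fixed-size records.

    header_bytes is a hypothetical per-record header; the slide asks you to
    consider headers but gives no size, so it defaults to 0 here.
    """
    usable = PAGE_BYTES - FOOTER_BYTES                  # footer eats 8 bytes per page
    per_page = usable // (record_bytes + header_bytes)  # records never span pages
    return math.ceil(n_records / per_page)


def scan_seconds(pages: int) -> float:
    """One seek to the start of the file, then a sequential read of every page."""
    return SEEK_SEC + pages * PAGE_BYTES / DISK_BYTES_PER_SEC


GRADES_REC = INT_BYTES + INT_BYTES + 2  # cid, g_sid, grade char(2): 10 bytes
STUDENTS_REC = INT_BYTES + 100          # sid, name char(100): 104 bytes

grades_pages = pages_needed(1_000_000, GRADES_REC)
students_pages = pages_needed(10_000, STUDENTS_REC)
t_grades = scan_seconds(grades_pages)
t_students = scan_seconds(students_pages)

# Question 1: one sequential scan of grades.
print(f"grades: {grades_pages} pages, scan = {t_grades:.3f} s")

# Question 2, grades as the outer table: students is read once and kept in
# memory as the inner table, so grades is scanned exactly once.
grades_outer = t_students + t_grades

# Question 2, students as the outer table: grades (too big for memory) is the
# inner table and is rescanned from disk once per student tuple.
students_outer = t_students + 10_000 * t_grades

print(f"grades outer:   {grades_outer:.2f} s")
print(f"students outer: {students_outer:.0f} s")
```

Under these assumptions grades occupies 2,451 pages (408 records per page) and one scan takes about 0.11 s. Joining with grades as the outer table costs about 0.13 s, while making students the outer table costs roughly 1,100 s, because the table that does not fit in memory is rescanned 10K times.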
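Bitmap Index
Example (figure): a bitmap index on a Color column over tuples T1 to T10. The Purple bitmap has eight bits set, and the White and Red bitmaps have one bit set each.
Each distinct value gets its own bitmap, with 1 bit per tuple.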
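A minimal sketch of the structure described above, assuming Python integers as arbitrary-length bit vectors (bit i set means tuple i has that value). The ten colors are hypothetical, chosen only to match the slide's counts (eight Purple, one White, one Red); the slide's exact bit positions are not recoverable. build_bitmap_index and matching_tuples are illustrative names.

```python
from collections import defaultdict


def build_bitmap_index(column):
    """One bitmap per distinct value, one bit per tuple.

    Bit i of bitmaps[v] is 1 iff tuple i has value v. Python ints serve as
    arbitrary-length bit vectors (an illustrative choice).
    """
    bitmaps = defaultdict(int)
    for i, value in enumerate(column):
        bitmaps[value] |= 1 << i
    return dict(bitmaps)


def matching_tuples(bitmap):
    """Decode a bitmap into the positions of its set bits."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical colors for ten tuples; positions 0..9 stand for T1..T10.
colors = ["Purple"] * 4 + ["White"] + ["Purple"] * 3 + ["Red", "Purple"]
index = build_bitmap_index(colors)

# An equality lookup reads a single bitmap; OR combines predicates on the column.
white = matching_tuples(index["White"])
white_or_red = matching_tuples(index["White"] | index["Red"])
print(white, white_or_red)
```

The design trades space for speed on low-cardinality columns: with C distinct values the index stores C bitmaps, and boolean combinations of predicates become cheap bitwise operations.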
Hash Index
On-disk hash table: n buckets stored on n disk pages (Disk Page 1 … Disk Page n). A hash function H(f1) on the indexed field picks the page that holds each record, e.g., ('sam', 10k, …) or ('joe', 20k, …). For example, H(x) = x mod n.
Issues: How big should the table be? If we get it wrong, we either waste space, end up with long overflow chains, or have to rehash.
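A minimal sketch of a static on-disk hash index with overflow chains, assuming integer keys and the slide's example H(x) = x mod n. Each "page" is modeled as a Python list holding at most page_capacity entries; StaticHashIndex and page_capacity are hypothetical names standing in for a real page layout.

```python
class StaticHashIndex:
    """n fixed buckets; each bucket is a chain of pages, each holding up to
    page_capacity (key, record) entries."""

    def __init__(self, n: int, page_capacity: int):
        self.n = n
        self.page_capacity = page_capacity
        # buckets[b] is a list of pages; pages after the first are overflow pages.
        self.buckets = [[[]] for _ in range(n)]

    def _bucket(self, key: int) -> int:
        return key % self.n  # H(x) = x mod n

    def insert(self, key: int, record) -> None:
        chain = self.buckets[self._bucket(key)]
        if len(chain[-1]) == self.page_capacity:
            chain.append([])  # last page is full: allocate an overflow page
        chain[-1].append((key, record))

    def lookup(self, key: int) -> list:
        # Cost is one page read per page in the chain, so long chains hurt.
        return [rec for page in self.buckets[self._bucket(key)]
                for k, rec in page if k == key]

    def chain_lengths(self) -> list:
        return [len(chain) for chain in self.buckets]


idx = StaticHashIndex(n=4, page_capacity=2)
for k in range(20):
    idx.insert(k, ("emp", k))  # hypothetical records
print(idx.chain_lengths(), idx.lookup(5))
```

With n = 4 and room for two entries per page, twenty keys already produce three-page chains in every bucket, so each lookup reads three pages. A larger n avoids this but wastes pages when the data is small, which is exactly the sizing problem the slide raises.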
Extendible Hashing
Create a family of hash tables parameterized by k: H_k(x) = H(x) mod 2^k.
Start with k = 1 (2 hash buckets).
Use a directory structure to keep track of which bucket (page) each hash value maps to.
When a bucket overflows, increment k (if needed), create a new bucket, rehash the keys in the overflowing bucket, and update the directory.
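Why rehashing only the overflowing bucket is enough: x mod 2^(k+1) is always either x mod 2^k or x mod 2^k + 2^k, so incrementing k splits each bucket j into buckets j and j + 2^k and never moves keys between unrelated buckets. The sketch below checks this numerically, assuming H is the identity on non-negative integers (as in the example that follows); h and split_targets are illustrative names.

```python
def h(x: int, k: int) -> int:
    """H_k(x) = H(x) mod 2**k, taking H to be the identity on integers."""
    return x % (2 ** k)


def split_targets(bucket: int, k: int) -> set:
    """Buckets at level k + 1 that can receive keys from `bucket` at level k."""
    return {bucket, bucket + 2 ** k}


# Every key in bucket j under H_k lands in j or j + 2**k under H_{k+1}.
for k in range(1, 5):
    for x in range(1000):
        assert h(x, k + 1) in split_targets(h(x, k), k)
```

The same fact is why the directory can simply be doubled when k grows: new entry j + 2^k can start out pointing at the same page as entry j until that page actually splits.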
Example
Insert records with keys 0, 0, 2, 3, 2 using H_k(x) = x mod 2^k. Start with k = 1: directory entry 0 points to page 0 and entry 1 points to page 1.
Insert 0: 0 mod 2 = 0, so the record goes on page 0.
Insert 0: 0 mod 2 = 0, page 0.
Insert 2: 2 mod 2 = 0, page 0.
Insert 3: 3 mod 2 = 1, page 1.
Insert 2: 2 mod 2 = 0, but page 0 (holding 0, 0, 2) is FULL!
Increment k to 2, which doubles the directory to entries 0 to 3. Allocate only 1 new page (page 2) and rehash just the keys of page 0 with mod 4: 0 mod 4 = 0 keeps both 0s on page 0, and 2 mod 4 = 2 moves the 2 to page 2. Directory entries 1 and 3 both still point to page 1, which has not split.
Retry the insert: 2 mod 4 = 2, so the second 2 also goes on page 2.
Final state: directory 0 → page 0, 1 → page 1, 2 → page 2, 3 → page 1; page 0 holds 0, 0; page 1 holds 3; page 2 holds 2, 2.
Extra bookkeeping is needed to keep track of the fact that pages 0/2 have split and page 1 hasn't.
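A minimal extendible hashing sketch that replays the example with keys 0, 0, 2, 3, 2. It assumes H(x) = x for integer keys and a page capacity of three records, which is what the example implies (page 0 is full after holding 0, 0, 2). The "extra bookkeeping" is modeled as a per-page local depth, the number of low-order bits a page's keys are known to share, which is the standard way to record which pages have split; ExtendibleHash, local_depth, and MAX_K are illustrative names.

```python
class ExtendibleHash:
    """Extendible hashing sketch with H(x) = x, so the directory index is
    H_k(x) = x mod 2**k. A page whose local depth is below k has not split
    since the directory last doubled."""

    MAX_K = 20  # guard: splitting can never separate more than `capacity` equal keys

    def __init__(self, page_capacity: int):
        self.capacity = page_capacity
        self.k = 1                  # global depth: start with 2 buckets
        self.pages = [[], []]       # page number -> keys stored on that page
        self.local_depth = [1, 1]
        self.directory = [0, 1]     # directory[H_k(x)] -> page number

    def _page_for(self, key: int) -> int:
        return self.directory[key % 2 ** self.k]

    def insert(self, key: int) -> None:
        while len(self.pages[self._page_for(key)]) >= self.capacity:
            self._split(self._page_for(key))  # page full: split it, then retry
        self.pages[self._page_for(key)].append(key)

    def lookup(self, key: int) -> list:
        return [x for x in self.pages[self._page_for(key)] if x == key]

    def _split(self, page_no: int) -> None:
        if self.local_depth[page_no] == self.k:
            if self.k >= self.MAX_K:
                raise RuntimeError("too many equal keys for one page")
            # Increment k only when needed: double the directory so entry
            # j + 2**k starts out pointing at the same page as entry j.
            self.directory = self.directory + self.directory
            self.k += 1
        d = self.local_depth[page_no] + 1
        new_no = len(self.pages)    # allocate exactly one new page
        self.pages.append([])
        self.local_depth[page_no] = d
        self.local_depth.append(d)
        # Entries for this page whose bit (d - 1) is set now go to the new page.
        for i, p in enumerate(self.directory):
            if p == page_no and (i >> (d - 1)) & 1:
                self.directory[i] = new_no
        # Rehash only the keys of the overflowing page.
        old_keys, self.pages[page_no] = self.pages[page_no], []
        for key in old_keys:
            self.pages[self._page_for(key)].append(key)


eh = ExtendibleHash(page_capacity=3)
for key in [0, 0, 2, 3, 2]:
    eh.insert(key)
print(eh.k, eh.directory, eh.pages, eh.local_depth)
# 2 [0, 1, 2, 1] [[0, 0], [3], [2, 2]] [2, 1, 2]
```

Page 1 keeps local depth 1 while k = 2, which is how the structure remembers that directory entries 1 and 3 share one unsplit page.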
B+ Tree Indexes
Balanced, wide tree. Fast value lookups and range scans. Each node is a disk page (except the root). Leaves point to tuple pages.
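A read-only B+ tree sketch showing why lookups and range scans are fast: a lookup visits one node per level, and a range scan descends once and then follows leaf sibling links. To stay short it swaps in bulk loading from sorted input instead of insert-with-splits, assumes unique keys and non-empty input, and uses `fanout` as a stand-in for "entries that fit on a page". Leaf, Inner, bulk_load, find_leaf, lookup, range_scan, and the rid values are all hypothetical names.

```python
from bisect import bisect_left, bisect_right


class Leaf:
    def __init__(self, keys, rids):
        self.keys = keys    # sorted keys stored in this leaf page
        self.rids = rids    # hypothetical record ids: where each tuple lives
        self.next = None    # right-sibling link used by range scans


class Inner:
    def __init__(self, keys, children):
        self.keys = keys            # separator keys; len(children) == len(keys) + 1
        self.children = children


def _min_key(node):
    while isinstance(node, Inner):
        node = node.children[0]
    return node.keys[0]


def bulk_load(pairs, fanout):
    """Build the tree bottom-up from non-empty (key, rid) pairs sorted by unique key."""
    leaves = []
    for i in range(0, len(pairs), fanout):
        chunk = pairs[i:i + fanout]
        leaves.append(Leaf([k for k, _ in chunk], [r for _, r in chunk]))
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    level = leaves
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            # Separator j is the smallest key under child j + 1.
            parents.append(Inner([_min_key(c) for c in group[1:]], group))
        level = parents
    return level[0]


def find_leaf(node, key):
    # One node (page) visited per level: O(log_B n) page reads.
    while isinstance(node, Inner):
        node = node.children[bisect_right(node.keys, key)]
    return node


def lookup(root, key):
    leaf = find_leaf(root, key)
    i = bisect_left(leaf.keys, key)
    return leaf.rids[i] if i < len(leaf.keys) and leaf.keys[i] == key else None


def range_scan(root, lo, hi):
    """All rids with lo <= key <= hi: descend once, then walk the leaf chain."""
    leaf, out = find_leaf(root, lo), []
    i = bisect_left(leaf.keys, lo)
    while leaf is not None:
        while i < len(leaf.keys):
            if leaf.keys[i] > hi:
                return out
            out.append(leaf.rids[i])
            i += 1
        leaf, i = leaf.next, 0
    return out


pairs = [(k, f"rid{k}") for k in range(0, 100, 2)]  # hypothetical data: even keys 0..98
root = bulk_load(pairs, fanout=4)
print(lookup(root, 42), range_scan(root, 10, 20))
```

The sibling links are what make the range scan cost O(log_B n + R): after the single descent, each additional leaf page in the range costs one more read.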
Indexes Recap
Operation    Heap File   Bitmap   Hash File    B+Tree
Insert       O(1)                              O(log_B n)
Delete       O(P)                 O(1)         O(log_B n)
Range Scan   O(P)                 -- / O(P)    O(log_B n + R)
Lookup       O(P)        O(C)     O(1)         O(log_B n)
n: number of tuples
P: number of pages in the file
B: branching factor of the B+ tree (keys per node)
R: number of pages in the range
C: cardinality (number) of unique values of the key
Study Break #2: B+ Tree vs. Binary Search Tree
Suppose there are k keys in total across the leaf nodes, and the B+ tree holds b keys per node.
What is the depth of each tree if both are balanced?
How do the lookup times compare? Consider the time to look up a key inside each B+ tree node.
Why do we prefer a B+ tree over a BST for databases?
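One way to set up the first two questions, as a sketch that assumes a BST node holds one key, a B+ tree node with b keys has a fanout of roughly b, and each B+ tree node is searched with binary search:

```latex
\begin{aligned}
d_{\text{BST}} &\approx \lceil \log_2 k \rceil \\
d_{\text{B+}}  &\approx \lceil \log_b k \rceil = \left\lceil \frac{\log_2 k}{\log_2 b} \right\rceil \\
\text{B+ tree comparisons} &\approx \log_b k \cdot \log_2 b = \log_2 k
\end{aligned}
```

Under these assumptions both trees perform about log_2 k key comparisons per lookup, but the B+ tree visits roughly a factor of log_2 b fewer nodes. When every node visit can be a disk page read, that reduction in I/Os is what matters.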