CPS216: Advanced Database Systems
Notes 05: Operators for Data Access (contd.)
Shivnath Babu
Insertion in a B-Tree: worked example (figures omitted) inserting keys 62, 50 and 75 in turn into an example tree with parameter n.
Insertion: Primitives
- Inserting into a leaf node
- Splitting a leaf node
- Splitting an internal node
- Splitting the root node
Inserting into a Leaf Node (figures omitted)
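A minimal sketch of inserting into a leaf that still has room, assuming parameter n is the maximum number of keys per node and keys are kept in sorted order. Record pointers are omitted, and the function name `insert_into_leaf` is hypothetical.

```python
import bisect


def insert_into_leaf(keys: list, key: int, n: int) -> bool:
    """Insert key into a sorted leaf that holds at most n keys.

    Returns False (leaving the leaf unchanged) when the leaf is already full;
    the caller then has to split the leaf instead.
    """
    if len(keys) >= n:
        return False
    bisect.insort(keys, key)  # keep keys sorted so range scans stay in order
    return True


leaf = [49, 62]
insert_into_leaf(leaf, 50, 3)  # leaf is now [49, 50, 62]
```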
Splitting a Leaf Node (figures omitted)
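A sketch of the leaf split, assuming at most n keys per leaf and the common convention that the left leaf keeps ⌈(n+1)/2⌉ of the n+1 keys. The name `split_leaf` is hypothetical; a full implementation would also link the new right leaf into the chain of sibling leaves and carry record pointers.

```python
def split_leaf(keys: list, key: int, n: int):
    """Add key to a full leaf (n sorted keys) and split the n + 1 keys.

    The left leaf keeps ceil((n + 1) / 2) keys and the right leaf gets the rest.
    The separator handed to the parent is a copy of the right leaf's smallest
    key, so that key also remains at the leaf level.
    """
    all_keys = sorted(keys + [key])
    mid = (n + 2) // 2  # integer form of ceil((n + 1) / 2)
    left, right = all_keys[:mid], all_keys[mid:]
    return left, right, right[0]


left, right, separator = split_leaf([49, 50, 62], 75, 3)
# left == [49, 50], right == [62, 75], separator == 62
```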
Splitting an Internal Node (figures omitted; the figures label child key ranges [54, 59), [59, 66) and [66, 74), and, after the split, [21, 66) and [66, 99))
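Unlike a leaf split, an internal-node split moves the middle key up to the parent instead of copying it. A sketch under these assumptions: a node holds at most n keys and n + 1 children, the new child arrives together with its separator key from a split one level below, and children are plain labels here. The name `split_internal` is hypothetical.

```python
import bisect


def split_internal(keys, children, new_key, new_child, n):
    """Insert (new_key, new_child) into a full internal node and split it.

    keys: n sorted separators; children: n + 1 child references.
    new_child is the right half of a split child, so it sits just to the
    right of new_key. Returns (left_keys, left_children, pushed_up_key,
    right_keys, right_children).
    """
    pos = bisect.bisect_right(keys, new_key)
    keys = keys[:pos] + [new_key] + keys[pos:]  # n + 1 keys
    children = children[:pos + 1] + [new_child] + children[pos + 1:]  # n + 2 children
    mid = (n + 1) // 2  # ceil(n / 2) keys stay on the left
    up = keys[mid]  # the middle key moves up; it is not kept in either half
    return keys[:mid], children[:mid + 1], up, keys[mid + 1:], children[mid + 1:]


# Child "C" split into "C" and "X" with separator 25 (labels are illustrative)
result = split_internal([10, 20, 30], ["A", "B", "C", "D"], 25, "X", 3)
```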
Splitting the Root (figures omitted)
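The primitives combine into one recursive insert; splitting the root is the only step that makes the tree taller, by creating a new root above the old root and its new sibling. This is a minimal sketch, assuming at most n keys per node, no duplicate keys and no record pointers; the classes `Node` and `BPlusTree` are hypothetical names, not a library API.

```python
import bisect


class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []  # child nodes (internal nodes only)
        self.next = None    # right-sibling pointer (leaves only)


class BPlusTree:
    def __init__(self, n=3):
        self.n = n  # maximum number of keys per node
        self.root = Node(leaf=True)

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:
            # The root itself split: grow the tree by one level at the top.
            sep, right = split
            new_root = Node(leaf=False)
            new_root.keys, new_root.children = [sep], [self.root, right]
            self.root = new_root

    def _insert(self, node, key):
        """Insert below node; return (separator, new right node) if node split."""
        if node.leaf:
            bisect.insort(node.keys, key)
            if len(node.keys) <= self.n:
                return None
            mid = (self.n + 2) // 2  # ceil((n + 1) / 2) keys stay left
            right = Node(leaf=True)
            right.keys, node.keys = node.keys[mid:], node.keys[:mid]
            right.next, node.next = node.next, right  # keep the leaf chain linked
            return right.keys[0], right  # separator is copied up
        pos = bisect.bisect_right(node.keys, key)
        split = self._insert(node.children[pos], key)
        if split is None:
            return None
        sep, child = split
        node.keys.insert(pos, sep)
        node.children.insert(pos + 1, child)
        if len(node.keys) <= self.n:
            return None
        mid = (self.n + 1) // 2  # ceil(n / 2) keys stay left
        right = Node(leaf=False)
        up = node.keys[mid]  # middle key moves up (not copied)
        right.keys, node.keys = node.keys[mid + 1:], node.keys[:mid]
        right.children, node.children = node.children[mid + 1:], node.children[:mid + 1]
        return up, right

    def search(self, key):
        node = self.root
        while not node.leaf:
            node = node.children[bisect.bisect_right(node.keys, key)]
        return key in node.keys

    def height(self):
        h, node = 1, self.root
        while not node.leaf:
            node, h = node.children[0], h + 1
        return h

    def leaf_keys(self):
        """All keys in leaf-chain order (follows the sibling pointers)."""
        node = self.root
        while not node.leaf:
            node = node.children[0]
        out = []
        while node is not None:
            out.extend(node.keys)
            node = node.next
        return out


t = BPlusTree(n=3)
keys = [49, 62, 50, 75, 10, 20, 30, 40, 55, 60, 70, 80]
for k in keys:
    t.insert(k)
```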
Deletion: worked example (figures omitted), illustrating redistribution of keys between sibling nodes
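When a deletion leaves a leaf underfull and a sibling has keys to spare, keys can be redistributed between them. A sketch assuming a minimum occupancy `min_keys` (for example ⌊(n+1)/2⌋ for leaves) and that the two siblings together hold at least 2 × `min_keys` keys; the function name is hypothetical. The parent's separator must then be replaced by the value returned.

```python
def redistribute_leaves(left: list, right: list, min_keys: int) -> int:
    """Move keys between adjacent sibling leaves (all of left < all of right)
    until both hold at least min_keys. Returns the new parent separator,
    i.e. the smallest key now in the right leaf."""
    while len(left) < min_keys and len(right) > min_keys:
        left.append(right.pop(0))    # borrow the smallest key from the right
    while len(right) < min_keys and len(left) > min_keys:
        right.insert(0, left.pop())  # borrow the largest key from the left
    return right[0]


a, b = [49], [62, 75, 80]
new_separator = redistribute_leaves(a, b, 2)  # a == [49, 62], b == [75, 80]
```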
Deletion - II: worked example (figures omitted), illustrating merging a node into its sibling; one step is annotated "merge not needed".
Deletion: Primitives
- Delete key from a leaf
- Redistribute keys between sibling leaves
- Merge a leaf into its sibling
- Redistribute keys between two sibling internal nodes
- Merge an internal node into its sibling
Merge Leaf into Sibling (figures omitted; after the merge, a single leaf holds … 72, 85)
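A sketch of merging a leaf into its left sibling, assuming the combined keys fit in one node. Because leaf separators are copies of leaf keys, the separator can simply be dropped from the parent. The `Leaf` class and `merge_leaf` function are hypothetical; if the parent becomes underfull, the same redistribute-or-merge step repeats one level up.

```python
class Leaf:
    def __init__(self, keys, next_leaf=None):
        self.keys = keys
        self.next = next_leaf  # right-sibling pointer


def merge_leaf(parent_keys, parent_children, idx):
    """Merge leaf parent_children[idx] into its left sibling (idx - 1)."""
    left, right = parent_children[idx - 1], parent_children[idx]
    left.keys.extend(right.keys)  # every key in right is larger
    left.next = right.next        # unlink the removed leaf from the chain
    del parent_keys[idx - 1]      # drop the separator between the two leaves
    del parent_children[idx]


a, b, c = Leaf([60, 65]), Leaf([72]), Leaf([85])
a.next, b.next = b, c
keys, kids = [72, 85], [a, b, c]
merge_leaf(keys, kids, 2)  # b now holds [72, 85]
```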
Merge Internal Node into Sibling (figures omitted; the children involved cover key ranges [52, 59) and [59, 63), and after the merge key 59 appears in the merged node)
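Merging internal nodes differs from merging leaves: the parent's separator key moves down into the merged node, between the two key lists. In this sketch a node is a hypothetical `(keys, children)` pair of lists, children are range labels, and the combined node is assumed to fit within n keys.

```python
def merge_internal(parent_keys, parent_children, idx):
    """Merge internal node parent_children[idx] into its left sibling."""
    left_keys, left_children = parent_children[idx - 1]
    right_keys, right_children = parent_children[idx]
    sep = parent_keys.pop(idx - 1)        # separator comes down from the parent
    left_keys.extend([sep] + right_keys)  # keys: left, separator, right
    left_children.extend(right_children)
    del parent_children[idx]


left = ([52], ["<52", "[52,59)"])
right = ([63], ["[59,63)", ">=63"])
parent_keys, parent_children = [59], [left, right]
merge_internal(parent_keys, parent_children, 1)
```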
Roadmap
- B-Tree
  - Recap
  - Insertion (recap)
  - Deletion
  - Construction
  - Efficiency
- B-Tree variants
- Hash-based Indexes
Question: How does insertion-based construction perform?
B-Tree Construction (figures omitted): sort the keys, then scan them in sorted order to build the tree.
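A sketch of sort-then-scan construction (often called bulk loading): sort once, pack consecutive keys into leaves, then build each internal level from the level below, using each child's smallest key as a separator. Assumptions: at most n keys per node (so n + 1 children), nodes are packed full, and the last node on each level may be underfull (a real loader would fix this or deliberately leave free space). `bulk_load` is a hypothetical name.

```python
def bulk_load(keys, n):
    """Return the tree's levels bottom-up: levels[0] = leaves, levels[-1] = [root]."""
    keys = sorted(keys)  # step 1: sort
    # step 2: scan sorted keys into full leaves; track each node's smallest key
    level = [(keys[i:i + n], keys[i]) for i in range(0, len(keys), n)]
    levels = [[node for node, _ in level]]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), n + 1):  # up to n + 1 children per node
            group = level[i:i + n + 1]
            separators = [low for _, low in group[1:]]  # min key of each non-first child
            parents.append((separators, group[0][1]))
        level = parents
        levels.append([node for node, _ in level])
    return levels


small = bulk_load([7, 3, 11, 1, 9, 5, 12, 2, 8, 4, 10, 6], 3)
```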
B-Tree Construction: Why is sort-based construction better than insertion-based construction?
Cost of B-Tree Operations
- Height of B-Tree: H
- Assume no duplicates
- Question: what is the random I/O cost of:
  - Insertion:
  - Deletion:
  - Equality search:
  - Range search:
Height of B-Tree
- Number of keys: N
- B-Tree parameter: n
- Height ≈ $\log_n N = \dfrac{\log N}{\log n}$
- In practice: 2-3 levels
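The approximation can be made concrete with integer arithmetic: the height is roughly the smallest H with $n^H \ge N$. This sketch ignores partially filled nodes, and the key counts and values of n in the example are illustrative assumptions.

```python
def approx_height(num_keys: int, n: int) -> int:
    """Smallest H with n**H >= num_keys, i.e. ceil(log_n N) without float error."""
    height, capacity = 1, n
    while capacity < num_keys:
        capacity *= n
        height += 1
    return height


# One million keys with n = 100, ten million keys with n = 340
h1 = approx_height(10**6, 100)
h2 = approx_height(10**7, 340)
```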
Question: How do you pick parameter n?
Simplifying assumptions:
1. Ignore inserts and deletes
2. Optimize for equality searches
3. Assume no duplicates
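One common heuristic (not necessarily the answer intended here) is to pick the largest n for which a node of n keys and n + 1 pointers fits in one disk block, i.e. $n \cdot k + (n+1) \cdot p \le B$. The block, key and pointer sizes below are illustrative assumptions.

```python
def max_keys_per_node(block_size: int, key_size: int, pointer_size: int) -> int:
    """Largest n with n * key_size + (n + 1) * pointer_size <= block_size."""
    return (block_size - pointer_size) // (key_size + pointer_size)


# 4096-byte blocks, 4-byte keys, 8-byte pointers (assumed sizes)
n = max_keys_per_node(4096, 4, 8)  # 340: 340*4 + 341*8 = 4088 <= 4096
```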
Roadmap
- B-Tree
- B-Tree variants
  - Sparse Index
  - Duplicate Keys
- Hash-based Indexes
Roadmap
- B-Tree
- B-Tree variants
- Hash-based Indexes
  - Static Hash Table
  - Extensible Hash Table
  - Linear Hash Table
Hash-Based Indexes
- Adaptations of main memory hash tables
- Support equality searches
- No range searches
Indexing Problem (recap) (figure omitted): for a query A = val, the index maps keys (values $a_1, a_2, \dots, a_i, \dots, a_n$ of attribute A) to record pointers.
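Main Memory Hash Table (figure omitted): an array of buckets, each pointing to a (possibly null) list of keys; a key goes to bucket h(key), with h(key) = key % 8 in the example (e.g., key 32 goes to bucket 0).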
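A minimal sketch of the in-memory version with h(key) = key % 8, assuming non-negative integer keys. Each bucket's chain is a Python list, where an empty list plays the role of the null pointer; `ChainedHashTable` is a hypothetical name.

```python
class ChainedHashTable:
    def __init__(self, num_buckets: int = 8):
        self.buckets = [[] for _ in range(num_buckets)]  # empty list = null chain

    def _h(self, key: int) -> int:
        return key % len(self.buckets)  # h(key) = key % 8 with 8 buckets

    def insert(self, key: int) -> None:
        chain = self.buckets[self._h(key)]
        if key not in chain:  # no duplicates
            chain.append(key)

    def contains(self, key: int) -> bool:
        return key in self.buckets[self._h(key)]


table = ChainedHashTable(8)
for k in [32, 40, 5]:
    table.insert(k)  # 32 and 40 share bucket 0; 5 goes to bucket 5
```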
Adapting to disk
- 1 Hash Bucket = 1 Block
  - All keys that hash to the bucket are stored in the block
  - Intuition: keys in a bucket are usually accessed together
  - No need for linked lists of keys …
Adapting to Disk (figure omitted): what if more keys hash to a bucket than fit in its block? How do we handle this?
Adapting to disk
- 1 Hash Bucket = 1 Block
  - All keys that hash to the bucket are stored in the block
  - Intuition: keys in a bucket are usually accessed together
  - No need for linked lists of keys …
  - … but need a linked list of blocks (overflow blocks)
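A sketch of a static disk-based hash table with overflow chains, assuming each block holds a fixed number of keys, keys are non-negative integers hashed with key % (number of buckets), and each block visited stands for one I/O. The classes `Block` and `StaticHashFile` are hypothetical; the I/O count shows why a lookup can cost more than one I/O once a bucket overflows.

```python
class Block:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.keys = []
        self.overflow = None  # next block in this bucket's overflow chain


class StaticHashFile:
    def __init__(self, num_buckets: int, block_capacity: int):
        self.block_capacity = block_capacity
        self.buckets = [Block(block_capacity) for _ in range(num_buckets)]

    def insert(self, key: int) -> None:
        block = self.buckets[key % len(self.buckets)]
        while len(block.keys) >= block.capacity:
            if block.overflow is None:
                block.overflow = Block(self.block_capacity)  # chain a new block
            block = block.overflow
        block.keys.append(key)

    def lookup(self, key: int):
        """Return (found, blocks_read); each block read counts as one I/O."""
        block, reads = self.buckets[key % len(self.buckets)], 0
        while block is not None:
            reads += 1
            if key in block.keys:
                return True, reads
            block = block.overflow
        return False, reads


f = StaticHashFile(num_buckets=4, block_capacity=2)
for k in [0, 4, 8, 12, 1]:
    f.insert(k)  # bucket 0 overflows: [0, 4] -> [8, 12]
```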
Adapting to Disk
Is there any other issue? Map ‘bucket id’ to disk location
Adapting to disk
- 1 Hash Bucket = 1 Block
- Bucket Id → Disk Address mapping
  - Contiguous blocks
  - Store mapping in main memory
    - Too large?
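With the contiguous-blocks option, no mapping table is needed: a bucket's address follows from its id by arithmetic. A sketch assuming buckets are laid out back to back starting at a hypothetical byte offset `base_address`, with a fixed block size and number of blocks per bucket.

```python
def bucket_address(bucket_id: int, base_address: int, block_size: int,
                   blocks_per_bucket: int = 1) -> int:
    """Byte offset of a bucket's first block under contiguous allocation."""
    return base_address + bucket_id * blocks_per_bucket * block_size


addr = bucket_address(5, 0, 4096)  # bucket 5 starts at byte 20480
```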
Beware of claims that assume 1 I/O for hash tables and 3 I/Os for B-Trees!
Adapting to disk
- 1 Hash Bucket = 1 Block (or more than one contiguous block)
- Bucket Id → Disk Address mapping
- Number of buckets
  - ≈ Number of keys (main memory version)
  - ≈ Number of blocks (disk version)
- Textbook: Static Hash Table
Assigned Reading: Insertion and Deletion on Static Hash Table (Section 13.4)
Roadmap
- B-Tree
- B-Tree variants
- Hash-based Indexes
  - Static Hash Table
  - Extensible Hash Table
  - Linear Hash Table
Dynamic Hash Indexes
- Static Hash Table:
  - Fixed number of buckets
  - Wastes space / inefficient
- Dynamic Hash Tables:
  - Number of buckets can increase / decrease dynamically
Extensible Hash Table: Main Ideas (Abstract)
- Hash Function: {Keys} → {Large space of hash values}
- Buckets dynamically partition the space of hash values
- Insertions: partitioning grows finer
  - i.e., more buckets
- Deletions: partitioning grows coarser
  - i.e., fewer buckets
Extensible Hash Table: Main Ideas (concrete)
- Hash Function: {Keys} → bit string of length b
- Example: (figure omitted)
- Bucket: prefix of bit string
  - All (keys with) hash values having that prefix fall into that bucket
Example (figure omitted): which bucket does a hash value fall into? The one whose prefix it matches.
i = 2, where i = max length of prefix
Insertion: worked example (figures omitted), starting from i = 0.
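A minimal sketch of extensible hashing insertion with prefix-based buckets. Assumptions (all hypothetical): hash values are b = 4 bits wide with h(key) = key % 16, a bucket (standing in for one block) holds 2 keys, the directory has 2^i entries indexed by the first i bits, and each bucket records its local depth, the number of prefix bits it actually uses. A full bucket whose local depth equals i forces the directory to double before the bucket splits.

```python
B_BITS = 4  # hypothetical hash width b


def h(key: int) -> int:
    return key % (1 << B_BITS)  # toy hash producing a b-bit value


class Bucket:
    def __init__(self, depth: int):
        self.depth = depth  # local depth: prefix bits shared by this bucket's keys
        self.keys = []


class ExtensibleHashTable:
    def __init__(self, capacity: int = 2):
        self.capacity = capacity  # keys per bucket (one block)
        self.i = 0                # directory uses the first i bits
        self.directory = [Bucket(0)]

    def _index(self, key: int) -> int:
        return h(key) >> (B_BITS - self.i)  # first i bits of the hash value

    def lookup(self, key: int) -> bool:
        return key in self.directory[self._index(key)].keys

    def insert(self, key: int) -> None:
        while True:
            bucket = self.directory[self._index(key)]
            if len(bucket.keys) < self.capacity:
                bucket.keys.append(key)
                return
            if bucket.depth == self.i:
                if self.i == B_BITS:
                    raise OverflowError("too many keys share one hash value")
                # Double the directory: entry k becomes entries 2k and 2k + 1.
                self.directory = [b for b in self.directory for _ in (0, 1)]
                self.i += 1
            self._split(bucket)  # then retry; all keys may land on one side

    def _split(self, bucket: Bucket) -> None:
        j = bucket.depth + 1
        halves = (Bucket(j), Bucket(j))
        for k in bucket.keys:
            halves[(h(k) >> (B_BITS - j)) & 1].keys.append(k)  # j-th prefix bit
        for idx, b in enumerate(self.directory):
            if b is bucket:
                self.directory[idx] = halves[(idx >> (self.i - j)) & 1]


t = ExtensibleHashTable(capacity=2)
for k in [0, 8, 4, 12, 2]:  # hash values 0000, 1000, 0100, 1100, 0010
    t.insert(k)
```

In this run the directory doubles twice (i goes from 0 to 2), but the bucket holding 8 and 12 keeps local depth 1, so two directory entries share it.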
Deletion: inverse of insertion (work out the details)
Textbook Notation (figure omitted): i = number of bits in the prefix
Extensible Hash Table: One Issue
- Directory doubles in size during some inserts
Roadmap
- B-Tree
- B-Tree variants
- Hash-based Indexes
  - Static Hash Table
  - Extensible Hash Table
  - Linear Hash Table
Linear Hash Table
- Differences from Extensible Hash Table:
  - Bucket: suffix of the hash value
  - Grows linearly (avoids doubling of directory)
Linear Hash Table (figure omitted): buckets are identified by suffixes of the hash value
Linear Growth (figures omitted): keys are redistributed when a new bucket is added
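A sketch of linear hashing with suffix-based buckets, assuming the hash value of a non-negative integer key is the key itself and that a new bucket is added whenever keys / (buckets × block capacity) exceeds a hypothetical threshold. Each bucket list models a block plus its overflow chain. Note that i, the number of suffix bits in use, is derived from the bucket count rather than stored. A key whose last i bits name a bucket that does not exist yet goes to the bucket with the same suffix minus its top bit; that is also the bucket split when the new bucket is created.

```python
class LinearHashTable:
    def __init__(self, block_capacity: int = 2, max_load: float = 0.85):
        self.block_capacity = block_capacity  # keys per block (hypothetical)
        self.max_load = max_load              # growth threshold (hypothetical)
        self.buckets = [[]]                   # start with a single bucket
        self.count = 0

    @property
    def i(self) -> int:
        # Suffix bits in use: 2**(i-1) < number of buckets <= 2**i
        return (len(self.buckets) - 1).bit_length()

    def _bucket_index(self, key: int) -> int:
        m = key & ((1 << self.i) - 1)  # last i bits of the hash value
        if m < len(self.buckets):
            return m
        return m - (1 << (self.i - 1))  # bucket m not created yet: clear top bit

    def lookup(self, key: int) -> bool:
        return key in self.buckets[self._bucket_index(key)]

    def insert(self, key: int) -> None:
        self.buckets[self._bucket_index(key)].append(key)
        self.count += 1
        if self.count / (len(self.buckets) * self.block_capacity) > self.max_load:
            self._add_bucket()

    def _add_bucket(self) -> None:
        self.buckets.append([])  # grow by exactly one bucket
        new_index = len(self.buckets) - 1
        split_index = new_index - (1 << (self.i - 1))  # same suffix, top bit cleared
        old_keys, self.buckets[split_index] = self.buckets[split_index], []
        for k in old_keys:  # redistribute between split_index and new_index
            self.buckets[self._bucket_index(k)].append(k)


t = LinearHashTable()
for k in range(20):
    t.insert(k)
```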
What does linear growth buy? (figure omitted)
The value of i is redundant if we know the number of buckets (# buckets = 5 in the example).
What does linear growth buy? (figure omitted): i = 3, n = 3