Published by Justin Norman. Modified over 9 years ago.
1
1 CPS216: Advanced Database Systems Notes 05: Operators for Data Access (contd.) Shivnath Babu
2
2 Insertion in a B-Tree 49 n = 2 15 36 Insert: 62
3
3 Insertion in a B-Tree 49 n = 2 15 36 Insert: 62 62
4
4 Insertion in a B-Tree 49 n = 2 15 36 62 Insert: 50
5
5 Insertion in a B-Tree 49 n = 2 15 36 50 Insert: 50 62
6
6 Insertion in a B-Tree 49 n = 2 15 36 50 Insert: 75 62
7
7 Insertion in a B-Tree 49 n = 2 15 36 50 Insert: 75 62 75
8
8 Insertion
9
9 Insertion
10
10 Insertion
11
11 Insertion
12
12 Insertion
13
13 Insertion
14
14 Insertion
15
15 Insertion
16
16 Insertion
17
17 Insertion
18
18 Insertion
19
19 Insertion: Primitives: inserting into a leaf node; splitting a leaf node; splitting an internal node; splitting the root node.
20
20 Inserting into a Leaf Node 54 57 60 62 58
21
21 Inserting into a Leaf Node 54 57 60 62 58
22
22 Inserting into a Leaf Node 54 57 60 62 58
23
23 Splitting a Leaf Node 61 54 57 60 62 58 54 66
24
24 Splitting a Leaf Node 61 54 57 60 62 58 54 66
25
25 Splitting a Leaf Node 61 54 57 61 62 58 54 66 60
26
26 Splitting a Leaf Node 61 54 57 61 62 58 54 66 60 59
27
27 Splitting a Leaf Node 61 54 57 61 62 58 54 66 60 59
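The two leaf-level primitives can be sketched in a few lines. This is a minimal illustration, not the course's reference code: the leaf is modeled as a plain sorted Python list, `LEAF_CAPACITY` is a hypothetical value, and the separator rule (copy up the smallest key of the new right leaf) is one common textbook convention assumed here.

```python
import bisect

LEAF_CAPACITY = 5  # hypothetical maximum number of keys per leaf


def insert_into_leaf(keys, key):
    """Insert key into a leaf's sorted key list, keeping it sorted."""
    bisect.insort(keys, key)


def split_leaf(keys):
    """Split an overfull sorted leaf into two leaves.

    Returns (left, right, separator). The separator is the smallest key of
    the right leaf and is copied (not moved) into the parent.
    """
    mid = (len(keys) + 1) // 2  # the left leaf keeps the extra key when odd
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]


leaf = [54, 57, 60, 62]
insert_into_leaf(leaf, 58)      # room available: plain sorted insert
insert_into_leaf(leaf, 61)      # now 6 keys, one more than LEAF_CAPACITY
assert len(leaf) > LEAF_CAPACITY
left, right, separator = split_leaf(leaf)
print(left, right, separator)   # [54, 57, 58] [60, 61, 62] 60
```

The parent then receives the separator together with a pointer to the new right leaf, which may in turn overflow the parent.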
28
28 Splitting an Internal Node 59 54 66 40 [59, 66) [54, 59) 74 84 99 21 … … [66, 74)
29
29 Splitting an Internal Node 59 54 66 40 74 84 99 21 … … [59, 66) [54, 59) [66, 74)
30
30 Splitting an Internal Node 59 54 66 40 74 84 99 21 … … [66, 99) [59, 66) [54, 59) [21, 66) [66, 74)
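Splitting an internal node differs from splitting a leaf in that the middle key moves up to the parent instead of being copied. The sketch below assumes a node is a sorted key list plus a child list with one more entry than keys; the key values follow the figure as best they can be read, and the child names `c0` to `c6` are placeholders.

```python
def split_internal(keys, children):
    """Split an overfull internal node.

    Returns (left, up_key, right), where left and right are (keys, children)
    pairs and up_key is removed from this level and pushed into the parent.
    """
    mid = len(keys) // 2
    up_key = keys[mid]
    left = (keys[:mid], children[:mid + 1])
    right = (keys[mid + 1:], children[mid + 1:])
    return left, up_key, right


keys = [40, 54, 59, 66, 74, 84]              # node after 59 was added
children = [f"c{j}" for j in range(len(keys) + 1)]
left, up_key, right = split_internal(keys, children)
print(left[0], up_key, right[0])             # [40, 54, 59] 66 [74, 84]
```

When the node being split is the root, there is no parent to receive the pushed-up key, so a new root holding only that key is created and the tree grows one level taller.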
31
31 Splitting the Root 54 66 40 74 84 59 [59, 66) [54, 59) [66, 74)
32
32 Splitting the Root 54 66 40 74 84 59 [59, 66) [54, 59) [66, 74)
33
33 Splitting the Root 54 66 40 74 84 59 [59, 66) [54, 59) [66, 74)
34
34 Deletion
35
35 Deletion redistribute
36
36 Deletion
37
37 Deletion - II
38
merge
39
39 Deletion - II
40
40 Deletion - II
41
41 Deletion - II
42
42 Deletion - II merge Not needed
43
43 Deletion - II
44
44 Deletion: Primitives: delete key from a leaf; redistribute keys between sibling leaves; merge a leaf into its sibling; redistribute keys between two sibling internal nodes; merge an internal node into its sibling.
45
45 Merge Leaf into Sibling 54 58 64 68 72 75 67 85 … 72
46
46 Merge Leaf into Sibling 54 58 64 68 75 67 … 72 85
47
47 Merge Leaf into Sibling 54 58 64 68 75 67 … 72 85
48
48 Merge Leaf into Sibling 54 58 64 68 75 … 72 85
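The leaf-level deletion primitives (delete, redistribute, merge) can be sketched as follows. The occupancy limits `MIN_KEYS` and `MAX_KEYS` are hypothetical, leaves are plain sorted lists, and the rule "merge if everything fits in one leaf, otherwise redistribute" is an assumption made for the illustration.

```python
MIN_KEYS = 3  # hypothetical minimum occupancy of a leaf
MAX_KEYS = 5  # hypothetical maximum occupancy of a leaf


def fix_underfull_leaf(left, right):
    """Repair two adjacent sibling leaves after one of them became underfull.

    Returns ("merge", None) if all keys were moved into `left`, or
    ("redistribute", new_separator) if keys were evened out between the two.
    """
    combined = left + right
    if len(combined) <= MAX_KEYS:
        left[:] = combined
        right.clear()
        return "merge", None
    mid = (len(combined) + 1) // 2
    left[:] = combined[:mid]
    right[:] = combined[mid:]
    return "redistribute", right[0]


left, right = [54, 58, 64], [68, 72, 75]
right.remove(72)                       # delete key from a leaf
assert len(right) < MIN_KEYS           # the right leaf is now underfull
action, separator = fix_underfull_leaf(left, right)
print(action, left)                    # merge [54, 58, 64, 68, 75]

left2, right2 = [54, 58, 64, 66], [68, 75]
action2, separator2 = fix_underfull_leaf(left2, right2)
print(action2, left2, right2, separator2)
```

After a merge, the parent drops the separator key and the pointer to the emptied leaf, which can make the parent underfull in turn.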
49
49 Merge Internal Node into Sibling 41 48 52 63 74 59 [52, 59) [59, 63) … …
50
50 Merge Internal Node into Sibling 41 48 52 63 59 [52, 59) [59, 63) 59 … …
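Merging two internal nodes pulls the separator key down from the parent so that the merged node still has one more child than keys. A minimal sketch, assuming nodes are (keys, children) pairs; the key values follow the figure as best they can be read and the child names are placeholders.

```python
def merge_internal(left_keys, left_children, separator,
                   right_keys, right_children):
    """Merge the right internal node into the left one.

    The parent's separator key is pulled down between the two key lists.
    """
    keys = left_keys + [separator] + right_keys
    children = left_children + right_children
    return keys, children


keys, children = merge_internal(
    [41, 48, 52], ["c0", "c1", "c2", "c3"],   # left sibling
    59,                                       # separator from the parent
    [63], ["c4", "c5"],                       # underfull right sibling
)
print(keys)                                   # [41, 48, 52, 59, 63]
```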
51
51 B-Tree Roadmap: B-Tree; Recap; Insertion (recap); Deletion; Construction; Efficiency; B-Tree variants; Hash-based Indexes
52
52 Question How does insertion-based construction perform?
53
53 B-Tree Construction 11 13 15 21 34 41 48 57 62 75 81 97 Sort
54
54 B-Tree Construction 75 97 21 41 57 15 11 13 48 34 62 81 Scan 75 81 97 11 13 15 21 34 41 48 57 62
55
55 B-Tree Construction 21 48 75 11 13 15 21 34 41 48 57 62 75 81 97 Scan
56
56 B-Tree Construction: Why is sort-based construction better than insertion-based construction?
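Sort-based construction can be sketched as: sort the keys, pack them into leaves in one sequential scan, then take the first key of every leaf except the first as the separators for the level above. The function name and the leaf capacity of 3 are assumptions chosen to match the figure; only one level above the leaves is built here.

```python
def bulk_load(keys, leaf_capacity):
    """Build the leaf level and the separators directly above it."""
    ordered = sorted(keys)
    leaves = [ordered[j:j + leaf_capacity]
              for j in range(0, len(ordered), leaf_capacity)]
    separators = [leaf[0] for leaf in leaves[1:]]
    return leaves, separators


unsorted_keys = [75, 97, 21, 41, 57, 15, 11, 13, 48, 34, 62, 81]
leaves, separators = bulk_load(unsorted_keys, leaf_capacity=3)
print(leaves)       # [[11, 13, 15], [21, 34, 41], [48, 57, 62], [75, 81, 97]]
print(separators)   # [21, 48, 75]
```

Each leaf is written exactly once and in key order, whereas insertion-based construction descends the tree once per key and touches leaves in an order dictated by the input.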
57
57 Cost of B-Tree Operations: Height of B-Tree: H. Assume no duplicates. Question: what is the random I/O cost of: Insertion; Deletion; Equality search; Range search?
58
58 Height of B-Tree: Number of keys: N. B-Tree parameter: n. $\text{Height} \approx \log_n N = \frac{\log N}{\log n}$. In practice: 2-3 levels.
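A quick way to see why 2-3 levels is typical is to compute the smallest height h with n^h >= N for a few values. This sketch treats n as the fan-out of every node, which is a simplification; the sample values of N and n are made up for illustration.

```python
def btree_height(num_keys, fanout):
    """Smallest h such that fanout ** h >= num_keys (integer arithmetic)."""
    height, reach = 0, 1
    while reach < num_keys:
        reach *= fanout
        height += 1
    return height


print(btree_height(10_000, 100))      # 2
print(btree_height(1_000_000, 100))   # 3
```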
59
59 Question: How do you pick parameter n? 1. Ignore inserts and deletes 2. Optimize for equality searches 3. Assume no duplicates
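One common answer, under the three assumptions listed, is to make a node exactly fill one disk block so that each level costs one I/O. The sketch below assumes a node stores n keys and n + 1 pointers; the block, key, and pointer sizes are illustrative values, not figures from the slides.

```python
def max_keys_per_node(block_size, key_size, pointer_size):
    """Largest n with n * key_size + (n + 1) * pointer_size <= block_size."""
    return (block_size - pointer_size) // (key_size + pointer_size)


n = max_keys_per_node(block_size=4096, key_size=4, pointer_size=8)
print(n)   # 340
```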
60
60 Roadmap: B-Tree; B-Tree variants; Sparse Index; Duplicate Keys; Hash-based Indexes
61
61 Roadmap: B-Tree; B-Tree variants; Hash-based Indexes; Static Hash Table; Extensible Hash Table; Linear Hash Table
62
62 Hash-Based Indexes: adaptations of main memory hash tables; support equality searches; no range searches.
63
63 Indexing Problem (recap): A = val; Index; Keys a_1, a_2, a_i, a_n; record pointers
64
64 Main Memory Hash Table: h(key) = key % 8; buckets 0 1 2 3 4 5 6 7; entries 32 (null) 10 48 27 75 21 55
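The main-memory table on this slide can be reproduced directly. The sketch uses the slide's hash function h(key) = key % 8 and the keys visible in the figure; modeling each bucket's chain as a Python list is an assumption.

```python
NUM_BUCKETS = 8


def h(key):
    """Hash function from the slide: h(key) = key % 8."""
    return key % NUM_BUCKETS


buckets = [[] for _ in range(NUM_BUCKETS)]   # an empty list plays the role of (null)
for key in [32, 10, 48, 27, 75, 21, 55]:
    buckets[h(key)].append(key)

for bucket_id, chain in enumerate(buckets):
    print(bucket_id, chain)
```

Keys 32 and 48 share bucket 0 and keys 27 and 75 share bucket 3, which is where the in-memory version needs a linked list of keys.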
65
65 Adapting to disk: 1 hash bucket = 1 block; all keys that hash to a bucket are stored in the block; intuition: keys in a bucket are usually accessed together; no need for linked lists of keys …
66
66 Adapting to Disk How do we handle this?
67
67 Adapting to disk: 1 hash bucket = 1 block; all keys that hash to a bucket are stored in the block; intuition: keys in a bucket are usually accessed together; no need for linked lists of keys … but need linked list of blocks (overflow blocks).
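Overflow blocks can be sketched with a small in-memory model in which each `Block` stands in for one disk block and `lookup` counts how many blocks it had to read. The class names, the modulo hash function, and the tiny capacities are all assumptions made for the illustration.

```python
class Block:
    def __init__(self, capacity):
        self.capacity = capacity
        self.keys = []
        self.overflow = None          # next block in this bucket's chain


class StaticHashTable:
    def __init__(self, num_buckets, block_capacity):
        self.block_capacity = block_capacity
        self.buckets = [Block(block_capacity) for _ in range(num_buckets)]

    def insert(self, key):
        block = self.buckets[key % len(self.buckets)]
        while len(block.keys) >= block.capacity:
            if block.overflow is None:
                block.overflow = Block(self.block_capacity)
            block = block.overflow
        block.keys.append(key)

    def lookup(self, key):
        """Return (found, blocks_read) for an equality search."""
        block = self.buckets[key % len(self.buckets)]
        blocks_read = 0
        while block is not None:
            blocks_read += 1
            if key in block.keys:
                return True, blocks_read
            block = block.overflow
        return False, blocks_read


table = StaticHashTable(num_buckets=4, block_capacity=2)
for key in [0, 4, 8, 12, 16]:         # all five keys hash to bucket 0
    table.insert(key)
print(table.lookup(16))               # (True, 3)
print(table.lookup(1))                # (False, 1)
```

A lookup that has to walk an overflow chain reads more than one block, so the cost of an equality search depends on how full the buckets are.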
68
68 Adapting to Disk
69
69 Adapting to Disk 0 1 2 Is there any other issue? Map ‘bucket id’ to disk location
70
70 Adapting to disk: 1 hash bucket = 1 block; Bucket Id → Disk Address mapping; contiguous blocks; store mapping in main memory; too large?
71
71 Beware of claims that assume 1 I/O for hash tables and 3 I/Os for B-Tree!!
72
72 Adapting to disk: 1 hash bucket = 1 block (or more than one contiguous block); Bucket Id → Disk Address mapping; number of buckets ≈ number of keys (main memory version), ≈ number of blocks (disk version). Textbook: Static Hash Table
73
73 Assigned Reading Insertion and Deletion on Static Hash Table Section 13.4
74
74 Roadmap: B-Tree; B-Tree variants; Hash-based Indexes; Static Hash Table; Extensible Hash Table; Linear Hash Table
75
75 Dynamic Hash Indexes: Static Hash Table: fixed number of buckets; wastes space / inefficient. Dynamic Hash Tables: number of buckets can increase / decrease dynamically.
76
76 Extensible Hash Table: Main Ideas (Abstract): Hash Function: {Keys} → {Large space of hash values}; buckets dynamically partition the space of hash values; insertions: partitioning grows finer (i.e., more buckets); deletions: partitioning grows coarser (i.e., fewer buckets).
77
77 Extensible Hash Table: Main Ideas (concrete): Hash Function: {Keys} → bit string of length b. Example: 0 1 1 1 0 1 0 0. Bucket: prefix of bit string. All (keys with) hash values having that prefix fall into that bucket.
78
11 0 10 01011010 01100110 10110001 10011010 11011110 prefixes Hash Value bucket?
79
11 0 10 01011010 01100110 10110001 10011010 11011110 00 01 10 11 i = 2 i = max length of prefix
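The directory lookup for i = 2 can be sketched as taking the first i bits of the hash value and using them as an index. The bucket labels are placeholder strings, and the directory contents assume, as the figure suggests, that slots 00 and 01 both point to the bucket with prefix 0.

```python
B = 8  # hash values are B-bit strings, as in the slides' 8-bit examples


def directory_index(hash_value, i):
    """Return the directory slot for a hash value: its first i bits."""
    return hash_value >> (B - i)


# Directory for i = 2. Slots 00 and 01 share the bucket with prefix "0".
directory = ["bucket 0", "bucket 0", "bucket 10", "bucket 11"]

hash_values = [0b01011010, 0b01100110, 0b10110001, 0b10011010, 0b11011110]
placement = {format(hv, "08b"): directory[directory_index(hv, 2)]
             for hv in hash_values}
for bits, bucket in placement.items():
    print(bits, "->", bucket)
```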
80
80 i = 0. Insertion
81
81 i = 0. 10110001 Insertion
82
82 i = 0. 10110001 Insertion
83
83 i = 0. 10110001 00110101 Insertion
84
84 i = 0. 10110001 00110101 11010010 Insertion
85
85 i = 0 0 10110001 00110101 11010010 1 Insertion
86
86 i = 0 0 10110001 00110101 11010010 1 Insertion
87
87 i = 1 0 10110001 00110101 11010010 1 0 1 Insertion
88
88 i = 1 0 10110001 00110101 11010010 1 0 1 Insertion
89
89 i = 1 0 10110001 00110101 11010010 1 0 1 11001101 Insertion
90
90 i = 1 0 10110001 00110101 11010010 1 0 1 11001101 Insertion
91
91 i = 1 0 10110001 00110101 11010010 10 0 1 11001101 11 Insertion
92
92 i = 1 0 10110001 00110101 11010010 10 0 1 11001101 11 Insertion
93
93 i = 2 0 10110001 00110101 11010010 10 00 11001101 11 01 10 11 Insertion
94
94 i = 2 0 10110001 00110101 11010010 10 00 11001101 11 01 10 11 11001101 Insertion
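The insertion sequence in these slides can be replayed with a small sketch. It assumes 8-bit hash values, a bucket capacity of 2, and that each bucket records how many prefix bits it covers; the class and method names are hypothetical and deletion is not handled.

```python
B = 8  # hash values are B-bit integers, matching the slides' 8-bit strings


class Bucket:
    def __init__(self, depth):
        self.depth = depth   # length of the bit prefix this bucket covers
        self.values = []


class ExtensibleHashTable:
    def __init__(self, capacity=2):
        self.capacity = capacity      # hypothetical bucket capacity
        self.i = 0                    # number of prefix bits the directory uses
        self.directory = [Bucket(0)]

    def _slot(self, hv):
        return hv >> (B - self.i)     # first i bits of the hash value

    def insert(self, hv):
        bucket = self.directory[self._slot(hv)]
        while len(bucket.values) >= self.capacity:
            self._split(bucket)
            bucket = self.directory[self._slot(hv)]
        bucket.values.append(hv)

    def _split(self, bucket):
        if bucket.depth == B:
            raise OverflowError("all B bits used; bucket cannot be split")
        if bucket.depth == self.i:
            # Directory doubles: old slot j becomes new slots 2j and 2j + 1.
            self.directory = [b for b in self.directory for _ in range(2)]
            self.i += 1
        depth = bucket.depth + 1
        halves = [Bucket(depth), Bucket(depth)]
        for hv in bucket.values:
            halves[(hv >> (B - depth)) & 1].values.append(hv)
        for slot, b in enumerate(self.directory):
            if b is bucket:
                self.directory[slot] = halves[(slot >> (self.i - depth)) & 1]


table = ExtensibleHashTable(capacity=2)
for hv in [0b10110001, 0b00110101, 0b11010010, 0b11001101]:
    table.insert(hv)
for slot, b in enumerate(table.directory):
    print(format(slot, "02b"), [format(v, "08b") for v in b.values])
```

After the four inserts the directory has i = 2, with slots 00 and 01 still sharing the single bucket for prefix 0 because that bucket never overflowed.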
95
95 Deletion Inverse of insertion: work out details
96
96 i = 2 1 00 01 10 11 Textbook Notation Number of bits in prefix 0
97
97 Extensible Hash Table: One issue: the directory doubles in size during some inserts.
98
98 Roadmap: B-Tree; B-Tree variants; Hash-based Indexes; Static Hash Table; Extensible Hash Table; Linear Hash Table
99
99 Linear Hash Table: Differences from Extensible Hash Table: bucket: suffix of the hash value; grows linearly (avoids doubling of directory).
100
10 00 1 01011000 01100100 10110001 10011001 11011110 suffixes Linear Hash Table
101
101 0 1 Linear Growth
102
102 00 1 10 redistribute Linear Growth
103
00 01 10 11 redistribute Linear Growth
104
104 What does linear growth buy? 000 01 10 11 100 i = 3 101 000 001 010 011 100 110 111 Redundant if we know # buckets = 5
105
105 What does linear growth buy? 000 01 10 11 100 i = 3 000 001 010 011 100 i = 3 n = 3
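One way to see what linear growth buys is that a bucket address can be computed from just i and the current number of buckets, with no directory of 2^i entries. The sketch below follows the usual textbook addressing rule (take the last i bits; if that bucket does not exist yet, clear the leading bit), which is assumed here, as are the function names and the starting layout of two buckets.

```python
def bucket_for(hash_value, i, num_buckets):
    """Map a hash value to a bucket number using its last i bits."""
    m = hash_value & ((1 << i) - 1)
    if m >= num_buckets:
        m -= 1 << (i - 1)   # bucket m does not exist yet: clear its leading bit
    return m


def grow(buckets, i):
    """Append one bucket and redistribute from the bucket it splits off.

    Returns the (possibly incremented) number of suffix bits i.
    """
    n = len(buckets)
    if n == 1 << i:          # all i-bit suffixes in use: one more bit needed
        i += 1
    source = n - (1 << (i - 1))
    buckets.append([])
    stay = []
    for hv in buckets[source]:
        target = bucket_for(hv, i, n + 1)
        (stay if target == source else buckets[target]).append(hv)
    buckets[source] = stay
    return i


# Two buckets (suffixes 0 and 1) holding the hash values from the slides.
buckets = [[0b01011000, 0b01100100, 0b11011110], [0b10110001, 0b10011001]]
i = 1
i = grow(buckets, i)     # buckets 00, 1, 10: bucket 0 is redistributed
i = grow(buckets, i)     # buckets 00, 01, 10, 11: bucket 1 is redistributed
print(i, [[format(v, "08b") for v in b] for b in buckets])
print(bucket_for(0b101, 3, 5))   # suffix 101 with 5 buckets maps to bucket 1
```

Each call to `grow` adds exactly one bucket and touches only one existing bucket, in contrast to the directory doubling of the extensible scheme.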