1 CPS216: Advanced Database Systems Notes 05: Operators for Data Access (contd.) Shivnath Babu.

2 Insertion in a B-Tree 49 n = 2 15 36 Insert: 62

3 Insertion in a B-Tree 49 n = 2 15 36 Insert: 62 62

4 Insertion in a B-Tree 49 n = 2 15 36 62 Insert: 50

7 Insertion in a B-Tree 49 n = 2 15 36 50 Insert: 75 62 75

8-18 Insertion (diagram sequence)

19 Insertion: Primitives: inserting into a leaf node; splitting a leaf node; splitting an internal node; splitting the root node

20 Inserting into a Leaf Node: 54 57 60 62; insert 58

23-24 Splitting a Leaf Node: 61; 54 57 60 62 58; 54 66

25 Splitting a Leaf Node: 61; 54 57 61 62 58; 54 66; 60

26-27 Splitting a Leaf Node: 61; 54 57 61 62 58; 54 66; 60; 59

Splitting an Internal Node: 59; 54 66 40; [59, 66) [54, 59); 74 84; 99 21; … …; [66, 74)

Splitting an Internal Node: 59; 54 66 40 74 84; 99 21; … …; [59, 66) [54, 59) [66, 74)

Splitting an Internal Node: 59; 54; 66; 40 74 84; 99 21; … …; [66, 99) [59, 66) [54, 59) [21, 66) [66, 74)

Splitting the Root: 54 66 40 74 84; 59; [59, 66) [54, 59) [66, 74)

Splitting the Root: 54; 66; 40 74 84 59; [59, 66) [54, 59) [66, 74)
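The four insertion primitives can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the course's reference implementation: the names `Node` and `BPlusTree` are hypothetical, `n` is taken to be the maximum number of keys per node, record pointers and leaf sibling links are omitted, and a leaf split copies the first key of the new right leaf up as the separator (the figures may pick the separator differently).

```python
import bisect


class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []  # used by internal nodes only


class BPlusTree:
    def __init__(self, n):
        self.n = n  # assumed meaning: maximum number of keys per node
        self.root = Node(leaf=True)

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:
            # Primitive 4: splitting the root creates a new root one level up.
            separator, right = split
            new_root = Node(leaf=False)
            new_root.keys = [separator]
            new_root.children = [self.root, right]
            self.root = new_root

    def _insert(self, node, key):
        """Insert below `node`; return (separator, new_right_node) if it split."""
        if node.leaf:
            bisect.insort(node.keys, key)  # Primitive 1: insert into a leaf
            if len(node.keys) <= self.n:
                return None
            return self._split_leaf(node)  # Primitive 2: split a leaf
        idx = bisect.bisect_right(node.keys, key)
        split = self._insert(node.children[idx], key)
        if split is None:
            return None
        separator, right = split
        node.keys.insert(idx, separator)
        node.children.insert(idx + 1, right)
        if len(node.keys) <= self.n:
            return None
        return self._split_internal(node)  # Primitive 3: split an internal node

    def _split_leaf(self, leaf):
        mid = (len(leaf.keys) + 1) // 2
        right = Node(leaf=True)
        right.keys = leaf.keys[mid:]
        leaf.keys = leaf.keys[:mid]
        return right.keys[0], right  # separator is copied up

    def _split_internal(self, node):
        mid = len(node.keys) // 2
        separator = node.keys[mid]  # middle key moves up, it is not copied
        right = Node(leaf=False)
        right.keys = node.keys[mid + 1:]
        right.children = node.children[mid + 1:]
        node.keys = node.keys[:mid]
        node.children = node.children[:mid + 1]
        return separator, right


tree = BPlusTree(n=2)
for key in [15, 36, 49, 62, 50, 75]:
    tree.insert(key)
```

With `n = 2`, the first three inserts already force a leaf split and a new root, and later inserts exercise the internal-node split as well.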

34-36 Deletion (diagram sequence); step shown: redistribute

37-43 Deletion - II (diagram sequence); labels shown: merge, Not needed

44 Deletion: Primitives: delete key from a leaf; redistribute keys between sibling leaves; merge a leaf into its sibling; redistribute keys between two sibling internal nodes; merge an internal node into its sibling

45 Merge Leaf into Sibling: 54 58 64 68 72 75; 67; 85 … 72

46-47 Merge Leaf into Sibling: 54 58 64 68 75; 67; … 72 85

48 Merge Leaf into Sibling: 54 58 64 68 75; … 72 85

49 Merge Internal Node into Sibling: 41; 48 52; 63 74; 59; [52, 59) [59, 63); … …

50 Merge Internal Node into Sibling: 41; 48 52 63; 59; [52, 59) [59, 63); 59; … …
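The choice between the two leaf-level repair primitives (redistribute versus merge) can be sketched as follows. This is an illustrative sketch only: the function name `fix_underflow` and the parameter `min_keys` (the assumed minimum number of keys a leaf may hold) are hypothetical, and updating the parent's separator keys is left to the caller.

```python
def fix_underflow(left, right, min_keys):
    """Repair two adjacent sibling leaves after a deletion left one too small.

    `left` and `right` are sorted key lists, modified in place.
    Returns ("redistribute", new_separator) or ("merge", None).
    """
    combined = left + right
    if len(combined) >= 2 * min_keys:
        # Enough keys for both leaves: share them evenly between the siblings.
        half = len(combined) // 2
        left[:] = combined[:half]
        right[:] = combined[half:]
        return "redistribute", right[0]
    # Too few keys for two legal leaves: merge everything into the left leaf.
    # The caller then deletes the separator and child pointer from the parent,
    # which may in turn cause the parent to underflow.
    left[:] = combined
    right.clear()
    return "merge", None


a_left, a_right = [54, 58, 64], [72]
action_a = fix_underflow(a_left, a_right, min_keys=2)

b_left, b_right = [54, 58], [68]
action_b = fix_underflow(b_left, b_right, min_keys=2)
```

Redistribution only changes a separator in the parent, while a merge removes a key from the parent, which is why merges can propagate up the tree.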

51 B-Tree Roadmap: B-Tree (recap; insertion (recap); deletion; construction; efficiency); B-Tree variants; hash-based indexes

52 Question: How does insertion-based construction perform?

53 B-Tree Construction: Sort: 11 13 15 21 34 41 48 57 62 75 81 97

B-Tree Construction: input 75 97 21 41 57 15 11 13 48 34 62 81; Scan; leaves 11 13 15 | 21 34 41 | 48 57 62 | 75 81 97

B-Tree Construction: Scan; parent keys 21 48 75; leaves 11 13 15 | 21 34 41 | 48 57 62 | 75 81 97
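The sort-then-scan construction from these slides can be sketched as below. The function name `bulk_load` and the parameter `leaf_capacity` are hypothetical, only one level of parent keys is built, and each separator is assumed to be the first key of the leaf to its right.

```python
def bulk_load(keys, leaf_capacity):
    """Sort the keys, pack them into full leaves, then derive the parent keys."""
    sorted_keys = sorted(keys)  # step 1: sort
    # Step 2: one sequential scan fills each leaf left to right.
    leaves = [
        sorted_keys[start:start + leaf_capacity]
        for start in range(0, len(sorted_keys), leaf_capacity)
    ]
    # Step 3: another scan collects one separator per leaf after the first.
    parent_keys = [leaf[0] for leaf in leaves[1:]]
    return leaves, parent_keys


leaves, parent_keys = bulk_load(
    [75, 97, 21, 41, 57, 15, 11, 13, 48, 34, 62, 81], leaf_capacity=3
)
```

For the twelve keys on the slide this yields four full leaves and the parent keys 21, 48, 75. Every leaf is written once, in order, which is the contrast with inserting the keys one at a time.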

56 B-Tree Construction: Why is sort-based construction better than insertion-based construction?

57 Cost of B-Tree Operations: Height of B-Tree: H. Assume no duplicates. Question: what is the random I/O cost of insertion, deletion, equality search, and range search?

58 Height of B-Tree: Number of keys: N. B-Tree parameter: n. $\text{Height} \approx \log_n N = \frac{\log N}{\log n}$. In practice: 2-3 levels
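A quick way to see why 2-3 levels is typical is to evaluate the height estimate for realistic numbers. The sketch below uses integer arithmetic to avoid floating-point rounding. The function name `btree_height` is hypothetical, and the fanout of 100 is an assumed example value, not a figure from the slides.

```python
def btree_height(num_keys, fanout):
    """Smallest h with fanout**h >= num_keys, i.e. roughly log_n N rounded up."""
    height, reach = 1, fanout
    while reach < num_keys:
        reach *= fanout
        height += 1
    return height


height_million = btree_height(1_000_000, 100)
height_ten_thousand = btree_height(10_000, 100)
```

With an assumed fanout of 100, a million keys fit in three levels and ten thousand keys in two.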

59 Question: How do you pick parameter n? 1. Ignore inserts and deletes 2. Optimize for equality searches 3. Assume no duplicates

60 Roadmap: B-Tree; B-Tree variants (sparse index; duplicate keys); hash-based indexes

61 Roadmap: B-Tree; B-Tree variants; hash-based indexes (static hash table; extensible hash table; linear hash table)

62 Hash-Based Indexes: adaptations of main memory hash tables; support equality searches; no range searches

Indexing Problem (recap): keys a_1, a_2, …, a_i, …, a_n; query A = val; Index: keys → record pointers

64 Main Memory Hash Table: h(key) = key % 8; buckets 0 to 7; keys 32, 10, 48, 27, 75, 21, 55; empty bucket shown as (null)
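The main-memory table on this slide can be reproduced with a short sketch. Python lists stand in for the linked lists of keys, and the insertion order of the keys is an assumption.

```python
NUM_BUCKETS = 8


def h(key):
    """Hash function from the slide: h(key) = key % 8."""
    return key % NUM_BUCKETS


# One chain (here a Python list) per bucket; an empty list plays the role of null.
buckets = [[] for _ in range(NUM_BUCKETS)]
for key in [32, 10, 48, 27, 75, 21, 55]:
    buckets[h(key)].append(key)
```

Keys 32 and 48 collide in bucket 0 and keys 27 and 75 collide in bucket 3, so those two buckets hold chains of length two.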

65 Adapting to Disk: 1 hash bucket = 1 block. All keys that hash to a bucket are stored in the block. Intuition: keys in a bucket are usually accessed together. No need for linked lists of keys …

66 Adapting to Disk: How do we handle this?

67 Adapting to Disk: 1 hash bucket = 1 block. All keys that hash to a bucket are stored in the block. Intuition: keys in a bucket are usually accessed together. No need for linked lists of keys … but need a linked list of blocks (overflow blocks)
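The bucket-as-block idea with overflow blocks can be sketched as follows. This is a simplified in-memory model, not a real storage layer: the class names `Block` and `StaticHashTable` and the parameter `block_capacity` are hypothetical, and `lookup` returns the number of blocks it touched as a stand-in for the I/O count.

```python
class Block:
    def __init__(self):
        self.keys = []
        self.overflow = None  # next block in this bucket's chain, if any


class StaticHashTable:
    def __init__(self, num_buckets, block_capacity):
        self.num_buckets = num_buckets
        self.block_capacity = block_capacity
        self.buckets = [Block() for _ in range(num_buckets)]

    def insert(self, key):
        block = self.buckets[key % self.num_buckets]
        # Walk the chain until a block has room, adding an overflow block if needed.
        while len(block.keys) >= self.block_capacity:
            if block.overflow is None:
                block.overflow = Block()
            block = block.overflow
        block.keys.append(key)

    def lookup(self, key):
        """Return (found, blocks_read); each block visited models one I/O."""
        block = self.buckets[key % self.num_buckets]
        blocks_read = 0
        while block is not None:
            blocks_read += 1
            if key in block.keys:
                return True, blocks_read
            block = block.overflow
        return False, blocks_read


table = StaticHashTable(num_buckets=4, block_capacity=2)
for key in [0, 4, 8]:
    table.insert(key)
```

Here keys 0, 4, and 8 all hash to bucket 0, so the third key lands in an overflow block and finding it costs two block reads instead of one.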

68 Adapting to Disk

69 Adapting to Disk 0 1 2 Is there any other issue? Map ‘bucket id’ to disk location

70 Adapting to Disk: 1 hash bucket = 1 block. Bucket id → disk address mapping: contiguous blocks, or store the mapping in main memory (too large?)
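The two mapping options can be contrasted in a short sketch. All names and numbers here are hypothetical: `base_address` and `block_size` describe an assumed contiguous region, and `directory` is an assumed in-memory table with made-up addresses.

```python
def contiguous_address(bucket_id, base_address, block_size):
    """Option 1: buckets laid out back to back, so the address is computed."""
    return base_address + bucket_id * block_size


# Option 2: an explicit in-memory table with one disk address per bucket.
# It allows arbitrary placement but needs memory proportional to the bucket count.
directory = [9000, 1200, 40000]


def directory_address(bucket_id):
    return directory[bucket_id]


addr_computed = contiguous_address(3, base_address=4096, block_size=512)
addr_looked_up = directory_address(1)
```

The computed mapping needs no per-bucket state, while the table-based mapping raises the "too large?" concern from the slide once it no longer fits in memory.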

71 Beware of claims that assume 1 I/O for hash tables and 3 I/Os for B-Tree!!

72 Adapting to Disk: 1 hash bucket = 1 block (or more than one contiguous block). Bucket id → disk address mapping. Number of buckets ≈ number of keys (main memory version); ≈ number of blocks (disk version). Textbook: Static Hash Table

73 Assigned Reading: Insertion and Deletion on Static Hash Table, Section 13.4

75 Dynamic Hash Indexes: Static hash table: fixed number of buckets; wastes space / inefficient. Dynamic hash tables: number of buckets can increase / decrease dynamically

76 Extensible Hash Table: Main Ideas (abstract): Hash function: {keys} → {large space of hash values}. Buckets dynamically partition the space of hash values. Insertions: partitioning grows finer, i.e., more buckets. Deletions: partitioning grows coarser, i.e., fewer buckets

77 Extensible Hash Table: Main Ideas (concrete): Hash function: {keys} → bit string of length b. Example: 0 1 1 1 0 1 0 0. Bucket: prefix of bit string. All (keys with) hash values having that prefix fall into that bucket

Hash value → bucket? Bucket prefixes: 11, 0, 10. Hash values: 01011010, 01100110, 10110001, 10011010, 11011110

Directory: 00, 01, 10, 11 (i = 2). Bucket prefixes: 11, 0, 10. Hash values: 01011010, 01100110, 10110001, 10011010, 11011110. i = max length of prefix
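The lookup rule on this slide (take the first i bits of the hash value and follow the directory entry) can be sketched as below. The `directory` dictionary reflects one reading of the figure, in which entries 00 and 01 both point to the bucket with prefix 0. The helper name `prefix` is hypothetical.

```python
B = 8  # length of the hash bit string in the slide's example


def prefix(hash_value, i, b=B):
    """Return the first i bits of a b-bit hash value as an integer."""
    return hash_value >> (b - i)


# Directory with i = 2: entries 00 and 01 share the bucket whose prefix is "0".
directory = {0b00: "0", 0b01: "0", 0b10: "10", 0b11: "11"}

hash_values = [0b01011010, 0b01100110, 0b10110001, 0b10011010, 0b11011110]
placement = {format(v, "08b"): directory[prefix(v, 2)] for v in hash_values}
```

Two directory entries sharing one bucket is what lets the directory be larger than the number of buckets.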

80 Insertion: i = 0

81-82 Insertion: i = 0; 10110001

83 Insertion: i = 0; 10110001, 00110101

84 Insertion: i = 0; 10110001, 00110101, 11010010

85-86 Insertion: i = 0; bucket prefixes 0, 1; 10110001, 00110101, 11010010

87-88 Insertion: i = 1; bucket prefixes 0, 1; directory 0, 1; 10110001, 00110101, 11010010

89-90 Insertion: i = 1; bucket prefixes 0, 1; directory 0, 1; 10110001, 00110101, 11010010, 11001101

91-92 Insertion: i = 1; bucket prefixes 0, 10, 11; directory 0, 1; 10110001, 00110101, 11010010, 11001101

93 Insertion: i = 2; bucket prefixes 0, 10, 11; directory 00, 01, 10, 11; 10110001, 00110101, 11010010, 11001101

94 Insertion: i = 2; bucket prefixes 0, 10, 11; directory 00, 01, 10, 11; 10110001, 00110101, 11010010, 11001101
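The insertion sequence in these slides can be replayed with a small sketch of an extensible hash table. Several things are assumptions rather than facts from the slides: the class names `Bucket` and `ExtensibleHashTable`, a bucket capacity of 2 (inferred from the third insert triggering a split), storing hash values directly instead of keys, and tracking a per-bucket `depth` (the number of prefix bits the bucket owns).

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth  # number of prefix bits this bucket owns
        self.values = []


class ExtensibleHashTable:
    def __init__(self, b=8, capacity=2):
        self.b = b                # length of hash bit strings
        self.capacity = capacity  # assumed number of values per bucket
        self.i = 0                # directory uses the first i bits
        self.directory = [Bucket(0)]

    def _index(self, hash_value):
        return hash_value >> (self.b - self.i)

    def insert(self, hash_value):
        while True:
            bucket = self.directory[self._index(hash_value)]
            if len(bucket.values) < self.capacity:
                bucket.values.append(hash_value)
                return
            self._split(bucket)  # partitioning grows finer, then retry

    def _split(self, bucket):
        if bucket.depth == self.b:
            raise OverflowError("cannot split further: all hash bits are used")
        if bucket.depth == self.i:
            # Directory doubles: every old entry becomes two adjacent entries.
            self.directory = [e for e in self.directory for _ in range(2)]
            self.i += 1
        bucket.depth += 1
        sibling = Bucket(bucket.depth)
        shift = self.b - bucket.depth  # position of the newly used bit
        old_values = bucket.values
        bucket.values = [v for v in old_values if not (v >> shift) & 1]
        sibling.values = [v for v in old_values if (v >> shift) & 1]
        # Entries whose newly used bit is 1 now point to the sibling bucket.
        for idx, entry in enumerate(self.directory):
            if entry is bucket and (idx >> (self.i - bucket.depth)) & 1:
                self.directory[idx] = sibling


table = ExtensibleHashTable(b=8, capacity=2)
for value in [0b10110001, 0b00110101, 0b11010010, 0b11001101]:
    table.insert(value)
```

After the four inserts the directory has four entries (i = 2): entries 00 and 01 share the bucket holding 00110101, entry 10 holds 10110001, and entry 11 holds 11010010 and 11001101.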

95 Deletion: inverse of insertion; work out details

96 Textbook Notation: i = 2; directory 00, 01, 10, 11; bucket with prefix 0 is labeled 1, the number of bits in its prefix

97 Extensible Hash Table: One issue: the directory doubles in size during some inserts

99 Linear Hash Table: Differences from extensible hash table: bucket: suffix of the hash value; grows linearly (avoids doubling of directory)

Linear Hash Table: bucket suffixes 10, 00, 1; hash values 01011000, 01100100, 10110001, 10011001, 11011110

101 Linear Growth: buckets 0, 1

102 Linear Growth: buckets 00, 1, 10; redistribute

Linear Growth: buckets 00, 01, 10, 11; redistribute

104 What does linear growth buy? Buckets 000, 01, 10, 11, 100; i = 3; entries 000, 001, 010, 011, 100, 101, 110, 111; entries 101, 110, 111 are redundant if we know # buckets = 5

105 What does linear growth buy? Buckets 000, 01, 10, 11, 100; i = 3; entries 000, 001, 010, 011, 100; i = 3; n = 3
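Linear growth can be sketched as a table that adds exactly one bucket at a time and needs no directory, only the current bit count and bucket count. The sketch below rests on assumptions: the class name `LinearHashTable`, the attribute names `bits` and `num_buckets`, and the growth trigger `max_avg_per_bucket` are all hypothetical, buckets are plain lists standing in for a block plus its overflow chain, and hash values are stored directly.

```python
class LinearHashTable:
    def __init__(self, max_avg_per_bucket=2):
        self.bits = 1            # number of suffix bits currently in use
        self.num_buckets = 2     # buckets 0 and 1 to start
        self.buckets = [[], []]
        self.count = 0
        self.max_avg = max_avg_per_bucket  # assumed growth threshold

    def address(self, hash_value):
        m = hash_value & ((1 << self.bits) - 1)  # last `bits` bits
        if m >= self.num_buckets:
            # That bucket does not exist yet: drop the leading bit of the suffix.
            m -= 1 << (self.bits - 1)
        return m

    def insert(self, hash_value):
        self.buckets[self.address(hash_value)].append(hash_value)
        self.count += 1
        if self.count > self.max_avg * self.num_buckets:
            self._grow()

    def _grow(self):
        # Add exactly one bucket; use one more bit once all suffixes are taken.
        if self.num_buckets == 1 << self.bits:
            self.bits += 1
        new_id = self.num_buckets
        old_id = new_id - (1 << (self.bits - 1))
        self.buckets.append([])
        self.num_buckets += 1
        # Redistribute only the one bucket that shares its lower bits with new_id.
        old_values = self.buckets[old_id]
        self.buckets[old_id] = [v for v in old_values if self.address(v) == old_id]
        self.buckets[new_id] = [v for v in old_values if self.address(v) == new_id]


table = LinearHashTable()
for value in [0b01011000, 0b01100100, 0b10110001, 0b10011001, 0b11011110]:
    table.insert(value)

# Address computation for 5 buckets and 3 suffix bits, as on the slide.
probe = LinearHashTable()
probe.bits, probe.num_buckets = 3, 5
redirected = [probe.address(s) for s in (0b101, 0b110, 0b111)]
```

Knowing only the bucket count is enough to redirect the suffixes 101, 110, and 111 to buckets 01, 10, and 11, which is why the extra directory entries are redundant.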
