1 CPS216: Advanced Database Systems Notes 05: Operators for Data Access (contd.) Shivnath Babu.

2-7 Insertion in a B-Tree (worked example; the figures show key 49 in the tree, with n = …): insert 62, then 50, then 75, then …

8-18 Insertion (worked example continued; figures)

19 Insertion: Primitives. Inserting into a leaf node; splitting a leaf node; splitting an internal node; splitting the root node.
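A minimal sketch of the leaf and internal-node split primitives in Python. It assumes a node is a plain sorted list of keys (no record pointers), that a node holds at most n keys, and that there are no duplicates. `insert_into_leaf`, `split_leaf` and `split_internal` are hypothetical names, and the example keys are illustrative rather than the slides' tree.

```python
import bisect

def insert_into_leaf(leaf, key, n):
    """Insert key into a sorted leaf that may hold at most n keys.

    Returns (leaf, None, None) when the key fits, otherwise the result of
    split_leaf: (left, right, separator).
    """
    keys = leaf[:]            # work on a copy of the leaf
    bisect.insort(keys, key)  # keep the keys sorted
    if len(keys) <= n:
        return keys, None, None
    return split_leaf(keys)

def split_leaf(keys):
    """Split an overfull leaf; the left half keeps ceil(len/2) keys.

    The smallest key of the right half is copied up to the parent.
    """
    mid = (len(keys) + 1) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]

def split_internal(keys, children):
    """Split an overfull internal node (len(children) == len(keys) + 1).

    The middle key moves up to the parent and stays in neither half.
    """
    mid = len(keys) // 2
    left = (keys[:mid], children[:mid + 1])
    right = (keys[mid + 1:], children[mid + 1:])
    return left, right, keys[mid]

print(insert_into_leaf([21, 49, 54], 62, n=4))      # ([21, 49, 54, 62], None, None)
print(insert_into_leaf([21, 49, 54, 62], 50, n=4))  # ([21, 49, 50], [54, 62], 54)
```

Note the asymmetry: a leaf split copies its separator up (every key must remain in some leaf), while an internal split moves the middle key up. Splitting the root is an internal split followed by creating a new root that holds only the separator, which is the only way the tree grows taller.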

20-22 Inserting into a Leaf Node (figures)

23-27 Splitting a Leaf Node (figures)

28-30 Splitting an Internal Node (figures; the child key ranges shown include [54, 59), [59, 66) and [66, 74), and the last figure also shows [21, 66) and [66, 99))

31-33 Splitting the Root (figures; the child key ranges shown include [54, 59), [59, 66) and [66, 74))

34-36 Deletion (worked example; figures): keys are redistributed between siblings.

37-43 Deletion - II (worked example; figures): this deletion requires a merge; on slide 42 a further merge is not needed.

44 Deletion: Primitives. Deleting a key from a leaf; redistributing keys between sibling leaves; merging a leaf into its sibling; redistributing keys between two sibling internal nodes; merging an internal node into its sibling.
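The leaf-level deletion primitives can be sketched the same way. This assumes sorted Python lists as leaves, a hypothetical minimum occupancy `min_keys` of roughly half a node, and that an underfull leaf is repaired from its right sibling (a left sibling works symmetrically). The function names and keys are illustrative.

```python
def delete_from_leaf(leaf, key):
    """Return a copy of a sorted leaf with key removed (key must be present)."""
    keys = leaf[:]
    keys.remove(key)
    return keys

def fix_leaf_underflow(leaf, right_sibling, min_keys):
    """Repair a leaf holding fewer than min_keys keys using its right sibling.

    Returns (action, leaf, right_sibling, new_separator):
      "ok"           -> nothing to do;
      "redistribute" -> borrow the sibling's smallest key; the parent's
                        separator between the two leaves becomes new_separator;
      "merge"        -> the sibling's keys are appended to the leaf, and the
                        parent must drop the separator and the sibling pointer.
    """
    if len(leaf) >= min_keys:
        return "ok", leaf, right_sibling, None
    if len(right_sibling) > min_keys:   # sibling can spare a key
        leaf = leaf + right_sibling[:1]
        right_sibling = right_sibling[1:]
        return "redistribute", leaf, right_sibling, right_sibling[0]
    # Sibling is at the minimum, so both leaves fit in one node
    # (assuming min_keys is about half the node capacity).
    return "merge", leaf + right_sibling, None, None

leaf = delete_from_leaf([21, 49], 49)               # [21]: now underfull
print(fix_leaf_underflow(leaf, [54, 62, 72], 2))    # ('redistribute', [21, 54], [62, 72], 62)
print(fix_leaf_underflow(leaf, [54, 62], 2))        # ('merge', [21, 54, 62], None, None)
```

Internal nodes follow the same pattern, except that the parent's separator key takes part: redistribution rotates a key through the parent, and a merge pulls the separator down into the merged node.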

45-48 Merge Leaf into Sibling (figures; the keys shown include 72 and 85)

49-50 Merge Internal Node into Sibling (figures; child key ranges [52, 59) and [59, 63), with key 59 appearing in the merged node on slide 50)

51 Roadmap: B-Tree (recap, insertion recap, deletion, construction, efficiency); B-Tree variants; hash-based indexes.

52 Question: How does insertion-based construction perform?

53-55 B-Tree Construction: sort the keys (53), then scan the sorted keys (54-55) (figures)

56 B-Tree Construction: Why is sort-based construction better than the insertion-based one?
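A sketch of sort-based (bottom-up) construction, assuming leaves are packed full with n keys each (real loaders often leave free space for later inserts) and that nodes are plain Python lists and tuples. `bulk_load` and `_group` are hypothetical names.

```python
def _group(items, size):
    """Cut items into runs of at most `size`; never leave a final run of a
    single child, since an internal node needs at least two children."""
    groups = [items[i:i + size] for i in range(0, len(items), size)]
    if len(groups) > 1 and len(groups[-1]) < 2:
        groups[-1].insert(0, groups[-2].pop())
    return groups

def bulk_load(keys, n):
    """Build a B+-tree bottom-up: sort the keys, then scan them into leaves.

    Leaves are packed with n keys each (the last one may be underfull).
    Returns the levels from the leaves up to the root; an internal node is
    (separator_keys, children).
    """
    keys = sorted(keys)                                        # sort
    leaves = [keys[i:i + n] for i in range(0, len(keys), n)]   # scan
    levels = [leaves]
    level = [(leaf, leaf[0]) for leaf in leaves]  # (node, smallest key below it)
    while len(level) > 1:
        parents = []
        for group in _group(level, n + 1):        # n keys => up to n + 1 children
            seps = [low for _, low in group[1:]]
            children = [node for node, _ in group]
            parents.append(((seps, children), group[0][1]))
        levels.append([node for node, _ in parents])
        level = parents
    return levels

levels = bulk_load(range(24), n=4)
print(len(levels))        # 3
print(levels[-1][0][0])   # [16]
```

Every leaf is produced once, in key order, by a single scan of the sorted input; insertion-based construction instead descends from the root for every key.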

57 Cost of B-Tree Operations. Let the height of the B-Tree be H, and assume no duplicates. Question: what is the random I/O cost of insertion, deletion, equality search, and range search?

58 Height of B-Tree. Number of keys: N; B-Tree parameter: n. Height ≈ $\log_n N = \frac{\log N}{\log n}$. In practice: 2-3 levels.
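The height estimate can be checked numerically. This sketch assumes every node is full with a fan-out of about n, so it gives the minimum height; partially full nodes can add a level. `btree_height` is a hypothetical helper.

```python
def btree_height(num_keys, n):
    """Smallest h with n**h >= num_keys, i.e. ceil(log_n N).

    Integer arithmetic avoids floating-point rounding in log(N) / log(n).
    """
    h, reachable = 0, 1
    while reachable < num_keys:
        reachable *= n
        h += 1
    return h

print(btree_height(10**6, 100))   # 3
print(btree_height(10**7, 340))   # 3
```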

59 Question: How do you pick parameter n? Assumptions: (1) ignore inserts and deletes; (2) optimize for equality searches; (3) assume no duplicates.
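One common baseline is to choose the largest n for which a node fits in one disk block: n keys plus n + 1 pointers. The sketch assumes fixed-size keys and pointers; the 4096-byte block, 4-byte keys and 8-byte pointers are illustrative values, and this is not necessarily the answer the lecture develops under the three assumptions above. `max_keys_per_node` is a hypothetical helper.

```python
def max_keys_per_node(block_bytes, key_bytes, ptr_bytes):
    """Largest n such that n keys and n + 1 pointers fit in one block:
    n*key_bytes + (n + 1)*ptr_bytes <= block_bytes."""
    return (block_bytes - ptr_bytes) // (key_bytes + ptr_bytes)

print(max_keys_per_node(4096, 4, 8))  # 340
```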

60 Roadmap: B-Tree; B-Tree variants (sparse index, duplicate keys); hash-based indexes.

61 Roadmap: B-Tree; B-Tree variants; hash-based indexes (static hash table, extensible hash table, linear hash table).

62 Hash-Based Indexes: adaptations of main-memory hash tables. They support equality searches but not range searches.

63 Indexing Problem (recap): for a query A = val, an index maps keys $a_1, a_2, \ldots, a_i, \ldots, a_n$ to record pointers (figure).

64 Main-Memory Hash Table (figure): buckets 0-7 with h(key) = key % 8; empty buckets hold (null), and key 32 goes to bucket 0 because 32 % 8 = 0.
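A main-memory version of this table as a minimal sketch: eight buckets and h(key) = key % 8 as on the slide, with a Python list per bucket standing in for the linked list of keys. `ChainedHashTable` is a hypothetical name, integer keys are assumed, and keys other than 32 are illustrative.

```python
class ChainedHashTable:
    """Main-memory hash table: h(key) = key % num_buckets, one list
    (standing in for a linked list of keys) per bucket."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def h(self, key):
        return key % len(self.buckets)

    def insert(self, key):
        bucket = self.buckets[self.h(key)]
        if key not in bucket:      # assume no duplicate keys
            bucket.append(key)

    def search(self, key):
        return key in self.buckets[self.h(key)]

t = ChainedHashTable()
for k in (32, 12, 20, 49):
    t.insert(k)
print(t.h(32), t.h(12), t.h(20))   # 0 4 4
print(t.search(20), t.search(21))  # True False
```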

65 Adapting to Disk: 1 hash bucket = 1 block. All keys that hash to a bucket are stored in its block (intuition: keys in a bucket are usually accessed together), so there is no need for linked lists of keys …

66 Adapting to Disk (figure): How do we handle a bucket whose keys do not fit in one block?

67 Adapting to Disk: 1 hash bucket = 1 block. All keys that hash to a bucket are stored in its block (intuition: keys in a bucket are usually accessed together). No need for linked lists of keys … but we do need a linked list of blocks (overflow blocks).
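The disk adaptation can be simulated in memory. In this sketch a `Block` holds at most `block_capacity` keys plus a pointer to an overflow block; keys are integers hashed with key % num_buckets; the class names and parameters are hypothetical, and `search` counts each block visited as one I/O.

```python
class Block:
    """A simulated disk block: up to `capacity` keys plus an overflow pointer."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.keys = []
        self.overflow = None   # next block in this bucket's overflow chain

class StaticHashTable:
    """Fixed number of buckets; each bucket is one primary block plus a
    linked list of overflow blocks. Keys are assumed to be integers."""

    def __init__(self, num_buckets, block_capacity):
        self.block_capacity = block_capacity
        self.buckets = [Block(block_capacity) for _ in range(num_buckets)]

    def _primary_block(self, key):
        return self.buckets[key % len(self.buckets)]

    def insert(self, key):
        block = self._primary_block(key)
        while len(block.keys) == block.capacity:   # this block is full
            if block.overflow is None:
                block.overflow = Block(self.block_capacity)
            block = block.overflow
        block.keys.append(key)

    def search(self, key):
        """Return (found, blocks_read), counting each block visited as one I/O."""
        block, reads = self._primary_block(key), 0
        while block is not None:
            reads += 1
            if key in block.keys:
                return True, reads
            block = block.overflow
        return False, reads

t = StaticHashTable(num_buckets=4, block_capacity=2)
for k in (0, 4, 8, 12):   # all hash to bucket 0
    t.insert(k)
print(t.search(4))    # (True, 1)
print(t.search(12))   # (True, 2)
```

A key that lives in an overflow block costs more than one block read, and the bucket id → disk address lookup is not counted here at all.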

68 Adapting to Disk

69 Adapting to Disk: Is there any other issue? Mapping a 'bucket id' to a disk location.

70 Adapting to Disk: 1 hash bucket = 1 block. Bucket id → disk address mapping: store the buckets in contiguous blocks, or store the mapping in main memory (too large?).

71 Beware of claims that assume 1 I/O for hash tables and 3 I/Os for B-Tree!!

72 Adapting to Disk: 1 hash bucket = 1 block (or several contiguous blocks), plus a bucket id → disk address mapping. Number of buckets ≈ number of keys (main-memory version), but ≈ number of blocks (disk version). Textbook name: static hash table.
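Sizing and addressing for the disk version can be sketched with two small helpers, assuming fixed-size blocks, byte offsets as disk addresses, and buckets stored in contiguous blocks. Both function names are hypothetical.

```python
def num_disk_buckets(num_keys, keys_per_block):
    """Disk version: about one bucket per block's worth of keys
    (integer ceiling of num_keys / keys_per_block)."""
    return (num_keys + keys_per_block - 1) // keys_per_block

def bucket_address(bucket_id, base_address, block_size):
    """If the buckets occupy contiguous blocks starting at base_address, the
    bucket id -> disk address mapping is plain arithmetic, with no table to
    keep in main memory."""
    return base_address + bucket_id * block_size

print(num_disk_buckets(1_000_000, 100))   # 10000
print(bucket_address(3, 0, 4096))         # 12288
```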

73 Assigned Reading: insertion and deletion on a static hash table (Section 13.4).

74 Roadmap: B-Tree; B-Tree variants; hash-based indexes (static hash table, extensible hash table, linear hash table).

75 Dynamic Hash Indexes. Static hash table: a fixed number of buckets, so it either wastes space (too many buckets) or becomes inefficient (too few). Dynamic hash tables: the number of buckets can increase or decrease dynamically.

76 Extensible Hash Table: Main Ideas (abstract). Hash function: {keys} → {large space of hash values}. Buckets dynamically partition the space of hash values. Insertions make the partitioning finer (i.e., more buckets); deletions make it coarser (i.e., fewer buckets).

77 Extensible Hash Table: Main Ideas (concrete). Hash function: {keys} → bit string of length b. Example: (figure). Bucket: a prefix of the bit string; all keys whose hash values have that prefix fall into that bucket.

78 (Figure: buckets labeled by prefixes.) Hash value → bucket?

79 i = 2 (figure), where i is the maximum length of a prefix.

80-94 Insertion (worked example; figures), starting from i = 0.
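A compact sketch of extensible-hash insertion, assuming b-bit integer hash values (b = 4 here), a hypothetical hash that keeps the key's low b bits, and a bucket capacity counted in keys. Each bucket records how many prefix bits it really uses; when a full bucket already uses all i directory bits, the directory doubles before the bucket splits. Overflow chains and deletion are not modeled, and the class names are hypothetical.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth   # number of prefix bits this bucket actually uses
        self.keys = []

class ExtensibleHashTable:
    """Directory indexed by the first i bits (the prefix) of a b-bit hash value."""

    def __init__(self, capacity, b=4):
        self.capacity = capacity       # keys per bucket (one block)
        self.b = b                     # length of each hash value in bits
        self.i = 0                     # directory has 2**i entries
        self.directory = [Bucket(0)]

    def h(self, key):
        return key % (1 << self.b)     # hypothetical hash: low b bits of the key

    def _prefix(self, hv, bits):
        return hv >> (self.b - bits)   # first `bits` bits of hv

    def _lookup(self, key):
        return self.directory[self._prefix(self.h(key), self.i)]

    def search(self, key):
        return key in self._lookup(key).keys

    def insert(self, key):
        while True:
            bucket = self._lookup(key)
            if len(bucket.keys) < self.capacity:
                bucket.keys.append(key)
                return
            if bucket.depth == self.b:
                raise ValueError("more than `capacity` keys share one hash value")
            if bucket.depth == self.i:
                # Double the directory: entry p becomes entries 2p and 2p + 1.
                self.directory = [d for d in self.directory for _ in (0, 1)]
                self.i += 1
            self._split(bucket)        # then retry the insert

    def _split(self, bucket):
        j = bucket.depth + 1
        halves = (Bucket(j), Bucket(j))
        for k in bucket.keys:
            # The j-th bit of the hash value picks the new bucket.
            halves[self._prefix(self.h(k), j) & 1].keys.append(k)
        for p, d in enumerate(self.directory):
            if d is bucket:
                # The j-th of the i directory-index bits picks the new bucket.
                self.directory[p] = halves[(p >> (self.i - j)) & 1]

t = ExtensibleHashTable(capacity=2)
for k in (0b0001, 0b1001, 0b1100, 0b1101):
    t.insert(k)
print(t.i, len(t.directory))   # 2 4
```

With two keys per bucket, these four inserts double the directory twice (i goes 0 → 1 → 2), while directory entries 00 and 01 keep sharing the bucket that uses only one bit.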

95 Deletion: the inverse of insertion (work out the details).

96 Textbook Notation (figure): i = …, number of bits in prefix.

97 Extensible Hash Table, one issue: the directory doubles in size during some inserts.

98 Roadmap: B-Tree; B-Tree variants; hash-based indexes (static hash table, extensible hash table, linear hash table).

99 Linear Hash Table. Differences from the extensible hash table: a bucket is identified by a suffix of the hash value, and the table grows linearly (avoiding doubling of the directory).

100 Linear Hash Table (figure: buckets labeled by suffixes).

101-103 Linear Growth (figures): each added bucket is filled by redistributing the keys of an existing bucket.
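A minimal linear hash table sketch. It assumes integer keys used as their own hash values, buckets stored as unbounded Python lists (entries beyond `capacity` stand in for overflow blocks), and a hypothetical growth rule that adds one bucket whenever the table is more than 85% full. The class and parameter names are illustrative.

```python
class LinearHashTable:
    """Buckets are chosen by the last i bits (suffix) of the hash value,
    and buckets are added one at a time."""

    def __init__(self, capacity=2, threshold=0.85):
        self.capacity = capacity    # keys per block (illustrative)
        self.threshold = threshold  # grow when the table is this full (assumed rule)
        self.i = 1
        self.buckets = [[], []]     # n = len(self.buckets); lists allow overflow
        self.count = 0

    def h(self, key):
        return key                  # hypothetical: integer key is its own hash value

    def _index(self, hv):
        m = hv & ((1 << self.i) - 1)        # last i bits
        if m < len(self.buckets):
            return m
        return m - (1 << (self.i - 1))      # bucket m does not exist yet: drop top bit

    def search(self, key):
        return key in self.buckets[self._index(self.h(key))]

    def insert(self, key):
        self.buckets[self._index(self.h(key))].append(key)
        self.count += 1
        if self.count > self.threshold * self.capacity * len(self.buckets):
            self._add_bucket()

    def _add_bucket(self):
        n = len(self.buckets)
        if n == 1 << self.i:                # bucket number n needs one more bit
            self.i += 1
        old = n - (1 << (self.i - 1))       # suffix differs from n only in the top bit
        self.buckets.append([])
        keys, self.buckets[old] = self.buckets[old], []
        for k in keys:                      # redistribute between old and new bucket
            self.buckets[self._index(self.h(k))].append(k)

t = LinearHashTable()
for k in range(6):
    t.insert(k)
print(t.i, t.buckets)   # 2 [[0, 4], [1, 5], [2], [3]]
```

Each growth step touches a single existing bucket, the one whose number differs from the new bucket's only in the top bit, so the table never has to double at once.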

104 What does linear growth buy? (figure: i = …) i is redundant if we know the number of buckets (here, # buckets = 5).
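Why i is redundant: bucket numbers 0 to n - 1 need exactly ceil(log2 n) bits, so i can be derived from the number of buckets. With 5 buckets, numbered 0 to 4, three bits are needed. A one-line sketch (hypothetical helper name):

```python
def bits_for_buckets(num_buckets):
    """Bits i needed to write bucket numbers 0 .. num_buckets - 1,
    i.e. ceil(log2(num_buckets)) for num_buckets >= 1."""
    return (num_buckets - 1).bit_length()

print(bits_for_buckets(5))   # 3
```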

105 What does linear growth buy? (figure) i = 3, n = 3.