
Chapter 13: Indexing (Slides by Hector Garcia-Molina, http://www-db.stanford.edu/~hector/cs245/notes.htm)

Indexing & Hashing
[Figure: given a value, how do we find the record with that value?]

Topics
- Conventional indexes
- B-trees
- Hashing schemes (self-study)

Sequential File
[Figure: a sequential file whose records have keys 10, 20, ..., 100, stored in key order, two records per block.]

Sequential File, Dense Index
[Figure: a dense index holding one entry (key and pointer) for every record of the sequential file.]

Sequential File, Sparse Index
[Figure: a sparse index holding one entry per block of the file (keys 10, 30, 50, 70, ...), each pointing to the block that starts with that key.]

Sequential File, Sparse 2nd level
[Figure: a second-level sparse index (keys 10, 90, 170, 250, ...) with one entry per block of the first-level sparse index (keys 10, 30, 50, 70, ...), which in turn has one entry per block of the file.]

Question: Can we build a dense, 2nd level index for a dense index?

Notes on pointers:
(1) A block pointer (sparse index) can be smaller than a record pointer.
[Figure: block pointer (BP) vs. record pointer (RP).]

Notes on pointers:
(2) If the file is contiguous, then we can omit pointers (i.e., compute them).

Example: say blocks are 1024 B each. If we want the block for K3, get it at offset (3-1) × 1024 = 2048 bytes.
[Figure: index keys K1, K2, K3, K4 stored without pointers, next to contiguous file blocks holding records R1, R2, R3, R4.]
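
The offset arithmetic above can be sketched in a few lines of Python. The function name `block_offset` is made up for illustration; the 1024-byte block size is the one used in the slide's example.

```python
BLOCK_SIZE = 1024  # bytes per block, as in the slide's example


def block_offset(i: int, block_size: int = BLOCK_SIZE) -> int:
    """Byte offset of the i-th block (1-based) in a contiguous file."""
    if i < 1:
        raise ValueError("block numbers start at 1")
    # Blocks 1..i-1 come first, so skip (i - 1) whole blocks.
    return (i - 1) * block_size


print(block_offset(3))  # (3-1) * 1024 = 2048
```

Because the offset is computed from the position of the key in the index, the index only needs to store keys, not pointers.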

Sparse vs. Dense Tradeoff
- Sparse: less index space per record, so we can keep more of the index in memory.
- Dense: can tell if any record exists without accessing the file.
- Also: sparse is better for insertions; dense is needed for secondary indexes.
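
A minimal sketch of the tradeoff, assuming a small in-memory stand-in for the file (the `blocks` list and all function names are hypothetical). The sparse lookup must read a data block to confirm that a key exists; the dense index answers the existence question on its own.

```python
from bisect import bisect_right

# Stand-in for the data file: blocks of records sorted by key.
blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Sparse index: the first key of each block; entry i refers to block i.
sparse_keys = [b[0] for b in blocks]

# Dense index: every key, mapped to (block number, slot in block).
dense = {k: (bi, si) for bi, b in enumerate(blocks) for si, k in enumerate(b)}


def sparse_lookup(key):
    """Find key via the sparse index; needs one data-block access."""
    i = bisect_right(sparse_keys, key) - 1  # last index entry <= key
    if i < 0:
        return None  # key is smaller than every key in the file
    block = blocks[i]  # this is the block access
    return (i, block.index(key)) if key in block else None


def dense_exists(key):
    """Existence check using the dense index only, no file access."""
    return key in dense
```

Here `sparse_keys` has 5 entries and `dense` has 10, which is the space saving the first bullet refers to.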

Terms
- Index sequential file
- Search key (≠ primary key)
- Primary index (on sequencing field)
- Secondary index
- Dense index (all search key values in)
- Sparse index
- Multi-level index

Next:
- Duplicate keys
- Deletion/Insertion
- Secondary indexes

Duplicate keys
[Figure: a sequential file in which the key values 10, 20, and 30 each appear more than once, followed by 40 and 45.]

Duplicate keys: dense index, one way to implement?
[Figure: the index holds one entry per record, so duplicate key values are repeated in the index.]

Duplicate keys: dense index, better way?
[Figure: the index holds one entry per distinct key value (10, 20, 30, 40, 45).]

Duplicate keys: sparse index, one way?
[Figure: a sparse index over the blocks of the file.] Careful if looking for 20 or 30!

Duplicate keys: sparse index, another way?
Place the first new key from each block in the index.
[Figure annotation on one index entry: should this be 40?]

Duplicate values, primary index: summary
The index may point to the first instance of each value only.
[Figure: file with values a, a, a, ..., b and an index with one entry per value.]
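
One way to picture "point to the first instance of each value only" is the sketch below. The record list is an illustrative stand-in for a sorted file with duplicates, and both function names are hypothetical.

```python
from bisect import bisect_left

# Illustrative sorted file with duplicate search-key values.
records = [10, 10, 10, 20, 20, 30, 30, 30, 40, 45]


def build_first_instance_index(keys):
    """One (value, position) entry per distinct value, at its first record."""
    index = []
    for pos, k in enumerate(keys):
        if not index or index[-1][0] != k:
            index.append((k, pos))
    return index


def find_all(index, keys, key):
    """Jump to the first instance, then scan forward while the key matches."""
    idx_keys = [k for k, _ in index]
    i = bisect_left(idx_keys, key)
    if i == len(index) or index[i][0] != key:
        return []
    pos = index[i][1]
    out = []
    while pos < len(keys) and keys[pos] == key:
        out.append(pos)
        pos += 1
    return out
```

The forward scan works because the file is sorted on the search key, so all duplicates of a value are adjacent.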

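Deletion from sparse index
[Figure: file blocks (10, 20), (30, 40), (50, 60), (70, 80) and a sparse index on the first key of each block.]

Deletion from sparse index: delete record 40
[Figure: record 40 is removed from its block; the index is unchanged because 40 is not an index key.]

Deletion from sparse index: delete record 30
[Figure: record 30 is removed, record 40 becomes the first record in its block, and the index entry 30 is changed to 40.]

Deletion from sparse index: delete records 30 & 40
[Figure: the block holding 30 and 40 becomes empty, its index entry is removed, and the following entries (50, 70) move up.]

Deletion from dense index
[Figure: file blocks (10, 20), (30, 40), (50, 60), (70, 80) and a dense index with one entry per record.]

Deletion from dense index: delete record 30
[Figure: record 30 is removed and record 40 moves up in its block; index entry 30 is removed and entry 40 moves up.]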

Insertion, sparse index case
[Figure: file blocks (10, 20), (30, free), (40, 50), (60, free) and a sparse index with keys 10, 30, 40, 60.]

Insertion, sparse index case: insert record 34
Our lucky day! We have free space where we need it!
[Figure: 34 is placed in the free slot after 30.]

Insertion, sparse index case: insert record 15
- Illustrated: immediate reorganization
- Variation: insert new block (chained file), update index
[Figure: 15 goes into the first block, 20 moves to the next block, and the index entry 30 becomes 20.]

Insertion, sparse index case: insert record 25
Overflow blocks (reorganize later...)
[Figure: 25 is placed in an overflow block chained to the block holding 10 and 20.]
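
The "lucky day" and overflow cases can be sketched as follows. This is a toy model under stated assumptions: two records per block, an in-memory `Block` class, and an `insert` function, all hypothetical. It does not implement the immediate-reorganization variant.

```python
from bisect import bisect_right

BLOCK_CAPACITY = 2  # assumption: two records per block, as drawn


class Block:
    def __init__(self, keys):
        self.keys = list(keys)
        self.overflow = []  # keys that did not fit; reorganize later


# File and sparse index as in the figure: index = first key of each block.
blocks = [Block([10, 20]), Block([30]), Block([40, 50]), Block([60])]
index = [10, 30, 40, 60]


def insert(key):
    """Insert key into the block chosen by the sparse index."""
    i = max(bisect_right(index, key) - 1, 0)  # block whose range covers key
    b = blocks[i]
    if len(b.keys) < BLOCK_CAPACITY:
        b.keys.append(key)
        b.keys.sort()
        index[i] = b.keys[0]  # first key may have changed
        return "in place"
    b.overflow.append(key)  # no room: chain into an overflow area
    return "overflow"
```

With this setup, `insert(34)` lands in the free slot next to 30, while `insert(25)` goes to the overflow area of the block holding 10 and 20, and the index stays the same in both cases.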

Insertion, dense index case
- Similar
- Often more expensive...

Secondary indexes
[Figure: a file ordered on the sequence field, in which the secondary key values appear unordered: 30, 50, 20, 70, 80, 40, 100, 10, 90, 60.]

Secondary indexes: sparse index
A sparse index does not make sense!
[Figure: a sparse index with entries 30, 20, 80, 100, 90, ... taken from the first record of each block of the unordered file.]

Secondary indexes: dense index
[Figure: a dense index with entries 10, 20, 30, 40, 50, 60, 70, ... pointing into the unordered file, and a sparse high level with entries 10, 50, 90, ... over the dense index.]

With secondary indexes:
- Lowest level is dense
- Other levels are sparse
Also: pointers are record pointers (not block pointers; not computed)

Summary so far
- Conventional index
- Basic ideas: sparse, dense, multi-level...
- Duplicate keys
- Deletion/Insertion
- Secondary indexes

Conventional indexes
Advantage:
- Simple
- Index is a sequential file, good for scans
Disadvantage:
- Inserts expensive, and/or
- Lose sequentiality & balance

Outline:
- Conventional indexes
- B-Trees (NEXT)
- Hashing schemes (self-study)

NEXT: Another type of index
- Give up on sequentiality of index
- Try to get “balance”

B+Tree Example, n=3
[Figure: root with key 100; second-level nodes with keys 30 and 120, 150, 180; leaves 3 5 11 | 30 35 | 100 101 110 | 120 130 | 150 156 179 | 180 200.]

Sample non-leaf node
Keys 57, 81, 95. The four pointers lead to keys k < 57, 57 ≤ k < 81, 81 ≤ k < 95, and k ≥ 95.
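
The routing rule of the sample non-leaf node (keys below 57 go left, keys from 57 up to but not including 81 go to the second child, and so on) maps directly onto a binary search. A small sketch, where the function name `child_index` is made up:

```python
from bisect import bisect_right


def child_index(keys, k):
    """Index of the child pointer to follow for search key k.

    Child 0 covers k < keys[0]; child i covers keys[i-1] <= k < keys[i];
    the last child covers k >= keys[-1].
    """
    return bisect_right(keys, k)


keys = [57, 81, 95]
for k in (10, 57, 80, 81, 95, 200):
    print(k, "-> child", child_index(keys, k))
```

`bisect_right` is used rather than `bisect_left` so that a search key equal to a node key follows the pointer to its right, matching the ranges on the slide.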

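Sample leaf node
Keys 57, 81, 95. A pointer from the non-leaf node above leads to this leaf. The three record pointers lead to the record with key 57, the record with key 81, and the record with key 95. The last pointer leads to the next leaf in sequence.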

Size of nodes:
- n+1 pointers
- n keys
(fixed)

Don’t want nodes to be too empty. Use at least:
- Non-leaf: ⌈(n+1)/2⌉ pointers
- Leaf: ⌊(n+1)/2⌋ pointers to data

n=3
[Figure: full non-leaf node with keys 120, 150, 180; minimum non-leaf node with key 30; full leaf with keys 3, 5, 11; minimum leaf with keys 30, 35. A pointer counts even if null.]

B+tree rules (tree of order n)
(1) All leaves are at the same lowest level (balanced tree)
(2) Pointers in leaves point to records, except for the “sequence pointer”

(3) Number of pointers/keys for B+tree

                     Max ptrs   Max keys   Min ptrs→data   Min keys
Non-leaf (non-root)  n+1        n          ⌈(n+1)/2⌉       ⌈(n+1)/2⌉ - 1
Leaf (non-root)      n+1        n          ⌊(n+1)/2⌋       ⌊(n+1)/2⌋
Root                 n+1        n          1               1
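
The occupancy limits in rule (3) can be tabulated for a given n with a short helper. This sketch assumes the usual rounding, ceiling for non-leaf nodes and floor for leaves; the function name `bplus_limits` and the dictionary layout are illustrative only.

```python
def bplus_limits(n):
    """Max/min pointers and keys per node type for a B+tree of order n."""
    ceil_half = (n + 2) // 2   # ceil((n+1)/2) in integer arithmetic
    floor_half = (n + 1) // 2  # floor((n+1)/2)
    return {
        "non-leaf": {"max_ptrs": n + 1, "max_keys": n,
                     "min_ptrs": ceil_half, "min_keys": ceil_half - 1},
        "leaf": {"max_ptrs": n + 1, "max_keys": n,
                 "min_ptrs": floor_half, "min_keys": floor_half},
        # Root minimums as given in the table above.
        "root": {"max_ptrs": n + 1, "max_keys": n,
                 "min_ptrs": 1, "min_keys": 1},
    }


for kind, limits in bplus_limits(3).items():
    print(kind, limits)
```

For n=3 this gives a minimum non-leaf node of 2 pointers and 1 key, and a minimum leaf of 2 keys, which agrees with the n=3 full/minimum node picture.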

Insert into B+tree
(a) simple case: space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root

(a) Insert key = 32, n=3
[Figure: the leaf 30, 31 has a free slot, so 32 is added: 30, 31, 32.]

(b) Insert key = 7, n=3
[Figure: the leaf 3, 5, 11 is full; it splits into 3, 5 and 7, 11, and key 7 is added to the parent next to 30.]
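
Cases (a) and (b) can be sketched for a single leaf as below. This is an illustration under simplifying assumptions: a leaf is a plain sorted list of keys, record pointers and the sequence pointer are left out, and the function name `insert_into_leaf` is hypothetical. Updating the parent (and cases (c) and (d)) is not shown.

```python
from bisect import insort


def insert_into_leaf(leaf, key, n):
    """Insert key into a sorted leaf that holds at most n keys.

    Returns (left, right, separator). With no overflow, right and
    separator are None. On overflow the leaf is split in two and the
    separator (first key of the new right leaf) goes to the parent.
    """
    keys = list(leaf)
    insort(keys, key)
    if len(keys) <= n:
        return keys, None, None  # case (a): space available in leaf
    mid = (len(keys) + 1) // 2  # left leaf keeps the larger half on ties
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]  # case (b): leaf overflow


print(insert_into_leaf([30, 31], 32, n=3))
print(insert_into_leaf([3, 5, 11], 7, n=3))
```

With n=3, inserting 7 into the leaf 3, 5, 11 yields the leaves 3, 5 and 7, 11 with 7 sent up to the parent, as in the insert-7 example.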

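(c) Insert key = 160, n=3
[Figure: the leaf 150, 156, 179 splits into 150, 156 and 160, 179. Its parent 120, 150, 180 is full, so it splits as well, and key 160 moves up to the root next to 100.]

(d) New root, insert 45, n=3
[Figure: old root 10, 20, 30 over leaves 1 2 3 | 10 12 | 20 25 | 30 32 40. The last leaf splits into 30, 32 and 40, 45; the old root overflows and splits into 10, 20 and 40; a new root with key 30 is created.]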

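Deletion from B+tree
(a) Simple case (no example)
(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf

(b) Coalesce with sibling, n=4: delete 50
[Figure: parent keys 10, 40, 100; leaves 10, 20, 30 and 40, 50. After 50 is deleted, key 40 moves into the left sibling (10, 20, 30, 40) and key 40 is removed from the parent.]

(c) Redistribute keys, n=4: delete 50
[Figure: parent keys 10, 40, 100; leaves 10, 20, 30, 35 and 40, 50. After 50 is deleted, key 35 moves to the right leaf (35, 40) and the parent key 40 becomes 35.]

(d) Non-leaf coalesce, n=4: delete 37
[Figure: root 25; non-leaf nodes 10, 20 and 30, 40; leaves 1 3 | 10 14 | 20 22 | 25 26 | 30 37 | 40 45. Deleting 37 makes a leaf underflow; after the leaves coalesce, the two non-leaf nodes also coalesce and the merged node becomes the new root.]

B+tree deletions in practice
- Often, coalescing is not implemented
- Too hard and not worth it!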

Variation on B+tree: B-tree (no +)
Idea:
- Avoid duplicate keys
- Have record pointers in non-leaf nodes

[Figure: a B-tree non-leaf node K1 P1 K2 P2 K3 P3. Each Pi points to the record with key Ki; the child pointers lead to keys x < K1, K1 < x < K2, K2 < x < K3, and x > K3.]

B-tree example, n=2
[Figure: root 65, 125; second level 25, 45 | 85, 105 | 145, 165; leaves 10 20 | 30 40 | 50 60 | 70 80 | 90 100 | 110 120 | 130 140 | 150 160 | 170 180.]
Sequence pointers are not useful now! (But keep the space for simplicity.)

Tradeoffs:
- B-trees have faster lookup than B+trees
- In a B-tree, non-leaf & leaf nodes have different sizes
- In a B-tree, deletion is more complicated
B+trees preferred!

But note:
If blocks are fixed size (due to disk and buffering restrictions), then lookup for the B+tree is actually better!!

So...
[Figure: a full 2-level B+tree holds 156 records; a full 2-level B-tree holds 8 records in the root plus 108 records in the leaves, total = 116.]
Conclusion: for fixed block size, the B+tree is better because it is bushier.
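
The 156 vs. 116 comparison can be reproduced with a little arithmetic. The sizes used below (4-byte keys, 4-byte pointers, 100-byte blocks) are an assumption: they are not stated in the transcript, but they are consistent with the totals on the slide. Function names are illustrative.

```python
KEY = 4      # bytes per key (assumed)
PTR = 4      # bytes per pointer (assumed)
BLOCK = 100  # bytes per block (assumed)


def bplus_two_level(block=BLOCK, key=KEY, ptr=PTR):
    """Max records in a full 2-level B+tree."""
    n_root = (block - ptr) // (key + ptr)  # n keys + (n+1) child pointers
    n_leaf = (block - ptr) // (key + ptr)  # n keys + n record ptrs + 1 sequence ptr
    return (n_root + 1) * n_leaf           # only leaves point to records


def btree_two_level(block=BLOCK, key=KEY, ptr=PTR):
    """Max records in a full 2-level B-tree."""
    n_root = (block - ptr) // (key + 2 * ptr)  # n keys + n record ptrs + (n+1) child ptrs
    n_leaf = block // (key + ptr)              # n keys + n record pointers
    return n_root + (n_root + 1) * n_leaf      # root records + leaf records


print(bplus_two_level())  # 13 leaves * 12 records = 156
print(btree_two_level())  # 8 in the root + 9 leaves * 12 = 116
```

The B-tree root spends space on record pointers, so it fits fewer keys and has fewer children (9 instead of 13); that lower fan-out is what "bushier" refers to.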

Outline/summary
- Conventional indexes
  - Sparse vs. dense
  - Primary vs. secondary
- B trees
  - B+trees vs. B-trees
  - B+trees vs. indexed sequential
- Hashing schemes (self-study)