1 More on Indexes Secondary Indexes B-Trees Source: our textbook, slides by Hector Garcia-Molina.

Presentation transcript:

1 More on Indexes Secondary Indexes B-Trees Source: our textbook, slides by Hector Garcia-Molina

2 Secondary Indexes
- Sometimes we want multiple indexes on a relation.
  - Ex: search Candies(name, manf) both by name and by manufacturer.
- Typically the file is sorted on the key (ex: name), and the primary index is on that field.
- The secondary index is on any other attribute (ex: manf).
- A secondary index also helps find records, but it cannot rely on the records being sorted on its search key.

3 Sparse Secondary Index?
- No!
- Since records are not sorted on that key, we cannot predict the location of a record from the location of any other record.
- Thus secondary indexes are always dense.

4 [Figure: file ordered by its sequence field] A sparse index does not make sense!

5 Design of Secondary Indexes
- Always dense, usually with duplicates
- Consists of key-pointer pairs ("key" means search key, not relation key)
- Entries in the index file are sorted by key
- Therefore the second-level index is sparse

6 Secondary indexes [Figure: file ordered by its sequence field, with a dense first-level index and a sparse second-level index]

7 Secondary Index and Duplicate Keys
- The scheme in the previous diagram wastes space in the presence of duplicate keys.
- If a search key value appears n times in the data file, then there are n entries for it in the index.

8 Duplicate values & secondary indexes: one option... [figure]
- Problem: excess overhead!
  - disk space
  - search time

9 Buckets
- To avoid repeating values, use a level of indirection.
- Put buckets between the secondary index file and the data file.
- One entry in the index for each search key K; its pointer goes to a location in a "bucket file", called the bucket for K.
- The bucket holds pointers to all records with search key K.
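
As a rough illustration of the indirection described on this slide, the sketch below builds one index entry per distinct search-key value, each pointing to a bucket of record positions. The function name `build_bucket_index`, the sample manufacturers, and the use of list positions as stand-ins for record pointers are all assumptions made for this example; they do not come from the slides.

```python
from collections import defaultdict

def build_bucket_index(records, attr):
    """Return {search key: bucket}, where a bucket lists record positions.

    Positions in `records` stand in for disk pointers (an assumption).
    """
    buckets = defaultdict(list)
    for pos, rec in enumerate(records):
        buckets[rec[attr]].append(pos)
    # Index entries are kept sorted by search key, one entry per distinct key.
    return dict(sorted(buckets.items()))

# Hypothetical data file sorted on name; manf is the secondary search key.
candies = [
    {"name": "Almond Bar", "manf": "Acme"},
    {"name": "Berry Chew", "manf": "Zeta"},
    {"name": "Choco Dot", "manf": "Acme"},
    {"name": "Mint Drop", "manf": "Zeta"},
    {"name": "Nut Roll", "manf": "Acme"},
]

manf_index = build_bucket_index(candies, "manf")
```

Here each manufacturer appears once in the index even though it occurs several times in the data file; the repeated values live only in the buckets, as pointers.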

10 Duplicate values & secondary indexes: buckets [figure]
- Saves space as long as search keys are larger than pointers and the average key appears at least twice.

11 Why the "bucket" idea is useful
Records: Emp(name, dept, floor, ...)
Indexes:
- name: primary
- dept: secondary
- floor: secondary

12 Query: SELECT name FROM Emp WHERE dept = 'Toy' AND floor = 2
[Figure: dept index and floor index pointing into Emp; entries Toy and 2 highlighted]
- Intersect the Toy dept bucket and the floor 2 bucket to get the set of matching Emp records.
- Saves disk I/Os.
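
The bucket intersection for this query can be sketched as follows. The `emp` rows, the helper `matching_records`, and the use of list positions as record pointers are hypothetical and chosen only to make the example runnable.

```python
def matching_records(dept_bucket, floor_bucket):
    """Intersect two buckets of record pointers without touching the data file."""
    return sorted(set(dept_bucket) & set(floor_bucket))

# Hypothetical Emp(name, dept, floor) rows; list positions act as pointers.
emp = [
    ("Ann", "Toy", 2),
    ("Bob", "Toy", 1),
    ("Cy", "Shoe", 2),
    ("Di", "Toy", 2),
]

# Buckets as the two secondary indexes would hold them.
toy_bucket = [i for i, row in enumerate(emp) if row[1] == "Toy"]
floor2_bucket = [i for i, row in enumerate(emp) if row[2] == 2]

hits = matching_records(toy_bucket, floor2_bucket)
# Only now are the matching records fetched from the data file.
names = [emp[i][0] for i in hits]
```

The saving comes from fetching only the records in the intersection; records that satisfy just one of the two conditions are never read.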

13 Summary of Indexes So Far
- Advantages:
  - simple
  - index is a sequential file, good for scans
- Disadvantages:
  - either inserts are expensive
  - or we lose sequentiality (cf. next slide)
- Instead, use the B-tree data structure to implement the index.

14 Example [Figure: index (sequential) in continuous storage with free space, plus an overflow area (not sequential)]

15 B-Trees
- Several related data structures
- Key features are:
  - automatically adjust the number of index levels as the size of the data file changes
  - storage on blocks is managed to keep every block between half full and full, so no overflow blocks are needed
- We'll actually study B+ trees

16 B-Tree Structure
- An example of a balanced search tree: every root-to-leaf path has the same length.
- Each node (vertex) in the tree is a block, which contains search keys and pointers.
- Parameter n is the largest value such that n+1 pointers and n keys fit in one block.
  - Ex: If the block size is 4096 bytes, keys are 4 bytes, and pointers are 8 bytes, then n = 340.
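
The value n = 340 follows from requiring n keys and n+1 pointers to fit in one block. A minimal sketch of that arithmetic, assuming no block header (the function name is made up for this example):

```python
def max_keys_per_block(block_bytes, key_bytes, ptr_bytes):
    """Largest n with n*key_bytes + (n+1)*ptr_bytes <= block_bytes."""
    # Reserve one pointer, then count how many (key, pointer) pairs fit.
    return (block_bytes - ptr_bytes) // (key_bytes + ptr_bytes)

n = max_keys_per_block(4096, 4, 8)
used = n * 4 + (n + 1) * 8  # bytes actually occupied by n keys and n+1 pointers
```

With these sizes, 340 keys and 341 pointers occupy 4088 bytes, while 341 keys and 342 pointers would need 4100 bytes and no longer fit.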

17 Constraints on B-Tree Nodes
- Keys in leaf nodes are copies of keys from the data file, in sorted order.
- The root contains between 2 and n+1 index node pointers.
- Each internal node contains between ⌈(n+1)/2⌉ and n+1 index node pointers.
- Each non-leaf node consists of ptr_1, key_1, ptr_2, key_2, …, key_{m-1}, ptr_m, where ptr_i points to the index node with keys between key_{i-1} and key_i.

18 Constraints (cont'd)
- Each leaf contains between ⌊(n+1)/2⌋ and n data record pointers, plus a "next leaf" pointer.
- Associated with each data record pointer is a key, and the pointer points to the data record with that key.
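
The occupancy limits from the two constraint slides can be tabulated with a small helper. This is a sketch that assumes the usual textbook rounding (ceiling for internal nodes, floor for leaves); `pointer_bounds` is a name invented here, and the leaf counts exclude the "next leaf" pointer.

```python
def pointer_bounds(n):
    """(min, max) pointer counts per node type for B+ tree parameter n."""
    return {
        "root": (2, n + 1),                  # index node pointers
        "internal": ((n + 2) // 2, n + 1),   # ceil((n+1)/2) .. n+1
        "leaf": ((n + 1) // 2, n),           # floor((n+1)/2) .. n record pointers
    }

bounds_n3 = pointer_bounds(3)
bounds_n4 = pointer_bounds(4)
```

For odd n the internal and leaf minimums coincide; for even n an internal node needs one more pointer than a leaf, which is where the ceiling and floor differ.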

19 Example B-tree nodes [Figure: a leaf and a non-leaf node, each drawn in textbook notation and in a more concise notation. Leaf pointers go to the record with key 30 and to the record with key 35. Non-leaf pointers go to the part of the tree with keys < 30 and to the part with keys ≥ 30.]

20 Sample non-leaf node [Figure: node with keys 57, 81, 95; its pointers lead to keys k < 57, 57 ≤ k < 81, 81 ≤ k < 95, and k ≥ 95]

21 Sample leaf node [Figure: leaf reached from a non-leaf node, with pointers to the record with key 57, the record with key 81, and the record with key 85, plus a pointer to the next leaf in sequence]

22 [Figure: full node and minimum node, for the non-leaf and leaf cases; annotation: counts even if null]

23 B-Tree Example [Figure: example tree from the root down to the leaves, whose pointers go to records]

24 Insert into B+tree
(a) simple case
  - space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root
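
Cases (a) and (b) can be sketched at the level of a single leaf. The code below is an illustration under stated assumptions, not the textbook's algorithm: it handles only the leaf's key list, assumes n = 3, gives the left half ⌈(n+1)/2⌉ keys on a split, and uses the first key of the new right leaf as the separator handed to the parent. The function name and sample keys are made up.

```python
import bisect

def insert_into_leaf(keys, key, n):
    """Insert `key` into a sorted leaf holding at most n keys.

    Returns (left, right, separator). `right` and `separator` are None
    when the key fits (case a); otherwise the leaf is split (case b).
    """
    keys = list(keys)
    bisect.insort(keys, key)
    if len(keys) <= n:
        return keys, None, None
    mid = (len(keys) + 1) // 2          # left leaf keeps ceil((n+1)/2) keys
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]        # separator is copied up to the parent

simple = insert_into_leaf([30, 31], 32, n=3)      # case (a): room in the leaf
overflow = insert_into_leaf([3, 5, 11], 7, n=3)   # case (b): leaf must split
```

Cases (c) and (d) repeat the same idea one level up: if the parent has no room for the new separator, it splits too, and a split of the root creates a new root.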

25 (a) Insert key = 32 [Figure: example tree]

26 (b) Insert key = 7 [Figure: example tree]

27 (c) Insert key = 160 [Figure: example tree]

28 (d) New root, insert 45 [Figure: example tree with the new root marked]

29 Deletion from B-tree
(a) Simple case - no example
(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf

30 (b) Coalesce with sibling [Figure: delete example with n = 4; key 40 is marked in the diagram]

31 (c) Redistribute keys [Figure: delete example with n = 4; key 35 is marked in the diagram]

32 (d) Non-leaf coalesce: delete 37 [Figure: example tree with the new root marked]

33 B-tree deletions in practice
- Often, coalescing is not implemented.
  - Too hard and not worth it!

34 Applications of B-Trees
- A B-tree is used to implement indexes.
- The data record pointers in the leaves correspond to the data record pointers in sequential indexes.
- Some example uses:
  - B-tree search key is the primary key for the data file; leaf pointers form a dense index on the file
  - B-tree search key is the primary key for the data file; leaf pointers form a sparse index on the file
  - B-tree search key is not the primary key; leaf pointers form a dense index on the file

35 B-Trees with Duplicate Keys
Change the definition of B-tree:
- If key K appears in an internal node, then K is the smallest "new" key in the subtree S rooted at the pointer that follows K in the node.
- "New" means K does not appear in the part of the B-tree to the left of S, but it does appear in S.
- Allow a null key in certain situations.

36 Example B-Tree with Duplicates

37 Lookup in B-Trees
- Assume no duplicate keys.
- Assume the B-tree is a dense index.
- To find the record with key K, search starting at the root and ending at a leaf:
  - if the current node is not a leaf and has keys K_1, K_2, …, K_n, find the largest key K_i in the sequence that is ≤ K
  - follow the (i+1)-st pointer to a node at the next level and repeat; if K < K_1, follow the first pointer
  - when a leaf node is reached, find the key with value K and follow the associated pointer to the data record
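
The lookup procedure can be sketched on a small hand-built tree. Everything named here (`Node`, `lookup`, the sample keys, and strings standing in for data records) is an assumption for illustration; the sketch also assumes no duplicate keys and a dense index, as the slide does.

```python
import bisect

class Node:
    """A B+ tree node: non-leaves have `children`, leaves have `records`."""
    def __init__(self, keys, children=None, records=None):
        self.keys = keys
        self.children = children
        self.records = records

def lookup(root, k):
    node = root
    while node.children is not None:
        # Number of keys <= k; 0 means k is below every key, so the
        # first pointer is followed, otherwise the (i+1)-st pointer.
        i = bisect.bisect_right(node.keys, k)
        node = node.children[i]
    # At a leaf: find k itself and follow its record pointer.
    i = bisect.bisect_left(node.keys, k)
    if i < len(node.keys) and node.keys[i] == k:
        return node.records[i]
    return None

# Hypothetical two-level tree; strings stand in for data records.
leaves = [
    Node([3, 5], records=["rec3", "rec5"]),
    Node([7, 11], records=["rec7", "rec11"]),
    Node([30, 35], records=["rec30", "rec35"]),
]
root = Node([7, 30], children=leaves)
```

Calling `lookup(root, 11)` descends through the second pointer of the root, because 7 is the largest root key that is ≤ 11, and then finds 11 in that leaf. A key that is absent, such as 8, reaches the same leaf and returns `None`.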

38 Range Queries with B-Trees
- Range query: a query in which a range of values is sought. Examples:
  - SELECT * FROM R WHERE R.k > 40;
  - SELECT * FROM R WHERE R.k >= 10 AND R.k <= 25;
- To find all keys in the range [a,b]:
  - Do a lookup on a: this leads to the leaf where a could be
  - Search the leaf for all keys ≥ a
  - If we find a key > b, we are done
  - Else follow the next-leaf pointer and continue searching in the next leaf
  - Continue until finding a key > b or there are no more leaves
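
The leaf-chain scan for a range [a,b] can be sketched as below. The sketch assumes the lookup on a has already delivered the starting leaf; the `Leaf` class, the function `range_scan`, and the sample keys are hypothetical.

```python
class Leaf:
    """A leaf with sorted keys and a pointer to the next leaf in sequence."""
    def __init__(self, keys, next_leaf=None):
        self.keys = keys
        self.next_leaf = next_leaf

def range_scan(start_leaf, a, b):
    """Collect all keys in [a, b], starting at the leaf where a could be."""
    found = []
    leaf = start_leaf
    while leaf is not None:
        for k in leaf.keys:
            if k > b:
                return found        # first key past b: done
            if k >= a:
                found.append(k)
        leaf = leaf.next_leaf       # otherwise continue in the next leaf
    return found                    # ran out of leaves

# Hypothetical leaf chain: [3, 5] -> [7, 11] -> [30, 35]
third = Leaf([30, 35])
second = Leaf([7, 11], third)
first = Leaf([3, 5], second)

in_range = range_scan(second, 10, 30)   # a = 10 would belong in the second leaf
```

Only the leaves that overlap the range are read after the initial descent, which is why the next-leaf pointers matter for range queries.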

39 Efficiency of B-Trees
- B-trees allow lookup, insertion and deletion of records with very few disk I/Os.
- The number of disk I/Os is the number of levels in the B-tree plus the cost of any reorganization.
- If n is at least 10, then splitting/merging blocks will be rare and usually limited to the leaves.
- For typical sizes of keys, pointers, blocks and files, 3 levels suffice (see next slide).
- Also can keep the root block of the B-tree in memory.

40 Size of B-Tree
- Assume
  - 4096 bytes per block
  - 4 bytes per key (e.g., integer)
  - 8 bytes per pointer
  - no header info in the block
- Then n = 340 (can keep n keys and n+1 pointers in a block)
- Assume on average a block has 255 pointers
- Count:
  - one node at level 1 (the root)
  - 255 nodes at level 2
  - 255*255 = 65,025 nodes at level 3 (leaves)
  - each leaf has 255 pointers, so the total number of records is more than 16 million
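
The count on this slide can be checked with a few lines of arithmetic. The helper name `nodes_per_level` is made up, and the sketch simply takes the slide's assumed average of 255 pointers per block at every level.

```python
def nodes_per_level(avg_pointers, levels):
    """Node count at each level when every node has avg_pointers children."""
    return [avg_pointers ** depth for depth in range(levels)]

counts = nodes_per_level(255, 3)   # root, level 2, leaves
records = counts[-1] * 255         # each leaf holds 255 record pointers
```

This gives 1, 255, and 65,025 nodes at the three levels and 16,581,375 record pointers in total, which matches the slide's "more than 16 million".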