ECE 569 Database System Engineering, Spring 2003, Lecture 05. Yanyong Zhang, www.ece.rutgers.edu/~yyzhang



Index
- "If you don't find it in the index, look very carefully through the entire catalog."
- An index is a data structure that organizes data records on disk to optimize certain kinds of retrieval operations.
- A data entry is a record stored in an index file. A data entry with search key k, denoted k*, contains enough information to locate one or more data records with search key value k. Three alternatives:
  1. k* is an actual data record.
  2. k* is a (k, tid) pair.
  3. k* is a (k, tid-list) pair.

Clustered Indexes
- When a file is organized so that the ordering of data records is the same as, or close to, the ordering of data entries in some index, we say that the index is clustered.
  - Alternative (1) is clustered by definition.
  - Alternatives (2) and (3) can be clustered only if the data records are sorted on the search key field. Keeping them sorted is expensive, so these indexes are usually unclustered.

Index Data Structures
- Two basic approaches:
  - Hash-based indexing
  - Tree-based indexing
    - ISAM tree
    - B+-tree

Indexed Sequential Access Method (ISAM)
- Highly static.
- Each node is a disk page.
- Leaf pages are allocated first, then index pages, then overflow pages.
- Once the ISAM file is created, inserts and deletes affect only the contents of leaf pages.
[Figure: ISAM layout. Labels: index pages, leaf pages, primary pages, overflow pages.]

ISAM lookup
[Figure: ISAM tree whose leaf entries include 15*, 20*, 27*, 33*, 37*, 40*, 46*, 51*, 55*, 63*, 97*.]
- Primary leaf pages are assumed to be allocated sequentially.
- Is this assumption reasonable?
- Under this assumption, no "next-leaf" pointer is necessary.

ISAM insert
- Insert 23, 48, 41, 42.
[Figure: the same ISAM tree after the inserts, with overflow pages holding 23*, 48*, 41*, and 42*.]
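The insert sequence on this slide can be replayed with a small sketch. Everything below is an assumption made for illustration: the class name `IsamFile`, a page capacity of two entries, and a first leaf key of 10 (the first key of the figure did not survive in the transcript). The index is built once and never changes; a key that lands on a full primary page goes to that page's overflow chain.

```python
from bisect import bisect_right

PAGE_CAPACITY = 2  # assumed number of entries per page


class IsamFile:
    def __init__(self, sorted_keys):
        # Primary leaf pages are allocated once, sequentially, in key order.
        self.primary = [sorted_keys[i:i + PAGE_CAPACITY]
                        for i in range(0, len(sorted_keys), PAGE_CAPACITY)]
        # Static index: first key of every primary page except the first.
        self.separators = [page[0] for page in self.primary[1:]]
        # One chain of overflow pages per primary page, initially empty.
        self.overflow = [[] for _ in self.primary]

    def _page_no(self, key):
        # Number of separators <= key identifies the primary page.
        return bisect_right(self.separators, key)

    def insert(self, key):
        n = self._page_no(key)
        page = self.primary[n]
        if len(page) < PAGE_CAPACITY:
            page.append(key)
            page.sort()
            return
        # Primary page is full: append to the last overflow page,
        # allocating a new overflow page when that one is full too.
        chain = self.overflow[n]
        if not chain or len(chain[-1]) == PAGE_CAPACITY:
            chain.append([])
        chain[-1].append(key)

    def lookup(self, key):
        n = self._page_no(key)
        return key in self.primary[n] or any(key in p for p in self.overflow[n])


isam = IsamFile([10, 15, 20, 27, 33, 37, 40, 46, 51, 55, 63, 97])
for k in (23, 48, 41, 42):
    isam.insert(k)
```

After these four inserts the page starting at 20 has one overflow page holding 23, and the page starting at 40 has a chain of two overflow pages, which is how long chains build up under a static index.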

ISAM delete
- Remove the entry.
- If the page becomes empty:
  - If it is an overflow page, delete it.
  - If it is a primary page, leave it in place as a placeholder.

ISAM discussion
- Pros
  - Index nodes never change after creation, so they do not need to be locked.
- Cons
  - Long chains of overflow pages are a performance bottleneck.

B+-tree
- The tree grows and shrinks dynamically.
- The root index fits in one page and directs the search for records in the index levels below it.
- A B+-tree is balanced, i.e., every path through the tree has the same length. This property is reasonably easy to maintain.
- The large fan-out of index nodes results in few levels. Three levels can address 16M pages (256 records / page).
[Figure: B+-tree of a given depth. Labels: index entries (to direct search), data entries.]

Format of a node
- Index node
  - An index node contains m entries, with $d \le m \le 2d$, where d is called the order of the tree. The root node is only required to have $1 \le m \le 2d$.
  - Layout: $p_0\ K_1\ p_1\ K_2\ p_2\ \dots\ K_m\ p_m$
- Leaf node
  - Leaf nodes contain the data entries.
  - A page contains at most $2e - 1$ records.
  - Records are sorted by key value.
  - Leaf pages form a doubly linked list.

Lookup of key K
- Assume the B+-tree has depth l.
- Construct the path $B_0 B_1 \dots B_{l-1}$, where
  - $B_0$ is the root node;
  - $K_j$ in block $B_{i-1}$ covers K, and the j-th block pointer in $B_{i-1}$ is $B_i$.
- Example: find key K = 245.
  - The path is B0 (168), B1 (220), B7.
  - 245 is not in B7, so 245 is not in the main file.
[Figure: example B+-tree with blocks B0 through B10.]
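The lookup procedure can be sketched in a few lines. This is a minimal illustration, not the course's implementation: the `Index` and `Leaf` classes and the tiny tree are made up, and the sketch assumes the convention that the subtree under $p_j$ holds keys greater than or equal to $K_j$ and less than $K_{j+1}$.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Leaf:
    keys: list        # sorted data entries


@dataclass
class Index:
    keys: list        # K1..Km, sorted
    children: list    # p0..pm, one more than keys


def lookup(root, key):
    """Return (found, path), where path is the list of blocks B0..B(l-1)."""
    node = root
    path = [node]
    while isinstance(node, Index):
        # Number of keys <= key selects the child pointer to follow.
        node = node.children[bisect_right(node.keys, key)]
        path.append(node)
    return key in node.keys, path


# Hypothetical three-level tree (depth l = 3).
l0, l1, l2, l3 = Leaf([5, 10]), Leaf([20, 30]), Leaf([40, 50]), Leaf([60, 70])
i0, i1 = Index([20], [l0, l1]), Index([60], [l2, l3])
root = Index([40], [i0, i1])

found, path = lookup(root, 50)    # key present
missing, _ = lookup(root, 45)     # same path, key absent
```

Every lookup visits exactly one block per level, so the path length equals the depth whether or not the key is found.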

Insertion of record with key K
- Follow the lookup procedure to find the block in which K belongs. The path is $B_0 B_1 \dots B_{l-1}$.
  - If there is room in $B_{l-1}$, insert there, maintaining the sorted order of $B_{l-1}$.
  - Otherwise, allocate a new block B' and split the records evenly between $B_{l-1}$ and B'.
  - Keys in $B_{l-1}$ are less than K', and those in B' are greater than or equal to K'.
    - Insert the entry (K', B') into the parent block $B_{l-2}$. This insertion can itself cause a split.
  - Splitting the root (keeping $B_0$ as the root):
    - Allocate two new blocks $B_l$ (left) and $B_r$ (right).
    - Move half of the keys in the root (keys smaller than K') to $B_l$ and the rest (keys greater than or equal to K') to $B_r$.
    - Modify $B_0$ (the original root) to contain ($B_l$, K', $B_r$).
    - The depth increases to l + 1.
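The leaf-level part of this procedure can be sketched as follows. The function names, the use of plain Python lists for blocks, and the choice of the smallest key of B' as the separator K' are assumptions for illustration; propagating splits up the tree and splitting the root are left out.

```python
from bisect import bisect_right, insort


def insert_into_leaf(leaf, key, capacity):
    """Insert key into a sorted leaf.

    Returns None if the key fit, or (K', new_leaf) if the leaf had to split.
    """
    insort(leaf, key)
    if len(leaf) <= capacity:
        return None
    # Overflow: move the upper half of the records to a new block B'.
    mid = len(leaf) // 2
    new_leaf = leaf[mid:]
    del leaf[mid:]
    # K' is the smallest key in B': keys left of it are < K', keys in B' are >= K'.
    return new_leaf[0], new_leaf


def insert_into_parent(keys, children, sep, new_child):
    """Add the entry (K', B') to a parent index node that still has room."""
    pos = bisect_right(keys, sep)
    keys.insert(pos, sep)
    children.insert(pos + 1, new_child)


leaf, other = [10, 20, 30, 40], [50, 60]
parent_keys, parent_children = [50], [leaf, other]

split = insert_into_leaf(leaf, 25, capacity=4)
if split is not None:
    insert_into_parent(parent_keys, parent_children, *split)
```

In a full implementation the parent insert would perform the same overflow check and, if needed, split the index node and recurse toward the root.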

Delete record with key K
- Look up K and find the path $B_0 B_1 \dots B_{l-1}$.
- Delete the record from $B_{l-1}$.
- If $B_{l-1}$ now contains fewer than e records:
  - If a neighbor B' has more than e records, divide the records between $B_{l-1}$ and B' as evenly as possible. Update any ancestors as necessary to reflect the change.
  - Otherwise, combine the records of $B_{l-1}$ and one of its neighbors B'. B' is removed and (K', B') is removed from the parent. This merge can propagate to the root.
  - If the last two children of the root are combined, the depth decreases.
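The redistribute-or-merge decision at the leaf level can be sketched like this. The function name and return values are made up, leaves are plain sorted lists, and the neighbor is assumed to be the right sibling; updating the parent and propagating merges upward are omitted.

```python
def delete_from_leaf(leaf, right, key, e):
    """Delete key from leaf; rebalance with its right sibling if needed.

    e is the minimum number of records per leaf. Returns "ok",
    "redistributed", or "merged" so the caller knows how to fix the parent.
    """
    leaf.remove(key)
    if len(leaf) >= e:
        return "ok"
    if len(right) > e:
        # The neighbor can spare records: split the union as evenly as possible.
        combined = leaf + right
        mid = len(combined) // 2
        leaf[:] = combined[:mid]
        right[:] = combined[mid:]
        return "redistributed"   # parent separator becomes right[0]
    # Neighbor is at the minimum too: fold it into this leaf.
    leaf.extend(right)
    right.clear()
    return "merged"              # parent drops the entry (K', B')


a, b = [10, 20], [30, 40, 50]
r1 = delete_from_leaf(a, b, 10, e=2)

c, d = [10, 20], [30, 40]
r2 = delete_from_leaf(c, d, 20, e=2)
```

A merge combines e - 1 records with e records, giving 2e - 1, which is exactly the maximum leaf size stated for the node format, so the merged leaf always fits in one page.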

Discussion
- Merge operations carry a high performance penalty. Databases tend to grow, so some merges may not be necessary. Alternatives:
  - Remove blocks only when they become empty.
  - Treat merging as a maintenance operation and perform it periodically.
- What kinds of queries can B+-trees help with?

Dense Indices
- Decouple the allocation of tuples from the access method.
  - Allocate tuples following a heap organization (good utilization).
  - Access tuples using hashing, a B+-tree, etc.
- Access methods must be modified slightly.
  - B+-trees: keys adjacent in key space need not be physically adjacent. Leaf nodes need a tuple pointer for each key value (not each key range), since each tuple can be in a different page.
  - Hashing: hash buckets contain (key value, tuple pointer) pairs.

Secondary Indices
- Primary indices provide access based on the primary key.
- Secondary indices provide access based on search fields other than the primary key.
- An index can be used to cluster tuples.
  - A sparse B-tree can be easily modified for this purpose.

Secondary Indexes (cont'd)
- Non-clustered indexes

Performance
- A lookup requires l block accesses, where l is the depth.
- The depth depends directly on the fan-out of the index nodes.
- Definitions (sparse B+-tree):
  - n = number of records
  - R = number of records / block (max)
  - F = number of index entries / block (max)
  - u = average node occupancy
  - $R_{eff} = R \cdot u$ = average number of records / page
  - $F_{eff} = F \cdot u$ = average number of index entries / page
- $n \le F_{eff}^{\,l-1} \cdot R_{eff} \;\Rightarrow\; l = \lceil \log_{F_{eff}} (n / R_{eff}) \rceil + 1$

Performance (cont'd)
- Utilization
  - If nodes are merged as described above, u ≈ 69%.
  - If nodes are removed only when empty:
    - # inserts = # deletes: u ≈ 40%
    - 60% inserts, 40% deletes: u ≈ 60%

Example
- 4000 bytes / block
- 200 bytes / record
- Key requires 20 bytes
- Block pointer requires 4 bytes
- n = [value lost in transcript]
- What is the depth? (4)
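The arithmetic behind this example can be checked with a short sketch. Two things below are assumptions rather than values from the slide: the record count `n_assumed` is a placeholder, because the original n did not survive in the transcript, and F is computed by assuming an index block stores m keys plus m + 1 block pointers. The depth is found by growing the tree level by level until $F_{eff}^{\,l-1} \cdot R_{eff} \ge n$, which avoids floating-point logarithms.

```python
BLOCK_BYTES = 4000
RECORD_BYTES = 200
KEY_BYTES = 20
POINTER_BYTES = 4

# Maximum records per leaf block.
R = BLOCK_BYTES // RECORD_BYTES
# Maximum keys per index block: m keys and m + 1 pointers must fit (assumed layout).
F = (BLOCK_BYTES - POINTER_BYTES) // (KEY_BYTES + POINTER_BYTES)


def btree_depth(n, records_per_block, entries_per_block, occupancy=1.0):
    """Smallest depth l such that F_eff**(l-1) * R_eff >= n."""
    r_eff = records_per_block * occupancy
    f_eff = entries_per_block * occupancy
    depth = 1
    capacity = r_eff            # a single leaf holds R_eff records
    while capacity < n:
        capacity *= f_eff       # one more index level multiplies capacity
        depth += 1
    return depth


n_assumed = 1_000_000           # hypothetical; the slide's n is unknown
depth_full = btree_depth(n_assumed, R, F)
depth_69 = btree_depth(n_assumed, R, F, occupancy=0.69)
```

With these parameters R is 20 and F is 166, so three levels cover about 551,000 records at full occupancy and the placeholder million-record file needs a fourth level.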

Key compression
- Tree height can be reduced by increasing the fan-out of index nodes.
- Key compression increases the number of keys that can be stored in an index node.
  - Suffix compression
    - Store only enough of each key value to discriminate between the children of the index node.
    - Example: an index node with keys artful, deliver, hand over the leaves
      access alert amass | artful boom deal | deliver everyone fiddle | hand integral leaf
    - needs to store only ar, del, h over the same leaves.
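One way to compute such shortened keys is to take, for each pair of adjacent children, the shortest prefix of the right child's smallest key that still sorts after the left child's largest key. The function name below is made up and the sketch assumes plain string comparison; it reproduces the ar, del, h result from the slide's example.

```python
def shortest_separator(left_max, right_min):
    """Shortest prefix of right_min that compares greater than left_max.

    Assumes left_max < right_min under ordinary string ordering.
    """
    for length in range(1, len(right_min) + 1):
        prefix = right_min[:length]
        if prefix > left_max:
            return prefix
    return right_min


leaves = [
    ["access", "alert", "amass"],
    ["artful", "boom", "deal"],
    ["deliver", "everyone", "fiddle"],
    ["hand", "integral", "leaf"],
]

# One separator per boundary between adjacent leaves.
separators = [shortest_separator(left[-1], right[0])
              for left, right in zip(leaves, leaves[1:])]
```

The separator only has to route searches correctly, so it need not be a key that actually occurs in the file.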

Key compression (cont'd)
  - Prefix compression
    - Rather than storing each key value in full, store its difference from the previous key value.
    - Represent key i as (j, key_i'), where j is the length of the common prefix shared by key i and key i-1, and key_i' is the remainder of key i after the common prefix is removed.
    - The keys can, cannon, canter, cantor, capacity, capital (36 bytes) can be encoded as (0, can), (3, non), (3, ter), (4, or), (2, pacity), (3, ital) (27 bytes, counting one byte per prefix length).
    - How would you decide which of these two techniques to use in a particular situation?
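Prefix compression is easy to try out on the slide's six keys. The function names are made up, and the byte count assumes one byte per character and one byte for each prefix length j.

```python
def prefix_encode(keys):
    """Encode sorted keys as (j, remainder) pairs relative to the previous key."""
    encoded = []
    prev = ""
    for key in keys:
        j = 0
        # Length of the common prefix with the previous key.
        while j < min(len(prev), len(key)) and prev[j] == key[j]:
            j += 1
        encoded.append((j, key[j:]))
        prev = key
    return encoded


def prefix_decode(encoded):
    """Rebuild the full keys by reusing j characters of the previous key."""
    keys = []
    prev = ""
    for j, rest in encoded:
        key = prev[:j] + rest
        keys.append(key)
        prev = key
    return keys


keys = ["can", "cannon", "canter", "cantor", "capacity", "capital"]
encoded = prefix_encode(keys)

raw_bytes = sum(len(k) for k in keys)
packed_bytes = sum(1 + len(rest) for _, rest in encoded)
```

Decoding has to walk the keys in order from the start of the node, so the space saving comes at the cost of sequential rather than binary search within a node.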