ICS 421 Spring 2010 Indexing (1) Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 02/18/20101Lipyeow Lim.

Slides:



Advertisements
Similar presentations
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes Chapter 9.
Advertisements

Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 18 Indexing Structures for Files.
1 Tree-Structured Indexes Module 4, Lecture 4. 2 Introduction As for any index, 3 alternatives for data entries k* : 1. Data record with key value k 2.
B+-trees. Model of Computation Data stored on disk(s) Minimum transfer unit: a page = b bytes or B records (or block) N records -> N/B = n pages I/O complexity:
CS4432: Database Systems II
CS CS4432: Database Systems II Basic indexing.
B+-tree and Hashing.
Tree-Structured Indexes Lecture 5 R & G Chapter 9 “If I had eight hours to chop down a tree, I'd spend six sharpening my ax.” Abraham Lincoln.
Tree-Structured Indexes Lecture 17 R & G Chapter 9 “If I had eight hours to chop down a tree, I'd spend six sharpening my ax.” Abraham Lincoln.
Tree-Structured Indexes. Introduction v As for any index, 3 alternatives for data entries k* : À Data record with key value k Á Â v Choice is orthogonal.
1 Tree-Structured Indexes Yanlei Diao UMass Amherst Feb 20, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
Introduction to Database Systems1 Indexing Techniques Storage Technology: Topic 4.
1 Tree-Structured Indexes Chapter Introduction  As for any index, 3 alternatives for data entries k* :  Data record with key value k   Choice.
ISAM: Indexed-Sequential-Access-Method
Indexing (cont.). Insertion in a B+ Tree Another B+ Tree
Tree-Structured Indexes Lecture 5 R & G Chapter 10 “If I had eight hours to chop down a tree, I'd spend six sharpening my ax.” Abraham Lincoln.
1 B+ Trees. 2 Tree-Structured Indices v Tree-structured indexing techniques support both range searches and equality searches. v ISAM : static structure;
CS4432: Database Systems II
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes Chapter 9.
Tree-Structured Indexes. Range Searches ``Find all students with gpa > 3.0’’ –If data is in sorted file, do binary search to find first such student,
Storage and Indexing February 26 th, 2003 Lecture 19.
Introduction to Database Systems1 B+-Trees Storage Technology: Topic 5.
 B+ Tree Definition  B+ Tree Properties  B+ Tree Searching  B+ Tree Insertion  B+ Tree Deletion.
Tree-Structured Indexes Lecture 5 R & G Chapter 10 “If I had eight hours to chop down a tree, I'd spend six sharpening my ax.” Abraham Lincoln.
1.1 CS220 Database Systems Tree Based Indexing: B+-tree Slides courtesy G. Kollios Boston University via UC Berkeley.
Tree-Structured Indexes Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY courtesy of Joe Hellerstein for some slides.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Tree- and Hash-Structured Indexes Selected Sections of Chapters 10 & 11.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes Chapter 9.
1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:
Index tuning-- B+tree. overview Overview of tree-structured index Indexed sequential access method (ISAM) B+tree.
B+ Tree Index tuning--. overview B + -Tree Scalability Typical order: 100. Typical fill-factor: 67%. –average fanout = 133 Typical capacities (root at.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 B+-Tree Index Chapter 10 Modified by Donghui Zhang Nov 9, 2005.
1 Database Systems ( 資料庫系統 ) November 1, 2004 Lecture #8 By Hao-hua Chu ( 朱浩華 )
Storage and Indexing. How do we store efficiently large amounts of data? The appropriate storage depends on what kind of accesses we expect to have to.
Data on External Storage – File Organization and Indexing – Cluster Indexes - Primary and Secondary Indexes – Index data Structures – Hash Based Indexing.
1 Tree-Structured Indexes Chapter Introduction  As for any index, 3 alternatives for data entries k* :  Data record with key value k   Choice.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes Content based on Chapter 10 Database Management Systems, (3 rd.
I/O Cost Model, Tree Indexes CS634 Lecture 5, Feb 12, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Tree-Structured Indexes Chapter 10
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes Chapter 10.
Tree-Structured Indexes R & G Chapter 10 “If I had eight hours to chop down a tree, I'd spend six sharpening my ax.” Abraham Lincoln.
Database Applications (15-415) DBMS Internals- Part III Lecture 13, March 06, 2016 Mohammad Hammoud.
Tree-Structured Indexes. Introduction As for any index, 3 alternatives for data entries k*: – Data record with key value k –  Choice is orthogonal to.
Tree-Structured Indexes: Introduction
Tree-Structured Indexes
Tree-Structured Indexes
COP Introduction to Database Structures
Database Applications (15-415) DBMS Internals- Part III Lecture 15, March 11, 2018 Mohammad Hammoud.
B+-Trees and Static Hashing
Tree-Structured Indexes
CS222/CS122C: Principles of Data Management Notes #07 B+ Trees
Introduction to Database Systems Tree Based Indexing: B+-tree
Tree-Structured Indexes
B+Trees The slides for this text are organized into chapters. This lecture covers Chapter 9. Chapter 1: Introduction to Database Systems Chapter 2: The.
Tree-Structured Indexes
Tree-Structured Indexes
Indexing 1.
Database Systems (資料庫系統)
Storage and Indexing.
Database Systems (資料庫系統)
General External Merge Sort
Tree-Structured Indexes
Tree-Structured Indexes
Indexing February 28th, 2003 Lecture 20.
Tree-Structured Indexes
Tree-Structured Indexes
Tree-Structured Indexes
CS222/CS122C: Principles of Data Management UCI, Fall 2018 Notes #06 B+ trees Instructor: Chen Li.
CS222P: Principles of Data Management UCI, Fall Notes #06 B+ trees
Presentation transcript:

ICS 421 Spring 2010 Indexing (1) Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 02/18/20101Lipyeow Lim -- University of Hawaii at Manoa

How to speed up queries? 02/18/2010Lipyeow Lim -- University of Hawaii at Manoa2 SELECT * FROM Sailors WHERE age>40 File of Record for Sailors Array of Sailor Tuples/Records

Binary Search Trees Given search value – if value < node.value, then follow left pointer – Else follow right pointer How do generalize each index node to an index page ? How do we generalize this to search pages of records ? 02/18/2010Lipyeow Lim -- University of Hawaii at Manoa

Indexes What do we store in the index nodes ? Let k be the key value for an index entry: 1.Data record with key value k What kind of queries does the index support? – Range – Point (or equality) 02/18/2010Lipyeow Lim -- University of Hawaii at Manoa4

Indexed Sequential Access Method (ISAM) Static (m+1)-way Search Tree 02/18/2010Lipyeow Lim -- University of Hawaii at Manoa5 P 0 K 1 P 1 K 2 P 2 K m P m index entry Non-leaf Pages Overflow page Primary pages Leaf

ISAM: Example Store data record at the leaf pages Do we still need the file of record ? 02/18/2010Lipyeow Lim -- University of Hawaii at Manoa6 sid sname rating age Insert new record with age 98

ISAM Facts File creation: Leaf (data) pages allocated sequentially, sorted by search key; then index pages allocated, then space for overflow pages. Index entries: ; they `direct’ search for data entries, which are in leaf pages. Search: Start at root; use key comparisons to go to leaf. Cost=O(log F N) ; F = # entries/index pg, N = # leaf pgs Insert: Find leaf data entry belongs to, and put it there. If full, allocate and put in overflow page Delete: Find and remove from leaf; if empty overflow page, de-allocate. Static tree structure: inserts/deletes affect only leaf pages. 02/18/2010Lipyeow Lim -- University of Hawaii at Manoa7

B+ Tree Index Insert/delete at log F N cost; keep tree height-balanced. (F = fanout, N = # leaf pages) Minimum 50% occupancy (except for root). Each node contains d <= m <= 2d entries. The parameter d is called the order of the tree. Supports equality and range- searches efficiently. 02/18/2010Lipyeow Lim -- University of Hawaii at Manoa8 B+ Tree Index Data Entries/Leaf Pages (“Sequence Set”) Index Entries

B+ Tree: Search Example Leaf entries store pairs What is the order ? Search for: age=5, age=15, age>=24 02/18/2010Lipyeow Lim -- University of Hawaii at Manoa

Inserting a new data entry Find correct leaf L. Put data entry onto L. – If L has enough space, done! – Else, must split L (into L and a new node L2) Redistribute entries evenly, copy up middle key. Insert index entry pointing to L2 into parent of L. This can happen recursively – To split index node, redistribute entries evenly, but push up middle key. (Contrast with leaf splits.) Splits “grow” tree; root split increases height. – Tree growth: gets wider or one level taller at top. 02/18/2010Lipyeow Lim -- University of Hawaii at Manoa10

Example: Insert 8* 02/18/2010Lipyeow Lim -- University of Hawaii at Manoa copied up to parent node pushed up into parent node 17

Deleting a data entry Start at root, find leaf L where entry belongs. Remove the entry. – If L is at least half-full, done! – If L has only d-1 entries, Try to re-distribute, borrowing from sibling (adjacent node with same parent as L). If re-distribution fails, merge L and sibling. If merge occurred, must delete entry (pointing to L or sibling) from parent of L. Merge could propagate to root, decreasing height. 02/18/2010Lipyeow Lim -- University of Hawaii at Manoa12

Miscellaneous How do we handle data with duplicates ? – Overflow buckets – Make rid part of the key – Each data entry stores Clustered vs Unclustered indexes 02/18/2010Lipyeow Lim -- University of Hawaii at Manoa13

Bulk Loading a B+ Tree If we have a large collection of records, and we want to create a B+ tree on some field, doing so by repeatedly inserting records is very slow. Bulk Loading can be done much more efficiently. Initialization: Sort all data entries, insert pointer to first (leaf) page in a new (root) page. 02/18/2010Lipyeow Lim -- University of Hawaii at Manoa14 3* 4* 6*9*10*11*12*13* 20*22* 23*31* 35* 36*38*41*44* Sorted pages of data entries; not yet in B+ tree Root

Bulk Loading (cont.) Index entries for leaf pages always entered into right-most index page just above leaf level. When this fills up, it splits. (Split may go up right-most path to the root.) Much faster than repeated inserts, especially when one considers locking! 02/18/2010Lipyeow Lim -- University of Hawaii at Manoa15

Creating Indexes Most DBMS (eg. DB2) supports only B+ tree indexes: CREATE INDEX myIdx ON mytable(col1, col3) CREATE UNIQUE INDEX myUniqIdx ON mytable(col2, col5) CREATE INDEX myIdx ON mytable(col1, col3) CLUSTER If a primary key is specified in the CREATE TABLE statement, an (unclustered) index is automatically created for the PK. To create a clustered PK index: – Create table without PK constraint – Create index on PK with cluster option – Alter table to add PK constraint To get rid of unused indexes: DROP INDEX myIdx; 02/18/2010Lipyeow Lim -- University of Hawaii at Manoa16