

1 Lecture 18: Indexes Monday, November 10, 2003

2 Midterm Problem 1a: select student.sname, avg(takes.grade) from student, takes where student.sid = takes.sid group by student.sid, student.sname

3 Midterm Problem 1b: select student.sname from student where 3.0 < all (select takes.grade from takes where student.sid = takes.sid) Another solution: use “not in”

4 Midterm Problem 1c: Most people got the SQL right, or almost right. select name from student where sid not in (select takes.sid from takes, courses where takes.cid = courses.cid and courses.dept <> student.dept) Another solution: “student.dept = ALL…”

5 Midterm Problem 2: Very few people got this detail right. [E/R diagram with the labels River, Sea, Water, name, endsIn, and two isa links]

6 Midterm Problem 3a: Customer → Department. Problem 3b: AC, BC, BD. Most people got it right!

7 Midterm Problem 4a /db/website[document/author/text()="John Smith"]/url

8 Midterm Problem 4b This was the hard question: very few got it right. Notice: the question had a second interpretation.
for $r in /db
let $a := distinct-values( for $w in $r/website, $x in $w/document/author/text() where count(distinct-values($w/document[author/text()=$x]/word/text())) > 5000 return $x )
for $z in $a return $z

9 Midterm Problem 4c This was easy, yet few got it right –Out of time ? Website(url) Document(docID, url, author) Word(docID, word) Href(docID, href)

10 Outline: Index structures (13.1, 13.2); B-trees (13.3). Administrative: Phase 2 of the project due Monday, 11/17; HW3 due on Friday, 11/21.

11 Indexes An index on a file speeds up selections on the search key fields for the index. –Any subset of the fields of a relation can be the search key for an index on the relation. –Search key is not the same as key (minimal set of fields that uniquely identify a record in a relation). An index contains a collection of data entries, and supports efficient retrieval of all data entries with a given key value k.
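
A minimal sketch of the idea above, assuming an in-memory list stands in for the file and list positions stand in for record pointers. The records and the names `build_index` and `lookup` are hypothetical, chosen only for illustration. The search key here is age, which is not a key of the relation: several records share the same value.

```python
from collections import defaultdict

# Hypothetical records (sid, name, age); list positions play the role of record pointers.
records = [
    (1, "sue", 20),
    (2, "bob", 22),
    (3, "cal", 20),
    (4, "joe", 25),
]

def build_index(records, key_field):
    """Build data entries: search-key value -> positions of the matching records."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        index[record[key_field]].append(position)
    return index

def lookup(index, records, k):
    """Retrieve all records with search-key value k without scanning the whole file."""
    return [records[position] for position in index.get(k, [])]

# The search key is the age field (index 2), not the key sid.
age_index = build_index(records, key_field=2)
print(lookup(age_index, records, 20))
```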

12 Index Classification Primary/secondary –Primary = may reorder data according to index –Secondary = cannot reorder data Clustered/unclustered –Clustered = records close in the index are close in the data –Unclustered = records close in the index may be far in the data Dense/sparse –Dense = every key in the data appears in the index –Sparse = the index contains only some keys B+ tree / Hash table / …

13 Primary Index File is sorted on the index attribute Dense index: sequence of (key,pointer) pairs

14 Primary Index Sparse index
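
The difference between the dense and the sparse primary index can be sketched as follows. This is an illustration under stated assumptions, not the layout from the figures: the file is a hypothetical list of sorted blocks with unique integer keys, and `sparse_lookup` is an invented name.

```python
from bisect import bisect_right

# Hypothetical sorted file: each inner list is one block of records (keys only).
blocks = [[10, 20], [30, 40], [50, 60], [70, 80]]

# Dense index: one (key, block number) pair for every record in the file.
dense = [(key, b) for b, block in enumerate(blocks) for key in block]

# Sparse index: one entry per block, holding the first key stored in that block.
sparse = [block[0] for block in blocks]

def sparse_lookup(key):
    """Return the number of the block holding key, or None if key is absent."""
    i = bisect_right(sparse, key) - 1  # last block whose first key is <= key
    if i < 0:
        return None                    # key is smaller than every key in the file
    return i if key in blocks[i] else None

print(len(dense), len(sparse))
print(sparse_lookup(40), sparse_lookup(45))
```

The sparse index is smaller by a factor of the number of records per block, but it only works because the file is sorted on the index attribute.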

15 Primary Index with Duplicate Keys Dense index:

16 Primary Index with Duplicate Keys Sparse index: pointer to lowest search key in each block. [Figure: the block found through the index is marked "is here...", and a neighboring block that can hold the same key is marked "...but need to search here too".]

17 Primary Index with Duplicate Keys Better: pointer to lowest new search key in each block. [Figure: annotations "Search for 20", "is here...", "...ok to search from here", and the key 30.] Search for 15? 35?
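
One possible reading of the "lowest new search key" rule, as a sketch. The block contents, the choice to emit no index entry for a block that introduces no new key, and the names `build_index` and `search` are all assumptions made for this illustration.

```python
from bisect import bisect_right

# Hypothetical sorted file with duplicate keys; each inner list is one block.
blocks = [[10, 10], [10, 20], [20, 30], [30, 30], [40, 50]]

def build_index(blocks):
    """One (key, block number) entry per block that introduces a new key."""
    index = []
    prev_last = None  # last key of the previous block
    for b, block in enumerate(blocks):
        for key in block:
            if prev_last is None or key > prev_last:
                index.append((key, b))  # lowest key not seen in an earlier block
                break
        prev_last = block[-1]
    return index

index = build_index(blocks)
index_keys = [key for key, _ in index]

def search(k):
    """Return (block, slot) positions of every record with key k."""
    i = bisect_right(index_keys, k) - 1  # entry with the largest key <= k
    if i < 0:
        return []
    hits = []
    for b in range(index[i][1], len(blocks)):
        if blocks[b][0] > k:             # this block starts past k: stop scanning
            break
        hits.extend((b, slot) for slot, key in enumerate(blocks[b]) if key == k)
    return hits

print(index)
print(search(20), search(15), search(35))
```

Because each entry marks the first block in which its key appears, the scan can start at the indexed block and never needs to look at an earlier one.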

18 Secondary Indexes To index attributes other than the primary key. Always dense (why?)

19 Clustered/Unclustered Primary indexes = usually clustered Secondary indexes = usually unclustered

20 Clustered vs. Unclustered Index [Figure: two panels, CLUSTERED and UNCLUSTERED, each showing the data entries (index file) pointing to the data records (data file).]

21 Secondary Indexes Applications: –index attributes other than the primary key –index unsorted files (heap files) –index clustered data

22 Applications of Secondary Indexes Secondary indexes are needed for heap files. Also for clustered data: Company(name, city), Product(pid, maker)
Select city From Company, Product Where name=maker and pid=“p045”
Select pid From Company, Product Where name=maker and city=“Seattle”
[Figure: Company 1, products of company 1; Company 2, products of company 2; Company 3, products of company 3]

23 Composite Search Keys: search on a combination of fields.
–Equality query: every field value is equal to a constant value. E.g. wrt <sal,age> index: age=20 and sal=75
–Range query: some field value is not a constant. E.g.: age=20; or age=20 and sal > 10
Examples of composite key indexes using lexicographic order, over data records (name, age, sal) sorted by name (bob, cal, joe, sue; e.g. sue has age 13 and sal 75):
–Data entries sorted by <age,sal>: 11,80; 12,10; 12,20; 13,75
–Data entries sorted by <sal,age>: 10,12; 20,12; 75,13; 80,11
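
A small sketch of why lexicographic order matters for a composite index on age and sal. The (age, sal) pairs are the ones on the slide; the record ids and the function names are hypothetical. Both queries below map to one contiguous run of data entries, which is what lets a sorted index answer them efficiently.

```python
from bisect import bisect_left, bisect_right

# (age, sal) pairs from the slide; the record ids 0..3 are hypothetical.
records = {0: (12, 10), 1: (11, 80), 2: (12, 20), 3: (13, 75)}

# Data entries of a composite index on (age, sal), kept in lexicographic order.
entries = sorted((key, rid) for rid, key in records.items())
keys = [key for key, _ in entries]

def equality(age, sal):
    """Equality query: every field of the search key is a constant."""
    lo = bisect_left(keys, (age, sal))
    hi = bisect_right(keys, (age, sal))
    return [rid for _, rid in entries[lo:hi]]

def age_equals_sal_above(age, min_sal):
    """Range query: age = constant and sal > min_sal."""
    lo = bisect_right(keys, (age, min_sal))       # skip entries with sal <= min_sal
    hi = bisect_right(keys, (age, float("inf")))  # end of the run for this age
    return [rid for _, rid in entries[lo:hi]]

print(equality(12, 20))
print(age_equals_sal_above(12, 10))
```

A condition on sal alone, such as sal > 10, does not select a contiguous run in this order, so an index sorted on sal first would suit that query better.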

24 B+ Trees Search trees Idea in B Trees: –make 1 node = 1 block Idea in B+ Trees: –Make leaves into a linked list (range queries are easier)

25 B+ Trees Basics Parameter d = the degree. Each node has >= d and <= 2d keys (except root). [Figure: inner node whose subtrees hold the keys k < 30, 30 <= k < 120, 120 <= k < 240, and 240 <= k.] Each leaf has >= d and <= 2d keys. [Figure: leaf with a pointer to the next leaf.]

26 B+ Tree Example d = 2 Find the key 40. [Figure: B+ tree; the search compares 40 with the keys of one node per level on the way down to a leaf.]

27 B+ Tree Design How large d ? Example: –Key size = 4 bytes –Pointer size = 8 bytes –Block size = 4096 bytes. A full node holds 2d keys and 2d+1 pointers, so 2d x 4 + (2d+1) x 8 <= 4096, which gives d = 170.
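
The arithmetic on this slide can be rechecked with a few lines. This is only a sketch; the function name `max_degree` is hypothetical. Solving 2d x key + (2d+1) x pointer <= block for d gives d <= (block - pointer) / (2 x (key + pointer)).

```python
def max_degree(block_size, key_size, pointer_size):
    """Largest d such that 2d keys and 2d+1 pointers fit in one block."""
    return (block_size - pointer_size) // (2 * (key_size + pointer_size))

d = max_degree(block_size=4096, key_size=4, pointer_size=8)
used = 2 * d * 4 + (2 * d + 1) * 8  # bytes occupied by a full node
print(d, used)
```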

28 Searching a B+ Tree Exact key values: –Start at the root –Proceed down, to the leaf. Range queries: –As above –Then sequential traversal
Select name From people Where age = 25
Select name From people Where 20 <= age and age <= 30
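
The two search procedures can be sketched on a tiny hand-built tree. The classes `Leaf` and `Inner`, the key values, and the two-level shape are assumptions for illustration; a real B+ tree stores record pointers next to the keys and keeps one node per disk block.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Leaf:
    keys: List[int]
    next: Optional["Leaf"] = None  # linked list of leaves, used by range queries

@dataclass
class Inner:
    keys: List[int]
    children: list                 # len(children) == len(keys) + 1

def find_leaf(node, k):
    """Start at the root and proceed down to the leaf that may contain k."""
    while isinstance(node, Inner):
        # Child i holds keys with keys[i-1] <= key < keys[i].
        node = node.children[bisect_right(node.keys, k)]
    return node

def search(root, k):
    """Exact-match query."""
    return k in find_leaf(root, k).keys

def range_query(root, lo, hi):
    """Find the leaf for lo, then traverse the leaves sequentially up to hi."""
    leaf, out = find_leaf(root, lo), []
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return out
            if k >= lo:
                out.append(k)
        leaf = leaf.next
    return out

# Hypothetical tree with d = 2: four leaves under a single root.
l4 = Leaf([80, 85, 90])
l3 = Leaf([50, 60, 65], l4)
l2 = Leaf([20, 25, 30], l3)
l1 = Leaf([10, 15, 18], l2)
root = Inner([20, 50, 80], [l1, l2, l3, l4])

print(search(root, 25))           # corresponds to: age = 25
print(range_query(root, 20, 30))  # corresponds to: 20 <= age and age <= 30
```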

29 B+ Trees in Practice Typical order: 100. Typical fill-factor: 67%. –average fanout = 133 Typical capacities: –Height 4: 133^4 = 312,900,721 records –Height 3: 133^3 = 2,352,637 records Can often hold top levels in buffer pool: –Level 1 = 1 page = 8 Kbytes –Level 2 = 133 pages = about 1 Mbyte –Level 3 = 17,689 pages = about 138 MBytes
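
The capacity figures follow directly from the average fanout and can be recomputed as below. This is a sketch; the helper names are hypothetical, and the 8 Kbyte page size is taken from the "Level 1" line of the slide.

```python
FANOUT = 133   # average fanout from the slide
PAGE_KB = 8    # page size implied by "1 page = 8 Kbytes"

def capacity(height):
    """Number of records reachable through a tree of the given height."""
    return FANOUT ** height

def level_pages(level):
    """Number of pages on a level, counting the root as level 1."""
    return FANOUT ** (level - 1)

for height in (3, 4):
    print(f"height {height}: {capacity(height):,} records")

for level in (1, 2, 3):
    pages = level_pages(level)
    print(f"level {level}: {pages:,} pages, {pages * PAGE_KB / 1024:.1f} MB")
```

Computed exactly, 133^4 is 312,900,721, and the third level comes to roughly 138 MB when 1 MB is taken as 1024 KB.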

30 Insertion in a B+ Tree Insert (K, P): Find leaf where K belongs, insert. If no overflow (2d keys or less), halt. If overflow (2d+1 keys), split node, insert in parent. [Figure: a node with keys K1 K2 K3 K4 K5 and pointers P0 P1 P2 P3 P4 P5 splits into a left node (K1 K2; P0 P1 P2) and a right node (K4 K5; P3 P4 P5), and K3 goes to the parent.] If leaf, keep K3 too in right node. When root splits, new root has 1 key only.

31 Insertion in a B+ Tree Insert K=19

32 Insertion in a B+ Tree After insertion

33 Insertion in a B+ Tree Now insert 25

34 Insertion in a B+ Tree After insertion

35 Insertion in a B+ Tree But now have to split!

36 Insertion in a B+ Tree After the split
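
The split rule from the "Insert (K, P)" slide can be sketched on plain key lists. The function names and the leaf contents are hypothetical; only the inserted keys 19 and 25 come from the slides. Parent updates and the recursive propagation of splits are left out.

```python
from bisect import insort

def insert_into_leaf(keys, k, d):
    """Insert k into a leaf; return (left, right, separator).

    right and separator are None when the leaf does not overflow.
    """
    keys = list(keys)
    insort(keys, k)
    if len(keys) <= 2 * d:
        return keys, None, None        # no overflow: halt
    # Overflow with 2d+1 keys: the middle key goes up to the parent and,
    # because this is a leaf, it is kept in the right node as well.
    separator = keys[d]
    return keys[:d], keys[d:], separator

def split_inner(keys, children, d):
    """Split an inner node holding 2d+1 keys; the middle key only moves up."""
    separator = keys[d]
    left = (keys[:d], children[:d + 1])
    right = (keys[d + 1:], children[d + 1:])
    return left, right, separator

# Hypothetical leaves with d = 2.
print(insert_into_leaf([10, 15, 18], 19, d=2))      # fits, no split
print(insert_into_leaf([20, 30, 40, 50], 25, d=2))  # overflows and splits
print(split_inner(["K1", "K2", "K3", "K4", "K5"],
                  ["P0", "P1", "P2", "P3", "P4", "P5"], d=2))
```

The only difference between the two cases is whether the middle key stays in the right node: leaves must keep every key, while inner nodes hold separators only.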

37 Deletion from a B+ Tree Delete

38 Deletion from a B+ Tree After deleting May change to 40, or not

39 Deletion from a B+ Tree Now delete 25

40 Deletion from a B+ Tree After deleting 25: need to rebalance. Rotate.

41 Deletion from a B+ Tree Now delete 40

42 Deletion from a B+ Tree After deleting 40: rotation not possible. Need to merge nodes.

43 Deletion from a B+ Tree Final tree
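
The choice between rotating and merging after a deletion can be sketched for two adjacent leaves. The function name and the leaf contents are hypothetical, and the corresponding update of the parent node is only indicated by the returned separator.

```python
def rebalance_leaves(left, right, d):
    """Repair an underflow in one of two adjacent leaves (sorted key lists).

    Returns (left, right, separator), or (merged, None, None) after a merge,
    in which case the parent loses one key and one pointer.
    """
    left, right = list(left), list(right)
    if len(left) < d and len(right) > d:
        left.append(right.pop(0))        # rotate: borrow the smallest key of right
    elif len(right) < d and len(left) > d:
        right.insert(0, left.pop())      # rotate: borrow the largest key of left
    elif len(left) < d or len(right) < d:
        return left + right, None, None  # sibling has only d keys: merge
    return left, right, right[0]         # separator = smallest key in right leaf

# Hypothetical leaves with d = 2.
print(rebalance_leaves([10, 15, 18, 19], [20], d=2))  # rotation is possible
print(rebalance_leaves([19, 20], [50], d=2))          # rotation not possible: merge
```

A merge always fits in one node, since the two leaves together hold at most (d - 1) + d = 2d - 1 keys.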