Indexes. An index on an attribute A of a relation is a data structure that makes it efficient to find those tuples that have a fixed value for attribute A.

CMPT 354 Views and Indexes Spring 2012 Instructor: Hassan Khosravi.

Indexes

An index on an attribute A of a relation is a data structure that makes it efficient to find those tuples that have a fixed value for attribute A.
- It helps with queries in which the attribute A is compared with a constant, for instance A = 3, or even A <= 3.

The key for the index can be
- any attribute or
- a set of attributes, and
- it need not be the key for the relation on which the index is built.

The most important data structure used by a typical DBMS is the "B-tree,"
- which is a generalization of a balanced binary tree.
- We will talk about B-trees later (in another lecture).

B-Tree (we will talk in detail later)

Recursive procedure to find the record with search key K:
- If we are at a leaf, look among the keys there. If the i-th key is K, then the i-th pointer will take us to the desired record.
- If we are at an internal node with keys K1, K2, …, Kn, then if K < K1 we follow the first pointer, if K1 <= K < K2 we follow the second pointer, and so on.

Try to find a record with search key 40.
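The recursive procedure can be sketched in Python as follows. This is only an illustration: the `Leaf` and `Internal` classes, the `lookup` function, and the small example tree are assumptions made here (the slide's own figure is not part of this transcript), and strings stand in for record pointers.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Leaf:
    keys: List[int]
    records: List[str]  # stand-ins for pointers to records


@dataclass
class Internal:
    keys: List[int]          # K1 < K2 < ... < Kn
    children: List["Node"]   # n + 1 child pointers


Node = Union[Leaf, Internal]


def lookup(node: Node, k: int) -> Optional[str]:
    """Follow the slide's recursive procedure; return None if k is absent."""
    if isinstance(node, Leaf):
        # At a leaf: if the i-th key is k, the i-th pointer is the answer.
        for i, key in enumerate(node.keys):
            if key == k:
                return node.records[i]
        return None
    # At an internal node: bisect_right counts the keys that are <= k.
    # 0 means k < K1 (first pointer), 1 means K1 <= k < K2 (second), etc.
    return lookup(node.children[bisect_right(node.keys, k)], k)


def make_leaf(keys: List[int]) -> Leaf:
    return Leaf(keys, [f"rec{key}" for key in keys])


# Hypothetical example tree, chosen only for illustration.
root = Internal(
    keys=[13],
    children=[
        Internal(keys=[7], children=[make_leaf([2, 3, 5]), make_leaf([7, 11])]),
        Internal(
            keys=[23, 31, 43],
            children=[
                make_leaf([13, 17, 19]),
                make_leaf([23, 29]),
                make_leaf([31, 37, 41]),
                make_leaf([43, 47]),
            ],
        ),
    ],
)
```

In this hypothetical tree, `lookup(root, 40)` descends to the leaf holding 31, 37, 41 and returns `None`, because 40 is not stored there, while `lookup(root, 37)` returns the stand-in record for key 37.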

Motivation for Indexes

Consider:
SELECT * FROM Movie WHERE studioName = 'Disney' AND year = 1990;

There might be 10,000 Movies tuples, of which only 200 were made in 1990.
- The naive way to implement this query is to get all 10,000 tuples and test the condition of the WHERE clause on each.
- It is much more efficient if we have some way of getting only the 200 tuples from the year 1990 and testing each of them to see if the studio was Disney.
- It is even more efficient if we can obtain directly only the 10 or so tuples that satisfy both conditions of the WHERE clause.

Declaring Indexes

Examples:
CREATE INDEX YearIndex ON Movies(year);
CREATE INDEX KeyIndex ON Movies(title, year);

How does the second compare to:
CREATE INDEX KeyIndex ON Movies(year, title);

When would it be beneficial to create the third rather than the second?

Dropping an index:
DROP INDEX YearIndex;
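One way to experiment with these declarations is Python's built-in sqlite3 module. The sketch below is an illustration only: the column list for Movies, the helper names `plan` and `demo`, and the choice of SQLite are assumptions made here, and other DBMSs may plan the same queries differently.

```python
import sqlite3


def plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's textual query plan for one SQL statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the description


def demo():
    conn = sqlite3.connect(":memory:")
    # Assumed schema; the slides only name the attributes title, year, studioName.
    conn.execute(
        "CREATE TABLE Movies (title TEXT, year INTEGER, length INTEGER, studioName TEXT)"
    )
    conn.execute("CREATE INDEX KeyIndex ON Movies(title, year)")

    # A condition on the leading attribute (title) can use KeyIndex.
    by_title = plan(conn, "SELECT * FROM Movies WHERE title = 'Star Wars'")
    # A condition on year alone cannot seek into an index that leads with title.
    by_year_before = plan(conn, "SELECT * FROM Movies WHERE year = 1990")

    conn.execute("CREATE INDEX YearIndex ON Movies(year)")
    by_year_after = plan(conn, "SELECT * FROM Movies WHERE year = 1990")

    conn.execute("DROP INDEX YearIndex")
    remaining = [
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
    ]
    conn.close()
    return by_title, by_year_before, by_year_after, remaining
```

Printing the four values returned by `demo()` shows which plans mention KeyIndex or YearIndex. In SQLite the attribute order of a multi-attribute index matters because lookups start from the leading attribute.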

Selection of Indexes

Trade-off:
- The existence of an index on an attribute may greatly speed up the execution of those queries in which a value, or range of values, is specified for that attribute, and may speed up joins involving that attribute as well.
- On the other hand, every index built for one or more attributes of some relation makes insertions, deletions, and updates to that relation more complex and time-consuming.

Cost Model

1. Tuples of a relation are stored in many pages (blocks) of a disk.
2. One block, which is typically at least several thousand bytes (e.g. 16K), will hold many tuples.
3. To examine even one tuple requires that the whole block be brought into main memory.
4. There is a great time saving if the block you want is already in main memory, but for simplicity we shall assume that never to be the case, and every block we need must be retrieved from the disk.
5. The cost of a query is dominated by the number of block accesses. Main memory accesses can be neglected.
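Under this model, the cost of scanning a whole relation is simply its number of blocks. A minimal sketch of that arithmetic follows; the 160-byte tuple size is a hypothetical figure chosen here, block headers are ignored, and the function names are illustrative.

```python
import math


def tuples_per_block(block_bytes: int, tuple_bytes: int) -> int:
    """How many whole tuples fit in one block (no header overhead assumed)."""
    return block_bytes // tuple_bytes


def full_scan_cost(num_tuples: int, block_bytes: int, tuple_bytes: int) -> int:
    """Block accesses needed to read every tuple once, per the cost model."""
    return math.ceil(num_tuples / tuples_per_block(block_bytes, tuple_bytes))


# 10,000 tuples of an assumed 160 bytes each in 16K blocks.
scan_cost = full_scan_cost(10_000, 16 * 1024, 160)
```

With these assumed sizes, 102 tuples fit in a block, so a full scan of 10,000 tuples costs 99 block accesses, whereas fetching a single tuple whose location is known costs one.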

Some Useful Indexes

Often, the most useful index we can put on a relation is an index on its key. Two reasons:
- Queries in which a value for the key is specified are common.
- Since there is at most one tuple with a given key value, the index returns either nothing or one location for a tuple. Thus, at most one page of the relation must be retrieved to get that tuple into main memory.

Example:
SELECT name FROM Movie, MovieExec WHERE title = 'Star Wars' AND producerC = cert;

Some Useful Indexes

Without key indexes: read each of the blocks of Movies and each of the blocks of MovieExec at least once.
- In fact, since these blocks may be too numerous to fit in main memory at the same time, we may have to read each block from disk many times.

With key indexes: only two block reads.
- An index on the key (title, year) for Movies helps us find the one Movies tuple for 'Star Wars' quickly. Only one block, the one containing that tuple, is read from disk.
- Then, after finding the producer-certificate number in that tuple, an index on the key cert for MovieExec helps us quickly find the one tuple for the producer in the MovieExec relation. Again, only one block is read.

Non-Beneficial Indexes

When the index is not on a key, it may or may not be beneficial.

Example (of an index not being beneficial): Suppose the only index we have on Movies is one on year, and we want to answer the query:
SELECT * FROM Movie WHERE year = 1990;
- Suppose the tuples of Movie are stored alphabetically by title.
- Then this query gains little from the index on year. If there are, say, 100 movies per page, there is a good chance that any given page has at least one movie made in 1990.
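The "good chance" can be estimated with a short calculation. The sketch below assumes the figures from the motivating example still apply (200 of 10,000 movies were made in 1990) and that release years are spread independently across pages; both are simplifying assumptions made here, and the function names are illustrative.

```python
def prob_page_has_match(match_fraction: float, tuples_per_page: int) -> float:
    """Chance that a page holds at least one matching tuple, assuming independence."""
    return 1.0 - (1.0 - match_fraction) ** tuples_per_page


def expected_pages_read(num_pages: int, match_fraction: float, tuples_per_page: int) -> float:
    """Expected number of pages containing at least one matching tuple."""
    return num_pages * prob_page_has_match(match_fraction, tuples_per_page)


# 200 of 10,000 movies are from 1990; 100 movies per page gives 100 pages.
p_hit = prob_page_has_match(200 / 10_000, 100)
pages = expected_pages_read(100, 200 / 10_000, 100)
```

Under these assumptions roughly 87% of the pages contain a 1990 movie, so using the index on year still reads about 87 of the 100 pages, barely better than scanning them all.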

Some Useful Indexes

There are two situations in which an index can be effective, even if it is not on a key.

1. If the attribute is almost a key; that is, relatively few tuples have a given value for that attribute. Even if each of the tuples with a given value is on a different page, we shall not have to retrieve many pages from disk.

Example: Suppose Movies had an index on title rather than (title, year).
SELECT name FROM Movie, MovieExec WHERE title = 'King Kong' AND producerC = cert;

Some Useful Indexes

2. If the tuples are "clustered" on the indexed attribute. We cluster a relation on an attribute by grouping the tuples with a common value for that attribute onto as few pages as possible. Then, even if there are many tuples, we shall not have to retrieve nearly as many pages as there are tuples.

Example: Suppose Movies had an index on year and tuples are clustered on year.
SELECT * FROM Movie WHERE year = 1990;
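The effect of clustering can be illustrated by counting the pages that hold matching tuples under two storage orders. The layout below is synthetic and hypothetical: it assumes 10,000 tuples, 100 per page, with the 200 movies from 1990 spread evenly through the unclustered file, and the function names are illustrative.

```python
from typing import List


def pages_touched(years: List[int], target: int, tuples_per_page: int) -> int:
    """Number of distinct pages holding at least one tuple with year == target."""
    return len({i // tuples_per_page for i, y in enumerate(years) if y == target})


def build_unclustered(num_tuples: int = 10_000, matches: int = 200) -> List[int]:
    """Synthetic file order in which the 1990 movies are spread evenly."""
    step = num_tuples // matches
    # Non-matching tuples get years 1950..1989, so only every step-th is 1990.
    return [1990 if i % step == 0 else 1950 + i % 40 for i in range(num_tuples)]


unclustered = build_unclustered()
clustered = sorted(unclustered)  # storing tuples in year order clusters them on year

unclustered_pages = pages_touched(unclustered, 1990, 100)
clustered_pages = pages_touched(clustered, 1990, 100)
```

In this synthetic layout the 200 matching tuples sit on all 100 pages when the file is not clustered on year, but on only 2 pages once it is, which is why the same index on year becomes worthwhile.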

Calculating the Best Indexes to Create

StarsIn(movieTitle, movieYear, starName)

Q1: SELECT movieTitle, movieYear FROM StarsIn WHERE starName = s;
Q2: SELECT starName FROM StarsIn WHERE movieTitle = t AND movieYear = y;
I: INSERT INTO StarsIn VALUES(t, y, s);

Assumptions

1. StarsIn occupies 10 pages, so if we need to examine the entire relation the cost is 10.
2. On the average, a star has appeared in 3 movies and a movie has 3 stars.
3. Since the tuples for a given star or a given movie are likely to be spread over the 10 pages of StarsIn, even if we have an index on starName or on the combination of movieTitle and movieYear, it will take 3 disk accesses to find the 3 tuples for a star or movie. If we have no index on the star or movie, respectively, then 10 disk accesses are required.
4. One disk access is needed to read a page of the index every time we use that index to locate tuples with a given value for the indexed attribute(s). If an index page must be modified (in the case of an insertion), then another disk access is needed to write back the modified page.
5. Likewise, in the case of an insertion, one disk access is needed to read a page on which the new tuple will be placed, and another disk access is needed to write back this page. We assume that, even without an index, we can find some page on which an additional tuple will fit, without scanning the entire relation.

Costs

- p1 is the fraction of times Q1 is executed
- p2 is the fraction of times Q2 is executed
- 1 - p1 - p2 is the fraction of times I is executed

Discussion

- If p1 = p2 = 0.1, then the expression 2 + 8p1 + 8p2 is the smallest, so we would prefer not to create any indexes.
- If p1 = p2 = 0.4, then the formula 6 - 2p1 - 2p2 turns out to be the smallest, so we would prefer indexes on both starName and on the (movieTitle, movieYear) combination.
- If p1 = 0.5 and p2 = 0.1, then an index on stars only gives the best average value, because 4 + 6p2 is the formula with the smallest value.
- If p1 = 0.1 and p2 = 0.5, then create an index on only movies, because by symmetry 4 + 6p1 is then the formula with the smallest value.
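The average costs quoted in the discussion can be re-derived from the five assumptions. The sketch below is one reading of those assumptions, not the slides' own table (which is not part of this transcript); the function names and the option labels are illustrative.

```python
from typing import Dict


def average_costs(p1: float, p2: float) -> Dict[str, float]:
    """Average disk accesses per operation for each choice of indexes."""
    scan = 10                 # assumption 1: examine all 10 pages
    matching_tuples = 3       # assumptions 2-3: 3 tuples on 3 different pages
    lookup = 1 + matching_tuples  # assumption 4: one index page read, then 3 tuples
    index_update = 2          # assumption 4: read and write back one index page
    insert_base = 2           # assumption 5: read and write back one data page
    p_insert = 1 - p1 - p2

    # (cost of Q1, cost of Q2, cost of I) for each option
    options = {
        "no index": (scan, scan, insert_base),
        "star index": (lookup, scan, insert_base + index_update),
        "movie index": (scan, lookup, insert_base + index_update),
        "both indexes": (lookup, lookup, insert_base + 2 * index_update),
    }
    return {
        name: p1 * q1 + p2 * q2 + p_insert * ins
        for name, (q1, q2, ins) in options.items()
    }


def best_choice(p1: float, p2: float) -> str:
    """Option with the smallest average cost for the given workload mix."""
    costs = average_costs(p1, p2)
    return min(costs, key=costs.get)
```

Expanding the weighted sums gives 2 + 8p1 + 8p2 with no index, 4 + 6p2 with the star index, 4 + 6p1 with the movie index, and 6 - 2p1 - 2p2 with both. Calling `best_choice` with the four workload mixes from the discussion reproduces its conclusions.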