ICOM 6005 – Database Management Systems Design Dr. Manuel Rodríguez-Martínez Electrical and Computer Engineering Department Lecture 12 – Introduction to.

Slides:



Advertisements
Similar presentations
Query Optimization CS634 Lecture 12, Mar 12, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Advertisements

Evaluation of Relational Operators CS634 Lecture 11, Mar Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
ICOM 6005 – Database Management Systems Design Dr. Manuel Rodríguez-Martínez Electrical and Computer Engineering Department Lecture 11 – Hash-based Indexing.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Relational Query Optimization Chapters 14.
Database Management Systems 3ed, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 14, Part B.
Implementation of Other Relational Algebra Operators, R. Ramakrishnan and J. Gehrke1 Implementation of other Relational Algebra Operators Chapter 12.
Database Management Systems, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Database Management Systems, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Advanced Databases: Lecture 2 Query Optimization (I) 1 Query Optimization (introduction to query processing) Advanced Databases By Dr. Akhtar Ali.
1 Overview of Query Evaluation Chapter Objectives  Preliminaries:  Core query processing techniques  Catalog  Access paths to data  Index matching.
SPRING 2004CENG 3521 Query Evaluation Chapters 12, 14.
1 Relational Query Optimization Module 5, Lecture 2.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Overview of Query Evaluation Chapter 12.
Relational Query Optimization (this time we really mean it)
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Overview of Query Evaluation Chapter 12.
CSCI 5708: Query Processing I Pusheng Zhang University of Minnesota Feb 3, 2004.
1 Optimization - Selection. 2 The Selection Operation Table: Reserves(sid, bid, day, agent) A page (block) can hold 100 Reserves tuples There are 1,000.
Query Optimization II R&G, Chapters 12, 13, 14 Lecture 9.
Query Optimization 3 Cost Estimation R&G, Chapters 12, 13, 14 Lecture 15.
1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Relational Query Optimization Chapter 15.
CSCI 5708: Query Processing I Pusheng Zhang University of Minnesota Feb 3, 2004.
Query Optimization Overview Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems December 2, 2004 Some slide content derived.
Query Optimization, part 2 CS634 Lecture 13, Mar Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Overview of Implementing Relational Operators and Query Evaluation
Introduction to Database Systems1 Relational Query Optimization Query Processing: Topic 2.
ICOM 6005 – Database Management Systems Design Dr. Manuel Rodríguez-Martínez Electrical and Computer Engineering Department Lecture 14 – Join Processing.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Query Evaluation Chapter 12: Overview.
1 Overview of Query Evaluation Chapter Overview of Query Evaluation  Plan : Tree of R.A. ops, with choice of alg for each op.  Each operator typically.
Database systems/COMP4910/Melikyan1 Relational Query Optimization How are SQL queries are translated into relational algebra? How does the optimizer estimates.
Advanced Databases: Lecture 6 Query Optimization (I) 1 Introduction to query processing + Implementing Relational Algebra Advanced Databases By Dr. Akhtar.
CPSC 404, Laks V.S. Lakshmanan1 Evaluation of Relational Operations: Other Operations Chapter 14 Ramakrishnan & Gehrke (Sections ; )
Copyright © Curt Hill Query Evaluation Translating a query into action.
Database Systems/comp4910/spring20031 Evaluation of Relational Operations Why does a DBMS implements several algorithms for each algebra operation? What.
1 Database Systems ( 資料庫系統 ) December 3, 2008 Lecture #10.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Overview of Implementing Relational Operators and Query Evaluation Chapter 12.
Lecture 1- Query Processing Advanced Databases Masood Niazi Torshiz Islamic Azad university- Mashhad Branch
ICOM 6005 – Database Management Systems Design Dr. Manuel Rodríguez-Martínez Electrical and Computer Engineering Department Lecture 13 – Query Evaluation.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Introduction to Query Optimization Chapter 13.
Relational Operator Evaluation. Overview Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g.,
CPSC 404, Laks V.S. Lakshmanan1 Overview of Query Evaluation Chapter 12 Ramakrishnan & Gehrke (Sections )
ICOM 6005 – Database Management Systems Design Dr. Manuel Rodríguez-Martínez Electrical and Computer Engineering Department Lecture 15 – Query Optimization.
ICOM 5016 – Introduction to Database Systems Lecture 13- File Structures Dr. Bienvenido Vélez Electrical and Computer Engineering Department Slides by.
File Processing : Query Processing 2008, Spring Pusan National University Ki-Joune Li.
Alon Levy 1 Relational Operations v We will consider how to implement: – Selection ( ) Selects a subset of rows from relation. – Projection ( ) Deletes.
1 Database Systems ( 資料庫系統 ) Chapter 12 Overview of Query Evaluation November 22, 2004 By Hao-hua Chu ( 朱浩華 )
1 Overview of Query Evaluation Chapter Outline  Query Optimization Overview  Algorithm for Relational Operations.
Chapter 13: Query Processing
Database Applications (15-415) DBMS Internals- Part VIII Lecture 19, March 29, 2016 Mohammad Hammoud.
Database Applications (15-415) DBMS Internals- Part IX Lecture 20, March 31, 2016 Mohammad Hammoud.
Database Applications (15-415) DBMS Internals- Part VIII Lecture 17, Oct 30, 2016 Mohammad Hammoud.
Database Management System
Chapter 12: Query Processing
Introduction to Query Optimization
Evaluation of Relational Operations: Other Operations
Relational Operations
CS222P: Principles of Data Management Notes #11 Selection, Projection
Database Applications (15-415) DBMS Internals- Part VI Lecture 15, Oct 23, 2016 Mohammad Hammoud.
Database Applications (15-415) DBMS Internals- Part IX Lecture 21, April 1, 2018 Mohammad Hammoud.
Overview of Query Evaluation
Overview of Query Evaluation
Overview of Query Evaluation
ICOM 5016 – Introduction to Database Systems
CS222: Principles of Data Management Notes #11 Selection, Projection
Evaluation of Relational Operations: Other Techniques
Database Systems (資料庫系統)
Overview of Query Evaluation
Evaluation of Relational Operations: Other Techniques
CS222/CS122C: Principles of Data Management UCI, Fall 2018 Notes #10 Selection, Projection Instructor: Chen Li.
ICOM 5016 – Introduction to Database Systems
Presentation transcript:

ICOM 6005 – Database Management Systems Design Dr. Manuel Rodríguez-Martínez Electrical and Computer Engineering Department Lecture 12 – Introduction to Query Processing ©Manuel Rodriguez – All rights reserved

ICOM 6005Dr. Manuel Rodriguez Martinez2 Query Evaluation Techniques Read : –Chapter 12, sec –Chapter 13 Purpose: –Study different algorithms to execute (evaluate) SQL relational operators Selection Projection Joins Aggregates Etc.

ICOM 6005Dr. Manuel Rodriguez Martinez3 Relational DBMS Architecture Disk Space Management Buffer Management File and Access Methods Relational Operators Query Optimizer Query Parser Client API Client DB Execution Engine Concurrency and Recovery

ICOM 6005Dr. Manuel Rodriguez Martinez4 Processing a query Parser – –transforms query into a tree expression (parse tree) –Looks into catalog for metadata about tables, attributes, operators, etc. Optimizer –transforms query expression tree into a query plan tree Tree with metadata about relations, attributes, operators, and algorithms to run each operator –Searches several alternative query plan trees to find the cheapest Based on I/O cost, CPU cost (and network cost) Execution Engine –Takes the plan from optimizer, interprets it and runs it.

ICOM 6005Dr. Manuel Rodriguez Martinez5 Issues in selecting a query plan Need to understand cost of different plans –Plan – algorithm to run a given relational operator –Example- Selection: “Get all students with gpa = 4.00” Plan 1 – scan heap file to find records with gap 4.00 Plan 2 – Use an index on gpa to find records with 4.00 Plan 3 – Sort table students on gap attribute, then scan sorted records for those with gpa = 4.00 Need to have statistics about: –Relation size, attribute size –Distribution of attribute values Uniform vs skewed –Disk speed –Memory size –Etc.

ICOM 6005Dr. Manuel Rodriguez Martinez6 System Catalog The System catalog (Catalog for short) –Collection of tables with data about the database data (metadata) and system configuration Often called data dictionary Closure property –Catalog is a collection of tables –Can be queried via SQL!!! Easy way to find out information about the system Every DBMS vendor has its own way to organize the catalog –Also have quick mechanism to read the information

ICOM 6005Dr. Manuel Rodriguez Martinez7 What information is stored? Table –Table name –File that stores and file type (heap file, clustered B+ tree, …) –Attributes names and types –Indices defined on the table –Integrity constrains –Statistics Index –Index name and type of structure (B+ tree, external hash, …) –Search key –Statistics Views –View name and definition

ICOM 6005Dr. Manuel Rodriguez Martinez8 What are the statistics? Relational Cardinality –Number of tuples in table R : Ntuples(R) Relation Size –Number of pages used to store R : NPages(R) Index Cardinality –Number of distinct key values in index I : NKeys(I) Index size –Number of pages for index I: NPages(I) B+-tree – number of leaf pages Number of Tuples Per Page –(Avg) Number of tuples in a page of R or I: NTPages(R) Index Height –Number of non-leaf levels: IHeight(I) Index Range –Min and max values in the index: ILow(I) and IHigh(I)

ICOM 6005Dr. Manuel Rodriguez Martinez9 Some issues about stats … DBMS must periodically update statistics Often done once or twice per week –More fine tuning can be done to increase accuracy Histograms can also be used –Give a better distribution of values Relation attributes and Indices search keys Tradeoff –Computing stats is expensive –Accurate stats yield very good query evaluation plans Often stats are a bit inaccurate, plus assumptions are simplified –Ex. : attribute values are distributed independently

ICOM 6005Dr. Manuel Rodriguez Martinez10 Sample Catalog Tables: –Sailors(sid:integer, sname:string, rating:integer, age:real); –Reservations(sid:integer, bid:integer, day:dates,rname:string); Catalog table for attributes: –Attribut_Cat(attr_name:string, rel_name:string, type:string, position:integer)

ICOM 6005Dr. Manuel Rodriguez Martinez11 Catalog Instance: Attribute_Cat Attr_nameRel_nameTypePosition Attr_nameAttribute_Catstring1 Rel_nameAttribute_Catstring2 TypeAttribute_Catstring3 PositionAttribute_CatInteger4 SidSailorsInteger1 SnameSailorsstring2 RatingSailorsinteger3 AgeSailorsreal4 SidReservesinteger1 BidReservesInteger2 DayReservesDates3 rnameReservesstring4

ICOM 6005Dr. Manuel Rodriguez Martinez12 Access paths An access path is a mechanism (algorithm) to retrieve tuples from a table(s). –Also called access method Three main schemes used for access paths –File Scan - iterate over all tuples in a table –Indexing - Use an index to extract records Good for selections –Partitioning – sort or hash the data before the tuples are examined Good for joins, aggregation, and projections

ICOM 6005Dr. Manuel Rodriguez Martinez13 Single-Table access paths Consider SQL query: Select sid, slogin, sname From Students Where gpa == 4.00 and age < 25; Need to read tuples and evaluate where clause! Accessing a table mostly done via two kinds of access paths –File Scan Read each page on data file (e.g. heap file) and get every tuple, then evaluate predicate –Index Scan Use B+-tree or Hashing to find tuples that match a condition (or part of it) on where clause

ICOM 6005Dr. Manuel Rodriguez Martinez14 SQL and relational algebra Typically query parser will transform SQL query into parse tree, and then into query plan tree Query plan tree is a tree that represents a relational algebra expression! Every node in the tree implements a relational operator or the scan operator. The scan operator is used to fetch tuples from disk –First access path that gets evaluated! The tuples processed by a given node are then passed to its parent on the plan tree.

ICOM 6005Dr. Manuel Rodriguez Martinez15 Conjuctive Normal form Where clause is assumed to be a conjunction: –Attr1 op value1 AND Attr2 op value2 … and AttrK op value K Each term Attr1 op value1 is called a predicate or conjunct. Each conjunct is a filter for the tuples. –Those tuples that do not pass the conjunct are excluded from the result Disjunctive normal form is also used but is less amicable to optimization Conjuncts are prime candidates for evaluation using an index.

ICOM 6005Dr. Manuel Rodriguez Martinez16 Index Matching Given a query Q, with a predicate (conjunct) p, we say that an index I matches the predicate p if and only if –the index can be used to retrieve just the tuples that satisfy p. The index might match all the conjuncts in the selection or just a few (or even just one) Primary conjuncts – conjuncts that are matched by the index If the index matches the whole where clause, index is enough to implement the selection Otherwise, use index to fetch tuples, evaluate the other predicates iteratively –Each predicate will filter out unwanted tuples

ICOM 6005Dr. Manuel Rodriguez Martinez17 Rules for matching Hash index: one or more conjunct in the form attribute = value in a selection have attributes that match the index search key. –Need to include all attributes in search key Tree Index (B+-tree or ISAM): one or more terms of the form attribute op value in a selection have attributes that match a prefix of the search key. –op can be any of : >,, >=, <= Note on prefixes: –If the search key for a B+tree is: the following are prefixes of this search key:,, –The following are not prefixes:,

ICOM 6005Dr. Manuel Rodriguez Martinez18 Examples on matching Assume tables: –Sailors(sid:integer, sname:string, rating:integer, age:real); –Reservations(sid:integer, bid:integer, day:dates,rname:string); Hash Index on Reservations & key –Matches: rname = “Bob” and bid = 5 and sid = 10 –Not matching : rname = “Bob”; rname = Bob and bid = 5 Matching a prefix tells me nothing, since hash key depends on the while set of attributes! Same scenario, but with B+-tree –Matches: rname = “Bob” and bid = 5 and sid = 10; rname = “Bob” and bid = 5; rname = “Bob” –Not matching: bid = 5 and sid = 10

ICOM 6005Dr. Manuel Rodriguez Martinez19 More examples Index on Reserves with search key Hash Index or B-tree match the condition –rname = “Tom” and bid = 10 and sid = 4 Why? –Can be rewritten as bid = 10 and sid = 4 and rname = “Tom” –First two conjuncts bid = 10 and sid = 4 match search key –Conjuct rname = “Tom” must be evaluate afterward Iteratively Matching deals with whether or not index can be used to make a first pass on the table.

ICOM 6005Dr. Manuel Rodriguez Martinez20 Selectivity of Access paths Each access paths a selectivity, which is the number of pages to be retrieved by it –Both data pages and index pages (if any) It is helpful to define the notion of selectivity factors for conjuncts –Given a predicate p, the selectivity factor p is the fraction of tuples in a relation R that satisfy p. –Denoted as SF p Selectivity for conjunct p applied to a relation R can then be computed as:

ICOM 6005Dr. Manuel Rodriguez Martinez21 Several useful SF formulas Equality: –Attr = value: SF = 1/# of distinct values for attr in table R –If we have an index I on attribute Attr, then SF = 1/ NKeys(I) –If the information is not available use: 1/10 Greater than: –Attr > value: SF = (IHigh(I) – value) / (IHigh(I) - ILow(I) ) –Otherwise, use 1/3 Attr between value1 and value2 –SF = (value2 – value1)/(IHigh(I) – ILow(I)) –Otherwise, use 1/4

ICOM 6005Dr. Manuel Rodriguez Martinez22 Selection and SF Selections can have where clause with CNF, or DNF. CNF case: –If where clause is of the form p1 and p2 and … and pn, then –Assume that all predicates are independent