Lecture 11: Query processing and optimization Jose M. Peña



ER diagram Relational model MySQL

Relation schema
Attributes: PNumber, Name, Address, Telephone, Age
Example domains, in the order listed on the slide: yymmdd-xxxx; textual string of fewer than 30 characters; rrr - nn nn nn; aaaaannn; positive integer 0<x<150
Domain = set of atomic values

Relation
Attributes: PNumber, Name, Address, Telephone, Age
Example tuples (some values are missing from the transcript): Anders Andersson, Rydsvägen, andan; Veronika Pettersson, Alsätersg, verpe22227
Tuple = list of values in the corresponding domains, or NULL

Key constraints
Relation = set of tuples. Hence no duplicates are allowed, and hence every tuple is uniquely identifiable (superkey, candidate key, primary key, all of which are time-invariant).

Integrity constraints
Entity integrity constraint = no primary key value is NULL.
A set of attributes FK in a relation R1 is a foreign key to another relation R2 with primary key PK if
i. domain(FK) = domain(PK), and
ii. FK in R1 takes value NULL or one of the values of PK in R2.
Referential integrity constraint = conditions (i) and (ii) above hold.

Relational algebra
Relational algebra = language for querying the relational model.
It is a procedural language: it specifies how to carry out the query. A declarative language, such as relational calculus, specifies only what to retrieve.
Basis for SQL.
Basis for implementation and optimization of queries.

Select
Selects the tuples of a relation satisfying some condition over its attributes.

Example: select
STUDENT (PNum, Name, Address, TelNr; PNum and TelNr values not shown):
Name     Address
Elin     Rydsvägen
Nisse    Alsätersgatan
Nisse    Rydsvägen
Pelle    Rydsvägen
Monika   Rydsvägen
Patrik   Rydsvägen
Camilla  Alsätersgatan
Result of the selection:
Name     Address
Nisse    Rydsvägen
Camilla  Alsätersgatan
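A minimal Python sketch of the select operator may help here. It assumes a relation is stored as a list of dicts, keeps only the Name and Address attributes of STUDENT, and uses a hypothetical condition (Address = 'Alsätersgatan'), since the condition used on the slide is not recoverable from the transcript.

```python
# Assumption: a relation is a list of dicts, one dict per tuple.
student = [
    {"Name": "Elin", "Address": "Rydsvägen"},
    {"Name": "Nisse", "Address": "Alsätersgatan"},
    {"Name": "Nisse", "Address": "Rydsvägen"},
    {"Name": "Pelle", "Address": "Rydsvägen"},
    {"Name": "Monika", "Address": "Rydsvägen"},
    {"Name": "Patrik", "Address": "Rydsvägen"},
    {"Name": "Camilla", "Address": "Alsätersgatan"},
]


def select(relation, condition):
    """Return the tuples of `relation` for which `condition` is true."""
    return [t for t in relation if condition(t)]


# Hypothetical condition, chosen only for illustration.
result = select(student, lambda t: t["Address"] == "Alsätersgatan")
for t in result:
    print(t["Name"], t["Address"])
```

The condition is passed as a function so that the same operator works for any predicate over the attributes.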

Project
Projects a relation over some attributes.
The result must be a relation, so duplicates are removed.

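Example: project
STUDENT (PNum, Name, Address, TelNr; PNum and TelNr values not shown):
Name   Address
Elin   Rydsvägen
Nisse  Alsätersgatan
Nisse  Rydsvägen
Result of projecting on PNum and Name (PNum values not shown):
Name
Elin
Nisse
Nisse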
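The duplicate removal can be sketched in Python as follows. The list-of-dicts representation and the PNum values 1, 2, 3 are assumptions made for illustration; the real PNum values are not in the transcript.

```python
# Assumption: a relation is a list of dicts; PNum values are hypothetical.
student = [
    {"PNum": 1, "Name": "Elin", "Address": "Rydsvägen"},
    {"PNum": 2, "Name": "Nisse", "Address": "Alsätersgatan"},
    {"PNum": 3, "Name": "Nisse", "Address": "Rydsvägen"},
]


def project(relation, attributes):
    """Keep only `attributes` and drop duplicate tuples (first one wins)."""
    seen = set()
    result = []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result


# Projecting on the key keeps both Nisse tuples, since PNum differs.
by_key = project(student, ["PNum", "Name"])
# Projecting on Name alone collapses the two Nisse tuples into one.
by_name = project(student, ["Name"])
```

This shows why the slide's result still contains Nisse twice: the projection includes PNum, which distinguishes the two tuples.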

Union, intersection and difference
R and S must be compatible, i.e. they must have the same number of attributes, with the same domains.
The result must be a relation, so duplicates are removed (this matters for union).

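Example: intersection
STUDENT (PNum, Name, Address, TelNr; PNum and TelNr values not shown):
Name   Address
Elin   Rydsvägen
Nisse  Alsätersgatan
Nisse  Rydsvägen
EMPLOYEE (PNum, Name, Office address, TelNr; PNum and TelNr values not shown):
Name    Office address
Monika  Teknikringen
Nisse   Alsätersgatan
Patrik  Teknikringen
Result of the intersection:
Name   Address
Nisse  Alsätersgatan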
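Because a relation is a set of tuples, the three operators map directly onto Python's built-in set operations. The sketch below assumes each tuple is reduced to (Name, Address), since the PNum and TelNr values are not in the transcript.

```python
# Assumption: tuples are (Name, Address); both relations are compatible.
student = {
    ("Elin", "Rydsvägen"),
    ("Nisse", "Alsätersgatan"),
    ("Nisse", "Rydsvägen"),
}
employee = {
    ("Monika", "Teknikringen"),
    ("Nisse", "Alsätersgatan"),
    ("Patrik", "Teknikringen"),
}

union = student | employee          # duplicates are removed automatically
intersection = student & employee   # tuples present in both relations
difference = student - employee     # tuples in STUDENT but not in EMPLOYEE

print(intersection)
```

The union has five tuples rather than six because the tuple shared by both relations is kept only once.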

Cartesian product
R:
Name           STATE
Los Angeles    Calif
Oakland        Calif
Atlanta        Ga
San Francisco  Calif
Boston         Mass
S:
Key  City
5    San Francisco
7    Oakland
8    Boston
R x S:
Name           STATE  Key  City
Los Angeles    Calif  5    San Francisco
Los Angeles    Calif  7    Oakland
Los Angeles    Calif  8    Boston
Oakland        Calif  5    San Francisco
Oakland        Calif  7    Oakland
Oakland        Calif  8    Boston
Atlanta        Ga     5    San Francisco
Atlanta        Ga     7    Oakland
Atlanta        Ga     8    Boston
San Francisco  Calif  5    San Francisco
San Francisco  Calif  7    Oakland
San Francisco  Calif  8    Boston
Boston         Mass   5    San Francisco
Boston         Mass   7    Oakland
Boston         Mass   8    Boston
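The product can be reproduced with a short Python sketch, assuming relations are lists of dicts and that R and S have no attribute names in common (otherwise the merged dict would silently overwrite values).

```python
from itertools import product

# Assumption: a relation is a list of dicts, one dict per tuple.
r = [
    {"Name": "Los Angeles", "STATE": "Calif"},
    {"Name": "Oakland", "STATE": "Calif"},
    {"Name": "Atlanta", "STATE": "Ga"},
    {"Name": "San Francisco", "STATE": "Calif"},
    {"Name": "Boston", "STATE": "Mass"},
]
s = [
    {"Key": 5, "City": "San Francisco"},
    {"Key": 7, "City": "Oakland"},
    {"Key": 8, "City": "Boston"},
]


def cartesian_product(r, s):
    """Pair every tuple of r with every tuple of s."""
    return [{**tr, **ts} for tr, ts in product(r, s)]


rxs = cartesian_product(r, s)
print(len(rxs))  # |R| * |S| tuples
```

The result size is the product of the input sizes, which is why the optimizer tries to avoid materializing it.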

Join
Joins two tuples from two relations if they satisfy some condition over their attributes.
Join = Cartesian product followed by selection.
Tuples with NULL in the condition attributes do not appear in the result.
Recall: Join only on foreign key-primary key attributes.
Notation example: R ⋈ S with join condition R.A1=S.B3 AND R.A5<S.A1.

Example: join
R:
Name           STATE
Los Angeles    Calif
Oakland        Calif
Atlanta        Ga
San Francisco  Calif
Boston         Mass
S:
Key  City
5    San Francisco
7    Oakland
8    Boston
R ⋈ S with join condition R.Name=S.City:
Name           STATE  Key  City
Oakland        Calif  7    Oakland
San Francisco  Calif  5    San Francisco
Boston         Mass   8    Boston


Example: join
R:
Name           Area
Los Angeles    2
Oakland        9
Atlanta        7
San Francisco  11
Boston         16
S:
Key  City
5    San Francisco
7    Oakland
8    Boston
R ⋈ S with join condition R.Area<=S.Key:
Name         Area  Key  City
Los Angeles  2     5    San Francisco
Los Angeles  2     7    Oakland
Los Angeles  2     8    Boston
Atlanta      7     7    Oakland
Atlanta      7     8    Boston

R x S for this example (the join keeps only the rows with R.Area<=S.Key):
Name           Area  Key  City
Los Angeles    2     5    San Francisco
Los Angeles    2     7    Oakland
Los Angeles    2     8    Boston
Oakland        9     5    San Francisco
Oakland        9     7    Oakland
Oakland        9     8    Boston
Atlanta        7     5    San Francisco
Atlanta        7     7    Oakland
Atlanta        7     8    Boston
San Francisco  11    5    San Francisco
San Francisco  11    7    Oakland
San Francisco  11    8    Boston
Boston         16    5    San Francisco
Boston         16    7    Oakland
Boston         16    8    Boston
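The two join examples can be checked with a small Python sketch that implements join literally as Cartesian product followed by selection. The list-of-dicts representation, the helper `not_null`, and the extra tuple with a NULL (None) Area are assumptions added for illustration; a real DBMS would not compute the full product.

```python
from itertools import product

# Assumption: relations are lists of dicts with distinct attribute names.
r_state = [
    {"Name": "Los Angeles", "STATE": "Calif"},
    {"Name": "Oakland", "STATE": "Calif"},
    {"Name": "Atlanta", "STATE": "Ga"},
    {"Name": "San Francisco", "STATE": "Calif"},
    {"Name": "Boston", "STATE": "Mass"},
]
r_area = [
    {"Name": "Los Angeles", "Area": 2},
    {"Name": "Oakland", "Area": 9},
    {"Name": "Atlanta", "Area": 7},
    {"Name": "San Francisco", "Area": 11},
    {"Name": "Boston", "Area": 16},
    {"Name": "Nowhere", "Area": None},  # hypothetical tuple with a NULL
]
s = [
    {"Key": 5, "City": "San Francisco"},
    {"Key": 7, "City": "Oakland"},
    {"Key": 8, "City": "Boston"},
]


def not_null(*values):
    """True only if no compared value is NULL (None)."""
    return all(v is not None for v in values)


def theta_join(r, s, condition):
    """Join = Cartesian product followed by selection on `condition`."""
    return [{**tr, **ts} for tr, ts in product(r, s) if condition(tr, ts)]


equi = theta_join(
    r_state, s,
    lambda tr, ts: not_null(tr["Name"], ts["City"]) and tr["Name"] == ts["City"],
)
theta = theta_join(
    r_area, s,
    lambda tr, ts: not_null(tr["Area"], ts["Key"]) and tr["Area"] <= ts["Key"],
)
```

The `not_null` guard mirrors the rule that tuples with NULL in the condition attributes never appear in the result, so the hypothetical "Nowhere" tuple is dropped.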

Variants of join
Theta join = join.
Equijoin = join with only equality conditions.
Natural join = equijoin in which one of the duplicate attributes is removed (attributes in the conditions must have the same name).
Unless otherwise specified, natural join joins all the attributes with the same name in R and S.
Notation: R * S, with the join attributes A optionally given as a subscript.
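As a rough illustration of natural join, the Python sketch below joins on every attribute name shared by R and S and keeps a single copy of each. The list-of-dicts representation is an assumption, and S's City attribute is renamed to Name here (hypothetically) so that the two relations share an attribute name.

```python
# Assumption: relations are lists of dicts; S.City is renamed to Name.
r = [
    {"Name": "Los Angeles", "STATE": "Calif"},
    {"Name": "Oakland", "STATE": "Calif"},
    {"Name": "Atlanta", "STATE": "Ga"},
    {"Name": "San Francisco", "STATE": "Calif"},
    {"Name": "Boston", "STATE": "Mass"},
]
s = [
    {"Key": 5, "Name": "San Francisco"},
    {"Key": 7, "Name": "Oakland"},
    {"Key": 8, "Name": "Boston"},
]


def natural_join(r, s):
    """Equijoin on all attributes with the same name, keeping one copy."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [
        {**tr, **ts}  # merging the dicts keeps one copy of each common attribute
        for tr in r
        for ts in s
        if all(tr[a] is not None and tr[a] == ts[a] for a in common)
    ]


joined = natural_join(r, s)
for t in joined:
    print(t["Name"], t["STATE"], t["Key"])
```

If the two relations share no attribute name, the condition is vacuously true and the function degenerates to a Cartesian product.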

Example

Query trees
Tree that represents a relational algebra expression.
Leaves = base tables.
Internal nodes = relational algebra operators applied to the node's children.
The tree is executed from leaves to root.
Example: List the last name of the employees born after 1957 who work on a project named "Aquarius".
SELECT E.LNAME
FROM EMPLOYEE E, WORKS_ON W, PROJECT P
WHERE P.PNAME = 'Aquarius' AND P.PNUMBER = W.PNO AND W.ESSN = E.SSN AND E.BDATE > ' '
Canonical query tree
For a query of the form SELECT attributes FROM A, B, C WHERE condition, the canonical query tree is π attributes (σ condition (A × B × C)). Construct it as follows:
1. Cartesian product of the FROM-tables.
2. Select with the WHERE-condition.
3. Project to the SELECT-attributes.
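The canonical tree for the example query could be sketched in Python as below. The node classes, the sample rows, and the date literal '1957-12-31' (standing in for "born after 1957") are all assumptions for illustration; the date in the slide's SQL is not in the transcript.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable


@dataclass
class Table:  # leaf: a base table
    rows: list

    def evaluate(self):
        return self.rows


@dataclass
class Product:  # internal node: Cartesian product of two children
    left: object
    right: object

    def evaluate(self):
        return [{**a, **b} for a, b in product(self.left.evaluate(), self.right.evaluate())]


@dataclass
class Select:  # internal node: selection
    condition: Callable
    child: object

    def evaluate(self):
        return [t for t in self.child.evaluate() if self.condition(t)]


@dataclass
class Project:  # internal node: projection with duplicate removal
    attributes: list
    child: object

    def evaluate(self):
        out = []
        for t in self.child.evaluate():
            row = {a: t[a] for a in self.attributes}
            if row not in out:
                out.append(row)
        return out


# Hypothetical sample data.
employee = [
    {"SSN": 1, "LNAME": "Smith", "BDATE": "1965-01-09"},
    {"SSN": 2, "LNAME": "Wong", "BDATE": "1955-12-08"},
    {"SSN": 3, "LNAME": "Zelaya", "BDATE": "1968-07-19"},
]
works_on = [{"ESSN": 1, "PNO": 10}, {"ESSN": 2, "PNO": 10}, {"ESSN": 3, "PNO": 20}]
projects = [{"PNUMBER": 10, "PNAME": "Aquarius"}, {"PNUMBER": 20, "PNAME": "Orion"}]


def where(t):
    return (t["PNAME"] == "Aquarius" and t["PNUMBER"] == t["PNO"]
            and t["ESSN"] == t["SSN"] and t["BDATE"] > "1957-12-31")


# Canonical tree: product of the FROM-tables, then select, then project.
tree = Project(["LNAME"], Select(where, Product(Product(Table(employee), Table(works_on)), Table(projects))))
print(tree.evaluate())
```

Calling `evaluate()` on the root recursively evaluates the children first, which matches the leaves-to-root execution order described above.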

Equivalent query trees

Query processing
[Figure: Users 1 to 4 each send queries and updates to the database management system and receive answers. The DBMS handles the processing of queries and updates and the access to stored data in the physical database, which holds a model of the real world.]

StarsIn( movieTitle, movieYear, starName )
MovieStar( name, address, gender, birthdate )
SELECT movieTitle FROM StarsIn WHERE starName IN ( SELECT name FROM MovieStar WHERE birthdate LIKE '%1960' );
Canonical query tree (usually very inefficient)

Parsing and validating
Check the relations used:
– They have to be declared in FROM.
– They must exist in the database.
Check and resolve the attributes:
– Attributes must exist in the relations.
Type checking:
– Attributes that are compared must be of the same type.
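The checks above can be sketched in Python against a toy catalog. Everything here is an assumption for illustration: the catalog layout, the attribute types, and the function `validate`, which takes an already parsed query (FROM-tables, referenced attributes, and pairs of compared attributes) rather than SQL text.

```python
# Hypothetical catalog: relation name -> {attribute name: type}.
catalog = {
    "StarsIn": {"movieTitle": str, "movieYear": int, "starName": str},
    "MovieStar": {"name": str, "address": str, "gender": str, "birthdate": str},
}


def validate(from_tables, attributes, comparisons, catalog):
    """Return a list of error messages; an empty list means the query is valid."""
    # Relations declared in FROM must exist in the database.
    errors = [f"unknown relation: {t}" for t in from_tables if t not in catalog]
    # Attributes visible to the query: those of the declared, existing relations.
    visible = {}
    for t in from_tables:
        visible.update(catalog.get(t, {}))
    # Attributes must exist in the declared relations.
    for a in attributes:
        if a not in visible:
            errors.append(f"unknown attribute: {a}")
    # Compared attributes must be of the same type.
    for left, right in comparisons:
        if left in visible and right in visible and visible[left] is not visible[right]:
            errors.append(f"type mismatch: {left} vs {right}")
    return errors


ok = validate(["StarsIn"], ["movieTitle", "starName"], [], catalog)
bad = validate(["StarsIn"], ["name"], [], catalog)
print(ok, bad)
```

In the second call, `name` belongs to MovieStar, which is not declared in FROM, so the attribute cannot be resolved.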

Query optimizer
Heuristic: Use joins instead of Cartesian product followed by selection, and do selection and projection as early as possible, in order to keep the intermediate tables as small as possible, because
– if the tables do not fit in memory, then we need to perform fewer disk accesses,
– if the tables fit in memory, then we use less memory,
– if the tables are distributed, then we reduce communication, and
– if the tables have to be sorted, joined, etc., then we use less computation power.

Query optimizer
Heuristic algorithm:
1. Break up conjunctive select into cascade.
2. Move down select as far as possible in the tree.
3. Rearrange select operations: The most restrictive should be executed first. "Most restrictive" may mean fewest tuples, smallest size, or smallest selectivity; the DBMS catalog contains the required information.
4. Convert Cartesian product followed by selection into join.
5. Move down project operations as far as possible in the tree. Create new projections so that only the required attributes are involved in the tree.
6. Identify subtrees that can be executed by a single algorithm.
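The effect of steps 1, 2 and 4 on intermediate table sizes can be seen in a small Python sketch. The sample rows and the date literal '1957-12-31' are hypothetical, and the rewritten plan is written out by hand rather than produced by an optimizer.

```python
from itertools import product

# Hypothetical sample data for the "Aquarius" query.
employee = [
    {"SSN": 1, "LNAME": "Smith", "BDATE": "1965-01-09"},
    {"SSN": 2, "LNAME": "Wong", "BDATE": "1955-12-08"},
    {"SSN": 3, "LNAME": "Zelaya", "BDATE": "1968-07-19"},
]
works_on = [{"ESSN": 1, "PNO": 10}, {"ESSN": 2, "PNO": 10}, {"ESSN": 3, "PNO": 20}]
projects = [{"PNUMBER": 10, "PNAME": "Aquarius"}, {"PNUMBER": 20, "PNAME": "Orion"}]

# Canonical plan: full Cartesian product, then one conjunctive selection.
canonical_intermediate = [{**e, **w, **p} for e, w, p in product(employee, works_on, projects)]
canonical = [
    t for t in canonical_intermediate
    if t["PNAME"] == "Aquarius" and t["PNUMBER"] == t["PNO"]
    and t["ESSN"] == t["SSN"] and t["BDATE"] > "1957-12-31"
]

# Heuristic plan: selections pushed down to the base tables (steps 1 and 2),
# products followed by selections turned into joins (step 4).
aquarius = [p for p in projects if p["PNAME"] == "Aquarius"]
born_after_1957 = [e for e in employee if e["BDATE"] > "1957-12-31"]
step1 = [{**p, **w} for p in aquarius for w in works_on if p["PNUMBER"] == w["PNO"]]
step2 = [{**t, **e} for t in step1 for e in born_after_1957 if t["ESSN"] == e["SSN"]]

print(len(canonical_intermediate), len(step1), len(step2))
print({t["LNAME"] for t in canonical} == {t["LNAME"] for t in step2})
```

Both plans return the same last names, but the largest intermediate table shrinks from the full product to a handful of tuples.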

Equivalence rules

Execution plans
Execution plan: Optimized query tree extended with access methods and algorithms to implement the operations.

Query optimizer
Compare the estimated cost of different execution plans and choose the cheapest.
The cost estimate decomposes into the following components.
– Access cost to secondary storage. Depends on the access method and file organization. Leading term for large databases.
– Storage cost. Storing intermediate results on disk.
– Computation cost. In-memory searching, sorting, computation. Leading term for small databases.
– Memory usage cost. Memory buffers needed in the server.
– Communication cost. Remote connection cost, network transfer cost. Leading term for distributed databases.
The costs above are estimated via the information in the DBMS catalog (e.g. #records, record size, #blocks, primary and secondary access methods, #distinct values, selectivity, etc.).
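A toy Python sketch of the plan comparison follows. The class `PlanCost`, the two plan names, all cost figures, and the choice to add the five components with equal weight are assumptions for illustration; a real optimizer derives its estimates from catalog statistics and may weight the components differently.

```python
from dataclasses import dataclass


@dataclass
class PlanCost:
    name: str
    access: float         # access cost to secondary storage
    storage: float        # storing intermediate results on disk
    computation: float    # in-memory searching, sorting, computation
    memory: float         # memory buffers needed in the server
    communication: float  # remote connection and network transfer

    def total(self):
        # Assumption: components are simply added with equal weight.
        return (self.access + self.storage + self.computation
                + self.memory + self.communication)


# Hypothetical estimates for two execution plans of the same query.
plans = [
    PlanCost("full scan + nested-loop join", 5000, 200, 50, 10, 0),
    PlanCost("index lookup + join", 300, 0, 80, 20, 0),
]

cheapest = min(plans, key=PlanCost.total)
print(cheapest.name, cheapest.total())
```

With these figures the access cost dominates, which is the situation described above for large databases.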

Exercises True or false ? Optimize the queries below:

Solutions