Examples of Physical Query Plan Alternatives

Examples of Physical Query Plan Alternatives. Selected material from Chapters 12, 14 and 15. The slides for this text are organized into chapters. This lecture covers Chapter 12, which gives an overview of query optimization and execution. The chapter is the first of a sequence (Chapters 12, 13, 14, 15) on query evaluation that might be covered in full in a course with a systems emphasis. It can also be used stand-alone, as a self-contained overview of these issues, in a course with an application emphasis. It covers the essential concepts in sufficient detail to support a discussion of physical database design and tuning in Chapter 20.

Query Optimization. SQL provides many ways to express a query, so the system has many options for evaluating it. The optimizer is important for query performance: it generates alternative plans and chooses the plan with the least estimated cost. Ideally it finds the best plan; realistically, it consistently finds a reasonably good one.

A Query Evaluation Plan. An extended relational algebra tree. Annotations at each node indicate the access method to use for each table and the implementation method to use for each relational operator.

A Query Evaluation Plan (diagram). Plan tree, from root to leaves: projection on sname (on-the-fly); selection bid=100 AND rating > 5 (on-the-fly); join on sid=sid (Simple Nested Loops); leaves Reserves and Sailors.

Query Optimization. Multi-operator Queries: Pipelined Evaluation. On-the-fly (pipelined): the result of one operator is passed directly to the next operator without creating a temporary table to hold the intermediate result. Materialized: otherwise, the intermediate result must be written to a temporary table before the next operator can access it.
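The difference between on-the-fly and materialized evaluation can be sketched with Python generators. This is only an illustration: the operator functions (`scan`, `select`, `project`, `materialize`) and the three sample rows are hypothetical and do not come from the slides.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]  # hypothetical row representation: column name -> value


def scan(table: Iterable[Row]) -> Iterator[Row]:
    """Produce the rows of a table one at a time."""
    for row in table:
        yield row


def select(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Pass on only the rows that satisfy the predicate."""
    for row in rows:
        if predicate(row):
            yield row


def project(rows: Iterable[Row], columns: List[str]) -> Iterator[Row]:
    """Keep only the requested columns of each row."""
    for row in rows:
        yield {c: row[c] for c in columns}


def materialize(rows: Iterable[Row]) -> List[Row]:
    """Stand-in for writing an intermediate result to a temporary table."""
    return list(rows)


# Hypothetical sample data, for illustration only.
sailors = [
    {"sid": 1, "sname": "Dustin", "rating": 7},
    {"sid": 2, "sname": "Lubber", "rating": 3},
    {"sid": 3, "sname": "Rusty", "rating": 10},
]

# On-the-fly: each row flows through select and project with no temp table.
pipelined = project(select(scan(sailors), lambda r: r["rating"] > 5), ["sname"])
pipelined_names = [r["sname"] for r in pipelined]

# Materialized: the selection result is stored in full before projection starts.
temp = materialize(select(scan(sailors), lambda r: r["rating"] > 5))
materialized_names = [r["sname"] for r in project(temp, ["sname"])]
```

Both variants return the same rows; they differ only in whether the intermediate result is ever stored in full.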

Alternative Plans: Schema Examples. Reserves (sid: integer, bid: integer, day: dates, rname: string). Sailors (sid: integer, sname: string, rating: integer, age: real). Reserves: each tuple is 40 bytes long, 100 tuples per page, 1000 pages. Sailors: each tuple is 50 bytes long, 80 tuples per page, 500 pages.

Alternative Plans: Motivating Example. Query: SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5. RA tree, from root to leaves: projection on sname; selection bid=100 AND rating > 5; join on sid=sid; leaves Reserves and Sailors.

Alternative Plans: Motivating Example, costs. Query: SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5. Plan, from root to leaves: projection on sname (on-the-fly); selection bid=100 AND rating > 5 (on-the-fly); join on sid=sid (Simple Nested Loops); leaves Reserves and Sailors. Costs: 1. Scan Sailors; for each page of Sailors, scan Reserves: 500 + 500*1000 I/Os. Or, 2. Scan Reserves; for each page of Reserves, scan Sailors: 1000 + 1000*500 I/Os.
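The two join orders can be checked with a small cost function. This is a minimal sketch of the page-at-a-time cost formula used on the slide (read the outer once, scan the inner once per outer page); the function and constant names are illustrative.

```python
def page_nested_loops_cost(outer_pages: int, inner_pages: int) -> int:
    """I/O cost of a page-oriented nested loops join:
    one pass over the outer, plus one full inner scan per outer page."""
    return outer_pages + outer_pages * inner_pages


RESERVES_PAGES = 1000  # from the schema slide
SAILORS_PAGES = 500    # from the schema slide

# Option 1: Sailors as the outer relation (500 + 500*1000).
sailors_outer = page_nested_loops_cost(SAILORS_PAGES, RESERVES_PAGES)

# Option 2: Reserves as the outer relation (1000 + 1000*500).
reserves_outer = page_nested_loops_cost(RESERVES_PAGES, SAILORS_PAGES)

print(sailors_outer, reserves_outer)
```

Putting the smaller relation on the outside saves only the difference between the two outer scans, so both orders cost roughly half a million I/Os.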

Alternative Plans: Motivating Example. Query: SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5. Plan: Simple Nested Loops join on sid=sid, with the selections and the projection applied on-the-fly. Cost: 500 + 500*1000 I/Os. Typically a bad plan! Reasons: the selections could have been "pushed" earlier, and no use is made of indexes. Goal of optimization: find a more efficient plan.

Alternative Plans 2: No Indexes. Main idea: push selects down. Starting RA tree, from root to leaves: projection on sname; selection bid=100 AND rating > 5; join on sid=sid; leaves Reserves and Sailors.

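Alternative Plans 2: No Indexes. Main idea: push selects down. Plan, from root to leaves: projection on sname (on-the-fly); join on sid=sid (Sort-Merge Join); left input: selection bid=100 on Reserves (scan; write to temp T1); right input: selection rating > 5 on Sailors (scan; write to temp T2).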

Alternative Plan 2: Cost. Plan: selections pushed down and written to temps T1 and T2, then Sort-Merge Join on sid=sid, projection on sname on-the-fly. With 5 buffer pages, cost of plan: Scan Reserves (1000) + write temp T1 (10 pages, if we have 100 boats, uniformly distributed). Scan Sailors (500) + write temp T2 (250 pages, if we have 10 ratings). Sort T1 (2*2*10), sort T2 (2*4*250), merge (10+250). Total: 4060 page I/Os.
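The 4060 figure can be reproduced step by step. The sketch below assumes a standard external merge sort (pass 0 builds runs of B pages, later passes merge B-1 runs at a time, and every pass reads and writes each page once); the helper names are illustrative.

```python
import math


def external_sort_passes(pages: int, buffers: int) -> int:
    """Number of passes of an external merge sort with `buffers` buffer pages."""
    runs = math.ceil(pages / buffers)  # pass 0: sorted runs of `buffers` pages each
    passes = 1
    while runs > 1:
        runs = math.ceil(runs / (buffers - 1))  # (buffers - 1)-way merge
        passes += 1
    return passes


def external_sort_cost(pages: int, buffers: int) -> int:
    """Each pass reads and writes every page once."""
    return 2 * pages * external_sort_passes(pages, buffers)


BUFFERS = 5
T1_PAGES = 1000 // 100     # Reserves pages with bid=100: 100 boats, uniform
T2_PAGES = 500 * 5 // 10   # Sailors pages with rating > 5: 5 of 10 ratings

scan_and_write = (1000 + T1_PAGES) + (500 + T2_PAGES)  # 1010 + 750
sort_t1 = external_sort_cost(T1_PAGES, BUFFERS)        # 2 * 2 * 10
sort_t2 = external_sort_cost(T2_PAGES, BUFFERS)        # 2 * 4 * 250
merge = T1_PAGES + T2_PAGES                            # read both sorted temps once

sort_merge_total = scan_and_write + sort_t1 + sort_t2 + merge
print(sort_merge_total)
```

Sorting T2 dominates: with only 5 buffers, 250 pages need 4 passes, which accounts for 2000 of the total I/Os.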

Alternative Plans 2: Variations. Plan: selection bid=100 on Reserves (scan; write to temp T1), selection rating > 5 on Sailors (scan; write to temp T2), join on sid=sid, projection on sname (on-the-fly). With 5 buffer pages, scanning and filtering costs 1010 + 750 I/Os. Optimization 1: block nested loops (BNL) join: join cost = 10 + 4*250, total cost = 2770. Optimization 2: "push" projections: T1 keeps only sid, so it fits in ceil(10/4) = 3 pages; T2 keeps only sid and sname, so it takes 250/2 = 125 pages. The cost of BNL drops to about 125 I/Os, and the total cost is < 2000 I/Os.
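Both variations can be worked through numerically. The sketch assumes that BNL with B buffer pages holds the outer in blocks of B-2 pages (one page each reserved for the inner and the output), and it follows the slide in assuming that keeping 1 of 4 fields (or 2 of 4) shrinks a temp by that ratio. Function names are illustrative.

```python
import math


def block_nested_loops_cost(outer_pages: int, inner_pages: int, buffers: int) -> int:
    """Read the outer once; scan the inner once per block of (buffers - 2) outer pages."""
    block_size = buffers - 2
    blocks = math.ceil(outer_pages / block_size)
    return outer_pages + blocks * inner_pages


BUFFERS = 5

# Optimization 1: BNL on the unprojected temps T1 (10 pages) and T2 (250 pages).
scan_and_write = (1000 + 10) + (500 + 250)             # 1010 + 750
bnl_join = block_nested_loops_cost(10, 250, BUFFERS)   # 10 + 4*250
bnl_total = scan_and_write + bnl_join

# Optimization 2: push projections so the temps shrink.
t1_projected = math.ceil(10 / 4)  # only sid kept: 3 pages, a single block
t2_projected = 250 // 2           # only sid and sname kept: 125 pages
scan_and_write_projected = (1000 + t1_projected) + (500 + t2_projected)
bnl_join_projected = block_nested_loops_cost(t1_projected, t2_projected, BUFFERS)
projected_total = scan_and_write_projected + bnl_join_projected

print(bnl_total, projected_total)
```

With projections pushed, the whole outer fits in one block, so the inner is scanned once; this sketch also counts the 3 pages read for T1, which is why the join comes to 128 rather than exactly 125 I/Os.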

Alternative Plan: Using Indices? Starting point: the plan with selections pushed down (selection bid=100 on Reserves, selection rating > 5 on Sailors, join on sid=sid, projection on sname). Which indices help here? Index on Reserves.bid? Index on Sailors.rating? Index on Sailors.sid? Index on Reserves.sid?

Example Plan: With Index. With an index on Reserves.bid: assume 100 bid values, 100,000 tuples, and 100 tuples per disk page. We get 100,000/100 = 1000 matching tuples, on 1000/100 = 10 disk pages. If the index is clustered, cost = 10 I/Os. Plan annotation: selection bid=100 on Reserves (use hash index; do not write result to temp); projection on sname (on-the-fly).
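The estimate for the indexed selection is simple arithmetic, sketched below. The clustered case follows the slide; the unclustered figure is the usual worst-case textbook estimate of about one I/O per matching tuple and is an added assumption, not something stated on the slide. Names are illustrative.

```python
import math

RESERVES_TUPLES = 100_000
DISTINCT_BIDS = 100
TUPLES_PER_PAGE = 100

# Uniform distribution: each bid value matches the same share of the tuples.
matching_tuples = RESERVES_TUPLES // DISTINCT_BIDS

# Clustered index: matching tuples are stored together, so they fill whole pages.
clustered_cost = math.ceil(matching_tuples / TUPLES_PER_PAGE)

# Assumption (not on the slide): with an unclustered index, each matching
# tuple may sit on a different page, so the worst case is one I/O per tuple.
unclustered_worst_case = matching_tuples

print(matching_tuples, clustered_cost, unclustered_worst_case)
```

Index lookup overhead itself is ignored here, as it is on the slide.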


Example Plan: Use Another Index. Index on Sailors? Which one? A selection on Sailors may reduce the number of tuples considered in the join, but it then requires us to materialize the selected Sailors tuples. Plan, from leaves to root: selection bid=100 on Reserves (use hash index; do not write result to temp); join on sid=sid (Index Nested Loops, with pipelining); selection rating > 5 (position still in question); projection on sname (on-the-fly).

Index Nested Loops with Pipelining. The outer is not materialized, so projecting out unnecessary fields from the outer doesn't help. Plan, from leaves to root: selection bid=100 on Reserves (use hash index; do not write result to temp); join on sid=sid (Index Nested Loops, with pipelining); selection rating > 5; projection on sname (on-the-fly).

Example Plan Continued. Index on Sailors.sid: sid is a key for Sailors, so there is at most one matching tuple, and an unclustered index on sid is OK. Cost? For each Reserves tuple (1000): get the matching Sailors tuple (1.2 I/Os). So total 1200 + 10 I/Os. Plan: join on sid=sid (Index Nested Loops, with pipelining); selection rating > 5; projection on sname (on-the-fly).
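The index nested loops total follows directly from the numbers on the slide. A minimal sketch, with illustrative names:

```python
def index_nested_loops_cost(outer_access_cost: float,
                            outer_tuples: int,
                            probe_cost: float) -> float:
    """Cost of fetching the outer tuples plus one index probe per outer tuple."""
    return outer_access_cost + outer_tuples * probe_cost


# 10 I/Os to fetch the 1000 Reserves tuples with bid=100 via the clustered index,
# then 1.2 I/Os per probe into the index on Sailors.sid.
inl_total = index_nested_loops_cost(outer_access_cost=10,
                                    outer_tuples=1000,
                                    probe_cost=1.2)
print(round(inl_total))
```

Because the outer is pipelined straight into the join, there is no cost for writing or re-reading a temporary table.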

Alternative Plan: With Second Index. Selection pushdown? Should we push (rating > 5) before the join? Answer: no, because of the availability of the sid index on Sailors. Reason: there is no index on the selection result, so the lookup would then require scanning Sailors. Plan, from leaves to root: selection bid=100 on Reserves (use hash index; do not write result to temp); join on sid=sid (Index Nested Loops, with pipelining); selection rating > 5 (on-the-fly); projection on sname (on-the-fly).

Summary. A query is evaluated by converting it to a tree of operators and evaluating the operators in the tree. There are alternative evaluation algorithms for each relational operator. Query evaluation must compare alternative plans based on their estimated costs. One must understand query optimization to understand the performance impact of a given database design on a query workload.
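As a recap, the cost-based choice among the plans in this lecture can be sketched as picking the minimum of the estimated costs. The plan labels are informal descriptions chosen here; the numbers are the I/O estimates worked out on the earlier slides.

```python
# Estimated I/O costs of the alternative plans for the example query.
plan_costs = {
    "simple nested loops, selections on-the-fly": 500 + 500 * 1000,
    "selections pushed down, sort-merge join": 4060,
    "selections pushed down, block nested loops join": 2770,
    "hash index on Reserves.bid, index nested loops join": 1210,
}

# The optimizer's job, in miniature: choose the plan with the least estimated cost.
best_plan = min(plan_costs, key=plan_costs.get)
print(best_plan, plan_costs[best_plan])
```

A real optimizer enumerates and costs plans rather than reading them from a fixed table, but the final selection step has this shape.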