Analyzing Plan Diagrams of Database Query Optimizers Naveen Reddy Jayant Haritsa Database Systems Lab Indian Institute of Science Bangalore, INDIA.

Slides:



Advertisements
Similar presentations
Tuning: overview Rewrite SQL (Leccotech)Leccotech Create Index Redefine Main memory structures (SGA in Oracle) Change the Block Size Materialized Views,
Advertisements

TURKISH STATISTICAL INSTITUTE 1 /34 SQL FUNDEMANTALS (Muscat, Oman)
Using the Optimizer to Generate an Effective Regression Suite: A First Step Murali M. Krishna Presented by Harumi Kuno HP.
Query Optimization CS634 Lecture 12, Mar 12, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Efficient Management of Inconsistent and Uncertain Data Renée J. Miller University of Toronto.
University of Konstanz Advances in Database Query Processing Sahak Maloyan Avoiding Sorting and Grouping In Processing Queries Sahak Maloyan.
Outline SQL Server Optimizer  Enumeration architecture  Search space: flexibility/extensibility  Cost and statistics Automatic Physical Tuning  Database.
An Efficient Cost-Driven Selection Tool for Microsoft SQL Server Surajit ChaudhuriVivek Narasayya Indian Institute of Technology Bombay CS632 Course seminar.
Query Evaluation. An SQL query and its RA equiv. Employees (sin INT, ename VARCHAR(20), rating INT, age REAL) Maintenances (sin INT, planeId INT, day.
Paper by: A. Balmin, T. Eliaz, J. Hornibrook, L. Lim, G. M. Lohman, D. Simmen, M. Wang, C. Zhang Slides and Presentation By: Justin Weaver.
Cs44321 CS4432: Database Systems II Query Optimizer – Cost Based Optimization.
Introduction to Structured Query Language (SQL)
VLDB Revisiting Pipelined Parallelism in Multi-Join Query Processing Bin Liu and Elke A. Rundensteiner Worcester Polytechnic Institute
Fundamentals, Design, and Implementation, 9/e Chapter 7 Using SQL in Applications.
Query Optimization. General Overview Relational model - SQL  Formal & commercial query languages Functional Dependencies Normalization Physical Design.
Motivation Mobile devices often work offline, and users often need to download large query results for later use. Results are often accessed in small pieces.
Introduction to Structured Query Language (SQL)
©Silberschatz, Korth and Sudarshan14.1Database System Concepts 3 rd Edition Chapter 14: Query Optimization Overview Catalog Information for Cost Estimation.
1DBTest2008. Motivation Background Relational Data Warehousing (DW) SQL Server 2008 Starjoin improvement Testing Challenge Extending Enterprise-class.
Query Processing Presented by Aung S. Win.
SQL Server 2005 Performance Enhancements for Large Queries Joe Chang
Analyzing the Energy Efficiency of a Database Server Hanskamal Patel SE 521.
DAY 21: MICROSOFT ACCESS – CHAPTER 5 MICROSOFT ACCESS – CHAPTER 6 MICROSOFT ACCESS – CHAPTER 7 Akhila Kondai October 30, 2013.
SQL Server Parallel Data Warehouse: Supporting Large Scale Analytics José Blakeley, Software Architect Database Systems Group, Microsoft Corporation.
Relational Database Performance CSCI 6442 Copyright 2013, David C. Roberts, all rights reserved.
Impact Analysis of Database Schema Changes Andy Maule, Wolfgang Emmerich and David S. Rosenblum London Software Systems Dept. of Computer Science, University.
CSC2012 Database Technology & CSC2513 Database Systems.
Databases C HAPTER Chapter 10: Databases2 Databases and Structured Fields  A database is a collection of information –Typically stored as computer.
A Paradigm Shift in Database Optimization: From Indices to Aggregates Presented to: The Data Warehousing & Data Mining mini-track – AMCIS 2002 as Research-in-Progress.
Profiling, What-if Analysis and Cost- based Optimization of MapReduce Programs Oct 7 th 2013 Database Lab. Wonseok Choi.
Access Path Selection in a Relational Database Management System Selinger et al.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
CMSC424: Database Design Instructor: Amol Deshpande
Join Synopses for Approximate Query Answering Swarup Achrya Philip B. Gibbons Viswanath Poosala Sridhar Ramaswamy Presented by Bhushan Pachpande.
Query Optimization Chap. 19. Evaluation of SQL Conceptual order of evaluation – Cartesian product of all tables in from clause – Rows not satisfying where.
DEPICT: DiscovEring Patterns and InteraCTions in databases A tool for testing data-intensive systems.
Query Processing. Steps in Query Processing Validate and translate the query –Good syntax. –All referenced relations exist. –Translate the SQL to relational.
Database Design and Management CPTG /23/2015Chapter 12 of 38 Functions of a Database Store data Store data School: student records, class schedules,
Massive Stochastic Testing of SQL Don Slutz Microsoft Research Presented By Manan Shah.
6 1 Lecture 8: Introduction to Structured Query Language (SQL) J. S. Chou, P.E., Ph.D.
TPC-H Studies Joe Chang
Database Systems Design, Implementation, and Management Coronel | Morris 11e ©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or.
GSLPI: a Cost-based Query Progress Indicator
Database Fundamental & Design by A.Surasit Samaisut Copyrights : All Rights Reserved.
To Tune or not to Tune? A Lightweight Physical Design Alerter Nico Bruno, Surajit Chaudhuri DMX Group, Microsoft Research VLDB’06.
AL-MAAREFA COLLEGE FOR SCIENCE AND TECHNOLOGY INFO 232: DATABASE SYSTEMS CHAPTER 7 (Part II) INTRODUCTION TO STRUCTURED QUERY LANGUAGE (SQL) Instructor.
Buffer-pool aware Query Optimization Ravishankar Ramamurthy David DeWitt University of Wisconsin, Madison.
POP/FED: Progressive Query Optimization for Federated Queries in DB2 Wook-Shin Han, Volker Markl, Stephan Ewen Vijayshankar Raman, Holger Kache Goal: Add.
1 CSE444: REVIEW. 2 CSE444 in one slide v Logical : E/R diagram  normalized relations v Physical : files, buffering, and indexes v Logical : Relational.
M.Kersten MonetDB, Cracking and recycling Martin Kersten CWI Amsterdam.
1 Execution Strategies for SQL Subqueries Mostafa Elhemali, César Galindo- Legaria, Torsten Grabs, Milind Joshi Microsoft Corp.
Generalized Hash Teams for Join and Group-By Alfons Kemper Donald Kossmann Christian Wiesner Universität Passau Germany.
32nd International Conference on Very Large Data Bases September , 2006 Seoul, Korea Efficient Detection of Empty Result Queries Gang Luo IBM T.J.
ICDE-2006 Subramanian Arumugam Christopher Jermaine Department of Computer Science University of Florida 22nd International Conference on Data Engineering.
File Processing : Query Processing 2008, Spring Pusan National University Ki-Joune Li.
Query Processing and Query Optimization CS 157B Dennis Le Weishan Wang.
CSCI 3327 Visual Basic Chapter 13: Databases and LINQ UTPA – Fall 2011.
Chapter 13 Query Optimization Yonsei University 1 st Semester, 2015 Sanghyun Park.
1 Overview of Query Evaluation Chapter Outline  Query Optimization Overview  Algorithm for Relational Operations.
Chapter 13: Query Processing
MICROSOFT ACCESS – CHAPTER 5 MICROSOFT ACCESS – CHAPTER 6 MICROSOFT ACCESS – CHAPTER 7 Sravanthi Lakkimsety Mar 14,2016.
SQL Server Statistics DEMO SQL Server Statistics SREENI JULAKANTI,MCTS.MCITP,MCP. SQL SERVER Database Administration.
Random Sampling in Database Systems: Techniques and Applications Ke Yi Hong Kong University of Science and Technology Big Data.
Rationale Databases are an integral part of an organization. Aspiring Database Developers should be able to efficiently design and implement databases.
Query Processing B.Ramamurthy Chapter 12 11/27/2018 B.Ramamurthy.
Chapter # 7 Introduction to Structured Query Language (SQL) Part II.
Anorexic Plan Diagrams
Overview of Query Evaluation
A Framework for Testing Query Transformation Rules
New Perspectives on Microsoft
Presentation transcript:

Analyzing Plan Diagrams of Database Query Optimizers Naveen Reddy Jayant Haritsa Database Systems Lab Indian Institute of Science Bangalore, INDIA

August 2005Picasso (VLDB)2 Query Execution Plans SQL, the standard database query language, is declarative in nature – does not specify how the query should be evaluated – Example: select StudentName, CourseName from STUDENT, COURSE, REGISTER where STUDENT.RollNo = REGISTER.RollNo and REGISTER.CourseNo = COURSE.CourseNo join order and techniques are left unspecified DBMS query optimizer identifies efficient execution strategy: “query execution plan”

August 2005Picasso (VLDB)3 Example Query and Plan select c_custkey, c_name, c_acctbal, c_address, c_phone, c_comment from customer, orders, nation where c_custkey = o_custkey and c_nationkey = n_nationkey and o_orderdate < date (' ') and n_name = 'IRAQ' RETURN 201,689 HSJOIN 201,689 TBSCAN 175,025 ORDERS HSJOIN 26,571 TBSCAN 50 CUSTOMERNATION TBSCAN 26,512

August 2005Picasso (VLDB)4 Query Plan Selection Core technique Cost difference between best plan choice and a sub-optimal choice can be enormous (orders of magnitude) Query (Q) Query Optimizer (dynamic programming) Minimum Cost Plan P(Q) DB catalogs Cost Model

August 2005Picasso (VLDB)5 Plan and Cost Diagrams Given a query, the optimizer’s plan choice is a function (among other factors) of the selectivities of the base relations participating in the query – selectivity is the estimated number of rows of a relation relevant to producing the final result A plan diagram is a pictorial enumeration of the plan choices of a database query optimizer over the relational selectivity space A cost diagram is a visualization of the associated (estimated) plan execution costs over the same relational selectivity space

August 2005Picasso (VLDB)6 Example Query [Q7 of TPC-H] select supp_nation, cust_nation, l_year, sum(volume) as revenue from (select n1.n_name as supp_nation, n2.n_name as cust_nation, extract(year from l_shipdate) as l_year, l_extendedprice * (1 - l_discount) as volume from supplier, lineitem, orders, customer, nation n1, nation n2 where s_suppkey = l_suppkey and o_orderkey = l_orderkey and c_custkey = o_custkey and s_nationkey = n1.n_nationkey and c_nationkey = n2.n_nationkey and ((n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY') or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')) and l_shipdate between date ' ' and date ' ' group by supp_nation, cust_nation, l_year order by supp_nation, cust_nation, l_year and o_totalprice < C1 and c_acctbal < C2) as shipping

August 2005Picasso (VLDB)7 Example Plan Diagram

August 2005Picasso (VLDB)8 Specific Plan Choices

August 2005Picasso (VLDB)9 Example Cost Diagram

PICASSO

August 2005Picasso (VLDB)11 Picasso A Java tool that, given a query template, automatically generates plan and cost diagrams – Fires queries at user-specified granularity (default 100x100 grid) – Currently restricted to 2-D plan diagrams and 3-D cost diagrams Using the tool, enumerated the plan/cost diagrams produced by industrial-strength query optimizers on TPC-H-based queries – IBM DB2 v8, Oracle 9i and Microsoft SQL Server 2000 Plan diagrams appear similar to cubist paintings [ Pablo Picasso  founder of the cubist painting genre ]

August 2005Picasso (VLDB)12 Picasso GUI

August 2005Picasso (VLDB)13 Testbed Environment Database – TPC-H database (1 GB scale) representing a manufacturing environment, featuring the following relations: Query Set – Queries based on TPC-H benchmark [Q1 through Q22] – Uniform 100x100 grid (10000 queries) [0.5%, 0.5%] to [99.5%, 99.5%] Relational Engines – Default installations (with all optimization features on) – Stats on all columns, no extra indices Computational Platform – Pentium-IV 2.4 GHz, 1GB RAM, Windows XP Professional RelationCardinality REGION NATION SUPPLIER CUSTOMER PART PARTSUPP ORDERS LINEITEM

RESULTS Optimizers randomly identified as Opt A, Opt B, Opt C NOT intended to make comparisons across optimizers Black-box testing  our conclusions are speculative Full result listing at

August 2005Picasso (VLDB)15 Smooth Plan Diagram [Q7, OptB]

August 2005Picasso (VLDB)16 Complex Plan Diagram [Q8, OptA*] Extremely fine- grained coverage (P68 ~ 0.02%) Highly irregular plan boundaries Intricate Complex Patterns Increases to 80 plans with 300x300 grid !

August 2005Picasso (VLDB)17 Cost Diagram [Q8, OptA*]

August 2005Picasso (VLDB)18 Skew in Plan Space Coverage TPC-H Query Avg(dense) Opt A Plan 80% Gini Cardinality Coverage Index 22 18% % % % % %0.79 Opt B Plan 80% Gini Cardinality Coverage Index 14 21% % % % % %0.72 Opt C Plan 80% Gini Cardinality Coverage Index 35 20% % % % % % % % % % % % % % % % % % % % % % % % Rule Gini index > 0.5 Dense  Plan Cardinality  10

August 2005Picasso (VLDB)19 Cost Domination Principle Cost of executing any “foreign” query point in the first quadrant of q s is an upper bound on the cost of executing the foreign plan at q s Cost of executing q s with Plans P 4 and P 1 is less than 91 and 90, respectively. Cost of Query point q s with plan P 2 is 88 qsqs

August 2005Picasso (VLDB)20 Cost Domination Principle Dominating Point Given a pair of distinct points q 1 (x 1,y 1 ) and q 2 (x 2,y 2 ) in 2-D selectivity space, we say that q 2 ≻ q 1, if and only if x 2 ≥ x 1, y 2 ≥ y 1 and result cardinality R q2 ≥ R q1 Cost Domination Principle If points q 1 (x 1,y 1 ) and q 2 (x 2,y 2 ) are associated with distinct plans P 1 and P 2 respectively, in the original space, the cost of executing query q 1 with plan P 2 is upper-bounded by the cost of executing q 2 with P 2, if and only if q 2 ≻ q 1

August 2005Picasso (VLDB)21 Plan Swallowing Algorithm 1. For each query point q s, look for replacements by “foreign” query points that are in the first quadrant relative to q s as the origin. 2. For the foreign points that are within λ (e.g. λ=10%) threshold, choose point with lowest cost as potential replacement. 3. An entire plan is “swallowed” only if all its query points can be swallowed by a single plan or group of plans. 4. Order the plans in ascending order of size; go up the list, checking for possibility of swallowing each plan.

August 2005Picasso (VLDB)22 Reduced Plan Diagram (λ=10%) Complex Plan Diagram [Q8, OptA*] Reduced to 7 plans from 68 Note: Plan Reduction ≠ Change in Optimization Levels

August 2005Picasso (VLDB)23 Plan Cardinality Reduction by Swallowing TPC-H Query Opt A % Avg Max Card Cost Cost Decrease Increase Increase Opt C % Avg Max Card Cost Cost Decrease Increase Increase Opt B % Avg Max Card Cost Cost Decrease Increase Increase Avg(dense) Average Cost Increase < 2%

Interesting Patterns Duplicates and Islands Plan Switch Points Footprint Pattern Speckle Pattern

August 2005Picasso (VLDB)25 Duplicates and Islands [Q10, OptA] Duplicate locations of P3 Duplicate locations of P10 P18 is an island within P6

August 2005Picasso (VLDB)26 Duplicates and Islands [Q5, OptC] Three duplicates of P7, which are islands within P1

August 2005Picasso (VLDB)27 Duplicates and Islands Removal Databases Opt A Opt B Opt C # Duplicates Original Reduced # Islands Original Reduced With Plan Reduction by Swallowing, significant decrease in duplicates and islands

August 2005Picasso (VLDB)28 Plan Switch Points [Q9, OptA] Plan Switch Point: line parallel to axis with a plan shift for all plans bordering the line. Hash-Join sequence PARTSUPP►◄SUPPLIER►◄PART is altered to PARTSUPP►◄PART►◄SUPPLIER

August 2005Picasso (VLDB)29 Venetian Blinds [Q9, Opt B] Six plans simultaneously change with rapid alternations to produce a “Venetian blinds” effect. Left-deep hash join across NATION, SUPPLIER and LINEITEM relations gets replaced by a right-deep hash join.

August 2005Picasso (VLDB)30 Footprint Pattern [Q7, Opt A] P7 exhibits a thin broken curved pattern in the middle of P2’s region. P2 has sort-merge-join at the top of the plan tree, while P7 has hash- join

August 2005Picasso (VLDB)31 Speckle Pattern [Q17, Opt A*] An additional sort operation is present on PART relation in P2, whose cost is very low

Non-Monotonic Cost Behavior Plan-Switch Non-Monotonic Costs Intra-Plan Non-Monotonic Costs

August 2005Picasso (VLDB)33 Plan-Switch Non-Monotonic Costs [Q2, Opt A] Plan Diagram Cost Diagram 26% Selectivity 50% Selectivity 26%: Cost decreases by a factor of 50 50%: Cost increases by a factor of 70 Presence of Rules? Parameterized changes in search space?

August 2005Picasso (VLDB)34 Intra-Plan Non-Monotonic Costs [Q21, Opt A] Plan Diagram Cost Diagram Plans P1, P3, P4 and P6 Nested loops join whose cost decreases with increasing input cardinalities

Relationship to PQO

August 2005Picasso (VLDB)36 PQO (Parametric Query Optimization) Active research area for last 15 years Identify the optimal set of plans for the entire relational selectivity space at compile time At run time, use actual selectivity values to identify the appropriate plan choice Assumptions – Plan Convexity: If a plan is optimal at two points, then it is optimal at all points on the straight line joining them – Plan Uniqueness: An optimal plan appears at only one contiguous region in the entire space – Plan Homogeneity: An optimal plan is optimal within the entire region enclosed by its plan boundaries

August 2005Picasso (VLDB)37 Validity of PQO [Q8, Opt A] Plan Convexity is severely violated by regions covered by P12 (dark green) and P16 (light gray). Plan Homogeneity is violated by P14 Plan uniqueness is violated by P4

August 2005Picasso (VLDB)38 Note: PQO is a more viable proposition in the space of reduced plan diagrams due to the removal of most duplicates and islands

August 2005Picasso (VLDB)39 Conclusions Conceived and developed the Picasso tool for automatically generating plan and cost diagrams Presented and analyzed representative plan and cost diagrams on popular commercial query optimizers – Optimizers make fine grained choices – Complexity of plan diagrams can be drastically reduced without materially affecting the query processing quality – Plan optimality regions can have intricate patterns and complex boundaries – Non-Monotonic cost behavior exists where increasing result cardinalities decrease the estimated cost – Basic assumptions underlying research literature on PQO do not hold in practice; but hold approximately for reduced plan diagrams

August 2005Picasso (VLDB)40 Future Work Extend Picasso to generate higher-dimensional plan and cost diagrams Port Picasso to Sybase and PostgreSQL Conduct a deeper investigation on the query features that result in high plan cardinalities Investigate whether it is possible to simplify the optimizer, so as to be able to directly produce reduced plan diagrams Public release of Picasso software (by end-2005)

QUESTIONS ?

END PICASSO