CMSC724: Database Management Systems Instructor: Amol Deshpande

Slides:



Advertisements
Similar presentations
Query Optimization Reserves Sailors sid=sid bid=100 rating > 5 sname (Simple Nested Loops) Imperative query execution plan: SELECT S.sname FROM Reserves.
Advertisements

Query Optimization CS634 Lecture 12, Mar 12, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Relational Query Optimization Chapters 14.
Query Optimization Goal: Declarative SQL query
1 Overview of Query Evaluation Chapter Objectives  Preliminaries:  Core query processing techniques  Catalog  Access paths to data  Index matching.
1 Relational Query Optimization Module 5, Lecture 2.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Overview of Query Evaluation Chapter 12.
Query Rewrite: Predicate Pushdown (through grouping) Select bid, Max(age) From Reserves R, Sailors S Where R.sid=S.sid GroupBy bid Having Max(age) > 40.
Cs44321 CS4432: Database Systems II Query Optimizer – Cost Based Optimization.
CS263 Lecture 19 Query Optimisation.  Motivation for Query Optimisation  Phases of Query Processing  Query Trees  RA Transformation Rules  Heuristic.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Overview of Query Evaluation Chapter 12.
CMSC828K: Anatomy of a Database System Instructor: Amol Deshpande
Database System Concepts 5 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 14: Query Optimization.
Cost-Based Plan Selection Choosing an Order for Joins Chapter 16.5 and16.6 by:- Vikas Vittal Rao ID: 124/227 Chiu Luk ID: 210.
Query Optimization 3 Cost Estimation R&G, Chapters 12, 13, 14 Lecture 15.
Query Optimization. General Overview Relational model - SQL  Formal & commercial query languages Functional Dependencies Normalization Physical Design.
16.5 Introduction to Cost- based plan selection Amith KC Student Id: 109.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Relational Query Optimization Chapter 15.
CSCI 5708: Query Processing I Pusheng Zhang University of Minnesota Feb 3, 2004.
1 External Sorting for Query Processing Yanlei Diao UMass Amherst Feb 27, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
Query Optimization Overview Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems December 2, 2004 Some slide content derived.
CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.
Query Processing Presented by Aung S. Win.
Query Optimization, part 2 CS634 Lecture 13, Mar Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Overview of Implementing Relational Operators and Query Evaluation
Introduction to Database Systems1 Relational Query Optimization Query Processing: Topic 2.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Query Evaluation Chapter 12: Overview.
Access Path Selection in a Relational Database Management System Selinger et al.
COMP 5138 Relational Database Management Systems Semester 2, 2007 Lecture 12 Query Processing and Optimization.
Query Optimization. overview Histograms A histogram is a data structure maintained by a DBMS to approximate a data distribution Equiwidth vs equidepth.
1 Overview of Query Evaluation Chapter Overview of Query Evaluation  Plan : Tree of R.A. ops, with choice of alg for each op.  Each operator typically.
Database systems/COMP4910/Melikyan1 Relational Query Optimization How are SQL queries are translated into relational algebra? How does the optimizer estimates.
Query optimization in relational DBs Leveraging the mathematical formal underpinnings of the relational model.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
CMSC424: Database Design Instructor: Amol Deshpande
Query Optimization Arash Izadpanah. Introduction: What is Query Optimization? Query optimization is the process of selecting the most efficient query-evaluation.
CPS216: Advanced Database Systems Notes 08:Query Optimization (Plan Space, Query Rewrites) Shivnath Babu.
Lecture 4 - Query Optimization Advanced Databases Masood Niazi Torshiz Islamic Azad university- Mashhad Branch
Query Optimization March 10 th, Very Big Picture A query execution plan is a program. There are many of them. The optimizer is trying to chose a.
1 Chapter 10 Joins and Subqueries. 2 Joins & Subqueries Joins – Methods to combine data from multiple tables – Optimizer information can be limited based.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Overview of Implementing Relational Operators and Query Evaluation Chapter 12.
Introduction to Query Optimization, R. Ramakrishnan and J. Gehrke 1 Introduction to Query Optimization Chapter 13.
CS4432: Database Systems II Query Processing- Part 2.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Introduction to Query Optimization Chapter 13.
Query Processing – Query Trees. Evaluation of SQL Conceptual order of evaluation – Cartesian product of all tables in from clause – Rows not satisfying.
ICOM 6005 – Database Management Systems Design Dr. Manuel Rodríguez-Martínez Electrical and Computer Engineering Department Lecture 15 – Query Optimization.
Query Processing CS 405G Introduction to Database Systems.
ICOM 6005 – Database Management Systems Design Dr. Manuel Rodríguez-Martínez Electrical and Computer Engineering Department Lecture 12 – Introduction to.
CS 440 Database Management Systems Query Optimization 1.
1 Choosing an Order for Joins. 2 What is the best way to join n relations? SELECT … FROM A, B, C, D WHERE A.x = B.y AND C.z = D.z Hash-Join Sort-JoinIndex-Join.
Query Processing and Query Optimization Database System Implementation CSE 507 Some slides adapted from Silberschatz, Korth and Sudarshan Database System.
Lecture 15: Query Optimization. Very Big Picture Usually, there are many possible query execution plans. The optimizer is trying to chose a good one.
Cost Estimation For each plan considered, must estimate cost: –Must estimate cost of each operation in plan tree. Depends on input cardinalities. –Must.
Chapter 13 Query Optimization Yonsei University 1 st Semester, 2015 Sanghyun Park.
1 Overview of Query Evaluation Chapter Outline  Query Optimization Overview  Algorithm for Relational Operations.
Chapter 13: Query Processing
CS4432: Database Systems II Query Processing- Part 1 1.
Query Optimization. overview Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin) DBA,
CHAPTER 19 Query Optimization. CHAPTER 19 Query Optimization.
Chapter 14: Query Optimization
CS222P: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.
Chapter 12: Query Processing
Introduction to Query Optimization
Chapter 15 QUERY EXECUTION.
Introduction to Database Systems
Overview of Query Evaluation
CS222P: Principles of Data Management Notes #13 Set operations, Aggregation, Query Plans Instructor: Chen Li.
Query Optimization.
Monday, 5/13/2002 Hash table indexes, query optimization
CS222: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.
Presentation transcript:

CMSC724: Database Management Systems Instructor: Amol Deshpande

Query Processing Assume single-user, single-threaded Concurrency managed by lower layers Steps: Parsing: attritube references, syntax etc… Catalog stored as “denormalized” tables Rewriting: Views, constants, logical rewrites (transitive predicates, true/false predicates), semantic (using constraints), subquery flattening

Query Processing Steps: Optimizer Executor “get_next()” iterator model  Narrow interface between iterators  Can be implemented independently  Assumes no-blocking-I/O Some low-level details  Tuple-descriptors  Very carefully allocated memory slots  “avoid in-memory copies”  Pin and unpin

Query Processing SQL Update/Delete “Halloween” problem Access Methods B+-Tree and heap files Multi-dimensional indexes not common init(SARG) “avoid too many back-and-forth function calls” Allow access by RID

Query Optimization Goal: Given a SQL query, find the best physical operator tree to execute the query Problems: Huge plan space More importantly, cheapest plan orders of magnitude cheaper than worst plans Typical compromise: avoid really bad plans Complex operators/semantics etc (R outerjoin S) join T =? R outerjoin (S join T)

Query Optimization Heuristical approaches Perform selection early (reduce number of tuples) Perform projection early (reduce number of attributes) Perform most restrictive selection and join operations before other similar operations. Don’t do Cartesian products INGRES: Always use NL-Join (indexed inner when possible) Order relations from smallest to biggest

Query Optimization A systematic approach: Define “plan space” A “cost estimation technique” An “enumeration algorithm” to search through the space

System-R Query Optimizer Define “plan space” Left-deep plans, no Cartesian products Nested-loops and sort-merge joins, sequential scans or index scans A “cost estimation technique” Use statistics (e.g. size of index, max, min etc) or magic numbers Formulas for computing the costs An “enumeration algorithm” to search through the space Dynamic programming

System-R Query Optimizer Dynamic programming Uses “principle of optimality” Bottom-up algorithm Compute the optimal plan(s) for each k-way join, k = 1, …, n  Only O(2^n) instead of O(n!) Computers plans for different “interesting orders”  Extended to “physical properties” later Another way to look at it:  Plans are not comparable if they produce results in different orders  An instance of multi-criteria optimization

Since then… Search space “Bushy” plans (especially useful for parallelization) Cartesian products (star queries in data warehouses) Algebraic transformations Can “group by” and “join” commute ? More physical operators Hash joins, semi-joins (crucial for distributed systems) Sub-query flattening, merging views “Query rewrite” Parallel/distributed scenarios…

Since then… Statistics and cost estimation Optimization only as good as cost estimates Histograms, sampling commonly used Correlations ? Ex: where model = “accord” and make = “honda”  Say both have selectivities  Then combined selectivity is also , not Learning from previous executions Learning optimizer SITS (MS SQL Server) Cost metric: Response time in parallel databases, buffer utilization…

Since then… Enumeration techniques Bottom-up more common Easier to implement, low memory footprint Top-down (Volcano/Cascades/SQL Server) More extensible, typically larger memory footprint etc... Neither work for large number of tables Randomized, genetic etc… More common to use heuristics instead “Parametric query optimization”

Other issues Non-centralized environments Distributed/parallel, P2P Data streams, web services Sensor networks?? User-defined functions Materialized views