Chiu Luk CS257 Database Systems Principles Spring 2009


Query Compiler

Cost-based query optimization
1. Generate all possible execution plans (using heuristics to avoid some unlikely ones)
2. Estimate the cost of executing each of the generated plans
3. Choose the cheapest one
Optimization criteria
– Number of disk blocks read (dominates)
– CPU usage
– Normally a weighted average of the different criteria
Estimates are based on data statistics and a cost model for the operators in the physical relational algebra.
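The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular system: the table sizes, the single uniform join selectivity, and the names `TABLE_SIZES`, `JOIN_SELECTIVITY`, `plan_cost` and `cheapest_plan` are all assumptions made for the example, and the cost measure (sum of intermediate result sizes) stands in for a real disk-block estimate.

```python
from itertools import permutations

# Hypothetical statistics: table name -> number of tuples.
TABLE_SIZES = {"R": 1000, "S": 100, "T": 10}

# Hypothetical selectivity applied to every join
# (fraction of the cross product that survives the join predicate).
JOIN_SELECTIVITY = 0.01


def plan_cost(order, sizes=TABLE_SIZES, selectivity=JOIN_SELECTIVITY):
    """Step 2: estimate the cost of a left-deep join order.

    The cost is taken to be the sum of the estimated sizes of all
    intermediate results, a simplifying assumption for this sketch.
    """
    current = float(sizes[order[0]])
    cost = 0.0
    for name in order[1:]:
        current = current * sizes[name] * selectivity
        cost += current
    return cost


def cheapest_plan(tables):
    """Step 1: enumerate every join order; step 3: keep the cheapest."""
    return min(permutations(tables), key=plan_cost)


if __name__ == "__main__":
    best = cheapest_plan(["R", "S", "T"])
    print(best, plan_cost(best))
```

Exhaustive enumeration grows factorially with the number of tables, which is why real optimizers prune the search space with heuristics, as step 1 notes.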

Query execution plan (physical algebra)
A query execution plan is a functional program with evaluation primitives:
– Tuple scan operator
– Tuple selection operator
– Various index scan operators
– Various join algorithms
– Sort operator
– Duplicate elimination operator
– .....
Normally pipelined execution
– Streams of tuples are produced as intermediate results
– Intermediate results can sometimes be materialized to
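Pipelined execution of such primitives can be illustrated with Python generators, where each operator pulls tuples from its input one at a time. This is only a sketch: the operator names, the dictionary row format, and the `EMPLOYEES` and `DEPARTMENTS` sample data are hypothetical and chosen for the example.

```python
# Hypothetical sample data; each tuple is represented as a dict.
EMPLOYEES = [
    {"emp": "Ann", "dept_id": 1, "age": 30},
    {"emp": "Bob", "dept_id": 2, "age": 45},
    {"emp": "Cy", "dept_id": 1, "age": 52},
]
DEPARTMENTS = [
    {"dept_id": 1, "dept": "Sales"},
    {"dept_id": 2, "dept": "IT"},
]


def scan(table):
    """Tuple scan operator: yield each stored tuple in turn."""
    for row in table:
        yield row


def select(rows, predicate):
    """Tuple selection operator: pass on only tuples satisfying the predicate."""
    for row in rows:
        if predicate(row):
            yield row


def nested_loop_join(outer, inner_table, condition):
    """One possible join algorithm: rescan the inner table for every outer tuple."""
    for o in outer:
        for i in scan(inner_table):
            if condition(o, i):
                yield {**o, **i}


def sort(rows, key):
    """Sort operator: blocking, so it materializes its whole input first."""
    return iter(sorted(rows, key=key))


def example_plan():
    """Plan for: employees older than 40, joined with their department."""
    older = select(scan(EMPLOYEES), lambda r: r["age"] > 40)
    joined = nested_loop_join(
        older, DEPARTMENTS, lambda e, d: e["dept_id"] == d["dept_id"]
    )
    return sort(joined, key=lambda r: r["emp"])


if __name__ == "__main__":
    for row in example_plan():
        print(row)
```

The scan, selection and join stream tuples without storing intermediate results, while the sort has to consume all of its input before producing output, which is one case where a plan cannot stay fully pipelined.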

Degrees of freedom for optimizer
Query plan must be efficient and correct
Choice of, e.g.:
– Scan tuples sequentially
– Traverse index structure (e.g. B-tree, hash table)
– Choose order of joining tables
– Choose algorithms used for each join
– Adapt to available main memory
– Pipeline intermediate results (streaming)
– Materialize intermediate results if favourable
– Eliminate duplicates in stream
– Sort intermediate results

Query Cost Model
Basic cost parameters
– Cost of accessing disk block randomly
– Data transfer rate
– Clustering of data rows on disk
– Sort order of data rows on disk
– Cost of scanning disk segment containing tuples
– Cost models for different index access methods (tree structures, hashing)
– Cost models for different join methods
– Cost of sorting intermediate results
Total cost of an execution plan
– The total cost depends on how often primitive operations are invoked.
– The invocation frequency depends on the size of intermediate results.
– Sizes of intermediate results are estimated from statistics computed over the data stored in the database.
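As a rough illustration of how such parameters combine, the following sketch compares a sequential scan with an index lookup for a single selection. The unit costs in `CostParams`, the formulas, and all function names are assumptions made for this example; a real cost model is considerably more detailed.

```python
import math
from dataclasses import dataclass


@dataclass
class CostParams:
    # Hypothetical unit costs, in arbitrary units.
    random_block_cost: float = 10.0      # one random disk block access
    sequential_block_cost: float = 1.0   # one block read during a sequential scan


def full_scan_cost(num_blocks, params):
    """Sequential scan: every block of the table is read once."""
    return num_blocks * params.sequential_block_cost


def index_scan_cost(matching_rows, index_depth, clustered, rows_per_block, params):
    """Index scan: traverse the index, then fetch the matching rows.

    With a clustered index the matching rows sit in adjacent blocks;
    otherwise each matching row is assumed to cost one random access.
    """
    traverse = index_depth * params.random_block_cost
    if clustered:
        blocks = math.ceil(matching_rows / rows_per_block)
        return traverse + blocks * params.sequential_block_cost
    return traverse + matching_rows * params.random_block_cost


def cheapest_access_path(num_blocks, matching_rows, index_depth,
                         clustered, rows_per_block, params=None):
    """Return the name and estimated cost of the cheaper access path."""
    params = params or CostParams()
    scan = full_scan_cost(num_blocks, params)
    index = index_scan_cost(matching_rows, index_depth, clustered,
                            rows_per_block, params)
    return ("index scan", index) if index < scan else ("full scan", scan)


if __name__ == "__main__":
    # A 1000-block table with 100 rows per block and a 3-level index.
    print(cheapest_access_path(1000, 50, 3, False, 100))
    print(cheapest_access_path(1000, 5000, 3, False, 100))
    print(cheapest_access_path(1000, 5000, 3, True, 100))
```

The example shows why the number of matching rows, which is itself an estimate taken from statistics, decides the outcome: an unclustered index wins only when few rows qualify.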

Data statistics
Statistics used to estimate size of intermediate results:
– Size of tables
– Number of different values in column
– Histogram of distributions of column values
– Model for estimating how selective a predicate is, its selectivity:
  • E.g. selectivity of PNR=xxxx, AGE>xxx, etc.
– Model for estimating sizes of results from joins
The models are often very rough
– They work rather well because the models are used only for comparing different execution strategies, not for getting the exact execution costs.
Cost of maintaining data statistics
– Cheap: e.g. size of relation, depth of B-tree.
– Expensive: e.g. distribution on non-indexed columns, histograms
– Occasional statistics updates when load is low
Statistics not always up-to-date
– Wrong statistics -> sub-optimal but correct plans
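The kinds of estimates listed above can be sketched as follows. The formulas are common textbook approximations (uniform distribution over distinct values for equality, uniform distribution inside each histogram bucket for ranges, and division by the larger distinct-value count for equi-joins); whether the slides intend exactly these formulas is an assumption, and the function names and histogram layout are hypothetical.

```python
def equality_selectivity(distinct_values):
    """Selectivity of column = constant, assuming values are uniformly distributed."""
    return 1.0 / distinct_values


def range_selectivity(histogram, threshold):
    """Estimated fraction of rows with column > threshold.

    histogram is a list of (low, high, count) buckets; values are assumed
    to be spread uniformly inside each bucket.
    """
    total = sum(count for _, _, count in histogram)
    matching = 0.0
    for low, high, count in histogram:
        if threshold <= low:
            matching += count  # whole bucket qualifies
        elif threshold < high:
            matching += count * (high - threshold) / (high - low)
    return matching / total


def join_size(rows_r, rows_s, distinct_r, distinct_s):
    """Estimated result size of an equi-join on one column."""
    return rows_r * rows_s / max(distinct_r, distinct_s)


if __name__ == "__main__":
    # Hypothetical AGE histogram with four equal-width buckets.
    age_histogram = [(0, 20, 100), (20, 40, 300), (40, 60, 400), (60, 80, 200)]
    print(equality_selectivity(4))
    print(range_selectivity(age_histogram, 50))
    print(join_size(1000, 200, 50, 100))
```

Such estimates are crude, but as the slide notes they only need to rank alternative plans correctly, not predict exact costs.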

Optimizing large queries
– Don’t optimize at all, i.e. the order of predicates is significant (old Oracle, old MySQL)
– Optimize partly, i.e. up to approximately 8 joins, and leave the rest unoptimized (new Oracle)
– Heuristic methods (e.g. greedy optimization)
– Randomized (Monte Carlo) methods
– User manually breaks down large queries into many small, optimizable queries (often necessary for translating relational representations to complex object structures in application programs)
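One form of greedy optimization can be sketched as follows: start with the smallest table and repeatedly join in the table that yields the smallest estimated intermediate result. This is a hedged illustration of the general idea, not the algorithm of any specific product; the function name, the input format, and the sample statistics are assumptions.

```python
def greedy_join_order(sizes, selectivity):
    """Pick a join order greedily.

    sizes maps table name -> row count. selectivity maps
    frozenset({a, b}) -> join selectivity; table pairs without an entry
    are treated as a cross product (selectivity 1.0).
    """
    remaining = set(sizes)
    first = min(remaining, key=lambda t: (sizes[t], t))
    order = [first]
    remaining.remove(first)
    current_size = float(sizes[first])

    while remaining:
        def result_size(table):
            # Apply every join predicate between the new table and those already joined.
            sel = 1.0
            for joined in order:
                sel *= selectivity.get(frozenset((joined, table)), 1.0)
            return current_size * sizes[table] * sel

        best = min(remaining, key=lambda t: (result_size(t), t))
        current_size = result_size(best)  # computed before 'order' changes
        order.append(best)
        remaining.remove(best)
    return order


if __name__ == "__main__":
    # Hypothetical statistics for three tables.
    sizes = {"A": 1000, "B": 10, "C": 100}
    selectivity = {
        frozenset(("A", "B")): 0.001,
        frozenset(("B", "C")): 0.1,
        frozenset(("A", "C")): 0.01,
    }
    print(greedy_join_order(sizes, selectivity))
```

The greedy choice needs only a quadratic number of estimates instead of enumerating every order, at the price of possibly missing the cheapest plan.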

