Query Optimization


The execution cost is expressed as a weighted combination of I/O, CPU, and communication costs. Earlier distributed query optimizers ignored local processing costs (I/O and CPU), although these are also important. The main inputs the optimizer needs for estimating execution costs are fragment statistics and formulas for estimating the cardinalities of the results of relational operations. Query optimization is a general term, independent of whether the environment is centralized or distributed. It refers to the process of producing a query execution plan (QEP), which represents an execution strategy for the query.

This strategy or plan is chosen to minimize a cost function. A query optimizer is a software module that performs query optimization. It normally has three main components: a search space, a cost model, and a search strategy.

Search Space: The search space is the set of alternative execution plans that represent the input query. These plans are equivalent, in the sense that they yield the same result, but they differ in the execution order of operations and in the way these operations are implemented, and therefore in performance.

The search space is obtained by applying transformation rules, i.e., the equivalence rules of relational algebra. For example, consider the query "Find the names of employees other than J. Doe who worked on the CAD/CAM project for either one or two years". The SQL query for this search is:

SELECT ename FROM proj, asg, emp WHERE asg.eno = emp.eno AND asg.pno = proj.pno AND ename <> 'J. Doe' AND proj.pname = 'CAD/CAM' AND (dur = 12 OR dur = 24)

A tree transformation of the above query

A second tree for the same query

A third tree for the same query

Search space (cont.)
The three trees above illustrate the transformation rules. Query execution plans are typically abstracted by means of operator trees, which define the order in which the operations are executed. Operator trees are enriched with additional information, such as the best algorithm chosen for each operation. The search space can therefore be defined as the set of equivalent operator trees that can be produced using transformation rules. To characterize query optimizers, it is useful to concentrate on join trees, that is, operator trees whose operators are joins or Cartesian products.

Search space (example)
Consider the query:

SELECT ename, resp FROM emp, asg, proj WHERE emp.eno = asg.eno AND asg.pno = proj.pno
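To make the idea of a search space concrete, the sketch below enumerates the linear join orders for the three relations of the query above and flags the orders that need a Cartesian product. The helper names (`linear_join_orders`, `uses_cartesian_product`) are hypothetical and chosen only for illustration; the join predicates are the two given in the WHERE clause.

```python
from itertools import permutations

# Join predicates taken from the WHERE clause of the example query:
# emp.eno = asg.eno and asg.pno = proj.pno.
JOIN_PREDICATES = {
    frozenset(("EMP", "ASG")),
    frozenset(("ASG", "PROJ")),
}


def linear_join_orders(relations):
    """Return every order in which the relations can be joined one at a time."""
    return list(permutations(relations))


def uses_cartesian_product(order):
    """True if some relation is added without a join predicate linking it
    to any relation already in the plan."""
    joined = {order[0]}
    for rel in order[1:]:
        if not any(frozenset((rel, j)) in JOIN_PREDICATES for j in joined):
            return True
        joined.add(rel)
    return False


if __name__ == "__main__":
    for order in linear_join_orders(["EMP", "ASG", "PROJ"]):
        tag = "Cartesian product" if uses_cartesian_product(order) else "joins only"
        print(" -> ".join(order), ":", tag)
```

Even for three relations there are six linear orders, and the ones that start by combining EMP with PROJ have no join predicate to apply at the first step.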

Among the equivalent join trees, tree (c), which starts with a Cartesian product, may have a much higher cost than the other join trees. Query optimizers typically restrict the size of the search space they consider. The first restriction is to use heuristics. The most common heuristic is to perform selection and projection when accessing base relations. Another important restriction concerns the shape of the join tree. Two kinds of join trees are usually distinguished: linear trees and bushy trees.

Linear Tree: A tree in which at least one operand of each operator node is a base relation. By considering only linear trees, the size of the search space can be reduced.

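Bushy Tree: A bushy tree is more general and may have operators with no base relations as operands, i.e., both operands are intermediate relations. Bushy trees are nevertheless useful in a distributed environment, because they can exhibit parallelism.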
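The distinction between the two shapes can be checked mechanically. The following is a minimal sketch, assuming a join tree is encoded as nested pairs in which a string is a base relation and a 2-tuple is a join node; this encoding and the function name `is_linear` are illustrative assumptions, not part of the original slides.

```python
def is_linear(tree):
    """True if every join node has at least one base relation as an operand.

    A base relation is a string; a join node is a (left, right) tuple.
    """
    if isinstance(tree, str):
        return True
    left, right = tree
    if not (isinstance(left, str) or isinstance(right, str)):
        # Both operands are intermediate results: the tree is bushy.
        return False
    return is_linear(left) and is_linear(right)


# ((R1 join R2) join R3) join R4: each join has a base relation operand.
linear_tree = ((("R1", "R2"), "R3"), "R4")

# (R1 join R2) join (R3 join R4): the top join combines two intermediate results.
bushy_tree = (("R1", "R2"), ("R3", "R4"))

if __name__ == "__main__":
    print(is_linear(linear_tree))
    print(is_linear(bushy_tree))
```

In the bushy example, the two lower joins are independent of each other, which is what allows them to be evaluated in parallel at different sites.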

Cost Model
The cost model predicts the cost of a given execution plan. The cost of a distributed execution plan can be expressed in terms of either the total time or the response time. The total time is the sum of all time components, while the response time is the elapsed time from the initiation to the completion of the query. A general formula for determining the total time is given by [Lohman et al., 1985]:

Total_time = T_CPU * #insts + T_I/O * #I/Os + T_MSG * #msgs + T_TR * #bytes

T_CPU is the time of a CPU instruction and T_I/O is the time of a disk I/O. The communication time is captured by the last two components: T_MSG is the fixed time of initiating and receiving a message, and T_TR is the time it takes to transmit a data unit from one site to another. The data unit is expressed here in bytes (#bytes is the sum of the sizes of all messages).
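The total-time formula above translates directly into code. The sketch below is only an illustration: the class name `CostParams`, the function name `total_time`, and the numeric coefficients in the usage example are assumptions, not measured values.

```python
from dataclasses import dataclass


@dataclass
class CostParams:
    t_cpu: float  # time of one CPU instruction
    t_io: float   # time of one disk I/O
    t_msg: float  # fixed time of initiating and receiving one message
    t_tr: float   # time to transmit one byte from one site to another


def total_time(p, insts, ios, msgs, nbytes):
    """Total_time = T_CPU*#insts + T_I/O*#I/Os + T_MSG*#msgs + T_TR*#bytes."""
    return (p.t_cpu * insts
            + p.t_io * ios
            + p.t_msg * msgs
            + p.t_tr * nbytes)


if __name__ == "__main__":
    # Hypothetical coefficients in arbitrary time units.
    params = CostParams(t_cpu=1, t_io=10, t_msg=100, t_tr=2)
    print(total_time(params, insts=5, ios=3, msgs=2, nbytes=4))
```

The first two terms are the local processing cost and the last two are the communication cost, so setting `t_msg` and `t_tr` to zero gives the cost of a purely centralized plan.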

Search Strategy
The search strategy explores the search space and selects the best plan, using the cost model. It defines which plans are examined and in which order. The details of the environment (centralized or distributed) are captured by the search space and the cost model. The most popular search strategy used by optimizers is dynamic programming, which is deterministic. Deterministic strategies build plans starting from the base relations and add one relation at each step until complete plans are obtained. Dynamic programming builds all possible plans before it selects the "best" plan.
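A minimal sketch of dynamic programming over linear join orders follows. It rests on several assumptions that are not in the slides: the cardinalities and selectivities are invented, the cost of a plan is taken to be the sum of the sizes of its intermediate and final results, and the names `result_size` and `dp_best_linear_plan` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical statistics, chosen only for illustration.
CARD = {"EMP": 400, "ASG": 1000, "PROJ": 100}
SEL = {
    frozenset(("EMP", "ASG")): Fraction(1, 400),
    frozenset(("ASG", "PROJ")): Fraction(1, 100),
}


def result_size(relations):
    """Estimated cardinality of joining the given relations."""
    size = Fraction(1)
    for rel in relations:
        size *= CARD[rel]
    for a, b in combinations(relations, 2):
        # A pair without a join predicate contributes selectivity 1
        # (Cartesian product).
        size *= SEL.get(frozenset((a, b)), 1)
    return size


def dp_best_linear_plan(relations):
    """Return (cost, join order) of the cheapest linear plan.

    best[S] keeps only the cheapest plan found for each set S of relations;
    every other partial plan for S is discarded.
    """
    best = {frozenset([r]): (Fraction(0), (r,)) for r in relations}
    for k in range(2, len(relations) + 1):
        for subset in combinations(relations, k):
            s = frozenset(subset)
            candidates = []
            for rel in subset:
                prev_cost, prev_order = best[s - {rel}]
                candidates.append((prev_cost + result_size(s), prev_order + (rel,)))
            best[s] = min(candidates)
    return best[frozenset(relations)]


if __name__ == "__main__":
    cost, order = dp_best_linear_plan(["EMP", "ASG", "PROJ"])
    print(order, cost)
```

Plans are built bottom-up, one relation at a time, exactly as described for deterministic strategies; the table `best` grows with the number of subsets of relations, which is why the approach is practical only for a small number of relations.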

In dynamic programming, which proceeds breadth-first, partial plans that are unlikely to lead to the optimal plan are pruned as soon as they are found. Another deterministic strategy, the greedy algorithm, builds only one plan, depth-first. Dynamic programming is almost exhaustive and ensures that the "best" of all plans is found. Its cost is acceptable when the number of relations in the query is small. Unlike deterministic strategies, randomized strategies allow the optimizer to trade optimization time for execution time. Randomized strategies such as iterative improvement and simulated annealing concentrate on searching for the optimal solution around some particular points. They do not guarantee that the best solution is obtained, but they avoid the high cost of optimization in terms of memory and time consumption.
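The two alternatives to dynamic programming can be sketched in the same toy setting. Everything here is an assumption made for illustration: the statistics are invented, the cost of a linear plan is the sum of its intermediate and final result sizes, and `iterative_improvement` is a simplified single descent that swaps two relations at random, not the full algorithm from the literature.

```python
import random
from fractions import Fraction
from itertools import combinations

# Hypothetical statistics, chosen only for illustration.
CARD = {"EMP": 400, "ASG": 1000, "PROJ": 100}
SEL = {
    frozenset(("EMP", "ASG")): Fraction(1, 400),
    frozenset(("ASG", "PROJ")): Fraction(1, 100),
}


def result_size(relations):
    """Estimated cardinality of joining the given relations."""
    size = Fraction(1)
    for rel in relations:
        size *= CARD[rel]
    for a, b in combinations(relations, 2):
        size *= SEL.get(frozenset((a, b)), 1)  # no predicate: Cartesian product
    return size


def plan_cost(order):
    """Sum of the sizes of all intermediate and final results of a linear plan."""
    return sum(result_size(order[:k]) for k in range(2, len(order) + 1))


def greedy_plan(relations):
    """Build a single plan: start from the smallest relation, then always add
    the relation that yields the smallest next intermediate result."""
    remaining = set(relations)
    first = min(remaining, key=lambda r: (CARD[r], r))
    order = [first]
    remaining.remove(first)
    while remaining:
        nxt = min(remaining, key=lambda r: (result_size(order + [r]), r))
        order.append(nxt)
        remaining.remove(nxt)
    return tuple(order)


def iterative_improvement(relations, steps=50, seed=0):
    """Start from a random order and accept random swaps that lower the cost."""
    rng = random.Random(seed)
    order = list(relations)
    rng.shuffle(order)
    for _ in range(steps):
        i, j = rng.sample(range(len(order)), 2)
        candidate = order[:]
        candidate[i], candidate[j] = candidate[j], candidate[i]
        if plan_cost(candidate) < plan_cost(order):
            order = candidate
    return tuple(order)


if __name__ == "__main__":
    rels = ["EMP", "ASG", "PROJ"]
    print(greedy_plan(rels), plan_cost(greedy_plan(rels)))
    print(iterative_improvement(rels))
```

Neither function keeps a table of partial plans, so memory use stays small, but neither offers a guarantee that the plan it returns is the cheapest one.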

Thanks