Query optimization in relational DBs: leveraging the formal mathematical underpinnings of the relational model.

Pre-processing a query
- Convert SQL to algebra
- Create a binary parse tree
- Assign execution fragments to each interior node
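
As a rough illustration of the first two steps, the sketch below hand-builds a binary parse tree for one toy query and prints its algebra form. The class names (Relation, Select, Project, Join), the helper to_algebra, and the emp/dept schema are all hypothetical, chosen for the example; real systems use their own internal representations.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Relation:          # leaf: a stored table
    name: str


@dataclass(frozen=True)
class Select:            # unary: keep rows satisfying a predicate
    predicate: str
    child: "Node"


@dataclass(frozen=True)
class Project:           # unary: keep only some attributes
    attributes: tuple
    child: "Node"


@dataclass(frozen=True)
class Join:              # binary: combine two inputs
    condition: str
    left: "Node"
    right: "Node"


Node = Union[Relation, Select, Project, Join]


def to_algebra(node):
    """Render a parse tree as a relational-algebra style string."""
    if isinstance(node, Relation):
        return node.name
    if isinstance(node, Select):
        return f"select[{node.predicate}]({to_algebra(node.child)})"
    if isinstance(node, Project):
        return f"project[{', '.join(node.attributes)}]({to_algebra(node.child)})"
    return f"({to_algebra(node.left)} join[{node.condition}] {to_algebra(node.right)})"


# Hypothetical query:
#   SELECT e.name FROM emp e JOIN dept d ON e.dept_id = d.id WHERE d.city = 'Oslo'
tree = Project(
    ("e.name",),
    Select(
        "d.city = 'Oslo'",
        Join("e.dept_id = d.id", Relation("emp"), Relation("dept")),
    ),
)
print(to_algebra(tree))
```

Every interior node here has one or two children, which is what makes the later tree manipulations easy to express.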

Optimizing a query: stage 1
- Manipulate the parse tree to create new but equivalent trees that will be faster regardless of the properties of the database
- Select a handful that are most likely to be the fastest

Stage 2: collect information about the DB's physical state
- Size of the tables being manipulated (in both rows and space)
- Sort order of the tables
- Location of indices on the tables
- Distribution of PKs and FKs (to estimate the hit ratio on joins)
- Hit ratio of attributes referenced by unary (one-operand) operations (select and project)
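
To make the use of these statistics concrete, here is a small sketch of two common textbook estimates. It assumes values are uniformly distributed, which real data often violates; the TableStats class, the function names, and the emp/dept numbers are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    rows: int        # number of rows in the table
    distinct: dict   # attribute name -> number of distinct values


def estimate_equality_selection(stats, attribute):
    """Expected rows matching `attribute = constant`, assuming uniform values."""
    return stats.rows / stats.distinct[attribute]


def estimate_equijoin(left, left_attr, right, right_attr):
    """Expected rows from an equi-join, assuming uniform values."""
    widest = max(left.distinct[left_attr], right.distinct[right_attr])
    return left.rows * right.rows / widest


# Hypothetical statistics: emp.dept_id is a foreign key to dept.id.
emp = TableStats(rows=10_000, distinct={"dept_id": 50})
dept = TableStats(rows=50, distinct={"id": 50})

print(estimate_equality_selection(emp, "dept_id"))   # rows per department
print(estimate_equijoin(emp, "dept_id", dept, "id"))  # rows in emp join dept
```

With a PK/FK join like this one, the estimate comes out as one output row per row of the referencing table, which matches intuition.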

Stage 3: transform parse tree into a tree that is likely to be faster
- Heuristics: unary operations
  - Push unary operations downstream
  - Bundle unary operations on common leaves
  - Goal is to perform unary operations together and before binary operations, to make binary operations faster
- Heuristics: joins
  - Most join algorithms treat the two operands differently
  - Make use of sort orders and index locations
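
The effect of pushing a selection below a join can be seen with a toy experiment. The sketch below uses in-memory lists of dicts and a naive nested-loop join that counts comparisons; the tables and the function name nested_loop_join are made up for illustration.

```python
def nested_loop_join(left, right, left_key, right_key):
    """Naive join that also reports how many row pairs it compared."""
    comparisons = 0
    out = []
    for l in left:
        for r in right:
            comparisons += 1
            if l[left_key] == r[right_key]:
                out.append({**l, **r})
    return out, comparisons


# Hypothetical data: 8 employees spread over 4 departments.
emp = [{"ename": f"e{i}", "dept_id": i % 4} for i in range(8)]
dept = [{"id": i, "city": "Oslo" if i == 0 else "Rome"} for i in range(4)]

# Plan A: join first, then apply the selection on city.
joined, cost_a = nested_loop_join(emp, dept, "dept_id", "id")
plan_a = [row for row in joined if row["city"] == "Oslo"]

# Plan B: push the selection down to the dept leaf, then join.
oslo = [d for d in dept if d["city"] == "Oslo"]
plan_b, cost_b = nested_loop_join(emp, oslo, "dept_id", "id")

print(plan_a == plan_b, cost_a, cost_b)
```

Both plans return the same rows, but the pushed-down plan hands a much smaller operand to the binary operation.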

Stage 4: choose implementations of binary operations
- If both tables are sorted by the desired attributes, do a merge join
- If one table has an index, do a nested join where the inner table is accessed via the index and the outer one is accessed sequentially
- Order unary in-line operations so that only necessary attributes are funneled upstream
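
The two join strategies named above can be sketched in a few lines each. This is an in-memory illustration only: a Python dict stands in for the index, and the table contents and function names are hypothetical.

```python
def merge_join(left, right, key_left, key_right):
    """Join two row lists that are already sorted on their join keys."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key_left], right[j][key_right]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][key_right] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][key_left] == lk:
                for r in right[j:j_end]:
                    out.append({**left[i], **r})
                i += 1
            j = j_end
    return out


def index_nested_loop_join(outer, inner_index, key_outer):
    """Scan the outer table sequentially; probe the inner one via an index."""
    out = []
    for o in outer:
        for r in inner_index.get(o[key_outer], []):
            out.append({**o, **r})
    return out


emp = [{"ename": "a", "dept_id": 1}, {"ename": "b", "dept_id": 1},
       {"ename": "c", "dept_id": 3}]                      # sorted on dept_id
dept = [{"id": 1, "dname": "x"}, {"id": 2, "dname": "y"},
        {"id": 3, "dname": "z"}]                          # sorted on id

dept_index = {}
for d in dept:                                            # dict as a stand-in index
    dept_index.setdefault(d["id"], []).append(d)

merged = merge_join(emp, dept, "dept_id", "id")
probed = index_nested_loop_join(emp, dept_index, "dept_id")
print(merged == probed, len(merged))
```

The merge join reads each input once in order, while the index variant never scans the inner table at all; which is cheaper depends on the sizes and on how costly an index probe is.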

Stage 4, notes on joins
- If no indices exist and the tables are not sorted the way we want, build an index on one, then do a sort-merge or a nested join
- If both tables have indices, use the index on the smaller table
- If the hit ratio of a join will be very small, do it as early as you can

Stage 5: consider multiple execution plans
- Estimate the costs of various execution plans and pick the best one
- Balance time spent optimizing against time spent executing
- For one-time-only transactions, pick a plan quickly
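
A cost comparison of this kind might look like the sketch below. The formulas are simplified textbook block-I/O estimates (they ignore buffering, sorting costs, and CPU time), and the block counts, row count, and probe cost are invented numbers, not measurements.

```python
def block_nested_loop_cost(outer_blocks, inner_blocks):
    """Read the outer once, and the whole inner once per outer block."""
    return outer_blocks + outer_blocks * inner_blocks


def merge_join_cost(left_blocks, right_blocks):
    """Both inputs already sorted: one sequential pass over each."""
    return left_blocks + right_blocks


def index_nested_loop_cost(outer_blocks, outer_rows, probe_cost):
    """Read the outer once, plus one index probe per outer row."""
    return outer_blocks + outer_rows * probe_cost


# Hypothetical sizes: outer has 100 blocks / 5000 rows, inner has 400 blocks,
# and an index probe is assumed to cost 4 block reads.
candidates = {
    "block nested loop": block_nested_loop_cost(100, 400),
    "merge join (inputs already sorted)": merge_join_cost(100, 400),
    "index nested loop": index_nested_loop_cost(100, 5000, 4),
}

best = min(candidates, key=candidates.get)
for name, cost in candidates.items():
    print(f"{name}: {cost} block reads")
print("chosen:", best)
```

An optimizer working under a time budget would evaluate only a few such candidates rather than every possible plan.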

Other important issues
- Use indices that are fast with respect to pulling in blocks from the disk
  - Wide but shallow trees, like B-trees
- An index on a sorted attribute can be interpolated
- Hash indices are good for key-based selections, but are not as versatile as B-trees
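
One reading of the interpolation point is that, when keys are sorted and roughly evenly spread, the position of a key can be guessed arithmetically instead of by repeated halving. The sketch below shows that idea as a plain interpolation search over a sorted list of integers; the function name and sample keys are assumptions for the example, and it is not a model of any specific index structure.

```python
def interpolation_search(keys, target):
    """Return the position of target in sorted integer keys, or -1 if absent."""
    lo, hi = 0, len(keys) - 1
    while lo <= hi and keys[lo] <= target <= keys[hi]:
        if keys[hi] == keys[lo]:
            # All remaining keys are equal, so target must equal keys[lo].
            return lo
        # Guess the position by linear interpolation between the end points.
        pos = lo + (target - keys[lo]) * (hi - lo) // (keys[hi] - keys[lo])
        if keys[pos] == target:
            return pos
        if keys[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1


keys = list(range(10, 1010, 10))        # 10, 20, ..., 1000 (evenly spaced)
print(interpolation_search(keys, 730))  # found on the first guess
print(interpolation_search(keys, 735))  # not present
```

On skewed key distributions the guesses degrade, which is one reason general-purpose systems lean on B-trees instead.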

A note on B-trees
- They are dynamic and therefore do not suffer from the migrating artificial primary key problem
- Since they are dynamic, the tradeoff is that they are slightly less space efficient and often lead to an extra layer in the tree

Important: manipulating parse trees
- There is a sort of algebra of parse tree manipulation
- For example, A join B and B join A are equivalent
- (A join B) join C = A join (B join C)
- If a and b are used as join attributes, we can strip away all attributes except those and the ones that will be needed by a later projection or selection
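
The commutativity and associativity rules can be checked on small examples. The sketch below implements a naive natural join over lists of dicts and compares results as sets, since row order and attribute order do not matter in the relational model; the relations A, B, C and the helper names are invented for the demonstration.

```python
def natural_join(left, right):
    """Join rows that agree on every attribute name the two inputs share."""
    out = []
    for l in left:
        for r in right:
            shared = l.keys() & r.keys()
            if all(l[k] == r[k] for k in shared):
                out.append({**l, **r})
    return out


def as_set(rows):
    """Order-insensitive view of a relation, for comparing results."""
    return {frozenset(row.items()) for row in rows}


A = [{"a": 1, "x": "p"}, {"a": 2, "x": "q"}]
B = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 30}]
C = [{"b": 10, "y": "u"}, {"b": 30, "y": "v"}]

commutes = as_set(natural_join(A, B)) == as_set(natural_join(B, A))
associates = (as_set(natural_join(natural_join(A, B), C))
              == as_set(natural_join(A, natural_join(B, C))))
print(commutes, associates)
```

Because the results are equal, the optimizer is free to pick whichever operand order and grouping it expects to be cheapest.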

More notes on optimization
- The smallest parse tree might not be the best
- Sometimes we add unary operations to make the operands of binary operations smaller
- If a unary operation will have a very small hit ratio, do it early
- The cost of a complete execution plan is complex to calculate
  - Statistics of the data size and value distribution
  - Choice of parse tree
  - Choice of execution plan for each tree
- Important: it is not a linear process

In light of modern database applications
- Often the cost of optimization is not worth it
  - Many one-of-a-kind transactions
  - Databases with a very small transaction load, perhaps due to multiple servers and hard drives
  - Tables are small and queries are simple, so almost any execution plan is good
- This is a key reason why traditional relational database servers are often far less efficient than NoSQL DBs: they were built to efficiently run a high volume of similar transactions that involve multiple tables

Making use of parallelism
- SQL, compared to various non-declarative languages, is very easy to parallelize
- But communication costs can be very high if the servers are not co-located
- Often, disk arrays are the chosen form of parallelism
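
The reason set-based operations parallelize easily is that each partition of a table can be processed independently and the partial results simply concatenated. The sketch below shows that structure for a selection, using a thread pool from the Python standard library; the partitioning scheme, function names, and data are assumptions, and in CPython threads illustrate the shape of the computation rather than a real speedup for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def select_partition(rows, predicate):
    """Apply a selection to one partition, independently of the others."""
    return [row for row in rows if predicate(row)]


def parallel_select(partitions, predicate, workers=4):
    """Run the same selection on every partition and concatenate the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda part: select_partition(part, predicate), partitions)
        return [row for part in results for row in part]


# Hypothetical table split round-robin into 4 partitions.
rows = [{"id": i, "age": 20 + i % 40} for i in range(1000)]
partitions = [rows[i::4] for i in range(4)]

matched = parallel_select(partitions, lambda r: r["age"] >= 55)
serial = select_partition(rows, lambda r: r["age"] >= 55)
print(len(matched), sorted(r["id"] for r in matched) == [r["id"] for r in serial])
```

No partition needs to see another partition's rows, so the only communication is the final merge of results.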

Optimization in the real world today
- Languages and queries are often not set-based
- The code in a query can be very imperative
- Binary operations might be in the minority
- Databases can be so huge that optimization has to be heavily biased toward certain kinds of queries