Query Optimizer Overview
Conor Cunningham, Principal Architect, SQL Server Query Processor
Representing Microsoft Serbia Development Center

Who Is This Guy?
I work on the SQL Server Query Processor team
– Most of my time was spent working on the Optimizer
– 13+ years at Microsoft in various positions
My current role is “architect”
– I’ve also been development lead, individual contributor, and I even went outside the company for a while and played with startups. I also had my own SQL consulting business
– I know a fair amount about a lot of areas and how they fit
I spend a lot of time talking with customers to see how to improve the product

Administration
Disclaimers
– We change the QP each release.
– That includes most of the topics covered in this talk
– So, please understand that I am talking beyond the documented/supported portion of the product.
– All of the information in this talk is “public”, but that does not mean “fully supported” as far as CSS is concerned
Questions
– If you are completely lost, ask a question right away.
– If you have more specific questions that won’t apply to everyone, please hold them to the end: time is short and the topic is large
– I promise to stay as long as necessary to answer your questions, even if I have to stand in the hallway to do it

Query Optimization In A Nutshell
SQL Statement -> Magic Happens -> Awesome Query Plan
Commercial Query Optimizers do not try to find the “best” plan. Instead, they try to find a “good enough” plan quickly.
Most of the time, SQL Server does this well. Most of the time…
An hour isn’t enough time to completely learn this topic
This talk is a practical overview of an academic topic
I will show you enough details to hopefully ask the right questions when you are working on your next SQL plan problem

Next Level of Detail
Pipeline: SQL -> Parse -> Tree -> Bind -> Bound Tree -> Optimize -> Query Plan
“Optimize” takes a valid query tree and generates a valid query plan
If compilation takes a while, this is almost always the place where it happens
Different plans are considered and the one thought to be the best is returned
Heuristics guide that search to keep the time as small as is reasonably possible.

What’s a Query Tree?
The output query plan is a tree
Inside the optimizer, there are logical and physical trees.
Logical Tree Concepts:
– JOIN, WHERE, GROUP BY
Physical Tree Concepts:
– Hash/Loop/Merge Join, Filter, …
Example tree, leaves first: A, B -> JOIN (A.a = B.b) -> WHERE (A.c = 5) -> GROUP BY (B.d)

Another Level Deeper…
(This is simplified)
We perform operations in a specific order to reduce complexity and runtime cost
“Simplification” scrubs the tree based on rules into a shape we prefer (Predicate Pushdown, Contradiction Detection)
Cardinality and Costing provide ranking for each plan
“Exploration” examines many plans at once to find the least-cost plan

Properties
The Optimizer actually derives a LOT of information over each node in the tree
We call these “Properties”. This is done over both logical and physical trees
– Example Logical Property: Key columns, valid ranges for each column used in a query
– Example Physical Property: Sort columns
– There are properties for almost everything of interest (partitioning, distributed query, Halloween protection)
Properties help drive which transformations are considered to search the set of possible plans

Contradiction Detection Example
SQL derives domain constraint properties on each column in each query at each node
When you do a join, we combine the constraint sets from both sides of the join
We can detect when conditions are always false and then convert that whole tree to a zero-row fake table. (Note: no row locks are needed to run this query)
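
A minimal sketch of the idea, not SQL Server's implementation: each column's valid range is modeled as a closed interval, ANDed predicates intersect the intervals, and an empty result means the condition can never be true. The names Interval and is_contradiction are made up for illustration.

```python
from dataclasses import dataclass

INF = float("inf")

@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] of values a column may still take (hypothetical model)."""
    lo: float
    hi: float

    def intersect(self, other: "Interval") -> "Interval":
        # ANDing two range predicates keeps only the overlap.
        return Interval(max(self.lo, other.lo), min(self.hi, other.hi))

    def is_empty(self) -> bool:
        return self.lo > self.hi

def is_contradiction(constraints: list) -> bool:
    """True if the AND of all range constraints on one column can never hold."""
    domain = Interval(-INF, INF)
    for c in constraints:
        domain = domain.intersect(c)
    return domain.is_empty()

# WHERE col >= 10 AND col <= 5: no value satisfies both
always_false = is_contradiction([Interval(10, INF), Interval(-INF, 5)])
# WHERE col >= 1 AND col <= 5: satisfiable
satisfiable_case = is_contradiction([Interval(1, INF), Interval(-INF, 5)])
```

When the check returns True, the subtree under that predicate can be replaced by an empty input without touching the table at all.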

Rules
Plans are explored using “Rules” that transform the tree from one form to another
Rules match patterns in the tree and create new patterns.
Patterns describe top-down tree fragments but not complete query trees
Rules can go logical -> logical or logical -> physical
We have many, many rules
Join Commute: a join b -> b join a
Join to Hash Join: a join b -> a Hash Join b
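
The two rules named on this slide can be sketched as pattern-matching functions over a toy tree. This is an illustration only: the tuple encoding and the function names are assumptions, not the optimizer's real data structures.

```python
from typing import Optional

# Toy encoding (an assumption): a tree is a table name or ("operator", left, right).

def join_commute(node) -> Optional[tuple]:
    """logical -> logical rule: a join b  =>  b join a."""
    if isinstance(node, tuple) and node[0] == "join":
        _, left, right = node
        return ("join", right, left)
    return None  # pattern did not match

def join_to_hash_join(node) -> Optional[tuple]:
    """logical -> physical rule: a join b  =>  a hash_join b."""
    if isinstance(node, tuple) and node[0] == "join":
        _, left, right = node
        return ("hash_join", left, right)
    return None  # pattern did not match

tree = ("join", "a", "b")
commuted = join_commute(tree)            # ("join", "b", "a")
physical = join_to_hash_join(commuted)   # ("hash_join", "b", "a")
```

Each rule looks only at the top of the fragment it is given, which is what lets the same rule fire anywhere in a larger tree.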

Memo
The Optimizer does not fully materialize every single alternative.
The Memo quickly identifies and stores equivalent sub-trees
– Stored in “groups”
– Groups point to other groups so we can share trees
– The top-down rule framework makes this “easy” to do
This allows us to search many more plans in the same amount of time/space

Encoding of alternatives
Input query:
SELECT * FROM A INNER JOIN B ON A.a = B.b INNER JOIN C ON A.a = C.c;
Input tree: (a join b) join c, inserted into the table of alternatives (MEMO):
0 – 1 join 4
1 – 2 join 3
2 – {a}
3 – {b}
4 – {c}

Example Rule Transformation
Table of alternatives before:
0 – 1 join 4
1 – 2 join 3
2 – {a}
3 – {b}
4 – {c}
Extract “1 join 4” from group 0, transform it into “4 join 1”, and insert the result back into group 0
Table of alternatives after:
0 – 1 join 4; 4 join 1
1 – 2 join 3
2 – {a}
3 – {b}
4 – {c}
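
A rough sketch of the table of alternatives, using the same group numbers as the slides. The Memo class and apply_join_commute are hypothetical names; the point is only that groups reference other groups by number and that a transformed expression is added to the group it came from.

```python
class Memo:
    """Toy memo: group id -> list of equivalent expressions (illustrative only)."""

    def __init__(self):
        self.groups = {}

    def add(self, gid, expr):
        alternatives = self.groups.setdefault(gid, [])
        if expr not in alternatives:   # identical alternatives are stored once
            alternatives.append(expr)

def apply_join_commute(memo, gid):
    """Extract each join in a group, commute it, insert the result into the same group."""
    for op, *args in list(memo.groups[gid]):
        if op == "join":
            left, right = args
            memo.add(gid, ("join", right, left))

memo = Memo()
memo.add(2, ("table", "a"))
memo.add(3, ("table", "b"))
memo.add(4, ("table", "c"))
memo.add(1, ("join", 2, 3))   # children are group numbers, not subtrees
memo.add(0, ("join", 1, 4))

apply_join_commute(memo, 0)
root_alternatives = memo.groups[0]   # [("join", 1, 4), ("join", 4, 1)]
```

Because group 0 refers to group 1 by number, both root alternatives share the single stored copy of "2 join 3".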

Cardinality Estimation
Statistics objects + table cardinalities feed the cardinality estimation process. They include:
– Histograms
– Multi-column frequencies/densities
– Trie trees for LIKE estimation
Cardinality derived “bottom-up” for each operator
There are many very hard problems in this space
– Example: cross-table predicates with highly correlated data
Simplifying assumptions
– Statistical independence
– Uniform data distributions within ranges
– Containment: if you query something in a range, it is likely in that range.
Try SET STATISTICS PROFILE ON. I start with this for most query performance issues to determine if the cardinality estimates are good
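
A small sketch of how the independence and uniformity assumptions combine. The table size, column names and distinct counts are invented for the example, and the formulas are the textbook versions rather than SQL Server's actual estimator.

```python
def equality_selectivity(distinct_values: int) -> float:
    """Uniformity assumption: every distinct value is equally frequent."""
    return 1.0 / distinct_values

def estimate_rows(table_rows: int, selectivities: list) -> float:
    """Independence assumption: selectivities of ANDed predicates multiply."""
    estimate = float(table_rows)
    for s in selectivities:
        estimate *= s
    return estimate

# Hypothetical table: 1,000,000 rows, 100 distinct cities, 50 distinct states.
# WHERE city = @c AND state = @s
estimated = estimate_rows(
    1_000_000,
    [equality_selectivity(100), equality_selectivity(50)],
)  # about 200 rows
```

If city fully determines state, the second predicate filters nothing and the true count is near 10,000 rows, so the estimate is about 50x too low. That is the correlated-data problem mentioned above.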

Costing
Based on Cardinality Estimation (so, that needs to be right for this to make sense)
We do costing for each _physical_ tree
Things we include in costing:
– How much IO (sequential, random), CPU, space in memory for buffer pages
– DOP/Partitioning data
Things we do NOT include in costing:
– CPU Speed, IO Subsystem Speed
The costing formulas were originally the # of seconds on Nick’s machine
Assumptions:
– We didn’t cost it separately for each installation, to simplify debugging queries
– Cold buffer pool cache: we assume rows are not in the buffer pool when the query starts
– Uniformity: IO from a series of seeks is distributed randomly across a B-Tree

Dynamic Search/Stages
The optimizer does fancy things to reduce compilation time
1. We stop looking at alternatives that we know cost more than our best solution found so far
2. We split rules into groups based on cost vs. reward (Expensive or domain-specific rules are run in later stages). We end optimization if we’ve found a “good enough” plan
3. We limit the optimizer to stop optimization after a certain number of rules
(You can see some of this information in XML Showplan)
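
The three stopping conditions above can be sketched as a simple search loop. This is a heavily simplified illustration: real optimization works on the memo in stages, while this sketch just walks a flat list of candidate plans. The function name, parameters and plan costs are all hypothetical.

```python
def pick_plan(alternatives, good_enough_cost, max_considered):
    """alternatives: iterable of (plan name, list of per-operator costs)."""
    best_name, best_cost = None, float("inf")
    considered = 0
    for name, operator_costs in alternatives:
        if considered >= max_considered:   # 3. budget exhausted, stop searching
            break
        considered += 1
        total = 0.0
        for cost in operator_costs:
            total += cost
            if total >= best_cost:         # 1. already worse than the best plan so far
                break
        else:
            best_name, best_cost = name, total
        if best_cost <= good_enough_cost:  # 2. "good enough", end optimization early
            break
    return best_name, best_cost, considered

candidates = [
    ("nested loops", [50, 40]),
    ("hash join", [10, 15]),
    ("merge join", [30, 20]),   # abandoned after its first operator (30 >= 25)
    ("index seek", [1, 2]),
]
result = pick_plan(candidates, good_enough_cost=5, max_considered=10)
# result == ("index seek", 3.0, 4)
```

Lowering max_considered to 2 returns the hash join at cost 25: the search stops before the cheaper plan is ever looked at, which is the trade-off being made for compilation time.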

Index Matching
SQL converts predicates into Index operations (Seeks). These predicates are called “SARGable”
Some predicates can’t be converted: these are Non-SARGable
Some Non-SARG predicates are evaluated within the Storage Engine for Speed. (“Pushed Non-SARG Predicates”)
– This optimization reduces instruction counts by evaluating some expressions within the Storage Engine while holding the latch on the page
– Only “cheap” expressions can be done this way
We apply rewrite rules to move operations as close as possible to the leaves of the tree to allow index matching
Examples
– SARGable: WHERE col = 5
– Non-SARGable: WHERE CONVERT(varchar(10), col) = '5'
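
Why the conversion defeats the seek can be shown with a sorted list standing in for an index on col. This is an analogy, not how the storage engine works; the data and function names are made up.

```python
import bisect

index_keys = [1, 3, 5, 5, 8, 13]   # sorted keys of a hypothetical index on col

def seek_equal(keys, value):
    """SARGable (col = value): binary search goes straight to the matching range."""
    lo = bisect.bisect_left(keys, value)
    hi = bisect.bisect_right(keys, value)
    return keys[lo:hi]

def scan_with_conversion(keys, text):
    """Non-SARGable (str(col) = text): the index is ordered by col, not by str(col),
    so every key has to be converted and compared."""
    return [k for k in keys if str(k) == text]

by_seek = seek_equal(index_keys, 5)              # touches O(log n) keys
by_scan = scan_with_conversion(index_keys, "5")  # touches all n keys
```

Both calls return the same rows; the difference is how much of the index has to be read to find them.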

More Index Matching
Just indexing is usually insufficient: you also want to evaluate whether an index is “covering”.
Example
– CREATE INDEX i1 on T(col1)
– SELECT col1, col2 FROM T
– (Index i1 is not covering -> scan)
Non-covering indexes typically need to do fetches to the base table (random IO is slow)
SQL 2005 added INCLUDE clause to allow you to add columns to make indexes covering
To avoid/reduce this cost, the Optimizer considers strategies like this when picking indexes for plans:
– 1. Seek the Clustered Index
– 2. Seek NC1 and join with other NCIs
– 3. Do 2 and then fetch back to the Heap/CI to get the rest of the needed columns (so, fetch last)
Missing Index DMVs can help suggest covering indexes (best for repeated OLTP-like workloads)
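
The covering test itself is just a subset check. A minimal sketch using the slide's example, with a hypothetical is_covering helper:

```python
def is_covering(key_columns, include_columns, needed_columns):
    """An index covers a query if every column the query needs is stored in the index."""
    return set(needed_columns) <= set(key_columns) | set(include_columns)

needed = ["col1", "col2"]   # SELECT col1, col2 FROM T

# CREATE INDEX i1 ON T(col1): col2 is missing, so the base table must be read too
plain_index = is_covering(["col1"], [], needed)

# CREATE INDEX i1 ON T(col1) INCLUDE (col2): all needed columns are in the index
with_include = is_covering(["col1"], ["col2"], needed)
```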

Index Matching Examples
First Example: Basic SARGable Seek + Fetch to get columns from Heap
Second case: NOT SARGable, notice the pushed predicate -> Scan

Parallel Queries
Considered for “expensive” queries (> 5 cost units)
Implemented as a property (top-down requirement in our engine)
Considered in later optimization phases
– We generate the best serial and best parallel plan and take the lowest-cost result in the 2nd stage
Not all operators support parallelism
– Ex: GROUP BY supports parallelism only on the GROUP BY columns
– Not all scalar functions support parallelism
If a query that should be parallel is not, try compiling it in slight variations until you find out whether any of the predicates are causing the plan to go non-parallel.

Update Processing
Updates do:
1. Read/Qualify source data
2. Compute new values (ComputeScalar)
3. Apply Changes
There are two patterns of Update Plan:
1. Narrow/Per-Row (apply changes for a row to each index, then do next row)
2. Wide/Per-Index (apply all changes to each index, then do the next index)
Narrow works better for small updates. Wide is the general form.
(There are physical optimizations on each model so you might not see only two plans)

Updates
For “Wide” (Per-Index) Plans, we often use this pattern:
1. Split Updates into Delete, Insert
2. Sort by key, operation code
3. Collapse the result to remove no-ops
This shape is useful because it:
– Puts changes in the order of each index (sequential IO instead of random IO)
– Can find cases where some changes are not needed and (greatly) reduce the IO + Log needed to complete a query
I cover Update plan generation in my book chapter (SQL 2008 Internals). That’s probably the best place to learn about this.
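
The split / sort / collapse steps above can be sketched on a toy index that stores keys only. That is a simplifying assumption: with keys only, a delete followed by an insert of the same key cancels out completely. The function and constant names are illustrative.

```python
DELETE, INSERT = 0, 1   # operation codes; DELETE sorts before INSERT for equal keys

def split_sort_collapse(updates):
    """updates: list of (old_key, new_key) pairs for one index."""
    # 1. Split each update into a delete of the old key and an insert of the new key.
    ops = []
    for old_key, new_key in updates:
        ops.append((old_key, DELETE))
        ops.append((new_key, INSERT))
    # 2. Sort by key, then operation code, so changes arrive in index order.
    ops.sort()
    # 3. Collapse: a delete immediately followed by an insert of the same key is a no-op.
    collapsed = []
    for key, op in ops:
        if op == INSERT and collapsed and collapsed[-1] == (key, DELETE):
            collapsed.pop()
        else:
            collapsed.append((key, op))
    return collapsed

# UPDATE T SET k = k + 1 on an index holding keys 1, 2, 3
changes = split_sort_collapse([(1, 2), (2, 3), (3, 4)])
# changes == [(1, DELETE), (4, INSERT)]
```

Six index operations shrink to two: keys 2 and 3 are deleted and re-inserted, so the index never has to touch them.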

Where to Learn More
SQL Server 2008 Internals: Query Optimizer chapter
– I wrote a chapter on the Optimizer, but there are lots of great things in that book from the other authors
My Blog: “Conor vs. SQL”

Conclusion
Thank you very much for attending the talk
Any questions?

