Execution plans (300) Tomaž Kaštrun Spar ICS GmbH, Spar Slovenija

@tomaz_tsql

Agenda
- Basics of query execution (warm-up, Demo I)
- Execution plan overview (Demo II)
- Plan cache (Demo III)

Basics of Query execution
Is SQL Server a black box? Several steps in the internal execution of a query:
1) Parsing all SQL statements
- Checks for typos and validity
- Handles error messaging
- Breaks the statement down into logical units: keywords, expressions, operators and identifiers
2) SQL Server creates an execution tree
- Creates the structure used for query optimization
- Verifies that the tables and columns exist
- Performs (implicit) data conversion and replaces views with their definitions
- Performs simple syntax-based optimization
- SQL Server can consider a very large number of candidate plans (potentially millions of permutations) and checks which valid plan is best for this query
3) SQL Server optimizes the query
- Performs trivial optimization (the "how-to": join the tables, do some aggregations, order the data and retrieve the data)
- Performs further syntactical transformations and, if needed, full cost-based optimization (CBO)
- The execution plan is generated here

Is SQL Server a black box? Several steps in the internal execution of a query (continued):
4) Cache execution plan
- SQL Server selects an execution plan: it estimates the cost of each candidate plan and selects the plan with the lowest cost
- It has to do a cost-based balancing act, considering both the cost of finding a potential plan and the cost of the plans themselves (hence this step has the biggest impact on the performance of your database)
- The plan is stored in the execution plan cache, a memory-based SQL Server component for storing plans
- If the plan is already available in cache, SQL Server can reuse it (instead of calculating it again)
5) Memory allocation
- SQL Server must allocate query memory for the selected plan
- Based on the (ideally up-to-date) statistics used by the query, SQL Server grants the memory
- Statistics might be out of date, or data distribution might be uneven for a particular table, and SQL Server can then allocate the wrong amount of memory
- If too little memory is granted (e.g. for sort or hash operators), SQL Server must spill to tempdb
6) Execution
- Huh… this was a loooooong way. Yeeeeh! At last!
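To make the memory allocation step a little more tangible, the following is a minimal sketch of how the grants of currently running queries could be inspected. It assumes SQL Server 2008 or later and VIEW SERVER STATE permission; the column aliases are illustrative only.

```sql
-- Sketch: memory grants of queries that are running right now.
-- A large gap between granted and used memory hints at misestimation.
SELECT mg.session_id,
       mg.requested_memory_kb,
       mg.granted_memory_kb,      -- NULL while the query still waits for its grant
       mg.used_memory_kb,
       mg.max_used_memory_kb,
       mg.query_cost,
       st.[text] AS query_text
FROM sys.dm_exec_query_memory_grants AS mg
CROSS APPLY sys.dm_exec_sql_text(mg.sql_handle) AS st
ORDER BY mg.granted_memory_kb DESC;
```

On an idle server this returns no rows, since only queries that currently hold or wait for a grant are listed.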

Basics of Query execution (simplified)
Query is submitted
- When a query is passed to SQL Server, it goes to the relational engine (the relational engine is the query processor, with components that help determine what the query needs to do and how to do it).
Query is parsed
- In the relational engine, the query goes through a process that checks that the T-SQL is written correctly (that it is well formed).
- The output of the parser is a parse tree (also called a query tree or sequence tree). A parse tree represents the logical steps necessary to execute the requested query.
- * If the query is not a DML statement but DDL, it will not be optimized, because there is only one right way for SQL Server to define an object, hence there are no opportunities for improvement.
Algebrizer
- The parse tree is passed on to the query algebrizer.
- The algebrizer resolves the names of objects, tables and columns. It also identifies column-level details such as data types and the locations of aggregates (SUM, COUNT, GROUP BY, ...).
- The algebrizer outputs a binary called the query processor tree, which is sent to the query optimizer. The output includes a hash value representing the query.
- !!!! If a plan for this query is already available in cache, the process stops here and the existing plan is used.
Query optimizer

Logical query processing (thanks Itzik Ben-Gan)
My query:
    (8)  SELECT (9) DISTINCT (11) <TOP_specification> <select_list>
    (1)  FROM <left_table>
    (3)  <join_type> JOIN <right_table>
    (2)  ON <join_condition>
    (4)  WHERE <where_condition>
    (5)  GROUP BY <group_by_list>
    (6)  WITH {CUBE | ROLLUP}
    (7)  HAVING <having_condition>
    (10) ORDER BY <order_by_list>
Actual query processing:
1. FROM
2. ON
3. OUTER (join)
4. WHERE
5. GROUP BY
6. CUBE | ROLLUP
7. HAVING
8. SELECT
9. DISTINCT
10. ORDER BY
11. TOP
Source: Inside Microsoft SQL Server™ 2005 T-SQL Querying, Itzik Ben-Gan
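One visible consequence of this processing order is where a column alias may be used. The sketch below uses a hypothetical table variable @Orders with made-up rows; it is only meant to illustrate the ordering, not any particular schema.

```sql
-- Hypothetical sample data, for illustration only.
DECLARE @Orders TABLE (OrderID int, CustomerID int, Amount money);
INSERT INTO @Orders (OrderID, CustomerID, Amount)
VALUES (1, 10, 100), (2, 10, 250), (3, 20, 75);

-- Would fail if uncommented: WHERE (step 4) is evaluated before SELECT (step 8),
-- so the alias TotalAmount does not exist yet.
-- SELECT CustomerID, SUM(Amount) AS TotalAmount
-- FROM @Orders
-- WHERE TotalAmount > 100
-- GROUP BY CustomerID;

-- Works: HAVING (step 7) filters the groups using the aggregate itself,
-- and ORDER BY (step 10) runs after SELECT (step 8), so it can use the alias.
SELECT CustomerID, SUM(Amount) AS TotalAmount
FROM @Orders
GROUP BY CustomerID
HAVING SUM(Amount) > 100
ORDER BY TotalAmount DESC;
```

With the sample rows above, only customer 10 (total 350) passes the HAVING filter.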

Query optimization
- Execution plans and cost-based optimization
- Optimization phases
- Index and distribution statistics
- Join selection (three types: nested loop, hash and merge join)
- ON clause (with respect to indexes)
- WHERE clause (with respect to distribution statistics)

Plan re-usage
An ongoing debate:
- to use stored procedures
  - Execution plan is cached
  - Better security, no SQL injection
  - Plans are seldom recompiled
- or ad-hoc queries
  - Execution plan is typically compiled again whenever the query text changes
  - Weaker security, with possible SQL injection
  - Less prone to errors

Plans, plans, plans….what are they?
- A plan is the optimizer's best-cost strategy for how to access (and, in the case of DML, manipulate) the data
  - Is SQL Server smarter than the developer? Or is the developer smarter when using QUERY HINTS?
- The optimizer's key decisions are made here (refer to logical query processing)
  - How to perform JOIN and ON operations, and what the correct table order is
  - Which data should be aggregated (SUM, COUNT, DISTINCT) and which ordered (ORDER BY)
  - Which indexes are the proper ones to use
  - AND!!! Can a cached plan be reused?

Execution plan overview
- Understanding execution plans is important! Why?
  - Insight into SQL Server internals
  - If we don't know what SQL Server is doing, we don't know what is wrong with our query
  - Troubleshooting query performance -> tuning query performance
  - Understanding the execution and processing strategy
  - Why is CPU high? Why is my I/O high? -> Is it a hardware problem, or do you need to refactor your code?

Estimated vs. Actual execution plan
Estimated execution plan
- Is the output of the optimizer
- Operations and steps within the plan are logical steps (they represent how the optimizer sees the plan and don't represent what physically occurred)
- Uses statistics for the estimation
Actual execution plan
- Is the plan of the actual query execution
- Shows data representing what actually happened during execution
- Uses actual data for the plan

Estimated vs. Actual execution plan
When can the estimated and the actual plan differ?
- Most of the time they will be the same!
- But they can differ:
  - When statistics on the table/index are out of date
  - When the estimated plan is no longer valid at execution time (for example, it was invalidated by a recompile or aged out of cache by the lazywriter), so it is discarded and a new plan is created
  - When any other change occurs as the storage engine processes the query
- Plus side: estimated plans don't access data, which makes them useful for large and complex queries.
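Both kinds of plan can also be requested from T-SQL rather than through the SSMS toolbar. The sketch below uses a query against sys.objects purely as a stand-in for your own query; GO is the batch separator of SSMS/sqlcmd.

```sql
-- Estimated plan: the batch is compiled but NOT executed.
-- SET SHOWPLAN_XML must be the only statement in its batch.
SET SHOWPLAN_XML ON;
GO
SELECT name, create_date FROM sys.objects WHERE type = 'U';
GO
SET SHOWPLAN_XML OFF;
GO

-- Actual plan: the query runs, and the plan is returned together with
-- runtime information such as actual row counts.
SET STATISTICS XML ON;
GO
SELECT name, create_date FROM sys.objects WHERE type = 'U';
GO
SET STATISTICS XML OFF;
GO
```

Comparing the estimated and actual row counts per operator in the second output is usually the quickest way to spot stale statistics.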

Type of execution plans
- Text (deprecated)
- XML
- Graphical (SSMS built-in, based on the XML plan)

Plan reuse
Generating execution plans is expensive
- The algebrizer process creates a hash of the query (like a fingerprint or a signature)
- With this hash, SQL Server looks for a matching query in the cache. If the incoming query matches a cached one, the cost of the optimization process is skipped and the cached execution plan is reused.
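SQL Server exposes related fingerprints through a DMV. The sketch below assumes SQL Server 2008 or later and VIEW SERVER STATE permission; note that query_hash is a hash of the query shape and is not necessarily the same value as the internal cache lookup key described above.

```sql
-- Sketch: group cached statements by their query fingerprint.
-- Many cached statements sharing one query_hash usually means the same
-- query was submitted with different literals.
SELECT TOP (20)
       qs.query_hash,
       qs.query_plan_hash,
       COUNT(*)                AS cached_statements,
       SUM(qs.execution_count) AS total_executions
FROM sys.dm_exec_query_stats AS qs
GROUP BY qs.query_hash, qs.query_plan_hash
ORDER BY cached_statements DESC;
```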

Plan reuse
Good practices
- Best practice is to write a query in such a way that SQL Server can reuse its plan
- Hence stored procedures or parameterized queries are the best choice
- If values are hard-coded, the smallest change to the string that defines the query can cause a cache miss, and a new plan is created
- SQL Server doesn't keep an execution plan in cache forever
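A minimal sketch of the difference, using sys.objects as a stand-in table and the system procedure sp_executesql; the parameter name @id is chosen only for this example.

```sql
-- Hard-coded literal: each distinct value produces a different query text.
SELECT name FROM sys.objects WHERE object_id = 3;
SELECT name FROM sys.objects WHERE object_id = 5;

-- Parameterized: the statement text stays identical across calls,
-- so the cached plan can be matched and reused.
DECLARE @stmt nvarchar(200) =
    N'SELECT name FROM sys.objects WHERE object_id = @id;';

EXEC sys.sp_executesql @stmt, N'@id int', @id = 3;
EXEC sys.sp_executesql @stmt, N'@id int', @id = 5;
```

After running both variants, the parameterized statement should show up in sys.dm_exec_cached_plans as one entry with objtype 'Prepared' and a usecounts value above 1.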

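Lazywriter
- Plans are aged slowly out of cache; the aging is driven by an internal process named lazywriter
- A plan of cost X that has been referenced Y times gets an age value of X*Y
- The lazywriter periodically lowers this value; once it reaches the removal threshold, the plan is considered "aged" and becomes a candidate for removal from cache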

Reasons for plan recompilation
- Changing the structure of a table or schema used by the query
- Changing an index used by the query
- Dropping/rebuilding an index used by the query
- Updating the statistics used by the query
- Large UPDATEs, large INSERTs
- Causing a deferred recompile by mixing DDL and DML in a single query
- Changing SET options within the execution of the query
- Changing the structure or schema of a temporary table
- Changing dynamic views, cursor options
- Parameter sniffing
- …
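One way to see which cached statements have been recompiled is the plan_generation_num column of sys.dm_exec_query_stats. This is a sketch assuming VIEW SERVER STATE permission; the TOP (20) cut-off is arbitrary.

```sql
-- Sketch: statements whose plan has been regenerated at least once.
-- plan_generation_num increases each time the plan is recompiled.
SELECT TOP (20)
       qs.plan_generation_num,
       qs.execution_count,
       qs.creation_time,
       st.[text] AS batch_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE qs.plan_generation_num > 1
ORDER BY qs.plan_generation_num DESC;
```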

Reading Execution plans
- SHOWPLAN permission needs to be granted
- An execution plan is a series of operators
- Operators are executed one after another
- Graphical plans are read from top to bottom and from right to left
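A short sketch of checking and granting the permission in the current database. [PlanReader] is a hypothetical database user invented for this example; the GRANT fails if no such user exists.

```sql
-- Does the current login hold SHOWPLAN in the current database? (1 = yes)
SELECT HAS_PERMS_BY_NAME(DB_NAME(), 'DATABASE', 'SHOWPLAN') AS can_showplan;

-- Grant it to a (hypothetical) database user.
GRANT SHOWPLAN TO [PlanReader];
```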

Demo 1

Operator Properties
- Physical Operation - the physical operation for this part of the execution plan, such as joins, seeks, scans...
- Logical Operation - the logical operation for this part of the execution plan.
- Actual Number of Rows - the actual number of rows, available when the query was actually run.
- Estimated I/O Cost - a relative value indicating whether a given operation is I/O intensive. The Query Optimizer assigns these values during optimization, and they serve only as a comparative tool to help determine where the costs of a given operation lie. The larger the value, the more cost-intensive the process.
- Estimated CPU Cost - a relative value indicating whether a given operation is CPU intensive. It is assigned and interpreted in the same way as the Estimated I/O Cost.
- Estimated Operator Cost - the sum of the estimated I/O and CPU costs. This is the value that is also presented under each operation icon in the graphical execution plan.
- Estimated Subtree Cost - the total of this operation's cost plus all other operations that preceded it in the query up to this point.
- Estimated Number of Rows - derived from the statistics available to the Query Optimizer at the time the execution plan is drafted. The more current the statistics (and the larger their sample size), the more accurate this and other metrics will be when compared to the actual data.
- Estimated Row Size - also based on the statistics available at optimization time; this value corresponds to how wide the Query Optimizer believes the affected rows to be. The same rule applies as for Estimated Number of Rows: the more current and descriptive the data set used to generate the statistics, the more accurate this value will be in comparison to the actual data. Good stats also lead to better (more accurate) decisions by the Query Optimizer when actually processing your queries.
- Ordered - a Boolean value signifying whether the rows are ordered in the operation.
- NodeID - the ordinal value associated with this particular operation in the query execution plan.

Common Operators
- Physical operation
- Logical operation
- Estimated I/O cost
- Estimated CPU cost
- Estimated operator cost
- Estimated subtree cost
- Estimated number of rows
- Estimated row size

Common Operators
Data retrieval operators
- Table scan
- Index scan (clustered, nonclustered)
- Index seek (clustered, nonclustered)
Join operators
- Merge join (must have sorted input)
- Hash join (internal hash table)
- Nested loop join (outer join, inner join)
Aggregation operators (GROUP BY)
- Stream aggregate (when data is presorted)
- Hash aggregate (data is not presorted; SQL Server needs a memory grant for its internal hash table)
Lookup operators
- Key Lookup
- RID Lookup
Others
- Compute Scalar
- Filter
- Sort
- Top
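To compare the two aggregation operators side by side, the optimizer can be nudged with the documented query hints HASH GROUP and ORDER GROUP. The temp table #Sales and its rows are made up for this sketch; on real data the optimizer's unhinted choice is usually preferable.

```sql
-- Hypothetical sample data, for illustration only.
CREATE TABLE #Sales (CustomerID int NOT NULL, Amount money NOT NULL);
INSERT INTO #Sales (CustomerID, Amount)
VALUES (1, 10), (2, 20), (1, 30), (3, 40), (2, 50);

-- Request a hash-based aggregate (Hash Match operator in the plan).
SELECT CustomerID, SUM(Amount) AS Total
FROM #Sales
GROUP BY CustomerID
OPTION (HASH GROUP);

-- Request an order-based aggregate: the unsorted heap has to be sorted
-- first, so the plan should show Sort followed by Stream Aggregate.
SELECT CustomerID, SUM(Amount) AS Total
FROM #Sales
GROUP BY CustomerID
OPTION (ORDER GROUP);

DROP TABLE #Sales;
```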

Joins - Nested loop join
- The input tables do not need to be sorted
- SQL Server runs an outer loop over one input and, for each outer row, searches the inner input for matching rows
- A row-by-row ("RBAR") operator: each row output from one table triggers another pass over the other table
- Very inefficient when the outer table is large (and the inner table has only a few records)!

Joins - Merge join
- Data is pre-sorted in both input tables (either by an index, or SQL Server adds an explicit Sort operator)
- SQL Server reads each record just once in the merge operation (in contrast to the nested loop join)
- Efficient join

Joins - Hash join
- Needs at least one equijoin predicate (an equality condition in the join predicate that compares values from one table to the other)
- Reads rows from one input, hashes them into an in-memory hash table, then reads the second input and returns the matching rows
- Applies a hash function and uses hash buckets in a build phase and a probe phase
- Needs a memory grant to build the hash table
- Useful for larger tables (data warehouse) with unindexed columns
- Very efficient
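The three physical join types can be compared on the same query by using join hints (LOOP, MERGE, HASH). The temp tables #Customers and #Orders and their rows are hypothetical; join hints also force the written join order, so this is a learning aid rather than a tuning recommendation.

```sql
-- Hypothetical sample data, for illustration only.
CREATE TABLE #Customers (CustomerID int NOT NULL PRIMARY KEY,
                         Name varchar(50) NOT NULL);
CREATE TABLE #Orders    (OrderID int NOT NULL PRIMARY KEY,
                         CustomerID int NOT NULL);
INSERT INTO #Customers (CustomerID, Name) VALUES (1, 'Ana'), (2, 'Bor');
INSERT INTO #Orders (OrderID, CustomerID) VALUES (100, 1), (101, 1), (102, 2);

-- Nested Loops: for each customer row, look for matching orders.
SELECT c.Name, o.OrderID
FROM #Customers AS c
INNER LOOP JOIN #Orders AS o ON o.CustomerID = c.CustomerID;

-- Merge Join: both inputs must arrive sorted on CustomerID,
-- so expect a Sort on #Orders in the plan.
SELECT c.Name, o.OrderID
FROM #Customers AS c
INNER MERGE JOIN #Orders AS o ON o.CustomerID = c.CustomerID;

-- Hash Match: build a hash table on one input, probe with the other.
SELECT c.Name, o.OrderID
FROM #Customers AS c
INNER HASH JOIN #Orders AS o ON o.CustomerID = c.CustomerID;

DROP TABLE #Orders;
DROP TABLE #Customers;
```

Running the batch with the actual execution plan enabled shows the three join operators next to each other, together with their relative estimated costs.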

Key Lookup
- A lookup operator (there are two: RID Lookup and Key Lookup) is required when the query needs columns that are not in the non-clustered index being used, i.e. the index is not a covering index; the missing columns are then fetched from the clustered index
- A covering index is a non-clustered index that contains all the columns the query needs (select list, join and filter predicates)
- The optimizer cannot retrieve the rows in a single operation; it has to use the clustering key (or the row ID in a heap) to return the corresponding rows

RID Lookup
- The RID (Row ID) Lookup operator is the heap equivalent of the Key Lookup operator
- When there is no clustered index and the index used does not contain all the data needed to satisfy the query, an additional operation is required to get the data
- It is a type of bookmark lookup, which uses a row identifier to find the rows to return
- Adds additional disk I/O -> two operations instead of a single operation
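A common way to remove a lookup is to make the non-clustered index covering with INCLUDE columns. The sketch below uses a hypothetical temp table #DemoOrders; it is empty here, so on this toy table the optimizer may simply scan, and the lookup only becomes visible on a realistically sized table.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE #DemoOrders (
    OrderID    int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    CustomerID int   NOT NULL,
    OrderDate  date  NOT NULL,
    Amount     money NOT NULL
);

-- Non-covering index: it holds CustomerID (plus the clustering key),
-- so OrderDate and Amount would need a Key Lookup per matching row.
CREATE NONCLUSTERED INDEX IX_DemoOrders_CustomerID
    ON #DemoOrders (CustomerID);

-- Covering index: the INCLUDE columns are stored at the leaf level,
-- so the query below can be answered from this index alone.
CREATE NONCLUSTERED INDEX IX_DemoOrders_CustomerID_Covering
    ON #DemoOrders (CustomerID)
    INCLUDE (OrderDate, Amount);

SELECT CustomerID, OrderDate, Amount
FROM #DemoOrders
WHERE CustomerID = 42;

DROP TABLE #DemoOrders;
```

The trade-off is a wider index that costs more storage and more work on every INSERT and UPDATE.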

Compute scalar
- Compute Scalar is an operator that evaluates an expression to produce a scalar value (a single defined value)
- The value may be returned to the user or referenced elsewhere in the query
- Typically the result of a calculation or a string concatenation

Filter
- The Filter operator limits the output to rows that satisfy a condition on the selected column
- For example, adding a HAVING clause typically adds a Filter operator to the execution plan

Sort
- The Sort operator sorts all incoming rows by the selected column(s)
- Adding an ORDER BY clause adds a Sort operator to the execution plan (unless the rows already arrive in the required order)
- Properties of interest: Actual Rebinds, Actual Rewinds

Actual rebinds, actual rewinds
- The Sort operator is one of the few that report actual (or estimated) rebinds and rewinds
- When a Sort operator runs, the following happens:
  - The Init() method is called: the physical operator (Sort) is initialized and any required data structures are set up
  - The GetNext() method is called: the physical operator receives the rows of data to work on. It can receive several rows or none
  - The Close() method is called: the physical operator has performed its job and is cleaned up and shut down. Close() is called only once

- The rebind and rewind values count the number of times the Init() method is called on the operator (Sort)
- A rebind occurs when one or more correlated parameters of the nested loop join change and the inner side (of the nested loop) must be re-evaluated
- A rewind occurs when none of the correlated parameters change and the prior inner result set can be reused
- Example: Rebind = 1 means the Init() method was called once, on a Sort operator that is not on the inner side of a loop join

The following operators use rebinds and rewinds:
- Non-clustered Index Spool
- Row Count Spool
- Table Spool
- Sort
- Table-Valued Function
- Remote Query

Demo 2

Capturing Plans
- Using SQL Server Management Studio
- Using SQL Profiler
  - When working on production
  - When working with several concurrent sessions
  - Tracing several components
  - Storing to XML
  - Additional cost on the server!

Plan Cache - Why cache execution plans?
- Because creating execution plans takes resources and time
- Each unique query gets a "hash" or "fingerprint" (a binary value) … and it gets an execution plan
- Plans are stored in the plan cache
- The cache is internal memory storage
- Available in: sys.dm_exec_cached_plans
- Performance questions

sys.dm_exec_cached_plans
Structure of the DMV:
- refcounts - number of cache objects that are referencing this cache object
- usecounts - number of times the cache object (the plan) has been used
- size_in_bytes - number of bytes used by the plan
- memory_object_address - memory address of the cached entry
- cacheobjtype - type of object in the cache:
  - Compiled Plan - compiled execution plan
  - Compiled Plan Stub
  - Parse Tree - a plan stored for a view
  - Extended Proc
  - CLR Compiled Proc / CLR Compiled Func
- objtype - type of object:
  - Proc - stored procedure
  - Prepared - prepared statement
  - Adhoc - ad hoc query
  - View
  - Trigger
  - Default
  - UsrTab - user table
  - SysTab - system table
  - Rule
  - Check - check constraint
- plan_handle - identifier for the in-memory plan

Cache plans with plan handle
    SELECT
         cp.refcounts
        ,cp.usecounts
        ,cp.objtype
        ,st.[dbid]
        ,st.objectid
        ,st.[text]
        ,qp.query_plan
        ,cp.plan_handle
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp;
    GO
- Uses dm_exec_sql_text to retrieve the T-SQL statement
- Uses dm_exec_query_plan to retrieve the XML execution plan (it can be opened graphically or stored in a file)
- Requires VIEW SERVER STATE permission on the server!

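Plan cache polluting
For example:
    SELECT * FROM adventureworks2012.[dbo].[DatabaseLog] WHERE DatabaseLogID = 1;
    SELECT * FROM adventureworks2012.[dbo].[DatabaseLog] WHERE DatabaseLogID = 2;
The two statements differ only in the literal, yet each gets its own entry in the plan cache.
Solution: create a parameterized query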
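To get a rough idea of how much of the cache is taken up by plans that were used only once, a query along these lines can help. It is a sketch that assumes VIEW SERVER STATE permission; what counts as "too much" is a judgment call for your workload.

```sql
-- Sketch: single-use plans per object type and the memory they occupy.
-- A large 'Adhoc' share is a typical sign of plan cache pollution.
SELECT cp.objtype,
       COUNT(*)                                     AS single_use_plans,
       SUM(CAST(cp.size_in_bytes AS bigint)) / 1024 AS size_kb
FROM sys.dm_exec_cached_plans AS cp
WHERE cp.usecounts = 1
GROUP BY cp.objtype
ORDER BY size_kb DESC;
```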

Clearing CACHE
Using:
    DBCC FREEPROCCACHE;
    GO
    DBCC FREEPROCCACHE(0x D02F0D740EB );
It removes all plans (or one specific plan, identified by its plan handle) from the plan cache.
Or using:
    DBCC FREESYSTEMCACHE('ALL');
It cleans all unused cache entries from all caches. SQL Server also proactively cleans up unused cache entries in the background.
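On a shared or production server it is usually safer to evict one plan than to flush the whole cache. The sketch below looks up a plan handle by query text and removes only that plan; the LIKE pattern is an assumption for this example, and DBCC FREEPROCCACHE requires ALTER SERVER STATE permission.

```sql
DECLARE @plan_handle varbinary(64);

-- Find one cached plan whose text mentions the table of interest,
-- skipping this lookup query itself.
SELECT TOP (1) @plan_handle = cp.plan_handle
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.[text] LIKE N'%DatabaseLog%'
  AND st.[text] NOT LIKE N'%dm_exec_cached_plans%';

-- Remove only that plan; do nothing if no match was found.
IF @plan_handle IS NOT NULL
    DBCC FREEPROCCACHE (@plan_handle);
```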

Cache aging
- Plan "age" is calculated from the age value (maintained by the lazywriter) and the cost of the plan
- Each time the plan is used, its age is incremented by 1
- The plan cache is scanned periodically, and each plan's age is decreased by 1
- A plan is removed from cache when:
  - The memory manager requires more memory
  - All available memory is currently in use
  - Age = 0
  - The plan is not referenced by any query (session)
- IF all criteria are fulfilled, the cached plan is deleted.

Demo 3
