Modern Performance - SQL Server

Presentation transcript:

Modern Performance - SQL Server
Joe Chang, www.qdpma.com, jchang6 @ yahoo

About Joe
SQL Server consultant since 1999. Query Optimizer execution plan cost formulas (2002). True cost structure of SQL plan operations (2003?). Database with distribution statistics only, no data (2004). Decoding statblob/stats_stream: writing your own statistics. Disk IO cost structure. Tools for system monitoring and execution plan analysis.
See http://www.qdpma.com/
Download: http://www.qdpma.com/ExecStatsZip.html
Blog: http://sqlblog.com/blogs/joe_chang/default.aspx

Overview
General SQL Server performance: why is performance still important today? Brute force? Yes, but … Special topics: spectacular fails. Automating data collection. SQL Server engine: what do developers and DBAs need to know?

Not in this session
A list of rules to be followed blindly, without consideration for the underlying reason or for whether a rule actually applies in the current circumstance. The DBA skill is cause-and-effect analysis and assessment.

Common Themes?
The execution plan. A very large error in the row estimate, off by multiple orders of magnitude. A single execute of a large operation might still be tolerable. The serious cases involve multiple executes of large operations.

CPU & Memory 2001 versus 2014
(System diagram.)
2001: 4 sockets, 4 cores, Pentium III Xeon 900MHz, 4-8GB memory? (Xeon MP 2002-4). Each core today is more than 10x a Pentium III (700MHz?) core.
2014: Xeon E7 v2 (Ivy Bridge), 15 cores, 3 QPI. 4 x 15 = 60 cores. 3TB memory (96 x 32GB), with 24 DIMMs per socket. 40 PCI-E gen3 lanes + x4 gen2 per socket.
Memory (DIMM) prices, 2013 / 2014:
16GB: $191 / $180
32GB: $794 / $650
64GB: - / $4510
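The memory figures on this slide can be checked with a little arithmetic. The minimal Python sketch below uses only the slide's numbers: 2014 DIMM prices, 24 DIMMs per socket and 4 sockets. The helper names are illustrative.

```python
# DIMM prices from the slide's 2014 column, in US dollars
PRICE_2014 = {16: 180, 32: 650, 64: 4510}

def price_per_gb(size_gb: int) -> float:
    """Dollars per GB for one DIMM of the given size."""
    return PRICE_2014[size_gb] / size_gb

def fully_populated(sockets: int, dimms_per_socket: int, size_gb: int) -> tuple:
    """Return (total GB, total DIMM cost in dollars) with every slot filled."""
    dimms = sockets * dimms_per_socket
    return dimms * size_gb, dimms * PRICE_2014[size_gb]

for size in sorted(PRICE_2014):
    print(f"{size}GB DIMM: ${price_per_gb(size):.2f} per GB")

# 4-socket Xeon E7 v2, 24 DIMMs per socket, 32GB DIMMs
print(fully_populated(4, 24, 32))  # (3072, 62400)
```

At 2014 prices, a 64GB DIMM costs about 3.5x as much per GB as a 32GB DIMM.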

CPU & Memory 2001 versus 2012
(System diagram.)
2001: 4 sockets, 4 cores, Pentium III Xeon 900MHz, 4-8GB memory? (Xeon MP 2002-4). Each core today is more than 10x a Pentium III (700MHz?) core.
2012: Xeon E5 (Sandy Bridge), 8 cores, 2 QPI. 4 x 8 = 32 cores total. Maximum memory: Westmere-EX 1TB (64 x 16GB, 3 QPI); Sandy Bridge E5 768GB (48 x 16GB, 2 QPI).

Intel E5 & E7 v2 (Ivy Bridge)
(Block diagram.)

Processor – Core

Microprocessor Pipeline
(Diagram: a 1st and a 2nd instruction overlapping in the pipeline.) At 3GHz, one clock is 0.33ns. The pipeline stages are Branch Predict (BP), Instruction Fetch (IF), Decode (ID), Register Allocate & Rename (RAT), Re-Ordering Buffer (ROB), Schedule (Sch), Execute, Flags and Retire. One instruction takes about 5 ns from start to finish, equivalent to a 200MHz clock.
The microprocessor (core) is a (multi-lane) assembly line. Each core is superscalar, a processor (socket) has multiple cores, and a system has multiple sockets. Old compiler optimization strategies are now completely obsolete. Local memory access is 60ns; remote access is 90ns.
Source: Intel Processor Architecture, January 2013, Software & Services Group, Developer Products Division.
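The pipeline numbers can be worked out directly. The sketch below is in Python. The stage count is only what 5 ns divided by 0.33 ns implies; it is not a published Intel figure.

```python
def cycle_ns(clock_ghz: float) -> float:
    """Duration of one clock cycle in nanoseconds."""
    return 1.0 / clock_ghz

def stages(latency_ns: float, clock_ghz: float) -> int:
    """Approximate number of pipeline stages implied by the end-to-end latency."""
    return round(latency_ns / cycle_ns(clock_ghz))

def unpipelined_mhz(latency_ns: float) -> float:
    """Instruction rate if each instruction had to finish before the next started."""
    return 1000.0 / latency_ns

print(f"{cycle_ns(3.0):.2f} ns per cycle")              # 0.33 ns per cycle
print(stages(5.0, 3.0), "instructions in flight per lane")  # 15
print(unpipelined_mhz(5.0), "MHz without pipelining")   # 200.0
```

Pipelining lets one lane start a new instruction every 0.33ns even though each instruction still takes about 5ns, which is where the 15x comes from.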

Micro-architecture Sandy-Bridge

Haswell (Xeon E5/7 v3)

CPU Access Times
Core at 3.33GHz: 1 CPU cycle = 0.3ns
L1 cache: 4 CPU cycles (1ns)
L2 cache: 12 CPU cycles (4ns?)
L3 cache: 29+ cycles
Local node memory: 28 cycles + 49 ns (open page), 28 cycles + 56 ns (random page)
Remote node (1-hop) memory: 28 cycles + 100ns
2-hop: 150-300ns+?
(Diagram: logical processors 0 and 1, L1 I and L1 D caches, unified L2, L3 slice, DRAM.)

Latency Orders of Magnitude
(The same core, cache and memory latencies as the previous slide; diagram of core, L1 caches and LLC.)
Sources: http://www.7-cpu.com/cpu/SandyBridge.html, https://software.intel.com/en-us/forums/topic/287236, https://software.intel.com/sites/products/collateral/hpc/vtune/performance_analysis_guide.pdf
Core i7 / Xeon 5500 Series, data source latency (approximate):
L1 cache hit: ~4 cycles
L2 cache hit: ~10 cycles
L3 cache hit, line unshared: ~40 cycles
L3 cache hit, shared line in another core: ~65 cycles
L3 cache hit, modified in another core: ~75 cycles
Remote L3 cache: ~100-300 cycles
Local DRAM: ~60 ns
Remote DRAM: ~100 ns
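The orders of magnitude show up once cycle counts are converted to nanoseconds at the 3.33GHz clock from the CPU Access Times slide. This Python sketch makes two choices: it uses the 29-cycle lower bound for L3, and the random-page figure (28 cycles + 56 ns) for local memory.

```python
import math

CLOCK_GHZ = 3.33
# (name, cycles, extra nanoseconds), from the CPU Access Times slide
LATENCIES = [
    ("L1 cache", 4, 0.0),
    ("L2 cache", 12, 0.0),
    ("L3 cache", 29, 0.0),
    ("Local memory, random page", 28, 56.0),
    ("Remote memory, 1 hop", 28, 100.0),
]

def to_ns(cycles: int, extra_ns: float) -> float:
    """Convert a cycle count plus fixed nanoseconds into total nanoseconds."""
    return cycles / CLOCK_GHZ + extra_ns

l1 = to_ns(4, 0.0)
for name, cycles, extra in LATENCIES:
    ns = to_ns(cycles, extra)
    print(f"{name:28s} {ns:7.1f} ns  {math.log10(ns / l1):4.1f} orders above L1")
```

Local memory comes out at roughly 50x the L1 latency. A remote access approaches two orders of magnitude.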

Westmere-EX 8-Socket System
(Diagram: 8 sockets of 10 cores each, with LLC, memory controllers and SMBs, connected by QPI to each other and to 4 IOHs and a PCH.)
Large server systems are very complicated. Software developed without consideration for system architecture will likely have severe problems. This applies to the OS, SQL Server and the application.

Storage 2001 versus 2012/13
(Diagram: 2001, PCI RAID controllers with HDDs; 2013, a 192 GB server with PCIe x8/x4 adapters for RAID, IB, 10GbE, HDD and SSD.)
2001: 100 x 10K HDD at 125 IOPS each = 12.5K IOPS. IO bandwidth is limited to 1.3GB/s (1/3 of memory bandwidth).
2013: 64 SSDs at 10K+ IOPS each, so 1M IOPS total is possible. 10-20GB/s+ of IO bandwidth is easy, with 6.4GB/s on each PCIe gen3 x8 slot. SAN vendors: questionable bandwidth.
http://www.qdpma.com/Storage/Storage2013.html
http://www.qdpma.com/ppt/Storage_2013.pptx
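The slide's IOPS and bandwidth figures can be reproduced in Python. The 1M IOPS total implies roughly 15.6K IOPS per SSD, which is consistent with "10K+ each". The 6.4GB/s per PCIe gen3 x8 slot is the slide's figure.

```python
import math

def aggregate_iops(devices: int, iops_each: int) -> int:
    """Total IOPS when every device contributes equally."""
    return devices * iops_each

def slots_needed(target_gb_s: float, per_slot_gb_s: float = 6.4) -> int:
    """PCIe gen3 x8 slots needed to reach the target bandwidth."""
    return math.ceil(target_gb_s / per_slot_gb_s)

print(aggregate_iops(100, 125))            # 12500: 2001, 100 x 10K HDD
print(aggregate_iops(64, 10_000))          # 640000: 2013, at the 10K floor
print(1_000_000 / 64)                      # 15625.0 IOPS per SSD for 1M total
print(slots_needed(10), slots_needed(20))  # 2 4
```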

SAN
(Diagram components:
- host nodes with 768 GB and 1024 GB of memory;
- HBAs on PCIe x8, connecting over 8 Gb FC (0.8 GB/s) or 10Gb FCoE through switches to storage processors SP A and SP B (24 GB each);
- x4 SAS back-end links at 2GB/s;
- auto-tier pools of SSD, 10K and 7.2K HDDs plus hot spares;
- volumes Data 1-16, SSD 1-4 and Log 1-4.)
SAN pools are comprised of multiple RAID groups. Volumes are created from a pool, each containing a slice from every RAID group.
http://sqlblog.com/blogs/joe_chang/archive/2013/05/10/enterprise-storage-systems-emc-vmax.aspx
http://sqlblog.com/blogs/joe_chang/archive/2013/02/25/emc-vnx2-and-vnx-future.aspx
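It helps to see how many host ports a SAN needs to match direct-attached PCIe bandwidth. The Python sketch below assumes roughly 0.8 GB/s of usable bandwidth per 8 Gb FC port, which is the figure shown on the slide.

```python
import math

FC8_GB_S = 0.8        # assumed usable bandwidth per 8 Gb FC port
PCIE3_X8_GB_S = 6.4   # per PCIe gen3 x8 slot, from the storage slide

def fc_ports_needed(target_gb_s: float) -> int:
    """8 Gb FC ports needed to carry the target bandwidth."""
    return math.ceil(target_gb_s / FC8_GB_S)

print(fc_ports_needed(PCIE3_X8_GB_S))  # 8 ports to match one x8 slot
print(fc_ports_needed(10))             # 13 ports for 10GB/s
```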

Performance Past, Present, Future
When will servers be so powerful that … (I have been saying this for a long time.) Today the hardware is 10 to 100X overkill: 32 cores in 2012, 60 cores in 2014. There is enough memory that IO is only sporadic, and SSD gives effectively unlimited IOPS. What can go wrong? That is today's topic.

SQL Performance
(Diagram linking the following elements:
- SQL;
- tables (natural keys);
- indexes;
- statistics & compile parameters;
- query optimizer and compile;
- execution plan;
- row estimate propagation errors;
- storage engine;
- hardware, DOP, memory and parallel plans;
- recompile with temp table / table variable;
- index & stats maintenance;
- API server cursors: open, prepare, execute, close?;
- SET NOCOUNT and information messages.)
Tables and SQL combined implement business logic. Enforce natural keys with unique indexes, not with SQL. Have an index and statistics maintenance policy.
Open questions: may one piece of logic need more than one execution plan? Compile cost versus execution cost? Plan cache bloat?
The execution plan links all the elements of performance. Index tuning alone has limited value, and over-indexing can cause problems as well.

Factors to Consider
SQL, tables, indexes, statistics, compile parameters, query optimizer, storage engine, DOP, memory, hardware.

Special Topics
Data type mismatch; multiple optional search arguments (SARG); function on SARG; parameter sniffing versus variables; statistics related (big topic); OR, AND/OR combinations; IN/NOT IN, EXISTS; complex query with sub-expressions; parallel execution. (Not in order of priority.)
http://blogs.msdn.com/b/sqlcat/archive/2013/09/09/when-to-break-down-complex-queries.aspx

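1a. Data type mismatch
  DECLARE @name nvarchar(25) = N'Customer#000002760'
  SELECT * FROM CUSTOMER WHERE C_NAME = @name
The table column is varchar, but the parameter/variable is nvarchar. Because nvarchar has higher data type precedence, the column side is implicitly converted, and the query is unable to use an index seek. The fix is to convert the parameter, not the column:
  SELECT * FROM CUSTOMER WHERE C_NAME = CONVERT(varchar(25), @name)
.NET auto-parameter discovery?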
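One way to avoid the mismatch is for the data access layer to declare each parameter with the column's own type, instead of inferring it from the client string type. (.NET strings, for example, are inferred as nvarchar.) The Python sketch below shows the idea. The `COLUMN_TYPES` catalog and `param_declaration` are hypothetical names; a real implementation would read the types from the database metadata.

```python
# Hypothetical schema catalog: in a real data access layer this would be
# loaded from the database metadata, not hard-coded.
COLUMN_TYPES = {
    "CUSTOMER.C_NAME": "varchar(25)",
    "LINEITEM.L_ORDERKEY": "int",
}

def param_declaration(param: str, column: str) -> str:
    """Declare @param with the exact type of the column it is compared to."""
    try:
        sql_type = COLUMN_TYPES[column]
    except KeyError:
        # Refuse to guess: a guessed string type would typically be nvarchar
        raise ValueError(f"unknown column {column}") from None
    return f"@{param} {sql_type}"

print(param_declaration("name", "CUSTOMER.C_NAME"))  # @name varchar(25)
```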

1b. Type Mismatch – Row Estimate
Compare the row estimates for:
  SELECT * FROM CUSTOMER WHERE C_NAME LIKE 'Customer#00000276%'
  SELECT * FROM CUSTOMER WHERE C_NAME LIKE N'Customer#00000276%'
A row estimate error could have severe consequences in a complex query.

SELECT TOP + Row Estimate Error
  SELECT TOP 1000 [Document].[ArtifactID]
  FROM [Document] (NOLOCK)
  WHERE [Document].[AccessControlListID_D] IN (1,1000064,1000269)
    AND EXISTS (
      SELECT [DocumentBatch].[BatchArtifactID]
      FROM [DocumentBatch] (NOLOCK)
      INNER JOIN [Batch] (NOLOCK)
        ON [Batch].ArtifactID = [DocumentBatch].[BatchArtifactID]
      WHERE [DocumentBatch].[DocumentArtifactID] = [Document].[ArtifactID]
        AND [Batch].[Name] LIKE N'%Value%'
    )
  ORDER BY [Document].[ArtifactID]
The data type mismatch results in a high row estimate. With the TOP clause, the optimizer expects the first 1000 rows to be easy to find. In fact, few rows match the SARG, so the plan chosen is the wrong one for evaluating a large number of rows.
http://www.qdpma.com/CBO/Relativity.html
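A simplified model shows why TOP plus a high row estimate misleads the optimizer. To return the first N matches from an ordered scan, it expects to read about N / selectivity rows. The two selectivities below are hypothetical, chosen only to show the scale of the error. SQL Server's actual row-goal costing is more involved than this.

```python
def rows_to_scan(top_n: int, selectivity: float) -> float:
    """Expected rows read from an ordered scan before top_n rows qualify,
    assuming matching rows are spread uniformly."""
    if not 0 < selectivity <= 1:
        raise ValueError("selectivity must be in (0, 1]")
    return top_n / selectivity

estimated = rows_to_scan(1000, 0.10)   # optimizer believes 10% of rows match
actual = rows_to_scan(1000, 0.0001)    # in fact 0.01% match
print(estimated, actual, actual / estimated)
```

A 1000x overestimate of selectivity becomes a 1000x underestimate of the work the "cheap" TOP plan will do.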

Multiple Optional SARG

2. Multiple Optional SARG
  DECLARE @Orderkey int, @Partkey int = 1
  SELECT * FROM LINEITEM
  WHERE (@Orderkey IS NULL OR L_ORDERKEY = @Orderkey)
    AND (@Partkey IS NULL OR L_PARTKEY = @Partkey)
    AND (@Partkey IS NOT NULL OR @Orderkey IS NOT NULL)

IF block
  DECLARE @Orderkey int, @Partkey int = 1
  IF (@Orderkey IS NOT NULL)
    SELECT * FROM LINEITEM
    WHERE (L_ORDERKEY = @Orderkey)
      AND (@Partkey IS NULL OR L_PARTKEY = @Partkey)
  ELSE IF (@Partkey IS NOT NULL)
    SELECT * FROM LINEITEM
    WHERE (L_PARTKEY = @Partkey)
In practice these are the stored procedure parameters, so consider the impact of parameter sniffing, and consider the OPTIMIZE FOR hint.

Dynamically Built Parameterized SQL
  DECLARE @Orderkey int, @Partkey int = 1
    , @SQL nvarchar(500), @Param nvarchar(100)
  SELECT @SQL = N'/* Comment */ SELECT * FROM LINEITEM WHERE 1=1'
    , @Param = N'@Orderkey int, @Partkey int'
  IF (@Orderkey IS NOT NULL)
    SELECT @SQL = @SQL + N' AND L_ORDERKEY = @Orderkey'
  IF (@Partkey IS NOT NULL)
    SELECT @SQL = @SQL + N' AND L_PARTKEY = @Partkey'
  PRINT @SQL
  EXEC sp_executesql @SQL, @Param, @Orderkey, @Partkey
An IF block is easier for a few options. Dynamically built parameterized SQL is better for many options. Consider a /*comment*/ to help identify the source of the SQL.
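The same pattern is sketched below in Python against an in-memory SQLite table. Only the supplied filters are appended, and values always travel as parameters, never concatenated into the text. SQLite is used only so the example runs anywhere. Against SQL Server, the text and parameter list would go to sp_executesql as in the T-SQL version. `build_lineitem_query` is a hypothetical name.

```python
import sqlite3

def build_lineitem_query(orderkey=None, partkey=None):
    """Return (sql, params) containing only the predicates that were supplied."""
    sql = "/* build_lineitem_query */ SELECT * FROM LINEITEM WHERE 1=1"
    params = {}
    if orderkey is not None:
        sql += " AND L_ORDERKEY = :orderkey"
        params["orderkey"] = orderkey
    if partkey is not None:
        sql += " AND L_PARTKEY = :partkey"
        params["partkey"] = partkey
    if not params:
        # mirrors (@Partkey IS NOT NULL OR @Orderkey IS NOT NULL)
        raise ValueError("at least one search argument is required")
    return sql, params

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE LINEITEM (L_ORDERKEY INTEGER, L_PARTKEY INTEGER)")
con.executemany("INSERT INTO LINEITEM VALUES (?, ?)", [(1, 1), (1, 2), (2, 1)])

sql, params = build_lineitem_query(partkey=1)
print(sql)
print(sorted(con.execute(sql, params).fetchall()))  # [(1, 1), (2, 1)]
```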

2b. Function on column SARG
  SELECT COUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM
  WHERE YEAR(L_SHIPDATE) = 1995 AND MONTH(L_SHIPDATE) = 1
  SELECT COUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM
  WHERE L_SHIPDATE BETWEEN '1995-01-01' AND '1995-01-31'
  DECLARE @Startdate date = '1995-01-01', @Days int = 1
  SELECT COUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM
  WHERE L_SHIPDATE >= @Startdate AND L_SHIPDATE < DATEADD(dd, @Days, @Startdate)
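Rather than wrapping the column in YEAR()/MONTH(), the caller can compute the range boundaries and compare the bare column against them. The small Python helper below produces a half-open [start, end) range; `month_range` is a hypothetical name. A half-open range also works for datetime columns, where BETWEEN … '1995-01-31' would miss times later on the last day.

```python
from datetime import date

def month_range(year: int, month: int):
    """Half-open [start, end) range covering one calendar month."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end

# Bind start/end to: WHERE L_SHIPDATE >= @start AND L_SHIPDATE < @end
start, end = month_range(1995, 1)
print(start, end)  # 1995-01-01 1995-02-01
```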

Estimated versus Actual Plan – Rows
Estimated plan: 1 row??? Actual plan: 77,356 actual rows.

3. Parameter Sniffing
  -- first call: the procedure compiles with these parameters
  exec p_Report @startdate = '2011-01-01', @enddate = '2011-12-31'
  -- subsequent calls: the procedure executes with the original plan
  exec p_Report @startdate = '2012-01-01', @enddate = '2012-01-07'
Assuming the date data type, narrow and wide ranges need different execution plans. Options:
1) OPTIMIZE FOR: one plan for all ranges.
2) WITH RECOMPILE: compile on each execute.
3) The main procedure calls 1 of 2 identical sub-procedures. One sub-procedure is called only for narrow ranges, the other for wide ranges.
Skewed data distributions are also important; for example, large and small customers.
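Option 3 works because each of the two identical sub-procedures keeps its own cached plan. The routing decision is sketched below in Python. The 31-day cutoff and the names p_Report_Narrow and p_Report_Wide are hypothetical; the real cutoff would be wherever the two plans actually diverge.

```python
from datetime import date

NARROW_MAX_DAYS = 31  # hypothetical cutoff between narrow and wide ranges

def choose_procedure(startdate: date, enddate: date) -> str:
    """Pick which sub-procedure should run, so each compiles for its own range width."""
    days = (enddate - startdate).days + 1  # inclusive date range
    return "p_Report_Narrow" if days <= NARROW_MAX_DAYS else "p_Report_Wide"

print(choose_procedure(date(2011, 1, 1), date(2011, 12, 31)))  # p_Report_Wide
print(choose_procedure(date(2012, 1, 1), date(2012, 1, 7)))    # p_Report_Narrow
```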

STATISTICS

4. Statistics
Topics:
- Auto-recompute points.
- Sampling strategy: how much to sample (theory)? Random pages versus random rows.
- Histogram: equal and range rows. Out of bounds, value does not exist, etc.
References:
- "Statistics Used by the Query Optimizer in SQL Server 2008", Eric N. Hanson and Yavor Angelov, contributor: Lubor Kollar. http://msdn.microsoft.com/en-us/library/dd535534.aspx
- "Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator", Joseph Sack.
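The simplified Python model below shows how the histogram's equal and range rows turn into an equality estimate:
- A value matching a step key gets EQ_ROWS.
- A value between keys gets RANGE_ROWS / DISTINCT_RANGE_ROWS (the average range rows), even if it does not actually exist.
- A value beyond the last key is out of bounds.

Several things are simplified. The step keys are supplied by hand. SQL Server's own step selection and its out-of-bounds handling (which differs between the legacy and 2014 cardinality estimators) are not modeled. The 1-row floor is an assumption of this sketch.

```python
from bisect import bisect_left
from collections import Counter

def build_histogram(values, step_keys):
    """Simplified histogram: one step per key in step_keys (keys chosen by hand)."""
    counts = Counter(values)
    steps, prev = [], None
    for hi in sorted(step_keys):
        # row counts of distinct values strictly between the previous key and this one
        inside = [c for v, c in counts.items() if (prev is None or v > prev) and v < hi]
        steps.append({"RANGE_HI_KEY": hi, "EQ_ROWS": counts.get(hi, 0),
                      "RANGE_ROWS": sum(inside), "DISTINCT_RANGE_ROWS": len(inside)})
        prev = hi
    return steps

def estimate_equal(steps, value):
    """Row estimate for col = value; floors at 1 row (an assumption of this sketch)."""
    keys = [s["RANGE_HI_KEY"] for s in steps]
    i = bisect_left(keys, value)
    if i == len(keys):
        return 1.0                      # above the last step: out of bounds
    step = steps[i]
    if keys[i] == value:
        return float(max(step["EQ_ROWS"], 1))
    if step["DISTINCT_RANGE_ROWS"] == 0:
        return 1.0                      # below the first step, or an empty range
    return step["RANGE_ROWS"] / step["DISTINCT_RANGE_ROWS"]  # average range rows

steps = build_histogram([1, 1, 2, 3, 3, 3, 5, 8, 8, 9], step_keys=[1, 3, 9])
for value in (3, 8, 4, 100):
    print(value, estimate_equal(steps, value))
```

The value 4 does not exist in the data, yet it gets the same 1.5-row estimate as 8, because both fall in the same range step.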

Statistics Structure
Stored (mostly) in a binary field. Scalar values. Density vector: limit 30 (half in NC, half cluster key). Histogram: up to 200 steps.
http://sqlblog.com/blogs/joe_chang/archive/2012/05/05/decoding-stats-stream.aspx
Consider not blindly using IDENTITY on critical tables. Example: large customers get low ID values, small customers get high ID values.

Statistics Auto/Re-Compute
Statistics are automatically generated on query compile. Recompute at 6 rows, 500, every 20%? Has this changed? In 2008 R2, trace flag 2371 lowers the auto-recompute threshold for large tables.
http://support.microsoft.com/kb/2754171
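The commonly documented thresholds can be expressed as a Python sketch. For a permanent table with more than 500 rows, auto-update triggers after 500 + 20% of the rows have been modified. Trace flag 2371 uses the lower of that and SQRT(1000 × rows), which matters mostly for large tables. Temp tables (the 6-row point) are not modeled.

```python
import math

def legacy_threshold(rows: int) -> float:
    """Modifications before auto-update stats, permanent table, default behavior."""
    return 500 if rows <= 500 else 500 + 0.20 * rows

def tf2371_threshold(rows: int) -> float:
    """With trace flag 2371: the lower of the legacy and dynamic thresholds."""
    return min(legacy_threshold(rows), math.sqrt(1000 * rows))

for rows in (10_000, 1_000_000, 100_000_000):
    print(rows, legacy_threshold(rows), round(tf2371_threshold(rows)))
```

At 100 million rows, the default waits for about 20 million modifications; the dynamic threshold triggers after roughly 316 thousand.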

Statistics Sampling
Sampling theory: a true random sample. The sample error is about the square root of N, so the relative error is about 1/√N.
SQL Server sampling: random pages (but always the first and last page???), using all rows in the selected pages.
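The square-root rule is easy to put in numbers, and it also shows why sampling whole pages can be weaker than it looks. If rows on the same page are strongly correlated, a page sample behaves like a sample of pages rather than of rows. The Python sketch below takes the extreme case: every row on a page has the same value, as with data clustered on the key. The page and row counts are illustrative.

```python
import math

def relative_error(independent_samples: int) -> float:
    """Relative sampling error ~ 1/sqrt(N) for N independent samples."""
    return 1 / math.sqrt(independent_samples)

pages, rows_per_page = 100, 100
print(relative_error(pages * rows_per_page))  # 0.01 if every row were independent
print(relative_error(pages))                  # 0.1 if rows in a page are identical
```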

Row Estimate Problems (at Source)
Skewed data distribution. Out of bounds. Value does not exist. Row estimate errors at the source are classified under the statistics topic.

Loop Join – Table Scan on Inner Source
The estimated output from the first 2 tables (at right) is zero or 1 rows. The most efficient join to the third table (which has no index on the join column) is then a loop join with a scan. If the actual row count is 2 or more, a full scan is performed for each row from the outer source. Default statistics rules may lead to serious ETL issues; consider a custom strategy.
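Rough arithmetic shows why this plan is fine at 1 row and disastrous at more. The table sizes in this Python sketch are hypothetical, and the model counts rows read, not the optimizer's cost units.

```python
def loop_join_scan_rows(outer_rows: int, inner_rows: int) -> int:
    """Rows read when the inner table is fully scanned once per outer row."""
    return outer_rows * inner_rows

def hash_join_rows(outer_rows: int, inner_rows: int) -> int:
    """Rows read when each input is read once (build + probe)."""
    return outer_rows + inner_rows

inner = 1_000_000
print(loop_join_scan_rows(1, inner), hash_join_rows(1, inner))            # as estimated
print(loop_join_scan_rows(10_000, inner), hash_join_rows(10_000, inner))  # as executed
```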

Compile Parameter Not Exists
The main procedure has a cursor around view_Servers, and the first server in view_Servers is 'CAESIUM'. The cursor executes a sub-procedure for each server, with this SQL:
  SELECT MAX(ID) FROM TReplWS WHERE Hostname = @ServerName
But CAESIUM does not exist in TReplWS!

Good and Bad Plan?

SqlPlan Compile Parameters

SqlPlan Compile Parameters
Compile parameter values are at the bottom of the .sqlplan file:
<?xml version="1.0" encoding="utf-8"?>
<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan" Version="1.1" Build="10.50.2500.0">
  <BatchSequence>
    <Batch>
      <Statements>
        <StmtSimple StatementText="@ServerName varchar(50) SELECT @maxid = ISNULL(MAX(id),0) FROM TReplWS WHERE Hostname = @ServerName" StatementId="1" StatementCompId="43" StatementType="SELECT" StatementSubTreeCost="0.0032843" StatementEstRows="1" StatementOptmLevel="FULL" QueryHash="0x671D2B3E17E538F1" QueryPlanHash="0xEB64FB22C47E1CF2" StatementOptmEarlyAbortReason="GoodEnoughPlanFound">
          <StatementSetOptions QUOTED_IDENTIFIER="true" ARITHABORT="false" CONCAT_NULL_YIELDS_NULL="true" ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" NUMERIC_ROUNDABORT="false" />
          <QueryPlan CachedPlanSize="16" CompileTime="1" CompileCPU="1" CompileMemory="168">
            <RelOp NodeId="0" PhysicalOp="Compute Scalar" LogicalOp="Compute Scalar" EstimateRows="1" EstimateIO="0" EstimateCPU="1e-007" AvgRowSize="15" EstimatedTotalSubtreeCost="0.0032843" Parallel="0" EstimateRebinds="0" EstimateRewinds="0">
            </RelOp>
            <ParameterList>
              <ColumnReference Column="@ServerName" ParameterCompiledValue="'CAESIUM'" />
            </ParameterList>
          </QueryPlan>
        </StmtSimple>
      </Statements>
    </Batch>
  </BatchSequence>
</ShowPlanXML>
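Because compile values sit in a predictable place, they can be pulled out of saved plans automatically. The Python sketch below uses the standard-library ElementTree and the showplan namespace from the XML. `compile_parameters` is a hypothetical helper. For a .sqlplan file on disk, `ET.parse(path).getroot()` reads the file with its declared encoding.

```python
import xml.etree.ElementTree as ET

SHOWPLAN_NS = "http://schemas.microsoft.com/sqlserver/2004/07/showplan"

def compile_parameters(plan_xml: str) -> dict:
    """Map each parameter name to its ParameterCompiledValue in a showplan document."""
    root = ET.fromstring(plan_xml)
    values = {}
    for plist in root.iter(f"{{{SHOWPLAN_NS}}}ParameterList"):
        for col in plist.findall(f"{{{SHOWPLAN_NS}}}ColumnReference"):
            compiled = col.get("ParameterCompiledValue")
            if compiled is not None:
                values[col.get("Column")] = compiled
    return values

sample = f"""<ShowPlanXML xmlns="{SHOWPLAN_NS}" Version="1.1">
  <BatchSequence><Batch><Statements><StmtSimple><QueryPlan>
    <ParameterList>
      <ColumnReference Column="@ServerName" ParameterCompiledValue="'CAESIUM'" />
    </ParameterList>
  </QueryPlan></StmtSimple></Statements></Batch></BatchSequence>
</ShowPlanXML>"""

print(compile_parameters(sample))  # {'@ServerName': "'CAESIUM'"}
```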

AND – OR, IN / NOT IN, EXISTS / NOT EXISTS combinations

5a. Single Table OR
  -- Single table
  SELECT * FROM LINEITEM
  WHERE L_ORDERKEY = 1 OR L_PARTKEY = 184826

5a. Join 2 Tables, OR in SARG
  SELECT O_ORDERDATE, O_ORDERKEY, L_SHIPDATE, L_QUANTITY
  FROM LINEITEM
  INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY
  WHERE L_PARTKEY = 184826 OR O_CUSTKEY = 137099

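5a. UNION (ALL) instead of OR
  SELECT O_ORDERDATE, O_ORDERKEY, L_SHIPDATE, L_QUANTITY, O_CUSTKEY, L_PARTKEY
  FROM LINEITEM INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY
  WHERE L_PARTKEY = 184826
  UNION ALL
  SELECT O_ORDERDATE, O_ORDERKEY, L_SHIPDATE, L_QUANTITY, O_CUSTKEY, L_PARTKEY
  FROM LINEITEM INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY
  WHERE O_CUSTKEY = 137099
    AND (L_PARTKEY <> 184826 OR L_PARTKEY IS NULL) -- Hugo Kornelis trick
Caution: with UNION, the select list should include keys to ensure the correct rows are returned. UNION removes duplicates (with a Sort operation); UNION ALL does not.
The Hugo Kornelis trick (the extra predicate on the second branch) excludes rows the first branch already returned. UNION ALL then gives the same result as the OR, without the sort.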
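The Python check below, run against an in-memory SQLite database, confirms two things. With the exclusion predicate on the second branch, the UNION ALL rewrite returns exactly the rows of the OR query. Without that predicate, it double-counts. The tables are cut-down hypothetical versions of ORDERS and LINEITEM, including a NULL part key. SQLite is used only so the example runs anywhere; it says nothing about SQL Server plan choices.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ORDERS (O_ORDERKEY INTEGER PRIMARY KEY, O_CUSTKEY INTEGER)")
con.execute("CREATE TABLE LINEITEM (L_ORDERKEY INTEGER, L_LINENUMBER INTEGER, L_PARTKEY INTEGER)")
con.executemany("INSERT INTO ORDERS VALUES (?, ?)", [(1, 10), (2, 20), (3, 10)])
con.executemany("INSERT INTO LINEITEM VALUES (?, ?, ?)",
                [(1, 1, 100), (1, 2, 200), (2, 1, 100), (3, 1, None), (3, 2, 300)])

BASE = ("SELECT O_ORDERKEY, L_LINENUMBER, O_CUSTKEY, L_PARTKEY "
        "FROM LINEITEM INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY ")
OR_QUERY = BASE + "WHERE L_PARTKEY = :p OR O_CUSTKEY = :c"
NAIVE_UNION_ALL = BASE + "WHERE L_PARTKEY = :p UNION ALL " + BASE + "WHERE O_CUSTKEY = :c"
# Second branch excludes rows the first branch already returned (NULL-safe)
SAFE_UNION_ALL = NAIVE_UNION_ALL + " AND (L_PARTKEY <> :p OR L_PARTKEY IS NULL)"

def rows(sql):
    # sort by repr so rows containing NULL (None) can still be ordered
    return sorted(con.execute(sql, {"p": 100, "c": 10}).fetchall(), key=repr)

print(len(rows(OR_QUERY)), len(rows(NAIVE_UNION_ALL)), len(rows(SAFE_UNION_ALL)))  # 5 6 5
print(rows(OR_QUERY) == rows(SAFE_UNION_ALL))  # True
```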

5b. AND/OR Combinations
A hash join is a good method for processing many rows, but it requires an equality join condition. With AND/OR, IN/NOT IN and EXISTS/NOT EXISTS combinations, the query optimizer may not be able to determine that an equality join condition exists. The execution plan will then use a loop join, and an attempt to force a hash join will be rejected.
Re-write using UNION in place of OR, and LEFT JOIN in place of NOT IN. Example patterns:
  SELECT xx FROM A WHERE col1 IN (expr1) AND col2 NOT IN (expr2)
  SELECT xx FROM A WHERE (expr1) AND (expr2 OR expr3)
More on AND/OR combinations: http://www.qdpma.com/CBO/Relativity3.html
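Before replacing NOT IN with a LEFT JOIN, note that the two are equivalent only when the subquery column has no NULLs. A single NULL makes NOT IN return no rows, while LEFT JOIN … IS NULL (an anti-join) still returns the non-matching rows. The runnable check below uses Python and SQLite, which follows the same standard NULL semantics. Tables A and B are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE A (col2 INTEGER)")
con.execute("CREATE TABLE B (val INTEGER)")
con.executemany("INSERT INTO A VALUES (?)", [(1,), (2,), (3,)])
con.executemany("INSERT INTO B VALUES (?)", [(2,)])

NOT_IN = "SELECT col2 FROM A WHERE col2 NOT IN (SELECT val FROM B) ORDER BY col2"
LEFT_JOIN = ("SELECT A.col2 FROM A LEFT JOIN B ON B.val = A.col2 "
             "WHERE B.val IS NULL ORDER BY A.col2")

print(con.execute(NOT_IN).fetchall(), con.execute(LEFT_JOIN).fetchall())
# [(1,), (3,)] [(1,), (3,)]

con.execute("INSERT INTO B VALUES (NULL)")  # one NULL in the subquery column
print(con.execute(NOT_IN).fetchall(), con.execute(LEFT_JOIN).fetchall())
# [] [(1,), (3,)]
```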

Complex Queries

Complex Queries
High compile effort: many joins, many indexes. Estimated plan cost correlation. Row estimation errors after multiple operations. Row estimate errors at the source are classified under the statistics topic.

Complex Query with Sub-expression
Query complexity leads to a really high compile cost. Repeating sub-expressions (including CTEs) must be evaluated multiple times.
Main problem: row estimate error propagation.
Solution/strategy: get a good execution plan.
- When the estimate is high and the actual is low, use a temp table.
- When the estimate is low and the actual row count is high, balance the temp table insert overhead against the plan benefit.
Would a join hint work?
More on AND/OR combinations: http://www.qdpma.com/CBO/Relativity4.html
http://blogs.msdn.com/b/sqlcat/archive/2013/09/09/when-to-break-down-complex-queries.aspx
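Propagation is the main problem because, if each operator's estimate is off by some factor, the factors multiply. The simplified Python model below assumes independent errors. The selectivities and the 10x error per step are illustrative, not measured. Three steps, each off by 10x, leave the final estimate off by 1000x.

```python
def propagated_estimate(source_rows: float, selectivities, error_factors):
    """Estimate vs. actual after a chain of operators, assuming each operator's
    estimated selectivity is off from the actual by an independent factor."""
    estimate = actual = source_rows
    for sel, err in zip(selectivities, error_factors):
        estimate *= sel
        actual *= sel * err
    return estimate, actual

est, act = propagated_estimate(1_000_000, [0.01, 0.5, 0.2], [10, 10, 10])
print(est, act, act / est)
```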

More Plan Details
The query joins 6 tables, and each table has too many indexes. The row estimate is high, so the plan cost is high, and the query optimizer tries really, really hard to find a better plan. The actual row count is moderate, so any plan works.

Temp Table and Table Variable
Forget what other people have said; most of it is cr@p.
- Temp tables are subject to statistics auto/re-compute and recompile.
- A table variable has no statistics, and the optimizer assumes 1 row.
The question in each specific case: do the statistics and recompile help or not? If yes, use a temp table. If no, use a table variable. Is this still true?

Row Estimate Error after Join
IO is synchronous when the estimated row count is < 25, and asynchronous when it is > 25.

Row Estimate 2

Parallelism
The parallelism settings were designed for the 1998 era. Today we have complex systems with 32 cores.
- Cost Threshold for Parallelism: default 5.
- Max Degree of Parallelism: instance level.
- OPTION (MAXDOP n): query level.
Questions for today:
- A plan cost 5 query might run in 10ms?
- Some queries at DOP 4, others at DOP 16?
- Should the number of concurrently running queries x DOP be less than the number of logical/physical processors?
- Tables with computed columns may inhibit parallelism?
We really need to rethink parallelism / NUMA strategies.
More on parallelism: http://www.qdpma.com/CBO/ParallelismComments.html
http://www.qdpma.com/CBO/ParallelismOnset.html
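The slide's rule of thumb is that concurrently running queries × DOP should not exceed the number of processors. It can be written as a Python helper. The function names are hypothetical. The default threshold of 5 is the instance default quoted on the slide. A serial plan whose estimated cost is above the threshold is eligible for a parallel plan.

```python
def parallel_plan(plan_cost: float, cost_threshold: float = 5.0) -> bool:
    """True if the plan cost is high enough for a parallel plan to be considered."""
    return plan_cost > cost_threshold

def dop_budget(processors: int, concurrent_queries: int) -> int:
    """Largest DOP that keeps concurrent_queries x DOP within the processor count."""
    return max(1, processors // max(1, concurrent_queries))

print(parallel_plan(6.0), dop_budget(32, 8), dop_budget(32, 2))  # True 4 16
```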

Parallel Execution – or not?
Tables with computed columns using UDFs prevent parallelism.

Full-Text Search
Loop join with FT as the inner source: the full-text search is potentially executed many times.

varchar(max) stored in LOB pages
Is disk IO to LOB pages synchronous? Must the row be accessed to get the 16-byte link? Feature request: an index pointer to LOB.
SQL PASS 2013: "Understanding Data Files at the Byte Level", Mark Rasmussen.

Legacy API: Server Cursors / Cursor Stored Procedures
sp_prepare / sp_prepexec, sp_execute, sp_unprepare
sp_cursoropen, sp_cursorfetch, sp_cursorclose
sp_cursorprepare / sp_cursorprepexec, sp_cursorexecute, sp_cursorunprepare
Guess which one is not called? Symptom: sp_reset_connection.
API Server Cursors: http://technet.microsoft.com/en-us/library/ms187088(v=sql.105).aspx
Cursor Stored Procedures: http://technet.microsoft.com/en-us/library/ms187801(v=sql.120).aspx

Summary
Hardware today is really powerful. Storage may not be, because of the SAN vendor disconnect. Standard performance practice covers top resource consumers and index usage. But also look for serious blunders.
http://www.qdpma.com/CBO/Relativity.html
http://blogs.msdn.com/b/sqlcat/archive/2013/09/09/when-to-break-down-complex-queries.aspx
http://www.qdpma.com/CBO/SQLServerCostBasedOptimizer.html
Kevin Boles – Common TSQL Mistakes

Thank you to our sponsors

Special Topics
Data type mismatch; multiple optional search arguments (SARG); function on SARG; parameter sniffing versus variables; statistics related (big topic); AND/OR; complex query; parallel execution.

SQL Server Edition Strategies
Enterprise Edition: per-core licensing costs.
Old system strategy: a 4 (or 2)-socket server, top processor, max memory.
Today: how many cores are necessary? A 2-socket system with max memory (16GB DIMMs).
Is Standard Edition adequate? Low cost, but many important features are disabled.
BI Edition: 16 cores; limited to 64GB for the SQL Server process.

New Features in SQL Server
2005: index included columns, CLR, partitioning
2008: filtered index, compression
2012: column store (non-clustered)
2014: column store clustered, Hekaton

General Performance

SQL Performance – General
Client-side architecture: connection pooling; stored procedures versus SQL, parameterized. Database architecture: cluster key, primary key, natural keys, foreign keys. SQL: indexing. Indexes & statistics maintenance.

Client-side Architecture
Connection pooling: Connection.Open, Execute, Connection.Close. sp_reset_connection is issued when a pooled connection is reused.
Stored procedures versus parameterized SQL: a stored procedure name is short, but parameterized SQL may not be. Is it larger than 1 Ethernet packet? 2? 8?
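The Python sketch below gives a rough estimate of how many Ethernet packets a request text occupies. It assumes SQL text is sent as UTF-16 (2 bytes per character) and a 1460-byte TCP payload per standard 1500-byte frame. TDS headers and the TDS packet size are ignored, so treat the result as an order-of-magnitude check. `ethernet_packets` is a hypothetical name.

```python
import math

TCP_PAYLOAD = 1460  # assumed: 1500-byte Ethernet MTU minus 40 bytes of IP/TCP headers

def ethernet_packets(sql_text: str) -> int:
    """Approximate Ethernet frames needed to send the text as UTF-16 (2 bytes/char),
    ignoring TDS and RPC overhead."""
    payload_bytes = len(sql_text) * 2
    return max(1, math.ceil(payload_bytes / TCP_PAYLOAD))

print(ethernet_packets("exec p_Report"))  # 1
print(ethernet_packets("x" * 3000))       # 5
```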

Database Architecture
Normalization; cluster key; primary key & other unique / natural keys; foreign keys.

Principles
Testing: data, server, storage, network.

CPU & Memory 2001 versus 2014
(System diagram.)
2001: 4 sockets, 4 cores, Pentium III Xeon 900MHz, 4-8GB memory? (Xeon MP 2002-4). Each core today is more than 10x a Pentium III (700MHz?) core.
2014: Xeon E7 v2 (Ivy Bridge, 3 QPI). 4 x 15 = 60 cores. 3TB memory (96 x 32GB), with 24 DIMMs per socket. 40 PCI-E gen3 lanes + x4 gen2 per socket.

Work in progress
(System diagram.)