Statistics for beginners

Lies, damned lies, and statistics. Demystifying statistics.
Андрій Зробок, azrobok@gmail.com

Agenda
Creating statistics:
- CREATE STATISTICS (FULLSCAN, SAMPLE nnn PERCENT)
- CREATE INDEX
- Auto-creating: executing a SQL query
- Statistics on several columns (two, for example)
Updating statistics:
- Automatic updates
- Synchronous / asynchronous
- Manual updates
Usage samples:
- WHERE col_name = value
- WHERE col_name = variable (@id)
- WHERE col_name > variable (@id)
- Computed column
- Filtered statistics (several columns)
- LIKE '%var%'
2 | 11/6/2018 | Statistics for beginners

Test data
Database: AdventureWorks2012
Tables: Person.Address, dbo.Address

Optimizer
- The SQL Server Query Optimizer is a cost-based optimizer.
- Cardinality estimation: the estimated number of rows that will be returned.
- Selectivity: the fraction of input rows that satisfy a predicate.
- Incorrect cardinality and cost estimates lead to inefficient plans and inappropriate memory grants, which hurt performance.
- Quality of the execution plans = accuracy of the cost estimates.

CE: model assumptions
SQL Server's CE component makes certain assumptions based on typical customer database designs, data distributions, and query patterns. The core assumptions are:
- Independence: data distributions on different columns are independent unless correlation information is available.
- Uniformity: within each statistics object histogram step, distinct values are evenly spread and each value has the same frequency.
- Containment: if something is being searched for, it is assumed that it actually exists. For a join predicate involving an equijoin for two tables, it is assumed that distinct join column values from one side of the join will exist on the other side of the join. In addition, the smaller range of distinct values is assumed to be contained in the larger range.
- Inclusion: for filter predicates involving a column-equal-constant expression, the constant is assumed to actually exist for the associated column. If a corresponding histogram step is non-empty, one of the step's distinct values is assumed to match the value from the predicate.
Given the vast potential for variations in data distribution, volume and query patterns, there are circumstances where the model assumptions are not applicable.
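The containment assumption can be illustrated with the classic textbook equijoin estimate: if the smaller set of distinct join values is contained in the larger one and values are uniform, the join returns rows_a × rows_b / max(distinct_a, distinct_b) rows. This is a minimal Python sketch of that arithmetic only, assuming uniform data; SQL Server's estimator works with histograms rather than this single formula, so the function name and numbers are hypothetical.

```python
def containment_join_estimate(rows_a: int, distinct_a: int,
                              rows_b: int, distinct_b: int) -> float:
    """Textbook equijoin estimate under containment + uniformity (illustrative only).

    Containment: the side with fewer distinct join values is assumed to be fully
    contained in the other side, so every value on the smaller side finds matches.
    Uniformity: each distinct value has rows / distinct rows.
    """
    if min(distinct_a, distinct_b) == 0:
        return 0.0
    return rows_a * rows_b / max(distinct_a, distinct_b)


# Hypothetical inputs: 1,000 child rows over 100 keys joined to 100 parent rows with unique keys.
print(containment_join_estimate(1000, 100, 100, 100))  # 1000.0
```

For a foreign-key style join the estimate equals the child row count, which is exactly what containment predicts.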

CE: underestimating / overestimating
Underestimating rows can lead to memory spills to disk, for example when not enough memory was requested for sort or hash operations. Underestimating rows can also result in:
- selection of a serial plan when parallelism would have been more optimal;
- inappropriate join strategies;
- inefficient index selection and navigation strategies.
Conversely, overestimating rows can lead to:
- selection of a parallel plan when a serial plan might be more optimal;
- inappropriate join strategy selection;
- inefficient index navigation strategies (scan versus seek);
- inflated memory grants;
- wasted memory and unnecessarily throttled concurrency.
Improving the accuracy of row estimates can improve the quality of the query execution plan and, as a result, the performance of the query.

Creating / updating statistics: database SET options

Creating statistics: test table
USE [AdventureWorks2012]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Address](
    [AddressID] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
    [AddressLine1] [nvarchar](60) NOT NULL,
    [AddressLine2] [nvarchar](60) NULL,
    [City] [nvarchar](30) NOT NULL,
    [StateProvinceID] [int] NOT NULL,
    [PostalCode] [nvarchar](15) NOT NULL,
    [SpatialLocation] [geography] NULL,
    [rowguid] [uniqueidentifier] ROWGUIDCOL NOT NULL,
    [ModifiedDate] [datetime] NOT NULL,
    CONSTRAINT [PK_Address_AddressID] PRIMARY KEY CLUSTERED ([AddressID] ASC)
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
ALTER TABLE [dbo].[Address] ADD CONSTRAINT [DF_dbo_Address_rowguid] DEFAULT (newid()) FOR [rowguid]
GO
ALTER TABLE [dbo].[Address] ADD CONSTRAINT [DF_dbo_Address_ModifiedDate] DEFAULT (getdate()) FOR [ModifiedDate]
GO

Creating statistics: data loading
set nocount on
declare @i table (i int)
declare @j int = 1
while @j < 200
begin
    insert into @i (i) values (@j)
    set @j = @j + 1
end
INSERT INTO [dbo].[Address] ([AddressLine1], [AddressLine2], [City], [StateProvinceID], [PostalCode], [SpatialLocation])
SELECT [AddressLine1], [AddressLine2], [City], [StateProvinceID] + i, [PostalCode], [SpatialLocation]
FROM [Person].[Address], @i
Statistics are created/updated when they are needed (by a query), not immediately after data is loaded or modified:
DBCC SHOW_STATISTICS ('[dbo].[Address]', [PK_Address_AddressID])
The statistics object is still empty.

Creating statistics: primary key
select * from [dbo].[Address] where [AddressID] = 1
DBCC SHOW_STATISTICS ('[dbo].[Address]', PK_Address_AddressID)

Creating statistics: definitions
- Density (statistics header): 1 / number of distinct values in the first key column, excluding the histogram boundary values. The query optimizer does not use this value; it is shown for backward compatibility.
- All density (density vector): 1 / number of distinct values for each column prefix. Multiplied by the row count, it gives the average number of rows per combination of key values.
- RANGE_HI_KEY: the upper-bound key value of a histogram step.
- RANGE_ROWS: the estimated number of rows whose value falls inside the step, excluding the upper bound.
- EQ_ROWS: the estimated number of rows whose value equals RANGE_HI_KEY.
- DISTINCT_RANGE_ROWS: the estimated number of distinct key values inside the step, excluding the upper bound.
- AVG_RANGE_ROWS: the average number of rows per distinct key value inside the step (RANGE_ROWS / DISTINCT_RANGE_ROWS).
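The step columns above can be modeled in a few lines. A minimal Python sketch, assuming integer keys and an equality predicate with a known constant: a value equal to RANGE_HI_KEY gets EQ_ROWS, and a value inside a step gets AVG_RANGE_ROWS (the uniformity assumption). Values beyond the last step are deliberately not modeled, since their handling depends on the CE version (see Sample 9). The class, function, and histogram values are hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HistogramStep:
    """One row of DBCC SHOW_STATISTICS ... WITH HISTOGRAM (simplified model)."""
    range_hi_key: int
    range_rows: float           # rows strictly between the previous RANGE_HI_KEY and this one
    eq_rows: float              # rows equal to RANGE_HI_KEY
    distinct_range_rows: float  # distinct values strictly inside the step

    @property
    def avg_range_rows(self) -> float:
        # Assumption: reported as 1 when the step has no values inside its range.
        if self.distinct_range_rows == 0:
            return 1.0
        return self.range_rows / self.distinct_range_rows


def estimate_equality(steps: List[HistogramStep], value: int) -> Optional[float]:
    """Rows estimated for `col = value`, using the uniformity assumption inside a step."""
    keys = [s.range_hi_key for s in steps]
    i = bisect_left(keys, value)    # first step whose RANGE_HI_KEY >= value
    if i == len(steps):
        return None                 # beyond the last step: depends on CE version, not modeled
    step = steps[i]
    if step.range_hi_key == value:
        return step.eq_rows         # boundary value: EQ_ROWS
    return step.avg_range_rows      # value inside the step: AVG_RANGE_ROWS


# Hypothetical two-step histogram.
steps = [HistogramStep(10, 0.0, 5.0, 0.0), HistogramStep(20, 30.0, 4.0, 3.0)]
print(estimate_equality(steps, 20))  # 4.0 (EQ_ROWS)
print(estimate_equality(steps, 15))  # 10.0 (30 rows / 3 distinct values)
```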

Creating statistics: auto-creating
select distinct PostalCode from [dbo].[Address] where city = 'Concord'
(1 row(s) affected)
select * from sys.stats where object_id = object_id('[dbo].[Address]', 'U')
What does WA mean in the name of the statistics object? Auto-created statistics are named _WA_Sys_<column_id>_<object_id>, with both IDs in hexadecimal (here column 4 is City and column 6 is PostalCode).
DBCC SHOW_STATISTICS ('[dbo].[Address]', _WA_Sys_00000004_041093DD) WITH STAT_HEADER
go
DBCC SHOW_STATISTICS ('[dbo].[Address]', _WA_Sys_00000006_041093DD) WITH STAT_HEADER
For string columns, SQL Server stores additional information in the statistics object, called trie trees.

Creating statistics: auto-creating
DBCC SHOW_STATISTICS ('[dbo].[Address]', _WA_Sys_00000004_041093DD) WITH DENSITY_VECTOR
go
DBCC SHOW_STATISTICS ('[dbo].[Address]', _WA_Sys_00000006_041093DD) WITH DENSITY_VECTOR

Creating statistics: auto-creating
DBCC SHOW_STATISTICS ('[dbo].[Address]', _WA_Sys_00000004_041093DD) WITH HISTOGRAM
go
DBCC SHOW_STATISTICS ('[dbo].[Address]', _WA_Sys_00000006_041093DD) WITH HISTOGRAM

Creating statistics: index
SET ANSI_PADDING ON
GO
CREATE NONCLUSTERED INDEX [IDX_StateProvinceID] ON [dbo].[Address] ([StateProvinceID] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
DBCC SHOW_STATISTICS ('[dbo].[Address]', IDX_StateProvinceID)

Creating statistics: index (two columns)
CREATE NONCLUSTERED INDEX [idx_city_postalcode] ON [dbo].[Address] ([City] ASC, [PostalCode] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
DBCC SHOW_STATISTICS ('[dbo].[Address]', idx_city_postalcode)
The histogram is created for the first (leading) column only; the other columns appear only in the density vector.

Creating statistics: two columns
CREATE STATISTICS CityProvince ON dbo.Address (City, StateProvinceID)
GO
DBCC SHOW_STATISTICS ('[dbo].[Address]', CityProvince) WITH STAT_HEADER
GO
DROP STATISTICS dbo.Address.CityProvince
GO
CREATE STATISTICS CityProvince ON dbo.Address (City, StateProvinceID) WITH FULLSCAN
GO
DBCC SHOW_STATISTICS ('[dbo].[Address]', CityProvince) WITH STAT_HEADER
GO
DROP STATISTICS dbo.Address.CityProvince
GO
CREATE STATISTICS CityProvince ON dbo.Address (City, StateProvinceID) WITH SAMPLE 50 PERCENT
GO
DBCC SHOW_STATISTICS ('[dbo].[Address]', CityProvince) WITH STAT_HEADER
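Why can a sampled statistics object differ from a FULLSCAN one? A simplified Python sketch, assuming row-level random sampling (SQL Server actually samples pages, so this only approximates the idea): value counts seen in the sample are scaled up by rows / rows_sampled, so frequent values come out roughly right while rare values may be overestimated or missed entirely. The function name and data are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List


def sampled_histogram(values: List[str], sample_fraction: float, seed: int = 1) -> Dict[str, float]:
    """Scale per-value counts from a random row sample up to the full table size."""
    rng = random.Random(seed)
    sample = [v for v in values if rng.random() < sample_fraction]
    if not sample:
        return {}
    scale = len(values) / len(sample)        # rows / rows_sampled
    return {v: c * scale for v, c in Counter(sample).items()}


# Hypothetical column: one frequent city, one medium one, and five values that occur once.
cities = ["Melbourne"] * 900 + ["Concord"] * 95 + [f"Rare{i}" for i in range(5)]
full = sampled_histogram(cities, 1.0)   # every row kept: exact counts (like FULLSCAN)
half = sampled_histogram(cities, 0.5)   # roughly like SAMPLE 50 PERCENT
# Each rare value is either missing from `half` or scaled up to about 2 rows.
print(full["Melbourne"], round(half.get("Melbourne", 0)))
```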

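Updating statistics
Automatic update rules:
- more than 500 rows: 500 + 20% of the rows modified;
- 500 rows or fewer: 500 modifications;
- the number of rows changes from 0;
- temporary tables: every 6 modifications (for tables with fewer than 6 rows);
- filtered statistics: the same algorithm as regular statistics (the threshold is based on the whole table).
Controlling automatic updates:
- sp_autostats: turns automatic updating ON/OFF for a particular table, index, or statistics object;
- NORECOMPUTE (CREATE STATISTICS option);
- STATISTICS_NORECOMPUTE (CREATE INDEX option).
Synchronous (default: the query waits for the update) / asynchronous (AUTO_UPDATE_STATISTICS_ASYNC: the query compiles with the old statistics while the update runs in the background).
Manual updates:
- UPDATE STATISTICS;
- sp_updatestats: updates all statistics that have had at least one underlying row modified since the last statistics update;
- index rebuild: updates the statistics of the rebuilt index (equivalent to a full scan).
Auto-created statistics are not dropped automatically when an index is later created on the same column.
Updating statistics invalidates the cached plans that depend on them.
Automatic updates use the default sampling rate, which SQL Server chooses based on table size (small tables are read in full, larger tables are sampled at a lower percentage), not a fixed 20% of rows.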
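The automatic-update rules listed above reduce to a small function. A minimal Python sketch, assuming the fixed thresholds from this slide (newer SQL Server versions can use a lower, dynamic threshold for large tables, which is not modeled) and assuming the 6-modification rule applies to temporary tables with fewer than 6 rows; the function name is hypothetical. Its result can be compared with the modification_counter returned by the query on the next slide.

```python
def auto_update_threshold(rows: int, temp_table: bool = False) -> int:
    """Modifications needed before automatic statistics update fires.

    Fixed (legacy) thresholds as listed on the slide; a dynamic threshold
    used by newer versions for large tables is not modeled here.
    """
    if rows == 0:
        return 1                  # the table grows from 0 rows
    if temp_table and rows < 6:
        return 6                  # small temp tables: every 6 modifications
    if rows <= 500:
        return 500
    return 500 + rows // 5        # 500 + 20% of the rows


for n in (0, 400, 10_000, 1_000_000):
    print(n, auto_update_threshold(n))  # 0 1, 400 500, 10000 2500, 1000000 200500
```

Integer division (`rows // 5`) keeps the 20% exact and avoids floating-point rounding.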

Updating statistics: information
SELECT OBJECT_NAME([sp].[object_id]) AS "Table",
       [sp].[stats_id] AS "Statistic ID",
       [s].[name] AS "Statistic",
       [sp].[last_updated] AS "Last Updated",
       [sp].[rows],
       [sp].[rows_sampled],
       [sp].[unfiltered_rows],
       [sp].[modification_counter] AS "Modifications"
FROM [sys].[stats] AS [s]
OUTER APPLY sys.dm_db_stats_properties([s].[object_id], [s].[stats_id]) AS [sp]
WHERE [s].[object_id] = OBJECT_ID(N'dbo.Address');

New cardinality estimator (SQL Server 2014)
Trace flag 9481 reverts query compilation and execution to the pre-SQL Server 2014 (legacy) CE behavior for a specific statement.
Trace flag 2312 enables the new SQL Server 2014 CE for a specific query compilation and execution.
--SQL Server 2014 compatibility level: new CE by default, trace flag 9481 forces the legacy CE
ALTER DATABASE [AdventureWorks2012] SET COMPATIBILITY_LEVEL = 120
GO
select city, count(*) from [Person].[Address] group by city OPTION (QUERYTRACEON 9481)
--SQL Server 2012 compatibility level: legacy CE by default, trace flag 2312 forces the new CE
ALTER DATABASE [AdventureWorks2012] SET COMPATIBILITY_LEVEL = 110
GO
select city, count(*) from [Person].[Address] group by city OPTION (QUERYTRACEON 2312)

Undocumented (8666)
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
GO
DBCC TRACEON (8666);
WITH XMLNAMESPACES ('http://schemas.microsoft.com/sqlserver/2004/07/showplan' AS p)
SELECT qt.text AS SQLCommand,
       qp.query_plan,
       StatsUsed.XMLCol.value('@FieldValue', 'NVarChar(500)') AS StatsName
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) qt
CROSS APPLY query_plan.nodes('//p:Field[@FieldName="wszStatName"]') StatsUsed(XMLCol)
WHERE qt.text LIKE '%SELECT%'
  AND qt.text LIKE '%addressline1%';
DBCC TRACEOFF (8666);

Undocumented (rowcount, pagecount)
<update_stats_stream_option> ::=
    [ STATS_STREAM = stats_stream ]
    [ ROWCOUNT = numeric_constant ]
    [ PAGECOUNT = numeric_constant ]
<update_stats_stream_option>: This syntax is for internal use only and is not supported. Microsoft reserves the right to change this syntax at any time.
use tempdb
go
create table t1(i int, j int)
create table t2(h int, k int)

Undocumented (rowcount, pagecount)
select distinct(i) from t1
select * from t1, t2 where i = k order by j + k
update statistics t1 with rowcount = 10000, pagecount = 10000
update statistics t2 with rowcount = 100000, pagecount = 100000

Undocumented (rowcount, pagecount)
select distinct(i) from t1
select * from t1, t2 where i = k order by j + k

Statistics for beginners: samples

Sample 1: WHERE col = constant
select addressid, AddressLine1, addressline2, city from [dbo].[Address] where [StateProvinceID] = 17
GO
(8429 row(s) affected)

Sample 2: WHERE col = local variable
declare @id int = 17
select addressid, AddressLine1, addressline2, city from [dbo].[Address] where [StateProvinceID] = @id
GO
(8429 row(s) affected)

Sample 2a: local variable = constant
create table #t (id int not null identity(1,1) primary key, descr varchar(20) not null)
go
insert into #t (descr) values ('descr 0'), ('descr 1'), ('descr 2'), ('descr 3'), ('descr 4'), ('descr 5'), ('descr 6'), ('descr 7'), ('descr 8'), ('descr 9')
go 100
select * from #t where descr = 'descr 2'
go
declare @descr varchar(20) = 'descr 2'
select * from #t where descr = @descr
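Why do both queries get the same estimate here? For `descr = @descr` the value is unknown at compile time, so the estimate comes from the density vector (All density × rows) instead of the histogram. A minimal Python sketch, assuming a single-column statistics object and modeling the histogram lookup for a constant as the exact count: with the uniform data from the script above both estimates are 100; with skewed (hypothetical) data they diverge. The function names are hypothetical.

```python
from collections import Counter
from typing import List


def density_estimate(values: List[str]) -> float:
    """Estimate for `col = @variable`: All density (1 / distinct values) x row count."""
    return (1.0 / len(set(values))) * len(values)


def constant_estimate(values: List[str], constant: str) -> int:
    """Estimate for `col = 'constant'`: modeled as the exact count (EQ_ROWS of a histogram step)."""
    return Counter(values)[constant]


# Same shape as #t: 10 distinct descr values, the INSERT batch repeated 100 times (go 100).
uniform = [f"descr {d}" for _ in range(100) for d in range(10)]
# Hypothetical skewed variant: 'descr 0' dominates, 9 other values with 10 rows each.
skewed = ["descr 0"] * 910 + [f"descr {d}" for d in range(1, 10) for _ in range(10)]

for name, data in (("uniform", uniform), ("skewed", skewed)):
    print(name, constant_estimate(data, "descr 2"), round(density_estimate(data)))
# uniform 100 100
# skewed 10 100
```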

Sample 3: WHERE col < (>) local variable
delete top (5) percent from [dbo].[Address]
go
declare @id int = 9
select addressid, AddressLine1, addressline2, city from [dbo].[Address] where [StateProvinceID] < @id
(0 row(s) affected)
DBCC SHOW_STATISTICS ('[dbo].[Address]', IDX_StateProvinceID)
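With an unknown local variable in an inequality (`< @id`), the histogram cannot be used, so the optimizer falls back to a fixed guess, commonly documented as 30% of the table's rows; that is why the estimate stays large even though 0 rows qualify. The same 30% figure appears in Sample 5. A trivial Python sketch of the arithmetic, assuming that fixed guess and a hypothetical table size; the names are hypothetical.

```python
from fractions import Fraction

# Assumed fixed selectivity guess for `col < @var` / `col > @var` when @var is unknown.
UNKNOWN_INEQUALITY_SELECTIVITY = Fraction(3, 10)


def unknown_inequality_estimate(table_rows: int) -> float:
    """Estimated rows for an inequality whose comparison value is unknown at compile time."""
    return float(table_rows * UNKNOWN_INEQUALITY_SELECTIVITY)


print(unknown_inequality_estimate(1_000_000))  # 300000.0
```

`Fraction` keeps the 30% exact, so the result does not depend on floating-point rounding.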

Sample 4: LIKE %
;with rs as (
    select addressid, AddressLine1, addressline2, city, StateProvinceID
    from [dbo].[Address]
    where AddressLine1 like '%Monti%')
select distinct rs.city, p.StateProvinceCode, p.Name
from rs inner join [Person].[StateProvince] p on rs.StateProvinceID = p.StateProvinceID
go
;with rs as (
    select addressid, AddressLine1, addressline2, city, StateProvinceID
    from [dbo].[Address]
    where AddressLine1 like '%Circle')
select distinct rs.city, p.StateProvinceCode, p.Name
from rs inner join [Person].[StateProvince] p on rs.StateProvinceID = p.StateProvinceID
go
DBCC SHOW_STATISTICS ('[dbo].[Address]', _WA_Sys_00000004_041093DD) WITH STAT_HEADER

Sample 4: LIKE % (with statistics)
(1244 row(s) affected)
(23321 row(s) affected)

Sample 4: LIKE % (with statistics)
(23321 row(s) affected)
(1244 row(s) affected)

Sample 4: LIKE % (without statistics)
drop statistics dbo.address._WA_Sys_00000004_041093DD
go
USE [master]
GO
ALTER DATABASE [AdventureWorks2012] SET AUTO_UPDATE_STATISTICS OFF
ALTER DATABASE [AdventureWorks2012] SET AUTO_CREATE_STATISTICS OFF

Sample 4: LIKE % (without statistics)

Sample 4: LIKE % (without statistics)
Estimated rows: with statistics 1078 vs. without statistics 575316
Estimated rows: with statistics 121907 vs. without statistics 239715

Sample 5: computed columns
SELECT (count(*)/100.0)*30 as _30_percent FROM Sales.SalesOrderDetail
go
SET STATISTICS PROFILE ON
GO
SELECT * FROM Sales.SalesOrderDetail WHERE UnitPrice * OrderQty > 30000
SET STATISTICS PROFILE OFF

Sample 5: computed columns
ALTER TABLE Sales.SalesOrderDetail ADD total AS UnitPrice * OrderQty
(after the query from the previous slide runs again, statistics are auto-created on the computed column)
DBCC SHOW_STATISTICS ('[Sales].[SalesOrderDetail]', _WA_Sys_0000000F_44CA3770)
ALTER TABLE Sales.SalesOrderDetail DROP COLUMN total

Sample 7: two conditions (2012; independence)
SELECT [AddressLine1], [AddressLine2], [City], [StateProvinceID], [PostalCode]
FROM [Person].[Address]
where city = 'Melbourne' and stateprovinceid = 77
DBCC SHOW_STATISTICS ('[Person].[Address]', IX_Address_StateProvinceID)
DBCC SHOW_STATISTICS ('[Person].[Address]', _WA_Sys_00000004_164452B1)
Select ((901.0/count(*)) * (110.0/count(*))) * count(*) as EstimatedNumberofRows from [Person].[Address]
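The last query on this slide computes the legacy estimate under the independence assumption: the two selectivities are multiplied, so ((901 / N) × (110 / N)) × N simplifies to 901 × 110 / N. A minimal Python sketch of that arithmetic, generalized to any number of predicates; the function name is hypothetical and the row count below is a placeholder, not the real size of Person.Address.

```python
from math import prod


def independence_estimate(total_rows: int, *matching_rows: float) -> float:
    """Legacy CE: multiply the individual selectivities (independence assumption)."""
    return prod(m / total_rows for m in matching_rows) * total_rows


n = 20_000  # hypothetical row count
print(independence_estimate(n, 901, 110))  # same value as 901 * 110 / n
```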

Sample 7: two conditions (2014; selectivity)
SELECT [AddressLine1], [AddressLine2], [City], [StateProvinceID], [PostalCode]
FROM [Person].[Address]
where city = 'Melbourne' and stateprovinceid = 77
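For comparison, the SQL Server 2014 CE does not simply multiply the selectivities of predicates on the same table (when no multi-column statistics cover them); it uses exponential backoff, sorting the selectivities from most to least selective and damping each further one: s1 × s2^(1/2) × s3^(1/4) × s4^(1/8), using at most four predicates. A minimal Python sketch under those assumptions, reusing the 901 and 110 counts from the previous slide with a hypothetical row count; the function name is hypothetical. The backoff estimate comes out larger than the plain product, reflecting the assumption that predicates are partly correlated.

```python
def exponential_backoff_estimate(total_rows: int, *matching_rows: float) -> float:
    """2014 CE sketch: s1 * s2**(1/2) * s3**(1/4) * s4**(1/8), most selective first."""
    selectivities = sorted(m / total_rows for m in matching_rows)[:4]  # at most 4 predicates
    combined = 1.0
    for k, s in enumerate(selectivities):
        combined *= s ** (1 / 2 ** k)   # exponent 1, 1/2, 1/4, 1/8
    return combined * total_rows


n = 20_000  # hypothetical row count
plain = (901 / n) * (110 / n) * n          # legacy independence estimate
backoff = exponential_backoff_estimate(n, 901, 110)
print(round(plain, 2), round(backoff, 2))
```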

Sample 7: filtered statistics
CREATE STATISTICS Victoria ON Person.Address(City) WHERE StateProvinceID = 77

Sample 7: filtered statistics
DBCC FREEPROCCACHE
GO
SELECT [AddressLine1], [AddressLine2], [City], [StateProvinceID], [PostalCode]
FROM [Person].[Address]
where city = 'Melbourne' and stateprovinceid = 77
Partition table

Sample 8: out-of-date statistics
DBCC SHOW_STATISTICS ('[dbo].[Address]', PostalCode)
INSERT INTO [dbo].[Address] ([AddressLine1], [AddressLine2], [City], [StateProvinceID], [PostalCode])
VALUES ('AddressLine1', 'AddressLine2', 'City', 5, N'YO16')
GO 100000
select AddressLine1, AddressLine2, city from [dbo].[Address] where postalcode = N'YO16'
Logical reads: 300279

Sample 8: out-of-date statistics (memory)
select city, count(*) from [dbo].[Address] where postalcode = N'YO16' group by city

Sample 8: out-of-date statistics corrected
update statistics [dbo].[Address] [postalcode] with fullscan
select AddressLine1, AddressLine2, city from [dbo].[Address] where postalcode = N'YO16'
Logical reads: 86357

Sample 8: out-of-date statistics corrected
select city, count(*) from [dbo].[Address] where postalcode = N'YO16' group by city

Sample 9: auto-increment column (2012 vs 2014)
create table dbo.Address_1 (
    [AddressID] [int] IDENTITY NOT NULL PRIMARY KEY,
    [AddressLine1] [nvarchar](60) NOT NULL,
    [AddressLine2] [nvarchar](60) NULL,
    [City] [nvarchar](30) NOT NULL,
    [StateProvinceID] [int] NOT NULL,
    [PostalCode] [nvarchar](15) NOT NULL
)
insert into dbo.Address_1 ([AddressLine1], [AddressLine2], [City], [StateProvinceID], [PostalCode])
select [AddressLine1], [AddressLine2], [City], [StateProvinceID], [PostalCode] from person.address
SELECT * FROM dbo.Address_1 where addressid = 1
insert into dbo.Address_1 ([AddressLine1], [AddressLine2], [City], [StateProvinceID], [PostalCode])
select top 500 [AddressLine1], [AddressLine2], [City], [StateProvinceID], [PostalCode] from person.address

Sample 9: auto-increment column (2012 vs 2014)

Sample 9: auto-increment column (2012 vs 2014)
dbcc freeproccache
go
SELECT * FROM dbo.Address_1 where addressid between 19800 and 19950

Question: how large a difference between estimated and actual rows matters?
There is no hard-coded variance that is guaranteed to indicate an actionable cardinality estimate problem. Instead, consider several overarching factors beyond the raw difference between estimated and actual row counts:
- Does the row estimate skew result in excessive resource consumption? For example, spills to disk caused by underestimated rows, or wasteful memory reservations caused by overestimated rows.
- Does the row estimate skew coincide with specific query performance problems (for example, a longer execution time than expected)?

Statistics for beginners: summary
- The model assumptions differ from the real world.
- Statistics are approximate.
- Performance depends on up-to-date statistics.
- Statistics on non-indexed columns make sense.
Q&A
The end