Azure SQL Data Warehouse Performance Tuning

Simon Facer Microsoft PFE

Simon Facer
  Microsoft PFE since 2011
  SQL Server since 1995 – version 4.21 (Sybase System 12)
  APS since 2013
  ADW since 2016

Out of Scope
What we aren’t going to talk about (in detail) …
  Databricks
  PolyBase
  Optimized for Elasticity
  Adaptive Caching
  HDInsight
  Azure Data Lake
  Azure Data Factory
  Azure Analytics
  Power BI
  Optimized for Compute / Gen 2

In Scope (aka the Agenda)
  Azure SQL DW Overview
  Table Basics
  Capturing Query Data
  Common Design and Performance Issues

What is MPP?
Massively Parallel Processing
[Diagram: one Control node and six Compute nodes]

What is Azure Data Warehouse?
Massively Parallel Processing
[Diagram: Storage and ten Compute nodes, labelled (60)]

Table use cases
Fact Tables
  Millions / billions of rows
  Aggregatable data
  Goal: fast scan of millions or billions of rows (SCAN)
Dimension Tables
  Attribute data
  Goal: fast read of specific rows (SEEK)
Stage Tables
  Data sink
  Goal: fast write

Table geometries
Hash Distributed Tables
  Data is distributed based on the hash of the distribution key value
  60 distributions
  Typical use: Fact Tables
Round-Robin Distributed Tables
  Data is distributed evenly across all distributions
  Typical use: Staging Tables
Replicated Tables
  Copy on each compute node *
  Typical use: Dimension tables < 2 GB

Hash Distributed:
  The hashing algorithm is deterministic: value ‘x’ will always go to the same distribution.
  All tables use the same hashing algorithm, so value ‘x’ in the hash key for table tbl_SalesHdr will be in the same distribution as value ‘x’ in the hash key for table tbl_SalesDtl.
  The hashing algorithm is data-type agnostic: it is based on the bytes of the field, not the value. The results are counter-intuitive: value 1 in an INT field will hash differently to value 1 in a BIGINT field, because the BIGINT field has more bytes than the INT field.
  May be a good candidate for tables with frequent DML operations.

Round-Robin Distributed:
  Rows are distributed evenly across all 60 distributions.
  Data is not assigned to distributions in any deterministic pattern.
  Queries against RR tables will almost always incur data movement.
  Can be a good candidate when …
    There is no obvious JOIN key
    There is no good candidate HASH key
    The table does not share a common join key with other tables
    The table is a temporary staging table

Replicated Tables:
  Data for Replicated tables is copied on each Compute node
  Should be < 2 GB in size
  After scaling the DW, needs to be re-initialized on Compute nodes
  After updates, needs to be re-initialized on Compute nodes
  Query DMV sys.pdw_replicated_table_cache_state to identify tables that need to be re-initialized

  -- Code Start:
  SELECT [ReplicatedTable] = t.[name]
  FROM sys.tables t
  JOIN sys.pdw_replicated_table_cache_state c ON c.object_id = t.object_id
  JOIN sys.pdw_table_distribution_properties p ON p.object_id = t.object_id
  WHERE c.[state] = 'NotReady'
    AND p.[distribution_policy_desc] = 'REPLICATE'

  -- Re-initialize table with:
  SELECT TOP 1 * FROM [ReplicatedTable]

  Re-initialization incurs a table-level EXCLUSIVE lock
  Re-initialization also rebuilds all indexes on the table
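The hash-distribution notes in this section (deterministic placement, hashing on bytes rather than values) can be tried out with a small simulation. This is only a sketch: the actual SQL DW hash function is not given in this deck, so `zlib.crc32` stands in for it, and the helper names `int_bytes`, `distribution_for` and `round_robin` are made up for illustration. The round-robin helper simply deals rows out in turn, which is a simplification of the real placement.

```python
import zlib

DISTRIBUTIONS = 60  # the deck's figure: every table is spread over 60 distributions


def int_bytes(value, width):
    """Encode an integer as a fixed-width field (4 bytes = INT, 8 bytes = BIGINT)."""
    return value.to_bytes(width, byteorder="little", signed=True)


def distribution_for(raw):
    """Stand-in hash (CRC32, NOT the real SQL DW algorithm) mapped onto 60 distributions."""
    return zlib.crc32(raw) % DISTRIBUTIONS


def round_robin(row_count):
    """Simplified round-robin: rows are dealt out in turn, ignoring their values."""
    counts = [0] * DISTRIBUTIONS
    for row_number in range(row_count):
        counts[row_number % DISTRIBUTIONS] += 1
    return counts


# The same bytes always land in the same distribution, whichever table holds them.
sales_hdr = distribution_for(int_bytes(42, 4))
sales_dtl = distribution_for(int_bytes(42, 4))
print("INT 42 in tbl_SalesHdr and tbl_SalesDtl:", sales_hdr, sales_dtl)

# The same number stored as INT and as BIGINT is a different byte string,
# so the two columns generally do not line up.
mismatches = [v for v in range(1000)
              if distribution_for(int_bytes(v, 4)) != distribution_for(int_bytes(v, 8))]
print("values 0..999 placed differently as INT vs BIGINT:", len(mismatches))

# Round-robin ignores the data, so the spread is even by construction.
print("rows per distribution for 600 round-robin rows:", set(round_robin(600)))
```

The point of the INT vs BIGINT comparison is that a join between two hash-distributed tables only stays local when the key columns share the same data type as well as the same values.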

Storage Options
Rowstore – Clustered Index
  Indexed / ordered data
  Indexes get fragmented over time
  Data insert ordered on Cluster Key: new rows appended to the end of the table, fast load performance
  Data insert not ordered on Cluster Key: new rows inserted into existing pages result in Page Splits, poor load performance
  Index maintenance on DML: overhead on data load
  Good lookup performance
  Ideal for limited range scans & singleton selects (Seeks)
  Slower for table scans / partition scans / loading

Storage Options
Rowstore – Heap
  No clustered index / unordered data
  New rows appended to the end of the table: fast load performance
  Whole table is / may be read for lookups (Seeks)
  Whole table is read for Scans
  Bad read performance

Storage Options
Clustered ColumnStore Index (CCI)
  Highly compressed, IO efficient
  Compression up to 15x (vs. RowStore up to 3.5x)
  Load performance dependent on Batch Size
  Lookup (seek) queries perform badly
  Scan queries: optimized!
  Query performance depends on CCI quality / health

How to capture Query MetaData
EXPLAIN: equivalent of SQL Server’s ‘Estimated Execution Plan’

DMVs: equivalent of SQL Server’s ‘Actual Execution Plan’

XML Output shows D-SQL operations:
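One way to read such a plan programmatically is to list the D-SQL operation types and pick out the data-movement steps. The sketch below is illustrative only: `SAMPLE_PLAN` is hand-written rather than captured from a real system, its element and attribute names (`dsql_operation`, `operation_type`, `SHUFFLE_MOVE`) are assumed to follow the general shape of EXPLAIN output, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, NOT real EXPLAIN output. The element and attribute
# names are an assumption about the general shape of the plan XML.
SAMPLE_PLAN = """<dsql_query number_nodes="2" number_distributions="60">
  <sql>SELECT COUNT_BIG(*) FROM [dbo].[FactOnlineSales] GROUP BY [StoreKey]</sql>
  <dsql_operations total_number_operations="3">
    <dsql_operation operation_type="RND_ID" />
    <dsql_operation operation_type="SHUFFLE_MOVE" />
    <dsql_operation operation_type="RETURN" />
  </dsql_operations>
</dsql_query>"""


def list_operations(plan_xml):
    """Return the operation_type of every dsql_operation, in plan order."""
    root = ET.fromstring(plan_xml)
    return [op.get("operation_type") for op in root.iter("dsql_operation")]


def data_movement_steps(plan_xml):
    """Keep only the steps whose type name ends in _MOVE (assumed naming)."""
    return [name for name in list_operations(plan_xml) if name.endswith("_MOVE")]


print(list_operations(SAMPLE_PLAN))
print(data_movement_steps(SAMPLE_PLAN))
```

A plan with no movement steps is the goal for frequently run queries; any step the second helper returns is a candidate for a distribution or statistics fix.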

Demo … See the ‘Resources’ slide at the end for scripts used in this session.

Common Design and Perf. Issues
  Data Movement
  Data Skew
  Statistics
  CCI Health
  Locking
  Resource Contention

Common Issues – Data Movement
Why does data move?
[Diagram: fact_OrderHeader rows (OrderID 1, 2, 3, 4, 7, 8) and fact_OrderDetail rows (SKU, with Order ID 7, 3, 1, 7) spread across Compute #1, Compute #2 and Compute #3]

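Why does data move?
  Distribution incompatible JOINs
  Distribution incompatible AGGREGATIONs
Incompatible JOIN (same hash key, different data types, so matching values hash to different distributions):
  Store_Sales HASH([ProductKey]), [ProductKey] INT NULL
  Web_Sales HASH([ProductKey]), [ProductKey] BIGINT NULL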

Why does data move?
  Distribution incompatible JOINs
  Distribution incompatible AGGREGATIONs
Incompatible AGGREGATION:
  SELECT COUNT_BIG(*) FROM [dbo].[FactOnlineSales] GROUP BY [StoreKey];
  Incompatibility:
    FactOnlineSales distributed by ProductKey
    Query groups by StoreKey
  Resolution:
    Re-distribute data on StoreKey
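A GROUP BY can be answered without moving data only when all rows of a group already sit in one distribution. The simulation below is a sketch of that idea, not of the engine itself: `zlib.crc32` stands in for the real hash, and the sample rows (20 stores, 200 products each) and helper names are hypothetical.

```python
import zlib
from collections import defaultdict

DISTRIBUTIONS = 60


def place(key):
    """Stand-in for the distribution hash (CRC32, not the real algorithm)."""
    return zlib.crc32(str(key).encode("utf-8")) % DISTRIBUTIONS


def distributions_per_group(rows, distribute_on, group_by):
    """For each GROUP BY value, collect the distributions that hold its rows."""
    spread = defaultdict(set)
    for row in rows:
        spread[row[group_by]].add(place(row[distribute_on]))
    return spread


# Hypothetical sample data: 20 stores, each selling the same 200 products.
rows = [{"StoreKey": s, "ProductKey": p} for s in range(20) for p in range(200)]

by_product = distributions_per_group(rows, "ProductKey", "StoreKey")
by_store = distributions_per_group(rows, "StoreKey", "StoreKey")

print("distributions touched per store, hashed on ProductKey:",
      max(len(d) for d in by_product.values()))
print("distributions touched per store, hashed on StoreKey:",
      max(len(d) for d in by_store.values()))
```

When the table is hashed on the grouping column, each store's rows live in exactly one distribution, so every distribution can finish its own groups locally. Hashed on ProductKey, a store's rows are scattered and must be shuffled before they can be counted.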

Common Issues – Data Skew
Causes of Data Skew
  Natural Skew
  NULL hash key values
  Default hash key value
  Bad hash key choice
Resolution:
  Pick a different hash key
  Split default values into a secondary table
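The effect of NULL or default hash key values can be made concrete by counting rows per distribution. This is a minimal sketch under stated assumptions: `zlib.crc32` replaces the real hash, every NULL key is treated as hashing to the same place, and the row counts and the `skew_ratio` measure (largest distribution divided by the average) are illustrative choices rather than anything defined in the deck.

```python
import zlib
from collections import Counter

DISTRIBUTIONS = 60


def place(key):
    """Stand-in hash. Assumption: all NULL (None) keys map to one distribution."""
    raw = b"" if key is None else str(key).encode("utf-8")
    return zlib.crc32(raw) % DISTRIBUTIONS


def rows_per_distribution(keys):
    """Row count for each of the 60 distributions, including empty ones."""
    counts = Counter(place(k) for k in keys)
    return [counts.get(d, 0) for d in range(DISTRIBUTIONS)]


def skew_ratio(counts):
    """Largest distribution relative to the average; 1.0 means perfectly even."""
    return max(counts) / (sum(counts) / len(counts))


# Hypothetical load: 60,000 distinct keys, then the same plus 30,000 NULL keys.
even = rows_per_distribution(range(60_000))
skewed = rows_per_distribution(list(range(60_000)) + [None] * 30_000)

print("skew ratio without NULL keys:", round(skew_ratio(even), 2))
print("skew ratio with NULL keys:   ", round(skew_ratio(skewed), 2))
```

Because a query finishes only when its slowest distribution finishes, one oversized distribution sets the run time for the whole query, which is why the resolutions above aim at spreading those rows out.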

Common Issues - Statistics
The MPP Query Optimizer relies heavily on statistics to evaluate plans
Out-of-date or non-existent statistics are the most common reason for MPP performance issues!
Avoid issues with statistics by creating them on all recommended columns and updating them after every load

It is recommended that statistics are created on all columns used in:
  Joins
  Predicates
  Aggregations
  Group Bys
  Order Bys
  Computations
Don’t forget about multi-column statistics …
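Creating statistics column by column is repetitive, so it is often scripted. The helper below is a rough sketch that only builds statement text; it does not connect to a database. The function names and the `stat_<table>_<column>` naming pattern are hypothetical, and the table and column names are borrowed from the query examples in this deck.

```python
def _short_name(table):
    """Strip schema and brackets: '[dbo].[FactOnlineSales]' -> 'FactOnlineSales'."""
    return table.split(".")[-1].strip("[]")


def statistics_statements(table, columns):
    """Build one single-column CREATE STATISTICS statement per column."""
    statements = []
    for column in columns:
        stat_name = f"stat_{_short_name(table)}_{column}"
        statements.append(f"CREATE STATISTICS [{stat_name}] ON {table} ([{column}]);")
    return statements


def multi_column_statement(table, columns):
    """Build one CREATE STATISTICS statement covering several columns."""
    stat_name = f"stat_{_short_name(table)}_" + "_".join(columns)
    column_list = ", ".join(f"[{c}]" for c in columns)
    return f"CREATE STATISTICS [{stat_name}] ON {table} ({column_list});"


def update_after_load(table):
    """Statement to refresh all statistics on a table once a load has finished."""
    return f"UPDATE STATISTICS {table};"


table = "[dbo].[FactOnlineSales]"
for statement in statistics_statements(table, ["StoreKey", "ProductKey"]):
    print(statement)
print(multi_column_statement(table, ["StoreKey", "ProductKey"]))
print(update_after_load(table))
```

The column list passed in would come from reviewing the joins, predicates and GROUP BY clauses of the workload, as the slide recommends.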

Azure SQL DW now supports automatic creation of column level statistics
  Auto Update not supported, yet
  Multi-column stats not auto created
  Stats Creation is Synchronous
  Stats Creation is triggered by:
    SELECT
    INSERT-SELECT
    CTAS
    UPDATE
    DELETE
    EXPLAIN
May 10th release

Common Issues – CCI Health
Clustered ColumnStore Indexes
  Better performance with > 100K rows / Compressed Row Group
  Best performance with 1,048,576 rows / Compressed Row Group
  Deleted rows impact performance
  Open Row Groups (Delta Store) are HEAPs
  Loading batches of > 100K rows / distribution go direct to Compressed format
  Small Resource Class: memory pressure can limit Compressed Row Group size
  Compressing a Row Group requires memory:
    72MB + (r * c * 8) + (r * short str col * 32) + (long str col * 16MB)
    r = rows, c = columns; Short str ≤ 32 bytes, Long str > 32 bytes
Distributed tables have 60 sets of Row Groups
  Recommended ≥ 60M rows (1M / distribution)
  Each distribution has its own Delta Store
  Partitions add CCIs / distribution
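The memory formula above is easier to judge with numbers plugged in. The sketch below assumes the `8` and `32` terms are bytes per value and that 1 MB is 1,048,576 bytes; the table shape used (20 columns, 5 short string columns, 2 long string columns) is a made-up example, as are the function names.

```python
MB = 1024 * 1024  # assumption: the formula's MB is 2**20 bytes


def rowgroup_memory_bytes(rows, columns, short_string_columns, long_string_columns):
    """72MB + (r * c * 8) + (r * short str col * 32) + (long str col * 16MB)."""
    return (72 * MB
            + rows * columns * 8
            + rows * short_string_columns * 32
            + long_string_columns * 16 * MB)


def rows_per_distribution(total_rows, distributions=60):
    """Rows each distribution receives if the table is spread evenly."""
    return total_rows // distributions


# Hypothetical table: 20 columns, 5 short and 2 long string columns,
# compressed in full row groups of 1,048,576 rows.
needed = rowgroup_memory_bytes(1_048_576, columns=20,
                               short_string_columns=5, long_string_columns=2)
print("memory to compress one full row group (MB):", needed / MB)

# The 60M-row recommendation works out to about one full row group per distribution.
print("rows per distribution for a 60M-row table:", rows_per_distribution(60_000_000))
```

For this example the terms come to 72 + 160 + 160 + 32 = 424 MB, which is the memory a load needs per row group. A small resource class that grants less than this can force row groups to be trimmed below the ideal size.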

Common Issues – Resource Contention
  Queries occupy Concurrency Slots, based on Resource Class
  Number of concurrent queries depends on DWU service objective
  RAM allocated per query depends on Resource Class and DWU

Memory

Gen 1
Performance level   Compute nodes   Memory per data warehouse (GB)
DW100               1               24
DW200               2               48
DW300               3               72
DW400               4               96
DW500               5               120
DW600               6               144
DW1000              10              240
DW1200              12              288
DW1500              15              360
DW2000              20              480
DW3000              30              720
DW6000              60              1440

Gen 2
Performance level   Compute nodes   Memory per data warehouse (GB)
DW1000c             2               600
DW1500c             3               900
DW2000c             4               1200
DW2500c             5               1500
DW3000c             6               1800
DW5000c             10              3000
DW6000c             12              3600
DW7500c             15              4500
DW10000c            20              6000
DW15000c            30              9000
DW30000c            60              18000

Gen 1 (Static Resource Classes): Concurrency Slots Used
Columns: Service level | Maximum concurrent queries | Maximum concurrency slots | staticrc10 | staticrc20 | staticrc30 | staticrc40 | staticrc50 | staticrc60 | staticrc70 | staticrc80
[Flattened table: repeated cell values appear to have been dropped from this transcript, so each row lists only the values that survive.]
DW100 4 1 2
DW200 8
DW300 12
DW400 16
DW500 20
DW600 24
DW1000 32 40
DW1200 48
DW1500 60
DW2000 80 64
DW3000 120
DW6000 128 240

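Gen 1 (Dynamic Resource Classes): Concurrency Slots Used
Columns: Service level | Maximum concurrent queries | Concurrency slots available | smallrc | mediumrc | largerc | xlargerc
[Flattened table: repeated cell values appear to have been dropped from this transcript, so each row lists only the values that survive.]
DW100 4 1 2
DW200 8
DW300 12
DW400 16
DW500 20
DW600 24
DW1000 32 40
DW1200 48
DW1500 60
DW2000 80 64
DW3000 120
DW6000 128 240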

Gen 2 (Static Resource Classes): Concurrency Slots Used
Columns: Service level | Maximum concurrent queries | Concurrency slots available | staticrc10 | staticrc20 | staticrc30 | staticrc40 | staticrc50 | staticrc60 | staticrc70 | staticrc80
[Flattened table: repeated cell values appear to have been dropped from this transcript, so each row lists only the values that survive.]
DW1000c 32 40 1 2 4 8 16
DW1500c 60
DW2000c 48 80 64
DW2500c 100
DW3000c 120 128
DW5000c 200
DW6000c 240
DW7500c 300
DW10000c 400
DW15000c 600
DW30000c 1200

Gen 2 (Dynamic Resource Classes): Concurrency Slots Used
Columns: Service level | Maximum concurrent queries | Concurrency slots available | smallrc | mediumrc | largerc | xlargerc
[Flattened table: repeated cell values appear to have been dropped from this transcript, so each row lists only the values that survive.]
DW1000c 32 40 1 4 8 28
DW1500c 60 6 13 42
DW2000c 80 2 17 56
DW2500c 100 3 10 22 70
DW3000c 120 12 26 84
DW5000c 200 20 44 140
DW6000c 240 7 24 52 168
DW7500c 300 9 30 66 210
DW10000c 400 88 280
DW15000c 600 18 132 420
DW30000c 1200 36 264 840
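The trade-off between resource class and concurrency can be worked through with the DW1000c row of the Gen 2 dynamic table (32 maximum concurrent queries, 40 slots, and 1 / 4 / 8 / 28 slots for smallrc / mediumrc / largerc / xlargerc). This is a simplified sketch: it assumes every running query uses the same resource class, and `max_concurrent` is a hypothetical helper, not a system function.

```python
def max_concurrent(total_slots, max_queries, slots_per_query):
    """Queries that fit at once: limited by both the slot pool and the query cap."""
    return min(max_queries, total_slots // slots_per_query)


# DW1000c row of the Gen 2 dynamic resource class table.
DW1000C = {
    "max_queries": 32,
    "slots": 40,
    "classes": {"smallrc": 1, "mediumrc": 4, "largerc": 8, "xlargerc": 28},
}

for resource_class, slots_used in DW1000C["classes"].items():
    fits = max_concurrent(DW1000C["slots"], DW1000C["max_queries"], slots_used)
    print(f"{resource_class}: {slots_used} slot(s) per query, {fits} concurrent")
```

A single xlargerc query takes 28 of the 40 slots, so a second one has to queue, while smallrc queries are limited by the 32-query cap rather than by slots. Larger resource classes buy memory per query at the cost of concurrency.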

Resources
Microsoft Azure SQL Data Warehouse
SQL Data Warehouse Documentation

Questions …
