Azure SQL DWH: Optimization


Azure SQL DWH: Optimization (Sergiy Lunyakin)

Our Partners: If you think that a SQL Saturday is a good opportunity to learn from and network with fellow SQL Server enthusiasts FOR FREE, I ask just one thing: visit the sponsor booths and chat with the sponsors! They are covering the expenses for each and every one of you, which is around EUR 60 …

About me
- Data Architect in the BigData and Analytics Team at SoftServe Inc
- Data Platform MVP, MCSE BI, MCSA Cloud Platform
- Leader of [name missing in the transcript]
- Speaker at SQL conferences
- Organizer of SQLSaturday Lviv
- Contacts: sergey.lunyakin@gmail.com, @slunyakin

Agenda
- Architecture of Azure SQL DW
- Sizing: service level and resource classes
- Data loading options: which is better
- Choosing the right distribution type
- Updating statistics
- Indexing

Architecture

Architecture of Azure SQL DW
[Diagram: 60 distribution databases, Dist_DB_1 through Dist_DB_60, shown in groups (1 to 15, 16 to 30, …, 46 to 60).]

Sizing

Sizing factors
- Number of nodes
- Tempdb size
- Concurrency and memory
- Load
- Transaction size

DWU / cDWU
- DWU: Optimized for Elasticity.
- cDWU: Optimized for Compute. It has more resources, including an NVMe solid-state disk cache that keeps the most frequently accessed data close to the CPUs, and gets 2.5x more memory.

Introducing DWU
A DWU bundles CPU, RAM and I/O.

Optimized for Elasticity (DWU):

DWU        Max queries   Max MB mem/dist
DW100      4             400
DW200      8             800
DW300      12            1200
DW400      16            1600
DW500      20            2000
DW600      24            2400
DW1000     32            4000
DW1200     32            4800
DW1500     32            6000
DW2000     32            8000
DW3000     32            12000
DW6000     32            24000

Optimized for Compute (cDWU):

DWU        Max queries   Max GB mem/dist
DW1000c    32            10
DW1500c    32            15
DW2000c    32            20
DW2500c    32            25
DW3000c    32            30
DW5000c    32            50
DW6000c    32            60
DW7500c    32            75
DW10000c   32            100
DW15000c   32            150
DW30000c   32            300
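Moving between the service levels in these tables is a single statement. The following is a minimal sketch rather than a prescribed procedure; the database name MyDW is a placeholder, and the statements are assumed to run in the master database of the logical server.

```sql
-- Run while connected to the master database of the logical server.
-- MyDW is a hypothetical database name; DW1000c is one of the service levels on the slide.
ALTER DATABASE MyDW
MODIFY (SERVICE_OBJECTIVE = 'DW1000c');

-- Check which service level each database currently runs at (also in master).
SELECT  db.name AS database_name,
        ds.service_objective
FROM    sys.database_service_objectives AS ds
JOIN    sys.databases AS db
        ON ds.database_id = db.database_id;
```

Scaling is not instantaneous, so it is usually scheduled around heavy loads rather than issued in the middle of them.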

Resource Classes
- Static resource classes allocate the same amount of memory regardless of the current service level.
- Dynamic resource classes allocate a variable amount of memory depending on the current service level. When you scale up to a larger service level, your queries automatically get more memory.
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-develop-concurrency
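Resource classes are implemented as database roles, so a user gets one through role membership. A short sketch, assuming a database user named LoadUser (a hypothetical name) already exists:

```sql
-- Run in the data warehouse database. LoadUser is a hypothetical database user.
-- Dynamic resource classes: smallrc (the default), mediumrc, largerc, xlargerc.
EXEC sp_addrolemember 'largerc', 'LoadUser';

-- Static resource classes are named staticrc10 through staticrc80.
-- Alternative: give the user a fixed memory grant instead of a dynamic one.
-- EXEC sp_addrolemember 'staticrc40', 'LoadUser';

-- Later, to return the user to the default resource class, drop the membership.
-- EXEC sp_droprolemember 'largerc', 'LoadUser';
```

A larger resource class gives each query more memory but uses more concurrency slots, so fewer queries can run at the same time.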

Load management
- Load performance scales as you increase DWUs.
- PolyBase automatically parallelizes the data load process.
- Multiple readers will not work against compressed text files (e.g. gzip).
- Multiple readers will work against compressed columnar/block format files (e.g. ORC, RC).

DWU        Readers   Writers
DW100      8         60
DW200      16        60
DW300      24        60
DW400      32        60
DW500      40        60
DW600      48        60
DW1000+    80+       60

Microsoft Build 2016. © 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Data Loading

Options
Parallel:
- PolyBase
- Azure Data Factory (PolyBase)
- SSIS (Azure SQL DW Comp)
Single Gated Client:
- bcp / Insert Bulk
- SQLBulkCopy
- SSIS (data flow)
- Azure Data Factory

Single Gated Client
[Diagram: Control Node and Compute Nodes with their DMS and Bridge components.]

Single Gated Client Parallelised
[Diagram: several clients, the Control Node, and Compute Nodes with their DMS and Bridge components.]

Parallel Loading with PolyBase
[Diagram: Azure Storage Blob (ASB), the Control Node, and Compute Nodes with their DMS and Bridge components.]

Recommendations for data loading
- Load data with enough compute.
- Use a separate loading user with a high resource class.
- Use CTAS with PolyBase to load data from Azure Blob storage into a staging table.
- Make the staging table a heap, round-robin distributed (or hash distributed if the production table is hash distributed), with no partitions.
- Do the transformations, then load the data into the production table.
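These recommendations can be sketched in T-SQL as follows. This is an illustration under assumptions, not the presenter's script: the login LoadUser, the database MyDW, the schemas [stage] and [ext], and the external table [ext].[FactOnlineSales] are all hypothetical names, and the external table is assumed to exist already over the files in Blob storage.

```sql
-- In master: a dedicated login for loading (name and password are placeholders).
CREATE LOGIN LoadUser WITH PASSWORD = '<strong password here>';

-- In the data warehouse database: the user, its permissions, and a higher resource class.
CREATE USER LoadUser FOR LOGIN LoadUser;
GRANT CONTROL ON DATABASE::[MyDW] TO LoadUser;
EXEC sp_addrolemember 'largerc', 'LoadUser';

-- Connected as LoadUser: land the raw data in a staging table with CTAS.
-- Heap, round-robin, no partitions, as recommended for staging.
CREATE TABLE [stage].[FactOnlineSales]
WITH
(
    HEAP,
    DISTRIBUTION = ROUND_ROBIN
)
AS
SELECT *
FROM   [ext].[FactOnlineSales];   -- hypothetical PolyBase external table

-- Transform and load into the production table (transformations go in the SELECT).
INSERT INTO [build].[FactOnlineSales]
       ([OnlineSalesKey], [DateKey], [StoreKey], [ProductKey], [PromotionKey],
        [CurrencyKey], [CustomerKey], [SalesOrderNumber], [SalesOrderLineNumber],
        [SalesQuantity], [SalesAmount])
SELECT  [OnlineSalesKey], [DateKey], [StoreKey], [ProductKey], [PromotionKey],
        [CurrencyKey], [CustomerKey], [SalesOrderNumber], [SalesOrderLineNumber],
        [SalesQuantity], [SalesAmount]
FROM    [stage].[FactOnlineSales];
```

If the production table is hash distributed, the staging table can use the same hash key so the final insert does not have to move rows between distributions.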

Data preparation
- Transfer the data to Blob storage.
- Use one root folder per table.
- Split uncompressed text files bigger than 2 GB; each split can be targeted by a different reader.
- Compressed text files (e.g. gzip) cannot be split across readers.
- Multiple compressed text files can be read in parallel.
- Multiple readers will work against compressed columnar/block format files (e.g. ORC, RC).
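Once the files sit under one root folder per table, PolyBase reaches them through an external table. A sketch of the objects involved, assuming pipe-delimited text files; the credential, data source, file format, schema [ext], container, account and key are placeholders, and the column list simply mirrors the FactOnlineSales table used elsewhere in the deck.

```sql
-- One-time setup; every name, the container, the account and the key are placeholders.
-- The [ext] schema is assumed to exist already.
CREATE MASTER KEY;

CREATE DATABASE SCOPED CREDENTIAL BlobCredential
WITH IDENTITY = 'loader',                 -- arbitrary string for storage-key authentication
     SECRET   = '<storage account key>';

CREATE EXTERNAL DATA SOURCE SalesBlobStorage
WITH
(
    TYPE       = HADOOP,
    LOCATION   = 'wasbs://<container>@<account>.blob.core.windows.net',
    CREDENTIAL = BlobCredential
);

CREATE EXTERNAL FILE FORMAT PipeDelimitedText
WITH
(
    FORMAT_TYPE    = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|', USE_TYPE_DEFAULT = FALSE)
);

-- LOCATION is the table's root folder; every file under it is read.
CREATE EXTERNAL TABLE [ext].[FactOnlineSales]
(
    [OnlineSalesKey]       int          NOT NULL,
    [DateKey]              datetime     NOT NULL,
    [StoreKey]             int          NOT NULL,
    [ProductKey]           int          NOT NULL,
    [PromotionKey]         int          NOT NULL,
    [CurrencyKey]          int          NOT NULL,
    [CustomerKey]          int          NOT NULL,
    [SalesOrderNumber]     nvarchar(20) NOT NULL,
    [SalesOrderLineNumber] int          NULL,
    [SalesQuantity]        int          NOT NULL,
    [SalesAmount]          money        NOT NULL
)
WITH
(
    LOCATION    = '/FactOnlineSales/',
    DATA_SOURCE = SalesBlobStorage,
    FILE_FORMAT = PipeDelimitedText
);
```

Pointing LOCATION at a folder rather than a single file is what lets several readers work on different files at once.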

Distributions

Distributions
- A distribution is a SQL database that stores one or more distributed tables.
- Table data is split into 60 buckets spread across the compute nodes.
- Table types: hash distributed, round-robin distributed, replicated.
- Selecting the right distribution method is key to good performance.

Creating distributed tables

Round-robin distributed:

    CREATE TABLE [build].[FactOnlineSales]
    (
        [OnlineSalesKey]       int          NOT NULL
    ,   [DateKey]              datetime     NOT NULL
    ,   [StoreKey]             int          NOT NULL
    ,   [ProductKey]           int          NOT NULL
    ,   [PromotionKey]         int          NOT NULL
    ,   [CurrencyKey]          int          NOT NULL
    ,   [CustomerKey]          int          NOT NULL
    ,   [SalesOrderNumber]     nvarchar(20) NOT NULL
    ,   [SalesOrderLineNumber] int          NULL
    ,   [SalesQuantity]        int          NOT NULL
    ,   [SalesAmount]          money        NOT NULL
    )
    WITH
    (
        CLUSTERED COLUMNSTORE INDEX
    ,   DISTRIBUTION = ROUND_ROBIN
    );

Hash distributed (the same table definition, shown as an alternative; only the distribution option changes):

    CREATE TABLE [build].[FactOnlineSales]
    (
        [OnlineSalesKey]       int          NOT NULL
    ,   [DateKey]              datetime     NOT NULL
    ,   [StoreKey]             int          NOT NULL
    ,   [ProductKey]           int          NOT NULL
    ,   [PromotionKey]         int          NOT NULL
    ,   [CurrencyKey]          int          NOT NULL
    ,   [CustomerKey]          int          NOT NULL
    ,   [SalesOrderNumber]     nvarchar(20) NOT NULL
    ,   [SalesOrderLineNumber] int          NULL
    ,   [SalesQuantity]        int          NOT NULL
    ,   [SalesAmount]          money        NOT NULL
    )
    WITH
    (
        CLUSTERED COLUMNSTORE INDEX
    ,   DISTRIBUTION = HASH([ProductKey])
    );

Round Robin Distribution
[Diagram: the 60 distributions, numbered 01 to 60.]

HASH Distribution
[Diagram: key values (01, 03, 01, 02, … N) pass through HASH( ) and are assigned to the 60 distributions, numbered 01 to 60.]

Hash distribution key guidance
- The distribution key is not updateable, so use a column that has static values.
- The column should not contain NULL values.
- It should have a large number of distinct values.
- Its values should be evenly distributed.
- Prefer a column used frequently in joins and GROUP BY.
- Avoid columns used in the WHERE clause.
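Whether a chosen key spreads rows evenly can be checked after loading, and because the key cannot be updated, changing it means rebuilding the table. A sketch of both steps; [CustomerKey] as the new key and the _new/_old table names are hypothetical choices for illustration.

```sql
-- Rows and space used per distribution for one table (one result row per distribution).
-- Large differences in row count between distributions indicate skew.
DBCC PDW_SHOWSPACEUSED('build.FactOnlineSales');

-- Changing the distribution key: recreate the table with CTAS on the new key...
CREATE TABLE [build].[FactOnlineSales_new]
WITH
(
    CLUSTERED COLUMNSTORE INDEX,
    DISTRIBUTION = HASH([CustomerKey])    -- hypothetical replacement key
)
AS
SELECT *
FROM   [build].[FactOnlineSales];

-- ...then swap the names.
RENAME OBJECT [build].[FactOnlineSales]     TO [FactOnlineSales_old];
RENAME OBJECT [build].[FactOnlineSales_new] TO [FactOnlineSales];
```

The old table can be dropped once the new one has been verified.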

Replicated (vs. hash distributed)
[Diagram: nodes 00 to 06, with the SALES table and a copy of the PRODUCT table on every node.]

Recommendations
Replicated:
- Small dimension tables in a star schema, with less than 2 GB of storage after compression (~5x compression).
- Warm up replicated tables after create, scale, and pause/resume.
Round-Robin:
- Temporary/staging tables.
- Tables with no obvious joining key or good candidate column.
Hash:
- Fact tables.
- Large dimension tables.
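A replicated dimension and its warm-up might look like the sketch below. The table [build].[DimProduct] and its columns are hypothetical; the warm-up relies on the first query against a replicated table triggering the copy to each compute node.

```sql
-- A small dimension table, replicated in full to every compute node.
-- Table and column names are hypothetical.
CREATE TABLE [build].[DimProduct]
(
    [ProductKey]  int           NOT NULL,
    [ProductName] nvarchar(100) NOT NULL
)
WITH
(
    CLUSTERED INDEX ([ProductKey]),
    DISTRIBUTION = REPLICATE
);

-- Warm-up: run a cheap query after create, scale, or pause/resume
-- so the copy is built before user queries hit the table.
SELECT TOP 1 *
FROM   [build].[DimProduct];
```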

Statistics

Key points
- Statistics are created manually (this should be fixed soon).
- Statistics are updated manually.
- They can make a huge difference to performance.
- Create them on columns used in DISTINCT, JOIN (plus composite), WHERE, GROUP BY (plus composite), and ORDER BY.

Recommendations
- Create a stored procedure to identify and create missing statistics.
- Create a stored procedure to update statistics after data is loaded or changed.
- Automate it.
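The statements such stored procedures would wrap are short. A minimal sketch against the FactOnlineSales table from the earlier slide; the statistics names are hypothetical, and which columns deserve statistics depends on the actual queries.

```sql
-- Single-column statistics on a join / GROUP BY column.
CREATE STATISTICS [stat_FactOnlineSales_ProductKey]
ON [build].[FactOnlineSales] ([ProductKey]);

-- Composite statistics for columns that are joined or grouped together.
CREATE STATISTICS [stat_FactOnlineSales_StoreKey_DateKey]
ON [build].[FactOnlineSales] ([StoreKey], [DateKey]);

-- After each load or large change: refresh all statistics on the table.
UPDATE STATISTICS [build].[FactOnlineSales];

-- When were user-created statistics last updated?
SELECT  sm.name                               AS schema_name,
        tb.name                               AS table_name,
        st.name                               AS stats_name,
        STATS_DATE(st.object_id, st.stats_id) AS stats_last_updated
FROM    sys.stats   AS st
JOIN    sys.tables  AS tb ON st.object_id = tb.object_id
JOIN    sys.schemas AS sm ON tb.schema_id = sm.schema_id
WHERE   st.user_created = 1;
```

The last query is a natural starting point for automation: tables whose statistics are older than their latest load are the ones to refresh.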

Indexing

Indexing
Heap:
- Staging/temporary tables.
- Small tables with small lookups.
Clustered Index:
- Tables with up to 100 million rows.
- Large tables (more than 100 million rows) where only 1-2 columns are heavily used.
Clustered ColumnStore (CCI):
- Large tables (more than 100 million rows).

Recommendations
- Consider adding a nonclustered index on a column heavily used for filtering.
- Updates on indexed columns take memory, so use a higher resource class for them.
- Avoid trimming and creating many small compressed row groups in a CCI.
- Aim for at least 100k rows per compressed row group; the ideal is 1 million rows per row group.
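Row-group sizes can be inspected through the system views. The query below is a simplified sketch, assuming the documented views sys.pdw_nodes_column_store_row_groups, sys.pdw_nodes_tables and sys.pdw_table_mappings and a state value of 3 for compressed row groups; treat it as a starting point rather than a definitive health check.

```sql
-- Average size of compressed row groups per table, and how many fall under 100k rows.
SELECT  t.name                                    AS table_name,
        COUNT(*)                                  AS compressed_row_groups,
        AVG(rg.total_rows)                        AS avg_rows_per_row_group,
        SUM(CASE WHEN rg.total_rows < 100000
                 THEN 1 ELSE 0 END)               AS row_groups_under_100k
FROM    sys.pdw_nodes_column_store_row_groups AS rg
JOIN    sys.pdw_nodes_tables AS nt
        ON  rg.object_id       = nt.object_id
        AND rg.pdw_node_id     = nt.pdw_node_id
        AND rg.distribution_id = nt.distribution_id
JOIN    sys.pdw_table_mappings AS mp
        ON  nt.name = mp.physical_name
JOIN    sys.tables AS t
        ON  mp.object_id = t.object_id
WHERE   rg.state = 3                              -- 3 = COMPRESSED
GROUP BY t.name
ORDER BY avg_rows_per_row_group;
```

Tables at the top of the result, with averages far below 1 million rows, are the candidates for a rebuild under a higher resource class.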

Recommendations
- Slow performance can result from poor compression of your row groups; consider rebuilding or reorganizing the CCI using a higher resource class.
- Consider partitioning when you have a large fact table (more than 1 billion rows). The partition key should be based on date.
- Be careful not to over-partition, especially with a CCI. A CCI pays off when the table's row count is at least 60 distributions * N partitions * 1 million rows, that is, Count(1) from YourTable >= 60 * N * 1,000,000.
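As a worked example of the partition rule, a table with 36 monthly partitions needs at least 60 * 36 * 1,000,000 = 2.16 billion rows for every row group to reach the ideal size. The sketch below shows the maintenance statements and that arithmetic; the figure of 36 partitions and partition number 5 are hypothetical, and the statements are assumed to run as a user in a higher resource class.

```sql
-- Rebuild all row groups of the table (run as a user in a higher resource class).
ALTER INDEX ALL ON [build].[FactOnlineSales] REBUILD;

-- On a partitioned table, rebuild one partition at a time instead.
-- Only valid if the table is partitioned; partition 5 is a hypothetical example.
-- ALTER INDEX ALL ON [build].[FactOnlineSales] REBUILD PARTITION = 5;

-- A lighter-weight alternative to a full rebuild.
ALTER INDEX ALL ON [build].[FactOnlineSales] REORGANIZE;

-- Partition sizing check: actual rows versus rows needed for 36 partitions.
SELECT  COUNT_BIG(*)                      AS actual_rows,
        CAST(60 AS bigint) * 36 * 1000000 AS rows_needed_for_36_partitions
FROM    [build].[FactOnlineSales];
```

If actual_rows is well below the second figure, fewer partitions will give better-compressed row groups.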

Summary
- Service level size and resource classes are very important for performance.
- Load data with a higher resource class, more DWUs, and PolyBase.
- Carefully select your distribution type.
- Keep statistics under control.
- Check compression in the CCI's row groups; rebuild or reorganize the CCI with a higher resource class, which gives more memory.
- Do not over-partition a table with a CCI.

Links
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/cheat-sheet
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/guidance-for-loading-data
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-memory-optimizations-for-columnstore-compression
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-index#rebuilding-indexes-to-improve-segment-quality

Questions?
