SQL Data Warehouse: lesson learned and practical implementation tips

Presentation transcript:

SQL Data Warehouse: lesson learned and practical implementation tips
BRK3377
Joe Yong, Sr. Program Manager, SQL Data Warehouse
6/1/2018 2:52 AM
© Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Before we begin
Goals:
- Azure SQL Data Warehouse architecture
- Good and bad workloads for SQL DW
- Key lessons learned and recommended practices
Non-goals (but feel free to ask questions):
- Every detail about SQL DW
- Every possible scenario applicable to SQL DW
- Read every bullet in every slide
- Flashy demos with pretty charts, browsing PB of data with HoloLens, etc.
Prerequisites:
- Working knowledge of SQL Server and data warehouse scenarios and workloads
Thanks to SQL CAT: John Hoang & Murshed Zaman

SQL Data Warehouse: refresher and what’s new

SQL DW Architecture: DW500, changing to DW1000
[Diagram] Control node: queries engine, DMS (Data Movement Service), SQL DB.
[Diagram] 5 compute nodes, each with its own DMS and SQL DB, each hosting 12 of the 60 distribution databases: Dist_DB_1 to Dist_DB_12, Dist_DB_13 to Dist_DB_24, Dist_DB_25 to Dist_DB_36, Dist_DB_37 to Dist_DB_48, Dist_DB_49 to Dist_DB_60.
[Diagram] Premium storage holds the distribution database files (Dist_DB_1.mdf, Dist_DB_13.mdf, Dist_DB_37.mdf, Dist_DB_49.mdf, …).
© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

SQL DW Architecture: DW1000
[Diagram] Control node: queries engine, DMS (Data Movement Service), SQL DB.
[Diagram] 10 compute nodes, each with its own DMS and SQL DB, each hosting 6 of the 60 distribution databases: Dist_DB_1 to Dist_DB_6, Dist_DB_7 to Dist_DB_12, Dist_DB_13 to Dist_DB_18, Dist_DB_19 to Dist_DB_24, Dist_DB_25 to Dist_DB_30, Dist_DB_31 to Dist_DB_36, Dist_DB_37 to Dist_DB_42, Dist_DB_43 to Dist_DB_48, Dist_DB_49 to Dist_DB_54, Dist_DB_55 to Dist_DB_60.
[Diagram] Premium storage holds the distribution database files (Dist_DB_1.mdf, Dist_DB_13.mdf, Dist_DB_37.mdf, Dist_DB_55.mdf, …).
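
The two architecture diagrams suggest a fixed pool of 60 distributions that is remapped across compute nodes when the service level changes (12 per node at DW500, 6 per node at DW1000). The following is a minimal sketch of triggering such a change and inspecting the result. `MySampleDW` is a hypothetical database name, and the exact service objective string may differ depending on the generation of the service.

```sql
-- Run against the logical server's master database.
-- 'MySampleDW' is a hypothetical database name used for illustration.
ALTER DATABASE MySampleDW MODIFY (SERVICE_OBJECTIVE = 'DW1000');

-- Run against the data warehouse itself once scaling has finished:
-- count how many distributions each compute node currently hosts.
SELECT pdw_node_id,
       COUNT(*) AS distribution_count
FROM sys.pdw_distributions
GROUP BY pdw_node_id
ORDER BY pdw_node_id;
```

The counts returned by the second query should add up to 60 regardless of the service level; only the number of nodes sharing them changes.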

Why is this important
- Azure SQL Data Warehouse is based on an MPP architecture, not SMP
- The underlying engine is SQL Server, but performance, scale and concurrency behaviors are very different
- Size does matter, and not in aggregate; individual table size and rowcount are important
- Small data mart type workloads are generally poor candidates; exceptions are rare, few workarounds
- OLTP reporting type workloads are usually poor candidates; some exceptions, some viable workarounds
- If proper schema design was important in SQL Server, it is critical in SQL DW (or any MPP DW)

SQL DW targeted Workloads

SQL DW Targeted Workloads
- SQL DW is designed for DW and not OLTP; all traditional DW workload characteristics apply
- Not good for singleton-DML-heavy operations, for example clients issuing mostly singleton update, insert, delete
- Incremental data is loaded via an ETL/ELT process in batch mode; not intended for real-time ingestion
- DW workloads are typically considered tier-2 SLA (99.9%); no built-in low-latency high availability
- Complex queries operating against large datasets

Provision, scale, pause

Migration and Data Loading

Data Preparation and Metadata Migration
- Filter essential objects to migrate
- Create performant local storage to receive exported data
- Follow SQLCAT guidance on choosing the type of distributed table (link in resources slide)
- Establish standard or dedicated connectivity to the cloud
- Choose the region nearest to you that offers Azure SQL DW
- PolyBase: one folder per table in the storage container

Data Migration Recommendations
- Use the Migration Tool: convert DDL, generate a T-SQL compatibility report, migrate data
- Understand the current T-SQL surface area and workarounds
- Avoid singleton DML operations (INSERT, UPDATE, DELETE)
  - Batch DML if possible
  - If unavoidable, wrap in a transaction (BEGIN TRAN…COMMIT)
  - Use a heap table or temp table for “staging” data
- Avoid large fully logged operations
  - Consider CTAS, as it is a minimally logged operation
  - Use LOJ (left outer join) as an alternative to DELETE
- Process by partition to leverage parallelism and partition switching
- Design retry logic to address service disruption
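
The CTAS and LOJ recommendations above can be sketched as follows. Instead of deleting rows in place (fully logged), a new table is created that keeps only the rows with no match in a delete list, and the tables are then swapped. All table and column names (`dbo.FactSales`, `dbo.SalesToRemove`, `dbo.StageSales`, `SaleKey`, `Amount`) are hypothetical, and this is one possible pattern rather than the presenter's exact procedure.

```sql
-- Hypothetical objects: dbo.FactSales is the target table,
-- dbo.SalesToRemove holds the keys of rows that should go away.
CREATE TABLE dbo.FactSales_keep
WITH (DISTRIBUTION = HASH(SaleKey), CLUSTERED COLUMNSTORE INDEX)
AS
SELECT f.*
FROM dbo.FactSales AS f
LEFT OUTER JOIN dbo.SalesToRemove AS d
    ON f.SaleKey = d.SaleKey
WHERE d.SaleKey IS NULL;   -- keep only rows with no match in the delete list

-- Swap the new table in, then drop the old one.
RENAME OBJECT dbo.FactSales TO FactSales_old;
RENAME OBJECT dbo.FactSales_keep TO FactSales;
DROP TABLE dbo.FactSales_old;

-- If singleton inserts cannot be avoided, group them in one transaction
-- against a heap staging table (dbo.StageSales is assumed to exist).
BEGIN TRANSACTION;
INSERT INTO dbo.StageSales (SaleKey, Amount) VALUES (1001, 25.00);
INSERT INTO dbo.StageSales (SaleKey, Amount) VALUES (1002, 40.00);
COMMIT TRANSACTION;
```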

Data Migration Recommendations: Tips
Steps:
- Data format conversion: date format, field delimiters, escaping, field order, encoding
- Compression: use Gzip, ORC, Parquet; 7-Zip utility, .NET/Java libraries
- Export: BCP for fast export; multiple files per large table, one folder per table
- Copy: AzCopy, Data Movement Library
Tips:
- Incorrect format means the migration needs to be entirely repeated
- Exploit bcp options, hints, parallelism
- Multiple compressed files, split files; parallel import, reliable transfer
- Don’t put multiple files in the same gzipped file
- Efficient copy: parallel, async, resumable; limit concurrent copies if bandwidth is low
- Very large data transfer: ExpressRoute, Import/Export Service

Data Loading Recommendations
- PolyBase and SSIS (with the 2017 Azure Feature Pack) are the fastest methods
- Upload to blob storage via AzCopy or the PowerShell library
- Historical load: use CTAS
- Incremental load: use INSERT…SELECT
- Use the highest resource class (without sacrificing concurrency)
- Increase DWU during load, decrease when done
- PolyBase now supports UTF-16 file types; ADLS as a source and target is also supported
- Known issues:
  - Does not support extended ASCII
  - Does not support custom multi-date formats, e.g. 2000-1-6
  - No reject files or reasons for rejected rows
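
As a rough illustration of the PolyBase recommendations above, the sketch below defines an external table over a blob folder, performs a historical load with CTAS, and then an incremental load with INSERT…SELECT. Everything in angle brackets is a placeholder, and all object names (`BlobCredential`, `AzureBlobStage`, `PipeDelimitedText`, `dbo.lineitem_ext`, `LoadUser`) and the reduced column list are assumptions made for the example.

```sql
-- Optional: place a hypothetical loading user in a larger resource class.
EXEC sp_addrolemember 'largerc', 'LoadUser';

-- One-time setup: credential, data source, and file format.
CREATE MASTER KEY;

CREATE DATABASE SCOPED CREDENTIAL BlobCredential
WITH IDENTITY = 'user', SECRET = '<storage-account-key>';

CREATE EXTERNAL DATA SOURCE AzureBlobStage
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://<container>@<account>.blob.core.windows.net',
    CREDENTIAL = BlobCredential
);

CREATE EXTERNAL FILE FORMAT PipeDelimitedText
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|')
);

-- External table over one folder per table, as the slides recommend.
CREATE EXTERNAL TABLE dbo.lineitem_ext (
    l_orderkey bigint        NOT NULL,
    l_quantity decimal(15,2) NOT NULL,
    l_shipdate date          NOT NULL
)
WITH (
    LOCATION = '/lineitem/',
    DATA_SOURCE = AzureBlobStage,
    FILE_FORMAT = PipeDelimitedText
);

-- Historical load: CTAS straight from the external table.
CREATE TABLE dbo.lineitem
WITH (DISTRIBUTION = HASH(l_orderkey), CLUSTERED COLUMNSTORE INDEX)
AS
SELECT l_orderkey, l_quantity, l_shipdate
FROM dbo.lineitem_ext;

-- Incremental load: INSERT...SELECT of newly arrived rows only.
-- The date filter is a placeholder for whatever marks "new" data.
INSERT INTO dbo.lineitem (l_orderkey, l_quantity, l_shipdate)
SELECT l_orderkey, l_quantity, l_shipdate
FROM dbo.lineitem_ext
WHERE l_shipdate >= '2018-06-01';
```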

Data Loading Options
Options: PolyBase, SSIS*, ADF, BCP, SQLBulkCopy API, Attunity CloudBeam, ASA/Storm**
Performance by method (PolyBase, SSIS, ADF, BCP, SQL BulkCopy):
- Rate: from fastest to slowest
- Rate increases with higher DWU: Yes, Yes*, No
- Rate increases with more concurrent loads
* With SSIS Azure Feature Pack June 2017 or newer
** Not a good idea

PolyBase characteristics
- A single PolyBase load provides the best performance for non-compressed files
  - Load performance scales as you increase the service level objective (SLO)
  - The load is parallelized automatically; no need to manually break the input data into multiple files and issue concurrent loads
  - Each reader slices a 512 MB block from the data files
  - Max throughput depends on the number of readers available at the DWU level
- Multiple readers will not work against a compressed text file (gzip)
  - Only a single reader is used per compressed file, since uncompressing the file in the buffer is single-threaded
  - Alternatively, generate multiple compressed files
  - The number of files should be greater than or equal to the total number of readers for your SLO
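
For gzip-compressed input, the file format has to declare the codec. A minimal sketch follows; `GzipPipeDelimitedText` is a hypothetical name and the pipe delimiter is an assumption. An external table using this format would point its LOCATION at a folder containing many `.gz` files, so that each file can be assigned its own reader.

```sql
-- Delimited text compressed with gzip; one reader is used per file,
-- so the folder should contain many files rather than one large archive.
CREATE EXTERNAL FILE FORMAT GzipPipeDelimitedText
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|'),
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.GzipCodec'
);
```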

Single Gated Client
[Diagram] Control node, compute nodes, DMS, bridge.

Single Gated Client Parallelised
[Diagram] Multiple clients, control node, compute nodes, DMS, bridge.

Parallel Loading with PolyBase
[Diagram] Azure Storage Blob, control node, compute nodes, DMS, bridge.

Data loading with PolyBase

Schema design

Table Distribution Options
Hash:
- Data divided across nodes based on a hashing algorithm
- Same value will always hash to the same distribution
- Single column only
- Check for data skew, NULLs, -1
Round Robin (default):
- Data distributed evenly across nodes
- Easy place to start; don’t need to know anything about the data
- Simplicity at a cost
- Will incur more data movement at query time
Replicated (Public Preview):
- Data repeated on every node
- Simplifies many query plans and reduces data movement
- Best when joining with a hash table
- Consumes more space
- Joining two replicated tables runs on one node
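
The three distribution options map directly onto the `DISTRIBUTION` clause of `CREATE TABLE`. A minimal sketch, with hypothetical table and column names:

```sql
-- Hash: rows with the same OrderKey always land in the same distribution.
CREATE TABLE dbo.FactOrders (
    OrderKey    bigint        NOT NULL,
    CustomerKey int           NOT NULL,
    Amount      decimal(18,2) NOT NULL
)
WITH (DISTRIBUTION = HASH(OrderKey), CLUSTERED COLUMNSTORE INDEX);

-- Round robin: rows are spread evenly with no regard to their values.
CREATE TABLE dbo.StageOrders (
    OrderKey    bigint        NOT NULL,
    CustomerKey int           NOT NULL,
    Amount      decimal(18,2) NOT NULL
)
WITH (DISTRIBUTION = ROUND_ROBIN, HEAP);

-- Replicated: a full copy of the table is made available on every compute node.
CREATE TABLE dbo.DimRegion (
    RegionKey  int          NOT NULL,
    RegionName nvarchar(50) NOT NULL
)
WITH (DISTRIBUTION = REPLICATE, CLUSTERED INDEX (RegionKey));
```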

Selecting a Distribution Method
- For large fact tables, the best option is to hash distribute
  - Distribute on a column that is joined to other fact tables
  - Primary or surrogate key
- However, be mindful of the following:
  - Hash column should have highly distinct values (minimum >60 distinct values)
  - Avoid distributing on a date column
  - Avoid distributing on a column with a high frequency of NULLs and default values (e.g. -1)
  - Distribution column is NOT updatable
  - For compatible joins, use the same data types for the two distributed tables
- If there are no distribution columns that make sense, then use Round Robin as a last resort
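
Before committing to a hash column, the checks listed above can be run against staged data. A minimal sketch, assuming a hypothetical staging table `dbo.StageOrders` with a candidate column `CustomerKey` and `-1` as the default value:

```sql
-- Profile a candidate distribution column: distinct values, NULLs, defaults.
SELECT COUNT(DISTINCT CustomerKey) AS distinct_values,
       SUM(CASE WHEN CustomerKey IS NULL THEN CAST(1 AS bigint) ELSE 0 END) AS null_rows,
       SUM(CASE WHEN CustomerKey = -1 THEN CAST(1 AS bigint) ELSE 0 END) AS default_rows,
       COUNT_BIG(*) AS total_rows
FROM dbo.StageOrders;

-- Look for a few values that dominate the table; these would cause skew.
SELECT TOP 10
       CustomerKey,
       COUNT_BIG(*) AS row_count
FROM dbo.StageOrders
GROUP BY CustomerKey
ORDER BY row_count DESC;
```

A column with few distinct values, or one where a single value accounts for a large share of `total_rows`, is likely to be a poor distribution key.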

Dimension Table
Small dimension table (< 60M rows):
- Clustered index
- Round Robin
- Replicated tables
Large dimension table:
- Same design as fact table
- Clustered columnstore (by default) and distribute on join key

Data Movement
Data must be located on the same distribution to join.
Recommendation:
- Design to minimize data movement
- Mitigate data movement impact if unavoidable
Data movement does not occur when:
- Two distribution-compatible tables are joined
- Aggregation is distribution compatible
Data movement does occur when:
- Two distribution-incompatible tables are joined
- Round robin tables are distribution incompatible with all tables
- Aggregation by nature is distribution incompatible
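
One way to see whether a query will move data is to ask for its plan with `EXPLAIN` before running it. The sketch below joins a hash-distributed table to a round robin table, which the rules above say is distribution incompatible. Table and column names are hypothetical.

```sql
CREATE TABLE dbo.FactOrders_hash (
    OrderKey    bigint        NOT NULL,
    CustomerKey int           NOT NULL,
    Amount      decimal(18,2) NOT NULL
)
WITH (DISTRIBUTION = HASH(CustomerKey), CLUSTERED COLUMNSTORE INDEX);

CREATE TABLE dbo.Customer_rr (
    CustomerKey int          NOT NULL,
    Region      nvarchar(50) NOT NULL
)
WITH (DISTRIBUTION = ROUND_ROBIN, HEAP);

-- Returns the distributed plan as XML without executing the query.
-- Steps that move data between distributions appear as move operations.
EXPLAIN
SELECT c.Region,
       SUM(o.Amount) AS total_amount
FROM dbo.FactOrders_hash AS o
JOIN dbo.Customer_rr AS c
    ON o.CustomerKey = c.CustomerKey
GROUP BY c.Region;
```

Hash-distributing `dbo.Customer_rr` on `CustomerKey`, or replicating it, would be the kind of schema change the slide recommends to avoid the movement.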

Optimizing with Indexes
Clustered ColumnStore (SQL DW default):
- Optimal choice for large tables
- Limits scans to columns in the query
- Optimal compression
- Slower to load than heap
- Keep partitions large enough to compress (> 1 million rows)
Heap:
- Optimal choice for temporary or staging tables
- Fastest load performance
Clustered Index:
- Optimal for tables < 60M rows
- Sorting operation slows down load
Non-clustered Index:
- Use sparingly
- Optimizes single row lookups
- Will slow down load
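
A common way to combine these index types is to land data in a heap, move it into a clustered columnstore table with CTAS, and add a secondary index only where single-row lookups justify it. The following is a sketch under those assumptions; all object names are hypothetical.

```sql
-- Heap staging table: fastest to load.
CREATE TABLE dbo.StageEvents (
    EventKey  bigint NOT NULL,
    DeviceKey int    NOT NULL,
    Reading   float  NOT NULL
)
WITH (DISTRIBUTION = ROUND_ROBIN, HEAP);

-- Final table: clustered columnstore for scans and compression.
CREATE TABLE dbo.FactEvents
WITH (DISTRIBUTION = HASH(DeviceKey), CLUSTERED COLUMNSTORE INDEX)
AS
SELECT EventKey, DeviceKey, Reading
FROM dbo.StageEvents;

-- Secondary non-clustered index for single-row lookups; use sparingly,
-- since every additional index slows down loads.
CREATE INDEX ix_FactEvents_EventKey ON dbo.FactEvents (EventKey);

-- Rebuild after heavy loading to recompress the columnstore segments.
ALTER INDEX ALL ON dbo.FactEvents REBUILD;
```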

Statistics
The cost-based query optimizer needs statistics.
- Create statistics for all columns used in JOINs, GROUP BY, WHERE
- Update statistics after an incremental load
- If needed, use multi-column statistics on join and group by columns
- Default sampled stats are usually fine except for very large tables
- Auto create/update statistics currently in preview
Example:
    create statistics l_orderkey on dbo.lineitem (l_orderkey);
    select * from sys.stats where name = 'l_orderkey';
    dbcc show_statistics ("lineitem", "l_orderkey");
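
The remaining recommendations (multi-column statistics, full scans for very large tables, refreshing after incremental loads) might look like the sketch below. It reuses the `dbo.lineitem` table from the slide; the extra columns (`l_partkey`, `l_shipdate`) and the statistics names are assumptions.

```sql
-- Multi-column statistics for columns that are joined or grouped together.
CREATE STATISTICS stat_lineitem_order_part
ON dbo.lineitem (l_orderkey, l_partkey);

-- Full-scan statistics where the default sample is not representative.
CREATE STATISTICS stat_lineitem_shipdate
ON dbo.lineitem (l_shipdate) WITH FULLSCAN;

-- After an incremental load, refresh a single statistics object...
UPDATE STATISTICS dbo.lineitem (stat_lineitem_shipdate);

-- ...or every statistics object on the table.
UPDATE STATISTICS dbo.lineitem;
```

Updating all statistics on a very large table can take a long time, which is why the lessons-learned slides suggest managing them deliberately rather than blindly.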

Partitioning
- Partition on a date column for archiving purposes
- Improves performance through partition elimination
- Partition granularity depends on your workload
  - Reload, re-process
  - At least 1 million rows per distribution/partition
- Optimize load performance through partition switching
- Consider different partition grains if you have hot/cold data in different tables
  - Example: hot data daily, cold data monthly
- Keep the number of partitions “reasonable”, as there is overhead
- Re-index by partition when needed
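
Partition switching and per-partition re-indexing can be sketched against a table shaped like the `FactFinance` DDL example in this deck. With `RANGE RIGHT` boundaries at 2016-01-01, 2016-02-01 and 2016-03-01, January 2016 falls in partition 2. The staging table name and the external source `dbo.FactFinance_ext` are hypothetical, and the sketch assumes partition 2 of the target is empty before the switch.

```sql
-- Build a staging table with the same columns, distribution, index type,
-- and partition boundaries as the target, holding only January 2016.
CREATE TABLE dbo.FactFinance_stage
WITH (
    CLUSTERED COLUMNSTORE INDEX,
    DISTRIBUTION = HASH(FinanceKey),
    PARTITION (Date RANGE RIGHT FOR VALUES (
        N'2016-01-01T00:00:00.000',
        N'2016-02-01T00:00:00.000',
        N'2016-03-01T00:00:00.000'))
)
AS
SELECT FinanceKey, Date, OrganizationKey, DepartmentGroupKey,
       ScenarioKey, AccountKey, Amount
FROM dbo.FactFinance_ext
WHERE Date >= '2016-01-01' AND Date < '2016-02-01';

-- Metadata-only move of the loaded partition into the target table.
ALTER TABLE dbo.FactFinance_stage SWITCH PARTITION 2 TO dbo.FactFinance PARTITION 2;

-- Re-index only the partition that changed.
ALTER INDEX ALL ON dbo.FactFinance REBUILD PARTITION = 2;
```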

Row Store & Column Store

Row Store & Column Store & Partitioning

Row Store & Column Store & Partitioning

DDL Example
    CREATE TABLE FactFinance (
        FinanceKey int NOT NULL,
        Date datetime2 NOT NULL,
        OrganizationKey int NOT NULL,
        DepartmentGroupKey int NOT NULL,
        ScenarioKey int NULL,
        AccountKey int NULL,
        Amount float NOT NULL
    )
    WITH (
        CLUSTERED COLUMNSTORE INDEX,
        DISTRIBUTION = HASH(FinanceKey),
        PARTITION (Date RANGE RIGHT FOR VALUES (
            N'2016-01-01T00:00:00.000',
            N'2016-02-01T00:00:00.000',
            N'2016-03-01T00:00:00.000'))
    );

Querying

Common Data Movement Types
DMS Operation            Description
ShuffleMoveOperation     Redistributes data for compatible join or aggregation
PartitionMoveOperation   Data moves from compute to control node
BroadcastMoveOperation   Table needs to become replicated for join compatibility
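
These operation names show up in the request-step DMV, so a finished query can be checked for data movement after the fact. A minimal sketch: the label text and the table `dbo.FactOrders` are hypothetical.

```sql
-- Tag the query so it is easy to find afterwards.
SELECT CustomerKey,
       COUNT_BIG(*) AS order_count
FROM dbo.FactOrders
GROUP BY CustomerKey
OPTION (LABEL = 'data-movement-check');

-- List the data movement steps of the labelled request(s),
-- with the rows moved and the time spent in each step.
SELECT s.request_id,
       s.step_index,
       s.operation_type,
       s.row_count,
       s.total_elapsed_time
FROM sys.dm_pdw_request_steps AS s
JOIN sys.dm_pdw_exec_requests AS r
    ON s.request_id = r.request_id
WHERE r.[label] = 'data-movement-check'
  AND s.operation_type IN ('ShuffleMoveOperation',
                           'PartitionMoveOperation',
                           'BroadcastMoveOperation')
ORDER BY r.submit_time DESC, s.step_index;
```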

Query Performance Recommendations
- Check for skew (DBCC PDW_SHOWSPACEUSED)
- Statistics
- Use CETAS or CTAS for operations that return large results
- Denormalize tables if needed
- DSQL query plan
  - Minimize data movement operations: distribution and aggregation compatible
  - Minimize the size of data movement
  - Check for predicate pushdown; rewrite the query if needed
- Use a higher resource class for memory-intensive queries
- Load large external tables rather than querying them directly: all data is brought back, no pushdown
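
Two of these recommendations translate into short commands. The sketch below checks a table for skew and materializes a large result with CTAS instead of streaming it to the client. `dbo.FactOrders` and `dbo.OrdersByCustomer` are hypothetical names.

```sql
-- One row per distribution; large differences in the row counts
-- between distributions indicate skew.
DBCC PDW_SHOWSPACEUSED('dbo.FactOrders');

-- Keep a large result inside the warehouse rather than returning it
-- through the control node to the client.
CREATE TABLE dbo.OrdersByCustomer
WITH (DISTRIBUTION = HASH(CustomerKey), CLUSTERED COLUMNSTORE INDEX)
AS
SELECT CustomerKey,
       SUM(Amount) AS total_amount
FROM dbo.FactOrders
GROUP BY CustomerKey;
```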

SQLCAT Performance primitives (GB/hr)
Operation                    DWU400   DWU1000   DWU2000   DWU3000   DWU6000
Scan                          9,464    22,168    39,928    54,788    91,344
Load heap, non-partitioned      584     1,172     2,657     3,397     6,993
Load CCI, non-partitioned       440     1,038     2,225     3,381     6,024
Load CCI, partitioned           283       729       910     1,098     1,376
Shuffle                         410       879     1,458     1,709     2,021
CTAS copy                       958     1,874     2,814     2,831     3,083
Callouts: Scan 40 TB/hr, Load 7 TB/hr, Shuffle 410 GB/hr

Investigating queries

Lessons learned

Important lessons learned
- Optimal architecture and design depend on your workload; best practices are guides, not rules
- Design for the cloud: anticipate service disruption; retry, retry, retry!
- SELECT * without WHERE… can saturate your network
- Schema design is critical to the performance of your workload
- Skew is about data and queries
- Create and update statistics as appropriate, not blindly
  - Consider manual stats management for very large tables
- Use an appropriate hub-and-spoke architecture or caching layer to enable high-concurrency and/or low-latency workloads

Important lessons learned (continued)
- Avoid noisy, interactive clients like Power BI DirectQuery; implement a caching layer or hub/spoke architecture
- Drain transactions before pausing/scaling
- Avoid real-time data ingestion (ASA, Storm)
- Concurrency is not the root of all scalability challenges
- Verify your application is at least MPP aware, preferably optimized
- Server admin, SQL or AAD, is placed in SmallRC; not changeable
- If you use AAD groups, be careful with their resource class assignment and group membership
- Pre-populate the cache for replicated tables (e.g. SELECT TOP 1..) after a resume, DML or scaling operation
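
Two of these lessons can be scripted. The sketch below lists requests that are still active (to confirm the system has drained before a pause or scale) and then touches a replicated table so that its per-node copy is rebuilt ahead of the first user query. `dbo.DimRegion` is a hypothetical replicated table.

```sql
-- Anything returned here (other than this query itself) is still queued
-- or running and would be interrupted by a pause or scale operation.
SELECT request_id,
       status,
       submit_time,
       command
FROM sys.dm_pdw_exec_requests
WHERE status NOT IN ('Completed', 'Failed', 'Cancelled')
ORDER BY submit_time;

-- After resume, scaling, or DML against the table: a cheap query that
-- triggers the rebuild of the replicated copy on each compute node.
SELECT TOP 1 * FROM dbo.DimRegion;
```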

Relevant sessions at Ignite 2017
- Dining on data: Consume and query petabytes of data with Azure SQL Data Warehouse
- Getting peak performance from your SQL Data Warehouse column store
- Architect your big data solutions with SQL Data Warehouse and Azure Analysis Services

SQL Data Warehouse Resources
- SQL DW Free Trial: https://azure.microsoft.com/en-us/services/sql-data-warehouse/extended-trial/
- Migration Guide (Public Preview): datamigration.microsoft.com
- Azure Database Migration Service (Limited Preview); preview signup: aka.ms/migrating
- Channel 9 Video: Oracle migrations; Azure SQL Database migrations
- Best practices for Azure SQL Data Warehouse: https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-best-practices
- Azure SQL Data Warehouse loading patterns and strategies: https://blogs.msdn.microsoft.com/sqlcat/2017/05/17/azure-sql-data-warehouse-loading-patterns-and-strategies/
- Azure feature pack for SSIS: https://docs.microsoft.com/en-us/sql/integration-services/azure-feature-pack-for-integration-services-ssis
- Ask questions or help others in the community:
  - https://social.msdn.microsoft.com/forums/azure/en-US/home?forum=AzureSQLDataWarehouse
  - https://dba.stackexchange.com/questions/tagged/azure-sql-data-warehouse

Please evaluate this session
- From your PC or tablet: visit MyIgnite https://myignite.microsoft.com/evaluations
- Phone: download and use the Microsoft Ignite mobile app https://aka.ms/ignite.mobileapp
Your input is important!
