Azure SQL Data Warehouse for SQL Server DBAs

Presentation transcript:

Azure SQL Data Warehouse for SQL Server DBAs. June 2018. Warner Chaves, SQL MCM / Data Platform MVP

Thanks to our sponsors And Global Gold Silver Bronze Microsoft JetBrains Rubrik Delphix Solution OMD

Bio: DBA and consultant for 11 years. Previously L3 DBA at HP in Costa Rica, now Principal Consultant at Pythian in Ottawa, Ontario. Microsoft Data Platform MVP. Twitter: @warchav. Blog: Sqlturbo.com. Email: warner@sqlturbo.com. Company: Pythian.com.

Agenda. Objective: cover Azure SQL Data Warehouse in a way that is easy for SQL Server DBAs to understand and adopt. We will go over: why Data Warehousing in the cloud; service cost and model; fundamental differences from SQL Server; loading and querying data.

Prerequisites: SQL Server experience. Basic Data Warehousing concepts.

Cloud vs Traditional Data Warehousing
Traditional: significant upfront investment; capacity is forecasted and fixed; client needs to manage the solution; static or semi-static software; client needs to complete the ecosystem.
Cloud: predictable recurring bill; dynamic capacity; solution managed by the provider; software in continuous improvement; tightly integrated with the rest of the cloud services.

So what is Azure SQL DW? A Microsoft Azure service. Successor to the on-premises appliance known as APS/PDW. Targeted at running multi-TB Data Warehousing workloads. It's a PaaS service, i.e. DWaaS (comparable to AWS Redshift and Google BigQuery). It's an MPP (Massively Parallel Processing) system. Compute and storage are distributed and independent.

SMP vs MPP: Symmetric Multiprocessing vs Massively Parallel Processing

Azure SQL DW – Gen1 (architecture diagram). Components shown: client connections, Control Node, Data Movement Service, Compute Nodes, Distributions, Azure Premium Storage.

Azure SQL DW – Compute Optimized (Gen2) (architecture diagram). Components shown: Control Node, Data Movement Service, Compute Nodes, NVMe cache, Distributions, Azure Premium Storage for files and columnstore segments.

Service Model: Compute and storage are scaled and billed separately. Compute is measured in Data Warehouse Units (DWUs). The DWUs control the capacity of the Compute Nodes. Storage is billed in 1TB increments. The service allows you to pause compute and stop being charged for it.
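As a rough illustration of scaling compute independently of storage, the sketch below changes the DWU level with T-SQL and then reads the setting back. The database name `MyDW` and the target `DW400` are placeholders chosen for this example, not values from the talk. Pausing and resuming compute is done from the portal, PowerShell or the REST API rather than from T-SQL.

```sql
-- Run while connected to the master database of the logical server.
-- MyDW and DW400 are placeholder values for this sketch.
ALTER DATABASE MyDW
MODIFY (SERVICE_OBJECTIVE = 'DW400');

-- Check the current service objective of each database on the server.
SELECT db.name AS database_name,
       ds.edition,
       ds.service_objective
FROM sys.database_service_objectives AS ds
JOIN sys.databases AS db
    ON ds.database_id = db.database_id;
```

The scale operation runs asynchronously, so the second query may keep showing the old value for a while.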

Backup and Recovery: The service keeps backups for 7 days. A snapshot is taken every 4 to 8 hours. In case of DR, you can do a geo-restore to a 'paired datacenter' from the daily backup. If you need to retain a copy for more than 7 days, the current option is to do a restore and then pause compute so you're only paying for storage (we're hoping for improvements in this regard).

How is the engine different from SQL Server?

Distribution Method: The most important concept for good performance in Azure SQL DW. It determines how ASDW distributes the records into different buckets (distributions). There are three methods: HASH distribution, Round-Robin distribution, and Replicated.

Hash Distribution: Rows with the same value in the distribution column end up in the same bucket. If the distribution column is used in joins or in a GROUP BY, no data movement is necessary. If a particular value is dominant in the table, one distribution can be overloaded compared to the others, which lowers system performance.

Overloaded distribution
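A minimal sketch of a hash-distributed table and a skew check might look like the following. The table name `FactTransactionHistory` and its `Quantity` column come from the query-label slide; the other columns and the choice of `ProductKey` as the distribution column are assumptions made for illustration.

```sql
-- Hypothetical fact table, hash-distributed on ProductKey.
CREATE TABLE dbo.FactTransactionHistory
(
    TransactionID BIGINT NOT NULL,
    ProductKey    INT    NOT NULL,
    OrderDateKey  INT    NOT NULL,
    Quantity      INT    NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(ProductKey),
    CLUSTERED COLUMNSTORE INDEX
);

-- Returns one row per distribution with its row count and space used.
-- A distribution with far more rows than the others is overloaded.
DBCC PDW_SHOWSPACEUSED('dbo.FactTransactionHistory');
```

If one value of the chosen column dominates the table, a different distribution column or round-robin is usually worth testing.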

Round Robin Distribution: ASDW simply does a round-robin over the records and puts each record in the next bucket. The values in the record don't matter when assigning a bucket. Data movement is required for most operations. If a table doesn't have a good HASH column and is too big to be a replicated table, this can be the best option. Even if a value is skewed, the distribution will still be uniform.
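For comparison, a round-robin table only needs a different `DISTRIBUTION` option. The staging table below is hypothetical, and storing it as a heap is an assumption that suits a load target rather than something the slides prescribe.

```sql
-- Hypothetical staging table: rows are spread evenly regardless of their values.
CREATE TABLE dbo.StageTransactions
(
    TransactionID BIGINT NOT NULL,
    ProductKey    INT    NOT NULL,
    OrderDateKey  INT    NOT NULL,
    Quantity      INT    NOT NULL
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    HEAP
);
```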

Replicated Distribution: The table is copied to each compute node. Recommended for tables smaller than 2GB, for smaller tables that are usually part of join predicates, and for simple predicates like equality or inequality. The storage used is table size × number of compute nodes, so don't abuse it.
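A small dimension is the typical candidate for replication. The sketch below assumes a hypothetical `DimProduct` table; the column list and the clustered index are illustrative only.

```sql
-- Hypothetical small dimension: a full copy is kept on every compute node,
-- so joins against it do not need data movement.
CREATE TABLE dbo.DimProduct
(
    ProductKey  INT           NOT NULL,
    ProductName NVARCHAR(100) NOT NULL
)
WITH
(
    DISTRIBUTION = REPLICATE,
    CLUSTERED INDEX (ProductKey)
);
```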

T-SQL Differences: ASDW encourages the use of the CTAS (CREATE TABLE AS SELECT) construct, which is fully parallel and minimally logged. Joins on UPDATE and DELETE are not supported (there are workarounds). MERGE is not supported (for now at least). Some of the complex data types are not present (geography, geometry, hierarchyid, xml). Full list here: https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-migrate-code
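One common workaround for an UPDATE that needs a join is to rebuild the table with CTAS and swap the names. The sketch below is one possible shape of that pattern, not necessarily the one shown in the demo; `QuantityCorrection` and all column names other than `Quantity` are hypothetical.

```sql
-- Build a new copy of the table with the joined-in change applied.
CREATE TABLE dbo.FactTransactionHistory_New
WITH
(
    DISTRIBUTION = HASH(ProductKey),
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT f.TransactionID,
       f.ProductKey,
       f.OrderDateKey,
       COALESCE(c.CorrectedQuantity, f.Quantity) AS Quantity
FROM dbo.FactTransactionHistory AS f
LEFT JOIN dbo.QuantityCorrection AS c   -- hypothetical table of corrections
    ON c.TransactionID = f.TransactionID;

-- Swap the new table in and drop the old one.
RENAME OBJECT dbo.FactTransactionHistory TO FactTransactionHistory_Old;
RENAME OBJECT dbo.FactTransactionHistory_New TO FactTransactionHistory;
DROP TABLE dbo.FactTransactionHistory_Old;
```

Because CTAS writes a whole new table, this approach fits large batch changes better than small, frequent updates.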

DEMO: Portal and Metadata

Data Warehouse Design: A good service to consider if your DW is at 1TB+ and growing. The default table type is Clustered Columnstore. The ideal columnstore segment is 1 million records (same as SQL Server). ASDW uses 60 distributions. Fact tables: Columnstore (optionally with partitioning) with HASH distribution (if possible). Dimension tables: B-Tree or Columnstore (if it's a large dimension), with HASH, Replicated or Round-Robin distribution.

The thing about Partitioning (daily): 365 partitions × 60 distributions = 21,900 partitions. With 1 million records as the ideal segment size, it takes 21,900,000,000 records to give each one an ideal segment.

Partitioning is usually done at the weekly or monthly level, if it is necessary at all.

The thing about Partitioning (monthly): 12 partitions × 60 distributions = 720 partitions. With 1 million records as the ideal segment size, it takes 720,000,000 records to give each one an ideal segment.
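The partition arithmetic from these slides can be reproduced directly, followed by a sketch of a monthly partitioned fact table. The table name, columns and boundary values are assumptions for illustration; only the 60 distributions and the 1 million row target come from the slides.

```sql
-- Every partition is split across 60 distributions, and each resulting
-- piece should hold about 1 million rows to fill a columnstore segment.
SELECT CAST(365 AS BIGINT) * 60           AS daily_partition_count,   -- 21,900
       CAST(365 AS BIGINT) * 60 * 1000000 AS daily_rows_needed,       -- 21,900,000,000
       12 * 60                            AS monthly_partition_count, -- 720
       12 * 60 * 1000000                  AS monthly_rows_needed;     -- 720,000,000

-- Hypothetical fact table partitioned by month on an integer date key.
CREATE TABLE dbo.FactTransactionHistory_Monthly
(
    TransactionID BIGINT NOT NULL,
    ProductKey    INT    NOT NULL,
    OrderDateKey  INT    NOT NULL,
    Quantity      INT    NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(ProductKey),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION (OrderDateKey RANGE RIGHT FOR VALUES (20180201, 20180301, 20180401))
);
```

The `CAST` to `BIGINT` is needed because the daily row count does not fit in a 32-bit `INT`.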

Data Loading: There are two ways of loading data. Control Node methods (SSIS, BCP) load through the Control Node; for large data loads the Control Node can become a bottleneck. PolyBase loads from Blob Storage or Azure Data Lake with a parallel, multi-threaded load that does not go through the Control Node.
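A PolyBase load from Blob Storage generally follows the shape sketched below: define the external source and file format, expose the files as an external table, then use CTAS to pull the data in. Every object name, the container path and the CSV layout are placeholders; the angle-bracket values must be replaced with real ones.

```sql
-- Credential for the storage account (placeholder secret).
CREATE MASTER KEY;

CREATE DATABASE SCOPED CREDENTIAL BlobCredential
WITH IDENTITY = 'user', SECRET = '<storage-account-key>';

-- Where the files live.
CREATE EXTERNAL DATA SOURCE SalesBlob
WITH
(
    TYPE = HADOOP,
    LOCATION = 'wasbs://<container>@<account>.blob.core.windows.net',
    CREDENTIAL = BlobCredential
);

-- How the files are laid out (assumed here: comma-delimited text).
CREATE EXTERNAL FILE FORMAT CsvFormat
WITH
(
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '"')
);

-- External table over the files; no data is copied yet.
CREATE EXTERNAL TABLE dbo.Transactions_Ext
(
    TransactionID BIGINT NOT NULL,
    ProductKey    INT    NOT NULL,
    OrderDateKey  INT    NOT NULL,
    Quantity      INT    NOT NULL
)
WITH
(
    LOCATION = '/transactions/',
    DATA_SOURCE = SalesBlob,
    FILE_FORMAT = CsvFormat
);

-- Parallel load into a staging table, bypassing the Control Node.
CREATE TABLE dbo.StageTransactions_Load
WITH (DISTRIBUTION = ROUND_ROBIN, HEAP)
AS
SELECT * FROM dbo.Transactions_Ext;
```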

DEMO: Loading data with PolyBase

Querying Data: Azure SQL DW has some differences in terms of query execution. There are concurrency limits depending on the DWUs. There are transaction size limits per distribution, also based on DWUs. Each user is assigned a resource class that determines how much compute they get. Some DMVs keep historical information. The use of query labels is recommended for troubleshooting and monitoring.

Concurrency Limits
DWUs:               100  200  300  400  500  600  1000+
Concurrent queries: 4    8    12   16   20   24   32 (Gen1), 128 (Gen2)
Query execution is queued if necessary.

Resource Classes
Class:   SMALL (default)  MEDIUM         LARGE          X-LARGE
Memory:  100MB            Up to 3200MB   Up to 6400MB   Up to 12800MB
Memory assigned is per distribution. The classes can also be static.

Query Label: SELECT SUM(Quantity) FROM FactTransactionHistory OPTION (LABEL = 'QuantitySum');
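Resource classes are implemented as database roles, so moving a user to a bigger class is a role membership change. In the sketch below `LoadUser` is a hypothetical database user; `largerc` is one of the built-in dynamic classes.

```sql
-- Give the hypothetical user LoadUser more memory per query.
EXEC sp_addrolemember 'largerc', 'LoadUser';

-- Return the user to the default (small) class.
EXEC sp_droprolemember 'largerc', 'LoadUser';
```

A larger class gives each query more memory but also reduces how many queries can run at once, so it is usually reserved for load and maintenance accounts.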
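Once a query carries a label, it can be located in the request DMV by that label. The sketch below assumes the `QuantitySum` label from the slide and lists the matching requests, most recent first.

```sql
-- Find requests submitted with the label from the slide.
SELECT request_id,
       [status],
       submit_time,
       total_elapsed_time,
       [label],
       command
FROM sys.dm_pdw_exec_requests
WHERE [label] = 'QuantitySum'
ORDER BY submit_time DESC;
```

The `request_id` returned here can then be used to filter `sys.dm_pdw_request_steps` and see the individual steps of the distributed plan.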

DEMO: Querying Data

Questions?

Thanks!!