Azure SQL Data Warehouse for SQL Server DBAs

Presentation transcript:

Azure SQL Data Warehouse for SQL Server DBAs
August 2017
Warner Chaves, SQL MCM / Data Platform MVP

Bio
- DBA and consultant for 11 years
- Previously L3 DBA at HP in Costa Rica, now Principal Consultant at Pythian in Ottawa, Ontario
- Microsoft Data Platform MVP
- Twitter: @warchav
- Blog: Sqlturbo.com
- Email: warner@sqlturbo.com
- Company: Pythian.com


Agenda
Objective: cover Azure SQL Data Warehouse in a way that is easy for SQL Server DBAs to understand and adopt.
We will go over:
- Why Data Warehousing in the cloud?
- Service cost and model
- Fundamental differences with SQL Server
- Loading and querying data

Pre-requisites
- SQL Server experience
- Basic Data Warehousing concepts

Cloud vs Traditional Data Warehousing
Traditional:
- Significant upfront investment
- Capacity is forecasted and fixed
- Client needs to manage the solution
- Static or semi-static software
- Client needs to complete the ecosystem
Cloud:
- Predictable recurring bill
- Dynamic capacity
- Solution managed by the provider
- Software in continuous improvement
- Tightly integrated with the rest of the cloud services

So what is Azure SQL DW?
- A Microsoft Azure service
- Successor to the on-premises appliance known as APS/PDW
- Targeted at running multi-TB Data Warehousing workloads
- It's a PaaS service: DWaaS (comparable to AWS Redshift and Google BigQuery)
- It's an MPP (Massively Parallel Processing) system:
  - Compute is distributed
  - Storage is distributed

SMP vs MPP
- SMP: Symmetric Multiprocessing
- MPP: Massively Parallel Processing

Azure SQL DW
[Architecture diagram. Labels: Client Connection, Control Node, Compute Nodes, Data Movement Service, Distributions]

Service Model
- Compute and Storage are scaled and billed separately
- Compute is measured in Data Warehousing Units (DWUs)
- The DWUs control the capacity of the Compute Nodes
- Storage is billed in 1TB increments
- The service allows you to PAUSE compute and stop getting charged for it

Backup and Recovery
- The service keeps backups for 7 days
- A snapshot is made every 4 to 8 hours
- In case of DR, you can do a geo-restore to a ‘paired datacenter’ using the daily backup
- If you need to retain a copy for more than 7 days, right now the option is to do a restore and then pause compute so you’re only paying for storage (we’re hoping for improvements in this regard…)

How is the engine different from SQL Server?

Distribution Method
- The most important concept for good performance in Azure SQL DW.
- It determines the way ASDW will distribute the records into different buckets.
- There are three methods:
  - HASH distribution
  - Round-Robin distribution
  - Replicated

Hash Distribution
- Rows with the same value in the distribution column end up in the same bucket.
- If the distribution column is used in joins or in a GROUP BY, then no data movement is necessary.
- If a particular value is dominant in the table, then one distribution can be overloaded compared to the others, lowering overall system performance.
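The skew effect described above can be illustrated with a small Python simulation. This is only a sketch: the `distribution_for` helper, the use of `zlib.crc32` as the hash, and the sample customer keys are assumptions made for illustration and are not the engine's actual hash function. The bucket count of 60 is the number of distributions ASDW uses.

```python
import zlib
from collections import Counter

NUM_DISTRIBUTIONS = 60  # ASDW spreads every table over 60 distributions


def distribution_for(key: str) -> int:
    """Map a distribution-column value to a bucket (illustrative hash, not the engine's)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_DISTRIBUTIONS


def rows_per_distribution(keys):
    """Count how many rows land in each bucket."""
    return Counter(distribution_for(k) for k in keys)


# Evenly spread keys: 6,000 distinct (hypothetical) customer ids, one row each.
even_keys = [f"customer-{i}" for i in range(6000)]
# Skewed keys: the same rows plus 3,000 extra rows that share one dominant value.
skewed_keys = even_keys + ["customer-0"] * 3000

even = rows_per_distribution(even_keys)
skewed = rows_per_distribution(skewed_keys)

print("largest bucket, even keys:  ", max(even.values()))
print("largest bucket, skewed keys:", max(skewed.values()))
```

Because every row with the dominant value hashes to the same bucket, that one distribution holds at least 3,001 rows in the skewed case, while the others stay close to the average.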

Overloaded distribution

Round Robin Distribution
- ASDW simply does a Round-Robin over the records and puts each record in a different bucket.
- The values in the record don’t matter when assigning a bucket.
- Data movement is required for most operations.
- If a table doesn’t have a good HASH column and is too big to be a replicated table, then this can be the best option.
- If a value is skewed, the distribution will still be uniform.
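A minimal Python sketch of the round-robin behaviour follows. It is a simulation under stated assumptions, not the service's implementation: the `round_robin_assign` helper and the sample rows are hypothetical, and only the count of 60 distributions comes from the service.

```python
from collections import Counter
from itertools import cycle

NUM_DISTRIBUTIONS = 60  # ASDW spreads every table over 60 distributions


def round_robin_assign(rows):
    """Pair each row with the next bucket in turn, ignoring the row's values."""
    return list(zip(cycle(range(NUM_DISTRIBUTIONS)), rows))


# 6,000 hypothetical rows; half of them share one dominant value.
rows = ["customer-0"] * 3000 + [f"customer-{i}" for i in range(1, 3001)]

assignment = round_robin_assign(rows)
counts = Counter(bucket for bucket, _ in assignment)
buckets_with_dominant = {bucket for bucket, row in assignment if row == "customer-0"}

print("rows per bucket:", set(counts.values()))
print("buckets holding the dominant value:", len(buckets_with_dominant))
```

Every bucket receives the same number of rows regardless of skew, but rows with the same value are scattered across all buckets, which is why joins and aggregations on that value need data movement.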

Replicated Distribution
- The table is copied to each compute node.
- Recommended for tables smaller than 2GB.
- Best for smaller tables that are usually part of join predicates.
- Best for simple predicates like equality or inequality.
- The storage used is the table size × the number of compute nodes, so don’t abuse it.

T-SQL Differences
- ASDW encourages the use of the CTAS (CREATE TABLE AS SELECT) construct:
  - Fully parallel
  - Logging is minimized
- Joins on UPDATE and DELETE are not supported (there are workarounds)
- MERGE is not supported (for now at least)
- Statistics creation and updates are not automatic (for now…)
- Some of the complex data types are not present (geography, geometry, hierarchyid, xml)
- Full list here: https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-migrate-code

DEMO: Portal and Metadata

Data Warehouse Design
- A good service to consider if your DW is at 1TB+ and growing.
- The default table type is Clustered Columnstore.
- The ideal columnstore segment is 1 million records (same as SQL Server).
- ASDW uses 60 Distributions.
- Fact Tables: Columnstores (optionally with Partitioning) with HASH distribution (if possible).
- Dimension Tables: B-Tree or Columnstore (if it’s a large dimension), with HASH, Replicated or Round-Robin distribution.

The thing about Partitioning: Daily
- 365 partitions × 60 distributions = 21,900 partitions
- 1 million records is the ideal segment size
- 21,900 × 1,000,000 = 21,900,000,000 records needed to fill every segment

Partitioning usually at the weekly or monthly level if necessary

The thing about Partitioning: Monthly
- 12 partitions × 60 distributions = 720 partitions
- 1 million records is the ideal segment size
- 720 × 1,000,000 = 720,000,000 records needed to fill every segment
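The partitioning arithmetic can be checked with a few lines of Python. This is just a worked calculation: the function name is hypothetical, and the weekly figure of 52 partitions per year is an added assumption. The 60 distributions and the 1 million record segment size are the values quoted on the slides.

```python
NUM_DISTRIBUTIONS = 60          # ASDW spreads every table over 60 distributions
IDEAL_SEGMENT_ROWS = 1_000_000  # ideal columnstore segment size


def rows_for_full_segments(partitions_per_year: int) -> int:
    """Rows needed per year so each partition in each distribution fills one ideal segment."""
    return partitions_per_year * NUM_DISTRIBUTIONS * IDEAL_SEGMENT_ROWS


for label, partitions in [("daily", 365), ("weekly", 52), ("monthly", 12)]:
    print(f"{label:8s} {partitions:4d} partitions -> {rows_for_full_segments(partitions):,} rows")
```

A table with far fewer rows than these totals ends up with undersized segments, which is the reason to partition at a coarser grain.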

Data Loading
Two ways of loading data:
- Through the Control Node
- PolyBase
Control Node methods:
- SSIS
- BCP
- For large data loads the Control Node can become a bottleneck
PolyBase:
- Loads from Blob Storage or Azure Data Lake
- Parallel multi-threaded load that does not go through the Control Node

DEMO: Loading data with PolyBase

Querying Data
- Azure SQL DW has some differences in terms of query execution.
- There are concurrency limits depending on the DWUs.
- There are transaction size limits per Distribution, also based on DWUs.
- Each user gets assigned a resource class that determines how much compute they get.
- Some DMVs keep historical information.
- The use of Query Labels is recommended for troubleshooting and monitoring.

Concurrency Limits
DWUs                 100  200  300  400  500  600  1000+
Concurrent Queries     4    8   12   16   20   24     32
Query execution is queued if necessary.
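The queuing behaviour can be sketched in Python using the limits from the slide. This is an illustrative model only: the function names are hypothetical, the table covers just the DWU levels listed above, and the real service's scheduling is more involved than a simple cutoff.

```python
# Concurrent query limits per DWU level, as listed on the slide.
CONCURRENCY_LIMITS = {100: 4, 200: 8, 300: 12, 400: 16, 500: 20, 600: 24}
LIMIT_AT_1000_PLUS = 32


def concurrent_query_limit(dwu: int) -> int:
    """Return the concurrent query limit for a DWU level listed on the slide."""
    if dwu >= 1000:
        return LIMIT_AT_1000_PLUS
    if dwu in CONCURRENCY_LIMITS:
        return CONCURRENCY_LIMITS[dwu]
    raise ValueError(f"no limit listed for {dwu} DWUs")


def split_running_and_queued(dwu: int, submitted: int) -> tuple[int, int]:
    """Split submitted queries into (running, queued) under the DWU limit."""
    limit = concurrent_query_limit(dwu)
    running = min(submitted, limit)
    return running, submitted - running


running, queued = split_running_and_queued(400, 20)
print(f"400 DWUs, 20 queries submitted: {running} running, {queued} queued")
```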

Resource Classes
CLASS     Default  Memory
SMALL     X        100MB
MEDIUM             Up to 3200MB
LARGE              Up to 6400MB
X-LARGE            Up to 12800MB
Memory assigned is per distribution.

Query Label
SELECT SUM(Quantity)
FROM FactTransactionHistory
OPTION (LABEL = 'QuantitySum');

DEMO: Querying Data

Questions?

Thanks!!