1
Azure SQL Data Warehouse for SQL Server DBAs
August 2017 Warner Chaves SQL MCM/ Data Platform MVP
2
Bio DBA and Consultant for 11 years
Previously L3 DBA at HP in Costa Rica, now Principal Consultant at Pythian in Ottawa, Ontario. Microsoft Data Platform MVP. Blog: Sqlturbo.com Company: Pythian.com
3
SQLSaturday Sponsors! Titanium & Global Partner Gold Silver Bronze
Without the generosity of these sponsors, this event would not be possible! Please, stop by the vendor booths and thank them.
4
Agenda
Objective: cover Azure SQL Data Warehouse in a way that is easy for SQL Server DBAs to understand and adopt.
We will go over:
Why data warehousing in the cloud?
Service cost and model
Fundamental differences from SQL Server
Loading and querying data
5
Pre-requisites
SQL Server experience.
Basic data warehousing concepts.
6
Cloud vs Traditional Data Warehousing
Traditional: Significant upfront investment. Capacity is forecasted and fixed. Client needs to manage the solution. Static or semi-static software. Client needs to complete the ecosystem.
Cloud: Predictable recurring bill. Dynamic capacity. Solution managed by the provider. Software in continuous improvement. Tightly integrated with the rest of the cloud services.
7
So what is Azure SQL DW?
A Microsoft Azure service.
Successor to the on-premises appliance known as APS/PDW.
Targeted at running multi-TB data warehousing workloads.
It’s a PaaS service: Data Warehouse as a Service (DWaaS), comparable to AWS Redshift and Google BigQuery.
It’s an MPP (Massively Parallel Processing) system: compute is distributed and storage is distributed.
8
SMP vs MPP
SMP: Symmetric Multiprocessing. MPP: Massively Parallel Processing.
9
Azure SQL DW
[Diagram: architecture showing the client connection, the Control Node, the Compute Nodes, the Data Movement Service, and the Distributions]
10
Service Model
Compute and storage are scaled and billed separately.
Compute is measured in Data Warehouse Units (DWUs).
DWUs control the capacity of the Compute Nodes.
Storage is billed in 1TB increments.
The service allows you to PAUSE compute and stop being charged for it.
11
Backup and Recovery
The service keeps backups for 7 days.
A snapshot is taken every 4 to 8 hours.
In case of DR, you can do a geo-restore to a ‘paired datacenter’ from the daily backup.
If you need to retain a copy for more than 7 days, right now the option is to do a restore and then pause compute so you’re only paying for storage (we’re hoping for improvements in this regard…).
12
How is the engine different from SQL Server?
13
Distribution Method
The most important concept for good performance in Azure SQL DW.
It determines how ASDW distributes the records across different buckets.
There are three methods: HASH distribution, Round-Robin distribution, and Replicated.
14
Hash Distribution
Same values end up in the same bucket.
If the distribution column is used in joins or in a GROUP BY, then no data movement is necessary.
If a particular value is dominant in the table, its distribution can become overloaded compared to the others and lower overall system performance.
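The effect of a dominant value can be illustrated with a small simulation. This is only a sketch: the hash function the service uses internally is not described in the deck, so `zlib.crc32` is a stand-in, and the key values and row counts are hypothetical.

```python
import zlib
from collections import Counter

NUM_DISTRIBUTIONS = 60  # number of distributions the deck cites for ASDW


def hash_bucket(value, buckets=NUM_DISTRIBUTIONS):
    """Map a distribution-column value to a bucket.

    crc32 is a stand-in hash for illustration, not the service's own function.
    """
    return zlib.crc32(str(value).encode("utf-8")) % buckets


def distribute(values, buckets=NUM_DISTRIBUTIONS):
    """Count how many rows land in each bucket."""
    return Counter(hash_bucket(v, buckets) for v in values)


# Hypothetical data: 6,000 distinct keys, then 6,000 extra rows for one dominant key.
even_keys = list(range(6000))
skewed_keys = even_keys + [42] * 6000

even_counts = distribute(even_keys)
skewed_counts = distribute(skewed_keys)

print("largest bucket, distinct keys:", max(even_counts.values()))
print("largest bucket, skewed keys:  ", max(skewed_counts.values()))
```

Because equal values always hash to the same bucket, every extra row for key 42 piles onto a single distribution, which is the overload case described above.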
15
Overloaded distribution
16
Round-Robin Distribution
ASDW simply does a round-robin over the records and puts each record in a different bucket.
The values in the record don’t matter when assigning a bucket.
Data movement is required for most operations.
If a table doesn’t have a good HASH column and is too big to be a replicated table, then this can be the best option.
If a value is skewed, the distribution will still be uniform.
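A minimal sketch of why skewed values do not matter under round-robin assignment. The row values and counts are hypothetical, and the service's actual assignment order is not specified in the deck; the point is only that the bucket is chosen without looking at the row.

```python
from collections import Counter
from itertools import cycle

NUM_DISTRIBUTIONS = 60  # number of distributions the deck cites for ASDW


def round_robin(rows, buckets=NUM_DISTRIBUTIONS):
    """Assign rows to buckets in turn, ignoring the row contents."""
    counts = Counter()
    for _row, bucket in zip(rows, cycle(range(buckets))):
        counts[bucket] += 1
    return counts


# Hypothetical, heavily skewed data: 90% of the rows share one value.
rows = ["dominant"] * 5400 + [f"value-{i}" for i in range(600)]
counts = round_robin(rows)

# 6,000 rows over 60 buckets: every bucket receives exactly 100 rows.
print(min(counts.values()), max(counts.values()))
```

The trade-off is that rows with the same value are scattered across buckets, which is why joins and aggregations on such a table usually need data movement.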
17
Replicated Distribution
The table is copied to each compute node.
Recommended for tables smaller than 2GB.
Suited to smaller tables that usually take part in joins.
Suited to simple predicates like equality or inequality.
Storage used is the table size × the number of compute nodes, so don’t abuse it.
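The storage cost of replication is simple arithmetic, sketched below. The function names, the example table size, and the node count are hypothetical; the 2GB threshold is the guideline quoted on the slide.

```python
REPLICATION_GUIDELINE_GB = 2.0  # size guideline quoted on the slide


def replicated_storage_gb(table_gb, compute_nodes):
    """Total storage consumed when one full copy lives on every compute node."""
    return table_gb * compute_nodes


def is_replication_candidate(table_gb, limit_gb=REPLICATION_GUIDELINE_GB):
    """True when the table is under the size guideline for replication."""
    return table_gb < limit_gb


# Hypothetical example: a 1.5 GB dimension table on 6 compute nodes.
print(replicated_storage_gb(1.5, 6))   # 9.0
print(is_replication_candidate(1.5))   # True
print(is_replication_candidate(40.0))  # False
```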
18
T-SQL Differences
ASDW encourages the use of the CTAS (CREATE TABLE AS SELECT) construct: it is fully parallel and logging is minimized.
Joins on UPDATE and DELETE are not supported (there are workarounds).
MERGE is not supported (for now at least).
Statistics creation and updates are not automatic (for now…).
Some of the complex data types are not present (geography, geometry, hierarchyid, xml).
Full list here:
19
DEMO: Portal and Metadata
20
Data Warehouse Design
A good service to consider if your DW is at 1TB+ and growing.
Default table type is Clustered Columnstore.
Ideal columnstore segment is 1 million records (same as SQL Server).
ASDW uses 60 distributions.
Fact tables: Columnstore (optionally with partitioning), with HASH distribution if possible.
Dimension tables: B-Tree, or Columnstore if it’s a large dimension; HASH, Replicated or Round-Robin distribution.
21
The thing about Partitioning: Daily
365 partitions × 60 distributions = 21,900 partitions.
The ideal segment is 1 million records, so a daily-partitioned table needs about 21.9 billion rows a year to fill them.
22
Partitioning is usually done at the weekly or monthly level, and only if necessary.
23
The thing about Partitioning: Monthly
12 partitions × 60 distributions = 720 partitions.
The ideal segment is 1 million records, so a monthly-partitioned table needs about 720 million rows a year to fill them.
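The daily and monthly figures can be checked with a few lines of arithmetic. This is a sketch using the numbers from the slides (60 distributions, 1 million rows per ideal segment); the one-billion-row yearly volume is a hypothetical example, and rows are assumed to spread evenly across partitions and distributions.

```python
DISTRIBUTIONS = 60              # distributions per table, from the slides
IDEAL_SEGMENT_ROWS = 1_000_000  # ideal columnstore segment size, from the slides


def physical_partitions(partitions):
    """Each logical partition is split across every distribution."""
    return partitions * DISTRIBUTIONS


def rows_to_fill(partitions):
    """Rows needed so every partition/distribution slice holds a full segment."""
    return physical_partitions(partitions) * IDEAL_SEGMENT_ROWS


def rows_per_slice(total_rows, partitions):
    """Average rows per slice, assuming an even spread (integer division)."""
    return total_rows // physical_partitions(partitions)


yearly_rows = 1_000_000_000  # hypothetical fact table: one billion rows a year

for name, parts in (("daily", 365), ("monthly", 12)):
    print(name, physical_partitions(parts), rows_to_fill(parts),
          rows_per_slice(yearly_rows, parts))
```

With one billion rows a year, daily partitioning leaves roughly 45 thousand rows per slice, far below the ideal segment size, while monthly partitioning leaves about 1.4 million.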
24
Data Loading
Two ways of loading data: through the Control Node, or with PolyBase.
Control Node methods: SSIS and BCP. For large data loads the Control Node can become a bottleneck.
PolyBase: loads from Blob Storage or Azure Data Lake. It is a parallel, multi-threaded load that does not go through the Control Node.
25
DEMO: Loading data with PolyBase
26
Querying Data
Azure SQL DW has some differences in terms of query execution.
There are concurrency limits depending on the DWUs.
There are transaction size limits per distribution, also based on DWUs.
Each user is assigned a resource class that determines how much compute they get.
Some DMVs keep historical information.
The use of Query Labels is recommended for troubleshooting and monitoring.
27
Concurrency Limits
DWUs:                100  200  300  400  500  600  1000+
Concurrent queries:    4    8   12   16   20   24     32
Query execution is queued if necessary.
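The queueing behaviour can be sketched as a lookup plus a simple admission step. The limits are the ones listed on the slide; the function names and the first-come, first-served admission order are assumptions for illustration, and DWU values the slide does not list are rejected rather than guessed.

```python
from collections import deque

# Concurrency limits as listed on the slide (DWUs -> concurrent queries).
CONCURRENCY_LIMITS = {100: 4, 200: 8, 300: 12, 400: 16, 500: 20, 600: 24}
LIMIT_AT_1000_PLUS = 32


def concurrency_limit(dwu):
    """Return the concurrent-query limit for a DWU level listed on the slide."""
    if dwu >= 1000:
        return LIMIT_AT_1000_PLUS
    if dwu in CONCURRENCY_LIMITS:
        return CONCURRENCY_LIMITS[dwu]
    raise ValueError(f"no limit listed for {dwu} DWUs")


def admit(query_ids, dwu):
    """Split incoming queries into running and queued (assumed FIFO order)."""
    limit = concurrency_limit(dwu)
    running = list(query_ids[:limit])
    queued = deque(query_ids[limit:])
    return running, queued


# Hypothetical workload: 10 queries arriving at a 200 DWU warehouse.
running, queued = admit(list(range(10)), 200)
print(len(running), "running;", len(queued), "queued")
```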
28
Resource Classes
Class:    SMALL   MEDIUM         LARGE          X-LARGE
Default:  X
Memory:   100MB   Up to 3200MB   Up to 6400MB   Up to 12800MB
Memory assigned is per distribution.
29
Query Label
SELECT SUM(Quantity)
FROM FactTransactionHistory
OPTION (LABEL = 'QuantitySum');
30
DEMO: Querying Data
31
Questions?
32
Thanks!!