1
Azure SQL Data Warehouse - Data Warehouse on Cloud
Luca Ferrari
2
Sponsor
3
Organizers GetLatestVersion.it
4
Agenda
- Solutions: On-Prem vs Cloud
- Introduction to Azure DWH
- SMP vs MPP
- Architecture
- Tables design
- Queries
- Scaling
- Data Load
5
Data warehousing solutions
Diagram: the on-premise options are traditional SQL Server (SMP) and the Analytics Platform System (PDW - APS), an MPP SQL Server with PolyBase access to Hadoop; the cloud option is Azure SQL Data Warehouse (MPP).
6
SMP vs MPP
7
SMP vs MPP
MPP: a divide-and-conquer approach to solving large data problems by using parallel computing.
- Data is divided and distributed across many computing resources.
- Each computing resource operates on its portion of the data in parallel.
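As a rough illustration of the divide-and-conquer idea (plain Python, not Azure SQL Data Warehouse itself), the sketch below splits a list of values across a number of workers, lets each worker sum its own portion in parallel, and then combines the partial results. The helper names `split_evenly` and `parallel_sum` and the default of four workers are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_evenly(values, workers):
    """Divide values into `workers` slices of near-equal size."""
    return [values[i::workers] for i in range(workers)]


def parallel_sum(values, workers=4):
    """Sum each slice on its own worker, then combine the partial sums."""
    slices = split_evenly(values, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, slices))
    return sum(partial_sums)


if __name__ == "__main__":
    data = list(range(1, 101))
    # The combined result matches a plain, single-worker sum.
    print(parallel_sum(data) == sum(data))
```

Each worker only ever sees its own slice, which is the property that lets an MPP system add more computing resources as the data grows.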
8
Data warehousing solutions – On Premise
Pros:
- Fast ETLs loading data (On-Prem to On-Prem only)
- No external network needed
- Flexible Backup/Restore policy
Cons:
- HW cost
- Storage cost
- Energy cost
- License cost
- Limited scalability (SQL Server SMP)
- Maintenance
- No Big Data support on the box *
* Available only when using Microsoft APS AU2 or SQL Server 2016
9
Data warehousing solutions - Cloud
Pros:
- No HW cost
- No storage cost
- No license cost
- Scalable (up and down) to match your actual needs
- Big Data support on the box
Cons:
- Slow ETLs (on-prem to cloud and vice versa)
- No on-demand backup
10
Introducing Azure SQL DWH
A relational data warehouse-as-a-service, fully managed by Microsoft.
The industry's first elastic cloud data warehouse with proven SQL Server capabilities.
Supports your smallest to your largest data storage needs, from GB to PB.
11
Architecture: Control Node
Diagram: a Control Node (SQL DB) running the Massively Parallel Processing (MPP) engine sits above four Compute Nodes (each a SQL DB), with Blob storage [WASB(S)] and HDInsight alongside.
Control Node:
- Endpoint for connections
- Regular SQL endpoint (TCP 1433)
- Persists no user data (metadata only)
- Coordinates compute activity using MPP
12
Architecture: Compute Node(s)
Diagram: a Control Node running the Massively Parallel Processing (MPP) engine sits above four Compute Nodes (each a SQL DB), with Blob storage [WASB(S)] and HDInsight alongside.
Compute Node(s):
- Azure SQL Database
- An increase of DWU will increase the number of compute nodes
13
Architecture: Blob storage [WASB(S)]
Diagram: a Control Node running the Massively Parallel Processing (MPP) engine sits above four Compute Nodes (each a SQL DB), with Blob storage [WASB(S)] and HDInsight alongside.
Blob storage [WASB(S)]:
- GRS storage
- PBs of storage
- Load data without incurring compute costs
14
Architecture
Storage and compute are decoupled, enabling a truly elastic service and separate charging for compute and storage.
Diagram: an application or user connection reaches the Control Node and its Massively Parallel Processing (MPP) engine; four Compute Nodes (each a SQL DB) sit below it; DMS runs on the Control Node and on every Compute Node; Azure infrastructure and storage (Blob storage [WASB(S)], HDInsight) sit underneath.
- DMS (Data Movement Service) executes across all database nodes.
- Data loading: SSIS, REST, OLE, ADO, ODBC, WebHDFS, AzCopy, PS.
- Compute: scale compute up or down when required (SLA <= 60 seconds). Pause, Restart, Stop, Start.
- Storage: add or load data to WASB(S) without incurring compute costs.
15
Architecture: Data Movement Service
Data Movement Service (DMS) moves data between the nodes. DMS gives the Compute nodes access to the data they need for joins and aggregations. DMS is not an Azure service: it is a Windows service that runs alongside SQL Database on all the nodes. Since DMS runs behind the scenes, you won't interact with it directly. However, when you look at query plans, you will notice they include some DMS operations, since data movement is necessary to run each query in parallel.
Data movement depends on:
- Tables design
- Table statistics
- Queries
16
Demo - 1 Create Azure SQL DW
17
Table Architecture
Data can be distributed across nodes or replicated.
Three table types:
- Hash Distributed
- Round-Robin
- Replicated
18
Hash Distributed
Rows are distributed across multiple distributions based on a hash function applied to a column.
CREATE TABLE [dbo].[FactInternetSales]
(
    [ProductKey]       int          NOT NULL,
    [OrderDateKey]     int          NOT NULL,
    [CustomerKey]      int          NOT NULL,
    [PromotionKey]     int          NOT NULL,
    [SalesOrderNumber] nvarchar(20) NOT NULL,
    [OrderQuantity]    smallint     NOT NULL,
    [UnitPrice]        money        NOT NULL,
    [SalesAmount]      money        NOT NULL
)
WITH
(
    CLUSTERED COLUMNSTORE INDEX,
    DISTRIBUTION = HASH([ProductKey])
);
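To make the placement rule concrete, here is a minimal Python simulation of hash distribution. It is a sketch under stated assumptions: the figure of 60 distributions comes from the product documentation rather than from the slides, and `zlib.crc32` is only a stand-in, since the hash function the service actually uses is internal. The helper names are hypothetical.

```python
import zlib
from collections import Counter

# Assumption: 60 distributions, as described in the product documentation.
DISTRIBUTIONS = 60


def distribution_for(key, distributions=DISTRIBUTIONS):
    """Map a distribution-column value to a distribution number.

    crc32 is a stand-in for the service's internal hash function.
    """
    return zlib.crc32(str(key).encode("utf-8")) % distributions


def rows_per_distribution(keys, distributions=DISTRIBUTIONS):
    """Count how many rows land in each distribution."""
    return Counter(distribution_for(k, distributions) for k in keys)


if __name__ == "__main__":
    # Many distinct ProductKey values spread out; one repeated value does not.
    spread = rows_per_distribution(range(10_000))
    skewed = rows_per_distribution([42] * 10_000)
    print("distributions used with distinct keys:", len(spread))
    print("distributions used with one repeated key:", len(skewed))
```

The same key always maps to the same distribution, so a column with few distinct values or one dominant value concentrates rows in a few distributions. That is the data skew examined in the second demo.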
19
Round-Robin
Data is evenly (or as evenly as possible) distributed among all the distributions without a hash function.
CREATE TABLE [dbo].[FactInternetSales]
(
    [ProductKey]       int          NOT NULL,
    [OrderDateKey]     int          NOT NULL,
    [CustomerKey]      int          NOT NULL,
    [PromotionKey]     int          NOT NULL,
    [SalesOrderNumber] nvarchar(20) NOT NULL,
    [OrderQuantity]    smallint     NOT NULL,
    [UnitPrice]        money        NOT NULL,
    [SalesAmount]      money        NOT NULL
)
WITH
(
    CLUSTERED COLUMNSTORE INDEX,
    DISTRIBUTION = ROUND_ROBIN
);
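The following Python sketch shows what "without a hash function" means in practice: rows are handed out in turn, so the content of a row has no influence on where it lands. The strictly sequential order and the default of 60 distributions are assumptions for this example; the slides do not describe the exact assignment order the service uses.

```python
from itertools import cycle


def round_robin(rows, distributions=60):
    """Assign rows to distributions in turn, ignoring their content."""
    buckets = [[] for _ in range(distributions)]
    for bucket, row in zip(cycle(buckets), rows):
        bucket.append(row)
    return buckets


if __name__ == "__main__":
    # Even identical rows are spread evenly, because values are never inspected.
    sizes = [len(b) for b in round_robin([42] * 10, distributions=4)]
    print(sizes)
```

The trade-off is that rows with equal keys are not co-located, so joins on a round-robin table generally need data movement.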
20
Replicated
Data is replicated among all the distributions. Each node owns the entire table's dataset.
CREATE TABLE DimCustomer
(
    id       int NOT NULL,
    lastName varchar(20),
    zipCode  varchar(6)
)
WITH
(
    DISTRIBUTION = REPLICATE,
    CLUSTERED INDEX (lastName)
);
21
Table Architecture and query considerations
Left Table / Right Table / Join type / Compatibility
- Replicated / Replicated / All join types / Compatible: no data movement required.
- Replicated / Distributed / Inner Join, Right Outer Join, Cross Join / Compatible: no data movement required.
- Distributed / Replicated / Inner Join, Left Outer Join, Cross Join / Compatible: no data movement required.
- Distributed / Distributed / All join types, except cross joins, can be compatible / Compatible: no data movement required if the join predicate meets the following conditions:
    - The predicate is an equality join.
    - The predicate joins two distributed columns that have matching data types.
For example, if table A is distributed on column a and table B is distributed on column b, and both a and b have matching data types, the following join is compatible:
SELECT * FROM A JOIN B ON (A.a = B.b)
SQL Server PDW analyzes the logic for some conjunctive predicates to see if they are compatible. For example, the following join is compatible:
SELECT * FROM A JOIN B ON (A.a = B.b AND A.a = B.b + 1)
Cross joins between two distributed tables are always incompatible.
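The compatibility rules can be written down as a small decision function. This Python sketch assumes the usual pairing of the rules: a replicated left table with a distributed right table is compatible for inner, right outer and cross joins, and the mirror case for inner, left outer and cross joins. The function name and the string labels are hypothetical; it encodes the rules on this slide and is not an API of the service.

```python
REPLICATED = "replicated"
DISTRIBUTED = "distributed"


def needs_data_movement(left, right, join_type, equality_on_distribution_columns=False):
    """Return True when the join requires data movement under the slide's rules.

    left, right: REPLICATED or DISTRIBUTED
    join_type: "inner", "left outer", "right outer", "full outer" or "cross"
    equality_on_distribution_columns: True when the predicate is an equality
        on the two distribution columns and their data types match
        (only consulted when both tables are distributed)
    """
    if left == REPLICATED and right == REPLICATED:
        return False
    if left == REPLICATED and right == DISTRIBUTED:
        return join_type not in {"inner", "right outer", "cross"}
    if left == DISTRIBUTED and right == REPLICATED:
        return join_type not in {"inner", "left outer", "cross"}
    # Both tables distributed: cross joins are always incompatible.
    if join_type == "cross":
        return True
    return not equality_on_distribution_columns


if __name__ == "__main__":
    # A JOIN B ON (A.a = B.b), both hash-distributed on the join columns.
    print(needs_data_movement(DISTRIBUTED, DISTRIBUTED, "inner", True))
```

A `True` result corresponds to the DMS operations that show up in the query plan.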
22
Table Architecture - Unsupported Features
- Primary key, foreign key, unique and check table constraints
- Unique indexes
- Computed columns
- Sparse columns
- User-defined types
- Sequence
- Triggers
- Indexed views
- Synonyms
23
Table Architecture - Unsupported Data Types
Data type: workaround
- geometry: varbinary
- geography: varbinary
- hierarchyid: nvarchar(4000)
- image: varbinary
- text: varchar
- ntext: nvarchar
- sql_variant: split the column into several strongly typed columns.
- table: convert to temporary tables.
- timestamp: rework code to use datetime2 and the CURRENT_TIMESTAMP function. Only constants are supported as defaults, therefore CURRENT_TIMESTAMP cannot be defined as a default constraint. If you need to migrate row version values from a timestamp typed column, use BINARY(8) or VARBINARY(8) for NOT NULL or NULL row version values.
- xml: varchar
- user-defined types: convert back to their native types where possible.
- default values: default values support literals and constants only. Non-deterministic expressions or functions, such as GETDATE() or CURRENT_TIMESTAMP, are not supported.
24
Table Store Types
Azure SQL Data Warehouse supports both:
- Row store
    - Traditional B-tree: clustered, nonclustered
    - Heap
- Columnstore
    - Only clustered columnstore indexes
Data compression
25
Table Store Types
Diagram: a columnstore row group divided into column segments (C1 to C6).
26
Table Store Types: Columnstore Provides Dramatic Performance
Updateable and clustered columnstore index (CCI):
- Stores data in columnar format
- Memory-optimized for next-generation performance
- Updateable to support bulk and/or trickle loading
- Up to 100x faster
- Up to 15x compression
- Save time and costs
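A toy Python example can show why a columnar layout helps both scans and compression. It pivots a few rows into per-column lists, so an aggregate reads a single column, and applies run-length encoding to a repetitive column. Run-length encoding is used here only as a simple stand-in: the compression a clustered columnstore index actually applies is more elaborate, and the sample rows and helper names are made up for the illustration.

```python
from itertools import groupby


def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values):
    """Compress consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


if __name__ == "__main__":
    rows = [
        {"ProductKey": 1, "OrderQuantity": 2, "Region": "EU"},
        {"ProductKey": 2, "OrderQuantity": 1, "Region": "EU"},
        {"ProductKey": 3, "OrderQuantity": 5, "Region": "EU"},
        {"ProductKey": 4, "OrderQuantity": 3, "Region": "US"},
    ]
    columns = to_columns(rows)
    # The aggregate touches only the OrderQuantity column.
    print(sum(columns["OrderQuantity"]))
    # Repeated values in a column collapse into short runs.
    print(run_length_encode(columns["Region"]))
```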
27
Table Partitioning
Full partition support:
- Merge
- Split
- Switch
Do not over-partition your data!
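A quick calculation shows why over-partitioning hurts. The sketch below relies on two figures that come from the product documentation rather than from the slides: every table is already spread over 60 distributions, and clustered columnstore tables are said to work best with roughly one million rows per distribution and partition. Both constants, and the helper names, are assumptions of this example.

```python
# Assumptions from the product documentation, not from the slides.
DISTRIBUTIONS = 60
TARGET_ROWS = 1_000_000


def rows_per_partition_slice(total_rows, partitions, distributions=DISTRIBUTIONS):
    """Average rows in one partition of one distribution."""
    return total_rows // (partitions * distributions)


def is_over_partitioned(total_rows, partitions, target=TARGET_ROWS):
    """True when each partition slice falls below the target row count."""
    return rows_per_partition_slice(total_rows, partitions) < target


if __name__ == "__main__":
    total = 6_000_000_000
    for partitions in (36, 1000):
        print(partitions, rows_per_partition_slice(total, partitions),
              is_over_partitioned(total, partitions))
```

With six billion rows, 36 partitions leave a few million rows per slice, while 1000 partitions leave only about one hundred thousand.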
28
Table Statistics
The more SQL Data Warehouse knows about your data, the faster it can execute queries against it. You tell SQL Data Warehouse about your data by collecting statistics.
- Statistics are not created automatically on the Control Node: we have to create them ourselves.
- Statistics are not updated automatically on the Control Node: we have to maintain them ourselves.
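Since statistics have to be created and refreshed by hand, a small script that generates the statements can help. The Python sketch below builds single-column `CREATE STATISTICS` statements and an `UPDATE STATISTICS` statement as plain strings. The `stats_<table>_<column>` naming scheme and the helper names are assumptions of this example, and identifiers are inserted without any escaping, so it is only suitable for trusted table and column names.

```python
def create_statistics_sql(table, columns):
    """Build one single-column CREATE STATISTICS statement per column."""
    short_name = table.split(".")[-1]
    return [
        f"CREATE STATISTICS stats_{short_name}_{col} ON {table} ({col});"
        for col in columns
    ]


def update_statistics_sql(table):
    """Build the statement that refreshes all statistics on a table."""
    return f"UPDATE STATISTICS {table};"


if __name__ == "__main__":
    # Columns that appear in joins, GROUP BY and WHERE are the usual candidates.
    for statement in create_statistics_sql(
        "dbo.FactInternetSales", ["ProductKey", "OrderDateKey"]
    ):
        print(statement)
    print(update_statistics_sql("dbo.FactInternetSales"))
```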
29
Demo - 2 Distributed Hash Round-Robin CTL/CMP Table Mapping Data Skew
30
Queries
Query execution:
- All queries point to the Control Node (Azure SQL Database).
- The Control Node (CTL) creates the MPP plan, which depends on:
    - Table design
    - Statistics
    - Joins
- The MPP plan is executed by the Compute (CMP) Nodes.
- Results are sent to the client.
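The execution flow can be mimicked in a few lines of Python for a query such as "total quantity per product". Each simulated compute node aggregates only its local rows, and the simulated control node merges the partial results before returning them. The function names and the in-memory data are assumptions of this sketch, and the loop runs the nodes one after another, whereas the real compute nodes work in parallel.

```python
from collections import Counter


def compute_node_step(rows):
    """Step run on each compute node: aggregate only the local rows."""
    local = Counter()
    for product_key, quantity in rows:
        local[product_key] += quantity
    return local


def control_node_query(distributed_rows):
    """Control node: run the step on every node, then merge the partial results."""
    merged = Counter()
    for node_rows in distributed_rows:
        merged.update(compute_node_step(node_rows))
    return dict(merged)


if __name__ == "__main__":
    # Two compute nodes, each holding (ProductKey, OrderQuantity) pairs.
    nodes = [[(1, 2), (2, 1)], [(1, 3), (3, 4)]]
    print(control_node_query(nodes))
```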
31
Queries
Almost all T-SQL commands can be used with Azure SQL DWH:
- DDL
- DML
- Security
- Monitoring
32
Queries - Monitoring
Monitoring Azure SQL DWH using:
- DMVs
    - Connection / Sessions / Requests
    - CMP configuration
    - ...
- System Views
    - Tables
    - Tables space allocation
33
Monitoring - Tools DWInsight – History Monitoring tool
34
Queries Troubleshooting
Label your queries!
SELECT * FROM sys.tables OPTION (LABEL = 'My Query Label');
35
Demo - 3 MPP Plan Monitoring Azure DWH
36
Data Warehouse Unit (DWU)
DWUs are a measure of underlying resources like CPU, memory, IOPS, which are allocated to your SQL Data Warehouse. Increasing the number of DWUs increases resources and performance
37
Data Warehouse Unit (DWU)
How fast do you want to go?
It is difficult for a customer to choose which hardware to go with and to know what the implications will be for performance.
Customers can grow compute and storage as needed, independently of each other.
38
Data Warehouse Unit (DWU)
1. Select a small number of DWUs.
2. Monitor your application performance.
3. Determine how much faster or slower performance should be for you.
4. Increase or decrease the number of DWUs.
5. Continue making adjustments until you reach an optimum performance level for your business requirements.
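The tuning procedure is essentially a loop, sketched here in Python. The list of DWU levels is the one shown on the workload management slide; `measure_seconds` is a hypothetical callback that stands for "run your workload at this level and time it", and the sketch only steps upward, whereas in practice you may also scale down.

```python
DWU_LEVELS = [100, 200, 300, 400, 500, 600, 1000, 1200, 1500, 2000, 3000, 6000]


def choose_dwu(measure_seconds, target_seconds, levels=DWU_LEVELS):
    """Start at the smallest level and step up until the workload meets the target.

    measure_seconds: hypothetical callback returning the workload duration
        (in seconds) observed at a given DWU level.
    """
    for dwu in levels:
        if measure_seconds(dwu) <= target_seconds:
            return dwu
    # No level met the target: return the largest one available.
    return levels[-1]


if __name__ == "__main__":
    # Made-up benchmark: one hour at DW100, scaling linearly with DWU.
    def fake_benchmark(dwu):
        return 3600 * 100 / dwu

    print(choose_dwu(fake_benchmark, target_seconds=600))
```

The linear scaling in `fake_benchmark` is purely illustrative; real workloads have to be measured.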
39
Data Warehouse Unit (DWU)
Workload Management
DW: 100, 200, 300, 400, 500, 600, 1000, 1200, 1500, 2000, 3000, 6000
Engine Nodes: 1
Worker Nodes: 1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30, 60
Concurrency Slots: 8, 16, 24, 32
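The worker-node counts in this table line up with a fixed number of distributions. Assuming the 60 distributions described in the product documentation (a figure not stated on the slides) and one worker node at DW100, this Python sketch computes how many distributions each worker node serves at every level. The mapping and the helper name are assumptions of the example.

```python
# Assumption: 60 distributions, per the product documentation.
DISTRIBUTIONS = 60

# DW level -> worker nodes, as read from the table above (DW100 -> 1 assumed).
WORKER_NODES = {
    100: 1, 200: 2, 300: 3, 400: 4, 500: 5, 600: 6,
    1000: 10, 1200: 12, 1500: 15, 2000: 20, 3000: 30, 6000: 60,
}


def distributions_per_node(dwu):
    """Number of distributions each worker node serves at a DW level."""
    return DISTRIBUTIONS // WORKER_NODES[dwu]


if __name__ == "__main__":
    for dwu in sorted(WORKER_NODES):
        print(f"DW{dwu}: {distributions_per_node(dwu)} distributions per node")
```

Every node count in the table divides 60 evenly, so scaling only changes which node serves each distribution, not how the data itself is distributed.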
40
Scaling
Increase or decrease DWUs according to your needs, using the Azure Portal, T-SQL, PowerShell, or the REST API.
PS command:
Set-AzureRmSqlDatabase -DatabaseName "MySQLDW" -ServerName "MyServer" -RequestedServiceObjectiveName "DW1000"
T-SQL command:
ALTER DATABASE MyDWHName MODIFY (SERVICE_OBJECTIVE = 'DWxxxxx');
41
Cost saving
You can pause your DWH when you don't need it, using the Azure Portal, PowerShell, or the REST API.
PS command:
Suspend-AzureRmSqlDatabase -ResourceGroupName "ResourceGroup1" -ServerName "Server01" -DatabaseName "Database02"
42
Demo - 4
- Pause/Resume
- Scaling
- Azure Portal
- T-SQL
... And my queries ???
43
Data Load
Azure to Azure:
- All data resides on Azure
- Fast and simple
On-Prem to Azure:
- Data needs to be sent to Azure over the internet
- Many options, but slower than Azure to Azure
44
Data Load
Azure to Azure (Blob storage):
- PolyBase to load data from Azure Blob storage
    - T-SQL
    - Azure Data Factory
    - SSIS (Integration Services)
On-Prem to Azure:
- AzCopy (< 10 TB): load to Azure Blob storage
- Bcp: from SQL to flat file, then from flat file to Azure SQL DWH
- Export data to disk (> 10 TB): send the disk to the data center by FedEx, DHL, UPS
- The external network is a potential bottleneck
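A back-of-the-envelope calculation makes the network bottleneck, and the 10 TB threshold for shipping disks, easier to judge. This Python sketch estimates upload time from data size and link speed; the 80% link efficiency and the example bandwidths are assumptions chosen for illustration, not measured values.

```python
def transfer_hours(size_tb, bandwidth_mbps, efficiency=0.8):
    """Hours needed to upload size_tb terabytes over a bandwidth_mbps link.

    efficiency is an assumed fraction of the nominal bandwidth that is
    actually usable for the transfer.
    """
    size_megabits = size_tb * 1_000_000 * 8  # 1 TB = 1,000,000 MB (decimal units)
    seconds = size_megabits / (bandwidth_mbps * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    for mbps in (100, 1000):
        hours = transfer_hours(10, mbps)
        print(f"10 TB over {mbps} Mbit/s: about {hours:.0f} hours")
```

At these assumed speeds a 10 TB upload takes from roughly a day to well over a week, which is why shipping disks becomes attractive for larger volumes.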
45
Data Load - (Van as a Service)
46
Questions ?
47
Resources