Azure SQL Datawarehouse - Datawarehouse on Cloud

Presentation transcript:

Azure SQL Datawarehouse - Datawarehouse on Cloud
Luca Ferrari

Sponsor

Organizers: GetLatestVersion.it

Agenda
- Solutions: On-Prem vs Cloud
- Introduction to Azure DWH
- SMP vs MPP
- Architecture
- Table design
- Queries
- Scaling
- Data Load

Data warehousing solutions
- On-Premise, SMP: traditional SMP SQL Server
- On-Premise, MPP: Analytics Platform System (PDW - APS), MPP SQL Server with PolyBase and Hadoop
- Cloud, MPP: Azure SQL Data Warehouse

SMP vs MPP
MPP: a divide-and-conquer approach to solving large data problems by using parallel computing.
- Data is divided and distributed across many computing resources.
- Each computing resource operates on its portion of the data in parallel.

Data warehousing solutions - On Premise
Pros:
- Fast ETLs loading data (On-Prem to On-Prem only)
- No external network needed
- Flexible Backup/Restore policy
Cons:
- HW cost
- Storage cost
- Energy cost
- License cost
- Limited scalability (SQL Server SMP)
- Maintenance
- No Big Data support on the box *
* Big Data support on the box is available only with Microsoft APS AU2 or SQL Server 2016.

Data warehousing solutions - Cloud
Pros:
- No HW cost
- No storage cost
- No license cost
- Scalable (up and down) according to your actual needs
- Big Data support on the box
Cons:
- Slow ETLs (on-prem to cloud and vice versa)
- No backup on demand

Introducing Azure SQL DWH
- A relational data warehouse-as-a-service, fully managed by Microsoft.
- The industry's first elastic cloud data warehouse with proven SQL Server capabilities.
- Supports your smallest to your largest data storage needs, from GB to PB.

Architecture - Control Node
- Endpoint for connections
- Regular SQL endpoint (TCP 1433)
- Persists no user data (metadata only)
- Coordinates compute activity using MPP
[Diagram: Control Node (SQL DB) with the Massively Parallel Processing (MPP) Engine, four Compute Nodes (SQL DB), Blob storage [WASB(S)], HDInsight]

Architecture - Compute Node(s)
- Each Compute Node is an Azure SQL Database
- Increasing the DWUs increases the number of Compute Nodes
[Diagram: Control Node with the Massively Parallel Processing (MPP) Engine, four Compute Nodes (SQL DB), Blob storage [WASB(S)], HDInsight]

Architecture - Blob storage [WASB(S)]
- GRS storage
- PBs of storage
- Load data without incurring compute costs
[Diagram: Control Node with the Massively Parallel Processing (MPP) Engine, four Compute Nodes (SQL DB), Blob storage [WASB(S)], HDInsight]

Architecture
Storage and compute are decoupled, enabling a truly elastic service and separate charging for compute and storage.
- Application or user connection (to the Control Node)
- Data loading: SSIS, REST, OLE, ADO, ODBC, WebHDFS, AzCopy, PS
- DMS (Data Movement Service) executes across all database nodes
- Compute: scale compute up or down when required (SLA <= 60 seconds). Pause, Restart, Stop, Start.
- Storage: add or load data to WASB(S) without incurring compute costs
[Diagram: Control Node and four Compute Nodes (SQL DB), each with DMS, on top of Azure infrastructure and storage: Blob storage [WASB(S)], HDInsight]

Architecture - Data Movement Service
Data Movement Service (DMS) moves data between the nodes. DMS gives the Compute Nodes access to the data they need for joins and aggregations. DMS is not an Azure service. It is a Windows service that runs alongside SQL Database on all the nodes. Since DMS runs behind the scenes, you won't interact with it directly. However, when you look at query plans, you will notice they include some DMS operations, since data movement is necessary to run each query in parallel.
The amount of data movement depends on:
- Table design
- Table statistics
- Queries

Demo - 1 Create Azure SQL DW

Table Architecture
Data can be distributed across nodes or replicated.
Three table types:
- Hash Distributed
- Round-Robin
- Replicated

Hash Distributed
Rows are distributed across multiple distributions based on a hash function applied to a column.

    CREATE TABLE [dbo].[FactInternetSales]
    (
        [ProductKey]       int          NOT NULL,
        [OrderDateKey]     int          NOT NULL,
        [CustomerKey]      int          NOT NULL,
        [PromotionKey]     int          NOT NULL,
        [SalesOrderNumber] nvarchar(20) NOT NULL,
        [OrderQuantity]    smallint     NOT NULL,
        [UnitPrice]        money        NOT NULL,
        [SalesAmount]      money        NOT NULL
    )
    WITH
    (
        CLUSTERED COLUMNSTORE INDEX,
        DISTRIBUTION = HASH([ProductKey])
    );
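A hash column with few distinct values or many repeated values can leave some distributions much fuller than others. A minimal sketch for checking this, assuming the [dbo].[FactInternetSales] table from this slide already exists and holds data:

```sql
-- Returns one row per distribution with its row count and space usage.
-- Large differences in the ROWS column between distributions suggest data skew.
DBCC PDW_SHOWSPACEUSED('dbo.FactInternetSales');
```

If the row counts differ widely, a different distribution column is usually worth testing.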

Round-Robin
Data is distributed evenly (or as evenly as possible) among all the distributions, without a hash function.

    CREATE TABLE [dbo].[FactInternetSales]
    (
        [ProductKey]       int          NOT NULL,
        [OrderDateKey]     int          NOT NULL,
        [CustomerKey]      int          NOT NULL,
        [PromotionKey]     int          NOT NULL,
        [SalesOrderNumber] nvarchar(20) NOT NULL,
        [OrderQuantity]    smallint     NOT NULL,
        [UnitPrice]        money        NOT NULL,
        [SalesAmount]      money        NOT NULL
    )
    WITH
    (
        CLUSTERED COLUMNSTORE INDEX,
        DISTRIBUTION = ROUND_ROBIN
    );
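A round-robin table is often used as a quick landing area and converted to a hash-distributed table afterwards. One possible way to do that is CREATE TABLE AS SELECT (CTAS) followed by a rename. This is only a sketch: the names with the _Hash and _RoundRobin suffixes are made up for illustration, and it assumes the round-robin [dbo].[FactInternetSales] table from this slide exists.

```sql
-- Build a hash-distributed copy of the round-robin table.
CREATE TABLE [dbo].[FactInternetSales_Hash]
WITH
(
    CLUSTERED COLUMNSTORE INDEX,
    DISTRIBUTION = HASH([ProductKey])
)
AS
SELECT *
FROM [dbo].[FactInternetSales];

-- Swap the names so existing queries pick up the new layout.
RENAME OBJECT [dbo].[FactInternetSales] TO [FactInternetSales_RoundRobin];
RENAME OBJECT [dbo].[FactInternetSales_Hash] TO [FactInternetSales];
```

The old copy can be dropped once the new table has been verified.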

Replicated
Data is replicated: each node holds the entire table's data set.

    CREATE TABLE DimCustomer
    (
        id       int NOT NULL,
        lastName varchar(20),
        zipCode  varchar(6)
    )
    WITH
    (
        DISTRIBUTION = REPLICATE,
        CLUSTERED INDEX (lastName)
    );

Table Architecture and query considerations
Join compatibility by table type (left table, right table):
- Replicated, Replicated: all join types are compatible - no data movement required.
- Replicated, Distributed: inner join, right outer join and cross join are compatible - no data movement required.
- Distributed, Replicated: inner join, left outer join and cross join are compatible - no data movement required.
- Distributed, Distributed: all join types, except cross joins, can be compatible - no data movement required if the join predicate meets the following conditions:
    - The predicate is an equality join.
    - The predicate joins two distributed columns that have matching data types.
For example, if table A is distributed on column a and table B is distributed on column b, and both a and b have matching data types, the following join is compatible:

    SELECT * FROM A JOIN B ON (A.a = B.b)

SQL Server PDW analyzes the logic for some conjunctive predicates to see if they are compatible. For example, the following join is compatible:

    SELECT * FROM A JOIN B ON (A.a = B.b AND A.a = B.b + 1)

Cross joins between two distributed tables are always incompatible.
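To see whether a particular join triggers data movement, the plan can be inspected with EXPLAIN before running the query. The sketch below assumes the hash-distributed [dbo].[FactInternetSales] table from the earlier slide; the second table, [dbo].[FactProductInventory], is hypothetical and exists only to give the join a partner distributed on the same column and data type.

```sql
-- Hypothetical second table, hash-distributed on the same int column.
CREATE TABLE [dbo].[FactProductInventory]
(
    [ProductKey]   int NOT NULL,
    [DateKey]      int NOT NULL,
    [UnitsBalance] int NOT NULL
)
WITH
(
    CLUSTERED COLUMNSTORE INDEX,
    DISTRIBUTION = HASH([ProductKey])
);

-- EXPLAIN returns the MPP plan as XML without executing the query.
-- Steps such as SHUFFLE_MOVE or BROADCAST_MOVE indicate data movement.
EXPLAIN
SELECT s.[ProductKey], s.[SalesAmount], i.[UnitsBalance]
FROM [dbo].[FactInternetSales] AS s
JOIN [dbo].[FactProductInventory] AS i
    ON s.[ProductKey] = i.[ProductKey];
```

Because both tables are distributed on ProductKey with matching types and the predicate is an equality, this join is expected to be compatible, so the plan should not need a move step for it.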

Table Architecture - Unsupported Features
- Primary key, foreign key, unique and check table constraints
- Unique indexes
- Computed columns
- Sparse columns
- User-defined types
- Sequence
- Triggers
- Indexed views
- Synonyms

Table Architecture - Unsupported Data Types
Data type: workaround
- geometry: varbinary
- geography: varbinary
- hierarchyid: nvarchar(4000)
- image: varbinary
- text: varchar
- ntext: nvarchar
- sql_variant: split the column into several strongly typed columns.
- table: convert to temporary tables.
- timestamp: rework code to use datetime2 and the CURRENT_TIMESTAMP function. Only constants are supported as defaults, therefore CURRENT_TIMESTAMP cannot be defined as a default constraint. If you need to migrate row version values from a timestamp typed column, use BINARY(8) or VARBINARY(8) for NOT NULL or NULL row version values.
- xml: varchar
- user-defined types: convert back to their native types where possible.
- default values: default values support literals and constants only. Non-deterministic expressions or functions, such as GETDATE() or CURRENT_TIMESTAMP, are not supported.

Table Store Types
Azure SQL Data Warehouse supports both:
- Row store: traditional B-tree (clustered, nonclustered) and heap
- Columnstore: clustered columnstore indexes only
- Data compression

Table Store Types
[Figure: columnstore layout, a row group split into column segments C1 to C6]

Table Store Types
Columnstore provides dramatic performance:
- Updateable clustered columnstore index (CCI)
- Stores data in columnar format
- Memory-optimized for next-generation performance
- Updateable to support bulk and/or trickle loading
- Up to 100x faster
- Up to 15x compression
- Saves time and costs

Table Partitioning
Full partition support:
- Merge
- Split
- Switch
Do not over-partition your data!
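As an illustration of a coarse partitioning scheme, the sketch below partitions a fact table by year on an integer date key. The table name and the boundary values are assumptions chosen for the example. Every partition is further divided across the distributions (60 in Azure SQL Data Warehouse), so a small number of partitions is usually enough to keep columnstore row groups well filled.

```sql
-- Hypothetical yearly partitions on an integer date key (yyyymmdd).
CREATE TABLE [dbo].[FactInternetSales_Partitioned]
(
    [ProductKey]       int          NOT NULL,
    [OrderDateKey]     int          NOT NULL,
    [CustomerKey]      int          NOT NULL,
    [SalesOrderNumber] nvarchar(20) NOT NULL,
    [SalesAmount]      money        NOT NULL
)
WITH
(
    CLUSTERED COLUMNSTORE INDEX,
    DISTRIBUTION = HASH([ProductKey]),
    -- RANGE RIGHT: each boundary value belongs to the partition on its right.
    PARTITION ( [OrderDateKey] RANGE RIGHT FOR VALUES (20150101, 20160101, 20170101) )
);
```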

Table Statistics
The more SQL Data Warehouse knows about your data, the faster it can execute queries against it. You tell SQL Data Warehouse about your data by collecting statistics.
- Statistics are not created automatically on the Control Node: we have to create them ourselves.
- Statistics are not updated automatically on the Control Node: we have to maintain them ourselves.
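A minimal sketch of creating and maintaining statistics by hand, assuming the [dbo].[FactInternetSales] table from the earlier slides. The statistics names are arbitrary and chosen only for illustration.

```sql
-- Single-column statistics on columns used in joins and filters (default sampling).
CREATE STATISTICS [stats_FactInternetSales_ProductKey]
    ON [dbo].[FactInternetSales] ([ProductKey]);

-- Statistics built from every row instead of a sample.
CREATE STATISTICS [stats_FactInternetSales_OrderDateKey]
    ON [dbo].[FactInternetSales] ([OrderDateKey])
    WITH FULLSCAN;

-- After a significant load, refresh all statistics on the table.
UPDATE STATISTICS [dbo].[FactInternetSales];
```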

Demo - 2
- Hash distributed
- Round-Robin
- CTL/CMP table mapping
- Data skew

Queries
Query execution:
1. All queries are sent to the Control Node (CTL), an Azure SQL Database.
2. The Control Node creates the MPP plan, which depends on table design, statistics and joins.
3. The MPP plan is executed by the Compute (CMP) Nodes.
4. Results are sent to the client.

Queries
Almost all T-SQL commands can be used with Azure SQL DWH:
- DDL
- DML
- Security
- Monitoring

Queries - Monitoring
Monitoring Azure SQL DWH using:
- DMVs: connections, sessions, requests, CMP configuration, ...
- System views: tables, table space allocation
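Two DMV queries of the kind this slide refers to, as a sketch. They assume a connection to the data warehouse database with permission to read the sys.dm_pdw_* views.

```sql
-- Sessions that are currently open, excluding the session running this query.
SELECT session_id, login_name, status, login_time, query_count
FROM sys.dm_pdw_exec_sessions
WHERE status <> 'Closed'
  AND session_id <> SESSION_ID();

-- The ten longest-running requests recorded by the Control Node.
SELECT TOP 10
       request_id, session_id, status, submit_time,
       total_elapsed_time, [label], command
FROM sys.dm_pdw_exec_requests
ORDER BY total_elapsed_time DESC;
```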

Monitoring - Tools DWInsight – History Monitoring tool

Queries - Troubleshooting
Label your queries!

    SELECT * FROM sys.tables OPTION (LABEL = 'My Query Label')
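Once a query carries a label, it can be found again in the request DMV and its MPP steps can be examined. A sketch, assuming the label 'My Query Label' from this slide; 'QID####' is a placeholder to be replaced with a request_id returned by the first query.

```sql
-- Find the requests submitted with the label.
SELECT request_id, status, submit_time, total_elapsed_time, command
FROM sys.dm_pdw_exec_requests
WHERE [label] = 'My Query Label'
ORDER BY submit_time DESC;

-- Inspect the distributed plan steps of one of those requests.
SELECT step_index, operation_type, location_type, status,
       total_elapsed_time, row_count
FROM sys.dm_pdw_request_steps
WHERE request_id = 'QID####'   -- placeholder, not a real request id
ORDER BY step_index;
```

Steps whose operation_type is a move operation are the ones performed by DMS.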

Demo - 3
- MPP plan
- Monitoring Azure DWH

Data Warehouse Unit (DWU)
DWUs are a measure of the underlying resources, such as CPU, memory and IOPS, that are allocated to your SQL Data Warehouse. Increasing the number of DWUs increases resources and performance.

Data Warehouse Unit (DWU)
How fast do you want to go?
- It is difficult for a customer to choose which hardware to go with and to know what the implications will be for performance.
- The customer can grow compute and storage as needed, independently of each other.

Data Warehouse Unit (DWU)
1. Select a small number of DWUs.
2. Monitor your application performance.
3. Determine how much faster or slower performance should be for you.
4. Increase or decrease the number of DWUs.
5. Continue making adjustments until you reach an optimum performance level for your business requirements.

Data Warehouse Unit (DWU) - Workload Management
DW: 100 200 300 400 500 600 1000 1200 1500 2000 3000 6000
Engine Nodes: 1
Worker Nodes: 2 3 4 5 6 10 12 15 20 30 60
Concurrency Slots: 8 16 24 32

Scaling
Increase or decrease DWUs according to your needs, using the Azure Portal, T-SQL, PowerShell or the REST API.
PS command:

    Set-AzureRmSqlDatabase -DatabaseName "MySQLDW" -ServerName "MyServer" -RequestedServiceObjectiveName "DW1000"

T-SQL command:

    ALTER DATABASE MyDWHName MODIFY (SERVICE_OBJECTIVE = 'DWxxxxx');
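Before and after a scale operation it can be useful to confirm the current service objective. A sketch, assuming a connection to the master database of the logical server; MyDWHName is the database name used on this slide and DW400 is only an example target.

```sql
-- Run in the master database: list each database with its current service objective.
SELECT db.name              AS [Database],
       ds.edition           AS [Edition],
       ds.service_objective AS [Service Objective]
FROM sys.database_service_objectives AS ds
JOIN sys.databases AS db
    ON ds.database_id = db.database_id;

-- Request a new performance level (example value).
ALTER DATABASE MyDWHName MODIFY (SERVICE_OBJECTIVE = 'DW400');
```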

Cost saving
You can pause your DWH when you don't need it, using the Azure Portal, PowerShell or the REST API.
PS command:

    Suspend-AzureRmSqlDatabase -ResourceGroupName "ResourceGroup1" -ServerName "Server01" -DatabaseName "Database02"

Demo - 4
- Pause/Resume
- Scaling
- Azure Portal
- T-SQL
- ... and my queries?

Data Load
Azure to Azure:
- All data resides on Azure
- Fast and simple
On-Prem to Azure:
- Data needs to be sent to Azure over the internet
- Many options, but slower than Azure to Azure

Data Load
Azure to Azure:
- Blob storage: PolyBase to load data from Azure Blob storage
    - T-SQL
    - Azure Data Factory
    - SSIS (Integration Services)
On-Prem to Azure:
- AzCopy (< 10 TB): load to Azure Blob storage
- bcp: from SQL to flat file, then from flat file to Azure SQL DWH
- Export data to disk (> 10 TB): send the disk to the data center by FedEx, DHL, UPS
- The external network is a potential bottleneck
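A PolyBase load from Azure Blob storage with plain T-SQL could look roughly like the sketch below. All object names, the container and account in the wasbs:// location, the folder path, the pipe delimiter and the secret are placeholders assumed for illustration. It also assumes the files in the folder match the column list of the external table.

```sql
-- 1. A master key protects the credential secret (one-time setup per database).
CREATE MASTER KEY;

-- 2. Credential holding the storage account key (placeholder secret).
CREATE DATABASE SCOPED CREDENTIAL [BlobStorageCredential]
WITH IDENTITY = 'user',
     SECRET = '<storage-account-key>';

-- 3. External data source pointing at the blob container (placeholder location).
CREATE EXTERNAL DATA SOURCE [AzureBlobStage]
WITH
(
    TYPE = HADOOP,
    LOCATION = 'wasbs://<container>@<account>.blob.core.windows.net',
    CREDENTIAL = [BlobStorageCredential]
);

-- 4. File format of the staged flat files (assumed pipe-delimited text).
CREATE EXTERNAL FILE FORMAT [PipeDelimitedText]
WITH
(
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|')
);

-- 5. External table: metadata only, the data stays in Blob storage.
CREATE EXTERNAL TABLE [dbo].[FactInternetSales_External]
(
    [ProductKey]       int          NOT NULL,
    [OrderDateKey]     int          NOT NULL,
    [CustomerKey]      int          NOT NULL,
    [PromotionKey]     int          NOT NULL,
    [SalesOrderNumber] nvarchar(20) NOT NULL,
    [OrderQuantity]    smallint     NOT NULL,
    [UnitPrice]        money        NOT NULL,
    [SalesAmount]      money        NOT NULL
)
WITH
(
    LOCATION = '/FactInternetSales/',
    DATA_SOURCE = [AzureBlobStage],
    FILE_FORMAT = [PipeDelimitedText]
);

-- 6. Load in parallel into a distributed table with CTAS.
CREATE TABLE [dbo].[FactInternetSales_Loaded]
WITH
(
    CLUSTERED COLUMNSTORE INDEX,
    DISTRIBUTION = HASH([ProductKey])
)
AS
SELECT *
FROM [dbo].[FactInternetSales_External];
```

Creating statistics on the loaded table afterwards follows the advice on the Table Statistics slide.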

Data Load - (Van as a Service)

Questions ?

Resources
https://azure.microsoft.com/it-it/services/sql-data-warehouse/
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-overview-load
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-overview-manage
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-best-practices
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-manage-monitor
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-reference-tsql-statements
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-overview
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-distribute
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-develop-label
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-data-types