MPP – Maximize Parallel Productivity

Presentation transcript:

MPP – Maximize Parallel Productivity Getting the most for your efforts and money in SQL DW

Agenda
MPP Architecture
Data Warehouse Units (DWUs)
Cost: Storage and Compute
Creating an Instance
Distribution Keys, Indexes, Partitions, and Statistics
Loading Data
DTSQL, the Language of SQL DW

About Me
Live in Indianapolis, Indiana, USA
Data Warehousing / Analytics Consultant at DMI
Was a software developer for 8 years
Been in analytics for 10 years
MCSE: Data Management and Analytics
Also hold Hortonworks and SAP BI Certifications

Architecture SQL DB - SMP Shared

Architecture SQL DW - MPP Shared? Nothing

Architecture SQL DW – MPP (a little more detail)

Architecture 100 DWU

Architecture 200 DWU

Architecture 500 DWU

Cost COMPUTE STORAGE DATA TRANSFER FREE €8.72/DWU/mo.

CREATE AND CONNECT Starting the Process

What You Need Azure Account Spending Limit Azure SQL Database Azure VM (optional) Software

What You Need Azure Account Spending Limit Azure SQL Database Azure VM (optional) Software MSDN / VS or Trial

What You Need Azure Account Spending Limit Azure SQL Database Azure VM (optional) Software Use existing: ---or--- Create later

What You Need
Azure Account
Spending Limit
Azure SQL Database
Azure VM (optional)
Software: SQL Server Management Studio, Visual Studio, SQL Server Data Tools, Azure Storage Explorer, Azure Feature Pack for SSIS

Create The Instance Provision the SQL DW

Create The Instance The Blade

Create The Instance Set Firewall Rules Connecting from outside of your Azure resource group requires firewall rules

CONNECT SSMS: 16.0-17.2+

CONNECT Visual Studio (2017 has code completion!)

Storage concepts: Distribution, Indexes, Partitions, Statistics

Distribution: Round Robin
Source rows:
  Cust #  Name             City
  24      Britton Gray     McCordsville, IN
  72      Pamela Stephens  Duluth, GA
  119     Louis Wright     Gloucester, MA
  240     Amy Crosby       Renton, WA
  278     Max Cook         Belton, TX
Rows are dealt to distributions A, B, and C in turn (24 to A, 72 to B, 119 to C, 240 to A, 278 to B):
  Distribution  Cust #
  A             24, 240
  B             72, 278
  C             119
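
As a rough illustration of the round-robin slide, a table with this distribution might be declared as below. This is only a sketch: the table name dbo.Customer and its column names and types are assumptions chosen to match the sample rows, not something taken from the deck.

```sql
-- Hypothetical table mirroring the slide's sample rows.
-- ROUND_ROBIN spreads rows evenly across distributions
-- without looking at any column value.
CREATE TABLE dbo.Customer
(
    CustomerId INT           NOT NULL,
    Name       NVARCHAR(100) NOT NULL,
    City       NVARCHAR(100) NOT NULL
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED COLUMNSTORE INDEX
);
```

Round robin needs no distribution key, which is why it tends to suit staging tables or tables with no obvious join column.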

Distribution: Hash Distributed
Source rows:
  Cust #  Name             City
  24      Britton Gray     McCordsville, IN
  72      Pamela Stephens  Duluth, GA
  119     Louis Wright     Gloucester, MA
  240     Amy Crosby       Renton, WA
  278     Max Cook         Belton, TX
Each row goes to the distribution chosen by hashing Cust #:
  Cust #  Hash Result
  24      B
  72      C
  119     B
  240     A
  278     C
Resulting placement:
  Distribution  Cust #
  A             240
  B             24, 119
  C             72, 278
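
The hash-distributed case might look like the following sketch. The table name dbo.CustomerByHash and the column definitions are assumptions for illustration. The slide uses three distributions (A, B, C) to keep the picture small; the actual hash function is internal to the service, so the placements above are illustrative only.

```sql
-- Hypothetical table: every row with the same CustomerId
-- lands in the same distribution.
CREATE TABLE dbo.CustomerByHash
(
    CustomerId INT           NOT NULL,
    Name       NVARCHAR(100) NOT NULL,
    City       NVARCHAR(100) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustomerId),
    CLUSTERED COLUMNSTORE INDEX
);

-- After loading, report rows and space per distribution to check for skew.
DBCC PDW_SHOWSPACEUSED('dbo.CustomerByHash');
```

A key with few distinct values, or one dominant value, would pile rows into a few distributions, so the row counts from the DBCC output are worth a look after the first load.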

INDEXES
Much like SQL Server, but indexed within a distribution
Default: Clustered Columnstore
  Cust #  Order #  Order Date
  24      1095     3/1/2016
          2210     8/15/2016
          2901     11/14/2016
  119     1140     3/16/2016
          3319     12/10/2016

INDEXES Available: B-tree column indexes MSFT: “Use them for high cardinality columns that are used as filters in queries returning a small number of rows.”
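
A minimal sketch of the two index types mentioned above, assuming a hypothetical dbo.OrderHeader table whose columns follow the sample order data (the names and types are made up for the example):

```sql
-- Hypothetical order table using the default clustered columnstore index.
CREATE TABLE dbo.OrderHeader
(
    CustomerId  INT  NOT NULL,
    OrderNumber INT  NOT NULL,
    OrderDate   DATE NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustomerId),
    CLUSTERED COLUMNSTORE INDEX
);

-- Secondary B-tree index on a high-cardinality column that is
-- used as a filter in queries returning only a few rows.
CREATE INDEX IX_OrderHeader_OrderNumber
    ON dbo.OrderHeader (OrderNumber);
```

The secondary index is built separately inside each distribution, in line with the "indexed within a distribution" point on the previous slide.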

Partitions
Also much like SQL Server
Partitions exist within a distribution, but must be consistent across distributions.
Partition switching is particularly effective given the CTAS-based nature of ELT
Optimal: Make sure partitions will have >1M rows
  Distribution  Partitions on Order Date
  A             [2017] [2016]
  B             [2017] [2016]
  C             [2017] [2016]
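
The partitioning and switching described above could be sketched as follows. The table names dbo.OrderFact and dbo.OrderFact_Stage, the columns, and the boundary dates are all assumptions for illustration; the switch also assumes the target partition is empty.

```sql
-- Hypothetical fact table partitioned by year.
-- RANGE RIGHT boundaries give: partition 1 = before 2016,
-- partition 2 = 2016, partition 3 = 2017 onward.
CREATE TABLE dbo.OrderFact
(
    CustomerId  INT  NOT NULL,
    OrderNumber INT  NOT NULL,
    OrderDate   DATE NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustomerId),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION (OrderDate RANGE RIGHT FOR VALUES ('2016-01-01', '2017-01-01'))
);

-- Empty staging table with the same distribution and partition scheme.
CREATE TABLE dbo.OrderFact_Stage
WITH
(
    DISTRIBUTION = HASH(CustomerId),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION (OrderDate RANGE RIGHT FOR VALUES ('2016-01-01', '2017-01-01'))
)
AS
SELECT * FROM dbo.OrderFact WHERE 1 = 2;

-- (Load the 2017 rows into dbo.OrderFact_Stage here.)

-- Metadata-only move of the 2017 partition into the fact table.
ALTER TABLE dbo.OrderFact_Stage SWITCH PARTITION 3 TO dbo.OrderFact PARTITION 3;
```

Both tables need matching distribution, index, and partition definitions for the switch to be accepted, which is why the staging table repeats the same WITH clause.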

Statistics
SQL Server: Statistics kept on tables
SQL DW: Statistics are kept on individual columns
Tells the control node what column value distributions look like across nodes
“How can I move the least amount of data?”
Index columns used to filter or join
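
Column statistics of the kind described above might be created like this. The sketch assumes a hypothetical dbo.OrderHeader table with CustomerId and OrderDate columns already exists; the statistics names are also made up.

```sql
-- Assumes dbo.OrderHeader(CustomerId, OrderDate, ...) already exists.
-- One statistics object per column used in joins or filters.
CREATE STATISTICS stats_OrderHeader_CustomerId
    ON dbo.OrderHeader (CustomerId);

CREATE STATISTICS stats_OrderHeader_OrderDate
    ON dbo.OrderHeader (OrderDate);

-- Refresh all statistics on the table after a significant load.
UPDATE STATISTICS dbo.OrderHeader;
```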

Loading Data: Getting lots of data in efficiently

Loading Methods

PolyBase Preferred Method Scales with DWUs as each compute node is PolyBase capable

PolyBase SETUP PROCESS
Copy data into Blob storage (Storage Explorer or AzCopy)
Create:
  Scoped credential
  External data source
  External file format
  External table
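
The four objects in the setup list could be created roughly as follows. Everything named here is an assumption for illustration: the credential, data source, and file format names, the pipe delimiter, the /orders/ folder, and the column list. The angle-bracket placeholders must be replaced with real storage details.

```sql
-- A master key must exist before a database scoped credential can be created.
CREATE MASTER KEY;

-- 1. Scoped credential holding the storage account key (placeholder value).
CREATE DATABASE SCOPED CREDENTIAL BlobStorageCredential
WITH IDENTITY = 'user',
     SECRET   = '<storage-account-key>';

-- 2. External data source pointing at the Blob container (placeholder location).
CREATE EXTERNAL DATA SOURCE OrdersBlobStorage
WITH
(
    TYPE       = HADOOP,
    LOCATION   = 'wasbs://<container>@<storage-account>.blob.core.windows.net',
    CREDENTIAL = BlobStorageCredential
);

-- 3. External file format describing the text files (pipe-delimited is assumed).
CREATE EXTERNAL FILE FORMAT PipeDelimitedText
WITH
(
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|', USE_TYPE_DEFAULT = FALSE)
);

-- 4. External table: a schema laid over the files; no data is copied yet.
CREATE EXTERNAL TABLE dbo.ExternalTable
(
    MyDistColumn INT            NOT NULL,
    DateColumn   DATE           NOT NULL,
    Amount       DECIMAL(18, 2) NULL
)
WITH
(
    LOCATION    = '/orders/',
    DATA_SOURCE = OrdersBlobStorage,
    FILE_FORMAT = PipeDelimitedText
);
```

Once the external table exists it can be queried like any other table, which is what makes the CTAS load in the next slide possible.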

PolyBase LOAD PROCESS
Initial load: CTAS (CREATE TABLE AS SELECT)
    CREATE TABLE MyTable
    WITH (CLUSTERED COLUMNSTORE INDEX,
          DISTRIBUTION = HASH(MyDistColumn),
          PARTITION (DateColumn RANGE RIGHT FOR VALUES ('2010-01-01', '2011-01-01' . . .)))
    AS SELECT * FROM ExternalTable;
Incremental load: INSERT INTO
    Try to keep these in smaller batches
Loads are automatically parallelized
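
An incremental load of the kind mentioned above might look like this sketch. MyTable, ExternalTable, and DateColumn come from the slide; the dbo schema and the one-month date window are assumptions chosen only to show a bounded batch.

```sql
-- Append one bounded slice of the external data to the existing table.
-- The date window is a made-up example of a single batch.
INSERT INTO dbo.MyTable
SELECT *
FROM dbo.ExternalTable
WHERE DateColumn >= '2017-06-01'
  AND DateColumn <  '2017-07-01';
```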

BCP LOAD PROCESS
ASCII or UTF-16 only
Run from machine with source files (or direct access to them)
    bcp <table name> in <file> -S <server> -d <database> -U <user> -P <password> -t <'delimiter'>

SSIS
LEGACY METHOD
  Use SQL Server destination
  Change connection target
  < 10K rows per second
AZURE SQL DW UPLOAD TASK
  Part of SSIS Azure Feature Pack
  UTF-8 text files only

DTSQL: Diminished Distributed Transact SQL

ELT with CTAS CREATE TABLE AS SELECT All-or-nothing Minimal logging Preferred Method of ELT (Extract, Load, Transform)
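
One common way to apply CTAS in an ELT step is to build the transformed table under a new name and then swap names. The sketch below assumes hypothetical tables dbo.Customer (current) and dbo.Customer_Staging (freshly loaded raw rows); none of these names come from the deck.

```sql
-- Build the transformed copy in one all-or-nothing, minimally logged step.
CREATE TABLE dbo.Customer_New
WITH
(
    DISTRIBUTION = HASH(CustomerId),
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT
    CustomerId,
    LTRIM(RTRIM(Name)) AS Name,   -- transform: strip stray whitespace
    LTRIM(RTRIM(City)) AS City
FROM dbo.Customer_Staging;

-- Swap the new table in, then drop the old one.
RENAME OBJECT dbo.Customer     TO Customer_Old;
RENAME OBJECT dbo.Customer_New TO Customer;
DROP TABLE dbo.Customer_Old;
```

If the CTAS fails partway through, the original dbo.Customer is untouched, which is the practical benefit of the all-or-nothing behavior.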

NOT SUPPORTED: Many Functions
TRY_CAST(), TRY_CONVERT()
  Use ISNUMERIC() or ISDATE() before CAST/CONVERT
  Not perfect
FORMAT
TRIM
XML/JSON functions
Security:
  Row-level security
  Dynamic data masking
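
The ISNUMERIC()/ISDATE() workaround could be sketched as below, assuming a hypothetical staging table dbo.OrderStaging whose raw columns are all text. The column names are made up for the example.

```sql
-- Guarded conversions standing in for TRY_CAST / TRY_CONVERT.
SELECT
    RawOrderDate,
    CASE WHEN ISDATE(RawOrderDate) = 1
         THEN CAST(RawOrderDate AS DATETIME)
    END AS OrderDate,                      -- NULL when the text is not a date
    CASE WHEN ISNUMERIC(RawQuantity) = 1
         THEN CAST(RawQuantity AS FLOAT)
    END AS Quantity,                       -- see caveat below
    LTRIM(RTRIM(RawCity)) AS City          -- stand-in for TRIM
FROM dbo.OrderStaging;
```

This shows why the slide calls the approach "not perfect": ISNUMERIC() returns 1 for some strings, such as a lone currency symbol, that still fail the CAST, so the numeric guard can let a bad value through.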

NOT SUPPORTED: Miscellanea
MERGE statement
Global temporary tables (##)
Cursors
Geometric / geospatial data
R Services
Pausing / scaling immediately kills all running operations
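
Without MERGE, an upsert is often rebuilt with CTAS. The following is a sketch of that idea, not something from the deck; dbo.Customer (current rows) and dbo.Customer_Staging (new and changed rows) are hypothetical tables.

```sql
-- Upsert without MERGE: take every staged row, plus every existing row
-- that has no replacement in staging.
CREATE TABLE dbo.Customer_Upsert
WITH
(
    DISTRIBUTION = HASH(CustomerId),
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT s.CustomerId, s.Name, s.City
FROM dbo.Customer_Staging AS s
UNION ALL
SELECT c.CustomerId, c.Name, c.City
FROM dbo.Customer AS c
WHERE NOT EXISTS
(
    SELECT 1
    FROM dbo.Customer_Staging AS s
    WHERE s.CustomerId = c.CustomerId
);

-- Swap the rebuilt table in place of the original.
RENAME OBJECT dbo.Customer        TO Customer_Old;
RENAME OBJECT dbo.Customer_Upsert TO Customer;
DROP TABLE dbo.Customer_Old;
```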

Benchmarks SQL DW is so fast…

HOW FAST IS IT? Test data set

HOW FAST IS IT? Loading Data (Polybase)

HOW FAST IS IT? Star Query

QUESTIONS?