MPP – Maximize Parallel Productivity

1 MPP – Maximize Parallel Productivity
Getting the most for your efforts and money in SQL DW

2 Agenda
MPP Architecture
Data Warehousing Units (DWUs)
Cost: Storage and Compute
Creating an Instance
Distribution Keys, Indexes, Partitions, and Statistics
Loading Data
DTSQL, the Language of SQL DW

3 About Me
Live in Indianapolis, Indiana, USA
Data Warehousing / Analytics Consultant at DMI
Was a software developer for 8 years
Been in analytics for 10 years
MCSE: Data Management and Analytics
Also hold Hortonworks and SAP BI certifications

4 Architecture SQL DB - SMP Shared

5 Architecture SQL DW - MPP Shared? Nothing

6 Architecture SQL DW – MPP (a little more detail)

7 Architecture 100 DWU

8 Architecture 200 DWU

9 Architecture 500 DWU
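Slides 7 to 9 show the same warehouse at 100, 200, and 500 DWU. A minimal T-SQL sketch of moving between these levels follows. It assumes a hypothetical database named `MyDW` and the `DW100`/`DW200`/`DW500` service-objective names of this era, and it runs while connected to the logical server's `master` database. As slide 49 notes, scaling cancels running operations.

```sql
-- Run while connected to the master database of the logical server.
-- MyDW is a hypothetical database name; DW500 matches the level on slide 9.
ALTER DATABASE MyDW
MODIFY (SERVICE_OBJECTIVE = 'DW500');

-- List each database with its current service objective (DWU level).
SELECT db.name,
       dso.service_objective
FROM sys.database_service_objectives AS dso
JOIN sys.databases AS db
    ON dso.database_id = db.database_id;
```

The `ALTER DATABASE` call typically returns before the scale operation finishes, so the second query may show the old level for a short while.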

10 Cost
Compute: €8.72 / DWU / mo.
Storage
Data transfer: FREE
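Taking the slide's quoted rate of €8.72 per DWU per month at face value (actual Azure prices vary by region and have changed over time), compute cost grows linearly with the DWU levels shown on slides 7 to 9. Storage is billed separately and is not included here.

```latex
C_{\text{compute}} = N_{\text{DWU}} \times 8.72\ \text{EUR/month}
\qquad
\begin{aligned}
100\ \text{DWU} &: 100 \times 8.72 = 872\ \text{EUR/month}\\
200\ \text{DWU} &: 200 \times 8.72 = 1744\ \text{EUR/month}\\
500\ \text{DWU} &: 500 \times 8.72 = 4360\ \text{EUR/month}
\end{aligned}
```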

11 CREATE AND CONNECT Starting the Process

12 What You Need
Azure Account, Spending Limit, Azure SQL Database, Azure VM (optional), Software

13 What You Need (continued)
MSDN / Visual Studio subscription, or a Trial

14 What You Need (continued)
Use existing, or create later

15 What You Need (continued)

16 What You Need: Software
SQL Server Management Studio, Visual Studio, SQL Server Data Tools, Azure Storage Explorer, Azure Feature Pack for SSIS

17 Create The Instance Provision the SQL DW

18 Create The Instance The Blade
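Besides the portal blade, provisioning can also be sketched in T-SQL against an existing logical server's `master` database. The database name `MyDW` and the starting `DW100` level are illustrative assumptions, not recommendations from the talk.

```sql
-- Run while connected to the master database of an existing logical server.
-- MyDW and DW100 are illustrative choices.
CREATE DATABASE MyDW
(
    EDITION = 'DataWarehouse',
    SERVICE_OBJECTIVE = 'DW100'
);
```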

19 Create The Instance: Set Firewall Rules
Connecting from a client outside Azure (for example, your workstation) requires a server-level firewall rule for that client's IP address
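A server-level firewall rule can also be added in T-SQL, run in the logical server's `master` database. This is a sketch: the rule name and the IP address (taken from the 203.0.113.0/24 documentation range) are placeholders for your own client's public address. Because running it already requires a working connection, the first rule is usually created in the portal.

```sql
-- Run while connected to the master database of the logical server.
-- Rule name and IP address are placeholders; use your client's public IP.
EXECUTE sp_set_firewall_rule
    @name             = N'AllowMyWorkstation',
    @start_ip_address = '203.0.113.10',
    @end_ip_address   = '203.0.113.10';
```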

20 CONNECT SSMS:

21 CONNECT Visual Studio (2017 has code completion!)

22 Storage concepts: Distribution, Indexes, Partitions, Statistics

23 Distribution: Round Robin
Cust # | Name | City
24 | Britton Gray | McCordsville, IN
72 | Pamela Stephens | Duluth, GA
119 | Louis Wright | Gloucester, MA
240 | Amy Crosby | Renton, WA
278 | Max Cook | Belton, TX
A: (empty) | B: (empty) | C: (empty)

24 Distribution: Round Robin
Next up: A, so Cust # 24 goes to A
A: 24 | B: (empty) | C: (empty)

25 Distribution: Round Robin
Next up: B, so Cust # 72 goes to B
A: 24 | B: 72 | C: (empty)

26 Distribution: Round Robin
Next up: C, so Cust # 119 goes to C
A: 24 | B: 72 | C: 119

27 Distribution: Round Robin
Next up: A, so Cust # 240 goes to A
A: 24, 240 | B: 72 | C: 119

28 Distribution: Round Robin
Next up: B, so Cust # 278 goes to B
A: 24, 240 | B: 72, 278 | C: 119
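The round-robin walkthrough corresponds to the `ROUND_ROBIN` distribution option. The slides use three distributions (A, B, C) for readability; SQL DW itself spreads every table across 60 distributions. The table and column names below are hypothetical, chosen to mirror the slide's customer rows.

```sql
-- Hypothetical table mirroring the slide's customer rows.
CREATE TABLE dbo.Customer_RR
(
    CustNo INT           NOT NULL,
    Name   NVARCHAR(100) NOT NULL,
    City   NVARCHAR(100) NOT NULL
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,   -- rows are dealt out evenly, regardless of their values
    CLUSTERED COLUMNSTORE INDEX
);

-- Singleton inserts are fine for a demo; bulk loads should use PolyBase.
INSERT INTO dbo.Customer_RR VALUES (24,  N'Britton Gray',    N'McCordsville, IN');
INSERT INTO dbo.Customer_RR VALUES (72,  N'Pamela Stephens', N'Duluth, GA');
INSERT INTO dbo.Customer_RR VALUES (119, N'Louis Wright',    N'Gloucester, MA');
INSERT INTO dbo.Customer_RR VALUES (240, N'Amy Crosby',      N'Renton, WA');
INSERT INTO dbo.Customer_RR VALUES (278, N'Max Cook',        N'Belton, TX');
```

Round robin loads quickly because no hashing is needed, but a later join on `CustNo` generally has to move rows between distributions first.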

29 Distribution: Hash Distributed
Same five customer rows as slide 23
A: (empty) | B: (empty) | C: (empty)

30 Distribution: Hash Distributed
Cust # 24: hash result B
A: (empty) | B: 24 | C: (empty)

31 Distribution: Hash Distributed
Cust # 72: hash result C
A: (empty) | B: 24 | C: 72

32 Distribution: Hash Distributed
Cust # 119: hash result B
A: (empty) | B: 24, 119 | C: 72

33 Distribution: Hash Distributed
Cust # 240: hash result A
A: 240 | B: 24, 119 | C: 72

34 Distribution: Hash Distributed
Cust # 278: hash result C
A: 240 | B: 24, 119 | C: 72, 278
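The hash walkthrough corresponds to `DISTRIBUTION = HASH(column)`: rows with the same key value always land in the same distribution, which is what lets joins on that key avoid data movement. The slide's hash results are illustrative; SQL DW's internal hash function and its 60 distributions will place rows differently. In the sketch below the table name is hypothetical. `DBCC PDW_SHOWSPACEUSED` reports rows and space per distribution, a quick way to spot skew caused by a poorly chosen key.

```sql
-- Hypothetical hash-distributed table with the slide's customer columns.
CREATE TABLE dbo.Customer_Hash
(
    CustNo INT           NOT NULL,
    Name   NVARCHAR(100) NOT NULL,
    City   NVARCHAR(100) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustNo),  -- the same CustNo value always maps to the same distribution
    CLUSTERED COLUMNSTORE INDEX
);

-- Rows and space used per distribution; large differences indicate skew.
DBCC PDW_SHOWSPACEUSED('dbo.Customer_Hash');
```

A good distribution key has many distinct, evenly spread values and is used in joins; a low-cardinality column (such as a state code) would crowd rows into a few distributions.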

35 INDEXES
Much like SQL Server, but indexed within a distribution
Default: Clustered Columnstore
Cust # | Order # | Order Date
24 | 1095 | 3/1/2016
24 | 2210 | 8/15/2016
24 | 2901 | 11/14/2016
119 | 1140 | 3/16/2016
119 | 3319 | 12/10/2016

36 INDEXES
Also available: secondary B-tree indexes on columns
MSFT: “Use them for high cardinality columns that are used as filters in queries returning a small number of rows.”
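Both index slides can be sketched with a hypothetical `dbo.Orders` table shaped like the slide 35 example. The clustered columnstore index is the default (stated explicitly here for clarity), and a secondary B-tree index is added on the order number, a high-cardinality column that might be used for selective lookups. All names are assumptions, not taken from the talk.

```sql
-- Hypothetical orders table shaped like the slide 35 example.
CREATE TABLE dbo.Orders
(
    CustNo    INT  NOT NULL,
    OrderNo   INT  NOT NULL,
    OrderDate DATE NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustNo),
    CLUSTERED COLUMNSTORE INDEX    -- also what you get when no index option is given
);

-- Secondary B-tree index for selective lookups such as WHERE OrderNo = 2210.
CREATE INDEX IX_Orders_OrderNo ON dbo.Orders (OrderNo);
```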

37 Partitions
Also much like SQL Server
Partitions exist within each distribution, but the partition scheme must be the same across all distributions.
Partition switching is particularly effective given the CTAS-driven nature of ELT.
Optimal: make sure each partition will hold more than 1M rows per distribution.
(Diagram: distributions A, B, and C, each split into the same Order Date partitions, [2016] and [2017].)
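A sketch of yearly partitions plus a partition switch, using hypothetical tables `dbo.OrdersByYear` and `dbo.OrdersByYear_Stage`. Because each partition is split across all 60 distributions, the 1M-rows-per-distribution guideline means a partition should hold on the order of 60 million rows before partitioning pays off. The switch is a metadata-only operation, and in this sketch the target partition is assumed to be empty.

```sql
-- Hypothetical orders table with one partition per year of OrderDate.
-- RANGE RIGHT: each boundary date starts a new partition, giving
-- partition 1 = before 2016, partition 2 = 2016, partition 3 = 2017 onward.
CREATE TABLE dbo.OrdersByYear
(
    CustNo    INT  NOT NULL,
    OrderNo   INT  NOT NULL,
    OrderDate DATE NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustNo),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION (OrderDate RANGE RIGHT FOR VALUES ('2016-01-01', '2017-01-01'))
);

-- Staging table with identical columns, distribution, index, and boundaries,
-- which is what allows its partitions to be switched into the main table.
CREATE TABLE dbo.OrdersByYear_Stage
(
    CustNo    INT  NOT NULL,
    OrderNo   INT  NOT NULL,
    OrderDate DATE NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustNo),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION (OrderDate RANGE RIGHT FOR VALUES ('2016-01-01', '2017-01-01'))
);

-- (Load the 2017 rows into dbo.OrdersByYear_Stage here, for example with CTAS or INSERT...SELECT.)

-- Move the staged 2017 partition into the main table; the target partition must be empty.
ALTER TABLE dbo.OrdersByYear_Stage
SWITCH PARTITION 3 TO dbo.OrdersByYear PARTITION 3;
```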

38 Statistics
SQL Server: statistics kept on tables
SQL DW: statistics are kept on individual columns
They tell the control node what column value distributions look like across nodes, so it can answer: “How can I move the least amount of data?”
Create statistics on columns used to filter or join
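A sketch of creating and refreshing column statistics explicitly, assuming a hypothetical `dbo.Orders` table where `CustNo` is a join key and `OrderDate` is a common filter. The statistics names are illustrative.

```sql
-- Hypothetical dbo.Orders table: CustNo is used in joins, OrderDate in filters.
-- One single-column statistics object per column the optimizer needs to estimate.
CREATE STATISTICS stats_Orders_CustNo    ON dbo.Orders (CustNo);
CREATE STATISTICS stats_Orders_OrderDate ON dbo.Orders (OrderDate) WITH FULLSCAN;

-- After a large load, refresh every statistics object on the table.
UPDATE STATISTICS dbo.Orders;
```

`WITH FULLSCAN` reads every row and gives the most accurate histogram at a higher cost; the default sampling is usually enough for the join key.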

39 Loading Data: Getting lots of data in efficiently

40 Loading Methods

41 PolyBase Preferred Method
Scales with DWUs as each compute node is PolyBase capable

42 PolyBase SETUP PROCESS
Copy data into Blob storage (Azure Storage Explorer or AzCopy)
Create: a database scoped credential, an external data source, an external file format, and an external table
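A sketch of the four objects on this slide, plus the database master key that a scoped credential requires. Every object name (`BlobStorageCredential`, `BlobStage`, `PipeDelimitedText`, `dbo.Orders_ext`), the pipe delimiter, the `/orders/` folder, and the angle-bracket placeholders are assumptions to replace with your own values. `TYPE = HADOOP` is the external data source type SQL DW uses for PolyBase over Blob storage.

```sql
-- One-time per database: a master key protects the credential's secret.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

-- Credential holding the storage account key; IDENTITY is not used for key-based access.
CREATE DATABASE SCOPED CREDENTIAL BlobStorageCredential
WITH IDENTITY = 'user',
     SECRET   = '<storage account key>';

-- Where the files live.
CREATE EXTERNAL DATA SOURCE BlobStage
WITH
(
    TYPE       = HADOOP,
    LOCATION   = 'wasbs://<container>@<account>.blob.core.windows.net',
    CREDENTIAL = BlobStorageCredential
);

-- How the files are laid out (pipe-delimited text in this sketch).
CREATE EXTERNAL FILE FORMAT PipeDelimitedText
WITH
(
    FORMAT_TYPE    = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|')
);

-- Schema-on-read table over every file under /orders/ in the container.
CREATE EXTERNAL TABLE dbo.Orders_ext
(
    CustNo    INT,
    OrderNo   INT,
    OrderDate DATE
)
WITH
(
    LOCATION    = '/orders/',
    DATA_SOURCE = BlobStage,
    FILE_FORMAT = PipeDelimitedText
);
```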

43 PolyBase LOAD PROCESS
Initial load: CTAS (CREATE TABLE AS SELECT)
    CREATE TABLE MyTable
    WITH
    (
        CLUSTERED COLUMNSTORE INDEX,
        DISTRIBUTION = HASH(MyDistColumn),
        -- partition boundary values are not shown on the slide
        PARTITION (DateColumn RANGE RIGHT FOR VALUES ('...', '...', ...))
    )
    AS SELECT * FROM ExternalTable;
Incremental load: INSERT INTO; try to keep these in smaller batches
Loads are automatically parallelized
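For the incremental path, a minimal sketch, assuming a hypothetical external table `dbo.Orders_ext` whose folder contains only the new batch of files (PolyBase reads every file under the external table's `LOCATION` on each query) and a matching hypothetical target table `dbo.Orders`. The `LABEL` option tags the statement so the load can be found in `sys.dm_pdw_exec_requests`; the label text is arbitrary.

```sql
-- Append the new batch; the LABEL makes the request easy to find later.
INSERT INTO dbo.Orders (CustNo, OrderNo, OrderDate)
SELECT CustNo, OrderNo, OrderDate
FROM dbo.Orders_ext
OPTION (LABEL = 'Load: Orders incremental');

-- Check progress or elapsed time (milliseconds) of the labeled load.
SELECT request_id, status, total_elapsed_time
FROM sys.dm_pdw_exec_requests
WHERE [label] = 'Load: Orders incremental';
```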

44 BCP LOAD PROCESS
ASCII or UTF-16 only
Run from a machine with the source files (or direct access to them)
    bcp <table name> in <file> -S <server> -d <database> -U <user> -P <password> -c -t "<delimiter>"
Use -c for ASCII character data or -w for UTF-16

45 AZURE SQL DW UPLOAD TASK
SSIS legacy method: use a SQL Server destination and change the connection target; < 10K rows per second
Azure SQL DW Upload Task: part of the SSIS Azure Feature Pack; UTF-8 text files only

46 DTSQL: Diminished Distributed Transact SQL

47 ELT with CTAS
CREATE TABLE AS SELECT: all-or-nothing, minimal logging
Preferred method of ELT (Extract, Load, Transform)
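A sketch of the CTAS pattern for a transform step: build the result as a brand-new table, then swap it in with `RENAME OBJECT` rather than updating the old table in place. It assumes hypothetical tables `dbo.Orders` (source) and an existing `dbo.CustomerOrderSummary` from a previous run.

```sql
-- Transform into a new table: all-or-nothing and minimally logged.
CREATE TABLE dbo.CustomerOrderSummary_new
WITH
(
    DISTRIBUTION = HASH(CustNo),
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT CustNo,
       COUNT_BIG(*)   AS OrderCount,
       MIN(OrderDate) AS FirstOrderDate,
       MAX(OrderDate) AS LastOrderDate
FROM dbo.Orders
GROUP BY CustNo;

-- Swap the new table in by name, then drop the old copy.
RENAME OBJECT dbo.CustomerOrderSummary TO CustomerOrderSummary_old;
RENAME OBJECT dbo.CustomerOrderSummary_new TO CustomerOrderSummary;
DROP TABLE dbo.CustomerOrderSummary_old;
```

If the CTAS fails, the original table is untouched, which is the practical benefit of the all-or-nothing behavior.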

48 NOT SUPPORTED: Many Functions
TRY_CAST(), TRY_CONVERT(): use ISNUMERIC() or ISDATE() before CAST/CONVERT (not perfect)
FORMAT
TRIM
XML/JSON functions
Security: row-level security, dynamic data masking
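Why the `ISNUMERIC()` workaround is "not perfect": it returns 1 for strings such as `'$'`, `'.'`, and `'1e5'`, which still fail a `CAST` to an integer type. A sketch of a stricter guard for whole numbers, assuming a hypothetical staging table `dbo.Staging_Raw` with a `VARCHAR` column `RawValue`:

```sql
-- Hypothetical staging table dbo.Staging_Raw with a VARCHAR column RawValue.
-- ISNUMERIC() alone accepts '$', '.', and '1e5', which CAST to BIGINT rejects,
-- so a digits-only test is added before converting.
SELECT RawValue,
       CASE
           WHEN ISNUMERIC(RawValue) = 1
                AND RawValue NOT LIKE '%[^0-9]%'   -- every character is 0-9
           THEN CAST(RawValue AS BIGINT)
       END AS CleanValue                           -- NULL when the value is rejected
FROM dbo.Staging_Raw;
```

This guard also rejects negative numbers and values with surrounding spaces, and a digit string longer than BIGINT can hold will still raise an overflow error, so it narrows the problem rather than fully replacing TRY_CAST().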

49 NOT SUPPORTED: Miscellanea
MERGE statement
Global temporary tables (##)
Cursors
Geometric / geospatial data
R Services
Also note: pausing or scaling immediately kills all running operations
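Without MERGE, one common substitute is to build the merged result with CTAS and swap it in by name. A sketch, assuming hypothetical tables `dbo.Customer` (target) and `dbo.Customer_stage` (new and changed rows, at most one row per `CustNo`); it covers inserts and updates but not deletes.

```sql
-- Build the merged result as a new table.
CREATE TABLE dbo.Customer_upsert
WITH
(
    DISTRIBUTION = HASH(CustNo),
    CLUSTERED COLUMNSTORE INDEX
)
AS
-- New and changed rows come from staging...
SELECT s.CustNo, s.Name, s.City
FROM dbo.Customer_stage AS s
UNION ALL
-- ...plus existing rows that staging does not touch.
SELECT t.CustNo, t.Name, t.City
FROM dbo.Customer AS t
LEFT JOIN dbo.Customer_stage AS s
    ON s.CustNo = t.CustNo
WHERE s.CustNo IS NULL;

-- Swap the merged table in and drop the old version.
RENAME OBJECT dbo.Customer TO Customer_old;
RENAME OBJECT dbo.Customer_upsert TO Customer;
DROP TABLE dbo.Customer_old;
```

Distributing both tables on `CustNo` keeps the join local to each distribution, so the upsert avoids data movement.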

50 Benchmarks SQL DW is so fast…

51 HOW FAST IS IT? Test data set

52 HOW FAST IS IT? Loading Data (Polybase)

53 HOW FAST IS IT? Star Query

54 QUESTIONS?