1
MPP – Maximize Parallel Productivity
Getting the most for your efforts and money in SQL DW
2
Agenda
MPP Architecture
Data Warehousing Units (DWUs)
Cost: Storage and Compute
Creating an Instance
Distribution Keys, Indexes, Partitions, and Statistics
Loading Data
DTSQL, the Language of SQL DW
3
About Me
Live in Indianapolis, Indiana, USA
Data Warehousing / Analytics Consultant at DMI
Was a software developer for 8 years
Been in analytics for 10 years
MCSE: Data Management and Analytics
Also hold Hortonworks and SAP BI certifications
4
Architecture SQL DB - SMP (Shared Everything)
5
Architecture SQL DW - MPP (Shared Nothing)
6
Architecture SQL DW – MPP (a little more detail)
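To make the shared-nothing split concrete, here is a minimal Python sketch of the scatter-gather pattern an MPP engine uses for an aggregate: each distribution aggregates only the rows it owns, and the control node merges the small partial results. The function names, the in-memory lists standing in for distributions, and the sales amounts are all hypothetical; this illustrates the idea, not SQL DW internals.

```python
from typing import Dict, List, Tuple

# One "distribution" = the rows a single compute node owns: (city, amount) pairs.
Distribution = List[Tuple[str, float]]


def partial_sum_by_city(rows: Distribution) -> Dict[str, float]:
    """Compute-node step: aggregate only the rows this distribution holds."""
    totals: Dict[str, float] = {}
    for city, amount in rows:
        totals[city] = totals.get(city, 0.0) + amount
    return totals


def combine(partials: List[Dict[str, float]]) -> Dict[str, float]:
    """Control-node step: merge the small per-distribution results."""
    result: Dict[str, float] = {}
    for part in partials:
        for city, amount in part.items():
            result[city] = result.get(city, 0.0) + amount
    return result


def distributed_sum_by_city(distributions: List[Distribution]) -> Dict[str, float]:
    # Scatter: each distribution is aggregated independently
    # (sequentially here; in parallel on real MPP hardware).
    partials = [partial_sum_by_city(d) for d in distributions]
    # Gather: the control node combines the partial results.
    return combine(partials)


# Hypothetical sales amounts spread over three distributions.
sales = [
    [("Duluth, GA", 10.0), ("Renton, WA", 5.0)],
    [("Duluth, GA", 2.5)],
    [("Renton, WA", 1.0), ("Belton, TX", 4.0)],
]
print(distributed_sum_by_city(sales))
# {'Duluth, GA': 12.5, 'Renton, WA': 6.0, 'Belton, TX': 4.0}
```

Only the per-city totals travel to the control node, which is why MPP aggregates scale with the number of compute nodes.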
7
Architecture 100 DWU
8
Architecture 200 DWU
9
Architecture 500 DWU
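The 100, 200, and 500 DWU slides show the same data spread over more compute nodes. A small sketch of that arithmetic, assuming the Gen1 layout in which every database always has 60 distributions and each 100 DWU adds one compute node (so DW100 is 1 node and DW6000 is 60 nodes); other generations or service levels may differ, so treat the constants as assumptions.

```python
from typing import Tuple

DISTRIBUTIONS = 60   # assumed: fixed number of distributions per database (Gen1)
DWU_PER_NODE = 100   # assumed: one compute node per 100 DWU (Gen1)


def node_layout(dwu: int) -> Tuple[int, int]:
    """Return (compute_nodes, distributions_per_node) for a Gen1-style DWU setting."""
    if dwu <= 0 or dwu % DWU_PER_NODE != 0:
        raise ValueError("DWU must be a positive multiple of 100")
    nodes = dwu // DWU_PER_NODE
    if DISTRIBUTIONS % nodes != 0:
        raise ValueError(f"60 distributions cannot be split evenly across {nodes} nodes")
    return nodes, DISTRIBUTIONS // nodes


for dwu in (100, 200, 500):
    nodes, per_node = node_layout(dwu)
    print(f"DW{dwu}: {nodes} compute node(s), {per_node} distributions each")
```

Scaling never changes the 60 distributions themselves; it only changes how many of them each node serves, which is why data does not need to be redistributed when you scale.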
10
Cost
Compute: €8.72 / DWU / month
Storage: billed separately from compute
Data transfer: FREE
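Because compute is priced per DWU, the monthly bill scales linearly with the service level. A sketch of that arithmetic using the slide's €8.72 rate, assuming a 730-hour billing month and that compute is billed only while the warehouse is running (paused hours cost nothing for compute). Storage is billed separately and is not included; the function name is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

RATE_EUR_PER_DWU_MONTH = Decimal("8.72")  # compute rate quoted on the slide
HOURS_PER_MONTH = 730                     # assumption: a 730-hour billing month


def compute_cost_eur(dwu: int, hours_running: int = HOURS_PER_MONTH) -> Decimal:
    """Compute-only cost for one month, pro-rated by the hours the warehouse runs.

    Assumes paused hours are not billed for compute; storage is excluded.
    """
    if not 0 <= hours_running <= HOURS_PER_MONTH:
        raise ValueError("hours_running must be between 0 and 730")
    cost = RATE_EUR_PER_DWU_MONTH * dwu * hours_running / HOURS_PER_MONTH
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(compute_cost_eur(100))        # 872.00  (full month at 100 DWU)
print(compute_cost_eur(500))        # 4360.00 (full month at 500 DWU)
print(compute_cost_eur(500, 365))   # 2180.00 (500 DWU, paused half the month)
```

`Decimal` is used instead of `float` so that currency amounts such as 8.72 are represented exactly.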
11
CREATE AND CONNECT Starting the Process
12
What You Need
Azure account (MSDN / VS subscription, or a trial)
Spending limit
Azure SQL Database (use existing, or create later)
Azure VM (optional)
Software:
  SQL Server Management Studio
  Visual Studio
  SQL Server Data Tools
  Azure Storage Explorer
  Azure Feature Pack for SSIS
17
Create The Instance Provision the SQL DW
18
Create The Instance The Blade
19
Create The Instance Set Firewall Rules
Connecting from outside Azure (for example, SSMS on your workstation) requires a server-level firewall rule that covers your client IP address
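An Azure SQL server-level firewall rule is an inclusive range defined by a start and an end IP address; a connection is accepted if the client's address falls inside any rule. A minimal sketch of that check using Python's `ipaddress` module; the rule names and addresses are hypothetical (taken from the documentation-only IP ranges).

```python
import ipaddress
from typing import List, Tuple

# Hypothetical rules: (name, start IP, end IP), both ends inclusive.
RULES: List[Tuple[str, str, str]] = [
    ("office", "203.0.113.0", "203.0.113.255"),
    ("home", "198.51.100.7", "198.51.100.7"),   # a single-address rule
]


def is_allowed(client_ip: str, rules: List[Tuple[str, str, str]] = RULES) -> bool:
    """True if client_ip falls inside any inclusive [start, end] rule range."""
    ip = ipaddress.ip_address(client_ip)
    for _name, start, end in rules:
        if ipaddress.ip_address(start) <= ip <= ipaddress.ip_address(end):
            return True
    return False


print(is_allowed("203.0.113.42"))   # True: inside the "office" range
print(is_allowed("198.51.100.8"))   # False: one past the "home" rule
```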
20
CONNECT SSMS:
21
CONNECT Visual Studio (2017 has code completion!)
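Besides SSMS and Visual Studio, other clients reach SQL DW through the usual Azure SQL endpoint, `<server>.database.windows.net` on TCP port 1433, with encryption on. A sketch that builds an ODBC connection string in the shape the Azure portal shows, assuming Microsoft's "ODBC Driver 17 for SQL Server" is installed; the server, database, user, and password values are placeholders, and the function name is hypothetical.

```python
def odbc_connection_string(server: str, database: str, user: str, password: str,
                           driver: str = "ODBC Driver 17 for SQL Server") -> str:
    """Build an ODBC connection string for a SQL DW logical server (hypothetical helper)."""
    # Braces let the password contain ';'; a literal '}' inside braces is doubled.
    quoted_password = "{" + password.replace("}", "}}") + "}"
    parts = {
        "Driver": "{" + driver + "}",
        "Server": f"tcp:{server}.database.windows.net,1433",
        "Database": database,
        "Uid": user,
        "Pwd": quoted_password,
        "Encrypt": "yes",
        "TrustServerCertificate": "no",
        "Connection Timeout": "30",
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


print(odbc_connection_string("myserver", "mydw", "loader", "<password>"))
```

The resulting string could then be handed to an ODBC client library such as `pyodbc.connect`.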
22
Storage Concepts: Distribution, Indexes, Partitions, Statistics
23
Distribution: Round Robin
Sample rows:
  Cust #  Name             City
  24      Britton Gray     McCordsville, IN
  72      Pamela Stephens  Duluth, GA
  119     Louis Wright     Gloucester, MA
  240     Amy Crosby       Renton, WA
  278     Max Cook         Belton, TX
Rows are dealt to distributions A, B, and C in turn ("next up" cycles A, B, C, A, B, ...):
  Distribution A: Cust # 24, 240
  Distribution B: Cust # 72, 278
  Distribution C: Cust # 119
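The round-robin walkthrough above can be reproduced in a few lines of Python. The function name is hypothetical, and real SQL DW spreads rows over 60 distributions rather than three; the point is that placement ignores the row's contents, so rows spread evenly but related rows (a customer and that customer's orders) are not guaranteed to land together.

```python
from itertools import cycle
from typing import Dict, List


def round_robin(keys: List[int], distributions: List[str]) -> Dict[str, List[int]]:
    """Deal rows to distributions in turn, ignoring what the row contains."""
    placed: Dict[str, List[int]] = {d: [] for d in distributions}
    next_up = cycle(distributions)            # A, B, C, A, B, ...
    for key in keys:
        placed[next(next_up)].append(key)
    return placed


customers = [24, 72, 119, 240, 278]           # Cust # values from the slide
print(round_robin(customers, ["A", "B", "C"]))
# {'A': [24, 240], 'B': [72, 278], 'C': [119]}
```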
29
Distribution: Hash Distributed (on Cust #)
Using the same five customer rows as the round-robin example, each row goes to the distribution picked by the hash of its Cust #:
  Cust #  Hash Result
  24      B
  72      C
  119     B
  240     A
  278     C
Result:
  Distribution A: Cust # 240
  Distribution B: Cust # 24, 119
  Distribution C: Cust # 72, 278
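The property that matters for hash distribution is determinism: the same key always maps to the same distribution, so two tables hashed on the same column keep matching rows together. A sketch using CRC-32 as a stand-in hash; SQL DW uses its own internal hash function, so the placements printed here will not match the slide's hash results. Function names are hypothetical; the order numbers come from the index slide's example.

```python
import zlib
from typing import Dict, List

DISTRIBUTIONS = ["A", "B", "C"]   # the slide's three; SQL DW itself uses 60


def distribution_for(key: int) -> str:
    """Pick a distribution from a deterministic hash of the distribution key.

    CRC-32 is only a stand-in for SQL DW's internal hash function.
    """
    return DISTRIBUTIONS[zlib.crc32(str(key).encode("utf-8")) % len(DISTRIBUTIONS)]


def hash_distribute(keys: List[int]) -> Dict[str, List[int]]:
    placed: Dict[str, List[int]] = {d: [] for d in DISTRIBUTIONS}
    for key in keys:
        placed[distribution_for(key)].append(key)
    return placed


customers = [24, 72, 119, 240, 278]
print(hash_distribute(customers))

# Orders hashed on the same Cust # column land next to their customer,
# so joining customers to orders on Cust # needs no data movement.
orders = [(1095, 24), (2210, 24), (2901, 24), (1140, 119), (3319, 119)]  # (Order #, Cust #)
for order_no, cust_no in orders:
    print(order_no, "is stored with customer", cust_no, "in", distribution_for(cust_no))
```

The trade-off is skew: if many rows share one key value (or the column has few distinct values), one distribution ends up doing most of the work.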
35
INDEXES: Much like SQL Server, but indexed within a distribution
Default: Clustered Columnstore
  Cust #  Order #  Order Date
  24      1095     3/1/2016
  24      2210     8/15/2016
  24      2901     11/14/2016
  119     1140     3/16/2016
  119     3319     12/10/2016
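A clustered columnstore stores each column's values together rather than row by row, so a query reads only the columns it needs, and repeated values (such as Cust # within one distribution) compress well. A minimal sketch of the pivot plus run-length encoding, one of the compression techniques columnstores apply; it ignores rowgroups and the other encodings, and the function names are hypothetical.

```python
from itertools import groupby
from typing import Dict, List, Tuple

# (Cust #, Order #, Order Date) rows from the slide.
orders: List[Tuple[int, int, str]] = [
    (24, 1095, "3/1/2016"), (24, 2210, "8/15/2016"), (24, 2901, "11/14/2016"),
    (119, 1140, "3/16/2016"), (119, 3319, "12/10/2016"),
]


def to_columns(rows: List[Tuple[int, int, str]]) -> Dict[str, list]:
    """Pivot rows into one list per column, the layout a columnstore uses."""
    names = ("cust_no", "order_no", "order_date")
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def run_length_encode(values: list) -> List[Tuple[object, int]]:
    """Collapse runs of repeated values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


columns = to_columns(orders)
print(columns["order_date"])                   # a date filter reads only this column
print(run_length_encode(columns["cust_no"]))   # [(24, 3), (119, 2)]
```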
36
INDEXES: Also available: B-tree (rowstore) indexes on columns
MSFT: “Use them for high cardinality columns that are used as filters in queries returning a small number of rows.”
37
Partitions: Also much like SQL Server
Partitions exist within each distribution, but must be consistent across distributions: distributions A, B, and C each hold the same Order Date partitions ([2016], [2017]).
Partition switching is particularly effective given the CTAS nature of ELT.
Optimal: make sure each partition will have >1M rows in every distribution.
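Because every distribution holds every partition, a table's rows are divided twice, and partitions that look large table-wide can be small per distribution. A sketch of that check, assuming 60 distributions, an even spread of rows, and the slide's 1M-row guideline applied per partition per distribution; the 600M-row table and function names are hypothetical.

```python
DISTRIBUTIONS = 60          # assumed: fixed distribution count in SQL DW
TARGET_ROWS = 1_000_000     # the slide's ~1M-row guideline, per partition per distribution


def rows_per_partition_per_distribution(total_rows: int, partitions: int) -> int:
    """Average rows in one partition of one distribution (even spread assumed)."""
    return total_rows // (DISTRIBUTIONS * partitions)


def max_partitions(total_rows: int) -> int:
    """Largest partition count that still keeps ~1M rows per partition per distribution."""
    return total_rows // (DISTRIBUTIONS * TARGET_ROWS)


print(rows_per_partition_per_distribution(600_000_000, 12))  # 833333: monthly is too fine
print(rows_per_partition_per_distribution(600_000_000, 10))  # 1000000
print(max_partitions(600_000_000))                           # 10
```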
38
Statistics SQL Server: Statistics kept on tables
SQL DW: Statistics are kept on individual columns
They tell the control node what column value distributions look like across nodes: "How can I move the least amount of data?"
Create statistics on columns used to filter or join
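Statistics are what let the optimizer estimate how many rows a plan would move. SQL DW query plans include broadcast and shuffle data-movement operations; the toy model below compares them for a join where neither table is distributed on the join key. It is a deliberately simplified illustration under stated assumptions (60 distributions, every shuffled row moves once), not SQL DW's actual cost model; the function names are hypothetical.

```python
from typing import Dict

DISTRIBUTIONS = 60  # assumed fixed distribution count


def rows_moved(small_rows: int, large_rows: int) -> Dict[str, int]:
    """Rows each strategy moves when neither table is distributed on the join key.

    Deliberately simplified; not SQL DW's real cost model.
    """
    return {
        # Copy the smaller table in full to every distribution.
        "broadcast": small_rows * DISTRIBUTIONS,
        # Re-hash both tables on the join key (upper bound: every row moves once).
        "shuffle": small_rows + large_rows,
    }


def cheaper_move(small_rows: int, large_rows: int) -> str:
    costs = rows_moved(small_rows, large_rows)
    return min(costs, key=costs.get)


print(cheaper_move(1_000, 10_000_000))       # broadcast: 60,000 vs 10,001,000 rows
print(cheaper_move(5_000_000, 10_000_000))   # shuffle: 300,000,000 vs 15,000,000 rows
```

With stale or missing statistics, the row counts fed into such a comparison are guesses, and the engine can pick the far more expensive move.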
39
Loading Data: Getting lots of data in efficiently
40
Loading Methods
41
PolyBase Preferred Method
Scales with DWUs as each compute node is PolyBase capable
42
PolyBase SETUP PROCESS
1. Copy data into Blob storage (Azure Storage Explorer or AzCopy)
2. Create:
   a database scoped credential (requires a database master key)
   an external data source
   an external file format
   an external table
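The setup objects reference one another, so they must be created in dependency order. A sketch that derives that order with Python's `graphlib` (3.9+) and prints the corresponding T-SQL. All object names are hypothetical; `<key>`, `<container>`, and `<account>` are placeholders. `CREATE MASTER KEY;` without a password is assumed to be accepted here; some environments require `ENCRYPTION BY PASSWORD`.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import List

# Hypothetical object names; <key>, <container>, <account> are placeholders.
DDL = {
    "master_key": "CREATE MASTER KEY;",  # assumption: password optional here
    "credential": ("CREATE DATABASE SCOPED CREDENTIAL BlobCredential "
                   "WITH IDENTITY = 'user', SECRET = '<key>';"),
    "data_source": ("CREATE EXTERNAL DATA SOURCE BlobSource WITH (TYPE = HADOOP, "
                    "LOCATION = 'wasbs://<container>@<account>.blob.core.windows.net', "
                    "CREDENTIAL = BlobCredential);"),
    "file_format": ("CREATE EXTERNAL FILE FORMAT PipeDelimited WITH "
                    "(FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS (FIELD_TERMINATOR = '|'));"),
    "external_table": ("CREATE EXTERNAL TABLE dbo.ExternalTable (CustNo INT, Name NVARCHAR(100)) "
                       "WITH (LOCATION = '/customers/', DATA_SOURCE = BlobSource, "
                       "FILE_FORMAT = PipeDelimited);"),
}

# Each object maps to the objects it references (its predecessors).
DEPENDS_ON = {
    "master_key": set(),
    "credential": {"master_key"},
    "data_source": {"credential"},
    "file_format": set(),
    "external_table": {"data_source", "file_format"},
}


def setup_order() -> List[str]:
    """Object names in an order that satisfies every dependency."""
    return list(TopologicalSorter(DEPENDS_ON).static_order())


for name in setup_order():
    print(DDL[name])
```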
43
PolyBase LOAD PROCESS
Initial load: CTAS (CREATE TABLE AS SELECT)

    CREATE TABLE MyTable
    WITH (CLUSTERED COLUMNSTORE INDEX, DISTRIBUTION = HASH(MyDistColumn),
          PARTITION (DateColumn RANGE RIGHT FOR VALUES ('...', '...', ...)))
    AS SELECT * FROM ExternalTable;

Incremental load: INSERT INTO ... SELECT (try to keep these in smaller batches)
Loads are automatically parallelized
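In `PARTITION (... RANGE RIGHT FOR VALUES (...))`, N boundary values create N + 1 partitions, and `RANGE RIGHT` means a value equal to a boundary belongs to the partition on its right. That makes it the natural fit for date boundaries, where each boundary is the first day of a period. A sketch of the rule with `bisect`; the yearly boundaries are hypothetical, echoing the [2016] / [2017] partitions shown earlier.

```python
from bisect import bisect_left, bisect_right
from datetime import date
from typing import List

# Hypothetical yearly boundaries: 2 boundaries -> 3 partitions.
BOUNDARIES: List[date] = [date(2016, 1, 1), date(2017, 1, 1)]


def partition_number(value: date, boundaries: List[date] = BOUNDARIES,
                     range_right: bool = True) -> int:
    """1-based partition a value falls into.

    RANGE RIGHT: a boundary value belongs to the partition on its right.
    RANGE LEFT:  a boundary value belongs to the partition on its left.
    """
    if range_right:
        index = bisect_right(boundaries, value)
    else:
        index = bisect_left(boundaries, value)
    return index + 1


print(partition_number(date(2015, 12, 31)))                   # 1: before 2016
print(partition_number(date(2016, 1, 1)))                     # 2: boundary goes right
print(partition_number(date(2016, 1, 1), range_right=False))  # 1: boundary goes left
print(partition_number(date(2017, 6, 1)))                     # 3: 2017 and later
```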
44
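BCP LOAD PROCESS
ASCII or UTF-16 files only
Run from a machine with the source files (or direct access to them):

    bcp <table name> in <file> -S <server> -d <database> -U <user> -P <password> -t <delimiter> -c

Use -c for ASCII character data or -w for UTF-16; without one of them (or a format file), bcp prompts for the format of each column.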
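Since bcp accepts only ASCII or UTF-16 here, it can help to check a file before loading and pick `-c` or `-w` accordingly. A sketch under one stated heuristic assumption: UTF-16 is recognized only by its byte-order mark, and anything else must decode as pure ASCII. The function and mapping names are hypothetical.

```python
import codecs
from typing import Optional

# Which bcp flag matches each encoding (-c: character data, -w: UTF-16).
BCP_FLAG = {"ascii": "-c", "utf-16": "-w"}


def bcp_text_encoding(data: bytes) -> Optional[str]:
    """Classify file bytes as 'utf-16', 'ascii', or None (not loadable as-is).

    Heuristic assumption: UTF-16 is detected only via its byte-order mark.
    """
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        return None   # e.g. UTF-8 with non-ASCII characters: convert first


sample = b"24|Britton Gray|McCordsville, IN\n"
encoding = bcp_text_encoding(sample)
print(encoding, BCP_FLAG.get(encoding))   # ascii -c
```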
45
SSIS: Legacy Method and Azure SQL DW Upload Task
Legacy method: use the SQL Server destination and change the connection target (< 10K rows per second)
Azure SQL DW Upload Task: part of the SSIS Azure Feature Pack; UTF-8 text files only
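The legacy method's ceiling of under 10K rows per second puts a hard floor under load times. A quick sketch of that arithmetic for a hypothetical 100-million-row table; the function name is hypothetical.

```python
LEGACY_ROWS_PER_SECOND = 10_000   # upper bound quoted on the slide


def min_legacy_load_seconds(rows: int, rows_per_second: int = LEGACY_ROWS_PER_SECOND) -> float:
    """Lower bound on load time when throughput tops out at rows_per_second."""
    return rows / rows_per_second


seconds = min_legacy_load_seconds(100_000_000)          # hypothetical 100M-row table
print(f"{seconds:,.0f} s = {seconds / 3600:.1f} h")     # 10,000 s = 2.8 h
```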
46
DTSQL: Diminished Distributed Transact-SQL
47
ELT with CTAS (CREATE TABLE AS SELECT)
Preferred method of ELT (Extract, Load, Transform)
All-or-nothing
Minimal logging
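CTAS builds a brand-new table, so a failure leaves the existing table untouched. A Python analogy of the common pattern, assuming the new table is then swapped in by renaming (in SQL DW, with `RENAME OBJECT`): transform into a new object, and replace the old one only if every row succeeded. The function name and data are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def ctas_then_swap(tables: Dict[str, List[Row]], name: str,
                   transform: Callable[[Row], Row]) -> bool:
    """All-or-nothing rebuild: build a complete new table, then swap it in.

    Returns True if the swap happened. If transform fails on any row, nothing
    is swapped and tables[name] keeps its original contents.
    """
    try:
        new_table = [transform(row) for row in tables[name]]   # the CTAS step
    except (ValueError, TypeError, KeyError):
        return False                                           # no partial table is left behind
    tables[name] = new_table                                   # the rename/swap step
    return True


tables = {"stage": [{"qty": "3"}, {"qty": "oops"}]}
print(ctas_then_swap(tables, "stage", lambda r: {"qty": int(r["qty"])}))  # False
print(tables["stage"])   # unchanged: [{'qty': '3'}, {'qty': 'oops'}]
```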
48
NOT SUPPORTED: Many Functions
TRY_CAST(), TRY_CONVERT()
  Workaround: use ISNUMERIC() or ISDATE() before CAST/CONVERT (not perfect)
FORMAT
TRIM
XML/JSON functions
Security: row-level security, dynamic data masking
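The ISNUMERIC() guard is "not perfect" because ISNUMERIC returns 1 for some strings that are not valid numbers, such as a lone plus or minus sign or a currency symbol like `$`, and those still make `CAST(... AS INT)` fail. When cleansing data before it reaches SQL DW, a stricter check helps. A Python analogue of `TRY_CAST(text AS INT)` for illustration (hypothetical name; it accepts optional surrounding spaces and a sign, and returns None on bad input or INT overflow).

```python
import re
from typing import Optional

INT_MIN, INT_MAX = -2**31, 2**31 - 1       # range of the SQL INT type
_INT_PATTERN = re.compile(r"[+-]?[0-9]+")  # ASCII digits only, optional sign


def try_cast_int(text: str) -> Optional[int]:
    """Python analogue of TRY_CAST(text AS INT): the value, or None instead of an error."""
    stripped = text.strip(" ")
    if not _INT_PATTERN.fullmatch(stripped):
        return None                        # e.g. '$', '-', '1.5', '1e5'
    value = int(stripped)
    return value if INT_MIN <= value <= INT_MAX else None   # overflow -> None


for raw in ["42", " 7 ", "$", "-", "1e5", "2147483648"]:
    print(repr(raw), "->", try_cast_int(raw))
```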
49
NOT SUPPORTED: Miscellanea
MERGE statement
Global temporary tables (##)
Cursors
Geometric / geospatial data
R Services
Caveat: pausing or scaling immediately kills all running operations
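Without MERGE, an upsert is usually split into an UPDATE of rows whose keys already exist followed by an INSERT of the rest (or a full rebuild with CTAS). A Python dictionary sketch of those two passes, keyed by Cust #; the function name and the changed city are hypothetical.

```python
from typing import Dict, Tuple

Customer = Tuple[str, str]   # (Name, City)


def upsert(target: Dict[int, Customer], source: Dict[int, Customer]) -> Tuple[int, int]:
    """Apply source rows to target without MERGE: an UPDATE pass, then an INSERT pass.

    Returns (rows_updated, rows_inserted).
    """
    # Pass 1, like UPDATE ... FROM: overwrite rows whose key already exists.
    matched = [key for key in source if key in target]
    for key in matched:
        target[key] = source[key]
    # Pass 2, like INSERT ... WHERE NOT EXISTS: add rows with new keys.
    new_keys = [key for key in source if key not in target]
    for key in new_keys:
        target[key] = source[key]
    return len(matched), len(new_keys)


target = {24: ("Britton Gray", "McCordsville, IN")}
source = {24: ("Britton Gray", "Indianapolis, IN"),     # hypothetical change
          72: ("Pamela Stephens", "Duluth, GA")}
print(upsert(target, source))   # (1, 1)
```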
50
Benchmarks SQL DW is so fast…
51
HOW FAST IS IT? Test data set
52
HOW FAST IS IT? Loading Data (PolyBase)
53
HOW FAST IS IT? Star Query
54
QUESTIONS?