Microsoft Analytics Platform System 04 – APS Data Loading


Microsoft Analytics Platform System 04 – APS Data Loading
Brian Walker | Microsoft Architect – Data Insights COE
Jesse Fountain | Microsoft WW TSP Lead
November 13, 2018

Agenda
- Data Movement Service
- Load process detail
- Clustered Columnstore on APS
- Loading from remote servers
- Create Table as Select (CTAS)

Data Movement Service (DMS)

Data loading design goals
- Load data efficiently
- Load data unobtrusively, respecting concurrent queries and loads (not true for FASTAPPEND)
- Reduce table fragmentation as much as possible
- Provide system recovery capabilities in the event of a data load failure, with minimal impact on concurrent queries
- Provide multiple load/ETL options for APS customers

Data Movement Service
- Runs on control and compute nodes as a Windows service
- Used for data loading as well as query processing
- Used to quickly move data in parallel between nodes over the InfiniBand network in APS
- Uses ADO.NET
  - Uses the SqlClient namespace to select data from SQL Server
  - Uses SqlBulkCopy to insert data into compute nodes

Data Movement Service
- Uses the following protocols/networks:
  - Data transfer network to move data between nodes
  - Message network to send command and status messages to nodes from the manager
- Interacts closely with the primary APS engine service
- Used for both loading and querying data

Loading modes
- APS offers three standard bulk loading modes: Append, Reload, Upsert
  - Data flow: data source -> staging database -> target database
- And one optimized mode: FastAppend
  - Data flow: data source -> target database

Table geometries impact load performance
[Figure: table types (heap, clustered index, clustered columnstore) in non-partitioned and partitioned configurations, placed on a scale from fastest to slowest load.]

Load process detail

Loading ETL/ELT options with APS
- DWLoader utility
  - Mode APPEND/RELOAD/UPSERT: stage table -> destination table
  - Mode FASTAPPEND: destination table directly
- SQL Server Integration Services (SSIS)
- CREATE TABLE AS SELECT (CTAS)
- Standard SQL DML statements: INSERT/SELECT

SSIS distributed table load – Step 1
(1) SSIS packages are invoked.
(2) Load Manager creates staging tables.
(3) DMS reads load data and buffers records to send to compute nodes in a round-robin manner.
(4) Each row is converted for bulk insert and hashed based on the distribution column.
(5) The hashed row is sent to the appropriate node receiver for loading.
(6) The row is bulk inserted into the staging table.
[Diagram components: loading server (load file/SSIS, Load Client, SSIS API, DMS Distributor); control node (APS Engine, Load Manager, DMS Manager, SQL Server); compute nodes (DMS Converter, Sender, Receiver, Writer); InfiniBand; SAS; JBODs.]

DWLoader table load AU1 – Step 1
(1) DWLoader is invoked.
(2) Load Manager creates staging tables.
(3) DMS Distributor functionality is built into DWLoader, which sends load data to compute nodes in a round-robin manner.
(4) Each row is converted for bulk insert and hashed based on the distribution column.
(5) The hashed row is sent to the appropriate node receiver for loading.
(6) The row is bulk inserted into the staging table.
[Diagram components: loading server (load file with DWLoader, Load Client, SSIS API, DMS Distributor inside DWLoader); control node (APS Engine, Load Manager, DMS Manager, SQL Server); compute nodes (DMS Converter, Sender, Receiver, Writer); InfiniBand; SAS; JBODs.]
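The round-robin buffering and hash routing in steps (3) to (5) can be sketched as below. This is only an illustrative model: it assumes 6 compute nodes with 8 distributions each, and it uses CRC32 as a stand-in hash, since the slides do not describe the hash function APS actually applies. The names `distribution_for` and `route_rows` are hypothetical.

```python
import zlib
from collections import defaultdict
from itertools import cycle

NODES = 6                    # assumed appliance size
DISTRIBUTIONS_PER_NODE = 8   # assumed: 48 distributions spread over 6 nodes


def distribution_for(key: str) -> int:
    """Map a distribution-column value to a distribution number.

    CRC32 is a stand-in; the real APS hash function is not given in the slides.
    """
    return zlib.crc32(key.encode("utf-8")) % (NODES * DISTRIBUTIONS_PER_NODE)


def route_rows(rows, key_column):
    """Model steps (3)-(6) for a list of dict rows."""
    converted_by = [0] * NODES        # rows handed to each node's converter
    staging = defaultdict(list)       # (owning node, distribution) -> staged rows
    next_node = cycle(range(NODES))   # step 3: round-robin over compute nodes
    for row in rows:
        converted_by[next(next_node)] += 1               # step 4 happens on this node
        dist = distribution_for(str(row[key_column]))    # step 4: hash the key
        owner = dist // DISTRIBUTIONS_PER_NODE           # step 5: node owning the distribution
        staging[(owner, dist)].append(row)               # step 6: insert into staging
    return converted_by, staging
```

The point of the two-stage design is that conversion work is spread evenly (round-robin), while the final placement depends only on the distribution column value, so equal keys always land in the same distribution.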

APS distributed table load – Step 2
- Step 1 (staging database): DWLoader creates topology-equivalent staging tables and moves data from the loading server file into the staging tables using DMS
- Step 2 (destination database): DWLoader uses SQL commands to move data from the staging tables to the destination tables
- Note: distributions of a table are written in parallel when the multi-transactions option is set to true

Controlling parallel loads through soft-NUMA
[Diagram: on a compute node, NUMA nodes 1 through 8 each run the same pipeline in parallel: load file, bulk insert into a partitioned staging table, sort each batch in memory or TempDB, then insert-select into the partitioned final table.]

Data-loading considerations
- APPEND, RELOAD, UPSERT
  - Staging tables are locked during the entire load process
  - The destination table is locked only in the final step and only at the row level, minimizing the impact on table reads during loads
- FASTAPPEND
  - Eliminates the staging load, but locks the destination table with an Exclusive-U-Lock at the table level, thereby affecting table reads during loads

Loading example: 1bn rows, 6 nodes
Replicated table
- 1bn rows * 6 nodes
- 6 target tables
- 1bn rows per node
- 6bn rows in total
- DMS workers
  - 1 reader worker per node
  - 1 writer worker per node, regardless of target table index geometry
  - No additional lift in performance when the target is a heap
Distributed table
- 1bn rows / 6 nodes
- 48 target tables
- ~166.7m rows per node
- ~20.8m rows per distribution
- DMS workers
  - 1 reader worker per distribution
  - 1 writer worker per distribution when the target is indexed
  - More than 1 writer worker per distribution when the target is a heap: up to 8x uplift over replicated tables
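The figures in this example follow from simple arithmetic, reproduced in the sketch below. The 8 distributions per node is inferred from the slide's 48 target tables over 6 nodes; `load_shape` is a hypothetical helper name.

```python
def load_shape(total_rows, nodes, distributions_per_node):
    """Return row counts for a replicated and a distributed load of the same data."""
    # Replicated: every node receives a full copy of the table.
    replicated = {
        "target_tables": nodes,
        "rows_per_node": total_rows,
        "rows_total": total_rows * nodes,
    }
    # Distributed: rows are split across all distributions on all nodes.
    distributions = nodes * distributions_per_node
    distributed = {
        "target_tables": distributions,
        "rows_per_node": total_rows / nodes,
        "rows_per_distribution": total_rows / distributions,
    }
    return replicated, distributed


replicated, distributed = load_shape(1_000_000_000, nodes=6, distributions_per_node=8)
print(replicated["rows_total"])                               # 6000000000
print(round(distributed["rows_per_node"] / 1e6, 1))           # 166.7
print(round(distributed["rows_per_distribution"] / 1e6, 1))   # 20.8
```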

Clustered Columnstore on APS

Columnstore overview: Introduction
- Columnstore indexes are designed for data-warehouse-type queries where only a portion of the table columns are required
- Enables users to isolate the required data far more efficiently than with traditional row-based storage
- Typically provides higher compression ratios, because tables generally contain more duplicate values within a column than within a row:
  - Page compression is normally ~2.5x to 3.5x, depending on data
  - Columnstore is normally ~5x to 15x, depending on data
- Higher compression ratios contribute to a greater ROI for raw storage
- New batch mode processing enables lower CPU utilization for the same number of rows processed
- Supports important existing data warehouse functionality such as partition switching, splitting, and merging

© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Columnstore overview: Terminology
- A clustered columnstore index comprises two parts: the columnstore and the deltastore
- Data is compressed into segments
  - Ideally ~1 million rows (subject to system resource availability) per distribution, per partition
- A collection of segments representing a set of entire rows is called a row group
- The minimum unit of I/O between disk and memory is a segment (shown as a red block in the diagram)
- The execution batch model (as opposed to traditional row mode) moves multiple rows between iterators
  - ~1,000 rows per distribution, per partition
- Dictionaries (primary and secondary) are used to store additional metadata about segments
[Diagram: columns C1 to C6 stored as segments within row groups in the columnstore, alongside a delta (row) store holding C1 to C6 as rows.]

Columnstore overview: More than compression – batch mode
- SQL 2012 implements batch mode processing to handle rows a batch at a time in addition to a row at a time
  - SQL 2008 and earlier only had row processing
- Typically, batches of about 1,000 rows are moved between iterators
  - Significantly less CPU is required because the average number of instructions per row decreases
- Batch mode processing is only available for some operators
  - Hash Join and Hash Aggregate are supported
  - Merge Join, Nested Loop Join, and Stream Aggregate are not supported
- Row mode scan example: SELECT COUNT(*) FROM FactInternetSales_Row (6704 ms)
- Batch mode scan example: SELECT COUNT(*) FROM FactInternetSales_Column (352 ms)

Columnstore DDL syntax: CTAS with columnstore
Example: CTAS query (total rows: 60,398)
    CREATE TABLE FactInternetSales_Copy
    WITH (CLUSTERED COLUMNSTORE INDEX, DISTRIBUTION = HASH(SalesOrderNumber))
    AS SELECT * FROM FactInternetSales
Order of operations for the query:
1. Create table FactInternetSales_Copy
2. Create clustered columnstore index on FactInternetSales_Copy
3. Insert into FactInternetSales_Copy from FactInternetSales
- Because the clustered columnstore index is created BEFORE data is inserted, an INSERT of under 16384 rows per distribution will populate the deltastore
- No records in the resulting table are stored in columnstore
- To compress them, issue the statement: ALTER INDEX <IndexName> ON FactInternetSales_Copy REBUILD
- Note: Q_Tables (internal temp tables) never create a clustered columnstore index during data movement
[Screenshots: row group state after the initial CTAS and after REBUILD.]
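A rough way to anticipate whether a CTAS result will sit in the deltastore is to compare the expected rows per distribution with the threshold quoted on the slide. The sketch below assumes rows hash evenly across distributions and takes the 16384 figure from the slide as given; the number of distributions behind the 60,398-row example is not stated, so the value 8 used here is an assumption, and `stays_in_deltastore` is a hypothetical helper.

```python
DELTASTORE_THRESHOLD = 16_384  # rows per distribution, as quoted on the slide


def stays_in_deltastore(total_rows, distributions, threshold=DELTASTORE_THRESHOLD):
    """Return True if an even spread leaves each distribution under the threshold."""
    rows_per_distribution = total_rows / distributions
    return rows_per_distribution < threshold


# The slide's 60,398-row CTAS, assuming 8 distributions (about 7,550 rows each).
print(stays_in_deltastore(60_398, distributions=8))          # True
# A 1bn-row load over 48 distributions is far above the threshold.
print(stays_in_deltastore(1_000_000_000, distributions=48))  # False
```

When the check returns True, the REBUILD statement shown on the slide is what moves the rows into compressed columnstore segments.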

Columnstore metadata: SQL DMVs extended for PDW
- Existing SQL 2012 DMVs, extended for PDW:
  - sys.pdw_nodes_column_store_segments
  - sys.pdw_nodes_column_store_dictionaries
  - sys.pdw_nodes_column_store_row_groups (shown earlier in the INSERT example)
- sys.indexes and sys.index_columns have also been modified for columnstore information
[Screenshot: DELETE row count.]

Workload management overview: Resource classes
- PDW V1 had 32 concurrency slots, where each user was assigned a single slot
- PDW V2 introduced pre-built resource classes to allocate more resources to a single connection:
  - mediumrc
  - largerc
  - xlargerc
- Each defined resource class is created as a server role on PDW
- Each resource class is a pre-defined set of appliance resources:
  - PDW concurrency slots to use
  - Memory availability (max memory)
  - Priority
- Syntax: ALTER SERVER ROLE XLargeRC ADD MEMBER charlesf

Workload management overview: Resource classes DMVs
- Resource Governor DMVs added to PDW V2
- The existing DMV sys.dm_pdw_exec_requests is extended to include the resource_class that the request came from
- New DMVs added to PDW V2:
  - sys.dm_pdw_resource_waits
  - sys.dm_pdw_nodes_resource_governor_resource_pools
  - sys.dm_pdw_nodes_resource_governor_workload_groups (example shown in the slide screenshot)

Workload management overview: Resource classes resource allocation

Resource class     Concurrency slots in use   Max memory per dist (V1)   Max memory per dist (V2)   Priority
Default behavior   1                          ~150 MB                    ~400 MB                    Medium
MediumRC           3                          ~450 MB                    ~1.2 GB                    Medium
LargeRC            7                          ~1.05 GB                   ~2.8 GB                    High
XLargeRC           21                         ~3.15 GB                   ~8.4 GB                    High
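The memory figures in this allocation are consistent with memory scaling linearly with the number of concurrency slots, which the sketch below checks. It also estimates how many requests of one class could run at once, assuming 32 slots in total (the slides quote 32 for PDW V1; carrying that figure over to V2 is an assumption) and that a request holds its class's full slot count. Function names are hypothetical.

```python
TOTAL_SLOTS = 32                           # quoted for PDW V1; assumed unchanged for V2
BASE_MEMORY_MB = {"V1": 150, "V2": 400}    # default-class memory per distribution
SLOTS = {"default": 1, "mediumrc": 3, "largerc": 7, "xlargerc": 21}


def max_memory_mb(resource_class, version):
    """Memory per distribution, assuming it scales with slots (matches the slide)."""
    return SLOTS[resource_class] * BASE_MEMORY_MB[version]


def max_concurrent(resource_class):
    """How many requests of one class fit into the slot budget at the same time."""
    return TOTAL_SLOTS // SLOTS[resource_class]


for rc in SLOTS:
    print(rc, max_memory_mb(rc, "V1"), max_memory_mb(rc, "V2"), max_concurrent(rc))
# default 150 400 32
# mediumrc 450 1200 10
# largerc 1050 2800 4
# xlargerc 3150 8400 1
```

Under these assumptions a single XLargeRC request uses most of the slot budget, which is in line with the deck's advice to reserve that class for maintenance tasks.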

Workload management overview: Workload management and columnstore
- Assigning the correct resource class is vital to the quality of the columnstore index created
- Look at the difference in segment size between the default and XLargeRC resource classes
- ALTER SERVER ROLE XLargeRC ADD MEMBER charlesf
[Screenshots: segment sizes under the Default and XLargeRC resource classes.]

Presentation overview: Conclusion
- Two factors contribute to exceptional query performance in PDW V2:
  - Columnstore index segment elimination
  - Batch mode execution
- Columnstore indexing in PDW V2 is significantly advanced from SQL Server 2012:
  - Single copy of data
  - Full DML supported
  - Query processor enhancements
- Be aware of the circumstances in which data is automatically compressed (rather than delta stored)
- Server roles in PDW V2 enable users to be categorized for workload management
- XLargeRC should generally be reserved for memory-intensive maintenance tasks

Loading from remote servers

Loading options
- Command line: dwloader.exe
- SSIS 2010, 2012 & 2014
- PolyBase: Hadoop, Azure
[Diagram labels: file based, heterogeneous sources.]

Loading from remote servers
- Loading server
  - Any remote server connected to APS (InfiniBand connection preferred)
  - Multiple loading servers allowed
- Requires client tools only
  - No DMS or other APS services running

Data loading with dwloader
- Command-line utility invoked on a load server
- Integrated with DMS (even more integrated since V2 AU1)
- Streamlines I/O and minimizes data-loading times through powerful parallel loading functionality against a single text file
- Optimizes data load speeds while maintaining a performance balance to avoid degrading concurrently running queries

Dwloader syntax
- dwloader -h
  - Displays simple help information about using the loader and its switches
- dwloader -M append -i DimAccount.txt -T AdventureWorksDW.dbo.DimAccount -R DimAccount.bad -t "|" -r 0x0d0x0a -U sa -P test -D "yyyy-MM-dd HH:mm:ss.fff" -m -S 10.10.10.1
  - Requires the APS control node IP
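When loads are scripted, the command line above can be assembled as an argument list, which avoids shell quoting problems with the "|" delimiter. The sketch below only builds the list; it uses the switches that appear on the slide, with their letter case restored by assumption because the transcript title-cased the original. The function name and parameter names are hypothetical.

```python
def dwloader_args(server, user, password, source_file, target_table, reject_file,
                  mode="append", field_delimiter="|", row_delimiter="0x0d0x0a",
                  date_format=None, multi_transaction=True):
    """Build a dwloader argument list from the switches shown on the slide."""
    args = [
        "dwloader",
        "-M", mode,             # load mode, e.g. append
        "-i", source_file,      # input file
        "-T", target_table,     # database.schema.table
        "-R", reject_file,      # file that receives rejected rows
        "-t", field_delimiter,  # field delimiter
        "-r", row_delimiter,    # row delimiter
        "-U", user,
        "-P", password,
    ]
    if date_format is not None:
        args += ["-D", date_format]
    if multi_transaction:
        args.append("-m")       # multi-transaction option
    args += ["-S", server]      # APS control node IP
    return args


cmd = dwloader_args("10.10.10.1", "sa", "test", "DimAccount.txt",
                    "AdventureWorksDW.dbo.DimAccount", "DimAccount.bad")
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a load server where the client tools are installed.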

DWLoader installation
- Set the default APS control node IP, either:
  - With the ClientTools.msi command-line parameter: CONTROL_NODE_ADDRESS = "10.10.10.1"
  - Manually, in the registry:
    - Key: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server Anlalytical Platform Server\Configuration\Loader
    - Value (REG_SZ): ControlNodeAddress = 10.10.10.1

Characteristics of dwloader
- Accommodates initial data loads of large files over 300 GB
- Achieves data load speeds of more than 3 TB per hour (1 rack)
- Accommodates multiple and concurrent incremental loads
- Offers settings for canceling loads and showing their status
- Enables input files to reside on any external server (preferably connected to InfiniBand)
- Maximum concurrency of 10; subsequent loads are queued
- Supports direct load of compressed files (Zip or Gzip compatible formats, not newer formats like Gz2)

Third-party loading options
- Informatica PowerCenter (versions up to 9.5.1)
  - Windows environments only
  - Default operation: row by row
  - APS loader: bulk functionality
- SAP BusinessObjects Data Services (BODS)
- Attunity Replicate
  - Trickle loading
  - Uses dwloader under the hood

Loading limitations
- Maximum of 10 concurrent loads across the appliance
  - SSIS: 10 destination adapters sending
  - dwloader: 10 load commands
- Queue: up to 40 loads will be queued
- The 51st concurrent load returns an error
- The loading limit is applied across all loading methods
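The limits above behave like a simple admission gate: 10 running, 40 waiting, anything beyond that rejected. The sketch below models just that counting logic; it is not how APS implements its queue, and the `LoadQueue` class is hypothetical.

```python
MAX_RUNNING = 10   # concurrent loads across the appliance
MAX_QUEUED = 40    # loads allowed to wait


class LoadQueue:
    """Toy model of the appliance-wide load limit."""

    def __init__(self):
        self.running = 0
        self.queued = 0

    def submit(self):
        """Admit a load: run it, queue it, or reject it."""
        if self.running < MAX_RUNNING:
            self.running += 1
            return "running"
        if self.queued < MAX_QUEUED:
            self.queued += 1
            return "queued"
        return "error"

    def finish(self):
        """A running load completes; a queued load, if any, takes its place."""
        if self.running == 0:
            raise RuntimeError("no load is running")
        if self.queued > 0:
            self.queued -= 1      # the promoted load keeps the running count at 10
        else:
            self.running -= 1


queue = LoadQueue()
outcomes = [queue.submit() for _ in range(51)]
print(outcomes.count("running"), outcomes.count("queued"), outcomes[50])
# 10 40 error
```

Because the limit applies across all loading methods, an SSIS destination adapter and a dwloader command would draw from the same counters in this model.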

Create Table as Select (CTAS)

Create Table as Select
- Creates a new table based on a SELECT statement
- Can be used to change:
  - A distributed table to a replicated table, or vice versa
  - The distribution column
  - The clustered index, or to change the table to a heap
  - Partitioning
- Use it to periodically defragment tables

CTAS compared to SQL INSERT with a SELECT: INSERT/SELECT
- Runs in parallel across nodes but "sequentially" across distributions
  - Each distribution is run separately to provide transaction safety across the entire appliance
- Runs within a SQL Server transaction
  - Full transaction rollback occurs in the event of failure

CTAS compared to SQL INSERT with a SELECT: CTAS
- Runs fully in parallel across nodes and across distributions
- Minimal logging
- Performance: roughly eight times better than INSERT/SELECT
- CREATE TABLE statement: the destination table can be used with partition switching, if it exists

CTAS compared to SQL INSERT with a SELECT – Total duration
- INSERT/SELECT: the longest combined execution of all distributions on one node
  - Be careful how you read distribution durations
  - This is normal: Dist_A = 5 min, Dist_H = 40 min
- CTAS: the longest distribution
  - Dist_A = 5 min, Dist_H = 5 min +/- (depends on several factors)
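The two duration rules can be written out as a small model. It assumes each distribution takes a fixed time, that INSERT/SELECT runs a node's distributions one after another, and that CTAS runs all of them at once; real timings depend on several factors, as the slide notes. Function names are hypothetical.

```python
from itertools import accumulate


def insert_select_duration(per_node_minutes):
    """Nodes run in parallel, distributions within a node run sequentially."""
    return max(sum(node) for node in per_node_minutes)


def ctas_duration(per_node_minutes):
    """Everything runs in parallel, so the slowest distribution decides."""
    return max(max(node) for node in per_node_minutes)


# 6 nodes, 8 distributions (Dist_A .. Dist_H) per node, 5 minutes of work each.
appliance = [[5] * 8 for _ in range(6)]

print(insert_select_duration(appliance))   # 40
print(ctas_duration(appliance))            # 5
# Finish times on one node under INSERT/SELECT: Dist_A ends at 5, Dist_H at 40.
print(list(accumulate(appliance[0])))      # [5, 10, 15, 20, 25, 30, 35, 40]
```

This is why a reported 40 minutes for Dist_H under INSERT/SELECT is normal: it reflects waiting for the seven distributions before it, not slower work. The ratio of 40 to 5 also matches the roughly eightfold difference the deck quotes.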

CREATE TABLE AS SELECT syntax
    CREATE TABLE [ database_name.[ dbo ]. | dbo. ] table_name
        [ WITH (
            DISTRIBUTION = { HASH( distribution_column_name ) | REPLICATE }
            [ , <CTAS_table_option> [ ,...n ] ]
        ) ]
        AS SELECT <select_criteria>
    [;]

CREATE TABLE AS SELECT – CTAS table option
    {
        [ LOCATION = USER_DB ]
        [ CLUSTERED INDEX ( { index_column_name [ ASC | DESC ] } [ ,...n ] ) ]
      | [ PARTITION( partition_column_name RANGE [ LEFT | RIGHT ]
            FOR VALUES ( [ boundary_value [ ,...n ] ] ) ) ]
    }

CTAS – Simple example
    CREATE TABLE Orders_defrag
    WITH (DISTRIBUTION = HASH (order_number),
          CLUSTERED INDEX (order_date ASC))
    AS SELECT * FROM Orders;

Microsoft Analytics Platform System 11/13/2018 © 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.