Presented by: Warren Sifre

Presentation transcript:

SSIS Do’s and Don’ts
Presented by: Warren Sifre

Presenter: Warren Sifre
Data Analytics Solution Architect, DMI
Email: wsifre@dminc.com
Twitter: @WAS_SQL
LinkedIn: www.linkedin.com/in/wsifre

Who is Warren???
- OCR Racer and American Ninja Warrior (in Training!!!)
- Interests in SQL Server, MongoDB, Hadoop, Python/C#/PowerShell and Information Security (Hacking)
- Indy BI PASS User Group Founder and Chapter Leader / Indy Power BI User Group Chapter Leader
- PASS SQL Saturday / Various PASS User Groups presenter
- MCSE, MCSA, MCDBA, Hortonworks HCA, Teradata 14 CTP and many more…

Agenda
- Data Flow Performance Considerations
- Package Performance Considerations
- SSIS Frameworks and Design Patterns

Data Flow
The Data Flow Task (DFT) is one of the most important architectural decision points when creating an SSIS package. Many aspects of a DFT can affect the performance of a package execution:
- Transformation object usage
- Source query design and execution
- Memory / batching configurations
- Data type selection
- Connection Manager object type selection

Data Flow - Transformation Classifications
Examples of each classification:
- Non-Blocking: Lookups, Derived Column, Data Conversion
- Partial Blocking: Merge, Merge Join, Pivot / Unpivot, Union All
- Blocking: Sort, Fuzzy Lookup, Fuzzy Groupings, Aggregates

Data Flow - Source Row Level Considerations
- Reduce the number of columns returned by the source.
- Reduce the number of records returned by the source.
- Reduce the column width of string data types.
Narrower data types reduce the buffer space reserved for each column. This lowers the amount of buffer space (memory) consumed by each record, so more records can be processed in the same buffer space. Returning fewer records reduces total buffer consumption and the number of times paging takes place.
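
The effect of column width on buffer usage can be sketched with simple arithmetic. The snippet below is only a rough model, assuming the data flow caps each buffer at the default DefaultBufferSize (10 MB) and DefaultBufferMaxRows (10,000) settings and that row width is just the sum of the column widths; the real engine's estimate is more involved, and the column widths shown are hypothetical.

```python
# Rough model: rows per buffer is limited by both a byte cap and a row cap.
# Assumption: default DefaultBufferSize (10 MB) and DefaultBufferMaxRows (10,000).
DEFAULT_BUFFER_SIZE = 10 * 1024 * 1024
DEFAULT_BUFFER_MAX_ROWS = 10_000


def estimated_rows_per_buffer(column_widths_bytes,
                              buffer_size=DEFAULT_BUFFER_SIZE,
                              max_rows=DEFAULT_BUFFER_MAX_ROWS):
    """Estimate how many rows fit in one buffer for the given column widths."""
    row_width = sum(column_widths_bytes)
    if row_width <= 0:
        raise ValueError("row width must be positive")
    return min(max_rows, buffer_size // row_width)


# Hypothetical source rows: two 4-byte integers plus one string column.
wide_row = [4, 4, 4000]    # string column declared as 4000 bytes
narrow_row = [4, 4, 100]   # same column trimmed to 100 bytes

rows_wide = estimated_rows_per_buffer(wide_row)
rows_narrow = estimated_rows_per_buffer(narrow_row)
print(rows_wide, rows_narrow)
```

Under these assumptions the wide row is limited by the byte cap, while the narrow row reaches the row cap, which is the "more records in the same buffer space" effect described above.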

Data Flow – Additional Considerations
- Use SQL Command in place of Table/View with an OLE DB Source connection.
- Perform transformations in the source query where possible.
- Replace the Slowly Changing Dimension transformation with T-SQL MERGE or Conditional Splits.
- Configure Rows per Batch and Maximum Insert Commit Size as a way to manage TempDB and transaction log usage on the destination instance.
- Avoid implicit data type conversion on Flat File connection objects; it consumes more buffer space than explicit conversion.
- Use SQL Server Destination instead of OLE DB Destination when data is being transferred from and to the same SQL Server instance.
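
The idea behind a commit size can be illustrated outside SSIS. The sketch below is an analogy only: it uses Python's built-in sqlite3 module rather than an OLE DB Destination, and the table name `stage_orders` and the `load_in_batches` helper are hypothetical. It shows how committing every N rows bounds the size of each open transaction.

```python
import sqlite3


def load_in_batches(conn, rows, commit_size):
    """Insert rows, committing after every `commit_size` rows.

    Returns the number of commits issued.
    """
    if commit_size <= 0:
        raise ValueError("commit_size must be positive")
    commits = 0
    cur = conn.cursor()
    for start in range(0, len(rows), commit_size):
        batch = rows[start:start + commit_size]
        cur.executemany(
            "INSERT INTO stage_orders (id, amount) VALUES (?, ?)", batch
        )
        conn.commit()  # each batch is its own transaction
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stage_orders (id INTEGER PRIMARY KEY, amount REAL)")

rows = [(i, i * 1.5) for i in range(1, 10_001)]
commits = load_in_batches(conn, rows, commit_size=2_500)
loaded = conn.execute("SELECT COUNT(*) FROM stage_orders").fetchone()[0]
print(commits, loaded)
```

A smaller commit size means more, shorter transactions; a larger one means fewer commits but more log held per transaction. The right balance depends on the destination instance.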

Package Considerations
- Parallel Processing
- Grouping Tasks
- Reduce Execution Trees

Package – Parallel Processing
- Processing tasks in parallel can leverage resources more efficiently.
- Use Sequence Containers as a way to manage partially serialized tasks in a parallel manner.
- Manage the maximum parallel processing setting (MaxConcurrentExecutables). The default value of -1 means N + 2, where N is the number of logical processors on the machine hosting SSIS.
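
To see how a concurrency cap changes elapsed time, the following sketch simulates tasks being started in order on a fixed number of slots. It is a simplified model, not how the SSIS runtime schedules executables; the function names and the task durations are hypothetical, and only the N + 2 default comes from the slide.

```python
import heapq
import os


def default_max_concurrent(logical_processors=None):
    """N + 2, where N is the number of logical processors."""
    n = logical_processors if logical_processors is not None else (os.cpu_count() or 1)
    return n + 2


def simulated_elapsed(durations, max_concurrent):
    """Elapsed time when tasks start in list order on `max_concurrent` slots."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    finish_times = []  # min-heap of finish times for running tasks
    elapsed = 0.0
    for duration in durations:
        if len(finish_times) < max_concurrent:
            start = 0.0  # a slot is still free at time zero
        else:
            start = heapq.heappop(finish_times)  # wait for the earliest finisher
        end = start + duration
        heapq.heappush(finish_times, end)
        elapsed = max(elapsed, end)
    return elapsed


# Hypothetical task durations in minutes.
tasks = [5, 3, 2, 4, 1, 6]
for cap in (1, 2, 6):
    print(cap, simulated_elapsed(tasks, cap))
```

With a cap of 1 the tasks run serially; raising the cap shortens the run until the longest single task becomes the limit.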

Package – Reducing Execution Trees
Processing tasks with many decision points increases the number of decision trees, which ultimately increases the resources required to complete the task. Streamlining your package’s decision trees, or segmenting one package into many, can improve overall performance.

Package – Grouping Tasks
There are many instances where we load multiple tables that have very little dependency on each other (e.g. stage tables). This is an opportunity to leverage parallel processing of grouped tasks.
EXAMPLE: There are 10 tables to load, and loading them in serial takes 20 minutes. One of the tables takes 10 minutes to load on its own. By segmenting the table loads into groups, one group can contain the single 10-minute table and the other group can contain the remaining 9 tables in one Sequence Container. This would potentially reduce the execution time from 20 minutes to 10 minutes.
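
The arithmetic in the example above can be written out as a small sketch. The per-table split of the nine smaller tables is invented for illustration (the slide only implies they total 10 minutes), and the model assumes the two groups do not compete for resources.

```python
def serial_minutes(groups):
    """Total time when every table in every group loads one after another."""
    return sum(sum(group) for group in groups)


def parallel_minutes(groups):
    """Time when groups run side by side and tables within a group run serially."""
    return max(sum(group) for group in groups)


# Assumed durations in minutes: one large table, nine small ones totaling 10.
big_table_group = [10]
small_tables_group = [1, 1, 1, 1, 1, 1, 1, 1, 2]
groups = [big_table_group, small_tables_group]

print(serial_minutes(groups))    # all ten tables in serial
print(parallel_minutes(groups))  # two groups in parallel
```

The parallel run is bounded by the slowest group, so balancing the groups around the longest table is what yields the gain.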

Package Frameworks and Design Patterns
- Extract and Load Packages
- Table/Parameter Driven Packages
- Master and Sub Packages
- ETL vs ELT (aka “SSIS Lookups / Transformations” vs “T-SQL / Stored Procedure”)
- Indexes – Drop or Not Drop?

Extract and Load Packages
When creating SSIS packages for Data Warehouse loads, it is good practice to segregate Extract and Load processes into separate packages. Segregating processes allows the following:
- Simplified ETL architecture by minimizing complexity
- Ease of troubleshooting
- Parallel development and troubleshooting
- Modifications are isolated to just one aspect of the DW loads.

Table/Parameter Driven Packages
SQL tables can be created to store metadata leveraged by SSIS packages. The benefits are:
- One package template can be used to develop many packages in a short amount of time.
- Package execution and execution order can be controlled by a column value in the SQL table.
- Package execution activities can be managed by values in records.
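
A minimal sketch of such a metadata table follows, using Python's built-in sqlite3 module as a stand-in for SQL Server. The table name `etl_package_config`, its columns, and the package names are all hypothetical; the point is only that an enabled flag and an order column decide what runs and when.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE etl_package_config (
    package_name    TEXT PRIMARY KEY,
    source_table    TEXT NOT NULL,
    execution_order INTEGER NOT NULL,
    is_enabled      INTEGER NOT NULL
);
INSERT INTO etl_package_config VALUES
    ('Extract_Customer', 'dbo.Customer', 1, 1),
    ('Extract_Orders',   'dbo.Orders',   2, 1),
    ('Extract_Legacy',   'dbo.Legacy',   3, 0),
    ('Load_Customer',    'stg.Customer', 4, 1);
""")


def packages_to_run(conn):
    """Return enabled package names in their configured execution order."""
    rows = conn.execute(
        "SELECT package_name FROM etl_package_config "
        "WHERE is_enabled = 1 ORDER BY execution_order"
    ).fetchall()
    return [name for (name,) in rows]


run_list = packages_to_run(conn)
print(run_list)
```

Disabling a package or changing its position becomes a one-row update in the table rather than a package redeployment.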

Master and Sub Packages
- A Master Package can be leveraged to manage which Sub Package(s) should be executed by using For Each container(s).
- Parallel processing of Sub Packages with limited dependencies is possible by using Groups and For Each container(s).

ETL vs ELT (aka “SSIS Lookups / Transformations” vs “T-SQL / Stored Procedure”)
- In many cases, ELT is the more performant solution. Transformation tasks performed by SSIS can be bottlenecks, so why not let the DB Engine do what it does best: manipulate and process records?
- With ELT, you can simplify your process by ensuring that Extracts perform as quickly as possible, thus reducing locks/blocks on source systems.
- More people are familiar with T-SQL than SSIS, so leveraging Stored Procedures in place of Transformation tasks can improve supportability of the entire process.
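
The ELT approach can be sketched as a single set-based statement that runs after the raw data has been staged. The example below uses Python's built-in sqlite3 module as a stand-in for the DB Engine, and the tables `stg_sales`, `dim_customer` and `fact_sales` are hypothetical. The join does the work that a per-row lookup would otherwise do in the data flow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stg_sales (customer_code TEXT, amount REAL);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_code TEXT);
CREATE TABLE fact_sales (customer_key INTEGER, amount REAL);

-- Extract and Load have already happened: raw rows sit in the stage table.
INSERT INTO stg_sales VALUES ('A', 10.0), ('B', 20.0), ('A', 5.0);
INSERT INTO dim_customer VALUES (1, 'A'), (2, 'B');
""")

# Transform inside the database: resolve surrogate keys with a join
# in one set-based statement.
conn.execute("""
INSERT INTO fact_sales (customer_key, amount)
SELECT d.customer_key, s.amount
FROM stg_sales AS s
JOIN dim_customer AS d ON d.customer_code = s.customer_code
""")
conn.commit()

totals = conn.execute(
    "SELECT customer_key, SUM(amount) FROM fact_sales "
    "GROUP BY customer_key ORDER BY customer_key"
).fetchall()
print(totals)
```

In SQL Server the same statement would typically live in a stored procedure called from an Execute SQL Task, which keeps the extract step free of transformation work.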

Indexes – To Drop or Not to Drop
Dropping indexes will ALWAYS improve insert performance. The key question is whether the time saved on the inserts outweighs the cost of dropping and recreating the indexes by enough to make a noticeable difference. The user workload against the DW will also determine whether dropping indexes is an option.
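
The trade-off described above reduces to a break-even comparison. The sketch below is a simple model with hypothetical timings (in minutes). It ignores the user workload question, which has to be judged separately.

```python
def drop_and_rebuild_pays_off(load_with_indexes, load_without_indexes,
                              drop_time, rebuild_time):
    """True when load + drop + rebuild is faster than loading with indexes in place."""
    total_without = load_without_indexes + drop_time + rebuild_time
    return total_without < load_with_indexes


# Hypothetical measurements, in minutes.
print(drop_and_rebuild_pays_off(30, 12, 1, 10))  # rebuild is cheap relative to the gain
print(drop_and_rebuild_pays_off(30, 12, 1, 20))  # rebuild cost wipes out the gain
```

Measuring all four inputs on a representative load is the only reliable way to fill in the numbers.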

Miscellaneous Considerations
- Consider configuring your Data Flow to write any buffer overflow files to a location other than the default (the BufferTempStoragePath and BLOBTempStoragePath properties). The new location may be on a faster physical disk, which can yield some performance gains.
- A descriptive but concise package naming convention helps identify each package's role when leveraging Table/Parameter driven frameworks. Example: Extract_<TableName>, Load_<TableName>