Presented by: Warren Sifre


1 SSIS Do’s and Don’ts
Presented by: Warren Sifre

2 Presenter…
Warren Sifre
Data Analytics Solution Architect, DMI
LinkedIn:

3 Who is Warren???
OCR Racer and American Ninja Warrior (in Training!!!)
Interests in SQL Server, MongoDB, Hadoop, Python/C#/PowerShell and Information Security (Hacking)
Indy BI PASS User Group Founder and Chapter Leader / Indy Power BI User Group Chapter Leader
PASS SQL Saturday / Various PASS User Groups presenter
MCSE, MCSA, MCDBA, Hortonworks HCA, Teradata 14 CTP and many more…

4 Agenda
Data Flow Performance Considerations
Package Performance Considerations
SSIS Frameworks and Design Patterns

5 Data Flow
The Data Flow Task (DFT) is one of the most important architectural decision points when creating an SSIS package. Many aspects of a DFT can affect package execution performance:
Transformation object usage
Source query design and execution
Memory / batching configuration
Data type selection
Connection Manager object type selection

6 Data Flow - Transformation Classifications
Examples of each classification:
Non-Blocking: Lookup, Derived Column, Data Conversion
Partial Blocking: Merge, Merge Join, Pivot / Unpivot, Union All
Blocking: Sort, Fuzzy Lookup, Fuzzy Grouping, Aggregate
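
The difference between non-blocking and blocking transformations can be sketched with Python generators. This is only an analogy, not SSIS code: `derived_column`, `sort_rows`, `source` and the 7% rate are hypothetical names and values chosen for illustration. A streaming step emits a row as soon as it reads one, while a sort has to read every input row before it can emit the first.

```python
from typing import Iterable, Iterator, List


def source(n: int, log: List[str]) -> Iterator[dict]:
    """Hypothetical source that records each row it hands downstream."""
    for i in range(n):
        log.append(f"read {i}")
        yield {"id": i, "amount": float(n - i)}


def derived_column(rows: Iterable[dict]) -> Iterator[dict]:
    """Non-blocking analogy: each row is emitted as soon as it is read."""
    for row in rows:
        yield {**row, "amount_with_tax": round(row["amount"] * 1.07, 2)}


def sort_rows(rows: Iterable[dict], key: str) -> Iterator[dict]:
    """Blocking analogy: all rows are held in memory before the first is emitted."""
    yield from sorted(rows, key=lambda r: r[key])


# Pull only the first output row from each pipeline and count source reads.
log_a: List[str] = []
first = next(derived_column(source(5, log_a)))
rows_read_non_blocking = len(log_a)

log_b: List[str] = []
first_sorted = next(sort_rows(source(5, log_b), key="amount"))
rows_read_blocking = len(log_b)

print(rows_read_non_blocking, rows_read_blocking)
```

The streaming step needed one source row to produce its first output; the sort needed all five, which is why blocking transformations hold up everything downstream and consume more memory.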

7 Data Flow - Source Row Level Considerations
Reduce the number of columns returned by the source.
Reduce the number of records returned by the source.
Reduce the column width of string data types.
Narrower data types reserve less buffer space per column. That lowers the buffer space (memory) consumed by each record, so more records fit in the same buffer.
Fewer records reduce total buffer consumption and the number of times buffers are paged to disk.
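
The effect of row width on buffer usage can be estimated with simple arithmetic. The sketch below is a simplified model, not the engine's actual sizing logic: it assumes the documented defaults of 10 MB for DefaultBufferSize and 10,000 for DefaultBufferMaxRows, and the helper names and row widths are hypothetical.

```python
DEFAULT_BUFFER_SIZE = 10 * 1024 * 1024  # bytes; assumed DefaultBufferSize default
DEFAULT_BUFFER_MAX_ROWS = 10_000        # assumed DefaultBufferMaxRows default


def rows_per_buffer(row_width_bytes: int,
                    buffer_size: int = DEFAULT_BUFFER_SIZE,
                    max_rows: int = DEFAULT_BUFFER_MAX_ROWS) -> int:
    """Rows that fit in one buffer, capped by the max-rows setting."""
    if row_width_bytes <= 0:
        raise ValueError("row width must be positive")
    return max(1, min(max_rows, buffer_size // row_width_bytes))


def buffers_needed(total_rows: int, row_width_bytes: int) -> int:
    """Number of buffers required to move total_rows (ceiling division)."""
    per_buffer = rows_per_buffer(row_width_bytes)
    return -(-total_rows // per_buffer)


# Hypothetical example: a wide 4,000-byte row versus a trimmed 500-byte row.
for width in (4000, 500):
    print(width, rows_per_buffer(width), buffers_needed(1_000_000, width))
```

Under these assumptions, trimming the row from 4,000 to 500 bytes moves the same million rows in far fewer buffers, because each buffer is limited by the row cap rather than by its size.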

8 Data Flow – Additional Considerations
Use SQL Command in place of Table/View with the OLE DB Source.
Perform transformations in the source query where possible.
Replace the Slowly Changing Dimension transformation with T-SQL MERGE or Conditional Splits.
Configure Rows per Batch and Maximum Insert Commit Size to manage TempDB and transaction log usage on the destination instance.
Avoid implicit data type conversion on Flat File connections; implicit conversions consume more buffer space than explicit ones.
Use the SQL Server Destination instead of the OLE DB Destination when data is transferred from and to the same SQL Server instance.
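
The idea behind a commit size can be illustrated outside SSIS. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the destination, so it is an analogy only: the table `stage_orders`, the function `load_in_batches` and the batch size are hypothetical. Committing every N rows keeps each transaction small instead of holding one large transaction open for the whole load.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Tuple


def load_in_batches(conn: sqlite3.Connection,
                    rows: Iterable[Tuple[int, float]],
                    commit_size: int) -> int:
    """Insert rows, committing after every commit_size rows.

    Returns the number of commits issued.
    """
    it = iter(rows)
    commits = 0
    while True:
        batch = list(islice(it, commit_size))
        if not batch:
            break
        conn.executemany(
            "INSERT INTO stage_orders (id, amount) VALUES (?, ?)", batch
        )
        conn.commit()  # each batch is its own transaction
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stage_orders (id INTEGER PRIMARY KEY, amount REAL)")

commits = load_in_batches(
    conn, ((i, i * 1.5) for i in range(10_000)), commit_size=2_500
)
row_count = conn.execute("SELECT COUNT(*) FROM stage_orders").fetchone()[0]
print(commits, row_count)
```

A smaller commit size means more, smaller transactions; a larger one means fewer commits but more log held per transaction. The right value depends on the destination and should be measured.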

9 Package Considerations
Parallel Processing
Grouping Tasks
Reduce Execution Trees

10 Package – Parallel Processing
Processing tasks in parallel can use resources more efficiently.
Use Sequence Containers to run partially serialized groups of tasks in parallel.
Manage the maximum parallelism setting (MaxConcurrentExecutables). The default of -1 means N + 2, where N is the number of logical processors on the machine hosting SSIS.
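
The N + 2 rule can be written as a small helper. This is a sketch in Python rather than SSIS itself: `effective_max_concurrent` and `run_tasks` are hypothetical names, and a thread pool stands in for the SSIS runtime scheduling executables.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def effective_max_concurrent(setting: int, logical_processors: int) -> int:
    """Translate a MaxConcurrentExecutables-style setting into a worker count.

    -1 is treated as "logical processors + 2"; other values below 1 are
    rejected here as invalid (an assumption of this sketch).
    """
    if setting == -1:
        return logical_processors + 2
    if setting < 1:
        raise ValueError("setting must be -1 or a positive integer")
    return setting


def run_tasks(tasks: Sequence[Callable[[], object]], setting: int = -1) -> List[object]:
    """Run independent tasks with at most the effective number of workers."""
    workers = effective_max_concurrent(setting, os.cpu_count() or 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda task: task(), tasks))


print(effective_max_concurrent(-1, 8))   # 8 logical processors
print(run_tasks([lambda: "stage_a", lambda: "stage_b"], setting=2))
```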

11 Package – Reducing Execution Trees
Tasks with many decision points increase the number of decision trees, which in turn increases the resources required to complete the work. Streamlining a package’s decision trees, or splitting one package into several, can improve overall performance.

12 Package – Grouping Tasks
We often load multiple tables that have very little dependency on each other (e.g. stage tables). This is an opportunity to process grouped tasks in parallel.
EXAMPLE: There are 10 tables to load, and loading them serially takes 20 minutes. One table takes 10 minutes on its own. Split the loads into two groups: one group containing only the 10-minute table, and a second group with the other 9 tables in one Sequence Container. Running the two groups in parallel could reduce the execution time from 20 minutes to about 10 minutes.
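
The arithmetic in this example can be checked with a short sketch. The individual durations for the nine smaller tables are hypothetical (the slide only says they total 10 minutes), and the model assumes the groups do not compete for resources.

```python
from typing import List

# Durations in minutes. Group 1 is the single large table; group 2 holds
# the other nine tables, with made-up durations that sum to 10 minutes.
big_table_group: List[float] = [10]
small_tables_group: List[float] = [2, 2, 1, 1, 1, 1, 1, 0.5, 0.5]
groups = [big_table_group, small_tables_group]


def serial_minutes(groups: List[List[float]]) -> float:
    """Every table runs one after another."""
    return sum(sum(group) for group in groups)


def parallel_minutes(groups: List[List[float]]) -> float:
    """Groups run side by side; tables inside a group run serially."""
    return max(sum(group) for group in groups)


print(serial_minutes(groups), parallel_minutes(groups))
```

The parallel run time is set by the slowest group, so balancing the groups matters more than the number of groups.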

13 Package Frameworks and Design Patterns
Extract and Load Packages
Table/Parameter Driven Packages
Master and Sub Packages
ETL vs ELT (aka “T-SQL/Stored Procedure” vs “SSIS Lookups / Transformations”)
Indexes – Drop or Not Drop?

14 Extract and Load Packages
When creating SSIS packages for data warehouse loads, it is good practice to separate the Extract and Load processes into different packages. Separating them provides:
A simpler ETL architecture with less complexity
Easier troubleshooting
Parallel development and troubleshooting
Modifications isolated to one aspect of the DW loads

15 Table/Parameter Driven Packages
SQL tables can be created to store metadata used by SSIS packages. The benefits:
One package template can be used to develop many packages in a short amount of time.
Package execution and execution order can be controlled by a column value in the SQL table.
Package execution activities can be managed by values in records.
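
A minimal sketch of a metadata table driving execution, using Python's `sqlite3` as a stand-in for the SQL Server database. The table `package_config`, its columns `execution_order` and `is_enabled`, and the package names are all hypothetical; a real framework would define its own schema.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE package_config (
        package_name    TEXT PRIMARY KEY,
        execution_order INTEGER NOT NULL,
        is_enabled      INTEGER NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO package_config VALUES (?, ?, ?)",
    [
        ("Load_Customer", 20, 1),
        ("Extract_Customer", 10, 1),
        ("Extract_Orders", 10, 1),
        ("Load_Orders", 20, 0),  # disabled by a column value, no redeploy needed
    ],
)
conn.commit()


def packages_to_run(conn: sqlite3.Connection) -> List[str]:
    """Return enabled packages in the order the metadata dictates."""
    cursor = conn.execute(
        """
        SELECT package_name
        FROM package_config
        WHERE is_enabled = 1
        ORDER BY execution_order, package_name
        """
    )
    return [name for (name,) in cursor]


run_list = packages_to_run(conn)
print(run_list)
```

Changing which packages run, or in what order, becomes a data change in the table rather than a package change.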

16 Master and Sub Packages
A Master Package can control which Sub Package(s) are executed by using Foreach Loop container(s).
Sub Packages with limited dependencies can be processed in parallel by using Groups and Foreach Loop container(s).
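
The master/sub-package pattern can be sketched as a dispatcher that loops over groups. This is an analogy in Python, not SSIS: `run_sub_package` is a hypothetical stand-in for whatever actually launches a sub package, and the group contents are made up. Groups run in parallel; packages inside a group run in order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def run_sub_package(name: str) -> str:
    """Hypothetical stand-in for executing one sub package."""
    return f"{name}: ok"


def run_group(names: List[str]) -> List[str]:
    """Packages within a group run serially, preserving their order."""
    return [run_sub_package(name) for name in names]


def run_master(groups: List[List[str]]) -> List[List[str]]:
    """Run each group on its own worker; results keep the group order."""
    with ThreadPoolExecutor(max_workers=max(1, len(groups))) as pool:
        return list(pool.map(run_group, groups))


results = run_master([
    ["Extract_BigFact"],               # long-running group
    ["Extract_DimA", "Extract_DimB"],  # short packages, serial inside the group
])
print(results)
```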

17 ETL vs ELT (aka “T-SQL/Stored Procedure” vs “SSIS Lookups / Transformations”)
In many cases, ELT is the more performant solution. Transformations performed by SSIS can become bottlenecks, so why not let the DB engine do what it does best: manipulate and process records?
With ELT, you can simplify your process by making Extracts run as quickly as possible, which reduces locking and blocking on source systems.
More people are familiar with T-SQL than with SSIS, so using Stored Procedures in place of Transformation tasks can improve the supportability of the entire process.
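
A small sketch of the ELT shape: land the raw rows in a staging table first, then let the database engine do the lookup and type conversion in one set-based statement. It uses Python's `sqlite3` as a stand-in for SQL Server, and every table and column name here is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stage_sales  (customer_code TEXT, amount TEXT);
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY,
                               customer_code TEXT UNIQUE);
    CREATE TABLE fact_sales   (customer_key INTEGER, amount REAL);
    INSERT INTO dim_customer VALUES (1, 'A'), (2, 'B');
    """
)

# Extract + Load: copy the source rows as-is, with no transformation.
conn.executemany(
    "INSERT INTO stage_sales VALUES (?, ?)",
    [("A", "10.50"), ("B", "4.25"), ("A", "1.00")],
)

# Transform: the engine performs the key lookup (JOIN) and the data
# conversion (CAST) in a single set-based statement.
conn.execute(
    """
    INSERT INTO fact_sales (customer_key, amount)
    SELECT d.customer_key, CAST(s.amount AS REAL)
    FROM stage_sales AS s
    JOIN dim_customer AS d ON d.customer_code = s.customer_code
    """
)
conn.commit()

totals = dict(
    conn.execute(
        "SELECT customer_key, SUM(amount) FROM fact_sales GROUP BY customer_key"
    )
)
print(totals)
```

In a SQL Server implementation the transform step would typically live in a stored procedure called after the extract package finishes.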

18 Indexes – To Drop or Not to Drop
Dropping indexes will ALWAYS improve insert performance. The key question is whether the gain outweighs the cost of dropping and recreating the index by enough to make a noticeable difference.
The user workload against the DW determines whether dropping indexes is an option at all.
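
Whether drop-and-recreate pays off is something to measure rather than assume. The sketch below shows one way to time both approaches, using Python's `sqlite3` as a stand-in for the warehouse; the table, the index `ix_fact_customer` and the row counts are hypothetical, and timings from SQLite say nothing about SQL Server.

```python
import sqlite3
import time
from typing import List, Tuple


def new_db() -> sqlite3.Connection:
    """Create an in-memory fact table with one nonclustered-style index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fact_sales (customer_key INTEGER, amount REAL)")
    conn.execute("CREATE INDEX ix_fact_customer ON fact_sales (customer_key)")
    return conn


def bulk_load(conn: sqlite3.Connection,
              rows: List[Tuple[int, float]],
              drop_index: bool) -> None:
    """Load rows, optionally dropping the index first and rebuilding it after."""
    if drop_index:
        conn.execute("DROP INDEX IF EXISTS ix_fact_customer")
    conn.executemany(
        "INSERT INTO fact_sales (customer_key, amount) VALUES (?, ?)", rows
    )
    if drop_index:
        conn.execute("CREATE INDEX ix_fact_customer ON fact_sales (customer_key)")
    conn.commit()


def timed_load(drop_index: bool, n: int = 50_000) -> Tuple[float, int, bool]:
    """Return (elapsed seconds, rows loaded, index present afterwards)."""
    conn = new_db()
    rows = [(i % 1000, float(i)) for i in range(n)]
    start = time.perf_counter()
    bulk_load(conn, rows, drop_index)
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
    has_index = conn.execute(
        "SELECT COUNT(*) FROM sqlite_master "
        "WHERE type = 'index' AND name = 'ix_fact_customer'"
    ).fetchone()[0] == 1
    return elapsed, count, has_index


for drop in (False, True):
    elapsed, count, has_index = timed_load(drop)
    print(f"drop_index={drop}: {elapsed:.3f}s, rows={count}, index={has_index}")
```

The elapsed time for the drop-and-recreate path includes the rebuild, which is the comparison that matters; the index is also unavailable to user queries while the load runs.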

19 Miscellaneous Considerations
Consider configuring your Data Flow to write buffer overflow (spill) files to a location other than the default, using the BufferTempStoragePath and BLOBTempStoragePath properties. A location on faster physical disk can yield some performance gains.
A descriptive but concise package naming convention helps identify each package's role when using Table/Parameter driven frameworks. Example: Extract_<TableName>, Load_<TableName>

