Performance Tuning SSIS Brian Knight, CEO Pragmatic Works bknight@pragmaticworks.com Brian
About the Ugly Guy Speaking SQL Server MVP Founder of Pragmatic Works Co-Founder of BIDN.com, SQLServerCentral.com and SQLShare.com Written more than a dozen books on SQL Server
Integration Services in Action. [Diagram: GeoSpatial data (semi-structured), legacy data (binary files), application database, and mobile data flow through SQL Server Integration Services (custom source, standard sources, GeoSpatial components, data-cleansing components, merges, data mining) into the warehouse, cube, and reports.] Integration is a seamless, manageable operation. Source, prepare, and load data in a single, auditable process. Scale to handle heavy and complex data requirements. Speaker notes: Source = use the right tool for the job. Transform = smaller trips, because I still had to come back for more dirt. Destination = bring the car closer, because there's less to run. Devin
Advanced Session
Today’s Problems with Integration Integration today: Increasing data volumes. Increasingly diverse sources. Requirements reached the Tipping Point: Low-impact source extraction. Efficient transformation. Bulk loading techniques. Speaker notes: When my brother entered the field, he had pansy data to deal with, like 100 MB. Now, though, we have real data problems. How many of you have databases over a TB in your environment? That’s right. We now have to deal with terabytes of data and keep it in sync live. My brother only had to deal with flat files and Excel spreadsheets, and we now have to go to the mainframe directly without impacting the mainframe. Devin
Tuning Decisions Choose the right tool for the job Don’t be afraid to use T-SQL Will parallelism work? Brian
Source Optimization Flat files – When available, use Fast Parse OLE DB sources – Change network packet size Use T-SQL whenever possible in the OLE DB Source Joining NULL handling Where clauses
Impact of Compression on ETL * Not official Microsoft results.
Tuning the Source Connection manager tuning Flat file tuning OLE DB Source tuning Brian – 10 min Demo
Transform Components The Pipeline presents the buffer to each downstream component Devin
SSIS Data Flow Architecture Synchronous vs. Non Synchronous Cards example © 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
Case Study: Patterns Devin 83 seconds 105 seconds
Source Data Extraction Extracting data from the source is expensive Efficient extraction is key to improving ETL performance Involves bulk loading data into staging areas or warehouse Time consuming & resource intensive Triggers (synchronous IO penalty) Timestamp columns (Schema changes) Complex queries (delayed IO penalty) Custom (ISV, mirror, snapshot, …) Incremental data load is key to efficient extraction Need to know what changed at source since a point in time Expensive lookups to determine changed columns Providing information up front about which columns changed will improve efficiency Devin
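The idea of knowing "what changed at source since a point in time" can be sketched as a high-water-mark extract. This is only an illustration, not the approach shown in the session: it uses Python's built-in sqlite3 in place of a real source system, and the `customers` table, the `modified_at` column, and the `extract_changes` helper are all hypothetical names made up for the example.

```python
import sqlite3


def extract_changes(conn, last_watermark):
    """Return rows modified after last_watermark, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, name, modified_at FROM customers "
        "WHERE modified_at > ? ORDER BY modified_at",
        (last_watermark,),
    ).fetchall()
    # If nothing changed, keep the old watermark so the next run starts there.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark


# Hypothetical source table with a timestamp column (ISO strings sort correctly).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, modified_at TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "Ann", "2008-01-01T00:00:00"),
        (2, "Bob", "2008-01-02T00:00:00"),
        (3, "Cy", "2008-01-03T00:00:00"),
    ],
)

# First run picks up everything after the stored watermark.
first_batch, watermark = extract_changes(conn, "2008-01-01T00:00:00")
# Second run with no new changes returns nothing and leaves the watermark alone.
second_batch, watermark_after = extract_changes(conn, watermark)
```

The sketch only finds which rows changed. It says nothing about which columns changed or about deleted rows, which is the gap the slide points at.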
SQL Server 2008: Change Data Capture (CDC) Information about what changed at the source Changes captured from the log asynchronously Enabled per table CDC APIs provide access to change data OLTP Change Tables Data Warehouse Brian
Change Data Capture Traditional CDC with SSIS Integrating CDC in 2008 Devin – 8-10 min Demo
Lookup Component Three modes of operation Full Cache: for small lookup datasets No Cache: for volatile lookup datasets Partial Cache: for large lookup datasets Tradeoff memory vs. performance Use Cascaded Lookups Merge Join may be alternative Devin
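A rough sketch of the cascaded lookup idea, assuming it means "try a small fully cached set first, then fall back to a partially cached lookup against the large set". This is plain Python, not SSIS. `FULL_TABLE`, `HOT_CACHE`, `partial_cache_lookup`, and `cascaded_lookup` are hypothetical names, and the in-memory dictionary merely stands in for a reference table in a database.

```python
from functools import lru_cache

# Hypothetical reference data standing in for a large lookup table.
FULL_TABLE = {key: f"name-{key}" for key in range(1000)}

# First lookup: a small, fully cached subset of frequently matched keys.
HOT_CACHE = {key: FULL_TABLE[key] for key in range(10)}

backend_hits = 0  # counts trips to the "database" for illustration


@lru_cache(maxsize=128)
def partial_cache_lookup(key):
    """Second lookup: hit the backend on a miss, then remember the answer."""
    global backend_hits
    backend_hits += 1
    return FULL_TABLE.get(key)


def cascaded_lookup(key):
    """Try the small full cache first; fall through to the partial cache."""
    value = HOT_CACHE.get(key)
    if value is not None:
        return value
    return partial_cache_lookup(key)


results = [cascaded_lookup(key) for key in [1, 2, 500, 500, 3, 500]]
```

In this run only key 500 misses the hot cache, and it reaches the backend once. The repeats are served from the partial cache, which is the memory-versus-performance tradeoff in miniature.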
SQL Server 2008: Lookup Transform Hydrate cache files for large data sets Can reuse cache Can load cache during day and use in nightly ETL Brian
Demo Cascading lookup optimizations Cache file lookup Devin – 5 min
Data Destinations Use “Fast Load” or SQL Server Destination Table Lock on insert operations Trace flags for improvement Old principles still apply Devin
Destination Tuning Devin Demo
Building a Work Queue System Create a work queue table. Create a loop that runs over the work queue, continually checking out work. Spawn the package x times with a batch file.
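The work queue pattern can be sketched as follows. This is a minimal illustration under stated assumptions, not the demo package: sqlite3 stands in for the queue database, threads stand in for the package instances a batch file would spawn, and the `work_queue` table, its columns, and the helper functions are hypothetical. On SQL Server the checkout would more likely be a single atomic statement in the database than a lock held by the client.

```python
import sqlite3
import threading

# One shared in-memory database; the lock serializes access to the connection.
conn = sqlite3.connect(":memory:", check_same_thread=False)
db_lock = threading.Lock()

conn.execute(
    "CREATE TABLE work_queue ("
    "id INTEGER PRIMARY KEY, file_name TEXT, "
    "status TEXT DEFAULT 'pending', worker TEXT)"
)
conn.executemany(
    "INSERT INTO work_queue (file_name) VALUES (?)",
    [(f"file_{i}.dat",) for i in range(8)],
)
conn.commit()


def check_out(worker):
    """Claim the next pending task, or return None when the queue is empty."""
    with db_lock:
        row = conn.execute(
            "SELECT id, file_name FROM work_queue "
            "WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute(
            "UPDATE work_queue SET status = 'in_progress', worker = ? WHERE id = ?",
            (worker, row[0]),
        )
        conn.commit()
        return row


def complete(task_id):
    """Mark a checked-out task as finished."""
    with db_lock:
        conn.execute("UPDATE work_queue SET status = 'done' WHERE id = ?", (task_id,))
        conn.commit()


def worker_loop(worker):
    """Keep checking out work until nothing is left."""
    while True:
        task = check_out(worker)
        if task is None:
            break
        # The real load of task[1] (the file name) would happen here.
        complete(task[0])


threads = [
    threading.Thread(target=worker_loop, args=(f"worker-{n}",)) for n in range(4)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

done_count = conn.execute(
    "SELECT COUNT(*) FROM work_queue WHERE status = 'done'"
).fetchone()[0]
pending_count = conn.execute(
    "SELECT COUNT(*) FROM work_queue WHERE status = 'pending'"
).fetchone()[0]
```

Changing the number of threads plays the role of "spawn x times": the queue table, not the launcher, decides which worker gets which file.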
Demo Results Here is what our first run looked like with each task being processed in sequence by a single package instance.
Demo Results This is what our second run looked like with 2 processes working in parallel. As you can see, the tasks get completed in batches of two and the total demo run time drops by almost half, from about 64 seconds to about 36 seconds.
Demo Results And here is our third run with 4 processes working in parallel. The time for individual tasks has risen from 8 or 9 seconds to 13 or 14 seconds while the total run time has dropped from about 36 seconds to about 28 seconds.
Demo Results Finally, here is a run of the demo with 8 processes. As all tasks get worked on simultaneously, the time for each task has risen to about 27 seconds and the total run time is almost the same as the run with 4 processes. What’s happened here is that we’ve hit a disk I/O bottleneck as all 8 processes contend with each other to read their data files from the disk. To solve this problem, we would want to spread the files across separate disks and controllers or move to a faster disk technology.
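The approximate totals quoted for the 1, 2, and 4 process runs can be turned into speedup and parallel-efficiency figures with a few lines of arithmetic. The 8 process run is left out because the slides give no number for it beyond "almost the same" as 4 processes. The variable names are just for this sketch.

```python
# Approximate total run times in seconds, as quoted on the demo slides.
run_times = {1: 64.0, 2: 36.0, 4: 28.0}

baseline = run_times[1]
speedups = {}
efficiencies = {}
for processes, seconds in run_times.items():
    # Speedup: how many times faster than the single-process run.
    speedups[processes] = baseline / seconds
    # Efficiency: speedup per process; 100% would be perfect scaling.
    efficiencies[processes] = speedups[processes] / processes
    print(
        f"{processes} process(es): speedup {speedups[processes]:.2f}x, "
        f"efficiency {efficiencies[processes]:.0%}"
    )
```

With these inputs the speedup is roughly 1.78x for 2 processes and 2.29x for 4, so efficiency falls from about 89% to about 57%, consistent with the disk I/O contention described above.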
Parallel Load Demo
Managing Resources Logging events to watch pipeline internals: PipelineExecutionPlan, PipelineExecutionTrees, BufferSizeTuning. System Monitor to track I/O issues: Buffers In Use tracks how many buffers are presently being used. Buffers Spooled tracks how many 10 MB buffers have been spooled to disk. Brian
Measuring Performance Perfmon Brian – 6 min
Location Consider the following configuration… Where should SSIS run? (Licensing issues aside) SQL Server 1 SQL Server 2 SSIS Server Brian
WSRM Windows System Resource Manager (WSRM) can throttle CPU and memory Creates a soft throttle Can be scheduled so SSIS gets priority on weekends and nights Only activates policy if resources begin to become constrained (about 70%) WSRM is free with Windows Server 2003 Enterprise Edition and included in Windows Server 2008 Brian
WSRM Creating a soft schedule cap Brian Demo
Summary Planning Use the right tool for the right job Don’t underestimate the power of the whiteboard! Leverage the power of the engine Patterns and Practices Understand best practices But don’t be afraid to experiment Devin/Brian
The End Already? Questions @BrianKnight http://www.bidn.com/people/brianknight bknight@pragmaticworks.com http://www.youtube.com/pragmaticworks