Designing SSIS Packages for Performance. Eleanor Stahura and Erin Dempster.
Thank you Sponsors! Platinum Sponsor: Gold Sponsors: Visit the Sponsor Booths. Lots of Raffle Prizes! Get your parking paid via Sponsor Bingo.
PASSMN – News/Info Sponsors: Thanks to all our sponsors of 2018! We need Sponsors for Nov/Dec 2018 and 2019! Special thanks to our annual sponsor: Board Member Elections in November/December: Your chance to help out the MN SQL community!
November 5th Through November 9th. Join the brightest data professionals focused on the Microsoft Data Platform! Pre-Conference Sessions – Monday/Tuesday. Conference – Wednesday through Friday.
SQLSaturday #796 – After Party. Location: 4th Floor of Mall of America. Time: 6:30PM – 10PM. There will be drinks and appetizers as well as free game cards and bowling! Hang out with some new friends you’ve made.
The Presenters
Eleanor Stahura: 6 years' experience. Database, Data Warehouse & Report development. SQL Server 2008 – 2016; SSIS; SSAS; SSRS / Power BI. Current grad student at University of St. Thomas (MS Software Engineering).
Erin Dempster: 15 years' experience. Certified since 2004 (MCDBA on SQL Server 2000). Transactional and Analytical developer. Application developer (VB 6 and C# .Net). Database Administrator (SQL 2008 – 2014). Current grad student at Dakota State University (MS Analytics).
Outline: Performance in SSIS; Different Types of Blocking; Dimension ETL Optimization; Fact ETL Optimization.
Scenario: Operations needs to be able to track inventory by day. Incremental inventory extracts are available to be consumed. New customers are coming in every day. Other customers are updating their contact information. Reports need to reflect the current customers and their attributes.
Scenario: Build an SSIS package to run today… and tomorrow… and next month… and next year… and the data volume is growing every day… and other things are running on the server.
Building Strong SSIS Packages: More than just getting them to work.
Why Does Performance Matter? Most obvious: faster = better. Grows more happily. Makes less mess. Plays better with others.
Start Thinking Performance
Blocking: The degree to which a single row of data can be processed independently from other rows.
Types of SSIS Blocking
Fully Blocking: Requires the entire data stream. Increased memory usage. Generally decreases performance. Includes: Sort, Fuzzy Lookup, Aggregate.
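As a rough illustration of why a fully blocking step costs memory, here is a minimal Python sketch. It is an analogy, not SSIS code: `source`, `derived_column`, and `sort_rows` are hypothetical names standing in for a data source, a row-by-row transformation, and a Sort. The sketch counts how many input rows must be read before the first output row appears.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, int]

def source(n: int, log: List[str]) -> Iterator[Row]:
    """Hypothetical source: yields rows in descending id order and logs each read."""
    for i in range(n, 0, -1):
        log.append(f"read {i}")
        yield {"id": i, "qty": i, "price": 2}

def derived_column(rows: Iterable[Row]) -> Iterator[Row]:
    """Row-by-row step: each row is transformed and passed on immediately."""
    for row in rows:
        yield {**row, "total": row["qty"] * row["price"]}

def sort_rows(rows: Iterable[Row], key: str) -> Iterator[Row]:
    """Fully blocking step: nothing is emitted until every input row has arrived."""
    buffered = list(rows)  # the whole stream is held in memory
    buffered.sort(key=lambda r: r[key])
    yield from buffered

# Row-by-row: the first output appears after a single read.
log: List[str] = []
first_derived = next(derived_column(source(3, log)))
reads_before_first_derived = len(log)

# Fully blocking: the first output appears only after all reads.
log = []
first_sorted = next(sort_rows(source(3, log), "id"))
reads_before_first_sorted = len(log)
```

With three input rows, the row-by-row step needs one read before it can emit, while the sort needs all three. The same effect at 2.67 million rows is what drives memory use up.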
Video - Fully Blocking
Execution Results – Sort and Merge. Time Elapsed: 1 minute 52 seconds. Number of records: 2.67 million. Memory used: 650MB. What happened? 2 Sort transformations; all records stored in memory.
Semi Blocking: Doesn’t require the entire data stream, but new thread(s) are created to run asynchronously. Includes: Merge Join, Pivot/Un-pivot, Union All.
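To make the semi-blocking idea concrete, the following Python sketch joins two inputs that are already sorted by key, which the SSIS Merge Join also requires. It is only an analogy: `merge_join` is a hypothetical helper, keys are assumed unique on both sides, and the threading behavior mentioned on the slide is not modeled. The point is that output can start before either input is exhausted, but a row may have to wait for the other input to catch up.

```python
from typing import Iterable, Iterator, Tuple

def merge_join(
    left: Iterable[Tuple[int, str]],
    right: Iterable[Tuple[int, str]],
) -> Iterator[Tuple[int, str, str]]:
    """Inner join of two streams sorted ascending by a unique integer key."""
    left_it, right_it = iter(left), iter(right)
    l = next(left_it, None)
    r = next(right_it, None)
    while l is not None and r is not None:
        if l[0] == r[0]:
            # Keys match: emit the joined row and advance both sides.
            yield (l[0], l[1], r[1])
            l = next(left_it, None)
            r = next(right_it, None)
        elif l[0] < r[0]:
            # Left is behind: advance it until it catches up.
            l = next(left_it, None)
        else:
            # Right is behind: advance it until it catches up.
            r = next(right_it, None)

customers = [(1, "Ann"), (2, "Bob"), (4, "Dee")]
orders = [(2, "X"), (3, "Y"), (4, "Z")]
joined = list(merge_join(customers, orders))
```

Only one row from each side is held at a time here, which is why this style of join uses far less memory than a sort, as long as the inputs arrive in order.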
Non-Blocking: Data stream is processed as it’s received. Minimizes memory utilization. Generally (but not always) fast transformations. Includes: Conditional Split, Derived Column, Lookup, Slowly Changing Dimension. One of these is not like the others.
Video – Slowly Changing Dimension
Execution Results – SCD Transform. Time Elapsed: 3 minutes 46 seconds. Number of records: 166k. What happened? Each record is queried against the DB; inserts and updates occur at the same time.
Demos
Video – Non-Blocking Dim Package
Execution Results – Non-Blocking Dim. Time Elapsed: 18 seconds (previously 3 minutes 46 seconds). Number of records: 166k. What happened? Lookup retrieved all records; updates moved out of the data flow.
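The difference between the two dimension loads can be sketched in Python with an in-memory SQLite database. This is an analogy under stated assumptions, not the presenters' package: the table names `dim_customer` and `stg_customer_update`, the sample rows, and the use of a staging table with one set-based UPDATE are all hypothetical choices made for illustration.

```python
import sqlite3
from typing import List, Tuple

Rows = List[Tuple[int, str]]

def make_db() -> sqlite3.Connection:
    """In-memory stand-in for the dimension table plus a staging table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, email TEXT)")
    con.execute("CREATE TABLE stg_customer_update (customer_id INTEGER PRIMARY KEY, email TEXT)")
    con.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                    [(1, "a@old.example"), (2, "b@old.example")])
    return con

def load_row_by_row(con: sqlite3.Connection, rows: Rows) -> int:
    """One lookup query per incoming row, with inserts and updates interleaved."""
    lookups = 0
    for cid, email in rows:
        found = con.execute(
            "SELECT email FROM dim_customer WHERE customer_id = ?", (cid,)).fetchone()
        lookups += 1
        if found is None:
            con.execute("INSERT INTO dim_customer VALUES (?, ?)", (cid, email))
        elif found[0] != email:
            con.execute("UPDATE dim_customer SET email = ? WHERE customer_id = ?", (email, cid))
    return lookups

def load_with_cached_lookup(con: sqlite3.Connection, rows: Rows) -> int:
    """One query fills the cache; rows are split in memory; updates run as one statement."""
    cache = dict(con.execute("SELECT customer_id, email FROM dim_customer"))
    inserts = [(cid, email) for cid, email in rows if cid not in cache]
    changed = [(cid, email) for cid, email in rows if cid in cache and cache[cid] != email]
    con.executemany("INSERT INTO dim_customer VALUES (?, ?)", inserts)
    con.executemany("INSERT INTO stg_customer_update VALUES (?, ?)", changed)
    # Set-based update from staging, applied after the row stream has finished.
    con.execute("""
        UPDATE dim_customer
        SET email = (SELECT s.email FROM stg_customer_update AS s
                     WHERE s.customer_id = dim_customer.customer_id)
        WHERE customer_id IN (SELECT customer_id FROM stg_customer_update)
    """)
    return 1  # a single lookup query, regardless of row count

def final_state(con: sqlite3.Connection) -> Rows:
    return list(con.execute("SELECT customer_id, email FROM dim_customer ORDER BY customer_id"))

incoming = [(1, "a@new.example"), (2, "b@old.example"), (3, "c@new.example")]

con_a = make_db()
lookups_a = load_row_by_row(con_a, incoming)
con_b = make_db()
lookups_b = load_with_cached_lookup(con_b, incoming)
```

Both loads leave the dimension in the same state, but the first issues one lookup per row while the second issues one in total. At 166k rows that difference in round trips is a plausible reading of the gap between the two timings.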
Video – Non-Blocking Fact Package
Execution Results – Non-Blocking Fact. Time Elapsed: 56 seconds (previously 1 minute 52 seconds with Sort and Merge). Number of records: 2.67 million. Memory used: 48MB. What happened? Lookups loaded at the start; in-memory comparison.
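For a quick sense of scale, the reported timings can be turned into ratios. The short Python calculation below uses only the figures from the execution-result slides, treating "166k" as 166,000 rows (an approximation).

```python
def seconds(minutes: int, secs: int) -> int:
    """Convert a minutes-and-seconds duration to seconds."""
    return minutes * 60 + secs

# Fact package: 2.67 million rows
fact_before = seconds(1, 52)   # Sort and Merge
fact_after = seconds(0, 56)    # lookups loaded at the start
fact_speedup = fact_before / fact_after
fact_memory_ratio = 650 / 48   # MB before / MB after

# Dimension package: about 166,000 rows
dim_before = seconds(3, 46)    # SCD transform
dim_after = seconds(0, 18)     # cached lookup, updates outside the data flow
dim_speedup = dim_before / dim_after

dim_rows_per_sec_before = 166_000 / dim_before
dim_rows_per_sec_after = 166_000 / dim_after
```

The fact load ran 2 times faster on roughly 13.5 times less memory, and the dimension load ran about 12.6 times faster.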
Just because it can doesn’t mean it should. What does SSIS do best? Aggregate, Fuzzy Grouping, Fuzzy Lookup, Row Sampling, Sort.
Clear the bottlenecks
Final Notes: Practice. Then practice more. Test with larger data sets. Understand the larger system configurations.
Questions
Thank You. Elle Stahura (estahura@teamscs.com), Erin Dempster (edempster@teamscs.com).