Download presentation
Presentation is loading. Please wait.
1
Design Patterns for SSIS Performance
Darren Green Konesans Ltd
2
Performance Utility packages vs specialised work processes
Less than 10% of packages require detailed performance design or tuning It will take at least 90% of your time! Use simple tweaks and hints ORDER Col for Cl Idx, MICS More radical design changes? Which way is faster? It depends!
3
Lookups Two main uses Performance Points
Check if exists, no values are returned Lookup data, returns one or more values to augment existing data Performance Points Size of reference data set Number of rows Size of row (especially with augmentation usage) Size of input data set Cache Type
4
Lookup Cache Types Characteristic Full Cache Partial None
Size of reference dataset Small Lots of rows or wide rows can take a long time to cache and use valuable memory Small Range of Large Cache built based on input values so a small range of values reduces the costly first lookup. Any Reference dataset has no impact on the lookup, but assumes an efficient query. Reference data far larger than (unique) input. Size of input dataset Any / Large Once cached the lookup is available for 1 or 1 million rows. Larger datasets justify the initial cost of cache population. Only sensible when input values are duplicated in dataset. Use when reference dataset is large but value range is small. Small / Sorted Small datasets work best as lookup cost is high, but last value is cached, so larger sorted datasets are also good
5
Demo 1-4 Insert vs Update
6
Insert vs Update Patterns
Insert or Update with Lookup Use lookup to test if exists, or return values Standard and familiar pattern Changing the Lookup cache type can be very helpful Insert then Update Use constraint failures to direct rows Use multiple inserts to refine batch to single row Good for high Insert to update ratio, more inserts Use lookups without cache on update rows only, more efficient with large reference data sets Merge Join and Conditional Split for Insert or Update Must be sorted inputs
7
Insert vs Updates Performance
Test conditions Single Virtual Machine 3GB Ram Virtual Disk Packages run in pseudo random order 18 or 23 iterations per package Input source of 2 million (narrow) rows Existing rows 100K, 500K, 1M
8
Performance Numbers Performance metrics are for the purpose of illustration only. The organism must be present in every case of the disease. The organism can be isolated from the diseased host and grown in pure culture. The specific disease is reproduced when a pure culture of the organism is inoculated into a healthy host. The organism must be recoverable from the experimentally infected host
9
Performance Test Results
Insert to Update Ratio Lookup Insert then Update Merge Join 20:1 67 96 69 4:1 66 124 78 2:1 80 334 89 Variation in Times 13% 33% 3% Uses only top 50% of results for each run Variation means results quite unpredictable, even excluding outliers Scale of test too small for real conclusions External factors a big influence!
10
Demo 5-7 Updates
11
Update Methods OLE-DB Command T-SQL Update Statement
Standard and familiar pattern in SSIS Poor user experience developing SQL and mapping parameters Poor performance T-SQL Update Statement Requires additional staging table and space Good performance
12
Update Performance Results
OLE-DB Command T-SQL Update Time 123 82 Variation in Times 23% 18% 6 runs for each case in pseudo random order 2 million rows, 500K updates, 4:1
13
Demo 8-9 Distinct
14
Distinct Rows Distinct 2 Million to 201 Rows Sort Transformation
One integer column Sort Transformation Remove Duplicate Values Aggregate Transformation Group By
15
Distinct Performance Results
Sort Aggregate Time 54 51 Variation in Times 17% 2% 6 runs for each case in pseudo random order Variation very different Is the sort transform more susceptible to outside influences?
16
Thank you Questions? Darren Green Feedback Online darren@konesans.com
Feedback Online
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.