Download presentation
Presentation is loading. Please wait.
1
Patterns and Best Practices in SSIS
or, how to keep your DBA happy with your crazy-ass ETL
2
It’s all about me... I’m a SpeakingMentor !
3
Obligatory LOTR reference…
4
Mantra C# is *not* the only way to initialise a variable
C# is *not* the only way to move files C# is *not* the only way to call a web service C# scripts are opaque to the SSIS runtime C#... <sigh> Mantra
5
Not a pipeline Data does not get passed to components (cough)
Components manipulate blocks of data (true) Not a pipeline
6
What you *think* happens….
Get data Replace nulls Conditional Split Merge Sort
7
What actually happens…
Get data Conditional Split Replace nulls Sort Merge
8
SSIS not being a pipeline
9
It’s all about speed… There are 2 transformation types:
Synchronous – fastest (streaming and row-based) Asynchronous – slower And three ‘modes’: Non-blocking Semi-blocking Full blocking It’s all about speed…
10
Non-blocking synchronous streaming transforms
Audit Cache Transform Character Map Conditional Split Copy Column Data Conversion Derived Column Multicast Percent Sampling Row Count Lookup Non-blocking synchronous streaming transforms
11
Non-blocking synchronous row transforms
DQS cleansing Export Column Import Column OLEDB Command Script Task SCD Lookup Non-blocking synchronous row transforms
12
Wait, there’s two ‘Lookup’s ?
Lookups are non-blocking streaming transforms when the ‘Full Cache’ option is used for the lookup data Using ‘Partial Cache’ or ‘No Cache’ options in the lookup make the Lookup a row-based transform, which is necessarily slower Wait, there’s two ‘Lookup’s ?
13
Semi-blocking asynchronous transforms
Data Mining Merge Merge Join Pivot Unpivot Union All Term Lookup Semi-blocking asynchronous transforms
14
Full blocking asynchronous transforms
Aggregate Fuzzy Grouping Fuzzy Lookup Row Sampling Sort Term Extraction Script Task Full blocking asynchronous transforms
15
Wait, there’s two ‘Script Task’s ??
Script tasks are non-blocking when they’re using an outside resource (i.e. not the data that you’re working on) They become blocking when they collect a dataset before sending it on to a destination Set the ‘SynchronousInputID’ property on the output columns to ‘None’ Wait, there’s two ‘Script Task’s ??
16
Large Data Sets BufferTempStoragePath BlobTempStoragePath
These can either be set in your package template, or injected into the .dtsx Large Data Sets
17
Really ? You can just inject stuff ?
Yes Find and Replace default entries with your custom requirement… any decent text editor will do. BLOBTempStoragePath="F:\astDisk\WithLoadsOfSpace_temp" bufferTempStoragePath="F:\astDisk\WithLoadsOfSpace_temp” Really ? You can just inject stuff ?
18
Things that make you go hmmm
Logging Package Checkpoints Location, location, location Expressions Parallel Operations ‘IsSorted=True’ property Auditing Bad Data Things that make you go hmmm
19
Here’s one I prepared earlier…
Time to dissect Here’s one I prepared earlier…
20
Take-aways – You should know..
Your data Your environment Transform types Ingest / Egress of data What you are trying to achieve When to give up and use C# Take-aways – You should know..
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.