Know your data source well
Who am I? Nik (Shahriar Nikkhah). Microsoft MVP 2010, SQL Server. MCITP SQL 2008. MCTS SQL 2008 and s: msdn.microsoft.com (SSIS forum). One chapter on SSIS in MVP Deep Dives 2 (Sep 2011).
OVERVIEW
Know your data source well / Data cleansing
  1. Chronological file order
  2. Data cleansing
  3. Check a few sample packages
Error handling / notification
  1. Capture errors in a text file
  2. Error file as notification
  3. One package sample: a package with the combination of the above
Know your data source well
Analyze your data source from 2 different angles:
1- Data point of view
   Data relations, field mapping, data values
   PK, FK, index, metadata, dictionary (mapping) tables
   Good records and bad records (redirecting)
2- Data source behavior
   Behavior changes (table / file renaming and header name changes)
   Delivery process: how the source gets made, provided and loaded (for example, a CSV that has been opened in Excel and saved)
   Who is providing it
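To make the "good records and bad records (redirecting)" point concrete, here is a minimal Python sketch of the idea. In an SSIS package this would normally be done with an error output or a Conditional Split; the function name, the column names and the "required field is blank" rule are assumptions chosen only for illustration.

```python
from __future__ import annotations

from typing import Iterable


def split_records(rows: Iterable[dict], required: tuple[str, ...]) -> tuple[list[dict], list[dict]]:
    """Route each row to a 'good' or a 'bad' list, similar to redirecting rows."""
    good, bad = [], []
    for row in rows:
        # Hypothetical rule: a row is bad when a required field is missing or blank.
        is_bad = any(
            row.get(name) is None or not str(row.get(name)).strip()
            for name in required
        )
        (bad if is_bad else good).append(row)
    return good, bad


# Hypothetical sample rows.
rows = [
    {"customer_id": "1", "amount": "10.5"},
    {"customer_id": "", "amount": "3.0"},
]
good, bad = split_records(rows, required=("customer_id", "amount"))
```

The bad rows are kept rather than dropped, so they can be written to an error file and reviewed later.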
Scenario on data behavior Data Point of view
Scenario on data behavior Data source behavior
Scenario on data behavior. Data source behavior: files renamed and moved to different folders. Who is providing the data source?
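One inexpensive guard against this kind of source behavior (renamed headers, files re-saved by another tool) is to compare the header row with the expected one before loading. The sketch below is in Python rather than an SSIS Script Task, and the expected column names are hypothetical.

```python
import csv
import io

# Hypothetical column names; replace with the real layout agreed with the provider.
EXPECTED_HEADER = ["customer_id", "order_date", "amount"]


def header_matches(csv_text: str, expected: list) -> bool:
    """Return True when the first CSV row matches the expected header.

    Surrounding whitespace and letter case are ignored, because those are the
    differences a re-save in another tool typically introduces.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    return [name.strip().lower() for name in header] == expected
```

A file that fails the check can be moved to a quarantine folder instead of being loaded with wrong column mappings.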
Daily file load statistics (perfect world)
Working days; No. of packages; CSV / Excel, load & reload; Excel sheets; Records per sheet (1,000); Total no. of records, million; Million records per day; K, 10K
Daily file load statistics (real world)
Working days; No. of packages; CSV / Excel, load & reload; Excel sheets; Records per sheet (1,000); Total no. of records, million; Million records per day; K, 10K
Files loaded per month: 6,300 – 10,500 files / month
Monthly extra reload (population reload): 2 – 3 reloads a month = 12.6K – 31.5K files / month
Loads
Forecast
Packages for the next year: extra 200 (sum of 300 per customer)
New customers: 2 – 3 per year
Reloads
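The reload figures on this slide are consistent with multiplying the monthly file count by the number of reloads. The short Python check below assumes that each reload re-processes the whole month's files, which is an interpretation of the slide rather than something it states.

```python
def reload_range(files_low: int, files_high: int, reloads_low: int, reloads_high: int) -> tuple:
    """Return the (low, high) number of files re-processed per month.

    Assumption: every reload re-processes all files loaded in that month.
    """
    return files_low * reloads_low, files_high * reloads_high


low, high = reload_range(6_300, 10_500, 2, 3)
print(low, high)  # 12600 31500
```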
Chronological file load Over 99% of the ETLs that have a file as a source don’t use chronological file load in the SSIS package.
Chronological file load: package overview.
Chronological file load: script that provides the files' properties and information.
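The script on this slide is not reproduced in the text. As a rough illustration of what such a script collects, the Python sketch below gathers the name, size and last-modified time of each file in a folder. The `FileInfo` type, the function name and the `*.csv` pattern are assumptions; the original package would use an SSIS Script Task.

```python
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path


@dataclass
class FileInfo:
    path: Path
    size_bytes: int
    modified: datetime


def collect_file_info(folder: Path, pattern: str = "*.csv") -> list:
    """Collect basic properties for every file in `folder` that matches `pattern`."""
    infos = []
    for path in folder.glob(pattern):
        if path.is_file():
            stat = path.stat()
            infos.append(
                FileInfo(path, stat.st_size, datetime.fromtimestamp(stat.st_mtime))
            )
    return infos
```

The resulting list can then be written to a staging table or held in a variable so that a later step can order the files.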
Chronological file load: inside the DFT.
Chronological file load: Sort object.
Chronological file load: set flag.
Chronological file load: second Foreach Loop, display script.
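Taken together, the chronological load amounts to: list the files, sort them by date, then loop over them in that order. The Python sketch below shows only that ordering idea. It sorts by last-modified time with the file name as a tie-breaker, which is an assumption; the original package may order by a date embedded in the file name or by another property.

```python
from pathlib import Path
from typing import Callable


def files_in_chronological_order(folder: Path, pattern: str = "*.csv") -> list:
    """Return matching files, oldest first (by last-modified time, then by name)."""
    files = [p for p in folder.glob(pattern) if p.is_file()]
    return sorted(files, key=lambda p: (p.stat().st_mtime, p.name))


def load_all(folder: Path, load_one: Callable[[Path], None]) -> None:
    """Call `load_one` for each file, oldest first, like the second loop."""
    for path in files_in_chronological_order(folder):
        load_one(path)
```

Processing oldest first matters when a later file corrects or overwrites rows delivered in an earlier one.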
Data cleansing
Data cleansing and transformation
Data flow transformations include a series of data cleansing tools, such as:
  Joins
  Fuzzy Lookups
  Character mapping
  Data type conversion
  Derived columns
  A set of Boolean functions for data comparison and replacement
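For readers who have not used these transformations, the Python sketch below imitates four of them on a single row: character mapping (upper-casing a code), a fuzzy lookup against a reference list, data type conversion, and a derived column. The column names and the reference list are hypothetical, and `difflib` is only a stand-in here, not the matching algorithm that the SSIS Fuzzy Lookup uses.

```python
from difflib import get_close_matches

# Hypothetical reference (dictionary) table.
KNOWN_CITIES = ["Toronto", "Montreal", "Vancouver"]


def cleanse(row: dict) -> dict:
    """Return a cleansed copy of one input row (all input values are strings)."""
    city = row["city"].strip()
    # Fuzzy lookup: take the closest known city, or keep the input if none is close.
    match = get_close_matches(city, KNOWN_CITIES, n=1, cutoff=0.8)
    # Data type conversion.
    quantity = int(row["quantity"])
    unit_price = float(row["unit_price"])
    return {
        "code": row["code"].strip().upper(),  # character mapping
        "city": match[0] if match else city,
        "quantity": quantity,
        "unit_price": unit_price,
        "total": quantity * unit_price,  # derived column
    }
```

A conversion failure raises `ValueError`, which is the point at which a row would be redirected to the bad-record output.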
Data cleansing
Error handling / notification
  Keep track of your packages when an error occurs
  Organize your error files
  Back up in the right folder
  Display the right error message
  Send a notification message to the right person
  The subject of the message must be clear
Capture error files in a text file
SEE ATTACHED SAMPLE
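The attached sample is not included in this text. As a stand-in, the Python sketch below appends one timestamped line per error to a text file kept in a dedicated error folder. The file naming scheme and the tab-separated layout are assumptions.

```python
from datetime import datetime
from pathlib import Path


def log_error(error_dir: Path, package: str, message: str) -> Path:
    """Append one error line to <error_dir>/<package>_errors.txt and return its path."""
    error_dir.mkdir(parents=True, exist_ok=True)
    error_file = error_dir / f"{package}_errors.txt"
    stamp = datetime.now().isoformat(timespec="seconds")
    # Keep each error on a single line so the file stays easy to scan.
    one_line = " ".join(message.splitlines())
    with error_file.open("a", encoding="utf-8") as handle:
        handle.write(f"{stamp}\t{package}\t{one_line}\n")
    return error_file
```

One file per package keeps the error files organized and makes it clear which file to attach or quote in a notification.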
Notification: use SSIS variables to set your SMTP object. SEE ATTACHED SAMPLE
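The same idea, keeping the SMTP settings in variables instead of hard-coding them, can be sketched in Python as follows. The setting names, addresses and subject format are hypothetical. In SSIS the equivalent would be package variables and expressions feeding the SMTP connection and the Send Mail Task.

```python
import smtplib
from email.message import EmailMessage


def build_notification(settings: dict, package: str, error_text: str) -> EmailMessage:
    """Build the notification message; the error file content becomes the body."""
    msg = EmailMessage()
    msg["From"] = settings["from_addr"]
    msg["To"] = settings["to_addr"]
    # A clear subject: say what failed and where.
    msg["Subject"] = f"ETL error in package {package}"
    msg.set_content(error_text)
    return msg


def send_notification(settings: dict, msg: EmailMessage) -> None:
    """Send the message through the SMTP server named in the settings."""
    with smtplib.SMTP(settings["smtp_host"], settings["smtp_port"]) as server:
        server.send_message(msg)
```

Building and sending are kept separate so the message can be checked without a live mail server.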