Integration Services in SQL Server 2008 Allan Mitchell SQL Server MVP.

Presentation transcript:

1 Integration Services in SQL Server 2008 Allan Mitchell SQL Server MVP

2 SSIS in SQL Server 2008 The Cool Stuff

3 Who am I? SQL Server MVP. SQL Server consultant. Joint author on the Wrox Professional SSIS book. Worked with SQL Server since version 6.5. www.SQLDTS.com and www.SQLIS.com

4 Today’s Schedule SSIS in SQL Server 2008. Behind the scenes – Threading – Pipeline Limiter. Front of shop – Lookup Component: caching and other good stuff – Data Profiling – Change Data Capture

5 Behind the Scenes Threading

6 Loading speed benefits from getting as many threads spinning as possible. In SSIS 2005 this was not always optimised: blocking transforms really wrecked your day, forcing quaint workarounds.

7 Threading – Multicast Transform, SSIS 2005 Incredibly quick to generate n copies of the input. All outputs are synchronous. All outputs are on the same execution tree. All outputs are on the same thread, serviced round robin.

8 Threading – Multicast Transform, SSIS 2008 Incredibly quick to generate n copies of the input. All outputs are still synchronous. All outputs are on the same execution path. Each output can now get its own thread.

9 Execution “Tree” 2005
begin execution tree 0
   output "Generated" (8)
   input "Multicast Input 1" (20)
   output "Multicast Output 1" (21)
   input "TrashInput" (25)
   output "Multicast Output 2" (32)
   input "TrashInput" (29)
   output "Multicast Output 3" (44)
end execution tree 0

10 Execution “Tree” 2008
Begin Path 0
   output "Generated" (7); component "Konesans Data Generator Source" (1)
   input "Multicast Input 1" (36); component "Multicast" (35)
   Begin Subpath 0
      output "Multicast Output 1" (37); component "Multicast" (35)
      input "TrashInput" (41); component "Trash Destination" (39)
   End Subpath 0
   Begin Subpath 1
      output "Multicast Output 2" (48); component "Multicast" (35)
      input "TrashInput" (45); component "Trash Destination 1" (43)
   End Subpath 1
End Path 0

11 Demo Multicast Transform in 2005 and 2008

12 Behind the Scenes Pipeline Limiter

13 Pipeline What? That’s right – a limiter. Doesn’t that mean things will go slower in 2008? – No. It happens in 2005 too; you just don’t see it. Why would I want to restrict the pipeline? – Data = buffers = memory. – Memory is reused when a buffer terminates. – Push back from a component = no reuse of memory = running out of memory!

14 Pipeline Limiter (Nov CTP) Information: 0x400492E0 at Data Flow Task, SSIS.Pipeline: During last execution the pipeline suspended output "Union All Output 1" (487) of component "Union All" (485) for 10445 milliseconds to limit the number of in-memory buffers.

15 Pipeline Limiter (Feb CTP/RTM) No “Information” event is raised externally any more. Everything is sent to the log under the User:PipelineComponentTime event. Filter out “Message” attribute values starting with “The Component%”.
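The filtering described above can be sketched in T-SQL, assuming the package logs to the SQL Server log provider, which in SQL Server 2008 writes to the dbo.sysssislog table (the database holding the log table is whatever your log provider's connection manager points at):

```sql
-- Pull only the pipeline timing rows, dropping the per-component
-- timing messages that start with "The component".
SELECT  starttime,
        source,     -- the task that raised the event
        message
FROM    dbo.sysssislog
WHERE   event = 'User:PipelineComponentTime'
  AND   message NOT LIKE 'The component%'
ORDER BY starttime;
```

The surviving rows are the ones that report the pipeline suspending an output to limit in-memory buffers.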

16 Demo Pipeline Limiter

17 Front of Shop Lookup Component Caching Options

18 The only way it works in 2005 Caching options: Full – before the Data Flow task really gets going (Initialize), the lookup component caches the entirety of its reference dataset. Partial – the lookup component tries for a match in cache first, then goes to disk; no pre-caching. None – a misnomer really: the last query result is cached, but that’s it; every other query goes to disk.

19 The only way it works in 2005 Full cache is fastest, but – it might take the longest to build – it could take a large amount of memory. The reference datasets are not passed around – in a loop over the same task you read the reference dataset n times – the cache is not transferable across Data Flow tasks.

20 The way it can work in 2008 The Cache Transform – no transformation happens! – can be used as a destination. Cache Connection Manager – File: can be read by the Raw File adapter. – In memory. – Allows the reference dataset to be passed around like luggage.

21 The way it can work in 2008 Just because you can cache the dataset like this does not mean you have to. The default option is still to use the 2005 way.

22 Row Redirection One of my main dislikes about this component in 2005. If a lookup component gets no match, then by default it fails (“Row yielded no match during lookup”). You have to configure the error output to handle this – redirect the row down the error output, or – ignore the failure (pass the looked-up value as NULL).

23 Row redirection In 2008 we say hello to the “No Match Output”

24 Demo Caching options and redirection

25 Points to note The lookup transform is still case sensitive when the match is done in cache. When the match is done on disk via an OLE DB Connection Manager, it is governed by the database’s rules. Watch out for the different behaviour – be careful out there.

26 Data Profiling Task A Control Flow task. Profiles data in your database. Quick setup or verbose. Loads of different metrics you can extract. Uses an ADO.NET Connection Manager. Only SQL Server 2000 and above can be profiled.

27 Data Profiling Task Fantastic for seeing how rubbish the data is. Useful for identifying the distribution of values, candidate keys in tables, and lengths of columns. The output can be read externally or through a variable – it is XML whichever way you look at it.
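To make the metrics concrete, here is a hand-rolled sketch of the kind of statistics the task computes – candidate-key strength, NULL counts, and column lengths – written against a hypothetical dbo.Customer table and Email column (both names are illustrative, not from the talk):

```sql
SELECT  COUNT(*)                                        AS TotalRows,
        COUNT(DISTINCT Email)                           AS DistinctValues, -- equal to TotalRows => candidate key
        SUM(CASE WHEN Email IS NULL THEN 1 ELSE 0 END)  AS NullCount,      -- NULL-ratio profile
        MIN(LEN(Email))                                 AS MinLength,      -- column-length profile
        MAX(LEN(Email))                                 AS MaxLength
FROM    dbo.Customer;
```

The task computes these (and more) for you and serialises the results as XML, which is why it can be consumed from a file or a package variable.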

28 Demo Data Profiling Task

29 Change Data Capture (CDC) Billed as an ETL feature – we’ll use it if it’s there. What is it? How do I use it?

30 Traditional ETL vs. CDC-ETL Traditional bulk ETL: the engine extracts, transforms and loads the entire data set from the operational data sources into the DW. It requires a “window of operation” (hours to days); frequency/latency is monthly, weekly, sometimes daily. CDC-based ETL: a bulk change stream delivers real-time updates, moving only the changes to the data, so there is no “window of operation”. The change flow is periodic or continuous; frequency/latency is multiple times a day, down to real time.

31 Why CDC is Cool Increased efficiency of the ETL process through incremental extractions. Identifies changed rows and columns. Identifies the operation on the data, as all changes or net changes. Log based (uses the Transactional Replication log reader). Lightweight.

32 Why CDC is Cool Viewable as net changes or as a complete journal. The key point is the “change” – what you view about those changes is up to you. Differs from Change Tracking in that you see the data itself. No excuses for not considering incremental extracts now.
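The two views of the changes correspond to two generated table-valued functions. A sketch, assuming CDC is already enabled on a hypothetical dbo.Customer table with the default capture instance name dbo_Customer and net changes enabled:

```sql
DECLARE @from_lsn BINARY(10), @to_lsn BINARY(10);
SET @from_lsn = sys.fn_cdc_get_min_lsn('dbo_Customer');
SET @to_lsn   = sys.fn_cdc_get_max_lsn();

-- Complete journal: every intermediate change. The __$operation column
-- identifies each row as a delete (1), insert (2), or the before (3)
-- and after (4) images of an update.
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Customer(@from_lsn, @to_lsn, N'all');

-- Net changes: at most one row per source row, the end state only.
SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_Customer(@from_lsn, @to_lsn, N'all');
```

An incremental extract in SSIS just calls one of these functions as its source query, bounded by the LSNs it has already processed.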

33 Terminology around CDC Capture instance – the change table and query functions created for a base object (maximum of two per object). Capture process – reads the transaction log and places the change rows into the change tables. Retention period – how long change rows are kept before cleanup removes them.
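Creating a capture instance is a two-step affair. A minimal sketch, again using the hypothetical dbo.Customer table (the database name is also illustrative; the system procedures are real):

```sql
USE MyDatabase;  -- illustrative database name

-- Step 1: enable CDC at the database level.
EXEC sys.sp_cdc_enable_db;

-- Step 2: create a capture instance for the table. This builds the
-- change table and the cdc.fn_cdc_get_all_changes_* query function.
EXEC sys.sp_cdc_enable_table
     @source_schema        = N'dbo',
     @source_name          = N'Customer',
     @role_name            = NULL,  -- no gating role for access to the changes
     @supports_net_changes = 1;     -- also generate the net-changes function
```

Note that @supports_net_changes = 1 requires the source table to have a primary key or a unique index.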

34 Demo Using Change Data Capture

35 Some of the things I did not cover C# is now a scripting option

36 Questions?

37 Contact Details/Resources allan.mitchell@konesans.com www.SQLIS.com / www.SQLDTS.com www.SQLServerFAQ.com

