ETL process management with TSQL


1 ETL process management with TSQL
Richard Swinbank

2 ETL process management
ETL performed by a collection of processes
  SSIS packages
  TSQL stored procedures
  Other bits of sticky tape and string
  Lots of them!
Process execution has to be managed
  What runs when
  In what order
  What happens when things go wrong

3 Five desirable ETL behaviours
Parallel processing (fast to finish)
Convenient way to locate faults (fast to fix)
Easy to resume after error… (fast to restart)
…with as little as possible left to do (fast to finish after restart)
Easy to add new processes (we’ll come back to this)

4 A very small example
Ten processes
  We’ll be using stored procedures for now
Process dependencies
[Diagram: dependency graph of processes A to J]
Let’s look at some possible approaches

5 Approach #1: Stepwise SQL Agent job
Call each SP in a separate job step
FYI, demo.usp_ProcessE is broken
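As a rough illustration of the stepwise approach, the job could be scripted with the msdb Agent procedures. This is only a sketch: the job name and the database name EtlDemo are assumptions made for the example, and only the first two steps and the last step are shown.

```sql
USE msdb;
GO

-- Hypothetical job and database names; only the demo.usp_Process* names come from the slides.
EXEC dbo.sp_add_job @job_name = N'ETL stepwise demo';

-- One job step per stored procedure.
-- @on_success_action = 3: go to the next step; @on_fail_action = 2: quit reporting failure.
EXEC dbo.sp_add_jobstep
    @job_name = N'ETL stepwise demo',
    @step_name = N'Process A',
    @subsystem = N'TSQL',
    @database_name = N'EtlDemo',
    @command = N'EXEC demo.usp_ProcessA;',
    @on_success_action = 3,
    @on_fail_action = 2;

EXEC dbo.sp_add_jobstep
    @job_name = N'ETL stepwise demo',
    @step_name = N'Process B',
    @subsystem = N'TSQL',
    @database_name = N'EtlDemo',
    @command = N'EXEC demo.usp_ProcessB;',
    @on_success_action = 3,
    @on_fail_action = 2;

-- (steps for processes C to I would follow the same pattern)

-- Last step: @on_success_action = 1 quits the job reporting success.
EXEC dbo.sp_add_jobstep
    @job_name = N'ETL stepwise demo',
    @step_name = N'Process J',
    @subsystem = N'TSQL',
    @database_name = N'EtlDemo',
    @command = N'EXEC demo.usp_ProcessJ;',
    @on_success_action = 1,
    @on_fail_action = 2;

-- Target the local server so that SQL Server Agent can run the job.
EXEC dbo.sp_add_jobserver @job_name = N'ETL stepwise demo';
```

With these settings the job stops at the first failing step, so a broken demo.usp_ProcessE would prevent every later step from running, whether or not it depends on E.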

6-12 Agent job: Step-by-step
[Diagram sequence: the dependency graph of processes A to J, shown as the job works through Step #1 to Step #5]

13 Stepwise SQL Agent job: Results

14-16 Stepwise SQL Agent job: Evaluation
Parallel processing
Convenient way to locate faults
Easy to resume after error…
…with as little as possible left to do

17 Approach #2: Master SSIS package
Call each SP from an Execute SQL Task
Deploy package to SSIS catalog; run in agent job

18 Master SSIS package: Results

19 Master SSIS package: Results

20-23 Master SSIS package: Evaluation
Parallel processing
Convenient(ish) way to locate faults
Easy to resume after error…
…with as little as possible left to do

24 Recap
We’ve identified some desirable behaviours
  Parallel processing
  Convenient way to locate faults
  Easy to resume after error…
  …with as little as possible left to do
  (Easy to add new processes – we’ll come back to this)
We’ve looked at two process management approaches
  Stepwise SQL Agent job
  Master SSIS package
Each has some of the behaviours we want…
…but neither has all of them
[10 minutes]

25 Dependency-driven process management in TSQL
1. Table of processes (columns: Process, Status)
  demo.usp_ProcessA to demo.usp_ProcessC: Ready
  demo.usp_ProcessD to demo.usp_ProcessJ: Not ready
2. Dependency information (columns: Process, RunsAfter)
  demo.usp_ProcessD demo.usp_ProcessA demo.usp_ProcessF demo.usp_ProcessB demo.usp_ProcessG demo.usp_ProcessC demo.usp_ProcessH demo.usp_ProcessI demo.usp_ProcessE demo.usp_ProcessJ
3. Process handler SP
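The first two ingredients might be declared along these lines. This is a sketch under assumptions: the dbo schema, the column names, the table name ProcessDependency and the CHECK constraint are illustrative, and only a handful of rows are loaded. The pairs shown (D after A, G after B and C) are the ones that can be read off the slides.

```sql
-- Assumed names: dbo schema, ProcessName/Status/RunsAfter columns, ProcessDependency table.
CREATE TABLE dbo.ProcessList (
    ProcessName sysname NOT NULL PRIMARY KEY,
    [Status] varchar(20) NOT NULL
        CONSTRAINT CK_ProcessList_Status
        CHECK ([Status] IN ('Not ready', 'Ready', 'Running', 'Done', 'Errored'))
);

CREATE TABLE dbo.ProcessDependency (
    ProcessName sysname NOT NULL REFERENCES dbo.ProcessList (ProcessName),
    RunsAfter   sysname NOT NULL REFERENCES dbo.ProcessList (ProcessName),
    PRIMARY KEY (ProcessName, RunsAfter)
);

-- Processes with nothing to wait for start as 'Ready'; the rest start as 'Not ready'.
INSERT INTO dbo.ProcessList (ProcessName, [Status])
VALUES (N'demo.usp_ProcessA', 'Ready'),
       (N'demo.usp_ProcessB', 'Ready'),
       (N'demo.usp_ProcessC', 'Ready'),
       (N'demo.usp_ProcessD', 'Not ready'),
       (N'demo.usp_ProcessG', 'Not ready');

INSERT INTO dbo.ProcessDependency (ProcessName, RunsAfter)
VALUES (N'demo.usp_ProcessD', N'demo.usp_ProcessA'),
       (N'demo.usp_ProcessG', N'demo.usp_ProcessB'),
       (N'demo.usp_ProcessG', N'demo.usp_ProcessC');
```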

26 Process handler (pseudo-TSQL)
WHILE (anything's ready)
BEGIN
  SELECT ready process
  EXECUTE selected process
  UPDATE ProcessList
  SET process status = 'Done'
    , process's dependants = 'Ready'
END
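The pseudo-TSQL loop could be turned into real TSQL roughly as follows. It is a sketch, not the Sprockit code: it assumes a table dbo.ProcessList (ProcessName, Status) and a table dbo.ProcessDependency (ProcessName, RunsAfter), names chosen for illustration, and it makes a dependant 'Ready' only once all of its predecessors are 'Done'.

```sql
-- Assumes dbo.ProcessList (ProcessName, [Status]) and
-- dbo.ProcessDependency (ProcessName, RunsAfter); both names are illustrative.
DECLARE @process sysname;

WHILE EXISTS (SELECT 1 FROM dbo.ProcessList WHERE [Status] = 'Ready')
BEGIN
    -- SELECT ready process (any ready process will do; ordering is arbitrary)
    SELECT TOP (1) @process = ProcessName
    FROM dbo.ProcessList
    WHERE [Status] = 'Ready'
    ORDER BY ProcessName;

    -- EXECUTE selected process: EXEC runs the procedure named in the variable
    EXEC @process;

    -- Mark the process as done...
    UPDATE dbo.ProcessList
    SET [Status] = 'Done'
    WHERE ProcessName = @process;

    -- ...and release dependants whose predecessors have now all finished
    UPDATE pl
    SET [Status] = 'Ready'
    FROM dbo.ProcessList AS pl
    WHERE pl.[Status] = 'Not ready'
      AND EXISTS (SELECT 1
                  FROM dbo.ProcessDependency AS d
                  WHERE d.ProcessName = pl.ProcessName
                    AND d.RunsAfter = @process)
      AND NOT EXISTS (SELECT 1
                      FROM dbo.ProcessDependency AS d
                      JOIN dbo.ProcessList AS pre
                        ON pre.ProcessName = d.RunsAfter
                      WHERE d.ProcessName = pl.ProcessName
                        AND pre.[Status] <> 'Done');
END;
```

Nothing here handles a failing process: the failed process keeps its 'Ready' status and, depending on the error, the batch may either stop or select it again.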

27-30 Process handler
[Animation: the handler loop from slide 26 runs against the process table, next to the dependency graph of processes A to J]
Process table before the first iteration (slide 27):
  demo.usp_ProcessA to demo.usp_ProcessC: Ready
  demo.usp_ProcessD to demo.usp_ProcessJ: Not ready
Process table after demo.usp_ProcessA has run (slides 28-30):
  demo.usp_ProcessA: Done
  demo.usp_ProcessB to demo.usp_ProcessE: Ready
  demo.usp_ProcessF to demo.usp_ProcessJ: Not ready

31 Better process handler
WHILE (anything's ready)
BEGIN
  BEGIN TRY
    SELECT ready process
    EXECUTE selected process
    UPDATE ProcessList
    SET process status = 'Done'
      , process's dependants = 'Ready'
  END TRY
  BEGIN CATCH
    SET process status = 'Errored'
  END CATCH
END
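The TRY/CATCH part of the loop body can be sketched as a small wrapper procedure. The procedure name, the dbo.ProcessList table and its columns are assumptions for illustration; the step that releases dependants is left out for brevity and would sit directly after the 'Done' update inside the TRY block.

```sql
-- Hypothetical wrapper; CREATE OR ALTER needs SQL Server 2016 SP1 or later.
-- Assumes dbo.ProcessList (ProcessName, [Status]) exists and that the caller
-- has no open transaction of its own.
CREATE OR ALTER PROCEDURE dbo.usp_RunProcessSafely
    @process sysname
AS
BEGIN
    SET NOCOUNT ON;

    BEGIN TRY
        EXEC @process;

        UPDATE dbo.ProcessList
        SET [Status] = 'Done'
        WHERE ProcessName = @process;
    END TRY
    BEGIN CATCH
        DECLARE @message nvarchar(4000) = ERROR_MESSAGE();

        -- Clear up any transaction the failed process left open
        IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION;

        -- Record the failure so the handler can move on to other ready processes
        UPDATE dbo.ProcessList
        SET [Status] = 'Errored'
        WHERE ProcessName = @process;

        PRINT CONCAT(@process, N' failed: ', @message);
    END CATCH
END;
```

Because the error is swallowed and recorded as a status, one broken process no longer stops the loop; only the processes that depend on it stay 'Not ready'.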

32-39 Better process handler
[Animation: the handler loop from slide 31 runs against the process table, next to the dependency graph of processes A to J]
Process table at the start of the sequence (slide 32):
  demo.usp_ProcessA: Done
  demo.usp_ProcessB to demo.usp_ProcessE: Ready
  demo.usp_ProcessF to demo.usp_ProcessJ: Not ready
Process table after demo.usp_ProcessE fails (slide 33):
  demo.usp_ProcessA: Done
  demo.usp_ProcessB to demo.usp_ProcessD: Ready
  demo.usp_ProcessE: Errored
  demo.usp_ProcessF to demo.usp_ProcessJ: Not ready
The loop carries on with the remaining ready processes (slides 34-38)
Process table when nothing is left ready (slide 39):
  demo.usp_ProcessA to demo.usp_ProcessD: Done
  demo.usp_ProcessE: Errored
  demo.usp_ProcessF to demo.usp_ProcessH: Done
  demo.usp_ProcessI, demo.usp_ProcessJ: Not ready

40 Better process handler: Evaluation
Parallel processing
Convenient way to locate faults
Easy to resume after error…
…with as little as possible left to do
Process table after the run:
  demo.usp_ProcessA to demo.usp_ProcessD: Done
  demo.usp_ProcessE: Errored
  demo.usp_ProcessF to demo.usp_ProcessH: Done
  demo.usp_ProcessI, demo.usp_ProcessJ: Not ready

41 Parallel processing
Run multiple handlers at the same time
Must prevent different handlers from running the same process
Make handler reserve a process before executing it
  (and set status of processes in execution to 'Running')
Reserve by inserting details into reservations table
  Catch PK violation, continue
Reservations table (the Process column is the primary key): demo.usp_ProcessA, demo.usp_ProcessC, demo.usp_ProcessE, demo.usp_ProcessB, demo.usp_ProcessD

42 Parallelisable process handler
WHILE (anything's ready)
BEGIN
  BEGIN TRY -- for process execution
    SELECT ready process
    BEGIN TRY -- for process reservation
      INSERT ready process details INTO ProcessReservations
    END TRY
    BEGIN CATCH
      CONTINUE
    END CATCH
    UPDATE ProcessList SET process status = 'Running'
    EXECUTE selected process
    SET process status = 'Done'
    [...]
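The reservation step on its own might look like this. It is a sketch: the table name follows the slide, while the dbo schema, the column names and the ReservedAt column are assumptions. Outside a loop there is nothing to CONTINUE, so the example uses a flag instead; inside the handler loop the CATCH block would CONTINUE to the next iteration.

```sql
-- Reservations table named as on the slide; ReservedAt is an illustrative extra column.
CREATE TABLE dbo.ProcessReservations (
    ProcessName sysname NOT NULL PRIMARY KEY,
    ReservedAt  datetime2(0) NOT NULL DEFAULT (SYSUTCDATETIME())
);
GO

DECLARE @process sysname = N'demo.usp_ProcessA';
DECLARE @reserved bit = 0;

BEGIN TRY
    -- Only one handler can insert a given process name; the primary key rejects the rest
    INSERT INTO dbo.ProcessReservations (ProcessName) VALUES (@process);
    SET @reserved = 1;
END TRY
BEGIN CATCH
    -- 2627 = PRIMARY KEY violation: another handler got there first.
    -- Anything else is unexpected and is re-raised (THROW needs SQL Server 2012 or later).
    IF ERROR_NUMBER() <> 2627
        THROW;
END CATCH;

-- 1 = this session owns the process and may set it to 'Running' and execute it;
-- 0 = another handler holds the reservation, so this one should pick something else.
SELECT @reserved AS Reserved;
```

Running the second batch twice shows the effect: the first run reserves the process and the second finds it already taken.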

43 What about SSIS packages?
Can’t EXECUTE selected package
Execute package using SSIS catalog SPs
  SSISDB.catalog.create_execution
  SSISDB.catalog.start_execution
Need EXECUTE-like behaviour
  Return only when package execution has finished
  Raise error if something goes wrong
Wrap up in package runner SP
Handler executes process or package runner
  IF process is SP
    EXECUTE process
  ELSE IF process is SSIS package
    EXECUTE
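A package runner with EXECUTE-like behaviour could be sketched like this. The procedure name and its parameters are hypothetical. Setting the SYNCHRONIZED execution parameter is one way to make start_execution wait for the package to finish; the slides do not say whether Sprockit does it this way.

```sql
-- Hypothetical package runner; CREATE OR ALTER needs SQL Server 2016 SP1 or later.
CREATE OR ALTER PROCEDURE dbo.usp_RunPackage
    @folder  nvarchar(128),
    @project nvarchar(128),
    @package nvarchar(260)
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @execution_id bigint;
    DECLARE @synchronized smallint = 1;

    EXEC SSISDB.catalog.create_execution
        @folder_name  = @folder,
        @project_name = @project,
        @package_name = @package,
        @execution_id = @execution_id OUTPUT;

    -- object_type 50 = system parameter; SYNCHRONIZED = 1 makes
    -- start_execution return only when the package has finished
    EXEC SSISDB.catalog.set_execution_parameter_value
        @execution_id    = @execution_id,
        @object_type     = 50,
        @parameter_name  = N'SYNCHRONIZED',
        @parameter_value = @synchronized;

    EXEC SSISDB.catalog.start_execution @execution_id = @execution_id;

    -- catalog.executions.status: 7 = succeeded, 4 = failed
    DECLARE @status int =
        (SELECT [status] FROM SSISDB.catalog.executions WHERE execution_id = @execution_id);

    -- Raise an error so the calling handler's CATCH block sees the failure
    IF ISNULL(@status, 0) <> 7
        THROW 50000, N'SSIS package execution did not succeed.', 1;
END;
```

A handler could then call this procedure wherever it would otherwise EXEC a stored procedure, and treat an error from it in the same way.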

44 Demo
Sprockit – my implementation of this approach
  Pure TSQL & SQL Server Agent
  Completely free & open-source
[20 minutes]

45 Those five desirable behaviours
Parallel processing
Convenient way to locate faults
Easy to resume after error…
…with as little as possible left to do
Easy to add new processes?
  What’s so hard about this anyway?!
[40 minutes]

46 Adding new processes
Where do I put new process demo.usp_ProcessK?
To decide, I need to know what everything else does
  Difficult unless I know the ETL landscape very well
  Takes a while for newbies to get up to speed
[Diagram: dependency graph of processes A to J]

47-48 Process dependencies
[Diagram: dependency graph of processes A to J]

49-51 Resource dependencies
[Diagram: processes A to J linked through the tables they read and write, T01 to T14]

52-54 Dependencies in Sprockit
Developers provide resource dependencies (table sprockit.Resource)
  Process | Resource | Input/output
  demo.usp_ProcessB | Table T01 | Input
  demo.usp_ProcessB | Table T04 | Output
  demo.usp_ProcessG | Table T06
  demo.usp_ProcessG | Table T09
Handlers infer process dependencies (uvw_ProcessDependency view)
  Process | RunsAfter
  demo.usp_ProcessG | demo.usp_ProcessB
When adding a process, the dependency information you need is right there in the SP/package
  …so no need for the full process dependency picture
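Inferring process dependencies from resource dependencies comes down to a self-join: a process runs after any other process that outputs one of its inputs. The sketch below uses hypothetical table and view names rather than Sprockit's actual definitions. The T01/T04 rows for demo.usp_ProcessB follow the slide; the T04 input row for demo.usp_ProcessG is an assumption, added so that the example yields the "G runs after B" pair.

```sql
-- Hypothetical stand-in for the resource table; not Sprockit's real schema.
CREATE TABLE dbo.ProcessResource (
    ProcessName  sysname NOT NULL,
    ResourceName sysname NOT NULL,
    Direction    varchar(6) NOT NULL CHECK (Direction IN ('Input', 'Output')),
    PRIMARY KEY (ProcessName, ResourceName, Direction)
);

INSERT INTO dbo.ProcessResource (ProcessName, ResourceName, Direction)
VALUES (N'demo.usp_ProcessB', N'T01', 'Input'),
       (N'demo.usp_ProcessB', N'T04', 'Output'),
       (N'demo.usp_ProcessG', N'T04', 'Input'),   -- assumed for the example
       (N'demo.usp_ProcessG', N'T09', 'Output');
GO

-- A consumer of a resource runs after every producer of that resource.
CREATE VIEW dbo.uvw_ProcessDependencySketch
AS
SELECT DISTINCT
    consumer.ProcessName,
    producer.ProcessName AS RunsAfter
FROM dbo.ProcessResource AS consumer
JOIN dbo.ProcessResource AS producer
  ON producer.ResourceName = consumer.ResourceName
 AND producer.Direction = 'Output'
 AND consumer.Direction = 'Input'
 AND producer.ProcessName <> consumer.ProcessName;
GO

SELECT ProcessName, RunsAfter
FROM dbo.uvw_ProcessDependencySketch;
```

Adding a new process then means inserting only the rows for the tables it reads and writes; its place in the running order follows from the view.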

55 What if I want the full picture?
Structured dependency information is a data source
  Force-directed graphs in Power BI
  Graphviz (e.g.
digraph G {
  n2 [label="usp_ProcessB"];
  n3 [label="usp_ProcessC"];
  n7 [label="usp_ProcessG"];
  n8 [label="usp_ProcessH"];
  n10 [label="Process_J.dtsx"];
  n2 -> n7;
  n3 -> n7;
  n3 -> n8;
  n7 -> n10;
}
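Because the dependencies are just rows, Graphviz DOT text can be generated with a query. The sketch below uses a table variable with illustrative column names and the four edges from the digraph on the slide; it writes node names directly into the edges instead of using numbered nodes with labels. STRING_AGG needs SQL Server 2017 or later.

```sql
-- Illustrative dependency rows; column names are assumptions.
DECLARE @dependency TABLE (
    ProcessName sysname NOT NULL,
    RunsAfter   sysname NOT NULL
);

INSERT INTO @dependency (ProcessName, RunsAfter)
VALUES (N'usp_ProcessG',   N'usp_ProcessB'),
       (N'usp_ProcessG',   N'usp_ProcessC'),
       (N'usp_ProcessH',   N'usp_ProcessC'),
       (N'Process_J.dtsx', N'usp_ProcessG');

-- One edge per row, drawn from the predecessor to the process that runs after it.
SELECT N'digraph G {' + CHAR(10)
     + STRING_AGG(
           CONVERT(nvarchar(max), N'  "' + RunsAfter + N'" -> "' + ProcessName + N'";'),
           CHAR(10))
     + CHAR(10) + N'}' AS DotSource
FROM @dependency;
```

The resulting text can be pasted into any Graphviz renderer to draw the graph.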


57 Summary
ETL process management in TSQL is simple but powerful
  Parallel processing
  Convenient way to locate faults (and tolerate transaction deadlocks)
  Easy to resume after error…
  …with as little as possible left to do
  Easy to add new processes
Pure TSQL means everything’s in the database
  Exploit structured dependency information
  Leverage ETL activity information
Code available at
Thanks for listening!

58 Questions? http://RichardSwinbank.net/sprockit
@RichardSwinbank

