Patterns for designing a supportable Data Warehouse


1 Patterns for designing a supportable Data Warehouse
1 September 2018 Tinus Visagie @dieTinus tvisagieblog.wordpress.com Patterns for designing a supportable Data Warehouse.- Tinus Visagie

2 Thank You Sponsors

3 Bad planning is bad
This image perfectly shows how a badly planned project ends up. Someone got paid to paint the room, but you are stuck with them supporting it for ages. Nobody wants to be the guy with the brush at that point. It’s not nice supporting a badly planned mart.

4 Signs of a badly planned Data Warehouse
- It’s a hot potato that nobody wants to work on
- The run consistently fails with no easy way to fix it
- You always seem to be spending time supporting it
Important to note that some of these can happen on a well-planned mart as well.

5 More signs of a badly planned Data Warehouse
- You have a procedure called “CleanupBalancesAfterProcess”
- New developments are complicated and take longer than they should
- If more than one error occurs you will probably miss your run window
- You don’t know the status of your data warehouse run
- The handover to a new team member is measured in months
Important to note that some of these can happen on a well-planned mart as well.

6 What to do differently

7 Big things to plan for
- Documented standards
- Source control
- Package/Task sizes and design must haves
- Auditing
- Deployment model
- Success and failure notifications/mechanisms
- Parameter management
- Automated testing and validation
- Package scheduling and execution

8 I try to have the following standards documented:
- Naming conventions for both database and ETL
- Branch and merge strategy
- Data sharing strategy
- Database maintenance standards
- Some specification templates
- DR procedures and tests
- Any process that you do more than once

9 Having these standards in place will have the following benefits:
- Support on a mart is much easier if all the work is consistent
- Onboarding new employees is easier with documents to reference
- The handover from developers to support staff is shortened
- Delivery of new developments speeds up

10 Source Control
Always the first thing that I set up for a new project. Every project should have it. If it’s not in source control it doesn’t exist.
- Really valuable tool for your run team. Being able to see what changed and could be causing errors is invaluable
- Have a clear branch and merge strategy
- Make sure every new person on a project knows how to use your branch and merge strategy

11 Source Control: day-to-day development
- Always have a branch that’s ready to deploy
- Can easily roll back to an older version of code
- The best way to let multiple developers work on one project

12 Source Control: handling an emergency fix
Allows us to create a new branch to fix a prod issue and merge it back into the main branch.

13 Source Control: other tips
- Comments on check-ins are valuable
- Check in regularly, even on development
- As the previous slide showed, only ever deploy from the production branch
- Lastly! If it’s not in source control it doesn’t exist

14 Package sizes and design must haves
Secret sauce that makes a mart easier to support. Packages should be repeatable because:
- The last thing you want is a support person trying to figure out what tables to clean up at 3 AM
- It makes support much easier if you can run a full package to try and replicate an issue
- It’s usually not very difficult, so just do it
There are two ways to do this: do the cleanup as either the first or the last step.
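As a rough illustration of the "cleanup as a first step" option, here is a minimal sketch in Python using an in-memory SQLite database as a stand-in for the warehouse. The table `daily_balance`, its columns, and the function name are hypothetical and not taken from the talk; the point is only that the load deletes its own slice of data before inserting, so rerunning it leaves the table in the same state.

```python
import sqlite3

def load_daily_balances(conn, run_date, rows):
    """Reload one day's balances; safe to rerun for the same run_date."""
    with conn:  # one transaction: cleanup and load commit together or not at all
        # Cleanup as the first step: remove whatever an earlier (possibly failed) run left behind.
        conn.execute("DELETE FROM daily_balance WHERE run_date = ?", (run_date,))
        conn.executemany(
            "INSERT INTO daily_balance (run_date, account, balance) VALUES (?, ?, ?)",
            [(run_date, account, balance) for account, balance in rows],
        )

# Hypothetical target table, created here only so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_balance (run_date TEXT, account TEXT, balance REAL)")

rows = [("A1", 100.0), ("A2", 250.5)]
load_daily_balances(conn, "2018-09-01", rows)
load_daily_balances(conn, "2018-09-01", rows)  # rerun replaces the rows instead of duplicating them
row_count = conn.execute("SELECT COUNT(*) FROM daily_balance").fetchone()[0]
```

Because the cleanup is scoped to the run date, a support person can simply rerun the whole load without first working out which tables to tidy up.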

15 Package sizes and design must haves
When it comes to ETL packages, size does matter and smaller is usually better. How small is small enough?
“Every package should be the smallest set of commands needed to perform and complete a specific task” – Tim Mitchell

16 Package sizes and design must haves
Smaller packages have the following benefits:
- Easier for multiple developers to work on the same solution
- More flexibility when we schedule our solution
- Run failures are not so expensive because we are only rerunning the tasks that failed
- Easier to make repeatable
- Usually enables faster modification/enhancement of the solution

17 Auditing
It is very important to be able to objectively track run times for all steps and loads in your package. This:
- Allows your run team to see which packages ran longer than usual if they miss a window
- Gives you the ability to see if one of your loads is getting progressively slower and could cause issues in future
- Gives you numbers to compare any improvements against to see if they were effective
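One way to use tracked run times is sketched below in Python. It assumes the durations (in seconds) have already been read from whatever audit log is in place; the package names, the numbers, and the 1.5x threshold are all made up for illustration.

```python
from statistics import mean

def slow_packages(history, latest, threshold=1.5):
    """Return packages whose latest duration exceeds threshold x their historical mean."""
    flagged = {}
    for package, duration in latest.items():
        past = history.get(package, [])
        if not past:
            continue  # no baseline yet, nothing to compare against
        baseline = mean(past)
        if duration > threshold * baseline:
            flagged[package] = (duration, baseline)
    return flagged

# Hypothetical durations in seconds for earlier runs and for the latest run.
history = {"LoadCustomers": [120, 130, 125], "LoadSales": [600, 620, 610]}
latest = {"LoadCustomers": 128, "LoadSales": 1100}
flagged = slow_packages(history, latest)  # only LoadSales is well above its usual run time
```

The same history also serves as the baseline for checking whether a tuning change actually made a load faster.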

18 Auditing
The easiest way to get this done is to use one of the logging levels available for SSIS catalog projects in SQL Server. They are:
- None – Nothing is logged.
- Basic – This is the default. All events are logged except custom and diagnostic events.
- RunTimeLineage – Collects the data required to track lineage information in the data flow.
- Performance – Only performance statistics, and OnError and OnWarning events, are logged.
- Verbose – All events are logged, including custom and diagnostic events.

19 Auditing
If none of those work, we can now also create a custom logging level and use that when executing.

20 Auditing: custom log level internals
- Each level is created once in the SSIS catalog and can be used for any package execution in that catalog.
- Behind the scenes, custom logging levels are stored in the table [internal].[customized_logging_levels].
- Because the logging levels are stored as rows in the SSISDB database (as are most catalog settings), logging levels are portable from one catalog to another via T-SQL script.

21 Deployment Model
I always use project deployment to the catalog because:
- It’s easy to lose track of what is deployed when doing partial deploys
- It keeps a log of all deployments made that can be audited
- Rolling back to a previous version is a couple of clicks away should it be required, as it is built into the catalog

22 Deployment Model
There are some configurations to be aware of when using this method.

23 Success and failure notifications/mechanisms
It’s amazing how many projects don’t have a reliable notification method.

24 Success and failure notifications/mechanisms
Why is a good notification method important?
- You don’t want your end users picking up issues with your mart executions. It is important to get notified and fix these as soon as possible.
- You don’t want your run team to have to watch a project. People get distracted and are not very reliable monitors.
- Enables you to use all the days of the week.
- If a failure occurs, it’s important to give your run team as much time as possible to fix it so that you don’t miss your run window.

25 Success and failure notifications/mechanisms
Don’t let your ETL handle notifications because:
- It introduces another point of failure in your packages
- You might miss some low-level errors
- It lengthens the development and testing time
- ETL development tools often have limited or clunky notification options
- It’s difficult to keep consistent

26 Success and failure notifications/mechanisms
Easy alternatives:
- For SQL jobs, the agent does a decent job of notifications
- Applications like IFTTT (If This Then That) provide lots of flexibility. They allow you to monitor your inbox for certain words and trigger custom actions, like playing a specific song
Tinus’s top tip – Never pick a song you like to be your support song.

27 Parameter management
Other than the usual database connection parameters, ETL projects often have many other parameters required for the run. A very good solution for this is to have a database table dedicated to parameter values.
Benefits:
- A central space where all parameters can be tracked
- Can be modified without redeployment
- Very easy to update and even put a front end on
- Removes hard coding from the project
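A parameter table of this kind might look like the following sketch, again in Python with SQLite standing in for the warehouse database. The table name `etl_parameter`, its columns, and the two example parameters are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical parameter table: one row per named value used by the ETL run.
conn.execute(
    "CREATE TABLE etl_parameter ("
    " name TEXT PRIMARY KEY, value TEXT NOT NULL, description TEXT)"
)
conn.executemany(
    "INSERT INTO etl_parameter (name, value, description) VALUES (?, ?, ?)",
    [
        ("ExtractWindowDays", "7", "How many days of source data to reload"),
        ("VarianceTolerancePct", "5", "Allowed movement in validation checks"),
    ],
)

def get_parameter(conn, name, cast=str):
    """Look up a parameter by name and convert it with the given cast function."""
    row = conn.execute(
        "SELECT value FROM etl_parameter WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(f"Unknown ETL parameter: {name}")
    return cast(row[0])

window_before = get_parameter(conn, "ExtractWindowDays", int)
# Changing behaviour is a data update, not a redeployment.
conn.execute("UPDATE etl_parameter SET value = '14' WHERE name = 'ExtractWindowDays'")
window_after = get_parameter(conn, "ExtractWindowDays", int)
```

Storing every value as text with a cast at read time keeps the table simple; a typed column per data type would be an equally reasonable design.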

28 Automated testing
The majority of ETL populates a database with results that are within a certain variance of a previous value. The test is usually a query or queries that we all have saved already in case business comes back with questions on our data.
Two easy options to repeat:
1 - A scheduled SSRS report with all the queries as datasets. This is easy to neatly format as well and we can even include business in the
2 - A table with a list of queries to execute and expected results. These can then be looped through and, if the actual is outside a variance, we can use sp_send_dbmail to send the results.
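The second option could be sketched roughly as follows, in Python with SQLite as a stand-in. The `validation_check` table, its columns, and the sample data are hypothetical; in the setup described above the failures would be mailed out, whereas this sketch only returns them.

```python
import sqlite3

def run_validation_checks(conn):
    """Run each stored check query and return the ones outside their allowed variance."""
    failures = []
    checks = conn.execute(
        "SELECT check_name, query_text, expected_value, variance_pct FROM validation_check"
    ).fetchall()
    for name, query_text, expected, variance_pct in checks:
        actual = conn.execute(query_text).fetchone()[0]
        allowed = abs(expected) * variance_pct / 100.0
        if abs(actual - expected) > allowed:
            failures.append((name, expected, actual))
    return failures

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount REAL)")
conn.executemany("INSERT INTO sales (amount) VALUES (?)", [(500.0,), (320.0,), (200.0,)])

# Hypothetical check table: a query, the value it is expected to return, and a tolerance.
conn.execute(
    "CREATE TABLE validation_check ("
    " check_name TEXT, query_text TEXT, expected_value REAL, variance_pct REAL)"
)
conn.executemany(
    "INSERT INTO validation_check VALUES (?, ?, ?, ?)",
    [
        ("SalesTotal", "SELECT SUM(amount) FROM sales", 1000.0, 5.0),   # 1020 is within 5%
        ("SalesRowCount", "SELECT COUNT(*) FROM sales", 10.0, 0.0),     # 3 rows, expected 10
    ],
)
failures = run_validation_checks(conn)
```

Since the check queries are executed as stored, write access to the check table should be limited to people who are trusted to run arbitrary SQL against the warehouse.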

29 Package scheduling and execution
Scheduling options in ETL:
- Automation tools like WhereScape, TimeXtender, etc.
- Out-of-the-box scheduling and execution options for SSIS packages using SQL Agent, without package modifications

30 Package scheduling and execution
Top missing features in SSIS execution:
- Error and success emails with error details
- Resuming from the last completed task without package modification
- Ability to easily configure a run containing hundreds of packages
- Ability to asynchronously execute some tasks that can be executed together
- Automatic retries up to a specific count for certain tasks
- Simple execution

31 Package scheduling and execution
Started with a procedure from the following Tim Mitchell blog post. It details the 3 steps for package execution very well:
- Exec [SSISDB].[catalog].[create_execution]
- Exec [SSISDB].[catalog].[set_execution_parameter_value]
- Exec [SSISDB].[catalog].[start_execution]
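To make the three steps concrete, the sketch below uses Python to render a T-SQL script that calls them in order. This is not the procedure from the blog post: the folder, project, and package names are hypothetical, and the numeric logging level codes (0 None, 1 Basic, 2 Performance, 3 Verbose, 4 RuntimeLineage) and the object type 50 for system parameters are stated from memory of the SSIS catalog documentation, so check them against your server version.

```python
# Assumed numeric codes for the built-in SSIS catalog logging levels.
LOGGING_LEVELS = {"None": 0, "Basic": 1, "Performance": 2, "Verbose": 3, "RuntimeLineage": 4}

def quote(value):
    """Render a T-SQL Unicode string literal, doubling embedded single quotes."""
    return "N'" + value.replace("'", "''") + "'"

def build_execution_script(folder, project, package, logging_level="Basic"):
    """Return a T-SQL script that creates, configures and starts one package execution."""
    level = LOGGING_LEVELS[logging_level]
    return "\n".join([
        "DECLARE @execution_id BIGINT;",
        f"DECLARE @logging_level SMALLINT = {level};",
        "",
        "-- Step 1: create the execution and capture its id",
        "EXEC [SSISDB].[catalog].[create_execution]",
        f"    @folder_name = {quote(folder)},",
        f"    @project_name = {quote(project)},",
        f"    @package_name = {quote(package)},",
        "    @use32bitruntime = 0,",
        "    @reference_id = NULL,",
        "    @execution_id = @execution_id OUTPUT;",
        "",
        "-- Step 2: set execution parameters (object type 50 = system parameter)",
        "EXEC [SSISDB].[catalog].[set_execution_parameter_value]",
        "    @execution_id = @execution_id,",
        "    @object_type = 50,",
        "    @parameter_name = N'LOGGING_LEVEL',",
        "    @parameter_value = @logging_level;",
        "",
        "-- Step 3: start the execution",
        "EXEC [SSISDB].[catalog].[start_execution] @execution_id = @execution_id;",
    ])

# Hypothetical catalog folder, project and package names.
script = build_execution_script("Finance", "WarehouseLoad", "LoadSales.dtsx", "Verbose")
print(script)
```

Generating the script as text keeps the sketch runnable without a SQL Server connection; a real wrapper would more likely be a stored procedure that takes these values as parameters.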

32 Package scheduling and execution
- Add a table to track package execution
- Add a table to track iterations and parameters
- Add basic error emails

33 Package scheduling and execution
Add the error reason to the failure notification. It can now include details similar to:
“Data Flow Task:Error: ADO NET Source has failed to acquire the connection "" with the following error message: "ORA-12154: TNS:could not resolve the connect identifier specified"”

34 Package scheduling and execution
For the last pieces of functionality we need to add the following columns to the JobSteps table:
- CanRetry
- MaxRetry
- CurrentRetry
- RetryWait
- TaskGroup
The first four columns cater for retry management. This is a vital piece of missing functionality to ensure the support team has to be involved as little as possible.

35 Package scheduling and execution
The last column allows tasks with the same task group populated to be executed together.
To execute:
- Add a record to the RunManager table
- Execute the new stored procedure
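The behaviour those five columns describe can be sketched in Python as below. This is only an illustration, not the stored procedure from the talk: the field names mirror the JobSteps columns, but running groups in ascending order, using threads for steps in the same group, and stopping after a failed group are all assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from itertools import groupby
from typing import Callable

@dataclass
class JobStep:
    name: str
    action: Callable[[], None]
    task_group: int            # TaskGroup: steps sharing a value run together
    can_retry: bool = False    # CanRetry
    max_retry: int = 0         # MaxRetry
    retry_wait: float = 0.0    # RetryWait, in seconds
    current_retry: int = 0     # CurrentRetry

def run_step(step):
    """Run one step, retrying up to max_retry times if retries are allowed."""
    while True:
        try:
            step.action()
            return True
        except Exception:
            if not step.can_retry or step.current_retry >= step.max_retry:
                return False
            step.current_retry += 1
            time.sleep(step.retry_wait)

def run_job(steps):
    """Run task groups in ascending order; steps in the same group run concurrently."""
    results = {}
    ordered = sorted(steps, key=lambda s: s.task_group)
    for _, members in groupby(ordered, key=lambda s: s.task_group):
        group = list(members)
        with ThreadPoolExecutor(max_workers=len(group)) as pool:
            for step, ok in zip(group, pool.map(run_step, group)):
                results[step.name] = ok
        if not all(results[s.name] for s in group):
            break  # assumed policy: do not start later groups after a failure
    return results

attempts = {"count": 0}

def flaky_load():
    """Simulated load that fails twice before succeeding."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated connection failure")

steps = [
    JobStep("LoadCustomers", lambda: None, task_group=1),
    JobStep("LoadSales", flaky_load, task_group=1, can_retry=True, max_retry=3, retry_wait=0.01),
    JobStep("BuildFacts", lambda: None, task_group=2),
]
results = run_job(steps)
```

In this run the flaky step recovers on its third attempt without anyone being paged, which is the outcome the retry columns are meant to allow.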

36 Demo

37 That’s it

38 Questions
Some resources: https://www.timmitchell.net

