ETL Design - Stage Philip Noakes May 9, 2015.


1 ETL Design - Stage Philip Noakes May 9, 2015

2 Who am I? Philip Noakes Database Developer/Designer CapTech Consulting
MCITP in SQL Server BI

3 Agenda
Background – ETL and Staging Data: ETL Architecture; ETL vs ELT
Data Modeling in Stage: Concepts; Table Structure; Data Flow
Auxiliary tables in the stage environment: Control tables; Logging (Process Execution, Errors); Notification/Reporting

4 ETL Architecture

5 ETL vs ELT
ELT: load raw data into the presentation layer, then perform transformations at the target.
ETL: load already-transformed data into the presentation layer.
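A minimal sketch of the two definitions above, using Python's built-in sqlite3 module as a stand-in for the target database. The table names (presentation_etl, raw_orders, presentation_elt) and the sample rows are hypothetical and are not taken from the talk.

```python
import sqlite3

# Hypothetical source rows: (order_id, amount as text, currency code)
source_rows = [(1, " 10.50", "usd"), (2, "3.00 ", "eur")]

conn = sqlite3.connect(":memory:")  # stand-in for the target database

# ETL: transform in the pipeline, then load the already-clean rows.
conn.execute(
    "CREATE TABLE presentation_etl (order_id INTEGER, amount REAL, currency TEXT)"
)
transformed = [(oid, float(amt.strip()), cur.upper()) for oid, amt, cur in source_rows]
conn.executemany("INSERT INTO presentation_etl VALUES (?, ?, ?)", transformed)

# ELT: load the raw rows untouched, then transform with SQL at the target.
conn.execute("CREATE TABLE raw_orders (order_id INTEGER, amount TEXT, currency TEXT)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", source_rows)
conn.execute(
    """
    CREATE TABLE presentation_elt AS
    SELECT order_id,
           CAST(TRIM(amount) AS REAL) AS amount,
           UPPER(currency) AS currency
    FROM raw_orders
    """
)

etl_rows = conn.execute("SELECT * FROM presentation_etl ORDER BY order_id").fetchall()
elt_rows = conn.execute("SELECT * FROM presentation_elt ORDER BY order_id").fetchall()
raw_amount = conn.execute(
    "SELECT amount FROM raw_orders WHERE order_id = 1"
).fetchone()[0]
```

Both paths end with the same presentation rows. In the ELT path the untransformed values also remain in raw_orders at the target, which is the traceability benefit the next slide lists.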

6 ETL vs ELT
When to use ELT: traceability to untransformed source data; larger volumes of data.
When to use ETL: all other times.
“The ETL process can take a long time. If we are processing in stream, we’ll have a connection open to the source system. A long-running process can create problems with database locks and stress the transaction system.” (Reference 1)

7 Stage Design Built by database developers for database developers!!!

8 Stage Concepts
Schemas: source specific; secured
Denormalized Data
Data Cleansing: flag bad or unusable data

9 Table Design - Schemas
Organization: Source System Identification; Cleansed vs Raw
Security Administration: Grant access by source system

10 Table Design - Denormalizing
Flattening Data: pulling higher granularity attributes into lower granularity records; pivoting lower granularity data into columns on higher granularity rows.
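The two flattening moves can be sketched as follows, assuming a hypothetical order header/line pair and a hypothetical customer phone table (none of these names come from the slides). sqlite3 stands in for the stage database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE order_header (order_id INTEGER, customer TEXT, order_date TEXT);
    CREATE TABLE order_line (order_id INTEGER, line_no INTEGER, product TEXT, qty INTEGER);
    CREATE TABLE customer_phone (customer TEXT, phone_type TEXT, phone TEXT);

    INSERT INTO order_header VALUES (100, 'Acme', '2015-05-01');
    INSERT INTO order_line VALUES (100, 1, 'Widget', 2), (100, 2, 'Gadget', 5);
    INSERT INTO customer_phone VALUES
        ('Acme', 'office', '555-0100'),
        ('Acme', 'mobile', '555-0199');
    """
)

# Flattening: copy header-level attributes onto each line-level record.
flattened = conn.execute(
    """
    SELECT l.order_id, l.line_no, h.customer, h.order_date, l.product, l.qty
    FROM order_line AS l
    JOIN order_header AS h ON h.order_id = l.order_id
    ORDER BY l.line_no
    """
).fetchall()

# Pivoting: turn one row per phone into columns on a single customer row.
pivoted = conn.execute(
    """
    SELECT customer,
           MAX(CASE WHEN phone_type = 'office' THEN phone END) AS office_phone,
           MAX(CASE WHEN phone_type = 'mobile' THEN phone END) AS mobile_phone
    FROM customer_phone
    GROUP BY customer
    """
).fetchall()
```

After the join, each order line carries its customer and order date, so a downstream load can read the stage table on its own without going back to the header.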

11 Table Design - Denormalizing
Example: Orders Table

12 Table Design - Denormalizing
Example: Product Categories

13 Table Design - Denormalizing
"[...] design staging tables to better suit the target rather than the source. Reasons: 1. ETL is usually a two-step process. Stage then load. If the staging does mild transformations to better suit the target, I need only create one set of load processes. If the DW gets similar data from multiple sources, all I need to do is create new source specific staging processes and let the existing load processes handle the new source. 2. Sources change. I don't want to rewrite ETL processes from end-to-end because of a change in the source. 3. Most of the heavy transformation logic occurs on the load side. With the staging tables closer in structure to the target, the load process code tends to be simpler." - Nicholas Galemmo, Kimball Group Forums

14 Table Design - Denormalizing
Why denormalize in stage? Stand-alone tables; reflect target architecture; utilize keys and indexes on source.
Why not denormalize in stage? Strain on source system; flexibility.

15 Table Design – Persisted Tables
Provides traceability and reload capabilities without hitting the source; stores more attributes than required in the presentation layer; has a defined retention period; tracks processed records.
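One way a persisted stage table might track processed records and enforce a retention period is sketched below. The table name, the column names, and the 30-day retention value are assumptions made for illustration. sqlite3 stands in for the stage database.

```python
import sqlite3

RETENTION_DAYS = 30  # hypothetical retention period

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE stage_orders_persisted (
        order_id  INTEGER,
        payload   TEXT,              -- extra source attributes kept for traceability
        loaded_on TEXT,              -- ISO date the row landed in stage
        processed INTEGER DEFAULT 0  -- 1 once loaded to the presentation layer
    )
    """
)
conn.executemany(
    "INSERT INTO stage_orders_persisted VALUES (?, ?, ?, ?)",
    [
        (1, "old, already loaded", "2015-03-01", 1),
        (2, "recent, already loaded", "2015-05-01", 1),
        (3, "recent, not yet loaded", "2015-05-08", 0),
    ],
)


def pending_rows(conn):
    """Rows still waiting to be loaded; a reload reads stage, not the source."""
    return conn.execute(
        "SELECT order_id FROM stage_orders_persisted "
        "WHERE processed = 0 ORDER BY order_id"
    ).fetchall()


def purge_expired(conn, as_of):
    """Delete processed rows older than the retention period, as of a given date."""
    cur = conn.execute(
        "DELETE FROM stage_orders_persisted "
        "WHERE processed = 1 AND loaded_on < date(?, ?)",
        (as_of, f"-{RETENTION_DAYS} days"),
    )
    return cur.rowcount


pending = pending_rows(conn)
purged = purge_expired(conn, "2015-05-09")
remaining = [
    row[0]
    for row in conn.execute(
        "SELECT order_id FROM stage_orders_persisted ORDER BY order_id"
    )
]
```

The purge only touches rows already marked as processed, so an unprocessed row is never aged out before it reaches the presentation layer.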

16 Data Cleansing
Identify data scenarios that you don’t want in the target; enforce business rules; look for duplicates; check referential integrity.

17 Data Cleansing
Status/Audit Fields: Status Code (Process/Do Not Process); Error Description
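A minimal sketch of how status and audit fields like these might be populated, assuming a hypothetical stage_order table with a status_code column ('P' for process, 'X' for do not process) and an error_desc column. The three rules are illustrative stand-ins for a business rule, a referential integrity check, and a duplicate check. sqlite3 stands in for the stage database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stage_customer (customer_id INTEGER);
    CREATE TABLE stage_order (
        row_id      INTEGER PRIMARY KEY,
        order_id    INTEGER,
        customer_id INTEGER,
        qty         INTEGER,
        status_code TEXT DEFAULT 'P',   -- 'P' = process, 'X' = do not process
        error_desc  TEXT
    );
    INSERT INTO stage_customer VALUES (1), (2);
    INSERT INTO stage_order (order_id, customer_id, qty) VALUES
        (10, 1, 5),    -- clean
        (11, 9, 1),    -- unknown customer
        (12, 2, -3),   -- violates the business rule
        (13, 2, 4),    -- first copy of a duplicate
        (13, 2, 4);    -- second copy of a duplicate
    """
)

# Each rule flags rows that are still marked 'P'; the first failing rule wins.
RULES = [
    ("Negative quantity", "qty < 0"),
    (
        "Customer not found in stage_customer",
        "customer_id NOT IN (SELECT customer_id FROM stage_customer)",
    ),
    (
        "Duplicate order_id",
        "row_id NOT IN (SELECT MIN(row_id) FROM stage_order GROUP BY order_id)",
    ),
]
for description, predicate in RULES:
    conn.execute(
        "UPDATE stage_order SET status_code = 'X', error_desc = ? "
        f"WHERE status_code = 'P' AND {predicate}",
        (description,),
    )

flagged = conn.execute(
    "SELECT order_id, error_desc FROM stage_order "
    "WHERE status_code = 'X' ORDER BY row_id"
).fetchall()
clean = [
    row[0]
    for row in conn.execute(
        "SELECT order_id FROM stage_order WHERE status_code = 'P' ORDER BY row_id"
    )
]
```

Bad rows stay in stage with a reason attached instead of being deleted, so the target load can filter on status_code = 'P' and the rejected rows remain available for review.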

18 Data Cleansing Cleansed Data Tables

19 Table Design – Data Typing
Match the source; log rejected records (and maybe fail).

20 Table Design – Keys, Indexes, Etc…
Foreign Keys? No! Indexing? No. Primary Keys? Yes. Not NULLs/Check Constraints. * - Persisted tables

21 Auxiliary Tables
Set up a framework! Log process execution stats; keep track of errors; run your system.

22 Framework Components and Capabilities
Control Table: Incremental Loads
Package Execution Logging: System health reporting; SLA tracking
Error Tables: Record Accounting
Notification
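As a rough illustration of a control table driving incremental loads, with a process log for execution stats: the sketch below assumes a hypothetical etl_control table that stores a high-water mark per source table and a hypothetical etl_process_log table. These layouts are assumptions, not the tables shown in the talk. sqlite3 stands in for both source and stage.

```python
import sqlite3

SOURCE_TABLE = "source_orders"  # hypothetical source table name

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Stand-in for a source table with a last-modified date.
    CREATE TABLE source_orders (order_id INTEGER, modified_at TEXT);
    CREATE TABLE stage_orders  (order_id INTEGER, modified_at TEXT);
    -- Control table: one row per source table, holding the high-water mark.
    CREATE TABLE etl_control (table_name TEXT PRIMARY KEY, last_extracted_at TEXT);
    -- Process log: one row per run, for execution stats and record accounting.
    CREATE TABLE etl_process_log (table_name TEXT, rows_extracted INTEGER, status TEXT);

    INSERT INTO source_orders VALUES
        (1, '2015-05-01'), (2, '2015-05-05'), (3, '2015-05-08');
    INSERT INTO etl_control VALUES ('source_orders', '2015-05-02');
    """
)


def incremental_load(conn):
    """Extract only rows changed since the stored high-water mark."""
    (watermark,) = conn.execute(
        "SELECT last_extracted_at FROM etl_control WHERE table_name = ?",
        (SOURCE_TABLE,),
    ).fetchone()
    rows = conn.execute(
        "SELECT order_id, modified_at FROM source_orders "
        "WHERE modified_at > ? ORDER BY modified_at",
        (watermark,),
    ).fetchall()
    with conn:  # commit stage rows, watermark, and log entry together
        conn.executemany("INSERT INTO stage_orders VALUES (?, ?)", rows)
        if rows:
            conn.execute(
                "UPDATE etl_control SET last_extracted_at = ? WHERE table_name = ?",
                (rows[-1][1], SOURCE_TABLE),
            )
        conn.execute(
            "INSERT INTO etl_process_log VALUES (?, ?, 'Succeeded')",
            (SOURCE_TABLE, len(rows)),
        )
    return len(rows)


first_run = incremental_load(conn)
second_run = incremental_load(conn)
logged_counts = [
    row[0]
    for row in conn.execute(
        "SELECT rows_extracted FROM etl_process_log ORDER BY rowid"
    )
]
watermark_after = conn.execute(
    "SELECT last_extracted_at FROM etl_control"
).fetchone()[0]
```

The first run picks up the two rows modified after the stored mark and advances it; the second run finds nothing new but still writes a log row, which is what makes run-by-run record accounting possible.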

23 Control Table

24 Using the Control Table

25 Process Tables

26 Error Logging Reference 2

27 Error Logging

28 Summary
Stage = Exciting; maintain security considerations in stage; table design can reduce impact on source system; stage can decrease the complexity of target load; stage can be used for recovery and reload; use stage to limit risk of data quality issues.

29 Questions

30 References
1 – KimballGroup.com, Design Tip #99
2 – Erik Veerman, Jessica M Moss, Brian Knight, Jay Hackney, SQL Server 2008 Integration Services: Problem, Design, Solution

