1
ETL Design - Stage
Philip Noakes
May 9, 2015
2
Who am I?
Philip Noakes
Database Developer/Designer, CapTech Consulting
MCITP in SQL Server BI
3
Agenda
Background – ETL and Staging Data
  ETL Architecture
  ETL vs ELT
Data Modeling in Stage
  Concepts
  Table Structure
  Data Flow
Auxiliary tables in the stage environment
  Control tables
  Logging
    Process Execution
    Errors
  Notification/Reporting
4
ETL Architecture
5
ETL vs ELT
ELT – Loading raw data into the presentation layer, then performing transformations at the target
ETL – Loading transformed data into the presentation layer
6
ETL vs ELT
When to use ELT
  Traceability to untransformed source data
  Larger volumes of data
When to use ETL
  All other times
“The ETL process can take a long time. If we are processing in stream, we’ll have a connection open to the source system. A long-running process can create problems with database locks and stress the transaction system.” (Reference 1)
7
Stage Design
Built by database developers for database developers!!!
8
Stage Concepts
Schemas
  Source specific
  Secured
Denormalized Data
Data Cleansing
  Flag bad or unusable data
9
Table Design - Schemas
Organization
  Source System Identification
  Cleansed vs Raw
Security Administration
  Grant access by source system
10
Table Design - Denormalizing
Flattening Data
  Pulling higher granularity attributes into lower granularity records.
  Pivoting lower granularity data into columns on higher granularity rows.
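As a rough illustration of the first kind of flattening, the sketch below copies order header attributes onto each order line while building a stage table. It uses Python's built-in sqlite3 module only so that it runs anywhere; the table and column names (src_order_header, src_order_line, stg_order) are hypothetical and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical source tables: one header row per order, many lines per order.
    CREATE TABLE src_order_header (order_id INTEGER, customer TEXT, order_date TEXT);
    CREATE TABLE src_order_line (order_id INTEGER, line_no INTEGER, product TEXT, qty INTEGER);
    INSERT INTO src_order_header VALUES (1, 'Acme', '2015-05-01'), (2, 'Globex', '2015-05-02');
    INSERT INTO src_order_line VALUES (1, 1, 'Widget', 5), (1, 2, 'Gadget', 2), (2, 1, 'Widget', 1);

    -- Flattened stage table: header attributes are repeated on every line row.
    CREATE TABLE stg_order AS
    SELECT l.order_id, l.line_no, h.customer, h.order_date, l.product, l.qty
    FROM src_order_line AS l
    JOIN src_order_header AS h ON h.order_id = l.order_id;
""")

flat_rows = conn.execute(
    "SELECT order_id, line_no, customer, product, qty "
    "FROM stg_order ORDER BY order_id, line_no"
).fetchall()
for row in flat_rows:
    print(row)
```

The stage table stands alone after the load: downstream steps can read customer and order date without joining back to the source.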
11
Table Design - Denormalizing
Example: Orders Table
12
Table Design - Denormalizing
Example: Product Categories
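The slide's own example did not survive in this transcript. As a stand-in, here is a small sketch of the pivoting case, assuming a hypothetical source where each product has one row per category level. The names (src_product, src_product_category, category_l1, category_l2) are invented for illustration, and sqlite3 is used only to keep the sketch runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical source: categories stored as child rows of a product.
    CREATE TABLE src_product (product_id INTEGER, name TEXT);
    CREATE TABLE src_product_category (product_id INTEGER, level INTEGER, category TEXT);
    INSERT INTO src_product VALUES (10, 'Road Bike'), (20, 'Helmet');
    INSERT INTO src_product_category VALUES
        (10, 1, 'Bikes'), (10, 2, 'Road'),
        (20, 1, 'Accessories');
""")

# Conditional aggregation turns the child rows into columns on the product row.
pivot_sql = """
    SELECT p.product_id, p.name,
           MAX(CASE WHEN c.level = 1 THEN c.category END) AS category_l1,
           MAX(CASE WHEN c.level = 2 THEN c.category END) AS category_l2
    FROM src_product AS p
    LEFT JOIN src_product_category AS c ON c.product_id = p.product_id
    GROUP BY p.product_id, p.name
    ORDER BY p.product_id
"""
pivoted = conn.execute(pivot_sql).fetchall()
for row in pivoted:
    print(row)
```

A product with no second-level category simply gets NULL in category_l2, which the load step can then handle in one place.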
13
Table Design - Denormalizing
"[...] design staging tables to better suit the target rather than the source. Reasons:
1. ETL is usually a two-step process. Stage then load. If the staging does mild transformations to better suit the target, I need only create one set of load processes. If the DW gets similar data from multiple sources, all I need to do is create new source specific staging processes and let the existing load processes handle the new source.
2. Sources change. I don't want to rewrite ETL processes from end-to-end because of a change in the source.
3. Most of the heavy transformation logic occurs on the load side. With the staging tables closer in structure to the target, the load process code tends to be simpler."
- Nicholas Galemmo, Kimball Group Forums
14
Table Design - Denormalizing
Why denormalize in stage?
  Stand-alone tables
  Reflect target architecture
  Utilize keys and indexes on source
Why not denormalize in stage?
  Strain on source system
  Flexibility
15
Table Design – Persisted Tables
Provides traceability and reload capabilities without hitting source
Store more attributes than required in the presentation layer
Defined retention period
Track processed records
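One way these points could fit together is sketched below: a persisted stage table carries a load date and a processed flag, and a purge removes processed rows older than the retention period. The 90-day period, the table name stg_customer_persisted, and the column names are assumptions made for this sketch, not details from the presentation.

```python
import sqlite3

RETENTION_DAYS = 90  # assumed retention period, chosen only for this sketch

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical persisted stage table with audit columns.
    CREATE TABLE stg_customer_persisted (
        customer_id INTEGER,
        name        TEXT,
        extra_attr  TEXT,               -- kept in stage even if the target ignores it
        load_date   TEXT,               -- ISO date the row was staged
        processed   INTEGER DEFAULT 0   -- 1 once loaded to the target
    );
    INSERT INTO stg_customer_persisted VALUES
        (1, 'Acme',    'x', '2015-01-01', 1),
        (2, 'Globex',  'y', '2015-05-01', 1),
        (3, 'Initech', 'z', '2015-05-08', 0);
""")

def unprocessed(conn):
    """Return the ids of rows still waiting to be loaded to the target."""
    return conn.execute(
        "SELECT customer_id FROM stg_customer_persisted "
        "WHERE processed = 0 ORDER BY customer_id"
    ).fetchall()

def purge_expired(conn, as_of):
    """Delete processed rows staged more than RETENTION_DAYS before as_of."""
    cur = conn.execute(
        "DELETE FROM stg_customer_persisted "
        "WHERE processed = 1 AND load_date < date(?, ?)",
        (as_of, f"-{RETENTION_DAYS} days"),
    )
    conn.commit()
    return cur.rowcount

pending = unprocessed(conn)
purged = purge_expired(conn, "2015-05-09")
print(pending, purged)
```

Because unprocessed rows are never purged, a failed target load can be rerun from stage without going back to the source.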
16
Data Cleansing
Identify data scenarios that you don’t want in the target
Enforce business rules
Look for duplicates
Check referential integrity
17
Data Cleansing
Status/Audit Fields
  Status Code
    Process/Do Not Process
  Error Description
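A minimal sketch of status/audit fields in use follows, assuming a stage table with a status_code column ('P' for process, 'X' for do not process) and an error_desc column. The codes, the three rules, and the table names are hypothetical; they cover one business rule, one referential-integrity check, and one duplicate check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_customer (customer_id INTEGER);
    CREATE TABLE stg_order (
        row_id      INTEGER PRIMARY KEY,
        order_id    INTEGER,
        customer_id INTEGER,
        amount      REAL,
        status_code TEXT DEFAULT 'P',   -- assumed codes: P = process, X = do not process
        error_desc  TEXT
    );
    INSERT INTO stg_customer VALUES (1), (2);
    INSERT INTO stg_order (order_id, customer_id, amount) VALUES
        (100, 1, 25.0),
        (101, 9, 10.0),   -- customer 9 is not in stg_customer
        (102, 2, -5.0),   -- breaks the assumed rule that amounts are non-negative
        (100, 1, 25.0);   -- second copy of order 100
""")

# Each rule is (error description, SQL condition identifying bad rows).
RULES = [
    ("Negative amount", "amount < 0"),
    ("Unknown customer", "customer_id NOT IN (SELECT customer_id FROM stg_customer)"),
    ("Duplicate order", "row_id NOT IN (SELECT MIN(row_id) FROM stg_order GROUP BY order_id)"),
]

for description, condition in RULES:
    # Only rows still marked 'P' are flagged, so the first failing rule wins.
    conn.execute(
        "UPDATE stg_order SET status_code = 'X', error_desc = ? "
        "WHERE status_code = 'P' AND " + condition,
        (description,),
    )
conn.commit()

flagged = conn.execute(
    "SELECT order_id, status_code, error_desc FROM stg_order ORDER BY row_id"
).fetchall()
for row in flagged:
    print(row)
```

Bad rows stay in stage with a reason attached instead of being silently dropped; the target load then selects only rows where status_code is 'P'.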
18
Data Cleansing Cleansed Data Tables
19
Table Design – Data Typing
Match the Source
Log rejected records (And maybe fail)
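The "log rejected records (and maybe fail)" idea might look roughly like the sketch below: each raw record is cast to the types declared for the stage table, records that do not fit are collected with the reason, and the load optionally fails once rejects pass a threshold. The field names and the max_rejects parameter are assumptions for illustration.

```python
from datetime import date

def cast_row(raw):
    """Convert one raw source record to typed values; raises on bad data."""
    return {
        "order_id": int(raw["order_id"]),
        "order_date": date.fromisoformat(raw["order_date"]),
        "amount": float(raw["amount"]),
    }

def stage_rows(raw_rows, max_rejects=None):
    """Return (staged, rejected); raise if rejects exceed max_rejects."""
    staged, rejected = [], []
    for raw in raw_rows:
        try:
            staged.append(cast_row(raw))
        except (ValueError, KeyError, TypeError) as exc:
            # Keep the original record and the reason so it can be reported.
            rejected.append({"row": raw, "error": str(exc)})
    if max_rejects is not None and len(rejected) > max_rejects:
        raise RuntimeError(
            f"{len(rejected)} rejected rows exceed limit of {max_rejects}"
        )
    return staged, rejected

raw_rows = [
    {"order_id": "1", "order_date": "2015-05-09", "amount": "12.50"},
    {"order_id": "two", "order_date": "2015-05-09", "amount": "3"},   # bad integer
    {"order_id": "3", "order_date": "2015-02-30", "amount": "7"},     # impossible date
]
staged, rejected = stage_rows(raw_rows)
print(len(staged), "staged;", len(rejected), "rejected")
```

Leaving max_rejects unset logs and continues; setting it turns a data quality problem into a hard failure when too many records are bad.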
20
Table Design – Keys, Indexes, Etc…
Foreign Keys? No! Indexing? No Primary Keys Yes Not NULLs/Check Constraints * - Persisted tables
21
Auxiliary Tables
Set up a Framework!
  Log process execution stats
  Keep track of errors
  Run your system
22
Framework Components and Capabilities
Control Table
  Incremental Loads
Package Execution Logging
  System health reporting
  SLA tracking
Error Tables
  Record Accounting
Notification
23
Control Table
24
Using the Control Table
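The slide content for this step is not preserved here. One common way to use a control table for incremental loads is to store a high-water mark per source table, and the sketch below assumes that approach. The etl_control layout and the modified_at column are hypothetical, and sqlite3 stands in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_order (order_id INTEGER, modified_at TEXT);
    CREATE TABLE stg_order (order_id INTEGER, modified_at TEXT);
    -- Hypothetical control table: one high-water mark per source table.
    CREATE TABLE etl_control (table_name TEXT PRIMARY KEY, last_loaded_at TEXT);
    INSERT INTO src_order VALUES (1, '2015-05-01 08:00:00'), (2, '2015-05-02 09:30:00');
    INSERT INTO etl_control VALUES ('src_order', '1900-01-01 00:00:00');
""")

def incremental_load(conn):
    """Stage only rows changed since the stored mark, then advance the mark."""
    (watermark,) = conn.execute(
        "SELECT last_loaded_at FROM etl_control WHERE table_name = 'src_order'"
    ).fetchone()
    rows = conn.execute(
        "SELECT order_id, modified_at FROM src_order "
        "WHERE modified_at > ? ORDER BY modified_at",
        (watermark,),
    ).fetchall()
    if rows:
        conn.executemany("INSERT INTO stg_order VALUES (?, ?)", rows)
        conn.execute(
            "UPDATE etl_control SET last_loaded_at = ? WHERE table_name = 'src_order'",
            (rows[-1][1],),
        )
    conn.commit()
    return len(rows)

first = incremental_load(conn)    # picks up both existing rows
conn.execute("INSERT INTO src_order VALUES (3, '2015-05-03 10:00:00')")
second = incremental_load(conn)   # picks up only the new row
third = incremental_load(conn)    # nothing new to load
print(first, second, third)
```

Updating the mark in the same transaction as the stage insert keeps the two in step if the load fails partway.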
25
Process Tables
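The table layout from this slide is not preserved in the transcript. As a sketch of what a process table could record, the example below logs one row per run with start and end times, a status, and row counts for record accounting. Every name here (etl_process_log, logged_run, the status values) is an assumption.

```python
import sqlite3
from contextlib import contextmanager
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE etl_process_log (
        run_id       INTEGER PRIMARY KEY,
        process_name TEXT,
        started_at   TEXT,
        ended_at     TEXT,
        status       TEXT,
        rows_read    INTEGER,
        rows_written INTEGER
    )
""")

def _now():
    return datetime.now(timezone.utc).isoformat()

@contextmanager
def logged_run(conn, process_name):
    """Record start, end, status, and row counts for one process execution."""
    cur = conn.execute(
        "INSERT INTO etl_process_log (process_name, started_at, status) "
        "VALUES (?, ?, 'RUNNING')",
        (process_name, _now()),
    )
    run_id = cur.lastrowid
    stats = {"rows_read": 0, "rows_written": 0}
    status = "FAILED"
    try:
        yield stats
        status = "SUCCEEDED"
    finally:
        # Runs on success and on error, so every run gets a final status.
        conn.execute(
            "UPDATE etl_process_log SET ended_at = ?, status = ?, "
            "rows_read = ?, rows_written = ? WHERE run_id = ?",
            (_now(), status, stats["rows_read"], stats["rows_written"], run_id),
        )
        conn.commit()

with logged_run(conn, "stage_orders") as stats:
    stats["rows_read"] = 3      # placeholder counts; a real process would count rows
    stats["rows_written"] = 2

print(conn.execute(
    "SELECT process_name, status, rows_read, rows_written FROM etl_process_log"
).fetchall())
```

The gap between rows_read and rows_written is what the error tables should account for, and the timestamps give the raw data for health and SLA reporting.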
26
Error Logging Reference 2
27
Error Logging
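The error-logging design shown on these slides is not preserved in the transcript. The sketch below shows one possible shape under stated assumptions: rows that violate a constraint on the stage table are written to an error table along with the database's message, and the counts are reconciled afterward. The tables stg_product and etl_error_log and the function load_with_error_log are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_product (product_id INTEGER NOT NULL, name TEXT NOT NULL);
    -- Hypothetical error table: which process failed, why, and on what data.
    CREATE TABLE etl_error_log (
        error_id      INTEGER PRIMARY KEY,
        process_name  TEXT,
        error_message TEXT,
        row_data      TEXT
    );
""")

def load_with_error_log(conn, process_name, rows):
    """Insert rows one at a time; divert constraint failures to the error table."""
    written = 0
    for row in rows:
        try:
            conn.execute("INSERT INTO stg_product VALUES (?, ?)", row)
            written += 1
        except sqlite3.IntegrityError as exc:
            conn.execute(
                "INSERT INTO etl_error_log (process_name, error_message, row_data) "
                "VALUES (?, ?, ?)",
                (process_name, str(exc), json.dumps(row)),
            )
    conn.commit()
    return written

source_rows = [(1, "Widget"), (2, None), (None, "Gadget")]
written = load_with_error_log(conn, "stage_products", source_rows)
errored = conn.execute("SELECT COUNT(*) FROM etl_error_log").fetchone()[0]

# Record accounting: every row read is either written or logged as an error.
print(len(source_rows), written, errored)
```

Row-by-row inserts are slow for large volumes; they are used here only to keep the sketch short.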
28
Summary
Stage = Exciting
Maintain security considerations in stage
Table design can reduce impact on source system
Stage can decrease the complexity of target load
Stage can be used for recovery and reload
Use stage to limit risk of data quality issues
29
Questions
30
References
1 – KimballGroup.com, Design Tip #99
2 – Erik Veerman, Jessica M. Moss, Brian Knight, Jay Hackney, SQL Server 2008 Integration Services: Problem, Design, Solution