The Road to Denormalization Starring Various “Denormalized” Celebrities
The Road to Denormalization
Before transactional data can be loaded into a Data Warehouse, the data must be denormalized.
[Diagram: Transactional Data -> Data Warehouse]

Normalization
But before you can understand Denormalization, you must understand Normalization . . . and to understand Normalization, you must understand Relational Databases.
[Speech bubble: “I’ve been Denormalized!”]
Relational Databases
- Collection of linked tables
- Tables linked by Primary Key / Foreign Key relationships (Referential Integrity)
- Primary Key – column whose values make each record unique in a parent table (e.g., Customer Number)
- Foreign Key – column in child table that links to the Primary Key in the parent table

Relational DB Example

CUSTOMER TABLE (“Parent” table; Cust # is the Primary Key)
Cust #   Cust Name
100      Moe
101      Larry
102      Curly

ORDER TABLE (“Child” table; Cust# is the Foreign Key)
Order #  Prod#  Qty  Cust#
1        QR22   1    100
2        QR22   25   100
3        SB56   3    102
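The parent/child link above can be sketched with Python's built-in sqlite3 module. This is only an illustration, not part of the original deck: the table and column names (customer, orders, cust_no, and so on) are assumptions chosen to mirror the slide, and the rejected order for customer 999 is a made-up row used to show referential integrity at work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE customer (
    cust_no   INTEGER PRIMARY KEY,   -- Primary Key in the parent table
    cust_name TEXT NOT NULL
);
CREATE TABLE orders (
    order_no INTEGER PRIMARY KEY,
    prod_no  TEXT NOT NULL,
    qty      INTEGER NOT NULL,
    cust_no  INTEGER NOT NULL REFERENCES customer(cust_no)  -- Foreign Key
);
""")

conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(100, "Moe"), (101, "Larry"), (102, "Curly")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "QR22", 1, 100), (2, "QR22", 25, 100), (3, "SB56", 3, 102)],
)


def try_orphan_order(connection):
    """Try to insert an order for a customer that does not exist."""
    try:
        connection.execute("INSERT INTO orders VALUES (4, 'QR22', 1, 999)")
    except sqlite3.IntegrityError:
        return False  # rejected: no parent row with cust_no 999
    return True


orphan_accepted = try_orphan_order(conn)

# Follow the foreign key back to the parent table to get customer names.
rows = conn.execute("""
    SELECT o.order_no, c.cust_name, o.qty
    FROM orders AS o
    JOIN customer AS c ON c.cust_no = o.cust_no
    ORDER BY o.order_no
""").fetchall()
```

Here orphan_accepted comes back False, because the database refuses a child row whose foreign key has no matching parent.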
Database Structure & Design
2 Approaches (in conflict):
1. Optimize for Data Capture, i.e., Capturing Transactions
2. Optimize for Data Access, i.e., Queries & Reporting
[Speech bubble: “I love conflict!”]
Approach #1: Optimize for Data Capture
To optimize for data capture, you must:
- Eliminate redundancy of data (or else wasted space & processing occurs)
- Ensure data integrity (or else data anomalies occur)
- Ensure that changes in data (modifications, deletions, etc.) only have to happen in one place
Normalization – process by which a database is optimized for data capture
- All data “redundancy” is removed from the database
- Has multiple forms (0, 1st, 2nd, 3rd, et al.)
Moving from 0NF to 1NF
Rule: Make a separate table for each set of related attributes, and make each field atomic (i.e., cannot be broken apart any further)

CUSTOMER DATA (0NF)
Cust #         CustName
100, 101, 102  Moe Howard, Larry Fine, Curly Howard

CUSTOMER TABLE (1NF)
Cust #  FName  LName
100     Moe    Howard
101     Larry  Fine
102     Curly  Howard

[Speech bubble: “I’M NOT MOVING!”]
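The 0NF-to-1NF step can be sketched in a few lines of Python. This is a minimal illustration that assumes the 0NF record packs its values into comma-separated strings, as on the slide; the function and key names are hypothetical.

```python
# The 0NF record from the slide: several values packed into each field.
unnormalized = {
    "cust_no": "100, 101, 102",
    "cust_name": "Moe Howard, Larry Fine, Curly Howard",
}


def to_first_normal_form(record):
    """Split packed fields into one row per customer with atomic columns."""
    numbers = [n.strip() for n in record["cust_no"].split(",")]
    names = [n.strip() for n in record["cust_name"].split(",")]
    rows = []
    for number, full_name in zip(numbers, names):
        # Assumes a simple "First Last" name; real names need more care.
        first, last = full_name.split(" ", 1)
        rows.append({"cust_no": int(number), "fname": first, "lname": last})
    return rows


customer_table = to_first_normal_form(unnormalized)
```

Each row now holds exactly one customer, and the name is split into two fields that cannot be broken apart any further.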
Moving from 1NF to 2NF
Rule: Eliminate any repeating values caused by a dependency on a “keyed” column (i.e., either Primary or Foreign)

TABLE X (1NF)
Cust #  FName  Order#
100     Moe    1
100     Moe    2
101     Larry  3

Dependency on Primary Key: “100 Moe” repeats for every order that customer places.

CUSTOMER TABLE (2NF)
Cust #  FName
100     Moe
101     Larry
102     Curly

ORDER TABLE (2NF)
Order #  Cust#
1        100
2        100
3        101
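The split of Table X can be sketched in Python as follows. The function name is hypothetical, and the sketch only recovers customers that actually appear in Table X, so a customer with no orders (Curly on the slide) would have to come from another source.

```python
# Table X from the slide: (cust_no, fname, order_no). "100 Moe" repeats.
table_x = [
    (100, "Moe", 1),
    (100, "Moe", 2),
    (101, "Larry", 3),
]


def split_customer_orders(rows):
    """Store each customer name once; orders keep only the customer key."""
    customers = {}  # cust_no -> fname, one entry per customer
    orders = []     # (order_no, cust_no)
    for cust_no, fname, order_no in rows:
        customers[cust_no] = fname
        orders.append((order_no, cust_no))
    return customers, orders


customer_table, order_table = split_customer_orders(table_x)
```

After the split, renaming a customer means changing one entry in customer_table instead of every order row.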
Moving from 2NF to 3NF
Rule: Eliminate any repeating values caused by a dependency on a “non-keyed” column (i.e., dependency on ANY column)

TABLE X (2NF)
Cust #  City  Order#  ShipTime
100     NY    1       2 days
101     NY    2       2 days
102     LA    3       5 days

Dependency between 2 non-key columns: “NY” always goes with “2 days”.

SHIP TIME TABLE (3NF)
City #  City  ShipTime
10      NY    2 days
20      LA    5 days

CUSTOMER TABLE (3NF)
Cust #  City#
100     10
101     10
102     20
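The 3NF step can be sketched the same way. The function name and the rule for generating City # values (10, 20, ... in order of first appearance) are assumptions picked so the output matches the slide; a real system would use whatever key scheme the database already has.

```python
# Table X from the slide: (cust_no, city, order_no, ship_time).
table_x = [
    (100, "NY", 1, "2 days"),
    (101, "NY", 2, "2 days"),
    (102, "LA", 3, "5 days"),
]


def to_third_normal_form(rows):
    """Move the City -> ShipTime dependency into its own table."""
    city_ids = {}          # city name -> generated City # (assumed scheme)
    ship_time_table = []   # (city_no, city, ship_time), one row per city
    customer_table = []    # (cust_no, city_no)
    for cust_no, city, _order_no, ship_time in rows:
        if city not in city_ids:
            city_ids[city] = 10 * (len(city_ids) + 1)
            ship_time_table.append((city_ids[city], city, ship_time))
        # The order number is ignored here, as on the slide's 3NF tables.
        customer_table.append((cust_no, city_ids[city]))
    return ship_time_table, customer_table


ship_time_table, customer_table = to_third_normal_form(table_x)
```

A change to New York's shipping time now touches one row in ship_time_table rather than every New York customer.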
Normalized DB Example
MANY database tables guard against redundant data (and help prevent data integrity issues)
[Speech bubble: “Am I a good example of ‘Normalized’?”]

Database Structure & Design
2 Approaches (in conflict):
1. Optimize for Data Capture, i.e., Capturing Transactions
2. Optimize for Data Access, i.e., Queries & Reporting
[Speech bubble: “I like conflict too!”]
Approach #2: Optimize for Data Access (in a separate, read-only Data Warehouse)
To optimize for data access, you must:
- Change the data layout to a different structure
- Allow data redundancy
- Reduce the number of table joins (i.e., reduce links among tables by combining tables)
Denormalizing – Adding redundancy & reducing joins in a relational database
Denormalizing – Most Common Approach
Star Schema (Clustering)
- Fact (core or transaction) Tables in middle of star
- Dimensional (structural or “lookup”) Tables around “points” of star

SALES ORDER (FACT) TABLE
Order #  Date      Cust#  Prod#  Loc#
1        06/15/XX  100    QR22   1000
2        07/19/XX  100    QR22   1000
3        08/30/XX  101    SR56   2000

CUSTOMER DIMENSION TABLE
Cust #  CustName
100     Moe
101     Larry
102     Curly

LOC DIMENSION TABLE
Loc #  LocName
1000   NY
2000   LA
3000   PGH

DATE DIMENSION TABLE
Date      Quarter
06/29/XX  2
06/30/XX  2
07/01/XX  3

PRODUCT DIMENSION TABLE
Prod #  ProdName
QR22    Rake
SR56    Spade
TW43    Mulch
- This Date Field helps build the “Date Dimension”
- These 2 tables become the “SALES FACT” table in the Data Warehouse
- These 5 tables become the “Product Dimension”
- These 3 tables become the “Customer Dimension”
Resulting Star Schema Data Warehouse
[Speech bubble: “It’s a STAR, like me!”]

SALES ORDER (FACT) TABLE
Order #  Date      Cust#  Prod#  Rep#
1        06/15/XX  100    QR22   1000
2        07/19/XX  100    QR22   1000
3        08/30/XX  101    SR56   2000

CUSTOMER DIMENSION
Cust #  CustName
100     Moe
101     Larry
102     Curly

DATE DIMENSION
Date      Quarter
06/29/XX  2
06/30/XX  2
07/01/XX  3

PRODUCT DIMENSION
Prod #  ProdName
QR22    Rake
SR56    Spade
TW43    Mulch
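A typical report against a star like this one joins the fact table straight out to the dimensions it needs, one hop each. The sketch below uses Python's sqlite3 module; the table and column names are assumptions modeled on the slide, and only the customer and product dimensions are included to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_dim (cust_no INTEGER PRIMARY KEY, cust_name TEXT);
CREATE TABLE product_dim  (prod_no TEXT PRIMARY KEY, prod_name TEXT);
CREATE TABLE sales_order_fact (
    order_no   INTEGER PRIMARY KEY,
    order_date TEXT,
    cust_no    INTEGER REFERENCES customer_dim(cust_no),
    prod_no    TEXT REFERENCES product_dim(prod_no)
);
INSERT INTO customer_dim VALUES (100, 'Moe'), (101, 'Larry'), (102, 'Curly');
INSERT INTO product_dim  VALUES ('QR22', 'Rake'), ('SR56', 'Spade'), ('TW43', 'Mulch');
INSERT INTO sales_order_fact VALUES
    (1, '06/15/XX', 100, 'QR22'),
    (2, '07/19/XX', 100, 'QR22'),
    (3, '08/30/XX', 101, 'SR56');
""")

# Fact table in the middle, one join per "point" of the star.
orders_by_customer_product = conn.execute("""
    SELECT c.cust_name, p.prod_name, COUNT(*) AS orders
    FROM sales_order_fact AS f
    JOIN customer_dim AS c ON c.cust_no = f.cust_no
    JOIN product_dim  AS p ON p.prod_no = f.prod_no
    GROUP BY c.cust_name, p.prod_name
    ORDER BY c.cust_name, p.prod_name
""").fetchall()
```

No query has to travel through a chain of lookup tables, which is the access-side payoff of allowing redundancy in the dimensions.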
Denormalizing (continued): Common (Conformed) Dimensions
Stars are linked via common (i.e., Conformed) Dimensions to form the Data Warehouse

SALES ORDER (FACT) TABLE
Order #  Date      Cust#  Prod#  Loc#
1        06/15/XX  100    QR22   1000
2        07/19/XX  100    QR22   1000
3        08/30/XX  101    SR56   2000

INVENTORY (FACT) TABLE
Prod#  ProdName  Stock Date  Units
QR22   Rake      03/23/XX    150
TW43   Mulch     04/15/XX    1452
SR56   Spade     05/01/XX    997

CUSTOMER DIMENSION
Cust #  CustName
100     Moe
101     Larry
102     Curly

LOC DIMENSION
Loc #  LocName
1000   NY
2000   LA
3000   PGH

DATE DIMENSION (Common/Conformed)
Date      Quarter
06/29/XX  2
06/30/XX  2
07/01/XX  3

PRODUCT DIMENSION (Common/Conformed)
Prod #  ProdName
QR22    Rake
SR56    Spade
TW43    Mulch
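One way to see what a conformed dimension buys you: two separate fact tables can be reported side by side because both use the same product keys. The sketch below uses Python's sqlite3 module with assumed table and column names; the inventory fact is trimmed to its key, stock date, and units, and each fact table is summarized on its own so that rows from one star do not multiply rows from the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (prod_no TEXT PRIMARY KEY, prod_name TEXT);
CREATE TABLE sales_order_fact (
    order_no INTEGER PRIMARY KEY,
    prod_no  TEXT REFERENCES product_dim(prod_no)
);
CREATE TABLE inventory_fact (
    prod_no    TEXT REFERENCES product_dim(prod_no),
    stock_date TEXT,
    units      INTEGER
);
INSERT INTO product_dim VALUES ('QR22', 'Rake'), ('SR56', 'Spade'), ('TW43', 'Mulch');
INSERT INTO sales_order_fact VALUES (1, 'QR22'), (2, 'QR22'), (3, 'SR56');
INSERT INTO inventory_fact VALUES
    ('QR22', '03/23/XX', 150),
    ('TW43', '04/15/XX', 1452),
    ('SR56', '05/01/XX', 997);
""")

# The shared product dimension drives the report; each star is
# aggregated separately and lined up on the common product key.
orders_vs_stock = conn.execute("""
    SELECT p.prod_name,
           (SELECT COUNT(*)
              FROM sales_order_fact AS s
             WHERE s.prod_no = p.prod_no) AS orders,
           (SELECT COALESCE(SUM(i.units), 0)
              FROM inventory_fact AS i
             WHERE i.prod_no = p.prod_no) AS units_in_stock
    FROM product_dim AS p
    ORDER BY p.prod_name
""").fetchall()
```

Mulch shows up with zero orders but stock on hand, a comparison that is only possible because both stars agree on what a product is.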
Mapping Normalized Tables to Denormalized (Data Warehouse) Tables Using ETL Tools (like MS-SSIS)
- EXTRACT: These are 2 Normalized Transaction Tables
- TRANSFORM: The data are “Transformed” in these steps
- LOAD: This is the resulting, Denormalized Product Dimension
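The three ETL stages can be sketched in plain Python. This is not SSIS and not the mapping shown on the slide: the two source tables (product and product_category), the category values, and the function names are all hypothetical, invented only to show two normalized tables being flattened into one denormalized Product Dimension.

```python
def extract():
    """EXTRACT: read two normalized source tables (hypothetical sample data)."""
    product = [
        {"prod_no": "QR22", "prod_name": "Rake", "category_id": 1},
        {"prod_no": "SR56", "prod_name": "Spade", "category_id": 1},
        {"prod_no": "TW43", "prod_name": "Mulch", "category_id": 2},
    ]
    product_category = [
        {"category_id": 1, "category_name": "Tools"},
        {"category_id": 2, "category_name": "Garden Supplies"},
    ]
    return product, product_category


def transform(product_rows, category_rows):
    """TRANSFORM: replace the category key with the category name itself."""
    names = {c["category_id"]: c["category_name"] for c in category_rows}
    return [
        {
            "prod_no": p["prod_no"],
            "prod_name": p["prod_name"],
            # Redundant on purpose: the name repeats on every product row.
            "category_name": names[p["category_id"]],
        }
        for p in product_rows
    ]


def load(dimension_rows, warehouse):
    """LOAD: write the flattened rows into the warehouse (a dict here)."""
    warehouse["product_dimension"] = list(dimension_rows)


warehouse = {}
load(transform(*extract()), warehouse)
```

The category name “Tools” now appears twice in the dimension, which is exactly the redundancy that normalization removed and that the warehouse accepts in exchange for one fewer join.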
The End That’s all! Bye, bye!