
The Road to Denormalization


1 The Road to Denormalization
Starring Various “Denormalized” Celebrities

2 The Road to Denormalization
(Diagram: Transx Data to Data Warehouse) Before transactional data can be loaded into a Data Warehouse, the data must be denormalized.

3 Normalization
But before you can understand Denormalization, you must understand Normalization . . . and to understand Normalization, you must understand Relational Databases.
(Caption: “I’ve been Denormalized!”)

4 Relational Databases
A relational database is a collection of linked tables.
- Tables are linked by Primary Key / Foreign Key relationships (Referential Integrity)
- Primary Key: a column whose values make each record unique in a parent table (e.g., Customer Number)
- Foreign Key: a column in a child table that links to the Primary Key in the parent table

5 Relational DB Example
CUSTOMER TABLE (“parent” table; Primary Key: Cust #)
Cust # | Cust Name
100 | Moe
101 | Larry
102 | Curly
ORDER TABLE (“child” table; Foreign Key: Cust #)
Order # | Prod # | Qty | Cust #
1 | QR | … | …
2 | QR | … | …
3 | SB | … | …
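To make the Primary Key / Foreign Key link concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (customer, orders, cust_no, and so on) and the order quantity are assumptions for illustration, not taken from the slide. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

# In-memory database. "ORDER" is an SQL keyword, so the child table is
# named "orders" here (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute("""
    CREATE TABLE customer (
        cust_no   INTEGER PRIMARY KEY,   -- Primary Key: unique per customer
        cust_name TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE orders (
        order_no INTEGER PRIMARY KEY,
        prod_no  TEXT,
        qty      INTEGER,
        cust_no  INTEGER NOT NULL REFERENCES customer(cust_no)  -- Foreign Key
    )""")

conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(100, "Moe"), (101, "Larry"), (102, "Curly")])

# A valid child row: customer 100 exists in the parent table.
# (The quantity 5 is hypothetical.)
conn.execute("INSERT INTO orders VALUES (1, 'QR', 5, 100)")

# An order for a customer that does not exist breaks referential integrity,
# so SQLite rejects it with an IntegrityError.
try:
    conn.execute("INSERT INTO orders VALUES (2, 'QR', 1, 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)  # True
```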

6 Database Structure & Design
Two conflicting approaches:
1. Optimize for Data Capture (i.e., capturing transactions)
2. Optimize for Data Access (i.e., queries & reporting)
(Caption: “I love conflict!”)

7 Approach #1: Optimize for Data Capture
To optimize for data capture, you must:
- Eliminate redundant data (or else space and processing are wasted)
- Ensure data integrity (or else data anomalies occur)
- Ensure that changes in data (modifications, deletions, etc.) only have to happen in one place
Normalization: the process by which a database is optimized for data capture
- All data “redundancy” is removed from the database
- It has multiple forms (0NF, 1NF, 2NF, 3NF, and higher)
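Why changes should happen in only one place can be shown with a tiny update anomaly. This plain-Python sketch uses hypothetical data (the dictionaries and the new name "Moe Howard" are illustrative): when a customer's name is copied onto every order, an update that misses one copy leaves conflicting values, while the normalized layout needs a single change.

```python
# Hypothetical un-normalized data: the customer's name is copied onto
# every order row (illustrative values, not taken from the slides).
orders_redundant = [
    {"order_no": 1, "cust_no": 100, "cust_name": "Moe"},
    {"order_no": 2, "cust_no": 100, "cust_name": "Moe"},
    {"order_no": 3, "cust_no": 101, "cust_name": "Larry"},
]

# Update anomaly: the change reaches only one of the redundant copies,
# so customer 100 now has two different names.
orders_redundant[0]["cust_name"] = "Moe Howard"
names_for_100 = {r["cust_name"] for r in orders_redundant if r["cust_no"] == 100}
print(sorted(names_for_100))  # ['Moe', 'Moe Howard']

# Normalized: the name is stored once, so a single change is enough.
customers = {100: "Moe", 101: "Larry"}
orders = [
    {"order_no": 1, "cust_no": 100},
    {"order_no": 2, "cust_no": 100},
    {"order_no": 3, "cust_no": 101},
]
customers[100] = "Moe Howard"
names_normalized = {customers[o["cust_no"]] for o in orders if o["cust_no"] == 100}
print(sorted(names_normalized))  # ['Moe Howard']
```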

8 Moving from 0NF to 1NF
Rule: Make a separate table for each set of related attributes, and make each field atomic (i.e., it cannot be broken apart any further).
0NF: CUSTOMER DATA
Cust # | CustName
100, 101, 102 | Moe Howard, Larry Fine, Curly Howard
1NF: CUSTOMER TABLE
Cust # | FName | LName
100 | Moe | Howard
101 | Larry | Fine
102 | Curly | Howard
(Caption: “I’M NOT MOVING!”)
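The 0NF-to-1NF step above amounts to splitting packed, multi-valued fields into one row per customer with atomic columns. A minimal Python sketch, assuming the 0NF record arrives as comma-separated strings and that every full name is exactly "First Last" (true for this data); the function name `to_1nf` is hypothetical.

```python
# 0NF: a single record whose fields pack several values (data from the slide).
customer_data_0nf = {
    "cust_no": "100, 101, 102",
    "cust_name": "Moe Howard, Larry Fine, Curly Howard",
}

def to_1nf(record):
    """Split packed fields into one row per customer with atomic columns."""
    numbers = [int(n) for n in record["cust_no"].split(",")]   # int() ignores spaces
    names = [n.strip() for n in record["cust_name"].split(",")]
    rows = []
    for cust_no, full_name in zip(numbers, names):
        # Assumes each name is "First Last"; splits into FName and LName.
        fname, lname = full_name.split(" ", 1)
        rows.append({"cust_no": cust_no, "fname": fname, "lname": lname})
    return rows

customer_table = to_1nf(customer_data_0nf)
for row in customer_table:
    print(row)
```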

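9 Moving from 1NF to 2NF
Rule: Eliminate any repeating values caused by a dependency on a “keyed” column (i.e., either a Primary or a Foreign Key).
1NF: TABLE X
Cust # | FName | Order #
100 | Moe | 1
100 | Moe | 2
101 | Larry | 3
“100 Moe” repeats because FName depends only on Cust # (a dependency on the Primary Key), not on the full key of TABLE X (Cust # plus Order #).
2NF: CUSTOMER TABLE
Cust # | FName
100 | Moe
101 | Larry
102 | Curly
2NF: ORDER TABLE
Order # | Cust #
1 | 100
2 | 100
3 | 101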
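The 1NF-to-2NF split can be sketched as two projections of TABLE X: one row per Cust # for the customer columns, and an order table that keeps Cust # only as a foreign key. This is a plain-Python illustration under the assumption that Cust # really determines FName (the code checks it). Curly, who has no orders, does not appear in TABLE X, so in practice his row would come from the existing customer list.

```python
# TABLE X (1NF) from the slide: FName depends only on Cust #, so it repeats.
table_x = [
    {"cust_no": 100, "fname": "Moe", "order_no": 1},
    {"cust_no": 100, "fname": "Moe", "order_no": 2},
    {"cust_no": 101, "fname": "Larry", "order_no": 3},
]

# CUSTOMER TABLE: store each Cust # -> FName pair exactly once.
customer_table = {}
for row in table_x:
    existing = customer_table.setdefault(row["cust_no"], row["fname"])
    # If Cust # really determines FName, every copy must agree.
    if existing != row["fname"]:
        raise ValueError("FName is not determined by Cust # alone")

# ORDER TABLE: each order keeps only Cust # as a foreign key.
order_table = [{"order_no": r["order_no"], "cust_no": r["cust_no"]} for r in table_x]

print(customer_table)  # {100: 'Moe', 101: 'Larry'}
print(order_table)
```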

10 Moving from 2NF to 3NF
Rule: Eliminate any repeating values caused by a dependency on a “non-keyed” column (i.e., a dependency between two non-key columns).
2NF: TABLE X
Cust # | City | Order # | ShipTime
100 | NY | 1 | 2 days
101 | NY | 2 | 2 days
102 | LA | 3 | 5 days
“NY, 2 days” repeats because ShipTime depends on City, and neither is a key column.
3NF: SHIP TIME TABLE
City # | City | ShipTime
10 | NY | 2 days
20 | LA | 5 days
3NF: CUSTOMER TABLE
Cust # | City #
100 | 10
101 | 10
102 | 20
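The 2NF-to-3NF step moves the City-to-ShipTime dependency into its own table with a surrogate key. A minimal plain-Python sketch: the numbering scheme (10, 20, 30, ...) simply mimics the slide's City # values, and Order # is left out because it would stay in the ORDER TABLE from the previous step.

```python
# TABLE X (2NF) from the slide: ShipTime depends on City, a non-key column.
table_x = [
    {"cust_no": 100, "city": "NY", "order_no": 1, "ship_time": "2 days"},
    {"cust_no": 101, "city": "NY", "order_no": 2, "ship_time": "2 days"},
    {"cust_no": 102, "city": "LA", "order_no": 3, "ship_time": "5 days"},
]

ship_time_table = {}   # City # -> (City, ShipTime)
city_ids = {}          # City -> City #
customer_table = {}    # Cust # -> City #

for row in table_x:
    city = row["city"]
    if city not in city_ids:
        # Assign surrogate keys 10, 20, 30, ... in order of first appearance.
        city_ids[city] = 10 * (len(city_ids) + 1)
        ship_time_table[city_ids[city]] = (city, row["ship_time"])
    elif ship_time_table[city_ids[city]][1] != row["ship_time"]:
        # The split is only valid if City alone determines ShipTime.
        raise ValueError(f"ShipTime is not determined by City for {city}")
    customer_table[row["cust_no"]] = city_ids[city]

print(ship_time_table)  # {10: ('NY', '2 days'), 20: ('LA', '5 days')}
print(customer_table)   # {100: 10, 101: 10, 102: 20}
```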

11 Normalized DB Example
A normalized database uses MANY tables to guard against redundant data (and to help prevent data integrity issues).
(Caption: “Am I a good example of ‘Normalized’?”)

12 Database Structure & Design
Two conflicting approaches:
1. Optimize for Data Capture (i.e., capturing transactions)
2. Optimize for Data Access (i.e., queries & reporting)
(Caption: “I like conflict too!”)

13 Approach #2: Optimize for Data Access (in a separate, read-only Data Warehouse)
To optimize for data access, you must:
- Change the data layout to a different structure
- Allow data redundancy
- Reduce the number of table joins (i.e., reduce links among tables by combining tables)
Denormalizing: adding redundancy and reducing joins in a relational database
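Denormalizing as described above means paying for the joins once and storing the result redundantly, so reporting queries read a single table. A minimal sketch with Python's sqlite3, loosely reusing the 3NF example; the table names (ship_time, customer, orders, orders_wide) and the order-to-customer mapping are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Normalized source tables, loosely following the 3NF example (illustrative data).
conn.executescript("""
    CREATE TABLE ship_time (city_no INTEGER PRIMARY KEY, city TEXT, ship_time TEXT);
    CREATE TABLE customer  (cust_no INTEGER PRIMARY KEY, cust_name TEXT,
                            city_no INTEGER REFERENCES ship_time(city_no));
    CREATE TABLE orders    (order_no INTEGER PRIMARY KEY,
                            cust_no INTEGER REFERENCES customer(cust_no));
    INSERT INTO ship_time VALUES (10, 'NY', '2 days'), (20, 'LA', '5 days');
    INSERT INTO customer  VALUES (100, 'Moe', 10), (101, 'Larry', 10), (102, 'Curly', 20);
    INSERT INTO orders    VALUES (1, 100), (2, 100), (3, 101);
""")

# Denormalize: run the joins once and store redundant copies of
# cust_name, city and ship_time on every order row.
conn.execute("""
    CREATE TABLE orders_wide AS
    SELECT o.order_no, c.cust_no, c.cust_name, s.city, s.ship_time
    FROM orders o
    JOIN customer  c ON c.cust_no = o.cust_no
    JOIN ship_time s ON s.city_no = c.city_no
""")

# Reporting query on the combined table: no joins needed.
rows = conn.execute(
    "SELECT order_no, cust_name, city FROM orders_wide ORDER BY order_no"
).fetchall()
print(rows)  # [(1, 'Moe', 'NY'), (2, 'Moe', 'NY'), (3, 'Larry', 'NY')]
```

The trade-off matches the slide: "Moe" and "NY" now repeat on several rows, which is acceptable in a read-only warehouse copy but would invite update anomalies in the transactional database.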

14 Denormalizing: Most Common Approach
Star Schema (Clustering)
- Fact (core or transaction) tables sit in the middle of the star
- Dimension (structural or “lookup”) tables sit around the “points” of the star
SALES ORDER (FACT) TABLE
Order # | Date | Cust # | Prod # | Loc #
1 | 06/15/XX | 100 | QR | …
2 | 07/19/XX | 100 | QR | …
3 | 08/30/XX | 101 | SR | …
CUSTOMER DIMENSION TABLE
Cust # | CustName
100 | Moe
101 | Larry
102 | Curly
LOC DIMENSION TABLE
Loc # | LocName
1000 | NY
2000 | LA
3000 | PGH
DATE DIMENSION TABLE
Date | Quarter | (unlabeled)
06/29/XX | 2 | Bob
06/30/XX | 2 | Sue
07/01/XX | 3 |
PRODUCT DIMENSION TABLE
Prod # | ProdName
QR22 | Rake
SR56 | Spade
TW43 | Mulch
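A star schema like the one above can be sketched in SQLite: dimension tables at the points, a fact table in the middle holding one foreign key per dimension, and queries that are a single hop from the fact to each dimension. Several details are assumptions, not from the slide: the Loc # values in the fact rows, the date-dimension rows (chosen to cover the fact dates), and reading the fact table's "QR"/"SR" as QR22/SR56.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension ("lookup") tables at the points of the star.
    CREATE TABLE customer_dim (cust_no INTEGER PRIMARY KEY, cust_name TEXT);
    CREATE TABLE loc_dim      (loc_no  INTEGER PRIMARY KEY, loc_name  TEXT);
    CREATE TABLE product_dim  (prod_no TEXT PRIMARY KEY,    prod_name TEXT);
    CREATE TABLE date_dim     (date    TEXT PRIMARY KEY,    quarter   INTEGER);

    -- Fact table in the middle: one foreign key per dimension.
    CREATE TABLE sales_fact (
        order_no INTEGER PRIMARY KEY,
        date     TEXT    REFERENCES date_dim(date),
        cust_no  INTEGER REFERENCES customer_dim(cust_no),
        prod_no  TEXT    REFERENCES product_dim(prod_no),
        loc_no   INTEGER REFERENCES loc_dim(loc_no)
    );

    INSERT INTO customer_dim VALUES (100, 'Moe'), (101, 'Larry'), (102, 'Curly');
    INSERT INTO loc_dim      VALUES (1000, 'NY'), (2000, 'LA'), (3000, 'PGH');
    INSERT INTO product_dim  VALUES ('QR22', 'Rake'), ('SR56', 'Spade'), ('TW43', 'Mulch');
    -- Hypothetical date rows covering the fact dates.
    INSERT INTO date_dim     VALUES ('06/15/XX', 2), ('07/19/XX', 3), ('08/30/XX', 3);
    -- Loc # values are hypothetical; QR/SR are assumed to mean QR22/SR56.
    INSERT INTO sales_fact VALUES
        (1, '06/15/XX', 100, 'QR22', 1000),
        (2, '07/19/XX', 100, 'QR22', 2000),
        (3, '08/30/XX', 101, 'SR56', 1000);
""")

# A typical star query: every join goes straight from the fact to a dimension.
report = conn.execute("""
    SELECT d.quarter, p.prod_name, COUNT(*) AS orders
    FROM sales_fact f
    JOIN date_dim    d ON d.date    = f.date
    JOIN product_dim p ON p.prod_no = f.prod_no
    GROUP BY d.quarter, p.prod_name
    ORDER BY d.quarter, p.prod_name
""").fetchall()
print(report)  # [(2, 'Rake', 1), (3, 'Rake', 1), (3, 'Spade', 1)]
```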

15 Callouts on a diagram of the normalized transaction database:
- This Date field helps build the “Date Dimension.”
- These 2 tables become the “SALES FACT” table in the Data Warehouse.
- These 5 tables become the “Product Dimension.”
- These 3 tables become the “Customer Dimension.”

16 Resulting Star Schema Data Warehouse
SALES ORDER (FACT) TABLE
Order # | Date | Cust # | Prod # | Rep #
1 | 06/15/XX | 100 | QR | …
2 | 07/19/XX | 100 | QR | …
3 | 08/30/XX | 101 | SR | …
CUSTOMER DIMENSION
Cust # | CustName
100 | Moe
101 | Larry
102 | Curly
DATE DIMENSION
Date | Quarter | (unlabeled)
06/29/XX | 2 | Bob
06/30/XX | 2 | Sue
07/01/XX | 3 | Juan
PRODUCT DIMENSION
Prod # | ProdName
QR22 | Rake
SR56 | Spade
TW43 | Mulch
(Caption: “It’s a STAR, like me!”)

17 Common (Conformed) Dimensions
Denormalizing (continued): stars are linked via common (i.e., conformed) dimensions to form the Data Warehouse. Here, the SALES ORDER and INVENTORY fact tables can share the Product and Date dimensions.
SALES ORDER (FACT) TABLE
Order # | Date | Cust # | Prod # | Loc #
1 | 06/15/XX | 100 | QR | …
2 | 07/19/XX | 100 | QR | …
3 | 08/30/XX | 101 | SR | …
INVENTORY (FACT) TABLE
Prod # | ProdName | Stock Date | Units
QR22 | Rake | 03/23/XX | …
TW43 | Mulch | 04/15/XX | 1452
SR56 | Spade | 05/01/XX | …
CUSTOMER DIMENSION
Cust # | CustName
100 | Moe
101 | Larry
102 | Curly
LOC DIMENSION
Loc # | LocName
1000 | NY
2000 | LA
3000 | PGH
DATE (TIME) DIMENSION
Date | Quarter | (unlabeled)
06/29/XX | 2 |
06/30/XX | 2 | S
07/01/XX | 3 | Juan
PRODUCT DIMENSION
Prod # | ProdName
QR22 | Rake
SR56 | Spade
TW43 | Mulch
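One way conformed dimensions pay off is "drilling across": summarize each star on its own, then line the results up through the shared dimension. This SQLite sketch links a sales star and an inventory star through one Product dimension. It is illustrative only: the table names are assumptions, and apart from the slide's 1452 units of Mulch, the inventory figures (40 and 75) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One Product dimension shared ("conformed") by two fact tables.
    CREATE TABLE product_dim (prod_no TEXT PRIMARY KEY, prod_name TEXT);
    CREATE TABLE sales_fact (order_no INTEGER PRIMARY KEY,
                             prod_no TEXT REFERENCES product_dim(prod_no));
    CREATE TABLE inventory_fact (prod_no TEXT REFERENCES product_dim(prod_no),
                                 stock_date TEXT, units INTEGER);

    INSERT INTO product_dim VALUES ('QR22', 'Rake'), ('SR56', 'Spade'), ('TW43', 'Mulch');
    INSERT INTO sales_fact  VALUES (1, 'QR22'), (2, 'QR22'), (3, 'SR56');
    -- Only the Mulch figure (1452) is from the slide; 40 and 75 are hypothetical.
    INSERT INTO inventory_fact VALUES
        ('QR22', '03/23/XX', 40), ('TW43', '04/15/XX', 1452), ('SR56', '05/01/XX', 75);
""")

# Drill across: aggregate each fact table separately, then join the
# summaries through the shared Product dimension.
rows = conn.execute("""
    SELECT p.prod_name,
           COALESCE(s.orders, 0) AS orders,
           COALESCE(i.units, 0)  AS units_in_stock
    FROM product_dim p
    LEFT JOIN (SELECT prod_no, COUNT(*) AS orders
               FROM sales_fact GROUP BY prod_no) s ON s.prod_no = p.prod_no
    -- One stock snapshot per product in this toy data, so SUM is that value.
    LEFT JOIN (SELECT prod_no, SUM(units) AS units
               FROM inventory_fact GROUP BY prod_no) i ON i.prod_no = p.prod_no
    ORDER BY p.prod_name
""").fetchall()
print(rows)  # [('Mulch', 0, 1452), ('Rake', 2, 40), ('Spade', 1, 75)]
```

Aggregating each fact table before joining avoids multiplying rows when a product has several sales and several inventory records.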

18 Mapping Normalized Tables to Denormalized (Data Warehouse) Tables Using ETL Tools (like Microsoft SQL Server Integration Services, SSIS)
EXTRACT: These are 2 normalized transaction tables.
TRANSFORM: The data are “transformed” in these steps.
LOAD: This is the resulting, denormalized Product Dimension.
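The slide shows this pipeline in an ETL tool. As a rough stand-in (plain Python with sqlite3, not SSIS), the sketch below follows the same three steps to build a denormalized Product Dimension. The source tables `product` and `category`, their columns, the category names, and the cleanup rules are all hypothetical, since the slide does not name the two normalized tables.

```python
import sqlite3

# Source (transactional) database with two normalized tables.
# Table names, columns and values are hypothetical.
src = sqlite3.connect(":memory:")
src.executescript("""
    CREATE TABLE category (cat_no INTEGER PRIMARY KEY, cat_name TEXT);
    CREATE TABLE product  (prod_no TEXT PRIMARY KEY, prod_name TEXT,
                           cat_no INTEGER REFERENCES category(cat_no));
    INSERT INTO category VALUES (1, 'Tools'), (2, 'Garden Supplies');
    INSERT INTO product  VALUES ('QR22', ' rake', 1), ('SR56', 'Spade ', 1),
                                ('TW43', 'mulch', 2);
""")

# EXTRACT: read the rows out of the normalized source.
products = src.execute("SELECT prod_no, prod_name, cat_no FROM product").fetchall()
categories = dict(src.execute("SELECT cat_no, cat_name FROM category").fetchall())

# TRANSFORM: clean the names and fold the category lookup into each
# product row, i.e. denormalize two tables into one.
dim_rows = [
    (prod_no, name.strip().title(), categories[cat_no])
    for prod_no, name, cat_no in products
]

# LOAD: write the denormalized Product Dimension into the warehouse.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE product_dim (prod_no TEXT PRIMARY KEY, "
           "prod_name TEXT, category TEXT)")
dw.executemany("INSERT INTO product_dim VALUES (?, ?, ?)", dim_rows)
dw.commit()

print(dw.execute("SELECT * FROM product_dim ORDER BY prod_no").fetchall())
# [('QR22', 'Rake', 'Tools'), ('SR56', 'Spade', 'Tools'), ('TW43', 'Mulch', 'Garden Supplies')]
```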

19 The End
That’s all! Bye, bye!

