Published by Scott Mitchell. Modified over 9 years ago.
1
Competitive (Business) Intelligence Systems
The Road to Denormalization (starring Charlie Sheen & other Random Celebrities)
2
The Road to Denormalization
Before transactional data can be loaded into a Data Warehouse, the data must be denormalized.
(Diagram: Transaction Data → Data Warehouse)
3
Normalization
But before you can understand Denormalization, you must understand Normalization...
And to understand Normalization, you must understand Relational Databases.
(Caption: "I've been Denormalized!")
4
Relational Databases
- A collection of linked tables
- Tables are linked by Primary Key / Foreign Key relationships (Referential Integrity)
- Primary Key: column whose values make each record unique in a parent table (e.g., Customer Number)
- Foreign Key: column in a child table that links to the Primary Key in the parent table
5
Relational DB Example
CUSTOMER TABLE ("Parent" table; Primary Key = Cust #)
Cust # | Cust Name
100    | Moe
101    | Larry
102    | Curly
ORDER TABLE ("Child" table; Foreign Key = Cust #)
Order # | Prod # | Qty | Cust #
1       | QR22   | 1   | 100
2       | QR22   | 25  | 100
3       | SB56   | 3   | 102
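The Primary Key / Foreign Key link between these two tables can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the presentation's own material: the table and column names (customer, orders, cust_no, and so on) are assumptions, since ORDER is a reserved word in SQL. With foreign-key enforcement switched on, the database refuses an order for a customer that does not exist in the parent table, which is what referential integrity means in practice.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create the CUSTOMER (parent) and ORDER (child) tables in memory."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is on (per connection).
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE customer (cust_no INTEGER PRIMARY KEY, cust_name TEXT NOT NULL)"
    )
    conn.execute(
        """CREATE TABLE orders (
               order_no INTEGER PRIMARY KEY,
               prod_no  TEXT NOT NULL,
               qty      INTEGER NOT NULL,
               cust_no  INTEGER NOT NULL REFERENCES customer(cust_no)  -- foreign key
           )"""
    )
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?)",
        [(100, "Moe"), (101, "Larry"), (102, "Curly")],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [(1, "QR22", 1, 100), (2, "QR22", 25, 100), (3, "SB56", 3, 102)],
    )
    conn.commit()
    return conn


def try_insert_order(conn, order_no, prod_no, qty, cust_no) -> bool:
    """Return True if the order was stored, False if referential integrity blocked it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders VALUES (?, ?, ?, ?)",
                (order_no, prod_no, qty, cust_no),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

For example, try_insert_order(conn, 4, "QR22", 2, 999) returns False because customer 999 is not in the parent table, while the same order for customer 101 is accepted.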
6
Database Structure & Design
2 Approaches:
1. Optimize for Data Capture (i.e., capturing transactions)
2. Optimize for Data Access (i.e., queries & reporting)
These two goals conflict.
(Caption: "I love conflict!")
7
Approach #1: Optimize for Data Capture
To optimize for data capture, you must:
- Eliminate redundancy of data (otherwise space and processing are wasted)
- Ensure data integrity (otherwise data anomalies occur)
- Ensure that changes in data (modifications, deletions, etc.) only have to happen in one place
Normalization: the process by which a database is optimized for data capture
- All data "redundancy" is removed from the database
- It has multiple forms (0NF, 1NF, 2NF, 3NF, and beyond)
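To see why changes should only have to happen in one place, consider a single flat table where the customer name is repeated on every order row. The Python sketch below is a hypothetical illustration (the row layout and the helper name find_update_anomalies are assumptions, not from the slides): if a name is changed on one row but not the others, the same customer ends up with two different names, which is an update anomaly.

```python
def find_update_anomalies(rows):
    """rows are (order_no, cust_no, cust_name) tuples from one un-normalized table.

    Returns the customer numbers that appear with more than one name,
    i.e. places where a change was applied to some copies but not all.
    """
    names_by_cust = {}
    for _order_no, cust_no, cust_name in rows:
        names_by_cust.setdefault(cust_no, set()).add(cust_name)
    return sorted(c for c, names in names_by_cust.items() if len(names) > 1)


# Hypothetical edit: the name was changed on order 1 only, not on order 2.
rows = [(1, 100, "Moses"), (2, 100, "Moe"), (3, 101, "Larry")]
print(find_update_anomalies(rows))  # customer 100 now has two conflicting names
```

In a normalized design the name lives in exactly one CUSTOMER row, so this kind of inconsistency cannot arise from a partial update.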
8
Moving from 0NF to 1NF
Rule: Make a separate table for each set of related attributes, and make each field atomic (i.e., it cannot be broken apart any further).
CUSTOMER DATA (0NF)
Cust #        | CustName
100, 101, 102 | Moe Howard, Larry Fine, Curly Howard
CUSTOMER TABLE (1NF)
Cust # | FName | LName
100    | Moe   | Howard
101    | Larry | Fine
102    | Curly | Howard
(Caption: "I'M NOT MOVING!")
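The 0NF to 1NF step can be sketched in Python as splitting each multi-valued field into one row per customer and splitting each full name into atomic first and last names. This is a minimal sketch assuming the 0NF values are comma-separated strings and each full name is "First Last"; the function name to_1nf is hypothetical.

```python
def to_1nf(cust_numbers: str, cust_names: str) -> list[tuple[int, str, str]]:
    """Turn one 0NF record with packed, comma-separated values into 1NF rows."""
    numbers = [int(n) for n in cust_numbers.split(",")]  # int() ignores surrounding spaces
    names = [n.strip() for n in cust_names.split(",")]
    if len(numbers) != len(names):
        raise ValueError("each customer number needs exactly one name")

    rows = []
    for number, full_name in zip(numbers, names):
        # Assumption: the first word is the first name, the rest is the last name.
        first, last = full_name.split(" ", 1)
        rows.append((number, first, last))
    return rows


print(to_1nf("100, 101, 102", "Moe Howard, Larry Fine, Curly Howard"))
```

Each resulting row holds one customer, and every field holds a single value, matching the 1NF CUSTOMER TABLE above.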
9
Moving from 1NF to 2NF
Rule: Eliminate any repeating values caused by a dependency on a "keyed" column (i.e., either a Primary or a Foreign Key).
TABLE X (1NF)
Cust # | FName | Order #
100    | Moe   | 1
100    | Moe   | 2
101    | Larry | 3
Callout: the repeated "100 Moe" is caused by a dependency on the Primary Key (FName depends on Cust #).
2NF
CUSTOMER TABLE
Cust # | FName
100    | Moe
101    | Larry
102    | Curly
ORDER TABLE
Order # | Cust #
1       | 100
2       | 100
3       | 101
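The 1NF to 2NF decomposition of Table X can be sketched in Python: FName moves to a CUSTOMER table keyed by Cust #, and the ORDER table keeps only Order # plus the Cust # foreign key. The function name split_customer_orders and the tuple layout are assumptions for illustration. The sketch also checks that each Cust # maps to a single FName, since that dependency is what justifies the split.

```python
def split_customer_orders(table_x):
    """Split (cust_no, fname, order_no) rows into CUSTOMER and ORDER tables."""
    customers = {}  # cust_no -> fname, stored once per customer
    orders = []     # (order_no, cust_no): only the foreign key stays behind
    for cust_no, fname, order_no in table_x:
        existing = customers.setdefault(cust_no, fname)
        if existing != fname:
            raise ValueError(f"customer {cust_no} has conflicting names")
        orders.append((order_no, cust_no))
    return sorted(customers.items()), orders


table_x = [(100, "Moe", 1), (100, "Moe", 2), (101, "Larry", 3)]
customer_table, order_table = split_customer_orders(table_x)
print(customer_table)  # "Moe" is now stored once
print(order_table)
```

Note that Curly (102), who has no orders, cannot be derived from Table X at all; once customers live in their own table, a customer can be recorded before placing any order.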
10
Moving from 2NF to 3NF
Rule: Eliminate any repeating values caused by a dependency on a "non-keyed" column (i.e., a dependency on ANY column).
TABLE X (2NF)
Cust # | City | Order # | ShipTime
100    | NY   | 1       | 2 days
101    | NY   | 2       | 2 days
102    | LA   | 3       | 5 days
Callout: the repeated "NY, 2 days" is caused by a dependency between 2 non-key columns (ShipTime depends on City).
3NF
SHIP TIME TABLE
City # | City | ShipTime
10     | NY   | 2 days
20     | LA   | 5 days
CUSTOMER TABLE
Cust # | City #
100    | 10
101    | 10
102    | 20
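The 2NF to 3NF step can be sketched in Python as moving the City → ShipTime dependency into its own SHIP TIME table and leaving only a City # reference on each customer. This is a minimal sketch under stated assumptions: ShipTime is stored as a whole number of days, the City # surrogate keys (10, 20) are supplied by the caller, and split_ship_times is a hypothetical name. Order # is ignored here because it already belongs in the ORDER table.

```python
def split_ship_times(table_x, city_ids):
    """Split (cust_no, city, order_no, ship_days) rows into SHIP TIME and CUSTOMER tables."""
    ship_days_by_city = {}
    city_by_cust = {}
    for cust_no, city, _order_no, ship_days in table_x:
        # The split is only valid if City determines ShipTime on every row.
        if ship_days_by_city.setdefault(city, ship_days) != ship_days:
            raise ValueError(f"city {city} has more than one ship time")
        city_by_cust[cust_no] = city_ids[city]

    ship_time_table = sorted(
        (city_ids[city], city, days) for city, days in ship_days_by_city.items()
    )
    customer_table = sorted(city_by_cust.items())
    return ship_time_table, customer_table


table_x = [(100, "NY", 1, 2), (101, "NY", 2, 2), (102, "LA", 3, 5)]
ship_time_table, customer_table = split_ship_times(table_x, {"NY": 10, "LA": 20})
print(ship_time_table)
print(customer_table)
```

After the split, changing the NY ship time means editing one row of the SHIP TIME table instead of every NY customer.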
11
Normalized DB Example
MANY database tables guard against redundant data (and help prevent data integrity issues).
12
Database Structure & Design
2 Approaches:
1. Optimize for Data Capture (i.e., capturing transactions)
2. Optimize for Data Access (i.e., queries & reporting)
These two goals conflict.
13
Approach #2: Optimize for Data Access (in a separate, read-only Data Warehouse)
To optimize for data access, you must:
- Change the data layout to a different structure
- Allow data redundancy
- Reduce the number of table joins (i.e., reduce links among tables by combining tables)
Denormalizing: adding redundancy and reducing joins in a relational database
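Denormalizing can be sketched in Python as performing the join once, ahead of time, and storing the result as one wide table. This is a hypothetical illustration using the CUSTOMER and ORDER rows from the Relational DB Example; the function name denormalize and the column order are assumptions. The customer name is deliberately repeated on every order row, so a report no longer needs a join to show it.

```python
def denormalize(customers, orders):
    """Combine CUSTOMER (cust_no, name) and ORDER (order_no, prod_no, qty, cust_no)
    into one wide, read-optimized table with the customer name copied onto each order."""
    name_by_cust = dict(customers)  # lookup done once, at load time
    return [
        (order_no, prod_no, qty, cust_no, name_by_cust[cust_no])
        for order_no, prod_no, qty, cust_no in orders
    ]


customers = [(100, "Moe"), (101, "Larry"), (102, "Curly")]
orders = [(1, "QR22", 1, 100), (2, "QR22", 25, 100), (3, "SB56", 3, 102)]
for row in denormalize(customers, orders):
    print(row)  # "Moe" appears twice: redundancy accepted in exchange for no join
```

The trade-off is the one the slide describes: the redundant copy is acceptable because the warehouse is read-only and is rebuilt from the normalized source.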
14
Denormalizing: Most Common Approach
Star Schema (Clustering)
- Fact (core or transaction) tables sit in the middle of the star
- Dimension (structural or "lookup") tables sit at the "points" of the star
SALES ORDER (FACT) TABLE
Order # | Date     | Cust # | Prod # | Loc #
1       | 06/15/XX | 100    | QR22   | 1000
2       | 07/19/XX | 100    | QR22   | 1000
3       | 08/30/XX | 101    | SR56   | 2000
CUSTOMER DIMENSION TABLE
Cust # | CustName
100    | Moe
101    | Larry
102    | Curly
PRODUCT DIMENSION TABLE
Prod # | ProdName
QR22   | Rake
SR56   | Spade
TW43   | Mulch
LOC DIMENSION TABLE
Loc # | LocName
1000  | NY
2000  | LA
3000  | PGH
DATE DIMENSION TABLE
Date     | Quarter
06/29/XX | 2
06/30/XX | 2
07/01/XX | 3
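A typical star-schema query joins the fact table to the dimension tables it needs and then aggregates. The sketch below uses Python's sqlite3 module with the rows from this slide; the SQL table and column names (sales_fact, loc_dim, and so on) are assumptions, and dates are kept as the slide's placeholder text because the year is not given. Every dimension is only one join away from the fact table, which is what keeps such queries simple.

```python
import sqlite3


def build_star() -> sqlite3.Connection:
    """Load the slide's fact and dimension tables into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer_dim (cust_no INTEGER PRIMARY KEY, cust_name TEXT);
        CREATE TABLE product_dim  (prod_no TEXT PRIMARY KEY, prod_name TEXT);
        CREATE TABLE loc_dim      (loc_no INTEGER PRIMARY KEY, loc_name TEXT);
        CREATE TABLE sales_fact (
            order_no   INTEGER PRIMARY KEY,
            order_date TEXT,
            cust_no    INTEGER REFERENCES customer_dim(cust_no),
            prod_no    TEXT    REFERENCES product_dim(prod_no),
            loc_no     INTEGER REFERENCES loc_dim(loc_no)
        );
        """
    )
    conn.executemany("INSERT INTO customer_dim VALUES (?, ?)",
                     [(100, "Moe"), (101, "Larry"), (102, "Curly")])
    conn.executemany("INSERT INTO product_dim VALUES (?, ?)",
                     [("QR22", "Rake"), ("SR56", "Spade"), ("TW43", "Mulch")])
    conn.executemany("INSERT INTO loc_dim VALUES (?, ?)",
                     [(1000, "NY"), (2000, "LA"), (3000, "PGH")])
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
        (1, "06/15/XX", 100, "QR22", 1000),
        (2, "07/19/XX", 100, "QR22", 1000),
        (3, "08/30/XX", 101, "SR56", 2000),
    ])
    return conn


def orders_by_location_and_product(conn):
    """Count orders per (location, product): fact table in the middle, one join per point."""
    return conn.execute(
        """
        SELECT l.loc_name, p.prod_name, COUNT(*) AS orders
        FROM sales_fact AS f
        JOIN loc_dim     AS l ON l.loc_no  = f.loc_no
        JOIN product_dim AS p ON p.prod_no = f.prod_no
        GROUP BY l.loc_name, p.prod_name
        ORDER BY l.loc_name, p.prod_name
        """
    ).fetchall()


print(orders_by_location_and_product(build_star()))
```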
15
(Diagram of the normalized database, with callouts:)
- These 2 tables become the "SALES FACT" table in the Data Warehouse
- These 3 tables become the "Customer Dimension"
- These 5 tables become the "Product Dimension"
- This Date field helps build the "Date Dimension"
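Building a Date Dimension from a single date field can be sketched in Python: collect the distinct dates, then derive descriptive attributes such as the quarter for each one. This is a minimal sketch; the function name build_date_dimension and the chosen columns (ISO date, year, month, quarter) are assumptions, and because the slides hide the year as "XX", the usage below picks 2015 purely for illustration.

```python
from datetime import date


def build_date_dimension(order_dates):
    """Return one row per distinct date: (iso_date, year, month, quarter)."""
    rows = []
    for d in sorted(set(order_dates)):
        quarter = (d.month - 1) // 3 + 1  # Jan-Mar -> 1, Apr-Jun -> 2, ...
        rows.append((d.isoformat(), d.year, d.month, quarter))
    return rows


# Hypothetical year; the slide's dates are 06/29, 06/30 and 07/01.
dates = [date(2015, 6, 29), date(2015, 6, 30), date(2015, 7, 1), date(2015, 6, 29)]
for row in build_date_dimension(dates):
    print(row)
```

As in the slide's DATE DIMENSION TABLE, June dates land in quarter 2 and July 1 lands in quarter 3; a query can then group by quarter without recomputing it from raw dates.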
16
Resulting Star Schema Data Warehouse
SALES ORDER (FACT) TABLE
Order # | Date     | Cust # | Prod # | Rep #
1       | 06/15/XX | 100    | QR22   | 1000
2       | 07/19/XX | 100    | QR22   | 1000
3       | 08/30/XX | 101    | SR56   | 2000
CUSTOMER DIMENSION
Cust # | CustName
100    | Moe
101    | Larry
102    | Curly
PRODUCT DIMENSION
Prod # | ProdName
QR22   | Rake
SR56   | Spade
TW43   | Mulch
DATE DIMENSION
Date     | Quarter
06/29/XX | 2
06/30/XX | 2
07/01/XX | 3
(Names also shown on the slide: Bob, Sue, Juan)
(Caption: "Hey, hot stuff!")
17
Common (Conformed) Dimensions
Denormalizing (continued): Stars are linked via common (i.e., conformed) dimensions to form the Data Warehouse.
INVENTORY (FACT) TABLE
Prod # | ProdName | Stock Date | Units
QR22   | Rake     | 03/23/XX   | 150
TW43   | Mulch    | 04/15/XX   | 1452
SR56   | Spade    | 05/01/XX   | 997
SALES ORDER (FACT) TABLE
Order # | Date     | Cust # | Prod # | Loc #
1       | 06/15/XX | 100    | QR22   | 1000
2       | 07/19/XX | 100    | QR22   | 1000
3       | 08/30/XX | 101    | SR56   | 2000
CUSTOMER DIMENSION
Cust # | CustName
100    | Moe
101    | Larry
102    | Curly
PRODUCT DIMENSION (conformed: Prod # links both fact tables)
Prod # | ProdName
QR22   | Rake
SR56   | Spade
TW43   | Mulch
LOC DIMENSION
Loc # | LocName
1000  | NY
2000  | LA
3000  | PGH
DATE DIMENSION
Date     | Quarter
06/29/XX | 2
06/30/XX | 2
07/01/XX | 3
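Because both fact tables use the same Prod # keys, each star can be summarized separately and the results lined up through the shared Product Dimension (often called "drilling across"). The Python sketch below uses the rows from this slide; the variable and function names are assumptions. Summarizing each fact table first, then combining by product, avoids joining two fact tables row by row, which would multiply rows.

```python
from collections import Counter

PRODUCT_DIM = {"QR22": "Rake", "SR56": "Spade", "TW43": "Mulch"}
# (order_no, date, cust_no, prod_no, loc_no)
SALES_FACT = [
    (1, "06/15/XX", 100, "QR22", 1000),
    (2, "07/19/XX", 100, "QR22", 1000),
    (3, "08/30/XX", 101, "SR56", 2000),
]
# (prod_no, stock_date, units)
INVENTORY_FACT = [("QR22", "03/23/XX", 150), ("TW43", "04/15/XX", 1452), ("SR56", "05/01/XX", 997)]


def drill_across(product_dim, sales_fact, inventory_fact):
    """Return (product name, order count, units in stock) per product."""
    orders = Counter(prod_no for _o, _d, _c, prod_no, _l in sales_fact)
    units = Counter()
    for prod_no, _stock_date, qty in inventory_fact:
        units[prod_no] += qty
    # The conformed dimension supplies the common row set and the labels;
    # a Counter returns 0 for products with no facts in a star.
    return [
        (name, orders[prod_no], units[prod_no])
        for prod_no, name in sorted(product_dim.items(), key=lambda kv: kv[1])
    ]


print(drill_across(PRODUCT_DIM, SALES_FACT, INVENTORY_FACT))
```

Mulch shows zero orders but 1452 units in stock, a comparison that only works because both stars share the same product keys.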
18
Mapping Normalized Tables to Denormalized (Data Warehouse) Tables Using ETL Tools (like MS-SSIS)
- EXTRACT: these are 2 normalized transaction tables
- TRANSFORM: the data are "transformed" in these steps
- LOAD: this is the resulting, denormalized Product Dimension
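The extract, transform, load flow can be sketched in plain Python with sqlite3 standing in for both the transactional source and the warehouse. This is not how SSIS itself is configured; it only shows the same three stages. The slide does not name its two normalized tables, so the PRODUCT and CATEGORY tables, their columns, and the sample rows below are all hypothetical.

```python
import sqlite3


def make_source() -> sqlite3.Connection:
    """Hypothetical normalized source: PRODUCT references CATEGORY by key."""
    src = sqlite3.connect(":memory:")
    src.executescript(
        """
        CREATE TABLE category (category_id INTEGER PRIMARY KEY, category_name TEXT);
        CREATE TABLE product (prod_no TEXT PRIMARY KEY, prod_name TEXT,
                              category_id INTEGER REFERENCES category(category_id));
        INSERT INTO category VALUES (1, 'Tools'), (2, 'Garden Supplies');
        INSERT INTO product VALUES ('QR22', 'rake', 1), ('SR56', 'SPADE', 1), ('TW43', 'Mulch', 2);
        """
    )
    return src


def extract(src):
    """EXTRACT: read both normalized tables as-is."""
    products = src.execute("SELECT prod_no, prod_name, category_id FROM product").fetchall()
    categories = dict(src.execute("SELECT category_id, category_name FROM category").fetchall())
    return products, categories


def transform(products, categories):
    """TRANSFORM: clean names and replace the category key with its name (denormalize)."""
    return [
        (prod_no, prod_name.strip().title(), categories.get(category_id, "Unknown"))
        for prod_no, prod_name, category_id in products
    ]


def load(dw, rows):
    """LOAD: write the flattened rows into the warehouse's Product Dimension."""
    dw.execute(
        "CREATE TABLE IF NOT EXISTS product_dim "
        "(prod_no TEXT PRIMARY KEY, prod_name TEXT, category_name TEXT)"
    )
    with dw:  # one transaction for the whole load
        dw.executemany("INSERT OR REPLACE INTO product_dim VALUES (?, ?, ?)", rows)


def run_etl(src, dw):
    products, categories = extract(src)
    load(dw, transform(products, categories))


warehouse = sqlite3.connect(":memory:")
run_etl(make_source(), warehouse)
print(warehouse.execute("SELECT * FROM product_dim ORDER BY prod_no").fetchall())
```

The resulting product_dim needs no join to report a product's category, mirroring the denormalized Product Dimension on the slide.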
19
The End That’s all! Bye, bye!