The Road to Denormalization

Competitive (Business) Intelligence Systems The Road to Denormalization (starring Charlie Sheen & other Random Celebrities)
Presentation transcript:

The Road to Denormalization Starring Various “Denormalized” Celebrities

The Road to Denormalization
Before transactional data can be loaded into a Data Warehouse, the data must be denormalized.
Diagram labels: Transx Data, Data Warehouse

Normalization
But before you can understand Denormalization, you must understand Normalization . . . and to understand Normalization, you must understand Relational Databases.
Caption: I’ve been Denormalized!

Relational Databases
Collection of linked tables
Tables linked by Primary Key / Foreign Key relationships (Referential Integrity)
Primary Key – column whose values make each record unique in a parent table (e.g., Customer Number)
Foreign Key – column in child table that links to the Primary Key in the parent table

Relational DB Example
CUSTOMER TABLE (“Parent” table; Cust # is the Primary Key)
    Cust #   Cust Name
    100      Moe
    101      Larry
    102      Curly
ORDER TABLE (“Child” table; Cust# is the Foreign Key)
    Order #  Prod#  Qty  Cust#
    1        QR22   1    100
    2        QR22   25   100
    3        SB56   3    102
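The parent/child link on this slide can be tried out with Python's built-in sqlite3 module. This is a minimal sketch, not part of the original deck: the snake_case table and column names are assumptions, and SQLite stands in for whatever database the course used. It shows the Foreign Key rejecting an order that points at a customer who does not exist.

```python
import sqlite3


def build_db():
    """Create the CUSTOMER (parent) and ORDER (child) tables from the slide."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE customer (cust_no INTEGER PRIMARY KEY, cust_name TEXT)"
    )
    conn.execute(
        "CREATE TABLE orders ("
        " order_no INTEGER PRIMARY KEY,"
        " prod_no TEXT,"
        " qty INTEGER,"
        " cust_no INTEGER REFERENCES customer(cust_no))"
    )
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?)",
        [(100, "Moe"), (101, "Larry"), (102, "Curly")],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [(1, "QR22", 1, 100), (2, "QR22", 25, 100), (3, "SB56", 3, 102)],
    )
    return conn


def try_orphan_order(conn):
    """Return True if the database rejects an order for a nonexistent customer."""
    try:
        conn.execute("INSERT INTO orders VALUES (4, 'QR22', 1, 999)")
    except sqlite3.IntegrityError:
        return True
    return False


conn = build_db()
# Follow the Foreign Key from each order back to its parent customer row.
rows = conn.execute(
    "SELECT o.order_no, c.cust_name FROM orders o"
    " JOIN customer c ON o.cust_no = c.cust_no ORDER BY o.order_no"
).fetchall()
print(rows)
print(try_orphan_order(conn))
```

Customer 999 is a made-up value chosen only because it is absent from the parent table.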

Database Structure & Design
2 Approaches (in Conflict):
1. Optimize for Data Capture, i.e., Capturing Transactions
2. Optimize for Data Access, i.e., Queries & Reporting
Caption: I love conflict!

Approach #1: Optimize for Data Capture
To optimize for data capture, you must:
Eliminate redundancy of data (or else wasted space & processing occurs)
Ensure data integrity (or else data anomalies occur)
Ensure that changes in data (modifications, deletions, etc.) only have to happen in one place
Normalization – process by which a database is optimized for data capture
All data “redundancy” is removed from the database
Has multiple forms (0NF, 1NF, 2NF, 3NF, et al.)

Moving from 0NF to 1NF
Rule: Make a separate table for each set of related attributes, and make each field atomic (i.e., it cannot be broken apart any further)
CUSTOMER DATA (0NF)
    Cust #:    100, 101, 102
    CustName:  Moe Howard, Larry Fine, Curly Howard
CUSTOMER TABLE (1NF)
    Cust #   FName   LName
    100      Moe     Howard
    101      Larry   Fine
    102      Curly   Howard
Caption: I’M NOT MOVING!
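The 0NF to 1NF step can be sketched in a few lines of Python. The function name and dictionary keys below are illustrative assumptions, not from the slides. The sketch splits the comma-packed fields into one row per customer, then splits each name into atomic first-name and last-name fields.

```python
def to_first_normal_form(cust_numbers, cust_names):
    """Turn two comma-packed 0NF fields into a list of atomic 1NF rows."""
    numbers = [n.strip() for n in cust_numbers.split(",")]
    names = [n.strip() for n in cust_names.split(",")]
    rows = []
    for number, full_name in zip(numbers, names):
        # Assumes every name is "First Last", as on the slide.
        first, last = full_name.split(" ", 1)
        rows.append({"cust_no": int(number), "fname": first, "lname": last})
    return rows


customer_table = to_first_normal_form(
    "100, 101, 102", "Moe Howard, Larry Fine, Curly Howard"
)
for row in customer_table:
    print(row)
```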

Moving from 1NF to 2NF
Rule: Eliminate any repeating values caused by a dependency on a “keyed” column (i.e., either Primary or Foreign)
TABLE X (1NF)
    Cust #   FName   Order#
    100      Moe     1
    100      Moe     2
    101      Larry   3
Dependency on Primary Key: “100 Moe” repeats because FName depends on Cust #
CUSTOMER TABLE (2NF)
    Cust #   FName
    100      Moe
    101      Larry
    102      Curly
ORDER TABLE (2NF)
    Order #  Cust#
    1        100
    2        100
    3        101
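As a rough illustration of this split, the sketch below moves the repeated customer name out of Table X into its own table and leaves only the key in the order rows. The function and variable names are assumptions made for the example. Only customers that appear in Table X come out, so Curly is absent here.

```python
def split_customer_orders(table_x):
    """Split (cust_no, fname, order_no) rows into a customer table and an order table."""
    customers = {}
    orders = []
    for cust_no, fname, order_no in table_x:
        # FName depends only on Cust #, so it is stored once per customer.
        customers[cust_no] = fname
        # The order row keeps just the Foreign Key back to the customer.
        orders.append((order_no, cust_no))
    return sorted(customers.items()), orders


table_x = [(100, "Moe", 1), (100, "Moe", 2), (101, "Larry", 3)]
customer_table, order_table = split_customer_orders(table_x)
print(customer_table)
print(order_table)
```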

Moving from 2NF to 3NF
Rule: Eliminate any repeating values caused by a dependency on a “non-keyed” column (i.e., dependency on ANY column)
TABLE X (2NF)
    Cust #   City   Order#   ShipTime
    100      NY     1        2 days
    101      NY     2        2 days
    102      LA     3        5 days
Dependency b/t 2 non-key columns: “NY 2 days” repeats because ShipTime depends on City
SHIP TIME TABLE (3NF)
    City #   City   ShipTime
    10       NY     2 days
    20       LA     5 days
CUSTOMER TABLE (3NF)
    Cust #   City#
    100      10
    101      10
    102      20
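The same idea can be sketched for the 3NF step. ShipTime depends on City rather than on the key, so City and ShipTime move to their own table and customers keep only a City #. The numbering rule for City # (10, 20, ... in order of first appearance) is an assumption chosen to match the slide's values, and the function name is hypothetical.

```python
def split_city_shiptime(table_x):
    """Split (cust_no, city, order_no, ship_time) rows into 3NF tables."""
    city_ids = {}
    ship_time_table = []
    customer_table = []
    for cust_no, city, _order_no, ship_time in table_x:
        if city not in city_ids:
            # Assumed surrogate key scheme: 10, 20, 30, ... per new city.
            city_ids[city] = 10 * (len(city_ids) + 1)
            ship_time_table.append((city_ids[city], city, ship_time))
        customer_table.append((cust_no, city_ids[city]))
    return ship_time_table, customer_table


table_x = [
    (100, "NY", 1, "2 days"),
    (101, "NY", 2, "2 days"),
    (102, "LA", 3, "5 days"),
]
ship_time_table, customer_table = split_city_shiptime(table_x)
print(ship_time_table)
print(customer_table)
```

As on the slide, the Order# column is not carried into either resulting table.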

Normalized DB Example
MANY database tables ensure against redundant data (and help prevent data integrity issues)
Caption: Am I a good example of “Normalized?”

Database Structure & Design
2 Approaches (in Conflict):
1. Optimize for Data Capture, i.e., Capturing Transactions
2. Optimize for Data Access, i.e., Queries & Reporting
Caption: I like conflict too!

Approach #2: Optimize for Data Access (in a separate, read-only Data Warehouse)
To optimize for data access, you must:
Change the data layout to a different structure
Allow data redundancy
Reduce the number of table joins (i.e., reduce links among tables by combining tables)
Denormalizing – Adding redundancy & reducing joins in a relational database

Denormalizing – Most Common Approach
Star Schema (Clustering)
Fact (core or transaction) Tables in middle of star
Dimensional (structural or “lookup”) Tables around “points” of star
CUSTOMER DIMENSION TABLE
    Cust #   CustName
    100      Moe
    101      Larry
    102      Curly
LOC DIMENSION TABLE
    Loc #   LocName
    1000    NY
    2000    LA
    3000    PGH
SALES ORDER (FACT) TABLE
    Order #  Date       Cust#  Prod#  Loc#
    1        06/15/XX   100    QR22   1000
    2        07/19/XX   100    QR22   1000
    3        08/30/XX   101    SR56   2000
DATE DIMENSION TABLE
    Date       Quarter   (unlabeled)
    06/29/XX   2         Bob
    06/30/XX   2         Sue
    07/01/XX   3
PRODUCT DIMENSION TABLE
    Prod #   ProdName
    QR22     Rake
    SR56     Spade
    TW43     Mulch

This Date Field helps build the “Date Dimension”
These 2 tables become the “SALES FACT” table in the Data Warehouse
These 5 tables become the “Product Dimension”
These 3 tables become the “Customer Dimension”

Resulting Star Schema Data Warehouse
Caption: It’s a STAR, Like me!
CUSTOMER DIMENSION
    Cust #   CustName
    100      Moe
    101      Larry
    102      Curly
SALES ORDER (FACT) TABLE
    Order #  Date       Cust#  Prod#  Rep#
    1        06/15/XX   100    QR22   1000
    2        07/19/XX   100    QR22   1000
    3        08/30/XX   101    SR56   2000
DATE DIMENSION
    Date       Quarter   (unlabeled)
    06/29/XX   2         Bob
    06/30/XX   2         Sue
    07/01/XX   3         Juan
PRODUCT DIMENSION
    Prod #   ProdName
    QR22     Rake
    SR56     Spade
    TW43     Mulch
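A star schema pays off at query time, because a report needs only one join from the fact table to each dimension. The sketch below is an illustration using Python's built-in sqlite3 module, not the tooling from the deck. The table and column names are assumptions, and the Date dimension and the Rep# column are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_dim (cust_no INTEGER PRIMARY KEY, cust_name TEXT);
CREATE TABLE product_dim (prod_no TEXT PRIMARY KEY, prod_name TEXT);
CREATE TABLE sales_fact (
    order_no INTEGER PRIMARY KEY,
    order_date TEXT,
    cust_no INTEGER REFERENCES customer_dim(cust_no),
    prod_no TEXT REFERENCES product_dim(prod_no)
);
INSERT INTO customer_dim VALUES (100, 'Moe'), (101, 'Larry'), (102, 'Curly');
INSERT INTO product_dim VALUES ('QR22', 'Rake'), ('SR56', 'Spade'), ('TW43', 'Mulch');
INSERT INTO sales_fact VALUES
    (1, '06/15/XX', 100, 'QR22'),
    (2, '07/19/XX', 100, 'QR22'),
    (3, '08/30/XX', 101, 'SR56');
""")


def orders_by_customer_and_product(conn):
    """Count orders per customer and product with one join per dimension."""
    return conn.execute("""
        SELECT c.cust_name, p.prod_name, COUNT(*) AS order_count
        FROM sales_fact f
        JOIN customer_dim c ON c.cust_no = f.cust_no
        JOIN product_dim p ON p.prod_no = f.prod_no
        GROUP BY c.cust_no, p.prod_no
        ORDER BY c.cust_no, p.prod_no
    """).fetchall()


print(orders_by_customer_and_product(conn))
```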

Denormalizing (continued): Common (Conformed) Dimensions
Stars are linked via common (i.e., Conformed) Dimensions to form Data Warehouse
CUSTOMER DIMENSION
    Cust #   CustName
    100      Moe
    101      Larry
    102      Curly
LOC DIMENSION
    Loc #   LocName
    1000    NY
    2000    LA
    3000    PGH
SALES ORDER (FACT) TABLE
    Order #  Date       Cust#  Prod#  Loc#
    1        06/15/XX   100    QR22   1000
    2        07/19/XX   100    QR22   1000
    3        08/30/XX   101    SR56   2000
DATE DIMENSION
    Date       Quarter   (unlabeled)
    06/29/XX   2
    06/30/XX   2         S
    07/01/XX   3         Juan
PRODUCT DIMENSION
    Prod #   ProdName
    QR22     Rake
    SR56     Spade
    TW43     Mulch
INVENTORY (FACT) TABLE
    Prod#   ProdName   Stock Date   Units
    QR22    Rake       03/23/XX     150
    TW43    Mulch      04/15/XX     1452
    SR56    Spade      05/01/XX     997
Other diagram labels: ORDER TABLE, CUSTOMER TABLE, TIME
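Both fact tables on this slide carry Prod#, so the Product dimension can tie the two stars together in one report. The sketch below illustrates that in plain Python. The function name, the report layout, and the choice to drop ProdName from the inventory rows (it already lives in the dimension) are assumptions made for the example.

```python
product_dim = {"QR22": "Rake", "SR56": "Spade", "TW43": "Mulch"}

# (order_no, date, cust_no, prod_no, loc_no), as in the SALES ORDER fact table.
sales_fact = [
    (1, "06/15/XX", 100, "QR22", 1000),
    (2, "07/19/XX", 100, "QR22", 1000),
    (3, "08/30/XX", 101, "SR56", 2000),
]

# (prod_no, stock_date, units), as in the INVENTORY fact table.
inventory_fact = [
    ("QR22", "03/23/XX", 150),
    ("TW43", "04/15/XX", 1452),
    ("SR56", "05/01/XX", 997),
]


def drill_across(product_dim, sales_fact, inventory_fact):
    """Combine measures from two fact tables through the shared product dimension."""
    report = {
        name: {"orders": 0, "units_stocked": 0} for name in product_dim.values()
    }
    for _order_no, _date, _cust_no, prod_no, _loc_no in sales_fact:
        report[product_dim[prod_no]]["orders"] += 1
    for prod_no, _stock_date, units in inventory_fact:
        report[product_dim[prod_no]]["units_stocked"] += units
    return report


report = drill_across(product_dim, sales_fact, inventory_fact)
for name, measures in report.items():
    print(name, measures)
```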

Mapping Normalized Tables to Denormalized (Data Warehouse) Tables Using ETL Tools (like MS-SSIS)
EXTRACT: These are 2 Normalized Transaction Tables
TRANSFORM: The data are “Transformed” in these steps
LOAD: This is the resulting, Denormalized Product Dimension
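The three ETL stages can be mimicked in plain Python to show what such a tool does conceptually. This sketch is not SSIS and does not reproduce the slide's actual source tables, which are not listed in the transcript. The product/category tables, the category names, and all function names below are hypothetical.

```python
def extract():
    """EXTRACT: read two normalized source tables (hypothetical sample data)."""
    product = [("QR22", "Rake", 1), ("SR56", "Spade", 1), ("TW43", "Mulch", 2)]
    category = [(1, "Tools"), (2, "Garden Supplies")]  # invented for illustration
    return product, category


def transform(product, category):
    """TRANSFORM: join the tables so each row repeats its category name."""
    category_names = dict(category)
    return [
        {
            "prod_no": prod_no,
            "prod_name": prod_name,
            "category_name": category_names[category_id],
        }
        for prod_no, prod_name, category_id in product
    ]


def load(rows, warehouse):
    """LOAD: store the denormalized rows as the product dimension."""
    warehouse["product_dim"] = list(rows)
    return warehouse


warehouse = load(transform(*extract()), {})
for row in warehouse["product_dim"]:
    print(row)
```

The category name now repeats on every product row. That redundancy is deliberate, since it saves a join when the warehouse is queried.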

The End That’s all! Bye, bye!