COMP 430 Intro. to Database Systems Denormalization & Dimensional Modeling.


Some consequences of normalization: Data redundancy is reduced or eliminated. Relations are broken into smaller, related tables. Using all the attributes from the original relation requires joining these smaller tables.

Denormalization: Deliberately reintroducing some redundancy so that we can access data faster. (Slide graphic text: DENORMALIZED DATA AHEAD)

Example. Technique: Add duplicate fields. Technique: Add computed fields.
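
The slide's figure did not survive in this transcript, so here is a minimal sketch of both techniques using Python's built-in sqlite3 module. The schema (customer, orders, order_line) and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE order_line (order_id INTEGER, sku TEXT, amount REAL);
CREATE TABLE orders (
    order_id      INTEGER PRIMARY KEY,
    customer_id   INTEGER REFERENCES customer(customer_id),
    customer_name TEXT,  -- duplicate field: copied from customer.name
    total         REAL   -- computed field: sum of the order's line amounts
);

INSERT INTO customer VALUES (1, 'Ada');
INSERT INTO order_line VALUES (10, 'A', 2.5), (10, 'B', 4.0);

-- Fill the redundant columns once, at write time.
INSERT INTO orders
SELECT 10, c.customer_id, c.name,
       (SELECT SUM(amount) FROM order_line WHERE order_id = 10)
FROM customer AS c
WHERE c.customer_id = 1;
""")

# Normalized read: two joins and an aggregate.
normalized = conn.execute("""
    SELECT c.name, SUM(l.amount)
    FROM orders AS o
    JOIN customer AS c ON c.customer_id = o.customer_id
    JOIN order_line AS l ON l.order_id = o.order_id
    WHERE o.order_id = 10
    GROUP BY c.name
""").fetchone()

# Denormalized read: a single-row lookup on one table.
denormalized = conn.execute(
    "SELECT customer_name, total FROM orders WHERE order_id = 10"
).fetchone()
```

Both reads return the same answer; the denormalized one trades extra storage and write-time work for a simpler, cheaper read.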

Example. Technique: Join tables.
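
As a rough sketch of the join-tables technique, the following stores the result of a join as its own table so that later reads need no join. The employee and department tables and the name employee_wide are hypothetical, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical normalized tables.
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE employee (
    emp_id   INTEGER PRIMARY KEY,
    emp_name TEXT,
    dept_id  INTEGER REFERENCES department(dept_id)
);
INSERT INTO department VALUES (1, 'Sales'), (2, 'Support');
INSERT INTO employee VALUES (100, 'Ana', 1), (101, 'Bo', 1), (102, 'Cy', 2);

-- Denormalized: materialize the join once.
CREATE TABLE employee_wide AS
SELECT e.emp_id, e.emp_name, d.dept_id, d.dept_name
FROM employee AS e
JOIN department AS d ON d.dept_id = e.dept_id;
""")

# Reads now touch a single table.
rows = conn.execute(
    "SELECT emp_name, dept_name FROM employee_wide ORDER BY emp_id"
).fetchall()

# The price is redundancy: 'Sales' is now stored once per employee.
sales_copies = conn.execute(
    "SELECT COUNT(*) FROM employee_wide WHERE dept_name = 'Sales'"
).fetchone()[0]
```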

Less common denormalization techniques: Duplicating a commonly used subset of table fields. Splitting some table rows into different tables, for example frequently vs. rarely used data, data for different regions, or common subclasses of data.

When to denormalize? Typically used when some or all of the following apply: Many queries need to join the data. Joining the data is expensive (it uses scans rather than indices). Computing derived data is expensive (complex queries or complex functions).
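
One way to check whether a query uses a scan or an index is to ask the database for its query plan. The sketch below does this in SQLite with EXPLAIN QUERY PLAN on a single-table lookup; the same check can be applied to each table in a join. The table layout and index name are assumptions, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with no index on sid yet.
conn.execute("CREATE TABLE enrollment (crn INTEGER, sid INTEGER)")

def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT crn FROM enrollment WHERE sid = 42"

# Without an index, SQLite has to scan the whole table.
plan_without_index = plan(query)

# With an index on the filter column, it can search instead.
conn.execute("CREATE INDEX idx_enrollment_sid ON enrollment (sid)")
plan_with_index = plan(query)
```

If adding an index removes the scan, that is usually a cheaper fix than denormalizing.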

Dealing with redundant data: Still want data consistency, but now it requires work. Is full consistency required at all times? Techniques:
Stored procedures act as API for updating DB. They add the redundant data.
Triggers check (and fix?) consistency.
Application code carefully maintains consistency during updates.
Reconcile data as background process.
Reconcile data during system maintenance.
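
To illustrate the trigger technique, here is a minimal sketch in SQLite in which a trigger propagates a changed customer name into the duplicated column. The schema and the trigger name sync_customer_name are hypothetical. The final query is the kind of check a background reconciliation job might run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema: orders.customer_name duplicates customer.name.
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    order_id      INTEGER PRIMARY KEY,
    customer_id   INTEGER REFERENCES customer(customer_id),
    customer_name TEXT
);

-- Trigger: fix the redundant copies whenever the source value changes.
CREATE TRIGGER sync_customer_name
AFTER UPDATE OF name ON customer
BEGIN
    UPDATE orders
    SET customer_name = NEW.name
    WHERE customer_id = NEW.customer_id;
END;

INSERT INTO customer VALUES (1, 'Ada');
INSERT INTO orders VALUES (10, 1, 'Ada'), (11, 1, 'Ada');
UPDATE customer SET name = 'Ada L.' WHERE customer_id = 1;
""")

names = [row[0] for row in conn.execute(
    "SELECT customer_name FROM orders ORDER BY order_id"
)]

# Reconciliation check: count redundant copies that disagree with the source.
stale = conn.execute("""
    SELECT COUNT(*)
    FROM orders AS o
    JOIN customer AS c ON c.customer_id = o.customer_id
    WHERE o.customer_name <> c.name
""").fetchone()[0]
```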

Dimensional modeling: In a tiny, brief nutshell

An alternative to ER modeling: Only a brief overview, so we’ll view it through our lens of ER modeling + denormalization. Emphasizes decision making & use of historical data. Fast retrieval & aggregation of data. Less concern with updating data & maintaining consistency while updating.

DB design often resembles multiple starflakes:
Course(crn, …)
Student(sid, …, collegeid, …, home_stateid)
Enrollment(crn, sid, …)
College(collegeid, …)
State(stateid, …, country)
Instructor(instr_id, …)
Teach(crn, instr_id, …)
Starflake = tree of junction table & child tables in 1-to-many relationships.
Typical: Few cycles. Super/subclasses implemented only with the superclass table.

Starflakes can be compressed to stars:
Student(sid, …, college, …, home_state)
Course(crn, …)
Enrollment(crn, sid, …)
Instructor(instr_id, …)
Teach(crn, instr_id, …)
Denormalize by joining each child’s tree.
Star = junction table & one level of child tables in 1-to-many relationships.
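
The compression step can be sketched as a join of the Student table with its own child tables. The columns college_name and state_name, the table name StudentDim, and the sample rows are hypothetical; they stand in for the attributes elided with "…" on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Starflake: Student points at College and State child tables.
-- college_name and state_name are hypothetical stand-in columns.
CREATE TABLE College (collegeid INTEGER PRIMARY KEY, college_name TEXT);
CREATE TABLE State (stateid INTEGER PRIMARY KEY, state_name TEXT, country TEXT);
CREATE TABLE Student (
    sid          INTEGER PRIMARY KEY,
    collegeid    INTEGER REFERENCES College(collegeid),
    home_stateid INTEGER REFERENCES State(stateid)
);
INSERT INTO College VALUES (1, 'Baker'), (2, 'Lovett');
INSERT INTO State VALUES (48, 'Texas', 'USA'), (6, 'California', 'USA');
INSERT INTO Student VALUES (1001, 1, 48), (1002, 2, 6), (1003, 1, 6);

-- Star: join the child's tree into one denormalized dimension table.
CREATE TABLE StudentDim AS
SELECT s.sid,
       c.college_name AS college,
       st.state_name  AS home_state,
       st.country     AS country
FROM Student AS s
JOIN College AS c  ON c.collegeid = s.collegeid
JOIN State   AS st ON st.stateid  = s.home_stateid;
""")

student_dim = conn.execute(
    "SELECT sid, college, home_state, country FROM StudentDim ORDER BY sid"
).fetchall()
```

After this step, a query over Enrollment needs only one join to reach a student's college or home state.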

Fact & dimension tables:
Facts: The junction tables are the most important data, the facts. E.g., store purchases, class enrollments, click data. Generally the largest tables. Key data often numeric & additive, e.g., quantity bought, cost per unit, advertisement views.
Dimensions: The child tables in 1-to-many relationship with facts. E.g., stores, customers, sales people, sales period.

Dimensional modeling process:
Centered on identifying the business model.
Identifying the potential queries.
Identifying the facts, the data used in such queries.
Each query should use only one fact table & its dimension tables.
Possibly start with an ER model.
Identify which junction tables serve as fact tables.
Use only surrogate keys.
Often add time dimension.
Denormalize into starflake or star schema.

ER model:
Order(order_id, customer_id, store_id, clerk_id, date)
Store(store_id, store_name, address, district, floor_type)
Clerk(clerk_id, clerk_name, clerk_grade)
Product(sku, description, brand, category)
Customer(customer_ID, customer_name, purchase_profile, credit_profile, address)
Promotion(promotion_id, promotion_name, price_type, ad_type)
OrderLine(order_id, sku, promotion_id, dollars_sold, units_sold, dollars_cost)

Dimensional model:
Time(time_key, SQL_date, day_of_week, month)
Store(store_key, store_id, store_name, address, district, floor_type)
Clerk(clerk_key, clerk_id, clerk_name, clerk_grade)
Product(product_key, sku, description, brand, category)
Customer(customer_key, customer_name, purchase_profile, credit_profile, address)
Promotion(promotion_key, promotion_name, price_type, ad_type)
Order(time_key, store_key, clerk_key, product_key, customer_key, promotion_key, dollars_sold, units_sold, dollars_cost)
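
A typical query against such a star joins the fact table to a few dimensions and aggregates the additive measures. The sketch below uses only the Time and Product dimensions to stay short. The table names time_dim, product_dim and order_fact are assumptions (ORDER is a reserved word in SQL, so the fact table is renamed here), and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Two of the slide's dimensions, with surrogate keys.
CREATE TABLE time_dim (
    time_key INTEGER PRIMARY KEY, SQL_date TEXT, day_of_week TEXT, month TEXT
);
CREATE TABLE product_dim (
    product_key INTEGER PRIMARY KEY, sku TEXT, description TEXT,
    brand TEXT, category TEXT
);
-- Fact table (the slide calls it Order): dimension keys plus measures.
CREATE TABLE order_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    product_key  INTEGER REFERENCES product_dim(product_key),
    dollars_sold REAL,
    units_sold   INTEGER,
    dollars_cost REAL
);

INSERT INTO time_dim VALUES
    (1, '2024-01-15', 'Mon', 'Jan'),
    (2, '2024-02-01', 'Thu', 'Feb');
INSERT INTO product_dim VALUES
    (1, 'A1', 'Oat cereal', 'Acme', 'Cereal'),
    (2, 'B2', 'Corn cereal', 'Bolt', 'Cereal');
INSERT INTO order_fact VALUES
    (1, 1, 10.0, 2, 6.0),
    (1, 2, 5.0, 1, 3.0),
    (2, 1, 20.0, 4, 12.0);
""")

# One fact table, one join per dimension used, then aggregate the measures.
sales_by_brand_month = conn.execute("""
    SELECT p.brand, t.month, SUM(f.dollars_sold), SUM(f.units_sold)
    FROM order_fact AS f
    JOIN time_dim AS t ON t.time_key = f.time_key
    JOIN product_dim AS p ON p.product_key = f.product_key
    GROUP BY p.brand, t.month
    ORDER BY p.brand, t.month
""").fetchall()
```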

Each dimension table represents a dimension of the cube. Facts are the data in the cube. Facts might be pre-aggregated along each dimension or combination of dimensions.
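
Pre-aggregation along every combination of dimensions can be sketched in plain Python. The function build_cube, the dimension names, and the sample facts are hypothetical; a real warehouse would typically compute these totals inside the database rather than in application code.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical dimensions and fact rows, for illustration only.
DIMENSIONS = ("store", "product", "month")
facts = [
    {"store": "S1", "product": "P1", "month": "Jan", "units_sold": 2},
    {"store": "S1", "product": "P2", "month": "Jan", "units_sold": 1},
    {"store": "S2", "product": "P1", "month": "Feb", "units_sold": 4},
]

def build_cube(facts, dimensions, measure):
    """Sum `measure` for every subset of `dimensions`.

    Returns a dict mapping a tuple of dimension names to a dict that
    maps tuples of dimension values to the aggregated total.
    """
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(int)
            for fact in facts:
                key = tuple(fact[d] for d in dims)
                totals[key] += fact[measure]
            cube[dims] = dict(totals)
    return cube

cube = build_cube(facts, DIMENSIONS, "units_sold")

grand_total = cube[()][()]             # aggregated over all dimensions
per_store = cube[("store",)]           # aggregated along one dimension
per_product_month = cube[("product", "month")]
```

With three dimensions there are eight aggregates, one per subset. The count doubles with each added dimension, which is why pre-aggregation is usually limited to the combinations that queries actually use.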

Sometimes things are still messy: Not all data fit nicely into facts + 1-to-many dimensions. This leads to exceptions from this simple presentation.