Dimensional Modelling III

Presentation transcript:

Dimensional Modelling III

Dimensional Models: a denormalized relational model, made up of tables with attributes. Relationships are defined by keys and foreign keys. Organized for understandability and ease of reporting rather than update. Queried and maintained by SQL or special-purpose management tools.

From Relational to Dimensional
Dimensional model: designed from the perspective of subject (Sales, Customers); "de-normalised" data structures in blatant violation of normalisation; used for analysis of aggregated data; OLAP (OnLine Analytical Processing); based on data that is historical and may be redundant.
Relational model: designed from the perspective of process efficiency (Marketing, Sales); "normalised" data structures; entity relationship model; used for transactional, or operational, systems; OLTP (OnLine Transaction Processing); based on data that is current and non-redundant.

ER vs. Dimensional Models
ER (the transaction processing model): one table per entity; minimize data redundancy; optimize update.
Dimensional (the data warehousing model): one fact table for data organization; maximize understandability; optimized for retrieval.

Strengths of the Dimensional Model: a predictable, standard framework; responds well to changes in user reporting needs; relatively easy to add data without reloading tables; standard design approaches have been developed; a number of products support the dimensional model.

A Transactional Database
A simplistic transactional schema showing 7 tables relating to sales orders: Addresses (AddressID, Street); States (StateID, Desc); Countries (CountryID, Description); Customers (CustomerID, FirstName, LastName); OrderHeader (OrderHeaderID, FreightAmount, OrderDate); OrderDetails (OrderDetailNo, Amount); Products (ProductID, Description, Size).

A Dimensional Model : Solution 1
Customers (CustomerID, Name, Street, State, Country); Time (TimeID, Date, Month, Quarter, Year); Products (ProductID, Description, Size, Subcategory, Category); FactSales (SalesAmount).
This is a star schema (later on we will discuss snowflake schemas) showing 4 tables that relate to the previous transactional schema. State and Country have been denormalized under Customer.
Dimensions are in blue. These are the things that we analyse "by" (e.g. by Time, by Customer, by Region). The fact is in yellow. These are usually quantitative things that we are interested in.
The PK is composite: {TimeID, CustomerID, ProductID}. TimeID references Time, CustomerID references Customer, ProductID references Product.
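The star schema on this slide can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the column types, the in-memory database, and the sample rows are made up here, while the table and column names follow the slide.

```python
import sqlite3

# Hypothetical in-memory database; column types and sample rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Time      (TimeID INTEGER PRIMARY KEY, Date TEXT, Month INTEGER,
                        Quarter INTEGER, Year INTEGER);
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT, Street TEXT,
                        State TEXT, Country TEXT);
CREATE TABLE Products  (ProductID INTEGER PRIMARY KEY, Description TEXT, Size TEXT,
                        Subcategory TEXT, Category TEXT);
CREATE TABLE FactSales (
    TimeID      INTEGER NOT NULL REFERENCES Time(TimeID),
    CustomerID  INTEGER NOT NULL REFERENCES Customers(CustomerID),
    ProductID   INTEGER NOT NULL REFERENCES Products(ProductID),
    SalesAmount REAL NOT NULL,
    PRIMARY KEY (TimeID, CustomerID, ProductID)   -- composite key from the slide
);
INSERT INTO Time VALUES (1, '2003-01-16', 1, 1, 2003), (2, '2003-04-02', 4, 2, 2003);
INSERT INTO Customers VALUES (1, 'Ann', '1 Main St', 'NSW', 'Australia'),
                             (2, 'Bob', '2 High St', 'CA', 'USA');
INSERT INTO Products VALUES (1, 'Widget', 'S', 'Gadgets', 'Hardware');
INSERT INTO FactSales VALUES (1, 1, 1, 100.0), (2, 1, 1, 50.0), (1, 2, 1, 25.0);
""")

# Analyse the fact "by" two dimension attributes: Country and Year.
rows = conn.execute("""
    SELECT c.Country, t.Year, SUM(f.SalesAmount)
    FROM FactSales f
    JOIN Customers c ON c.CustomerID = f.CustomerID
    JOIN Time t      ON t.TimeID = f.TimeID
    GROUP BY c.Country, t.Year
    ORDER BY c.Country, t.Year
""").fetchall()

# The composite key allows only one fact row per (TimeID, CustomerID, ProductID).
try:
    conn.execute("INSERT INTO FactSales VALUES (1, 1, 1, 10.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Every "by" question becomes a join from the fact table to one or more dimensions followed by a GROUP BY on the dimension attributes.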

A Dimensional Model : Solution 2
Customers (CustomerID, Name, Street, State, Country); Time (TimeID, Date, Month, Quarter, Year); Products (ProductID, Description, Size, Subcategory, Category); FactSales (SaleID, SalesAmount).
This is a star schema (later on we will discuss snowflake schemas) showing 4 tables that relate to the previous transactional schema. State and Country have been denormalized under Customer.
Dimensions are in blue. These are the things that we analyse "by" (e.g. by Time, by Customer, by Region). The fact is in yellow. These are usually quantitative things that we are interested in.
The PK is a surrogate: SaleID. TimeID references Time, CustomerID references Customer, ProductID references Product.

A Dimensional Model : Solution 2
Customers (CustomerID, Name, Street, State, Country); Time (TimeID, Date, Month, Quarter, Year); Products (ProductID, Description, Size, Subcategory, Category); FactSales (TimeID||CustomerID||ProductID, SalesAmount).
This is a star schema (later on we will discuss snowflake schemas) showing 4 tables that relate to the previous transactional schema. State and Country have been denormalized under Customer.
Dimensions are in blue. These are the things that we analyse "by" (e.g. by Time, by Customer, by Region). The fact is in yellow. These are usually quantitative things that we are interested in.
The PK is concatenated from the FKs: TimeID||CustomerID||ProductID.
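The surrogate-key and concatenated-key variants of the fact table can be compared in a small sqlite3 sketch. The table names FactSalesSurrogate and FactSalesConcat, the helper concat_key, and the "|" separator are assumptions introduced for illustration; the slides only name the key columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FactSalesSurrogate (
    SaleID      INTEGER PRIMARY KEY,  -- surrogate key, generated by SQLite
    TimeID      INTEGER NOT NULL,
    CustomerID  INTEGER NOT NULL,
    ProductID   INTEGER NOT NULL,
    SalesAmount REAL NOT NULL
);
CREATE TABLE FactSalesConcat (
    SaleKey     TEXT PRIMARY KEY,     -- concatenation of the three foreign keys
    TimeID      INTEGER NOT NULL,
    CustomerID  INTEGER NOT NULL,
    ProductID   INTEGER NOT NULL,
    SalesAmount REAL NOT NULL
);
""")

def concat_key(time_id, customer_id, product_id):
    """Build the concatenated key (hypothetical helper).

    The separator is an assumption added here: without one,
    (1, 11, 2) and (11, 1, 2) would both produce "1112".
    """
    return f"{time_id}|{customer_id}|{product_id}"

sale = (1, 1, 1, 100.0)

# With a surrogate key, two rows for the same dimension combination are accepted.
surrogate_ids = [
    conn.execute(
        "INSERT INTO FactSalesSurrogate (TimeID, CustomerID, ProductID, SalesAmount) "
        "VALUES (?, ?, ?, ?)", sale).lastrowid
    for _ in range(2)
]

# With a concatenated key, the second row for the same combination is rejected.
insert_concat = "INSERT INTO FactSalesConcat VALUES (?, ?, ?, ?, ?)"
conn.execute(insert_concat, (concat_key(*sale[:3]), *sale))
try:
    conn.execute(insert_concat, (concat_key(*sale[:3]), *sale))
    second_concat_rejected = False
except sqlite3.IntegrityError:
    second_concat_rejected = True
```

One consequence worth noting: a surrogate key on its own does not stop two fact rows from sharing the same Time, Customer and Product, whereas the composite and concatenated keys enforce one row per combination.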

Extract Transform Load: from the relational model to the dimensional model.
Relational vs dimensional: process oriented vs subject oriented; transactional vs aggregate; current vs historic; atomic vs summary; supports transaction throughput vs supports aggregate queries.
We already have the data in a data model, so why create another data model? We already have all the data in the OLTP system, so why replicate it in a dimensional model? What is currently called "Data Warehousing" or "Business Intelligence" was originally often called "Decision Support Systems".

Facts & Dimensions: There are two main types of objects in a dimensional model. Facts are quantitative measures that we wish to analyse and report on. Facts work best if they are additive. Dimensions contain textual descriptors of the business. They provide context for the facts and allow us to "slice & dice" the facts into meaningful groups.

Fact & Dimension Tables
Facts: contain two or more foreign keys; tend to have huge numbers of records; useful facts tend to be numeric and additive.
Dimensions: contain text and descriptive information; are the "1" side of a 1-M relationship; are generally the source of interesting constraints; typically contain the attributes for the SQL answer set.

GB Video E-R Diagram
Customer (#Cust No, F Name, L Name, Ads1, Ads2, City, State, Zip, Tel No, CC No, Expire); Rental (#Rental No, Date, Clerk No, Pay Type, CC Approval); Line (#Line No, Due Date, Return Date, OD charge, Pay type); Video (#Video No, One-day fee, Extra days, Weekend); Title (#Title No, Name, Vendor No, Cost).
Relationship labels in the diagram: Owner of, Requestor of, Holder of, Name for.

GB Video Data Mart
Customer (CustID, Cust No, F Name, L Name); Address (AddressID, Address1, Address2, City, State, Zip, AreaCode, Phone); Rental (RentalID, Rental No, Clerk No, Store, Pay Type); Video (VideoID, Video No); Title (TitleID, TitleNo, Name, Cost, Vendor Name); DateDim (ActualDate, DateID, SQLDate, Day, Week, Quarter, Holiday), used in three roles: Rental Date, Due Date, Return Date.
VideoRentalFact (LineID, OD Charge, OneDayCharge, ExtraDaysCharge, WeekendCharge, DaysReserved, DaysOverdue, LineNo), with foreign keys CustID, AddressID, RentalId, VideoID, TitleID, RentalDateID, DueDateID, ReturnDateID.
The PK of the fact table is the combination of all foreign keys. We may also use LineID as the PK, or the concatenated PK.

Fact Table: measurements associated with a specific business process. Grain: the level of detail of the table. Process events produce fact records. Facts (attributes) are usually numeric and additive; derived facts are included. Foreign (surrogate) keys refer to dimension tables (entities). Classification values help define subsets.

Dimension Tables: entities describing the objects of the process. Conformed dimensions cross processes. Attributes are descriptive (text or numeric). Surrogate keys. Less volatile than facts (1:m with the fact table). Null entries. Date dimensions. Produce "by" questions.

Business Model As always in life, there are some disadvantages to 3NF: Performance can be truly awful. Most of the work that is performed on denormalizing a data model is an attempt to reach performance objectives. The structure can be overwhelmingly complex. We may wind up creating many small relations which the user might think of as a single relation or group of data.

The 4 Step Design Process: (1) choose the data mart; (2) declare the grain; (3) choose the dimensions; (4) choose the facts.

Building a Data Warehouse from a Normalized Database
The steps: (1) Develop a normalized entity-relationship business model of the data warehouse. (2) Translate this into a dimensional model. This step reflects the information and analytical characteristics of the data warehouse. (3) Translate this into the physical model. This reflects the changes necessary to reach the stated performance objectives.
Designing the Perfect Data Warehouse (the paper formerly known as: Data Modeling for Data Warehouses), Frank McGuff, http://members.aol.com/fmcguff/dwmodel/

Structural Dimensions The first step is the development of the structural dimensions. This step corresponds very closely to what we normally do in a relational database. The star architecture that we will develop here depends upon taking the central intersection entities as the fact tables and building the foreign key => primary key relations as dimensions.

Steps in dimensional modeling: (1) select an associative entity for a fact table; (2) determine granularity; (3) replace operational keys with surrogate keys; (4) promote the keys from all hierarchies to the fact table; (5) add date dimension; (6) split all compound attributes; (7) add necessary categorical dimensions. Fact (varies with time) / Attribute (constant).
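The step "replace operational keys with surrogate keys" can be illustrated with a short Python sketch. The function names and the sample rows are hypothetical; the column names SKU, store_ID, product_key and store_key are borrowed from the ERD and dimensional model slides later in the deck.

```python
def assign_surrogate_keys(operational_keys):
    """Map each distinct operational key to a generated integer key (1, 2, 3, ...)."""
    mapping = {}
    for op_key in operational_keys:
        if op_key not in mapping:
            mapping[op_key] = len(mapping) + 1
    return mapping


def to_fact_rows(source_rows, product_keys, store_keys):
    """Swap the operational keys in each source row for surrogate keys."""
    return [
        {
            "product_key": product_keys[row["SKU"]],
            "store_key": store_keys[row["store_ID"]],
            "units_sold": row["units_sold"],
        }
        for row in source_rows
    ]


# Made-up order lines keyed by operational identifiers.
source_rows = [
    {"SKU": "A-100", "store_ID": "S01", "units_sold": 3},
    {"SKU": "B-200", "store_ID": "S01", "units_sold": 1},
    {"SKU": "A-100", "store_ID": "S02", "units_sold": 5},
]

product_keys = assign_surrogate_keys(row["SKU"] for row in source_rows)
store_keys = assign_surrogate_keys(row["store_ID"] for row in source_rows)
fact_rows = to_fact_rows(source_rows, product_keys, store_keys)
```

The dimension table keeps the operational key (for example SKU) as an ordinary attribute, so the mapping can be reused on every load.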

Converting an E-R Diagram: (1) determine the purpose of the mart; (2) identify an association table as the central fact table; (3) determine facts to be included; (4) replace all keys with surrogate keys; (5) promote foreign keys in related tables to the fact table; (6) add time dimension; (7) refine the dimension tables.

Choosing the Mart: a set of related fact and dimension tables; single source or multiple source; conformed dimensions; typically have a fact table for each process.

Fact Tables Represent a process or reporting environment that is of value to the organization It is important to determine the identity of the fact table and specify exactly what it represents. Typically correspond to an associative entity in the E-R model

Grain (unit of analysis): The grain determines what each fact record represents: the level of detail. For example: individual transactions; snapshots (points in time); line items on a document. It is generally better to focus on the smallest grain.
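One way to see why the smallest grain is usually preferred: facts stored at line-item grain can be rolled up to any coarser grain on demand, while stored daily totals cannot be split back into line items. The sketch below uses made-up rows and a hypothetical roll_up helper.

```python
from collections import defaultdict

# Made-up fact rows at line-item grain (one row per product per order line).
line_items = [
    {"order_date": "2003-01-16", "SKU": "A-100", "units_sold": 2, "dollars_sold": 20.0},
    {"order_date": "2003-01-16", "SKU": "B-200", "units_sold": 1, "dollars_sold": 5.0},
    {"order_date": "2003-01-17", "SKU": "A-100", "units_sold": 3, "dollars_sold": 30.0},
]


def roll_up(rows, by, measures):
    """Sum additive measures after grouping the rows on one attribute."""
    totals = defaultdict(lambda: dict.fromkeys(measures, 0))
    for row in rows:
        for measure in measures:
            totals[row[by]][measure] += row[measure]
    return dict(totals)


measures = ["units_sold", "dollars_sold"]
daily = roll_up(line_items, "order_date", measures)   # coarser grain: one row per day
by_product = roll_up(line_items, "SKU", measures)     # a different roll-up of the same facts
```

Both roll-ups come from the same line-item rows, which only works because the measures are additive.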

Facts: measurements associated with fact table records at fact table granularity. Normally numeric and additive. Non-key attributes in the fact table. Attributes in dimension tables are constants; facts vary with the granularity of the fact table.

Dimensions: a table (or hierarchy of tables) connected with the fact table with keys and foreign keys. Preferably single valued for each fact record (1:m). Connected with surrogate (generated) keys, not operational keys. Dimension tables contain text or numeric attributes.

ERD
PRODUCT: SKU (PK), description, brand, category
CUSTOMER: customer_ID (PK), customer_name, purchase_profile, credit_profile, address
ORDER: orderdate
ORDER-LINE: dollars_sold, units_sold, dollars_cost
STORE: store_ID (PK), store_name, address, district, floor_type
CLERK: clerk_id (PK), clerk_name, clerk_grade
PROMOTION: promotion_NUM (PK), promotion_name, price_type, ad_type

DIMENSIONAL MODEL
FACT: {time_key || store_key || clerk_key || product_key || customer_key || promotion_key} (PK), dollars_sold, units_sold, dollars_cost
PRODUCT: product_key (PK), SKU, description, brand, category
TIME: time_key (PK), SQL_date, day_of_week, month
STORE: store_key (PK), store_ID, store_name, address, district, floor_type
CUSTOMER: customer_key (PK), customer_name, purchase_profile, credit_profile, address
CLERK: clerk_key (PK), clerk_id, clerk_name, clerk_grade
PROMOTION: promotion_key (PK), promotion_name, price_type, ad_type

Good Attributes: verbose; descriptive; complete; quality assured; indexed (b-tree vs bitmap); equally available; documented.

Date Dimensions
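A date dimension is usually generated rather than extracted from a source system. The following is a minimal sketch, assuming one row per calendar day and a caller-supplied set of holidays. The column names are borrowed from the DateDim and Time tables on earlier slides; the function name build_date_dimension and the use of ISO week numbers are assumptions.

```python
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def build_date_dimension(start, end, holidays=frozenset()):
    """Return one dimension row per calendar day from start to end (inclusive)."""
    rows = []
    current = start
    date_id = 1  # surrogate key, generated sequentially
    while current <= end:
        rows.append({
            "DateID": date_id,
            "SQLDate": current.isoformat(),
            "Day": DAY_NAMES[current.weekday()],
            "Week": current.isocalendar()[1],        # ISO week number (assumption)
            "Month": current.month,
            "Quarter": (current.month - 1) // 3 + 1,
            "Year": current.year,
            "Holiday": current in holidays,
        })
        current += timedelta(days=1)
        date_id += 1
    return rows


# Example: January 2003 with New Year's Day flagged as a holiday.
january = build_date_dimension(date(2003, 1, 1), date(2003, 1, 31),
                               holidays={date(2003, 1, 1)})
```

Because every attribute is precomputed, reports can group by Day, Week, Quarter or Holiday without any date arithmetic in the query.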

Snowflaking & Hierarchies: efficiency vs space; understandability; M:N relationships.

Star Schema: factSales (ProductID, TimeID, CustomerID, SalesAmount); dimProduct (ProductID, ProductName, CategoryName, SubCategoryName); dimTime (…); dimCustomer (…).

Snowflake Schema: factSales (ProductID, TimeID, CustomerID, SalesAmount); dimProduct (ProductID, CategoryID, Description); dimSubCategory and dimCategory (columns shown: SubCategoryID, Description).
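The difference between the star and snowflake layouts shows up in the joins needed to answer the same question. The sketch below is an illustration under assumptions: the slide does not show how the snowflaked tables are linked, so the key columns chosen here (SubCategoryID on the product table, CategoryID on dimSubCategory) are hypothetical, as are the table suffixes Star and Snow and the sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: one denormalized product dimension.
CREATE TABLE dimProductStar (ProductID INTEGER PRIMARY KEY, ProductName TEXT,
                             CategoryName TEXT, SubCategoryName TEXT);
-- Snowflake: the product hierarchy is normalized (linking columns are assumed).
CREATE TABLE dimCategory    (CategoryID INTEGER PRIMARY KEY, Description TEXT);
CREATE TABLE dimSubCategory (SubCategoryID INTEGER PRIMARY KEY, Description TEXT,
                             CategoryID INTEGER REFERENCES dimCategory(CategoryID));
CREATE TABLE dimProductSnow (ProductID INTEGER PRIMARY KEY, Description TEXT,
                             SubCategoryID INTEGER REFERENCES dimSubCategory(SubCategoryID));
CREATE TABLE factSales (ProductID INTEGER, TimeID INTEGER, CustomerID INTEGER,
                        SalesAmount REAL);

INSERT INTO dimProductStar VALUES (1, 'Road bike', 'Bikes', 'Road'),
                                  (2, 'Helmet', 'Accessories', 'Safety');
INSERT INTO dimCategory VALUES (10, 'Bikes'), (20, 'Accessories');
INSERT INTO dimSubCategory VALUES (100, 'Road', 10), (200, 'Safety', 20);
INSERT INTO dimProductSnow VALUES (1, 'Road bike', 100), (2, 'Helmet', 200);
INSERT INTO factSales VALUES (1, 1, 1, 900.0), (1, 2, 1, 950.0), (2, 1, 2, 40.0);
""")

# Star: sales by category needs a single join.
star_rows = conn.execute("""
    SELECT p.CategoryName, SUM(f.SalesAmount)
    FROM factSales f
    JOIN dimProductStar p ON p.ProductID = f.ProductID
    GROUP BY p.CategoryName
    ORDER BY p.CategoryName
""").fetchall()

# Snowflake: the same question walks the hierarchy with three joins.
snow_rows = conn.execute("""
    SELECT c.Description, SUM(f.SalesAmount)
    FROM factSales f
    JOIN dimProductSnow p ON p.ProductID = f.ProductID
    JOIN dimSubCategory s ON s.SubCategoryID = p.SubCategoryID
    JOIN dimCategory c    ON c.CategoryID = s.CategoryID
    GROUP BY c.Description
    ORDER BY c.Description
""").fetchall()
```

Both queries return the same totals. The snowflake stores each category name once, at the cost of extra joins and a schema that is harder to read.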

Simple DW hierarchy pattern.