Dimensional Modeling Overview. Agenda ■ Dimensional Modeling Overview ■ Dimensional Modeling Steps ■ Dimensional Modeling Framework ■ Retail Sales - Case.

Slides:



Advertisements
Similar presentations
The Organisation As A System An information management framework The Performance Organiser Data Warehousing.
Advertisements

Dimensional Modeling.
CHAPTER OBJECTIVE: NORMALIZATION THE SNOWFLAKE SCHEMA.
Cognos 8 Training Session
BY LECTURER/ AISHA DAWOOD DW Lab # 2. LAB EXERCISE #1 Oracle Data Warehousing Goal: Develop an application to implement defining subject area, design.
Alternative Database topology: The star schema
Dimensional Modeling CS 543 – Data Warehousing. CS Data Warehousing (Sp ) - Asim LUMS2 From Requirements to Data Models.
1 9 Ch3, Hachim Haddouti Adv. DBS and Data Warehouse CSC5301 Ch3 Hachim Haddouti Hachim Haddouti.
Dimensional Modeling – Part 2
Multidimensional Modeling MIS 497. What is multidimensional model? Logical view of the enterprise Logical view of the enterprise Shows main entities of.
Data Warehousing - 3 ISYS 650. Snowflake Schema one or more dimension tables do not join directly to the fact table but must join through other dimension.
CSE6011 Warehouse Models & Operators  Data Models  relations  stars & snowflakes  cubes  Operators  slice & dice  roll-up, drill down  pivoting.
Data Warehousing (Kimball, Ch.2-4) Dr. Vairam Arunachalam School of Accountancy, MU.
Data Warehousing DSCI 4103 Dr. Mennecke Introduction and Chapter 1.
Principles of Dimensional Modeling
Business Intelligence
DWH – Dimesional Modeling PDT Genči. 2 Outline Requirement gathering Fact and Dimension table Star schema Inside dimension table Inside fact table STAR.
Sayed Ahmed Logical Design of a Data Warehouse.  Free Training and Educational Services  Training and Education in Bangla: Training and Education in.
Bogdan Shishedjiev Data Analysis1 Data Analysis OLTP and OLAP Data Warehouse SQL for Data Analysis Data Mining.
Chapter 6: Foundations of Business Intelligence - Databases and Information Management Dr. Andrew P. Ciganek, Ph.D.
Dimensional model. What do we know so far about … FACTS? “What is the process measuring?” Fact types:  Numeric Additive Semi-additive Non-additive (avg,
Program Pelatihan Tenaga Infromasi dan Informatika Sistem Informasi Kesehatan Ari Cahyono.
Data Warehouse and Business Intelligence Dr. Minder Chen Fall 2009.
DIMENSIONAL MODELLING. Overview Clearly understand how the requirements definition determines data design Introduce dimensional modeling and contrast.
Chapter 1 Adamson & Venerable Spring Dimensional Modeling Dimensional Model Basics Fact & Dimension Tables Star Schema Granularity Facts and Measures.
INVENTORY CASE STUDY. Introduction Optimized inventory levels in stores can have a major impact on chain profitability: minimize out-of-stocks reduce.
DEFINING the BUSINESS REQUIREMENTS. Introduction OLTP and DW planning is different in term of requirements clarity Planning DW is about solving users’
Decision Support and Date Warehouse Jingyi Lu. Outline Decision Support System OLAP vs. OLTP What is Date Warehouse? Dimensional Modeling Extract, Transform,
6.1 © 2010 by Prentice Hall 6 Chapter Foundations of Business Intelligence: Databases and Information Management.
Normalized model vs dimensional model
DIMENSIONAL MODELING MIS2502 Data Analytics. So we know… Relational databases are good for storing transactional data But bad for analytical data What.
Operation Data Analysis Hints and Guidelines EGN 5621 Enterprise Systems Collaboration Summer B, 2014.
Basic Model: Retail Grocery Store
Designing a Data Warehousing System. Overview Business Analysis Process Data Warehousing System Modeling a Data Warehouse Choosing the Grain Establishing.
1 Data Warehousing Lecture-15 Issues of Dimensional Modeling Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics.
Operation Data Analysis Hints and Guidelines EIN 6133 Enterprise Engineering Summer, 2015.
UNIT-II Principles of dimensional modeling
CMPE 226 Database Systems October 21 Class Meeting Department of Computer Engineering San Jose State University Fall 2015 Instructor: Ron Mak
Creating the Dimensional Model
Data Warehousing (Kimball, Ch.5-12) Dr. Vairam Arunachalam School of Accountancy, MU.
OLAP On Line Analytic Processing. OLTP On Line Transaction Processing –support for ‘real-time’ processing of orders, bookings, sales –typically access.
June 08, 2011 How to design a DATA WAREHOUSE Linh Nguyen (Elly)
1 Copyright © 2009, Oracle. All rights reserved. Oracle Business Intelligence Enterprise Edition: Overview.
Data Warehousing DSCI 4103 Dr. Mennecke Chapter 2.
Last Updated : 26th may 2003 Center of Excellence Data Warehousing Introductionto Data Modeling.
The Need for Data Analysis 2 Managers track daily transactions to evaluate how the business is performing Strategies should be developed to meet organizational.
COMP 430 Intro. to Database Systems Denormalization & Dimensional Modeling.
3 Copyright © 2006, Oracle. All rights reserved. Business, Logical, and Dimensional Modeling.
DATA WAREHOUSING – DIMENSIONAL MODELLING AND SCHEMAS With MIKE –AARONE ATUHE Handout 5.
Building the Corporate Data Warehouse Pindaro Demertzoglou Data Resource Management.
Operation Data Analysis Hints and Guidelines EIN 6133 Enterprise Engineering Fall, 2015.
CMPE 226 Database Systems April 12 Class Meeting Department of Computer Engineering San Jose State University Spring 2016 Instructor: Ron Mak
Operation Data Analysis Hints and Guidelines
Data warehouse and OLAP
PRINCIPLES OF DIMENSIONAL MODELING
Star Schema.
Applying Data Warehouse Techniques
Overview and Fundamentals
Competing on Analytics II
Inventory is used to illustrate:
Retail Sales is used to illustrate a first dimensional model
CMPE 226 Database Systems April 11 Class Meeting
MIS2502: Data Analytics Dimensional Data Modeling
Retail Sales is used to illustrate a first dimensional model
Dimensional Modeling.
MIS2502: Data Analytics Dimensional Data Modeling
Retail Sales is used to illustrate a first dimensional model
Dimensional Model January 16, 2003
DWH – Dimesional Modeling
Applying Data Warehouse Techniques
Presentation transcript:

Dimensional Modeling Overview

Agenda ■ Dimensional Modeling Overview ■ Dimensional Modeling Steps ■ Dimensional Modeling Framework ■ Retail Sales - Case Study ■ Dimensional Modeling Tips

Dimensional Modeling Overview

From Requirement to Data Design ■ The requirements definition completely drives the Data design for the DW. ■ Data design consists of putting together the data structures. ■ A group of data elements form a data structure. ■ Logical data design includes determination of the various data elements that are needed and combination of the data elements into structures of data and also establishing relationships among the data structures. Requirements Gathering Rqts. Definition Document Dimensional Model Data Design Information Package

Design Decisions ■ Choosing the process. Selecting the subjects from the information packages for the first set of logical structures to be designed. ■ Choosing the grain. Determining the level of detail for the data structures. ■ Identifying and conforming the dimensions. Choosing the business dimensions (such as product, market, time, etc.) to be included in the first set of structures. ■ Choosing the facts. Selecting the metrics or units of measurements (such as product sale units, dollar sales, dollar revenue, etc.) included in set of structures. ■ Choosing the duration of the database. Determining how far back in time you should go for historical data.

Dimensional Modeling Overview ■ What is Dimensional Modeling ♦ It is a logical design technique to structure the business dimensions and the metrics that are analyzed along these dimensions. ♦ A logical design technique that seeks to present the data in a standard framework that is intuitive and allows for high performance access ♦ It is inherently dimensional and adheres to a discipline that uses relational model with some important restrictions ♦ The fundamental idea of dimensional modeling is that nearly every type of business data can be represented as a kind of cube of data ♦ The model has also proved to provide high performance for queries and analysis

Dimensional Modeling Overview ■ Components of Dimensional Model ♦ Fact Table ► The fact table contains facts or measurements of the business ♦ Dimension Table ► The dimension tables contain textual attributes that describe the facts

“Dimension” Report,row and column heading “Facts” Numeric report values Dimensional Modeling Overview Sample Report Translation Sales Rep Performance Report Central Region Jul-00Aug-00 (Dollars) Chicago District Adams Brown Frederickson Minneapolis District Andersen Smith Central Region Total

Dimensional Modeling Overview ■ Dimension Tables ♦ Dimension Tables are the entry points into the data warehouse ♦ Dimension tables are designed especially for selection and grouping under a common head ♦ Determine contextual background for facts ♦ Parameters for OLAP ♦ Common Dimensions ► Date ► Product ► Location/Region ► Customers The dimensional model to represent the information contained in the information package, the data structure must be represent with Metrics, business dimensions and attributes for each business dimension.

Dimensional Modeling Overview ■ Dimension Tables ♦ Dimension Table Characteristics ► Serve as report labels and query constraints “By” words “Where” clauses ► Provide Descriptive Information Minimal codes Embedded meaning as attributes ► Represent hierarchical relationships Let see Product business dimension example, When we want to analyze the fact by products. ITEM KEY Item Description Item Size Package Type Category

Dimensional Modeling Overview Product Key Product Desc. Brand Desc. Class Desc. Size Product Key Product Desc Brand Desc Class Size CHEERIOS 10 OZ CHEERIOS 24 OZ LUCKY CHARMS 10 OZ CHEERIOS LUCKY CHARMS Family Kids 10 OZ 24 OZ 10 OZ Product Dimension Table with Sample Rows

Big Dimensions Retail_Fact time_Key Store_Key Customer_Key Product_key Sales_dollars Units_sold Customer Dimension Customer_key Customer_ID (natural key) Customer_name Customer_address Date_of_birth Age Gender Annual_Income Number_of_children Marital_status Other_attributes...

Big Dimensions Customer Demographics Dimension Customer_demographics_key Age_Band Gender Income_Band Number_of_children Marital_status Customer Dimension Customer_key Customer_ID (natural key) Customer_name Customer_address Date_of_birth Age Gender Annual_Income Number_of_children Marital_status Other_attributes... Customer Dimension Customer_key Customer_ID Customer_name Customer_address Date_of_birth

Dirty dimensions ■ A dirty dimension is the one in which data quality cannot be guaranteed ♦ Data about the same customer can appear multiple times Customer Dimension Customer_key Customer_ID (natural key) Customer_name Customer_address Date_of_birth Age Gender Annual_Income Number_of_children Marital_status Other_attributes...

Hierarchies in Dimensions ■ Multiple Hierarchies ♦ Dimension tables can represent multiple hierarchies roll-ups ♦ For example,Store Dimension could have ♦ the following hierarchies ► Physical Geography Zip, City, County, State, Country ► Sales Organization District, Region, Zone ► Distribution Roll-up Distribution Center, Distribution Center Region Store Dimension Store_key Store_description Store_type Zip City State Sales_region Sales_zone Distribution Center Distribution Center Region

Hierarchies in Dimensions Dimension Tables can represent multiple hierarchical roll-ups Date Month Quarter Year Week City District State Region Sales Zone Product Brand Category Region

Dimensional Modeling Overview ■ Fact Table ♦ The fact table is where the numerical measurements of the business are stored ♦ Facts ► The detail information in a Fact tables For Examples: Sales Quantity, Unit Sales Price, Sales Amount etc. ► Key performance indicators of the business ► Numeric in Nature ► Analyzed across the dimensions ♦ Multi-part key ► Foreign keys to dimension tables ► Date is always a key Sales Fact DATE KEY ITEM KEY STORE KEY PROMOTION KEY POS TRXN# Sales Quantity Unit Sales Price Sales $ Amount

Dimensional Modeling Overview ■ So far we have formed fact table and dimension tables. ■ How should these tables be arranged in the dimensional model? ■ What are the relationships and how should we mark the relationships in the model? ■ The dimensional model should primarily facilitate queries and analyses. What would be the types of queries and analyses? ■ Before combining these tables in dimensional model. What are the requirements?  The model should provide the best data access.  The whole model must be query-centric.  It must be optimized for queries and analysis.  The model must show that the dimension tables interact with the fact table.  It should structured in such a way that every dimension can interact equally with the fact table.  The model should allow drilling down or rolling up along dimension hierarchies.

 With this rqts., each of the dimension tables are directly relates to fact table in the middle. ■ Such an arrangement in the dimensional model looks like a star formation, with the fact table at the core of a star and the dimension tables along the spikes of the star. ■ The dimensional model is therefore called a STAR schema. ■ See figure:

Star like Model Dimension 1 Fact Dimension 3 Dimension n Dimension 2

STAR schema for AUTO Sales AUTO SALES TIME PRODUCT DEALER CUSTOMER DEMO GRAPHICS PAYMENT METHOD

Typical Star model Date Dimension Date_key Day_of_week Month Quarter Year Holiday_flag Sales Fact Date_key Product_key Store_key Dollars_sold Units_sold Dollars_cost Product Dimension Product_key Description Brand Category Store Dimension Store_key Store_name Address Floor_plan_type

Star model….. ■ It consists of sales fact table in the middle of schema diagram. It have 3 dimension tables of Date, Product and Store. ■ The user will analyze the sales using dollar sold, unit sold and dollar cost. ■ From the STAR schema structure intuitively answers the questions for a given amount of dollars, what was the product sold? Who was the customer? Which store sold the product? When was the order placed? ■ Constraints and filters of queries are easily understood by looking at the star. ■ A common type of analysis is the drilling down the summary numbers to get at the details at the lower levels by filtering queries.

Snowflake Design Fact Table Dim Table Dim Table Dim Table Dim Table Low cardinality redundant attributes moved to “sub dimension” tables

Snowflake Design ■ Issues ♦ Only few tools optimized for snowflake schema ♦ When the tool is not optimized for snowflake design ► Presentation more complex ► Browsing is slower ► Problems with multiple joins

Snowflake Design Dimensional Modeling Steps

■ Identify the Business Process ♦ A major operational process that is supported by some kind of legacy system(s) from which data can be collected for the purpose of the data warehouse ♦ Example: orders, invoices, shipments, inventory, sales ■ Identify the Grain ♦ The fundamental lowest level of data represented in a fact table for the business process ♦ Example: individual transactions, individual daily snapshots ■ Identify the Dimensions ♦ Choose the dimensions that will apply to each fact table record ■ Identify the Facts ♦ Choose the measured facts that will populate each fact table record

Dimensional Modeling Steps Resist the temptation to model data by looking copy books alone 1.Identify the business Process 2.Identify the Grain 3.Identify the Dimensions 4.Identify the Facts Business Requirements Data Reality Dimensional Model Key InputsDimensional Modeling StepsOutput

Shared Dimensions Must Conform ■ Option 1: Identical dimensions with the same keys, labels, definitions and values Sales Schema Inventory Schema Conformed Dimension Item Key Item Desc. Brand Desc. Category.. DATE KEY ITEM KEY STORE KEY PROMO KEY Sales Fact Item Key Item Desc. Brand Desc. Category.. DATE KEY ITEM KEY STORE KEY Inventory Fact

■ Option 2: “Subset” of base dimension Sales Schema Forecast Schema Item key Item Desc Brand Desc Category Desc 0001 Cheerios 10oz Cheerios Cereal Brand key Brand Desc Category Desc 1001 Cheerios Cereal Conformed Dimension Item Key Item Desc. Brand Desc. Category Desc... DATE KEY ITEM KEY STORE KEY PROMO KEY Sales $ DATE KEY Day-of-week Week Desc Month Desc Brand Key Brand Desc. Category Desc... Month Key Brand Key Estimate Sales $ Month KEY Month Desc

Slowly Changing Dimensions ■ Dimension attributes evolve over time ♦ For example, customers change their names, move, have children, adjust their Incomes ■ For every dimension attribute, need to identify “Changes” strategy ♦ May use combination of strategies within same dimension table

Type 1 : Overwrite the changed attributes Original record Item KeyItem DescDept 12345Sim City 3000Educational S/W Updated record Item KeyItem DescDept 12345Sim City 3000Strategy S/W Slowly Changing Dimensions

Type 2 : Add a New Dimension Record Original record Item KeyItem DescDept 12345Sim City 3000Educational S/W Additional record Item KeyItem DescDept 12345Sim City 3000Strategy S/W Slowly Changing Dimensions

Type 3 : Add a “Prior” Attribute Original record Item KeyItem DescDept 12345Sim City 3000Educational S/W Updated record Item KeyItem DescDept Prior Dept 12345Sim City 3000Strategy S/WEducational S/W Slowly Changing Dimensions

■ Data Warehouse Keys ie., STAR schema keys ♦ All tables (facts and dimensions) should use Data Warehouse generated surrogate keys ♦ It is possible that the customer no., of discontinued customers are reassigned to new customers. We will have a problem because the same customer no,. Could relate to the data for the newer customer and also to the data of the retried customer. Therefore do not use such keys as a primary keys for dimensional tables. ♦ The surrogate keys are simply system generated sequence numbers. ♦ Each row in a dimension table is identified by a unique value of an attribute designated as the primary key of the dimension. ♦ Each dimension table is in 1:M relationship with central fact table. So the primary key of each dimension table must be a foreign key in the fact table.

Additive Measures ■ The ability of measures to be added across all dimensions of the fact table. ■ Measures could be fully additive, semi additive or non additive ♦ Fully Additive - The values of the attributes summed up by simple addition, Aggregation is a fully additive measures is done by simple addition. Sales Quantity, Revenue ♦ Semi Additive - Account Balance, Inventory, number of customers (Measure of Intensity, head counts) ♦ Non-Additive - Profit margin (Ratios and Percentages) i.e., Ratio or Percentages should not be added 1:3, 30%, etc.

Factless Fact tables to track attendance of students Student Attendance Fact time_Key Student_Key Course_Key Faculty_Key Facility_key Student Dimension Student_key Student attributes….. Faculty Dimension Faculty_key Faculty attributes….. Time Dimension time_key time attributes….. Course Dimension Course_key Course attributes….. Facility Dimension Facility_key Facility attributes…..

Factless Fact tables – Coverage tables Date Dimension Date_key Day_of_week Month Quarter Year Holiday_flag Sales Fact Date_key Product_key Promotion_key Store_key Dollars_sold Units_sold Dollars_cost Product Dimension Product_key Description Brand Category Store Dimension Store_key Store_name Address Floor_plan_type Promotion Dimension Promotion_key Promotion_name Discount Event Sales Fact (revisited)

Factless Fact tables – Coverage tables Date Dimension Date_key Day_of_week Month Quarter Year Holiday_flag Sales Fact Date_key Product_key Promotion_key Store_key Product Dimension Product_key Description Brand Category Store Dimension Store_key Store_name Address Floor_plan_type Promotion Dimension Promotion_key Promotion_name Discount Event

Dimensional Modeling Framework

Identify Subject Area, Grain Detail Facts with Measures Source-Data Model Mapping Pre-calculations, Aggregates, Indexes, Data Structures, Source-Physical Model Mapping Conceptual Level Logical Level Level of detail Physical Level Identify Major Dimension & Facts, Conform Dimensions across Facts Detail Dimensions with Hierarchies &Attributes Slowly changing Dimensions Policies Dimensional Modeling Framework

■ STEP1: Choose the process ♦ Chose a process or subject area from the list of subject areas identified ♦ Examples of this could be Sales Analysis, Strategic Sourcing, Human Resources ■ STEP2 : Choose the Grain ♦ Choose the level of detail ♦ Every data mart / warehouse should be based on the most granular (atomic) data that can possibly be collected and stored.

Dimensional Modeling Framework ■ STEP3 : Identifying Dimensions & Dimension Hierarchy Date Month Quarter Year Week City District State Region Product Brand Category

Dimensional Modeling Framework ■ STEP4 : Choose the Measures ♦ Identify all the measures for the fact table ■ STEP5 : Conforming the dimensions ♦ Common dimensions across the Facts/ data marts have to be exactly same or subset of the main dimension table

Dimensional Modeling Framework ■ STEP6 : Adding Attributes to Dimension Tables ♦ Enhance the depth of analysis ♦ Examples Customer Age, Address, Profession, Product color, flavor, product size, packaging type etc. Date Month Quarter Year Week City District State Region Product Brand Category CurrentFlag Sequence Regional Manager Phone Address Size Color

Dimensional Modeling Framework ■ STEP7 : Storing Pre-calculations in the Fact table ♦ Calculated based on one or more base measures ■ STEP8 : Choosing the Duration of the Database ♦ Need for analyzing the data over a period of time

Dimensional Modeling Framework ■ STEP9 : Track Slowly Changing Dimensions ♦ Certain kinds of dimension attribute changes need to be handled differently in Data Warehouse ► Type I – Overwrite ► Type II - History ► Type III – Add new column example :- Organizational changes

Retail Sales – Case Study

Retail Sales - Case Study ■ Background ♦ Chain consists of over 100 grocery stores in five states ♦ Stores average 60,000 SKUs in departments such as frozen foods, dairy etc. ♦ Bar codes are scanned directly into the cash registers PoS system ♦ Items are promoted via coupons, temporary price reductions, ads and in-store promotions ■ Analytical Requirements ♦ Need to know what is selling in store each day in order to evaluate product movement, as well as to see how sales are impacted by promotions ♦ Need to understand the mix of items in a consumer market basket

Retail Sales - Case Study ■ Identify the Business Process ■ Identify the Grain ■ Identify the Dimensions ■ Identify the Measures ■ Sales ■ Transaction Item (Daily Sales) ■ Date, Location, Item, Promotion ■ Quantity, Price, Amount

Retail Sales - Case Study Date Description Week Month Quarter Year DATE KEY ITEM KEY STORE KEY PROMOTION KEY POS TRXN# Sales Quantity Unit Sales Price Sales $ Amount Store Name City District Zone Item Description Item Size Package Type Category Promotion Description Discount DATE KEY STORE KEY PROMOTION KEY ITEM KEY Resultant Sales Schema

Retail Sales - Case Study Time Dimension Date KeyDateDay of WeekDay Number in Month MonthQuarterYearHoliday Indicator 11/1/1999Friday1JanuaryQ11999Holiday 21/2/1999Saturday2JanuaryQ11999Non-Holiday 31/3/1999Sunday3JanuaryQ11999Non-Holiday 41/4/1999Monday4JanuaryQ11999Non-Holiday Item Dimension ItemkeyItem Description SKU NumberDeptSizePackage TypeBrandCategory 1Lasagna 6 OZ Grocery6 OZBoxCold GourmetFrozen Foods 2Beef Stew 6 OZ Grocery6 OZBoxCold GourmetFrozen Foods 3Extra Nougat 2 OZ Grocery6 OZCanChewyCandy

Retail Sales - Case Study Promotion Dimension Promo KeyPromo NamePrice ReductionAd TypeMedia TypePromo $Begin DateEnd Date 1Blue Ribbon Discounts TemporaryDaily PaperPaper20001/1/19991/15/1999 2Red Carpet Closeout MarkdownSunday PaperPaper10001/3/19991/10/1999 3Ad BlitzNonePaper and Radio 70001/15/19991/30/1999 Sales Fact Date KeyItem KeyStore KeyPromo KeyPOS Trxn #Sales QtyUnit Sales Price Sales $Amt

Dimensional Modeling Tips

■ Carefully choose the labels to identify data marts, dimension, attributes and facts ■ An attribute can live in one and only one dimension, whereas a fact can be repeated in multiple fact tables ■ If a single dimension appears to reside in more than one places, several roles are probably being played. Name the roles uniquely and treat them as separate dimensions ■ A single field in the underlying source data can have one or more logical columns associated with it ♦ E.g., A product attribute field may translate to product code, product short description, and product long description ■ Every fact should have a default aggregation rule (sum, min, max, latest, semi-additive, special algorithm, and not aggregatable) ♦ This will serve as a requirements list for query and report writers tools evaluations

Thank You ■ References ♦ Ralph Kimball ► The Data Warehouse Toolkit ► The Data Warehouse Lifecycle Toolkit