Download presentation
Presentation is loading. Please wait.
Published byDebra May Modified over 9 years ago
1
Dimensional Modeling Overview
2
Agenda ■ Dimensional Modeling Overview ■ Dimensional Modeling Steps ■ Dimensional Modeling Framework ■ Retail Sales - Case Study ■ Dimensional Modeling Tips
3
Dimensional Modeling Overview
4
From Requirement to Data Design ■ The requirements definition completely drives the Data design for the DW. ■ Data design consists of putting together the data structures. ■ A group of data elements form a data structure. ■ Logical data design includes determination of the various data elements that are needed and combination of the data elements into structures of data and also establishing relationships among the data structures. Requirements Gathering Rqts. Definition Document Dimensional Model Data Design Information Package
5
Design Decisions ■ Choosing the process. Selecting the subjects from the information packages for the first set of logical structures to be designed. ■ Choosing the grain. Determining the level of detail for the data structures. ■ Identifying and conforming the dimensions. Choosing the business dimensions (such as product, market, time, etc.) to be included in the first set of structures. ■ Choosing the facts. Selecting the metrics or units of measurements (such as product sale units, dollar sales, dollar revenue, etc.) included in set of structures. ■ Choosing the duration of the database. Determining how far back in time you should go for historical data.
6
Dimensional Modeling Overview ■ What is Dimensional Modeling ♦ It is a logical design technique to structure the business dimensions and the metrics that are analyzed along these dimensions. ♦ A logical design technique that seeks to present the data in a standard framework that is intuitive and allows for high performance access ♦ It is inherently dimensional and adheres to a discipline that uses relational model with some important restrictions ♦ The fundamental idea of dimensional modeling is that nearly every type of business data can be represented as a kind of cube of data ♦ The model has also proved to provide high performance for queries and analysis
7
Dimensional Modeling Overview ■ Components of Dimensional Model ♦ Fact Table ► The fact table contains facts or measurements of the business ♦ Dimension Table ► The dimension tables contain textual attributes that describe the facts
8
“Dimension” Report,row and column heading “Facts” Numeric report values Dimensional Modeling Overview Sample Report Translation Sales Rep Performance Report Central Region Jul-00Aug-00 (Dollars) Chicago District879878 Adams345456 Brown564565 Frederickson657768 Minneapolis District890789 Andersen Smith909978 Central Region Total42444434
9
Dimensional Modeling Overview ■ Dimension Tables ♦ Dimension Tables are the entry points into the data warehouse ♦ Dimension tables are designed especially for selection and grouping under a common head ♦ Determine contextual background for facts ♦ Parameters for OLAP ♦ Common Dimensions ► Date ► Product ► Location/Region ► Customers The dimensional model to represent the information contained in the information package, the data structure must be represent with Metrics, business dimensions and attributes for each business dimension.
10
Dimensional Modeling Overview ■ Dimension Tables ♦ Dimension Table Characteristics ► Serve as report labels and query constraints “By” words “Where” clauses ► Provide Descriptive Information Minimal codes Embedded meaning as attributes ► Represent hierarchical relationships Let see Product business dimension example, When we want to analyze the fact by products. ITEM KEY Item Description Item Size Package Type Category
11
Dimensional Modeling Overview Product Key Product Desc. Brand Desc. Class Desc. Size Product Key Product Desc Brand Desc Class Size 0001 0002 0003 CHEERIOS 10 OZ CHEERIOS 24 OZ LUCKY CHARMS 10 OZ CHEERIOS LUCKY CHARMS Family Kids 10 OZ 24 OZ 10 OZ Product Dimension Table with Sample Rows
12
Big Dimensions Retail_Fact time_Key Store_Key Customer_Key Product_key Sales_dollars Units_sold Customer Dimension Customer_key Customer_ID (natural key) Customer_name Customer_address Date_of_birth Age Gender Annual_Income Number_of_children Marital_status Other_attributes...
13
Big Dimensions Customer Demographics Dimension Customer_demographics_key Age_Band Gender Income_Band Number_of_children Marital_status Customer Dimension Customer_key Customer_ID (natural key) Customer_name Customer_address Date_of_birth Age Gender Annual_Income Number_of_children Marital_status Other_attributes... Customer Dimension Customer_key Customer_ID Customer_name Customer_address Date_of_birth
14
Dirty dimensions ■ A dirty dimension is the one in which data quality cannot be guaranteed ♦ Data about the same customer can appear multiple times Customer Dimension Customer_key Customer_ID (natural key) Customer_name Customer_address Date_of_birth Age Gender Annual_Income Number_of_children Marital_status Other_attributes...
15
Hierarchies in Dimensions ■ Multiple Hierarchies ♦ Dimension tables can represent multiple hierarchies roll-ups ♦ For example,Store Dimension could have ♦ the following hierarchies ► Physical Geography Zip, City, County, State, Country ► Sales Organization District, Region, Zone ► Distribution Roll-up Distribution Center, Distribution Center Region Store Dimension Store_key Store_description Store_type Zip City State Sales_region Sales_zone Distribution Center Distribution Center Region
16
Hierarchies in Dimensions Dimension Tables can represent multiple hierarchical roll-ups Date Month Quarter Year Week City District State Region Sales Zone Product Brand Category Region
17
Dimensional Modeling Overview ■ Fact Table ♦ The fact table is where the numerical measurements of the business are stored ♦ Facts ► The detail information in a Fact tables For Examples: Sales Quantity, Unit Sales Price, Sales Amount etc. ► Key performance indicators of the business ► Numeric in Nature ► Analyzed across the dimensions ♦ Multi-part key ► Foreign keys to dimension tables ► Date is always a key Sales Fact DATE KEY ITEM KEY STORE KEY PROMOTION KEY POS TRXN# Sales Quantity Unit Sales Price Sales $ Amount
18
Dimensional Modeling Overview ■ So far we have formed fact table and dimension tables. ■ How should these tables be arranged in the dimensional model? ■ What are the relationships and how should we mark the relationships in the model? ■ The dimensional model should primarily facilitate queries and analyses. What would be the types of queries and analyses? ■ Before combining these tables in dimensional model. What are the requirements? The model should provide the best data access. The whole model must be query-centric. It must be optimized for queries and analysis. The model must show that the dimension tables interact with the fact table. It should structured in such a way that every dimension can interact equally with the fact table. The model should allow drilling down or rolling up along dimension hierarchies.
19
With this rqts., each of the dimension tables are directly relates to fact table in the middle. ■ Such an arrangement in the dimensional model looks like a star formation, with the fact table at the core of a star and the dimension tables along the spikes of the star. ■ The dimensional model is therefore called a STAR schema. ■ See figure:
20
Star like Model Dimension 1 Fact Dimension 3 Dimension n Dimension 2
21
STAR schema for AUTO Sales AUTO SALES TIME PRODUCT DEALER CUSTOMER DEMO GRAPHICS PAYMENT METHOD
22
Typical Star model Date Dimension Date_key Day_of_week Month Quarter Year Holiday_flag Sales Fact Date_key Product_key Store_key Dollars_sold Units_sold Dollars_cost Product Dimension Product_key Description Brand Category Store Dimension Store_key Store_name Address Floor_plan_type
23
Star model….. ■ It consists of sales fact table in the middle of schema diagram. It have 3 dimension tables of Date, Product and Store. ■ The user will analyze the sales using dollar sold, unit sold and dollar cost. ■ From the STAR schema structure intuitively answers the questions for a given amount of dollars, what was the product sold? Who was the customer? Which store sold the product? When was the order placed? ■ Constraints and filters of queries are easily understood by looking at the star. ■ A common type of analysis is the drilling down the summary numbers to get at the details at the lower levels by filtering queries.
24
Snowflake Design Fact Table Dim Table Dim Table Dim Table Dim Table Low cardinality redundant attributes moved to “sub dimension” tables
25
Snowflake Design ■ Issues ♦ Only few tools optimized for snowflake schema ♦ When the tool is not optimized for snowflake design ► Presentation more complex ► Browsing is slower ► Problems with multiple joins
26
Snowflake Design Dimensional Modeling Steps
27
■ Identify the Business Process ♦ A major operational process that is supported by some kind of legacy system(s) from which data can be collected for the purpose of the data warehouse ♦ Example: orders, invoices, shipments, inventory, sales ■ Identify the Grain ♦ The fundamental lowest level of data represented in a fact table for the business process ♦ Example: individual transactions, individual daily snapshots ■ Identify the Dimensions ♦ Choose the dimensions that will apply to each fact table record ■ Identify the Facts ♦ Choose the measured facts that will populate each fact table record
28
Dimensional Modeling Steps Resist the temptation to model data by looking copy books alone 1.Identify the business Process 2.Identify the Grain 3.Identify the Dimensions 4.Identify the Facts Business Requirements Data Reality Dimensional Model Key InputsDimensional Modeling StepsOutput
29
Shared Dimensions Must Conform ■ Option 1: Identical dimensions with the same keys, labels, definitions and values Sales Schema Inventory Schema Conformed Dimension Item Key Item Desc. Brand Desc. Category.. DATE KEY ITEM KEY STORE KEY PROMO KEY Sales Fact Item Key Item Desc. Brand Desc. Category.. DATE KEY ITEM KEY STORE KEY Inventory Fact
30
■ Option 2: “Subset” of base dimension Sales Schema Forecast Schema Item key Item Desc Brand Desc Category Desc 0001 Cheerios 10oz Cheerios Cereal Brand key Brand Desc Category Desc 1001 Cheerios Cereal Conformed Dimension Item Key Item Desc. Brand Desc. Category Desc... DATE KEY ITEM KEY STORE KEY PROMO KEY Sales $ DATE KEY Day-of-week Week Desc Month Desc Brand Key Brand Desc. Category Desc... Month Key Brand Key Estimate Sales $ Month KEY Month Desc
31
Slowly Changing Dimensions ■ Dimension attributes evolve over time ♦ For example, customers change their names, move, have children, adjust their Incomes ■ For every dimension attribute, need to identify “Changes” strategy ♦ May use combination of strategies within same dimension table
32
Type 1 : Overwrite the changed attributes Original record Item KeyItem DescDept 12345Sim City 3000Educational S/W Updated record Item KeyItem DescDept 12345Sim City 3000Strategy S/W Slowly Changing Dimensions
33
Type 2 : Add a New Dimension Record Original record Item KeyItem DescDept 12345Sim City 3000Educational S/W Additional record Item KeyItem DescDept 12345Sim City 3000Strategy S/W Slowly Changing Dimensions
34
Type 3 : Add a “Prior” Attribute Original record Item KeyItem DescDept 12345Sim City 3000Educational S/W Updated record Item KeyItem DescDept Prior Dept 12345Sim City 3000Strategy S/WEducational S/W Slowly Changing Dimensions
35
■ Data Warehouse Keys ie., STAR schema keys ♦ All tables (facts and dimensions) should use Data Warehouse generated surrogate keys ♦ It is possible that the customer no., of discontinued customers are reassigned to new customers. We will have a problem because the same customer no,. Could relate to the data for the newer customer and also to the data of the retried customer. Therefore do not use such keys as a primary keys for dimensional tables. ♦ The surrogate keys are simply system generated sequence numbers. ♦ Each row in a dimension table is identified by a unique value of an attribute designated as the primary key of the dimension. ♦ Each dimension table is in 1:M relationship with central fact table. So the primary key of each dimension table must be a foreign key in the fact table.
36
Additive Measures ■ The ability of measures to be added across all dimensions of the fact table. ■ Measures could be fully additive, semi additive or non additive ♦ Fully Additive - The values of the attributes summed up by simple addition, Aggregation is a fully additive measures is done by simple addition. Sales Quantity, Revenue ♦ Semi Additive - Account Balance, Inventory, number of customers (Measure of Intensity, head counts) ♦ Non-Additive - Profit margin (Ratios and Percentages) i.e., Ratio or Percentages should not be added 1:3, 30%, etc.
37
Factless Fact tables to track attendance of students Student Attendance Fact time_Key Student_Key Course_Key Faculty_Key Facility_key Student Dimension Student_key Student attributes….. Faculty Dimension Faculty_key Faculty attributes….. Time Dimension time_key time attributes….. Course Dimension Course_key Course attributes….. Facility Dimension Facility_key Facility attributes…..
38
Factless Fact tables – Coverage tables Date Dimension Date_key Day_of_week Month Quarter Year Holiday_flag Sales Fact Date_key Product_key Promotion_key Store_key Dollars_sold Units_sold Dollars_cost Product Dimension Product_key Description Brand Category Store Dimension Store_key Store_name Address Floor_plan_type Promotion Dimension Promotion_key Promotion_name Discount Event Sales Fact (revisited)
39
Factless Fact tables – Coverage tables Date Dimension Date_key Day_of_week Month Quarter Year Holiday_flag Sales Fact Date_key Product_key Promotion_key Store_key Product Dimension Product_key Description Brand Category Store Dimension Store_key Store_name Address Floor_plan_type Promotion Dimension Promotion_key Promotion_name Discount Event
40
Dimensional Modeling Framework
41
Identify Subject Area, Grain Detail Facts with Measures Source-Data Model Mapping Pre-calculations, Aggregates, Indexes, Data Structures, Source-Physical Model Mapping Conceptual Level Logical Level Level of detail Physical Level Identify Major Dimension & Facts, Conform Dimensions across Facts Detail Dimensions with Hierarchies &Attributes Slowly changing Dimensions Policies Dimensional Modeling Framework
42
■ STEP1: Choose the process ♦ Chose a process or subject area from the list of subject areas identified ♦ Examples of this could be Sales Analysis, Strategic Sourcing, Human Resources ■ STEP2 : Choose the Grain ♦ Choose the level of detail ♦ Every data mart / warehouse should be based on the most granular (atomic) data that can possibly be collected and stored.
43
Dimensional Modeling Framework ■ STEP3 : Identifying Dimensions & Dimension Hierarchy Date Month Quarter Year Week City District State Region Product Brand Category
44
Dimensional Modeling Framework ■ STEP4 : Choose the Measures ♦ Identify all the measures for the fact table ■ STEP5 : Conforming the dimensions ♦ Common dimensions across the Facts/ data marts have to be exactly same or subset of the main dimension table
45
Dimensional Modeling Framework ■ STEP6 : Adding Attributes to Dimension Tables ♦ Enhance the depth of analysis ♦ Examples Customer Age, Address, Profession, Product color, flavor, product size, packaging type etc. Date Month Quarter Year Week City District State Region Product Brand Category CurrentFlag Sequence Regional Manager Phone Address Size Color
46
Dimensional Modeling Framework ■ STEP7 : Storing Pre-calculations in the Fact table ♦ Calculated based on one or more base measures ■ STEP8 : Choosing the Duration of the Database ♦ Need for analyzing the data over a period of time
47
Dimensional Modeling Framework ■ STEP9 : Track Slowly Changing Dimensions ♦ Certain kinds of dimension attribute changes need to be handled differently in Data Warehouse ► Type I – Overwrite ► Type II - History ► Type III – Add new column example :- Organizational changes
48
Retail Sales – Case Study
49
Retail Sales - Case Study ■ Background ♦ Chain consists of over 100 grocery stores in five states ♦ Stores average 60,000 SKUs in departments such as frozen foods, dairy etc. ♦ Bar codes are scanned directly into the cash registers PoS system ♦ Items are promoted via coupons, temporary price reductions, ads and in-store promotions ■ Analytical Requirements ♦ Need to know what is selling in store each day in order to evaluate product movement, as well as to see how sales are impacted by promotions ♦ Need to understand the mix of items in a consumer market basket
50
Retail Sales - Case Study ■ Identify the Business Process ■ Identify the Grain ■ Identify the Dimensions ■ Identify the Measures ■ Sales ■ Transaction Item (Daily Sales) ■ Date, Location, Item, Promotion ■ Quantity, Price, Amount
51
Retail Sales - Case Study Date Description Week Month Quarter Year DATE KEY ITEM KEY STORE KEY PROMOTION KEY POS TRXN# Sales Quantity Unit Sales Price Sales $ Amount Store Name City District Zone Item Description Item Size Package Type Category Promotion Description Discount DATE KEY STORE KEY PROMOTION KEY ITEM KEY Resultant Sales Schema
52
Retail Sales - Case Study Time Dimension Date KeyDateDay of WeekDay Number in Month MonthQuarterYearHoliday Indicator 11/1/1999Friday1JanuaryQ11999Holiday 21/2/1999Saturday2JanuaryQ11999Non-Holiday 31/3/1999Sunday3JanuaryQ11999Non-Holiday 41/4/1999Monday4JanuaryQ11999Non-Holiday Item Dimension ItemkeyItem Description SKU NumberDeptSizePackage TypeBrandCategory 1Lasagna 6 OZ90706287103Grocery6 OZBoxCold GourmetFrozen Foods 2Beef Stew 6 OZ16005393282Grocery6 OZBoxCold GourmetFrozen Foods 3Extra Nougat 2 OZ 46817560065Grocery6 OZCanChewyCandy
53
Retail Sales - Case Study Promotion Dimension Promo KeyPromo NamePrice ReductionAd TypeMedia TypePromo $Begin DateEnd Date 1Blue Ribbon Discounts TemporaryDaily PaperPaper20001/1/19991/15/1999 2Red Carpet Closeout MarkdownSunday PaperPaper10001/3/19991/10/1999 3Ad BlitzNonePaper and Radio 70001/15/19991/30/1999 Sales Fact Date KeyItem KeyStore KeyPromo KeyPOS Trxn #Sales QtyUnit Sales Price Sales $Amt 1111576345789314.59 121176345789320.891.78 15111976345789412.56 2135876345792310.33 25111276345799811.29
54
Dimensional Modeling Tips
55
■ Carefully choose the labels to identify data marts, dimension, attributes and facts ■ An attribute can live in one and only one dimension, whereas a fact can be repeated in multiple fact tables ■ If a single dimension appears to reside in more than one places, several roles are probably being played. Name the roles uniquely and treat them as separate dimensions ■ A single field in the underlying source data can have one or more logical columns associated with it ♦ E.g., A product attribute field may translate to product code, product short description, and product long description ■ Every fact should have a default aggregation rule (sum, min, max, latest, semi-additive, special algorithm, and not aggregatable) ♦ This will serve as a requirements list for query and report writers tools evaluations
56
Thank You ■ References ♦ Ralph Kimball ► The Data Warehouse Toolkit ► The Data Warehouse Lifecycle Toolkit
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.