Published by Harriet Randall. Modified over 9 years ago.
1
Dimensional Modeling 1
2
Agenda
- DW Project Lifecycle
- Eliciting Business Requirements
- Dimensional Model Components
- Dimensional Model Schemas
- Additional Modeling Concepts
3
DW Development Approach: Kimball Methodology
- DW Project Lifecycle
- Business requirements
  - Business Requirements Documentation
  - Bus Matrix
- Design, build and deliver in increments
  - DW Architecture
  - DW Design
  - ETL system
  - Cube, Reports, query tools, …
4
Data Warehouse Project Lifecycle
Phases: Planning, Analysis, Design, Implementation
Source: Mundy, Thornthwaite, and Kimball (2006). The Microsoft Data Warehouse Toolkit, Wiley Publishing Inc., Indianapolis, IN.
5
Data Warehouse Project Lifecycle 5 Source: Mundy, Thornthwaite, and Kimball (2006). The Microsoft Data Warehouse Toolkit, Wiley Publishing Inc., Indianapolis, IN.
6
Project Planning
- Determine:
  - Initial project scope
  - Project cost
- Define:
  - Team roles
  - Team members
  - Project schedule
7
Example Initial Project Scope 7 Source: Mundy, Thornthwaite, and Kimball (2006). The Microsoft Data Warehouse Toolkit, Wiley Publishing Inc., Indianapolis, IN.
10
Requirements Elicitation
- Identify who to interview
  - May include more levels of management
- Conduct interviews
  - Business challenges
  - Definition of success
  - Info needed to track success, detect problems
  - Ways to view/break down info…
- Other discovery methods
  - Existing systems
  - Reports…
- Document & prioritize
11
Documenting Requirements
- Interview Summaries: prose summarizing interviews (Kimball format)
- Analytic Themes: analysis requirements grouped into "categories" (Kimball format, pg 35)
- DW Bus Matrix: business processes mapped to data needed (Kimball format, pg 37)
- DM Information Package: prioritized processes (Ponniah format, pg 104)
12
Kimball Example: Interview Summaries 12
13
Kimball Example: Analytic Themes 13
14
Kimball Example: Bus Matrix 14
15
Class Example: University Dept. Requirements 15
16
Class Example: University Dept. Bus Matrix 16
17
Class Example: University Dept. Information Package 17
18
In-Class Example: Newspaper Information Package 18
20
BI Architecture, cont… 20 Source: Oracle Corporation. Information Management and Big Data: A Reference Architecture, Oracle White Paper, February 2013, p. 12.
23
ERD 23
24
Reporting Challenges with ERD/OLTP
- Model designed for efficient record processing, not "subject" processing
- External data often excluded
- Analyses require multiple joins
- Indexes not optimized for reporting
- History not stored
25
Pre-Computing Aggregates

Month  Product  City     TOTAL Sales Quantity
Oct    Prod1    Abilene  9556
Oct    Prod1    Austin   799
Oct    Prod1    Dallas   1356
Oct    Prod1    Waco     36678
Oct    Prod2    Abilene  7869
Oct    Prod2    Austin   2967
Oct    Prod2    Dallas   568
Oct    Prod2    Waco     277980
Oct    Prod3    Abilene  43
Oct    Prod3    Austin   6588
Oct    Prod3    Dallas   8434
Oct    Prod3    Waco     3756
Nov    Prod1    Abilene  77977
Nov    Prod1    Austin   234
Nov    Prod1    Dallas   4378
Nov    Prod1    Waco     20349
Nov    Prod2    Abilene  210
Nov    Prod2    Austin   789
Nov    Prod2    Dallas   888
Nov    Prod2    Waco     4566
Nov    Prod3    Abilene  2078
Nov    Prod3    Austin   292
Nov    Prod3    Dallas   1111
Nov    Prod3    Waco     36
Dec    Prod1    Abilene  34657
Dec    Prod1    Austin   2999
Dec    Prod1    Dallas   5888
Dec    Prod1    Waco     9999
Dec    Prod2    Abilene  1580
Dec    Prod2    Austin   2940
Dec    Prod2    Dallas   975
Dec    Prod2    Waco     5748
Dec    Prod3    Abilene  6140
Dec    Prod3    Austin   211
Dec    Prod3    Dallas   1357
Dec    Prod3    Waco     1000

Source column: ORDERED_QUANTITY

Queries:
1. Total Sales
2. Total Sales by Month
3. Total Sales by Month and Product Line
4. Total Sales by Month, Product Line, and City
5. Total Sales by City
…..
26
Pre-Computing Aggregates, cont…

1. Total Sales (1 "fact", 0 "dimensions")

    SELECT SUM(ordered_quantity) AS "total"
    FROM order_line_t;

2. Total Sales by Month (1 "fact", 1 "dimension" with 3 values: Oct, Nov, Dec)

    SELECT month(order_date) AS "month",
           SUM(ordered_quantity) AS "total"
    FROM order_line_t ol
    JOIN order_t o ON ol.order_id = o.order_id
    GROUP BY month(order_date);

3. Total Sales by Month and Product (1 "fact", 2 "dimensions" each with 3 values: Oct, Nov, Dec by P1, P2, P3)

    SELECT month(order_date) AS "month",
           p.product_line_id AS "product",
           SUM(ordered_quantity) AS "total"
    FROM order_line_t ol
    JOIN order_t o ON ol.order_id = o.order_id
    JOIN product_t p ON ol.product_id = p.product_id
    GROUP BY month(order_date), p.product_line_id;
27
Pre-Computing Aggregates, cont…

4. Total Sales by Month, Product, & City (1 "fact", 3 "dimensions": Oct, Nov, Dec by P1, P2, P3 by AB, AU, DA, WA; month and product have 3 values each, city has 4)

    SELECT month(order_date) AS "month",
           p.product_line_id AS "product",
           c.city,
           SUM(ordered_quantity) AS "total"
    FROM order_line_t ol
    JOIN order_t o ON ol.order_id = o.order_id
    JOIN product_t p ON ol.product_id = p.product_id
    JOIN customer_t c ON o.customer_id = c.customer_id
    GROUP BY month(order_date), p.product_line_id, c.city;
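To make the idea of pre-computing concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names follow the slides; the column lists, order dates, and the `sales_agg` table name are assumptions made for illustration, and SQLite's `strftime('%m', ...)` stands in for `month(...)`, which SQLite does not provide. The aggregate is computed once at the finest grain (month, product, city), and coarser questions are then answered from that small table instead of re-joining the transaction tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical OLTP tables; only the columns the queries need.
CREATE TABLE customer_t   (customer_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE product_t    (product_id INTEGER PRIMARY KEY, product_line_id TEXT);
CREATE TABLE order_t      (order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
CREATE TABLE order_line_t (order_id INTEGER, product_id INTEGER, ordered_quantity INTEGER);

INSERT INTO customer_t VALUES (1, 'Abilene'), (2, 'Austin');
INSERT INTO product_t  VALUES (1, 'Prod1'), (2, 'Prod2');
INSERT INTO order_t    VALUES (100, 1, '2014-10-05'), (101, 2, '2014-10-09'), (102, 1, '2014-11-12');
INSERT INTO order_line_t VALUES (100, 1, 9556), (100, 2, 7869), (101, 1, 799), (102, 1, 77977);

-- Pre-compute once at the finest grain: month x product x city.
CREATE TABLE sales_agg AS
SELECT strftime('%m', o.order_date) AS month,
       p.product_line_id            AS product,
       c.city                       AS city,
       SUM(ol.ordered_quantity)     AS total
FROM order_line_t ol
JOIN order_t o    ON ol.order_id = o.order_id
JOIN product_t p  ON ol.product_id = p.product_id
JOIN customer_t c ON o.customer_id = c.customer_id
GROUP BY 1, 2, 3;
""")

# Coarser questions are answered from the aggregate, with no joins.
by_month = conn.execute(
    "SELECT month, SUM(total) FROM sales_agg GROUP BY month ORDER BY month"
).fetchall()
grand_total = conn.execute("SELECT SUM(total) FROM sales_agg").fetchone()[0]

print(by_month)      # total sales by month
print(grand_total)   # total sales
```

Summing a pre-computed aggregate works here because SUM is additive; measures such as averages or distinct counts would need more care.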
28
OLAP Review
- Short: class of applications or tools that support ad-hoc analysis of multidimensional data
- Longer: “…technology that enables [users]… to gain insight into data through…fast, consistent, interactive access [to]…information that has been transformed…to reflect the real dimensionality of the enterprise…” (OLAP Council, www.olapcouncil.org)
29
OLAP Cubes
- Improve reporting performance
  - Pre-processed aggregates
  - Data in memory
  - Index structures
  - Bye bye locks!
  - …
- Flexible, interactive information delivery to DW
- Multidimensional data representation and operations
  - Rollup
  - Drill-down
  - Slice/Dice
  - Pivot (or Rotate)
* See http://www.jamesserra.com/archive/2013/08/why-use-a-ssas-cube/
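The cube operations listed above can be sketched in a few lines of plain Python. This is an illustration only, not how an OLAP engine such as SSAS is implemented: the cube is a dictionary keyed by (month, product, city), the quantities are a handful of cells from the sales table earlier in the deck, and the function names are hypothetical.

```python
from collections import defaultdict

# Cells of a tiny cube: (month, product, city) -> ordered quantity.
cube = {
    ("Oct", "Prod1", "Abilene"): 9556,
    ("Oct", "Prod1", "Austin"): 799,
    ("Oct", "Prod2", "Abilene"): 7869,
    ("Nov", "Prod1", "Abilene"): 77977,
    ("Nov", "Prod2", "Austin"): 789,
}
DIMS = ("month", "product", "city")


def rollup(cells, keep):
    """Aggregate away every dimension not named in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for key, qty in cells.items():
        out[tuple(key[i] for i in idx)] += qty
    return dict(out)


def slice_(cells, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cells.items() if k[i] == value}


def dice(cells, **allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    return {
        k: v for k, v in cells.items()
        if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())
    }


by_month = rollup(cube, ["month"])                     # rollup to month
by_month_product = rollup(cube, ["month", "product"])  # drill down one level
austin = slice_(cube, "city", "Austin")                # slice on city
oct_prod1 = dice(cube, month={"Oct"}, product={"Prod1"})
by_city_month = rollup(cube, ["city", "month"])        # pivot: swap the axes
```

Drill-down is simply a rollup that keeps one more dimension, and a pivot changes only the order of the axes, not the numbers.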
34
Dimensional Modeling
- Data model: logical view of a multi-dimensional cube
- Key structures and components
  - Fact table(s)
    - Key business process
    - Facts/measurements/metrics
    - Foreign keys
  - Dimension tables
    - Ways to view measures
    - Attributes
    - Often denormalized
    - Surrogate key vs. business key
    - Hierarchies
35
Dimensional Model Example
[Figure: dimensional model diagram with callouts for the fact table (FACT), dimension tables (DIM), foreign keys, attributes, measures, business key ("Include it!"), surrogate key, and hierarchy]
36
Dimensional Model Characteristics
[Table with two columns: Dim Tables, Fact Tables]
37
Star Schema
- At least one fact table and (typically) two or more dimension tables
- Fact table has a direct relationship with each of the dimension tables
- "Single-table" dimensions
- Arrangement resembles a "star"
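A minimal star schema can be sketched with sqlite3 as follows. All table and column names (`dim_date`, `dim_product`, `dim_city`, `fact_sales`), the product line label, the state, and the year are hypothetical; the quantities reuse a few cells from the sales table earlier in the deck. Each dimension is a single denormalized table, and the fact table joins directly to each of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table per dimension ("single-table" dimensions).
CREATE TABLE dim_date    (date_sk INTEGER PRIMARY KEY, month_name TEXT, year INTEGER);
CREATE TABLE dim_product (product_sk INTEGER PRIMARY KEY, product_name TEXT, product_line TEXT);
CREATE TABLE dim_city    (city_sk INTEGER PRIMARY KEY, city TEXT, state TEXT);

-- The fact table holds the measure plus one foreign key per dimension.
CREATE TABLE fact_sales (
    date_sk          INTEGER REFERENCES dim_date(date_sk),
    product_sk       INTEGER REFERENCES dim_product(product_sk),
    city_sk          INTEGER REFERENCES dim_city(city_sk),
    ordered_quantity INTEGER
);

INSERT INTO dim_date    VALUES (1, 'Oct', 2014), (2, 'Nov', 2014);
INSERT INTO dim_product VALUES (1, 'Prod1', 'Line A'), (2, 'Prod2', 'Line A');
INSERT INTO dim_city    VALUES (1, 'Abilene', 'TX'), (2, 'Austin', 'TX');
INSERT INTO fact_sales  VALUES (1, 1, 1, 9556), (1, 1, 2, 799), (1, 2, 1, 7869), (2, 1, 1, 77977);
""")

# Every dimension is exactly one join away from the fact table.
rows = conn.execute("""
    SELECT d.month_name, p.product_line, SUM(f.ordered_quantity) AS total
    FROM fact_sales f
    JOIN dim_date d    ON f.date_sk = d.date_sk
    JOIN dim_product p ON f.product_sk = p.product_sk
    GROUP BY d.month_name, p.product_line
    ORDER BY d.month_name, p.product_line
""").fetchall()
print(rows)
```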
38
Star Schema Example 38
39
Snowflake Schema
- Fact table has a direct relationship with some dimension tables, and an indirect relationship with other(s)
- Multi-table dimensions, i.e., "normalized" dimensions
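For contrast, here is a sketch of a snowflaked product dimension, again with hypothetical table names and sample values. The product line is normalized out of `dim_product` into its own table, so the fact table reaches it only indirectly and the query needs one extra join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The product dimension is split into two tables (normalized).
CREATE TABLE dim_product_line (product_line_sk INTEGER PRIMARY KEY, product_line TEXT);
CREATE TABLE dim_product (
    product_sk      INTEGER PRIMARY KEY,
    product_name    TEXT,
    product_line_sk INTEGER REFERENCES dim_product_line(product_line_sk)
);
CREATE TABLE fact_sales (
    product_sk       INTEGER REFERENCES dim_product(product_sk),
    ordered_quantity INTEGER
);

INSERT INTO dim_product_line VALUES (1, 'Line A'), (2, 'Line B');
INSERT INTO dim_product      VALUES (1, 'Prod1', 1), (2, 'Prod2', 1), (3, 'Prod3', 2);
INSERT INTO fact_sales       VALUES (1, 9556), (2, 7869), (3, 43);
""")

# fact -> dim_product is direct; fact -> dim_product_line is indirect.
rows = conn.execute("""
    SELECT pl.product_line, SUM(f.ordered_quantity) AS total
    FROM fact_sales f
    JOIN dim_product p       ON f.product_sk = p.product_sk
    JOIN dim_product_line pl ON p.product_line_sk = pl.product_line_sk
    GROUP BY pl.product_line
    ORDER BY pl.product_line
""").fetchall()
print(rows)
```

The product line name is stored once per line rather than once per product, which is the redundancy saving; the cost is the additional join on every query that groups by product line.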
40
Snowflake Example 40
41
Comparison of Schemas
- Star: the much-preferred approach
  - Adv: faster load/query/analysis performance
  - Adv: potentially more intuitive to users
- Snowflake
  - Adv: potentially faster setup
  - Adv: avoids data redundancy
  - Adv: reduces size of dimension table
  - Adv: ease of maintenance
42
Common Dims, Facts, Measures
[Table with three columns: Dims, Facts, Measures]
43
In-Class Example: Newspaper Dim Model 43
44
Additional Modeling Concepts
- Surrogate Keys
- Attribute Hierarchies
- Time Dimensions
- Junk Dimensions
- Degenerate Dimensions
- Slowly-Changing Dimensions
45
Surrogate Keys
- Problem
  - Potential for PK to change in source systems (e.g., PKs with built-in meaning)
  - Data spread across multiple systems
    - Do PKs exist?
    - Are PKs consistent?
    - Do PKs mean the same thing?
- Surrogate key
  - Newly generated PK for dimension rows in DW
  - System-generated sequence numbers
  - Mapped to source/application key(s)
  - Fact rows reference SKs
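The mapping from source keys to surrogate keys can be sketched as a small lookup structure. The class name, the source system names ("crm", "billing"), and the key formats below are hypothetical; in practice this mapping usually lives in the dimension table or a key-map table maintained by the ETL process.

```python
import itertools


class SurrogateKeyMap:
    """Assigns system-generated sequence numbers to source (business) keys."""

    def __init__(self, start=1):
        self._next = itertools.count(start)
        self._by_source_key = {}

    def lookup_or_create(self, source_system, business_key):
        """Return the surrogate key for a source key, creating one if new."""
        key = (source_system, business_key)
        if key not in self._by_source_key:
            self._by_source_key[key] = next(self._next)
        return self._by_source_key[key]

    def alias(self, source_system, business_key, surrogate_key):
        """Map a key from another source system to an existing dimension row."""
        self._by_source_key[(source_system, business_key)] = surrogate_key


customers = SurrogateKeyMap()

# The same customer is known under different keys in two source systems.
sk_crm = customers.lookup_or_create("crm", "C-5000017302")
customers.alias("billing", "17302", sk_crm)

# Fact rows carry only the surrogate key, never the source key.
fact_row = {
    "customer_sk": customers.lookup_or_create("billing", "17302"),
    "ordered_quantity": 5,
}
```

Because fact rows reference only the surrogate key, a source system can renumber or restructure its own keys without invalidating history in the warehouse.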
46
Surrogate Keys Example 46
47
Attribute Hierarchies
- 1:M relationships between attributes
- Support user navigation: drill-downs, drill-ups
- Improve performance
  - Assist SSAS in aggregation selection
  - Storage improvement
48
Attribute Hierarchy Examples
- State > City
- Year > Month
- Year > Semester
49
Date / Time Dimension
- Common feature of every data warehouse
- Minimum attributes
  - Date key (e.g., 20140121, 2014-01-21, 12345)
  - Date name (e.g., Tuesday, January 21 2014)
- Common additional attributes
  - Month, Year, Quarter, …
  - Holiday Name, …
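Date dimension rows are usually generated rather than loaded from a source system. The sketch below builds one row per calendar day with an integer date key in YYYYMMDD form, a readable date name, and a few common attributes. The function and field names are hypothetical, and the weekday and month names assume Python's default (English) locale.

```python
from datetime import date, timedelta


def build_date_dimension(start, end):
    """Return one dimension row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            # Smart integer key, e.g. 20140121
            "date_key": day.year * 10000 + day.month * 100 + day.day,
            # e.g. "Tuesday, January 21 2014"
            "date_name": f"{day:%A}, {day:%B} {day.day} {day.year}",
            "month": day.month,
            "quarter": (day.month - 1) // 3 + 1,
            "year": day.year,
        })
        day += timedelta(days=1)
    return rows


rows = build_date_dimension(date(2014, 1, 20), date(2014, 1, 22))
for row in rows:
    print(row["date_key"], row["date_name"])
```

Attributes such as holiday names cannot be derived from the calendar alone and would have to come from a separate, organization-specific list.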
50
Time Dimension Example 50
51
Junk Dimensions
- Store one or more "lookup" codes, flags, or indicators that describe or categorize transactions/events
- Usually low cardinality
- May include all valid combinations of codes OR only the valid combinations that exist
52
Junk Dimension Example

Enrollment_Status_ID_SK  Registration_Status  Permit_Issued  Class_Fee_Status
1                        Wait List            Y              Paid
2                        Wait List            Y              Unpaid
3                        Wait List            N              Paid
4                        Wait List            N              Unpaid
5                        Confirmed            Y              Paid
6                        Confirmed            Y              Unpaid
7                        Confirmed            N              Paid
8                        Confirmed            N              Unpaid
9                        Awaiting Approval    Y              Paid
10                       Awaiting Approval    Y              Unpaid
11                       Awaiting Approval    N              Paid
12                       Awaiting Approval    N              Unpaid
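The "all valid combinations" option can be generated as a Cartesian product of the individual code lists. The sketch below reproduces the twelve rows of the example above; the lowercase field names and the `lookup` dictionary are hypothetical conveniences for the ETL side.

```python
from itertools import product

registration_statuses = ["Wait List", "Confirmed", "Awaiting Approval"]
permit_issued = ["Y", "N"]
class_fee_statuses = ["Paid", "Unpaid"]

# One row per combination, numbered with a surrogate key starting at 1.
junk_dim = [
    {
        "enrollment_status_sk": sk,
        "registration_status": reg,
        "permit_issued": permit,
        "class_fee_status": fee,
    }
    for sk, (reg, permit, fee) in enumerate(
        product(registration_statuses, permit_issued, class_fee_statuses), start=1
    )
]

# During fact loading, the three source codes resolve to a single key.
lookup = {
    (r["registration_status"], r["permit_issued"], r["class_fee_status"]): r["enrollment_status_sk"]
    for r in junk_dim
}

print(len(junk_dim))
print(lookup[("Confirmed", "Y", "Unpaid")])
```

With 3 x 2 x 2 = 12 combinations the full product is cheap; when the code lists are longer, storing only the combinations that actually occur keeps the dimension small.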
53
Degenerate Dimensions
- An attribute (dimension) stored in the fact table
- Typically a high-cardinality attribute
- Attribute does NOT link to a dimension table
- Often used for drill-downs and/or data mining (e.g., Market Basket Analysis)
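As an illustration, an order number is a typical degenerate dimension: it sits in the fact row with no dimension table behind it, yet it is what ties line items together for a basket analysis. The fact rows, order numbers, and field names below are made up for the sketch.

```python
from collections import Counter
from itertools import combinations

# order_number is the degenerate dimension: stored in the fact row itself.
fact_sales = [
    {"order_number": "SO-1001", "product_sk": 1, "quantity": 2},
    {"order_number": "SO-1001", "product_sk": 2, "quantity": 1},
    {"order_number": "SO-1002", "product_sk": 1, "quantity": 1},
    {"order_number": "SO-1002", "product_sk": 2, "quantity": 4},
    {"order_number": "SO-1002", "product_sk": 3, "quantity": 1},
]

# Group line items into baskets using the degenerate dimension.
baskets = {}
for row in fact_sales:
    baskets.setdefault(row["order_number"], set()).add(row["product_sk"])

# Count how often each pair of products appears in the same order.
pair_counts = Counter(
    pair
    for products in baskets.values()
    for pair in combinations(sorted(products), 2)
)

print(pair_counts.most_common(1))
```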
54
Degenerate Dimension Example 54
55
Slowly-Changing Dimensions
What you want to do when a value in a dimension record changes:
0. Do nothing
1. Overwrite record
2. Retain all history (add new rows)
3. Retain some history (add new columns)
Impacts ETL
56
Type 0 (Fixed Attribute)

DimCustomer table:
  CustomerSK  CustomerID  LastName  FirstName  Gender
  10          5000017302  Harris    Miles      M

Source extract:
  CustomerID  LastName  FirstName  Gender
  5000017302  Harris    Miles      F

Result: update ignored or failure

© 2006 Microsoft Corporation.
57
Type 1 (Changing Attribute)

DimCustomer table:
  CustomerSK  CustomerID  LastName  FirstName  AddressLine1            ZipCode
  10          5000017302  Harris    Miles      5363 Blackshire Street  54271-0000

Source extract:
  CustomerID  LastName  FirstName  AddressLine1  ZipCode
  5000017302  Harris    Miles      123 Main St.  54276

Updated DimCustomer table:
  CustomerSK  CustomerID  LastName  FirstName  AddressLine1  ZipCode
  10          5000017302  Harris    Miles      123 Main St.  54276

Simple UPDATE statement applied:

    UPDATE DimCustomer
    SET AddressLine1 = '123 Main St',
        ZipCode = '54276'
    WHERE CustomerID = 5000017302

© 2006 Microsoft Corporation.
58
Type 2 (Historical Attribute)

DimCustomer table:
  CustomerSK  CustomerID  LastName  FirstName  AddressLine1            ZipCode  StartDate  EndDate
  10          5000017302  Harris    Miles      5363 Blackshire Street  54271    1/1/2007   NULL

Customer source extract:
  CustomerID  LastName  FirstName  AddressLine1  ZipCode
  5000017302  Harris    Miles      123 Main St.  54276

Updated DimCustomer table:
  CustomerSK  CustomerID  LastName  FirstName  AddressLine1            ZipCode  StartDate  EndDate
  10          5000017302  Harris    Miles      5363 Blackshire Street  54271    1/1/2007   2/18/2007
  108         5000017302  Harris    Miles      123 Main St.            54276    2/18/2007  NULL

Simple UPDATE statement applied to close the current row (the EndDate IS NULL filter keeps rows that were already closed from being overwritten):

    UPDATE DimCustomer
    SET EndDate = '2/18/2007'
    WHERE CustomerID = 5000017302
      AND EndDate IS NULL

Then INSERT statement applied:

    INSERT INTO DimCustomer (CustomerID, LastName, Firstname…)
    VALUES (5000017302, 'Harris', 'Miles', '123 Main St', '54276', '2/18/2007', NULL)

© 2006 Microsoft Corporation.
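The Type 2 steps (close the current row, then insert a new one) can be sketched end to end with sqlite3. This is a minimal illustration under stated assumptions, not the slide author's ETL code: the helper name `apply_type2_change` is hypothetical, dates are stored as ISO text, the `EndDate IS NULL` filter is added so that only the current row is closed, and SQLite assigns the next surrogate key (11 here, rather than the 108 shown on the slide).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE DimCustomer (
        CustomerSK   INTEGER PRIMARY KEY,  -- surrogate key
        CustomerID   INTEGER NOT NULL,     -- business key
        LastName     TEXT,
        FirstName    TEXT,
        AddressLine1 TEXT,
        ZipCode      TEXT,
        StartDate    TEXT,
        EndDate      TEXT                  -- NULL marks the current row
    )
""")
conn.execute(
    "INSERT INTO DimCustomer VALUES (10, 5000017302, 'Harris', 'Miles', "
    "'5363 Blackshire Street', '54271', '2007-01-01', NULL)"
)


def apply_type2_change(conn, customer_id, address, zip_code, change_date):
    """Close the customer's current row and add a new current row."""
    # Carry over the attributes that did not change.
    current = conn.execute(
        "SELECT LastName, FirstName FROM DimCustomer "
        "WHERE CustomerID = ? AND EndDate IS NULL",
        (customer_id,),
    ).fetchone()
    # Step 1: end-date only the current row.
    conn.execute(
        "UPDATE DimCustomer SET EndDate = ? "
        "WHERE CustomerID = ? AND EndDate IS NULL",
        (change_date, customer_id),
    )
    # Step 2: insert the new version; the surrogate key is auto-assigned.
    conn.execute(
        "INSERT INTO DimCustomer "
        "(CustomerID, LastName, FirstName, AddressLine1, ZipCode, StartDate, EndDate) "
        "VALUES (?, ?, ?, ?, ?, ?, NULL)",
        (customer_id, *current, address, zip_code, change_date),
    )


apply_type2_change(conn, 5000017302, "123 Main St.", "54276", "2007-02-18")

history = conn.execute(
    "SELECT CustomerSK, AddressLine1, StartDate, EndDate "
    "FROM DimCustomer ORDER BY CustomerSK"
).fetchall()
for row in history:
    print(row)
```

New fact rows would reference the new surrogate key, while facts loaded before the change keep pointing at the old row, which is how history is retained.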
59
Type 3 (Changing Attribute)

DimCustomer table:
  CustomerSK  CustomerID  LastName  FirstName  AddressLine1            ZipCode  StartDate  EndDate
  10          5000017302  Harris    Miles      5363 Blackshire Street  54271    1/1/2007   NULL

Customer source extract:
  CustomerID  LastName  FirstName  AddressLine1  ZipCode
  5000017302  Harris    Miles      123 Main St.  54276

Updated DimCustomer table:
  CustomerSK  CustomerID  LastName  FirstName  AddressLine1            ZipCode  Updated AddressLine1  Updated ZipCode
  10          5000017302  Harris    Miles      5363 Blackshire Street  54271    123 Main St.          54276

Simple UPDATE statement applied:

    UPDATE DimCustomer
    SET UpdatedAddressLine1 = '123 Main St',
        UpdatedZipCode = '54276'
    WHERE CustomerID = 5000017302

© 2006 Microsoft Corporation.
61
DW Physical Design 61
62
Summary
- Dimensional model
- Basic components
  - Facts
  - Measures
  - Dimensions
  - Attributes
- Keys
  - Primary
  - Surrogate
  - Business
  - Foreign
- Schemas
- Hierarchies
- Slowly-changing dimensions
- Junk dimensions
- Degenerate dimensions