Tips and Tricks for Dimensional Modeling
By Shawn Jackson
Overview
- A set of techniques and concepts used in data warehouse design
- Intended to support end-user queries; oriented around understandability and performance
- Uses the concepts of facts (measures) and dimensions (context)
  - Facts are typically (but not always) numerical values that can be aggregated
  - Dimensions are groups of hierarchies and descriptors that define the facts
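To make the fact/dimension split concrete, here is a minimal sketch using plain Python dicts in place of tables. The table names (`product_dim`, `sales_fact`), the sample rows and the helper `total_by` are hypothetical: the fact rows hold a foreign key plus numeric measures, and all descriptive context is looked up in the dimension at query time.

```python
from collections import defaultdict

# Hypothetical dimension table: surrogate key -> descriptive context
product_dim = {
    1: {"name": "Widget", "category": "Hardware"},
    2: {"name": "Gadget", "category": "Hardware"},
    3: {"name": "Manual", "category": "Books"},
}

# Hypothetical fact table: a foreign key into the dimension plus numeric measures
sales_fact = [
    {"product_key": 1, "quantity": 3, "amount": 30.0},
    {"product_key": 2, "quantity": 1, "amount": 25.0},
    {"product_key": 3, "quantity": 2, "amount": 18.0},
    {"product_key": 1, "quantity": 1, "amount": 10.0},
]


def total_by(attribute, measure="amount"):
    """Aggregate an additive fact, grouped by a dimension attribute."""
    totals = defaultdict(float)
    for row in sales_fact:
        label = product_dim[row["product_key"]][attribute]  # context comes from the dimension
        totals[label] += row[measure]                       # the measure comes from the fact
    return dict(totals)


print(total_by("category"))  # {'Hardware': 65.0, 'Books': 18.0}
```

Because the measure is additive, the same fact rows can be rolled up by any dimension attribute (category, name) without changing the fact table.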
Star Schema
Snowflake Schema
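The two schema diagrams differ mainly in how dimensions are stored. In a star schema each dimension is a single denormalized table joined directly to the fact table; in a snowflake schema a dimension is normalized into several tables, so reaching an attribute such as a category takes an extra join. The sketch below assumes hypothetical product and category tables and shows that both designs answer the same question, with one lookup versus two.

```python
# Star schema: the product dimension is denormalized, so the category name sits on each product row.
star_product_dim = {
    1: {"name": "Widget", "category_name": "Hardware"},
    2: {"name": "Manual", "category_name": "Books"},
}

# Snowflake schema: the same dimension is normalized into a product table and a category table.
snow_product_dim = {
    1: {"name": "Widget", "category_key": 10},
    2: {"name": "Manual", "category_key": 20},
}
snow_category_dim = {
    10: {"category_name": "Hardware"},
    20: {"category_name": "Books"},
}

# The fact table is identical in both designs.
sales_fact = [
    {"product_key": 1, "amount": 30.0},
    {"product_key": 2, "amount": 18.0},
    {"product_key": 1, "amount": 10.0},
]


def category_star(product_key):
    # One hop: fact -> product dimension
    return star_product_dim[product_key]["category_name"]


def category_snowflake(product_key):
    # Two hops: fact -> product -> category
    category_key = snow_product_dim[product_key]["category_key"]
    return snow_category_dim[category_key]["category_name"]


def totals(resolve_category):
    """Sum the amount per category, using the given lookup strategy."""
    result = {}
    for row in sales_fact:
        label = resolve_category(row["product_key"])
        result[label] = result.get(label, 0.0) + row["amount"]
    return result


print(totals(category_star))       # {'Hardware': 40.0, 'Books': 18.0}
print(totals(category_snowflake))  # {'Hardware': 40.0, 'Books': 18.0}
```

The star version trades some redundancy (the category text repeats on every product row) for simpler, faster joins, which is the usual reason dimensional models favor it.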
Kimball University: 10 Essential Rules of Dimensional Modeling (#1-5)
1. Load detailed atomic data into dimensional structures
   - Store data at the lowest grain
   - Use summary tables/views to improve performance as necessary
2. Structure dimensional models around business processes
   - Fact tables should be based on a business event
   - Complement single-process fact tables with consolidated fact tables that combine metrics from multiple processes at the same level of detail
3. Ensure that every fact table has an associated date dimension table
4. Ensure that all facts in a single fact table are at the same grain or level of detail
5. Resolve many-to-many relationships in fact tables
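Rules 1 and 3 work together: atomic rows are loaded with a date key, and summaries are derived from them rather than replacing them. This is a minimal sketch under assumed names (`date_dim`, `order_line_fact`, `monthly_summary`) and an assumed integer `YYYYMMDD` date key, a common convention but not one the rules require.

```python
from collections import defaultdict
from datetime import date


def build_date_dim(days):
    """Hypothetical date dimension keyed by an integer YYYYMMDD surrogate."""
    return {
        d.year * 10000 + d.month * 100 + d.day: {"date": d, "month": f"{d.year}-{d.month:02d}"}
        for d in days
    }


date_dim = build_date_dim([date(2024, 1, 30), date(2024, 1, 31), date(2024, 2, 1)])

# Atomic fact table: one row per order line (the lowest grain), each tied to the date dimension
order_line_fact = [
    {"date_key": 20240130, "product_key": 1, "amount": 12.0},
    {"date_key": 20240131, "product_key": 2, "amount": 8.0},
    {"date_key": 20240201, "product_key": 1, "amount": 5.0},
]

# Every fact row must resolve to a row in the date dimension
assert all(row["date_key"] in date_dim for row in order_line_fact)


def monthly_summary(facts):
    """Derive a summary table from atomic rows; the atomic rows stay available for drill-down."""
    summary = defaultdict(float)
    for row in facts:
        summary[date_dim[row["date_key"]]["month"]] += row["amount"]
    return dict(summary)


print(monthly_summary(order_line_fact))  # {'2024-01': 20.0, '2024-02': 5.0}
```

The summary can always be rebuilt from the atomic rows, but the reverse is impossible, which is why the rules put the detailed grain first.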
Kimball University: 10 Essential Rules of Dimensional Modeling (#6-10)
6. Resolve many-to-one relationships in dimension tables
   - Flatten fixed-depth hierarchies (for example, product to category) into the dimension row
7. Store report labels and filter domain values in dimension tables
   - Don't store codes and descriptions in the fact table
   - Make sure the full description of the code is in the dimension table
8. Make certain that dimension tables use a surrogate key
9. Create conformed dimensions to integrate data across the enterprise
   - The date dimension is a common example
   - Provides a single version of the truth
10. Continuously balance requirements and realities to deliver a DW/BI solution that's accepted by business users and that supports their decision-making
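Rules 7 and 8 can be illustrated together: the dimension assigns its own surrogate key and keeps the business code plus its description, while fact rows carry only the surrogate key and the measures. The class `SupplierDimension`, its method `key_for` and the supplier "Zenith Parts" are hypothetical, used only for this sketch.

```python
from itertools import count


class SupplierDimension:
    """Hypothetical dimension that assigns surrogate keys and stores code + description."""

    def __init__(self, first_key=1):
        self._next_key = count(first_key)
        self.rows = {}       # surrogate key -> attributes (code and full description)
        self._by_code = {}   # business/natural key -> surrogate key

    def key_for(self, code, description):
        """Return the surrogate key for a business code, adding a dimension row if it is new."""
        if code not in self._by_code:
            key = next(self._next_key)
            self._by_code[code] = key
            self.rows[key] = {"code": code, "description": description}
        return self._by_code[code]


dim = SupplierDimension()
# Fact rows carry only the surrogate key and the measure; labels stay in the dimension.
fact = [
    {"supplier_key": dim.key_for("ABC", "Acme Supply Co"), "amount": 100.0},
    {"supplier_key": dim.key_for("XYZ", "Zenith Parts"), "amount": 40.0},
    {"supplier_key": dim.key_for("ABC", "Acme Supply Co"), "amount": 60.0},
]
print([row["supplier_key"] for row in fact])  # [1, 2, 1]
print(dim.rows[1]["description"])             # Acme Supply Co
```

Keeping the surrogate key independent of the source code is what later allows the slowly changing dimension techniques to add several rows for one business key.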
Slowly Changing Dimensions
- Type 0
- Type 1
- Type 2
- Type 3
- Type 4
- Type 6
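SCD Type 0
- Rows are added but never changed
- Missing a true business / natural key, so variants of the same supplier can end up as separate rows
- Whole Type 0 tables are typically used only for derived dimensions; Type 0 attributes (individual columns that keep their original value) are more common

Supplier_Key  Supplier_Name
123           Acme Supply Co
124           Acme Supply Company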
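A minimal insert-only sketch of the Type 0 table above. The class name `Type0Dimension` is hypothetical, and matching on the exact name string is an assumption made here because the table has no business key to match on; that is exactly why "Acme Supply Company" becomes a second row.

```python
class Type0Dimension:
    """Insert-only dimension: rows are added but never updated (sketch)."""

    def __init__(self, first_key):
        self._next_key = first_key
        self._rows = {}  # surrogate key -> supplier name

    def add(self, name):
        # With no business key to match on, only an exact name match finds an existing row;
        # any other spelling becomes a new row.
        for key, existing in self._rows.items():
            if existing == name:
                return key
        key = self._next_key
        self._rows[key] = name
        self._next_key += 1
        return key

    def rows(self):
        # Return a copy so callers cannot modify stored rows.
        return dict(self._rows)


supplier = Type0Dimension(first_key=123)
supplier.add("Acme Supply Co")       # -> 123
supplier.add("Acme Supply Company")  # -> 124 (likely the same supplier, but nothing links the two)
supplier.add("Acme Supply Co")       # -> 123 (exact match reuses the row)
print(supplier.rows())  # {123: 'Acme Supply Co', 124: 'Acme Supply Company'}
```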
SCD Type 1
- Rows are updated or added based on the business key
- Historical information is not tracked

Original row:
Supplier_Key  Supplier_Code  Supplier_Name   Supplier_State
123           ABC            Acme Suply Co   CA

After correcting the misspelled name:
Supplier_Key  Supplier_Code  Supplier_Name   Supplier_State
123           ABC            Acme Supply Co  CA

After the supplier moves to IL:
Supplier_Key  Supplier_Code  Supplier_Name   Supplier_State
123           ABC            Acme Supply Co  IL
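The three Type 1 snapshots can be replayed with a simple upsert keyed on the business key (`Supplier_Code`). The function name `scd1_upsert` and the key rule (one more than the largest existing key, starting at 123) are assumptions for this sketch.

```python
def scd1_upsert(table, record, business_key="Supplier_Code", first_key=123):
    """Type 1: overwrite the row that matches the business key, otherwise append a new row."""
    for row in table:
        if row[business_key] == record[business_key]:
            row.update(record)  # overwrite in place; the old values are lost
            return row["Supplier_Key"]
    # Hypothetical surrogate key rule: one more than the largest key so far
    key = max((row["Supplier_Key"] for row in table), default=first_key - 1) + 1
    table.append({"Supplier_Key": key, **record})
    return key


supplier = []
scd1_upsert(supplier, {"Supplier_Code": "ABC", "Supplier_Name": "Acme Suply Co", "Supplier_State": "CA"})
scd1_upsert(supplier, {"Supplier_Code": "ABC", "Supplier_Name": "Acme Supply Co", "Supplier_State": "CA"})
scd1_upsert(supplier, {"Supplier_Code": "ABC", "Supplier_Name": "Acme Supply Co", "Supplier_State": "IL"})
print(supplier)
# [{'Supplier_Key': 123, 'Supplier_Code': 'ABC', 'Supplier_Name': 'Acme Supply Co', 'Supplier_State': 'IL'}]
```

Both the typo fix and the move to IL land on the same row, so any report grouped by state will show all of this supplier's history under IL.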
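SCD Type 2
- New rows are added instead of overwriting; the previous row only has its end date filled in
- A version number or effective dates are used to keep track of history

Supplier_Key  Supplier_Code  Supplier_Name   Supplier_State  Start_Date   End_Date
123           ABC            Acme Supply Co  CA              01-Jan-2000  21-Dec-2004
124           ABC            Acme Supply Co  IL              22-Dec-2004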
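A minimal sketch of the Type 2 change that produces the table above. It assumes effective dates (rather than a version number), uses `None` for the open end date of the current row, and the function name `scd2_apply` and its `new_key` parameter are hypothetical.

```python
from datetime import date, timedelta


def scd2_apply(table, code, changes, effective, new_key):
    """Type 2: expire the current row for `code` and add a new row carrying the changes."""
    current = next(r for r in table if r["Supplier_Code"] == code and r["End_Date"] is None)
    if all(current[col] == val for col, val in changes.items()):
        return current["Supplier_Key"]  # nothing changed, so no new version
    # The old row keeps its values; only its end date is filled in.
    current["End_Date"] = effective - timedelta(days=1)
    table.append({**current, **changes,
                  "Supplier_Key": new_key, "Start_Date": effective, "End_Date": None})
    return new_key


supplier = [{"Supplier_Key": 123, "Supplier_Code": "ABC", "Supplier_Name": "Acme Supply Co",
             "Supplier_State": "CA", "Start_Date": date(2000, 1, 1), "End_Date": None}]
scd2_apply(supplier, "ABC", {"Supplier_State": "IL"}, date(2004, 12, 22), new_key=124)
for row in supplier:
    print(row["Supplier_Key"], row["Supplier_State"], row["Start_Date"], row["End_Date"])
# 123 CA 2000-01-01 2004-12-21
# 124 IL 2004-12-22 None
```

Old fact rows keep pointing at key 123 (CA) and new facts use key 124 (IL), so history is reported under the state that was true at the time.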
SCD Type 3
- Rows are updated but not added
- Historical information is preserved through extra columns, so only as many prior values are kept as there are columns

Supplier_Key  Supplier_Code  Supplier_Name   Prior_State  Effective_Date  Current_State
123           ABC            Acme Supply Co  CA           22-Dec-2004     IL
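A Type 3 change is an in-place shift: the current value moves into the prior column and is then overwritten. This sketch uses one prior column, as in the table above; `scd3_apply` is a hypothetical name.

```python
from datetime import date


def scd3_apply(row, new_state, effective):
    """Type 3: move the current value into the prior column, then overwrite the current value."""
    row["Prior_State"] = row["Current_State"]
    row["Current_State"] = new_state
    row["Effective_Date"] = effective
    return row


supplier = {"Supplier_Key": 123, "Supplier_Code": "ABC", "Supplier_Name": "Acme Supply Co",
            "Prior_State": None, "Effective_Date": None, "Current_State": "CA"}
scd3_apply(supplier, "IL", date(2004, 12, 22))
print(supplier["Prior_State"], supplier["Current_State"])  # CA IL
```

A second move (for example to NY) would push IL into `Prior_State` and CA would be lost, which is the limit of keeping history in extra columns.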
SCD Type 4
- Combination of Type 1 and Type 2 dimensions
- Rows are updated in the Type 1 (current) table and added in the Type 2 (history) table

Supplier
Supplier_Key  Supplier_Code  Supplier_Name   Supplier_State
123           ABC            Acme Supply Co  IL

Supplier History
Supplier_HistKey  Supplier_Key  Supplier_Code  Supplier_Name   Supplier_State  Start_Date   End_Date
1001              123           ABC            Acme Supply Co  CA              01-Jan-2000  21-Dec-2004
1002              123           ABC            Acme Supply Co  IL              22-Dec-2004
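A sketch of the current-plus-history pattern in the two Type 4 tables: the current table is overwritten (Type 1) and the history table gains a versioned row (Type 2). The function `scd4_apply`, its parameters, and the use of `None` for an open end date are assumptions.

```python
from datetime import date, timedelta


def scd4_apply(current, history, supplier_key, changes, effective, new_hist_key):
    """Type 4 as this deck describes it: Type 1 overwrite in `current`, Type 2 rows in `history`."""
    open_row = next(h for h in history
                    if h["Supplier_Key"] == supplier_key and h["End_Date"] is None)
    open_row["End_Date"] = effective - timedelta(days=1)
    row = current[supplier_key]
    row.update(changes)  # the current table keeps only the latest values
    history.append({"Supplier_HistKey": new_hist_key, **row,
                    "Start_Date": effective, "End_Date": None})


current = {123: {"Supplier_Key": 123, "Supplier_Code": "ABC",
                 "Supplier_Name": "Acme Supply Co", "Supplier_State": "CA"}}
history = [{"Supplier_HistKey": 1001, **current[123],
            "Start_Date": date(2000, 1, 1), "End_Date": None}]
scd4_apply(current, history, 123, {"Supplier_State": "IL"}, date(2004, 12, 22), new_hist_key=1002)
print(current[123]["Supplier_State"])                                   # IL
print([(h["Supplier_HistKey"], h["Supplier_State"]) for h in history])  # [(1001, 'CA'), (1002, 'IL')]
```

Queries that only need current values read the small current table; point-in-time questions go to the history table.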
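SCD Type 6 / Hybrid
- Combines Types 1, 2, and 3 in one table: Current_State is overwritten on every row (Type 1), each change adds a dated, flagged row (Type 2), and the per-row state column preserves the earlier value (Type 3)

Supplier_Key  Supplier_Code  Supplier_Name   Current_State  Prior_State  Start_Date   End_Date     Current_Flag
123           ABC            Acme Supply Co  NY             CA           01-Jan-2000  21-Dec-2004  N
124           ABC            Acme Supply Co  NY             IL           22-Dec-2004  03-Feb-2008  N
125           ABC            Acme Supply Co  NY                          04-Feb-2008               Y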
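The hybrid update can be sketched by replaying the two moves in the table (CA to IL, then IL to NY). The column `Historical_State` is an assumed name standing in for the table's per-row state column: it holds the state in effect during that row's date range, so the newest row gets NY. The function `scd6_apply` is hypothetical.

```python
from datetime import date, timedelta


def scd6_apply(table, code, new_state, effective, new_key):
    """Type 6 hybrid: new versioned row (Type 2), a per-row historical value (Type 3),
    and Current_State overwritten on every row for the supplier (Type 1)."""
    current = next(r for r in table if r["Supplier_Code"] == code and r["Current_Flag"] == "Y")
    current["End_Date"] = effective - timedelta(days=1)
    current["Current_Flag"] = "N"
    table.append({**current, "Supplier_Key": new_key, "Historical_State": new_state,
                  "Start_Date": effective, "End_Date": None, "Current_Flag": "Y"})
    for row in table:
        if row["Supplier_Code"] == code:
            row["Current_State"] = new_state


supplier = [{"Supplier_Key": 123, "Supplier_Code": "ABC", "Supplier_Name": "Acme Supply Co",
             "Current_State": "CA", "Historical_State": "CA",
             "Start_Date": date(2000, 1, 1), "End_Date": None, "Current_Flag": "Y"}]
scd6_apply(supplier, "ABC", "IL", date(2004, 12, 22), new_key=124)
scd6_apply(supplier, "ABC", "NY", date(2008, 2, 4), new_key=125)
for r in supplier:
    print(r["Supplier_Key"], r["Current_State"], r["Historical_State"], r["End_Date"], r["Current_Flag"])
# 123 NY CA 2004-12-21 N
# 124 NY IL 2008-02-03 N
# 125 NY NY None Y
```

Facts joined through any version can be reported either "as was" (`Historical_State`) or "as is" (`Current_State`) from the same row.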
Role-Playing Dimensions
- A single dimension table is reused, referenced several times in different roles, within the same database
- The date dimension is commonly used this way (sale date, delivery date)
- Each role gives a different view of the same data
Role-Playing Example
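A minimal sketch of one date dimension playing two roles: the fact table holds two foreign keys into the same `date_dim`, and the chosen role decides which one the query joins through. The names `order_fact`, `sale_date_key`, `delivery_date_key` and `amount_by_month` are hypothetical.

```python
from datetime import date

# One physical date dimension (hypothetical keys and attributes)
date_dim = {
    20240105: {"date": date(2024, 1, 5), "month": "2024-01"},
    20240212: {"date": date(2024, 2, 12), "month": "2024-02"},
}

# The fact table references the same dimension twice, once per role
order_fact = [
    {"order_id": 1, "sale_date_key": 20240105, "delivery_date_key": 20240212, "amount": 50.0},
    {"order_id": 2, "sale_date_key": 20240105, "delivery_date_key": 20240105, "amount": 20.0},
]


def amount_by_month(role):
    """Group the amount by month, joining date_dim through the foreign key for `role`."""
    totals = {}
    for row in order_fact:
        month = date_dim[row[f"{role}_date_key"]]["month"]
        totals[month] = totals.get(month, 0.0) + row["amount"]
    return totals


print(amount_by_month("sale"))      # {'2024-01': 70.0}
print(amount_by_month("delivery"))  # {'2024-02': 50.0, '2024-01': 20.0}
```

In SQL the same idea is usually expressed by joining the date table twice under different aliases or through role-specific views.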
Factless Fact Tables
- Tracking events: each row records that something happened, with no numeric measure
- Many-to-many joins
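A sketch of a factless fact table, using a hypothetical class-attendance example: each row holds only dimension keys. Counting rows answers event questions, and the same rows resolve the many-to-many relationship between students and classes.

```python
from collections import Counter

# Hypothetical factless fact table: (student_key, class_key, date_key), no measure columns
attendance_fact = [
    ("S1", "MATH101", 20240108),
    ("S2", "MATH101", 20240108),
    ("S1", "HIST200", 20240108),
    ("S1", "MATH101", 20240115),
]

# Tracking events: the "measure" is simply a count of rows
attendance_by_class = Counter(class_key for _, class_key, _ in attendance_fact)

# Many-to-many: each student links to many classes and each class to many students
classes_by_student = {}
students_by_class = {}
for student, class_key, _ in attendance_fact:
    classes_by_student.setdefault(student, set()).add(class_key)
    students_by_class.setdefault(class_key, set()).add(student)

print(attendance_by_class["MATH101"])        # 3
print(sorted(classes_by_student["S1"]))      # ['HIST200', 'MATH101']
print(sorted(students_by_class["MATH101"]))  # ['S1', 'S2']
```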