Overview and Fundamentals Dimensional Modeling Overview and Fundamentals
What Is Dimensional Modeling?
Dimensional Modeling An Overview The Fundamentals Structure Terminology Benefits The Fundamentals Facts and Dimensions The 4 Step Design Process
What does Data Warehouse look like Kimball Data Warehouse Back Room Staging Area (The kitchen) Data Presentation Area (The Dinning Room) Star Schema Corporate Information Factory (CIF) Normalized Dimensions Dual ETL Loading, Warehouse and Data Marts Operational Data Store (ODS) Hybrid Data Warehouse
Kimball Data Warehouse
Corporate Information Factory
Terminology Dimensions The time independent, textual and descriptive attributes by which users describe objects. Combining all the attributes including hierarchies, rollups and sub-references into a single dimension is denormalization. Often the “by” word in a query or report Not time dependent Facts Business Measurements Most Facts are Numeric Additive, Semi-Additive, Non-Additive Built from the lowest level of detail (grain) Very Efficient Time dependent
Star Schema Singe data (fact) table surrounded by multiple descriptive (dimension) tables
Benefits Performance (Integer relationships, natural partitioning, Single joins benefit SQL optimizer) Source system independence and multiple integration Supports Change management Usability/Simplicity (easy to read, interpret, join, calculate) Presentation (Consistency, Taxonomy, Labeling) Reuse (Conformed dimensions reduce redundancy, Role-plays)
Dimension Change Strategy Type 1: Is used when the old value of the attribute has no significance or can be discarded. Easy and Fast Type2: Partitions history so that fact tables properly reflect original values. Requires use of Surrogate Keys Causes table growth due to additional history rows Users must be aware of the added complexity Effective Dates used secondary to cleaner fact joins
Dimension Change Strategy Type 3: Additional attribute used to capture changes. Used less frequently then Type 1 or 2. Simultaneously supports two views of the world. Does not trend changes over time. Current and Prior or Current and Original Attributes Hybrid Type: Combination 1, 2 & 3 changes New attribute for predictable series (such as yearly changes) Type 2 changes with prior or original attributes included Expanded dimension table for durable key inclusion in fact Added complexity to users
Dimension Role Playing A single table that plays multiple roles (using views) to create synonym dimension attributes. Most common role playing dimension is the Date Dimension. i.e. separate role playing dimensions for order date and ship date.
Fact Table Types Characteristic Transaction Periodic Snapshot Accumulating Snapshot Time period Point in time Regular, predictable intervals Indeterminate time span, typically short-lived. Grain One row per transaction event One row per period One row per life Fact table loads Insert Insert and Update Fact row updates Not revisited Revisited whenever activity Date dimension Transaction date End-of-period date Multiple dates for multiple milestones Facts Transaction activity Performance for a predefined time interval. Performance over finite lifetime
Modeling Design Process Identify the Business Process Source of “measurements” Identify the Grain What does 1 row in the fact table represent or mean? Identify the Dimensions Descriptive context, true to the grain Identify the Facts Numeric additive measurements, true to the grain
Step 1 - Identify the Business Process This is a business activity typically tied to a source system. Not to be confused with a business department or function. An Orders dimensional model should support the activities of both Sales and Marketing. “If we establish departmentally bound dimensional models, we’ll inevitably duplicate data with different labels and terminology.”
Step 2 - Identify the Grain The level of detail associated with the fact table measurements. A critical step necessary before steps 3 and 4. Preferably it should be at the most atomic level possible. “How do you describe a single row in the fact table?”
Step 3 - Identify the Dimensions The list of all the discrete, text-like attributes that emanate from the fact table. They are the “by” words used to describe the requirements. Each dimension could be though of as an analytical “entry point” to the facts. “How do business people describe the data that results from the business process?”
Step 4 - Identify the Facts Must be true to the grain defined in step 2. Typical facts are numeric additive figures. Facts that belong to a different grain belong in a separate fact table. Facts are determined by answering the question, “What are we measuring?” Percentages and ratios, such as gross margin, are non-additive. The numerator and denominator should be stored in the fact table.
For More Information Articles, Design Tips and Newsletters http://www.kimballgroup.com Designing A Scalable DW/BI System http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?EventID=1032297070&EventCategory=4&culture=en-US&CountryCode=US Microsoft BI Using the Kimball Method http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?EventID=1032297084&EventCategory=4&culture=en-US&CountryCode=US Using SSIS to Populate a Kimball Method Data Warehouse http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?EventID=1032297072&EventCategory=5&culture=en-US&CountryCode=US