Chapter 2: Retail Sales
Retail Sales is used to illustrate a first dimensional model.
Topics:
  Design process
  Case study: POS example
  Star schema
  Facts
  Dimensions
  Creating the schema in SQL Server
  Factless facts
  Degenerate dimensions
  Extensibility
  Snowflaking
  Outriggers
January 2004 91.4904 Ron McFadyen

The Dimensional Design Process
Four-step dimensional design process:
  1. Select the business process. Examples: invoicing, orders, inventory, general ledger, …
  2. Declare the grain: determine exactly what an individual fact table row represents. Examples: a line item on an order, a boarding pass to get on a flight, a student's course registration, a monthly snapshot for a bank account.
  3. Choose the dimensions that apply to the facts: what describes each fact? Examples: customer dimension, student dimension, course dimension, day dimension.
  4. Identify the numeric facts that appear in the rows of the fact table.

Case Study
  The business process: POS retail sales
  Grain of the fact table: individual line items on a POS transaction
  The dimensions: date, product, store, promotion
  The facts: sales quantity, cost dollar amount, sales dollar amount, gross profit dollar amount (derivable)

Case Study Schema
A typical drawing seen in practice, in articles, …
[Diagram: Sales facts in the center, connected to Date, Product, Store, and Promotion]

Case Study Schema in Peter Chen Notation
[Diagram: Sales facts is on the "n" side of a 1:n relationship with each of Product, Date, Store, Promotion, and Sales Transaction]
Note: Sales Transaction does not appear in the text's diagram. Later in the chapter it is discussed as a degenerate dimension.

Case Study Fact Table
Sales facts (all additive):
  Sales quantity
  Sales dollar amount
  Cost dollar amount
  Gross profit dollar amount
Facts can be described as additive, semi-additive, or non-additive.
  Additive: can be meaningfully summed across all dimensions
  Semi-additive: can be meaningfully summed across some dimensions, but not all
  Non-additive: cannot be meaningfully summed across any dimension
The text discusses some non-additive facts that might be included in such a fact table: gross margin, unit price.

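To see why a ratio such as gross margin is non-additive, the following minimal sketch compares a margin computed from summed additive facts with an average of per-row margins. The two line items are hypothetical values chosen for illustration; they do not come from the text.

```python
# Hypothetical line items: (sales_quantity, sales_dollar_amount, cost_dollar_amount)
rows = [
    (1, 10.0, 8.0),
    (3, 30.0, 15.0),
]

# Additive facts can simply be summed.
total_sales = sum(sales for _, sales, _ in rows)
total_cost = sum(cost for _, _, cost in rows)
total_gross_profit = total_sales - total_cost

# Gross margin is a ratio, so it is computed from the summed components.
margin_from_totals = total_gross_profit / total_sales

# Averaging the per-row margins instead gives a different (misleading) number.
row_margins = [(sales - cost) / sales for _, sales, cost in rows]
average_of_row_margins = sum(row_margins) / len(row_margins)

print(margin_from_totals, average_of_row_margins)
```

The two printed numbers differ because the second row carries three times the sales dollars of the first, which a simple average of ratios ignores.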
Case Study Fact Table
The physical table, Sales facts:
  Date key (FK)
  Product key (FK)
  Store key (FK)
  Promotion key (FK)
  POS Transaction Number (degenerate dimension)
  Sales quantity
  Sales dollar amount
  Cost dollar amount
  Gross profit dollar amount
(The diagram marks the primary key columns with a PK bracket.)

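The physical design above might be created along the following lines. This is only a sketch: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for SQL Server, the table and column names are assumptions derived from the slide labels, the dimension tables are cut down to one descriptive column each, and the composite primary key (the four foreign keys plus the transaction number) is one plausible reading of the slide's PK bracket.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim      (date_key      INTEGER PRIMARY KEY, full_date TEXT);
CREATE TABLE product_dim   (product_key   INTEGER PRIMARY KEY, product_description TEXT);
CREATE TABLE store_dim     (store_key     INTEGER PRIMARY KEY, store_name TEXT);
CREATE TABLE promotion_dim (promotion_key INTEGER PRIMARY KEY, promotion_name TEXT);

-- REFERENCES documents the foreign keys; SQLite only enforces them
-- when PRAGMA foreign_keys = ON is set.
CREATE TABLE sales_facts (
    date_key                   INTEGER NOT NULL REFERENCES date_dim(date_key),
    product_key                INTEGER NOT NULL REFERENCES product_dim(product_key),
    store_key                  INTEGER NOT NULL REFERENCES store_dim(store_key),
    promotion_key              INTEGER NOT NULL REFERENCES promotion_dim(promotion_key),
    pos_transaction_number     TEXT    NOT NULL,  -- degenerate dimension
    sales_quantity             INTEGER,
    sales_dollar_amount        REAL,
    cost_dollar_amount         REAL,
    gross_profit_dollar_amount REAL,
    -- Assumed composite key; the slide only shows a PK bracket.
    PRIMARY KEY (date_key, product_key, store_key, promotion_key,
                 pos_transaction_number)
);

-- One hypothetical row per dimension and one fact row that references them.
INSERT INTO date_dim      VALUES (1, '2004-01-15');
INSERT INTO product_dim   VALUES (1, 'Hypothetical product');
INSERT INTO store_dim     VALUES (1, 'Hypothetical store');
INSERT INTO promotion_dim VALUES (1, 'Hypothetical promotion');
INSERT INTO sales_facts   VALUES (1, 1, 1, 1, 'T100', 2, 5.0, 3.0, 2.0);
""")

fact_columns = [row[1] for row in conn.execute("PRAGMA table_info(sales_facts)")]
fact_row_count = conn.execute("SELECT COUNT(*) FROM sales_facts").fetchone()[0]
print(fact_columns, fact_row_count)
```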
Case Study Date Dimension
  Very descriptive
  Easy to set criteria for queries
  Easy to get headings for reports
  One row for each day (this is the grain of the Date dimension)
  PK is a surrogate key
  Used in every star schema
  Not normalized
  Attribute hierarchies are present:
    Calendar week, calendar month, calendar year
    Fiscal week, fiscal month, fiscal year
Attributes:
  Date key (PK)
  Date
  Full date description
  Day of week
  Day number in epoch
  Week number in epoch
  Month number in epoch
  Day number in calendar month
  …
  Last day in week indicator
  Holiday indicator
  Weekday indicator
  SQL date stamp
  …

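Because the Date dimension has one row per day, its rows can be generated ahead of time rather than extracted from a source system. The sketch below builds a few of the listed attributes for 2004; the function name, dictionary keys, and the choice of a simple running integer as the surrogate key are assumptions made for illustration.

```python
from datetime import date, timedelta

# Hard-coded names avoid locale-dependent formatting.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def build_date_rows(start, number_of_days):
    """Return one dictionary per calendar day, starting at `start`."""
    rows = []
    for offset in range(number_of_days):
        day = start + timedelta(days=offset)
        rows.append({
            "date_key": offset + 1,            # surrogate key: a running integer
            "date": day.isoformat(),
            "day_of_week": DAY_NAMES[day.weekday()],
            "day_number_in_calendar_month": day.day,
            "calendar_month": day.month,
            "calendar_year": day.year,
            "weekday_indicator": "Weekday" if day.weekday() < 5 else "Weekend",
        })
    return rows

# 2004 is a leap year, so it needs 366 rows.
date_rows = build_date_rows(date(2004, 1, 1), 366)
print(date_rows[0])
```

Attributes such as fiscal periods or holiday indicators depend on the business and would need extra rules not shown here.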
Case Study Product Dimension
  Very descriptive
  Easy to set criteria for queries
  Easy to get headings for reports
  One row for each product for sale, or ever sold, by the company
  PK is a surrogate key. We do not use the operational PK here: over time it may not be unique (the business may re-use keys, companies merge, …)
  Not normalized
  An attribute hierarchy: brand, category, department
Attributes:
  Product key (PK)
  Product description
  SKU number
  Brand description
  Category description
  Department description
  Package type description
  Package size
  Fat content
  Diet type
  Weight
  Weight units of measure
  …

Case Study Drilling Down/Up
Run a query to generate:
  Department, sales amount, sales quantity
Now add another attribute at a "lower" level, such as brand:
  Department, brand, sales amount, sales quantity
What is meant by row-headers (in the text)?
Product attributes:
  Product key (PK)
  Product description
  SKU number (natural key)
  Brand description
  Category description
  Department description
  Package type description
  Package size
  Fat content
  Diet type
  Weight
  Weight units of measure
  …

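The two queries described above can be sketched as follows. SQLite (via Python's sqlite3 module) is used as a stand-in for SQL Server, the table and column names are assumptions based on the attribute lists, and the rows are hypothetical. Drilling down amounts to adding the brand attribute to both the SELECT list and the GROUP BY clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (
    product_key            INTEGER PRIMARY KEY,
    brand_description      TEXT,
    department_description TEXT
);
CREATE TABLE sales_facts (
    product_key         INTEGER,
    sales_quantity      INTEGER,
    sales_dollar_amount REAL
);
-- Hypothetical data
INSERT INTO product_dim VALUES
    (1, 'Brand A', 'Bakery'),
    (2, 'Brand B', 'Bakery'),
    (3, 'Brand C', 'Frozen Foods');
INSERT INTO sales_facts VALUES (1, 2, 5.0), (1, 1, 2.5), (2, 4, 8.0), (3, 1, 6.0);
""")

# Summary level: one row per department.
by_department = conn.execute("""
    SELECT p.department_description,
           SUM(f.sales_dollar_amount),
           SUM(f.sales_quantity)
    FROM sales_facts f
    JOIN product_dim p ON p.product_key = f.product_key
    GROUP BY p.department_description
    ORDER BY p.department_description
""").fetchall()

# Drill down: the same query with brand added as a second grouping attribute.
by_department_and_brand = conn.execute("""
    SELECT p.department_description,
           p.brand_description,
           SUM(f.sales_dollar_amount),
           SUM(f.sales_quantity)
    FROM sales_facts f
    JOIN product_dim p ON p.product_key = f.product_key
    GROUP BY p.department_description, p.brand_description
    ORDER BY p.department_description, p.brand_description
""").fetchall()

print(by_department)
print(by_department_and_brand)
```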
Case Study Store Dimension
  Very descriptive
  Easy to set criteria for queries
  Easy to get headings for reports
  One row for each store
  PK is a surrogate key
  Not normalized
  Attribute hierarchies: city, county, state, zip, district, region
  How does the text handle the "First open date" attribute?
Attributes:
  Store key (PK)
  Store name
  Store number (natural key)
  Store street address
  Store city
  Store county
  Store state
  Store zip code
  …
  Total square footage
  First open date

Case Study Promotion Dimension
  Very descriptive
  Easy to set criteria for queries
  Easy to get headings for reports
  One row for each promotion
  PK is a surrogate key
  Need a special row for "no promotion in effect". Why?
Attributes:
  Promotion key (PK)
  Promotion name
  Price reduction type
  Promotion media type
  Ad type
  …

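One way to explore the "Why?" question is to compare a fact row that points at a special "no promotion in effect" row with one that carries a NULL promotion key. The sketch below uses SQLite through Python's sqlite3 module; the table names, the key value 0 for the special row, and all data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE promotion_dim (promotion_key INTEGER PRIMARY KEY, promotion_name TEXT);
CREATE TABLE sales_facts   (promotion_key INTEGER, sales_dollar_amount REAL);

INSERT INTO promotion_dim VALUES
    (0, 'No promotion in effect'),      -- the special row
    (1, 'Hypothetical coupon promotion');

-- The first sale uses the special row, the last one a NULL key instead.
INSERT INTO sales_facts VALUES (0, 10.0), (1, 4.0), (NULL, 7.0);
""")

# Total taken straight from the fact table.
fact_total = conn.execute(
    "SELECT SUM(sales_dollar_amount) FROM sales_facts"
).fetchone()[0]

# Total after an inner join to the promotion dimension:
# the NULL-keyed sale finds no matching dimension row and is dropped.
joined_total = conn.execute("""
    SELECT SUM(f.sales_dollar_amount)
    FROM sales_facts f
    JOIN promotion_dim p ON p.promotion_key = f.promotion_key
""").fetchone()[0]

print(fact_total, joined_total)
```

The sale that references the special row survives the join and can be reported under "No promotion in effect", while the NULL-keyed sale silently disappears from any query that joins to the dimension.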
Factless Fact Tables
A factless fact table has no measurement metrics. These types of fact tables record occurrences of events.
In retail sales, one might ask "What products were on promotion, but did not sell?" The Sales facts table only records sales, and so it alone is not enough to answer this question.

Factless Fact Tables
Consider:
[Diagram: Coverage in the center, connected to Date, Product, Store, and Promotion]
The Coverage table has one row for each promotion of a product at some store on a certain date.
How many rows are in Coverage? How many rows are in Sales facts?

Factless Fact Tables
Consider:
[Diagram: Coverage in the center, connected to Date, Product, Store, and Promotion]
What is the SQL to determine: "What products were on promotion, but did not sell?"

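One possible answer, offered as a sketch rather than the text's own solution, is to look for Coverage rows that have no matching Sales facts row. The example runs the SQL in SQLite through Python's sqlite3 module; the table names, column names, and data are assumptions, and a set difference (EXCEPT) over the four keys would be an alternative formulation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_description TEXT);
CREATE TABLE coverage    (date_key INTEGER, product_key INTEGER,
                          store_key INTEGER, promotion_key INTEGER);
CREATE TABLE sales_facts (date_key INTEGER, product_key INTEGER,
                          store_key INTEGER, promotion_key INTEGER,
                          sales_quantity INTEGER);

-- Hypothetical data: two products promoted on the same day in the same store,
-- but only the first one sold.
INSERT INTO product_dim VALUES (1, 'Product that sold'), (2, 'Product that did not sell');
INSERT INTO coverage    VALUES (1, 1, 1, 1), (1, 2, 1, 1);
INSERT INTO sales_facts VALUES (1, 1, 1, 1, 3);
""")

promoted_but_unsold = conn.execute("""
    SELECT DISTINCT p.product_description
    FROM coverage c
    JOIN product_dim p ON p.product_key = c.product_key
    WHERE NOT EXISTS (
        SELECT 1
        FROM sales_facts f
        WHERE f.date_key      = c.date_key
          AND f.product_key   = c.product_key
          AND f.store_key     = c.store_key
          AND f.promotion_key = c.promotion_key
    )
    ORDER BY p.product_description
""").fetchall()

print(promoted_but_unsold)
```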
Degenerate Dimensions
A degenerate dimension is one where the only attribute of interest is the natural key. As a result, there is no physical dimension table in the data warehouse.
Example: the transaction number in Retail Sales. Transaction can be shown as a dimension in the logical design, but there is no Transaction table in the physical design. The fact table has a transaction number (instead of a surrogate key to a Transaction dimension).

Degenerate Dimensions
Sales facts:
  Date key (FK)
  Product key (FK)
  Store key (FK)
  Promotion key (FK)
  POS Transaction Number (DD, the degenerate dimension)
  Sales quantity
  Sales dollar amount
  Cost dollar amount
  Gross profit dollar amount
(The diagram marks the primary key columns with a PK bracket.)

Degenerate Dimensions
Very common in star schema designs: orders, invoices, …
In many systems where there are "line items", there is some interesting operational key that can tie the facts back to the operational systems: order#, invoice#, …

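Even without a dimension table behind it, a degenerate dimension is still useful for grouping. The sketch below, with assumed column names and hypothetical rows, uses the POS transaction number to reassemble line items into market baskets (SQLite via Python's sqlite3 module stands in for SQL Server).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_facts (
    pos_transaction_number TEXT,     -- degenerate dimension: no table to join to
    product_key            INTEGER,
    sales_quantity         INTEGER,
    sales_dollar_amount    REAL
);
-- Hypothetical line items from two POS transactions
INSERT INTO sales_facts VALUES
    ('T100', 1, 2, 5.0),
    ('T100', 2, 1, 3.0),
    ('T101', 1, 1, 2.5);
""")

# Group line items by transaction number: line count and basket total.
baskets = conn.execute("""
    SELECT pos_transaction_number,
           COUNT(*)                 AS line_items,
           SUM(sales_dollar_amount) AS basket_total
    FROM sales_facts
    GROUP BY pos_transaction_number
    ORDER BY pos_transaction_number
""").fetchall()

print(baskets)
```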
Extensibility of Star Schema Designs
In many cases we can add, without changing existing applications:
  New dimension tables
  New fact tables
  New aggregates
  New dimension attributes
  New measurement metrics

Extensibility of Star Schema Designs
In many cases we can add dimensions to an existing design and database.
Consider Retail Sales and the new dimensions Frequent Shopper, Clerk, and Time of Day:
  Is the Frequent Shopper concept valid?
  Is knowing who the clerk is reasonable?
  Do we know the time of day for a sale?
Any way of describing a fact that is single-valued for all existing facts in the fact table could become a dimension.
What is required in the database environment to accomplish this?

Extensibility of Star Schema Designs
What is required in the database environment to extend a star schema with a new dimension?
  Alter table …: may be complex; at the least we are adding an attribute
  Create table …: create a new dimension
  Load the new dimension …
  Populate the new foreign key in the altered fact table
  Create an ETL process for the new dimension
  Modify the ETL process for the fact table

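The database steps listed above can be sketched for a new Clerk dimension. Everything named here is an assumption for illustration: the clerk_dim table, the staging_transaction_clerk table that maps each transaction to a clerk, and the data. SQLite (via Python's sqlite3 module) stands in for SQL Server, where the ALTER TABLE step may involve more work. The ETL changes are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_facts (
    pos_transaction_number TEXT,
    product_key            INTEGER,
    sales_dollar_amount    REAL
);
INSERT INTO sales_facts VALUES ('T100', 1, 5.0), ('T100', 2, 3.0), ('T101', 1, 2.5);
""")

# An existing application query, run before the schema is extended.
existing_query = "SELECT SUM(sales_dollar_amount) FROM sales_facts"
total_before = conn.execute(existing_query).fetchone()[0]

conn.executescript("""
-- Create and load the new dimension.
CREATE TABLE clerk_dim (clerk_key INTEGER PRIMARY KEY, clerk_name TEXT);
INSERT INTO clerk_dim VALUES (1, 'Clerk A'), (2, 'Clerk B');

-- Alter the fact table: add the new foreign key column.
ALTER TABLE sales_facts ADD COLUMN clerk_key INTEGER;

-- Populate the new foreign key from a hypothetical staging table.
CREATE TABLE staging_transaction_clerk (pos_transaction_number TEXT, clerk_key INTEGER);
INSERT INTO staging_transaction_clerk VALUES ('T100', 1), ('T101', 2);
UPDATE sales_facts
SET clerk_key = (
    SELECT s.clerk_key
    FROM staging_transaction_clerk s
    WHERE s.pos_transaction_number = sales_facts.pos_transaction_number
);
""")

# The existing query still runs unchanged; a new query can use the new dimension.
total_after = conn.execute(existing_query).fetchone()[0]
sales_by_clerk = conn.execute("""
    SELECT c.clerk_name, SUM(f.sales_dollar_amount)
    FROM sales_facts f
    JOIN clerk_dim c ON c.clerk_key = f.clerk_key
    GROUP BY c.clerk_name
    ORDER BY c.clerk_name
""").fetchall()

print(total_before, total_after, sales_by_clerk)
```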
Snowflaking
If a dimension is normalized, we say it is a snowflaked design.
Consider the Product dimension, and suppose we have the following functional dependencies:

Snowflaking
Product dimension attributes:
  Product key
  SKU number
  Product description
  Brand key
  Brand description
  Category key
  Category description
  Department key
  Department description
The Product dimension is in _____________ normal form.

Snowflaking
[Diagram: Sales facts connected to Date, Product, Store, and Promotion; Product is linked to Brand, Brand to Category, and Category to Department]
Now, the Product dimension is in _____________ normal form.
The general problem is that snowflaking complicates the user's view of the data, complicates the underlying SQL, defeats the usefulness of bit vectors, saves only a minimal amount of space, and slows query execution.

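The point about more complicated SQL can be illustrated by writing the same department-level report against both designs. In this sketch the flat table product_dim represents the star design and the tables product_sf, brand, category, and department represent the snowflaked one; all names and rows are assumptions, and SQLite (via Python's sqlite3 module) stands in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star design: one denormalized product dimension.
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, brand_description TEXT,
                          category_description TEXT, department_description TEXT);

-- Snowflaked design: the same information spread over four tables.
CREATE TABLE department (department_key INTEGER PRIMARY KEY, department_description TEXT);
CREATE TABLE category   (category_key INTEGER PRIMARY KEY, category_description TEXT,
                         department_key INTEGER);
CREATE TABLE brand      (brand_key INTEGER PRIMARY KEY, brand_description TEXT,
                         category_key INTEGER);
CREATE TABLE product_sf (product_key INTEGER PRIMARY KEY, brand_key INTEGER);

CREATE TABLE sales_facts (product_key INTEGER, sales_dollar_amount REAL);

-- Hypothetical data, loaded consistently into both designs.
INSERT INTO product_dim VALUES (1, 'Brand A', 'Bread', 'Bakery'),
                               (2, 'Brand B', 'Ice cream', 'Frozen Foods');
INSERT INTO department  VALUES (1, 'Bakery'), (2, 'Frozen Foods');
INSERT INTO category    VALUES (1, 'Bread', 1), (2, 'Ice cream', 2);
INSERT INTO brand       VALUES (1, 'Brand A', 1), (2, 'Brand B', 2);
INSERT INTO product_sf  VALUES (1, 1), (2, 2);
INSERT INTO sales_facts VALUES (1, 5.0), (1, 2.5), (2, 6.0);
""")

# Star design: one join reaches the department description.
star_result = conn.execute("""
    SELECT p.department_description, SUM(f.sales_dollar_amount)
    FROM sales_facts f
    JOIN product_dim p ON p.product_key = f.product_key
    GROUP BY p.department_description
    ORDER BY p.department_description
""").fetchall()

# Snowflaked design: four joins are needed for the same answer.
snowflake_result = conn.execute("""
    SELECT d.department_description, SUM(f.sales_dollar_amount)
    FROM sales_facts f
    JOIN product_sf p ON p.product_key    = f.product_key
    JOIN brand b      ON b.brand_key      = p.brand_key
    JOIN category c   ON c.category_key   = b.category_key
    JOIN department d ON d.department_key = c.department_key
    GROUP BY d.department_description
    ORDER BY d.department_description
""").fetchall()

print(star_result, snowflake_result)
```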
Outriggers
[Diagram: Sales facts connected to Date, Product, Store, and Promotion; Store is also connected to Date]
Date is called an outrigger table for Store. Note that there is only one Date table.
Store was shown to have two dates: First open date and Last remodel date. Instead of being attribute values from the Date domain, these can be foreign keys to the Date dimension.

Outriggers
[Diagram: Sales facts connected to Date, Product, Store, and Promotion; Store is also connected to Date as an outrigger. The two Date boxes are the same table.]
A fact will join to one row of Date and one row of Store. That Store row in turn joins to a row of Date, and these two rows of Date are usually different rows.
Outriggers are an acceptable variation on normalized dimensions. They are justified because they add a great deal to the expressive capability of queries.

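A query that uses the outrigger joins the single Date table twice under two aliases, once from the fact row and once from the Store row. The following sketch assumes a first_open_date_key column on the Store dimension and uses hypothetical data; SQLite (via Python's sqlite3 module) stands in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, full_date TEXT, calendar_year INTEGER);
CREATE TABLE store_dim (
    store_key           INTEGER PRIMARY KEY,
    store_name          TEXT,
    first_open_date_key INTEGER REFERENCES date_dim(date_key)  -- outrigger link
);
CREATE TABLE sales_facts (date_key INTEGER, store_key INTEGER, sales_dollar_amount REAL);

-- Hypothetical data: a store opened in 1995 with two sales in 2004.
INSERT INTO date_dim    VALUES (1, '1995-06-01', 1995), (2, '2004-01-15', 2004);
INSERT INTO store_dim   VALUES (1, 'Hypothetical store', 1);
INSERT INTO sales_facts VALUES (2, 1, 20.0), (2, 1, 5.0);
""")

# The same physical table, date_dim, plays two roles under two aliases.
sales_by_sale_year_and_open_year = conn.execute("""
    SELECT sale_date.calendar_year,
           open_date.calendar_year,
           SUM(f.sales_dollar_amount)
    FROM sales_facts f
    JOIN date_dim  sale_date ON sale_date.date_key = f.date_key
    JOIN store_dim s         ON s.store_key        = f.store_key
    JOIN date_dim  open_date ON open_date.date_key = s.first_open_date_key
    GROUP BY sale_date.calendar_year, open_date.calendar_year
""").fetchall()

print(sales_by_sale_year_and_open_year)
```

Because the open date is a key into the full Date dimension, a query can constrain or group on any Date attribute of the store's opening day, such as its fiscal year, without storing those attributes on Store.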