Ch 5 Procurement
- Examines blended and separate transaction schemas
- Slowly changing dimensions: Type 1, 2, 3, hybrids
Jan 2004 91.4904 Ron McFadyen

Procurement questions
- What materials are purchased most frequently?
- How are our vendors performing? Are they delivering on time?

Single schema approach
- Grain of one row per procurement transaction
- Blended: multiple transaction types in one fact table
- Procurement transaction fact table with dimensions: Date, Product, Vendor, Contract terms, Procurement transaction type

Procurement fact table
- Procurement transaction types: purchase orders, shipping notices, warehouse receipts, vendor payments, …
- Complexities:
  - Discounts apply to payments but not to the other transaction types
  - Many source systems would be involved
  - Many possible degenerate dimensions

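To make the blended grain concrete, here is a minimal sketch of a single procurement fact table held as a list of Python dicts. The column names, keys and amounts are all hypothetical; the point is only that every row carries a transaction type and that the discount fact is empty for every type except payments.

```python
from collections import Counter

# Hypothetical blended fact rows: one row per procurement transaction.
# All keys and amounts are illustrative, not taken from the slides.
proc_trans_fact = [
    {"date_key": 1, "product_key": 10, "vendor_key": 100,
     "trans_type": "Purchase order", "amount": 500.0, "discount": None},
    {"date_key": 2, "product_key": 10, "vendor_key": 100,
     "trans_type": "Warehouse receipt", "amount": 500.0, "discount": None},
    {"date_key": 3, "product_key": 10, "vendor_key": 100,
     "trans_type": "Vendor payment", "amount": 490.0, "discount": 10.0},
    {"date_key": 3, "product_key": 11, "vendor_key": 101,
     "trans_type": "Purchase order", "amount": 120.0, "discount": None},
]

# Queries against a blended table must filter or group on the transaction type.
rows_per_type = Counter(row["trans_type"] for row in proc_trans_fact)

# The discount fact is only populated on payment rows.
total_discount = sum(
    row["discount"]
    for row in proc_trans_fact
    if row["trans_type"] == "Vendor payment"
)
```

Facts that apply to only one transaction type, such as the discount here, are one sign that separate fact tables per transaction type may be the cleaner design.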
Multiple schema approach (Figure 4.2)
- Grain of one row per procurement transaction
- Separate fact tables: Purchase requisition, Purchase order, Shipping notices, Warehouse receipts, Vendor payments
- Dimensions shown: Date, Product, Vendor, Contract terms, Employee, Received condition, Discount taken

Managing dimensions
The data stored in dimensions originally comes from the legacy systems. Changes that occur in the legacy system must be reflected in the data warehouse:
- If a new product is being sold, it must be added to the product dimension in the warehouse.
- If a product changes, we need to reflect the changes in the product dimension in the warehouse.
- If a product is deleted, …

Managing dimensions
We’ll classify dimensions according to the frequency with which the source data changes:
- Slowly changing (Chapter 4 deals with slowly changing dimensions)
- Rapidly changing

Managing slowly changing dimensions
If a source field changes, and we consider the dimension a slowly changing dimension, we have three basic approaches:
- Type 1: Overwrite
- Type 2: Create a new dimension record
- Type 3: Add a dimension column

Type 1
- Overwrite the dimension record with the new values
- Used whenever the old value of the attribute is of no importance
- Definitely applicable if a value changes because of a correction

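A minimal sketch of a Type 1 change, assuming the dimension is held as a dict keyed by surrogate key. The table contents and the helper name `apply_type1` are hypothetical; the example corrects a misspelled description.

```python
# Hypothetical product dimension keyed by surrogate key.
product_dim = {
    1: {"product_id": "P-100", "description": "Widgit", "department": "Hardware"},
}

def apply_type1(dim, surrogate_key, attribute, new_value):
    """Overwrite the attribute in place; the previous value is not kept."""
    dim[surrogate_key][attribute] = new_value

# A spelling correction: the old value has no analytic importance.
apply_type1(product_dim, 1, "description", "Widget")
```

No new row is created and the surrogate key is unchanged, so existing facts now report under the corrected value.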
Type 2
- Create a new dimension record with a new surrogate key value
- Commonly used approach
- Said to partition history
- E.g. suppose a customer moves:
  - We may still want to associate earlier sales with the customer’s old address
  - In this case we create a new customer record with the customer’s new field values
  - This new customer record will have a new surrogate key
  - Old facts are not altered

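The customer-move example can be sketched as follows. This is an illustration under assumptions, not a definitive implementation: the tables are plain Python lists, the names (`customer_dim`, `sales_fact`, `apply_type2`) and the city values are hypothetical, and the most recent version of a customer is assumed to be the row with the highest surrogate key.

```python
# Hypothetical customer dimension and sales fact table.
customer_dim = [{"customer_key": 1, "customer_id": "C-7", "city": "Winnipeg"}]
sales_fact = [{"customer_key": 1, "amount": 40.0}]

def apply_type2(dim, customer_id, changes):
    """Add a new row with a new surrogate key; existing rows are untouched."""
    # Assumption: the latest version is the row with the highest surrogate key.
    current = max(
        (row for row in dim if row["customer_id"] == customer_id),
        key=lambda row: row["customer_key"],
    )
    new_key = max(row["customer_key"] for row in dim) + 1
    dim.append({**current, **changes, "customer_key": new_key})
    return new_key

# The customer moves; facts recorded after the move use the new key.
new_key = apply_type2(customer_dim, "C-7", {"city": "Brandon"})
sales_fact.append({"customer_key": new_key, "amount": 25.0})

# History is partitioned: each sale reports under the address in effect then.
city_of = {row["customer_key"]: row["city"] for row in customer_dim}
sales_by_city = {}
for fact in sales_fact:
    city = city_of[fact["customer_key"]]
    sales_by_city[city] = sales_by_city.get(city, 0.0) + fact["amount"]
```

The old fact row still points at surrogate key 1, so nothing in the fact table had to be rewritten.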
Type 3
- Create an ‘old’ field in the dimension record to store the immediately previous value

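A minimal sketch of a Type 3 change on a single dimension row. The `old_` column prefix, the helper name `apply_type3` and the department values are assumptions for illustration only.

```python
def apply_type3(row, attribute, new_value):
    """Move the current value into the 'old' column, then store the new value."""
    row["old_" + attribute] = row[attribute]
    row[attribute] = new_value

# Hypothetical product row with one 'old' column for department.
product = {"product_key": 1, "department": "Hardware", "old_department": None}

apply_type3(product, "department", "Tools")
apply_type3(product, "department", "Garden")
```

After the second change the row holds "Garden" and "Tools"; "Hardware" is gone, because only the immediately previous value is retained.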
Hybrids
Predictable set of changes
- E.g. sales organization re-assigns districts every year
- Sales Rep dimension holds 5 district values: the current assignment and the 4 previous ones
  - Current district
  - Prior1 district
  - Prior2 district
  - …

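The yearly re-assignment can be sketched as a shift across a fixed set of columns. The column names beyond Prior2 are extrapolated from the slide's pattern, and the helper `reassign_district` and the district values are hypothetical.

```python
# Assumed column names, newest first: the current district plus four prior ones.
DISTRICT_COLUMNS = [
    "current_district",
    "prior1_district",
    "prior2_district",
    "prior3_district",
    "prior4_district",
]

def reassign_district(rep, new_district):
    """Shift each district one column back, then store the new current value."""
    # Walk from the oldest column to the newest so no value is overwritten
    # before it has been copied; the oldest value drops off the end.
    for i in range(len(DISTRICT_COLUMNS) - 1, 0, -1):
        rep[DISTRICT_COLUMNS[i]] = rep[DISTRICT_COLUMNS[i - 1]]
    rep[DISTRICT_COLUMNS[0]] = new_district

# Hypothetical sales rep row; all district columns start empty.
rep = {"sales_rep_key": 1, **dict.fromkeys(DISTRICT_COLUMNS)}
rep["current_district"] = "North"

reassign_district(rep, "East")
reassign_district(rep, "South")
```

Because the number of columns is fixed in advance, this only suits changes that arrive on a predictable schedule.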
Hybrids
Unpredictable changes
- E.g. department assignment for a product
- Keep two values for department: the current value and the historically accurate value
- Combines Type 1, Type 2 and Type 3: known as Type 6
- Example on page 104

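One way to read the combined technique is sketched below; it is an interpretation under assumptions, not the book's example from page 104. Each product row carries two department columns (the Type 3 part), a change adds a new row with a new surrogate key (the Type 2 part), and the current value is overwritten on every row for that product (the Type 1 part). All names and values are hypothetical.

```python
# Hypothetical product dimension with two department columns per row.
product_dim = [
    {"product_key": 1, "product_id": "P-100",
     "historical_department": "Hardware", "current_department": "Hardware"},
]

def apply_type6(dim, product_id, new_department):
    """Apply a department change using the combined Type 1 + 2 + 3 approach."""
    new_key = max(row["product_key"] for row in dim) + 1
    # Type 1 part: overwrite the current value on all existing rows.
    for row in dim:
        if row["product_id"] == product_id:
            row["current_department"] = new_department
    # Type 2 part: add a new row whose historical value is the new department.
    dim.append({
        "product_key": new_key,
        "product_id": product_id,
        "historical_department": new_department,
        "current_department": new_department,
    })
    return new_key

new_key = apply_type6(product_dim, "P-100", "Tools")
```

Old facts can then be grouped either by the department in effect at the time or by the department as it stands today.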
Surrogate keys
- Used for all dimensions as primary keys
- Note that most dimensions should have a special row for “unknown”. Examine our sample Promotion dimension; there is one row for “No Promotion”
Smart keys
- A smart key is a single-attribute key whose components have meaning to end users
- Smart keys should be avoided; break the components out into distinct fields

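A minimal sketch of surrogate key assignment with a reserved "unknown" row. Reserving key 0 for that row, the class name `SurrogateKeyMap` and the promotion codes are all assumptions made for illustration.

```python
import itertools

UNKNOWN_KEY = 0  # assumption: surrogate key 0 is reserved for the "unknown" row

class SurrogateKeyMap:
    """Map production keys to meaningless sequential surrogate keys."""

    def __init__(self):
        self._next_key = itertools.count(1)
        self._keys = {}

    def assign(self, production_key):
        """Return the existing surrogate key, or allocate the next one."""
        if production_key not in self._keys:
            self._keys[production_key] = next(self._next_key)
        return self._keys[production_key]

    def lookup(self, production_key):
        """Return the surrogate key, or the 'unknown' row if there is none."""
        return self._keys.get(production_key, UNKNOWN_KEY)

promotion_keys = SurrogateKeyMap()
first = promotion_keys.assign("PROMO-A")
second = promotion_keys.assign("PROMO-B")
repeat = promotion_keys.assign("PROMO-A")
missing = promotion_keys.lookup(None)
```

With the fallback in place, a fact row whose promotion is missing still joins to a real dimension row instead of carrying a null foreign key.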
Production/Natural keys
- Do not use as keys in dimension tables; use surrogate keys instead
- Production, or legacy, keys could be reused on the operational side. The warehouse DBA has no control over those sorts of issues.
- If companies merge or are acquired, duplicates could arise.
- Surrogate keys get around the difficulties of using production keys in the warehouse.
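The merger case can be sketched as follows, assuming two source systems that happen to use the same vendor number for different vendors. The vendor data, the `source_system` column and the lookup structure are hypothetical.

```python
# Hypothetical vendor lists from two merged companies; the production key
# "V001" is used by both, but for different vendors.
company_a_vendors = [{"vendor_id": "V001", "name": "Acme Supply"}]
company_b_vendors = [{"vendor_id": "V001", "name": "Borealis Parts"}]

vendor_dim = []
key_lookup = {}  # (source system, production key) -> surrogate key

for source, vendors in (("A", company_a_vendors), ("B", company_b_vendors)):
    for vendor in vendors:
        surrogate_key = len(vendor_dim) + 1
        key_lookup[(source, vendor["vendor_id"])] = surrogate_key
        vendor_dim.append(
            {"vendor_key": surrogate_key, "source_system": source, **vendor}
        )
```

Each vendor gets its own warehouse key even though the production keys collide, and the production key survives as an ordinary attribute of the dimension row.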