Alternative Database Topology: The Star Schema
D.W., O.L.A.P., Data Mining
The Atomic Schema

Tables and columns:
  Customer: Customer ID, Status Date, Cust Addr, State, Cust ZIP Code, Customer Type, Customer Status, ...
  Cust Purchases: Customer ID, Activity Date, Product Code, Product Name, Sales Rep ID, Qty Purchased, Total Dollars, Promotion Flag
  Product Ref: Product Code, ProdRef Eff. Date, ProdRef End Date, Product Name, Unit Price, Product Category, Product Type, Product Sub Type
  Cust Averages: Customer ID, Cust Average Date, Cust Avg. End Date, Cust Avg. Rev., Cust Longevity
  Outlet Reference: Store ID, Store Name, Store Location, Distribution Channel
  Sales Rep Ref: Sales Rep ID, Sales Person Name, Store ID

- Atomic-level data structured to support a wide variety of informational requirements across the organization
- As a result, atomic data too normalized to be easily accessed or understood by most end users
- Data consistently needs to be aggregated into the same categories (dimensions)
- Multidimensional processing capabilities provide users with tremendous flexibility for most of their analysis requirements
The Star Schema

  Fact Table: Dimension Key 1, Dimension Key 2, Dimension Key 3, Dimension Key 4, Fact 1, Fact 2, Fact 3, Fact 4, ..., Fact n
  Dimension Table 1: Dimension Key 1, Description 1, Aggregation Level 1.1, Aggregation Level 1.2, ..., Aggregation Level 1.n
  Dimension Table 2: Dimension Key 2, Description 2, Aggregation Level 2.1, Aggregation Level 2.2, ..., Aggregation Level 2.n
  Dimension Table 3: Dimension Key 3, Description 3, Aggregation Level 3.1, Aggregation Level 3.2, ..., Aggregation Level 3.n
  Dimension Table 4: Dimension Key 4, Description 4, Aggregation Level 4.1, Aggregation Level 4.2, ..., Aggregation Level 4.n

Each dimension table joins to the Fact Table on its dimension key.
Dimension Table

  Dimension Table 1: Dimension Key 1, Description 1, Aggregation Level 1.1, Aggregation Level 1.2, ..., Aggregation Level 1.n

- Describes the data that has been organized in the Fact Table
- Key should either be the most detailed aggregation level necessary (e.g. country vs. county), if possible, or...
- Surrogate keys may be necessary, but will decrease the natural value of the key
- Manageable number of aggregation levels
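To make the idea of aggregation levels concrete, here is a minimal sketch using Python's built-in sqlite3 module. It borrows the Customer Location dimension (ZIP code, city, state, country) from the Purchases example in this deck; the two-column fact table named Purchases, the rollup helper, and every sample row are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension: the key is the most detailed level; other columns are aggregation levels.
CREATE TABLE Customer_Location (
    Cust_ZIP_Code TEXT PRIMARY KEY, City TEXT, State TEXT, Country TEXT);
-- Hypothetical, stripped-down fact table keyed by the dimension key.
CREATE TABLE Purchases (
    Cust_ZIP_Code TEXT REFERENCES Customer_Location (Cust_ZIP_Code),
    Total_Dollars REAL);
-- Invented sample rows.
INSERT INTO Customer_Location VALUES
    ('02101', 'Boston',   'MA', 'USA'),
    ('01002', 'Amherst',  'MA', 'USA'),
    ('10001', 'New York', 'NY', 'USA');
INSERT INTO Purchases VALUES
    ('02101', 10.0), ('02101', 5.0), ('01002', 20.0), ('10001', 40.0);
""")

LEVELS = {"City", "State", "Country"}

def rollup(level):
    """Sum Total_Dollars at one aggregation level of the dimension."""
    if level not in LEVELS:  # whitelist, since the column name is interpolated
        raise ValueError(f"unknown aggregation level: {level}")
    sql = (
        f"SELECT d.{level}, SUM(f.Total_Dollars) "
        "FROM Purchases f "
        "JOIN Customer_Location d ON d.Cust_ZIP_Code = f.Cust_ZIP_Code "
        f"GROUP BY d.{level} ORDER BY d.{level}"
    )
    return conn.execute(sql).fetchall()

print(rollup("State"))    # [('MA', 35.0), ('NY', 40.0)]
print(rollup("Country"))  # [('USA', 75.0)]
```

The same fact rows answer questions at every level; only the GROUP BY column changes, which is why the number of aggregation levels per dimension should stay manageable.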
Fact Table

  Fact Table: Dimension Key 1, Dimension Key 2, Dimension Key 3, Dimension Key 4, Fact 1, Fact 2, Fact 3, Fact 4, ..., Fact n

- Quantifies the data that has been described by the Dimension Tables
- Key made up of unique combination of values of dimension keys
  - ALWAYS contains date or date dimension
- Fact values should be additive
  - Aggregations of quantities or amounts from atomic level
  - No percentages or ratios
- May be non-additive, time-variant data
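The "no percentages or ratios" rule can be illustrated with a small sketch, again using Python's built-in sqlite3 module. The table name fact and both rows are invented for the example. The point is that a ratio derived from summed additive facts is correct, while averaging per-row ratios is not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact (week TEXT, product TEXT, "
    "total_quantity INTEGER, total_dollars REAL)"
)
conn.executemany(
    "INSERT INTO fact VALUES (?, ?, ?, ?)",
    [
        ("1996-01-06", "A", 10, 100.0),  # 10.00 per unit
        ("1996-01-13", "A", 90, 450.0),  # 5.00 per unit
    ],
)

# Additive facts can be summed first, then the ratio derived at query time.
total_qty, total_dollars = conn.execute(
    "SELECT SUM(total_quantity), SUM(total_dollars) FROM fact"
).fetchone()
price_from_sums = total_dollars / total_qty

# Averaging per-row ratios ignores how many units each row represents.
(avg_of_ratios,) = conn.execute(
    "SELECT AVG(total_dollars / total_quantity) FROM fact"
).fetchone()

print(price_from_sums)  # 5.5
print(avg_of_ratios)    # 7.5
```

Storing only the additive quantities and amounts keeps every roll-up correct; ratios are then computed from the aggregated values.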
For Example: Purchases 1

Fact table:
  Purchases 1
    Keys: Sales Rep ID, Product Code, Cust ZIP Code, Customer Type, Week Ending Date
    Facts: Days of Activity, Unit Price, Total Quantity, Total Dollars, Returned Qty, Returned Dollars, Promotion Qty

Dimension tables:
  Customer Location: Cust ZIP Code, City, State/Province, Country
  Selling Responsibility: Sales Rep ID, Sales Rep Name, Store ID, Store Name, Store Location, Sales Channel
  Customer Type: Customer Type, Cust Type Desc
  Product: Product Code, Product Name, Prod. Category, Product Type, Prod Sub Type
  Date Information: Week Ending Date, Month, Quarter, Year
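One way this example could be declared is sketched below with Python's built-in sqlite3 module. Table names follow the underscore spelling used in the deck's query; the column types, the expanded spellings such as Product_Category and State_Province, and the foreign-key clauses are assumptions, since the slide gives only column names.

```python
import sqlite3

DDL = """
CREATE TABLE Customer_Location (
    Cust_ZIP_Code TEXT PRIMARY KEY, City TEXT, State_Province TEXT, Country TEXT);
CREATE TABLE Selling_Responsibility (
    Sales_Rep_ID INTEGER PRIMARY KEY, Sales_Rep_Name TEXT, Store_ID INTEGER,
    Store_Name TEXT, Store_Location TEXT, Sales_Channel TEXT);
CREATE TABLE Customer_Type (
    Customer_Type TEXT PRIMARY KEY, Cust_Type_Desc TEXT);
CREATE TABLE Product (
    Product_Code TEXT PRIMARY KEY, Product_Name TEXT, Product_Category TEXT,
    Product_Type TEXT, Product_Sub_Type TEXT);
CREATE TABLE Date_Information (
    Week_Ending_Date TEXT PRIMARY KEY, Month TEXT, Quarter TEXT, Year TEXT);

-- Fact table: the key is the combination of all dimension keys, date included.
CREATE TABLE Purchases_1 (
    Sales_Rep_ID     INTEGER REFERENCES Selling_Responsibility (Sales_Rep_ID),
    Product_Code     TEXT    REFERENCES Product (Product_Code),
    Cust_ZIP_Code    TEXT    REFERENCES Customer_Location (Cust_ZIP_Code),
    Customer_Type    TEXT    REFERENCES Customer_Type (Customer_Type),
    Week_Ending_Date TEXT    REFERENCES Date_Information (Week_Ending_Date),
    Days_of_Activity INTEGER, Unit_Price REAL, Total_Quantity INTEGER,
    Total_Dollars REAL, Returned_Qty INTEGER, Returned_Dollars REAL,
    Promotion_Qty INTEGER,
    PRIMARY KEY (Sales_Rep_ID, Product_Code, Cust_ZIP_Code,
                 Customer_Type, Week_Ending_Date));
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk);
# pk > 0 marks a column that belongs to the primary key.
key_columns = [
    row[1] for row in conn.execute("PRAGMA table_info(Purchases_1)") if row[5] > 0
]
print(key_columns)
```

The printed list shows the fact table key made up entirely of dimension keys, with Week_Ending_Date as the date component.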
Star Schema Query

    SELECT E.Month, B.Customer_Type, C.Product_Type, D.Store_Location,
           SUM(A.Total_Quantity)
    FROM   Purchases_1 A, Customer_Type B, Product C,
           Selling_Responsibility D, Date_Information E
    WHERE  B.Customer_Type    = A.Customer_Type
      AND  C.Product_Code     = A.Product_Code
      AND  D.Sales_Rep_ID     = A.Sales_Rep_ID
      AND  E.Week_Ending_Date = A.Week_Ending_Date
      AND  E.Year             = '1996'
      AND  C.Product_Category = 'V'
    GROUP BY E.Month, B.Customer_Type, C.Product_Type, D.Store_Location;
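The query can be tried end to end with Python's built-in sqlite3 module. The sketch below is an assumption-laden illustration: the tables carry only the columns the query touches, every row is invented sample data, and the joins are written with explicit JOIN ... ON syntax, which is equivalent to the comma-separated form on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer_Type (Customer_Type TEXT PRIMARY KEY, Cust_Type_Desc TEXT);
CREATE TABLE Product (Product_Code TEXT PRIMARY KEY, Product_Type TEXT,
                      Product_Category TEXT);
CREATE TABLE Selling_Responsibility (Sales_Rep_ID INTEGER PRIMARY KEY,
                                     Store_Location TEXT);
CREATE TABLE Date_Information (Week_Ending_Date TEXT PRIMARY KEY,
                               Month TEXT, Year TEXT);
CREATE TABLE Purchases_1 (Sales_Rep_ID INTEGER, Product_Code TEXT,
                          Customer_Type TEXT, Week_Ending_Date TEXT,
                          Total_Quantity INTEGER);

-- Invented sample rows.
INSERT INTO Customer_Type VALUES ('R', 'Retail');
INSERT INTO Product VALUES ('P1', 'T1', 'V'), ('P2', 'T2', 'X');
INSERT INTO Selling_Responsibility VALUES (1, 'East');
INSERT INTO Date_Information VALUES
    ('1995-12-30', '1995-12', '1995'),
    ('1996-01-06', '1996-01', '1996'),
    ('1996-01-13', '1996-01', '1996');
INSERT INTO Purchases_1 VALUES
    (1, 'P1', 'R', '1996-01-06', 5),
    (1, 'P1', 'R', '1996-01-13', 7),
    (1, 'P2', 'R', '1996-01-06', 100),  -- filtered out: category is not 'V'
    (1, 'P1', 'R', '1995-12-30', 40);   -- filtered out: year is not 1996
""")

QUERY = """
SELECT E.Month, B.Customer_Type, C.Product_Type, D.Store_Location,
       SUM(A.Total_Quantity)
FROM Purchases_1 A
JOIN Customer_Type B          ON B.Customer_Type    = A.Customer_Type
JOIN Product C                ON C.Product_Code     = A.Product_Code
JOIN Selling_Responsibility D ON D.Sales_Rep_ID     = A.Sales_Rep_ID
JOIN Date_Information E       ON E.Week_Ending_Date = A.Week_Ending_Date
WHERE E.Year = ? AND C.Product_Category = ?
GROUP BY E.Month, B.Customer_Type, C.Product_Type, D.Store_Location
"""

rows = conn.execute(QUERY, ("1996", "V")).fetchall()
print(rows)  # [('1996-01', 'R', 'T1', 'East', 12)]
```

The two January 1996 rows for product P1 are summed into a single monthly total, while the other two rows are removed by the filters on the dimension tables.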
Answer: Distinct Time Period Fact Tables

  Weekly fact table: Date, D1, D2, D3, D4
  Monthly fact table: Date, D1, D2, D3, D4

- Different time periods (weekly, monthly, accounting period, billing cycle) required for different analysis purposes
- Create separate fact tables to account for different time periods
- Date still part of each fact table key
- Same dimension tables used by both fact tables
- Improves overall performance (loading and accessing) for each time period
- Will not increase amount of managed redundancy
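A rough sketch of loading separate weekly and monthly fact tables from the same atomic data follows, using Python's built-in sqlite3 module. The table names Purchases_Weekly and Purchases_Monthly, the reduced Cust_Purchases columns, the Saturday week-ending convention, and the two sample rows are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Atomic-level purchases (invented rows; both fall in the week ending 1996-02-03).
CREATE TABLE Cust_Purchases (Activity_Date TEXT, Product_Code TEXT,
                             Qty_Purchased INTEGER);
INSERT INTO Cust_Purchases VALUES
    ('1996-01-30', 'P1', 3),
    ('1996-02-01', 'P1', 4);

-- One fact table per time period; the date stays part of each key.
CREATE TABLE Purchases_Weekly (
    Week_Ending_Date TEXT, Product_Code TEXT, Total_Quantity INTEGER,
    PRIMARY KEY (Week_Ending_Date, Product_Code));
CREATE TABLE Purchases_Monthly (
    Month TEXT, Product_Code TEXT, Total_Quantity INTEGER,
    PRIMARY KEY (Month, Product_Code));

-- 'weekday 6' moves a date forward to Saturday (unchanged if already Saturday).
INSERT INTO Purchases_Weekly
SELECT date(Activity_Date, 'weekday 6'), Product_Code, SUM(Qty_Purchased)
FROM Cust_Purchases
GROUP BY date(Activity_Date, 'weekday 6'), Product_Code;

INSERT INTO Purchases_Monthly
SELECT strftime('%Y-%m', Activity_Date), Product_Code, SUM(Qty_Purchased)
FROM Cust_Purchases
GROUP BY strftime('%Y-%m', Activity_Date), Product_Code;
""")

weekly = conn.execute("SELECT * FROM Purchases_Weekly ORDER BY 1").fetchall()
monthly = conn.execute("SELECT * FROM Purchases_Monthly ORDER BY 1").fetchall()
print(weekly)   # [('1996-02-03', 'P1', 7)]
print(monthly)  # [('1996-01', 'P1', 3), ('1996-02', 'P1', 4)]
```

In this sample the week ending 1996-02-03 spans two calendar months, so the monthly totals cannot be rebuilt from the weekly row. That is one reason to load each time period's fact table directly from the atomic data.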