Decision Support and Data Warehouse
Decision supports Systems Components Data management function –Data warehouse Model management function –Analytical models: Statistical model, management science model User interface –Data visualization
On-Line Analytical Processing (OLAP) The use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques OLAP Operations –Cube slicing–come up with 2-D view of data –Drill-down–going from summary to more detailed views –Roll-up – the opposite direction of drill-down –Reaggregation – rearrange the order of dimensions
Slicing a data cube
Example of drill-down Summary report Drill-down with color added Starting with summary data, users can obtain details for particular cells
Excel’s Pivot Table Data/Pivot Table –Drilldown, rollup, reaggregation
Data Warehouse A subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes –Subject-oriented: e.g. customers, employees, locations, products, time periods, etc. Dimensions for data analysis –Integrated: Consistent naming conventions, formats, encoding structures; from multiple data sources –Time-variant: Can study trends and changes –Nonupdatable: Read-only, periodically refreshed
Data Warehouse Design - Star Schema - Fact table –contain detailed business data Ex. Line items of orders to compute total sales by product, by salesperson. Dimension tables –Dimension is a term used to describe any category or subjects of the business used in analyzing data, such as customers, employees, locations, products, time periods, etc. –contain descriptions about the subjects of the business such as customers, employees, locations, products, time periods, etc. –Example: A sold item related to many business subjects such as salesperson, customer, product and time period.
Example: Order Processing System Customer Order Product Has 1 M M M CID Cname City OIDODate PID Pname Price Rating SalesPerson Qty
Star Schema FactTable LocationCode PeriodCode Rating PID Qty Amount Location Dimension LocationCode State City CustomerRating Dimension Rating Description Product Dimension PID Pname Category Period Dimension PeriodCode Year Quarter Can group by State, City
Snowflake Schema FactTable LocationCode PeriodCode Rating PID Qty Amount Location Dimension LocationCode State City CustomerRating Dimension Rating Description Product Dimension PID Pname CategoryID Product Category CategoryID Description Period Dimension PeriodCode Year Quarter Can group by State, City
The ETL Process E T L One, company- wide warehouse Periodic extraction data is not completely current in warehouse
The ETL Process Capture/Extract Transform –Scrub(data cleansing),derive –Example: City -> LocationCode, State, City OrderDate -> PeriodCode, Year, Quarter Load and Index ETL = Extract, transform, and load
From SalesDB to MyDataWarehouse Extract data from SalesDB: –Create query to get the data –Download to MyDataWareHouse File/Import/Save as Table Transform: –Transform City to Location –Transform Odate to Period Query FactPC Load data to FactTable
Star schema example Fact table provides statistics for sales broken down by product, period and store dimensions Dimension tables contain descriptions about the subjects of the business
Star schema with sample data
SQL GROUPING SETS GROUPING SETS –SELECT CITY,RATING,COUNT(CID) FROM CUSTOMERS –GROUP BY GROUPING SETS(CITY,RATING,(CITY,RATING),()) – ORDER BY CITY; Note: Compute the subtotals for every member in the GROUPING SETS. () indicates that an overall total is desired.
Results CITY Rating COUNT(CID) CHICAGO A 1 CHICAGO B 2 CHICAGO 3 LOS ANGELES A 1 LOS ANGELES C 1 LOS ANGELES 2 SAN FRANCISCO A 2 SAN FRANCISCO B 1 SAN FRANCISCO 3 A 4 8 CITY R COUNT(CID) B 3 C 1
SQL CUBE Perform aggregations for all possible combinations of columns indicated. –SELECT CITY,RATING,COUNT(CID) FROM CUSTOMERS –GROUP BY CUBE(CITY,RATING) –ORDER BY CITY, RATING;
Results CITY Rating COUNT(CID) CHICAGO A 1 CHICAGO B 2 CHICAGO 3 LOS ANGELES A 1 LOS ANGELES C 1 LOS ANGELES 2 SAN FRANCISCO A 2 SAN FRANCISCO B 1 SAN FRANCISCO 3 A 4 B 3 CITY R COUNT(CID) C 1 8
SQL ROLLUP The ROLLUP extension causes cumulative subtotals to be calculated for the columns indicated. If multiple columns are indicated, subtotals are performed for each of the columns except the far-right column. –SELECT CITY,RATING,COUNT(CID) FROM CUSTOMERS – GROUP BY ROLLUP(CITY,RATING) – ORDER BY CITY, RATING;
Results CITY Rating COUNT(CID) CHICAGO A 1 CHICAGO B 2 CHICAGO 3 LOS ANGELES A 1 LOS ANGELES C 1 LOS ANGELES 2 SAN FRANCISCO A 2 SAN FRANCISCO B 1 SAN FRANCISCO 3 8
Need for Data Warehousing Integrated, company-wide view of high-quality information (from disparate databases) Separation of operational and informational systems and data (for improved performance)