1
Data Warehousing - 2 ISYS 650
2
Data Warehouse Design: Star Schema
– Dimension tables contain descriptions of the subjects of the business, such as customers, employees, locations, products, and time periods.
– The fact table contains detailed business data with links to the dimension tables.
3
Star schema example
– The fact table provides statistics for sales broken down by the product, period, and store dimensions.
– Dimension tables contain descriptions of the subjects of the business.
Note: What is the key of the fact table?
4
Star schema with sample data
5
On-Line Analytical Processing (OLAP) Tools
OLAP is the use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques.
OLAP operations:
– Cube slicing: come up with a 2-D view of the data
– Drill-down: go from summary to more detailed views
– Roll-up: the opposite direction of drill-down
– Reaggregation: rearrange the order of dimensions
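The operations above can be sketched on a tiny in-memory cube. This is only an illustration: the cube contents, the function names `slice_cube` and `roll_up`, and the choice of a Python dictionary as the cube are all assumptions, not something the slides prescribe.

```python
from collections import defaultdict

# Hypothetical sales cube: (product, period, store) -> units sold.
cube = {
    ("Shoes", "2003Q1", "S1"): 10,
    ("Shoes", "2003Q2", "S1"): 5,
    ("Shoes", "2003Q1", "S2"): 7,
    ("Hats", "2003Q1", "S1"): 3,
}

def slice_cube(cube, store):
    """Slicing: fix the store dimension to get a 2-D product x period view."""
    return {(prod, per): v for (prod, per, st), v in cube.items() if st == store}

def roll_up(cube, keep):
    """Roll-up: sum the measure over every dimension not listed in `keep`.

    `keep` holds positions in the key tuple (0=product, 1=period, 2=store).
    """
    totals = defaultdict(int)
    for key, v in cube.items():
        totals[tuple(key[i] for i in keep)] += v
    return dict(totals)

by_product = roll_up(cube, keep=[0])            # summary view
by_product_period = roll_up(cube, keep=[0, 1])  # drill-down: add the period dimension
```

Going from `by_product` to `by_product_period` is a drill-down; going back is a roll-up. Changing the order of positions in `keep` corresponds to reaggregation.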
6
Slicing a data cube
7
Example of drill-down
– Summary report
– Drill-down with color added
Starting with summary data, users can obtain details for particular cells.
8
Excel’s Pivot Table
Insert/Pivot Table or Pivot Chart
Pivot Table:
– Drill down, rollup and reaggregation
– Pivot: change the dimensional orientation of a report or an ad hoc query-page display
– Filter
Pivot Chart:
– Filter
– Drilldown, rollup, reaggregation
9
Data Warehouse Lifecycle
– Requirement gathering: determine the reports that the DW is supposed to support.
– Identify data sources and perform data modeling based on user requirements.
– Extract data and populate the staging area with the data extracted from transactional sources.
– Build and populate a dimensional database.
– Build Extraction, Transformation and Loading (ETL) routines to populate the dimensional database regularly.
– Build reports and analytical views.
– Maintain the warehouse by adding/changing supported features and reports.
10
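Example: Transaction Database
Entity-relationship diagram:
– Customer: CID, Cname, City, Rating
– Order: OID, ODate, SalesPerson
– Product: PID, Pname, Price
– Relationships: Customer to Order is 1:M; Order Has Product is M:M, with Qty on the relationship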
11
Analyze Sales Data: Detailed Business Data
Total sales:
– By product: Sum(Qty*Price), where Qty*Price is computed for each detail line
– Detailed business data: Qty*Price
Total quantity sold:
– By product: Sum(Qty)
– Detailed business data: Qty
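As a rough illustration of these two aggregates, the sketch below sums Qty*Price and Qty per product over a few made-up detail lines. The sample rows and the helper name `totals_by_product` are hypothetical.

```python
from collections import defaultdict

# Hypothetical order detail lines: (PID, Qty, Price).
detail_lines = [
    ("P1", 2, 10.0),
    ("P2", 1, 25.0),
    ("P1", 3, 10.0),
]

def totals_by_product(lines):
    """Return (Sum(Qty*Price), Sum(Qty)) per product as two dictionaries."""
    sales = defaultdict(float)
    qty = defaultdict(int)
    for pid, q, price in lines:
        sales[pid] += q * price  # detailed business data: Qty*Price per line
        qty[pid] += q            # detailed business data: Qty per line
    return dict(sales), dict(qty)

total_sales, total_qty = totals_by_product(detail_lines)
```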
12
Dimensions for Data Analysis: Factors relevant to the business data
– Analyze sales by Product
– Analyze sales related to Customer:
  – Location: Sales by City
  – Customer type: Sales by Rating
– Analyze sales related to Time:
  – Quarterly, monthly, yearly sales
– Analyze sales related to Employee:
  – Sales by SalesPerson
13
Data Warehouse Design: Star Schema
– Dimension tables contain descriptions of the subjects of the business, such as customers, employees, locations, products, and time periods.
– The fact table contains detailed business data with links to the dimension tables.
14
Star Schema
– FactTable: LocationCode, PeriodCode, Rating, PID, Qty, Amount
– Location Dimension: LocationCode, State, City
– CustomerRating Dimension: Rating, Description
– Product Dimension: PID, Pname, Category
– Period Dimension: PeriodCode, Year, Quarter
With the Location dimension, we can group by State or City.
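To make the grouping concrete, here is a minimal sketch that builds the fact table and only the Location dimension in an in-memory SQLite database, then groups Amount by State and by City. SQLite, the column types, and the sample rows are assumptions chosen to keep the example self-contained; the other dimensions would join in the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Location (LocationCode TEXT PRIMARY KEY, State TEXT, City TEXT);
CREATE TABLE FactTable (
    LocationCode TEXT REFERENCES Location(LocationCode),
    PeriodCode TEXT, Rating TEXT, PID TEXT, Qty INTEGER, Amount REAL
);
-- Hypothetical sample rows.
INSERT INTO Location VALUES ('L1', 'California', 'San Francisco');
INSERT INTO Location VALUES ('L2', 'California', 'Los Angeles');
INSERT INTO FactTable VALUES ('L1', '20034', 'A', 'P1', 2, 20.0);
INSERT INTO FactTable VALUES ('L2', '20034', 'B', 'P1', 1, 10.0);
INSERT INTO FactTable VALUES ('L1', '20031', 'A', 'P2', 1, 25.0);
""")

# Group by State: join the fact table to the dimension on LocationCode.
by_state = conn.execute("""
    SELECT l.State, SUM(f.Amount)
    FROM FactTable f JOIN Location l ON f.LocationCode = l.LocationCode
    GROUP BY l.State
""").fetchall()

# Group by City: same join, finer level of the same dimension.
by_city = conn.execute("""
    SELECT l.City, SUM(f.Amount)
    FROM FactTable f JOIN Location l ON f.LocationCode = l.LocationCode
    GROUP BY l.City
    ORDER BY l.City
""").fetchall()
```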
15
Define Location Dimension
Location:
– In the transaction database: City
– In the data warehouse we define Location to be State, City:
  San Francisco -> California, San Francisco
  Los Angeles -> California, Los Angeles
– Define Location Code:
  California, San Francisco -> L1
  California, Los Angeles -> L2
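One possible way to derive the Location dimension is sketched below. The `CITY_TO_STATE` lookup and the function name are hypothetical, and assigning codes in first-seen order is an assumption that happens to reproduce the L1/L2 example.

```python
# Assumed lookup from city to state; the transaction database stores only City.
CITY_TO_STATE = {"San Francisco": "California", "Los Angeles": "California"}

def build_location_dimension(cities):
    """Assign codes L1, L2, ... to distinct (State, City) pairs in first-seen order."""
    dimension = {}
    for city in cities:
        location = (CITY_TO_STATE[city], city)
        if location not in dimension:
            dimension[location] = f"L{len(dimension) + 1}"
    return dimension

location_dim = build_location_dimension(
    ["San Francisco", "Los Angeles", "San Francisco"]
)
```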
16
Define Period Dimension
Period:
– In the transaction database: Odate
– In the data warehouse we define Period to be Year, Quarter:
  Odate: 11/2/2003 -> 2003, 4
  Odate: 2/28/2003 -> 2003, 1
– Define Period Code:
  2003, 4 -> 20034
  2003, 1 -> 20031
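The date-to-period mapping can be written as a small helper. This is a sketch that assumes calendar quarters (January to March is quarter 1, and so on); the function names are made up for illustration.

```python
from datetime import date

def period_of(odate):
    """Map an order date to (Year, Quarter), assuming calendar quarters."""
    quarter = (odate.month - 1) // 3 + 1
    return odate.year, quarter

def period_code(odate):
    """Concatenate year and quarter into a Period Code such as '20034'."""
    year, quarter = period_of(odate)
    return f"{year}{quarter}"
```

With these definitions, `period_code(date(2003, 11, 2))` gives `'20034'` and `period_code(date(2003, 2, 28))` gives `'20031'`, matching the two examples above.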
17
The ETL Process
– Capture/Extract
– Transform: scrub (data cleansing), derive
  Example:
  City -> LocationCode, State, City
  OrderDate -> PeriodCode, Year, Quarter
– Load and Index
18
From SalesDB to MyDataWarehouse
Extract data from SalesDB:
– Create a query to get the fact data: FactData
– Download to MyDataWarehouse
Transform:
– Transform City to Location
– Transform Odate to Period
– Query: FactDataScrubing
Load data to FactTable
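The extract, transform, and load steps can be sketched end to end in a few lines. Everything here is illustrative: the row layout of the extracted data, the `LOCATION_CODES` lookup, and computing Amount as Qty*Price are assumptions, not details given on the slide.

```python
from datetime import date

# Extract: hypothetical rows from the FactData query,
# laid out as (City, Odate, Rating, PID, Qty, Price).
fact_data = [
    ("San Francisco", date(2003, 11, 2), "A", "P1", 2, 10.0),
    ("Los Angeles", date(2003, 2, 28), "B", "P2", 1, 25.0),
]

# Assumed lookup built when the Location dimension was defined.
LOCATION_CODES = {"San Francisco": "L1", "Los Angeles": "L2"}

def transform(row):
    """Transform City to a LocationCode and Odate to a PeriodCode."""
    city, odate, rating, pid, qty, price = row
    quarter = (odate.month - 1) // 3 + 1  # calendar quarters assumed
    return {
        "LocationCode": LOCATION_CODES[city],
        "PeriodCode": f"{odate.year}{quarter}",
        "Rating": rating,
        "PID": pid,
        "Qty": qty,
        "Amount": qty * price,  # assumption: Amount is Qty*Price
    }

# Load: here the "fact table" is just a list of transformed rows.
fact_table = [transform(row) for row in fact_data]
```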
19
Performing Analysis
Analyze sales:
– By Location
– By Location and Customer Type
– By Location and Period
– By Period and Product
Pivot Table:
– Drill down, roll up, reaggregation
20
HR Database
Historical data:
– Job_History: each record in this table keeps track of the start date and end date of an employee's work on a job in a department.
21
We may study:
– The average number of days an employee stays in assigned jobs.
– The average number of days employees stay in a specific job_id.
– Any difference among departments in how long employees stay in a job.
– Does the starting year affect how long employees stay in a job?
Basic measurement:
– DaysOnJob: End_Date - Start_Date
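A small sketch of the basic measurement and two of the averages follows. The sample Job_History rows and the helper `average_days_by` are hypothetical; only the DaysOnJob formula comes from the slide.

```python
from collections import defaultdict
from datetime import date

# Hypothetical Job_History rows:
# (employee_id, start_date, end_date, job_id, department_id).
job_history = [
    (101, date(2001, 1, 1), date(2001, 1, 31), "AC_ACCOUNT", 110),
    (102, date(2002, 3, 1), date(2002, 3, 11), "IT_PROG", 60),
    (103, date(2002, 6, 1), date(2002, 6, 21), "IT_PROG", 60),
]

def average_days_by(rows, key_index):
    """Average DaysOnJob (end_date - start_date) grouped by one column."""
    days = defaultdict(list)
    for row in rows:
        days[row[key_index]].append((row[2] - row[1]).days)
    return {key: sum(values) / len(values) for key, values in days.items()}

avg_by_job = average_days_by(job_history, 3)   # group by job_id
avg_by_dept = average_days_by(job_history, 4)  # group by department_id
```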
22
Star Schema
– FactTable: Employee_ID, StartedYear, Job_ID, Department_ID, City, DaysOnJob
– Employee Dimension: Employee_ID, FullName, Email
– Department Dimension: Department_ID, Department_Name
– StartYear Dimension: StartedYear
– City Dimension: City, Country_Name
23
Define Dimensions
Employee dimension:
– Employee_ID, FullName, Email
– FullName = First_name || ' ' || Last_Name
Job dimension:
– Job_ID, Job_Title
City dimension:
– City, Country_Name
– Join Locations and Countries
Department dimension:
– Department_ID, Department_Name
StartYear dimension:
– StartedYear
– extract(year from start_date)
24
Create DWHR Using Access
– Each dimension is defined as a view in the HR database.
– Access communicates with Oracle through ODBC.
– In Access, we can import an Oracle view to create a table.
25
Create View to Retrieve Fact Data
The FactData view is a join of Job_History, Departments, and Locations.
26
Transform Fact Data
select employee_id,
       extract(year from start_date) as StartedYear,
       job_id, department_id, city,
       end_date - start_date as DaysOnJob
from factdata;
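For readers without an Oracle instance, the same transformation can be tried in SQLite from Python. This is an approximation under stated assumptions: SQLite has no `extract(year from ...)` and does not subtract dates directly, so `strftime` and `julianday` stand in for them, dates are stored as ISO text, and the sample row is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE factdata (
        employee_id INTEGER, start_date TEXT, end_date TEXT,
        job_id TEXT, department_id INTEGER, city TEXT
    )
""")
# Hypothetical sample row; dates are ISO-8601 strings.
conn.execute(
    "INSERT INTO factdata VALUES (101, '2001-01-01', '2001-01-31', 'AC_ACCOUNT', 110, 'Seattle')"
)

rows = conn.execute("""
    SELECT employee_id,
           CAST(strftime('%Y', start_date) AS INTEGER) AS StartedYear,
           job_id, department_id, city,
           CAST(julianday(end_date) - julianday(start_date) AS INTEGER) AS DaysOnJob
    FROM factdata
""").fetchall()
```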
27
Reference
http://msdn.microsoft.com/en-us/library/aa902672(SQL.80).aspx#sql_dwdesign_tool