1
Data Warehousing
2
On-Line Analytical Processing (OLAP) Tools
OLAP: the use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques
Relational OLAP (ROLAP): traditional relational representation
Multidimensional OLAP (MOLAP): cube structure
OLAP operations:
  Cube slicing: produce a 2-D view of the data by fixing one dimension at a single value
  Drill-down: going from summary to more detailed views
  Roll-up: the opposite direction of drill-down
  Reaggregation: rearrange the order of dimensions
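Slicing, roll-up, and drill-down can be sketched on a toy cube. This is a minimal illustration. It assumes the cube is stored as a Python dict keyed by (product, region, quarter). The dimension names, the data, and the helpers `slice_cube`, `roll_up`, and `drill_down` are all hypothetical, and real ROLAP or MOLAP engines work very differently internally.

```python
from collections import defaultdict

# Hypothetical 3-D cube: (product, region, quarter) -> units sold (illustrative data)
DIMS = ("product", "region", "quarter")
cube = {
    ("Shoes", "East", "Q1"): 10,
    ("Shoes", "West", "Q1"): 5,
    ("Shoes", "East", "Q2"): 7,
    ("Hats", "East", "Q1"): 3,
    ("Hats", "West", "Q2"): 4,
}

def slice_cube(cube, dim, value):
    """Slice: fix one dimension at a single value, leaving a 2-D view."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}

def roll_up(cube, keep):
    """Roll-up: sum away every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for k, v in cube.items():
        out[tuple(k[i] for i in idx)] += v
    return dict(out)

def drill_down(cube, keep, cell, extra_dim):
    """Drill-down: break one summary cell out by one more dimension."""
    idx = [DIMS.index(d) for d in keep]
    j = DIMS.index(extra_dim)
    out = defaultdict(int)
    for k, v in cube.items():
        if tuple(k[i] for i in idx) == cell:
            out[k[j]] += v
    return dict(out)

# roll_up(cube, ["product"]) -> {("Shoes",): 22, ("Hats",): 7}
# drill_down(cube, ["product"], ("Shoes",), "region") -> {"East": 17, "West": 5}
```

Rolling up to `["product"]` and then drilling into the Shoes cell by region reverses the summary for that one cell, which is the summary-to-detail movement the slide describes.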
3
Slicing a data cube
4
Example of drill-down
Summary report (figure): starting with summary data, users can obtain details for particular cells
Drill-down with color added (figure)
5
Excel’s Pivot Table
Menu: Data/Pivot Table
Supports drill-down, roll-up, and reaggregation
6
Access Pivot Form: Drill-Down
7
Data Warehouse
A subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes
Subject-oriented: organized around subjects such as customers, employees, locations, products, time periods, etc., which serve as the dimensions for data analysis
Integrated: consistent naming conventions, formats, and encoding structures for data drawn from multiple data sources
Time-variant: can study trends and changes
Non-updatable: read-only, periodically refreshed
Data Mart: a data warehouse that is limited in scope
8
Need for Data Warehousing
Integrated, company-wide view of high-quality information (from disparate databases)
Separation of operational and informational systems and data (for improved performance)
9
Generic two-level data warehousing architecture
One company-wide warehouse (figure)
Periodic extraction: data in the warehouse is not completely current
10
The ETL Process
Capture/Extract
Scrub or data cleansing
Transform
Load and Index
ETL = Extract, transform, and load
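The scrub (data cleansing) step can be pictured with a small example. This is a minimal sketch that assumes two hypothetical source systems writing city names and order dates in different formats (`MM/DD/YYYY` and `YYYY-MM-DD`). The rows and function names are made up. Rows that cannot be repaired are set aside rather than loaded.

```python
from datetime import datetime

# Hypothetical raw rows from two source systems with inconsistent formats
raw_rows = [
    {"cid": "C1", "city": " san francisco ", "odate": "03/15/2024"},
    {"cid": "C2", "city": "SAN FRANCISCO", "odate": "2024-04-02"},
    {"cid": "C3", "city": "Oakland", "odate": "not a date"},
]

DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d")  # assumed source date formats

def parse_date(text):
    """Try each known source format; return None if none matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            pass
    return None

def scrub(rows):
    """Standardize names and formats; set aside rows that cannot be fixed."""
    clean, rejects = [], []
    for row in rows:
        d = parse_date(row["odate"].strip())
        if d is None:
            rejects.append(row)
            continue
        clean.append({
            "cid": row["cid"],
            "city": row["city"].strip().title(),  # one naming convention
            "odate": d.isoformat(),               # one date format
        })
    return clean, rejects

clean, rejects = scrub(raw_rows)
```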
11
Capture/Extract: obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse
Static extract: capturing a snapshot of the source data at a point in time
Incremental extract: capturing changes that have occurred since the last static extract
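This sketch contrasts the two extract types. A static extract copies the whole source, while an incremental extract keeps only what changed. It assumes a hypothetical source table keyed by CID and derives the incremental extract by comparing two static snapshots. Production systems more often read transaction logs or change timestamps instead.

```python
def static_extract(source):
    """Static extract: a full copy of the source at one point in time."""
    return {key: dict(row) for key, row in source.items()}

def incremental_extract(previous, current):
    """Incremental extract: only the changes since the previous snapshot."""
    inserted = {k: v for k, v in current.items() if k not in previous}
    deleted = [k for k in previous if k not in current]
    updated = {k: v for k, v in current.items()
               if k in previous and previous[k] != v}
    return inserted, updated, deleted

# Hypothetical source table keyed by CID
source = {"C1": {"city": "Oakland", "rating": "A"},
          "C2": {"city": "Berkeley", "rating": "B"}}
snap1 = static_extract(source)

source["C2"]["rating"] = "A"                         # an update
source["C3"] = {"city": "San Jose", "rating": "C"}  # an insert
del source["C1"]                                     # a delete
snap2 = static_extract(source)

inserted, updated, deleted = incremental_extract(snap1, snap2)
```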
12
Data Warehouse Design: Star Schema
Also called the “dimensional model”
Fact tables contain detailed business data
Dimension tables contain descriptions about the subjects of the business, such as customers, employees, locations, products, time periods, etc.
A dimension is any category used in analyzing data, such as time, geography, or product line.
13
Star schema example (figure)
Fact table provides statistics for sales broken down by product, period, and store dimensions
Dimension tables contain descriptions about the subjects of the business
14
Star schema with sample data
15
Example: Order Processing System
ER diagram (figure):
Customer (CID, Cname, City, Rating) Has Order: one-to-many (1:M)
Order (OID, ODate, SalesPerson) Has Product: many-to-many (M:M), with attribute Qty
Product (PID, Pname, Price)
16
Star Schema (figure)
FactTable: LocationCode, PeriodCode, Rating, PID, Qty, Amount
Location Dimension: LocationCode, State, City (can group by State, City)
Period Dimension: PeriodCode, Year, Quarter
CustomerRating Dimension: Rating, Description
Product Dimension: PID, Pname, CategoryID
Product Category: CategoryID, Description (linked from Product Dimension via CategoryID: snowflake model)
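The star schema above can be tried out with Python's built-in `sqlite3` module. The table and column names follow the slide, except that the `Dim` suffixes are an assumption, and all sample rows are invented. The first query is a star join: the fact table plus one dimension, grouped by State. The second follows Product Dimension to Product Category, which is the extra join the snowflake model introduces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LocationDim (LocationCode TEXT PRIMARY KEY, State TEXT, City TEXT);
CREATE TABLE PeriodDim (PeriodCode TEXT PRIMARY KEY, Year INTEGER, Quarter INTEGER);
CREATE TABLE CustomerRatingDim (Rating TEXT PRIMARY KEY, Description TEXT);
CREATE TABLE ProductCategory (CategoryID TEXT PRIMARY KEY, Description TEXT);
CREATE TABLE ProductDim (PID TEXT PRIMARY KEY, Pname TEXT,
                         CategoryID TEXT REFERENCES ProductCategory);
CREATE TABLE FactTable (LocationCode TEXT REFERENCES LocationDim,
                        PeriodCode TEXT REFERENCES PeriodDim,
                        Rating TEXT REFERENCES CustomerRatingDim,
                        PID TEXT REFERENCES ProductDim,
                        Qty INTEGER, Amount REAL);

-- Invented sample data
INSERT INTO LocationDim VALUES ('L1','CA','San Francisco'), ('L2','CA','Oakland'), ('L3','NV','Reno');
INSERT INTO PeriodDim VALUES ('2024Q1',2024,1), ('2024Q2',2024,2);
INSERT INTO CustomerRatingDim VALUES ('A','Excellent'), ('B','Good');
INSERT INTO ProductCategory VALUES ('K1','Hardware'), ('K2','Software');
INSERT INTO ProductDim VALUES ('P1','Router','K1'), ('P2','Editor','K2');
INSERT INTO FactTable VALUES
  ('L1','2024Q1','A','P1',2,200.0),
  ('L2','2024Q1','B','P2',1,50.0),
  ('L3','2024Q2','A','P1',1,100.0);
""")

# Star join: fact rows summarized by an attribute of one dimension
by_state = conn.execute("""
    SELECT L.State, SUM(F.Amount)
    FROM FactTable AS F JOIN LocationDim AS L ON F.LocationCode = L.LocationCode
    GROUP BY L.State ORDER BY L.State
""").fetchall()

# Snowflake join: Product dimension normalized into Product Category
by_category = conn.execute("""
    SELECT C.Description, SUM(F.Qty)
    FROM FactTable AS F
    JOIN ProductDim AS P ON F.PID = P.PID
    JOIN ProductCategory AS C ON P.CategoryID = C.CategoryID
    GROUP BY C.Description ORDER BY C.Description
""").fetchall()
```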
17
From SalesDB to MyDataWarehouse
Extract data from SalesDB: create a query to get the data
Download to MyDataWarehouse: File/Import/Save as Table
Data scrubbing/cleansing and transform:
  Transform City to Location
  Transform ODate to Period
Load data to FactTable
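The transform and load steps above can be sketched in plain Python. The sketch rests on several assumptions:

- the extract query returns one row per order line with City, ODate, Rating, PID, Qty, and Price
- Amount is Qty * Price
- the fact table grain is one row per (LocationCode, PeriodCode, Rating, PID)
- PeriodCode has the form `2024Q1`

All data, location codes, and helper names are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical result of the extract query against SalesDB
extracted = [
    {"City": "San Francisco", "ODate": date(2024, 2, 10), "Rating": "A", "PID": "P1", "Qty": 2, "Price": 100.0},
    {"City": "San Francisco", "ODate": date(2024, 3, 5),  "Rating": "A", "PID": "P1", "Qty": 1, "Price": 100.0},
    {"City": "Oakland",       "ODate": date(2024, 5, 20), "Rating": "B", "PID": "P2", "Qty": 3, "Price": 50.0},
]

# Hypothetical Location dimension lookup: City -> LocationCode
location_codes = {"San Francisco": "L1", "Oakland": "L2"}

def to_period(d):
    """Transform ODate to a PeriodCode such as '2024Q1'."""
    return f"{d.year}Q{(d.month - 1) // 3 + 1}"

def transform(rows):
    """Map each order line to fact keys and sum Qty and Amount per key."""
    facts = defaultdict(lambda: [0, 0.0])
    for r in rows:
        key = (location_codes[r["City"]], to_period(r["ODate"]), r["Rating"], r["PID"])
        facts[key][0] += r["Qty"]
        facts[key][1] += r["Qty"] * r["Price"]  # assumed: Amount = Qty * Price
    return facts

fact_table = []  # stands in for FactTable in MyDataWarehouse

def load(facts):
    """Load: append one fact row per key."""
    for (loc, period, rating, pid), (qty, amount) in facts.items():
        fact_table.append((loc, period, rating, pid, qty, amount))

load(transform(extracted))
```

The two February and March orders land in the same `2024Q1` fact row, so the warehouse stores one summarized row where SalesDB had two order lines.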
18
Bitmap index: saves on space requirements (especially for attributes with few distinct values)
Rows: possible values of the attribute
Columns: table rows
Each bit indicates whether the attribute of that table row has that value
Figure 6-8: Bitmap index organization
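A bitmap index can be modeled with Python integers used as bit vectors. There is one bitmap per attribute value (a row of the figure), and bit i stands for table row i (a column of the figure). The product rows and attribute names are hypothetical. The sketch shows that a condition combining two attributes with AND reduces to a bitwise AND of two bitmaps.

```python
def build_bitmap_index(rows, attr):
    """One bitmap per distinct value; bit i is set if table row i has that value."""
    index = {}
    for i, row in enumerate(rows):
        index[row[attr]] = index.get(row[attr], 0) | (1 << i)
    return index

def row_ids(bitmap):
    """Decode a bitmap into the positions of its set bits."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical product rows
products = [
    {"PID": "P1", "Color": "Red",  "Size": "S"},
    {"PID": "P2", "Color": "Blue", "Size": "M"},
    {"PID": "P3", "Color": "Red",  "Size": "M"},
    {"PID": "P4", "Color": "Red",  "Size": "M"},
]
color_idx = build_bitmap_index(products, "Color")
size_idx = build_bitmap_index(products, "Size")

# WHERE Color = 'Red' AND Size = 'M' becomes a single bitwise AND
matches = row_ids(color_idx["Red"] & size_idx["M"])
```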
19
Figure 6-9: Join indexes speed up join operations
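A join index records ahead of time which fact rows match each dimension row. A join query can then follow stored row positions instead of matching keys at query time. This is a minimal in-memory sketch that assumes hypothetical `stores` and `sales` tables joined on StoreID. A DBMS would store row identifiers on disk rather than Python list positions.

```python
from collections import defaultdict

# Hypothetical dimension and fact tables
stores = [{"StoreID": "S1", "City": "Oakland"}, {"StoreID": "S2", "City": "Reno"}]
sales = [
    {"StoreID": "S1", "Amount": 10},
    {"StoreID": "S2", "Amount": 20},
    {"StoreID": "S1", "Amount": 5},
]

def build_join_index(dim_rows, fact_rows, key):
    """Precompute: dimension row position -> positions of fact rows that join with it."""
    fact_pos = defaultdict(list)
    for j, f in enumerate(fact_rows):
        fact_pos[f[key]].append(j)
    return {i: fact_pos.get(d[key], []) for i, d in enumerate(dim_rows)}

join_idx = build_join_index(stores, sales, "StoreID")

def sales_for_city(city):
    """Answer the join by following the stored index, not by comparing keys."""
    return [sales[j] for i, d in enumerate(stores) if d["City"] == city
            for j in join_idx[i]]
```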
20
Data Mining and Visualization
Knowledge discovery using a blend of statistical, AI, and computer graphics techniques
Goals:
  Explain observed events or conditions
  Confirm hypotheses
  Explore data for new or unexpected relationships
Techniques:
  Statistical regression
  Decision tree induction
  Clustering and signal processing
  Affinity
  Sequence association
  Case-based reasoning
  Rule discovery
  Neural nets
  Fractals
Data visualization: representing data in graphical/multimedia formats for analysis
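Affinity analysis, one of the techniques listed, looks for items that tend to occur together. The sketch below uses hypothetical market-basket data. It computes the support of each item pair and the confidence of a rule such as butter -> milk. It illustrates the idea only and is not a full association-rule algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions (market baskets)
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
]

def pair_support(baskets):
    """Fraction of baskets that contain each pair of items."""
    counts = Counter()
    for b in baskets:
        for pair in combinations(sorted(b), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}

def confidence(baskets, lhs, rhs):
    """Share of baskets containing `lhs` that also contain `rhs` (rule lhs -> rhs)."""
    with_lhs = [b for b in baskets if lhs in b]
    return sum(rhs in b for b in with_lhs) / len(with_lhs)

support = pair_support(baskets)
```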
21
SQL GROUPING SETS
SELECT CITY, RATING, COUNT(CID)
FROM HCUSTOMERS
GROUP BY GROUPING SETS (CITY, RATING, (CITY, RATING), ())
ORDER BY CITY;
Note: each item in the list is aggregated separately; () indicates that an overall total is desired.
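The GROUPING SETS query can be emulated in Python over a few hypothetical HCUSTOMERS rows to show what it computes. Each grouping set is aggregated separately and the results are concatenated. A column that is not in the current grouping set comes back as None, mirroring the NULL that SQL returns there. The helper `grouping_sets` is illustrative only.

```python
from collections import Counter

# Hypothetical HCUSTOMERS rows: (CID, CITY, RATING)
hcustomers = [
    ("C1", "Oakland", "A"),
    ("C2", "Oakland", "B"),
    ("C3", "Reno", "A"),
]
COLS = {"CITY": 1, "RATING": 2}  # output column -> position in a row

def grouping_sets(rows, sets):
    """Emulate GROUP BY GROUPING SETS(...) with COUNT(CID).

    Each result row is (CITY, RATING, count); columns outside the current
    grouping set are reported as None, as SQL reports NULL.
    """
    result = []
    for gset in sets:
        counts = Counter()
        for row in rows:
            key = tuple(row[COLS[c]] if c in gset else None for c in COLS)
            counts[key] += 1
        result.extend(key + (n,) for key, n in counts.items())
    return result

# GROUPING SETS (CITY, RATING, (CITY, RATING), ())
rows = grouping_sets(hcustomers, [("CITY",), ("RATING",), ("CITY", "RATING"), ()])
```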
22
SQL CUBE
Performs aggregations for all possible combinations of the columns indicated.
SELECT CITY, RATING, COUNT(CID)
FROM HCUSTOMERS
GROUP BY CUBE (CITY, RATING)
ORDER BY CITY, RATING;
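CUBE is shorthand for a GROUPING SETS list. CUBE(CITY, RATING) is equivalent to GROUPING SETS((CITY, RATING), (CITY), (RATING), ()). The hypothetical helper below expands a column list into those 2^n grouping sets.

```python
from itertools import combinations

def cube_sets(cols):
    """CUBE(c1, ..., cn) -> every subset of the columns, largest first (2**n sets)."""
    return [combo for r in range(len(cols), -1, -1) for combo in combinations(cols, r)]
```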
23
SQL ROLLUP
The ROLLUP extension causes cumulative subtotals to be calculated for the columns indicated. If multiple columns are indicated, subtotals are computed for each left-to-right prefix of the list, plus a grand total; the far-right column never gets a subtotal of its own. For ROLLUP(CITY, RATING), the groups are (CITY, RATING), (CITY), and ().
SELECT CITY, RATING, COUNT(CID)
FROM HCUSTOMERS
GROUP BY ROLLUP (CITY, RATING)
ORDER BY CITY, RATING;
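ROLLUP can also be written as a GROUPING SETS list. It expands to the left-to-right prefixes of its column list, from the most detailed grouping down to the grand total. A hypothetical helper:

```python
def rollup_sets(cols):
    """ROLLUP(c1, ..., cn) -> the n + 1 prefixes, most detailed first, grand total last."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]
```

For `["CITY", "RATING"]` this yields (CITY, RATING), (CITY), and (). Unlike CUBE(CITY, RATING), there is no (RATING) subtotal.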