1
Data Warehousing
2
Data, Data everywhere yet ...
We can’t find the data we need: data is scattered over the network.
We can’t get the data we need: an expert is needed to get the data.
We can’t understand the data we found: available data is poorly documented.
We can’t use the data we found: data needs to be transformed from one form to another.
3
What is a Data Warehouse? Definition by Inmon
“A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process”
4
Data Warehouse—Subject-Oriented
Organized around major subjects, such as customer, product, sales
5
Data Warehouse—Integrated
Constructed by integrating multiple, heterogeneous data sources: relational databases, flat files, on-line transaction records.
Data cleaning and data integration techniques are applied.
Ensure consistency in naming conventions, attribute measures, etc. among different data sources.
When data is moved to the warehouse, it is converted.
6
Data Warehouse—Time Variant
The time horizon for the data warehouse is significantly longer than that of operational systems.
Operational database: current value data.
Data warehouse data: provides information from a historical perspective (e.g., past 5-10 years).
7
Data Warehouse—Non-Volatile
Operational update of data does not occur in the data warehouse environment.
Requires only two operations in data accessing: initial loading of data and access of data.
8
Data Warehouse vs. Operational DBMS
OLTP (On-Line Transaction Processing)
  Major task of traditional relational DBMS
  Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.
OLAP (On-Line Analytical Processing)
  Major task of data warehouse system
  Data analysis and decision making
9
From Tables and Spreadsheets to Data Cubes
A data warehouse is based on a multidimensional data model, which views data in the form of a data cube.
A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions.
  Dimension tables, such as item (item_name, brand, type) or time (day, week, month, quarter, year)
  Fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables
10
Conceptual Modeling of Data Warehouses
Modeling data warehouses: dimensions & measures
Star schema: a fact table in the middle connected to a set of dimension tables.
Snowflake schema: a refinement of star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake.
Fact constellations: multiple fact tables share dimension tables, viewed as a collection of stars, therefore called galaxy schema or fact constellation.
11
Example of Star Schema
Sales Fact Table
  Keys: Time_key, Item_key, Branch_key, Location_key
  Measures: Unit_sold, Euros_sold, Avg_sales
Time: time_key, day, day_of_the_week, month, quarter, year
Item: item_key, item_name, brand, type, supplier_type
Branch: branch_key, branch_name, branch_type
Location: location_key, street, city, province_or_street, country
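A minimal sketch of how a star schema like this one could be queried, using Python's built-in sqlite3 module. Only two of the four dimensions are included, the table name time_dim is an assumption (the slide calls it Time), and all sample rows are invented for illustration.

```python
import sqlite3

# In-memory database. Column names follow the slide; the table name
# time_dim and every sample row are assumptions made for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day INTEGER,
                       month INTEGER, quarter TEXT, year INTEGER);
CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT,
                   brand TEXT, type TEXT);
CREATE TABLE sales_fact (
    time_key   INTEGER REFERENCES time_dim(time_key),
    item_key   INTEGER REFERENCES item(item_key),
    unit_sold  INTEGER,
    euros_sold REAL
);
""")
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?, ?)",
                 [(1, 15, 1, "Q1", 2024), (2, 20, 4, "Q2", 2024)])
conn.executemany("INSERT INTO item VALUES (?, ?, ?, ?)",
                 [(1, "TV", "BrandA", "electronics"),
                  (2, "PC", "BrandB", "electronics")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
                 [(1, 1, 2, 800.0), (1, 2, 1, 1200.0), (2, 1, 3, 1200.0)])

# The fact table joins directly to each dimension table (one hop each).
star_rows = conn.execute("""
SELECT i.item_name, t.quarter, SUM(f.euros_sold)
FROM sales_fact f
JOIN item i     ON i.item_key = f.item_key
JOIN time_dim t ON t.time_key = f.time_key
GROUP BY i.item_name, t.quarter
ORDER BY i.item_name, t.quarter
""").fetchall()

for row in star_rows:
    print(row)
```

Every dimension is reached with a single join from the fact table, which is what gives the schema its star shape.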
12
Example of Snowflake Schema
Sales Fact Table
  Keys: Time_key, Item_key, Branch_key, Location_key
  Measures: Unit_sold, Euros_sold, Avg_sales
Time: time_key, day, day_of_the_week, month, quarter, year
Item: item_key, item_name, brand, type, supplier_key
Supplier: supplier_key, supplier_type
Branch: branch_key, branch_name, branch_type
Location: location_key, street, city_key
City: city_key, city, province_or_street, country
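The effect of normalizing Location into Location and City can be sketched with sqlite3 as follows. Only the columns needed for the query are kept, and the sample rows are invented for illustration.

```python
import sqlite3

# Snowflake variant: city attributes live in their own table, so a query
# by country needs two joins instead of one. Sample data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city (city_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT,
                       city_key INTEGER REFERENCES city(city_key));
CREATE TABLE sales_fact (
    location_key INTEGER REFERENCES location(location_key),
    euros_sold   REAL
);
""")
conn.executemany("INSERT INTO city VALUES (?, ?, ?)",
                 [(1, "Dublin", "Ireland"), (2, "Paris", "France")])
conn.executemany("INSERT INTO location VALUES (?, ?, ?)",
                 [(10, "Main St", 1), (11, "High St", 1), (12, "Rue A", 2)])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?)",
                 [(10, 100.0), (11, 50.0), (12, 70.0)])

# fact -> location -> city: the extra hop is the cost of normalization.
snowflake_rows = conn.execute("""
SELECT c.country, SUM(f.euros_sold)
FROM sales_fact f
JOIN location l ON l.location_key = f.location_key
JOIN city c     ON c.city_key = l.city_key
GROUP BY c.country
ORDER BY c.country
""").fetchall()

print(snowflake_rows)
```

The trade-off shown here is the usual one: less redundancy in the dimension tables, at the price of longer join paths.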
13
Example of Fact Constellation
Sales Fact Table
  Keys: Time_key, Item_key, Branch_key, Location_key
  Measures: Unit_sold, Euros_sold, Avg_sales
Shipping Fact Table
  Keys: Time_key, Item_key, shipper_key, from_location, to_location
  Measures: Euros_sold, unit_shipped
Time: time_key, day, day_of_the_week, month, quarter, year
Item: item_key, item_name, brand, type, supplier_key
Branch: branch_key, branch_name, branch_type
Location: location_key, street, city, Province/street, country
shipper: shipper_key, shipper_name, location_key, shipper_type
14
A Sample Data Cube
Figure: a cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr), Product (TV, VCR, PC) and Country (Ireland, France, Germany), plus sum totals.
Example cell: total annual sales of TV in Ireland.
Apex cell: (All, All, All).
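A minimal sketch of how the totals in such a cube could be computed in plain Python. The sales figures are invented, and build_cube is a hypothetical helper name, not part of any library.

```python
from collections import defaultdict
from itertools import product

# Hypothetical base cells: (date, product, country) -> sales amount.
sales = {
    ("1Qtr", "TV", "Ireland"): 10,
    ("2Qtr", "TV", "Ireland"): 20,
    ("1Qtr", "PC", "Ireland"): 5,
    ("1Qtr", "TV", "France"): 7,
}

def build_cube(cells):
    """Add each base cell to every combination of its own value or 'All'."""
    cube = defaultdict(int)
    for key, amount in cells.items():
        # Per dimension, a cell counts toward its concrete value and 'All',
        # giving 2**3 = 8 aggregate cells per base cell.
        for coords in product(*[(value, "All") for value in key]):
            cube[coords] += amount
    return dict(cube)

cube = build_cube(sales)
print(cube[("All", "TV", "Ireland")])   # total annual sales of TV in Ireland
print(cube[("All", "All", "All")])      # apex cell: grand total
```

Materializing every aggregate this way is only practical for small examples; it is meant to show what the cells of the cube contain, not how a warehouse engine stores them.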
15
Typical OLAP Operations
Roll up (drill-up): summarize data by climbing up a hierarchy or by dimension reduction.
Drill down (roll down): reverse of roll-up, from higher level summary to lower level summary or detailed data, or introducing new dimensions.
Slice and dice: project and select.
Pivot (rotate): reorient the cube, visualization, 3D to series of 2D planes.
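The operations above can be sketched on a small list of fact rows in plain Python. The rows, the column layout and the aggregate helper are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: (quarter, month, product, country, amount).
rows = [
    ("Q1", "Jan", "TV", "Ireland", 10),
    ("Q1", "Feb", "TV", "Ireland", 20),
    ("Q1", "Jan", "PC", "France", 5),
    ("Q2", "Apr", "TV", "France", 7),
]
QUARTER, MONTH, PRODUCT, COUNTRY, AMOUNT = range(5)

def aggregate(data, dims):
    """Sum the amount over the given dimension columns."""
    totals = defaultdict(int)
    for row in data:
        totals[tuple(row[d] for d in dims)] += row[AMOUNT]
    return dict(totals)

# Drill-down view: the detailed, month-level summary.
by_month = aggregate(rows, [QUARTER, MONTH, PRODUCT])
# Roll-up by climbing the time hierarchy: month -> quarter.
by_quarter = aggregate(rows, [QUARTER, PRODUCT])
# Roll-up by dimension reduction: drop the product dimension.
quarter_only = aggregate(rows, [QUARTER])
# Slice: select on a single dimension.
q1_slice = [r for r in rows if r[QUARTER] == "Q1"]
# Dice: select on two or more dimensions.
q1_ireland = [r for r in rows if r[QUARTER] == "Q1" and r[COUNTRY] == "Ireland"]
# Pivot: swap the two axes of a 2-D view.
product_by_quarter = {(p, q): v for (q, p), v in by_quarter.items()}

print(by_quarter)
print(quarter_only)
```

Moving from by_quarter back to by_month is the drill-down direction: the same aggregation, run at a lower level of the hierarchy.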
16
Data Warehouse Architecture
Data sources: Relational Databases, Legacy Data, Purchased Data, ERP Systems
Pipeline: Extraction, Cleansing, Optimized Loader
Data Warehouse Engine, with Metadata Repository
Front end: Analyze, Query
17
Data Warehouse Architecture
Data Extraction - gathering the data from multiple heterogeneous sources.
Data Cleaning - finding and correcting the errors in data.
Data Transformation - converting data from legacy format to warehouse format.
Data Loading - sorting, summarizing, consolidating, checking integrity and building indices and partitions.
Refreshing - updating from data sources to warehouse.
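The steps above can be sketched as a small pipeline in plain Python. The two sources, their record layouts and all function names are hypothetical, and a real warehouse loader would also build indices and partitions, which this sketch leaves out.

```python
# Hypothetical records from two heterogeneous sources.
crm_rows = [{"cust": " Alice ", "amount": "100.50"},
            {"cust": "Bob", "amount": "n/a"}]       # amount in euros, as text
erp_rows = [("alice", 2500), ("carol", 990)]        # (name, amount in cents)

def extract():
    """Extraction: gather both sources into one (name, raw_amount, unit) shape."""
    records = [(r["cust"], r["amount"], "eur") for r in crm_rows]
    records += [(name, cents, "cents") for name, cents in erp_rows]
    return records

def clean(records):
    """Cleaning: normalize names and drop rows whose amount cannot be parsed."""
    cleaned = []
    for name, raw, unit in records:
        try:
            value = float(raw)
        except (TypeError, ValueError):
            continue  # unparsable amount, e.g. "n/a"
        cleaned.append((name.strip().lower(), value, unit))
    return cleaned

def transform(records):
    """Transformation: convert every amount to the warehouse unit (euros)."""
    return [(name, value / 100 if unit == "cents" else value)
            for name, value, unit in records]

def load(records, warehouse):
    """Loading: sort and consolidate the amounts per customer."""
    for name, value in sorted(records):
        warehouse[name] = warehouse.get(name, 0.0) + value
    return warehouse

warehouse = load(transform(clean(extract())), {})
print(warehouse)
```

Refreshing would correspond to calling load again with newly extracted records and the existing warehouse dictionary.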
18
Data Warehouse Models
Enterprise warehouse: collects all of the information about subjects spanning the entire organization.
Data mart: a subset of corporate-wide data that is of value to a specific group of users. Its scope is confined to specific, selected groups, such as a marketing data mart.
19
Introduction to Data Mining
20
What Motivated Data Mining?
We are drowning in data, but starving for knowledge!
21
What Is Data Mining?
Data mining (knowledge discovery from data): extraction of interesting (implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data.
Alternative names: knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
22
Why Data Mining?—Potential Applications
Data analysis and decision support
  Market analysis and management: target marketing, customer relationship management (CRM), market basket analysis, cross selling, market segmentation
  Risk analysis and management: forecasting, customer retention, quality control, competitive analysis
  Fraud detection and detection of unusual patterns (outliers)
23
Integration of Multiple Technologies
Data Mining draws on: Artificial Intelligence, Machine Learning, Database Management, Statistics, Algorithms, Visualization
24
What Can Data Mining Do?
Cluster
Classify: Categorical, Regression
Summarize: Summary statistics, Summary rules
Link Analysis / Model Dependencies: Association rules
Detect Deviations
25
Clustering
Find groups of similar data items.
“Group people with similar travel profiles”
Example groups: (George, Patricia), (Jeff, Evelyn, Chris), (Rob)
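One common way to find such groups is k-means clustering; the slide does not name a method, so this is a swapped-in technique shown as a sketch. The travel profiles and the choice of starting centroids are invented for illustration.

```python
import math

# Hypothetical travel profiles: (trips per year, average trip length in days).
profiles = {
    "George": (2, 14), "Patricia": (3, 12),
    "Jeff": (20, 2), "Evelyn": (22, 3), "Chris": (18, 2),
    "Rob": (10, 30),
}

def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate assignment and centroid update steps."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each person joins the nearest centroid.
        clusters = [[] for _ in centroids]
        for name, p in points.items():
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(name)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for members, old in zip(clusters, centroids):
            if members:
                coords = [points[m] for m in members]
                new_centroids.append(tuple(sum(c) / len(c) for c in zip(*coords)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return clusters

# Starting centroids are picked by hand here; real use would randomize them.
groups = kmeans(profiles, [profiles["George"], profiles["Jeff"], profiles["Rob"]])
print(groups)
```

Unlike classification, no group labels are given in advance: the groups emerge from the distances between the profiles.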
26
Classification
Find ways to separate data items into pre-defined groups.
Example: a bank loan officer wants to analyse the data in order to know which customers (loan applicants) are risky and which are safe.
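As one possible illustration of the loan example, the sketch below uses a k-nearest-neighbours classifier; the slide does not prescribe a method. The features (income and existing debt), the training examples and the function name classify are all assumptions.

```python
import math
from collections import Counter

# Hypothetical labeled history:
# (annual income in thousands, existing debt in thousands) -> known outcome.
training = [
    ((80, 5), "safe"), ((65, 10), "safe"), ((90, 20), "safe"),
    ((30, 25), "risky"), ((25, 15), "risky"), ((40, 35), "risky"),
]

def classify(applicant, examples, k=3):
    """Label an applicant by majority vote among the k closest past cases."""
    nearest = sorted(examples, key=lambda ex: math.dist(applicant, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(classify((70, 8), training))
print(classify((28, 20), training))
```

The groups ("safe", "risky") are fixed before any data is seen, which is what distinguishes classification from clustering.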
27
Association Rules
Identify dependencies in the data: X makes Y likely.
Indicate significance of each dependency.
“Find groups of items commonly purchased together”: people who purchase X are likely to purchase Y.
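The significance of a rule X -> Y is commonly expressed with support and confidence. The sketch below computes both on a handful of invented market baskets; the helper names are hypothetical.

```python
# Hypothetical market baskets, one set of items per purchase.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(x, y, transactions):
    """Of the transactions containing X, the fraction that also contain Y.

    Assumes X occurs at least once; otherwise the ratio is undefined.
    """
    return count(x | y, transactions) / count(x, transactions)

rule_support = support({"bread", "milk"}, baskets)
rule_confidence = confidence({"bread"}, {"milk"}, baskets)
print(f"bread -> milk: support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

Here bread and milk appear together in 3 of 5 baskets, and 3 of the 4 baskets containing bread also contain milk.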
28
Deviation Detection
Find unexpected values.
Uses: failure analysis, anomaly discovery for analysis.
“Find unusual occurrences in stock prices”
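A simple way to flag unusual values is a z-score test, sketched below with Python's statistics module. The prices and the threshold of 2.0 are assumptions, and the slide does not specify a detection method.

```python
import statistics

# Hypothetical daily closing prices with one unusual spike.
prices = [100.0, 101.0, 99.5, 100.5, 100.0, 130.0, 100.2, 99.8]

def find_outliers(values, threshold=2.0):
    """Return (index, value) pairs lying more than threshold
    population standard deviations away from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical, nothing deviates
    return [(i, v) for i, v in enumerate(values)
            if abs(v - mean) / stdev > threshold]

outliers = find_outliers(prices)
print(outliers)
```

With only a few data points the largest possible z-score is small, so the threshold has to be chosen with the sample size in mind.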
29
Knowledge Discovery (KDD) Process
Data mining: core of knowledge discovery process.
Flow: Databases -> Data Cleaning / Data Integration -> Data Warehouse -> Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation
30
Knowledge Process
Data cleaning – to remove noise and inconsistent data
Data integration – to combine multiple sources
Data selection – to retrieve relevant data for analysis
Data transformation – to transform data into appropriate form for data mining
Data mining
Evaluation
Knowledge presentation
31
Knowledge Process
Although data mining is only one step in the entire process, it is an essential one since it uncovers hidden patterns for evaluation.
32
Knowledge Process
Based on this view, the architecture of a typical data mining system may have the following major components:
Database, data warehouse, world wide web, or other information repository
Database or data warehouse server
Data mining engine
Pattern evaluation module
User interface