Data Warehousing.

Presented by: Tek Narayan Adhikari

Data Warehousing

Data, Data Everywhere, Yet ...
- We can’t find the data we need: data is scattered over the network.
- We can’t get the data we need: an expert is needed to get the data.
- We can’t understand the data we found: available data is poorly documented.
- We can’t use the data we found: data needs to be transformed from one form to another.

What Is a Data Warehouse?
Definition by Inmon: “A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process.”

Data Warehouse: Subject-Oriented
- Organized around major subjects, such as customer, product, sales.

Data Warehouse: Integrated
- Constructed by integrating multiple, heterogeneous data sources: relational databases, flat files, on-line transaction records.
- Data cleaning and data integration techniques are applied.
  - Ensure consistency in naming conventions, attribute measures, etc. among different data sources.
  - When data is moved to the warehouse, it is converted.

Data Warehouse: Time-Variant
- The time horizon for the data warehouse is significantly longer than that of operational systems.
  - Operational database: current value data.
  - Data warehouse data: provides information from a historical perspective (e.g., past 5-10 years).

Data Warehouse: Non-Volatile
- Operational update of data does not occur in the data warehouse environment.
- Requires only two operations in data accessing: initial loading of data and access of data.

Data Warehouse vs. Operational DBMS
- OLTP (On-Line Transaction Processing)
  - Major task of traditional relational DBMS.
  - Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.
- OLAP (On-Line Analytical Processing)
  - Major task of data warehouse system.
  - Data analysis and decision making.

From Tables and Spreadsheets to Data Cubes
- A data warehouse is based on a multidimensional data model, which views data in the form of a data cube.
- A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions.
  - Dimension tables, such as item (item_name, brand, type) or time (day, week, month, quarter, year).
  - The fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables.

Conceptual Modeling of Data Warehouses
Modeling data warehouses: dimensions & measures
- Star schema: a fact table in the middle connected to a set of dimension tables.
- Snowflake schema: a refinement of the star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake.
- Fact constellations: multiple fact tables share dimension tables, viewed as a collection of stars, therefore called galaxy schema or fact constellation.

Example of Star Schema
Sales Fact Table (center):
- Keys: Time_key, Item_key, Branch_key, Location_key
- Measures: Unit_sold, Euros_sold, Avg_sales
Dimension tables:
- Time: time_key, day, day_of_the_week, month, quarter, year
- Item: item_key, item_name, brand, type, supplier_type
- Branch: branch_key, branch_name, branch_type
- Location: location_key, street, city, province_or_state, country
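The star schema above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch rather than the presenter's own implementation: it keeps only the Time and Item dimensions, and the table names (time_dim, item_dim, sales_fact) and the sample rows are assumptions made up for illustration.

```python
import sqlite3

# In-memory database; table names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (
    time_key INTEGER PRIMARY KEY,
    day INTEGER, month INTEGER, quarter TEXT, year INTEGER
);
CREATE TABLE item_dim (
    item_key INTEGER PRIMARY KEY,
    item_name TEXT, brand TEXT, type TEXT
);
-- The fact table holds foreign keys to each dimension plus the measures.
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    unit_sold INTEGER,
    euros_sold REAL
);
""")

conn.executemany(
    "INSERT INTO time_dim VALUES (?, ?, ?, ?, ?)",
    [(1, 15, 1, "Q1", 2010), (2, 20, 4, "Q2", 2010)],
)
conn.executemany(
    "INSERT INTO item_dim VALUES (?, ?, ?, ?)",
    [(1, "TV-A", "BrandX", "TV"), (2, "PC-B", "BrandY", "PC")],
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
    [(1, 1, 2, 800.0), (1, 2, 1, 600.0), (2, 1, 3, 1200.0)],
)

# A typical star join: the fact table joined to its dimensions,
# then aggregated by dimension attributes.
rows = conn.execute("""
    SELECT t.quarter, i.type, SUM(f.euros_sold)
    FROM sales_fact AS f
    JOIN time_dim AS t ON f.time_key = t.time_key
    JOIN item_dim AS i ON f.item_key = i.item_key
    GROUP BY t.quarter, i.type
    ORDER BY t.quarter, i.type
""").fetchall()

for quarter, item_type, total in rows:
    print(quarter, item_type, total)
```

Every analytical query follows the same pattern: start at the fact table, join outward to whichever dimensions the question mentions, and aggregate the measures.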

Example of Snowflake Schema
Sales Fact Table (center):
- Keys: Time_key, Item_key, Branch_key, Location_key
- Measures: Unit_sold, Euros_sold, Avg_sales
Dimension tables:
- Time: time_key, day, day_of_the_week, month, quarter, year
- Item: item_key, item_name, brand, type, supplier_key
  - Supplier: supplier_key, supplier_type
- Branch: branch_key, branch_name, branch_type
- Location: location_key, street, city_key
  - City: city_key, city, province_or_state, country

Example of Fact Constellation
Sales Fact Table:
- Keys: Time_key, Item_key, Branch_key, Location_key
- Measures: Unit_sold, Euros_sold, Avg_sales
Shipping Fact Table:
- Keys: Time_key, Item_key, shipper_key, from_location, to_location
- Measures: Euros_sold, unit_shipped
Dimension tables:
- Time: time_key, day, day_of_the_week, month, quarter, year
- Item: item_key, item_name, brand, type, supplier_key
- Branch: branch_key, branch_name, branch_type
- Location: location_key, street, city, province_or_state, country
- Shipper: shipper_key, shipper_name, location_key, shipper_type

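A Sample Data Cube
[Figure: a three-dimensional cube of sales]
- Date: 1Qtr, 2Qtr, 3Qtr, 4Qtr, sum
- Product: TV, VCR, PC, sum
- Country: Ireland, France, Germany, sum
- Example cell: total annual sales of TV in Ireland.
- Apex cell (All, All, All): the total over every date, product and country.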
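The "sum" and "All" cells of the cube can be computed by letting each fact contribute to every combination of "keep this dimension value" or "replace it with All". The sketch below does this in plain Python; the sales figures and the names ALL, facts and build_cube are assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import product

ALL = "All"

# Hypothetical base facts: (date, product, country, sales amount).
facts = [
    ("1Qtr", "TV", "Ireland", 100),
    ("2Qtr", "TV", "Ireland", 150),
    ("1Qtr", "PC", "France", 200),
    ("3Qtr", "VCR", "Germany", 50),
]


def build_cube(rows):
    """Aggregate every fact into all 2^3 = 8 cells it belongs to."""
    cube = defaultdict(int)
    for date, prod, country, amount in rows:
        # For each dimension, either keep the value or generalize to ALL.
        for cell in product((date, ALL), (prod, ALL), (country, ALL)):
            cube[cell] += amount
    return dict(cube)


cube = build_cube(facts)

# Total annual sales of TV in Ireland: Date is generalized to ALL.
print(cube[(ALL, "TV", "Ireland")])
# Apex cell: total over all dates, products and countries.
print(cube[(ALL, ALL, ALL)])
```

Materializing every cell like this is only practical for tiny examples; it is meant to show what the cells mean, not how a warehouse engine stores them.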

Typical OLAP Operations
- Roll up (drill-up): summarize data by climbing up a hierarchy or by dimension reduction.
- Drill down (roll down): reverse of roll-up, from a higher-level summary to a lower-level summary or detailed data, or introducing new dimensions.
- Slice and dice: project and select.
- Pivot (rotate): reorient the cube, visualization, 3D to a series of 2D planes.
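The operations above can be illustrated on a tiny cube stored as a dictionary from coordinates to a measure. This is a sketch under assumed data: the cell values and the helper names (roll_up, slice_cube, dice, pivot) are hypothetical, and roll-up is shown only in its dimension-reduction form. Drill-down is the reverse and needs the more detailed cells to be available, so it is not shown.

```python
from collections import defaultdict

DIMS = ("quarter", "product", "country")

# Hypothetical cube cells keyed by (quarter, product, country).
cells = {
    ("1Qtr", "TV", "Ireland"): 100,
    ("2Qtr", "TV", "Ireland"): 150,
    ("1Qtr", "PC", "Ireland"): 80,
    ("1Qtr", "TV", "France"): 120,
}


def roll_up(cube, drop):
    """Roll up by dimension reduction: aggregate away dimension `drop`."""
    idx = DIMS.index(drop)
    out = defaultdict(int)
    for key, value in cube.items():
        out[key[:idx] + key[idx + 1:]] += value
    return dict(out)


def slice_cube(cube, dim, value):
    """Slice: select the cells where one dimension has a fixed value."""
    idx = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[idx] == value}


def dice(cube, allowed):
    """Dice: select on several dimensions; `allowed` maps dim -> set of values."""
    return {
        k: v
        for k, v in cube.items()
        if all(k[DIMS.index(d)] in values for d, values in allowed.items())
    }


def pivot(cube, order):
    """Pivot: reorder the axes of every cell key."""
    idxs = [DIMS.index(d) for d in order]
    return {tuple(k[i] for i in idxs): v for k, v in cube.items()}


by_product_country = roll_up(cells, "quarter")
ireland_only = slice_cube(cells, "country", "Ireland")
tv_first_quarter = dice(cells, {"product": {"TV"}, "quarter": {"1Qtr"}})
rotated = pivot(cells, ("country", "product", "quarter"))
```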

Data Warehouse Architecture
[Figure: warehouse architecture]
- Data sources: Relational Databases, Legacy Data, Purchased Data, ERP Systems
- Components: Extraction, Cleansing, Optimized Loader, Data Warehouse Engine, Metadata Repository
- Front end: Analyze, Query

Data Warehouse Architecture
- Data extraction: gathering the data from multiple heterogeneous sources.
- Data cleaning: finding and correcting the errors in data.
- Data transformation: converting data from legacy format to warehouse format.
- Data loading: sorting, summarizing, consolidating, checking integrity, and building indices and partitions.
- Refreshing: updating from data sources to the warehouse.
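The steps above can be sketched as a small pipeline of Python generators feeding a SQLite table. Everything specific here is an assumption for illustration: the two source record layouts, the field names, the sales table, and the choice to drop rows with an unparseable amount (a simple stand-in for real error correction). Refreshing would amount to re-running the pipeline on new source data.

```python
import sqlite3

# Hypothetical raw records from two heterogeneous sources.
source_a = [
    {"cust": " alice ", "amount": "100.50", "date": "2010-01-15"},
    {"cust": "Bob", "amount": "n/a", "date": "2010-01-16"},
]
source_b = [
    {"customer_name": "CAROL", "total_cents": 2599, "date": "2010-01-15"},
]


def extract():
    """Extraction: gather records from both sources into one common shape."""
    for r in source_a:
        yield {"customer": r["cust"], "amount": r["amount"], "date": r["date"]}
    for r in source_b:
        yield {
            "customer": r["customer_name"],
            "amount": r["total_cents"] / 100,
            "date": r["date"],
        }


def clean(records):
    """Cleaning: keep only records whose amount parses as a number."""
    for r in records:
        try:
            amount = float(r["amount"])
        except (TypeError, ValueError):
            continue  # drop the bad row
        yield {**r, "amount": amount}


def transform(records):
    """Transformation: convert to the warehouse row format."""
    for r in records:
        yield (r["customer"].strip().title(), r["date"], round(r["amount"], 2))


def load(conn, rows):
    """Loading: insert the rows and build an index."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, sale_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_sales_date ON sales (sale_date)")
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(clean(extract())))
loaded = conn.execute(
    "SELECT customer, sale_date, amount FROM sales ORDER BY customer"
).fetchall()
print(loaded)
```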

Data Warehouse Models
- Enterprise warehouse: collects all of the information about subjects spanning the entire organization.
- Data mart: a subset of corporate-wide data that is of value to a specific group of users. Its scope is confined to specific, selected groups, such as a marketing data mart.

Introduction to Data Mining

What Motivated Data Mining?
We are drowning in data, but starving for knowledge!

What Is Data Mining?
- Data mining (knowledge discovery from data): extraction of interesting (implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data.
- Alternative names: knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.

Why Data Mining? Potential Applications
Data analysis and decision support
- Market analysis and management: target marketing, customer relationship management (CRM), market basket analysis, cross selling, market segmentation.
- Risk analysis and management: forecasting, customer retention, quality control, competitive analysis.
- Fraud detection and detection of unusual patterns (outliers).

Integration of Multiple Technologies
[Figure: data mining at the intersection of several fields]
- Artificial Intelligence
- Machine Learning
- Database Management
- Statistics
- Algorithms
- Visualization

What Can Data Mining Do?
- Cluster
- Classify: categorical, regression
- Summarize: summary statistics, summary rules
- Link analysis / model dependencies: association rules
- Detect deviations

Clustering
- Find groups of similar data items.
- Example: “Group people with similar travel profiles”
  - George, Patricia
  - Jeff, Evelyn, Chris
  - Rob
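One common way to find such groups is k-means, which the slide does not name; it is used here only as an example technique. In this sketch the travel profiles (trips per year, average nights per trip) and the starting centroids are invented so that the result matches the three groups on the slide.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for name, p in points.items():
            nearest = min(
                range(len(centroids)),
                key=lambda i: squared_distance(p, centroids[i]),
            )
            clusters[nearest].append(name)
        new_centroids = []
        for members, old in zip(clusters, centroids):
            if members:
                columns = zip(*(points[m] for m in members))
                new_centroids.append(tuple(sum(c) / len(members) for c in columns))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        centroids = new_centroids
    return clusters


# Hypothetical profiles: (trips per year, average nights per trip).
profiles = {
    "George": (2, 14),
    "Patricia": (3, 12),
    "Jeff": (20, 2),
    "Evelyn": (22, 3),
    "Chris": (18, 2),
    "Rob": (10, 30),
}

# Fixed starting centroids keep the example deterministic.
groups = kmeans(profiles, centroids=[(2, 14), (20, 2), (10, 30)])
print(groups)
```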

Classification
- Find ways to separate data items into pre-defined groups.
- Example: a bank loan officer wants to analyse the data in order to know which customers (loan applicants) are risky and which are safe.
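As an illustration of the loan example, the sketch below labels a new applicant by majority vote among the k nearest labeled applicants (k-nearest neighbors). The slide does not say which classifier is meant, so this is one possible choice; the features (income in units of 10,000 and number of existing loans) and the training data are invented.

```python
from collections import Counter

# Hypothetical labeled applicants: ((income in 10k units, existing loans), label).
training = [
    ((2.0, 3), "risky"),
    ((2.5, 4), "risky"),
    ((3.0, 3), "risky"),
    ((6.0, 0), "safe"),
    ((7.5, 1), "safe"),
    ((8.0, 0), "safe"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(applicant, examples, k=3):
    """Return the majority label among the k nearest training examples."""
    nearest = sorted(examples, key=lambda ex: squared_distance(applicant, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(classify((7.0, 1), training))
print(classify((2.2, 3), training))
```

The pre-defined groups here are the labels "risky" and "safe"; unlike clustering, the groups are known in advance and the task is to assign new items to them.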

Association Rules
- “Find groups of items commonly purchased together”
- Identify dependencies in the data: X makes Y likely.
- Indicate the significance of each dependency.
- Example: people who purchase X are likely to purchase Y.
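The significance of a rule X -> Y is usually expressed with two standard measures: support (the fraction of transactions containing both X and Y) and confidence (the fraction of transactions containing X that also contain Y). The sketch below computes both for pairs of single items; the transactions and the thresholds are made up for illustration.

```python
from itertools import combinations

# Hypothetical market baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "butter", "jam"},
]


def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset):
    """Fraction of all transactions that contain `itemset`."""
    return count(itemset) / len(transactions)


def confidence(x, y):
    """Of the transactions containing x, the fraction that also contain y."""
    return count(x | y) / count(x)


def pair_rules(min_support, min_confidence):
    """List single-item rules (x, y, support, confidence) meeting both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        s = support({a, b})
        if s < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            c = confidence({x}, {y})
            if c >= min_confidence:
                rules.append((x, y, s, c))
    return rules


rules = pair_rules(min_support=0.4, min_confidence=0.7)
for x, y, s, c in rules:
    print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}")
```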

Deviation Detection
- Find unexpected values.
- Uses: failure analysis, anomaly discovery for analysis.
- Example: “Find unusual occurrences in stock prices”
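A simple way to flag unexpected values is to measure how many standard deviations each value lies from the mean (a z-score) and report those beyond a threshold. This is only one of many possible approaches; the price series and the threshold of 2.0 are assumptions for illustration.

```python
from statistics import mean, pstdev


def find_deviations(values, threshold=2.0):
    """Return (index, value) pairs lying more than `threshold` std devs from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical daily closing prices with one unusual jump.
prices = [100.0, 101.0, 99.5, 100.5, 100.0, 130.0, 100.2, 99.8]

outliers = find_deviations(prices)
print(outliers)
```

A single extreme value inflates the mean and standard deviation it is compared against, so on longer or noisier series a robust statistic such as the median is often a better baseline.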

Knowledge Discovery (KDD) Process
Data mining is the core of the knowledge discovery process.
[Figure: Databases -> Data Cleaning and Data Integration -> Data Warehouse -> Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation]

Knowledge Process
- Data cleaning: to remove noise and inconsistent data.
- Data integration: to combine multiple sources.
- Data selection: to retrieve relevant data for analysis.
- Data transformation: to transform data into an appropriate form for data mining.
- Data mining
- Evaluation
- Knowledge presentation

Knowledge Process
Although data mining is only one step in the entire process, it is an essential one since it uncovers hidden patterns for evaluation.

Knowledge Process
Based on this view, the architecture of a typical data mining system may have the following major components:
- Database, data warehouse, World Wide Web, or other information repository
- Database or data warehouse server
- Data mining engine
- Pattern evaluation module
- User interface