Intro. to Data Warehouse


Intro. to Data Warehouse
Worapoj Kreesuradej, Ph.D. (รศ.ดร. วรพจน์ กรีสุระเดช), Associate Professor
Data Mining & Data Exploration Laboratory (DME Lab), Faculty of Information Technology, King Mongkut's Institute of Technology Ladkrabang
Web: www.it.kmitl.ac.th/dme
Email: worapoj@it.kmitl.ac.th

Books
- Paulraj Ponniah, Data Warehousing Fundamentals, John Wiley & Sons, 2001.
- Ralph Kimball and Margy Ross, The Data Warehouse Toolkit, John Wiley & Sons, 2002.

Definition of DW
- “A collection of integrated, subject-oriented databases designed to supply the information required for decision-making.” (W. Inmon)
- A decision support database that is maintained separately from the organization’s operational databases.
- A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format. (E. Turban et al.)

R. Kimball’s definition of a DW
- A data warehouse is a copy of transactional data specifically structured for querying and analysis.

Problem: Data Management in Large Enterprises
- Vertical fragmentation of informational systems
- Result of application (user)-driven development of operational systems
[Figure labels: Sales Administration, Finance, Manufacturing, ...; Sales Planning, Stock Mngmt, Suppliers, Debt Mngmt, Num. Control, Inventory]

Problem: Data Management in Large Enterprises
Two approaches for accessing data:
- Query-driven (lazy)
- Warehouse (eager)
[Figure labels: "?", Source, Source]

The Need for DW
Query-driven (lazy, on-demand) approach
[Figure: Clients query an Integration System (with Metadata), which reaches each Source through a Wrapper]

Disadvantages of Query-Driven Approach
- Delay in query processing
- Inefficient and potentially expensive for frequent queries
- Competes with local processing at sources

The Warehousing Approach
- Information integrated in advance
- Stored in the warehouse for direct querying and analysis
[Figure: an Extractor/Monitor on each Source feeds the Integration System (with Metadata), which loads the Data Warehouse queried by Clients]

Advantages of Warehousing Approach
- High query performance
- Doesn’t interfere with local processing at sources
- Information copied at warehouse
- Can modify, annotate, summarize, restructure, etc.
- Can store historical information
- Security, no auditing
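
The contrast between the two approaches can be sketched in a few lines of Python. This is a toy illustration only: the `SOURCES` dictionary, its field names, and the amounts are hypothetical stand-ins for separate operational systems, not anything taken from the slides.

```python
# Hypothetical in-memory "sources"; in practice these would be separate
# operational systems reached through wrappers or extractors.
SOURCES = {
    "retail": [{"product": "A", "amount": 10.0}, {"product": "B", "amount": 5.0}],
    "catalog": [{"product": "A", "amount": 7.5}],
}


def query_driven_total(product):
    """Lazy approach: visit every source at query time."""
    return sum(
        row["amount"]
        for rows in SOURCES.values()
        for row in rows
        if row["product"] == product
    )


def build_warehouse():
    """Eager approach: integrate once, in advance, into a local summary store."""
    totals = {}
    for rows in SOURCES.values():
        for row in rows:
            totals[row["product"]] = totals.get(row["product"], 0.0) + row["amount"]
    return totals


warehouse = build_warehouse()


def warehouse_total(product):
    """Answer from the warehouse copy without touching the sources."""
    return warehouse.get(product, 0.0)
```

Both functions return the same total, but only `query_driven_total` scans the sources on every call; `warehouse_total` reads a copy that was integrated beforehand, which is why it may lag behind the sources until the next load.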

Characteristics of DW
- Subject oriented: data are organized by how users refer to them
- Integrated: inconsistencies are removed in both nomenclature and conflicting information (i.e., data are ‘clean’)
- Non-volatile: read-only data; data do not change over time
- Time variant: data are time series, not current status

Subject Oriented
- Data warehouse is designed around “subjects” rather than processes
- A company may have a Retail Sales System, an Outlet Sales System, and a Catalog Sales System
- DW will have a Sales Subject Area

Subject Oriented
[Figure: OLTP systems (Retail Sales System, Outlet Sales System, Catalog Sales) feed the Sales Subject Area of the Data Warehouse, giving subject-oriented sales information]

Integrated
- Heterogeneous source systems
- Need to integrate source data
- For example, product codes could be different in different systems
- Arrive at a common code in DW
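
The product-code example above could look roughly like the following. The system names, local codes, and the common code `PROD0001` are all invented for illustration; a real warehouse would usually keep such a mapping in a maintained cross-reference table.

```python
# Hypothetical cross-reference from (source system, local product code)
# to the single common code used in the warehouse.
CODE_MAP = {
    ("retail", "P-001"): "PROD0001",
    ("outlet", "1"): "PROD0001",
    ("catalog", "CAT/001"): "PROD0001",
}


def to_common_code(system, local_code):
    """Translate a source-specific product code into the warehouse code."""
    try:
        return CODE_MAP[(system, local_code)]
    except KeyError:
        # Unmapped codes are surfaced instead of being loaded silently.
        raise ValueError(
            f"no common code for {local_code!r} in system {system!r}"
        ) from None
```

Raising on an unmapped code is one possible design choice; it keeps inconsistent codes out of the warehouse at the cost of stopping the load until the mapping is extended.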

Integrated
- Information integrated in advance
- Stored in DW for direct querying and analysis
[Figure labels: Source, Extractor/Monitor, Integration System, Metadata, Data Warehouse, Clients]

Non-Volatile
- Operational update of data does not occur in the data warehouse environment.
- Does not require transaction processing, recovery, and concurrency control mechanisms.
- Requires only two operations in data accessing: initial loading of data and access of data.
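
As a rough illustration of the "two operations only" idea, the toy class below exposes just a bulk load and a read. The class name `ReadMostlyStore` and the sample rows are hypothetical; real warehouses enforce this through load jobs and permissions rather than a Python class.

```python
class ReadMostlyStore:
    """Toy warehouse table exposing only loading and reading."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        # Bulk, append-only load; existing rows are never updated in place.
        self._rows.extend(tuple(row) for row in rows)

    def read(self):
        # Return an immutable snapshot so callers cannot modify stored data.
        return tuple(self._rows)


store = ReadMostlyStore()
store.load([("2003", "P-001", 20.0)])
store.load([("2004", "P-001", 30.0)])
```

Because there is no update or delete method, earlier loads stay visible after later ones, which is what lets the store keep history.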

Non-Volatile (Read-Mostly)
[Figure: the OLTP user both reads and writes; the DW user only reads]

Time Variant
- The time horizon for the data warehouse is significantly longer than that of operational systems.
- Operational database: current value data.
- Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years).

Time Variant
- Most business analysis has a time component
- Trend analysis (historical data is required)
[Figure: sales by year, 2001 to 2004]
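
A small sketch of why trend analysis needs history: year-over-year change can only be computed when earlier years are still stored. The sales figures below are made up for illustration; only the years 2001 to 2004 come from the slide.

```python
# Illustrative, made-up yearly sales totals; only the years come from the slide.
SALES_BY_YEAR = {2001: 100.0, 2002: 120.0, 2003: 90.0, 2004: 135.0}


def year_over_year_change(sales):
    """Return the fractional change of each year relative to the previous one."""
    years = sorted(sales)
    return {
        year: (sales[year] - sales[prev]) / sales[prev]
        for prev, year in zip(years, years[1:])
    }


trend = year_over_year_change(SALES_BY_YEAR)
```

An operational database holding only the current value would have nothing to compare against, so `trend` could not be computed from it.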

Data Warehousing Process Overview
The major components of a data warehousing process:
- Data sources
- Data extraction
- Data loading
- Comprehensive database / data store
- Data mart
- Metadata
- Middleware tools / information delivery tools

ETL
- Data extraction
- Data cleaning and transformation: convert from legacy/host format to warehouse format
- Load: sort, summarize, consolidate, compute views, check integrity, build indexes, partition

The ETL Process
[Figure: Source Systems -> (Extract) -> Staging Area (Transform) -> (Load) -> DW Database]

Data Staging Area
- A storage area where extracted data is cleaned, transformed and deduplicated
- Initial storage for data
- Need not be based on the relational model
- Mainly sorting and sequential processing
- Does not provide data access to users
- Analogy: kitchen of a restaurant
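
The extract, stage, and load steps described above can be sketched with Python's standard `sqlite3` module standing in for the warehouse database. Everything specific here is an assumption made for illustration: the source rows, the field names, the exchange rates, and the `sales` table layout.

```python
import sqlite3

# Hypothetical source extract; field names and values are illustrative.
RETAIL_ROWS = [
    {"sku": " p-001 ", "amount": "10.00", "currency": "USD"},
    {"sku": "P-001", "amount": "10.00", "currency": "USD"},  # same row, dirty code
    {"sku": "P-002", "amount": "8.00", "currency": "EUR"},
]
RATES_TO_USD = {"USD": 1.0, "EUR": 1.25}  # made-up conversion rates


def extract():
    """Extract: copy rows out of the (hypothetical) source system."""
    return list(RETAIL_ROWS)


def transform(rows):
    """Staging: normalize codes, convert currency, drop exact duplicates."""
    seen, cleaned = set(), []
    for row in rows:
        sku = row["sku"].strip().upper()
        amount_usd = round(float(row["amount"]) * RATES_TO_USD[row["currency"]], 2)
        key = (sku, amount_usd)
        # Simplification: identical rows count as duplicates; a real pipeline
        # would deduplicate on a business key such as a transaction id.
        if key not in seen:
            seen.add(key)
            cleaned.append(key)
    return cleaned


def load(conn, rows):
    """Load: insert cleaned rows into the warehouse table and index it."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (sku TEXT, amount_usd REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_sales_sku ON sales (sku)")
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
```

The staging work happens entirely in `transform`, which users never query directly; only the loaded `sales` table is exposed, mirroring the kitchen analogy on the slide.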

ETL Process Issues & Challenges
- Consumes 70-80% of project time
- Heterogeneous source systems
- Little or no control over source systems
- Source systems scattered
- Different currencies, measurement units
- Ensuring data quality

Comprehensive Database / Data Store
- Mostly a relational DB: Oracle, DB2, Sybase, SQL Server
- New DB design for the special purpose of DW (e.g., scale up, speed up, parallel processing)

Data Warehouse Design
- OLTP systems are data capture systems: “DATA IN” systems
- DW are “DATA OUT” systems
[Figure labels: OLTP, DW]

Dimensional Modeling
- Facts are stored in FACT tables
- Dimensions are stored in DIMENSION tables
- Dimension tables contain textual descriptors of the business
- Fact and dimension tables form a Star Schema
- “BIG” fact table in center surrounded by “SMALL” dimension tables
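
A minimal star schema can be sketched with `sqlite3`: one fact table whose foreign keys point to two small dimension tables. The table names, columns, and rows are hypothetical and chosen only to make the join visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: small, with textual descriptors.
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,
        product_name TEXT,
        category TEXT
    );
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY,
        year INTEGER,
        month INTEGER
    );
    -- Fact table: large, with numeric measures and keys into the dimensions.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product (product_key),
        date_key INTEGER REFERENCES dim_date (date_key),
        quantity INTEGER,
        amount REAL
    );
    INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Manual', 'Books');
    INSERT INTO dim_date VALUES (20030101, 2003, 1), (20040101, 2004, 1);
    INSERT INTO fact_sales VALUES
        (1, 20030101, 2, 20.0),
        (1, 20040101, 3, 30.0),
        (2, 20040101, 1, 15.0);
    """
)


def sales_by_category_and_year(connection):
    """Join the fact table to its dimensions and total the amount measure."""
    return connection.execute(
        """
        SELECT p.category, d.year, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        JOIN dim_date AS d ON d.date_key = f.date_key
        GROUP BY p.category, d.year
        ORDER BY p.category, d.year
        """
    ).fetchall()
```

The query shape is typical for a star schema: filter and group by dimension attributes, aggregate the measures stored in the fact table.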

Star Schema

Data mart
- Data mart = subset of DW for a community of users, e.g., the accounting department
- Sometimes exists as a multidimensional database
- Info mart = summarized data + reports for a community of users
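
The relationship between warehouse, data mart, and info mart can be illustrated with plain Python. The `WAREHOUSE` rows, the `subject` field, and the amounts are invented for this sketch.

```python
# Hypothetical warehouse rows tagged with the subject area they belong to.
WAREHOUSE = [
    {"subject": "sales", "year": 2004, "amount": 30.0},
    {"subject": "accounting", "year": 2004, "amount": 12.0},
    {"subject": "accounting", "year": 2003, "amount": 8.0},
]


def build_data_mart(rows, subject):
    """Data mart: the subset of warehouse rows for one community of users."""
    return [dict(row) for row in rows if row["subject"] == subject]


def summarize_by_year(rows):
    """Info-mart style summary: total amount per year."""
    totals = {}
    for row in rows:
        totals[row["year"]] = totals.get(row["year"], 0.0) + row["amount"]
    return totals


accounting_mart = build_data_mart(WAREHOUSE, "accounting")
accounting_summary = summarize_by_year(accounting_mart)
```

Here the mart is derived from the warehouse, which corresponds to a dependent data mart; an independently built mart would be loaded from the sources directly.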

Meta Data
- Data about data
- Needed by both information technology personnel and users
- IT personnel need to know data sources and targets; database, table and column names; refresh schedules; data usage measures; etc.
- Users need to know entity/attribute definitions; reports/query tools available; report distribution information; help desk contact information, etc.
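
One possible way to picture such metadata is a small catalog record per warehouse table. The `TableMetadata` class, its fields, and the sample entry are assumptions for illustration, not a standard metadata format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TableMetadata:
    """Hypothetical metadata record for one warehouse table."""

    table: str
    source_systems: tuple          # technical metadata: where the data comes from
    refresh_schedule: str          # technical metadata: when it is reloaded
    column_definitions: dict = field(default_factory=dict)  # business metadata


CATALOG = {
    "fact_sales": TableMetadata(
        table="fact_sales",
        source_systems=("retail", "outlet", "catalog"),
        refresh_schedule="nightly",
        column_definitions={"amount_usd": "Sale amount converted to US dollars"},
    )
}


def describe_column(table, column):
    """Look up the business definition a user would see for a column."""
    return CATALOG[table].column_definitions[column]
```

IT staff would mostly read `source_systems` and `refresh_schedule`, while end users would mostly rely on `describe_column`.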

Information Delivery Tools
- Query & reporting
- OLAP
- Data mining, visualization, segmentation, clustering
- New developments: text mining, web mining & personalization
- Mining multimedia data

Information Delivery Tools
- Commercial tools: Crystal Reports, Impromptu, WebFocus
- Increasingly common mode of delivery: Web-enabled

Data Warehouse Architecture
- Data flow architecture
- System architecture

Data Flow Architecture

Data Flow Architecture
- Operational data store (ODS): a type of database often used as an interim area for a data warehouse, especially for customer information files
- MDB = multidimensional database

System Architectures
Three parts of the data warehouse:
- The data warehouse that contains the data and associated software
- Data acquisition (back-end) software that extracts data from legacy systems and external sources, consolidates and summarizes them, and loads them into the data warehouse
- Client (front-end) software that allows users to access and analyze data from the warehouse

System Architectures

System Architecture

Data Warehouse Development
Data warehouse development approaches:
- Inmon model: EDW approach, enterprise-wide warehouse, top down
- Kimball model: data mart approach, data mart, bottom up
Which model is best?
- There is no one-size-fits-all strategy to data warehousing
- When properly executed, both result in an enterprise-wide data warehouse, but with different architectures

The Data Mart Strategy
- The most common approach
- Begins with a single mart; architected marts are added over time for more subject areas
- Relatively inexpensive and easy to implement
- Can be used as a proof of concept for data warehousing
- Can perpetuate the “silos of information” problem
- Can postpone difficult decisions and activities
- Requires an overall integration plan

The Enterprise-wide Strategy
- A comprehensive warehouse is built initially
- An initial dependent data mart is built using a subset of the data in the warehouse
- Additional data marts are built using subsets of the data in the warehouse
- Like all complex projects, it is expensive, time consuming, and prone to failure
- When successful, it results in an integrated, scalable warehouse

DW Lifecycle (Ralph Kimball)

Data Warehouse Development
Some best practices for implementing a data warehouse (Weir, 2002):
- Project must fit with corporate strategy and business objectives
- There must be complete buy-in to the project by executives, managers, and users
- It is important to manage user expectations about the completed project
- The data warehouse must be built incrementally
- Build in adaptability

Data Warehouse Development
Some best practices for implementing a data warehouse (Weir, 2002), continued:
- The project must be managed by both IT and business professionals
- Develop a business/supplier relationship
- Only load data that have been cleansed and are of a quality understood by the organization
- Do not overlook training requirements
- Be politically aware