IST 210: Data Warehousing


Data Warehousing

Data Rich, but Information Poor
- Data is stored, not explored: by its volume and complexity it represents a burden, not a support.
- Data overload results in uninformed decisions, contradictory information, higher overhead, wrong decisions, and increased costs.
- Data is not designed or structured for successful management decision making.

Improving Decision Making
[Diagram: Data -> Information -> Decisions, with the Data Warehouse in support]

Data Warehouse Concepts

What's a Data Warehouse?
A data warehouse is a single, integrated source of decision support information formed by collecting data from multiple sources, internal to the organization as well as external, and transforming and summarising this information to enable improved decision making.
A data warehouse is designed for easy access by users to large amounts of information, and data access is typically supported by specialized analytical tools and applications.

Data Warehouse Characteristics
Key characteristics of a data warehouse:
- Subject-oriented
- Integrated
- Time-variant
- Non-volatile

Subject-Oriented
Example for an insurance company:
- Applications area (data organized by application): Commercial and Life Insurance Systems, Auto and Fire Policy Processing Systems, Accounting System, Claims Processing System, Billing System
- Data warehouse (data organized by subject): Policy, Customer, Losses, Premium

Integrated
Data is stored once in a single integrated location (example: an insurance company).
- Customer data is stored in several databases: Auto Policy Processing System; Fire Policy Processing System; FACTS, LIFE Commercial, Accounting Applications
- In the data warehouse database, the same data is held once under Subject = Customer

Time-Variant
- Data is tagged with some element of time: creation date, as-of date, etc.
- Data is available on-line for long periods of time for trend analysis and forecasting, for example five or more years.
- Data is stored as a series of snapshots or views which record how it is collected across time.
[Diagram: Data Warehouse Data, with Time and Data making up the Key]

Non-Volatile
Existing data in the warehouse is not overwritten or updated.
[Diagram: production applications update, insert, and delete data in the production databases. Data from the production databases and from external sources is loaded into the data warehouse database, which is read-only within the data warehouse environment.]
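
The time-variant and non-volatile characteristics can be sketched together in a few lines of Python. This is only an illustration under stated assumptions: the `warehouse` dictionary, the `load_snapshot` and `history` helpers, and the customer and premium values are hypothetical and do not come from the slides. Each load appends a dated snapshot, and an existing snapshot is never overwritten.

```python
from datetime import date

# Hypothetical warehouse store: each entry is a snapshot keyed by
# (customer_id, as_of_date), so time is part of the key.
warehouse = {}

def load_snapshot(customer_id, as_of, data):
    """Append a dated snapshot; existing snapshots are never overwritten."""
    key = (customer_id, as_of)
    if key in warehouse:
        raise ValueError(f"snapshot {key} is already loaded")
    warehouse[key] = dict(data)

def history(customer_id):
    """Return all snapshots for one customer in time order (trend analysis)."""
    rows = [(as_of, data)
            for (cid, as_of), data in warehouse.items()
            if cid == customer_id]
    return sorted(rows, key=lambda row: row[0])

# Two yearly loads for the same (made-up) customer.
load_snapshot("C1", date(2004, 1, 31), {"premium": 1200})
load_snapshot("C1", date(2005, 1, 31), {"premium": 1350})

premiums = [data["premium"] for _, data in history("C1")]
```

Here `premiums` holds one value per snapshot, which is the raw material for trend analysis; a second load for the same customer and date is rejected instead of updating the stored row.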

Transaction System vs. Data Warehouse

Transaction-Based Reporting System
[Diagram labels: on-line, real-time update into disparate systems (Unix, VMS, MVS, other); day-to-day operations; system experts; users; data manipulation]

Warehouse-Based Reporting System
- BENEFIT: Reduce data processing costs
- BENEFIT: Integrated, consistent data available for analysis
- BENEFIT: Improve Network Reporting processes and analytical capabilities
[Diagram labels: source systems (Unix, VMS, MVS, other); interfaces; data staging, transformation and cleansing; data warehouse; summarization; OLAP; executive reporting and on-line analysis environment]

Transaction - Warehouse Process
- "Transaction Based Process": on-line, real-time update; day-to-day operations; detailed information to operational systems.
- Batch load between the two: transform, summarize & refine.
- "Warehouse Based Process": decision support for management use.

Transaction System vs. Data Warehouse
Data Warehouse:
- Supports management analysis and decision-making processes
- Contains summarized, refined, and cleansed information
- Non-volatile: provides a data "snapshot"; adjustments are not permitted, or are limited
- Business analysis requirements drive the data structure and system design
- Integrated, consistent information on a single technology platform
- Users have direct, fast access via On-line Analytical Processing tools
- Minimal impact on operational processes
Transaction System:
- Supports day-to-day operational processes
- Contains raw, detailed data that has not been refined or cleansed
- Volatile: data changes from day to day, with frequent updates
- Technical issues drive the data structure and system design
- Disparate data structures, physical locations, query types, etc.
- Users rely on technical analysts for reporting needs
- Operational processes are impacted by queries run off of the system

Data Warehouse Architecture

Data Warehouse Architecture
[Diagram labels: Operational System; Conversion & Interface; ODS; Staging Area; Data Warehouse; Data Marts; OLAP Cubes; Ad-hoc Reporting; Canned Reports]

Data Warehouse Architecture: Operational System Characteristics
- Systems are widely dispersed
- Systems are organized for on-line transaction processing (OLTP)
- Functionality and data definitions are typically duplicated across many systems

Data Warehouse Architecture: Conversion and Cleansing Activities
- Map source data to target
- Data scrubbing
- Derive new data
- Data extraction
- Transform / convert data
- Create / modify metadata
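
A minimal sketch of several of these activities, assuming a hypothetical source extract: the source column names (`CUST_NM`, `GROSS`, `NET`), the derived `deductions` measure, and the helper functions are illustrative assumptions, not part of the slides. The account names and sales figures reuse values that appear in the star schema example.

```python
# Hypothetical rows as they might arrive from an operational system.
source_rows = [
    {"CUST_NM": "  ABC   Electronics ", "GROSS": "50,000", "NET": "30,000"},
    {"CUST_NM": "Midway Electric", "GROSS": "42,000", "NET": "23,000"},
]

# Map source data to target: source column -> warehouse column (assumed names).
COLUMN_MAP = {"CUST_NM": "account_desc", "GROSS": "gross_sales", "NET": "net_sales"}

def to_number(text):
    """Transform / convert: turn a string such as '50,000' into the int 50000."""
    return int(text.replace(",", ""))

def convert(row):
    target = {COLUMN_MAP[column]: value for column, value in row.items()}
    # Data scrubbing: collapse repeated spaces and trim the ends.
    target["account_desc"] = " ".join(target["account_desc"].split())
    target["gross_sales"] = to_number(target["gross_sales"])
    target["net_sales"] = to_number(target["net_sales"])
    # Derive new data: an illustrative measure not stored in the source.
    target["deductions"] = target["gross_sales"] - target["net_sales"]
    return target

cleaned = [convert(row) for row in source_rows]

# Create / modify metadata: record which source column feeds each target column.
metadata = {target: source for source, target in COLUMN_MAP.items()}
```

Keeping the column mapping in one table (`COLUMN_MAP`) means the same structure can double as lineage metadata, which is why `metadata` is derived from it rather than written by hand.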

Data Warehouse Architecture: ODS vs. DWH Staging Area
ODS:
- Contains current and near-current data
- Contains almost all detail data
- Data is updated frequently
- Used to report a status continuously and ask specific questions; not flexible
DWH Staging Area:
- Contains historical data
- Contains summarized and detailed data
- Data is non-volatile
- Used to populate the DWH, which makes OLAP possible; flexible

Data Warehouse Architecture: Data Staging Area vs. Presentation Area
Back room (data staging area):
- Sequential processing: clean, combine, sort, archive, remove duplicates, add keys
- Off limits to the end users
Front room (presentation area):
- ROLAP - OLAP: subject oriented, locally implemented, user group driven
- Available for end user inquiry
[Diagram labels: DWH Staging Area; DWH; Detailed Data]
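
The back-room processing steps can be sketched as a small pipeline. This is an illustration only: the two extracts, the `stage` function, and the `customer_key` surrogate key are hypothetical, and the archive step is left out to keep the sketch short. The account names reuse values from the star schema example.

```python
from itertools import count

# Hypothetical extracts from two operational systems holding customer data.
auto_policy = [{"name": " Midway Electric "}, {"name": "ABC Electronics"}]
fire_policy = [{"name": "ABC Electronics"}, {"name": "Washburn, Inc."}]

def stage(*extracts):
    # Clean and combine: strip stray whitespace and merge all extracts.
    combined = [{"name": row["name"].strip()}
                for extract in extracts
                for row in extract]
    # Sort: bring identical records next to each other.
    ordered = sorted(combined, key=lambda row: row["name"])
    # Remove duplicates: keep the first record of each run of equal names.
    unique = []
    for row in ordered:
        if not unique or unique[-1]["name"] != row["name"]:
            unique.append(row)
    # Add keys: assign a warehouse surrogate key to each remaining record.
    keys = count(1)
    return [{"customer_key": next(keys), **row} for row in unique]

staged = stage(auto_policy, fire_policy)
```

Sorting before deduplication keeps the processing strictly sequential, which matches the batch style of a staging area that end users never query directly.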

Data Warehouse Architecture: Data Warehouse Components
- Ranges from detailed to summarized data
- Contains metadata
- Many views of the data
- Subject-oriented
- Time-variant
[Diagram labels: Detailed Data; Summary Data; Metadata]

Data Warehouse Model

Requirements Gathering Process: Business Measure Definition
- Standard definition and related business rules and formulas
- Source data element(s), including quality constraints
- Data granularity levels (e.g., county detail for state)
- Data retention (e.g., one month, one quarter, one year, multiple years)
- Priority of the information (for example, is the information necessary to derive other business measures?)
- Data load frequency (e.g., monthly, quarterly, etc.)

Data Modeling Process: Fact Table and Dimension
Fact Table:
- Each subject area (e.g. Business Unit) has its own fact table.
- Fact tables relate or link dimensions.
- Each attribute of the fact table is a measure or foreign key.
- "The best facts are numeric, additive and continuously valued."
- Fact tables never contain direct links to other fact tables.
Dimension:
- Best defined in focus sessions and interviews
- Business Unit specific and overall perspectives
- Typically hierarchical in nature

Star Join Schema
Fact table: Monthly_Sales_Summary_Table
- Columns: month, prod_id, region_id, account_id, vend_id, net_sales, gross_sales
- Values shown: region_id SW, NE, SW; net_sales 30,000, 23,000, 32,000; gross_sales 50,000, 42,000, 49,000
Dimension tables:
- Region_Dimension_Table. Columns: region_id, region_doc. Values shown: NE Northeast, NW Northwest, SE Southeast, SW Southwest
- Account_Dimension_Table. Columns: account_id, account_doc. Values shown: ABC Electronics; Midway Electric; Victor Components; Washburn, Inc.; Zerox
- Product_Dimension_Table. Columns: prod_grp_id, prod_id, prod_grp_desc, prod_desc. Values shown: prod_grp_desc Fewer devices, Circuit boards, Components; prod_desc Power supply, Motherboard, Co-processor
- Time_Dimension_Table. Columns: month, mo_in_fiscal_yr, month_name. Values shown: January, February, March
- Vendor_Dimension_Table. Columns: vend_id, vendor_desc. Values shown: PowerAge, Inc.; Advanced Micro Devices; Farad Incorporated
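
A star join can be tried out with Python's built-in `sqlite3` module. The sketch below is a cut-down version of the schema on this slide with only the region dimension. The lowercase table names and the month numbers 1 to 3 are assumptions made for illustration; the region codes and sales figures are the ones shown on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE region_dimension (
    region_id  TEXT PRIMARY KEY,
    region_doc TEXT
);
-- Fact table: a foreign key into the dimension plus additive measures.
CREATE TABLE monthly_sales_summary (
    month       INTEGER,  -- month numbers are assumed for this sketch
    region_id   TEXT REFERENCES region_dimension(region_id),
    net_sales   INTEGER,
    gross_sales INTEGER
);
INSERT INTO region_dimension VALUES
    ('NE', 'Northeast'), ('NW', 'Northwest'),
    ('SE', 'Southeast'), ('SW', 'Southwest');
INSERT INTO monthly_sales_summary VALUES
    (1, 'SW', 30000, 50000),
    (2, 'NE', 23000, 42000),
    (3, 'SW', 32000, 49000);
""")

# Star join: join the fact table to a dimension, then aggregate the measures.
rows = conn.execute("""
    SELECT r.region_doc, SUM(f.net_sales), SUM(f.gross_sales)
    FROM monthly_sales_summary AS f
    JOIN region_dimension AS r ON r.region_id = f.region_id
    GROUP BY r.region_doc
    ORDER BY r.region_doc
""").fetchall()
```

Because the measures are numeric and additive, they can be summed along any dimension; adding the account, product, time, or vendor dimension would follow the same join pattern.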

Storage of Cubes
ROLAP (Relational On-line Analytical Processing):
- Fact table is stored in relational databases using keys
- Building is faster, querying is slower
- Less storage space
- Used for very large amounts of data
MOLAP (Multi-dimensional On-line Analytical Processing):
- Cube is stored using multidimensional tables
- Data is stored in the cube, using sparsity
- Longer building time, faster querying of the cube
- More storage space needed
- Used for smaller amounts of data
HOLAP (Hybrid On-line Analytical Processing):
- Cube is stored using a combination of both techniques
- Detailed data is stored using ROLAP
- Summarized data is stored using MOLAP
- Synergy between storage and efficiency
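
The storage trade-off can be illustrated with a toy example. This is a deliberately simplified sketch, not how any real OLAP engine stores data: the month labels, the dictionary used for the ROLAP side, and the dense nested list used for the MOLAP side are all assumptions. Real MOLAP engines compress sparse cubes, which this sketch does not attempt.

```python
# The same three sales facts (values from the star schema slide,
# month labels assumed), stored two ways.
facts = [("Jan", "SW", 30000), ("Feb", "NE", 23000), ("Mar", "SW", 32000)]
months = ["Jan", "Feb", "Mar"]
regions = ["NE", "NW", "SE", "SW"]

# ROLAP-style: keep only the fact rows, addressed by their dimension keys.
rolap = {(month, region): sales for month, region, sales in facts}

# MOLAP-style: pre-build a month x region array; most cells stay empty.
molap = [[None] * len(regions) for _ in months]
for month, region, sales in facts:
    molap[months.index(month)][regions.index(region)] = sales

def rolap_total(region):
    """Scan every stored fact row at query time."""
    return sum(sales for (_, r), sales in rolap.items() if r == region)

def molap_total(region):
    """Read one column of the pre-built cube."""
    column = regions.index(region)
    return sum(row[column] for row in molap if row[column] is not None)

stored_rolap_cells = len(rolap)                 # one entry per fact
stored_molap_cells = len(months) * len(regions)  # one cell per combination
```

Both layouts answer the same question with the same result; they differ in how many cells are stored and in how much work is done at build time versus query time.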

Multi-Dimensional Analysis

Application of a Data Warehouse

Application Solution Classes
- Executive information systems (EIS): present information at the highest level of summarization using corporate business measures. They are designed for extreme ease of use and, in many cases, only a mouse is required. Graphics are usually generously incorporated to provide at-a-glance indications of performance.
- Decision support systems (DSS): ideally present information in graphical and tabular form, giving the user the ability to drill down on selected information. Compared with an EIS, they offer more detail and more data manipulation options.
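
The difference between the highest level of summarization and a drill-down can be sketched as follows. The `summarize` helper and the row layout are hypothetical; the regions and net sales figures reuse the values from the star schema slide, with month names assigned for illustration.

```python
from collections import defaultdict

# Illustrative detail rows (month assignment is an assumption).
sales = [
    {"region": "SW", "month": "January", "net_sales": 30000},
    {"region": "NE", "month": "February", "net_sales": 23000},
    {"region": "SW", "month": "March", "net_sales": 32000},
]

def summarize(rows, *levels):
    """Total net sales grouped by the given levels; no levels = grand total."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[level] for level in levels)] += row["net_sales"]
    return dict(totals)

# EIS-style view: a single corporate measure at the highest summary level.
eis_view = summarize(sales)

# DSS-style views: break the measure down, then drill into one region.
by_region = summarize(sales, "region")
southwest_rows = [row for row in sales if row["region"] == "SW"]
southwest_by_month = summarize(southwest_rows, "region", "month")
```

Each drill-down step adds one more grouping level to the same measure, so the detailed totals always add back up to the summary figure.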

Data Mining
Data mining provides techniques to:
- Detect trends or patterns and find correlations
- Perform exploratory data analysis
- Support forecasting and business modeling
Intelligent agents are coupled to the data warehouse using different techniques:
- Neural networks
- Expert systems
- Advanced statistics
The volume and complexity of information must not become a barrier.
Applications: early warning systems, fraud detection, market research, direct mail.