COMP 578 Data Warehouse and Data Warehousing: An Introduction

COMP 578 Data Warehouse and Data Warehousing: An Introduction Keith C.C. Chan Department of Computing The Hong Kong Polytechnic University

What is A Data Warehouse?
“A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.” -- W. H. Inmon
[Diagram labels: HRO, Health, Data Warehouse, Student, ITs]

Data Warehousing and Industry
- One of the hottest topics in IS.
- Over 90% of larger companies either have a DW or are starting one.
- Warehousing is big business.
- Old statistics from Meta Group:
  - $3.5 billion in early 1997
  - $8 billion in 1998 [Meta Group]
  - over $200 billion over the next 5 years
- Latest by IDC on DW tools:
  - $5 billion in 1999
  - $16 billion in 2004
- Latest by IDC on CRM applications:
  - $61 billion in 2001
  - $148 billion in 2005

Data Warehousing and Industry (2)
- A 1996 study of 62 data warehousing projects showed an average return on investment of 321%, with an average payback period of 2.73 years.
- In 2003, some people are skeptical.
- WalMart has the largest warehouse:
  - 900-CPU, 2,700-disk, 23 TB Teradata system
  - ~7 TB in warehouse
  - 40-50 GB per day

Why Data Warehouse? Why The Hype?

Information vs. Data
- Information is pivotal in today’s business environment. Success is dependent on its early and decisive use. A lack of information is a sure sign for failure. The rapidly changing environment in which business operates demands ever more immediate access to data (Devlin, 1997).
- Many corporations are actively looking for new technologies that will assist them in becoming more profitable and competitive.
- Gaining competitive advantage requires that companies accelerate their decision-making process so that they can respond quickly to change.
- One key to this accelerated decision making is having the right information, at the right time, easily accessible (Poe, 1996).

The Information Gap
The information gap is a result of:
- The fragmented way in which ISs and supporting DBs have been developed.
  - One thing at a time, due to constraints on time and resources.
  - DBs on a variety of hardware and software platforms.
  - Difficult to locate and use accurate information.
- Most systems being developed to support operational processing.
  - Operational processing (a.k.a. TP) captures, stores and manipulates data to support daily operations.
  - Little thought given to the information or analytical tools needed for decision making.

Bridging The Information Gap
- Data warehouses (DW) consolidate and integrate information from many different sources and arrange it in a meaningful format for making accurate business decisions (Martin, 1997a).
- They support complex business decisions through analysis of trends, target marketing, competitive analysis, and so on.
- Data warehousing has evolved to meet these needs without disturbing existing operational processing.

What Are The Issues?
- How a DW relates to existing operational systems.
- The data architecture appropriate for most DW environments.
- Extracting data from existing operational systems and loading them into a DW.
- Interacting with a DW using OLAP, data mining and data visualization.

Data Warehouse & Data Warehousing as Solution

The Need for Data Warehouses
Two major factors drive the need for data warehousing in most organizations today:
- Business requires an integrated, company-wide view of high-quality information.
- The IS department must separate informational from operational systems in order to dramatically improve performance in managing company data.

Need for a Company-Wide View
- Data in operational systems are typically fragmented and of poor quality.
- They are generally distributed on a variety of incompatible HW and SW platforms:
  - Unix running the Oracle DBMS
  - IBM MVS running the DB2 DBMS
- It is often necessary to provide a single, corporate view of that information for decision making.

Deriving a Single Corporate View
- Develop a profile for each student from: STUDENT_DATA, STUDENT-EMPLOYEE, STUDENT_HEALTH
- Some issues to resolve:
  - Inconsistent key structures: HKID and student name.
  - Synonyms: Student_No and Student_ID.
  - Free-form vs. structured fields: last name, first name.
  - Inconsistent data values: different phone numbers.
  - Missing data: how will the value for insurance be located?
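The issues listed above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the sample records, the field names other than Student_No and Student_ID, and the helper names normalize_key, split_name and build_profiles are hypothetical and not taken from the slides. It resolves the key synonym, structures a free-form name, keeps conflicting phone numbers side by side for later review, and leaves missing insurance data explicitly empty.

```python
# Hypothetical extracts from the three source files (values are made up).
student_data = [{"Student_No": "S001", "name": "Chan, Tai Man", "phone": "2766 0001"}]
student_employee = [{"Student_ID": "s001 ", "name": "Tai Man Chan", "phone": "2766 9999"}]
student_health = [{"Student_ID": "S001", "insurance": None}]


def normalize_key(record):
    """Resolve the Student_No / Student_ID synonym into one upper-case key."""
    raw = record.get("Student_No") or record.get("Student_ID")
    return raw.strip().upper()


def split_name(name):
    """Turn 'Last, First' or free-form 'First Last' into (last, first)."""
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
    else:
        parts = name.split()
        first, last = " ".join(parts[:-1]), parts[-1]
    return last, first


def build_profiles(*sources):
    """Merge records from all sources into one profile per student."""
    profiles = {}
    for records in sources:
        for rec in records:
            key = normalize_key(rec)
            profile = profiles.setdefault(
                key, {"student_id": key, "phones": set(), "insurance": None}
            )
            if "name" in rec:
                profile["last_name"], profile["first_name"] = split_name(rec["name"])
            if rec.get("phone"):
                # Conflicting values are kept, not silently overwritten.
                profile["phones"].add(rec["phone"])
            if rec.get("insurance") is not None:
                profile["insurance"] = rec["insurance"]
    return profiles


profiles = build_profiles(student_data, student_employee, student_health)
```

Keeping both phone numbers instead of picking one reflects that a data-quality rule (most recent source wins, or manual review) still has to be chosen; the sketch does not make that choice.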

Need to Separate Operational and Informational Systems
- Operational systems are used to run a business in real time, based on current data.
  - E.g. sales order processing, reservation systems, patient registration.
  - They process large volumes of relatively simple read/write transactions while providing fast response.
- Informational systems are designed to support decision making based on historical data.
  - Designed for complex, read-only queries or data mining applications.
  - E.g. sales trend analysis, customer segmentation, and human resource planning.

Need to Separate Operational and Informational Systems (2)
- It is essential to separate informational processing from operational processing by creating a data warehouse.
- A DW centralizes data (at least logically) that are scattered throughout disparate operational systems and makes them readily available for decision support.
- A properly designed DW adds value to data by improving their quality and consistency.
- A separate data warehouse eliminates much of the contention for resources that results when informational applications are confounded with operational processing.

Data Warehouse vs. Operational DB Systems
- OLTP (on-line transaction processing)
  - Major task of traditional relational DBMS
  - Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.
- OLAP (on-line analytical processing)
  - Major task of data warehouse system
  - Data analysis and decision making

Data Warehouse vs. Operational DB Systems
Distinct features (OLTP vs. OLAP):
- User and system orientation: customer vs. market
- Data contents: current, detailed vs. historical, consolidated
- Database design: ER + application vs. star + subject
- View: current, local vs. evolutionary, integrated
- Access patterns: update vs. read-only but complex queries

Why Separate Data Warehouse?
- High performance for both systems:
  - DBMS, tuned for OLTP: access methods, indexing, concurrency control, recovery
  - Warehouse, tuned for OLAP: complex OLAP queries, multidimensional view, consolidation
- Different functions and different data:
  - Missing data: decision support requires historical data which operational DBs do not typically maintain.
  - Data consolidation: DS requires consolidation (aggregation, summarization) of data from heterogeneous sources.
  - Data quality: different sources typically use inconsistent data representations, codes and formats which have to be reconciled.
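The contrast in access patterns can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the orders table, its columns and its values are hypothetical, and a single in-memory database stands in for what would be two separately tuned systems. The first statement is an OLTP-style short update of one current record; the second is an OLAP-style read-only consolidation over all records.

```python
import sqlite3

# One in-memory database stands in for both systems in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (region, year, amount) VALUES (?, ?, ?)",
    [
        ("North", 2001, 100.0),
        ("North", 2002, 150.0),
        ("South", 2001, 80.0),
        ("South", 2002, 60.0),
    ],
)

# OLTP-style access: a short read/write transaction touching one record.
with conn:
    conn.execute("UPDATE orders SET amount = ? WHERE order_id = ?", (120.0, 1))

# OLAP-style access: a read-only query that consolidates many records.
totals_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
```

In an operational system the update competes with many other small transactions for locks and I/O, which is why running the consolidation query on the same system tends to cause the contention the slide describes.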

Advantages of Warehousing Approach
- High query performance
  - But not necessarily most current information
- Doesn’t interfere with local processing at sources
  - Complex queries at warehouse
  - OLTP at information sources
- Information copied at warehouse
  - Can modify, annotate, summarize, restructure, etc.
  - Can store historical information
  - Security, no auditing
- Has caught on in industry

The Terms & The Definitions

The Data Warehouse
Strategic response to customer requirement for providing and processing information:
- at various levels of abstraction
- using history for trend analysis
- with high performance
What it provides:
- A protected business decision support environment
- A repository of consolidated corporate data
- A staging area for revitalizing operational systems

Notes: Data warehouses are a strategic response to the business need for high-quality, high-value information. They provide multiple layers of detail not available in OLTP-bound databases (concepts like drill-down and data mining allow organizations to follow a chain of thought to its logical conclusion). The use of history allows trend analysis to occur. A question we can ask is: how well did a product sell over the last five years in the northwest region, as compared to the rest of the country? Traditional information or query access systems against OLTP databases cannot deliver the high performance or the depth of information required in today’s complex world of decision making. The data has to be there, it has to be right, and it has to be available, NOW.
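The trend question in the notes above (one product, the northwest region against the rest of the country, year by year) can be expressed as a single aggregate query over historical data. The sketch below uses Python's sqlite3 module; the sales table, its columns, the product names and all figures are hypothetical and only two years are shown to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (product TEXT, region TEXT, year INTEGER, units INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Widget", "Northwest", 1999, 10),
        ("Widget", "Northwest", 2000, 14),
        ("Widget", "Southeast", 1999, 30),
        ("Widget", "Midwest", 1999, 25),
        ("Widget", "Southeast", 2000, 28),
        ("Widget", "Midwest", 2000, 20),
        ("Gadget", "Northwest", 1999, 99),  # other product, filtered out below
    ],
)

# Per year: units sold in the Northwest vs. units sold everywhere else.
query = """
SELECT year,
       SUM(CASE WHEN region = 'Northwest' THEN units ELSE 0 END) AS northwest_units,
       SUM(CASE WHEN region <> 'Northwest' THEN units ELSE 0 END) AS rest_units
FROM sales
WHERE product = ?
GROUP BY year
ORDER BY year
"""
trend = conn.execute(query, ("Widget",)).fetchall()
```

The query only works because several years of history are retained in one place, which is the point the notes make about operational databases not keeping such data.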

What is A Data Warehouse
[Slide shows a cloud of terms: Multi-Dimensional Database, Data Rotation, Middleware, OLAP, Metadata, Data Scrubbers, Data Warehouse Manager, DSS, Data Mart, EIS, Dimensional Data Modeling, ESS, Data Mining, Star Schema, Data Propagation, Multi-relational tools]

Notes: Even the term Data Warehousing means different things to different people. To some it is a database; to others, a business decision support mechanism or a means of centralizing or rehabilitating corporate data. As usual in IT, we further confuse the issue by providing numerous synonyms and homonyms for what we mean, such as dimensional data modeling (also known as star schema), or data warehousing, also known as information warehousing or the information factory. So how do we make sense of it all?

What is a Data Warehouse? A Practitioner’s Viewpoint
“A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.” -- Barry Devlin, IBM Consultant

What is a Data Warehouse? An Alternative Viewpoint
“A DW is a subject-oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organizational decision making.” -- W.H. Inmon, Building the Data Warehouse, 1992

The Data Warehouse
Key characteristics:
- Subject-oriented
- Integrated
- Time-variant
- Nonvolatile

Notes: So what are its common characteristics? Data warehouses are data driven rather than application or procedure driven, as in OLTP. Databases are maintained around common subjects of information of interest to decision makers, such as customers and products. They are integrated in that the information they contain is consistent across the organization. They are time-variant as they reflect data across time by including multiple years of data. They are nonvolatile as they reflect a stable, consistent view of corporate information which is independent of the vagaries of day-to-day OLTP processing.

Subject Oriented
Data is stored by business subject rather than by application.
[Diagram labels, Operational Applications/Databases: Order Billing, Accounts Receivable, Accounts Payable, Loans, Savings, Life Insurance, Claims Processing, Auto Insurance]
[Diagram labels, Data Warehouse Subjects: Customer, Claims, Sales, Product]

Integrated
Data is stored once in a single integrated location.
[Diagram, Operational Environment: customer data stored in several databases (Savings Database with Savings Application, Current Accounts Database with Current Accounts Application, Personal Loans Database with Personal Loans Application)]
[Diagram, Decision Support Environment: Data Warehouse Database, Subject = Customer, No Application Flavor]

Time-variant
Data is stored as a series of snapshots or views which record data content and context across time.
[Diagram labels: Data Warehouse Data, Time, Data, Key, Version and Date timestamp]
- Data is tagged with some element of time: creation date, as of date/to, etc.
- Data is available for long periods of time, for example five or more years.
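A minimal sketch of the snapshot idea, assuming a hypothetical customer balance as the data item: each stored row carries the business key plus an as-of date, and a query asks what the value was at a given point in time. The names snapshots, record_snapshot and balance_as_of and all values are illustrative only.

```python
from datetime import date

# Each row is (customer_id, as_of_date, balance); rows are only ever appended.
snapshots = []


def record_snapshot(customer_id, as_of, balance):
    """Store a new dated snapshot alongside the older ones."""
    snapshots.append((customer_id, as_of, balance))


def balance_as_of(customer_id, when):
    """Return the balance from the latest snapshot taken on or before `when`."""
    candidates = [
        (as_of, balance)
        for cid, as_of, balance in snapshots
        if cid == customer_id and as_of <= when
    ]
    if not candidates:
        return None  # no history that far back
    return max(candidates, key=lambda row: row[0])[1]


record_snapshot("C1", date(2001, 1, 31), 500.0)
record_snapshot("C1", date(2002, 1, 31), 750.0)
record_snapshot("C1", date(2003, 1, 31), 600.0)

mid_2002 = balance_as_of("C1", date(2002, 6, 30))
before_history = balance_as_of("C1", date(2000, 12, 31))
```

Because the date is part of the key, older values remain queryable after newer ones arrive, which is what distinguishes this from an operational table that holds only the current balance.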

Non-volatile
Existing data in the warehouse is not overwritten or updated.
[Diagram: External Source Systems and Internal Source Systems perform Create, Update and Delete transactions; the Data Warehouse is READ-ONLY for Business Users & Applications]
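One way to make the read-only property concrete is to let the database reject changes to loaded rows. The sketch below is an assumption, not something the slides prescribe: it uses SQLite triggers via Python's sqlite3 module, and the table warehouse_facts, its columns and the helper try_statement are hypothetical. Loading (INSERT) is allowed; UPDATE and DELETE are refused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE warehouse_facts (
        fact_id INTEGER PRIMARY KEY,
        loaded_on TEXT,
        amount REAL
    );
    -- Refuse any attempt to overwrite or remove rows that were already loaded.
    CREATE TRIGGER no_update BEFORE UPDATE ON warehouse_facts
    BEGIN
        SELECT RAISE(ABORT, 'warehouse rows are read-only');
    END;
    CREATE TRIGGER no_delete BEFORE DELETE ON warehouse_facts
    BEGIN
        SELECT RAISE(ABORT, 'warehouse rows are read-only');
    END;
    """
)

# Loading new data is still permitted.
conn.execute(
    "INSERT INTO warehouse_facts (loaded_on, amount) VALUES ('2003-01-31', 100.0)"
)


def try_statement(sql):
    """Run a statement and report either 'ok' or the database error text."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.DatabaseError as exc:
        return str(exc)


update_result = try_statement("UPDATE warehouse_facts SET amount = 0")
delete_result = try_statement("DELETE FROM warehouse_facts")
row_count = conn.execute("SELECT COUNT(*) FROM warehouse_facts").fetchone()[0]
```

In practice the same effect is often achieved with access permissions or an append-only load process; triggers are used here only because they keep the sketch self-contained.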

How the Data Warehouse evolved
- Operational Reporting
- Data Extraction/Replication
- Data Warehouses
- Data Marts
- OLAP Servers
- Data Mining

Line of Business Data Marts extend the concept
[Diagram labels: Business Source Systems, Line of Business Systems, External Data, Other; Data Staging/Replication Layer (Extraction, Transformation, Cleansing); Data Warehouse; Data Marts]
- Extends the concept of data warehousing into the various lines of business in support of specific needs for business intelligence.
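The flow in the diagram (extraction, transformation, cleansing, then a warehouse from which line-of-business data marts are derived) can be sketched as a small pipeline. All names and values are hypothetical: source_rows stands in for extracted records, and transform, cleanse, load_warehouse and build_data_mart are illustrative helpers, not functions from any particular tool.

```python
# Hypothetical rows already extracted from two source systems.
source_rows = [
    {"system": "orders", "customer": " alice ", "line_of_business": "Retail", "amount": "120.50"},
    {"system": "billing", "customer": "BOB", "line_of_business": "retail", "amount": "80"},
    {"system": "billing", "customer": "Carol", "line_of_business": "Wholesale", "amount": "n/a"},
]


def transform(row):
    """Bring names and codes into one consistent representation."""
    return {
        "customer": row["customer"].strip().title(),
        "line_of_business": row["line_of_business"].strip().lower(),
        "amount": row["amount"],
        "source_system": row["system"],
    }


def cleanse(row):
    """Reject rows whose amount cannot be parsed; otherwise make it numeric."""
    try:
        amount = float(row["amount"])
    except ValueError:
        return None
    return {**row, "amount": amount}


def load_warehouse(rows):
    """Run transformation and cleansing, keeping only rows that pass."""
    warehouse = []
    for raw in rows:
        cleaned = cleanse(transform(raw))
        if cleaned is not None:
            warehouse.append(cleaned)
    return warehouse


def build_data_mart(warehouse, line_of_business):
    """A data mart here is simply the warehouse subset for one line of business."""
    return [row for row in warehouse if row["line_of_business"] == line_of_business]


warehouse = load_warehouse(source_rows)
retail_mart = build_data_mart(warehouse, "retail")
```

Deriving the mart from the already cleansed warehouse, instead of from the sources directly, means each line of business sees the same reconciled values.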

Data Mining further extends the concepts of tactical access to data in support of specific business objectives
[Diagram labels: Business Source Systems, Line of Business Systems, External Data, Other; Data Staging/Replication Layer (Extraction, Transformation, Cleansing); Data Warehouse]
- Specialized applications which run on OLAP servers for drill-down processing.
- Can include access by neural nets, gophers and agents.

Data Warehousing
- Definition 1: The process of constructing and using data warehouses.
- Definition 2: The process whereby organizations extract meaning from their informational assets through the use of data warehouses.