Building The Data Warehouse By W. H. Inmon

CHAPTER 1: Evolution of Decision Support Systems

The data warehouse requires an architecture that begins by looking at the whole and then works down to the particulars. The story of the data warehouse begins with the evolution of information and decision support systems.

The Evolution

In the early 1960s, the world of computation consisted of creating individual applications that were run against master files. The programs were usually written in COBOL, and the master files were housed on magnetic tape, which was good for storing a large volume of data cheaply, but had the drawback that it had to be accessed sequentially. By the mid-1960s, the growth of master files and magnetic tape had exploded, and with it came huge amounts of redundant data, which introduced the following problems:
1. The need to synchronize data upon update
2. The complexity of maintaining programs
3. The complexity of developing new programs
4. The need for extensive amounts of hardware to support all the master files
By 1970 came the advent of disk storage, or the direct access storage device (DASD). Disk storage was fundamentally different from magnetic tape storage in that data on DASD could be accessed directly. There was no need to go through records 1, 2, 3, ..., n to get to record n + 1. In fact, the time to locate a record on DASD could be measured in milliseconds. With DASD came a new type of system software known as the database management system (DBMS). The purpose of the DBMS was to make it easy for the programmer to store and access data on DASD, handling tasks such as indexing.
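The difference between tape and DASD can be illustrated with a toy sketch (not from the book): sequential access must read every record up to the target, while direct access through an index reaches it in a single probe, the way a DBMS uses an index on DASD. The record layout here is purely hypothetical.

```python
# Toy "master file" of 1,000 records.
records = [{"id": i, "balance": i * 10} for i in range(1, 1001)]

def tape_lookup(records, target_id):
    """Sequential access: read records 1, 2, 3, ... until the target is found."""
    reads = 0
    for rec in records:
        reads += 1
        if rec["id"] == target_id:
            return rec, reads
    return None, reads

# Direct access: an index maps a key straight to its record,
# analogous to a DBMS index on DASD.
index = {rec["id"]: rec for rec in records}

def disk_lookup(index, target_id):
    """Direct access: one probe, regardless of file size."""
    return index.get(target_id), 1

rec, reads = tape_lookup(records, 1000)    # must read all 1,000 records
rec2, probes = disk_lookup(index, 1000)    # a single probe
```

The cost of the sequential scan grows with the size of the file; the indexed lookup does not, which is why millisecond record access on DASD was such a fundamental change.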

By the mid-1970s, online transaction processing (OLTP) made even faster access to data possible. The computer could now be used for tasks not previously possible, including driving reservation systems, bank teller systems, manufacturing control systems, and the like. By the 1980s, more new technologies, such as PCs and fourth-generation languages (4GLs), began to surface. With PCs and 4GL technology came the notion that more could be done with data than simply processing online transactions. MIS (management information systems), as it was called in the early days, was processing used to drive management decisions; today it is known as DSS (decision support systems). It became apparent that no single database could serve both operational transaction processing and analytical processing at the same time.

The Extract Program The extract program is the simplest of all programs. It searches through a file or database, uses some criteria for selecting data, and, on finding qualified data, transports that data to another file or database.
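The extract pattern described above can be sketched in a few lines. This is a minimal illustration, not code from the book; the column names and the selection criterion (accounts in the EAST region) are invented for the example.

```python
import csv
import io

def extract(source_rows, predicate):
    """The extract pattern: scan the source, select rows meeting some
    criteria, and transport them to another file."""
    return [row for row in source_rows if predicate(row)]

# A small in-memory "source file" (hypothetical layout).
source = io.StringIO(
    "account,region,balance\n"
    "A1,EAST,500\n"
    "A2,WEST,1200\n"
    "A3,EAST,300\n"
)
rows = list(csv.DictReader(source))

# Selection criterion: accounts in the EAST region.
east_rows = extract(rows, lambda r: r["region"] == "EAST")

# Transport the qualified data to another "file".
target = io.StringIO()
writer = csv.DictWriter(target, fieldnames=["account", "region", "balance"])
writer.writeheader()
writer.writerows(east_rows)
```

The simplicity is exactly the point: because extracts are this easy to write, they proliferate, which leads directly to the spider web described next.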

The Spider Web The spider web is what extract processing evolves into. First, there were extracts; then there were extracts of extracts; then extracts of extracts of extracts; and so forth. A large company can perform as many as 45,000 extracts per day. This out-of-control extract processing across the organization is called the "naturally evolving architecture." The larger and more mature the organization, the worse the problems of the naturally evolving architecture become.

Problems with the Naturally Evolving Architecture The naturally evolving architecture presents many challenges, such as:
1. Lack of data credibility
2. Poor productivity
3. Inability to transform data into information

Lack of Data Credibility A lack of data credibility occurs when, for example, two departments deliver a report to management: one claims that activity is down 15 percent, while the other says that activity is up 10 percent. Not only are the two departments out of sync with each other, they are off by very large margins. In addition, reconciliation is difficult, and the result is a crisis of credibility. This crisis is widespread and predictable. Why?

1. Credibility The crisis is predictable for the following reasons:
1. There is no time basis for the data.
2. There are differences in how the data is correlated.
3. There are multiple levels of extraction.
4. There is no common source of data to begin with.

2. Productivity In order to locate data, many files and layouts of data must be analyzed, and different skill sets are required to access data across the enterprise. Furthermore, having to go through every piece of data (not just by name but by definition) is a very tedious and time-consuming process.

3. From Data to Information Applications were never constructed with integration in mind.

A second major obstacle is that there is not enough historical data stored in the applications to meet the needs of the DSS request.

The Solution What is needed is something much larger: a change in architecture, namely the data warehouse architecture. There are fundamentally two kinds of data:
1. Primitive data
2. Derived data

Differences between the Two Types of Data
- Primitive data is detailed data used to run the day-to-day operations of the company; derived data has been summarized or otherwise calculated to meet the needs of the management of the company.
- Primitive data can be updated; derived data cannot be directly updated.
- Primitive data is primarily current-value data; derived data is often historical data.
- Primitive data is operated on by repetitive procedures; derived data is operated on by heuristic procedures.
- Operational data is primitive; DSS data is derived.
- Primitive data supports the clerical function; derived data supports the managerial function.
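The contrast can be made concrete with a small sketch (invented data, not from the book): primitive data is the detailed transaction record, and derived data is a summary calculated from it for management, never updated directly.

```python
from collections import defaultdict

# Primitive data: detailed, current-value records from day-to-day
# operations (hypothetical account transactions).
primitive = [
    {"account": "A1", "date": "2024-01-05", "amount": 500},
    {"account": "A1", "date": "2024-01-20", "amount": -200},
    {"account": "A2", "date": "2024-01-11", "amount": 900},
]

def derive_monthly_summary(transactions):
    """Derived data: summarized from primitive data to serve the
    managerial function; it is calculated, not directly updated."""
    totals = defaultdict(int)
    for t in transactions:
        month = t["date"][:7]  # e.g. "2024-01"
        totals[(t["account"], month)] += t["amount"]
    return dict(totals)

summary = derive_monthly_summary(primitive)
# {('A1', '2024-01'): 300, ('A2', '2024-01'): 900}
```

If the operational side corrects a transaction, the summary is recomputed from the primitive records rather than edited in place, which is exactly the update discipline the list above describes.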

Conclusion Primitive data and derived data are so different that they do not reside in the same database, or even the same environment.

The Architected Environment There are four levels of data in the architected environment:
1. The operational level
2. The atomic or data warehouse level
3. The departmental (or data mart) level
4. The individual level

The operational level of data holds application-oriented primitive data only and primarily serves the transaction-processing community. The data warehouse level holds integrated, historical primitive data that cannot be updated. The departmental/data mart level contains derived data specifically suited to the needs of the department. The individual level is where much heuristic analysis is done.

Data in the Architected Environment There is no point in bringing data over from the operational environment into the data warehouse environment without integrating it. In every environment the unintegrated operational data is complex and difficult to deal with. In order to achieve the real benefits of a data warehouse, though, it is necessary to undergo this complex and time-consuming exercise. Extract/transform/load (ETL) software can automate much of this tedious process.
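A minimal ETL sketch shows what "integrating" means in practice. Everything here is hypothetical: the gender encodings and the cents-to-dollars conversion stand in for the kinds of inconsistencies that differ across source applications.

```python
def extract_step(operational_rows):
    """Extract: pull raw records out of the operational system."""
    return list(operational_rows)

def transform_step(rows):
    """Transform: integrate the data by unifying the encodings and
    units that differ across source applications (invented conventions)."""
    gender_map = {"m": "MALE", "male": "MALE",
                  "f": "FEMALE", "female": "FEMALE"}
    out = []
    for r in rows:
        out.append({
            "customer": r["customer"].strip().upper(),
            "gender": gender_map[r["gender"].lower()],
            "balance_usd": round(float(r["balance_cents"]) / 100, 2),
        })
    return out

warehouse = []

def load_step(rows):
    """Load: append the integrated records to the warehouse."""
    warehouse.extend(rows)

# Two source applications with inconsistent conventions.
source = [
    {"customer": " smith ", "gender": "M", "balance_cents": "12345"},
    {"customer": "jones", "gender": "female", "balance_cents": "500"},
]
load_step(transform_step(extract_step(source)))
```

Real ETL tools do the same three steps at scale; the tedium the text mentions lies in discovering and encoding every one of these per-application conventions.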

Who Is the User? The data-warehouse user (also called the DSS analyst) is a businessperson first and foremost, and a technician second. The primary job of the DSS analyst is to define and discover information used in corporate decision-making. The DSS analyst operates in a mindset of discovery. He says, “Give me what I say I want, then I can tell you what I really want.”

Differences between the Operational and Data Warehouse Environments The two environments differ in:
1. The development life cycle
2. Hardware utilization

The Development Life Cycle The operational environment is supported by the classical systems development life cycle (the SDLC). The development of the data warehouse operates under a very different life cycle, sometimes called the CLDS (the reverse of the SDLC) or “spiral” development methodology. The classical SDLC is driven by requirements. In order to build systems, you must first understand the requirements. Then you go into stages of design and development. The CLDS is almost exactly the reverse: The CLDS starts with data. Once the data is in hand, it is integrated and then tested to see what bias there is to the data, if any. Programs are then written against the data. The results of the programs are analyzed, and finally the requirements of the system are understood. The CLDS is a classic data-driven development life cycle, while the SDLC is a classic requirements-driven development life cycle. Trying to apply inappropriate tools and techniques of development results only in waste and confusion.

Hardware Utilization In operational processing, there is a relatively static and predictable pattern of hardware utilization. In the data warehouse environment, the hardware is either being utilized fully or not at all. This fundamental difference means that you can optimize your machine either for operational processing or for data warehouse processing, but not both.

The Transformation to the Architected, Data Warehouse-Centered Environment The most important step a company can take to make its reengineering efforts successful is to first move to the data warehouse environment. The first effect of the transformation is the removal of the bulk of archival data from the operational environment; the second is the removal of informational processing such as reports, screens, and extracts. The very nature of informational processing is constant change.

Monitoring the Data Warehouse Environment Once the data warehouse is built, it must be monitored. Two components are monitored: the data residing in the data warehouse, and the usage or activity of that data.

The results that are achieved by monitoring data:
- Identifying what growth is occurring, where the growth is occurring, and at what rate the growth is occurring
- Identifying what data is being used
- Calculating what response time the end user is getting
- Determining who is actually using the data warehouse
- Specifying how much of the data warehouse end users are using
- Pinpointing when the data warehouse is being used
- Recognizing how much of the data warehouse is being used
- Examining the level of usage of the data warehouse

The monitored activities in the data warehouse are:
- What data is being accessed? When? By whom? How frequently? At what level of detail?
- What is the response time for the request?
- At what point in the day is the request submitted?
- How big was the request?
- Was the request terminated, or did it end naturally?
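The activity-monitoring questions above can be sketched as a simple wrapper that records each warehouse request. This is an illustration only; the field names and the rows-returned proxy for request size are invented for the example.

```python
import time

activity_log = []

def monitored_query(user, table, detail_level, run_query):
    """Record, for each warehouse request: what data was accessed, when,
    by whom, at what level of detail, and the response time."""
    start = time.monotonic()
    result = run_query()
    elapsed = time.monotonic() - start
    activity_log.append({
        "user": user,
        "table": table,
        "detail_level": detail_level,
        "submitted_at": time.time(),
        "response_seconds": elapsed,
        "rows_returned": len(result),  # rough proxy for "how big" the request was
    })
    return result

# A hypothetical DSS request against a sales history table.
rows = monitored_query(
    "dss_analyst", "sales_history", "atomic",
    lambda: [("2024-01", 100), ("2024-02", 120)],
)
```

Aggregating such a log over time answers the growth, usage, and response-time questions listed in the two slides above.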

Response Time Response time in the DSS environment is quite different from response time in the OLTP environment. In the OLTP environment, response time is almost always mission critical. There is no mission-critical nature to response time in DSS, but this does not mean that it is unimportant.