
DATA WAREHOUSING

Introduction
Modern organizations have huge amounts of data but are starving for information: they face an information gap.
Reasons for the information gap:
- Fragmented information systems and uncoordinated (sometimes inconsistent) databases, or "islands of information", that result from time and resource constraints
- Most systems support operational (transaction) processing rather than informational processing
Data warehouses bridge the information gap.

Introduction (Contd.)
A data warehouse bridges this information gap because it
- consolidates and integrates information from internal and external sources, and
- arranges it in a meaningful format for making accurate business decisions
The two notable pioneers in the data warehousing (DWH) field are Ralph Kimball and Bill Inmon. The term "data warehouse" was coined by Bill Inmon in 1990.

What is a Data Warehouse?
Inmon defines a data warehouse as follows:
- "a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process".
Description of the above terms according to Inmon:
- Subject-oriented: Data that gives information about a particular subject instead of about a company's ongoing operations.
- Integrated: Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole.
- Time-variant: All data in the data warehouse is identified with a particular time dimension. The time factor can be used to study trends and changes.
- Non-volatile: Data is stable in a data warehouse. More data is added but data is never removed. This enables management to gain a consistent picture of the business.

What is a Data Warehouse?
This definition by Inmon remains reasonably accurate almost ten years later. However,
- A single-subject data warehouse is typically referred to as a data mart, while data warehouses are generally enterprise in scope.
- Data warehouses can be volatile. Because of the large amount of storage a data warehouse requires (multi-terabyte data warehouses are not uncommon), only a certain number of periods of history are kept in the warehouse. For instance, if three years of data are decided on and loaded into the warehouse, then every month the oldest month is "rolled off" the database and the newest month is added.
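The rolling three-year window described above can be sketched in a few lines of Python. This is only an illustration: the function name load_month, the in-memory OrderedDict standing in for the warehouse, and the sample months are assumptions, and the sketch assumes months are loaded in chronological order.

```python
from collections import OrderedDict

def load_month(warehouse, month, rows, max_months=36):
    """Add the newest month, then roll off the oldest months beyond the window."""
    warehouse[month] = rows
    rolled_off = []
    while len(warehouse) > max_months:
        # popitem(last=False) removes the earliest-inserted (oldest) month
        oldest, _ = warehouse.popitem(last=False)
        rolled_off.append(oldest)
    return rolled_off

# Hypothetical warehouse holding three full years (36 months) of history
warehouse = OrderedDict()
for year in (2001, 2002, 2003):
    for m in range(1, 13):
        load_month(warehouse, f"{year}-{m:02d}", rows=[])

# Loading the next month pushes the oldest month out of the window
dropped = load_month(warehouse, "2004-01", rows=[])
print(dropped, len(warehouse))
```

A real warehouse would more likely drop a table partition than delete keys from a dictionary, but the bookkeeping is the same.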

What is a Data Warehouse?
Ralph Kimball provided a much simpler definition of a data warehouse:
- A data warehouse is "a copy of transaction data specifically structured for query and analysis".
This definition provides less insight and depth than Mr. Inmon's, but is no less accurate.

The Need for Data Warehousing
Two factors drive the need for data warehousing:
- The need for a company-wide view of data
- The need to separate operational* and informational* systems
* See slide notes

Data Warehousing?
It is the process of creating, populating, and then querying a data warehouse.
Components of data warehousing are:
- Source Systems Identification
- Data Warehouse Design and Creation
- Data Acquisition
- Changed Data Capture
- Data Cleansing
- Data Aggregation
- Business Intelligence
  - Multidimensional Analysis Tools
  - Query Tools
  - Data Mining Tools
  - Data Visualization Tools
- Metadata Management

Components of Data Warehousing
Source System Identification
- Appropriate data must be located to build a data warehouse.
- This data comes from
  - Current OLTP systems (providing day-to-day information)
  - Legacy systems (providing historical data for prior periods)
Data Warehouse Design and Creation
- The design is often an iterative process
- Requires effort to understand the database schema and a great deal of user interaction

Components of Data Warehousing
Data Acquisition
- Involves moving company data from the source systems into the warehouse
- A time-consuming activity
- Performed using ETL (Extract/Transform/Load) tools
- Data acquisition is an ongoing, scheduled process. The warehouse is refreshed monthly
Changed Data Capture
- Periodically updating the warehouse from transactional systems is complicated, because it is difficult to identify which records in the source have changed since the last update.
- Some technologies used in this area are replication servers, publish/subscribe, triggers and stored procedures, and database log analysis.
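One of the techniques named above, triggers, can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a production design: the customer table, the customer_changes log table, and the trigger names are all hypothetical. Each insert or update on the source table writes a row to the change log, so the next warehouse refresh only has to read log entries newer than the last one it processed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);

-- Hypothetical change log that the triggers below keep up to date
CREATE TABLE customer_changes (
    change_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id INTEGER,
    op          TEXT
);

CREATE TRIGGER customer_ins AFTER INSERT ON customer
BEGIN
    INSERT INTO customer_changes (customer_id, op) VALUES (NEW.id, 'I');
END;

CREATE TRIGGER customer_upd AFTER UPDATE ON customer
BEGIN
    INSERT INTO customer_changes (customer_id, op) VALUES (NEW.id, 'U');
END;
""")

# Initial load: both rows are captured, and the refresh remembers its position
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'IBM')")
conn.execute("INSERT INTO customer (id, name) VALUES (2, 'Acme')")
last_seen = conn.execute("SELECT MAX(change_id) FROM customer_changes").fetchone()[0]

# A later change in the source system
conn.execute("UPDATE customer SET name = 'Acme Corp' WHERE id = 2")

# The next refresh reads only what changed since the last one
changed = conn.execute(
    "SELECT customer_id, op FROM customer_changes "
    "WHERE change_id > ? ORDER BY change_id",
    (last_seen,),
).fetchall()
print(changed)
```

The trade-off of this approach is that the triggers add work to every write on the source system, which is one reason log analysis and replication are listed as alternatives.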

Components of Data Warehousing
Data Cleansing
- Typically performed in conjunction with data acquisition (part of the "T" in "ETL")
- A complicated process that validates and, if required, corrects data before it is loaded into the warehouse
- Example: The entries for "Customer Name" may appear differently in various source systems for the same customer: one entered as "IBM", one as "I.B.M.", and one as "International Business Machines". A decision must be made as to which is correct, and then the data cleansing tool will change the others to match the rule.
This process is also referred to as "data scrubbing" or "data quality assurance". It can be an extremely complex process, especially if some of the warehouse inputs are from older mainframe file systems (commonly referred to as "flat files" or "sequential files").
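The customer-name example above can be sketched as a small rule-based cleanser. The rule table CANONICAL_NAMES and the function names are assumptions made for illustration; real cleansing tools use far richer matching than this.

```python
import re

# Hypothetical rule table: normalized key -> the name decided to be correct
CANONICAL_NAMES = {
    "ibm": "IBM",
    "internationalbusinessmachines": "IBM",
}

def normalize_key(raw):
    """Lower-case the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", raw.lower())

def cleanse_customer_name(raw):
    """Return the agreed-upon name if a rule matches, else the trimmed input."""
    return CANONICAL_NAMES.get(normalize_key(raw), raw.strip())

source_values = ["IBM", "I.B.M.", "International Business Machines", " Acme "]
cleaned = [cleanse_customer_name(v) for v in source_values]
print(cleaned)
```

Names without a matching rule pass through with only whitespace trimmed, so nothing is changed silently without a decision behind it.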

Components of Data Warehousing
Data Aggregation
- If required, it is performed during the "T" phase of "ETL"
- A data warehouse can be designed to store data at the detail level (each individual transaction), at some aggregate level (summary data), or a combination of both.
- The advantage of summarized data is that typical queries against the warehouse run faster.
- The disadvantage is that information that may be needed to answer a query is lost during aggregation
Business Intelligence (BI)
- Contains technologies such as Decision Support Systems (DSS), Executive Information Systems (EIS), On-Line Analytical Processing (OLAP), Relational OLAP (ROLAP), Multi-Dimensional OLAP (MOLAP), Hybrid OLAP (HOLAP, a combination of MOLAP and ROLAP), and more.
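The trade-off between detail and summary data described under Data Aggregation can be illustrated with a short sketch. The transaction rows and the function summarize_by_month are made up for the example.

```python
from collections import defaultdict

# Hypothetical detail-level data: one row per transaction
transactions = [
    {"day": "2024-01-03", "product": "A", "amount": 10},
    {"day": "2024-01-17", "product": "A", "amount": 15},
    {"day": "2024-01-17", "product": "B", "amount": 7},
    {"day": "2024-02-02", "product": "A", "amount": 20},
]

def summarize_by_month(rows):
    """Aggregate detail rows into one total per (month, product)."""
    totals = defaultdict(int)
    for row in rows:
        month = row["day"][:7]  # "YYYY-MM"
        totals[(month, row["product"])] += row["amount"]
    return dict(totals)

summary = summarize_by_month(transactions)

# A daily question can still be answered from the detail rows...
sales_on_jan_17 = sum(r["amount"] for r in transactions if r["day"] == "2024-01-17")
# ...but not from the summary, which no longer records the day.
print(summary, sales_on_jan_17)
```

The summary has three rows instead of four and answers monthly questions directly, but the day of each sale cannot be recovered from it.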

Components of Data Warehousing
BI can be broken down into four broad fields:
Multidimensional Analysis Tools
- Allow the user to look at data from various different angles
- Often use a multidimensional database called a cube
Query Tools
- Allow the user to issue SQL queries against the warehouse and get results
Data Mining Tools
- Automatically search for patterns in data
- Driven by complex statistical formulas
Data Visualization Tools
- Graphically represent data, including 3D data pictures
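Looking at the same data "from various different angles" amounts to totalling a measure over different combinations of dimensions. The sketch below does this in plain Python over a handful of made-up fact rows; the roll_up function and the dimension names are assumptions, and a real cube engine would precompute and index these totals instead of scanning rows.

```python
from collections import defaultdict

# Hypothetical fact rows with three dimensions and one measure
facts = [
    {"region": "East", "product": "A", "quarter": "Q1", "sales": 100},
    {"region": "East", "product": "B", "quarter": "Q1", "sales": 40},
    {"region": "West", "product": "A", "quarter": "Q1", "sales": 70},
    {"region": "West", "product": "A", "quarter": "Q2", "sales": 90},
]

def roll_up(rows, dimensions, measure="sales"):
    """Total the measure for every combination of the chosen dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)

# Two different views of the same facts
by_region = roll_up(facts, ["region"])
by_product_quarter = roll_up(facts, ["product", "quarter"])
print(by_region)
print(by_product_quarter)
```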

Components of Data Warehousing
Metadata Management
- Metadata management takes place throughout the entire process of identifying, acquiring, and querying the data
- The datatype (e.g., string or integer) of a column is metadata. The name of the column is another piece of metadata. The actual value in the column for a particular row is not metadata; it is data
- Metadata is stored in a metadata repository
- Metadata is useful in almost all components of data warehousing discussed earlier
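The distinction between metadata and data can be seen directly in a database. The sketch below uses Python's built-in sqlite3 module with a hypothetical customer table: SQLite's PRAGMA table_info returns the column names and declared types (metadata), while an ordinary SELECT returns the row values (data).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'IBM')")

# Metadata: each table_info row is (cid, name, type, notnull, dflt_value, pk)
metadata = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(customer)")]

# Data: the actual values stored in the rows
data = conn.execute("SELECT id, name FROM customer").fetchall()

print(metadata)
print(data)
```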

Data Mining
Definition:
- Knowledge discovery using a sophisticated blend of techniques from traditional statistics, artificial intelligence, and computer graphics
Goals:
- To explain observed events or conditions, such as why sales of a product have increased in a particular area
- To confirm a hypothesis, such as whether two-income families are more likely to buy family medical cover than single-income families
- To analyze data for new or unexpected relationships, such as what spending patterns are likely to accompany credit card fraud

Data Mining Techniques
- Case-based reasoning: Derives rules from real-world case examples
- Rule discovery: Searches for patterns and correlations in large data sets
- Signal processing: Identifies clusters of observations with similar characteristics
- Neural nets: Develop predictive models based on principles modeled after the human brain
- Fractals: Compress large databases without losing information

Some Data Mining Applications
- Analysis of business trends: Identifying markets with above-average or below-average growth
- Target marketing: Identifying customers for promotional activity
- Usage analysis: Identifying usage patterns of products and services
- Product affinity: Identifying products that are purchased concurrently, or characteristics of shoppers
For further application types, refer to book page 437, table 11.5.
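Product affinity, in its simplest form, can be approximated by counting how often pairs of products appear in the same purchase. The sketch below is a toy illustration with made-up baskets; the function pair_counts is an assumption, and real data mining tools add statistical measures on top of raw counts.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchases: each set is one shopper's basket
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]

def pair_counts(baskets):
    """Count how many baskets contain each pair of products."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives every pair a consistent (alphabetical) order
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)
```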

Reading Assignment
Read these concepts from the book:
- The ETL process
- Data warehouse
- Data mart
- Independent data mart
- Dependent data mart
- EDW
- Star schema
- Snowflake schema
- OLAP
- MOLAP
- ROLAP

Resources
- tawarehousing.html
- tawarehousing.html
- Modern Database Management, 6/e