Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 29 Overview of Data Warehousing and OLAP.


Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Purpose of Data Warehousing
- Traditional databases are not optimized for data access alone; they have to balance the requirements of data access with the need to ensure data integrity.
- Most of the time, data warehouse users need only read access, but they need that access to be fast over a large volume of data.
- Most of the data required for data warehouse analysis comes from multiple databases, and these analyses are recurrent and predictable enough that specific software can be designed to meet the requirements.
- There is a great need for tools that provide decision makers with information to make decisions quickly and reliably based on historical data.
- This functionality is achieved by data warehousing and online analytical processing (OLAP).

Introduction, Definitions, and Terminology
W. H. Inmon characterized a data warehouse as “a subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management’s decisions.”

Introduction, Definitions, and Terminology
Data warehouses have the distinguishing characteristic that they are mainly intended for decision support applications, whereas traditional databases are transactional.
Applications that a data warehouse supports include:
- OLAP (online analytical processing): a term used to describe the analysis of complex data from the data warehouse.
- DSS (decision support systems), also known as EIS (executive information systems): support an organization’s leading decision makers in making complex and important decisions.
- Data mining: used for knowledge discovery, the process of searching data for unanticipated new knowledge.

Conceptual Structure of Data Warehouse
Data warehouse processing involves:
- Cleaning and reformatting of data
- OLAP
- Data mining

Comparison with Traditional Databases
- Data warehouses are optimized mainly for data access. Traditional databases are transactional and are optimized for both access mechanisms and integrity assurance measures.
- Data warehouses place more emphasis on historical data, as their main purpose is to support time-series and trend analysis.
- Compared with transactional databases, data warehouses are nonvolatile.
- In transactional databases, the transaction is the mechanism of change to the database. By contrast, information in a data warehouse is relatively coarse-grained, and the refresh policy is carefully chosen, usually incremental.

Characteristics of Data Warehouses
- Multidimensional conceptual view
- Generic dimensionality
- Unlimited dimensions and aggregation levels
- Unrestricted cross-dimensional operations
- Dynamic sparse matrix handling
- Client-server architecture
- Multi-user support
- Accessibility
- Transparency
- Intuitive data manipulation
- Consistent reporting performance
- Flexible reporting

Classification of Data Warehouses
Generally, data warehouses are an order of magnitude larger than the source databases. The sheer volume of data is an issue, and data warehouses can be classified on that basis as follows:
- Enterprise-wide data warehouses: huge projects requiring massive investment of time and resources.
- Virtual data warehouses: provide views of operational databases that are materialized for efficient access.
- Data marts: generally targeted to a subset of the organization, such as a department, and more tightly focused.

Data Modeling for Data Warehouses
- Traditional databases generally deal with two-dimensional data (similar to a spreadsheet). However, queries run much more efficiently against a multi-dimensional data storage model.
- Data warehouses can take advantage of this feature because they are generally nonvolatile and the analysis that will be performed on them is highly predictable.

Data Modeling for Data Warehouses
[Figure: example of two-dimensional vs. multi-dimensional data]

Data Modeling for Data Warehouses
Advantages of a multi-dimensional model:
- Multi-dimensional models lend themselves readily to hierarchical views, in what are known as roll-up display and drill-down display.
- The data can be queried directly in any combination of dimensions, bypassing complex database queries.

Multi-dimensional Schemas
Multi-dimensional schemas are specified using:
- Dimension table: consists of tuples of attributes of the dimension.
- Fact table: each tuple is a recorded fact. The fact contains one or more measured or observed variables and identifies them with pointers to dimension tables. The fact table contains the data, and the dimensions identify each tuple in the data.

Multi-dimensional Schemas
Two common multi-dimensional schemas are:
- Star schema: consists of a fact table with a single table for each dimension.
- Snowflake schema: a variation of the star schema in which the dimension tables are organized into a hierarchy by normalizing them.
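
A star schema can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumed names: the tables (fact_sales, dim_product, dim_region, dim_quarter), their columns, and the sample rows are hypothetical and do not come from the slides. A snowflake variant would, for example, move the category column of dim_product into its own normalized table.

```python
import sqlite3

# Hypothetical star schema: one fact table and a single table per dimension.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_quarter (quarter_id INTEGER PRIMARY KEY, label TEXT);
-- Each fact tuple holds a measured value plus pointers to the dimension tables.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    region_id  INTEGER REFERENCES dim_region(region_id),
    quarter_id INTEGER REFERENCES dim_quarter(quarter_id),
    revenue    REAL
);
INSERT INTO dim_product VALUES (1, 'P123', 'Widgets'), (2, 'P124', 'Gadgets');
INSERT INTO dim_region  VALUES (1, 'East'), (2, 'West');
INSERT INTO dim_quarter VALUES (1, 'Q1'), (2, 'Q2');
INSERT INTO fact_sales VALUES
    (1, 1, 1, 100.0), (1, 2, 1, 150.0),
    (2, 1, 1, 200.0), (2, 1, 2, 50.0);
""")


def revenue_by_region(connection):
    """Join the fact table to one dimension table and sum the measure."""
    rows = connection.execute("""
        SELECT r.name, SUM(f.revenue)
        FROM fact_sales AS f JOIN dim_region AS r USING (region_id)
        GROUP BY r.name
        ORDER BY r.name
    """).fetchall()
    return dict(rows)


totals = revenue_by_region(conn)
```

Every analytical query follows the same shape: join the fact table to the dimensions of interest, then group by their attributes.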

Multi-dimensional Schemas
Fact constellation: a set of fact tables that share some dimension tables. However, fact constellations limit the possible queries for the warehouse.

Multi-dimensional Schemas
Indexing:
- Data warehouses also use indexing to support high-performance access.
- A technique called bitmap indexing constructs a bit vector for each value in the domain being indexed.
- Bitmap indexing works very well for domains of low cardinality.
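
The bitmap idea can be illustrated in a few lines of Python, using an integer as the bit vector for each distinct value. The column contents and the function names (build_bitmap_index, matching_rows) are assumptions made for this sketch, not part of the slides.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Return {value: int}, where bit i of the int is set if values[i] == value."""
    index = defaultdict(int)
    for row, value in enumerate(values):
        index[value] |= 1 << row
    return dict(index)


def matching_rows(bitmap):
    """Decode a bit vector back into an ascending list of row numbers."""
    return [row for row in range(bitmap.bit_length()) if (bitmap >> row) & 1]


# Hypothetical low-cardinality columns covering six rows.
region = ["East", "West", "East", "North", "West", "East"]
size = ["S", "L", "L", "S", "L", "S"]

region_idx = build_bitmap_index(region)
size_idx = build_bitmap_index(size)

# A conjunctive predicate becomes a bitwise AND of two bit vectors.
east_and_large = region_idx["East"] & size_idx["L"]
```

Because there is one bit vector per distinct value, the index stays small only when the domain has few values, which is why the technique suits low-cardinality columns.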

Building a Data Warehouse
- The builders of a data warehouse should take a broad view of the anticipated use of the warehouse.
- The design should support ad hoc querying.
- An appropriate schema should be chosen that reflects the anticipated usage.

Building a Data Warehouse
The design of a data warehouse involves the following steps:
- Acquiring data for the warehouse.
- Ensuring that data storage meets the query requirements efficiently.
- Giving full consideration to the environment in which the data warehouse resides.

Building a Data Warehouse
Acquisition of data for the warehouse:
- The data must be extracted from multiple, heterogeneous sources.
- The data must be formatted for consistency within the warehouse.
- The data must be cleaned to ensure validity. The cleaning process is difficult to automate.
- Back flushing: upgrading the source data with the cleaned data.

Building a Data Warehouse
Acquisition of data for the warehouse (contd.):
- The data must be fitted into the data model of the warehouse.
- The data must be loaded into the warehouse.
- A proper design for the refresh policy should be considered.
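
The acquisition steps (extract, format, clean, fit, load) can be sketched as a small pipeline. Everything specific here is an assumption for illustration: the two source layouts, the field names, and the validity rule (amounts must be non-negative) are hypothetical. Real cleaning rules are domain-specific and, as noted above, hard to automate.

```python
# Two hypothetical sources with different field names and formats.
source_a = [
    {"cust": "Ann", "amount": "100.50", "date": "2011-01-15"},
    {"cust": "Bob", "amount": "-5", "date": "2011-01-16"},
]
source_b = [{"customer_name": "CARL", "total_cents": 2500, "day": "17/01/2011"}]


def format_a(rec):
    """Map a source A record onto the warehouse's (assumed) record layout."""
    return {"customer": rec["cust"].title(),
            "amount": float(rec["amount"]),
            "date": rec["date"]}


def format_b(rec):
    """Map a source B record: cents to currency units, DD/MM/YYYY to ISO dates."""
    day, month, year = rec["day"].split("/")
    return {"customer": rec["customer_name"].title(),
            "amount": rec["total_cents"] / 100,
            "date": f"{year}-{month}-{day}"}


def is_valid(rec):
    """Assumed cleaning rule: an amount may not be negative."""
    return rec["amount"] >= 0


def acquire(sources):
    """Extract from each source, format, clean, and load into a list."""
    warehouse, rejected = [], []
    for records, formatter in sources:
        for raw in records:
            rec = formatter(raw)
            (warehouse if is_valid(rec) else rejected).append(rec)
    return warehouse, rejected


loaded, rejected = acquire([(source_a, format_a), (source_b, format_b)])
```

The rejected list is where a back-flushing step could start: once corrected, those records could be used to upgrade the originating source.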

Building a Data Warehouse
Storing the data according to the data model of the warehouse:
- Creating and maintaining required data structures
- Creating and maintaining appropriate access paths
- Providing for time-variant data as new data are added
- Supporting the updating of warehouse data
- Refreshing the data
- Purging data
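
Refreshing and purging can be illustrated with a minimal sketch. It assumes each row carries an integer day stamp and that the refresh is incremental, tracked by a high-water mark. The function names and the row layout are hypothetical, and a real warehouse would typically do this inside its storage engine.

```python
def incremental_refresh(warehouse, source_rows, last_loaded):
    """Append only source rows newer than the last load; return the new high-water mark."""
    new_rows = [row for row in source_rows if row["day"] > last_loaded]
    warehouse.extend(new_rows)
    return max((row["day"] for row in new_rows), default=last_loaded)


def purge(warehouse, cutoff):
    """Drop rows older than the cutoff day, modifying the list in place."""
    warehouse[:] = [row for row in warehouse if row["day"] >= cutoff]


# Hypothetical data: the warehouse already holds days 1 and 2.
warehouse = [{"day": 1, "sales": 10}, {"day": 2, "sales": 12}]
source = warehouse + [{"day": 3, "sales": 9}, {"day": 4, "sales": 15}]

mark = incremental_refresh(warehouse, source, last_loaded=2)
purge(warehouse, cutoff=2)
```

Existing rows are never rewritten during the refresh, which matches the nonvolatile, time-variant character of warehouse data. Only the purge step removes anything.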

Building a Data Warehouse
- Usage projections
- The fit of the data model
- Characteristics of available resources
- Design of the metadata component
- Modular component design
- Design for manageability and change
- Considerations of distributed and parallel architecture
- Distributed vs. federated warehouses

Functionality of a Data Warehouse
Functionality that can be expected:
- Roll-up: data is summarized with increasing generalization.
- Drill-down: increasing levels of detail are revealed.
- Pivot: cross tabulation is performed.
- Slice and dice: projection operations are performed on the dimensions.
- Sorting: data is sorted by ordinal value.
- Selection: data is available by value or range.
- Derived attributes: attributes are computed by operations on stored and derived values.
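
Roll-up, drill-down, and pivot can be sketched over a tiny in-memory fact list. The facts, the dimension names, and the helper functions (aggregate, pivot) are hypothetical and chosen only for this sketch. OLAP tools provide these operations natively.

```python
from collections import defaultdict

# Hypothetical facts: (product, region, quarter, revenue).
facts = [
    ("P123", "East", "Q1", 100), ("P123", "West", "Q1", 150),
    ("P124", "East", "Q1", 200), ("P124", "East", "Q2", 50),
    ("P123", "East", "Q2", 70),
]
DIMS = {"product": 0, "region": 1, "quarter": 2}


def aggregate(rows, *dims):
    """Sum revenue grouped by the named dimensions; keys are tuples."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[DIMS[d]] for d in dims)
        totals[key] += row[3]
    return dict(totals)


def pivot(rows, row_dim, col_dim):
    """Cross-tabulate revenue as {row_value: {col_value: total}}."""
    table = defaultdict(dict)
    for (r, c), total in aggregate(rows, row_dim, col_dim).items():
        table[r][c] = total
    return dict(table)


# Drill-down reveals more detail; roll-up summarizes over fewer dimensions.
drill_down = aggregate(facts, "region", "quarter")
roll_up = aggregate(facts, "region")
crosstab = pivot(facts, "product", "quarter")
```

Rolling up from (region, quarter) to region simply drops a grouping dimension, and drilling down adds one back.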

Warehouse vs. Data Views
Views and data warehouses are alike in that both provide read-only extracts from databases. However, data warehouses differ from views in the following ways:
- Data warehouses exist as persistent storage instead of being materialized on demand.
- Data warehouses are usually not relational, but rather multi-dimensional.
- Data warehouses can be indexed for optimization.
- Data warehouses provide specific support for functionality.
- Data warehouses deal with huge volumes of data that are generally contained in more than one database.

Difficulties of Implementing Data Warehouses
- Lead time is huge in building a data warehouse; it can potentially take years to build and efficiently maintain one.
- Both the quality and the consistency of data are major concerns.
- Usage projections must be revised regularly to meet current requirements.
- The data warehouse should be designed to accommodate the addition and attrition of data sources without major redesign.
- Administration of a data warehouse requires far broader skills than are needed for a traditional database.

Open Issues in Data Warehousing
- Data cleaning, indexing, partitioning, and views could be given new attention from the perspective of data warehousing.
- Automation of data acquisition, data quality management, selection and construction of access paths and structures, self-maintainability, and functionality and performance optimization.
- Incorporating domain and business rules more intelligently into the warehouse creation and maintenance process.

Recap
- Purpose of Data Warehousing
- Introduction, Definitions, and Terminology
- Comparison with Traditional Databases
- Characteristics of Data Warehouses
- Classification of Data Warehouses
- Multi-dimensional Schemas
- Building a Data Warehouse
- Functionality of a Data Warehouse
- Warehouse vs. Data Views
- Implementation difficulties and open issues