Designing the data warehouse / data marts Methodologies and Techniques.

Basic principles

Life cycle of the DW
- Operational databases feed the warehouse database
- First-time load
- Refresh (repeated)
- Purge or archive

Oracle Warehouse Components
- Any source: operational data, external data
- Any data: text, image; Oracle Medi`; relational / multidimensional; spatial; audio, video; Web
- Any access: relational tools, OLAP tools, applications / Web

Oracle Intelligence Tools
- IS develops users' views
- Oracle Reports: current (business users)
- Oracle Discoverer: tactical (analysts)
- Oracle Express: strategic

Oracle Data Mart Suite
- Data modeling: Oracle Data Mart Designer
- Data management: Oracle Enterprise Manager
- Data extraction: Oracle Data Mart Builder
- Data access & analysis: Discoverer & Oracle Reports
- OLTP engines: OLTP databases
- Warehousing engines: data mart database (Oracle8), SQL*PLUS

“Big Bang” Approach: Advantages and Disadvantages
Advantages:
- The warehouse is built as part of a major project (e.g. BPR)
- A “big picture” of the data warehouse exists before the data warehousing project starts
Disadvantages:
- Involves high risk and takes longer
- Runs the risk that requirements change before delivery
- Is costly, and user support is harder to obtain

Incremental Approach to Warehouse Development
- Multiple iterations
- Shorter implementations
- Validation of each phase
Phases: Strategy, Definition, Analysis, Design, Build, Production

Benefits of an Incremental Approach
- Delivers a strategic data warehouse solution through incremental development efforts
- Provides an extensible, scalable architecture
- Quickly provides business benefits and ensures a much earlier return on investment
- Allows a data warehouse to be built one subject or application area at a time
- Allows the construction of an integrated data mart environment

Data Mart
A subset of a data warehouse that supports the requirements of a particular department or business function.
Characteristics include:
- Unlike a data warehouse, it does not normally contain detailed operational data
- It may contain certain levels of aggregation

Dependent Data Mart
- Sources: operational systems, flat files, external data
- Data warehouse: marketing, sales, finance, human resources
- Data marts, fed from the data warehouse: marketing, sales, finance

Independent Data Mart
- Sources: operational systems, flat files, external data
- Data mart, fed directly from the sources: sales or marketing

Reasons for Creating a Data Mart
- To give users more flexible access to the data they need to analyse most often
- To provide data in a form that matches the collective view of a group of users
- To improve end-user response time
- Potential users of a data mart are clearly defined and can be targeted for support

Reasons for Creating a Data Mart
- To provide appropriately structured data, as dictated by the requirements of the end-user access tools
- Building a data mart is simpler than establishing a corporate data warehouse
- Implementing data marts costs far less than establishing a data warehouse

Data Mart Issues
- Data mart functionality
- Data mart size
- Data mart load performance
- User access to data in multiple data marts
- Data mart Internet / intranet access
- Data mart administration
- Data mart installation

Example of a DW Tool: OLAP
- Rotate and drill down to successive levels of detail
- Create and examine calculated data interactively on large volumes of data
- Determine comparative or relative differences
- Perform exception and trend analysis
- Perform advanced analytical functions, for example forecasting, modeling, and regression analysis

Original OLAP Rules
1. Multidimensional conceptual view
2. Transparency
3. Accessibility
4. Consistent reporting performance
5. Client-server architecture

Original OLAP Rules
6. Multiuser support
7. Unrestricted cross-dimensional operations
8. Intuitive data manipulation
9. Flexible reporting
10. Unlimited dimensions and aggregation levels

Relational Database Model
The slide's table illustrates the employee relation.
- Attributes (columns): Attribute 1 Name, Attribute 2 Age, Attribute 3 Gender, Attribute 4 Emp No.
- Rows 1 to 4: Anderson, Green, Lee, Ramos

Multidimensional Database Model
The data is found at the intersection of dimensions.
- FINANCE: Store, GL_Line, Time
- SALES: Store, Product, Time, Customer
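As a minimal sketch of "data at the intersection of dimensions", the Python snippet below models a tiny SALES cube as a dictionary keyed by (store, product, time). The member names and figures are invented for illustration and do not come from the slides.

```python
# Hypothetical SALES cube: each cell sits at the intersection of
# one member from each dimension (store, product, time).
sales_cube = {
    ("Store1", "PC", "Jan"): 120,
    ("Store1", "Server", "Jan"): 30,
    ("Store2", "PC", "Jan"): 80,
    ("Store1", "PC", "Feb"): 150,
}


def cell(cube, store, product, time):
    """Return the measure stored at one intersection (0 if the cell is empty)."""
    return cube.get((store, product, time), 0)


def slice_by_time(cube, time):
    """Fix the time dimension and keep a two-dimensional store x product view."""
    return {(s, p): v for (s, p, t), v in cube.items() if t == time}


jan_slice = slice_by_time(sales_cube, "Jan")
print(cell(sales_cube, "Store1", "PC", "Feb"))
print(jan_slice)
```

Fixing one dimension, as `slice_by_time` does, is the "slice" operation that reduces the three-dimensional view to a two-dimensional one.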

Two dimensions

Three dimensions

Specialised Multidimensional Tool
Benefits:
- Quick access to very large volumes of data
- Extensive and comprehensive libraries of complex analysis functions
- Strong modeling and forecasting capabilities
- Can access multidimensional and relational database structures
- Caters for calculated fields
Disadvantages:
- Difficulty of changing the model
- Lack of support for very large volumes of data
- May require significant processing power

MOLAP Server
- The application layer stores data in a multidimensional structure
- The presentation layer provides the multidimensional view
- Efficient storage and processing
- Complexity hidden from the user
- Analysis using preaggregated summaries and precalculated measures
Diagram: DSS client, MOLAP engine (application layer), warehouse

ROLAP Server
- The warehouse stores atomic data
- The application layer generates SQL for the three-dimensional view
- The presentation layer provides the multidimensional view
Diagram: DSS client, ROLAP engine (application layer), multiple SQL, warehouse server
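The idea that a ROLAP application layer turns a dimensional request into SQL against atomic data can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module; the `sales` table, its columns, and the `build_rollup_sql` helper are assumptions made for the example, not part of any Oracle product.

```python
import sqlite3

# Hypothetical atomic warehouse table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, product TEXT, month TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [("S1", "PC", "Jan", 5), ("S1", "PC", "Feb", 7),
     ("S2", "PC", "Jan", 3), ("S2", "Server", "Jan", 1)],
)

ALLOWED_DIMENSIONS = {"store", "product", "month"}


def build_rollup_sql(dimensions):
    """Generate the GROUP BY query for the requested dimensions."""
    if not dimensions:
        raise ValueError("at least one dimension is required")
    unknown = set(dimensions) - ALLOWED_DIMENSIONS
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    cols = ", ".join(dimensions)
    return f"SELECT {cols}, SUM(units) FROM sales GROUP BY {cols} ORDER BY {cols}"


sql = build_rollup_sql(["product", "month"])
rows = conn.execute(sql).fetchall()
print(sql)
print(rows)
```

Column names are checked against a whitelist before being placed in the SQL text, because identifiers cannot be passed as bound parameters.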

MOLAP
Diagram: Express user, Express Server with MDDB, warehouse. Flows shown: query, data, periodic load.

ROLAP
Diagram: Express user, Express Server with cache, warehouse. Flows shown: query, data, data cache, live fetch.
Also: Hybrid (HOLAP)

Choosing a Reporting Architecture
- Business needs
- Potential for growth
- Interface
- Enterprise architecture
- Network architecture
- Speed of access
- Openness
Chart: MOLAP and ROLAP positioned by query performance (OK to good) against analysis (simple to complex)

Data Acquisition
- Identify, extract, transform, and transport source data
- Consider internal and external data
- Perform gap analysis between source data and target database objects
- Plan the move of data between sources and target
- Define first-time load and refresh strategy
- Define tool requirements
- Build, test, and execute data acquisition modules
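The extract, transform, and load steps, together with the distinction between a first-time load and a refresh, can be sketched as below. This is a minimal sketch using Python's built-in sqlite3 module; the source rows, the `orders` table, and the "changed since day N" rule are all assumptions chosen for illustration.

```python
import sqlite3

# Hypothetical operational rows: (order_id, region, amount_text, updated_day)
source_rows = [
    (1, " north ", "100.50", 1),
    (2, "South", "200", 1),
    (3, "north", "50", 2),
]

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)


def extract(rows, since_day):
    """Extract rows changed after since_day (0 means everything)."""
    return [r for r in rows if r[3] > since_day]


def transform(rows):
    """Standardise region names and convert amounts to numbers."""
    return [(oid, region.strip().title(), float(amount))
            for oid, region, amount, _ in rows]


def load(conn, rows):
    """Insert new rows and overwrite rows that already exist."""
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


load(warehouse, transform(extract(source_rows, since_day=0)))  # first-time load
source_rows.append((4, "EAST", "75.25", 3))
load(warehouse, transform(extract(source_rows, since_day=2)))  # refresh
loaded = warehouse.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
print(loaded)
```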

Modeling
- Warehouses differ from operational structures:
  - Analytical requirements
  - Subject orientation
- Data must map to subject-oriented information:
  - Identify business subjects
  - Define relationships between subjects
  - Name the attributes of each subject
- Modeling is iterative
- Modeling tools are available

Modeling the Data Warehouse
1. Defining the business model
2. Creating the dimensional model
3. Modeling summaries
4. Creating the physical model
Diagram: select a business process, then phases 1, 2 and 3, and 4 lead to the physical model.

Identifying Business Rules
- Product
  - Type: PC, Server
  - Monitor: 15 inch, 17 inch, 19 inch, None
  - Status: New, Rebuilt, Custom
- Location
  - Geographic proximity (miles): ..., ..., > 5 miles
- Store: Store > District > Region
- Time: Month > Quarter > Year
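Hierarchies such as Store > District > Region are what make roll-up possible. A minimal Python sketch, with invented store, district, and sales values, might look like this:

```python
from collections import defaultdict

# Hypothetical hierarchy members: Store > District > Region.
store_to_district = {"S1": "D1", "S2": "D1", "S3": "D2"}
district_to_region = {"D1": "North", "D2": "North"}

# Hypothetical store-level sales figures.
store_sales = {"S1": 100, "S2": 50, "S3": 25}


def roll_up(values, parent_of):
    """Aggregate values one level up the hierarchy."""
    totals = defaultdict(int)
    for member, value in values.items():
        totals[parent_of[member]] += value
    return dict(totals)


district_sales = roll_up(store_sales, store_to_district)
region_sales = roll_up(district_sales, district_to_region)
print(district_sales)
print(region_sales)
```

Drill-down is the reverse navigation: starting from a region total and looking up the district or store figures that contribute to it.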

Creating the Dimensional Model
- Identify fact tables
  - Translate business measures into fact tables
  - Analyze source system information for additional measures
  - Identify base and derived measures
  - Document additivity of measures
- Identify dimension tables
- Link fact tables to the dimension tables
- Create views for users
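The difference between base and derived measures, and why additivity needs documenting, can be illustrated with a short Python sketch. The fact rows and the choice of units, unit price, and revenue as measures are assumptions made for the example.

```python
# Hypothetical fact rows: (product, units, unit_price)
facts = [("PC", 2, 1000.0), ("PC", 3, 900.0), ("Server", 1, 5000.0)]

# Base measure: units is additive, so summing across rows is meaningful.
total_units = sum(units for _, units, _ in facts)

# Derived measure: revenue = units * unit_price, which is also additive.
total_revenue = sum(units * price for _, units, price in facts)

# unit_price is non-additive: a sum of prices has no business meaning, so
# the average price is derived from the two additive totals instead.
average_price = total_revenue / total_units

print(total_units, total_revenue, round(average_price, 2))
```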

Dimension Tables
Dimension tables have the following characteristics:
- Contain textual information that represents the attributes of the business
- Contain relatively static data
- Are joined to a fact table through a foreign key reference
Diagram: Product, Channel, Customer, and Time dimensions around Facts (units, price)

Fact Tables
Fact tables have the following characteristics:
- Contain numeric measures (metrics) of the business
- May contain summarized (aggregated) data
- May contain date-stamped data
- Are typically additive
- Have a key that is typically a concatenated key composed of the primary keys of the dimensions
- Are joined to dimension tables through foreign keys that reference primary keys in the dimension tables

Dimensional Model (Star Schema)
Diagram: the fact table, Facts (units, price), sits at the centre, surrounded by the dimension tables Product, Channel, Customer, and Time.

Star Schema Model
- Central fact table
- Radiating dimensions
- Denormalized model
Tables:
- Store Table: Store_id, District_id, ...
- Item Table: Item_id, Item_desc, ...
- Time Table: Day_id, Month_id, Period_id, Year_id
- Product Table: Product_id, Product_desc, …
- Sales Fact Table: Product_id, Store_id, Item_id, Day_id, Sales_dollars, Sales_units, ...
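A cut-down version of this star schema can be tried out with Python's built-in sqlite3 module. The sketch below keeps three of the dimensions and the column names from the slide, renames the tables (`store_dim`, `product_dim`, `time_dim`) for clarity, and uses invented rows; it is an illustration, not the schema of any particular system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_dim   (store_id INTEGER PRIMARY KEY, district_id INTEGER);
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, product_desc TEXT);
CREATE TABLE time_dim    (day_id INTEGER PRIMARY KEY, month_id INTEGER, year_id INTEGER);
CREATE TABLE sales_fact (
    product_id    INTEGER REFERENCES product_dim(product_id),
    store_id      INTEGER REFERENCES store_dim(store_id),
    day_id        INTEGER REFERENCES time_dim(day_id),
    sales_dollars REAL,
    sales_units   INTEGER,
    PRIMARY KEY (product_id, store_id, day_id)  -- concatenated key
);
-- Hypothetical sample rows
INSERT INTO store_dim VALUES (1, 10), (2, 20);
INSERT INTO product_dim VALUES (1, 'PC'), (2, 'Server');
INSERT INTO time_dim VALUES (1, 1, 1999), (2, 1, 1999), (3, 2, 1999);
INSERT INTO sales_fact VALUES
    (1, 1, 1, 1000.0, 1), (1, 2, 2, 2000.0, 2),
    (2, 1, 3, 5000.0, 1), (1, 1, 3, 1000.0, 1);
""")

# A typical star query: join the fact table to two dimensions and aggregate.
star_rows = conn.execute("""
    SELECT p.product_desc, t.month_id, SUM(f.sales_dollars), SUM(f.sales_units)
    FROM sales_fact f
    JOIN product_dim p ON p.product_id = f.product_id
    JOIN time_dim t    ON t.day_id = f.day_id
    GROUP BY p.product_desc, t.month_id
    ORDER BY p.product_desc, t.month_id
""").fetchall()
print(star_rows)
```

Every dimension is exactly one join away from the fact table, which is what keeps star queries simple.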

Star Schema Model
Advantages:
- Easy for users to understand
- Fast response to queries
- Simple metadata
- Supported by many front-end tools
Disadvantages:
- Less robust to change
- Slower to build
- Does not support history

Snowflake Schema Model
Tables:
- Sales Fact Table: Item_id, Store_id, Sales_dollars, Sales_units
- Time Table: Week_id, Period_id, Year_id
- Item Table: Item_id, Item_desc, Dept_id
- Dept Table: Dept_id, Dept_desc, Mgr_id
- Mgr Table: Dept_id, Mgr_id, Mgr_name
- Product Table: Product_id, Product_desc
- Store Table: Store_id, Store_desc, District_id
- District Table: District_id, District_desc
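In a snowflake schema a dimension is normalized into further tables, so a query may need more than one join to reach an attribute. The sketch below uses the Store and District tables from the slide with Python's built-in sqlite3 module; the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE district (district_id INTEGER PRIMARY KEY, district_desc TEXT);
CREATE TABLE store (
    store_id    INTEGER PRIMARY KEY,
    store_desc  TEXT,
    district_id INTEGER REFERENCES district(district_id)
);
CREATE TABLE sales_fact (
    item_id       INTEGER,
    store_id      INTEGER REFERENCES store(store_id),
    sales_dollars REAL,
    sales_units   INTEGER
);
-- Hypothetical sample rows
INSERT INTO district VALUES (1, 'Downtown'), (2, 'Suburbs');
INSERT INTO store VALUES (1, 'Store A', 1), (2, 'Store B', 1), (3, 'Store C', 2);
INSERT INTO sales_fact VALUES (100, 1, 50.0, 5), (100, 2, 30.0, 3), (200, 3, 20.0, 2);
""")

# Reaching district_desc takes two joins: fact -> store -> district.
snowflake_rows = conn.execute("""
    SELECT d.district_desc, SUM(f.sales_dollars)
    FROM sales_fact f
    JOIN store s    ON s.store_id = f.store_id
    JOIN district d ON d.district_id = s.district_id
    GROUP BY d.district_desc
    ORDER BY d.district_desc
""").fetchall()
print(snowflake_rows)
```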

Snowflake Schema Model
Advantages:
- Direct use by some tools
- More flexible to change
- Provides for speedier data loading
Disadvantages:
- May become large and unmanageable
- Degrades query performance
- More complex metadata

Using Summary Data
Phase 3: Modeling summaries
- Provides fast access to precomputed data
- Reduces use of I/O, CPU, and memory
- Is distilled from source systems and precalculated summaries
- Usually exists in summary fact tables

Designing Summary Tables
- Columns: Units, Sales (€), Store
- Summary rows: Product A total, Product B total, Product C total
- Summary functions: average, maximum, total, percentage

Summary Tables Example

SALES FACTS
Sales    Region  Month
10,000   North   Jan 99
12,000   South   Feb 99
11,000   North   Jan 99
15,000   West    Mar 99
18,000   South   Feb 99
20,000   North   Jan 99
10,000   East    Jan 99
2,000    West    Mar 99

SALES BY MONTH/REGION
Month    Region  Tot_Sales$
Jan 99   North   41,000
Jan 99   East    10,000
Feb 99   South   40,000
Mar 99   West    17,000

SALES BY MONTH
Month    Tot_Sales
Jan 99   51,000
Feb 99   40,000
Mar 99   17,000
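How such summary tables are derived from the detail rows can be sketched in a few lines of Python. The rows are copied from the SALES FACTS table; the variable names are illustrative. Note that summing the two Feb 99 South rows as listed gives 30,000, whereas the slide's summary tables show 40,000; the other totals match.

```python
from collections import defaultdict

# Detail rows copied from the SALES FACTS table: (sales, region, month)
sales_facts = [
    (10_000, "North", "Jan 99"), (12_000, "South", "Feb 99"),
    (11_000, "North", "Jan 99"), (15_000, "West", "Mar 99"),
    (18_000, "South", "Feb 99"), (20_000, "North", "Jan 99"),
    (10_000, "East", "Jan 99"), (2_000, "West", "Mar 99"),
]

by_month_region = defaultdict(int)  # SALES BY MONTH/REGION
by_month = defaultdict(int)         # SALES BY MONTH
for sales, region, month in sales_facts:
    by_month_region[(month, region)] += sales
    by_month[month] += sales

for key, total in by_month_region.items():
    print(key, total)
for month, total in by_month.items():
    print(month, total)
```

The monthly summary could equally be computed from the month/region summary rather than from the detail rows, which is how coarser summaries are often built from finer ones.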

Summary Management in Oracle8i
- Dimensions: Product, Region (City, State), Time
- Facts: Sales, Sales summary
- Summary advisor: summary usage, space requirements, summary recommendations

The Time Dimension
- Time is critical to the data warehouse
- A consistent representation of time is required for extensibility
- How and where should it be stored?
Diagram: Time dimension joined to Sales fact
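One common way to get a consistent representation of time is to generate the time dimension once, with one row per day, and let every fact table reference it. The Python sketch below shows the idea; the column names and the YYYYMMDD key format are assumptions, not something the slides prescribe.

```python
from datetime import date, timedelta


def build_time_dimension(start, end):
    """Return one row per calendar day between start and end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "day_id": int(day.strftime("%Y%m%d")),  # hypothetical key format
            "date": day.isoformat(),
            "month": day.month,
            "quarter": (day.month - 1) // 3 + 1,
            "year": day.year,
        })
        day += timedelta(days=1)
    return rows


time_dim = build_time_dimension(date(1999, 1, 1), date(1999, 12, 31))
print(len(time_dim))
print(time_dim[0])
```

Because month, quarter, and year are stored on every row, the Month > Quarter > Year hierarchy is available to any query that joins to this table.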