1 Data Warehousing Lecture-13 Dimensional Modeling (DM) Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics Research.

Slides:



Advertisements
Similar presentations
Chapter 4 Tutorial.
Advertisements

Dimensional Modeling.
CHAPTER OBJECTIVE: NORMALIZATION THE SNOWFLAKE SCHEMA.
Chapter 10: Designing Databases
BY LECTURER/ AISHA DAWOOD DW Lab # 2. LAB EXERCISE #1 Oracle Data Warehousing Goal: Develop an application to implement defining subject area, design.
DENORMALIZATION CSCI 6442 © Copyright 2015, David C. Roberts, all rights reserved.
Copyright © Starsoft Inc, Data Warehouse Architecture By Slavko Stemberger.
Lecture-19 ETL Detail: Data Cleansing
Data Warehousing M R BRAHMAM.
Dimensional Modeling Business Intelligence Solutions.
Prof. Navneet Goyal Computer Science Department BITS, Pilani
Advance Database System
Dimensional Modeling CS 543 – Data Warehousing. CS Data Warehousing (Sp ) - Asim LUMS2 From Requirements to Data Models.
Data Warehouse IMS5024 – presented by Eder Tsang.
Dimensional Modeling – Part 2
Organizing Data Chapter 5. Data Hierachy Table = Entities X Attributes Entities = Records Attributes = Fields.
Organizing Data & Information
CSE6011 Warehouse Models & Operators  Data Models  relations  stars & snowflakes  cubes  Operators  slice & dice  roll-up, drill down  pivoting.
Mgt 20600: IT Management & Applications Databases Tuesday April 4, 2006.
Data Warehousing DSCI 4103 Dr. Mennecke Introduction and Chapter 1.
Normalized bubble chart for Data in the Instructor’s View Instructor Organization Title Standard fee Text book Course + Date Participant Grade.
Ahsan Abdullah 1 Data Warehousing Lecture-12 Relational OLAP (ROLAP) Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics.
CSI315CSI315 Web Development Technologies Continued.
Ahsan Abdullah 1 Data Warehousing Lecture-17 Issues of ETL Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics.
IMS 6217: Data Warehousing / Business Intelligence Part 3 1 Dr. Lawrence West, Management Dept., University of Central Florida Analysis.
Datawarehouse & Datamart OLAPs vs. OLTPs Dimensional Modeling Creating Physical Design Using SQL Mgt. Studio Module II: Designing Datamarts 1.
Ahsan Abdullah 1 Data Warehousing Lecture-11 Multidimensional OLAP (MOLAP) Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for.
OnLine Analytical Processing (OLAP)
Databases From A to Boyce Codd. What is a database? It depends on your point of view. For Manovich, a database is a means of structuring information in.
Data Warehousing Concepts, by Dr. Khalil 1 Data Warehousing Design Dr. Awad Khalil Computer Science Department AUC.
Ahsan Abdullah 1 Data Warehousing Lecture-7De-normalization Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics.
DWH-Ahsan Abdullah 1 Data Warehousing Lecture-4 Introduction and Background Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for.
DIMENSIONAL MODELLING. Overview Clearly understand how the requirements definition determines data design Introduce dimensional modeling and contrast.
Chapter 1 Adamson & Venerable Spring Dimensional Modeling Dimensional Model Basics Fact & Dimension Tables Star Schema Granularity Facts and Measures.
Lecture2: Database Environment Prepared by L. Nouf Almujally & Aisha AlArfaj 1 Ref. Chapter2 College of Computer and Information Sciences - Information.
Ahsan Abdullah 1 Data Warehousing Lecture-18 ETL Detail: Data Extraction & Transformation Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. &
Ahsan Abdullah 1 Data Warehousing Lecture-9 Issues of De-normalization Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics.
1 Chapter 1 Introduction. 2 Introduction n Definition A database management system (DBMS) is a general-purpose software system that facilitates the process.
1 Data Warehousing Lecture-14 Process of Dimensional Modeling Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics.
DWH-Ahsan Abdullah 1 Data Warehousing Lecture-2 Introduction and Background Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for.
Ahsan Abdullah 1 Data Warehousing Lecture-10 Online Analytical Processing (OLAP) Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center.
Decision Support and Date Warehouse Jingyi Lu. Outline Decision Support System OLAP vs. OLTP What is Date Warehouse? Dimensional Modeling Extract, Transform,
INFO1408 Database Design Concepts Week 15: Introduction to Database Management Systems.
Ch3 Data Warehouse Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2009.
Designing a Data Warehousing System. Overview Business Analysis Process Data Warehousing System Modeling a Data Warehouse Choosing the Grain Establishing.
1 Data Warehousing Lecture-15 Issues of Dimensional Modeling Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics.
UNIT-II Principles of dimensional modeling
Shilpa Seth.  Multidimensional Data Model Concepts Multidimensional Data Model Concepts  Data Cube Data Cube  Data warehouse Schemas Data warehouse.
1 On-Line Analytic Processing Warehousing Data Cubes.
Data Warehousing Multidimensional Analysis
CIS 250 Advanced Computer Applications Database Management Systems.
Ahsan Abdullah 1 Data Warehousing Lecture-6Normalization Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics.
Copyright© 2014, Sira Yongchareon Department of Computing, Faculty of Creative Industries and Business Lecturer : Dr. Sira Yongchareon ISCG 6425 Data Warehousing.
1 Copyright © 2009, Oracle. All rights reserved. Oracle Business Intelligence Enterprise Edition: Overview.
Data Warehousing DSCI 4103 Dr. Mennecke Chapter 2.
Last Updated : 26th may 2003 Center of Excellence Data Warehousing Introductionto Data Modeling.
SQL Server Analysis Services Understanding Unified Dimension Model (UDM)
Ahsan Abdullah 1 Data Warehousing Lecture-8 De-normalization Techniques Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics.
The Need for Data Analysis 2 Managers track daily transactions to evaluate how the business is performing Strategies should be developed to meet organizational.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Data Warehousing and Decision Support Chapter 25.
Introduction to OLAP and Data Warehouse Assoc. Professor Bela Stantic September 2014 Database Systems.
Data Warehouses and OLAP 1.  Review Questions ◦ Question 1: OLAP ◦ Question 2: Data Warehouses ◦ Question 3: Various Terms and Definitions ◦ Question.
Data Resource Management Lecture 8. Traditional File Processing Data are organized, stored, and processed in independent files of data records In traditional.
Lecture-3 Introduction and Background
Data warehouse and OLAP
Star Schema.
Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2009
Dimensional Modeling.
Applying Data Warehouse Techniques
Applying Data Warehouse Techniques
Presentation transcript:

1 Data Warehousing Lecture-13 Dimensional Modeling (DM) Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics Research National University of Computers & Emerging Sciences, Islamabad

2 Dimensional Modeling (DM)

3 The need for ER modeling?  Problems with early COBOLian data processing systems.  Data redundancies  From flat file to Table, each entity ultimately becomes a Table in the physical schema.  Simple O(n 2 ) Join to work with Tables

4 Why ER Modeling has been so successful?  Coupled with normalization drives out all the redundancy out of the database.  Change (or add or delete) the data at just one point.  Can be used with indexing for very fast access.  Resulted in success of OLTP systems.

5 Need for DM: Un-answered Qs  Lets have a look at a typical ER data model first. typical ER data model typical ER data model  Some Observations:  All tables look-alike, as a consequence it is difficult to identify:  Which table is more important ?  Which is the largest?  Which tables contain numerical measurements of the business?  Which table contain nearly static descriptive attributes?

6 Need for DM: Complexity of Representation  Many topologies for the same ER diagram, all appearing different.  Very hard to visualize and remember.  A large number of possible connections to any two (or more) tables

7 Need for DM: The Paradox  The Paradox: Trying to make information accessible using tables resulted in an inability to query them!  ER and Normalization result in large number of tables which are:  Hard to understand by the users (DB programmers)  Hard to navigate optimally by DBMS software  Real value of ER is in using tables individually or in pairs  Too complex for queries that span multiple tables with a large number of records

8 ER vs. DM ERDM Constituted to optimize OLTP performance. Constituted to optimize DSS query performance. Models the micro relationships among data elements. Models the macro relationships among data elements with an overall deterministic strategy. A wild variability of the structure of ER models. All dimensions serve as equal entry points to the fact table. Very vulnerable to changes in the user's querying habits, because such schemas are asymmetrical. Changes in users' querying habits can be accommodated by automatic SQL generators.

9 How to simplify a ER data model?  Two general methods:  De-Normalization  Dimensional Modeling (DM)

10 What is DM?…  A simpler logical model optimized for decision support.  Inherently dimensional in nature, with a single central fact table and a set of smaller dimensional tables.  Multi-part key for the fact table  Dimensional tables with a single-part PK.  Keys are usually system generated

11 What is DM?...  Results in a star like structure, called star schema or star join.  All relationships mandatory M-1.  Single path between any two levels.  Supports ROLAP operations.

12 Dimensions have Hierarchies Items BooksCloths FictionTextMenWomen MedicalEngg Analysts tend to look at the data through dimension at a particular “level” in the hierarchy

13 The two Schemas Star Snow-flake

14 “Simplified” 3NF (Retail) CITYDISTRICT 1 ZONECITY DISTRICTDIVISION MONTHQTR STORE #STREETZONE... WEEKMONTH DATEWEEK RECEIPT #STORE #DATE... ITEM #RECEIPT #...$ ITEM #CATEGORY ITEM # DEPTCATEGORY year month week sale_header store sale_detail item_x_cat item_x_splir cat_x_dept M 1 M 1 M 1 M 1 1 MM 1 M M M 1 1 M 1 1 M YEARQTR 1 M quarter SUPPLIER DIVISIONPROVINCE M 1 BACK division district zone

15 Vastly Simplified Star Schema RECEIPT# STORE# DATE ITEM# M Fact Table ITEM# CATEGORY DEPT SUPPLIER Product Dim M Sale Rs. M STORE# ZONE CITY PROVINCE Geography Dim DISTRICT DATE WEEK QUARTER YEAR Time Dim MONTH facts DIVISION

16 The Benefit of Simplicity Beauty lies in close correspondence with the business, evident even to business users.

17 Features of Star Schema Dimensional hierarchies are collapsed into a single table for each dimension. Loss of Information? A single fact table created with a single header from the detail records, resulting in:  A vastly simplified physical data model!  Fewer tables (thousands of tables in some ERP systems).  Fewer joins resulting in high performance.  Some requirement of additional space.

18 Quantifying space requirement Quantifying use of additional space using star schema There are about 10 million mobile phone users in Pakistan. Say the top company has half of them = 500,000 Number of days in 1 year = 365 Number of calls recorded each day = 250,000 (assumed) Maximum number of records in fact table = 91 billion rows Assuming a relatively small header size = 128 bytes Fact table storage used = 11 Tera bytes Average length of city name = 8 characters  8 bytes Total number of cities with telephone access = 170 (1 byte) Space used for city name in fact table using Star = 8 x = TB Space used for city code using snow-flake = 1x 0.091= TB Additional space used  Tera byte i.e. about 5.8%