Dimensional Modeling – Part 2

Dimensional Modeling – Part 2 CS 543 – Data Warehousing CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS

The Snowflake Schema
- Snowflaking is a method of normalizing the dimension tables in a star schema.
- Normalization increases the efficiency of certain queries and reduces space requirements.

Star Schema

Querying
- Suppose the product dimension table has 500,000 rows (one per product). These products fall under 500 product brands, and the brands fall under 10 product categories.
- Query: what is the total quantity of a specific product category sold in January 2004?
- In the star schema, all 500,000 rows of the product dimension table would have to be searched to find the products belonging to the specified product category.
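
The query on this slide can be sketched against a tiny star schema. This is a minimal illustration using Python's built-in sqlite3 module; the table names (product_dim, time_dim, sales_fact), the column names, and the sample rows are assumptions made for the example and do not come from the slides.

```python
import sqlite3

# Hypothetical star schema: all names and sample rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT,
                          brand TEXT, category TEXT);
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, calendar_date TEXT,
                       month INTEGER, year INTEGER);
CREATE TABLE sales_fact (product_key INTEGER, time_key INTEGER, quantity INTEGER);
""")
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?, ?)", [
    (1, "Widget-1", "Acme", "Tools"),
    (2, "Widget-2", "Acme", "Tools"),
    (3, "Gadget-1", "Zenith", "Toys"),
])
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?)", [
    (1, "2004-01-05", 1, 2004),
    (2, "2004-02-10", 2, 2004),
])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", [
    (1, 1, 10), (2, 1, 5), (3, 1, 7), (1, 2, 4),
])

def category_quantity(conn, category, year, month):
    """Total quantity sold for one product category in one month."""
    row = conn.execute("""
        SELECT COALESCE(SUM(f.quantity), 0)
        FROM sales_fact f
        JOIN product_dim p ON p.product_key = f.product_key
        JOIN time_dim t ON t.time_key = f.time_key
        WHERE p.category = ? AND t.year = ? AND t.month = ?
    """, (category, year, month)).fetchone()
    return row[0]

total = category_quantity(conn, "Tools", 2004, 1)
print(total)
```

Because the category is stored as a plain column of the denormalized product dimension, the filter on p.category has to consider every product row unless an index exists on that column.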

A Snowflake Schema

Normalization
Options when snowflaking:
- Partially or fully normalize only a few dimension tables, leaving the others intact.
- Partially normalize every dimension table.
- Fully normalize every dimension table.
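
As a rough sketch of what fully normalizing one dimension looks like, the example below splits a product dimension into product, brand, and category tables and then answers a category-level query. It uses Python's sqlite3 module; every table name, column name, and data row is a hypothetical chosen for illustration.

```python
import sqlite3

# Hypothetical snowflaked product dimension: product -> brand -> category.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category_dim (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE brand_dim (brand_key INTEGER PRIMARY KEY, brand_name TEXT,
                        category_key INTEGER REFERENCES category_dim(category_key));
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT,
                          brand_key INTEGER REFERENCES brand_dim(brand_key));
CREATE TABLE sales_fact (product_key INTEGER REFERENCES product_dim(product_key),
                         quantity INTEGER);
""")
conn.executemany("INSERT INTO category_dim VALUES (?, ?)", [(1, "Tools"), (2, "Toys")])
conn.executemany("INSERT INTO brand_dim VALUES (?, ?, ?)",
                 [(1, "Acme", 1), (2, "Zenith", 2)])
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)",
                 [(1, "Widget-1", 1), (2, "Widget-2", 1), (3, "Gadget-1", 2)])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?)", [(1, 10), (2, 5), (3, 7)])

def category_quantity(conn, category_name):
    """Total quantity for a category, reached through two extra joins."""
    row = conn.execute("""
        SELECT COALESCE(SUM(f.quantity), 0)
        FROM sales_fact f
        JOIN product_dim p ON p.product_key = f.product_key
        JOIN brand_dim b ON b.brand_key = p.brand_key
        JOIN category_dim c ON c.category_key = b.category_key
        WHERE c.category_name = ?
    """, (category_name,)).fetchone()
    return row[0]

tools_total = category_quantity(conn, "Tools")
print(tools_total)
```

The category name is now stored once per category rather than once per product, and the category filter runs against a very small table, at the price of two more joins in every category-level query.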

Snowflaking?
Advantages:
- Small savings in storage space.
- Normalized structures are easier to update and maintain.
Disadvantages:
- The schema is less intuitive, and end users are put off by the complexity.
- Browsing through the contents becomes difficult.
- Query performance degrades because of the additional joins.

Sub-dimensions

Some Query Examples (1)

Some Query Examples (2)
- Query: total sales for customer number 12345678 during the first week of December 2003 for product Widget-1.
- Find and sum the sales quantity and sales dollars for all fact table rows where the customer key relates to customer number 12345678, the product key relates to product Widget-1, and the time key relates to the seven days in the first week of December 2003.
- Assuming a customer makes at most one purchase of the product on a given day, at most seven rows of the fact table are summed.
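
A minimal sketch of this query follows, again with Python's sqlite3 module. The schema, the column names, and the generated sample data are assumptions for illustration, and "the first week of December 2003" is taken to mean 1 to 7 December.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical schema and data; only the customer number, product name,
# and date range come from the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, customer_number TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, calendar_date TEXT);
CREATE TABLE sales_fact (customer_key INTEGER, product_key INTEGER, time_key INTEGER,
                         sales_quantity INTEGER, sales_dollars REAL);
""")
conn.execute("INSERT INTO customer_dim VALUES (1, '12345678')")
conn.execute("INSERT INTO customer_dim VALUES (2, '99999999')")
conn.execute("INSERT INTO product_dim VALUES (1, 'Widget-1')")

# One time_dim row per day from 2003-11-28 through 2003-12-11 (14 days).
start = date(2003, 11, 28)
days = [(i + 1, (start + timedelta(days=i)).isoformat()) for i in range(14)]
conn.executemany("INSERT INTO time_dim VALUES (?, ?)", days)

# Each customer buys 2 units of Widget-1 for 20.0 on every one of those days.
for customer_key in (1, 2):
    conn.executemany(
        "INSERT INTO sales_fact VALUES (?, 1, ?, 2, 20.0)",
        [(customer_key, time_key) for time_key, _ in days],
    )

rows, quantity, dollars = conn.execute("""
    SELECT COUNT(*), SUM(f.sales_quantity), SUM(f.sales_dollars)
    FROM sales_fact f
    JOIN customer_dim c ON c.customer_key = f.customer_key
    JOIN product_dim p ON p.product_key = f.product_key
    JOIN time_dim t ON t.time_key = f.time_key
    WHERE c.customer_number = '12345678'
      AND p.product_name = 'Widget-1'
      AND t.calendar_date BETWEEN '2003-12-01' AND '2003-12-07'
""").fetchone()
print(rows, quantity, dollars)
```

With one purchase per day in the sample data, the three dimension filters narrow the 28 fact rows down to the seven that are summed.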

Some Query Examples (3)
- Query: total sales for all customers in the south-central territory for the first two quarters of 2003 for product category Bigtools.
- The query sums all fact table rows where the customer key relates to a customer in the south-central territory, the product key relates to a product in the product category Bigtools, and the time key relates to one of the roughly 180 days in the first two quarters of 2003.
- Clearly, a large number of fact table rows participate in the summation.
- How can we reduce the execution time?

Fact Table Size (1)

Fact Table Size (2)
Credit card transaction tracking:
- Time dimension: 5 years = 60 months.
- Number of credit card accounts: 150 million.
- Average number of monthly transactions per account: 20.
- Maximum number of base fact table records: 180 billion.
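
The 180 billion figure is the product of the three numbers on the slide. A short check in Python (the variable names are chosen for this example):

```python
# Inputs taken from the slide.
months = 5 * 12                          # 5 years of history
accounts = 150_000_000                   # credit card accounts
transactions_per_account_per_month = 20  # average

# Upper bound on base fact table rows: one row per transaction.
max_base_fact_rows = months * accounts * transactions_per_account_per_month
print(f"{max_base_fact_rows:,}")
```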

Aggregating Fact Table
- Typically, queries require detailed data on some dimensions, while only summary data is needed for the other dimensions.
- Example: assume one sale per product per store per week. Estimate the number of fact table rows that must be read when:
  - the query involves 1 product, 1 store, 1 week
  - the query involves 1 product, all stores, 1 week
  - the query involves 1 brand, 1 store, 1 week
  - the query involves 1 brand, all stores, 1 year
- Suppose now you have an aggregate fact table where each row summarizes the totals for a brand, a store, and a week. Now estimate the number of fact table rows that must be read.
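
One way to work through the estimate is sketched below. The slides do not give the number of stores, so 300 stores is purely an assumption; the 1,000 products per brand is the average implied by the 500,000 products and 500 brands of the earlier Querying slide, and a year is taken as 52 weeks.

```python
STORES = 300                         # assumption: not stated in the slides
PRODUCTS_PER_BRAND = 500_000 // 500  # average from the Querying slide
WEEKS_PER_YEAR = 52

def rows_read(items, stores, weeks):
    """Rows touched when every (item, store, week) combination has one row."""
    return items * stores * weeks

# Base fact table: one row per product, store, and week.
base_table = {
    "1 product, 1 store, 1 week": rows_read(1, 1, 1),
    "1 product, all stores, 1 week": rows_read(1, STORES, 1),
    "1 brand, 1 store, 1 week": rows_read(PRODUCTS_PER_BRAND, 1, 1),
    "1 brand, all stores, 1 year": rows_read(PRODUCTS_PER_BRAND, STORES, WEEKS_PER_YEAR),
}

# Aggregate fact table: one row per brand, store, and week.
# Product-level queries cannot use it, so only the brand queries appear.
aggregate_table = {
    "1 brand, 1 store, 1 week": rows_read(1, 1, 1),
    "1 brand, all stores, 1 year": rows_read(1, STORES, WEEKS_PER_YEAR),
}

for query, rows in base_table.items():
    print(f"base      | {query}: {rows:,}")
for query, rows in aggregate_table.items():
    print(f"aggregate | {query}: {rows:,}")
```

Under these assumptions the brand-level queries read 1,000 times fewer rows from the aggregate table, which is exactly the average number of products per brand.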

Multi-Way Aggregate Fact Tables (1)
- Utilize the hierarchies in dimensions to create appropriate aggregate fact tables.
- A single-way aggregate fact table aggregates along one dimension only; a multi-way aggregate fact table aggregates along more than one dimension.
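
The difference between a one-way and a two-way aggregate can be sketched with GROUP BY over a small base fact table. The example uses Python's sqlite3 module; the table names, column names, and rows are hypothetical. As a simplification, the brand and month values are stored directly in the aggregate tables instead of in separate derived dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, calendar_date TEXT, month TEXT);
CREATE TABLE sales_fact (product_key INTEGER, store_key INTEGER, time_key INTEGER,
                         quantity INTEGER);
INSERT INTO product_dim VALUES (1, 'Acme'), (2, 'Acme'), (3, 'Zenith');
INSERT INTO time_dim VALUES (1, '2004-01-05', '2004-01'),
                            (2, '2004-01-12', '2004-01'),
                            (3, '2004-02-02', '2004-02');
INSERT INTO sales_fact VALUES
    (1, 1, 1, 10), (2, 1, 1, 5), (1, 1, 2, 3),
    (3, 1, 1, 7), (1, 2, 1, 4), (2, 1, 3, 6);

-- One-way aggregate: product rolled up to brand; store and day stay detailed.
CREATE TABLE sales_by_brand AS
SELECT p.brand, f.store_key, f.time_key, SUM(f.quantity) AS quantity
FROM sales_fact f
JOIN product_dim p ON p.product_key = f.product_key
GROUP BY p.brand, f.store_key, f.time_key;

-- Two-way aggregate: product rolled up to brand and day rolled up to month.
CREATE TABLE sales_by_brand_month AS
SELECT p.brand, f.store_key, t.month, SUM(f.quantity) AS quantity
FROM sales_fact f
JOIN product_dim p ON p.product_key = f.product_key
JOIN time_dim t ON t.time_key = f.time_key
GROUP BY p.brand, f.store_key, t.month;
""")

def count_rows(table):
    # Table names are fixed identifiers from this sketch, not user input.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

row_counts = {t: count_rows(t)
              for t in ("sales_fact", "sales_by_brand", "sales_by_brand_month")}
acme_jan = conn.execute("""
    SELECT quantity FROM sales_by_brand_month
    WHERE brand = 'Acme' AND store_key = 1 AND month = '2004-01'
""").fetchone()[0]
print(row_counts, acme_jan)
```

Each additional rolled-up dimension shrinks the table further, so a brand-by-month query reads a single pre-summed row instead of scanning the matching base rows.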

Multi-Way Aggregate Fact Tables (2)

Multi-Way Aggregate Fact Tables (3)

Goals for Aggregation
- Primary goal: improve overall DW performance.
- Do not get bogged down with too many aggregates. Remember that you have to create additional derived dimensions to support the aggregates.
- Try to cater to a wide range of user groups.
- Go for aggregates that do not unduly increase the overall usage of storage.
- Keep the aggregates hidden from the end users.

Families of Stars

Snapshot and Transaction Tables

Conformed Dimensions
- Since multiple fact tables share dimension tables, it is essential that dimensions are conformed, i.e., that they have the same meaning in every fact table that uses them.
- Conformed dimensions are essential for:
  - building up an enterprise warehouse from data marts
  - running queries across data marts
  - consistent semantics of queries and their results
- Using conformed dimensions is an important responsibility of the DW project team.
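
To illustrate why a shared dimension matters for queries across data marts, the sketch below summarizes two fact tables by the same product category attribute and lines the results up side by side. The sales and inventory fact tables, their columns, and the data are all hypothetical; only the idea of a dimension shared by several fact tables comes from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One conformed product dimension shared by both (hypothetical) data marts.
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE sales_fact (product_key INTEGER, quantity_sold INTEGER);
CREATE TABLE inventory_fact (product_key INTEGER, quantity_on_hand INTEGER);
INSERT INTO product_dim VALUES (1, 'Tools'), (2, 'Tools'), (3, 'Toys');
INSERT INTO sales_fact VALUES (1, 10), (2, 5), (3, 7), (1, 4);
INSERT INTO inventory_fact VALUES (1, 100), (2, 50), (3, 70);
""")

def totals_by_category(fact_table, measure):
    """Sum one measure of one fact table per product category."""
    # fact_table and measure are fixed identifiers from this sketch, not user input.
    sql = f"""
        SELECT p.category, SUM(f.{measure})
        FROM {fact_table} f
        JOIN product_dim p ON p.product_key = f.product_key
        GROUP BY p.category
    """
    return dict(conn.execute(sql).fetchall())

sold = totals_by_category("sales_fact", "quantity_sold")
on_hand = totals_by_category("inventory_fact", "quantity_on_hand")

# Merge the two result sets on the shared category value.
report = {c: (sold.get(c, 0), on_hand.get(c, 0))
          for c in sorted(set(sold) | set(on_hand))}
print(report)
```

The merge step is only meaningful because "category" means the same thing in both result sets; if each data mart kept its own product dimension with different category definitions, the rows could not be lined up reliably.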

Standardizing Facts
- Since fact tables can be shared, they need to be standardized.
- Ensure the same definitions and terminology across data marts.
- Resolve homonyms and synonyms.
- Guarantee that the same algorithms are used for any derived units in each fact table.
- Make sure each fact uses the right unit of measurement.