DATA WAREHOUSING – DIMENSIONAL MODELLING AND SCHEMAS With MIKE –AARONE ATUHE Handout 5.

CHAPTER OBJECTIVES
■ Clearly understand how the requirements definition determines data design.
■ Introduce dimensional modeling and contrast it with entity-relationship modeling.
■ Review the basics of the STAR schema.
■ Find out what is inside the fact table and inside the dimension tables.
■ Determine the advantages of the STAR schema for data warehouses.

BOOKS TO CONSIDER
■ Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals, by P. Ponniah.
■ The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, 2nd edition, by R. Kimball and M. Ross.

DIMENSIONAL MODELING VOCABULARY
1. Dimensional modeling (DM) is the name of a set of techniques and concepts used in data warehouse design. Dimensional modeling does not necessarily involve a relational database. It is widely accepted as the preferred technique for presenting analytic data because it addresses two simultaneous requirements:
■ Deliver data that is understandable to business users.
■ Deliver fast query performance.

Dimensional modeling always uses the concepts of facts (measures) and dimensions (context). Facts are typically (but not always) numeric values that can be aggregated, and dimensions are groups of hierarchies and descriptors that define the facts. For example, sales amount is a fact; timestamp, product, register number, store number, etc. are elements of dimensions. Dimensional models are built by business process area, e.g. store sales, inventory, claims, etc.

2. FACT TABLE A fact table is the primary table in a dimensional model, where the numerical performance measurements of the business are stored. We use the term fact to represent a business measure. Imagine standing in the marketplace, watching products being sold and writing down the quantity sold and the dollar sales amount each day for each product in each store.

So each row of the table above gives the sales activity on a given day in a given store for a given product. All fact tables have two or more foreign keys, as designated by the FK notation in the Figure above, that connect to the dimension tables' primary keys. For example, the product key in the fact table always matches a specific product key in the product dimension table. When all the keys in the fact table correctly match their respective primary keys in the corresponding dimension tables, we say that the tables satisfy referential integrity. We access the fact table via the dimension tables joined to it.
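A minimal sketch of the foreign-key relationship and referential integrity described above, using Python's built-in sqlite3 module. The table and column names (product_dim, store_dim, daily_sales_fact) are hypothetical, and the date dimension is left out for brevity, so date_key is a plain integer here. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON is issued on the connection.

```python
import sqlite3


def build_sales_tables(conn: sqlite3.Connection) -> None:
    """Create two hypothetical dimension tables and a fact table that references them."""
    # SQLite ignores FOREIGN KEY clauses unless this pragma is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE product_dim (
            product_key  INTEGER PRIMARY KEY,
            product_name TEXT NOT NULL
        );
        CREATE TABLE store_dim (
            store_key  INTEGER PRIMARY KEY,
            store_name TEXT NOT NULL
        );
        -- One row per product per store per day (date dimension omitted here).
        CREATE TABLE daily_sales_fact (
            date_key      INTEGER NOT NULL,
            product_key   INTEGER NOT NULL REFERENCES product_dim (product_key),
            store_key     INTEGER NOT NULL REFERENCES store_dim (store_key),
            quantity_sold INTEGER NOT NULL,
            dollar_sales  REAL    NOT NULL
        );
        """
    )


def insert_sale(conn, date_key, product_key, store_key, quantity, dollars) -> bool:
    """Insert one fact row; return False if it would break referential integrity."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO daily_sales_fact VALUES (?, ?, ?, ?, ?)",
                (date_key, product_key, store_key, quantity, dollars),
            )
        return True
    except sqlite3.IntegrityError:  # raised for "FOREIGN KEY constraint failed"
        return False


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_sales_tables(conn)
    with conn:
        conn.execute("INSERT INTO product_dim VALUES (1, 'Cola 330ml')")
        conn.execute("INSERT INTO store_dim VALUES (10, 'Store A')")
    print(insert_sale(conn, 20240101, 1, 10, 3, 4.50))   # True
    print(insert_sale(conn, 20240101, 99, 10, 1, 1.50))  # False: no product 99
```

Letting the database reject orphan keys is one design choice; many warehouses instead check referential integrity in the load process, but the rule being enforced is the same.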

DIMENSION TABLES The dimension tables contain the textual descriptors of the business, as illustrated in the Figure below. In a well-designed dimensional model, dimension tables have many columns, or attributes, which describe the rows in the dimension table. It is not uncommon for a dimension table to have 50 to 100 attributes.

BRINGING TOGETHER FACTS AND DIMENSIONS Now that we understand fact and dimension tables, let’s bring the two building blocks together in a dimensional model. As illustrated in the Figure below, the fact table consisting of numeric measurements is joined to a set of dimension tables filled with descriptive attributes. This characteristic starlike structure is often called a star join schema.
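To make the star join concrete, here is a hedged sketch using Python's sqlite3: the fact table is joined to each dimension on its key, rows are constrained and grouped by descriptive dimension attributes, and the numeric fact is summed. The table names, columns, and sample rows are invented for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE date_dim    (date_key INTEGER PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE sales_fact (
    date_key     INTEGER REFERENCES date_dim (date_key),
    product_key  INTEGER REFERENCES product_dim (product_key),
    store_key    INTEGER REFERENCES store_dim (store_key),
    dollar_sales REAL
);
"""

# Star join: filter and group on dimension attributes, aggregate the fact.
STAR_JOIN = """
SELECT p.category, s.city, SUM(f.dollar_sales) AS total_sales
FROM sales_fact AS f
JOIN date_dim    AS d ON f.date_key    = d.date_key
JOIN product_dim AS p ON f.product_key = p.product_key
JOIN store_dim   AS s ON f.store_key   = s.store_key
WHERE d.year = ?
GROUP BY p.category, s.city
ORDER BY p.category, s.city
"""


def load_sample(conn: sqlite3.Connection) -> None:
    """Create the schema and a handful of made-up rows."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO date_dim VALUES (?, ?, ?)",
                     [(1, "Jan", 2024), (2, "Feb", 2024), (3, "Jan", 2023)])
    conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)",
                     [(1, "Cola", "Drinks"), (2, "Bread", "Bakery")])
    conn.executemany("INSERT INTO store_dim VALUES (?, ?)",
                     [(1, "Kampala"), (2, "Entebbe")])
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
                     [(1, 1, 1, 10.0), (2, 1, 1, 5.0), (1, 2, 2, 3.0), (3, 1, 1, 99.0)])
    conn.commit()


def sales_by_category_and_city(conn: sqlite3.Connection, year: int) -> list:
    """Return (category, city, total_sales) rows for one year."""
    return conn.execute(STAR_JOIN, (year,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_sample(conn)
    print(sales_by_category_and_city(conn, 2024))
    # [('Bakery', 'Entebbe', 3.0), ('Drinks', 'Kampala', 15.0)]
```

Every query against the star has this same shape: the dimensions supply the WHERE and GROUP BY columns, and the fact table supplies the numbers being summed.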

DW SCHEMAS The schema is a logical description of the entire database. It includes the name and description of records of all record types, including all associated data items and aggregates. Whereas an operational database typically uses a normalized relational model, a data warehouse uses the star, snowflake, and fact constellation schemas. In this chapter we discuss the schemas used in data warehouses.

STAR SCHEMA In a star schema, each dimension is represented by only one dimension table, which contains the set of attributes for that dimension. The following diagram shows the sales data of a company with respect to four dimensions, namely time, item, branch, and location.

EXAMPLE OF A STAR SCHEMA
Sales Fact Table: time_key, item_key, branch_key, location_key (foreign keys); units_sold, dollars_sold, avg_sales (measures)
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city, province_or_state, country

MORE EXPLANATION There is a fact table at the centre. It contains the keys to each of the four dimensions, as well as measures such as dollars_sold and units_sold. Note: Each dimension has only one dimension table, and each table holds a set of attributes. For example, the location dimension table contains the attribute set {location_key, street, city, province_or_state, country}. This constraint may cause data redundancy. For example, "Vancouver" and "Victoria" are both cities in the Canadian province of British Columbia, so the entries for these cities repeat the same values in the province_or_state and country attributes.
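The star schema in the example above can be written as SQL DDL. The sketch below runs it through Python's sqlite3 and loads the two British Columbia cities from the text, showing that the flat location dimension stores "British Columbia" and "Canada" once per location row. The table and column names come from the diagram; the column types are assumptions, and street is left empty rather than invented.

```python
import sqlite3

# Table and column names follow the diagram; the column types are assumptions.
STAR_DDL = """
CREATE TABLE time (
    time_key INTEGER PRIMARY KEY, day INTEGER, day_of_the_week TEXT,
    month INTEGER, quarter INTEGER, year INTEGER
);
CREATE TABLE item (
    item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
    type TEXT, supplier_type TEXT
);
CREATE TABLE branch (
    branch_key INTEGER PRIMARY KEY, branch_name TEXT, branch_type TEXT
);
CREATE TABLE location (
    location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
    province_or_state TEXT, country TEXT
);
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time (time_key),
    item_key     INTEGER REFERENCES item (item_key),
    branch_key   INTEGER REFERENCES branch (branch_key),
    location_key INTEGER REFERENCES location (location_key),
    units_sold   INTEGER,
    dollars_sold REAL,
    avg_sales    REAL
);
"""


def build_star(conn: sqlite3.Connection) -> None:
    """Create the star schema and load the two British Columbia locations."""
    conn.executescript(STAR_DDL)
    conn.executemany(
        "INSERT INTO location VALUES (?, ?, ?, ?, ?)",
        [
            (1, None, "Vancouver", "British Columbia", "Canada"),  # street left empty
            (2, None, "Victoria", "British Columbia", "Canada"),
        ],
    )
    conn.commit()


def repeated_regions(conn: sqlite3.Connection) -> list:
    """Province/country pairs stored in more than one location row (the redundancy)."""
    return conn.execute(
        """
        SELECT province_or_state, country, COUNT(*)
        FROM location
        GROUP BY province_or_state, country
        HAVING COUNT(*) > 1
        """
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star(conn)
    print(repeated_regions(conn))  # [('British Columbia', 'Canada', 2)]
```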

SNOWFLAKE SCHEMA In a snowflake schema, some dimension tables are normalized. Normalization splits the data into additional tables. For example, the item dimension table of the star schema is normalized and split into two dimension tables, namely item and supplier.

EXAMPLE OF A SNOWFLAKE SCHEMA
Sales Fact Table: time_key, item_key, branch_key, location_key (foreign keys); units_sold, dollars_sold, avg_sales (measures)
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_key
supplier (normalized out of item): supplier_key, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city_key
city (normalized out of location): city_key, city, province_or_state, country

Therefore the item dimension table now contains the attributes item_key, item_name, type, brand, and supplier_key. The supplier_key is linked to the supplier dimension table, which contains the attributes supplier_key and supplier_type. Note: Because normalization reduces redundancy in the snowflake schema, the dimensions become easier to maintain and use less storage space; the trade-off is that queries need more joins.
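Snowflaking adds a join to every query that needs a normalized attribute. In this sketch (Python sqlite3, made-up sample rows), reporting dollars_sold by supplier_type has to go from the fact table through item to supplier, whereas in the star version supplier_type sits directly in item. Only the item and supplier columns come from the example; the trimmed sales_fact and the sample values are assumptions.

```python
import sqlite3

SNOWFLAKE_DDL = """
CREATE TABLE supplier (supplier_key INTEGER PRIMARY KEY, supplier_type TEXT);
CREATE TABLE item (
    item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT,
    supplier_key INTEGER REFERENCES supplier (supplier_key)
);
-- Trimmed fact table: only the item key and two measures are kept here.
CREATE TABLE sales_fact (
    item_key INTEGER REFERENCES item (item_key),
    units_sold INTEGER,
    dollars_sold REAL
);
"""

# Fact -> item -> supplier: the extra hop introduced by normalizing item.
SALES_BY_SUPPLIER_TYPE = """
SELECT s.supplier_type, SUM(f.dollars_sold)
FROM sales_fact AS f
JOIN item     AS i ON f.item_key     = i.item_key
JOIN supplier AS s ON i.supplier_key = s.supplier_key
GROUP BY s.supplier_type
ORDER BY s.supplier_type
"""


def build_snowflake(conn: sqlite3.Connection) -> None:
    conn.executescript(SNOWFLAKE_DDL)
    # supplier_type is stored once per supplier instead of once per item.
    conn.executemany("INSERT INTO supplier VALUES (?, ?)",
                     [(1, "wholesale"), (2, "local")])
    conn.executemany("INSERT INTO item VALUES (?, ?, ?, ?, ?)",
                     [(1, "Cola", "BrandX", "drink", 1),
                      (2, "Juice", "BrandY", "drink", 1),
                      (3, "Bread", "BrandZ", "bakery", 2)])
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)",
                     [(1, 2, 4.0), (2, 1, 3.0), (3, 5, 10.0)])
    conn.commit()


def sales_by_supplier_type(conn: sqlite3.Connection) -> list:
    return conn.execute(SALES_BY_SUPPLIER_TYPE).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_snowflake(conn)
    print(sales_by_supplier_type(conn))  # [('local', 10.0), ('wholesale', 7.0)]
```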

FACT CONSTELLATION SCHEMA A fact constellation has multiple fact tables. This schema is also known as a galaxy schema. The following diagram has two fact tables, namely sales and shipping.

EXAMPLE OF A FACT CONSTELLATION
Sales Fact Table: time_key, item_key, branch_key, location_key (foreign keys); units_sold, dollars_sold, avg_sales (measures)
Shipping Fact Table: time_key, item_key, shipper_key, from_location, to_location (foreign keys); dollars_cost, units_shipped (measures)
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city, province_or_state, country
shipper: shipper_key, shipper_name, location_key, shipper_type

The sales fact table is the same as in the star schema. The shipping fact table has five dimension keys, namely item_key, time_key, shipper_key, from_location, and to_location. It also contains two measures, namely dollars_cost and units_shipped. Dimension tables can also be shared between fact tables; for example, the time, item, and location dimension tables are shared between the sales and shipping fact tables.
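Because the two fact tables share the item dimension, they can be compared by aggregating each fact table separately to the shared key and then joining the results (often called drilling across). The sketch below uses trimmed-down versions of the sales and shipping tables with invented rows; aggregating first avoids the row multiplication that joining the raw fact rows directly would cause.

```python
import sqlite3

# Trimmed-down versions of the two fact tables; other keys and measures omitted.
CONSTELLATION_DDL = """
CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE sales_fact (
    time_key INTEGER, item_key INTEGER REFERENCES item (item_key),
    units_sold INTEGER
);
CREATE TABLE shipping_fact (
    time_key INTEGER, item_key INTEGER REFERENCES item (item_key),
    shipper_key INTEGER, units_shipped INTEGER
);
"""

# Aggregate each fact table to the shared item key first, then join.
DRILL_ACROSS = """
WITH sold AS (
    SELECT item_key, SUM(units_sold) AS units_sold
    FROM sales_fact GROUP BY item_key
),
shipped AS (
    SELECT item_key, SUM(units_shipped) AS units_shipped
    FROM shipping_fact GROUP BY item_key
)
SELECT i.item_name,
       COALESCE(sold.units_sold, 0),
       COALESCE(shipped.units_shipped, 0)
FROM item AS i
LEFT JOIN sold    ON sold.item_key    = i.item_key
LEFT JOIN shipped ON shipped.item_key = i.item_key
ORDER BY i.item_name
"""


def build_constellation(conn: sqlite3.Connection) -> None:
    conn.executescript(CONSTELLATION_DDL)
    conn.executemany("INSERT INTO item VALUES (?, ?)", [(1, "Cola"), (2, "Bread")])
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)",
                     [(1, 1, 5), (2, 1, 3), (1, 2, 4)])
    conn.executemany("INSERT INTO shipping_fact VALUES (?, ?, ?, ?)",
                     [(1, 1, 7, 10), (2, 1, 8, 2)])
    conn.commit()


def sold_vs_shipped(conn: sqlite3.Connection) -> list:
    """(item_name, units_sold, units_shipped) per item."""
    return conn.execute(DRILL_ACROSS).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_constellation(conn)
    print(sold_vs_shipped(conn))  # [('Bread', 4, 0), ('Cola', 8, 12)]
```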

DIMENSIONAL MODELING PROCESS The dimensional model is built on a star-like schema, with dimensions surrounding the fact table. To build the schema, the following design steps are used:
1. Choose the business process.
2. Declare the grain.
3. Identify the dimensions.
4. Identify the facts.

CHOOSE THE BUSINESS PROCESS The design builds on the actual business process that the data warehouse should cover. Therefore, the first step is to describe the business process the model is built on, for instance a sales situation in a retail store. The business process can be described in plain text, in basic Business Process Modeling Notation (BPMN), or with other design guides such as the Unified Modeling Language (UML). Example business processes include raw materials purchasing, orders, shipments, invoicing, inventory, and general ledger. It is important to remember that we are not referring to an organizational business department or function when we talk about business processes.

DECLARE THE GRAIN Declaring the grain means specifying exactly what an individual fact table row represents. The grain conveys the level of detail associated with the fact table measurements. It provides the answer to the question, “How do you describe a single row in the fact table?”

EXAMPLE GRAIN DECLARATIONS INCLUDE:
■ An individual line item on a customer's retail sales ticket, as measured by a scanner device
■ A line item on a bill received from a doctor
■ An individual boarding pass to get on a flight
■ Daily inventory levels for each product in a warehouse
■ A monthly snapshot for each bank account
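Once the grain is declared, every fact row should be unique at that grain. A small Python check, assuming the daily inventory grain from the list above (one row per product per warehouse per day); the column names date_key, product_key, warehouse_key, and quantity_on_hand are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence

# Hypothetical grain for a daily inventory snapshot: product x warehouse x day.
INVENTORY_GRAIN = ("date_key", "product_key", "warehouse_key")


def grain_violations(rows: Iterable[Mapping], grain: Sequence[str]) -> list:
    """Return the grain-key tuples that occur in more than one row."""
    counts = Counter(tuple(row[col] for col in grain) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


if __name__ == "__main__":
    snapshot = [
        {"date_key": 20240101, "product_key": 1, "warehouse_key": 7, "quantity_on_hand": 40},
        {"date_key": 20240101, "product_key": 2, "warehouse_key": 7, "quantity_on_hand": 12},
        # Same product, warehouse and day again: this breaks the declared grain.
        {"date_key": 20240101, "product_key": 1, "warehouse_key": 7, "quantity_on_hand": 38},
    ]
    print(grain_violations(snapshot, INVENTORY_GRAIN))  # [(20240101, 1, 7)]
```

A duplicate at the declared grain usually means either the grain was declared wrongly or the load process double-counted, which is why the grain is settled before the dimensions and facts.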

IDENTIFY THE DIMENSIONS The third step in the design process is to define the dimensions of the model. Dimensions are the foundation of the fact table: they provide the context in which the data for the fact table is collected. Dimensions are typically nouns such as date, store, or inventory, and their tables hold the descriptive attributes of those nouns. For example, the date dimension could contain attributes such as year, month, and weekday. Examples of common dimensions include date, product, customer, transaction type, and status.

IDENTIFY THE FACTS Identify the numeric facts that will populate each fact table row. Facts are determined by answering the question, "What are we measuring?" Business users are keenly interested in analyzing these business process performance measures. This step is closely tied to the business users of the system, since the facts are what they access in the data warehouse. Most facts are therefore numerical, additive figures such as quantity sold or sales amount.

TYPES OF FACTS
■ Additive: facts that can be summed up through all of the dimensions in the fact table.
■ Semi-additive: facts that can be summed up for some of the dimensions in the fact table, but not the others.
■ Non-additive: facts that cannot be summed up for any of the dimensions present in the fact table.

ADDITIVE FACTS Let us use examples to illustrate each of the three types of facts. The first example assumes that we are a retailer with a fact table containing the following columns: (Date Key; Store Key; Product Key; Sales_Amount). The purpose of this table is to record the sales amount for each product in each store on a daily basis; Sales_Amount is the fact. In this case Sales_Amount is an additive fact, because it can be summed along any of the three dimensions present in the fact table: date, store, and product. For example, the sum of Sales_Amount for all 7 days in a week represents the total sales amount for that week.
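Because Sales_Amount is additive, summing it along any one dimension and then adding up those subtotals always gives the same grand total. A minimal Python sketch with made-up rows in the (Date Key; Store Key; Product Key; Sales_Amount) layout from the example; the store and product codes are invented.

```python
from collections import defaultdict

# (date_key, store_key, product_key, sales_amount); values are invented.
SALES = [
    (1, "S1", "P1", 10.0),
    (1, "S2", "P1", 4.0),
    (2, "S1", "P2", 6.0),
    (3, "S2", "P2", 5.0),
]

# Position of each dimension key within a row.
DIMENSIONS = {"date": 0, "store": 1, "product": 2}


def total_by(dimension: str, rows=SALES) -> dict:
    """Sum sales_amount grouped by one dimension column."""
    idx = DIMENSIONS[dimension]
    totals = defaultdict(float)
    for row in rows:
        totals[row[idx]] += row[3]
    return dict(totals)


if __name__ == "__main__":
    grand_total = sum(row[3] for row in SALES)
    for dim in DIMENSIONS:
        subtotals = total_by(dim)
        # Additive fact: subtotals along any dimension add up to the same total.
        print(dim, subtotals, sum(subtotals.values()) == grand_total)
```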

SEMI-ADDITIVE FACTS Say we are a bank with the following fact table: (Date Key; Account Key; Current_Balance; Profit_Margin). The purpose of this table is to record the current balance for each account at the end of each day, as well as the profit margin for each account for each day. Current_Balance and Profit_Margin are the facts.

Current_Balance is a semi-additive fact: it makes sense to add it up across all accounts (what is the total current balance for all accounts in the bank?), but not through time (adding up all current balances for a given account for each day of the month does not give us any useful information). Profit_Margin is a non-additive fact, because it does not make sense to add it up at either the account level or the day level.
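The difference between summing across accounts and summing across time can be shown with a few invented daily balances. In this sketch, summing Current_Balance over accounts for one day is meaningful, while over time the balance is summarized as the period-end balance or the average daily balance; those two treatments are common conventions assumed here, not rules stated in the handout.

```python
from statistics import mean

# (date_key, account_key, current_balance): invented end-of-day balances.
BALANCES = [
    (1, "A", 100.0), (1, "B", 50.0),
    (2, "A", 120.0), (2, "B", 50.0),
    (3, "A", 80.0), (3, "B", 70.0),
]


def bank_total_on(day: int) -> float:
    """Valid use: add balances across all accounts for one day."""
    return sum(balance for d, _, balance in BALANCES if d == day)


def account_history(account: str) -> list:
    """Daily balances for one account, in date order."""
    return [balance for d, acct, balance in sorted(BALANCES) if acct == account]


def period_end_balance(account: str) -> float:
    """Across time, report the last balance of the period instead of a sum."""
    return account_history(account)[-1]


def average_daily_balance(account: str) -> float:
    """Another common treatment across time: the mean of the daily balances."""
    return mean(account_history(account))


if __name__ == "__main__":
    print(bank_total_on(3))            # 150.0, meaningful
    print(sum(account_history("A")))   # 300.0, not a real balance
    print(period_end_balance("A"))     # 80.0
    print(average_daily_balance("A"))  # 100.0
```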

FACTLESS FACT TABLES These are fact tables that have no facts at all; they may consist of nothing but keys. A typical example is a fact table that records events.

Imagine that you have a modern student tracking system that detects each student attendance event every day. The dimensions surrounding the student attendance event include:
■ Date: one record in this dimension for each day on the calendar
■ Student: one record in this dimension for each student
■ Course: one record in this dimension for each course taught each semester
■ Teacher: one record in this dimension for each teacher
■ Facility: one record in this dimension for each room, laboratory, or athletic field

FACTLESS FACT TABLE A factless fact table for recording student attendance on a daily basis at a college. The five dimension tables contain rich descriptions of dates, students, courses, teachers, and facilities. There are no additive, numeric facts.

The grain of the fact table above is the individual student attendance event. When the student walks through the door into the lecture, a record is generated. The fact table record, consisting of just the five keys, is a good representation of the student attendance event.

There is no obvious fact to record each time a student attends a lecture or suits up for physical education. Tangible facts such as the grade for the course don't belong in this fact table. This fact table represents the student attendance process, not the semester grading process or even the midterm exam process.

A lot of interesting questions can be asked of this dimensional schema, including:
■ Which classes were the most heavily attended?
■ Which classes were the most consistently attended?
■ Which teachers taught the most students?
■ Which teachers taught classes in facilities belonging to other departments?
■ Which facilities were the most lightly used?
■ What was the average total walking distance of a student in a given day?
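Because a factless fact table has no measures, a question such as "Which classes were the most heavily attended?" is answered by counting rows. A hedged sketch with Python's sqlite3: the attendance table mirrors the five keys described above, but the table name, column names, and rows are made up, and the dimension tables are left out for brevity.

```python
import sqlite3

# One row per attendance event: five keys, no numeric facts.
DDL = """
CREATE TABLE attendance_fact (
    date_key INTEGER, student_key INTEGER, course_key INTEGER,
    teacher_key INTEGER, facility_key INTEGER
);
"""

# Counting rows stands in for summing a measure.
MOST_ATTENDED = """
SELECT course_key, COUNT(*) AS attendance_events
FROM attendance_fact
GROUP BY course_key
ORDER BY attendance_events DESC, course_key
"""


def load_sample(conn: sqlite3.Connection) -> None:
    conn.executescript(DDL)
    conn.executemany(
        "INSERT INTO attendance_fact VALUES (?, ?, ?, ?, ?)",
        [
            (1, 101, 10, 5, 300), (1, 102, 10, 5, 300), (1, 103, 10, 5, 300),
            (1, 101, 20, 6, 301), (2, 102, 20, 6, 301),
        ],
    )
    conn.commit()


def most_attended_courses(conn: sqlite3.Connection) -> list:
    """(course_key, number of attendance events), busiest course first."""
    return conn.execute(MOST_ATTENDED).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_sample(conn)
    print(most_attended_courses(conn))  # [(10, 3), (20, 2)]
```

In practice the course_key would be joined to the course dimension to report course names instead of keys.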

GROUP WORK