Lecture 3 Themes in this session Basics of the multidimensional data model and star- join schemata The process of, and specific design issues in, multidimensional.

Slides:



Advertisements
Similar presentations
The Organisation As A System An information management framework The Performance Organiser Data Warehousing.
Advertisements

Dimensional Modeling.
CHAPTER OBJECTIVE: NORMALIZATION THE SNOWFLAKE SCHEMA.
BY LECTURER/ AISHA DAWOOD DW Lab # 2. LAB EXERCISE #1 Oracle Data Warehousing Goal: Develop an application to implement defining subject area, design.
C6 Databases.
Copyright © Starsoft Inc, Data Warehouse Architecture By Slavko Stemberger.
Dimensional Modeling Business Intelligence Solutions.
Dimensional Modeling CS 543 – Data Warehousing. CS Data Warehousing (Sp ) - Asim LUMS2 From Requirements to Data Models.
Data Warehouse IMS5024 – presented by Eder Tsang.
Dimensional Modeling – Part 2
Hachim Haddouti, adv. DBMS & DW CSC5301, Ch6 Chapter 6: The Big Dimensions Adv. DBMS & DW Hachim Haddouti.
Data Warehousing Design Transparencies
Data Warehousing - 3 ISYS 650. Snowflake Schema one or more dimension tables do not join directly to the fact table but must join through other dimension.
Business Driven Technology Unit 2 Exploring Business Intelligence Copyright © 2015 McGraw-Hill Education. All rights reserved. No reproduction or distribution.
CSE6011 Warehouse Models & Operators  Data Models  relations  stars & snowflakes  cubes  Operators  slice & dice  roll-up, drill down  pivoting.
Database Design Concepts Info 1408 Lecture 2 An Introduction to Data Storage.
Data Warehousing (Kimball, Ch.2-4) Dr. Vairam Arunachalam School of Accountancy, MU.
Principles of Dimensional Modeling
1 Basic concepts of On-Line Analytical processing DT211 /4.
DWH – Dimesional Modeling PDT Genči. 2 Outline Requirement gathering Fact and Dimension table Star schema Inside dimension table Inside fact table STAR.
SharePoint 2010 Business Intelligence Module 6: Analysis Services.
CHP - 9 File Structures. INTRODUCTION In some of the previous chapters, we have discussed representations of and operations on data structures. These.
Lecture 2 The Relational Model. Objectives Terminology of relational model. How tables are used to represent data. Connection between mathematical relations.
Sayed Ahmed Logical Design of a Data Warehouse.  Free Training and Educational Services  Training and Education in Bangla: Training and Education in.
CSC271 Database Systems Lecture # 30.
Completing the Model Common Problems in Database Design.
Dimensional model. What do we know so far about … FACTS? “What is the process measuring?” Fact types:  Numeric Additive Semi-additive Non-additive (avg,
Program Pelatihan Tenaga Infromasi dan Informatika Sistem Informasi Kesehatan Ari Cahyono.
Data Warehousing Concepts, by Dr. Khalil 1 Data Warehousing Design Dr. Awad Khalil Computer Science Department AUC.
Database Design Part of the design process is deciding how data will be stored in the system –Conventional files (sequential, indexed,..) –Databases (database.
DIMENSIONAL MODELLING. Overview Clearly understand how the requirements definition determines data design Introduce dimensional modeling and contrast.
Chapter 1 Adamson & Venerable Spring Dimensional Modeling Dimensional Model Basics Fact & Dimension Tables Star Schema Granularity Facts and Measures.
The Data Warehouse “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of “all” an organisation’s data in support.
1 Relational Databases and SQL. Learning Objectives Understand techniques to model complex accounting phenomena in an E-R diagram Develop E-R diagrams.
BI Terminologies.
McGraw-Hill/Irwin © 2008 The McGraw-Hill Companies, All Rights Reserved Chapter 7 Storing Organizational Information - Databases.
December 5, Repository Metadata: Tips and Tricks Peggy Rodriguez, Kathy Kimball.
Decision Support and Date Warehouse Jingyi Lu. Outline Decision Support System OLAP vs. OLTP What is Date Warehouse? Dimensional Modeling Extract, Transform,
Prepared By Prepared By : VINAY ALEXANDER ( विनय अलेक्सजेंड़र ) PGT(CS),KV JHAGRAKHAND.
ITGS Databases.
Ch3 Data Warehouse Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2009.
Designing a Data Warehousing System. Overview Business Analysis Process Data Warehousing System Modeling a Data Warehouse Choosing the Grain Establishing.
1 Data Warehousing Lecture-15 Issues of Dimensional Modeling Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics.
More Dimensional Modeling. Facts Types of Fact Design Transactional Periodic Snapshot –Predictable time period –Ex. Monthly, yearly, etc. Accumulating.
UNIT-II Principles of dimensional modeling
Department of Industrial Engineering Sharif University of Technology Session# 9.
1 Agenda – 04/02/2013 Discuss class schedule and deliverables. Discuss project. Design due on 04/18. Discuss data mart design. Use class exercise to design.
Data Warehousing (Kimball, Ch.5-12) Dr. Vairam Arunachalam School of Accountancy, MU.
June 08, 2011 How to design a DATA WAREHOUSE Linh Nguyen (Elly)
McGraw-Hill/Irwin © 2008 The McGraw-Hill Companies, All Rights Reserved Chapter 7 Storing Organizational Information - Databases.
Description and exemplification use of a Data Dictionary. A data dictionary is a catalogue of all data items in a system. The data dictionary stores details.
Data Warehousing DSCI 4103 Dr. Mennecke Chapter 2.
Database Planning Database Design Normalization.
MS Access. Most A2 projects use MS Access Has sufficient depth to support a significant project. Relational Databases. Fairly easy to develop a good user.
Building the Corporate Data Warehouse Pindaro Demertzoglou Data Resource Management.
Copyright © 2016 Pearson Education, Inc. Modern Database Management 12 th Edition Jeff Hoffer, Ramesh Venkataraman, Heikki Topi CHAPTER 9: DATA WAREHOUSING.
Building the Corporate Data Warehouse Pindaro Demertzoglou Lally School of Management Data Resource Management.
CHP - 9 File Structures.
Chapter 13 Business Intelligence and Data Warehouses
Data warehouse and OLAP
Chapter 13 The Data Warehouse
Star Schema.
Overview and Fundamentals
Competing on Analytics II
Relational Database Model
Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2009
Dimensional Modeling.
Dimensional Model January 16, 2003
DWH – Dimesional Modeling
Chapter 17 Designing Databases
Presentation transcript:

Lecture 3 Themes in this session Basics of the multidimensional data model and star- join schemata The process of, and specific design issues in, multidimensional data modelling

Why use Multidimensional modelling Simple, intuitive design basis –easy to create a multidimensional model –easy to communicate the meaning of the model –easy to gain an overview of the model A logical model which can be implemented in a variety of databases –relational databases –multidimensional databases –object-oriented databases Supports the reporting and analytical needs of business users Tried and tested

General structure for a multidimensional model A central fact table, referred to as a multidimensional data subject Surrounding dimension tables, referred to as single dimensional data subjects Joins connecting the fact table and its surrounding dimension tables. Only one join per dimension table A concatenated or multipart key in the fact table which is comprised of one key from each dimension table. NOTE: the multidimensional model represents a n- dimensional matrix, with n being the number of dimensions

The fact table Measures or facts –reflect focal events or snapshots of states of being –vary continuously over time All the facts have a specific granularity Facts should ideally be additive but this is not always the case A set of foreign keys constituting a concatenated key Contains the major volume of data

The dimension tables Dimensions are often referred to as causal dimensions, they contain the causal factors responsible for the collected measures The time dimension is not a causal dimension, it is however one of the most important dimensions for structuring and analysing data The dimension tables contain dimensional attributes, these are usually textual and discrete and must have a relevant business meaning Good dimensional attributes are stable across time If the attributes are connected in one or more hierarchies then these are usually captured in the dimension table

Skinny fact tables As the fact table contains the vast volume of records it is important that it is memory space efficient. Foreign keys are usually represented in integer from and do not require much memory space Facts too are often numeric properties and can usually be represented as integers (contrast to dimensional attributes which are usually long text strings) This space efficiency is critical to the memory space consumption of the data warehouse

Aggregation Lowest level of aggregation is determined by the granularity of the fact table. Aggregations can be created on-the-fly or by the process of pre-aggregation Pre-aggregation demands more storage space but provides better query performance Aggregation is easier when facts are all additive

Sparsity The matrices, represented by multidimensional models are often 99% sparse. Sparsity is dealt with by simply not creating records for the cells that are not filled in the matrix. If nothing has happened no record is created. Pre-aggregation and storage of aggregates can however lead to sparsity failure which places large demands on data storage

De-normalisation In 3rd normal from all mutually independent and fully dependent on the primary key The fact table is by nature highly normalised in a star-join schema Dimensions are however usually not normalised

Snowflakes and normalised dimension tables Any attempts to normalise dimension tables in order to save disk space are a waste of time Affects the intuitive understandability of the diagram Normalised dimension tables destroy the ability to browse Normalised tables demand extra joins and the querying of snowflakes take s longer than the querying of standard star-join schemata

The process of multidimensional data modelling

What to focus on in MDM Query optimised database Whole business entities Key business activities and influences Transaction history People, places and things Time Dimension and rollup

Basic steps in modelling Select a business subject area Identify which business process(es) is being modelled Identify the basic measures or facts Determine at what level of detail (granularity) active analysis is conducted Determine what the measures have in common (identify the dimensions) Identify the relevant attributes in the various dimensions Determine if the attributes are stable or variable over time and if their cardinality is bounded or unbounded

Identifying facts The purpose of the analyst must be supported by the facts in the fact table, there must be measures which have relevance to the business goals which the organisation seeks to fulfil Facts are by nature dynamic and variable over time They do not have a limited cardinality Facts have their origins in the working of the organisation and the activities it performs

Identifying dimensional attributes Dimensional attributes are those predictive variables business users believe are of significance to the measures in the fact table Dimensional attributes are often present in hierarchies in the causal dimensions Dimensional attributes usually have a limited cardinality and are non-variable across time.

Supporting multiple hierarchies in dimensions Dimensions should be able to support multiple independent hierarchies Alternate hierarchies are easily supported Cyclic paths in two hierarchies demand that the hierarchies be split into two entirely separate hierarchies or even two separate dimensions A dimension can also contain attributes that do not have any hierarchical relationships to the other attributes in the dimension NOTE: Any of the attributes, whether in a hierarchy or not, can be used in the drill down process

Factless fact tables Some fact tables quite simply have no measured facts! Often used to represent many-to-many relationships The only thing they contain is a concatenated key, they do still however represent a focal event which is identified by the combination of conditions referenced in the dimension tables There are two main types of factless fact tables: –event tracking tables –coverage tables

Dealing with many-to-many relationships among dimensional attributes Many to many relationships are difficult to deal with in a any database design situation. Great efforts should be taken to identify any in the data model When creating a MDM it is necessary to separate the two entities and capture their relationship in a factless fact table

Dealing with semi-additive and non- additive facts Semi-additive facts are those which are not additive across all dimensions –warn users –prohibit the addition of these facts across the relevant dimensions Non-additive facts are not additive across any dimensions –most ratios and all measures depicting snapshots of a state fall into this class –in some cases other calculatory methods can be used to aggregate these measures average over the number of time periods calculate the ratio of the sums not the sum of the ratios

Degenerate dimensions A dimension which has been cannibalised by other dimensions Represented as a dimension key without a corresponding dimension The key to this dimension still serves as a vital element for the grouping of facts This often happens when the grain in the fact tables represents actual working documents

Special mention - handling unit prices and costs Could usually be considered to be dimension attributes of a product However, every price change would then result in a change in one of the dimensions (remember these are supposed to be stable) Instead they are included as non-additive facts in the fact table. These are often used in a lot of derived measures which are presented to the business user in a specially created view This is once again a snapshot of a state, the state being the cost or price of the goods at a specific point in time

Depicting processes and value chains as families of star-join schemas Kimball speaks of their being two sides to the value chain –the demand side - the steps needed to satisfy the customers demand for the product –the supply side - the steps needed to manufacture the products from original ingredients or parts The chain consists of a sequence of inventory and flow star-join schemata joining the different star-join schemata is only possible when two sequential schemata have a common, identical dimension Sometimes the represented chain can be extended beyond the bounds of the business itself

A family of stars

Creating mini-dimensions for really large dimensions Many dimension attributes are used very frequently as browsing constraints, in big dimensions these constraints can be hard to find among the lesser used ones Logical groups of often used constraints can be separated into small dimensions which are very well indexed and easily accessible for browsing All variables in these mini-dimensions must be presented as distinct bands or classes The key to the mini-dimension can be places as a foreign key in both the fact table and dimension table from which it has been broken off Mini-dimensions, as their name suggests, should be kept small and compact

Slowly changing dimensions Most dimensions are not constant over time Most dimensions are however almost constant over time Almost constant dimensions are referred to as slowly- changing dimensions There are three main methods of handling slowly changing dimensions: –Overwriting –Creating additional dimensional records –Creating new current fields within the original dimension Note: One of the key functions of the data warehouse is to track events over extended periods of time. The validity of the data warehouse is thus dependent on how well changes in the its dimensions are tracked.

Slowly changing dimensions (1) The dimensional attribute record is overwritten with the new value No changes are needed elsewhere in the dimension record No keys are affected anywhere in the database Very easy to implement but the historical data is now inconsistent Two basic questions need to be asked before overwriting a dimension attribute: –How important is the value to the end-users analysis needs? –How important is the tracking of history?

Slowly changing dimensions (2) Introduce a new record for the same dimensional entity in order to reflect its changed state A new instance of the dimensional key is created which references the new record In order to is best dealt with by using version digits at the end of the key. These allow up to 100 snapshots of a changing dimensional entity All these keys need to be created, maintained and managed by someone and tracked in the metadata The database maintains its consistency and the versions can be said to partition history

Slowly changing dimensions (3) Use slightly different design of dimension table which has fields for: –original status of dimensional attribute –current status of dimensional attribute –an effective date of change field This allows the analyst to compare the as-is and as-was states against each other Only two states can be traced, the current and the original Some inconsistencies are created in the data as time is not properly partitioned

Special mention - Heterogeneous products Some products have many, many distinguishing attributes and many possible permutations (usually on the basis of some customised offer). This results in immense product dimensions and bad browsing performance In order to deal with this, fact tables with accompanying product dimensions can be created for each product type - these are known as custom fact tables Primary core facts on the products types are kept in a core fact table The core facts are copied in each of the customer fact tables

Pre-aggregated data in SJS How to deal with aggregates is one of the biggest issues in the design of a DW Choice of pre-aggregation/on-the fly aggregation has great relvance to data storage and query performance Two main strategies exist for the creation of aggregates: –the creation of new fact tables for aggregates –the creation of new level fields for aggregates