On Relationships Offering New Drill-across Possibilities

Presentation transcript:

On Relationships Offering New Drill-across Possibilities Alberto Abelló, José Samos and Fèlix Saltor Universitat Politècnica de Catalunya November 8th, 2002 DOLAP

Contents: Related Work; The Data Model; UML Relationships; YAM2 Example; Conclusions

Example of Multidimensional Schema (Related work)

Multi-star Schemas (Related work)
Kimball: shared Dimensions. Giovinazzo: Galaxy (sharing Dimensions). Pedersen and Jensen: Multidimensional Object Family (sharing subdimensions). Gopalkrishnan, Li, and Karlapalem: Multi-star Schemas (normalizing fact tables). Moody and Kortink: Constellation (hierarchically linked fact tables), Galaxy (sharing Dimensions), Star Cluster (sharing subDimensions).

Semantic Relationships (Related work)
Tryfona, Busborg, and Christiansen: EER. Trujillo, Palomar, Gómez and Song: UML (Generalization and Association).

Dimension (The Data Model)
A Dimension is a connected, directed graph representing a point of view for analyzing data. Every vertex in the graph corresponds to a Level, and edges reflect part-whole relationships.
The first element is the analysis dimension (or Dimension for short). It is just a point of view we can use when analyzing data and, as we can see here, it contains an aggregation hierarchy that shows how we can obtain data at different aggregation levels. In this case, we can aggregate monthly data by trimesters or by four-month periods and, in either case, aggregate those to obtain years. Finally, we can also group all years to obtain one instance. Note that these aggregation hierarchies are defined by means of part-whole relationships ...
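The graph definition above can be sketched in a few lines of Python. This is only an illustration, not part of YAM2: the Level names (`FourMonthPeriod`, `All`) and the helper functions are assumptions drawn from the speaker notes, and edges are taken to point from a part Level to the whole Level it aggregates into.

```python
# Hypothetical encoding of the Time Dimension from the speaker notes.
# Keys are Levels; values are the Levels they aggregate into (part -> whole).
# Level names are assumptions; the slide image is not available.
TIME = {
    "Month": ["Trimester", "FourMonthPeriod"],
    "Trimester": ["Year"],
    "FourMonthPeriod": ["Year"],
    "Year": ["All"],
    "All": [],
}


def rolls_up_to(dimension, source, target):
    """Return True if `target` is reachable from `source` via part-whole edges."""
    stack, seen = [source], set()
    while stack:
        level = stack.pop()
        if level == target:
            return True
        if level not in seen:
            seen.add(level)
            stack.extend(dimension[level])
    return False


def aggregation_paths(dimension, source, target):
    """List every path of Levels leading from `source` up to `target`.

    Assumes the Dimension graph is acyclic, as an aggregation hierarchy is.
    """
    if source == target:
        return [[source]]
    return [
        [source] + rest
        for parent in dimension[source]
        for rest in aggregation_paths(dimension, parent, target)
    ]
```

With this encoding, `aggregation_paths(TIME, "Month", "Year")` returns the two alternative hierarchies mentioned in the notes (through trimesters or through four-month periods), while `rolls_up_to(TIME, "Trimester", "FourMonthPeriod")` is false because neither Level is part of the other.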

Fact (The Data Model)
A Fact is a connected, directed graph representing a subject of analysis. Every vertex in the graph corresponds to a Cell, and edges reflect part-whole relationships.
Having defined what a Dimension is, let’s now define what a Fact is. Notice again the capital “F”. A Fact (with capital “F”) represents a set of facts (with small “f”), all of the same kind. Facts of the same kind can correspond to different aggregation levels, so that we can group them to give rise to more complex facts. How facts (with small “f”) can be grouped is shown in a Fact by means of a graph ...

Cells in a Fact (The Data Model)
... where each node represents a Cell (with capital “C”), and arcs are part-whole relationships. The point here is that we do not define this graph directly; it is defined by the Dimensions we use in the analysis. For example, in this case we use the “Time” and “Geographic” Dimensions, with 5 and 3 Levels respectively. Therefore, we have 15 possible classes of cells in the Fact, each corresponding to a combination of aggregation levels in the Dimensions. For the “Month” and “City” aggregation levels, we have the corresponding class of cells in the Fact. And for the pair “Year” and “Region” we also have the corresponding class, which can be obtained by successively aggregating the atomic cells we have at the bottom (Cities into Regions, Months into Trimesters, and these into Years) or by means of any other path in the graph. Note that if the graphs of the Dimensions are lattices, the graph of the Fact will be a lattice as well. Having defined what Dimensions and Facts are, let’s now see what a Cube is.
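The count of 15 Cell classes is simply the product of the Level counts, 5 × 3. A minimal sketch, assuming hypothetical Level names (the talk gives only the counts and mentions Month, Trimester, Year, City and Region), shows how the Fact graph follows from the Dimension graphs:

```python
from itertools import product

# Direct part-whole edges inside each Dimension (part -> wholes).
# Names such as "FourMonthPeriod" and "All" are assumptions.
TIME_EDGES = {
    "Month": ["Trimester", "FourMonthPeriod"],
    "Trimester": ["Year"],
    "FourMonthPeriod": ["Year"],
    "Year": ["All"],
    "All": [],
}
GEO_EDGES = {"City": ["Region"], "Region": ["All"], "All": []}

# One class of Cells per combination of a Time Level and a Geographic Level.
cells = list(product(TIME_EDGES, GEO_EDGES))


def cell_parents(cell):
    """Cells obtained by rolling up exactly one Dimension by one step."""
    time_level, geo_level = cell
    return [(whole, geo_level) for whole in TIME_EDGES[time_level]] + [
        (time_level, whole) for whole in GEO_EDGES[geo_level]
    ]
```

Here `len(cells)` is 15, and following `cell_parents` repeatedly from `("Month", "City")` reaches `("Year", "Region")` along several different paths, which is the "any other path in the graph" remark from the notes.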

Main Model Elements (The Data Model)
Now that we have seen the meaning of every multidimensional element, let’s go through the metaclasses of YAM2. A multidimensional schema is composed of Stars, each of which contains one Fact and several Dimensions. These, in turn, can be successively decomposed to show more detail. Firstly, Dimensions contain Levels and Facts contain Cells. By means of LevelRelations and CellRelations, we form the graphs of Dimensions and Facts. One association end of a LevelRelation is the Whole, and the other is the Part. Since a Cell is defined at a given granularity, we find Part-Whole associations between Cells that correspond to the relationships between Levels. We can also observe that a Cube is just a relationship between a set of Levels (which we call the Base) and a Cell. Moreover, we can distinguish between Cells that can be calculated from others (and are just summarized) and those that cannot. Finally, at the most detailed level, we can see the Descriptors and Measures. Those Measures that cannot be calculated from others must belong to FundamentalCells.
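The containment structure described in these notes can be approximated with plain data classes. This is a rough sketch rather than the YAM2 metamodel itself: the field names, the `fundamental` flag and the `amount` Measure are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Level:
    name: str


@dataclass
class Dimension:
    name: str
    levels: List[Level]
    # LevelRelations as (part, whole) pairs forming the aggregation graph.
    level_relations: List[Tuple[Level, Level]] = field(default_factory=list)


@dataclass
class Cell:
    name: str
    measures: List[str] = field(default_factory=list)
    fundamental: bool = True  # False for Cells calculated from other Cells


@dataclass
class Fact:
    name: str
    cells: List[Cell]


@dataclass
class Cube:
    base: Tuple[Level, ...]  # the set of Levels the talk calls the Base
    cell: Cell


@dataclass
class Star:
    fact: Fact                   # one Fact
    dimensions: List[Dimension]  # several Dimensions


month, year, city = Level("Month"), Level("Year"), Level("City")
time_dim = Dimension("Time", [month, year], [(month, year)])
geo_dim = Dimension("Geographic", [city])
atomic_sale = Cell("AtomicSale", measures=["amount"])  # "amount" is hypothetical
star = Star(Fact("ProductSale", [atomic_sale]), [time_dim, geo_dim])
sales_cube = Cube(base=(month, city), cell=atomic_sale)
```

A MultidimensionalSchema would then be a collection of such `Star` objects.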

A specialization of UML (The Data Model)
Now that we have seen the elements of YAM2, we should also notice that none of them is really new. They are just specializations, for multidimensional modeling, of generic UML elements. Firstly, Measures and Descriptors are just Attributes of Levels and Cells, which are Classes. By means of CellRelations and LevelRelations, which are Associations, we form the aggregation graphs that can be contained by Classifiers, that is, Facts and Dimensions. One Fact and several Dimensions form a Star, which is a Package, and several Stars form a MultidimensionalSchema, which is what UML calls a Model.

Relationships (The Data Model)
Therefore, if multidimensional modeling elements are just special cases of more general UML elements, we can use UML relationships to relate them. Firstly, since Dimension, Fact, Level and Cell are GeneralizableElements, we can relate them by means of Generalization. Since Dimension, Fact, Level and Cell are Classifiers, we can also relate them by means of any kind of Association. And finally, since all six elements (these four plus Descriptor and Measure) are ModelElements, they can be related by Derivation as well as Flow.
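The rule stated here (which relationship kinds apply to which elements) fits in a small lookup table. The sketch below only restates the notes; the names `ALLOWED` and `relationships_for` are made up for illustration.

```python
# UML metaclass roles of the YAM2 elements, as stated in the talk.
GENERALIZABLE_ELEMENTS = {"Dimension", "Fact", "Level", "Cell"}
CLASSIFIERS = {"Dimension", "Fact", "Level", "Cell"}
MODEL_ELEMENTS = GENERALIZABLE_ELEMENTS | {"Descriptor", "Measure"}

# Relationship kind -> element kinds it may relate.
ALLOWED = {
    "Generalization": GENERALIZABLE_ELEMENTS,
    "Association": CLASSIFIERS,
    "Derivation": MODEL_ELEMENTS,
    "Flow": MODEL_ELEMENTS,
}


def relationships_for(element_kind):
    """Return the relationship kinds that can relate elements of this kind."""
    return sorted(name for name, kinds in ALLOWED.items() if element_kind in kinds)
```

For example, `relationships_for("Measure")` yields only Derivation and Flow, whereas `relationships_for("Level")` yields all four kinds.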

Operations (The Data Model)
Regarding the operations we can perform with cubes, we can also think of them at three levels. At the upper level, we can change the dimensions of the space (by means of ChangeBase) or the subject of analysis (by means of Drill-across). At the intermediate level, we can change the granularity of data by rolling them up. And at the lower, most detailed level, we have operations similar to those of the relational algebra: Projection, which selects the measures to query, and Dice, which corresponds to a selection of points in the space.
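To make the lower and intermediate levels concrete, here is a toy sketch of Projection, Dice and Roll-up over a cube stored as a dictionary. It is not the YAM2 algebra: the cube contents are made-up illustration data, the function names are hypothetical, and the roll-up assumes the Measures are additive so that summing is a valid aggregation.

```python
from collections import defaultdict

# Toy cube at (Month, City) granularity. All values are made-up examples.
cube = {
    ("2002-01", "Barcelona"): {"units": 10, "revenue": 100.0},
    ("2002-02", "Barcelona"): {"units": 5, "revenue": 60.0},
    ("2002-01", "Granada"): {"units": 7, "revenue": 70.0},
}


def projection(cube, measures):
    """Keep only the selected Measures in every cell."""
    return {point: {m: cell[m] for m in measures} for point, cell in cube.items()}


def dice(cube, predicate):
    """Keep only the points of the space that satisfy the predicate."""
    return {point: cell for point, cell in cube.items() if predicate(point)}


def roll_up_month_to_year(cube):
    """Aggregate Month-level cells to Year level by summing each Measure."""
    totals = defaultdict(lambda: defaultdict(float))
    for (month, city), cell in cube.items():
        year = month[:4]  # "2002-01" -> "2002"
        for name, value in cell.items():
            totals[(year, city)][name] += value
    return {point: dict(cell) for point, cell in totals.items()}
```

Rolling the toy cube up to years merges the two Barcelona cells into one, while Dice and Projection leave the granularity untouched and only restrict the points or the Measures.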

Multi-star schema (The Data Model)
What are all those relationships we saw useful for? Those relationships (specialization, association, flow, and so on) relate different Stars. Therefore, they can be used to drill across. The Stars are semantically related. If we zoom in, we see that there are relationships between Dimensions and Facts, and between Levels and Cells. So we do not have isolated Stars, but a complex net of semantic relationships that analysts can use in their work. Let’s see some examples of multidimensional schemas and how these relationships can appear between the elements.

Derivation (Dimension-Dimension)

Generalization (Dimension-Dimension)

Association (I) (Dimension-Dimension)

Association (II) (Dimension-Dimension)

Flow (Dimension-Dimension)

Derivation (Fact-Fact)

Association (Fact-Fact)

Generalization (Fact-Fact)

Flow (Fact-Fact)

Derivation/Association (Fact-Dimension)

Upper detail level (YAM2 example)
Firstly, remember that at the upper detail level we had Dimensions and Facts. Here we can see that both can be specialized. As specializations of the People Dimension we have the Clerk and Customer Dimensions, and as a specialization of ProductSale we have CreditSale. We can also associate them. We can associate two Dimensions, like Clerk and Store; two Facts, like ProductSale and Production; or even a Dimension and a Fact, like Promotion and Product. Stronger kinds of Association are also possible. For example, one Dimension could compose another, like People and Clubs, or a Fact could compose another, like ProductSale and Deal. From Dimensions we can also derive other Dimensions, and from Facts we can derive either other Facts or Dimensions, such as deriving the Promotion Dimension from the Fact of the same name. However, we cannot derive a Fact from a Dimension, because Dimensions represent given data, while Facts represent measurements. Finally, we can also show how schemas evolve by relating Dimensions and Facts by means of Flow relationships, like the old and new versions of Store and ProductSale.
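The asymmetry in the Derivation rule (Facts can yield Dimensions, but not the other way round) can be written as a simple check. This is a sketch of the rule as stated in the notes; the function name and the pair encoding are assumptions.

```python
# (source kind, target kind) pairs for which Derivation is allowed
# at the upper detail level, according to the speaker notes.
ALLOWED_DERIVATIONS = {
    ("Dimension", "Dimension"),
    ("Fact", "Fact"),
    ("Fact", "Dimension"),
}


def can_derive(source_kind, target_kind):
    """Return True if a `target_kind` element may be derived from a `source_kind` one."""
    return (source_kind, target_kind) in ALLOWED_DERIVATIONS
```

So deriving the Promotion Dimension from the Promotion Fact passes the check (`can_derive("Fact", "Dimension")`), while `can_derive("Dimension", "Fact")` is rejected because Dimensions hold given data rather than measurements.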

Intermediate detail level (YAM2 example)
At the intermediate detail level, we can see Levels and Cells. Firstly, we can find them related by means of Generalization relationships. An AtomicSaleInSouthRegion is just a specialization of an AtomicSale. Associations can also be found between Cells, like that between AtomicProduct and AtomicSale; between two Levels, like that between Clerk and Store; or between a Cell and a Level, which are the ones that define the Star. Moreover, we could also find reflexive Associations around either Levels or Cells. Aggregations become really important here, because they are used to define aggregation hierarchies inside Dimensions. However, they can also be used to show other relationships, like that between AtomicSale and AtomicDeal. Here we also have an example of Flow, which relates the old and new versions of the defined commercial Regions. Of course, Derivations could be used to customize the schema to the needs of any user, changing names, hiding levels, and so on. For the sake of simplicity, they are not depicted here.

Summary (Conclusions)

Conclusions
Let multidimensional modeling benefit from O-O concepts. Relate data to facilitate the view of the whole picture. Use semantics to find properties.
In this sense, it is important to show semantically rich relationships, so that analysts have as much information as possible. Moreover, these relationships are useful not only for analysts, but also for designers. We can use those semantics to find properties of the schemas.

Questions
Thank you very much for your attention. If you have any questions ...

Generalization/Specialization of Facts (Consequences of relations)