On Relationships Offering New Drill-across Possibilities


1 On Relationships Offering New Drill-across Possibilities
Alberto Abelló, José Samos and Fèlix Saltor Universitat Politècnica de Catalunya November 8th, 2002 DOLAP

2 Contents
Related Work
The Data Model
UML Relationships
YAM2 Example
Conclusions

3 Example of Multidimensional Schema
Related work

4 Multi-star Schemas
Related work
Kimball: Share Dimensions
Giovinazzo: Galaxy sharing Dimensions
Pedersen and Jensen: Multidimensional Object Family sharing subdimensions
Gopalkrishnan, Li, and Karlapalem: Multi-star Schemas normalizing fact tables
Moody and Kortink: Constellation (hierarchically linked fact tables), Galaxy (share Dimensions), Star Cluster (sharing subDimensions)

5 Semantic Relationships
Related work
Tryfona, Busborg, and Christiansen: EER
Trujillo, Palomar, Gómez and Song: UML (Generalization and Association)

6 Dimension
The Data Model
A Dimension is a connected, directed graph representing a point of view for analyzing data. Every vertex in the graph corresponds to a Level, and edges reflect part-whole relationships.
The first element is the analysis dimension (or Dimension for short). It is just a point of view we can use for analyzing data and, as we can see here, it contains an aggregation hierarchy that shows how we can obtain data at different aggregation levels. In this case, we can aggregate monthly data by trimesters or four-month periods, and in either case we can aggregate them to obtain years. Finally, we can also group all years to obtain one instance. Note that these aggregation hierarchies are defined by means of part-whole relationships ...
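As an illustrative sketch (not part of the original slides), the Time Dimension described above can be written down as a directed graph in Python. The Level names, the dictionary layout and the `rollup_paths` helper are assumptions made for this example; only the aggregation paths themselves come from the description.

```python
# Hypothetical encoding of the Time Dimension: each Level maps to the
# Levels it can be aggregated into (the "whole" end of a part-whole edge).
TIME = {
    "Month": ["Trimester", "FourMonthPeriod"],
    "Trimester": ["Year"],
    "FourMonthPeriod": ["Year"],
    "Year": ["All"],
    "All": [],
}


def rollup_paths(dimension, start, end):
    """Return every aggregation path from `start` to `end` as a list of Level names."""
    if start == end:
        return [[end]]
    paths = []
    for parent in dimension[start]:
        for tail in rollup_paths(dimension, parent, end):
            paths.append([start] + tail)
    return paths


# Months reach the single top instance either through Trimesters or
# through four-month periods.
PATHS = rollup_paths(TIME, "Month", "All")
```

The two alternative paths returned for Month correspond to the two ways of aggregating monthly data mentioned in the notes.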

7 Fact
The Data Model
A Fact is a connected, directed graph representing a subject of analysis. Every vertex in the graph corresponds to a Cell, and edges reflect part-whole relationships.
Having defined what a Dimension is, let’s now define what a Fact is. Notice again the capital “F”. A Fact (with capital “F”) represents a set of facts (with small “f”), all of the same kind. Facts of the same kind can correspond to different aggregation levels, so that we can group them to give rise to more complex facts. How facts (with small “f”) can be grouped is shown in a Fact by means of a graph ...

8 Cells in a Fact
The Data Model
... where each node represents a Cell (with capital “C”), and arcs are part-whole relationships. The point here is that we do not define this graph directly; it is defined by the Dimensions we use in the analysis. For example, in this case we use the “Time” and “Geographic” Dimensions, with 5 and 3 Levels respectively. Therefore, we have 15 possible classes of cells in the Fact. Each one corresponds to a combination of aggregation levels in the Dimensions. For the “Month” and “City” aggregation levels, we have the corresponding class of cells in the Fact. And for the pair “Year” and “Region” we also have the corresponding class, which can be obtained by successively aggregating the atomic cells we have at the bottom (Cities into Regions, Months into Trimesters, and these into Years) or by means of any other path in the graph. Note that if the graphs of the Dimensions are lattices, the graph of the Fact will be a lattice as well. Having defined what Dimensions and Facts are, let’s now see what a Cube is.
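The count of 15 classes of cells is simply the product of the numbers of Levels, which a short Python sketch can make concrete. This is an illustration under assumptions: the Level names (including a top "All" Level in both Dimensions) and the helper functions are hypothetical, not taken from the slides.

```python
from itertools import product

# Hypothetical Level graphs: each Level maps to the Levels it rolls up to.
TIME = {
    "Month": ["Trimester", "FourMonthPeriod"],
    "Trimester": ["Year"],
    "FourMonthPeriod": ["Year"],
    "Year": ["All"],
    "All": [],
}
GEO = {"City": ["Region"], "Region": ["All"], "All": []}


def fact_graph(dim_a, dim_b):
    """Build the Cell graph: one Cell per pair of Levels, one edge per single roll-up step."""
    cells = {cell: [] for cell in product(dim_a, dim_b)}
    for (a, b), targets in cells.items():
        targets.extend((parent, b) for parent in dim_a[a])
        targets.extend((a, parent) for parent in dim_b[b])
    return cells


def reachable(graph, start, goal):
    """Depth-first search: can `goal` be obtained from `start` by successive roll-ups?"""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return False


CELLS = fact_graph(TIME, GEO)  # 5 Levels x 3 Levels
```

With these definitions, `reachable(CELLS, ("Month", "City"), ("Year", "Region"))` holds, while the reverse direction does not, since roll-up edges only point towards coarser granularities.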

9 Main Model Elements
The Data Model
We have seen the meaning of every multidimensional element; let’s go through the metaclasses of YAM2. We can see that a multidimensional schema is composed of Stars, each of which contains one Fact and several Dimensions. These, in turn, can be successively decomposed to show more detail. Firstly, we can see that Dimensions contain Levels and Facts contain Cells. By means of LevelRelations and CellRelations, we form the graphs of Dimensions and Facts. One association end of a LevelRelation is the Whole, and the other is the Part. Since a Cell is defined at a given granularity, corresponding to the relationships between Levels we find Part-Whole associations between Cells. We can also observe that a Cube is just a relationship between a set of Levels (which we call the Base) and a Cell. Moreover, we can distinguish between Cells that can be calculated from others and those that cannot, and are just summarized. Finally, at the most detailed level, we can see the Descriptors and Measures. Those Measures that cannot be calculated from others must belong to FundamentalCells.
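The containment structure just described can be sketched with Python dataclasses. This is a simplified reading, not the YAM2 metamodel itself: the field names, the `fundamental` and `derived` flags and the `misplaced_measures` check are assumptions introduced for illustration, and LevelRelations, CellRelations and Cubes are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Measure:
    name: str
    derived: bool = False  # True if it can be calculated from other Measures


@dataclass
class Cell:
    name: str
    measures: list = field(default_factory=list)
    fundamental: bool = False  # stands in for the FundamentalCell metaclass


@dataclass
class Level:
    name: str
    descriptors: list = field(default_factory=list)


@dataclass
class Dimension:
    name: str
    levels: list = field(default_factory=list)


@dataclass
class Fact:
    name: str
    cells: list = field(default_factory=list)


@dataclass
class Star:
    fact: Fact  # exactly one Fact per Star
    dimensions: list = field(default_factory=list)


def misplaced_measures(star):
    """Names of non-derived Measures held by Cells that are not fundamental."""
    return [
        measure.name
        for cell in star.fact.cells
        if not cell.fundamental
        for measure in cell.measures
        if not measure.derived
    ]


# Hypothetical Star: "discount" is not derived, yet sits in a non-fundamental Cell.
atomic = Cell("AtomicSale", [Measure("amount")], fundamental=True)
yearly = Cell("YearlySale", [Measure("total", derived=True), Measure("discount")])
STAR = Star(Fact("ProductSale", [atomic, yearly]), [Dimension("Time", [Level("Month")])])
```

Here `misplaced_measures(STAR)` flags `discount`, mirroring the rule that Measures which cannot be calculated from others must belong to FundamentalCells.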

10 A specialization of UML
The Data Model
Now that we have seen the elements of YAM2, we should also notice that none of them is really new. They are just specializations, for multidimensional modeling, of generic UML elements. Firstly, Measures and Descriptors are just Attributes of Levels and Cells, which are Classes. By means of CellRelations and LevelRelations, which are Associations, we form the aggregation graphs that can be contained by Classifiers, that is, Facts and Dimensions. One Fact and several Dimensions form a Star, which is a Package, and several Stars form a MultidimensionalSchema, which is what UML calls a Model.

11 Relationships
The Data Model
(Diagram labels: Dimension, Fact, Level, Cell, Descriptor, Measure)
Therefore, if multidimensional modeling elements are just special cases of more general UML elements, we can use UML relationships to relate them. Firstly, since Dimension, Fact, Level and Cell are GeneralizableElements, we can relate them by means of Generalization. Since Dimension, Fact, Level and Cell are Classifiers, we can also relate them by means of any kind of Association. And finally, since all six elements are ModelElements, they can be related by Derivation as well as Flow.
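The reasoning on this slide amounts to a lookup: a relationship is available between two elements when both play the UML role that the relationship requires. A minimal Python sketch follows, where the table layout and the function name `allowed_relationships` are assumptions for illustration; it encodes only what the notes state and no further restrictions.

```python
# UML roles played by each multidimensional element, as stated in the notes.
ROLES = {
    "Dimension": {"GeneralizableElement", "Classifier", "ModelElement"},
    "Fact": {"GeneralizableElement", "Classifier", "ModelElement"},
    "Level": {"GeneralizableElement", "Classifier", "ModelElement"},
    "Cell": {"GeneralizableElement", "Classifier", "ModelElement"},
    "Descriptor": {"ModelElement"},
    "Measure": {"ModelElement"},
}

# Role that both ends must play for each kind of relationship.
REQUIRED_ROLE = {
    "Generalization": "GeneralizableElement",
    "Association": "Classifier",
    "Derivation": "ModelElement",
    "Flow": "ModelElement",
}


def allowed_relationships(first, second):
    """Relationship kinds whose required role is played by both elements."""
    return {
        kind
        for kind, role in REQUIRED_ROLE.items()
        if role in ROLES[first] and role in ROLES[second]
    }
```

For example, `allowed_relationships("Measure", "Cell")` yields only Derivation and Flow, because a Measure is neither a GeneralizableElement nor a Classifier in this table.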

12 Operations
The Data Model
Regarding the operations we can perform on cubes, we can also think of them at three levels. At the upper level, we can change the dimensions of the space (by means of ChangeBase) or the subject of analysis (by means of Drill-across). At the intermediate level, we can change the granularity of data by rolling them up. And at the lower, most detailed level, we have operations similar to those of the relational algebra: Projection, which selects the measures to query, and Dice, which corresponds to a selection of points in the space.
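To make the two lower levels concrete, here is a small Python sketch of Dice, Projection and a roll-up over a toy cube. Everything in it is an assumption made for illustration: the sample rows, the representation of a cube as a list of dictionaries, the function names, and the use of summation as the aggregation function.

```python
from collections import defaultdict

# Hypothetical atomic cells: coordinates (month, city) plus two measures.
SALES = [
    {"month": "2002-01", "city": "Barcelona", "amount": 10, "units": 1},
    {"month": "2002-02", "city": "Barcelona", "amount": 20, "units": 2},
    {"month": "2002-01", "city": "Granada", "amount": 5, "units": 1},
]


def dice(cube, predicate):
    """Select the points of the space that satisfy `predicate`."""
    return [row for row in cube if predicate(row)]


def projection(cube, coordinates, measures):
    """Keep the coordinates and only the requested measures."""
    return [{key: row[key] for key in coordinates + measures} for row in cube]


def roll_up_to_year(cube, measures):
    """Change granularity from Month to Year by summing each measure."""
    totals = defaultdict(lambda: dict.fromkeys(measures, 0))
    for row in cube:
        key = (row["month"][:4], row["city"])  # the year is the first 4 characters
        for measure in measures:
            totals[key][measure] += row[measure]
    return [{"year": year, "city": city, **values} for (year, city), values in totals.items()]


YEARLY = roll_up_to_year(SALES, ["amount", "units"])
```

ChangeBase and Drill-across are not sketched here, since they depend on relationships between Stars rather than on a single cube.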

13 Multi-star schema
The Data Model
What are all the relationships we saw useful for? Those relationships (specialization, association, flow, and so on) relate different Stars. Therefore, they can be used to drill across. The Stars are semantically related. If we zoom in, we see that there are relationships between Dimensions and Facts, and between Levels and Cells. So we do not have isolated Stars, but a complex net of semantic relationships that analysts can use in their work. Let’s see some examples of multidimensional schemas and how these relationships can appear between the elements.

14 Derivation Dimension-Dimension

15 Generalization Dimension-Dimension

16 Association (I) Dimension-Dimension

17 Association (II) Dimension-Dimension

18 Flow Dimension-Dimension

19 Derivation Fact-Fact

20 Association Fact-Fact

21 Generalization Fact-Fact

22 Flow Fact-Fact

23 Derivation/Association
Fact-Dimension

24 Upper detail level
YAM2 example
Firstly, remember that at the upper detail level we had Dimensions and Facts. Here we can see that both can be specialized. As specializations of the People Dimension we have the Clerk and Customer Dimensions, and as a specialization of ProductSale we have CreditSale. We can also associate them. We can associate two Dimensions, like Clerk and Store; two Facts, like ProductSale and Production; or even a Dimension and a Fact, like Promotion and Product. Stronger kinds of Association are also possible. For example, one Dimension could compose another, like People and Clubs, or a Fact could compose another, like ProductSale and Deal. From Dimensions we can also derive other Dimensions, and from Facts we can derive either other Facts or Dimensions, such as deriving the Promotion Dimension from the Fact of the same name. However, we cannot derive a Fact from a Dimension, because Dimensions represent given data, while Facts represent measurements. Finally, we can also show how schemas evolve, by relating Dimensions and Facts by means of Flow relationships, like the old and new versions of Store and ProductSale.
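The derivation rule stated in these notes is asymmetric, which a few lines of Python can capture. The set literal and the function name `can_derive` are assumptions introduced for this sketch; the allowed and forbidden combinations are the ones given in the notes.

```python
# (source kind, target kind) pairs for which Derivation is described as possible.
ALLOWED_DERIVATIONS = {
    ("Dimension", "Dimension"),
    ("Fact", "Fact"),
    ("Fact", "Dimension"),  # e.g. the Promotion Dimension derived from the Promotion Fact
}


def can_derive(source_kind, target_kind):
    """True if an element of `target_kind` may be derived from one of `source_kind`."""
    return (source_kind, target_kind) in ALLOWED_DERIVATIONS
```

The missing pair, `("Dimension", "Fact")`, reflects the argument that Dimensions represent given data while Facts represent measurements.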

25 Intermediate detail level
YAM2 example
At the intermediate detail level, we can see Levels and Cells. Firstly, we can find them related by means of Generalization relationships. An AtomicSaleInSouthRegion is just a specialization of an AtomicSale. Associations can also be found between Cells, like that between AtomicProduct and AtomicSale; between two Levels, like that between Clerk and Store; or between a Cell and a Level, which are the ones that define the Star. Moreover, we could also find reflexive Associations around either Levels or Cells. Aggregations become really important here, because they are used to define aggregation hierarchies inside Dimensions. However, they can also be used to show other relationships, like that between AtomicSale and AtomicDeal. Here we also have an example of Flow, which relates the old and new versions of the defined commercial Regions. Of course, derivations could be used to customize the schema to the needs of any user, changing names, hiding levels, and so on. But, for the sake of simplicity, they are not depicted here.

26 Summary
Conclusions

27 Conclusions
Benefit multidimensional modeling from O-O concepts
Relate data to facilitate the view of the whole picture
Use semantics to find properties
In this sense, it is important to show semantically rich relationships, so that analysts have as much information as possible. Moreover, these relationships are useful not only for analysts, but also for designers. We can use those semantics to find properties of the schemas.

28 Questions
Thank you very much for your attention. If you have any questions ...

29 Generalization/Specialization of Facts
Consequences of relations

30 Generalization/Specialization of Facts
Consequences of relations

