Modeling Issues for Data Warehouses
CMPT 455/826 - Week 7, Day 1 (based on Trujillo)
Sept-Dec 2009 – w7d1


2 This is a tough paper
This is the toughest paper that we’ve dealt with so far
It introduces
– a number of concepts that are very important
– in ways that are often difficult to follow
– with a combination of standard and homemade terms
So, for today
– rather than concentrate on critique items
– we need to concentrate on the concepts

3 Multidimensional modeling
Ties together the concepts of:
– a data warehouse
– a multidimensional database (MDB)
– online analytical processing (OLAP)
What are dimensions?
What are
– data warehouses?
– multidimensional databases (MDBs)?
– online analytical processing (OLAP)?

4 Multidimensional modeling
Structures information into
– facts
– dimensions
Facts are described by a set of attributes called measures or fact attributes, which
– can be atomic or derived
– are contained in cells or points within the data cube
We base this set of measures on a set of dimensions that derive from the granularity chosen for representing the facts. These dimensions thus provide the context for analyzing the facts.
Dimension attributes
– provide the specifics that characterize dimensions
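
The vocabulary above can be made concrete with a minimal Python sketch of one cell of a sales cube. The sketch assumes a granularity of (product, store, customer, day). The class names `CellKey` and `SalesMeasures`, the `product_attributes` dictionary, and all sample values are hypothetical illustrations, not notation from the paper. `amount` plays the role of a derived measure computed from two atomic ones.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CellKey:
    # One coordinate per dimension; this granularity is an assumption.
    product: str
    store: str
    customer: str
    day: str

@dataclass
class SalesMeasures:
    quantity: int      # atomic measure, recorded directly
    unit_price: float  # atomic measure, recorded directly

    @property
    def amount(self) -> float:
        # Derived measure: computed from the atomic measures, not stored.
        return self.quantity * self.unit_price

# Dimension attributes characterize the members of a dimension (here: products).
product_attributes = {
    "milk": {"category": "dairy"},
    "bread": {"category": "bakery"},
}

# The cube maps each point (one member per dimension) to the measures in that cell.
cube = {
    CellKey("milk", "store-1", "alice", "2009-10-19"): SalesMeasures(2, 1.50),
    CellKey("bread", "store-1", "alice", "2009-10-19"): SalesMeasures(1, 2.25),
}

cell = cube[CellKey("milk", "store-1", "alice", "2009-10-19")]
print(cell.amount)                              # 3.0
print(product_attributes["milk"]["category"])   # dairy
```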

5 Multidimensional modeling
Facts
– represent many-to-many relationships between all dimensions
– have a many-to-one relationship with every particular dimension
  e.g. a product sale is related to only one product, sold in one store to one customer at one time
– can represent many-to-many relationships between particular dimensions
  e.g. one sales slip can contain many products, and one product can be on many sales slips
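
A minimal Python sketch of the two kinds of relationship follows. It assumes that facts are stored as one row per product sold and that the sales slip ("ticket") is kept as a plain attribute of the fact. The row layout and sample data are hypothetical. Each row points at exactly one member of each dimension. The many-to-many link between tickets and products only appears when the rows are grouped.

```python
from collections import defaultdict

# Hypothetical fact rows: each row references exactly one member of each
# dimension (many-to-one from the fact to every dimension).
sales = [
    {"ticket": "T1", "product": "milk",  "store": "S1", "customer": "alice", "day": "d1"},
    {"ticket": "T1", "product": "bread", "store": "S1", "customer": "alice", "day": "d1"},
    {"ticket": "T2", "product": "milk",  "store": "S1", "customer": "bob",   "day": "d1"},
]

# The many-to-many relationship between tickets and products is not stored
# directly; it emerges from grouping the fact rows.
products_per_ticket = defaultdict(set)
tickets_per_product = defaultdict(set)
for row in sales:
    products_per_ticket[row["ticket"]].add(row["product"])
    tickets_per_product[row["product"]].add(row["ticket"])

print(sorted(products_per_ticket["T1"]))   # ['bread', 'milk']
print(sorted(tickets_per_product["milk"]))  # ['T1', 'T2']
```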

6 Multidimensional modeling
The additivity / summarizability concept
– A measure (fact attribute) is additive along a dimension if we can use the SUM operator to aggregate its values along all hierarchies defined on that dimension
The aggregation of some fact attributes
– called roll-up in OLAP terminology
– might not be semantically meaningful for all measures along all dimensions
e.g. the number of clients
– estimated by counting the number of purchase receipts for a given product, customer, day, and store
– is not additive along the product dimension: because the same receipt (ticket) can include other products, adding up the number of clients for two or more products would count some clients more than once and lead to inconsistent results
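
The number-of-clients example can be checked with a small Python sketch. It assumes fact rows at (receipt, product) granularity and, as on the slide, estimates clients by counting distinct receipts. The data and the helper `clients_for` are hypothetical. Rolling up with SUM counts receipts R1 and R3 twice, because each contains both products. The summed value (5) therefore exceeds the correct count of distinct receipts (3).

```python
# Hypothetical fact rows at (receipt, product) granularity.
rows = [
    ("R1", "milk"), ("R1", "bread"),
    ("R2", "milk"),
    ("R3", "bread"), ("R3", "milk"),
]

def clients_for(products):
    """Estimate clients as the number of distinct receipts containing any of the products."""
    return len({receipt for receipt, product in rows if product in products})

per_product = {p: clients_for({p}) for p in ("milk", "bread")}
summed = sum(per_product.values())        # roll-up with SUM along the product dimension
correct = clients_for({"milk", "bread"})  # recount distinct receipts instead

print(per_product)      # {'milk': 3, 'bread': 2}
print(summed, correct)  # 5 3
```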

7 Multidimensional modeling
The strictness concept
– an object at a hierarchy’s lower level
– belongs to only one higher-level object
– e.g. a province can only relate to one country
The completeness concept
– all members belong to one higher-class object, and
– that object consists of those members only
– e.g. only the recorded provinces can form a country: in a “complete” classification hierarchy between the country and province levels, all the recorded provinces form the country, and all the provinces that form the country have been recorded
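
Both properties can be tested mechanically. The Python sketch below is one possible reading of the definitions. Strictness means no lower-level member maps to two parents. Completeness is checked against a reference list of each parent's members (`members_of`). That list is an assumption, since a real warehouse would need some external source for it. Province and country names are placeholders.

```python
def is_strict(pairs):
    """Strict: each lower-level member relates to exactly one higher-level member."""
    parent_of = {}
    for child, parent in pairs:
        if parent_of.setdefault(child, parent) != parent:
            return False  # the same child already rolls up to another parent
    return True

def is_complete(pairs, recorded, members_of):
    """Complete: every recorded child rolls up to some parent, and each parent
    consists of exactly the recorded children that roll up to it."""
    children_of = {}
    for child, parent in pairs:
        children_of.setdefault(parent, set()).add(child)
    rolled_up = {child for child, _ in pairs}
    if not set(recorded) <= rolled_up:
        return False  # some recorded province belongs to no country
    return all(children_of.get(parent, set()) == set(members)
               for parent, members in members_of.items())

# Hypothetical provinces P1..P4 and countries X, Y.
recorded = ["P1", "P2", "P3"]
rollup = [("P1", "X"), ("P2", "X"), ("P3", "Y")]
# Reference membership: Y also contains P4, which was never recorded.
members_of = {"X": {"P1", "P2"}, "Y": {"P3", "P4"}}

print(is_strict(rollup))                          # True
print(is_strict(rollup + [("P1", "Y")]))          # False
print(is_complete(rollup, recorded, members_of))  # False
```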

8 Multidimensional modeling
Categorization of dimensions
– some attributes are normally valid for all elements within a dimension
– while others are only valid for a subset of elements
– e.g. the attributes alcohol percentage and volume would only be valid for drink products and would be null for food products
A proper multidimensional data model
– should consider attributes only when necessary (i.e. only for the elements where they are valid),
– depending on the categorization of dimensions
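
One way to honor the categorization of a dimension, rather than carrying columns that are always null, is to give each category its own attributes. This Python sketch does that with dataclass inheritance. It illustrates the idea and is not the paper's UML notation. The class names, the `volume_ml` unit, and the sample products are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Product:
    # Attributes valid for every member of the product dimension.
    sku: str
    name: str

@dataclass
class Drink(Product):
    # Only meaningful for drinks, so they live on this category alone.
    alcohol_percentage: float
    volume_ml: int

@dataclass
class Food(Product):
    # No drink-specific attributes, so nothing is left null.
    pass

products = [
    Drink("D1", "Lager", alcohol_percentage=5.0, volume_ml=330),
    Food("F1", "Bread"),
]

drinks = [p for p in products if isinstance(p, Drink)]
print([p.name for p in drinks])                    # ['Lager']
print(hasattr(products[1], "alcohol_percentage"))  # False
```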

9 Multidimensional modeling
Recommended modeling approach
– Clearly separate the structure of a multidimensional model into
  – facts
  – dimensions
– Fact classes are composite classes “in a shared-aggregation relationship of n dimension classes”
  i.e. they relate instances from all dimensions
– A fact object instance is always related to object instances from all dimensions
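
Here is a minimal Python sketch of a fact class in a shared aggregation with its dimension classes. It assumes four dimensions (product, store, customer, day) and a single `quantity` measure. All names and values are hypothetical. The `__post_init__` check enforces that every fact instance is related to an instance of each dimension. The identity test shows two facts sharing the same dimension instances.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Product:
    name: str

@dataclass(frozen=True)
class Store:
    city: str

@dataclass(frozen=True)
class Customer:
    name: str

@dataclass(frozen=True)
class Day:
    date: str

@dataclass(frozen=True)
class Sale:
    # Fact class: aggregates one instance of every dimension class, plus a measure.
    product: Product
    store: Store
    customer: Customer
    day: Day
    quantity: int  # measure

    def __post_init__(self):
        # A fact instance must always be related to instances from all dimensions.
        for dim in ("product", "store", "customer", "day"):
            if getattr(self, dim) is None:
                raise ValueError(f"sale is missing its {dim} dimension")

milk = Product("milk")
shop = Store("Saskatoon")
day = Day("2009-10-19")

# Shared aggregation: the same dimension instances are shared by many facts.
s1 = Sale(milk, shop, Customer("alice"), day, quantity=2)
s2 = Sale(milk, shop, Customer("bob"), day, quantity=1)
print(s1.product is s2.product)  # True

try:
    Sale(milk, shop, None, day, quantity=1)
except ValueError as err:
    print(err)  # sale is missing its customer dimension
```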

10 Multidimensional modeling
Given the basics of their modeling approach,
– they then go on to explain how they can annotate
  – derived measures (with a “/”)
  – table-specific components of the table’s primary key / object ID (“OID”)
  – attributes that function as descriptors (“D”)
  – constraints on additivity (between braces near the fact table)
  – additivity and derivation rules (separate from the diagram)
  – that a dimension is a directed acyclic graph (“DAG”)
– they also use various other UML notations
Is this perhaps a little too much semantic loading?
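
The DAG annotation matters because a dimension hierarchy need not be a simple chain: a level can roll up along alternative paths. The Python sketch below uses an assumed time hierarchy in which `day` rolls up to both `month` and `week`. Acyclicity is checked with the standard library's `graphlib.TopologicalSorter` (Python 3.9+), which raises `CycleError` when the graph contains a cycle. The level names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical time-dimension hierarchy: each level maps to the levels it rolls up to.
# "day" has two parents, so the hierarchy is a DAG rather than a tree.
time_hierarchy = {
    "day": {"month", "week"},
    "month": {"quarter"},
    "quarter": {"year"},
    "week": set(),
    "year": set(),
}

def is_dag(hierarchy):
    """True if the roll-up graph has no cycles."""
    try:
        # static_order() raises CycleError once iteration starts on a cyclic graph.
        tuple(TopologicalSorter(hierarchy).static_order())
        return True
    except CycleError:
        return False

multiple_parents = any(len(parents) > 1 for parents in time_hierarchy.values())
print(is_dag(time_hierarchy), multiple_parents)       # True True
print(is_dag({**time_hierarchy, "year": {"day"}}))    # False
```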

11 Multidimensional modeling
Regardless of how we model these various concepts
– it is important that they be considered
– in the design of data warehouses

12 Dimensional Modeling (based on Jones)

13 Characteristics for using Patterns
– The problem that the pattern addresses is identified, recognized, and defined from real-world situations.
– A pattern provides an approach for formulating a solution to a real-world problem.
– The approach must be defined with respect to the real-world context from which the problem emanates.
– The approach is reusable because it has been successfully used to solve recurring real-world problems.
– A pattern endures over time.

14 Dimensional Data Patterns (DDPs)
– involve a commonly known & recognized mental model
  – with the intent of increasing the practitioner's ability to understand, remember, and apply the DDPs
– facilitate the identification of commonly used entities
  – thereby providing a greater potential for improving design correctness with the initial model
– are common across many dimensional models
  – thus reusability is improved and design time may be decreased

15 Mental Models for DDPs
Using a story as the basis for Domain DDPs
– Who: the characters involved in the story
– What: the important entities and the ideas for those entities
– When: the particular time frame involved
– Where: the location / setting of the story
– Why: the motivation or the reasons behind the story

16 Domain DDPs
A high-level set of domains can then be constructed:
– temporal (when)
– location (where)
– stakeholder (who)
– action (what is done or accomplished)
– object (what)
– qualifier (why)
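
The domains can act as a checklist, as this Python sketch shows by tagging the entities of a retail sales model with a domain. The entity assignments are a hypothetical example rather than one taken from the paper; in particular, `Promotion` as the qualifier ("why") is an assumption. `uncovered_domains` lists the domains a model has not yet addressed.

```python
from enum import Enum

class Domain(Enum):
    TEMPORAL = "when"
    LOCATION = "where"
    STAKEHOLDER = "who"
    ACTION = "what is done or accomplished"
    OBJECT = "what"
    QUALIFIER = "why"

# Hypothetical assignment of entities in a retail sales model to domains.
retail_model = {
    "Sale": Domain.ACTION,
    "Product": Domain.OBJECT,
    "Customer": Domain.STAKEHOLDER,
    "Store": Domain.LOCATION,
    "Day": Domain.TEMPORAL,
    "Promotion": Domain.QUALIFIER,
}

def entities_in(model, domain):
    """Names of the model's entities that belong to the given domain."""
    return sorted(name for name, d in model.items() if d is domain)

def uncovered_domains(model):
    """Domains with no entity yet: prompts for what the story may be missing."""
    used = set(model.values())
    return [d.name for d in Domain if d not in used]

print(entities_in(retail_model, Domain.LOCATION))  # ['Store']
without_promotion = {k: v for k, v in retail_model.items() if k != "Promotion"}
print(uncovered_domains(without_promotion))        # ['QUALIFIER']
```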

17 Commonality of DDPs
The basic domains can apply to any story
Experience across many stories reveals commonalities
Individual stories may contain unique components
– however, many of these components will take on similar patterns
– despite the components having different names

