From UML to ROLAP multidimensional databases using a pivot model


1 From UML to ROLAP multidimensional databases using a pivot model
Nicolas PRAT, ESSEC Business School Jacky AKOKA, CNAM Paris BDA 2002, INT, Evry, October 2002

2 Overview
1. Introduction 2. Unified multidimensional metamodel 3. Design method 4. Conclusion

3 Introduction
Data warehousing and OLAP market growing rapidly => need for a systematic, tool-supported method for data warehouse/multidimensional database design. The difficulty of data warehouse design is often underestimated by OLAP tool vendors, although it is a crucial phase. => Data warehouse design should follow the conceptual/logical/physical design phases (as in transactional database design).

4 State of the art
1. Introduction Many papers proposing multidimensional data models (sometimes with associated algebra/query language). Only a few data warehouse design methods (Akoka 97, Akoka 01, Golfarelli 98, Cabibbo 98, Moody 00). Distinction between conceptual/logical/physical: often unclear and/or missing phases. Our contribution: data warehouse design method based on UML, spanning the three design phases; metamodels for each design step (including unified multidimensional metamodel => pivot model); transformations operating on the concepts of the metamodels; specification of the transformations in OCL (Object Constraint Language).

5 Multidimensional metamodel
2. Unified multidimensional metamodel Problem with the multidimensional metamodel: No agreement on the concepts of this model (e.g. facts). No agreement on the level of this model: physical, logical, or conceptual. We consider the multidimensional metamodel to be at the logical level: It exists independently of implementation. Its concepts (e.g. dimension) are not as close to reality as concepts like the object or the entity. Strong parallel with the relational model. We have defined a unified multidimensional model.

6 Multidimensional modeling
2. Unified multidimensional metamodel [Figure: example data cube. Measure: quantity sold. Dimensions: PRODUCT (members P1 to P7; attributes: product name, unit price), DAY (members 3 March 99 to 9 March 99) and CITY (members Bordeaux, Brest, Lyon, Nantes, Paris). Hierarchy levels named in the legend: MONTH, QUARTER, YEAR, REGION, CATEGORY. The cell values cannot be placed from the transcript.]

7 Unified multidimensional metamodel
[Figure: UML class diagram of the unified multidimensional metamodel. Classes: ModelElement (name : Name), MultidimensionalModel, MultidimensionalModelElement, Dimension, DimensionAttribute, DimensionHierarchy, DimensionLink, Dimensioning, Measure, AggregateFunction. Other attributes shown: level : Integer, strong : Boolean, dummyMeasure : Boolean, name : FunctionName, restrictionLevel : Integer. Role names: ownedElement, owner, attribute, source, target, dimensionLink, dimensionHierarchy {ordered}, dimension, measure. The layout of the associations and their multiplicities is not recoverable from the transcript.]

8 Overview
3. Design method [Figure: overview of the design method, starting from the universe of discourse. CONCEPTUAL DESIGN (UML model, then enriched/transformed UML model): conceptual modeling yields a UML schema; enrichment/transformation yields an enriched/transformed UML schema. LOGICAL DESIGN (unified multidimensional model): logical mapping yields a unified multidimensional schema. PHYSICAL DESIGN: physical mapping yields a ROLAP star, ROLAP snowflake or MOLAP schema. DATA CONFRONTATION: source confrontation. Further label in the figure: Data Warehouse Metadata.]

9 Conceptual design
3. Design method Multidimensional representation of data (OLAP). Conceptual phase necessary (vs. direct representation of data in ROLAP stars/snowflakes or MOLAP cubes). Choice of UML for the conceptual phase: standard and well-known formalism; simple and powerful constructs to represent data at a high level of abstraction; “easy” mapping to relational and multidimensional systems. 2-step conceptual design: (1) definition of a UML model (class diagram without operations); (2) enrichment/transformation of this model to facilitate further automatic mapping to a unified multidimensional model. => Need to enrich the UML metamodel.

10 Enriched UML metamodel
3. Design method [Figure: UML class diagram of the enriched UML metamodel. Classes: UMLModel, ModelElement, UMLModelElement, Constraint, GeneralizationConstraint, Attribute, AttributeOfOrdinaryClass, AttributeOfAssociationClass, Class, OrdinaryClass, AssociationClass, AssociationEnd, Relationship, Association, OrdinaryAssociation, Generalization. Attributes shown: name : Name, measure : Boolean, identifyingAttribute : Boolean, aggregation : AggregationKind, multiplicity : Multiplicity. Role names: ownedElement, constrainedElement, constraint, owner, attribute, connection, participant, association, child, parent, generalization, specialization. Several specializations are marked {disjoint, complete} and several roles {ordered}. The multiplicities are not recoverable from the transcript.]

11 Conceptual design (step 1)
3. Design method [Figure: UML class diagram of the example (advertising domain). Classes: Product_type, Product, Media_type, Media, Region, Target, Shareholder, Private_shareholder, Public_shareholder, Person, Company, Advertising_campaign, Year, Quarter, Date. Attribute names appearing in the diagram: product_type, product_unit, product_code, product_name, media_type, insertion, media_name, advertising_price, media_exposure, region, number_of_inhabitants, percentage_of_region, target_code, status, minimum_age, maximum_age, sex, consumption, product_consumption, shareholder_name, public_shareholder_level, manager_name, campaign_code, year, quarter, dd_mm_yy. Association names appearing in the diagram: may_be_advertised_in, gets, main_shareholder, for, is_strongly_influenced_by, during, in. Generalization constraints shown: {overlapping, complete} and {disjoint, complete}. The multiplicities are not recoverable from the transcript.]

12 Conceptual design (step 2)
3. Design method Enrichment/transformation of the UML model with 4 types of successive transformations: Determination of identifying attributes Determination of attributes representing measures Migration of association attributes Transformation of generalizations. Determination of identifying attributes: Identifier=not a standard concept in UML Necessary in order to define dimensions in the logical phase Necessary for ordinary classes only Use of the tagged value {id}.

13 Conceptual design (step 2)
3. Design method Determination of attributes representing measures: Measures vs. qualitative values. The distinction cannot be performed automatically based on types. Not necessary for identifiers (defined previously). Use of the tagged value {meas}. Migration of 1-1 and 1-N association attributes: Check validity of representation first. Transformation Tcc3: Each attribute belonging to a 1-1 association is transferred to one of the classes involved in the association. Transformation Tcc4: Each attribute belonging to a 1-N association is transferred to the N-class, i.e. the class involved several times in the association. Transformation of generalizations: No direct mapping of UML generalizations to multidimensional hierarchies. UML generalizations transformed into aggregations and classes.
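The migration rules Tcc3 and Tcc4 can be sketched in Python as follows. This is only an illustration under stated assumptions: the classes UmlClass and BinaryAssociation, the boolean "many" flags and the Employee/Department example are hypothetical and do not come from the slides, and for a 1-1 association the sketch arbitrarily picks the first class, since the slide only says "one of the classes".

```python
from dataclasses import dataclass, field


@dataclass
class UmlClass:
    name: str
    attributes: list = field(default_factory=list)


@dataclass
class BinaryAssociation:
    name: str
    first: UmlClass
    first_many: bool    # True if the multiplicity at this end is "*" (N side)
    second: UmlClass
    second_many: bool
    attributes: list = field(default_factory=list)


def migrate_association_attributes(assoc):
    """Move the attributes of a 1-1 or 1-N association to a class.

    Returns the receiving class, or None for an N-M association,
    which is left untouched.
    """
    if assoc.first_many and assoc.second_many:
        return None                 # N-M: neither Tcc3 nor Tcc4 applies
    if assoc.first_many:
        target = assoc.first        # Tcc4: the N-class receives the attributes
    elif assoc.second_many:
        target = assoc.second       # Tcc4, other orientation
    else:
        target = assoc.first        # Tcc3: arbitrary choice (assumption)
    target.attributes.extend(assoc.attributes)
    assoc.attributes.clear()
    return target


# Hypothetical example, not taken from the slides.
department = UmlClass("Department", ["department_name"])
employee = UmlClass("Employee", ["employee_name"])
works_in = BinaryAssociation(
    "works_in", employee, True, department, False, ["start_date"]
)
receiver = migrate_association_attributes(works_in)
```

Here each employee takes part in at most one works_in link, so start_date can live on Employee without loss of information, which is the reasoning behind choosing the N-class.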

14 Conceptual design (step 2)
3. Design method Transformation of generalizations (cont’d): Transformation Tcc5: For each level i of specialization of a class C, a class named Type-C-i is created. The occurrences of these classes define all the specializations of C. In case of overlapping between specializations, a special value is created for each overlapping between two or more sub-classes of C. In case of incomplete specialization, the special value “others” is created. An N-1 aggregation is created between the classes C and Type-C-i.
[Figure: transformation Tcc5 applied to the Shareholder generalization. Before: Shareholder is specialized into Private_shareholder and Public_shareholder ({overlapping, complete}); Private_shareholder is specialized into Person and Company (the second constraint is truncated in the transcript: "{disjoint,"). After: the classes Shareholder_type (shareholder_type {id}) and Private_shareholder_type (private_shareholder_type {id}) are created. Occurrences of shareholder_type: {private, public, both}. Occurrences of private_shareholder_type: {person, company, others}.]
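The way Tcc5 derives the occurrences of a Type-C-i class can be sketched as below. The function name type_class_values and the "+" naming of overlap values are assumptions made for illustration; the slide itself names the single overlap of the Shareholder example "both".

```python
from itertools import combinations


def type_class_values(subclasses, overlapping, complete):
    """Occurrences of the Type-C-i class created for one specialization level.

    subclasses  -- names of the sub-classes of C at this level
    overlapping -- True if the generalization is {overlapping}
    complete    -- True if the generalization is {complete}
    """
    values = list(subclasses)
    if overlapping:
        # One special value per overlap between two or more sub-classes.
        for size in range(2, len(subclasses) + 1):
            for combo in combinations(subclasses, size):
                values.append("+".join(combo))
    if not complete:
        # Incomplete specialization: the special value "others" is added.
        values.append("others")
    return values


shareholder_values = type_class_values(
    ["private", "public"], overlapping=True, complete=True
)
private_values = type_class_values(
    ["person", "company"], overlapping=False, complete=False
)
```

With two overlapping sub-classes there is exactly one overlap value, matching the three occurrences listed for shareholder_type. The second call treats the Person/Company level as disjoint and incomplete, which is an assumption consistent with the "others" occurrence shown on the slide.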

15 Logical design 3. Design method From enriched/transformed UML model to unified multidimensional model. Mapping of: Ordinary classes and their attributes (transformations Tcl1 to Tcl3) Associations and their attributes (transformations Tcl4 to Tcl6). Mapping ordinary classes and their attributes : Transformation Tcl1: The identifying attribute of each ordinary class is mapped into a dimension in the multidimensional model. Transformation Tcl2: The non-identifying attributes of each ordinary class are mapped into dimension attributes in the multidimensional model if these non-identifying attributes are not measures of interest. Transformation Tcl3: The non-identifying attributes of each ordinary class are mapped into measures in the multidimensional model if these non-identifying attributes are measures of interest.
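A minimal Python sketch of transformations Tcl1 to Tcl3 for a single ordinary class follows. The Attribute dataclass, the function map_ordinary_class and the dictionary layout of the result are hypothetical; the sketch also assumes exactly one identifying attribute per class. The input reproduces the Target class of the example, with its {id} and {meas} tagged values.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    name: str
    identifying: bool = False   # tagged value {id}
    measure: bool = False       # tagged value {meas}


def map_ordinary_class(attributes):
    """Map the attributes of one ordinary class to multidimensional concepts."""
    ids = [a.name for a in attributes if a.identifying]
    if len(ids) != 1:
        raise ValueError("sketch assumes exactly one identifying attribute")
    dimension = ids[0]                      # Tcl1: identifier -> dimension
    model = {"dimensions": [dimension], "attributes": [], "measures": []}
    for a in attributes:
        if a.identifying:
            continue
        # Tcl3 for measures of interest, Tcl2 for the other attributes.
        kind = "measures" if a.measure else "attributes"
        model[kind].append((a.name, dimension))
    return model


target = [
    Attribute("target_code", identifying=True),
    Attribute("status"),
    Attribute("minimum_age"),
    Attribute("maximum_age"),
    Attribute("sex"),
    Attribute("percentage_of_region", measure=True),
]
target_model = map_ordinary_class(target)
```

Each non-identifying attribute is paired with the dimension obtained from the identifier of its class, which mirrors the bracket notation used in the example slides (e.g. attribute status [target_code]).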

16 Logical design
3. Design method Specifying transformation Tcl3 with OCL:
context UMLModel::Tcl3(nonIdentifier:Attribute, multidimensionalModel:MultidimensionalModel):Measure
pre: nonIdentifier.owner.oclIsTypeOf(OrdinaryClass)=true
  and nonIdentifier.identifyingAttribute=false
  and nonIdentifier.measure=true
post: result.name=nonIdentifier.name
post: nonIdentifier.owner.attribute->forAll(a1:Attribute |
  if a1.identifyingAttribute=true then result.dimension=Tcl1(a1) endif)
post: multidimensionalModel->includes(result)

17 Logical design
3. Design method Mapping ordinary classes and their attributes (example):
[Figure: enriched/transformed UML model. Classes, as far as recoverable: Media_type (media_type {id}, insertion), Media (media_name {id}, advertising_price), Region (region {id}, number_of_inhabitants {meas}), Target (target_code {id}, status, minimum_age, maximum_age, sex, percentage_of_region {meas}), Year (year {id}), Quarter (quarter {id}); association gets and association class Media exposure (media_exposure {meas}).]
Unified multidimensional model obtained:
Transformation Tcl1: dimension target_code; dimension quarter; dimension year; dimension region; dimension media_name; dimension media_type
Transformation Tcl2: attribute status [target_code]; attribute minimum_age [target_code]; attribute maximum_age [target_code]; attribute sex [target_code]; attribute insertion [media_type]; attribute advertising_price [media_name]
Transformation Tcl3: measure percentage_of_region [target_code]; measure number_of_inhabitants [region]

18 Logical design
3. Design method Mapping associations and their attributes: Transformation Tcl4: The attributes of each association class are mapped into measures, associated with dimensions obtained by mapping the identifying attributes of the ordinary classes directly or indirectly participating in the association class (transformation Tcl1). Transformation Tcl5: A path formed by N-1 associations is mapped into a hierarchy in the multidimensional model. Transformation Tcl6: Every N-M or N-ary association without at least one attribute that is always defined is mapped into a dummy measure, associated with dimensions obtained by mapping the identifying attributes of the ordinary classes directly or indirectly participating in the association (transformation Tcl1).
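Transformations Tcl4 to Tcl6 can be sketched in Python as below. The function names, the tuple layout (name, dimensions, is_dummy) and the always_defined parameter are hypothetical. The sketch assumes that map_association is only called for N-M or N-ary associations, that every dimension has at most one parent and that the N-1 links contain no cycle. It also assumes that media_exposure is always defined, since the example shows no dummy measure for that association class.

```python
def map_association(name, dimensions, attributes=(), always_defined=()):
    """Tcl4 and Tcl6 for one N-M or N-ary association.

    Returns a list of (measure_name, dimensions, is_dummy) tuples.
    """
    # Tcl4: every attribute of the association class becomes a measure.
    measures = [(attr, list(dimensions), False) for attr in attributes]
    # Tcl6: no attribute that is always defined -> add a dummy measure
    # named after the association.
    if not any(attr in always_defined for attr in attributes):
        measures.append((name, list(dimensions), True))
    return measures


def build_hierarchies(parent_of):
    """Tcl5: follow N-1 links (child dimension -> parent dimension)."""
    parents = set(parent_of.values())
    hierarchies = []
    for start in parent_of:
        if start in parents:
            continue            # not the finest level of a path
        path = [start]
        while path[-1] in parent_of:
            path.append(parent_of[path[-1]])
        hierarchies.append(path)
    return hierarchies


exposure = map_association(
    "media_exposure",
    ["media_name", "target_code", "quarter"],
    attributes=["media_exposure"],
    always_defined=["media_exposure"],
)
gets = map_association("gets", ["region", "media_name"])
hierarchies = build_hierarchies(
    {"quarter": "year", "media_name": "media_type"}
)
```

The inputs are the associations of the running example: the media exposure association class yields an ordinary measure, the attribute-less association gets yields a dummy measure, and the two N-1 paths yield the hierarchies quarter->year and media_name->media_type.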

19 Logical design
3. Design method Mapping associations and their attributes (example):
[Figure: the same enriched/transformed UML model as in the previous example, with the association class Media exposure (media_exposure {meas}) and the association gets highlighted as inputs of transformations Tcl4, Tcl5 and Tcl6.]
Unified multidimensional model obtained:
dimension target_code; dimension quarter; dimension year; dimension region; dimension media_name; dimension media_type
attribute status [target_code]; attribute minimum_age [target_code]; attribute maximum_age [target_code]; attribute sex [target_code]; attribute insertion [media_type]; attribute advertising_price [media_name]
measure percentage_of_region [target_code]; measure number_of_inhabitants [region]
Transformation Tcl4: measure media_exposure [media_name,target_code,quarter]
Transformation Tcl5: hierarchy time quarter->year; hierarchy media_type media_name->media_type
Transformation Tcl6: dummy measure gets [region,media_name]

20 Physical design 3. Design method For each type of target system: metamodel + associated transformations (elaborates on/completes OMG’s Common Warehouse Metamodel). Example transformation (ROLAP star) : Transformation Tls4: Every hierarchy D1->D2->…->Dn of the logical model is mapped by considering all the sub-hierarchies Dj->Dj+1…->Dn where 1<=j<n and Dj dimensions at least one measure. A sub-hierarchy Dj->Dj+1…->Dn is mapped in the physical model by defining in the dimension table identified by Dj a column corresponding to each of the Di (where j<i<=n).
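Transformation Tls4 can be illustrated with the short Python sketch below. The function name star_dimension_columns and the representation of the result (a dictionary from a dimension table to its added columns) are hypothetical; the set passed as second argument stands for the dimensions that dimension at least one measure.

```python
def star_dimension_columns(hierarchy, dimensioning):
    """Tls4 for one hierarchy D1 -> D2 -> ... -> Dn.

    hierarchy    -- list [D1, ..., Dn], finest level first
    dimensioning -- set of dimensions that dimension at least one measure
    Returns {Dj: [Dj+1, ..., Dn]} for every qualifying Dj with j < n.
    """
    tables = {}
    for j, dj in enumerate(hierarchy[:-1]):      # Dn itself is excluded
        if dj in dimensioning:
            # The dimension table identified by Dj gets one column
            # for each coarser level Di (j < i <= n).
            tables[dj] = hierarchy[j + 1:]
    return tables


time_tables = star_dimension_columns(["quarter", "year"], {"quarter"})
media_tables = star_dimension_columns(
    ["media_name", "media_type"], {"media_name"}
)
```

In the running example, quarter dimensions the measure media_exposure, so the dimension table identified by quarter receives a year column; likewise the table identified by media_name receives a media_type column. Denormalizing all coarser levels into one table is what distinguishes the star mapping from a snowflake mapping.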

21 Conclusion Data warehouse design method based on UML:
Spans the conceptual/logical/physical levels. Each step: metamodels + associated transformations. Unified multidimensional metamodel at the logical level (pivot metamodel). Tool support (prototype developed). Future work: complete/specialise the set of transformations; further experimentation; reverse engineering.

