Prof. Bayer, DWH, Ch. 4, SS 2002 (summer semester). Chapter 4: Dimensions, Hierarchies, Operations, Modeling

Chapter 4.1: Hierarchical Dimensions
Def: Hierarchical dimensions are composite keys with an order on the key attributes. Prefixes are allowed as keys.
Ex: for the dimension Time = (Year, Month, Day), the legal keys are (Year), (Year, Month), and (Year, Month, Day).
Def: Basic facts are values in cells with full foreign keys.
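
A minimal sketch of the definition, assuming a hierarchical dimension is represented as an ordered tuple of level names. The helpers `legal_keys` and `is_basic_fact_key` are hypothetical names chosen for illustration.

```python
from typing import List, Tuple

Levels = Tuple[str, ...]

def legal_keys(levels: Levels) -> List[Levels]:
    """Non-empty prefixes of the level order: the legal keys of the dimension."""
    return [levels[:k] for k in range(1, len(levels) + 1)]

def is_basic_fact_key(levels: Levels, key: Levels) -> bool:
    """A basic fact is addressed by the full key, i.e. all levels."""
    return key == levels

TIME = ("Year", "Month", "Day")
print(legal_keys(TIME))  # [('Year',), ('Year', 'Month'), ('Year', 'Month', 'Day')]
```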

Aggregations, Summaries
Def: Aggregations are facts in cells with partial keys. These facts are derived by aggregation functions; in a cube with derived facts, the aggregation function must be specified.
Ex: sales on a monthly basis: $\mathrm{Sales}(Year, Month) = \sum_{Day} \mathrm{Sales}(Year, Month, Day)$
Aggregation functions: count, sum, avg, min, max, ...
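
The monthly-sales example can be sketched with the aggregation function passed as a parameter, which mirrors the requirement that a cube with derived facts must name its aggregation function. The data and the names `base`, `AGG_FUNCS`, and `monthly` are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Made-up basic facts: (year, month, day) -> sales
base = {(2002, 3, 1): 10.0, (2002, 3, 2): 30.0, (2002, 4, 1): 5.0}

AGG_FUNCS = {"count": len, "sum": sum, "avg": mean, "min": min, "max": max}

def monthly(base, func_name):
    """Derive facts with the partial key (year, month) from the basic facts."""
    groups = defaultdict(list)
    for (year, month, _day), value in base.items():
        groups[(year, month)].append(value)
    f = AGG_FUNCS[func_name]
    return {key: f(values) for key, values in groups.items()}

print(monthly(base, "sum"))  # {(2002, 3): 40.0, (2002, 4): 5.0}
print(monthly(base, "avg"))  # {(2002, 3): 20.0, (2002, 4): 5.0}
```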

Note on Aggregations
Aggregations may be stored explicitly in the cube, but then they should be secured by integrity constraints.
Aggregations may also be virtual, i.e. computed on demand when needed.
This is the classical tradeoff between storage space, performance, and flexibility.
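
Both options can be contrasted in a small sketch: a virtual aggregate recomputed on demand, and an integrity check that compares explicitly stored aggregates against that recomputation. The data and function names are hypothetical; a real system would enforce the constraint inside the DBMS.

```python
import math

# Made-up data: daily facts plus explicitly stored monthly sums.
daily = {(2002, 3, 1): 10.0, (2002, 3, 2): 30.0, (2002, 4, 1): 5.0}
stored_monthly = {(2002, 3): 40.0, (2002, 4): 6.0}  # (2002, 4) is deliberately stale

def monthly_sum_on_demand(daily, year, month):
    """Virtual aggregate: recomputed from the basic facts whenever it is needed."""
    return sum(v for (y, m, _d), v in daily.items() if (y, m) == (year, month))

def violated_constraints(daily, stored):
    """Integrity constraint: each stored aggregate must equal the value
    recomputed from the basic facts. Returns the keys that violate it."""
    return [key for key, value in stored.items()
            if not math.isclose(value, monthly_sum_on_demand(daily, *key))]

print(violated_constraints(daily, stored_monthly))  # [(2002, 4)]
```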

Relational Modeling
Expand and complete partial keys with the special symbol ALL, e.g. (Year, Month, ALL), (ALL, Month, ALL), (ALL, ALL, ALL), to obtain simple and complete relational keys.
Question: which SQL statement computes the complete cube, with all aggregations, from the base cube?
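
One possible answer to the question, sketched with Python's built-in `sqlite3`: issue one `GROUP BY` per subset of the key columns, replace every aggregated-away column by the constant 'ALL', and combine the results with `UNION ALL`. SQL:1999 offers `GROUP BY CUBE(...)` for this (reporting NULL instead of ALL), but to our knowledge SQLite does not support it, hence the explicit union. Table and column names are assumptions.

```python
import itertools
import sqlite3

DIMS = ["year", "month", "day"]

def cube_sql(dims, measure="sales", table="base"):
    """One SELECT per subset of grouping columns; columns that are aggregated
    away become the constant 'ALL'. The UNION ALL of all 2^len(dims) queries
    is the complete cube, base cube included."""
    selects = []
    for k in range(len(dims) + 1):
        for kept in itertools.combinations(dims, k):
            cols = [d if d in kept else f"'ALL' AS {d}" for d in dims]
            sql = f"SELECT {', '.join(cols)}, SUM({measure}) AS {measure} FROM {table}"
            if kept:
                sql += " GROUP BY " + ", ".join(kept)
            selects.append(sql)
    return "\nUNION ALL\n".join(selects)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE base (year, month, day, sales)")
con.executemany("INSERT INTO base VALUES (?, ?, ?, ?)",
                [(2002, 3, 1, 10), (2002, 3, 2, 30), (2002, 4, 1, 5)])
rows = con.execute(cube_sql(DIMS)).fetchall()
print(("ALL", "ALL", "ALL", 45) in rows)  # True
```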

Hierarchy Example (figure)

Chapter 4.2: OLAP Operations
Def: Roll-up computes higher aggregations from lower aggregations or base facts according to the hierarchies.
Ex: for base facts (Year, Month, Day) there are 3 hierarchical roll-up functions:
Roll-up (Year, Month, ALL)
Roll-up (Year, ALL, ALL)
Roll-up (ALL, ALL, ALL)
These canonical roll-ups are generally supported.
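
A sketch of the three canonical roll-ups, each computed from the previous (lower) aggregation rather than from the base facts. The helper `roll_up` and the data are hypothetical, and the measure is assumed to be summed.

```python
from collections import defaultdict

ALL = "ALL"

def roll_up(cells, level):
    """Replace the key components from position `level` onward by ALL and
    sum all cells that collapse onto the same key."""
    out = defaultdict(float)
    for key, value in cells.items():
        out[key[:level] + (ALL,) * (len(key) - level)] += value
    return dict(out)

base = {(2002, 3, 1): 10.0, (2002, 3, 2): 30.0, (2002, 4, 1): 5.0, (2003, 1, 7): 2.0}

by_month = roll_up(base, 2)     # keys (Year, Month, ALL)
by_year = roll_up(by_month, 1)  # keys (Year, ALL, ALL), from the monthly values
total = roll_up(by_year, 0)     # key  (ALL, ALL, ALL), from the yearly values
print(by_year)  # {(2002, 'ALL', 'ALL'): 45.0, (2003, 'ALL', 'ALL'): 2.0}
print(total)    # {('ALL', 'ALL', 'ALL'): 47.0}
```

Reusing the lower aggregation works directly for sum, count, min, and max; avg has to be carried as a (sum, count) pair to be rolled up the same way.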

Additional roll-ups: (ALL, Month, ALL), etc.
Therefore there are $2^3 - 1 = 7$ aggregations, or in general $2^m - 1$ aggregations for m hierarchy levels.
Note: see later chapters for the support of arbitrary aggregations.
Note: for m dimensions with $h_1, h_2, \dots, h_m$ hierarchy levels there are $2^{h_1 + h_2 + \dots + h_m} - 1$ different aggregations for a given aggregation function.
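
The counts can be checked by brute force: every hierarchy level is either kept or replaced by ALL, and the grouping that keeps all levels is the base cube, not an aggregation. The function name `aggregations` is hypothetical.

```python
from itertools import product

def aggregations(levels_per_dim):
    """All groupings as keep/ALL patterns over every hierarchy level,
    excluding the all-kept pattern (the base cube)."""
    total_levels = sum(levels_per_dim)
    return [g for g in product([True, False], repeat=total_levels) if not all(g)]

print(len(aggregations([3])))     # 7  = 2**3 - 1, e.g. Time = (Year, Month, Day)
print(len(aggregations([2, 3])))  # 31 = 2**(2 + 3) - 1
```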

Size of the Base Cube: 2-dim Example
Dim1: (4, 5) = cardinalities of the dimension levels
Dim2: (6, 7, 2)
Size of the base cube = (4 * 5) * (6 * 7 * 2) = 20 * 84 = 1680 (with the partial product 6 * 7 = 42)
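
The base cube size is the product of all level cardinalities. A short sketch, with the dimensions written as tuples of cardinalities (a representation assumed here):

```python
from math import prod

dims = [(4, 5), (6, 7, 2)]  # level cardinalities of Dim1 and Dim2

def base_cube_size(dims):
    """Cells addressed by full keys: product over all levels of all dimensions."""
    return prod(prod(levels) for levels in dims)

print(base_cube_size(dims))  # 1680 = 20 * 84
```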

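Size of the Hierarchically Aggregated Cube
Number of cells per aggregation function (levels 4 5 of Dim1 | levels 6 7 2 of Dim2; "-" = aggregated to ALL):
4 - | 6 7 2 : 336
4 5 | 6 7 - : 840
- - | 6 7 2 : 84
4 - | 6 7 - : 168
4 5 | 6 - - : 120
- - | 6 7 - : 42
4 - | 6 - - : 24
4 5 | - - - : 20
- - | 6 - - : 6
4 - | - - - : 4
- - | - - - : 1
Size of the hierarchically aggregated cube (without the base cube): 1645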
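
The table can be regenerated by choosing one hierarchy prefix per dimension (from ALL down to the full key) and skipping the combination that is the base cube. The generator name is hypothetical.

```python
from itertools import product
from math import prod

dims = [(4, 5), (6, 7, 2)]

def hierarchical_aggregations(dims):
    """Yield (prefix length per dimension, cell count) for every combination
    of hierarchical prefixes, excluding the base cube."""
    for lengths in product(*(range(len(levels) + 1) for levels in dims)):
        if all(k == len(levels) for k, levels in zip(lengths, dims)):
            continue  # full prefix in every dimension = base cube
        yield lengths, prod(prod(levels[:k]) for k, levels in zip(lengths, dims))

rows = list(hierarchical_aggregations(dims))
print(len(rows))                        # 11 aggregations
print(sum(cells for _, cells in rows))  # 1645
```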

Size of the Completely Aggregated Cube
Levels 4 5 6 7 2; each level is either kept or aggregated to ALL (the slide enumerates these patterns). Adding the levels from right to left:
levels (7, 2): 1 + 2 + 7 + 14 = 24
add level 6: 24 * 6 + 24 = 144 + 24 = 168
add level 5: 168 * 5 + 168 = 840 + 168 = 1008
add level 4: 1008 * 4 + 1008 = 4032 + 1008 = 5040
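
The right-to-left computation on the slide is a simple fold: adding a level with cardinality c turns a size s into s * c + s, because each cell of the smaller cube is combined either with one of the c members or with ALL. A sketch (function name hypothetical):

```python
def completely_aggregated_size(cardinalities):
    """Fold the levels in from right to left: s -> s * c + s."""
    size = 1
    for c in reversed(cardinalities):
        size = size * c + size
    return size

print(completely_aggregated_size([4, 5, 6, 7, 2]))  # 5040
```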

Computation with a Binary Tree (figure)

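Size of the Cube
Lemma: Given a data cube with m dimensions with $h_1, \dots, h_m$ hierarchy levels, respectively, let the hierarchy levels of dimension i have the cardinalities $c_{i1}, \dots, c_{ih_i}$. Then the base cube has $\prod_{i=1}^{m} \prod_{j=1}^{h_i} c_{ij}$ cells, and the cube with all aggregations has $\prod_{i=1}^{m} \prod_{j=1}^{h_i} (1 + c_{ij})$ cells.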

Size of the Cube (2)
The aggregated cube is larger than the base cube by the factor $\prod_{i=1}^{m} \prod_{j=1}^{h_i} \left(1 + \frac{1}{c_{ij}}\right)$, e.g. 5040 / 1680 = 3 for the 2-dim example.
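
A quick numeric check of the lemma and the growth factor, using exact fractions so the factor comes out as an integer where it should. Function names are hypothetical.

```python
from fractions import Fraction
from math import prod

def base_size(dims):
    return prod(c for levels in dims for c in levels)

def full_size(dims):
    return prod(1 + c for levels in dims for c in levels)

def growth_factor(dims):
    """Product over all levels of (1 + 1/c_ij); equals full_size / base_size."""
    return prod(1 + Fraction(1, c) for levels in dims for c in levels)

dims = [(4, 5), (6, 7, 2)]
print(base_size(dims), full_size(dims), growth_factor(dims))  # 1680 5040 3
```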

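Size of the Hierarchically Aggregated Cube
For a hierarchy i with $h_i$ levels and cardinalities $c_{i1}, \dots, c_{ih_i}$, aggregating to the prefixes of length $k = 0, \dots, h_i - 1$ gives $h_i$ hierarchical aggregation possibilities; together with the base level ($k = h_i$) they contain $\sum_{k=0}^{h_i} \prod_{j=1}^{k} c_{ij}$ cells (the empty product for k = 0 is the single ALL cell).
Lemma: A hierarchically completely aggregated data cube, including the base cube, has $\prod_{i=1}^{m} \sum_{k=0}^{h_i} \prod_{j=1}^{k} c_{ij}$ cells.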

Ex: (4 5)(6 7 2): size of the hierarchically aggregated cube plus base cube = (1 + 4 + 20) * (1 + 6 + 42 + 84) = 25 * 133 = 3325
Ex: (4 5)(6 7 2)(8 3): size of the base cube: 40,320; hierarchically aggregated cube plus base cube = (1 + 4 + 20) * (1 + 6 + 42 + 84) * (1 + 8 + 24) = 3325 * 33 = 109,725

Ex: (4 5)(6 7 2)(8 3)(5 9): size of the base cube: 1,814,400; hierarchically aggregated cube plus base cube = 109,725 * (1 + 5 + 45) = 5,595,975
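
The lemma translates directly into a few lines that reproduce the three examples. `hierarchical_size` is a hypothetical name; each dimension is given as a tuple of level cardinalities.

```python
from math import prod

def hierarchical_size(dims):
    """Hierarchically aggregated cube plus base cube:
    product over dimensions of sum_{k=0}^{h_i} prod_{j<=k} c_ij."""
    total = 1
    for levels in dims:
        total *= sum(prod(levels[:k]) for k in range(len(levels) + 1))
    return total

print(hierarchical_size([(4, 5), (6, 7, 2)]))                  # 3325
print(hierarchical_size([(4, 5), (6, 7, 2), (8, 3)]))          # 109725
print(hierarchical_size([(4, 5), (6, 7, 2), (8, 3), (5, 9)]))  # 5595975
```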

Additional Comments on Aggregations
1. In addition to the size of the complete cube, there is a factor of 5 for the various aggregation functions, e.g. sum, avg, min, max, count, ...
2. So far we have not considered general restrictions, e.g. "all Saturdays in March" or "vacation months July and August", which cross the bounds of hierarchy levels. Interactive query formulation results in an unlimited number of aggregations.
Optimization: restrictions corresponding to hierarchy levels should be pushed down, since they lead to query boxes.
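
To see why "all Saturdays in March" crosses the hierarchy bounds: weekday is not a level of (Year, Month, Day), so the restriction selects scattered single days instead of one query box. A sketch using Python's `datetime` (the function name is hypothetical):

```python
from datetime import date, timedelta

def saturdays_in_march(year):
    """Keys (Year, Month, Day) selected by 'all Saturdays in March'."""
    d = date(year, 3, 1)
    keys = []
    while d.month == 3:
        if d.weekday() == 5:  # Monday == 0, so Saturday == 5
            keys.append((d.year, d.month, d.day))
        d += timedelta(days=1)
    return keys

print(saturdays_in_march(2002))  # [(2002, 3, 2), (2002, 3, 9), (2002, 3, 16), (2002, 3, 23), (2002, 3, 30)]
```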

Note: see later chapters for multidimensional indexes, MHC techniques, and optimization of the ROLAP algebra to support hierarchical canonical aggregations like Roll-up (Year, Month, ALL), Roll-up (Year, ALL, ALL), Roll-up (ALL, ALL, ALL), but not Roll-up (ALL, Month, ALL).

Optimization Problem
A non-hierarchical aggregation, e.g. March for all years, is decomposed into a union of several restrictions, e.g.
$\sum \mathrm{Sales}(Year, Month, Day)$ where Month = March and (Year = 1996 or Year = 1997 or Year = 1998)
See later for the translation into a ROLAP expression and transformations for optimization.
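
A sketch of the decomposition, assuming each restriction is expressed as one query box with inclusive lower and upper bounds on the composite key (Year, Month, Day). Within a fixed year the March days are contiguous, so each box is a hierarchical restriction. Names and data are hypothetical.

```python
def march_boxes(years):
    """One query box per year: inclusive bounds on (Year, Month, Day)."""
    return [((y, 3, 1), (y, 3, 31)) for y in years]

def in_boxes(key, boxes):
    """Tuple comparison is lexicographic, matching the hierarchical key order."""
    return any(low <= key <= high for low, high in boxes)

sales = {(1996, 3, 5): 1.0, (1997, 3, 20): 2.0, (1997, 4, 1): 4.0, (1999, 3, 1): 8.0}
boxes = march_boxes([1996, 1997, 1998])
print(sum(v for k, v in sales.items() if in_boxes(k, boxes)))  # 3.0
```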

Multiple Hierarchies, e.g. the time hierarchy
Aggregation for a month, e.g. by a covering query box (QB) of weeks and postfiltering
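
A sketch of the covering-query-box idea, assuming the facts are keyed by a week hierarchy (ISO year, ISO week, date), which is an illustrative choice. Step 1 reads the box of weeks that covers the month; step 2 postfilters the days that belong to neighboring months.

```python
import calendar
from datetime import date, timedelta

def covering_week_box(year, month):
    """Inclusive bounds (iso_year, iso_week) of the ISO weeks covering the month."""
    first = date(year, month, 1)
    last = date(year, month, calendar.monthrange(year, month)[1])
    return tuple(first.isocalendar())[:2], tuple(last.isocalendar())[:2]

def month_total(facts, year, month):
    """facts: {(iso_year, iso_week, day): sales}, keyed by the week hierarchy."""
    low, high = covering_week_box(year, month)
    total = 0.0
    for (iso_year, week, day), value in facts.items():
        # Step 1: covering query box of weeks; step 2: postfilter by month.
        if low <= (iso_year, week) <= high and (day.year, day.month) == (year, month):
            total += value
    return total

facts = {}
for i in range(50):  # made-up data: one sale per day, 2002-02-20 .. 2002-04-10
    d = date(2002, 2, 20) + timedelta(days=i)
    iso_year, week, _ = d.isocalendar()
    facts[(iso_year, week, d)] = 1.0
print(month_total(facts, 2002, 3))  # 31.0
```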

Navigation Operations
Drill Down: first show a single result for an aggregated value, e.g. sales per day; then show hourly values for days with very high or very low sales, in order to plan working hours for sales people better.
Other examples: daily sales during the Christmas season; vacation bookings for skiing during Fasching (carnival season)
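
The drill-down example, sketched with made-up hourly facts: show daily totals first, then hourly detail only for days outside a normal range. The thresholds `low` and `high` are hypothetical parameters.

```python
# Made-up hourly facts: (day, hour) -> sales
hourly = {("Mon", 9): 5, ("Mon", 17): 40, ("Tue", 9): 2, ("Tue", 17): 3,
          ("Wed", 9): 15, ("Wed", 17): 15}

def daily_totals(hourly):
    """Aggregated level: one value per day."""
    totals = {}
    for (day, _hour), value in hourly.items():
        totals[day] = totals.get(day, 0) + value
    return totals

def drill_down(hourly, low, high):
    """Hourly detail only for days whose total is below `low` or above `high`."""
    totals = daily_totals(hourly)
    flagged = {day for day, total in totals.items() if total < low or total > high}
    return {key: value for key, value in hourly.items() if key[0] in flagged}

print(daily_totals(hourly))        # {'Mon': 45, 'Tue': 5, 'Wed': 30}
print(drill_down(hourly, 10, 40))  # hourly rows for Mon and Tue only
```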

Roll-up: Compute Aggregations

Slicing
Selection of a smaller data cube, or even reduction of a multidimensional data cube to fewer dimensions, by a point restriction in some dimension (which becomes the pivot element).
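
A minimal sketch of slicing on a cube stored as a dict from key tuples to values (a representation assumed for illustration): the point restriction fixes one dimension, which then drops out of the keys.

```python
def slice_cube(cube, dim, value):
    """Point restriction key[dim] == value; the restricted dimension is
    removed from the keys, so an n-dim cube becomes an (n-1)-dim cube."""
    return {key[:dim] + key[dim + 1:]: v for key, v in cube.items() if key[dim] == value}

# Made-up cube with dimensions (Year, Product group, City)
cube = {("2002", "TV", "Munich"): 3, ("2002", "Car", "Munich"): 1, ("2001", "TV", "Berlin"): 2}
print(slice_cube(cube, 0, "2002"))  # {('TV', 'Munich'): 3, ('Car', 'Munich'): 1}
```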

Dicing (German: würfeln): rotate the result to show another view, e.g. by exchanging rows and columns.
Slice management: precomputing and caching several slices for later or special use, e.g. for a particular sales person.
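
Rotating a 2-D view, i.e. exchanging rows and columns, can be sketched on a nested dict `{row: {column: value}}`, a representation assumed here.

```python
def rotate(view):
    """Exchange rows and columns of a 2-D view {row: {col: value}}."""
    out = {}
    for row, cols in view.items():
        for col, value in cols.items():
            out.setdefault(col, {})[row] = value
    return out

view = {"TV": {"Munich": 3, "Berlin": 2}, "Car": {"Munich": 1}}
print(rotate(view))  # {'Munich': {'TV': 3, 'Car': 1}, 'Berlin': {'TV': 2}}
```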

Chapter 4.3: Modeling Methodology
Purpose: analysis of business processes; characteristic facts (German: Kennzahlen, key figures) for managers to support decisions (DSS).
Steps of the decision process:
1. Which business processes should be modeled and analyzed?
2. What are the measures, and where do they come from?
3. Which degree of detail, e.g. minutes as in SAP? Which precision is required for OLAP?
4. Which common properties of the measures determine the dimensions? Brand, time, geographic region, product group? Dependencies between levels of hierarchies?

5. Attributes of dimensions, e.g. of products: screen size of TVs and computers, cc and PS (metric horsepower) for cars, focal length for cameras.
Problem: how common are properties to dimensions? Non-common properties cannot be modeled by levels of dimensions. At GfK they are called features (up to 50); they are numbered, and their meaning depends on the specific dimension element, e.g.
TV: screen size, color, audio system
Car: transmission, cc, PS, #cyl, ...

6. Constant or changing attributes of dimensions? E.g. new models of car makers, new power sources: electric, hydrogen, solar. Attributes are rather stable but should still be planned ahead (mergers like Daimler-Chrysler)!
7. Sparsity: one hypercube or several, i.e. a multicube model? This influences storage requirements, query formulation, and performance, and cannot easily be hidden from the user (maybe by views?).

8. Caching and management of aggregates?
(figure: time vs. number of aggregates; optimal number of aggregates)

Chapter 4.4: Comparison of OLAP Architectures
1. MOLAP: Multidimensional OLAP
2. ROLAP: Relational OLAP
3. HOLAP: Hybrid OLAP

MOLAP Architecture (figure)

MDDBMS in the ANSI/X3/SPARC Architecture (figure)

Logical Components of an MDDBMS (figure)

ROLAP Architecture (figure)

HOLAP Architecture (figure)

Reasons for MOLAP: performance, write access, data marts, functional power
Reasons for ROLAP: scalability, flexible precomputations and partial aggregates, parallelism, DB management and ACID
