1 Prof. Bayer, DWH, Ch.7, SS2002
Chapt. 7 Multidimensional Hierarchical Clustering
Fig. 3.1 Hierarchies in the 'Juice and More' schema
TIME: All Time --> Year (3) --> Month (12)
CUSTOMER: All Customer --> Region (8) --> Nation (7) --> TradeType (2) --> BusinessType (7)
PRODUCT: All Products --> Type (5) --> Brand (8) --> Category (19) --> Container (10)
DISTRIBUTION: All Distributions --> Sales Organization (5) --> Distribution Channel (3)

3 Size of completely aggregated Cube
$$\frac{(6 \cdot 9 \cdot 20 \cdot 11)(9 \cdot 8 \cdot 3 \cdot 8)(6 \cdot 4)(4 \cdot 13)}{(5 \cdot 8 \cdot 19 \cdot 10)(8 \cdot 7 \cdot 2 \cdot 7)(5 \cdot 3)(3 \cdot 12)} = \frac{4 \cdot 6 \cdot 6 \cdot 9 \cdot 11 \cdot 13}{5 \cdot 5 \cdot 7 \cdot 7 \cdot 19} = \frac{185{,}328}{23{,}275} \approx 7.96$$
i.e. the completely aggregated cube is 7.96 times larger than the base cube.
Base Cube has 2,245,024,000 cells * 4 B ~ 9 GB
Number of available facts: 26 million
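
The ratio on this slide can be re-derived from the level cardinalities of Fig. 3.1. The following is a minimal Python sketch, assuming (as the numerator of the formula suggests) that complete aggregation adds one extra ALL member to every hierarchy level. The names `CARDINALITIES` and `aggregation_ratio` are illustrative and do not come from the lecture.

```python
from fractions import Fraction
from math import prod

# Level cardinalities per dimension, as listed in Fig. 3.1.
CARDINALITIES = {
    "PRODUCT": [5, 8, 19, 10],
    "CUSTOMER": [8, 7, 2, 7],
    "DISTRIBUTION": [5, 3],
    "TIME": [3, 12],
}

def aggregation_ratio(cardinalities):
    """Size of the completely aggregated cube relative to the base cube."""
    levels = [n for dims in cardinalities.values() for n in dims]
    aggregated = prod(n + 1 for n in levels)  # assumption: one extra ALL member per level
    base = prod(levels)
    return Fraction(aggregated, base)

ratio = aggregation_ratio(CARDINALITIES)
print(ratio, round(float(ratio), 2))
```

Using `Fraction` keeps the result exact, so the reduced fraction can be compared directly with the one on the slide before rounding.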

4 Sparsity:
$$\frac{26 \cdot 10^{6}}{2.245 \cdot 10^{9}} = 0.0116$$
100 % - 1.16 % = 98.84 % sparsity

5 Hierarchically aggregated Cube
(1+5+40+760+7600) = 8406
(1+8+56+112+784) = 961
(1+5+15) = 21
(1+3+24) = 28
Product: 8406 * 961 * 21 * 28 = 4,749,961,608
Size of base cube: 2,145,024,000
Number of aggregate cells: 2,504,937,608
==> Juice and More database has 96 times more hierarchically aggregated cells than occupied base cells!
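
As a cross-check of the product on this slide, here is a small Python sketch. It takes the per-level member counts exactly as they appear in the four sums (ALL level first); the names `LEVEL_SIZES`, `hierarchical_cube_cells` and `base_cube_cells` are hypothetical helpers, not part of the lecture.

```python
from math import prod

# Members per hierarchy level (ALL level first), copied from the sums on the slide.
LEVEL_SIZES = [
    [1, 5, 40, 760, 7600],
    [1, 8, 56, 112, 784],
    [1, 5, 15],
    [1, 3, 24],
]

def hierarchical_cube_cells(level_sizes):
    """Cells of the hierarchically aggregated cube: product of the per-dimension sums."""
    return prod(sum(sizes) for sizes in level_sizes)

def base_cube_cells(level_sizes):
    """Cells of the base cube: product of the leaf-level sizes."""
    return prod(sizes[-1] for sizes in level_sizes)

total_cells = hierarchical_cube_cells(LEVEL_SIZES)
base_cells = base_cube_cells(LEVEL_SIZES)
print(f"{total_cells:,} cells in total, {base_cells:,} in the base cube")
```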

6 Star-Joins
Restrictions on several dimension tables, which are then joined with the fact table.
In addition: grouping, computation of aggregates, sorting of results.
Example:
    SELECT ...
    FROM   Fact F, Customer C, DISTRIBUTION D, Product P, Time T
    WHERE  F.ProdKey = P.ProdKey AND
           F.CustKey = C.CustKey AND
           F.TimeKey = T.TimeKey AND
           F.DistKey = D.DistKey AND
           ... AND ...

7   SELECT ...
    FROM   Fact F
    WHERE  F.ProdKey BETWEEN Pkey1 AND Pkey2 AND
           F.DistKey BETWEEN Dkey1 AND Dkey2 AND
           F.CustKey BETWEEN Ckey1 AND Ckey2 AND
           F.TimeKey BETWEEN Tkey1 AND Tkey2

8 Key Question: How to compute star-joins efficiently?
Secondary indexes on foreign keys of fact table (standard B-trees), see chapter 5 for details
- intersect result lists
- retrieve tuples from fact table randomly
Bitmaps

9 Bitmap Index Intersection
[Figure: the bitmap for organization = "TM" and the bitmap for region = "Asia" are intersected over the tuples stored on Page 1 to Page 5; the result of the bitmap intersection is shown with the accessed disk pages shaded. Figure annotations: 34 % of tuples, 32 % of tuples, 10 % of tuples, 80 % of pages.]
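
The effect shown in the figure can be reproduced on a toy scale. The sketch below is only an illustration: the two bitmaps are made up (the bit patterns of the original figure are not recoverable), and the helper names `intersect_bitmaps` and `pages_to_fetch` are hypothetical.

```python
def intersect_bitmaps(bitmap_a, bitmap_b):
    """Bitwise AND of two equally long bitmaps given as '0'/'1' strings."""
    if len(bitmap_a) != len(bitmap_b):
        raise ValueError("bitmaps must cover the same tuples")
    return "".join(
        "1" if a == "1" and b == "1" else "0"
        for a, b in zip(bitmap_a, bitmap_b)
    )

def pages_to_fetch(result_bitmap, tuples_per_page):
    """0-based numbers of the pages that contain at least one hit."""
    return sorted(
        {pos // tuples_per_page
         for pos, bit in enumerate(result_bitmap) if bit == "1"}
    )

# Made-up bitmaps over 20 tuples stored on 5 pages of 4 tuples each.
organization_tm = "10110010010110100101"
region_asia = "10010011000100100100"

result = intersect_bitmaps(organization_tm, region_asia)
pages = pages_to_fetch(result, tuples_per_page=4)
print(result, pages)
```

In this made-up case only 6 of 20 tuples qualify, yet every one of the 5 pages holds a hit, because the hits are scattered rather than clustered.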

10 Problem: for small result sets of a few %, almost all pages of the fact table must be fetched from disk if the hits in the result set are not clustered on disk.
Ex: with 8 KB pages there are 20 to 400 tuples per page, i.e. at 0.25% to 5% hits in the result almost all pages must be fetched.
At least tuple clustering, preferably page clustering, is desirable, but how?
Goal: code the hierarchies in such a way that for star-joins with the fact table we only have to join with a query box on the fact table.
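
A rough way to quantify this problem is sketched below. It assumes, as a simplification that is not part of the slides, that every tuple is a hit independently with the same probability; the function name `expected_page_fraction` is hypothetical. Under that assumption a page with t tuples contains at least one hit with probability 1 - (1 - s)^t for selectivity s.

```python
def expected_page_fraction(selectivity, tuples_per_page):
    """Expected fraction of pages with at least one hit, for unclustered hits.

    Assumes each tuple qualifies independently with probability `selectivity`.
    """
    return 1.0 - (1.0 - selectivity) ** tuples_per_page

for tuples_per_page in (20, 400):
    for selectivity in (0.0025, 0.01, 0.05):
        frac = expected_page_fraction(selectivity, tuples_per_page)
        print(f"{tuples_per_page:3d} tuples/page, {selectivity:.2%} hits: "
              f"{frac:.1%} of pages")
```

Under this model, 1 % hits at 400 tuples per page already touches about 98 % of the pages, and at 5 % hits practically every page is touched.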

11 Basic Idea for Multidimensional Clustering
Example Hierarchy in Member Set Representation
[Figure: hierarchy tree. Legend: Level Label, Member Ordinal (e.g., 1), Member Label (e.g., 0.7L)]
Level 0 (All): All Products, $m_1^0$ = {OJ 0.33L; OJ 0.7L; OJ 1L; Apple Juice 0.5L; Apple Juice 1L}
Level 1 (Product Category): Orange Juice, $m_1^1$ = {OJ 0.33L; OJ 0.7L; OJ 1L}; Apple Juice, $m_2^1$ = {Apple Juice 0.5L; Apple Juice 1L}
Level 2: 0.33L, $m_1^2$ = {OJ 0.33L}; 0.7L, $m_2^2$ = {OJ 0.7L}; 1L, $m_3^2$ = {OJ 1L}; 0.5L, $m_4^2$ = {A-Juice 0.5L}; 1L, $m_5^2$ = {A-Juice 1L}

12 Dimension D consists of
Value Set $V = \{v_1, v_2, \ldots, v_n\}$
Hierarchy H of height h consisting of h+1 hierarchy levels: $H = \{L_0, L_1, \ldots, L_h\}$
Level $L_i$ is a set of sets: $L_i = \{m_1^i, \ldots, m_j^i\}$ with $m_k^i \subseteq V$
The $m_k^i$ get names, e.g. "Orange Juice" as $label(m_1^1)$, in general $label(m_k^i)$
Constraint: every $m_l^{i+1}$ must be a subset of some $m_k^i$

13 Hierarchic Relationships
The children of $m_k^i$ are all those sets $m_l^{i+1}$ of the lower level i+1 with the property $m_l^{i+1} \subseteq m_k^i$, formally:
$$children(m_k^i) := \{ m_l^{i+1} \in L_{i+1} : m_l^{i+1} \subseteq m_k^i \}$$
$$parent(m_k^i) := \{ m_l^{i-1} \in L_{i-1} : m_l^{i-1} \supseteq m_k^i \}$$
Principle: the children of m are numbered by the bijective function $ord_m$, starting at 1 or 0

15 Enumeration and Surrogate Functions
Let A be an enumeration type $A = \{a_0, a_1, \ldots, a_k\}$
$f : A \to \{0, 1, \ldots, k\}$ defined as $f(a_i) = i$
then i is called the surrogate of $a_i$

16 Hierarchies and composite Surrogates
Basic Idea: concatenate the surrogates of successive hierarchy levels (compound surrogates cs)
Note: the root ALL of the hierarchy is not encoded
Def: compound surrogate cs for hierarchy H
$$ord_m : children(m) \to \{0, 1, \ldots, |children(m)| - 1\}$$
$$cs(H, m^i) := \begin{cases} ord_{father(m^i)}(m^i) & \text{if } i = 1 \\ cs(H, father(m^i)) \circ ord_{father(m^i)}(m^i) & \text{otherwise} \end{cases}$$
where $\circ$ denotes concatenation.
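
The recursive definition translates almost literally into code. The following Python sketch represents a compound surrogate as a tuple of ordinals; the `MEMBERS` table is a hypothetical excerpt of the CUSTOMER hierarchy whose ordinals follow the example path North America --> USA --> Retail --> Bar from these slides, and keying members by label alone is a simplification for illustration.

```python
# Hypothetical excerpt of the CUSTOMER hierarchy: each member maps to
# (father, ordinal among the father's children). Keying by label alone is a
# simplification; real hierarchies repeat labels such as "Retail" per nation.
MEMBERS = {
    "North America": ("ALL", 4),
    "USA": ("North America", 1),
    "Retail": ("USA", 1),
    "Bar": ("Retail", 2),
}

def compound_surrogate(member):
    """Concatenate the ordinals along the path from level 1 down to `member`."""
    father, ordinal = MEMBERS[member]
    if father == "ALL":
        # Level 1: the root ALL itself is not encoded.
        return (ordinal,)
    return compound_surrogate(father) + (ordinal,)

print(compound_surrogate("Bar"))
```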

17 Example:
REGION              f(REGION)
South Europe        0
Middle Europe       1
Northern Europe     2
Western Europe      3
North America       4
Latin America       5
Asia                6
Australia           7
(a)

18 [Figure: CUSTOMER hierarchy tree annotated with surrogates. Recoverable node labels: CUSTOMER; South Europe, North America, Asia, Australia; USA, Canada; Retail, Wholesale; Bar; Kana's Sushi Bar, Joe's Sports Bar]
Surrogates for Region and the entire Customer Hierarchy

19 Example: the path North America --> USA --> Retail --> Bar has the compound surrogate $4 \circ 1 \circ 1 \circ 2$
Next Idea: for every hierarchy level determine the highest branching degree (plus a safety margin for future extensions) and code it by a fixed number of bits.
$$surrogates(H, i) := \max \{ cardinality(children(H, m)) : m \in level(H, i-1) \}$$

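20 let $l_i := \lceil \log_2 surrogates(H, i) \rceil$; then $l_i$ bits are needed for the surrogates of level i
let $p$ be a path $p = m^0 \to m^1 \to m^2 \to \ldots \to m^h$ to a leaf $m^h$ of hierarchy H:

21 $$cs(H, p) = cs(H, m^h) := \sum_{i=1}^{h} ord_{father(m^i)}(m^i) \cdot 2^{\sum_{j=i+1}^{h} l_j}$$
$$= ord_{father(m^1)}(m^1) \cdot 2^{l_2 + \ldots + l_h} + \ldots + ord_{father(m^{h-1})}(m^{h-1}) \cdot 2^{l_h} + ord_{father(m^h)}(m^h)$$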

22 Example: $cs(H, \text{Bar}) = 100\ 001\ 1\ 010_2 = 538$
$l_1 = 3$, $l_2 = 3$, $l_3 = 1$, $l_4 = 3$ (number of bits needed at each level)
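
The fixed-length encoding of this example can be sketched in a few lines of Python. The branching degrees 8, 7, 2 and 7 are taken from the CUSTOMER levels in Fig. 3.1 under the assumption that they are the maximum branching degrees per level (no safety margin added); `bits_per_level` and `encode` are illustrative names only.

```python
def bits_per_level(max_children):
    """l_i = ceil(log2(n_i)) for each level, computed with integer arithmetic."""
    return [(n - 1).bit_length() for n in max_children]

def encode(ordinals, bit_lengths):
    """Pack the ordinals of a path into one integer, most significant level first."""
    cs = 0
    for ordinal, bits in zip(ordinals, bit_lengths):
        if not 0 <= ordinal < (1 << bits):
            raise ValueError(f"ordinal {ordinal} does not fit into {bits} bits")
        cs = (cs << bits) | ordinal
    return cs

# Assumed maximum branching degrees: Region, Nation, TradeType, BusinessType.
bit_lengths = bits_per_level([8, 7, 2, 7])

# Path North America --> USA --> Retail --> Bar with ordinals 4, 1, 1, 2.
cs_bar = encode([4, 1, 1, 2], bit_lengths)
print(bit_lengths, format(cs_bar, "010b"), cs_bar)
```

Shifting and OR-ing is equivalent to the sum of ordinals weighted by powers of two, since each ordinal is checked to fit into its bit field.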

23 Properties of MHC Encoding
- very compact coding of fixed length
- lexicographic order of composite keys remains, i.e. isomorphic to integer ordering
- point restrictions on arbitrary hierarchy levels lead to interval restrictions on the compound surrogates

24 Example: path to USA is North America --> USA
$4 = 100_2$, $1 = 001_2$
leads to the range on cs: $100\ 001\ 0\ 000_2$ to $100\ 001\ 1\ 111_2$
and to the decimal range 528 to 543, or [528 : 543]
==> star join with restriction North America.USA leads to an interval restriction on the fact table
==> point restrictions on arbitrary hierarchy levels of several dimensions lead to Query Boxes on the fact table.
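
The mapping from a point restriction on an inner hierarchy level to an interval of compound surrogates can be sketched as follows. This is a minimal Python illustration using the bit lengths 3, 3, 1, 3 from the example; the function name `prefix_range` is hypothetical.

```python
def prefix_range(prefix_ordinals, bit_lengths):
    """Interval [low, high] of compound surrogates that share the given path prefix."""
    prefix = 0
    for ordinal, bits in zip(prefix_ordinals, bit_lengths):
        prefix = (prefix << bits) | ordinal
    # Bits of all levels below the restricted one are free: 0...0 up to 1...1.
    free_bits = sum(bit_lengths[len(prefix_ordinals):])
    low = prefix << free_bits
    high = low | ((1 << free_bits) - 1)
    return low, high

BIT_LENGTHS = [3, 3, 1, 3]

# Restriction North America.USA, i.e. ordinals 4 and 1 on levels 1 and 2.
usa_range = prefix_range([4, 1], BIT_LENGTHS)
print(usa_range)
```

Applying the same function to one restriction per dimension yields one interval per dimension, and together these intervals form the query box on the fact table.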

25 Complex Hierarchies
Time with months and weeks: both restrictions lead to intervals on the level of days (example: Fig. 4-4).
Proposal for multiple hierarchies: choose the most useful one (depending on the query profile), or consider multiple hierarchies as several independent hierarchies. Caution, this increases the number of dimensions!
Time-variant hierarchies: extend by the time interval of validity, see the example in Fig. 4-5.

26 Fig. 4-4 Complex Hierarchy Graphs
[Figure: (a) graph over YEAR, MONTH, WEEK, DAY; (b) graph over REGION, NATION, TRADE TYPE, CUSTOMER TYPE, CUSTOMER SIZE, CUSTOMER]

27 Fig. 4-5 Change of a hierarchy over time
[Figure: CUSTOMER hierarchy with the nodes South Europe, North America, ...; USA, Canada; Retail, Wholesale; Bar, Restaurant; Joe's Sports Bar; branches labelled Year <= 1997 and Year > 1997]

28 [Figure; recoverable labels: Orange Juice, Asia]

29 Processing a query box in sort order with the Tetris algorithm
[Figure; recoverable labels: Apple Juice, Asia]

