Chapt. 7 Multidimensional Hierarchical Clustering


1 Fig. 3.1 Hierarchies in the "Juice and More" schema
TIME: All Time --> Year (3) --> Month (12)
CUSTOMER: All Customer --> Region (8) --> Nation (7) --> Trade Type (2) --> Business Type (7)
PRODUCT: All Products --> Type (5) --> Brand (8) --> Category (19) --> Container (10)
DISTRIBUTION: All Distributions --> Sales Organization (5) --> Distribution Channel (3)
Prof. Bayer, DWH, Ch.7, SS2000

2 Star schema of the "Juice and More" database
FACT (26M rows): PRODKEY, CUSTKEY, DISTKEY, TIMEKEY, SALES, DISTCOST
PRODUCT (2180 rows): TYPE, BRAND, CATEGORY, CONTAINER, ...
CUSTOMER (7064 rows): REGION, NATION, TRADE-TYPE, BUSINESS-TYPE
DISTRIBUTION (12 rows): SALESORG, CHANNEL
TIME (36 rows): YEAR, MONTH

3 Size of completely aggregated Cube
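$$\frac{(6\cdot 9\cdot 20\cdot 11)\cdot(9\cdot 8\cdot 3\cdot 8)\cdot(6\cdot 4)\cdot(4\cdot 13)}{(5\cdot 8\cdot 19\cdot 10)\cdot(8\cdot 7\cdot 2\cdot 7)\cdot(5\cdot 3)\cdot(3\cdot 12)} = \frac{4\cdot 6\cdot 6\cdot 9\cdot 11\cdot 13}{5\cdot 5\cdot 7\cdot 7\cdot 19} \approx 7.96$$
Every hierarchy level gets one additional aggregate member, so the completely aggregated cube is about 8 times larger than the base cube.
Base cube: number of cells × 4 B ≈ 9 GB
Number of available facts: 26 million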
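The ratio above can be checked mechanically. A minimal sketch, assuming (as the formula does) that complete aggregation turns a level with fan-out n into n + 1 slots, and using the level sizes of Fig. 3.1; the names `FANOUTS` and `aggregation_ratio` are illustrative only.

```python
from fractions import Fraction
from math import prod

# Per-level fan-outs from Fig. 3.1.
FANOUTS = {
    "PRODUCT": [5, 8, 19, 10],
    "CUSTOMER": [8, 7, 2, 7],
    "DISTRIBUTION": [5, 3],
    "TIME": [3, 12],
}


def aggregation_ratio(fanouts):
    """Size of the completely aggregated cube divided by the base cube size.

    Each level with fan-out n gets one extra aggregate member, i.e. n + 1 slots.
    """
    levels = [n for dim in fanouts.values() for n in dim]
    return Fraction(prod(n + 1 for n in levels), prod(levels))


ratio = aggregation_ratio(FANOUTS)
print(ratio, round(float(ratio), 2))  # 185328/23275 7.96
```

Using `Fraction` keeps the result exact, so it can be compared directly with the reduced fraction on the slide.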

4 Sparsity
$$\frac{26 \cdot 10^6}{2.245 \cdot 10^9} \approx 0.0116 = 1.16\,\%$$
i.e. only about 1.16 % of the base cube cells are occupied by facts.

5 Hierarchically aggregated Cube
$$P = (1+5+40+760+7600)\cdot(1+8+56+112+784)\cdot(1+5+15)\cdot(1+3+24) = 8406 \cdot 961 \cdot 21 \cdot 28$$
Number of aggregate cells = P - size of base cube ≈ 4.75 · 10^9 - 2.245 · 10^9 ≈ 2.5 · 10^9
==> The Juice and More database has 96 times more hierarchically aggregated cells than occupied base cells (2.5 · 10^9 / 26 · 10^6 ≈ 96)!
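Each factor of P is 1 (the "All" cell) plus the number of members on every level, and the member counts are running products of the fan-outs. A short sketch, assuming the fan-outs of Fig. 3.1; it reproduces the PRODUCT, CUSTOMER and DISTRIBUTION factors, while the TIME factor (1+3+24) is taken from the slide as given. Function names are hypothetical.

```python
from itertools import accumulate
from operator import mul


def level_member_counts(fanouts):
    """Members per level below 'All': running products of the fan-outs."""
    return list(accumulate(fanouts, mul))


def hierarchical_factor(fanouts):
    """1 for the 'All' cell plus the members of every hierarchy level."""
    return 1 + sum(level_member_counts(fanouts))


print(level_member_counts([5, 8, 19, 10]))  # [5, 40, 760, 7600]
print(hierarchical_factor([5, 8, 19, 10]))  # 8406 (PRODUCT)
print(hierarchical_factor([8, 7, 2, 7]))    # 961  (CUSTOMER)
print(hierarchical_factor([5, 3]))          # 21   (DISTRIBUTION)
```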

6 Star-Joins
Restrictions on several dimension tables, which are then joined with the fact table. In addition: grouping, computation of aggregates, sorting of results.
Example:
    Select <MEASURE AGGREGATION>
    From Fact F, Customer C, Distribution D, Product P, Time T
    Where F.ProdKey = P.ProdKey AND F.CustKey = C.CustKey
      AND F.TimeKey = T.TimeKey AND F.DistKey = D.DistKey
      AND <CUSTOMER RESTRICTION> AND <DISTRIBUTION RESTRICTION>
      AND <PRODUCT RESTRICTION> AND <TIME RESTRICTION>

7 Star join as a query box on the fact table
    Select <MEASURE AGGREGATION>
    From Fact F
    Where F.ProdKey BETWEEN PKey1 AND PKey2
      AND F.DistKey BETWEEN DKey1 AND DKey2
      AND F.CustKey BETWEEN CKey1 AND CKey2
      AND F.TimeKey BETWEEN TKey1 AND TKey2

8 Key question: how to compute star-joins efficiently?
- Secondary indexes on the foreign keys of the fact table (standard B-trees, see chapter 5 for details): intersect the result lists, then retrieve the tuples from the fact table randomly
- Bitmaps
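Intersecting the result lists of several B-tree lookups amounts to intersecting sorted lists of tuple identifiers. A minimal merge-based sketch, assuming each index lookup returns an ascending list of hypothetical integer row IDs; the surviving IDs would then be fetched from the fact table one by one, which is the random access mentioned above.

```python
def intersect_sorted(a, b):
    """Merge-intersect two ascending lists of row IDs in O(len(a) + len(b))."""
    i, j, out = 0, 0, []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def intersect_all(lists):
    """Intersect the result lists of several index lookups (one per restricted dimension)."""
    result = lists[0]
    for other in lists[1:]:
        result = intersect_sorted(result, other)
    return result


# Hypothetical row-ID lists returned by two secondary indexes.
by_customer = [2, 5, 9, 14, 21, 30]
by_product = [1, 5, 14, 22, 30, 31]
print(intersect_all([by_customer, by_product]))  # [5, 14, 30]
```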

9 Bitmap Index Intersection
Figure: the bitmap for organization = "TM" (34 % of tuples) AND the bitmap for region = "Asia" (32 % of tuples) give the result of the bitmap intersection (10 % of tuples); these hits lie on 80 % of the disk pages (shaded; Pages 1 to 5 shown), which must all be accessed.
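With bitmaps the intersection is a bitwise AND. A small sketch using Python integers as bitmaps; the row IDs and the 4 tuples per page are made-up illustration data, not the figure's. It also counts the pages holding at least one hit, since those pages determine the I/O cost.

```python
def bitmap_from_rows(rows):
    """Build a bitmap (Python int) with bit r set for every row ID r."""
    bm = 0
    for r in rows:
        bm |= 1 << r
    return bm


def set_bits(bm):
    """Return the row IDs whose bits are set, in ascending order."""
    rows, r = [], 0
    while bm:
        if bm & 1:
            rows.append(r)
        bm >>= 1
        r += 1
    return rows


def pages_touched(rows, tuples_per_page):
    """Distinct page numbers that must be read to fetch the given rows."""
    return sorted({r // tuples_per_page for r in rows})


organization_tm = bitmap_from_rows([0, 3, 4, 7, 9, 12, 15, 18])  # hypothetical
region_asia = bitmap_from_rows([3, 5, 9, 11, 15, 19])            # hypothetical
hits = set_bits(organization_tm & region_asia)
print(hits)                    # [3, 9, 15]
print(pages_touched(hits, 4))  # [0, 2, 3]
```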

10 Problem: for small result sets of only a few percent, almost all pages of the fact table must be fetched from disk if the hits in the result set are not clustered on disk. Example: with 8 KB pages holding 20 to 400 tuples per page, a hit rate of 0.25 % to 5 % (about one hit per page on average) already means that almost all pages must be fetched. At least tuple clustering, preferably page clustering, is desirable, but how?
Goal: encode the hierarchies in such a way that a star join with the fact table reduces to a query box on the fact table.
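How many pages such a hit rate touches can be estimated under a simple assumption that is not on the slide: hits are spread independently, each tuple being a hit with probability p. A page with k tuples then holds at least one hit with probability 1 - (1 - p)^k. At p = 1/k this is already roughly two thirds of all pages, and at p = 3/k about 95 %.

```python
def fraction_of_pages_touched(hit_rate, tuples_per_page):
    """Expected share of pages that contain at least one hit.

    Assumes every tuple is a hit independently with probability hit_rate,
    i.e. the hits are not clustered on disk.
    """
    return 1.0 - (1.0 - hit_rate) ** tuples_per_page


for k in (20, 400):          # tuples per 8 KB page, as on the slide
    for multiple in (1, 3):
        p = multiple / k     # 1/k means about one hit per page on average
        share = fraction_of_pages_touched(p, k)
        print(f"{k:>3} tuples/page, hit rate {p:.2%}: {share:.0%} of pages read")
```

Clustering breaks the independence assumption in the favorable direction: if all hits are stored together, the same hit rate touches only about p times the number of pages.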

11 Basic Idea for Multidimensional Clustering
Figure: Example Hierarchy in Member Set Representation
Level 0 (All Products): {OJ 0.33L, OJ 0.7L, OJ 1L, A-Juice 0.5L, A-Juice 1L}
Level 1 (Product Category): $m^1_1$ = {OJ 0.33L, OJ 0.7L, OJ 1L} "Orange Juice", $m^1_2$ = {A-Juice 0.5L, A-Juice 1L} "Apple Juice"
Level 2: $m^2_1$ = {OJ 0.33L}, $m^2_2$ = {OJ 0.7L}, $m^2_3$ = {OJ 1L}, $m^2_4$ = {A-Juice 0.5L}, $m^2_5$ = {A-Juice 1L}
Legend: in $m^i_k$ the superscript i is the level label and the subscript k the member ordinal (e.g. 1); each member also has a member label (e.g. 0.7L).

12 Dimension D consists of a value set $V = \{v_1, v_2, \dots, v_n\}$.
Hierarchy H of height h consists of h+1 hierarchy levels: $H = [L_0, L_1, \dots, L_h]$
Level $L_i$ is a set of sets $L_i = \{m^i_1, \dots, m^i_j\}$ with $m^i_k \subseteq V$.
The $m^i_k$ get names, e.g. "Orange Juice" as $\text{label}(m^1_1)$, in general $\text{label}(m^i_k)$.
Constraint: every $m^{i+1}_l$ must be a subset of some $m^i_k$.
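The member-set definition can be written down almost literally. A sketch, assuming members are represented as `frozenset`s of value strings and using the juice example of slide 11 ("OJ 0.33L" etc. are shorthand labels); `is_valid_hierarchy` is a hypothetical helper that checks the subset constraint.

```python
# Value set V of the PRODUCT dimension in the juice example.
V = frozenset({"OJ 0.33L", "OJ 0.7L", "OJ 1L", "A-Juice 0.5L", "A-Juice 1L"})

# Hierarchy H = [L0, L1, L2]; each level is a list of members (subsets of V).
H = [
    [V],                                           # L0: All Products
    [frozenset({"OJ 0.33L", "OJ 0.7L", "OJ 1L"}),  # m^1_1 "Orange Juice"
     frozenset({"A-Juice 0.5L", "A-Juice 1L"})],   # m^1_2 "Apple Juice"
    [frozenset({v}) for v in
     ("OJ 0.33L", "OJ 0.7L", "OJ 1L", "A-Juice 0.5L", "A-Juice 1L")],  # L2
]


def is_valid_hierarchy(levels, values):
    """True if all members are subsets of V and every member of level i+1
    is a subset of some member of level i."""
    if any(not m <= values for level in levels for m in level):
        return False
    return all(
        any(child <= parent for parent in upper)
        for upper, lower in zip(levels, levels[1:])
        for child in lower
    )


print(is_valid_hierarchy(H, V))  # True
```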

13 Hierarchic Relationships
The children of $m^i_k$ are all those sets $m^{i+1}_l$ of the lower level i+1 with the property $m^{i+1}_l \subseteq m^i_k$; formally:
$$\text{children}(m^i_k) := \{\, m^{i+1}_l \in L_{i+1} : m^{i+1}_l \subseteq m^i_k \,\}$$
$$\text{parent}(m^i_k) := \{\, m^{i-1}_l \in L_{i-1} : m^{i-1}_l \supseteq m^i_k \,\}$$
Principle: the children of m are numbered by the bijective function $\text{ord}_m$, starting at 1 or 0.
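A sketch of children, parent and $\text{ord}_m$ on the same juice example, assuming members are `frozenset`s and that the numbering follows the order in which members are listed in their level (any bijection would do). All names are illustrative.

```python
V = frozenset({"OJ 0.33L", "OJ 0.7L", "OJ 1L", "A-Juice 0.5L", "A-Juice 1L"})
ORANGE = frozenset({"OJ 0.33L", "OJ 0.7L", "OJ 1L"})
APPLE = frozenset({"A-Juice 0.5L", "A-Juice 1L"})
LEVELS = [
    [V],
    [ORANGE, APPLE],
    [frozenset({v}) for v in
     ("OJ 0.33L", "OJ 0.7L", "OJ 1L", "A-Juice 0.5L", "A-Juice 1L")],
]


def children(levels, i, m):
    """children(m) for a member m of level i: members of level i+1 contained in m."""
    return [c for c in levels[i + 1] if c <= m] if i + 1 < len(levels) else []


def parent(levels, i, m):
    """parent(m) for a member m of level i: members of level i-1 containing m."""
    return [p for p in levels[i - 1] if p >= m] if i > 0 else []


def ord_map(levels, i, m, start=1):
    """A bijective numbering ord_m of children(m), starting at 1 (or 0)."""
    return {c: n for n, c in enumerate(children(levels, i, m), start=start)}


print(len(children(LEVELS, 1, ORANGE)))                         # 3
print(parent(LEVELS, 2, frozenset({"A-Juice 1L"})) == [APPLE])  # True
```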

14 Enumeration and Surrogate Functions
Let A be an enumeration type $A = \{a_0, a_1, \dots, a_k\}$ and let $f : A \to \{0, 1, \dots, k\}$ be defined as $f(a_i) = i$; then i is called the surrogate of $a_i$.

15 Hierarchies and composite Surrogates
Basic idea: concatenate the surrogates of successive hierarchy levels (compound surrogates, cs).
Note: the root ALL of the hierarchy is not encoded.
Def: compound surrogate cs for hierarchy H, with $\text{ord}_m : \text{children}(m) \to \{0, 1, \dots, |\text{children}(m)| - 1\}$:
$$\text{cs}(H, m^i) := \begin{cases} \text{ord}_{\text{father}(m^i)}(m^i) & \text{if } i = 1 \\ \text{cs}(H, \text{father}(m^i)) \circ \text{ord}_{\text{father}(m^i)}(m^i) & \text{otherwise} \end{cases}$$
where $\circ$ denotes concatenation.
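The recursive definition of cs can be sketched with parent pointers and per-parent ordinals. The tree below is a hypothetical fragment that contains only the path North America --> USA --> Retail --> Bar, with the ordinals 4, 1, 1, 2 of the example path on slide 18; node names are assumed to be unique in this fragment, and the root ALL receives no code.

```python
ROOT = "ALL"

# Hypothetical CUSTOMER fragment: node -> (father, ord_father(node)).
TREE = {
    "North America": (ROOT, 4),
    "USA": ("North America", 1),
    "Retail": ("USA", 1),
    "Bar": ("Retail", 2),
}


def cs(node):
    """Compound surrogate of a node as a tuple of ordinals.

    Level 1 (father is ALL): just the node's own ordinal.
    Otherwise: cs(father) concatenated with the node's ordinal.
    """
    father, ordinal = TREE[node]
    if father == ROOT:
        return (ordinal,)
    return cs(father) + (ordinal,)


print(cs("USA"))  # (4, 1)
print(cs("Bar"))  # (4, 1, 1, 2)
```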

16 Example: f(REGION)
South Europe 0
Middle Europe 1
Northern Europe 2
Western Europe 3
North America 4
Latin America 5
Asia 6
Australia 7
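An enumeration type together with its surrogate function maps naturally onto Python's `IntEnum`. A minimal sketch for the REGION example; the member identifiers are simply upper-case forms of the region names.

```python
from enum import IntEnum


class Region(IntEnum):
    """f(REGION): the surrogate of a region is its position in the enumeration."""
    SOUTH_EUROPE = 0
    MIDDLE_EUROPE = 1
    NORTHERN_EUROPE = 2
    WESTERN_EUROPE = 3
    NORTH_AMERICA = 4
    LATIN_AMERICA = 5
    ASIA = 6
    AUSTRALIA = 7


print(int(Region.NORTH_AMERICA))  # 4
print(Region(6).name)             # ASIA
```

`Region(i)` gives the inverse mapping from surrogate back to enumeration value.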

17 Surrogates for Region and the entire Customer Hierarchy
Figure: CUSTOMER hierarchy tree showing regions (South Europe, North America = 4, Asia = 6, Australia = 7, ...), the nations USA and Canada, the trade types Retail and Wholesale, and customers such as Kana's Sushi Bar and Joe's Sports Bar.

18 North America --> USA --> Retail --> Bar
Example: the path North America --> USA --> Retail --> Bar has the compound surrogate 4∘1∘1∘2.
Next idea: for every hierarchy level, determine the highest branching degree (plus a safety margin for future extensions) and encode the level with a fixed number of bits:
$$\text{surrogates}(H, i) := \max\{\, |\text{children}(H, m)| : m \in \text{level}(H, i-1) \,\}$$
so level i needs at least $\lceil \log_2 \text{surrogates}(H, i) \rceil$ bits.
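Fixed-length coding can be sketched as follows. The widths assume no safety margin, i.e. the bit length of the largest surrogate for the CUSTOMER fan-outs of Fig. 3.1: Region (8), Nation (7), Trade Type (2), Business Type (7), giving 3 + 3 + 1 + 3 = 10 bits. `bits_for` and `pack` are hypothetical helpers.

```python
def bits_for(fanout):
    """Bits needed to store the surrogates 0 .. fanout-1."""
    return (fanout - 1).bit_length()


def pack(surrogates, widths):
    """Concatenate per-level surrogates into one fixed-length integer key
    (the first level ends up in the most significant bits)."""
    if len(surrogates) != len(widths):
        raise ValueError("need exactly one surrogate per level")
    key = 0
    for s, w in zip(surrogates, widths):
        if not 0 <= s < (1 << w):
            raise ValueError(f"surrogate {s} does not fit into {w} bits")
        key = (key << w) | s
    return key


CUSTOMER_FANOUTS = [8, 7, 2, 7]  # Region, Nation, Trade Type, Business Type
WIDTHS = [bits_for(n) for n in CUSTOMER_FANOUTS]
print(WIDTHS)                    # [3, 3, 1, 3]
key = pack([4, 1, 1, 2], WIDTHS) # North America --> USA --> Retail --> Bar
print(key, format(key, "010b"))  # 538 1000011010
```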

19 Handwritten page 6.6 ??? Problem with duplicate indices?

20 Properties of MHC Encoding
- very compact coding of fixed length
- the lexicographic order of composite keys is preserved, i.e. it is isomorphic to the integer ordering
- point restrictions on arbitrary hierarchy levels lead to interval restrictions on the compound surrogates
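The order property can be checked by brute force: with fixed widths per level, sorting surrogate tuples lexicographically gives the same order as sorting their packed integers. A small sketch over all 32 tuples of a hypothetical two-level hierarchy with 2-bit and 3-bit levels.

```python
from itertools import product

WIDTHS = (2, 3)  # hypothetical bit widths of two hierarchy levels


def pack(surrogates):
    """Concatenate fixed-width surrogates into one integer (first level most significant)."""
    key = 0
    for s, w in zip(surrogates, WIDTHS):
        key = (key << w) | s
    return key


tuples = list(product(*(range(1 << w) for w in WIDTHS)))
by_tuple = sorted(tuples)
by_key = sorted(tuples, key=pack)
print(by_tuple == by_key)  # True: lexicographic order == integer order
```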

21 Example: the path to USA is North America --> USA, with $4 = 100_2$ and $1 = 001_2$.
With 3 + 3 + 1 + 3 bits for Region, Nation, Trade Type and Business Type, this leads to the range $1000010000_2$ to $1000011111_2$ on cs, i.e. the decimal range 528 to 543, or [528 : 543].
==> A star join with the restriction North America.USA leads to an interval restriction on the fact table.
==> Point restrictions on arbitrary hierarchy levels of several dimensions lead to query boxes on the fact table.
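Turning a point restriction on an upper level into an interval amounts to filling the remaining lower-level bits with all zeros and all ones. A sketch assuming the CUSTOMER widths 3, 3, 1, 3; `prefix_interval` is a hypothetical helper, and it reproduces the range for North America.USA.

```python
WIDTHS = [3, 3, 1, 3]  # Region, Nation, Trade Type, Business Type (assumed)


def prefix_interval(prefix, widths=WIDTHS):
    """Closed interval [lo, hi] of compound surrogates below a path prefix.

    The prefix surrogates occupy the high-order bits; the remaining
    lower levels range over all bit patterns from 00..0 to 11..1.
    """
    key = 0
    for s, w in zip(prefix, widths):
        key = (key << w) | s
    free_bits = sum(widths[len(prefix):])
    lo = key << free_bits
    hi = lo | ((1 << free_bits) - 1)
    return lo, hi


print(prefix_interval([4, 1]))  # (528, 543): North America.USA
print(prefix_interval([4]))     # (512, 639): all of North America
```

Taking one such interval per restricted dimension, and the full key range for unrestricted ones, yields the query box of the star join.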

22 Complex Hierarchies
- Time with months and weeks: both restrictions lead to intervals on the level of days (example: Fig. 4-4).
- Proposal for multiple hierarchies: choose the most useful one (depending on the query profile), or consider multiple hierarchies as several independent hierarchies. Caution: this increases the number of dimensions!
- Time-variant hierarchies: extend by a time interval of validity, see the example in Fig. 4-5.
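That month and week restrictions both become day intervals can be sketched with Python's `datetime` module, using the proleptic Gregorian ordinal (`date.toordinal`) as a stand-in for a day surrogate. ISO weeks are an assumption here; the slide does not say which week definition is meant.

```python
import calendar
from datetime import date, timedelta


def month_days(year, month):
    """Closed interval of day numbers (proleptic Gregorian ordinals) in a month."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1).toordinal(), date(year, month, last_day).toordinal()


def iso_week_days(year, week):
    """Closed interval of day numbers in an ISO week (Monday to Sunday)."""
    monday = date.fromisocalendar(year, week, 1)
    return monday.toordinal(), (monday + timedelta(days=6)).toordinal()


march = month_days(1998, 3)
week_14 = iso_week_days(1998, 14)
print(march[1] - march[0] + 1)  # 31 days
print(date.fromordinal(week_14[0]), date.fromordinal(week_14[1]))
```

ISO week 14 of 1998 runs from 30 March to 5 April, so a week interval can overlap two month intervals without being contained in either.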

23 Complex Hierarchy Graphs
Figure (panels (a) and (b)): a customer hierarchy graph with the levels REGION, NATION, TRADE TYPE, CUSTOMER TYPE, CUSTOMER SIZE and CUSTOMER, and a time hierarchy graph with the levels YEAR, MONTH, WEEK and DAY.

24 Change of a hierarchy over time
Figure: CUSTOMER hierarchy (South Europe, North America, ...; Canada, USA; Retail, Wholesale; Bar, Restaurant) in which Joe's Sports Bar is classified differently for Year <= 1997 and Year > 1997.

25 Figure with the labels Orange Juice and Asia

26 Processing a query box in sort order with the Tetris algorithm
Figure with the labels Apple Juice and Asia

