Achieving Scalability in OLAP Materialized View Selection Thomas P. Nadeau Toby J. Teorey University of Michigan DOLAP 2002.


1 Achieving Scalability in OLAP Materialized View Selection Thomas P. Nadeau Toby J. Teorey University of Michigan DOLAP 2002

2 Topics
– Overview of OLAP
– Exponentiality in View Selection
– Our Polynomial Greedy Algorithm (PGA)
– Test Results
– Conclusions
– Current Work

3 Example Star Schema
– Sell (fact table): CustID, DateID, BindID, Cost
– Calendar: DateID, Month, Quarter, Year
– Customer: CustID, Name, City, State/Prov
– Bind Style: BindID, Desc

4 Star Schema Viewed with Data
Calendar
  DateID    Month  Quarter  Year
  1/1/98    Jan    1        1998
  1/2/98    Jan    1        1998
  12/31/00  Dec    4        2000
Customer
  CustID  Name         City       State/Prov
  00001   U of M       Ann Arbor  MI
  00002   Smith & Co.  Toronto    Ont
Bind Style
  BindID  Desc
  PB      Paper Back
  HC      Hard Cover
Sell (fact table, many rows)
  CustID  DateID  BindID  Cost
  $6000000212/31/00PB$500 $1300002221/1/99HC$1100

5 Eight Dimensions of Book Database
  Attribute     Hierarchy Levels
  Trim Width    4
  Trim Length   4
  Pages         4
  Quantity      4
  Stock Width   4
  Stock Length  4
  Bind Style    4
  Press         4

6 Combinatorial Explosion
Possible views = $\prod_{i=1}^{d} \ell_i$, where d = |dimensions| and $\ell_i$ = |levels| in dimension i
Book database example
– 2 dimensions, $4^2$ = 16 views
– 4 dimensions, $4^4$ = 256 views
– 6 dimensions, $4^6$ = 4,096 views
– 8 dimensions, $4^8$ = 65,536 views
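The view count above is a plain product of the per-dimension level counts. A minimal sketch, assuming each dimension is described only by its number of hierarchy levels (the function name `count_possible_views` is illustrative, not from the slides):

```python
from math import prod


def count_possible_views(levels_per_dimension):
    """Return the number of candidate views: the product of the level counts."""
    return prod(levels_per_dimension)


# Book database example: every dimension has 4 hierarchy levels.
for d in (2, 4, 6, 8):
    print(d, "dimensions:", count_possible_views([4] * d), "views")
```

Each added dimension multiplies the count by its number of levels, which is the growth the slide calls a combinatorial explosion.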

7 Recap
– Materialized views quicken query responses
– Disk space limits view materialization
– Update window is a constraint
– Solution: Select strategic views

8 Our OLAP Optimization Approach
[Diagram: four components (View Size Estimation, View Selection, View Maintenance, Query Optimization) connecting the Fact Table, Update, and Users. Flow labels: Initial Data, Sample Data, Estimate Request, Estimated View Size, Strategic Views, Current Views, Incremental Data, Queries, Quick Responses. Legend: Completed Work, Current Work.]

9 View Selection: Example of Hypercube Lattice [HRU96]
p = Part, s = Supplier, c = Customer
View sizes (rows), by lattice level:
– {c, p, s}: 6M
– {p, s}: 0.8M; {c, s}: 6M; {c, p}: 6M
– {s}: 0.01M; {p}: 0.2M; {c}: 0.1M
– {}: 1
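The lattice on this slide can be written down as a small data structure. The sketch below is one possible encoding, assuming each view is a `frozenset` of dimension letters and that a view can answer any query over a subset of its attributes; the names `VIEW_SIZES`, `all_views`, and `descendants` are illustrative.

```python
from itertools import combinations

DIMENSIONS = ("c", "p", "s")  # customer, part, supplier

# Row counts copied from the slide, written out as integers.
VIEW_SIZES = {
    frozenset("cps"): 6_000_000,
    frozenset("ps"): 800_000,
    frozenset("cs"): 6_000_000,
    frozenset("cp"): 6_000_000,
    frozenset("s"): 10_000,
    frozenset("p"): 200_000,
    frozenset("c"): 100_000,
    frozenset(): 1,
}


def all_views(dimensions):
    """Every subset of the dimensions, i.e. every node of the hypercube lattice."""
    return [
        frozenset(combo)
        for r in range(len(dimensions), -1, -1)
        for combo in combinations(dimensions, r)
    ]


def descendants(view, dimensions=DIMENSIONS):
    """Views answerable from `view`: all subsets of its attributes, itself included."""
    return [v for v in all_views(dimensions) if v <= view]
```

With three dimensions the lattice has eight nodes, and `descendants(frozenset("ps"))` returns the four views {p, s}, {p}, {s}, and {}.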

10 Example of HRU Algorithm [HRU96]
p = Part, s = Supplier, c = Customer
View sizes (rows), by lattice level: {c, p, s}: 6M; {p, s}: 0.8M, {c, s}: 6M, {c, p}: 6M; {s}: 0.01M, {p}: 0.2M, {c}: 0.1M; {}: 1
Iteration 1, benefits of possible materialization choices:
– {p, s}: 5.2M x 4 = 20.8M
– {c, s} and {c, p}: 0 x 4 = 0 each
– {s}: 5.99M x 2 = 11.98M
– {p}: 5.8M x 2 = 11.6M
– {c}: 5.9M x 2 = 11.8M
– {}: 6M - 1

11 Example of HRU
p = Part, s = Supplier, c = Customer
View sizes (rows), by lattice level: {c, p, s}: 6M; {p, s}: 0.8M, {c, s}: 6M, {c, p}: 6M; {s}: 0.01M, {p}: 0.2M, {c}: 0.1M; {}: 1
Iteration 1, benefits of possible materialization choices:
– {p, s}: 5.2M x 4 = 20.8M
– {c, s} and {c, p}: 0 x 4 = 0 each
– {s}: 5.99M x 2 = 11.98M
– {p}: 5.8M x 2 = 11.6M
– {c}: 5.9M x 2 = 11.8M
– {}: 6M - 1
Iteration 2, benefits of possible materialization choices:
– {c, s} and {c, p}: 0 x 4 = 0 each
– {s}: 0.79M x 2 = 1.58M
– {p}: 0.6M x 2 = 1.2M
– {c}: 5.9M x 2 = 11.8M
– {}: 0.8M - 1
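The two iterations above can be reproduced with a short greedy loop. This is a sketch under stated assumptions, not the authors' code: it takes the benefit of a view to be the sum, over its descendants, of the rows saved relative to each descendant's current cheapest materialized ancestor, which is how the benefit is usually described for [HRU96]. The names `SIZES`, `TOP`, and `hru_select` are illustrative.

```python
# View sizes from the slide, keyed by the set of dimension letters.
SIZES = {
    frozenset("cps"): 6_000_000,
    frozenset("ps"): 800_000,
    frozenset("cs"): 6_000_000,
    frozenset("cp"): 6_000_000,
    frozenset("s"): 10_000,
    frozenset("p"): 200_000,
    frozenset("c"): 100_000,
    frozenset(): 1,
}
TOP = frozenset("cps")  # the fact table, always materialized


def hru_select(sizes, top, k):
    """Greedily pick up to k views besides `top`; return [(view, benefit), ...]."""
    # cost[w] = rows read to answer w from its smallest materialized ancestor.
    cost = {view: sizes[top] for view in sizes}
    chosen = []
    picked = {top}
    for _ in range(k):
        best_view, best_benefit = None, -1
        for v in sizes:  # every remaining view is evaluated ...
            if v in picked:
                continue
            # ... against every one of its descendants (subsets of v).
            benefit = sum(max(cost[w] - sizes[v], 0) for w in sizes if w <= v)
            if benefit > best_benefit:
                best_view, best_benefit = v, benefit
        if best_view is None:
            break  # nothing left to select
        chosen.append((best_view, best_benefit))
        picked.add(best_view)
        for w in sizes:
            if w <= best_view:
                cost[w] = min(cost[w], sizes[best_view])
    return chosen


for view, benefit in hru_select(SIZES, TOP, 2):
    print(sorted(view), benefit)
```

The first pick is {p, s} with a benefit of 20.8M rows, as on the slide, and the second pick is {c}. Because the sketch sums the saving per descendant, its second-iteration figures can differ from the slide's shorthand of one difference multiplied by the descendant count.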

12 Exponentiality in HRU
– $O(kn^2)$ time, where k = |views to select|, n = |possible views|
– $n = 2^d$ in non-hierarchical database, where d = |dimensions|
– HRU algorithm is $O(k2^{2d})$ time
– Two sources of exponentiality
  – Each possible view is evaluated
  – Each view evaluation considers the effect of materialization on every descendant
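The exponential bound follows from the general one by substitution. Written out as a worked step, with the hierarchical case added on the assumption that the number of possible views is the product of the level counts $\ell_i$, and that $g$ denotes their geometric mean:

```latex
\begin{align*}
T_{\mathrm{HRU}} &= O(k n^2), \qquad n = 2^d \\
                 &= O\bigl(k\,(2^d)^2\bigr) = O\bigl(k\,2^{2d}\bigr) \\[4pt]
\text{hierarchical: } n &= \prod_{i=1}^{d} \ell_i = g^d
  \;\Longrightarrow\; T_{\mathrm{HRU}} = O\bigl(k\,g^{2d}\bigr)
\end{align*}
```

The two factors of $n$ correspond to the two sources of exponentiality: one for evaluating every possible view, and one for visiting every descendant of each evaluated view.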

13 Polynomial Greedy Algorithm (PGA)
[Flowchart with two phases. Nomination: Select fact table; Start new path; Nominate smallest child view, with guards [continuing path] and [path ended]. Selection: For each candidate, Evaluate benefit, with guards [more candidates] and [else]; Select view greedily, with guards [termination condition met] and [else].]

14 Example of PGA [NT02]
p = Part, s = Supplier, c = Customer
View sizes (rows), by lattice level: {c, p, s}: 6M; {p, s}: 0.8M, {c, s}: 6M, {c, p}: 6M; {s}: 0.01M, {p}: 0.2M, {c}: 0.1M; {}: 1

15 Example of PGA
p = Part, s = Supplier, c = Customer
View sizes (rows), by lattice level: {c, p, s}: 6M; {p, s}: 0.8M, {c, s}: 6M, {c, p}: 6M; {s}: 0.01M, {p}: 0.2M, {c}: 0.1M; {}: 1
Nomination, candidates: {p, s}, {s}, {}

16 Example of PGA
p = Part, s = Supplier, c = Customer
View sizes (rows), by lattice level: {c, p, s}: 6M; {p, s}: 0.8M, {c, s}: 6M, {c, p}: 6M; {s}: 0.01M, {p}: 0.2M, {c}: 0.1M; {}: 1
Nomination, candidates: {p, s}, {s}, {}
Selection (Iteration 1), benefits:
– {p, s}: 5.2M x 4 = 20.8M
– {s}: 5.99M x 2 = 11.98M
– {}: 6M - 1

17 Example of PGA
p = Part, s = Supplier, c = Customer
View sizes (rows), by lattice level: {c, p, s}: 6M; {p, s}: 0.8M, {c, s}: 6M, {c, p}: 6M; {s}: 0.01M, {p}: 0.2M, {c}: 0.1M; {}: 1
Nomination, candidates: {p, s}, {s}, {}
Selection (Iteration 1), benefits:
– {p, s}: 5.2M x 4 = 20.8M
– {s}: 5.99M x 2 = 11.98M
– {}: 6M - 1
Nomination, candidates: {c, s}, {s}, {c}, {}

18 Example of PGA
p = Part, s = Supplier, c = Customer
View sizes (rows), by lattice level: {c, p, s}: 6M; {p, s}: 0.8M, {c, s}: 6M, {c, p}: 6M; {s}: 0.01M, {p}: 0.2M, {c}: 0.1M; {}: 1
Nomination, candidates: {p, s}, {s}, {}
Selection (Iteration 1), benefits:
– {p, s}: 5.2M x 4 = 20.8M
– {s}: 5.99M x 2 = 11.98M
– {}: 6M - 1
Nomination, candidates: {c, s}, {s}, {c}, {}
Selection (Iteration 2), benefits:
– {c, s}: 0 x 2 = 0
– {s}: 0.79M x 2 = 1.58M
– {c}: 5.9M x 2 = 11.8M
– {}: 6M - 1
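The nomination phase in this example can be sketched as a walk down the lattice. This is a hypothetical reading of the slides rather than the authors' implementation. It assumes that a path starts at the fact table, that each step nominates the smallest child that is neither already a candidate nor already selected, and that ties go to the first child found when dropping dimensions in the order c, p, s. Benefit evaluation is left out because the slides do not give the approximation in enough detail. The names `children` and `nominate_path` are illustrative.

```python
DIMENSIONS = ("c", "p", "s")  # customer, part, supplier

SIZES = {
    frozenset("cps"): 6_000_000,
    frozenset("ps"): 800_000,
    frozenset("cs"): 6_000_000,
    frozenset("cp"): 6_000_000,
    frozenset("s"): 10_000,
    frozenset("p"): 200_000,
    frozenset("c"): 100_000,
    frozenset(): 1,
}
TOP = frozenset("cps")  # the fact table


def children(view):
    """Views one level down: drop one attribute at a time, in DIMENSIONS order."""
    return [view - {dim} for dim in DIMENSIONS if dim in view]


def nominate_path(sizes, top, excluded):
    """Walk down from `top`, nominating the smallest child not in `excluded`."""
    path, current = [], top
    seen = set(excluded)
    while True:
        options = [child for child in children(current) if child not in seen]
        if not options:
            return path  # path ended
        current = min(options, key=sizes.__getitem__)  # first minimum wins ties
        path.append(current)
        seen.add(current)


# First nomination: nothing is excluded yet.
first_path = nominate_path(SIZES, TOP, excluded=set())

# Suppose selection then picks {p, s}; it leaves the candidate pool.
selected = {frozenset("ps")}
candidates = set(first_path) - selected

# Second nomination skips everything already nominated or selected.
second_path = nominate_path(SIZES, TOP, excluded=candidates | selected)
candidates |= set(second_path)
```

Under these assumptions the first path is {p, s}, {s}, {} and the candidate pool after the second nomination is {c, s}, {s}, {c}, {}, matching the candidate lists on the slides. Each step inspects at most d children and a path has at most d steps, so only a small part of the lattice is touched.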

19 Nomination Complexity
– Maximum swatch width is d.
– Maximum path length is d.
– Finding one path is $O(d^2)$ time
– Our strategy nominates a path each time a view is selected, so the nomination complexity is $O(d^2 k)$ time

20 Evaluating Views in PGA
– Polynomial time evaluation requires approximating materialization benefits
– Account for smallest ancestor
– Account for materialized view with largest overlap in descendants
– Complexity of our algorithm is $O(d^2 k^2)$

21 Complexities
d = |dimensions|
g = geometric mean of the number of hierarchical levels per dimension
k = |views selected for materialization|
ℓ = |layers in lattice|
  Database Type     HRU                PGA
  Non-Hierarchical  $O(k2^{2d})$ time  $O(d^2 k^2)$ time, $O(d^2 k)$ space
  Hierarchical      $O(kg^{2d})$ time  $O(dk^2 \ell)$ time, $O(dk\ell)$ space
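To get a feel for the gap between the two non-hierarchical bounds, the dominant terms can be evaluated directly. This is only an illustration: big-O bounds hide constant factors, so the numbers are growth indicators, not run-time predictions. The function names `hru_steps` and `pga_steps` are made up for this sketch.

```python
def hru_steps(k, d):
    """Dominant term of the HRU bound for a non-hierarchical database: k * 2^(2d)."""
    return k * 2 ** (2 * d)


def pga_steps(k, d):
    """Dominant term of the PGA time bound for a non-hierarchical database: d^2 * k^2."""
    return d ** 2 * k ** 2


k = 10  # views to select, chosen arbitrarily for the illustration
for d in (2, 4, 6, 8):
    print(f"d={d}: HRU term {hru_steps(k, d):,}  PGA term {pga_steps(k, d):,}")
```

Adding one dimension multiplies the HRU term by four, while the PGA term grows only with the square of d.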

22 Near Optimal Selection, d = 2, ℓ = 4
[Plot. Axes: Materialization Costs (rows), Query Costs (rows).]

23 Query Costs at Four Dimensions
[Plot comparing HRU and PGA. Axes: Materialization Costs (thousands of rows), Query Costs (thousands of rows).]

24 Query Costs at Six Dimensions
[Plot comparing HRU and PGA. Axes: Materialization Costs (thousands of rows), Query Costs (millions of rows).]

25 Query Costs at Eight Dimensions
[Plot comparing HRU and PGA. Axes: Materialization Costs (thousands of rows), Query Costs (millions of rows).]

26 Performance at Four Dimensions
[Plot comparing HRU and PGA. Axes: Materialization Costs (thousands of rows), Processing Time (seconds).]

27 Performance at Six Dimensions
[Plot comparing HRU and PGA. Axes: Materialization Costs (thousands of rows), Processing Time (minutes).]

28 Performance at Eight Dimensions
[Plot comparing HRU and PGA. Axes: Materialization Costs (thousands of rows), Processing Time (minutes).]

29 Conclusions
– PGA finds a good set of views for materialization when HRU fails due to algorithm complexity
– PGA extends the usefulness of OLAP systems into higher dimensionality

30 Current Work
[Diagram: four components (View Size Estimation, View Selection, View Maintenance, Query Optimization) connecting the Fact Table, Update, and Users. Flow labels: Initial Data, Sample Data, Estimate Request, Estimated View Size, Strategic Views, Current Views, Incremental Data, Queries, Quick Responses. Legend: Completed Work, Current Work.]

31 Current Work
– Design alternative data structures for materialized views in OLAP
– Test impact of new data structures on update and query costs
– Integrate our work into an OLAP system

32 References
[HRU96] V. Harinarayan, A. Rajaraman, J. D. Ullman. Implementing Data Cubes Efficiently. In Proceedings of 1996 ACM-SIGMOD Conf., pp. 205-216, Montreal, Canada.
[NT01] T. P. Nadeau, T. J. Teorey. A Pareto Model for OLAP View Size Estimation. CASCON 2001, pp. 1-13, Toronto, Canada.
[NT02] T. P. Nadeau, T. J. Teorey. Achieving Scalability in OLAP Materialized View Selection. Technical Report (extended version). http://www.eecs.umich.edu/~teorey/cv.html.

