Materialized View Selection in a Multidimensional Database Presenter: Dong Wang 3/14/2006.


1 Materialized View Selection in a Multidimensional Database Presenter: Dong Wang 3/14/2006

2 Outline 1. What is a multidimensional database? 2. Why materialize views? 3. The cost evaluation. 4. The MDred-lattice.

3 Multidimensional Database A multidimensional database (MDDB) is a data repository that provides an integrated environment for decision support queries that require complex aggregations on huge amounts of historical data. An MDDB is a relational data warehouse where the information is organized following the so-called star model.

4 A Practical Example Consider the MDDB for a large store chain, characterized by a large number of stores, each of which is a supermarket selling a wide variety of different products. We can identify the following dimensions: Product, which can be characterized by Product_id, Department, Manufactured_date and Price. Store, which can be characterized by Store_id, store address (which can be decomposed into City, State, and Zip). Time, which can be characterized by Timestamp, Date, Week, Month, Quarter, Year.

5 The schema of the example
Product (Product_id, Department, Manufactured_date, Price)
Time (Timestamp, Date, Week, Month, Quarter, Year)
Store (Store_id, City, State, Zip)
Sales (Transaction_id, Timestamp, Product_id, Store_id)

6 Example queries
Query 1: the total sales for the year 2003.
SELECT SUM(Price) FROM Sales, Time, Product WHERE Sales.Product_id = Product.Product_id AND Sales.Timestamp = Time.Timestamp AND Time.Year = '2003'
Query 2: the total sales for stores in Ohio.
SELECT SUM(Price) FROM Sales, Store, Product WHERE Sales.Product_id = Product.Product_id AND Sales.Store_id = Store.Store_id AND Store.State = 'Ohio'

7 How many views can an MDDB have? It depends on the number of attributes in the dimensions of the MDDB. Without hierarchies on the dimension tables, the number is $\prod_{i=1}^{d} (2^{n_i} + 1)$, where $n_i$ is the number of attributes of the $i$-th of the $d$ dimension tables. In our example database, with only 3 dimension tables of 6, 4, and 4 attributes, this number is $65 \times 17 \times 17 = 18785$, but for a real-world database with 50 attributes, this number is about $2^{50} \approx 10^{15}$.
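The count on this slide can be checked with a few lines of Python. This is only a sketch: it assumes the number of views is the product of (2^n + 1) over the dimension tables, a form inferred here because it reproduces the 18785 figure for 6, 4, and 4 attributes. The function name `view_count` is made up for illustration.

```python
from math import prod


def view_count(attrs_per_dimension):
    """Number of possible views, assuming the product of (2**n + 1) per dimension."""
    return prod(2 ** n + 1 for n in attrs_per_dimension)


# Example database from the slides: dimension tables with 6, 4, and 4 attributes.
example = view_count([6, 4, 4])
print(example)

# Order of magnitude quoted for a database with 50 attributes.
print(2 ** 50)
```

The second value, 2^50, is roughly 1.1 × 10^15, which matches the order of magnitude quoted on the slide.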

8 Outline 1. What is a multidimensional database? 2. Why materialize views? 3. The cost evaluation. 4. Data-cube lattice, MD-lattice and MDred-lattice.

9 Materialized View A materialized view is the result of a query that we choose to store in the database, rather than recomputing it each time it is needed to answer a query.
INSERT INTO SalesV1 SELECT Time.Year, SUM(Price) FROM Sales, Time, Product WHERE Sales.Product_id = Product.Product_id AND Sales.Timestamp = Time.Timestamp GROUP BY Time.Year
Because SalesV1 stores one total per year, it can answer query 1 directly.
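The idea can be tried end to end with Python's built-in sqlite3 module. This is a minimal sketch, not the presenter's setup: SQLite has no materialized views, so a summary table created with CREATE TABLE ... AS SELECT stands in for one, the sample rows are invented, and the column alias `Total` is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (Product_id INTEGER PRIMARY KEY, Price REAL);
CREATE TABLE Time (Timestamp INTEGER PRIMARY KEY, Year TEXT);
CREATE TABLE Sales (Transaction_id INTEGER PRIMARY KEY,
                    Timestamp INTEGER, Product_id INTEGER, Store_id INTEGER);

-- Invented sample data: two sales in 2003, one in 2004.
INSERT INTO Product VALUES (1, 2.5), (2, 4.0);
INSERT INTO Time VALUES (10, '2003'), (20, '2004');
INSERT INTO Sales VALUES (100, 10, 1, 1), (101, 10, 2, 1), (102, 20, 2, 1);

-- Summary table standing in for the materialized view SalesV1.
CREATE TABLE SalesV1 AS
SELECT Time.Year AS Year, SUM(Price) AS Total
FROM Sales, Time, Product
WHERE Sales.Product_id = Product.Product_id
  AND Sales.Timestamp = Time.Timestamp
GROUP BY Time.Year;
""")

# Query 1 computed from the base tables (two joins and an aggregation).
from_base = conn.execute("""
SELECT SUM(Price)
FROM Sales, Time, Product
WHERE Sales.Product_id = Product.Product_id
  AND Sales.Timestamp = Time.Timestamp
  AND Time.Year = '2003'
""").fetchone()[0]

# Query 1 answered from the stored summary (a single-row lookup).
from_view = conn.execute(
    "SELECT Total FROM SalesV1 WHERE Year = '2003'"
).fetchone()[0]

print(from_base, from_view)
```

Both values agree, but the second query touches only the small summary table. The price is that SalesV1 has to be refreshed whenever Sales changes, which is the update cost discussed later in the deck.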

10 Outline 1. What is a multidimensional database? 2. Why materialize views? 3. The cost evaluation. 4. The MDred-lattice.

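11 The MDmat problem: the cost
Query cost: $\sum_{q_i \in Q} f_{q_i}\, c_{q_i}(M)$, where $c_{q_i}(M)$ is the cost of computing query $q_i$ given a set of materialized views $M$, and $f_{q_i}$ is the frequency of $q_i$. We want to minimize this cost.
Update cost: $\sum_{m_i \in M} f_{m_i}\, c_u(m_i)$, where $m_i$ is the $i$-th view in $M$, $f_{m_i}$ is the frequency with which $m_i$ is updated, and $c_u(m_i)$ is the update cost of $m_i$. We want to minimize this cost too.
So, given the query set $Q$ and the materialized view set $M$, the cost of a solution is the sum of the two costs: $C(Q, M) = \sum_{q_i \in Q} f_{q_i}\, c_{q_i}(M) + \sum_{m_i \in M} f_{m_i}\, c_u(m_i)$.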
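Read as a frequency-weighted sum of query costs plus a frequency-weighted sum of update costs, the total cost can be sketched in Python. Several things here are assumptions for illustration only: the function name `total_cost`, the rule that each query is answered from its cheapest available source, the `"base"` key standing for the unaggregated tables, and all of the numbers.

```python
def total_cost(query_freq, query_cost, update_freq, update_cost, materialized):
    """Frequency-weighted query cost plus frequency-weighted update cost.

    query_cost[q] maps a source ("base" or a view name) to the cost of
    answering q from it. A view is usable only if it is materialized.
    """
    query_part = sum(
        freq * min(
            cost
            for source, cost in query_cost[q].items()
            if source == "base" or source in materialized
        )
        for q, freq in query_freq.items()
    )
    update_part = sum(update_freq[m] * update_cost[m] for m in materialized)
    return query_part + update_part


# Hypothetical numbers: one query, one candidate view.
query_freq = {"q1": 0.5}
query_cost = {"q1": {"base": 1000, "SalesV1": 10}}
update_freq = {"SalesV1": 0.8}
update_cost = {"SalesV1": 100}

nothing = total_cost(query_freq, query_cost, update_freq, update_cost, set())
with_view = total_cost(query_freq, query_cost, update_freq, update_cost, {"SalesV1"})
print(nothing, with_view)
```

With these made-up figures, materializing the view pays off because the saving on the query side outweighs the added update cost; a rarely asked query or an expensive refresh could tip the balance the other way.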

12 Choose the right views to materialize Compared with the number of possible views, the number of queries is extremely small. Consider a data-cube lattice over four attributes p, s, d, and r: among its 16 nodes, only 4 may be used to answer queries. So we need to select only a small number of views to materialize.
[Figure: data-cube lattice with the levels psdr; psd, psr, pdr, sdr; ps, pd, pr, sd, sr, dr; p, s, d, r; none. The nodes associated with the queries q1, q2, q3, and q4 are marked.]

13 Functional Dependency A functional dependency is a constraint on the content of a dimension table: for each pair of tuples $t_1, t_2$ and each fd: $A_l \to A_r$, $t_1[A_l] = t_2[A_l] \Rightarrow t_1[A_r] = t_2[A_r]$. Examples: 1. In the dimension table Store, we have fd_s1: Store_id → Zip, fd_s2: Zip → City, fd_s3: City → State. 2. In the dimension table Time, we have fd_t1: Timestamp → Week, fd_t2: Timestamp → Date, fd_t3: Date → Month, fd_t4: Month → Quarter, fd_t5: Quarter → Year. Using the attribute hierarchy, we can build the multidimensional lattice (MD-lattice).
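The definition translates directly into a check over the rows of a dimension table. The sketch below assumes rows are Python dictionaries; the helper name `holds` and the sample Store rows are invented for illustration.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


# Invented sample rows for the Store dimension.
stores = [
    {"Store_id": 1, "Zip": "43210", "City": "Columbus", "State": "Ohio"},
    {"Store_id": 2, "Zip": "43210", "City": "Columbus", "State": "Ohio"},
    {"Store_id": 3, "Zip": "44101", "City": "Cleveland", "State": "Ohio"},
]

print(holds(stores, "Zip", "City"))    # Zip -> City
print(holds(stores, "City", "State"))  # City -> State
print(holds(stores, "State", "City"))  # the reverse direction
```

In this sample, Zip → City and City → State hold, while State → City does not, because two different cities share the same state. A check on sample data can only refute a dependency; that it always holds is a design constraint on the table.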

14 The MD-lattice
The MD-lattice of the Store dimension: Store_id → Zip → City → State → all
The MD-lattice of the Time dimension: Timestamp → Week → all, and Timestamp → Date → Month → Quarter → Year → all

15 Candidate Views It is impossible (and unnecessary) to materialize all the possible views in the data cube. We only need the views that help us answer the queries, so we consider only the views that can contribute to reducing the total cost: the candidate views. A view v_i belonging to an MD-lattice is a candidate view if one of the following two conditions holds:
1. View v_i is associated with some query q_i;
2. There exist two candidate views v_j and v_k, and v_i is the least upper bound of v_j and v_k.

16 The materialization of a non-candidate view will not help Suppose a non-candidate view v_i is materialized. We consider two cases:
1. No candidate view depends on v_i. Materializing v_i does not change the query cost, and the update cost of v_i is always positive, so materializing v_i will not help.
2. At least one candidate view depends on v_i. Say a candidate view v_j depends on v_i. Since v_j is smaller than v_i, the update cost of v_j is always smaller than that of v_i, so materializing v_i always costs more.
Conclusion: we should always choose candidate views to materialize.
[Figure: two diagrams, labeled case 1 and case 2, with a legend distinguishing materialized from unmaterialized views; captions: "Both views are materialized" and "only the non-candidate view is materialized".]

17 Candidate view examples For query 1 on slide #6, we can choose the view SalesV1 to materialize. For query 2, we can do:
CREATE MATERIALIZED VIEW SalesV2 AS SELECT Store.State, SUM(Price) FROM Sales, Store, Product WHERE Sales.Product_id = Product.Product_id AND Sales.Store_id = Store.Store_id GROUP BY Store.State
In both examples, we materialize the view that is associated with the query.

18 Outline 1. What is a multidimensional database? 2. Why materialize views? 3. The cost evaluation. 4. The MDred-lattice.

19 The MDred-lattice Given an MD-lattice and a set of queries Q, the set of its candidate views forms the MDred-lattice. The MDred-lattice Construction Algorithm

20 An MDred-lattice construction example Suppose we have two queries: query 1: the total sales of week 50; query 2: the total sales of the 3rd quarter of year 2005. Following the MDred-lattice construction algorithm, we first need the views grouped by attribute Week and by attribute Quarter to answer the queries. We then extend the view set by adding their least upper bound, the view grouped by attribute Timestamp.
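The example can be reproduced with a small closure computation over the Time hierarchy. This is a simplified sketch rather than the construction algorithm from the slides, which is not reproduced in this transcript: it handles a single dimension, ignores the "all" node, and treats the least upper bound of two attributes as the coarsest attribute that functionally determines both. The names `FDS`, `determined_by`, `least_upper_bound`, and `candidate_views` are made up for illustration.

```python
from itertools import combinations

# Direct functional dependencies of the Time dimension (slide 13).
FDS = {
    "Timestamp": ["Week", "Date"],
    "Date": ["Month"],
    "Month": ["Quarter"],
    "Quarter": ["Year"],
    "Week": [],
    "Year": [],
}


def determined_by(attr):
    """All attributes functionally determined by attr, including attr itself."""
    seen, stack = {attr}, [attr]
    while stack:
        for nxt in FDS[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def least_upper_bound(a, b):
    """Coarsest attribute that determines both a and b."""
    common = [x for x in FDS if {a, b} <= determined_by(x)]
    for c in common:
        # c is the least one if every other common bound determines it.
        if all(c in determined_by(x) for x in common):
            return c
    raise ValueError(f"no least upper bound for {a} and {b}")


def candidate_views(query_attrs):
    """Close the query attributes under pairwise least upper bounds."""
    candidates = set(query_attrs)
    changed = True
    while changed:
        changed = False
        for a, b in combinations(sorted(candidates), 2):
            lub = least_upper_bound(a, b)
            if lub not in candidates:
                candidates.add(lub)
                changed = True
                break  # restart with the enlarged set
    return candidates


print(sorted(candidate_views({"Week", "Quarter"})))
print(sorted(candidate_views({"Month", "Year"})))
```

For the queries on Week and Quarter the closure adds Timestamp, as on the slide. For queries on Month and Year nothing is added, because Month already determines Year and is therefore their least upper bound.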

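21 The cost evaluation Suppose we have two queries q_j and q_k. Considering both the query cost and the update cost, we have two options:
Option 1: materialize v_j and v_k. The total cost is $C_1 = f_u\, c_u(v_j) + f_{q_j}\, c_{q_j}(v_j) + f_u\, c_u(v_k) + f_{q_k}\, c_{q_k}(v_k)$.
Option 2: materialize only v_i, which is the least upper bound of v_j and v_k. The total cost is $C_2 = f_u\, c_u(v_i) + f_{q_j}\, c_{q_j}(v_i) + f_{q_k}\, c_{q_k}(v_i)$.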

22 The cost evaluation (cont.) For option 1, let f_u = 0.8, c_u(v_j) = 100, f_qj = 0.5, c_qj(v_j) = 100, c_u(v_k) = 100, f_qk = 0.5, c_qk(v_k) = 100. We get C1 = 0.8×100 + 0.5×100 + 0.8×100 + 0.5×100 = 260. For option 2, again let f_u = 0.8. The update cost will be larger (since the cardinality of v_i is larger), say c_u(v_i) = 120, and the query cost will also be larger (since additional aggregation is needed to answer the queries), say c_qj(v_i) = 110 and c_qk(v_i) = 110. We get C2 = 0.8×120 + 0.5×110 + 0.5×110 = 206. Since C2 < C1, option 2 is the better choice.
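The arithmetic can be checked in a few lines of Python, and the same numbers show how the comparison depends on the update frequency. The values come from the worked example, with f_qk = 0.5 assumed equal to f_qj. The function names and the break-even calculation are additions for illustration.

```python
# Values from the worked example; f_qk = 0.5 is assumed to match f_qj.
F_QJ, F_QK = 0.5, 0.5


def option1(f_u):
    """Materialize v_j and v_k: two updates, each query served by its own view."""
    return f_u * 100 + F_QJ * 100 + f_u * 100 + F_QK * 100


def option2(f_u):
    """Materialize only v_i: one larger update, slightly costlier queries."""
    return f_u * 120 + F_QJ * 110 + F_QK * 110


c1, c2 = option1(0.8), option2(0.8)
print(c1, c2)

# Update frequency at which both options cost the same:
# 200*f_u + 100 = 120*f_u + 110, so f_u = 10 / 80.
break_even = (F_QJ * (110 - 100) + F_QK * (110 - 100)) / (100 + 100 - 120)
print(break_even)
```

With these costs, option 2 wins whenever the update frequency exceeds 0.125. Below that, the extra query cost of aggregating from v_i outweighs the saving of maintaining one view instead of two.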

23 References
Elena Baralis, Stefano Paraboschi, and Ernest Teniente. Materialized View Selection in a Multidimensional Database. Proceedings of the 23rd VLDB Conference, 1997.
Dimitri Theodoratos and Timos Sellis. Designing Data Warehouses. 1999.

