Materialized View Selection in a Multidimensional Database Presenter: Dong Wang 3/14/2006.



Outline
1. What is a multidimensional database?
2. Why materialize views?
3. The cost evaluation.
4. The MDred-lattice.

Multidimensional Database
A multidimensional database (MDDB) is a data repository that provides an integrated environment for decision-support queries that require complex aggregations over huge amounts of historical data. An MDDB is a relational data warehouse in which the information is organized according to the so-called star model.

A Practical Example
Consider the MDDB for a large store chain with a large number of stores, each of which is a supermarket selling a wide variety of products. We can identify the following dimensions:
- Product, characterized by Product_id, Department, Manufactured_date, and Price.
- Store, characterized by Store_id and the store address (which can be decomposed into City, State, and Zip).
- Time, characterized by Timestamp, Date, Week, Month, Quarter, and Year.

The schema of the example
Product: Product_id, Department, Manufactured_date, Price
Time: Timestamp, Date, Week, Month, Quarter, Year
Store: Store_id, City, State, Zip
Sales: Transaction_id, Timestamp, Product_id, Store_id

Example queries
Query 1: the total sales for year 2003.
    SELECT SUM(Price)
    FROM Sales, Time, Product
    WHERE Sales.Product_id = Product.Product_id
      AND Sales.Timestamp = Time.Timestamp
      AND Time.Year = '2003'
Query 2: the total sales for stores in Ohio.
    SELECT SUM(Price)
    FROM Sales, Store, Product
    WHERE Sales.Product_id = Product.Product_id
      AND Sales.Store_id = Store.Store_id
      AND Store.State = 'Ohio'

How many views can an MDDB have?
It depends on the number of attributes in the dimensions of the MDDB. Without hierarchies on the dimension tables, the number grows exponentially with the attribute count. In our example database, with only 3 dimension tables of 6, 4, and 4 attributes, this number is 18785, but for a real-world database with 50 attributes it is $2^{50} \approx 10^{15}$.
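The counting formula shown on this slide is not part of the transcript. As a rough check, the sketch below uses one closed form that happens to reproduce the quoted figure: the product of (2^n + 1) over the dimensions, where n is the attribute count of a dimension. That closed form is an assumption, not necessarily the slide's formula, and `count_views` is a hypothetical helper name.

```python
from math import prod

def count_views(attrs_per_dimension):
    # Assumed closed form: product of (2**n + 1) over all dimensions.
    return prod(2**n + 1 for n in attrs_per_dimension)

# Three dimension tables with 6, 4, and 4 attributes (Time, Product, Store).
example = count_views([6, 4, 4])
print(example)            # 65 * 17 * 17

# Flat case quoted on the slide: 50 attributes give 2**50 subsets.
flat_50 = 2**50
print(f"{flat_50:.2e}")
```

Either way the count is exponential in the number of attributes, which is why materializing every view is not an option.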


Materialized View
A materialized view is the result of a query that we choose to store in the database, rather than recomputing it each time it is needed to answer a query.
    INSERT INTO SalesV1
    SELECT Time.Year, SUM(Price)
    FROM Sales, Time, Product
    WHERE Sales.Product_id = Product.Product_id
      AND Sales.Timestamp = Time.Timestamp
    GROUP BY Time.Year
The materialized view SalesV1 can answer query 1 directly.
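To make the idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no materialized views, so the stored result is emulated with CREATE TABLE ... AS SELECT; the sample rows and the column name Total are made up for illustration. The stored result keeps the Year column next to the sum so that the row for 2003 can be looked up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Tiny version of the example schema (the Store table is not needed for query 1).
cur.executescript("""
CREATE TABLE Product (Product_id INTEGER PRIMARY KEY, Department TEXT,
                      Manufactured_date TEXT, Price REAL);
CREATE TABLE Time (Timestamp INTEGER PRIMARY KEY, Date TEXT, Week INTEGER,
                   Month INTEGER, Quarter INTEGER, Year TEXT);
CREATE TABLE Sales (Transaction_id INTEGER PRIMARY KEY, Timestamp INTEGER,
                    Product_id INTEGER, Store_id INTEGER);
INSERT INTO Product VALUES (1, 'Dairy', '2003-01-02', 2.0),
                           (2, 'Bakery', '2003-01-03', 3.0);
INSERT INTO Time VALUES (10, '2003-02-01', 5, 2, 1, '2003'),
                        (20, '2004-02-01', 5, 2, 1, '2004');
INSERT INTO Sales VALUES (100, 10, 1, 7), (101, 10, 2, 7), (102, 20, 2, 7);
""")

# Query 1 computed from the base tables (three-way join).
direct = cur.execute("""
SELECT SUM(Price) FROM Sales, Time, Product
WHERE Sales.Product_id = Product.Product_id
  AND Sales.Timestamp = Time.Timestamp
  AND Time.Year = '2003'
""").fetchone()[0]

# Emulated materialized view: yearly totals stored as an ordinary table.
cur.execute("""
CREATE TABLE SalesV1 AS
SELECT Time.Year AS Year, SUM(Price) AS Total
FROM Sales, Time, Product
WHERE Sales.Product_id = Product.Product_id
  AND Sales.Timestamp = Time.Timestamp
GROUP BY Time.Year
""")

# Query 1 answered from the stored result: a single-row lookup, no join.
from_view = cur.execute(
    "SELECT Total FROM SalesV1 WHERE Year = '2003'").fetchone()[0]
print(direct, from_view)
```

The price of the faster lookup is that SalesV1 must be refreshed whenever Sales changes, which is the update cost discussed later in the deck.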


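The MDmat-Problem: the cost
Query cost: $\sum_{q_i \in Q} f_{q_i}\, c_{q_i}(M)$, where $c_{q_i}(M)$ is the cost of computing query $q_i$ given a set of materializations $M$, and $f_{q_i}$ is the frequency of $q_i$. We want to minimize this cost.
Update cost: $\sum_{m_i \in M} f_{m_i}\, c_u(m_i)$, where $m_i$ is the $i$-th view in $M$, $f_{m_i}$ is the frequency with which $m_i$ is updated, and $c_u(m_i)$ is the update cost for $m_i$. We want to minimize this cost too.
So, given the query set $Q$ and the materialized view set $M$, the cost of a solution is the sum of the two costs above:
$C(Q, M) = \sum_{q_i \in Q} f_{q_i}\, c_{q_i}(M) + \sum_{m_i \in M} f_{m_i}\, c_u(m_i)$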

Choose the right views to materialize
Compared with the number of possible views, the number of queries is extremely small. Consider the data-cube lattice below: of its 16 nodes, only 4 may be used to answer queries. So we need to select only a small number of views to materialize.
[Figure: data-cube lattice over the attributes p, s, d, r. Level 4: psdr. Level 3: psd, psr, pdr, sdr. Level 2: ps, pd, pr, sd, sr, dr. Level 1: p, s, d, r. Level 0: none. Four nodes are labeled with the queries q1, q2, q3, q4.]
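The 16 nodes of this lattice are simply the subsets of the four grouping attributes. A minimal sketch that enumerates them level by level; the helper names `cube_lattice` and `can_answer` are illustrative, and the test for "a view can answer a query" (the query's grouping attributes are a subset of the view's) is the usual data-cube rule, stated here as an assumption.

```python
from itertools import combinations

attributes = "psdr"

def cube_lattice(attrs):
    # Map each level k to the views that group by exactly k attributes.
    levels = {}
    for k in range(len(attrs), -1, -1):
        levels[k] = ["".join(c) or "none" for c in combinations(attrs, k)]
    return levels

def can_answer(view, query):
    # A view can answer a query if it keeps every attribute the query groups by.
    v = set() if view == "none" else set(view)
    q = set() if query == "none" else set(query)
    return q <= v

levels = cube_lattice(attributes)
for k, nodes in levels.items():
    print(k, nodes)

total = sum(len(nodes) for nodes in levels.values())
print(total)
```

For example, `can_answer("psd", "pd")` holds, while `can_answer("ps", "pd")` does not, because the view grouped by p and s has already aggregated d away.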

Functional Dependency
A functional dependency is a constraint on the content of a dimension table: for each pair of tuples $t_1, t_2$ and each fd $A_l \to A_r$, $t_1[A_l] = t_2[A_l] \Rightarrow t_1[A_r] = t_2[A_r]$.
Examples:
1. In the dimension table Store, we have $fd_{s1}$: Store_id → Zip, $fd_{s2}$: Zip → City, $fd_{s3}$: City → State.
2. In the dimension table Time, we have $fd_{t1}$: Timestamp → Week, $fd_{t2}$: Timestamp → Date, $fd_{t3}$: Date → Month, $fd_{t4}$: Month → Quarter, $fd_{t5}$: Quarter → Year.
Using the attribute hierarchy, we can build the multidimensional lattice.
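The definition can be checked mechanically on a table's rows. A minimal sketch, with made-up Store rows and a hypothetical helper `fd_holds`: the dependency lhs → rhs holds if no two rows agree on lhs while disagreeing on rhs.

```python
def fd_holds(rows, lhs, rhs):
    # Remember the rhs value first seen for each lhs value; any later
    # row with the same lhs but a different rhs violates the dependency.
    seen = {}
    for row in rows:
        key = row[lhs]
        if key in seen and seen[key] != row[rhs]:
            return False
        seen.setdefault(key, row[rhs])
    return True

# Hypothetical contents of the Store dimension table.
stores = [
    {"Store_id": 1, "Zip": "43210", "City": "Columbus", "State": "Ohio"},
    {"Store_id": 2, "Zip": "43210", "City": "Columbus", "State": "Ohio"},
    {"Store_id": 3, "Zip": "44101", "City": "Cleveland", "State": "Ohio"},
]

print(fd_holds(stores, "Store_id", "Zip"))  # each store has one zip
print(fd_holds(stores, "Zip", "City"))      # each zip has one city
print(fd_holds(stores, "State", "City"))    # Ohio maps to two cities
```

The last check fails on this data, which matches the direction of the hierarchy: City determines State, not the other way round.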

The MD-lattice
[Figure: the MD-lattice of the Store dimension: Store_id → Zip → City → State → all.]
[Figure: the MD-lattice of the Time dimension: Timestamp → Week → all, and Timestamp → Date → Month → Quarter → Year → all.]

Candidate Views
It is impossible (and unnecessary) to materialize all the possible views in the data cube. We need only the views that help answer the queries, so we consider only the views that can contribute to reducing the total cost: the candidate views. A view $v_i$ belonging to an MD-lattice is a candidate view if one of the following two conditions holds:
1. View $v_i$ is associated with some query $q_i$.
2. There exist two candidate views $v_j$ and $v_k$, and $v_i$ is the least upper bound of $v_j$ and $v_k$.

The materialization of a non-candidate view will not help
Suppose a non-candidate view $v_i$ is materialized. We consider two cases:
1. No candidate view depends on $v_i$. Since $v_i$ does not change the query cost and its update cost is always positive, materializing $v_i$ will not help.
2. At least one candidate view depends on $v_i$; call it $v_j$. Since $v_j$ is smaller than $v_i$, the update cost of $v_j$ is always smaller than that of $v_i$, so materializing $v_i$ always costs more.
Conclusion: we should always choose candidate views to materialize.
[Figure: two cases, with a legend distinguishing materialized from unmaterialized views. Panel captions: "Both views are materialized" and "Only the non-candidate view is materialized".]

Candidate view examples
For query 1 (see Example queries), we can choose the view SalesV1 to materialize. For query 2, we can do:
    CREATE MATERIALIZED VIEW SalesV2 AS
    SELECT Store.State, SUM(Price)
    FROM Sales, Store, Product
    WHERE Sales.Product_id = Product.Product_id
      AND Sales.Store_id = Store.Store_id
    GROUP BY Store.State
In both examples, we materialize the view that is associated with the query.


The MDred-lattice
Given an MD-lattice and a set of queries $Q$, the set of its candidate views forms the MDred-lattice.
[Figure: the MDred-lattice construction algorithm.]

An MDred-lattice construction
Suppose we have two queries:
Query 1: the total sales of week 50.
Query 2: the total sales of the 3rd quarter of the year.
Following the MDred-lattice construction algorithm, we first take the views grouped by attribute Week and by attribute Quarter, which answer the two queries. We then extend the view set by adding their least upper bound, the view grouped by attribute Timestamp.
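The construction algorithm itself is not reproduced in this transcript, so the following is only a sketch of the closure described by the candidate-view definition: start from the views associated with queries and keep adding least upper bounds until nothing changes. It covers the Time dimension only, omits the "all" node, and the names `TIME_FDS`, `derivable`, `least_upper_bound`, and `candidate_views` are illustrative.

```python
# Edges point from a finer attribute to the coarser attributes it determines,
# following the functional dependencies of the Time dimension.
TIME_FDS = {
    "Timestamp": ["Week", "Date"],
    "Date": ["Month"],
    "Month": ["Quarter"],
    "Quarter": ["Year"],
    "Week": [],
    "Year": [],
}

def derivable(attr):
    # All attributes that can be computed from attr, including attr itself.
    out, stack = set(), [attr]
    while stack:
        a = stack.pop()
        if a not in out:
            out.add(a)
            stack.extend(TIME_FDS[a])
    return out

def least_upper_bound(a, b):
    # Upper bounds: attributes from which both a and b can be derived.
    uppers = [x for x in TIME_FDS if {a, b} <= derivable(x)]
    # The least one is the upper bound derivable from every other upper bound.
    for u in uppers:
        if all(u in derivable(v) for v in uppers):
            return u
    return None

def candidate_views(query_views):
    # Close the set of query views under least upper bounds.
    cands = set(query_views)
    changed = True
    while changed:
        changed = False
        for a in list(cands):
            for b in list(cands):
                lub = least_upper_bound(a, b)
                if lub is not None and lub not in cands:
                    cands.add(lub)
                    changed = True
    return cands

print(least_upper_bound("Week", "Quarter"))
print(sorted(candidate_views({"Week", "Quarter"})))
```

Week is reachable only from Timestamp, so Timestamp is the only view from which both Week and Quarter totals can be computed, which is why it joins the candidate set.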

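The cost evaluation
Suppose we have two queries $q_j$ and $q_k$. Considering both the query cost and the update cost, we have two options:
Option 1: materialize $v_j$ and $v_k$. The total cost is
$C_1 = f_u\, c_u(v_j) + f_{q_j}\, c_{q_j}(v_j) + f_u\, c_u(v_k) + f_{q_k}\, c_{q_k}(v_k)$
Option 2: materialize only $v_i$, the least upper bound of $v_j$ and $v_k$. The total cost is
$C_2 = f_u\, c_u(v_i) + f_{q_j}\, c_{q_j}(v_i) + f_{q_k}\, c_{q_k}(v_i)$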

The cost evaluation (cont.)
For option 1, let $f_u = 0.8$, $c_u(v_j) = 100$, $f_{q_j} = 0.5$, $c_{q_j}(v_j) = 100$, $c_u(v_k) = 100$, $f_{q_k} = 0.5$, $c_{q_k}(v_k) = 100$. We get $C_1 = 0.8 \times 100 + 0.5 \times 100 + 0.8 \times 100 + 0.5 \times 100 = 260$.
For option 2, let $f_u = 0.8$. The update cost will be larger (since the cardinality of $v_i$ is larger), say $c_u(v_i) = 120$. The query cost will also be larger (since additional aggregation is needed to answer the queries), say $c_{q_j}(v_i) = 110$ and $c_{q_k}(v_i) = 110$. We get $C_2 = 0.8 \times 120 + 0.5 \times 110 + 0.5 \times 110 = 206$.
So option 2 is the better choice!
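The two totals can be reproduced with a few lines of Python. This sketch assumes a single update frequency shared by all views and a frequency of 0.5 for both queries (the value for the second query is implied by the total of 260); `total_cost` is a hypothetical helper name.

```python
def total_cost(update_freq, update_costs, query_freqs, query_costs):
    # Update part: every materialized view is refreshed at the same frequency.
    update = sum(update_freq * c for c in update_costs)
    # Query part: each query's cost weighted by how often it is asked.
    query = sum(f * c for f, c in zip(query_freqs, query_costs))
    return update + query

# Option 1: materialize v_j and v_k (two views to update, cheap queries).
c1 = total_cost(0.8, [100, 100], [0.5, 0.5], [100, 100])

# Option 2: materialize only v_i (one larger view, slightly costlier queries).
c2 = total_cost(0.8, [120], [0.5, 0.5], [110, 110])

print(round(c1, 6), round(c2, 6))
```

Option 2 wins here because dropping one view saves 80 in update cost while the extra aggregation adds only 10 to the query cost; with a lower update frequency or much costlier queries on $v_i$, the comparison could go the other way.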

References
1. Elena Baralis, Stefano Paraboschi, and Ernest Teniente. Materialized view selection in a multidimensional database. Proceedings of the 23rd VLDB Conference, 1997.
2. Dimitri Theodoratos and Timos Sellis. Designing Data Warehouses. 1999.