1 DynaMat A Dynamic View Management System for Data Warehouses Vicky :: Cao Hui Ping Sherman :: Chow Sze Ming CTH :: Chong Tsz Ho Ronald :: Woo Lok Yan.

Presentation transcript:

1 DynaMat: A Dynamic View Management System for Data Warehouses
Vicky :: Cao Hui Ping
Sherman :: Chow Sze Ming
CTH :: Chong Tsz Ho
Ronald :: Woo Lok Yan
Ken :: Yiu Man Lung

2 Outline
- Introduction
- Background
- DynaMat
- Experiments
- Conclusions
- References

3 Introduction
- On-Line Analytical Processing (OLAP)
  - Why OLAP? It is a dominant factor in decision support applications.
  - Ad-hoc, data-intensive queries
  - Costly multi-way joins and aggregations
- Materialized views
  - Why materialize views?
    - The amount of data in a data warehouse is very large.
    - OLAP queries are complex and costly.
    - OLAP query results are often summary data.
  - Materialized views are a set of redundant entities in a data warehouse that are used to accelerate OLAP.

4 Introduction(cont.)
- Basic rule for materializing views
  - Given a space restriction, select a suitable set of views to materialize.
- [Figure: a query is answered from the materialized views held alongside the data warehouse. Not all data can be stored redundantly: how many views, and which ones?]

5 Background
- Research topics on materialized views
  - Storing summary data as materialized views
  - Efficiently computing and updating views
- Static selection of views
  - Decide in advance which views should be materialized, and materialize them before the queries arrive. The selection is static!

6 Background(cont.)
- Limitations of static selection of views
  - Many queries cannot be answered from the materialized data, because query patterns change.
  - Updates are costly, because the data changes over time.
  - The administrator carries a heavy workload:
    - Monitor query patterns
    - Re-calibrate the views by rerunning the query
- Automated view selection
  - Dynamic view management: DynaMat

7 DynaMat
- Characteristics:
  - Dynamically materializes information at different granularities
  - View selection and view maintenance in a single framework
- System overview
- View pool organization
- Directory index
- Query execution
- Pool maintenance

8 System Overview
- Two phases
  - On-line: query
  - Off-line: update
- Components (figure annotations)
  - Store the materialized data
  - Support sub-linear search in the view pool V
  - Decide whether the materialized data can be used to answer a query
  - Off-line update
  - Maintain the view pool

9 View Pool Organization
- Multi-Range Query (MRQ)
  - Hyper-plane: an n-vector of ranges {R1, ..., Rn}
  - n: number of group-by attributes
  - Ri: the full range of the domain, a single value, or the empty range
- Example schema: F(product, country, year, sales), with domains Product (p1, p35), Country (c1, c30), Year (1995, 2000)
- Example query:

    SELECT product, year, SUM(sales)
    FROM F
    WHERE product = 'p1'
    GROUP BY product, year
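The hyper-plane of the example query can be sketched in Python as follows. This is only an illustration: the integer encoding of the domains, the use of `None` for the empty range, and the helper `make_mrq` are assumptions made here, not part of the original slides.

```python
from typing import Dict, Optional, Sequence, Tuple

# A range is (low, high). None stands for the empty range, i.e. the
# dimension is not in the GROUP BY list and is aggregated away.
Range = Optional[Tuple[int, int]]

# Hypothetical integer encoding of the domains shown on the slide.
DIMS = ("product", "country", "year")
DOMAINS: Dict[str, Tuple[int, int]] = {
    "product": (1, 35),
    "country": (1, 30),
    "year": (1995, 2000),
}


def make_mrq(group_by: Sequence[str],
             equals: Optional[Dict[str, int]] = None) -> Tuple[Range, ...]:
    """Build the hyper-plane of a multi-range query, one range per dimension."""
    equals = equals or {}
    plane = []
    for dim in DIMS:
        if dim not in group_by:
            plane.append(None)                        # empty range
        elif dim in equals:
            plane.append((equals[dim], equals[dim]))  # single value
        else:
            plane.append(DOMAINS[dim])                # full range of the domain
    return tuple(plane)


# SELECT product, year, SUM(sales) FROM F WHERE product = 'p1'
# GROUP BY product, year
q = make_mrq(group_by=["product", "year"], equals={"product": 1})
print(q)  # ((1, 1), None, (1995, 2000))
```

Product is a single value, country is the empty range because it is aggregated away, and year spans its full domain.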

10 View Pool Organization(cont.)
- MRF (Multidimensional Range Fragment)
  - Each fragment can also be represented by a hyper-plane
  - Basic logical unit in the pool
- There are many fragments in the view pool

Fact table F:

    Product  Year  Country  Sales
    P1       1997  C1       30
    P1       1997  C2       50
    P1       1999  C1       40
    P1       1999  C3       60
    P2       1997  C1       40
    P2       1998  C2       50
    P2       1998  C3       30

MRF:

    Product  Year  Country  Sales
    P1       1997  All      80
    P1       1999  All      100
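The MRF above can be reproduced from the rows of F with a small aggregation. The sketch below is a plain-Python illustration; the function name `materialize_fragment` is hypothetical and the slides do not prescribe how fragments are computed.

```python
from collections import defaultdict

# Rows of the fact table F from the slide: (product, year, country, sales).
F = [
    ("P1", 1997, "C1", 30),
    ("P1", 1997, "C2", 50),
    ("P1", 1999, "C1", 40),
    ("P1", 1999, "C3", 60),
    ("P2", 1997, "C1", 40),
    ("P2", 1998, "C2", 50),
    ("P2", 1998, "C3", 30),
]


def materialize_fragment(rows, product):
    """Sum sales per (product, year) for one product, aggregating country away."""
    totals = defaultdict(int)
    for prod, year, _country, sales in rows:
        if prod == product:
            totals[(prod, year)] += sales
    # "All" marks the aggregated dimension, as in the slide's MRF table.
    return [(p, y, "All", s) for (p, y), s in sorted(totals.items())]


mrf = materialize_fragment(F, "P1")
print(mrf)  # [('P1', 1997, 'All', 80), ('P1', 1999, 'All', 100)]
```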

11 Directory Index
- Facilitates the search in the view pool
- The directory index is an R-tree built on the fragments' hyper-planes.
- Each fragment corresponds to one entry in the directory index.
- [Figure: directory index over the Product and Year dimensions]

12 Query Execution
- Query steps:
  - From the MR query, get its hyper-plane
  - Search the view pool through the directory index
- [Figure: directory index over the Product and Year dimensions, with fragments f2 and f3]

13 Query Execution(cont.)
- Query cases:
  - One fragment f matches the query exactly
    - Retrieve f and return it to the user
  - No exact match, but several fragments can be used to answer the query
    - Choose the best fragment to answer the query
  - The query cannot be answered from the view pool
    - Run the query directly on the DW
- In the latter two cases, the query result is passed to the ACE
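The three query cases can be sketched as a lookup over fragment hyper-planes. Several things here are assumptions for illustration only: a linear scan stands in for the R-tree directory index, the "best" fragment is taken to be the smallest one, the covering test is a simplified one, and the names `covers` and `answer` are hypothetical.

```python
from typing import Dict, Optional, Tuple

Range = Optional[Tuple[int, int]]  # None = empty range (dimension aggregated away)
Plane = Tuple[Range, ...]

# Hypothetical integer-coded domains: product, country, year.
DOMAINS: Plane = ((1, 35), (1, 30), (1995, 2000))


def covers(frag: Plane, query: Plane) -> bool:
    """Simplified test of whether a fragment holds enough data to answer a query."""
    for f, q, dom in zip(frag, query, DOMAINS):
        if q is None:
            # The query aggregates the whole dimension, so the fragment must
            # either be aggregated too or span the full domain.
            if f is not None and f != dom:
                return False
        elif f is None or f[0] > q[0] or f[1] < q[1]:
            # The fragment lost the detail, or its range is too narrow.
            return False
    return True


def answer(query: Plane, pool: Dict[Plane, int]):
    """pool maps a fragment's hyper-plane to its size in rows (a cost proxy)."""
    if query in pool:
        return ("exact", query)                           # case 1: exact match
    candidates = [p for p in pool if covers(p, query)]    # linear scan, not an R-tree
    if candidates:
        return ("derived", min(candidates, key=pool.get)) # case 2: best fragment
    return ("warehouse", None)                            # case 3: go to the DW


pool = {
    ((1, 1), None, (1995, 2000)): 2,         # the MRF for product p1
    ((1, 35), (1, 30), (1995, 2000)): 500,   # a fully detailed fragment
}
print(answer(((1, 1), None, (1997, 1997)), pool)[0])     # derived
print(answer(((1, 1), (2, 2), (1995, 2000)), pool)[0])   # derived
```

In a real system the second and third cases would also hand the computed result to the admission control step; that part is left out of this sketch.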

14 Pool Maintenance
- Admission Control Entity (ACE)
- Two cases trigger maintenance
  - New query results arrive
  - Data in the base relations changes
- Space bound and time bound
  - Space bound: when the view pool reaches the pre-defined space window W_space, fragments are replaced.
  - Time bound: the system restricts the refresh of the fragments to the time window W_time.
- A goodness measure determines whether a fragment is good enough.

15 Pool Maintenance(cont.)
- Pool maintenance during queries
  - A new query result is stored in the view pool if there is enough space.
  - If the space constraint is hit, the replacement algorithm is called:
    - If goodness(f_new) > goodness(f_victim), evict f_victim.
    - Repeat until there is enough space for the new query result.
  - Maintenance of the father pointers
- [Figure: the new query result f_new takes the place of the evicted f_victim, where goodness(f_victim) < goodness(f_new); fragments f1 and f2 are also shown]
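The replacement step can be sketched as below. The slide does not say how victims are picked or what happens when a victim is better than the new result, so this sketch assumes victims are tried in ascending order of goodness and that the new result is rejected (leaving the pool unchanged) as soon as a victim is at least as good. The `Fragment` type, the `admit` function and the goodness values are all hypothetical.

```python
from typing import List, NamedTuple


class Fragment(NamedTuple):
    name: str
    size: int        # space taken in the view pool
    goodness: float  # value of some goodness measure (assumed given)


def admit(pool: List[Fragment], new: Fragment, capacity: int) -> bool:
    """Try to store `new`; evict less valuable fragments if space runs out."""
    used = sum(f.size for f in pool)
    victims = []
    for f in sorted(pool, key=lambda f: f.goodness):  # worst fragments first
        if used + new.size <= capacity:
            break                     # enough space has been freed
        if f.goodness >= new.goodness:
            return False              # victim is at least as good: reject new
        victims.append(f)
        used -= f.size
    if used + new.size > capacity:
        return False                  # new result does not fit at all
    for v in victims:
        pool.remove(v)                # evictions happen only on success
    pool.append(new)
    return True


pool = [Fragment("f1", 40, 1.0), Fragment("f2", 50, 5.0)]
admitted = admit(pool, Fragment("f_new", 30, 3.0), capacity=100)
print(admitted, [f.name for f in pool])  # True ['f2', 'f_new']
```

Collecting the victims first and evicting them only on success keeps the pool intact when the new result ends up rejected.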

16 Pool Maintenance(cont.)
- Pool maintenance during updates
  - Condition: data in the base relations changes
  - Steps:
    - For each fragment f, compute the minimum update cost UC(f)
    - Get all necessary deltas, i.e. the changes made to the DW
    - Consult the directory index, calculate dV, and update each f by querying dV
  - Total update cost: UC(V), the sum of UC(f) over all fragments f in the view pool
  - If UC(V) is greater than the time bound, evict fragments from the view pool in non-ascending order of their cost.

Delta:

    Product  Year  Country  Sales
    P30      1999  C1       30
    P1       2000  C2       50
    P4       1999  C1       40
    P1       1999  C6       60

dV = {(p1, p35), (1995, 2000), (c1, c10)}

Delta rows selected by dV:

    Product  Year  Country  Sales
    P30      1999  C1       30
    P4       1999  C1       40
    P1       2000  C2       50
    P1       1999  C6       60
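The eviction rule for the update phase can be sketched as follows, assuming UC(V) is simply the sum of the per-fragment costs and that the most expensive fragments are dropped first until the total fits the time bound. The function `fit_update_window` and the cost figures are hypothetical.

```python
from typing import Dict, List, Tuple


def fit_update_window(update_costs: Dict[str, float],
                      time_bound: float) -> Tuple[Dict[str, float], List[str]]:
    """Evict fragments, most expensive first, until UC(V) <= time_bound."""
    kept = dict(update_costs)      # work on a copy
    evicted = []
    total = sum(kept.values())     # UC(V) = sum of UC(f)
    for name in sorted(update_costs, key=update_costs.get, reverse=True):
        if total <= time_bound:
            break
        total -= kept.pop(name)
        evicted.append(name)
    return kept, evicted


kept, evicted = fit_update_window({"f1": 5, "f2": 20, "f3": 10}, time_bound=16)
print(kept, evicted)  # {'f1': 5, 'f3': 10} ['f2']
```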

17 Pool Maintenance(cont.)
[Figure: directory index over the Product and Year dimensions]

Delta:

    Product  Year  Country  Sales
    P30      1999  C1       30
    P1       2000  C2       50
    P4       1999  C1       40
    P1       1999  C6       60

dV = {(p1, p20), (1995, 2000), (c1, c10)}

Delta rows selected by dV:

    Product  Year  Country  Sales
    P4       1999  C1       40
    P1       2000  C2       50
    P1       1999  C6       60

18 Experiments
- Measure: Detailed Cost Savings Ratio (DCSR)
  - Ci: cost of answering query i on the DW
  - Si: cost saved when query i is answered from the view pool
  - DCSR = (sum of Si) / (sum of Ci)
  - The greater the DCSR, the better the performance
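As a small worked example, the ratio can be computed as below, taking DCSR to be the total savings divided by the total cost, as defined in the DynaMat paper. The cost and saving figures are invented purely for illustration.

```python
from typing import Sequence


def dcsr(costs: Sequence[float], savings: Sequence[float]) -> float:
    """Detailed Cost Savings Ratio: total savings over total warehouse cost."""
    if len(costs) != len(savings):
        raise ValueError("one cost and one saving are needed per query")
    total_cost = sum(costs)
    if total_cost == 0:
        raise ValueError("total cost must be positive")
    return sum(savings) / total_cost


# Three queries: the first is answered entirely from the view pool, the
# second goes to the warehouse, the third saves half of its cost.
ratio = dcsr(costs=[100, 200, 50], savings=[100, 0, 25])
print(round(ratio, 3))  # 0.357
```

A ratio of 0 means the view pool never helped; a ratio of 1 means every query was answered at no warehouse cost.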

19 Experiments(cont.)
- Comparison with the optimal static view selection
  - 1 fact table: 6 dimensions, 20 million records
  - Updates: 40 sets of 100 thousand records each
  - Time constraint: 2% of the full data cube
  - Queries: 40 sets of 500 MR queries each

20 Conclusion
- DynaMat: a view management system
  - Dynamically materializes results from incoming queries
  - Exploits them for future queries
  - Takes both time and space constraints into account
  - Performs better than static methods

21 Reference
- Y. Kotidis, N. Roussopoulos. DynaMat: A Dynamic View Management System for Data Warehouses. In Proceedings of ACM SIGMOD International Conference on Management of Data, Philadelphia, Pennsylvania, June.
- Y. Kotidis, N. Roussopoulos. A Case for Dynamic View Management. ACM Transactions on Database Systems, Volume 26(4).
- Original presentation by the author

22 Thanks! Q&A?