View Materialization & Maintenance Strategies By Ashkan Bayati & Ali Reza Vazifehdoost.



Motivation
Complex queries:
- Decision support queries
- OLAP
- Statistical analysis
- Business intelligence
- Aggregation
Large data sets collected from heterogeneous remote sources

View Materialization & Maintenance
View materialization is the process of pre-computing views (summarized information) in order to improve query performance.
The drawback is that the view must be kept consistent when the underlying data sources change.
View maintenance is the process of keeping the view consistent with the underlying source tables.

Incremental View Maintenance
Only relevant updates affect the view.
The aim of incremental view maintenance is to re-compute the view considering only the net changes that have taken place, instead of re-calculating the view from scratch.

Selection
V = σ_C(y)(r): any tuple of r that satisfies C(y) will be in the view.
After a set of inserts i and a set of deletes d we get:
V' = V + σ_C(y)(i) - σ_C(y)(d)
The view can be incrementally maintained by:
- Inserting σ_C(y)(i) into V: Insert(V, σ_C(y)(i))
- Deleting σ_C(y)(d) from V: Delete(V, σ_C(y)(d))
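The selection rule above can be tried out on a toy relation. This is a minimal sketch, assuming set semantics; the relation contents, the condition, and the name maintain_selection are illustrative and do not come from the slides.

```python
def maintain_selection(view, inserts, deletes, cond):
    """Apply V' = V + sigma_C(i) - sigma_C(d) without rescanning the base relation."""
    relevant_inserts = {t for t in inserts if cond(t)}
    relevant_deletes = {t for t in deletes if cond(t)}
    return (view | relevant_inserts) - relevant_deletes


# Hypothetical base relation r(id, amount) and condition C: amount > 20.
r = {(1, 10), (2, 25), (3, 40)}
cond = lambda t: t[1] > 20
view = {t for t in r if cond(t)}

inserts = {(4, 5), (5, 30)}   # (4, 5) fails C, so it is irrelevant to the view
deletes = {(2, 25)}

new_view = maintain_selection(view, inserts, deletes, cond)

# Recomputing from scratch gives the same result.
recomputed = {t for t in (r | inserts) - deletes if cond(t)}
```

Only the tuples in i and d are tested against C(y); the base relation r is never rescanned.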

Projection
Problem with projections: imagine deleting the tuple (1,10) from the base table. Several base tuples can project onto the same view tuple, so the deletion alone does not tell us whether the corresponding tuple must also leave the view.
Solution: keep the key in the view, or keep a counter with each view tuple.
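The counter solution can be sketched as follows. The base table here is hypothetical (the table from the original slide is not part of this transcript), apart from the tuple (1,10) mentioned in the text, and the projection keeps only the second attribute.

```python
from collections import Counter


def project(t):
    """Project a base tuple (key, value) onto its second attribute."""
    return (t[1],)


def delete_from_view(view, t):
    """Decrement the counter; drop the view tuple only when no base tuple supports it."""
    key = project(t)
    view[key] -= 1
    if view[key] == 0:
        del view[key]


# Hypothetical base table: two base tuples project onto the value 10.
base = [(1, 10), (2, 10), (3, 20)]
view = Counter(project(t) for t in base)

delete_from_view(view, (1, 10))
after_first_delete = dict(view)    # (10,) is still supported by (2, 10)

delete_from_view(view, (2, 10))
after_second_delete = dict(view)   # now (10,) disappears from the view
```

Without the counter, the view could not tell the first delete (tuple stays) from the second (tuple goes).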

Joins
Inserts: let V = r ⋈ s and r' = r ∪ i. Then:
V' = r' ⋈ s
   = (r ∪ i) ⋈ s
   = (r ⋈ s) ∪ (i ⋈ s)
   = V ∪ (i ⋈ s)
Deletes are similar.
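The insert case for joins can be checked on small sets. This is a sketch under set semantics; the relations r(a, b) and s(b, c) and their contents are made up for illustration.

```python
def join(r, s):
    """Natural join of r(a, b) and s(b, c) on the shared attribute b."""
    return {(a, b, c) for (a, b) in r for (b2, c) in s if b == b2}


r = {(1, "x"), (2, "y")}
s = {("x", 100), ("y", 200), ("z", 300)}
view = join(r, s)

i = {(3, "z"), (4, "x")}   # tuples inserted into r

delta = join(i, s)          # only the inserted tuples are joined with s
new_view = view | delta     # V' = V united with (i join s)

recomputed = join(r | i, s)
```

The old contents of r are not touched: the maintenance cost depends on the size of i and s, not on r.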

View Maintenance in Dynamic Environments
A dynamic environment is defined here as one that covers both data updates and schema changes.
Interleaving data updates and schema changes can cause problems.
The following steps need to be taken:
- Optimize updates based on their source relations and update types.
- For schema changes that affect the view definition, perform a view evolution process.
- Perform view adaptation to make the view consistent.

Optimize Updates
DU' = π_{attr(R) ∩ attr(R')}(DU)
It is easy to see that attr(R) ∩ attr(R') contains all the attributes related to the view redefinition, because neither dropped nor added attributes will appear in the view definition.
The relationships between the schema changes (SC) and the data updates (DU) are:
1. If SCi' contains drop relation Ri, then DUi' = {} and SCi' = drop relation Ri.
2. If SCi' contains a drop attribute operation, both SCi' and DUi' might be non-empty.
3. If SCi' contains no drop operation, then DUi' = DUi.

Example
Assume a view V(A,B,C,D) is defined as R1(A,B) ⋈ R2(A,C) ⋈ R3(C,D). Suppose R1 has the update sequence {+(3,2), (1,4)} and R2 has the update sequence {+(3,4), add field E, +(4,5,6), drop field C, -(5,7)}.
Hence we get DU2 = {+(3,4), +(4,5,6), -(5,7)}, R2 = (A,C) and R2' = (A,E). From this information, attr(R2) ∩ attr(R2') = {A}; hence DU2' = {+(3), +(4), -(5)}.
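The projection of DU2 onto the surviving attributes can be reproduced with a few lines. This is a sketch only: the representation of an update as a (sign, row) pair and the name project_updates are assumptions made for illustration.

```python
def project_updates(updates, old_attrs, new_attrs):
    """Project each data update onto attr(R) intersected with attr(R')."""
    common = [a for a in old_attrs if a in new_attrs]
    return [(sign, tuple(row[a] for a in common)) for sign, row in updates]


# Data updates on R2 from the example, each recorded under the schema that was
# current when the update arrived: (A,C), then (A,C,E), then (A,E).
du2 = [
    ("+", {"A": 3, "C": 4}),
    ("+", {"A": 4, "C": 5, "E": 6}),
    ("-", {"A": 5, "E": 7}),
]

du2_prime = project_updates(du2, old_attrs=["A", "C"], new_attrs=["A", "E"])
```

Here attr(R2) ∩ attr(R2') is {A}, so every update is reduced to its A value, matching DU2' in the example.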

Evolving View Definition
Applying view synchronization:

Making the view consistent
Now that the schema is consistent, the view must be synchronized with the updates to the underlying base tables. Many mechanisms have been defined for this; I will explain more on this issue later.

Efficient VM over distributed data sources
Materialized views integrate and store data from distributed data sources to ensure better access, higher performance and better availability.
Since the data sources are distributed, the network cost involved in transferring the net changes can also be dramatic.
State-of-the-art view maintenance requires O(n^2) maintenance queries to remote data sources, with n being the number of data sources in the view definition.

Goal
The aim is to restructure the view maintenance queries in order to reduce costs. How?
Assume the materialized view R1 ► R2 ► R3 ► R4 (► = join).

Restructuring Batch View Maintenance
State of the art: Ri' = Ri + ΔRi
Hence O(n^2) maintenance queries.

Adjacent Grouping
Adjacent grouping (adjacent sources share common access in the maintenance queries):
For the previous example, divide the sources into two groups. The maintenance expression becomes:
(ΔR1 ► R2 + R1' ► ΔR2) ► R3 ► R4 + (ΔR3 ► R4 + R3' ► ΔR4) ► R1' ► R2'
Hence 12 queries have been reduced to 8, giving O(n^1.5).
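The grouped expression for the four-source example can be sanity-checked on toy data. This is a sketch, assuming set semantics, insert-only deltas, and a chain join in which the last attribute of one relation matches the first attribute of the next; all relation contents are made up. With Ri' = Ri + ΔRi, the grouped delta added to the old view should equal the view recomputed from scratch.

```python
def join(left, right):
    """Chain join: match the last attribute of left with the first of right."""
    return {l + r[1:] for l in left for r in right if l[-1] == r[0]}


def chain(*rels):
    """Join a sequence of relations from left to right."""
    out = rels[0]
    for rel in rels[1:]:
        out = join(out, rel)
    return out


# Hypothetical sources R1..R4 and insert-only deltas dR1..dR4.
r1, r2, r3, r4 = {(1, 2), (5, 6)}, {(2, 3)}, {(3, 4)}, {(4, 9)}
d1, d2, d3, d4 = {(7, 2)}, {(6, 3)}, {(3, 8)}, {(8, 0)}
r1n, r2n, r3n, r4n = r1 | d1, r2 | d2, r3 | d3, r4 | d4   # Ri' = Ri + dRi

v_old = chain(r1, r2, r3, r4)
v_new = chain(r1n, r2n, r3n, r4n)

# Group 1 covers R1 and R2, group 2 covers R3 and R4.
d12 = join(d1, r2) | join(r1n, d2)
d34 = join(d3, r4) | join(r3n, d4)

# The second term is written left to right because this toy join is positional.
delta_v = chain(d12, r3, r4) | chain(r1n, r2n, d34)
```

Each group's delta is computed once and then joined with the other group, which is where the saving over handling every ΔRi separately comes from.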

Conditional Grouping
A more aggressive method, called conditional grouping, needs 2*(n-1) maintenance queries.
Scroll up phase

Conditional Grouping (cont.)
Scroll down phase

Self Maintenance of Multiple SPJ Views
The view V at level 0 can be described in terms of the nodes at level 1 as tmp1 ⋈ tmp3.
Some tuples of tmp1 and tmp3 do not join into the view V; hence, we store these tuples in their respective auxiliary views (AVs) for tmp1 and tmp3 at level 1.

Update takes place in Relation R
There are two possible paths that an update U can take to find its way to the root node:
1. ∆V = ((U ⋈ AV(S)) ⋈ AV(T)) ⋈ AV(tmp1)
2. ∆V = V ⋈ U

Sub-trees
With this approach, a change in any sub-tree can be propagated to the root node without re-computing any of the other sub-trees.
Since we only store tuples at level i if they do not join into the node at level i+1, the tuples are not duplicated in the tree.

Benefits of this approach
The benefits of this procedure can be summarized as follows:
1. Changes within a sub-tree effectively change only the root of that sub-tree.
2. The view updates can be computed by joining only subsets of the base relations rather than the entire base relations. As an example, ∆V = ((U ⋈ AV(S)) ⋈ AV(T)) ⋈ AV(tmp1) rather than the traditional ∆V = ((U ⋈ S) ⋈ T) ⋈ AV(tmp1).

Multiple View Maintenance
Essentially the same as single view maintenance; however, the AV of the shared node in the tree will be different.

Auxiliary View Structure
1. AV(temp3) stores the tuples that do not join into V and the tuples that do not join into V' in two separate AVs. The problem with this scheme is that the tuples in AV(temp3)(V) ∩ AV(temp3)(V') are stored in both AVs. The sub-tree represented by intermediate node temp3 will be recomputed twice, and the views V and V' will be updated separately.
2. AV(temp3) stores the tuples that do not join into V and the tuples that do not join into V' in three AVs: AV(temp3)(V), AV(temp3)(V') and AV(temp3)(V ∩ V'). This eliminates duplicates and cuts down the computational cost, but incurs the additional overhead of placing tuples in the correct AV.
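One way to read the second scheme is as a three-way partition of the non-joining tuples of temp3. The sketch below rests on that assumption; the function name, the dictionary keys and the sample tuples are hypothetical and are not taken from the slides.

```python
def split_auxiliary(dangling_v, dangling_v_prime):
    """Partition the non-joining tuples of temp3 into three disjoint AVs."""
    shared = dangling_v & dangling_v_prime
    return {
        "V": dangling_v - shared,              # join into V' but not into V
        "V'": dangling_v_prime - shared,       # join into V but not into V'
        "V and V'": shared,                    # join into neither view
    }


# Hypothetical temp3 tuples that fail to join into V and into V' respectively.
dangling_v = {("a", 1), ("b", 2), ("c", 3)}
dangling_v_prime = {("b", 2), ("c", 3), ("d", 4)}

avs = split_auxiliary(dangling_v, dangling_v_prime)

# Scheme 1 would store 3 + 3 = 6 tuples; the three-way split stores each once.
stored = sum(len(part) for part in avs.values())
```

The extra work mentioned in the text shows up as the set intersection and differences needed to decide where each tuple belongs.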