Download presentation
Presentation is loading. Please wait.
Published byClemence Pitts Modified over 8 years ago
1
View Materialization & Maintenance Strategies By Ashkan Bayati & Ali Reza Vazifehdoost
2
Motivation Complex Queries -Decision support queries -OLAP -Statistical Analysis -Business Intelligence -Aggregation Large data sets collected from Heterogeneous remote sources
3
View Materialization & Maintenance View materialization is the process of pre- computing views (summarized information) in order to gain performance View materialization is the process of pre- computing views (summarized information) in order to gain performance Drawback is to keep the view consistent when the underlying data sources change Drawback is to keep the view consistent when the underlying data sources change View Maintenance is the process of keeping the view consistent with the underlying source tables View Maintenance is the process of keeping the view consistent with the underlying source tables
4
Incremental View Maintenance Relevant Updates only affect the view Relevant Updates only affect the view The aim of incremental view maintenance is to re-compute the view considering only the net changes that have taken place instead of re-calculating the view from scratch. The aim of incremental view maintenance is to re-compute the view considering only the net changes that have taken place instead of re-calculating the view from scratch.
5
Selection V = Ө C(y) (r) any tuple that satisfies C(y) will be in the view V = Ө C(y) (r) any tuple that satisfies C(y) will be in the view After inserts and deletes we get After inserts and deletes we get V’= V + Ө C(y) (i) - Ө C(y) (d) V’= V + Ө C(y) (i) - Ө C(y) (d) The view can be incrementally maintained by: The view can be incrementally maintained by: Inserting Ө C(y) (i) into V (Insert( V, Ө C(y) (i))) Inserting Ө C(y) (i) into V (Insert( V, Ө C(y) (i))) Deleting Ө C(y) (d) from V ( Delete( V, Ө C(y) (d)) Deleting Ө C(y) (d) from V ( Delete( V, Ө C(y) (d))
6
Projection Problem with Projections: Problem with Projections: Imagine if you delete Imagine if you delete (1,10) from the base (1,10) from the base table table Solution is to keep the key in the view or use a counter Solution is to keep the key in the view or use a counter
7
Joins Inserts: Inserts: Let V = r s and r’ = r i then: V’= r’ s V’= r’ s = (r i) s = (r i) s = (r s) (i s) = (r s) (i s) = V (i s) = V (i s) Deletes are similar Deletes are similar
8
View Maintenance in Dynamic Environments Dynamic environment is specified here as one that covers both data updates and schema changes Dynamic environment is specified here as one that covers both data updates and schema changes Interleaving data updates and schema changes can cause problems Interleaving data updates and schema changes can cause problems The following steps need to be taken: The following steps need to be taken: - Optimize updates based on their source relations and update types. - Optimize updates based on their source relations and update types. - For schema changes that effect the view definition perform a view evolution process. - For schema changes that effect the view definition perform a view evolution process. - Perform view adaptation to make the view consistent. - Perform view adaptation to make the view consistent.
9
Optimize Updates DU’ = п (attr (R)) ∩ (attr(R’)) DU’ = п (attr (R)) ∩ (attr(R’)) Its obvious to see п(attr (R)) ∩ (attr(R’)) contains all the attributes related to the view redefinition. This is essentially because neither dropped nor added attributes will appear in the view definition. Its obvious to see п(attr (R)) ∩ (attr(R’)) contains all the attributes related to the view redefinition. This is essentially because neither dropped nor added attributes will appear in the view definition. The relationship between SC and DU are: The relationship between SC and DU are: 1. If SCi’ contains drop relation Ri, then DUi ={} and SCi’ = drop relation Ri. 2. If SCi’ contains drop attribute operation both SCi’ and DUi’ might not be empty 3. If SCi’ contains no drop operation, then DUi’=DUi.
10
Example Assume a view V(A,B,C,D) is defined as R1(A,B) R2(A,C) R3(C,D). Suppose R1 has the following sequence of updates { +(3,2),(1,4)} and relation R2 has the update sequence { + (3,4), add field E, +(4,5,6), drop field C, -(5,7)}. Assume a view V(A,B,C,D) is defined as R1(A,B) R2(A,C) R3(C,D). Suppose R1 has the following sequence of updates { +(3,2),(1,4)} and relation R2 has the update sequence { + (3,4), add field E, +(4,5,6), drop field C, -(5,7)}. Hence we get DU2= {+(3,4),+(4,5,6),-(5,7)} and R2=(A,C) and R2’=(A,E). From this information you can see that attr((R)) ∩ (attr(R’))={A} ;hence DU2’={+3,+4,-5}. Hence we get DU2= {+(3,4),+(4,5,6),-(5,7)} and R2=(A,C) and R2’=(A,E). From this information you can see that attr((R)) ∩ (attr(R’))={A} ;hence DU2’={+3,+4,-5}.
11
Evolving View Definition Applying view synchronization: Applying view synchronization:
12
Making the view consistent Now that the schema is consistent we need the view to become synchronized with the underlying base table updates. Many mechanisms have been defined I will explain more on this issue later. Now that the schema is consistent we need the view to become synchronized with the underlying base table updates. Many mechanisms have been defined I will explain more on this issue later.
13
Efficient VM over distributed data sources Materialized view integrate and store data from distributed data sources to ensure better access, higher performance and better availability. Materialized view integrate and store data from distributed data sources to ensure better access, higher performance and better availability. Since the data sources are distributed the network cost involved in transferring the net changes can also be dramatic. Since the data sources are distributed the network cost involved in transferring the net changes can also be dramatic. State of the art view maintenance requires 0(n^2) maintenance queries to remote data sources with n being the number of data sources in the view definition. State of the art view maintenance requires 0(n^2) maintenance queries to remote data sources with n being the number of data sources in the view definition.
14
Goal The aim is to restructure the view maintenance queries in order to reduce costs. The aim is to restructure the view maintenance queries in order to reduce costs. HOW?? HOW?? Assume the Materialized view Assume the Materialized view R1 ► R 2 ► R 3 ► R 4. ( ► = join) R1 ► R 2 ► R 3 ► R 4. ( ► = join)
15
Restructuring Batch View Maintenance State of ART: State of ART: Ri’=Ri + Ri Ri’=Ri + Ri Hence O(n^2) Hence O(n^2)
16
Adjacent Grouping Adjacent Grouping (share common access to the maintenance Queries): Adjacent Grouping (share common access to the maintenance Queries): For the previous example divide it up into two groups. For the previous example divide it up into two groups. It becomes It becomes ( R1 ► R2 +R1’ ► R2) ► R3 ► R4 + ( R1 ► R2 +R1’ ► R2) ► R3 ► R4 + ( R3 ► R4+ R 3’ ► R4) ► R1 ’ + R2 ’ hence 12 queries have been reduced to 8 hence O(n^1.5) ( R3 ► R4+ R 3’ ► R4) ► R1 ’ + R2 ’ hence 12 queries have been reduced to 8 hence O(n^1.5)
17
Conditional Grouping A more aggressive method is called conditional grouping whose execution is 2*(n-1) maintenance queries. A more aggressive method is called conditional grouping whose execution is 2*(n-1) maintenance queries. Scroll up phase Scroll up phase
18
Conditional Grouping Cont Scroll Down phase Scroll Down phase
19
Self Maintenance of Multiple SPJ Views The view V at level 0 can be described in terms of nodes at level as tmp1 tmp3. The view V at level 0 can be described in terms of nodes at level as tmp1 tmp3. Some tuples of tmp1 Some tuples of tmp1 and tmp3 do not join into the view V; hence, we store these tuples in their respective AV’s for tmp1 and tmp3 at level 1.
20
Update takes place in Relation R There are two possible paths that U (update) can take to find its way to the root node: There are two possible paths that U (update) can take to find its way to the root node: 1. ∆V = (((U AV(S)) AV(T)) AV(tmp1))) 2. ∆V = V U
21
Sub-trees With this approach, a With this approach, a change in any sub-tree can be propagated to the root node without re-computing any of the other sub-trees. Since we only store Since we only store tuples at level i if they do not join into the node at level i+1, the tuples are not duplicated in the tree.
22
Benefits of this approach The benefits of this procedure can be summarized as follows: The benefits of this procedure can be summarized as follows: 1. Changes to the view of a sub-tree only effectively change the root of that sub-tree 2. The view updates can effectively be computed by joining only subsets of base relations rather than the entire base relation. As an example ∆V = (((U AV(S)) AV(T)) AV(tmp1))) rather than the traditional method ∆V = (((U S) T) AV(tmp1)))
23
Multiple View Maintenance Essentially the same as single view maintenance however the AV of the shared node in the tree will be different. Essentially the same as single view maintenance however the AV of the shared node in the tree will be different.
24
Auxiliary View Structure 1. AV(temp3) stores tuples that do not join in V and do not join into V’ into two separate AV’s. The problem with this scheme is that it stores the set AV(temp3) (V) ∩ AV(temp3) (V’).The sub-tree represented by intermediate node temp3 will be recomputed twice and the views V and V’ will be updated separately. 2. AV(temp3) stores tuples that do not join in view V and tuples that do not join in view V’ in three AV’s: AV(temp3)(V), AV(temp3)(V’) and AV(temp3)(V ∩ V’). This eliminates duplicates, this will cut down the computational cost but incurs additional overhead of placing tuples in the correct AV.
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.