1
VLDB’2007 review
Denis Mindolin
2
VLDB’07 program
4
Outline
- Probabilistic Skylines on Uncertain Data, Jian Pei et al.
- Lazy Maintenance of Materialized Views, Jingren Zhou et al.
5
Probabilistic Skylines on Uncertain Data
Based on the VLDB’07 paper by Jian Pei et al.
6
Skyline: general picture
- For a dataset $D = \{p_1, \dots, p_n\}$, the skyline $S$ is the set of all $p_i \in D$ such that no other $p_j \in D$ dominates $p_i$.
- $p_i$ dominates $p_j$ if $p_i$ is better than $p_j$ in at least one dimension and not worse than $p_j$ in all other dimensions.
- Single-game results: S = {Eddie, Carl}
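The skyline definition can be checked directly with a nested-loop scan. This is a minimal sketch, assuming smaller values are better in every dimension (the convention used in the later example); `dominates` and `skyline` are hypothetical names, and the O(n²) scan is only an illustration of the definition, not the algorithm from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if p dominates q: not worse in any dimension, strictly better in one.

    Assumption: smaller values are better.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: List[Point]) -> List[Point]:
    """Naive skyline: keep every point that no other point dominates."""
    return [
        p
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


print(skyline([(1, 4), (2, 2), (3, 3), (4, 1)]))  # [(1, 4), (2, 2), (4, 1)]
```

Here (3, 3) drops out because (2, 2) is better in both dimensions, while the other three points are mutually incomparable.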
7
Uncertain data
- Multiple game results: S = ?
- Use some aggregate function (e.g., an average over games)? It cannot capture the distribution of results, and it can be biased by outliers.
8
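Probabilistic dominance relation
- Uncertain data
  - An uncertain object is a set of instances $U = \{u_1, \dots, u_l\}$.
  - Uncertain objects are independent.
  - All instances of an object are equally likely: $\Pr(u_i) = \Pr(u_j) = 1/l$.
- Probabilistic dominance relation
  - Given two uncertain objects $U = \{u_1, \dots, u_{l_1}\}$ and $V = \{v_1, \dots, v_{l_2}\}$, the probability that $V$ dominates $U$ is
    $\Pr[V \prec U] = \frac{|\{(u, v) \in U \times V : v \prec u\}|}{l_1 \, l_2}$,
    where $v \prec u$ means that instance $v$ dominates instance $u$.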
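Because instances are equally likely and objects are independent, every pair $(u, v)$ occurs with probability $1/(l_1 l_2)$, so $\Pr[V \prec U]$ is just a pair count. The sketch below assumes smaller values are better and uses exact fractions; `prob_dominates` is a hypothetical helper name.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple

Instance = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q, assuming smaller values are better."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def prob_dominates(V: List[Instance], U: List[Instance]) -> Fraction:
    """Pr[V dominates U] for equally likely instances and independent objects.

    Each (u, v) pair has probability 1 / (|U| * |V|), so the result is the
    fraction of pairs in which v dominates u.
    """
    count = sum(1 for u in U for v in V if dominates(v, u))
    return Fraction(count, len(U) * len(V))


U = [(2, 2), (4, 4)]
V = [(1, 1), (3, 3)]
print(prob_dominates(V, U), prob_dominates(U, V))  # 3/4 1/4
```

Note that $\Pr[V \prec U] + \Pr[U \prec V]$ can be less than 1 in general, because some pairs of instances are incomparable.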
9
Probabilistic dominance relation: example
Smaller values of X and Y are better.
10
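p-Skyline
Let $U = \{u_1, \dots, u_l\}$.
- For each $u \in U$, the probability that $u$ is in the skyline is the probability that $u$ is not dominated by any instance of any other object:
  $\Pr(u) = \prod_{V \neq U} \left(1 - \frac{|\{v \in V : v \prec u\}|}{|V|}\right)$
- Skyline probability of $U$: $\Pr(U) = \frac{1}{l} \sum_{u \in U} \Pr(u)$
- p-Skyline: the set of uncertain objects $U$ with $\Pr(U) \ge p$, for a given probability threshold $p$.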
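The three definitions translate almost literally into code. This is a brute-force sketch, assuming smaller values are better and objects given as lists of equally likely instances; the function names are hypothetical, and it computes every $\Pr(u)$ exactly, which is exactly the work the bottom-up algorithm tries to avoid.

```python
from fractions import Fraction
from math import prod
from typing import List, Sequence, Tuple

Instance = Tuple[float, ...]
UncertainObject = List[Instance]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q, assuming smaller values are better."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def instance_skyline_prob(u: Instance, others: List[UncertainObject]) -> Fraction:
    """Pr(u): product over other objects V of Pr[V's instance does not dominate u]."""
    return prod(
        (1 - Fraction(sum(1 for v in V if dominates(v, u)), len(V)) for V in others),
        start=Fraction(1),
    )


def skyline_prob(objects: List[UncertainObject], i: int) -> Fraction:
    """Pr(U) for U = objects[i]: the average of Pr(u) over its instances."""
    U = objects[i]
    others = [V for j, V in enumerate(objects) if j != i]
    return sum((instance_skyline_prob(u, others) for u in U), Fraction(0)) / len(U)


def p_skyline(objects: List[UncertainObject], p: Fraction) -> List[int]:
    """Indices of objects whose skyline probability is at least p."""
    return [i for i in range(len(objects)) if skyline_prob(objects, i) >= p]


objects = [[(1, 1), (5, 5)], [(2, 2), (3, 3)], [(6, 6)]]
print([str(skyline_prob(objects, i)) for i in range(3)])  # ['1/2', '1/2', '0']
print(p_skyline(objects, Fraction(1, 2)))  # [0, 1]
```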
11
The bottom-up skyline algorithm
- Bounding: compute upper and lower bounds on the skyline probability of each object.
- Pruning: if the lower bound of $\Pr(U)$ is at least $p$, then $U$ is in the p-skyline; if the upper bound of $\Pr(U)$ is smaller than $p$, then $U$ is not in the p-skyline.
- Refining: if $p$ lies between the lower and upper bounds, the next iteration of the algorithm computes tighter bounds on the skyline probabilities.
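The control flow of bound, prune, refine can be sketched independently of how the bounds are obtained. In this minimal sketch, `refine(obj)` is a hypothetical callback that yields successively tighter (lower, upper) bounds and ends with an exact pair, so every object is eventually decided; the toy bound lists stand in for the real bounding and refinement steps.

```python
from typing import Callable, Dict, Hashable, Iterator, List, Tuple

Bounds = Tuple[float, float]  # (lower, upper) bound on Pr(U)


def bound_prune_refine(
    objects: List[Hashable],
    refine: Callable[[Hashable], Iterator[Bounds]],
    p: float,
) -> List[Hashable]:
    """Generic bound/prune/refine loop over hypothetical bound sequences.

    The last pair yielded by refine(obj) must satisfy lower == upper,
    otherwise the iterator may be exhausted before obj is decided.
    """
    answer: List[Hashable] = []
    undecided: Dict[Hashable, Iterator[Bounds]] = {o: refine(o) for o in objects}
    while undecided:
        for obj in list(undecided):
            lower, upper = next(undecided[obj])  # next, tighter bounds
            if lower >= p:  # certainly in the p-skyline
                answer.append(obj)
                del undecided[obj]
            elif upper < p:  # certainly not in the p-skyline
                del undecided[obj]
            # otherwise p lies between the bounds: refine in the next round
    return answer


# Toy bound sequences standing in for real bounding/refinement steps.
toy_bounds = {
    "a": [(0.2, 0.9), (0.6, 0.8)],
    "b": [(0.1, 0.4)],
    "c": [(0.3, 0.7), (0.45, 0.45)],
}
print(bound_prune_refine(["a", "b", "c"], lambda o: iter(toy_bounds[o]), 0.5))  # ['a']
```

Object "b" is pruned in the first round from its upper bound alone; "a" and "c" need one refinement each.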
12
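Bounding
- $U_{\min} = (\min_{i=1}^{l} u_i.D_1, \dots, \min_{i=1}^{l} u_i.D_d)$ and $U_{\max} = (\max_{i=1}^{l} u_i.D_1, \dots, \max_{i=1}^{l} u_i.D_d)$, where $D_1, \dots, D_d$ are the dimensions.
- Lemma: if $u_{i_1} \prec u_{i_2}$, then $\Pr(u_{i_1}) \ge \Pr(u_{i_2})$.
- Hence $\Pr(U_{\min}) \ge \Pr(U) \ge \Pr(U_{\max})$.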
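The corner points $U_{\min}$ and $U_{\max}$ are virtual instances, and their skyline probabilities bracket $\Pr(U)$. The sketch below, assuming smaller values are better and equally likely instances, computes both corners, evaluates them like ordinary instances, and compares the bounds with the exact value; all helper names are hypothetical.

```python
from fractions import Fraction
from math import prod
from typing import List, Sequence, Tuple

Instance = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q, assuming smaller values are better."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def corners(U: List[Instance]) -> Tuple[Instance, Instance]:
    """U_min and U_max: per-dimension minimum and maximum over the instances."""
    dims = range(len(U[0]))
    u_min = tuple(min(u[d] for u in U) for d in dims)
    u_max = tuple(max(u[d] for u in U) for d in dims)
    return u_min, u_max


def point_skyline_prob(x: Instance, others: List[List[Instance]]) -> Fraction:
    """Skyline probability of a (possibly virtual) point x against the other objects."""
    return prod(
        (1 - Fraction(sum(1 for v in V if dominates(v, x)), len(V)) for V in others),
        start=Fraction(1),
    )


def skyline_prob_bounds(U, others) -> Tuple[Fraction, Fraction]:
    """(upper, lower) = (Pr(U_min), Pr(U_max)) bounding Pr(U)."""
    u_min, u_max = corners(U)
    return point_skyline_prob(u_min, others), point_skyline_prob(u_max, others)


U = [(1, 4), (4, 1)]
others = [[(2, 2), (5, 5)]]
upper, lower = skyline_prob_bounds(U, others)
exact = sum((point_skyline_prob(u, others) for u in U), Fraction(0)) / len(U)
print(upper, exact, lower)  # 1 1 1/2
```

Here the lower bound is loose: $U_{\max} = (4, 4)$ is dominated by $(2, 2)$, although neither real instance of $U$ is.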
13
Pruning
- Rule 1. For an uncertain object $U$ and probability threshold $p$: if $\Pr(U_{\min}) < p$, then $U$ is not in the p-skyline; if $\Pr(U_{\max}) \ge p$, then $U$ is in the p-skyline.
- Rule 2. For each instance $u \in U$, let $\Pr^+(u)$ and $\Pr^-(u)$ be upper and lower bounds of $\Pr(u)$. If $\frac{1}{l} \sum_{u \in U} \Pr^+(u) < p$, then $U$ is not in the p-skyline; if $\frac{1}{l} \sum_{u \in U} \Pr^-(u) \ge p$, then $U$ is in the p-skyline.
- Rule 3. Let $U$ and $V$ be two different uncertain objects. If $u \in U$ and $V_{\max} \prec u$, then $\Pr(u) = 0$, because every instance of $V$ dominates $u$.
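Rules 2 and 3 are cheap checks once per-instance bounds and the corner of $V$ are known. A minimal sketch, assuming smaller values are better and bounds given as exact fractions; `rule2` and `rule3` are hypothetical names, and `rule2` returns `None` when the rule cannot decide.

```python
from fractions import Fraction
from typing import List, Optional, Sequence, Tuple

Instance = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q, assuming smaller values are better."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def rule2(instance_bounds: List[Tuple[Fraction, Fraction]], p: Fraction) -> Optional[bool]:
    """Rule 2 on per-instance bounds (Pr^-(u), Pr^+(u)).

    Returns True (U is in the p-skyline), False (U is pruned) or None (undecided).
    """
    l = len(instance_bounds)
    upper = sum(hi for _, hi in instance_bounds) / l
    lower = sum(lo for lo, _ in instance_bounds) / l
    if upper < p:
        return False
    if lower >= p:
        return True
    return None


def rule3(u: Instance, V: List[Instance]) -> bool:
    """Rule 3: True if V_max dominates u, in which case Pr(u) = 0."""
    v_max = tuple(max(v[d] for v in V) for d in range(len(u)))
    return dominates(v_max, u)


bounds = [(Fraction(1, 2), Fraction(3, 4)), (Fraction(1, 4), Fraction(1, 2))]
print(rule2(bounds, Fraction(3, 4)), rule2(bounds, Fraction(1, 2)), rule2(bounds, Fraction(1, 4)))
# False None True
print(rule3((5, 5), [(1, 2), (3, 1)]))  # True
```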
14
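Pruning
- Rule 4. Let $U$ and $V$ be two uncertain objects and $U' \subseteq U$ a subset of the instances of $U$ such that $U'_{\max} \prec V_{\min}$. Then every instance of $U'$ dominates every instance of $V$, so $\Pr(V) \le 1 - |U'|/|U|$. If $\frac{|U'|}{|U|} > 1 - p$, then $\Pr(V) < p$, and thus $V$ is not in the p-skyline.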
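Rule 4 needs some subset $U'$; one convenient choice, assumed here rather than taken from the paper, is the set of instances of $U$ that are coordinate-wise no larger than $V_{\min}$, after which the strict dominance $U'_{\max} \prec V_{\min}$ is checked explicitly. Smaller values are assumed better, and `rule4_prunes` is a hypothetical name.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple

Instance = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q, assuming smaller values are better."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def rule4_prunes(U: List[Instance], V: List[Instance], p: Fraction) -> bool:
    """True if Rule 4 shows Pr(V) < p using instances of U.

    Assumed choice of U': instances of U coordinate-wise <= V_min.
    """
    dims = range(len(V[0]))
    v_min = tuple(min(v[d] for v in V) for d in dims)
    u_prime = [u for u in U if all(a <= b for a, b in zip(u, v_min))]
    if not u_prime:
        return False
    u_prime_max = tuple(max(u[d] for u in u_prime) for d in dims)
    if not dominates(u_prime_max, v_min):  # rule requires U'_max ≺ V_min
        return False
    return Fraction(len(u_prime), len(U)) > 1 - p


U = [(1, 1), (2, 2), (9, 9)]
V = [(5, 5), (6, 7)]
print(rule4_prunes(U, V, Fraction(1, 2)), rule4_prunes(U, V, Fraction(1, 4)))  # True False
```

With $|U'|/|U| = 2/3$, $V$ is pruned for $p = 1/2$ (since $2/3 > 1/2$) but not for $p = 1/4$ (since $2/3 \le 3/4$).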
15
Refinement
Partition instances into layers
16
Algorithm summary
- Complexity: $O(W_{total} \cdot R)$, where
  - $W_{total}$ is the number of instances whose skyline probabilities are computed by the algorithm;
  - $R$ is the average cost of querying the local R-tree of possibly dominating objects.
- $W_{total}$ is typically much smaller than the total number of instances.
- Top-down algorithm: see the paper.
17
Lazy Maintenance of Materialized Views
Based on the VLDB’07 paper by Jingren Zhou et al.
18
Eager and deferred materialized view maintenance
View V is defined over base tables T1 and T2.
- Eager:
  - User transaction: {upd(T1), upd(T2)}
  - Executed: {upd(T1), upd(T2), recomp(V)}
- Deferred:
  - User transaction: {upd(T1), upd(T2)}
  - Executed: {upd(T1), upd(T2)}
  - …
  - User transaction: {recomp(V)}
  - …
  - User transaction: {Q(V)}
  - Executed: {Q(V)}
19
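Lazy materialized view maintenance
View V is defined over base tables T1 and T2.
- Lazy:
  - User transaction: {upd(T1), upd(T2)}
  - Executed: {upd(T1), upd(T2)}
  - …
  - Executed by the system (no user transaction): {recomp(V)}
  - …
  - User transaction: {Q(V)}
  - Executed: {Q(V)}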
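The difference from eager maintenance is that the update transaction only touches base tables and records deltas, while the view is brought up to date later, either in the background or, at the latest, just before a query reads it. The toy model below illustrates that idea for a hypothetical view V = SUM(T1.x) + SUM(T2.x); the class and method names are assumptions, and it is not the SQL Server prototype.

```python
from typing import Dict, List, Tuple


class LazySumView:
    """Toy lazy view V = SUM(T1.x) + SUM(T2.x) with pending deltas."""

    def __init__(self) -> None:
        self.tables: Dict[str, List[int]] = {"T1": [], "T2": []}
        self.materialized = 0                    # stored value of V
        self.pending: List[Tuple[str, int]] = []  # deltas not yet applied to V
        self.log: List[str] = []                  # executed steps, for illustration

    def user_update(self, table: str, value: int) -> None:
        # The user transaction updates the base table and records a delta;
        # view maintenance is not part of the transaction.
        self.tables[table].append(value)
        self.pending.append((table, value))
        self.log.append(f"upd({table})")

    def maintain(self) -> None:
        # Incremental refresh (shown as recomp(V) on the slides); a system
        # scheduler would call this when there are free cycles.
        if self.pending:
            self.materialized += sum(v for _, v in self.pending)
            self.pending.clear()
            self.log.append("recomp(V)")

    def query(self) -> int:
        # Outstanding maintenance is finished before the query reads V,
        # so the query never sees a stale result.
        self.maintain()
        self.log.append("Q(V)")
        return self.materialized


v = LazySumView()
v.user_update("T1", 3)
v.user_update("T2", 4)
print(v.materialized, v.query())  # 0 7
print(v.log)  # ['upd(T1)', 'upd(T2)', 'recomp(V)', 'Q(V)']
```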
20
System architecture
Based on MS SQL Server 2005
21
How it works
22
Delta tables
- $Table_1$: {(transID_i, stmtID_i, rowID_i, action_i)}
- …
- $Table_n$: {(transID_i, stmtID_i, rowID_i, action_i)}
where
- transID: transaction ID
- stmtID: statement ID
- rowID: ID of the updated row
- action ∈ {ins, del}
All "update" actions are converted into del/ins pairs.
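Recording a transaction's changes in a delta table amounts to emitting one row per affected base row, with each update split into a delete followed by an insert. A minimal sketch, assuming statements arrive as hypothetical (operation, rowID) pairs and statement IDs are assigned in order starting at 1:

```python
from typing import List, NamedTuple, Tuple


class DeltaRow(NamedTuple):
    trans_id: int
    stmt_id: int
    row_id: int
    action: str  # "ins" or "del"


def to_delta_rows(trans_id: int, statements: List[Tuple[str, int]]) -> List[DeltaRow]:
    """Turn (op, row_id) statements into delta rows; 'upd' becomes del + ins."""
    rows: List[DeltaRow] = []
    for stmt_id, (op, row_id) in enumerate(statements, start=1):
        if op == "upd":
            rows.append(DeltaRow(trans_id, stmt_id, row_id, "del"))
            rows.append(DeltaRow(trans_id, stmt_id, row_id, "ins"))
        elif op in ("ins", "del"):
            rows.append(DeltaRow(trans_id, stmt_id, row_id, op))
        else:
            raise ValueError(f"unknown operation: {op}")
    return rows


for row in to_delta_rows(7, [("ins", 1), ("upd", 2)]):
    print(row)
```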
23
Maintenance and its optimization
- A maintenance task is created for each view affected by a transaction.
- Views are updated incrementally using the delta tables.
- "Smart" maintenance task scheduler:
  - Maintenance tasks are scheduled as low-priority jobs.
  - Maintenance tasks are combined using the Condense operator.
  - A proper time slot is allocated for each task.
24
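Delta stream Condense operator
- Intuition: a transaction $\{A := 1, \dots, A := 2, \dots, A := 3\} \Rightarrow \{\dots, A := 3\}$
- Operator definition (actions on the same row $row_a$):
  - INS/INS condense: $\{\mathrm{ins}_1(row_a), \dots, \mathrm{ins}_k(row_a)\} \Rightarrow \{\dots, \mathrm{ins}_k(row_a)\}$
  - INS/DEL condense: $\{\mathrm{ins}_1(row_a), \dots, \mathrm{del}_k(row_a)\} \Rightarrow \{\dots\}$
  - DEL/DEL condense: $\{\mathrm{del}_1(row_a), \dots, \mathrm{del}_k(row_a)\} \Rightarrow \{\dots, \mathrm{del}_k(row_a)\}$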
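The three rules can be applied per rowID by looking at the first and last action recorded for that row. This is a minimal sketch under two assumptions not stated on the slide: the net effect for a row depends only on its first and last action, and the DEL/INS case (delete followed by a later insert, which the slide does not define) keeps the first delete and the last insert. `Delta` and `condense` are hypothetical names.

```python
from typing import Dict, List, NamedTuple


class Delta(NamedTuple):
    trans_id: int
    stmt_id: int
    row_id: int
    action: str  # "ins" or "del"


def condense(stream: List[Delta]) -> List[Delta]:
    """Condense a delta stream row by row, in order of first appearance."""
    first: Dict[int, Delta] = {}
    last: Dict[int, Delta] = {}
    for d in stream:
        first.setdefault(d.row_id, d)
        last[d.row_id] = d
    out: List[Delta] = []
    for row_id, f in first.items():  # dicts preserve insertion order
        l = last[row_id]
        if f.action == "ins" and l.action == "ins":
            out.append(l)  # INS/INS: keep the last insert
        elif f.action == "ins" and l.action == "del":
            pass  # INS/DEL: the row's changes cancel out
        elif f.action == "del" and l.action == "del":
            out.append(l)  # DEL/DEL: keep the last delete
        else:
            out.extend([f, l])  # DEL/INS: assumption, keep delete and insert
    return out


stream = [
    Delta(1, 1, 10, "ins"), Delta(1, 2, 10, "ins"),
    Delta(1, 3, 20, "ins"), Delta(1, 4, 20, "del"),
    Delta(1, 5, 30, "del"), Delta(1, 6, 30, "del"),
]
for d in condense(stream):
    print(d)
```

Only one delta survives for rows 10 and 30, and row 20 disappears entirely, which is the saving the scheduler exploits when it combines maintenance tasks.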
25
Performance results
- Response time is low
- Query response time is low
- Maintenance cost vs. eager view update cost
- Overhead is low