Tuning the top-k view update process

1 Tuning the top-k view update process
Eftychia Baikousi, Panos Vassiliadis
University of Ioannina, Dept. of Computer Science

2 Forecast
Problem: maintaining materialized top-k views when updates occur in the base relation.
Extra difficulty: addressing the problem in the presence of high deletion rates.
The crux of the approach is to materialize an appropriate number of tuples, kcomp (k plus some extra tuples), to sustain deletion rates that are drastically higher than average.
The correct estimation and fine tuning of kcomp is not obvious; we use appropriate probabilistic methods.
M-Pref 2007, Vienna 23/9/2007

3 Contents
Motivation & Problem Definition
Overview of our Method
  Computation of rates affecting the view
  Computation of kcomp
  Fine tuning kcomp
Experiments
Conclusions

5 Top-k query
Given a relation R (id, x1, x2, x3) and a query Q, sum(x1, x2, x3),
find the k tuples with the highest grades according to Q.

R
id  sum(x1, x2, x3)
a   1.6
b   0.9
c   1.8
d   1.4

Attribute values as extracted (incomplete): a 0.3 0.6 0.7, b 0.2 0.4, c 0.5 0.9, d 0.1
Top-2 tuples: c (1.8) and a (1.6)

6 Motivating Example
Shopping Center
  Customers sign in with a palmtop (PDA).
  Need for advertisements: special offers to customers.
Given
  a relation Customers (id, name, age, salary, …),
  a materialized view V of the top-2 customers (younger and highly paid) according to the query Q: -age + 2*salary.
Maintain the view V
  Customers sign in and out (e.g., train departures, working hours).

Customers
id  name   age  salary  Q
1   John   18   20      22
2   Mary   42   25      8
3   Bill   26   35      44
4   Peter  57   37      17

V
name  Q
Bill  44
John  22
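As a minimal sketch of how the view V in this example could be derived, the snippet below scores each Customers row with Q = -age + 2*salary and keeps the two best. The function names score and top_k_view are hypothetical helpers introduced only for illustration; the slides do not prescribe an implementation.

```python
import heapq

# Customers rows from the slide: (id, name, age, salary)
customers = [
    (1, "John", 18, 20),
    (2, "Mary", 42, 25),
    (3, "Bill", 26, 35),
    (4, "Peter", 57, 37),
]

def score(row):
    """Q = -age + 2*salary, as defined on the slide."""
    _, _, age, salary = row
    return -age + 2 * salary

def top_k_view(rows, k):
    """Return the k highest-scoring rows as (name, Q) pairs, best first."""
    best = heapq.nlargest(k, rows, key=score)
    return [(row[1], score(row)) for row in best]

view = top_k_view(customers, 2)
print(view)  # [('Bill', 44), ('John', 22)]
```

The result matches the view V shown on the slide: Bill (44) and John (22).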

7 Problem definition
Given
  a base relation R (ID, X, Y) that originally contains N tuples,
  a materialized view V that contains the top-k tuples in the form (id, val), where val is the score according to a function Q(x, y) = ax + by and a, b are constant parameters,
  the update ratios ins, del and upd for insertions, deletions and updates, respectively, over the base relation R,
Compute
  kcomp, of the form kcomp = k + Δk,
Such that
  the view will contain at least k tuples, k ≤ kcomp, with probability p, after a period T.
V holds (id, Q) pairs: the k requested tuples followed by Δk extra tuples, kcomp tuples in total.

8 Related Work
Ke Yi, Hai Yu, Jun Yang, Gangqiang Xia, Yuguo Chen: “Efficient Maintenance of Materialized Top-k Views”, ICDE ’03
  Maintains a materialized top-k view when updates occur in the base table.
  Computes a kmax (instead of the necessary k), adjusted at runtime, so that a refill query is rarely needed.
  Formulates the problem through a random walk model.
  The method is theoretically guaranteed to work well only when
    insertions and deletions are equally probable, pins = pdel, or
    insertions are more frequent than deletions, pins > pdel.
  There is no quality-of-service guarantee when deletions are more probable than insertions, pins < pdel.

9 Motivating Example
Customers sign in and out, due to train departures and working hours.
At certain time periods, deletions are more probable than insertions, pins < pdel.
The view will then not contain at least k tuples.

Customers
id  name   age  salary  Q
1   John   18   20      22
2   Mary   42   25      8
3   Bill   26   35      44
4   Peter  57   37      17

V
name  Q
Bill  44
John  22

11 Overview of the method
  Compute the ratios of the incoming source updates that affect the view
  Compute kcomp
  Fine tune kcomp

12 Empirical Cumulative Distribution Function (ECDF)
The ECDF is a non-parametric cumulative distribution function that adapts itself to the data.
Definition
  Fn(x) represents the proportion of observations in a sample that are less than or equal to x,
  assigns the probability 1/n to each of the n observations in the sample,
  estimates the true population proportion F(x).
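The definition of Fn(x) can be illustrated with a short sketch. It builds an ECDF from a sample and evaluates it at a few points; make_ecdf is a hypothetical helper name, and the sample reuses the four Q scores of the Customers example purely for illustration.

```python
from bisect import bisect_right

def make_ecdf(sample):
    """Return F_n, where F_n(x) = (number of observations <= x) / n."""
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("the sample must contain at least one observation")

    def ecdf(x):
        # bisect_right counts how many sorted observations are <= x.
        return bisect_right(ordered, x) / n

    return ecdf

scores = [22, 8, 44, 17]  # Q values of the four customers
F = make_ecdf(scores)
print(F(7), F(17), F(22), F(44))  # 0.0 0.5 0.75 1.0
```

Each observation contributes a step of height 1/n = 0.25, which is why the values rise in quarters.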

13 Computation of update rates that affect V
Given
  a relation Customers (id, name, age, salary, …) having N=4 tuples,
  a materialized view V containing the top-2 tuples (k=2) in the form (id, Q), where Q = -age + 2*salary is the score,
  update ratios ins=1, del=2, upd=0.
Find ins_aff and del_aff (the insertions and deletions affecting the view).

Customers
id  name   age  salary  Q
1   John   18   20      22
2   Mary   42   25      8
3   Bill   26   35      44
4   Peter  57   37      17

V
name  Q
Bill  44
John  22

14 Computation of update rates that affect V
Given N=4, ins=1, del=2, upd=0, we compute the following:
  updates are treated as a combination of deletions and insertions,
  from the ECDF, the probability of a new tuple affecting the view,
  the ratios ins_aff and del_aff affecting the view.

16 Computation of kcomp
Compute kcomp, of the form kcomp = k + Δk, such that it guarantees that the view will contain at least k tuples, k ≤ kcomp, with probability p, after a period of operation T.

Customers
id  name   age  salary  Q
1   John   18   20      22
2   Mary   42   25      8
3   Bill   26   35      44
4   Peter  57   37      17

V (kcomp = 3 tuples: k = 2 plus Δk = 1)
name   Q
Bill   44
John   22
Peter  17

17 Computation of kcomp
There is 1 insertion and 2 deletions affecting the view:
  tuple (5, Kate, 25, 30) is inserted, with Q = 25,
  tuples (3, Bill, 26, 35) and (4, Peter, 57, 37) are deleted from the view.

V after the insertion
name   Q
Bill   44
Kate   25
John   22
Peter  17

After the two deletions the view will contain 2 tuples, as initially needed.
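The example can be replayed in a few lines. The sketch assumes Δk = del_aff - ins_aff (floored at zero), which is what the numbers of the example suggest (2 - 1 = 1 extra tuple); the slides do not state the formula explicitly, and delta_k is a hypothetical helper name.

```python
def delta_k(ins_aff, del_aff):
    """Extra tuples to materialize: the net loss of view tuples, never negative.

    Assumption inferred from the slide's example, not a formula quoted from it.
    """
    return max(0, del_aff - ins_aff)

k = 2
k_comp = k + delta_k(ins_aff=1, del_aff=2)

# View materialized with k_comp tuples, as (name, Q) pairs, best first.
view = [("Bill", 44), ("John", 22), ("Peter", 17)]
assert len(view) == k_comp

# One insertion affecting the view ...
view.append(("Kate", 25))
# ... and two deletions affecting the view.
view = [t for t in view if t[0] not in {"Bill", "Peter"}]
view.sort(key=lambda t: t[1], reverse=True)

print(k_comp, view)  # 3 [('Kate', 25), ('John', 22)]
```

The view ends with exactly k = 2 tuples, so no refill query against the base relation is needed.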

19 Fine tune kcomp
kcomp is expressed as a formula depending on ins_aff and del_aff, the ratios of insertions and deletions affecting the view.
The probability of a tuple affecting the view may vary according to probabilistic properties.
Fine tune kcomp by adding the appropriate variance.

20 Fine tune kcomp
The probability of a new tuple z affecting the view is p(z > valk).
Each insertion is a Bernoulli experiment with 2 possible events:
  the new tuple z affects the view, with probability p(z),
  the new tuple z does not affect the view, with probability 1 - p(z).
ins insertions in the base relation correspond to ins Bernoulli experiments.
The number of successes of ins Bernoulli experiments follows a Binomial distribution with variance ins · p(z) · (1 - p(z)).
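As a small worked illustration of the Binomial model, the sketch below computes the expected number of insertions affecting the view and its variance n·p·(1 - p). The input values (1000 insertions, p = 0.25) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def binomial_mean(n, p):
    """Expected number of successes in n Bernoulli experiments."""
    return n * p

def binomial_variance(n, p):
    """Variance of the number of successes in n Bernoulli experiments."""
    return n * p * (1 - p)

ins = 1000   # hypothetical number of insertions in the base relation
p = 0.25     # hypothetical probability that a new tuple affects the view

mean_ins_aff = binomial_mean(ins, p)
var_ins = binomial_variance(ins, p)
std_ins = math.sqrt(var_ins)
print(mean_ins_aff, var_ins)  # 250.0 187.5
```

The standard deviation (about 13.7 here) indicates how far the actual number of affecting insertions may drift from its mean, which is what the fine tuning step accounts for.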

21 Fine tune kcomp
In the worst case, in order to guarantee with 95% confidence that the view will contain at least k tuples, kcomp is computed using
  VARins, the variance of the insertions,
  VARdel, the variance of the deletions.
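The exact formula of this slide is not available in the transcript. The sketch below shows one plausible form, given purely as an assumption: it takes the worst case to be "fewest insertions, most deletions", each shifted by z standard deviations from its mean, with z = 2 as a common rule-of-thumb approximation for 95% confidence. The function name and its parameters are hypothetical.

```python
import math

def k_comp_fine_tuned(k, ins_aff, del_aff, var_ins, var_del, z=2.0):
    """Assumed worst-case sizing of the view; not the authors' quoted formula.

    Pessimistic insertions: mean minus z standard deviations (floored at 0).
    Pessimistic deletions:  mean plus z standard deviations.
    """
    worst_ins = max(0.0, ins_aff - z * math.sqrt(var_ins))
    worst_del = del_aff + z * math.sqrt(var_del)
    extra = max(0.0, worst_del - worst_ins)
    return k + math.ceil(extra)

# Hypothetical inputs, chosen so the square roots are exact.
k_comp = k_comp_fine_tuned(k=10, ins_aff=4, del_aff=8, var_ins=4, var_del=9)
print(k_comp)  # 24
```

Without the variance terms the same inputs would give 10 + (8 - 4) = 14 tuples, so the fine tuning trades extra memory for a stronger guarantee.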

23 Experimental methodology
Test the following methods:
  kcomp without fine tuning
  kcomp with fine tuning
  Yi et al., ICDE ’03
For the following measures:
  Number of tuples (# tuples) deleted from the view that make it fall below the threshold value of k
  Memory overhead for kcomp with and without fine tuning, as the number of extra tuples that must be kept in the view
  Number of extra tuples for kcomp with and without fine tuning, compared to the number of extra tuples of the related work

24 Experimental methodology
Experimental parameters:

Parameter                                  Symbol  Values
Size of source table R (tuples)            |R|     1x10^5, 5x10^5, 1x10^6, 2x10^6
Size of mat. view (tuples)                 k       5, 10, 100, 1000
Size of update stream (pct over |R|)               1/1000, 1/100
Deletion rate over insertion rate (ratio)  D/I     1.0, 1.5, 2.0

Synthetic data sets:
  Gaussian distribution with mean μ=50 and variance σ=10
  Negative exponential distribution with parameters a=1.0 for X and a=2.0 for Y
  Zipf distribution with parameter a=2.1

25 Max & average misses: kcomp without fine tuning, Gaussian distribution
[Figure, two panels: as a function of R and a second parameter (symbol lost in extraction); as a function of k and D/I]

26 Memory overhead
[Figure: number of extra tuples as a function of R and D/I]

27 Comparison with related work
[Figure: number of extra tuples of kcomp with fine tuning compared with kmax of the related work, as a function of R]

28 Comparison with related work
[Figure: number of extra tuples of kcomp with fine tuning compared with kmax of the related work, as a function of k]

30 Conclusions
We handled the problem of maintaining materialized top-k views in the presence of high deletion rates.
The method comprises the following steps:
  a computation of the rate that actually affects the materialized view,
  a computation of the necessary extension to k in order to handle the augmented number of deletions that occur, and
  a fine tuning part that adjusts this value to take the fluctuation of its statistical properties into consideration.

31 Thank you for your attention!
… many thanks to our hosts!
This research was co-funded by the European Union in the framework of the program “Pythagoras II” of the “Operational Program for Education and Initial Vocational Training” of the 3rd Community Support Framework of the Hellenic Ministry of Education, funded by 25% from national sources and by 75% from the European Social Fund (ESF).

32 Auxiliary slides
Formulas for kcomp

33 Time to build top-k view in microseconds

|R|   k     Gauss   Negative exponential  Zipf
100K  5     328000  348500                242000
100K  10    333000  345667                239667
100K  100   335500  343000
100K  1000  395333  406000                299500

The rows for 500K, 1M and 2M carry no values.

