Efficient Computation of Temporal Aggregates with Range Predicates


2 Efficient Computation of Temporal Aggregates with Range Predicates D. Zhang *, A. Markowetz **, V. J. Tsotras *, D. Gunopulos * and B. Seeger ** * University of California, Riverside ** Philipps Universität Marburg, Germany

3 Outline
- Introduction & Motivation
- Problem Decomposition
- The MVSB-tree
- Performance Results
- Conclusions

4 Introduction & Motivation Consider a collection of temporal records. Each record: key k, value v, time interval [t1, t2]. E.g.: employees and their salaries over time. Temporal Aggregation: aggregate values over time. Focus on SUM/COUNT/AVG.

5-7 Previous Work ‘Given time t, aggregate over all records that contain t’. [Tum92, KS95, YK97, GHR+, MLI00] E.g. the sum at t2 is 13. ‘Given interval [t1, t2], aggregate over all records that intersect [t1, t2]’. (SB-tree [YW01]) E.g. the sum over [t1, t2] is 28. Introduction & Motivation

8-9 Range-Temporal Aggregation (RTA) ‘Aggregate over all records intersecting interval [t1, t2] with keys in range [k1, k2]’. E.g. the RTA-sum over [k1, k2] x [t1, t2] is 19. E.g. find the AVG salary over the past ten years of all employees whose last names start with ‘B’. Introduction & Motivation

10 Previous approaches would need a separate index for each possible key range (inefficient).
Alternative:
- index the records;
- run a selection query: ‘find all records intersecting [k1, k2] x [t1, t2]’;
- query time is O(n).
Our solution: O(log_b n).
Introduction & Motivation
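The selection-query baseline can be sketched as a plain linear scan. This is only an illustration of the O(n) behaviour, not code from the presentation: the `Record` type, the `rta_scan` name, the sample data, and the conventions of half-open record intervals [start, end) and a half-open key range [k1, k2) are all assumptions made for the example.

```python
from typing import Iterable, NamedTuple, Optional, Tuple


class Record(NamedTuple):
    key: str      # e.g. an employee's last name (hypothetical data)
    start: int    # first time instant at which the record is valid
    end: int      # first time instant at which it is no longer valid
    value: float  # e.g. a salary


def rta_scan(records: Iterable[Record], k1: str, k2: str,
             t1: int, t2: int) -> Tuple[float, int, Optional[float]]:
    """Return (SUM, COUNT, AVG) over records with k1 <= key < k2
    whose interval [start, end) intersects the query interval [t1, t2].
    Every record is inspected, so the cost is O(n)."""
    hits = [r.value for r in records
            if k1 <= r.key < k2 and r.start <= t2 and r.end > t1]
    total = sum(hits)
    count = len(hits)
    avg = total / count if count else None
    return total, count, avg


employees = [
    Record("Adams", 1995, 2003, 50_000.0),
    Record("Baker", 1990, 2002, 40_000.0),
    Record("Brown", 2001, 2010, 60_000.0),
    Record("Chen", 2000, 2010, 70_000.0),
]

# Last names starting with 'B' are exactly the keys in ["B", "C").
print(rta_scan(employees, "B", "C", 2000, 2010))
```

AVG is derived from SUM and COUNT, which is why the presentation can treat the three aggregates together.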

11 Problem Decomposition Decompose RTA into LKST and LKLT queries. LKST query: given k, t, aggregate over all records with keys less than k and intervals containing t. E.g. LKST(k2, t2) = 11.

12 Problem Decomposition LKLT query: given k, t, aggregate over all records with keys less than k and intervals ending before t. E.g. LKLT(k2, t2) = 20.
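Written as sums, the two queries might look as follows for the SUM aggregate. The notation is an assumption made here, not taken from the slides: each record r has a key r.key, a value r.v, and a half-open interval [r.s, r.e), so that "ending before t" means r.e <= t. For COUNT, replace r.v by 1.

```latex
\begin{align*}
\mathrm{LKST}(k, t) &= \sum_{r:\ r.\mathit{key} < k,\ r.s \le t < r.e} r.v \\
\mathrm{LKLT}(k, t) &= \sum_{r:\ r.\mathit{key} < k,\ r.e \le t} r.v
\end{align*}
```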

13-22 Problem Decomposition [Figure, built up over ten slides: the region RTA([k1, k2] x [t1, t2]) is expressed as sums and differences of the regions for LKST(k2, t2), LKST(k1, t2), LKLT(k2, t2), LKLT(k1, t2), LKLT(k2, t1) and LKLT(k1, t1).]

23 Problem Decomposition The RTA query is decomposed into LKST and LKLT queries:
RTA([k1, k2] x [t1, t2]) = LKST(k2, t2) - LKST(k1, t2) + LKLT(k2, t2) - LKLT(k1, t2) - LKLT(k2, t1) + LKLT(k1, t1)
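The decomposition can be checked against a direct scan on a small example. The sketch below uses brute-force definitions of LKST and LKLT, so it says nothing about efficiency; it only illustrates why the six terms combine to the RTA result. The boundary conventions are assumptions chosen so that the identity holds exactly: record intervals are half-open [start, end), the key range is half-open [k1, k2), and "ending before t" means end <= t. All names are hypothetical.

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    key: int
    start: int  # inclusive
    end: int    # exclusive; start < end is assumed
    value: int


def lkst(records: List[Record], k: int, t: int) -> int:
    """Sum over records with key < k whose interval contains t."""
    return sum(r.value for r in records if r.key < k and r.start <= t < r.end)


def lklt(records: List[Record], k: int, t: int) -> int:
    """Sum over records with key < k whose interval ended before t."""
    return sum(r.value for r in records if r.key < k and r.end <= t)


def rta_direct(records: List[Record], k1: int, k2: int, t1: int, t2: int) -> int:
    """Sum over records with k1 <= key < k2 intersecting [t1, t2]."""
    return sum(r.value for r in records
               if k1 <= r.key < k2 and r.start <= t2 and r.end > t1)


def rta_decomposed(records: List[Record], k1: int, k2: int, t1: int, t2: int) -> int:
    """The six-term formula: records alive at t2, plus records that
    ended in [t1, t2), each restricted to the key range by subtraction."""
    return (lkst(records, k2, t2) - lkst(records, k1, t2)
            + lklt(records, k2, t2) - lklt(records, k1, t2)
            - lklt(records, k2, t1) + lklt(records, k1, t1))


records = [
    Record(2, 1, 5, 10),
    Record(4, 3, 8, 20),
    Record(7, 0, 2, 5),
    Record(4, 6, 9, 1),
]
print(rta_direct(records, 2, 5, 2, 4), rta_decomposed(records, 2, 5, 2, 4))
```

The reasoning behind the formula: a record intersecting [t1, t2] either is still alive at t2 (the LKST terms) or ended at some point in [t1, t2) (the LKLT difference between t2 and t1).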

24 Index Design Both LKST and LKLT are point queries: ‘given k, t, return value’. An index for LKST and LKLT should:
- store points in key-time space;
- maintain a value for each point;
- support point queries.

25 Model Assume updates come in increasing time order (transaction-time model). [Figure: how a record is inserted at t1 and updated at t2.] Index Design

26 The LKST index The effect of inserting record (k, [t1, t2], v): [Figure: the index at t1 and at t2.] Index Design

27 The LKLT index The effect of inserting record (k, [t1, t2], v): no update at t1; [Figure: the index at t2.] Index Design

28 Update Operation Common update operation for both: insert (k, t):v. That is: add v to all points in [k, k_max] x [t, t_max]. Conclusion: an index supporting point queries and the above update can be used for both LKLT and LKST. Index Design
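To make the update and point-query pair concrete, here is a minimal sketch that uses a two-dimensional Fenwick (binary indexed) tree as a stand-in. It is not the MVSB-tree: it works on a small fixed integer grid, treats time as an ordinary second dimension instead of using persistence, and does not require updates in time order. The class names, the grid bounds, and the half-open interval convention [start, end) are assumptions for illustration. The mapping from a record to updates (+v at start and -v at end for LKST, +v at end for LKLT) is derived from the query definitions under that convention.

```python
class DominanceSum:
    """Supports insert (k, t): v, i.e. add v to every point (k', t')
    with k' >= k and t' >= t, plus point queries, on [0, nk) x [0, nt)."""

    def __init__(self, nk: int, nt: int) -> None:
        self.nk, self.nt = nk, nt
        self.tree = [[0] * (nt + 1) for _ in range(nk + 1)]

    def insert(self, k: int, t: int, v: int) -> None:
        i = k + 1
        while i <= self.nk:
            j = t + 1
            while j <= self.nt:
                self.tree[i][j] += v
                j += j & -j
            i += i & -i

    def point_query(self, k: int, t: int) -> int:
        """Value stored at (k, t); returns 0 when k or t is -1."""
        total = 0
        i = k + 1
        while i > 0:
            j = t + 1
            while j > 0:
                total += self.tree[i][j]
                j -= j & -j
            i -= i & -i
        return total


class RTAIndex:
    """Two dominance-sum structures, one for LKST and one for LKLT."""

    def __init__(self, nk: int, nt: int) -> None:
        self.lkst_index = DominanceSum(nk, nt)
        self.lklt_index = DominanceSum(nk, nt)

    def add_record(self, key: int, start: int, end: int, value: int) -> None:
        self.lkst_index.insert(key, start, value)   # record becomes alive
        self.lkst_index.insert(key, end, -value)    # record stops being alive
        self.lklt_index.insert(key, end, value)     # record has now ended

    def lkst(self, k: int, t: int) -> int:
        return self.lkst_index.point_query(k - 1, t)  # keys strictly below k

    def lklt(self, k: int, t: int) -> int:
        return self.lklt_index.point_query(k - 1, t)

    def rta(self, k1: int, k2: int, t1: int, t2: int) -> int:
        """SUM over records with k1 <= key < k2 intersecting [t1, t2]."""
        return (self.lkst(k2, t2) - self.lkst(k1, t2)
                + self.lklt(k2, t2) - self.lklt(k1, t2)
                - self.lklt(k2, t1) + self.lklt(k1, t1))


index = RTAIndex(10, 10)
for key, start, end, value in [(2, 1, 5, 10), (4, 3, 8, 20), (7, 0, 2, 5), (4, 6, 9, 1)]:
    index.add_record(key, start, end, value)
print(index.rta(2, 5, 2, 4), index.rta(0, 10, 5, 7))
```

Each RTA query here costs six point queries, which mirrors the structure of the approach in the slides; the logarithmic bounds stated in the presentation belong to the MVSB-tree, not to this stand-in.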

29 The MVSB-tree A partially persistent SB-tree. It inherits features from both the SB-tree [YW01] and the MVBT [BGO+96].

30-32 Insertion To handle overflow, copy records with end = t_max to a new page. Strong overflow: limit the number of records in a new page. [Figure: after the copy, root1 covers [1, 4) and root2 covers [4, t_max).] The MVSB-tree

33-34 Point Query (k, t) Follows a single path: the nodes containing (k, t). Aggregates the values found in this path. E.g.: PointQuery(23, 7) = 5 + 2 = 7. The MVSB-tree

35 Efficiency Theorem: with 2 MVSB-tree indices, we achieve:
- RTA query: O(log_b n);
- Update: O(log_b K);
- Space: O( * log_b K).
n = number of updates; K = number of different keys; b = page capacity (in records). The MVSB-tree

36 Performance Results Sun Enterprise 250 server; two 300 MHz UltraSPARC-II processors; Solaris 2.8; GNU C++. Datasets: created using the TimeIT [KS98] software and transformed to add record keys. Each dataset has a million records (10k unique keys; on average 100 intervals per key). Compared against the straightforward approach using the MVBT [BGO+96] as the temporal index.

37 Index Sizes Performance Results

38 Query Speedup Query time is averaged over 100 queries of the same query rectangle size.

39 Conclusions
- We addressed the range-temporal aggregation (RTA) problem;
- New index structure (MVSB-tree) for incrementally maintaining and efficiently computing RTAs;
- Query time reduced from O(n) to O(log_b n) with small space overhead;
- Open problems:
  - Min/Max range-temporal aggregation;
  - Valid-time environment;
  - Multi-dimensional aggregation over objects with extents.

