Published by Emil Owen. Modified over 9 years ago.
2
Efficient Computation of Temporal Aggregates with Range Predicates. D. Zhang*, A. Markowetz**, V. J. Tsotras*, D. Gunopulos* and B. Seeger** (* University of California, Riverside; ** Philipps Universität Marburg, Germany)
3
Outline: Introduction & Motivation; Problem Decomposition; The MVSB-tree; Performance Results; Conclusions.
4
Introduction & Motivation. Consider a collection of temporal records. Each record has a key k, a value v, and a time interval [t1, t2]. E.g.: employees and their salaries over time. Temporal aggregation: aggregate values over time. The focus is on SUM/COUNT/AVG.
7
Previous Work. ‘Given time t, aggregate over all records that contain t’ [Tum92, KS95, YK97, GHR+, MLI00]. E.g. the sum at t2 is 13. ‘Given interval [t1, t2], aggregate over all records that intersect [t1, t2]’ (SB-tree [YW01]). E.g. the sum over [t1, t2] is 28.
9
Range-Temporal Aggregation (RTA): ‘Aggregate over all records intersecting interval [t1, t2] with keys in range [k1, k2]’. E.g. the RTA-sum over [k1, k2] x [t1, t2] is 19. Example query: find the AVG salary over the past ten years of all employees whose last names start with ‘B’.
10
Previous approaches would need a separate index for each possible key range (inefficient). Alternative: index the records, run the selection query ‘find all records intersecting [k1, k2] x [t1, t2]’, and aggregate the records it returns; query time is O(n). Our solution: O(log_b n).
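The O(n) alternative can be sketched as a plain scan. This is only an illustration, not the MVBT-based baseline used in the experiments. The `Record` type, the function name `rta_scan`, the sample data, the half-open record intervals [start, end) and the half-open key range [k1, k2) are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    key: int
    start: int    # record interval is half-open: [start, end)  (assumption)
    end: int
    value: float


def rta_scan(records, k1, k2, t1, t2):
    """Return (SUM, COUNT, AVG) over records with k1 <= key < k2 whose
    interval [start, end) intersects the closed query interval [t1, t2].
    Every record is inspected, so the cost is linear in the input size."""
    total, count = 0.0, 0
    for r in records:
        in_key_range = k1 <= r.key < k2
        intersects = r.start <= t2 and r.end > t1
        if in_key_range and intersects:
            total += r.value
            count += 1
    avg = total / count if count else None
    return total, count, avg


# Hypothetical sample data, not taken from the slides.
records = [
    Record(key=10, start=1, end=5, value=3.0),
    Record(key=20, start=2, end=9, value=4.0),
    Record(key=30, start=6, end=8, value=5.0),
    Record(key=40, start=1, end=3, value=7.0),
]

print(rta_scan(records, 10, 40, 4, 7))
```

AVG falls out of SUM and COUNT, which is why the three aggregates can be treated together.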
11
Problem Decomposition. Decompose RTA into LKST and LKLT queries. LKST query: given k, t, aggregate over all records with keys less than k and intervals containing t. E.g. LKST(k2, t2) = 11.
12
LKLT query: given k, t, aggregate over all records with keys less than k and intervals ending before t. E.g. LKLT(k2, t2) = 20.
23
RTA([k1, k2] x [t1, t2]) = LKST(k2, t2) - LKST(k1, t2) + LKLT(k2, t2) - LKLT(k1, t2) - LKLT(k2, t1) + LKLT(k1, t1). The RTA query is thus decomposed into LKST and LKLT queries.
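The decomposition can be checked numerically against a direct scan. The sketch below uses brute-force versions of LKST and LKLT rather than an index. It assumes integer keys and values, half-open record intervals [start, end), "ending before t" read as end <= t, and a half-open key range [k1, k2) so that "keys less than k" lines up at both ends. All function names and the random data are hypothetical.

```python
import random


def lkst(records, k, t):
    """Sum of values with key < k whose interval [start, end) contains t."""
    return sum(v for key, s, e, v in records if key < k and s <= t < e)


def lklt(records, k, t):
    """Sum of values with key < k whose interval ends at or before t."""
    return sum(v for key, s, e, v in records if key < k and e <= t)


def rta_decomposed(records, k1, k2, t1, t2):
    """RTA-sum assembled from six point queries, term by term as in the formula."""
    return (lkst(records, k2, t2) - lkst(records, k1, t2)
            + lklt(records, k2, t2) - lklt(records, k1, t2)
            - lklt(records, k2, t1) + lklt(records, k1, t1))


def rta_scan(records, k1, k2, t1, t2):
    """Reference answer: scan every record and test it directly."""
    return sum(v for key, s, e, v in records
               if k1 <= key < k2 and s <= t2 and e > t1)


def random_records(n, seed=0):
    """Generate n hypothetical records as (key, start, end, value) tuples."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        s = rng.randint(0, 50)
        e = s + rng.randint(1, 20)
        out.append((rng.randint(0, 30), s, e, rng.randint(1, 9)))
    return out


records = random_records(200)
queries = [(0, 31, 0, 70), (5, 20, 10, 30), (12, 13, 25, 25)]
mismatches = sum(
    rta_decomposed(records, *q) != rta_scan(records, *q) for q in queries
)
print("mismatches:", mismatches)
```

The identity holds because a record that starts no later than t2 either contains t2 or has already ended by t2, and subtracting the records that ended by t1 leaves exactly those intersecting [t1, t2].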
24
Index Design. Both LKST and LKLT are point queries: ‘given k, t, return value’. An index for LKST and LKLT should: store points in key-time space; maintain a value for each point; support point queries.
25
Model. Assume updates come in increasing time order (transaction-time model). [Figure: a record, as inserted at t1 and as updated at t2.]
26
The LKST index. The effect of inserting record (k, [t1, t2], v), shown at t1 and at t2. [Figure: the index at t1 and at t2.]
27
The LKLT index. The effect of inserting record (k, [t1, t2], v): no update at t1; the index changes only at t2. [Figure: the index at t2.]
28
Update Operation. Common update operation for both indices: insert (k, t):v. That is: add v to all points in the rectangle from (k, t) to (kmax, tmax), i.e. [k, kmax] x [t, tmax]. Conclusion: an index supporting point queries and the above update can be used for both LKLT and LKST.
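A naive in-memory stand-in shows how the single update operation serves both indices. It is not the MVSB-tree: it keeps a list of updates and answers a point query by scanning it. The class and method names are hypothetical. The mapping from record events to updates (+v at t1 and -v at t2 for LKST, +v at t2 for LKLT) is one reading of the slides, and integer keys are assumed so that "keys less than k" becomes a point query at k - 1.

```python
class DominanceSumIndex:
    """Supports insert (k, t):v and point queries by scanning a list.
    A stand-in for illustration only, with O(n) queries."""

    def __init__(self):
        self.updates = []  # (k, t, v) triples in arrival order

    def insert(self, k, t, v):
        # Conceptually adds v to every point in [k, kmax] x [t, tmax].
        self.updates.append((k, t, v))

    def point_query(self, k, t):
        # The value at (k, t) is the sum of all updates whose rectangle covers it.
        return sum(v for uk, ut, v in self.updates if uk <= k and ut <= t)


class RtaIndexes:
    """Two indices, one for LKST and one for LKLT (assumed event mapping)."""

    def __init__(self):
        self.lkst_index = DominanceSumIndex()
        self.lklt_index = DominanceSumIndex()

    def start_record(self, k, t1, v):
        self.lkst_index.insert(k, t1, v)    # record becomes alive at t1

    def end_record(self, k, t2, v):
        self.lkst_index.insert(k, t2, -v)   # record no longer contains t >= t2
        self.lklt_index.insert(k, t2, v)    # record has ended for t >= t2

    def lkst(self, k, t):
        return self.lkst_index.point_query(k - 1, t)   # keys strictly less than k

    def lklt(self, k, t):
        return self.lklt_index.point_query(k - 1, t)

    def rta(self, k1, k2, t1, t2):
        return (self.lkst(k2, t2) - self.lkst(k1, t2)
                + self.lklt(k2, t2) - self.lklt(k1, t2)
                - self.lklt(k2, t1) + self.lklt(k1, t1))


# Hypothetical events in increasing time order (transaction-time model).
idx = RtaIndexes()
idx.start_record(10, 1, 3)
idx.start_record(40, 1, 7)
idx.start_record(20, 2, 4)
idx.end_record(40, 3, 7)
idx.end_record(10, 5, 3)
idx.start_record(30, 6, 5)
idx.end_record(30, 8, 5)
idx.end_record(20, 9, 4)

print(idx.lkst(40, 7), idx.lklt(40, 7), idx.rta(10, 40, 4, 7))
```

Because every update keeps its timestamp, queries about past times still work after later updates arrive, which is the behaviour a partially persistent structure provides efficiently.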
29
The MVSB-tree. A partially persistent SB-tree. It inherits features from both the SB-tree [YW01] and the MVBT [BGO+96].
30
Insertion
32
Insertion (cont.). To handle overflow, copy records with end = tmax to a new page. Strong overflow: limit the number of records in a new page. [Figure labels: root1: [1, 4); root2: [4, tmax).]
34
Point Query (k, t). Follows a single path: the nodes containing (k, t). Aggregates the values found in this path. E.g.: PointQuery(23, 7) = 5 + 2 = 7.
35
Efficiency. Theorem: with two MVSB-tree indices, we achieve: RTA query: O(log_b n); update: O(log_b K); space: O( * log_b K). Here n = number of updates; K = number of different keys; b = page capacity (in records).
36
Performance Results. Sun Enterprise 250 Server; two 300 MHz UltraSPARC-II processors; Solaris 2.8; GNU C++. Datasets: created using the TimeIT [KS98] software and transformed to add record keys. Each dataset has a million records (10k unique keys; on average 100 intervals per key). We compare against the straightforward approach using the MVBT [BGO+96] as the temporal index.
37
Index Sizes
38
Query Speedup. Query time is averaged over 100 queries of the same query rectangle size.
39
Conclusions. We addressed the range-temporal aggregation (RTA) problem. We proposed a new index structure (the MVSB-tree) for incrementally maintaining and efficiently computing RTAs. Query time is reduced from O(n) to O(log_b n) with small space overhead. Open problems: MIN/MAX range-temporal aggregation; the valid-time environment; multi-dimensional aggregation over objects with extents.