Efficient Computation of Temporal Aggregates with Range Predicates
D. Zhang*, A. Markowetz**, V. J. Tsotras*, D. Gunopulos*, and B. Seeger**
* University of California, Riverside
** Philipps-Universität Marburg, Germany

Outline
- Introduction & Motivation
- Problem Decomposition
- The MVSB-tree
- Performance Results
- Conclusions

Introduction & Motivation
Consider a collection of temporal records. Each record has a key $k$, a value $v$, and a time interval $[t_1, t_2]$. E.g.: employees and their salaries over time. Temporal aggregation: aggregate values over time. Focus on SUM/COUNT/AVG.

Previous Work
- 'Given time $t$, aggregate over all records that contain $t$' [Tum92, KS95, YK97, GHR+, MLI00]. E.g. the sum at $t_2$ is 13.
- 'Given interval $[t_1, t_2]$, aggregate over all records that intersect $[t_1, t_2]$' (SB-tree [YW01]). E.g. the sum over $[t_1, t_2]$ is 28.

Range-Temporal Aggregation (RTA)
'Aggregate over all records intersecting interval $[t_1, t_2]$ with keys in range $[k_1, k_2]$'. E.g. the RTA-sum over $[k_1, k_2] \times [t_1, t_2]$ is 19.
Example: find the AVG salary over the past ten years of all employees whose last names start with 'B'.

Previous approaches would need a separate index for each possible key range (inefficient).
Alternative:
- index the records;
- run the selection query 'find all records intersecting $[k_1, k_2] \times [t_1, t_2]$' and aggregate the results;
- query time is $O(n)$.
Our solution: $O(\log_b n)$ query time.
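The straightforward alternative amounts to a full scan. The following is a minimal Python sketch of it, not the authors' code, assuming integer keys and times, record intervals stored half-open as [start, end), and a key range taken half-open as [k1, k2) to match the "keys less than k" convention of the LKST/LKLT queries. `Record`, `select`, and `rta_scan` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    key: int
    start: int    # assumption: record interval is half-open [start, end)
    end: int
    value: float

def select(records, k1, k2, t1, t2):
    """Selection query: records with k1 <= key < k2 whose [start, end) intersects [t1, t2]."""
    return [r for r in records
            if k1 <= r.key < k2 and r.start <= t2 and r.end > t1]

def rta_scan(records, k1, k2, t1, t2):
    """Aggregate the selected records; O(n) because every record is examined."""
    hits = select(records, k1, k2, t1, t2)
    total, count = sum(r.value for r in hits), len(hits)
    avg = total / count if count else None  # AVG = SUM / COUNT
    return total, count, avg

records = [Record(10, 1, 5, 3.0), Record(20, 2, 8, 4.0), Record(30, 6, 9, 5.0)]
print(rta_scan(records, 10, 30, 3, 7))  # (7.0, 2, 3.5)
```

Every query touches all $n$ records, which is the $O(n)$ cost the MVSB-tree approach avoids.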

Problem Decomposition
Decompose RTA into LKST and LKLT queries.
LKST query: given $k$ and $t$, aggregate over all records with keys less than $k$ and intervals containing $t$. E.g. $\mathrm{LKST}(k_2, t_2) = 11$.

LKLT query: given $k$ and $t$, aggregate over all records with keys less than $k$ and intervals ending before $t$. E.g. $\mathrm{LKLT}(k_2, t_2) = 20$.

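Problem Decomposition
$$\mathrm{RTA}([k_1, k_2] \times [t_1, t_2]) = \mathrm{LKST}(k_2, t_2) - \mathrm{LKST}(k_1, t_2) + \mathrm{LKLT}(k_2, t_2) - \mathrm{LKLT}(k_1, t_2) - \mathrm{LKLT}(k_2, t_1) + \mathrm{LKLT}(k_1, t_1)$$
The RTA query is thus decomposed into two LKST and four LKLT point queries: the LKST terms account for the records in the key range whose intervals contain $t_2$, and the LKLT terms add the records whose intervals end before $t_2$ but not before $t_1$.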
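The six-term identity can be checked mechanically against a direct scan. Below is a minimal brute-force Python sketch, assuming integer keys and times, $k_1 \le k_2$, $t_1 \le t_2$, and half-open record intervals [start, end), so that "contains t" means start ≤ t < end and "ends before t" means end ≤ t. Under these assumptions the combination covers keys in $[k_1, k_2)$. `lkst`, `lklt`, `rta_decomposed`, and `rta_scan` are hypothetical helpers that scan every record; they illustrate the identity, not the index.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    key: int
    start: int   # assumption: half-open interval [start, end)
    end: int
    value: int

def lkst(records, k, t):
    """Keys less than k, interval contains t (start <= t < end)."""
    return sum(r.value for r in records if r.key < k and r.start <= t < r.end)

def lklt(records, k, t):
    """Keys less than k, interval ended before t (end <= t)."""
    return sum(r.value for r in records if r.key < k and r.end <= t)

def rta_decomposed(records, k1, k2, t1, t2):
    """RTA-sum over keys in [k1, k2) and time [t1, t2] via six point queries."""
    return (lkst(records, k2, t2) - lkst(records, k1, t2)
            + lklt(records, k2, t2) - lklt(records, k1, t2)
            - lklt(records, k2, t1) + lklt(records, k1, t1))

def rta_scan(records, k1, k2, t1, t2):
    """Direct scan, for comparison."""
    return sum(r.value for r in records
               if k1 <= r.key < k2 and r.start <= t2 and r.end > t1)

records = [Record(10, 1, 5, 3), Record(20, 2, 8, 4), Record(30, 6, 9, 5)]
print(rta_decomposed(records, 10, 30, 3, 7), rta_scan(records, 10, 30, 3, 7))  # 7 7
```

The identity relies on every record having start < end: a record that ended by $t_2$ must also have started by $t_2$, so it is counted exactly once.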

Index Design
Both LKST and LKLT are point queries: 'given $k$, $t$, return a value'. An index for LKST and LKLT should:
- store points in key-time space;
- maintain a value for each point;
- support point queries.

Index Design: Model
Assume updates arrive in increasing time order (transaction-time model). A record $(k, [t_1, t_2], v)$ is inserted at $t_1$, when its interval begins, and updated at $t_2$, when its interval ends.

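The LKST index
The effect of inserting record $(k, [t_1, t_2], v)$: at $t_1$, add $v$ to all points from $(k, t_1)$ to $(k_{max}, t_{max})$; at $t_2$, add $-v$ to all points from $(k, t_2)$ to $(k_{max}, t_{max})$.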

The LKLT index
The effect of inserting record $(k, [t_1, t_2], v)$: no update at $t_1$; at $t_2$, add $v$ to all points from $(k, t_2)$ to $(k_{max}, t_{max})$.

Update Operation
Common update operation for both indices: insert $(k, t){:}v$, that is, add $v$ to all points in the rectangle from $(k, t)$ to $(k_{max}, t_{max})$, i.e. $[k, k_{max}] \times [t, t_{max}]$. Conclusion: an index supporting point queries and this update can be used for both LKLT and LKST.
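The claim that this single update plus point queries suffices can be tried out with a small in-memory stand-in for the two indices. The following is a minimal Python sketch, not the MVSB-tree: it assumes integer keys in [0, num_keys), integer times in [0, num_times), half-open record intervals, and swaps in a 2-D Fenwick (binary indexed) tree, where adding $v$ at $(k, t)$ and taking a prefix sum at $(k', t')$ behaves like "add $v$ to all points from $(k, t)$ to $(k_{max}, t_{max})$" followed by a point query. Because "keys less than $k$" is strict, a record with key $k$ is applied at key coordinate $k + 1$. Unlike the transaction-time model, the sketch applies a record's $t_1$ and $t_2$ updates together, and it is not persistent. `Fenwick2D` and `LkstLkltIndex` are hypothetical names.

```python
class Fenwick2D:
    """2-D Fenwick tree used as a dominance-sum structure: add(k, t, v) adds v
    to every point (k', t') with k' >= k and t' >= t; query(k, t) returns the
    accumulated value at point (k, t)."""

    def __init__(self, nk, nt):
        self.nk, self.nt = nk, nt
        self.tree = [[0] * (nt + 1) for _ in range(nk + 1)]

    def add(self, k, t, v):
        i = k + 1                      # 1-based Fenwick index
        while i <= self.nk:
            j = t + 1
            while j <= self.nt:
                self.tree[i][j] += v
                j += j & -j
            i += i & -i

    def query(self, k, t):
        # Points past the grid see the same value as its last row/column.
        k, t = min(k, self.nk - 1), min(t, self.nt - 1)
        total, i = 0, k + 1
        while i > 0:
            j = t + 1
            while j > 0:
                total += self.tree[i][j]
                j -= j & -j
            i -= i & -i
        return total


class LkstLkltIndex:
    """One dominance structure answers LKST, the other LKLT."""

    def __init__(self, num_keys, num_times):
        # Key coordinate key + 1 may equal num_keys, so allocate one extra row.
        self.lkst = Fenwick2D(num_keys + 1, num_times)
        self.lklt = Fenwick2D(num_keys + 1, num_times)

    def insert_record(self, key, start, end, value):
        self.lkst.add(key + 1, start, value)   # LKST at t1: +v
        self.lkst.add(key + 1, end, -value)    # LKST at t2: -v
        self.lklt.add(key + 1, end, value)     # LKLT: nothing at t1, +v at t2

    def rta_sum(self, k1, k2, t1, t2):
        s, l = self.lkst.query, self.lklt.query
        return (s(k2, t2) - s(k1, t2) + l(k2, t2) - l(k1, t2)
                - l(k2, t1) + l(k1, t1))


idx = LkstLkltIndex(num_keys=40, num_times=20)
for key, start, end, value in [(10, 1, 5, 3), (20, 2, 8, 4), (30, 6, 9, 5)]:
    idx.insert_record(key, start, end, value)
print(idx.rta_sum(10, 30, 3, 7))  # 7
```

The Fenwick tree only demonstrates that the update and point-query pair is sufficient; it is neither disk-based nor persistent.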

The MVSB-tree
A partially persistent SB-tree. It inherits features from both the SB-tree [YW01] and the MVBT [BGO+96].

Insertion

Insertion (cont.)
To handle overflow, copy the records with end = $t_{max}$ (the records that are still alive) to a new page.
Strong overflow: limit the number of records in a new page.
(Figure: after the copy, root 1 covers the time range $[1, 4)$ and root 2 covers $[4, t_{max})$.)
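The overflow handling can be sketched on a single page. This is a minimal Python sketch under stated assumptions, not the MVSB-tree's actual split procedure: a page is a list of entries (key, start, end, value), `T_MAX` stands for $t_{max}$ (an entry still alive), and a hypothetical `strong_limit` parameter caps how many entries a new page may start with. A strong overflow is resolved here by simply halving the copies by key, a simplification of the MVBT's key split. `Entry`, `Page`, and `version_split` are hypothetical names.

```python
from dataclasses import dataclass, field

T_MAX = float("inf")  # stands in for t_max: the entry is still alive

@dataclass
class Entry:
    key: int
    start: float
    end: float
    value: float

@dataclass
class Page:
    entries: list = field(default_factory=list)

def version_split(page, now, strong_limit):
    """Handle an overflowing page at time `now` (hypothetical helper).

    Entries with end == T_MAX are copied into new page(s) with start = now,
    and the originals are closed at `now`, so the old page keeps the history
    before `now`. If the copies exceed `strong_limit` (strong overflow), they
    are split by key into two pages."""
    alive = [e for e in page.entries if e.end == T_MAX]
    copies = sorted((Entry(e.key, now, T_MAX, e.value) for e in alive),
                    key=lambda e: e.key)
    for e in alive:
        e.end = now
    if len(copies) <= strong_limit:
        return [Page(copies)]
    mid = len(copies) // 2
    return [Page(copies[:mid]), Page(copies[mid:])]

old = Page([Entry(5, 1, T_MAX, 2.0), Entry(9, 2, 3, 1.0), Entry(12, 1, T_MAX, 4.0)])
new_pages = version_split(old, now=4, strong_limit=3)
print([(e.key, e.start) for e in new_pages[0].entries])  # [(5, 4), (12, 4)]
```

Splitting at `now = 4` a page whose entries date from time 1 leaves the old page covering $[1, 4)$ and the new one starting at 4, mirroring the root 1/root 2 timestamps in the figure.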

Point Query $(k, t)$
Follows a single path: the nodes containing $(k, t)$. Aggregates the values found in this path. E.g.: PointQuery(23, 7) = 5 + 2 = 7.
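The single-path aggregation can be illustrated in one dimension. What follows is a minimal Python sketch, assuming an in-memory binary segment tree over integer time instants rather than the disk-based, $b$-ary, partially persistent MVSB-tree. A range update stores its value on the nodes that cover the range, and a point query sums the values on the one root-to-leaf path. `PathSumTree`, `range_add`, and `point_query` are hypothetical names.

```python
class PathSumTree:
    """Binary segment tree over time instants 0..n-1. range_add stores v on the
    O(log n) nodes that exactly cover [lo, hi); point_query(t) adds up the
    values stored on the single root-to-leaf path to t."""

    def __init__(self, n):
        self.n = n
        self.val = [0] * (4 * n)

    def range_add(self, lo, hi, v, node=1, nlo=0, nhi=None):
        if nhi is None:
            nhi = self.n
        if hi <= nlo or nhi <= lo:        # no overlap with this node
            return
        if lo <= nlo and nhi <= hi:       # node fully covered: store v here
            self.val[node] += v
            return
        mid = (nlo + nhi) // 2
        self.range_add(lo, hi, v, 2 * node, nlo, mid)
        self.range_add(lo, hi, v, 2 * node + 1, mid, nhi)

    def point_query(self, t):
        node, nlo, nhi, total = 1, 0, self.n, 0
        while True:
            total += self.val[node]       # aggregate along the path
            if nhi - nlo == 1:
                return total
            mid = (nlo + nhi) // 2
            if t < mid:
                node, nhi = 2 * node, mid
            else:
                node, nlo = 2 * node + 1, mid

tree = PathSumTree(10)
tree.range_add(0, 10, 5)
tree.range_add(5, 8, 2)
print(tree.point_query(7))  # 7
```

With this made-up data, the query for instant 7 picks up 5 at the root and 2 further down, mirroring the shape of the slide's example.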

Efficiency
Theorem: with two MVSB-tree indices (one for LKST, one for LKLT), we achieve:
- RTA query: $O(\log_b n)$;
- Update: $O(\log_b K)$;
- Space: $O(\ldots \log_b K)$;
where $n$ = number of updates, $K$ = number of different keys, and $b$ = page capacity (in records).

Performance Results
Setup: Sun Enterprise 250 server; two 300 MHz UltraSPARC-II processors; Solaris 2.8; GNU C++.
Datasets: created using the TimeIT [KS98] software and transformed to add record keys. Each dataset has one million records (10k unique keys; on average 100 intervals per key).
We compare against the straightforward approach, which uses the MVBT [BGO+96] as the temporal index.

Figure: Index Sizes

Figure: Query Speedup. Query time is averaged over 100 queries with the same query-rectangle size.

Conclusions
- We addressed the range-temporal aggregation (RTA) problem;
- New index structure (MVSB-tree) for incrementally maintaining and efficiently computing RTAs;
- Query time reduced from $O(n)$ to $O(\log_b n)$ with small space overhead;
- Open problems:
  - Min/Max range-temporal aggregation;
  - Valid-time environment;
  - Multi-dimensional aggregation over objects with extents.
