1
ICDE 2002, San Jose, CA
Efficient Temporal Join Processing using Indices
Donghui Zhang, University of California, Riverside
Vassilis J. Tsotras, University of California, Riverside
Bernhard Seeger, University of Marburg, Germany
2
Contents
• Problem definition: GTE-Join
• Straightforward approaches
• Temporal indexing
• Proposed join algorithms
• Performance study
• Conclusions
3
Problem Definition
• Temporal record: (key, start, end, attributes)
• TE-Join: two records qualify for the join if their keys are equal and their time intervals intersect.
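The TE-Join condition can be written down directly as a predicate over pairs of records. The following is a minimal nested-loop sketch, not the algorithm proposed in the talk. It assumes closed intervals [start, end]; the `Rec` type and the function names are illustrative and do not come from the slides.

```python
from typing import List, NamedTuple, Tuple


class Rec(NamedTuple):
    """A temporal record: (key, start, end, attributes)."""
    key: str
    start: float
    end: float
    attr: str


def intersects(s1: float, e1: float, s2: float, e2: float) -> bool:
    """True if the closed intervals [s1, e1] and [s2, e2] share a point."""
    return s1 <= e2 and s2 <= e1


def te_join(r: List[Rec], s: List[Rec]) -> List[Tuple]:
    """Nested-loop TE-Join: equal keys and intersecting time intervals.

    Each result carries the key, the common sub-interval, and both attributes.
    """
    out = []
    for x in r:
        for y in s:
            if x.key == y.key and intersects(x.start, x.end, y.start, y.end):
                out.append((x.key, max(x.start, y.start),
                            min(x.end, y.end), x.attr, y.attr))
    return out
```

The nested loop compares every pair of records, which is why the rest of the talk looks for index-based alternatives.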
4
TE-Join example: “find the locations and managers of all departments over time”.
5
Problem Definition
• GTE-Join (general TE-Join): record keys must lie in a given range r, and time intervals must intersect a given interval i.
• Interesting because:
  • temporal relations are large;
  • TE-Join is the special case where r and i are both (-∞, +∞).
6
GTE-Join example: “find the locations and managers of departments in range [D1, D2] during time [5, 10]”.
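A query of this shape adds two selection conditions to the TE-Join predicate. The sketch below is a naive reference version only, under stated assumptions: intervals are closed, department keys are modeled as integers so that a key range is easy to express, and `Rec`, `overlaps`, and `gte_join` are illustrative names. Since intervals on a line that intersect pairwise always share a common point, checking each record against i and the two records against each other is the same as requiring all three intervals to have a common point.

```python
import math
from typing import List, NamedTuple, Tuple


class Rec(NamedTuple):
    key: int       # department id (integer keys are an assumption)
    start: float
    end: float
    attr: str


def overlaps(s1: float, e1: float, s2: float, e2: float) -> bool:
    """Closed-interval intersection test."""
    return s1 <= e2 and s2 <= e1


def gte_join(r: List[Rec], s: List[Rec],
             key_lo: float, key_hi: float,
             t_lo: float, t_hi: float) -> List[Tuple[Rec, Rec]]:
    """Naive GTE-Join: keys equal and inside [key_lo, key_hi],
    both intervals intersect [t_lo, t_hi] and each other."""
    def in_query(rec: Rec) -> bool:
        return (key_lo <= rec.key <= key_hi
                and overlaps(rec.start, rec.end, t_lo, t_hi))

    return [(x, y)
            for x in r if in_query(x)
            for y in s if in_query(y)
            and x.key == y.key
            and overlaps(x.start, x.end, y.start, y.end)]


def te_join(r: List[Rec], s: List[Rec]) -> List[Tuple[Rec, Rec]]:
    """TE-Join as the special case with unbounded r and i."""
    return gte_join(r, s, -math.inf, math.inf, -math.inf, math.inf)
```

With this framing, the example query corresponds to a call such as `gte_join(locations, managers, 1, 2, 5, 10)`, where the two relation names are placeholders.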
7
Straightforward Solutions
• Non-indexed join;
• Unsynchronized join;
• Synchronized join using B+-trees;
• Synchronized join using R-trees.
8
Straightforward Solutions
1. Non-indexed join: existing TE-Join research [Zur97] focuses on non-indexed joins; not efficient for GTE-Join because it requires a full scan.
2. Unsynchronized join: separate the selection and join phases; not efficient because:
  • the intermediate result must be stored;
  • selection in one relation ignores the data distribution of the other relation.
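The two drawbacks of the unsynchronized approach show up even in a toy version. The sketch below is an assumption-laden illustration: it uses in-memory lists in place of index selections, closed intervals, integer keys, and the hypothetical names `select`, `sort_merge_join`, and `unsynchronized_join`. It materializes both selection results before a sort-merge join on the key and also returns the intermediate sizes.

```python
from typing import List, NamedTuple, Tuple


class Rec(NamedTuple):
    key: int
    start: float
    end: float
    attr: str


def overlaps(s1, e1, s2, e2) -> bool:
    return s1 <= e2 and s2 <= e1


def select(rel: List[Rec], key_lo, key_hi, t_lo, t_hi) -> List[Rec]:
    """Selection phase: stands in for a range-interval query on one index."""
    return [x for x in rel
            if key_lo <= x.key <= key_hi
            and overlaps(x.start, x.end, t_lo, t_hi)]


def sort_merge_join(xs: List[Rec], ys: List[Rec]) -> List[Tuple[Rec, Rec]]:
    """Join phase: sort both inputs by key, then merge equal-key groups."""
    xs = sorted(xs, key=lambda x: x.key)
    ys = sorted(ys, key=lambda y: y.key)
    out, i, j = [], 0, 0
    while i < len(xs) and j < len(ys):
        if xs[i].key < ys[j].key:
            i += 1
        elif xs[i].key > ys[j].key:
            j += 1
        else:
            k, i_end, j_end = xs[i].key, i, j
            while i_end < len(xs) and xs[i_end].key == k:
                i_end += 1
            while j_end < len(ys) and ys[j_end].key == k:
                j_end += 1
            out += [(x, y) for x in xs[i:i_end] for y in ys[j:j_end]
                    if overlaps(x.start, x.end, y.start, y.end)]
            i, j = i_end, j_end
    return out


def unsynchronized_join(r, s, key_lo, key_hi, t_lo, t_hi):
    """Returns (result pairs, sizes of the two materialized intermediates)."""
    tmp_r = select(r, key_lo, key_hi, t_lo, t_hi)   # stored intermediate
    tmp_s = select(s, key_lo, key_hi, t_lo, t_hi)   # stored intermediate
    return sort_merge_join(tmp_r, tmp_s), (len(tmp_r), len(tmp_s))
```

If one relation has many records in the query rectangle and the other has none, the first selection still materializes all of them even though the join result is empty.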
9
Straightforward Solutions
3. Synchronized join using B+-trees
If clustered on start: not efficient. Clustering on end is similar.
10
Straightforward Solutions
3. Synchronized join using B+-trees
If clustered on key: records with keys in r are stored together and sorted, so the join can focus on these records in each relation and sort-merge join them, skipping those whose intervals do not intersect i. However, this is not efficient, since the records inside the query rectangle are scattered.
11
Straightforward Solutions
4. Synchronized join using R-trees
• Store each record as a two-dimensional interval in the R-tree;
• Use existing R-tree join algorithms [BKS93, HJR97];
• Modification: integrate the selection on the query rectangle.
• However, not efficient, since R-trees do not handle long intervals well.
12
Our Solutions
• Synchronized join using temporal indices.
• Multi-version B+-tree (MVBT) [BGO+96]: asymptotically optimal space, update, and query cost.
• We propose two categories of synchronized, MVBT-based join algorithms (they apply to other temporal indices as well).
13
Review of MVBT
• Suppose a page holds up to 3 records.
21
Review of MVBT
• A “forest”: different trees may overlap;
• Root nodes correspond to contiguous, non-intersecting time intervals;
• A record may be stored in multiple pages; the end time of all but the last copy is +∞.
• Range-interval selection algorithms [BS96] avoid duplicates by reporting only the first copy.
22
The Incorrect End Time Problem
• [BS96] reports the first copy of x (whose end is +∞); this would lead GTE-Join algorithms to join x with y.
• Solution: report the rightmost copy!
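A small sketch makes the problem concrete. It is only an illustration under assumptions: the `Copy` type, its `rec_id` and `page_start` fields, and the sample values are hypothetical and not taken from the slides. Page start times are used here to order the copies of one logical record from left to right; intervals are treated as closed.

```python
import math
from typing import List, NamedTuple


class Copy(NamedTuple):
    rec_id: str        # identifies the logical record (hypothetical field)
    key: int
    start: float
    end: float         # math.inf in every copy except the last one
    page_start: float  # start of the lifespan of the holding page (hypothetical)


def first_copy(copies: List[Copy]) -> Copy:
    """The copy a [BS96]-style selection would report."""
    return min(copies, key=lambda c: c.page_start)


def rightmost_copy(copies: List[Copy]) -> Copy:
    """The copy in the latest page; it carries the true end time."""
    return max(copies, key=lambda c: c.page_start)


def overlaps(a: Copy, b: Copy) -> bool:
    """Closed-interval intersection of two record copies."""
    return a.start <= b.end and b.start <= a.end


# Record x lived during [2, 6] but was copied once, so its first copy
# still shows end = +inf. Record y lived during [8, 12].
x_copies = [Copy("x", 7, 2, math.inf, 0), Copy("x", 7, 2, 6, 5)]
y = Copy("y", 7, 8, 12, 5)

wrong = overlaps(first_copy(x_copies), y)      # joins x with y by mistake
right = overlaps(rightmost_copy(x_copies), y)  # uses the real end time
```

Here the first copy of x appears to intersect y only because its end time was never updated, while the rightmost copy does not intersect y.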
23
Top-down Approaches
• Idea: for each pair of trees, one from each MVBT forest, perform a synchronized tree traversal (STT).
• STT for two trees: initially, join the root nodes; to join two nodes, join their children; eventually, join the elements in leaf pages.
• Open question: what is the join condition?
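The traversal pattern can be sketched on a generic tree whose nodes carry a bounding rectangle in key-time space. This is a simplified sketch, not the MVBT algorithm: it uses plain rectangle intersection as the join condition, in the style of spatial tree joins, and ignores the copy and end-time issues specific to the MVBT. `Box`, `Node`, and `stt_join` are illustrative names, and intervals are assumed closed.

```python
from dataclasses import dataclass, field
from typing import NamedTuple


class Box(NamedTuple):
    key_lo: float
    key_hi: float
    t_lo: float
    t_hi: float


def boxes_meet(a: Box, b: Box) -> bool:
    """True if two rectangles in key-time space intersect."""
    return (a.key_lo <= b.key_hi and b.key_lo <= a.key_hi
            and a.t_lo <= b.t_hi and b.t_lo <= a.t_hi)


@dataclass
class Node:
    box: Box
    children: list = field(default_factory=list)  # empty for leaf pages
    records: list = field(default_factory=list)   # (key, start, end, attr)


def stt_join(a: Node, b: Node, query: Box, out: list) -> None:
    """Synchronized traversal: descend only into node pairs that can
    still produce results inside the query rectangle."""
    if not (boxes_meet(a.box, b.box) and boxes_meet(a.box, query)
            and boxes_meet(b.box, query)):
        return
    if not a.children and not b.children:          # two leaf pages
        for x in a.records:
            for y in b.records:
                if (x[0] == y[0]
                        and query.key_lo <= x[0] <= query.key_hi
                        and x[1] <= y[2] and y[1] <= x[2]
                        and x[1] <= query.t_hi and query.t_lo <= x[2]
                        and y[1] <= query.t_hi and query.t_lo <= y[2]):
                    out.append((x, y))
        return
    # If one side is already a leaf, keep it fixed and descend the other.
    for ca in (a.children or [a]):
        for cb in (b.children or [b]):
            stt_join(ca, cb, query, out)
```

Calling `stt_join(root_a, root_b, query, results)` on the two roots fills `results` with qualifying record pairs; subtrees outside the query rectangle are never visited.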
24
Balancing Condition Optimization (BCO)
• To find […], page 3 and page 0 have to join;
• In general, two pages must be joined even though they do not intersect. Inefficient!
• BCO: balance two conditions: (1) only intersecting pages join; (2) examine records even if they are not the last copy. E.g. join […] when joining page 2 with page 0.
25
Virtual Height Optimization (VHO)
• At the middle level, STT joins: (node pairs not recoverable from the source)
• With VHO: (node pairs not recoverable from the source)
27
Sideways Approach 1: Link-based
• In each leaf page, store a pointer to its predecessor;
• For GTE-Join: find pairs of data pages that intersect the right border of the query rectangle and each other; keep such pairs in a priority queue; sweep left synchronously.
• Special techniques avoid duplicates.
28
Sideways Approach 2: Plane Sweep
• Similar to link-based;
• Maintain two priority queues, one for each MVBT;
• At each step, access the leaf page with the largest end time and add its records to the buffer;
• To add records to the buffer, join them with the existing records from the other MVBT;
• Throw away useless records.
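The sweep can be illustrated at the level of individual records rather than leaf pages. That simplification is an assumption of this sketch, as are the closed intervals and the names `Rec` and `plane_sweep_join`. Records are consumed in decreasing order of end time, so a buffered record whose start lies to the right of the current sweep position can never match anything that follows and is dropped.

```python
import heapq
from typing import List, NamedTuple, Tuple


class Rec(NamedTuple):
    key: int
    start: float
    end: float
    attr: str


def plane_sweep_join(r: List[Rec], s: List[Rec]) -> List[Tuple[Rec, Rec]]:
    """Right-to-left sweep over two relations with one priority queue each."""
    # heapq is a min-heap, so end times are negated to pop the largest first.
    queues = [[(-rec.end, n, rec) for n, rec in enumerate(rel)]
              for rel in (r, s)]
    for q in queues:
        heapq.heapify(q)
    buffers: List[List[Rec]] = [[], []]
    out = []
    while queues[0] or queues[1]:
        # Pick the side whose next record has the largest end time.
        if not queues[1] or (queues[0] and queues[0][0][0] <= queues[1][0][0]):
            side = 0
        else:
            side = 1
        _, _, rec = heapq.heappop(queues[side])
        sweep = rec.end
        # Throw away records that start after the sweep position.
        for b in (0, 1):
            buffers[b] = [y for y in buffers[b] if y.start <= sweep]
        # Every surviving record on the other side intersects rec in time.
        for y in buffers[1 - side]:
            if y.key == rec.key:
                out.append((rec, y) if side == 0 else (y, rec))
        buffers[side].append(rec)
    return out
```

Every buffered record has an end time at least as large as the current one, so after eviction only the key comparison remains.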
29
Performance Study
Notation     Meaning
mvbt_df      Synchronized MVBT, depth-first
mvbt_bf      Synchronized MVBT, breadth-first
mvbt_link    Synchronized MVBT, link-based
mvbt_ps      Synchronized MVBT, plane-sweep
mvbt_sm      Unsynchronized, sort-merge after selection
b+           Synchronized B+-tree, index on key
r*_df        Synchronized R*-tree, depth-first
r*_bf        Synchronized R*-tree, breadth-first
30
Experimental Setup
• Implemented in GNU C++;
• Sun Enterprise 250 Server with two UltraSPARC-II processors, running Solaris 2.8;
• Page size = 8 KB; buffer size = 10 MB; LRU buffer;
• Each data set: 10 million records;
• QRS: size ratio between the query rectangle and the whole space;
• Long intervals: 1/100 of the time space; short intervals: 1/10,000 of the time space.
31
GTE-Join Performance
Joining mainly long intervals.
32
GTE-Join Performance
Joining mainly short intervals.
33
GTE-Join Performance
Varying QRS (log scale).
34
Conclusions
• We addressed the GTE-Join;
• The unsynchronized approach is not efficient;
• Synchronized approaches based on traditional indices (B+-tree, R-tree) are also not efficient;
• We proposed synchronized approaches based on temporal indices (MVBT);
• We also proposed the BCO and VHO optimizations;
• Experiments: the link-based approach performs best.