Download presentation
Presentation is loading. Please wait.
1
Joining Interval Data in Relational Databases
By Jost Enderle Matthias Hampel Thomas Seidl 12/1/2018 Akshay S. Vaidya
2
Outline Problem at hand Current solutions Important concepts
Solution to problem at hand Some numbers Final words 12/1/2018 Akshay S. Vaidya
3
Problem at hand Joining interval data in relational databases
Interval is duration of period of time It is a scalar E.g. 24 hours Chronological period is a pair of date times It is a compound object E.g. 2nd January 2004 – 10th November 2004 Has definition for Overlaps operator Above period Overlaps with 2nd November 2004 – 3rd December 2004 12/1/2018 Akshay S. Vaidya
4
Problem at hand Overlaps operator formulation
Where R.period overlaps S.period Where R.period.lower <= S.period.upper and R.period.upper >= S.period.lower P1: 2nd January 2004 – 10th November 2004 P2: 2nd November 2004 – 3rd December 2004 2nd January 2004 <= 3rd December 2004 And 10th November 2004 >= 2nd November 2004 R and S can have 1000s of rows and the overlaps operator needs to be executed 1,000,000 of times 12/1/2018 Akshay S. Vaidya
5
Problem at hand Relational databases support equi-joins efficiently not Multiple Interdependent Join predicates Sort order needed on interval values for sort merge joins Here there are 2 such values Index can help but it is still costly It can provide advantage if Index is designed for interval data 12/1/2018 Akshay S. Vaidya
6
Current solutions Nested loop algorithms Quadratic cost E.g. TJ-2
Assumes inner join relation to be sorted Other approaches Assume inner join relation can be stored in memory 12/1/2018 Akshay S. Vaidya
7
Current solutions Sort-Merge algorithms
Need to find sort order on period data E.g. TJ-1, multi-predicate merge join Relations need to be sorted on lower and upper points of interval E.g. TJ-3 For append only databases E.g. Algorithm by Pfoser et al E.g. Algorithm by Zhang et al, Multi predicate merge join Does not support an overlaps join predicate 12/1/2018 Akshay S. Vaidya
8
Current solutions Partition Based join
Partition input relations on join attribute values and compare tuples in relevant partitions Something like using MBRs and R+ trees E.g. Algorithms by Snodgrass and Jensen Perform poorly for lots of tuples having long intervals E.g. Algorithms by Sitzmann and Stuckey Optimize performance of above algorithm using histograms Difficult to integrate with existing databases E.g. Algorithm by lu, Ooi and Tan Convert temporal data to spatial data and perform spatial join Not efficient and difficult to integrate to existing databases 12/1/2018 Akshay S. Vaidya
9
Current solutions Index Based joins
Use indices defined on join attributes for efficient retrieval Need to check if the index can be integrated in the existing databases E.g. Algorithm by Zhang, Tsotras and Seeger Efficient implementation by MVBT (Multi version B+ tree) index Difficult to integrate with existing databases Complex algorithms!! 12/1/2018 Akshay S. Vaidya
10
Important concepts Relational Interval Tree
Binary tree of height h covers range [1, 2h -1 ] of interval bounds. Tree is virtual, traversal of tree is purely arithmetic Each node represents an instant of time Each non-leaf node represents an interval for the entire sub-tree rooted at it 12/1/2018 Akshay S. Vaidya
11
Important concepts Relational Interval (RI) tree 12/1/2018
Akshay S. Vaidya
12
Important concepts Height h = 5 , range = [ 1, 31 ] 12/1/2018
Akshay S. Vaidya
13
Important concepts Node 4 represents the instant 4, 21 represents instant 21 12/1/2018 Akshay S. Vaidya
14
Important concepts Node 2 represents interval [1, 3] 12/1/2018
Akshay S. Vaidya
15
Important concepts Node 12 represents interval [9,15],[10,14],[11,14] etc 12/1/2018 Akshay S. Vaidya
16
Important concepts Node 16 represents interval [1, 31], etc 12/1/2018
Akshay S. Vaidya
17
Important concepts Mapping of an interval to a node. E.g. [ 2, 13 ]
It is the root of the smallest sub-tree including the nodes 2 and 13 12/1/2018 Akshay S. Vaidya
18
Important concepts It is the root of the smallest sub-tree including the nodes 2 and 13 i.e. Node 8 12/1/2018 Akshay S. Vaidya
19
Important concepts Database indices of RI tree Two Indices one on the upper and the other on the lower interval values Values in the Lower index stores has the following structure Structure is indexed by the lower interval value Node is the interval Node computed as previously shown for [ 2, 13 ] Lower is the lower bound of the interval i.e. 2 ID is the primary key of the record in the table 12/1/2018 Akshay S. Vaidya
20
Important concepts Database indices of RI tree
Similarly for Upper index Node is the interval Node computed as previously shown for [ 2, 13 ] Upper is the upper bound of the interval i.e. 13 ID is the primary key of the record in the table 12/1/2018 Akshay S. Vaidya
21
Important concepts Using Relational Interval (RI) tree
Record with ID Mary has interval [ 2, 13 ] Record with ID John has interval [ 4, 23 ] Record with ID Bob has interval [ 10, 21 ] Record with ID Ann has interval [ 21, 30 ] 12/1/2018 Akshay S. Vaidya
22
Important Concepts Left queries, Right queries and Inner Queries
For any Range [ lower, upper ], for all the nodes on the path from root to lower and upper Nodes registered to the left of lower may intersect the range - left queries Nodes registered to the right of upper may intersect the range – right queries Nodes registered in [ lower, upper ] DEFINITELY intersect the range – inner queries 12/1/2018 Akshay S. Vaidya
23
Important Concepts For any Range [ 11, 13 ], for all the nodes on the path from root to lower and upper i.e. 16 to 11 and 16 to 13 12/1/2018 Akshay S. Vaidya
24
Important Concepts Nodes registered to the left of lower may intersect the range - left queries here { 8, 10 } Note here that range [ 1, 9 ] and [ 1, 13 ] are both registered at 8, however only [1, 13] intersects with [11, 13 ] 12/1/2018 Akshay S. Vaidya
25
Important Concepts Nodes registered to the right of upper may intersect the range – right queries here { 14, 16 } 12/1/2018 Akshay S. Vaidya
26
Important Concepts Nodes registered in [ lower, upper ] DEFINITELY intersect the range – inner queries here [ 11, 13 ] 12/1/2018 Akshay S. Vaidya
27
Solution to problem at hand
Determine the IDs of all the tuples that intersect with a given range [ lower, upper ] Determine left queries Get the IDs of the tuples from left queries that intersect with range [ lower, upper ] Determine right queries Get the IDs of the tuples from right queries that intersect with range [ lower, upper ] Determine inner queries Get IDs of all the tuples from inner queries Return union of all the above 12/1/2018 Akshay S. Vaidya
28
Solution to problem at hand
Determine the IDs of all the tuples that intersect with a given range [ lower, upper ] Determine left queries Get the IDs of the tuples from left queries that intersect with range [ lower, upper ] Determine right queries Get the IDs of the tuples from right queries that intersect with range [ lower, upper ] Determine inner queries Get IDs of all the tuples from inner queries Return union of all the above 12/1/2018 Akshay S. Vaidya
29
Solution to problem at hand
The final intersection query 12/1/2018 Akshay S. Vaidya
30
Solution to problem at hand
Note here that range [ 1, 9 ] and [ 1, 13 ] are both registered at 8, however only [1, 13] intersects with [11, 13 ] 12/1/2018 Akshay S. Vaidya
31
Solution to problem at hand
Both [1, 9] and [1, 13] are picked on condition 1 But only [ 1, 13] passes condition 2 Since upper of [1, 13] >= lower of query [11, 13 ] i.e. 13 >= 11 12/1/2018 Akshay S. Vaidya
32
Solution to problem at hand
Condition 1 does filtering and condition 2 does range comparison Less tuples have to be compared for range intersection here due to filtering lowering time for query evaluation 12/1/2018 Akshay S. Vaidya
33
Solution to problem at hand
Index based loop join Use the OVERLAPS algorithm defined previously to basic loop join algorithm For each tuple in the outer relation, project the all tuples that OVERLAP with the inner relation Approach is naïve, does not take partitioning into consideration 12/1/2018 Akshay S. Vaidya
34
Solution to problem at hand
Partition based join In an RI tree each interval in a tuple is assigned to exactly one node of the tree Some observations can be made regarding which nodes of the second relation join with a given node of the first relation 12/1/2018 Akshay S. Vaidya
35
Solution to problem at hand
For all tuples of relation R registered at node nR (node value n at relation R) , the tuples registered at the following nodes in S may join with them Nodes on the path from the root of the tree to the node nS (called UP nodes) Node nS ( definite overlap) All nodes in the sub tree of nS (called down nodes) 12/1/2018 Akshay S. Vaidya
36
Solution to problem at hand
Here nR (= nS) = 12 The shaded region in S represents node that may join with nR 12/1/2018 Akshay S. Vaidya
37
Solution to problem at hand
UP nodes = { 8, 16 } DOWN nodes = { 9, 10, 11, 13, 14, 15 } 12/1/2018 Akshay S. Vaidya
38
Solution to problem at hand
Depending on the manner in which the two relations are partitioned, we have three approaches RI-Tree Up-Down join RI-Tree Down-Down join RI-Tree Up-Up join We will discuss the Up-Down approach in greater detail 12/1/2018 Akshay S. Vaidya
39
Solution to problem at hand
Steps for RI-Tree Up-Down join approach Determine all nodes in relation R Determine the max-range for all the nodes in relation R Alternatively you can also find the exact-range for the nodes Determine all join partners in UP of nS Determine all join partners in DOWN of nS Generate the query performing the join 12/1/2018 Akshay S. Vaidya
40
Solution to problem at hand
Determine all nodes in relation R Note that the Index structure is actually materialized in the database so you get all the nodes that are actually present in the database 12/1/2018 Akshay S. Vaidya
41
Solution to problem at hand
For nR = 12 step = abs( 12 – 8 ) = 4 fn_lower = 12 – = 9 fn_upper = – 1 = 15 max-range = [ 9, 15 ] Determine the max-range for all the nodes in relation R max-range is [ fn_lower, fn_upper] fn_lower = nR – step + 1 fn_upper = nR + step – 1 step = abs( nR – parent( nR )) 12/1/2018 Akshay S. Vaidya
42
Solution to problem at hand
Alternatively you can also find the exact-range for the nodes This will be a closer bound for range covered by node R since it is dynamically determined based on values in the database Even if node 12 in theory may cover the range[ 9, 15] the table R may only have tuples in range [11, 15] registered at node 12. Here you will get [11, 15], in the previous case you will get [9, 15] 12/1/2018 Akshay S. Vaidya
43
Solution to problem at hand
Determine all join partners in UP of nS For all nodes in UP( nS ) If the node is less than ns add it to left queries If the node is greater than ns add it to right queries 12/1/2018 Akshay S. Vaidya
44
Solution to problem at hand
Determine all join partners in DOWN of nS Nodes on the path of nS to fn_lower are stored as left queries Nodes on the path of nS to fn_upper are stored as right queries Range between [ fn_lower, ns -1 ] is stored as left query Range between [ ns + 1, fn_upper ] is stored as right query 12/1/2018 Akshay S. Vaidya
45
Solution to problem at hand
Left query schema (forknode, from, to) : from and to define the range Similarly, Right query schema 12/1/2018 Akshay S. Vaidya
46
Solution to problem at hand
The following left queries are generated for nS = 12 (12, 8, 8) (12, 10, 10) (12, 9, 11) 12/1/2018 Akshay S. Vaidya
47
Solution to problem at hand
Generate the query performing the join Project ID pairs (R.ID, S.ID) for all the tuples where nR = nS Project ID pairs for all left queries of nS where upper portion of interval S.period intersects with lower portion of interval R.period Project ID pairs for all right queries of nS where lower portion of interval S.period intersects with upper portion of interval R.period 12/1/2018 Akshay S. Vaidya
48
Solution to problem at hand
Intersection guaranteed because range of tuples registered at node n intersects with instant n I.E. instant n in period [ fn_lower, fn_upper ] Tuples registered at the same node in R and S definitely overlap and so project them unconditionally 12/1/2018 Akshay S. Vaidya
49
Solution to problem at hand
Similarly for right queries for nS Intervals registered at Nodes corresponding to Left queries of S does not include instant nR. However, the interval may match the interval registered at nR. Project ID pairs for all left queries of nS where upper portion of interval S.period intersects with lower portion of interval R.period 12/1/2018 Akshay S. Vaidya
50
Solution to problem at hand
The finally generated query 12/1/2018 Akshay S. Vaidya
51
Solution to problem at hand
The Down-Down strategy, briefly 12/1/2018 Akshay S. Vaidya
52
Solution to problem at hand
The Up-Up strategy, briefly 12/1/2018 Akshay S. Vaidya
53
Some numbers Experimental setup
Integration with Oracle Server Release 9.2.0 Pentium 4 with 2.4 GHZ 512 MB of RAM EIDE hard drive 200 database blocks cache Block size 2kB 12/1/2018 Akshay S. Vaidya
54
Some numbers 12/1/2018 Akshay S. Vaidya
55
Some numbers 12/1/2018 Akshay S. Vaidya
56
Some numbers 12/1/2018 Akshay S. Vaidya
57
Some numbers 12/1/2018 Akshay S. Vaidya
58
Some numbers 12/1/2018 Akshay S. Vaidya
59
Some numbers 12/1/2018 Akshay S. Vaidya
60
Some numbers 12/1/2018 Akshay S. Vaidya
61
Final words Number indicate
Approach can be used on an existing database RI-tree UP-DOWN method outperforms standard B-tree methods RI-tree UP-DOWN method scales well and works equally well on real data 12/1/2018 Akshay S. Vaidya
62
Final words Future work
Cost models for prediction of performance based on data characteristics 12/1/2018 Akshay S. Vaidya
63
References 12/1/2018 Akshay S. Vaidya
64
References Bercken J. v. d., Seeger B.: Query Processing Techniques for Multiversion Access Methods. VLDB 1996, Sitzmann I., Stuckey P.J.: Improving Temporal Joins Using Histograms. DEXA 2000, 12/1/2018 Akshay S. Vaidya
65
Questions ??? Is a RI tree dynamically extensible
YES. Since it grows at the roots How much memory will a RI tree with nodes require 0 Bytes. Since, it is a virtual tree. Why are all you guys still here!!! 12/1/2018 Akshay S. Vaidya
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.