Presentation is loading. Please wait.

Presentation is loading. Please wait.

Joining Interval Data in Relational Databases

Similar presentations


Presentation on theme: "Joining Interval Data in Relational Databases"— Presentation transcript:

1 Joining Interval Data in Relational Databases
By Jost Enderle Matthias Hampel Thomas Seidl 12/1/2018 Akshay S. Vaidya

2 Outline Problem at hand Current solutions Important concepts
Solution to problem at hand Some numbers Final words 12/1/2018 Akshay S. Vaidya

3 Problem at hand Joining interval data in relational databases
Interval is duration of period of time It is a scalar E.g. 24 hours Chronological period is a pair of date times It is a compound object E.g. 2nd January 2004 – 10th November 2004 Has definition for Overlaps operator Above period Overlaps with 2nd November 2004 – 3rd December 2004 12/1/2018 Akshay S. Vaidya

4 Problem at hand Overlaps operator formulation
Where R.period overlaps S.period Where R.period.lower <= S.period.upper and R.period.upper >= S.period.lower P1: 2nd January 2004 – 10th November 2004 P2: 2nd November 2004 – 3rd December 2004 2nd January 2004 <= 3rd December 2004 And 10th November 2004 >= 2nd November 2004 R and S can have 1000s of rows and the overlaps operator needs to be executed 1,000,000 of times 12/1/2018 Akshay S. Vaidya

5 Problem at hand Relational databases support equi-joins efficiently not Multiple Interdependent Join predicates Sort order needed on interval values for sort merge joins Here there are 2 such values Index can help but it is still costly It can provide advantage if Index is designed for interval data 12/1/2018 Akshay S. Vaidya

6 Current solutions Nested loop algorithms Quadratic cost E.g. TJ-2
Assumes inner join relation to be sorted Other approaches Assume inner join relation can be stored in memory 12/1/2018 Akshay S. Vaidya

7 Current solutions Sort-Merge algorithms
Need to find sort order on period data E.g. TJ-1, multi-predicate merge join Relations need to be sorted on lower and upper points of interval E.g. TJ-3 For append only databases E.g. Algorithm by Pfoser et al E.g. Algorithm by Zhang et al, Multi predicate merge join Does not support an overlaps join predicate 12/1/2018 Akshay S. Vaidya

8 Current solutions Partition Based join
Partition input relations on join attribute values and compare tuples in relevant partitions Something like using MBRs and R+ trees E.g. Algorithms by Snodgrass and Jensen Perform poorly for lots of tuples having long intervals E.g. Algorithms by Sitzmann and Stuckey Optimize performance of above algorithm using histograms Difficult to integrate with existing databases E.g. Algorithm by lu, Ooi and Tan Convert temporal data to spatial data and perform spatial join Not efficient and difficult to integrate to existing databases 12/1/2018 Akshay S. Vaidya

9 Current solutions Index Based joins
Use indices defined on join attributes for efficient retrieval Need to check if the index can be integrated in the existing databases E.g. Algorithm by Zhang, Tsotras and Seeger Efficient implementation by MVBT (Multi version B+ tree) index Difficult to integrate with existing databases Complex algorithms!! 12/1/2018 Akshay S. Vaidya

10 Important concepts Relational Interval Tree
Binary tree of height h covers range [1, 2h -1 ] of interval bounds. Tree is virtual, traversal of tree is purely arithmetic Each node represents an instant of time Each non-leaf node represents an interval for the entire sub-tree rooted at it 12/1/2018 Akshay S. Vaidya

11 Important concepts Relational Interval (RI) tree 12/1/2018
Akshay S. Vaidya

12 Important concepts Height h = 5 , range = [ 1, 31 ] 12/1/2018
Akshay S. Vaidya

13 Important concepts Node 4 represents the instant 4, 21 represents instant 21 12/1/2018 Akshay S. Vaidya

14 Important concepts Node 2 represents interval [1, 3] 12/1/2018
Akshay S. Vaidya

15 Important concepts Node 12 represents interval [9,15],[10,14],[11,14] etc 12/1/2018 Akshay S. Vaidya

16 Important concepts Node 16 represents interval [1, 31], etc 12/1/2018
Akshay S. Vaidya

17 Important concepts Mapping of an interval to a node. E.g. [ 2, 13 ]
It is the root of the smallest sub-tree including the nodes 2 and 13 12/1/2018 Akshay S. Vaidya

18 Important concepts It is the root of the smallest sub-tree including the nodes 2 and 13 i.e. Node 8 12/1/2018 Akshay S. Vaidya

19 Important concepts Database indices of RI tree Two Indices one on the upper and the other on the lower interval values Values in the Lower index stores has the following structure Structure is indexed by the lower interval value Node is the interval Node computed as previously shown for [ 2, 13 ] Lower is the lower bound of the interval i.e. 2 ID is the primary key of the record in the table 12/1/2018 Akshay S. Vaidya

20 Important concepts Database indices of RI tree
Similarly for Upper index Node is the interval Node computed as previously shown for [ 2, 13 ] Upper is the upper bound of the interval i.e. 13 ID is the primary key of the record in the table 12/1/2018 Akshay S. Vaidya

21 Important concepts Using Relational Interval (RI) tree
Record with ID Mary has interval [ 2, 13 ] Record with ID John has interval [ 4, 23 ] Record with ID Bob has interval [ 10, 21 ] Record with ID Ann has interval [ 21, 30 ] 12/1/2018 Akshay S. Vaidya

22 Important Concepts Left queries, Right queries and Inner Queries
For any Range [ lower, upper ], for all the nodes on the path from root to lower and upper Nodes registered to the left of lower may intersect the range - left queries Nodes registered to the right of upper may intersect the range – right queries Nodes registered in [ lower, upper ] DEFINITELY intersect the range – inner queries 12/1/2018 Akshay S. Vaidya

23 Important Concepts For any Range [ 11, 13 ], for all the nodes on the path from root to lower and upper i.e. 16 to 11 and 16 to 13 12/1/2018 Akshay S. Vaidya

24 Important Concepts Nodes registered to the left of lower may intersect the range - left queries here { 8, 10 } Note here that range [ 1, 9 ] and [ 1, 13 ] are both registered at 8, however only [1, 13] intersects with [11, 13 ] 12/1/2018 Akshay S. Vaidya

25 Important Concepts Nodes registered to the right of upper may intersect the range – right queries here { 14, 16 } 12/1/2018 Akshay S. Vaidya

26 Important Concepts Nodes registered in [ lower, upper ] DEFINITELY intersect the range – inner queries here [ 11, 13 ] 12/1/2018 Akshay S. Vaidya

27 Solution to problem at hand
Determine the IDs of all the tuples that intersect with a given range [ lower, upper ] Determine left queries Get the IDs of the tuples from left queries that intersect with range [ lower, upper ] Determine right queries Get the IDs of the tuples from right queries that intersect with range [ lower, upper ] Determine inner queries Get IDs of all the tuples from inner queries Return union of all the above 12/1/2018 Akshay S. Vaidya

28 Solution to problem at hand
Determine the IDs of all the tuples that intersect with a given range [ lower, upper ] Determine left queries Get the IDs of the tuples from left queries that intersect with range [ lower, upper ] Determine right queries Get the IDs of the tuples from right queries that intersect with range [ lower, upper ] Determine inner queries Get IDs of all the tuples from inner queries Return union of all the above 12/1/2018 Akshay S. Vaidya

29 Solution to problem at hand
The final intersection query 12/1/2018 Akshay S. Vaidya

30 Solution to problem at hand
Note here that range [ 1, 9 ] and [ 1, 13 ] are both registered at 8, however only [1, 13] intersects with [11, 13 ] 12/1/2018 Akshay S. Vaidya

31 Solution to problem at hand
Both [1, 9] and [1, 13] are picked on condition 1 But only [ 1, 13] passes condition 2 Since upper of [1, 13] >= lower of query [11, 13 ] i.e. 13 >= 11 12/1/2018 Akshay S. Vaidya

32 Solution to problem at hand
Condition 1 does filtering and condition 2 does range comparison Less tuples have to be compared for range intersection here due to filtering lowering time for query evaluation 12/1/2018 Akshay S. Vaidya

33 Solution to problem at hand
Index based loop join Use the OVERLAPS algorithm defined previously to basic loop join algorithm For each tuple in the outer relation, project the all tuples that OVERLAP with the inner relation Approach is naïve, does not take partitioning into consideration 12/1/2018 Akshay S. Vaidya

34 Solution to problem at hand
Partition based join In an RI tree each interval in a tuple is assigned to exactly one node of the tree Some observations can be made regarding which nodes of the second relation join with a given node of the first relation 12/1/2018 Akshay S. Vaidya

35 Solution to problem at hand
For all tuples of relation R registered at node nR (node value n at relation R) , the tuples registered at the following nodes in S may join with them Nodes on the path from the root of the tree to the node nS (called UP nodes) Node nS ( definite overlap) All nodes in the sub tree of nS (called down nodes) 12/1/2018 Akshay S. Vaidya

36 Solution to problem at hand
Here nR (= nS) = 12 The shaded region in S represents node that may join with nR 12/1/2018 Akshay S. Vaidya

37 Solution to problem at hand
UP nodes = { 8, 16 } DOWN nodes = { 9, 10, 11, 13, 14, 15 } 12/1/2018 Akshay S. Vaidya

38 Solution to problem at hand
Depending on the manner in which the two relations are partitioned, we have three approaches RI-Tree Up-Down join RI-Tree Down-Down join RI-Tree Up-Up join We will discuss the Up-Down approach in greater detail 12/1/2018 Akshay S. Vaidya

39 Solution to problem at hand
Steps for RI-Tree Up-Down join approach Determine all nodes in relation R Determine the max-range for all the nodes in relation R Alternatively you can also find the exact-range for the nodes Determine all join partners in UP of nS Determine all join partners in DOWN of nS Generate the query performing the join 12/1/2018 Akshay S. Vaidya

40 Solution to problem at hand
Determine all nodes in relation R Note that the Index structure is actually materialized in the database so you get all the nodes that are actually present in the database 12/1/2018 Akshay S. Vaidya

41 Solution to problem at hand
For nR = 12 step = abs( 12 – 8 ) = 4 fn_lower = 12 – = 9 fn_upper = – 1 = 15 max-range = [ 9, 15 ] Determine the max-range for all the nodes in relation R max-range is [ fn_lower, fn_upper] fn_lower = nR – step + 1 fn_upper = nR + step – 1 step = abs( nR – parent( nR )) 12/1/2018 Akshay S. Vaidya

42 Solution to problem at hand
Alternatively you can also find the exact-range for the nodes This will be a closer bound for range covered by node R since it is dynamically determined based on values in the database Even if node 12 in theory may cover the range[ 9, 15] the table R may only have tuples in range [11, 15] registered at node 12. Here you will get [11, 15], in the previous case you will get [9, 15] 12/1/2018 Akshay S. Vaidya

43 Solution to problem at hand
Determine all join partners in UP of nS For all nodes in UP( nS ) If the node is less than ns add it to left queries If the node is greater than ns add it to right queries 12/1/2018 Akshay S. Vaidya

44 Solution to problem at hand
Determine all join partners in DOWN of nS Nodes on the path of nS to fn_lower are stored as left queries Nodes on the path of nS to fn_upper are stored as right queries Range between [ fn_lower, ns -1 ] is stored as left query Range between [ ns + 1, fn_upper ] is stored as right query 12/1/2018 Akshay S. Vaidya

45 Solution to problem at hand
Left query schema (forknode, from, to) : from and to define the range Similarly, Right query schema 12/1/2018 Akshay S. Vaidya

46 Solution to problem at hand
The following left queries are generated for nS = 12 (12, 8, 8) (12, 10, 10) (12, 9, 11) 12/1/2018 Akshay S. Vaidya

47 Solution to problem at hand
Generate the query performing the join Project ID pairs (R.ID, S.ID) for all the tuples where nR = nS Project ID pairs for all left queries of nS where upper portion of interval S.period intersects with lower portion of interval R.period Project ID pairs for all right queries of nS where lower portion of interval S.period intersects with upper portion of interval R.period 12/1/2018 Akshay S. Vaidya

48 Solution to problem at hand
Intersection guaranteed because range of tuples registered at node n intersects with instant n I.E. instant n in period [ fn_lower, fn_upper ] Tuples registered at the same node in R and S definitely overlap and so project them unconditionally 12/1/2018 Akshay S. Vaidya

49 Solution to problem at hand
Similarly for right queries for nS Intervals registered at Nodes corresponding to Left queries of S does not include instant nR. However, the interval may match the interval registered at nR. Project ID pairs for all left queries of nS where upper portion of interval S.period intersects with lower portion of interval R.period 12/1/2018 Akshay S. Vaidya

50 Solution to problem at hand
The finally generated query 12/1/2018 Akshay S. Vaidya

51 Solution to problem at hand
The Down-Down strategy, briefly 12/1/2018 Akshay S. Vaidya

52 Solution to problem at hand
The Up-Up strategy, briefly 12/1/2018 Akshay S. Vaidya

53 Some numbers Experimental setup
Integration with Oracle Server Release 9.2.0 Pentium 4 with 2.4 GHZ 512 MB of RAM EIDE hard drive 200 database blocks cache Block size 2kB 12/1/2018 Akshay S. Vaidya

54 Some numbers 12/1/2018 Akshay S. Vaidya

55 Some numbers 12/1/2018 Akshay S. Vaidya

56 Some numbers 12/1/2018 Akshay S. Vaidya

57 Some numbers 12/1/2018 Akshay S. Vaidya

58 Some numbers 12/1/2018 Akshay S. Vaidya

59 Some numbers 12/1/2018 Akshay S. Vaidya

60 Some numbers 12/1/2018 Akshay S. Vaidya

61 Final words Number indicate
Approach can be used on an existing database RI-tree UP-DOWN method outperforms standard B-tree methods RI-tree UP-DOWN method scales well and works equally well on real data 12/1/2018 Akshay S. Vaidya

62 Final words Future work
Cost models for prediction of performance based on data characteristics 12/1/2018 Akshay S. Vaidya

63 References 12/1/2018 Akshay S. Vaidya

64 References Bercken J. v. d., Seeger B.: Query Processing Techniques for Multiversion Access Methods. VLDB 1996, Sitzmann I., Stuckey P.J.: Improving Temporal Joins Using Histograms. DEXA 2000, 12/1/2018 Akshay S. Vaidya

65 Questions ??? Is a RI tree dynamically extensible
YES. Since it grows at the roots How much memory will a RI tree with nodes require 0 Bytes. Since, it is a virtual tree. Why are all you guys still here!!! 12/1/2018 Akshay S. Vaidya


Download ppt "Joining Interval Data in Relational Databases"

Similar presentations


Ads by Google