CSIS7101 – Advanced Database Technologies Spatio-Temporal Data (Part 3) The MV3-Tree: A Spatio-Temporal Access Method for Timestamp and Interval Queries Kwong Chi Ho Leo Wong Chi Kwong Simon Lui, Tak Sing Arthur
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 2 Introduction Spatial-Temporal Database Management Systems (STDBMS) Mobile Phone Systmes Track users efficiently Provide better communication services Traffic Supervision Systems Monitor vehicle locations Motion patterns Urban Planning Record the development of landscapes over the years Retrieve urban situations at any given time in the past
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 3 Spatio-Temporal Queries Traditional STDBMS Focus on static objects Attempting to update the database whenever the objects change their positions which will cause the STDBMS to spend most of the time just handling the updates It would result in huge space requirement To deal with objects that have dynamic behavior, new querying languages, modeling methods, novel attribute representation and specialized access methods are needed STR-trees and TB-trees Focus on efficient trajectory retrieval TPR-trees Focus on predicting objects’ future locations by storing their current positions and velocities
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 4 Spatio-Temporal Queries (Continue) To deal with historical information retrieval, windows queries about objects that move in discrete time is commonly used. Timeslice or Timestamp Queries Retrieve all objects that intersect a windows at a specific timestamp Interval Queries Include several consecutive timestamps
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 5 Historical Information Retrieval Types of Indexing MR-trees and HR-trees Maintain a separate R-tree for each timestamp, but allow consecutive trees to share branches Advantages Efficient for timestamp queries, as search degenerates into a static spatial window query for which R-trees are very efficient Disadvantages Extensive duplication of objects (even if they do not move) which lead to huge space requirements for most typical application Poor performance on interval queries
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 6 Historical Information Retrieval (Continue) 3-Dimensional R-trees (the 3 rd dimension corresponding to time) An object which does not change its position during a certain period of time is modeled as a 3D box, bounding both its spatial and temporal attributes A moving object is modeled by multiple boxes, each corresponding to a different version. Advantages The temporal attribute is integrated tightly with the spatial attributes thus interval queries can be answered efficiently Redundant duplication is avoided thus space usage can be reduced.
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 7 Historical Information Retrieval (Continue) Disadvantages Poor performance on timestamp queries as the query time depends on the total number of entries in history rather than the live entries at the query timestamp. time x y x y Object states at the same location from t 1 to t 2 At location A At location B Object moves from location A to B at time t 2 t1t1 t2t2 t1t1 t2t2 t3t3
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 8 Induction of MV3R-tree MV3R-tree utilizes the concepts of multi- version R-tree (MVR-tree) and a small auxiliary 3D R-tree The auxiliary 3D R-tree builts on the leave of the MVR-tree Aims at timestamp and interval window queries For retrieving the past locations of discretely moving objects Enhancing the performance of multi-version framework for multi-dimensional access methods
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 9 Induction of MV3R-tree (Continue) MVR-tree involves several heuristics that take into account the features of R-trees to improve performance significantly The auxiliary 3D R-trees outperform traditional 3D R-trees for most queries MV3R-tree space consumption up to an order of magnitude smaller than that of an HR-tree, while maintaining comparable timestamp query performance
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 10 Overview of MVB-trees, HR-trees & 3D R-trees Multi-version B-trees Extensions of B-trees Index the evolution of one-dimensional data in transaction time temporal databases Insertions and deletions can only happen at current time Entry in the form of For root and intermediate entries, pointers points to a next level node For leaf entries, pointers points to the actual record with the corresponding key value. An object is said to be alive at time t if : t start ≤ t < t end
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 11 Overview of MVB-trees (Continue) Temporal attributes t start and t end denotes the time that the record was inserted and deleted in the database respectively Deletions are logical, i.e. actual records are not physically removed from database For currently live entries, t end would be denotes as * where * means “NOWTIME” Can have multiple roots and each root has a jurisdiction interval For each timestamp t, each node, except the roots, is required that either none or at least b. P version entries are alive at t P version = tree parameter b = node capacity
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 12 Overview of MVB-trees (Continue) Examples: P version = 1/3, b = 6, minimum entries = 1/3 * 6 = 2 Root A B C Point to leaf node A. Created at time 1 and is alive Point to leaf node B. Created at time 1 and is alive Point to leaf node C. Created at time 1 and is alive Key 43 in leaf node B is created at time 1 is still alive Key 95 in leaf node C is created at time 1and is deleted at time 3
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 13 Overview of MVB-trees (Continue) Insertions Root A B C D A BC Root Insertion of at timestamp 4 cause node A overflow A new node D is created to store live entries of A A “dies” meaning that it will not be modified in the future
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 14 Overview of MVB-trees (Continue) Deletion Root A B C A B C Root D E Deletion of at timestamp 4 causes node B underflow. Key split is performed. A sibling is chosen, say node C. Live entries are copied to new nodes (D & E). Nodes B and C died. Node D and E created.
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 15 Overview of MVB-trees (Continue) Summary Insertions and deletions may cause overflow and underflow respectively, thus create version splits. Version splits create data redundancy for those entries duplicated. Such redundancy harms interval query performance as both the original and duplicated versions may need to be retrieved. Underflow – Strong and Weak Versions Strong version underflow happens after a version split. Weak version underflow occurs when the weak version condition is violated.
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 16 Overview of MVB-trees (Continue) MVB-trees require O(N/b) space, where N is the number of updates ever made to the database and b is the block capacity. Answering a timestamp range query requires O(log b M + r/b) I/O’s, where M is the number of live object at the queried timestamp, and r is the number of output objects.
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 17 Overview of MVB-trees, HR-trees & 3D R-trees Historical R-trees Based on the overlapping technique, another framework for transforming a single version data structure into a transaction time access method The structure maintains an R-tree for each timestamp, but common branches of consecutive trees are stored only once in order to save space A timestamp query is directed to the corresponding R-tree and search is performed inside the tree body. Thus, the query degenerates into an ordinary window query and is handled very efficiently
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 18 Overview of Historical R-trees (Continue) An interval query should search the corresponding trees of all the timestamps involved. Example: object e changes position at timestamp 1 a0a0 b0b0 a0a0 c0c0 d0d0 e0e0 R0R0 A0A0 B0B0 C0C0 D0D0 E0E0 Timstamp 0 a0a0 b0b0 e1e1 c0c0 d0d0 e0e0 R1R1 B1B1 C1C1 D1D1 E1E1 Timstamp 1 Object e deleted from node E at timestamp 1 Object e added to node D at timestamp 1 e1e1 e0e0 As no object in node A changes from timestamp 0 to 1, it can be shared by other trees.
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 19 Overview of MVB-trees, HR-trees & 3D R-trees 3D R-trees They view time as just another dimension and integrate it in the tree construction The movements of 2D objects can be modeled as distinct boxes in three dimensional space Temporal projection denotes the period when the corresponding object remains static Spatial projections of the box correspond to the object’s position and extents during the period
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 20 Overview of 3D R-trees (Continue) No mechanism to ensure that each node has a minimum number of live entries at a given short- interval queries Poor in timestamp and short-interval query performance A single tree for the whole history A node may have a lot of dead space at a timestamp t, meaning that there is a high chance that the query window intersects the bounding box but no object inside it Where there are many objects with long lifespans, the problem becomes more serious because these objects will force the node that contain them to have long lifespans as well It depends on the total number of records, rather than on the number of records alive at the queried timestamps
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 21 Overview of 3D R-trees (Continue) Good performance in long interval queries No redundancy R-trees optimize queries with similar extents along all dimensions time x y Different objects at different locations at different time interval Only one live entry may be retrieved at a timestamp t. Ttimestamp t A high chance that the query window intersects the bounding box but no object inside
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 22 MV3R-Trees Why needs MV3R-Trees Currently, there are no such structure that can effectively handle both timestamp and interval queries. Why is MV3R-Trees good Reduce the structure size but improve query performance Are applicable to other multi-dimensional access methods when they are converted to corresponding multi-version structures
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 23 MV3R-Trees (Continue) How MV3R-Trees work Contain a multi-version R-Tree (MVR-tree) and a small auxiliary 3D R-tree built on the leaf nodes of the MVR-tree MVR-tree can contain multiple R-trees, which refer to as “logical trees”. Each entry has the form as with MVB-trees:. S denotes the spatial minimum bounding rectangle (MBR) as defined in R-trees. MVR-tree inherit the concept of weak version condition.
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 24 MV3R-Trees (Insertion and overflow) Intermediate nodes insertion Allow redundancy in order to maintain good performance for timestamp and short-interval queries Process of insertion for intermediate nodes: Insertion Block Overflow Version Split Strong Version Overflow Key Split Version Split at timestamp 10 The lifespan of A1 does not include timestamp 10, its MBR does not cover C6. The MBR of A1 may be tightedned MBR of B1 is small than A1 because only C5 and C6 is bounded by B1 at timestamp A... A... B C Insertion of object C6 at timestamp 10
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 25 MV3R-Trees (Insertion and overflow) Do not consider at all strong version underflows because Underflow in MVR-trees happens much less frequently than overflows Handle underflows by entry re-insertion, which may trigger block overflows in several other nodes Version splits need to take into account the spatial extents of the nodes.
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 26 MV3R-Trees (Insertion and overflow) Leaf nodes insertion Try to avert version splits thus reduce redundancy to reduce storage space but maintain timestamp query performance Small number of leaf nodes will facilitate interval query processing using the auxiliary 3D R-tree Process of insertion for leaf nodes: To avoid version splits, try the following alternatives in order: 1.General Key Split 2.Insert in node after reinserting one of its entries 3.Insert in another node 4.Version split
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 27 MV3R-Trees (Insertion and overflow) 1.General Key Split If a new entry is to be inserted, the entries can be distributed to two nodes so that for each timestamp in a time range, there exist at least b. P version entries alive. Thus version split, which generate version redundancy for entries, can be avoided. However, it may be difficult or impossible. Two new nodes should have small overlap. General key split is different from ordinary key split because ordinary key split is applied when all the entries in the node to be split are alive and their t start equal current time. 2.Reinsert an Existing Entry of the Node Any leaf node can store a re-inserted entry provided that: Its lifespan must cover that of the entry It should be dead if the entry is dead Its area should not be enlarged much, in order to ensure good performance for timestamp queries Insertion Block Overflow 1.General Key Split 2.Insert in node after reinserting one of its entries 3.Insert in another node 4.Version Split Strong Version Overflow Key Split
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 28 MV3R-Trees (Insertion and overflow) Dead entries always come before live ones as live entries will be reinserted only into live nodes, which may also overflow and induce the same problem in the future Among the dead and live entries, sorting is based on the area decrease of the nod MBR caused by the entry deletion Reinsert a single entry even if it is possible to reinsert more because Reinsertion saves space but does not achieve structure improvement Reinsertion of a single entry already achieves the objective of averting the version split 3.Insert in Another Node Tries to insert the new entry into another node that is not full Backtrack to the upper level and try to insert the entry into another branch Only consider branches that will incur small area enlargement The area enlargement of candidate branches can only exceed that of the best branch by a certain percentage
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 29 MV3R-Trees (Insertion and overflow) Conclusion General Key Split Does not require reading any more pages. It is the most efficient method in terms of update cost It reduces the number of entries in the new nodes so that they will not overflow again in the near future Reinsert an Existing Entry of the Node It can search multiple branches Update costs are compensated by the space savings Insert in Another Node Requires backtracking up to level 2 Update costs are compensated by the space savings
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 30 MV3R-Trees (Deletion and Underflow) It is handled in a way similar to R*-trees if a deletion does not incur structural changes An entry is physically deleted, only if its t start is equal to the current time (multiple updates may happen at the same timestamp) Intermediate and leaf nodes deletions are handled in the similar way as of insertion. Intermediate node deletion Suppose an underflow occurs at the current timestamp t, the live entries of t end to be set to t. Then these entries are re-inserted into the most recent logical R-tree after setting t start = t. Apply the R*-tree algorithms
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 31 MV3R-Trees (Deletion and Underflow) Leaf node deletion To avoid redundancy caused by reinsertion, a live entry from a sibling node will be borrowed The borrowed (moved) entry should have the properties: It must be alive and its lifespan must be covered by the original sibling node, say node A After the removal of the entry, the version condition of the borrowed sibling node, say node B, must still be satisfied Inserting this entry to the original sibling node, node A, will not cause its MBR to increase above a threshold
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 32 AB MV3R-Trees (Deletion and Underflow) Example: S (root) AB Timestamp 1 AB Timestamp 2 Object B4 has been borrowed by A Deletion of object A1 at timestamp 2 which cause underflow of node A Insertion of objects B4, B5 and B6 at timestamp 2 Objects B2 and B3 cannot be moved because their deletion from B will cause weak version underflow for timestamp 1
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 33 MV3R-Trees (Auxiliary 3D R-tree) Built on the leaves of the MVR-tree in order to process interval queries For a given moderate node capacity, the number of leaf nodes in an MVR-tree is much lower than the actual number of objects. Smaller in size as compare to a complete 3D R-tree Adding auxiliary 3D R-tree not only improves interval query performance, but may also provides flexibility in other scenarios Construction of auxiliary 3D R-tree Whenever a leaf node of the MVR-tree is updated, the change is propagated to its entry in the 3D R-tree
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 34 Query Processing with MV3R-trees MVR-tree Timestamp query involves retrieval of the root whose jurisdiction interval covers the queried timestamp, and then search is performed similarly to R-trees 3D R-tree For interval queries, multiple trees may need to be searched Should avoid duplicate visits to the same node via different parents, otherwise, result in severe IO cost Duplicate pointers to a node are created in version splits or entry reinsertions In both cases, the two entries pointing to the same node have disjoint lifespans For short interval, it will be used whenever the temporal query length exceeds a certain threshold Its performance deteriorates gradually as the tree grows
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 35 Query Processing with MV3R-trees Example A... B C time x y Cube of A1Cube of B1 Cube of C3 Cube of C1Cube of C2 Cube of C4 Query Cube 1. A1 and B1 are temporally adjacent 2. A1 spatially covers C1 and C2 3. B1 spatially covers C2, C3 and C4 4. C2 and C3 intersect the query box so their subtrees (node E and F) should be search 5. Since node C may be reached twice (by following A1 and B1), we may attemp redundant visits to E and F
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 36 Conclusion of MV3R-trees A structure that combines the concepts of MVB- trees and 3D R-trees MV3R-trees can handle timestamp and interval queries efficiently with relatively small space requirements MV3R-trees could be further improved by: Analytical cost models for determining the optimal tree to answer short interval queries Overflow and underflow handling heuristics that are more efficient in terms of update cost, and can avert more version splits
CSIS Advanced Database Technologies Spatio-Temporal Data (Part 3) 37 References Tao, Y., Papadias, D. The MV3R-Tree: A Spatio-Temporal Access Method for Timestamp and Interval Queries, 2000