21st International Symposium on Temporal Representation and Reasoning

Lean Index Structures for Snapshot Access in Transaction-time Databases
Fabio Grandi
Alma Mater Studiorum - Università degli Studi di Bologna, Bologna, Italy
TIME, Verona, Italy, 8-10 September 2014

Introduction (1)
Temporal Databases are the answer to advanced application requirements involving the representation and management of time-varying data.
Adopting a non-deletion policy, the full history of past database states is kept and past snapshots of database tables can be accessed for archiving, accountability or audit purposes.
In many cases, this can be done in a way transparent to the non-temporal users, via support of transaction time (e.g., with “system-versioned tables” in SQL:2011).
TIME F. Grandi – Lean Index Structures for Snapshot Access in Transaction-time Databases

Introduction (2)
In a dynamic environment, the full maintenance of past data versions leads to fast data growth, which soon becomes a performance issue.
To provide efficient selective access to temporal snapshots, suitable index structures must be employed.
Several temporal index structures were proposed in the 1990s for accessing transaction-time data:
  Snapshot Index [Tsotras & Kangelaris 1995]
  Time-Split B-Tree [Lomet & Salzberg 1990]
  Multiversion B-Tree [Becker et al. 1996]
  Append-only Tree [Gunadhi & Segev 1993]
  Time Index [Elmasri, Wuu & Kim, 1990]

Introduction (3)
Since temporal tuples may belong to multiple snapshots, a perfect clustering involving separation of snapshots is impossible without data duplication.
Hence, the theoretical asymptotic optimum for snapshot access time, O(log_b n + |s(T)|/b), can only be achieved at the expense of (sometimes very high) data duplication.
For instance, the Time-Split B-Tree grants optimal snapshot access but with a very high data duplication rate (the index may become several times the size of the indexed relation!).
The Snapshot Index trades off data duplication against query performance via the usefulness parameter a.

Overview of the contribution
A new index structure supporting fast selective access to past snapshots of a transaction-time database table is presented.
The new structure, called RABTree, is lean: the index has low memory requirements with high occupancy and requires no data duplication.
An even leaner though less efficient variant, called RAB-Tree, is also presented.
Performance evaluation and comparison with competitors are presented …

A Sample Temporal Table
Time T0: insert A,B,C,D,E,F
Time T1: update A,C; delete F; insert G,H,I
Time T2: update C,G,H; delete A,D; insert J,K
Time T3: update C,J; delete G,K; insert L,M

Table contents after T3 (UC = until changed):

TID  Key  Value  Start  End
P1   A    10     T0     T1
P2   B    20     T0     UC
P3   C    30     T0     T1
P4   D    50     T0     T2
P5   E    45     T0     UC
P6   F    25     T0     T1
P7   A    15     T1     T2
P8   C           T1     T2
P9   G           T1     T2
P10  H           T1     T2
P11  I           T1     UC
P12  C    40     T2     T3
P13  G           T2     T3
P14  H    90     T2     UC
P15  J           T2     T3
P16  K           T2     T3
P17  C           T3     UC
P18  J           T3     UC
P19  L           T3     UC
P20  M    60     T3     UC

A Sample Temporal Table
A tuple of the sample table belongs to snapshot Ti when Ti >= Start and Ti < End:
Snapshot T0 (T0 >= Start and T0 < End): P1,P2,P3,P4,P5,P6
Snapshot T1 (T1 >= Start and T1 < End): P2,P4,P5,P7,P8,P9,P10,P11
Snapshot T2 (T2 >= Start and T2 < End): P2,P5,P11,P12,P13,P14,P15,P16
Snapshot T3 (T3 >= Start and T3 < End): P2,P5,P11,P14,P17,P18,P19,P20
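The membership condition can be checked mechanically. The following is a minimal Python sketch, not code from the presentation: timestamps T0..T3 are encoded as the integers 0..3, UC is modelled as infinity, and the names `TABLE` and `snapshot` are hypothetical. Keys, Start and End values are those implied by the transaction log and the snapshot lists of the sample table.

```python
import math

UC = math.inf  # "until changed": the tuple version is still current

# (TID, Key, Start, End); T0..T3 are encoded as 0..3 (an assumption of this sketch).
TABLE = [
    (1, "A", 0, 1), (2, "B", 0, UC), (3, "C", 0, 1), (4, "D", 0, 2),
    (5, "E", 0, UC), (6, "F", 0, 1), (7, "A", 1, 2), (8, "C", 1, 2),
    (9, "G", 1, 2), (10, "H", 1, 2), (11, "I", 1, UC), (12, "C", 2, 3),
    (13, "G", 2, 3), (14, "H", 2, UC), (15, "J", 2, 3), (16, "K", 2, 3),
    (17, "C", 3, UC), (18, "J", 3, UC), (19, "L", 3, UC), (20, "M", 3, UC),
]


def snapshot(table, t):
    """Return the TIDs of the tuples alive at time t (Start <= t < End)."""
    return [tid for tid, _key, start, end in table if start <= t < end]


for t in range(4):
    print(f"Snapshot T{t}:", ",".join(f"P{tid}" for tid in snapshot(TABLE, t)))
```

The full scan used here is only meant to make the predicate concrete; avoiding such a scan is precisely the purpose of the index structures discussed in the talk.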

A Compression Technique for TID Lists
Uncompressed TID lists:
Snapshot T0: P1,P2,P3,P4,P5,P6
Snapshot T1: P2,P4,P5,P7,P8,P9,P10,P11
Snapshot T2: P2,P5,P11,P12,P13,P14,P15,P16
Snapshot T3: P2,P5,P11,P14,P17,P18,P19,P20

Compressed TID lists (Pk%m stands for Pk plus the m TIDs that follow it, e.g., P1%5 is P1,...,P6):
Snapshot T0: P1%5
Snapshot T1: P2,P4%1,P7%4
Snapshot T2: P2,P5,P11%5
Snapshot T3: P2,P5,P11,P14,P17%3
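The compression amounts to run-length encoding of consecutive TIDs. Below is a small Python sketch of that idea; the reading of Pk%m as "Pk plus the next m TIDs" is inferred from the slide examples, and the function names `compress` and `render` are hypothetical rather than taken from the presentation.

```python
def compress(tids):
    """Collapse a sorted TID list into (first, extra) runs.

    A run (first, extra) stands for first, first+1, ..., first+extra.
    """
    runs = []
    for tid in tids:
        if runs and tid == runs[-1][0] + runs[-1][1] + 1:
            # tid directly follows the last run: extend it by one
            first, extra = runs[-1]
            runs[-1] = (first, extra + 1)
        else:
            # gap in the sequence: start a new run
            runs.append((tid, 0))
    return runs


def render(runs):
    """Format runs in the slide notation, e.g. [(1, 5)] -> 'P1%5'."""
    return ",".join(f"P{f}%{e}" if e else f"P{f}" for f, e in runs)


print(render(compress([2, 4, 5, 7, 8, 9, 10, 11])))  # snapshot T1
```

Each run takes constant space regardless of its length, so the fewer and longer the runs, the smaller the leaf entry.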

A Further Optimization
Tuples are naturally clustered with respect to Start values according to their creation (append) order.
A further optimization can be made by superimposing a secondary order on End values.
This minimizes the number of ranges and maximizes the range length in TID lists.
The sorting required to achieve this optimization is quite inexpensive, as it can be done in main memory during update operations.

An “Optimized” Temporal Table
TID  Key  Value  Start  End
P1   A    10     T0     T1
P2   C    30     T0     T1
P3   F    25     T0     T1
P4   D    50     T0     T2
P5   B    20     T0     UC
P6   E    45     T0     UC
P7   A    15     T1     T2
P8   C           T1     T2
P9   G           T1     T2
P10  H           T1     T2
P11  I           T1     UC
P12  C    40     T2     T3
P13  G           T2     T3
P14  J           T2     T3
P15  K           T2     T3
P16  H    90     T2     UC
P17  C           T3     UC
P18  J           T3     UC
P19  L           T3     UC
P20  M    60     T3     UC

Snapshot T0: P1%5
Snapshot T1: P4%7
Snapshot T2: P5%1,P11%5
Snapshot T3: P5%1,P11,P16%4
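The effect of the secondary order on End values can be reproduced with a short Python sketch. It is an illustration under simplifying assumptions, not the author's procedure: the sort is done offline on the final table state (the talk states that sorting happens in main memory during updates), timestamps T0..T3 are encoded as 0..3, UC is modelled as infinity, and `VERSIONS`, `optimized_runs` and `to_runs` are hypothetical names.

```python
import math

UC = math.inf  # "until changed"

# (Key, Start, End) in the original append order of the sample table.
VERSIONS = [
    ("A", 0, 1), ("B", 0, UC), ("C", 0, 1), ("D", 0, 2), ("E", 0, UC),
    ("F", 0, 1), ("A", 1, 2), ("C", 1, 2), ("G", 1, 2), ("H", 1, 2),
    ("I", 1, UC), ("C", 2, 3), ("G", 2, 3), ("H", 2, UC), ("J", 2, 3),
    ("K", 2, 3), ("C", 3, UC), ("J", 3, UC), ("L", 3, UC), ("M", 3, UC),
]


def to_runs(tids):
    """Run-length encode a sorted TID list as (first, extra) pairs."""
    runs = []
    for tid in tids:
        if runs and tid == runs[-1][0] + runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)
        else:
            runs.append((tid, 0))
    return runs


def optimized_runs(versions, times):
    """Order by Start, then End (stable sort), renumber TIDs from 1,
    and return the compressed TID list of each snapshot."""
    ordered = sorted(versions, key=lambda v: (v[1], v[2]))
    result = {}
    for t in times:
        tids = [tid for tid, (_k, start, end) in enumerate(ordered, start=1)
                if start <= t < end]
        result[t] = to_runs(tids)
    return result


for t, runs in optimized_runs(VERSIONS, range(4)).items():
    print(f"Snapshot T{t}:", runs)
```

On the sample data the number of runs drops from 1, 3, 3 and 5 (append order) to 1, 1, 2 and 3, matching the compressed lists of the optimized table.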

The RABTree (1)
The structure is very similar to a traditional B+-Tree.
The compression technique is applied to TID-lists in the leaves.
A tree structure is built on time (Start attribute) above the leaves for fast access to a given snapshot (road-map to the desired TID-list).
The resulting temporal index is also very similar to the Time Index [Elmasri, Wuu & Kim, 1990], from which it basically differs by the compression technique used for TID-lists.

The RABTree (2)
Owing to the semantics of transaction time, the resulting index is append-only.
New entries are inserted only in the rightmost nodes of each level.
This leads to the name: Right-Append B+-Tree.
Nice properties of the RABTree are a high memory occupancy (near 100% versus 69% for the B+-Tree) and a low height.
The quantum leap with respect to the Time Index is the new compression technique for TID lists:
  The Time Index stores a full (uncompressed) TID list for the first entry of each leaf and deltas (TIDs of added and deleted tuples) for the other entries.
  This leads to a worst case of O(n^2/b) memory space for leaves, where n is the number of tuples in the indexed relation.
  This makes the Time Index a non-lean structure (and even impractical in several cases, due to the excess of memory required).
  The space required by the RABTree is O(αn/b), usually with α<1, amounting to a few percent of the indexed relation; however, a worst case could theoretically exist.

Tending to The RAB-Tree
On the optimized table, each snapshot falls within a single contiguous TID range:
Snapshot T0: P1-P6
Snapshot T1: P4-P11
Snapshot T2: P5-P16
Snapshot T3: P5-P20

Snapshot Access with RAB/RAB-Tree
To access the snapshot S(T)…
…with the RABTree:
  Retrieve from the index leaves the entries T_i:L_i, T_{i+1}:L_{i+1} such that T_i ≤ T < T_{i+1}
  Access all tuples pointed to by the TIDs in L_i
…with the RAB-Tree:
  Retrieve from the index leaves the entries T_i:P_i, T_{i+1}:P_{i+1} such that T_i ≤ T < T_{i+1}
  Sequentially access all tuples from the one pointed to by P_i to the last one with Start < T_{i+1} (discard a tuple if End ≤ T)
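The two access procedures can be sketched in Python on the optimized sample table. This is an illustrative model only: a binary search over a sorted list of leaf timestamps stands in for the descent through the tree, timestamps T0..T3 are encoded as 0..3, UC is infinity, and all names (`TUPLES`, `TIMES`, `LISTS`, `POINTERS`, `locate`, `rabtree_access`, `rab_variant_access`) are hypothetical.

```python
import bisect
import math

UC = math.inf  # "until changed"

# Optimized table: position k holds (Start, End) of TID k+1.
TUPLES = [
    (0, 1), (0, 1), (0, 1), (0, 2), (0, UC), (0, UC),
    (1, 2), (1, 2), (1, 2), (1, 2), (1, UC),
    (2, 3), (2, 3), (2, 3), (2, 3), (2, UC),
    (3, UC), (3, UC), (3, UC), (3, UC),
]
TIMES = [0, 1, 2, 3]                       # leaf entry timestamps T_i
LISTS = [[(1, 5)], [(4, 7)],               # RABTree: compressed TID lists L_i
         [(5, 1), (11, 5)], [(5, 1), (11, 0), (16, 4)]]
POINTERS = [1, 4, 5, 5]                    # RAB-Tree variant: first TID P_i


def locate(t):
    """Index i of the leaf entry with T_i <= t < T_{i+1}."""
    i = bisect.bisect_right(TIMES, t) - 1
    if i < 0:
        raise KeyError(f"no snapshot at or before time {t}")
    return i


def rabtree_access(t):
    """Expand the compressed TID list L_i of the located entry."""
    return [tid for first, extra in LISTS[locate(t)]
            for tid in range(first, first + extra + 1)]


def rab_variant_access(t):
    """Scan sequentially from P_i while Start < T_{i+1}; drop End <= t."""
    i = locate(t)
    next_time = TIMES[i + 1] if i + 1 < len(TIMES) else math.inf
    result = []
    for tid in range(POINTERS[i], len(TUPLES) + 1):
        start, end = TUPLES[tid - 1]
        if start >= next_time:
            break  # past the last tuple with Start < T_{i+1}
        if end > t:
            result.append(tid)
    return result


print(rabtree_access(2), rab_variant_access(2))
```

For T2 the variant reads P5 through P16 and discards P7 to P10, whereas the RABTree touches only the eight tuples of the snapshot; this is the leanness versus efficiency trade-off between the two structures.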

Experimental Settings
Stationary growth
  Only updates (and/or insertions balanced by deletions) are applied, so that the snapshot size stays constant; the file grows by the addition of snapshots.
  50% of tuples are updated by each transaction.
  Starting from an initial snapshot with 1,000 tuples, we end up with 2,470,000 tuples after 5,000 transactions.
Linear growth
  Each transaction executes a constant number of ins/del/upd (on average), with a positive balance between insertions and deletions, so that the snapshot size grows linearly in time.
  Number of ins/del/upd uniformly distributed in [0..100]/[0..50]/[0..200].
  Starting from an initial snapshot with 1,000 tuples, we end up with 1,510,000 tuples after 10,000 transactions.
Exponential growth
  Each transaction applies ins/del/upd at a constant rate (on average), with a positive balance between insertions and deletions, so that the snapshot size grows exponentially in time.
  Rates of randomly inserted/deleted/updated tuples are 5/2/30 %.
  Starting from an initial snapshot with 1,000 tuples, we end up with 32,152,000 tuples after 500 transactions.

Conclusions
We presented the RABTree index (and its variant RAB-Tree), which is an I/O-suboptimal secondary structure for snapshot access in transaction-time databases.
Without data duplication, with the proposed “optimized” storage, the RABTree index is I/O-optimal.
We showed our solutions to be efficient in different experimental configurations and to exhibit good scale-up behaviour.
The compression technique adopted for TID-lists in the RABTree leaves has been shown to provide excellent results, making it a lean indexing solution, suitable for settings where memory occupation is an issue.

Future work
We will study in deeper detail the performance of the RAB/RAB-Tree indexes and, in particular, of the proposed RABTree TID-list compression technique (e.g., to characterize the worst case).
We will study the definition of “quasi transaction-time” databases, where retro- and pro-active transactions can be allowed, to a limited extent, although the adopted time dimension has the semantics of transaction time (the RAB/RAB-Tree indexes can still be used in this kind of database).
