21st International Symposium on Temporal Representation and Reasoning


21st International Symposium on Temporal Representation and Reasoning Lean Index Structures for Snapshot Access in Transaction-time Databases Fabio Grandi Alma Mater Studiorum - Università degli Studi di Bologna Bologna, Italy TIME 2014 Verona, Italy, 8-10 September 2014

Introduction (1)
- Temporal databases are the answer to advanced application requirements involving the representation and management of time-varying data
- Adopting a non-deletion policy, the full history of past database states is kept, and past snapshots of database tables can be accessed for archiving, accountability or audit purposes
- In many cases, this can be done in a way transparent to non-temporal users, via support of transaction time (e.g., with “system-versioned tables” in SQL:2011)
TIME 2014 - F. Grandi – Lean Index Structures for Snapshot Access in Transaction-time Databases

Introduction (2)
- In a dynamic environment, the full maintenance of past data versions leads to fast data growth, which soon becomes a performance issue
- In order to efficiently provide selective access to temporal snapshots, suitable index structures must be employed
- Several temporal index structures were proposed in the 1990s for accessing transaction-time data:
  - Snapshot Index [Tsotras & Kangelaris 1995]
  - Time-Split B-Tree [Lomet & Salzberg 1990]
  - Multiversion B-Tree [Becker et al. 1996]
  - Append-only Tree [Gunadhi & Segev 1993]
  - Time Index [Elmasri, Wuu & Kim, 1990]

Introduction (3)
- Since temporal tuples may belong to multiple snapshots, a perfect clustering involving separation of snapshots is impossible without data duplication
- Hence, the theoretical asymptotic optimum for snapshot access time, $O(\log_b n + |s(T)|/b)$, can only be achieved at the expense of a (sometimes very high) data duplication
- For instance, the Time-Split B-Tree grants optimal snapshot access but with a very high data duplication rate (the index may become several times the size of the indexed relation!)
- The Snapshot Index trades off data duplication against query performance via the usefulness parameter a

Overview of the contribution
- A new index structure supporting fast selective access to past snapshots of a transaction-time database table is presented
- The new structure, called RABTree, is lean: the index has low memory requirements with high occupancy and requires no data duplication
- An even leaner though less efficient variant, called RAB-Tree, is also presented
- Performance evaluation and comparison with competitors are presented …

A Sample Temporal Table
Time T0: insert A,B,C,D,E,F
Time T1: update A,C; delete F; insert G,H,I
Time T2: update C,G,H; delete A,D; insert J,K
Time T3: update C,J; delete G,K; insert L,M
Table contents after T3 (a "?" marks a Value that did not survive in the transcript):
TID  Key  Value  Start  End
P1   A    10     T0     T1
P2   B    20     T0     UC
P3   C    30     T0     T1
P4   D    50     T0     T2
P5   E    45     T0     UC
P6   F    25     T0     T1
P7   A    15     T1     T2
P8   C    ?      T1     T2
P9   G    ?      T1     T2
P10  H    ?      T1     T2
P11  I    ?      T1     UC
P12  C    40     T2     T3
P13  G    ?      T2     T3
P14  H    90     T2     UC
P15  J    ?      T2     T3
P16  K    ?      T2     T3
P17  C    ?      T3     UC
P18  J    ?      T3     UC
P19  L    ?      T3     UC
P20  M    60     T3     UC

A Sample Temporal Table
Snapshot Ti contains the tuples of the sample table that satisfy Ti >= Start and Ti < End:
- Snapshot T0 (T0 >= Start and T0 < End): P1,P2,P3,P4,P5,P6
- Snapshot T1 (T1 >= Start and T1 < End): P2,P4,P5,P7,P8,P9,P10,P11
- Snapshot T2 (T2 >= Start and T2 < End): P2,P5,P11,P12,P13,P14,P15,P16
- Snapshot T3 (T3 >= Start and T3 < End): P2,P5,P11,P14,P17,P18,P19,P20
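The snapshot predicate above (Start <= T < End) can be sketched in a few lines of Python. The row data are an assumption: Key, Start and End are inferred from the update history and the snapshot lists on the slides. Encoding T0..T3 as the integers 0..3 and UC as infinity is an illustrative choice, not part of the original presentation.

```python
import math

UC = math.inf  # "until changed", modelled as infinity (assumption)

# (TID, Key, Start, End); T0..T3 are encoded as 0..3 (assumption).
TABLE = [
    (1, "A", 0, 1), (2, "B", 0, UC), (3, "C", 0, 1), (4, "D", 0, 2),
    (5, "E", 0, UC), (6, "F", 0, 1),
    (7, "A", 1, 2), (8, "C", 1, 2), (9, "G", 1, 2), (10, "H", 1, 2),
    (11, "I", 1, UC),
    (12, "C", 2, 3), (13, "G", 2, 3), (14, "H", 2, UC), (15, "J", 2, 3),
    (16, "K", 2, 3),
    (17, "C", 3, UC), (18, "J", 3, UC), (19, "L", 3, UC), (20, "M", 3, UC),
]


def snapshot(table, t):
    """Return the TIDs of the tuple versions valid at time t (Start <= t < End)."""
    return [tid for tid, _key, start, end in table if start <= t < end]


if __name__ == "__main__":
    for t in range(4):
        print(f"Snapshot T{t}:", ",".join(f"P{tid}" for tid in snapshot(TABLE, t)))
```

This is a full scan of the table, which is exactly the cost a snapshot index is meant to avoid; it only serves to make the membership condition concrete.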

A Compression Technique for TID Lists
Uncompressed TID lists:
- Snapshot T0: P1,P2,P3,P4,P5,P6
- Snapshot T1: P2,P4,P5,P7,P8,P9,P10,P11
- Snapshot T2: P2,P5,P11,P12,P13,P14,P15,P16
- Snapshot T3: P2,P5,P11,P14,P17,P18,P19,P20
Compressed TID lists (Px%k stands for Px followed by the next k consecutive TIDs):
- Snapshot T0: P1%5
- Snapshot T1: P2,P4%1,P7%4
- Snapshot T2: P2,P5,P11%5
- Snapshot T3: P2,P5,P11,P14,P17%3
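The compression of a TID list into ranges can be sketched as a simple run-length encoding over consecutive TIDs. This is a minimal illustration, not the author's implementation: the function names `compress`, `expand` and `render` are hypothetical, and TIDs are represented as plain integers (P7 becomes 7).

```python
def compress(tids):
    """Encode a sorted TID list as [first, extra] runs.

    A run [first, extra] covers the TIDs first, first+1, ..., first+extra,
    which the slides write as P<first>%<extra>.
    """
    runs = []
    for tid in tids:
        if runs and tid == runs[-1][0] + runs[-1][1] + 1:
            runs[-1][1] += 1          # tid extends the current run
        else:
            runs.append([tid, 0])     # tid starts a new run
    return runs


def expand(runs):
    """Inverse of compress: rebuild the full TID list from the runs."""
    return [first + k for first, extra in runs for k in range(extra + 1)]


def render(runs):
    """Format runs in the slide notation, e.g. 'P2,P4%1,P7%4'."""
    return ",".join(f"P{first}%{extra}" if extra else f"P{first}"
                    for first, extra in runs)


if __name__ == "__main__":
    t1 = [2, 4, 5, 7, 8, 9, 10, 11]
    print(render(compress(t1)))
```

The longer the runs of consecutive TIDs in a snapshot, the fewer entries the compressed list needs, which is why the physical order of tuples matters for the size of the index leaves.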

A Further Optimization
- Tuples are naturally clustered with respect to Start values according to their creation (append) order
- A further optimization can be made by superimposing a secondary order on End values
- This minimizes the number of ranges and maximizes the range length in TID lists
- The sorting required to achieve this optimization is quite inexpensive, as it can be done in main memory during update operations

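An “Optimized” Temporal Table
Table contents after T3, ordered by Start and then by End (a "?" marks a Value that did not survive in the transcript):
TID  Key  Value  Start  End
P1   A    10     T0     T1
P2   C    30     T0     T1
P3   F    25     T0     T1
P4   D    50     T0     T2
P5   B    20     T0     UC
P6   E    45     T0     UC
P7   A    15     T1     T2
P8   C    ?      T1     T2
P9   G    ?      T1     T2
P10  H    ?      T1     T2
P11  I    ?      T1     UC
P12  C    40     T2     T3
P13  G    ?      T2     T3
P14  J    ?      T2     T3
P15  K    ?      T2     T3
P16  H    90     T2     UC
P17  C    ?      T3     UC
P18  J    ?      T3     UC
P19  L    ?      T3     UC
P20  M    60     T3     UC
Compressed TID lists:
- Snapshot T0: P1%5
- Snapshot T1: P4%7
- Snapshot T2: P5%1,P11%5
- Snapshot T3: P5%1,P11,P16%4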
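The effect of the secondary order on End values can be checked with a short Python sketch. Several things here are assumptions: the (Key, Start, End) rows are inferred from the update history and the snapshot lists on the slides, T0..T3 are encoded as 0..3, UC is modelled as infinity, and all function names are hypothetical. The sketch also sorts the complete history after the fact, whereas the slides state that the sorting is done in main memory during update operations.

```python
import math

UC = math.inf  # "until changed", modelled as infinity (assumption)

# (Key, Start, End) in plain append order; a TID is the 1-based position.
VERSIONS = [
    ("A", 0, 1), ("B", 0, UC), ("C", 0, 1), ("D", 0, 2), ("E", 0, UC), ("F", 0, 1),
    ("A", 1, 2), ("C", 1, 2), ("G", 1, 2), ("H", 1, 2), ("I", 1, UC),
    ("C", 2, 3), ("G", 2, 3), ("H", 2, UC), ("J", 2, 3), ("K", 2, 3),
    ("C", 3, UC), ("J", 3, UC), ("L", 3, UC), ("M", 3, UC),
]


def optimized_order(versions):
    """Keep the primary order on Start and superimpose a secondary order on End."""
    return sorted(versions, key=lambda v: (v[1], v[2]))  # stable sort


def snapshot_tids(versions, t):
    """TIDs (1-based positions) of the versions with Start <= t < End."""
    return [tid for tid, (_key, start, end) in enumerate(versions, start=1)
            if start <= t < end]


def compress(tids):
    """Run-length encode consecutive TIDs as [first, extra] pairs."""
    runs = []
    for tid in tids:
        if runs and tid == runs[-1][0] + runs[-1][1] + 1:
            runs[-1][1] += 1
        else:
            runs.append([tid, 0])
    return runs


def render(runs):
    """Format runs in the slide notation P<first>%<extra>."""
    return ",".join(f"P{first}%{extra}" if extra else f"P{first}"
                    for first, extra in runs)


def total_ranges(versions, times=range(4)):
    """Total number of list entries needed over all snapshots."""
    return sum(len(compress(snapshot_tids(versions, t))) for t in times)


if __name__ == "__main__":
    optimized = optimized_order(VERSIONS)
    for t in range(4):
        print(f"Snapshot T{t}:", render(compress(snapshot_tids(optimized, t))))
    print(total_ranges(VERSIONS), "->", total_ranges(optimized))
```

On this sample, the four compressed lists shrink from 12 entries in plain append order to 7 entries in the optimized order.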

The RABTree (1)
- The structure is very similar to a traditional B+-Tree
- The compression technique is applied to TID-lists in the leaves
- A tree structure is built on time (Start attribute) above the leaves for fast access to a given snapshot (road-map to the desired TID-list)
- The resulting temporal index is also very similar to the Time Index [Elmasri, Wuu & Kim, 1990], from which it basically differs by the compression technique used for TID-lists

The RABTree (2)
- Owing to the semantics of transaction time, the resulting index is append-only: new entries are inserted only in the rightmost nodes of each level. This leads to the name: Right-Append B+-Tree.
- Nice properties of the RABTree are a high memory occupancy (near 100% versus 69% of the B+-Tree) and a low height
- The quantum leap with respect to the Time Index is the new compression technique for TID lists:
  - The Time Index stores a full (uncompressed) TID list for the first entry of each leaf and deltas (TIDs of added and deleted tuples) for the other entries
  - This leads to a worst case of $O(n^2/b)$ memory space for leaves, where n is the number of tuples in the indexed relation
  - This makes the Time Index a non-lean structure (and even impracticable in several cases, due to the excess of memory required)
  - The space required by the RABTree is $O(\alpha n/b)$, usually with $\alpha < 1$, equal to a few percent of the indexed relation; however, a worst case could theoretically exist

Tending to The RAB-Tree
TID ranges over the “optimized” temporal table:
- Snapshot T0: P1-P6
- Snapshot T1: P4-P11
- Snapshot T2: P5-P16
- Snapshot T3: P5-P20

Snapshot Access with RAB/RAB-Tree
To access the snapshot S(T)…
- …with the RABTree:
  - Retrieve in the index leaves the entries $T_i:L_i$, $T_{i+1}:L_{i+1}$ such that $T_i \le T < T_{i+1}$
  - Access all tuples pointed to by the TIDs in $L_i$
- …with the RAB-Tree:
  - Retrieve in the index leaves the entries $T_i:P_i$, $T_{i+1}:P_{i+1}$ such that $T_i \le T < T_{i+1}$
  - Sequentially access all tuples from the one pointed to by $P_i$ to the last one with Start $< T_{i+1}$ (discard a tuple if End $\le T$)
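The two access procedures above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: flat sorted lists and `bisect` stand in for the tree levels above the leaves, the table rows are inferred from the update history on the slides (in the "optimized" order), T0..T3 are encoded as 0..3, UC is modelled as infinity, and all names are hypothetical.

```python
import bisect
import math

UC = math.inf  # "until changed", modelled as infinity (assumption)

# "Optimized" table as (Key, Start, End); the TID is the 1-based position.
HEAP = [
    ("A", 0, 1), ("C", 0, 1), ("F", 0, 1), ("D", 0, 2), ("B", 0, UC), ("E", 0, UC),
    ("A", 1, 2), ("C", 1, 2), ("G", 1, 2), ("H", 1, 2), ("I", 1, UC),
    ("C", 2, 3), ("G", 2, 3), ("J", 2, 3), ("K", 2, 3), ("H", 2, UC),
    ("C", 3, UC), ("J", 3, UC), ("L", 3, UC), ("M", 3, UC),
]

TIMES = [0, 1, 2, 3]                        # leaf keys T_i
LISTS = [[(1, 5)], [(4, 7)],                # RABTree: compressed lists L_i
         [(5, 1), (11, 5)], [(5, 1), (11, 0), (16, 4)]]
FIRST_TID = [1, 4, 5, 5]                    # RAB-Tree variant: pointers P_i


def find_entry(t):
    """Index i such that T_i <= t < T_{i+1}, or -1 if t precedes T_0."""
    return bisect.bisect_right(TIMES, t) - 1


def rabtree_snapshot(t):
    """Expand the compressed TID list L_i of the matching leaf entry."""
    i = find_entry(t)
    if i < 0:
        return []
    return [first + k for first, extra in LISTS[i] for k in range(extra + 1)]


def rab_variant_snapshot(t):
    """Scan sequentially from P_i, stop at Start >= T_{i+1}, drop End <= t."""
    i = find_entry(t)
    if i < 0:
        return []
    next_time = TIMES[i + 1] if i + 1 < len(TIMES) else math.inf
    result = []
    for tid in range(FIRST_TID[i], len(HEAP) + 1):
        _key, start, end = HEAP[tid - 1]
        if start >= next_time:
            break                            # past the last tuple of interest
        if end > t:
            result.append(tid)               # tuple is still valid at time t
    return result


if __name__ == "__main__":
    for t in TIMES:
        print(t, rabtree_snapshot(t), rab_variant_snapshot(t))
```

Both functions return the same TIDs on this sample. The RAB-Tree variant stores a single pointer per entry but reads some tuples that it then discards, which matches the slides' description of it as leaner but less efficient.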

Experimental Settings
- Stationary growth
  - Only updates (and/or insertions balanced by deletions) are applied, so that the snapshot size stays constant; the file grows by the addition of snapshots
  - 50% of tuples are updated by each transaction
  - Starting from an initial snapshot with 1,000 tuples, we end up with 2,470,000 tuples after 5,000 transactions
- Linear growth
  - Each transaction executes a constant number of insertions/deletions/updates (on average), with a positive balance between insertions and deletions, so that the snapshot size grows linearly in time
  - The numbers of insertions/deletions/updates are uniformly distributed in [0..100]/[0..50]/[0..200]
  - Starting from an initial snapshot with 1,000 tuples, we end up with 1,510,000 tuples after 10,000 transactions
- Exponential growth
  - Each transaction applies insertions/deletions/updates at a constant rate (on average), with a positive balance between insertions and deletions, so that the snapshot size grows exponentially in time
  - The rates of randomly inserted/deleted/updated tuples are 5%/2%/30%
  - Starting from an initial snapshot with 1,000 tuples, we end up with 32,152,000 tuples after 500 transactions

Conclusions
- We presented the RABTree index (and its variant RAB-Tree), which is an I/O-suboptimal secondary structure for snapshot access in transaction-time databases
- Without data duplication, with the proposed “optimized” storage, the RABTree index is I/O-optimal
- We showed our solutions to be efficient in different experimental configurations and to exhibit good scale-up behaviour
- The compression technique adopted for TID-lists in the RABTree leaves has been shown to provide excellent results, making it a lean indexing solution, suitable for settings where memory occupation is an issue

Future work
- We will study in deeper detail the performance of the RAB/RAB-Tree indexes and, in particular, of the proposed RABTree TID-list compression technique (e.g., to characterize the worst case)
- We will study the definition of “quasi transaction-time” databases, where retro- and pro-active transactions can be allowed, to a limited extent, although the adopted time dimension has the semantics of transaction time (the RAB/RAB-Tree indexes can still be used in this kind of database)