Temporal Indexing MVBT.

Slides:



Advertisements
Similar presentations
Hash-Based Indexes Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY.
Advertisements

Hash-based Indexes CS 186, Spring 2006 Lecture 7 R &G Chapter 11 HASH, x. There is no definition for this word -- nobody knows what hash is. Ambrose Bierce,
1 Hash-Based Indexes Module 4, Lecture 3. 2 Introduction As for any index, 3 alternatives for data entries k* : – Data record with key value k – –Choice.
Hash-Based Indexes The slides for this text are organized into chapters. This lecture covers Chapter 10. Chapter 1: Introduction to Database Systems Chapter.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Hash-Based Indexes Chapter 11.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Hash-Based Indexes Chapter 11.
Chapter 11 (3 rd Edition) Hash-Based Indexes Xuemin COMP9315: Database Systems Implementation.
Tutorial 8 CSI 2132 Database I. Exercise 1 Both disks and main memory support direct access to any desired location (page). On average, main memory accesses.
Multiversion Access Methods - Temporal Indexing. Basics A data structure is called : Ephemeral: updates create a new version and the old version cannot.
Temporal Indexing Snapshot Index. Transaction Time Environment Assume that when an event occurs in the real world it is inserted in the DB A timestamp.
1 Hash-Based Indexes Yanlei Diao UMass Amherst Feb 22, 2006 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
B+-tree and Hashing.
CPSC 231 B-Trees (D.H.)1 LEARNING OBJECTIVES Problems with simple indexing. Multilevel indexing: B-Tree. –B-Tree creation: insertion and deletion of nodes.
I/O-Algorithms Lars Arge University of Aarhus February 13, 2005.
Temporal Indexing MVBT. Temporal Indexing Transaction time databases : update the last version, query all versions Queries: “Find all employees that worked.
Temporal Indexing MVBT. Temporal Indexing Transaction time databases : update the last version, query all versions Queries: “Find all employees that worked.
Temporal Databases. Outline Spatial Databases Indexing, Query processing Temporal Databases Spatio-temporal ….
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part B Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
Spatio-Temporal Databases. Introduction Spatiotemporal Databases: manage spatial data whose geometry changes over time Geometry: position and/or extent.
E.G.M. PetrakisHashing1 Hashing on the Disk  Keys are stored in “disk pages” (“buckets”)  several records fit within one page  Retrieval:  find address.
1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.
Homework #3 Due Thursday, April 17 Problems: –Chapter 11: 11.6, –Chapter 12: 12.1, 12.2, 12.3, 12.4, 12.5, 12.7.
CS4432: Database Systems II
Indexing. Goals: Store large files Support multiple search keys Support efficient insert, delete, and range queries.
1 B-Trees & (a,b)-Trees CS 6310: Advanced Data Structures Western Michigan University Presented by: Lawrence Kalisz.
Hashing and Hash-Based Index. Selection Queries Yes! Hashing  static hashing  dynamic hashing B+-tree is perfect, but.... to answer a selection query.
1 The MV3R-Tree: A Spatio- Temporal Access Method for Timestamp and Interval Queries Yufei Tao and Dimitris Papadias Hong Kong University of Science and.
Introduction to Database, Fall 2004/Melikyan1 Hash-Based Indexes Chapter 10.
1.1 CS220 Database Systems Indexing: Hashing Slides courtesy G. Kollios Boston University via UC Berkeley.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Indexed Sequential Access Method.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Hash-Based Indexes Chapter 10.
Marwan Al-Namari Hassan Al-Mathami. Indexing What is Indexing? Indexing is a mechanisms. Why we need to use Indexing? We used indexing to speed up access.
1 CPS216: Advanced Database Systems Notes 05: Operators for Data Access (contd.) Shivnath Babu.
 B-tree is a specialized multiway tree designed especially for use on disk  B-Tree consists of a root node, branch nodes and leaf nodes containing the.
Indexing Database Management Systems. Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files File Organization 2.
An Asymptotically Optimal Multiversion B-Tree P. Widmayer B. Becker S. Gschwind T. Ohler B. Seeger Presented by Stan Rost.
Temporal Indexing MVBT. Temporal Indexing Transaction time databases : update the last version, query all versions Queries: “Find all employees that worked.
Database Applications (15-415) DBMS Internals- Part III Lecture 13, March 06, 2016 Mohammad Hammoud.
Presenters : Virag Kothari,Vandana Ayyalasomayajula Date: 04/21/2010.
CS422 Principles of Database Systems Indexes
Data Indexing Herbert A. Evans.
Spatio-Temporal Databases
Multiway Search Trees Data may not fit into main memory
Tree-Structured Indexes: Introduction
COP Introduction to Database Structures
Tree Indices Chapter 11.
B-Trees 7/5/2018 4:26 AM Presentation for use with the textbook Data Structures and Algorithms in Java, 6th edition, by M. T. Goodrich, R. Tamassia, and.
Temporal Indexing MVBT.
Hash-Based Indexes Chapter 11
Database Management Systems (CS 564)
Chapter 11: Indexing and Hashing
Database Applications (15-415) DBMS Internals- Part III Lecture 15, March 11, 2018 Mohammad Hammoud.
External Memory Hashing
Introduction to Database Systems
Multi-Way Search Trees
Spatio-Temporal Databases
B+-Trees and Static Hashing
External Memory Hashing
CS222: Principles of Data Management Notes #8 Static Hashing, Extendible Hashing, Linear Hashing Instructor: Chen Li.
Hash-Based Indexes Chapter 10
Indexing and Hashing Basic Concepts Ordered Indices
CS222P: Principles of Data Management Notes #8 Static Hashing, Extendible Hashing, Linear Hashing Instructor: Chen Li.
Hashing.
Temporal Databases.
Hash-Based Indexes Chapter 11
Database Systems (資料庫系統)
CPS216: Advanced Database Systems
Chapter 11 Instructor: Xin Zhang
CS222/CS122C: Principles of Data Management UCI, Fall 2018 Notes #07 Static Hashing, Extendible Hashing, Linear Hashing Instructor: Chen Li.
Presentation transcript:

Temporal Indexing MVBT

Temporal Indexing Transaction time databases : update the last version, query all versions Queries: “Find all employees that worked in the company on 09/12/98” “Find the employees with salary between 35K and 50K on 04/21/99” Multiversion B-tree: answers efficiently the queries above

Basics A data structure is called : Ephemeral: updates create a new version and the old version cannot be queried Persistent: updates can be applied at any version and any version can be queried Partially Persistent: updates applied to the last version and any version can be queried Transaction time fits with partially persistence

MVBT – The idea Store all versions of the state of a B+-tree which evolved over time, i.e. multiple “snapshots” of the tree Inserts, updates and deletes are applied to the present version of the tree and increase the version number of the whole tree Queries know which version of the tree they require the result(s) from

General method Transform a single version external access structure (with high utilization of disk blocks) at the cost of a constant factor in time and space requirements compared to the original, single version structure Such increase is asymptotically optimal = worst-case bounds cannot decrease by adding multiversion capability to existing structure

Proposition Specifics Extend B+-tree to have multiversion capability Support operations: Insert(key, info) Insert into current version, increase tree version Delete(key) Delete from current version, increase tree version Exact match query(key, version) Range query(lowkey, highkey, version)

Overview The multiversion B-tree is a directed acyclic graph of B-tree nodes that results from incremental changes to original B-tree It has a number of B+-tree root nodes which partition the versions from the first to the present one so that each B-tree root stands for an interval of versions A query can be answered by entering a multiversion B-tree at the corresponding root Time Logical view

Blocks (pages) Contain b data items Live if it has not been copied, dead otherwise Weak version condition: for every version i and each block A except the roots of versions, we require that the number of “alive” entries in A is either 0 or at least d, where b=k·d

Data Items Leaf node of the tree Inner node of the tree <key, in_version, del_version, info> Inner node of the tree <router, in_version, del_version, info> Said to be of version i if its lifespan contains i In live block, deletion version * denotes that this entry has not been deleted at present, in a dead block it means that the entry has not yet been deleted before the block died

Updates Each update creates new version If no structural changes: Insert: lifespan is [i, *) Delete: changes del_version from * to i A structural change is required if: Block overflow: can only fit b entries in a block Weak version underflow: if deletion in a block with exactly d current version entries

Structural Modification Copy the block and remove all but the present version entries from the copy If block consists of primarily present version entries, the copy will produce an almost full block, resulting in a split again after a few subsequent insertions To avoid this, request that at least εd+1 insert or delete operations are necessary for the next block overflow or version underflow in that block (ε will be defined later)

Strong Version Constraints Strong version condition: the number of present version entries after a version split must be in range from (1+ε)d to (k-ε)d. Strong version underflow: result of version split leading to less than (1+ε)d entries Attempt to merge with a sibling block containing only its present version entries, if necessary followed by a version independent split by key values Strong version overflow: if a version split leads to more than (k-ε)d entries in the block Also perform a split by key values

Original Tree (Version 1) Simple Example b=6 d=2 e=0.5 Original Tree (Version 1) Insert(40)

Simple Example (Version 2) Delete(65)

Simple Example (Version 3)

Insert(5) creates a block overflow Version Split Example Insert(5) creates a block overflow All currently live entries copied to the new live block A*, old block A marked dead Also, the root block is updated to show that entity 10 was alive in the dead block A until version 8 (Version 7)

Version Split Example Resulting “tree” (Version 8)

Weak V. Underflow Example Suppose b=6, d=2, ε=0.5 (d: the minimum # of current version entries in the block) Delete(40) results in block A only having 1 current entry 1<d: weak underflow, split A (Version 7)

Strong V. Underflow Example The version split of A has led to less than (1+ ε)d=1.5*2=3 entries in the new node strong version underflow Seek a sibling of A* (in our case, B) Version split it (to create B*) Merge B* and A* to produce block A*B* (Processing: Version 78)

Strong V. Underflow Condition Violation But now, the node A*B* violates the strong version overflow condition and must be split by key into nodes C and D (Processing: Version 78)

Resulting Tree Example Query: (25, 5) (Version 8)

Roots Can Split, Too Overflow Split

Strong version overflow with key split and allocation of the new block Roots Can Split, Too Strong version overflow with key split and allocation of the new block

Weak Version Underflow of the Root Node R3 has shrunkweak underflow Block copies of R3 and R4 are created and merged into R5 This causes weak version underflow of R2, so R5 becomes new root block

Algorithms Insertion: Find the leaf node for the new key e, then call blockInsert (say A) blockInsert: enter e in A If block overflow of A then Version split, block insert If strong version underflow then Merge Else if strong version overflow  key split

Algorithms Delete: blockDelete A Check weak version underflow on A If true, then merge with sibling Note that Deletion is easier than the insertion in the MVBT. What about the B+-tree?

Constraints on MVBT Parameters What are the restrictions on choices of k and ε? (k-ε)d+1 ≥ (1/α)(1+ε)d α is the fraction of the entries in a block guaranteed to be in a new node after a key split, 0.5 for B-trees Before key split, A contains at least (k-ε)d+1 entries After a key split, both blocks must contain at least (1+ε)d entries. 2d-1 ≥ (1+ε)d Before merge is performed, together there are at least 2d-1 present version entries in the blocks to be merged

Efficiency Analysis The big result is that the MVBT is asymptotically optimal to the B-Tree in the worst case in time and space for all considered operations Search time is in Space is O(n/b) and update

Additional Issues Can store the roots of version interval trees in a B-tree of their own; authors demonstrated that for most practical cases the depth of the root B-tree will never exceed two anyway.

Other Approach Overlapping B+-trees Idea: for each update store only the path from the root node to the leaf node, use the rest of the tree for the new version (since it is the same) Better approach, store only the new path Easier to implement but space is higher: O(n/b logb n/b)

Temporal Hashing Hashing is used to answer exact match queries Exact-match with time: “find if employee with id=23 was working in the company on 09/08/98” One approach: Use MVBT (4-5 I/Os) Temporal Hashing: can achieve 2 I/Os

Temporal Hashing Treat objects that fall into a given bucket Bi during the whole evolution as an evolving set Bi(t) To find if key k is in S(t), reconstruct bucket Bi and search for key k (Bi is the bucket that the key k should have been placed at t) Simply observe how an ephemeral hashing scheme would map the keys of S(t) in buckets and keep the history of each bucket

Temporal Hashing For each bucket use a Snapshot Index Using the SI we can reconstruct the bucket Note that when we split a bucket we treat the keys that moved to the new bucket as deleted for the old bucket, newly inserted for the new Another approach is to keep an evolving list

Bi-temporal Data Indexing Bi-temporal Index: R-tree for valid time Partial persistence for transaction time Partially persistent R-tree [Kumar at al. 97] What about the case that valid time has an open end_time (now)? A special R-tree to handle that [Saltenis et al. ‘99]