ArchIS: An Efficient Transaction-Time Temporal Database System Built on Relational Databases and XML Fusheng Wang University of California, Los Angeles.



Motivation: Temporal Applications
Most database applications are temporal in nature:
- Financial applications
- Record-keeping applications
- Scheduling applications
- Scientific applications

Temporal Databases: the Reality
- Over 40 temporal data models and query languages have been proposed in the past
- A long struggle to get around the limitations of RDBMS
- No DBMS vendors have moved aggressively to extend SQL with temporal support

What’s Needed?
A temporal database system that provides:
- Expressive temporal representations and data models with minimal or no extension
- Powerful languages for temporal queries with minimal or no extension
- Indexing, clustering and query optimization techniques for efficient query support
- Architectures that bring these together

Outline
- Motivation
- Viewing Relation History in XML
- Temporal Queries with XQuery
- The ArchIS System
- Performance Study
- Database Compression
- Conclusion

Background: Publishing Relational Databases as XML
- Publishing relational DBs as XML
  - as actual XML documents: SQL/XML
  - as XML views: SilkRoute, XPeranto

Viewing Relation History in XML
- Our proposal: view the history of relational DBs as XML documents:
  - Such history can be naturally represented in XML, without any extension to the data model
  - Temporal queries can be expressed in XQuery as is, without any extension to the language
  - Amenable to efficient implementation

Temporal Grouping in XML
- Temporal data models can be classified as:
  - Temporally ungrouped
  - Temporally grouped
- Temporally grouped data models have more expressive power and are more natural for users
- It is difficult to fit temporally grouped models into an RDBMS
- Temporally grouped data models can be represented well in XML

Example: Transaction-Time History of Tables
- Timestamped tuple snapshots (temporally ungrouped)
  [Table: columns name, empno, salary, title, deptno, DOB, start, end; four snapshot rows for Bob with titles Engineer, Engineer, Sr Engineer, Tech Leader]
- Temporally grouped history of employees
  [Table: columns name, empno, salary, title, deptno, DOB; a single row for Bob in which each attribute carries its own timestamped history, e.g. title: Engineer, Sr Engineer, Tech Leader]
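The step from the ungrouped snapshot table to the grouped history can be sketched in a few lines of Python. This is only an illustration: the function name group_history, the closed day-granularity periods, and all sample values in the usage note are assumptions, since the slide's own values are not available.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

def group_history(snapshots, key="empno"):
    """Group timestamped tuple snapshots into per-attribute histories.

    snapshots: dicts holding the key, the attribute values, and
    'start'/'end' dates (closed periods, day granularity).
    Returns {key_value: {attribute: [(value, start, end), ...]}}.
    Adjacent or overlapping periods with the same value are coalesced.
    """
    history = {}
    for row in sorted(snapshots, key=lambda r: (r[key], r["start"])):
        attrs = history.setdefault(row[key], {})
        for attr, value in row.items():
            if attr in (key, "start", "end"):
                continue
            periods = attrs.setdefault(attr, [])
            if periods:
                last_value, last_start, last_end = periods[-1]
                # Same value and no gap: extend the previous period.
                if last_value == value and last_end + ONE_DAY >= row["start"]:
                    periods[-1] = (value, last_start, max(last_end, row["end"]))
                    continue
            periods.append((value, row["start"], row["end"]))
    return history
```

For four hypothetical snapshots of one employee (two as Engineer, then Sr Engineer, then Tech Leader), the title history comes back as three periods, and a salary that changed only once comes back as two.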

XML Representation of DB History
[XML listing: the history of employee Bob as an H-document, with a title history of Engineer, Sr Engineer, Tech Leader and a deptno history starting with d01]

Advantages of XML Representations
- The attribute value history is grouped, and can be queried directly
- The H-document has a well-defined schema generated from the current table
- The interval constraints are maintained in the updates


Temporal Queries with XQuery
- XQuery: the coming standard query language for XML
- With XQuery, we can specify temporal queries without any extension:
  - Temporal projection, snapshot queries, temporal joins, interval queries
  - Complex queries: A SINCE B, continuous periods, period containment

Temporal Queries with XQuery
- Temporal projection: retrieve the salary history of "Bob":
    element salary_history {
      for $s in doc("employees.xml")/employees/employee[name="Bob"]/salary
      return $s
    }
- Snapshot queries: retrieve the departments on [date]:
    for $d in doc("depts.xml")/depts/dept[tstart(.) <= "[date]" and tend(.) >= "[date]"]
    let $n := $d/name[tstart(.) <= "[date]" and tend(.) >= "[date]"]
    let $m := $d/manager[tstart(.) <= "[date]" and tend(.) >= "[date]"]
    return (element dept { $n, $m })

Temporal Functions
- Shield the user from the low-level details used in representing time, e.g., "now"
- Eliminate the need for the user to write complex functions, e.g., coalescing
- Predefined functions:
  - Restructuring: coalesce($l)
  - Period comparison: toverlaps, tprecedes, tcontains, tequals, tmeets
  - Duration and date/time: tstart($e), tend($e), timespan($e)
  - telement(Ts, Te): constructs an empty element timestamped as tstart=Ts, tend=Te
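The period comparison functions are named on the slide but not defined there. A minimal Python sketch of one plausible reading follows, assuming closed periods (start, end) at day granularity; the exact semantics used in ArchIS may differ, in particular for tmeets.

```python
from datetime import date, timedelta

# A period is a (start, end) pair of dates; both ends are inclusive.

def toverlaps(a, b):
    """True if the two periods share at least one day."""
    return a[0] <= b[1] and b[0] <= a[1]

def tprecedes(a, b):
    """True if period a ends before period b starts."""
    return a[1] < b[0]

def tcontains(a, b):
    """True if period a covers every day of period b."""
    return a[0] <= b[0] and b[1] <= a[1]

def tequals(a, b):
    """True if both periods have the same start and end."""
    return a[0] == b[0] and a[1] == b[1]

def tmeets(a, b):
    """True if period b starts on the day right after period a ends."""
    return a[1] + timedelta(days=1) == b[0]
```

With these definitions, two periods that meet do not overlap, which is the property coalescing relies on when it merges adjacent periods with equal values.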

Support for ‘now’
- ‘now’: no change until now
- Internally, an "end of time" value is used to denote ‘now’
- Intervals are only accessed through built-in functions: tstart() returns the start of an interval; tend() returns its end if that differs from the end-of-time value, and CURRENT_DATE otherwise
- In the output, a tend value can be:
  - the end-of-time value itself
  - CURRENT_DATE, by using rtend($e), which recursively replaces every occurrence of the end-of-time value with the current date
  - "now", by using externalnow($e), which recursively replaces every occurrence of the end-of-time value with the string "now"
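A small Python sketch of this handling of ‘now’, operating on timestamped XML elements. The function names follow the slide, but the implementation is an assumption, and so is the sentinel "9999-12-31": the slide's own end-of-time value is not shown. Note that rtend and externalnow modify the element tree in place.

```python
import xml.etree.ElementTree as ET
from datetime import date

END_OF_TIME = "9999-12-31"  # assumed sentinel that stands for 'now'

def tstart(elem):
    """Start of the element's period."""
    return elem.get("tstart")

def tend(elem, today=None):
    """End of the period, or the current date if the period is still open."""
    value = elem.get("tend")
    if value == END_OF_TIME:
        return (today or date.today()).isoformat()
    return value

def _replace_end_of_time(elem, replacement):
    # elem.iter() visits the element itself and all of its descendants.
    for node in elem.iter():
        if node.get("tend") == END_OF_TIME:
            node.set("tend", replacement)
    return elem

def rtend(elem, today=None):
    """Recursively replace the end-of-time value with the current date."""
    return _replace_end_of_time(elem, (today or date.today()).isoformat())

def externalnow(elem):
    """Recursively replace the end-of-time value with the string 'now'."""
    return _replace_end_of_time(elem, "now")
```

The optional today argument is there only to make results reproducible; by default the functions use the system date.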


The ArchIS System
- Two approaches are possible for storing and querying H-documents (H-views):
  - Native XML database approach: store H-documents directly in an XML DB
  - XML-enabled RDBMS. Design issues include:
    - mapping (shredding) the XML views representing the H-documents into tables (H-tables)
    - translation of queries from the XML views to the H-tables
    - indexing, clustering and query mapping techniques
- ArchIS: Archival Information System

The ArchIS System: Architecture
[Figure: ArchIS architecture. Components shown: current database, active rules / update logs, H-tables (relational data), SQL queries, H-views (H-documents, temporal XML data), temporal XML queries, ArchIS]

H-tables
- Assumptions
  - Each entity or relation has a unique key (possibly composite) that identifies it and does not change over its history, e.g., employee: empno
- H-tables:
  - attribute history table: stores the history of each attribute
  - key table: built for the key
  - global relation table: records the history of relations
- e.g., current database:
  - employee(empno, name, sex, DOB, deptno, salary, title)

H-tables (cont’d)

current table   H-table kind              H-table
-------------   -----------------------   ----------------------------------------
employee        global relation table     relations(relationname, tstart, tend)
empno           key table                 employee_id(id, tstart, tend)
name            attribute history table   employee_name(id, name, tstart, tend)
…               …                         …
salary          attribute history table   employee_salary(id, salary, tstart, tend)
title           attribute history table   employee_title(id, title, tstart, tend)
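To make the H-table layout concrete, here is a minimal sketch that creates a few of these tables in an in-memory SQLite database and runs a temporal projection and a snapshot query against them. SQLite stands in for the RDBMS only for illustration; the sample rows, the ISO date strings, the "9999-12-31" sentinel for ‘now’, and the helper names salary_history and salary_on are all assumptions.

```python
import sqlite3

END_OF_TIME = "9999-12-31"  # assumed sentinel; ISO date strings sort chronologically

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE relations       (relationname TEXT, tstart TEXT, tend TEXT);
    CREATE TABLE employee_id     (id INTEGER, tstart TEXT, tend TEXT);
    CREATE TABLE employee_name   (id INTEGER, name TEXT, tstart TEXT, tend TEXT);
    CREATE TABLE employee_salary (id INTEGER, salary INTEGER, tstart TEXT, tend TEXT);
""")
# Hypothetical history: one employee with one salary change.
conn.execute("INSERT INTO relations VALUES ('employee', '1995-01-01', ?)", (END_OF_TIME,))
conn.execute("INSERT INTO employee_id VALUES (1001, '1995-01-01', ?)", (END_OF_TIME,))
conn.execute("INSERT INTO employee_name VALUES (1001, 'Bob', '1995-01-01', ?)", (END_OF_TIME,))
conn.executemany(
    "INSERT INTO employee_salary VALUES (?, ?, ?, ?)",
    [(1001, 60000, "1995-01-01", "1995-05-31"),
     (1001, 70000, "1995-06-01", END_OF_TIME)],
)

def salary_history(name):
    """Temporal projection: all salary periods of the named employee."""
    return conn.execute(
        """SELECT S.salary, S.tstart, S.tend
           FROM employee_salary AS S JOIN employee_name AS N ON N.id = S.id
           WHERE N.name = ?
           ORDER BY S.tstart""",
        (name,),
    ).fetchall()

def salary_on(name, day):
    """Snapshot: the salary whose period contains the given day, or None."""
    row = conn.execute(
        """SELECT S.salary
           FROM employee_salary AS S JOIN employee_name AS N ON N.id = S.id
           WHERE N.name = ? AND S.tstart <= ? AND S.tend >= ?""",
        (name, day, day),
    ).fetchone()
    return row[0] if row else None
```

Both queries join the attribute history tables on id, which is the same join condition the query mapping generates between tuple variables.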

H-tables (cont’d)
- Sample contents of employee_salary:
  [Table: columns ID, SALARY, TSTART, TEND; sample rows not legible]

Updating Table Histories
- Changes in the current database can be tracked with either update logs or triggers
  - DB2: triggers
  - ArchIS: update logs
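Whichever mechanism captures the change, the effect on an attribute history table is the same: the open period of the old value is closed and a new open period is appended. A minimal sketch in Python, assuming closed day-granularity periods, date.max as the end-of-time sentinel, and a hypothetical helper archive_update; the slide does not specify how ArchIS sets the closing timestamp.

```python
from datetime import date, timedelta

END_OF_TIME = date.max  # assumed sentinel for 'now'

def archive_update(history, emp_id, new_salary, when):
    """Record a salary change in an attribute history table.

    history: list of dicts with keys id, salary, tstart, tend.
    The currently open row for emp_id (if any) is closed on the day
    before `when`, and a new open row starting at `when` is appended.
    """
    for row in history:
        if row["id"] == emp_id and row["tend"] == END_OF_TIME:
            row["tend"] = when - timedelta(days=1)
    history.append(
        {"id": emp_id, "salary": new_salary, "tstart": when, "tend": END_OF_TIME}
    )
    return history
```

Calling it on an empty history simply inserts the first version, so the same routine covers both inserts and updates in this sketch.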

Query Mapping
- General-purpose query mapping: XPeranto
- In ArchIS, we have a well-defined mapping between H-documents (or H-views) and H-tables
- We map temporal XQuery queries into SQL, utilizing SQL/XML
  - SQL/XML is a new standard to map between RDBMS and XML
  - Both tag binding and structure construction are pushed inside the relational engine, which makes the approach very efficient

SQL/XML Publishing Functions
- XMLElement and XMLAttributes:
    select XMLElement (Name "dept",
             XMLAttributes (tstart as "tstart", tend as "tend"), deptname)
    from dept
    where deptname = 'Sales'
  Result: a dept element with tstart and tend attributes and content Sales
- XMLAgg:
    select XMLElement (Name "new_employees",
             XMLAttributes ('02/04/2003' as "Since"),
             XMLAgg (XMLElement (Name "employee", e.name)))
    from employee_name as e
    where e.tstart >= '02/04/2003'
  Result: a new_employees element with attribute Since="02/04/2003" containing the employee elements Bob and Jack

XQuery Mapping to SQL with SQL/XML
- Temporal projection: retrieve the salary history of "Bob":
    element salary_history {
      for $s in doc("employees.xml")/employees/employee[name="Bob"]/salary
      return $s
    }
- Translated SQL/XML query:
    select XMLElement (Name "salaryhistory",
             XMLAgg (XMLElement (Name "salary",
               XMLAttributes (S.tstart as "tstart", S.tend as "tend"),
               S.salary)))
    from employee_salary as S, employee_name as N
    where N.id = S.id and N.name = 'Bob'
    group by N.id

XQuery Mapping to SQL with SQL/XML: Steps
- Identification of variable range
  - Map variables in the FOR/LET clauses into the underlying H-tables
- Generation of join conditions
  - There is a join condition between any pair of distinct tuple variables: join them by ids
- Translation of built-in functions
  - Map built-in temporal functions in XQuery into functions in ArchIS
- Output generation
  - Use XMLElement and XMLAgg constructs

Temporal Clustering and Indexing
- Tuples in H-tables are stored in the order of updates, thus neither temporally clustered nor clustered by objects
- Traditional indexes such as the B+ tree will not help on snapshot queries, and better temporal clustering is needed
- For every segment, usefulness: U = N_live / N_all
  - At the beginning, U = 100%, and it decreases with updates
  - The minimum tolerable usefulness: U_min

Live Segment-based Clustering Scheme
[Figure: Segment 1 (all tuples), Segment 2 (all and live tuples), Segment 3 (all tuples), with intervals segstart1 to segend1, segstart2 to segend2, segstart3 to segend3]
A tuple belongs to segment SEG if:
    tstart_tuple <= segend_SEG  and  tend_tuple >= segstart_SEG

Segment-based Clustering Scheme
- Initially all tuples for an attribute history table are archived in a live segment SEG_live with usefulness U = 100%. With updates, when U drops below U_min:
  1. A new segment is allocated;
  2. The interval of this segment is recorded in the table segment(segno, segstart, segend);
  3. All tuples in SEG_live are copied into the new segment S_i, sorted by id;
  4. All live tuples in SEG_live are copied into a new live segment SEG_live', and the old live segment is dropped.
- After that, the new segment SEG_live' becomes the new starting segment for updates
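The archiving steps above can be sketched as a small in-memory Python class. This is an illustration under stated assumptions, not the ArchIS implementation: the class name SegmentedHistory, the tuple layout (id, value, tstart, tend), day-granularity closed periods, and date.max as the end-of-time sentinel are all invented for the example.

```python
from datetime import date, timedelta

END_OF_TIME = date.max   # assumed sentinel for 'now'
ONE_DAY = timedelta(days=1)

class SegmentedHistory:
    """One attribute history table with usefulness-based segmentation."""

    def __init__(self, u_min, start):
        self.u_min = u_min
        self.live = []          # tuples (id, value, tstart, tend), in update order
        self.live_start = start
        self.segments = []      # archived: (segstart, segend, tuples sorted by id)

    def usefulness(self):
        """U = N_live / N_all for the live segment."""
        if not self.live:
            return 1.0
        n_live = sum(1 for t in self.live if t[3] == END_OF_TIME)
        return n_live / len(self.live)

    def update(self, key, value, when):
        # Close the current version of this key, then append the new one.
        for i, (k, v, ts, te) in enumerate(self.live):
            if k == key and te == END_OF_TIME:
                self.live[i] = (k, v, ts, when - ONE_DAY)
        self.live.append((key, value, when, END_OF_TIME))
        if self.usefulness() < self.u_min:
            self._archive(when)

    def _archive(self, when):
        # Steps 1-3: allocate a segment, record its interval, copy all
        # tuples of the live segment into it sorted by id.
        archived = sorted(self.live, key=lambda t: (t[0], t[2]))
        self.segments.append((self.live_start, when, archived))
        # Step 4: only live tuples are carried into the new live segment.
        self.live = [t for t in self.live if t[3] == END_OF_TIME]
        self.live_start = when + ONE_DAY

    def snapshot(self, day):
        """Values valid on `day`, read from the single covering segment."""
        source = self.live
        for segstart, segend, tuples in self.segments:
            if segstart <= day <= segend:
                source = tuples
                break
        return {k: v for k, v, ts, te in source if ts <= day <= te}
```

Because live tuples are copied forward, every archived segment holds all tuples valid at any point of its interval, so a snapshot query needs to read only one segment. The price is the redundancy that U_min controls.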

Segment-based Clustering Scheme (cont’d)
- Sample segments:
  [Tables: Segment1 (ending in 1991) and Segment2 (starting on 10/18 and ending in 1995), each with columns ID, SALARY, TSTART, TEND; sample rows not legible]

Advantages of Segment-based Clustering Scheme
- The current live segment always has a high usefulness, assuring efficient updates;
- Records are globally temporally clustered on segments;
- For snapshot queries, only one segment is used; for interval queries, only the segments involved are used;
- Flexibility to control the number of redundant tuples in segments with U_min

Storage Usage of Segment-based Clustering
[Figure: relative storage size with different U_min]
N_seg <= N_0 / (1 - U_min)

Query Performance on Temporal Data with Segment-based Clustering
[Figure: query performance with segment-based clustering]
Queries:
- Point: Q1
- Snapshot: Q2
- Interval: Q5
- History: Q3, Q4, Q6


Performance Study: Experimental Setup
- Systems: Tamino, DB2, and ArchIS
  - ArchIS uses BerkeleyDB as its storage manager and builds a SQL query engine on top of it
- Temporal data set: the history of 300,024 employees over 17 years
  - The simulation models real-world salary increases, changes of titles, and changes of departments
  - The size of the XML data is 334MB
  - The single large XML document is cut into a collection of 15,000 small XML documents of around 25KB each
- Machine: Pentium IV 2.4GHz PC with RedHat 8.0

Performance Study: Query Performance
[Figure: query performance. DB2 and ArchIS: with clustering; Tamino: without clustering]
- ArchIS vs. Tamino: snapshot query Q2 is 137 times faster on ArchIS than on Tamino; interval query Q5 is 91 times faster; history query Q6 is 25 times faster, Q4 4 times faster, and Q3 nearly 3 times faster.
- Tamino with clustering: snapshot query Q2 is 3.3 times faster than without clustering (still 41 times slower than on ArchIS); interval query Q5 is 2.9 times faster than without clustering (still 31 times slower than on ArchIS); history queries are much slower.

Storage Utilization


Database Compression
- The disparity between CPU/memory and disk speeds is becoming larger and larger
  - Cost to read one IDE disk page: 14ms
  - Cost to uncompress one page: 1.1ms (500MHz CPU); 0.26ms (2.4GHz CPU)
- Cost to retrieve one compressed page: 14ms + 0.26ms ≈ 14.3ms
- Cost to retrieve the same data uncompressed (3.6 pages): 14ms x 3.6 = 50.4ms
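The comparison on this slide is plain arithmetic, reproduced here as a short Python sketch so the numbers can be varied. The function names are hypothetical; the figures (14ms per page read, 0.26ms to uncompress a page on the 2.4GHz CPU, 3.6 uncompressed pages per compressed page) are the slide's.

```python
def compressed_cost_ms(read_ms, decompress_ms):
    """Read one compressed page from disk, then uncompress it."""
    return read_ms + decompress_ms

def uncompressed_cost_ms(read_ms, pages):
    """Read the same data stored as `pages` uncompressed pages."""
    return read_ms * pages

READ_MS = 14.0          # one IDE disk page
DECOMPRESS_MS = 0.26    # one page on the 2.4GHz CPU
PAGES = 3.6             # uncompressed pages per compressed page

with_compression = compressed_cost_ms(READ_MS, DECOMPRESS_MS)
without_compression = uncompressed_cost_ms(READ_MS, PAGES)
speedup = without_compression / with_compression
```

With the slide's figures, the compressed path costs about 14.3ms against 50.4ms uncompressed, roughly a 3.5x difference, because the decompression time is small next to a single page read.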

Page-based Compression: PageZIP
- Traditional data compression tools: compress a file as a whole
- PageZIP: page-based compression and uncompression at the granularity of a page
  - Based on the gzip library: zlib
- Benefit: saves space; point, snapshot or interval queries only retrieve a small fraction of the history, and can be efficient

PageZIP
[Figure: segments 1 to n, each stored as a sequence of compressed pages (page 1, page 2, page 3, …), each page labelled with its ID range]

Storage Utilization with Compression
- For each attribute history table, we compress it as a sequence of pages and store each page as a BLOB in an RDBMS:
    employee_salary(sid, salary, tstart, tend)
      => employee_salary_blob(pageno, startsid, endsid, pageblob)
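A rough Python sketch of this page-level scheme, using the standard zlib module. It is not PageZIP itself: the fixed number of rows per page, the JSON serialization, and the helper names build_pages and history_of are assumptions made for the example. The tuple layout (pageno, startsid, endsid, pageblob) follows the slide.

```python
import json
import zlib

def build_pages(rows, rows_per_page):
    """Compress an attribute history table page by page.

    rows: tuples (sid, salary, tstart, tend).
    Returns a list of (pageno, startsid, endsid, pageblob), where each
    blob is an independently compressed page of rows sorted by sid.
    """
    rows = sorted(rows, key=lambda r: (r[0], r[2]))
    pages = []
    for pageno, i in enumerate(range(0, len(rows), rows_per_page), start=1):
        chunk = rows[i:i + rows_per_page]
        blob = zlib.compress(json.dumps(chunk).encode("utf-8"))
        pages.append((pageno, chunk[0][0], chunk[-1][0], blob))
    return pages

def history_of(pages, sid):
    """Return all rows for one sid, uncompressing only the pages
    whose [startsid, endsid] range can contain it."""
    result = []
    for pageno, startsid, endsid, blob in pages:
        if startsid <= sid <= endsid:
            chunk = json.loads(zlib.decompress(blob).decode("utf-8"))
            result.extend(tuple(r) for r in chunk if r[0] == sid)
    return result
```

Because every page is compressed on its own, a lookup touches only the pages selected by the startsid/endsid range, instead of uncompressing the whole table as a file-level compressor would require.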

Query Performance with Compression

Update Performance
- For the RDBMS, only the current segment is used for updates. For Tamino, current data and historical data are clustered together
- Updating one employee’s salary:
  - DB2: 0.29 seconds; Tamino: 1.2 seconds
- Assume that every employee gets updated once a year: about 1/260 of all employees get updated every day on average
  - DB2: 1.52 seconds; Tamino: 15 seconds
- In the worst case for segment-based archiving: 39 seconds for copying segments and 36 seconds for compression, but only once

Summary
- We built a transaction-time temporal database on RDBMS and XML, with:
  - XML to support temporally grouped (virtual) representations of the database history
  - XQuery to express powerful temporal queries on such views
  - temporal clustering for managing the actual historical data in an RDBMS
  - SQL/XML for executing the queries on the XML views as equivalent queries on the relational DB
  - compression as an option for efficient storage
- ArchIS provides a unified solution for a wide spectrum of temporal application problems

Future Work
- Friendly temporal query interfaces based on temporally grouped models
- Other clustering and indexing techniques to be investigated
- Other efficient data compression techniques proposed for XML data to be investigated
- Apply the approach to valid-time DBs and bitemporal DBs
- Apply the approach to OODBMS and semi-structured data models