Historical XML Databases Fusheng Wang and Carlo Zaniolo University of California, Los Angeles.

Presentation transcript:

Historical XML Databases Fusheng Wang and Carlo Zaniolo University of California, Los Angeles

Overview
- State of the art
- Two scenarios of archiving history
- Publishing relational database history in XML
- Temporal queries with XQuery
- Historical database architecture
- Efficient query support for temporal queries
- Conclusion

State of the Art
- Publishing relational DBs as XML documents:
  - as actual documents, to be processed using the very rich XML tool set (XSLT, DOM)
  - as views, to be queried by languages such as XPath or XQuery. Queries against these views are then mapped into SQL queries on the DB
- DB vendors are very active in this area, e.g.:
  - SQLX and SQL functions for XML publishing
  - XTables (XPERANTO) as a middleware

Our Proposal: Publish the History of Relational DBs as XML Documents
- Publish the history of relational DBs as XML documents:
  - Natural ways to represent such history in XML
  - Historical queries can be expressed in XQuery as is: no extensions to the data model or query language are required for temporal queries
  - Approach amenable to efficient implementation: query and storage efficiency of alternative approaches
- Gain: temporal applications are very important and are not supported well by current databases

Two Basic Scenarios
- XML data warehouses archive the history:
  - changes can be detected from the current database update logs
  - or by computing the delta between the published XML document snapshots of the new version and the old version
  - Traditional version management (RCS, SCCS); more recent techniques (UBCC, RBVM) are used for XML and complex queries
- RDBMSs archive the history:
  - the XML history is a view, and historical queries are mapped back into relational ones (e.g., using XTables)

A Short History of Time in Databases
- Between 33 and 48 proposals counted:
  - A perennial struggle to get around the limitations of relational (flat) tables and a rigid query language (SQL)
- Clifford, Croker, Grandi, and Tuzhilin, in their "On Temporal Grouping" paper, show that temporally grouped models are more natural and powerful [Temp DB workshop, 1995]
- But it is hard to fit temporally grouped models and query languages into SQL: an infinite morass

Temporal Grouping in XML
- XML makes it possible to express and support temporal grouping
- The history of a relational DB can be viewed as an XML document, using such a representation
- Then, powerful temporal queries can be specified without requiring the introduction of new constructs in the language
- There are many ways to publish DBs using XML, and not all will do

History of Tables: Transaction-Time Relational Tables
- Timestamped tuple snapshots
  [Table: columns name, empno, salary, title, deptno, DOB, start, end; four snapshot rows for Bob with titles Engineer, Engineer, Sr Engineer, Tech Leader]
- Temporally grouped history of employees
  [Table: columns name, empno, salary, title, deptno, DOB; a single row for Bob in which each column holds its own timestamped history, e.g. title: Engineer, Sr Engineer, Tech Leader]
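The step from timestamped tuple snapshots to a temporally grouped history can be sketched in Python. This is only an illustration: the function name `group_history`, the employee number, salaries, and dates are all hypothetical, and the sketch assumes that the snapshots of one entity follow each other without gaps.

```python
def group_history(snapshots, key, attributes):
    """Turn tuple snapshots into per-attribute histories.

    Each snapshot is a dict with 'start' and 'end' plus attribute values.
    Consecutive snapshots with the same attribute value are coalesced
    into one (value, start, end) run. Assumes no gaps between snapshots.
    """
    grouped = {}
    for row in sorted(snapshots, key=lambda r: (r[key], r["start"])):
        entity = grouped.setdefault(row[key], {a: [] for a in attributes})
        for attr in attributes:
            runs = entity[attr]
            if runs and runs[-1][0] == row[attr]:
                value, start, _ = runs[-1]
                runs[-1] = (value, start, row["end"])  # extend the current run
            else:
                runs.append((row[attr], row["start"], row["end"]))
    return grouped


# Hypothetical snapshot rows (values invented for the example).
rows = [
    {"empno": 1001, "salary": 60000, "title": "Engineer",
     "start": "1995-01-01", "end": "1995-05-31"},
    {"empno": 1001, "salary": 70000, "title": "Engineer",
     "start": "1995-06-01", "end": "1995-09-30"},
    {"empno": 1001, "salary": 70000, "title": "Sr Engineer",
     "start": "1995-10-01", "end": "1996-01-31"},
    {"empno": 1001, "salary": 70000, "title": "Tech Leader",
     "start": "1996-02-01", "end": "1996-12-31"},
]
grouped = group_history(rows, "empno", ["salary", "title"])
```

Four snapshot rows collapse into two salary runs and three title runs, which is the shape of the temporally grouped table.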

Publishing DB History in XML: Many Alternatives
1. Each table as an XML document: columns as attributes
   - Flat structure that corresponds to the tuple snapshots (employees2.xml)
2. Each table as an XML document: columns as elements
   - A natural structure [Clifford et al.] which simplifies many queries (employees.xml, depts.xml)
3. Multiple tables as a single XML document: flat structure
   - Good for some join queries but not for others (company.xml)
4. Multiple tables as a single XML document: hierarchy
   - Similar to, but more intuitive than, the previous one (depts3.xml)
5. Multiple tables as an XML document: flat structure with IDs
   - Can simplify join queries with IDs and IDREFs (company2.xml)

XML Representation of DB History: Table Columns as XML Elements
[XML listing not preserved; surviving element values: Bob; Engineer, Sr Engineer, Tech Leader; QA, RD]

XML Representation of DB History (cont’d)
- Historical data is represented in an XML document
- Two attributes, tstart and tend, are used to represent the time interval
- The value now is used to denote the ever-increasing current time
- Node updates:
  - delete: tend is updated to the current timestamp
  - insert: a new node is appended with tend set to now
  - update: a delete followed by an insert
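The three node updates can be sketched with Python's standard `xml.etree.ElementTree`. The helper names (`insert_node`, `delete_node`, `update_node`), the element names, and the dates are hypothetical, and the sketch simply reuses the update timestamp as both the old node's tend and the new node's tstart, which is an assumption rather than something the slides specify.

```python
import xml.etree.ElementTree as ET

NOW = "now"  # stands for the ever-increasing current time


def insert_node(parent, tag, value, ts):
    """insert: append a new node that is valid from ts until 'now'."""
    node = ET.SubElement(parent, tag, {"tstart": ts, "tend": NOW})
    node.text = value
    return node


def delete_node(node, ts):
    """delete: close the interval; the node stays in the document as history."""
    node.set("tend", ts)


def update_node(parent, tag, new_value, ts):
    """update: delete the current version, then insert the new one."""
    for node in parent.findall(tag):
        if node.get("tend") == NOW:
            delete_node(node, ts)
    return insert_node(parent, tag, new_value, ts)


# Hypothetical employee whose title changes once.
employee = ET.Element("employee", {"tstart": "2000-01-01", "tend": NOW})
insert_node(employee, "title", "Engineer", "2000-01-01")
update_node(employee, "title", "Sr Engineer", "2001-06-01")
history = [(t.text, t.get("tstart"), t.get("tend"))
           for t in employee.findall("title")]
```

Nothing is ever removed from the document: a delete only closes an interval, so the full history stays queryable.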

Schema of the XML Representation
- The document has a well-defined schema derived from the snapshot document.

Temporal Queries with XQuery
- Temporal projection: retrieve the salary history of "Bob":

    element salary_history {
      for $s in document("employees.xml")/employees/employee[name="Bob"]/salary
      return $s
    }

- Snapshot queries: retrieve the departments on a given date:

    for $d in document("depts.xml")/depts/dept[tstart(.) = " "]
    let $n := $d/name[tstart(.) = " "]
    let $m := $d/manager[tstart(.) = " "]
    return ( element dept { $n, $m } )

Temporal Queries with XQuery (cont’d)
- Interval queries: find the employee(s) who worked in the "QA" department throughout the history of that department:

    for $d in document("depts.xml")/depts/dept[deptname='QA']/deptno
    for $e in document("employees.xml")/employees/employee[deptno=$d]
    where tstart($e/deptno) = tstart($d) and tend($e/deptno) = tend($d)
    return $e/name

Complex Temporal Queries with XQuery
- A Since B: find the employee who has been the manager of the dept since he/she joined the dept "d007":

    for $e in document("employees.xml")/employees/employee
    let $m := $e/title[title="Manager" and tend(.)=current-date()]
    let $d := $e/deptno[deptno="d007" and tcontains($m, .)]
    where not empty($d) and not empty($m)
    return { $e/empno, $e/firstname, $e/lastname }

Complex Temporal Queries with XQuery (cont’d)
- Period containment: find employees with the same history as employee "10112", i.e., they worked in the same dept(s) as employee "10112" and for exactly the same periods:

    for $e1 in document("employees.xml")/employees/employee[empno = '10112']
    for $e2 in document("employees.xml")/employees/employee[empno != '10112']
    where every $d1 in $e1/deptno satisfies
            some $d2 in $e2/deptno satisfies
              (string($d1) = string($d2) and tequals($d2, $d1))
      and every $d2 in $e2/deptno satisfies
            some $d1 in $e1/deptno satisfies
              (string($d2) = string($d1) and tequals($d1, $d2))
    return {$e2/empno}

User-Defined Temporal Functions
- Shield the user from the low-level details used in representing time, e.g., "now"
- Eliminate the need for the user to write complex functions, e.g., coalescing and diff
- Predefined functions:
  - History functions: history($e, Ts, Te), snapshot($e, T), invariance($e, Ts, Te)
  - Restructuring functions: coalesce($l)
  - Interval functions: toverlaps, tprecedes, tcontains, tequals, tmeets
  - Duration and date/time functions: timespan($e), tstart($e), tend($e), tinterval($e), telement(Ts, Te), getdbnow(), rtend($e), externalnow($e)
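The slides only name the interval functions, so the following Python sketch fills in semantics that are assumed rather than taken from the source: intervals are closed `(start, end)` pairs of dates at day granularity, and "meets" means that one interval ends the day before the other starts.

```python
from datetime import date


def toverlaps(a, b):
    """True if the closed intervals a and b share at least one day."""
    return a[0] <= b[1] and b[0] <= a[1]


def tprecedes(a, b):
    """True if a ends strictly before b starts."""
    return a[1] < b[0]


def tcontains(a, b):
    """True if a covers every day of b."""
    return a[0] <= b[0] and b[1] <= a[1]


def tequals(a, b):
    """True if a and b have the same start and end."""
    return a[0] == b[0] and a[1] == b[1]


def tmeets(a, b):
    """True if b starts on the day right after a ends (assumed day granularity)."""
    return (b[0] - a[1]).days == 1


# Hypothetical intervals: first and second half of a year.
first_half = (date(2000, 1, 1), date(2000, 6, 30))
second_half = (date(2000, 7, 1), date(2000, 12, 31))
whole_year = (date(2000, 1, 1), date(2000, 12, 31))
```

With these definitions, `tmeets(first_half, second_half)` and `tcontains(whole_year, second_half)` hold, while `toverlaps(first_half, second_half)` does not.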

Support for 'now'
- 'now': no change until now. Values of tuples are still current at the time the query is asked
- Internally, "end of time" values are used to denote 'now', e.g.,
- Intervals are only accessed through built-in functions: tstart() returns the start of an interval, tend() returns the end or CURRENT_DATE if it's different from
- In the output, the tend value can be:
  - " ",
  - CURRENT_DATE (through rtend()), or
  - "now" (through externalnow())
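One possible reading of this convention is sketched below in Python. The sentinel value `9999-12-31` is an assumption made for the example (the slide's own value did not survive), and the functions work on a single date rather than on XML elements, so they are simplified stand-ins for the functions named above, not their actual definitions.

```python
from datetime import date

# Assumed "end of time" sentinel; the actual internal value is not given here.
END_OF_TIME = date(9999, 12, 31)


def tend(stored_end, today=None):
    """Return the stored end, or the current date if the interval is still open."""
    today = today or date.today()
    return today if stored_end == END_OF_TIME else stored_end


def rtend(stored_end, today=None):
    """Output form in which an open end is shown as the current date."""
    return tend(stored_end, today).isoformat()


def externalnow(stored_end):
    """Output form in which an open end is shown as the string 'now'."""
    return "now" if stored_end == END_OF_TIME else stored_end.isoformat()
```

Passing `today` explicitly keeps the example repeatable; in a database setting the value would come from the system's current date.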

Historical Database Architecture
[Figure: architecture diagram. Components: current database, active rules / update logs, historical database, XML publishing, XML views, XML data, historical XML data, XML queries, temporal XML queries]

Historical XML Database Architecture: Two Approaches
- XML-enabled RDBMS
  - The historical view is decomposed into relational databases as binary tables
  - Historical data can then be published as an XML document through SQL/XML publishing functions, or queried through a middleware as XML views
- Native XML databases
  - Historical data are stored in a native XML database
  - XML queries can be specified directly upon the database
  - Native XML databases: SoftwareAG’s Tamino, eXcelon’s XIS

Relational Storage of Temporal Relational Data
- Assumptions
  - Each entity or relation has a unique key (or composite key) that identifies it and does not change over its history, e.g., employee: empno
- Relational schema:
  - employee(empno, firstname, lastname, sex, DOB, deptno, salary, title)
- The historical XML documents are decomposed into tables

Relational Storage of Temporal Relational Data (cont’d)
- Key table for keys:
  - employee_id(id, tstart, tend), where id = empno
  - For composite keys, the table will be like: lineitem_id(id, supplierno, itemno, tstart, tend)
- Attribute history tables:
  - employee_lastname(id, lastname, tstart, tend)
  - …
  - employee_salary(id, salary, tstart, tend)
  - employee_title(id, title, tstart, tend)
  - …
- Global relation table: keeps the history of all the relations
  - relations(name, tstart, tend)
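The decomposition can be tried out with Python's built-in `sqlite3` module. The table and column names follow the slide; the column types, the ISO date strings, the `9999-12-31` end-of-time sentinel, the sample rows, and the `snapshot` helper are assumptions made for this sketch (the slides used DB2, not SQLite).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_id     (id INTEGER, tstart TEXT, tend TEXT);
CREATE TABLE employee_salary (id INTEGER, salary INTEGER, tstart TEXT, tend TEXT);
CREATE TABLE employee_title  (id INTEGER, title TEXT, tstart TEXT, tend TEXT);
CREATE TABLE relations       (name TEXT, tstart TEXT, tend TEXT);
""")

# Hypothetical history of one employee; dates are ISO strings so that
# string comparison matches chronological order.
conn.execute("INSERT INTO employee_id VALUES (1001, '1995-01-01', '9999-12-31')")
conn.executemany(
    "INSERT INTO employee_salary VALUES (?, ?, ?, ?)",
    [(1001, 60000, "1995-01-01", "1995-05-31"),
     (1001, 70000, "1995-06-01", "9999-12-31")],
)


def snapshot(conn, table, column, day):
    """Return (id, value) pairs valid on the given day.

    The table and column names are interpolated into the SQL text,
    so they must come from trusted code, never from user input.
    """
    sql = (f"SELECT id, {column} FROM {table} "
           "WHERE tstart <= ? AND tend >= ? ORDER BY id")
    return conn.execute(sql, (day, day)).fetchall()


salary_then = snapshot(conn, "employee_salary", "salary", "1995-03-15")
salary_later = snapshot(conn, "employee_salary", "salary", "2000-01-01")
```

Because each attribute has its own history table, a salary change adds one narrow row instead of a copy of the whole employee tuple.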

Relational Storage of Temporal Relational Data (cont’d)
- Sample contents of employee_salary:
  [Table: columns ID, SALARY, TSTART, TEND; row values not preserved]

XML Publishing and XML Queries
- A middleware (XPERANTO/XTABLES) can be used to publish and query historical tables as XML documents
- Create XML views over relational data
  - Each database has a default XML view
  - The temporal XML document representation can be reconstructed with user-defined XML views with XQuery, and be queried with XQuery
- Query upon XML views with XQuery
  - Only the desired relational data items are materialized
  - Most computation is pushed down to the relational engine

Automatic Archiving
- Statement:

    CREATE HISTORICAL VIEW viewname AS
    SELECT col1, col2, …
    FROM tablename
    [ USING KEY coli, colj, … ]

- Results:
  - Historical tables are created for each attribute of the current table
  - Temporal XML views are created with XPERANTO
  - The historical tables are initialized with the snapshot of the current table
  - Active rules are started to trace any changes and archive them into the historical tables
  - Temporal XQuery can be specified on the XML views
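The "active rules" step can be approximated with ordinary SQL triggers. The sketch below uses Python's `sqlite3`, not the system described in the slides: the trigger names, the one-row `clock` table (a stand-in for the current date that keeps the example repeatable), the `9999-12-31` sentinel, and the sample values are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (empno INTEGER PRIMARY KEY, salary INTEGER);
CREATE TABLE employee_salary (id INTEGER, salary INTEGER, tstart TEXT, tend TEXT);

-- Stand-in for CURRENT_DATE so the demo gives the same result on every run.
CREATE TABLE clock (today TEXT);
INSERT INTO clock VALUES ('2000-01-01');

-- insert: open a new interval ending at the end-of-time sentinel
CREATE TRIGGER archive_insert AFTER INSERT ON employee
BEGIN
  INSERT INTO employee_salary
  VALUES (NEW.empno, NEW.salary, (SELECT today FROM clock), '9999-12-31');
END;

-- update: close the open interval, then open a new one
CREATE TRIGGER archive_update AFTER UPDATE OF salary ON employee
WHEN OLD.salary <> NEW.salary
BEGIN
  UPDATE employee_salary SET tend = (SELECT today FROM clock)
   WHERE id = OLD.empno AND tend = '9999-12-31';
  INSERT INTO employee_salary
  VALUES (NEW.empno, NEW.salary, (SELECT today FROM clock), '9999-12-31');
END;

-- delete: close the open interval
CREATE TRIGGER archive_delete AFTER DELETE ON employee
BEGIN
  UPDATE employee_salary SET tend = (SELECT today FROM clock)
   WHERE id = OLD.empno AND tend = '9999-12-31';
END;
""")

conn.execute("INSERT INTO employee VALUES (1001, 60000)")
conn.execute("UPDATE clock SET today = '2001-06-01'")
conn.execute("UPDATE employee SET salary = 70000 WHERE empno = 1001")
history = conn.execute(
    "SELECT id, salary, tstart, tend FROM employee_salary ORDER BY tstart"
).fetchall()
```

The application only ever touches the current `employee` table; the history table is maintained as a side effect of ordinary inserts, updates, and deletes.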

Implementation Comparisons
- A temporal data simulation program automatically generates the historical data in XML
- Total number of employees: 300,024
- Database systems and major supported query languages for comparison:
  - Relational: DB2 (SQL)
  - Native:
    - SoftwareAG’s Tamino (text-based storage): XPath
    - eXcelon’s XIS (XML Information Server) (OODBMS-based storage): XQuery

Performance Comparisons Storage Size:

Performance Comparisons (cont’d)
Query performance of DB2 and Tamino:
- Q1: scan of databases
- Q2: history query
- Q3, Q5: interval queries
- Q4, Q6: snapshot queries
- Q7: join

Performance Comparisons (cont’d) Query Performance of Tamino and XIS (1/3 data size)

Efficient Query Support for Temporal Queries
- The H-document is first clustered by document structure, and then by the change history
- Tamino preserves the clustering structure, so retrieving the history of a node can be efficient
- In the RDBMS approach, tuples are stored in the order of updates, neither temporally clustered nor clustered by objects
- A traditional B+ tree index will not help on interval-related temporal queries
- A segment-based archiving scheme was used in this project

Segment-based Archiving Scheme (cont’d)
- Sample segments, each with columns ID, SALARY, TSTART, TEND (row values not preserved):
  - Segment1 (01/01/ /17/1991)
  - Segment2 (10/18/ /08/1995)
  - Segment3 (07/09/ /08/1999): ...
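The slides do not spell out how segments are cut, so the following Python sketch rests on an assumed rule: the usefulness of a segment is the fraction of its tuples that are still live, and once it drops below a threshold a new segment is started with copies of the live tuples. The class name `SegmentedArchive`, the threshold `u_min`, the sentinel, and the sample data are hypothetical.

```python
END_OF_TIME = "9999-12-31"  # assumed sentinel for still-current tuples


def usefulness(segment):
    """Fraction of tuples in the segment that are still live (assumed definition)."""
    if not segment:
        return 1.0
    live = sum(1 for row in segment if row["tend"] == END_OF_TIME)
    return live / len(segment)


class SegmentedArchive:
    def __init__(self, u_min):
        self.u_min = u_min
        self.segments = [[]]  # the last segment is the live one

    def update(self, key, value, day):
        """Record a new value for key, closing its previous version."""
        seg = self.segments[-1]
        for row in seg:
            if row["id"] == key and row["tend"] == END_OF_TIME:
                row["tend"] = day
        seg.append({"id": key, "value": value,
                    "tstart": day, "tend": END_OF_TIME})
        if usefulness(seg) < self.u_min:
            # Start a new segment holding copies of the live tuples only.
            # The old segment is left untouched in this sketch.
            self.segments.append(
                [dict(r) for r in seg if r["tend"] == END_OF_TIME])


archive = SegmentedArchive(u_min=0.5)
archive.update(1, 60000, "1990-01-01")
archive.update(1, 65000, "1991-01-01")   # usefulness 1/2, not below 0.5
archive.update(1, 70000, "1992-01-01")   # usefulness 1/3, new segment starts
```

A snapshot query then only has to read the segment whose period covers the requested date, instead of scanning the whole update-ordered history.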

Query Performance
Query performance with different usefulness:
- Q1, Q3: snapshot queries
- Q2, Q4: history queries
- Q5: interval queries

Conclusion
- XML can be used to support a temporally grouped data model and to represent temporal relational data
- The framework supports complex temporal queries with XQuery, without extensions to XQuery
- The XML-viewed history of database tables can be stored using a native XML database or using an RDBMS
- The RDBMS has a significant query performance advantage over the native XML database, while the latter can be more effective in terms of storage due to compression techniques
- A segment-based archiving scheme based on usefulness can significantly boost the performance of most temporal queries

History of XML Documents
- The temporal representation in XML applies not only to historical relational data but also to historical XML documents
  [XML listing not preserved; surviving text: 1 Introduction; Introduction and Overview; Background; Previous Work ...]