ArchIS: An Efficient Transaction-Time Temporal Database System Built on Relational Databases and XML Fusheng Wang University of California, Los Angeles.

ArchIS: An Efficient Transaction-Time Temporal Database System Built on Relational Databases and XML Fusheng Wang University of California, Los Angeles

Temporal Databases: the Reality  Over 40 temporal data models and query languages have been proposed in the past  A long struggle to get around the limitations of RDBMS  No DBMS vendors have moved aggressively to extend SQL with temporal support  The problem is not due to a lack of applications.  SQL is at least in part to blame for the problem.

Outline  Transaction Time History of Database relations in XML  and Temporal Queries with XQuery  The Performance Problem  The ArchIS System

Background: Publishing Relational Database as XML  Publishing relational DBs as XML  as actual XML documents: SQL/XML  as XML views: SilkRoute, XPeranto  Here we publish the history of relational content as XML documents or XML views  Using a temporally grouped representation.

Example: Transaction-Time History of Tables  Timestamped tuple snapshots (temporally ungrouped) nameempnosalarytitle deptno DOBstartend Bob1000360000Engineerd011945-04-091995-01-011995-05-31 Bob1000370000Engineerd011945-04-091995-06-011995-09-30 Bob1000370000Sr Engineerd021945-04-091995-10-011996-01-31 Bob1000370000Tech Leaderd021945-04-091996-02-011996-12-31 nameempnosalarytitledeptnoDOB Bob 1995-01-01:1996-12-31 10003 1995-01-01:1996-12-31 60000 1995-01-01:1996-05-31 Engineer 1995-01-01:1995-09-30 d01 1995-01-01:1995-09-30 1945-04-09 1995-01-01:1996-12-31 70000 1995-06-01:1996-12-31 Sr Engineer 1995-10-01:1996-01-31 d02 1995-10-01:1996-12-31 Tech Leader 1996-02-01:1996-12-31  Temporally grouped history of employees

XML Representation of DB History 10003 Bob 60000 70000 Engineer Sr Engineer Tech Leader d01 d02 1945-04-09 10003 Bob 60000 70000 Engineer Sr Engineer Tech Leader d01 d02 1945-04-09

Departments d001 Marketing 110022 110039 d002 Finance 110085 110114 ….etc.

Advantages of XML Representations  The attribute value history is grouped, and can be queried directly without coalescing  The H-document has a well-defined schema generated from the current table

Temporal Queries with XQuery  XQuery: the coming standard query language for XML  With XQuery, we can specify temporal queries without any extension:  Temporal projection, snapshot queries, temporal joins, interval queries  Complex queries: A SINCE B, continuous periods, period containment

Temporal Queries with XQuery  Temporal projection: retrieve the salary history of “Bob”: element salary_history { for $s in doc("employees.xml")/ employees/employee/[name=“Bob”]/salary return $s } element salary_history { for $s in doc("employees.xml")/ employees/employee/[name=“Bob”]/salary return $s }

Temporal Queries with XQuery  Snapshot queries: retrieve the departments on 1996-01-31: for $d in doc("depts.xml")/depts/dept [tstart(.) = "1996-01-31"] let $n := $d/name[tstart(.) ="1996-01-31"] let $m := $d/manager[tstart(.) = "1996-01-31"] return( element dept{$n,$m } ) for $d in doc("depts.xml")/depts/dept [tstart(.) = "1996-01-31"] let $n := $d/name[tstart(.) ="1996-01-31"] let $m := $d/manager[tstart(.) = "1996-01-31"] return( element dept{$n,$m } )

Temporal Functions  Shield the user from the low-level details used in representing time, e.g., “now”  Eliminate the need for the user to write complex functions, e.g., coalescing  Predefined functions:  Restructuring: coalese($l)  Period comparison : toverlaps, tprecedes, tcontains, tequals, tmeets  Duration and date/time: tstart($e), tend($e), timespan($e)  telement(Ts, Te): constructs an empty element element timestamped as tstart=Ts, tend=Te

Support for ‘now’  ‘now’: no change until now  Internally, “end of time” values are used to denote ‘now’, e.g., 9999-12-31  Intervals are only accessed through built-in functions: tstart() returns the start of an interval, tend() returns the end or CURRENT_DATE if it’s different from 9999-12-31  In the output, tend value can be:  “9999-12-31”  CURRENT_DATE by using rtend($e) that recursively replaces all the occurrence of 9999-12-31 with the current date,  “now”, using externalnow($e) that recursively replaces all the occurrence of \9999-12-31" with the string \now".

Outline  Viewing Relation History in XML  Temporal Queries with XQuery  Performance Issues  The ArchIS (Archival Information System) Project

Architecture: Two approaches 1. Native XML database approach: store H-documents directly into XML DB—limited performance and scalability 2. XML-enabled RDBMS. Design issues include:  mapping (shredding) the XML views representing the H- documents into tables (H-tables)  translation of queries from the XML views to the H-tables  indexing, clustering and query mapping techniques  ArchIS: Archival Information System

The ArchIS System: Architecture H-tables Relational Data Current Database Active Rules/ update logs Temporal XML Data SQL Queries Temporal XML Queries H-views (H-documents) ARCHIS

H-tables  Assumptions  Each entity or relation has a unique key ( or composite keys) to identify it which will not change along the history. e.g., employee: empno  H-tables:  attribute history table: store history of each attribute  key table: built for the key  global relation table: record the history of relations  e.g.: current database:  employee(empno, name, sex, DOB, deptno, salary, title)

H-tables (cont’d) current table H-tables employeeglobal relation table relations(relationname, tstart, tend) empnokey tableemployee_id(id, tstart, tend) nameattribute history table employee_name(id, name, tstart, tend) …… salaryemployee_salary(id, salary, tstart, tend) titleemployee_title(id, title, tstart, tend)

H-tables (cont’d)  Sample contents of employee_salary: ID SALARY TSTART TEND ======= ======= ========== ========== 100022 58805 02/04/1985 02/04/1986 100022 61118 02/05/1986 02/04/1987 100022 65103 02/05/1987 02/04/1988 100022 64712 02/05/1988 02/03/1989 100022 65245 02/04/1989 02/03/1990 100023 43162 07/13/1988 07/13/1989...

Query Mapping  General purpose query mapping: XPeranto  In ArchIS, we have well-defined mapping between H-documents (or H-views) and H- tables  We map temporal XQuery queries into SQL, utilizing SQL/XML  SQL/XML is a new standard to map between RDBMS and XML  Both tag-binding and structure construction is pushed inside the relational engine, thus be very efficient

XQuery Mapping to SQL with SQL/XML select XMLElement (Name "salaryhistory", XMLAgg (XMLElement (Name as "salary", XMLAttributes (S.tstart as tstart, S.tend as "tend"), S.salary))) from employee_salary as S, employee_name as N where N.id = S.id and N.name = 'Bob' group by N.id select XMLElement (Name "salaryhistory", XMLAgg (XMLElement (Name as "salary", XMLAttributes (S.tstart as tstart, S.tend as "tend"), S.salary))) from employee_salary as S, employee_name as N where N.id = S.id and N.name = 'Bob' group by N.id  Temporal projection: retrieve the salary history of “Bob”: element salary_history { for $s in doc("employees.xml")/ employees/employee/[name=“Bob”]/salary return $s } element salary_history { for $s in doc("employees.xml")/ employees/employee/[name=“Bob”]/salary return $s }

SQL/XML Publishing Functions  XMLElement and XMLAttribute select XMLElement (Name "dept", XMLAttributes (tstart as "tstart", tend as "tend"), deptname) from dept where deptname = ‘Sales’ Sales Sales

SQL/XML Publishing Functions  XMLAgg select XMLElement (Name as "new_employees", XMLAttributes ("02/04/2003" as "Since") XMLAgg (XMLElement (Name as "employee", e.name)) from employee_name as e where e.tstart >= ‘02/04/2003’ Bob Jack Bob Jack

Performance Study: Experimental Setup  Systems: Tamino, DB2, and ArchIS  ArchIS uses BerkeleyDB as its storage manager, and it builds on top of it its SQL query engine  Temporal data set: the history of 300,024 employees over 17 years  The simulation models real world salary increases, changes of titles, and changes of departments  The size of the XML data is 334MB  The single large XML document is cut into a collection of 15,000 small XML documents with around 25KB each  Machine: Pentium IV 2.4GHz PC with RedHat 8.0

Performance Study: Query Performance snapshot query Q2 on ArchIS is 137 times faster than that on Tamino; interval query Q5 is 91 times faster; history Q6 is 25 times faster; Q4 4 times faster, and Q3 near 3 times faster. Tamino with clustering: snapshot Q2 is 3.3 times faster than without clustering ( still 41 times slower than archIS); interval query Q5 is 2.9 times faster than without clustering ( still 31 times slower than on ArchIS); history queries are much slower DB2 and ArchIS: with clustering Tamino: without clustering

Update Performance  For RDBMS, only the current segment is used for updates. For Tamino, current data and historical data are clustered together  Update an employee’s salary:  DB2: 0.29 seconds; Tamino: 1.2 seconds  Assume that every employee gets updated once a year: about 1/260 of the total employee get updated every day on average  DB2: 1.52 seconds; Tamino: 15 seconds  In the worse case for segment-based archiving: 39 seconds for copying segments and 36 segments for compression: but only once

ArchIS Summary  We built a transaction time temporal database on RDBMS and XML, with:  XML to support temporally grouped (virtual) representations of the database history  XQuery to express powerful temporal queries on such views  The XML historical views are shredded back into relations (H-tables)  SQL/XML for executing the queries on the XML views as equivalent queries on the relational DB  Temporal clustering schemes based on the concept of page useful deliver good performance.

Temporal Clustering and Indexing  Tuples in H-tables are stored in the order of updates, thus neither temporally clustered nor clustered by objects  Traditional indexes such as B+ Tree will not help on snapshot queries, and better temporal clustering is needed  For every segment, usefulness: U = N live /N all  At the beginning, U =100%, and it decreases with updates  The minimum tolerable usefulness: U min

Live Segment-based Clustering Scheme Segment 1 All Segment 2 All Live Segment 3 All segstart1segend1segstart2segend2segstart3segend3 tstart tuple <= segend SEG tend tuple >= segstart SEG tstart tuple <= segend SEG tend tuple >= segstart SEG

Segment-based Clustering Scheme  Initially all tuples for an attribute history table are archived in a live segment SEG live with usefulness U =100%. With updates, when U drops below U min : 1. A new segment is allocated; 2. The interval of this segment is recorded in the table segment(segno, segstart, segend); 3. All tuples in SEGlive are copied into a new segment Si sorted by id; 4. All live tuples in SEG live are copied into a new live segment SEG live', and the old live segment is dropped; After that, the new segment SEG live’ becomes the new starting segment for updates

Segment-based Clustering Scheme (cont’d)  Sample segments: Segment1 (01/01/1985 - 10/17/1991): ID SALARY TSTART TEND 100002 40000 02/20/198802/19/1989 100002 42010 02/20/1989 02/19/1990 100002 42525 02/20/1990 02/19/1991 100002 42727 02/20/1991 12/31/9999... Segment2 (10/18/1991 - 07/08/1995): ID SALARY TSTART TEND 100002 42727 02/20/1991 02/19/1992 100002 45237 02/20/1992 02/18/1993 100002 46465 02/19/1993 02/18/1994 100002 47418 02/19/1994 02/18/1995 100002 47273 02/19/1995 12/31/9999...

Advantages of Segment-based Clustering Scheme  The current live segment always has a high usefulness, assuring efficient updates;  Records are globally temporally clustered on segments;  For snapshot queries, only one segment is used; for interval queries, only segments involved are used;  Flexibility to control the number of redundant tuples in segments with U min

Storage Usage of Segment-based Clustering Relative storage size with different U min N seg <= N 0/ (1-U min ) NSNS

Query Performance on Temporal Data with Segment-based Clustering Queries: Point: Q1 Snapshot: Q2 Interval: Q5 History: Q3, Q4, Q6

ArchIS: An Efficient Transaction-Time Temporal Database System Built on Relational Databases and XML Fusheng Wang University of California, Los Angeles.

Similar presentations

Presentation on theme: "ArchIS: An Efficient Transaction-Time Temporal Database System Built on Relational Databases and XML Fusheng Wang University of California, Los Angeles."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

ArchIS: An Efficient Transaction-Time Temporal Database System Built on Relational Databases and XML Fusheng Wang University of California, Los Angeles.

Similar presentations

Presentation on theme: "ArchIS: An Efficient Transaction-Time Temporal Database System Built on Relational Databases and XML Fusheng Wang University of California, Los Angeles."— Presentation transcript:

Similar presentations

About project

Feedback