ER Tucson
Schema Mediated Exchange of Temporal XML Data
  Curtis Dyreson – Washington State University
  Richard T. Snodgrass – University of Arizona
  Sabah Currim – University of Arizona
  Faiz Currim – University of Iowa
Scenario
  Genomic data from NCBI
  Data collection is growing/changing
  Want data and data provenance (who, what, when, …)
Obtaining Web Data
  [Diagram labels: NCBI; Request D; Write D; Overwrite D]
Data Evolves
  Download XML-formatted data (as of January 1)
    trypsin 4
  Download again (as of March 6)
    trypsin 4, beta-cell receptor
Refreshing the Data (using SDOs)
  [Diagram labels: NCBI; Request updates to D since time t; Change summary of D; Update D; Copy D; D old; XMLDiff]
  What about versions between D and D old?
  Did I download “valid” data?
  My DB is pretty big…
Did I Download the Right Data?
  Validate against schema
  [Diagram labels: XML Data; Schema; Namespace; Validating Parser; Valid]
Fragment of the Genomic Schema … …
Uses of an XML Schema
  Validation
  XML editors
  Guides query formulation
  Query optimization
  Provides a web service binding
A Temporal Data Collection
  [Diagram labels: NCBI; Request updates to D since time t; ΔD [t, now]; Temporal D; Extend history of D; Temporal Schema]
  Validate the “delta” with the temporal schema; cost is the size of the change
  Temporal schema: which elements vary over time
Outline
  Motivation
  XSchema
  Architecture
  Summary
Goals for a Temporal Schema
  Make it easy to create a schema for temporal data
    Identify which data is temporal
  Upwards compatibility
    Minimal extensions of XML Schema
    Reuse off-the-shelf parsers/tools
  Support
    Valid and transaction time
    Data (element) versioning
    Schema versioning
  Logical/physical independence
    Flexible timestamp representation and location
Persistent Elements
  An item is an element that persists across snapshots.
  Item identifier (like a temporally invariant key)
  January snapshot …
  March snapshot …
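As a minimal sketch of how an item identifier pairs up elements across snapshots, the following uses the January and March data from the earlier slide. The element names `genes` and `name`, and the choice of the `name` text as the item identifier, are assumptions made for illustration; the actual genomic schema is not shown in the slides.

```python
import xml.etree.ElementTree as ET

# Two snapshots, modeled on the January 1 and March 6 downloads.
# The <genes>/<gene>/<name> layout is an assumed, simplified structure.
JANUARY = """<genes>
  <gene><name>trypsin 4</name></gene>
</genes>"""

MARCH = """<genes>
  <gene><name>trypsin 4</name></gene>
  <gene><name>beta-cell receptor</name></gene>
</genes>"""


def items_by_identifier(snapshot_xml, element="gene", key="name"):
    """Map each item identifier (here the text of <name>) to its element."""
    root = ET.fromstring(snapshot_xml)
    return {e.findtext(key): e for e in root.iter(element)}


january = items_by_identifier(JANUARY)
march = items_by_identifier(MARCH)

# Identifiers present in both snapshots denote the same item over time.
persisting = sorted(january.keys() & march.keys())
# Identifiers seen only in the later snapshot denote newly created items.
new_items = sorted(march.keys() - january.keys())

print("persisting:", persisting)
print("new:", new_items)
```

Because the identifier is temporally invariant, the same lookup works no matter how the rest of the element's content changes between snapshots.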
Extend a Snapshot Schema
  Specify which elements are temporal
  Temporal elements have
    Item identifiers
    Simple constraints (state/event, existence/content-varying)
  <txs:transactionTime kind="state" contentVarying="true" existenceVarying="no gaps"/>
  … definition of gene from the snapshot schema omitted for space …
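One plausible reading of the `existenceVarying="no gaps"` constraint is that the version periods of an item must meet end to end, so the item never disappears and reappears. The check below is a sketch under that assumption. The `(begin, end)` pair representation with an exclusive end, and the year 2006, are illustrative choices, not the timestamp format the authors' tools use.

```python
from datetime import date


def has_no_gaps(versions):
    """Return True if consecutive version periods meet exactly.

    versions: list of (begin, end) date pairs with an exclusive end.
    """
    ordered = sorted(versions)
    return all(prev_end == next_begin
               for (_, prev_end), (next_begin, _) in zip(ordered, ordered[1:]))


# Second version starts exactly when the first one ends.
contiguous = [(date(2006, 1, 1), date(2006, 3, 6)),
              (date(2006, 3, 6), date.max)]

# The item does not exist between February 1 and March 6.
gapped = [(date(2006, 1, 1), date(2006, 2, 1)),
          (date(2006, 3, 6), date.max)]

print(has_no_gaps(contiguous), has_no_gaps(gapped))
```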
Versions
  A version is a change in an item.
  DOM inequivalence
  January snapshot …
  March snapshot …
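To make the idea of DOM inequivalence concrete, the sketch below compares two serializations of an item after canonicalization, so that differences in whitespace or attribute order do not count as a change. Using `xml.etree.ElementTree.canonicalize` (Python 3.8+) is a stand-in chosen for illustration; the slides do not say how the authors' tools test equivalence.

```python
import xml.etree.ElementTree as ET


def is_new_version(old_xml, new_xml):
    """Return True if the two serializations differ after canonicalization.

    strip_text=True removes leading and trailing whitespace in text nodes,
    so pretty-printing alone does not create a new version.
    """
    old_canonical = ET.canonicalize(old_xml, strip_text=True)
    new_canonical = ET.canonicalize(new_xml, strip_text=True)
    return old_canonical != new_canonical


same_content = is_new_version(
    "<gene><name>trypsin 4</name></gene>",
    "<gene>\n  <name>trypsin 4</name>\n</gene>",
)
changed_content = is_new_version(
    "<gene><name>trypsin 4</name></gene>",
    "<gene><name>beta-cell receptor</name></gene>",
)
print(same_content, changed_content)
```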
Temporal Genomic Data trypsin 4 …next version of gene… …ontology item…
Validating Temporal Data
  Snapshot data validated with a snapshot schema
  Construct a representational schema (details in paper)
  Can also validate the “delta”
  [Diagram labels: XML Data; Snapshot Schema; Namespace; Validating Parser; Valid; Construction Process; Representational Schema; Temporal Data; Valid; Not valid]
Property of a “Good” Construction
  Every snapshot must conform to the snapshot schema
  [Diagram labels: Temporal data; Temporal Schema; (Temporal) Validating Parser; Valid; Snapshot at time T; Snapshot Schema; Validating Parser]
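The property can be illustrated by extracting the snapshot at a time T from a temporal document and checking it against the snapshot structure. Everything about the representation below is hypothetical: the `temporalGenes`, `item`, and `version` elements, the `begin`/`end` attributes, and the year 2006 are invented for this sketch, and `conforms` is a hand-written stand-in for a validating parser run against the real snapshot schema.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical temporal representation: each version carries a
# [begin, end) transaction-time period as ISO date strings.
TEMPORAL = """<temporalGenes>
  <item>
    <version begin="2006-01-01" end="9999-12-31">
      <gene><name>trypsin 4</name></gene>
    </version>
  </item>
  <item>
    <version begin="2006-03-06" end="9999-12-31">
      <gene><name>beta-cell receptor</name></gene>
    </version>
  </item>
</temporalGenes>"""


def snapshot_at(temporal_xml, t):
    """Build the snapshot <genes> document that was current at ISO date t."""
    root = ET.fromstring(temporal_xml)
    snapshot = ET.Element("genes")
    for version in root.iter("version"):
        # ISO dates compare correctly as plain strings.
        if version.get("begin") <= t < version.get("end"):
            snapshot.extend([copy.deepcopy(child) for child in version])
    return snapshot


def conforms(snapshot):
    """Stand-in for snapshot-schema validation: <genes> of named <gene>s."""
    return snapshot.tag == "genes" and all(
        child.tag == "gene" and bool(child.findtext("name"))
        for child in snapshot)


def names(snapshot):
    return [gene.findtext("name") for gene in snapshot]


february = snapshot_at(TEMPORAL, "2006-02-01")
march = snapshot_at(TEMPORAL, "2006-03-06")
print(names(february), conforms(february))
print(names(march), conforms(march))
```

A construction is "good" in the slide's sense when this check succeeds for every T, not only for the times at which a snapshot was actually downloaded.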
Related Work – Temporal XML
  Change detection and management
    Nguyen, Abiteboul, Cobena, Preda, SIGMOD 2001
    Xyleme’s Alerter, described in Data Engineering Bulletin, 2001
    Dyreson, Lin, Wang, WWW 2004
    Leonardi, Bhowmick, ER 2006
  Representing time-varying XML documents (versioning)
    Chawathe, Abiteboul, Widom, ICDE 1998
    Dyreson, Böhlen, Jensen, VLDB 1999
    Chien, Tsotras, Zaniolo, VLDB 2000
    Marian, Abiteboul, Cobena, Mignet, VLDB 2001
    Buneman, Khanna, Tajima, Tan, SIGMOD 2002, TODS 2004
    Rosado, Marquez, Gonzalez, ECDM 2006
  XML Versioning Use Cases (W3C)
Related Work – XML Schemas
  XML Schema languages
    Many, but XML Schema is backed by the W3C
  Incremental XML validation
    Bouchou & Halfeld-Ferrari, DBPL 2003
    Papakonstantinou & Vianu, ICDT 2003
    Barbosa, Mendelzon, Libkin, Mignet, Arenas, ICDE 2004
  Temporal XML schemas
    Currim, Currim, Dyreson, Snodgrass, EDBT 2004
    Dyreson, Snodgrass, Currim, Currim, Joshi, XSDM 2006
An Overarching Vision
  Aspect-oriented programming
    Cross-cutting concerns
    Augment behavior without changing the code
    Example aspects: logging, garbage collection
  [Diagram labels: Program.java; Aspect.java; Cut points; weaver; Aspect Enhanced.java; javac]
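The phrase "augment behavior without changing the code" can be illustrated with a small logging aspect. This is only an analogy: it uses a Python decorator rather than the Java weaver pictured on the slide, and the names `logged`, `LOG`, and `add` are made up for the example.

```python
import functools

LOG = []


def logged(func):
    """Logging aspect: records each call without editing func's body."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        LOG.append(f"call {func.__name__}{args}")
        return func(*args, **kwargs)
    return wrapper


@logged  # the "cut point": where the aspect attaches to the program
def add(a, b):
    return a + b


result = add(2, 3)
print(result, LOG)
```

The body of `add` knows nothing about logging, which is the separation that the slide proposes carrying over from code to schemas.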
Aspects for Data?
  What are cross-cutting concerns?
    Milieu of metadata
    Time is an aspect
  [Diagram labels: security; time; reliability]
Aspects in Schema Design
  Schema for aspect + schema for data
  Our paper describes the “plumbing” for a temporal aspect
  [Diagram labels: data (snapshot) schema; aspect schema; schema weaver; schema tapestry; conventional validating parser; aspect validator; Validation; aspect + XML data; imports schema; snapshot gluer; snapshot data; imports schema]
Our Contributions
  Temporal schema specification
    What is time-varying
    Some simple constraints
  Validate temporal data
    ΔD [t, now] cost
  Upwards compatible with XML Schema
  Handle schema evolution (Dyreson et al., XSDM ’06)
  Suite of tools
    Reuse and extend existing tools
XSchema Project Tools (Beta)
  VALIDATOR – Validates a temporal XML document against conventional and temporal constraints
  SQUASH – Generates a temporal document from a sequence of snapshot documents
  UNSQUASH – Extracts snapshot documents from a temporal document
  RESQUASH – Changes a document's representation to be consistent with a new physical annotation
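The SQUASH step can be sketched roughly as follows: walk the snapshots in time order, match items by identifier, and open a new version whenever an item's content changes. This is not the project's SQUASH tool. The function `squash`, the dictionary it returns, the `<genes>/<gene>/<name>` layout, the year 2006, and the use of canonical XML to detect change are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

FOREVER = "9999-12-31"  # assumed marker for "until changed"

JANUARY = """<genes>
  <gene><name>trypsin 4</name></gene>
</genes>"""

MARCH = """<genes>
  <gene><name>trypsin 4</name></gene>
  <gene><name>beta-cell receptor</name></gene>
</genes>"""


def squash(snapshots, element="gene", key="name"):
    """Fold dated snapshots into per-item version histories.

    snapshots: list of (iso_date, xml_string) in chronological order.
    Returns {item_id: [[begin, end, canonical_xml], ...]}.
    """
    history = {}
    for when, xml_text in snapshots:
        root = ET.fromstring(xml_text)
        seen = set()
        for elem in root.iter(element):
            item_id = elem.findtext(key)
            seen.add(item_id)
            serialized = ET.tostring(elem, encoding="unicode").strip()
            content = ET.canonicalize(serialized, strip_text=True)
            versions = history.setdefault(item_id, [])
            if versions and versions[-1][1] == FOREVER:
                if versions[-1][2] == content:
                    continue  # unchanged: the current version stays open
                versions[-1][1] = when  # changed: close the current version
            versions.append([when, FOREVER, content])
        # Close the current version of any item missing from this snapshot.
        for item_id, versions in history.items():
            if item_id not in seen and versions[-1][1] == FOREVER:
                versions[-1][1] = when
    return history


history = squash([("2006-01-01", JANUARY), ("2006-03-06", MARCH)])
for item_id, versions in history.items():
    print(item_id, [(begin, end) for begin, end, _ in versions])
```

UNSQUASH would be the inverse: select, for a given time, the versions whose period contains it and reassemble them into a snapshot document.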