Tools for XML Data Exchange
Dan Suciu, AT&T Labs
Joint work with Mary Fernandez

XML Has Many Facets
XML for fancier Web pages
– XML generated with structural editors
XML for messaging
– generated during applications
XML for Data Exchange
– generated from legacy data

XML in Data Exchange
communities agree on common DTD
export their data in XML
exchange over HTTP protocol
applications understand only that DTD

An Example of XML Data
Addison-Wesley; Serge Abiteboul, Rick Hull, Victor Vianu; Foundations of Databases; 1995
Freeman; Jeffrey D. Ullman; Principles of Database and Knowledge Base Systems; 1998

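The two records on this slide can be written out as XML along the following lines. This is only a sketch: the element names (bib, book, publisher, author, title, year) and the helper load_books are assumptions for illustration; only the text values come from the slide.

```python
import xml.etree.ElementTree as ET

# Element names are assumed; only the text values appear on the slide.
BIB_XML = """
<bib>
  <book>
    <publisher>Addison-Wesley</publisher>
    <author>Serge Abiteboul</author>
    <author>Rick Hull</author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book>
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>
"""


def load_books(xml_text):
    """Parse the bibliography and return one dict per book element."""
    root = ET.fromstring(xml_text.strip())
    books = []
    for book in root.findall("book"):
        books.append({
            "publisher": book.findtext("publisher"),
            "authors": [a.text for a in book.findall("author")],
            "title": book.findtext("title"),
            "year": book.findtext("year"),
        })
    return books


if __name__ == "__main__":
    for b in load_books(BIB_XML):
        print(b["title"], b["year"], ", ".join(b["authors"]))
```

Note that a book may carry any number of author elements, which is the kind of nesting a flat relation cannot express directly.
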
XML Exchange Vision
[Figure labels: application; relational data; object-relational; legacy data; Transform; Integrate; Warehouse; XML Data; WEB (HTTP)]

Tools
export legacy data to XML
– RXL
query/transform/integrate XML data
– XML-QL
compress XML data
– XMill
store/process incoming XML data
– STORED

XML-QL: A Query Language for XML (8/98)
W3C new Working Group on QL (9/99)
XML-QL characteristics:
– relationally complete (like SQL)
– XML input, XML output
– queries, transforms, integrates XML data
[Deutsch et al., 1999 (WWW8)]

Querying in XML-QL
where Morgan Kaufmann $a in “…”
construct $a
Pattern: the where clause

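The effect of a where/construct query of this kind can be approximated in Python. The sketch below assumes the pattern matches book elements whose publisher is Morgan Kaufmann and binds $a to each author; the tag names, the sample data, and the function authors_by_publisher are hypothetical, since the slide's markup was lost.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; tag names and the "Author A/B" values are placeholders.
SAMPLE = """
<bib>
  <book>
    <publisher>Morgan Kaufmann</publisher>
    <author>Author A</author>
    <author>Author B</author>
  </book>
  <book>
    <publisher>Addison-Wesley</publisher>
    <author>Serge Abiteboul</author>
  </book>
</bib>
"""


def authors_by_publisher(xml_text, publisher):
    """Emulate: where <pattern binding $a> in source, construct <author>$a</author>."""
    root = ET.fromstring(xml_text.strip())
    result = ET.Element("result")
    for book in root.iter("book"):
        # The pattern: a book whose publisher text equals the constant.
        if book.findtext("publisher") == publisher:
            # Each matching author yields one binding of $a.
            for author in book.findall("author"):
                # The template: one output element per binding.
                ET.SubElement(result, "author").text = author.text
    return result


if __name__ == "__main__":
    out = authors_by_publisher(SAMPLE, "Morgan Kaufmann")
    print(ET.tostring(out, encoding="unicode"))
```

The loop makes the semantics explicit: the pattern produces a set of variable bindings, and the template is instantiated once per binding.
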
Transformations in XML-QL
Note: </> abbreviates a closing tag
where $a in “…”
construct $a $l
Template: the construct clause

Transformations in XML-QL
where $a in “…”
construct $a $l
Skolem Functions in Templates

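A Skolem function in a template returns the same output node whenever it is called with the same arguments, so repeated bindings are grouped under one element instead of producing duplicates. The sketch below illustrates that idea only; the function name group_with_skolem, the Skolem term F($a), the tag names, and the sample bindings are all assumptions, not the slide's query.

```python
import xml.etree.ElementTree as ET


def group_with_skolem(bindings):
    """bindings: iterable of (author, title) pairs, one per variable binding."""
    root = ET.Element("results")
    nodes = {}  # Skolem term -> element already created for that term
    for author, title in bindings:
        key = ("F", author)  # the Skolem term F($a)
        node = nodes.get(key)
        if node is None:
            # First time this term is seen: create the element once.
            node = ET.SubElement(root, "result", {"id": "F(%s)" % author})
            ET.SubElement(node, "author").text = author
            nodes[key] = node
        # Later bindings with the same term extend the existing element.
        ET.SubElement(node, "title").text = title
    return root


if __name__ == "__main__":
    demo = [("Ullman", "T1"), ("Vianu", "T2"), ("Ullman", "T3")]
    print(ET.tostring(group_with_skolem(demo), encoding="unicode"))
```

Without the dictionary lookup, every binding would create its own result element, which corresponds to a template without Skolem functions.
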
Data Integration in XML-QL
{ where $n $t in “…”
  construct $t }
{ where $n $r in “…”
  construct $r }
...

RXL: Export Legacy Data To XML
legacy data
– fragmented into many flat relations
– 3rd normal form
– schema is proprietary
XML data
– nested
– un-normalized
– schema designed by agreement

RXL: An Example
relational database: Store, SB, Book
virtual XML view: n1 ..., n2 ..., …

A Simple RXL Query
specify XML view declaratively
from Store, SB, Book
where Store.sid=SB.sid and SB.bid=Book.bid
construct Store.name Book.title

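What such a view computes can be sketched with a plain SQL join whose rows are wrapped in XML elements. The relation and column names (Store.sid, Store.name, SB.sid, SB.bid, Book.bid, Book.title) come from the query above; the element names, the sample rows, and the helpers build_db and export_view are assumptions, and this is not how RXL itself is implemented.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_db():
    """Create a tiny in-memory database with hypothetical rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Store(sid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE Book(bid INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE SB(sid INTEGER, bid INTEGER);
        INSERT INTO Store VALUES (1, 'n1'), (2, 'n2');
        INSERT INTO Book VALUES (10, 'The Calculus'), (11, 'Foundations of Databases');
        INSERT INTO SB VALUES (1, 10), (1, 11), (2, 11);
    """)
    return con


def export_view(con):
    """Run the from/where part as SQL, then apply the construct part per row."""
    rows = con.execute("""
        SELECT Store.name, Book.title
        FROM Store, SB, Book
        WHERE Store.sid = SB.sid AND SB.bid = Book.bid
        ORDER BY Store.name, Book.title
    """)
    root = ET.Element("view")  # assumed root element
    for name, title in rows:
        store = ET.SubElement(root, "store")  # assumed element names
        ET.SubElement(store, "name").text = name
        ET.SubElement(store, "book").text = title
    return root


if __name__ == "__main__":
    print(ET.tostring(export_view(build_db()), encoding="unicode"))
```

This version materializes the whole view; the point of a virtual view is that the XML is produced only for the part a user query actually needs.
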
RXL: Querying the XML View
users ask XML-QL queries:
– find stores that sell “The Calculus”
where $n The Calculus
construct $n

RXL: Query composition
system composes query with view:
from Store, SB, Book
where Store.sid=SB.sid and SB.bid=Book.bid and Book.title=“The Calculus”
construct Store.name
[Figure labels: Store, SB, Book; n1 ..., n2 ..., …; RXL; XML-QL]

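The benefit of composition can be seen by comparing two ways of answering the user query: materializing the whole view and filtering it, or running the single composed query with the extra predicate on Book.title. The sketch below assumes the same three relations; the sample rows and function names are hypothetical.

```python
import sqlite3


def make_db():
    """Tiny in-memory database with hypothetical rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Store(sid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE Book(bid INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE SB(sid INTEGER, bid INTEGER);
        INSERT INTO Store VALUES (1, 'n1'), (2, 'n2');
        INSERT INTO Book VALUES (10, 'The Calculus'), (11, 'Foundations of Databases');
        INSERT INTO SB VALUES (1, 10), (1, 11), (2, 11);
    """)
    return con


VIEW_SQL = """
    SELECT Store.name, Book.title
    FROM Store, SB, Book
    WHERE Store.sid = SB.sid AND SB.bid = Book.bid
"""


def stores_selling_naive(con, title):
    """Materialize every (store, title) pair of the view, then filter."""
    return sorted(n for n, t in con.execute(VIEW_SQL) if t == title)


def stores_selling_composed(con, title):
    """Composed query: the title predicate is pushed into the SQL."""
    sql = """
        SELECT Store.name
        FROM Store, SB, Book
        WHERE Store.sid = SB.sid AND SB.bid = Book.bid AND Book.title = ?
    """
    return sorted(n for (n,) in con.execute(sql, (title,)))


if __name__ == "__main__":
    con = make_db()
    print(stores_selling_naive(con, "The Calculus"))
    print(stores_selling_composed(con, "The Calculus"))
```

Both functions return the same stores, but the composed form lets the relational engine apply the selection before any XML is built.
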
Compressing XML Data
for exchange and archiving
can use general tool (gzip)
but specialized tool twice as good (XMill)

XMill Example: Weblogs
log entry: |GET / HTTP/1.0|text/html|200|1997/10/01-00:00:02|-|4478 |-|-| [ja] (Win95; I)
XML version: GET / HTTP/1.0 text/html /10/01-00:00: Mozilla/3.01 [ja] (Win95; I)

XMill Example: Weblogs
weblog.dat: 15.9MB     weblog.dat.gz: 1.6MB
weblog.xml: 24.2MB     weblog.xml.gz: 2.1MB
xmill -p // weblog.xml weblog1.xmi              weblog1.xmi: 1.75MB
xmill weblog.xml weblog2.xmi                    weblog2.xmi: 1.33MB
xmill -f settings.pz weblog.xml weblog3.xmi     weblog3.xmi: 0.82MB

XMill: Fine Tuning the Compression
-p//apache:host=>seqcomb(u8 "." u8 "." u8 "." u8)
-p//apache:userAgent=>seq(e "/" e)
-p//apache:byteCount=>u
-p//apache:statusCode=>e
-p//apache:contentType=>e
-p//apache:requestLine=>seq("GET " rep("/" e) " HTTP/1." e)
-p//apache:date=>seq(u "/" u8 "/" u8 "-" u8 ":" di ":" di)
-p//apache:referer=>or(seq("file:" t) seq(" or(seq(rep("." e) "/" rep("/" e)) rep("." e))) t)

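The options above attach a compressor to the values found under a given path. The underlying idea, separating the document structure from its data and grouping data values by path into containers that are compressed independently, can be sketched as follows. This is a simplified illustration using zlib, not XMill's actual format or its specialized compressors; the tag names, the second log entry, and all function names are assumptions, and attributes and tail text are ignored.

```python
import zlib
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical input; only "GET / HTTP/1.0", "text/html" and "200" echo the slides.
SAMPLE_LOG = (
    "<log>"
    "<entry><requestLine>GET / HTTP/1.0</requestLine>"
    "<contentType>text/html</contentType><statusCode>200</statusCode></entry>"
    "<entry><requestLine>GET /a HTTP/1.0</requestLine>"
    "<contentType>text/html</contentType><statusCode>404</statusCode></entry>"
    "</log>"
)


def split_into_containers(xml_text):
    """Return (structure tokens, {path: [text values in document order]})."""
    structure = []
    containers = defaultdict(list)

    def walk(elem, path):
        path = path + "/" + elem.tag
        structure.append("<" + elem.tag)
        text = (elem.text or "").strip()
        if text:
            containers[path].append(text)  # data goes to the path's container
            structure.append("#")          # placeholder left in the structure
        for child in elem:
            walk(child, path)
        structure.append(">")

    walk(ET.fromstring(xml_text), "")
    return structure, dict(containers)


def compress_containers(structure, containers):
    """Compress the structure stream and each container separately."""
    blobs = {"structure": zlib.compress(" ".join(structure).encode("utf-8"))}
    for path, values in containers.items():
        blobs[path] = zlib.compress("\n".join(values).encode("utf-8"))
    return blobs


def decompress_container(blob):
    """Recover the list of values stored in one compressed container."""
    return zlib.decompress(blob).decode("utf-8").split("\n")


if __name__ == "__main__":
    structure, containers = split_into_containers(SAMPLE_LOG)
    blobs = compress_containers(structure, containers)
    for name, blob in blobs.items():
        print(name, len(blob), "bytes")
```

Grouping values of the same kind (all status codes, all content types) places similar strings next to each other, which is what makes a per-path compressor choice worthwhile.
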
Storing XML Data
Scenario:
– receive a large XML data instance
– want to store, manage it
Could build an XML management system from scratch (eXcelon)
Preferably: use existing database systems

Storing XML: Ternary Relation
[Figure labels: objects &o1 to &o6; edges paper, title, author, year; values “The Calculus”, “…”, “1986”; relations Ref and Val]
[Florescu, Kossmann 1999]

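One way to read the figure is that the XML graph is shredded into two relations: Ref holds (source, label, target) edges between objects, and Val holds (object, value) pairs for leaves. The sketch below assumes exactly that layout, which may differ from the schema in the cited paper; the sample document, the author value, and the function shred are hypothetical.

```python
import itertools
import xml.etree.ElementTree as ET

# Hypothetical document; "A. Author" stands in for the value elided on the slide.
SAMPLE = (
    "<bib><paper>"
    "<title>The Calculus</title>"
    "<author>A. Author</author>"
    "<year>1986</year>"
    "</paper></bib>"
)


def shred(xml_text):
    """Return (root oid, Ref rows, Val rows) for an XML document."""
    counter = itertools.count(1)
    ref = []  # (source oid, edge label, target oid)
    val = []  # (oid, text value)

    def visit(elem):
        oid = "&o%d" % next(counter)  # object ids assigned in document order
        text = (elem.text or "").strip()
        if text:
            val.append((oid, text))
        for child in elem:
            child_oid = visit(child)
            ref.append((oid, child.tag, child_oid))
        return oid

    root_oid = visit(ET.fromstring(xml_text))
    return root_oid, ref, val


if __name__ == "__main__":
    root_oid, ref, val = shred(SAMPLE)
    print("root:", root_oid)
    for row in sorted(ref):
        print("Ref", row)
    for row in val:
        print("Val", row)
```

Both relations have a fixed schema regardless of the DTD, so any XML document can be loaded into an existing relational system this way, at the cost of many joins when reassembling elements.
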
Storing XML: Derive Schema from DTD
DTD:
ODMG classes:
class Employee public type tuple (name:string, address:Address, project:List(Project))
class Address public type tuple (street:string, …)
[Christophides et al. 1994, Shanmugasundaram et al. 1999]

STORED Approach: Mine Data to Derive Schema
[Figure labels: paper; author; title; year; fn; ln; Paper1; Paper2]
[Deutsch et al. 1999]

Summary
XML - simple (?), lightweight syntax
Challenge: build bridges to existing database tools
XML in data exchange: YES
XML as a new data model: NO

More Info
Data on the Web: From Relations to Semistructured Data and XML
Morgan Kaufmann, 1999