Tools for XML Data Exchange. Dan Suciu, AT&T Labs. Joint work with Mary Fernandez.

Presentation transcript:

XML Has Many Facets
XML for fancier Web pages
  – XML generated with structural editors
XML for messaging
  – generated during applications
XML for Data Exchange
  – generated from legacy data

XML in Data Exchange
communities agree on a common DTD
export their data in XML
exchange over HTTP
applications understand only that DTD

An Example of XML Data

Addison-Wesley
Serge Abiteboul
Rick Hull
Victor Vianu
Foundations of Databases
1995

Freeman
Jeffrey D. Ullman
Principles of Database and Knowledge Base Systems
1998
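
The element tags of this example are not present in the transcript. As a rough sketch only, assuming hypothetical element names (`bib`, `book`, `publisher`, `author`, `title`, `year` are illustrative and not taken from the slide), the two entries could be encoded and read back in Python like this:

```python
import xml.etree.ElementTree as ET

# Element names are assumptions for illustration; only the text values
# come from the slide.
BIB_XML = """
<bib>
  <book>
    <publisher>Addison-Wesley</publisher>
    <author>Serge Abiteboul</author>
    <author>Rick Hull</author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book>
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>
"""

def parse_books(xml_text):
    """Return one dict per <book>, keeping repeated authors as a list."""
    root = ET.fromstring(xml_text)
    books = []
    for book in root.findall("book"):
        books.append({
            "publisher": book.findtext("publisher"),
            "authors": [a.text for a in book.findall("author")],
            "title": book.findtext("title"),
            "year": book.findtext("year"),
        })
    return books

books = parse_books(BIB_XML)
```

The repeated `author` element is the kind of nesting that a flat relational row cannot hold directly, which is why it is kept as a list here.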

XML Exchange Vision
[Figure: applications; relational, object-relational, and legacy data sources; XML data exchanged over the Web (HTTP); Transform, Integrate, Warehouse]

Tools
export legacy data to XML
  – RXL
query/transform/integrate XML data
  – XML-QL
compress XML data
  – XMill
store/process incoming XML data
  – STORED

XML-QL
XML-QL: A Query Language for XML (8/98)
W3C new Working Group on QL (9/99)
XML-QL characteristics:
  – relationally complete (like SQL)
  – XML input, XML output
  – queries, transforms, integrates XML data
[Deutsch et al., 1999 (WWW8)]

Querying in XML-QL
where Morgan Kaufmann $a in “
construct $a
Callout on the where clause: Pattern
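
The query's tags and source URL are missing from the transcript, so the following is only a sketch of what a where/construct query of this shape does: match a pattern against each book, bind `$a` once per matching author, and wrap each binding in the output. The element names (`bib`, `book`, `publisher`, `author`, `result`), the sample data, and the function names are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document; names and values are illustrative only.
SAMPLE = """
<bib>
  <book>
    <publisher>Morgan Kaufmann</publisher>
    <author>Author A</author>
    <author>Author B</author>
  </book>
  <book>
    <publisher>Addison-Wesley</publisher>
    <author>Author C</author>
  </book>
</bib>
"""

def match_authors(xml_text, publisher_name):
    """'where' part: bind one value per author of each matching book."""
    root = ET.fromstring(xml_text)
    bindings = []
    for book in root.iter("book"):
        # The pattern requires a publisher child with the given text.
        if any(p.text == publisher_name for p in book.findall("publisher")):
            bindings.extend(a.text for a in book.findall("author"))
    return bindings

def construct(bindings):
    """'construct' part: emit one output element per binding."""
    out = ET.Element("result")
    for value in bindings:
        ET.SubElement(out, "author").text = value
    return out

result = construct(match_authors(SAMPLE, "Morgan Kaufmann"))
```

Separating matching from construction mirrors the two clauses of the query: the pattern decides which bindings exist, and the construct clause decides how each binding is rendered.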

Transformations in XML-QL
Note: abbreviates or or...
where $a in “
construct $a $l
Callout on the construct clause: Template

Transformations in XML-QL
where $a in “
construct $a $l
Callout: Skolem Functions in Templates
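
The template on this slide is not recoverable, so the sketch below only illustrates the general idea of a Skolem function in a construct template: the same arguments always yield the same output element, which groups results that would otherwise be emitted as separate elements. The helper `make_skolem`, the element names, and the sample bindings are hypothetical.

```python
import xml.etree.ElementTree as ET

def make_skolem(tag, parent):
    """Return a function that creates one <tag> element per distinct argument tuple."""
    created = {}

    def skolem(*args):
        if args not in created:
            # First time these arguments are seen: create a fresh element.
            created[args] = ET.SubElement(parent, tag, id=f"{tag}{len(created) + 1}")
        return created[args]

    return skolem

def group_titles(bindings):
    """Group (author, title) bindings under one <person> element per author."""
    root = ET.Element("result")
    person = make_skolem("person", root)
    for author, title in bindings:
        p = person(author)  # same author -> same element
        if p.find("name") is None:
            ET.SubElement(p, "name").text = author
        ET.SubElement(p, "title").text = title
    return root

# Hypothetical bindings, as if produced by a where clause.
grouped = group_titles([
    ("Author A", "Title 1"),
    ("Author B", "Title 3"),
    ("Author A", "Title 2"),
])
```

Without the Skolem function, the three bindings would produce three `person` elements; with it, the two bindings for the same author land in one element.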

Data Integration in XML-QL
{ where $n $t in “ construct $t }
{ where $n $r in “ construct $r }
...

RXL: Export Legacy Data To XML
legacy data
  – fragmented into many flat relations
  – 3rd normal form
  – schema is proprietary
XML data
  – nested
  – un-normalized
  – schema designed by agreement

RXL: An Example
relational database: Store, SB, Book
virtual XML view: n1... n2... …

A Simple RXL Query
specify XML view declaratively

from Store, SB, Book
where Store.sid=SB.sid and SB.bid=Book.bid
construct Store.name Book.title
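
To make the view definition concrete, here is a minimal sketch that evaluates the same three-way join with SQLite and wraps each result row in XML. It is not RXL itself. The column types, the sample rows, and the output element names (`view`, `store`, `name`, `book`) are assumptions, since the construct clause's tags are missing from the transcript.

```python
import sqlite3
import xml.etree.ElementTree as ET

def build_db():
    """Create an in-memory database with hypothetical Store, SB, Book rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Store(sid INTEGER, name TEXT);
        CREATE TABLE SB(sid INTEGER, bid INTEGER);
        CREATE TABLE Book(bid INTEGER, title TEXT);
        INSERT INTO Store VALUES (1, 'n1'), (2, 'n2');
        INSERT INTO Book VALUES (10, 'The Calculus'), (11, 'Another Title');
        INSERT INTO SB VALUES (1, 10), (1, 11), (2, 11);
    """)
    return conn

def export_view(conn):
    """Evaluate the join from the slide and emit one <store> element per row."""
    root = ET.Element("view")
    rows = conn.execute("""
        SELECT Store.name, Book.title
        FROM Store, SB, Book
        WHERE Store.sid = SB.sid AND SB.bid = Book.bid
        ORDER BY Store.sid, Book.bid
    """)
    for name, title in rows:
        store = ET.SubElement(root, "store")
        ET.SubElement(store, "name").text = name
        ET.SubElement(store, "book").text = title
    return root

view = export_view(build_db())
```

This materializes the whole view, which is the simplest reading of the query; a system that keeps the view virtual would avoid doing so.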

RXL: Querying the XML View
users ask XML-QL queries:
  – find stores who sell “The Calculus”

where $n The Calculus
construct $n

RXL: Query composition
system composes query with view:

from Store, SB, Book
where Store.sid=SB.sid and SB.bid=Book.bid and Book.title=“The Calculus”
construct Store.name

[Figure: the XML-QL query is asked over the virtual XML view (n1... n2... …), which RXL defines over Store, SB, Book]
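
The benefit of composition can be sketched by comparing two ways of answering the user query: materialize every row of the view and filter afterwards, or push the title condition into the join as the composed query does. The schema details, sample rows, and function names below are assumptions; only the join and the extra `Book.title` condition come from the slide.

```python
import sqlite3

def build_db():
    """In-memory database with hypothetical rows for Store, SB, Book."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Store(sid INTEGER, name TEXT);
        CREATE TABLE SB(sid INTEGER, bid INTEGER);
        CREATE TABLE Book(bid INTEGER, title TEXT);
        INSERT INTO Store VALUES (1, 'n1'), (2, 'n2');
        INSERT INTO Book VALUES (10, 'The Calculus'), (11, 'Another Title');
        INSERT INTO SB VALUES (1, 10), (1, 11), (2, 11);
    """)
    return conn

def stores_selling_via_view(conn, title):
    """Materialize the full view, then filter in the application."""
    rows = conn.execute("""
        SELECT Store.name, Book.title
        FROM Store, SB, Book
        WHERE Store.sid = SB.sid AND SB.bid = Book.bid
    """)
    return sorted(name for name, t in rows if t == title)

def stores_selling_composed(conn, title):
    """Composed query: the title condition is evaluated by the database."""
    rows = conn.execute("""
        SELECT Store.name
        FROM Store, SB, Book
        WHERE Store.sid = SB.sid AND SB.bid = Book.bid AND Book.title = ?
    """, (title,))
    return sorted(name for (name,) in rows)

conn = build_db()
answer = stores_selling_composed(conn, "The Calculus")
```

Both functions return the same stores; the composed form never builds the parts of the view that the user query does not need.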

Compressing XML Data
for exchange and archiving
can use a general tool (gzip)
but a specialized tool (XMill) is twice as good

XMill Example: Weblogs
log record: |GET / HTTP/1.0|text/html|200|1997/10/01-00:00:02|-|4478 |-|-| [ja] (Win95; I)
same record in XML (tags not preserved): GET / HTTP/1.0 text/html /10/01-00:00: Mozilla/3.01 [ja] (Win95; I)

XMill Example: Weblogs

File            Size      Produced by
weblog.dat      15.9MB
weblog.dat.gz   1.6MB
weblog.xml      24.2MB
weblog.xml.gz   2.1MB
weblog1.xmi     1.75MB    xmill -p // weblog.xml
weblog2.xmi     1.33MB    xmill weblog.xml
weblog3.xmi     0.82MB    xmill -f settings.pz weblog.xml
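
One idea a specialized XML compressor can exploit is to group values by the element they belong to and compress each group separately, so that similar values sit next to each other. The sketch below shows that idea with `zlib` on synthetic records. It is not XMill's implementation, the data is invented, and no particular size ratio is claimed because the outcome depends on the data.

```python
import zlib

FIELDS = ("host", "status", "bytes")

def make_records(n):
    """Generate synthetic log-like records (hypothetical, not the slide's weblog)."""
    return [
        {
            "host": f"10.0.{i % 7}.{i % 251}",
            "status": "200" if i % 10 else "404",
            "bytes": str(1000 + (i * 37) % 5000),
        }
        for i in range(n)
    ]

def compress_interleaved(records):
    """Compress record-at-a-time text, as a general-purpose tool would see it."""
    text = "\n".join("|".join(r[f] for f in FIELDS) for r in records)
    return zlib.compress(text.encode("utf-8"))

def compress_grouped(records):
    """Compress one container per field, each holding only that field's values."""
    return {
        f: zlib.compress("\n".join(r[f] for r in records).encode("utf-8"))
        for f in FIELDS
    }

def decompress_grouped(containers):
    """Rebuild the records from the per-field containers (expects at least one record)."""
    columns = {f: zlib.decompress(c).decode("utf-8").split("\n") for f, c in containers.items()}
    count = len(columns[FIELDS[0]])
    return [{f: columns[f][i] for f in FIELDS} for i in range(count)]

records = make_records(1000)
interleaved_size = len(compress_interleaved(records))
grouped_size = sum(len(c) for c in compress_grouped(records).values())
```

Comparing `interleaved_size` with `grouped_size` on real logs is the interesting experiment; the round trip through `decompress_grouped` confirms that grouping loses no information.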

XMill: Fine Tuning the Compression
-p//apache:host=>seqcomb(u8 "." u8 "." u8 "." u8)
-p//apache:userAgent=>seq(e "/" e)
-p//apache:byteCount=>u
-p//apache:statusCode=>e
-p//apache:contentType=>e
-p//apache:requestLine=>seq("GET " rep("/" e) " HTTP/1." e)
-p//apache:date=>seq(u "/" u8 "/" u8 "-" u8 ":" di ":" di)
-p//apache:referer=>or(seq("file:" t) seq(" or(seq(rep("." e) "/" rep("/" e)) rep("." e))) t)
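
These options attach a type-specific encoder to each element. As an illustration of why that helps, the host setting can be read as four small unsigned integers separated by dots, which is an interpretation of the notation rather than something the slide spells out. Under that reading, a dotted host address of up to 15 characters fits in 4 bytes. The function names below are hypothetical and this is not XMill code.

```python
import struct

def encode_host(dotted):
    """Pack a dotted-quad host such as '192.168.0.1' into 4 bytes."""
    parts = [int(p) for p in dotted.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not a dotted quad of 8-bit integers: {dotted!r}")
    return struct.pack("4B", *parts)

def decode_host(packed):
    """Inverse of encode_host: 4 bytes back to dotted-quad text."""
    return ".".join(str(b) for b in struct.unpack("4B", packed))

packed = encode_host("192.168.0.1")
```

A value that does not fit the declared shape has to be rejected or handled by a fallback encoder, which is why the sketch raises `ValueError` instead of guessing.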

Storing XML Data
Scenario:
  – receive a large XML data instance
  – want to store, manage it
Could build an XML management system from scratch (eXcelon)
Preferably: use existing database systems

Storing XML: Ternary Relation [Florescu, Kossmann 1999]
[Figure: graph with objects &o1 to &o6, edges labeled paper, title, author, year, and leaf values “The Calculus”, “…”, “1986”, stored in two tables, Ref and Val]
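
A minimal sketch of this storage scheme, assuming that `Ref` holds one row per edge (source object, label, destination object) and `Val` holds one row per leaf value. The column names and the exact shape of the graph are assumptions, because the transcript only preserves the object identifiers, labels, and values.

```python
import sqlite3

def load_graph():
    """Store a small labeled graph as edge rows (Ref) and leaf values (Val)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Ref(source TEXT, label TEXT, dest TEXT);
        CREATE TABLE Val(oid TEXT, value TEXT);
        INSERT INTO Ref VALUES
            ('o1', 'paper',  'o2'),
            ('o2', 'title',  'o3'),
            ('o2', 'author', 'o4'),
            ('o2', 'year',   'o5');
        INSERT INTO Val VALUES
            ('o3', 'The Calculus'),
            ('o4', '...'),
            ('o5', '1986');
    """)
    return conn

def values_on_path(conn, first_label, second_label):
    """Follow a two-step path such as paper/title: one self-join of Ref per step."""
    rows = conn.execute("""
        SELECT v.value
        FROM Ref AS a, Ref AS b, Val AS v
        WHERE a.label = ? AND b.source = a.dest AND b.label = ? AND v.oid = b.dest
    """, (first_label, second_label))
    return [value for (value,) in rows]

titles = values_on_path(load_graph(), "paper", "title")
```

The scheme needs no schema knowledge, but each step of a path costs a self-join, which is the usual trade-off of a generic edge table.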

Storing XML: Derive Schema from DTD
[Christophides et al. 1994, Shanmugasundaram et al. 1999]
DTD:
ODMG classes:
class Employee public type tuple (name:string, address:Address, project:List(Project))
class Address public type tuple (street:string, …)

STORED Approach: Mine Data to Derive Schema [Deutsch et al. 1999]
[Figure: data with labels paper, author, title, year, fn, ln, mapped to tables Paper1 and Paper2]
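
As a much simplified sketch of mining data to derive a schema, the code below counts how often each label occurs across records, promotes the frequent labels to relational columns, and keeps the remaining fields aside. The threshold, the sample records, and the function names are assumptions; this is not the STORED algorithm itself.

```python
from collections import Counter

def derive_columns(records, min_support):
    """Pick labels that appear in at least min_support (0..1) of the records."""
    counts = Counter(label for record in records for label in record)
    return sorted(l for l, c in counts.items() if c / len(records) >= min_support)

def split_record(record, columns):
    """Split a record into a fixed-width row and its leftover fields."""
    row = {c: record.get(c) for c in columns}
    overflow = {k: v for k, v in record.items() if k not in columns}
    return row, overflow

# Hypothetical semistructured records using labels that appear on the slide.
papers = [
    {"title": "T1", "author": "A1", "year": "1986"},
    {"title": "T2", "author": "A2"},
    {"title": "T3", "author": "A3", "year": "1990", "note": "draft"},
]

columns = derive_columns(papers, min_support=0.6)
row, overflow = split_record(papers[2], columns)
```

Lowering `min_support` yields wider tables with more nulls, while raising it pushes more data into the leftover part, so the threshold is the main tuning knob in this sketch.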

Summary
XML - simple (?), lightweight syntax
Challenge: build bridges to existing database tools
XML in data exchange: YES
XML as a new data model: NO

More Info
Data on the Web: From Relations to Semistructured Data and XML
Morgan Kaufmann, 1999