XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.

Slides:



Advertisements
Similar presentations
Spring Part III: Introduction to XPath XML Path Language.
Advertisements

Web Data Management XQuery 1. In this lecture Summary of XQuery FLWOR expressions – For, Let, Where, Order by, Return FOR and LET expressions Collections.
XML May 3 rd, XQuery Based on Quilt (which is based on XML-QL) Check out the W3C web site for the latest. XML Query data model –Ordered !
XML, XML Schema, Xpath and XQuery Slides collated from various sources, many from Dan Suciu at Univ. of Washington.
Database Management Systems, R. Ramakrishnan1 Introduction to Semistructured Data and XML Chapter 27, Part D Based on slides by Dan Suciu University of.
Database Management Systems, R. Ramakrishnan1 Introduction to Semistructured Data and XML Chapter 27, Part D Based on slides by Dan Suciu University of.
1 Part 3: Query Languages Managing XML and Semistructured Data.
Agenda from now on Done: SQL, views, transactions, conceptual modeling, E/R, relational algebra. Starting: XML To do: the database engine: –Storage –Query.
Database Management Systems, R. Ramakrishnan1 Introduction to Semistructured Data and XML Chapter 27.
Querying XML (cont.). Comments on XPath? What’s good about it? What can’t it do that you want it to do? How does it compare, say, to SQL?
1 Lecture 10 XML Wednesday, October 18, XML Outline XML (4.6, 4.7) –Syntax –Semistructured data –DTDs.
1 Lecture 12: XQuery in SQL Server Monday, October 23, 2006.
1 Lecture 9: XQuery. 2 XQuery Motivation XPath expressivity insufficient –no join queries (as in SQL) –no changes to the XML structure possible –no quantifiers.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 357 Database Systems I Query Languages for XML.
XQuery language Presented by: Tayeb sbihi supervised by: Dr. H. Haddouti.
1 COS 425: Database and Information Management Systems XML and information exchange.
Query Languages - XQuery Slides partially from Dan Suciu.
CSC056-Z1 – Database Management Systems – Vinnie Costa – Hofstra University1 Database Management Systems Session 10 Instructor: Vinnie Costa
Managing XML and Semistructured Data Lecture 6: XPath Prof. Dan Suciu Spring 2001.
XML May 1 st, XML for Representing Data John 3634 Sue 6343 Dick 6363 John 3634 Sue 6343 Dick 6363 row name phone “John”3634“Sue”“Dick” persons.
1 Introduction to Database Systems CSE 444 Lecture 11 Xpath/XQuery April 23, 2008.
1 Lecture 11: Xpath/XQuery Friday, October 20, 2006.
XML, XML Schema, Xpath and Xquery Slides collated from various sources, many from Dan Suciu at Univ. of Washington.
Lecture #6 XML November 2 nd, Administration Thanks for the mid-term comments Comment on the book & readings Project #2 Project #1 Homework #4 Homework.
End of SQL XML April 22 th, Null Values If x=Null then 4*(3-x)/7 is still NULL If x=Null then x=“Joe” is UNKNOWN Three boolean values: –FALSE =
XML, XML Schema, XPath and XQuery Query Languages CS561 Slides collated from several sources, including D. Suciu at Univ. of Washington.
1 Lecture 08: XML and Semistructured Data. 2 Outline XML (Section 17) –XML syntax, semistructured data –Document Type Definitions (DTDs) XPath.
1 Lecture 08: XML and Semistructured Data. 2 Outline XML (Section 17) –XML syntax, semistructured data –Document Type Definitions (DTDs) XPath.
Xpath to XQuery February 23rd, Other Stuff HW 3 is out. Instructions for Phase 3 are out. Today: finish Xpath, start and finish Xquery. From Wednesday:
1 Lecture 16: Querying XML Data: XPath, XQuery Friday, February 11, 2005.
Querying XML February 12 th, Querying XML Data XPath = simple navigation through the tree XQuery = the SQL of XML XSLT = recursive traversal –will.
Xquery. Summary of XQuery FLWR expressions FOR and LET expressions Collections and sorting Resource W3C recommendation:
Introduction to XQuery Resources: Official URL: Short intros:
XML by Dan Suciu 1 Introduction to Semistructured Data and XML Based on slides by Dan Suciu University of Washington.
XML and XPath. Web Services: XML+XPath2 EXtensible Markup Language (XML) a W3C standard to complement HTML A markup language much like HTML origins: structured.
Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives.
TDDD43 XML and RDF Slides based on slides by Lena Strömbäck and Fang Wei-Kleiner 1.
End of XML February 19 th, FLWR (“Flower”) Expressions FOR... LET... WHERE... RETURN... FOR... LET... WHERE... RETURN...
Management of XML and Semistructured Data Lecture 5: Query Languages Wednesday, 4/1/2001.
Lecture 6: XML Query Languages Thursday, January 18, 2001.
Lecture 5: XML Tuesday, January 16, Outline XML, DTDs (Data on the Web, 3.1) Semistructured data in XML (3.2) Exporting Relational Data in XML (8.3.1)
CSE 636 Data Integration Fall 2006 XML Query Languages XPath.
PROCESSING AND QUERYING XML 1. ROADMAP Models for Parsing XML Documents XPath Language XQuery Language XML inside DBMSs 2.
XML query. introduction An XML document can represent almost anything, and users of an XML query language expect it to perform useful queries on whatever.
1 Introduction to Semistructured Data and XML. 2 How the Web is Today  HTML documents often generated by applications consumed by humans only easy access:
More XML: semantics, DTDs, XPATH February 18, 2004.
1 XQuery Slides From Dr. Suciu. 2 XQuery Based on Quilt, which is based on XML-QL Uses XPath to express more complex queries.
XML May 6th, Instructor AnHai Doan Brief bio –high school in Vietnam & undergrad in Hungary –M.S. at Wisconsin –Ph.D. at Washington under Alon &
1 Lecture 13: XQuery XML Publishing, XML Storage Monday, October 28, 2002.
IS432 Semi-Structured Data Lecture 4: XPath Dr. Gamal Al-Shorbagy.
1 Lecture 5: Relational Algebra and XML Monday, April 26th, 2004.
XQuery 1. In this lecture Summary of XQuery FLWOR expressions – For, Let, Where, Order by, Return FOR and LET expressions Collections and sorting 2.
Lecture 17: XPath and XQuery Wednesday, Nov. 7, 2001.
1 CSE544: Lecture 7 XQuery, Relational Algebra Monday, 4/22/02.
1 Lecture 12: XML, XPath, XQuery Friday, October 24, 2003.
SEMI-STRUCTURED DATA (XML) 1. SEMI-STRUCTURED DATA ER, Relational, ODL data models are all based on schema Structure of data is rigid and known is advance.
Lecture 10 XML Monday, Oct. 21, 2001.
XML path expressions CSE 350 Fall 2003.
Lecture 11: Xpath/XQuery
Querying XML and Semistructured Data
Lecture 11 XML Wednesday, Oct. 24, 2001.
XML: Schemas, Queries Wednesday, 4/17/2002
Lecture 12: XML, XPath, XQuery
Semi-Structured data (XML Data MODEL)
Lecture 9: XML Monday, October 17, 2005.
Lecture 8: XML Data Wednesday, October
Introduction to Database Systems CSE 444 Lecture 10 XML
Lecture 15: Querying XML Friday, October 27, 2000.
Semi-Structured data (XML)
Lecture 11: XML and Semistructured Data
Presentation transcript:

XML May 2 nd, 2002

Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.

More Facts About XML Every database vendor has an XML page: – – – Many applications are just fancier Websites But, most importantly, XML enables data sharing on the Web – hence our interest

What is XML ? From HTML to XML HTML describes the presentation: easy for humans

HTML Bibliography Foundations of Databases Abiteboul, Hull, Vianu Addison Wesley, 1995 Data on the Web Abiteboul, Buneman, Suciu Morgan Kaufmann, 1999 Bibliography Foundations of Databases Abiteboul, Hull, Vianu Addison Wesley, 1995 Data on the Web Abiteboul, Buneman, Suciu Morgan Kaufmann, 1999 HTML is hard for applications

XML Foundations… Abiteboul Hull Vianu Addison Wesley 1995 … Foundations… Abiteboul Hull Vianu Addison Wesley 1995 … XML describes the content: easy for applications

XML eXtensible Markup Language Roots: comes from SGML –A very nasty language After the roots: a format for sharing data Emerging format for data exchange on the Web and between applications

XML Applications Sharing data between different components of an application. Archive data in text files. EDI: electronic data exchange: –Transactions between banks –Producers and suppliers sharing product data (auctions) –Extranets: building relationships between companies Scientists sharing data about experiments.

Web Services A new paradigm for creating distributed applications? Systems communicate via messages, contracts. Example: order processing system. MS.NET, J2EE – some of the platforms XML – a part of the story; the data format.

XML Syntax Very simple: <db> <book> <title>Complete Guide to DB2</title> <author>Chamberlin</author> </book> < > <title>Transaction Processing</title> <author>Bernstein</author> < >Newcomer</author> </book> <publisher> <name>Morgan Kaufman</name> <state>CA</state> </publisher> </db>

XML Terminology tags: book, title, author, … start tag:, end tag: start tags must correspond to end tags, and conversely

XML Terminology an element: everything between tags –example element: Complete Guide to DB2 –example element: elements may be nested empty element: abbreviated an XML document has a unique root element well formed XML document: if it has matching tags Complete Guide to DB2 Chamberlin

The XML Tree db book publisher titleauthor titleauthor namestate “Complete Guide to DB2” “Chamberlin”“Transaction Processing” “Bernstein”“Newcomer” “Morgan Kaufman” “CA” Tags on nodes Data values on leaves

More XML Syntax: Attributes Complete Guide to DB2 Chamberlin 1998 Complete Guide to DB2 Chamberlin 1998 price, currency are called attributes

Replacing Attributes with Elements Complete Guide to DB2 Chamberlin USD Complete Guide to DB2 Chamberlin USD attributes are alternative ways to represent data

“Types” (or “Schemas”) for XML Document Type Definition – DTD Define a grammar for the XML document, but we use it as substitute for types/schemas Will be replaced by XML-Schema (will extend DTDs)

An Example DTD PCDATA means Parsed Character Data (a mouthful for string) <!DOCTYPE db [ ]> <!DOCTYPE db [ ]>

More on DTDs: Attributes <!DOCTYPE db [... <!ATTLIS book price CDATA #REQUIRED language CDATA #IMPLIED> ]> <!DOCTYPE db [... <!ATTLIS book price CDATA #REQUIRED language CDATA #IMPLIED> ]> Complete Guide to DB2 Chamberlin … Complete Guide to DB2 Chamberlin … The type: CDATA = string ID = a key IDREF = a foreign key others=rarely used Default declaration: #REQUIRED=required #IMPLIED=optional #FIXED=fixed (rarely used)

DTDs as Grammars Same thing as: A DTD is a EBNF (Extended BNF) grammar An XML tree is precisely a derivation tree XML Documents that have a DTD and conform to it are called valid db ::= (book|publisher)* book ::= (title,author*,year?) title ::= string author ::= string year ::= string publisher ::= string db ::= (book|publisher)* book ::= (title,author*,year?) title ::= string author ::= string year ::= string publisher ::= string

More on DTDs as Grammars <!DOCTYPE paper [ ]> <!DOCTYPE paper [ ]> … XML documents can be nested arbitrarily deep

XML for Representing Data John 3634 Sue 6343 Dick 6363 John 3634 Sue 6343 Dick 6363 row name phone “John”3634“Sue”“Dick” persons XML: persons

XML vs Data Models XML is self-describing Schema elements become part of the data –Relational schema: persons(name,phone) –In XML,, are part of the data, and are repeated many times Consequence: XML is much more flexible XML = semistructured data

Semi-structured Data Explained Missing attributes: Repeated attributes John 1234 Joe John 1234 Joe  no phone ! Mary Mary  two phones !

Semistructured Data Explained Attributes with different types in different objects Nested collections (no 1NF) Heterogeneous collections: – contains both s and s John Smith 1234 John Smith 1234  structured name !

XML Data v.s. E/R, ODL, Relational Q: is XML better or worse ? A: serves different purposes –E/R, ODL, Relational models: For centralized processing, when we control the data –XML: Data sharing between different systems we do not have control over the entire data E.g. on the Web Do NOT use XML to model your data ! Use E/R, ODL, or relational instead.

Exporting Relational Data to XML Product(pid, name, weight) Company(cid, name, address) Makes(pid, cid, price) productcompany makes

Export data grouped by companies GizmoWorks Tacoma gizmo … Bang Kirkland gizmo … … GizmoWorks Tacoma gizmo … Bang Kirkland gizmo … … Redundant representation of products

The DTD

Export Data by Products Gizmo GizmoWorks Tacoma Bang Kirkland … OneClick … Gizmo GizmoWorks Tacoma Bang Kirkland … OneClick … Redundant Representation of companies

Which One Do We Choose ? The structure of the XML data is determined by agreement, with our partners, or dictated by committees –Many XML dialects (called applications) XML Data is often nested, irregular, etc No normal forms for XML

XML Query Languages Xpath XML-QL Xquery

An Example of XML Data Addison-Wesley Serge Abiteboul Rick Hull Victor Vianu Foundations of Databases 1995 Freeman Jeffrey D. Ullman Principles of Database and Knowledge Base Systems 1998

XPath Syntax for XML document navigation and node selection A recommendation of the W3C (i.e. a standard) Building block for other W3C standards: – XSL Transformations (XSLT) – XQuery – XML Link (XLink) – XML Pointer (XPointer) Was originally part of XSL – “XSL pattern language”

XPath: Simple Expressions /bib/book/year Result: /bib/paper/year Result: empty (there were no papers)

XPath: Restricted Kleene Closure //author Result: Serge Abiteboul Rick Hull Victor Vianu Jeffrey D. Ullman /bib//first-name Result: Rick

Xpath: Text Nodes /bib/book/author/text() Result: Serge Abiteboul Jeffrey D. Ullman Rick Hull doesn’t appear because he has firstname, lastname

Xpath: Wildcard //author/* Result: Rick Hull * Matches any element

Xpath: Attribute Nodes Result: means that price is has to be an attribute

Xpath: Qualifiers /bib/book/author[firstname] Result: Rick Hull

Xpath: More Qualifiers /bib/book/author[firstname][address[//zip][city]]/lastname Result: … …

Xpath: More Qualifiers < “60”] < “25”] /bib/book[author/text()]

Xpath: Summary bibmatches a bib element *matches any element /matches the root element /bibmatches a bib element under root bib/papermatches a paper in bib bib//papermatches a paper in bib, at any depth //papermatches a paper at any depth paper|bookmatches a paper or a a price attribute price attribute in book, in bib matches…

Xpath: More Details An Xpath expression, p, establishes a relation between: –A context node, and –A node in the answer set In other words, p denotes a function: –S[p] : Nodes -> {Nodes} Examples: –author/firstname –. = self –.. = parent –part/*/*/subpart/../name = what does it mean ?

The Root and the Root 1 2 bib is the “document element” The “root” is above bib /bib = returns the document element / = returns the root Why ? Because we may have comments before and after ; they become siblings of This is advanced xmlogy

Xpath: More Details We can navigate along 13 axes: ancestor ancestor-or-self attribute child descendant descendant-or-self following following-sibling namespace parent preceding preceding-sibling self

Xpath: More Details Examples: –child::author/child:lastname = author/lastname –child::author/descendant::zip = author//zip –child::author/parent::* = author/.. –child::author/attribute::age =

XQuery Based on Quilt (which is based on XML-QL) Check out the W3C web site for the latest. XML Query data model –Ordered !

FLWR (“Flower”) Expressions FOR... LET... FOR... LET... WHERE... RETURN...

XQuery Find all book titles published after 1995: FOR $x IN document(“bib.xml”)/bib/book WHERE $x/year>1995 RETURN $x/title Result: abc def ghi

XQuery For each author of a book by Morgan Kaufmann, list all books she published: FOR $a IN distinct( document("bib.xml") /bib/book[publisher=“Morgan Kaufmann”]/author) RETURN $a, FOR $t IN /bib/book[author=$a]/title RETURN $t FOR $a IN distinct( document("bib.xml") /bib/book[publisher=“Morgan Kaufmann”]/author) RETURN $a, FOR $t IN /bib/book[author=$a]/title RETURN $t distinct = a function that eliminates duplicates

XQuery Result: Jones abc def Smith ghi

XQuery FOR $x in expr -- binds $x to each value in the list expr LET $x = expr -- binds $x to the entire list expr –Useful for common subexpressions and for aggregations

XQuery count = a (aggregate) function that returns the number of elms FOR $p IN distinct(document("bib.xml")//publisher) LET $b := document("bib.xml")/book[publisher = $p] WHERE count($b) > 100 RETURN $p FOR $p IN distinct(document("bib.xml")//publisher) LET $b := document("bib.xml")/book[publisher = $p] WHERE count($b) > 100 RETURN $p

XQuery Find books whose price is larger than average: LET $a=avg( document("bib.xml") /bib/book/price) FOR $b in document("bib.xml") /bib/book WHERE $b/price > $a RETURN $b LET $a=avg( document("bib.xml") /bib/book/price) FOR $b in document("bib.xml") /bib/book WHERE $b/price > $a RETURN $b

XQuery Summary: FOR-LET-WHERE-RETURN = FLWR FOR/LET Clauses WHERE Clause RETURN Clause List of tuples Instance of Xquery data model

FOR v.s. LET FOR Binds node variables  iteration LET Binds collection variables  one value

FOR v.s. LET FOR $x IN document("bib.xml") /bib/book RETURN $x FOR $x IN document("bib.xml") /bib/book RETURN $x Returns:... LET $x IN document("bib.xml") /bib/book RETURN $x LET $x IN document("bib.xml") /bib/book RETURN $x Returns:...

Collections in XQuery Ordered and unordered collections –/bib/book/author = an ordered collection –Distinct(/bib/book/author) = an unordered collection LET $a = /bib/book  $a is a collection $b/author  a collection (several authors...) RETURN $b/author Returns:...

Collections in XQuery What about collections in expressions ? $b/price  list of n prices $b/price * 0.7  list of n numbers $b/price * $b/quantity  list of n x m numbers ?? $b/price * ($b/quant1 + $b/quant2)  $b/price * $b/quant1 + $b/price * $b/quant2 !!

Sorting in XQuery FOR $p IN distinct(document("bib.xml")//publisher) RETURN $p/text(), FOR $b IN document("bib.xml")//book[publisher = $p] RETURN $b/title, $b/price SORTBY(price DESCENDING) SORTBY(name) FOR $p IN distinct(document("bib.xml")//publisher) RETURN $p/text(), FOR $b IN document("bib.xml")//book[publisher = $p] RETURN $b/title, $b/price SORTBY(price DESCENDING) SORTBY(name)

Sorting in XQuery Sorting arguments: refer to the name space of the RETURN clause, not the FOR clause

If-Then-Else FOR $h IN //holding RETURN $h/title, IF = "Journal" THEN $h/editor ELSE $h/author SORTBY (title) FOR $h IN //holding RETURN $h/title, IF = "Journal" THEN $h/editor ELSE $h/author SORTBY (title)

Existential Quantifiers FOR $b IN //book WHERE SOME $p IN $b//para SATISFIES contains($p, "sailing") AND contains($p, "windsurfing") RETURN $b/title FOR $b IN //book WHERE SOME $p IN $b//para SATISFIES contains($p, "sailing") AND contains($p, "windsurfing") RETURN $b/title

Universal Quantifiers FOR $b IN //book WHERE EVERY $p IN $b//para SATISFIES contains($p, "sailing") RETURN $b/title FOR $b IN //book WHERE EVERY $p IN $b//para SATISFIES contains($p, "sailing") RETURN $b/title

Other Stuff in XQuery BEFORE and AFTER –for dealing with order in the input FILTER –deletes some edges in the result tree Recursive functions –Currently: arbitrary recursion –Perhaps more restrictions in the future ?

Processing XML Data Do we really need to process XML data? What are we processing XML for? How are we going to do it? Use existing technology? Are there other processing paradigms that we need to consider?

Query Processing For XML Approach 1: store XML in a relational database. Translate an XML-QL/Quilt query into a set of SQL queries. –Leverage 20 years of research & development. Approach 2: store XML in an object-oriented database system. –OO model is closest to XML, but systems do not perform well and are not well accepted. Approach 3: build a native XML query processing engine. –Still in the research phase; see Zack next week.

Relational Approach Step 1: given a DTD, create a relational schema. Step 2: map the XML document into tuples in the relational database. Step 3: given a query Q in Xquery, translate it to a set of queries P over the relational database. Step 4: translate the tuples returned from the relational database into XML elements.

Which Relational Schema? The key question! Affects performance. No magic solution. Some options: –The EDGE table: put everything in one table –The Attribute tables: create a table for every tag name. –The inlining method: inline as much data into the tables.

An Example DTD <!DOCTYPE db [ ]>

Recall: The XML Tree db book publisher titleauthor titleauthor namestate “Complete Guide to DB2” “Chamberlin”“Transaction Processing” “Bernstein”“Newcomer” “Morgan Kaufman” “CA” Tags on nodes Data values on leaves

The Edge Approach sourceID tag destID destValue - Don’t need a DTD. - Very simple to implement.

The Attribute Approach rootID bookId bookID title rootID pubID pubID pubName bookID author Book Title Author Publisher pubID state PubName PubState

The In-lining Approach bookID title pubName pubState bookID author BookAuthor Book sourceID tag destID destValue Publisher

Let the Querying Begin! Matching data using elements patterns. FOR $t IN document(bib.xml)/book/[author=“bernstein”]/author/title RETURN $t

The Edge Approach SELECT e3.destValue FROM E as e1, E as e2, E as e3 WHERE e1.tag = “book” and e1.destID=e2.sourceID and e2.tag=“title” and e1.destID=e3.sourceID and e3.tag=“author” and e2.author=“Bernstein”

The Attribute Approach SELECT Title.title FROM Book, Title, Author WHERE Book.bookID = Author.bookID and Book.bookID = Title.bookID and Author.author = “Bernstein”

The In-lining Approach SELECT Book.title FROM Book, BookAuthor WHERE Book.bookID =BookAuthor.bookID and BookAuthor.author = “Bernstein”

A Challenge: Reconstructing Elements Matching data using elements patterns. FOR $b IN document(bib.xml)/book/[author=“bernstein”] RETURN $b

Reconstructing XML Elements Matching data using elements patterns. WHERE Bernstein $t ELEMENT-AS $e IN “ CONSTRUCT $e

Some Open Questions Native query processing for XML To order or not to order? Combining IR-style keyword queries with DB-style structured queries Updates Automatic selection of a relational schema How should we extend relational engines to better support XML storage and querying?