XML: Schemas, Queries Wednesday, 4/17/2002

Slides:



Advertisements
Similar presentations
Spring Part III: Introduction to XPath XML Path Language.
Advertisements

Web Data Management XQuery 1. In this lecture Summary of XQuery FLWOR expressions – For, Let, Where, Order by, Return FOR and LET expressions Collections.
XML May 3 rd, XQuery Based on Quilt (which is based on XML-QL) Check out the W3C web site for the latest. XML Query data model –Ordered !
XML, XML Schema, Xpath and XQuery Slides collated from various sources, many from Dan Suciu at Univ. of Washington.
Database Management Systems, R. Ramakrishnan1 Introduction to Semistructured Data and XML Chapter 27, Part D Based on slides by Dan Suciu University of.
Introduction to XML, XPath, & XQuery CS186, Fall 2005 R &G - Chapters 7-27 Bill Gates, The Revolution, and a Network of Trees ( based on a true story)
1 Part 3: Query Languages Managing XML and Semistructured Data.
Agenda from now on Done: SQL, views, transactions, conceptual modeling, E/R, relational algebra. Starting: XML To do: the database engine: –Storage –Query.
Database Management Systems, R. Ramakrishnan1 Introduction to Semistructured Data and XML Chapter 27.
Querying XML (cont.). Comments on XPath? What’s good about it? What can’t it do that you want it to do? How does it compare, say, to SQL?
1 Lecture 10 XML Wednesday, October 18, XML Outline XML (4.6, 4.7) –Syntax –Semistructured data –DTDs.
1 Lecture 12: XQuery in SQL Server Monday, October 23, 2006.
1 Lecture 9: XQuery. 2 XQuery Motivation XPath expressivity insufficient –no join queries (as in SQL) –no changes to the XML structure possible –no quantifiers.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 357 Database Systems I Query Languages for XML.
XQuery language Presented by: Tayeb sbihi supervised by: Dr. H. Haddouti.
Query Languages - XQuery Slides partially from Dan Suciu.
CSC056-Z1 – Database Management Systems – Vinnie Costa – Hofstra University1 Database Management Systems Session 10 Instructor: Vinnie Costa
XML May 2 nd, Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.
Managing XML and Semistructured Data Lecture 6: XPath Prof. Dan Suciu Spring 2001.
XML May 1 st, XML for Representing Data John 3634 Sue 6343 Dick 6363 John 3634 Sue 6343 Dick 6363 row name phone “John”3634“Sue”“Dick” persons.
1 Introduction to Database Systems CSE 444 Lecture 11 Xpath/XQuery April 23, 2008.
1 Lecture 11: Xpath/XQuery Friday, October 20, 2006.
XML, XML Schema, Xpath and Xquery Slides collated from various sources, many from Dan Suciu at Univ. of Washington.
XML, XML Schema, XPath and XQuery Query Languages CS561 Slides collated from several sources, including D. Suciu at Univ. of Washington.
1 Lecture 08: XML and Semistructured Data. 2 Outline XML (Section 17) –XML syntax, semistructured data –Document Type Definitions (DTDs) XPath.
1 Lecture 08: XML and Semistructured Data. 2 Outline XML (Section 17) –XML syntax, semistructured data –Document Type Definitions (DTDs) XPath.
Xpath to XQuery February 23rd, Other Stuff HW 3 is out. Instructions for Phase 3 are out. Today: finish Xpath, start and finish Xquery. From Wednesday:
1 Lecture 16: Querying XML Data: XPath, XQuery Friday, February 11, 2005.
Querying XML February 12 th, Querying XML Data XPath = simple navigation through the tree XQuery = the SQL of XML XSLT = recursive traversal –will.
Xquery. Summary of XQuery FLWR expressions FOR and LET expressions Collections and sorting Resource W3C recommendation:
Introduction to XQuery Resources: Official URL: Short intros:
XML by Dan Suciu 1 Introduction to Semistructured Data and XML Based on slides by Dan Suciu University of Washington.
XML and XPath. Web Services: XML+XPath2 EXtensible Markup Language (XML) a W3C standard to complement HTML A markup language much like HTML origins: structured.
Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives.
TDDD43 XML and RDF Slides based on slides by Lena Strömbäck and Fang Wei-Kleiner 1.
End of XML February 19 th, FLWR (“Flower”) Expressions FOR... LET... WHERE... RETURN... FOR... LET... WHERE... RETURN...
Management of XML and Semistructured Data Lecture 5: Query Languages Wednesday, 4/1/2001.
Lecture 6: XML Query Languages Thursday, January 18, 2001.
Database Systems Part VII: XML Querying Software School of Hunan University
CSE 636 Data Integration Fall 2006 XML Query Languages XPath.
PROCESSING AND QUERYING XML 1. ROADMAP Models for Parsing XML Documents XPath Language XQuery Language XML inside DBMSs 2.
XML query. introduction An XML document can represent almost anything, and users of an XML query language expect it to perform useful queries on whatever.
More XML: semantics, DTDs, XPATH February 18, 2004.
Management of XML and Semistructured Data Lecture 10: Schemas Monday, April 30, 2001.
1 XQuery Slides From Dr. Suciu. 2 XQuery Based on Quilt, which is based on XML-QL Uses XPath to express more complex queries.
XML May 6th, Instructor AnHai Doan Brief bio –high school in Vietnam & undergrad in Hungary –M.S. at Wisconsin –Ph.D. at Washington under Alon &
1 Lecture 13: XQuery XML Publishing, XML Storage Monday, October 28, 2002.
IS432 Semi-Structured Data Lecture 4: XPath Dr. Gamal Al-Shorbagy.
1 Lecture 5: Relational Algebra and XML Monday, April 26th, 2004.
XQuery 1. In this lecture Summary of XQuery FLWOR expressions – For, Let, Where, Order by, Return FOR and LET expressions Collections and sorting 2.
Lecture 17: XPath and XQuery Wednesday, Nov. 7, 2001.
1 CSE544: Lecture 7 XQuery, Relational Algebra Monday, 4/22/02.
1 Lecture 12: XML, XPath, XQuery Friday, October 24, 2003.
XML path expressions CSE 350 Fall 2003.
Lecture 11: Xpath/XQuery
End of XQuery DBMS Internals
Querying XML and Semistructured Data
Managing XML and Semistructured Data
Lecture 12: XML, XPath, XQuery
Introduction to Database Systems CSE 444 Lecture 12 More Xquery and Xquery in SQL Server April 25, 2008.
Lecture 9: XML Monday, October 17, 2005.
CSE 544: Lecture 5 XML 4/15/2002.
Introduction to Database Systems CSE 444 Lecture 10 XML
Lecture 15: Querying XML Friday, October 27, 2000.
Lecture 12: XQuery in SQL Server
Introduction to Database Systems CSE 444 Lecture 12 Xquery in SQL Server October 22, 2007.
Processing and Querying XML
Lecture 11: XML and Semistructured Data
Lecture 13: XQuery XML Publishing, XML Storage
XML, XML Schema, XPath and XQuery Query Languages
Presentation transcript:

XML: Schemas, Queries Wednesday, 4/17/2002 CSE544: Lecture 6 XML: Schemas, Queries Wednesday, 4/17/2002

Project Ideas System catalog for Piazza. Query optimization in Piazza. Update propagation in Piazza. (taken) Encryption, security in Piazza and the XML Toolkit Define events in the XMLTK. [talk to me] Improved index computation in the XML toolkit (SIX) [talk to me] Xpath containment/equivalence algorithm.

Outline XML DTDs [we skip XML Schema: please see slides from last lecture] XPath XQuery

Document Type Definitions DTD part of the original XML specification an XML document may have a DTD terminology for XML: well-formed: if tags are correctly closed valid: if it has a DTD and conforms to it validation is useful in data exchange

Very Simple DTD <!DOCTYPE company [ <!ELEMENT company ((person|product)*)> <!ELEMENT person (ssn, name, office, phone*)> <!ELEMENT ssn (#PCDATA)> <!ELEMENT name (#PCDATA)> <!ELEMENT office (#PCDATA)> <!ELEMENT phone (#PCDATA)> <!ELEMENT product (pid, name, description?)> <!ELEMENT pid (#PCDATA)> <!ELEMENT description (#PCDATA)> ]>

Very Simple DTD Example of valid XML document: <company> <person> <ssn> 123456789 </ssn> <name> John </name> <office> B432 </office> <phone> 1234 </phone> <phone> 6789 </phone> </person> <person> <ssn> 987654321 </ssn> <name> Jim </name> <office> B123 </office> <product> ... </product> ... </company>

Very Simple DTD company * * person product ? * phone name pid description ssn office Graph representation of a DTD (notice: not necessarily a tree) Lose order information (office before or after phone ?)

Content Model Element content: what we can put in an element (aka content model) Content model: Complex = a regular expression over other elements Text-only = #PCDATA Empty = EMPTY Any = ANY Mixed content = (#PCDATA | A | B | C)* (i.e. very restrictied)

Attributes in DTDs <!ELEMENT person (ssn, name, office, phone?)> <!ATTLIS person age CDATA #REQUIRED> <person age=“25”> <name> ....</name> ... </person>

Attributes in DTDs <!ELEMENT person (ssn, name, office, phone?)> <!ATTLIS person age CDATA #REQUIRED id ID #REQUIRED manager IDREF #REQUIRED manages IDREFS #REQUIRED > <person age=“25” id=“p29432” manager=“p48293” manages=“p34982 p423234”> <name> ....</name> ... </person>

Attributes in DTDs Types: CDATA = string ID = key IDREF = foreign key IDREFS = foreign keys separated by space (Monday | Wednesday | Friday) = enumeration NMTOKEN = must be a valid XML name NMTOKENS = multiple valid XML names ENTITY = you don’t want to know this

Attributes in DTDs Kind: #REQUIRED #IMPLIED = optional value = default value value #FIXED = the only value allowed

Using DTDs Must include in the XML document Either include the entire DTD: <!DOCTYPE rootElement [ ....... ]> Or include a reference to it: <!DOCTYPE rootElement SYSTEM “http://www.mydtd.org”> Or mix the two... (e.g. to override the external definition)

DTDs as Grammars <!DOCTYPE paper [ <!ELEMENT paper (section*)> <!ELEMENT section ((title,section*) | text)> <!ELEMENT title (#PCDATA)> <!ELEMENT text (#PCDATA)> ]> <paper> <section> <text> </text> </section> <section> <title> </title> <section> … </section> <section> … </section> </section> </paper>

DTDs as Grammars A DTD = a grammar A valid XML document = a parse tree for that grammar

DTDs as Schemas Not so well suited: impose unwanted constraints on order <!ELEMENT person (name,phone)> references cannot be constrained can be too vague: <!ELEMENT person ((name|phone|email)*)>

From DTDs to Relational Schemas Shanmugasundaram et al. Relational databases for querying XML documents” Design a relational schema for: <!DOCTYPE company [ <!ELEMENT company ((person|product)*)> <!ELEMENT person (ssn, name, office?, phone*)> <!ELEMENT ssn (#PCDATA)> <!ELEMENT name (#PCDATA)> <!ELEMENT office (#PCDATA)> <!ELEMENT phone (#PCDATA)> <!ELEMENT product (pid, name, description?)> <!ELEMENT pid (#PCDATA)> <!ELEMENT description (#PCDATA)> ]>

Where Should We Store XML Data ? “Native” XML databases: Total market in 2001: only $77 million [research firm IDC] But growing dramatically (six fold per year) Main players: Software AG with Tamino XML: 40.5% share of the market. eXcelon (formerly ObjectStore): 23.3% Computer Associates: 19.4% Poet Software: 1.3% Remaining 15%: Linux-based open source tools (many based around the dbXML core), startups (Ipedo, Ixiasoft) What is the risk for these companies ?

XPath http://www.w3.org/TR/xpath (11/99) Building block for other W3C standards: XSL Transformations (XSLT) XML Link (XLink) XML Pointer (XPointer) XML Query Was originally part of XSL

Example for XPath Queries <bib> <book> <publisher> Addison-Wesley </publisher> <author> Serge Abiteboul </author> <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author> <author> Victor Vianu </author> <title> Foundations of Databases </title> <year> 1995 </year> </book> <book price=“55”> <publisher> Freeman </publisher> <author> Jeffrey D. Ullman </author> <title> Principles of Database and Knowledge Base Systems </title> <year> 1998 </year> </book> </bib>

Data Model for XPath The root The root element . . . . bib book book publisher author . . . . Addison-Wesley Serge Abiteboul

XPath: Simple Expressions /bib/book/year Result: <year> 1995 </year> <year> 1998 </year> Result: empty (there were no papers) /bib/paper/year

XPath: Restricted Kleene Closure //author Result:<author> Serge Abiteboul </author> <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author> <author> Victor Vianu </author> <author> Jeffrey D. Ullman </author> Result: <first-name> Rick </first-name> /bib//first-name

Xpath: Text Nodes /bib/book/author/text() Result: Serge Abiteboul Jeffrey D. Ullman Rick Hull doesn’t appear because he has firstname, lastname Functions in XPath: text() = matches the text value node() = matches any node (= * or @* or text()) name() = returns the name of the current tag

Xpath: Wildcard Result: <first-name> Rick </first-name> <last-name> Hull </last-name> * Matches any element //author/*

Xpath: Attribute Nodes /bib/book/@price Result: “55” @price means that price is has to be an attribute

Xpath: Qualifiers /bib/book/author[firstname] Result: <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>

Xpath: More Qualifiers Result: <lastname> … </lastname> <lastname> … </lastname> /bib/book/author[firstname][address[//zip][city]]/lastname

Xpath: More Qualifiers /bib/book[@price < “60”] /bib/book[author/@age < “25”] /bib/book[author/text()]

Xpath: Summary bib matches a bib element * matches any element / matches the root element /bib matches a bib element under root bib/paper matches a paper in bib bib//paper matches a paper in bib, at any depth //paper matches a paper at any depth paper|book matches a paper or a book @price matches a price attribute bib/book/@price matches price attribute in book, in bib bib/book/[@price<“55”]/author/lastname matches…

Xpath: More Details An Xpath expression, p, establishes a relation between: A context node, and A node in the answer set In other words, p denotes a function: S[p] : Nodes -> {Nodes} Examples: author/firstname . = self .. = parent part/*/*/subpart/../name = part/*/*[subpart]/name

The Root and the Root <bib> <paper> 1 </paper> <paper> 2 </paper> </bib> bib is the “document element” The “root” is above bib /bib = returns the document element / = returns the root Why ? Because we may have comments before and after <bib>; they become siblings of <bib> This is advanced xmlogy

Xpath: More Details We can navigate along 13 axes: ancestor ancestor-or-self attribute child descendant descendant-or-self following following-sibling namespace parent preceding preceding-sibling self

Xpath: More Details Examples: What does this mean ? child::author/child:lastname = author/lastname child::author/descendant::zip = author//zip child::author/parent::* = author/.. child::author/attribute::age = author/@age What does this mean ? paper/publisher/parent::*/author /bib//address[ancestor::book] /bib//author/ancestor::*//zip

Xpath: Even More Details name() = the name of the current node /bib//*[name()=book] same as /bib//book What does this mean ? /bib//*[ancestor::*[name()!=book]] In a different notation bib.[^book]*._ Navigation axis give us strictly more power !

XQuery Based on Quilt (which is based on XML-QL) http://www.w3.org/TR/xquery/ 2/2001 XML Query data model (of course ) Ordered !

FLWR (“Flower”) Expressions FOR ... LET... FOR... LET... WHERE... RETURN...

XQuery Find all book titles published after 1995: FOR $x IN document("bib.xml")/bib/book WHERE $x/year > 1995 RETURN { $x/title } Result: <title> abc </title> <title> def </title> <title> ghi </title>

XQuery For each author of a book by Morgan Kaufmann, list all books she published: FOR $a IN distinct(document("bib.xml") /bib/book[publisher=“Morgan Kaufmann”]/author) RETURN <result> { $a, FOR $t IN /bib/book[author=$a]/title RETURN $t } </result> distinct = a function that eliminates duplicates

XQuery Result: <result> <author>Jones</author> <title> abc </title> <title> def </title> </result> <author> Smith </author> <title> ghi </title>

XQuery FOR $x in expr -- binds $x to each value in the list expr LET $x = expr -- binds $x to the entire list expr Useful for common subexpressions and for aggregations

XQuery count = a (aggregate) function that returns the number of elms <big_publishers> FOR $p IN distinct(document("bib.xml")//publisher) LET $b := document("bib.xml")/book[publisher = $p] WHERE count($b) > 100 RETURN { $p } </big_publishers> count = a (aggregate) function that returns the number of elms

XQuery Find books whose price is larger than average: LET $a=avg(document("bib.xml")/bib/book/price) FOR $b in document("bib.xml")/bib/book WHERE $b/price > $a RETURN { $b }

XQuery Summary: FOR-LET-WHERE-RETURN = FLWR FOR/LET Clauses List of tuples WHERE Clause List of tuples RETURN Clause Instance of Xquery data model

FOR v.s. LET FOR Binds node variables  iteration LET Binds collection variables  one value

FOR v.s. LET FOR $x IN document("bib.xml")/bib/book Returns: <result> <book>...</book></result> ... FOR $x IN document("bib.xml")/bib/book RETURN <result> { $x } </result> LET $x IN document("bib.xml")/bib/book RETURN <result> { $x } </result> Returns: <result> <book>...</book> <book>...</book> ... </result>

Collections in XQuery Ordered and unordered collections /bib/book/author = an ordered collection Distinct(/bib/book/author) = an unordered collection LET $a = /bib/book  $a is a collection $b/author  a collection (several authors...) Returns: <result> <author>...</author> <author>...</author> ... </result> RETURN <result> { $b/author } </result>

Collections in XQuery What about collections in expressions ? $b/price  list of n prices $b/price * 0.7  list of n numbers $b/price * $b/quantity  list of n x m numbers ?? $b/price * ($b/quant1 + $b/quant2)  $b/price * $b/quant1 + $b/price * $b/quant2 !!

Sorting in XQuery <publisher_list> FOR $p IN distinct(document("bib.xml")//publisher) RETURN <publisher> <name> { $p/text() } </name> , FOR $b IN document("bib.xml")//book[publisher = $p] RETURN <book> { $b/title , $b/price } </book> SORTBY(price DESCENDING) </publisher> SORTBY(name) </publisher_list>

Sorting in XQuery Sorting arugments: refer to the name space of the RETURN clause, not the FOR clause

If-Then-Else FOR $h IN //holding RETURN <holding> { $h/title, IF $h/@type = "Journal" THEN $h/editor ELSE $h/author } </holding> SORTBY (title)

Existential Quantifiers FOR $b IN //book WHERE SOME $p IN $b//para SATISFIES contains($p, "sailing") AND contains($p, "windsurfing") RETURN { $b/title }

Universal Quantifiers FOR $b IN //book WHERE EVERY $p IN $b//para SATISFIES contains($p, "sailing") RETURN { $b/title }

Other Stuff in XQuery BEFORE and AFTER FILTER Recursive functions for dealing with order in the input FILTER deletes some edges in the result tree Recursive functions Currently: arbitrary recursion Perhaps more restrictions in the future ?