
An Introduction to XQuery: the W3C XML Query Language Mary Fernandez AT&T Labs - Research Information and Software Systems Research Florham Park, NJ 2004.


1 An Introduction to XQuery: the W3C XML Query Language Mary Fernandez AT&T Labs - Research Information and Software Systems Research Florham Park, NJ 2004

2 XML
● Standard, flexible syntax for data exchange
  ● Regular, structured data
    Database content of all kinds: inventory, billing, orders, …
    “Small” typed values
  ● Irregular, unstructured text
    Documents of all kinds: transcripts, books, legal briefs, …
    “Large” untyped values
● Lingua franca of B2B applications…
  ● Increase access to products & services
  ● Integrate disparate data sources
  ● Automate business processes
● … and numerous other application domains
  ● Bioinformatics, library science, …

3 XML: A First Look
● XML document describing a catalog of books
  No Such Thing as a Bad Day
  Hamilton Jordan
  Longstreet Press, Inc.
  17.60
  Publisher: This book is the moving account of one man's successful battles against three cancers... No Such Thing as a Bad Day is warmly recommended.
● Mixed content
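The element tags on this slide did not survive the transcript. As a rough sketch only, the entry may have looked something like the following, written here as an XQuery direct element constructor. Every element name is an assumption, inferred from the queries and type declarations on later slides (catalog, book, title, author, publisher, price, review, reviewer).

```xquery
(: Hypothetical reconstruction: the text is from the slide, the markup is guessed. :)
<catalog>
  <book>
    <title>No Such Thing as a Bad Day</title>
    <author>Hamilton Jordan</author>
    <publisher>Longstreet Press, Inc.</publisher>
    <price>17.60</price>
    <!-- mixed content: text interleaved with child elements -->
    <review><reviewer>Publisher</reviewer>: This book is the moving account of
      one man's successful battles against three cancers...
      <title>No Such Thing as a Bad Day</title> is warmly recommended.</review>
  </book>
</catalog>
```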

4 Simple Data-Exchange Scenario
● Data suppliers & brokers negotiate format
● Agreement benefits both parties
  ● Supplier increases value of data
  ● Broker gains uniform access to many suppliers’ data
● Broker only interested in subset of suppliers’ data
[Diagram: Supplier, Broker, Supplier; XML ?]

5 XML & Data Exchange
● Automate data-exchange process
  ● Reduce redundancy, error, labor cost
● Industry/consortium defines common vocabulary
  ● Develop data model
  ● Parties negotiate semantics & syntax of data
● Document Type Definitions (DTDs) or XML Schema
  ● Contract between data producers & consumers
● Classes of XML vocabularies
  ● Vertical – relevant to single industry
  ● Horizontal – common business artifact
  ● Framework – common business processes (discovery, negotiation, transactions, …)

6 XML Application Domains
http://xml.coverpages.org/gen-apps.html
http://www.xml.org/xml/industry_industrysectors.jsp

7 XML Food Chain
● Data Exchange Format: XML
● Schema, Formatting, Querying: XML Schema, XSLT, XQuery, …
● Web Services Infrastructure: SOAP, WSDL, UDDI, …
● Business Process Frameworks: BEA, IBM WebSphere, WebMethods…
● XML Dialects: BPML, ebXML, CommerceXML
Enabling technology: does NOT solve hard problems; removes excuses for ignoring them

8 Outline
● Long-winded introduction
● A Survey of XQuery
● Demonstration of XQuery

9 XQuery 1.0
● Functional, strongly-typed query language
● XQuery 1.0 = XPath 2.0 for navigation, selection, extraction
  + A few more expressions
      For-Let-Where-Order By-Return (FLWOR)
      XML construction
      Operators on types
  + User-defined functions & modules
      Modularize large queries
      Process recursive data
  + Strong typing
      Checks that values have the required type (operator, function)
      Guarantees that the result value is an instance of the output type
      Enforced statically or dynamically

10 XSLT vs. XQuery
● XSLT 1.0: XML → XML, HTML, Text
  ● Loosely-typed scripting language
  ● Format XML in HTML for display in browser
  ● Must be highly tolerant of variability/errors in data
● XQuery 1.0: XML → XML
  ● Strongly-typed query language
  ● Large-scale database access
  ● Must guarantee safety/correctness of operations on data
● Over time, XSLT & XQuery may both serve needs of many application domains
● XQuery will become a hidden, commodity language

11 Navigation, Selection, Extraction
● Titles of all books published by Longstreet Press
  $cat/catalog/book[publisher = "Longstreet Press"]/title
  => No Such Thing As A Bad Day
● Publications with Jerome Simeon as author or editor
  $cat//*[(author|editor) = "Jerome Simeon"]
  => XQuery from the Experts …, XQuery Formal Semantics …
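A self-contained sketch of the two path expressions above. The inline catalog is made-up sample data (the slide does not show the real document), and the element names simply follow the queries.

```xquery
(: Hypothetical sample data; the real catalog is not shown on the slide. :)
let $cat := document {
  <catalog>
    <book>
      <title>No Such Thing as a Bad Day</title>
      <author>Hamilton Jordan</author>
      <publisher>Longstreet Press</publisher>
    </book>
    <book>
      <title>XQuery from the Experts</title>
      <author>Don Chamberlin</author>
      <author>Jerome Simeon</author>
      <publisher>Addison Wesley</publisher>
    </book>
    <spec>
      <title>XQuery Formal Semantics</title>
      <editor>Jerome Simeon</editor>
    </spec>
  </catalog>
}
return (
  (: navigate to book, select by publisher, extract the title :)
  $cat/catalog/book[publisher = "Longstreet Press"]/title,
  (: any element at any depth with a matching author or editor child :)
  $cat//*[(author | editor) = "Jerome Simeon"]
)
```

The second expression returns whole elements (here one book and one spec), not just their titles, because the predicate filters the `*` step itself.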

12 Transformation & Construction
● First author & title of books published by A/W
  for $b in $cat//book[publisher = "Addison Wesley"]
  return <awbook>{ $b/author[1], $b/title }</awbook>
  => Don Chamberlin, XQuery from the Experts
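A runnable sketch of this construction query. The element tags were lost in the transcript, so the awbook wrapper is an assumption taken from the type shown on a later slide, and the catalog is made-up sample data.

```xquery
(: Hypothetical sample data. :)
let $cat :=
  <catalog>
    <book>
      <title>XQuery from the Experts</title>
      <author>Don Chamberlin</author>
      <author>Jerome Simeon</author>
      <publisher>Addison Wesley</publisher>
    </book>
    <book>
      <title>No Such Thing as a Bad Day</title>
      <author>Hamilton Jordan</author>
      <publisher>Longstreet Press</publisher>
    </book>
  </catalog>
for $b in $cat//book[publisher = "Addison Wesley"]
(: build one new element per matching book; copies of the selected
   author and title nodes become its children :)
return <awbook>{ $b/author[1], $b/title }</awbook>
```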

13 Integration
● For each author, return the number of books and the receipts for books published in the past 2 years, ordered by name
  let $cat := fn:doc("www.bn.com/catalog.xml"),   (: Join :)
      $sales := fn:doc("www.publishersweekly.com/sales.xml")
  for $author in distinct-values($cat//author)   (: Grouping :)
  let $books := $cat//book[@year >= 2000 and author = $author],   (: S.J. :)
      $receipts := $sales/book[@isbn = $books/@isbn]/receipts
  order by $author   (: Ordering :)
  return { $author } { fn:count($books) } { fn:sum($receipts) }   (: XML Construction, Aggregation :)
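A self-contained sketch of the same join, grouping, and aggregation pattern. It replaces the two fn:doc calls with inline stand-in documents. All data values and the result element names (author-sales, name, count, total) are made up for illustration, since the slide's constructor tags did not survive.

```xquery
(: Inline stand-ins for the catalog and sales documents; all values are made up. :)
let $cat :=
  <catalog>
    <book isbn="a1" year="2003"><author>A. Author</author></book>
    <book isbn="b2" year="2001"><author>A. Author</author><author>B. Writer</author></book>
    <book isbn="c3" year="1998"><author>B. Writer</author></book>
  </catalog>
let $sales :=
  <sales>
    <book isbn="a1"><receipts>100</receipts></book>
    <book isbn="b2"><receipts>50</receipts></book>
    <book isbn="c3"><receipts>70</receipts></book>
  </sales>
(: grouping: one iteration per distinct author name :)
for $author in fn:distinct-values($cat//author)
(: selection: recent books by this author :)
let $books := $cat//book[@year >= 2000 and author = $author]
(: join: sales records whose ISBN matches one of those books :)
let $receipts := $sales/book[@isbn = $books/@isbn]/receipts
order by $author
return
  <author-sales>
    <name>{ $author }</name>
    <count>{ fn:count($books) }</count>
    <total>{ fn:sum($receipts) }</total>
  </author-sales>
```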

14 Recursive Processing
● Recursive functions support recursive data =>
  declare function partCount($p as element(part)) as element(partCt) {
    <partCt>{ $p/@id, for $p2 in $p/part return partCount($p2) }</partCt>
  }
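A runnable sketch of the recursive function, assuming XQuery 1.0 Recommendation syntax, in which user functions live in a namespace (here the predeclared local: prefix) and declarations end with a semicolon. The partCt wrapper follows the declared return type, and the parts tree is made-up input. Only the part of the body that survived on the slide is shown, so the function mirrors the nesting rather than computing a count.

```xquery
declare function local:partCount($p as element(part)) as element(partCt) {
  <partCt>{
    $p/@id,                                         (: copy the id attribute :)
    for $p2 in $p/part return local:partCount($p2)  (: recurse into sub-parts :)
  }</partCt>
};

(: Hypothetical input: a small tree of nested parts. :)
local:partCount(
  <part id="car">
    <part id="engine"><part id="piston"/></part>
    <part id="wheel"/>
  </part>
)
```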

15 XML Schema Languages
● Many variants…
  ● DTDs, XML Schema, RELAX NG, XDuce
● … with similar goals to define
  ● Types of literal (terminal) data
  ● Names of elements & attributes
  ● “Vertical” & “horizontal” structure of documents
● XQuery designed to support (all of) XML Schema
  ● Structural & name constraints over types
  ● Regular tree expressions over elements, attributes, atomic types

16 XML Schema: A First Look

17 XQuery Types
● Element declaration
  declare element catalog of type Catalog
● Attribute declaration
  declare attribute currency of type xsd:string
● Simple type
  declare type Title { xsd:string }
  Facets – constrain lexical ( “DDD-DDD” ) & value space ( min/max )
● Complex type
  ● Regular tree expression over elements, attributes, simple types
    declare type Price { attribute currency, xsd:float }
    declare type Catalog { element(book)*, element(spec)* }
    declare type Review mixed { ( element(reviewer) | element(title) )* }
  ● Operators also include: ? (optional), + (one-or-more)

18 More on XQuery Types
● Derive new types from existing types
  ● Restriction
    declare type BookList restricts Catalog { element(book)+ }
  ● Extension
    declare type BookPrice extends Price { attribute(kind) }
● Express constraints on structure only
  ● No methods or behavioral constraints
● Schema validation
  ● Check contractual obligation:
    Given (untyped) input document
    Check if valid w.r.t. schema
    Yield (typed) output document
● XQuery operates on schema-validated documents
  ● Well-formed documents labeled with “no-known-type” types

19 Navigation, Selection, Extraction
● Titles of all books published by Longstreet Press
  $cat/catalog/book[publisher = "Longstreet Press"]/title
  => No Such Thing As A Bad Day
  Type: element(title)*
● Publications with Jerome Simeon as author or editor
  $cat//*[(author|editor) = "Jerome Simeon"]
  => XQuery from the Experts …, XQuery Formal Semantics …
  Type: ( element(book) | element(spec) )*

20 Transformation & Construction
● First author & title of books published by A/W
  validate {
    for $b in $cat//book[publisher = "Addison Wesley"]
    return <awbook>{ $b/author[1], $b/title }</awbook>
  }
● Infers type of expression: element awbook { element author, element title } +
● Checks type matches global declaration of: element(awbook)

21 Outline
● Long-winded introduction
● A Survey of XQuery
● Demonstration of XQuery
  ● Galax: complete, open-source implementation of XQuery 1.0
  ● Joint work with Jérôme Siméon (IBM Watson Research)
  ● http://www.galaxquery.org

22 TeXQuery: Full-text extensions
● Text search & querying of structured content
● Limited support in XQuery 1.0
  ● String operators with collation sequences
    $cat//book[contains(review/text(), "two thumbs up")]
● Stop words, proximity searching, ranking
  Ex: “Tony Blair” within two words of “George Bush”
● Phrases that span tags and annotations
  Ex: Match “Mr. English sponsored the bill” in
  Mr. English for himself and Mr. Coyne sponsored the bill in the Committee for Financial Services

23 Growing XQuery
● Language-specific APIs
  ● Type-safe embeddings of XQuery
    Can/should there be an IDL for XML schema languages?
  ● Avoid “throw-data-over-the-wall” problem
● Modules & separate compilation
● Possible Version 2.0 features
  ● Updates
    Typing & non-interference of update statements & expressions
  ● Exceptions
    “Throw” but no “catch” in Version 1.0
    Interaction with updates: transactional/recoverable semantics?
  ● Parametric polymorphism
    Ad-hoc polymorphism (overloaded built-in operators), choice, derivation by restriction & extension in Version 1.0
    Type variables, polymorphic types to increase reuse

24 Galax
● Web site: http://www.galaxquery.org
● Version 0.4.0: source and Linux binaries
  ● Aligned with July 2004 XQuery Working Drafts
  ● APIs for C, Java, O'Caml
● Team
  Jérôme Siméon (IBM Watson Research)
  Byron Choi (U. Pennsylvania)
  Vladimir Gapeyev (U. Pennsylvania)
  Jan Hidders (U. Antwerpen)
  Amélie Marian (Columbia U.)
  Philippe Michiels (U. Antwerpen)
  Roel Vercammen (U. Antwerpen)
  Nicola Onese (UCSD)
  Douglas Petkanics (U. Pennsylvania)
  Christopher Re (U. Washington)
  Gargi Sur (U. Florida)
  Avinash Vyas (Lucent Bell Labs)
  Philip Wadler (U. Edinburgh)

25 Non-subliminal Advertising

26 Validated XML
● Validation: Document + Schema =>
  ● Elements & attributes “annotated” with types
  ● Well-formed, unvalidated documents annotated with default types
  No Such Thing as a Bad Day
  Hamilton Jordan
  Longstreet Press, Inc.
  17.60
  Publisher: This book is the moving account of one man's successful battles against three cancers... No Such Thing as a Bad Day is warmly recommended.

27 Literals & Constants
● Strings
  "hello world"
● Booleans
  fn:true() fn:false()
  ● Avoid lexical conflicts with, e.g., //false $v/flag/true
● Numbers
  12 [integer], 10.3E2 [double], 1.0 [decimal]
  xs:decimal("1.0") xs:unsignedLong("22222222222222")
● Dates, times, & (totally ordered) durations
  xs:date("2000-01-01") xs:time("04:20:00")
  xdt:dayTimeDuration("P21D") xdt:yearMonthDuration("P1Y2M")
● User-defined atomic types
  mycompany:inventory-id("XXX-123")
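A small sketch collecting the built-in literals and constructor functions above into one sequence. It assumes a processor that follows the final XQuery 1.0 Recommendation, where the two duration types sit in the xs: namespace rather than the xdt: namespace of the 2004 working drafts. The user-defined mycompany:inventory-id type is omitted because it would need an imported schema.

```xquery
(
  "hello world",                        (: xs:string :)
  fn:true(), fn:false(),                (: booleans are function calls, not keywords :)
  12,                                   (: xs:integer :)
  1.0,                                  (: xs:decimal :)
  10.3E2,                               (: xs:double :)
  xs:decimal("1.0"),
  xs:unsignedLong("22222222222222"),
  xs:date("2000-01-01"),
  xs:time("04:20:00"),
  xs:dayTimeDuration("P21D"),           (: xdt:dayTimeDuration in the 2004 drafts :)
  xs:yearMonthDuration("P1Y2M")         (: xdt:yearMonthDuration in the 2004 drafts :)
)
```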

28 Functions & Operators
● Arithmetic & comparison operators
  ● Numerics
    1.0 + 10.E2 (-, *, div, idiv, mod)
    1900 < … (>, <=, >=, =)
  ● Dates/Times/Durations
    xs:date("2004-01-01") + xdt:dayTimeDuration("P10D")
    xs:date("2004-01-01") >= xs:date("2002-10-10")
  ● Nodes
    //incision << … (is, >>)
● Built-in functions
  ● Strings
    fn:starts-with("WWW 2004", "WWW")
    fn:matches("WWW 2004", "^W*")
  ● Sequences
    fn:avg((1,2,3,4))
    fn:distinct-values(//price)
● All other XML Schema primitive types …
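A sketch that exercises one operator or function from each group above. The procedure document and the right-hand operands of the comparisons are made up for illustration, and xs:dayTimeDuration is the final-Recommendation spelling of the drafts' xdt:dayTimeDuration.

```xquery
(: Hypothetical two-element document, used only for the node-order comparison. :)
let $doc :=
  <procedure>
    <anesthesia/>
    <incision/>
  </procedure>
return (
  1.0 + 10.E2,                                          (: decimal + double :)
  7 idiv 2, 7 mod 2,                                    (: integer division, remainder :)
  1900 < 2000,                                          (: numeric comparison :)
  xs:date("2004-01-01") + xs:dayTimeDuration("P10D"),   (: date arithmetic :)
  xs:date("2004-01-01") >= xs:date("2002-10-10"),       (: date comparison :)
  $doc/anesthesia << $doc/incision,                     (: document-order comparison :)
  fn:starts-with("WWW 2004", "WWW"),
  fn:matches("WWW 2004", "^W*"),                        (: regular-expression match :)
  fn:avg((1, 2, 3, 4))
)
```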

29 Selection & Projection
● Titles of all books published by Longstreet Press
  $cat/catalog/book[publisher = "Longstreet Press"]/title
  => No Such Thing As A Bad Day
● Publications with Jérôme Siméon as author or editor
  $cat//*[(author|editor) = "Jérôme Siméon"]
  => XQuery from the Experts …, XQuery 1.0 Formal Semantics …
● Books with “good” reviews
  $cat//book[fn:contains(review/text(), "2 thumbs up")]

30 Sources of Input
● Several ways to access inputs
● Document function
  fn:doc("http://www.books.com/catalog.xml")
  fn:doc( Expr )
● Variables
  ● Bound in for expression or in host language
    $cat/catalog/book
● Context node – implicit “dot” variable
  ● Bound in host-language environment
    .//surgery[//anesthesia[1] << //incision[1]]
  ● Missing “dot” means root node that contains dot
    //show == fn:root(.)//show

31 Sequences & Iteration
● Sequence constructor
  Return all books followed by all W3C specifications
    ($cat/catalog/book, $cat/catalog/W3Cspec)
  Return all books & W3C specifications in document order
    $cat/catalog/(book|W3Cspec)
● For expression
  ● Similar to map: apply function to each item in sequence
  Return number of authors in each book
    for $b in $cat/catalog/book return fn:count($b/author)
    => (3,1,2,…)
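A self-contained sketch contrasting the comma constructor, which keeps the order written, with the union step, which returns nodes in document order, followed by a for expression used as a map. The catalog contents are made up, and the three wrapper element names are hypothetical, there only to keep the results apart.

```xquery
(: Hypothetical catalog in which a spec sits between two books. :)
let $cat :=
  <catalog>
    <book><title>B1</title><author>X</author><author>Y</author></book>
    <W3Cspec><title>S1</title></W3Cspec>
    <book><title>B2</title><author>Z</author></book>
  </catalog>
return (
  (: sequence constructor: all books, then all specs -> B1 B2 S1 :)
  <concatenated>{
    for $x in ($cat/book, $cat/W3Cspec) return fn:string($x/title)
  }</concatenated>,
  (: union step: document order -> B1 S1 B2 :)
  <doc-order>{
    for $x in $cat/(book | W3Cspec) return fn:string($x/title)
  }</doc-order>,
  (: for as map: one author count per book -> 2 1 :)
  <author-counts>{
    for $b in $cat/book return fn:count($b/author)
  }</author-counts>
)
```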

32 Conditional & Quantified
● Conditional
  if //show[year >= 2000] then "A-OK!" else "Error!"
● Existential quantification
  ● Implicit meaning of predicate expressions
    //show[year >= 2000]
  ● Explicit expression:
    //show[some $y in ./year satisfies $y >= 2000]
● Universal quantification
  //show[every $y in year satisfies $y >= 2000]
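A sketch of how the implicit, existential, and universal forms differ when a show has several year children or none. The shows document is made up for illustration, and the condition of the if is parenthesized, as the XQuery 1.0 grammar requires.

```xquery
(: Hypothetical data: show A has two years, B has one, C has none. :)
let $shows :=
  <shows>
    <show><title>A</title><year>1999</year><year>2001</year></show>
    <show><title>B</title><year>2003</year></show>
    <show><title>C</title></show>
  </shows>
return (
  (: conditional on whether any show qualifies :)
  if ($shows/show[year >= 2000]) then "A-OK!" else "Error!",
  (: implicit existential: at least one year >= 2000, so A and B :)
  $shows/show[year >= 2000]/title,
  (: the same test written explicitly :)
  $shows/show[some $y in ./year satisfies $y >= 2000]/title,
  (: universal: all years >= 2000, so B, and C vacuously since it has no year :)
  $shows/show[every $y in year satisfies $y >= 2000]/title
)
```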

33 Type-aware Operators
● Casting
  ● Casts can raise dynamic errors
    "2004-01-01" cast as xs:date == xs:date("2004-01-01")
    "XXX-123" cast as mycompany:inventory-id
● Castable checks permissibility of cast
  if ($invstr castable as mycompany:inventory-id)
  then $invstr cast as mycompany:inventory-id
  else ()
● Instance-of permits dynamic type inquiry
  $invid instance of mycompany:inventory-id
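A minimal sketch of the castable-then-cast pattern. It uses the built-in xs:date in place of the slide's user-defined mycompany:inventory-id, which would need an imported schema, and the two input strings are made up.

```xquery
for $s in ("2004-01-01", "not a date")
return
  if ($s castable as xs:date)
  (: safe to cast here; the result is an instance of xs:date :)
  then ($s cast as xs:date) instance of xs:date
  (: casting this value would raise a dynamic error, so skip the cast :)
  else "not castable"
```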

34 Variability in XML Data
● Replication, absence of XML values
  ● Demands flexible semantics for selection
● Selection
  //show[year >= 2000]
  ● Explicit expression:
    //show[some $v in ./year satisfies fn:data($v) ge 2000]
● Existence/absence of value
  //show/reviewer[following-sibling::rating]
  ● Explicit expression:
    //show/reviewer[fn:not(fn:empty(./following-sibling::rating))]
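A sketch of the implicit and explicit forms side by side, over made-up data. One assumption to note: the inline document is unvalidated, so year carries an untyped value, and the value comparison ge would treat it as a string. The explicit form therefore casts with xs:integer here, whereas the slide's fn:data($v) ge 2000 presumes a schema-validated document in which year is already numeric.

```xquery
(: Hypothetical data: the second show has no year and no rating. :)
let $shows :=
  <shows>
    <show><reviewer>R1</reviewer><rating>5</rating><year>2001</year></show>
    <show><reviewer>R2</reviewer></show>
  </shows>
return (
  (: implicit: a missing year simply fails the predicate, without an error :)
  $shows/show[year >= 2000]/reviewer,
  (: explicit: cast first, because the data is unvalidated (assumption above) :)
  $shows/show[some $v in ./year satisfies xs:integer($v) ge 2000]/reviewer,
  (: existence test: reviewers that are followed by a rating sibling :)
  $shows/show/reviewer[following-sibling::rating],
  (: the same existence test written out :)
  $shows/show/reviewer[fn:not(fn:empty(./following-sibling::rating))]
)
```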

35 Variability in Schemas
● Document with typed values & un-validated text
  ● Demands flexible, but consistent, semantics
● Permissive conversion from text to typed values
  45.50
  /book/price * 0.07
● Strict interpretation of typed values
  Dynamic type error is fatal
  45.50
  /book/@isbn * 0.07
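A sketch of the permissive case on unvalidated data. The book element and its isbn value are made up. The strict case is left as a comment, because it raises an error rather than returning a value.

```xquery
(: Hypothetical unvalidated input; "ISBN-X" is a placeholder, not a real ISBN. :)
let $book := <book isbn="ISBN-X"><price>45.50</price></book>
return (
  (: untyped text is converted to xs:double for the arithmetic :)
  $book/price * 0.07,
  (: without validation the price has an untyped atomic value :)
  fn:data($book/price) instance of xs:untypedAtomic
  (: by contrast, xs:string($book/@isbn) * 0.07 would raise a type error,
     since a value typed as a string is never converted to a number :)
)
```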

