Presentation is loading. Please wait.

Presentation is loading. Please wait.

XML Model and Processing Transparency No. 1 Query XML Documents with XQuery Cheng-Chia Chen.

Similar presentations


Presentation on theme: "XML Model and Processing Transparency No. 1 Query XML Documents with XQuery Cheng-Chia Chen."— Presentation transcript:

1 XML Model and Processing Transparency No. 1 Query XML Documents with XQuery Cheng-Chia Chen

2 Introduction to XLink Transparency No. 2 Objectives How XML generalizes relational databases The XQuery language How XML may be supported in databases

3 Introduction to XLink Transparency No. 3 XQuery 1.0 XML documents naturally generalize database relations XQuery is the corresponding generalization of SQL

4 Introduction to XLink Transparency No. 4 Queries on XML documents XML documents generalize relational data: relations (tables) tuples (record; rows) attributes (fields; columns)

5 Introduction to XLink Transparency No. 5 From Relations to Trees

6 Introduction to XLink Transparency No. 6 Only Some Trees are Relations A relation is a tree of height two with an unbounded number of children (rows) all of which have the same fixed number of child nodes (columns) The database community has been looking for a richer data model than relations. Hierarchical, object-oriented, or multi-dimensional databases have emerged, but neither has reached consensus. A DTD for RDB :

7 Introduction to XLink Transparency No. 7 Trees Are Not Relations Not all XML Document trees satisfy the previous characterization Trees are ordered, while both rows and columns of tables may be permuted without changing the meaning of the data Trees may have height > 2. Trees is in general not homogeneous.  Elements of the same type may have different number and types of children. This does not mean that we cannot store these documents in traditional data base but only that a more complex encoding of xml documents into data base tables is necessary.

8 Introduction to XLink Transparency No. 8 An examlpe student grade records Joe Average 21 Biology Jack Doe 18 Physics XML Science

9 Introduction to XLink Transparency No. 9 A Corresponding Student Database key point of data base design: decompose general xml tree into flat data base tables

10 Introduction to XLink Transparency No. 10 For Relational Data base, we have well-established theory and practice, can we have the same achievement for XML ? database XML DDL(DataDefintionLang) XML Schema DQL(DataQueryLang;SQL) XQuery How should query languages like SQL be similarly generalized?

11 Introduction to XLink Transparency No. 11 Usage Scenarios of an XML query language For data-oriented (highly structured) languages We want to carry over the kinds of queries that we performed in the original relational model query (virtual) XML representations of databases, transform data into new XML representations, integrate data from multiple heterogeneous data sources For document-oriented (semi-structured) languages Queries could be used to retrieve parts of documents to provide dynamic indexes to perform context-sensitive searching to generate new documents as combinations of existing documents Studied in the domain of informational retrieval (IR)

12 Introduction to XLink Transparency No. 12 Usage Scenarios For programming/protocol languages Queries could be used to automatically generate documentation, like javadoc. For hybrid languages Queries could be used to data mine hybrid data, such as patient records to perform queries on documents with embedded data, such as catalogs, patient health records, employment records, or business analysis documents

13 Introduction to XLink Transparency No. 13 XQuery Design Requirements Must have at least one XML syntax and at least one human-readable syntax Must be declarative Must be namespace aware Must coordinate with XML Schema Must support simple and complex datatypes Must combine information from multiple documents Must be able to transform and create XML trees

14 Introduction to XLink Transparency No. 14 The XQuery language Developed by W3C, currently at the level of a Working Draft.currently Derived from several previous proposals: XML-QL YATL Lorel Quilt which all agree on the fundamental principles. XQuery relies on XPath and XML Schema datatypes. Although not finalized yet, many commercial implementations have been released.commercial implementations Two formats XQuery : plaintext syntax XQueryX : XML syntax.

15 Introduction to XLink Transparency No. 15 Relationship to XPath XQuery 1.0 is a strict superset of XPath 2.0 Every XPath 2.0 expression is directly an XQuery 1.0 expression (a query) The extra expressive power is the ability to join information from different sources and generate new XML fragments

16 Introduction to XLink Transparency No. 16 Relationship to XSLT XQuery and XSLT are both domain-specific languages for combining and transforming XML data from multiple sources They are vastly different in design, partly for historical reasons XQuery is designed from scratch, XSLT is an intellectual descendant of CSS (cascaded StyleSheet) – about how to render a document. Technically, they may emulate each other. XQuery Grammar

17 Introduction to XLink Transparency No. 17 XQuery concepts A query in XQuery is an expression that: reads a sequence of XML fragments or atomic values returns a sequence of XML fragments or atomic values The principal forms of XQuery expressions are: path expressions element constructors FLWR ("flower") expressions list expressions conditional expressions quantified expressions datatype expressions

18 Introduction to XLink Transparency No. 18 XQuery Modules and Prologs An XQuery is: a main module + zero or more library modules. MainModule ::= Prolog QueryBobyMainModule LibraryModule ::= ModuleDecl PrologLibraryModule Module ::= VersionDecl? (MainModule | LibraryModule)VersionDecl Like XPath expressions, XQuery expressions are evaluated relatively to a context This is explicitly provided by a prolog Settings define various parameters for the XQuery processor language, such as: xquery version "1.0"; (: this is verisondecl :) declare xmlspace preserve; declare xmlspace strip;

19 Introduction to XLink Transparency No. 19 More From the Prolog PrologProlog ::= ( (DefaultNamespaceDecl | Setter | NamespaceDecl | Import)DefaultNamespaceDeclSetterNamespaceDeclImport Separator )*Separator ( (VarDecl | FunctionDecl | OptionDecl)VarDeclFunctionDeclOptionDecl Separator )*Separator [21] SchemaImport ::= "import" "schema" SchemaPrefix? URILiteral ("at" URILiteral ("," URILiteral)*)?SchemaImportSchemaPrefixURILiteral [22] SchemaPrefix ::= ("namespace" NCName "=") | ("default" "element" "namespace")SchemaPrefixNCName [23] ModuleImport ::= "import" "module" ("namespace" NCName "=")? URILiteral ("at" URILiteral ("," URILiteral)*)?ModuleImportNCNameURILiteral (: for element and type :) declare default element namespace URI; declare default function namespace URI; import schema namespace soap="http://.../soap-envelope" at "http://.../soap-envelope/"; declare namespace NCName = URI;

20 Introduction to XLink Transparency No. 20 Implicit Declarations declare namespace xml = "http://www.w3.org/XML/1998/namespace"; (: for xml schema and schema instance :) xs = "http://www.w3.org/2001/XMLSchema"; xsi = "http://www.w3.org/2001/XMLSchema-instance"; (: for xpath fucntion and data types :) fn = "http://www.w3.org/2005/11/xpath-functions"; (: for xquery fucntions :) local="http://www.w3.org/2005/11/xquery-local-functions"; only xml prefix cannot be redefined.

21 Introduction to XLink Transparency No. 21 Xquery Conecpts QueryBody = Expr [31] Expr ::= ExprSingle ("," ExprSingle)*ExprExprSingle [32] ExprSingle ::= FLWORExpr | QuantifiedExpr | TypeswitchExpr | IfExpr | OrExprExprSingleFLWORExprQuantifiedExprTypeswitchExprIfExprOrExpr Expressions are evaluated relative to a context: namespaces variables functions date and time context item (current node or atomic value) context position (in the sequence being processed) context size (of the sequence being processed)

22 Introduction to XLink Transparency No. 22 XPath Expressions XPath expressions are also legal XQuery expressions same data model : sequence of items items : nodes or atomic data The XQuery prolog gives the required static context namespace, function library, variable bindings The initial context node, position, and size are undefined Since XQuery is intended to operate on multple docuemnts. initial context info can be obtained by starting from fn:doc()

23 Introduction to XLink Transparency No. 23 Datatype Expressions Same atomic values as XPath 2.0 Also lots of primitive simple values: xs:string("XML is fun") xs:boolean("true") xs:decimal("3.1415") xs:float("6.02214199E23") xs:dateTime("1999-05-31T13:20:00-05:00") xs:time("13:20:00-05:00") xs:date("1999-05-31") xs:gYearMonth("1999-05") xs:gYear("1999") xs:hexBinary("48656c6c6f0a") xs:base64Binary("SGVsbG8K") xs:anyURI("http://www.brics.dk/ixwt/") xs:QName("rcp:recipe") Functions for theses types can be found in fn:XXX().

24 Introduction to XLink Transparency No. 24 XML Expressions XQuery expressions may compute new XML nodes Expressions may denote element, character data, comment, and processing instruction nodes Each node is created with a unique node identity Constructors may be either direct (literal template ) or computed

25 Introduction to XLink Transparency No. 25 Uses the standard XML syntax Ex: The expression baz evaluates to the given XML fragment Note that is evaluates to false why ? ( unique identity of each node ) Direct Constructors

26 Introduction to XLink Transparency No. 26 Namespaces in Constructors (1/3) The following three constructions have the same effect. declare default element namespace "http://businesscard.org"; John Doe CEO, Widget Inc. john.doe@widget.com (202) 555-1414

27 Introduction to XLink Transparency No. 27 Namespaces in Constructors (2/3) declare namespace b = "http://businesscard.org"; John Doe CEO, Widget Inc. john.doe@widget.com (202) 555-1414

28 Introduction to XLink Transparency No. 28 Namespaces in Constructors (3/3) John Doe CEO, Widget Inc. john.doe@widget.com (202) 555-1414

29 Introduction to XLink Transparency No. 29 Enclosed Expressions Called attribuate value template in XSLT can be appled to element contents also. 1 2 3 4 5 {1, 2, 3, 4, 5} {1, "2", 3, 4, 5} {1 to 5} 1 {1+1} {" "} {"3"} {" "} {4 to 5} ‘ { ‘ now has special meaning and must be escaped using {

30 Introduction to XLink Transparency No. 30 Explicit Constructors John Doe CEO, Widget Inc. john.doe@widget.com (202) 555-1414 element card { namespace { "http://businesscard.org" }, element name { text { "John Doe" } }, element title { text { "CEO, Widget Inc." } }, element email { text { "john.doe@widget.com" } }, element phone { text { "(202) 555-1414" } }, element logo { attribute uri { "widget.gif" } }}

31 Introduction to XLink Transparency No. 31 Computed QNames element { "card" } { namespace { "http://businesscard.org" }, element { "name" } { text { "John Doe" } }, element { "title" } { text { "CEO, Widget Inc." } }, element { "email" } { text { "john.doe@widget.com" } }, element { "phone" } { text { "(202) 555-1414" } }, element { "logo" } { attribute { "uri" } { "widget.gif" } }

32 Introduction to XLink Transparency No. 32 element { if ($lang= ” zh_TW “ ) then ” 名片 " else "card" } { namespace { "http://businesscard.org" }, element { if ($lang= “ zh_TW ” ) then ” 姓名 " else "name" } { text { "John Doe" } }, element { if ($lang= ” zh_TW “ ) then ” 頭銜 " else "title" } { text { "CEO, Widget Inc." } }, element { "email" } { text { "john.doe@widget.inc" } }, element { if ($lang= “ zh_TW ” ) then ” 電話 " else "phone"} { text { "(202) 456-1414" } }, element { "logo" } { attribute { "uri" } { "widget.gif" } }} Biliingual Business Cards

33 Introduction to XLink Transparency No. 33 FLWOR Expressions Used for general queries: Find the names of all students which have more than one major and order them by student id. { for $s in fn:doc("students.xml")//student let $m := $s/major where fn:count($m) ge 2 order by $s/@id return { $s/name/text() } }

34 Introduction to XLink Transparency No. 34 The Difference Between For and Let (1/4) for $x in (1, 2, 3, 4) let $y := ("a", "b", "c") return ($x, $y) 1, a, b, c, 2, a, b, c, 3, a, b, c, 4, a, b, c

35 Introduction to XLink Transparency No. 35 The Difference Between For and Let (2/4) let $x in (1, 2, 3, 4) for $y := ("a", "b", "c") return ($x, $y) 1, 2, 3, 4, a, 1, 2, 3, 4, b, 1, 2, 3, 4, c

36 Introduction to XLink Transparency No. 36 The Difference Between For and Let (3/4) for $x in (1, 2, 3, 4) for $y in ("a", "b", "c") return ($x, $y) 1, a, 1, b, 1, c, 2, a, 2, b, 2, c, 3, a, 3, b, 3, c, 4, a, 4, b, 4, c

37 Introduction to XLink Transparency No. 37 The Difference Between For and Let (4/4) let $x := (1, 2, 3, 4) let $y := ("a", "b", "c") return ($x, $y) 1, 2, 3, 4, a, b, c

38 Introduction to XLink Transparency No. 38 Computing Joins Find distinct names of recipses that make use of at least one stuff in a refrigerator? declare namespace rcp = "http://www.brics.dk/ixwt/recipes"; for $r in fn:doc("recipes.xml")//rcp:recipe for $i in $r//rcp:ingredient/@name for $s in fn:doc("fridge.xml")//stuff[text()=$i] return fn:distinct($r/rcp:title/text()) eggs olive oil ketchup unrecognizable moldy thing

39 Introduction to XLink Transparency No. 39 Inverting a Relation From Recipe  ingredients to ingredietn  recipes declare namespace rcp = "http://www.brics.dk/ixwt/recipes"; { for $i in distinct-values( fn:doc("recipes.xml")//rcp:ingredient/@name ) return { for $r in fn:doc("recipes.xml")//rcp:recipe where $r//rcp:ingredient[@name=$i] return $r/rcp:title/text() } }

40 Introduction to XLink Transparency No. 40 Sorting the Results declare namespace rcp = "http://www.brics.dk/ixwt/recipes"; { for $i in distinct-values( fn:doc("recipes.xml")//rcp:ingredient/@name ) order by $i return { for $r in fn:doc("recipes.xml")//rcp:recipe where $r//rcp:ingredient[@name=$i] order by $r/rcp:title/text() return $r/rcp:title/text() } }

41 Introduction to XLink Transparency No. 41 A More Complicated Sorting for $s in document("students.xml")//student order by fn:count($s/results/result[fn:contains(@grade,"A")]) descending, fn:count($s/major) descending, xs:integer($s/age/text()) ascending return $s/name/text()

42 Introduction to XLink Transparency No. 42 Using Functions declare function local:grade($g) { if ($g="A") then 4.0 else if ($g="A-") then 3.7 else if ($g="B+") then 3.3 else if ($g="B") then 3.0 else if ($g="B-") then 2.7 else if ($g="C+") then 2.3 else if ($g="C") then 2.0 else if ($g="C-") then 1.7 else if ($g="D+") then 1.3 else if ($g="D") then 1.0 else if ($g="D-") then 0.7 else 0 }; declare function local:gpa($s) { fn:avg(for $g in $s/results/result/@grade return local:grade($g)) }; { for $s in fn:doc("students.xml")//student return }

43 Introduction to XLink Transparency No. 43 A Height Function Find the heigth of an node tree. declare function local:height($x) { if (fn:empty($x/*)) then 1 else fn:max(for $y in $x/* return local:height($y))+1 };

44 Introduction to XLink Transparency No. 44 A Textual Outline intended textual outline of a recipe unusual in that it generate plain text instead of xml output. Cailles en Sarcophages pastry chilled unsalted butter flour salt ice water filling baked chicken marinated chicken small chickens, cut up Herbes de Provence dry white wine orange juice minced garlic truffle oil...

45 Introduction to XLink Transparency No. 45 Computing Textual Outlines declare namespace rcp = "http://www.brics.dk/ixwt/recipes"; declare function local:ingredients($i,$p){ fn:string-join( (:$p is iden spaces :) for $j in $i/rcp:ingredient return fn:string-join(($p,$j/@name," ",local:ingredients($j,fn:concat($p," "))),""),"") }; declare function local:recipes($r) { fn:concat( $r/rcp:title/text()," ", local:ingredients($r," ")) }; fn:string-join( for $r in fn:doc("recipes.xml")//rcp:recipe[5] return local:recipes($r), ”” )

46 Introduction to XLink Transparency No. 46 Sequence Types 2 instance of xs:integer 2 instance of item() 2 instance of xs:integer? () instance of empty() () instance of xs:integer* (1,2,3,4) instance of xs:integer* (1,2,3,4) instance of xs:integer+ instance of item() instance of node() instance of element() instance of element(foo) /@bar instance of attribute() /@bar instance of attribute(bar) fn:doc("recipes.xml")//rcp:ingredient instance of element()+ fn:doc("recipes.xml")//rcp:ingredient instance of element(rcp:ingredient)+

47 Introduction to XLink Transparency No. 47 An Untyped Function declare function local:grade($g) { if ($g="A") then 4.0 else if ($g="A-") then 3.7 else if ($g="B+") then 3.3 else if ($g="B") then 3.0 else if ($g="B-") then 2.7 else if ($g="C+") then 2.3 else if ($g="C") then 2.0 else if ($g="C-") then 1.7 else if ($g="D+") then 1.3 else if ($g="D") then 1.0 else if ($g="D-") then 0.7 else 0 };

48 Introduction to XLink Transparency No. 48 A Default Typed Function declare function local:grade($g as item()*) as item()* { if ($g="A") then 4.0 else if ($g="A-") then 3.7 else if ($g="B+") then 3.3 else if ($g="B") then 3.0 else if ($g="B-") then 2.7 else if ($g="C+") then 2.3 else if ($g="C") then 2.0 else if ($g="C-") then 1.7 else if ($g="D+") then 1.3 else if ($g="D") then 1.0 else if ($g="D-") then 0.7 else 0 };

49 Introduction to XLink Transparency No. 49 A Precisely Typed Function declare function local:grade($g as xs:string) as xs:decimal { if ($g="A") then 4.0 else if ($g="A-") then 3.7 else if ($g="B+") then 3.3 else if ($g="B") then 3.0 else if ($g="B-") then 2.7 else if ($g="C+") then 2.3 else if ($g="C") then 2.0 else if ($g="C-") then 1.7 else if ($g="D+") then 1.3 else if ($g="D") then 1.0 else if ($g="D-") then 0.7 else 0 };

50 Introduction to XLink Transparency No. 50 Another Typed Function declare function local:grades($s as element(students)) as attribute(grade)* { $s/student/results/result/@grade };

51 Introduction to XLink Transparency No. 51 Runtime Type Checks Type annotations are checked during runtime A runtime type error is provoked when an actual argument value does not match the declared type a function result value does not match the declared type a valued assigned to a variable does not match the declared type

52 Introduction to XLink Transparency No. 52 Built-In Functions Have Signatures fn:contains($x as xs:string?, $y as xs:string?) as xs:boolean op:union($x as node()*, $y as node()*) as node()*

53 Introduction to XLink Transparency No. 53 XQueryX for $t in fn:doc("recipes.xml")/rcp:collection/rcp:recipe/rcp:title return $t <xqx:module xmlns:xqx="http://www.w3.org/2003/12/XQueryX" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/12/XQueryX xqueryx.xsd"> t doc recipes.xml child rcp:collection child rcp:recipe child xqx:nodeName> rcp:title t

54 Introduction to XLink Transparency No. 54 XML Databases How can XML and databases be merged? Several different approaches: extract XML views of relations use SQL to generate XML shred XML into relational databases

55 Introduction to XLink Transparency No. 55 The Student Database Again

56 Introduction to XLink Transparency No. 56 Automatic XML Views (1/2) Columns as attributes

57 Introduction to XLink Transparency No. 57 Automatic XML Views (2/2) Column as elements 100026 Joe Average 21 100078 Jack Doe 18

58 Introduction to XLink Transparency No. 58 Programmable Views wrap query results by xml elements SQL/XML xmlelement(name, "Students", select xmlelement(name, "record", xmlattributes(s.id, s.name, s.age)) from Students ) xmlelement(name, "Students", select xmlelement(name, "record", xmlforest(s.id, s.name, s.age)) from Students )

59 Introduction to XLink Transparency No. 59 XML Shredding Shredding an XML documents into multiple data base Tables Each element type is represented by a relation Each element node is assigned a unique key in document order so siblings can be ordered. Each element node contains the key of its parent so parent –child relations can be maintained The possible attributes are represented as fields, where absent attributes have the null value Contents consisting of a single character data node (#PCDATA only) is inlined as a field.

60 Introduction to XLink Transparency No. 60 From XQuery to SQL Any XML document can be faithfully represented This takes advantage of the existing database implementation Queries must now be phrased in ordinary SQL rather than XQuery But an automatic translation is possible //rcp:ingredient[@name="butter"]/@amount select ingredient.amount from ingredient where ingredient.name="butter"

61 Introduction to XLink Transparency No. 61 Summary XML trees generalize relational tables XQuery similarly generalizes SQL XQuery and XSLT have roughly the same expressive power But they are suited for different application domains: data-centric vs. document-centric

62 Introduction to XLink Transparency No. 62 Essential Online Resources http://www.w3.org/TR/xquery/ http://www.w3.org/XML/Query/ miplementations: http://www.galaxquery.org/ http://exists.sf.net http://saxon.sf.net


Download ppt "XML Model and Processing Transparency No. 1 Query XML Documents with XQuery Cheng-Chia Chen."

Similar presentations


Ads by Google