Published by Bennett Barton. Modified over 9 years ago.
1 Lecture 5: XML and XQuery

2 Semistructured Data
- Another data model, based on trees.
- Motivation: flexible representation of data.
  - Often, data comes from multiple sources with differences in notation, meaning, etc.
- Motivation: sharing of documents among systems and databases.

3 Graphs of Semistructured Data
- Nodes = objects.
- Labels on arcs (attributes, relationships).
- Atomic values at leaf nodes (nodes with no arcs out).
- Flexibility: no restriction on:
  - Labels out of a node.
  - Number of successors with a given label.
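
4 Example: Data Graph
[Figure: data graph. Arcs from the root are labeled beer and bar; other arc labels are name, manf, servedAt, addr, prize, year, and award; leaf values include Bud, A.B., Gold, 1995, Maple, Joe’s, and M’lob. Callouts mark the bar object for Joe’s Bar and the beer object for Bud.]
- Notice a new kind of data.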

5 XML
- XML = Extensible Markup Language.
- While HTML uses tags for formatting (e.g., “italic”), XML uses tags for semantics (e.g., “this is an address”).
- Key idea: create tag sets for a domain (e.g., genomics), and translate all data into properly tagged XML documents.

6 Well-Formed and Valid XML
- Well-Formed XML allows you to invent your own tags.
  - Similar to labels in semistructured data.
- Valid XML involves a DTD (Document Type Definition), a grammar for tags.

7 Well-Formed XML
- Start the document with a declaration, surrounded by <?xml … ?>.
- Normal declaration is: <?xml version = "1.0" standalone = "yes" ?>
  - “Standalone” = “no DTD provided.”
- Balance of document is a root tag surrounding nested tags.

8 Tags
- Tags, as in HTML, are normally matched pairs, as <FOO> … </FOO>.
- Tags may be nested arbitrarily.
- XML tags are case sensitive.
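
9 Example: Well-Formed XML
    <?xml version = "1.0" standalone = "yes" ?>
    <BARS>
      <BAR><NAME>Joe’s Bar</NAME>
        <BEER><NAME>Bud</NAME> <PRICE>2.50</PRICE></BEER>
        <BEER><NAME>Miller</NAME> <PRICE>3.00</PRICE></BEER>
      </BAR>
      <BAR> …
    </BARS>
- <NAME>Joe’s Bar</NAME> is a NAME subobject.
- Each <BEER> … </BEER> is a BEER subobject.

10 XML and Semistructured Data
- Well-Formed XML with nested tags is exactly the same idea as trees of semistructured data.
- We shall see that XML also enables nontree structures, as does the semistructured data model.

11 Example
- The XML document is:
    <BARS>
      <BAR><NAME>Joe’s Bar</NAME>
        <BEER><NAME>Bud</NAME> <PRICE>2.50</PRICE></BEER>
        <BEER><NAME>Miller</NAME> <PRICE>3.00</PRICE></BEER>
      </BAR>
      <BAR> …
    </BARS>
[Figure: the same document drawn as a tree, with BARS at the root, BAR nodes below it, and NAME, BEER, and PRICE nodes beneath.]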
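
The correspondence between well-formed XML and trees can be tried out with a short Python sketch. It uses the standard-library `xml.etree.ElementTree` parser; the document string is a reconstruction of the bars example, and the helper names `is_well_formed` and `to_tree` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The bars document from the slides, rebuilt here as an assumption.
DOC = """<?xml version="1.0" standalone="yes"?>
<BARS>
  <BAR><NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME><PRICE>3.00</PRICE></BEER>
  </BAR>
</BARS>"""


def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def to_tree(elem):
    """Turn an element into a nested (tag, text-or-children) tuple."""
    children = [to_tree(child) for child in elem]
    return (elem.tag, children if children else (elem.text or "").strip())


root = ET.fromstring(DOC)
tree = to_tree(root)
# Leaves of the tree carry the atomic values.
beers = [(b.findtext("NAME"), float(b.findtext("PRICE")))
         for b in root.iter("BEER")]
```

Unbalanced tags or tags that differ only in case are rejected by the parser, which matches the rules on the Tags slide.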

12 DTD Structure
    <!DOCTYPE <root tag> [
      <!ELEMENT <name> ( <components> )>
      … more elements …
    ]>

13 DTD Elements
- The description of an element consists of its name (tag), and a parenthesized description of any nested tags.
  - Includes order of subtags and their multiplicity.
- Leaves (text elements) have #PCDATA (Parsed Character DATA) in place of nested tags.

14 Example: DTD
    <!DOCTYPE BARS [
      <!ELEMENT BARS (BAR*)>
      <!ELEMENT BAR (NAME, BEER+)>
      <!ELEMENT NAME (#PCDATA)>
      <!ELEMENT BEER (NAME, PRICE)>
      <!ELEMENT PRICE (#PCDATA)>
    ]>
- A BARS object has zero or more BAR’s nested within.
- A BAR has one NAME and one or more BEER subobjects.
- A BEER has a NAME and a PRICE.
- NAME and PRICE are text.

15 Element Descriptions
- Subtags must appear in order shown.
- A tag may be followed by a symbol to indicate its multiplicity.
  - * = zero or more.
  - + = one or more.
  - ? = zero or one.
- Symbol | can connect alternative sequences of tags.

16 Example: Element Description
- A name is an optional title (e.g., “Prof.”), a first name, and a last name, in that order, or it is an IP address:
    <!ELEMENT NAME ( (TITLE?, FIRST, LAST) | IPADDR )>
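
An element description behaves like a regular expression over the sequence of child tags. As a rough sketch (not how a validating parser is actually implemented), the NAME content model above can be written as a Python regex; the encoding of the children as a space-terminated string and the helper `matches_name_model` are assumptions made for this illustration.

```python
import re

# Content model ((TITLE?, FIRST, LAST) | IPADDR), expressed over a string in
# which every child tag name is followed by one space.
NAME_MODEL = re.compile(r"(?:(?:TITLE )?FIRST LAST |IPADDR )")


def matches_name_model(child_tags):
    """Check a list of child tag names against the NAME content model."""
    sequence = "".join(tag + " " for tag in child_tags)
    return NAME_MODEL.fullmatch(sequence) is not None
```

For instance, `matches_name_model(["FIRST", "LAST"])` holds because TITLE is optional, while `["LAST", "FIRST"]` fails because subtags must appear in the order shown.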
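
17 Use of DTD’s
1. Set standalone = "no".
2. Either:
   a) Include the DTD as a preamble of the XML document, or
   b) Follow DOCTYPE and the <root tag> by SYSTEM and a path to the file where the DTD can be found.

18 Example (a)
    <?xml version = "1.0" standalone = "no" ?>
    <!DOCTYPE BARS [
      <!ELEMENT BARS (BAR*)>
      <!ELEMENT BAR (NAME, BEER+)>
      <!ELEMENT NAME (#PCDATA)>
      <!ELEMENT BEER (NAME, PRICE)>
      <!ELEMENT PRICE (#PCDATA)>
    ]>
    <BARS>
      <BAR><NAME>Joe’s Bar</NAME>
        <BEER><NAME>Bud</NAME> <PRICE>2.50</PRICE></BEER>
        <BEER><NAME>Miller</NAME> <PRICE>3.00</PRICE></BEER>
      </BAR>
      <BAR> …
    </BARS>
- The DOCTYPE block is the DTD; the BARS element is the document.

19 Example (b)
- Assume the BARS DTD is in file bar.dtd.
    <?xml version = "1.0" standalone = "no" ?>
    <!DOCTYPE BARS SYSTEM "bar.dtd">
    <BARS>
      <BAR><NAME>Joe’s Bar</NAME>
        <BEER><NAME>Bud</NAME> <PRICE>2.50</PRICE></BEER>
        <BEER><NAME>Miller</NAME> <PRICE>3.00</PRICE></BEER>
      </BAR>
      <BAR> …
    </BARS>
- The DOCTYPE line says to get the DTD from the file bar.dtd.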

20 Attributes
- Opening tags in XML can have attributes.
- In a DTD, <!ATTLIST E … > declares an attribute for element E, along with its datatype.

21 Example: Attributes
- Bars can have an attribute kind, a character string describing the bar.
    <!ATTLIST BAR kind CDATA #IMPLIED>
  - CDATA: character string type; no tags.
  - #IMPLIED: attribute is optional; the opposite is #REQUIRED.

22 Example: Attribute Use
- In a document that allows BAR tags, we might see:
    <BAR kind = "…">
      <NAME>Akasaka</NAME>
      <BEER><NAME>Sapporo</NAME> <PRICE>5.00</PRICE></BEER>
      ...
    </BAR>
- Note attribute values are quoted.

23 ID’s and IDREF’s
- Attributes can be pointers from one object to another.
  - Compare to HTML’s NAME = "foo" and HREF = "#foo".
- Allows the structure of an XML document to be a general graph, rather than just a tree.

24 Creating ID’s
- Give an element E an attribute A of type ID.
- When using tag <E> in an XML document, give its attribute A a unique value.
- Example: <E A = "…">

25 Creating IDREF’s
- To allow objects of type F to refer to another object with an ID attribute, give F an attribute of type IDREF.
- Or, let the attribute have type IDREFS, so the F-object can refer to any number of other objects.

26 Example: ID’s and IDREF’s
- Let’s redesign our BARS DTD to include both BAR and BEER subelements.
- Both bars and beers will have ID attributes called name.
- Bars have SELLS subobjects, consisting of a number (the price of one beer) and an IDREF theBeer leading to that beer.
- Beers have attribute soldBy, which is an IDREFS leading to all the bars that sell it.

27 The DTD
    <!DOCTYPE BARS [ … ]>
- Bar elements have name as an ID attribute and have one or more SELLS subelements.
- SELLS elements have a number (the price) and one reference to a beer.
- Beer elements have an ID attribute called name, and a soldBy attribute that is a set of Bar names.
- BEER is declared as an empty element, explained next.

28 Example Document
    <BARS>
      <BAR name = "JoesBar">
        <SELLS theBeer = "Bud">2.50</SELLS>
        <SELLS theBeer = "Miller">3.00</SELLS>
      </BAR> …
      <BEER name = "Bud" soldBy = "JoesBar SuesBar …"/> …
    </BARS>

29 Empty Elements
- We can do all the work of an element in its attributes.
  - Like BEER in previous example.
- Another example: SELLS elements could have attribute price rather than a value that is a price.

30 Example: Empty Element
- In the DTD, declare:
    <!ELEMENT SELLS EMPTY>
  together with ATTLIST declarations for theBeer and price.
- Example use:
    <SELLS theBeer = "Bud" price = "2.50"/>
- Note the exception to the “matching tags” rule.
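
To see how ID and IDREFS attributes turn the tree into a graph, here is a small Python sketch that resolves references by hand. It assumes `name` is the ID attribute of BAR and BEER, as in the redesigned DTD; the SuesBar element and its price are invented for illustration, and `follow` is a hypothetical helper, not part of any XML library.

```python
import xml.etree.ElementTree as ET

# Hypothetical document in the style of the BARS example; SuesBar and its
# price are invented for illustration.
DOC = """<BARS>
  <BAR name="JoesBar">
    <SELLS theBeer="Bud">2.50</SELLS>
    <SELLS theBeer="Miller">3.00</SELLS>
  </BAR>
  <BAR name="SuesBar">
    <SELLS theBeer="Bud">2.75</SELLS>
  </BAR>
  <BEER name="Bud" soldBy="JoesBar SuesBar"/>
  <BEER name="Miller" soldBy="JoesBar"/>
</BARS>"""

root = ET.fromstring(DOC)

# ElementTree does not read DTDs, so the ID index is built by hand,
# assuming `name` is the ID attribute of BAR and BEER.
by_id = {}
for elem in root:
    key = elem.get("name")
    if key in by_id:
        raise ValueError(f"duplicate ID: {key}")  # ID values must be unique
    by_id[key] = elem


def follow(idrefs):
    """Resolve an IDREF or IDREFS value to the elements it points to."""
    return [by_id[ref] for ref in idrefs.split()]


# soldBy is an IDREFS: one beer points to many bars.
bud_sellers = [bar.get("name") for bar in follow(by_id["Bud"].get("soldBy"))]
# theBeer is an IDREF: each SELLS points to exactly one beer.
joes_beers = [follow(s.get("theBeer"))[0].get("name")
              for s in by_id["JoesBar"].findall("SELLS")]
```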

31 XPath
- Path Expressions
- Conditions

32 Paths in XML Documents
- XPath is a language for describing paths in XML documents.
- Really think of the semistructured data graph and its paths.

33 Example DTD
    <!DOCTYPE BARS [ … ]>

34 Example Document
    <BARS>
      <BAR name = "JoesBar">
        <PRICE theBeer = "Bud">2.50</PRICE>
        <PRICE theBeer = "Miller">3.00</PRICE>
      </BAR> …
      <BEER name = "Bud" soldBy = "JoesBar SuesBar …"/> …
    </BARS>

35 Path Descriptors
- Simple path descriptors are sequences of tags separated by slashes (/).
- If the descriptor begins with /, then the path starts at the root and has those tags, in order.
- If the descriptor begins with //, then the path can start anywhere.

36 Value of a Path Descriptor
- Each path descriptor, applied to a document, has a value that is a sequence of elements.
- An element is an atomic value or a node.
- A node is matching tags and everything in between.
  - I.e., a node of the semistructured graph.

37 Example: /BARS/BAR/PRICE
- Applied to the example document, /BARS/BAR/PRICE describes the set with the two PRICE elements of JoesBar (2.50 and 3.00) as well as the PRICE elements for any other bars.

38 Example: //PRICE
- //PRICE describes the same PRICE elements, but only because the DTD forces every PRICE to appear within a BARS and a BAR.

39 Wild-Card *
- A star (*) in place of a tag represents any one tag.
- Example: /*/*/PRICE represents all price objects at the third level of nesting.

40 Example: /BARS/*
- Applied to the example document, /BARS/* captures all BAR and BEER elements, such as the JoesBar BAR element and the Bud BEER element.

41 Attributes
- In XPath, we refer to attributes by prepending @ to their name.
- Attributes of a tag may appear in paths as if they were nested within that tag.

42 Example: /BARS/*/@name
- /BARS/*/@name selects all name attributes of immediate subelements of the BARS element.
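
The path descriptors above can be approximated in Python with the limited XPath subset of `xml.etree.ElementTree`. This is only a sketch: the document is a cut-down assumption of the example, paths are written relative to the BARS root element, and because ElementTree paths return elements only, attribute values are read with `get`.

```python
import xml.etree.ElementTree as ET

# Cut-down version of the example document (an assumption for illustration).
DOC = """<BARS>
  <BAR name="JoesBar">
    <PRICE theBeer="Bud">2.50</PRICE>
    <PRICE theBeer="Miller">3.00</PRICE>
  </BAR>
  <BEER name="Bud" soldBy="JoesBar"/>
</BARS>"""

root = ET.fromstring(DOC)  # root is the BARS element

# /BARS/BAR/PRICE, written relative to BARS.
prices = [p.text for p in root.findall("BAR/PRICE")]

# //PRICE: PRICE elements at any depth.
all_prices = [p.text for p in root.findall(".//PRICE")]

# /BARS/*: every immediate subelement of BARS.
children = [c.tag for c in root.findall("*")]

# /BARS/*/@name: select subelements that have a name, then read it.
names = [c.get("name") for c in root.findall("*[@name]")]
```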

43 Selection Conditions
- A condition inside […] may follow a tag.
- If so, then only paths that have that tag and also satisfy the condition are included in the result of a path expression.

44 Example: Selection Condition
- /BARS/BAR/PRICE[. < 2.75]
  - Here . denotes the current PRICE element.
  - The condition that the PRICE be < $2.75 makes the 2.50 (Bud) price but not the Miller price satisfy the path descriptor.
  - By contrast, /BARS/BAR[PRICE < 2.75]/PRICE puts the condition on the BAR: it selects every PRICE of any bar that has some price below 2.75, including the Miller price.

45 Example: Attribute in Selection
- /BARS/BAR/PRICE[@theBeer = "Miller"]
  - Now, the 3.00 PRICE element is selected, along with any other prices for Miller.
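
Where a condition is attached matters. The following Python sketch contrasts a condition on PRICE with a condition on BAR; ElementTree supports attribute-equality predicates but not numeric comparisons, so those are emulated with ordinary Python filters. The document is an assumed cut-down of the example.

```python
import xml.etree.ElementTree as ET

# Cut-down version of the example document (an assumption for illustration).
DOC = """<BARS>
  <BAR name="JoesBar">
    <PRICE theBeer="Bud">2.50</PRICE>
    <PRICE theBeer="Miller">3.00</PRICE>
  </BAR>
</BARS>"""

root = ET.fromstring(DOC)

# /BARS/BAR/PRICE[@theBeer = "Miller"]: attribute test, supported directly.
miller = [p.text for p in root.findall("BAR/PRICE[@theBeer='Miller']")]

# /BARS/BAR/PRICE[. < 2.75]: the condition filters each PRICE on its own value.
cheap_prices = [p.text for p in root.findall("BAR/PRICE")
                if float(p.text) < 2.75]

# /BARS/BAR[PRICE < 2.75]/PRICE: the condition filters BARs (true if some
# PRICE child is below 2.75), and then all PRICE children are returned.
prices_of_cheap_bars = [
    p.text
    for bar in root.findall("BAR")
    if any(float(q.text) < 2.75 for q in bar.findall("PRICE"))
    for p in bar.findall("PRICE")
]
```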

46 Axes
- In general, path expressions allow us to start at the root and execute steps to find a sequence of nodes at each step.
- At each step, we may follow any one of several axes.
- The default axis is child:: (go to all the children of the current set of nodes).

47 Example: Axes
- /BARS/BEER is really shorthand for /BARS/child::BEER.
- @ is really shorthand for the attribute:: axis.
  - Thus, /BARS/BEER[@name = "Bud"] is shorthand for /BARS/BEER[attribute::name = "Bud"].

48 More Axes
- Some other useful axes are:
  1. parent:: = parent(s) of the current node(s).
  2. descendant-or-self:: = the current node(s) and all descendants.
     - Note: // is really shorthand for this axis.
  3. ancestor::, ancestor-or-self::, etc.

49 XQuery
- Values
- FLWR Expressions
- Other Expressions

50 XQuery
- XQuery extends XPath to a query language that has power similar to SQL.
- XQuery is an expression language.
  - Like relational algebra: any XQuery expression can be an argument of any other XQuery expression.
  - Unlike RA, with the relation as the sole datatype, XQuery has a subtle type system.

51 The XQuery Type System
1. Atomic values: strings, integers, etc.
   - Also, certain constructed values like true(), date("2004-09-30").
2. Nodes.
   - Seven kinds.
   - We’ll only worry about four, on next slide.

52 Some Node Types
1. Element Nodes are like nodes of semistructured data.
   - Described by !ELEMENT declarations in DTD’s.
2. Attribute Nodes are attributes, described by !ATTLIST declarations in DTD’s.
3. Text Nodes = #PCDATA.
4. Document Nodes represent files.

53 Example Document
    <BARS>
      <BAR name = "JoesBar">
        <PRICE theBeer = "Bud">2.50</PRICE>
        <PRICE theBeer = "Miller">3.00</PRICE>
      </BAR> …
      <BEER name = "Bud" soldBy = "JoesBar SuesBar …"/> …
    </BARS>

54 Example Nodes
[Figure: node tree for the example document. Element nodes (green): BARS, BAR, BEER, PRICE. Attribute nodes (gold): name = "JoesBar", theBeer = "Bud", theBeer = "Miller", name = "Bud", soldBy = "…". Text nodes (purple): 2.50, 3.00.]

55 Document Nodes
- Form: document("<file name>").
- Establishes a document to which a query applies.
- Example: document("/usr/ullman/bars.xml")

56 FLWR Expressions
1. One or more for and/or let clauses.
2. Then an optional where clause.
3. A return clause.

57 Semantics of FLWR Expressions
- Each for creates a loop.
  - let produces only a local definition.
- At each iteration of the nested loops, if any, evaluate the where clause.
- If the where clause returns TRUE, invoke the return clause, and append its value to the output.

58 FOR Clauses
    for <variable> in <expression>, …
- Variables begin with $.
- A for-variable takes on each item in the sequence denoted by the expression, in turn.
- Whatever follows this for is executed once for each value of the variable.

59 Example: FOR
    for $beer in document("bars.xml")/BARS/BEER/@name
    return <BEERNAME>{$beer}</BEERNAME>
- $beer ranges over the name attributes of all beers in our example document.
- Result is a list of tagged names, like <BEERNAME>Bud</BEERNAME> <BEERNAME>Miller</BEERNAME> ...
- The braces mean: “Expand the enclosed string by replacing variables and path exps. by their values.”

60 LET Clauses
    let <variable> := <expression>, …
- Value of the variable becomes the sequence of items defined by the expression.
- Note let does not cause iteration; for does.

61 Example: LET
    let $d := document("bars.xml")
    let $beers := $d/BARS/BEER/@name
    return <BEERNAMES>{$beers}</BEERNAMES>
- Returns one element with all the names of the beers, like: <BEERNAMES>Bud Miller …</BEERNAMES>
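
The difference between for and let can be mimicked in plain Python. In this sketch a list of strings stands in for the sequence of name attributes, and the tag names BEERNAME and BEERNAMES as well as the functions `for_query` and `let_query` are illustrative assumptions rather than anything XQuery defines.

```python
# Stand-in for document("bars.xml")/BARS/BEER/@name (assumed values).
beer_names = ["Bud", "Miller"]


def for_query(names):
    """for $beer in names return <BEERNAME>{$beer}</BEERNAME>

    The for clause iterates, so the return clause runs once per item.
    """
    return [f"<BEERNAME>{name}</BEERNAME>" for name in names]


def let_query(names):
    """let $beers := names return <BEERNAMES>{$beers}</BEERNAMES>

    The let clause binds the whole sequence once, so the return clause runs
    exactly once.
    """
    return [f"<BEERNAMES>{' '.join(names)}</BEERNAMES>"]
```

Here `for_query(beer_names)` yields one element per beer, while `let_query(beer_names)` yields a single element containing all the names.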

62 Following IDREF’s
- XQuery (but not XPath) allows us to use paths that follow attributes that are IDREF’s.
- If x denotes a sequence of one or more IDREF’s, then x=>y denotes all the elements with tag y whose ID’s are one of these IDREF’s.

63 Example
- Find all the beer elements where the beer is sold by Joe’s Bar for less than 3.00.
- Strategy:
  1. $beer will for-loop over all beer elements.
  2. For each $beer, let $joe be either the Joe’s-Bar element, if Joe sells the beer, or the empty sequence if not.
  3. Test whether $joe sells the beer for < 3.00.

64 Example: The Query
    let $d := document("bars.xml")
    for $beer in $d/BARS/BEER
    let $joe := $beer/@soldBy=>BAR[@name="JoesBar"]
    let $joePrice := $joe/PRICE[@theBeer=$beer/@name]
    where $joePrice < 3.00
    return <…>{$beer}</…>
- $joe: attribute soldBy is of type IDREFS. Follow each ref to a BAR and check if its name is Joe’s Bar.
- $joePrice: find that PRICE subelement of the Joe’s Bar element that represents whatever beer is currently $beer.
- where: only pass the values of $beer, $joe, $joePrice to the RETURN clause if the string inside the PRICE element $joePrice is < 3.00.
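
The strategy of this query can be traced step by step in Python. This is a sketch under assumptions: the document below is invented in the style of the example (Coors and SuesBar are made up), IDREFS are followed through a hand-built dictionary, and `cheap_at_joes` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Invented document in the style of the example; Coors and SuesBar are
# assumptions added so that one beer is not sold by Joe.
DOC = """<BARS>
  <BAR name="JoesBar">
    <PRICE theBeer="Bud">2.50</PRICE>
    <PRICE theBeer="Miller">3.00</PRICE>
  </BAR>
  <BAR name="SuesBar">
    <PRICE theBeer="Coors">2.00</PRICE>
  </BAR>
  <BEER name="Bud" soldBy="JoesBar"/>
  <BEER name="Miller" soldBy="JoesBar"/>
  <BEER name="Coors" soldBy="SuesBar"/>
</BARS>"""

root = ET.fromstring(DOC)
bars_by_id = {bar.get("name"): bar for bar in root.findall("BAR")}


def cheap_at_joes(limit=3.00):
    """Names of beers that Joe's Bar sells for less than limit."""
    result = []
    for beer in root.findall("BEER"):                 # for $beer in ...
        # Follow each IDREF in soldBy to its BAR element.
        sellers = [bars_by_id[ref] for ref in beer.get("soldBy").split()]
        # $joe: the Joe's Bar element, or an empty list if Joe lacks the beer.
        joe = [bar for bar in sellers if bar.get("name") == "JoesBar"]
        # $joePrice: Joe's PRICE subelement for the current beer.
        joe_price = [p for bar in joe for p in bar.findall("PRICE")
                     if p.get("theBeer") == beer.get("name")]
        # where $joePrice < limit (true if some item satisfies it).
        if any(float(p.text) < limit for p in joe_price):
            result.append(beer.get("name"))
    return result
```

With the default limit only Bud qualifies: Miller costs exactly 3.00 at Joe's, and for Coors the `joe` list is empty, so the where test fails.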

65 Order-By Clauses
- FLWR is really FLWOR: an order-by clause can precede the return.
- Form: order by <expression>
  - With optional ascending or descending.
- The expression is evaluated for each output element.
- Determines placement in output sequence.

66 Example: Order-By
- List all prices for Bud, lowest first.
    let $d := document("bars.xml")
    for $p in $d/BARS/BAR/PRICE[@theBeer="Bud"]
    order by $p
    return <…>{ $p }</…>

67 Predicates
- Normally, conditions imply existential quantification.
- Example: /BARS/BAR[@name] means “all the bars that have a name.”
- Example: /BARS/BAR[@name="JoesBar"]/PRICE = /BARS/BAR[@name="SuesBar"]/PRICE means “Joe and Sue have at least one price in common.”
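
The existential reading of = between two sequences can be written out in a few lines of Python. The function name `general_eq` and the price lists are assumptions for illustration.

```python
def general_eq(left, right):
    """Existential comparison: true if some pair of items is equal."""
    return any(x == y for x in left for y in right)


# Assumed price sequences for two bars.
joe_prices = ["2.50", "3.00"]
sue_prices = ["2.75", "3.00"]
```

`general_eq(joe_prices, sue_prices)` is true because both bars have a 3.00 price, and comparing anything with an empty sequence is false.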

68 Path Expression Examples
[Figure: object graph of a bibliography. The root Bib has paper and book objects; edge labels include references, author, title, year, http, publisher, page, firstname, lastname, first, and last; leaf values include “Serge”, “Abiteboul”, 1997, “Victor”, and “Vianu”; objects carry identifiers such as &o1, &o12, &o24, and &o29.]
- Path expressions evaluated on this graph: Bib/paper, Bib/book/publisher, Bib/paper/author/lastname.
- Note that order of elements matters!

69 FOR vs. LET: Example
    FOR $x IN document("bib.xml")/bib/book
    RETURN <result> $x </result>
- Returns: <result> <book>...</book> </result> <result> <book>...</book> </result> ...
    LET $x := document("bib.xml")/bib/book
    RETURN <result> $x </result>
- Returns: <result> <book>...</book> <book>...</book> ... </result>

70 XQuery Example 1
- Find all book titles published after 1995:
    FOR $x IN document("bib.xml")/bib/book
    WHERE $x/year > 1995
    RETURN $x/title
- Result: <title> abc </title> <title> def </title> <title> ghi </title>

71 XQuery Example 2
- For each author of a book by Morgan Kaufmann, list all books she published:
    FOR $a IN distinct(document("bib.xml")/bib/book[publisher="Morgan Kaufmann"]/author)
    RETURN <result>
             $a,
             FOR $t IN /bib/book[author=$a]/title
             RETURN $t
           </result>
- distinct = a function that eliminates duplicates (after converting inputs to atomic values).

72 Results for Example 2
    <result> <author> Jones </author> <title> abc </title> <title> def </title> </result>
    <result> <author> Smith </author> <title> ghi </title> </result>
- Observe how nested structure of result elements is determined by the nested structure of the query.

73 XQuery Example 3
    FOR $p IN distinct(document("bib.xml")//publisher)
    LET $b := document("bib.xml")//book[publisher = $p]
    WHERE count($b) > 100
    RETURN $p
- count = (aggregate) function that returns the number of elements.
- For each publisher p, let the list of books published by p be b. Count the number of books in b, and return p if the count exceeds 100.

74 XQuery Example 4
- Find books whose price is larger than average:
    LET $a := avg(document("bib.xml")/bib/book/price)
    FOR $b IN document("bib.xml")/bib/book
    WHERE $b/price > $a
    RETURN $b

75 Collections in XQuery
- Ordered and unordered collections:
  - /bib/book/author = an ordered collection
  - Distinct(/bib/book/author) = an unordered collection
- Examples:
  - LET $a := /bib/book: $a is a collection; a statement ranging over $a iterates over all books in the collection.
  - $b/author is also a collection (several authors...).
  - RETURN $b/author returns a single collection!
- However:

76 Collections in XQuery
What about collections in expressions?
- $b/price: list of n prices
- $b/price * 0.7: list of n numbers??
- $b/price * $b/quantity: list of n x m numbers??
  - Valid only if the two sequences have at most one element
  - Atomization
- $book1/author eq "Kennedy": value comparison
- $book1/author = "Kennedy": general comparison

77 Sorting in XQuery
    FOR $p IN distinct(document("bib.xml")//publisher)
    ORDER BY $p
    RETURN $p/text(),
      FOR $b IN document("bib.xml")//book[publisher = $p]
      ORDER BY $b/price DESCENDING
      RETURN $b/title, $b/price

78 Conditional Expressions: If-Then-Else
    FOR $h IN //holding
    ORDER BY $h/title
    RETURN $h/title,
      IF $h/@type = "Journal"
      THEN $h/editor
      ELSE $h/author

79 Existential Quantifiers
    FOR $b IN //book
    WHERE SOME $p IN $b//para SATISFIES
      contains($p, "sailing") AND contains($p, "windsurfing")
    RETURN $b/title

80 Universal Quantifiers
    FOR $b IN //book
    WHERE EVERY $p IN $b//para SATISFIES
      contains($p, "sailing")
    RETURN $b/title

81 Other Stuff in XQuery
- Before and After
  - for dealing with order in the input
- Filter
  - deletes some edges in the result tree
- Recursive functions
- Namespaces
- References, links …
- Lots more stuff …

82 Appendix
- XML Schema and XQuery Data Model

83 XML Schema
- Includes primitive data types (integers, strings, dates, etc.)
- Supports value-based constraints (integers > 100)
- User-definable structured types
- Inheritance (extension or restriction)
- Foreign keys
- Element-type reference constraints

84 Sample XML Schema
…

85 XML-Query Data Model
- Describes XML data as a tree.
- Node ::= DocNode | ElemNode | ValueNode | AttrNode | NSNode | PINode | CommentNode | InfoItemNode | RefNode
- Source: http://www.w3.org/TR/query-datamodel/ (2/2001)

86 XML-Query Data Model
- Element node (simplified definition):
    elemNode : (QNameValue, {AttrNode}, [ElemNode | ValueNode]) → ElemNode
  - QNameValue means “a tag name”.
  - Reads: “Give me a tag, a set of attributes, a list of elements/values, and I will return an element.”

87 XML Query Data Model
- Example:
    <book price = "55" currency = "USD">
      <title> Foundations … </title>
      <author> Abiteboul </author>
      <author> Hull </author>
      <author> Vianu </author>
      <year> 1995 </year>
    </book>
- As data-model nodes:
    book1 = elemNode(book, {price2, currency3}, [title4, author5, author6, author7, year8])
    price2 = attrNode(…) /* next */
    currency3 = attrNode(…)
    title4 = elemNode(title, string9)
    …

89 XQuery Values
- Item = node or atomic value.
- Value = ordered sequence of zero or more items.
- Examples:
  1. () = empty sequence.
  2. ("Hello", "World")
  3. ("Hello", 2.50, 10)

90 Nesting of Sequences Ignored
- A value can, in principle, be an item of another value.
- But nested list structures are expanded.
- Example: ((1,2),(),(3,(4,5))) = (1,2,3,4,5) = 1,2,3,4,5.
- Important when values are computed by concatenating other values.
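
The expansion of nested sequences can be sketched in Python by modeling an XQuery sequence as a tuple. The function `flatten` is a hypothetical helper written for this illustration.

```python
def flatten(value):
    """Expand nested tuples into one flat list of items."""
    if isinstance(value, tuple):
        items = []
        for inner in value:
            items.extend(flatten(inner))  # empty tuples contribute nothing
        return items
    return [value]  # a single item is a sequence of length one
```

Applied to the slide's example, `flatten(((1, 2), (), (3, (4, 5))))` gives `[1, 2, 3, 4, 5]`.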

91 Effective Boolean Values
- The effective boolean value (EBV) of an expression is:
  1. The actual value if the expression is of type boolean.
  2. FALSE if the expression evaluates to 0, "" [the empty string], or () [the empty sequence].
  3. TRUE otherwise.

92 EBV Examples
1. @name="JoesBar" has EBV TRUE or FALSE, depending on whether the name attribute is "JoesBar".
2. /BARS/BAR[@name="GoldenRail"] has EBV TRUE if some bar is named the Golden Rail, and FALSE if there is no such bar.

93 Boolean Operators
- E1 and E2, E1 or E2, not(E), if (E1) then E2 else E3 apply to any expressions.
- Take EBV’s of the expressions first.
- Example: not(3 eq 5 or 0) has value TRUE.
- Also: true() and false() are functions that return values TRUE and FALSE.
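
The three EBV rules, as stated on the slides, can be written as a small Python function. This is a sketch of the slide's simplified rules only; sequences are modeled as tuples or lists, and `ebv` is a name chosen for this illustration.

```python
def ebv(value):
    """Effective boolean value, following the three rules on the slide."""
    if isinstance(value, bool):
        return value                      # rule 1: booleans are themselves
    if isinstance(value, (int, float)) and value == 0:
        return False                      # rule 2: the number 0
    if value == "" or value == () or value == []:
        return False                      # rule 2: empty string or sequence
    return True                           # rule 3: everything else


# not(3 eq 5 or 0): take the EBV of each operand first.
example = not (ebv(3 == 5) or ebv(0))
```

A non-empty sequence of nodes, such as the result of a path expression that found a matching bar, falls under rule 3 and counts as TRUE.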

94 Quantifier Expressions
    some $x in E1 satisfies E2
1. Evaluate the sequence E1.
2. Let $x (any variable) be each item in the sequence, and evaluate E2.
3. Return TRUE if E2 has EBV TRUE for at least one $x.
- Analogously: every $x in E1 satisfies E2
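
The two quantifiers map directly onto Python's built-in `any` and `all`. In this sketch `some` and `every` are illustrative wrappers, and the paragraph strings are made up.

```python
def some(sequence, condition):
    """some $x in sequence satisfies condition($x)"""
    return any(condition(x) for x in sequence)


def every(sequence, condition):
    """every $x in sequence satisfies condition($x)"""
    return all(condition(x) for x in sequence)


# Made-up paragraphs of a book.
paras = ["sailing and windsurfing", "sailing only"]
```

Note that `every` over an empty sequence is true and `some` over an empty sequence is false, which is the usual convention for universal and existential quantification.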

95 Document Order
- Comparison by document order: << and >>.
- Example: $d/BARS/BEER[@name="Bud"] << $d/BARS/BEER[@name="Miller"] is true iff the Bud element appears before the Miller element in the document $d.

96 Set Operators
- union, intersect, except operate on sequences of nodes.
  - Meanings analogous to SQL.
  - Result eliminates duplicates.
  - Result appears in document order.
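
A minimal Python sketch of the three set operators follows. It assumes each node is represented by its integer position in document order, which is a simplification made for this illustration; the function names are likewise invented (`except_` has a trailing underscore because `except` is a Python keyword).

```python
def union(left, right):
    """Nodes in either sequence, without duplicates, in document order."""
    return sorted(set(left) | set(right))


def intersect(left, right):
    """Nodes in both sequences, without duplicates, in document order."""
    return sorted(set(left) & set(right))


def except_(left, right):
    """Nodes in left but not in right, in document order."""
    return sorted(set(left) - set(right))
```

For example, `union([3, 1, 1], [2, 1])` gives `[1, 2, 3]`: the duplicate is dropped and the result is ordered by position.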

97 Other Operators
- Use Fortran comparison operators to compare atomic values only.
  - eq, ne, gt, ge, lt, le.
- Arithmetic operators: +, -, *, div, idiv, mod.
  - Apply to any expressions that yield arithmetic or date/time values.