Published by Noah Stewart. Modified over 9 years ago.
1
XML Schemas and Queries
Zachary G. Ives
University of Pennsylvania
CIS 455 / 555 – Internet and Web Systems
August 7, 2015
2
2 Readings & Reminders
- Reminder: Homework 1 Milestone 2 is due tonight at 11:59 PM
- Homework 2 pre-release is now posted
- XML, DTD, Schema
- XPath
- XSLT
- For next week: Altinel & Franklin paper on XFilter
3
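3 Sample XML
    <?xml … ?>
    <dblp>
      <mastersthesis mdate="2002…" key="ms/Brown92">
        <author>Kurt P. Brown</author>
        <title>PRPL: A Database Workload Specification Language</title>
        <year>1992</year>
        <school>Univ. of Wisconsin-Madison</school>
      </mastersthesis>
      <article mdate="2002…" key="tr/dec/…">
        <editor>Paul R. McJones</editor>
        <title>The 1995 SQL Reunion</title>
        <journal>Digital System Research Center Report</journal>
        <volume>SRC1997-018</volume>
        <year>1997</year>
        <ee>db/labs/dec/SRC1997-018.html</ee>
        <ee>http://www.mcjones.org/System_R/SQL_Reunion_95/</ee>
      </article>
    </dblp>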
4
4 XML Data Model Visualized
[Figure: the sample document drawn as a tree. Root has a ?xml processing instruction (p-i) and the dblp element. dblp has children mastersthesis and article, each with mdate and key attributes. mastersthesis has author, title, year, school; article has editor, title, year, journal, volume, ee. Leaves are text nodes. Legend: attribute, root, p-i, element, text.]
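The tree view can be reproduced by walking a parsed document. A minimal sketch using Python's standard xml.etree.ElementTree, on an abbreviated copy of the sample; attributes whose values are truncated in the slides are left out, and `outline` is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

# Abbreviated sample; only the fully legible "key" attribute is kept.
SAMPLE = """<dblp>
  <mastersthesis key="ms/Brown92">
    <author>Kurt P. Brown</author>
    <year>1992</year>
  </mastersthesis>
  <article>
    <editor>Paul R. McJones</editor>
    <year>1997</year>
  </article>
</dblp>"""

def outline(elem, depth=0):
    """Return one line per element, attribute, and non-blank text node."""
    pad = "  " * depth
    lines = [f"{pad}element {elem.tag}"]
    for name, value in elem.attrib.items():
        lines.append(f"{pad}  attribute {name}={value}")
    if elem.text and elem.text.strip():
        lines.append(f"{pad}  text {elem.text.strip()}")
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(ET.fromstring(SAMPLE))
print("\n".join(tree_lines))
```

Note that ElementTree does not expose the ?xml declaration as a node, so the p-i node from the figure has no counterpart here.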
5
5 XML Isn’t Enough on Its Own
- It’s too unconstrained for many cases!
- How will we know when we’re getting garbage?
- How will we query?
- How will we understand what we got?
6
6 Document Type Definitions (DTDs)
- A DTD is an EBNF grammar defining XML structure
- An XML document specifies an associated DTD, plus the root element
- The DTD specifies the children of the root (and so on)
- The DTD defines special significance for attributes:
  - IDs: special attributes that are analogous to keys for elements
  - IDREFs: references to IDs
  - IDREFS: space-delimited list of IDREFs
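The ID / IDREF / IDREFS rules amount to a document-wide uniqueness and reference check. The sketch below is a hand-rolled illustration, not a DTD validator: ElementTree does not read DTDs, so which attributes count as ID, IDREF, or IDREFS is passed in by hand. The document, the attribute roles, and `check_ids` are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "key" plays the role of an ID attribute,
# "advisor" an IDREF, and "cites" an IDREFS (space-delimited list).
DOC = """<dblp>
  <person key="p1"/>
  <mastersthesis key="t1" advisor="p1"/>
  <article key="a1" cites="t1 a9"/>
</dblp>"""

def check_ids(root, id_attr, idref_attrs=(), idrefs_attrs=()):
    """Return (duplicate_ids, dangling_refs) for the whole document."""
    seen, duplicates = set(), []
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is None:
            continue
        if value in seen:
            duplicates.append(value)   # IDs share one global space
        seen.add(value)
    dangling = []
    for elem in root.iter():
        refs = [elem.get(a) for a in idref_attrs if elem.get(a) is not None]
        for a in idrefs_attrs:
            refs.extend((elem.get(a) or "").split())
        dangling.extend(r for r in refs if r not in seen)
    return duplicates, dangling

duplicates, dangling = check_ids(ET.fromstring(DOC), "key", ["advisor"], ["cites"])
print(duplicates, dangling)
```

Here the reference to a9 is reported as dangling because no element in the document carries that ID.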
7
7 An Example DTD
- Example DTD:
    <!ATTLIST mastersthesis
        mdate   CDATA #REQUIRED
        key     ID    #REQUIRED
        advisor CDATA #IMPLIED>
    …
- Example use of DTD in XML file:
    …
8
8 DTDs Are Very Limited
- DTDs capture grammatical structure, but have some drawbacks:
  - Only string scalar types
  - Global ID/reference space is inconvenient
  - No way of defining OO-like inheritance
9
9 XML Schema: DTDs Rethought
- Features:
  - XML syntax
  - Better way of defining keys using XPaths
  - Type subclassing
  - … And, of course, built-in datatypes
10
10 Basic Constructs of Schema
- Separation of elements (and attributes) from types:
  - complexType is a structured type
  - It can have sequences or choices
  - element and attribute have name and type
  - Elements may also have minOccurs and maxOccurs
- Subtyping, most commonly using: …
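Because a schema uses XML syntax, ordinary XML tools can read it. The sketch below parses a small schema with ElementTree and lists the members of a complexType's sequence. The schema is a hypothetical one written for this illustration, and `sequence_members` is an assumed helper name; nothing is validated here.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema written for this sketch.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="mastersthesis" type="ThesisType"/>
  <xs:complexType name="ThesisType">
    <xs:sequence>
      <xs:element name="author" type="xs:string" maxOccurs="unbounded"/>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="year" type="xs:integer"/>
      <xs:element name="school" type="xs:string" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="key" type="xs:string" use="required"/>
  </xs:complexType>
</xs:schema>"""

def sequence_members(schema_root, type_name):
    """List (name, type, minOccurs, maxOccurs) for a complexType's sequence."""
    members = []
    for ctype in schema_root.findall(f"{XS}complexType"):
        if ctype.get("name") != type_name:
            continue
        for el in ctype.findall(f"{XS}sequence/{XS}element"):
            # minOccurs and maxOccurs both default to 1 when omitted.
            members.append((el.get("name"), el.get("type"),
                            el.get("minOccurs", "1"), el.get("maxOccurs", "1")))
    return members

members = sequence_members(ET.fromstring(SCHEMA), "ThesisType")
for member in members:
    print(member)
```

The element declaration and the type are separate top-level items, which is the separation of elements from types described above.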
11
11 Simple Schema Example
12
12 Embedding XML Schema
- A document can point to its schema, but the XML parser is actually free to ignore this; the schema is typically specified “from outside” the document
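A non-validating parser shows what "free to ignore" means in practice. In this sketch the document names a schema through the standard xsi:noNamespaceSchemaLocation attribute, yet ElementTree parses it without fetching or checking anything. The file name dblp.xsd and the document content are placeholders invented for the example.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical document; the schema file name is a placeholder.
DOC = f"""<dblp xmlns:xsi="{XSI}"
      xsi:noNamespaceSchemaLocation="dblp.xsd">
  <mastersthesis><year>not-a-year</year></mastersthesis>
</dblp>"""

root = ET.fromstring(DOC)   # parses fine: no schema is fetched or checked

# To the parser, the schema hint is just another attribute.
schema_hint = root.get(f"{{{XSI}}}noNamespaceSchemaLocation")
year_text = root.findtext("mastersthesis/year")
print(schema_hint, year_text)
```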
13
13 Manipulating XML
- Sometimes we need to restructure an XML document
- Or we simply need to retrieve certain parts that satisfy a constraint, e.g.:
  - All books
  - All books by author XYZ
14
14 Document Object Model (DOM) vs. Queries
- Build a DOM tree (as we saw earlier) and access it via a Java (etc.) DOMNode object
  - DOM objects have methods like getFirstChild() and getNextSibling(): a common way of traversing the tree
  - Can also modify the DOM tree, and thus alter the XML, via insertBefore(), etc.
- Alternate approach: a query language
  - Define some sort of template describing traversals from the root of the directed graph
  - In XML, the basis of this template is called an XPath
  - Can also declare some constraints on the values you want
  - The XPath returns a node set of matches
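The two approaches can be compared side by side. This sketch uses Python's standard library rather than Java: xml.dom.minidom exposes firstChild and nextSibling as attributes (the counterparts of getFirstChild() and getNextSibling()), while ElementTree accepts a small XPath subset. Function names are hypothetical.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

SAMPLE = ("<dblp><mastersthesis><title>PRPL: A Database Workload "
          "Specification Language</title></mastersthesis>"
          "<article><title>The 1995 SQL Reunion</title></article></dblp>")

def titles_by_traversal(xml_text):
    """Walk the DOM with firstChild / nextSibling, collecting <title> text."""
    doc = xml.dom.minidom.parseString(xml_text)
    found = []

    def visit(node):
        if node.nodeType == node.ELEMENT_NODE and node.tagName == "title":
            found.append(node.firstChild.data)
        child = node.firstChild
        while child is not None:
            visit(child)
            child = child.nextSibling

    visit(doc.documentElement)
    return found

def titles_by_query(xml_text):
    """Same result, expressed as a path query instead of a hand-written walk."""
    return [t.text for t in ET.fromstring(xml_text).findall(".//title")]

print(titles_by_traversal(SAMPLE))
print(titles_by_query(SAMPLE))
```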
15
15 XPaths
- In its simplest form, an XPath is like a path in a file system: /mypath/subpath/*/morepath
- The XPath returns a node set representing the XML nodes (and their subtrees) at the end of the path
- XPaths can have node tests at the end, returning only particular node types, e.g., text(), processing-instruction(), comment(), element(), attribute()
- XPath is fundamentally an ordered language: it can query in an order-aware fashion, and it returns nodes in order
18
18 Some Example XPath Queries
- /dblp/mastersthesis/title
- /dblp/*/editor
- //title
- //title/text()
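These queries can be tried against the sample from slide 3. The sketch uses ElementTree, which supports only a subset of XPath: paths are written relative to the root element (so /dblp/... becomes ./...), and text() is not available, so the text is read from each element's .text instead. Attributes whose values are truncated in the slides are omitted.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<dblp>
  <mastersthesis key="ms/Brown92">
    <author>Kurt P. Brown</author>
    <title>PRPL: A Database Workload Specification Language</title>
    <year>1992</year>
    <school>Univ. of Wisconsin-Madison</school>
  </mastersthesis>
  <article>
    <editor>Paul R. McJones</editor>
    <title>The 1995 SQL Reunion</title>
    <journal>Digital System Research Center Report</journal>
    <volume>SRC1997-018</volume>
    <year>1997</year>
    <ee>db/labs/dec/SRC1997-018.html</ee>
    <ee>http://www.mcjones.org/System_R/SQL_Reunion_95/</ee>
  </article>
</dblp>"""

root = ET.fromstring(SAMPLE)   # root is the <dblp> element

# /dblp/mastersthesis/title
thesis_titles = [e.text for e in root.findall("./mastersthesis/title")]

# /dblp/*/editor
editors = [e.text for e in root.findall("./*/editor")]

# //title returns elements; //title/text() corresponds to their .text
all_titles = root.findall(".//title")
all_title_text = [e.text for e in all_titles]

print(thesis_titles, editors, all_title_text, sep="\n")
```

Results come back in document order, matching the ordering property of XPath.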
19
19 Context Nodes and Relative Paths
- XPath has a notion of a context node: it’s analogous to a current directory
  - “.” represents this context node
  - “..” represents the parent node
  - We can express relative paths: subpath/sub-subpath/../.. gets us back to the context node
- By default, the document root is the context node
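A small sketch of context-relative paths, again with ElementTree. In this library the element a query is called on acts as the context node. One library-specific limitation, not an XPath rule: ".." cannot climb above the element the query started from.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<dblp>
  <article>
    <title>The 1995 SQL Reunion</title>
    <journal>Digital System Research Center Report</journal>
  </article>
</dblp>"""

root = ET.fromstring(SAMPLE)
article = root.find("article")        # choose a context node

same = article.find(".")              # "." is the context node itself
journal = article.find("./journal")   # relative path from the context node

# ".." steps to the parent: title -> article
parents = root.findall("./article/title/..")

# subpath/sub-subpath/../.. leads back to the context node (here: root)
back = root.findall("./article/title/../..")

# ElementTree cannot see above the element the query was started on.
above = article.find("..")

print(same is article, journal.text, [p.tag for p in parents], back == [root], above)
```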
20
20 Predicates – Filtering Operations
- A predicate allows us to filter the node set based on selection-like conditions over sub-XPaths:
    /dblp/article[title = "Paper1"]
  which is equivalent to:
    /dblp/article[./title/text() = "Paper1"]
  because of type coercion.
- What does this do?
    /dblp/article[@key = "123" and ./title/text() = "Paper1" and ./author/*/element()]
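The predicates above can be approximated with ElementTree on made-up data. ElementTree's XPath subset has no "and" and no nested paths inside predicates, so the conjunction is written as chained predicates and the last condition is checked in Python. The document and the n attribute (used only to tell the articles apart) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical data built around the "Paper1" / "123" values used on the slide.
DOC = """<dblp>
  <article n="1" key="123"><title>Paper1</title>
    <author><name><first>Ann</first></name></author></article>
  <article n="2" key="123"><title>Paper1</title><author>Bob</author></article>
  <article n="3" key="456"><title>Paper2</title></article>
</dblp>"""
root = ET.fromstring(DOC)

# /dblp/article[title = "Paper1"]
by_title = root.findall("./article[title='Paper1']")

# [@key = "123" and ./title/text() = "Paper1"], as two chained predicates
by_key_and_title = root.findall("./article[@key='123'][title='Paper1']")

# ... and ./author/*/element(): keep articles where some child of <author>
# itself has an element child.
full_match = [a for a in by_key_and_title if a.find("./author/*/*") is not None]

print([a.get("n") for a in by_title])
print([a.get("n") for a in full_match])
```

Only the first article survives all three conditions: the second has an author with no nested elements, and the third has a different key.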
21
21 Axes: More Complex Traversals
- Thus far, we’ve seen XPath expressions that go down the tree (and up one step)
- But we might want to go up, left, right, etc.
- These are expressed with so-called axes:
    self::path-step
    child::path-step                parent::path-step
    descendant::path-step           ancestor::path-step
    descendant-or-self::path-step   ancestor-or-self::path-step
    preceding-sibling::path-step    following-sibling::path-step
    preceding::path-step            following::path-step
- The previous XPaths we saw were in “abbreviated form”
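ElementTree does not implement these axes, so the sketch below hand-rolls three of them over a child-to-parent map to show what they select. All helper names are hypothetical, and the ordering choices (nearest ancestor first, siblings in document order) are decisions made for this example.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<dblp>
  <article>
    <editor>Paul R. McJones</editor>
    <title>The 1995 SQL Reunion</title>
    <journal>Digital System Research Center Report</journal>
    <volume>SRC1997-018</volume>
    <year>1997</year>
  </article>
</dblp>"""

def parent_map(root):
    """Map every element to its parent (ElementTree keeps no parent pointers)."""
    return {child: parent for parent in root.iter() for child in parent}

def ancestors(node, parents):
    """ancestor:: axis, nearest ancestor first."""
    out = []
    while node in parents:
        node = parents[node]
        out.append(node)
    return out

def following_siblings(node, parents):
    """following-sibling:: axis, in document order."""
    if node not in parents:
        return []
    siblings = list(parents[node])
    return siblings[siblings.index(node) + 1:]

def preceding_siblings(node, parents):
    """preceding-sibling:: axis, in document order."""
    if node not in parents:
        return []
    siblings = list(parents[node])
    return siblings[:siblings.index(node)]

root = ET.fromstring(SAMPLE)
parents = parent_map(root)
title = root.find("./article/title")
print([e.tag for e in ancestors(title, parents)])
print([e.tag for e in following_siblings(title, parents)])
print([e.tag for e in preceding_siblings(title, parents)])
```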
22
22 Users of XPath
- XML Schema uses simple XPaths in defining keys and uniqueness constraints
- XLink and XPointer, hyperlinks for XML
- XSLT: useful for converting from XML to other representations (e.g., HTML, PDF, SVG)
- XQuery: useful for restructuring an XML document or combining multiple documents
- Might well turn into the “glue” between Web Services, etc.
23
23 A Functional Language for XML
- XSLT is based on a series of templates that match different parts of an XML document
  - There’s a policy for which rule or template is applied if more than one matches (it’s not what you’d think!)
  - XSLT templates can invoke other templates
  - XSLT templates can be nonterminating (beware!)
- XSLT templates are based on XPath “match”es, and we can also apply other templates (potentially to “select”ed XPaths)
- Within each template, we directly describe what should be output
24
24 An XSLT Template
- An XSLT stylesheet is itself an XML document
- XML tags create output OR are XSL operations
  - All XSL tags are prefixed with the “xsl” namespace
  - All non-XSL tags are part of the XML output
- Common XSL operations:
  - template with a match XPath
  - Recursive call to apply-templates, which may also select where it should be applied
- Attach to an XML document with a processing instruction:
    <?xml-stylesheet type="text/xsl" href="http://www.com/my.xsl"?>
25
25 An Example XSLT Stylesheet This is DBLP …
26
26 XSLT Processing Model
- List of source nodes → result tree fragment(s)
- Start with the root
- Find all template rules with matching patterns from the root
- Find the “best” match according to some heuristics
- Set the current node list to be the set of things it matches
- Iterate over each node in the current node list
  - Apply the operations of the template
  - “Append” the results of the matching template rule to the result tree structure
  - Repeat recursively if specified to by apply-templates
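The processing loop can be mimicked in a few lines. What follows is a toy model in Python, not an XSLT processor: rules are (match test, priority, body) triples, the highest-priority matching rule wins, and a body recurses only when it calls apply_templates itself. Every name, rule, and priority value here is an assumption made for illustration.

```python
import xml.dom.minidom

SAMPLE = ("<dblp><mastersthesis><title>PRPL</title><year>1992</year>"
          "</mastersthesis><article><title>The 1995 SQL Reunion</title>"
          "</article></dblp>")

class Rule:
    def __init__(self, matches, priority, body):
        self.matches, self.priority, self.body = matches, priority, body

def is_element(name=None):
    """Build a match test: any element, or an element with the given name."""
    def test(node):
        return node.nodeType == node.ELEMENT_NODE and name in (None, node.tagName)
    return test

def apply_templates(nodes, rules, out):
    """Process each node in the current node list with its best-matching rule."""
    for node in nodes:
        candidates = [r for r in rules if r.matches(node)]
        if candidates:
            best = max(candidates, key=lambda r: r.priority)
            best.body(node, rules, out)
        elif node.nodeType == node.TEXT_NODE:
            out.append(node.data)                         # default: copy text
        else:
            apply_templates(node.childNodes, rules, out)  # default: recurse

def dblp_body(node, rules, out):
    out.append("<html><body>")
    apply_templates(node.childNodes, rules, out)
    out.append("</body></html>")

def title_body(node, rules, out):
    out.append("<h2>")
    apply_templates(node.childNodes, rules, out)
    out.append("</h2>")

def skip_body(node, rules, out):
    pass  # emit nothing and do not recurse

RULES = [
    Rule(is_element("dblp"), 0.0, dblp_body),
    Rule(is_element("title"), 0.0, title_body),
    Rule(is_element("year"), 0.0, skip_body),
    Rule(is_element(), -0.5, lambda n, r, o: apply_templates(n.childNodes, r, o)),
]

def transform(xml_text):
    doc = xml.dom.minidom.parseString(xml_text)
    out = []
    apply_templates([doc], RULES, out)   # start with the root
    return "".join(out)

result = transform(SAMPLE)
print(result)
```

The year rule shows why recursion is opt-in: its body never calls apply_templates, so the year text never reaches the output.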
27
27 What If There’s More than One Match?
- Eliminate rules of lower precedence due to importing
- Break a rule into any | branches and consider each separately
- Choose the rule with the highest computed or specified priority
- Simple rules for computing priority based on “precision”:
  - QName preceded by a child/attribute axis specifier: priority 0
  - NCName:* preceded by a child/attribute axis specifier: priority -0.25
  - NodeTest preceded by a child/attribute axis specifier: priority -0.5
  - else priority 0.5
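The priority rules are mechanical enough to sketch as a function. The version below covers only the four cases listed above and recognizes them with simplified regular expressions; it is not a full pattern parser. Splitting on "|" is naive and would break on a "|" inside a predicate or string literal. Function names are hypothetical.

```python
import re

NAME = r"[A-Za-z_][\w.-]*"              # simplified NCName
AXIS = r"(?:child::|attribute::|@)?"    # optional child/attribute axis specifier

def default_priority(branch):
    """Default priority of one pattern branch (no '|'), per the four rules."""
    p = branch.strip()
    if re.fullmatch(AXIS + f"(?:{NAME}:)?{NAME}", p):      # QName
        return 0.0
    if re.fullmatch(AXIS + NAME + r":\*", p):              # NCName:*
        return -0.25
    if re.fullmatch(
            AXIS + r"(?:\*|(?:node|text|comment|processing-instruction)\(\))", p):
        return -0.5                                        # bare NodeTest
    return 0.5                                             # anything more specific

def branch_priorities(pattern):
    """Split a pattern on '|' (naively) and score each branch separately."""
    return [(b.strip(), default_priority(b)) for b in pattern.split("|")]

for pattern in ["title", "dblp:*", "text()", "dblp/article", "title[1]"]:
    print(pattern, default_priority(pattern))
print(branch_priorities("title|dblp/article|*"))
```

The more of the path a pattern pins down, the higher its priority, which is why dblp/article outranks a bare title.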
28
28 Other Common Operations
- Iteration:
- Conditionals:
- Copying current node and children to the result set:
29
29 Creating Output Nodes
- Return text/attribute data (this is a default rule):
- Create an element from text (attribute is similar):
- Copy nodes matching a path
30
30 Embedding Stylesheets
- You can “import” or “include” one stylesheet from another:
    <xsl:import href="http://www.com/my.xsl"/>
    <xsl:include href="http://www.com/my.xsl"/>
- “Include”: the rules get the same precedence as in the including stylesheet
- “Import”: the rules are given lower precedence
31
31 XSLT Summary
- A very powerful, template-based transformation language for XML document → other structured document
  - Commonly used to convert XML → PDF, SVG, GraphViz DOT format, HTML, WML, …
- Primarily useful for presentation of XML or for very simple conversions
- But sometimes we need more complex operations when converting data from one source to another:
  - Joins: combining and correlating information from multiple sources
  - Aggregation: computing averages, counts, etc.
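To make "joins" and "aggregation" concrete, here is a sketch of both over two small XML documents, written in plain Python rather than a query language. Both documents, including the citation counts, are invented for this illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical publication list and a second, separate source keyed the same way.
PUBS = """<dblp>
  <article key="a1"><title>T1</title><year>1997</year></article>
  <article key="a2"><title>T2</title><year>1997</year></article>
  <mastersthesis key="m1"><title>T3</title><year>1992</year></mastersthesis>
</dblp>"""
CITES = """<citations>
  <count key="a1">12</count>
  <count key="m1">3</count>
</citations>"""

pubs = ET.fromstring(PUBS)
cites = ET.fromstring(CITES)

# Aggregation: number of publications per year.
per_year = Counter(p.findtext("year") for p in pubs)

# Join: correlate the two documents on their shared key attribute.
counts = {c.get("key"): int(c.text) for c in cites}
joined = [(p.findtext("title"), counts[p.get("key")])
          for p in pubs if p.get("key") in counts]

print(dict(per_year))
print(joined)
```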
32
32 XSLT and Alternatives
- XSLT is focused on reformatting documents
  - Stylesheets are focused around one XML file
  - XML file must reference the stylesheet
- What if we want to:
  - Manage and combine collections of XML documents?
  - Make Web service requests for XML?
  - “Glue together” different Web service requests?
  - Query for keywords within documents, with ranked answers?
- This is where XQuery plays a role; see CIS 330 / 550 for details