Published by Benedict Francis. Modified over 9 years ago.
1
1 An Introduction to XML and Related Standards
XML Basics
  – Semi-structured data
  – DTD
  – XML Schema
XML transforming and querying
  – XPath
  – XSLT
  – XQuery
Semantic Web
  – RDF
  – OWL
2
2 Background: Markup and Markup Language
Markup
  – Annotations (tags) that carry information about a document’s content, such as a writer’s handwritten notes for typesetting or an editor’s corrections in a manuscript
Markup Language
  – A language that defines a syntax and grammar for tags
3
3 Background: SGML SGML –Standard Generalized Markup Language –Standardized in 1986 (ISO) –A language for defining markup languages –And for marking-up content –Syntax + Document Type Definition (DTD) –Tools aimed at document management
4
4 Background: HTML HTML –A markup language –A particular SGML Document Type (called an “application”) –Tools for browsing and authoring
5
5 Background: Limitations of SGML and HTML SGML –Complex, many options and shortcuts –Must know the DTD to parse correctly –Cost of SGML technology is high HTML –Not extensible—can’t define new tags –Tags for presenting data not describing it –Doesn’t capture much document structure or content meaning
6
6 Enter XML
XML (Extensible Markup Language)
  – Standardized by W3C in 1998
  – For data interchange over the Web
  – A simpler SGML
      actually, a subset of SGML
      DTDs are optional
      fewer features and options
  – Widely available tools for parsing, authoring, browsing, etc.
7
7 Uses for XML
Why XML?
  – Capture logical structure of documents: presentation independent
  – Data interchange: XML is implementation independent
  – Storage format: any successful interchange format becomes a storage format
  – Metadata: searching, filtering, organizing
  – Data packaging, movement, and processing: client-side processing, server-to-server communication, non-browser-based clients, simplified server processing, etc.
8
8 The Many Standards of XML
  – XML Document: XML, DTD
  – Query: XQuery, XQL, XML-QL
  – Programming: Document Object Model (DOM)
  – Transformation: XSLT for rearranging and restructuring XML documents
  – Transport: XML-RPC, SOAP, XML-Protocol for message and object serialization and remote procedure calls
  – Metadata: RDF, OWL, using XML to define resource metadata
  – Schema and Types: XML Schema
  – Linking: XLink for simple and complex hyperlinks between XML documents
  – Addressing: XPath and XPointer for addressing XML subdocuments
9
9 The Running Example Lego Product Catalogs –catalogs have: a publishing date, an identifier, a title, etc. –catalogs are made up of products either a kit or accessory each has an item #, price, name, picture, etc. kits can have an age level, # of pieces, set type (duplo, basic), a theme (star wars), a system (space)
10
10 An Example XML Catalog Document 2000 X-Wing Fighter 7 12 263 Star Wars Take to the skies with Luke as he battles the forces of evil!
11
11 An Example XML Document prolog body elements have start and end-tags elements can also contain content elements are nested “boxes within boxes” 2000 X-Wing Fighter 7 12 263 Star Wars Take to the skies with Luke as he battles the forces of evil! …
12
12 Well-Formed Documents
Well-formed XML documents:
  – A single root element
  – Start and end tags required (unlike HTML), as around the element content X-Wing Fighter; the exception is empty-element tags
  – Elements must be properly nested, as with the element holding 263; overlapping start and end tags are NOT properly nested
  – More rules: naming elements, document has at least one element, etc.
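A rough way to see these rules in action is to hand a few strings to a parser. The sketch below uses Python's standard `xml.etree.ElementTree`, which rejects input that is not well-formed. The sample strings and the helper name `is_well_formed` are made up for illustration and are not taken from the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples modeled on the catalog example.
nested_ok = "<kit><name>X-Wing Fighter</name><pieces>263</pieces></kit>"
nested_bad = "<kit><name>X-Wing Fighter<pieces>263</name></pieces></kit>"
missing_end = "<kit><name>X-Wing Fighter</kit>"
two_roots = "<kit/><kit/>"
empty_tag = "<kit><picture/></kit>"

for sample in (nested_ok, nested_bad, missing_end, two_roots, empty_tag):
    print(is_well_formed(sample), sample)
```

Only the first and last samples are accepted: overlapping tags, a missing end tag, and a second root element each raise a parse error.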
13
13 XML Attributes
Elements can contain attributes. (Figure: an element start tag labeled with its element name and three attribute name / attribute value pairs.)
Attributes are always assigned in element start tags (or empty-element tags), their values are always quoted (with single or double quotes), and attribute names must be unique within the element
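The attribute rules can be checked with a parser as well. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the `kit` element and its values are hypothetical (the attribute names are borrowed from the DTD slide), and `parses` is a helper invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical kit element with three quoted attributes.
kit = ET.fromstring('<kit price="9.99" avail="yes" image="xwing.jpg"/>')
attrs = kit.attrib  # dict mapping attribute name to attribute value


def parses(text: str) -> bool:
    """Return True if text is accepted by the XML parser."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


duplicate_ok = parses('<kit price="1" price="2"/>')  # same attribute name twice
unquoted_ok = parses("<kit price=1/>")               # value without quotes

print(attrs, duplicate_ok, unquoted_ok)
```

Both malformed inputs are rejected, so a well-formed element never carries two attributes with the same name or an unquoted value.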
14
14 Attributes vs. Content In general, it is up to the document designer In SGML, content usually was for data you see and attributes for metadata
15
15 DTD and XML Schema
16
16 Document Type Definition Why DTDs? –To standardize tags and structure for interchange and creation –To make the documents machine processable What is a DTD? –A grammar for describing XML documents (tags, attributes, nesting, etc.) –An XML document that is well-formed and conforms to a DTD is said to be valid
17
17 An Example DTD: Elements
<!ELEMENT kit (name, ages, pieces, theme?, series?, desc)>
  – An element content model for LegoCatalog
  – A character data content model for pubDate
Content model operators:
  *   zero or more
  +   one or more
  ?   optional
  |   choice
  ,   strict sequence
  ()  grouping
Also: Empty, Any, and Mixed content models
18
18 An Example DTD: Attributes
<!ATTLIST kit
    price       CDATA       #REQUIRED
    shipWeight  CDATA       #REQUIRED
    avail       (yes | no)  #IMPLIED
    image       CDATA       "na.jpg"
    unitId      ID          #IMPLIED >
<!ATTLIST accessory
    forKits      IDREFS  #IMPLIED
    orderStatus  CDATA   #FIXED "special" >
Each attribute has the form: attr-name type default-decl
Types:
  – CDATA = character data
  – ID = unique identifier
  – IDREF = reference to an ID
  – IDREFS = list of references
  – enumeration = list of possible values
Default declarations:
  – #REQUIRED = must appear
  – #IMPLIED = optionally appear
  – #FIXED + default = if attribute is missing, parser assumes value
  – Default only = if attribute is missing, default is assumed, otherwise any value
19
19 Limitations of DTDs DTDs are not optimal –Not well-formed XML can’t parse them with an XML parser need different tools to create them + but at least you can sort-of read/understand them –Limited support for defining data types –Limited modeling capabilities hard to express some structures no support for reusing structure
20
20 XML Schema W3C proposed recommendation (2001) Divided into 2 parts: structures, datatypes Main features –Well-formed XML documents –A schema can span multiple documents –Can define new data types and constraints –Inheritance among content model types –Improves data interchange Offers more precision for computer-computer transfer
21
21 The .xsd file
<xs:schema xmlns:xs="http://www.w3.org/1999/XMLSchema" targetNamespace="http://www.lego.com/products" version="1.1"> ….
  – xmlns:xs - use the 'xs' prefix to reference elements defined in a schema from another namespace
  – targetNamespace - all the elements and types defined in this schema come from this namespace. Use this URI to import or include these definitions in other schemas
22
22 Example XML Schema
<xs:element name="kit" type="Product" minOccurs="1" maxOccurs="unbounded"/>
<xs:element name="accessory" type="Product" minOccurs="0" maxOccurs="unbounded"/>
... …...
  – Many ways to describe new data types (not just regular expressions)
  – ComplexType = Content Model
23
23 Main Schema Components Definitions of: –Complex types = sub-elements + attributes –Simple types = no sub-elements, constraints on strings(datatypes) Declarations of: –elements (of simple and complex types) –attributes (simple types), attribute groups
24
24 Simple Type Definitions Can have: built-in, pre-declared or anonymous simple type definitions. ……
25
25 Example of Complex Type Definition …
26
26 Constraints on Element Content content = –textOnly : only character data –mixed : character data appears alongside subelements –elementOnly : only subelements –empty : no content (only attributes) –any
27
27 Datatype Example
This creates a new datatype called 'TelephoneNumber'. Elements of this type can hold string values, but the string must be exactly 8 characters long and must follow the pattern ddd-dddd, where 'd' represents a digit.
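The constraint described here can be mimicked outside a schema processor. The sketch below is not XML Schema; it is a Python approximation that assumes the two facets are a length of 8 and the regular expression `\d{3}-\d{4}`. The names `TELEPHONE_PATTERN`, `TELEPHONE_LENGTH`, and `is_telephone_number` are invented for this example.

```python
import re

# Assumed facets: a fixed length of 8 and a digits-hyphen-digits pattern.
TELEPHONE_PATTERN = re.compile(r"\d{3}-\d{4}")
TELEPHONE_LENGTH = 8


def is_telephone_number(value: str) -> bool:
    """Check both facets: exact length and full pattern match."""
    return (
        len(value) == TELEPHONE_LENGTH
        and TELEPHONE_PATTERN.fullmatch(value) is not None
    )


for candidate in ("555-1234", "5551234", "555-12345", "abc-defg"):
    print(candidate, is_telephone_number(candidate))
```

With this particular pattern the length check is redundant, since any full match is already 8 characters long; it is kept to mirror the two separate constraints named on the slide.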
28
28 XPath
29
29 What is XPath? XPath is a syntax used for selecting parts of an XML document The way XPath describes paths to elements is similar to the way an operating system describes paths to files XPath is almost a small simple programming language; it has functions, tests, and expressions XPath is a W3C standard XPath is not itself written as XML, but is used heavily in XSLT, XML Schema and XQuery
30
30 Terminology
  – library is the parent of book; book is the parent of the two chapters
  – The two chapters are the children of book, and the section is the child of the second chapter
  – The two chapters of the book are siblings (they have the same parent)
  – library, book, and the second chapter are the ancestors of the section
  – The two chapters, the section, and the two paragraphs are the descendants of the book
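These relationships can be explored programmatically. The sketch below assumes a small document with the shape described above (the actual slide markup is not available, so this tree is a guess) and uses Python's standard `xml.etree.ElementTree`. Because ElementTree keeps no parent pointers, the example builds its own child-to-parent map; `parent_of` and `ancestors` are names invented here.

```python
import xml.etree.ElementTree as ET

# Hypothetical document matching the relationships described on the slide.
doc = ET.fromstring(
    "<library><book>"
    "<chapter/>"
    "<chapter><section><paragraph/><paragraph/></section></chapter>"
    "</book></library>"
)

# Map every child element to its parent element.
parent_of = {child: parent for parent in doc.iter() for child in parent}


def ancestors(node):
    """Return the tags of all ancestors of node, nearest first."""
    tags = []
    while node in parent_of:
        node = parent_of[node]
        tags.append(node.tag)
    return tags


book = doc.find("book")
section = doc.find(".//section")
children_of_book = [child.tag for child in book]
descendants_of_book = [node.tag for node in book.iter() if node is not book]

print(ancestors(section))
print(children_of_book)
print(descendants_of_book)
```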
31
31 Slashes
A path that begins with a / represents an absolute path, starting from the top of the document
  – Example: /email/message/header/from
  – Note that even an absolute path can select more than one element
  – A slash by itself means “the whole document”
A path that does not begin with a / represents a path starting from the current element
  – Example: header/from
A path that begins with // can start from anywhere in the document
  – Example: //header/from selects every from element that is a child of a header element
  – This can be expensive, since it involves searching the entire document
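To try paths like these without a full XPath engine, Python's standard `xml.etree.ElementTree` supports a limited XPath subset. Its paths are always evaluated relative to an element, so in this sketch the absolute path /email/message/header/from is written relative to the root element, and // is written as `.//`. The email document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mailbox with two messages.
email = ET.fromstring(
    "<email>"
    "<message><header><from>ann</from><to>bob</to></header></message>"
    "<message><header><from>cat</from><to>dan</to></header></message>"
    "</email>"
)

# Relative to the root element email; corresponds to /email/message/header/from.
direct = [e.text for e in email.findall("message/header/from")]

# .// searches all descendants; corresponds to //header/from.
anywhere = [e.text for e in email.findall(".//header/from")]

print(direct, anywhere)
```

Both queries return two elements here, which also illustrates that a fully specified path can select more than one element.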
32
32 Brackets and last()
A number in brackets selects a particular matching child (counting starts from 1, except in Internet Explorer)
  – Example: /library/book[1] selects the first book of the library
  – Example: //chapter/section[2] selects the second section of every chapter in the XML document
  – Example: //book/chapter[1]/section[2]
  – Only matching elements are counted; for example, if a book has both sections and exercises, the latter are ignored when counting sections
The function last() in brackets selects the last matching child
  – Example: /library/book/chapter[last()]
You can even do simple arithmetic
  – Example: /library/book/chapter[last()-1]
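Positional predicates are part of the XPath subset that Python's standard `xml.etree.ElementTree` understands, so they can be tried directly. In this sketch the library document and its chapter texts are invented, and paths are written relative to the root element because ElementTree does not accept absolute paths on an element.

```python
import xml.etree.ElementTree as ET

# Hypothetical library: the first book has three chapters, the second has two.
library = ET.fromstring(
    "<library>"
    "<book><chapter>c1</chapter><chapter>c2</chapter><chapter>c3</chapter></book>"
    "<book><chapter>d1</chapter><chapter>d2</chapter></book>"
    "</library>"
)

# /library/book[1]/chapter: chapters of the first book only.
first_book_chapters = [c.text for c in library.findall("book[1]/chapter")]

# /library/book/chapter[last()]: the last chapter of every book.
last_chapters = [c.text for c in library.findall("book/chapter[last()]")]

# /library/book/chapter[last()-1]: the second-to-last chapter of every book.
second_to_last = [c.text for c in library.findall("book/chapter[last()-1]")]

print(first_book_chapters, last_chapters, second_to_last)
```

Note that the position is counted per parent: each book contributes its own last chapter.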
33
33 Stars A star, or asterisk, is a “wildcard” -- it means “all the elements at this level” –Example: /library/book/chapter/* selects every child of every chapter of every book in the library –Example: //book/* selects every child of every book –Example: /*/*/*/paragraph selects every paragraph that has exactly three ancestors –Example: //* selects every element in the entire document
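The wildcard also works in the XPath subset of Python's standard `xml.etree.ElementTree`. The sketch below uses an invented library document; paths are relative to the root element, so /*/*/*/paragraph becomes `*/*/paragraph` when evaluated from `library`.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one paragraph directly in a chapter, one inside a section.
library = ET.fromstring(
    "<library><book>"
    "<title>T</title>"
    "<chapter><paragraph/><section><paragraph/></section></chapter>"
    "</book></library>"
)

# /library/book/chapter/*: every child of every chapter.
children_of_chapters = [e.tag for e in library.findall("book/chapter/*")]

# //book/*: every child of every book.
children_of_books = [e.tag for e in library.findall(".//book/*")]

# /*/*/*/paragraph: only paragraphs with exactly three element ancestors.
three_levels_down = library.findall("*/*/paragraph")

# All paragraphs at any depth, for comparison.
all_paragraphs = library.findall(".//paragraph")

print(children_of_chapters, children_of_books)
print(len(three_levels_down), len(all_paragraphs))
```

The paragraph nested inside the section has four element ancestors, so the three-star path skips it.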
34
34 Attributes I
You can select attributes by themselves, or elements that have certain attributes
  – Remember: an attribute consists of a name-value pair; for example, in a start tag such as <chapter num="3">, the attribute is named num
  – To choose the attribute itself, prefix the name with @
  – Example: //@num will choose every attribute named num (a bare @num chooses the num attribute of the current element)
  – Example: //@* will choose every attribute, everywhere in the document
To choose elements that have a given attribute, put the attribute name in square brackets
  – Example: //chapter[@num] will select every chapter element (anywhere in the document) that has an attribute named num
35
35 Attributes II //chapter[@num] selects every chapter element with an attribute num //chapter[not(@num)] selects every chapter element that does not have a num attribute //chapter[@*] selects every chapter element that has any attribute //chapter[not(@*)] selects every chapter element with no attributes
36
36 Values of attributes //chapter[@num='3'] selects every chapter element with an attribute num with value 3 The normalize-space() function can be used to remove leading and trailing spaces from a value before comparison –Example: //chapter[normalize-space(@num)="3"]
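Attribute tests can be tried with Python's standard `xml.etree.ElementTree`, with one caveat: its XPath subset supports `[@num]` and `[@num='3']` but not `not()` or `normalize-space()`. In this sketch those two are emulated in plain Python. The book document is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical book: one num value is padded with spaces, one chapter has no num.
book = ET.fromstring(
    "<book>"
    '<chapter num="1"/>'
    '<chapter num=" 3 "/>'
    '<chapter num="3" title="XPath"/>'
    "<chapter/>"
    "</book>"
)

# //chapter[@num]: chapters that have a num attribute.
with_num = book.findall(".//chapter[@num]")

# //chapter[@num='3']: exact string comparison, so " 3 " does not match.
exactly_three = book.findall(".//chapter[@num='3']")

# Emulates //chapter[not(@num)].
without_num = [c for c in book.iter("chapter") if "num" not in c.attrib]

# Emulates //chapter[normalize-space(@num)="3"]: trim and collapse whitespace.
normalized_three = [
    c for c in book.iter("chapter")
    if " ".join(c.get("num", "").split()) == "3"
]

print(len(with_num), len(exactly_three), len(without_num), len(normalized_three))
```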
37
37 Location Path
The central construct is the location path:
  location path = location step / … / location step
Example:
  child::section[position()<6] / descendant::cite / attribute::href
  selects all href attributes in cite elements in the first 5 sections of a document
  – A location step is evaluated with respect to some context
  – A location path is evaluated left to right, starting with some initial context; each node resulting from the evaluation of one step is used as context for the evaluation of the next, and the results are unioned together
38
38 Location Step location step = axis :: node-test [ predicate ] axis a rough set of candidate nodes – e.g. the child nodes of the context node node-test performs an initial filtration based on – types: chardata node, processing instruction, etc. – names: element name predicates a further, more complex, filtration. only candidates for which the predicates evaluate to true are kept child::section [ position()<6 ] / descendant::cite / attribute::href
39
39 Axes :: Node-test [ Predicate ]
  child::section[position()<6] / descendant::cite / attribute::href
Axes: child, descendant, parent, ancestor, following-sibling, preceding-sibling, following, preceding, attribute, namespace, self, descendant-or-self, ancestor-or-self
Node tests: name, *, text(), comment(), processing-instruction(), node()
Predicates:
  [attribute::name="flour"]
  [attribute::name!="flour"]
  [attribute::amount="0.5" and attribute::unit="cup"]
  [position()=2]
40
40 Abbreviations
  child::                       (nothing, so child is the default axis)
  attribute::                   @
  /descendant-or-self::node()/  //
  self::node()                  .
  parent::node()                ..
.//@href selects all href attributes on the context node and its descendants.
section[position()<6]//cite[@href="there"] selects all cite elements with href="there" attributes in the first 5 sections
41
41 XSL
42
42 XSL (eXtensible Stylesheet Language)
Why do we need it?
  – Store in one format, display in another, e.g. transforming XML to XHTML and displaying it in a browser
  – Convert to a more useful format
  – Make the document more compact by extracting from XML documents only the data we need
The goal is to produce another document that looks the way we specify
43
43 XSL (eXtensible Stylesheet Language) consists of two parts: –XSL Transformations (XSLT) XSLT stylesheet is an XML document defining transformation from one class of XML documents into another –XSL Formatting Objects (XSL-FO) Specifying formatting in a more low-level and detailed way
44
44 A Simple Example File data.xml: Howdy! File render.xsl:
45
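45 The .xsl File
  – An XSLT document has the .xsl extension
  – The XSLT document begins with an <xsl:stylesheet> start tag, which declares the XSLT namespace http://www.w3.org/1999/XSL/Transform
  – It contains one or more templates (<xsl:template> elements), such as: ...
  – And it ends with the </xsl:stylesheet> end tag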
46
46 Explanation of render.xsl The XSL was: The chooses the root The is written to the output file The contents of message is written to the output file The is written to the output file The resultant file looks like: Howdy!
47
47 How XSLT Works The XML text document is read in and stored as a tree of nodes The template is used to select the entire tree The rules within the template are applied to the matching nodes, thus changing the structure of the XML tree –If there are other templates, they must be called explicitly from the main template Unmatched parts of the XML tree are not changed After the template is applied, the tree is written out again as a text document
48
48 xsl:value-of selects the contents of an element and adds it to the output stream –The select attribute is required –Notice that xsl:value-of is not a container, hence it needs to end with a slash Example (from an earlier slide):
49
49 xsl:for-each xsl:for-each is a kind of loop statement The syntax is Text to insert and rules to apply Example: to select every book ( //book ) and make an unordered list ( ) of their titles ( title ), use:
50
50 Filtering output You can filter (restrict) output by adding a criterion to the select attribute’s value: This will select book titles by Terry Smith
51
51 Filter details Here is the filter we just used: author is a sibling of title, so from title we have to go up to its parent, book, then back down to author This filter requires a quote within a quote, so we need both single quotes and double quotes Legal filter operators are: = != < >
52
52 But it doesn’t work right! Here’s what we did: This will output and for every book, so we will get empty bullets for authors other than Terry Smith There is no obvious way to solve this with just xsl:value-of
53
53 xsl:if xsl:if allows us to include content if a given condition (in the test attribute) is true Example: This does work correctly!
54
54 xsl:choose The xsl:choose... xsl:when... xsl:otherwise construct is XSLT’s equivalent of a switch... case... default statement The syntax is:... some code...... some code... xsl:choose is often used within an xsl:for-each loop
55
55 xsl:sort You can place an xsl:sort inside an xsl:for-each The attribute of the sort tells what field to sort on Example: by –This example creates a list of titles and authors, sorted by author
56
56 xsl:apply-templates If you apply a template to an element that has child elements, templates are not automatically applied to those child elements The element applies a template rule to the current element or to the current element’s child nodes If we add a select attribute, it applies the template rule only to the child that matches If we have multiple elements with select attributes, the child nodes are processed in the same order as the elements
57
57 Applying templates to children XML Terry Smith by With this line: XML by Gregory Brill Without this line: XML
58
58 Calling named templates You can name a template, then call it, similar to the way you would call a method in Java The named template:...body of template... A call to the template: Or:...parameters...
59
59 Processing model A list of source nodes is processed to create a result tree fragment. The result tree is constructed by processing a list containing just the root node. A list of source nodes is processed by appending the result tree structure created by processing each of the members of the list in order. A node is processed by finding all the template rules with patterns that match the node, and choosing the best amongst them; the chosen rule's template is then instantiated with the node as the current node and with the list of source nodes as the current node list. A template typically contains instructions that select an additional list of source nodes for processing. The process of matching, instantiation and selection is continued recursively until no new source nodes are selected for processing.
60
60 XQuery
61
61 Enter XQuery
XML documents generalize relational data. The relation R:
  A   B   C
  a1  b1  c1
  a2  b2  c2
  a3  b3  c3
corresponds to the XML document:
  <R>
    <tuple> <A>a1</A> <B>b1</B> <C>c1</C> </tuple>
    <tuple> <A>a2</A> <B>b2</B> <C>c2</C> </tuple>
    …
  </R>
How should query languages like SQL be similarly generalized?
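The mapping from a relation to an XML document is mechanical, which the following sketch tries to show with Python's standard `xml.etree.ElementTree`. The function name `relation_to_xml` and the in-memory representation of the relation (a list of column names plus a list of row tuples) are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# The relation R from the slide, as column names plus rows.
columns = ["A", "B", "C"]
rows = [("a1", "b1", "c1"), ("a2", "b2", "c2"), ("a3", "b3", "c3")]


def relation_to_xml(name, columns, rows):
    """Build one tuple element per row, with one child element per column."""
    root = ET.Element(name)
    for row in rows:
        tuple_el = ET.SubElement(root, "tuple")
        for column, value in zip(columns, row):
            ET.SubElement(tuple_el, column).text = value
    return root


r = relation_to_xml("R", columns, rows)
xml_text = ET.tostring(r, encoding="unicode")
print(xml_text)
```

The reverse direction does not always work: XML allows nesting, optional elements, and mixed content that a flat table cannot express, which is the sense in which XML generalizes relational data.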
62
62 FLWOR Expressions
The main engine of XQuery is the FLWOR expression:
  – For-Let-Where-Order-Return
  – pronounced "flower"
  – generalizes SELECT-FROM-WHERE from SQL
Example:
  for $d in document("depts.xml")//deptno
  let $e := document("emps.xml")//employee[deptno = $d]
  where count($e) >= 10
  order by avg($e/salary) descending
  return { $d, {count($e)}, {avg($e/salary)} }
Clause by clause:
  – for generates an ordered list of bindings of deptno values $d
  – let binds, for each $d, $e to the list of employee elements with that department number, so that we have an ordered list of tuples ($d, $e)
  – where filters that list to retain only the desired tuples
  – order by sorts that list by the given criteria
  – return constructs for each tuple a resulting value
The result is a list of departments with at least 10 employees, sorted by average salary in descending order.
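For readers more familiar with general-purpose languages, the clause-by-clause behavior can be approximated in Python. This is an analogy, not XQuery: the in-memory data standing in for depts.xml and emps.xml is invented, and the result keys `headcount` and `avgsal` are hypothetical names chosen for this sketch.

```python
from statistics import mean

# Hypothetical data standing in for depts.xml and emps.xml.
deptnos = ["D1", "D2", "D3"]
employees = (
    [{"deptno": "D1", "salary": 50_000} for _ in range(12)]
    + [{"deptno": "D2", "salary": 70_000} for _ in range(10)]
    + [{"deptno": "D3", "salary": 90_000} for _ in range(3)]
)


def big_departments(deptnos, employees, minimum=10):
    # for / let: pair each deptno d with the list of its employees.
    tuples = [(d, [e for e in employees if e["deptno"] == d]) for d in deptnos]
    # where: keep only departments with at least `minimum` employees.
    tuples = [(d, es) for d, es in tuples if len(es) >= minimum]
    # order by: average salary, descending.
    tuples.sort(key=lambda t: mean(e["salary"] for e in t[1]), reverse=True)
    # return: construct one result value per surviving tuple.
    return [
        {"deptno": d, "headcount": len(es), "avgsal": mean(e["salary"] for e in es)}
        for d, es in tuples
    ]


result = big_departments(deptnos, employees)
for row in result:
    print(row)
```

D3 has the highest average salary but only 3 employees, so the filtering step removes it before sorting.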
63
63 List Expressions XQuery expressions often manipulate lists of values for $p in distinct-values(document("bib.xml")//publisher) let $a := avg(document("bib.xml")//book[publisher = $p]/price) return { $p/text() } { $a } List functions: distinct-values, avg, …
64
64 Conditional expressions XQuery supports a general if-then-else construction. extracts from the holdings of a library the titles and either editors or authors. for $h in document("library.xml")//holding return { $h/title, if ($h/@type = "Journal") then $h/editor else $h/author }
65
65 Quantified Expressions
  for $b in document("bib.xml")//book
  where some $p in $b//paragraph satisfies ( contains($p,"sailing") and contains($p,"fishing") )
  return $b/title
finds the titles of all books which mention both sailing and fishing in the same paragraph
  for $b in document("bib.xml")//book
  where every $p in $b//paragraph satisfies contains($p,"sailing")
  return $b/title
finds the titles of all books which mention sailing in every paragraph
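The two quantifiers behave much like `any` and `all` in Python. The sketch below is an analogy under that assumption, using an invented list of books in place of bib.xml, with paragraphs as plain strings and substring tests in place of `contains`.

```python
# Hypothetical stand-in for bib.xml: each book has a title and paragraph texts.
books = [
    {"title": "Coastal Life",
     "paragraphs": ["sailing and fishing at dawn", "sailing home"]},
    {"title": "Sailing Only",
     "paragraphs": ["sailing north", "more sailing"]},
    {"title": "Split Topics",
     "paragraphs": ["sailing", "fishing"]},
]

# some ... satisfies: at least one paragraph mentions both words.
some_titles = [
    b["title"] for b in books
    if any("sailing" in p and "fishing" in p for p in b["paragraphs"])
]

# every ... satisfies: all paragraphs mention sailing.
every_titles = [
    b["title"] for b in books
    if all("sailing" in p for p in b["paragraphs"])
]

print(some_titles)
print(every_titles)
```

"Split Topics" mentions both words but never in the same paragraph, so the first query skips it. As with `all` on an empty list, `every` is true for a book that has no paragraphs at all.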