XML Language Family: Detailed Examples
Most information contained in these slides comes from:
These slides are intended to be used as a tutorial on XML and related technologies.
Slide author: Jürgen Mangler
This section contains examples on:
– XML, DTD (Document Type Definition)
– XSchema
– XPath, XPointer
– XInclude
– XSLT
– XLink
The W3C is the "World Wide Web Consortium", a voluntary association of companies and non-profit organizations. Membership is very expensive but confers voting rights. The decisions of the W3C are guided by the Advisory Committee, led by Tim Berners-Lee.
The stages in the life of a W3C Recommendation (REC):
– Working Draft (maximum gap target: 3 months)
– Last Call (public comment invited; W3C must respond)
– Candidate Recommendation (design is stable; implementation feedback invited)
– Proposed Recommendation (Advisory Committee review)
The XML recommendation was written by the W3C's XML Working Group (WG), which has since divided into a number of subgroups.
An XML document is valid if it has an associated document type definition (DTD) and if the document complies with the constraints expressed in it. The document type declaration must appear before the first element in the document. The name following the word DOCTYPE in the document type declaration must match the name of the root element. [Example: tutorial.dtd and tutorial.xml; element text: "This is an XML document"]
An element type has element content if elements of that type contain only child elements (no character data), optionally separated by white space. [Example: tutorial.dtd; tutorial.xml with element text "Start", "End"; tutorial.xml with errors (BBB missing) with element text "Start"]
If an element name in the DTD is followed by the star [*], this element can occur zero, one or several times. The root element XXX can contain zero or more elements AAA followed by precisely one element BBB. Element BBB must always be present. [Example: tutorial.dtd; tutorial.xml with element text "Start", "Again", "End"]
If an element name in the DTD is followed by the plus [+], this element can occur once or several times. [Example: tutorial.dtd; tutorial.xml with element text "Start", "Again", "End"; tutorial.xml with errors (AAA must occur at least once) with element text "End"]
If an element name in the DTD is followed by the question mark [?], this element can occur zero times or once. [Example: tutorial.dtd; tutorial.xml with element text "End"] This example uses a combination of [ + * ? ]. What could a valid document look like?
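The three occurrence indicators behave like their regular-expression counterparts, which the following sketch uses to illustrate them. Python's standard library has no DTD validator, so this is only an emulation: the content models, the helper `children_match` and the sample documents are assumptions made for illustration, not the slides' original listings.

```python
import re
import xml.etree.ElementTree as ET

def children_match(xml_text, model_regex):
    """Return True if the root's child element names match model_regex."""
    root = ET.fromstring(xml_text)
    # Encode the child sequence as "AAA,BBB," (every name ends with a comma).
    names = "".join(child.tag + "," for child in root)
    return re.fullmatch(model_regex, names) is not None

# Assumed content models for the root element XXX, written as regexes.
STAR = r"(AAA,)*BBB,"      # (AAA*, BBB): zero or more AAA, then one BBB
PLUS = r"(AAA,)+BBB,"      # (AAA+, BBB): one or more AAA, then one BBB
OPTIONAL = r"(AAA,)?BBB,"  # (AAA?, BBB): at most one AAA, then one BBB

no_aaa = "<XXX><BBB>End</BBB></XXX>"
two_aaa = "<XXX><AAA>Start</AAA><AAA>Again</AAA><BBB>End</BBB></XXX>"

for label, model in [("*", STAR), ("+", PLUS), ("?", OPTIONAL)]:
    print(label, children_match(no_aaa, model), children_match(two_aaa, model))
```

The document without AAA satisfies the [*] and [?] models but not [+]; the document with two AAA elements satisfies [*] and [+] but not [?].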
With the character [ | ] you can select one from several elements. The root element XXX must contain either one element AAA or one element BBB. [Example: test.dtd; test.xml with element text "Valid"; test.xml with element text "Also Valid"] Text can be interspersed with elements.
Attributes are used to associate name-value pairs with elements. Attribute specifications may appear only within start-tags and empty-element tags. The declaration starts with !ATTLIST followed by the name of the element (myElement) to which the attributes belong, followed by the definitions of the individual attributes (myAttributeA, myAttributeB). [Example document; element text: "Text"]
An attribute of type CDATA may contain arbitrary character data, provided it conforms to the well-formedness constraints. Type NMTOKEN can contain only letters, digits, and the characters period [ . ], hyphen [ - ], underscore [ _ ] and colon [ : ]. NMTOKENS can contain the same characters as NMTOKEN plus white space. White space consists of one or more space characters, carriage returns, line feeds, or tabs.
<!ATTLIST taskgroup
  group CDATA #IMPLIED
  purpose NMTOKEN #REQUIRED
  names NMTOKENS #REQUIRED>
The value of an attribute of type ID may contain only characters permitted for NMTOKEN and must start with a letter. No element type may have more than one ID attribute specified. The value of an ID attribute must be unique among the values of all ID attributes in the whole document.
The value of an attribute of type IDREF has to match the value of some ID attribute in the document. An attribute of type IDREFS can contain several references to elements with ID attributes, separated by white space.
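The two constraints (IDs are unique across the document, every IDREF points at an existing ID) can be sketched as a small check. In a real DTD the attribute types come from the ATTLIST declarations; here the attribute names `id` and `idref`, the helper `check_ids` and the sample document are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

def check_ids(xml_text, id_attr="id", idref_attr="idref"):
    """Return (duplicate IDs, dangling references) for the document."""
    root = ET.fromstring(xml_text)
    seen, duplicates = set(), []
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is not None:
            if value in seen:
                duplicates.append(value)  # ID used a second time
            seen.add(value)
    dangling = []
    for elem in root.iter():
        ref = elem.get(idref_attr)
        if ref is not None and ref not in seen:
            dangling.append(ref)  # reference to an ID that does not exist
    return duplicates, dangling

sample = '<XXX><AAA id="a1"/><BBB id="b1" idref="a1"/><CCC idref="zz"/></XXX>'
duplicates, dangling = check_ids(sample)
print(duplicates, dangling)
```

In the sample, the reference to "a1" resolves, while "zz" is reported as dangling; a validating parser would reject such a document.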
Permitted attribute values can be defined in the DTD. For an optional attribute, a default value can be provided that is used when the attribute is omitted.
#REQUIRED: you must set the attribute
#IMPLIED: you can set the attribute
An element can be defined as EMPTY. In such a case it may contain only attributes but no text. [Example document; element text: "Hello!"] Are there errors in this example? Where are they?
The purpose of XML Schema is to provide a standard mechanism to describe and evaluate the datatype of the content of an element. [XML examples: an element with content "12" is correct; the same element with content "eT" is also correct] An XML parser cannot distinguish between these kinds of element content. This is where XML Schema comes in: <name xsi:noNamespaceSchemaLocation="correct_0.xsd" xmlns="" xmlns:xsi=" Jürgen Mangler
If we use the attribute " noNamespaceSchemaLocation ", we tell the document that the schema belongs to an element from the null namespace. Valid document: <name xsi:noNamespaceSchemaLocation="correct_0.xsd" xmlns="" xmlns:xsi=" Jürgen Mangler correct_0.xsd:
If we use the attribute "schemaLocation", we tell the document that the schema belongs to an element from some particular namespace. In the schema definition, you have to use the "targetNamespace" attribute, which defines the element's namespace. Valid document: <f:anElement xsi:schemaLocation=" correct_0.xsd" xmlns:f=" xmlns:xsi=" This Element contains some cdata. correct_0.xsd: <xsd:schema targetNamespace=" xmlns:xsd="
We want the root element to be named "AAA", from the null namespace, and to contain text only. [Example: correct_0.xsd; valid document with text "xxx yyy"]
If we want the root element to be named "AAA", from the null namespace, containing text and an element "BBB", we will need to set the attribute "mixed" to "true" to allow mixed content. [Example document; text: "xxx yyy", "ZZZ", "aaa"]
We want the root element to be named "AAA", from null namespace, containing one "BBB" and one "CCC" element. Their order is not important.
We want the root element to be named "AAA", from null namespace, containing a mixture of any number (even zero), of "BBB" and "CCC" elements. We need to use the 'trick' below - we use a "sequence" element with "minOccurs" attribute set to 0 and "maxOccurs" set to "unbounded". The attribute "minOccurs" of the "element" elements has to be 0 too. Give a valid document!
[A valid solution for the schema above; element text: "111", "YYY", "ZZZ"]
We want the root element to be named "AAA", from the null namespace, containing either "BBB" or "CCC" elements (but not both). For this we use the "choice" element. [Example document; element text: "aaa"] Are there other valid solutions?
In XML Schema, the datatype is referenced by its QName. The namespace must be mapped to the prefix. [Example document; element text: "25"]
Restricting simpleType is relatively easy. Here we will require the value of the element "root" to be integer and less than Valid? Use to force element > 0. You can also combine min/max in !
If we want the element "root" to be either the string "N/A" or the string "#REF!", we will use the "enumeration" facet. [Example document; element text: "N/A"] Are there other solutions?
If we want the element "root" to be either an integer or a string "N/A", we will make a union from an "integer" type and "string" type.
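The idea of a union can be sketched as "accept the value if any member type accepts it". This is not a schema validator; the helpers `is_integer` and `is_valid_root` are hypothetical names, and the integer check only approximates the lexical form of xsd:integer.

```python
import re

def is_integer(text):
    # Lexical form of xsd:integer: optional sign followed by digits.
    return re.fullmatch(r"[+-]?[0-9]+", text.strip()) is not None

def is_valid_root(text):
    # Union semantics: the value is accepted if ANY member type accepts it.
    return is_integer(text) or text == "N/A"

for value in ["42", "-7", "N/A", "n/a", "4.2"]:
    print(repr(value), is_valid_root(value))
```

"42", "-7" and "N/A" are accepted; "n/a" and "4.2" match neither member type and are rejected.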
Below we define a group of common attributes, which will be reused. The root element is named "root", it must contain the "aaa" element, and this element must have attributes "x" and "y". Give a valid document!
Below we define a group of common attributes, which will be reused. The root element is named "root", it must contain the "aaa" and "bbb" elements, and these elements must have attributes "x" and "y". Valid document from the previous Schema!
We want the "root" element to have an attribute "xyz", which contains a list of three integers. We will define a general list (element "list") of integers and then restrict it (element "restriction") to have a length (element "length") of exactly three items. Documents on next page …
[Two example documents: one valid, one not valid. Why?] Use the same method for lists in the content of an element.
The element "A" has to contain a string which is exactly three characters long. We will define our custom type for the string, named "myString", and will require the element "A" to be of that type. [Example document; element text: "abc"]
The element "A" must contain an e-mail address. We will define our custom type, which will at least approximately check the validity of the address. We will use the "pattern" element to restrict the string using regular expressions.
Regular expressions: the meaning of
[abc] … character class (the character can be a, b or c)
[^abc] … negative character class (everything except a, b, c)
* … match 0 or more times
+ … match 1 or more times
? … match 1 or 0 times
{n} … match exactly n times
{n,} … match at least n times
{n,m} … match at least n but not more than m times
. … match any character
\w … match a "word" character (alphanumeric plus "_")
\W … match a non-word character
\d … match a digit character
\D … match a non-digit character
\. … escape a character with a special meaning (., +, *, ?, …)
The pattern from the previous example, piece by piece:
[^@]+ … match any character that is not @, 1 or more times
@ … match exactly @
[^.]+ … match any character that is not a ., 1 or more times
\. … match exactly a .
.+ … match any character, 1 or more times
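The pieces above can be tried out directly. The sketch below assumes that the pattern being explained is the approximate e-mail check `[^@]+@[^.]+\..+` (the `@` characters appear to have been lost from the slide text); the helper name `matches_pattern` and the sample values are made up. An XML Schema pattern has to match the whole value, which corresponds to `re.fullmatch` in Python.

```python
import re

# Assumed pattern: something, "@", something without dots, ".", something.
EMAIL_PATTERN = r"[^@]+@[^.]+\..+"

def matches_pattern(value, pattern=EMAIL_PATTERN):
    # Schema patterns are implicitly anchored, so match the whole string.
    return re.fullmatch(pattern, value) is not None

for value in ["joe@example.org", "joe.example.org", "joe@example"]:
    print(value, matches_pattern(value))
```

Only the first value matches: the second has no "@", and the third has no "." after the "@". As the slides say, such a pattern is only an approximate check.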
One of the big problems of XML is the type ID. An attribute of type ID must be unique for the whole file. XML Schema solves this problem: IDs can be valid for a certain child axis only.
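The difference between document-wide and scoped uniqueness can be sketched as follows. This only mimics the idea behind the schema's uniqueness constraints (a scope element, a selector and a field); the element names `list` and `item`, the attribute `id` and the helper `unique_within` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def unique_within(scope, selector, field):
    """True if attribute `field` is unique among scope.findall(selector)."""
    values = [e.get(field) for e in scope.findall(selector)
              if e.get(field) is not None]
    return len(values) == len(set(values))

doc = ET.fromstring(
    "<root>"
    "<list><item id='1'/><item id='2'/></list>"
    "<list><item id='1'/><item id='3'/></list>"
    "</root>"
)

# Scoped check: ids must be unique only inside each <list>.
per_list = [unique_within(lst, "item", "id") for lst in doc.findall("list")]
# Document-wide check, comparable to the DTD type ID.
document_wide = unique_within(doc, ".//item", "id")
print(per_list, document_wide)
```

Each list is unique on its own, while the document-wide check fails because the value "1" occurs in both lists.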
The "keyref" element lets you specify that an attribute or element refers to some node (which must be defined as "key" or "unique"). The "key" element requires the elements "a" under the "root" element to contain an existing and unique value of an "id" attribute.
To define attributes AND typed text content for a certain element, you have to use simpleContent. [Example document; element text: "shake"]
When do I use simpleType, when complexType?
simpleType:
– when I want to define the content of an element as xsd:string, xsd:integer, xsd:double, ...
complexType:
– when I want to define attributes
– when I want to define other elements as content (with sequence, choice, all)
– when I want to mix attributes and elements (xsd:attribute below sequence, choice, all)
simpleContent inside complexType:
– when I want to define the content of an element as a datatype and additionally have attributes
XPath is the result of an effort to provide a common syntax and semantics for functionality shared between XSL Transformations [XSLT] and XPointer. The primary purpose of XPath is to address parts of an XML document. XPath uses a compact, non-XML syntax to facilitate use of XPath within URIs and XML attribute values. XPath operates on the abstract, logical structure of an XML document, rather than its surface syntax. XPath gets its name from its use of a path notation, as in URLs, for navigating through the hierarchical structure of an XML document.
In addition to its use for addressing, XPath is also designed to feature a natural subset that can be used for matching (testing whether or not a node matches a pattern); this use of XPath is described in XSLT. XPath models an XML document as a tree of nodes. There are different types of nodes, including element nodes, attribute nodes and text nodes.
The basic XPath syntax is similar to filesystem addressing. If the path starts with a slash (/), it represents an absolute path to the required element.
/AAA: Select the root element AAA
/AAA/CCC: Select all elements CCC which are children of the root element AAA
If the path starts with //, then all elements in the document that fulfill the criteria following // are selected.
//BBB: Select all elements BBB
//DDD/BBB: Select all elements BBB which are children of DDD
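These selections can be tried with Python's `xml.etree.ElementTree`, which supports a limited subset of XPath. One difference to keep in mind: ElementTree paths are evaluated relative to an element, so `//BBB` is written `.//BBB` from the root, and `/AAA/CCC` becomes `CCC` relative to the root element AAA. The sample document is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<AAA>"
    "<BBB/><CCC/>"
    "<DDD><BBB/></DDD>"
    "<CCC><DDD><BBB/><BBB/></DDD></CCC>"
    "</AAA>"
)

ccc_children = root.findall("CCC")          # like /AAA/CCC
all_bbb = root.findall(".//BBB")            # like //BBB
bbb_under_ddd = root.findall(".//DDD/BBB")  # like //DDD/BBB

print(len(ccc_children), len(all_bbb), len(bbb_under_ddd))
```

The document contains two CCC children of the root, four BBB elements in total, and three BBB elements whose parent is a DDD.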
The star * selects all elements located by the preceding path.
/*/*/*/BBB: Select all elements BBB which have 3 ancestors
/AAA/CCC/DDD/*: Select all elements enclosed by elements /AAA/CCC/DDD
An expression in square brackets can further specify an element. A number in the brackets gives the position of the element in the selected set. The function last() selects the last element in the selection.
/AAA/BBB[1]: Select the first BBB child of element AAA
/AAA/BBB[last()]: Select the last BBB child of element AAA
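Positional predicates are among the XPath features that Python's `xml.etree.ElementTree` understands, so they can be tried directly. Paths are relative to the root element here, and the attribute `n` in the sample document is only a made-up marker for telling the elements apart.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<AAA><BBB n='1'/><BBB n='2'/><BBB n='3'/></AAA>")

first = root.find("BBB[1]")      # like /AAA/BBB[1]
last = root.find("BBB[last()]")  # like /AAA/BBB[last()]

print(first.get("n"), last.get("n"))
```

Note that positions are counted from 1, not from 0.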
Attributes are specified by the @ prefix.
//@id: Select all attributes id
//BBB[@id]: Select BBB elements which have attribute id
//BBB[@*]: Select BBB elements which have any attribute
//BBB[not(@*)]: Select BBB elements without an attribute
Values of attributes can be used as selection criteria. The function normalize-space removes leading and trailing spaces and replaces sequences of whitespace characters by a single space.
//BBB[@id='b1']: Select BBB elements which have attribute id with value b1
//BBB[normalize-space(@name)='bbb']: Select BBB elements which have an attribute name with value bbb; leading and trailing spaces are removed before comparison
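Attribute tests can be sketched with Python's `xml.etree.ElementTree`. Its XPath subset supports `[@id]` and `[@id='b1']`; it has no `not(@*)` and no `normalize-space`, so those two are emulated in plain Python below. The sample document is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<AAA><BBB id='b1'/><BBB id='b2'/><BBB name=' bbb '/><BBB/></AAA>"
)

with_id = root.findall(".//BBB[@id]")          # BBB elements with an id
id_b1 = root.findall(".//BBB[@id='b1']")       # id equals b1
exact_name = root.findall(".//BBB[@name='bbb']")  # no match: value is ' bbb '

# Emulations of features outside ElementTree's XPath subset:
without_attrs = [e for e in root.iter("BBB") if not e.attrib]
normalized = [e for e in root.iter("BBB")
              if " ".join(e.get("name", "").split()) == "bbb"]

print(len(with_id), len(id_b1), len(exact_name),
      len(without_attrs), len(normalized))
```

The exact comparison finds nothing because the stored value carries surrounding spaces; after normalizing white space, one element matches.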
The function count() counts the number of selected elements.
//*[count(BBB)=2]: Select elements which have two children BBB
//*[count(*)=3]: Select elements which have 3 children
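Python's `xml.etree.ElementTree` does not implement count(), but the same selections can be emulated by iterating over all elements and counting children. This is a sketch of what the two expressions select, not an XPath engine; the sample document is made up.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<AAA>"
    "<CCC><BBB/><BBB/><BBB/></CCC>"
    "<DDD><BBB/><BBB/></DDD>"
    "<EEE><CCC/><DDD/></EEE>"
    "</AAA>"
)

# Emulates //*[count(*)=3]: len(element) is the number of child elements.
three_children = [e.tag for e in root.iter() if len(e) == 3]
# Emulates //*[count(BBB)=2]: count only the children named BBB.
two_bbb = [e.tag for e in root.iter() if len(e.findall("BBB")) == 2]

print(three_children, two_bbb)
```

Here AAA and the first CCC each have three children, and only the first DDD has exactly two BBB children (EEE has two children, but none is a BBB).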
Several paths can be combined with the | separator ("|" stands for "or", like the logical or operator in C); the result contains the nodes selected by any of the paths. The number of combinations is not restricted.
/AAA/EEE | //BBB: Select all elements BBB and the elements EEE which are children of the root element AAA
/AAA/EEE | //DDD/CCC | /AAA | //BBB: A combination of four paths
Axes are a sophisticated concept in XPath for finding out which nodes relate to each other and how. [Figure: a node tree; starting from the context node, each axis selects the nodes labelled with its name: parent, following-sibling, preceding-sibling, descendant] This example is also the basis for the next two pages.
The following main axes are available:
– the child axis contains the children of the context node
– the descendant axis contains the descendants of the context node; a descendant is a child or a child of a child and so on; thus the descendant axis never contains attribute or namespace nodes
– the parent axis contains the parent of the context node, if there is one
– the following-sibling axis contains all the following siblings of the context node; if the context node is an attribute node or namespace node, the following-sibling axis is empty
– the preceding-sibling axis contains all the preceding siblings of the context node; if the context node is an attribute node or namespace node, the preceding-sibling axis is empty
The child axis contains the children of the context node. The child axis is the default axis and can be omitted. The descendant axis contains the descendants of the context node; a descendant is a child or a child of a child and so on; thus the descendant axis never contains attribute or namespace nodes.
/AAA: Equivalent of /child::AAA
//CCC/descendant::DDD: Select elements DDD which have CCC among their ancestors
XPointer is intended to be the basis of fragment identifiers only for the text/xml and application/xml media types (it can point only to documents of these types). Pointing to fragments of remote documents is analogous to the use of anchors in HTML. Roughly: document#xpointer(…), for example xlink:type="simple" xlink:href="mydocument.xml#xpointer(//AAA/BBB[1])"
If there are forbidden characters in your expression, you must deal with them somehow. When an XPointer appears in an XML document, special characters must be escaped according to the rules of XML. <link xmlns:xlink=" xlink:type="simple" xlink:href="test.xml#xpointer(//AAA[position() &lt; 2])"> Or: xlink:href="test.xml#xpointer(string-range('^(text in'))"> The characters < and & must be escaped using &lt; and &amp;. Any unbalanced parenthesis must be escaped using a circumflex (^).
If your elements have an ID-type attribute, you can address them directly using the value of the ID-type attribute. (Don't forget: you must have an attribute defined as an ID type in your DTD!) Using ID-type attributes, you can easily include or jump to parts of documents. The example below selects node with id("b1"). xpointer(id("b1")) Text in the first element BBB. Text in another element BBB. Text in more nested element. Again some text in some element.
The specification defines one full form and one shorthand form (which is an abbreviation of the full one). [Example document; element text: "Text in the first element BBB.", "Text in another element BBB.", "Text in more nested element." (three times), "Again some text in some element."]
Full form: xpointer(/*[1]/*[2]/*[3])
Short form: /1/2/3
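How a child sequence such as /1/2/3 is resolved can be sketched in a few lines: each number picks the n-th child element of the node reached so far. The helper `resolve_child_sequence` and the sample document are hypothetical; Python's standard library has no XPointer support, so this only illustrates the stepping logic.

```python
import xml.etree.ElementTree as ET

def resolve_child_sequence(root, sequence):
    """Resolve a child sequence like '/1/2/3'; return None if it fails."""
    steps = [int(s) for s in sequence.strip("/").split("/")]
    if steps[0] != 1:
        return None  # a document has exactly one document element
    node = root
    for index in steps[1:]:
        children = list(node)  # child elements only, in document order
        if index < 1 or index > len(children):
            return None
        node = children[index - 1]  # steps are 1-based
    return node

root = ET.fromstring(
    "<AAA><BBB id='b1'/><BBB id='b2'><CCC/><CCC/><DDD id='d1'/></BBB></AAA>"
)
target = resolve_child_sequence(root, "/1/2/3")
print(target.tag, target.get("id"))
```

The sequence /1/2/3 reads: the document element, its second child element, and that element's third child element, which here is the DDD with id "d1".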
A location of type point is defined by a node, called the container node (node that contains the point), and a non-negative integer, called the index. (//AAA, //AAA/BBB are the container nodes, [1], [2] is used if more than one container node of the same name exists) xpointer(start-point(//AAA)) xpointer(start-point(range(//AAA/BBB[1]))) ▼ ▼ xpointer(end-point(range(//AAA/BBB[2]))) xpointer(start-point(range(//AAA/CCC)))
When the container node of a point is of a node type that cannot have child nodes (such as text nodes, comments, and processing instructions), then the index is an index into the characters of the string-value of the node; such a point is called a character-point. You can use this to write a link that behaves like a search function. It always jumps to the first appearance of a string, e.g. the word "another". xpointer(start-point(string-range(//*,'another', 2, 0))) Text in the first element BBB. Text in a▼nother element BBB. Text in more nested element. Again some text in some element.
The range function returns ranges covering the locations in the argument location-set. For each location x in the argument location-set, a range location representing the covering range of x is added to the result location set. xpointer(range(//AAA/BBB[2])) Text in another element BBB. The range-inside function returns ranges covering the contents of the locations in the argument location-set. xpointer(range-inside(//AAA/BBB[2])) Text in another element BBB.
For each location x in the argument location-set, end-point adds a location of type point to the result location-set. That point represents the end point of location x. xpointer(end-point(string-range(//AAA/BBB,'another'))) Text in the first element BBB. Text in another▼ element BBB. Text in more nested element. Again some text in some element.
XInclude solves the problem of including external documents, or parts of external documents, into an XML document. It works much like #include in C/C++, except that you can include documents through HTTP locators and include parts of documents (through XPointer syntax). In other words, this technology provides a way to split your work into logical pieces and tie the end result back together (in our slides we use a book example whose chapters are tied together).
XInclude references external documents to be included with include elements in the namespace. The prefix xi is customary though not required. Each xi:include element has an href attribute that contains a URL pointing to the file to include. [Example document; title text: "The Wit and Wisdom of George W. Bush"] To resolve XIncludes, a document must be passed through an XInclude processor that replaces the xi:include elements with the documents they point to.
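Python's standard library contains a small XInclude processor, `xml.etree.ElementInclude`, which can be used to sketch this replacement step. To keep the example self-contained, the included "files" live in a dictionary and are served by a custom loader; the file names, the `book`/`chapter` vocabulary and the loader are assumptions for illustration. The namespace used is the XInclude namespace that this module recognizes.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory "files" standing in for real URLs.
FILES = {
    "chapter1.xml": "<chapter><title>Chapter 1</title></chapter>",
    "chapter2.xml": "<chapter><title>Chapter 2</title></chapter>",
}

def loader(href, parse, encoding=None):
    """Return the parsed document for href, or None if it is unknown."""
    if href not in FILES:
        return None
    return ET.fromstring(FILES[href]) if parse == "xml" else FILES[href]

book = ET.fromstring(
    '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="chapter1.xml"/>'
    '<xi:include href="chapter2.xml"/>'
    '</book>'
)

# Replaces each xi:include element in place with the loaded document.
ElementInclude.include(book, loader=loader)
titles = [t.text for t in book.iter("title")]
print(titles)
```

After processing, the book element contains the two chapter elements where the xi:include elements used to be.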
XInclude processing is recursive. That is, an included document can itself include another document. For example, a book might be divided into front matter, back matter, and several parts: Each part might be further divided into a part intro and several chapters: Circular inclusion (Document A includes Document B which includes, directly or indirectly, Document A) is forbidden.
Technical articles like this one often need to include example code: programs, XML and HTML documents, messages, and so on. Within these examples characters like < and & should be treated as raw text rather than parsed as markup. To include a document as plain text, you have to add a parse="text" attribute to the xi:include element. For example, this fragment loads the source code for the Java program SpellChecker.java from the examples directory into a code element:
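The effect of parse="text" can be sketched with `xml.etree.ElementInclude` from Python's standard library. The included source is inserted as character data, so < and & are not interpreted as markup and are escaped again when the result is serialized. The one-line content of SpellChecker.java and the in-memory loader are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical content of the included source file.
FILES = {"examples/SpellChecker.java": "if (a < b && ok) { check(); }"}

def loader(href, parse, encoding=None):
    # Only text includes are needed here; unknown hrefs yield None.
    return FILES.get(href)

code = ET.fromstring(
    '<code xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="examples/SpellChecker.java" parse="text"/>'
    '</code>'
)

ElementInclude.include(code, loader=loader)
serialized = ET.tostring(code, encoding="unicode")
print(code.text)
print(serialized)
```

The element's text holds the raw source, while the serialized form shows the special characters as entity references.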
For many reasons, documents included from remote servers may be temporarily unavailable. The default action for an XInclude processor in such a case is simply to give up and report a fatal error. However, the xi:include element may contain an xi:fallback element which contains alternate content to be used if the requested resource cannot be found. [Example fallback content: "We're making the right decisions to bring the solution to an end."] The xi:fallback element can even include another xi:include element.
To include parts of other documents, you can use the xpointer scheme (see the XPointer part). Software support: Libxml includes fairly complete support for XInclude. The 4Suite XML library for Python has an option to resolve XIncludes when parsing. GNU JAXP includes a SAX filter that resolves XIncludes, provided no XPointers are used.
With XSL you can freely modify the content and/or layout of any source text. You can apply different stylesheets to the same source to get different results. [Example: source with text "John Smith" and a stylesheet; output: "John Smith"] How can you produce the following output? [Output: "John Smith"]
Every XSL stylesheet must start with an xsl:stylesheet element. The attribute version='1.0' specifies version of XSL(T) specification. This example shows the simplest possible stylesheet. As it does not contain any information, default processing is used. Hello, world Output: Hello, world
An XSL processor parses an XML source and tries to find a matching template rule. If it does, instructions inside the matching template are evaluated. Hello, world. I am fine. Output Hello, world. I am fine.
The contents of the original elements can be recovered from the original source in two basic ways. Stylesheet 1 uses the xsl:value-of construct. In this case the content of the element is used without any further processing. The instruction xsl:apply-templates in Stylesheet 2 is different: the processor further processes the selected elements for which a template is defined. [Source used on the next page; element text: "Joe", "Smith"]
Stylesheet 1: Stylesheet 2: Joe Smith Which Stylesheet produces which Output?
The parts of an XML document to which a template should be applied are determined by location paths. The required syntax is specified in the XPath specification. Simple cases look very similar to filesystem addressing. [Example stylesheet printing "id=" and the attribute value. Output: DDD id=d1]
Processing always starts with the template match="/". This matches the root node (the node whose only child element is the document element, in our case "source"). Many stylesheets do not contain this template explicitly. When this template is not explicitly given, the implicit template is used (it contains xsl:apply-templates as the sole instruction). This instruction means: process all children of the current node, including text nodes.
[Example stylesheet that processes only the first level under the document element. Output: AAA id=a1, AAA id=a2]
[Example stylesheet that recurses into the sublevels. Output: AAA id=a1, BBB id=b1, BBB id=b2, AAA id=a2, BBB id=b3, BBB id=b4]
A template can match any one of a selection of location paths; the individual paths are separated by "|" (Stylesheet 1). The wildcard "*" selects all possibilities (Stylesheet 2). Compare Stylesheet 1 with Stylesheet 2. [Source element text: "Joe", "Smith"]
Stylesheet 1 output: [template: firstName outputs Joe ] [template: surname outputs Smith ]
Stylesheet 2 output: [template: source outputs [template: employee outputs [template: firstName outputs Joe ] [template: surname outputs Smith ] ] ]
With modes an element can be processed multiple times, each time producing a different result. Stylesheet 2: Output: CCC CCC CCC CCC CCC CCC
Axes play a very important role in XSLT – e.g. child axis, for-each. Stylesheet 2: : Document: Output: a2: b4 c1
xsl:element generates elements at processing time. In this example it transforms the sizes to formatting tags. [Source element text: "Header1", "Header3", "Bold text", "Subscript", "Superscript"] Output: Header1 Header3 Bold text Subscript Superscript
The xsl:if instruction enables conditional processing. A typical use inside xsl:for-each is to add text between individual entries. Very often you do not want to add text after the last element. Output: A, B, C, D
Numbering of individual chapter elements depends on the position of the chapter element. Each level of chapters is numbered independently. Setting the attribute level to multiple enables natural numbering. - First Chap. Sec. Chap. Sub 1 Sub 2 What could be the possible output? Continued on next page.
Output: 1 - First Chap. 2 - Sec. Chap Sub Sub 2
You can set variables in a stylesheet and use them later in the processing. The following example demonstrates a way of setting xsl:variable. [Example stylesheet using xsl:variable, xsl:for-each and xsl:template; source with four "Chapter" elements] What could be the possible output?
Output: Chapter 1/4 Chapter 2/4 Chapter 3/4 Chapter 4/4
There currently exist two sorts of XLinks: simple XLinks and extended XLinks. Note: Currently only a few browsers (Amaya, Mozilla [recommended]) have support for XLinks.
Simple XLinks are intended specifically to replace normal HTML <a> tags.
Extended XLinks are a method to describe the dependency of resources in general. Therefore an extended link is a link that associates an arbitrary number of resources. The participating resources may be any combination of remote and local ones.
The use of XLink elements and attributes requires the declaration of the XLink namespace. [Example element; text: "Any content here"] Namespace declarations use reserved attributes starting with xmlns. The namespace declaration given for the current element is also valid for all elements occurring inside the current one (for all children and descendants).
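The scope of a namespace declaration can be illustrated with Python's `xml.etree.ElementTree`. In the sketch below the XLink namespace is declared once on the parent element and is then usable on a child element's attributes; the element names AAA and BBB and the link target are placeholders. ElementTree exposes namespaced attributes under the expanded name `{namespace}local-name`.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

doc = ET.fromstring(
    f'<AAA xmlns:xlink="{XLINK}">'
    '<BBB xlink:type="simple" xlink:href="http://www.example.org/">'
    'Any content here</BBB>'
    '</AAA>'
)

link = doc.find("BBB")
# The prefix declared on AAA is resolved for attributes on the child BBB.
link_type = link.get(f"{{{XLINK}}}type")
href = link.get(f"{{{XLINK}}}href")
print(link_type, href)
```

The prefix itself (xlink) is arbitrary; what identifies the attributes as XLink attributes is the namespace name bound to it.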
Simple XLinks are intended to be a more flexible way of creating links in HTML documents, and are scheduled to replace the <a> tags in normal HTML. It is possible to specify default attribute values in the DTD. Attributes then do not have to appear physically on element start-tags. <!ATTLIST AAA:logo xmlns:xlink CDATA #FIXED " xlink:type (simple) #FIXED "simple" xlink:href CDATA #FIXED "sample.gif" xlink:show (embed) #FIXED "embed" xlink:actuate (onLoad) #FIXED "onLoad" > [Example element; text: "A logo"]
If the attribute "show" is set to "new", a new window is opened for displaying the link. When combined with the attribute actuate equal to onLoad, the window is opened immediately. Because the element logo is empty, there are no visible links in the page. <AAA:logo xmlns:AAA = " xmlns:xlink=" xlink:type="simple" xlink:href="sample.gif" xlink:show="new" xlink:actuate="onLoad">
show=new, actuate=onRequest: A new window is opened for displaying the link; the window is not opened automatically.
show=replace, actuate=onRequest: The link is opened in the current window; the link is not followed automatically.
show=embed, actuate=onLoad: The link replaces the link element; replacement is immediate (similarly to the well-known element <img>).
show=replace, actuate=onLoad: The link replaces the current document; replacement is immediate.
Extended XLinks do not only serve the purpose of document linking; they also provide a formalized way to describe what the link is pointing to. In the "Greek myth" example used later, you will see that one can not only link to a page about Heracles: the link also carries more general information about Greek heroes. So every extended XLink is not only a pointer to content, but also a pointer to information on what the link is about. In the Greek myth example, every link also holds information about relatives, context, stories … of the person referred to.
An element of type "locator" indicates a remote resource: <BBB xlink:type="locator" xlink:href=" Elements of type "locator" must have the attribute "href", and its value must be supplied. Elements of type "locator" must have an extended-type element as a parent; otherwise they have no specified meaning in XLink.
An element of type "resource" indicates a local resource. [Example elements; text: "Anything here.", "Any content."] A resource-type element may have any content. This content has no XLink-specified relationship to the link. It is possible for a resource-type element to have no content.
The extended-, locator- and arc-type elements may have the title attribute. But they may also have a series of one or more title-type elements. [Example titles: "Heracles fight against the giants", "Heracles Kampf gegen die Riesen"] This example shows how to distinguish between users speaking different languages.
The attribute role describes the meaning of the resource. The value of the role attribute must be a URI reference. It identifies some resource that describes the intended property.
The attribute label provides "marks" to which the attributes from and to refer.
An element of type "arc" indicates rules for traversing among the resources participating in an extended-type link.
The attribute arcrole describes the meaning of the arc-type element. The attributes from and to refer to "marks", which are set by the attribute label.
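How labels, from and to fit together can be sketched by listing the traversals that the arcs of an extended link define. The link below is a made-up miniature in the spirit of the Greek myth example; its element names, file names and the helper `traversals` are assumptions. The sketch handles only locators and only arcs that state both from and to.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

def q(name):
    """Expanded name of an XLink attribute as used by ElementTree."""
    return f"{{{XLINK}}}{name}"

def traversals(link):
    """Return (start href, end href) pairs defined by the link's arcs."""
    by_label = {}
    for child in link:
        if child.get(q("type")) == "locator":
            by_label.setdefault(child.get(q("label")), []).append(child)
    pairs = []
    for arc in link:
        if arc.get(q("type")) == "arc":
            # An arc connects every resource marked with the "from" label
            # to every resource marked with the "to" label.
            for start in by_label.get(arc.get(q("from")), []):
                for end in by_label.get(arc.get(q("to")), []):
                    pairs.append((start.get(q("href")), end.get(q("href"))))
    return pairs

myth = ET.fromstring(
    f'<myth xmlns:xlink="{XLINK}" xlink:type="extended">'
    '<person xlink:type="locator" xlink:href="heracles.xml" xlink:label="hero"/>'
    '<person xlink:type="locator" xlink:href="zeus.xml" xlink:label="father"/>'
    '<person xlink:type="locator" xlink:href="alcmene.xml" xlink:label="mother"/>'
    '<rel xlink:type="arc" xlink:from="hero" xlink:to="father"/>'
    '<rel xlink:type="arc" xlink:from="hero" xlink:to="mother"/>'
    '</myth>'
)
pairs = traversals(myth)
print(pairs)
```

Each arc yields one traversal here because every label marks exactly one locator; if several resources shared a label, a single arc would connect all of them.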