Valid XML documents
To be valid, an XML document
–must be well-formed according to the general XML syntax rules
–and, in addition, must satisfy the syntax rules for an application-specific language based on XML
Well-formed XML documents are useful but, to be really useful, a document must be valid.
For example, if two companies wish to exchange information using XML,
–the companies must design a special XML-based language which is capable of expressing the data they wish to exchange and
–all documents that are exchanged must be valid according to the syntax rules for this special language
Valid XML documents (contd.)
A valid XML document must declare the special XML-based language according to whose rules the document is claimed to be valid.
This is done using a document type declaration, which must appear before the start tag of the root element in a valid XML document.
It has the following general format: <!DOCTYPE nameOfRootElement someDTDspec>
where someDTDspec specifies the rules for a valid element whose name is nameOfRootElement.
someDTDspec either
–directly contains an internal document type definition (a DTD), or
–consists of a reference to an external DTD
Valid XML documents (contd.)
For example, if we wanted to turn the example well-formed XML document we have just seen into a valid document, we would insert a line which would look like <!DOCTYPE people someDTDspec> where someDTDspec would specify what rules must be followed by a valid people element.
Although a doctype statement may directly contain an internal DTD, we will consider only the cases where a doctype statement refers to an external DTD, because these are the most useful.
DOCTYPEs referring to external DTDs
A doctype statement which refers to an external DTD has one of two formats:
<!DOCTYPE nameOfRootElement SYSTEM "fileNameAndPath">
<!DOCTYPE nameOfRootElement PUBLIC "fpi" "url">
In the first format, the keyword SYSTEM says that the external DTD is a private one, typically a file on the local machine, and fileNameAndPath says where to find it.
In the second format, the keyword PUBLIC says that the external DTD is publicly available on the Internet, the fpi (Formal Public Identifier) gives its formal name and the url says where it can be found.
Example DOCTYPEs referring to external DTDs
A doctype statement which uses the keyword SYSTEM followed by "myFirstDTD.dtd" says that the external DTD is on the local machine, in a file called myFirstDTD.dtd.
A doctype statement which uses the keyword PUBLIC followed by an FPI and a URL says that the external DTD is publicly available on the Internet, gives it a formal name which should be used by everybody who wants to refer to it, and says where it can be found.
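The two forms described above can be sketched as follows. This is an illustrative reconstruction, not the slide's original text: the root element name people is assumed from the example document used throughout these notes, and only the file name myFirstDTD.dtd comes from the slide itself.

```xml
<?xml version="1.0"?>
<!-- SYSTEM form: the DTD is read from the local file myFirstDTD.dtd -->
<!DOCTYPE people SYSTEM "myFirstDTD.dtd">
<people>
  <person><female>Celia Larkin</female></person>
  <person><male>Bertie Ahern</male></person>
</people>
```

The PUBLIC form would replace the doctype line with something like `<!DOCTYPE people PUBLIC "-//UCC//DTD PEOPLE v1.0//EN" "http://www.example.org/dtds/people.dtd">`, where the URL is a hypothetical placeholder. A document carries only one doctype statement, so the two forms are alternatives.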
Formal Public Identifiers
We will be focussing on system DTDs, but some of you may be wondering about the format of FPIs like -//UCC//DTD PEOPLE v1.0//EN
This format is defined by an ISO standard, and FPIs like this can be registered with organizations which are authorized by the ISO Council to act as FPI registrars.
The format is as follows:
–the - (which could also be a +) indicates whether or not the organization name which follows is ISO-registered; UCC is not
–UCC is an OwnerID, the name of the organization which owns and maintains the document identified by the FPI
–DTD is a Public Text Class, a keyword identifying the type of document
–PEOPLE v1.0 is a unique descriptive name for the document
–EN is the Public Text Language, the language in which the document is written
Formal Public Identifiers (contd.)
For example, this is the formal public identifier for one version of the HTML 4 specification: -//W3C//DTD HTML 4.01//EN
–the - says that the W3C is not an ISO-registered organization
–DTD says that the HTML 4 specification is a DTD
–HTML 4.01 is a unique descriptive name
–EN says that the specification is written in English
XML document which claims to be valid The Daffodils William Wordsworth I wandered lonely as a cloud That floats on high o’er vales and hills When all at once I saw a crowd A host of golden daffodils Something something something...
Another XML document claiming to be valid Ray Burke 1 Margaret Thatcher 75
Document Type Definitions
A DTD defines the syntax rules for an application-specific type of XML document.
A DTD describes the root element that may appear in a valid document and, in doing so, also describes the kinds of child elements that the root element may possess.
A DTD must contain element type declarations, statements which have the following general format: <!ELEMENT elementName contentModel>
In addition, a DTD may contain a range of other types of statement, only one of which we will cover here: attribute declarations, statements which are used to define the attributes which elements may have.
Element Type Declarations
An element type declaration is a statement which has the following general format: <!ELEMENT elementName contentModel>
where contentModel describes the content (if any) that may or must exist between the start tag and the closing tag (if any) of the element whose name is elementName
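Empty Elements
Empty elements are declared using the keyword EMPTY in place of a parenthesized content model: <!ELEMENT elementName EMPTY>
Such a declaration describes an element which has no content and which is normally written as a single empty-element tag, <elementName/>, with any attribute(s) it has placed inside that tag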
Elements with Character content
Elements that contain character data are declared as follows: <!ELEMENT elementName (#PCDATA)>
where #PCDATA stands for “parsed character data”
Example: a declaration such as <!ELEMENT month (#PCDATA)> would be satisfied by the following text in an XML file: <month>January</month>
Elements with element content
Elements that contain only child elements are declared as follows: <!ELEMENT element-name (child-name)> or <!ELEMENT element-name (child-name1, child-name2,...)>
Example: <!ELEMENT country (position)> which specifies that a country element contains exactly one child element, a position element
Elements with element content (contd.)
Example: <!ELEMENT memorandum (sender, recipient, message)> which specifies that a memorandum element contains a sender element, followed by a recipient element, followed by a message element
When a sequence of child elements is declared, as above, the children must appear in exactly the same sequence in conforming XML documents.
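Full declaration
When an element is declared to have element content, the child element types must also be declared.
Example:
<!ELEMENT memorandum (sender, recipient, message)>
<!ELEMENT sender (#PCDATA)>
<!ELEMENT recipient (#PCDATA)>
<!ELEMENT message (#PCDATA)>
to which the following XML fragment would conform:
<memorandum> <sender>Bertie Ahern</sender> <recipient>Ned O’Keefe</recipient> <message>Please resign immediately</message> </memorandum>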
Elements with element content (contd.)
When an element has children, the children may be optional, may occur only once, or may be repeated.
Declaring exactly one occurrence of a child: <!ELEMENT element-name (child-name)>
For example, <!ELEMENT poem (verse)> declares that a poem element must contain exactly one verse child element.
Declaring one or more occurrences of a child: <!ELEMENT element-name (child-name+)>
For example, <!ELEMENT poem (verse+)> declares that a poem element must contain one or more verse child elements.
Elements with element content (contd.)
Declaring zero or more occurrences of a child: <!ELEMENT element-name (child-name*)>
For example, <!ELEMENT poem (author*, verse+)> declares that a poem element contains zero or more author elements, followed by one or more verse elements.
Declaring zero or one occurrence of a child: <!ELEMENT element-name (child-name?)>
For example, <!ELEMENT poem (author?, verse+)> declares that a poem element contains an optional author element, followed by one or more verse elements.
Elements with element content (contd.)
Parentheses can be used to group a sequence of child elements and subject it to + * ? quantification.
For example, <!ELEMENT song (author?, verse, (chorus, verse)* )> declares that a song element contains an optional author element, followed by a verse element, followed by zero or more instances of a chorus-verse sequence.
In other words, a song element contains an optional author element, followed by one or more verse elements, the verse elements being separated by chorus elements.
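A small document conforming to the song content model above may make the grouping easier to see. This is only a sketch: the text content is invented, the child elements are assumed to hold character data, and an internal DTD is used purely to keep the example self-contained (the slides otherwise use external DTDs).

```xml
<?xml version="1.0"?>
<!DOCTYPE song [
  <!ELEMENT song (author?, verse, (chorus, verse)*)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT verse (#PCDATA)>
  <!ELEMENT chorus (#PCDATA)>
]>
<song>
  <!-- author? : present here, but could be omitted -->
  <author>Anonymous</author>
  <!-- the one compulsory verse -->
  <verse>First verse text</verse>
  <!-- (chorus, verse)* : two repetitions of the group -->
  <chorus>Chorus text</chorus>
  <verse>Second verse text</verse>
  <chorus>Chorus text</chorus>
  <verse>Third verse text</verse>
</song>
```

A song which ended on a chorus would not be valid, because every chorus must be followed by a verse.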
Elements with element content (contd.)
An element can have alternative children.
For example, <!ELEMENT person (male|female)> declares that a person element contains either a male child element or a female child element.
It would be satisfied by either <person> <male>Bertie Ahern</male> </person> or <person> <female>Celia Larkin</female> </person>
Elements with element content (contd.)
Alternatives can also be subjected to quantification.
For example, <!ELEMENT family (father?, mother?, (male|female)* )> declares that a family element contains an optional father, followed by an optional mother, followed by zero or more male or female children
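Elements with mixed content
Example: <!ELEMENT person (#PCDATA|name)*> declares that a person element contains any mixture of character data and name elements.
It would be satisfied by a person element containing only text, only name elements, text and name elements interleaved in any order, or nothing at all.
Note that #PCDATA must come first and that the * is required in mixed content: the XML specification allows a mixed content model to say which child elements may appear, but not their order or their number of occurrences.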
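The cases listed on this slide can be illustrated as below. The tagging is a reconstruction (the slide's own markup did not survive), and the enclosing people element is an assumption added only so that the five person elements form a single document.

```xml
<?xml version="1.0"?>
<!DOCTYPE people [
  <!ELEMENT people (person*)>
  <!ELEMENT person (#PCDATA | name)*>
  <!ELEMENT name (#PCDATA)>
]>
<people>
  <!-- text only -->
  <person>Ned O'Keefe</person>
  <!-- a name element only -->
  <person><name>Bertie Ahern</name></person>
  <!-- a name element followed by text -->
  <person><name>Bertie</name> Boss</person>
  <!-- name elements and text interleaved -->
  <person><name>Bertie</name> Boss <name>Taoiseach</name> PM</person>
  <!-- nothing at all -->
  <person></person>
</people>
```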
Elements with arbitrary content
In <!ELEMENT elementName ANY> the keyword ANY means that the element may contain arbitrary content.
Typically, this ANY content model is only used at the start of developing a DTD and is replaced as the DTD is fleshed out.
Attribute Declarations
The attributes for an element are declared in a statement which has the following general format: <!ATTLIST elementName attributeDefinition attributeDefinition attributeDefinition >
where the attribute definitions are separated by white space (not commas) and each attributeDefinition has the following general format: attributeName attributeType attributeDefault
An attribute default specifies whether the attribute is required and, if not, how a program processing an XML document should behave if the attribute is absent.
Attribute Declarations (contd.)
Example: <!ATTLIST person name CDATA #IMPLIED id ID #REQUIRED sex (male|female) "male">
which would be satisfied by a start tag such as <person id="p1"> or by <person name="Celia Larkin" id="p2" sex="female">
Attribute Declarations (contd.)
The statement <!ATTLIST person name CDATA #IMPLIED id ID #REQUIRED sex (male|female) "male"> used three attribute types:
–CDATA, denoting a string-valued attribute, which can contain any character, although < and & and the quote character which delimits the value must be escaped
–ID, one of a set of types with predefined validity constraints
–(male|female), an enumerated type
The defaults in the statement specified that
–the id attribute must be provided in a tag
–the name attribute is optional
–if no sex attribute is given, the value male should be assumed
Attribute Declarations (contd.)
The ID type token imposes the following requirements (called validity constraints) on an attribute with this type:
–an element can have only one attribute of type ID
–the value of an ID attribute must start with a letter, underscore or colon, which may be followed by a sequence containing any of these, digits, hyphens or full stops
–the value of an ID attribute must uniquely identify the element which bears it; no other element may have the same value for an ID attribute
–an ID attribute must have a default of #REQUIRED or #IMPLIED
Apart from ID, other types with predefined validity constraints are IDREF, IDREFS, ENTITY, ENTITIES, NMTOKEN and NMTOKENS; their details are beyond our scope here.
Attribute Declarations (contd.)
Attribute defaults are
–the keyword #REQUIRED, which means that an explicit value must be given for the attribute
–the keyword #IMPLIED, which means that the attribute is optional
–a quoted default value (for example, one value from an enumerated attribute type), which means that this value can be assumed if no explicit value is given
–the keyword #FIXED followed by a default value, which means that instances of the attribute must match the default value
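The four kinds of default can be seen side by side in the sketch below. It extends the person example from the earlier slides; the species attribute, the attribute values, the EMPTY content model and the enclosing people element are all assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE people [
  <!ELEMENT people (person+)>
  <!ELEMENT person EMPTY>
  <!ATTLIST person
    id      ID            #REQUIRED
    name    CDATA         #IMPLIED
    sex     (male|female) "male"
    species CDATA         #FIXED "human">
]>
<people>
  <!-- only the required id is given; sex is assumed to be male -->
  <person id="p1"/>
  <!-- optional name supplied and the default for sex overridden -->
  <person id="p2" name="Celia Larkin" sex="female"/>
  <!-- a #FIXED attribute may be written out, but only with its fixed value -->
  <person id="p3" name="Bertie Ahern" species="human"/>
</people>
```

A validating parser should reject a person element which lacks an id, repeats an id value already used, or gives species any value other than human.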
Other types of Declarations Apart from Element and Attribute declarations, DTDs can contain other types of declarations, but they are beyond our scope here
Example DTD The following is a very simple example of a DTD It would be satisfied by the following XML document: Celia Larkin Bertie Ahern
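A DTD of the kind this slide describes might read as follows. It is a reconstruction from the element names used elsewhere in these notes, not the slide's original text; in particular, whether people allows zero persons (person*) or requires at least one (person+) is an assumption.

```xml
<!-- contents of an external DTD file such as myFirstDTD.dtd -->
<!ELEMENT people (person*)>
<!ELEMENT person (male | female)>
<!ELEMENT male (#PCDATA)>
<!ELEMENT female (#PCDATA)>
```

Against this DTD, a people element holding one person with a female child (Celia Larkin) and one person with a male child (Bertie Ahern) would be valid.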
Another Example DTD The following is a very simple example of a DTD It would be satisfied by the following XML document: Celia Larkin Bertie Ahern
XHTML
XHTML is one of the many XML-based languages that have been defined XHTML is, essentially, a “cleaned-up” version of HTML 4, reformulated using XML DTD technology –there are three XHTML DTDs, corresponding to the three versions of HTML 4 (strict, transitional and frameset) XHTML is designed to be compatible with XML-oriented user-agents XHTML is also acceptable to HTML 4-oriented user agents Therefore, Web developers who write their HTML documents to conform to XHTML will give a longer working-life to these documents
XHTML versus HTML
An XHTML document must be a well-formed XML document and must be valid according to one of the DTDs which define the three varieties of XHTML:
–the Strict DTD, which should be used when rendering is controlled by CSS: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
–the Transitional DTD, to be used for browsers that cannot handle CSS: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
–the Frameset DTD, to be used when frames are used to divide up the browser window: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
XHTML versus HTML
Since an XHTML document must be a well-formed XML document and must be valid according to one of the DTDs,
–an XHTML document must contain one root element (an XML well-formedness requirement)
–the root element must be delimited by <html> and </html> tags (a validity requirement, since html is defined as the root element in the XHTML DTDs)
–all XHTML tags and attributes must be in lower-case (a validity requirement, since the XHTML DTDs define the tags and attributes as lower-case and XML is case-sensitive)
XHTML versus HTML (contd.)
–a non-empty element must have start and closing tags; for example, every <p> tag must have a corresponding </p> tag and every <li> tag must have a corresponding </li> tag (a well-formedness requirement)
–the start tag for an empty element must have a final /, for example <br /> (a well-formedness requirement)
–elements must be properly nested (a well-formedness requirement)
–attribute values must be quoted (a well-formedness requirement)
XHTML versus HTML (contd.)
–attributes must have values (a well-formedness requirement)
Ill-formed example: <input type="checkbox" checked />
Well-formed example: <input type="checkbox" checked="checked" />
XHTML versus HTML (contd.)
Since style-sheets and scripts are not XML, they must be escaped by placing them inside the CDATA sections which XML provides for escaping non-XML text.
Example style element: <style type="text/css"> <![CDATA[ body {background-color:white;color:red} h1 {background-color:orange;color:blue} ]]> </style>
Example script element: <script type="text/javascript"> <![CDATA[ alert("Check-out the specimen exam paper"); ]]> </script>
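Putting the requirements from the last few slides together, a minimal XHTML 1.0 Strict document might look like the sketch below. The page content, the form and its action value are invented for illustration; the doctype and the namespace are the ones defined by the XHTML 1.0 Recommendation.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!-- one root element, named html, in lower-case -->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Specimen page</title>
    <!-- style-sheet text escaped inside a CDATA section -->
    <style type="text/css">
    <![CDATA[
      body {background-color:white;color:red}
      h1 {background-color:orange;color:blue}
    ]]>
    </style>
  </head>
  <body>
    <h1>Notices</h1>
    <!-- non-empty elements are closed; the empty br element ends in / -->
    <p>First line<br />second line</p>
    <form action="search" method="get">
      <!-- attribute values are quoted and never minimized -->
      <p><input type="checkbox" name="all" checked="checked" /> Search everything</p>
    </form>
  </body>
</html>
```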
XHTML versus HTML (contd.)
Use the id attribute instead of the name attribute
–although the name attribute is still supported in XHTML 1.0, it is expected to be eliminated in future DTDs
One advantage of adopting XHTML is that you can validate your documents, instead of hoping that users who find them on the web will be able to view them.
So use one of the following document type declarations:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
Rendering XML documents
Unlike HTML tags, no XML tag (other than XHTML tags) has any pre-defined rendering semantics.
However, at least four rendering possibilities exist:
–some browsers, such as MSIE 5.5, are starting to introduce very simple default rendering semantics for arbitrary XML tags
–some browsers, including MSIE 5.5, accept CSS specifications for rendering XML tags
–the most powerful approaches involve using XSL (eXtensible Stylesheet Language), a very powerful language which enables arbitrary ways of rendering XML documents:
  –some browsers are starting to accept XSL stylesheets
  –server-side software, driven by XSL stylesheets, can transform XML documents into HTML documents before serving them to browsers which cannot understand XML+XSL
Default Rendering Semantics
As said before, some browsers, such as MSIE 5.5, are starting to introduce very simple default rendering semantics for XML tags.
MSIE 5.5 renders XML documents as interactively expandable/contractable tree structures.
Default Rendering Semantics (contd.)
Consider the XML specification below:
<people>
<person> <female>Celia Larkin</female> </person>
<person> <male>Bertie Ahern</male> </person>
</people>
This is displayed by MSIE 5.5 as shown on the next slide.
Notice how the start tags of elements with element content, the <people> and <person> tags, have a - before them
–if we click on this -, MSIE will hide the children
Default Rendering Semantics (contd.)
Say we click on the - before the <people> tag.
MSIE will hide all the children of this element, as shown on the next slide.
Notice, however, that the - changes to a +, which indicates that, if we click on it, MSIE will display the children again.
Default Rendering Semantics (contd.)
Say we click on the - before the first <person> tag.
MSIE will hide the child of this element, as shown on the next slide.
Notice, however, that the - changes to a +, which indicates that, if we click on it, MSIE will display the child again.
Using CSS with XML documents Some browsers, including MSIE 5.5, accept CSS specifications for rendering XML tags
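Using CSS with XML documents (contd.)
Consider the XML specification below:
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="personnel2.css"?>
<people>
<person> <female>Celia Larkin</female> </person>
<person> <male>Bertie Ahern</male> </person>
</people>
The xml-stylesheet processing instruction refers to a CSS style-sheet whose contents are shown on the next slide.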
Using CSS with XML documents (contd.) The contents of the CSS file personnel2.css are: male {color : blue; background-color : orange} female {color : pink; background-color : green} Remember that the XML content was: Celia Larkin Bertie Ahern Thus Celia Larkin should appear in pink on green while Bertie Ahern should appear in blue on orange See next slide
Rendering XML documents with XSL The most powerful approaches to rendering XML documents involve using XSL (eXtensible Stylesheet Language) XSL enables arbitrary ways of rendering XML documents –XSL enables an XML document to be translated into any arbitrary format, including, say, PDF or HTML Server-side software, driven by XSL stylesheets, can transform XML documents into HTML documents before serving them to browsers In addition, some browsers, such as MSIE 5.5, can accept XSL stylesheets We will consider browser-processed XSL stylesheets first and, later, consider server-side use of XSL
Browser-processed XSL
Consider again the XML document about Celia Larkin and Bertie Ahern, this time with an xml-stylesheet processing instruction which refers to an XSL style-sheet, whose content we will examine later.
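Such a document might look like the sketch below. The style-sheet file name people.xsl is a hypothetical placeholder, and type="text/xsl" is the value that MSIE expects for XSL style-sheets.

```xml
<?xml version="1.0"?>
<!-- ask the browser to transform this document with an XSL style-sheet -->
<?xml-stylesheet type="text/xsl" href="people.xsl"?>
<people>
  <person><female>Celia Larkin</female></person>
  <person><male>Bertie Ahern</male></person>
</people>
```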
Browser-processed XSL (contd.)
When the XML document on the previous slide is loaded into an XSL-enabled browser, such as newer versions of MSIE, it is rendered as shown below
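Browser-processed XSL (contd.)
The rendering on the previous slide is produced because the XML document is transformed, by the XSL style-sheet, into the following HTML document:
<html> <body>
<table>
<tr> <th>Name</th> <th>Sex</th> </tr>
<tr> <td>Celia Larkin</td> <td>female</td> </tr>
<tr> <td>Bertie Ahern</td> <td>male</td> </tr>
</table>
</body> </html>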
Browser-processed XSL (contd.) The XSL style-sheet which transformed the XML document into a HTML document is specified on the following slide It will be discussed in detail later
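XSL style-sheet
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/">
<html> <body>
<table> <tr> <th>Name</th> <th>Sex</th> </tr> <xsl:apply-templates/> </table>
</body> </html>
</xsl:template>
<xsl:template match="person"> <tr> <xsl:apply-templates/> </tr> </xsl:template>
<xsl:template match="male"> <td><xsl:value-of select="."/></td> <td>male</td> </xsl:template>
<xsl:template match="female"> <td><xsl:value-of select="."/></td> <td>female</td> </xsl:template>
</xsl:transform>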
Server-side use of Style-sheets Unlike the newer versions of MSIE, most browsers are still not able to handle XSL style-sheets For example, the XML document we have just seen transformed to a HTML table by an XSL style-sheet in MSIE 5.5 would be rendered in Opera 5 as shown on the next slide –Opera 5 cannot handle XSL so it uses its own default XML “rendering” which is simply to ignore the tags and output the element content as it finds it, without any attempt at formatting
Rendering of XML document in Opera
Server-side use of style-sheets (contd.) Thus, if we are to deliver XML content to browsers like Opera, we must use XSL on the server –we must transform the XML content to HTML on the server, so that a non XML-enabled browser requesting an XML document actually receives an “equivalent” HTML document Technologies exist for doing this –Perl provides modules for doing it, specifically XML::XSLT and the modules which this module calls, that is XML::DOM and XML::Parser –Apache provides a technology called Cocoon for doing it
A short overview of the Extensible Stylesheet Language (XSL)
There are two main stages to rendering an XML document using XSL:
–Transforming the source document into a new notation which has a rendering semantics
–Formatting the resultant document according to the semantics of the notation
XSL provides two (sub-)languages for these two tasks
XSL provides a (sub-)language called XSL Transformations (XSLT) for transforming a source XML document into a new notation which has a rendering semantics.
XSL provides a (sub-)language with rendering semantics called XSL Formatting Objects (XSL FO).
Browser-side usage of XSL In browser-side usage of XSL, we simply use the XSLT part of XSL to transform XML into HTML This HTML is then rendered by the browser
Trees in XSLT
In XSLT, the source and result documents are viewed as trees
–the types of nodes include element nodes, attribute nodes and text nodes
A stylesheet written in XSLT consists of a set of template rules.
A template rule has two parts:
–a pattern which is matched against nodes in the source tree and
–a template which can be instantiated to form part of the result tree
XSLT Stylesheets
An XSLT stylesheet is itself an XML document.
The root element is of type xsl:stylesheet, but xsl:transform may be used as a synonym for xsl:stylesheet.
Thus, an XSLT stylesheet may look like this: <xsl:stylesheet … > … </xsl:stylesheet>
Or like this: <xsl:transform … > … </xsl:transform>
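XSLT Stylesheets
Two attributes are required for the root element:
–version: at present the correct value is "1.0"
–xmlns:xsl: different XSLT processing software requires different values of this
Different MSIE browsers contain different XSLT processors, so you may need to try these two different values for the xmlns:xsl attribute:
"http://www.w3.org/1999/XSL/Transform" (the namespace defined by the XSLT 1.0 Recommendation)
"http://www.w3.org/TR/WD-xsl" (the namespace of the earlier working draft, used by older MSIE processors)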
Thus, for some MSIE browsers, an XSLT stylesheet may have to look like this:
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> … </xsl:transform>
For other MSIE browsers, it may have to look like this:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/TR/WD-xsl"> … </xsl:stylesheet>
Elements within the root element The root element ( xsl:stylesheet or xsl:transform ) may contain a great many different types of elements However, the most commonly-used element is the xsl:template element –Each xsl:template element is used to specify a template rule for transforming one kind of node in the tree for the source XML document
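Fragment from an XSLT style-sheet
The following shows the root node and first-level child nodes of an XSLT stylesheet:
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="…"> … </xsl:template>
<xsl:template match="…"> … </xsl:template>
…
</xsl:transform>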
xsl:template elements An xsl:template element is used to specify a template rule Each xsl:template element has a match attribute which is used to specify a type of node in the source tree The content of an xsl:template element specifies the corresponding structure in the result tree
The match attribute in xsl:template elements
The value of a match attribute is a pattern which specifies the type of source node to which the template rule applies.
A pattern may just be the name of a certain type of element in the source document.
For example, the following template can be used to transform any person element: <xsl:template match="person"> … </xsl:template>
However, certain meta-characters may also be used in match patterns.
Meta-characters in match patterns Meta-characters that may be used in match patterns include: / * | // Here are some examples of patterns: –The pattern / matches the root node –The pattern * matches any element –The pattern male|female matches any male element and any female element –The pattern poem/verse matches any verse element with a poem parent –The pattern poem//line matches any line element with a poem element as an ancestor
The xsl:template element (contd.) Thus, here are some example xsl:template elements: …
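The patterns listed on the previous slide could appear in template rules such as those sketched below. The template bodies are invented for illustration; only the match patterns come from the slide.

```xml
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- / matches the root node of the source tree -->
  <xsl:template match="/">
    <html><body><xsl:apply-templates/></body></html>
  </xsl:template>
  <!-- male|female matches any male element and any female element -->
  <xsl:template match="male|female">
    <p><xsl:value-of select="."/></p>
  </xsl:template>
  <!-- poem/verse matches any verse element whose parent is a poem element -->
  <xsl:template match="poem/verse">
    <div><xsl:apply-templates/></div>
  </xsl:template>
  <!-- poem//line matches any line element with a poem element as an ancestor -->
  <xsl:template match="poem//line">
    <p><xsl:value-of select="."/></p>
  </xsl:template>
  <!-- * matches any element not matched by a more specific rule above -->
  <xsl:template match="*">
    <xsl:apply-templates/>
  </xsl:template>
</xsl:transform>
```

When more than one pattern matches a node, XSLT chooses the more specific rule, so the * rule here acts only as a fallback.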
Fragment from an XSLT style-sheet
The following shows the top-most levels of the XSLT stylesheet for processing our XML document about Bertie and Celia:
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/"> … </xsl:template>
…
</xsl:transform>
The template shown matches the root of the source document
–the content of the template will specify how to generate the corresponding result document
Content of xsl:template elements The content of an xsl:template element specifies how to generate the result text that corresponds to the source node This content may simply specify some canned text Or the xsl:template element may contain other XSLT elements (called instruction elements) which specify certain kinds of processing that should be performed on the source node in order to compute the appropriate result text Or the xsl:template element may contain a mixture of canned text and XSLT processing elements
Canned text in an XSLT stylesheet The stylesheet below transforms the entire source XML document (below-left) to some canned HTML text (below- right) People This document contains information about some people.
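A stylesheet of this kind might look like the sketch below. It is a reconstruction: the surviving slide text gives only the words "People" and "This document contains information about some people.", so the choice of HTML elements around them is an assumption.

```xml
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- the single rule matches the root node and never looks at the
       rest of the source tree, so the output is the same canned HTML
       whatever the source document contains -->
  <xsl:template match="/">
    <html>
      <head><title>People</title></head>
      <body>
        <p>This document contains information about some people.</p>
      </body>
    </html>
  </xsl:template>
</xsl:transform>
```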
Canned HTML text must be well-formed XML
A stylesheet is not well-formed XML if one of the HTML tags in its canned text has no closing tag
–An error message is then produced by the MS XML parser
XSLT instruction elements We have seen that a template can contain non-XSLT text, canned text that it inserts into the result tree A template can also contain XSLT instruction elements –when the template is instantiated, each instruction is executed and the text fragment that it creates is placed in the result tree Instructions can select and process descendant source elements. Processing a descendant element creates a result tree fragment by finding the applicable template rule and instantiating its template.
An XSLT instruction element
The <xsl:apply-templates/> instruction recursively processes the children of the source element.
It does this by applying the template rules that match those children.
The xsl:apply-templates instruction The stylesheet below contains an example application of this instruction: I found some people.
xsl:apply-templates instruction (contd.) The stylesheet below contains another application of this instruction Notice that, by recursion, it finds the male and female elements I found a man. I found a woman.
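xsl:apply-templates instruction (contd.)
In this third example, the person elements are found
–but their children (the male and female elements) are not processed, even though there are templates for them, because the template for person elements contains no xsl:apply-templates instruction
The templates produce the texts "I found a person.", "I found a man." and "I found a woman." respectively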
xsl:apply-templates instruction (contd.) In this fourth example, the person elements are found –their children (the male and female elements) are also processed –because the template for person elements uses the xsl:apply- templates instruction to process the children of person elements I found a person. It was. a man a woman
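The fourth example can be sketched as follows. This is a reconstruction from the surviving text fragments ("I found a person. It was", "a man", "a woman"); the HTML wrapping is an assumption.

```xml
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <!-- there is no rule for people, so XSLT's built-in rule
         passes processing on to its person children -->
    <html><body><xsl:apply-templates/></body></html>
  </xsl:template>
  <xsl:template match="person">
    <!-- apply-templates here is what causes the male or female child
         to be processed -->
    <p>I found a person. It was <xsl:apply-templates/>.</p>
  </xsl:template>
  <xsl:template match="male">a man</xsl:template>
  <xsl:template match="female">a woman</xsl:template>
</xsl:transform>
```

Removing the xsl:apply-templates instruction from the person template gives the behaviour of the third example: the male and female templates are still present but are never reached.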
the select attribute The xsl:apply-templates instruction has a select attribute which can be used to limit its application –Below, only male children of person elements are processed, even though there is a template for female elements I found a person. It was a man. It was a woman.
the select attribute (contd.)
The select attribute can have complicated values
–Below, only male descendants of people elements are processed, even though there is a template for female elements, which would produce "I found a woman." if it were ever used; the template for male elements produces "I found a man."
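A sketch of the simpler case, in which only the male children of person elements are processed, is given below. The select value male is an assumption consistent with the slide's description, not the slide's original code.

```xml
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html><body><xsl:apply-templates/></body></html>
  </xsl:template>
  <xsl:template match="person">
    <!-- select="male" restricts processing to male children only -->
    <p>I found a person. <xsl:apply-templates select="male"/></p>
  </xsl:template>
  <xsl:template match="male">It was a man.</xsl:template>
  <!-- this rule exists but is never used, because no female element is selected -->
  <xsl:template match="female">It was a woman.</xsl:template>
</xsl:transform>
```

For the more complicated case, a template matching people could instead contain `<xsl:apply-templates select=".//male"/>`, which selects every male descendant of the current people element however deeply it is nested.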
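the xsl:value-of instruction
This can be used to generate result text by extracting data from the source tree.
–Its required select attribute specifies the data to be extracted
–In its value patterns, the character . means “the current node”
For example, a template for female elements can produce "I found a woman. Her name is" followed by the value of the current node, and a template for male elements can produce "I found a man. His name is" followed by the value of the current node.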
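the xsl:value-of instruction (contd.)
This can also extract attribute values from the source tree
–In its value patterns, the character @ prefixes an attribute name
For example, a template for female elements can produce "I found a woman. Her name is … and her age is …", and a template for male elements can produce "I found a man. His name is … and his age is …", with the name taken from the current node and the age taken from an attribute.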
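Both uses of xsl:value-of can be sketched together. The sketch assumes that the male and female elements in the source carry an age attribute, for example `<female age="…">Celia Larkin</female>`; the attribute name age is an assumption taken from the wording of the slide.

```xml
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html><body><xsl:apply-templates/></body></html>
  </xsl:template>
  <xsl:template match="female">
    <!-- "." is the text of the current female element; "@age" is its age attribute -->
    <p>I found a woman. Her name is <xsl:value-of select="."/>
       and her age is <xsl:value-of select="@age"/>.</p>
  </xsl:template>
  <xsl:template match="male">
    <p>I found a man. His name is <xsl:value-of select="."/>
       and his age is <xsl:value-of select="@age"/>.</p>
  </xsl:template>
</xsl:transform>
```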
Back to the table example We can now return to considering in detail the first XSLT style-sheet we saw –The one which, when applied to the XML document below- left produces the HTML document below-right –It is repeated on the next slide and then considered in detail
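The XSL style-sheet for the TABLE example
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/">
<html> <body>
<table> <tr> <th>Name</th> <th>Sex</th> </tr> <xsl:apply-templates/> </table>
</body> </html>
</xsl:template>
<xsl:template match="person"> <tr> <xsl:apply-templates/> </tr> </xsl:template>
<xsl:template match="male"> <td><xsl:value-of select="."/></td> <td>male</td> </xsl:template>
<xsl:template match="female"> <td><xsl:value-of select="."/></td> <td>female</td> </xsl:template>
</xsl:transform>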
The table example (contd.)
In this case, our style-sheet consists of four templates:
–The template whose match attribute has the value "/" is satisfied when it finds the root node of the XML document. It produces the main skeleton of the resultant HTML document and calls other templates to transform its child elements.
–The template whose match attribute has the value "person" is satisfied when it finds a person element. It produces a <tr> and </tr> tag-pair, in between which it calls templates to transform its child elements.
The table example (contd.)
–The template whose match attribute has the value "male" is satisfied when it finds a male element. It produces a <td> tag, followed by the content of the male element, followed by the following sequence of HTML: </td> <td>male</td>
–The template whose match attribute has the value "female" is satisfied when it finds a female element. It produces a <td> tag, followed by the content of the female element, followed by the following sequence of HTML: </td> <td>female</td>
Another Example Consider the following XML document female 21 Julia Larkin male 22 Bertie Ahern
Another example (contd.) Suppose we wanted to render it as in the image below
Another example (contd.) Here is the stylesheet: Tabular Personnel Descriptions Here are some tables, one for each person. Details of Name Age Sex
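A stylesheet along these lines might be sketched as follows. It is a reconstruction from the surviving text, and it assumes that each person element in the source has sex, age and name child elements, as in `<person> <sex>female</sex> <age>21</age> <name>Julia Larkin</name> </person>`; the use of a caption and of th cells is also an assumption.

```xml
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <head><title>Tabular Personnel Descriptions</title></head>
      <body>
        <p>Here are some tables, one for each person.</p>
        <!-- process every person element, wherever it occurs -->
        <xsl:apply-templates select="//person"/>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="person">
    <!-- one table per person; each select names a child of the current person -->
    <table border="1">
      <caption>Details of <xsl:value-of select="name"/></caption>
      <tr><th>Name</th><td><xsl:value-of select="name"/></td></tr>
      <tr><th>Age</th><td><xsl:value-of select="age"/></td></tr>
      <tr><th>Sex</th><td><xsl:value-of select="sex"/></td></tr>
    </table>
  </xsl:template>
</xsl:transform>
```

Because the rows are built with explicit select values, they appear in the order Name, Age, Sex even though the children occur in a different order in the source.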
Another example (contd.)
Suppose we also wanted to embed a CSS style-sheet in the XSLT style-sheet, as below
Another example (contd.) Here is the XSLT stylesheet: Tabular Personnel Descriptions Here are some tables, one for each person. Details of Name Age Sex
XSLT in full
The full richness of XSLT is beyond the scope of the time available in this course.
However, you may wish to read the W3C specification and experiment.
The latest version of the XSLT specification can be found at: http://www.w3.org/TR/xslt