Introduction to XSLT
Justin Tilton, instructional media + magic, inc.
As presented at the Summer uPortal Conference, Denver, Colorado, June 8, 2003
The Abstract… Looking for a methodology to quickly and effectively create transformations? Interested in the basics of XSLT and XPath, and a good way to get started? If so, this workshop is for you! We will discuss the fundamental concepts of XSLT and XPath, and the methodologies that have emerged from months of developing stylesheet transformations for the uPortal 2.0 project. We will cover the design aspects of converting structured information in XML into device-dependent markup languages such as HTML and WML, and the guidelines and best practices evolving from this experience. No prior XSLT experience is necessary.
Overview: Introduction; uPortal; Basic XPath; Basic XSLT; Markup: XHTML; Cascading Style Sheets; Tools; The Creation Process; Hands-on
Introduction
Background. Who: the W3C. What: the XPath and XSLT specifications. When: November 16, 1999. Why: a need arose for a specification defining the syntax and semantics for transforming XML documents.
References: “The” definitive reference. Michael Kay, Wrox Press Inc., ISBN 1861005067
References: a great practical reference. Jeni Tennison, Hungry Minds, ISBN 0764547763
References: practical use of transformations in Java code. Eric Burke, O'Reilly & Assoc., ISBN 0596001436
JA-SIG’s uPortal
Use of XSLT in uPortal: XSLT is used to render the "Structure", the "Theme", and the "Channels". The Structure is an XML-to-XML transformation of the abstract layout into a structured layout (such as tabs and columns). The Theme is an XML-to-markup transformation (XHTML, WML, etc.) that renders the container for headers, navigation, channels, footers, and so on. Channels have their own XML-to-markup transformations that render their content.
Structure XSLT (diagram: the XSLT processor applies a structure stylesheet to the User Layout XML, producing structure XML such as Tab/Column, Tab/Col/Row, or Tree/Column)
Conceptual Structure: the structure is still XML at this point; the illustrations are only representational.
Theme XSLT (diagram: the XSLT processor applies a theme stylesheet to the Structure XML, producing HTML 4.0 for a browser, HTML 3.2 for a PDA, or WML for a mobile phone)
Theme
Channel XSLT (diagram: channel XML and stylesheets are combined into the final output, which is written to the output stream for the device)
Final Output
Basic Architecture (diagram: the framework combines several Channel XSLTs with the Structure XSLT, the Theme XSLT, and skins written in CSS)
Channel: the elementary unit of presentation, defined by the IChannel interface. (Diagram: user interaction and external information flow into an IChannel, which produces the channel content for presentation.)
IChannel content must be well-formed XML, such as XHTML, RSS, SVG, SMIL, or a SOAP message (ordinary HTML is usually not well-formed XML), and it is rendered by an XSL transformation using an XSL stylesheet.
Framework Organization (diagram: user interaction and presentation pass through the uPortal framework, which hosts several channels)
User Layout: the user layout is an abstract structure defining the overall content available to the user. The userLayout is a tree consisting of "folders" and "channels", the latter always being the leaf nodes.
User Layout
Structure Transformation (diagram: the user layout is organized into tabs, tabs into columns, and columns into channels)
Theme Transformation (diagram: the user layout for Jim Smith rendered as a row of tabs with columns of channels; labels include Financial Aid, Library, Dictionary.com, Bookmarks, and Cartoon)
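Compiling the Presentation (diagram: the structure transformation (XSLT) turns the userLayout into a structuredLayout; channels receive setRuntimeData() and produce their content through renderXML(); the theme transformation (XSLT) then renders the result as HTML, WML, VoiceML...)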
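The two-pass flow (structure transformation, then theme transformation) can be sketched with the XSLT support built into the JDK (JAXP, `javax.xml.transform`, Java 17). This is a simplified illustration, not uPortal's actual code: the layout format, the element names `folder`, `channel`, `tabs`, `tab`, and `column`, and both stylesheets are hypothetical, and the channel rendering step (setRuntimeData()/renderXML()) is left out. The first pass keeps its result as an in-memory DOM, which is fed directly into the second pass.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical two-pass pipeline: abstract layout -> structure XML -> theme markup. */
final class StructureThemePipeline {
    // Hypothetical user layout: nested folders, with channels as the leaf nodes.
    static final String LAYOUT = """
        <layout>
          <folder name="Home">
            <folder name="Left"><channel title="Bookmarks"/></folder>
            <folder name="Right"><channel title="Cartoon"/><channel title="Library"/></folder>
          </folder>
        </layout>
        """;

    // Pass 1 (structure): top-level folders become tabs, nested folders become columns.
    static final String STRUCTURE_XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:template match="/layout">
            <tabs><xsl:apply-templates select="folder"/></tabs>
          </xsl:template>
          <xsl:template match="layout/folder">
            <tab name="{@name}"><xsl:apply-templates select="folder"/></tab>
          </xsl:template>
          <xsl:template match="folder/folder">
            <column><xsl:copy-of select="channel"/></column>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Pass 2 (theme): structure XML becomes XHTML-style markup.
    static final String THEME_XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="xml" omit-xml-declaration="yes"/>
          <xsl:template match="/tabs">
            <html><body>
              <xsl:for-each select="tab">
                <h1><xsl:value-of select="@name"/></h1>
                <table><tr>
                  <xsl:for-each select="column">
                    <td><xsl:for-each select="channel"><p><xsl:value-of select="@title"/></p></xsl:for-each></td>
                  </xsl:for-each>
                </tr></table>
              </xsl:for-each>
            </body></html>
          </xsl:template>
        </xsl:stylesheet>
        """;

    static String render(String layoutXml) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            // Pass 1: the structured layout stays in memory as a DOM tree.
            DOMResult structure = new DOMResult();
            factory.newTransformer(new StreamSource(new StringReader(STRUCTURE_XSL)))
                   .transform(new StreamSource(new StringReader(layoutXml)), structure);
            // Pass 2: the DOM from pass 1 is the source for the theme stylesheet.
            StringWriter out = new StringWriter();
            factory.newTransformer(new StreamSource(new StringReader(THEME_XSL)))
                   .transform(new DOMSource(structure.getNode()), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render(LAYOUT));
    }
}
```

Swapping THEME_XSL for a different stylesheet (for example one emitting WML) would change the target device without touching the structure pass, which is the point of separating the two transformations.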
Content Transformation (diagram: the XSLT processor combines XML with a stylesheet to produce XHTML for a web browser, HTML for a PDA, or WML for a cell phone)
Multiple Target Devices
Skins: Cascading Stylesheets
Skin: im+m
Skin: VSAC
Skin: matrix
User Preferences: swappable layout and preference management modules; a profile management module; a tab/column-specific preferences module; skin selection.
User Preferences
Publish/Subscribe: a channel publishing document describes the channel parameters (default values, modification permissions, descriptions) and the publish/subscribe steps (step sequence, instructions, help). Publish/Subscribe is itself a complex channel with multiple XSL views.
uPortal Base Channel Types
Applet Channel
Image Channel
Inline Frame
Remote Channel
RSS Channel
Web Proxy Channel
XSLT Channel
Channel Controls
Channel Review
XHTML
Why XHTML? XSLT stylesheets are XML documents. If you plan to output HTML, that markup will reside in the template bodies of the XSLT stylesheet and will be written out as literal output. Because the stylesheet contains the markup, the markup has to be well-formed.
Differences with HTML 4: because XHTML is an XML application, certain practices that were perfectly legal in HTML 4 must change. Documents must be well-formed. Well-formedness is a new concept introduced by [XML]: essentially, all elements must either have closing tags or be written in a special form (as described below), and all elements must nest properly.
CORRECT: nested elements
<p>here is an emphasized <em>paragraph</em>.</p>
INCORRECT: overlapping elements
<p>here is an emphasized <em>paragraph.</p></em>
Differences with HTML 4: element and attribute names must be in lower case. XHTML documents must use lower case for all HTML element and attribute names. This is necessary because XML is case-sensitive; for example, <li> and <LI> are different tags.
Differences with HTML 4: in HTML 4, certain elements were permitted to omit the end tag, with the elements that followed implying closure. This omission is not permitted in XML-based XHTML: all elements must have an end tag.
CORRECT: terminated elements
<p>here is a paragraph.</p> <p>here is another paragraph.</p>
INCORRECT: unterminated elements
<p>here is a paragraph. <p>here is another paragraph.
Differences with HTML 4: all attribute values must be quoted, even those which appear to be numeric.
CORRECT: quoted attribute values
<td rowspan="3">
INCORRECT: unquoted attribute values
<td rowspan=3>
Differences with HTML 4: XML does not support attribute minimization. Attribute-value pairs must be written in full; attribute names such as nowrap and checked cannot occur in elements without their value being specified.
CORRECT: unminimized attributes
<td nowrap="nowrap">
INCORRECT: minimized attributes
<td nowrap>
Differences with HTML 4: empty elements must either have an end tag or the start tag must end with />, for instance <br/> or <hr></hr>.
CORRECT: terminated empty tags
<br/><hr/>
INCORRECT: unterminated empty tags
<br><hr>
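A quick way to check whether a piece of markup can go into an XSLT template body is to run it through an XML parser. The sketch below (Java 17, JDK only) uses the built-in DOM parser as a well-formedness checker; wrapping each fragment in a `<div>` so that several sibling elements form a single document is an assumption of this sketch. It checks well-formedness only, not validity against the XHTML DTD.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Checks whether a markup fragment is well-formed XML (not whether it is valid XHTML). */
final class WellFormedCheck {
    static boolean isWellFormed(String fragment) {
        // A document needs exactly one root element, so wrap the fragment in one.
        String document = "<div>" + fragment + "</div>";
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(document)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the markup as not well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<p>here is an emphasized <em>paragraph</em>.</p>",
            "<p>here is an emphasized <em>paragraph.</p></em>",
            "<li>item</LI>",
            "<p>here is a paragraph. <p>here is another paragraph.",
            "<td rowspan=\"3\"></td>", "<td rowspan=3></td>",
            "<td nowrap=\"nowrap\"></td>", "<td nowrap></td>",
            "<br/><hr/>", "<br><hr>"
        };
        for (String s : samples) {
            System.out.println((isWellFormed(s) ? "OK      " : "BROKEN  ") + s);
        }
    }
}
```

Each INCORRECT example from the slides is rejected, which is the same error the XSLT processor would report when it loads a stylesheet containing that markup.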
Cascading Style Sheets
CSS, uPortal, and Skins: uPortal uses Cascading Style Sheets to "skin" the portal and its content. These CSS files are optimized for a particular structure, so for consistency, channel developers should become familiar with the CSS classes defined for that structure and apply them to the markup in their XSLT transformations.
The CSS Classes There is a sample “developer” channel in uPortal that describes and gives examples of the CSS classes for the Tab/Column Theme.
The CSS Channel
Basic XPath
Nodes and Node Trees When an application wants to operate on an XML document it builds an internal model of what the document looks like. This model is known as a document object model or DOM. In XPath and XSLT, it's called a node tree.
Types of Nodes
Root node: the top of the node tree
Element nodes: XML elements
Attribute nodes: XML attributes
Text nodes: textual content in XML elements
Comment nodes: XML comments
Processing instruction nodes: XML processing instructions
Namespace nodes: the in-scope namespaces on an element
(diagram: a root node (R) with element (E), attribute (A), and text (T) nodes below it)
Node Tree Example
<document>
  <para type="note">content</para>
  <para type="warning">content</para>
</document>
(diagram: the document element with two para children, whose type attributes are note and warning, each containing the text "content")
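To see the node tree an application builds, the example document can be parsed with the JDK's W3C DOM parser and walked recursively. This is a sketch (Java 17, JDK only); the DOM's Document node plays the role of XPath's root node. Whitespace-only text nodes are skipped, and the sample is written without indentation so that none appear in the first place.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Lists the nodes of a document, indented by depth, using the W3C DOM. */
final class NodeTreeDemo {
    static final String SAMPLE =
        "<document><para type=\"note\">content</para>"
        + "<para type=\"warning\">content</para></document>";

    static List<String> describe(String xml) {
        try {
            Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> lines = new ArrayList<>();
            walk(root, 0, lines);
            return lines;
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    private static void walk(Node node, int depth, List<String> out) {
        String indent = "  ".repeat(depth);
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE -> out.add(indent + "root");
            case Node.ELEMENT_NODE -> {
                out.add(indent + "element " + node.getNodeName());
                // Attributes belong to their element but are not among its children.
                NamedNodeMap attrs = node.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Node a = attrs.item(i);
                    out.add(indent + "  attribute " + a.getNodeName() + "=" + a.getNodeValue());
                }
            }
            case Node.TEXT_NODE -> {
                if (node.getNodeValue().isBlank()) return; // skip indentation-only text
                out.add(indent + "text " + node.getNodeValue());
            }
            default -> out.add(indent + "other " + node.getNodeName());
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        describe(SAMPLE).forEach(System.out::println);
    }
}
```

Running it prints the root, the document element, and under each para its type attribute followed by its text node.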
XPath Definition: XPath is a language for addressing parts of an XML document, designed to be used by XSLT. Example XPath: child::para[attribute::type='warning'][position()=2]. In English: from the para children of the context node that have a type attribute with the value warning, select the second one.
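Dissecting the Example: child::para[attribute::type='warning'][position()=2]. Starting at the context node, the axis step child::para collects the para children; the filter [attribute::type='warning'] keeps only those whose type attribute is warning; the filter [position()=2] then keeps the second of the remaining nodes, which is the node selected.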
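The step-by-step narrowing can be checked with the XPath API built into the JDK (`javax.xml.xpath`, Java 17). The sketch uses a hypothetical document with two warning paras, so that position()=2 has something to find; the document element serves as the context node.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathExampleDemo {
    // Hypothetical document with two warning paras.
    static final String DOC = """
        <document>
          <para type="note">n1</para>
          <para type="warning">w1</para>
          <para type="warning">w2</para>
          <para type="note">n2</para>
        </document>
        """;

    /** Evaluates expr with the document element as context node; returns each selected node's text. */
    static List<String> select(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expr, doc.getDocumentElement(), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(DOC, "child::para"));                            // axis step
        System.out.println(select(DOC, "child::para[attribute::type='warning']")); // first filter
        System.out.println(select(DOC, "child::para[attribute::type='warning'][position()=2]"));
    }
}
```

The three lines printed are [n1, w1, w2, n2], then [w1, w2], then [w2], matching the axis, filtered, and selected stages of the dissection.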
Types of XPaths. Expressions return a value, which might be a node set that is processed or a string that is output: <xsl:when test="@type='warning'">. Patterns either match a particular node or don't match that node: <xsl:template match="para">.
XPath Expressions can select nodes, test conditions, and calculate values. Select nodes: <xsl:for-each select="child::Z">. Conditional: <xsl:if test="position()=2">. Calculation: <xsl:value-of select="position()+4"/>. (diagram: a context node whose children are C, Z, Z, C at positions 1 to 4)
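The three kinds of expression can be combined in one small stylesheet and run with the JDK's built-in XSLT processor (Java 17, JDK only). The source document and the output format are hypothetical. Note that inside the for-each, position() counts within the selected Z nodes, not within all four children, so the Z nodes are numbered 1 and 2.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class PositionDemo {
    // Children C, Z, Z, C, as in the diagram.
    static final String SOURCE = "<r><C/><Z/><Z/><C/></r>";

    static final String XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="text"/>
          <xsl:template match="/r">
            <!-- select nodes: only the Z children -->
            <xsl:for-each select="child::Z">
              <xsl:value-of select="name()"/>
              <xsl:text>:</xsl:text>
              <!-- calculation: position() is 1 or 2 within the selected Z nodes -->
              <xsl:value-of select="position()+4"/>
              <!-- conditional -->
              <xsl:if test="position()=2">(second)</xsl:if>
              <xsl:text> </xsl:text>
            </xsl:for-each>
          </xsl:template>
        </xsl:stylesheet>
        """;

    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, SOURCE));
    }
}
```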
Node Set Expressions: the most common way that XSLT uses XPaths is to select node sets. These XPaths usually occur within a select attribute, for example on xsl:for-each or xsl:apply-templates, and are known as location paths: <xsl:for-each select="child::para"> or <xsl:apply-templates select="paragraph"/>.
Location Paths: the purpose of location paths is to select node sets from a node tree. Location paths can be absolute or relative. Absolute location paths start from a known location such as the root node: <xsl:for-each select="/R/N">. Relative location paths start from the context node: <xsl:for-each select="N"> (the same as "child::N"). (diagram: the same tree of R, N, X, Y, and Z nodes, addressed once from the root node and once from a context node)
Steps A location path is made up of a number of steps. Each step takes you from a node to a node set. Each step is separated from the one before it with a “/”.
Steps: every step is made up of an axis and a node test. The axis specifies the direction that the step is taken in; the node test specifies the kinds of nodes that should be collected in that direction. Within a step, the axis and the node test are separated by a double colon "::", as in child::para.
Some Shorthand Notation: if no axis is specified in a step, the default axis is the child axis.
Longhand: select="child::para/child::sentence"; shorthand: select="para/sentence"
Another important axis is the attribute axis, which takes you from the context node to the attributes of that node.
Longhand: select="attribute::type"; shorthand: select="@type"
Another essential axis is the parent axis, which takes you from the context node to the parent of that node.
Longhand: select="parent::node()/child::xyz"; shorthand: select="../xyz"
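One way to convince yourself that the longhand and shorthand forms are interchangeable is to evaluate both against the same document and compare how many nodes each selects. This sketch uses the JDK's `javax.xml.xpath` API (Java 17); the document is hypothetical. The last pair also shows that an absolute path ignores the context node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ShorthandDemo {
    // Hypothetical document for comparing longhand and shorthand paths.
    static final String DOC = """
        <doc>
          <para type="intro"><sentence>a</sentence><sentence>b</sentence></para>
          <para type="body"><sentence>c</sentence></para>
          <xyz/>
        </doc>
        """;
    private static final Document PARSED = parse(DOC);
    private static final XPath XPATH = XPathFactory.newInstance().newXPath();

    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Counts the nodes selected by expr, using the node found by contextPath as the context node. */
    static int count(String contextPath, String expr) {
        try {
            Node context = (Node) XPATH.evaluate(contextPath, PARSED, XPathConstants.NODE);
            NodeList nodes = (NodeList) XPATH.evaluate(expr, context, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"/doc", "child::para/child::sentence", "para/sentence"},
            {"/doc/para[1]", "attribute::type", "@type"},
            {"/doc/para[1]", "parent::node()/child::xyz", "../xyz"},
            {"/doc/para[2]", "/child::doc/child::para", "/doc/para"} // absolute: context ignored
        };
        for (String[] p : pairs) {
            System.out.printf("%-30s %d   %-15s %d%n", p[1], count(p[0], p[1]), p[2], count(p[0], p[2]));
        }
    }
}
```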
Axes: ancestor::node() takes you up the tree to the ancestors of the context node, in reverse document order.
Axes: ancestor-or-self::node() takes you up the tree to the ancestors of the context node, in reverse document order, starting with the context node itself.
Axes: child::node() takes you to the children of the context node, in document order. This is the default axis.
Axes: descendant::node() takes you to the descendants of the context node. The resulting nodes are in document order.
Axes: descendant-or-self::node() takes you to the descendants of the context node, starting with the context node itself, in document order.
Axes: following::node() takes you to the nodes that occur after the context node in document order, excluding the context node's descendants.
Axes: following-sibling::node() takes you to the siblings of the context node that occur after it in document order.
Axes: parent::node() selects a single node: the parent of the context node.
Axes: preceding::node() takes you to the nodes that occur before the context node, in reverse document order, excluding the context node's ancestors.
Axes: preceding-sibling::node() takes you to the siblings (children of the same parent) of the context node that occur before it, in reverse document order.
Axes: self::node() takes you to the context node itself.
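The sketch below walks every axis from one context node in a small hypothetical tree, using the JDK's `javax.xml.xpath` API (Java 17). It uses the `*` node test instead of node() so that only named elements come back. The order in which JAXP hands back the nodes of a node set is an implementation detail; the "reverse document order" of ancestor and preceding shows up reliably in predicates, where [1] means the nearest node.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class AxesDemo {
    // Hypothetical tree; in document order the elements are a x b c d e f g h.
    //   a
    //   +-- x
    //   +-- b
    //   |   +-- c
    //   |   +-- d   <- context node
    //   |   |   +-- e
    //   |   |   +-- f
    //   |   +-- g
    //   +-- h
    static final String DOC = "<a><x/><b><c/><d><e/><f/></d><g/></b><h/></a>";
    private static final XPath XPATH = XPathFactory.newInstance().newXPath();
    private static final Document PARSED = parse(DOC);

    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Names of the nodes selected by path, evaluated with element d as the context node. */
    static List<String> fromD(String path) {
        try {
            Node d = (Node) XPATH.evaluate("//d", PARSED, XPathConstants.NODE);
            NodeList nodes = (NodeList) XPATH.evaluate(path, d, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String[] axes = {"ancestor", "ancestor-or-self", "child", "descendant",
            "descendant-or-self", "following", "following-sibling", "parent",
            "preceding", "preceding-sibling", "self"};
        for (String axis : axes) {
            System.out.println(axis + "::* -> " + fromD(axis + "::*"));
        }
        // On reverse axes, proximity position 1 is the node nearest to d.
        System.out.println("ancestor::*[1] -> " + fromD("ancestor::*[1]"));
        System.out.println("preceding::*[1] -> " + fromD("preceding::*[1]"));
    }
}
```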
Predicates (filters): predicates are placed in square brackets, either at the end of a step, as in select="para[position()=3]/sentence", or at the end of a location path, as in select="para/sentence[position()=3]". Predicates act as filters on node sets: when a predicate expression tests false for a node, that node is filtered out.
Predicates (filters): you can have any number of predicates following each other, as in select="para[position()=3][@indent=.5]/sentence". The context node list for each predicate contains the nodes that are still in the node set after it has been filtered by the previous predicates. Predicates can be used at any point in a location path, but they only apply to the immediately preceding step.
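Because each predicate sees only the nodes that survived the previous one, the order of predicates matters. The sketch below (JDK `javax.xml.xpath`, Java 17, hypothetical document) shows that filtering by type and then by position gives a different node than filtering by position and then by type. A bare number such as [2] is shorthand for [position()=2].

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class PredicateOrderDemo {
    // Hypothetical document: note, warning, warning, note.
    static final String DOC = """
        <document>
          <para type="note">n1</para>
          <para type="warning">w1</para>
          <para type="warning">w2</para>
          <para type="note">n2</para>
        </document>
        """;

    /** Text of each node selected by expr, with the document element as the context node. */
    static List<String> select(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(DOC)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expr, doc.getDocumentElement(), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select("para[@type='warning'][2]")); // filter by type, then take the 2nd
        System.out.println(select("para[2][@type='warning']")); // take the 2nd para, then check type
        System.out.println(select("para[1][@type='warning']")); // 1st para is a note: empty
    }
}
```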
Expressions - Union You can combine node sets by creating a union using the “|” operator. If there are any nodes that occur in both node sets, the union only holds one copy of them. You can use predicates on the result of a union, just as you can on any node set.
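Expressions - Operators. Logical operators: and, or, and the not() function. Comparative operators: =, !=, <, <=, >, >=. Remember that "<" must be escaped as "&lt;" when an expression appears in an attribute of the stylesheet, for example test="price &lt; 10". Mathematical operators: +, -, *, div, mod.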
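The operators can be tried directly with the JDK's XPath API (Java 17). In a Java string, "<" needs no escaping, but the same comparison inside a stylesheet attribute must be written with &lt;, as the embedded XSLT shows. The price list is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class OperatorDemo {
    // Hypothetical price list.
    static final String DOC = "<items><item price=\"4\"/><item price=\"12\"/><item price=\"7\"/></items>";
    private static final XPath XPATH = XPathFactory.newInstance().newXPath();
    private static final Document PARSED = parse(DOC);

    // Inside an XML attribute, "<" has to be written as &lt;.
    static final String XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="text"/>
          <xsl:template match="/items">
            <xsl:value-of select="count(item[@price &lt; 10])"/>
          </xsl:template>
        </xsl:stylesheet>
        """;

    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    static double number(String expr) {
        try {
            return (Double) XPATH.evaluate(expr, PARSED, XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    static boolean bool(String expr) {
        try {
            return (Boolean) XPATH.evaluate(expr, PARSED, XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    static String cheapCountViaXslt() {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)))
                    .transform(new StreamSource(new StringReader(DOC)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(number("10 div 4"));                     // division
        System.out.println(number("7 mod 3"));                      // remainder
        System.out.println(bool("(3 >= 3) and not(1 > 2)"));         // logical + comparison
        System.out.println(number("count(//item[@price < 10])"));   // unescaped in Java
        System.out.println(cheapCountViaXslt());                    // escaped in the stylesheet
    }
}
```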
Example: Locating Nodes. This XPath expression selects all the price elements of all the cd elements of the catalog element: /catalog/cd/price. This XPath expression selects all the cd elements in the document: //cd. Note: most XPath expressions are written in the shorthand notation.
Selecting Unknown Elements Wildcards ( * ) can be used to select unknown XML elements. This XPath expression selects all the child elements of all the cd elements of the catalog element: /catalog/cd/* The following XPath expression selects all the price elements that are grandchild elements of the catalog element: /catalog/*/price
Selecting Unknown Elements: the following XPath expression selects all price elements that have exactly two ancestor elements: /*/*/price. The following XPath expression selects all elements in the document: //*
Selecting Branches: the following XPath expression selects the first cd child element of the catalog element (positions start at 1, and there is no function named first()): /catalog/cd[1]. The following XPath expression selects the last cd child element of the catalog element: /catalog/cd[last()]
Selecting Branches The following XPath expression selects all the cd elements of the catalog element that have a price element: /catalog/cd[price] The following XPath expression selects all the cd elements of the catalog element that have a price element with a value of 10.90: /catalog/cd[price=10.90]
Selecting Branches: the following XPath expression selects all the price elements of all the cd elements of the catalog element that have a price element with a value of 10.90: /catalog/cd[price=10.90]/price
Selecting Several Paths: the following XPath expression selects all the title and artist elements of all the cd elements of the catalog element: /catalog/cd/title | /catalog/cd/artist. The following XPath expression selects all the title and artist elements in the document: //title | //artist. The following XPath expression selects all the title, artist and price elements in the document: //title | //artist | //price
Selecting Attributes This XPath expression selects all attributes named country: //@country This XPath expression selects all cd elements which have an attribute named country: //cd[@country] This XPath expression selects all cd elements which have any attribute: //cd[@*] This XPath expression selects all cd elements which have an attribute named country with a value of 'UK': //cd[@country='UK']
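All of the catalog expressions above can be run against a small sample with the JDK's XPath API (Java 17). The catalog content is hypothetical, chosen so that one cd has no price and one has no country attribute. The last union in `main` also shows that a node appearing in both halves of a union is counted once.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class CatalogDemo {
    // Hypothetical catalog for the slide examples.
    static final String CATALOG = """
        <catalog>
          <cd country="USA"><title>First</title><artist>Artist A</artist><price>10.90</price></cd>
          <cd country="UK"><title>Second</title><artist>Artist B</artist><price>9.90</price></cd>
          <cd><title>Third</title><artist>Artist C</artist></cd>
        </catalog>
        """;
    private static final XPath XPATH = XPathFactory.newInstance().newXPath();
    private static final Document PARSED = parse(CATALOG);

    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Number of nodes selected by expr, evaluated from the root node. */
    static int count(String expr) {
        try {
            return ((NodeList) XPATH.evaluate(expr, PARSED, XPathConstants.NODESET)).getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** String value of expr (for a node set: the string value of its first node). */
    static String text(String expr) {
        try {
            return XPATH.evaluate(expr, PARSED);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String[] exprs = {"/catalog/cd/price", "//cd", "/catalog/cd/*", "/catalog/*/price",
            "/*/*/price", "//*", "/catalog/cd[price]", "/catalog/cd[price=10.90]",
            "//title | //artist", "//@country", "//cd[@*]", "//cd[@country='UK']",
            "//title | /catalog/cd/title"}; // same nodes twice: counted once
        for (String e : exprs) {
            System.out.printf("%-30s %d%n", e, count(e));
        }
        System.out.println(text("/catalog/cd[1]/title") + ", " + text("/catalog/cd[last()]/title"));
    }
}
```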
Basic XSLT
Background XSLT is part of a larger initiative within the World Wide Web Consortium (W3C) to define a way of presenting XML documents. This initiative is known as XSL (Extensible Stylesheet Language). XSLT is an XML vocabulary that's used to define a transformation between an XML document and a result document, which might be in XML, in HTML, or a text document.
What is an XSLT Stylesheet? The XSLT processor uses a stylesheet to transform an XML document. The stylesheet contains instructions for generating a new document based on information in the source document. This can involve adding, removing, or rearranging nodes, as well as presenting the nodes in a new way.
The XSL Processing Sequence (diagram: an XML parser reads the source document into a source tree; the XSL stylesheet supplies a rule base; applying templates to the source tree builds a result tree, which is written out as a result file or stream)
Differentiating Stylesheets: XSLT stylesheets do not perform the same function as Cascading Style Sheets (CSS). CSS applies style to markup languages to affect formatting in a single, top-down pass. XSLT produces a separate result tree and can iterate and perform conditional logic. XSLT and CSS are most powerful when used together. More later…
Example of a Stylesheet: when you work with a stylesheet, three documents are involved: the source document, in XML; the desired output, which is the result document and can be HTML, XML, or text; and the XSLT stylesheet, which is also an XML document.
“Hello World!”
XML document:
<?xml version="1.0"?>
<greeting>Hello, World!</greeting>
Desired output:
<html>
  <head><title>Greeting</title></head>
  <body><p>Hello, World!</p></body>
</html>
XSLT stylesheet:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <head><title>Greeting</title></head>
      <body><p><xsl:value-of select="greeting"/></p></body>
    </html>
  </xsl:template>
</xsl:stylesheet>
(diagram: the source node tree, a root node containing a greeting element that contains the text Hello, World!)
Dissecting the XSLT: XSLT stylesheets are XML documents, so they can start with an XML declaration: <?xml version="1.0"?>. Next comes the standard XSLT heading and namespace: <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">. Then comes the template rule that is triggered when a particular part of the source document is being processed; "/" is XPath for the root node: <xsl:template match="/">
Dissecting the XSLT: once the rule is triggered, the body of the template defines what output to generate: <html> <head><title>Greeting</title></head> <body><p><xsl:value-of select="greeting"/></p></body> </html>. Most of the template is HTML, except the value-of element, which is an XSL instruction. The XPath in its select attribute asks for the contents of the child of the context node (here, the root node) with the element name greeting.
Dissecting the XSLT All that’s left to do is close (finish) the template and the stylesheet. </xsl:template> </xsl:stylesheet>
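The "Hello World!" stylesheet can be run as-is with the XSLT processor built into the JDK (Java 17). Because the first element of the result is html and the stylesheet has no xsl:output, XSLT 1.0 falls back to the html output method, so the processor may add indentation and a META tag inside head; the exact whitespace can differ between processors, which is why the usage below only looks for the key pieces.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class HelloWorld {
    static final String SOURCE = """
        <?xml version="1.0"?>
        <greeting>Hello, World!</greeting>
        """;

    // The stylesheet from the slide, including the html/head/body wrapper.
    static final String XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:template match="/">
            <html>
              <head><title>Greeting</title></head>
              <body><p><xsl:value-of select="greeting"/></p></body>
            </html>
          </xsl:template>
        </xsl:stylesheet>
        """;

    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, SOURCE));
    }
}
```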
Some XSL Top-Level Elements: <xsl:decimal-format> declares a decimal format. <xsl:import> imports another stylesheet into this stylesheet. <xsl:key> provides a way to work with documents that contain an implicit cross-reference structure. <xsl:output> specifies how you want the result tree to be output.
Some XSL Top-Level Elements: <xsl:param> declares a parameter for a stylesheet or template and specifies a default value for it. <xsl:template> specifies a template rule. <xsl:variable> declares a variable and binds a value to it. The difference between xsl:param and xsl:variable is that the value of xsl:param is only a default, which can be overridden (by the application running the transformation for a stylesheet parameter, or by xsl:with-param for a template parameter), while the value of xsl:variable is fixed. Used as a top-level element, either one has global scope; otherwise its scope is local to the template in which it appears.
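The param/variable difference is easy to see from the host application. In this sketch (Java 17, JDK only) the stylesheet declares a top-level parameter `greeting` with a default and a top-level variable `punctuation`; the names and the source document are hypothetical. `Transformer.setParameter` can override the parameter, while the variable has no such hook.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ParamDemo {
    static final String SOURCE = "<name>World</name>";

    static final String XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="text"/>
          <!-- global parameter: 'Hello' is only a default -->
          <xsl:param name="greeting" select="'Hello'"/>
          <!-- global variable: the value is fixed -->
          <xsl:variable name="punctuation" select="'!'"/>
          <xsl:template match="/">
            <xsl:value-of select="concat($greeting, ', ', name, $punctuation)"/>
          </xsl:template>
        </xsl:stylesheet>
        """;

    /** Runs the stylesheet; a non-null greeting overrides the parameter's default. */
    static String run(String greeting) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            if (greeting != null) {
                t.setParameter("greeting", greeting);
            }
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(SOURCE)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(null));       // uses the default
        System.out.println(run("Welcome"));  // overrides the default
    }
}
```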
What Is a Template? A template defines what the XSLT processor should do when it processes a particular node in the XML source document. The XSLT processor populates the result document by instantiating a sequence of templates. Instantiating a template means that the XSLT processor copies any literal data from the template to the result document and executes the XSLT instructions in the template.
Contents of a Template: to define a template, you use the xsl:template element. In the xsl:template tag, the value of the match attribute is an XPath pattern, which matches (identifies) nodes in the source XML document. A template with a match attribute is called a template rule.
The Template Body: the template body defines the actions you want the XSLT processor to perform each time it instantiates the template. It contains XSLT instructions you want the XSLT processor to follow, and elements that specify literal output you want the XSLT processor to insert in the result document, for example: <table align="center" cellpadding="5">
Determining Templates to Instantiate: when the XSLT processor applies a stylesheet to an XML document, it begins processing with the root node of the XML source document. Every stylesheet includes default (built-in) templates, so whether or not you explicitly define a template rule that matches the root node, the XSLT processor always instantiates a template that matches the root node.
The Example Handout
Dissected XSLT Stylesheet
Result
Dissecting a sample In the sample stylesheet, the template rule in the first template matches the root node: <xsl:template match="/"> The XSLT processor instantiates this template to start generating the result document. It copies the first few lines from the template to the result document.
Dissecting a sample: then the XSLT processor reaches the following XSLT instruction: <xsl:apply-templates select="/bookstore/book"/>. The processor evaluates the expression in the select attribute and creates a list of all the source nodes it selects; in this example, the list contains book elements. The processor then processes each node in the list in turn by instantiating its matching template.
Dissecting a sample First, the XSLT processor searches for a template that matches the first book element. The template rule in the second template matches the book element: <xsl:template match="book"> After instantiating this template for the first book element, the XSLT processor searches for a template that matches the second book element…
Dissecting a sample The XSLT processor instantiates the book template again, and then repeats the process for the third book element. After three instantiations of the book template, the XSLT processor returns to the first template (the template that matches the root node) and continues with the line after the xsl:apply-templates instruction.
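The handout itself is not reproduced here, so the sketch below uses a hypothetical three-book bookstore and a two-template stylesheet with the same shape as the sample: a root template that calls xsl:apply-templates on /bookstore/book, and a book template. It runs on the JDK's XSLT processor (Java 17); xml output is used so the serialized result is predictable. The "End of list" paragraph appears after all three books, showing that control returns to the root template once apply-templates has finished.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class BookstoreDemo {
    // Hypothetical bookstore with three books.
    static final String SOURCE = """
        <bookstore>
          <book><title>Title A</title></book>
          <book><title>Title B</title></book>
          <book><title>Title C</title></book>
        </bookstore>
        """;

    static final String XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="xml" omit-xml-declaration="yes"/>
          <!-- first template: instantiated once, for the root node -->
          <xsl:template match="/">
            <html><body>
              <h1>Bookstore</h1>
              <xsl:apply-templates select="/bookstore/book"/>
              <p>End of list</p>
            </body></html>
          </xsl:template>
          <!-- second template: instantiated once per selected book -->
          <xsl:template match="book">
            <p><xsl:value-of select="title"/></p>
          </xsl:template>
        </xsl:stylesheet>
        """;

    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, SOURCE));
    }
}
```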
Built-in Templates When the XSLT processor cannot find a template that matches a selected node, it uses built-in templates. Every stylesheet includes built-in templates whether or not you explicitly define them.
Built-in Templates: the following template matches the root node and element nodes, and selects all child nodes (but not attributes) for further processing:
<xsl:template match="*|/">
  <xsl:apply-templates/>
</xsl:template>
The following template matches text and attribute nodes. It copies the value of the text or attribute node to the result document:
<xsl:template match="@*|text()">
  <xsl:value-of select="."/>
</xsl:template>
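The built-in templates can be observed by running a stylesheet that defines no template rules at all: the output is simply the text of the document. Attribute values do not appear, because the built-in element template only applies templates to child nodes; once a rule selects an attribute explicitly, the built-in attribute template copies its value. A sketch using the JDK's XSLT processor (Java 17) with a hypothetical source document:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class BuiltInTemplatesDemo {
    static final String SOURCE = "<book id=\"b1\"><title>Title</title><author>Author</author></book>";

    // No template rules: only the built-in templates run.
    static final String EMPTY_XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="text"/>
        </xsl:stylesheet>
        """;

    // One rule that also selects the id attribute; the built-in
    // attribute template then copies its value.
    static final String WITH_ATTR_XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="text"/>
          <xsl:template match="book">
            <xsl:apply-templates select="@id | *"/>
          </xsl:template>
        </xsl:stylesheet>
        """;

    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(EMPTY_XSL, SOURCE));     // text only, no attribute
        System.out.println(transform(WITH_ATTR_XSL, SOURCE)); // attribute value first
    }
}
```

In the second stylesheet the id value comes first because, in document order, an element's attributes precede its children.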
Major XSL Instructions xsl:apply-imports xsl:apply-templates xsl:attribute xsl:attribute-set xsl:call-template xsl:choose xsl:comment xsl:copy xsl:copy-of xsl:decimal-format xsl:element xsl:fallback xsl:for-each xsl:if xsl:import xsl:include xsl:key xsl:message xsl:namespace-alias xsl:number xsl:otherwise xsl:output xsl:param xsl:preserve-space xsl:processing-instruction xsl:sort xsl:strip-space xsl:stylesheet xsl:template xsl:text xsl:transform xsl:value-of xsl:variable xsl:when xsl:with-param
The Tools
The Applications… These are the applications I am familiar with -- this is not an endorsement -- and no testing was done on animals…
XML/XSLT document development IDEs: Excelon Stylus Studio; XML Spy; Cooktop (open source); Komodo (commercial, runs on Linux); Emacs (XSLT-process minor mode).
HTML markup IDEs: Macromedia Dreamweaver MX (XHTML); Adobe GoLive.
HTML to XHTML conversion/cleanup: HTML Tidy (open source).
The Creation Process
Creating an XSLT Stylesheet
Step one: get or develop one of the following: a representative XML document, a DTD (a whole different seminar), or a schema (a whole different seminar).
Step two: analyze the data. Think about the best way to present and interact with it. The data is your friend.
Step three: develop sample markup. Design mocks for each markup language and device that you plan to support. Don't try to paint the canvas without a sketch.
Step four: convert the markup to XML. The stylesheet will only eat tasty markup; poorly formed markup will be regurgitated.
Step five: copy the markup into an XSLT editor. Match on a root element and start cooking XSLT.
Hands - on
Hands – on Convert RSS (Rich Site Summary) “Syndicated Content” XML Documents into HTML using an XSLT Stylesheet.
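As a starting point for the exercise, here is one possible RSS-to-HTML stylesheet, run with the JDK's XSLT processor (Java 17). The feed is a hypothetical RSS 2.0 sample (rss/channel/item with title, link, and description), and the HTML layout is only one choice among many; older RSS versions such as 0.91 or 1.0 (RDF) use different structures and would need different paths. The root template selects rss/channel explicitly so that stray whitespace in the feed is not copied by the built-in text template.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class RssToHtml {
    // Hypothetical RSS 2.0 feed.
    static final String FEED = """
        <rss version="2.0">
          <channel>
            <title>Example News</title>
            <link>http://example.com/</link>
            <item>
              <title>First story</title>
              <link>http://example.com/1</link>
              <description>One</description>
            </item>
            <item>
              <title>Second story</title>
              <link>http://example.com/2</link>
              <description>Two</description>
            </item>
          </channel>
        </rss>
        """;

    static final String XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="xml" omit-xml-declaration="yes"/>
          <xsl:template match="/">
            <xsl:apply-templates select="rss/channel"/>
          </xsl:template>
          <!-- channel title becomes a heading; items become a list -->
          <xsl:template match="channel">
            <div>
              <h2><a href="{link}"><xsl:value-of select="title"/></a></h2>
              <ul><xsl:apply-templates select="item"/></ul>
            </div>
          </xsl:template>
          <xsl:template match="item">
            <li><a href="{link}"><xsl:value-of select="title"/></a>: <xsl:value-of select="description"/></li>
          </xsl:template>
        </xsl:stylesheet>
        """;

    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, FEED));
    }
}
```

The {link} syntax is an attribute value template: the XPath inside the braces is evaluated and its string value becomes the href.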