University of Jyväskylä/AHo & VLy
Experiences of Document Transformations with XSLT and DOM
Anne Honkaranta, Virpi Lyytikäinen, Pasi Tiitinen
University of Jyväskylä, Finland
inSGML project
Contents
- Poem Publishers, Inc.
- Poems
- Publishing environment
- Transformations
- Transformation techniques
- Transformations in a client-server environment
- Transformations in Poem Publishers, Inc.
- Challenges encountered
- Lessons learned
Poem Publishers, Inc.
- A fictional company
- Publishes Finnish poems on the WWW
- Poems are authored in XML according to a DTD
- The company offers poets an authoring environment if they want one
- Poems can be grouped into collections
Poem.dtd
Publishing environment
- Microsoft IIS 5.0 server
- JScript, VBScript
- ASP 3.0
- DOM II
- Internet Explorer 5.5 or newer
- CSS Level 2
- MSXML 3.0
Transformation
Changing or converting a document's:
- format
- structure / information schema
- content organization
- content, by filtering it
- all of the above
Conversion, filtering, and transformation are sometimes used as synonyms.
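As a minimal sketch of two of these kinds of change at once, the function below restructures a poem into HTML (a schema change: POEM/STANZA/LINE become body/div/p) and filters out editorial notes unless asked to keep them. It uses Python's standard `xml.etree.ElementTree` rather than the MSXML/XSLT setup of the project, and the element names TITLE, STANZA, LINE and NOTE are hypothetical, since Poem.dtd is not reproduced here.

```python
import xml.etree.ElementTree as ET

def poem_to_html(poem_xml, keep_notes=False):
    """Restructure a hypothetical POEM document into HTML, filtering NOTE elements."""
    poem = ET.fromstring(poem_xml)
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    # Schema change: the content-oriented TITLE becomes a presentation-oriented h1
    ET.SubElement(body, "h1").text = poem.findtext("TITLE")
    for stanza in poem.findall("STANZA"):
        div = ET.SubElement(body, "div", {"class": "stanza"})
        for line in stanza.findall("LINE"):
            ET.SubElement(div, "p").text = line.text
    # Filtering: editorial notes are copied to the output only on request
    if keep_notes:
        for note in poem.findall("NOTE"):
            ET.SubElement(body, "p", {"class": "note"}).text = note.text
    return ET.tostring(html, encoding="unicode")

source = """<POEM><TITLE>Example</TITLE><NOTE>Draft</NOTE>
<STANZA><LINE>first line</LINE><LINE>second line</LINE></STANZA></POEM>"""
print(poem_to_html(source))
```

The same source yields two different outputs depending on the filter flag, which is the point of keeping the source content-oriented.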
Why do you need transformations?
- Authors need a content-oriented DTD
- End-user devices differ
- Documents must be managed in a format that is optimal for processing
--> a three-step publication process: authoring -- processing -- output
Transformation techniques
- Event-based mapping technique
  - Examples of languages: SAX (Simple API for XML), OmniMark language/program
  - Pros/cons: fast and uses computing resources efficiently, but gives little control over the schema (DTD, grammar) of the output document
- Tree-based mapping technique
  - Examples of languages: DOM (Document Object Model) API, Balise language/program, XSLT language
  - Pros/cons: constructing a parse tree in memory takes resources, but gives good control over the schema of the output document; best suited for complex (context-dependent) transformations
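The contrast between the two techniques can be sketched with Python's standard library, used here only as a stand-in for SAX and DOM implementations such as MSXML (OmniMark and Balise are not shown). The event-based handler sees each element once, as the parser streams past it, and must keep its own state; the tree-based version has the whole document in memory, so context such as "the lines of the last stanza" is a simple navigation. Element names are hypothetical.

```python
import xml.sax
from xml.dom import minidom

POEM = ("<POEM><TITLE>Example</TITLE>"
        "<STANZA><LINE>one</LINE><LINE>two</LINE></STANZA>"
        "<STANZA><LINE>three</LINE></STANZA></POEM>")

class LineCollector(xml.sax.ContentHandler):
    """Event-based: react to start/end/character events while the parser streams."""
    def __init__(self):
        super().__init__()
        self.lines = []
        self._in_line = False

    def startElement(self, name, attrs):
        if name == "LINE":
            self._in_line = True
            self.lines.append("")

    def endElement(self, name):
        if name == "LINE":
            self._in_line = False

    def characters(self, content):
        # characters() may be called several times for one text node
        if self._in_line:
            self.lines[-1] += content

handler = LineCollector()
xml.sax.parseString(POEM.encode("utf-8"), handler)
print(handler.lines)  # ['one', 'two', 'three']

# Tree-based: the whole document is in memory, so context is easy to reach
doc = minidom.parseString(POEM)
last_stanza = doc.getElementsByTagName("STANZA")[-1]
last_stanza_lines = [line.firstChild.data
                     for line in last_stanza.getElementsByTagName("LINE")]
print(last_stanza_lines)  # ['three']
```

Answering the same "last stanza" question with SAX would require buffering stanzas yourself, which is why the tree-based technique suits context-dependent transformations.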
Transformations in a client-server environment (XSLT/DOM)
Alternatives:
- using a processing instruction (PI) in the XML source document (client); the PI can be written into the source document on the web server
- using the DOM interface and DOM objects to load the source XML and the XSLT stylesheet (client/server)
- using the DOM interface with a scripting language (VBScript, JScript) or Java
Transformation chain (an example)
Source XML doc. + XSLT doc. --(server/client)--> output doc. + link to CSS doc. --(client)--> output HTML/XHTML doc. rendered by CSS
Example: using a PI in the source XML
<?xml-stylesheet type="text/xsl" href="poem_html.xsl" ?>
<!DOCTYPE POEM SYSTEM "Poem1.dtd">
...
<xsl:stylesheet ...>
...
<LINK rel="stylesheet" type="text/css" href="runo_htm.css"/>
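The `xml-stylesheet` PI is just a processing instruction in the document prolog whose data looks like attributes; a browser such as Internet Explorer reads it to find the stylesheet to apply. The sketch below shows that lookup with Python's `xml.dom.minidom` as an illustration of the mechanism, not of how MSXML does it internally; the `find_stylesheets` helper and the TITLE element are hypothetical.

```python
import re
from xml.dom import minidom

# Pseudo-attributes in the PI data look like name="value" or name='value'
PSEUDO_ATTR = re.compile(r"""([\w-]+)\s*=\s*(["'])(.*?)\2""")

def find_stylesheets(xml_text):
    """Return (type, href) for every xml-stylesheet PI in the document prolog."""
    doc = minidom.parseString(xml_text)
    found = []
    for node in doc.childNodes:  # prolog PIs are children of the Document node
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            attrs = {name: value for name, _, value in PSEUDO_ATTR.findall(node.data)}
            found.append((attrs.get("type"), attrs.get("href")))
    return found

source = '''<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="poem_html.xsl" ?>
<!DOCTYPE POEM SYSTEM "Poem1.dtd">
<POEM><TITLE>Example</TITLE></POEM>'''

print(find_stylesheets(source))  # [('text/xsl', 'poem_html.xsl')]
```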
Example: using DOM objects + XSLT
Dim objDoc, objXSL, strXML
' One DOM object for the source poem, one for the stylesheet
Set objDoc = CreateObject("MSXML2.DOMDocument")
Set objXSL = CreateObject("MSXML2.DOMDocument")
' Load synchronously so both documents are complete before transforming
objDoc.async = False
objXSL.async = False
objDoc.Load "../Runot/Pinkku1.xml"
objXSL.Load "runo1_htmlksi2.xsl"
' Apply the stylesheet and write the resulting HTML into the page
strXML = objDoc.transformNode(objXSL)
Document.Write strXML
Example: using VBScript + DOM to inspect the nodes of a poem
Dim xmlDoc, child
Set xmlDoc = CreateObject("Msxml2.DOMDocument")
xmlDoc.async = False
xmlDoc.load "Runot/Pinkku1.xml"
' Walk from the document to each of its child nodes
For Each child In xmlDoc.childNodes
    document.write "type of node: " & child.nodeType & " | "
    document.write "name of node: " & child.nodeName & " | "
    document.write "content of node: " & child.text & " "
Next
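The VBScript loop visits only the document's direct children. To inspect every node of a poem, the walk has to recurse; here is a sketch of that in Python's `xml.dom.minidom` (standing in for MSXML), printing the same `nodeType` and `nodeName` properties with indentation by depth. The `walk` helper and the sample element names are hypothetical.

```python
from xml.dom import minidom

def walk(node, depth=0, out=None):
    """Recursively record nodeType and nodeName for every descendant of node."""
    if out is None:
        out = []
    for child in node.childNodes:
        out.append(f"{'  ' * depth}{child.nodeType} {child.nodeName}")
        walk(child, depth + 1, out)
    return out

doc = minidom.parseString("<POEM><TITLE>Example</TITLE><LINE>one</LINE></POEM>")
print("\n".join(walk(doc)))
```

Element nodes report type 1 and their tag name; text nodes report type 3 and the name `#text`, matching the DOM constants that MSXML also uses.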
Transformation "types" tested in Poem Publishers, Inc.
- XML-to-XML
- XML-to-HTML
- XML-to-XHTML
Transformation needs tested in Poem Publishers, Inc.
Tasks tested:
- combining multiple source documents into one output view (poem + header/footer, poem list, poem metadata)
- combining multiple source documents into one file (making a poem collection)
- combining XSLT transformation documents (poem + footer)
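Combining several poems into one collection file amounts to copying each poem's root element into a new document. A minimal sketch with Python's `xml.dom.minidom` follows; `importNode` is the standard DOM method for copying a subtree between documents. The COLLECTION root element, its `title` attribute, and the `make_collection` helper are hypothetical, since the project's collection format is not shown.

```python
from xml.dom import minidom

def make_collection(poem_sources, title):
    """Combine several POEM documents into one hypothetical COLLECTION document."""
    impl = minidom.getDOMImplementation()
    collection = impl.createDocument(None, "COLLECTION", None)
    root = collection.documentElement
    root.setAttribute("title", title)
    for text in poem_sources:
        poem_doc = minidom.parseString(text)
        # importNode(deep=True) copies the whole POEM subtree into the new document
        root.appendChild(collection.importNode(poem_doc.documentElement, True))
    return collection

poems = [
    "<POEM><TITLE>Poem A</TITLE><LINE>...</LINE></POEM>",
    "<POEM><TITLE>Poem B</TITLE><LINE>...</LINE></POEM>",
]
coll = make_collection(poems, "Sample collection")
print(coll.documentElement.toxml())
```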
Example: combining XSLT stylesheets
<xsl:stylesheet xmlns:xsl="..." version="1.0" xmlns:xlink="..." xmlns="...">
...
<!-- Filename: header.xsl -->
<xsl:stylesheet xmlns:xsl="..." version="1.0" xmlns:xlink="..." xmlns="...">
...
Challenges encountered
- problems with parsers and versions
- character encodings
- figures and links
- "too many" tools, scripting languages, and programs
Example: character encodings and the MSXML parser
INPUT DOC
- input doc encoding
- maybe character entities
MSXML 3.0
- entities are changed to actual character representations when transformed
- uses UTF-16
- detects the output encoding from the PI when the appropriate load/save methods are used
- otherwise outputs UTF-16
OUTPUT DOC
- has some encoding
- has an encoding declaration
- problem: either of them may be "wrong"
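The "either of them is wrong" problem can be caught with a simple check: read the encoding declaration, then try to decode the bytes with it. The Python sketch below is a heuristic under that assumption, not a feature of MSXML. It can only detect mismatches that produce invalid byte sequences (for example UTF-16 output, or Latin-1 bytes declared as UTF-8); a document declared as ISO-8859-1 always decodes, so such errors slip through. The helper names are hypothetical.

```python
import codecs
import re

# Encoding pseudo-attribute of the XML declaration, e.g. encoding="ISO-8859-1"
DECL = re.compile(r"""<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z0-9._-]+)["']""")

def declared_encoding(data):
    """Return the encoding named in the XML declaration, or None if absent."""
    # The declaration is ASCII, so a provisional decoding is enough to read it
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        head = data[:400].decode("utf-16", errors="ignore")
    else:
        head = data[:200].decode("latin-1")
    match = DECL.search(head)
    return match.group(1) if match else None

def encoding_consistent(data):
    """Heuristic: can the bytes be decoded with the declared (or default) encoding?"""
    declared = declared_encoding(data) or "utf-8"  # XML default without a declaration
    try:
        data.decode(declared)
    except (UnicodeDecodeError, LookupError):
        return False
    return True

ok = '<?xml version="1.0" encoding="ISO-8859-1"?><POEM>Hyvää yötä</POEM>'.encode("iso-8859-1")
wrong_decl = '<?xml version="1.0" encoding="UTF-8"?><POEM>Hyvää yötä</POEM>'.encode("iso-8859-1")
utf16_out = '<?xml version="1.0" encoding="UTF-8"?><POEM>Hyvää</POEM>'.encode("utf-16")
print(encoding_consistent(ok), encoding_consistent(wrong_decl), encoding_consistent(utf16_out))
```

The third case mimics a parser that writes UTF-16 bytes while the declaration still names UTF-8.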
Possibilities
- you can use XSLT stylesheets as components and combine them; a stylesheet can be seen as a reusable component
- on the server you can also chain transformations
- you can keep your data in content-oriented form and provide multiple output versions by using transformations
- problem: management of DTDs, transformation components and versions
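Chaining can be pictured as composing small transformation steps, each taking a tree and returning a tree, followed by one of several output steps. The Python sketch below uses `xml.etree.ElementTree` and `functools.reduce` rather than chained XSLT stylesheets on IIS; the step functions, the `n` attribute and the NOTE/STANZA/LINE element names are all hypothetical.

```python
import xml.etree.ElementTree as ET
from functools import reduce

def strip_notes(poem):
    """Filtering step: remove NOTE children of the root."""
    for note in poem.findall("NOTE"):
        poem.remove(note)
    return poem

def number_lines(poem):
    """Enrichment step: give every LINE an n attribute."""
    for i, line in enumerate(poem.iter("LINE"), start=1):
        line.set("n", str(i))
    return poem

def to_text(poem):
    """Output A: plain text, one numbered line per LINE."""
    return "\n".join(f"{line.get('n')}. {line.text}" for line in poem.iter("LINE"))

def to_html(poem):
    """Output B: a minimal HTML ordered list."""
    ol = ET.Element("ol")
    for line in poem.iter("LINE"):
        ET.SubElement(ol, "li").text = line.text
    return ET.tostring(ol, encoding="unicode")

def chain(*steps):
    """Compose tree-to-tree steps left to right into one transformation."""
    return lambda tree: reduce(lambda acc, step: step(acc), steps, tree)

source = "<POEM><NOTE>Draft</NOTE><STANZA><LINE>one</LINE><LINE>two</LINE></STANZA></POEM>"
prepare = chain(strip_notes, number_lines)
print(to_text(prepare(ET.fromstring(source))))
print(to_html(prepare(ET.fromstring(source))))
```

The steps mutate the tree, so each output parses the source afresh; the shared `prepare` chain is the reusable component, and the output steps correspond to the multiple output versions.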
Lessons learned
- Use the same character encoding in source documents and transformation scripts
- Offer a content-oriented DTD to your authors; you will probably need transformations anyway
- Support for CSS, XSLT and XML varies between browsers
- Tools are available for building XML publishing environments, but allow extra time for dealing with problems
- A publishing environment needs multiple skills and tools; XML is not enough!
More information: inSGML project