Cocoon: An XML Web Publishing Framework From the Apache Project
Roland Schweitzer
Today's Topics
- Definitions
- Motivation
- Required Tools (Java, Apache Tomcat and Cocoon)
- Basic Cocoon Operation
- Matchers, Generators, Transforms and Serializers. Oh My!
- sitemap.xmap glues it all together.
6 August 2002 OAR Web Shop
Cocoon
An XML-based Web publishing framework implemented as a Java servlet. Web site content stored in XML files (or in an RDBMS, an LDAP server or another source) is transformed, mostly via XSLT, into new XML (for example, to exclude certain information) and then serialized into human-usable output such as an HTML or PDF file.
Reusable Content
Motivation for Using Cocoon
- We distribute climate data.
- Users (including scientists) find data via public search engines like Google.
- Public search engines index HTML content.
- NOAA and other scientific organizations use special-purpose search engines that rely on FGDC metadata (or DIF, derived from FGDC).
Motivation (continued)
Together, these facts mean maintaining a separate "document" for each purpose. XML and Cocoon offer yet another potential way out of the morass of many special-purpose document collections.
Suppose the information were stored as XML:
<page>
  <title>Reynolds Sea Surface Temperature</title>
  <prefix>data.sst</prefix>
  <abstract>
    <para>The optimum interpolation (OI) SST analysis…</para>
  </abstract>
  <contact>
    <name>CDC Data Management Personnel</name>
    <address1>325 Broadway</address1>
    <phone>(303) 497-6244</phone>
    <email>cdcdata@cdc.noaa.gov</email>
  </contact>
  …
</page>
The Power of XML
XML content:
- Can be parsed with standard XML tools
- Can easily be used for purposes other than the Web
- Can be written with powerful XML GUI tools (e.g., XML Spy)
- Might be easier to maintain
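To make "parsed with standard XML tools" and "used for another purpose" concrete, here is a minimal sketch using Python's standard ElementTree module on a trimmed copy of the page record (the abstract is left out because the slide truncates it). The catalog_line format is a hypothetical non-Web use, not anything CDC actually produces.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example <page> record (abstract omitted).
PAGE = """<page>
  <title>Reynolds Sea Surface Temperature</title>
  <prefix>data.sst</prefix>
  <contact>
    <name>CDC Data Management Personnel</name>
    <phone>(303) 497-6244</phone>
    <email>cdcdata@cdc.noaa.gov</email>
  </contact>
</page>"""


def summarize(page_xml):
    """Pull a few fields out of a <page> record with standard XML tooling."""
    page = ET.fromstring(page_xml)
    return {
        "title": page.findtext("title"),
        "prefix": page.findtext("prefix"),
        "email": page.findtext("contact/email"),
    }


def catalog_line(page_xml):
    """Hypothetical non-Web reuse: one pipe-separated line for a plain-text catalog."""
    s = summarize(page_xml)
    return f"{s['prefix']} | {s['title']} | {s['email']}"
```

The same record could just as easily feed a metadata export; the point is that nothing Web-specific is baked into the content.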
Schematic of the Solution Using Cocoon
[Figure: schematic with boxes labeled "Cocoon" and "Some other process"]
Required Tools
On Solaris 7 and 8, I have used the binary distributions of:
- Java 1.4.0 (java.sun.com)
- Tomcat 4.0.4 (www.apache.org)
- Cocoon 2.0.3 (xml.apache.org)
At the time of writing, these are the latest releases. Follow the installation instructions for each package.
Basic Operation
Cocoon is based on pipelines:
[Figure: a pipeline in which each "bit of software" turns an XML file into a new XML file, ending with info to the client (e.g., HTML to a browser)]
Basic Operation
Cocoon is based on pipelines. An XML document is pushed through a pipeline that consists of one Generator (which reads a file, builds a document from an LDAP server, etc.), zero or more Transforms (for example, to leave out sensitive information for external users), and ends with a Serializer that converts the XML into binary or character data for consumption by the client (e.g., a Web browser). An entire site could use just one pipeline.
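The Generator, Transform, Serializer chain can be modeled in a few lines of Python. This is a toy model, not Cocoon's Java API: events are (kind, value, attrs) tuples standing in for SAX callbacks, and the generator records them in a list rather than streaming them as Cocoon does. The drop_element transformer mimics "leave out sensitive information for external users."

```python
import xml.sax
from xml.sax.saxutils import escape, quoteattr


class _Recorder(xml.sax.ContentHandler):
    """Collects SAX callbacks as (kind, value, attrs) tuples."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name, None))

    def characters(self, content):
        self.events.append(("text", content, None))


def generate(xml_text):
    """Generator stage: parse XML text into events."""
    recorder = _Recorder()
    xml.sax.parseString(xml_text.encode("utf-8"), recorder)
    return recorder.events


def drop_element(events, tag):
    """Transformer stage: omit every <tag> subtree (e.g. internal-only details)."""
    depth = 0  # > 0 while inside a <tag> subtree
    for event in events:
        kind, value, _ = event
        if kind == "start" and value == tag:
            depth += 1
        elif kind == "end" and value == tag and depth:
            depth -= 1
            continue
        if depth == 0:
            yield event


def serialize(events):
    """Serializer stage: turn events back into character data."""
    out = []
    for kind, value, attrs in events:
        if kind == "start":
            attr_text = "".join(f" {k}={quoteattr(v)}" for k, v in attrs.items())
            out.append(f"<{value}{attr_text}>")
        elif kind == "end":
            out.append(f"</{value}>")
        else:
            out.append(escape(value))
    return "".join(out)


def run_pipeline(xml_text, transformers):
    """One generator, zero or more transformers, one serializer."""
    events = generate(xml_text)
    for transform in transformers:
        events = transform(events)
    return serialize(events)
```

For example, run_pipeline(doc, [lambda ev: drop_element(ev, "contact")]) returns the page without its contact block, while run_pipeline(doc, []) simply round-trips it.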
Basic Operation
If you need more than one pipeline, Matchers (wildcard and regular expression) and Selectors (Boolean expressions) control which pipeline processes the XML content.
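A Selector can be pictured as a list of Boolean tests, each paired with a choice, where the first test that is true wins. The sketch below assumes that first-match-wins behavior; the request dictionary, the "remote-addr" key, the is_internal test and the stylesheet names are all hypothetical illustrations, not Cocoon's Selector interface.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical, simplified request: header/parameter name -> value.
Request = Dict[str, str]


def select(request: Request,
           cases: List[Tuple[Callable[[Request], bool], str]],
           otherwise: str) -> str:
    """Return the choice paired with the first test that is true for the request."""
    for test, choice in cases:
        if test(request):
            return choice
    return otherwise


def is_internal(request: Request) -> bool:
    # Hypothetical rule: treat requests from a 10.x address as internal users.
    return request.get("remote-addr", "").startswith("10.")
```

With this, select(request, [(is_internal, "internal.xsl")], "external.xsl") picks a stylesheet that keeps sensitive details only for internal users.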
Components
Matchers, Generators, Transforms and Serializers are all Cocoon Components. Pipelines are built out of Components. Components are declared, and pipelines are constructed, in the sitemap.xmap file. The "bit of software" needed for each Component is either provided by Cocoon or built by you.
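The sitemap declarations in this talk name each Component and mark one per kind as the default (the default= attribute on the container element), so a pipeline step with no type, such as a bare <map:serialize/>, uses the default. A toy registry capturing that lookup rule might look like this; the class and method names are hypothetical and are not Cocoon's component manager.

```python
from typing import Callable, Dict, Optional


class ComponentRegistry:
    """Toy registry: components declared by (kind, name), one default per kind."""

    def __init__(self) -> None:
        self._components: Dict[str, Dict[str, Callable]] = {}
        self._defaults: Dict[str, str] = {}

    def declare(self, kind: str, name: str, component: Callable) -> None:
        self._components.setdefault(kind, {})[name] = component

    def set_default(self, kind: str, name: str) -> None:
        # Mirrors e.g. <map:serializers default="html">.
        self._defaults[kind] = name

    def lookup(self, kind: str, name: Optional[str] = None) -> Callable:
        # No name given (like <map:serialize/>): fall back to the kind's default.
        chosen = name if name is not None else self._defaults[kind]
        return self._components[kind][chosen]
```

A step written with an explicit type, like type="fo2pdf", corresponds to lookup("serializer", "fo2pdf").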
Components (Matchers)
Suppose you want Cocoon to handle URIs that fit these wildcard patterns:
- http://www.cdc.noaa.gov/cocoon/data/*.html
- http://www.cdc.noaa.gov/cocoon/data/*.pdf
They could be routed to two pipelines with two different output types.
Components (Matchers)
We need a "bit of software" that looks at http://www.cdc.noaa.gov/cocoon/data/data.sst.html and:
- Matches the URL prefix www.cdc.noaa.gov/cocoon/data and the extension ".html"
- Extracts the wildcard part of the URL: data.sst
- Starts the pipeline that produces HTML output from the data.sst.xml file (the wildcard part plus the .xml extension)
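The matching steps above can be sketched with Python regular expressions. This assumes Cocoon-style wildcard semantics ("*" stays within one path segment, "**" may also cross "/") and assumes the matcher sees the path after the Cocoon context, e.g. "data/data.sst.html". It is an illustration, not Cocoon's WildcardURIMatcher code.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern into a regex with one group per wildcard."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append("(.*)")       # "**" may span "/" separators
            i += 2
        elif pattern[i] == "*":
            parts.append("([^/]*)")    # "*" stays within one path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def match(pattern, uri):
    """Return the wildcard parts as a list, or None when the URI does not match."""
    m = wildcard_to_regex(pattern).fullmatch(uri)
    return list(m.groups()) if m else None


def substitute(template, groups):
    """Replace {1}, {2}, ... in a sitemap-style src with the matched parts."""
    return re.sub(r"\{(\d+)\}", lambda m: groups[int(m.group(1)) - 1], template)
```

For the slide's example, match("data/*.html", "data/data.sst.html") gives ["data.sst"], and substitute("data/{1}.xml", ["data.sst"]) gives "data/data.sst.xml".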
The Wildcard Matcher
We're in luck! Cocoon already includes a Matcher Component that does what we want. To use a Component, we must declare it in the sitemap.xmap file that controls our Cocoon installation.
Declare the Wildcard Matcher
In the sitemap.xmap configuration file:
<map:matchers default="wildcard">
  <map:matcher name="wildcard"
      src="org.apache.cocoon.matching.WildcardURIMatcher"/>
  …
</map:matchers>
Use the Matcher on a URI
With the Matcher Component declared, we use it in our pipeline to capture the * part of the pattern and use that to name the source XML file that will be sent through the pipeline.
Use the Matcher in a Pipeline
This pipeline uses the default Matcher, which is the Wildcard Matcher declared on the previous slide. {1} refers to the text matched by the first *:
<map:match pattern="data/*.html">
  <map:generate src="data/{1}.xml"/>
  …
</map:match>
Now What?
We have declared and used a Matcher to choose the pipeline for the first of our two example URIs. Next we need to declare and use a Generator, which is always the first step of a pipeline.
Components (Generators)
Declare a Generator in sitemap.xmap:
<map:generators default="file">
  <map:generator name="file"
      src="org.apache.cocoon.generation.FileGenerator"/>
  …
</map:generators>
Use the Generator in a Pipeline
The File Generator was declared as the default. Its only job is to read a file from the file system.
<map:pipelines>
  <map:pipeline>
    <map:match pattern="data/*.html">
      <map:generate src="data/{1}.xml"/>
      …
    </map:match>
  </map:pipeline>
</map:pipelines>
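As a rough sketch of the File Generator's job, assuming (as in this talk's examples) that src is resolved relative to the directory that holds sitemap.xmap, the hypothetical file_generator below reads and parses the source file with Python's ElementTree. Cocoon's real FileGenerator emits SAX events into the pipeline rather than returning a tree.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def file_generator(sitemap_dir, src):
    """Read and parse the XML file named by src, relative to the sitemap's directory."""
    path = Path(sitemap_dir) / src
    return ET.parse(path).getroot()
```

Combined with the matcher, a request for data/data.sst.html would lead to file_generator(sitemap_dir, "data/data.sst.xml").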
Review: Matcher and Generator
For http://www.cdc.noaa.gov/cocoon/data/data.sst.html, the Wildcard Matcher matches the pattern data/*.html and extracts data.sst; the File Generator then reads data/data.sst.xml to start the pipeline that produces HTML output.
Review: Pipeline Components
- Conditional use of the pipeline via the Matcher
- One Generator (FileGenerator)
- Zero or more Transforms (?)
- Ends with a Serializer (?)
Components (Transforms)
Declare a Transform:
<map:transformers default="xslt">
  <map:transformer name="xslt"
      src="org.apache.cocoon.transformation.TraxTransformer">
    <use-request-parameters>false</use-request-parameters>
    <use-browser-capabilities-db></use-browser-capabilities-db>
  </map:transformer>
  …
</map:transformers>
The XSLT Transformer
Unlike the declarations we've seen so far, this one includes two additional configuration parameters:
- <use-request-parameters>: whether request parameters are made available to the stylesheet as parameters
- <use-browser-capabilities-db>: whether browser-capability information is made available to the stylesheet
Add the Transformer to the Pipeline
<map:match pattern="data/*.html">
  <map:generate src="data/{1}.xml"/>
  <map:transform src="datastyle/HTMLstyle.xsl"/>
  …
</map:match>
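The Stylesheet, written in XSLT:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <HTML>
      <HEAD>
        <TITLE><xsl:value-of select="/page/title"/></TITLE>
      </HEAD>
      <BODY>
        …
      </BODY>
    </HTML>
  </xsl:template>
  <xsl:template match="/page/abstract">
    <h2>Abstract:</h2>
    <xsl:apply-templates select="para"/>
  </xsl:template>
</xsl:stylesheet>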
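Python's standard library has no XSLT processor, so as a rough illustration of the kind of output the stylesheet above produces, here is an ElementTree stand-in. It is a sketch under stated assumptions: each <para> is rendered as a <p> element (the slide does not show a para template), and this is not how Cocoon's TraxTransformer works.

```python
import xml.etree.ElementTree as ET


def page_to_html(page_xml):
    """Rough stand-in for the HTML stylesheet: title into <TITLE>, abstract into <BODY>."""
    page = ET.fromstring(page_xml)
    html = ET.Element("HTML")
    head = ET.SubElement(html, "HEAD")
    ET.SubElement(head, "TITLE").text = page.findtext("title", default="")
    body = ET.SubElement(html, "BODY")
    abstract = page.find("abstract")
    if abstract is not None:
        ET.SubElement(body, "h2").text = "Abstract:"
        for para in abstract.findall("para"):
            # Assumption: each <para> becomes an HTML paragraph.
            ET.SubElement(body, "p").text = para.text or ""
    return ET.tostring(html, encoding="unicode")
```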
Components (Serializers)
The last step of each pipeline is a Serializer. It consumes XML (in the form of SAX events) and generates a character or binary stream for a client (Web browser, Acrobat Reader, etc.).
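Python's standard library happens to contain a close analogue of a Serializer: xml.sax.saxutils.XMLGenerator is a SAX ContentHandler that writes the events it receives to a character stream. The sketch below feeds it hand-made events; the (kind, value) event tuples are an assumption of this example, not a Cocoon format.

```python
import io
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl


def serialize_events(events):
    """Feed (kind, value) events to a SAX handler that writes XML text."""
    out = io.StringIO()
    serializer = XMLGenerator(out, encoding="utf-8")
    serializer.startDocument()  # writes the <?xml ...?> declaration
    for kind, value in events:
        if kind == "start":
            serializer.startElement(value, AttributesImpl({}))
        elif kind == "end":
            serializer.endElement(value)
        else:
            serializer.characters(value)  # escapes &, < and >
    serializer.endDocument()
    return out.getvalue()
```

Because the serializer only sees events, it does not care whether they came from a file, an LDAP server or a transformer upstream.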
Declare the Serializer
In sitemap.xmap:
<map:serializers default="html">
  <map:serializer mime-type="text/html" name="html"
      src="...HTMLSerializer">
    <buffer-size>1024</buffer-size>
  </map:serializer>
  …
</map:serializers>
The Completed Pipeline
<map:match pattern="data/*.html">
  <map:generate src="data/{1}.xml"/>
  <map:transform src="datastyle/HTMLstyle.xsl"/>
  <map:serialize/>
</map:match>
Pipeline to Make PDF Output
<map:match pattern="data/*.pdf">
  <map:generate src="data/{1}.xml"/>
  <map:transform src="stylesheets/FOstyle.xsl"/>
  <map:serialize type="fo2pdf"/>
</map:match>
http://www.cdc.noaa.gov/cocoon/data/data.sst.html
http://www.cdc.noaa.gov/cocoon/data/data.sst.pdf
The Dreaded Demo
Demo: Data Set Descriptions at CDC.
Cocoon is all this and more!
- Action Components perform complex initialization (e.g., obtaining a database connection pool) during pipeline setup.
- Resource Components are internal, reusable pipeline fragments.
- XSP and logicsheets offer capabilities similar to JSP, with further separation of the logic from the content.
Resources
- www.apache.org
- Inside XSLT, by Steven Holzner (New Riders)
- Java and XSLT, by Eric M. Burke (O'Reilly)
Reality Check!
- We have not (yet) put this system into production.
- We are still designing the XML representation.
- We are still learning how to use Cocoon with a relational database.
- We are considering using XSP pages.
Conclusions
- Cocoon offers the potential to use and reuse one piece of XML content for many purposes.
- Most operations needed to publish the XML content on the Web are built into Cocoon.
- Unlimited customization is possible by writing your own Components.
- Content is easily maintained and kept separate from presentation.