XML Craig Stewart Dr. Alexandra I. Cristea (http://www.dcs.warwick.ac.uk/~acristea/)
XML history Inception: circa 1996. The eXtensible Markup Language (XML) v1.0 became a W3C Recommendation on 10 February 1998. Currently v1.0 is in its fifth edition. v1.1 was published in 2004 and addresses: end-of-line issues; character sets later than Unicode v2.0; other non-Unicode special characters. This is the first time you hear the term W3C Recommendation. A W3C Recommendation is the same as a standard. In other words, a W3C Recommendation is the final stage of a ratification process of the World Wide Web Consortium (W3C) working group concerning the standard. It is the equivalent of a published standard in many other industries. 1 Standard Maturation 1.1 Working Draft (WD) 1.2 Candidate Recommendation (CR) 1.3 Proposed Recommendation (PR) 1.4 W3C Recommendation (REC) 1.5 Later Revisions See also: http://en.wikipedia.org/wiki/W3C_recommendation In the late 1990s a group of people including Jon Bosak, Tim Bray, James Clark and others came up with XML, the eXtensible Markup Language. Like SGML, XML is not itself a markup language, but a specification for defining markup languages. The W3C (World Wide Web Consortium) immediately set about reshaping HTML as an XML application, with the result being XHTML. That is only one small part of what XML is all about. The key point is that using XML the industry can specify how to store almost any kind of data, in a form that applications running on any platform can easily import and process. XML appeared because HTML, despite its popularity, showed some weaknesses that ran against its initial design. The original thinking in HTML was to separate content from presentation. For example, the <em> tag in a web page means “emphasise”. It was left up to the user agent how to render that, say as bold text, or in a different colour, or with a different tone of voice in a speech reader. This type of thing does not please page designers, who want to nail down the exact appearance of a page.
Therefore HTML got extended with things like <font> tags which went right against the initial concept. Read more at: http://www.itwriting.com/xmlintro.php Read also notes from one of its early developers, Jon Bosak, at: http://java.sun.com/xml/birth_of_xml.html
What is XML? – wrong! XML stands for EXtensible Markup Language XML is a markup language much like HTML XML was designed to describe data XML is more of a standard and supporting structure than a standalone programming language – wrong! XML is not a markup language, it is a meta-language, that helps you describe other languages. Read more at: http://www.intranetjournal.com/articles/200312/ij_12_08_03a.html (Article: XML Basics and Benefits by P.G. Daly 12/8/2003 )
How does XML work? XML tags are not predefined. You must define your own tags XML uses a Document Type Definition (DTD) or an XML Schema to describe the data Also Relax NG (ISO DSDL) XML with a DTD or XML Schema is designed to be self-descriptive See an example of XML code at: http://prolearn.dcs.warwick.ac.uk/caf/typetest.xml Can you tell from just looking at the example what the purpose of this XML file would be? Can you tell what the corresponding DTD is (after you’ve learned about DTD, later in this course)? Answer: http://wwwis.win.tue.nl/~acristea/MOT/help/CAF.dtd (to view open in a text editor – or view source)
Main Difference XML, HTML XML was designed to carry data. XML is not a replacement for HTML. XML and HTML were designed with different goals: XML was designed to describe data and to focus on what data is. HTML was designed to display data and to focus on how data looks. HTML is about displaying information, while XML is about describing information. Syntax: XML is well formed, just like XHTML To briefly recap, XML is a meta language for describing mark-up languages. It provides a facility to define tags and the structural relationship between them. In order to see the differences between HTML and XHTML, take for instance the page: http://prolearn.dcs.warwick.ac.uk/ This is ‘acceptable’ HTML, but not XHTML. In order to visualize what is wrong with it, try the XHTML validator at: http://validator.w3.org/ As an exercise, perform changes to the document till it produces valid XHTML. Take other sites of your choice and parse them in a similar fashion. You will find that there is not a lot of valid XHTML out there on the web, still, browsers cope as best as they can.
XML does not DO anything XML was created to structure, store and to send information <note> <to>John</to> <from>Jane</from> <heading>Reminder</heading> <body>Don't forget the book!</body> </note>
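The note above only carries information; something else has to read it. As a minimal sketch, assuming Python and its standard-library xml.etree.ElementTree parser (neither is part of the original slides), an application could pull the fields out like this. The function name read_note is hypothetical.

```python
import xml.etree.ElementTree as ET

# The note from the slide, stored as plain text.
NOTE_XML = """<note>
  <to>John</to>
  <from>Jane</from>
  <heading>Reminder</heading>
  <body>Don't forget the book!</body>
</note>"""


def read_note(xml_text):
    """Return the note's fields as a plain dict (tag name -> text)."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


note = read_note(NOTE_XML)
print(f"{note['heading']}: from {note['from']} to {note['to']}: {note['body']}")
```

The XML itself does nothing; all behaviour (here, printing) lives in the program that reads it.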
XML is Free and Extensible XML tags are not predefined. You must "invent" your own tags. The tags used to mark up HTML documents and the structure of HTML documents are predefined. The author of HTML documents can only use tags that are defined in the HTML standard (like <p>, <h1>, etc.). XHTML is an application of XML but not vice-versa. The tags in the example above (like <to> and <from>) are not defined in any XML standard. These tags are "invented" by the author of the XML document. XML allows authors to define their own tags and their own document structure. As you know, most browsers allow you to write quite messy HTML. You could start a new item with <li> and forget to close the tag (with </li>). If you start another item with <li>, your browser will deduce that the previous one has ended, and start a new bullet point for you. Thus, browsers close the tags that you forgot to close yourself. They do some reasoning based on the most common use of tags to do so. If we write ‘clean’ HTML, i.e., we add a closing tag for each opening tag (and follow the other XML rules), then we produce XHTML. XHTML is actually an application of XML. It is XML where all the tags have been given a certain ‘meaning’, the same as in HTML.
Benefits: the extensibility and structured nature of XML allow it to be used for communication between different systems; from one source of XML-based information you can format and distribute content via a multitude of different channels; XSL files act as templates, allowing a single stylesheet to be used to format multiple pages, or the same content for multiple distribution channels. First, the extensibility and structured nature of XML allow it to be used for communication between different systems, which otherwise would be unable to communicate. While this sounds simple, the magnitude and impact of this benefit are tremendous. Consider this. With the use of XML, you can now communicate not only between internal computing systems but also external systems (vendors, customers, partners, etc.) using a common technology regardless of the platforms and technologies used for each independent system. Put more simply, it is like having a single omniscient translator that can work between and among various nations and cultures seamlessly. Besides the obvious benefit of information sharing and system interoperability, knowledge transfer between your different computing teams becomes easier as well. Since XML has a clearly defined set of standards, people on Team A can easily understand and work with information from Team B. From an internal resource standpoint, this enables easier staff rotation (and coverage) with a shortened learning curve. From an external relationship standpoint (vendors, consultants, partners), knowledge transfer time is shortened and the actual understanding of the systems and information is enhanced. Second, from one source of XML-based information you can format and distribute content via a multitude of different channels with minimal effort. Through the use of the eXtensible Stylesheet Language (XSL), developers can easily separate content from formatting instructions.
In this way, XSL files act as templates, allowing a single stylesheet to be used to format multiple pages of information. Even more powerful is the ability to use several of these templates to define formatting of the same content for multiple distribution channels. Many times with both intranet and Internet applications your audience requires data through a variety of channels such as Web, e-mail, text, handheld, wireless devices, and print. With the use of XML and the XSL technologies (XSLT, XSL-FO, etc.) you can use a separate stylesheet to distribute the same content to multiple channels. Thus, retrieve the content and data once, deliver many times and in many formats with ease. (source: http://www.intranetjournal.com/articles/200312/ij_12_08_03a.html)
XML is a Complement to HTML It is important to understand that XML is not a replacement for HTML. In future Web development it is most likely that XML will be used to describe the data, while HTML will be used to format and display the same data. XML is a cross-platform, software and hardware independent tool for transmitting information.
XML in Future Web Development XML is going to be everywhere. the XML standard has been developed quickly and a large number of software vendors have adopted it. XML might be the most common tool for all data manipulation and data transmission. See also the XML conference of the year (as well as their sponsorship!!): http://2007.xmlconference.org/public/content/home (replace 2007 by your current year)
XML Can be Used to Create New Languages XML is the mother of WAP and WML. The Wireless Markup Language (WML), used to mark up Internet applications for handheld devices like mobile phones, is written in XML. And many others … WAP is an open international standard for applications that use wireless communication. Its principal application is to enable access to the Internet from a mobile phone or PDA. A WAP browser provides all of the basic services of a computer based web browser but simplified to operate within the restrictions of a mobile phone. The Japanese i-mode system is another major competing wireless data protocol. WAP sites are websites written in, or dynamically converted to, WML (Wireless Markup Language) and accessed via the WAP browser. (source: http://en.wikipedia.org/wiki/Wireless_Application_Protocol)
Viewing XML to view XML documents hierarchically or view their output, you need an XML parser and processor. there are a number of these tools available: See examples at: http://www.stylusstudio.com/xml_download.html http://www.w3schools.com/xml/xml_parser.asp Please note, however: XML was not designed to display data.
The basic XML flow
XML Rules Every start-tag must have a matching end-tag. Tags cannot overlap. Proper nesting is required. XML documents can only have one root element. Element names must obey the following XML naming conventions: Names must start with letters or the "_" character. Names cannot start with numbers or punctuation characters. After the first character, digits, hyphens and periods are also allowed; other punctuation characters are not. Read also: http://www.w3schools.com/xml/xml_syntax.asp
XML Rules (cont.) XML is case sensitive. Names cannot contain spaces. Names should not contain the ":" character as it is a "reserved" character (it is used for namespaces). Names should not start with the letters "xml" in any combination of case, as such names are reserved. The element name must come directly after the "<" without any spaces between them. XML preserves white space within text. Elements may contain attributes. If an attribute is present, it must have a value, even if it is an empty string "".
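Several of the rules above can be tried out directly. The following is a minimal sketch, assuming Python's standard-library xml.etree.ElementTree (not part of the original slides); the helper name is_well_formed and the sample documents are made up for illustration. It only tests well-formedness, and it does not cover every rule on these slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, one per rule.
samples = {
    "matching tags": "<a><b>text</b></a>",
    "missing end-tag": "<a><b>text</a>",
    "overlapping tags": "<a><b>text</a></b>",
    "two root elements": "<a/><b/>",
    "case mismatch": "<Note>text</note>",
    "name starts with a digit": "<1a>text</1a>",
    "space after '<'": "< a>text</a>",
}

results = {label: is_well_formed(doc) for label, doc in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well formed' if ok else 'NOT well formed'}")
```

Only the first sample should be accepted; each of the others breaks exactly one rule.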
Spot the error! <?xml version="1.0" encoding="ISO-8859-1"?> <note date=12/11/2002> <to>Tove</to> <from>Jani</from> </note> The error in the first document is that the date attribute in the note element is not quoted. This is correct: date="12/11/2002". This is incorrect: date=12/11/2002.
Spot the error! <?xml version="1.0" encoding="ISO-8859-1"?> <note date="12/11/2002"> <to>Tove</to> <from>Jani</from> </note> The error in the first document is that the date attribute in the note element is not quoted. This is correct: date="12/11/2002". This is incorrect: date=12/11/2002.
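The two documents above can be fed to a parser to see the difference. This is a sketch under the assumption that Python's xml.etree.ElementTree is used; the function name read_date is hypothetical. The documents are passed as bytes so that the parser honours the declared ISO-8859-1 encoding.

```python
import xml.etree.ElementTree as ET

# First document: attribute value without quotes.
UNQUOTED = (b'<?xml version="1.0" encoding="ISO-8859-1"?>'
            b'<note date=12/11/2002><to>Tove</to><from>Jani</from></note>')

# Second document: attribute value in quotes.
QUOTED = (b'<?xml version="1.0" encoding="ISO-8859-1"?>'
          b'<note date="12/11/2002"><to>Tove</to><from>Jani</from></note>')


def read_date(xml_bytes):
    """Return the note's date attribute, or None if the document is rejected."""
    try:
        return ET.fromstring(xml_bytes).get("date")
    except ET.ParseError as err:
        print("rejected:", err)
        return None


unquoted_date = read_date(UNQUOTED)
quoted_date = read_date(QUOTED)
print(unquoted_date, quoted_date)
```

Unlike a forgiving HTML browser, the XML parser refuses the whole first document rather than guessing what was meant.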
With XML, CR / LF is converted to LF Windows: CR + LF Unix: LF Classic Macintosh: CR Do you know what a typewriter is? Well, a typewriter is a mechanical device which was used last century to produce printed documents. :-) After you have typed one line of text on a typewriter, you have to manually return the printing carriage to the left margin position and manually feed the paper up one line. In Windows applications, a new line is normally stored as a pair of characters: carriage return (CR) and line feed (LF). The character pair bears some resemblance to the typewriter actions of setting a new line. In Unix applications, a new line is normally stored as a LF character. Classic Macintosh applications (before Mac OS X) use only a CR character to store a new line. When an XML parser reads a document, it passes every line break on to the application as a single LF, whichever convention the file used.
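A quick way to observe the line-end conversion is to parse the same text with the three conventions. This is a minimal sketch assuming Python's xml.etree.ElementTree, which is not mentioned in the slides; the element name t and the sample text are made up.

```python
import xml.etree.ElementTree as ET

# The same two-line text stored with three different line-end conventions.
samples = {
    "Windows (CR + LF)": "<t>one\r\ntwo</t>",
    "Unix (LF)": "<t>one\ntwo</t>",
    "Classic Macintosh (CR)": "<t>one\rtwo</t>",
}

# After parsing, the application sees the element text with normalised line ends.
texts = {label: ET.fromstring(doc).text for label, doc in samples.items()}
for label, text in texts.items():
    print(f"{label}: {text!r}")
```

All three variants should arrive in the application with a single LF between the two words.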
There is Nothing Special About XML plain text with XML tags Software that can handle plain text can also handle XML. In an XML-aware application, the XML tags can be handled specially: Visibility, Functional meaning, etc. There is nothing special about XML. It is just plain text with the addition of some XML tags enclosed in angle brackets. Software that can handle plain text can also handle XML. In a simple text editor, the XML tags will be visible and will not be handled specially. In an XML-aware application however, the XML tags can be handled specially. The tags may or may not be visible, or have a functional meaning, depending on the nature of the application.
Is this an error? <note> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note> Suppose an application was written for notes that contain only <to>, <from> and <body>, and the author of the document later adds a <heading> element inside <note>. Should the application break or crash? No. The application should still be able to find the <to>, <from>, and <body> elements in the XML document and produce the same output. XML documents are Extensible.
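The extensibility claim can be illustrated with a small reader that only knows about <to>, <from> and <body>. This is a sketch assuming Python's xml.etree.ElementTree and assuming the extra <heading> element sits inside <note>; the function name render is hypothetical.

```python
import xml.etree.ElementTree as ET

ORIGINAL = """<note>
  <to>Tove</to>
  <from>Jani</from>
  <body>Don't forget me this weekend!</body>
</note>"""

# Same note, extended later with an element the application has never seen.
EXTENDED = """<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""


def render(xml_text):
    """Build a message from the three elements this application knows about."""
    root = ET.fromstring(xml_text)
    return (f"To: {root.findtext('to')} | From: {root.findtext('from')} | "
            f"{root.findtext('body')}")


print(render(ORIGINAL))
print(render(EXTENDED))
```

Because the reader looks elements up by name instead of by position, the added element is simply ignored.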
XML Elements have Relationships Elements are related as parents and children. Root element / Parents Children / Siblings
Elements An element consists of all the information from the beginning of a start-tag to the end of an end-tag including everything in between. E.g. from (X)HTML, all of the following would be the equivalent of one element, named h1: <h1>This is a heading.</h1> Where, <h1> is the start tag, </h1> is the end tag, and the content is in between. Each XML document has a root element within which all other elements are nested.
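The element and relationship vocabulary can be made concrete with a parser. This is a minimal sketch assuming Python's xml.etree.ElementTree; the book document and all its element names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# One element: start tag, content, end tag.
heading = ET.fromstring("<h1>This is a heading.</h1>")
print(heading.tag, "->", heading.text)

# A hypothetical document with a root element, children and siblings.
DOC = """<book>
  <title>My First XML</title>
  <chapter>
    <para>Hello</para>
    <para>World</para>
  </chapter>
</book>"""

root = ET.fromstring(DOC)
children = [child.tag for child in root]  # direct children of the root

# ElementTree elements do not store their parent, so build a lookup table.
parent_of = {child: parent for parent in root.iter() for child in parent}

first_para = root.find("chapter/para")
siblings = [e.text for e in parent_of[first_para] if e is not first_para]

print("root:", root.tag)
print("children of root:", children)
print("parent of first para:", parent_of[first_para].tag)
print("siblings of first para:", siblings)
```

Here <book> is the root and the parent of <title> and <chapter>, which are siblings of each other; the two <para> elements are children of <chapter>.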
Examples See at: http://www.intranetjournal.com/articles/200402/ij_02_10_04a.html http://prolearn.dcs.warwick.ac.uk/caf/gipfBegIntAdv.xml Search more by yourself and familiarize yourself with the syntax!
XML Attributes XML elements can have attributes. From HTML you will remember this: <IMG SRC="computer.gif"> The SRC attribute provides additional information about the IMG element.
Attributes versus Elements <person sex="female"> <firstname>Anna</firstname> <lastname>Smith</lastname> </person> <person> <sex>female</sex> <firstname>Anna</firstname> <lastname>Smith</lastname> </person> Both are correct. There are no rules on the choice. In XML, using elements is often preferred.
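From the reading side the two styles differ only in how the value is fetched. The sketch below assumes Python's xml.etree.ElementTree; the function name sex_of is hypothetical.

```python
import xml.etree.ElementTree as ET

AS_ATTRIBUTE = """<person sex="female">
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>"""

AS_ELEMENT = """<person>
  <sex>female</sex>
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>"""


def sex_of(xml_text):
    """Read the value from the attribute if present, else from a child element."""
    person = ET.fromstring(xml_text)
    return person.get("sex") or person.findtext("sex")


print(sex_of(AS_ATTRIBUTE), sex_of(AS_ELEMENT))
```

One practical reason elements are often preferred is that an element can later grow children or be repeated, while an attribute can only hold a single text value.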
Comments Comments work the same as in other languages: they are lines whose sole purpose is to give the developer, and anyone reading the code in the future, information about the code. <!-- all the comments go in here -->
XML declaration Every XML document should begin with a declaration (not mandatory, but good practice): <?xml version="1.0"?> Or, using optional attributes: <?xml version="1.0" encoding="UTF-16" standalone="yes"?> The encoding attribute specifies to the XML parser what character encoding the text is in so that it can read the document and translate it into Unicode (the "all integers language" machines understand). The standalone attribute specifies whether the XML document depends on other, external files. Most of the time, it will be sufficient to accept the defaults and not include these two attributes.
Document Type Definition (DTD) which tags and attributes are allowed, where they can be placed, whether or not they can be nested within a given document and what additional entity definitions are required The document type definition is used for the expression of a schema via a set of declarations that conform to a particular mark-up syntax that describe a class of XML documents, in terms of constraints on the structure of those documents. DTD is native to the XML (and SGML) specifications, and since its release other specification languages such as XML Schema have been released, with additional functionality.
Document Type Declaration (DOCTYPE) <!DOCTYPE MovieCatalog SYSTEM "movie_catalog.dtd"> Here MovieCatalog is the root element, and "movie_catalog.dtd" is the URL of the DTD (the external subset, given via a system identifier). The link to the DTD is usually the second line of an XML document. The declarations in a DTD are actually divided into an internal and an external subset. The internal subset is embedded in the XML document itself, inside the DOCTYPE declaration. E.g.: <!DOCTYPE note [ <!ELEMENT note (to,from,heading,body)> <!ELEMENT to (#PCDATA)> <!ELEMENT from (#PCDATA)> <!ELEMENT heading (#PCDATA)> <!ELEMENT body (#PCDATA)> ]> The external subset is a separate file, referenced either via a public identifier or a system identifier (as in the example above). A public identifier would be: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" > (this short form comes from SGML/HTML; in XML a public identifier must be followed by a system identifier). Public and system identifier can be combined as such: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> In the first example MovieCatalog is the name of the root document element (we'll discuss this under elements). It is required and links the DTD to the entire element tree. The keyword SYSTEM and the URL that follows allow the document to locate the corresponding DTD file on the same or an external filesystem.
Internal vs External DTD declaration Internal: <!DOCTYPE foo [ <!ENTITY greeting "hello"> ]> This is an example of a Document Type Declaration that encapsulates an internal subset consisting of a single entity declaration. External, public: <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"> All HTML 4.01 documents are expected to conform to one of three SGML DTDs. The public identifiers of these DTDs are constant and are as follows: -//W3C//DTD HTML 4.01//EN (http://www.w3.org/TR/html4/strict.dtd) -//W3C//DTD HTML 4.01 Transitional//EN (http://www.w3.org/TR/html4/loose.dtd) -//W3C//DTD HTML 4.01 Frameset//EN (http://www.w3.org/TR/html4/frameset.dtd)
Valid XML Documents A "Valid" XML document is a "Well Formed" XML document, which also conforms to the rules of a Document Type Definition (DTD): <?xml version="1.0" encoding="ISO-8859-1"?> <!DOCTYPE note SYSTEM "InternalNote.dtd"> <note> <to>Tom</to> <from>Jane</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>
Validator Syntax-check any XML file. Also at: http://www.validome.org/xml/validate/ http://www.w3.org/2001/03/webdata/xsv If you want to check your document against its DTD as well, you have to ‘publish’ your DTD. This means saving your DTD on a server with a public URI, and with public access rights. Say you save it at http://yourserver.uk/yourdirectory/file.dtd (be careful, it should be readable by all: chmod a+r file.dtd). Then you can refer to it from your XML file as: <!DOCTYPE yourrootelement SYSTEM "http://yourserver.uk/yourdirectory/file.dtd"> Try, e.g., the XML file below, and validate it at http://www.w3schools.com/xml/xml_validator.asp: <?xml version="1.0" standalone="no"?> <!DOCTYPE CAF SYSTEM "http://wwwis.win.tue.nl/~acristea/MOT/help/CAF.dtd"> <CAF> <domainmodel> <concept> <name>leveltest</name> <name>Uitrollen</name> <attribute> <name>elaboration</name> <contents> <p>De vier "levels" zijn: "text" (altijd zichtbaar), "explanation" (vanaf het tweede bezoek), "elaboration" (vanaf het derde bezoek), "keywords" (vanaf het vierde).</p> <p>Van het concept dat u nu bekijkt is dit de "elaboration".</p></contents> </attribute> <name>explanation</name> <p>Dit werkt als volgt: Elk concept heeft meerdere attributen die elk pas getoond worden als het aantal bezoeken aan het concept een drempelwaarde overschrijdt. Vanaf het moment dat een attribuut zichtbaar wordt blijft het zichtbaar.</p> <p>De attributen zelf hebben enkel "placeholder" teksten omdat enkel het principe gedemonstreerd wordt.</p></contents> <name>text</name> <p>Dit is een "leveltest". Elk concept heeft teksten die pas zichtbaar worden nadat het concept minimaal een aantal keren bezocht is.
Het idee hierachter is dat als een gebruiker een concept meerdere keren bekijkt hij dit waarschijnlijk interessant vindt (en er dus meer over wil weten) of ingewikkeld (en dus extra uitleg nodig heeft).</p> <p>Een vaak bezocht concept zal dus meer en meer detail krijgen.</p> </contents> <name>keywords</name> <contents>root, levels, uitleg</contents> <name>title</name> <contents>Uitrollen</contents> <name>Hoofdstuk 3</name> <contents>Hoofdstuk 3 (2+ bezoek tekst)</contents> <contents>Hoofdstuk 3 basistekst</contents> <contents>Hoofdstuk 3</contents> <contents>Hoofdstuk 3 (3+ bezoek tekst)</contents> <contents>Hoofdstuk 3 keywords</contents> </concept> <name>Hoofdstuk 4</name> <contents>Hoofdstuk 4 (2+ bezoektekst)</contents> <contents>Hoofdstuk 4 (3+ bezoektekst)</contents> <contents>Hoofdstuk 4 keywords</contents> <contents>Hoofdstuk 4 basistekst</contents> <contents>Hoofdstuk 4</contents> <name>Hoofdstuk 2</name> <contents>Hoofdstuk 2 (3+ bezoek tekst)</contents> <contents>Hoofdstuk 2 (2+ bezoek tekst)</contents> <contents>Hoofdstuk 2 basistekst</contents> <contents>Hoofdstuk 2 keywords</contents> <contents>Hoofdstuk 2</contents> <name>Hoofdstuk 1</name> <contents>En een steen is geen ontdekker.</contents> <contents>Een klok is geen wekker.</contents> <contents>Een kip is geen mus. 
Een stop is geen stekker.</contents> <contents>Hoofdstuk 1 keywords</contents> <contents>Hoofdstuk 1</contents> </domainmodel> <goalmodel> <lesson weight="0" label=""> <link weight="0" label="">leveltest\Uitrollen\title</link> <link weight="4" label="showafter">leveltest\Uitrollen\keywords</link> <link weight="0" label="">leveltest\Uitrollen\text</link> <link weight="2" label="showafter">leveltest\Uitrollen\explanation</link> <link weight="3" label="showafter">leveltest\Uitrollen\elaboration</link> <link weight="0" label="">leveltest\Uitrollen\Hoofdstuk 1\title</link> <link weight="4" label="showafter">leveltest\Uitrollen\Hoofdstuk 1\keywords</link> <link weight="0" label="">leveltest\Uitrollen\Hoofdstuk 1\text</link> <link weight="2" label="showafter">leveltest\Uitrollen\Hoofdstuk 1\explanation</link> <link weight="3" label="showafter">leveltest\Uitrollen\Hoofdstuk 1\elaboration</link> </lesson> <link weight="0" label="">leveltest\Uitrollen\Hoofdstuk 2\title</link> <link weight="4" label="showafter">leveltest\Uitrollen\Hoofdstuk 2\keywords</link> <link weight="0" label="">leveltest\Uitrollen\Hoofdstuk 2\text</link> <link weight="2" label="showafter">leveltest\Uitrollen\Hoofdstuk 2\explanation</link> <link weight="3" label="showafter">leveltest\Uitrollen\Hoofdstuk 2\elaboration</link> <link weight="0" label="">leveltest\Uitrollen\Hoofdstuk 3\title</link> <link weight="4" label="showafter">leveltest\Uitrollen\Hoofdstuk 3\keywords</link> <link weight="0" label="">leveltest\Uitrollen\Hoofdstuk 3\text</link> <link weight="2" label="showafter">leveltest\Uitrollen\Hoofdstuk 3\explanation</link> <link weight="3" label="showafter">leveltest\Uitrollen\Hoofdstuk 3\elaboration</link> <link weight="0" label="">leveltest\Uitrollen\Hoofdstuk 4\title</link> <link weight="4" label="showafter">leveltest\Uitrollen\Hoofdstuk 4\keywords</link> <link weight="0" label="">leveltest\Uitrollen\Hoofdstuk 4\text</link> <link weight="2" label="showafter">leveltest\Uitrollen\Hoofdstuk 
4\explanation</link> <link weight="3" label="showafter">leveltest\Uitrollen\Hoofdstuk 4\elaboration</link> </goalmodel> </CAF>
Internal DTD <?xml version="1.0"?> <!DOCTYPE note [ <!ELEMENT note (to,from,heading,body)> <!ELEMENT to (#PCDATA)> <!ELEMENT from (#PCDATA)> <!ELEMENT heading (#PCDATA)> <!ELEMENT body (#PCDATA)> <!ENTITY cs "Craig Stewart"> ]> <note> <to>Tove</to> <from>&cs;</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>
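To see the internal subset at work, the document above can be parsed and the entity reference &cs; inspected. This is a sketch assuming Python's xml.etree.ElementTree. Note that this parser expands entities declared in the internal subset but does not validate the document against the <!ELEMENT> declarations; a validating parser would be needed for that.

```python
import xml.etree.ElementTree as ET

# The slide's document, written with straight quotes around the entity value.
DOC = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ENTITY cs "Craig Stewart">
]>
<note>
  <to>Tove</to>
  <from>&cs;</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

note = ET.fromstring(DOC)
sender = note.findtext("from")  # the entity reference has been replaced
print("from:", sender)
```

The application never sees "&cs;"; it receives the replacement text declared in the DTD.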
External DTD <?xml version="1.0"?> <!ELEMENT note (to,from,heading,body)> <!ELEMENT to (#PCDATA)> <!ELEMENT from (#PCDATA)> <!ELEMENT heading (#PCDATA)> <!ELEMENT body (#PCDATA)> <!ENTITY cs "Craig Stewart"> The file above is saved as "note.dtd". To use it, refer to it from the XML document with a DOCTYPE declaration and a system identifier, as shown earlier: <?xml version="1.0"?> <!DOCTYPE note SYSTEM "note.dtd"> <note> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>
Character Entities What are they? How would you write an XML element called ‘summary’ for the following data: The result is <17% of the original <summary> The result is <17% of the original </summary> ??
Character Entities Character entities are a way to solve this problem and get around the limitations of computer character sets (old ones) and keyboards. &lt; for <, &gt; for >, &apos; for ', &quot; for " and &amp; for & are all standard XML entities and can be used without fear of compatibility issues.
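The <summary> problem from the previous slide can be tried out in a few lines. This is a minimal sketch assuming Python's standard library (xml.sax.saxutils.escape and xml.etree.ElementTree); the variable names are made up. The five predefined entities are written out as &lt; &gt; &apos; &quot; &amp; in the code.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "The result is <17% of the original"

# Written literally, the "<" looks like the start of a tag and parsing fails.
try:
    ET.fromstring(f"<summary>{raw}</summary>")
    literal_ok = True
except ET.ParseError:
    literal_ok = False

# escape() replaces &, < and > by the corresponding predefined entities.
escaped = escape(raw)
parsed = ET.fromstring(f"<summary>{escaped}</summary>").text

# All five predefined entities are turned back into characters by the parser.
five = ET.fromstring("<t>&lt;&gt;&apos;&quot;&amp;</t>").text

print(literal_ok, escaped, parsed, five)
```

The escaped form is what gets stored in the file; the application reading it gets the original text back.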
Numeric Character Reference “A numeric character reference (NCR) is a common markup construct used in SGML and other SGML-based markup languages such as HTML and XML. It consists of a short sequence of characters that, in turn, represent a single character from the Universal Character Set (UCS) of Unicode” (Wikipedia). E.g.: &#931; &#0931; &#x3A3; &#x03A3; all represent "Σ"
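The equivalence of decimal and hexadecimal references can be checked with a parser. This is a sketch assuming Python's xml.etree.ElementTree; the wrapper element c is made up. Σ is Unicode code point 931 in decimal, which is 3A3 in hexadecimal.

```python
import xml.etree.ElementTree as ET

# Decimal and hexadecimal spellings of the same code point.
references = ["&#931;", "&#0931;", "&#x3A3;", "&#x03A3;", "&#x3a3;"]

decoded = [ET.fromstring(f"<c>{ref}</c>").text for ref in references]
code_points = {ord(ch) for ch in decoded}

print(decoded, code_points)
```

Leading zeros and the case of the hexadecimal digits do not matter; every spelling resolves to the same single character.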
Defined Entities in DTDs Three types: Internal: <!ENTITY cs "Craig Stewart"> External: <!ENTITY mypicture SYSTEM "pic01.gif" NDATA GIF> Parameter: for parameterizing the DTD; these start with a % not a &, and are entirely different to other entities. See also: http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
XML Schema (XSD) XML Schema is an XML based alternative to DTD. W3C supports an alternative to DTD called XML Schema: http://www.w3.org/XML/Schema XML Schemas are more powerful than DTDs. The XML Schema language is also referred to as XML Schema Definition (XSD). The prediction is that very soon XML Schemas will be used in most Web applications as a replacement for DTDs. Here are some reasons: XML Schemas are extensible to future additions XML Schemas are richer and more powerful than DTDs XML Schemas are written in XML XML Schemas support data types XML Schemas support namespaces However, for this module we shall not go into more details about either XML DTDs or XML Schemas.
Displaying your XML Files with CSS? It is possible to use CSS to format an XML document. Example: XML file: The CD catalog style sheet: The CSS file product: The CD catalog formatted with the CSS file Below is a fraction of the XML file. The second line, <?xml-stylesheet type="text/css" href="cd_catalog.css"?>, links the XML file to the CSS file It is possible to use CSS to format an XML document. Cascading Style sheets were initially designed for HTML. The name ‘cascading’ refers to the fact that multiple style sheets can be used to refer to one document. Below is an example of how to use a CSS (cascading) style sheet to format an XML document: Take a look at this XML file: The CD catalog Then look at this style sheet: The CSS file Finally, view: The CD catalog formatted with the CSS file Below is a fraction of the XML file. The second line, <?xml-stylesheet type="text/css" href="cd_catalog.css"?>, links the XML file to the CSS file (please note: A simplified syntax is allowed for stylesheets that consist of only a single template for the root node. The stylesheet may consist of just a literal result element. You will learn later how to write templates for such documents. In order to understand why here you didn’t need the keyword ‘template’, read 2.3 at http://www.w3.org/TR/xslt). OPTIONAL EXERCISE: As an exercise, change the CSS style sheet to format the same XML document with different colours and indentations. As you can see, we have separated viewing information from content in a clear way by representing data with XML instead of (X)HTML. (You can find a color scheme list at: http://www.w3schools.com/css/css_colornames.asp) IMPORTANT EXERCISE: As you can see, the XML file is given without a supporting DTD. This can conflict if we want to keep a strict grammar for future input. As an exercise, write a DTD that supports the syntax given. 
Validate the syntax of your DTD in a similar way as shown on slide 46 (comments) at http://www.w3schools.com/xml/xml_validator.asp
Displaying XML with XSL XSL is the preferred style sheet language of XML. XSL (the eXtensible Stylesheet Language) is far more sophisticated than CSS. Examples: view the XML file, the XSL style sheet, and the result. <?xml version="1.0" encoding="ISO-8859-1"?> <?xml-stylesheet type="text/xsl" href="simple.xsl"?> One way to use XSL is to transform XML into HTML before it is displayed by the browser, as demonstrated in the above examples.
XML Conclusions We have learned: XML history What it is How it works Differences to (X)HTML XML flow XML Rules XML Elements, Relationships, Attributes, Comments Well-formed-ness concept XML supporting frame: XML Schema or DTD Generics on displaying XML
Next we are looking into more specific information about how to display XML and more, with …
XSL
XSL XSL is an XML-based language used for stylesheets that can be used to transform XML documents into other document types and formats. XSL is a family of recommendations for defining XML document transformation and presentation. It consists of three parts: the transformation language, abbreviated XSLT; the formatting language, called XSL Formatting Objects (XSL-FO); and the XPath language. See also: http://www.w3.org/Style/XSL/WhatIsXSL.html It is called a ‘stylesheet’ language because it can be used for display purposes for XML; however, this is not its only use.
XSL parts: XSL Transformations (XSLT): a language for transforming XML. XML Path Language (XPath): an expression language used by XSLT to access or refer to parts of an XML document. (XPath is also used by the XML Linking specification.) XSL Formatting Objects (XSL-FO): an XML vocabulary for specifying formatting semantics. See for more information: http://www.w3.org/Style/XSL/
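To get a feel for what "an expression language to refer to parts of an XML document" means, here is a minimal sketch assuming Python's xml.etree.ElementTree, which supports only a limited subset of XPath. The catalogue content (titles and years) is invented for illustration; only the MovieCatalog, movie and title names are borrowed from other slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue used only for this illustration.
CATALOG = """<MovieCatalog>
  <movie year="1999"><title>First Film</title></movie>
  <movie year="2001"><title>Second Film</title></movie>
  <movie year="1999"><title>Third Film</title></movie>
</MovieCatalog>"""

root = ET.fromstring(CATALOG)

# Path expression: every <title> that is a child of a <movie> child of the root.
all_titles = [t.text for t in root.findall("./movie/title")]

# Path expression with a predicate on an attribute value.
titles_1999 = [t.text for t in root.findall("./movie[@year='1999']/title")]

print(all_titles)
print(titles_1999)
```

XSLT uses the same kind of expressions in its match and select attributes to decide which parts of the source document a template applies to.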
Conclusion XSL We have learned: What is XSL What are its parts Not all parts are equally important Next: We look at the most important one …
XSLT
XSLT XSLT became a W3C Recommendation on 16 November 1999. It is the most important part of XSL. It transforms an input document (source tree) into an output document (result tree) in a specified way. It is built on a structure known as an XSL template: <xsl:template> e.g., <xsl:template match="/movie/title"> <xsl:value-of select="."/></xsl:template> This matches the title element(s) of the movie and outputs their text. Multiple templates: first the root template; if that doesn’t match, the next, etc. Matching relies on XPath. See also: http://www.w3.org/TR/xslt XSLT = XSL Transformations (eXtensible Stylesheet Language Transformations) XSLT is the most important part of XSL. XSLT is used to transform an XML document into another XML document, or another type of document that is recognized by a browser, like HTML and XHTML. Normally XSLT does this by transforming each XML element into an (X)HTML element. With XSLT you can add/remove elements and attributes to or from the output file. You can also rearrange and sort elements, perform tests and make decisions about which elements to hide and display, and a lot more. A common way to describe the transformation process is to say that XSLT transforms an XML source-tree into an XML result-tree. Usually, an XSLT file contains multiple templates. These are matched against the source tree from the root down (this sets the order of processing).
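Python's standard library has no XSLT processor, so the following is only an imitation, in plain Python, of what the template <xsl:template match="/movie/title"> with <xsl:value-of select="."/> computes: for each matched title element, its string value (all the text inside it). The sample document and the function name value_of are assumptions made for this sketch; a real transformation would be run by an XSLT processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical source tree with <movie> as the root element.
SOURCE = "<movie><title>The <em>Third</em> Film</title><year>1999</year></movie>"


def value_of(element):
    """Imitate xsl:value-of select=".": concatenate all text inside the element."""
    return "".join(element.itertext())


root = ET.fromstring(SOURCE)

# Imitate match="/movie/title": the root must be <movie>, then take its <title> children.
matched = root.findall("title") if root.tag == "movie" else []
result = [value_of(title) for title in matched]

print(result)
```

The nested <em> markup disappears in the output because the string value of an element is just its text content, which is what value-of writes to the result tree.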
XSLT Browsers
Nearly all major browsers support XML and XSLT.
Mozilla Firefox: as of version 1.0.2, Firefox supports XML and XSLT (and CSS).
Mozilla: includes Expat for XML parsing and can display XML + CSS. It also has some support for Namespaces and is available with an XSLT implementation.
Netscape: as of version 8, Netscape uses the Mozilla engine and therefore has the same XML / XSLT support as Mozilla.
Opera: as of version 9, Opera supports XML and XSLT (and CSS). Version 8 supports only XML + CSS.
Internet Explorer: as of version 6, Internet Explorer supports XML, Namespaces, CSS, XSLT, and XPath. Version 5 is NOT compatible with the official W3C XSL Recommendation.
XSLT Elements in common use
<xsl:stylesheet> This element is used as the root element of nearly all XSLT stylesheets. Its format is <xsl:stylesheet version="version number" xmlns:xsl="W3C namespace URI">. The version number is the version of the XSLT specification from the W3C that the stylesheet is written for (1.0 in the examples here). The xmlns:xsl attribute binds the xsl: prefix to the XML Namespace defined by the W3C for the XSLT transformation language. That namespace is http://www.w3.org/1999/XSL/Transform.
Example Correct Style Sheet Declaration
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
or
    <xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
Note: <xsl:stylesheet> and <xsl:transform> are completely synonymous and either can be used, as long as the closing tag matches the opening one. ‘stylesheet’ is the form found in most existing stylesheets; ‘transform’ refers to XSLT specifically, so it can be considered the more elegant of the two.
Example use of template
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/">
        <html>
          <body>
            <h2>My CD Collection</h2>
            <table border="1">
              <tr bgcolor="#9acd32">
                <th>Title</th>
                <th>Artist</th>
              </tr>
              <tr>
                <td>.</td>
                <td>.</td>
              </tr>
            </table>
          </body>
        </html>
      </xsl:template>
    </xsl:transform>
XML: http://www.w3schools.com/xsl/cdcatalog.xml
Result: http://www.w3schools.com/xsl/cdcatalog_with_ex1.xml
XSLT Elements in common use (2)
<xsl:apply-templates> This element is used within an XSL template to call other templates: it applies the matching templates to the children of the current node, or to the nodes chosen with its select attribute, once per node.
<xsl:value-of> As mentioned earlier, this element inserts the content of the node specified by its select attribute from the source tree into the result tree.
<xsl:output> This element specifies the output method to use, for example xml, text, or html. It gives us greater control over the actual output.
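The three elements above can be seen working together in the following minimal sketch. It assumes a source document shaped like the cdcatalog.xml file referenced earlier (a catalog root with cd children, each having a title); the heading text and the encoding and indent settings are illustrative choices, not requirements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only; assumes <catalog><cd><title>...</title>...</cd>...</catalog> -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- xsl:output: serialise the result tree as HTML -->
  <xsl:output method="html" encoding="UTF-8" indent="yes"/>

  <xsl:template match="/">
    <html>
      <body>
        <h2>My CD Collection</h2>
        <!-- xsl:apply-templates: process every cd element in turn -->
        <xsl:apply-templates select="catalog/cd"/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="cd">
    <!-- xsl:value-of: copy the text of this cd's title into the result -->
    <p><xsl:value-of select="title"/></p>
  </xsl:template>

</xsl:stylesheet>
```

Changing method="html" to method="text" would keep the titles but drop the markup, which is one way to see what xsl:output controls.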
Pull method: value-of
If we replace the dots from the previous example with <xsl:value-of> elements, we can “pull” information out of the XML file into the result file:
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/">
        <html>
          <body>
            <h2>My CD Collection</h2>
            <table border="1">
              <tr bgcolor="#9acd32">
                <th>Title</th>
                <th>Artist</th>
              </tr>
              <tr>
                <td><xsl:value-of select="catalog/cd/title"/></td>
                <td><xsl:value-of select="catalog/cd/artist"/></td>
              </tr>
            </table>
          </body>
        </html>
      </xsl:template>
    </xsl:stylesheet>
Note that in XSLT 1.0 <xsl:value-of> outputs only the first node that its select expression finds, so this table shows a single CD.
Push method: apply-templates
    <xsl:template match="/">
      <html>
        <body>
          <h2>My CD Collection</h2>
          <xsl:apply-templates/>
        </body>
      </html>
    </xsl:template>

    <xsl:template match="cd">
      <p>
        <xsl:apply-templates select="title"/>
        <xsl:apply-templates select="artist"/>
      </p>
    </xsl:template>

    <xsl:template match="title">
      Title: <span style="color:#ff0000"><xsl:value-of select="."/></span>
      <br />
    </xsl:template>

    <xsl:template match="artist">
      Artist: <span style="color:#00ff00"><xsl:value-of select="."/></span>
      <br />
    </xsl:template>
Here the root template “pushes” the source nodes out to whichever templates match them, instead of pulling individual values in.
XSLT Elements in common use (3)
<xsl:element> If we don't know ahead of time what elements we need to create in the result tree, <xsl:element> allows us to create elements dynamically during the transformation process.
<xsl:text> This element simply allows us to insert some specified text (PCDATA) into the output.
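As an illustration of both elements, the sketch below builds result elements whose names are only known at run time. It assumes the catalog/cd structure of the cdcatalog.xml file referenced earlier; the items wrapper and the cd- name prefix are hypothetical names invented for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only; assumes <catalog><cd><title/><artist/>...</cd>...</catalog> -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml"/>

  <xsl:template match="/">
    <!-- 'items' is a hypothetical wrapper element for the result -->
    <items>
      <xsl:apply-templates select="catalog/cd/*"/>
    </items>
  </xsl:template>

  <!-- Applied to every child element of a cd, whatever it is called -->
  <xsl:template match="cd/*">
    <!-- xsl:element: the name is computed from the source element's name,
         e.g. title becomes cd-title -->
    <xsl:element name="cd-{name()}">
      <xsl:value-of select="."/>
    </xsl:element>
    <!-- xsl:text: insert a literal line break after each new element -->
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The curly braces in the name attribute mark an attribute value template, which is what lets the element name depend on the node being processed.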
XSLT Elements in common use (4)
Two elements are available in XSLT for conditional processing. These elements allow you to make choices in your stylesheets: <xsl:if> and <xsl:choose>.
<xsl:if> This is the simpler of the two elements and allows you to evaluate an expression. If it is true, the contents of the <xsl:if> element are evaluated; there is no else branch. The syntax looks as follows: <xsl:if test="boolean expression">Some content or XSL elements here</xsl:if>.
<xsl:choose> This element provides more flexibility than <xsl:if>. With it, we can make one of any number of choices and even have a default choice.
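A minimal sketch of <xsl:if> in use, assuming the catalog/cd structure of the cdcatalog.xml file referenced earlier, where each cd has a title and a numeric price. The threshold of 10 and the heading text are illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only; assumes each cd has title and price children -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <xsl:template match="/">
    <html>
      <body>
        <h2>CDs priced above 10</h2>
        <ul>
          <xsl:apply-templates select="catalog/cd"/>
        </ul>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="cd">
    <!-- The list item is produced only when the test is true;
         cheaper CDs produce no output at all -->
    <xsl:if test="price > 10">
      <li><xsl:value-of select="title"/></li>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

When something should also happen in the false case, <xsl:choose> with <xsl:otherwise> is the element to reach for.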
Format choose:
    <xsl:choose>
      <xsl:when test="element name[test criteria 1]">Result number one.</xsl:when>
      <xsl:when test="element name[test criteria 2]">Result number two.</xsl:when>
      <xsl:otherwise>The default result.</xsl:otherwise>
    </xsl:choose>
Example: the code below adds a pink background colour to the "Artist" column WHEN the price of the CD is higher than 10.
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/">
        <html>
          <body>
            <h2>My CD Collection</h2>
            <table border="1">
              <tr bgcolor="#9acd32">
                <th>Title</th>
                <th>Artist</th>
              </tr>
              <xsl:for-each select="catalog/cd">
                <tr>
                  <td><xsl:value-of select="title"/></td>
                  <xsl:choose>
                    <xsl:when test="price > 10">
                      <td bgcolor="#ff00ff"><xsl:value-of select="artist"/></td>
                    </xsl:when>
                    <xsl:otherwise>
                      <td><xsl:value-of select="artist"/></td>
                    </xsl:otherwise>
                  </xsl:choose>
                </tr>
              </xsl:for-each>
            </table>
          </body>
        </html>
      </xsl:template>
    </xsl:stylesheet>
Example: choose
    <xsl:choose>
      <xsl:when test="price > 10">
        <td bgcolor="#ff00ff"><xsl:value-of select="artist"/></td>
      </xsl:when>
      <xsl:otherwise>
        <td><xsl:value-of select="artist"/></td>
      </xsl:otherwise>
    </xsl:choose>
XSLT Elements in common use (5)
<xsl:for-each> When we need to do particular processing for a number of nodes within the source tree, we need to loop. This element allows us to form a template within a template. The syntax looks as follows: <xsl:for-each select="expression">. The body of the for-each is instantiated once for every node that the expression selects.
How does our CD collection display from the pull-method example change if we wrap the ‘value-of select’ rows in a for-each, as below?
    <xsl:for-each select="catalog/cd">
      <tr>
        <td><xsl:value-of select="title"/></td>
        <td><xsl:value-of select="artist"/></td>
      </tr>
    </xsl:for-each>
The table now gets one row per cd element instead of a single row, and the select paths inside the loop are relative to the current cd. See the CD catalogue example for this element in action: http://www.w3schools.com/xsl/xsl_for_each.asp
<xsl:copy-of> If you wish to take sections of the source tree and copy them directly to the result tree, you can use this element to do so easily. Simply specify an expression that points to the node(s) required, and it will insert the node(s) directly into the result tree along with any attributes and child elements.
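The <xsl:copy-of> description can be sketched as follows, again assuming the catalog/cd structure of the cdcatalog.xml file referenced earlier. The expensive wrapper element and the price threshold are hypothetical choices made for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only; assumes <catalog><cd><price>...</price>...</cd>...</catalog> -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml"/>

  <xsl:template match="/">
    <!-- 'expensive' is a hypothetical wrapper element for the result -->
    <expensive>
      <!-- Copy each selected cd element unchanged, including its
           attributes and all of its child elements -->
      <xsl:copy-of select="catalog/cd[price > 10]"/>
    </expensive>
  </xsl:template>

</xsl:stylesheet>
```

Unlike <xsl:value-of>, which would output only text, <xsl:copy-of> keeps the markup of the selected nodes, so the result is a smaller catalogue with the same cd structure.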
Example: for-each
    <xsl:for-each select="catalog/cd">
      <tr>
        <td><xsl:value-of select="title"/></td>
        <td><xsl:value-of select="artist"/></td>
      </tr>
    </xsl:for-each>
Why an XML Editor?
Today XML is an important technology, and development projects use XML-based technologies like:
XML Schema to define XML structures and data types
XSLT to transform XML data
SOAP to exchange XML data between applications
WSDL to describe web services
RDF to describe web resources
XPath and XQuery to access XML data
SMIL to describe multimedia presentations
To be able to write error-free XML documents, you will need an intelligent XML editor! One example is Altova's XMLSpy (30-day free trial): http://www.altova.com/products/xmlspy/xsl_xslt_editor.html
Conclusions XSLT
We have learned: the definition of XSLT; the correct style sheet declaration; the common XSLT elements, with some examples; browsers and their support for XSLT and more; XML editors for editing XSLT and more.
Next: We look at how to access elements and attributes inside the XML (for XSLT and more). This can be done via … XPath