14 XML and RSS.

Knowing trees, I understand the meaning of patience. Knowing grass, I can appreciate persistence. Hal Borland

Like everything metaphysical, the harmony between thought and reality is to be found in the grammar of the language. Ludwig Wittgenstein

I played with an idea, and grew willful; tossed it into the air; transformed it; let it escape and recaptured it; made it iridescent with fancy, and winged it with paradox. Oscar Wilde

OBJECTIVES
In this chapter you will learn:
To mark up data using XML.
How XML namespaces help provide unique XML element and attribute names.
To create DTDs and schemas for specifying and validating the structure of an XML document.
To create and use simple XSL style sheets to render XML document data.
To retrieve and manipulate XML data programmatically using JavaScript.
RSS and how to programmatically apply an XSL transformation to an RSS document using JavaScript.

14.1 Introduction
14.2 XML Basics
14.3 Structuring Data
14.4 XML Namespaces
14.5 Document Type Definitions (DTDs)
14.6 W3C XML Schema Documents
14.7 XML Vocabularies
14.7.1 MathML
14.7.2 Other Markup Languages
14.8 Extensible Stylesheet Language and XSL Transformations
14.9 Document Object Model (DOM)
14.10 RSS
14.11 Wrap-Up
14.12 Web Resources

14.1 Introduction
XML is a portable, widely supported, open (i.e., nonproprietary) technology for data storage and exchange.

14.2 XML Basics
XML documents are readable by both humans and machines.
XML permits document authors to create custom markup for any type of information.
Authors can create entirely new markup languages that describe specific types of data, including mathematical formulas, chemical molecular structures, music and recipes.
An XML parser is responsible for identifying components of XML documents (typically files with the .xml extension) and then storing those components in a data structure for manipulation.
An XML document can optionally reference a Document Type Definition (DTD) or schema that defines the XML document’s structure.
An XML document that conforms to a DTD/schema (i.e., has the appropriate structure) is valid.
If an XML parser (validating or nonvalidating) can process an XML document successfully, that XML document is well-formed.

Outline player.xml
Start tags and end tags enclose data or other elements.
The root element contains all other elements in the document.

Software Engineering Observation 14.1
DTDs and schemas are essential for business-to-business (B2B) transactions and mission-critical systems. Validating XML documents ensures that disparate systems can manipulate data structured in standardized ways and prevents errors caused by missing or malformed data.

14.3 Structuring Data
An XML document begins with an optional XML declaration, which identifies the document as an XML document.
The version attribute specifies the version of XML syntax used in the document.
XML comments begin with <!-- and end with -->.
An XML document contains text that represents its content (i.e., data) and elements that specify its structure.
XML documents delimit an element with start and end tags.
The root element of an XML document encompasses all its other elements.
XML element names can be of any length and can contain letters, digits, underscores, hyphens and periods.
Element names must begin with either a letter or an underscore, and they should not begin with “xml” in any combination of uppercase and lowercase letters, as this is reserved for use in the XML standards.
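
The naming rules on this slide can be sketched as a small JavaScript check. This is a simplified, ASCII-only approximation of the rules as stated here (the XML specification also permits many non-ASCII name characters), and the function name isValidElementName is an assumption made for illustration.

```javascript
// Simplified, ASCII-only check of the element-name rules listed above.
// The full XML grammar allows many more (non-ASCII) name characters.
function isValidElementName(name) {
  // Must start with a letter or underscore, followed by letters,
  // digits, underscores, hyphens or periods.
  const shape = /^[A-Za-z_][A-Za-z0-9_.-]*$/;
  // Names beginning with "xml" in any letter case are reserved, so this
  // sketch treats them as unacceptable.
  const reserved = /^xml/i;
  return shape.test(name) && !reserved.test(name);
}

console.log(isValidElementName("player"));      // true
console.log(isValidElementName("_first-name")); // true
console.log(isValidElementName("2ndBase"));     // false: starts with a digit
console.log(isValidElementName("XmlData"));     // false: reserved prefix
console.log(isValidElementName("first name"));  // false: contains white space
```

Rejecting the reserved prefix outright is a deliberately strict choice; a parser may be more lenient.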

14.3 Structuring Data (Cont.)
When a user loads an XML document in a browser, a parser parses the document, and the browser uses a style sheet to format the data for display.
IE and Firefox each display minus (–) or plus (+) signs next to all container elements.
A minus sign indicates that all child elements are being displayed.
When clicked, a minus sign becomes a plus sign (which collapses the container element and hides all the children), and vice versa.
Data can be placed between tags or in attributes (name/value pairs that appear within the angle brackets of start tags).
Elements can have any number of attributes.

Outline article.xml
The author element is a containing element because it has child elements.
The name elements are nested within the author element.

Portability Tip 14.1 Documents should include the XML declaration to identify the version of XML used. A document that lacks an XML declaration might be assumed to conform to the latest version of XML—when it does not, errors could result.

Common Programming Error 14.1
Placing any characters, including white space, before the XML declaration is an error.

In an XML document, each start tag must have a matching end tag; omitting either tag is an error. Soon, you will learn how such errors are detected.

XML is case sensitive. Using different cases for the start tag and end tag names for the same element is a syntax error.

Using a white-space character in an XML element name is an error.

Good Programming Practice 14.1
XML element names should be meaningful to humans and should not use abbreviations.

Nesting XML tags improperly is a syntax error. For example, <x><y>hello</x></y> is an error, because the </y> tag must precede the </x> tag.
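
The tag-matching errors described above (a missing end tag, mismatched case, improper nesting) can be detected with a stack of open element names. The following is a minimal sketch, not a real parser: it ignores comments, CDATA sections and attribute values containing >, and the function name findNestingError is hypothetical.

```javascript
// Minimal sketch: checks only that start and end tags are properly nested.
// Returns null when nesting is correct, otherwise a short error message.
function findNestingError(xml) {
  const tagPattern = /<(\/?)([A-Za-z_][\w.-]*)[^>]*?(\/?)>/g;
  const open = []; // stack of currently open element names
  let match;
  while ((match = tagPattern.exec(xml)) !== null) {
    const [, closing, name, selfClosing] = match;
    if (selfClosing) continue; // empty element such as <flag/>
    if (!closing) {
      open.push(name);
      continue;
    }
    const expected = open.pop();
    if (expected !== name) { // comparison is case sensitive, like XML
      return expected === undefined
        ? `unexpected end tag </${name}>`
        : `expected </${expected}> but found </${name}>`;
    }
  }
  return open.length > 0
    ? `missing end tag for <${open[open.length - 1]}>`
    : null;
}

console.log(findNestingError("<x><y>hello</y></x>")); // null
console.log(findNestingError("<x><y>hello</x></y>")); // expected </y> but found </x>
console.log(findNestingError("<Name>Al</name>"));     // expected </Name> but found </name>
```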

Fig. 14.3 | article.xml displayed by Internet Explorer 7 and Firefox 2. (Part 1 of 3.)

Outline letter.xml (1 of 2)
The DOCTYPE specifies an external DTD in the file letter.dtd.
Data can be stored as attributes, which appear in an element’s start tag.
flag is an empty element because it contains no child elements or content.

Outline letter.xml (2 of 2)

Error-Prevention Tip 14.1 An XML document is not required to reference a DTD, but validating XML parsers can use a DTD to ensure that the document has the proper structure.

Portability Tip 14.2 Validating an XML document helps guarantee that independent developers will exchange data in a standardized form that conforms to the DTD.

Fig. 14.5 | Validating an XML document with Microsoft’s XML Validator.

Fig. 14.6 | Validation result using Microsoft’s XML Validator.

Failure to enclose attribute values in double ("") or single ('') quotes is a syntax error.

14.4 Namespaces
XML namespaces provide a means for document authors to prevent naming collisions.
Each namespace prefix is bound to a uniform resource identifier (URI) that uniquely identifies the namespace.
A URI is a series of characters that differentiates names.
Document authors create their own namespace prefixes.
Any name can be used as a namespace prefix, but the namespace prefix xml is reserved for use in XML standards.
To eliminate the need to place a namespace prefix in each element, authors can specify a default namespace for an element and its children.
We declare a default namespace using keyword xmlns with a URI (Uniform Resource Identifier) as its value.
Document authors commonly use URLs (Uniform Resource Locators) for URIs, because domain names (e.g., deitel.com) in URLs must be unique.

Attempting to create a namespace prefix named xml in any mixture of uppercase and lowercase letters is a syntax error—the xml namespace prefix is reserved for internal use by XML itself.

Outline namespace.xml
Two namespaces are specified using URNs.
The namespace prefixes are used in element names throughout the document.

Outline defaultnamespace.xml
The default namespace is set in the directory element.
Elements with no namespace prefix use the default namespace.
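
How a prefix or a default namespace determines an element's namespace can be sketched in a few lines of JavaScript. This is only a conceptual illustration of what a namespace-aware parser does; the function name resolveName and the example URNs are assumptions, not taken from the listings.

```javascript
// Sketch: resolve a qualified element name against prefix bindings.
// bindings is a Map from namespace prefix to URI.
function resolveName(qualifiedName, bindings, defaultNamespace = null) {
  const parts = qualifiedName.split(":");
  if (parts.length === 1) {
    // No prefix: the element belongs to the default namespace, if any.
    return { namespace: defaultNamespace, localName: parts[0] };
  }
  const [prefix, localName] = parts;
  if (!bindings.has(prefix)) {
    throw new Error(`Unbound namespace prefix: ${prefix}`);
  }
  return { namespace: bindings.get(prefix), localName };
}

// Hypothetical URNs, chosen only for this example.
const bindings = new Map([
  ["text", "urn:example:textInfo"],
  ["image", "urn:example:imageInfo"],
]);

// Same local name, different namespaces: no naming collision.
console.log(resolveName("text:file", bindings));
console.log(resolveName("image:file", bindings));
// An unprefixed element falls back to the default namespace.
console.log(resolveName("file", bindings, "urn:example:textInfo"));
```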

14.5 Document Type Definitions (DTDs)
DTDs and schemas specify documents’ element types and attributes, and their relationships to one another.
DTDs and schemas enable an XML parser to verify whether an XML document is valid (i.e., its elements contain the proper attributes and appear in the proper sequence).
A DTD expresses the set of rules for document structure using an EBNF (Extended Backus-Naur Form) grammar.
In a DTD, an ELEMENT element type declaration defines the rules for an element.
An ATTLIST attribute-list declaration defines attributes for a particular element.

XML documents can have many different structures, and for this reason an application cannot be certain whether a particular document it receives is complete, ordered properly, and not missing data. DTDs and schemas (Section 14.6) solve this problem by providing an extensible way to describe XML document structure. Applications should use DTDs or schemas to confirm whether XML documents are valid.

Many organizations and individuals are creating DTDs and schemas for a broad range of applications. These collections—called repositories—are available free for download from the web (e.g.,

Outline letter.dtd
Define the requirements for the letter element.
Define the requirements for the contact element.
A contact element may have a type attribute, but it is not required.
Each of these elements contains parsed character data.
The flag element must be empty and its gender attribute must be set to either M or F. If there is no gender attribute, gender defaults to M.

For documents validated with DTDs, any document that uses elements, attributes or nesting relationships not explicitly defined by a DTD is an invalid document.

DTD syntax cannot describe an element’s or attribute’s data type. For example, a DTD cannot specify that a particular element or attribute can contain only integer data.

Using markup characters (e.g., <, > and &) in parsed character data is an error. Use character entity references (e.g., &lt;, &gt; and &amp;) instead.
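
Replacing markup characters with their entity references (&lt;, &gt; and &amp;) before placing text in a document can be sketched as follows. The function name escapeXml is an assumption for illustration; it handles only the three characters named above and not quotes inside attribute values.

```javascript
// Replace the three markup characters with their entity references.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

console.log(escapeXml("if (a < b && b > c)"));
// if (a &lt; b &amp;&amp; b &gt; c)
```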

Fig. 14.10 | XML Validator displaying an error message.

14.6 W3C XML Schema Documents
Unlike DTDs, schemas use XML syntax, not EBNF grammar.
XML Schema documents can specify what type of data (e.g., numeric, text) an element can contain.
An XML document that conforms to a schema document is schema valid.
Two categories of types exist in XML Schema: simple types and complex types.
Simple types cannot contain attributes or child elements; complex types can.
Every simple type defines a restriction on an XML Schema-defined schema type or on a user-defined type.
Complex types can have either simple content or complex content.
Both can contain attributes, but only complex content can contain child elements.
Whereas complex types with simple content must extend or restrict some other existing type, complex types with complex content do not have this limitation.

Outline book.xml

Outline book.xsd
Specify the namespace of the elements that this schema defines.
Define the books element.
Define the requirements for any element of type BooksType.
An element of type BooksType must contain one or more book elements, which have type SingleBookType.
A SingleBookType element has a title element, which contains a string.

Portability Tip 14.3 W3C XML Schema authors specify a particular URI when referring to the XML Schema namespace. This namespace contains predefined elements that comprise the XML Schema vocabulary. Specifying this URI ensures that validation tools correctly identify XML Schema elements and do not confuse them with those defined by document authors.

Fig. 14.13 | Some XML Schema types. (Part 1 of 3.)

Outline computer.xsd (1 of 2)
Define a simpleType that contains a decimal whose value is 2.1 or greater.
Define a complexType with simpleContent so that it can contain only attributes, not child elements.
The CPU element’s data must be of type string, and it must have an attribute model containing a string.

Outline computer.xsd (2 of 2)
The types defined in the last slide are used by these elements.
The all element specifies a list of elements that must be included, in any order, in the document.

Outline laptop.xml

14.7 XML Vocabularies
Some XML vocabularies:
MathML (Mathematical Markup Language)
Scalable Vector Graphics (SVG)
Wireless Markup Language (WML)
Extensible Business Reporting Language (XBRL)
Extensible User Interface Language (XUL)
Product Data Markup Language (PDML)
W3C XML Schema
Extensible Stylesheet Language (XSL)
MathML markup describes mathematical expressions for display.
MathML is divided into two types of markup: content markup and presentation markup.
Content MathML allows programmers to write mathematical notation specific to different areas of mathematics.
Presentation MathML is directed toward formatting and displaying mathematical notation.
By convention, MathML files end with the .mml filename extension.

14.7 XML Vocabularies (Cont.)
The MathML document root node is the math element.
The default namespace is the MathML namespace.
The mn element marks up a number.
The mo element marks up an operator.
Entity reference &InvisibleTimes; indicates a multiplication operation without explicit symbolic representation.
The msup element represents a superscript. It has two children: the expression to be superscripted (i.e., the base) and the superscript (i.e., the exponent).
Correspondingly, the msub element represents a subscript.
To display variables, use identifier element mi.

14.7 XML Vocabularies (Cont.)
The mfrac element displays a fraction.
If either the numerator or the denominator contains more than one element, it must appear in an mrow element.
The mrow element groups elements that are positioned horizontally in an expression.
Entity reference &int; represents the integral symbol.
The msubsup element specifies the subscript and superscript of a symbol. It requires three child elements: an operator, the subscript expression and the superscript expression.
The msqrt element represents a square-root expression.
Entity reference &delta; represents a lowercase delta symbol.

Outline mathml1.mml The math element contains number and operator elements that represent the equation = 5

Outline mathml2.html
&InvisibleTimes; represents an implied multiplication operator.
The msup element contains two elements: a base and a superscript.
The mfrac element contains two elements: a numerator and a denominator.

Outline mathml3.html (1 of 2)
&int; represents the integral symbol.
mrow groups elements horizontally in an expression.
msqrt puts its contents underneath a square root symbol.

Outline mathml3.html (2 of 2)
&delta; represents the delta symbol.

Fig. 14.19 | Various markup languages derived from XML. (Part 1 of 2.)

Fig. 14.19 | Various markup languages derived from XML. (Part 2 of 2.)

14.8 Extensible Stylesheet Language and XSL Transformations
XSL transformations (XSLT) can convert XML into any text-based document.
XSL documents have the extension .xsl.
XPath is a string-based language of expressions used by XML and many of its related technologies for effectively and efficiently locating structures and data (such as specific elements and attributes) in XML documents.
XPath is used to locate parts of the source-tree document that match templates defined in an XSL style sheet.
When a match occurs (i.e., a node matches a template), the matching template executes and adds its result to the result tree.
When there are no more matches, XSLT has transformed the source tree into the result tree.
XSLT does not analyze every node of the source tree; it selectively navigates the source tree using the select and match attributes, which hold XPath expressions.
For XSLT to function, the source tree must be properly structured.
Schemas, DTDs and validating parsers can validate document structure before using XPath and XSLTs.
XSL style sheets can be connected directly to an XML document by adding an xml-stylesheet processing instruction to the XML document.

14.8 Extensible Stylesheet Language and XSL Transformations (Cont.)
Two tree structures are involved in transforming an XML document using XSLT: the source tree (the document being transformed) and the result tree (the result of the transformation).
The XPath character / (a forward slash) selects the document root.
In XPath, a leading forward slash specifies that we are using absolute addressing.
An XPath expression with no beginning forward slash uses relative addressing.
The XSL element value-of retrieves an attribute’s value; the @ symbol specifies an attribute node.
The XSL node-set function name retrieves the current node’s element name.
The XSL node-set function text retrieves the text between an element’s start and end tags.
The XPath expression //* selects all the nodes in an XML document.

Outline sports.xml (1 of 2)
The xml-stylesheet processing instruction points to an XSL style sheet for this document.

Outline sports.xml (2 of 2)

XSL enables document authors to separate data presentation (specified in XSL documents) from data description (specified in XML documents).

You will sometimes see the XML processing instruction <?xml-stylesheet?> written as <?xml:stylesheet?> with a colon rather than a dash. The version with a colon results in an XML parsing error in Firefox.

Outline sports.xsl (1 of 2)
Use xsl:output to write a doctype.

Outline sports.xsl (2 of 2)
Write the following HTML for each game element in the sports element that is contained in the root element.
Write the value of the game’s id attribute in a table cell.
Write the value of the game’s name child element in a table cell.
Write the value of the game’s paragraph child element in a table cell.

Outline sorting.xml
The chapters are out of order here, but our XSL transformation will sort the data before displaying it.

Outline sorting.xsl (1 of 4)
Apply the templates in this document to the document root’s child nodes.
Set the XHTML document’s title to show the book’s ISBN number and title.

Outline sorting.xsl (2 of 4)
Output the enclosed XHTML for each element in the frontmatter element, which is contained in the chapters element.
Sort the chapter elements by the number contained in their number attribute in ascending order.

Outline sorting.xsl (3 of 4)
Sort the appendices alphabetically by their number attributes in ascending order.

Outline sorting.xsl (4 of 4)

Fig. 14.24 | XSL style-sheet elements. (Part 1 of 2.)

Fig. 14.24 | XSL style-sheet elements. (Part 2 of 2.)

14.9 Document Object Model
Retrieving data from an XML document using traditional sequential file processing techniques is neither practical nor efficient.
Some XML parsers store document data as tree structures in memory.
This hierarchical tree structure is called a Document Object Model (DOM) tree, and an XML parser that creates this type of structure is known as a DOM parser.
Each element name is represented by a node.
A node that contains other nodes is called a parent node.
A parent node can have many children, but a child node can have only one parent node.
Nodes that are peers are called sibling nodes.
A node’s descendant nodes include its children, its children’s children and so on.
A node’s ancestor nodes include its parent, its parent’s parent and so on.

14.9 Document Object Model (Cont.)
Many of the XML DOM capabilities are similar or identical to those of the XHTML DOM.
The DOM tree has a single root node, which contains all the other nodes in the document.
window.ActiveXObject: if this object exists, the browser is Internet Explorer.
Microsoft’s MSXML parser is used to manipulate XML documents in Internet Explorer.
The MSXML load method loads an XML document.
The childNodes property of a document contains a list of the XML document’s top-level nodes.
If the browser is Firefox 2, then the document object’s implementation property and the implementation property’s createDocument method will exist.
Firefox loads each XML document asynchronously.
You must use the XML document’s onload property to specify a function to call when the document finishes loading to ensure that you can access the document’s contents.
The nodeType property of a node contains the type of the node.

Nonbreaking spaces (&nbsp;) are spaces that the browser is not allowed to collapse or that can be used to keep words together.
nodeName property of a node: obtains the name of an element.
childNodes list of a node: its length is nonzero if the current node has children.
nodeValue property: returns the value of a node (e.g., the text of a text node).
firstChild property of a node: refers to the first child of a given node.
lastChild property of a node: refers to the last child of a given node.
nextSibling property of a node: refers to the next sibling in a list of children of a particular node.
previousSibling property of a node: refers to the current node’s previous sibling.
parentNode property of a node: refers to the current node’s parent node.

Use XPath expressions to specify search criteria.
In IE7, the XML document object’s selectNodes method receives an XPath expression as an argument and returns a collection of elements that match the expression.
Firefox 2 searches for XPath matches using the XML document object’s evaluate method, which receives five arguments:
the XPath expression
the document to apply the expression to
a namespace resolver
a result type
an XPathResult object into which to place the results
If the last argument is null, the function simply returns a new XPathResult object containing the matches.
The namespace resolver argument can be null if you are not using XML namespace prefixes in the XPath processing.
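
The two XPath search paths described above can be wrapped in a single helper. This is a hedged sketch rather than the xpath.html listing: the function name selectByXPath is hypothetical, and it assumes an ordered snapshot result type (the value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, written as a literal so the sketch does not depend on a browser global).

```javascript
// Returns an array of nodes matching the XPath expression, using
// selectNodes (MSXML) when available and evaluate otherwise.
function selectByXPath(xmlDoc, expression) {
  const matches = [];
  if (typeof xmlDoc.selectNodes !== "undefined") {
    // MSXML path: selectNodes returns a node list with length and item().
    const list = xmlDoc.selectNodes(expression);
    for (let i = 0; i < list.length; i++) {
      matches.push(list.item(i));
    }
    return matches;
  }
  // DOM path: evaluate(expression, contextNode, namespaceResolver,
  // resultType, result). Passing null for the resolver means no namespace
  // prefixes are used; passing null as the last argument makes evaluate
  // return a new XPathResult.
  const ORDERED_NODE_SNAPSHOT_TYPE = 7; // XPathResult.ORDERED_NODE_SNAPSHOT_TYPE
  const result = xmlDoc.evaluate(
    expression, xmlDoc, null, ORDERED_NODE_SNAPSHOT_TYPE, null);
  for (let i = 0; i < result.snapshotLength; i++) {
    matches.push(result.snapshotItem(i));
  }
  return matches;
}

// Usage, given an already loaded XML document object xmlDoc:
//   const games = selectByXPath(xmlDoc, "/sports/game");
```

Returning a plain array in both cases lets the calling code loop over the matches the same way regardless of browser.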

Fig. 14.25 | Tree structure for the document article.xml of Fig. 14.2.

Outline XMLDOMTraversal.html (1 of 13)
Load the document using the IE-specific ActiveXObject.

Outline XMLDOMTraversal.html (2 of 13)
Load the document using other browsers.
In either case, call buildHTML and displayDoc methods once the document has loaded.

Outline XMLDOMTraversal.html (3 of 13)
Loop through children and decide what to do depending on nodeType.
If the node is an element, display its name.
If the node has children, call buildHTML recursively to handle the children.
If a text node or comment node is not simply indentation white space, add its value to the outputHTML.

Outline XMLDOMTraversal.html (4 of 13)
Insert three spaces for each level of indentation.
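
The recursive traversal described in these callouts can be sketched as follows. This is not the buildHTML function from the listing, only an illustration of the same idea that returns indented text lines instead of HTML. The function name outlineNode and the sample tree are assumptions, and plain objects stand in for DOM nodes so the sketch runs outside a browser.

```javascript
// Node-type values as defined by the DOM (Node.ELEMENT_NODE and so on).
const ELEMENT_NODE = 1, TEXT_NODE = 3, COMMENT_NODE = 8;

// Walks the children of node and returns an array of indented lines.
function outlineNode(node, depth = 0, lines = []) {
  const indent = "   ".repeat(depth); // three spaces per level
  for (let i = 0; i < node.childNodes.length; i++) {
    const child = node.childNodes[i];
    if (child.nodeType === ELEMENT_NODE) {
      lines.push(indent + child.nodeName);
      if (child.childNodes.length > 0) {
        outlineNode(child, depth + 1, lines); // recurse into the children
      }
    } else if (child.nodeType === TEXT_NODE || child.nodeType === COMMENT_NODE) {
      // Skip text nodes that contain only indentation white space.
      const value = child.nodeValue.trim();
      if (value !== "") lines.push(indent + value);
    }
  }
  return lines;
}

// Plain objects standing in for DOM nodes (hypothetical sample data).
const el = (nodeName, ...childNodes) =>
  ({ nodeType: ELEMENT_NODE, nodeName, childNodes });
const text = (nodeValue) =>
  ({ nodeType: TEXT_NODE, nodeValue, childNodes: [] });

const doc = { childNodes: [
  el("article",
    text("\n   "), // indentation-only text node, as some parsers create
    el("title", text("Simple XML")),
    text("\n   "),
    el("author",
      el("firstName", text("John")),
      el("lastName", text("Doe")))),
] };

console.log(outlineNode(doc).join("\n"));
```

In a browser the same function should work on a parsed XML document, since DOM nodes expose the nodeType, nodeName, nodeValue and childNodes members used here.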

Outline XMLDOMTraversal.html (5 of 13)
Check that the first child of the current node exists, then select and highlight it.
Select the next sibling of the current node, if it exists.

Outline XMLDOMTraversal.html (6 of 13)
Select the previous sibling of the current node, if it exists.
Check that the last child of the current node exists, then select and highlight it.

Outline XMLDOMTraversal.html (7 of 13)
Select the parent of the current node.
Method to highlight or remove highlighting from elements.

Outline XMLDOMTraversal.html (8 of 13)

Attempting to process the contents of a dynamically loaded XML document in Firefox before the document’s onload event fires is a logic error. The document’s contents are not available until the onload event fires.

Portability Tip 14.4 Firefox’s XML parser does not ignore white space used for indentation in XML documents. Instead, it creates text nodes containing the white-space characters.

Fig. 14.27 | Common Node properties and methods. (Part 1 of 2.)

Fig. 14.27 | Common Node properties and methods. (Part 2 of 2.)

Fig. 14.28 | NodeList property and method.

Fig. 14.29 | Document properties and methods.

Fig. 14.30 | Element property and methods.

Fig | Attr properties.

Fig | Text methods.

Outline xpath.html (1 of 5)

Outline xpath.html (3 of 5)
IE7 uses the document object’s selectNodes method to select nodes using an XPath.
Other browsers use the document object’s evaluate method.

Outline sports.xml

Fig. 14.35 | XPath expressions and descriptions.

14.10 RSS
RSS stands for RDF (Resource Description Framework) Site Summary.
It is also known as Rich Site Summary and Really Simple Syndication.
RSS is an XML format used to syndicate simple website content: news articles, blog entries, product reviews, podcasts, vodcasts and more.
An RSS feed contains an rss root element with a version attribute and a channel child element with item subelements.
Depending on the RSS version, the channel and item elements have certain required and optional child elements.
item elements provide the feed subscriber with a link to a web page or file, a title and description of the page or file.
RSS enables website developers to draw more traffic.
Required child elements of channel in RSS 2.0: description, link, title.
Required child elements of item in RSS 2.0: title or description.
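
The feed structure described above (an rss root element with a version attribute, a channel with its required children, and item subelements) can be illustrated by assembling a minimal RSS 2.0 document as a string. This is a sketch under stated assumptions: the function names buildRss and escapeText and all feed data are hypothetical.

```javascript
// Escape markup characters so text can be placed in element content.
function escapeText(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build a minimal RSS 2.0 feed: channel requires title, link and
// description; each item here supplies title, link and description.
function buildRss(channel, items) {
  const itemXml = items.map((item) => [
    "    <item>",
    `      <title>${escapeText(item.title)}</title>`,
    `      <link>${escapeText(item.link)}</link>`,
    `      <description>${escapeText(item.description)}</description>`,
    "    </item>",
  ].join("\n"));
  return [
    '<?xml version="1.0" encoding="utf-8"?>',
    '<rss version="2.0">',
    "  <channel>",
    `    <title>${escapeText(channel.title)}</title>`,
    `    <link>${escapeText(channel.link)}</link>`,
    `    <description>${escapeText(channel.description)}</description>`,
    ...itemXml,
    "  </channel>",
    "</rss>",
  ].join("\n");
}

// Hypothetical feed data for illustration only.
console.log(buildRss(
  { title: "Example News", link: "http://example.com/", description: "Sample feed" },
  [{ title: "First post", link: "http://example.com/1", description: "Hello & welcome" }]
));
```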

14.10 RSS (Cont.)
An RSS aggregator keeps track of many RSS feeds and brings together information from the separate feeds.
Many sites provide RSS feed validators, e.g., validator.w3.org/feed and feedvalidator.org.
The DOM and XSL can be used to create RSS aggregators.
A simple RSS aggregator uses an XSL style sheet to format RSS feeds as XHTML.
MSXML’s built-in XSLT capabilities include method transformNode to apply an XSLT transformation. It is invoked on an RSS document object and receives the XSL document object as an argument.
Firefox provides built-in XSLT processing in the form of the XSLTProcessor object.
After creating this object, use its importStylesheet method to specify the XSL style sheet you’d like to apply.
Apply the transformation by invoking the XSLTProcessor object’s transformToFragment method, which returns a document fragment.
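
The two transformation paths described above can be combined in one helper. The following is a minimal sketch, not the RssViewer.html listing: the function name applyXslt and its scope parameter are assumptions, and the global object is passed in only so the browser check can be exercised outside a browser.

```javascript
// Applies xslDoc to rssDoc. Returns a document fragment when XSLTProcessor
// is available, or a string when MSXML's transformNode is used.
// scope is the global object (window in a browser).
function applyXslt(rssDoc, xslDoc, targetDocument, scope = globalThis) {
  if (typeof scope.XSLTProcessor === "function") {
    const processor = new scope.XSLTProcessor();
    processor.importStylesheet(xslDoc); // specify the style sheet to apply
    // The fragment returned is owned by targetDocument.
    return processor.transformToFragment(rssDoc, targetDocument);
  }
  if (typeof rssDoc.transformNode !== "undefined") {
    // MSXML: invoked on the RSS document, receives the XSL document.
    return rssDoc.transformNode(xslDoc);
  }
  throw new Error("No XSLT processor available");
}

// Usage in a browser, given loaded documents rssDoc and xslDoc:
//   const output = applyXslt(rssDoc, xslDoc, document);
//   A fragment can be appended to the page; a string can be assigned
//   to an element's innerHTML.
```

Because the two paths return different types, the caller has to handle both a fragment and a string when inserting the result into the page.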

Fig. 14.36 | channel elements and descriptions. (Part 1 of 2.)

Fig. 14.36 | channel elements and descriptions. (Part 2 of 2.)

Fig. 14.37 | item elements and descriptions.

Outline RssViewer.html (1 of 5)
Because of browser incompatibilities, our code needs to determine which browser it is running on.

Outline RssViewer.html (2 of 5)
IE7 downloads and loads the documents synchronously.
Firefox loads documents asynchronously, so use the onload event to make the events happen in the correct sequence.

Outline RssViewer.html (3 of 5)

Outline RssViewer.html (4 of 5)
Separate methods are used to apply the XSLT for IE and other browsers.

Outline RssViewer.html (5 of 5)

Outline deitel-20.xml (1 of 2)

Outline deitel-20.xml (2 of 2)
