Published by Suzanna Tucker. Modified over 8 years ago.
1
1 XML and RSS
2
2 Knowing trees, I understand the meaning of patience. Knowing grass, I can appreciate persistence. — Hal Borland Like everything metaphysical, the harmony between thought and reality is to be found in the grammar of the language. — Ludwig Wittgenstein
3
3 I played with an idea, and grew willful; tossed it into the air; transformed it; let it escape and recaptured it; made it iridescent with fancy, and winged it with paradox. — Oscar Wilde
4
4 OBJECTIVES
In this chapter you will learn:
– To mark up data using XML.
– How XML namespaces help provide unique XML element and attribute names.
– To create DTDs and schemas for specifying and validating the structure of an XML document.
– To create and use simple XSL style sheets to render XML document data.
– To retrieve and manipulate XML data programmatically using JavaScript.
– What RSS is and how to programmatically apply an XSL transformation to an RSS document using JavaScript.
5
5 14.1 Introduction
14.2 XML Basics
14.3 Structuring Data
14.4 XML Namespaces
14.5 Document Type Definitions (DTDs)
14.6 W3C XML Schema Documents
14.7 XML Vocabularies
   14.7.1 MathML™
   14.7.2 Other Markup Languages
14.8 Extensible Stylesheet Language and XSL Transformations
14.9 Document Object Model (DOM)
14.10 RSS
14.11 Wrap-Up
14.12 Web Resources
6
6 Introduction XML is a portable, widely supported, open (i.e., nonproprietary) technology for data storage and exchange
7
7 XML Basics
XML documents are readable by both humans and machines.
XML permits document authors to create custom markup for any type of information.
– Authors can create entirely new markup languages that describe specific types of data, including mathematical formulas, chemical molecular structures, music and recipes.
An XML parser is responsible for identifying components of XML documents (typically files with the .xml extension) and then storing those components in a data structure for manipulation.
– If an XML parser (validating or nonvalidating) can process an XML document successfully, that XML document is well-formed.
An XML document can optionally reference a Document Type Definition (DTD) or schema that defines the XML document's structure.
– An XML document that conforms to a DTD/schema (i.e., has the appropriate structure) is valid.
– DTDs and schemas are essential for business-to-business (B2B) transactions and mission-critical systems.
– Validating XML documents ensures that disparate systems can manipulate data structured in standardized ways and prevents errors caused by missing or malformed data.
8
8 Structuring Data
An XML document
– begins with an optional XML declaration, which identifies the document as an XML document. The version attribute specifies the version of XML syntax used in the document. A document that omits the XML declaration is treated as an XML 1.0 document.
– can contain XML comments, which begin with <!-- and end with -->.
– contains text that represents its content (i.e., data) and elements that specify its structure. XML documents delimit an element with start and end tags.
– has a root element that encompasses all its other elements.
player.xml
– Start tags and end tags enclose data or other elements.
– The root element contains all other elements in the document.
9
9 Structuring Data (Cont.)
XML element names can be of any length and can contain letters, digits, underscores, hyphens and periods.
– Names must begin with either a letter or an underscore, and they should not begin with "xml" in any combination of uppercase and lowercase letters, as this is reserved for use in the XML standards.
When a user loads an XML document in a browser, a parser parses the document, and the browser uses a style sheet to format the data for display.
– IE and Firefox each display minus (-) or plus (+) signs next to all container elements. A minus sign indicates that all child elements are being displayed. When clicked, a minus sign becomes a plus sign (which collapses the container element and hides all the children), and vice versa.
Data can be placed between tags or in attributes (name/value pairs that appear within the angle brackets of start tags).
– Elements can have any number of attributes.
10
10 article.xml
– The name elements are nested within the author element.
– The author element is a containing element because it has child elements.
Placing any characters, including white space, before the XML declaration is an error.
XML is case sensitive. Using different cases for the start tag and end tag names for the same element is a syntax error.
Using a white-space character in an XML element name is an error.
11
11 letter.xml
– The DOCTYPE specifies an external DTD in the file letter.dtd.
– Data can be stored as attributes, which appear in an element's start tag.
– flag is an empty element because it contains no child elements or content.
Failure to enclose attribute values in double ("") or single ('') quotes is a syntax error.
12
12 An XML document is not required to reference a DTD, but validating XML parsers can use a DTD to ensure that the document has the proper structure. Validating an XML document helps guarantee that independent developers will exchange data in a standardized form that conforms to the DTD. An XML parser converts an XML document into an XML DOM object, which can then be manipulated with JavaScript.
13
13 Structuring Data (Cont.)
Figure: a sample note document (values: Tove, Jani, Reminder, Don't forget), shown once with Note.xsd and once with Note.dtd.
14
14 Well-Formed & Valid Documents
A well-formed XML document has correct XML syntax:
– XML documents must have a root element
– XML elements must have a closing tag
– XML tags are case sensitive
– XML elements must be properly nested
– XML attribute values must be quoted
A valid XML document is a well-formed document that also conforms to its DTD or schema. An XML Schema:
– defines elements that can appear in a document
– defines attributes that can appear in a document
– defines which elements are child elements
– defines the order of child elements
– defines the number of child elements
– defines whether an element is empty or can include text
– defines data types for elements and attributes
– defines default and fixed values for elements and attributes
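The well-formedness rules listed on this slide can be tried out with any non-validating parser. The sketch below is an illustration only: it uses Python's standard library rather than the browser JavaScript the slides rely on, and the sample documents, including the element and attribute names, are assumptions made up for the example.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def is_well_formed(text):
    """Return True if text parses as XML, False if the parser reports a syntax error."""
    try:
        xml.dom.minidom.parseString(text)
    except ExpatError:
        return False
    return True


# Hypothetical sample documents (element and attribute names are assumptions).
SAMPLES = {
    "well formed": '<note date="2008-01-10"><to>Tove</to><from>Jani</from></note>',
    "two root elements": "<to>Tove</to><from>Jani</from>",
    "missing closing tag": "<note><to>Tove</note>",
    "case mismatch": "<note><To>Tove</to></note>",
    "improper nesting": "<note><to><b>Tove</to></b></note>",
    "unquoted attribute": "<note date=2008-01-10><to>Tove</to></note>",
}

for label, document in SAMPLES.items():
    print(f"{label}: {is_well_formed(document)}")
```

Only the first sample parses. Note that this checks well-formedness only; deciding whether a document is valid additionally requires a validating parser and a DTD or schema.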
15
15 Fig. 14.5 | Validating an XML document with Microsoft’s XML Validator.
16
16 Fig. 14.6 | Validation result using Microsoft’s XML Validator.
17
17 Namespaces
XML namespaces provide a means for document authors to prevent naming collisions.
– In XML, element names are defined by the developer. This often results in a conflict when trying to mix XML documents from different XML applications.
A namespace is declared with an xmlns attribute in the start tag of an element (any element, commonly the root element). The declaration applies to that element and its descendants.
The namespace declaration has the following syntax: xmlns:prefix="URI".
Each namespace prefix is bound to a uniform resource identifier (URI) that uniquely identifies the namespace.
– A URI is a series of characters that differentiates names.
– Document authors create their own namespace prefixes. They commonly use URLs as URIs, because domain names in URLs must be unique.
– Any name can be used as a namespace prefix, except that the prefix xml is reserved for use in XML standards.
To eliminate the need to place a namespace prefix in each element, authors can specify a default namespace for an element and its children.
– A default namespace is declared using the attribute xmlns, without a prefix, with a URI as its value: xmlns="namespaceURI".
18
18 namespace.xml Two namespaces are specified using URNs The namespace prefixes are used in element names throughout the document
19
19 defaultnamespace.xml The default namespace is set in the directory element Elements with no namespace prefix use the default namespace
20
20 Document Type Definitions (DTDs)
DTDs and schemas
– provide an extensible way to describe XML document structure. Applications should use DTDs or schemas to confirm whether XML documents are valid.
– specify documents' element types and attributes, and their relationships to one another.
– enable an XML parser to verify whether an XML document is valid (i.e., its elements contain the proper attributes and appear in the proper sequence).
Many organizations and individuals are creating DTDs and schemas for a broad range of applications. These collections, called repositories, are available free for download from the web (e.g., www.xml.org, www.oasis-open.org).
21
21 Document Type Definitions (DTDs)
The purpose of a DTD (Document Type Definition) is to define the legal building blocks (elements, attributes, entities, PCDATA, CDATA) of an XML document.
– Entities stand in for characters that have special meaning in XML, such as <, >, &, ' and ". For example, the text "if salary < 1000 then" must be written with &lt; in place of <.
– PCDATA is text that WILL be parsed by a parser.
– CDATA is text that WILL NOT be parsed by a parser.
A DTD defines the document structure with a list of legal elements and attributes.
A DTD can be declared inline inside an XML document, or as an external reference.
– Internal syntax: <!DOCTYPE root-element [element-declarations]>
– External syntax: <!DOCTYPE root-element SYSTEM "filename">
A DTD expresses the set of rules for document structure using an EBNF (Extended Backus-Naur Form) grammar.
22
22 Document Type Definitions (DTDs)
– An ELEMENT element type declaration defines the rules for an element: <!ELEMENT element-name category> or <!ELEMENT element-name (element-content)>
– category can be EMPTY, ANY, ...
– element-content can be #PCDATA, a child element, or a sequence of children
– An ATTLIST attribute-list declaration defines attributes for a particular element: <!ATTLIST element-name attribute-name attribute-type attribute-value>
– Example: an ATTLIST declaration for the payment element matches an attribute of the payment element in the XML document.
DTD syntax cannot describe an element's or attribute's data type. For example, a DTD cannot specify that a particular element or attribute can contain only integer data.
Using markup characters (e.g., <, > and &) in parsed character data is an error. Use character entity references (e.g., &lt;, &gt; and &amp;) instead.
23
23 letter.dtd Define the requirements for the letter element Define the requirements for the contact element A contact element may have a type attribute, but it is not required Each of these elements contains parsed character data The flag element must be empty and its gender attribute must be set to either M or F. If there is no gender attribute, gender defaults to M
24
24 W3C XML Schema Documents
Like DTDs, schemas describe document structure, but unlike DTDs:
– Schemas use XML syntax, not EBNF grammar.
– XML Schema documents can specify what type of data (e.g., numeric, text) an element can contain.
An XML document that conforms to a schema document is schema valid.
Two categories of types exist in XML Schema: simple types and complex types.
– Simple types cannot contain attributes or child elements; complex types can.
Every simple type defines a restriction on an XML Schema-defined type or on a user-defined type.
Complex types can have either simple content or complex content.
– Both can contain attributes, but only complex content can contain child elements.
Whereas complex types with simple content must extend or restrict some other existing type, complex types with complex content do not have this limitation.
25
25 book.xml
26
26 book.xsd
– Specify the namespace of the elements that this schema defines.
– Define the books element.
– Define the requirements for any element of type BooksType.
– An element of type BooksType must contain one or more book elements, which have type SingleBookType.
– A SingleBookType element has a title element, which contains a string.
27
27 Portability Tip 14.3 W3C XML Schema authors specify URI http://www.w3.org/2001/XMLSchema when referring to the XML Schema namespace. This namespace contains predefined elements that comprise the XML Schema vocabulary. Specifying this URI ensures that validation tools correctly identify XML Schema elements and do not confuse them with those defined by document authors.
28
28 Fig. 14.13 | Some XML Schema types. (Part 1 of 3.)
29
29 Fig. 14.13 | Some XML Schema types. (Part 2 of 3.)
30
30 Fig. 14.13 | Some XML Schema types. (Part 3 of 3.)
31
31 computer.xsd Define a simpleType that contains a decimal whose value is 2.1 or greater Define a complexType with simpleContent so that it can contain only attributes, not child elements The CPU element’s data must be of type string, and it must have an attribute model containing a string laptop.xml
32
32 computer.xsd The all element specifies a list of elements that must be included, in any order, in the document The types defined in the last slide are used by these elements laptop.xml
33
33 Document Object Model
Retrieving data from an XML document using traditional sequential file processing techniques is neither practical nor efficient.
Some XML parsers store document data as tree structures in memory.
– This hierarchical tree structure is called a Document Object Model (DOM) tree, and an XML parser that creates this type of structure is known as a DOM parser.
– Each element name is represented by a node.
– A node that contains other nodes is called a parent node.
– A parent node can have many children, but a child node can have only one parent node.
– Nodes that are peers are called sibling nodes.
– A node's descendant nodes include its children, its children's children and so on.
– A node's ancestor nodes include its parent, its parent's parent and so on.
34
34 Document Object Model
Book data in the example document:
Title              Author                 Year   Price
Everyday Italian   Giada De Laurentiis    2005   30.00
Harry Potter       J K. Rowling           2005   29.99
Learning XML       Erik T. Ray            2003   39.95

Named Constant     Node Type
ELEMENT_NODE       1
ATTRIBUTE_NODE     2
TEXT_NODE          3
COMMENT_NODE       8
35
35 14.9 Document Object Model (Cont.)
Many of the XML DOM capabilities are similar or identical to those of the XHTML DOM.
The DOM tree has a single root node, which contains all the other nodes in the document.
If the browser is IE:
– window.ActiveXObject exists. If this object exists, the browser is Internet Explorer.
– An ActiveXObject loads Microsoft's MSXML parser, which is used to manipulate XML documents in Internet Explorer.
– The document's load method loads an XML document.
If the browser is Firefox 2, the document object's implementation property and the implementation property's createDocument method will exist.
Firefox loads each XML document asynchronously.
– You must use the XML document's onload property to specify a function to call when the document finishes loading to ensure that you can access the document's contents.
36
36 14.9 Document Object Model (Cont.)
Nonbreaking spaces (&nbsp;): spaces that the browser is not allowed to collapse or that can be used to keep words together.
nodeName property of a node: the name of the node (for an element, its tag name).
childNodes list of a node: its length is nonzero if the current node has children.
nodeValue property of a node: the value of the node (for a text node, its text; for an element node, null).
firstChild property of a node: refers to the first child of a given node.
lastChild property of a node: refers to the last child of a given node.
nextSibling property of a node: refers to the next sibling in a list of children of a particular node.
previousSibling property of a node: refers to the current node's previous sibling.
parentNode property of a node: refers to the current node's parent node.
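The node properties above can be exercised outside a browser as well. The following sketch uses Python's xml.dom.minidom, which exposes the same W3C DOM property names, as a stand-in for the slides' JavaScript. The sample document is a hypothetical one with an author element that contains name elements, and the outline function is an illustrative helper, not the slides' buildHTML.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Hypothetical document, written without indentation so that no
# white-space-only text nodes appear between the elements.
DOC = (
    "<article><title>Simple XML</title>"
    "<author><firstName>John</firstName><lastName>Doe</lastName></author>"
    "</article>"
)


def outline(node, depth=0):
    """Recursively list element names and text, three spaces per level."""
    lines = []
    for child in node.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            lines.append("   " * depth + child.nodeName)
            lines.extend(outline(child, depth + 1))
        elif child.nodeType == Node.TEXT_NODE and child.nodeValue.strip():
            lines.append("   " * depth + child.nodeValue.strip())
    return lines


doc = parseString(DOC)
root = doc.documentElement      # the article element
title = root.firstChild         # first child of article
author = root.lastChild         # last child of article

print(root.nodeName, len(root.childNodes))
print(title.nextSibling.nodeName, author.previousSibling.nodeName)
print(author.parentNode.nodeName, title.firstChild.nodeValue)
print("\n".join(outline(doc)))
```

The element node itself has no nodeValue; the text lives in a child text node, which is why the title text is read from title.firstChild.nodeValue.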
37
37 Tree structure for the document article.xml.
38
38 XMLDOMTraversal.html (loads article.xml)
Load the document using the IE-specific ActiveXObject
39
39 Load the document using other browsers In either case, call buildHTML and displayDoc methods once the document has loaded
40
40 Loop through children and decide what to do depending on nodeType
If the node is an element, display its name
If the node has children, call buildHTML recursively to handle the children
If a text node or comment node is not simply indentation white space, add its value to the outputHTML
"<div id = \"id" + idCount + "\">"
41
41 Insert three spaces for each level of indentation
42
42 Check that the first child of the current node exists, then select and highlight it Select the next sibling of the current node, if it exists
43
43 Check that the last child of the current node exists, then select and highlight it Select the previous sibling of the current node, if it exists
44
44 Select the parent of the current node Method to highlight or remove highlighting from elements
50
50 Attempting to process the contents of a dynamically loaded XML document in Firefox before the document’s onload event fires is a logic error. The document’s contents are not available until the onload event fires. Firefox’s XML parser does not ignore white space used for indentation in XML documents. Instead, it creates text nodes containing the white-space characters.
51
51 Common Node properties and methods. (Part 1 of 2.)
52
52 Common Node properties and methods. (Part 2 of 2.)
53
53 NodeList property and method.
54
54 Document properties and methods.
55
55 Element property and methods.
56
56 Attr properties. Text methods.
57
57 XPath
XPath
– A string-based language of expressions used by XML and many of its related technologies for effectively and efficiently locating structures and data (such as specific elements and attributes) in XML documents
XPath character / (a forward slash)
– Selects the document root
– In XPath, a leading forward slash specifies that we are using absolute addressing
– An XPath expression with no beginning forward slash uses relative addressing
The XPath expression //*
– Selects all the element nodes in an XML document
58
58 XPath (Contd.)
XPath is used to navigate through elements and attributes in an XML document
– XPath is a syntax for defining parts of an XML document
– XPath uses path expressions to navigate in XML documents
– XPath contains a library of standard functions
– XPath is a major element in XSLT
Path expressions resemble file system paths. XPath expressions specify search criteria and select nodes or node-sets in an XML document:
– nodename: selects the child elements of the current node that are named nodename
– /: selects from the root node
– //: selects nodes in the document, from the current node, that match the selection no matter where they are
– .: selects the current node
– ..: selects the parent of the current node
– @: selects attributes
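The path expressions above can be tried with Python's xml.etree.ElementTree, used here as a stand-in for the browser APIs in the slides. Two caveats are assumptions of this sketch: ElementTree implements only a subset of XPath, with paths evaluated relative to the element they are called on, and the sample data is hypothetical, loosely in the spirit of a sports document.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data (names and id values are assumptions).
DOC = """<sports>
  <game id="783"><name>Cricket</name></game>
  <game id="239"><name>Baseball</name></game>
  <game id="418"><name>Soccer</name></game>
</sports>"""

root = ET.fromstring(DOC)

# nodename: child elements of the current node with that name
child_games = root.findall("game")

# //: matching elements at any depth below the current node
all_names = [n.text for n in root.findall(".//name")]

# @: test an attribute inside a predicate
ids = [g.get("id") for g in root.findall("game[@id]")]
baseball = root.find("game[@id='239']/name").text

# ..: step from a name element back up to its parent game
first_parent = root.find("game/name/..")

# .: the current node itself
same_node = root.find(".")

print(len(child_games), all_names, ids)
print(baseball, first_parent.get("id"), same_node.tag)
```

A full XPath engine would also accept absolute expressions such as /sports/game; ElementTree rejects absolute paths on an element, which is why every path here is relative.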
59
59 XPath expressions and descriptions. sports.xml, xpath.html
60
60 XPath (Contd.)
XPath expressions are evaluated differently in different browsers
– In IE7, the XML document object's selectNodes method receives an XPath expression as an argument and returns a collection of elements that match the expression
– Firefox 2 searches for XPath matches using the XML document object's evaluate method, which receives five arguments:
   the XPath expression
   the document to apply the expression to
   a namespace resolver
   a result type
   an XPathResult object into which to place the results
If the last argument is null, the function simply returns a new XPathResult object containing the matches
The namespace resolver argument can be null if you are not using XML namespace prefixes in the XPath processing
61
61 xpath.html sports.xml
62
62 xpath.html
63
63 xpath.html IE7 uses the document object’s selectNodes method to select nodes using an XPath. Other browsers use the document object’s evaluate method. sports.xml
64
64 xpath.html
65
65 xpath.html
66
66 sports.xml
67
67 Extensible Stylesheet Language and XSL Transformations
XSL is more than a style sheet language. XSL consists of three parts:
– XSLT: a language for transforming XML documents into any text-based document.
– XPath: a language for navigating in XML documents. It is used to navigate through elements and attributes in XML documents.
– XSL-FO: a language for formatting XML documents.
With XSLT you can
– Transform an XML source tree into a result tree. For example, transform each XML element into an (X)HTML element.
– Add elements and attributes to, or remove them from, the output file. You can also rearrange and sort elements, perform tests and make decisions about which elements to hide and display, and a lot more.
XSLT does not analyze every node of the source tree.
– It selectively navigates the source tree using XPath expressions in the select and match attributes of XSLT elements to find information in an XML document.
XSL style sheets can be connected directly to an XML document by adding an xml-stylesheet processing instruction to the XML document.
XSL documents have the extension .xsl.
68
68 Extensible Stylesheet Language and XSL Transformations (Cont.)
Two tree structures are involved in transforming an XML document using XSLT
– source tree (the document being transformed)
– result tree (the result of the transformation)
An XSL style sheet consists of one or more sets of rules that are called templates.
– A template contains rules to apply when a specified node is matched.
– The <xsl:template> element is used to build templates.
– The match attribute is used to associate a template with an XML element.
An XPath expression is used to locate parts of the source-tree document that match templates defined in an XSL style sheet.
– When a match occurs (i.e., a node matches a template), the matching template executes and adds its result to the result tree.
– When there are no more matches, XSLT has transformed the source tree into the result tree.
69
69 Extensible Stylesheet Language and XSL Transformations (Cont.)
match="/" associates a template with the root of the source document; "/" is an XPath expression.
The <xsl:value-of> element is used to extract the value of a selected node.
– The @ symbol specifies an attribute node
The <xsl:for-each> element allows you to do looping in XSLT.
70
70 Extensible Stylesheet Language and XSL Transformations (Cont.)
The <xsl:sort> element is used to sort the output.
XSL node-set function name()
– Retrieves the current node's element name
XSL node-set function text()
– Retrieves the text between an element's start and end tags
Legal filter operators are:
– = (equal)
– != (not equal)
– < (less than; written as &lt; in a style sheet)
– > (greater than)
The XPath expression //*
– Selects all the element nodes in an XML document
71
71 sports.xml The xml-stylesheet declaration points to an XSL style sheet for this document
72
72 sports.xml
73
73 sports.xsl Use xsl:output to write a doctype.
74
74 sports.xsl Write the following HTML for each game element in the sports element that is contained in the root element Write the value of the game ’s id attribute in a table cell Write the value of the game ’s name child element in a table cell Write the value of the game ’s paragraph child element in a table cell define the end of the template and the end of the style sheet.
75
75 sorting.xml The chapters are out of order here, but our XSL transformation will sort the data before displaying it
76
76 sorting.xsl Apply the templates in this document to the document root’s child nodes Set the XHTML document’s title to show the book’s ISBN number and title
77
77 sorting.xsl Output the enclosed XHTML for each element in the frontmatter element, which is contained in the chapters element Sort the chapter elements by the number contained in their number attribute in ascending order
78
78 sorting.xsl Sort the appendices alphabetically by their number attributes in ascending order
79
79 sorting.xsl
80
80 Fig. 14.24 | XSL style-sheet elements. (Part 1 of 2.)
81
81 XSL style-sheet elements. (Part 2 of 2.)
82
82 RSS
RSS stands for
– RDF (Resource Description Framework) Site Summary
– Also known as Rich Site Summary and Really Simple Syndication
RSS is an XML format used to syndicate simple website content
– news articles, blog entries, product reviews, podcasts, vodcasts and more
Without RSS, users have to check your site regularly for new updates.
An RSS feed contains
– an rss root element with a version attribute
– a channel child element with item subelements
– Depending on the RSS version, the channel and item elements have certain required and optional child elements
item elements define an article or "story" in the RSS feed
– They provide the feed subscriber with a link to a web page or file, and a title and description of the page or file
RSS enables website developers to draw more traffic.
Required child elements of channel in RSS 2.0
– description
– link
– title
Required child elements of item in RSS 2.0
– title or description
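The RSS 2.0 structure described on this slide can be checked with a few lines of code. This is a minimal sketch in Python's standard library, not the slides' JavaScript; the feed content, URLs and the two helper functions are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed (titles and URLs are assumptions).
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <link>http://www.example.com/</link>
    <description>Hypothetical feed used for illustration</description>
    <item>
      <title>First story</title>
      <link>http://www.example.com/first</link>
      <description>Summary of the first story</description>
    </item>
    <item>
      <description>An item with only a description</description>
    </item>
    <item>
      <link>http://www.example.com/no-title-or-description</link>
    </item>
  </channel>
</rss>"""


def missing_channel_elements(rss_root):
    """List the required channel children (title, link, description) that are absent."""
    channel = rss_root.find("channel")
    return [name for name in ("title", "link", "description")
            if channel.find(name) is None]


def item_is_valid(item):
    """An RSS 2.0 item needs at least a title or a description."""
    return item.find("title") is not None or item.find("description") is not None


rss = ET.fromstring(FEED)
items = rss.findall("channel/item")

print(rss.tag, rss.get("version"))
print(missing_channel_elements(rss))
print([item_is_valid(item) for item in items])
```

The third item carries only a link, so it fails the title-or-description requirement, while the channel itself has all three required children.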
83
83 RSS (Cont.)
RSS aggregator: a site or program that gathers and sorts out RSS feeds
– keeps track of many RSS feeds
– brings together information from the separate feeds
Many sites provide RSS feed validators
– validator.w3.org/feed
– feedvalidator.org
– www.validome.org/rss-atom/
The DOM and XSL can be used to create RSS aggregators
A simple RSS aggregator uses an XSL style sheet to format RSS feeds as XHTML
MSXML's built-in XSLT capabilities include the method transformNode to apply an XSLT transformation
– It is invoked on an RSS document object and receives the XSL document object as an argument
Firefox provides built-in XSLT processing in the form of the XSLTProcessor object
– After creating this object, use its importStylesheet method to specify the XSL style sheet you'd like to apply
– Apply the transformation by invoking the XSLTProcessor object's transformToFragment method, which returns a document fragment
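The core idea of an aggregator, gathering items from several feeds and sorting them into one list, can be sketched without XSL. The example below swaps in plain Python and its standard library for the XSL-and-JavaScript approach described on the slide, so it illustrates the concept rather than the slides' implementation. Both feeds and the aggregate helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two hypothetical RSS 2.0 feeds (titles and URLs are assumptions).
FEED_A = """<rss version="2.0"><channel>
<title>Feed A</title><link>http://a.example.com/</link>
<description>First hypothetical feed</description>
<item><title>Zebra facts</title><link>http://a.example.com/zebra</link></item>
<item><title>Apple harvest</title><link>http://a.example.com/apple</link></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel>
<title>Feed B</title><link>http://b.example.com/</link>
<description>Second hypothetical feed</description>
<item><title>Mountain trails</title><link>http://b.example.com/trails</link></item>
</channel></rss>"""


def aggregate(feed_texts):
    """Collect (item title, feed title, item link) from every feed, sorted by item title."""
    entries = []
    for text in feed_texts:
        channel = ET.fromstring(text).find("channel")
        source = channel.findtext("title")
        for item in channel.findall("item"):
            entries.append((
                item.findtext("title", default="(untitled)"),
                source,
                item.findtext("link", default=""),
            ))
    return sorted(entries)


merged = aggregate([FEED_A, FEED_B])
for title, source, link in merged:
    print(f"{title} [{source}] {link}")
```

Sorting by title is an arbitrary choice for the sketch; a real aggregator would more likely order entries by publication date when the feeds provide one.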
84
deitel-20.xml
86
86 channel elements and descriptions. (Part 1 of 2.)
87
87 channel elements and descriptions. (Part 2 of 2.)
88
88 item elements and descriptions.
89
RssViewer.html Because of browser incompatibilities, our code needs to determine which browser it is running on.
90
RssViewer.html IE7 downloads and loads the documents synchronously Firefox loads documents asynchronously, so use the onload event to make the events happen in the correct sequence
91
RssViewer.html
92
Separate methods are used to apply the XSLT for IE and other browsers
93
RssViewer.html