Published by Jemimah Underwood. Modified over 9 years ago.
1
Chapter 4 Web Pages Using Web Standards Chapter 3 XML – the ‘X’ in Ajax
2
Introduction
Integrating heterogeneous information systems is a key challenge in information technology
–System integration: how can distributed, heterogeneous systems communicate?
–Data integration: how can distributed, heterogeneous systems understand each other's data?
XML technologies address the data integration problem
XML is also important for providing different views of the same data
3
Extensible Markup Language
XML is a simplified descendant of SGML (Standard Generalized Markup Language)
Like XHTML, an XML document marks up data with tags and attributes
For each type of data, a different set of tag names, attributes, and syntax rules can be defined in the form of an XML dialect; like XHTML, a dialect is fixed and cannot be extended by its users
All XML documents based on the same XML dialect are called instance documents of that dialect
“Extensible” in XML means that new XML dialects can always be introduced for new types of data
4
Tags and Elements
Each XML element consists of a start tag and an end tag with nested elements or text in between (called the element value)
The start tag has the form <tagName>, as in <library> and <dvd>
The end tag has the form </tagName>, as in </library> and </dvd>
An XML dialect defines its allowed tag names
Any string consisting of a letter followed by an optional sequence of letters or digits, and not starting with any case variation of “xml”, is a valid XML tag name
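The naming rule on this slide can be sketched as a small Python check. This covers only the slide's simplified rule; the XML specification also permits characters such as underscores, hyphens, and periods in names. The function name is_simple_tag_name is made up for this illustration.

```python
import re

# Simplified rule from the slide: a letter followed by letters or digits,
# and no case variation of "xml" as a prefix.
_SIMPLE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_simple_tag_name(name: str) -> bool:
    """Return True if name satisfies the slide's simplified naming rule."""
    if _SIMPLE_NAME.fullmatch(name) is None:
        return False
    return not name.lower().startswith("xml")


for candidate in ["dvd", "title2", "2title", "XmlData", "my tag"]:
    print(candidate, is_simple_tag_name(candidate))
```

Names such as "2title" (starts with a digit), "XmlData" (reserved prefix), and "my tag" (contains a space) are rejected.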
5
Tags and Elements …
Tag names are case-sensitive: <Title> is not the same as <title>
If an element has no value, the start tag and end tag can be combined into the form <tagName/>, as in <dvd/>
Elements cannot partially overlap, as in <a>data<b>data</a></b>
6
XML Attributes
The start tag of an element can have attributes, written as a sequence of attributeName="attributeValue" pairs separated by white space
–<dvd id="1">
Attribute values can also be delimited by single quotes, which is useful when a value itself contains a double quote
Any string consisting of a letter followed by an optional sequence of letters or digits can be a valid attribute name
Attribute names are case-sensitive: “id” and “ID” are different
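A short Python sketch of the quoting and case-sensitivity rules, using the standard library's ElementTree parser. The note and ID attributes are hypothetical and appear only for this illustration.

```python
import xml.etree.ElementTree as ET

# The note attribute is delimited by single quotes because its value
# contains double quotes (note is a hypothetical attribute).
element = ET.fromstring("""<dvd id="1" note='the "classic" edition'/>""")
print(element.attrib)

# Attribute names are case-sensitive: "id" and "ID" are two distinct attributes.
mixed = ET.fromstring('<dvd id="1" ID="A"/>')
print(mixed.get("id"), mixed.get("ID"))
```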
7
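XML Document Structure
An XML document contains an optional XML declaration followed by a single top-level element, which may contain nested elements and text
<?xml version="1.0" encoding="UTF-8"?>
<library>
  <dvd id="1">
    <title>Gone with the Wind</title>
    <format>Movie</format>
    <genre>Classic</genre>
  </dvd>
  <dvd id="2">
    <title>Star Trek</title>
    <format>TV Series</format>
    <genre>Science fiction</genre>
  </dvd>
</library>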
8
XML Document Structure …
The optional XML declaration, if present, must be the first line: <?xml version="1.0" encoding="UTF-8"?>
–It declares the XML version; 1.0 is the most widely used
–It declares the character encoding
XML data are based on Unicode to support international characters
UTF-8 is the most compact Unicode encoding for Western languages (one byte for each ASCII character)
XML comment: <!-- comment text -->
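The compactness claim for UTF-8 can be checked directly. This small Python sketch counts the bytes needed for an ASCII letter, an accented letter, and the euro sign; the three sample characters are chosen only for illustration.

```python
# Encode a few characters as UTF-8 and count the bytes each one needs.
samples = ["A", "é", "€"]
byte_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}
print(byte_lengths)
```

ASCII characters take one byte each, while characters outside ASCII take two or more.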
9
XML Document Structure …
The nesting structure of an XML document can be described by a tree growing downwards (prefix @ for attributes)
Here “library” is the root or top-level element
library
  dvd
    title
    format
    genre
    @id
  dvd
    title
    format
    genre
    @id
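The tree view can be produced mechanically. The following Python sketch walks a document with ElementTree and prints one indented line per element, followed by its attributes with the @ prefix. The id values 1 and 2 in the sample document and the helper name tree_lines are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# Sample document; the id values are assumed for illustration.
LIBRARY_XML = """<library>
  <dvd id="1">
    <title>Gone with the Wind</title>
    <format>Movie</format>
    <genre>Classic</genre>
  </dvd>
  <dvd id="2">
    <title>Star Trek</title>
    <format>TV Series</format>
    <genre>Science fiction</genre>
  </dvd>
</library>"""


def tree_lines(element, depth=0):
    """Return one indented line per element, then its attributes prefixed with @."""
    indent = "  " * depth
    lines = [indent + element.tag]
    for child in element:
        lines.extend(tree_lines(child, depth + 1))
    for name in element.attrib:
        lines.append(indent + "  @" + name)
    return lines


outline = tree_lines(ET.fromstring(LIBRARY_XML))
print("\n".join(outline))
```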
10
Using Special Characters
The following five characters are used for identifying XML document structures and thus cannot be used in XML data directly: < > & " '
As part of element or attribute values, they should be represented as &lt; &gt; &amp; &quot; &apos;
Invalid: IBM & Microsoft
Valid: IBM &amp; Microsoft
These alternative representations of characters are examples of entity references, to be introduced soon
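A minimal Python sketch of the same rule: escape from the standard library's xml.sax.saxutils replaces &, < and > with their entity references, and ElementTree rejects the unescaped text. The company tag is a hypothetical wrapper used only for this illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "IBM & Microsoft"
escaped = escape(raw)  # replaces &, < and > with entity references
print(escaped)

# The unescaped text is not well-formed XML.
try:
    ET.fromstring("<company>" + raw + "</company>")
    raw_is_well_formed = True
except ET.ParseError:
    raw_is_well_formed = False

# The escaped text parses back to the original string.
round_trip = ET.fromstring("<company>" + escaped + "</company>").text
print(raw_is_well_formed, round_trip)
```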
11
Entity References
If a character has hexadecimal Unicode code point nnn, you can refer to it as &#xnnn;
If a character has decimal Unicode code point nnn, you can refer to it as &#nnn;
If a character or string has an entity name entityName, you can refer to it in XML as &entityName;
–Define entity name euro for €: <!ENTITY euro "&#8364;">
–Define entity name cs for “computer science”: <!ENTITY cs "computer science">
XML: A &cs; book costs me &euro;52.
View: A computer science book costs me €52.
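Entity expansion can be observed with Python's ElementTree, which expands entities declared in a document's internal DTD subset. This is a sketch: the note root element is hypothetical, and the euro entity is defined here through the hexadecimal code point 20AC (decimal 8364).

```python
import xml.etree.ElementTree as ET

# The internal DTD subset defines two entity names; "note" is a hypothetical root tag.
DOCUMENT = """<!DOCTYPE note [
  <!ENTITY euro "&#x20AC;">
  <!ENTITY cs "computer science">
]>
<note>A &cs; book costs me &euro;52. Hex &#x20AC; and decimal &#8364; also work.</note>"""

text = ET.fromstring(DOCUMENT).text
print(text)
```

All three forms (named entity, hexadecimal reference, decimal reference) resolve to the same euro character.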
12
Well-Formed XML Documents
A well-formed XML document must conform to the following rules, among others:
–Non-empty elements are delimited by a pair of matching start tag and end tag
–Empty elements may be in their self-ending tag form, such as <tagName/>
–All attribute values are enclosed in matching single (') or double (") quotes
–Elements may be nested but must not partially overlap. Each non-root element must be completely contained in another element
–The document complies with its declared or default character encoding
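Any XML parser can check these rules. The sketch below uses Python's ElementTree, which raises ParseError on input that is not well-formed; the helper name is_well_formed and the sample snippets are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document as well-formed XML."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


samples = {
    "matching tags": "<dvd><title>Star Trek</title></dvd>",
    "self-ending empty element": "<dvd/>",
    "partially overlapping elements": "<a><b>data</a></b>",
    "unquoted attribute value": "<dvd id=1/>",
    "two top-level elements": "<dvd/><dvd/>",
}
results = {label: is_well_formed(doc) for label, doc in samples.items()}
for label, ok in results.items():
    print(label, ok)
```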
13
Defining XML Dialects
“Well-formedness” is the minimal requirement for an XML document; all XML parsers can check it
Any useful XML document must also follow the syntax rules of a specific XML dialect: which tags and attributes can be used, how elements can be ordered or nested, …
Major mechanisms for defining XML dialects:
–Document Type Definition (DTD)
–XML Schema (XSD)
Validating XML parsers can read an XML instance document together with its DTD or XSD syntax definition and check whether the document conforms to the syntax constraints
14
Document Type Definition
DTD is simpler than XSD and can specify fewer syntax constraints
DTD is part of the XML specification
DTD syntax is not based on XML
A DTD is usually specified in a separate file so that many instance XML documents can refer to it
Local DTD declarations, especially entity name definitions, can also be included at the top of an XML document to override some global definitions
15
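External DTD Example
<!ELEMENT library (dvd+)>
<!ELEMENT dvd (title, format, genre)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT format (#PCDATA)>
<!ELEMENT genre (#PCDATA)>
<!ATTLIST dvd id CDATA #REQUIRED>
A “library” element contains one or more “dvd” elements
A “dvd” element contains one “title” element, one “format” element, and one “genre” element, in that order
The “title”, “format” and “genre” elements all have strings as their values
A “dvd” element has a required attribute “id” whose value is a string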
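Python's standard library has no validating parser, so the sketch below is a hand-written check that mirrors the four constraints listed on this slide. It is not DTD validation itself. The function name check_library and the sample documents are assumptions for this illustration.

```python
import xml.etree.ElementTree as ET


def check_library(root):
    """Return a list of violations of the constraints listed on the slide."""
    problems = []
    if root.tag != "library":
        problems.append("root element must be library")
    dvds = list(root)
    if not dvds:
        problems.append("library must contain at least one dvd")
    for dvd in dvds:
        if dvd.tag != "dvd":
            problems.append("unexpected element " + dvd.tag + " in library")
            continue
        if "id" not in dvd.attrib:
            problems.append("dvd is missing required attribute id")
        if [child.tag for child in dvd] != ["title", "format", "genre"]:
            problems.append("dvd must contain title, format, genre in that order")
        if any(len(child) > 0 for child in dvd):
            problems.append("title, format and genre must contain text only")
    return problems


good = ET.fromstring(
    '<library><dvd id="1"><title>Star Trek</title>'
    "<format>TV Series</format><genre>Science fiction</genre></dvd></library>"
)
bad = ET.fromstring(
    "<library><dvd><format>Movie</format>"
    "<title>Gone with the Wind</title><genre>Classic</genre></dvd></library>"
)
print(check_library(good))
print(check_library(bad))
```

A validating parser performs the same kind of check generically, driven by the DTD rather than by hand-written code.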
16
Declaring Elements
Empty elements
–<!ELEMENT elementName EMPTY>
Elements with text
–<!ELEMENT elementName (#PCDATA)>
–#PCDATA (parsed character data) means the element contains text that the parser scans for markup such as entity references; the element cannot contain nested elements
–CDATA is not a valid content model for elements; in a DTD it is an attribute type. Text that the parser should not scan for markup is written in a CDATA section, <![CDATA[ … ]]>, inside the instance document
Elements with generic data
–<!ELEMENT elementName ANY>
–The keyword ANY declares an element with any content as its value, including text, entity references and nested elements. Any element nested in this element must also be declared
17
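Declaring Elements …
Elements with children (sequences)
–<!ELEMENT elementName (child1, child2, …)>
Elements with zero or more nested elements
–<!ELEMENT elementName (child*)>
Elements with one or more nested elements
–<!ELEMENT elementName (child+)>
Elements with optional nested elements
–<!ELEMENT elementName (child?)>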
18
Declaring Elements …
Elements with alternative nested elements
–<!ELEMENT section (section1 | section2)>
Elements with mixed content
–<!ELEMENT email (#PCDATA | to | from | header | message)*>
–An email element can contain parsed character data interleaved with any number of to, from, header and message child elements, in any order; a DTD cannot constrain the order or number of child elements in mixed content
19
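Declaring Attributes
Syntax: <!ATTLIST elementName attributeName attributeType defaultValue>
–DTD: <!ATTLIST elementName radius CDATA "1">
–XML: <elementName/> (having default radius 1)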
20
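Declaring Attributes …
Declaring an optional attribute without default value
–DTD: <!ATTLIST elementName radius CDATA #IMPLIED>
–XML: <elementName/> (having no attribute radius)
Declaring a mandatory attribute
–DTD: <!ATTLIST elementName radius CDATA #REQUIRED>
–XML: <elementName/> (invalid element)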
21
Declaring Entity Names
Syntax: <!ENTITY entityName "entityValue">
–<!ENTITY euro "&#8364;">
–<!ENTITY cs "computer science">
–Example usage of entity names
XML: A &cs; book costs me &euro;52.
View: A computer science book costs me €52.
22
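Associating DTD Declarations to XML Documents
Including DTD declarations in an XML document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE library [
  <!ELEMENT library (dvd+)>
  <!ELEMENT dvd (title, format, genre)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT format (#PCDATA)>
  <!ELEMENT genre (#PCDATA)>
  <!ATTLIST dvd id CDATA #REQUIRED>
]>
<library>
  <dvd id="1">
    <title>Gone with the Wind</title>
    <format>Movie</format>
    <genre>Classic</genre>
  </dvd>
</library>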
23
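Associating DTD Declarations to XML Documents …
Referencing an external DTD file from an XML document, where dtdFileURL is the path or URL of the DTD file
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE library SYSTEM "dtdFileURL">
<library>
  <dvd id="1">
    <title>Gone with the Wind</title>
    <format>Movie</format>
    <genre>Classic</genre>
  </dvd>
</library>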
24
XML Schema (XSD)
An alternative industry standard for defining XML dialects
More expressive than DTD
Uses XML syntax
Promotes declaration reuse, so common declarations can be factored out and referenced by multiple element or attribute declarations
25
Example XSD
26
Example XSD …
27
XML Namespace
Tag and attribute names are supposed to be meaningful, so different dialects may choose the same names
Namespaces reduce the chance of name conflicts
A namespace is identified by a unique string, typically in URL form
An XML document can define a short prefix for each namespace to qualify tag and attribute names declared under that namespace
–“http://www.w3.org/2001/XMLSchema” is the namespace for tag and attribute names of XML Schema
–“xs” is defined as a prefix for this namespace, as in xmlns:xs="http://www.w3.org/2001/XMLSchema"
–“xs:schema”: the “schema” tag name defined in the namespace “http://www.w3.org/2001/XMLSchema”, which is bound to prefix “xs”
28
XML Namespace …
To declare "http://csis.pace.edu" to be the default namespace (to which all unqualified element names belong), use attribute xmlns="http://csis.pace.edu"
To declare that all tag/attribute names declared in the current XSD file belong to namespace "http://csis.pace.edu", use the targetNamespace attribute:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://csis.pace.edu">
Declarations in a schema element that does not specify a targetNamespace value do not belong to any namespace
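How a parser resolves prefixes and the default namespace can be seen with Python's ElementTree, which reports every tag as {namespace}localName. The sample document below is an assumption for this illustration; it combines the two namespace strings mentioned on these slides.

```python
import xml.etree.ElementTree as ET

# Sample document: a default namespace plus the xs prefix (illustrative only).
DOCUMENT = """<library xmlns="http://csis.pace.edu"
         xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <dvd id="1"/>
  <xs:annotation/>
</library>"""

root = ET.fromstring(DOCUMENT)
tags = [root.tag] + [child.tag for child in root]
for tag in tags:
    print(tag)

# Unprefixed attributes are not placed in the default namespace.
print(root[0].attrib)
```

The unprefixed library and dvd tags land in the default namespace, the xs:annotation tag lands in the XML Schema namespace, and the id attribute stays unqualified.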
29
XML Declarations: Global vs. Local
Global declarations: XSD declarations immediately nested in the top-level schema element
Global declarations are the key for reusing declarations
Local declarations: only valid in their hosting elements (not schema)
30
Declaring Simple Elements
To declare element color that can take on any string value
–<xs:element name="color" type="xs:string"/>
–Element “<color>blue</color>” will have value “blue”, and element “<color/>” will have no value
To declare element color that can take on any string value, with “red” as its default value
–<xs:element name="color" type="xs:string" default="red"/>
–Element “<color>blue</color>” will have value “blue”, and element “<color/>” will have the default value “red”
31
Declaring Simple Elements …
To declare element color that can take on only the fixed string value “red”
–<xs:element name="color" type="xs:string" fixed="red"/>
–Element “<color>red</color>” will be correct, element “<color>blue</color>” will be invalid, and element “<color/>” will have the fixed (default) value “red”
32
Declaring Attributes
Attribute declarations are always nested in their hosting elements’ declarations
To declare that lang is an attribute of type xs:string, and its default value is “EN”
–<xs:attribute name="lang" type="xs:string" default="EN"/>
If the above attribute lang doesn’t have a default value but must be specified for its hosting element
–<xs:attribute name="lang" type="xs:string" use="required"/>
33
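Declaring Complex Elements
To declare that product is an empty element type with optional integer-typed attribute pid
<xs:element name="product">
  <xs:complexType>
    <xs:attribute name="pid" type="xs:integer"/>
  </xs:complexType>
</xs:element>
–Example product elements: <product/> and <product pid="1"/>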
34
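Declaring Complex Elements …
To declare that an employee element’s value is a sequence of two nested elements: a firstName element followed by a lastName element, both of type string
<xs:element name="employee">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="firstName" type="xs:string"/>
      <xs:element name="lastName" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
–Example: <employee><firstName>Tom</firstName><lastName>Sawyer</lastName></employee>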
35
Using Global Type Declarations Global declarations promote declaration reuse
36
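Declaring Complex Type Elements
To declare a complexType element shoeSize with integer element value and a string-type attribute named country
<xs:element name="shoeSize">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="xs:integer">
        <xs:attribute name="country" type="xs:string"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
–Example: <shoeSize country="countryName">35</shoeSize>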
37
Declaring Mixed Complex Type Elements
A mixed complex type element can contain attributes, elements, and text
To declare a letter element that can have a mixture of elements and text as its value
38
Declaring Mixed Complex Type Elements …
Example letter element
Dear Mr. John Smith, Your order 1032 will be shipped on 2008-09-23.
39
Specifying Unlimited Element Order
Use “xs:all” elements in place of “xs:sequence” elements if you allow the nested elements to occur in any order
<xs:all>
  <xs:element name="firstName" type="xs:string"/>
  <xs:element name="lastName" type="xs:string"/>
</xs:all>
The firstName and lastName elements can occur in any order
40
Specifying Multiple Occurrence of an Element
Use the occurrence indicators maxOccurs and minOccurs to indicate how many times an element can occur
Attribute maxOccurs has default value 1; the value unbounded removes the upper limit
Attribute minOccurs has default value 1
To declare that the dvd element can occur zero or an unlimited number of times
<xs:element name="dvd" minOccurs="0" maxOccurs="unbounded">
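The meaning of the two occurrence indicators can be sketched as a small Python predicate. This only illustrates the counting rule and is not part of any XML Schema API; the function name occurrence_ok is hypothetical, and None stands in for maxOccurs="unbounded".

```python
from typing import Optional


def occurrence_ok(count: int, min_occurs: int = 1, max_occurs: Optional[int] = 1) -> bool:
    """Check an element count against minOccurs/maxOccurs.

    Both defaults are 1, as in XML Schema; max_occurs=None means "unbounded".
    """
    if count < min_occurs:
        return False
    return max_occurs is None or count <= max_occurs


# With the defaults, the element must occur exactly once.
print(occurrence_ok(1), occurrence_ok(0), occurrence_ok(2))

# minOccurs="0" maxOccurs="unbounded": any number of dvd elements is allowed.
print(occurrence_ok(0, min_occurs=0, max_occurs=None))
print(occurrence_ok(25, min_occurs=0, max_occurs=None))
```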
41
Specifying an XML Schema without Target Namespace
Assume that
–an XML dialect is specified with an XML Schema file schemaFile.xsd without using a target namespace
–the Schema file has URL schemaFileURL, which is either a local file system path like “schemaFile.xsd” or a Web URL like “http://csis.pace.edu/schemaFile.xsd”
The instance documents of this dialect can be associated with its XML Schema declaration with the following structure, where rootTag is the name of a root element
<rootTag xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="schemaFileURL">
42
Specifying an XML Schema with Namespace
Assume that
–an XML dialect is specified with an XML Schema file schemaFile.xsd using target namespace namespaceString
–the Schema file has URL schemaFileURL, which is either a local file system path like “schemaFile.xsd” or a Web URL like “http://csis.pace.edu/schemaFile.xsd”
The instance documents of this dialect can be associated with its XML Schema declaration with the following structure, where rootTag is the name of a root element
<rootTag xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="namespaceString schemaFileURL">
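A Python sketch of how a program might read the schema association from an instance document. The value of xsi:schemaLocation is a whitespace-separated list of namespace and URL pairs. The sample root element and its default namespace are assumptions for this illustration, and the code only reads the attributes; it does not load or apply the schema.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Sample instance document root (illustrative values).
DOCUMENT = """<library xmlns="http://csis.pace.edu"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://csis.pace.edu http://csis.pace.edu/schemaFile.xsd"/>"""

root = ET.fromstring(DOCUMENT)

# schemaLocation holds alternating namespace strings and schema URLs.
tokens = root.get("{" + XSI + "}schemaLocation", "").split()
pairs = list(zip(tokens[0::2], tokens[1::2]))
print(pairs)

# This attribute is absent here because the schema uses a target namespace.
no_namespace_location = root.get("{" + XSI + "}noNamespaceSchemaLocation")
print(no_namespace_location)
```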
43
XML Parsing and Validation with SAX and DOM
XML parsers are used to
–Read and parse XML documents
–Check whether an XML document is well-formed
–Check whether an XML (instance) document conforms to the syntax specification of its DTD or XSD declarations
Two types of XML parsers
–SAX (Simple API for XML): SAX works as a pipeline. It reads the input XML document sequentially and fires events when it detects the start or end of language features such as elements. It is memory-efficient for sequential data processing.
–DOM (Document Object Model): A DOM parser builds a complete tree data structure in memory, which makes it more convenient for detailed document analysis and language transformation.
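The two parsing styles can be compared with Python's standard library: xml.sax delivers events to a handler while the document is read, and xml.dom.minidom builds the whole tree first. This sketch extracts the titles both ways. The sample document's id values and the class name TitleCollector are assumptions for this illustration, and neither parser validates here.

```python
import xml.sax
from xml.dom import minidom

LIBRARY_XML = """<library>
  <dvd id="1"><title>Gone with the Wind</title><format>Movie</format><genre>Classic</genre></dvd>
  <dvd id="2"><title>Star Trek</title><format>TV Series</format><genre>Science fiction</genre></dvd>
</library>"""


class TitleCollector(xml.sax.ContentHandler):
    """SAX handler that collects the text of every title element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so it is buffered.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


# SAX: events fire while the document is read sequentially.
collector = TitleCollector()
xml.sax.parseString(LIBRARY_XML.encode("utf-8"), collector)
print(collector.titles)

# DOM: the full tree is built in memory and then queried.
dom = minidom.parseString(LIBRARY_XML)
dom_titles = [node.firstChild.data for node in dom.getElementsByTagName("title")]
print(dom_titles)
```

The SAX version keeps only the current title in memory, while the DOM version holds the whole document but allows random access to any node.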
44
XML Transformation with XSLT
XSL (Extensible Stylesheet Language) is the standard language for writing stylesheets that transform XML documents between dialects or into other languages
XSL stylesheets are pure XML documents
XSL includes three components:
–XSLT (XSL Transformation), an XML dialect for specifying XML transformation rules or stylesheets
–XPath, a standard notation for specifying subsets of elements in an XML document
–XSL-FO, for formatting XML documents
45
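Example XML Document
<?xml version="1.0" encoding="UTF-8"?>
<library>
  <dvd id="1">
    <title>Gone with the Wind</title>
    <format>Movie</format>
    <genre>Classic</genre>
  </dvd>
  <dvd id="2">
    <title>Star Trek</title>
    <format>TV Series</format>
    <genre>Science fiction</genre>
  </dvd>
</library>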
46
Identifying XML Nodes with XPath
Visualize all components in an XML document, including the elements, attributes and text, as graph nodes
A node is connected to another node under it if the latter is immediately nested in the former or is an attribute or text value of the former
The attribute names have symbol @ as their prefix
The sibling nodes are ordered as they appear in the XML document
library
  dvd
    title
    format
    genre
    @id
  dvd
    title
    format
    genre
    @id
47
Path Expressions
Path expressions are used to select nodes in an XML document
An absolute location path starts with a slash / and has the general form /step/step/…
A relative location path does not start with a slash / and has the general form step/step/…
In both cases, the path expression is evaluated from left to right, and each step is evaluated against the current node set to refine it
48
Path Expressions …
Each step has the following general form: [axisName::]nodeTest[predicate]
–the optional axis name specifies the tree relationship between the selected nodes and the current node
–the node test identifies a node type within an axis
–zero or more predicates further refine the selected node set
49
Path Expressions …
Expression    Description
nodeName      Selects all child elements of the current node that have the given name
/             Selects from the root node
//            Selects nodes in the document that match the selection, no matter where they are
.             Selects the current node
..            Selects the parent of the current node
@             Selects attributes
text()        Selects the text value of the current element
*             Selects any element node
@*            Selects any attribute node
node()        Selects any node of any kind (elements, attributes, …)
50
Path Expressions …
library: all the library elements in the current node set
/library: the root element library
library/dvd: all dvd elements that are children of library elements in the current node set
//dvd: all dvd elements in the document, no matter how many levels they are nested in other elements
library//title: all title elements that are descendants of the library elements in the current node set, no matter where they are under the library elements
//@id: all attributes named “id” in the document
51
Path Expressions …
/library/dvd/title/text(): the text values of all the title elements of the dvd elements
/library/dvd[1]: the first dvd child element of library
/library/dvd[last()]: the last dvd child element of library
/library/dvd[last()-1]: the last but one dvd child element of library
/library/dvd[position()<3]: the first two dvd child elements of library
//dvd[@id]: all dvd elements that have an id attribute
//dvd[@id='2']: the dvd element that has an id attribute with value 2
52
Path Expressions …
/library/dvd[genre='Classic']: all dvd child elements of library that have “Classic” as their genre value
/library/dvd[genre='Classic']/title: all title elements of dvd elements of library that have “Classic” as their genre value
/library/*: all the child elements of the library element
//*: all elements in the document
//dvd[@*]: all dvd elements that have any attribute
//title | //genre: all title and genre elements in the document
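Several of these expressions can be tried with Python's ElementTree, which implements a limited subset of XPath. In this sketch the paths are written relative to the root library element, so /library/dvd[1] becomes dvd[1] and //dvd becomes .//dvd. Features such as text(), position() and axis names are not part of that subset. The sample document's id values are assumptions for this illustration.

```python
import xml.etree.ElementTree as ET

LIBRARY_XML = """<library>
  <dvd id="1"><title>Gone with the Wind</title><format>Movie</format><genre>Classic</genre></dvd>
  <dvd id="2"><title>Star Trek</title><format>TV Series</format><genre>Science fiction</genre></dvd>
</library>"""

root = ET.fromstring(LIBRARY_XML)  # root is the library element

first = root.find("dvd[1]")                                   # /library/dvd[1]
last = root.find("dvd[last()]")                               # /library/dvd[last()]
with_id_2 = root.findall(".//dvd[@id='2']")                   # //dvd[@id='2']
classic_titles = root.findall("dvd[genre='Classic']/title")   # /library/dvd[genre='Classic']/title
all_titles = [title.text for title in root.findall(".//title")]  # //title

print(first.findtext("title"), "|", last.findtext("title"))
print([dvd.get("id") for dvd in with_id_2])
print([title.text for title in classic_titles])
print(all_titles)
```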
53
Popular Axis Names
Axis Name             Result
ancestor              Selects all ancestors (parent, grandparent, etc.) of the current node
ancestor-or-self      Selects all ancestors (parent, grandparent, etc.) of the current node and the current node itself
attribute             Selects all attributes of the current node
child                 Selects all children of the current node
descendant            Selects all descendants (children, grandchildren, etc.) of the current node
descendant-or-self    Selects all descendants (children, grandchildren, etc.) of the current node and the current node itself
following             Selects everything in the document after the end tag of the current node
following-sibling     Selects all siblings after the current node
namespace             Selects all namespace nodes of the current node
parent                Selects the parent of the current node
preceding             Selects everything in the document that is before the start tag of the current node
preceding-sibling     Selects all siblings before the current node
self                  Selects the current node
54
Path Expressions …
child::dvd: all dvd nodes that are children of the current node
attribute::id: the id attribute of the current node
child::*: all children of the current node
attribute::*: all attributes of the current node
child::text(): all text child nodes of the current node
child::node(): all child nodes of the current node
descendant::dvd: all dvd descendants of the current node
ancestor::dvd: all dvd ancestors of the current node
child::*/child::title: all title grandchildren of the current node
55
Transforming XML Documents to XHTML Documents
XSLT is an XML dialect declared under namespace “http://www.w3.org/1999/XSL/Transform”
Its root element is stylesheet or transform
Example XSLT stylesheet (listing not preserved; surviving page title text: “DVD Library Listing”)
56
Transforming XML Documents to XHTML Documents …
(stylesheet listing continued, not preserved; surviving table headers: Title, Format, Genre)
57
Transforming XML Documents to XHTML Documents …
The root element stylesheet declares a namespace prefix “xsl” for the XSL namespace “http://www.w3.org/1999/XSL/Transform”
The 4th line’s xsl:output element specifies that the output file of this transformation should follow the HTML 4.0 specification
Each xsl:template element specifies a transformation rule: if the document contains nodes satisfying the XPath expression specified by the xsl:template’s match attribute, they are transformed according to the value of this xsl:template element
Since this particular match attribute has value “/”, which selects the root node of the input XML document, the rule applies to the entire XML document
The template element’s body (element value) outputs an HTML template linked to an external CSS stylesheet named “style.css”
58
Transforming XML Documents to XHTML Documents …
The XSLT template uses an xsl:for-each element to loop through the dvd elements selected by the xsl:for-each element’s select attribute
In the loop body, the selected dvd elements are first sorted by their genre value
Then the xsl:value-of elements retrieve the values of the elements selected by their select attributes
To use a web browser to transform the earlier file dvd.xml with this XSLT file dvdToHTML.xsl into HTML, you can add the following line after the XML declaration:
<?xml-stylesheet type="text/xsl" href="dvdToHTML.xsl"?>
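Python's standard library has no XSLT processor, so the sketch below is not XSLT. It emulates in plain Python the steps described above: select the dvd elements, sort them by genre, and emit a table row with the title, format and genre values. The exact HTML layout, the function name library_to_html, and the sample document's id values are assumptions for this illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

LIBRARY_XML = """<library>
  <dvd id="2"><title>Star Trek</title><format>TV Series</format><genre>Science fiction</genre></dvd>
  <dvd id="1"><title>Gone with the Wind</title><format>Movie</format><genre>Classic</genre></dvd>
</library>"""


def library_to_html(xml_text: str) -> str:
    """Emulate the described transformation: loop over dvd elements sorted by genre."""
    root = ET.fromstring(xml_text)
    # Counterpart of xsl:for-each over the dvd elements with a sort on genre.
    dvds = sorted(root.findall("dvd"), key=lambda dvd: dvd.findtext("genre", ""))
    rows = []
    for dvd in dvds:
        # Counterpart of the xsl:value-of elements for title, format and genre.
        cells = "".join(
            "<td>" + escape(dvd.findtext(name, "")) + "</td>"
            for name in ("title", "format", "genre")
        )
        rows.append("<tr>" + cells + "</tr>")
    return (
        "<html><head><title>DVD Library Listing</title>"
        '<link rel="stylesheet" type="text/css" href="style.css"/></head>'
        "<body><table><tr><th>Title</th><th>Format</th><th>Genre</th></tr>"
        + "".join(rows)
        + "</table></body></html>"
    )


html_page = library_to_html(LIBRARY_XML)
print(html_page)
```

In the sample input the dvd elements are deliberately listed out of genre order, so the sort step is visible in the output.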
59
Transforming XML Documents to XHTML Documents … The resulting web browser presentation
60
Conclusion
XML is an important technology for data integration across heterogeneous systems
DTD and XML Schema are the major technologies for defining XML dialects for precise business data representation
XML instance documents can be validated against their DTD or XML Schema definitions with SAX or DOM validating parsers
XML documents can be transformed into different document formats with XSLT stylesheets