4/8/99 C. Edward Chow Page 1 XML Edward Chow Some of the presentation material is adapted from the tutorial at msdn.microsoft.com/xml and from articles at xml.com by Norman Walsh and Tim Bray. Code and information provided by Tim Bray of Textuality on an XML-HTML converter and by John Bower of Microsoft on VML are greatly appreciated.
4/8/99 C. Edward Chow Page 2 XML: eXtensible Markup Language A markup language for “richly” structured documents. Structured documents carry both content information and structural information. Markup (tags) specifies the structural information. XML was approved as a W3C Recommendation on 2/10/98. It is used as a meta-language to specify other markup languages, called XML applications. The list of XML applications is growing fast: –SMIL (Synchronized Multimedia Integration Language) –MathML (Mathematical Markup Language) –VML (Vector Markup Language) –E-commerce transactions, Server API, …
4/8/99 C. Edward Chow Page 3 A SMIL XML Document <audio src="audio/vodpaper.ra" id="Soundtrack 2" title="Soundtrack 2" begin="2s" end="8s" /> <video src="file:///d:/uccs/cs525/doc/videof/uccs.avi" id="videoclip 1" title="video clip 1" /> <audio src="audio/kissingcamel.ra" id="Soundtrack 3" title="kissingcamel" begin="id(Soundtrack 1)(end)"/>
4/8/99 C. Edward Chow Page 4 MathML For (a + b)^2 [MathML markup lost in extraction; surviving token text: a + b 2]
4/8/99 C. Edward Chow Page 5 VML v\:* {behavior:url(#default#VML);} <v:shape style='top: 0; left: 0; width: 250; height: 250' stroke="true" strokecolor="red" strokeweight="2" fill="true" fillcolor="green" coordorigin="0 0" coordsize=" "> <v:path v="m 8,65 l 72,65,92,11,112,65,174,65,122,100, 142,155,92,121,42,155,60,100 x e"/>
4/8/99 C. Edward Chow Page 6 How XML is used [Diagram. Labels: XML specifies XML applications (languages): SMIL, MathML, VML. XML Document, created with a text editor, Perl scripts, or an XML editor. XML (Parser) Processor. (Specific) Application. XML-capable Server/Client.]
4/8/99 C. Edward Chow Page 7 XML vs. HTML HTML: The tag semantics and tag set are “rather” fixed. With CSS1 you can create your own tags. The standards body adds and deprecates tags slowly. XML: Specifies neither semantics nor a tag set. It is a meta-language for describing markup languages, i.e., for defining tag sets. The semantics of an XML document are defined either by the applications that process it or by a stylesheet.
4/8/99 C. Edward Chow Page 8 XML vs. SGML XML is a restricted form (application profile) of Standard Generalized Markup Language (SGML). The syntax of XML is specified in Extended Backus-Naur Form (EBNF). Modern compiler techniques make parsing EBNF-based XML documents fast. The full-blown SGML syntax is much more complex: a parser cannot easily be written, and SGML documents take longer to process.
4/8/99 C. Edward Chow Page 9 XML Development Goals Viewing XML documents should be as quick and easy as viewing HTML documents. Support a wide variety of applications: authoring, browsing, content analysis,… Be compatible with SGML and allow easy conversion of SGML documents to XML. It should be easy to write programs that process XML documents (2 weeks for a CS graduate). Optional features are kept to a minimum, ideally zero. XML documents should be human-legible and reasonably clear, viewable with a text editor.
4/8/99 C. Edward Chow Page 10 XML Development Goals (2) The XML design should be prepared quickly. The XML design shall be formal and concise: it must be expressed in EBNF and be amenable to modern compiler tools and techniques. XML documents shall be easy to create. Terseness in XML markup is of minimal importance.
4/8/99 C. Edward Chow Page 11 How is XML Defined? Extensible Markup Language (XML) 1.0 Specification, 10 February 1998, Tim Bray, Jean Paoli, C. M. Sperberg-McQueen. XML Linking Language (XLink), 3 March 1998, Eve Maler, Steve DeRose. XML Pointer Language (XPointer), 3 March 1998, Eve Maler, Steve DeRose. Extensible Stylesheet Language (XSL), 16 December 1998, James Clark, Stephen Deach. Namespaces in XML, 14 January 1999, Tim Bray, Dave Hollander, Andrew Layman.
4/8/99 C. Edward Chow Page 12 A Simple XML Document [Element tags lost in extraction; surviving text content: Say goodnight Gracie. Goodnight, Gracie.] Callouts: <? begins a processing instruction; /> ends an empty element, which does not have an end-tag.
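A minimal sketch of what a document like this looks like to a parser. The slide's markup did not survive, so the document below is a hypothetical reconstruction: oldjoke, burns, and quote appear elsewhere in these slides, while allen and applause are assumed names. Python's standard xml.etree.ElementTree is used purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction; the element names are assumptions, not the slide's lost markup.
SIMPLE_DOC = """<?xml version="1.0"?>
<oldjoke>
  <burns>Say <quote>goodnight</quote>, Gracie.</burns>
  <allen><quote>Goodnight, Gracie.</quote></allen>
  <applause/>
</oldjoke>"""

root = ET.fromstring(SIMPLE_DOC)

# The single root element contains three child elements.
child_names = [child.tag for child in root]

# <burns> has mixed content: itertext() yields its text pieces in document order.
burns_text = "".join(root.find("burns").itertext())

# <applause/> is an empty element: no text and no children.
applause = root.find("applause")

print(child_names)
print(burns_text)
```

The empty-element form `<applause/>` is equivalent to `<applause></applause>`; the parser reports both the same way.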
4/8/99 C. Edward Chow Page 13 Weather Report in XML March 25, :00 Seattle WA West Coast USA partly cloudy 46 SW
4/8/99 C. Edward Chow Page 14 On-Line Bidding Record Risotto Linda Mann 20x30 inches Oil :59:51 AM Paul :59:40 AM John :27:08 PM opening price
4/8/99 C. Edward Chow Page 15 Stock Portfolio Record in XML zacx corp ZCXM zaffymat inc ZFFX zysmergy inc ZYSZ
4/8/99 C. Edward Chow Page 16 Embed HTML in XML --> <article xmlns=" xmlns:html="any-old-bollocks" > Test XML, name space, HMTL tag It seems to be critical to include < ?xml-stylesheet href="first-x.css" type="text/css" ?> Otherwise it won't work. The following is the demonstration of < html:ul> and < html:li> tags. What is XML How to create it
4/8/99 C. Edward Chow Page 17 Six Types of Markup in XML Elements: begin with a start-tag and end with an end-tag. Entity references: special characters, repeated text, external file content. Comments: <!-- comment text -->. Processing instructions: information to be passed to the application. Marked (CDATA) sections: <![CDATA[ ... ]]>, transparent text. What if we have ]]> as part of the text? Document type declarations.
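One common answer to the ]]> question is to close the CDATA section in the middle of the sequence and immediately open a new one. The sketch below assumes that approach; cdata_wrap is a hypothetical helper name, and the sample text is made up for illustration.

```python
import xml.etree.ElementTree as ET

def cdata_wrap(text: str) -> str:
    """Wrap text in CDATA, splitting every ']]>' across two adjacent sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

# Made-up text that contains both markup characters and the ']]>' sequence.
raw = "if (a[b[0]]> 1 && x < y) { ... }"

doc = "<example>" + cdata_wrap(raw) + "</example>"

# The parser joins the adjacent CDATA sections back into one run of character data.
parsed_text = ET.fromstring(doc).text
print(parsed_text)
```

Inside a CDATA section, < and & need no escaping, which is why the sample text can be embedded verbatim.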
4/8/99 C. Edward Chow Page 18 Element Elements are the most common form of markup. Tags are delimited by angle brackets: <element> is the start-tag and </element> is the end-tag, where “element” is the element name. Elements identify the nature of the content they surround. Some elements may be empty; they are ended with />. Attributes are name-value pairs that occur inside start-tags. In XML, all attribute values must be quoted.
4/8/99 C. Edward Chow Page 19 Entity References Entity references are used to represent special characters (e.g., <), repeated text, and external file content. Each entity must have a unique name. An entity reference begins with ‘&’ and ends with ‘;’. For example, &lt;element&gt; stands for <element>. Character references: the decimal reference &#8478; and the hexadecimal reference &#x211E; both denote ℞ (the Rx symbol).
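A small sketch showing how a parser resolves predefined entity references and character references. The note element name is made up for illustration.

```python
import xml.etree.ElementTree as ET

# &lt; &gt; &amp; are predefined entities; &#8478; and &#x211E; are decimal and
# hexadecimal character references to the same code point (U+211E).
doc = "<note>&lt;element&gt; &amp; &#8478; &#x211E;</note>"

text = ET.fromstring(doc).text
print(text)
```

After parsing, the application sees only the resolved characters; it cannot tell which form of reference the author typed.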
4/8/99 C. Edward Chow Page 20 Processing Instruction Processing instructions (PIs) are an escape hatch for providing information to an application. Their form: <?name pidata?> where name, called the PI target, identifies the PI to the application, and pidata is the information passed. PI names beginning with xml are reserved for XML standardization.
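A sketch of how an application can read the PI target and data, using the xml-stylesheet instruction quoted on the "Embed HTML in XML" slide. The standard xml.dom.minidom module is used here for illustration only; the empty article element is a stand-in document.

```python
from xml.dom import Node, minidom

doc = minidom.parseString(
    '<?xml-stylesheet href="first-x.css" type="text/css"?><article/>'
)

# Collect (target, data) for every processing instruction at document level.
pis = [
    (node.target, node.data)
    for node in doc.childNodes
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
]
print(pis)
```

Note that the PI data is an uninterpreted string: the href="..." pairs look like attributes, but it is up to the application to parse them.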
4/8/99 C. Edward Chow Page 21 Document Type Declaration A large part of the XML specification deals with the declarations that are allowed in XML. They are derived from SGML Document Type Definitions (DTD). Declarations express constraints on –the tag sequence, –nesting of tags, –attribute values and defaults, –format and name of external files, and –entities that may be encountered. The DTD allows a document to communicate meta-information about its content to the parser.
4/8/99 C. Edward Chow Page 22 4 Types of Declarations in XML Element type declarations: identify the names of elements and the nature of their content. Attribute list declarations. Entity declarations. Notation declarations.
4/8/99 C. Edward Chow Page 23 Element Type Declaration Element type declarations identify the names of elements and the nature of their content. Form: <!ELEMENT name content-model>. The content model defines what an element may contain. It follows typical regular expression usage. –‘,’ specifies a sequence of elements. –‘+’ must occur at least once and may repeat. –‘?’ is optional: the element may be absent or occur exactly once. –A name without punctuation must occur exactly once. –The names referenced in a content model must be declared in the DTD for the XML processor to check the validity of the document.
4/8/99 C. Edward Chow Page 24 Content Models Besides element names, the special symbol #PCDATA (parsed character data) is reserved to indicate character data. Elements that contain only other elements are said to have “element content”. Elements that contain both other elements and #PCDATA are said to have “mixed content”. E.g., <!ELEMENT burns (#PCDATA | quote)*> –‘|’ is an or relationship. –‘*’ may occur zero or more times. –Here burns may contain zero or more characters and quote tags, mixed in any order. –#PCDATA must precede the other elements.
4/8/99 C. Edward Chow Page 25 Empty and Any Content Model EMPTY (all upper case) indicates an element with no content and no end-tag. ANY (all upper case) indicates that any content is allowed. It is useful in document conversion. Avoid it in a production environment.
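Because content models follow regular expression usage, a simple element-content model can be translated mechanically into a regular expression over the sequence of child element names. The sketch below is only an illustration of that correspondence, not how an XML processor is actually implemented. It handles names, ',', '|', parentheses, and the '?', '+', '*' suffixes, and ignores #PCDATA. The model (burns+, allen, applause?) and the function names are assumptions made for the example.

```python
import re

def content_model_to_regex(model: str):
    """Translate a simple element-content model into a regex over 'name ' tokens."""
    parts = []
    for token in re.findall(r"[A-Za-z_][\w.-]*|[(),|?+*]", model):
        if token == ",":
            continue  # a sequence is plain concatenation in a regex
        if token == "(":
            parts.append("(?:")
        elif token in ")|?+*":
            parts.append(token)  # same meaning in both notations
        else:
            # Each child name becomes one token followed by a space separator.
            parts.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(parts))

def children_match(model: str, children: list) -> bool:
    """Check a list of child element names against the content model."""
    sequence = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(sequence) is not None

MODEL = "(burns+, allen, applause?)"  # assumed example model
print(children_match(MODEL, ["burns", "burns", "allen"]))
print(children_match(MODEL, ["allen"]))
```

The trailing space after each name keeps one element name from matching as a prefix of another.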
4/8/99 C. Edward Chow Page 26 Element Declaration for the Simple XML Example
4/8/99 C. Edward Chow Page 27 Attribute List Declarations Identify which elements may have attributes, what attributes they may have, what values the attributes may hold, and what value is the default. <!ATTLIST oldjoke name ID #REQUIRED label CDATA #IMPLIED status (funny|notfunny) 'funny'> Each attribute has 3 parts: name, type, and default value. The oldjoke element has 3 attributes. CDATA: string (character data).
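A sketch of how a declared default reaches the application. It places the slide's ATTLIST in an internal DTD subset and uses Python's low-level xml.parsers.expat interface, which by default reports defaulted attributes alongside specified ones. The attribute value j1 and the handler name are made up for the example. Expat is non-validating, so it does not enforce #REQUIRED or the enumerated type.

```python
from xml.parsers import expat

DOC = """<!DOCTYPE oldjoke [
  <!ATTLIST oldjoke
            name   ID                #REQUIRED
            label  CDATA             #IMPLIED
            status (funny|notfunny)  'funny'>
]>
<oldjoke name="j1"/>"""

seen = {}

def on_start(name, attrs):
    # attrs holds both the attributes written in the start-tag and declared defaults.
    seen[name] = dict(attrs)

parser = expat.ParserCreate()
parser.StartElementHandler = on_start
parser.Parse(DOC, True)

print(seen["oldjoke"])
```

The document never mentions status, yet the application sees status="funny"; label is #IMPLIED, so the processor proceeds without it.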
4/8/99 C. Edward Chow Page 28 6 possible Attribute Types CDATA: strings; any text is allowed. ID: the value of an ID attribute must be a name that is unique within the document. An element may have at most one ID attribute. IDREF (IDREFS): the value must be the ID of an element in the document; IDREFS allows multiple values separated by white space. ENTITY (ENTITIES): must be the name of a single entity (or multiple entity names separated by white space). NMTOKEN (NMTOKENS): a restricted form of string attribute; must be a single word (or multiple words separated by white space). A list of names (enumerated type).
4/8/99 C. Edward Chow Page 29 4 possible Default Values #REQUIRED: the attribute value must be specified in the document. #IMPLIED: the attribute value is not required and no default is provided. The XML processor must proceed without one. "value": the default used when the attribute is not specified. #FIXED "value": the attribute is not required, but when it appears it must have the specified value. Used to associate semantics with an element. The XML processor performs attribute value normalization, recursively resolving character references and entity references.
4/8/99 C. Edward Chow Page 30 Entity Declarations Associate a name with a fragment of content. This content can be regular text, part of the DTD, or a reference to an external file. Internal entities: first example. Use &ATI; anywhere in the document to insert ArborText, Inc. It is a shortcut for frequently typed text or text that is expected to change. 5 predefined internal entities: &lt; &gt; &amp; &apos; &quot; (for < > & ' ").
4/8/99 C. Edward Chow Page 31 External Entities Associate a name with the content of another file. The content is inserted and parsed as part of the referring document. 2nd example: &boilerplate; includes the /standard/legal.xml file and parses it. Binary data (figures or other non-XML content), indicated by NDATA, is not parsed and may only be referenced in an attribute. 3rd example: ATIlogo can be used as the value of an ENTITY attribute.
4/8/99 C. Edward Chow Page 32 Parameter Entities Can occur only in the DTD. Declared with % in front of the name and referenced as %name;. Parameter entity references are expanded immediately in the DTD; normal (general) entity references are not expanded there.
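A sketch of an internal entity in use. The slide's example declaration was lost, so the declaration below is an assumed form built from the entity name and replacement text the slide mentions; the memo element is made up. The declaration sits in the internal DTD subset, and the parser substitutes the replacement text wherever the reference appears.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE memo [
  <!ENTITY ATI "ArborText, Inc.">
]>
<memo>&ATI; wrote &lt;memo&gt; &amp; &quot;more&quot;.</memo>"""

# &ATI; is declared above; &lt; &gt; &amp; &quot; are predefined and need no declaration.
text = ET.fromstring(DOC).text
print(text)
```

Changing the single ENTITY declaration changes every place the reference is used, which is the point of using an entity for text that is expected to change.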
4/8/99 C. Edward Chow Page 33 Parameter Entities Example
4/8/99 C. Edward Chow Page 34 Notation Declarations Identify specific types of external binary data. This information is passed to the processing application.
4/8/99 C. Edward Chow Page 35 Where DTD is required XML content can be processed without a DTD. The structure of the data can be enforced with a DTD. Here are instances where a DTD is required: Authoring environments: they need it to enforce the content models of documents. Default attribute values: when a document relies on default attribute values, at least part of the DTD needs to be processed. White space handling: the semantics of white space in element content differ from those in mixed content.
4/8/99 C. Edward Chow Page 36 Including a DTD The document type declaration must be the first thing in the document after optional processing instructions and comments. <!DOCTYPE chapter SYSTEM "dbook.dtd" [ <!ATTLIST ulink xml:link CDATA #FIXED "SIMPLE" xml-attribute CDATA #FIXED "HREF URL" URL CDATA #REQUIRED> ]> …
4/8/99 C. Edward Chow Page 37 DTD Processing The document type declaration identifies the root element of the document (in this case, “chapter”). All XML documents must have a single root element that contains all the content of the document. Additional declarations may come from an external DTD (the external subset) or be included directly (the internal subset). Here dbook.dtd is the external subset, and the internal subset adds attribute declarations for the ulink (simple link) element. Declarations in the internal subset override those in the external subset. standalone="no" indicates that both the external and internal subsets must be processed.
4/8/99 C. Edward Chow Page 38 Well-formed Documents A document is well-formed if it obeys the syntax of XML. A document that includes sequences of markup characters that cannot be parsed or are invalid is not well-formed. The document must conform to the grammar of XML documents. Some markup constructs, such as parameter entity references with %, are only allowed in specific places (parameter entity references only in the DTD). No attribute may appear more than once on the same start-tag. String attribute values cannot contain references to external entities. Non-empty tags must be properly nested. Parameter entities must be declared before they are referenced. All entities except amp, lt, gt, apos, and quot must be declared. A binary entity cannot be referenced in the flow of content, only in an attribute. Neither text nor parameter entities are allowed to be recursive, directly or indirectly. By definition, if a document is not well-formed, it is not XML.
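A quick way to see several of these rules in action is to hand small documents to a parser and observe which ones it rejects. The sketch below uses Python's xml.etree.ElementTree; is_well_formed is a hypothetical helper name and the sample documents are made up.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

samples = {
    "properly nested": "<a><b/></a>",
    "improperly nested tags": "<a><b></a></b>",
    "attribute repeated on one start-tag": '<a x="1" x="2"/>',
    "undeclared entity": "<a>&undeclared;</a>",
    "two root elements": "<a/><b/>",
}

for label, doc in samples.items():
    print(label, is_well_formed(doc))
```

Only the first sample is accepted. The parser here does not validate against a DTD, so it checks well-formedness only, not validity.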
4/8/99 C. Edward Chow Page 39 XLink Working Draft 1.0 XLink specifies the relationships between resources or portions of resources. Since XML does not have a fixed tag set, links are identified by the xml:link attribute. Simple link: Link Text. Here the location can be a URL, a query, or an extended pointer. Extended link: Multimedia SMIL Stock Star Picture
4/8/99 C. Edward Chow Page 40 XPointer Working Draft 1.0 XPointers operate on the tree defined by the elements and other markup constructs of an XML document. An XPointer consists of a series of location terms, each of which specifies a location, usually relative to the location specified by the prior location term. Each location term has a keyword (such as id, child, ancestor, and so on) and can have arguments such as an instance number, element type, or attribute. For example, the location term child(2,CHAP) refers to the second child element whose type is CHAP. child(2,oldjoke).(3,.) : locates the 3rd child (no restriction on type) of the 2nd oldjoke element in the document.
4/8/99 C. Edward Chow Page 41 XPointers Span regions of the tree: span(child(2,oldjoke), child(3,oldjoke)) selects the 2nd and 3rd oldjoke in the document. Selection by ID, attribute value, and string matching: span(root()child(3,sect1)string(1,"Here",0), root()child(3,sect1)string(1,"Here",4)) selects the 1st instance of the word “Here” in the 3rd section of the document. Note that we do not have to add an anchor to the referenced document.
4/8/99 C. Edward Chow Page 42 XSL Extensible Stylesheet Language, a working draft, to be completed later. A language for specifying the association of presentation style with XML information. It contains two parts: –A transformation language for preparing a document for display. XSL can be used to convert (e.g., reorder) one XML document into another XML document. Transformation to an HTML document can be handled as a special case. –A Formatting Object (FO) set for the actual visual styling.
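To make the child location term concrete, the sketch below emulates child(n, type) over a parsed tree in Python. It is not an XPointer implementation; the child function, the jokes document, and its id values are all made up for illustration.

```python
import xml.etree.ElementTree as ET

def child(node, n, element_type=None):
    """Return the n-th (1-based) child element, optionally restricted to one element type."""
    candidates = [c for c in node if element_type is None or c.tag == element_type]
    return candidates[n - 1]

# Hypothetical document with two oldjoke elements under the root.
doc = ET.fromstring(
    "<jokes><intro/>"
    "<oldjoke id='a'><burns/><allen/><applause/></oldjoke>"
    "<oldjoke id='b'><burns/><allen/><applause/></oldjoke>"
    "</jokes>"
)

# Emulates child(2,oldjoke).(3,.): the 3rd child, of any type, of the 2nd oldjoke.
second_joke = child(doc, 2, "oldjoke")
third_child = child(second_joke, 3)
print(second_joke.get("id"), third_child.tag)
```

Each call starts from the location found by the previous one, which mirrors how each location term is relative to the prior term.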
4/8/99 C. Edward Chow Page 43 Presentation Process First, the result tree is constructed from the source tree. –achieved by associating patterns with templates. –A pattern is matched against elements in the source tree. –When a pattern is matched, its template is instantiated to create part of the result tree –source tree can be filtered and reordered, and arbitrary structure can be added. Second, the result tree is interpreted to produce formatted output on a display, on paper, in speech or onto other media.
4/8/99 C. Edward Chow Page 44 XSL for XML Enabling display: The XSL Transformation Language enables display of XML by transforming XML into grammar and structure suitable for display—for instance, HTML or the XSL Formatting Objects language. Direct browsing of XML files: Internet Explorer 5 can apply XSL style sheets that produce HTML, allowing direct browsing of the XML files. Content delivery to downlevel browsers: XSL transformations can be executed on the server to provide HTML documents for downlevel browsers. Schema Translation: The transformation process is independent of any particular output grammar and can be used for translating XML data from one schema to another. Converting XML through querying, sorting, and filtering: The transformation can be used for general-purpose transformations within a single grammar, including filtering, sorting, and summarizing data.
4/8/99 C. Edward Chow Page 45 Result Tree Construction A template can contain elements that specify literal result element structure. A template can also contain elements that are instructions for creating result tree fragments. When a template is instantiated, each instruction is executed and replaced by the result tree fragment that it creates. Instructions can select and process descendant elements The result tree is constructed by finding the template rule for the root node and instantiating its template.
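The pattern/template mechanism can be imitated in a few lines of Python. The sketch below is a toy, not XSL and not how an XSL processor is implemented: patterns are reduced to plain element names, and templates are Python functions that return result tree fragments. The element names (breakfast-menu, food, name, price) are assumptions; the text values are taken from the Simple.xml slide.

```python
import xml.etree.ElementTree as ET

SOURCE = ET.fromstring(
    "<breakfast-menu>"
    "<food><name>Belgian Waffles</name><price>$5.95</price></food>"
    "<food><name>Strawberry Belgian Waffles</name><price>$7.95</price></food>"
    "</breakfast-menu>"
)

def apply_templates(node, rules):
    """Find the rule whose pattern matches node and instantiate its template."""
    template = rules.get(node.tag)
    if template is None:
        # No matching rule: produce nothing for this node, but process its children.
        fragments = []
        for c in node:
            fragments.extend(apply_templates(c, rules))
        return fragments
    return template(node, rules)

def menu_template(node, rules):
    ul = ET.Element("UL")                  # literal result element
    for c in node:                         # instruction: process the child elements
        ul.extend(apply_templates(c, rules))
    return [ul]

def food_template(node, rules):
    li = ET.Element("LI")
    li.text = f"{node.findtext('name')} - {node.findtext('price')}"
    return [li]

RULES = {"breakfast-menu": menu_template, "food": food_template}

# Construction starts by instantiating the template that matches the root.
result = apply_templates(SOURCE, RULES)[0]
html = ET.tostring(result, encoding="unicode")
print(html)
```

The result tree is built top-down: the root's template creates the literal UL element, and each nested call replaces an instruction with the fragment it produces.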
4/8/99 C. Edward Chow Page 46 XSL name space XSL uses XML namespaces, to distinguish elements that are instructions to the XSL processor from elements that specify literal result tree structure. Instruction elements all belong to the XSL namespace, xmlns:xsl=“ An XSL stylesheet contains an xsl:stylesheet document element. This element may contain xsl:template elements specifying template rules. A stylesheet contains a set of template rules. A template rule has two parts: a pattern which is matched against nodes in the source tree and a template which can be instantiated to form part of the result tree.
4/8/99 C. Edward Chow Page 47 XSL Elements
xsl:apply-templates: Directs the XSL processor to find the appropriate template to apply based on the results of the pattern.
xsl:attribute: Creates an attribute node and attaches it to the output element.
xsl:choose: Provides multiple conditional testing in conjunction with the xsl:otherwise and xsl:when elements.
xsl:comment: Generates a comment in the output.
xsl:copy: Copies the target node from the source to the output.
xsl:element: Creates an element with the specified name in the output.
xsl:eval *: Computes a string of generated text.
xsl:for-each: Allows application of the same template to multiple nodes.
xsl:if: Allows conditional subpatterns within a template.
xsl:otherwise: Provides multiple conditional testing in conjunction with the xsl:choose and xsl:when elements.
xsl:pi: Generates a processing instruction in the output.
xsl:script *: Defines global variable declarations and functions.
xsl:stylesheet: Defines the set of templates to be applied to the input source tree to generate the output source tree.
xsl:template: Defines a template for the output of nodes of a specific pattern.
xsl:value-of: Evaluates an XSL pattern in the select attribute and returns the value of the requested node as text, which is inserted into the template.
xsl:when: Provides multiple conditional testing in conjunction with the xsl:choose and xsl:otherwise elements.
* Microsoft proprietary extensions to support scripting.
4/8/99 C. Edward Chow Page 48 XSL Pattern XSL Patterns are constructed using the operators and special characters shown in the following table.
/ Child operator; selects immediate children of the left-side collection. When this path operator appears at the start of the pattern, it indicates that children should be selected from the root node.
// Recursive descent; searches for the specified element at any depth. When this path operator appears at the start of the pattern, it indicates recursive descent from the root node.
. Indicates the current context.
* Wildcard; selects all elements regardless of the element name.
@ Attribute; prefix for an attribute name.
@* Attribute wildcard; selects all attributes regardless of name.
: Namespace separator; separates the namespace prefix from the element or attribute name.
! * Applies the specified method to the reference node.
( ) * Groups operations to explicitly establish precedence.
[ ] Applies a filter pattern.
[ ] * Subscript operator. Used for indexing within a collection.
4/8/99 C. Edward Chow Page 49 XSL Pattern Examples
//author : Find all author elements anywhere within the current document.
bookstore[@specialty = "textbooks"] : Find all bookstores where the value of the specialty attribute is equal to "textbooks".
author[first-name][2] : Find the third author element that has a first-name (indexes are zero-based).
author[degree and not(publication)] : Find all author elements that contain at least one degree element and that contain no publication elements.
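Similar selections can be tried out with the limited path syntax of Python's xml.etree.ElementTree. This is an approximation for experimenting, not the XSL Pattern language itself: ElementTree has no not() function, so the last example filters in Python instead. The stores document is made up for illustration.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<stores>"
    "<bookstore specialty='textbooks'>"
    "<book><author><degree/></author></book>"
    "</bookstore>"
    "<bookstore specialty='novels'>"
    "<book><author><degree/><publication/></author><author/></book>"
    "</bookstore>"
    "</stores>"
)

# Like //author: author elements at any depth.
all_authors = doc.findall(".//author")

# Like bookstore[@specialty = "textbooks"]: filter on an attribute value.
textbook_stores = doc.findall("bookstore[@specialty='textbooks']")

# Like author[degree]: authors that contain at least one degree element.
with_degree = doc.findall(".//author[degree]")

# Like author[degree and not(publication)], with the not() part done in Python.
degree_no_publication = [a for a in with_degree if a.find("publication") is None]

print(len(all_authors), len(textbook_stores), len(with_degree), len(degree_no_publication))
```

The square brackets act as a filter on the collection to their left, just as in the pattern table.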
4/8/99 C. Edward Chow Page 50 A Grade XSL Example: grade.xml CS Spring Edward Chow Roy Rogers
4/8/99 C. Edward Chow Page 51 A Grade XSL Example <![CDATA[ counter = 0; function even(e) { counter = counter+1; return counter%2 == 0; } ]]> <BODY STYLE="font-family:Arial, helvetica, sans-serif; font-size:12pt; background-color:#EEEEEE"> Grade Web Page for
4/8/99 C. Edward Chow Page 52 NAME HW1 HW2 HW3 HW4 HW5 MIDTERM FINAL background-color:magenta even row
4/8/99 C. Edward Chow Page 53 Formatted Grade Web Page
4/8/99 C. Edward Chow Page 54 Simple Example: Simple.xml Belgian Waffles $5.95 two of our famous Belgian Waffles with plenty of real maple syrup 650 Strawberry Belgian Waffles $7.95 light Belgian waffles covered with strawberries and whipped cream 900
4/8/99 C. Edward Chow Page 55 IE5.0 Display of simple.xml
4/8/99 C. Edward Chow Page 56 Simple.xsl <BODY STYLE="font-family:Arial, helvetica, sans-serif; font-size:12pt; background-color:#EEEEEE"> - ( calories per serving)
4/8/99 C. Edward Chow Page 57 Transform Viewer Example
4/8/99 C. Edward Chow Page 58 Homework#9 Create an XML document with your catalog database content from Homework #8. Create an XSL document that transforms the XML document into an HTML page with a table containing all the fields, similar to gradeS3.xml and grade.xsl. You can choose to sort by name or by price. Use and xslex1s3.xsl as a template. Save the .xml and .xsl files in your directory at frodo. Verify correctness using IE 5.0. Send me the URL of the .xml file when done.