A Technical Introduction to XML Transparency No. 1 A Technical Introduction to XML Cheng-Chia Chen March 2004.



Documents
A data object is an XML document if it is well-formed, as defined in this specification. A well-formed XML document may in addition be valid if it meets certain further constraints.
Two views of an XML document:
Physical structure:
  - The document is composed of units called entities.
  - An entity may refer to other entities to cause their inclusion in the document.
  - The document begins in a "root" or document entity.
Logical structure:
  - The document is composed of declarations, elements, comments, character references, and processing instructions, all of which are indicated in the document by explicit markup.
The logical and physical structures must nest properly, as described in "4.3.2 Well-Formed Parsed Entities".

Well-formed XML documents
[1] document ::= prolog element Misc*
A textual object is a well-formed XML document if it matches the document production. Matching the document production implies that:
1. It contains one or more elements.
2. There is exactly one element, called the root or document element, no part of which appears in the content of any other element. For all other elements, if the start-tag is in the content of another element, the end-tag is in the content of the same element.
More simply stated, the elements, delimited by start- and end-tags, nest properly within each other. An element that directly contains another element is its parent; the contained element is its child.
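The nesting rules above can be tried out with a non-validating parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the helper name is_well_formed is an assumption made for illustration, and the check covers well-formedness only, not validity.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<a><b>text</b></a>"))  # elements nest properly
print(is_well_formed("<a><b>text</a></b>"))  # start- and end-tags overlap
print(is_well_formed("<a/><b/>"))            # more than one root element
```

Only the first document satisfies the document production; the other two break the nesting rule and the single-root rule respectively.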

Characters
A parsed entity contains text, a sequence of characters, which may represent markup or character data. A character is an atomic unit of text as specified by ISO/IEC 10646 [ISO/IEC 10646]. Legal characters are tab, carriage return, line feed, and the legal graphic characters of Unicode and ISO/IEC 10646.
Character Range
[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
Character encoding may vary from entity to entity. All XML processors must accept the UTF-8 and UTF-16 encodings of ISO/IEC 10646.
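As an illustration, production [2] translates almost directly into a range test. This is a sketch only; the function name is_xml_char is hypothetical.

```python
def is_xml_char(ch: str) -> bool:
    """Check a single character against production [2] Char."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


print(is_xml_char("\t"))      # tab is legal
print(is_xml_char("\x0b"))    # vertical tab is not
print(is_xml_char("\ufffe"))  # FFFE is excluded
```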

Common Syntactic Constructs
This section defines some symbols used in the grammar.
White Space
[3] S ::= (#x20 | #x9 | #xD | #xA)+
S (white space) consists of one or more space (#x20) characters, carriage returns, line feeds, or tabs.
Names and Tokens
[4] NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
[5] Name ::= (Letter | '_' | ':') (NameChar)*
[6] Names ::= Name (S Name)*
[7] Nmtoken ::= (NameChar)+
[8] Nmtokens ::= Nmtoken (S Nmtoken)*
Names beginning with (x|X)(m|M)(l|L) are reserved.

Common Syntactic Constructs (cont'd)
Literals
[9] EntityValue ::= '"' ([^%&"] | PEReference | Reference)* '"' | "'" ([^%&'] | PEReference | Reference)* "'"
[10] AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
[11] SystemLiteral ::= ('"' [^"]* '"') | ("'" [^']* "'")
[12] PubidLiteral ::= '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
[13] PubidChar ::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]
Literal data is any quoted string not containing the quotation mark used as a delimiter for that string. Literals are used for specifying the content of internal entities (EntityValue), the values of attributes (AttValue), and external identifiers (SystemLiteral).

Character Data and Markup
Text consists of intermingled character data and markup. Markup takes the form of start-tags, end-tags, empty-element tags, entity references, character references, comments, CDATA section delimiters, document type declarations, processing instructions, XML declarations, text declarations, and any white space outside the root element.
All text that is not markup constitutes the character data of the document.

Character Data and Markup (cont'd)
Usage of special characters:
The ampersand character (&) and the left angle bracket (<) may appear in their literal form only when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. They are also legal within the literal entity value of an internal entity declaration. If needed elsewhere, they must be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively.
The right angle bracket (>) may be represented using the string "&gt;", and must be escaped using "&gt;" or a numeric character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section. For example, a literal "]]>" in content is written "]]&gt;".

Character Data and Markup (cont'd)
In the content of elements, character data is any string of characters which does not contain the start-delimiter of any markup. In a CDATA section, character data is any string of characters not including the CDATA-section-close delimiter, "]]>".
To allow attribute values to contain both single and double quotes, the apostrophe or single-quote character (') may be represented as "&apos;", and the double-quote character (") as "&quot;".
Character Data
[14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
i.e., any string containing none of '<', '&', or ']]>'.
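In practice the escaping rules are usually delegated to a library. The sketch below uses escape and quoteattr from Python's standard xml.sax.saxutils module; the sample strings are made up for illustration.

```python
from xml.sax.saxutils import escape, quoteattr

# escape() rewrites &, < and > so the text is safe as element content.
content = escape("AT&T <b> ]]>")
print(content)

# quoteattr() escapes the value and picks a quoting style that works
# even when the value contains both kinds of quotation mark.
attribute = quoteattr("say \"hi\" & 'bye'")
print(attribute)
```

Escaping every ">" is stricter than the rule requires, but it guarantees that "]]>" can never appear by accident in content.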

Comments
Comments may appear:
1. anywhere in a document outside other markup;
2. within the document type declaration at places allowed by the grammar.
They are not part of the document's character data. The string "--" (double-hyphen) must not occur within comments.
Comments
[15] Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
Example: <!-- declarations for <head> & <body> -->

Processing Instructions (PIs)
Processing instructions (PIs) allow documents to contain instructions for applications.
Processing Instructions
[16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
[17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
PIs are not part of the document's character data, but must be passed through to the application. The PI begins with a target (PITarget) used to identify the application. The target names "XML", "xml", and so on are reserved for standardization.

CDATA Section
CDATA sections may occur anywhere character data may occur. They are used to escape blocks of text containing characters which would otherwise be recognized as markup. CDATA sections begin with the string "<![CDATA[" and end with the string "]]>".
CDATA Sections
[18] CDSect ::= CDStart CData CDEnd
[19] CDStart ::= '<![CDATA['
[20] CData ::= (Char* - (Char* ']]>' Char*))
[21] CDEnd ::= ']]>'
Within a CDATA section, only the CDEnd string is recognized as markup, so that left angle brackets '<' and ampersands '&' may, and must, occur in their literal form.
Example: <![CDATA[<greeting>Hello, world!</greeting>]]>
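A short check of this behavior, assuming Python's standard xml.etree.ElementTree as the parser: the markup-like text inside the CDATA section reaches the application as plain character data. The element name example is made up.

```python
import xml.etree.ElementTree as ET

doc = "<example><![CDATA[<greeting>Hello, world!</greeting> & more]]></example>"
root = ET.fromstring(doc)

# The CDATA content is character data: no child element is created
# and the ampersand needs no escaping.
print(root.text)
print(len(root))  # number of child elements
```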

Prolog and Document Type Declaration
A well-formed but not valid document:
<?xml version="1.0"?>
<greeting>Hello, world!</greeting>
Prolog
[22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
[23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[24] VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
[25] Eq ::= S? '=' S?
[26] VersionNum ::= ([a-zA-Z0-9_.:] | '-')+
[27] Misc ::= Comment | PI | S

Document Type Declaration
The XML document type declaration contains or points to markup declarations that provide a grammar for a class of documents. This grammar is known as a document type definition, or DTD.
The document type declaration can point to an external subset (a special kind of external entity) containing markup declarations, or can contain the markup declarations directly in an internal subset, or can do both. The DTD for a document consists of both subsets taken together.
A markup declaration is an element type declaration, an attribute-list declaration, an entity declaration, or a notation declaration.

Document Type Declaration (cont'd)
Document Type Definition
[28] doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' (markupdecl | DeclSep)* ']' S?)? '>' [ VC: Root Element Type ]
[28a] DeclSep ::= PEReference | S
[29] markupdecl ::= elementdecl | AttlistDecl | EntityDecl | NotationDecl | PI | Comment [ VC: Proper Declaration/PE Nesting ] [ WFC: PEs in Internal Subset ]

WFC: PEs in Internal Subset
Well-formedness constraint: in the internal DTD subset, parameter-entity references can occur only where markup declarations can occur, not within markup declarations. (This does not apply to references that occur in external parameter entities or to the external subset.)
Ex: the following is not well-formed:
<!DOCTYPE test SYSTEM "test.dtd" [ ... ]>
A PE reference cannot appear inside a markup declaration in the internal subset, even if the entity (here YN) is defined in test.dtd.

External Subset
Like the internal subset, the external subset and any external parameter entities referred to in the DTD must consist of a series of complete markup declarations of the types allowed by the non-terminal symbol markupdecl, interspersed with white space or parameter-entity references. However, portions of the contents of the external subset or of external parameter entities may conditionally be ignored by using the conditional section construct; this is not allowed in the internal subset.
External Subset
[30] extSubset ::= TextDecl? extSubsetDecl
[31] extSubsetDecl ::= ( markupdecl | conditionalSect | DeclSep )*

Example XML documents
An example of an XML document with a document type declaration:
<?xml version="1.0"?>
<!DOCTYPE greeting SYSTEM "hello.dtd">
<greeting>Hello, world!</greeting>
The system identifier "hello.dtd" gives the URI of a DTD for the document.
The declarations can also be given locally, as in this example:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE greeting [
  <!ELEMENT greeting (#PCDATA)>
]>
<greeting>Hello, world!</greeting>
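To see how a parser exposes the document type declaration, here is a small sketch using Python's standard xml.dom.minidom. The document text is an assumed example with a locally declared greeting element; minidom does not validate, it only records the declaration.

```python
from xml.dom import minidom

doc_text = """<?xml version="1.0"?>
<!DOCTYPE greeting [
  <!ELEMENT greeting (#PCDATA)>
]>
<greeting>Hello, world!</greeting>"""

dom = minidom.parseString(doc_text)

# The DOCTYPE name must match the root element type (VC: Root Element Type).
print(dom.doctype.name)
print(dom.documentElement.tagName)
print(dom.doctype.internalSubset)  # the locally given declarations, as text
print(dom.documentElement.firstChild.data)
```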

Standalone Document Declaration
Standalone Document Declaration
[32] SDDecl ::= S 'standalone' Eq (("'" ('yes' | 'no') "'") | ('"' ('yes' | 'no') '"')) [ VC: Standalone Document Declaration ]
In a standalone document declaration, the value "yes" indicates that there are no markup declarations external to the document entity (either in the DTD external subset, or in an external parameter entity referenced from the internal subset) which affect the information passed from the XML processor to the application. The value "no" indicates that there are or may be such external markup declarations.
Example: <?xml version="1.0" standalone='yes'?>

White Space and End-of-Line Handling
White space: the special attribute xml:space (values "default" or "preserve") may be attached to an element to indicate whether white space in its content should be preserved by applications.
End-of-line normalization (applied before parsing):
  #xD #xA --> #xA
  #xD --> #xA
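The end-of-line rule can be sketched in two lines and compared with what a real parser reports. The helper name normalize_eol is hypothetical; the parser used for comparison is Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET


def normalize_eol(text: str) -> str:
    """Map #xD #xA and any remaining lone #xD to a single #xA."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


print(repr(normalize_eol("a\r\nb\rc\n")))

# The parser applies the same normalization to literal line ends.
root = ET.fromstring("<t>a\r\nb\rc</t>")
print(repr(root.text))
```

The order of the two replacements matters: handling the two-character sequence first keeps it from turning into two line feeds.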

Language Identification
A special attribute named xml:lang may be inserted in documents to specify the language used in the contents and attribute values of any element in an XML document. In valid documents, this attribute, like any other, must be declared if it is used.
The values of the attribute are language identifiers as defined by [IETF RFC 1766], "Tags for the Identification of Languages".
Example: xml:lang NMTOKEN #IMPLIED

Language Identification (cont'd)
<p xml:lang="en">The quick brown fox jumps over the lazy dog.</p>
<p xml:lang="en-GB">What colour is it?</p>
<p xml:lang="en-US">What color is it?</p>
<sp who="Faust" desc='leise' xml:lang="de">
  <l>Habe nun, ach! Philosophie,</l>
  <l>Juristerei, und Medizin</l>
  <l>und leider auch Theologie</l>
  <l>durchaus studiert mit heißem Bemüh'n.</l>
</sp>

Logical Structures
Each XML document contains one or more elements, the boundaries of which are either delimited by start-tags and end-tags, or, for empty elements, by an empty-element tag. Each element has a type, identified by name, sometimes called its "generic identifier" (GI), and may have a set of attribute specifications. Each attribute specification has a name and a value.
Element
[39] element ::= EmptyElemTag | STag content ETag

Start-Tags, End-Tags, and Empty-Element Tags
Start-tag
[40] STag ::= '<' Name (S Attribute)* S? '>' [ WFC: Unique Att Spec ]
[41] Attribute ::= Name Eq AttValue [ VC: Attribute Value Type ] [ WFC: No External Entity References ] [ WFC: No < in Attribute Values ]
Example (from the XML specification): <termdef id="dt-dog" term="dog">
End-tag
[42] ETag ::= '</' Name S? '>'
Example (from the XML specification): </termdef>

Start-Tags, End-Tags, and Empty-Element Tags (cont'd)
The text between the start-tag and end-tag is called the element's content:
Content of Elements
[43] content ::= CharData? ((element | Reference | CDSect | PI | Comment) CharData?)*
If an element is empty, it must be represented either by a start-tag immediately followed by an end-tag or by an empty-element tag.
Tags for Empty Elements
[44] EmptyElemTag ::= '<' Name (S Attribute)* S? '/>'
Empty-element tags may be used for any element which has no content, whether or not it is declared using the keyword EMPTY.

Start-Tags, End-Tags, and Empty-Element Tags (cont'd)
Examples of empty elements:
<IMG align="left" src="..." />

Element Type Declarations
The element structure of an XML document may be defined using element type declarations and attribute-list declarations. An element type declaration constrains the element's content.
Element type declarations often constrain which element types can appear as children of the element. At user option, an XML processor may issue a warning when a declaration mentions an element type for which no declaration is provided, but this is not an error.

Element Type Declarations (cont'd)
Element Type Declaration
[45] elementdecl ::= '<!ELEMENT' S Name S contentspec S? '>' [ VC: Unique Element Type Declaration ]
[46] contentspec ::= 'EMPTY' | 'ANY' | Mixed | children
Examples (from the XML specification):
<!ELEMENT br EMPTY>
<!ELEMENT p (#PCDATA|emph)* >
<!ELEMENT %name.para; %content.para; >
<!ELEMENT container ANY>

Element Content
An element type has element content when elements of that type must contain only child elements (no character data), optionally separated by white space (characters matching the nonterminal S). In this case, the constraint includes a content model, a simple grammar governing the allowed types of the child elements and the order in which they are allowed to appear.
The grammar is built on content particles (cps), which consist of names, choice lists of content particles, or sequence lists of content particles.

Element Content (cont'd)
Element-content Models
[47] children ::= (choice | seq) ('?' | '*' | '+')?
[48] cp ::= (Name | choice | seq) ('?' | '*' | '+')?
[49] choice ::= '(' S? cp ( S? '|' S? cp )+ S? ')'
[50] seq ::= '(' S? cp ( S? ',' S? cp )* S? ')'
where each Name is the type of an element which may appear as a child.
Examples (from the XML specification):
<!ELEMENT spec (front, body, back?)>
<!ELEMENT div1 (head, (p | list | note)*, div2*)>
<!ELEMENT dictionary-body (%div.mix; | %dict.mix;)*>
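Because a content model is a regular expression over element names, it can be checked with an ordinary regex engine. The sketch below is an illustration under simplifying assumptions: names are ASCII only, parameter-entity references are not supported, and the helper names model_to_regex and children_match are hypothetical. It is not how a validating parser is required to work.

```python
import re

# A token is either an (ASCII-only) element name or one grammar symbol.
_TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*|[(),|?*+]")


def model_to_regex(model: str) -> re.Pattern:
    """Translate a content model such as '(front, body, back?)' to a regex.

    Each child name is encoded as the name followed by one space, so a
    sequence of children becomes a plain string to match against.
    """
    parts = []
    for tok in _TOKEN.findall(model):
        if tok == "(":
            parts.append("(?:")
        elif tok == ",":
            continue  # a sequence is plain concatenation in a regex
        elif tok in ")|?*+":
            parts.append(tok)
        else:
            parts.append("(?:" + re.escape(tok) + " )")
    return re.compile("".join(parts))


def children_match(model: str, children: list) -> bool:
    """Return True if the list of child element names fits the model."""
    encoded = "".join(name + " " for name in children)
    return model_to_regex(model).fullmatch(encoded) is not None


print(children_match("(front, body, back?)", ["front", "body"]))
print(children_match("(front, body, back?)", ["body", "front"]))
```

The trailing space after each name keeps a short name such as p from matching the start of a longer one.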

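Mixed Content
Mixed-content Declaration
[51] Mixed ::= '(' S? '#PCDATA' (S? '|' S? Name)* S? ')*' | '(' S? '#PCDATA' S? ')'
Examples (from the XML specification):
<!ELEMENT p (#PCDATA|a|ul|b|i|em)*>
<!ELEMENT p (#PCDATA | %font; | %phrase; | %special; | %form;)* >
<!ELEMENT b (#PCDATA)>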

Attribute-List Declarations
Attributes:
  - are used to associate name-value pairs with elements;
  - may appear only within start-tags and empty-element tags.
Attribute-list declarations may be used:
  - to define the set of attributes pertaining to a given element type;
  - to establish type constraints for these attributes;
  - to provide default values for attributes.
Attribute-list Declaration
[52] AttlistDecl ::= '<!ATTLIST' S Name AttDef* S? '>'
[53] AttDef ::= S Name S AttType S DefaultDecl

Attribute Types
XML attribute types are of three kinds: a string type, a set of tokenized types, and enumerated types.
Attribute Types
[54] AttType ::= StringType | TokenizedType | EnumeratedType
[55] StringType ::= 'CDATA'
[56] TokenizedType ::= 'ID' | 'IDREF' | 'IDREFS' | 'ENTITY' | 'ENTITIES' | 'NMTOKEN' | 'NMTOKENS'
  - ID, IDREF and IDREFS are used for cross-references.
  - ENTITY and ENTITIES refer to external unparsed objects.
  - NMTOKEN and NMTOKENS restrict the attribute value to one or more Nmtokens.

Attribute Types (cont'd)
Example of ENTITY type usage:
<!DOCTYPE BookCategory [ ... <!ENTITY cover1 SYSTEM ... NDATA PDF> ... ]> ...

Enumerated Attribute Types
Enumerated Attribute Types
[57] EnumeratedType ::= NotationType | Enumeration
[58] NotationType ::= 'NOTATION' S '(' S? Name (S? '|' S? Name)* S? ')'
[59] Enumeration ::= '(' S? Nmtoken (S? '|' S? Nmtoken)* S? ')'
A NOTATION attribute identifies a notation, declared in the DTD with associated system and/or public identifiers, to be used in interpreting the element to which the attribute is attached.

Attribute Defaults
An attribute declaration provides information on whether the attribute's presence is required, and if not, how an XML processor should react if a declared attribute is absent in a document.
Attribute Defaults
[60] DefaultDecl ::= '#REQUIRED' | '#IMPLIED' | (('#FIXED' S)? AttValue)
Examples:
<!ATTLIST termdef id ID #REQUIRED name CDATA #IMPLIED>
<!ATTLIST list type (bullets|ordered|glossary) "ordered">
<!ATTLIST file format NOTATION (ps | pdf | word) #REQUIRED>
<!ATTLIST form method CDATA #FIXED "POST">
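A processor that reads the declarations can supply the declared defaults to the application. The sketch below uses Python's low-level xml.parsers.expat binding, which by default reports defaulted attributes along with specified ones; the wrapper collect_attributes and the small doc, list and form document are assumptions made for illustration.

```python
import xml.parsers.expat


def collect_attributes(text: str) -> dict:
    """Parse `text` and return {element name: attributes seen by the app}."""
    seen = {}
    parser = xml.parsers.expat.ParserCreate()

    def start_element(name, attrs):
        seen[name] = dict(attrs)

    parser.StartElementHandler = start_element
    parser.Parse(text, True)
    return seen


doc = """<!DOCTYPE doc [
  <!ATTLIST list type (bullets|ordered|glossary) "ordered">
  <!ATTLIST form method CDATA #FIXED "POST">
]>
<doc><list/><form/></doc>"""

attributes = collect_attributes(doc)
print(attributes["list"])  # default value supplied by the processor
print(attributes["form"])  # fixed value supplied by the processor
```

Neither start-tag specifies an attribute, so whatever the handler receives comes from the attribute-list declarations.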

Attribute-value normalization
When: after end-of-line processing, but before the value is passed to the application.
Steps (the normalized value nv is initially empty):
0. End-of-line processing (#xD #xA --> #xA, #xD --> #xA).
1. Repeat until the end of the input:
   - character reference => append the referenced character to nv.
   - entity reference => recursively apply step 1 to the replacement text of the entity.
   - white space character (#x20, #xD, #xA, #x9) => append a space character (#x20) to nv.
   - any other character => append the character to nv.
2. If the attribute type is not CDATA, remove leading and trailing spaces and replace each sequence of space (#x20) characters by a single space.
Notes:
1. Character references and entity references are not treated alike: a character reference to white space is appended as is, whereas white space reached through an entity reference is normalized to #x20.
2. Literal white space characters are normalized to spaces.

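Attribute-value normalization: examples
Given the declarations (as in the XML specification)
  <!ENTITY d "&#xD;">
  <!ENTITY a "&#xA;">
  <!ENTITY da "&#xD;&#xA;">
the attribute specifications below are normalized as follows, where (LF) denotes a literal line break:
Attribute specification               | a is NMTOKENS                | a is CDATA
a="(LF)(LF)xyz"                       | x y z                        | #x20 #x20 x y z
a="&d;&d;A&a;&#x20;&a;B&da;"          | A #x20 B                     | #x20 #x20 A #x20 #x20 #x20 B #x20 #x20
a="&#xd;&#xd;A&#xa;&#xa;B&#xd;&#xa;"  | #xD #xD A #xA #xA B #xD #xA  | #xD #xD A #xA #xA B #xD #xA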
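The normalization steps can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the code of any real parser: the function names are hypothetical, and the entities argument maps each entity name to its replacement text, with character references already expanded as they would be when the declaration was read.

```python
import re

_REF = re.compile(r"&#x([0-9a-fA-F]+);|&#([0-9]+);|&([^;&\s]+);")
_WHITE_SPACE = " \t\r\n"


def _expand(text: str, entities: dict) -> str:
    """Step 1: walk the input, handling references and white space."""
    out = []
    pos = 0
    for match in _REF.finditer(text):
        # Literal characters before the reference.
        out.extend(" " if ch in _WHITE_SPACE else ch for ch in text[pos:match.start()])
        if match.group(1) is not None:    # hexadecimal character reference
            out.append(chr(int(match.group(1), 16)))
        elif match.group(2) is not None:  # decimal character reference
            out.append(chr(int(match.group(2))))
        else:                             # entity reference: recurse
            out.append(_expand(entities[match.group(3)], entities))
        pos = match.end()
    out.extend(" " if ch in _WHITE_SPACE else ch for ch in text[pos:])
    return "".join(out)


def normalize_attribute(raw: str, entities: dict, is_cdata: bool = True) -> str:
    raw = raw.replace("\r\n", "\n").replace("\r", "\n")  # step 0
    value = _expand(raw, entities)                         # step 1
    if not is_cdata:                                       # step 2
        value = re.sub(" +", " ", value.strip(" "))
    return value


entities = {"d": "\r", "a": "\n", "da": "\r\n"}
print(repr(normalize_attribute("&d;&d;A&a;&#x20;&a;B&da;", entities)))
print(repr(normalize_attribute("&d;&d;A&a;&#x20;&a;B&da;", entities, is_cdata=False)))
```

The sketch shows both notes at work: a line feed written as a character reference survives, while the same character reached through an entity becomes a space.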

Conditional Sections
Conditional sections are portions of the document type declaration external subset which are included in, or excluded from, the logical structure of the DTD based on the keyword which governs them.
Conditional Section
[61] conditionalSect ::= includeSect | ignoreSect
[62] includeSect ::= '<![' S? 'INCLUDE' S? '[' extSubsetDecl ']]>'
[63] ignoreSect ::= '<![' S? 'IGNORE' S? '[' ignoreSectContents* ']]>'
[64] ignoreSectContents ::= Ignore ('<![' ignoreSectContents ']]>' Ignore)*
[65] Ignore ::= Char* - (Char* ('<![' | ']]>') Char*)

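Conditional Sections (cont'd)
Example:
<!ENTITY % draft 'INCLUDE' >
<!ENTITY % final 'IGNORE' >
<![%draft;[
<!ELEMENT book (comments*, title, body, supplements?)>
]]>
<![%final;[
<!ELEMENT book (title, body, supplements?)>
]]>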

Physical Structures
An XML document may consist of one or many storage units. These are called entities; they all have content and are all identified by name.
Each XML document has one entity called the document entity, which serves as the starting point for the XML processor and may contain the whole document.
Entities may be either parsed or unparsed. A parsed entity's contents are referred to as its replacement text; this text is considered an integral part of the document.

Physical Structures (cont'd)
An unparsed entity is a resource whose contents may or may not be text, and if text, may not be XML. Each unparsed entity has an associated notation, identified by name. Beyond a requirement that an XML processor make the identifiers for the entity and notation available to the application, XML places no constraints on the contents of unparsed entities.
Parsed entities are invoked by name using entity references; unparsed entities are invoked by name, given in the value of ENTITY or ENTITIES attributes.
General entities are entities for use within the document content. In this specification, general entities are sometimes referred to with the unqualified term entity when this leads to no ambiguity. Parameter entities are parsed entities for use within the DTD. These two types of entities use different forms of reference and are recognized in different contexts.

Character and Entity References
A character reference refers to a specific character in the ISO/IEC 10646 character set, for example one not directly accessible from available input devices.
Character Reference
[66] CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
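Both forms of production [66] can be exercised with a quick parse. This sketch uses Python's standard xml.etree.ElementTree; the element name t and the chosen code points are arbitrary.

```python
import xml.etree.ElementTree as ET

# &#60; is decimal for '<', &#x3E; is hexadecimal for '>',
# and &#xE9; is e with acute accent.
root = ET.fromstring("<t>&#60;tag&#x3E; &#xE9;</t>")
print(root.text)
print(len(root))  # the referenced '<' did not start a child element
```

Characters produced by character references are always data, which is why the angle brackets here do not open a tag.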

Character and Entity References (cont'd)
Entity Reference
[67] Reference ::= EntityRef | CharRef
[68] EntityRef ::= '&' Name ';'
[69] PEReference ::= '%' Name ';'

Entity Declarations
Entity Declaration
[70] EntityDecl ::= GEDecl | PEDecl
[71] GEDecl ::= '<!ENTITY' S Name S EntityDef S? '>'
[72] PEDecl ::= '<!ENTITY' S '%' S Name S PEDef S? '>'
[73] EntityDef ::= EntityValue | (ExternalID NDataDecl?)
[74] PEDef ::= EntityValue | ExternalID
Notes:
1. General entities are referenced in the non-DTD region (the document instance).
2. Parameter entities are referenced in the DTD.

Internal Entities
An entity defined by an EntityValue is called an internal entity. There is no separate physical storage object; the content of the entity is given in the declaration.
Some processing of entity and character references in the literal entity value may be required to produce the correct replacement text: see "4.5 Construction of Internal Entity Replacement Text".
An internal entity is a parsed entity.
Example of an internal entity declaration:
<!ENTITY Pub-Status "This is a pre-release of the specification.">

External Entities
If the entity is not internal, it is an external entity.
External Entity Declaration
[75] ExternalID ::= 'SYSTEM' S SystemLiteral | 'PUBLIC' S PubidLiteral S SystemLiteral
[76] NDataDecl ::= S 'NDATA' S Name [ VC: Notation Declared ]
If the NDataDecl is present, this is a general unparsed entity; otherwise it is a parsed entity.
[VC: Notation Declared]: the Name must match the declared name of a notation.
SystemLiteral is called the entity's system identifier, which is a URI.
PubidLiteral is called the entity's public identifier, which the XML processor may use to produce an alternative URI.

Examples of external entity declarations
<!ENTITY open-hatch PUBLIC "-//Textuality//TEXT Standard open-hatch boilerplate//EN" "...">
<!ENTITY hatch-pic SYSTEM "../grafix/OpenHatch.gif" NDATA gif >

Parsed Entities: The Text Declaration
External parsed entities may each begin with a text declaration.
Text Declaration
[77] TextDecl ::= '<?xml' VersionInfo? EncodingDecl S? '?>'
Notes:
  - The text declaration must be provided literally, not by reference to a parsed entity.
  - It cannot appear at any position other than the beginning of an external parsed entity.

Well-Formed Parsed Entities
The document entity is well-formed if it matches the production labeled document [1].
An external general parsed entity is well-formed if it matches the production labeled extParsedEnt [78].
An external parameter entity is well-formed if it matches the production labeled extPE [79]. All external parameter entities are well-formed by definition.
Well-Formed External Parsed Entity
[78] extParsedEnt ::= TextDecl? content
[79] extPE ::= TextDecl? extSubsetDecl

Well-Formed Parsed Entities (cont'd)
An internal general parsed entity is well-formed if its replacement text matches the production labeled content [43]. All internal parameter entities are well-formed by definition.
A consequence of well-formedness in entities: the logical and physical structures in an XML document are properly nested; i.e., no start-tag, end-tag, empty-element tag, element, comment, processing instruction, character reference, or entity reference can begin in one entity and end in another.

Character Encoding in Entities
Each external parsed entity in an XML document may use a different encoding for its characters. All XML processors must be able to read entities in either UTF-8 or UTF-16.
Parsed entities which are stored in an encoding other than UTF-8 or UTF-16 must begin with a text declaration containing an encoding declaration:
Encoding Declaration
[80] EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' | "'" EncName "'")
[81] EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')* /* Encoding name contains only Latin characters */
Examples (from the XML specification):
<?xml encoding='UTF-8'?>
<?xml encoding='EUC-JP'?>
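The effect of the encoding declaration is easiest to see on raw bytes. The following sketch, using Python's standard xml.etree.ElementTree, parses the same word stored in ISO-8859-1 with and without a declaration; the helper parse_word and the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def parse_word(data: bytes):
    """Return the root element's text, or None if the bytes do not parse."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return None


# "caf\u00e9" ends in e with acute accent, a single byte in ISO-8859-1.
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><word>caf\u00e9</word>'.encode("iso-8859-1")
utf8_doc = '<?xml version="1.0" encoding="UTF-8"?><word>caf\u00e9</word>'.encode("utf-8")
undeclared = "<word>caf\u00e9</word>".encode("iso-8859-1")

print(parse_word(latin1_doc))  # declared encoding is honored
print(parse_word(utf8_doc))
print(parse_word(undeclared))  # read as UTF-8 by default, so the bytes are invalid
```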

XML Processor Treatment of Entities and References
The contexts in which character references, entity references, and invocations of unparsed entities might appear:
1. Reference in Content: as a reference anywhere after the start-tag and before the end-tag of an element; corresponds to the nonterminal content.
   ex: He said: &WhatHeSaid;
2. Reference in Attribute Value: as a reference within either the value of an attribute in a start-tag, or a default value in an attribute declaration; corresponds to the nonterminal AttValue.
   ex:
3. Occurs as Attribute Value: as a Name, not a reference, appearing either as the value of an attribute which has been declared as type ENTITY, or as one of the space-separated tokens in the value of an attribute which has been declared as type ENTITIES.

Contexts in which entity or character references may occur (cont'd)
   ex: <!ENTITY Apicture SYSTEM " NDATA GIF> …
4. Reference in Entity Value: as a reference within a parameter or internal entity's literal entity value in the entity's declaration; corresponds to the nonterminal EntityValue.
   ex:
5. Reference in DTD: as a reference within either the internal or external subsets of the DTD, but outside of an EntityValue or AttValue.
   ex:
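Contexts 1 and 2 can be tried out with a short sketch, again assuming Python's standard xml.etree.ElementTree. The entity name WhatHeSaid comes from the slide; its value "Hello", the element name note, and the attribute name quote are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE note [
  <!ENTITY WhatHeSaid "Hello">
]>
<note quote="He said: &WhatHeSaid;">He said: &WhatHeSaid;</note>"""

note = ET.fromstring(DOC)
content_text = note.text            # context 1: reference in content
attribute_text = note.get("quote")  # context 2: reference in attribute value
```

Both references to the internal general entity are replaced by its replacement text before the application sees the data.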

Types of entities (cont'd)
Internal vs. external:
  internal ==> content given in the declaration
  external ==> content obtained outside the declaration
  ex1: ex2: ex3:
General vs. parameter entities:
  general ==> used in the document instance
  parameter ==> used in the document type declaration (DTD)
  ex: ex1 ==> general; ex2 ==> PE
Parsed vs. unparsed entities:
  parsed ==> the XML processor will parse it ==> ex1, ex2
  unparsed ==> the XML processor need not parse it ==> ex3
Note: unparsed entities must be general and external.

XML Processor Treatment of Entities and References (cont'd)

Included
An entity is included when its replacement text is retrieved and processed, in place of the reference itself, as though it were part of the document at the location the reference was recognized.
The replacement text may contain both character data and (except for parameter entities) markup, which must be recognized in the usual way.
ex: "&AC;" ==> "The &W3C; Advisory Council" ==> "The WWW Consortium Advisory Council".
(The string "AT&amp;T;" expands to "AT&T;" and the remaining ampersand is not recognized as an entity-reference delimiter.)
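The two-step expansion can be reproduced with a short sketch. The declarations of W3C and AC below are reconstructed from the expansion shown on the slide and should be read as an assumption; the element name doc is hypothetical. The parser is Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE doc [
  <!ENTITY W3C "WWW Consortium">
  <!ENTITY AC "The &W3C; Advisory Council">
]>
<doc>&AC;</doc>"""

# &AC; is included, and the &W3C; reference inside its replacement
# text is recognized and included in turn.
expanded = ET.fromstring(DOC).text

# &amp; yields a literal ampersand; the following "T;" is plain data.
att = ET.fromstring("<doc>AT&amp;T;</doc>").text
```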

Included in Literal
Same as Included, except that a single or double quote character in the replacement text is always treated as a normal data character and will not terminate the literal.
Example: this is well-formed:
  <!ENTITY % YN '"Yes"' >
  <!ENTITY WhatHeSaid "He said %YN;" >
while this is not:
  <!ENTITY EndAttr "27'" >
  <element attribute='a-&EndAttr;>
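The attribute-value case of "included in literal" can be checked with a short sketch, assuming Python's standard xml.etree.ElementTree. The entity name quoted and the element name e are made up; EndAttr follows the example in the XML specification.

```python
import xml.etree.ElementTree as ET

# The apostrophes inside the replacement text are ordinary data and
# do not end the single-quoted attribute value.
GOOD = """<!DOCTYPE e [<!ENTITY quoted "say 'hi'">]><e attribute='&quoted;'/>"""

# Here the author relies on the entity to supply the closing quote.
# It cannot, so the attribute value is never terminated.
BAD = """<!DOCTYPE e [<!ENTITY EndAttr "27'">]><e attribute='a-&EndAttr;/>"""

value = ET.fromstring(GOOD).get("attribute")

try:
    ET.fromstring(BAD)
    bad_parsed = True
except ET.ParseError:
    bad_parsed = False
```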

Construction of Internal Entity Replacement Text
An internal entity's value has two forms:
  literal entity value: the quoted string actually present in the entity declaration, corresponding to the nonterminal EntityValue.
  replacement text: the content of the entity, after replacement of character references and parameter-entity references.
Notes:
1. General-entity references in the literal entity value are not expanded when producing the replacement text.
2. It is the replacement text of the entity that is substituted for every occurrence of its entity reference.

Example
  <!ENTITY % pub "&#xc9;ditions Gallimard" >
  <!ENTITY rights "All rights reserved" >
  <!ENTITY book "La Peste: Albert Camus, &#xA9; 1947 %pub;. &rights;" >
==> Entity book has replacement text: "La Peste: Albert Camus, © 1947 Éditions Gallimard. &rights;"
Note: No forward reference to a PE is permitted. Hence the declaration of entity 'book' cannot be placed before that of entity 'pub'.
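The construction rule can be written down as a small function. This is a sketch under stated assumptions, not how a real XML processor is implemented: the function name replacement_text is hypothetical, entity names are limited to ASCII for brevity, and parameter entities are passed in as a plain dictionary of already-built replacement texts.

```python
import re

# Character references (&#N; or &#xN;) and parameter-entity references (%name;).
# General-entity references (&name;) are deliberately not matched.
_REFERENCE = re.compile(
    r"&#x([0-9A-Fa-f]+);|&#([0-9]+);|%([A-Za-z_:][A-Za-z0-9_.:-]*);"
)


def replacement_text(literal_value, parameter_entities):
    """Build replacement text from a literal entity value in a single pass."""

    def expand(match):
        hex_digits, dec_digits, pe_name = match.groups()
        if hex_digits is not None:
            return chr(int(hex_digits, 16))
        if dec_digits is not None:
            return chr(int(dec_digits))
        # Raises KeyError if the parameter entity was not declared earlier.
        return parameter_entities[pe_name]

    return _REFERENCE.sub(expand, literal_value)


pub = replacement_text("&#xc9;ditions Gallimard", {})
book = replacement_text(
    "La Peste: Albert Camus, &#xA9; 1947 %pub;. &rights;", {"pub": pub}
)
```

The general-entity reference &rights; survives untouched in book; it is only expanded later, when &book; is referenced in the document.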

Predefined Entities
Entity and character references can both be used to escape the left angle bracket, ampersand, and other delimiters.
A set of general entities (amp, lt, gt, apos, quot) is specified for this purpose.
Numeric character references may also be used; they are expanded immediately when recognized and must be treated as character data, so the numeric character references "&#60;" and "&#38;" may be used to escape < and & when they occur in character data.
1. <!ENTITY lt "&#38;#60;">    // <  double escaping required for well-formed replacement text
2. <!ENTITY amp "&#38;#38;">   // &  double escaping required for well-formed replacement text
3. <!ENTITY gt "&#62;">        // >  double escaping harmless but not needed
4. <!ENTITY apos "&#39;">      // '  double escaping harmless but not needed
5. <!ENTITY quot "&#34;">      // "  double escaping harmless but not needed
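A quick check of the predefined entities and the equivalent numeric character references, sketched with Python's standard library (xml.etree.ElementTree for parsing, xml.sax.saxutils.escape for producing escaped text); the element name t is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The five predefined entities need no declaration.
named = ET.fromstring("<t>&lt; &amp; &gt; &apos; &quot;</t>").text

# Numeric character references give the same characters.
numeric = ET.fromstring("<t>&#60; &#38;</t>").text

# Going the other way: escape &, < and > before writing character data.
escaped = escape("AT&T < 5")
```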

Notation Declarations
Notations identify by name the format of unparsed entities, e.g., GIF, JPEG, DOC, BMP, …
Notation Declarations
  [82] NotationDecl ::= '<!NOTATION' S Name S (ExternalID | PublicID) S? '>'
  [83] PublicID ::= 'PUBLIC' S PubidLiteral
4.8 Document Entity
The document entity serves as the root of the entity tree and a starting-point for an XML processor.
Unlike other entities, the document entity has no name and might well appear on a processor input stream without any identification at all.
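A processor does not parse an unparsed entity; it only reports the entity and its notation to the application. The sketch below uses the handlers of Python's standard xml.parsers.expat module to collect those reports. The system identifier "apicture.gif" and the element name doc are assumptions made for the example; "my-gif.def" is borrowed from the quick-reference slides.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE doc [
  <!NOTATION GIF SYSTEM "my-gif.def">
  <!ENTITY Apicture SYSTEM "apicture.gif" NDATA GIF>
]>
<doc/>"""

notations = {}
unparsed_entities = {}


def on_notation(name, base, system_id, public_id):
    notations[name] = system_id


def on_entity(name, is_parameter_entity, value, base,
              system_id, public_id, notation_name):
    # Only unparsed entities carry a notation name.
    if notation_name is not None:
        unparsed_entities[name] = (system_id, notation_name)


parser = xml.parsers.expat.ParserCreate()
parser.NotationDeclHandler = on_notation
parser.EntityDeclHandler = on_entity
parser.Parse(DOC, True)
```

After parsing, notations maps GIF to its system identifier and unparsed_entities maps Apicture to the pair (system identifier, notation name); what to do with the GIF data is left to the application.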

Grammar Notation (EBNF)
  #xN
  [a-zA-Z], [#xN-#xN]
  [acg]
  [^a-z]
  [^abc]
  "string", 'STRING'
  (expression)
  A?
  A B
  A | B
  A - B
  A+
  A*
  /* Comment */
  [wfc: …. ]
  [vc: …. ]

Appendix D. Expansion of Entity and Character References
  <!ENTITY example "<p>An ampersand (&#38;#38;) may be escaped numerically (&#38;#38;#38;) or with a general entity (&amp;amp;).</p>" >
==> ENTITY example has value (replacement text):
  <p>An ampersand (&#38;) may be escaped numerically (&#38;#38;) or with a general entity (&amp;amp;).</p>
A reference in the document to "&example;" causes the text to be reparsed:
==> a p element with the content: An ampersand (&) may be escaped numerically (&#38;) or with a general entity (&amp;).
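The two rounds of expansion can be confirmed with a parser. A minimal sketch, assuming Python's standard xml.etree.ElementTree; the root element name r is hypothetical, and the entity declaration is the one from Appendix D of the XML specification.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE r [
  <!ENTITY example "<p>An ampersand (&#38;#38;) may be escaped numerically (&#38;#38;#38;) or with a general entity (&amp;amp;).</p>">
]>
<r>&example;</r>"""

# Round 1 happens when the declaration is read (character references only);
# round 2 happens when &example; is included and its text is reparsed.
paragraph = ET.fromstring(DOC).find("p").text
```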

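D. More complex example
  1 <?xml version='1.0'?>
  2 <!DOCTYPE test [
  3 <!ELEMENT test (#PCDATA) >
  4 <!ENTITY % xx '&#37;zz;'>
  5 <!ENTITY % zz '&#60;!ENTITY tricky "error-prone" >' >
  6 %xx;
  7 ]>
  8 <test>This sample shows a &tricky; method.</test>
line 4 ==> xx has value "%zz;"
line 5 ==> zz has value '<!ENTITY tricky "error-prone" >'
line 6 ==> %xx; ==> %zz; ==> <!ENTITY tricky "error-prone" > is declared
line 8 ==> element test has content: "This sample shows a error-prone method."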

XML Quick Reference

XML Declaration
General form: <?xml version="…" encoding="…" standalone="…"?>
version: version of the XML specification.
encoding: character encoding of the document, expressed in Latin characters, e.g., UTF-8, UTF-16, iso.
standalone: no: parsing is affected by the external DTD subset; yes: not affected.

Processing Instruction and Comment
General forms: <?target data?> and <!-- comment text -->
A comment may contain any characters except the string "--".

Start tag with attribute (in document) and end tag
General form: <name attribute="value"> … </name>
name: name (or type) of the element.
attribute: name of the attribute.
value: value or values of the attribute.
Single or double quotes (' or ") must match.
Each element may contain zero or more attributes.
Start tag and end tag must match.

EMPTY Element and CDATA Section
General forms: <name/> and <![CDATA[ … ]]>
A CDATA section may contain any characters except the string "]]>". Characters in a CDATA section are not parsed as markup (they keep their literal meaning).
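A short sketch of both constructs, assuming Python's standard xml.etree.ElementTree; the element names code, doc, and br are made up for the example.

```python
import xml.etree.ElementTree as ET

# Inside a CDATA section, < and & are plain characters, not markup.
root = ET.fromstring("<code><![CDATA[if (a < b && c) { run(); }]]></code>")
literal = root.text

# An empty-element tag produces an element with no text and no children.
empty = ET.fromstring("<doc><br/></doc>").find("br")
```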

DOCTYPE Declaration
General form: <!DOCTYPE name SYSTEM "URI" [ internal subset ]>
DOCTYPE: keyword.
name: name of the document type.
SYSTEM "URI": pointer to another file.
[ : DSO (Declaration Subset Open).
internal subset: the internal subset of the DTD (optional).
] : DSC (Declaration Subset Close).

Internal Subset
  <!DOCTYPE root [
    (other declarations are included in this internal subset)
  ]>
  …   (tags and text: the document)
The DOCTYPE declaration includes the other declarations in its internal subset.

External Subset
  <!DOCTYPE root SYSTEM "rootURI.dtd" >
  …   (tags and text: the document)
The DOCTYPE declaration refers to a DTD in an external subset, here a file named rootURI.dtd.
The other form: PUBLIC "publicLiteral" "root.dtd"

Internal and External Subsets
  <!DOCTYPE root SYSTEM "root.dtd" [
    (internal subset)
  ]>
  …   (tags and text: the document)
The DOCTYPE declaration refers to an external subset (an external file) and includes an internal subset.
The DTD is the sum of both parts, with the internal subset taking precedence when they conflict.
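The precedence follows from two rules of XML 1.0: the internal subset is read before the external subset, and when an entity is declared more than once the first declaration is binding. The sketch below shows only the second rule, with both declarations placed in one internal subset, because Python's standard xml.etree.ElementTree does not load an external subset by default. The entity name greeting is hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE root [
  <!ENTITY greeting "first declaration">
  <!ENTITY greeting "second declaration">
]>
<root>&greeting;</root>"""

# The later declaration of the same entity is ignored.
bound = ET.fromstring(DOC).text
```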

Conditional Section (DTD only) and External-ID
Include: <![INCLUDE[ … ]]>
Not-include: <![IGNORE[ … ]]>
External-ID: SYSTEM "URI" or PUBLIC "publicID" "URI"

Parameter Entity (PE) Declarations: Internal Parameter Entity
General form: <!ENTITY % name "entity value">
ENTITY: keyword.
%: the percent sign shows that this is a PE.
name: name of the entity.
entity value: any literal; single or double quotes (' or ") must match.

Parameter Entity Declarations: External Parameter Entity
General form: <!ENTITY % name SYSTEM "URI">
ENTITY: keyword.
%: the percent sign shows that this is a PE.
name: name of the entity.
SYSTEM "URI": pointer to a file, whose content is the entity value.

Notation Declaration
Notes:
1. Keyword NOTATION.
2. Name of the notation (GIF, JPEG, PNG, etc.); must be unique in the DTD.
3. SYSTEM or PUBLIC identifier (PUBLIC does not require a URI).
Examples of identifiers:
1. SYSTEM "my-gif.def"
2. PUBLIC "-//W3c PNG//PNG's public id //EN" "pngLoc.def"

General Entity Declarations: Internal [General] Entity
General form: <!ENTITY name "entity value">
ENTITY: keyword.
name: name of the entity.
entity value: any literal; single or double quotes (' or ") must match.

General Entity Declarations: External Unparsed [General] Entity
General form: <!ENTITY name SYSTEM "URI" NDATA notation-name>
ENTITY: keyword.
name: name of the entity.
SYSTEM or PUBLIC identifier: pointer to a file, whose content is the entity value and will not be parsed.
NDATA: keyword, followed by a notation name, which must be declared.

Predefined general entities
  ENTITY    Display As    Character value
  &amp;     &             &#38;#38;
  &lt;      <             &#38;#60;
  &gt;      >             &#62;
  &apos;    '             &#39;
  &quot;    "             &#34;

Element Declaration
General form: <!ELEMENT name content-specification>
ELEMENT: keyword.
name: name of the element type (tag name).
content-specification: formal definition of the element's allowed content, one of:
  ANY: keyword; the element may contain zero or more elements and text data.
  EMPTY: keyword; the element must not contain any content.
  a content model.

Special symbols used in content models
Connectors:
  ,  ==> "then": followed by (in sequence)
  |  ==> "or": select (only) one from the group
  Only one connector type per group; no mixing!
Groupings:
  (  ==> start of content model or grouping
  )  ==> end of content model or grouping
  Ex (valid): (A, B, C)   (A | (B,C) | (C,D))
  Ex (invalid, connectors mixed in one group): (A,B | C)   (A | B, C)
Occurrence indicators:
  ?  ==> optional, zero or one
  *  ==> zero or more
  +  ==> one or more
  (no indicator) ==> one and only one
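Because the connectors and occurrence indicators behave like regular-expression operators, an element-only content model can be turned into a regular expression over the sequence of child element names. The sketch below is an illustration only: the function names are hypothetical, names are limited to ASCII, #PCDATA is not handled, and the "no mixing" rule is not checked.

```python
import re

_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def content_model_to_regex(model):
    """Translate an element-only content model into a regex over 'name ' tokens."""
    compact = re.sub(r"\s+", "", model)
    # Each element name becomes a group matching that name plus a separator.
    with_names = _NAME.sub(lambda m: "(?:" + re.escape(m.group()) + " )", compact)
    # ',' means "followed by", which is plain concatenation in a regex;
    # '|', '?', '*', '+' and parentheses already mean the same thing.
    return re.compile(with_names.replace(",", ""))


def children_match(model, children):
    """Return True if the list of child element names satisfies the model."""
    tokens = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(tokens) is not None
```

For example, children_match("(A, (B | C)*, D?)", ["A", "B", "C", "D"]) holds, while the same model rejects ["B"] because the required leading A is missing.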

#PCDATA in content models
Pure text content: (#PCDATA)
Mixed (mode) with other elements: (#PCDATA | element-1 | … | element-n )*
Notes for mixed content:
  #PCDATA must be placed first.
  The trailing * must always be included.

Attribute Declaration
General form: <!ATTLIST element-name attribute-name type default>
1. Keyword ATTLIST.
2. Name of the associated element.
3. Name of the attribute.
4. Type of the attribute.
5. Keyword or default value.
Reserved attributes:
  xml:space: (default | preserve); preserve white space or use the default handling.
  xml:lang: indicates the language of the element and of that element's attributes and children.

Types of XML Attributes
  CDATA: character data string (the default when the document is only well-formed and the attribute is not declared).
  NMTOKEN: name token.
  NMTOKENS: one or more name tokens (spaces between).
  ID: unique identifier for the element.
  IDREF: reference to an ID on another element.
  IDREFS: one or more IDREFs (spaces between).
  ENTITY: name of an unparsed entity.
  ENTITIES: one or more names of entities (spaces between).
  Enumeration: ( a | b | c ): list of allowed attribute values a, b, c ("or" between).
  NOTATION ( x | y | z ): names of notations; requires a list of values as well as the keyword, and x, y, z must be declared elsewhere with NOTATION.

Attribute Defaults
  "value": if the attribute is omitted in the document, assume this value.
  #REQUIRED: the attribute cannot be omitted in the document for validity.
  #IMPLIED: optional; no default can be inferred, and the application is free to handle it as appropriate.
  #FIXED "value": fixed value; if a different value appears in the document, the document is not valid.
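Defaults declared in the internal subset are supplied by the parser even when it does not validate. A minimal sketch, assuming Python's standard xml.etree.ElementTree (expat underneath); the element name item and the attribute names kind, note, and version are made up for the example.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE item [
  <!ATTLIST item
    kind    CDATA "plain"
    note    CDATA #IMPLIED
    version CDATA #FIXED "1.0">
]>
<item/>"""

item = ET.fromstring(DOC)
kind = item.get("kind")        # omitted in the document, default supplied
note = item.get("note")        # #IMPLIED: nothing is supplied
version = item.get("version")  # #FIXED: the fixed value is supplied
```

Note that a non-validating parser such as this one fills in defaults but does not report #REQUIRED or #FIXED violations; that check needs a validating processor.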