XML DTD Transparency No. 1 XML Document Type Definitions (DTDs)


XML DTD Transparency No. 2 DTD - Table of Contents
Introduction to DTD: an introduction to the XML Document Type Definition.
DTD - XML Building Blocks: what XML building blocks are defined in a DTD.
DTD Elements: how to define the elements of an XML document using DTD.
DTD Attributes: how to define the legal attributes of XML elements using DTD.
DTD Entities: how to define XML entities using DTD.

XML DTD Transparency No. 3 Introduction to DTD The purpose of a DTD is to define the legal building blocks of an XML document. It defines the document structure with a list of legal elements. A DTD can be declared inline in your XML document, or as an external reference.

XML DTD Transparency No. 4 Internal DTD This is an XML document with a Document Type Definition declared inline, inside <!DOCTYPE note [ ... ]>. The document itself is a note from Jani to Tove with the heading "Reminder" and the body "Don't forget me this weekend!". The DTD is interpreted like this: !ELEMENT note (in line 2) defines the element "note" as having four child elements, "to, from, heading, body", and so on.
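As a sketch of the kind of document described on this slide: only the declaration of "note" is spelled out above, so the (#PCDATA) declarations of the four child elements are an assumption.

```xml
<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!-- assumed: each child element holds plain text -->
  <!ELEMENT to      (#PCDATA)>
  <!ELEMENT from    (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body    (#PCDATA)>
]>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
```

The !ELEMENT note declaration sits on line 2, matching the slide's reading of the DTD.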

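XML DTD Transparency No. 5 External DTD This is the same XML document with an external DTD: the document type declaration refers to the file "note.dtd" instead of listing the declarations inline, and the note content (to: Tove, from: Jani, heading: Reminder, body: Don't forget me this weekend!) is unchanged.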

XML DTD Transparency No. 6 note.dtd This is a copy of the file "note.dtd" containing the Document Type Definition:
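A plausible content for "note.dtd", assuming it carries the same declarations as the inline DTD of the previous slides (the #PCDATA children are an assumption):

```xml
<!-- note.dtd (external subset); a document refers to it with
     <!DOCTYPE note SYSTEM "note.dtd"> placed before the root element -->
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to      (#PCDATA)>
<!ELEMENT from    (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body    (#PCDATA)>
```

Note that an external DTD file contains only declarations; it has no DOCTYPE line and no root element of its own.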

XML DTD Transparency No. 7 Why use a DTD? XML provides an application independent way of sharing data. With a DTD, independent groups of people can agree to use a common DTD for interchanging data. Your application can use a standard DTD to verify that data that you receive from the outside world is valid. You can also use a DTD to verify your own data.

XML DTD Transparency No. 8 Document Type Declaration (cont'd)
Document Type Definition
[28] doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' (markupdecl | DeclSep)* ']' S?)? '>' [VC: Root Element Type]
[28a] DeclSep ::= PEReference | S
[29] markupdecl ::= elementdecl | AttlistDecl | EntityDecl | NotationDecl | PI | Comment [VC: Proper Declaration/PE Nesting] [WFC: PEs in Internal Subset]

XML DTD Transparency No. 9 WFC: PEs in Internal Subset Well-formedness constraint: PEs in Internal Subset. In the internal DTD subset, parameter-entity references can occur only where markup declarations can occur, not within markup declarations. (This does not apply to references that occur in external parameter entities or to the external subset.) Ex: the following is not well-formed: in <!DOCTYPE test SYSTEM "test.dtd" [ ... ]>, a reference to the parameter entity YN cannot appear inside a markup declaration of the internal subset, even if YN is defined in test.dtd.

XML DTD Transparency No. 10 External Subset
Portions of the contents of the external subset or of external parameter entities may conditionally be ignored by using the conditional section construct; this is not allowed in the internal subset.
External Subset
[30] extSubset ::= TextDecl? extSubsetDecl
[31] extSubsetDecl ::= (markupdecl | conditionalSect | DeclSep)*

XML DTD Transparency No. 11 Example XML documents An example of an XML document with a document type declaration is a greeting element with the content "Hello, world!" whose DOCTYPE names the system identifier "hello.dtd"; that identifier gives the URI of a DTD for the document. The declarations can also be given locally, in an internal subset of the form <!DOCTYPE greeting [ ... ]>, followed by the same greeting element.
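A minimal sketch of the local form, assuming the greeting element is declared to hold plain text:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The external form would instead read:
     <!DOCTYPE greeting SYSTEM "hello.dtd"> -->
<!DOCTYPE greeting [
  <!ELEMENT greeting (#PCDATA)>
]>
<greeting>Hello, world!</greeting>
```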

XML DTD Transparency No. 12 Standalone Document Declaration
Standalone Document Declaration
[32] SDDecl ::= S 'standalone' Eq (("'" ('yes' | 'no') "'") | ('"' ('yes' | 'no') '"')) [VC: Standalone Document Declaration]
Example: <?xml version="1.0" standalone='yes'?>

XML DTD Transparency No. 13 White Space and End-of-Line Handling White space: the special attribute xml:space is used to indicate whether white space (in markup) should be preserved. End-of-line normalization: #xD #xA --> #xA and a lone #xD --> #xA, both applied before parsing.

XML DTD Transparency No. 14 Language Identification A special attribute named xml:lang may be inserted in documents to specify the language used in the contents and attribute values of any element in an XML document. In valid documents, this attribute, like any other, must be declared if it is used. The values of the attribute are language identifiers as defined by [IETF RFC 1766], "Tags for the Identification of Languages". Example: xml:lang NMTOKEN #IMPLIED

XML DTD Transparency No. 15 Language Identifications Examples of text marked with xml:lang: English (en): The quick brown fox jumps over the lazy dog. British English (en-GB): What colour is it? American English (en-US): What color is it? German (de): Habe nun, ach! Philosophie, Juristerei, und Medizin und leider auch Theologie durchaus studiert mit heißem Bemüh'n.

XML DTD Transparency No. 16 DTD - XML building blocks The building blocks of XML documents: XML documents (and HTML documents) are made up of the following building blocks: Elements, Tags, Attributes, Entities, PCDATA, and CDATA sections. This is a brief explanation of each of the building blocks:

XML DTD Transparency No. 17 Elements Elements are the main building blocks of both XML and HTML documents. Examples of HTML elements are "body" and "table". Examples of XML elements could be "note" and "message". Elements can contain text, other elements, or be empty. Examples of empty HTML elements are "hr", "br" and "img".

XML DTD Transparency No. 18 Tags Tags are used to mark up elements. A starting tag like <element_name> marks up the beginning of an element, and an ending tag like </element_name> marks up the end of an element. Examples: A body element: <body>body text in between</body>. A message element: <message>some message in between</message>.

XML DTD Transparency No. 19 Attributes Attributes provide extra information about elements. Attributes are placed inside the start tag of an element. Attributes come in name/value pairs. The following "img" element has additional information about a source file: <img src="computer.gif" /> Notes: The name of the element is "img". The name of the attribute is "src". The value of the attribute is "computer.gif". Since the element itself is empty it is closed by a " /".

XML DTD Transparency No. 20 PCDATA PCDATA means parsed character data. Think of character data as the text found between the start tag and the end tag of an XML element. PCDATA is text that will be parsed by a parser. Tags inside the text will be treated as markup and entities will be expanded; hence the characters "<" and "&" must not appear literally in PCDATA. Ex: abc de

XML DTD Transparency No. 21 CDATA sections CDATA also means character data. CDATA is text that will NOT be parsed by a parser. Tags inside the text will NOT be treated as markup and entities will not be expanded. Ex: abc <![CDATA[def]]> g
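To contrast the two kinds of character data, here is a small sketch; the element name "example" and its content are hypothetical.

```xml
<!DOCTYPE example [
  <!ELEMENT example (#PCDATA)>
]>
<!-- In PCDATA, "<" and "&" must be escaped; inside the CDATA section
     the same characters are taken literally and nothing is expanded. -->
<example>abc &lt;b&gt; <![CDATA[<b>def</b> & more]]> g</example>
```

An application reading this element sees the text: abc <b> <b>def</b> & more g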

XML DTD Transparency No. 22 Entities Entities are used to define common text, like macros. Entity references are references to entities. Most of you will know the HTML entity reference "&nbsp;", which is used to insert an extra space in an HTML document. Entities are expanded when a document is parsed by an XML parser. The following entities are predefined in XML:
Entity reference   Character
&lt;               <
&gt;               >
&amp;              &
&quot;             "
&apos;             '

XML DTD Transparency No. 23 DTD - Elements Declaring an Element In the DTD, XML elements are declared with an element declaration. An element declaration has the following syntax: <!ELEMENT element-name contentspec>
Types of element contents:
EMPTY: no contents
ANY: no restriction on contents
MIXED: allows character data only, or character data mixed with elements
ELEMENTS-ONLY: allows elements only

XML DTD Transparency No. 24 EMPTY elements Elements with empty content are declared with the keyword EMPTY: <!ELEMENT element-name EMPTY>. A legal instance has no content and may be written as an empty-element tag or as a start-tag immediately followed by an end-tag.

XML DTD Transparency No. 25 ANY Elements Elements that can contain any combination of elements and text data are declared with the 'ANY' keyword: <!ELEMENT element-name ANY>. A legal instance may mix character data with any declared elements, in any order.

XML DTD Transparency No. 26 Elements with MIXED contents Two cases: elements that can contain only text, declared as (#PCDATA); and elements allowing text as well as element content, declared as (#PCDATA | name1 | name2 | ...)*.
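The three content types above can be seen side by side in one small document. This is only an illustrative sketch; all element names (doc, br, box, para, em) are hypothetical.

```xml
<!DOCTYPE doc [
  <!ELEMENT doc  (br, box, para)>
  <!ELEMENT br   EMPTY>            <!-- no content allowed -->
  <!ELEMENT box  ANY>              <!-- any mix of text and declared elements -->
  <!ELEMENT para (#PCDATA | em)*>  <!-- mixed: text plus em elements -->
  <!ELEMENT em   (#PCDATA)>        <!-- text only -->
]>
<doc>
  <br/>
  <box>text <em>and</em> <br></br> elements</box>
  <para>plain text with <em>emphasis</em> inside</para>
</doc>
```

Both <br/> and <br></br> are legal instances of an EMPTY element; the first form is the usual one.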

XML DTD Transparency No. 27 Elements that can contain element content only
Issue: how to declare the possible sequences of child element occurrences.
Solution: regular expressions over element names.
Definition:
CP ::= (name | choice | seq) ('+' | '*' | '?')?
choice ::= a list of two or more CPs separated by '|' and enclosed by '(' and ')'.
seq ::= a list of one or more CPs separated by ',' and enclosed by '(' and ')'.
Element-only elements are declared as <!ELEMENT element-name children>, where children is a choice or seq optionally followed by '+', '*' or '?'.

XML DTD Transparency No. 28 Recursive definition of CP, seq and choice:
Basis: if α is a name, then α, α?, α+, α* are CPs (content particles).
Closure: if α is a seq or a choice, then α, α?, α+, α* are CPs.
If α1, α2, …, αn (n > 1) are CPs, then (α1 | α2 | … | αn) is a choice.
If α1, α2, …, αn (n > 0) are CPs, then (α1, α2, …, αn) is a seq.
α is a children if α is a CP but is not a name optionally followed by +, ? or *.
Examples of children: a bare name such as a, or a*, is illegal as a content model; (a), (a, b)* and (a | b)+ are legal.

XML DTD Transparency No. 29 More examples
<!ELEMENT note (to, from, heading1 | heading2, body)> (X: illegal, ',' and '|' are mixed in one group)
<!ELEMENT note (to, from, (heading1 | heading2), body)> (O: legal)
<!ELEMENT E1 ((E1, E2) | (E1, E3, E2))> (X: 1-ambiguous, both branches start with E1)
Rewritten as <!ELEMENT E1 (E1, (E2 | (E3, E2)))> (O: legal)
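A short sketch of an element-only content model in use, building on the legal note declaration above. The occurrence indicators on "to" and "body" and the #PCDATA children are assumptions added for illustration.

```xml
<!DOCTYPE note [
  <!-- one or more to, then from, then exactly one of the two headings,
       then an optional body -->
  <!ELEMENT note (to+, from, (heading1 | heading2), body?)>
  <!ELEMENT to       (#PCDATA)>
  <!ELEMENT from     (#PCDATA)>
  <!ELEMENT heading1 (#PCDATA)>
  <!ELEMENT heading2 (#PCDATA)>
  <!ELEMENT body     (#PCDATA)>
]>
<note>
  <to>Tove</to>
  <to>Ole</to>
  <from>Jani</from>
  <heading2>Reminder</heading2>
</note>
```

The instance is valid: "to" occurs twice (allowed by +), one heading alternative is chosen, and the optional body is omitted.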

XML DTD Transparency No. 30 (cont'd)
Element Type Declaration
[45] elementdecl ::= '<!ELEMENT' S Name S contentspec S? '>' [VC: Unique Element Type Declaration]
[46] contentspec ::= 'EMPTY' | 'ANY' | Mixed | children

XML DTD Transparency No. 31 Element Content An element type has element content when elements of that type must contain only child elements (no character data), optionally separated by white space (characters matching the nonterminal S). In this case, the constraint includes a content model, a simple grammar governing the allowed types of the child elements and the order in which they are allowed to appear. The grammar is built on content particles (cps), which consist of names, choice lists of content particles, or sequence lists of content particles:

XML DTD Transparency No. 32 (cont'd)
Element-content Models
[47] children ::= (choice | seq) ('?' | '*' | '+')?
[48] cp ::= (Name | choice | seq) ('?' | '*' | '+')?
[49] choice ::= '(' S? cp (S? '|' S? cp)+ S? ')'
[50] seq ::= '(' S? cp (S? ',' S? cp)* S? ')'
where each Name is the type of an element which may appear as a child.

XML DTD Transparency No. 33 Mixed Content
Mixed-content Declaration
[51] Mixed ::= '(' S? '#PCDATA' (S? '|' S? Name)* S? ')*' | '(' S? '#PCDATA' S? ')'

XML DTD Transparency No. 34 Attribute Definition Attributes are defined for the elements they belong to, e.g. for <!ELEMENT book (preface, toc, chapter+, index?)>: <!ATTLIST book title CDATA #REQUIRED isbn CDATA #IMPLIED> Format: <!ATTLIST element-name attribute-name attribute-type default-value> Attributes have a name, a type, a default-value and belong to an element.

XML DTD Transparency No. 35 Attribute types
Type            Explanation
CDATA           The value is character data
(eval|eval|..)  The value must be an enumerated value
ID              The value is a unique id
IDREF           The value is the id of another element
IDREFS          The value is a list of other ids
NMTOKEN         The value is a valid XML name token
NMTOKENS        The value is a list of valid XML name tokens
ENTITY          The value is an entity
ENTITIES        The value is a list of entities
NOTATION        The value is a name of a notation

XML DTD Transparency No. 36 Attribute-default value
Value           Explanation
"v1"            The attribute has a default value "v1"
#REQUIRED       The attribute value must be included in the element
#IMPLIED        The attribute does not have to be included
#FIXED "value"  The attribute value is fixed

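XML DTD Transparency No. 37 Attribute Examples Default attribute value. Syntax: <!ATTLIST element-name attribute-name attribute-type "default-value">. An element instance that omits the attribute is equivalent to one that specifies it with the default value.

XML DTD Transparency No. 38 Implied attribute Syntax: <!ATTLIST element-name attribute-name attribute-type #IMPLIED>. An instance may include or omit the attribute; no default value is supplied.

XML DTD Transparency No. 39 Required attribute Syntax: <!ATTLIST element-name attribute-name attribute-type #REQUIRED>. An instance that omits the attribute is invalid (x).

XML DTD Transparency No. 40 Fixed attribute value Syntax: <!ATTLIST element-name attribute-name attribute-type #FIXED "value">. An instance that omits the attribute is equivalent to one that specifies it with the fixed value; specifying any other value is invalid.

XML DTD Transparency No. 41 Enumerated attribute values Syntax: <!ATTLIST element-name attribute-name (eval|eval|..) default-value>. An instance may use one or another of the enumerated values.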
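The five kinds of default declaration can be sketched together in one document. All names here (order, item, id, note, currency, payment, qty) are hypothetical and chosen only for illustration.

```xml
<!DOCTYPE order [
  <!ELEMENT order (item+)>
  <!ELEMENT item  (#PCDATA)>
  <!ATTLIST order
      id       ID             #REQUIRED
      note     CDATA          #IMPLIED
      currency CDATA          #FIXED "USD"
      payment  (check | cash) "cash">
  <!ATTLIST item qty CDATA "1">
]>
<!-- id must be present; note is optional; currency is always "USD";
     payment must be check or cash and defaults to cash -->
<order id="o1001" payment="check">
  <item>pencil</item>
  <item qty="3">notebook</item>
</order>
```

The first item is equivalent to <item qty="1">pencil</item>, and the order element behaves as if currency="USD" had been written.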

XML DTD Transparency No. 42 Attribute-List Declarations
Attribute-list Declaration
[52] AttlistDecl ::= '<!ATTLIST' S Name AttDef* S? '>'
[53] AttDef ::= S Name S AttType S DefaultDecl
XML attribute types are of three kinds: a string type, a set of tokenized types, and enumerated types.

XML DTD Transparency No. 43 Attribute Types
Attribute Types
[54] AttType ::= StringType | TokenizedType | EnumeratedType
[55] StringType ::= 'CDATA'
[56] TokenizedType ::= 'ID' | 'IDREF' | 'IDREFS' | 'ENTITY' | 'ENTITIES' | 'NMTOKEN' | 'NMTOKENS'
ID, IDREF and IDREFS are used for cross references. ENTITY is used for referring to external unparsed objects. NMTOKEN restricts the attribute value to be an Nmtoken.

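XML DTD Transparency No. 44 (cont'd) Example of ENTITY type usage: within <!DOCTYPE BookCategory [ ... ]>, an unparsed entity is declared as <!ENTITY cover1 SYSTEM "…" NDATA PDF>, and an attribute declared with type ENTITY then takes the entity name cover1 as its value.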
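A fuller sketch of how such a declaration might fit together. Only BookCategory, cover1 and PDF come from the slide; the Book element, the cover attribute, the notation's identifier and the file path are hypothetical.

```xml
<!DOCTYPE BookCategory [
  <!ELEMENT BookCategory (Book+)>
  <!ELEMENT Book (#PCDATA)>
  <!-- the attribute value must be the name of a declared unparsed entity -->
  <!ATTLIST Book cover ENTITY #IMPLIED>
  <!-- the notation names the format of the unparsed entity -->
  <!NOTATION PDF SYSTEM "application/pdf">
  <!-- hypothetical location of the cover file -->
  <!ENTITY cover1 SYSTEM "covers/cover1.pdf" NDATA PDF>
]>
<BookCategory>
  <Book cover="cover1">XML and DTDs</Book>
</BookCategory>
```

Note that the attribute value is the bare name cover1, not the reference &cover1;.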

XML DTD Transparency No. 45 Enumerated Attribute Types
Enumerated Attribute Types
[57] EnumeratedType ::= NotationType | Enumeration
[58] NotationType ::= 'NOTATION' S '(' S? Name (S? '|' S? Name)* S? ')'
[59] Enumeration ::= '(' S? Nmtoken (S? '|' S? Nmtoken)* S? ')'
A NOTATION attribute identifies a notation, declared in the DTD with associated system and/or public identifiers, to be used in interpreting the element to which the attribute is attached.

XML DTD Transparency No. 46 Attribute Defaults An attribute declaration provides information on whether the attribute's presence is required, and if not, how an XML processor should react if a declared attribute is absent in a document.
Attribute Defaults
[60] DefaultDecl ::= '#REQUIRED' | '#IMPLIED' | (('#FIXED' S)? AttValue)
Examples:
<!ATTLIST termdef id ID #REQUIRED name CDATA #IMPLIED>
<!ATTLIST list type (bullets|ordered|glossary) "ordered">
<!ATTLIST file format NOTATION (ps | pdf | word) #REQUIRED>
<!ATTLIST form method CDATA #FIXED "POST">

XML DTD Transparency No. 47 Attribute-value normalization
When: after end-of-line processing but before the value is passed to the application.
Steps:
0. End-of-line processing (#xD #xA → #xA; a lone #xD → #xA).
Initially nv = "" (the normalized value).
1. Repeat until the end of the input:
   character reference => append the referenced character to the normalized value;
   entity reference => recursively apply step 1 to the replacement text of the entity;
   white space character (#x20, #xD, #xA, #x9) => append a space character (#x20) to the normalized value;
   otherwise (other character) => append the character to the normalized value.
2. If the attribute type is not CDATA => remove leading and trailing spaces and replace sequences of space (#x20) characters by a single space (#x20) character.
Notes: 1. Character references and entity references are not treated equally: a character reference to white space is appended as is, whereas literal white space in an entity's replacement text is normalized to #x20. 2. Literal white space is normalized to a space.

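XML DTD Transparency No. 48 Examples (assuming the entity declarations d = "&#xD;", a = "&#xA;" and da = "&#xD;&#xA;")
Attribute specification | a is NMTOKENS | a is CDATA
a="(line break)(line break)xyz" | x y z | #x20 #x20 x y z
a="&d;&d;A&a;&#x20;&a;B&da;" | A #x20 B | #x20 #x20 A #x20 #x20 #x20 B #x20 #x20
a="&#xd;&#xd;A&#xa;&#xa;B&#xd;&#xa;" | #xD #xD A #xA #xA B #xD #xA | #xD #xD A #xA #xA B #xD #xA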

XML DTD Transparency No. 49 DTD-Entities Entities are used to define shortcuts to common text, like macros in programming languages. Entity references are references to entities. If name is an entity, then &name; (or %name;, but not both) is its reference. Entities can be declared internal (contents in the same document as the declaration) or external (contents external to the declaration).

XML DTD Transparency No. 50 Internal Entity Declaration Syntax: <!ENTITY entity-name "entity-value"> DTD example: <!ENTITY p1 "Peter"> <!ENTITY birthday "2/12/2000"> XML example: &p1; &birthday; is equivalent to: Peter 2/12/2000

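XML DTD Transparency No. 51 External Entity Declaration Syntax: <!ENTITY entity-name SYSTEM "URI/URL"> DTD example: the general entities writer and copyright are each declared with a system identifier. XML example: &writer;&copyright;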
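A minimal sketch of internal entities at work, using the p1 and birthday entities from the internal-entity slide; the "letter" element is hypothetical.

```xml
<!DOCTYPE letter [
  <!ELEMENT letter (#PCDATA)>
  <!ENTITY p1       "Peter">
  <!ENTITY birthday "2/12/2000">
  <!-- an external entity would be declared with a system identifier
       instead of a value, e.g. (hypothetical file name):
       <!ENTITY writer SYSTEM "writer.xml"> -->
]>
<letter>&p1; &birthday;</letter>
```

After parsing, the content of letter is the text "Peter 2/12/2000".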

XML DTD Transparency No. 52 Structure of XML Documents Logical structure: elements, character data. Physical structure: entities. The document unit is the document entity; its sub-units are external parsed entities and external unparsed entities.

XML DTD Transparency No. 53 Physical Structures An XML document may consist of one or many storage units called entities; they have content and are identified by name. Each XML document has one entity called the document entity, which is the starting entity for the XML processor and may contain the whole document; it is the only kind of entity without a name. Entities may be either parsed or unparsed. A parsed entity's contents are referred to as its replacement text; this text is considered an integral part of the document.

XML DTD Transparency No. 54 Classification of entities
parsed vs. unparsed entities
general vs. parameter entities
external vs. internal entities

XML DTD Transparency No. 55 Parsed entity and unparsed entity An unparsed entity is a resource whose contents are not to be processed by the XML processor. It has an associated notation, identified by name; it must be an external entity (with a public or system identifier); and it is referenced by its entity name (instead of an entity reference), occurring only in the value of ENTITY or ENTITIES attributes. Parsed entities are entities whose contents need to be processed by the XML processor. They are referenced by using entity references, and their contents are referred to as their replacement text.

XML DTD Transparency No. 56 Examples: an external general parsed entity; an internal general parsed entity; an internal parameter parsed entity; an external general unparsed entity, e.g. <!ENTITY cover1 SYSTEM "…" NDATA PDF>. Note: Notations and unparsed entities are rarely used in practice.

XML DTD Transparency No. 57 General entity and parameter entity Parameter entities are parsed entities for use within the DTD; they are referenced by the form %name;. General entities are entities for use within the document content; they are sometimes simply called entities when this leads to no ambiguity, and are referenced by the form &name;. Comparisons: the two kinds use different syntax in the DTD for their definition, use different forms of reference, and are recognized in different contexts.

XML DTD Transparency No. 58 Examples: an external general parsed entity; an internal general parsed entity; an internal parameter parsed entity; an external parameter parsed entity. Notes: all parameter entities are parsed entities. Parameter entities generally contain only grammar information.
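The declarations on these two example slides can be illustrated with one sketch that declares each of the five kinds of entity. Every name and file name below is hypothetical; the external entities are only declared, never referenced, so no file needs to exist.

```xml
<!DOCTYPE report [
  <!-- internal parameter parsed entity (referenced as %text; in a DTD) -->
  <!ENTITY % text "(#PCDATA)">
  <!-- external parameter parsed entity -->
  <!ENTITY % common SYSTEM "common.ent">
  <!-- internal general parsed entity -->
  <!ENTITY org "Example Org">
  <!-- external general parsed entity -->
  <!ENTITY chapter1 SYSTEM "chapter1.xml">
  <!-- external general unparsed entity, with the notation it needs -->
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY logo SYSTEM "logo.gif" NDATA gif>

  <!ELEMENT report (#PCDATA)>
  <!ATTLIST report logo ENTITY #IMPLIED>
]>
<report logo="logo">Published by &org;.</report>
```

The parameter entity text is not referenced inside the element declaration here because, in the internal subset, parameter-entity references may not occur within markup declarations.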

XML DTD Transparency No. 59 Character and Entity References A character reference refers to a specific character in the ISO/IEC 10646 character set.
Character Reference
[66] CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'

XML DTD Transparency No. 60 Character and Entity References (cont'd)
Entity Reference
[67] Reference ::= EntityRef | CharRef
[68] EntityRef ::= '&' Name ';'
[69] PEReference ::= '%' Name ';'

XML DTD Transparency No. 61 Entity Declarations
Entity Declaration
[70] EntityDecl ::= GEDecl | PEDecl
[71] GEDecl ::= '<!ENTITY' S Name S EntityDef S? '>'
[72] PEDecl ::= '<!ENTITY' S '%' S Name S PEDef S? '>'
[73] EntityDef ::= EntityValue (see [9]) | (ExternalID NDataDecl?)
[74] PEDef ::= EntityValue | ExternalID
Notes: 1. General entities can only be referenced outside the DTD. 2. Parameter entities are referenced in the DTD.

XML DTD Transparency No. 62 Literals
[9] EntityValue ::= '"' ([^%&"] | PEReference | Reference)* '"' | "'" ([^%&'] | PEReference | Reference)* "'"
[10] AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
[11] SystemLiteral ::= ('"' [^"]* '"') | ("'" [^']* "'")
[12] PubidLiteral ::= '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
[13] PubidChar ::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]

XML DTD Transparency No. 63 Internal Entities An entity defined by an EntityValue is called an internal entity: the content of the entity is given in the declaration, and there is no separate physical storage object. Some processing of entity and character references in the literal entity value may be required to produce the correct replacement text. An internal entity is always a parsed entity. Example of an internal entity declaration: <!ENTITY Pub-Status "This is a pre-release of the specification.">

XML DTD Transparency No. 64 External Entities If the entity is not internal, it is an external entity.
External Entity Declaration
[75] ExternalID ::= 'SYSTEM' S SystemLiteral | 'PUBLIC' S PubidLiteral S SystemLiteral
[76] NDataDecl ::= S 'NDATA' S Name [VC: Notation Declared]
If the NDataDecl is present, this is a general unparsed entity; otherwise it is a parsed entity. [VC: Notation Declared]: The Name must match the declared name of a notation. The SystemLiteral is called the entity's system identifier, which is a URI. The PubidLiteral is called the entity's public identifier, which the XML processor may use to produce an alternative URI.

XML DTD Transparency No. 65 Examples of external entity declaration
<!ENTITY open-hatch PUBLIC "-//Textuality//TEXT Standard open-hatch boilerplate//EN" "…">
<!ENTITY hatch-pic SYSTEM "../grafix/OpenHatch.gif" NDATA gif>

XML DTD Transparency No. 66 Parsed Entities The Text Declaration External parsed entities may each begin with a text declaration.
Text Declaration
[77] TextDecl ::= '<?xml' VersionInfo? EncodingDecl S? '?>'
Notes: The text declaration must appear at the beginning of an external parsed entity. The text declaration must be provided literally, not by reference to a parsed entity.

XML DTD Transparency No. 67 Well-formed Parsed Entities The document entity is well-formed if it matches the production labeled document [1]. An external general parsed entity is well-formed if it matches the production labeled extParsedEnt [78]. All external parameter entities are well-formed by definition.
Well-Formed External Parsed Entity
[78] extParsedEnt ::= TextDecl? content

XML DTD Transparency No. 68 Well-Formed Parsed Entities (cont'd) An internal general parsed entity is well-formed if its replacement text matches the production labeled content [43]. All internal parameter entities are well-formed by definition. A consequence of well-formedness in entities: the logical and physical structures in an XML document are properly nested; i.e., no start-tag, end-tag, empty-element tag, element, comment, processing instruction, character reference, or entity reference can begin in one entity and end in another.

XML DTD Transparency No. 69 Character Encoding in Entities External parsed entities may each use a different encoding for their characters. All XML processors must support UTF-8 and UTF-16. An entity must declare its encoding in a text declaration if it uses an encoding other than UTF-8 or UTF-16.
Encoding Declaration
[80] EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' | "'" EncName "'")
[81] EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')* /* Encoding name contains only Latin characters */
Examples: <?xml encoding='UTF-8'?> <?xml encoding='EUC-JP'?>

XML DTD Transparency No. 70 XML Processor Treatment of Entities and References The contexts in which character references, entity references, and unparsed entity names might appear:
1. Reference in Content: as a reference in content. Ex: He said: &WhatHeSaid;
2. Reference in Attribute Value: as a reference within either the value of an attribute in a start-tag, or a default value in an attribute declaration; corresponds to the nonterminal AttValue.
3. Occurs as Attribute Value: as a Name, not a reference, appearing as the value of an attribute declared as type ENTITY or ENTITIES.

XML DTD Transparency No. 71 Context in which entities or character references may occur (cont'd)
Ex: <!ENTITY Apicture SYSTEM "…" NDATA GIF> …
4. Reference in Entity Value: as a reference in rule EntityValue.
5. Reference in DTD: as a reference in the internal or external subsets of the DTD, but outside of an EntityValue or AttValue. Ex: %manyElements;

XML DTD Transparency No. 72 Summary on entities
internal vs. external: internal ==> content given in the declaration; external ==> content obtained outside the declaration (ex1, ex2, ex3).
general vs. parameter entities: general ==> used in the document instance; parameter ==> used in the document type declaration (DTD). Ex: ex1 ==> general; ex2 ==> PE.
parsed vs. unparsed entities: parsed ==> the XML processor will parse it (ex1, ex2); unparsed ==> the XML processor need not parse it (ex3).
Note: unparsed entities must be general and external.

XML DTD Transparency No. 73 Possible types of entities: there are only 5 kinds of entities, since unparsed entities must be external and general.
Internal parsed general entities
Internal parsed parameter entities
External parsed general entities
External parsed parameter entities
External unparsed general entities
Internal unparsed entities (x): all internal entities are parsed entities.
External unparsed parameter entities (x): all parameter entities are parsed.

XML DTD Transparency No. 74 Construction of Internal Entity Replacement Text There are two forms of the value of an internal entity. Literal entity value: the quoted string actually present in the entity declaration, corresponding to the non-terminal EntityValue. Replacement text: the content of the entity, after replacement of character references and parameter-entity references. Notes: 1. General-entity references in the literal entity value are not expanded to produce the replacement text. 2. It is the replacement text of the entity that is substituted for every occurrence of its entity reference.

XML DTD Transparency No. 75 Example <!ENTITY book "La Peste: Albert Camus, © 1947 %pub;. &rights;" > => Entity book has the replacement text: "La Peste: Albert Camus, © 1947 Éditions Gallimard. &rights;" Note: No forward reference to a PE is permitted; hence the declaration of entity 'book' cannot be placed before that of the 'pub' entity.

XML DTD Transparency No. 76 Included An entity is included when its replacement text is retrieved and processed, in place of the reference itself, as though it were part of the document at the location the reference was recognized. The replacement text may contain both character data and (except for parameter entities) markup, which must be recognized in the usual way. Ex: with the entities AC = "The &W3C; Advisory Council" and W3C = "WWW Consortium": "&AC;" ==> "The &W3C; Advisory Council" ==> "The WWW Consortium Advisory Council".

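XML DTD Transparency No. 77 Included in literal Same as Included, except that a single or double quote character in the replacement text is always treated as a normal data character and will not terminate the literal. For example, a parameter entity whose replacement text contains quote characters can be referenced inside a quoted entity value without ending that value, whereas a general entity whose replacement text contains a quote cannot be used to close an attribute value.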
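A sketch of the well-formed case, written as the content of an external DTD because parameter-entity references inside declarations are not allowed in the internal subset. The file name and the quote element are hypothetical; the entity names YN and WhatHeSaid echo the ones used elsewhere in these slides.

```xml
<!-- quote.dtd (hypothetical external subset) -->
<!ELEMENT quote (#PCDATA)>
<!-- the replacement text of YN contains double quotes -->
<!ENTITY % YN '"Yes"'>
<!-- %YN; is included in the literal; its quotes are plain data
     and do not end the entity value -->
<!ENTITY WhatHeSaid "He said %YN;">
```

The replacement text of WhatHeSaid is: He said "Yes"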

XML DTD Transparency No. 78 Included as PE Same as 'included in literal', but the replacement text is enlarged by the attachment of one leading and one following space (#x20) character. Ex: pe1 => [red | green | blue]

XML DTD Transparency No. 79 XML Processor Treatment of Entities and References

XML DTD Transparency No. 80 Predefined Entities Entity and character references can both be used to escape the left angle bracket, ampersand, and other delimiters. A set of general entities (amp, lt, gt, apos, quot) is specified for this purpose. Numeric character references may also be used; they are expanded immediately when recognized and must be treated as character data, so the numeric character references "&#60;" and "&#38;" may be used to escape < and & when they occur in character data.
1. <!ENTITY lt "&#38;#60;"> // < double escaping required for
2. <!ENTITY amp "&#38;#38;"> // & well-formed replacement text
3. <!ENTITY gt "&#62;"> // > double escaping harmless but
4. <!ENTITY apos "&#39;"> // ' not needed
5. <!ENTITY quot "&#34;"> // "
Ex: the string "AT&amp;T" ==> "AT&#38;T" ==> "AT&T". If 2. were instead defined as "&#38;", then "AT&amp;T" ==> "AT&T" ==> error, since the replacement text "&" is not well-formed.

XML DTD Transparency No. 81 Notation Declarations
Notations identify by name the format of unparsed entities, e.g., GIF, JPEG, DOC, BMP, …
Notation Declarations
[82] NotationDecl ::= '<!NOTATION' S Name S (ExternalID | PublicID) S? '>'
[83] PublicID ::= 'PUBLIC' S PubidLiteral
4.8 Document Entity
The document entity serves as the root of the entity tree and a starting-point for an XML processor. Unlike other entities, the document entity has no name and might well appear on a processor input stream without any identification at all.

XML DTD Transparency No. 82 Grammar Notation (EBNF)
#xN
[a-zA-Z], [#xN-#xN]
[acg]
[^a-z], [^abc]
"string", 'string'
(expression)
A?
A B
A | B
A - B
A+
A*
/* Comment */
[wfc: … ]
[vc: … ]

XML DTD Transparency No. 83 Appendix D. Expansion of Entity and Character References
<!ENTITY example "<p>An ampersand (&#38;#38;) may be escaped numerically (&#38;#38;#38;) or with a general entity (&amp;amp;).</p>" >
==> ENTITY example has the value (replacement text):
<p>An ampersand (&#38;) may be escaped numerically (&#38;#38;) or with a general entity (&amp;amp;).</p>
A reference in the document to "&example;" causes the text to be reparsed:
==> a p element with the content: An ampersand (&) may be escaped numerically (&#38;) or with a general entity (&amp;).

XML DTD Transparency No. 84 D. More complex example
1 <?xml version='1.0'?>
2 <!DOCTYPE test [
3 <!ELEMENT test (#PCDATA) >
4 <!ENTITY % xx '&#37;zz;'>
5 <!ENTITY % zz '&#60;!ENTITY tricky "error-prone" >' >
6 %xx;
7 ]>
8 <test>This sample shows a &tricky; method.</test>
line 4 => xx has the value "%zz;"
line 5 => zz has the value "<!ENTITY tricky "error-prone" >"
line 6 => %xx; => %zz; => the entity tricky is declared
line 8 => element test has the content: "This sample shows a error-prone method."

XML DTD Transparency No. 85 Conditional Sections Conditional sections are portions of the document type declaration external subset which are included in, or excluded from, the logical structure of the DTD based on the keyword which governs them.
Conditional Section
[61] conditionalSect ::= includeSect | ignoreSect
[62] includeSect ::= '<![' S? 'INCLUDE' S? '[' extSubsetDecl ']]>'
[63] ignoreSect ::= '<![' S? 'IGNORE' S? '[' ignoreSectContents* ']]>'
[64] ignoreSectContents ::= Ignore ('<![' ignoreSectContents ']]>' Ignore)*
[65] Ignore ::= Char* - (Char* ('<![' | ']]>') Char*)

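XML DTD Transparency No. 86 Conditional Sections Example: <![%draft;[ ... ]]> <![%final;[ ... ]]>, where the parameter entities draft and final each expand to the keyword INCLUDE or IGNORE.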
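A sketch of how the two sections might be filled in, written as an external DTD because conditional sections are not allowed in the internal subset. The file name, the entity values and the book content models are assumptions for illustration.

```xml
<!-- book.dtd (hypothetical external subset) -->
<!ENTITY % draft 'INCLUDE'>
<!ENTITY % final 'IGNORE'>

<!-- kept, because %draft; expands to INCLUDE -->
<![%draft;[
<!ELEMENT book (comments*, title, body, supplements?)>
]]>
<!-- skipped, because %final; expands to IGNORE -->
<![%final;[
<!ELEMENT book (title, body, supplements?)>
]]>
```

Swapping the two entity values switches the DTD from the draft content model to the final one without touching the declarations themselves.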