Validating DOCUMENTS with DTDs


1 Validating DOCUMENTS with DTDs
Tutorial 3: XML Validating DOCUMENTS with DTDs

2 Creating a Valid Document
Section 3.1 Creating a Valid Document

3 Customer orders table

4 The structure of the order.xml document
Element tree (+ = one or more, ? = zero or one, [ ] = optional attribute):
customers
  customer +   (attributes: custID, [custType])
    name       (attribute: [title])
    address
    phone
    ?          (optional child; its name is not given here)
    orders
      order +  (attributes: orderID, orderBy)
        orderDate
        items
          item +   (attributes: itemPrice, [itemQty])
The customers element must have at least one customer child.
A customer must have a custID, name, address, and phone, and may have a custType and title.
An orders element is used to group one or more separate orders placed by a customer.
The orders element must have at least one order child.
The items element must have at least one item child.

5 The first customer in the Orders.xml

6 DTD and A Valid Document
An XML document can be validated using either DTDs (Document Type Definitions) or schemas. A DTD is a collection of rules that define the content and structure of an XML document. A DTD can be used to enforce a specific data structure, ensure all required elements are present, prevent undefined elements from being used, and specify the use of attributes and define their possible values.

7 Declaring a DTD
A DTD is declared in a DOCTYPE statement. It has to be added to the document prolog, after the XML declaration and before the document's root element. While there can only be one DTD per XML document, it can be divided into two parts: an internal subset, placed within the same XML document, and an external subset, located in a separate file.

8 To declare an internal DTD subset
<!DOCTYPE root [ declarations ]>
where root is the name of the document's root element and declarations are the rules of the DTD.
An example: <!DOCTYPE customers
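
Because the example on this slide is cut off, the following is a minimal sketch of what a complete internal subset can look like. The two element declarations are illustrative assumptions, not the rules used in the course files.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!-- Internal subset: the declarations sit between [ and ] inside the DOCTYPE -->
<!DOCTYPE customers [
  <!-- Assumed rules, for illustration only -->
  <!ELEMENT customers (customer+)>
  <!ELEMENT customer (#PCDATA)>
]>
<customers>
  <customer>Lea Ziegler</customer>
</customers>
```

The name after DOCTYPE has to match the root element, here customers.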

9 To declare an external DTD subset
External subsets have two types of locations: system and public. For a system DTD, the syntax is:
<!DOCTYPE root SYSTEM "uri_ExternalFile">
An example:
<!DOCTYPE customers SYSTEM "rules.dtd">

10 To declare an external DTD subset
The syntax of the DOCTYPE declaration using a public identifier is:
<!DOCTYPE root PUBLIC "id" "uri">
where id is the public identifier, which acts like a namespace URI, and uri is the system location of the DTD.
An example:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "

11 Using External & Internal DTDs
Much of the power of DTDs comes from an external subset that can be shared among many documents. If a document contains both an internal and an external subset, the internal subset takes precedence over the external subset if there is a conflict between the two: the internal subset is read first, and for entity and attribute declarations the first declaration read is the one that applies. This way, the external subset defines the basic rules for all the documents, and the internal subset defines the rules specific to each document.
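
The precedence rule can be sketched with an entity declared in both subsets. This is a hedged illustration: the contents of rules.dtd shown in the comment, the note element, and the owner entity are all assumptions made up for the example.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Assumed contents of the shared external file rules.dtd:
       <!ELEMENT note (#PCDATA)>
       <!ENTITY owner "Shared default owner">
-->
<!DOCTYPE note SYSTEM "rules.dtd" [
  <!-- The internal subset is read first, so this declaration of owner applies -->
  <!ENTITY owner "Owner for this document only">
]>
<note>&owner;</note>
```

Under these assumptions a validating parser would expand &owner; to the internal value, while other documents that share rules.dtd keep the external one.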

12 Using External and Internal DTDs

13 Declaring Document Elements
In a valid document, every element must be declared in the DTD. The syntax of an element declaration is:
<!ELEMENT element content-model>
where element is the name of the element and content-model specifies what type of content the element contains. The element name is case sensitive.

14 Five values for content-model
ANY - No restrictions on the element's content
EMPTY - The element cannot store any content
#PCDATA - The element can only contain parsed character data
Elements - The element can only contain child elements
Mixed - The element contains both parsed character data and child elements

15 <!ELEMENT element ANY>
An example:
<!ELEMENT product ANY>
All of the following satisfy the above declaration:
<product>PLBK70 Painted Lady Breeding Kit</product>
<product type = "Painted Lady Breeding Kit" />
<product> <name>PLBK70</name> <type> Painted Lady Breeding Kit</type> </product>

16 <!ELEMENT element EMPTY>
An example: <!ELEMENT img EMPTY> The following would satisfy the above declaration: <img />

17 <!ELEMENT element (#PCDATA)>
An example:
<!ELEMENT name (#PCDATA)>
would permit the following element in an XML document:
<name>Lea Ziegler</name>
An element declared as #PCDATA may contain plain text. The "parsed" part of the name means that markup in the text is parsed instead of being treated as raw text. It also means that entity references are replaced. A #PCDATA element does not allow child elements.

18 <!ELEMENT parent (children)>
<!ELEMENT customer (phone)>
The customer element can contain only a single child element, named phone. The following would be invalid, because customer also contains a name element:
<customer> <name>Lea Ziegler</name> <phone> </phone> </customer>

19 <!ELEMENT customer (name, phone, email)>
Specifying an element sequence:
<!ELEMENT parent (child1, child2, . .)>
where child1, child2, . . is the order in which the child elements must appear within the parent element.
<!ELEMENT customer (name, phone, email)>
indicates the document below is invalid:
<customer> <name>Lea Ziegler</name> <phone>(813) </phone> </customer>

20 Specifying an element choice
<!ELEMENT parent (child1 | child2 | . .)>
where child1, child2, . . are the possible child elements of the parent element.
<!ELEMENT customer (name | company)>
allows the customer element to contain either the name element or the company element.
<!ELEMENT customer ((name | company), phone, )>
indicates that the customer element must have three child elements.
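
A choice nested inside a sequence can be sketched as follows. This is a reduced, hypothetical variant with two children per customer rather than the three on the slide; the contacts root element and the placeholder text are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE contacts [
  <!ELEMENT contacts (customer+)>
  <!-- Either name or company must come first, followed by phone -->
  <!ELEMENT customer ((name | company), phone)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT company (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
]>
<contacts>
  <customer>
    <name>Lea Ziegler</name>
    <phone>(sample phone)</phone>
  </customer>
  <customer>
    <company>(sample company)</company>
    <phone>(sample phone)</phone>
  </customer>
</contacts>
```

A customer that held both name and company, or that put phone first, would not match this content model.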

21 Modifying Symbols
DTDs use a modifying symbol to specify the number of occurrences of each element:
? allows zero or one of the item
+ allows one or more of the item
* allows zero or more of the item
There is no symbol for an exact count. If you want to specify that an element contains exactly three child elements, you have to enter the sequence (child, child, child) into the declaration.

22 Modifying Symbols
<!ELEMENT customers (customer+)>
The customers element must contain at least one element named customer.
<!ELEMENT order (orderDate, items)+>
The (orderDate, items) sequence can be repeated one or more times within each order element.
<!ELEMENT customer (name, address, phone, ?, orders)>
The customer element contains zero or one element.

23 Declaring child elements
The customers element can contain one or more customer elements.
The customer element has the following child elements: name, address, phone (optional), and orders.
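
Putting the element declarations together, a document following the orders structure might look like the sketch below. Several details are assumptions: which child of customer is optional is not fully recoverable from this transcript, so a hypothetical email child is used, the leaf elements are assumed to hold #PCDATA, and all element text is placeholder data.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE customers [
  <!ELEMENT customers (customer+)>
  <!-- email is an assumed name for the optional child -->
  <!ELEMENT customer (name, address, phone, email?, orders)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT address (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
  <!ELEMENT orders (order+)>
  <!ELEMENT order (orderDate, items)>
  <!ELEMENT orderDate (#PCDATA)>
  <!ELEMENT items (item+)>
  <!ELEMENT item (#PCDATA)>
]>
<customers>
  <customer>
    <name>Lea Ziegler</name>
    <address>(sample address)</address>
    <phone>(sample phone)</phone>
    <!-- email is left out here, which the ? modifier allows -->
    <orders>
      <order>
        <orderDate>(sample date)</orderDate>
        <items>
          <item>PLBK70</item>
        </items>
      </order>
    </orders>
  </customer>
</customers>
```

Attribute declarations are left out of this sketch to keep the focus on content models.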

24 Working with Mixed Content
Mixed content elements contain both parsed character data and child elements. The syntax is: <!ELEMENT parent (#PCDATA | child1 | child2 | … )*> The parent element can contain character data or any number of the specified child elements, or it can contain no content at all. It is better not to work with mixed content if you want a tightly structured document.
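
A minimal sketch of a mixed content declaration follows. The note, b, and i element names are hypothetical and chosen only to show text interleaved with child elements.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE note [
  <!-- Mixed content: #PCDATA is listed first and the group ends with * -->
  <!ELEMENT note (#PCDATA | b | i)*>
  <!ELEMENT b (#PCDATA)>
  <!ELEMENT i (#PCDATA)>
]>
<note>Ship the <b>Painted Lady</b> kit <i>before</i> Friday.</note>
```

The declaration cannot constrain the order or number of b and i children, which is why mixed content suits loosely structured text better than tightly structured data.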

25 Section 3.2 Declaring Attributes

26 Declaring Element Attributes
Add an attribute-list declaration to the document's DTD to accomplish the following: list the names of all of the attributes associated with a specific element, specify the data type of each attribute, indicate whether each attribute is required or optional, and provide a default value for each attribute, if necessary.

27 Attributes used in orders.xml
Element    Attribute   Required   Default Value(s)
customer   custID      Yes        None
customer   custType    No         "home", "school", or "business"
name       title                  "Mr.", "Mrs.", "Ms."
order      orderID
order      orderBy
item       itemPrice
item       itemQty                "1"

28 Declaring Attributes in a DTD
<!ATTLIST element attribute1 type1 default1
                  attribute2 type2 default2
                  attribute3 type3 default3 … >
or
<!ATTLIST element attribute1 type1 default1 >
<!ATTLIST element attribute2 type2 default2 >
<!ATTLIST element attribute3 type3 default3 >
where element is the name of the element associated with the attributes, attribute is the name of an attribute, type is the attribute's data type, and default indicates whether the attribute is required and whether it has a default value.

29 Declaring Attribute Names
Attribute-list declarations can be placed anywhere within the document type declaration, although they are easier to work with if they are located adjacent to the declaration for the element with which they are associated.

30 Attribute Types
Attribute values can consist only of character data, but you can control the format of those characters:
CDATA - character data
Enumerated list - a list of possible attribute values
ID - a unique text string
IDREF - a reference to an ID value
ENTITY - a reference to an external unparsed entity
ENTITIES - a list of entities separated by white space
NMTOKEN - an accepted XML name
NMTOKENS - a list of XML names separated by white space

31 CDATA can contain any character except those reserved by XML
<!ATTLIST element attribute CDATA default>
<!ATTLIST item itemPrice CDATA>
<!ATTLIST item itemQty CDATA>
The two item declarations above leave out the default, which a complete declaration must also include.
Any of the following attribute values are allowed:
<item itemPrice="29.95"> </item>
<item itemPrice="$29.95"> </item>
<item itemPrice="£29.95"> </item>

32 Enumerated Types: Attributes that are limited to a set of possible values
<!ATTLIST element attribute (value1 | value2 | value3 | . . ) default >
where value1, value2, . . are the allowed values.
<!ATTLIST customer custType (home | business | school)>
Any custType attribute whose value is not "home", "school", or "business" causes parsers to reject the document as invalid.

33 Tokenized Types
Tokenized types are character strings that follow certain rules (known as tokens) for format and content. DTDs support four kinds of tokens: IDs, ID references, name tokens, and entities.

34 <!ATTLIST customer custID ID>
The ID token is used when an attribute value must be unique within the document.
<!ATTLIST customer custID ID>
This declaration ensures each customer will have a unique ID. The following elements would not be valid because the same custID value is used more than once:
<customer custID="Cust021"> ... </customer>
<customer custID="Cust021"> ... </customer>

35 <!ATTLIST element attribute IDREF default>
An attribute declared as an IDREF token must have a value equal to the value of an ID attribute located somewhere in the same document. This enables an XML document to contain cross-references between one element and another.
<!ATTLIST order orderBy IDREF>
When a validating XML parser encounters this attribute, it searches the XML document for an ID value that matches the value of the orderBy attribute. If it doesn't find one, it rejects the document as invalid.

36 An attribute contains a list of ID references
<!ATTLIST customer orders IDREFS>
<!ATTLIST order orderID ID>
<customer orders="OR3413 OR3910 OR5310"> ... </customer>
...
<order orderID="OR3413"> ... </order>
<order orderID="OR3910"> ... </order>
<order orderID="OR5310"> ... </order>

37 Specifying attribute IDs and IDREFs
Each custID value must be unique in the document.
Each orderBy value must reference an ID value somewhere in the document.
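
The ID and IDREF rules can be seen together in one small document. This is a sketch under assumptions: the content models are simplified, the #REQUIRED defaults are added so that the declarations are complete, and the item text is sample data.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE customers [
  <!ELEMENT customers (customer+)>
  <!ELEMENT customer (order+)>
  <!ELEMENT order (#PCDATA)>
  <!ATTLIST customer custID ID #REQUIRED>
  <!ATTLIST order orderID ID    #REQUIRED
                  orderBy IDREF #REQUIRED>
]>
<customers>
  <customer custID="Cust021">
    <!-- orderBy has to match some ID value in this document -->
    <order orderID="OR3413" orderBy="Cust021">PLBK70</order>
  </customer>
</customers>
```

Changing orderBy to a value such as Cust999, which no ID attribute carries, would make the document invalid, as would a second element reusing Cust021 or OR3413 as an ID.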

38 Attribute Defaults
There are four possible defaults:
#REQUIRED: The attribute must appear with every occurrence of the element.
#IMPLIED: The attribute is optional.
An optional default value: A validating XML parser will supply the default value if one is not specified.
#FIXED: The attribute is optional. If an attribute value is specified, it must match the default value.

39 An attribute contains a list of ID references
<!ATTLIST customer custID ID #REQUIRED>
A customer ID is required for every customer.
<!ATTLIST customer custType (school | home | business) #IMPLIED>
If an XML parser encounters a customer element without a custType attribute, it treats the attribute as having no value.
<!ATTLIST item itemQty CDATA "1">
A validating parser assumes a value of "1" for itemQty if the attribute is missing.
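
The defaults can be compared side by side in a short sketch. The currency attribute is hypothetical and is included only to show #FIXED; the second price is sample data.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE items [
  <!ELEMENT items (item+)>
  <!ELEMENT item (#PCDATA)>
  <!ATTLIST item itemPrice CDATA #REQUIRED
                 itemQty   CDATA "1"
                 currency  CDATA #FIXED "USD">
]>
<items>
  <!-- itemQty and currency are omitted, so the declared defaults apply -->
  <item itemPrice="29.95">PLBK70</item>
  <!-- An explicit currency is allowed only because it equals the fixed value -->
  <item itemPrice="59.95" itemQty="2" currency="USD">BFGK10</item>
</items>
```

Leaving out itemPrice, or writing a currency other than USD, would make the document invalid under these declarations.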

40 Specifying attribute defaults

41 DTDs and Namespaces
You can work with namespace prefixes, applying a validation rule to the element's qualified name:
<!ELEMENT cu:phone (#PCDATA)>
Any namespace declarations in a document must also be included in the DTD for the document to be valid. This is usually done by declaring the xmlns attribute with a fixed value holding the namespace's URI:
<!ATTLIST cu:customers xmlns:cu CDATA #FIXED " ">
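
A minimal sketch of a namespace-qualified document that validates against an internal subset is shown below. The namespace URI is a placeholder assumption, since the original URI is not given here, and the phone text is sample data.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE cu:customers [
  <!ELEMENT cu:customers (cu:phone+)>
  <!ELEMENT cu:phone (#PCDATA)>
  <!-- Placeholder URI, assumed for this example -->
  <!ATTLIST cu:customers xmlns:cu CDATA #FIXED "http://example.com/customers">
]>
<cu:customers xmlns:cu="http://example.com/customers">
  <cu:phone>(sample phone)</cu:phone>
</cu:customers>
```

DTDs treat cu:phone as a plain name, so a document using a different prefix for the same namespace would not match these declarations.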

42 Validating a Document with XMLSpy
The Web is an excellent source for validating parsers, including Web sites in which you can upload your XML document for free to have it validated. XMLSpy is an XML development environment created by Altova, which is used for designing and editing professional applications involving XML, XML Schema, and other XML-based technologies.

43 Section 3.3 Working with entities

44 Introducing Entities
XML supports the following built-in entities: &amp; &lt; &gt; &apos; &quot;
If you have a long text string that will be repeated throughout your XML document, avoid data entry errors by placing the text string in its own entity. You can create your own customized set of entities corresponding to text strings, such as product descriptions, that you want referenced by the XML document.

45 Working with General Entities
A general entity is an entity that references content to be used within an XML document. That content can be either parsed or unparsed. A parsed entity references text that can be readily interpreted or parsed by an application reading the XML document. An entity that references content that is either nontextual or which cannot be interpreted by an XML parser is an unparsed entity. One example of an unparsed entity is an entity that references a graphic image file.

46 Working with General Entities
The content referenced by an entity can be placed either within the DTD or in an external file. Internal entities reference content found in the DTD. External entities reference content from external files.

47 Internal Parsed Entities
<!ENTITY entity "value">
where entity is the name assigned to the entity and value is the entity's value, which must be well-formed XML text.
Examples:
<!ENTITY MBL25 "Monarch Butterfly, 6-12 larvae">
<!ENTITY MBL25 "<desc>Monarch Butterfly, 6-12 larvae</desc>">
The characters & and % are not allowed on their own as part of an entity's value. Use &amp; to include the & symbol, if necessary.

48 External Parsed Entities
For longer text strings, place the content in an external file. To create an external parsed entity, use:
<!ENTITY entity SYSTEM "uri">
where uri is the URI of the external file containing the entity value.
In the declaration:
<!ENTITY MBL25 SYSTEM "description.xml">
an entity named "MBL25" gets its value from the description.xml file.

49 Referencing a General Entity
After an entity is declared, it can be referenced anywhere within the document. The syntax is:
&entity;
For example,
<item>&MBL25;</item>
is interpreted as
<item>Monarch Butterfly, 6-12 larvae</item>
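
Declaration and reference can be sketched in one document. The MBL25 entity comes from the slides; the remark entity and the simplified items structure are assumptions added to show the &amp; escape inside an entity value.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE items [
  <!ELEMENT items (item+)>
  <!ELEMENT item (#PCDATA)>
  <!ENTITY MBL25 "Monarch Butterfly, 6-12 larvae">
  <!-- A literal & inside an entity value is written as &amp; -->
  <!ENTITY remark "Butterflies &amp; moths">
]>
<items>
  <item>&MBL25;</item>
  <item>&remark;</item>
</items>
```

A parser replaces each reference with the entity's text when it reads the document, so an application sees the expanded strings rather than the entity names.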

50 Declare parsed entities in the codes.dtd file
Declare parsed entities in the codes.dtd file for the product codes in the orders.xml document. Each declaration pairs an entity name with an entity value:
<!ENTITY BF100P "Butterfly farm pop-up self erecting portable greenhouse">
<!ENTITY BFGK10 "Field of Dreams backyard butterfly garden kit">
<!ENTITY HME100 "Hummingbird Hawkmoth (Manduca Sexta), 100 eggs">
<!ENTITY MBL25 "Monarch Butterfly, 6-12 larvae">
<!ENTITY MP12 "Monarch Pupae (Danaus Plexippus), 12 pupae">
<!ENTITY MWT15 "Giant Milkweed Tree (Calotropis Ssp.), 1 crown flower">
<!ENTITY PLBK70 "Painted Lady classroom breeding kit, 70 larvae">

51 Parameter Entities
Parameter entities are used to store the content of a DTD.
For internal parameter entities, the syntax is:
<!ENTITY % entity "value">
For external parameter entities, the syntax is:
<!ENTITY % entity SYSTEM "uri">
Once a parameter entity has been declared, you can add a reference to it within the DTD using:
%entity;
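
An internal parameter entity can be sketched as follows. The codeDecl name and the simplified items structure are hypothetical. In an internal subset, a parameter entity reference may appear only between declarations, not inside one, so the entity here holds a complete declaration.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE items [
  <!ELEMENT items (item+)>
  <!ELEMENT item (#PCDATA)>
  <!-- Internal parameter entity holding a complete declaration -->
  <!ENTITY % codeDecl '<!ENTITY MBL25 "Monarch Butterfly, 6-12 larvae">'>
  <!-- The reference inserts that declaration at this point in the DTD -->
  %codeDecl;
]>
<items>
  <item>&MBL25;</item>
</items>
```

The external form works the same way, except that the declarations come from the file named after SYSTEM.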

52 Combining DTDs with parameter entities

53 Add a parameter entity to the DTD within the orders.xml file to load the contents of the codes.dtd file
<!DOCTYPE customers [
.
<!ENTITY % itemCodes SYSTEM "codes.dtd">
%itemCodes;
]>
<customers>
.
<orders>
<order orderID="or10311" orderBy="cust201">
</order>
The ENTITY declaration is the parameter entity pointing to the code in the codes.dtd file, and %itemCodes; is the reference to the itemCodes parameter entity.

54 Inserting general entities
reference to the BFGK10 general entity

55 Entity references and values

56 Adding entities to the internal DTD

57 Adding DTD comments

58 Conditional Sections
<![ keyword [ declarations ]]>
where keyword is INCLUDE for a section of declarations that you want parsers to interpret, or IGNORE for declarations that you want parsers to pass over.
<![IGNORE[
<!ELEMENT Magazine (Name)>
<!ATTLIST Magazine Publisher CDATA #REQUIRED>
<!ELEMENT Name (#PCDATA)>
]]>
<![INCLUDE[
<!ELEMENT Book (Title, Author)>
<!ATTLIST Book Pages CDATA #REQUIRED>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Author (#PCDATA)>
]]>

59 Conditional Sections using a parameter entity
<!ENTITY % UseFullDTD "IGNORE">
<![ %UseFullDTD; [
<!ELEMENT Magazine (Name)>
<!ATTLIST Magazine Publisher CDATA #REQUIRED>
<!ELEMENT Name (#PCDATA)>
]]>
By changing the value of UseFullDTD from IGNORE to INCLUDE, you can add any conditional section that uses this entity reference to the document's DTD. Thus, you can switch multiple sections in the DTD off and on by editing a single line in the file. This is most useful when several conditional sections are scattered throughout a very long DTD. Conditional sections can be applied only to external DTDs.
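
The switch technique extends to several sections, each controlled by its own parameter entity. The sketch below shows the contents of a hypothetical external DTD file; the UseMagazine and UseBook entity names are assumptions, while the declarations inside the sections are the ones from the slides.

```xml
<!-- Contents of a hypothetical external DTD file -->
<!ENTITY % UseMagazine "IGNORE">
<!ENTITY % UseBook "INCLUDE">

<!-- Skipped while UseMagazine is IGNORE -->
<![%UseMagazine;[
  <!ELEMENT Magazine (Name)>
  <!ATTLIST Magazine Publisher CDATA #REQUIRED>
  <!ELEMENT Name (#PCDATA)>
]]>

<!-- Read while UseBook is INCLUDE -->
<![%UseBook;[
  <!ELEMENT Book (Title, Author)>
  <!ATTLIST Book Pages CDATA #REQUIRED>
  <!ELEMENT Title (#PCDATA)>
  <!ELEMENT Author (#PCDATA)>
]]>
```

Because the internal subset is read before the external one, a document that uses this file could in principle redeclare UseMagazine as "INCLUDE" in its own internal subset to switch that section on without editing the shared file.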

60 Working with Unparsed Entities
For a DTD to be able to validate either binary data (images, video) or character data that is not well formed, you need to work with unparsed entities. Because an XML parser cannot work with this type of data directly, a DTD needs to include instructions for how to treat the unparsed entity. To declare an unparsed entity, you must first declare a notation for the data type used in the entity, and then associate a notation with an unparsed entity

61 Declaring a notation
<!NOTATION notation SYSTEM "uri">
where notation is the name of the notation and uri is a system location that defines the data type or a program that can work with the data type.
For example, the following declares a notation named "jpeg" that points to the application paint.exe:
<!NOTATION jpeg SYSTEM "paint.exe">
You could also use the MIME type as the value:
<!NOTATION jpeg SYSTEM "image/jpeg">

62 Associating a notation with an unparsed entity
<!ENTITY entity SYSTEM "uri" NDATA notation>
where entity is the name of the entity, uri is the system location of a file containing the unparsed data, and notation is the name of the notation that defines the data type.
For example, the following declaration creates an unparsed entity named BF100PIMG that references the graphic image file bf100p.jpg:
<!ENTITY BF100PIMG SYSTEM "bf100p.jpg" NDATA jpeg>

63 Adding the image attribute to an XML document
Once you have created an entity to reference unparsed data, that entity can be associated with attribute values by using the ENTITY data type in the attribute declaration. For example,
<!ATTLIST item image ENTITY #REQUIRED>
With this declaration added, you could then add the image attribute to an XML document, using the BF100PIMG entity as the attribute's value:
<item image="BF100PIMG">
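
The three steps (notation, unparsed entity, ENTITY attribute) can be combined in a single sketch. The declarations come from the slides; the simplified items structure and the item text are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE items [
  <!ELEMENT items (item+)>
  <!ELEMENT item (#PCDATA)>
  <!-- Step 1: declare a notation for the data type -->
  <!NOTATION jpeg SYSTEM "image/jpeg">
  <!-- Step 2: declare the unparsed entity and tie it to the notation -->
  <!ENTITY BF100PIMG SYSTEM "bf100p.jpg" NDATA jpeg>
  <!-- Step 3: allow item to point at an unparsed entity -->
  <!ATTLIST item image ENTITY #REQUIRED>
]>
<items>
  <!-- The attribute value is the entity name, written without & or ; -->
  <item image="BF100PIMG">BF100P</item>
</items>
```

The parser does not load bf100p.jpg itself. It passes the entity's location and notation to the application, which decides how to handle the image.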

64 Validating Standard Vocabularies
To validate a document used with a standard vocabulary, you have to access an external DTD located on a Web server or rely upon a DTD built into your XML parser. For example, to validate an XHTML document against the XHTML 1.0 strict standard, add:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "
<html> </html>

65 DOCTYPE declarations for standard vocabularies

