Validating DOCUMENTS with DTDs

Tutorial 3: XML. Validating DOCUMENTS with DTDs

Section 3.1 Creating a Valid Document

Customer orders table

The structure of the orders.xml document
(+ means one or more, ? means optional, [ ] marks an optional attribute)

customers
  customer +   attributes: custID, [custType]
    name       attribute: [title]
    address
    phone
    email ?
    orders
      order +   attributes: orderID, orderBy
        orderDate
        items
          item +   attributes: itemPrice, [itemQty]

The customers element must have at least one customer child.
A customer must have a custID, name, address, and phone, and may have a custType, title, and email.
An orders element is used to group one or more separate orders placed by a customer; it must have at least one order child.
The items element must have at least one item child.

The first customer in orders.xml

DTD and a Valid Document
An XML document can be validated using either DTDs (Document Type Definitions) or schemas. A DTD is a collection of rules that define the content and structure of an XML document. A DTD can be used to:
- enforce a specific data structure
- ensure all required elements are present
- prevent undefined elements from being used
- specify the use of attributes and define their possible values

Declaring a DTD
A DTD is declared in a DOCTYPE statement. It has to be added to the document prolog, after the XML declaration and before the document's root element. While there can only be one DTD per XML document, it can be divided into two parts:
- An internal subset is placed within the same XML document.
- An external subset is located in a separate file.

To declare an internal DTD subset
<!DOCTYPE root [ declarations ]>
where root is the name of the document's root element and declarations are the DTD rules.
An example: <!DOCTYPE customers
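
A minimal sketch of a complete document with an internal subset may help here. The content model for customer is deliberately simplified to plain text for illustration; it is an assumption, not the structure used in orders.xml.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!-- Internal subset: every declaration sits inside the DOCTYPE brackets. -->
<!DOCTYPE customers [
  <!-- The root element must hold one or more customer elements. -->
  <!ELEMENT customers (customer+)>
  <!-- Simplified for this sketch: a customer holds text only. -->
  <!ELEMENT customer (#PCDATA)>
]>
<customers>
  <customer>Lea Ziegler</customer>
</customers>
```

The name after DOCTYPE has to match the root element, which is why both are customers.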

To declare an external DTD subset
External subsets have two types of locations: system and public. For a system DTD:
<!DOCTYPE root SYSTEM "uri_ExternalFile">
An example:
<!DOCTYPE customers SYSTEM "rules.dtd">

To declare an external DTD subset
The syntax of the DOCTYPE declaration using a public identifier:
<!DOCTYPE root PUBLIC "id" "uri">
where id is the public identifier, which acts much like a namespace URI, and uri is the location of the DTD file.
An example:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

Using External & Internal DTDs
The real power of XML comes from an external DTD that can be shared among many documents. If a document contains both an internal and an external subset, the internal subset is read first, so its entity and attribute-list declarations take precedence over conflicting ones in the external subset. (An element, by contrast, may be declared only once.) This way, the external subset defines the basic rules for all the documents, and the internal subset defines the rules specific to each document.

Using External and Internal DTDs
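
As a sketch of how the two subsets combine, the document below pulls shared rules from rules.dtd and overrides one entity locally. The contents shown for rules.dtd and the entity name store are assumptions made for this illustration.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!--
  Assumed contents of the shared external subset rules.dtd:

    <!ELEMENT customers (customer+)>
    <!ELEMENT customer (#PCDATA)>
    <!ENTITY store "Default store name">
-->
<!DOCTYPE customers SYSTEM "rules.dtd" [
  <!-- Read before rules.dtd, so this declaration of store is the one used. -->
  <!ENTITY store "Store name for this document">
]>
<customers>
  <customer>Lea Ziegler, registered at &store;</customer>
</customers>
```

Because the internal subset is processed first, &store; expands to the document-specific text while the element rules still come from the shared file.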

Declaring Document Elements
In a valid document, every element must be declared in the DTD. The syntax of an element declaration is:
<!ELEMENT element content-model>
where element is the name of the element and content-model specifies what type of content the element contains. The element name is case sensitive.

Five values for content-model
- ANY: no restrictions on the element's content
- EMPTY: the element cannot store any content
- #PCDATA: the element can only contain parsed character data
- Elements: the element can only contain child elements
- Mixed: the element contains both parsed character data and child elements

<!ELEMENT element ANY>
An example:
<!ELEMENT product ANY>
All of the following satisfy the above declaration, provided that any child elements and attributes they use are themselves declared in the DTD:
<product>PLBK70 Painted Lady Breeding Kit</product>
<product type="Painted Lady Breeding Kit" />
<product> <name>PLBK70</name> <type>Painted Lady Breeding Kit</type> </product>

<!ELEMENT element EMPTY>
An example:
<!ELEMENT img EMPTY>
The following would satisfy the above declaration:
<img />

<!ELEMENT element (#PCDATA)>
An example:
<!ELEMENT name (#PCDATA)>
would permit the following element in an XML document:
<name>Lea Ziegler</name>
A #PCDATA element may contain plain text. The "parsed" part means that the parser scans the text for markup instead of treating it as raw text, and that entity references in it are replaced. A #PCDATA element does not allow child elements.

<!ELEMENT parent (children)>
An example:
<!ELEMENT customer (phone)>
The customer element can contain only a single child element, named phone. The following would be invalid because the name element is not part of the content model:
<customer> <name>Lea Ziegler</name> <phone>555-2819</phone> </customer>

Specifying an element sequence
<!ELEMENT parent (child1, child2, ...)>
where child1, child2, ... is the order in which the child elements must appear within the parent element.
<!ELEMENT customer (name, phone, email)>
indicates that the document below is invalid, because email appears before phone:
<customer> <name>Lea Ziegler</name> <email>LZiegler@tempmail.net</email> <phone>(813) 555-8931</phone> </customer>

Specifying an element choice
<!ELEMENT parent (child1 | child2 | ...)>
where child1, child2, ... are the possible child elements of the parent element.
<!ELEMENT customer (name | company)>
allows the customer element to contain either the name element or the company element.
<!ELEMENT customer ((name | company), phone, email)>
indicates that the customer element must have three child elements: either name or company, followed by phone and then email.
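
The choice-within-a-sequence rule can be sketched as a small self-contained document. The second customer and its values are hypothetical, and the leaf elements are assumed to hold plain text.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE customers [
  <!ELEMENT customers (customer+)>
  <!-- Either name or company first, then phone, then email. -->
  <!ELEMENT customer ((name | company), phone, email)>
  <!-- Assumption for this sketch: leaf elements are text only. -->
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT company (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
]>
<customers>
  <customer>
    <name>Lea Ziegler</name>
    <phone>(813) 555-8931</phone>
    <email>LZiegler@tempmail.net</email>
  </customer>
  <customer>
    <company>Hypothetical School Supplies</company>
    <phone>555-0100</phone>
    <email>orders@example.com</email>
  </customer>
</customers>
```

Both customers are valid; a customer holding both name and company, or neither, would be rejected.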

Modifying Symbols
DTDs use a modifying symbol to specify the number of occurrences of each element:
- ? allows zero or one of the item
- + allows one or more of the item
- * allows zero or more of the item
There is no symbol for an exact count. If you want to specify that an element contains exactly three child elements, you have to enter the sequence (child, child, child) into the declaration.

Modifying Symbols
<!ELEMENT customers (customer+)>
the customers element must contain at least one element named customer
<!ELEMENT order (orderDate, items)+>
the (orderDate, items) sequence can be repeated one or more times within each order element
<!ELEMENT customer (name, address, phone, email?, orders)>
the customer element contains zero or one email element

Declaring child elements
The customers element can contain one or more customer elements. The customer element has the following child elements: name, address, phone, email (optional), and orders.
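
Putting the declarations from this section together, a sketch of the full element structure might look like the following. The content models for orders, order, and items follow the structure diagram; treating every leaf element as #PCDATA, and the sample address, date, and item text, are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE customers [
  <!ELEMENT customers (customer+)>
  <!ELEMENT customer (name, address, phone, email?, orders)>
  <!ELEMENT orders (order+)>
  <!ELEMENT order (orderDate, items)>
  <!ELEMENT items (item+)>
  <!-- Assumption for this sketch: all leaf elements hold text only. -->
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT address (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
  <!ELEMENT orderDate (#PCDATA)>
  <!ELEMENT item (#PCDATA)>
]>
<customers>
  <customer>
    <name>Lea Ziegler</name>
    <address>Sample address line</address>
    <phone>555-2819</phone>
    <!-- email is optional (email?), so it is left out here. -->
    <orders>
      <order>
        <orderDate>2008-01-01</orderDate>
        <items>
          <item>PLBK70</item>
        </items>
      </order>
    </orders>
  </customer>
</customers>
```

Attributes are omitted on purpose; without ATTLIST declarations, adding custID or orderID to this document would make it invalid.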

Working with Mixed Content
Mixed content elements contain both parsed character data and child elements. The syntax is:
<!ELEMENT parent (#PCDATA | child1 | child2 | ...)*>
The parent element can contain character data or any number of the specified child elements, or it can contain no content at all. It is better not to work with mixed content if you want a tightly structured document.
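
A short sketch of mixed content in practice. The note element is hypothetical and exists only to show text and child elements interleaved.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE note [
  <!-- #PCDATA must come first, and the group must end with *. -->
  <!ELEMENT note (#PCDATA | name | phone)*>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
]>
<note>Call <name>Lea Ziegler</name> at <phone>555-2819</phone> before shipping.</note>
```

The DTD cannot constrain the order or count of name and phone here, which is the loss of structure the slide warns about.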

Section 3.2 Declaring Attributes

Declaring Element Attributes
Add an attribute-list declaration to the document's DTD to accomplish the following:
- list the names of all of the attributes associated with a specific element
- specify the data type of each attribute
- indicate whether each attribute is required or optional
- provide a default value for each attribute, if necessary

Attributes used in orders.xml

Element    Attribute   Required   Default Value(s)
customer   custID      Yes        None
customer   custType    No         "home", "school", or "business"
name       title       No         "Mr.", "Mrs.", "Ms."
order      orderID     Yes
order      orderBy     Yes
item       itemPrice   Yes
item       itemQty     No         "1"

Declaring Attributes in a DTD
<!ATTLIST element attribute1 type1 default1
                  attribute2 type2 default2
                  attribute3 type3 default3 ... >
or
<!ATTLIST element attribute1 type1 default1>
<!ATTLIST element attribute2 type2 default2>
<!ATTLIST element attribute3 type3 default3>
where:
- element is the name of the element associated with the attributes
- attribute is the name of an attribute
- type is the attribute's data type
- default indicates whether the attribute is required and whether it has a default value

Declaring Attribute Names
Attribute-list declarations can be placed anywhere within the document type declaration, although they are easier to maintain when located adjacent to the declaration for the element with which they are associated.

Attribute Types
Attribute values can consist only of character data, but you can control the format of those characters:
- CDATA: character data
- Enumerated list: a list of possible attribute values
- ID: a unique text string
- IDREF: a reference to an ID value
- IDREFS: a list of ID references separated by white space
- ENTITY: a reference to an external unparsed entity
- ENTITIES: a list of entities separated by white space
- NMTOKEN: a name token, built from the characters allowed in XML names
- NMTOKENS: a list of name tokens separated by white space

CDATA can contain any character except those reserved by XML
<!ATTLIST element attribute CDATA default>
<!ATTLIST item itemPrice CDATA>
<!ATTLIST item itemQty CDATA>
The item examples here and on the following slides omit the default, which a complete declaration requires; defaults are covered under Attribute Defaults.
Any of the following attribute values are allowed:
<item itemPrice="29.95"> . . . </item>
<item itemPrice="$29.95"> . . . </item>
<item itemPrice="£29.95"> . . . </item>

Enumerated Types: Attributes that are limited to a set of possible values
<!ATTLIST element attribute (value1 | value2 | value3 | ...) default>
where value1, value2, ... are the allowed values
<!ATTLIST customer custType (home | business | school)>
Any custType attribute whose value is not "home", "school", or "business" causes parsers to reject the document as invalid.

Tokenized Types
Tokenized types are character strings that follow certain rules (known as tokens) for format and content. DTDs support four kinds of tokens: IDs, ID references, name tokens, and entities.

The ID token
An ID token is used when an attribute value must be unique within the document.
<!ATTLIST customer custID ID>
This declaration ensures each customer will have a unique ID. The following elements would not be valid because the same custID value is used more than once:
<customer custID="Cust021"> ... </customer>
<customer custID="Cust021"> ... </customer>

The IDREF token
<!ATTLIST element attribute IDREF default>
An attribute declared as an IDREF token must have a value equal to the value of an ID attribute located somewhere in the same document. This enables an XML document to contain cross-references between one element and another.
<!ATTLIST order orderBy IDREF>
When a validating XML parser encounters this attribute, it searches the XML document for an ID value that matches the value of the orderBy attribute. If it doesn't find one, it rejects the document as invalid.

IDREFS: an attribute that contains a list of ID references
<!ATTLIST customer orders IDREFS>
<!ATTLIST order orderID ID>
<customer orders="OR3413 OR3910 OR5310"> ... </customer>
...
<order orderID="OR3413"> ... </order>
<order orderID="OR3910"> ... </order>
<order orderID="OR5310"> ... </order>

Specifying attribute IDs and IDREFs each custID value must be unique in the document each orderBy value must reference an ID value somewhere in the document
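
The ID and IDREF rules can be tried out with a small sketch like the one below. The element structure is simplified (an order is EMPTY and sits directly inside customer), and the values cust202 and or10312 are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE customers [
  <!-- Simplified structure, assumed for this sketch only. -->
  <!ELEMENT customers (customer+)>
  <!ELEMENT customer (order+)>
  <!ELEMENT order EMPTY>
  <!-- Every ID value must be unique across the whole document. -->
  <!ATTLIST customer custID ID #REQUIRED>
  <!ATTLIST order orderID ID #REQUIRED
                  orderBy IDREF #REQUIRED>
]>
<customers>
  <customer custID="cust201">
    <order orderID="or10311" orderBy="cust201"/>
  </customer>
  <customer custID="cust202">
    <order orderID="or10312" orderBy="cust202"/>
  </customer>
</customers>
```

Changing an orderBy value to something that matches no ID, or reusing cust201 as an orderID, should make a validating parser reject the document. ID values must also be legal XML names, so they cannot start with a digit.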

Attribute Defaults
There are four possible defaults:
- #REQUIRED: The attribute must appear with every occurrence of the element.
- #IMPLIED: The attribute is optional.
- An optional default value: A validating XML parser will supply the default value if one is not specified.
- #FIXED: The attribute is optional. If an attribute value is specified, it must match the default value.

Examples of attribute defaults
<!ATTLIST customer custID ID #REQUIRED>
A customer ID is required for every customer.
<!ATTLIST customer custType (school | home | business) #IMPLIED>
If an XML parser encounters a customer element without a custType attribute, it supplies no value for the attribute; the application decides how to handle its absence.
<!ATTLIST item itemQty CDATA "1">
Assume a value of "1" for itemQty if it's missing.

Specifying attribute defaults
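
A sketch that places the four kinds of default side by side. The note and currency attributes, and the second item's price, are hypothetical additions for illustration.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE items [
  <!ELEMENT items (item+)>
  <!ELEMENT item (#PCDATA)>
  <!ATTLIST item
    itemPrice CDATA #REQUIRED
    itemQty   CDATA "1"
    note      CDATA #IMPLIED
    currency  CDATA #FIXED "USD">
]>
<items>
  <!-- itemQty and currency are absent, so the parser supplies "1" and "USD". -->
  <item itemPrice="29.95">PLBK70</item>
  <!-- Explicit values are fine as long as currency matches the fixed value. -->
  <item itemPrice="59.95" itemQty="2" currency="USD">BF100P</item>
</items>
```

Omitting itemPrice, or writing a currency value other than "USD", would make the document invalid; leaving out note has no effect.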

DTDs and Namespaces
You can work with namespace prefixes by applying a validation rule to the element's qualified name:
<!ELEMENT cu:phone (#PCDATA)>
Any namespace declaration in a document must also be declared in the DTD for the document to be valid. This is usually done by declaring the xmlns attribute with a #FIXED default that holds the namespace's URI:
<!ATTLIST cu:customers xmlns:cu CDATA #FIXED "http://www.butterfly.com/customers">
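
A minimal sketch of a namespaced document validated by a DTD. The element structure is an assumption made to keep the example short.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE cu:customers [
  <!-- The DTD sees cu:customers as a plain name; the prefix is part of it. -->
  <!ELEMENT cu:customers (cu:customer+)>
  <!ELEMENT cu:customer (cu:phone)>
  <!ELEMENT cu:phone (#PCDATA)>
  <!ATTLIST cu:customers
    xmlns:cu CDATA #FIXED "http://www.butterfly.com/customers">
]>
<cu:customers xmlns:cu="http://www.butterfly.com/customers">
  <cu:customer>
    <cu:phone>555-2819</cu:phone>
  </cu:customer>
</cu:customers>
```

Since DTDs compare qualified names literally, a document that binds the same URI to a different prefix would not validate against these declarations.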

Validating a Document with XMLSpy
The Web is an excellent source for validating parsers, including Web sites to which you can upload your XML document for free to have it validated. XMLSpy is an XML development environment created by Altova, which is used for designing and editing professional applications involving XML, XML Schema, and other XML-based technologies.

Section 3.3 Working with entities

Introducing Entities
XML supports the following built-in entities: &amp; &lt; &gt; &apos; &quot;
If you have a long text string that will be repeated throughout your XML document, avoid data entry errors by placing the text string in its own entity. You can create your own customized set of entities corresponding to text strings, like product descriptions, that you want referenced by the XML document.

Working with General Entities
A general entity is an entity that references content to be used within an XML document. That content can be either parsed or unparsed. A parsed entity references text that can be readily interpreted or parsed by an application reading the XML document. An entity that references content that is either nontextual or cannot be interpreted by an XML parser is an unparsed entity. One example of an unparsed entity is an entity that references a graphic image file.

Working with General Entities
The content referenced by an entity can be placed either within the DTD or in an external file. Internal entities reference content found in the DTD. External entities reference content from external files.

Internal Parsed Entities
<!ENTITY entity "value">
where entity is the name assigned to the entity and value is the entity's value, which must be well-formed XML text.
Examples:
<!ENTITY MBL25 "Monarch Butterfly, 6-12 larvae">
<!ENTITY MBL25 "<desc>Monarch Butterfly, 6-12 larvae</desc>">
The characters & and % are not allowed as literal parts of an entity's value. Use &amp; to include the & symbol, if necessary.

External Parsed Entities
For longer text strings, place the content in an external file. To create an external parsed entity, use:
<!ENTITY entity SYSTEM "uri">
where uri is the URI of the external file containing the entity value.
In the declaration:
<!ENTITY MBL25 SYSTEM "description.xml">
an entity named "MBL25" gets its value from the description.xml file.

Referencing a General Entity
After an entity is declared, it can be referenced anywhere within the document. The syntax is:
&entity;
For example, <item>&MBL25;</item> is interpreted as <item>Monarch Butterfly, 6-12 larvae</item>
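
Declaration and reference can be seen together in a short sketch. The items and item content models are assumptions; the entity names and values are taken from the product codes used in this tutorial.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE items [
  <!ELEMENT items (item+)>
  <!ELEMENT item (#PCDATA)>
  <!-- Internal parsed entities: short names for long descriptions. -->
  <!ENTITY MBL25 "Monarch Butterfly, 6-12 larvae">
  <!ENTITY PLBK70 "Painted Lady classroom breeding kit, 70 larvae">
]>
<items>
  <item>&MBL25;</item>
  <item>&PLBK70;</item>
</items>
```

A reference to a name that was never declared, such as a mistyped &MLB25;, is an error, so typos in entity names surface as soon as the document is parsed.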

Declare parsed entities in the codes.dtd file for the product codes in the orders.xml document
<!ENTITY BF100P "Butterfly farm pop-up self erecting portable greenhouse">
<!ENTITY BFGK10 "Field of Dreams backyard butterfly garden kit">
<!ENTITY HME100 "Hummingbird Hawkmoth (Manduca Sexta), 100 eggs">
<!ENTITY MBL25 "Monarch Butterfly, 6-12 larvae">
<!ENTITY MP12 "Monarch Pupae (Danaus Plexippus), 12 pupae">
<!ENTITY MWT15 "Giant Milkweed Tree (Calotropis Ssp.), 1 crown flower">
<!ENTITY PLBK70 "Painted Lady classroom breeding kit, 70 larvae">
In each declaration, the token after ENTITY is the entity name and the quoted string is the entity value.

Parameter Entities
Parameter entities are used to store content for the DTD itself. For internal parameter entities, the syntax is:
<!ENTITY % entity "value">
For external parameter entities, the syntax is:
<!ENTITY % entity SYSTEM "uri">
Once a parameter entity has been declared, you can add a reference to it within the DTD using:
%entity;

Combining DTDs with parameter entities

Add a parameter entity to the DTD within the orders.xml file to load the contents of the codes.dtd file
<!DOCTYPE customers [
   ...
   <!ENTITY % itemCodes SYSTEM "codes.dtd">
   %itemCodes;
]>
<customers>
   ...
   <orders>
      <order orderID="or10311" orderBy="cust201"> </order>
The ENTITY declaration creates a parameter entity pointing to the code in the codes.dtd file; the %itemCodes; line is the reference to the itemCodes parameter entity.

Inserting general entities reference to the BFGK10 general entity

Entity references and values

Adding entities to the internal DTD

Adding DTD comments

Conditional Sections
<![ keyword [ declarations ]]>
where keyword is INCLUDE for a section of declarations that you want parsers to interpret and IGNORE for the declarations that you want parsers to pass over.
<![IGNORE[
   <!ELEMENT Magazine (Name)>
   <!ATTLIST Magazine Publisher CDATA #REQUIRED>
   <!ELEMENT Name (#PCDATA)>
]]>
<![INCLUDE[
   <!ELEMENT Book (Title, Author)>
   <!ATTLIST Book Pages CDATA #REQUIRED>
   <!ELEMENT Title (#PCDATA)>
   <!ELEMENT Author (#PCDATA)>
]]>

Conditional Sections using a parameter entity
<!ENTITY % UseFullDTD "IGNORE">
<![ %UseFullDTD; [
   <!ELEMENT Magazine (Name)>
   <!ATTLIST Magazine Publisher CDATA #REQUIRED>
   <!ELEMENT Name (#PCDATA)>
]]>
By changing the value of UseFullDTD from IGNORE to INCLUDE, you can add any conditional section that uses this entity reference to the document's DTD. Thus, you can switch multiple sections in the DTD off and on by editing a single line in the file. This is most useful when several conditional sections are scattered throughout a very long DTD. Conditional sections can be applied only to external DTDs.
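
One way the switch might be used in practice, sketched under assumptions: the external file is called library.dtd here (a hypothetical name) and holds the conditional section shown above, while the document turns the section on from its internal subset. This relies on the internal subset being read before the external one, so the first declaration of UseFullDTD is the binding one.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!--
  Assumed contents of the hypothetical external file library.dtd:

    <!ENTITY % UseFullDTD "IGNORE">
    <![ %UseFullDTD; [
      <!ELEMENT Magazine (Name)>
      <!ATTLIST Magazine Publisher CDATA #REQUIRED>
      <!ELEMENT Name (#PCDATA)>
    ]]>
-->
<!DOCTYPE Magazine SYSTEM "library.dtd" [
  <!-- Declared first, so this value wins over the IGNORE in library.dtd. -->
  <!ENTITY % UseFullDTD "INCLUDE">
]>
<Magazine Publisher="Hypothetical Press">
  <Name>Sample Title</Name>
</Magazine>
```

The conditional section itself stays in the external file, consistent with the restriction that conditional sections cannot appear in the internal subset.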

Working with Unparsed Entities
For a DTD to be able to validate either binary data (images, video) or character data that is not well formed, you need to work with unparsed entities. Because an XML parser cannot work with this type of data directly, a DTD needs to include instructions for how to treat the unparsed entity. To declare an unparsed entity, you must first declare a notation for the data type used in the entity, and then associate that notation with the unparsed entity.

Declaring a notation
<!NOTATION notation SYSTEM "uri">
where notation is the name of the notation and uri is a system location that defines the data type or a program that can work with the data type.
For example, the following declares a notation named "jpeg" that points to the application paint.exe:
<!NOTATION jpeg SYSTEM "paint.exe">
You could also use the MIME type value:
<!NOTATION jpeg SYSTEM "image/jpeg">

Associating a notation with an unparsed entity
<!ENTITY entity SYSTEM "uri" NDATA notation>
where entity is the name of the entity, uri is the system location of a file containing the unparsed data, and notation is the name of the notation that defines the data type.
For example, the following declaration creates an unparsed entity named BF100PIMG that references the graphic image file bf100p.jpg:
<!ENTITY BF100PIMG SYSTEM "bf100p.jpg" NDATA jpeg>

Adding the image attribute to an XML document
Once you have created an entity to reference unparsed data, that entity can be associated with attribute values by using the ENTITY data type in the attribute declaration. For example:
<!ATTLIST item image ENTITY #REQUIRED>
With this declaration added, you could then add the image attribute to an XML document, using the BF100PIMG entity as the attribute's value:
<item image="BF100PIMG">
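
The three steps (notation, unparsed entity, ENTITY attribute) can be sketched in one document. The items and item content models are assumptions; the notation, entity, and attribute declarations are the ones shown on the preceding slides.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE items [
  <!ELEMENT items (item+)>
  <!ELEMENT item (#PCDATA)>
  <!-- Step 1: name the data type. -->
  <!NOTATION jpeg SYSTEM "image/jpeg">
  <!-- Step 2: tie an external file to that notation. -->
  <!ENTITY BF100PIMG SYSTEM "bf100p.jpg" NDATA jpeg>
  <!-- Step 3: let an attribute carry the entity's name. -->
  <!ATTLIST item image ENTITY #REQUIRED>
]>
<items>
  <item image="BF100PIMG">Butterfly farm pop-up self erecting portable greenhouse</item>
</items>
```

The attribute value is the bare entity name, with no & or ; around it. The parser does not load bf100p.jpg; it only passes the entity's location and notation to the application.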

Validating Standard Vocabularies
To validate a document used with a standard vocabulary, you have to access an external DTD located on a Web server or rely upon a DTD built into your XML parser. For example, to validate an XHTML document against the XHTML 1.0 strict standard, add:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html> . . . </html>

DOCTYPE declarations for standard vocabularies