1 Tutorial 13 Validating Documents with DTDs Working with Document Type Definitions.

New Perspectives on HTML, XHTML, and XML, Comprehensive, 3rd Edition 2 Creating a Valid Document You validate documents to make certain that necessary elements are never omitted. For example, each customer order should include a customer name, address, and phone number.

3 Creating a Valid Document Some elements and attributes may be optional; for example an address. An XML document can be validated using either DTDs (Document Type Definitions) or schemas.

4 Customer Information Collected by Samantha

5 Structure of the orders.xml Document

6 Declaring a DTD A DTD can be used to: –Ensure that all required elements are present in the document –Prevent undefined elements from being used in the document –Enforce a specific data structure on document contents –Specify the use of element attributes and define their permissible values –Define default values for attributes –Describe how parsers should access non-XML or nontextual content

7 Declaring a DTD A document type definition is a collection of rules or declarations that define the content and structure of the document. A document type declaration attaches those rules to the document’s content. Each XML document can have only one DOCTYPE declaration.

8 Declaring a DTD You create a DTD by first entering a document type declaration into your XML document. In this tutorial, DTD refers to the document type definition, not the declaration. While a document can have only one DTD, the DTD can be divided into two parts: an internal subset and an external subset.

9 Declaring a DTD An internal subset consists of declarations placed in the same file as the document content. An external subset is located in a separate file.

10 Declaring a DTD The DOCTYPE declaration for an internal subset is: <!DOCTYPE root [ statements ]> where root is the name of the document’s root element, and statements are the declarations that make up the DTD.
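As a minimal sketch of the internal-subset form described above, the document below carries its own DTD. The element names and the customer name are hypothetical and chosen only for illustration; they are not taken from the orders.xml file used in the tutorial.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical document: the root name after DOCTYPE must match the root element -->
<!DOCTYPE customers [
  <!ELEMENT customers (customer)>
  <!ELEMENT customer (#PCDATA)>
]>
<customers>
  <customer>Sample Customer</customer>
</customers>
```

Because the declarations sit directly above the content, it is easy to check one against the other while drafting.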

11 Declaring a DTD The DOCTYPE declaration for external subsets can take two forms: one that uses a SYSTEM location and one that uses a PUBLIC location. The syntax is: <!DOCTYPE root SYSTEM "uri"> or <!DOCTYPE root PUBLIC "identifier" "uri">

12 Declaring a DTD Here, root is the document’s root element, identifier is a text string that tells an application how to locate the external subset, and URI is the location and filename of the external subset. Use the PUBLIC location form when the DTD is a standard vocabulary that applications can recognize by its public identifier, or when the XML document is part of an old SGML application.

13 Declaring a DTD The SYSTEM location form specifies the name and location of the external subset through the “uri” value. Unless your application requires a public identifier, you should use the SYSTEM location form.
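The two external forms can be sketched as follows. These are alternative declarations shown side by side, not one document, since a document uses only one DOCTYPE. The file name orders.dtd and the root name customers are assumptions for illustration; the PUBLIC example is the declaration used by XHTML 1.0 Strict documents.

```xml
<!-- Form 1: SYSTEM location; "orders.dtd" is a hypothetical file stored next to the document -->
<!DOCTYPE customers SYSTEM "orders.dtd">

<!-- Form 2: PUBLIC location; a public identifier followed by a fallback URI -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
```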

14 Declaring a DTD A DOCTYPE declaration can indicate both an external and an internal subset. The syntax is: <!DOCTYPE root SYSTEM "URI" [ declarations ]> or <!DOCTYPE root PUBLIC "id" "URI" [ declarations ]>

15 Declaring a DTD If you place the DTD within the document, it is easier to compare the DTD to the document’s content. However, the real power of XML comes from an external DTD that can be shared among many documents written by different authors.

16 Declaring a DTD If a document contains both an internal and an external subset, the internal subset takes precedence over the external subset if there is a conflict between the two. This applies to entity and attribute declarations, where the first declaration read is the one that counts and the internal subset is read first. This way, the external subset would define basic rules for all the documents, and the internal subset would define those rules specific to each document.
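A minimal sketch of that precedence, assuming a hypothetical shared file named shared.dtd that declares the customers element as character data and may declare its own entity named store:

```xml
<?xml version="1.0"?>
<!DOCTYPE customers SYSTEM "shared.dtd" [
  <!-- The internal subset is read before shared.dtd, so this binding of "store" applies -->
  <!ENTITY store "Downtown Branch">
]>
<customers>&store;</customers>
```

Under these assumptions, any later declaration of store inside shared.dtd is ignored for this document, which lets one document adjust a shared default without editing the shared file.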

17 Internal and External DTDs

18 Writing the Document Type Declaration

19 Declaring Document Elements Every element used in the document must be declared in the DTD for the document to be valid. An element type declaration specifies the name of the element and indicates what kind of content the element can contain.

20 Declaring Document Elements The element declaration syntax is: <!ELEMENT element content-model> where element is the element name and content-model specifies what type of content the element contains.

21 Declaring Document Elements The element name is case sensitive. DTDs define five different types of element content: –The element can contain only parsed character data. –The element can contain only child elements. –The element cannot store any content. –The element can store any type of content or no content at all. –The element stores both parsed character data and child elements.

22 Types of Element Content ANY content: The declared element can store any type of content. The syntax is: <!ELEMENT element ANY> EMPTY content: This is reserved for elements that store no content. The syntax is: <!ELEMENT element EMPTY>

23 Types of Element Content Parsed Character Data content: These elements can only contain parsed character data. The syntax is: <!ELEMENT element (#PCDATA)> The keyword #PCDATA stands for “parsed-character data” and is any well-formed text string.
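The three simple content types can be sketched in one internal subset. The element names below are hypothetical and exist only to show each keyword in use.

```xml
<?xml version="1.0"?>
<!DOCTYPE record [
  <!ELEMENT record ANY>          <!-- any mix of text and declared elements -->
  <!ELEMENT divider EMPTY>       <!-- no content allowed -->
  <!ELEMENT phone (#PCDATA)>     <!-- text only -->
]>
<record>Call <phone>555-0100</phone> after noon.<divider/></record>
```

Even with ANY, each child element that appears must still have its own declaration for the document to be valid.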

24 Working with Child Elements ELEMENT content: The syntax for declaring that elements contain only child elements is: <!ELEMENT element (children)> where children is a list of child elements.

25 Working with Child Elements The declaration <!ELEMENT customer (phone)> indicates the customer element can only have one child, named phone. You cannot repeat the same child element more than once with this declaration.

26 Specifying an Element Sequence A sequence is a list of elements that follow a defined order. The syntax is: <!ELEMENT element (child1, child2, …)> The order of the child elements must match the order defined in the element declaration. A sequence can list the same child element more than once.

27 Specifying an Element Sequence Thus, a sequence that lists three child elements indicates the customer element should contain three child elements for each customer.

28 Specifying an Element Choice Choice is the other way to list child elements and presents a set of possible child elements. The syntax is: <!ELEMENT element (child1 | child2 | …)> where child1, child2, etc. are the possible child elements of the parent element.

29 Specifying an Element Choice For example, <!ELEMENT customer (name | company)> allows the customer element to contain either the name element or the company element. However, you cannot have both the name and the company child elements since the choice model allows only one of the child elements.
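Sequences and choices can be combined in one content model. The sketch below assumes a hypothetical customer record that starts with either a name or a company, followed by an address and a phone number in that order; the values are made up.

```xml
<?xml version="1.0"?>
<!DOCTYPE customer [
  <!-- A choice nested inside a sequence: (name or company), then address, then phone -->
  <!ELEMENT customer ((name | company), address, phone)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT company (#PCDATA)>
  <!ELEMENT address (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
]>
<customer>
  <company>Example Supplies</company>
  <address>1 Hypothetical Way</address>
  <phone>555-0100</phone>
</customer>
```

Swapping the order of address and phone, or supplying both name and company, would make this document invalid.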

30 Modifying Symbols Modifying symbols are symbols appended to the content model to indicate the number of occurrences of each element. There are three modifying symbols: –a question mark (?), which allows zero or one of the item. –a plus sign (+), which allows one or more of the item. –an asterisk (*), which allows zero or more of the item.

31 Modifying Symbols For example, <!ELEMENT customers (customer+)> would allow one or more customer elements to be placed within the customers element. Modifying symbols can be applied within sequences or choices. They can also modify entire element sequences or choices by placing the character immediately following the closing parenthesis of the sequence or choice.
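A short sketch of all three modifying symbols, using hypothetical element names and values: at least one customer is required, the address is optional, and any number of phone elements may follow.

```xml
<?xml version="1.0"?>
<!DOCTYPE customers [
  <!ELEMENT customers (customer+)>                 <!-- one or more -->
  <!ELEMENT customer (name, address?, phone*)>     <!-- optional address, zero or more phones -->
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT address (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
]>
<customers>
  <customer><name>A. Reader</name></customer>
  <customer>
    <name>B. Writer</name>
    <address>2 Sample Street</address>
    <phone>555-0101</phone>
    <phone>555-0102</phone>
  </customer>
</customers>
```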

32 DTDs and Mixed Content Mixed content elements contain both character data and child elements. The syntax is: <!ELEMENT element (#PCDATA | child1 | child2 | …)*> This form applies the * modifying symbol to a choice of character data or elements. Therefore, the parent element can contain character data or any number of the specified child elements, or it can contain no content at all.

33 Mixed Content Because you cannot constrain the order in which the child elements appear or control the number of occurrences for each element, it is better not to work with mixed content if you want a tightly structured document.
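A minimal sketch of mixed content, with hypothetical element names: text and the two listed children may appear in any order and any number of times.

```xml
<?xml version="1.0"?>
<!DOCTYPE description [
  <!-- #PCDATA must come first in the choice, and the group must end with * -->
  <!ELEMENT description (#PCDATA | emphasis | product)*>
  <!ELEMENT emphasis (#PCDATA)>
  <!ELEMENT product (#PCDATA)>
]>
<description>Ships <emphasis>free</emphasis> with the <product>tripod</product> this month.</description>
```

The DTD cannot require, for instance, exactly one product element here, which is the loss of control the slide warns about.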

34 Declaring Attributes For a document to be valid, all the attributes associated with elements must also be declared. To enforce attribute properties, you must add an attribute-list declaration to the document’s DTD.

35 Attributes Used in orders.xml

36 Declaring Attributes The attribute-list declaration: –Lists the names of all attributes associated with a specific element –Specifies the data type of the attribute –Indicates whether the attribute is required or optional –Provides a default value for the attribute, if necessary

37 Declaring Attributes The syntax to declare a list of attributes is: <!ATTLIST element attribute1 type1 default1 attribute2 type2 default2 attribute3 type3 default3 …> where element is the name of the element associated with the attributes, attribute is the name of an attribute, type is the attribute’s data type, and default indicates whether the attribute is required or implied, and whether it has a fixed or default value.

38 Declaring Attributes Attribute-list declarations can be placed anywhere within the document type declaration, although they are easier to work with when located adjacent to the declaration for the element with which they are associated.
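As a sketch of an attribute-list declaration placed next to its element declaration, the example below declares two attributes on a customer element. The attribute nickname and the customer value are hypothetical; custType follows the enumerated example in this tutorial.

```xml
<?xml version="1.0"?>
<!DOCTYPE customer [
  <!ELEMENT customer (#PCDATA)>
  <!-- One ATTLIST can declare several attributes: name, type, default for each -->
  <!ATTLIST customer
            custType (school | home | business) #REQUIRED
            nickname CDATA #IMPLIED>
]>
<customer custType="home">J. Doe</customer>
```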

39 Working with Attribute Types While all attribute types are text strings, you can control the type of text used with the attribute. There are three general categories of attribute values: –CDATA –enumerated –tokenized. CDATA types are the simplest form and can contain any character except those reserved by XML. Enumerated types are attributes that are limited to a set of possible values.

40 Working with Attribute Types The general form of an enumerated type is: attribute (value1 | value2 | value3 | …) For example, the following declaration (shown without its attribute default): <!ATTLIST customer custType (school | home | business)> restricts custType to either “school”, “home”, or “business”.

41 Working with Attribute Types Another type of enumerated attribute is notation. It associates the value of the attribute with a declaration located elsewhere in the DTD. The notation provides information to the XML parser about how to handle non-XML data. Tokenized types are text strings that follow certain rules for the format and content. The syntax is: attribute token

42 Working with Attribute Types There are seven tokenized types. For example, the ID token is used with attributes that require unique values. If a customer ID needs to be unique, you may use the ID token in the attribute-list declaration: customer custID ID This ensures each customer will have a unique ID.
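A minimal sketch of the ID token in a complete declaration. The ID values below are hypothetical; note that an ID value must be a valid XML name, so it cannot begin with a digit, and an ID attribute must be declared #REQUIRED or #IMPLIED.

```xml
<?xml version="1.0"?>
<!DOCTYPE customers [
  <!ELEMENT customers (customer+)>
  <!ELEMENT customer (#PCDATA)>
  <!ATTLIST customer custID ID #REQUIRED>
]>
<customers>
  <customer custID="cust201">First customer</customer>
  <customer custID="cust202">Second customer</customer>
</customers>
```

A validating parser would reject this document if both customers carried the same custID value.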

43 Attribute Types

44 Working with Attribute Defaults The final part of an attribute declaration is the attribute default. There are four possible defaults: –#REQUIRED: The attribute must appear with every occurrence of the element. –#IMPLIED: The attribute is optional. –An optional default value: A validating XML parser will supply the default value if one is not specified. –#FIXED: The attribute is optional, but if one is specified, it must match the default.
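The four defaults can be sketched in a single attribute list. The order element, its attribute names, and the values are all hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE order [
  <!ELEMENT order (#PCDATA)>
  <!ATTLIST order
            orderID  CDATA            #REQUIRED
            giftWrap CDATA            #IMPLIED
            shipping (ground | air)   "ground"
            currency CDATA            #FIXED "USD">
]>
<!-- Only the required attribute is written; shipping and currency fall back to their defaults -->
<order orderID="or1089">Two tripods</order>
```

Writing currency="EUR" on the order element would be a validity error, because a #FIXED attribute may only carry its declared value.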

45 Inserting Attribute-List Declarations

46 Validating an XML Document The Web has many excellent sources for validating parsers, including Web sites to which you can upload an XML document for free to have it validated against an internal or external DTD. Internet Explorer versions 6 and above have a built-in XML parser, MSXML.

47 Validating an XML Document

48 Introducing Entities Entities are storage units for a document’s content. XML supports the following five built-in entities: –&amp;amp; for the ampersand (&) –&amp;lt; for the less-than sign (<) –&amp;gt; for the greater-than sign (>) –&amp;apos; for the apostrophe (') –&amp;quot; for the quotation mark (") When an XML parser encounters these entities, it replaces them with the corresponding character symbol.
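A small sketch showing the five built-in entity references in use. The element and attribute names are hypothetical; no DTD is needed because these entities are predefined.

```xml
<?xml version="1.0"?>
<note title="Tom &amp; Jerry&apos;s &quot;Best of&quot;">
  Use &lt;note&gt; only when price &lt; 10 &amp; stock &gt; 0.
</note>
```

After parsing, the title attribute holds the ampersand, apostrophe, and quotation marks as literal characters, and the element text contains literal angle brackets.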

49 Working with General Entities Entities can be declared in a DTD. How to declare an entity depends on how it is classified. There are three factors involved in classifying entities: –Where the entity will be applied –Where the entity’s content is located –What type of content is referenced by the entity Entities that are used within the XML document are known as general entities.

50 Creating Parsed Entities General entities are declared in the DTD of a document. The syntax is: <!ENTITY entity "value"> where entity is the name assigned to the entity and value is the general entity’s value. For example, an entity named “DCT5Z” can be created to store a product description.

51 General Parsed Entities After an entity is declared, it can be referenced anywhere within the document by placing its name between an ampersand and a semicolon, as in &MBL25;. The parser interprets the reference as the value declared for that entity.
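As a sketch of declaring and then referencing a general parsed entity, the example below uses a hypothetical entity named TRIPOD1 with a made-up description; it does not reproduce the product entities from the tutorial's own files.

```xml
<?xml version="1.0"?>
<!DOCTYPE item [
  <!ELEMENT item (#PCDATA)>
  <!-- Declare once, reference as often as needed -->
  <!ENTITY TRIPOD1 "Lightweight aluminum tripod">
]>
<item>&TRIPOD1;</item>
```

A parser that reads the DTD treats the item element as if it contained the text Lightweight aluminum tripod.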

52 Creating General Entities

53 Working with Parameter Entities Parameter entities are used to store the content of a DTD. For internal parameter entities, the syntax is: <!ENTITY % entity "value"> where entity is the name of the parameter entity and value is a text string of the entity’s value. For external parameter entities, the syntax is: <!ENTITY % entity SYSTEM "uri"> where uri is the location of the external file that holds the parameter entity’s content.

54 Working with Parameter Entities Parameter entity references can only be placed where a declaration would normally occur, such as an internal or external DTD. Parameter entities used with an internal DTD do not offer any time or effort savings. However, an external parameter entity can allow XML to use more than one DTD per document by combining declarations from multiple DTDs.
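Combining DTDs through external parameter entities can be sketched as follows. The files books.dtd and magazines.dtd are hypothetical and are assumed to declare the book and magazine elements respectively.

```xml
<?xml version="1.0"?>
<!DOCTYPE catalog [
  <!-- Each parameter entity points at a separate DTD file -->
  <!ENTITY % books SYSTEM "books.dtd">
  <!ENTITY % magazines SYSTEM "magazines.dtd">
  <!-- Referencing the entities pulls both sets of declarations into this DTD -->
  %books;
  %magazines;
  <!ELEMENT catalog (book | magazine)*>
]>
<catalog/>
```

Parameter entity references use a percent sign rather than an ampersand, and they are only recognized inside the DTD, never in the document content.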

55 DTDs Combined with Parameter Entities

56 Inserting Comments into a DTD Comments in a DTD follow the same syntax as comments in XML: <!-- comment -->

57 Creating Conditional Sections A conditional section is a section of the DTD that is processed only in certain situations. The syntax is: <![keyword[ declarations ]]> where keyword is either INCLUDE (for a section of declarations that you want parsers to interpret) or IGNORE (for the declarations that you want parsers to pass over).

58 Creating Conditional Sections The following code creates two sections of declarations - one for Magazine elements and another for Book elements: <![IGNORE[ ]]>
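A sketch of two conditional sections, written as the contents of a hypothetical external DTD file, since XML only permits conditional sections in the external subset. The Title child element is an assumption added so that each declaration is complete; the slide's own declarations are not reproduced here.

```xml
<!-- Shared by both sections, so declared once outside them -->
<!ELEMENT Title (#PCDATA)>

<!-- Passed over by the parser -->
<![IGNORE[
  <!ELEMENT Magazine (Title)>
]]>

<!-- Interpreted by the parser -->
<![INCLUDE[
  <!ELEMENT Book (Title)>
]]>
```

Switching the two keywords turns the Magazine declarations on and the Book declarations off without deleting either set.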

59 Working with Unparsed Data You need to create an unparsed entity in order to reference binary data such as images or video clips, or character data that is not well formed. The unparsed entity includes instructions for how the unparsed entity should be treated. A notation is declared that identifies a resource to handle the unparsed data.

60 Working with Unparsed Data For example, to create a notation named “jpeg” that points to an application paint.exe: <!NOTATION jpeg SYSTEM "paint.exe"> Once the notation has been declared, you then declare an unparsed entity that instructs the XML parser to associate the data to the notation.

61 Working with Unparsed Data For example, the following declaration creates an unparsed entity named BF100PIMG that references the graphic image file bf100p.jpg: <!ENTITY BF100PIMG SYSTEM "bf100p.jpg" NDATA jpeg> Here, the notation is the jpeg notation that points to the paint.exe file. This declaration does not tell the paint.exe application to run the file but simply identifies for the XML parser what resource is able to handle the unparsed data.
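Putting the notation and the unparsed entity together, a document refers to the image through an attribute of type ENTITY. The names jpeg, paint.exe, BF100PIMG, and bf100p.jpg come from the slides above; the product element and its image attribute are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE product [
  <!ELEMENT product EMPTY>
  <!-- An ENTITY-type attribute holds the name of an unparsed entity, without & or ; -->
  <!ATTLIST product image ENTITY #REQUIRED>
  <!NOTATION jpeg SYSTEM "paint.exe">
  <!ENTITY BF100PIMG SYSTEM "bf100p.jpg" NDATA jpeg>
]>
<product image="BF100PIMG"/>
```

The parser does not read bf100p.jpg itself; it only reports the entity and its notation to the application, which decides what to do with the file.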

62 Validating Standard Vocabularies