Introduction to DTD Bun Yue Professor, CS/CIS UHCL.

Introduction
• A DTD (Document Type Definition) is a grammar that is used to determine the validity of an XML document.
• There is no separate recommendation for DTD.
• It is embedded inside the XML recommendation: xml / (5th edition).

DTD
• A DTD is used to specify additional constraints and rules for a given vocabulary, such as element nesting rules and attribute name and value constraints.
• A DTD allows XML parsers to catch errors as early as possible. Errors are less costly to fix in earlier stages.

Validation
• An XML document satisfying the rules of a DTD is said to be valid.
• The command line DTD validation tool, xmlvalid, can be obtained from lid.html.
• XML editors and parsers can usually be used to validate XML documents.

Example
Adam Lucy Eva
• Should there be two spouses? Is it an error?
• Is "Eva" or "Lucy" a person?
• Is there any additional information about "Lucy" or "Eva"?

Creator’s Intentions
• General problem with XML documents: creators may not know what applications will use the file.
• Creators need to communicate their intentions to users.

Document Modeling
• XML document modeling defines a grammar to restrict and constrain an XML application.
• Advantages of document modeling:
  - Clear intention.
  - Restrictions lead to easier processing.
  - Interoperability improves if everyone uses the same standards.
  - Facilitates the development of tools for the XML applications.

Document Modeling
• Disadvantages of document modeling:
  - Time for development.
  - Checking validity is potentially more time-consuming.
  - May be too restrictive.

XML Modeling Languages
• Many methods.
• Two main standards:
  - Document Type Definition (DTD): more established, but limited.
  - XML Schema: more sophisticated and gaining popularity.
• May use both.

Example
Continuing on the previous example, a better approach is to specify the constraints using DTD, such as:
• A person may only have up to one spouse.
• A spouse must refer to a person in the same XML doc.

DTD Example
A possible DTD declaration for this:
  …
  <!ATTLIST person
            id     ID    #REQUIRED
            spouse IDREF #IMPLIED>
  …

XML Example
An XML document satisfying the DTD:
  ... Adam Eva Lucy ...
This XML document is valid with respect to the DTD.
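
The markup of the two slides above did not survive, so here is a minimal sketch of how the two constraints could look in one complete document. Only the ATTLIST comes from the slide; the persons root, the (#PCDATA) content and the p1/p2/p3 id values are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!-- Hypothetical reconstruction: element names and id values are assumptions. -->
<!DOCTYPE persons [
  <!ELEMENT persons (person*)>
  <!ELEMENT person (#PCDATA)>
  <!-- spouse is a single IDREF, so a person can name at most one spouse,
       and the value must match the id of some element in this document. -->
  <!ATTLIST person
            id     ID    #REQUIRED
            spouse IDREF #IMPLIED>
]>
<persons>
  <person id="p1" spouse="p2">Adam</person>
  <person id="p2" spouse="p1">Eva</person>
  <person id="p3">Lucy</person>
</persons>
```

With this model, writing spouse="p2 p3" or spouse="p9" on Adam would be reported as a validity error, which is the kind of early error detection the DTD is meant to provide.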

Document Modeling
• Without a document model, an XML document only needs to be well-formed and it may have:
  - unlimited and unrestricted vocabulary: any elements and attributes will be allowed.
  - no grammar rules, for example: any element can be nested within any other element; any element may have any attribute; an attribute may have any value.

Associating XML to Document Type Declarations
• The <!DOCTYPE> tag is used in XML to associate the XML document with its document type declarations.
• It is optional, but if present it must come after the XML declaration and before the root element.
• DTD declarations can be in an internal DTD subset, in an external DTD, or in both.
• The name of the root element must follow the keyword DOCTYPE.

Internal Subset
• The DTD is defined at the beginning of an XML document within the <!DOCTYPE> tag.
• Format: <!DOCTYPE root-element [ declarations ]>.

Internal Subset Example <!DOCTYPE persons [ ] > Kwok-Bun Yue
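
The declarations inside the brackets were lost from the slide above. A minimal sketch of what a complete internal subset might look like follows; the two ELEMENT declarations and the person element are assumptions, and only the persons root and the name come from the slide.

```xml
<?xml version="1.0"?>
<!DOCTYPE persons [
  <!-- Hypothetical declarations: the originals are not shown on the slide. -->
  <!ELEMENT persons (person+)>
  <!ELEMENT person (#PCDATA)>
]>
<persons>
  <person>Kwok-Bun Yue</person>
</persons>
```

The name after DOCTYPE (persons) matches the root element, as a validating parser requires.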

Internal Subset Example <!DOCTYPE board SYSTEM "msg.dtd" [ ]> …

Consideration
• Internal DTD declarations have higher precedence than the external DTD.
• Internal DTD advantages:
  - Always available, as it is part of the XML document.
  - Higher precedence than the external DTD.
• Disadvantages:
  - Wasted transmission for non-validating parsers.
  - Redundancy problems: many documents may have the same internal DTD subset definitions.
• It is good to use the internal DTD subset to override the external DTD (for example, to define entities suitable for the XML document).

External DTD
• An external DTD is stored in an external resource (e.g. a file specified by a URL).
• Example: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" " tml1-strict.dtd">

External DTD Format
• Format: <!DOCTYPE root-element SYSTEM "URL"> or <!DOCTYPE root-element PUBLIC "FPI" "URL">.
• It instructs the XML processor to get the DTD from the URL.
• The keyword after the root element can be:
  - SYSTEM: always get the DTD from the URL.
  - PUBLIC: may get the DTD by some other means.

Formal Public Identifier
• "-//W3C//DTD XHTML 1.0 Strict//EN" is the formal public identifier (FPI) of the XHTML DTD.
• FPIs identify resources by names instead of URLs and thus do not have the URL relocation problem.
• Rough meaning in this example:
  - '-': not registered.
  - 'W3C': owner id, W3C in this case.
  - 'DTD XHTML 1.0 Strict': type and description of document.
  - 'EN': language, English in this case.

FPI
• Another example of FPI: "-//W3C//DTD HTML 4.0 Transitional//EN"
• An FPI is required for PUBLIC but not for SYSTEM.

DTD Declarations
• Rigorous data modeling should be used to define a DTD.
• Need to define the right business rules and constraints.
• Errors in a DTD are costly.
• Usually, define the DTD to be as restrictive as possible.

DTD
• A DTD is composed of a sequence of declarations.
• Each DTD declaration declares one of the following constructs:
  - ELEMENT: XML element types.
  - ATTLIST: attributes of an element.
  - ENTITY: reusable content referenced by the &…; syntax.
  - NOTATION: external contents not to be parsed.

DTD Declarations
• If entity or attribute declarations conflict, the earlier declaration has higher precedence.
• Although internal declarations are physically located after the reference to the external DTD, they are read first and thus have higher precedence.
• No forward reference is allowed for parameter entities: each one must be declared before it is used.
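
A small sketch of the precedence rule for entities, using names (note, author) that are made up for this illustration. XML 1.0 states that when the same entity is declared more than once, the first declaration encountered is binding.

```xml
<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
  <!ENTITY author "first declaration">
  <!-- Redeclaring an entity is not an error: the earlier declaration
       stays binding and this one is ignored. -->
  <!ENTITY author "second declaration">
]>
<note>&author;</note>
```

The content of note is therefore the text of the first declaration. The same mechanism is what lets an internal subset override an entity that an external DTD also declares.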

Element Declarations
• Format of element declaration: <!ELEMENT element-name content-specification>.
• The content specification can be one of the following four kinds:
  - EMPTY
  - ANY
  - #PCDATA
  - Content model: most important.

EMPTY and ANY
• EMPTY: empty element. The element has no content.
• ANY: may contain anything. The content is not checked against a model, but any embedded descendant elements still need to be declared within the DTD.
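
The examples on this slide were lost, so here is a hedged sketch of both keywords in one document. All element names (page, title, break, notes) are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE page [
  <!ELEMENT page (title, break, notes)>
  <!ELEMENT title (#PCDATA)>
  <!-- EMPTY: break may carry attributes but never content. -->
  <!ELEMENT break EMPTY>
  <!-- ANY: notes may mix text with any declared elements, in any order. -->
  <!ELEMENT notes ANY>
]>
<page>
  <title>Agenda</title>
  <break/>
  <notes>See <title>Appendix</title> before the <break/> marker.</notes>
</page>
```

Putting an undeclared element inside notes would still be invalid, since ANY only admits elements that the DTD declares.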

#PCDATA and Content Model
• #PCDATA (parsed character data): text that is parsed for entity reference replacement.
• Content model: a declaration of contents enclosed by ( and ) for specifying child elements.

Content Model
• The following symbols can be used by content models:
  - , : sequencing.
  - ( ) : grouping.
  - ? : 0 or 1.
  - * : 0 or more.
  - + : 1 or more.
  - | : or.

Example

Example Acceptable: Bun Yue Bun K Yue

Example Not acceptable: Yue Bun The one and only: K Bun Yue
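
The DTD for the examples above was not preserved. One declaration that would be consistent with the accepted and rejected cases as they read here is sketched below; the element names (names, name, first, middle, last) are guesses, not the original slide content.

```xml
<?xml version="1.0"?>
<!DOCTYPE names [
  <!ELEMENT names (name+)>
  <!-- first, then an optional middle, then last, in exactly that order. -->
  <!ELEMENT name (first, middle?, last)>
  <!ELEMENT first (#PCDATA)>
  <!ELEMENT middle (#PCDATA)>
  <!ELEMENT last (#PCDATA)>
]>
<names>
  <name><first>Bun</first><last>Yue</last></name>
  <name><first>Bun</first><middle>K</middle><last>Yue</last></name>
</names>
```

Under this model, a name that lists last before first, or middle before first, breaks the sequence and fails validation.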

Exercise #1 Provide a DTD that will validate the following: Bun Yue Bun K Yue

Mixed Content Model
• For a mixed content model, the following format must be used: (#PCDATA | child-element-1 | child-element-2 ...)*
  - #PCDATA must come first.
  - * must be used.
• (#PCDATA) is also mixed content.
• In general, mixed content models (character data and elements) should be avoided if possible.
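
A minimal sketch of the required format, with hypothetical element names (para, em, code):

```xml
<?xml version="1.0"?>
<!DOCTYPE para [
  <!-- Mixed content: #PCDATA first, alternatives separated by |, closing with )*. -->
  <!ELEMENT para (#PCDATA | em | code)*>
  <!ELEMENT em (#PCDATA)>
  <!ELEMENT code (#PCDATA)>
]>
<para>Use <code>#PCDATA</code> first and <em>always</em> end with a star.</para>
```

Note how little this constrains: the model cannot say how many em or code children appear, or in what order.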

Mixed Content Model
• Mixed content models should be avoided if possible because:
  - They provide minimal constraints.
  - They are harder to parse.
  - Behavior may differ with or without a DTD: some spaces may be for cosmetic use only.
  - Scattered #PCDATA is open to interpretation.

Exercise #2
• Can you provide an example of well-known elements that use mixed content models?

Exercise #3a How many text nodes are there? Greeting Hello How are you? Goodbye

Exercise #3b How many text nodes are there using ? Greeting Hello How are you? Goodbye

Exercise #3c How many text nodes are there using ? Greeting Hello How are you? Goodbye

Example Acceptable: Yue Lee Smith King hello good bye

Example Not acceptable: Yue King Lee Queen Smith hello good bye

Exercise #4
• Modify the DTD so that cc and to can come in any order (there should still be at least one to).

Exercise #5 Comments on this DTD:

Exercise #6  How do you declare in DTD that element may have a and a child in any order?

ATTLIST Declarations
• Used to declare the attributes of an element.
• Format: <!ATTLIST element-name attribute-name type setting ...>
• An ATTLIST declaration declares one or more attributes.
• Each attribute definition includes the attribute name, its type and a setting.

Example
  <!ATTLIST person comment CDATA #IMPLIED>
  <!ATTLIST person
            ssn    ID            #REQUIRED
            gender (male|female) #IMPLIED
            age    CDATA         #IMPLIED
            iq     CDATA         "100">
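
To see these declarations at work, here is a sketch of a document that conforms to them. The people root, the person content model and all attribute values are assumptions added for illustration; the two ATTLIST declarations are the ones from the slide.

```xml
<?xml version="1.0"?>
<!DOCTYPE people [
  <!ELEMENT people (person*)>
  <!ELEMENT person (#PCDATA)>
  <!ATTLIST person comment CDATA #IMPLIED>
  <!ATTLIST person
            ssn    ID            #REQUIRED
            gender (male|female) #IMPLIED
            age    CDATA         #IMPLIED
            iq     CDATA         "100">
]>
<people>
  <!-- ssn is an ID, so it must be an XML name and cannot start with a digit. -->
  <person ssn="s123-45-6789" gender="male" age="40">Adam</person>
  <!-- iq is omitted here, so the parser supplies the default iq="100". -->
  <person ssn="s987-65-4321" comment="new hire">Eva</person>
</people>
```

A gender value other than male or female, a missing ssn, or two persons sharing one ssn would each be a validity error.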

Example

Attribute data types
• CDATA: character data. A string; the least restrictive type.
• ID: unique identifier.
  - A unique name string within the document.
  - Must start with a letter, a "_" or a ":".
  - Like a primary key of the element, but not exactly so.
  - No two elements within the XML document should have the same ID value.
  - The scope of an ID is the document, not the element type.

Attribute data types
• IDREF: identifier reference. Refers to an ID value of some other element.
• IDREFS: identifier reference list. Refers to many ID values separated by white space.
• ENTITY: entity name. The name of an unparsed external entity declared in the DTD.
• ENTITIES: entity name list. Many entity names separated by white space.

Attribute data types
• NMTOKEN: name token. A token formed from name characters only (letters, digits, ".", "-", "_" and ":"). Unlike a name, it may start with any of these characters, including a digit.
• NMTOKENS: name token list. Many NMTOKEN values separated by white space.
• NOTATION: notation type. For referencing data other than XML. The declaration lists the allowed notation names; each notation carries instructions for processing non-XML data.

Attribute data types
• Enumeration: provides explicit choices separated by | within a pair of parentheses. Note that each value of an enumeration must be an NMTOKEN.

Example
Here is the XML 1.0 specification for name (for ID and IDREFS) and NMTOKEN:
Names and Tokens:
  [4] NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
  [5] Name     ::= (Letter | '_' | ':') (NameChar)*
  [6] Names    ::= Name (S Name)*
  [7] Nmtoken  ::= (NameChar)+
  [8] Nmtokens ::= Nmtoken (S Nmtoken)*

Example
  <!ATTLIST node
            name  NMTOKEN #IMPLIED
            id    ID      #REQUIRED
            links IDREFS  #IMPLIED>

Attribute values and settings
• #REQUIRED: mandatory; commonly used.
• #IMPLIED: optional; commonly used.
• Default value: if the attribute is missing, the default value is assumed (use with care).
• #FIXED and default value: only one value of the attribute is acceptable, and that is the default value.

Example
  <!ATTLIST node
            name   NMTOKEN                     #IMPLIED
            id     ID                          #REQUIRED
            links  IDREFS                      #IMPLIED
            type   (element|attribute|comment) "comment"
            author (yue|davari|liaw)           #FIXED "yue">
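
A sketch of how the settings above play out in a document. The ATTLIST is the one from the slide; the nodes root, the EMPTY content of node and the attribute values are assumptions for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE nodes [
  <!ELEMENT nodes (node+)>
  <!ELEMENT node EMPTY>
  <!ATTLIST node
            name   NMTOKEN                     #IMPLIED
            id     ID                          #REQUIRED
            links  IDREFS                      #IMPLIED
            type   (element|attribute|comment) "comment"
            author (yue|davari|liaw)           #FIXED "yue">
]>
<nodes>
  <!-- type defaults to "comment" and author to "yue" when omitted. -->
  <node id="n1" name="root" links="n2 n3"/>
  <node id="n2" type="element"/>
  <!-- author may be written out, but only with the fixed value "yue". -->
  <node id="n3" type="attribute" author="yue"/>
</nodes>
```

Writing author="liaw" would be invalid even though liaw is listed in the enumeration, because #FIXED pins the value to "yue".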

Example: AddressBook.dtd
  <!ATTLIST person
            gender      NMTOKEN #IMPLIED
            luckynumber CDATA   #IMPLIED>
  <!ATTLIST link
            spouse   IDREF  #IMPLIED
            children IDREFS #IMPLIED>

A Conforming XML <person id="s " gender="male" luckynumber="7 12 3"> Hope Bob King Deborah <link spouse="s " children="s s " /> King Jim King John King Jane

Exercise #7
• Can you improve the DTD?

Exercise #8 Construct a simple and restrictive DTD for a labeled directed graph.

Entity Declarations
• Entities are like macros in C. When an XML processor parses an entity reference (usually of the format &entity-name;), the reference is replaced by the entity's value.
• Entity declarations use the <!ENTITY ...> syntax.

Kinds of Entity Declarations
• General entity: <!ENTITY entity-name "value">.
• External entity: <!ENTITY entity-name SYSTEM "URL">. The entity will be replaced by text from the external source (therefore it can be long and shared).
• Nonparsed external entity: <!ENTITY entity-name SYSTEM "URL" NDATA entity-type>. NDATA is a keyword; entity-type is the type of the non-XML data source, which will not be parsed by the XML processor.
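
A minimal sketch of an internal general entity and its use; the notice element and the uhcl entity name are made up for this illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE notice [
  <!ELEMENT notice (#PCDATA)>
  <!-- Internal general entity: the replacement text is given inline. -->
  <!ENTITY uhcl "University of Houston-Clear Lake">
]>
<notice>Offered by &uhcl;.</notice>
```

The processor replaces the reference with the entity's value before the application sees the text, much as a C preprocessor expands a macro.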

Kinds of Entity Declarations
• Parameter entity: <!ENTITY % entity-name "value">. To be used inside the DTD only; referred to as %entity-name;
• External parameter entity: <!ENTITY % entity-name SYSTEM "URL">. To be used inside the DTD only; referred to as %entity-name;
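
A hedged sketch of a parameter entity that holds a whole declaration. The names staff, member and idAttr are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE staff [
  <!-- Parameter entity holding a complete declaration; declared before use. -->
  <!ENTITY % idAttr "<!ATTLIST member id ID #REQUIRED>">
  <!ELEMENT staff (member+)>
  <!ELEMENT member (#PCDATA)>
  <!-- In the internal subset, a parameter-entity reference may appear only
       between declarations, never inside one. -->
  %idAttr;
]>
<staff>
  <member id="m1">Bun Yue</member>
</staff>
```

In an external DTD the rule is looser: there a parameter entity may also be referenced inside a declaration, for example to share a content model fragment.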

Example
In -strict.dtd:
  <!ENTITY % HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" "xhtml-lat1.ent">
  %HTMLlat1;
HTMLlat1 is an external parameter entity.

Example In

Notation Declarations
• XML is not designed for storing binary data.
• If needed, binary data can be referenced by using notation declarations, and it will not be parsed.
• To handle binary data, the XML processor needs to know its data type as well as some instructions on how to handle it.

Notation Declarations
• Notation declaration syntax: <!NOTATION name SYSTEM "identifier">.
  - name is the notation type name.
  - identifier is some instruction meaningful to the target XML processor.
• A notation is usually used together with a non-parsed external entity.
• Notation types defined by notation declarations are used as the entity types of non-parsed external entities.
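
A sketch of a notation used with a non-parsed external entity. Every name here (gallery, photo, jpeg, campus, campus.jpg) and the choice of "image/jpeg" as identifier are assumptions for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE gallery [
  <!ELEMENT gallery (photo+)>
  <!ELEMENT photo EMPTY>
  <!-- The notation names a data type; the identifier is left to the application. -->
  <!NOTATION jpeg SYSTEM "image/jpeg">
  <!-- Unparsed external entity: NDATA ties the resource to the jpeg notation. -->
  <!ENTITY campus SYSTEM "campus.jpg" NDATA jpeg>
  <!-- An ENTITY-typed attribute refers to the unparsed entity by name. -->
  <!ATTLIST photo src ENTITY #REQUIRED>
]>
<gallery>
  <photo src="campus"/>
</gallery>
```

An unparsed entity is never referenced with the &name; syntax; it is named only through an attribute of type ENTITY or ENTITIES.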

Example

Document Modeling with DTD
• Maps the data requirements of the problem to an XML model.
• Usually takes two steps:
  - Use a modeling language for analysis and design, e.g. UML.
  - Map the model to DTD, XML Schema, etc.

DTD Modeling Tips
• Use formal modeling techniques (such as UML).
• Use modeling tools, such as Rational's Rose.
• Track versions carefully.
• Decide on organization before modeling. For example, declarations may be grouped by:
  - functions
  - hierarchical elements

Some General Tips
• Consider major design options, such as:
  - Elements versus attributes.
  - Flat element structures versus nested element structures.
  - Descendants, siblings or ancestors.
• The model should generally be as restrictive as possible.
• Include sufficient documentation.

Some General Tips
• Make generous use of whitespace and inline comments.
• Use parameter entities generously.
• Import modules by using external parameter entities.
• Use meaningful names.

Exercise #9
• Start with the following UML diagram for a graph. Refine it and design a suitable DTD.

Exercise #10
• Consider the XML file, satvexample.xml, and its DTD, tvschedule.dtd. Both files are obtained from the site ml with very minor changes in satvexample.xml (to remove XSL references and modify DOCTYPE to refer to a local DTD).
• Validation results of the XML file:
  - One validator: no error.
  - xmlvalid (you will need to register, download, install and run it in command line mode):
    Error: non-deterministic content model for element 'DAY': more than one path leads to element 'DATE'.
    Error: element content invalid. Element 'PROGRAMSLOT' is not expected here, expecting 'HOLIDAY'.
• Which validator is correct? How do you correct the problem?

Weaknesses of DTD
• Not written in XML syntax, so a DTD cannot itself be parsed as an XML document.
• Difficult to extract information from XML applications.
• Closed construct: all defined within one DTD. Not easily extensible.
• Difficult to break down into smaller pieces.
• Type definitions are not rich: insufficient types and precision (no int, float, etc.) and no user-defined types.
• Does not work well with XML namespaces.
• Difficult to enforce sophisticated constraints.
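
The weak typing is easy to demonstrate. In this sketch (the person element and age attribute are illustrative names) the document is valid even though the age is plainly not a number:

```xml
<?xml version="1.0"?>
<!DOCTYPE person [
  <!ELEMENT person (#PCDATA)>
  <!-- CDATA is the closest a DTD gets to a number type: any string passes. -->
  <!ATTLIST person age CDATA #IMPLIED>
]>
<person age="about forty">Adam</person>
```

Checking that age is an integer within a sensible range has to be left to the application or to a richer schema language.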

Other Schema Languages
• There are many efforts to overcome the limitations of DTD.
• XML Schema is one of the most important, since it is a W3C standard.
• Other important languages include RELAX NG, Schematron, etc.
• However, the ‘schema war’ is considered settled by many.

Questions