Introduction to DTD Bun Yue Professor, CS/CIS UHCL.


1 Introduction to DTD Bun Yue Professor, CS/CIS UHCL

2 Introduction  A DTD is a grammar that is used to determine the validity of an XML document.  There is no separate recommendation for DTD.  It is embedded inside the XML recommendation: http://www.w3.org/TR/2008/REC-xml-20081126/ (5th edition).

3 DTD  DTD is used to specify additional constraints and rules for a given vocabulary, such as element nesting rules and attribute name and value constraints.  DTD allows XML parsers to catch errors as early as possible. Errors are less costly to fix in earlier stages.

4 Validation  An XML document satisfying the rules of a DTD is said to be valid.  The command line DTD validation tool, xmlvalid, can be obtained from http://www.elcel.com/products/xmlvalid.html.  XML editors and parsers can usually be used to validate XML documents.

5 Example Adam Lucy Eva  Should there be two spouses? Is it an error?  Is "Eva" or "Lucy" a person?  Is there any additional information about "Lucy" or "Eva"?
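The markup of this slide's example did not survive in the transcript. The questions can be read against a sketch like the following, in which the element names `person`, `name` and `spouse` are assumptions made purely for illustration and are not the original example:

```xml
<?xml version="1.0"?>
<!-- Hypothetical markup: the slide's original tags are not preserved. -->
<person>
  <name>Adam</name>
  <spouse>Lucy</spouse>
  <spouse>Eva</spouse>
</person>
```

This document is well-formed, yet nothing in it says whether two `spouse` elements are allowed or whether "Lucy" and "Eva" are persons described elsewhere.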

6 Creator’s Intentions  General problems with XML documents: Creators may not know what applications will use the file.  Need to communicate creator's intentions to users.

7 Document Modeling  XML document modeling defines a grammar to restrict and constrain an XML application.  Advantages of document modeling: Clear intention. Restrictions lead to easier processing. Interoperability improves if everyone uses the same standards. Facilitates the development of tools for the XML application.

8 Document Modeling  Disadvantages of document modeling: Time for development. Checking validity may take more time. May be too restrictive.

9 XML Modeling Languages  Many methods.  Two main standards: Document Type Definition (DTD): more established, but limited. XML Schema: more sophisticated and gaining popularity.  May use both.

10 Example Continuing on the previous example, a better approach is to specify the constraints using DTD, such as: A person may only have up to one spouse. A spouse must refer to a person in the same XML doc.

11 DTD Example A possible DTD declaration for this: … <!ATTLIST person id ID #REQUIRED spouse IDREF #IMPLIED> …

12 XML Example An XML document satisfying the DTD: ... Adam Eva Lucy ... This XML document is valid with respect to the DTD.
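Since the tags of this example were lost in the transcript, here is a minimal sketch of a complete document that would be valid against the ATTLIST declaration shown on the previous slide. The root name `persons`, the element declarations and the id values `p1` to `p3` are assumptions, not the original example:

```xml
<?xml version="1.0"?>
<!DOCTYPE persons [
  <!ELEMENT persons (person*)>
  <!ELEMENT person  (#PCDATA)>
  <!-- id is mandatory and unique; spouse, if present, must match some id -->
  <!ATTLIST person
    id     ID    #REQUIRED
    spouse IDREF #IMPLIED>
]>
<persons>
  <person id="p1" spouse="p2">Adam</person>
  <person id="p2" spouse="p1">Eva</person>
  <person id="p3">Lucy</person>
</persons>
```

Because `spouse` is a single IDREF attribute, a person can have at most one spouse, and a validating parser rejects a `spouse` value that does not match an `id` in the same document.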

13 Document Modeling  Without a document model, an XML document only needs to be well-formed and it may have: unlimited and unrestricted vocabulary: any elements and attributes are allowed. no grammar rules, for example:  any element can be nested within any other element.  any element may have any attribute.  an attribute may have any value.

14 Associating XML to Document Type Declarations  The <!DOCTYPE> tag is used in XML to associate the XML document with its document type declarations.  It is optional, but if present it must come after the XML declaration and before the root element.  DTD declarations can be: Internal DTD Subset External DTD  The name of the root element must follow the keyword DOCTYPE.

15 Internal Subset  DTD is defined at the beginning of an XML document within the <!DOCTYPE> tag.  Format: <!DOCTYPE root-element [ declarations ]>.

16 Internal Subset Example <!DOCTYPE persons [ ] > Kwok-Bun Yue

17 Internal Subset Example <!DOCTYPE board SYSTEM "msg.dtd" [ ]> …

18 Consideration  Internal DTD declarations have higher precedence than external DTD.  Internal DTD advantages: always available as it is part of the XML document. Higher precedence than external DTD.  Disadvantages: Wasted transmission for non-validating parsers. Redundancy problems: many documents may have the same internal DTD subset definitions.  Good to use Internal DTD subset to override external DTD (for example, to define entities suitable for the XML document.)
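The override described above might look like the following sketch. The file name `msg.dtd` and the root `board` come from the earlier slide; the entity `signature`, the element `message` and the assumption that `msg.dtd` permits them are hypothetical, since the external DTD is not shown:

```xml
<?xml version="1.0"?>
<!DOCTYPE board SYSTEM "msg.dtd" [
  <!-- The internal subset is read first, so this declaration of
       "signature" takes precedence over any declaration of the
       same entity in msg.dtd. -->
  <!ENTITY signature "Bun Yue, UHCL">
]>
<board>
  <!-- The element name "message" is an assumption; msg.dtd is not shown. -->
  <message>Posted by &signature;.</message>
</board>
```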

19 External DTD  External DTD is stored in external resources (e.g. files specified by a URL). Example: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

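20 External DTD Format  Format: <!DOCTYPE root-element SYSTEM "URL"> or <!DOCTYPE root-element PUBLIC "FPI" "URL">.  Instructs the XML processor to get the DTD from the URL.  The keyword after the root element can be: SYSTEM: always get the DTD from the URL. PUBLIC: may get the DTD by some other means, using the public identifier.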

21 Formal Public Identifier  "-//W3C//DTD XHTML 1.0 Strict//EN" is the formal public identifier (FPI) of the XHTML DTD. FPIs identify resources by names instead of URLs and thus do not have the URL relocation problem. Rough meaning in this example: '-': not registered. 'W3C': owner id, W3C in this case. 'DTD XHTML 1.0 Strict': type and description of document. 'EN': language, English in this case.

22 FPI  Another example of FPI: "-//W3C//DTD HTML 4.0 Transitional//EN"  FPI is required for PUBLIC but not SYSTEM.

23 DTD Declarations  Rigorous data modeling should be used to define DTD. Need to define the right business rules and constraints. Errors in DTD are costly. Usually, define the DTD to be as restrictive as possible.

24 DTD  A DTD is composed of a sequence of declarations.  Each declaration declares one of the following constructs: ELEMENT: XML element types. ATTLIST: attributes of an element. ENTITY: reusable content referenced by the &…; syntax. NOTATION: external content not to be parsed.

25 DTD Declarations  If there is a conflict, earlier declarations have higher precedence.  Although internal declarations are physically located after the reference to the external DTD, they are read first and thus have higher precedence.  No forward reference is allowed for parameter entities: each must be declared before it is used.

26 Element Declarations  Format of element declaration: <!ELEMENT element-name content-specification>.  An element declaration can be one of the following four kinds: EMPTY, ANY, #PCDATA, or a content model (the most important).

27 EMPTY and ANY  EMPTY: empty element. Format: <!ELEMENT element-name EMPTY>  ANY: may contain anything. The content is not checked against a content model, but any embedded descendant elements still need to be declared within the DTD. Format: <!ELEMENT element-name ANY>

28 #PCDATA and Content Model  #PCDATA (parsed character data): text that is parsed for entity reference replacement. Format: <!ELEMENT element-name (#PCDATA)>  Content model: a declaration of contents enclosed in parentheses for specifying child elements.
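The four kinds of element declaration can be seen side by side in a small sketch. All element names here (`page`, `title`, `br`, `extra`) are hypothetical and chosen only for illustration:

```xml
<?xml version="1.0"?>
<!DOCTYPE page [
  <!ELEMENT page  (title, br, extra)>  <!-- content model: child elements -->
  <!ELEMENT title (#PCDATA)>           <!-- parsed character data only -->
  <!ELEMENT br    EMPTY>               <!-- no content allowed -->
  <!ELEMENT extra ANY>                 <!-- text or any declared element -->
]>
<page>
  <title>DTD &amp; XML</title>
  <br/>
  <extra>free text and <title>a nested title</title></extra>
</page>
```

The `&amp;` reference inside `title` is replaced by `&` during parsing, which is what "parsed" in #PCDATA refers to. The nested `title` is accepted inside `extra` only because `title` is itself declared.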

29 Content Model  The following symbols can be used in content models: "," sequencing. "()" grouping. "?" 0 or 1. "*" 0 or more. "+" 1 or more. "|" or.
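A sketch that uses each of these symbols once is shown below. The element names and the sample address are assumptions made for illustration, not the declarations from the original slides:

```xml
<?xml version="1.0"?>
<!DOCTYPE people [
  <!ELEMENT people (person+)>                         <!-- + : 1 or more -->
  <!ELEMENT person (name, (email | phone)*, note?)>   <!-- , ( ) | * ? -->
  <!ELEMENT name   (first, middle?, last)>            <!-- fixed order -->
  <!ELEMENT first  (#PCDATA)>
  <!ELEMENT middle (#PCDATA)>
  <!ELEMENT last   (#PCDATA)>
  <!ELEMENT email  (#PCDATA)>
  <!ELEMENT phone  (#PCDATA)>
  <!ELEMENT note   (#PCDATA)>
]>
<people>
  <person>
    <name><first>Bun</first><last>Yue</last></name>
  </person>
  <person>
    <name><first>Bun</first><middle>K</middle><last>Yue</last></name>
    <email>yue@example.org</email>
  </person>
</people>
```

With this model, a `name` whose `last` comes before `first` is rejected, because "," fixes the order of the children.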

30 Example

31 Example Acceptable: Bun Yue Bun K Yue

32 Example Not acceptable: Yue Bun The one and only: K Bun Yue

33 Exercise #1 Provide a DTD that will validate the following: Bun Yue Bun K Yue

34 Mixed Content Model  For mixed content model, the following format must be used: (#PCDATA | child-element-1 | child-element-2 ...)* #PCDATA must come first. * must be used.  (#PCDATA) is also mixed content.  In general, mixed content models (character data and elements) should be avoided if possible.

35 Mixed Content Model  Mixed content models should be avoided if possible because: they provide minimal constraints. they are harder to parse. behavior may differ with or without a DTD: some spaces may be for cosmetic uses only. Scattered #PCDATA is up to interpretation.
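A minimal sketch of a mixed content declaration follows, using hypothetical element names `para`, `em` and `code`:

```xml
<?xml version="1.0"?>
<!DOCTYPE para [
  <!-- #PCDATA comes first and the whole group is followed by * -->
  <!ELEMENT para (#PCDATA | em | code)*>
  <!ELEMENT em   (#PCDATA)>
  <!ELEMENT code (#PCDATA)>
]>
<para>Put <code>#PCDATA</code> <em>first</em> in the group.</para>
```

The declaration cannot say how many `em` or `code` children appear or in what order, which illustrates the "minimal constraints" point above.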

36 Exercise #2  Can you provide an example of well known elements that use mixed content models?

37 Exercise #3a How many text nodes are there? Greeting Hello How are you? Goodbye

38 Exercise #3b How many text nodes are there using ? Greeting Hello How are you? Goodbye

39 Exercise #3c How many text nodes are there using ? Greeting Hello How are you? Goodbye

40 Example Acceptable: Yue Lee Smith King hello good bye

41 Example Not acceptable: Yue King Lee Queen Smith hello good bye

42 Exercise #4  Modify the DTD so cc and to can come in any order (there should still be at least one to).

43 Exercise #5 Comments on this DTD:

44 Exercise #6  How do you declare in DTD that element may have a and a child in any order?

45 ATTLIST Declarations  To declare attribute properties of an element.  Format: <!ATTLIST element-name attribute-name attribute-type default-setting ...>  An ATTLIST declaration declares one or more attributes.  Each attribute declaration includes the attribute name, its type and a setting.

46 Example <!ATTLIST person comment CDATA #IMPLIED> <!ATTLIST person ssn ID #REQUIRED gender (male|female) #IMPLIED age CDATA #IMPLIED iq CDATA "100">

47 Example

48 Attribute data types  CDATA: character data. A string, least restrictive.  ID: unique identifier. A unique name string within the document; must start with a letter, a "_" or a ":". Like a primary key of the element, but not exactly so. No two elements within the XML document should have the same ID value. The scope of ID is the document, not the element.

49 Attribute data types  IDREF: identifier reference. Refers to the ID value of some other element.  IDREFS: identifier reference list. Refers to many ID values separated by white space.  ENTITY: entity name. Name of an unparsed external entity declared in the DTD.  ENTITIES: entity name list. Many entity names separated by white space.

50 Attribute data types  NMTOKEN: name token. A token formed by name characters only (letters, digits, ".", "-", "_", and ":"). Unlike a name, its first character may be any name character, including a digit, "." or "-".  NMTOKENS: name token list. Many NMTOKEN values separated by white space.  NOTATION: notation list. For referencing data other than XML. A list of notation names. Each notation contains instructions for processing non-XML data.

51 Attribute data types  Enumeration: provides explicit choices separated by | within a pair of parentheses. Note that each enumerated value must be an NMTOKEN.

52 Example Here is the XML 1.0 specification for name (for ID and IDREFS) and NMTOKEN:
Names and Tokens:
[4] NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
[5] Name ::= (Letter | '_' | ':') (NameChar)*
[6] Names ::= Name (S Name)*
[7] Nmtoken ::= (NameChar)+
[8] Nmtokens ::= Nmtoken (S Nmtoken)*
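The practical difference between the Name and Nmtoken productions can be illustrated with a short sketch. The element `item`, the attribute `code` and all values are hypothetical:

```xml
<?xml version="1.0"?>
<!DOCTYPE items [
  <!ELEMENT items (item*)>
  <!ELEMENT item  EMPTY>
  <!ATTLIST item
    id   ID      #REQUIRED
    code NMTOKEN #IMPLIED>
]>
<items>
  <!-- An ID is a Name, so it may not begin with a digit;
       an NMTOKEN may begin with any name character. -->
  <item id="s123456789" code="123456789"/>
  <item id="_x.1" code="-draft"/>
</items>
```

Writing `id="123456789"` would make the document invalid, which is one reason numeric keys are often given a letter prefix when used as ID values.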

53 Example <!ATTLIST node name NMTOKEN #IMPLIED id ID #REQUIRED links IDREFS #IMPLIED >

54 Attribute values and settings  #REQUIRED: mandatory; commonly used.  #IMPLIED: optional; commonly used.  Default value: if the attribute is missing, assume default value (use with care)  #FIXED and default value: only one possible value of the attribute is acceptable and that is the default value.

55 Example <!ATTLIST node name NMTOKEN #IMPLIED id ID #REQUIRED links IDREFS #IMPLIED type (element|attribute|comment) "comment" author (yue|davari|liaw) #FIXED "yue" >
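To show how the default and #FIXED settings behave, the ATTLIST above can be placed in a complete document. The root element `nodes`, the EMPTY content of `node` and the id values are assumptions added for this sketch:

```xml
<?xml version="1.0"?>
<!DOCTYPE nodes [
  <!ELEMENT nodes (node+)>
  <!ELEMENT node  EMPTY>
  <!ATTLIST node
    name   NMTOKEN #IMPLIED
    id     ID      #REQUIRED
    links  IDREFS  #IMPLIED
    type   (element|attribute|comment) "comment"
    author (yue|davari|liaw) #FIXED "yue">
]>
<nodes>
  <node id="n1" name="root" links="n2 n3" type="element"/>
  <!-- type is omitted here, so the parser supplies "comment" -->
  <node id="n2" author="yue"/>
  <!-- author is omitted here, so the parser supplies "yue" -->
  <node id="n3" links="n1"/>
</nodes>
```

Writing `author="liaw"` on any node would be a validity error even though `liaw` is in the enumeration, because #FIXED allows only the stated default.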

56 Example: AddressBook.dtd <!ATTLIST person gender NMTOKEN #IMPLIED luckynumber CDATA #IMPLIED> <!ATTLIST link spouse IDREF #IMPLIED children IDREFS #IMPLIED>

57 A Conforming XML <person id="s123456789" gender="male" luckynumber="7 12 3"> Hope Bob BobHope@hollywood.com King Deborah <link spouse="s222222223" children="s222222226 s222222227" /> King Jim King John King Jane

58 Exercise #7  Can you improve the DTD?

59 Exercise #8 Construct a simple and restrictive DTD for a labeled directed graph.

60 Entity Declarations  Entities are like macros in C. When XML processors parse an entity reference (usually of the format &entity-name;), the reference is replaced by the entity's value.  Entity declarations use the <!ENTITY ...> syntax.
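A minimal sketch of a general entity being declared and then referenced is given below. The element `memo` and the entity name `uhcl` are hypothetical:

```xml
<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ELEMENT memo (#PCDATA)>
  <!-- Declared once, usable anywhere in the document content -->
  <!ENTITY uhcl "University of Houston-Clear Lake">
]>
<memo>Sent from &uhcl;.</memo>
```

After parsing, the content of `memo` reads "Sent from University of Houston-Clear Lake.".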

61 Kinds of Entity Declarations  General entity: <!ENTITY entity-name "entity-value">.  External entity: <!ENTITY entity-name SYSTEM "URL">. The entity will be replaced by text from the external source (therefore it can be long and shared).  Unparsed external entity: <!ENTITY entity-name SYSTEM "URL" NDATA entity-type>. NDATA is a keyword. entity-type is the type of the non-XML data source, which will not be parsed by the XML processor.

62 Kinds of Entity Declarations  Parameter entity: <!ENTITY % entity-name "entity-value">. To be used inside the DTD only. Referred to as %entity-name;  External parameter entity: <!ENTITY % entity-name SYSTEM "URL">. To be used inside the DTD only. Referred to as %entity-name;
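A small sketch of a parameter entity that is declared and then referenced inside an internal subset follows. The names `note` and `note.decl` are hypothetical:

```xml
<?xml version="1.0"?>
<!DOCTYPE note [
  <!-- The parameter entity holds a whole declaration ... -->
  <!ENTITY % note.decl "<!ELEMENT note (#PCDATA)>">
  <!-- ... and is expanded where a declaration may appear. -->
  %note.decl;
]>
<note>Declared through a parameter entity.</note>
```

Within the internal subset, XML 1.0 allows parameter-entity references only between declarations, not inside them. Using one inside a declaration (for example, to share attribute definitions in an ATTLIST) requires an external DTD.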

63 Example In http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd: <!ENTITY % HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" "xhtml-lat1.ent"> %HTMLlat1; HTMLlat1 is an external parameter entity.

64 Example In http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd:...

65 Notation Declarations  XML is not designed for storing binary data.  If needed, binary data can be referenced by using a notation declaration and will not be parsed.  To handle binary data, the XML processor needs to know its data type as well as some instructions on how to handle it.

66 Notation Declarations  Notation declaration syntax: <!NOTATION name SYSTEM "identifier">.  name is the notation type name.  identifier is some instruction meaningful to the target XML processor.  Notation is usually used together with non-parsed external entities.  Notation types defined by notation declarations are used as the entity types of non-parsed external entities.
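How a notation, an unparsed external entity and an ENTITY attribute fit together can be sketched as follows. The names `photo`, `jpeg` and `portrait`, the file `portrait.jpg` and the use of a MIME type as the identifier are all assumptions made for illustration:

```xml
<?xml version="1.0"?>
<!DOCTYPE photo [
  <!ELEMENT photo EMPTY>
  <!-- The notation names the data type; the identifier is for the application -->
  <!NOTATION jpeg SYSTEM "image/jpeg">
  <!-- NDATA marks the entity as unparsed and ties it to the notation -->
  <!ENTITY portrait SYSTEM "portrait.jpg" NDATA jpeg>
  <!ATTLIST photo src ENTITY #REQUIRED>
]>
<photo src="portrait"/>
```

The parser never reads `portrait.jpg`. It only reports the entity name, its location and its notation to the application, which decides what to do with the data.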

67 Example

68 Document Modeling with DTD  Mapping the data requirements of the problem to an XML model.  Usually takes two steps: Use a modeling language for analysis and design: e.g. UML. Map the model to DTD, XML Schema, etc.

69 DTD Modeling Tips  Use formal modeling techniques (such as UML).  Use modeling tools, such as Rational's Rose.  Track versions carefully.  Decide on organization before modeling. For example, declarations may be grouped by: functions hierarchical elements

70 Some General Tips  Consider major design options, such as: Elements versus attributes. Flat element structures versus nested element structures. Descendants, siblings or ancestors.  Model should generally be as restrictive as possible.  Include sufficient documentation.

71 Some General Tips  Use whitespace and inline comments generously.  Use parameter entities generously.  Import modules by using external parameter entities.  Use meaningful names.

72 Exercise #9  Start with the following UML diagram for a graph. Refine it and design a suitable DTD.

73 Exercise #10  Consider the XML file, satvexample.xml, and its DTD, tvschedule.dtd. Both files are obtained from the site http://mysite.verizon.net/vze20h45/comp/xml/videoxml.html with very minor changes in satvexample.xml (to remove XSL references and modify DOCTYPE to refer to a local DTD).  Validation results of the XML file: http://xmlvalidation.com/: no error. xmlvalid (from http://www.elcel.com/products/xmlvalid.html; you will need to register, download, install and run it in command line mode):  Error: non-deterministic content model for element 'DAY': more than one path leads to element 'DATE'.  Error: element content invalid. Element 'PROGRAMSLOT' is not expected here, expecting 'HOLIDAY'.  Which validator is correct? How do you correct the problem?
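As background for the first error message, here is a generic sketch of what "non-deterministic content model" means. It uses made-up elements `r`, `a`, `b` and `c` and is not taken from tvschedule.dtd:

```xml
<?xml version="1.0"?>
<!DOCTYPE r [
  <!-- Non-deterministic form (rejected by strict validators):
         <!ELEMENT r ((a, b) | (a, c))>
       After reading <a>, the parser cannot tell which branch applies
       without looking ahead. The declaration below accepts the same
       documents and is deterministic. -->
  <!ELEMENT r (a, (b | c))>
  <!ELEMENT a EMPTY>
  <!ELEMENT b EMPTY>
  <!ELEMENT c EMPTY>
]>
<r><a/><c/></r>
```

The XML 1.0 recommendation treats non-deterministic content models as an error for compatibility, but not every validator checks for them, which can explain two tools disagreeing on the same file.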

74 Weaknesses of DTD  Not written in XML syntax, so a DTD cannot be processed as an XML document.  Difficult to extract information from XML applications.  Closed construct: all defined within one DTD. Not easily extensible.  Difficult to break down into smaller pieces.  Type definitions are not rich, e.g. insufficient types and precision: no int, float, etc. No user-defined types.  Does not work well with XML namespaces.  Difficult to enforce sophisticated constraints.

75 Other Schema Languages  There are many efforts to overcome the limitations of DTD.  XML Schema is one of the most important, since it is a W3C standard.  Other important languages include RelaxNG, Schematron, etc.  However, the ‘schema war’ is considered settled by many.

76 Questions

