Published by Garry Hines. Modified over 9 years ago.
1
XML Validation I: DTDs. Robin Burke, ECT 360, Winter 2004
2
Outline: History; Grammars / Regular expressions; DTDs (elements, attributes, entities); Declarations
3
Validation Why bother?
4
The idea: A language consists of terminals (a, b, c). A grammar is a set of productions, each beginning with a non-terminal (A, B, C): rules specifying how to generate sequences of terminals.
5
Example: A → aB; A → aBA; B → b. Generates strings such as ab, abab, ababab, etc.
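To make the example grammar concrete, here is a minimal sketch that enumerates the terminal strings it can derive, up to a length limit. The function name `generate` and the `max_len` cutoff are assumptions added for illustration; the productions are the three from the slide.

```python
from collections import deque

# Productions from the slide: A -> aB | aBA, B -> b.
# Upper-case letters are non-terminals, lower-case letters are terminals.
PRODUCTIONS = {
    "A": ["aB", "aBA"],
    "B": ["b"],
}

def generate(start="A", max_len=6):
    """Return every terminal string of length <= max_len derivable from start."""
    results = set()
    queue = deque([start])
    seen = {start}
    while queue:
        form = queue.popleft()
        # Find the leftmost non-terminal, if any.
        idx = next((i for i, ch in enumerate(form) if ch in PRODUCTIONS), None)
        if idx is None:
            results.add(form)  # only terminals are left
            continue
        for rhs in PRODUCTIONS[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            # No production shrinks a form, so over-long forms can be dropped.
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(results, key=len)

print(generate())  # ['ab', 'abab', 'ababab']
```

Every derivation ends with one or more copies of "ab", which is why the grammar describes the same language as the pattern (a, b)+.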
6
Grammar: Can be used to efficiently parse a language. The basis of all modern programming-language parsing since Algol-60. The Java Language Specification defines the language's syntax with an EBNF-style grammar.
7
Grammar: XML has a grammar-based syntax whose definition adheres to EBNF. SGML had a more complex language-definition syntax; HTML is defined the SGML way.
8
Regular expressions: A language for expressing patterns. Basic components: pattern elements; optional element = ?; repetition (1 or more) = +; repetition (0 or more) = *; choice = |; grouping = ( ); sequence = ,
9
Examples: (a, b)* matches all strings of repeated "ab": "ab", "abab", etc. (a | b | c)+, q, (b | c)* matches aaqb, bq, bqcccccccc.
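These patterns can be tried out by translating the slide's notation into Python's `re` syntax. This is only a sketch: the helper names `to_python_regex` and `matches` are made up for illustration. Note how much the tail of the second pattern matters: read as a sequence, (b, c)*, only bq matches, while read as a choice, (b | c)*, all three strings match.

```python
import re

def to_python_regex(pattern):
    """Translate the slide's notation to Python re syntax.

    In the slide's notation ',' means sequence and whitespace is not
    significant. Python expresses sequence by juxtaposition, so both are
    simply removed. The operators ? + * | ( ) mean the same in both.
    Only suitable for single-letter pattern elements, as in these examples.
    """
    return re.sub(r"[,\s]", "", pattern)

def matches(pattern, text):
    """True if the whole of text matches the pattern."""
    return re.fullmatch(to_python_regex(pattern), text) is not None

print(matches("(a, b)*", "abab"))   # True
print(matches("(a, b)*", "aba"))    # False

candidates = ["aaqb", "bq", "bqcccccccc"]
seq = "(a | b | c)+, q, (b, c)*"    # tail: the sequence b then c, repeated
alt = "(a | b | c)+, q, (b | c)*"   # tail: any mix of b and c
print([s for s in candidates if matches(seq, s)])   # ['bq']
print([s for s in candidates if matches(alt, s)])   # ['aaqb', 'bq', 'bqcccccccc']
```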
10
Note: Regular expressions differ between applications (Perl, JavaScript, XML Schemas). DTDs support only ? + * | , ( )
11
EBNF: EBNF is a more compact version of BNF; it uses regular expressions to simplify grammar expression. A → aB and A → aBA turn into A → aB(A)?. Only one production per non-terminal is allowed.
12
DTDs: Use EBNF to specify the structure of XML documents, plus attributes and entities. The syntax is a holdover from SGML. Ugly.
13
DTD Syntax: The content model contains the right-hand side (RHS) of the production rule. Example: <!ELEMENT name (firstName, lastName)>
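Because a content model is a regular expression over child elements, one way to picture validation is to turn the model into a regex over the sequence of child names. The sketch below does this for the `name` declaration above. It is an illustration only: `content_model_to_regex` and `children_match` are hypothetical helpers, not part of any XML library, and they handle element content only (no #PCDATA, ANY, or EMPTY).

```python
import re
import xml.etree.ElementTree as ET

# A token is either an element name or one of the operators ( ) , | ? * +
TOKEN = re.compile(r"\s*([A-Za-z_][\w.\-]*|[(),|?*+])")

def content_model_to_regex(model):
    """Turn a DTD element-content model into a regex over child names."""
    parts = []
    for tok in TOKEN.findall(model):
        if tok == ",":
            continue                      # sequence is juxtaposition in a regex
        elif tok in "()|?*+":
            parts.append(tok)             # same meaning in both notations
        else:
            # An element name: match it followed by a space separator.
            parts.append("(?:" + re.escape(tok) + " )")
    return "".join(parts)

def children_match(element, model):
    """True if the element's children, in order, satisfy the content model."""
    names = "".join(child.tag + " " for child in element)
    return re.fullmatch(content_model_to_regex(model), names) is not None

MODEL = "(firstName, lastName)"   # from <!ELEMENT name (firstName, lastName)>
good = ET.fromstring("<name><firstName>Jane</firstName><lastName>Doe</lastName></name>")
bad = ET.fromstring("<name><lastName>Doe</lastName><firstName>Jane</firstName></name>")
print(children_match(good, MODEL))  # True
print(children_match(bad, MODEL))   # False: wrong order
```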
14
DTD Syntax, cont'd: Not XML. <! begins a declaration. No "content". Empty elements are not indicated with />
15
Simple content models: Content can be any text: #PCDATA. Content can be anything at all (useful for debugging): ANY. Element has no content: EMPTY.
16
Example Jane Doe A John Doe A-
17
Example Jane Doe A John Doe A- Wayne Doe I Alien abduction
18
Mixed content: Legal to have a content model with text and element data. President Meets with Congress <![CDATA[ The President met with Congressional leaders today in an effort to jump-start faltering budget negotiations. Sources described the mood of the meeting as "cordial". ]]>
19
CDATA? Forgot to mention last week. Content that appears here will not be parsed. Can include arbitrary text, including <, &, etc. Only restriction: the text cannot contain the termination sequence ]]>
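A quick way to see that CDATA content is not parsed is to put markup-like text inside a section and read it back. This sketch uses Python's standard `xml.etree.ElementTree`; the `note` element and its text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Inside the CDATA section, <, & and even <go/> are plain characters.
doc = "<note><![CDATA[if a < b && c > d then <go/>]]></note>"
note = ET.fromstring(doc)

print(note.text)   # if a < b && c > d then <go/>
print(len(note))   # 0 -- <go/> was not parsed as a child element
```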
20
Mixed content, cont'd: Mixed content makes handling XML complex, but it is necessary for many applications.
21
Recursion Unlike grammars recursive formulation ≠ repetition Difference between
22
Restriction: The grammar cannot be ambiguous. A → (a, b) | (a, c) is ambiguous, and this makes the parser implementation difficult. Usually easy to make non-ambiguous: A → a, (b | c)
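Rewriting the model does not change which documents are accepted. The sketch below checks this by brute force, using Python regexes as stand-ins for the two content models; the `language` helper is an assumption made for illustration. The difference is only in parsing: with the first form, a parser that has just seen an `a` cannot tell which branch it is in without looking ahead.

```python
import re
from itertools import product

ambiguous = re.compile(r"(ab)|(ac)")   # A -> (a, b) | (a, c)
factored = re.compile(r"a(b|c)")       # A -> a, (b | c)

def language(regex, alphabet="abc", max_len=3):
    """All strings over alphabet, up to max_len, that the regex fully matches."""
    return {
        "".join(chars)
        for n in range(max_len + 1)
        for chars in product(alphabet, repeat=n)
        if regex.fullmatch("".join(chars))
    }

print(sorted(language(ambiguous)))                 # ['ab', 'ac']
print(language(ambiguous) == language(factored))   # True
```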
23
Attribute lists: Declared separately from elements; can be anywhere in the DTD. The specification includes: name of the element, name of the attribute, attribute type, default.
24
Attribute types: CDATA (character data; different from an XML CDATA section!); enumerated, e.g. (yes|no); ID (must be unique in the document); IDREF (must refer to an ID in the document); NMTOKEN (a restriction of CDATA to a single "word"); also IDREFS and NMTOKENS.
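The ID and IDREF rules can be pictured as two simple checks over a parsed document. A validating parser would read the attribute types from the DTD; this sketch instead assumes, purely for illustration, that the ID attribute is called `id` and the IDREF attribute is called `ref`. The function `check_ids` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def check_ids(root, id_attr="id", idref_attr="ref"):
    """Return (duplicate IDs, IDREF values that point at no ID)."""
    ids = Counter(
        el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None
    )
    duplicates = sorted(value for value, count in ids.items() if count > 1)
    dangling = sorted({
        el.get(idref_attr)
        for el in root.iter()
        if el.get(idref_attr) is not None and el.get(idref_attr) not in ids
    })
    return duplicates, dangling

doc = ET.fromstring(
    '<people><p id="p1"/><p id="p2" ref="p1"/><p id="p2" ref="p9"/></people>'
)
print(check_ids(doc))   # (['p2'], ['p9'])
```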
25
Default declaration: #REQUIRED; #IMPLIED (means optional); a value (this becomes the default); #FIXED (a value is provided in the declaration, and the attribute must always have it).
26
Examples <!ATTLIST img src CDATA #REQUIRED alt CDATA #REQUIRED align (left|right|center) "left" id ID #IMPLIED > <!ATTLIST timestamp time-zone NMTOKEN #IMPLIED>
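The effect of a default value can be observed with Python's standard `xml.parsers.expat`, which reads ATTLIST declarations in an internal DTD subset and reports defaulted attributes along with the ones written in the document. In this sketch the `<!ELEMENT img EMPTY>` line, the sample `img` tag, and the helper `parse_attributes` are assumptions added for illustration. expat does not validate, so it would not complain about a missing #REQUIRED attribute.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE img [
<!ELEMENT img EMPTY>
<!ATTLIST img
    src   CDATA               #REQUIRED
    alt   CDATA               #REQUIRED
    align (left|right|center) "left"
    id    ID                  #IMPLIED>
]>
<img src="photo.jpg" alt="A photo"/>"""

def parse_attributes(document):
    """Return (name, attributes) as reported for the first start tag."""
    seen = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: seen.append((name, dict(attrs)))
    parser.Parse(document, True)
    return seen[0]

name, attrs = parse_attributes(DOC)
print(name)             # img
print(attrs["align"])   # left -- supplied by the DTD, not the document
print("id" in attrs)    # False -- #IMPLIED attributes have no default
```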
27
Entities: Like macros; content to be inserted is indicated with &name;. Predefined general entities: &amp; &lt; (an essential part of XML). User-defined general entities: &disclaimer;
28
Entities, cont'd: Parameter entities can also be used to simplify DTD creation or to combine DTDs; they are indicated with a %. More on this next week.
29
Defining general entities Example <!ENTITY disclaimer "This is a work of fiction. Any resemblance to persons living or dead is unintentional.">
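To see the macro-like behaviour, the declaration can be placed in an internal DTD subset and the document parsed with Python's standard `xml.etree.ElementTree`. The element names `book`, `legal`, and `math` are made up for this sketch; the entity declaration is the one from the slide.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE book [
<!ENTITY disclaimer "This is a work of fiction. Any resemblance to persons living or dead is unintentional.">
]>
<book><legal>&disclaimer;</legal><math>1 &lt; 2 &amp; 3 &gt; 2</math></book>"""

book = ET.fromstring(DOC)

# The user-defined entity is replaced by its declared text.
print(book.find("legal").text)

# Predefined entities need no declaration.
print(book.find("math").text)   # 1 < 2 & 3 > 2
```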
30
Unparsed data: What about non-text data (images, audio files)? In XML we define a notation: create a name and associate an application with it. This is a suggestion to the application about how to interpret the unparsed data; it is not part of the parsing operation.
31
Using Notation Example declares the jpeg notation Example
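As a rough illustration of how a notation and an unparsed entity reach an application, the sketch below declares a jpeg notation and an entity that uses it, then collects both declarations with Python's standard `xml.parsers.expat`. The entity name `logo`, the file name `logo.jpg`, the `image/jpeg` identifier, and the `doc` element are assumptions for illustration and are not taken from the slides.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE doc [
<!ELEMENT doc EMPTY>
<!NOTATION jpeg SYSTEM "image/jpeg">
<!ENTITY logo SYSTEM "logo.jpg" NDATA jpeg>
]>
<doc/>"""

def read_declarations(document):
    """Collect notation and unparsed-entity declarations from the DTD."""
    notations, unparsed = {}, {}
    parser = xml.parsers.expat.ParserCreate()

    def on_notation(name, base, system_id, public_id):
        notations[name] = system_id

    def on_unparsed_entity(name, base, system_id, public_id, notation_name):
        unparsed[name] = (system_id, notation_name)

    parser.NotationDeclHandler = on_notation
    parser.UnparsedEntityDeclHandler = on_unparsed_entity
    parser.Parse(document, True)
    return notations, unparsed

notations, unparsed = read_declarations(DOC)
print(notations)   # {'jpeg': 'image/jpeg'}
print(unparsed)    # {'logo': ('logo.jpg', 'jpeg')}
```

The parser only reports the declarations. It never opens logo.jpg, which matches the point that interpreting unparsed data is left to the application.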
32
Notation, cont'd Note that the content is defined in the DTD not the document binary data embedded in XML document Not that useful in practice more likely to use URLs
33
Typical Example... Now it is up to the application to do something appropriate with the src attribute
34
A better solution Use XLink We'll talk about this later
35
DTD limitations: Not in XML, so a special parser is needed for the DTD. No content type restrictions: #PCDATA can be anything. Element names must be globally unique: a common term cannot be reused at different places in the document, so you end up with names like course-name and professor-name.
36
DTD benefits: Relatively easy to write and understand (wait until you see XML Schema!). Possible to modularize and combine DTDs (more next week).
37
Next week: More DTDs: modularization and parameterization (on-line reading). Beginning Schemas: 4.1-4.30.
38
Lab