Published by Felicia Wilkerson. Modified over 9 years ago.
1
XML Validation I: DTDs. Robin Burke, ECT 360, Winter 2004
2
Outline: History; Grammars / Regular expressions; DTDs (elements, attributes, entities); Declarations
3
Validation Why bother?
4
The idea: A language consists of terminals (a, b, c) and a set of productions: rules, each beginning with a non-terminal (A, B, C), specifying how to generate sequences of terminals.
5
Example: A → aB; A → aBA; B → b. Generates the strings ab, abab, ababab, etc.
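As a worked illustration of how the productions generate a string, reading the three rules as A → aB, A → aBA, and B → b, one possible derivation of abab is sketched below. The choice of which rule to apply at each step is just one of several valid orders.

```latex
\begin{align*}
A &\Rightarrow aBA  && \text{using } A \to aBA \\
  &\Rightarrow abA  && \text{using } B \to b \\
  &\Rightarrow abaB && \text{using } A \to aB \\
  &\Rightarrow abab && \text{using } B \to b
\end{align*}
```

Each extra use of A → aBA contributes one more "ab", which is why the generated strings are ab, abab, ababab, and so on.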
6
Grammar: Can be used to efficiently parse a language; the basis of most modern programming language parsing since Algol-60. The Java Language Specification defines the language's syntax with a grammar in a BNF-style notation.
7
Grammar: XML: grammar-based syntax, adheres to EBNF. SGML: had a more complex language definition syntax; HTML is defined the SGML way.
8
Regular expressions: A language for expressing patterns. Basic components: pattern elements; optional element = ?; repetition (1 or more) = +; repetition (0 or more) = *; choice = |; grouping = ( ); sequence = ,
9
Examples: (a, b)* matches all strings "ab", "abab", etc. (and the empty string). (a | b | c)+, q, (b | c)* matches aaqb, bq, bqcccccccc.
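The same patterns can be tried out as DTD content models. The sketch below is only an illustration: the element names `examples`, `s`, and `t` are hypothetical, each letter is modeled as an empty element, and the tail of the second pattern is written as the choice `(b | c)*`, which is the reading under which the sample strings aaqb, bq, and bqcccccccc all match.

```xml
<?xml version="1.0"?>
<!DOCTYPE examples [
  <!-- Hypothetical wrapper so both patterns fit in one document -->
  <!ELEMENT examples (s, t)>
  <!-- (a, b)* : zero or more repetitions of the sequence a, b -->
  <!ELEMENT s (a, b)*>
  <!-- (a | b | c)+, q, (b | c)* -->
  <!ELEMENT t ((a | b | c)+, q, (b | c)*)>
  <!ELEMENT a EMPTY>
  <!ELEMENT b EMPTY>
  <!ELEMENT c EMPTY>
  <!ELEMENT q EMPTY>
]>
<examples>
  <!-- corresponds to the string "abab" -->
  <s><a/><b/><a/><b/></s>
  <!-- corresponds to the string "aaqb" -->
  <t><a/><a/><q/><b/></t>
</examples>
```

Changing the content of `s` to `<a/><b/><a/>` would make the document invalid, because the sequence a, b must be completed each time it is started.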
10
Note: Regular expressions are different in different applications (Perl, JavaScript, XML Schemas). DTDs only support ? + * | , ( )
11
EBNF: EBNF is a more compact version of BNF; it uses regular expressions to simplify grammar expression. The pair A → aB, A → aBA turns into the single production A → aB(A)?. Only one production per non-terminal is allowed.
12
DTDs: Use EBNF to specify the structure of XML documents, plus attributes and entities. Syntax is a holdover from SGML. Ugly.
13
DTD Syntax Content model contains the RHS of the production rule Example <!ELEMENT name (firstName, lastName)>
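To see the declaration in context, here is a minimal sketch of a complete document that uses it in an internal DTD subset. The declarations for `firstName` and `lastName` are assumptions (text-only content), since the slide only shows the `name` declaration.

```xml
<?xml version="1.0"?>
<!DOCTYPE name [
  <!-- From the slide: name is a sequence of firstName then lastName -->
  <!ELEMENT name (firstName, lastName)>
  <!-- Assumed: both children hold plain text -->
  <!ELEMENT firstName (#PCDATA)>
  <!ELEMENT lastName (#PCDATA)>
]>
<name>
  <firstName>Jane</firstName>
  <lastName>Doe</lastName>
</name>
```

Swapping the two child elements, or omitting one, would make the document invalid, because the content model is a strict sequence.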
14
DTD Syntax cont'd Not XML <! begins a declaration No "content" Empty elements not indicated with />
15
Simple content models: #PCDATA: content can be any text. ANY: content can be anything at all (useful for debugging). EMPTY: element has no content.
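The three simple content models can be sketched side by side as follows. All element names here (`note`, `title`, `separator`, `scratch`) are hypothetical and chosen only for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (title, separator, scratch)>
  <!-- #PCDATA: text only (note the required parentheses) -->
  <!ELEMENT title (#PCDATA)>
  <!-- EMPTY: no content at all -->
  <!ELEMENT separator EMPTY>
  <!-- ANY: text and any declared elements, in any order -->
  <!ELEMENT scratch ANY>
]>
<note>
  <title>Reminder</title>
  <separator/>
  <scratch>free text and <title>any declared element</title></scratch>
</note>
```

Note that `/>` appears only in the document instance; the `<!ELEMENT ...>` declarations themselves are closed with a plain `>`.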
16
Example Jane Doe A John Doe A-
17
Example Jane Doe A John Doe A- Wayne Doe I Alien abduction
18
DTD?
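One possible answer, as a sketch only: the markup of the two example slides is not shown in the text, so the element names `grades`, `student`, `name`, `grade`, and `reason` below are assumptions. The point is that the third student needs an extra, optional element.

```xml
<?xml version="1.0"?>
<!DOCTYPE grades [
  <!ELEMENT grades (student+)>
  <!-- reason? is optional: only the incomplete grade has one -->
  <!ELEMENT student (name, grade, reason?)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT grade (#PCDATA)>
  <!ELEMENT reason (#PCDATA)>
]>
<grades>
  <student><name>Jane Doe</name><grade>A</grade></student>
  <student><name>John Doe</name><grade>A-</grade></student>
  <student><name>Wayne Doe</name><grade>I</grade><reason>Alien abduction</reason></student>
</grades>
```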
19
Mixed content: Legal to have a content model with text and element data. Example: President Meets with Congress. The President met with Congressional leaders today in an effort to jump-start faltering budget negotiations. Sources described the mood of the meeting as "cordial".
20
Mixed content, cont'd: Mixed content makes handling XML complex, but it is necessary for many applications.
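A mixed content model might be declared as in the sketch below. The element names (`story`, `headline`, `body`, `person`, `org`, `quote`) are hypothetical, and the body sentence is a shortened paraphrase of the news item used as an example. In a DTD, `#PCDATA` must come first in the group and the group must end with `*`.

```xml
<?xml version="1.0"?>
<!DOCTYPE story [
  <!ELEMENT story (headline, body)>
  <!ELEMENT headline (#PCDATA)>
  <!-- Mixed content: text interleaved with inline elements -->
  <!ELEMENT body (#PCDATA | person | org | quote)*>
  <!ELEMENT person (#PCDATA)>
  <!ELEMENT org (#PCDATA)>
  <!ELEMENT quote (#PCDATA)>
]>
<story>
  <headline>President Meets with Congress</headline>
  <body>Sources described the mood of the meeting between <person>the President</person> and <org>Congressional</org> leaders as <quote>cordial</quote>.</body>
</story>
```

The complexity comes from the fact that a mixed model cannot constrain the order or number of the inline elements, so processing code has to cope with text and elements in any arrangement.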
21
Recursion: Unlike in grammars, a recursive formulation ≠ repetition. Difference between
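A sketch of the kind of contrast meant here, using hypothetical element names (`doc`, `flat`, `nested`, `item`): in a string grammar both formulations would yield the same sequence of items, but in XML the recursive one produces a different tree.

```xml
<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ELEMENT doc (flat, nested)>
  <!-- Repetition: any number of item siblings at one level -->
  <!ELEMENT flat (item+)>
  <!-- Recursion: one item, optionally followed by another nested element -->
  <!ELEMENT nested (item, nested?)>
  <!ELEMENT item (#PCDATA)>
]>
<doc>
  <flat><item>1</item><item>2</item><item>3</item></flat>
  <nested><item>1</item><nested><item>2</item><nested><item>3</item></nested></nested></nested>
</doc>
```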
22
Restriction: The grammar cannot be ambiguous. A → (a, b) | (a, c) is not allowed, because it makes the parser implementation difficult. Usually easy to make non-ambiguous: A → a, (b | c)
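In DTD syntax the rewrite might look like the following sketch, where the lettered elements are hypothetical empty elements. The rejected form is shown only inside a comment.

```xml
<?xml version="1.0"?>
<!DOCTYPE A [
  <!-- Ambiguous form, shown for comparison only:
       <!ELEMENT A ((a, b) | (a, c))>
       After reading an a, the parser cannot tell which branch it is in. -->
  <!-- Equivalent unambiguous form: the common prefix is factored out -->
  <!ELEMENT A (a, (b | c))>
  <!ELEMENT a EMPTY>
  <!ELEMENT b EMPTY>
  <!ELEMENT c EMPTY>
]>
<A><a/><c/></A>
```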
23
Attribute lists: Declared separately from elements; can be anywhere in the DTD. Specification includes: name of the element, name of the attribute, attribute type, default.
24
Attribute types: CDATA: character data (different from an XML CDATA section!). Enumerated: (yes|no). ID: must be unique in the document. IDREF: must refer to an ID in the document. NMTOKEN: a restriction of CDATA to a single "word". Also IDREFS and NMTOKENS.
25
Default declaration: #REQUIRED: the attribute must be present. #IMPLIED: means optional. A value: this becomes the default. #FIXED value: the attribute always has the value provided.
26
Examples <!ATTLIST img src CDATA #REQUIRED alt CDATA #REQUIRED align (left|right|center) "left" id ID #IMPLIED > <!ATTLIST timestamp time-zone NMTOKEN #IMPLIED>
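The two attribute lists can be exercised in a small document such as the sketch below. The wrapper element `page`, the element declarations, the file names, and the timestamp text are assumptions added for illustration; only the two `<!ATTLIST ...>` declarations come from the slide.

```xml
<?xml version="1.0"?>
<!DOCTYPE page [
  <!ELEMENT page (img+, timestamp)>
  <!ELEMENT img EMPTY>
  <!ATTLIST img
    src   CDATA               #REQUIRED
    alt   CDATA               #REQUIRED
    align (left|right|center) "left"
    id    ID                  #IMPLIED>
  <!ELEMENT timestamp (#PCDATA)>
  <!ATTLIST timestamp time-zone NMTOKEN #IMPLIED>
]>
<page>
  <!-- align is omitted, so a validating parser supplies the default "left" -->
  <img src="logo.png" alt="Company logo" id="logo"/>
  <!-- id is #IMPLIED, so it may be left out -->
  <img src="photo.jpg" alt="Team photo" align="right"/>
  <timestamp time-zone="CST">2004-01-15T10:30</timestamp>
</page>
```

Leaving out `src` or `alt`, using `align="top"`, or giving two elements the same `id` would each make the document invalid.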
27
Entities: Like macros; content to be inserted is indicated with &name;. Predefined general entities: &amp; &lt; (an essential part of XML). User-defined general entities: &disclaimer;
28
Entities, cont'd: Parameter entities can also be used to simplify DTD creation or to combine DTDs; indicated with a %. More on this next week.
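As a small preview, a parameter entity might be used as in the sketch below. The file name `cards.dtd` and all element names are hypothetical. The fragment is written as an external DTD file because parameter entity references inside a declaration are only permitted in the external subset, not in an internal subset.

```xml
<!-- cards.dtd: a hypothetical external DTD file -->
<!-- Define the parameter entity; note the % after ENTITY -->
<!ENTITY % contact "phone | email">
<!-- Reference it with %name; inside a content model -->
<!ELEMENT card (name, (%contact;)+)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT email (#PCDATA)>
```

After expansion, the `card` declaration reads as `(name, (phone | email)+)`, so the list of contact elements is maintained in one place.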
29
Defining general entities Example <!ENTITY disclaimer "This is a work of fiction. Any resemblance to persons living or dead is unintentional.">
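A sketch of how the declared entity could then be used in a document. The elements `book`, `title`, and `notice`, and the title text, are hypothetical; the entity declaration is the one from the slide.

```xml
<?xml version="1.0"?>
<!DOCTYPE book [
  <!ELEMENT book (title, notice)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT notice (#PCDATA)>
  <!ENTITY disclaimer "This is a work of fiction. Any resemblance to persons living or dead is unintentional.">
]>
<book>
  <title>A Hypothetical Novel</title>
  <!-- The parser replaces &disclaimer; with the declared text -->
  <notice>&disclaimer;</notice>
</book>
```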
30
In-class exercise Business cards
31
Next week More DTDs Entities Modularization and parameterization pg. 129-148
32
Lab