1
1 Document Type Definitions (DTDs) Imposing Structure on XML Documents
2
2 Document Type Definitions Document Type Definitions (DTDs) impose structure on an XML document. Using DTDs, we can specify what a “valid” document should contain. These specifications require more than being well-formed, e.g., they state which elements are legal and what nesting is allowed. DTDs have limited expressive power, e.g., they cannot specify data types.
3
3 What is it good for? DTDs can be used to define special languages of XML, i.e., restricted XML for special needs. Examples: –FOAF –SVG (Scalable Vector Graphics) –WML (a kind of HTML for wireless devices) –SOAP (for web services) –XHTML (a well-formed version of HTML). Standards for data exchange can be defined using DTDs, and special applications can be written against them.
4
4 Address Book DTD Suppose we want to create a DTD that describes legal address book entries. This DTD will be used to exchange address book information between programs. How should it be written? (What is a legal address?) We discuss both element definitions and attribute definitions.
5
5 Element Definitions
6
6 Example: An Address Book A person entry: Homer Simpson; Dr. H. Simpson; 1234 Springwater Road; Springfield USA, 98765; (321) 786 2543; (321) 786 2544; homer@math.springfield.edu. Constraints: exactly one name, at most one greeting, as many address lines as needed, mixed telephones and faxes, at least one email.
7
7 Specifying the Structure How do we specify exactly what must appear in a person element? In a DTD, we can specify the permitted content for each element. The permitted content is specified as a regular expression. We show the general syntax, and then an example.
8
8 a                        Element a
  e1?                      0 or 1 occurrences of expression e1
  e1*                      0 or more occurrences of expression e1
  e1+                      1 or more occurrences of expression e1
  e1,e2                    Expression e2 after expression e1
  e1|e2                    Either expression e1 or expression e2 (but not both!)
  (e)                      Grouping
  #PCDATA                  Parsed character data (i.e., text)
  EMPTY                    No content
  ANY                      Any content
  (#PCDATA|a1|..|an)*      Mixed content
9
9 What’s in a person Element? The expression is: –name, greet?, addr*, (tel | fax)*, email+ We discuss what each part of this means –name = there must be a name element –greet? = there is an optional greet element (i.e., 0 or 1 greet elements) –name, greet? = the name element is followed by an optional greet element
10
10 What’s in a person Element? (cont.) –addr* = there are 0 or more addr elements –tel | fax = there is a tel or a fax element –(tel | fax)* = there are 0 or more repeats of tel or fax –email+ = there are 1 or more email elements. Full expression: name, greet?, addr*, (tel | fax)*, email+
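Since a content model is a regular expression over element names, one way to experiment with it is to translate it into an ordinary regular expression and match sequences of children against it. The sketch below is only an illustration, not how an XML validator works; the helper names `content_model_to_regex` and `matches` are hypothetical, it assumes the model contains only element names and the operators shown above (no #PCDATA, EMPTY or ANY), and the assignment of tel and fax in Homer's entry is a guess.

```python
import re

def content_model_to_regex(model: str) -> str:
    """Translate a DTD content model into a Python regex over <name> tokens."""
    # Wrap every element name in a group so that ?, * and + apply to the whole name.
    pattern = re.sub(
        r"[A-Za-z_][\w.-]*",
        lambda m: "(?:<" + re.escape(m.group(0)) + ">)",
        model,
    )
    # "," means sequence, which is plain concatenation in a regex; whitespace is noise.
    return re.sub(r"[,\s]+", "", pattern)

def matches(model: str, children: list[str]) -> bool:
    """Check whether a sequence of child element names fits the content model."""
    word = "".join(f"<{name}>" for name in children)
    return re.fullmatch(content_model_to_regex(model), word) is not None

PERSON = "name, greet?, addr*, (tel | fax)*, email+"

# Homer's entry (which number is tel and which is fax is an assumption)
print(matches(PERSON, ["name", "greet", "addr", "addr", "tel", "fax", "email"]))  # True
print(matches(PERSON, ["name", "email", "email"]))           # True
print(matches(PERSON, ["name", "greet", "greet", "email"]))  # False: two greetings
print(matches(PERSON, ["name", "tel"]))                      # False: no email
```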
11
11 What’s in a person Element? (cont.) Does the expression name, greet?, addr*, (tel | fax)*, email+ differ from: –name, greet?, addr*, tel*, fax*, email+ –name, greet?, addr*, (fax | tel)*, email+ –name, greet?, addr*, (fax | tel)*, email, email* –name, greet?, addr*, (fax | tel)*, email*, email
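A quick way to explore this question is to enumerate all short sequences of children and compare which ones each expression accepts. The following is a brute-force sketch under stated assumptions: element names are abbreviated to single letters (n=name, g=greet, a=addr, t=tel, f=fax, e=email), and agreement up to length 6 is evidence, not a proof, of equivalence.

```python
import re
from itertools import product

# Single-letter abbreviations (an assumption made for brevity).
REFERENCE = "ng?a*(t|f)*e+"
CANDIDATES = {
    "tel*, fax*":    "ng?a*t*f*e+",
    "(fax | tel)*":  "ng?a*(f|t)*e+",
    "email, email*": "ng?a*(f|t)*ee*",
    "email*, email": "ng?a*(f|t)*e*e",
}

def language(regex, max_len=6, alphabet="ngatfe"):
    """All words up to max_len over the alphabet that the regex accepts."""
    compiled = re.compile(regex)
    return {
        "".join(letters)
        for n in range(max_len + 1)
        for letters in product(alphabet, repeat=n)
        if compiled.fullmatch("".join(letters))
    }

reference_words = language(REFERENCE)
for label, regex in CANDIDATES.items():
    difference = reference_words ^ language(regex)
    witness = min(difference, key=lambda w: (len(w), w)) if difference else None
    print(f"{label:15} same up to length 6: {not difference}  witness: {witness}")
```

Only the first variant should report a difference: a fax followed by a tel (`nfte`) is accepted by the reference expression but not by `tel*, fax*`.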
12
12 DTD For the Address Book <!DOCTYPE addressbook [ <!ELEMENT person (name, greet?, addr*, (fax | tel)*, email+)> ]>
13
13 Example Requirements: –Every country must have a name as the first node. –Every country must have a capital city as the following node. –A country may have a king. –A country may have a queen. What is wrong with the following: –
14
14 Unambiguity A DTD must be 1-unambiguous, i.e., at any moment while parsing a document it must be clear which point of the regular expression we are at. Which of the following is 1-unambiguous? –(a,b)|(a,c) –a,(b|c) We now formalize these ideas…
15
15 Languages An element definition defines a language, i.e., the set of all legal sequences of children. Example: Which of the following are in the language defined by a*,(b|c),a+? –aba –abca –aab –aaacaaa
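The example can be checked mechanically. In this small sketch the element definition is written as a Python regular expression (sequence "," becomes plain concatenation) and each word is tested for full membership.

```python
import re

# a*,(b|c),a+ written as a Python regex: "," is concatenation.
pattern = re.compile(r"a*(b|c)a+")

results = {word: bool(pattern.fullmatch(word)) for word in ["aba", "abca", "aab", "aaacaaa"]}
for word, ok in results.items():
    print(word, ok)
# aba True
# abca False
# aab False
# aaacaaa True
```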
16
16 Automata Languages can also be defined using an automaton. An automaton is: –a set of states Q –an alphabet Σ –a transition function δ, which associates a pair (q,a) with a state q’ –an initial state q0 –a set of accepting states F. A word a1…an is in the language defined by an automaton if there is a path from q0 to a state in F with edges labeled a1,…,an
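The five components map directly onto a small data structure. This is a minimal sketch, assuming a deterministic transition function stored as a dictionary; the class name `Automaton` and the example automaton (which accepts an a followed by any number of b's) are made up for illustration and are not taken from the slides.

```python
from dataclasses import dataclass

@dataclass
class Automaton:
    states: set        # Q
    alphabet: set      # the alphabet
    delta: dict        # transition function: (state, letter) -> next state
    initial: str       # q0
    accepting: set     # F

    def accepts(self, word: str) -> bool:
        """Follow the edges labeled with the letters of word, starting at q0."""
        state = self.initial
        for letter in word:
            if (state, letter) not in self.delta:
                return False          # no edge to follow, so the word is rejected
            state = self.delta[(state, letter)]
        return state in self.accepting

# Hypothetical example: q0 --a--> q1, q1 --b--> q1, accepting state q1
EXAMPLE = Automaton(
    states={"q0", "q1"},
    alphabet={"a", "b"},
    delta={("q0", "a"): "q1", ("q1", "b"): "q1"},
    initial="q0",
    accepting={"q1"},
)

for word in ["a", "abb", "", "ba"]:
    print(repr(word), EXAMPLE.accepts(word))
```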
17
17 Automata Example: What Language Does this Define? [Figure: automaton with states q0, q1, q2 and edges labeled a, a, b]
18
18 Automata Example: What Language Does this Define? [Figure: automaton with states q0, q1, q2, q3 and edges labeled a, a, b, b, c]
19
19 Automata Example: What Language Does this Define? [Figure: automaton with states q0, q1, q2, q3 and edges labeled a, b, b, b, c] Note that this automaton is non-deterministic!
20
20 Non-Deterministic Automata An automaton is non-deterministic if there is a state q and a letter a such that there are at least two transitions from q via edges labeled with a. –What words are in the language of a non-deterministic automaton? We now show how to create a Glushkov automaton from a regular expression.
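For a non-deterministic automaton, a word is in the language if at least one of the possible paths ends in an accepting state. A common way to test this is to track the set of all states reachable so far. The sketch below assumes edges are given as (source, letter, target) triples; the automaton itself is hypothetical and is not one of the figures above.

```python
def accepts(edges, initial, accepting, word):
    """Follow all possible paths at once by tracking the set of reachable states."""
    current = {initial}
    for letter in word:
        current = {
            target
            for (source, label, target) in edges
            if source in current and label == letter
        }
    return bool(current & accepting)

# Hypothetical automaton: two edges labeled a leave q0, so it is non-deterministic.
EDGES = [("q0", "a", "q1"), ("q0", "a", "q2"), ("q1", "b", "q3"), ("q2", "c", "q3")]
ACCEPTING = {"q3"}

for word in ["ab", "ac", "a", "abc"]:
    print(word, accepts(EDGES, "q0", ACCEPTING, word))
# ab True
# ac True
# a False
# abc False
```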
21
21 Creating an automaton from an element definition Expression: a*,(b|c),a+ Step 1: Normalize the expression by replacing any occurrence of an expression e+ with e,e*. Result: a*,(b|c),a,a* Step 2: Use subscripts to number each occurrence of each letter. Result: a1*,(b1|c1),a2,a3*
22
22 Creating an automaton from an element definition Step 3: Create a state for each subscripted letter, and a state q0. Expression: a1*,(b1|c1),a2,a3* Step 4: Choose as accepting states all subscripted letters with which it is possible to end a word. [Figure: states q0, a1, b1, c1, a2, a3]
23
23 Creating an automaton from an element definition Step 5: Create a transition from a state l_i to a state k_j if there is a word in which k_j follows l_i, and from q0 to every state whose letter can begin a word. Label the transition with the letter k. Expression: a1*,(b1|c1),a2,a3* [Figure: states q0, a1, b1, c1, a2, a3] You fill in the transitions!
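Steps 1 to 5 can be sketched in code. This is one possible implementation, not the only way to build a Glushkov automaton: expressions are written as nested tuples (a representation chosen here for illustration), positions such as a1 are stored as pairs like `("a", 1)`, and all function names are hypothetical.

```python
from collections import defaultdict

def normalize(e):
    """Step 1: replace every e+ by e,e*."""
    if e[0] == "sym":
        return e
    if e[0] == "plus":
        inner = normalize(e[1])
        return ("seq", inner, ("star", inner))
    return (e[0], *[normalize(child) for child in e[1:]])

def number(e, counters):
    """Step 2: give each letter occurrence a subscript, e.g. ("a", 1), ("a", 2)."""
    if e[0] == "sym":
        counters[e[1]] += 1
        return ("sym", (e[1], counters[e[1]]))
    return (e[0], *[number(child, counters) for child in e[1:]])

def analyse(e, follow):
    """Return (nullable, first, last) and record which position may follow which."""
    kind = e[0]
    if kind == "sym":
        return False, {e[1]}, {e[1]}
    if kind in ("star", "opt"):
        _, first, last = analyse(e[1], follow)
        if kind == "star":              # a repetition can loop back to its start
            for p in last:
                follow[p] |= first
        return True, first, last
    n1, f1, l1 = analyse(e[1], follow)
    n2, f2, l2 = analyse(e[2], follow)
    if kind == "alt":
        return n1 or n2, f1 | f2, l1 | l2
    for p in l1:                        # kind == "seq": e2 may follow the end of e1
        follow[p] |= f2
    return n1 and n2, f1 | (f2 if n1 else set()), l2 | (l1 if n2 else set())

def glushkov(expr):
    """Steps 3-5: states are q0 plus the positions; returns (edges, accepting)."""
    numbered = number(normalize(expr), defaultdict(int))
    follow = defaultdict(set)
    nullable, first, last = analyse(numbered, follow)
    edges = {("q0", p[0], p) for p in first}
    edges |= {(p, q[0], q) for p in follow for q in follow[p]}
    accepting = set(last) | ({"q0"} if nullable else set())
    return edges, accepting

def show(state):
    return state if isinstance(state, str) else f"{state[0]}{state[1]}"

# a*,(b|c),a+
EXPR = ("seq", ("star", ("sym", "a")),
        ("seq", ("alt", ("sym", "b"), ("sym", "c")), ("plus", ("sym", "a"))))

edges, accepting = glushkov(EXPR)
for source, label, target in sorted(edges, key=str):
    print(f"{show(source)} --{label}--> {show(target)}")
print("accepting:", sorted(show(s) for s in accepting))
```

For this expression the sketch yields ten transitions and the accepting states a2 and a3.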
24
24 1-unambiguous An expression is 1-unambiguous if its Glushkov automaton is deterministic. –otherwise it is 1-ambiguous –element definitions in a DTD must be 1-unambiguous! Examples: Create a Glushkov automaton for each of the following and check whether the corresponding expressions are 1-unambiguous –(a,b)|(a,c) –a,(b|c) –a?, d+, b*, d*, (c|b)+
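Checking determinism amounts to looking for a state with two outgoing edges that carry the same letter. The sketch below applies that check to the first two examples; the edge lists were written out by hand following the construction steps (an assumption worth double-checking when doing the exercise), and the function name `conflicts` is hypothetical.

```python
from collections import defaultdict

def conflicts(edges):
    """Return {(state, letter): targets} for every pair with more than one target."""
    targets = defaultdict(set)
    for source, letter, target in edges:
        targets[(source, letter)].add(target)
    return {key: found for key, found in targets.items() if len(found) > 1}

# (a,b)|(a,c), numbered as (a1,b1)|(a2,c1)
FIRST = [("q0", "a", "a1"), ("q0", "a", "a2"), ("a1", "b", "b1"), ("a2", "c", "c1")]

# a,(b|c), numbered as a1,(b1|c1)
SECOND = [("q0", "a", "a1"), ("a1", "b", "b1"), ("a1", "c", "c1")]

print(conflicts(FIRST))    # q0 has two a-edges, so the expression is 1-ambiguous
print(conflicts(SECOND))   # no conflicts, so the expression is 1-unambiguous
```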
25
25 Ambiguous Example Replace the following with a 1-unambiguous equivalent expression <!ELEMENT country (president | king | (king,queen) | queen)>
26
26 Another Example Customers may pay with a combination of credit cards and cash. If cards and cash are both used, the cards must come first. There may be more than one card. There may be no more than one cash element. At least one method of payment must be used. Find a 1-unambiguous definition for the element payment, using the elements card and cash.
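A candidate answer can be tested against the stated rules by brute force. The candidate used below, `(card+, cash?) | cash`, is one proposed solution and is not taken from the slides; the sketch abbreviates card to c and cash to m and compares the candidate with a direct encoding of the requirements on all sequences of up to 8 elements.

```python
import re
from itertools import product

# Proposed answer (an assumption): <!ELEMENT payment ((card+, cash?) | cash)>
# written over single letters: c = card, m = cash.
CANDIDATE = re.compile(r"c+m?|m")

def satisfies_requirements(word: str) -> bool:
    """Direct encoding of the rules stated in the exercise."""
    if not word:                  # at least one method of payment
        return False
    if word.count("m") > 1:       # no more than one cash element
        return False
    return "mc" not in word       # cards must come before cash

mismatches = [
    "".join(letters)
    for n in range(9)
    for letters in product("cm", repeat=n)
    if bool(CANDIDATE.fullmatch("".join(letters))) != satisfies_requirements("".join(letters))
]
print(mismatches)   # an empty list means the candidate agrees with the rules up to length 8
```

Whether the candidate is also 1-unambiguous still has to be checked with its Glushkov automaton.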
27
27 Attribute Definitions
28
28 More DTD Syntax XML documents can have elements, which can have attributes. How are they defined? General Syntax: <!ATTLIST element-name attribute-name1 type1 default-value1 attribute-name2 type2 default-value2 …. attribute-namen typen default-valuen> Example:
29
29 <!ATTLIST element-name attribute-name1 type1 default-value1 attribute-name2 type2 default-value2 …. attribute-namen typen default-valuen>
type is one of the following (there are additional possibilities that we don’t discuss):
  CDATA            character data
  (en1|en2|..)     value must be one from the given list
  ID               value is a unique id
  IDREF            value is the id of another element
  IDREFS           value is a list of other ids
30
30 <!ATTLIST element-name attribute-name1 type1 default-value1 attribute-name2 type2 default-value2 …. attribute-namen typen default-valuen>
default-value is one of the following:
  value            The default value of the attribute
  #REQUIRED        The attribute value must be included in the element
  #IMPLIED         The attribute does not have to be included
31
31 Examples <!ATTLIST height dimension (cm | in) #REQUIRED accuracy CDATA #IMPLIED resizable CDATA "yes">
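The effect of the three kinds of default can be illustrated with a small function that computes the attributes an application would see for a height element. This is a sketch of the semantics only, not a DTD parser: the dictionary `HEIGHT_ATTRS` is a hand-written mirror of the declaration above, and the function name `effective_attributes` is hypothetical.

```python
REQUIRED, IMPLIED = "#REQUIRED", "#IMPLIED"

# Hand-written mirror of the ATTLIST for height: name -> (type, default)
HEIGHT_ATTRS = {
    "dimension": (("cm", "in"), REQUIRED),   # enumerated type
    "accuracy":  ("CDATA", IMPLIED),
    "resizable": ("CDATA", "yes"),
}

def effective_attributes(declared, given):
    """Validate the given attributes and fill in declared default values."""
    for name in given:
        if name not in declared:
            raise ValueError(f"undeclared attribute {name}")
    result = dict(given)
    for name, (attr_type, default) in declared.items():
        if name in result:
            # Enumerated types restrict the value to the listed alternatives.
            if isinstance(attr_type, tuple) and result[name] not in attr_type:
                raise ValueError(f"{name} must be one of {attr_type}")
        elif default == REQUIRED:
            raise ValueError(f"missing required attribute {name}")
        elif default != IMPLIED:
            result[name] = default           # the declared default is supplied
    return result

print(effective_attributes(HEIGHT_ATTRS, {"dimension": "cm"}))
# {'dimension': 'cm', 'resizable': 'yes'}
```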
32
32 Specifying ID and IDREF Attributes <!DOCTYPE family [ <!ATTLIST person id ID #REQUIRED mother IDREF #IMPLIED father IDREF #IMPLIED children IDREFS #IMPLIED> ]>
33
33 Specifying ID and IDREF Attributes (cont.) The attributes mother and father are references to IDs of other elements. However, those are not necessarily person elements! The mother attribute is not necessarily a reference to a female person. References to IDs have no type.
34
34 Some Conforming Data Lisa Simpson Bart Simpson Marge Simpson Homer Simpson
35
35 Consistency of ID and IDREF Attribute Values If an attribute is declared as ID –the associated values must all be distinct (no confusion) –in other words, no two ID attributes can have the same value. If an attribute is declared as IDREF –the associated value must exist as the value of some ID attribute (no dangling “pointers”). Similarly for all the values of an IDREFS attribute.
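Both rules are easy to check once the attribute types are known. The sketch below uses Python's standard `xml.etree.ElementTree`, which does not read attribute types from a DTD, so the types are listed by hand to mirror the family declaration. The sample documents, including all id values, are invented for illustration and are not taken from the slides.

```python
import xml.etree.ElementTree as ET

# Attribute types, listed by hand to mirror the ATTLIST for person (assumption:
# the types apply by attribute name, regardless of element).
ID_ATTRS = {"id"}
IDREF_ATTRS = {"mother", "father"}
IDREFS_ATTRS = {"children"}

def reference_errors(xml_text):
    """Report duplicate ID values and references to IDs that do not exist."""
    root = ET.fromstring(xml_text)
    errors, ids, refs = [], set(), []
    for elem in root.iter():
        for name, value in elem.attrib.items():
            if name in ID_ATTRS:
                if value in ids:
                    errors.append(f"duplicate ID {value!r}")
                ids.add(value)
            elif name in IDREF_ATTRS:
                refs.append(value)
            elif name in IDREFS_ATTRS:
                refs.extend(value.split())   # IDREFS is a whitespace-separated list
    errors += [f"dangling reference {ref!r}" for ref in refs if ref not in ids]
    return errors

FAMILY = """
<family>
  <person id="lisa" mother="marge" father="homer">Lisa Simpson</person>
  <person id="bart" mother="marge" father="homer">Bart Simpson</person>
  <person id="marge" children="bart lisa">Marge Simpson</person>
  <person id="homer" children="bart lisa">Homer Simpson</person>
</family>
"""

BROKEN = """
<family>
  <person id="clark" father="banana">Clark Kent</person>
  <person id="clark">Linda Lee</person>
</family>
"""

print(reference_errors(FAMILY))   # []
print(reference_errors(BROKEN))   # ["duplicate ID 'clark'", "dangling reference 'banana'"]
```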
36
36 Is This Legal? Clark Kent Linda Lee
37
37 Is This Legal? Clark Kent Linda Lee Banana
38
38 Adding a DTD to the Document A DTD can be internal –The DTD is part of the document file or external –The DTD and the document are in separate files. An external DTD may reside –In the local file system (where the document is) –In a remote file system (by using a URL)
39
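39 Connecting a Document with its DTD An internal DTD: <!DOCTYPE root-element [ … ]> ... A DTD from the local file system: <!DOCTYPE root-element SYSTEM "file-name.dtd"> A DTD from a remote file system: <!DOCTYPE root-element SYSTEM "URL-of-the-DTD"> (root-element, file-name.dtd and URL-of-the-DTD are placeholders)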
40
40 Valid Documents A document with a DTD is valid if it conforms to the DTD, i.e., –the document conforms to the regular-expression grammar, –types of attributes are correct, and –constraints on references are satisfied
41
41 DTD Issues
42
42 DTD Problems (1) DTDs are rather weak specifications by DB & programming-language standards –Only one base type – PCDATA –No useful “abstractions”, e.g., sets –IDREFs are untyped –No constraints, e.g., child is inverse of parent –Tag definitions are global –Not easily parsed (since they are not XML) Some extensions of XML impose a schema or types on an XML document, e.g., XML Schema
43
43 DTD Problems (2) How would you say that element a has exactly the children c, d, e in any order? In general, can such definitions be written efficiently?
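One way to see the difficulty is to generate such a definition and watch it grow. The sketch below builds a content model that accepts each name exactly once in any order by choosing the first child and recursing on the rest, which keeps every choice point decided by the next element name. The function name `any_order` is hypothetical, and other encodings are possible.

```python
import re

def any_order(names):
    """Content model accepting each of the names exactly once, in any order."""
    if len(names) == 1:
        return names[0]
    branches = []
    for i, first in enumerate(names):
        rest = names[:i] + names[i + 1:]
        tail = any_order(rest)
        if len(rest) > 1:
            tail = f"({tail})"      # group the alternatives for the remaining names
        branches.append(f"({first}, {tail})")
    return " | ".join(branches)

print(any_order(["c", "d", "e"]))
# (c, ((d, e) | (e, d))) | (d, ((c, e) | (e, c))) | (e, ((c, d) | (d, c)))

# Number of element-name occurrences in the generated model for n = 1..6
sizes = []
for n in range(1, 7):
    model = any_order([f"x{i}" for i in range(n)])
    sizes.append(len(re.findall(r"\w+", model)))
print(sizes)   # [1, 4, 15, 64, 325, 1956]
```

The size grows faster than exponentially in the number of children, which suggests that this style of definition does not scale.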
44
44 Be Careful (1) <!DOCTYPE genealogy [ <!ELEMENT person (name, dateOfBirth, person, person)> <!-- first person: mother, second person: father --> ... ]> What is the problem with this?
45
45 Be Careful (2) <!DOCTYPE genealogy [ <!ELEMENT person (name, dateOfBirth, person?, person?)> <!-- first person: mother, second person: father --> ... ]> What is now the problem with this?