1 Document Type Descriptors (DTDs) Imposing Structure on XML Documents.



2 Document Type Descriptors
Document Type Descriptors (DTDs), more commonly known as Document Type Definitions, impose structure on an XML document.
Using DTDs, we can specify what a “valid” document should contain.
These specifications require more than just being well-formed, e.g., which elements are legal and what nesting is allowed.
DTDs have limited expressive power, e.g., they cannot specify data types for element content.

3 What is it good for?
DTDs can be used to define special languages of XML, i.e., restricted XML for special needs.
Examples:
–FOAF
–SVG (scalable vector graphics)
–WML (a kind of HTML for wireless devices)
–SOAP (for web services)
–XHTML (well-formed version of HTML)
Standards for data exchange can be defined using DTDs, and special applications can be written for them.

4 Address Book DTD
Suppose we want to create a DTD that describes legal address book entries.
This DTD will be used to exchange address book information between programs.
How should it be written? (What is a legal address?)
We discuss both element definitions and attribute definitions.

5 Element Definitions

6 Example: An Address Book
Homer Simpson
Dr. H. Simpson
1234 Springwater Road
Springfield USA,
(321)
(321)
Annotations on the entry:
–Exactly one name
–At most one greeting
–As many address lines as needed
–Mixed telephones and faxes
–At least one of the final element

7 Specifying the Structure
How do we specify exactly what must appear in a person element?
In a DTD, we can specify the permitted content for each element.
The permitted content is specified as a regular expression.
We show the general syntax, and then an example.

8
a                      Element a
e1?                    0 or 1 occurrences of expression e1
e1*                    0 or more occurrences of expression e1
e1+                    1 or more occurrences of expression e1
e1,e2                  Expression e2 after expression e1
e1|e2                  Either expression e1 or expression e2 (but not both!)
(e)                    Grouping
#PCDATA                Parsed character data (i.e., text)
EMPTY                  No content
ANY                    Any content
(#PCDATA|a1|..|an)*    Mixed content

9 What’s in a person Element?
The expression is:
–name, greet?, addr*, (tel | fax)*, +
We discuss what each part of this means:
–name = there must be a name element
–greet? = there is an optional greet element (i.e., 0 or 1 greet elements)
–name, greet? = the name element is followed by an optional greet element

10 What’s in a person Element? (cont.)
addr* = there are 0 or more address elements
tel | fax = there is a tel or a fax element
(tel | fax)* = there are 0 or more repeats of tel or fax
+ = there are 1 or more elements
The full expression: name, greet?, addr*, (tel | fax)*, +
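One way to see what the person expression accepts is to translate it into an ordinary regular expression over child element names. The sketch below is only an illustration in Python, not a DTD validator. The expression leaves its last element unnamed, so the name email is an assumption made for the example, and children_ok is a hypothetical helper.

```python
import re

# Content model "name, greet?, addr*, (tel | fax)*, email+" as a Python regex.
# Every child element name is followed by one space so names cannot run
# together. "email" is an assumed name for the final element.
PERSON_MODEL = re.compile(r"name (greet )?(addr )*((tel|fax) )*(email )+")

def children_ok(child_names):
    """Return True if the sequence of child element names fits the model."""
    encoded = "".join(name + " " for name in child_names)
    return PERSON_MODEL.fullmatch(encoded) is not None

# A full entry with mixed telephones and faxes.
full_entry = ["name", "greet", "addr", "addr", "tel", "fax", "tel", "email"]
print(children_ok(full_entry))
# Two greetings are not allowed.
print(children_ok(["name", "greet", "greet", "email"]))
```

The commas of the DTD notation become plain concatenation in the regex, while ?, * and + keep their meaning.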

11 What’s in a person Element? (cont.)
Does this expression differ from:
–name, greet?, addr*, tel*, fax*, +
–name, greet?, addr*, (fax | tel)*, +
–name, greet?, addr*, (fax | tel)*, , *
–name, greet?, addr*, (fax | tel)*, *,
The expression: name, greet?, addr*, (tel | fax)*, +

12 DTD For the Address Book <!DOCTYPE addressbook [ <!ELEMENT person (name, greet?, address*, (fax | tel)*, +)> ]>

13 Example
Requirements:
–Every country must have a name as the first node.
–Every country must have a capital city as the following node.
–A country may have a king.
–A country may have a queen.
What is wrong with the following:
–

14 Unambiguity
A DTD must be 1-unambiguous, i.e., at any moment when parsing a document it must be clear which point of the regular expression we are at.
Which of the following is 1-unambiguous?
–(a,b)|(a,c)
–a,(b|c)
We now formalize these ideas…

15 Languages
An element definition defines a language, i.e., the set of all legal series of children.
Example: Which of the following are in the language defined by a*,(b|c),a+?
–aba
–abca
–aab
–aaacaaa
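Membership in such a language can be checked with any regular-expression engine. A minimal sketch in Python, assuming each child element is written as a single letter (in_language is a hypothetical helper name):

```python
import re

# a*,(b|c),a+ in DTD notation; in Python regex syntax the commas disappear.
LANGUAGE = re.compile(r"a*(b|c)a+")

def in_language(word):
    """Return True if the whole word belongs to the language."""
    return LANGUAGE.fullmatch(word) is not None

# Check the four candidate words from the slide.
results = {w: in_language(w) for w in ["aba", "abca", "aab", "aaacaaa"]}
print(results)
```

fullmatch is used rather than search because the entire series of children has to match, not just a part of it.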

16 Automata
Languages can also be defined using an automaton.
An automaton is:
–a set of states Q
–an alphabet Σ
–a transition function δ, which associates a pair (q,a) with a state q’
–an initial state q0
–a set of accepting states F
A word a1…an is in the language defined by an automaton if there is a path from q0 to a state in F with edges labeled a1,…,an.
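The definition can be turned into a few lines of code. The sketch below is an illustration only: the automaton in TRANSITIONS is a made-up example (it accepts an a followed by any number of b's) and is not one of the automata drawn in the slides. Transitions map a (state, letter) pair to a set of states, so the same function also covers automata with several transitions on one letter.

```python
def accepts(transitions, start, accepting, word):
    """Return True if some path labeled with the word ends in an accepting state."""
    current = {start}
    for letter in word:
        # Follow every transition labeled with this letter from every current state.
        current = {nxt
                   for state in current
                   for nxt in transitions.get((state, letter), set())}
    return bool(current & accepting)

# Hypothetical automaton: q0 --a--> q1, q1 --b--> q1, accepting state q1.
TRANSITIONS = {("q0", "a"): {"q1"}, ("q1", "b"): {"q1"}}
ACCEPTING = {"q1"}

print(accepts(TRANSITIONS, "q0", ACCEPTING, "abb"))
print(accepts(TRANSITIONS, "q0", ACCEPTING, "ba"))
```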

17 Automata Example: What Language Does this Define?
[Diagram: states q0, q1, q2; edges labeled a, a, b]

18 Automata Example: What Language Does this Define?
[Diagram: states q0, q1, q2, q3; edges labeled a, a, b, b, c]

19 Automata Example: What Language Does this Define?
[Diagram: states q0, q1, q2, q3; edges labeled a, b, b, b, c]
Note that this automaton is non-deterministic!

20 Non-Deterministic Automata
An automaton is non-deterministic if there is a state q and a letter a such that there are at least two transitions from q via edges labeled with a.
–What words are in the language of a non-deterministic automaton?
We now show how to create a Glushkov automaton from a regular expression.

21 Creating an automaton from an element definition
Expression: a*,(b|c),a+
Step 1: Normalize the expression by replacing any occurrence of an expression e+ with e,e*
  a*,(b|c),a,a*
Step 2: Use subscripts to number each occurrence of each letter
  a1*,(b1|c1),a2,a3*

22 Creating an automaton from an element definition
Expression: a1*,(b1|c1),a2,a3*
Step 3: Create a state for each subscripted letter, and a state q0
Step 4: Choose as accepting states all subscripted letters with which it is possible to end a word
[Diagram: states q0, a1, b1, c1, a2, a3]

23 Creating an automaton from an element definition
Expression: a1*,(b1|c1),a2,a3*
Step 5: Create a transition from a state l_i to a state k_j if there is a word in which k_j follows l_i, and from q0 to k_j if a word can begin with k_j. Label the transition with k
[Diagram: states q0, a1, b1, c1, a2, a3]
You fill in the transitions!

24 1-unambiguous
A language is 1-unambiguous if its Glushkov automaton is deterministic.
–otherwise it is 1-ambiguous
–element definitions in a DTD must be 1-unambiguous!
Examples: Create a Glushkov automaton for each of the following and check whether the corresponding languages are 1-unambiguous
–(a,b)|(a,c)
–a,(b|c)
–a?, d+, b*, d*, (c|b)+
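The construction steps can be sketched in code. This is a hedged illustration, not a reference implementation. Expressions are built by hand with small hypothetical helpers (sym, seq, alt, star, opt, plus) instead of being parsed, and occurrences are numbered 1, 2, 3, ... across the whole expression rather than per letter. The check is applied to the expression as written: q0 has a transition to every position in first, each position p has transitions to follow[p], and the automaton is deterministic when no such target set contains the same letter twice.

```python
from itertools import count

# Tiny expression constructors (hypothetical helpers for this sketch).
def sym(letter): return ("sym", letter)
def seq(*parts): return ("seq",) + parts
def alt(*parts): return ("alt",) + parts
def star(e): return ("star", e)
def opt(e): return ("opt", e)
def plus(e): return seq(e, star(e))  # Step 1: e+ becomes e,e*

def glushkov(expr):
    """Return (letters, first, last, follow, nullable) for an expression."""
    letters, follow, numbering = {}, {}, count(1)

    def visit(node):
        kind = node[0]
        if kind == "sym":                      # Step 2: number each occurrence
            pos = next(numbering)
            letters[pos], follow[pos] = node[1], set()
            return False, {pos}, {pos}
        if kind in ("star", "opt"):
            _, first, last = visit(node[1])
            if kind == "star":                 # a repetition may restart after its end
                for pos in last:
                    follow[pos] |= first
            return True, first, last
        parts = [visit(child) for child in node[1:]]
        if kind == "alt":
            return (any(p[0] for p in parts),
                    set().union(*(p[1] for p in parts)),
                    set().union(*(p[2] for p in parts)))
        nullable, first, last = parts[0]       # kind == "seq"
        for nullable2, first2, last2 in parts[1:]:
            for pos in last:                   # Step 5: what may follow what
                follow[pos] |= first2
            first = (first | first2) if nullable else first
            last = (last | last2) if nullable2 else last2
            nullable = nullable and nullable2
        return nullable, first, last

    nullable, first, last = visit(expr)
    return letters, first, last, follow, nullable

def is_one_unambiguous(expr):
    """True if no state has two outgoing transitions with the same letter."""
    letters, first, _, follow, _ = glushkov(expr)
    for targets in [first, *follow.values()]:
        labels = [letters[pos] for pos in targets]
        if len(labels) != len(set(labels)):
            return False
    return True

a, b, c = sym("a"), sym("b"), sym("c")
print(is_one_unambiguous(alt(seq(a, b), seq(a, c))))   # (a,b)|(a,c)
print(is_one_unambiguous(seq(a, alt(b, c))))           # a,(b|c)
```

The accepting states of Step 4 are the positions in last, plus q0 when the expression can match the empty word (nullable).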

25 Ambiguous Example
Replace the following with a 1-unambiguous equivalent expression:
<!ELEMENT country (president | king | (king,queen) | queen)>

26 Another Example
Customers may pay with a combination of credit cards and cash.
–If cards and cash are both used, the cards must come first.
–There may be more than one card.
–There may be no more than one cash element.
–At least one method of payment must be used.
Find a 1-unambiguous definition for the element payment, using the elements card and cash.

27 Attribute Definitions

28 More DTD Syntax
XML documents can have elements, which can have attributes. How are they defined?
General Syntax:
<!ATTLIST element-name
  attribute-name1 type1 default-value1
  attribute-name2 type2 default-value2
  ….
  attribute-namen typen default-valuen>
Example:

29
<!ATTLIST element-name
  attribute-name1 type1 default-value1
  attribute-name2 type2 default-value2
  ….
  attribute-namen typen default-valuen>
type is one of the following (there are additional possibilities that we don’t discuss):
CDATA           character data
(en1|en2|..)    value must be one from the given list
ID              value is a unique id
IDREF           value is the id of another element
IDREFS          value is a list of other ids

30
<!ATTLIST element-name
  attribute-name1 type1 default-value1
  attribute-name2 type2 default-value2
  ….
  attribute-namen typen default-valuen>
default-value is one of the following:
value        The default value of the attribute
#REQUIRED    The attribute value must be included in the element
#IMPLIED     The attribute does not have to be included

31 Examples
<!ATTLIST height
  dimension (cm | in) #REQUIRED
  accuracy CDATA #IMPLIED
  resizable CDATA "yes" >
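To make the three kinds of default concrete, the sketch below re-states the height declaration as plain Python data and applies it to a dictionary of attributes. It is an illustration of the rules only, not how an XML parser represents a DTD; HEIGHT_ATTLIST and apply_attlist are hypothetical names.

```python
# attribute name -> (allowed values, or None for CDATA; default specification)
HEIGHT_ATTLIST = {
    "dimension": (("cm", "in"), "#REQUIRED"),
    "accuracy": (None, "#IMPLIED"),
    "resizable": (None, "yes"),
}

def apply_attlist(declared, given):
    """Check the given attributes and fill in declared default values."""
    result = dict(given)
    for name in result:
        if name not in declared:
            raise ValueError(f"undeclared attribute {name!r}")
    for name, (allowed, default) in declared.items():
        if name not in result:
            if default == "#REQUIRED":
                raise ValueError(f"missing required attribute {name!r}")
            if default != "#IMPLIED":
                result[name] = default         # a literal default value
        elif allowed is not None and result[name] not in allowed:
            raise ValueError(f"{name!r} must be one of {allowed}")
    return result

# dimension is given, accuracy is optional, resizable gets its default.
print(apply_attlist(HEIGHT_ATTLIST, {"dimension": "cm"}))
```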

32 Specifying ID and IDREF Attributes <!DOCTYPE family [ <!ATTLIST person id ID #REQUIRED mother IDREF #IMPLIED father IDREF #IMPLIED children IDREFS #IMPLIED> ]>

33 Specifying ID and IDREF Attributes (cont.)
The attributes mother and father are references to IDs of other elements.
However, those are not necessarily person elements!
The mother attribute is not necessarily a reference to a female person.
References to IDs have no type.

34 Some Conforming Data Lisa Simpson Bart Simpson Marge Simpson Homer Simpson

35 Consistency of ID and IDREF Attribute Values
If an attribute is declared as ID
–the associated values must all be distinct (no confusion)
–in other words, no two ID attributes can have the same value
If an attribute is declared as IDREF
–the associated value must exist as the value of some ID attribute (no dangling “pointers”)
Similarly for all the values of an IDREFS attribute
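Both consistency rules are easy to check by hand in code. The sketch below uses Python's standard xml.etree.ElementTree, which does not read DTDs, so the attribute types from the person ATTLIST (id as ID, mother and father as IDREF, children as IDREFS) are hard-coded. The sample document and its id values are made up for illustration, and check_references is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

ID_ATTR = "id"                      # declared as ID
IDREF_ATTRS = ("mother", "father")  # declared as IDREF
IDREFS_ATTRS = ("children",)        # declared as IDREFS (space-separated list)

def check_references(xml_text):
    """Return a list of problems: duplicate IDs and dangling references."""
    root = ET.fromstring(xml_text)
    problems, seen = [], set()
    for elem in root.iter():
        value = elem.get(ID_ATTR)
        if value is None:
            continue
        if value in seen:
            problems.append(f"duplicate ID {value!r}")
        seen.add(value)
    for elem in root.iter():
        refs = [elem.get(a) for a in IDREF_ATTRS if elem.get(a) is not None]
        for attr in IDREFS_ATTRS:
            refs.extend((elem.get(attr) or "").split())
        for ref in refs:
            if ref not in seen:
                problems.append(f"dangling reference {ref!r}")
    return problems

# Hypothetical family document; the id values are invented for this example.
GOOD = """<family>
  <person id="lisa" mother="marge" father="homer"/>
  <person id="bart" mother="marge" father="homer"/>
  <person id="marge" children="lisa bart"/>
  <person id="homer" children="lisa bart"/>
</family>"""

print(check_references(GOOD))
```

Note that the check only asks whether a referenced ID exists somewhere in the document. It does not look at which element carries it, which matches the point that references to IDs have no type.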

36 Is This Legal? Clark Kent Linda Lee

37 Is This Legal? Clark Kent Linda Lee Banana

38 Adding a DTD to the Document
A DTD can be internal
–The DTD is part of the document file
or external
–The DTD and the document are in separate files
An external DTD may reside
–In the local file system (where the document is)
–In a remote file system (by using a URL)

39 Connecting a Document with its DTD An internal DTD: … ]>... A DTD from the local file system: A DTD from a remote file system:
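As a hedged illustration of the three cases, the sketch below builds one small document per case and reads the document type declaration back with Python's standard xml.dom.minidom. All names are hypothetical and do not come from the slides: the root element addressbook, the file name addressbook.dtd and the URL http://example.org/addressbook.dtd. The parser only records where the DTD is; it does not fetch or validate against it.

```python
from xml.dom import minidom

# Internal DTD: the declarations sit between [ and ] inside the document.
INTERNAL = """<!DOCTYPE addressbook [
  <!ELEMENT addressbook (#PCDATA)>
]>
<addressbook/>"""

# External DTD in the local file system (hypothetical file name).
LOCAL = '<!DOCTYPE addressbook SYSTEM "addressbook.dtd"><addressbook/>'

# External DTD on a remote system, given by a URL (hypothetical URL).
REMOTE = ('<!DOCTYPE addressbook SYSTEM "http://example.org/addressbook.dtd">'
          '<addressbook/>')

def dtd_location(xml_text):
    """Return (root element name, system identifier) from the DOCTYPE."""
    doctype = minidom.parseString(xml_text).doctype
    return doctype.name, doctype.systemId

for text in (INTERNAL, LOCAL, REMOTE):
    print(dtd_location(text))
```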

40 Valid Documents
A document with a DTD is valid if it conforms to the DTD, i.e.,
–the document conforms to the regular-expression grammar,
–types of attributes are correct, and
–constraints on references are satisfied

41 DTD Issues

42 DTD Problems (1)
DTDs are rather weak specifications by DB & programming-language standards:
–Only one base type: PCDATA
–No useful “abstractions”, e.g., sets
–IDREFs are untyped
–No constraints, e.g., child is inverse of parent
–Tag definitions are global
–Not easily parsed (since they are not XML)
Some extensions of XML impose a schema or types on an XML document, e.g., XML Schema.

43 DTD Problems (2) How would you say that element a has exactly the children c, d, e in any order? In general, can such definitions be written efficiently?
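One possible way to write such a definition is to branch on the first child and then recurse on the remaining ones, which keeps every choice distinguishable by its first element. The sketch below generates that content model in Python and checks it by turning it into a regular expression (valid here only because each element name is a single letter). any_order is a hypothetical helper, and this is one construction among others, not necessarily the intended answer.

```python
import re
from itertools import permutations

def any_order(names):
    """Content model in which each name occurs exactly once, in any order."""
    if len(names) == 1:
        return names[0]
    branches = []
    for i, first in enumerate(names):
        rest = names[:i] + names[i + 1:]
        branches.append(f"({first},{any_order(rest)})")
    return "(" + "|".join(branches) + ")"

MODEL = any_order(["c", "d", "e"])
print(f"<!ELEMENT a {MODEL}>")

# Sanity check: dropping the commas gives an equivalent Python regex.
pattern = re.compile(MODEL.replace(",", ""))
print(all(pattern.fullmatch("".join(p)) for p in permutations("cde")))
```

Every ordering gets its own path through the model, so the size of the expression grows at least as fast as n! for n children. This suggests that the answer to the efficiency question is no, at least for this construction.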

44 Be Careful (1)
<!DOCTYPE genealogy [
<!ELEMENT person (
  name,
  dateOfBirth,
  person,    -- mother
  person )>  -- father
...
]>
What is the problem with this?

45 Be Careful (2)
<!DOCTYPE genealogy [
<!ELEMENT person (
  name,
  dateOfBirth,
  person?,    -- mother
  person? )>  -- father
...
]>
What is now the problem with this?