IS432 Semi-Structured Data Lecture 2: DTD Dr. Gamal Al-Shorbagy.

Slides:



Advertisements
Similar presentations
Defining XML The Document Type Definition. Document Type Definition text syntax for defining –elements of XML –attributes (and possibly default values)
Advertisements

An Introduction to XML Based on the W3C XML Recommendations.
1 DTD (Document Type Definition) Imposing Structure on XML Documents (W3Schools on DTDs)W3Schools on DTDs.
XML 6.3 DTD 6. XML and DTDs A DTD (Document Type Definition) describes the structure of one or more XML documents. Specifically, a DTD describes:  Elements.
XML Document Type Definitions ( DTD ). 1.Introduction to DTD An XML document may have an optional DTD, which defines the document’s grammar. Since the.
XML Study-Session: Part II Validating XML Documents.
XML Language Family Detailed Examples Most information contained in these slide comes from: These slides are intended.
Document Type Definition DTDs CS-328. What is a DTD Defines the structure of an XML document Only the elements defined in a DTD can be used in an XML.
Document Type Definitions
Introduction to XLink Transparency No. 1 XML Information Set W3C Recommendation 24 October 2001 (1stEdition) 4 February 2004 (2ndEdition) Cheng-Chia Chen.
 2002 Prentice Hall, Inc. All rights reserved. ISQA 407 XML/WML Winter 2002 Dr. Sergio Davalos.
Creating a Well-Formed Valid Document. 2 Objectives Introducing XHTML Creating a Well-Formed Document Creating a Valid Document Creating an XHTML Document.
Full declaration When an element is declared to have element content, the children element types must also be declared Example: to which the following.
26-Jun-15 XML. 2 HTML and XML, I XML stands for eXtensible Markup Language HTML is used to mark up text so it can be displayed to users XML is used to.
Declare A DTD File. Declare A DTD Inline File For example, use DTD to restrict the value of an XML document to contain only character data.
COS 381 Day 14. Agenda Questions?? Resources Source Code Available for examples in Text Book in Blackboard
Document Type Definitions. XML and DTDs A DTD (Document Type Definition) describes the structure of one or more XML documents. Specifically, a DTD describes:
VALIDATING AN XML DOCUMENT
Introduction to XML This material is based heavily on the tutorial by the same name at
Tutorial 3: XML Creating a Valid XML Document. 2 Creating a Valid Document You validate documents to make certain necessary elements are never omitted.
XP New Perspectives on XML Tutorial 3 1 DTD Tutorial – Carey ISBN
Validating DOCUMENTS with DTDs
VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation Document Type Definition.
Copyright © 2003 Pearson Education, Inc. Slide 3-1 Created by Cheryl M. Hughes, Harvard University Extension School — Cambridge, MA The Web Wizard’s Guide.
Introduction to XML cs3505. References –I got most of this presentation from this site –O’reilly tutorials.
XML CPSC 315 – Programming Studio Fall 2008 Project 3, Lecture 1.
Document Type Definitions Kanda Runapongsa Dept. of Computer Engineering Khon Kaen University.
XML 1 Enterprise Applications CE00465-M XML. 2 Enterprise Applications CE00465-M XML Overview Extensible Mark-up Language (XML) is a meta-language that.
August Chapter 2 - Markup and Core Concepts Learning XML by Erik T. Ray Slides were developed by Jack Davis College of Information Science and Technology.
XML Syntax - Writing XML and Designing DTD's
XP 1 DECLARING A DTD A DTD can be used to: –Ensure all required elements are present in the document –Prevent undefined elements from being used –Enforce.
XML (2) DTD Sungchul Hong.
1 Tutorial 13 Validating Documents with DTDs Working with Document Type Definitions.
Avoid using attributes? Some of the problems using attributes: Attributes cannot contain multiple values (child elements can) Attributes are not easily.
 2002 Prentice Hall, Inc. All rights reserved. Chapter 6 – Document Type Definition (DTD) Outline 6.1Introduction 6.2Parsers, Well-formed and Valid XML.
Lecture 6 XML DTD Content of.xml fileContent of.dtd file.
1 Chapter 10: XML What is XML What is XML Basic Components of XML Basic Components of XML XPath XPath XQuery XQuery.
Softsmith Infotech XML. Softsmith Infotech XML EXtensible Markup Language XML is a markup language much like HTML Designed to carry data, not to display.
Of 33 lecture 3: xml and xml schema. of 33 XML, RDF, RDF Schema overview XML – simple introduction and XML Schema RDF – basics, language RDF Schema –
XML - DTD Week 4 Anthony Borquez. What can XML do? provides an application independent way of sharing data. independent groups of people can agree to.
New Perspectives on XML, 2nd Edition
IS432: Semi-Structured Data Dr. Azeddine Chikh. 4. Document Type Definitions (DTDs)
XML Documents Chao-Hsien Chu, Ph.D. School of Information Sciences and Technology The Pennsylvania State University Elements Attributes Comments PI Document.
Introduction to XML This presentation covers introductory features of XML. What XML is and what it is not? What does it do? Put different related technologies.
XML Instructor: Charles Moen CSCI/CINF XML  Extensible Markup Language  A set of rules that allow you to create your own markup language  Designed.
Lecture 16 Introduction to XML Boriana Koleva Room: C54
1 Introduction to XML XML stands for Extensible Markup Language. Because it is extensible, XML has been used to create a wide variety of different markup.
An Introduction to XML Sandeep Bhattaram
XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like.
1 Dr Alexiei Dingli XML Technologies DTD. 2 Document Type Definition Defines –the legal building blocks of an XML document –the document structure –The.
The eXtensible Markup Language (XML). Presentation Outline Part 1: The basics of creating an XML document Part 2: Developing constraints for a well formed.
Tutorial 13 Validating Documents with Schemas
INFSY 547: WEB-Based Technologies Gayle J Yaverbaum, PhD Professor of Information Systems Penn State Harrisburg.
SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.
Internet & World Wide Web How to Program, 5/e. © by Pearson Education, Inc. All Rights Reserved.2.
COMP9321 Web Application Engineering Semester 2, 2015 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 4 1COMP9321, 15s2, Week.
Well Formed XML The basics. A Simple XML Document Smith Alice.
Introduction to DTD A Document Type Definition (DTD) defines the legal building blocks of an XML document. It defines the document structure with a list.
Document Type Definition (DTD) Eugenia Fernandez IUPUI.
DTD Document Type Definition. Agenda Introduction to DTD DTD Building Blocks DTD Elements DTD Attributes DTD Entities DTD Exercises DTD Q&A.
Jackson, Web Technologies: A Computer Science Perspective, © 2007 Prentice-Hall, Inc. All rights reserved Chapter 7 Representing Web Data:
XML Introduction to XML Extensible Markup Language.
CITA 330 Section 2 DTD. Defining XML Dialects “Well-formedness” is the minimal requirement for an XML document; all XML parsers can check it Any useful.
Unit 4 Representing Web Data: XML
Session III Chapter 6 – Creating DTDs
Chapter 7 Representing Web Data: XML
Web Programming Maymester 2004
New Perspectives on XML
Session II Chapter 6 – Creating DTDs
Document Type Definition (DTD)
Presentation transcript:

IS432 Semi-Structured Data Lecture 2: DTD Dr. Gamal Al-Shorbagy

You will learn DTD Syntax (Recap) Linking XML & DTD Validation of DTD Limitations of DTD

XML and DTDs A DTD (Document Type Definition) describes the structure of one or more XML documents. Specifically, a DTD describes: –Elements –Attributes, and –Entities –(We will discuss each of these in turn) An XML document is well-structured if it follows certain simple syntactic rules An XML document is valid if it also specifies and conforms to a DTD

Why DTDs? XML documents are designed to be processed by computer programs –If you can put just any tags in an XML document, it’s very hard to write a program that knows how to process the tags –A DTD specifies what tags may occur, when they may occur, and what attributes they may (or must) have A DTD allows the XML document to be verified (shown to be legal) A DTD that is shared across groups allows the groups to produce consistent XML documents

Parsers An XML parser is an API that reads the content of an XML document –Currently popular APIs are DOM (Document Object Model) and SAX (Simple API for XML) A validating parser is an XML parser that compares the XML document to a DTD and reports any errors –Most browsers don’t use validating parsers

An XML example This is the great American novel. It was a dark and stormy night. Suddenly, a shot rang out! An XML document contains (and the DTD describes): Elements, such as novel and paragraph, consisting of tags and content Attributes, such as number="1", consisting of a name and a value Entities (not used in this example)

A DTD example ]> A novel consists of a foreword and one or more chapter s, in that order –Each chapter must have a number attribute A foreword consists of one or more paragraph s A chapter also consists of one or more paragraph s A paragraph consists of parsed character data (text that cannot contain any other elements)

ELEMENT descriptions Suffixes: ? optional foreword? + one or more chapter+ * zero or more appendix* Separators, both, in order foreword?, chapter+ | or section|chapter Grouping ( ) grouping (section|chapter)+

Elements without children The syntax is –The name is the element name used in start and end tags –The category may be EMPTY : In the DTD: In the XML: or just –In the XML, an empty element may not have any content between the start tag and the end tag –An empty element may (and usually does) have attributes

Elements with unstructured children The syntax is –The category may be ANY This indicates that any content--character data, elements, even undeclared elements--may be used Since the whole point of using a DTD is to define the structure of a document, ANY should be avoided wherever possible –The category may be (#PCDATA), indicating that only character data may be used In the DTD: In the XML: A shot rang out! The parentheses are required! Note: In (#PCDATA), whitespace is kept exactly as entered Elements may not be used within parsed character data Entities are character data, and may be used

Elements with children A category may describe one or more children: –Parentheses are required, even if there is only one child –A space must precede the opening parenthesis –Commas (, ) between elements mean that all children must appear, and must be in the order specified –“ | ” separators means any one child may be used –All child elements must themselves be declared –Children may have children –Parentheses can be used for grouping:

Elements with mixed content #PCDATA describes elements with only character data #PCDATA can be used in an “or” grouping: – –This is called mixed content –Certain (rather severe) restrictions apply: #PCDATA must be first The separators must be “ | ” The group must be starred (meaning zero or more)

Names and namespaces All names of elements, attributes, and entities, in both the DTD and the XML, are formed as follows: –The name must begin with a letter or underscore –The name may contain only letters, digits, dots, hyphens, underscores, and colons (and, for foreign languages, combining characters and extenders) The DTD doesn’t know about namespaces--as far as it knows, a colon is just part of a name –The following are different (and both legal): –Avoid colons in names, except to indicate namespaces

An expanded DTD example ]>

Attributes and entities In addition to elements, a DTD may declare attributes and entities –This slide shows examples; we will discuss each in detail An attribute describes information that can be put within the start tag of an element –In XML: –In DTD: An entity describes text to be substituted –In XML: &copyright; In the DTD:

Attributes The format of an attribute is: where the name-type-requirement may be repeated as many times as desired –Note that only spaces separate the parts, so careful counting is essential –The element-name tells which element may have these attributes –The name is the name of the attribute –Each element has a type, such as CDATA (character data) –Each element may be required, optional, or “fixed” –In the XML, attributes may occur in any order

Important attribute types There are ten attribute types These are the most important ones: –CDATA The value is character data –(man|woman|child) The value is one from this list –ID The value is a unique identifier ID values must be legal XML names and must be unique within the document –NMTOKEN The value is a legal XML name This is sometimes used to disallow whitespace in the name It also disallows numbers, since an XML name cannot begin with a digit

Less important attribute types IDREF The ID of another element IDREFS A list of other IDs NMTOKENS A list of valid XML names ENTITY An entity ENTITIES A list of entities NOTATION A notation xml: A predefined XML value

Requirements Recall that an attribute has the form The requirement is one of: –A default value, enclosed in quotes Example: –#REQUIRED The attribute must be present –#IMPLIED The attribute is optional –#FIXED "value" The attribute always has the given value If specified in the XML, the same value must be used

Example DTD | A message may contain either an or a letter At least 1 or more text elements 1 or NO subject elements At least 1 or more text elements sender AND recipient AND date in the same order Element name Element declaration

Assignment Pizza is an Italian fast food. Pizza Recipe includes various ingredients. Certain amount or units of each ingredient should be used. Preparation method must follow specific steps. Pizza recipes are available with helping comments. Nutrition value of a Pizza such as carbohydrates, fats, calories etc are also provided. Create a DTD model for Pizza recipes. Create a valid XML instance that for your DTD

Entities There are exactly five predefined entities: <, >, &, ", and &apos; Additional entities can be defined in the DTD: Entities can be defined in another document: Example of use in the XML: This document is &copyright; Entities are a way to include fixed text (sometimes called “boilerplate”) Entities should not be confused with character references, which are numerical values between & and # Example: &233#; or &xE9#; to indicate the character é

Another example: XML 05/29/2002 Philadelphia, PA USA 84 51

The DTD for this example

Inline DTDs If a DTD is used only by a single XML document, it can be put directly in that document: ]> An inline DTD can be used only by the document in which it occurs

External DTDs An external DTD (a DTD that is a separate document) is declared with a SYSTEM or a PUBLIC command: –The name that appears after DOCTYPE (in this example, myRootElement ) must match the name of the XML document’s root element –Use SYSTEM for external DTDs that you define yourself, and use PUBLIC for official, published DTDs –External DTDs can only be referenced with a URL The file extension for an external DTD is.dtd External DTDs are almost always preferable to inline DTDs, since they can be used by more than one document

Limitations of DTDs DTDs are a very weak specification language –You can’t put any restrictions on element contents –It’s difficult to specify: All the children must occur, but may be in any order This element must occur a certain number of times –There are only ten data types for attribute values But most of all: DTDs aren’t written in XML! –If you want to do any validation, you need one parser for the XML and another for the DTD –This makes XML parsing harder than it needs to be –There is a newer and more powerful technology: XML Schemas –However, DTDs are still very much in use

Limitations of DTDs Non-negative numbers –we cannot express that, e.g. protein, must contain a non-negative numberprotein Data dependencies –unit should only be allowed when amount is presentunit amount Arbitrary placement of data elements –The comment element should be allowed to appear anywherecomment

Validating XML Document An XML parser is an API that reads the content of an XML document –Currently popular APIs are DOM (Document Object Model) and SAX (Simple API for XML) A validating XML parser is an XML parser that compares the XML document to a DTD and reports any errors –Most browsers don’t use validating parsers

Validating XML Document Create a DTD –Universe of Discourse (Vocabulary + Grammar) DTD Syntax –Elements –Attributes

Validating XML Document Web browsers –generally do not validate documents but only check them for well-formed-ness. –IE5 and Opera 5 (Only internal DTDs) Write a program that uses validating parser –Using Parsers Online validators – – – spxhttps:// spx