DTD and XML Schema.



XML Document Type Definitions
Part of the original XML specification
An XML document may have a DTD
Terminology for XML:
  well-formed: if tags are correctly closed
  valid: if it has a DTD and conforms to it
Validation is useful in data exchange
Web Services: DTD+XML Schema

Very Simple DTD
    <!DOCTYPE company [
      <!ELEMENT company ((person|product)*)>
      <!ELEMENT person (ssn, name, office, phone?)>
      <!ELEMENT ssn (#PCDATA)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT office (#PCDATA)>
      <!ELEMENT phone (#PCDATA)>
      <!ELEMENT product (pid, name, description?)>
      <!ELEMENT pid (#PCDATA)>
      <!ELEMENT description (#PCDATA)>
    ]>

Example of Valid XML Document
    <company>
      <person>
        <ssn> 123456789 </ssn>
        <name> John </name>
        <office> B432 </office>
        <phone> 1234 </phone>
      </person>
      <person>
        <ssn> 987654321 </ssn>
        <name> Jim </name>
        <office> B123 </office>
      </person>
      <product> ... </product>
      ...
    </company>
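The well-formed half of the well-formed/valid distinction can be checked with any XML parser. A minimal sketch in Python, assuming the standard library's xml.etree.ElementTree; that parser is non-validating, so it never consults a DTD and cannot tell us whether a document is valid. The helper name is_well_formed and the two sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML (tags nest and close correctly)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical samples based on the company example.
good = "<company><person><ssn>123456789</ssn><name>John</name></person></company>"
bad = "<company><person><ssn>123456789</ssn><name>John</name></company>"  # </person> missing

print(is_well_formed(good))
print(is_well_formed(bad))
```

Checking validity against the DTD needs a validating parser, which the Python standard library does not provide.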

Content Model
Element content: what we can put in an element (aka content model)
Content model:
  Complex = a regular expression over other elements
  Text-only = #PCDATA
  Empty = EMPTY
  Any = ANY
  Mixed content = (#PCDATA | A | B | C)* (i.e. very restricted)

Attributes
    <!ELEMENT person (ssn, name, office, phone?)>
    <!ATTLIST person age CDATA #REQUIRED>
    <person age="25">
      <name> ....</name>
      ...
    </person>

Attributes
    <!ELEMENT person (ssn, name, office, phone?)>
    <!ATTLIST person age     CDATA  #REQUIRED
                     id      ID     #REQUIRED
                     manager IDREF  #REQUIRED
                     manages IDREFS #REQUIRED>
    <person age="25" id="p29432" manager="p48293" manages="p34982 p423234">
      <ssn> ... </ssn>
      <name> ... </name>
      ...
    </person>

Attribute Types
CDATA: string
ID: key
IDREF: foreign key
IDREFS: foreign keys separated by space
(Monday | Wednesday | Friday): enumeration
NMTOKEN: must be a valid XML name token
NMTOKENS: multiple valid XML name tokens
ENTITY: reference to, e.g., an external file
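The key / foreign-key reading of ID and IDREF amounts to a referential-integrity check. A rough sketch in Python of the IDREF half of what a validating parser does: every referenced id must be declared somewhere in the document. The attribute names id, manager and manages follow the person example; the sample document and the function name dangling_refs are invented for illustration, and the sketch does not check that ID values are unique.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: p3 names a manager (p9) that no element declares.
doc = """
<company>
  <person id="p1" manager="p2" manages="p3"/>
  <person id="p2" manager="p2" manages="p1 p3"/>
  <person id="p3" manager="p9" manages="p1"/>
</company>
"""

def dangling_refs(xml_text):
    """Return the referenced ids that no element declares, sorted."""
    root = ET.fromstring(xml_text)
    ids = {e.get("id") for e in root.iter() if e.get("id") is not None}
    refs = set()
    for e in root.iter():
        if e.get("manager") is not None:            # IDREF: a single reference
            refs.add(e.get("manager"))
        refs.update(e.get("manages", "").split())   # IDREFS: space-separated list
    return sorted(refs - ids)

print(dangling_refs(doc))
```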

Attribute Kind
#REQUIRED: the attribute must be present
#IMPLIED: optional
value: default value
#FIXED value: the only value allowed

Using DTDs
Must include in the XML document
Either include the entire DTD:
    <!DOCTYPE rootElement [ ....... ]>
Or include a reference to it:
    <!DOCTYPE rootElement SYSTEM "http://www.mydtd.org/file.dtd">
Or mix the two... (e.g. to override the external definition)

DTDs as Grammars
    <!DOCTYPE paper [
      <!ELEMENT paper (section*)>
      <!ELEMENT section ((title, section*) | text)>
      <!ELEMENT title (#PCDATA)>
      <!ELEMENT text (#PCDATA)>
    ]>
    <paper>
      <section><text></text></section>
      <section><title></title>
        <section> … </section>
      </section>
    </paper>

DTDs as Grammars
A DTD = a grammar
A valid XML document = a parse tree for that grammar
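The grammar view can be made concrete by treating each content model as a regular expression over the names of an element's children. A minimal sketch in Python for the paper DTD, not a real DTD validator: the CONTENT_MODELS table is hand-translated from the DTD (an assumption of this sketch), text content is ignored, and the names is_valid, valid_doc and invalid_doc are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Content models of the "paper" DTD as regular expressions over child names.
# Each child name is followed by one space so that names cannot run together.
CONTENT_MODELS = {
    "paper": r"(section )*",                 # (section*)
    "section": r"(title (section )*|text )", # ((title, section*) | text)
    "title": r"",                            # (#PCDATA): no child elements
    "text": r"",                             # (#PCDATA): no child elements
}

def is_valid(elem):
    """Check an element and all its descendants against CONTENT_MODELS."""
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:
        return False  # element not declared in the DTD
    children = "".join(child.tag + " " for child in elem)
    if re.fullmatch(model, children) is None:
        return False
    return all(is_valid(child) for child in elem)

valid_doc = ET.fromstring(
    "<paper><section><text/></section>"
    "<section><title/><section><text/></section></section></paper>"
)
# A section may hold a title (plus sections) or a text, never both.
invalid_doc = ET.fromstring("<paper><section><title/><text/></section></paper>")

print(is_valid(valid_doc))
print(is_valid(invalid_doc))
```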

XML Schemas
Generalizes DTDs
Uses XML syntax
Two documents: structure and datatypes
  http://www.w3.org/TR/xmlschema-1
  http://www.w3.org/TR/xmlschema-2
XML Schema is very complex
  often criticized
  some alternative proposals

Why XML Schemas?
DTDs provide a very weak specification language
  You can’t put any restrictions on text content
  You have very little control over mixed content (text plus elements)
  Little control over ordering of elements
DTDs are written in a strange (non-XML) format
  Separate parsers for DTDs and XML
The XML Schema Definition language solves these problems
  XSD gives you much more control over structure and content
  XSD is written in XML

Why not XML Schemas?
DTDs have been around longer than XSD
  Therefore they are more widely used
  Also, more tools support them
XSD is very verbose, even by XML standards
More advanced XML Schema instructions can be non-intuitive and confusing
Nevertheless, XSD is not likely to go away quickly

Referring to a Schema
To refer to a DTD in an XML document, the reference goes before the root element:
    <?xml version="1.0"?>
    <!DOCTYPE rootElement SYSTEM "url">
    <rootElement> ... </rootElement>
To refer to an XML Schema in an XML document, the reference goes in the root element:
    <?xml version="1.0"?>
    <rootElement
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="url.xsd">
      ...
    </rootElement>
  xmlns:xsi: the XML Schema Instance reference is required
  xsi:noNamespaceSchemaLocation: where your XML Schema definition can be found
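A program that wants to locate the schema has to read that attribute from the root element. A small sketch in Python, assuming the standard library's ElementTree, which spells namespaced attribute names as {namespace-uri}localname; the function name schema_location is made up, and the sketch only reads the hint without fetching or applying the schema.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

doc = """<?xml version="1.0"?>
<rootElement xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:noNamespaceSchemaLocation="url.xsd">
</rootElement>"""

def schema_location(xml_text):
    """Return the xsi:noNamespaceSchemaLocation value of the root, or None."""
    root = ET.fromstring(xml_text)
    return root.get("{%s}noNamespaceSchemaLocation" % XSI)

print(schema_location(doc))
```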

The XSD Document
Since the XSD is written in XML, it can be confusing which document we are talking about
Except for the additions to the root element of our XML data document, we will discuss the XSD schema document
The file extension is .xsd
The root element is <schema>
The XSD starts like this:
    <?xml version="1.0"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
The last line specifies where all XSD tags are defined

“Simple” and “Complex” Elements
A simple element is one that contains text and nothing else
  A simple element cannot have attributes
  A simple element cannot contain other elements
  A simple element cannot be empty
  However, the text can be of many different types, and may have various restrictions applied to it
If an element isn’t simple, it’s complex
  A complex element may have attributes
  A complex element may be empty, or it may contain text, other elements, or both text and other elements

Defining a Simple Element
A simple element is defined as
    <xs:element name="name" type="type" />
where:
  name is the name of the element
  the most common values for type are xs:boolean, xs:integer, xs:date, xs:string, xs:decimal, xs:time
Other attributes a simple element may have:
  default="default value": used if no other value is specified
  fixed="value": no other value may be specified

Defining an Attribute
Attributes themselves are always declared as simple types
An attribute is defined as
    <xs:attribute name="name" type="type" />
where:
  name and type are the same as for xs:element
Other attributes an attribute declaration may have:
  default="default value": used if no other value is specified
  fixed="value": no other value may be specified
  use="optional": the attribute is not required (default)
  use="required": the attribute must be present

Restrictions or “Facets”
The general form for putting a restriction on a text value is:
    <xs:element name="name">
      <xs:simpleType>
        <xs:restriction base="type">
          ... the restrictions ...
        </xs:restriction>
      </xs:simpleType>
    </xs:element>
Example:
    <xs:element name="age">
      <xs:simpleType>
        <xs:restriction base="xs:integer">
          <xs:minInclusive value="0"/>
          <xs:maxInclusive value="140"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:element>
Restriction on an attribute: use xs:attribute in place of xs:element

Restrictions on Numbers
minInclusive: number must be ≥ the given value
minExclusive: number must be > the given value
maxInclusive: number must be ≤ the given value
maxExclusive: number must be < the given value
totalDigits: number must have no more than value digits in total
fractionDigits: number must have no more than value digits after the decimal point
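The four bound facets are plain comparisons. A minimal sketch in Python of how they could be checked for one value, using the age restriction (0 to 140) as the example; the function name satisfies and its keyword parameters are invented for this sketch, and totalDigits and fractionDigits are left out.

```python
from decimal import Decimal

def satisfies(text, min_inclusive=None, min_exclusive=None,
              max_inclusive=None, max_exclusive=None):
    """Check a numeric value (given as text) against the four bound facets."""
    value = Decimal(text)
    if min_inclusive is not None and not value >= Decimal(min_inclusive):
        return False
    if min_exclusive is not None and not value > Decimal(min_exclusive):
        return False
    if max_inclusive is not None and not value <= Decimal(max_inclusive):
        return False
    if max_exclusive is not None and not value < Decimal(max_exclusive):
        return False
    return True

# The "age" restriction: minInclusive 0, maxInclusive 140.
print(satisfies("25", min_inclusive="0", max_inclusive="140"))
print(satisfies("141", min_inclusive="0", max_inclusive="140"))
# With maxExclusive the bound itself is rejected.
print(satisfies("140", min_inclusive="0", max_exclusive="140"))
```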

Restrictions on Strings
length: the string must contain exactly value characters
minLength: the string must contain at least value characters
maxLength: the string must contain no more than value characters
pattern: the value is a regular expression that the string must match
whiteSpace: not really a “restriction”; tells what to do with whitespace
  value="preserve": keep all whitespace
  value="replace": change all whitespace characters to spaces
  value="collapse": remove leading and trailing whitespace, and replace all sequences of whitespace with a single space
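The three whiteSpace values describe a small normalization step applied before the other facets are checked. A sketch of that step in Python; the function name apply_whitespace and the sample string are made up for illustration.

```python
import re

def apply_whitespace(text, mode):
    """Normalize `text` according to the whiteSpace facet value `mode`."""
    if mode == "preserve":
        return text
    # "replace": every tab, newline and carriage return becomes a space.
    replaced = re.sub(r"[\t\n\r]", " ", text)
    if mode == "replace":
        return replaced
    if mode == "collapse":
        # After replacing, squeeze runs of spaces and trim both ends.
        return re.sub(r" +", " ", replaced).strip(" ")
    raise ValueError("unknown whiteSpace value: %r" % mode)

sample = "  John\t\tSmith \n"
print(repr(apply_whitespace(sample, "preserve")))
print(repr(apply_whitespace(sample, "replace")))
print(repr(apply_whitespace(sample, "collapse")))
```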

Enumeration
An enumeration restricts the value to be one of a fixed set of values
Example:
    <xs:element name="season">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="Spring"/>
          <xs:enumeration value="Summer"/>
          <xs:enumeration value="Autumn"/>
          <xs:enumeration value="Fall"/>
          <xs:enumeration value="Winter"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:element>

Complex Elements
A complex element is defined as
    <xs:element name="name">
      <xs:complexType>
        ... information about the complex type ...
      </xs:complexType>
    </xs:element>
Attributes are always simple types

Example of a Complex Element
    <xs:element name="person">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="firstName" type="xs:string" />
          <xs:element name="lastName" type="xs:string" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>
<xs:sequence> says that elements must occur in this order
Remember that attributes are always simple types

Another Complex Element
    <xsd:element name="paper" type="papertype"/>
    <xsd:complexType name="papertype">
      <xsd:sequence>
        <xsd:element name="title" type="xsd:string"/>
        <xsd:element name="author" minOccurs="0" maxOccurs="unbounded"/>
        <xsd:element name="year"/>
        <xsd:choice>
          <xsd:element name="journal"/>
          <xsd:element name="conference"/>
        </xsd:choice>
      </xsd:sequence>
    </xsd:complexType>
DTD: <!ELEMENT paper (title, author*, year, (journal|conference))>

Declaration and Use
Types can be declared/defined for later use
To use a type, use it as the value of type="..."
Examples:
    <xs:element name="student" type="person"/>
    <xs:element name="professor" type="person"/>
Scope is important: you cannot use a type if it is local to some other type

Elements vs. Types in XML Schema
    <xsd:element name="person">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="name" type="xsd:string"/>
          <xsd:element name="address" type="xsd:string"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
The same element, declared with a named type:
    <xsd:element name="person" type="ttt" />
    <xsd:complexType name="ttt">
      <xsd:sequence>
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="address" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
DTD: <!ELEMENT person (name, address)>

Elements vs. Types in XML Schema
Simple types (integers, strings, ...)
Complex types (regular expressions, like in DTDs)
Element-type-element alternation:
  Root element has a complex type
  That type is a regular expression of elements
  Those elements have their complex types...
  ...
  On the leaves we have simple types

Global and Local Definitions
Elements declared at the “top level” of a <schema> are available for use throughout the schema
Elements declared within a <xs:complexType> are local to that type
    <xs:element name="person">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="firstName" type="xs:string" />
          <xs:element name="lastName" type="xs:string" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>
firstName and lastName elements are locally declared
The order of declarations at the “top level” of a <schema> does not specify the order in the XML data document
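Because a schema is itself an XML document, the global/local split can be read off with an ordinary XML parser: global declarations are the direct children of the schema root. A small sketch in Python under that reading; schema_text wraps the person declaration in an xs:schema root that is added here for illustration, and all variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

schema_text = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="firstName" type="xs:string"/>
        <xs:element name="lastName" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

schema = ET.fromstring(schema_text)

# Global declarations: xs:element children of the schema root.
global_elems = schema.findall(XS + "element")
global_ids = {id(e) for e in global_elems}
global_names = [e.get("name") for e in global_elems]

# Local declarations: every other xs:element anywhere below the root.
local_names = [e.get("name") for e in schema.iter(XS + "element")
               if id(e) not in global_ids]

print(global_names)
print(local_names)
```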

Local and Global Types in XML Schema
Local type:
    <xsd:element name="person">
      … define locally the person’s type …
    </xsd:element>
Global type:
    <xsd:element name="person" type="ttt"/>
    <xsd:complexType name="ttt">
      … define here the type ttt …
    </xsd:complexType>
Global types can be reused in other elements

Local vs. Global Elements in XML Schema
Local element:
    <xsd:complexType name="ttt">
      <xsd:sequence>
        <xsd:element name="address" type="..."/>...
      </xsd:sequence>
    </xsd:complexType>
Global element:
    <xsd:element name="address" type="..."/>
    <xsd:complexType name="ttt">
      <xsd:sequence>
        <xsd:element ref="address"/> ...
      </xsd:sequence>
    </xsd:complexType>
Global elements: like in DTDs

Local Names in XML Schema
    <xsd:element name="person">
      <xsd:complexType>
        . . . . .
        <xsd:element name="name">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="firstname" type="xsd:string"/>
              <xsd:element name="lastname" type="xsd:string"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
        . . . . .
      </xsd:complexType>
    </xsd:element>
    <xsd:element name="product">
      <xsd:complexType>
        . . . . .
        <xsd:element name="name" type="xsd:string"/>
        . . . . .
      </xsd:complexType>
    </xsd:element>
name has different meanings in person and in product

xs:sequence
We’ve already seen an example of a complex type whose elements must occur in a specific order:
    <xs:element name="person">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="firstName" type="xs:string" />
          <xs:element name="lastName" type="xs:string" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>

xs:all
xs:all allows elements to appear in any order
    <xs:element name="person">
      <xs:complexType>
        <xs:all>
          <xs:element name="firstName" type="xs:string" />
          <xs:element name="lastName" type="xs:string" />
        </xs:all>
      </xs:complexType>
    </xs:element>
Each member of an xs:all group can occur at most once
In XSD 1.0, a member may set minOccurs to 0 or 1, and maxOccurs must be 1 (the default value of both is 1)

All Group
A restricted form of & in SGML
    <xsd:complexType name="PurchaseOrderType">
      <xsd:all>
        <xsd:element name="shipTo" type="USAddress"/>
        <xsd:element name="billTo" type="USAddress"/>
        <xsd:element ref="comment" minOccurs="0"/>
        <xsd:element name="items" type="Items"/>
      </xsd:all>
      <xsd:attribute name="orderDate" type="xsd:date"/>
    </xsd:complexType>
Restrictions:
  Only at top level
  Has only elements
  Each element occurs at most once
  E.g. “comment” occurs 0 or 1 times
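The all-group rule (any order, each member at most once, optional members may be absent) is easy to state as a counting check. A minimal sketch in Python over a list of child element names, using the PurchaseOrderType members as data; the function name matches_all_group and its parameters are invented for this sketch, and it looks only at names, not at the children's own types.

```python
from collections import Counter

def matches_all_group(children, required, optional=()):
    """True if `children` (names in document order) satisfy an xs:all group."""
    counts = Counter(children)
    allowed = set(required) | set(optional)
    if any(name not in allowed for name in counts):
        return False                      # a child the group does not declare
    if any(counts[name] != 1 for name in required):
        return False                      # minOccurs=1 members: exactly once
    return all(counts[name] <= 1 for name in optional)  # minOccurs=0: 0 or 1

required = ["shipTo", "billTo", "items"]
optional = ["comment"]

# Order does not matter; "comment" may be present or absent.
print(matches_all_group(["items", "comment", "billTo", "shipTo"], required, optional))
print(matches_all_group(["shipTo", "billTo", "items"], required, optional))
# Repeating a member or omitting a required one fails.
print(matches_all_group(["shipTo", "billTo", "items", "comment", "comment"], required, optional))
print(matches_all_group(["shipTo", "items"], required, optional))
```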

Regular Expressions in XML Schema
Recall the element-type-element alternation:
    <xsd:complexType name="....">
      [regular expression on elements]
    </xsd:complexType>
Regular expressions:
    <xsd:sequence> A B C </...>   =  A B C
    <xsd:choice> A B C </...>     =  A | B | C
    <xsd:group> A B C </...>      =  (A B C)
    <xsd:... minOccurs="0" maxOccurs="unbounded"> ... </...>  =  (...)*
    <xsd:... minOccurs="0" maxOccurs="1"> ... </...>          =  (...)?
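The last two rows generalize: any minOccurs/maxOccurs pair corresponds to a regular-expression quantifier. A small sketch of that mapping in Python; the function name quantifier is made up, and "unbounded" is passed as the string used in the schema.

```python
def quantifier(min_occurs=1, max_occurs=1):
    """Translate minOccurs/maxOccurs into a regular-expression quantifier."""
    if max_occurs == "unbounded":
        if min_occurs == 0:
            return "*"
        if min_occurs == 1:
            return "+"
        return "{%d,}" % min_occurs
    if (min_occurs, max_occurs) == (0, 1):
        return "?"
    if (min_occurs, max_occurs) == (1, 1):
        return ""          # the default: exactly once, no quantifier needed
    return "{%d,%d}" % (min_occurs, max_occurs)

print(quantifier(0, "unbounded"))
print(quantifier(0, 1))
print(quantifier(2, 5))
```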

Referencing
Once you have defined an element or attribute (with name="..."), you can refer to it with ref="..."
Example:
    <xs:element name="person">
      <xs:complexType>
        <xs:all>
          <xs:element name="firstName" type="xs:string" />
          <xs:element name="lastName" type="xs:string" />
        </xs:all>
      </xs:complexType>
    </xs:element>
Then, inside another content model, just:
    <xs:element ref="person"/>
name and ref cannot appear together on one declaration, so <xs:element name="student" ref="person"> is not allowed; to reuse a structure under a different element name, define a named type and use type="..."

Attributes Again
Attributes are associated to the type, not to the element
Only to complex types; more trouble if we want to add attributes to simple types
    <xsd:element name="paper" type="papertype"/>
    <xsd:complexType name="papertype">
      <xsd:sequence>
        <xsd:element name="title" type="xsd:string"/>
        . . . . . .
      </xsd:sequence>
      <xsd:attribute name="language" type="xsd:NMTOKEN" fixed="English"/>
    </xsd:complexType>

Text Element with Attributes
If a text element has attributes, it is no longer a simple type
    <xs:element name="population">
      <xs:complexType>
        <xs:simpleContent>
          <xs:extension base="xs:integer">
            <xs:attribute name="year" type="xs:integer"/>
          </xs:extension>
        </xs:simpleContent>
      </xs:complexType>
    </xs:element>

“Any” Type
Means anything is permitted there
    <xsd:element name="anything" type="xsd:anyType"/>
    . . . .

Empty Elements
Empty elements are (ridiculously) complex
    <xs:complexType name="counter">
      <xs:complexContent>
        <xs:restriction base="xs:anyType">
          <xs:attribute name="count" type="xs:integer"/>
        </xs:restriction>
      </xs:complexContent>
    </xs:complexType>

Mixed Elements
Mixed elements may contain both text and elements
We add mixed="true" to the xs:complexType element
The text itself is not mentioned in the element, and may go anywhere (it is basically ignored)
    <xs:complexType name="paragraph" mixed="true">
      <xs:sequence>
        <xs:element name="someName" type="xs:anyType"/>
      </xs:sequence>
    </xs:complexType>

Predefined String Types
Recall that a simple element is defined as:
    <xs:element name="name" type="type" />
Here are a few of the possible string types:
  xs:string: a string
  xs:normalizedString: a string that doesn’t contain tabs, newlines, or carriage returns
  xs:token: a string that doesn’t contain any whitespace other than single spaces between words (no leading or trailing spaces)
Allowable restrictions on strings: enumeration, length, maxLength, minLength, pattern, whiteSpace

Predefined Date and Time Types
xs:date: a date in the format CCYY-MM-DD, for example, 2002-11-05
xs:time: a time in the format hh:mm:ss (hours, minutes, seconds)
xs:dateTime: format is CCYY-MM-DDThh:mm:ss
Allowable restrictions on dates and times: enumeration, minInclusive, minExclusive, maxInclusive, maxExclusive, pattern, whiteSpace
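These lexical formats are the ISO 8601 forms that most languages can already parse. A rough sketch in Python using the fromisoformat parsers of the standard datetime module; this only approximates XML Schema's rules, since Python accepts some spellings that XML Schema does not (and the reverse), and the function name parses_as is hypothetical.

```python
from datetime import date, time, datetime

# One standard-library parser per XML Schema type name (an approximation).
PARSERS = {
    "date": date.fromisoformat,          # CCYY-MM-DD
    "time": time.fromisoformat,          # hh:mm:ss
    "dateTime": datetime.fromisoformat,  # CCYY-MM-DDThh:mm:ss
}

def parses_as(kind, text):
    """Return True if `text` is accepted by the parser chosen for `kind`."""
    try:
        PARSERS[kind](text)
        return True
    except ValueError:
        return False

print(parses_as("date", "2002-11-05"))
print(parses_as("time", "13:20:00"))
print(parses_as("dateTime", "2002-11-05T13:20:00"))
print(parses_as("date", "05/11/2002"))   # wrong format
print(parses_as("date", "2002-13-05"))   # month out of range
```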

Predefined Numeric Types
Here are some of the predefined numeric types:
  xs:decimal, xs:byte, xs:short, xs:int, xs:long
  xs:positiveInteger, xs:negativeInteger, xs:nonPositiveInteger, xs:nonNegativeInteger
Allowable restrictions on numeric types: enumeration, minInclusive, minExclusive, maxInclusive, maxExclusive, fractionDigits, totalDigits, pattern, whiteSpace
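The integer types in this list differ only in the range of values they admit. A short sketch in Python that tabulates those ranges and checks a value against them; the names INTEGER_RANGES and in_range are made up, None stands for "no bound", and xs:decimal is left out because it is not range-limited.

```python
# (lowest, highest) allowed value per type; None means unbounded on that side.
INTEGER_RANGES = {
    "xs:byte": (-2**7, 2**7 - 1),
    "xs:short": (-2**15, 2**15 - 1),
    "xs:int": (-2**31, 2**31 - 1),
    "xs:long": (-2**63, 2**63 - 1),
    "xs:positiveInteger": (1, None),
    "xs:negativeInteger": (None, -1),
    "xs:nonNegativeInteger": (0, None),
    "xs:nonPositiveInteger": (None, 0),
}

def in_range(type_name, value):
    """True if the integer `value` lies within the range of `type_name`."""
    low, high = INTEGER_RANGES[type_name]
    return (low is None or value >= low) and (high is None or value <= high)

print(in_range("xs:byte", 127))
print(in_range("xs:byte", 128))
print(in_range("xs:positiveInteger", 0))
print(in_range("xs:nonNegativeInteger", 0))
```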

Extensions
You can base a complex type on another complex type
    <xs:complexType name="newType">
      <xs:complexContent>
        <xs:extension base="otherType">
          ...new stuff...
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>

Derived Types by Extensions (Inheritance)
    <complexType name="Address">
      <sequence>
        <element name="street" type="string"/>
        <element name="city" type="string"/>
      </sequence>
    </complexType>
    <complexType name="USAddress">
      <complexContent>
        <extension base="ipo:Address">
          <sequence>
            <element name="state" type="ipo:USState"/>
            <element name="zip" type="positiveInteger"/>
          </sequence>
        </extension>
      </complexContent>
    </complexType>

Derived Types by Restrictions
    <complexContent>
      <restriction base="ipo:Items">
        … [rewrite the entire content, with restrictions]...
      </restriction>
    </complexContent>
(*): may restrict cardinalities, e.g. (0,infty) to (1,1); may restrict choices; other restrictions…

Summary
Similar role to DTD: define structures for XML documents
XML Schema definition itself is in XML
Detailed simple types
  string, token, byte, unsignedByte, integer, positiveInteger, int (a 32-bit restriction of integer), unsignedInt, long, short, time, dateTime, duration, date, ID, IDREF, IDREFS, …
15 facets