An Introduction to XML and Web Technologies Schema Languages Anders Møller & Michael I. Schwartzbach  2006 Addison-Wesley.

Slides:



Advertisements
Similar presentations
17 Apr 2002 XML Syntax: DTDs Andy Clark. Validation of XML Documents XML documents must be well-formed XML documents may be valid – Validation verifies.
Advertisements

Managing XML and Semistructured Data Lecture 12: XML Schema Prof. Dan Suciu Spring 2001.
ISO DSDL ISO – Document Schema Definition Languages (DSDL) Martin Bryan Convenor, JTC1/SC18 WG1.
4 XML Schema.
1 Web Data Management XML Schema. 2 In this lecture XML Schemas Elements v. Types Regular expressions Expressive power Resources W3C Draft:
1 XML DTD & XML Schema Monica Farrow G30
CSE 636 Data Integration XML Schema. 2 XML Schemas W3C Recommendation: Generalizes DTDs Uses XML syntax Two documents: structure.
XML Schema Definition Language
 2002 Prentice Hall, Inc. All rights reserved. ISQA 407 XML/WML Winter 2002 Dr. Sergio Davalos.
XML Simple Types CSPP51038 shortcourse. Simple Types Recall that simple types are composed of text-only values. All attributes are of simple type Elements.
XML Schema Matthias Hauswirth. Agenda 4 W3C Process 4 XML Schema Requirements 4 The Specifications 4 Schema Tools.
Tutorial 9 Working with XHTML. XP Objectives Describe the history and theory of XHTML Understand the rules for creating valid XHTML documents Apply a.
XML Schemas and Namespaces Lecture 11, 07/10/02. BookStore.dtd.
Creating a Well-Formed Valid Document. 2 Objectives Introducing XHTML Creating a Well-Formed Document Creating a Valid Document Creating an XHTML Document.
XML Schemas. “Schemas” is a general term--DTDs are a form of XML schemas –According to the dictionary, a schema is “a structured framework or plan” When.
Sunday, June 28, 2015 Abdelali ZAHI : FALL 2003 : XML Schemas XML Schemas Presented By : Abdelali ZAHI Instructor : Dr H.Haddouti.
Introduction to XML This material is based heavily on the tutorial by the same name at
Processing of structured documents Spring 2003, Part 3 Helena Ahonen-Myka.
XML Validation I DTDs Robin Burke ECT 360 Winter 2004.
Validating DOCUMENTS with DTDs
VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation Document Type Definition.
Why XML ? Problems with HTML HTML design - HTML is intended for presentation of information as Web pages. - HTML contains a fixed set of markup tags. This.
IS432 Semi-Structured Data Lecture 3: XSchema Dr. Gamal Al-Shorbagy.
Chapter 4: Document Type Definitions. Chapter 4 Objectives Learn to create DTDs Validate an XML document against a DTD Use DTDs to create XML documents.
XML Open Computing Institute, Inc. 1 eXtensible Markup Language (XML)
1 XML Schemas. 2 Useful Links Schema tutorial links:
Dr. Azeddine Chikh IS446: Internet Software Development.
XML CPSC 315 – Programming Studio Fall 2008 Project 3, Lecture 1.
XP 1 CREATING AN XML DOCUMENT. XP 2 INTRODUCING XML XML stands for Extensible Markup Language. A markup language specifies the structure and content of.
XML Language Family Detailed Examples Most information contained in these slide comes from: These slides are intended.
Introduction to XML. What is XML? Extensible Markup Language XML Easier-to-use subset of SGML (Standard Generalized Markup Language) XML is a.
XML 1 Enterprise Applications CE00465-M XML. 2 Enterprise Applications CE00465-M XML Overview Extensible Mark-up Language (XML) is a meta-language that.
Processing of structured documents Spring 2002, Part 2 Helena Ahonen-Myka.
1 Tutorial 13 Validating Documents with DTDs Working with Document Type Definitions.
Lecture 6 XML DTD Content of.xml fileContent of.dtd file.
1 Chapter 10: XML What is XML What is XML Basic Components of XML Basic Components of XML XPath XPath XQuery XQuery.
Of 33 lecture 3: xml and xml schema. of 33 XML, RDF, RDF Schema overview XML – simple introduction and XML Schema RDF – basics, language RDF Schema –
SDPL 2005Notes 2.5: XML Schemas1 2.5 XML Schemas n Short introduction to XML Schema –W3C Recommendation, 1 st Ed. May, 2001; 2 nd Ed. Oct, 2004: »XML Schema.
IS432 Semi-Structured Data Lecture 2: DTD Dr. Gamal Al-Shorbagy.
XML Validation I DTDs Robin Burke ECT 360 Winter 2004.
XML Schema. Why Schema? To define a class of XML documents Serve same purpose as DTD “Instance document" used for XML document conforming to schema.
An OO schema language for XML SOX W3C Note 30 July 1999.
Lecture 16 Introduction to XML Boriana Koleva Room: C54
An Introduction to XML Sandeep Bhattaram
1 CIS336 Website design, implementation and management (also Semester 2 of CIS219, CIS221 and IT226) Lecture 5 XML Schema (Based on Møller and Schwartzbach,
Sheet 1XML Technology in E-Commerce 2001Lecture 2 XML Technology in E-Commerce Lecture 2 Logical and Physical Structure, Validity, DTD, XML Schema.
1 Tutorial 14 Validating Documents with Schemas Exploring the XML Schema Vocabulary.
Tutorial 13 Validating Documents with Schemas
Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001.
Processing of structured documents Spring 2003, Part 3 Helena Ahonen-Myka.
Internet & World Wide Web How to Program, 5/e. © by Pearson Education, Inc. All Rights Reserved.2.
XML Validation II Schemas Robin Burke ECT 360. Outline Namespaces Documents  Data types XML Schemas Elements Attributes Derived data types RELAX NG.
Primer on XML Schema CSE 544 April, XML Schemas Generalizes DTDs Uses XML syntax Two parts: structure and datatypes Very complex –criticized –alternative.
QUALITY CONTROL WITH SCHEMAS CSC1310 Fall BASIS CONCEPTS SchemaSchema is a pass-or-fail test for document Schema is a minimum set of requirements.
Introduction to XML Schema John Arnett, MSc Standards Modeller Information and Statistics Division NHSScotland Tel: (x2073)
CSE 6331 © Leonidas Fegaras XML Schema 1 XML Schema Leonidas Fegaras.
XML Schema Definition (XSD). Definition of a Schema It is a model for describing the structure and content of data The XML Schema was developed as a content.
XP Tutorial 9New Perspectives on HTML and XHTML, Comprehensive 1 Working with XHTML Creating a Well-Formed Valid Document Tutorial 9.
Document Type Definition (DTD) Eugenia Fernandez IUPUI.
XML Validation II Advanced DTDs + Schemas Robin Burke ECT 360.
Lecture 0 W3C XML Schema. Topics Status Motivation Simple type vs. complex type.
XML Validation. a simple element containing text attribute; attributes provide additional information about an element and consist of a name value pair;
Tutorial 9 Working with XHTML. New Perspectives on HTML, XHTML, and XML, Comprehensive, 3rd Edition 2 Objectives Describe the history and theory of XHTML.
Tutorial 9 Working with XHTML. XP Objectives Describe the history and theory of XHTML Understand the rules for creating valid XHTML documents Apply a.
XML Validation III Schemas + RELAX NG Robin Burke ECT 360.
4 Copyright © 2004, Oracle. All rights reserved. Validating XML by Using XML Schema.
MSc in Communication Sciences Program in Technologies for Human Communication Davide Eynard Facoltà di scienze della comunicazione Università.
Extensible Markup Language (XML) Pat Morin COMP 2405.
Data Modeling II XML Schema & JAXB Marc Dumontier May 4, 2004
THE DATATYPES OF XML SCHEMA A Practical Introduction
Presentation transcript:

An Introduction to XML and Web Technologies Schema Languages Anders Møller & Michael I. Schwartzbach  2006 Addison-Wesley

Messages  Thank you for your feedback!  About lecture preparation: apologies!  About exercise preparation: 2 An Introduction to XML and Web Technologies

Please do the exercises before class! (With apologies and thanks to those (few!) who do or try.)

4 An Introduction to XML and Web Technologies Objectives  The purpose of using schemas  The schema languages DTD and XML Schema (and DSD2 and RELAX NG)  Regular expressions – a commonly used formalism in schema languages

So, why schemas?

6 An Introduction to XML and Web Technologies Motivation  We have designed our Recipe Markup Language ...but so far only informally described its syntax  How can we make tools that check that an XML document is a syntactically correct Recipe Markup Language document (and thus meaningful)?  Implementing a specialized validation tool for Recipe Markup Language is not the solution...

7 An Introduction to XML and Web Technologies Recipe (1/2) Recipes suggested by Jane Dow Rhubarb Cobbler Wed, 14 Jun 95 Combine all and use as cobbler, pie, or crisp.

8 An Introduction to XML and Web Technologies Recipe (2/2) Rhubarb Cobbler made with bananas as the main sweetener. It was delicious. <nutrition calories="170" fat="28%" carbohydrates="58%" protein="14%"/> Garden Quiche is also yummy

9 An Introduction to XML and Web Technologies XML Languages  XML language: a set of XML documents with some semantics  schema: a formal definition of the syntax of an XML language  schema language: a notation for writing schemas

10 An Introduction to XML and Web Technologies Validation instance document schema processor schema valid invalid normalized instance document error message

11 An Introduction to XML and Web Technologies Why use Schemas?  Formal but human-readable descriptions  Data validation can be performed with existing schema processors

12 An Introduction to XML and Web Technologies General Requirements  Expressiveness  Efficiency  Comprehensibility

Interlude: regular expressions (Very briefly.)

14 An Introduction to XML and Web Technologies Examples  A regular expression describing integers: 0|-?(1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)*  A regular expression describing the valid contents of table elements in XHTML: caption? ( col* | colgroup* ) thead? tfoot? ( tbody+ | tr+ )

15 An Introduction to XML and Web Technologies Regular Expressions  Commonly used in schema languages to describe sequences of characters or elements   : an alphabet (typically Unicode characters or element names)   matches the string    ? matches zero or one    * matches zero or more  ’s   + matches one or more  ’s    matches any concatenation of an  and a    |  matches the union of  and 

DTD (Simple and insufficient)

17 An Introduction to XML and Web Technologies DTD – Document Type Definition  Defined as a subset of the DTD formalism from SGML  Specified as an integral part of XML 1.0  A starting point for development of more expressive schema languages  Considers elements, attributes, and character data – processing instructions and comments are mostly ignored

18 An Introduction to XML and Web Technologies RecipeML with DTD (1/2) <!ATTLIST ingredient name CDATA #REQUIRED amount CDATA #IMPLIED unit CDATA #IMPLIED>

19 An Introduction to XML and Web Technologies RecipeML with DTD (2/2) <!ATTLIST nutrition calories CDATA #REQUIRED carbohydrates CDATA #REQUIRED fat CDATA #REQUIRED protein CDATA #REQUIRED alcohol CDATA #IMPLIED>

20 An Introduction to XML and Web Technologies Element Declarations Content models:  EMPTY  ANY  mixed content: (#PCDATA| e 1 | e 2 |... | e n )*  element content: regular expression over element names (concatenation is written with “, ”) Example:

Exercise: DTD Element Description (You have plenty of time)

22 An Introduction to XML and Web Technologies Attribute-List Declarations Each attribute definition consists of  an attribute name  an attribute type  a default declaration Example:

23 An Introduction to XML and Web Technologies Attribute Types  CDATA : any value  enumeration: ( s 1 | s 2 |... | s n )  ID : must have unique value  IDREF (/ IDREFS ): must match some ID attribute(s) ... Examples:

24 An Introduction to XML and Web Technologies Attribute Default Declarations  #REQUIRED  #IMPLIED (= optional)  ” value ” (= optional, but default provided)  #FIXED ” value ” (= required, must have this value) Examples: <!ATTLIST form action CDATA #REQUIRED onsubmit CDATA #IMPLIED method (get|post) "get" enctype CDATA "application/x-www-form-urlencoded" > <!ATTLIST html xmlns CDATA #FIXED "

Exercise: DTD Attribute Description (You have plenty of time)

26 An Introduction to XML and Web Technologies Document Type Declarations  Associates a DTD schema with the instance document ... 

Exercise: Add a doctype & validate your solution at validator.w3.org (Still plenty of time)

28 An Introduction to XML and Web Technologies Entity Declarations (1/3)  Internal entity declarations – a simple macro mechanism Example: Schema: Input: A gadget has a medium size head and a big gizmo subwidget. &copyrightnotice; Output: A gadget has a medium size head and a big gizmo subwidget. Copyright © 2005 Widgets'R'Us.

29 An Introduction to XML and Web Technologies Entity Declarations (2/3)  Internal parameter entity declarations – apply to the DTD, not the instance document Example: Schema: corresponds to

30 An Introduction to XML and Web Technologies Entity Declarations (3/3)  External parsed entity declarations – references to XML data in other files Example:  External unparsed entity declarations – references to non-XML data Example: not widely used!

31 An Introduction to XML and Web Technologies Conditional Sections  Allow parts of schemas to be enabled/disabled by a switch Example: ]]> ]]>

32 An Introduction to XML and Web Technologies Checking Validity with DTD A DTD processor (also called a validating XML parser)  parses the input document (includes checking well-formedness)  checks the root element name  for each element, checks its contents and attributes  checks uniqueness and referential constraints ( ID / IDREF ( S ) attributes)

Exercise: what didn’t we check?

34 An Introduction to XML and Web Technologies RecipeML with DTD (1/2) <!ATTLIST ingredient name CDATA #REQUIRED amount CDATA #IMPLIED unit CDATA #IMPLIED>

35 An Introduction to XML and Web Technologies RecipeML with DTD (2/2) <!ATTLIST nutrition calories CDATA #REQUIRED carbohydrates CDATA #REQUIRED fat CDATA #REQUIRED protein CDATA #REQUIRED alcohol CDATA #IMPLIED>

36 An Introduction to XML and Web Technologies Problems with the DTD description  calories should contain a non-negative number  protein should contain a value on the form N % where N is between 0 and 100;  comment should be allowed to appear anywhere in the contents of recipe  unit should only be allowed in an elements where amount is also present  nested ingredient elements should only be allowed when amount is absent – our DTD schema permits in some cases too much and in other cases too little!

37 An Introduction to XML and Web Technologies Limitations of DTD 1. Cannot constraint character data 2. Specification of attribute values is too limited 3. Element and attribute declarations are context insensitive 4. Character data cannot be combined with the regular expression content model 5. The content models lack an “interleaving” operator 6. The support for modularity, reuse, and evolution is too primitive 7. The normalization features lack content defaults and proper whitespace control 8. Structured embedded self-documentation is not possible 9. The ID/IDREF mechanism is too simple 10. It does not itself use an XML syntax 11. No support for namespaces

Ritter Sport Winners: Morten & Per Note: The Ritter Sport exercises should be within reach of everyone. (Even if maybe they weren’t so far.)

XML Schema (Complex and insufficient)

40 An Introduction to XML and Web Technologies Requirements for XML Schema - W3C’s proposal for replacing DTD Design principles:  More expressive than DTD  Use XML notation  Self-describing  Simplicity Technical requirements:  Namespace support  User-defined datatypes  Inheritance (OO-like)  Evolution  Embedded documentation ...

41 An Introduction to XML and Web Technologies Example (1/3) John Doe CEO, Widget Inc. (202) Instance document:

42 An Introduction to XML and Web Technologies Example (2/3) <schema xmlns=" xmlns:b=" targetNamespace=" Schema:

43 An Introduction to XML and Web Technologies Example (3/3)

44 An Introduction to XML and Web Technologies Connecting Schemas and Instances <b:card xmlns:b=" xmlns:xsi=" xsi:schemaLocation=" business_card.xsd"> John Doe CEO, Widget Inc. (202)

45 An Introduction to XML and Web Technologies Types and Declarations  Simple type definition: defines a family of Unicode text strings  Complex type definition: defines a content and attribute model  Element declaration: associates an element name with a simple or complex type  Attribute declaration: associates an attribute name with a simple type

ELEMENT DECLARATIONS ATTRIBUTE DECLARATIONS 46 An Introduction to XML and Web Technologies

47 An Introduction to XML and Web Technologies Element and Attribute Declarations Examples:

SIMPLE TYPE DEFINITIONS 48 An Introduction to XML and Web Technologies

49 An Introduction to XML and Web Technologies Simple Types (Datatypes) – Primitive string any Unicode string boolean true, false, 1, 0 decimal float E23 double 42E970 dateTime T16:29:00-05:00 time 16:29:00-05:00 date hexBinary 48656c6c6f0a base64Binary SGVsbG8K anyURI QName rcp:recipe, recipe...

Exercise: write an attribute declaration for the “name” attribute of. (Hint, “ ”)

51 An Introduction to XML and Web Technologies Examples regular expression

Exercise: write an attribute declaration for the “no” attribute of. (Hint, “ ” ”...”)

53 An Introduction to XML and Web Technologies Derivation of Simple Types – Restriction Constraining facets: length minLength maxLength pattern enumeration whiteSpace maxInclusive maxExclusive minInclusive minExclusive totalDigits fractionDigits

Simple Type Derivation – Enumeration 54 An Introduction to XML and Web Technologies

Exercise: write an attribute declaration for the “type” attribute of. (Values: iron, wood, putter) (Hint, “ ” ”...”)

56 An Introduction to XML and Web Technologies Simple Type Derivation – Union

57 An Introduction to XML and Web Technologies Built-In Derived Simple Types normalizedString token language Name NCName ID IDREF integer nonNegativeInteger unsignedLong long int short byte...

COMPLEX TYPE DEFINITIONS 58 An Introduction to XML and Web Technologies

59 An Introduction to XML and Web Technologies Example

Exercise: write an element declaration for Exercise: write an element declaration for (“ ” and ” <attribute...” )

61 An Introduction to XML and Web Technologies Complex Types with Complex Contents  Content models as regular expressions: Element reference Concatenation... Union... All... Element wildcard:  Attribute reference:  Attribute wildcard: Cardinalities: minOccurs, maxOccurs, use Mixed content: mixed=”true”

Exercise: Write an element declaration for Exercise: Write an element declaration for <club no="p" type="putter" name="Extreme Precision Pro 1998"/>

Extended Example: Business Cards (click)

Thank you for listening Next week: Beyond the Basics of XML Schema

65 An Introduction to XML and Web Technologies Complex Types with Simple Content

66 An Introduction to XML and Web Technologies Derivation with Complex Content <element ref="b: " minOccurs="0"/> Note: restriction is not the opposite of extension !

67 An Introduction to XML and Web Technologies Global vs. Local Descriptions Global (toplevel) style: <element name="card“ type="b:card_type"/> <element name="name“ type="string"/>... Local (inlined) style: <element name="name" type="string"/>... inlined

68 An Introduction to XML and Web Technologies Global vs. Local Descriptions  Local type definitions are anonymous  Local element/attribute declarations can be overloaded – a simple form of context sensitivity (particularly useful for attributes!)  Only globally declared elements can be starting points for validation (e.g. roots)  Local definitions permit an alternative namespace semantics (explained later...)

69 An Introduction to XML and Web Technologies Requirements to Complex Types  Two element declarations that have the same name and appear in the same complex type must have identical types This requirement makes efficient implementation easier  all can only contain element (e.g. not sequence !) so we cannot use all to solve the problem with comment in RecipeML ...

70 An Introduction to XML and Web Technologies Namespaces   Prefixes are also used in certain attribute values!  Unqualified Locals: if enabled, the name of a locally declared element or attribute in the instance document must have no namespace prefix (i.e. the empty namespace URI) such an attribute or element “belongs to” the element declared in the surrounding global definition always change the default behavior using elementFormDefault="qualified"

71 An Introduction to XML and Web Technologies Derived Types and Subsumption  Assume that T is some type T- is derived from T by restriction T+ is derived from T by extension  Subsumption: Whenever a T instance is required, a T- instance may be used instead (trivial) a T+ instance may be used instead – if the instance has xsi:type=” T+ ” (with xmlns:xsi=" )  Derivation, instantiation, and subsumption can be constrained using final, abstract, and block

72 An Introduction to XML and Web Technologies Substitution Groups  Assume D is (in some number of steps) derived from B, E D is an element declaration of type D, and E B is an element declaration of type B  If E D is in substitution group of E B then an E D element may be used whenever an E B is required  (This is subsumption based on element declarations, not on types)

73 An Introduction to XML and Web Technologies Uniqueness, Keys, References... unique : as key, but fields may be absent in every widget, each part must have unique ( manufacturer, productid ) in every widget, for each annotation, ( manu, prod ) must match a my_widget_key only a “downward” subset of XPath is used

74 An Introduction to XML and Web Technologies Other Features in XML Schema  Groups  Nil values  Annotations  Defaults and whitespace  Modularization – read the book chapter

75 An Introduction to XML and Web Technologies RecipeML with XML Schema (1/5) <schema xmlns=" xmlns:r=" targetNamespace=" elementFormDefault="qualified">

76 An Introduction to XML and Web Technologies RecipeML with XML Schema (2/5)

77 An Introduction to XML and Web Technologies RecipeML with XML Schema (3/5)

78 An Introduction to XML and Web Technologies RecipeML with XML Schema (4/5)

79 An Introduction to XML and Web Technologies RecipeML with XML Schema (5/5)

80 An Introduction to XML and Web Technologies Problems with the XML Schema description  calories should contain a non-negative number  protein should contain a value on the form N % where N is between 0 and 100;  comment should be allowed to appear anywhere in the contents of recipe  unit should only be allowed in an elements where amount is also present  nested ingredient elements should only be allowed when amount is absent – even XML Schema has insufficient expressiveness! solved

81 An Introduction to XML and Web Technologies Limitations of XML Schema 1. The details are extremely complicated (and the spec is unreadable) 2. Declarations are (mostly) context insentitive 3. It is impossible to write an XML Schema description of XML Schema 4. With mixed content, character data cannot be constrained 5. Unqualified local elements are bad practice 6. Cannot require specific root element 7. Element defaults cannot contain markup 8. The type system is overly complicated 9. xsi:type is problematic 10. Simple type definitions are inflexible

82 An Introduction to XML and Web Technologies Strengths of XML Schema  Namespace support  Data types (built-in and derivation)  Modularization  Type derivation mechanism

83 An Introduction to XML and Web Technologies Document Structure Description 2.0 – read the book chapter

84 An Introduction to XML and Web Technologies RELAX NG  OASIS + ISO competitor to XML Schema  Validation only (no normalization)  Designed for simplicity and expressiveness, solid mathematical foundation

85 An Introduction to XML and Web Technologies Processing Model  For a valid instance document, the root element must match a designated pattern  A pattern may match elements, attributes, or character data  Element patterns can contain sub-patterns, that describe contents and attributes

86 An Introduction to XML and Web Technologies Patterns – Regular Hedge Expressions ...  ... (concatenation) ... ... (union)  ...

87 An Introduction to XML and Web Technologies Example

88 An Introduction to XML and Web Technologies Grammars  Pattern definitions and references allow description of recursive structures 

89 An Introduction to XML and Web Technologies Other Features in RELAX NG  Name classes  Datatypes (based on XML Schema’s datatypes)  Modularization  An alternative compact, non-XML syntax – read the book chapter

90 An Introduction to XML and Web Technologies RecipeML with RELAX NG (1/5) <grammar xmlns=" ns=" datatypeLibrary="

91 An Introduction to XML and Web Technologies RecipeML with RELAX NG (2/5)

92 An Introduction to XML and Web Technologies RecipeML with RELAX NG (3/5) *

93 An Introduction to XML and Web Technologies RecipeML with RELAX NG (4/5)

94 An Introduction to XML and Web Technologies RecipeML with RELAX NG (5/5) ([0-9]|[1-9][0-9]|100)% 0

95 An Introduction to XML and Web Technologies Summary  schema: formal description of the syntax of an XML language  DTD: simple schema language elements, attributes, entities,...  XML Schema: more advanced schema language element/attribute declarations simple types, complex types, type derivations global vs. local descriptions...

96 An Introduction to XML and Web Technologies Essential Online Resources   