Copyright 2003-05 John Cowan under the GNU GPL1 Describing Document Types: The Schema Languages of XML Part 1 John Cowan.

Slides:



Advertisements
Similar presentations
XML I.
Advertisements

ISO DSDL ISO – Document Schema Definition Languages (DSDL) Martin Bryan Convenor, JTC1/SC18 WG1.
XML 6.3 DTD 6. XML and DTDs A DTD (Document Type Definition) describes the structure of one or more XML documents. Specifically, a DTD describes:  Elements.
XML Document Type Definitions ( DTD ). 1.Introduction to DTD An XML document may have an optional DTD, which defines the document’s grammar. Since the.
Document Type Definitions
15-Jun-15 RELAX NG. 2 What is RELAX NG? RELAX NG is a schema language for XML It is an alternative to DTDs and XML Schemas It is based on earlier schema.
Introduction to XLink Transparency No. 1 XML Information Set W3C Recommendation 24 October 2001 (1stEdition) 4 February 2004 (2ndEdition) Cheng-Chia Chen.
Lecture 14 XML Validation. a simple element containing text attribute; attributes provide additional information about an element and consist of a name.
 2002 Prentice Hall, Inc. All rights reserved. ISQA 407 XML/WML Winter 2002 Dr. Sergio Davalos.
RELAX NG. Caveat I did not have a RELAX NG validator when I wrote these slides. Therefore, if an example appears to be wrong, it probably is.
Sunday, June 28, 2015 Abdelali ZAHI : FALL 2003 : XML Schemas XML Schemas Presented By : Abdelali ZAHI Instructor : Dr H.Haddouti.
XML Verification Well-formed XML document  conforms to basic XML syntax  contains only built-in character entities Validated XML document  conforms.
Document Type Definitions. XML and DTDs A DTD (Document Type Definition) describes the structure of one or more XML documents. Specifically, a DTD describes:
Introduction to XML This material is based heavily on the tutorial by the same name at
Copyright John Cowan under the GNU GPL1 RELAX NG: DTDs ON WARP DRIVE John Cowan Reuters.
**1 RELAX NG: DTDs ON WARP DRIVE John Cowan Reuters Health Information.
XP New Perspectives on XML Tutorial 4 1 XML Schema Tutorial – Carey ISBN Working with Namespaces and Schemas.
XML Validation I DTDs Robin Burke ECT 360 Winter 2004.
Tutorial 3: XML Creating a Valid XML Document. 2 Creating a Valid Document You validate documents to make certain necessary elements are never omitted.
XP New Perspectives on XML Tutorial 3 1 DTD Tutorial – Carey ISBN
Lecture 15 XML Validation. a simple element containing text attribute; attributes provide additional information about an element and consist of a name.
Validating DOCUMENTS with DTDs
Why XML ? Problems with HTML HTML design - HTML is intended for presentation of information as Web pages. - HTML contains a fixed set of markup tags. This.
Chapter 4: Document Type Definitions. Chapter 4 Objectives Learn to create DTDs Validate an XML document against a DTD Use DTDs to create XML documents.
Dr. Azeddine Chikh IS446: Internet Software Development.
XML CPSC 315 – Programming Studio Fall 2008 Project 3, Lecture 1.
XP 1 CREATING AN XML DOCUMENT. XP 2 INTRODUCING XML XML stands for Extensible Markup Language. A markup language specifies the structure and content of.
Document Type Definitions Kanda Runapongsa Dept. of Computer Engineering Khon Kaen University.
XML 1 Enterprise Applications CE00465-M XML. 2 Enterprise Applications CE00465-M XML Overview Extensible Mark-up Language (XML) is a meta-language that.
XML Syntax - Writing XML and Designing DTD's
XP 1 DECLARING A DTD A DTD can be used to: –Ensure all required elements are present in the document –Prevent undefined elements from being used –Enforce.
Processing of structured documents Spring 2002, Part 2 Helena Ahonen-Myka.
1 Tutorial 13 Validating Documents with DTDs Working with Document Type Definitions.
Avoid using attributes? Some of the problems using attributes: Attributes cannot contain multiple values (child elements can) Attributes are not easily.
XML A web enabled data description language 4/22/2001 By Mark Lawson & Edward Ryan L’Herault.
1 Chapter 10: XML What is XML What is XML Basic Components of XML Basic Components of XML XPath XPath XQuery XQuery.
New Perspectives on XML, 2nd Edition
IS432 Semi-Structured Data Lecture 2: DTD Dr. Gamal Al-Shorbagy.
XML Validation I DTDs Robin Burke ECT 360 Winter 2004.
An OO schema language for XML SOX W3C Note 30 July 1999.
XML Instructor: Charles Moen CSCI/CINF XML  Extensible Markup Language  A set of rules that allow you to create your own markup language  Designed.
Lecture 16 Introduction to XML Boriana Koleva Room: C54
An Introduction to XML Sandeep Bhattaram
XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like.
1 CIS336 Website design, implementation and management (also Semester 2 of CIS219, CIS221 and IT226) Lecture 5 XML Schema (Based on Møller and Schwartzbach,
Sheet 1XML Technology in E-Commerce 2001Lecture 2 XML Technology in E-Commerce Lecture 2 Logical and Physical Structure, Validity, DTD, XML Schema.
XML 2nd EDITION Tutorial 4 Working With Schemas. XP Schemas A schema is an XML document that defines the content and structure of one or more XML documents.
1 Tutorial 14 Validating Documents with Schemas Exploring the XML Schema Vocabulary.
Tutorial 13 Validating Documents with Schemas
Schematron Tim Bornholtz. Schema languages Many people turn to schema languages when they want to be sure that an XML instance follows certain rules –DTD.
SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.
XML Validation II Schemas Robin Burke ECT 360. Outline Namespaces Documents  Data types XML Schemas Elements Attributes Derived data types RELAX NG.
QUALITY CONTROL WITH SCHEMAS CSC1310 Fall BASIS CONCEPTS SchemaSchema is a pass-or-fail test for document Schema is a minimum set of requirements.
When we create.rtf document apart from saving the actual info the tool saves additional info like start of a paragraph, bold, size of the font.. Etc. This.
Copyright 2004 John Cowan 1 Infinite Diversity in Infinite Combinations why one schema language is not enough John Cowan.
Document Type Definition (DTD) Eugenia Fernandez IUPUI.
XML Validation II Advanced DTDs + Schemas Robin Burke ECT 360.
XML Validation. a simple element containing text attribute; attributes provide additional information about an element and consist of a name value pair;
Using DSDL plus annotations for Netconf (+) data modeling Rohan Mahy draft-mahy-canmod-dsdl-01.
XML CORE CSC1310 Fall XML DOCUMENT XML document XML document is a convenient way for parsers to archive data. In other words, it is a way to describe.
C Copyright © 2011, Oracle and/or its affiliates. All rights reserved. Introduction to XML Standards.
Jackson, Web Technologies: A Computer Science Perspective, © 2007 Prentice-Hall, Inc. All rights reserved Chapter 7 Representing Web Data:
CITA 330 Section 2 DTD. Defining XML Dialects “Well-formedness” is the minimal requirement for an XML document; all XML parsers can check it Any useful.
Unit 4 Representing Web Data: XML
Infinite Diversity in Infinite Combinations
Data Modeling II XML Schema & JAXB Marc Dumontier May 4, 2004
Describing Document Types: The Schema Languages of XML Part 1
New Perspectives on XML
Document Type Definition (DTD)
New Perspectives on XML
Presentation transcript:

Copyright John Cowan under the GNU GPL1 Describing Document Types: The Schema Languages of XML Part 1 John Cowan

Copyright John Cowan under the GNU GPL2 Copyright Copyright © John Cowan Licensed under the GNU General Public License ABSOLUTELY NO WARRANTIES; USE AT YOUR OWN RISK Portions written by James Clark; used by permission Black and white for readability The Gentium font available at

Copyright John Cowan under the GNU GPL3 Abstract This tutorial will teach you the basics of several XML schema languages: DTDs, RELAX NG, Schematron, and W3C XML Schema. You will end up understanding the principles of each and their advantages and disadvantages in various applications. Prerequisites: An understanding of basic XML concepts. Knowledge of any XML or SGML schema languages is helpful but certainly not a requirement.

Copyright John Cowan under the GNU GPL4 Part 1 Abstract In this part of the tutorial you will learn how to use the RELAX NG schema language, one of the principal schema languages for XML. RELAX NG allows easy and intuitive descriptions of just what is and what is not allowed in an XML document. It is simple enough to learn in a few hours, and rich and flexible enough to support the design and validation of every kind of document from the very simple to the very complex. We will also review DTDs, and talk briefly about Schematron (a rules-based schema language) and NVDL (a meta-schema language for compound documents).

Copyright John Cowan under the GNU GPL5 Roadmap Review of DTDs (17) RELAX NG goals (21) Invoice example(10) Basic Patterns (13) More Patterns (13) Datatypes (11) Tools (14) NVDL (4) Schematron (12)

Copyright John Cowan under the GNU GPL6 REVIEW OF DTDS

Copyright John Cowan under the GNU GPL7 DTDs and documents DTDs can appear within a document, outside a document, or both The DOCTYPE declaration specifies a DTD Internal subset: External subset: External before internal

Copyright John Cowan under the GNU GPL8 Functions of the DTD Document validation Default values for missing attributes Entity declaration and replacement Document documentation

Copyright John Cowan under the GNU GPL9 ELEMENT declarations Specify what all the elements that share a common name can contain (the content model) No direct support for namespaces Elements can contain –Nothing –Text only –Text and child elements –Just child elements

Copyright John Cowan under the GNU GPL10 ELEMENT declaration syntax Empty: Text only: Text and child elements:

Copyright John Cowan under the GNU GPL11 Element content models Sequence of child elements: Choice between child elements: Sequence of optional child elements:

Copyright John Cowan under the GNU GPL12 Element content models Repeated child elements (zero or more): Repeated child elements (one or more): Any of these possibilities can be freely combined

Copyright John Cowan under the GNU GPL13 Element content models Choice between sequences: ((a, b) | (c, d) Sequence of choices: ((a | b), (c | d)) Optional sequence: (a, b, c)? You cannot mix, and | in one list; use parentheses to disambiguate

Copyright John Cowan under the GNU GPL14 Restrictions on content models A mixed model cannot constrain how many times or in what order the child elements appear, only which elements are allowed An element-only content model requires that each child element in the instance match exactly one part of the content model – (A?, A?) is not a legal content model

Copyright John Cowan under the GNU GPL15 Element wildcard specifies an element that can contain any child elements declared elsewhere in the DTD May include text as well Cannot include undeclared elements

Copyright John Cowan under the GNU GPL16 ATTLIST declarations Declares the allowed attributes of an element

Copyright John Cowan under the GNU GPL17 ATTLIST declarations Any number of attributes for a single element type may be declared in a single ATTLIST Any number of ATTLIST declarations may be used for a given element type The typical case: one ATTLIST per element type

Copyright John Cowan under the GNU GPL18 Attribute types DTDs support only a few attribute types CDATA: no constraints ID : an identifier for the element IDREF : must match some element's ID NMTOKEN : a name or number

Copyright John Cowan under the GNU GPL19 Attribute types IDREFS : one or more space-separated element IDs NMTOKENS : one or more space- separated names or numbers Enumerated tokens: value must be one

Copyright John Cowan under the GNU GPL20 Attribute defaults Specific value required (supplied by parser): #FIXED "value" Attribute required but no specific value: #REQUIRED Attribute not required, default value: "value" Attribute not required, no default value: #IMPLICIT

Copyright John Cowan under the GNU GPL21 ENTITY declarations Declare a name for text to be inserted into a document Entity references are of the form &name; The text is typically just one character long There is a long list of ISO-standardized character names General entities have no equivalents in other schema languages

Copyright John Cowan under the GNU GPL22 Parameter entity declarations Declare a name for text (a single string or a whole file) to be inserted into a DTD; Allow parameterization of DTDs through included/ignored sections A trick involving parameter entities makes it possible to handle namespace prefixes Various specialized rules make parameter entities difficult to use

Copyright John Cowan under the GNU GPL23 Obsolescent features Unparsed entities –Non-XML objects –Declared by declarations –Referred to by attributes of type ENTITY or ENTITIES NOTATION declarations –Specifies type of an unparsed entity –Specifies type of a text-only element using attributes of type NOTATION

Copyright John Cowan under the GNU GPL24 RELAX NG GOALS

Copyright John Cowan under the GNU GPL25 "DTDs on warp drive" An evolution/generalization of DTDs Shares the same basic paradigm Based on experience with SGML, XML Adds and subtracts features from DTDs DTDs can be automatically converted

Copyright John Cowan under the GNU GPL26 Reusable knowledge Experts in designing SGML and XML DTDs will find their skills transfer easily Design patterns commonly used in XML DTDs can be reused Much more mature than if based on a completely new and different paradigm Higher degree of confidence in its design is possible

Copyright John Cowan under the GNU GPL27 Easy to learn and use Allows schemas to be patterned after the structure of the documents they describe Allows definitions to be composed from other definitions in a variety of ways Treats attributes and elements as uniformly as possible

Copyright John Cowan under the GNU GPL28 Namespaces DTDs are namespace-blind RELAX NG fully supports namespaces for elements and attributes Namespace support is purely syntactic, not tied to one schema per namespace Name classes support “any name” and “any name in specified namespace”

Copyright John Cowan under the GNU GPL29 Datatyping Supports pluggable simple datatype libraries Basic library supports strings and tokens Full XML Schema Part 2 datatypes available (including facets) New libraries can be readily designed and built as needed.

Copyright John Cowan under the GNU GPL30 Composability Schema languages provide atomic objects (elements, attributes, text, typed data) and methods of composing them (sequence, repetition, choice) All RELAX NG atomic objects can be composed with any available method Improves ease of learning, use, power; decreases complexity

Copyright John Cowan under the GNU GPL31 Closure RELAX NG is closed under union –If two schemas exist describing two document types, then a schema describing the union of the two document types is trivial to create –Consequently, the content model of an element can be context-dependent

Copyright John Cowan under the GNU GPL32 Two syntaxes, one language Provides two interconvertible syntaxes: –an XML one for processing –a compact non-XML one for human authoring We will learn the compact syntax One example of the XML syntax is provided to assist in learning it

Copyright John Cowan under the GNU GPL33 Attributes “Elements or attributes?” –Reasonable people can differ –Attributes are treated as much like elements as possible Content models include elements as well as attributes Attribute defaulting is not done

Copyright John Cowan under the GNU GPL34 Non-goal: attribute defaulting Attribute defaulting can only be done by DTDs or W3C XML Schema when the value does not depend on context Sensible attribute defaults often depend on context (inheritance of xml:lang, e.g.) Attribute defaulting is trivial for transformation languages such as XSLT

Copyright John Cowan under the GNU GPL35 Non-goal: PSVI RELAX NG has no post-validation infoset enhancement Infoset enhancement can be done as a separate layer Sun’s Multi-Schema Validator provides datatype information Separation of concerns promotes efficiency, flexibility

Copyright John Cowan under the GNU GPL36 Annotations Annotations in the form of elements and attributes can be interspersed in RELAX NG schemas for various purposes: –DTD-style attribute defaults –documentation –embedded Java code Conforming RELAX NG validators ignore annotations

Copyright John Cowan under the GNU GPL37 Mixed content SGML had problems with complex mixed-content models XML DTDs tightly restrict mixed-content models RELAX NG allows character content mixed with any content model

Copyright John Cowan under the GNU GPL38 Unordered content SGML’s & operator allows unordered content models – A & B means ((A, B) | (B, A)) XML DTDs removed & to reduce implementation complexity RELAX NG restores & with improvements

Copyright John Cowan under the GNU GPL39 Customization Definitions included from another schema can be overridden Multiple definitions from the same or different schemas can be intelligently combined –as if with | –as if with &

Copyright John Cowan under the GNU GPL40 A real standard Standardized in OASIS by the RELAX NG Technical Committee A major component of ISO DSDL, the Document Schema Definition Languages umbrella-standard

Copyright John Cowan under the GNU GPL41 Non-goal: inheritance Inheritance-based schemas only model single inheritance Modeling often requires multiple inheritance (at least for interfaces) Schema languages are really about syntactic details, not about models

Copyright John Cowan under the GNU GPL42 Non-goal: identity constraints Identity constraints are not supported Identity constraints are still a developing research area Different applications have different requirements from simple to complex Some RELAX NG tools support DTD- style semantics for ID, IDREF(S)

Copyright John Cowan under the GNU GPL43 Non-goal: schema binding There is no standard way for a document to specify “its schema” Receivers often want to verify against agreed- on schemas, not sender-specified ones Documents may be validated against different schemas for different purposes The validation model takes two inputs: a document and a schema Just part of the XML processing issue

Copyright John Cowan under the GNU GPL44 Interoperability You can convert a DTD to RELAX NG, preserving modularity You can author in RELAX NG and deliver as a DTD or a W3C XML Schema or both RELAX NG allows embedded Schematron rules

Copyright John Cowan under the GNU GPL45 Pronunciation "Relaxing" is the standard way Some people say "relax en gee"

Copyright John Cowan under the GNU GPL46 AND NOW TO RELAX!

Copyright John Cowan under the GNU GPL47

Copyright John Cowan under the GNU GPL48 THE INVOICE EXAMPLE

Copyright John Cowan under the GNU GPL49 An invoice in XML Reuters Health Information 45 West 36th St. New York NY Reuters Health Information 45 West 36th St. New York NY Net 10 days Binder, D-ring, 1.5" Fork, Plastic, Heavy, Medium

Copyright John Cowan under the GNU GPL50 The invoice schema (1) element invoice { attribute number { text }, attribute date { text }, element soldTo { element name { text }, element address { text } }, element shipTo { element name { text }, element address { text } },

Copyright John Cowan under the GNU GPL51 The invoice schema (2) element terms { text }, element item { attribute unitPrice { text }, attribute ordered { text }, attribute shipped { text }, attribute backOrdered { text }?, text }* }

Copyright John Cowan under the GNU GPL52 Things to note The structure of the schema parallels the structure of the document Element content models include attributes as well as child elements The optional attribute is marked with ? text is the equivalent of #PCDATA or CDATA

Copyright John Cowan under the GNU GPL53 Things to note Commas separate multiple components of a content model when the components appear in the given order Of course, the order of attributes does not matter! Consequently, attributes can appear in the schema before, after, or mixed in with child elements

Copyright John Cowan under the GNU GPL54 The XML format

Copyright John Cowan under the GNU GPL55 Definition form start = invoice element invoice { attribute number { text }, attribute date { text }, soldTo, shipTo, terms, item } soldTo = element soldTo { name, address } shipTo = element shipTo { name, address } terms = element terms { text }

Copyright John Cowan under the GNU GPL56 Definition form item = element item { attribute unitPrice { text }, attribute ordered { text }, attribute shipped { text }, attribute backOrdered { text }?, text } name = element name { text } address = element address { text }

Copyright John Cowan under the GNU GPL57 Definition form notes In definition form, there must always be a definition of start You refer to a rule using just its name The order of the rules does not matter; use whatever order makes sense to you (top-down, bottom-up, alphabetical) Rule names are only relevant to the schema, and never appear in the document instance

Copyright John Cowan under the GNU GPL58 Three schema designs "Russian Doll": a single pattern for the whole document "Salami Slice": one definition for each differently-named element (DTD style) "Venetian Blind": one definition for each pattern specifying the content of an element, plus one for the document element

Copyright John Cowan under the GNU GPL59 Venetian Blind schema start = element invoice {invoice} invoice = attribute number { text }, attribute date { text }, element soldTo { name-addr }, element shipTo { name-addr }, element terms { text }, element item { item }*

Copyright John Cowan under the GNU GPL60 Venetian Blind schema item = attribute unitPrice { text }, attribute ordered { text }, attribute shipped { text }, attribute backOrdered { text }?, text name-addr = element name { text }, element address { text }

Copyright John Cowan under the GNU GPL61 BASIC PATTERNS

Copyright John Cowan under the GNU GPL62 Patterns Patterns are the basic building blocks of RELAX NG schemas and rules Some kinds of patterns can contain sub- patterns enclosed in braces ( { … } )

Copyright John Cowan under the GNU GPL63 Element patterns Syntax: element name { … } The content model (child elements and attributes) is contained within the braces Content models consist of one or more patterns

Copyright John Cowan under the GNU GPL64 Attribute patterns Syntax: attribute name { … } The content model is contained within the braces Content models consist of one or more patterns You can't have child elements or attributes within attributes, of course!

Copyright John Cowan under the GNU GPL65 Attribute patterns So what patterns can be inside attributes? –The text pattern - equivalent to CDATA –Datatypes (details later) –Literal strings in quotes: attribute country { "US" } means the country attribute must have the value US.

Copyright John Cowan under the GNU GPL66 Element patterns So what patterns can be inside elements? –The text pattern - equivalent to #PCDATA –Datatypes (details later) –Literal strings in quotes: element country { "US" } means the country element must have the content US.

Copyright John Cowan under the GNU GPL67 The text pattern Matches any amount of arbitrary text, possibly broken up by child elements Equivalent to #PCDATA in elements or CDATA in attributes text*, text?, text+ all mean the same as text

Copyright John Cowan under the GNU GPL68 Namespaces To declare elements and attributes in namespaces, use QNames in element and attribute patterns Namespace prefixes are declared like this: namespace foo = " (some URI) " Namespace declarations must come first in the schema

Copyright John Cowan under the GNU GPL69 Default namespaces You can declare a namespace for unprefixed elements (not attributes) like this: default namespace = " (some URI) " If you want the default namespace to have a prefix too, use: default namespace foo = "( some URI) "

Copyright John Cowan under the GNU GPL70 Namespaces Here's an example: namespace one = " namespace two = " default namespace = " element para { attribute one:class { text }, attribute two:class { text }, element line { text }* }

Copyright John Cowan under the GNU GPL71 Choice Two patterns separated by | represent a choice between them; the document can match one pattern or the other, not both Arbitrary patterns are allowed in a choice: you can have a choice between attributes, between elements, or even between an element and an attribute!

Copyright John Cowan under the GNU GPL72 Choice A useful case: element data { (element id { text } | attribute id { text }), text } You cannot mix, and | in one list; use parentheses to disambiguate

Copyright John Cowan under the GNU GPL73 Choice Enumerated values use choice like this: element font { attribute size { "10" | "12" | "14" | "16" } }

Copyright John Cowan under the GNU GPL74 Quantifiers You can place an *, ?, or + after any pattern to allow it to be repeated: – * means zero or more times – ? means zero or one times – + means one or more times These mean the same as in DTD content models, but can be used after any pattern, not just rule names

Copyright John Cowan under the GNU GPL75 MORE PATTERNS

Copyright John Cowan under the GNU GPL76 Interleave Interleave is a cross between choice and sequence When patterns are combined with &, they all must appear but it can be in any order (as in SGML) … … or even mixed together!

Copyright John Cowan under the GNU GPL77 Interleave So this schema... element head { element meta { empty }* & element title { text } } … matches a head element that has any number of meta child elements (including zero) and a required title child element mixed in anywhere.

Copyright John Cowan under the GNU GPL78 Interleave Note: In the case of attributes, sequence and interleave are the same thing, because attributes don't have ordering So you can use either, or & according to what is the most convenient The pattern mixed {... } is a synonym for (text & (... ))

Copyright John Cowan under the GNU GPL79 Multiple element and attribute patterns An element or attribute pattern with multiple names separated by | matches elements bearing any of those names: element h1|h2|h3|h4|h5|h6 { heading.model }

Copyright John Cowan under the GNU GPL80 Element wildcards An element pattern with * instead of a name matches an element with any name To match any name in a particular namespace, use foo:* where foo is the prefix declared for the namespace To match any name except foo and bar, use * - (foo|bar)

Copyright John Cowan under the GNU GPL81 Attribute wildcards Attribute wildcards are declared just like element wildcards An attribute wildcard pattern must be followed by * or + It makes no sense to specify an element with "just one wildcard attribute"

Copyright John Cowan under the GNU GPL82 ANY? There is no built-in ANY content model, corresponding to empty for empty content models or notAllowed for forbidden models Here’s how it can be done: ANY = element * { attribute * {text}* & text & ANY* }

Copyright John Cowan under the GNU GPL83 Multiple schemas external incorporates one pattern document into another include incorporates one set of rules into another, and allows for overriding any of the included rules by name –In particular, overriding the start rule is usually necessary Rules with identical names can also be combined by choice or by interleave

Copyright John Cowan under the GNU GPL84 Context sensitivity The first paragraph cannot have footnotes; the remainder can: start = element doc {first, other*} first = element para { text } other = element para { mixed { element footnote {text}*} }

Copyright John Cowan under the GNU GPL85 Restrictions on schemas The obvious XML ones: no elements or attributes within attributes, only one top- level element, etc. etc. Attributes can't have conflicting definitions in a single element

Copyright John Cowan under the GNU GPL86 Restrictions on schemas Interleave doesn't allow an element with a given name, or text either, to be used in more than one interleaved subpattern The following are illegal: –(a, b, c) & (a, d, e) –(

Copyright John Cowan under the GNU GPL87 A few lexical details Strings (used as values and parameter values) can be wrapped in double quotes or single quotes Multi-line strings are wrapped in """ or ''' (as in Python) Rule names that are the same as syntactic keywords must be preceded by \

Copyright John Cowan under the GNU GPL88 Comments Ordinary comments begin with # Documentation comments begin with ## and are copied (in groups) into the XML syntax as a:documentation elements The a: prefix represents the namespace of the DTD Compatibility extension to RELAX NG

Copyright John Cowan under the GNU GPL89 DATATYPES

Copyright John Cowan under the GNU GPL90 Datatypes A type is a named set of values An datatype provides a standardized, machine-checkable representation of a type

Copyright John Cowan under the GNU GPL91 Schema datatypes DTDs have only a few datatypes for attributes and only one datatype for elements XML Schema provides a long, but fixed, list of datatypes RELAX NG can work with any datatype library, including the XSD (XML Schema Datatypes) library

Copyright John Cowan under the GNU GPL92 RELAX NG datatypes Datatype patterns are written using QNames This use of QNames can't be confused with QNames for elements or attributes, because those are only recognized after the words element and attribute The built-in datatypes string and token don’t have prefixes and are recognized by all implementations

Copyright John Cowan under the GNU GPL93 Declaring datatype libraries A prefix is declared like this: datatypes lib = " (some URI) " Datatype library declarations must come first RELAX NG processors recognize a system-dependent list of datatype library URIs

Copyright John Cowan under the GNU GPL94 Useful datatypes The xsd prefix is predeclared for XML Schema Datatypes xsd:integer represents an integer of arbitrary length There are xsd: equivalents for all the DTD attribute types ( ID, IDREF, etc.) We'll discuss all the xsd types later.

Copyright John Cowan under the GNU GPL95 Typed values "0" and token "0" match a “0” character with possible surrounding whitespace string "0" matches a “0” character exactly xsd:integer "0" matches “0” or “00” or “000” or “-0” or...

Copyright John Cowan under the GNU GPL96 Simple and complex content Element patterns containing a datatype or a value (or a list) specify simple content Element patterns containing child elements or text or both specify complex content No element pattern can contain both –A choice between simple and complex content is legal

Copyright John Cowan under the GNU GPL97 Datatype exceptions xsd:nonNegativeInteger and xsd:nonPositiveInteger are existing types How do we say “non-zero integer”? xsd:integer - xsd:integer "0" We can likewise express a string that is not a name: xsd:string - xsd:Name

Copyright John Cowan under the GNU GPL98 Parameters Parameters restrict the values of datatypes Each datatype has specific parameters are legal with it An integer between 0 and 999 inclusive: xsd:integer { minInclusive = "0" maxInclusive = "999" }

Copyright John Cowan under the GNU GPL99 Lists List patterns specify that simple content is to be divided by whitespace into tokens The pattern list {xsd:integer*} matches a list of zero or more white- space-separated integers This example needs * because list itself does not imply repetition

Copyright John Cowan under the GNU GPL100 Lists The pattern list {(xsd:integer, token)+} matches the string " 32 foo 45 bar 76 baz" Lists within lists are not allowed

Copyright John Cowan under the GNU GPL101 TOOLS

Copyright John Cowan under the GNU GPL102 The Jing validator Written by James Clark, principal author of RELAX NG Java based command-line tool –Validates schemas –Validates documents against schemas Accepts either compact or XML syntax Optionally enforces DTD ID/IDREF

Copyright John Cowan under the GNU GPL103 The Jing validator Also usable as a validation library within a Java program Provides JAXP (Sun-standard) interface Provides native interface Validates against other schema languages: –W3C XML Schema –Schematron –Namespace Routing Language

Copyright John Cowan under the GNU GPL104 The Trang translator Another James Clark product Translates schemas: –Input: XML syntax, compact syntax, DTDs –Output: XML syntax, compact syntax, DTDs, W3C XML Schema Output schemas may be looser than the input schema (accept a superset of what the input accepts) DTD and W3C XML Schema output is imperfect

Copyright John Cowan under the GNU GPL105 Sun RELAX NG translator Translates other schema languages into RELAX NG in XML syntax: –DTDs –RELAX Core/Namespace –TREX (predecessor of RELAX NG) –Subset of XML Schema Does not preserve schema structure

Copyright John Cowan under the GNU GPL106 Instance to schema InstanceToSchema generates a RELAX NG schema from one or more XML instances Examplotron ( is a schema language that resembles an instance with optional annotations and is translated into RELAX NG

Copyright John Cowan under the GNU GPL107 Sun Multi-Schema Validator By Kohsuke KAWAGUCHI, a major RELAX NG contributor Validates documents (command-line or library) using any schema language supported by the Sun RELAX NG Translator Also handles stand-alone or embedded Schematron rules

Copyright John Cowan under the GNU GPL108 Validation in.NET Tenuto is a C# implementation of validation for the Common Language Runtime environment Supports XML syntax, XSD library Does not support ID/IDREF semantics RelaxngValidatingReader is an unrelated implementation of XMLReader that validates input against a RELAX NG schema

Copyright John Cowan under the GNU GPL109 The Bali validator generator Accepts a RELAX NG schema and generates special-purpose code to validate documents against that particular schema The generated validator is faster and smaller than a general-purpose validator Generates Java, C++, or C#

Copyright John Cowan under the GNU GPL110 The VBRELAXNG validator A validator for the XML syntax written in Visual Basic 6.0 Provides validation as an ActiveX control Requires MSXML 4.0 parser Topologi Schematron Validator uses this control

Copyright John Cowan under the GNU GPL111 The RelaxNGCC compiler Accepts a subset of RELAX NG (no ambiguous grammars) with annotations; RelaxMeter tool checks for ambiguities Compiles specified RELAX NG rules into Java classes Embedded Java code can refer to the values matched by datatype, text, and list patterns Analogous to JavaCC

Copyright John Cowan under the GNU GPL112 The RelaxNGCC compiler The generated code requires a source of SAX events (typically a parser) Objects of the generated classes are data bindings for RELAX NG rules considered as types The killer app for RELAX NG?

Copyright John Cowan under the GNU GPL113 The RELAXER compiler Generates Java objects from RELAX NG schemas without embedded code Imposes slightly different restrictions on schemas (weaker if DOM available) Provides serialization from Java as well as parsing to Java Analogous to JAXB The killer app for RELAX NG?

Copyright John Cowan under the GNU GPL114 Editors oXygen and Topologi editors validate against RELAX NG schemas xmloperator and an Emacs mode allow editing guided by a RELAX NG schema Vim syntax coloring for writing schemas in the compact syntax

Copyright John Cowan under the GNU GPL115 The libxml2 library Provides parsing and (RELAX NG) validation services for C programs Packaged with xmllint, an XML parser and RELAX NG validator Part of the GNOME desktop, but available separately

Copyright John Cowan under the GNU GPL116 NAMESPACE-BASED VALIDATION DISPATCHING LANGUAGE

Copyright John Cowan under the GNU GPL117 What it is An NVDL schema specifies how to divide a document into parts and apply multiple subschemas to them Basically it's a map between namespace names and schema URIs Implementations aren't stable yet, but the concepts are important

Copyright John Cowan under the GNU GPL118 How it works The document is carved up into subtrees whose elements have a common namespace name The subschema associated with that name is used to validate the subtree Subschemas can be in any schema language including XML Schema and even DTD

Copyright John Cowan under the GNU GPL119 Concurrent validation Multiple schemas can be applied to a subtree to cause concurrent validation Example: XHTML validation with RELAX NG requires a check that no a element has another a element as its ancestor A small schema that must fail validation can be used to enforce this limtation

Copyright John Cowan under the GNU GPL120 Other features A subordinate namespace can be attached to a superior namespace so that they will be validated together by the superior namespace's schema Attribute-only schemas can be supported using a dummy element name Validation can begin at a specific element specified by a small subset of XPath

Copyright John Cowan under the GNU GPL121 Tools Currently there are only a few validators Jing actually validates Namespace Routing Language, an earlier version of NVDL This situation should improve

Copyright John Cowan under the GNU GPL122 THE SCHEMATRON

Copyright John Cowan under the GNU GPL123 Rule-based validation Grammars can't do it all The Schematron allows you to write rules that test almost anything about a document Schematron implementations display natural-language diagnostics when: –An assertion is violated –The schema specifies a report

Copyright John Cowan under the GNU GPL124 An invoice in XML Reuters Health Information 45 West 36th St. New York NY Reuters Health Information 45 West 36th St. New York NY Net 10 days Binder, D-ring, 1.5" Fork, Plastic, Heavy, Medium

Copyright John Cowan under the GNU GPL125 Schematron example (1) Sold-to and ship-to names must match....

Copyright John Cowan under the GNU GPL126 Patterns and rules A Schematron pattern is a list of rules to apply against each element in the document Rules are applied in the order they appear Only the first successful rule in a pattern is applied Each rule specifies an XSLT pattern in its context attribute Schemas can contain multiple patterns

Copyright John Cowan under the GNU GPL127 Schematron example (2) Partial shipment Complete shipment. items to follow....

Copyright John Cowan under the GNU GPL128 Reports vs. assertions Both reports and assertions specify an Xpath expression to test in a test attribute Reports display a message if the test passes Assertions display a message if the test fails The emph element marks up emphatic parts of the message

Copyright John Cowan under the GNU GPL129 Diagnostics Assertions and reports can use the diagnostic attribute to point to a diagnostic element by ID Diagnostics are placed in the top-level diagnostics element Diagnostics can contain value-of child elements to output computed values

Copyright John Cowan under the GNU GPL130 Internationalization diagnostic elements can use an xml:lang attribute to specify the language of the diagnostic assert and report elements can point to multiple diagnostics in different languages The diagnostic which matches the locale will be used and the rest ignored

Copyright John Cowan under the GNU GPL131 Namespaces The top-level ns element has prefix and uri attributes with the obvious meanings This specifies the mapping used inside the Xpaths; unprefixed names are in no namespace, as usual Direct namespace declarations in the schema may also work but are implementation-dependent

Copyright John Cowan under the GNU GPL132 Phases A single schema can contain multiple phases At run-time a phase is chosen and only the patterns that are active in that phase are considered A pattern may be active in multiple phases A phase element specifies a phase using active child elements that point to pattern elements by ID.

Copyright John Cowan under the GNU GPL133 Other features The include element brings in externally stored schema portions The title and p elements allow embedded documentation Elements and attributes from foreign namespaces are allowed The let element binds a variable for use in XPath (ISO extension)

Copyright John Cowan under the GNU GPL134 Abstract rules and patterns ISO extensions not yet implemented by most Schematron systems Abstract rules are ignored, but ordinary rules can inherit assertions and report elements from them Abstract patterns are ignored, but concrete patterns can inherit rules from

Copyright John Cowan under the GNU GPL135 MORE INFORMATION