XSDL & RELAX: Two New Schema Languages for XML
Rajasekar Krishnamurthy
Outline
  DTDs and their drawbacks
  XML Schema requirements
  XSDL
  RELAX
  Other schema specifications
Sample XML document
  Intro to XML
  Albert Einstein
Equivalent DTD
  (!element book (title,price,author*))
  (!element title #PCDATA)
  (!element price #PCDATA)
  (!element author (name, ,phone))
  (!element name #PCDATA)
  (!element #PCDATA)
  (!element phone #PCDATA)
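A DTD content model such as (title,price,author*) is a regular expression over child element names. The sketch below is a minimal illustration of that idea, not how a DTD validator is implemented. It covers only the book declaration, and the price value and the empty author elements in the instances are invented for the example.

```python
import re
import xml.etree.ElementTree as ET

# Content model for book: (title, price, author*), written as a regular
# expression over child tag names, each followed by a single space.
BOOK_MODEL = re.compile(r"title price (author )*")

def matches_book_model(book):
    """Return True if the children of `book` follow (title, price, author*)."""
    tags = "".join(child.tag + " " for child in book)
    return BOOK_MODEL.fullmatch(tags) is not None

# Hypothetical instances (values are assumptions, not from the slides).
good = ET.fromstring(
    "<book><title>Intro to XML</title><price>23</price>"
    "<author/><author/></book>"
)
bad = ET.fromstring(
    "<book><price>23</price><title>Intro to XML</title></book>"
)

print(matches_book_model(good))  # True
print(matches_book_model(bad))   # False: title must come before price
```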
Drawbacks of DTD
  Intro to XML
  Dr. Albert Einstein
What is a schema?
  Model for describing a class of documents
  Common vocabulary for applications exchanging documents
  Formally expresses syntactic, structural and value constraints applicable to instance documents
XML Schema requirements
  Mechanisms for
    –constraining document structure
    –inheritance
    –embedded documentation
    –application-specific constraints
  Primitive data typing
  Allow creation of user-defined datatypes
  Addressing the evolution of schema
Application Scenarios
  Electronic commerce transaction processing
  Traditional document authoring/editing
  Query formulation and optimization
  Open and uniform transfer of data between applications, including databases
  Metadata interchange
XML Schema Definition Language
  Enhanced datatypes
  Written in XML
  Separates element tags from types
    –local namespaces
  Inheritance: derive new type definitions
  Identity constraints
  Support for namespaces
Sample XML schema
Sample schema (contd.)
Schema in graphical form
  book
    title
    price
    author*
      name
      phone
      address?
Schema Components
  Building blocks that comprise the abstract data model of the schema
  Primary components
    –simple type definitions
    –complex type definitions
    –attribute declarations
    –element declarations
Schema Components
  Secondary components
    –attribute group definitions
    –identity constraint definitions
    –model group definitions
    –notation declarations
  Helper components
    –annotations
    –model groups
    –particles
    –wildcards
Type Definitions
  Separate the tag name from the type of an element
  Types can be
    –simple types: represent leaf nodes in the graph; replace PCDATA in DTDs
    –complex types: can have elements and attributes in their content
Sample complexType declaration
Simple type: Pattern
  Other facets: enumeration, range
  Other simple types: list, union
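The three facet kinds named above can be sketched as plain checks on a lexical value. This is an illustration of what each facet constrains, not the XSDL validation procedure. The phone pattern, the currency list and the price bounds are hypothetical values chosen for the example.

```python
import re

def check_pattern(value, pattern):
    """Pattern facet: the whole value must match the regular expression."""
    return re.fullmatch(pattern, value) is not None

def check_enumeration(value, allowed):
    """Enumeration facet: the value must be one of a fixed set."""
    return value in allowed

def check_range(value, low, high):
    """Range facet: the value must lie within inclusive bounds."""
    return low <= value <= high

# Hypothetical facet values, assumed for illustration only.
PHONE_PATTERN = r"\d{3}-\d{4}"
CURRENCIES = {"USD", "EUR", "JPY"}

print(check_pattern("555-1234", PHONE_PATTERN))   # True
print(check_pattern("5551234", PHONE_PATTERN))    # False
print(check_enumeration("USD", CURRENCIES))       # True
print(check_range(23, 0, 1000))                   # True
```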
Elements
  Global elements
    –can occur as the root of the document
    –can be included/imported/referenced
  Local elements
    –can occur only in the specific context
    –sibling elements with the same name need to have the same content model
      (!element book (author*, title, author*))
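The sibling rule for local elements can be sketched as a consistency check over one content model: every particle with the same name must carry the same type. The (name, type) pair representation and the type names below are assumptions made for this sketch and are not taken from the XSDL specification.

```python
def siblings_consistent(particles):
    """Return True if same-named particles all share one type.

    `particles` is a list of (element_name, type_name) pairs that
    appear in a single content model.
    """
    seen = {}
    for name, type_name in particles:
        # Remember the first type seen for each name; later ones must agree.
        if seen.setdefault(name, type_name) != type_name:
            return False
    return True

# (author*, title, author*) with both author particles of the same type.
ok_model = [("author", "authorType"), ("title", "titleType"),
            ("author", "authorType")]
# Same model, but the second author has a different (hypothetical) type.
bad_model = [("author", "authorType"), ("title", "titleType"),
             ("author", "editorType")]

print(siblings_consistent(ok_model))   # True
print(siblings_consistent(bad_model))  # False
```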
Sample schema
Element Content
  Complex types from simple types
    23
  Mixed content
    amount in US-dollars is 23 only
  Empty content
Building content models
  (!element author ((name | (title,firstname,lastname)), ,phone))
  Einstein Dr. Albert Albert Einstein
Building content models...
Content models
  Can represent any content model expressible with an XML 1.0 DTD, and more
  Does not allow non-determinism
    –( ( ,name) | ( ,expandedname)) is illegal
    –should be ( , (name | expandedname))
  Does not allow ambiguity
    –( author*, contactauthor*, author* ) is not allowed, because a run of author elements can be derived in multiple ways
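The ambiguity in (author*, contactauthor*, author*) can be made concrete by counting the ways a child sequence can be assigned to the particles. The function below is a small sketch for sequences of starred particles only; it is not the determinism check defined by XSDL, and its name and list-based model representation are assumptions.

```python
def count_derivations(tags, model):
    """Count the ways `tags` can be split across starred particles.

    `model` is a list of element names, each read as name*, in sequence.
    """
    if not model:
        return 1 if not tags else 0
    head, rest = model[0], model[1:]
    total = 0
    k = 0
    while True:
        # The first particle consumes the first k tags; the remaining
        # particles must account for everything after that.
        total += count_derivations(tags[k:], rest)
        if k < len(tags) and tags[k] == head:
            k += 1
        else:
            break
    return total

ambiguous = ["author", "contactauthor", "author"]

# A single author can belong to the first or to the last author*.
print(count_derivations(["author"], ambiguous))  # 2
# With a contactauthor in between, the assignment is forced.
print(count_derivations(["author", "contactauthor", "author"], ambiguous))  # 1
```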
Deriving new types
  Two ways of deriving new types from existing types
  By extension
    –similar to inheritance in programming languages
  By restriction
    –declarations more limited than base type
Deriving by Extension
Declare Base Type
Derive By Extension
Using Derived Types
  1210, W.Dayton Street Madison WI
  , W.Dayton Street Madison
Deriving By Restriction
Identity Constraints
  Can specify integrity constraints
    –uniqueness, key, keyref constraints
  Can be locally scoped
  Can be applied on attributes, elements or their contents
    –XML ID applies only to attributes
  Can create keys/keyrefs from a combination of element and attribute content
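A key built from a combination of element and attribute content, scoped to one subtree, can be sketched as follows. This is a hand-written uniqueness check for illustration, not the XSDL identity-constraint mechanism. The library/book document, the edition attribute and the helper names are all hypothetical.

```python
import xml.etree.ElementTree as ET

def find_duplicate_keys(scope, selector, fields):
    """Return key tuples that occur more than once under `scope`.

    `selector` is an ElementTree path picking the constrained nodes;
    `fields` is a list of functions that each extract one key part.
    """
    seen = set()
    duplicates = []
    for node in scope.findall(selector):
        key = tuple(field(node) for field in fields)
        if key in seen:
            duplicates.append(key)
        seen.add(key)
    return duplicates

# Hypothetical document: the key is (title element content, edition attribute).
doc = ET.fromstring(
    "<library>"
    "<book edition='1'><title>Intro to XML</title></book>"
    "<book edition='2'><title>Intro to XML</title></book>"
    "<book edition='1'><title>Intro to XML</title></book>"
    "</library>"
)
key_fields = [lambda b: b.findtext("title"), lambda b: b.get("edition")]

print(find_duplicate_keys(doc, "book", key_fields))
# [('Intro to XML', '1')]
```

Because the check runs on whatever element is passed as `scope`, the same key can be enforced per subtree rather than document-wide, which is the local scoping the slide refers to.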
Sample constraint
Other features
  Importing schema components
    –type libraries
  Redefining types & groups
  Namespaces
    –target namespaces allow an undeclared value: support for namespace-unaware documents
Other features
  Any element
    –allows well-formed XML to appear
    –can be restricted to a set of namespaces
  Any attribute
  anyType
    –base type for all complexTypes
    –does not constrain content in any way
    –default type when none is specified
Main drawback of XSDL
  An element declaration (call it D) together with a blocking constraint (a subset of {substitution, extension, restriction}, the value of a {disallowed substitutions}) is validly substitutable for another element declaration (call it C) if
    1.1 the blocking constraint does not contain substitution;
    1.2 There is a chain of {substitution group affiliation}s from D to C, that is, either D's {substitution group affiliation} is C, or D's {substitution group affiliation}'s {substitution group affiliation} is C, or...;
    1.3 The set of all {derivation method}s involved in the derivation of D's {type definition} from C's {type definition} does not intersect with the union of the blocking constraint, C's {prohibited substitutions} and the {prohibited substitutions} of any intermediate {type definition}s in the derivation of D's {type definition} from C's {type definition}.
Main drawback of XSDL
  for a sequence, maximum is unbounded if the {max occurs} of any wildcard or element declaration particle in the group's {particles} or the maximum part of the effective total range of any of the group particles in the group's {particles} is unbounded, or if any of those is non-zero and the {max occurs} of the particle itself is unbounded, otherwise the product of the particle's {max occurs} and the sum of the {max occurs} of every wildcard or element declaration particle in the group's {particles} and the maximum part of the effective total range of each of the group particles in the group's {particles} (or 0 if there are no {particles})
RELAX
  Developed by Makoto Murata & others in Japan
  Based on hedge automaton theory
  Borrows rich datatypes from XML Schema Part 2
  Submitted to ISO fast-track
  Ease of translation from/to DTDs
Main features of RELAX
  Separates element tag name and type
    –context-sensitive content models
  Allows content models similar to XML Schema
  Allows definition of element and attribute groups
  Annotations
  Include mechanism for large schemas
Features absent in RELAX
  Support for namespaces
    –coming shortly??
  Identity constraints
  Inheritance
  New datatypes
XSDL vs. RELAX
  RELAX allows sibling elements with the same name to have different types
    –allows the content model (author, title, author) where the two author elements can have different content models
    –introduces ambiguity
  For content model (title, author*, author*), "XYZ" is ambiguous
XSDL vs. RELAX
  In RELAX, a single type can have multiple definitions
    –the actual definition that matches an instance element is found by exhaustive search
    –at least one match needs to be found
  nametype can be defined as name or expandedname
    –it is a choice of the two definitions
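The "at least one match" rule can be sketched as trying every definition registered for a type until one accepts the instance. This is only an illustration of the idea, not the RELAX processing model. The two definitions of nametype below (a single name child, or title, firstname, lastname children) are assumptions borrowed from the earlier author content model.

```python
import xml.etree.ElementTree as ET

def is_simple_name(elem):
    """Assumed definition 1: exactly one name child."""
    return [child.tag for child in elem] == ["name"]

def is_expanded_name(elem):
    """Assumed definition 2: title, firstname, lastname in that order."""
    return [child.tag for child in elem] == ["title", "firstname", "lastname"]

# One type, several definitions.
NAMETYPE_DEFINITIONS = [is_simple_name, is_expanded_name]

def matches_nametype(elem):
    """Try each definition in turn; a single match is enough."""
    return any(definition(elem) for definition in NAMETYPE_DEFINITIONS)

short_form = ET.fromstring("<author><name>Albert Einstein</name></author>")
long_form = ET.fromstring(
    "<author><title>Dr.</title><firstname>Albert</firstname>"
    "<lastname>Einstein</lastname></author>"
)
neither = ET.fromstring("<author><firstname>Albert</firstname></author>")

print(matches_nametype(short_form))  # True
print(matches_nametype(long_form))   # True
print(matches_nametype(neither))     # False
```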
Extending existing types
  XSDL uses inheritance
    –can change (title, author*) to (title, author*, contactauthor)
  In RELAX, add the new type definition completely
    –can also change (title, author*) to (title, contactauthor, author*)
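The difference between the two approaches comes down to where new particles may go: extension can only append to the end of the base content model. A minimal sketch, assuming content models are represented as flat lists of particle strings (a simplification made for this example):

```python
def derive_by_extension(base, added):
    """Extension keeps the base content model and appends new particles."""
    return base + added

def is_extension_of(derived, base):
    """True if `derived` starts with the whole of `base`, unchanged."""
    return derived[:len(base)] == base

base = ["title", "author*"]

appended = derive_by_extension(base, ["contactauthor"])
print(appended)                         # ['title', 'author*', 'contactauthor']
print(is_extension_of(appended, base))  # True

# Inserting contactauthor in the middle cannot be expressed as an
# extension; it needs a complete new definition.
inserted = ["title", "contactauthor", "author*"]
print(is_extension_of(inserted, base))  # False
```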
Using attribute values
  10
  ten
  Content model of the price element is switched based on the value of the type attribute
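Switching a content model on an attribute value can be sketched as a lookup from the attribute value to a check on the element content. The attribute values "int" and "string" and the two regular expressions are hypothetical, since the markup of the original example is not preserved; this shows the idea only.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical mapping from the value of the type attribute to the
# pattern that the content of price must match.
PRICE_MODELS = {
    "int": re.compile(r"[0-9]+"),
    "string": re.compile(r"[A-Za-z]+"),
}

def valid_price(price):
    """Pick the content check from the type attribute, then apply it."""
    model = PRICE_MODELS.get(price.get("type"))
    if model is None:
        return False
    return model.fullmatch(price.text or "") is not None

print(valid_price(ET.fromstring('<price type="int">10</price>')))      # True
print(valid_price(ET.fromstring('<price type="string">ten</price>')))  # True
print(valid_price(ET.fromstring('<price type="int">ten</price>')))     # False
```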
XSDL vs. RELAX
  RELAX
    –membership checking in linear time in SAX model
  XSDL
    –type assignment in linear time in SAX/DOM models, ignoring integrity constraints
Other Schema proposals
  XDR (XML-Data Reduced)
    –Microsoft's BizTalk framework
  SOX (Schema for Object-oriented XML)
    –Commerce One
  DSD
    –AT&T and BRICS
  Schematron
References
  Comparative Analysis of Six XML Schema Languages, SIGMOD Record, Sept
  Reasoning about XML Schema Languages using Formal Language Theory, WWW submission