XML Validation II: Schemas. Robin Burke, ECT 360
Outline: Namespaces; Documents; Data types; XML Schemas; Elements; Attributes; Derived data types; RELAX NG
XML so far: languages defined by DTDs, with names assigned by the designers. OK for standalone systems. Doesn't have: the ability to handle naming conflicts; the ability to partition work among different developers
Namespaces: a way to identify a set of labels (element and attribute names, attribute values) as belonging to a particular application
Example: recordings (title, artist, date, label), where artist is group | artist-name+; artworks (title, artist, date, exhibit); books (title, author, date, publisher)
Problem: we want to create a list of items related to 1950s Beat-era culture, including music, art, and literature. We could create a new DTD, but it is better to reuse the existing ones
Namespace idea: associate a short prefix with an application's schema or DTD, then use the prefix with a colon to "qualify" names: music:artist, art:artist, book:author
Namespace idea, cont'd: a namespace is an association between a set of names, a unique identifier (URI), and a prefix used to identify them
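Namespace declaration: either standalone (a prefix declared on an enclosing element) or part of the element itself (in this case, no prefix: a default namespace)
Namespace URI: not a URL; nothing needs to exist at the given location, because it is just a unique identifier. URL-like identifiers work well: they are associated with an organization (its domain name) and so are unique on the Internet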
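A minimal sketch of both declaration styles, using the Beat-era example. The namespace URIs (http://example.org/music, http://example.org/art, http://example.org/books) and the root element beat-items are hypothetical placeholders, not real vocabularies; note how the three different title elements stay distinct.

```xml
<?xml version="1.0"?>
<!-- Standalone declarations: prefixes bound once on an enclosing element
     (the URIs are hypothetical placeholders) -->
<beat-items xmlns:music="http://example.org/music"
            xmlns:book="http://example.org/books">
  <music:recording>
    <music:title>Kind of Blue</music:title>
    <music:artist><music:artist-name>Miles Davis</music:artist-name></music:artist>
    <music:date>1959</music:date>
  </music:recording>
  <book:book>
    <book:title>On the Road</book:title>
    <book:author>Jack Kerouac</book:author>
    <book:date>1957</book:date>
  </book:book>
  <!-- Declaration as part of the element itself: a default namespace with
       no prefix, inherited by the unprefixed child elements -->
  <artwork xmlns="http://example.org/art">
    <title>Autumn Rhythm (Number 30)</title>
    <artist>Jackson Pollock</artist>
    <date>1950</date>
  </artwork>
</beat-items>
```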
Example: DTDs and a document. Problem: how do we import the namespaces?
Solution: fully qualified (prefixed) names everywhere. Yuk! DTDs and namespaces don't work well together
XML so far: languages defined by DTDs contain text elements and string attributes. OK for text documents, but not enough for databases or business process integration
Other DTD problems: not XML (a different syntax, so a different processor); no support for namespaces
Solution: write the language definition in XML, allowing more control over document contents. An XML document becomes a complex data type; an XML language definition becomes a complex data type specification
XML Schema: always a separate document (no internal option); written in XML, so very verbose; can be complex
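To make the problem concrete, here is a sketch of a DTD for a mixed document, assuming the same hypothetical URIs and prefixes and trimming the content models to a few elements. Because a DTD knows nothing about namespaces, each prefix is baked into the element names, and the xmlns attributes must be declared like ordinary attributes.

```xml
<?xml version="1.0"?>
<!DOCTYPE beat-items [
  <!-- The prefix is part of the literal element name in every declaration -->
  <!ELEMENT beat-items (music:recording | book:book)*>
  <!-- To the DTD, namespace declarations are just attributes -->
  <!ATTLIST beat-items
            xmlns:music CDATA #FIXED "http://example.org/music"
            xmlns:book  CDATA #FIXED "http://example.org/books">
  <!ELEMENT music:recording (music:title, music:artist)>
  <!ELEMENT music:title  (#PCDATA)>
  <!ELEMENT music:artist (#PCDATA)>
  <!ELEMENT book:book   (book:title, book:author)>
  <!ELEMENT book:title  (#PCDATA)>
  <!ELEMENT book:author (#PCDATA)>
]>
<beat-items>
  <music:recording>
    <music:title>Kind of Blue</music:title>
    <music:artist>Miles Davis</music:artist>
  </music:recording>
  <book:book>
    <book:title>On the Road</book:title>
    <book:author>Jack Kerouac</book:author>
  </book:book>
</beat-items>
```

If a document author picks a different prefix (say m: instead of music:), the document no longer validates against this DTD, even though the namespaces themselves are unchanged.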
Schemas and namespaces: a schema uses elements from one application (the XML Schema language) to define another, so namespaces are necessary. Namespaces apply to element and attribute names, not to text values. Unprefixed attributes are treated as belonging to their element (strictly, they are in no namespace); an element can also have prefixed attributes from different namespaces
Example 1, XML: a grades document with two entries: Jane Doe, A; John Doe, B
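A plausible reconstruction of the Example 1 document, based on the DTD attributes and the RELAX NG grammar later in these notes. The id values and the assignment value are hypothetical.

```xml
<?xml version="1.0"?>
<!-- id values and the assignment value are hypothetical -->
<grades assignment="homework1">
  <grade>
    <student id="s001">Jane Doe</student>
    <assigned-grade>A</assigned-grade>
  </grade>
  <grade>
    <student id="s002">John Doe</student>
    <assigned-grade>B</assigned-grade>
  </grade>
</grades>
```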
Example 1, DTD <!ATTLIST grades assignment CDATA #IMPLIED> <!ATTLIST student id CDATA #REQUIRED>
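The ATTLIST lines cover only the attributes. A complete DTD also needs element declarations; the ones below are inferred from the structure of Example 1 (a sketch, not the original slide).

```xml
<!-- Element declarations inferred from the structure of Example 1 -->
<!ELEMENT grades (grade*)>
<!ATTLIST grades assignment CDATA #IMPLIED>
<!ELEMENT grade (student, assigned-grade)>
<!ELEMENT student (#PCDATA)>
<!ATTLIST student id CDATA #REQUIRED>
<!ELEMENT assigned-grade (#PCDATA)>
```

Note what this DTD cannot say: nothing limits grades to 40 entries, and assigned-grade accepts any text.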
Data types: grades: a collection of items of type grade; can never have more than 40 students. grade: a structure containing a student and an assigned grade. student: a structure containing an id and some text; should probably constrain the student id. assigned-grade: text; should probably be constrained to A-D, F, I
Built-in types: part of the schema language. Base (primitive) types: 19 fundamental types, e.g. string, decimal. Derived types: 25 more types built from the base types, e.g. ID, positiveInteger
Built-in types, cont'd
To declare an element: <xs:element name="assigned-grade" type="xs:string"/> Equivalent to the DTD declaration <!ELEMENT assigned-grade (#PCDATA)>
Simple data type: a renaming of an existing data type, <xs:element name="assigned-grade" type="xs:string"/>, or a restriction of an existing type (e.g. strings beginning with "D"); more on this later
Complex data type: a compositor, element declarations, attribute declarations
Compositors: sequence, choice, all
Sequence compositor: like "," in a DTD
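For example, the books structure from earlier, written both ways (a sketch; all content is assumed to be plain text):

```xml
<!-- DTD: <!ELEMENT book (title, author, date, publisher)> -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <!-- children must appear in exactly this order -->
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="author" type="xs:string"/>
        <xs:element name="date" type="xs:string"/>
        <xs:element name="publisher" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```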
Elements in sequences: can specify optional elements and the number of occurrences (DTD ?, *, +) with the minOccurs and maxOccurs attributes. What about...
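A sketch of how the DTD occurrence markers map onto minOccurs and maxOccurs (both default to 1); the element names example and a through e are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="example">
    <xs:complexType>
      <xs:sequence>
        <!-- a : exactly once (the default, 1..1) -->
        <xs:element name="a" type="xs:string"/>
        <!-- b? : optional -->
        <xs:element name="b" type="xs:string" minOccurs="0"/>
        <!-- c* : zero or more -->
        <xs:element name="c" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
        <!-- d+ : one or more -->
        <xs:element name="d" type="xs:string" maxOccurs="unbounded"/>
        <!-- 2 to 5 occurrences: no DTD marker for this -->
        <xs:element name="e" type="xs:string" minOccurs="2" maxOccurs="5"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```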
Choice compositor: like "|" in a DTD
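The artist content model from the recordings example, group | artist-name+, as a sketch:

```xml
<!-- DTD: <!ELEMENT artist (group | artist-name+)> -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="artist">
    <xs:complexType>
      <!-- exactly one branch: a group, or one or more artist names -->
      <xs:choice>
        <xs:element name="group" type="xs:string"/>
        <xs:element name="artist-name" type="xs:string" maxOccurs="unbounded"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>
```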
All compositor: every child element appears, each once, in any order; no simple DTD equivalent
Nesting: compositors can be combined
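A sketch with hypothetical name, first, and last elements: both children must appear exactly once, in either order. In a DTD this takes an explicit alternation, ((first, last) | (last, first)), which grows quickly as children are added. In XML Schema 1.0, an all group must be the whole content model, and each child may occur at most once.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="name">
    <xs:complexType>
      <!-- both children, each exactly once, in any order -->
      <xs:all>
        <xs:element name="first" type="xs:string"/>
        <xs:element name="last" type="xs:string"/>
      </xs:all>
    </xs:complexType>
  </xs:element>
</xs:schema>
```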
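Nesting, sketched with the recordings example: the artist choice is placed directly inside the recording sequence (the wrapping artist element is dropped here to show the nesting; typing date as xs:gYear is an assumption).

```xml
<!-- DTD: <!ELEMENT recording (title, (group | artist-name+), date, label)> -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="recording">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <!-- a choice nested inside the sequence -->
        <xs:choice>
          <xs:element name="group" type="xs:string"/>
          <xs:element name="artist-name" type="xs:string" maxOccurs="unbounded"/>
        </xs:choice>
        <xs:element name="date" type="xs:gYear"/>
        <xs:element name="label" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```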
Example
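Local naming: suppose we want to reuse an element name in a different place in the structure, with a different content model. Example: not legal in a DTD. In a schema? Yes, if the element is declared locally, inside a complex type
Using namespaces. The schema must say: to use the schema namespace; what namespace it is defining (targetNamespace). The document must say: that it is using the Schema Instance namespace; what namespace(s) it is using; what prefix(es) are used; where to find the relevant schemas
Multi-schema documents: it is possible to validate documents that combine several schemas. The schema must use the any element to admit elements from another namespace; any restricts by namespace and can't restrict to certain elements (for that, import the other schema with xs:import and refer to its elements with ref)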
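One way to answer the question, sketched with the recordings and artworks examples (trimmed to title and artist). A DTD has a single global declaration per element name, so artist cannot mean two things. In a schema, elements declared inside a complex type are local, so each artist below gets its own content model. The collection root is hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="collection">
    <xs:complexType>
      <xs:choice maxOccurs="unbounded">
        <xs:element name="recording">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title" type="xs:string"/>
              <!-- local artist: a group or one or more artist names -->
              <xs:element name="artist">
                <xs:complexType>
                  <xs:choice>
                    <xs:element name="group" type="xs:string"/>
                    <xs:element name="artist-name" type="xs:string" maxOccurs="unbounded"/>
                  </xs:choice>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="artwork">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title" type="xs:string"/>
              <!-- a different local artist: plain text -->
              <xs:element name="artist" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>
```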
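A minimal sketch of both sides, assuming the hypothetical namespace http://example.org/grades and a schema file named grades.xsd; the grade element is simplified to plain text. First the schema:

```xml
<!-- grades.xsd: uses the XML Schema namespace (prefix xs) and
     defines the hypothetical namespace http://example.org/grades -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.org/grades"
           elementFormDefault="qualified">
  <xs:element name="grades">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="grade" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Then a document that uses it:

```xml
<?xml version="1.0"?>
<!-- default namespace for the content, xsi prefix for the Schema Instance
     namespace, and schemaLocation pairing the namespace with its schema file -->
<grades xmlns="http://example.org/grades"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://example.org/grades grades.xsd">
  <grade>Jane Doe: A</grade>
</grades>
```

elementFormDefault="qualified" puts the locally declared grade element in the target namespace as well; without it, grade would have to appear unqualified (in no namespace) in the document.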
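A sketch of the any wildcard, assuming hypothetical music and books namespace URIs and a hypothetical target namespace http://example.org/beat. The namespace attribute lists which namespaces are admitted; it says nothing about which of their elements may appear.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.org/beat">
  <xs:element name="beat-items">
    <xs:complexType>
      <xs:sequence>
        <!-- any number of elements from either listed namespace;
             strict: the validator must find declarations for them -->
        <xs:any namespace="http://example.org/music http://example.org/books"
                processContents="strict"
                minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

With processContents="lax" instead, elements are validated only when a declaration for them can be found.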
Attributes: DTD attribute types: CDATA, enumeration, tokenized types such as ID and NMTOKEN. Schema: any of the built-in base or derived types, or user-defined types. Declaration: <xs:attribute name="x" type="xs:string" use="optional" default="abc"/> (a default value is only allowed when use is optional; for a required attribute, write use="required" with no default)
Attribute declaration: part of a complex type; follows the compositor (one exception). What if the attribute has a more complex type itself? We'll get to that
Example grades element? add homework attribute
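A sketch of the grades element with an attribute, assuming the attribute meant here is the optional assignment attribute from the Example 1 DTD (for instance assignment="homework1"); grade is declared globally and kept minimal.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="grades">
    <xs:complexType>
      <!-- the compositor comes first -->
      <xs:sequence>
        <xs:element ref="grade" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <!-- attribute declarations follow the compositor; optional by
           default, like CDATA #IMPLIED in the DTD -->
      <xs:attribute name="assignment" type="xs:string"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="grade">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="student" type="xs:string"/>
        <xs:element name="assigned-grade" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```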
Exception: simple content. If an element has "simple content" (text only, possibly with attributes), no compositor is used; instead, a simpleContent element with an extension declares the type of the content
Example: <!ATTLIST student id CDATA #REQUIRED> becomes <xs:attribute name="id" type="xs:string" use="required"/>
How to read this: student is a complex type, not simply a renaming of an existing type. Its content is simple, being of only one type (string), but it has an attribute id, of type string, which is required
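The student declaration spelled out in full, as a sketch consistent with the reading above:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="student">
    <xs:complexType>
      <!-- simple content: no compositor -->
      <xs:simpleContent>
        <!-- the content is a string, extended with a required id attribute -->
        <xs:extension base="xs:string">
          <xs:attribute name="id" type="xs:string" use="required"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
```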
Standalone types: a type can stand outside of an element definition. It must have a name, which is then used in element definitions
Deriving types: DTDs do not allow type restrictions beyond enumeration, CDATA, and tokens for attributes, and #PCDATA for content. Schemas have built-in types and also the capability to create your own
Derivation operations: list (a sequence of values); union (combine two types, allowing either); restriction (place limits on the legal values)
List: values such as PN PN PQ, which must be separated by whitespace; probably more useful to do this with document structure (partList -> partNo*)
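A sketch of a named type; the names gradeType and final-grade are hypothetical. Once the type stands alone, any number of element declarations can share it.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- a named, standalone complex type -->
  <xs:complexType name="gradeType">
    <xs:sequence>
      <xs:element name="student" type="xs:string"/>
      <xs:element name="assigned-grade" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <!-- element declarations refer to the type by name -->
  <xs:element name="grade" type="gradeType"/>
  <xs:element name="final-grade" type="gradeType"/>
</xs:schema>
```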
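A sketch of a list type, assuming a hypothetical part-number format of two capital letters so that a value like PN PN PQ validates:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- hypothetical part-number format: two capital letters -->
  <xs:simpleType name="partNoType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[A-Z]{2}"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- a whitespace-separated list of part numbers -->
  <xs:simpleType name="partListType">
    <xs:list itemType="partNoType"/>
  </xs:simpleType>
  <xs:element name="partList" type="partListType"/>
</xs:schema>
```

The structural alternative would instead give partList element content of partNo elements with maxOccurs="unbounded"; each part number then becomes its own element and can carry attributes.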
Union Allows data of either type to be used Example Bogus!
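A sketch of a union, with hypothetical type names: a date that is either a year or the literal word unknown.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="unknownType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="unknown"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- a value is valid if it matches any member type -->
  <xs:simpleType name="yearOrUnknown">
    <xs:union memberTypes="xs:gYear unknownType"/>
  </xs:simpleType>
  <xs:element name="date" type="yearOrUnknown"/>
</xs:schema>
```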
Restriction: the most useful. Allows the designer to state exactly which values are legal: prices must be non-negative; an SSN must follow a certain pattern; in-stock must be yes or no; etc.
Restriction, cont'd: restrict a base type according to "facets"; different facets are available for different data types
Facets
Example: enumeration
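A sketch of the in-stock rule from the restriction slide; the type name inStockType is hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- only the listed values are legal -->
  <xs:simpleType name="inStockType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="yes"/>
      <xs:enumeration value="no"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="in-stock" type="inStockType"/>
</xs:schema>
```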
Example: numeric
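A sketch of "prices must be non-negative"; the type name priceType and the two-decimal limit (fractionDigits) are added assumptions.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="priceType">
    <xs:restriction base="xs:decimal">
      <!-- no negative prices -->
      <xs:minInclusive value="0"/>
      <!-- at most two digits after the decimal point -->
      <xs:fractionDigits value="2"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="price" type="priceType"/>
</xs:schema>
```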
Example: pattern. Regular expressions again, with syntax derived from Perl
Extended example: complete schema for grades
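A sketch of the SSN rule, assuming the common ddd-dd-dddd format; the type name ssnType is hypothetical. Unlike Perl, an XSD pattern is implicitly anchored: the whole value must match, so no ^ or $ is needed.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="ssnType">
    <xs:restriction base="xs:string">
      <!-- the entire value must match, e.g. 123-45-6789 -->
      <xs:pattern value="[0-9]{3}-[0-9]{2}-[0-9]{4}"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="ssn" type="ssnType"/>
</xs:schema>
```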
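One possible complete schema for the grades documents, pulling together the constraints listed under Data types. The student-id format (an s followed by three digits) is a hypothetical stand-in for whatever the real ids look like; the letter grades use a pattern for A-D, F, I.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- hypothetical student-id format, e.g. s001 -->
  <xs:simpleType name="studentIdType">
    <xs:restriction base="xs:string">
      <xs:pattern value="s[0-9]{3}"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- A, B, C, D, F, or I -->
  <xs:simpleType name="letterGradeType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[A-DFI]"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- text content plus a required, constrained id -->
  <xs:complexType name="studentType">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="id" type="studentIdType" use="required"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

  <xs:complexType name="gradeType">
    <xs:sequence>
      <xs:element name="student" type="studentType"/>
      <xs:element name="assigned-grade" type="letterGradeType"/>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="grades">
    <xs:complexType>
      <xs:sequence>
        <!-- never more than 40 students -->
        <xs:element name="grade" type="gradeType" minOccurs="0" maxOccurs="40"/>
      </xs:sequence>
      <xs:attribute name="assignment" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```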
RELAX NG: XML Schemas are big; much of the page consists of repeated element names. RELAX NG was created as an alternative validation language, with a compact, non-XML syntax as well as an XML syntax
Example element grades { element grade { element student { text }, element assigned-grade { text } }* } Equivalent to
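The same grammar in RELAX NG's XML syntax might look like this (a sketch):

```xml
<element name="grades" xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- the * in the compact syntax -->
  <zeroOrMore>
    <!-- several child patterns inside element form a sequence -->
    <element name="grade">
      <element name="student"><text/></element>
      <element name="assigned-grade"><text/></element>
    </element>
  </zeroOrMore>
</element>
```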
Attributes: element grades { element grade { element student { text, attribute id { text } }, element assigned-grade { text } }*, attribute assignment { text } }
Types: instead of { text }, use the appropriate built-in data type: attribute age { xsd:positiveInteger }. Facets: qualify with a name/value pair: attribute drinkingAge { xsd:positiveInteger { minInclusive = "21" } }
What does this one say? element grade { element student { ... }, ( element assigned-grade { xsd:string { pattern = "([A-D](\+|\-)?|F)" } } | ( element assigned-grade { "I" }, element reason { text } ) ) }
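In the XML syntax, a datatype becomes a data element and each facet becomes a param element inside it. A sketch, with a hypothetical person element to carry the two attributes:

```xml
<element name="person" xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <attribute name="age">
    <data type="positiveInteger"/>
  </attribute>
  <attribute name="drinkingAge">
    <data type="positiveInteger">
      <!-- facet as a name/value pair -->
      <param name="minInclusive">21</param>
    </data>
  </attribute>
</element>
```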
The point: a schema language has two purposes: it lets the language designer state a design, and it lets the system validate documents against that design. Any language that serves these purposes can be used
Validation languages: DTD: SGML holdover; ugly; a dead end; fairly simple to express. Schema: complete; extensible; baroque; unreadable. RELAX NG: readable (esp. the compact syntax); more expressive than Schema; fewer tools
Homework #3: create a schema for "Grills.xml"; generate a schema for your "books.xml" file using XML Spy's "generate" feature, then edit the generated schema
Next week: CSS; SVG, an XML application for generating graphics; online reading