XML Validation III
Schemas + RELAX NG
Robin Burke
ECT 360
Outline
  Types
    Built-in
    Named
    Anonymous
  Type Derivation
  Schema Organization
  Break
  RELAX NG
Built-in types
  Part of the schema language
  Base types
    19 fundamental types
    Examples: string, decimal
  Derived types
    25 more types that use the base types
    Examples: ID, positiveInteger
Built-in types, cont'd
User-defined types
  Any use of complexType can be turned into a user-defined type
    usually called "standalone"
  Simple types can be derived from the built-in types
Standalone types
  A type can stand outside of an element definition
    must have a name
  Used in element definition
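A minimal sketch of a standalone type, assuming a hypothetical personType with first and last child elements (these names are illustrative only). The type is declared once at the top level and then referenced by name from more than one element definition.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Standalone (named) type: declared outside any element definition -->
  <xs:complexType name="personType">
    <xs:sequence>
      <xs:element name="first" type="xs:string"/>
      <xs:element name="last" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- The type is used by name in element definitions (hypothetical elements) -->
  <xs:element name="student" type="personType"/>
  <xs:element name="instructor" type="personType"/>
</xs:schema>
```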
Mixed content
  Can specify that an element has mixed content
Mixed content, cont'd
  Schema cannot control where the text appears
  If this is legal
    text here thud grunt
  So is this
    thud more text grunt still more
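A sketch of how mixed content might be declared, assuming that thud and grunt are child elements and that the hypothetical parent element is called fight. Setting mixed="true" allows text between the children; the schema still fixes the order of the elements but says nothing about where the text goes.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="fight">
    <!-- mixed="true" permits character data around the child elements -->
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element name="thud" type="xs:string"/>
        <xs:element name="grunt" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <!-- Both of these instances validate against the declaration above:
       <fight>text here <thud/> <grunt/></fight>
       <fight><thud/> more text <grunt/> still more</fight>
  -->
</xs:schema>
```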
Deriving types
  DTDs do not allow type restrictions beyond
    enumeration, CDATA, and token types for attributes
    PCDATA for content
  Schemas have built-in types
    and also the capability to create your own
Derivation operations
  list
    sequence of values
  union
    combine two types, allowing either
  restriction
    placing limits on the legal values
List
  PN PN PQ
  Must be separated by spaces
  probably more useful to do this with document structure
    partList -> partNo*
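A sketch of a list type for the part numbers above. The names partNo, partList and parts, and the two-capital-letter pattern, are assumptions made for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Item type: assumed here to be two capital letters, e.g. PN or PQ -->
  <xs:simpleType name="partNo">
    <xs:restriction base="xs:string">
      <xs:pattern value="[A-Z]{2}"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- List type: a whitespace-separated sequence of partNo values -->
  <xs:simpleType name="partList">
    <xs:list itemType="partNo"/>
  </xs:simpleType>

  <!-- Example instance: <parts>PN PN PQ</parts> -->
  <xs:element name="parts" type="partList"/>
</xs:schema>
```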
Union
  Allows data of either type to be used
  Example
    Database situation
    null is a possible value
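A sketch of the database situation as a union type, assuming the column holds an integer and that a missing value is written as the literal string null. The names nullLiteral, nullableInteger and quantity are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- A type whose only legal value is the string "null" -->
  <xs:simpleType name="nullLiteral">
    <xs:restriction base="xs:string">
      <xs:enumeration value="null"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Union: a value is legal if it is valid for either member type -->
  <xs:simpleType name="nullableInteger">
    <xs:union memberTypes="xs:integer nullLiteral"/>
  </xs:simpleType>

  <!-- <quantity>12</quantity> and <quantity>null</quantity> are both valid -->
  <xs:element name="quantity" type="nullableInteger"/>
</xs:schema>
```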
Restriction
  Most useful
  Allows the designer to state exactly what values are legal
    prices must be non-negative
    SSN must follow a certain pattern
    in-stock must be yes or no
    etc.
Restriction, cont'd
  Restrict a base type according to "facets"
  Different facets available for different data types
Facets
Example: enumeration
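A minimal sketch of the enumeration facet, using the in-stock yes/no value from the Restriction slide. The names inStockType and in-stock are assumptions.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="inStockType">
    <xs:restriction base="xs:string">
      <!-- Only the listed values are legal -->
      <xs:enumeration value="yes"/>
      <xs:enumeration value="no"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="in-stock" type="inStockType"/>
</xs:schema>
```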
Example: numeric
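A minimal sketch of numeric facets, using the non-negative price from the Restriction slide. The name priceType and the two-decimal limit are assumptions.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="priceType">
    <xs:restriction base="xs:decimal">
      <!-- Prices must be non-negative -->
      <xs:minInclusive value="0"/>
      <!-- Assumed: at most two digits after the decimal point -->
      <xs:fractionDigits value="2"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="price" type="priceType"/>
</xs:schema>
```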
Example: pattern
  Regular expressions again
    derived from Perl
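A minimal sketch of the pattern facet, using the SSN case from the Restriction slide and assuming the usual ddd-dd-dddd layout. The name ssnType is hypothetical. A pattern in a schema must match the whole value, so no ^ or $ anchors are written.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="ssnType">
    <xs:restriction base="xs:string">
      <!-- Three digits, two digits, four digits, separated by hyphens -->
      <xs:pattern value="\d{3}-\d{2}-\d{4}"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="ssn" type="ssnType"/>
</xs:schema>
```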
Inheritance
  facet restrictions are inherited
    new type derivations must honor them
    but can restrict them further
    and new derivations can also set other facets
  For example
    monetary type
      fractionDigits facet = 2
    loan amount type
      monetary type + a maxInclusive limit
    car loan amount
      loan amount type + maxInclusive = 30000
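A sketch of the monetary / loan amount / car loan amount chain. The type names are assumptions, and the 1000000 ceiling on loanAmount is a hypothetical value chosen only so that the example is complete; the 30000 limit comes from the slide.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Base of the chain: decimal with at most two fraction digits -->
  <xs:simpleType name="monetary">
    <xs:restriction base="xs:decimal">
      <xs:fractionDigits value="2"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Inherits fractionDigits = 2 and adds an upper bound (hypothetical value) -->
  <xs:simpleType name="loanAmount">
    <xs:restriction base="monetary">
      <xs:maxInclusive value="1000000"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Tightens the inherited bound; raising it above 1000000 would be an error -->
  <xs:simpleType name="carLoanAmount">
    <xs:restriction base="loanAmount">
      <xs:maxInclusive value="30000"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```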
Fixed Facets
  Possible to prevent users from changing a certain facet in any way
    fixed="true" in facet declaration
    similar to "final" keyword in Java
  Example
    minInclusive cannot be changed when inherited
    lower would be illegal anyway
    the "fixed" attribute means it cannot be altered upward either
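A sketch of a fixed facet, assuming hypothetical types nonNegativeAmount and smallAmount. The derived type may still set other facets, but it cannot give minInclusive a different value.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="nonNegativeAmount">
    <xs:restriction base="xs:decimal">
      <!-- fixed="true": derived types cannot change this facet's value -->
      <xs:minInclusive value="0" fixed="true"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="smallAmount">
    <xs:restriction base="nonNegativeAmount">
      <!-- Other facets can still be set in the derived type -->
      <xs:maxInclusive value="100"/>
      <!-- Adding minInclusive with value 10 here would be rejected,
           because the base type fixed that facet at 0 -->
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```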
Complex Types (not discussed in book)
  Possible to derive from complex types
    i.e. elements
  Use complexContent
  Possibilities
    extension
    restriction
    elements
    attributes
Complex Type Extension
  can add elements to existing complex type
    only at the end
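A sketch of complex type extension, assuming a hypothetical personType that is extended into studentType. The new major element is appended after the inherited first and last elements; it cannot be placed before or between them.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="personType">
    <xs:sequence>
      <xs:element name="first" type="xs:string"/>
      <xs:element name="last" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="studentType">
    <xs:complexContent>
      <xs:extension base="personType">
        <!-- Content model becomes: first, last, major -->
        <xs:sequence>
          <xs:element name="major" type="xs:string"/>
        </xs:sequence>
        <!-- An extension can also declare additional attributes -->
        <xs:attribute name="id" type="xs:ID"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:element name="student" type="studentType"/>
</xs:schema>
```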
Complex Type Restriction
  Adding additional attributes
  Odd syntax
    entire element definition must be repeated
  Not much benefit to inheritance
    validation checks for consistency with supertype
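A sketch of the repeated-definition syntax, assuming a hypothetical personType with an optional middle element. The restricted type drops the optional element, but it still has to restate every element it keeps.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="personType">
    <xs:sequence>
      <xs:element name="first" type="xs:string"/>
      <xs:element name="middle" type="xs:string" minOccurs="0"/>
      <xs:element name="last" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="noMiddleType">
    <xs:complexContent>
      <xs:restriction base="personType">
        <!-- The whole content model is written out again, minus "middle" -->
        <xs:sequence>
          <xs:element name="first" type="xs:string"/>
          <xs:element name="last" type="xs:string"/>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
```

The validator checks that every instance of noMiddleType would also be a valid personType.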
Example grades schema
Schema design
  Questions to ask
    what kind of document?
      narrative
      data-centric
    what kind of processing?
      web page output
      complex queries
Document modeling
  Get examples
  Get style guides / rules
  For each data element
    ask how many
    ask what legal values
    ask about sub-parts
    ask about exceptions
Design decisions
  Attribute vs element
  Level of granularity
  Naming
  Schema structure
Attribute vs element
  Some specific rules
    ID must be attribute
  General principle
    data vs metadata
    Element for document content
    Attribute for information about content
  Not always easy to tell!
Element
  Consists of document content
  Will be shown to a human user
  Contains substructure
  Sequence may be important
  Could be very long
  Presence depends on other values
Attribute
  (Opposite of above)
  Must be from an enumeration of values
  Also consistency
Level of granularity
  How detailed to model the data?
  Very detailed
    more work to mark up
    more detail in expressing the schema
    exceptions must be handled
  Less detailed
    easier to mark up
    easier to schematize
    document contents less accessible
Element content granularity
  Fine grained model
    salutation, first name, middle name, last name, appellation
  Coarse grained model
    name
  Tradeoff
    search / sort / organize
    document creation
Levels vs recursion
  Named levels
  Recursion
  Tradeoff
    ability to rearrange
    transparency of markup
Naming
  Case convention
    uppercase is bad
    lowercase better
  Multiple words
    CapCase
    camelCase
    Underline_Convention
Structure
  Nested
    "russian doll"
    schema looks like the document
    small schema only
  Flat
    elements defined at global level
    references used in complex type definitions
  Type-based
    "venetian blind"
    all schema complexity in type definitions
    one global element
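A sketch of the type-based ("venetian blind") layout, assuming a simple grades document with student and assigned-grade children. The type names gradeType and gradesType are hypothetical. All structure lives in named types, and only grades is declared as a global element.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Named types carry all of the structure -->
  <xs:complexType name="gradeType">
    <xs:sequence>
      <xs:element name="student" type="xs:string"/>
      <xs:element name="assigned-grade" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="gradesType">
    <xs:sequence>
      <xs:element name="grade" type="gradeType"
                  minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <!-- The single global element -->
  <xs:element name="grades" type="gradesType"/>
</xs:schema>
```

In a nested ("russian doll") version of the same schema, the two complexType definitions would instead be written anonymously inside the grades and grade element declarations.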
Break
RELAX NG
  XML Schemas are big
    a lot of the page consists of / repeated element names
  RELAX NG
    created as an alternative validation language
    compact, non-XML syntax
    also XML syntax
Example
  element grades {
    element grade {
      element student { text },
      element assigned-grade { text }
    }*
  }
  Equivalent to
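A sketch of how the same grades pattern could be written in the RELAX NG XML syntax. This is a hand translation of the compact form above, not necessarily the exact markup shown on the original slide.

```xml
<element name="grades" xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- The * in the compact syntax corresponds to zeroOrMore -->
  <zeroOrMore>
    <element name="grade">
      <element name="student">
        <text/>
      </element>
      <element name="assigned-grade">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>
```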
Attributes
  element grades {
    element grade {
      element student { text, attribute id { text } },
      element assigned-grade { text }
    }*,
    attribute assignment { text }
  }
Types
  instead of { text }
    use appropriate built-in data type
    attribute age { xsd:positiveInteger }
  facets
    qualify with name / value pair
    attribute drinkingAge { xsd:positiveInteger { minInclusive = "21" } }
What does this one say?
  element grade {
    element student....,
    (
      element assigned-grade { xsd:string { pattern = "([A-D](\+|\-)?|F)" } }
      | ( element assigned-grade { "I" },
          element reason { text } )
    )
  }
The point
  A schema language has two purposes
    lets the language designer state a design
    lets the system validate documents against that design
  Any language that serves these purposes can be used
Validation languages
  DTD
    SGML holdover
    ugly
    fairly simple to express
  Schema
    complete
    extensible
    baroque
    unreadable
  RELAX NG
    readable
      esp. compact syntax
    more expressive than Schema
    fewer tools
Next week
  Presentations