A Tool Kit for Implementing XML Schema Naming and Design Rules
OASIS Symposium: The Meaning of Interoperability, May 9, 2006
Josh Lubell, National Institute of Standards and Technology, Manufacturing Systems Integration Division
XML Exchange Schemas are Bridges
But Bridges Must Be Designed Properly
A Solution: Naming and Design Rules
- Encode XML schema best practices
- Enforce a particular modeling methodology
- Ensure common naming conventions
  - Use of camel case
  - Allowable acronyms
  - …
- But NDRs can be difficult to apply
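A naming convention such as camel case with an acronym allow-list can be checked mechanically. The following is a minimal Python sketch, not taken from any NDR: the `ALLOWED_ACRONYMS` set, the `upper_camel` flag, and the message wording are all assumptions made for illustration.

```python
import re

# Hypothetical allow-list; a real NDR would publish its own.
ALLOWED_ACRONYMS = {"ID", "URI", "XML"}

def split_words(name):
    """Split a camel-case name into words, keeping runs of capitals together."""
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z0-9]*|[a-z0-9]+", name)

def check_name(name, upper_camel=True):
    """Return a list of problems found in an element, attribute, or type name."""
    problems = []
    if not name.isalnum():
        problems.append("contains characters other than letters and digits")
    if name and name[0].isupper() != upper_camel:
        problems.append("wrong case for first letter")
    for word in split_words(name):
        # A multi-letter all-capitals word is treated as an acronym.
        if len(word) > 1 and word.isupper() and word not in ALLOWED_ACRONYMS:
            problems.append(f"acronym not on allow-list: {word}")
    return problems

if __name__ == "__main__":
    for candidate in ("PartyID", "Order_Date", "PartyNATO"):
        print(candidate, check_name(candidate))
```

An empty list means the name passed every check in this sketch; it says nothing about spelling or singular form, which need other resources.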
Barriers to NDR Usefulness
- Proliferation
  - How do I decide which NDR set to adopt?
  - Should I develop my own NDR?
- Lack of structure
  - NDR documents usually in proprietary word processor formats
  - Inhibits rule reuse
  - Limited versioning and traceability
- Ambiguity
  - Rules written in English rather than computer-interpretable language
  - NDR enforcement not automatic
Schematron as an NDR Implementation Method
- Advantages
  - XML-native (based on XPath)
  - Rule-based
  - Can test for co-occurrence constraints
  - User-configurable diagnostic messages
  - ISO standard
- Disadvantage
  - Less versatile than a general purpose programming language
Example from Universal Business Language NDR
[ELD1] Each UBL:DocumentSchema MUST identify one and only one global element declaration that defines the document ccts:AggregateBusinessInformationEntity being conveyed in the Schema expression. That global element MUST include an xsd:annotation child element which MUST further contain an xsd:documentation child element that declares “This element MUST be conveyed as the root element in any instance document based on this Schema expression.”
Implementation Observations
[ELD1] Each UBL:DocumentSchema MUST identify one and only one global element declaration that defines the document ccts:AggregateBusinessInformationEntity being conveyed in the Schema expression. That global element MUST include an xsd:annotation child element which MUST further contain an xsd:documentation child element that declares “This element MUST be conveyed as the root element in any instance document based on this Schema expression.”
Parts of the rule labeled on the slide:
- Rule label
- Namespace dependence
- Subrule 1
- Context 1
- Subrule 2
- Context 2
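The two subrules of ELD1 can be approximated with a short script. The talk implements its rules in Schematron and Jess; the Python below is only an illustrative stand-in using the standard library. It makes a simplifying assumption: a global element counts as "the" document element if its xsd:documentation contains the required sentence. It does not verify that the element defines the ccts:AggregateBusinessInformationEntity. The function name `check_eld1` and the message wording are hypothetical.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"
REQUIRED_TEXT = (
    "This element MUST be conveyed as the root element in any "
    "instance document based on this Schema expression."
)

def check_eld1(schema_xml):
    """Return diagnostic messages; an empty list means the check passed."""
    root = ET.fromstring(schema_xml)
    doc_path = f"{{{XSD}}}annotation/{{{XSD}}}documentation"
    marked = []
    # Global element declarations are direct children of xsd:schema.
    for element in root.findall(f"{{{XSD}}}element"):
        # Collapse whitespace so line breaks in the schema do not matter.
        texts = [" ".join((d.text or "").split()) for d in element.findall(doc_path)]
        if any(REQUIRED_TEXT in text for text in texts):
            marked.append(element.get("name", "?"))
    if len(marked) == 1:
        return []
    if not marked:
        return ["ELD1: no global element carries the required documentation"]
    return [
        f"ELD1: {len(marked)} global elements carry the required "
        f"documentation: {', '.join(marked)}"
    ]
```

Even this simplified version has to hard-code the xsd namespace and the exact documentation sentence, which echoes the namespace dependence and context issues noted above.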
UBL Lessons Learned
- Implementation non-trivial even for a seemingly simple rule
- Some rules require a general purpose programming language for implementation
  - [GNR1] UBL XML element, attribute and type names MUST be in the English language, using the primary English spellings provided in the Oxford English Dictionary.
  - [GNR7] UBL XML element, attribute and type names MUST be in singular form unless the concept itself is plural.
- Some rules cannot be implemented at all
  - [NMS6] UBL published namespaces MUST never be changed.
  - [VER10] UBL Schema and schema module minor version changes MUST not break semantic compatibility with prior versions.
- MUST versus SHOULD versus MAY
  - More on MAY later…
Dept. of Navy (DON) NDR Case Study
- 128 rules
- Based on UBL NDR
- Why choose the DON NDR?
  - Help developers write better schemas for Federal government applications
  - Gain insight into best practices for NDR development (particularly reuse of existing NDRs)
  - Publicly available
  - A Navy standard
DON NDR Testability (using Schematron)
Issue: Use of MAY
- A rule saying that something MAY occur, strictly speaking, will always pass
  - But this may not be the rule creator’s intent
- Example: [CTD8] Code and ID ccts:BBIE Property complex types MAY use the xsd:choice element to reference global elements defined in standardized ID Scheme or Code List Schema modules.
- Approaches
  - Consider rule as guidance only (don’t implement)
  - Interpret MAY as discouragement, e.g. “warning: referencing global element using xsd:choice”
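The two approaches to MAY can be sketched as a policy switch. This is a hedged Python illustration, not the talk's Schematron implementation: the `may_policy` parameter and `check_ctd8` name are assumptions, and the check looks at every global complex type rather than only Code and ID ccts:BBIE Property types, which it has no way to identify.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"

def check_ctd8(schema_xml, may_policy="warn"):
    """Report use of xsd:choice to reference global elements.

    may_policy="ignore" treats the MAY rule as guidance only;
    may_policy="warn" interprets MAY as discouragement.
    """
    if may_policy == "ignore":
        return []
    root = ET.fromstring(schema_xml)
    messages = []
    # Only global (named) complex types are inspected in this sketch.
    for ctype in root.findall(f"{{{XSD}}}complexType"):
        for choice in ctype.iter(f"{{{XSD}}}choice"):
            refs = [e for e in choice.findall(f"{{{XSD}}}element") if e.get("ref")]
            if refs:
                name = ctype.get("name", "(unnamed)")
                messages.append(
                    f"warning: referencing global element using xsd:choice in {name}"
                )
    return messages
```

Under either policy the schema still "passes"; the difference is only whether the developer is told that the permitted construct was used.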
Issue: Requirement for External Resources
[GNR1] UBL XML element, attribute and type names MUST be in the English language, using the primary English spellings provided in the Oxford English Dictionary.
- Implementation requires access to electronic OED
- And the DON adaptation of this rule has additional requirements:
  [GNR1] XML element, attribute, and type names MUST be in the English language, using the Oxford English Dictionary for Writers and Editors (Latest Ed.). Where both American and English spellings of the same word are provided, the American spelling MUST be used.
  - Electronic OED must be fully up to date
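To show why GNR1 depends on an external resource, here is a minimal Python sketch in which the dictionary is passed in as a plain set of words. The `dictionary` and `american_for` arguments are stand-ins invented for this example; they are not an interface to the OED, and the quality of the check is only as good as the word list supplied.

```python
import re

def split_words(name):
    """Split a camel-case name into words, keeping runs of capitals together."""
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z]*|[a-z]+", name)

def check_spelling(name, dictionary, american_for=None):
    """Return problems for words missing from `dictionary`.

    american_for maps a non-American spelling to the American one,
    mirroring the extra requirement in the DON version of the rule.
    """
    american_for = american_for or {}
    problems = []
    for word in split_words(name):
        lower = word.lower()
        if lower in american_for:
            problems.append(
                f"{name}: use American spelling '{american_for[lower]}' "
                f"instead of '{lower}'"
            )
        elif lower not in dictionary:
            problems.append(f"{name}: '{lower}' not found in dictionary")
    return problems

if __name__ == "__main__":
    words = {"color", "colour", "code"}       # stand-in for a real dictionary
    preferred = {"colour": "color"}           # stand-in for the spelling preference
    print(check_spelling("ColourCode", words, preferred))
```

Passing the word list as a parameter keeps the rule logic separate from the resource, so the same check could serve the UBL variant (no `american_for`) and the DON variant.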
Issue: Rule Proliferation
- Illustrated by UBL rule GNR1 versus DON rule GNR1
- DON rule same as UBL rule, but with added constraints
  - American spelling favored
  - Latest OED edition required
- But no explicit relationship specified in DON NDR!
- Both rules have same ID, even though they are different rules
- Improved traceability and reusability would reduce the confusion
Issue: Ambiguous Terminology
- More rigor needed in NDR definitions
- Example: “xsd:SchemaExpression”
  - Not defined in W3C XML Schema recommendation
  - Used but not defined in DON NDR
  - Defined in UBL NDR to mean “a concept”
Issue: Mixed Content
- Essential for representing semi-structured data
- But allowing it makes the NDR more complicated
- UBL NDR forbids mixed content
- DON NDR allows it, but only if defined by a namespace from a Navy-approved standard (e.g. XHTML)
  - But XHTML element and attribute names violate rule GNR1!
Quality of Design (QoD) Tool
- Contains rules based on naming and design guidelines (NDRs) from a number of sources
- Stores executable test cases written in Schematron and Java Expert System Shell (Jess)
- Executes tests against user-provided schemas and reports results
- Rules grouped into test profiles
Why QoD?
- Addresses proliferation of NDRs
  - Overlapping NDR standards
  - Supports reusability of rules
- Highlights ambiguous rules
- Provides an explicit structure for rules in NDRs
- Automates rule enforcement
- Enables versioning and traceability of rules
Candidate NDRs
- OASIS Universal Business Language (UBL)
- US Department of the Navy (DON)
- Korean Institute for Electronic Commerce
- Open Applications Group (OAGIS)
- US Air Force
- US Federal CIO Council XML Working Group
- ASC X12 (CICA)
- FIATECH (capital facilities industry)
Architecture of QoD Web Application
Characteristics of Rules
- Coverage: full, partial, none
- Applicability: indicates type of schema (document, low, or aggregate) the rule applies to
- Rationale: reason for rule from a list of justifications
- Requirement: text from the NDR document
- Implementation File: URI of the file containing the implementation of the rule
Example XML Description of a Rule using QoD Exchange Schema
(XML markup not preserved; element content only)
- OASIS Universal Business Language (UBL) Naming and Design Rules
- Element Declaration Rules
- Coverage: full
- Applicability: D
- Rationale: structural clarity
- Requirement: Each UBL:DocumentSchema MUST identify one and......
QoD Test Profile Exchange
Application to Developing XML Schemas
- Currently a limited set of rules is implemented
- Recently implemented a subset of the DON NDR in Schematron
- Tested with a small but varied set of sample schemas
  - Navy – IETM Schema Q70:IETM (Interactive Electronic Technical Manual)
  - Grants.gov
  - AEX (building and construction industry)
  - US Dept. of Defense
- Provided meaningful results to schema developers
Examples of types of warnings found in developing XML Schemas
- Global elements declared in non-desirable places
- Anonymous/local types defined in non-desirable places
- “Global” schemas that do not declare a default namespace
- Document/Transaction level schemas that define multiple global elements
- Re-declaration of elements and types (e.g. programType) in different namespaces
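Two of the warning types above, multiple global elements and re-declaration across namespaces, can be detected with a simple scan over a set of schemas. This Python sketch is illustrative only and is not the QoD implementation: it assumes every schema passed in is a document-level schema, and the function names and message formats are invented.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XSD = "http://www.w3.org/2001/XMLSchema"

def global_declarations(schema_xml):
    """Return (targetNamespace, [(kind, name), ...]) for one schema document."""
    root = ET.fromstring(schema_xml)
    namespace = root.get("targetNamespace", "(no namespace)")
    found = []
    for kind in ("element", "complexType", "simpleType"):
        # Global declarations are direct children of xsd:schema.
        for decl in root.findall(f"{{{XSD}}}{kind}"):
            found.append((kind, decl.get("name", "?")))
    return namespace, found

def find_warnings(schemas):
    """Scan schema documents (as strings) and return warning messages."""
    warnings = []
    namespaces_by_decl = defaultdict(set)
    for schema_xml in schemas:
        namespace, found = global_declarations(schema_xml)
        element_count = sum(1 for kind, _ in found if kind == "element")
        if element_count > 1:
            warnings.append(f"{namespace}: {element_count} global elements declared")
        for decl in found:
            namespaces_by_decl[decl].add(namespace)
    # The same kind and name appearing under several namespaces is a re-declaration.
    for (kind, name), namespaces in sorted(namespaces_by_decl.items()):
        if len(namespaces) > 1:
            warnings.append(f"{kind} {name}: declared in {len(namespaces)} namespaces")
    return warnings
```

The checks are reported as warnings rather than errors, matching the slide's framing: each finding is a prompt for the schema developer to review, not proof of a defect.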
Lessons Learned in Coding NDRs
- NDR documents need to be regarded as rigorous technical documentation
  - More review needed
  - Better authoring tools needed
- Rules that cannot be implemented are non-enforceable
- Definition of NDRs is non-trivial
  - Many rules cannot be tested
  - Many rules are more difficult to implement than thought
  - Difficult to reuse rules due to namespace definitions
  - Often rules are ambiguous or unclear
- Implementation of rules is non-trivial
  - Testing of rules is complex
  - All boundary conditions need to be thought of and covered
- Legacy data and third-party schemas need to be addressed in NDRs
What’s Next
- Continue to expand our NDR rule-base
- Continue to enhance software based on user requirements
- Produce a tool kit for NDR developers
  - Enhance QoD schema to represent entire NDR document
  - Provide authoring templates
- Identify collaborators for future work
  - If interested, contact me!
Summary
- A process for XML schema development is necessary
- Tools can automate the process, thereby reducing labor and deployment time
- Definition and implementation of NDRs is non-trivial but necessary to support reuse of schemas
- Enforcing NDRs will ultimately make XML schemas more interoperable
For More Information
- Lubell, et al., Implementing XML Schema Naming and Design Rules, submitted to Extreme Markup Languages 2006
- QoD information page:
- QoD SourceForge project: