RELAX NG 19-Feb-19.



What is RELAX NG? RELAX NG is a schema language for XML It is an alternative to DTDs and XML Schemas It is based on earlier schema languages, RELAX and TREX It is not a W3C standard, but is an OASIS standard OASIS is the Organization for the Advancement of Structured Information Standards ebXML (Enterprise Business XML) is a joint effort of OASIS and UN/CEFACT (United Nations Centre for Trade Facilitation and Electronic Business) OASIS developed the highly popular DocBook DTD for describing books, articles, and technical documents RELAX NG has recently been adopted as an ISO/IEC standard

Design goals Simple and easy to learn Uses XML syntax But there is also a “compact” (non-XML) syntax Does not change the information set of an XML document (I’m not sure what this means) Supports XML namespaces Treats attributes uniformly with elements so far as possible Has unrestricted support for unordered content Has unrestricted support for mixed content Has a solid theoretical basis Can make use of a separate datatyping language (such as W3C XML Schema Datatypes)

RELAX NG tools Assorted validators and translators are now available for RELAX NG See http://relaxng.org

Basic structure A RELAX NG specification is written in XML, so it obeys all XML rules The RELAX NG specification has one root element The document it describes also has one root element The root element of the specification is element If the root element of your document is book, then the RELAX NG specification begins: <element name="book" xmlns="http://relaxng.org/ns/structure/1.0"> and ends: </element>
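As a minimal sketch of the structure just described, the block below is a complete specification whose root pattern is a book element holding plain text. The content model (<text/>) and the sample document in the comment are illustrative assumptions, not part of the original slides.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- A complete RELAX NG specification: one root <element> pattern -->
<element name="book" xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- Assumed content model: the book holds only character data -->
  <text/>
</element>
<!-- A document this pattern accepts: <book>Any character data</book> -->
```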

Data elements RELAX NG makes a clear separation between: the structure of a document (which it describes) the datatypes used in the document (which it gets from somewhere else, such as from XML Schemas) For starters, we will use the two (RELAX NG-defined) elements: <text> ... </text> (usually written <text/>) Plain character data, not containing other elements <empty></empty> (usually written <empty/>) Does not contain anything Other datatypes, such as <double>...</double>, are not defined in RELAX NG To inherit datatypes from XML Schemas, use: datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes" as an attribute of the root element

Data types from XML Schemas Here are some of the predefined numeric types: xs:decimal xs:positiveInteger xs:byte xs:negativeInteger xs:short xs:nonPositiveInteger xs:int xs:nonNegativeInteger xs:long

Predefined date and time types xs:date -- A date in the format CCYY-MM-DD, for example, 2002-11-05 xs:time -- A time in the format hh:mm:ss (hours, minutes, seconds) xs:dateTime -- Format is CCYY-MM-DDThh:mm:ss The T is part of the syntax Allowable restrictions on dates and times: enumeration, minInclusive, minExclusive, maxInclusive, maxExclusive, pattern, whiteSpace
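The following sketch shows one way these XML Schema types might be used from RELAX NG. The element names (order, placed, quantity) and the limit of 100 are hypothetical. Note that inside the type attribute the name is written without the xs: prefix, and a restriction such as maxInclusive is supplied with a <param> child.

```xml
<element name="order" xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <!-- Content must be a date in CCYY-MM-DD form, e.g. 2002-11-05 -->
  <element name="placed">
    <data type="date"/>
  </element>
  <!-- A positive integer restricted to at most 100 (hypothetical limit) -->
  <element name="quantity">
    <data type="positiveInteger">
      <param name="maxInclusive">100</param>
    </data>
  </element>
</element>
<!-- Accepted: <order><placed>2002-11-05</placed><quantity>12</quantity></order> -->
```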

Defining tags To define a tag (and specify its content), use <element name="myElement"> <!-- Content goes here --> </element> Example: The DTD <!ELEMENT name (firstName, lastName)> <!ELEMENT firstName (#PCDATA)> <!ELEMENT lastName (#PCDATA)> Translates to: <element name="name"> <element name="firstName"> <text/> </element> <element name="lastName"> <text/> </element> </element> Note: As in the DTD, the components must occur in order

RELAX NG describes patterns Your RELAX NG document specifies a pattern that matches your valid XML documents For example, the pattern: <element name="name"> <element name="firstName"> <text/> </element> <element name="lastName"> <text/> </element> </element> Will match the XML: <name> <firstName>David</firstName> <lastName>Matuszek</lastName> </name>

Easy tags <zeroOrMore> ... </zeroOrMore> The enclosed content occurs zero or more times <oneOrMore> ... </oneOrMore> The enclosed content occurs one or more times <optional> ... </optional> The enclosed content occurs once or not at all <choice> ... </choice> Any one of the enclosed elements may occur <!-- This is an XML comment; it is not a container, and it may not contain two consecutive hyphens -->

Example <element name="addressList"> <zeroOrMore> <element name="name"> <element name="firstName"> <text/> </element> <element name="lastName"> <text/> </element> </element> <element name="address"> <choice> <element name="email"> <text/> </element> <element name="USPost"> <text/> </element> </choice> </element> </zeroOrMore> </element>

Enumerations The <value>...</value> pattern matches a specified value Example: <element name="gender"> <choice> <value>male</value> <value>female</value> </choice> </element> The contents of <value> are subject to whitespace normalization: Leading and trailing whitespace is removed Internal sequences of whitespace characters are collapsed to a single blank
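A short sketch of an enumeration applied to an attribute, assuming a hypothetical shirt element with a size attribute. Because <value> uses whitespace normalization by default, padded values still match. Adding type="string" to a <value> (the built-in string type) would turn normalization off for that value.

```xml
<element name="shirt" xmlns="http://relaxng.org/ns/structure/1.0">
  <attribute name="size">
    <choice>
      <value>small</value>
      <value>medium</value>
      <!-- type="string": compared exactly, with no whitespace normalization -->
      <value type="string">large</value>
    </choice>
  </attribute>
  <text/>
</element>
<!-- Accepted: <shirt size="medium">Blue</shirt> -->
<!-- Accepted: <shirt size=" medium ">Blue</shirt> (normalized before comparison) -->
<!-- Rejected: <shirt size=" large ">Blue</shirt> (string comparison is exact) -->
```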

More about data To inherit datatypes from XML Schemas, add this attribute to the root element: datatypeLibrary = "http://www.w3.org/2001/XMLSchema-datatypes" You can access the inherited types with the <data> tag, for instance, <data type="double"/> The <data> pattern must match the entire content of the enclosing tag, not just part of it <element name="illegalUse"> <!-- Don't do this! --> <data type="double"/> <element name="moreStuff"> <text/> </element> </element> If you don't specify a datatype library, RELAX NG defines the following two types for you (along with <text/> and <empty/>): <data type="string"/> : No whitespace normalization is done <data type="token"/> : Whitespace is normalized (leading and trailing whitespace is removed, and internal runs of whitespace are collapsed to a single blank)
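For contrast with the illegalUse pattern, here is a sketch of a legal arrangement: the number is given an element of its own, so <data> matches that element's entire content. The names legalUse and amount are hypothetical.

```xml
<element name="legalUse" xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <!-- <data> is the whole content of <amount>, not mixed with sibling elements -->
  <element name="amount">
    <data type="double"/>
  </element>
  <element name="moreStuff">
    <text/>
  </element>
</element>
<!-- Accepted: <legalUse><amount>3.14</amount><moreStuff>notes</moreStuff></legalUse> -->
```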

<group> <group>...</group> is used as “fat parentheses” Example: <choice> <!-- choice #1 --> <element name="name"> <text/> </element> <!-- choice #2 --> <group> <element name="firstName"> <text/> </element> <element name="lastName"> <text/> </element> </group> </choice>

Attributes Attributes are defined practically the same way as elements: <attribute name="attributeName">...</attribute> Example: <element name="name"> <attribute name="title"> <text/> </attribute> <element name="firstName"> <text/> </element> <element name="lastName"> <text/> </element> </element> Matches: <name title="Dr."> <firstName>David</firstName> <lastName>Matuszek</lastName> </name>

More about attributes With attributes, as with elements, you can use <optional>, <choice>, and <group> It doesn’t make sense to use <oneOrMore> or <zeroOrMore> with attributes In keeping with the usual XML rules, The order in which you list elements is significant The order in which you list attributes is not significant
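A sketch of <optional> and <choice> applied to attributes, using a hypothetical person element. The title attribute may be omitted, and the e-mail address may be given either as an attribute or as a child element, which illustrates how uniformly RELAX NG treats the two.

```xml
<element name="person" xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- The title attribute may be present or absent -->
  <optional>
    <attribute name="title"> <text/> </attribute>
  </optional>
  <!-- Exactly one of: an email attribute or an email child element -->
  <choice>
    <attribute name="email"> <text/> </attribute>
    <element name="email"> <text/> </element>
  </choice>
  <element name="firstName"> <text/> </element>
  <element name="lastName"> <text/> </element>
</element>
<!-- Accepted: <person email="dave@acm.org"><firstName>David</firstName><lastName>Matuszek</lastName></person> -->
```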

Still more about attributes <attribute name="attributeName"> <text/> </attribute> can be (and usually is) abbreviated as <attribute name="attributeName"/> However, <element name="elementName"> <text/> </element> cannot be abbreviated as <element name="elementName"/> If an element has no attributes and no content, you must use <empty/> explicitly
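The two rules above can be sketched side by side. The element names line, break, and image are hypothetical.

```xml
<element name="line" xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- No attributes and no content: <empty/> has to be written out -->
  <element name="break">
    <empty/>
  </element>
  <!-- Abbreviated attribute: same as <attribute name="src"> <text/> </attribute> -->
  <element name="image">
    <attribute name="src"/>
  </element>
</element>
<!-- Accepted: <line><break/><image src="figure1.png"/></line> -->
```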

<list> <list> pattern </list> matches a whitespace-separated list of tokens, and applies the pattern to those tokens Example: <!-- A floating-point number and some integers --> <element name="vector"> <list> <data type="float"/> <oneOrMore> <data type="int"/> </oneOrMore> </list> </element>
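As an illustration, here is a document that the vector pattern should accept, with two that it should reject noted in comments. This assumes the specification declares the XML Schema datatypeLibrary on its root element so that float and int are available.

```xml
<!-- Matches: one float followed by one or more ints, separated by whitespace -->
<vector>1.5 2 3 4</vector>
<!-- Would not match: <vector>1.5</vector> (no int follows the float) -->
<!-- Would not match: <vector>1.5 2.5</vector> (2.5 is not an int) -->
```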

<interleave> <interleave> ... </interleave> allows the contained elements to occur in any order <interleave> is more sophisticated than you might expect If a contained element can occur more than once, the various instances do not need to occur together

Interleave example Pattern: <element name="contactInformation"> <interleave> <zeroOrMore> <element name="phone"> <text/> </element> </zeroOrMore> <oneOrMore> <element name="email"> <text/> </element> </oneOrMore> </interleave> </element> Matches: <contactInformation> <email>dave@acm.org</email> <phone>215-898-8122</phone> <email>davebiz@matuszek.org</email> </contactInformation>

<mixed> <mixed> allows mixed content, that is, both text and patterns If pattern is a RELAX NG pattern, then <mixed> pattern </mixed> is shorthand for <interleave> <text/> pattern </interleave>

Example of <mixed> Pattern: <element name="words"> <mixed> <zeroOrMore> <choice> <element name="bold"> <text/> </element> <element name="italic"> <text/> </element> </choice> </zeroOrMore> </mixed> </element> Without the <zeroOrMore> we would get just one bold or one italic Matches: <words>This is <italic>not</italic> a <bold>great</bold> example, <italic>but</italic> it should suffice.</words>

The need for named patterns So far, we have defined elements exactly at the point that they can be used There is no equivalent of: <!ELEMENT person (name)> <!ELEMENT name (firstName, lastName)> ...use person several places in the DTD... With the RELAX NG we have discussed so far, each time we want to include a person, we would need to explicitly define both person and name at that point: <element name="person"> <element name="name"> <element name="firstName"> <text/> </element> <element name="lastName"> <text/> </element> </element> </element> The <grammar> element solves this problem

Syntax of <grammar> <grammar xmlns="http://relaxng.org/ns/structure/1.0"> <start> ...usual RELAX NG elements, which may include: <ref name="DefinedName"/> </start> <!-- One or more of the following: --> <define name="DefinedName"> ...usual RELAX NG elements, attributes, groups, etc. </define> </grammar>

Use of <grammar> To write a <grammar>, Make <grammar> the root element of your specification Hence it should say xmlns="http://relaxng.org/ns/structure/1.0" Use, as the <start> element, a pattern that matches the entire (valid) XML document In each <define> element, write a pattern that you want to use in other places in the specification Wherever you want to use a defined element, put <ref name="NameOfDefinedElement"/> Note that defined elements may be used in definitions, not just in the <start> element Definitions may even be recursive, but Recursive references must be in an element, not an attribute
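A minimal sketch of a recursive definition, assuming a hypothetical document made of nested sections. The <ref> back to Section sits inside the section element, which is what makes the recursion legal.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <ref name="Section"/>
  </start>
  <define name="Section">
    <element name="section">
      <element name="title"> <text/> </element>
      <!-- Recursive reference, enclosed in the section element -->
      <zeroOrMore>
        <ref name="Section"/>
      </zeroOrMore>
    </element>
  </define>
</grammar>
<!-- Accepted: <section><title>A</title><section><title>A.1</title></section></section> -->
```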

Long example of <grammar> DTD: <!ELEMENT name (firstName, lastName)> RELAX NG: <grammar xmlns="http://relaxng.org/ns/structure/1.0"> <start> <ref name="Name"/> </start> <define name="Name"> <element name="name"> <element name="firstName"> <text/> </element> <ref name="LastName"/> </element> </define> <define name="LastName"> <element name="lastName"> <text/> </element> </define> </grammar> XML is case sensitive; note that the defined names (Name, LastName) are capitalized differently from the element names (name, lastName)

Common usage I A typical way to use RELAX NG is to use a <grammar> with just the root element in <start> and every element described by a <define> <grammar xmlns="http://relaxng.org/ns/structure/1.0"> <start> <ref name="NOVEL"/> </start> <define name="NOVEL"> <element name="novel"> <ref name="TITLE"/> <ref name="AUTHOR"/> <oneOrMore> <ref name="CHAPTER"/> </oneOrMore> </element> </define> ...more...

Common usage II <define name="TITLE"> <element name="title"> <text/> </element> </define> <define name="AUTHOR"> <element name="author"> <text/> </element> </define> <define name="CHAPTER"> <element name="chapter"> <oneOrMore> <ref name="PARAGRAPH"/> </oneOrMore> </element> </define> <define name="PARAGRAPH"> <element name="paragraph"> <text/> </element> </define> </grammar>

Replacing DTDs With <grammar> and multiple <define>s, we can do essentially the same things as a DTD Advantages: RELAX NG is more expressive than a DTD; we can interleave elements, specify data types, allow specific data values, use namespaces, and control the mixing of data and patterns RELAX NG is written in XML RELAX NG is relatively easy to understand Disadvantages: RELAX NG is extremely verbose But there is a “compact syntax” that is much shorter RELAX NG is not (yet) nearly as well known Hence there are fewer tools to work with it This situation seems to be changing

The End So by this maxim be impressed, USE THE TOOLS THAT WORK THE BEST. Do not yield your sovereign judgment, To any sort of political fudgement. The criterion of sound design Should be, must be, your guideline. And if you're designing documents, Try RNG. We charge no rents. -- John Cowan