THE DATATYPES OF XML SCHEMA
A Practical Introduction
John Cowan
Reuters Health Information
Copyright John Cowan 2001, 2002; licensed under GNU GPL
  Licensed under the GNU General Public License
  ABSOLUTELY NO WARRANTIES; USE AT YOUR OWN RISK
  Black and white for readability
  11/29/2018
Abstract
  This is a brief description, useful for RELAX NG and XML Schema users, of the simple datatypes of XML Schema and their associated facets. A brief summary of the XML Schema regular expression language is also given.
Roadmap
  Types (11 slides)
  Facets (7 slides)
  Regular expression language (6 slides)
XML Schema Datatypes
  A type is a named set of values
  An XML Schema datatype provides a standardized, machine-checkable representation of a type
  XML Schema types can be grouped: numeric, date, boolean, string, misc.
Numeric Types
  Decimal types
  Floating-point types
Decimal Types
  decimal
    integer
      nonPositiveInteger
        negativeInteger
      nonNegativeInteger
        positiveInteger
        unsigned{Long, Int, Short, Byte}
      long, int, short, byte
Decimal Types
  long, int, short, and byte are the same as in Java: 64, 32, 16, and 8 bits
  unsignedLong, unsignedInt, unsignedShort, and unsignedByte are the obvious unsigned analogues
  All other decimal types are unbounded
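The bounds implied by these bit widths can be worked out directly. The following is a minimal sketch, assuming two's-complement ranges as in Java; the helper names `bounds` and `in_range` are hypothetical and not part of any XML Schema library.

```python
# Bit widths of the bounded integer types (two's complement, as in Java).
BITS = {"long": 64, "int": 32, "short": 16, "byte": 8}

def bounds(type_name):
    """Return (min, max) for a bounded XML Schema integer type name."""
    if type_name.startswith("unsigned"):
        # "unsignedShort" -> "short"
        bits = BITS[type_name[len("unsigned"):].lower()]
        return 0, 2**bits - 1
    bits = BITS[type_name]
    return -(2**(bits - 1)), 2**(bits - 1) - 1

def in_range(type_name, value):
    """True if the integer value fits in the named type."""
    low, high = bounds(type_name)
    return low <= value <= high

print(bounds("byte"))           # (-128, 127)
print(bounds("unsignedShort"))  # (0, 65535)
```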
Floating-point Types
  Only two floating-point types
    float
    double
  IEEE ranges (same as Java, all modern hardware)
Date Types
  duration
  date, time, dateTime
  gYear, gMonth, gDay, gYearMonth, gMonthDay
Date Types
  Duration
    duration
  Single Time Interval
    dateTime, date, gYear, gYearMonth
  Recurring Time Interval
    time, gMonth, gDay, gMonthDay
Date Type Examples
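As an illustration only, here is a rough Python sketch of what the lexical forms of these types look like. The sample values are assumptions chosen for this sketch, not taken from the slides. The patterns are deliberately simplified: no time zones, no negative years, no range checks, and the duration pattern also accepts degenerate strings such as a bare `P`. The `--MM` form for gMonth is assumed from later editions of the specification. The name `looks_like` is hypothetical.

```python
import re
from datetime import date, datetime, time

# Simplified lexical patterns (assumptions, not the full grammar).
PATTERNS = {
    "duration": r"P(\d+Y)?(\d+M)?(\d+D)?(T(\d+H)?(\d+M)?(\d+(\.\d+)?S)?)?",
    "gYear": r"\d{4}",
    "gYearMonth": r"\d{4}-\d{2}",
    "gMonth": r"--\d{2}",
    "gMonthDay": r"--\d{2}-\d{2}",
    "gDay": r"---\d{2}",
}

def looks_like(type_name, text):
    """True if text has the rough shape of the named type's lexical form."""
    return re.fullmatch(PATTERNS[type_name], text) is not None

# date, time, and dateTime without time zones happen to parse as ISO 8601.
print(date.fromisoformat("2001-10-26"))
print(time.fromisoformat("21:32:52"))
print(datetime.fromisoformat("2001-10-26T21:32:52"))

print(looks_like("duration", "P1Y2M3DT10H30M"))
print(looks_like("gMonthDay", "--10-26"))
print(looks_like("gDay", "---26"))
```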
Boolean Type
  Only two values are legal:
    true (which can also be written 1)
    false (which can also be written 0)
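A minimal sketch of a parser for these four lexical forms, in Python. The function name `parse_boolean` is hypothetical. Note that the forms are case-sensitive, so `TRUE` is rejected.

```python
def parse_boolean(text):
    """Map an XML Schema boolean lexical form to a Python bool."""
    # Surrounding white space is stripped before comparison.
    text = text.strip(" \t\r\n")
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError(f"not a legal boolean: {text!r}")

print(parse_boolean("1"))      # True
print(parse_boolean("false"))  # False
```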
String Types
  string
  normalizedString
  token
  language
  NMTOKEN(S)
  Name
  NCName
    ID, IDREF(S), ENTITY(IES)
Miscellaneous Types
  Raw octet types
    hexBinary
    base64Binary
  anyURI
  QName
  NOTATION
Facets
  Allow the creation of new datatypes by restricting the existing ones in one or more ways
  Called params in RELAX NG
  Facets can be grouped into families applicable to datatype families:
    length, value, pattern, enumeration, whiteSpace
Length Facets
  Applicable to string and miscellaneous types
  length facet gives exact length
  minLength and maxLength facets set limits; either or both may be used
  Lengths of hexBinary and base64Binary types are measured in octets, not characters
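The difference between octets and characters can be sketched in Python as follows. The helpers `measured_length` and `check_length` are hypothetical names for this illustration; decoding uses the standard library, which is only an approximation of a schema validator.

```python
import base64

def measured_length(type_name, text):
    """Length as the length facets would see it for the given type."""
    if type_name == "hexBinary":
        return len(bytes.fromhex(text))       # octets, two hex digits each
    if type_name == "base64Binary":
        return len(base64.b64decode(text))    # octets after decoding
    return len(text)                          # characters

def check_length(type_name, text, length=None, min_length=None, max_length=None):
    """Apply length, minLength, and maxLength; unset facets are ignored."""
    n = measured_length(type_name, text)
    if length is not None and n != length:
        return False
    if min_length is not None and n < min_length:
        return False
    if max_length is not None and n > max_length:
        return False
    return True

print(measured_length("string", "0FB7"))     # 4 characters
print(measured_length("hexBinary", "0FB7"))  # 2 octets
```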
Value Facets
  Applicable to numeric and date types
  minExclusive and minInclusive specify a lower bound; either but not both may be used
  maxExclusive and maxInclusive specify an upper bound; either but not both may be used
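The bound facets can be sketched as a simple range check. This is an illustrative Python helper with a hypothetical name (`check_bounds`), using `Decimal` values as a stand-in for any ordered type; it also rejects the combinations the slide rules out.

```python
from decimal import Decimal

def check_bounds(value, min_inclusive=None, min_exclusive=None,
                 max_inclusive=None, max_exclusive=None):
    """Apply the four bound facets; unset facets are ignored."""
    if min_inclusive is not None and min_exclusive is not None:
        raise ValueError("use minInclusive or minExclusive, not both")
    if max_inclusive is not None and max_exclusive is not None:
        raise ValueError("use maxInclusive or maxExclusive, not both")
    if min_inclusive is not None and value < min_inclusive:
        return False
    if min_exclusive is not None and value <= min_exclusive:
        return False
    if max_inclusive is not None and value > max_inclusive:
        return False
    if max_exclusive is not None and value >= max_exclusive:
        return False
    return True

# A percentage-like type: 0 <= value < 100
print(check_bounds(Decimal("0"), min_inclusive=Decimal("0"), max_exclusive=Decimal("100")))
print(check_bounds(Decimal("100"), min_inclusive=Decimal("0"), max_exclusive=Decimal("100")))
```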
Value Facets
  totalDigits specifies the maximum total number of significant digits in a decimal, integer, (non)PositiveInteger, or (non)NegativeInteger value
  fractionDigits specifies the maximum number of fractional digits in a decimal value
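A rough Python sketch of how the two digit facets might be checked. It works on the value rather than the lexical form, so trailing fractional zeros do not count. The counting rule follows one reading of the XML Schema 1.0 definition (the value is written as i × 10^-n with the smallest possible n), which is an assumption of this sketch. The names `digit_counts` and `satisfies` are hypothetical, the input is assumed to be a valid decimal lexical form already, and Python's default `Decimal` precision of 28 digits applies.

```python
from decimal import Decimal

def digit_counts(text):
    """Return (total digits, fraction digits) needed to express the value."""
    sign, digits, exponent = Decimal(text).normalize().as_tuple()
    fraction = max(0, -exponent)                   # smallest n with value = i * 10**-n
    integer_digits = len(digits) + max(0, exponent)  # digits of the integer i
    return max(integer_digits, fraction), fraction

def satisfies(text, total_digits=None, fraction_digits=None):
    """Apply totalDigits and fractionDigits; unset facets are ignored."""
    total, fraction = digit_counts(text)
    if total_digits is not None and total > total_digits:
        return False
    if fraction_digits is not None and fraction > fraction_digits:
        return False
    return True

print(digit_counts("123.450"))  # (5, 2)
print(satisfies("123.456", total_digits=6, fraction_digits=2))  # False
```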
Pattern Facet
  Applicable to any type
  Specifies a regular expression that the data must match
  XML Schema: If multiple pattern facets are present, the data must match at least one of them
  RELAX NG: If multiple pattern facets are present, the data must match all of them
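The difference between the two combination rules can be sketched with Python's `re` module. Python's regular expression syntax only approximates the XML Schema one, so this illustrates the any-versus-all logic and nothing more. The function names and sample patterns are assumptions of this sketch.

```python
import re

def matches_xml_schema(patterns, text):
    """XML Schema rule: the data must match at least one pattern."""
    return any(re.fullmatch(p, text) for p in patterns)

def matches_relax_ng(patterns, text):
    """RELAX NG rule: the data must match every pattern."""
    return all(re.fullmatch(p, text) for p in patterns)

patterns = [r"[a-z]+", r".{3}"]  # lower-case letters; exactly three characters

for text in ("abc", "abcd", "123"):
    print(text, matches_xml_schema(patterns, text), matches_relax_ng(patterns, text))
```

Here `abc` passes under both rules, while `abcd` and `123` each match only one pattern and so pass only under the XML Schema rule.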
Enumeration Facet
  XML Schema only
  Applicable to any type
  The instances of the enumeration facet specify individual values
  The data must be equal (according to the rules for the type) to one of the specified values
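"Equal according to the rules for the type" means values are compared, not spellings. A small Python sketch, with the hypothetical helper `in_enumeration` and `Decimal` standing in for the decimal type:

```python
from decimal import Decimal

def in_enumeration(text, allowed, to_value=str):
    """Compare in the type's value space by converting both sides first."""
    return to_value(text) in [to_value(a) for a in allowed]

allowed = ["1.00", "2"]
print(in_enumeration("1.0", allowed, to_value=Decimal))  # True: same decimal value
print(in_enumeration("1.0", allowed, to_value=str))      # False: different strings
```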
whiteSpace Facet
  XML Schema only
  Applicable to string types
  Legal values are:
    preserve: leave white space alone
    replace: tabs and newlines become spaces
    collapse: replace, then remove leading, trailing, and multiple spaces
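The three modes can be sketched in a few lines of Python. The function name `apply_white_space` is hypothetical; carriage returns are treated like tabs and newlines here.

```python
import re

def apply_white_space(text, mode):
    """Normalize text according to a whiteSpace facet value."""
    if mode == "preserve":
        return text
    replaced = re.sub(r"[\t\n\r]", " ", text)   # each tab/newline/return -> one space
    if mode == "replace":
        return replaced
    if mode == "collapse":
        # Squeeze runs of spaces, then drop leading and trailing spaces.
        return re.sub(r" +", " ", replaced).strip(" ")
    raise ValueError(f"unknown whiteSpace value: {mode!r}")

sample = "  a\t\tb \n"
print(repr(apply_white_space(sample, "replace")))   # '  a  b  '
print(repr(apply_white_space(sample, "collapse")))  # 'a b'
```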
XSD Regular Expressions
  A subset of Perl regular expressions
  Supported constructs:
    choice
    quantifiers
    character classes
    parentheses for grouping
  All matches are anchored to both ends of the data (so no ^ or $ needed)
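The effect of implicit anchoring can be imitated in Python with `re.fullmatch`, which requires the whole string to match. This is only an approximation, since Python's `re` syntax is not identical to the XML Schema one; the helper name `xsd_style_match` is hypothetical.

```python
import re

def xsd_style_match(pattern, text):
    """Whole-string match, mimicking the implicit anchoring of XSD patterns."""
    return re.fullmatch(pattern, text) is not None

# An unanchored search finds "abc" inside a longer string...
print(re.search(r"abc", "xxabcxx") is not None)
# ...but an anchored match rejects the same string.
print(xsd_style_match(r"abc", "xxabcxx"))
print(xsd_style_match(r"abc", "abc"))
```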
Choice
  abc|def matches either abc or def
  Use parentheses to specify the scope of a choice
    example: abc(d|e) matches either abcd or abce
Quantifiers
  (abc){2,4} matches abcabc or abcabcabc or abcabcabcabc
  (abc){2,} matches 2 or more consecutive abc sequences
  (abc)* matches 0 or more sequences
  (abc)+ matches 1 or more sequences
  (abc)? matches 0 or 1 sequences
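These quantifiers behave the same way in Python's `re` module, so their meaning can be checked by counting how many repetitions of abc each pattern accepts. A small sketch; the helper name `accepted_counts` is hypothetical, and `re.fullmatch` stands in for the anchored matching of XML Schema.

```python
import re

def accepted_counts(pattern, limit=6):
    """Return the repetition counts 0..limit of 'abc' that the pattern accepts."""
    return [n for n in range(limit + 1)
            if re.fullmatch(pattern, "abc" * n) is not None]

for pattern in (r"(abc){2,4}", r"(abc){2,}", r"(abc)*", r"(abc)+", r"(abc)?"):
    print(pattern, accepted_counts(pattern))
```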
Character Classes
  Character classes always match exactly one character, no matter how complex they look
  [abc] matches a or b or c
  [^abc] matches anything but a or b or c
  [a-z] matches any character between a and z inclusive
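That a character class consumes exactly one character can be seen in a short Python sketch. These simple classes are written the same way in Python's `re` module; the helper name `class_matches` is hypothetical.

```python
import re

def class_matches(char_class, text):
    """True if the whole text is matched by the single character class."""
    return re.fullmatch(char_class, text) is not None

print(class_matches("[abc]", "b"))    # one character from the set
print(class_matches("[abc]", "ab"))   # two characters: a class never matches both
print(class_matches("[^abc]", "d"))   # anything except a, b, c
print(class_matches("[a-z]", "q"))    # within the range
print(class_matches("[a-z]", "Q"))    # outside the range
```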
Single-Letter Classes
  \n, \r, \t - newline, return, tab
  . - anything but newline or return
  \s, \S - whitespace, non-whitespace
  \i, \I - name initial, non-name initial
  \c, \C - name char, non-name char
  \d, \D - decimal digit, non-decimal digit
  \w, \W - word char, non-word char
Unicode Classes
  \p{Xx}, \P{Xx} - matches anything in (not in) a Unicode General Category
    example: \p{Ll} matches lower case
  \p{IsXxxxx}, \P{IsXxxxx} - matches anything in (not in) a Unicode block
    example: \P{IsCyrillic} matches any non-Cyrillic character
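Python's standard `re` module does not support `\p{...}`, but the two ideas can be emulated with the `unicodedata` module. This is a sketch under stated assumptions: the helper names are hypothetical, and the Cyrillic block is taken to be the range U+0400 to U+04FF.

```python
import unicodedata

def in_category(ch, category):
    """Emulate \\p{Xx}: 'Ll' is lower-case letter, 'L' is any letter."""
    return unicodedata.category(ch).startswith(category)

def in_cyrillic_block(ch):
    """Emulate \\p{IsCyrillic} using the block range U+0400..U+04FF."""
    return 0x0400 <= ord(ch) <= 0x04FF

print(in_category("a", "Ll"))        # lower-case letter
print(in_category("A", "Ll"))        # upper-case, so not in Ll
print(in_cyrillic_block("\u0416"))   # CYRILLIC CAPITAL LETTER ZHE
print(not in_cyrillic_block("z"))    # what \P{IsCyrillic} would accept
```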