Published by Agnes Hampton. Modified over 9 years ago.
1 CIS336 Website design, implementation and management (also Semester 2 of CIS219, CIS221 and IT226)
Lecture 5: XML Schema (based on Møller and Schwartzbach, 2006, pp. 113-159)
David Meredith
d.meredith@gold.ac.uk
www.titanmusic.com/teaching/cis336-2006-7.html
2 Problems with DTDs
DTDs cannot constrain character data
  – e.g., cannot specify that (#PCDATA) must only be a valid integer representation
  – need a more powerful datatype mechanism
Attribute types are too limited
  – e.g., cannot specify that an attribute value must be an integer, a URI, etc.
Element and attribute definitions cannot depend on context
  – e.g., cannot specify that the unit attribute is only allowed if the amount attribute is present
Character data cannot be combined with a regular expression content model
  – i.e., mixed content always has the form (#PCDATA | e1 | e2)*
  – cannot specify the order in which character data may be interspersed with elements
Element content model lacks an "interleaving" operator that would let us specify that an element may occur anywhere inside another element
  – e.g., cannot (easily) specify that a comment element may occur anywhere in the contents of a recipe element
3 More problems with DTDs
DTD provides very limited support for modularity, reuse and evolution of schemas
  – large DTD schemas are hard to write, maintain and read
ID/IDREF mechanism is too limited
  – sometimes want to specify a more restricted scope for an ID attribute than the whole instance document
  – also might want to use multiple attribute values or character data as keys rather than just a single attribute value
DTDs do not support namespaces
4 XML Schema
DTDs defined as part of the XML 1.0 specification (February 1998)
  – inherited from SGML
Shortly afterwards, W3C initiated the XML Schema project to deal with the problems in DTDs
XML Schema Requirements (1999) specifies that XML Schema should be:
  – more expressive than XML DTD
  – a well-formed XML language
  – self-describing, i.e., it should be possible to describe the syntax of XML Schema using an XML Schema (since XML Schema is an XML language)
  – simple enough to implement with modest design and runtime resources (which limits expressiveness)
XML Schema specification should be:
  – defined quickly to prevent competing schema languages gaining a foothold
  – precise, concise, human-readable and illustrated with examples
5 XML Schema technical requirements
XML Schema should
  – contain a mechanism for constraining use of namespaces
  – allow creation of user-defined datatypes for describing character data and attribute values
  – enable inheritance for element, attribute and datatype definitions
  – support evolution of schemas
  – permit embedded structured documentation within schemas
6 XML Schema recommendation
Official XML Schema specification published as W3C Recommendation in 2001, in 2 parts:
XML Schema Part 1: Structures
  – describes core XML Schema including, for example, element and attribute declarations
  – most recent version (at the time of this lecture): Second Edition, 28 October 2004
  – available online at http://www.w3.org/TR/xmlschema-1/
XML Schema Part 2: Datatypes
  – defines facilities for defining datatypes in XML Schema
  – most recent version (at the time of this lecture): Second Edition, 28 October 2004
  – available online at http://www.w3.org/TR/xmlschema-2/
Does not satisfy all original requirements:
  – not simple
      partly remedied by XML Schema Part 0: Primer
      provides easily readable description of the XML Schema facilities
      most recent version (at the time of this lecture): 28 October 2004
      available online at http://www.w3.org/TR/xmlschema-0/
  – not fully self-describing
  – not sufficiently expressive, e.g., cannot express full syntax of RecipeML
7 XML Schema overview
Contains a sophisticated type system like those in common programming languages
  – facilitates re-use and improves schema structure
The four central constructs in XML Schema are all based on types:
  – Simple type definition
      defines a family of Unicode text strings
      describes text without markup
  – Complex type definition
      defines validity requirements for attributes, sub-elements and character data in an element of that type
      describes text which may contain markup
  – Element declaration
      associates an element name with either a simple or a complex type
  – Attribute declaration
      associates an attribute name with a simple type
      attribute values are always unstructured text
8 An example schema written in XML Schema
Schema at left shows
  – one element declaration: student
  – two attribute declarations: id, score
  – one complex type definition: StudentType
  – one simple type definition: Score
XML Schema elements are identified by the namespace http://www.w3.org/2001/XMLSchema
  ● namespace prefix ("xsd") is arbitrary but conventional
Root element in an XML Schema document is named schema
  ● usually contains a targetNamespace attribute
  ● targetNamespace gives the namespace being defined by the schema
  ● this namespace is also declared with a prefix so that the schema can refer to its own definitions
Definitions create new types; declarations describe constituents of the instance document
Definitions and declarations populate the target namespace
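The schema figure from this slide is not preserved in the text. The following is a hedged reconstruction of what such a schema might look like, not the original: the namespace URI http://www.example.org/student, the prefix s and the 0 to 100 range for Score are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical reconstruction; namespace URI and Score range are assumed. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:s="http://www.example.org/student"
            targetNamespace="http://www.example.org/student">

  <!-- Element declaration: student -->
  <xsd:element name="student" type="s:StudentType"/>

  <!-- Attribute declarations: id and score -->
  <xsd:attribute name="id" type="xsd:string"/>
  <xsd:attribute name="score" type="s:Score"/>

  <!-- Complex type definition: StudentType -->
  <xsd:complexType name="StudentType">
    <xsd:attribute ref="s:id" use="required"/>
    <xsd:attribute ref="s:score" use="required"/>
  </xsd:complexType>

  <!-- Simple type definition: Score -->
  <xsd:simpleType name="Score">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="0"/>
      <xsd:maxInclusive value="100"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```

The s prefix is bound to the target namespace so that the schema can refer to its own definitions (s:StudentType, s:Score).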
9 Syntax for element and attribute declarations
Element declaration has form <element name="name" type="type"/>
  – associates the simple or complex type, type, with the element named name
Attribute declaration has form <attribute name="name" type="type"/>
  – associates the simple type, type, with an attribute named name
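As a concrete illustration of the two declaration forms, here is a minimal sketch. The names title and lang are hypothetical and chosen only for the example; the types are built-in XML Schema types.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Element declaration: elements named "title" have the simple type string -->
  <xsd:element name="title" type="xsd:string"/>

  <!-- Attribute declaration: attributes named "lang" have the simple type language -->
  <xsd:attribute name="lang" type="xsd:language"/>
</xsd:schema>
```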
10 Simple student instance document
Can avoid use of prefixes in attribute names
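The instance document from this slide is not preserved. A sketch of what it might look like follows, assuming a schema with the hypothetical target namespace http://www.example.org/student that declares id and score as global attributes; the attribute values are invented.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Globally declared attributes belong to the target namespace,
     so they must be written with a prefix in the instance document. -->
<s:student xmlns:s="http://www.example.org/student"
           s:id="19970233"
           s:score="97"/>
```

If the attributes were instead declared locally, inside the complex type definition, they would by default belong to no namespace and could be written without a prefix, as in id="19970233".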
11 Business card example
Instance doc at top left, written in the language defined at bottom left
Assume we own the domain businesscard.org
  – so no-one else uses this namespace
Can fix it so that there is no need for a prefix in the uri attribute
Compare with DTD
12 Connecting instance documents and schemas
Instance document can refer to a schema using the schemaLocation attribute from the namespace http://www.w3.org/2001/XMLSchema-instance
Value of schemaLocation attribute has two parts, separated by whitespace:
  – target namespace of schema
  – URI of schema document
schemaLocation indicates that the document is supposed to be valid with respect to the schema
schemaLocation attributes may appear in any element
  – usually appear in root element
  – can also appear in another element to indicate that the schema applies to the subtree under that element
  – means XML languages can be combined at will
schemaLocation attribute value is actually a sequence of (namespace, URI) pairs
  – if more than one pair, all schemas apply independently
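A minimal sketch of an instance document that points at its schema. The namespace http://businesscard.org follows the business card example; the schema file name business_card.xsd, the prefixes b and xsi, and the child elements are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- schemaLocation value: target namespace, whitespace, URI of the schema document -->
<b:card xmlns:b="http://businesscard.org"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://businesscard.org business_card.xsd">
  <b:name>John Doe</b:name>
  <b:email>john.doe@example.org</b:email>
</b:card>
```

Further pairs can be appended to the same attribute value, separated by whitespace, when several schemas apply.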
13 More on schemaLocation
All attributes defined in http://www.w3.org/2001/XMLSchema-instance are implicitly declared for all elements in the instance document
schemaLocation attributes are optional
  – they make instance documents self-describing
Applications require documents to be valid relative to schemas decided by application developers, not schemas decided by document authors
XML Schema does not directly enforce a particular root element
  – e.g., an XML Schema definition of XHTML cannot express that the root element must be html
  – means that the application must check the root element as well as carrying out XML validation
14 Simple types
Simple type or datatype is a set of Unicode strings with a particular semantic interpretation
  – e.g., decimal is a built-in XML Schema datatype which consists of all strings that represent decimal numbers (e.g., 3.1415)
      3.1415 is equal to 3.141500
      42 is less than 117
XML Schema contains some primitive simple types with pre-defined meanings
XML Schema also provides various mechanisms for deriving new types from existing ones
15 Simple Types (Datatypes) – Primitive
string         any Unicode string
boolean        true, false, 1, 0
decimal        3.1415
float          6.02214199E23
double         42E970
dateTime       2004-09-26T16:29:00-05:00
time           16:29:00-05:00
date           2004-09-26
hexBinary      48656c6c6f0a
base64Binary   SGVsbG8K
anyURI         http://www.brics.dk/ixwt/
QName          rcp:recipe, recipe
...
16 Some built-in derived simple types
normalizedString
  – as string but whitespace facet is replace
token
  – as string but whitespace facet is collapse
language
  – "en", "da", "en-US", etc.
NMTOKEN
  – e.g., "42", "my.form", "r103"
NMTOKENS
  – e.g., "42 my.form r103"
nonPositiveInteger
  – e.g., "-87", "0"
17 A simple type element declaration
<element name="serialnumber" type="nonNegativeInteger"/>
  – assigns the built-in derived simple type, nonNegativeInteger, to elements named serialnumber
  – contents of a serialnumber element must match nonNegativeInteger (possibly with surrounding whitespace)
  – serialnumber element cannot contain child elements or attributes
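Written out as a complete schema document, such a declaration might look like the sketch below. The xsd prefix and the absence of a target namespace are assumptions made to keep the example short.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- serialnumber elements may contain only a non-negative integer -->
  <xsd:element name="serialnumber" type="xsd:nonNegativeInteger"/>
</xsd:schema>
```

Under this declaration, <serialnumber> 42 </serialnumber> is valid, whereas <serialnumber>-7</serialnumber> and a serialnumber element carrying an attribute are not.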
18 Deriving new simple types by restriction
Restriction of a simple type defines a new type by restricting the possible values of a base type
  – restriction performed on facets of base type (see table above left)
  – restriction may contain multiple constraining facets
Facet restrictions operate at the semantic, not the syntactic, level
  – e.g., allows 123, 0123 and 0123.0 but not 1234 and 123.05
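The definition behind the 123 / 0123 / 0123.0 example is not preserved in the text. One definition consistent with it, assuming a base type of decimal and a totalDigits facet of 3 (the type name three_digits is hypothetical), would be:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type: decimal values with at most three significant digits -->
  <xsd:simpleType name="three_digits">
    <xsd:restriction base="xsd:decimal">
      <xsd:totalDigits value="3"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```

Because the facet applies to the value rather than to the string, 0123.0 is accepted (its value is 123), while 1234 and 123.05 each need more than three digits and are rejected.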
19 Deriving new simple types by restriction
enumeration facet restricts values to a finite set of possibilities (see above left)
pattern facet allows values to be constrained to satisfy regular expressions (see above right)
  – symbols that have a special meaning within regular expressions can be escaped by prefixing with a backslash (e.g., \*)
For most facets, restrictions may be changed in further derivations unless the fixed="true" attribute is added to the constraining facet
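The two figures referred to on this slide are not preserved. As a rough sketch of the two facets, with the type names and values invented for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- enumeration: the value must be one of a finite set (hypothetical values) -->
  <xsd:simpleType name="size">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="small"/>
      <xsd:enumeration value="medium"/>
      <xsd:enumeration value="large"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- pattern: the whole value must match the regular expression;
       the backslash escapes the dot so that it matches a literal "." -->
  <xsd:simpleType name="xml_filename">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="[a-z]+\.xml"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```

With these definitions, "medium" is a valid size and "recipes.xml" is a valid xml_filename, while "recipesxml" is not.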
20 Deriving simple types using list and union
Use the list element inside a simpleType definition to define a whitespace-separated string of values of a particular type (see above left)
  – e.g., "23 4 56 -7" is of type integerlist
Use the union element inside a simpleType definition to specify that a value must be one of two or more types
  – e.g., "true" and "1.3" are both of type boolean_or_decimal
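The type names integerlist and boolean_or_decimal come from the slide; the definitions themselves are not preserved, so the following is a plausible sketch rather than the original figure.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- list: whitespace-separated integers, e.g. "23 4 56 -7" -->
  <xsd:simpleType name="integerlist">
    <xsd:list itemType="xsd:integer"/>
  </xsd:simpleType>

  <!-- union: any value that is a valid boolean or a valid decimal,
       e.g. "true" or "1.3" -->
  <xsd:simpleType name="boolean_or_decimal">
    <xsd:union memberTypes="xsd:boolean xsd:decimal"/>
  </xsd:simpleType>
</xsd:schema>
```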
21 Complex types
An element declaration may assign a complex type to an element name, e.g., the type card_type to the element name card
  – means that elements with the name card must satisfy all the requirements specified in the definition of the type card_type
  – complex type definition may specify attributes, child element types and ordering, and character data
Complex type defined using the XML Schema element complexType
  – content of complexType element can be either complex or simple
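A minimal sketch of such a declaration together with a complex type definition. The names card and card_type and the namespace http://businesscard.org follow the slides; the child elements name and email, the prefix b and the details of the type are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:b="http://businesscard.org"
            targetNamespace="http://businesscard.org"
            elementFormDefault="qualified">

  <!-- Element declaration: card elements must satisfy card_type -->
  <xsd:element name="card" type="b:card_type"/>

  <!-- Complex type definition: child elements, their order, and an attribute -->
  <xsd:complexType name="card_type">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="email" type="xsd:string"/>
    </xsd:sequence>
    <!-- Declared locally, so written without a prefix in instance documents -->
    <xsd:attribute name="uri" type="xsd:anyURI"/>
  </xsd:complexType>
</xsd:schema>
```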
22 Element reference
Element reference takes the form <element ref="name"/>
  – name is the name of an element that has already been declared
Note the difference between an element element with a name attribute (a declaration) and one with a ref attribute (a reference)!
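To make the name/ref distinction concrete, here is a small sketch. The namespace and the prefix b are carried over from the business card example; the content of card is an assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:b="http://businesscard.org"
            targetNamespace="http://businesscard.org">

  <!-- name attribute: this DECLARES the element "name" -->
  <xsd:element name="name" type="xsd:string"/>

  <xsd:element name="card">
    <xsd:complexType>
      <xsd:sequence>
        <!-- ref attribute: this REFERS to the declaration above -->
        <xsd:element ref="b:name"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```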
23 sequence element
Concatenation within the content of an element with a complex content model is expressed using the sequence element
24 choice element
Union (i.e., the '|' operator in a regular expression) corresponds to the choice element
At left, each card element contains either an email element or zero or one phone elements, but not both
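The figure for this slide is missing. The content model it describes could be sketched as follows, with sequence expressing concatenation and choice expressing union; the element types and the leading name element are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:b="http://businesscard.org"
            targetNamespace="http://businesscard.org">

  <xsd:element name="name" type="xsd:string"/>
  <xsd:element name="email" type="xsd:string"/>
  <xsd:element name="phone" type="xsd:string"/>

  <xsd:element name="card">
    <xsd:complexType>
      <!-- sequence: a name, followed by whatever the choice matches -->
      <xsd:sequence>
        <xsd:element ref="b:name"/>
        <!-- choice: either one email, or zero or one phone, but not both -->
        <xsd:choice>
          <xsd:element ref="b:email"/>
          <xsd:element ref="b:phone" minOccurs="0"/>
        </xsd:choice>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```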
25 all element
A content sequence matches an all expression if each constituent of the expression is matched somewhere in the content sequence and every element in the content sequence is matched by a constituent in the expression
Essentially a variant of sequence in which order does not matter
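A minimal sketch of an all content model, using hypothetical name and email children of card:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:b="http://businesscard.org"
            targetNamespace="http://businesscard.org">

  <xsd:element name="name" type="xsd:string"/>
  <xsd:element name="email" type="xsd:string"/>

  <xsd:element name="card">
    <xsd:complexType>
      <!-- all: both children must appear once, in either order -->
      <xsd:all>
        <xsd:element ref="b:name"/>
        <xsd:element ref="b:email"/>
      </xsd:all>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

A card with email before name is as valid as one with name before email.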
26 any element
The any element (an empty element) is a wildcard that matches any element
Its namespace attribute limits the matching elements in various ways
  – whitespace-separated list of URIs
  – ##targetNamespace
  – ##local: empty namespace
  – ##any
  – ##other: any namespace except targetNamespace
27 any element
Can be used to specify that a different language is used inside an element
  – e.g., XHTML used inside the info element in WidgetML (see above)
  – content must consist of one or more elements from the XHTML namespace
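The WidgetML figure is not preserved. A sketch of how the info element might be declared follows; the WidgetML namespace URI is hypothetical and processContents="lax" is an assumption chosen so that validation does not require an XHTML schema to be available.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/widgetml">

  <xsd:element name="info">
    <xsd:complexType>
      <xsd:sequence>
        <!-- one or more elements, each from the XHTML namespace -->
        <xsd:any namespace="http://www.w3.org/1999/xhtml"
                 minOccurs="1" maxOccurs="unbounded"
                 processContents="lax"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```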
28 Some restrictions
all element may only contain element declarations or references (not sequence, choice or any)
sequence and choice elements cannot contain all elements
complexType contents cannot consist of a single element or any declaration
  – need to wrap it in a sequence or choice element
29 Attribute references
A complex type may optionally contain a number of attribute references of the form <attribute ref="name"/>
  – name is the name of an attribute that has been declared elsewhere
  – attribute reference must appear after the content model description of a complex type
  – attribute reference can contain an attribute named use which can take the values optional (default) or required
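A short sketch of an attribute reference placed after the content model. The attribute uri and the namespace come from the business card example; the type name logo_type and its caption child are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:b="http://businesscard.org"
            targetNamespace="http://businesscard.org">

  <!-- Attribute declared elsewhere (globally) -->
  <xsd:attribute name="uri" type="xsd:anyURI"/>

  <xsd:complexType name="logo_type">
    <!-- Content model comes first -->
    <xsd:sequence>
      <xsd:element name="caption" type="xsd:string" minOccurs="0"/>
    </xsd:sequence>
    <!-- Attribute reference comes after it; use="required" overrides the default -->
    <xsd:attribute ref="b:uri" use="required"/>
  </xsd:complexType>
</xsd:schema>
```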
30 minOccurs and maxOccurs
minOccurs and maxOccurs attributes
  – can be used with element, sequence, choice, all and any elements
  – define the possible cardinalities of the element
  – values must be non-negative integers or, for maxOccurs, unbounded
  – by default, minOccurs and maxOccurs are both 1
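For illustration, a content model that uses all three cases: the defaults, an optional element and an unbounded one. The element names are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="card">
    <xsd:complexType>
      <xsd:sequence>
        <!-- defaults: minOccurs="1" maxOccurs="1", i.e. exactly one name -->
        <xsd:element name="name" type="xsd:string"/>
        <!-- zero or one title -->
        <xsd:element name="title" type="xsd:string" minOccurs="0"/>
        <!-- one or more phone elements -->
        <xsd:element name="phone" type="xsd:string" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```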
31 mixed attribute
complexType may optionally have an attribute, mixed="true"
  – means arbitrary character data is permitted anywhere in the content in addition to the elements declared in the content model
  – without the mixed="true" attribute, only whitespace is allowed between elements in the content model
  – character data cannot be constrained if we also want to allow elements in the content
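A minimal sketch of mixed content, using a hypothetical comment element that may contain em children:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="comment">
    <!-- mixed="true": character data may appear around and between the em elements -->
    <xsd:complexType mixed="true">
      <xsd:sequence>
        <xsd:element name="em" type="xsd:string"
                     minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

With this declaration, <comment>Stir <em>gently</em> for two minutes</comment> is valid; removing mixed="true" would make the text outside em invalid.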