G52IWS: Extensible Markup Language (XML)
Chris Greenhalgh

Contents
- What is XML
- XML standards
- XML Syntax
- DTDs
- XML Schema
See “Developing Java Web Services” chapter 8, first part, and G51WPS notes on XML; see also the W3C standards

What is XML
- Text-based language for encoding structured data
  - Tree-structured
- Common abstract syntax
  - Any XML document can be read by a common parser
- DTDs or XML Schema define particular application-specific constraints
  - E.g. new tags, allowed structures and datatypes

XML standards
- Created in 1996
- Derived from the SGML markup language
- Managed by the W3C (www.w3.org) XML group(s) since 1998
  - http://www.w3.org/XML/Core/#Publications inc: Extensible Markup Language (XML) 1.0 (Fourth Edition)
  - http://www.w3.org/XML/Schema#dev inc: XML Schema Part 0: Primer
  - http://www.w3.org/XML/Query/#specs inc: XML Path Language (XPath) 2.0
  - …

XML Example (no DTD)
    <?xml version="1.0" ?>
    <Friends>
      <Person>
        <Name>Jane Doe</Name>
        <Age>21</Age>
        <Body>
          <Weight Unit="lbs">126</Weight>
          <Height Unit="inches">62</Height>
        </Body>
        <Trust trusted="yes"/>
      </Person>
      <Person>
        <Name>John Doe</Name>
        <Age>26</Age>
        <Trust trusted="no"/>
      </Person>
    </Friends>

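A minimal sketch of reading this example with a generic parser, here Python's standard xml.etree.ElementTree. It assumes the second friend is also wrapped in a Person element; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The Friends example, assuming both friends are wrapped in <Person>.
DOC = """<?xml version="1.0" ?>
<Friends>
  <Person>
    <Name>Jane Doe</Name>
    <Age>21</Age>
    <Body>
      <Weight Unit="lbs">126</Weight>
      <Height Unit="inches">62</Height>
    </Body>
    <Trust trusted="yes"/>
  </Person>
  <Person>
    <Name>John Doe</Name>
    <Age>26</Age>
    <Trust trusted="no"/>
  </Person>
</Friends>"""

root = ET.fromstring(DOC)

# Walk the tree: one (name, age, trusted) tuple per Person child of the root.
friends = [
    (p.findtext("Name"), int(p.findtext("Age")), p.find("Trust").get("trusted"))
    for p in root.findall("Person")
]
print(root.tag, friends)
```

No DTD or schema is needed for this: the parser only relies on the common XML syntax.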
XML document structure
- Prolog
- Document type declaration
  - Optional
  - Includes element declarations
- Root element
  - With nested elements
  - With optional attributes
  - With optional text content (incl. CDATA sections)
- Interleaved with optional comments and processing instructions

XML Syntax Contents
- Prolog
- Root
- Processing instructions
- Comments
- Names
- Tags
- Elements
- Content and CDATA sections
- Attributes
- Entities
- Namespaces

Prolog
- Every XML document should start with a prolog (the XML declaration is optional in XML 1.0 but recommended), e.g.
    <?xml version="1.0" ?>
    <?xml version="1.0" encoding="ISO-8859-1" ?>
- Known start allows multi-byte and byte-order encodings to be identified
- Allows a specific encoding to be specified
- Defaults to Unicode (UTF-8 unless a byte-order mark indicates UTF-16)

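A small sketch of why the encoding declaration matters, using Python's standard xml.etree.ElementTree. The Name element and its content are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Bytes encoded as ISO-8859-1, with the encoding declared in the prolog.
declared = '<?xml version="1.0" encoding="ISO-8859-1" ?><Name>Zo\u00eb</Name>'
name = ET.fromstring(declared.encode("iso-8859-1")).text

# The same bytes without a declaration are assumed to be UTF-8,
# and the single byte 0xEB is not valid UTF-8.
try:
    ET.fromstring("<Name>Zo\u00eb</Name>".encode("iso-8859-1"))
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

print(name, undeclared_ok)
```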
Root
- Every XML document has exactly one “top”-level or root element, e.g.
    <?xml version="1.0" ?>
    <Friends> … </Friends>
- But not e.g.
    <?xml version="1.0" ?>
    <Friends> … </Friends>
    <Friends> … </Friends>

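The single-root rule can be checked with any conforming parser. A minimal sketch using Python's standard xml.etree.ElementTree, where is_well_formed is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as a complete XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

one_root = '<?xml version="1.0" ?><Friends></Friends>'
two_roots = '<?xml version="1.0" ?><Friends></Friends><Friends></Friends>'

print(is_well_formed(one_root), is_well_formed(two_roots))
```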
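Processing instructions
- Provide information for XML processing application(s)
- Are of the form: <?target instructions?>
- The XML declaration in the prolog has the same form, although the XML specification does not class it as a processing instruction: <?xml version="1.0" ?>
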
Comments
- Used for documentation
- Are of the form: <!-- some comment -->
- E.g.:
    <?xml version="1.0" ?>
    <!-- my friends -->
    <Friends>
      <!-- my first friend -->
      <Person> … </Person>
    </Friends>

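Processing instructions and comments are part of the parsed document, but sit alongside the element tree. A sketch using Python's standard xml.dom.minidom, which keeps both kinds of node; the "target" processing instruction is the placeholder form from the earlier slide, not a real instruction.

```python
from xml.dom import minidom, Node

DOC = (
    '<?xml version="1.0" ?>'
    "<?target instructions?>"
    "<!-- my friends -->"
    "<Friends><!-- my first friend --><Person/></Friends>"
)

doc = minidom.parseString(DOC)

# Top-level nodes: the processing instruction, the comment, then the root element.
pi, comment, root = doc.childNodes
print(pi.target, pi.data)
print(repr(comment.data))
print(root.tagName)
```

The XML declaration itself does not appear as a node in the tree.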
Names
- No blank spaces
- Must start with an alphabetical letter (e.g. A-Z or a-z) or underscore (_)
- Can be followed by letters, digits (0-9), underscores (_), hyphens (-), periods (.) and colons (:)
  - Colons are normally reserved for use with namespaces
- Case-sensitive
  - E.g. “product” is different from “Product”

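The naming rules above can be sketched as a regular expression. This covers only the ASCII subset described on the slide (the full XML specification also allows many non-ASCII letters), and is_simple_xml_name is a hypothetical helper name.

```python
import re

# First character: letter or underscore; then letters, digits, _ - . :
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.:-]*")

def is_simple_xml_name(candidate):
    """Check a name against the ASCII-only rules from the slide."""
    return NAME_RE.fullmatch(candidate) is not None

for candidate in ["product", "Product", "_x-1.y", "2nd", "my name", ""]:
    print(repr(candidate), is_simple_xml_name(candidate))
```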
Tags
- Main building block of XML
- Start tag: <tagname optional-attributes>
- End tag: </tagname>
- Empty-element tag: <tagname optional-attributes/>

XML Example
    <?xml version="1.0" ?>
    <Friends>
      <Person>
        <Name>Jane Doe</Name>
        <Age>21</Age>
        <Body>
          <Weight Unit="lbs">126</Weight>
          <Height Unit="inches">62</Height>
        </Body>
        <Trust trusted="yes"/>
      </Person>
      <Person>
        <Name>John Doe</Name>
        <Age>26</Age>
        <Trust trusted="no"/>
      </Person>
    </Friends>
Tag kinds in this example:
- Start tag without attributes, e.g. <Friends>, <Person>
- Start tag with attributes, e.g. <Weight Unit="lbs">
- Empty-element tag, e.g. <Trust trusted="yes"/>
- End tag, e.g. </Friends>

Elements
- Basic building block of XML
- Have form:
  - Start tag … matching end tag, or
  - Empty-element tag
- Never overlap
  - Unlike SGML
  - E.g. can’t have “<a>…<b>…</a>…</b>”
- But can be nested
  - I.e. a tree, starting from the root element
  - E.g. can have “<a>…<b>…</b>…</a>”
- Can contain textual content

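A short sketch of the overlap rule, using Python's standard xml.etree.ElementTree. The element names a and b come from the slide; the text content x, y, z is made up.

```python
import xml.etree.ElementTree as ET

overlapping = "<a>x<b>y</a>z</b>"   # end tags in the wrong order
nested = "<a>x<b>y</b>z</a>"        # b is fully contained in a

try:
    ET.fromstring(overlapping)
    overlap_accepted = True
except ET.ParseError:
    overlap_accepted = False

a = ET.fromstring(nested)
b = a[0]  # first (and only) child element of a

# ElementTree stores text before the first child in .text
# and text following a child in that child's .tail.
print(overlap_accepted, a.text, b.tag, b.text, b.tail)
```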
Content and CDATA sections
- Within elements, between start and end tags
- Plain text
  - Whitespace optionally significant
  - No ‘<’ or ‘&’
    - Use entity references instead (“&lt;” and “&amp;”)
- CDATA “escape” section can include any text unescaped except “]]>”
  - e.g. <![CDATA[<hello>&asoa,osd>as<]]>

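Both routes, entity references and a CDATA section, deliver the same text to the application. A sketch using Python's standard library; the msg element is a made-up wrapper around the slide's sample text.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "<hello>&asoa,osd>as<"

# Option 1: escape the markup characters as entity references.
escaped_doc = "<msg>" + escape(raw) + "</msg>"

# Option 2: wrap the raw text in a CDATA section.
cdata_doc = "<msg><![CDATA[" + raw + "]]></msg>"

from_escaped = ET.fromstring(escaped_doc).text
from_cdata = ET.fromstring(cdata_doc).text

# Putting the raw text straight into the element is not well-formed.
try:
    ET.fromstring("<msg>" + raw + "</msg>")
    raw_accepted = True
except ET.ParseError:
    raw_accepted = False

print(escaped_doc)
print(from_escaped == from_cdata == raw, raw_accepted)
```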
Attributes
- Set of key-value pairs associated with each element
- Defined in the start tag or empty-element tag
  - Never in the end tag
- Optional
- Each key must be unique within that element
- E.g. attribute key is “Unit” and value is “lbs”:
    <Weight Unit="lbs">126</Weight>

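A brief sketch of attributes as key-value pairs, and of the uniqueness rule, using Python's standard xml.etree.ElementTree. The duplicated Unit="kg" is made up to trigger the error.

```python
import xml.etree.ElementTree as ET

weight = ET.fromstring('<Weight Unit="lbs">126</Weight>')
attributes = dict(weight.attrib)   # attributes as a plain dictionary
unit = weight.get("Unit")          # look up one attribute by key

# The same key twice on one element is rejected by the parser.
try:
    ET.fromstring('<Weight Unit="lbs" Unit="kg">126</Weight>')
    duplicate_accepted = True
except ET.ParseError:
    duplicate_accepted = False

print(attributes, unit, duplicate_accepted)
```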
Entities
- Short-cuts/references to text
- Of the form: &entityname;
- E.g.
    &lt;    <
    &gt;    >
    &amp;   &
    &quot;  "
    &apos;  '
- More can be defined in the (optional) DTD

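A sketch of entity expansion using Python's standard xml.etree.ElementTree: the five predefined entities, plus one entity declared in an internal DTD. The element name t and the entity name greeting are hypothetical.

```python
import xml.etree.ElementTree as ET

# The five predefined entities are always available.
predefined = ET.fromstring("<t>&lt; &gt; &amp; &quot; &apos;</t>").text

# An additional entity declared in the document's own (internal) DTD.
CUSTOM = """<?xml version="1.0" ?>
<!DOCTYPE t [
  <!ENTITY greeting "hello world">
]>
<t>&greeting;</t>"""
custom = ET.fromstring(CUSTOM).text

print(predefined)
print(custom)
```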
Namespaces
- Are contexts within which names are defined
  - Prevent confusion between coincidental uses of the same names (for elements or attributes)
- A namespace is identified by a URI
  - Never actually resolved to a document
- Default namespace introduced by attribute xmlns="namespaceuri"
  - Applies to that element and all unqualified element names nested within it (NOT attribute names)
- Namespace prefix introduced by attribute xmlns:prefix="namespaceuri"
  - Used explicitly as “prefix:name”
- No namespace is the same as the empty URI “”
  - This is the top-level default namespace, and the default namespace for all attributes at any level

Namespace example
    <?xml version="1.0" ?>
    <Friends xmlns="http://woo.foo/">
      <Person xmlns:n2="http://wee.fee/">
        <n2:Name>Jane Doe</n2:Name>
        <Age xmlns="http://wee.fee/">21</Age>
        <Weight Unit="lbs">126</Weight>
        <Height n2:Unit="inches">62</Height>
      </Person>
    </Friends>
Expanded names
- <Friends>: “http://woo.foo/”, “Friends” (default NS becomes “http://woo.foo/”)
- <Person>: “http://woo.foo/”, “Person”
- <n2:Name>: “http://wee.fee/”, “Name”
- <Age>: “http://wee.fee/”, “Age” (default NS becomes “http://wee.fee/” for this element only)
- <Weight>: “http://woo.foo/”, “Weight”
- Unit attribute of <Weight>: “”, “Unit”
- n2:Unit attribute of <Height>: “http://wee.fee/”, “Unit”

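The expanded names can be checked with a namespace-aware parser. A sketch using Python's standard xml.etree.ElementTree, which writes an expanded name as {namespaceuri}localname and leaves names in no namespace unmarked:

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0" ?>
<Friends xmlns="http://woo.foo/">
  <Person xmlns:n2="http://wee.fee/">
    <n2:Name>Jane Doe</n2:Name>
    <Age xmlns="http://wee.fee/">21</Age>
    <Weight Unit="lbs">126</Weight>
    <Height n2:Unit="inches">62</Height>
  </Person>
</Friends>"""

root = ET.fromstring(DOC)

# Element names in document order, as expanded names.
tags = [element.tag for element in root.iter()]

# Attribute names: unprefixed attributes stay in no namespace.
weight_attrs = dict(root.find(".//{http://woo.foo/}Weight").attrib)
height_attrs = dict(root.find(".//{http://woo.foo/}Height").attrib)

for tag in tags:
    print(tag)
print(weight_attrs, height_attrs)
```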
Document Type Definitions
- Use regular expressions to specify valid document structure
  - Element nesting, required and optional attributes, default values
- May be included after the prolog in the document
  - Or may be referenced from an external name or URL
- Relatively limited expressiveness, especially for attribute and text values
- See G51WPS notes

XML Schema
- More modern alternative to DTDs for specifying valid XML document structure and content
- See http://www.w3.org/XML/Schema#dev
  - XML Schema Part 0: Primer
  - XML Schema Part 1: Structures
  - XML Schema Part 2: Datatypes

XML Schema
- An XML Schema definition is an XML document conforming to the XML Schema schema
- Allows definition of
  - Simple types
    - Without nested elements
    - Including built-in types such as xsd:decimal, xsd:string
  - Complex types
    - With nested elements and optional attributes
  - Elements (which may be simple or complex)
  - Attributes (which all have simple types)

XML Schema example 1
    <?xml version="1.0"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                targetNamespace="http://woo.foo/">
      <xsd:element name="comment" type="xsd:string"/>
    </xsd:schema>
Defines one element “http://woo.foo/”, “comment” of simple type xsd:string, e.g.
    <?xml version="1.0"?>
    <comment xmlns="http://woo.foo/">this is a comment</comment>

XML Schema example 2
    <?xml version="1.0"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                xmlns="http://woo.foo/"
                targetNamespace="http://woo.foo/">
      <xsd:simpleType name="Chocolate">
        <xsd:restriction base="xsd:string">
          <xsd:enumeration value="dark"/>
          <xsd:enumeration value="milk"/>
          <xsd:enumeration value="white"/>
        </xsd:restriction>
      </xsd:simpleType>
      <xsd:element name="chocolate" type="Chocolate"/>
    </xsd:schema>
Defines one element “http://woo.foo/”, “chocolate” of new simple type “http://woo.foo/”, “Chocolate”, which must be “dark”, “milk” or “white”, e.g.
    <?xml version="1.0"?>
    <chocolate xmlns="http://woo.foo/">dark</chocolate>

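Because a schema is itself an XML document, it can be read with an ordinary parser. The sketch below uses Python's standard xml.etree.ElementTree to pull the enumeration values out of the Chocolate type and check an instance by hand. This is only an illustration: the standard library has no XML Schema validator, a real validating parser does far more, and the schema text here assumes an xsd:schema root element. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"
TARGET = "{http://woo.foo/}"

SCHEMA = """<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://woo.foo/"
            targetNamespace="http://woo.foo/">
  <xsd:simpleType name="Chocolate">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="dark"/>
      <xsd:enumeration value="milk"/>
      <xsd:enumeration value="white"/>
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:element name="chocolate" type="Chocolate"/>
</xsd:schema>"""

def allowed_values(schema_text, type_name):
    """Collect the xsd:enumeration values of one named simple type."""
    schema = ET.fromstring(schema_text)
    for simple_type in schema.iter(XSD + "simpleType"):
        if simple_type.get("name") == type_name:
            return [e.get("value") for e in simple_type.iter(XSD + "enumeration")]
    return []

def looks_like_valid_chocolate(instance_text):
    """Hand-rolled check: right expanded name and an allowed value."""
    element = ET.fromstring(instance_text)
    return (element.tag == TARGET + "chocolate"
            and element.text in allowed_values(SCHEMA, "Chocolate"))

print(allowed_values(SCHEMA, "Chocolate"))
print(looks_like_valid_chocolate('<chocolate xmlns="http://woo.foo/">dark</chocolate>'))
print(looks_like_valid_chocolate('<chocolate xmlns="http://woo.foo/">mint</chocolate>'))
```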
XML Schema example 3
    <?xml version="1.0"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                xmlns="http://woo.foo/"
                targetNamespace="http://woo.foo/"
                elementFormDefault="qualified">
      <xsd:complexType name="ThreePiece">
        <xsd:sequence>
          <xsd:element name="lead" type="xsd:string" minOccurs="1" maxOccurs="1"/>
          <xsd:element name="bass" type="xsd:string" minOccurs="1" maxOccurs="1"/>
          <xsd:element name="drums" type="xsd:string" minOccurs="1" maxOccurs="1"/>
        </xsd:sequence>
      </xsd:complexType>
      <xsd:element name="band" type="ThreePiece"/>
    </xsd:schema>
Defines one element “http://woo.foo/”, “band” of new complex type “http://woo.foo/”, “ThreePiece”, with three mandatory child elements, e.g.
    <?xml version="1.0"?>
    <band xmlns="http://woo.foo/">
      <lead>Bill</lead>
      <bass>Bob</bass>
      <drums>Ben</drums>
    </band>

G52IWS XML 2007-10-10
XML Schema example 4
    <?xml version="1.0"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                xmlns="http://woo.foo/"
                targetNamespace="http://woo.foo/">
      <xsd:complexType name="WeightType">
        <xsd:simpleContent>
          <xsd:extension base="xsd:double">
            <xsd:attribute name="Units" type="xsd:string"/>
          </xsd:extension>
        </xsd:simpleContent>
      </xsd:complexType>
      <xsd:element name="weight" type="WeightType"/>
    </xsd:schema>
Defines one element “http://woo.foo/”, “weight” of new complex type “http://woo.foo/”, “WeightType”, which has xsd:double content and an optional Units attribute, e.g.
    <?xml version="1.0"?>
    <weight xmlns="http://woo.foo/" Units="kg">57.5</weight>

XML Schema built-in data types
- string
- base64Binary – Base64 encoded binary
- boolean – true or false
- decimal – arbitrary-precision decimal numbers (integer is derived from it)
- double – 64 bit floating point
- float – 32 bit floating point
- anyURI – URI
- duration – duration
- dateTime – date & time
- …
- And various restrictions, e.g. minimum & maximum values, lengths

Complex type building blocks
- Element combinations:
  - Sequence – in order given, specifiable count
  - All – in any order, 0 or 1 of each
  - Choice – one of
- Additional constructions
  - Reusable groups of elements
  - Reusable groups of attributes
  - Substitution groups
    - Alternative elements which may appear in a particular place

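The three element combinations can be sketched as simple checks on an element's list of child names. This is a hand-written illustration in Python, not schema validation. It assumes each sequence member occurs exactly once and each "all" member is optional; the band element names and the helper function names are illustrative only.

```python
import xml.etree.ElementTree as ET

def child_names(text):
    """Names of the direct children of the document's root element."""
    return [child.tag for child in ET.fromstring(text)]

def matches_sequence(children, names):
    # Sequence: exactly the given names, in the given order (each once).
    return children == list(names)

def matches_all(children, names):
    # All: any order, each allowed name appearing 0 or 1 times.
    return len(set(children)) == len(children) and set(children) <= set(names)

def matches_choice(children, names):
    # Choice: exactly one of the allowed names.
    return len(children) == 1 and children[0] in names

NAMES = ["lead", "bass", "drums"]
in_order = child_names("<band><lead/><bass/><drums/></band>")
shuffled = child_names("<band><drums/><lead/></band>")
single = child_names("<band><bass/></band>")

print(matches_sequence(in_order, NAMES), matches_sequence(shuffled, NAMES))
print(matches_all(shuffled, NAMES), matches_choice(single, NAMES))
```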
Summary
- XML
  - Common abstract syntax
  - Hierarchical element tree, plus content and attributes
- XML Schema
  - Specifies XML elements and allowed structure and content for XML document(s)
  - Checked by “validating” parsers
  - Used to formally specify WSDL, SOAP, etc.
  - Can be used to generate schema-specific APIs
    - E.g. Java Architecture for XML Binding (JAXB)
    - Typically more readable code than DOM or SAX