XML -07-

What is XML?
- XML stands for eXtensible Markup Language
- XML is a markup language
- XML was designed to carry data

    <book>
      <title> A nice book about XML </title>
      <author> Hanan Shpungin </author>
      <edition>
        <volume> 1 </volume>
        <year> 2010 </year>
      </edition>
    </book>

- It wasn't designed to display data: how would you display the book information?

What is XML?
- Do not confuse with HTML
  - HTML is about displaying information, while XML is about carrying information
    - XML focuses on what data is
    - HTML focuses on how data looks
- XML does not do anything
  - XML was created to structure information: it is just pure information wrapped in tags
  - Someone must write a piece of software to send, receive or display it

What is XML?
- XML is just plain text
  - Software that can handle plain text can also handle XML
  - However, XML-aware applications can handle the XML tags specially; the actual handling depends on the tags and the application
- You have to invent your own tags
  - The XML language has no predefined tags (unlike HTML)
  - XML allows the author to define their own tags and their own document structure
  - For instance, the book example used "made-up" tags

    <book>
      <title> A nice book about XML </title>
      <author> Hanan Shpungin </author>
      <edition>
        <volume> 1 </volume>
        <year> 2010 </year>
      </edition>
    </book>

- An XML document is very self-descriptive
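
The difference between plain-text handling and XML-aware handling can be sketched in a few lines. This is only an illustration, assuming Python and its standard `xml.etree.ElementTree` module; the variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# The book example from the slides, held as ordinary text.
book_xml = """
<book>
  <title>A nice book about XML</title>
  <author>Hanan Shpungin</author>
  <edition>
    <volume>1</volume>
    <year>2010</year>
  </edition>
</book>
"""

# Plain-text handling: the tags are just characters in a string.
mentions_title = "<title>" in book_xml

# XML-aware handling: the made-up tags become a navigable structure.
book = ET.fromstring(book_xml)
title = book.find("title").text
year = int(book.find("edition/year").text)

print(mentions_title, title, year)
```

What the application does with `title` and `year` is up to the application; XML itself only carries them.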

What is XML?
- XML is everywhere
  - XML is one of the most common tools for transmitting data between all sorts of applications
  - Various uses:
    - XHTML
    - CAP (Common Alerting Protocol): public warnings and emergencies
    - DocBook: technical documentation
    - CML (Chemical Markup Language): managing molecular information

What is XML good for?
- Separating data from layout
  - You really don't want to have to update your HTML every time the data changes
  - Let HTML worry about how data is presented
  - Put the data into XML files; this way you can modify the data without having to worry about presentation
  - The data can then be retrieved by simple JavaScript code

What is XML good for?
- Simple sharing of data
  - XML data is stored in plain text format; this provides a software- and hardware-independent way of storing data
  - Data can be easily shared between different applications on the same machine or across the internet
  - System upgrades do not require changing the data format, which avoids data incompatibility
- Data accessibility
  - Data can be made available to all kinds of "reading machines" (e.g. smart phones, news feed readers)
  - It can also be made available to people with disabilities

The XML tree
- An XML document is a tree of elements
  - XML documents must contain a root element; this element is "the parent" of all the other elements
  - The elements in an XML document form a document tree
  - The tree starts at the root and branches to the lowest level of the tree
  - All elements can have sub elements (child elements)

    <root>
      <child>
        <subchild>.....</subchild>
      </child>
    </root>

  - The terms "parent", "child", and "sibling" are used to describe the relationships between elements

    <bookstore>
      <book category="COOKING">
        <title lang="en">Everyday Italian</title>
        <author>Giada De Laurentiis</author>
        <year>2005</year>
        <price>30.00</price>
      </book>
      <book category="CHILDREN">
        <title lang="en">Harry Potter</title>
        <author>J. K. Rowling</author>
        <year>2005</year>
        <price>29.99</price>
      </book>
      <book category="WEB">
        <title lang="en">Learning XML</title>
        <author>Erik T. Ray</author>
        <year>2003</year>
        <price>39.95</price>
      </book>
    </bookstore>
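
The parent, child and sibling relationships in the bookstore document can be explored with a short sketch. It assumes Python's standard `xml.etree.ElementTree`; the `depth` helper is a hypothetical function written for this example, and the document is shortened to two books.

```python
import xml.etree.ElementTree as ET

bookstore_xml = """
<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>
"""

root = ET.fromstring(bookstore_xml)  # <bookstore> is the root element

# The children of the root are the <book> elements; they are siblings.
books = list(root)
categories = [book.get("category") for book in books]

# Each <book> is the parent of <title>, <author>, <year> and <price>.
child_tags = [child.tag for child in books[0]]


def depth(element):
    """Number of levels in the subtree rooted at element (illustrative helper)."""
    return 1 + max((depth(child) for child in element), default=0)


print(root.tag, categories, child_tags, depth(root))
```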

The XML tree
- The XML declaration is not part of the tree
  - Most XML documents start with a line which looks like this:

    <?xml version="1.0" encoding="ISO-8859-1"?>

  - XML documents can actually contain non-ASCII characters
  - If you omit the declaration line, the parser falls back to its default encoding (UTF-8 or UTF-16), and a file saved in a different encoding may fail to parse
  - Always place a declaration line to avoid problems
  - Make sure you know in what encoding your XML file is saved

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <bookstore>
      <book category="COOKING">
        <title lang="en">Everyday Italian</title>
        <author>Giada De Laurentiis</author>
        <year>2005</year>
        <price>30.00</price>
      </book>
      <book category="CHILDREN">
        <title lang="en">Harry Potter</title>
        <author>J. K. Rowling</author>
        <year>2005</year>
        <price>29.99</price>
      </book>
      <book category="WEB">
        <title lang="en">Learning XML</title>
        <author>Erik T. Ray</author>
        <year>2003</year>
        <price>39.95</price>
      </book>
    </bookstore>
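
Why the declared encoding matters can be seen in a small experiment. This is a sketch assuming Python's standard `xml.etree.ElementTree` (which uses the Expat parser); the `<author>` document and its content are made up for the example.

```python
import xml.etree.ElementTree as ET

text = '<?xml version="1.0" encoding="ISO-8859-1"?><author>Jos\u00e9</author>'

# Save the document in the encoding that the declaration names.
latin1_bytes = text.encode("iso-8859-1")
author = ET.fromstring(latin1_bytes).text

# Without a declaration the parser assumes UTF-8, so the same
# ISO-8859-1 bytes are no longer a well-formed document.
undeclared = "<author>Jos\u00e9</author>".encode("iso-8859-1")
try:
    ET.fromstring(undeclared)
    failed = False
except ET.ParseError:
    failed = True

print(author, failed)
```

The first document parses because the bytes match the declared encoding; the second is rejected because the byte for the accented letter is not valid UTF-8.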

XML syntax rules
- The XML rules are simple and strict
- All XML elements must have a closing tag
  - A starting tag starts with "<" and ends with ">", e.g. <book>
  - A closing tag starts with "</" and ends with ">", e.g. </book>
  - An empty element tag starts with "<" and ends with "/>", e.g. <new />
  - The following example is illegal, because <text> is never closed:

    <note>
      <sender> Hanan </sender>
      <text> Meet you at 15:30
    </note>

  - The fixed example:

    <note>
      <sender> Hanan </sender>
      <text> Meet you at 15:30 </text>
    </note>

  - The declaration is not an element, so it needs no closing tag:

    <?xml version="1.0" encoding="ISO-8859-1"?>

XML syntax rules
- The XML tags are case-sensitive
  - XML elements are defined using tags, which are case-sensitive
  - <Note> is different from <note>
  - The starting and the closing tags of an element must match

    <Note> this is wrong </note>
    <noTe> this is correct </noTe>

XML syntax rules
- XML elements must be properly nested
  - You cannot open one element and close another one
  - Elements must be closed in reverse order

    <b> <i> wrong </b> </i>
    <b> <i> correct </i> </b>

XML syntax rules
- An XML document must have a root element

    <root>
      <child>
        <subchild>.....</subchild>
      </child>
    </root>

XML syntax rules
- XML attribute values must be quoted
  - XML elements can have attributes in the form name/value, just like in HTML
  - The attribute value must always be quoted

    <msg date=11/02/10> wrong </msg>
    <msg date="11/02/10"> correct </msg>

XML syntax rules
- Whitespace is not truncated in XML: runs of spaces and line breaks in the text are kept (unlike HTML, where they are collapsed when displayed)
- Comments in XML
  - Similar to HTML: <!-- this is a comment -->

XML syntax rules
- Entity references
  - Just like in HTML, several special symbols should be replaced with entity references
  - The characters "<" and "&" are strictly forbidden as literal text in XML
  - The following generates an error:

    <message> if salary < 1000 then </message>

  - It can be fixed with an entity reference:

    <message> if salary &lt; 1000 then </message>

  - There are 5 predefined entities for symbols:

    &lt;     <
    &gt;     >
    &quot;   "
    &apos;   '
    &amp;    &

  - It is possible to address any symbol through its numeric code, e.g. &#101; for "e"
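
The salary example can be checked mechanically. This is a minimal sketch assuming Python's standard library: `xml.sax.saxutils.escape` replaces `&`, `<` and `>` with their entity references, and `xml.etree.ElementTree` does the parsing.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = "if salary < 1000 then"

# Writing the text literally produces a document that is not well-formed.
try:
    ET.fromstring(f"<message>{raw_text}</message>")
    literal_ok = True
except ET.ParseError:
    literal_ok = False

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
escaped = escape(raw_text)
message = ET.fromstring(f"<message>{escaped}</message>")

# A numeric character reference: &#101; is the letter "e".
letter = ET.fromstring("<c>&#101;</c>").text

print(literal_ok, escaped, message.text, letter)
```

After parsing, the application sees the original text again; the entity reference only exists in the serialized document.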

XML elements
- What is an element?
  - An XML element is everything from (and including) the element's start tag to (and including) the element's end tag
  - Elements can contain other elements, simple text or a mixture of both
  - Elements can also have attributes

    <bookstore>
      <book category="WEB">
        <author> Erik T. Ray </author>
        <title> Learning XML </title>
      </book>
    </bookstore>

XML elements
- Naming elements
  - Recall that there are no predefined elements: you make your own
  - There are no reserved words; you can use any name
  - Some restrictions apply:
    - Names cannot start with a digit or with punctuation such as "-" or "."
    - Names cannot start with the letters "xml" in any case (XML, xml, Xml, etc.); such names are reserved
    - Names cannot contain spaces

XML elements
- Naming elements: some tips
  - Use short and informative names, separating words with underscores "_", e.g. first_name, book_title and not "the_title_of_the_book"
  - Avoid using "-", ".", ":" in the names, as they might be misinterpreted by some software
  - Use the naming conventions of the other parts of your project
  - You can use non-English letters, but it is better to avoid them, as the reading software might not support them

XML attributes
- Attributes may appear in the start tag (like in HTML)
- Attributes provide additional information about the element

    <file type="png">image.png</file>

- Attribute values must be quoted
  - You can use either single or double quotes; both uses are valid

    <file type="png">image.png</file>
    <file type='png'>image.png</file>

  - If the attribute value itself contains double quotes, enclose it in single quotes or use entity references

    <musician name='Elvis "The King" Presley'>
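
Both ways of putting quotes inside an attribute value give the same result after parsing. A small sketch, assuming Python's standard `xml.etree.ElementTree`; the elements are written as empty elements so that each string is a complete document.

```python
import xml.etree.ElementTree as ET

# Single quotes outside, double quotes inside.
a = ET.fromstring("""<musician name='Elvis "The King" Presley'/>""")

# Double quotes outside, entity references inside.
b = ET.fromstring('<musician name="Elvis &quot;The King&quot; Presley"/>')

print(a.get("name"))
print(a.get("name") == b.get("name"))
```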

XML attributes
- Elements or attributes?
  - There are several ways to present the same data
  - You can use either attributes

    <message date="11/02/10"> … </message>

    or elements

    <message>
      <date>11/02/10</date>
      …
    </message>

    to present the same data: what is better?

    <message>
      <date>
        <day>11</day>
        <month>02</month>
        <year>2010</year>
      </date>
      <from>Hanan</from>
      <to>class</to>
      <text>XML is fun!</text>
    </message>

    <message>
      <date>11/02/2010</date>
      <from>Hanan</from>
      <to>class</to>
      <text>XML is fun!</text>
    </message>

    <message date="11/02/2010">
      <from>Hanan</from>
      <to>class</to>
      <text>XML is fun!</text>
    </message>

The worst!

    <message date="11/02/2010" from="Hanan" to="class" text="XML is fun!">
    </message>

XML attributes
- Elements or attributes?
  - There are several ways to present the same data
  - You can use either attributes or elements to present the same data: what is better?
  - Attributes are generally harder to read and maintain
  - Some drawbacks of attributes: they cannot contain multiple values and cannot be nested
  - A nice rule of thumb: use elements for data and attributes for metadata (e.g. assigning an id to an element)
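
The trade-off shows up as soon as a program reads the date. The sketch below assumes Python's standard `xml.etree.ElementTree` and uses shortened versions of the message examples; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

as_attribute = ET.fromstring(
    '<message date="11/02/2010"><text>XML is fun!</text></message>'
)
as_elements = ET.fromstring(
    "<message>"
    "<date><day>11</day><month>02</month><year>2010</year></date>"
    "<text>XML is fun!</text>"
    "</message>"
)

# Attribute: one opaque string; the reader must know the date format.
day_a, month_a, year_a = as_attribute.get("date").split("/")

# Elements: the structure itself says which part is which.
date = as_elements.find("date")
year_e = date.findtext("year")
month_e = date.findtext("month")

print(year_a == year_e, month_a == month_e)
```

With the attribute, the knowledge that the value is day/month/year lives in the reading code; with nested elements it lives in the document.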

Well-formedness and validity
- A well-formed XML document
  - A document is well-formed if it satisfies the syntax rules
  - If a document violates the syntax, it is not considered to be an XML document
  - Yes, it's draconian: unlike HTML, where the browser is expected to produce a reasonable result even in the presence of severe errors
  - If a document is not well-formed, the processor is required to stop and report an error
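
The "stop and report an error" behaviour is easy to observe. This is a sketch assuming Python's standard `xml.etree.ElementTree`, which raises `ParseError` for input that is not well-formed; `is_well_formed` is a hypothetical helper written for the example, and the samples are the illegal examples from the syntax rules.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


samples = {
    "missing closing tag": "<note><text>Meet you at 15:30</note>",
    "case mismatch": "<Note>this is wrong</note>",
    "improper nesting": "<b><i>wrong</b></i>",
    "unquoted attribute": "<msg date=11/02/10>wrong</msg>",
    "two root elements": "<a/><b/>",
    "fixed example": "<note><text>Meet you at 15:30</text></note>",
}

results = {name: is_well_formed(doc) for name, doc in samples.items()}
for name, ok in results.items():
    print(f"{name}: {'well-formed' if ok else 'rejected'}")
```

Note that this checks well-formedness only; it says nothing about validity against a schema.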

Well-formedness and validity
- A valid XML document
  - In addition to being well-formed, an XML document may also be valid
  - A valid document holds a reference to a Document Type Definition (DTD), and the document follows the rules of that DTD
  - XML processors are classified as validating or non-validating, depending on whether or not they check XML documents for validity
  - DTD is just one of the many ways to write grammar rules (a schema) which define the validity of a document

Schemas and validation
- What is a schema?
  - A schema addresses the following aspects:
    - the set of elements that may be used in a document
    - what attributes may be applied to every element
    - the order of elements/attributes
    - the allowable parent/child relationships

Schemas and validation
- What is a schema? It defines the grammar rules of a document
- DTD
  - The oldest schema language for XML
  - Quite simple to write and read
  - Only the string type is available for data; that is, you cannot define a numeric type of data
  - No complex types
  - Very widely used

Schemas and validation
- What is a schema? It defines the grammar rules of a document
- DTD

    <!DOCTYPE bookstore [
      <!ELEMENT bookstore (book*)>
      <!ELEMENT book (title,author,year)>
      <!ELEMENT title (#PCDATA)>
      <!ELEMENT author (#PCDATA)>
      <!ELEMENT year (#PCDATA)>
      <!ATTLIST book price CDATA #REQUIRED>
    ]>

Schemas and validation
- What is a schema? It defines the grammar rules of a document
- XML schema definition (XSD)
  - Much more powerful than DTD
  - XSD uses an XML-based format, which makes it easier to read using XML tools
  - Complex and rich data typing (almost like a programming language)
  - Detailed constraints on the logical structure of an XML document

Processing XML documents
- The design goal: "it shall be easy to write programs which process XML documents"
- However, the XML specification does not say how
- A variety of APIs to access XML were developed
- SAX (Simple API for XML)
  - A stream parser with an event-driven API
  - The user defines callback methods to be invoked on specific events
  - The parser simply scans the document and notifies of events, such as "element start", "text node", etc.
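
The callback style of SAX can be sketched as follows. The example assumes Python's standard `xml.sax` module; the `TitleCollector` class and the small bookstore document are made up for the illustration.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collects the text of every <title> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._inside_title = False
        self._buffer = []

    def startElement(self, name, attrs):  # "element start" event
        if name == "title":
            self._inside_title = True
            self._buffer = []

    def characters(self, content):  # "text node" event
        if self._inside_title:
            self._buffer.append(content)

    def endElement(self, name):  # "element end" event
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._inside_title = False


document = b"""<bookstore>
  <book><title>Everyday Italian</title></book>
  <book><title>Learning XML</title></book>
</bookstore>"""

handler = TitleCollector()
xml.sax.parseString(document, handler)
print(handler.titles)
```

The parser never builds a tree; the handler keeps only the state it needs, which is why SAX suits large documents.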

Processing XML documents
- DOM (Document Object Model)
  - Supports navigation in the whole document tree
  - Allows manipulation of the document as well
  - Usually very heavy on the memory
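
Navigation and manipulation with a DOM can be sketched like this. The example assumes Python's standard `xml.dom.minidom`, a lightweight DOM implementation; the document and the added `<year>` element are illustrative.

```python
from xml.dom import minidom

dom = minidom.parseString(
    "<bookstore>"
    "<book category='WEB'><title>Learning XML</title></book>"
    "</bookstore>"
)

# Navigation: the whole tree is in memory, so any node can be reached.
book = dom.getElementsByTagName("book")[0]
title = book.getElementsByTagName("title")[0].firstChild.data
parent_name = book.parentNode.tagName

# Manipulation: add a <year> element to the existing <book>.
year = dom.createElement("year")
year.appendChild(dom.createTextNode("2003"))
book.appendChild(year)

serialized = dom.documentElement.toxml()
print(title, parent_name)
print(serialized)
```

Holding the entire tree in memory is what makes this convenient, and also what makes DOM heavy for large documents.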

Processing XML documents
- Pull parsing
  - Resembles SAX, but instead of reacting to events, the user code asks for the next step
  - The parser provides the user with an iterator, with which they can traverse the document sequentially
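
A pull-style loop can be sketched with Python's standard `xml.etree.ElementTree.XMLPullParser`, used here as one example of a pull API; the document is fed in two chunks only to show that input may arrive in pieces.

```python
import xml.etree.ElementTree as ET

parser = ET.XMLPullParser(events=("start", "end"))
parser.feed("<bookstore><book><title>Learning XML</title></book>")
parser.feed("</bookstore>")
parser.close()

seen = []
title = None
# The caller iterates over the events when it is ready for them.
for event, element in parser.read_events():
    seen.append((event, element.tag))
    if event == "end" and element.tag == "title":
        title = element.text

print(title)
print(seen)
```

Unlike SAX, the control flow stays in the calling code: there are no callbacks, just a loop over events.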

Related specifications
- Usually, the term XML implies additional technologies
- We focus on a few of them:
  - XML Namespaces: using elements with the same name but from different vocabularies (namespaces)
  - XSLT: an XML-based language for transformation of XML documents into other XML documents, plain text, HTML, etc.
  - XPath: used to navigate through elements and attributes in an XML document
  - XQuery: designed to query XML documents (uses XPath)
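
A few XPath-style expressions can be tried on the bookstore data. This sketch assumes Python's standard `xml.etree.ElementTree`, which supports only a limited subset of XPath; the expressions below stay within that subset.

```python
import xml.etree.ElementTree as ET

bookstore = ET.fromstring("""
<bookstore>
  <book category="COOKING"><title lang="en">Everyday Italian</title><price>30.00</price></book>
  <book category="CHILDREN"><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book category="WEB"><title lang="en">Learning XML</title><price>39.95</price></book>
</bookstore>
""")

# Every <title> anywhere below the root.
all_titles = [t.text for t in bookstore.findall(".//title")]

# The <title> of each <book> whose category attribute equals "WEB".
web_titles = [t.text for t in bookstore.findall("./book[@category='WEB']/title")]

# The price of the second <book> child of the root (positions start at 1).
second_price = bookstore.find("./book[2]/price").text

print(all_titles, web_titles, second_price)
```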

XML so far (quick summary)
- XML stands for eXtensible Markup Language
- XML is a markup language
- XML was designed to carry data

    <book>
      <title> A nice book about XML </title>
      <author> Hanan Shpungin </author>
      <edition>
        <volume> 1 </volume>
        <year> 2010 </year>
      </edition>
    </book>

- It wasn't designed to display data: how would you display the book information?

XML so far (quick summary)
- XML is just plain text
  - Software that can handle plain text can also handle XML
  - However, XML-aware applications can handle the XML tags specially; the actual handling depends on the tags and the application
- You have to invent your own tags
  - The XML language has no predefined tags (unlike XHTML)
  - XML allows the author to define their own tags and their own document structure
  - For instance, the book example used "made-up" tags

XML so far (quick summary)
- An XML document is a tree of elements
  - XML documents must contain a root element; this element is "the parent" of all the other elements
  - The elements in an XML document form a document tree
  - The tree starts at the root and branches to the lowest level of the tree
  - All elements can have sub elements (child elements)
  - The terms "parent", "child", and "sibling" are used to describe the relationships between elements
XML Working with XML data XML data is usually read by parsers. Although XML is plain text that can easily be read directly, a parser (e.g. SAX, DOM, or pull parsing) lets an application take advantage of the semantic structure of the document. It is possible to transform XML documents into other XML documents, for example converting an XML document (e.g. a web feed) into an xHTML document to be presented in a browser. XML documents can be transformed (and rendered) by using a family of languages called XSL
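As a rough illustration of reading XML through a parser, here is a minimal sketch that assumes Python and its standard xml.etree.ElementTree module (the slides do not prescribe a language or library). It parses the book example used earlier and reads values by following the tree structure instead of scanning raw text.

```python
import xml.etree.ElementTree as ET

# The book example from the slides, as plain text.
BOOK_XML = """
<book>
  <title>A nice book about XML</title>
  <author>Hanan Shpungin</author>
  <edition>
    <volume>1</volume>
    <year>2010</year>
  </edition>
</book>
"""

# Parsing turns the text into a tree of Element objects.
root = ET.fromstring(BOOK_XML)

title = root.find("title").text              # a child of the root element
year = int(root.find("edition/year").text)   # a grandchild, reached by a path
child_tags = [child.tag for child in root]   # direct children, in document order

print(root.tag, title, year, child_tags)
```

ElementTree builds the whole tree in memory, which is closest to the DOM style mentioned above; SAX and pull parsers instead deliver the document piece by piece.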
XML XSL (eXtensible Stylesheet Language) XSL is to XML what CSS is to HTML: XML tags hold no information about what to display or how to display it. XSL is actually a family of languages that define the transformation and formatting rules for XML documents: XSLT is a language for the transformation of XML documents; XPath is a language for navigating in XML documents; XSL-FO is a language for formatting XML documents
XML XSL Transformations (XSLT) The general model of XSLT The XSLT processor takes two inputs: 1. the XML document 2. the XSL stylesheet. The XSLT processor then produces the output document by following the instructions of the XSL stylesheet. With XSLT you have full control of the resulting XML document: you can create new elements, add existing ones, rearrange elements, sort elements, modify elements, etc.
XML XSL Transformations (XSLT) The processing is template-based The XSL stylesheet defines templates, which the nodes in the origin document are matched against In case of a match, the XSLT processor will transform the matching part into the result document according to the provided rules It can be viewed as functional expressions which evaluate into the final result For example ...
XML document
<?xml version="1.0" ?>
<course_staff>
  <instructor>
    <name>Hanan Shpungin</name>
    <email>hanan.shpungin@ucalgary.ca</email>
  </instructor>
  <teaching_assistant>
    <name>Marian Doerk</name>
    <email>mdoerk@ucalgary.ca</email>
  </teaching_assistant>
</course_staff>
XSL Stylesheet
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">
  <!-- One template, matched against the document root -->
  <xsl:template match="/">
    <html>
      <head>
        <title>SENG 513 teaching staff</title>
      </head>
      <body>
        <h1> SENG 513 teaching staff </h1>
        <h2> Instructor </h2>
        <b> Name: </b> <i><xsl:value-of select="//instructor/name" /></i> <br />
        <b> E-mail: </b> <tt><xsl:value-of select="//instructor/email" /></tt>
        <h2> Teaching assistant </h2>
        <b> Name: </b> <i><xsl:value-of select="//teaching_assistant/name" /></i> <br />
        <b> E-mail: </b> <tt><xsl:value-of select="//teaching_assistant/email" /></tt>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
XML XSL Transformations (XSLT) XSLT relies on several XML-related specifications: XML Namespaces and XPath
XML XML Namespace Namespaces are used to provide unique names to elements and attributes of an XML document. An XML document may contain elements or attributes from more than one XML vocabulary, so the same name can mean different things; the ambiguity can be resolved by giving each vocabulary a namespace. For example, two vocabularies that both define a tree element: <tree> <family> … </family> <age> … </age> </tree> and <tree> <nodes> … </nodes> <edges> … </edges> </tree>
XML XML Namespace With a prefix for each vocabulary, the two tree elements no longer clash: <f:tree> <f:family> … </f:family> <f:age> … </f:age> </f:tree> and <g:tree> <g:nodes> … </g:nodes> <g:edges> … </g:edges> </g:tree>
XML XML Namespace Namespaces are used to provide unique names to elements and attributes of an XML document. The namespace is declared with an xmlns attribute in the start tag of an element: <tag xmlns:prefix="URI"> For example <tree xmlns:f="http://www.forests.com/forest"> When a namespace is declared on an element, all descendant elements that use the same prefix are associated with the same namespace
XML XML Namespace The prefix may be omitted (xmlns="URI"); this declares a default namespace, which applies to the unprefixed elements inside the declaring element. Note that the URI contains no data, it is just a name
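To see how a parser resolves prefixes, here is a minimal sketch assuming Python's standard xml.etree.ElementTree. The forest root element, the graph URI and the element values are made up for illustration; only the forests.com URI appears in the slides.

```python
import xml.etree.ElementTree as ET

# Two vocabularies that both define <tree>. The second URI is a made-up name.
DOC = """
<forest xmlns:f="http://www.forests.com/forest"
        xmlns:g="http://www.example.org/graph">
  <f:tree><f:family>Pine</f:family><f:age>120</f:age></f:tree>
  <g:tree><g:nodes>5</g:nodes><g:edges>4</g:edges></g:tree>
</forest>
"""

root = ET.fromstring(DOC)

# ElementTree replaces each prefix with its URI: {uri}localname.
tags = [child.tag for child in root]

# A prefix-to-URI map for queries; these prefixes need not match the document's.
ns = {"bio": "http://www.forests.com/forest", "gr": "http://www.example.org/graph"}
family = root.find("bio:tree/bio:family", ns).text
edges = root.find("gr:tree/gr:edges", ns).text

print(tags, family, edges)
```

The query uses different prefixes (bio, gr) than the document (f, g) and still matches, which shows that the URI, not the prefix, identifies the namespace.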
XSL Stylesheet The stylesheet from the earlier example uses both forms: <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns="http://www.w3.org/1999/xhtml"> The xsl prefix marks XSLT instructions such as <xsl:template> and <xsl:value-of>, while unprefixed elements such as <head>, <body> and <h1> fall into the default xHTML namespace
XML XML Path XPath is used to navigate through the tree of elements and attributes in an XML document. The navigation slightly resembles paths in a file system, for example /bookstore/book /bookstore/* //title[@lang='eng'] An expression selects either a single node or a set of nodes by following the steps along the given path
XML XML Path Syntax The nodes are selected by following a chain of steps The chain can be either absolute or relative /step1/step2/step3/… step1/step2/step3/… The steps are separated by a “/” The general form of a step is: axisname::node-test[predicate] For example child::book[price>35.00]
XML XML Path Syntax The evaluation of a single step is relative to the current node (the context node). The axis name defines the relationship within the tree, for example child, descendant, attribute, parent, etc. The node test specifies a node (or a group of nodes) within the axis. Predicates refine the match by stating additional conditions that the nodes must satisfy
XPath expression: /bookstore <?xml version="1.0" encoding="ISO-8859-1"?> <bookstore> <book> <title lang="eng">Harry Potter</title> <price>29.99</price> </book> <book> <title lang="eng">Learning XML</title> <price>39.95</price> </book> </bookstore> Selects: the bookstore element
XPath expression: /bookstore/book[price>35] <?xml version="1.0" encoding="ISO-8859-1"?> <bookstore> <book> <title lang="eng">Harry Potter</title> <price>29.99</price> </book> <book> <title lang="eng">Learning XML</title> <price>39.95</price> </book> </bookstore> Selects: the second book (price 39.95)
XPath expression: //price/parent::book/title[@lang='eng'] <?xml version="1.0" encoding="ISO-8859-1"?> <bookstore> <book> <title lang="eng">Harry Potter</title> <price>29.99</price> </book> <book> <title>Learning XML</title> <price>39.95</price> </book> </bookstore> Selects: the title of the first book only, since the second title has no lang attribute here
XPath expression: //book/* <?xml version="1.0" encoding="ISO-8859-1"?> <bookstore> <book> <title lang="eng">Harry Potter</title> <price>29.99</price> </book> <book> <title lang="eng">Learning XML</title> <price>39.95</price> </book> </bookstore> Selects: all child elements of every book (both titles and both prices)
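The expressions above can be tried out with a small sketch, assuming Python's standard xml.etree.ElementTree. That module implements only a subset of XPath: paths are written relative to an element rather than absolute, and numeric predicates such as price>35 are not available, so that condition is applied in plain Python here.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """
<bookstore>
  <book><title lang="eng">Harry Potter</title><price>29.99</price></book>
  <book><title lang="eng">Learning XML</title><price>39.95</price></book>
</bookstore>
"""
root = ET.fromstring(BOOKSTORE)  # root is the <bookstore> element

# /bookstore/book  ->  relative to the root element this is just "book"
books = root.findall("book")

# //title[@lang='eng']  ->  ".//" searches all descendants of the root
eng_titles = [t.text for t in root.findall(".//title[@lang='eng']")]

# //book/*  ->  every child element of every book
book_children = [e.tag for e in root.findall(".//book/*")]

# /bookstore/book[price>35]: numeric comparison is outside ElementTree's
# XPath subset, so the predicate is evaluated in Python instead.
expensive = [b.findtext("title") for b in books if float(b.findtext("price")) > 35]

print(len(books), eng_titles, book_children, expensive)
```

A full XPath implementation would accept the original expressions unchanged; the rewriting is a limitation of this particular module, not of XPath.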
XML XML Path Syntax It is possible to specify several paths simultaneously by combining them with the | (union) operator
XPath expression: //title | //book[price>35]/price <?xml version="1.0" encoding="ISO-8859-1"?> <bookstore> <book> <title lang="eng">Harry Potter</title> <price>29.99</price> </book> <book> <title lang="eng">Learning XML</title> <price>39.95</price> </book> </bookstore> Selects: both titles and the price of the second book (39.95)
XML XML Path Syntax XPath supports a set of operators which can be used in XPath expressions, for example: *, /, div, or, and, etc. There is also a core function library with useful functions. Some of the functions are general utilities: substring, floor, string-length. Some of the functions operate on the node set or the current context: position, count, id
XML XSL Transformations (XSLT) The XSL stylesheet is an XML document! Therefore it normally starts with the XML declaration (optional in XML 1.0, but if present it must come first) <?xml version="1.0" encoding="ISO-8859-1"?> The root element can be either xsl:stylesheet or xsl:transform. The XSLT namespace must be declared <?xml version="1.0" encoding="ISO-8859-1"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> ... </xsl:stylesheet>
XML XSL Transformations (XSLT) The XML document is linked to an XSL stylesheet through a simple processing instruction <?xml version="1.0" encoding="ISO-8859-1"?> <?xml-stylesheet type="text/xsl" href="myxsl.xsl"?> The browser will then gladly render your XML file using the stylesheet. The latest versions of the major browsers support XSL Transformations
XML XSL Stylesheet structure The XSL stylesheet consists of one or more templates. Templates hold the rules which are applied when XML nodes are matched. The <xsl:template> element defines a template; its match attribute is an XPath pattern that selects the nodes the template applies to <xsl:template match="XPathExpression"> ... </xsl:template>
XML XSL template rules The XSLT processor recursively searches for a match. The processor holds a list of nodes to match (initialized with the root node). For each node in the list, the best matching template is chosen (if one exists) and its logic is applied. If the node is unmatched, the processor adds its children to the list and continues to the next node. The process ends when the list is empty. Templates will usually trigger template matching of descendant nodes
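The matching loop described above can be imitated in a few lines. This is a simplified simulation written in Python, not an XSLT processor: the process function, the dictionary of templates keyed by tag name, and the output strings are all hypothetical, and details such as text nodes and template priorities are ignored.

```python
import xml.etree.ElementTree as ET

STAFF = """
<course_staff>
  <instructor><name>Hanan Shpungin</name></instructor>
  <teaching_assistant><name>Marian Doerk</name></teaching_assistant>
</course_staff>
"""

def process(nodes, templates, out):
    """Match each node against the templates; descend into unmatched nodes."""
    for node in nodes:
        template = templates.get(node.tag)       # "best" template: exact tag name
        if template is not None:
            template(node, templates, out)       # matched: the template emits output
        else:
            process(list(node), templates, out)  # unmatched: try the children

def instructor(node, templates, out):
    out.append("Instructor: " + node.findtext("name"))

def teaching_assistant(node, templates, out):
    out.append("TA: " + node.findtext("name"))

templates = {"instructor": instructor, "teaching_assistant": teaching_assistant}
result = []
process([ET.fromstring(STAFF)], templates, result)
print(result)
```

No template matches course_staff, so the loop moves on to its children, where both templates fire. A template that wanted its own descendants processed would call process again on them, which is the role <xsl:apply-templates> plays in a real stylesheet.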
XML Producing output The result tree is created through static XML output and several XSL elements The <xsl:value-of> element Extracts the value of the selected node Has the format <xsl:value-of select="XPathExpression"/> If not absolute, the XPathExpression is relative to the current node (the node which matched the template)
XML Producing output The result tree is created through static XML output and several XSL elements The <xsl:for-each> element Used to process every node of a specified node set Has the format <xsl:for-each select="XPathExpression"> . . . </xsl:for-each> If not absolute, the XPathExpression is relative to the current node (the node which matched the template)
XML Producing output The result tree is created through static XML output and several XSL elements The <xsl:for-each> element For example <xsl:template match="/bookstore"> <xsl:for-each select="book[price &lt; 10]"> <i> <xsl:value-of select="title" /> : </i> <b> <xsl:value-of select="price" /> </b> </xsl:for-each> </xsl:template>
XML Producing output The result tree is created through static XML output and several XSL elements The <xsl:sort> element The elements can be sorted by simply placing the <xsl:sort> element inside the <xsl:for-each> element
XML Producing output The result tree is created through static XML output and several XSL elements The <xsl:sort> element For example <xsl:template match="/bookstore"> <xsl:for-each select="book[price &lt; 10]"> <xsl:sort order="descending" /> <i> <xsl:value-of select="title" /> : </i> <b> <xsl:value-of select="price" /> </b> </xsl:for-each> </xsl:template>
XML Producing output The <xsl:sort> element It is possible to have primary, secondary (and so on) sort keys by placing several <xsl:sort> elements inside the <xsl:for-each> element
XML Producing output The result tree is created through static XML output and several XSL elements The <xsl:if> and <xsl:choose> elements Conditional processing in a template is supported by these two elements The <xsl:if> element is a simple condition <xsl:if test="TestExpression"> . . . </xsl:if>
XML Producing output The result tree is created through static XML output and several XSL elements The <xsl:if> and <xsl:choose> elements For example <xsl:template match="/bookstore"> <xsl:for-each select="book[price &lt; 10]"> <i> <xsl:value-of select="title" /> : </i> <b> <xsl:value-of select="price" /> </b> <xsl:if test="price &lt; 5"> <blink> GREAT PRICE!!! </blink> </xsl:if> </xsl:for-each> </xsl:template>
XML Producing output The <xsl:if> and <xsl:choose> elements The <xsl:choose> element is a more complex condition which resembles the Java switch statement: each <xsl:when test="TestExpression"> branch is tried in order, with an optional <xsl:otherwise> as the default
XML Producing output The result tree is created through static XML output and several XSL elements The <xsl:apply-templates> element When the XSLT processor finds a match to a node, it does not automatically process its children. It is possible to invoke the matching on some of the descendants of the current node <xsl:apply-templates select="XPathExpression" /> If not absolute, the XPathExpression is relative to the current node (the node which matched the template)
XML Producing output The <xsl:apply-templates> element <xsl:apply-templates /> If the select attribute is omitted, the matching will be applied to all the child nodes
XML XSL summary XML data is usually read by parsers. Although XML is plain text that can easily be read directly, a parser (e.g. SAX, DOM, or pull parsing) lets an application take advantage of the semantic structure of the document. It is possible to transform XML documents into other XML documents, for example converting an XML document (e.g. a web feed) into an xHTML document to be presented in a browser. XML documents can be transformed (and rendered) by using a family of languages called XSL
XML XSL summary XSL is to XML what CSS is to HTML: XML tags hold no information about what to display or how to display it. XSL is actually a family of languages that define the transformation and formatting rules for XML documents: XSLT is a language for the transformation of XML documents; XPath is a language for navigating in XML documents; XSL-FO is a language for formatting XML documents
XML Well-formedness and validity (recap.) A well-formed XML document A document is well-formed if it satisfies the syntax rules. If an XML document violates the syntax, it is not considered to be an XML document. Yes, it's draconian, unlike HTML, where the browser is expected to produce a reasonable result even in the presence of severe errors. If a document is not well-formed, the processor is required to stop and report an error
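The stop-and-report behaviour is easy to observe. A minimal sketch, assuming Python's standard xml.etree.ElementTree parser; the helper name is_well_formed is made up for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # The parser refuses to continue after the first syntax error.
        return False

good = "<tree><nodes>5</nodes></tree>"
bad = "<tree><node>5</nodes></tree>"   # start and end tags do not match

print(is_well_formed(good), is_well_formed(bad))
```

This checks well-formedness only. ElementTree is a non-validating parser, so it says nothing about whether a document follows a schema.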
XML Well-formedness and validity (recap.) A valid XML document In addition to being well-formed, an XML document may also be valid A valid document holds a reference to an XML schema, and the document follows the rules of that schema XML processors are classified as validating or non-validating, depending on whether or not they check XML documents for validity Document Type Definition (DTD) is just one of the many ways to write grammar rules (schema) which define the validity of a document
XML Schemas and validation (recap.) What is an XML schema? An XML schema addresses the following aspects: the set of elements that may be used in a document what attributes may be applied to every element the order of elements/attributes the allowable parent/child relationships elements/attributes data types etc.
XML Schemas and validation (recap.) What is an XML schema? Defines the grammar rules of a document Document Type Definition (DTD) The oldest schema language for XML Quite simple to write and read Only the string type is available for data, that is, you cannot define a numeric type of data No complex types Very widely used
XML Schemas and validation (recap.) What is an XML schema? Defines the grammar rules of a document XML Schema definition (XSD) Much more powerful than DTD XSD uses an XML-based format, which makes it easier to read using XML tools Complex and rich data typing (almost like a programming language) Detailed constraints on the logical structure of an XML document
XML Schemas and validation (recap.) What is an XML schema? Defines the grammar rules of a document Why use any kind of schema? Your XML documents carry their own definitions. A schema may serve as a specification of the data format in a shared project. You can validate received data to avoid parsing errors. You can validate your own data to avoid errors
XML Document Type Definition (DTD) DTD is a set of declarations <!DOCTYPE bookstore [ <!ELEMENT bookstore (book*)> <!ELEMENT book (title,author,year)> <!ELEMENT title (#PCDATA)> <!ELEMENT author (#PCDATA)> <!ELEMENT year (#PCDATA)> <!ATTLIST book price CDATA #REQUIRED> ]> Declarations define the legal structure of a valid XML document
XML Document Type Definition (DTD) DTD is a set of declarations There are several types of declarations Element type declarations: name the allowable set of elements within the document and specify the allowed nesting and character content of the elements
XML Document Type Definition (DTD) Attribute type declarations: name the allowable set of attributes for each declared element and specify the value type or a strict set of possible values for each attribute
XML Document Type Definition (DTD) Entity type declarations: used to define abbreviations (resembles #define in the C programming language) and also to specify special characters
XML Document Type Definition (DTD) DTD is a set of declarations There are several types of declarations The declarations are composed of building blocks with different types In addition to the basic types: Elements, Attributes, Entities there are types for character data: CDATA and PCDATA CDATA is “Character Data”; it is text that WILL NOT be parsed by an XML parser Simply put, entities and markup will not be parsed
XML Document Type Definition (DTD) PCDATA is “Parsed Character Data”; it is text that WILL be parsed by an XML parser Simply put, entities and markup will be parsed
XML Document Type Definition (DTD) DTD is a set of declarations Element type declarations Elements are declared with an ELEMENT declaration as follows <!ELEMENT element-name category> or <!ELEMENT element-name (element-content)>
XML Document Type Definition (DTD) DTD is a set of declarations Element type declarations Elements are declared with an ELEMENT declaration as follows <!ELEMENT element-name category> category specifies that element must have no content (EMPTY) or can have any content (ANY) For example, <!ELEMENT br EMPTY>
XML Document Type Definition (DTD) DTD is a set of declarations Element type declarations Elements are declared with an ELEMENT declaration as follows <!ELEMENT element-name (element-content)> (element-content) specifies the possible content of the element in more detail For example, a required sequence of child elements <!ELEMENT html (head, body)> Each of the listed child elements must appear exactly once, in the given order
XML Document Type Definition (DTD) DTD is a set of declarations Element type declarations Elements are declared with an ELEMENT declaration as follows <!ELEMENT element-name (element-content)> (element-content) specifies the possible content of the element in more detail For example, specifying the number of occurrences <!ELEMENT book (title, dedication*, authors+, toc?, chapter+)> where * means zero or more, + means one or more, and ? means zero or one
XML Document Type Definition (DTD) DTD is a set of declarations Element type declarations Elements are declared with an ELEMENT declaration as follows <!ELEMENT element-name (element-content)> (element-content) specifies the possible content of the element in more detail For example, alternatives and mixed content <!ELEMENT book (title, authors+, toc?, (chapter|story)+)> <!ELEMENT message (#PCDATA|from|to)*> In mixed content, #PCDATA must come first and the group must end with *
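A content model reads much like a regular expression over the names of the child elements. The sketch below makes that analogy concrete in Python; it is an illustration only, not a DTD validator (the standard xml.etree.ElementTree module does not validate against DTDs), and the names BOOK_MODEL and children_match are made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# (title, dedication*, authors+, toc?, chapter+) rewritten as a regular
# expression over the sequence of child element names, each followed by ",".
BOOK_MODEL = re.compile(r"title,(dedication,)*(authors,)+(toc,)?(chapter,)+")

def children_match(element, model):
    """Check the element's direct children against a content-model regex."""
    names = "".join(child.tag + "," for child in element)
    return model.fullmatch(names) is not None

ok = ET.fromstring("<book><title/><authors/><chapter/><chapter/></book>")
no_authors = ET.fromstring("<book><title/><toc/><chapter/></book>")
wrong_order = ET.fromstring("<book><authors/><title/><chapter/></book>")

print(children_match(ok, BOOK_MODEL),
      children_match(no_authors, BOOK_MODEL),
      children_match(wrong_order, BOOK_MODEL))
```

The first document satisfies the model; the second lacks the required authors element and the third has the children in the wrong order.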
XML Document Type Definition (DTD) DTD is a set of declarations Attribute type declarations Attributes are declared with an ATTLIST declaration as follows <!ATTLIST element-name attribute-name attribute-type default-value> attribute-type defines the type of the attribute value, e.g. CDATA, (val1|val2|…), ID, ENTITY <!ATTLIST cd genre (jazz|rock|pop) … > <!ATTLIST div id ID … >
XML Document Type Definition (DTD) DTD is a set of declarations Attribute type declarations Attributes are declared with an ATTLIST declaration as follows <!ATTLIST element-name attribute-name attribute-type default-value> default-value can have the following values #REQUIRED - the attribute is required #IMPLIED - the attribute is not required #FIXED value - the attribute value is fixed value - the attribute’s default value is value
XML Document Type Definition (DTD) DTD is a set of declarations Attribute type declarations Attributes are declared with an ATTLIST declaration as follows <!ATTLIST element-name attribute-name attribute-type default-value> default-value can have the following values: #REQUIRED, #IMPLIED, #FIXED value, value <!ATTLIST cd genre (jazz|rock|pop) #REQUIRED>
XML Document Type Definition (DTD) DTD is a set of declarations Attribute type declarations Attributes are declared with an ATTLIST declaration as follows <!ATTLIST element-name attribute-name attribute-type default-value> default-value can have the following values: #REQUIRED, #IMPLIED, #FIXED value, value <!ATTLIST div id ID #IMPLIED>
XML Document Type Definition (DTD) DTD is a set of declarations Attribute type declarations Attributes are declared with an ATTLIST declaration as follows <!ATTLIST element-name attribute-name attribute-type default-value> default-value can have the following values: #REQUIRED, #IMPLIED, #FIXED value, value <!ATTLIST payment type (cash|credit) "cash">
XML Document Type Definition (DTD) DTD is a set of declarations Attribute type declarations Attributes are declared with an ATTLIST declaration as follows <!ATTLIST element-name attribute-name attribute-type default-value> default-value can have the following values: #REQUIRED, #IMPLIED, #FIXED value, value <!ATTLIST record version CDATA #FIXED "1.0">
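To see a default value in action, the payment declaration can be placed in an internal DTD subset and parsed. This is a small sketch assuming Python's standard-library parser (which is built on expat and is not a validating parser); it should fill in declared defaults from the internal subset even though it does not check the document against the DTD.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset that declares a default for the "type" attribute.
with_default = """<!DOCTYPE payment [
  <!ELEMENT payment EMPTY>
  <!ATTLIST payment type (cash|credit) "cash">
]>
<payment/>"""

# Same DTD, but the document supplies the attribute explicitly.
with_explicit = with_default.replace("<payment/>", '<payment type="credit"/>')

defaulted = ET.fromstring(with_default).get("type")
explicit = ET.fromstring(with_explicit).get("type")

print(defaulted)  # the default declared in the ATTLIST
print(explicit)   # the value written in the document
```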
XML Document Type Definition (DTD) DTD is a set of declarations Entity type declarations Entities are declared with an ENTITY declaration, which can be internal: <!ENTITY entity-name "entity-value"> or external <!ENTITY entity-name SYSTEM "URI/URL"> The entity is then used with the following syntax &entity-name;
XML Document Type Definition (DTD) DTD is a set of declarations Entity type declarations Entities are declared with an ENTITY declaration, which can be internal: <!ENTITY entity-name "entity-value"> For example: <!ENTITY coursenum "SENG513"> Usage: <course>&coursenum;</course>
XML Document Type Definition (DTD) DTD is a set of declarations Entity type declarations Entities are declared with an ENTITY declaration, which can be internal or external: <!ENTITY entity-name SYSTEM "URI/URL"> For example: <!ENTITY coursenum SYSTEM "http://www.uofc.ca/courses/entities.dtd">
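The internal coursenum entity can be tried out directly. The following is a minimal sketch using Python's standard-library ElementTree parser; the surrounding document is made up for the illustration. The parser replaces &coursenum; with the declared value while reading the document.

```python
import xml.etree.ElementTree as ET

# The entity is declared in the internal DTD subset and used in the body.
doc = """<?xml version="1.0"?>
<!DOCTYPE course [
  <!ENTITY coursenum "SENG513">
]>
<course>&coursenum;</course>"""

course = ET.fromstring(doc)
print(course.text)  # SENG513
```

External entities are a different matter: whether they are fetched at all depends on the parser and its configuration.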
XML Document Type Definition (DTD) DTD can be defined internally and externally Internal definition is within the XML document The DTD needs to be wrapped in a DOCTYPE definition as follows <!DOCTYPE root-element [element-declarations]> Where root-element is the root element of the XML document and element-declarations are the DTD declarations
XML Document Type Definition (DTD) DTD can be defined internally and externally Internal definition is within the XML document For example <?xml version="1.0"?> <!DOCTYPE note [ <!ELEMENT note (to,from,heading,body)> <!ELEMENT to (#PCDATA)> <!ELEMENT from (#PCDATA)> <!ELEMENT heading (#PCDATA)> <!ELEMENT body (#PCDATA)> ]> ... <note> ... </note>
XML Document Type Definition (DTD) DTD can be defined internally and externally Internal definition is within the XML document External definition is referenced from within the XML document The DTD declarations appear in an external file which needs to be referenced as follows <!DOCTYPE root-element SYSTEM "filename">
XML Document Type Definition (DTD) DTD can be defined internally and externally Internal definition is within the XML document External definition is referenced from within the XML document For example, <?xml version="1.0"?> <!DOCTYPE note SYSTEM "note.dtd"> <note> ... </note>
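The difference between the two DOCTYPE forms can be observed from a program. This sketch assumes Python's standard-library expat binding and its StartDoctypeDeclHandler callback; the helper name doctype_info is made up. It reports the root element name, the external file (if any), and whether an internal subset is present. The external file itself is not loaded here.

```python
import xml.parsers.expat

def doctype_info(document):
    """Return (root name, system id, has internal subset) from the DOCTYPE."""
    found = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found["info"] = (name, system_id, bool(has_internal_subset))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(document, True)
    return found.get("info")

external = '<?xml version="1.0"?><!DOCTYPE note SYSTEM "note.dtd"><note/>'
internal = '<?xml version="1.0"?><!DOCTYPE note [<!ELEMENT note (#PCDATA)>]><note/>'

print(doctype_info(external))  # root name, external file, no internal subset
print(doctype_info(internal))  # root name, no external file, internal subset
```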
XML Document Type Definition (DTD) Example from http://www.vervet.com/
<!ELEMENT CATALOG (PRODUCT+)>
<!ELEMENT PRODUCT (SPECIFICATIONS+, OPTIONS?, PRICE+, NOTES?)>
<!ELEMENT SPECIFICATIONS (#PCDATA)>
<!ELEMENT OPTIONS (#PCDATA)>
<!ELEMENT PRICE (#PCDATA)>
<!ELEMENT NOTES (#PCDATA)>
<!ATTLIST PRODUCT NAME CDATA #IMPLIED>
<!ATTLIST PRODUCT CATEGORY (HandTool | Table | Shop-Professional) "HandTool">
<!ATTLIST PRODUCT PARTNUM CDATA #IMPLIED>
<!ATTLIST PRODUCT PLANT (Pittsburgh | Milwaukee | Chicago) "Chicago">
<!ATTLIST PRODUCT INVENTORY (InStock | Backordered | Discontinued) "InStock">
<!ATTLIST SPECIFICATIONS WEIGHT CDATA #IMPLIED>
<!ATTLIST SPECIFICATIONS POWER CDATA #IMPLIED>
<!ATTLIST OPTIONS FINISH (Metal | Polished | Matte) "Matte">
<!ATTLIST OPTIONS ADAPTER (Included | Optional | NotApplicable) "Included">
<!ATTLIST OPTIONS CASE (HardShell | Soft | NotApplicable) "HardShell">
<!ATTLIST PRICE MSRP CDATA #IMPLIED>
<!ATTLIST PRICE WHOLESALE CDATA #IMPLIED>
<!ATTLIST PRICE STREET CDATA #IMPLIED>
<!ATTLIST PRICE SHIPPING CDATA #IMPLIED>
<!ENTITY AUTHOR "John Doe">
<!ENTITY COMPANY "JD Power Tools, Inc.">
<!ENTITY EMAIL "jd@jd-tools.com">
XML XML Schema
XML XML Schema XML Schema resembles DTD in what it provides: defines elements/attributes that can appear in a document defines the parent/child relationship of elements defines the number of children and their order defines the possible content of an element defines data types for elements and attributes defines default and fixed values for elements and attributes
XML XML Schema XML Schema resembles DTD in what it provides: However, it holds some advantages over DTD: it is written in XML; it supports more data types and data restrictions; it is possible to reference multiple XML Schemas; it supports namespaces; existing XML Schemas can be reused and extended. DTD is still used more often, owing to its simplicity and clarity
XML XML Schema The XML Schema is an XML document with a root element <schema> The typical form of an XML Schema is <?xml version="1.0"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> . . . </xs:schema> Note the xs namespace defined in the root element
XML XML Schema The XML Schema is an XML document with a root element <schema> There are two types of elements: simple and complex
XML XML Schema The XML Schema is an XML document with a root element <schema> There are two types of elements: simple and complex Simple elements These elements contain only text of any type and cannot contain any other elements or attributes Some restrictions may be applied Simple elements cannot be empty
XML XML Schema The XML Schema is an XML document with a root element <schema> There are two types of elements: simple and complex Simple elements These elements contain only text of any type and cannot contain any other elements or attributes Some restrictions may be applied Simple elements cannot be empty Complex elements Any element which is not simple, is complex
XML XML Schema The XML Schema is an XML document with a root element <schema> There are two types of elements: simple and complex Simple elements and attributes are of simple type
XML XML Schema The XML Schema is an XML document with a root element <schema> There are two types of elements: simple and complex Simple elements and attributes are of simple type The syntax of a simple element is <xs:element name="name" type="type"/> Where name is the name of the element and type is the text data type
XML XML Schema The XML Schema is an XML document with a root element <schema> There are two types of elements: simple and complex Simple elements and attributes are of simple type The syntax of a simple element is <xs:element name="name" type="type"/> Most common data types are xs:string xs:boolean xs:decimal xs:date xs:integer xs:time
XML XML Schema The XML Schema is an XML document with a root element <schema> There are two types of elements: simple and complex Simple elements and attributes are of simple type For example: <xs:element name="dogname" type="xs:string"/> <xs:element name="age" type="xs:integer"/> <xs:element name="dateborn" type="xs:date"/> <dogname>Rocky</dogname> <age>5</age> <dateborn>2005-02-11</dateborn>
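Because a schema is itself an XML document, its declarations can be read with any XML parser. The sketch below is not a real XSD validator: it reads the three declarations above with Python's standard library and applies deliberately simplified stand-ins for xs:string, xs:integer and xs:date. The names CHECKS, declared and value_ok are invented for the illustration, and comparing the literal prefix "xs:" is an assumption that only holds for this schema.

```python
import datetime
import re
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

schema = ET.fromstring("""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="dogname" type="xs:string"/>
  <xs:element name="age" type="xs:integer"/>
  <xs:element name="dateborn" type="xs:date"/>
</xs:schema>""")

def is_date(text):
    """Simplified xs:date check: accept what date.fromisoformat accepts."""
    try:
        datetime.date.fromisoformat(text)
        return True
    except ValueError:
        return False

# Simplified checks; the real lexical rules of XSD types are richer.
CHECKS = {
    "xs:string": lambda text: True,
    "xs:integer": lambda text: re.fullmatch(r"[+-]?[0-9]+", text) is not None,
    "xs:date": is_date,
}

# Map each declared element name to its declared type.
declared = {el.get("name"): el.get("type") for el in schema.findall(XS + "element")}

def value_ok(name, text):
    """Check a text value against the declared type of the named element."""
    return CHECKS[declared[name]](text)

print(value_ok("age", "5"))                # True
print(value_ok("age", "five"))             # False
print(value_ok("dateborn", "2005-02-11"))  # True
```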
XML XML Schema The XML Schema is an XML document with a root element <schema> There are two types of elements: simple and complex Simple elements and attributes are of simple type Simple elements can have default values or fixed values through the appropriate attributes <xs:element name="color" type="xs:string" default="red"/> <xs:element name="color" type="xs:string" fixed="red"/>
XML XML Schema The XML Schema is an XML document with a root element <schema> There are two types of elements: simple and complex Simple elements and attributes are of simple type Attributes resemble simple elements <xs:attribute name="name" type="type"/> where name and type are the same as for simple elements
XML XML Schema The XML Schema is an XML document with a root element <schema> There are two types of elements: simple and complex Simple elements and attributes are of simple type It is possible to use default and fixed attributes <xs:attribute name="lang" type="xs:string" default="EN"/> <xs:attribute name="lang" type="xs:string" fixed="EN"/>
XML XML Schema The XML Schema is an XML document with a root element <schema> There are two types of elements: simple and complex Simple elements and attributes are of simple type It is possible to use default and fixed attributes <xs:attribute name="lang" type="xs:string" default="EN"/> <xs:attribute name="lang" type="xs:string" fixed="EN"/> By default attributes are optional; to make them required it is possible to use the use="required" attribute in the declaration
XML XML Schema The XML Schema is an XML document with a root element <schema> There are two types of elements: simple and complex Simple elements and attributes are of simple type Data types may be restricted
XML XML Schema The XML Schema is an XML document with a root element <schema> There are two types of elements: simple and complex Simple elements and attributes are of simple type Data types may be restricted The general form of a restriction on a textual value is <xs:element name="name"> (or <xs:attribute name="name">) <xs:simpleType> <xs:restriction base="type"> . . . </xs:restriction> </xs:simpleType> </xs:element>
XML XML Schema The XML Schema is an XML document with a root element <schema> There are two types of elements: simple and complex Simple elements and attributes are of simple type Data types may be restricted The restriction is imposed by using special elements, e.g. totalDigits, pattern, length, enumeration For example…
XML XML Schema Restriction example <xs:element name="car"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="Audi"/> <xs:enumeration value="Toyota"/> <xs:enumeration value="BMW"/> </xs:restriction> </xs:simpleType> </xs:element>
XML XML Schema Restriction example <xs:element name="age"> <xs:simpleType> <xs:restriction base="xs:integer"> <xs:minInclusive value="0"/> <xs:maxInclusive value="120"/> </xs:restriction> </xs:simpleType> </xs:element>
XML XML Schema Restriction example <xs:element name="password"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:pattern value="[a-zA-Z0-9]{8}"/> </xs:restriction> </xs:simpleType> </xs:element>
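A pattern restriction must match the whole value, not just part of it. As a small sketch, the password pattern can be read out of the declaration and applied with Python's re.fullmatch. This is only an approximation: XSD regular expressions and Python regular expressions are different dialects that happen to agree for this simple pattern, and the helper name password_ok is made up.

```python
import re
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# The password declaration, with the namespace declared so it parses on its own.
element = ET.fromstring("""
<xs:element name="password" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="[a-zA-Z0-9]{8}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>""")

# Pull the pattern facet out of the restriction.
facet = element.find(f"{XS}simpleType/{XS}restriction/{XS}pattern")
pattern = facet.get("value")

def password_ok(value):
    """True if the whole value matches the pattern facet."""
    return re.fullmatch(pattern, value) is not None

print(password_ok("abc12345"))   # True: exactly eight letters/digits
print(password_ok("abc123"))     # False: too short
print(password_ok("abc1234!"))   # False: '!' is not allowed
```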
XML XML Schema The XML Schema is an XML document with a root element <schema> There are two types of elements: simple and complex Simple elements and attributes are of simple type Data types may be restricted Complex elements can contain text, other elements, have attributes, and be empty - there are four kinds
XML XML Schema The XML Schema is an XML document with a root element <schema> There are two types of elements: simple and complex Simple elements and attributes are of simple type Data types may be restricted Complex elements can contain text, other elements, have attributes, and be empty - there are four kinds empty elements containing only other elements containing only text containing both text and other elements
XML XML Schema Complex elements can be defined directly, which resembles a DTD declaration <xs:element name="name"> <xs:complexType> <order-indicator> <xs:element name="name" type="type" occurs-indicator /> <xs:element name="name" type="type" occurs-indicator /> . . . </order-indicator> </xs:complexType> </xs:element>
XML XML Schema Complex elements can be defined directly, which resembles a DTD declaration <xs:element name="book"> <xs:complexType> <xs:sequence> <xs:element name="title" type="xs:string"/> <xs:element name="author" type="xs:string" maxOccurs="5"/> </xs:sequence> </xs:complexType> </xs:element>
XML XML Schema Complex elements can be defined directly, which resembles a DTD declaration There are several types of indicators
XML XML Schema Complex elements can be defined directly, which resembles a DTD declaration There are several types of indicators Order indicators define the order of the elements <xs:all> specifies that the child elements can appear in any order, and that each child element can occur at most once (exactly once by default)
XML XML Schema Complex elements can be defined directly, which resembles a DTD declaration There are several types of indicators Order indicators define the order of the elements <xs:element name="person"> <xs:complexType> <xs:all> <xs:element name="firstname" type="xs:string"/> <xs:element name="lastname" type="xs:string"/> </xs:all> </xs:complexType> </xs:element>
XML XML Schema Complex elements can be defined directly, which resembles a DTD declaration There are several types of indicators Order indicators define the order of the elements <xs:all> <xs:choice> specifies that either one child element or another can occur
XML XML Schema Complex elements can be defined directly, which resembles a DTD declaration There are several types of indicators Order indicators define the order of the elements <xs:element name="employee"> <xs:complexType> <xs:choice> <xs:element name="teamleader" type="teamleader"/> <xs:element name="developer" type="developer"/> </xs:choice> </xs:complexType> </xs:element>
XML XML Schema Complex elements can be defined directly, which resembles a DTD declaration There are several types of indicators Order indicators define the order of the elements <xs:all> <xs:choice> <xs:sequence> specifies that the child elements must appear in a specific order
XML XML Schema Complex elements can be defined directly, which resembles a DTD declaration There are several types of indicators Order indicators define the order of the elements <xs:element name="person"> <xs:complexType> <xs:sequence> <xs:element name="firstname" type="xs:string"/> <xs:element name="lastname" type="xs:string"/> </xs:sequence> </xs:complexType> </xs:element>
XML XML Schema Complex elements can be defined directly, which resembles a DTD declaration There are several types of indicators Order indicators define the order of the elements: <xs:all>, <xs:choice>, <xs:sequence> Occurrence indicators define how often an element can occur by using the maxOccurs and minOccurs attributes in the elements; both default to 1 (to specify that an element can occur an unbounded number of times, use maxOccurs="unbounded")
XML XML Schema Complex elements can be defined directly, which resembles a DTD declaration <xs:element name="book"> <xs:complexType> <xs:sequence> <xs:element name="title" type="xs:string"/> <xs:element name="author" type="xs:string" minOccurs="1"/> <xs:element name="chapter" type="chapter" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> </xs:element>
XML XML Schema Complex elements can be defined directly, which resembles a DTD declaration There are several types of indicators Order indicators define the order of the elements: <xs:all>, <xs:choice>, <xs:sequence> Occurrence indicators define how often an element can occur by using the maxOccurs and minOccurs attributes in the elements; both default to 1 (to specify that an element can occur an unbounded number of times, use maxOccurs="unbounded") Group indicators: elements and attributes can be grouped and referenced in later declarations
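The interplay of xs:sequence with minOccurs and maxOccurs can be sketched as a small checker. This is an illustrative simplification, not a conforming validator: it handles only a flat xs:sequence of element declarations, ignores the declared types (including the undeclared chapter type), and relies on neighbouring declarations having different names. The helper names occurs and sequence_ok are made up.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

book_decl = ET.fromstring("""
<xs:element name="book" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="author" type="xs:string" minOccurs="1"/>
      <xs:element name="chapter" type="chapter" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>""")
sequence = book_decl.find(f"{XS}complexType/{XS}sequence")

def occurs(decl):
    """Return (min, max) for a declaration; max is None for 'unbounded'."""
    low = int(decl.get("minOccurs", "1"))       # minOccurs defaults to 1
    high_raw = decl.get("maxOccurs", "1")       # maxOccurs defaults to 1
    high = None if high_raw == "unbounded" else int(high_raw)
    return low, high

def sequence_ok(instance, sequence):
    """Check child order and counts against a flat xs:sequence."""
    tags = [child.tag for child in instance]
    pos = 0
    for decl in sequence.findall(XS + "element"):
        low, high = occurs(decl)
        count = 0
        # Consume as many matching children as the declaration allows.
        while (pos < len(tags) and tags[pos] == decl.get("name")
               and (high is None or count < high)):
            pos += 1
            count += 1
        if count < low:
            return False
    return pos == len(tags)  # no unexpected children left over

ok = ET.fromstring("<book><title/><author/><chapter/><chapter/></book>")
no_author = ET.fromstring("<book><title/><chapter/></book>")
two_authors = ET.fromstring("<book><title/><author/><author/><chapter/></book>")

print(sequence_ok(ok, sequence))           # True
print(sequence_ok(no_author, sequence))    # False: author is required
print(sequence_ok(two_authors, sequence))  # False: maxOccurs defaults to 1
```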
XML XML Schema Complex elements can be defined directly, which resembles a DTD declaration It is also possible to define a named type and to reference it in later declarations
XML XML Schema Complex elements can be defined directly, which resembles a DTD declaration It is also possible to define a named type and to reference it in later declarations The general form would be <xs:element name="name" type="typeName"/> <xs:complexType name="typeName"> . . . </xs:complexType>
XML XML Schema Complex elements can be defined directly, which resembles a DTD declaration It is also possible to define a named type and to reference it in later declarations <xs:element name="employee" type="personinfo"/> <xs:complexType name="personinfo"> <xs:sequence> <xs:element name="firstname" type="xs:string"/> <xs:element name="lastname" type="xs:string"/> </xs:sequence> </xs:complexType>
XML XML Schema Complex elements can be defined directly, which resembles a DTD declaration It is also possible to define a named type and to reference it in later declarations As a result several elements can refer to the same complex type <xs:element name="developer" type="personinfo"/> <xs:element name="qa" type="personinfo"/> <xs:element name="manager" type="personinfo"/>
XML XML Schema Complex elements can be defined directly, which resembles a DTD declaration It is also possible to define a named type and to reference it in later declarations As a result several elements can refer to the same complex type Complex types can also be extended
<xs:element name="employee" type="fullpersoninfo"/> <xs:complexType name="personinfo"> <xs:sequence> <xs:element name="firstname" type="xs:string"/> <xs:element name="lastname" type="xs:string"/> </xs:sequence> </xs:complexType> <xs:complexType name="fullpersoninfo"> <xs:complexContent> <xs:extension base="personinfo"> <xs:sequence> <xs:element name="address" type="xs:string"/> <xs:element name="city" type="xs:string"/> <xs:element name="country" type="xs:string"/> </xs:sequence> </xs:extension> </xs:complexContent> </xs:complexType>
XML XML Schema Complex elements can be defined directly, which resembles a DTD declaration It is also possible to define a named type and to reference it in later declarations As a result several elements can refer to the same complex type Complex types can also be extended So far we have shown only one kind of complex element: those that contain only other elements
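Extension means that the derived type starts with the children of its base type and appends its own. The sketch below walks the personinfo / fullpersoninfo declarations with Python's standard library to compute the resulting child list. It is an illustration only: it assumes every type uses a single xs:sequence and that base types are declared in the same schema, and the helper name child_names is made up.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

schema = ET.fromstring("""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="employee" type="fullpersoninfo"/>
  <xs:complexType name="personinfo">
    <xs:sequence>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="fullpersoninfo">
    <xs:complexContent>
      <xs:extension base="personinfo">
        <xs:sequence>
          <xs:element name="address" type="xs:string"/>
          <xs:element name="city" type="xs:string"/>
          <xs:element name="country" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>""")

def child_names(type_name, schema):
    """List child element names of a named complex type, base type first."""
    ctype = schema.find(f"{XS}complexType[@name='{type_name}']")
    extension = ctype.find(f"{XS}complexContent/{XS}extension")
    if extension is not None:
        inherited = child_names(extension.get("base"), schema)
        own = [e.get("name") for e in extension.findall(f"{XS}sequence/{XS}element")]
        return inherited + own
    return [e.get("name") for e in ctype.findall(f"{XS}sequence/{XS}element")]

print(child_names("personinfo", schema))
print(child_names("fullpersoninfo", schema))
```

An employee element therefore needs firstname and lastname followed by address, city and country.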
XML XML Schema Empty complex elements are defined the same way but without providing any child elements
XML XML Schema Empty complex elements are defined the same way but without providing any child elements <xs:element name="book" type="booktype"/> <xs:complexType name="booktype"> <xs:attribute name="isbn" type="xs:string"/> </xs:complexType>
XML XML Schema Empty complex elements are defined the same way but without providing any child elements An element that contains only text is defined by restricting or extending a base simple type
XML XML Schema Empty complex elements are defined the same way but without providing any child elements An element that contains only text is defined by restricting or extending a base simple type <xs:element name="cdtime" type="timelength"/> <xs:complexType name="timelength"> <xs:simpleContent> <xs:extension base="xs:integer"> <xs:attribute name="units" type="xs:string" /> </xs:extension> </xs:simpleContent> </xs:complexType>
XML XML Schema Empty complex elements are defined the same way but without providing any child elements An element that contains only text is defined by restricting or extending a base simple type An element that contains both text and other elements needs to be declared as such by setting the mixed attribute in the <xs:complexType> element <xs:complexType name="message" mixed="true">
XML XML Schema A reference to an XML Schema is specified in the root element of the XML document <?xml version="1.0"?> <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="url.xsd"> . . . </root>
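The schema reference is an ordinary namespaced attribute, so a program can read it like any other. A minimal sketch with Python's standard library follows. Note that attribute values must be quoted for the document to be well-formed, and that reading the attribute does not validate anything by itself.

```python
import xml.etree.ElementTree as ET

# Namespace of the xsi prefix, written in ElementTree's {uri}name form.
XSI = "{http://www.w3.org/2001/XMLSchema-instance}"

doc = """<?xml version="1.0"?>
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="url.xsd">
</root>"""

root = ET.fromstring(doc)
schema_location = root.get(XSI + "noNamespaceSchemaLocation")
print(schema_location)  # url.xsd
```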
XML Schemas and validation summary What is an XML schema? An XML schema addresses the following aspects: the set of elements that may be used in a document what attributes may be applied to every element the order of elements/attributes the allowable parent/child relationships elements/attributes data types etc.
XML Schemas and validation summary What is an XML schema? Defines the grammar rules of a document DTD The oldest schema language for XML Quite simple to write and read Only the string type is available for data, that is, you cannot define a numeric data type No complex types Very widely used
XML Schemas and validation summary What is an XML schema? Defines the grammar rules of a document XML Schema definition (XSD) Much more powerful than DTD XSD uses an XML-based format, which makes it easier to read using XML tools Complex and rich data typing (almost like a programming language) Detailed constraints on the logical structure of an XML document
XML Schemas and validation summary What is an XML schema? Defines the grammar rules of a document Why use any kind of schema? Your XML documents carry their own definitions May serve as a specification for the data format in a shared project You can validate the received data to avoid parsing errors You can validate your own data to avoid errors
XML XML Schema vs. DTD XML Schema resembles DTD in what it provides However, it holds some advantages over DTD: it is written in XML; it supports more data types and data restrictions; it is possible to reference multiple XML Schemas; it supports namespaces; existing XML Schemas can be reused and extended. DTD is still used more often, owing to its simplicity and clarity