eXtensible Markup Language (XML)

Extensible Markup Language (XML)
XML is another format (similar to JSON) that is meant to convey semi-structured data.
- HTML (used in webpages) is a close relative rather than a subset: both descend from SGML, and XHTML is the variant of HTML written as well-formed XML.
- Heavily used in web services and inter-database communication.
- Often used for text document formats:
  - Microsoft Office
  - OpenOffice.org / LibreOffice
  - Apple's iWork
- Also often used in configuration files.

XML Tags and Elements
Tags come in 3 types:
- start-tags (e.g. <section>)
- end-tags (e.g. </section>)
- empty-element tags (e.g. <section />)
An element consists of a start-tag, optional content (which may be text or other elements), and a matching end-tag. Alternatively, an element is just an empty-element tag.
Example elements:
  <head></head>
  <h> Hi <p /></h>
  <h />

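To see how a parser reads the example elements above, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The `<doc>` wrapper and the `describe` helper are assumptions added for illustration; they are not part of the slide.

```python
import xml.etree.ElementTree as ET

# The three example elements, wrapped in a hypothetical <doc> root so that
# the snippet forms a single document.
source = "<doc><head></head><h> Hi <p /></h><h /></doc>"
root = ET.fromstring(source)


def describe(element):
    """Return (tag, stripped text content, list of child tag names)."""
    text = (element.text or "").strip()
    return element.tag, text, [child.tag for child in element]


summary = [describe(child) for child in root]
for item in summary:
    print(item)
```

Note that `<head></head>` and `<h />` come out the same way: both are elements with no content.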
XML Attributes
Attributes are name/value pairs within a start-tag or empty-element tag.
Each attribute name can appear only once per tag and holds a single value, so multiple values are combined (often as a space-delimited string).
Example attributes:
  <person name="josh"></person>
  <work loc="BPS Engineering" />

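A short sketch of reading attributes with Python's standard `xml.etree.ElementTree`. The `<item class="...">` element is a hypothetical example of several values packed into one space-delimited attribute.

```python
import xml.etree.ElementTree as ET

person = ET.fromstring('<person name="josh"></person>')
work = ET.fromstring('<work loc="BPS Engineering" />')

# Attributes are exposed as a dictionary of strings on each element.
name = person.get("name")
loc = work.attrib["loc"]

# Hypothetical element: one attribute carrying several space-delimited values.
item = ET.fromstring('<item class="big red round" />')
classes = item.get("class").split()

print(name, loc, classes)
```

Splitting the combined value is left to the application; the parser only ever sees one string per attribute.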
Well-Formed XML
- The document contains only properly encoded legal Unicode characters.
- None of the special syntax characters such as < and & appear except when performing their markup-delineation roles.
- The begin, end, and empty-element tags that delimit the elements are correctly nested, with none missing and none overlapping.
- The element tags are case-sensitive; the beginning and end tags must match exactly.
- Tag names cannot contain any of the characters !"#$%&'()*+,/;<=>?@[\]^`{|}~, nor a space character, and cannot start with -, ., or a numeric digit.
- A single "root" element contains all the other elements.

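The rules above can be tried out with any XML parser, since a conforming parser rejects documents that are not well-formed. A minimal sketch using Python's standard `xml.etree.ElementTree`; the `is_well_formed` helper and the sample strings are assumptions made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    "correct nesting": "<a><b>ok</b></a>",
    "overlapping tags": "<a><b></a></b>",
    "case mismatch": "<Section></section>",
    "unescaped ampersand": "<a>fish & chips</a>",
    "escaped ampersand": "<a>fish &amp; chips</a>",
    "two root elements": "<a/><b/>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(label, "->", ok)
```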
XML Internal Representation
What data structure would you use to represent an XML document?
- Graph
- Tree
- Hash table
- List

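One way to explore the question is to look at what a parser hands back. As a sketch, Python's standard `xml.etree.ElementTree` returns nested element objects that can be walked recursively; the `<library>` document and the `walk` helper are hypothetical and serve only as an illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document used only for this illustration.
doc = ET.fromstring("<library><shelf><book/><book/></shelf><desk/></library>")


def walk(element, depth=0):
    """Yield (depth, tag) pairs in document order, depth-first."""
    yield depth, element.tag
    for child in element:
        yield from walk(child, depth + 1)


outline = list(walk(doc))
for depth, tag in outline:
    print("  " * depth + tag)
```

Each element has exactly one parent and an ordered list of children, which is worth keeping in mind when comparing the options listed above.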
DTD - Document Type Definition
A DTD defines what is "valid" XML data for a particular purpose. It defines which tags are allowed and the grammar for how they can be used.
Details can be found here: http://www.w3schools.com/xml/xml_dtd_intro.asp
Special DTD components:
- (#PCDATA) - Parsed character data: text content in which the special characters (< and &) are escaped.
- CDATA - Unparsed character data.
- EMPTY - Empty-element tag, no content allowed.

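To make "escaped special characters" concrete, here is a small sketch using `escape` from Python's standard `xml.sax.saxutils` together with `xml.etree.ElementTree`. The sample string and the `<title>` element are assumptions for illustration. The last lines use a CDATA section, which is the in-document way of marking text the parser should not scan for markup.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "Tom & Jerry <cartoon>"

# Parsed character data: & and < must be written as entity references.
escaped = escape(raw)
document = "<title>" + escaped + "</title>"
parsed_text = ET.fromstring(document).text

# CDATA section: the same text, written without escaping.
cdata_document = "<title><![CDATA[" + raw + "]]></title>"
cdata_text = ET.fromstring(cdata_document).text

print(escaped)
print(parsed_text == raw, cdata_text == raw)
```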
Valid XML
  <bookstore>
    <name>Josh's Store</name>
    <topic>
      <name>XML</name>
      <book isbn="123-456-789">
        <title>Josh's Guide To DTD's and XML Schemas</title>
        <author>Josh</author>
      </book>
    </topic>
  </bookstore>
Document Type Definition
  <!ELEMENT bookstore (name,topic+)>
  <!ELEMENT topic (name,book*)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT book (title,author)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ATTLIST book isbn CDATA #REQUIRED>

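Python's standard library does not validate against a DTD, so the sketch below is not a real DTD validator. It hand-translates the content models of the bookstore DTD above into regular expressions over the sequence of child tag names, which is roughly what the `+` and `*` operators express. The names `CONTENT_MODELS`, `REQUIRED_ATTRIBUTES`, and `is_valid` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Each content model is a regex over the child tags written as "tag," pairs.
# An empty pattern stands in for (#PCDATA): no child elements allowed.
CONTENT_MODELS = {
    "bookstore": r"name,(topic,)+",   # (name,topic+)
    "topic": r"name,(book,)*",        # (name,book*)
    "book": r"title,author,",         # (title,author)
    "name": r"",
    "title": r"",
    "author": r"",
}
REQUIRED_ATTRIBUTES = {"book": ["isbn"]}  # <!ATTLIST book isbn CDATA #REQUIRED>


def is_valid(element):
    """Check one element and its descendants against the models above."""
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        return False  # tag not declared
    children = "".join(child.tag + "," for child in element)
    if re.fullmatch(model, children) is None:
        return False  # wrong children or wrong order
    for attribute in REQUIRED_ATTRIBUTES.get(element.tag, []):
        if attribute not in element.attrib:
            return False  # required attribute missing
    return all(is_valid(child) for child in element)


VALID_XML = """<bookstore><name>Josh's Store</name><topic><name>XML</name>
<book isbn="123-456-789"><title>Josh's Guide To DTD's and XML Schemas</title>
<author>Josh</author></book></topic></bookstore>"""

print(is_valid(ET.fromstring(VALID_XML)))
print(is_valid(ET.fromstring("<bookstore><name>No topics</name></bookstore>")))
```

The second document is well-formed but fails the check because `topic+` demands at least one `topic`.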
XML Schemas
Schemas are a much more powerful way to describe an XML document's structure and limitations.
- They are written in XML and can even have DTDs and XML schemas of their own.
- They provide a basic set of types that are more descriptive than PCDATA / CDATA.
- Details can be found here: http://www.w3schools.com/xml/xml_schema.asp
- Basic types (integer, byte, string, floating-point number) can be combined into complex types that are used to define an element.
Let's build an XML schema to match our previous bookstore example.

XML Schema
  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:annotation>
      <xsd:documentation xml:lang="en">
        XML Schema for a Bookstore as an example.
      </xsd:documentation>
    </xsd:annotation>

XML Schema
  ...
  <xsd:element name="bookstore" type="bookstoreType"/>
  <xsd:complexType name="bookstoreType">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <!-- maxOccurs defaults to 1; "unbounded" is needed to match topic+ in the DTD -->
      <xsd:element name="topic" type="topicType" minOccurs="1" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

XML Schema
  ...
  <xsd:complexType name="topicType">
    <!-- child elements must sit inside a model group such as xsd:sequence -->
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <!-- minOccurs="0" with maxOccurs="unbounded" matches book* in the DTD -->
      <xsd:element name="book" type="bookType" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

XML Schema
  ...
  <xsd:complexType name="bookType">
    <xsd:sequence>
      <xsd:element name="title" type="xsd:string"/>
      <xsd:element name="author" type="xsd:string"/>
    </xsd:sequence>
    <!-- use="required" matches #REQUIRED in the DTD -->
    <xsd:attribute name="isbn" type="isbnType" use="required"/>
  </xsd:complexType>
  <xsd:simpleType name="isbnType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="[0-9]{3}[-][0-9]{3}[-][0-9]{3}"/>
    </xsd:restriction>
  </xsd:simpleType>
  </xsd:schema>

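The `isbnType` restriction is something a DTD cannot express. As a rough sketch of what the pattern facet accepts, the same expression can be tried with Python's standard `re` module. An `xsd:pattern` must match the whole value, so `fullmatch` is used here; the helper name `matches_isbn_type` is hypothetical, and this checks only the pattern, not the rest of the schema.

```python
import re

# Same expression as the xsd:pattern value in the schema fragment.
ISBN_PATTERN = re.compile(r"[0-9]{3}[-][0-9]{3}[-][0-9]{3}")


def matches_isbn_type(value):
    """Return True if the whole value matches the isbnType pattern."""
    return ISBN_PATTERN.fullmatch(value) is not None


for candidate in ["123-456-789", "123-456-78", "x123-456-789", "123456789"]:
    print(candidate, "->", matches_isbn_type(candidate))
```

Only the first candidate, the ISBN from the bookstore example, passes.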
Schema versus DTD
When should you use an XML Schema instead of an XML DTD?
- Data Typing
- Validation Speed
- Occurrence Constraints
- Always