Introduction to XML Extensible Markup Language
What is XML XML stands for eXtensible Markup Language. A markup language is used to provide information about a document. Tags are added to the document to provide the extra information. HTML tags tell a browser how to display the document. XML tags give a reader some idea what some of the data means.
Advantages of XML XML documents are used to transfer data from one place to another often over the Internet. XML is text (Unicode) based. Takes up less space. Can be transmitted efficiently. XML documents can be modularized. Parts can be reused.
Example of an HTML Document <head><title>Example</title></head. <body> <h1>This is an example of a page.</h1> <h2>Some information goes here.</h2> </body> </html>
Example of an XML Document <?xml version=“1.0”/> <address> <name>Alice Lee</name> <email>alee@aol.com</email> <phone>212-346-1234</phone> <birthday>1985-03-22</birthday> </address>
Difference Between HTML and XML HTML tags have a fixed meaning and browsers know what it is. XML tags are different for different applications, and users know what they mean. HTML tags are used for display. XML tags are used to describe documents and data.
XML Rules Tags are enclosed in angle brackets. Tags come in pairs with start-tags and end-tags. Tags must be properly nested. <name><email>…</name></email> is not allowed. <name><email>…</email><name> is. Tags that do not have end-tags must be terminated by a ‘/’. <br /> is an html example.
More XML Rules Tags are case sensitive. <address> is not the same as <Address> XML in any combination of cases is not allowed as part of a tag. Tags may not contain ‘<‘ or ‘&’. Tags follow Java naming conventions, except that a single colon and other characters are allowed. They must begin with a letter and may not contain white space. Documents must have a single root tag that begins the document.
Well-Formed Documents An XML document is said to be well-formed if it follows all the rules. An XML parser is used to check that all the rules have been obeyed. Recent browsers such as Internet Explorer 5 and Netscape 7 come with XML parsers. Parsers are also available for free download over the Internet. Java 1.4 also supports an open-source parser.
XML Example Revisited <?xml version=“1.0”/> <address> <name>Alice Lee</name> <email>alee@aol.com</email> <phone>212-346-1234</phone> <birthday>1985-03-22</birthday> </address> Markup for the data aids understanding of its purpose. A flat text file is not nearly so clear. Alice Lee alee@aol.com 212-346-1234 1985-03-22 The last line looks like a date, but what is it for?
Expanded Example <?xml version = “1.0” ?> <address> <name> <first>Alice</first> <last>Lee</last> </name> <email>alee@aol.com</email> <phone>123-45-6789</phone> <birthday> <year>1983</year> <month>07</month> <day>15</day> </birthday> </address>
Document Type Definitions A DTD describes the tree structure of a document and something about its data. There are two data types, PCDATA and CDATA. PCDATA is parsed character data. CDATA is character data, not usually parsed. A DTD determines how many times a node may appear, and how child nodes are ordered.
1.Introduction to DTD An XML document may have an optional DTD, which defines the document’s grammar. Since the DTD defines the XML document’s grammar, we can use an XML parser to check that if an XML document conforms to the grammar defined by the DTD.
The purpose of a DTD is to define the legal building blocks of an XML document. Terminology for XML: -- well-formed: if tags are correctly closed. -- valid: if it has a DTD and conforms to it. Validation is useful in data exchange.
A DTD can be declared inside the XML document, or as an external reference. 1) Internal DTD This is a example of a simple XML document with an internal DTD:
<!DOCTYPE company [ < ! ELEMENT company ( (person | product)*) > < ! ELEMENT person ( ssn, name, office, phone?) > < ! ELEMENT ssn ( # PCDATA) > < ! ELEMENT name ( # PCDATA) > < ! ELEMENT office ( # PCDATA) > < ! ELEMENT phone ( # PCDATA) > < ! ELEMENT product ( pid, name, description? ) > < ! ELEMENT pid ( # PCDATA) > < ! ELEMENT description ( # PCDATA) > ]> <company> <person> <ssn> 12345678 < /ssn> <name> John </name> <office> B432 </office> <phone> 1234 </phone> </person> <product> … </product> … </company>
2) External DTD This is the same XML document with an external DTD: <!DOCTYPE company SYSTEM “company.dtd”> <company> <person> <ssn> 12345678 < /ssn> <name> John </name> <office> B432 </office> <phone> 1234 </phone> </person> <product> … </product> … </company> The DTD can reside at a different URL, and then we refer to it as: <!DOCTYPE company SYSTEM “http://www.mydtd.com/company.dtd”>
This is the file “company.dtd” containing the DTD: < ! ELEMENT company ( (person | product)*) > < ! ELEMENT person ( ssn, name, office, phone?) < ! ELEMENT ssn ( # PCDATA) > < ! ELEMENT name ( # PCDATA) > < ! ELEMENT office ( # PCDATA) > < ! ELEMENT phone ( # PCDATA) > < ! ELEMENT product ( pid, name, description? ) > < ! ELEMENT pid ( # PCDATA) > < ! ELEMENT description ( # PCDATA) >
DTD for address Example-1 <!ELEMENT address (name, email, phone, birthday)> <!ELEMENT name (first, last)> <!ELEMENT first (#PCDATA)> <!ELEMENT last (#PCDATA)> <!ELEMENT email (#PCDATA)> <!ELEMENT phone (#PCDATA)> <!ELEMENT birthday (year, month, day)> <!ELEMENT year (#PCDATA)> <!ELEMENT month (#PCDATA)> <!ELEMENT day (#PCDATA)>
DTD Example-2 <BOOKLIST> <BOOK GENRE = “Science” FORMAT = “Hardcover”> <AUTHOR> <FIRSTNAME> RICHRD </FIRSTNAME> <LASTNAME> KARTER </LASTNAME> </AUTHOR> </BOOK> </BOOKS> <!DOCTYPE BOOKLIST[ <!ELEMENT BOOKLIST(BOOK)*> <!ELEMENT BOOK(AUTHOR)> <!ELEMENT AUTHOR(FIRSTNAME,LASTNAME)> <!ELEMENT FIRSTNAME(#PCDATA)> <!ELEMENT>LASTNAME(#PCDATA)> <!ATTLIST BOOK GENRE (Science|Fiction)#REQUIRED> <!ATTLIST BOOK FORMAT (Paperback|Hardcover) “PaperBack”>]> Xml Document And Corresponding DTD
2. DTD - XML building blocks The building blocks of XML documents: 1) Elements: main building blocks. Example: “company”, “ person ” … 2) Tags: are used to markup elements. 3) Attributes: Attributes provide extra information about elements. Attributes are placed inside the starting tag of an element. Example: <img src="computer.gif" />
PCDATA is text that will be parsed by a parser. 4)PCDATA: parsed character data. Think of character data as the text found between the starting tag and the ending tag of an XML element. <name> John</name> PCDATA is text that will be parsed by a parser. 5)CDATA: character data. CDATA is text that will NOT be parsed by a parser. Think of the attribute value. 6) Entities: Entities as variables used to define common text
3.DTD - Elements <!ELEMENT element-name (element-content)> In the DTD, XML elements are declared with an element declaration. An element declaration has the following syntax: <!ELEMENT element-name (element-content)> Element-content model: 1) Empty Elements: keyword “EMPTY”. <!ELEMENT element-name (EMPTY)> example: <!ELEMENT img (EMPTY)>
2) Text-only: Elements with text are declared: example: <!ELEMENT element-name (#PCDATA)> example: < ! ELEMENT name ( # PCDATA) > 3) Any: keyword “ANY” declares an element with any content: <!ELEMENT element-name (ANY)>
4) Complex: a regular expression over other elements. Example: < ! ELEMENT company ( (person | product)*) > Note: The elements in the regular expression must be defined in the DTD.
4) Mixed content: Example: <!ELEMENT note (to+,from,header,message*,#PCDATA)> This example declares that the element note must contain at least one to child element, exactly one from child element, exactly one header, zero or more message, and some other parsed character data as well.
4.DTD - Attributes In DTD, XML element attributes are declared with an ATTLIST declaration. Attribute declaration has the following syntax: <!ATTLIST element-name attribute-name attribute-type default-value> Example: <! ELEMENT person (ssn, name, office, phone?) > <! ATTLIS person age CDATA #REQUIRED) >
Attribute declaration examples Example 1: DTD example: <!ELEMENT square EMPTY> <!ATTLIST square width CDATA "0"> XML example: <square width="100"></square> or <square width="100”/>
5.DTD - Entities Entities Entities as variables used to define shortcuts to common text. Entity references are references to entities. Entities can be declared internal. Entities can be declared external.
Internal Entity Declaration Syntax: <!ENTITY entity-name "entity-value"> DTD Example: <!ENTITY writer "Jan Egil Refsnes."> <!ENTITY copyright "Copyright XML101."> XML example: <author>&writer;©right;</author>
XML Schema An XML Schema: defines elements that can appear in a document defines attributes that can appear within elements defines which elements are child elements defines the sequence in which the child elements can appear defines the number of child elements defines whether an element is empty or can include text defines default values for attributes The purpose of a Schema is to define the legal building blocks of an XML document, just like a DTD.
Schema for First address Example <?xml version="1.0" encoding="ISO-8859-1" ?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="address"> <xs:complexType> <xs:sequence> <xs:element name="name" type="xs:string"/> <xs:element name="email" type="xs:string"/> <xs:element name="phone" type="xs:string"/> <xs:element name="birthday" type="xs:date"/> </xs:sequence> </xs:complexType> </xs:element> </xs:schema>
Explanation of Example Schema <?xml version="1.0" encoding="ISO-8859-1" ?> ISO-8859-1, Latin-1, is the same as UTF-8 in the first 128 characters. <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> www.w3.org/2001/XMLSchema contains the schema standards. <xs:element name="address"> <xs:complexType> This states that address is a complex type element. <xs:sequence> This states that the following elements form a sequence and must come in the order shown. <xs:element name="name" type="xs:string"/> This says that the element, name, must be a string. <xs:element name="birthday" type="xs:date"/> This states that the element, birthday, is a date. Dates are always of the form yyyy-mm-dd.
DTD: <!ELEMENT paper (title,author*,year, (journal|conference))> XML Schemas <xsd:element name=“paper” type=“papertype”/> <xsd:complexType name=“papertype”> <xsd:sequence> <xsd:element name=“title” type=“xsd:string”/> <xsd:element name=“author” minOccurs=“0”/> <xsd:element name=“year”/> <xsd: choice> < xsd:element name=“journal”/> <xsd:element name=“conference”/> </xsd:choice> </xsd:sequence> </xsd:element> DTD: <!ELEMENT paper (title,author*,year, (journal|conference))>
XML Schema – Better than DTDs XML Schemas are easier to learn than DTD are extensible to future additions are richer and more useful than DTDs are written in XML support data types
Example: Shipping Order <items> <item> <title>Wheel</title> <quantity>1</quantity> <price>10.90</price> </item> <title>Cam</title> <price>9.90</price> </item> </items> </shipOrder> <?xml version="1.0"?> <shipOrder> <shipTo> <name>Svendson</name> <street>Oslo St</street> <address>400 Main</address> <country>Norway</country> </shipTo>
XML Schema for Shipping Order <xsd:schema xmlns:xsd=http://www.w3.org/1999/XMLSchema> <xsd:element name="shipOrder" type="order"/> <xsd:complexType name="order"> <xsd:element name="shipTo" type="shipAddress"/> <xsd:element name="items" type="cdItems"/> </xsd:complexType> <xsd:complexType name="shipAddress"> <xsd:element name="name“ type="xsd:string"/> <xsd:element name="street" type="xsd:string"/> <xsd:element name="address" type="xsd:string"/> <xsd:element name="country" type="xsd:string"/>
XML Schema - Shipping Order (continued) <xsd:complexType name="cdItems"> <xsd:element name="item" minOccurs="0" maxOccurs="unbounded" type="cdItem"/> </xsd:complexType> <xsd:complexType name="cdItem"> <xsd:element name="title" type="xsd:string"/> <xsd:element name="quantity“ type="xsd:positiveInteger"/> <xsd:element name="price" type="xsd:decimal"/> </xsd:schema>