Introduction to XML: Part I. By Sandeep Jangity, CS 157B, Section 2, Dr. Lee
Overview: What is XML? Why is XML popular? How do you write an XML document? How do you write XML DTDs and Schemas?
What is XML? XML stands for eXtensible Markup Language. It is a standard developed by the W3C: a syntax for expressing structured data in a text format. XML is not a language on its own. Instead, XML is used to build markup languages, each with its own set of tags. XML looks much like HTML. However, HTML tags are predefined and mostly describe how content is displayed. XML tags are chosen by the author and convey the meaning of the data they enclose.
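The sketch below makes the "tags convey meaning" point concrete. It uses Python's standard xml.etree.ElementTree module; Python is our choice for illustration and is not part of the slides. The sketch takes one Book from the Bookstore example and asks for its values by tag name.

```python
import xml.etree.ElementTree as ET

# In HTML, <b>Introduction to XML</b> only says "render this in bold".
# In XML, the tag names say what each piece of data *is*.
BOOK = """<Book ID="101">
  <Author>John Doe</Author>
  <Title>Introduction to XML</Title>
  <Publisher>XYZ</Publisher>
</Book>"""

book = ET.fromstring(BOOK)

# Because the tags carry meaning, a program can ask for the data by name.
title = book.findtext("Title")
author = book.findtext("Author")
book_id = book.get("ID")

print(f"Book {book_id}: {title} by {author}")
```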
Structured Data: Structured data is data that is tagged for its content, meaning, or use. Examples include spreadsheets, address books, databases, PDF documents, … It can be stored in binary or text format.
XML Technology Model: The data is modeled in XML. The structure and constraints are modeled using DTDs or Schemas. The document's presentation can be modeled using XSL (eXtensible Stylesheet Language). REMEMBER: XML allows us to separate data from presentation!
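XSL itself is not shown here. As a rough stand-in, the sketch below uses two hypothetical Python functions, as_text and as_html, to play the role of two stylesheets. The XML data stays the same, and only the presentation code changes.

```python
import xml.etree.ElementTree as ET
from html import escape

# The data: no formatting information anywhere in it.
DATA = """<Bookstore>
  <Book ID="101"><Author>John Doe</Author><Title>Introduction to XML</Title></Book>
  <Book ID="102"><Author>Foo Bar</Author><Title>Introduction to XSL</Title></Book>
</Bookstore>"""

def as_text(root):
    """Presentation 1 (hypothetical): plain text, one line per book."""
    return "\n".join(f"{b.findtext('Title')} ({b.findtext('Author')})"
                     for b in root.iter("Book"))

def as_html(root):
    """Presentation 2 (hypothetical): an HTML list built from the same data."""
    items = "".join(f"<li><b>{escape(b.findtext('Title'))}</b> by "
                    f"{escape(b.findtext('Author'))}</li>"
                    for b in root.iter("Book"))
    return f"<ul>{items}</ul>"

root = ET.fromstring(DATA)
print(as_text(root))
print(as_html(root))
```

Adding a third presentation means writing another function (or, in real XML work, another stylesheet). The data file itself never changes.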
Why use XML? Interoperability: XML is independent of operating system, platform, and programming language. It separates content from presentation. It is well supported by most browsers. Simple XML documents are human-readable and can also be parsed easily by machines. XML is easily converted to other formats (XML -> PDF, Microsoft CHM, etc.). It can represent almost any kind of data and has many, many applications: math, science, etc. (continued on next slide)
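As one illustration of "easily converted to other formats", this sketch turns Book elements into JSON with Python's standard library. The mapping is our own assumption: the ID attribute and each child's text become dictionary keys. It also assumes each child tag appears once per Book, so a Book with several Author elements would keep only the last one.

```python
import json
import xml.etree.ElementTree as ET

XML = """<Bookstore>
  <Book ID="101">
    <Author>John Doe</Author>
    <Title>Introduction to XML</Title>
    <ISBN>121232323</ISBN>
  </Book>
</Bookstore>"""

def book_to_dict(book):
    """Map one <Book> element to a plain dict: its ID plus each child's text."""
    record = {"ID": book.get("ID")}
    for child in book:
        record[child.tag] = child.text  # assumes each child tag appears once
    return record

root = ET.fromstring(XML)
books = [book_to_dict(b) for b in root.findall("Book")]
as_json = json.dumps(books)
print(as_json)
```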
MyMathML: the expression 4 + (5 * 3)
<expression>
  <operator> add </operator>
  <expression> <number> 4 </number> </expression>
  <expression>
    <operator> mult </operator>
    <expression> <number> 5 </number> </expression>
    <expression> <number> 3 </number> </expression>
  </expression>
</expression>
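The MyMathML markup mirrors the structure of the expression, so evaluating it is a short recursive walk over the tree. This is a sketch under two assumptions. First, only the two operator names from the example (add, mult) are supported. Second, every operator is assumed to take exactly two operands.

```python
import operator
import xml.etree.ElementTree as ET

EXPR = """
<expression>
  <operator> add </operator>
  <expression> <number> 4 </number> </expression>
  <expression>
    <operator> mult </operator>
    <expression> <number> 5 </number> </expression>
    <expression> <number> 3 </number> </expression>
  </expression>
</expression>
"""

# Operator names as used in the MyMathML example (others are not handled).
OPS = {"add": operator.add, "mult": operator.mul}

def evaluate(expr):
    """Recursively evaluate an <expression> element."""
    number = expr.find("number")          # a leaf: <expression><number>..</number></expression>
    if number is not None:
        return int(number.text.strip())
    op = OPS[expr.findtext("operator").strip()]
    left, right = expr.findall("expression")  # assumes exactly two operands
    return op(evaluate(left), evaluate(right))

print(evaluate(ET.fromstring(EXPR)))  # 19
```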
MyChemML: ChemML (tracking experiments)
<experiment date="03-15-2003">
  <introduction>
    The compound under investigation is common water:
    <molecule>
      <atom symbol="H" number="2"/>
      <atom symbol="O" number="1"/>
    </molecule>
    It boils at 100 degrees and freezes at 0 degrees Celsius!
    For more information about this amazing compound see the March 2003 issue of:
    <reference type="simple" href="http://www.ww.com"> Water World </reference>
  </introduction>
  <!-- etc -->
</experiment>
(Now the technical stuff)
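The <molecule> markup is regular enough for a program to read. This hypothetical formula helper builds a chemical formula from the atom elements in document order. It omits a count of 1, as chemists usually do.

```python
import xml.etree.ElementTree as ET

MOLECULE = """
<molecule>
  <atom symbol="H" number="2"/>
  <atom symbol="O" number="1"/>
</molecule>
"""

def formula(molecule):
    """Build a formula string from <atom symbol=... number=...> children,
    in document order; a count of 1 is left out (H2O, not H2O1)."""
    parts = []
    for atom in molecule.findall("atom"):
        count = int(atom.get("number"))
        parts.append(atom.get("symbol") + (str(count) if count > 1 else ""))
    return "".join(parts)

print(formula(ET.fromstring(MOLECULE)))  # H2O
```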
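XML document syntax: Every document must have exactly one root element. Element and attribute names are case-sensitive. Elements must be properly nested. Attribute values must be in quotes. Every element must be closed, either with an end tag or as an empty-element tag such as <br/>. Spaces are not allowed in element or attribute names.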
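One quick way to see these rules in action is to hand small fragments to a parser. The is_well_formed helper and the test fragments below are our own. The sketch relies on the fact that ElementTree raises ParseError for any input that is not well-formed.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# One fragment per rule; only "ok" follows all of them.
cases = {
    "ok":             '<note><to>Sue</to></note>',
    "two roots":      '<note/><note/>',
    "case mismatch":  '<Note></note>',
    "bad nesting":    '<a><b></a></b>',
    "unquoted value": '<note id=1/>',
    "unclosed tag":   '<note><to>Sue</note>',
    "space in name":  '<first name>Sue</first name>',
}
for label, text in cases.items():
    print(f"{label:15} {is_well_formed(text)}")
```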
XML Example
<?xml version="1.0"?>
<Bookstore>
  <Book ID="101">
    <Author>John Doe</Author>
    <Title>Introduction to XML</Title>
    <Date>12 June 2001</Date>
    <ISBN>121232323</ISBN>
    <Publisher>XYZ</Publisher>
  </Book>
  <Book ID="102">
    <Author>Foo Bar</Author>
    <Title>Introduction to XSL</Title>
    <ISBN>12323573</ISBN>
    <Publisher>ABC</Publisher>
  </Book>
</Bookstore>
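Parsing the Bookstore example reveals a practical detail: Book 102 has no Date element. In ElementTree, findtext returns a default value when a child is missing, so the sketch below lists both books without special-casing. The document is embedded with straight quotes, because curly quotes around attribute values are not legal XML.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """<?xml version="1.0"?>
<Bookstore>
  <Book ID="101">
    <Author>John Doe</Author>
    <Title>Introduction to XML</Title>
    <Date>12 June 2001</Date>
    <ISBN>121232323</ISBN>
    <Publisher>XYZ</Publisher>
  </Book>
  <Book ID="102">
    <Author>Foo Bar</Author>
    <Title>Introduction to XSL</Title>
    <ISBN>12323573</ISBN>
    <Publisher>ABC</Publisher>
  </Book>
</Bookstore>"""

root = ET.fromstring(BOOKSTORE)
for book in root.findall("Book"):
    # findtext falls back to the default when the child element is absent
    print(book.get("ID"), book.findtext("Title"),
          book.findtext("Date", default="(no date)"))
```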
Well-formed vs. Valid: syntax and semantic checking. Well-formed (syntax): the document follows XML's syntax rules. In particular, (1) every start tag has a matching end tag, and (2) elements are properly nested. Valid (semantic): a valid XML document also conforms to the vocabulary constraints defined in a DTD or Schema. An XML document can be "well-formed" without being "valid", but every "valid" document is "well-formed".
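Well-Formed (cont'd): Is this well-formed?
<?xml version="1.0"?>
<memo>
  <from> Bill <to> Sue </from> </to>
  Dinner tonight?
</nemo>
No. The <from> and <to> elements overlap instead of nesting, and the end tag </nemo> does not match the start tag <memo>.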
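A parser stops at the first problem it hits and reports where it happened. This sketch feeds it the memo as written, then a corrected version. In the corrected version, <from> and <to> are siblings and the end tag is </memo>.

```python
import xml.etree.ElementTree as ET

BROKEN = """<?xml version="1.0"?>
<memo>
<from> Bill <to> Sue </from> </to>
Dinner tonight?
</nemo>"""

FIXED = """<?xml version="1.0"?>
<memo>
<from> Bill </from> <to> Sue </to>
Dinner tonight?
</memo>"""

error_position = None
try:
    ET.fromstring(BROKEN)
except ET.ParseError as err:
    error_position = err.position  # (line, column) where the parser gave up
    print(f"not well-formed: {err}")

memo = ET.fromstring(FIXED)
print([child.tag for child in memo])  # ['from', 'to']
```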
Definition and Validation: There are two common ways to define the structure of an XML document: DTDs and Schemas. Each set of rules specifies an XML vocabulary.
What is a DTD? A Document Type Definition (DTD) emphasizes the structure of the XML: which elements and attributes can appear, and how they relate to each other. Drawbacks: DTDs are difficult to work with, have no data types for element content (everything is text), and are not extensible.
Bookstore Example
<Bookstore>
  <Book ID="101">
    <Author>John Doe</Author>
    <Title>Introduction to XML</Title>
    <Date>12 June 2001</Date>
    <ISBN>121232323</ISBN>
    <Publisher>XYZ</Publisher>
  </Book>
  <Book ID="102">
    <Author>Foo Bar</Author>
    <Title>Introduction to XSL</Title>
    <ISBN>12323573</ISBN>
    <Publisher>ABC</Publisher>
  </Book>
</Bookstore>

<!ELEMENT Bookstore (Book)*>
<!ELEMENT Book (Author+, Title, Date?, ISBN, Publisher)>
<!ATTLIST Book ID CDATA #REQUIRED>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Author (#PCDATA)>
<!ELEMENT Date (#PCDATA)>
<!ELEMENT ISBN (#PCDATA)>
<!ELEMENT Publisher (#PCDATA)>
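A DTD content model such as (Title, Author+, Date, ISBN, Publisher) is essentially a regular expression over the names of an element's children. The two hypothetical helpers below make that idea concrete. They handle only flat, comma-separated sequences with ?, * and + suffixes. Choices (|), nested groups, #PCDATA and attributes are not handled, so this illustrates what a validator checks rather than being a DTD validator. It also shows the well-formed vs. valid distinction: a Book whose Author comes before its Title is well-formed, yet it does not match the model (Title, Author+, Date, ISBN, Publisher).

```python
import re
import xml.etree.ElementTree as ET

def content_model_regex(model):
    """Translate a flat DTD sequence such as '(Title, Author+, Date, ISBN, Publisher)'
    into a regex over space-terminated child names. Only ?, * and + are supported."""
    parts = []
    for item in model.strip("() ").split(","):
        item = item.strip()
        name, suffix = (item[:-1], item[-1]) if item[-1] in "?*+" else (item, "")
        parts.append(f"(?:{re.escape(name)} ){suffix}")
    return re.compile("".join(parts))

def matches_model(element, model):
    """True if the element's children, in order, fit the content model."""
    names = "".join(child.tag + " " for child in element)
    return content_model_regex(model).fullmatch(names) is not None

book = ET.fromstring(
    "<Book ID='101'><Author>John Doe</Author><Title>Introduction to XML</Title>"
    "<Date>12 June 2001</Date><ISBN>121232323</ISBN><Publisher>XYZ</Publisher></Book>")

print(matches_model(book, "(Title, Author+, Date, ISBN, Publisher)"))   # False: Author comes first
print(matches_model(book, "(Author+, Title, Date?, ISBN, Publisher)"))  # True
```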
Problems with DTDs: DTDs do not use XML syntax. You write your XML document in one syntax and the DTD in another, which is inconsistent and means more work for parsers. Limited set of primitive datatypes: we want datatypes compatible with those found in databases. One of the main weaknesses of DTDs, however, is their lack of support for element data types beyond character strings (#PCDATA). Limited support for constraints: DTDs support only occurrence indicators such as "+" (1 or more occurrences), "?" (0 or 1 occurrences), and "*" (0 or more occurrences). There is no facility for database-style constraints on element content, such as ranges, string lengths, or enumerations (enumerated values are available only for attributes).
What are Schemas? XML Schemas are more complex than DTDs, but they are themselves written in XML. They specify structure and support precise data type constraints. They offer a wider range of primitive data types than #PCDATA in DTDs, matching those found in databases (string, boolean, decimal, integer, date, etc.). You can also create your own data types (simpleType and complexType). Schemas support namespaces for extensibility.
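To show what a data type adds over #PCDATA, here are two rough lexical checks in the spirit of xsd:integer and xsd:date. They are simplified sketches of our own, not the full XML Schema rules: they ignore time zones and negative years. A real schema validator should be used in practice. Note that a date written as "12 June 2001" is a perfectly good #PCDATA string but not a valid xsd:date, which expects the form 2001-06-12.

```python
import re
from datetime import date

def is_xsd_integer(text):
    """Rough lexical check for xsd:integer: an optional sign followed by digits."""
    return re.fullmatch(r"[+-]?[0-9]+", text.strip()) is not None

def is_xsd_date(text):
    """Rough check for xsd:date in its plain YYYY-MM-DD form
    (this sketch ignores time-zone suffixes and negative years)."""
    m = re.fullmatch(r"([0-9]{4})-([0-9]{2})-([0-9]{2})", text.strip())
    if m is None:
        return False
    try:
        date(int(m[1]), int(m[2]), int(m[3]))  # rejects impossible dates like 2001-02-30
        return True
    except ValueError:
        return False

# To a DTD, all of these are just #PCDATA (character strings).
# A schema type such as xsd:integer or xsd:date can tell them apart.
print(is_xsd_integer("12323573"), is_xsd_integer("ABC"))
print(is_xsd_date("2001-06-12"), is_xsd_date("12 June 2001"))
```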
Schema Example
<Bookstore xmlns="http://www.books.org">
  <Book ID="101">
    <Author>John Doe</Author>
    <Title>Introduction to XML</Title>
    <Date>2001-06-12</Date>
    <ISBN>121232323</ISBN>
    <Publisher>XYZ</Publisher>
  </Book>
  <Book ID="102">
    <Author>Foo Bar</Author>
    <Title>Introduction to XSL</Title>
    <ISBN>12323573</ISBN>
    <Publisher>ABC</Publisher>
  </Book>
</Bookstore>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.books.org"
            xmlns="http://www.books.org">
  <xsd:element name="Bookstore">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Book" minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Book">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Author" minOccurs="1" maxOccurs="unbounded"/>
        <xsd:element ref="Title" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Date" minOccurs="0" maxOccurs="1"/>
        <xsd:element ref="ISBN" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Publisher" minOccurs="1" maxOccurs="1"/>
      </xsd:sequence>
      <xsd:attribute name="ID" type="xsd:string" use="required"/>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Title" type="xsd:string"/>
  <xsd:element name="Author" type="xsd:string"/>
  <!-- xsd:date values use the form YYYY-MM-DD, e.g. 2001-06-12 -->
  <xsd:element name="Date" type="xsd:date"/>
  <xsd:element name="ISBN" type="xsd:integer"/>
  <xsd:element name="Publisher" type="xsd:string"/>
</xsd:schema>
XML Namespaces: Code reuse. A namespace identifies an XML vocabulary by a URI (Uniform Resource Identifier). Namespaces allow XML markup to be reused. They also resolve problems with recognizing tags and with collisions between tags that have the same name, which can happen when you combine elements from multiple vocabularies. (See the xsd: prefix and the targetNamespace in the schema example on the previous slide.)
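Here is a small sketch of how a namespace-aware parser keeps two same-named tags apart. The catalog document and its second title are made up for illustration; the XHTML namespace URI is the real one. ElementTree rewrites each prefixed name as {URI}localname. Its find and findtext methods accept a prefix-to-URI map, so lookups do not depend on which prefix the document happened to use.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing two vocabularies that both define <title>.
DOC = """<catalog xmlns:b="http://www.books.org"
                  xmlns:h="http://www.w3.org/1999/xhtml">
  <b:title>Introduction to XML</b:title>
  <h:title>My Reading List</h:title>
</catalog>"""

root = ET.fromstring(DOC)

# The parser expands each prefix to its URI: "{uri}localname".
print([child.tag for child in root])

# Our own prefixes ("book", "html") map to the same URIs as the document's b/h.
ns = {"book": "http://www.books.org", "html": "http://www.w3.org/1999/xhtml"}
print(root.findtext("book:title", namespaces=ns))
print(root.findtext("html:title", namespaces=ns))
```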
Cool XML Application: RSS
<rss version="0.91">
  <channel>
    <title>XML.com</title>
    <link>http://www.xml.com/</link>
    <description>XML.com features a rich mix of information and services for the XML community.</description>
    <language>en-us</language>
    <item>
      <title>Normalizing XML, Part 2</title>
      <link>http://www.xml.com/pub/a/2002/12/04/normalizing.html</link>
      <description>In this second and final look at applying relational normalization techniques to W3C XML Schema data modeling, Will Provost discusses when not to normalize, the scope of uniqueness and the fourth and fifth normal forms.</description>
    </item>
    <item>
      <title>The .NET Schema Object Model</title>
      <link>http://www.xml.com/pub/a/2002/12/04/som.html</link>
      <description>Priya Lakshminarayanan describes in detail the use of the .NET Schema Object Model for programmatic manipulation of W3C XML Schemas.</description>
    </item>
  </channel>
</rss>
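Because RSS is ordinary XML, a bare-bones feed reader fits in a few lines. The hypothetical headlines helper below collects the title and link of each item from a trimmed copy of the feed above.

```python
import xml.etree.ElementTree as ET

RSS = """<rss version="0.91">
  <channel>
    <title>XML.com</title>
    <link>http://www.xml.com/</link>
    <item>
      <title>Normalizing XML, Part 2</title>
      <link>http://www.xml.com/pub/a/2002/12/04/normalizing.html</link>
    </item>
    <item>
      <title>The .NET Schema Object Model</title>
      <link>http://www.xml.com/pub/a/2002/12/04/som.html</link>
    </item>
  </channel>
</rss>"""

def headlines(feed_xml):
    """Return (title, link) pairs for every <item> in an RSS 0.91 feed."""
    channel = ET.fromstring(feed_xml).find("channel")
    return [(item.findtext("title"), item.findtext("link"))
            for item in channel.findall("item")]

for title, link in headlines(RSS):
    print(f"{title}\n  {link}")
```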
Tools/Software
XML Spy: by far the most comprehensive editor. It handles XML files, DTDs, XSL files, and XSD (XML Schema) files. Unfortunately, it is available only as a 30-day trial version. http://www.xmlspy.com/download.html
XML Notepad: Microsoft XML Notepad is a simple application for building and editing small sets of XML-based data. Freeware. http://msdn.microsoft.com/xml/notepad/download.asp
XML Pro: a top-notch XML editor, but it doesn't include as many features as XML Spy. Shareware. http://www.vervet.com/demo.html
You can also check your XML files simply by opening them in Internet Explorer 5.0 or later. It checks whether the XML file is well-formed, and it also validates the file against a DTD (if one is specified in the DOCTYPE declaration).
Great links: www.w3schools.com http://www.cs.sjsu.edu/faculty/pearce/web/front.htm
Conclusion: You thought HTML was easy? XML just got easier! Get XML certified before you graduate! Visit: http://www.whizlabs.com/articles/xml-article.html
Questions? skumarjang@hotmail.com