1 “Universal Data-Speak”: The eXtensible Markup Language
  Zack Ives
  CSE 590DB, Winter 2000
  University of Washington
  3 January 2000
2 What Is XML?
  - eXtensible Markup Language for data
  - Standard for publishing and interchange
  - “Cleaner” SGML for the Internet
  - Applications:
    - Data exchange over intranets, between companies
    - E-business
    - Native file formats (Word, SVG)
    - Publishing of data
    - Storage format for irregular data
    - …
3 What’s Special about XML?
  - Supported by almost everyone
  - Easy to parse (even with no info about the doc)
  - Can encode data with little or much structure
  - Supports data references inside & outside the document
  - Presentation layer for publishing (XSL)
  - Document Object Model (DOM) for manipulating documents
  - Many, many tools
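As a rough illustration of "easy to parse (even with no info about the doc)", the sketch below walks an XML document without any DTD or schema. The document and its element names are invented for the example, and Python's standard `xml.etree.ElementTree` module stands in for whatever parser one prefers.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element and attribute names are made up for this sketch.
doc = """<catalog>
  <book year="1999"><title>XML Basics</title></book>
  <note>Irregular entry with no title</note>
</catalog>"""


def outline(elem, depth=0):
    """Return (depth, tag, attributes) for elem and its descendants, in document order."""
    rows = [(depth, elem.tag, dict(elem.attrib))]
    for child in elem:
        rows.extend(outline(child, depth + 1))
    return rows


root = ET.fromstring(doc)  # no DTD or schema is supplied or needed
rows = outline(root)
for depth, tag, attrs in rows:
    print("  " * depth + tag, attrs)
```

The same traversal works for any well-formed document, which is what makes XML usable for data with little or much structure.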
4 Basic XML Structures
  - Elements:
    - Open & close tags or “empty tag”
    - Ordered, nestable
  - Attributes:
    - Single-valued, unordered
    - Special types: ID, IDREF, IDREFS
  - PCDATA/CDATA
  - [Example document; only its text content survives: Publishing Object Data, IBM, Since its…, Intro XML, … … …]
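A minimal sketch of these structures, assuming a made-up `article` document (the tag and attribute names here are illustrative, not the slide's original example). Without a DTD the `id` and `cites` attributes are plain strings; only a DTD declaration would type them as ID and IDREFS.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: nested elements, one empty tag, and two attributes.
doc = """<article id="a1" cites="a2 a3">
  <title>Publishing Object Data</title>
  <section>Intro</section>
  <section>Details</section>
  <figure/>
</article>"""

article = ET.fromstring(doc)

# Child elements are ordered: iteration follows document order.
child_tags = [child.tag for child in article]

# Repeated elements keep their relative order too.
sections = [s.text for s in article.findall("section")]

# Attributes are single-valued strings; an IDREFS-style value is a
# whitespace-separated list that the application splits itself.
cited = article.get("cites").split()

# An empty tag has no text and no children.
figure = article.find("figure")
figure_is_empty = figure.text is None and len(figure) == 0

print(child_tags, sections, cited, figure_is_empty)
```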
5 Other XML Structures
  - Processing instructions: instructions for applications
  - CDATA sections: treat content as char data
      <![CDATA[ Whatever!!! ]]>
  - Comments: just like HTML
  - Entities: external resources and macros
      &my-entity; (non-parameter entity)
      %param-entity; (parameter entity for DTD declarations)
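The sketch below shows how these structures surface when a document is read into a DOM tree with Python's standard `xml.dom.minidom`. The `memo` document, the `my-app` processing-instruction target, and the entity's replacement text are all assumptions made for the example; parameter entities are not shown.

```python
from xml.dom import Node, minidom

# Hypothetical document exercising a PI, a comment, a CDATA section and an entity.
doc = """<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ENTITY my-entity "expanded text">
]>
<memo>
  <?my-app do-something?>
  <!-- a comment, just like HTML -->
  <body><![CDATA[ <b>Whatever!!!</b> ]]></body>
  <footer>&my-entity;</footer>
</memo>"""

dom = minidom.parseString(doc)
memo = dom.documentElement

# Processing instructions carry a target (the application) and free-form data.
pis = [(n.target, n.data) for n in memo.childNodes
       if n.nodeType == Node.PROCESSING_INSTRUCTION_NODE]

# Comments are kept as nodes but are not part of the element content.
comments = [n.data for n in memo.childNodes if n.nodeType == Node.COMMENT_NODE]


def text_of(tag):
    """Concatenate the character data directly inside the first <tag> element."""
    elem = memo.getElementsByTagName(tag)[0]
    return "".join(child.data for child in elem.childNodes)


body_text = text_of("body")      # markup inside CDATA is treated as plain characters
footer_text = text_of("footer")  # the entity reference is replaced by its definition

print(pis, comments, repr(body_text), repr(footer_text))
```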
6 Document Type Definition (DTD)
  - Inherited from the SGML DTD standard
  - BNF grammar establishing constraints on element structure and content
  - Specification of attributes and their types
  - Definitions of entities
7 Example DTD
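As a stand-in illustration (not the slide's original example), here is a small hypothetical DTD embedded in a document's internal subset. Python's standard library does not validate against DTDs, so the sketch only uses the expat parser's declaration callbacks to show which element and attribute declarations the DTD contains.

```python
import xml.parsers.expat

# Hypothetical DTD: an article is a title followed by one or more sections.
doc = """<?xml version="1.0"?>
<!DOCTYPE article [
  <!ELEMENT article (title, section+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT section (#PCDATA)>
  <!ATTLIST article id ID #REQUIRED>
  <!ATTLIST section ref IDREF #IMPLIED>
]>
<article id="a1"><title>T</title><section>S</section></article>"""

elements = []    # names of declared elements, in declaration order
attributes = []  # (element, attribute, type, required) tuples


def on_element_decl(name, model):
    # `model` encodes the content model; only the name is recorded here.
    elements.append(name)


def on_attlist_decl(elname, attname, type_, default, required):
    attributes.append((elname, attname, type_, bool(required)))


parser = xml.parsers.expat.ParserCreate()
parser.ElementDeclHandler = on_element_decl
parser.AttlistDeclHandler = on_attlist_decl
parser.Parse(doc, True)  # expat reads the declarations but does not enforce them

print(elements)
print(attributes)
```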
8 Shortcomings of DTDs
  Useful for documents, but not so good for data:
  - No support for structural re-use
    - Object-oriented-like structures aren’t supported
  - No support for data types
    - Can’t do data validation
  - Can have a single key item (ID), but:
    - No support for multi-attribute keys
    - No support for foreign keys (references to other keys)
    - No constraints on IDREFs (e.g., cannot require that a reference point only to a Section)
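To make the last point concrete: an IDREF only has to match some ID in the document, so a DTD cannot say "this reference must point to a section". A check like that ends up in application code, roughly as sketched below. The `cite`, `section`, and `author` names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: both references name an existing ID, so a DTD with
# ref declared as IDREF would accept both of them.
doc = """<article>
  <author id="p1">Ann</author>
  <section id="s1">Intro</section>
  <cite ref="s1"/>
  <cite ref="p1"/>
</article>"""

root = ET.fromstring(doc)

# Map every ID value to the tag of the element that carries it.
tag_by_id = {e.get("id"): e.tag for e in root.iter() if e.get("id") is not None}

# Application-level rule a DTD cannot express: a cite must reference a section.
bad_refs = [c.get("ref") for c in root.iter("cite")
            if tag_by_id.get(c.get("ref")) != "section"]

print(bad_refs)
```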
9 XML Schema
  - In XML format
  - Includes primitive data types (integers, strings, dates, etc.)
  - Supports value-based constraints (integers > 100)
  - User-definable structured types
  - Inheritance (extension or restriction)
  - Foreign keys
  - Element-type reference constraints
10 Sample XML Schema …
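A stand-in sketch, not the slide's original sample: the fragment below uses the syntax of the later W3C XML Schema Recommendation, which may differ from the draft that was current when these slides were written. Because a schema is itself XML, an ordinary parser can read it. Python's standard library has no schema validator, so the "integers > 100" constraint is mirrored by a hand-written check. The type and element names (`BigInt`, `quantity`, `shipped`) are invented.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema: a restricted integer type plus two typed elements.
schema = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="BigInt">
    <xs:restriction base="xs:integer">
      <xs:minExclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="quantity" type="BigInt"/>
  <xs:element name="shipped" type="xs:date"/>
</xs:schema>"""

root = ET.fromstring(schema)  # the schema parses like any other XML document
ns = {"xs": XSD}

# Element declarations and their (primitive or user-defined) types.
declared = {e.get("name"): e.get("type") for e in root.findall("xs:element", ns)}

# The value-based constraint is ordinary data inside the schema.
facet = root.find("xs:simpleType/xs:restriction/xs:minExclusive", ns)
lower_bound = int(facet.get("value"))


def satisfies_big_int(text):
    """Hand-rolled stand-in for a validator: is text an integer > lower_bound?"""
    try:
        return int(text) > lower_bound
    except ValueError:
        return False


print(declared, lower_bound, satisfies_big_int("250"))
```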
11 Subtyping in
12 Important XML Standards
  - XSL/XSLT*: presentation and transformation standards
  - RDF: Resource Description Framework (meta-info such as ratings, categorizations, etc.)
  - XPath/XPointer/XLink*: standards for linking to documents and elements within them
  - Namespaces: for resolving name clashes
  - DOM: Document Object Model for manipulating XML documents
  - SAX: Simple API for XML parsing
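A small sketch contrasting two of these standards, using Python's standard `xml.sax` and `xml.etree.ElementTree` modules as stand-ins for SAX and for a tree API with namespace-aware, XPath-like lookup. The document and the `urn:example:*` namespace URIs are invented for the example.

```python
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical document: two vocabularies both define a "title" element.
doc = """<lib xmlns:a="urn:example:a" xmlns:b="urn:example:b">
  <a:title>From vocabulary A</a:title>
  <b:title>From vocabulary B</b:title>
</lib>"""


class TagCounter(xml.sax.ContentHandler):
    """SAX style: the parser pushes events; no tree is built in memory."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # With namespace processing off (the default), name is the raw
        # qualified name such as "a:title".
        self.counts[name] = self.counts.get(name, 0) + 1


handler = TagCounter()
xml.sax.parseString(doc.encode("utf-8"), handler)

# Tree style: namespaces resolve the clash between the two title elements.
root = ET.fromstring(doc)
title_a = root.find("a:title", {"a": "urn:example:a"}).text
title_b = root.find("b:title", {"b": "urn:example:b"}).text
expanded_tag = root[0].tag  # ElementTree stores names as {namespace-uri}local

print(handler.counts, title_a, title_b, expanded_tag)
```

SAX suits large documents processed in one pass; a tree API suits documents that need navigation or modification.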
13 Some Key XML Resources
  - W3C XML standards
  - SGML, XML standards
  - XML portal
  - xml.apache.org: Apache XML tools (Cocoon, Xerces, Xalan, etc.)
  - java.sun.com/xml: Sun Java tools
  - alphaworks.ibm.com: IBM tools
  - tools, xCentral search
  - XML directory
14 Conclusions
  - XML is emerging as the standard for data publishing and exchange
  - Based on nested elements, references
  - DTDs and XML Schema provide constraints on structure
  - Later in this quarter:
    - Querying, presenting XML
    - Storing XML
    - Integrating XML