XML: Extensible Markup Language
FST-UMAC
Gong Zhiguo
How the Web is Today
  HTML documents
    – all intended for human consumption
    – many generated automatically by applications
  Easy to fetch any Web page, from any server, any platform
Limits of the Web Today
  Applications cannot consume HTML
  HTML wrapper technology is brittle
    – screen scraping
  OO technology (CORBA) requires a controlled environment
  Companies merge and form partnerships; they need interoperability fast
Paradigm Shift on the Web
  New Web standard XML:
    – XML generated by applications
    – XML consumed by applications
  Data exchange
    – across platforms: enterprise interoperability
    – across enterprises
  Web: from a collection of documents to data and documents
XML
  A W3C standard (2/98) to complement HTML
  Origins: structured text, SGML
  Motivation:
    – HTML describes presentation
    – XML describes content
From HTML to XML
  HTML describes the presentation
HTML
  Bibliography
  Foundations of Databases
  Abiteboul, Hull, Vianu
  Addison Wesley, 1995
  Data on the Web
  Abiteboul, Buneman, Suciu
  Morgan Kaufmann, 1999
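XML
  <bibliography>
    <book>
      <title> Foundations… </title>
      <author> Abiteboul </author>
      <author> Hull </author>
      <author> Vianu </author>
      <publisher> Addison Wesley </publisher>
      <year> 1995 </year>
    </book>
    …
  </bibliography>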
XML Terminology
  tags: book, title, author, …
  start tag: <book>, end tag: </book>
  elements: <book>…</book>, <author>…</author>
  elements are nested
  empty element: <tag></tag>, abbreviated <tag/>
  an XML document: single root element
  well-formed XML document: if it has matching tags
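The terminology above can be checked mechanically. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` parser; the helper name `is_well_formed` is hypothetical. The parser rejects input with mismatched tags or more than one root element.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Matching, properly nested tags under a single root element.
nested_ok = is_well_formed("<book><title>Foundations</title></book>")
# End tags in the wrong order: not well formed.
crossed_tags = is_well_formed("<book><title>Foundations</book></title>")
# Two root elements: not a single-rooted document.
two_roots = is_well_formed("<book/><book/>")
# The abbreviated empty element is accepted.
empty_ok = is_well_formed("<book/>")
```

Note that this only tests well-formedness; it says nothing about whether a document conforms to a schema.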
More XML: Attributes
  Foundations of Databases
  Abiteboul
  …
  1995
  attributes are alternative ways to represent data
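To illustrate that attributes and child elements are interchangeable carriers for the same data, here is a small sketch using Python's standard library. The attribute name `year` is an assumption chosen for illustration; the slide's own attribute did not survive.

```python
import xml.etree.ElementTree as ET

# Same fact stored two ways. The "year" attribute is a hypothetical example.
as_attribute = ET.fromstring(
    '<book year="1995"><title>Foundations of Databases</title></book>'
)
as_element = ET.fromstring(
    "<book><title>Foundations of Databases</title><year>1995</year></book>"
)

# An attribute is read from the element itself ...
year_from_attribute = as_attribute.get("year")
# ... while a child element is read by navigating to it.
year_from_element = as_element.findtext("year")
```

One practical difference: an element may carry a given attribute only once, whereas a child element such as `author` can repeat.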
Query Languages: Motivation
  Granularity of the HTML Web: one file
  Granularity of Web data varies:
    – single data item: “get John’s salary”
    – entire database: “get all salaries”
    – aggregates: “get average salary”
  Need a query language to define granularity
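The three granularities can be sketched against a small in-memory document. This is an illustration only: the element names and the second employee are made-up assumptions, and plain Python stands in for a real XML query language.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: element names and values are invented for this example.
EMPLOYEES = """<employees>
  <employee><name>John</name><salary>50000</salary></employee>
  <employee><name>Mary</name><salary>70000</salary></employee>
</employees>"""

root = ET.fromstring(EMPLOYEES)

# Single data item: "get John's salary"
john_salary = next(
    int(e.findtext("salary"))
    for e in root.iter("employee")
    if e.findtext("name") == "John"
)

# Entire database: "get all salaries"
all_salaries = [int(e.findtext("salary")) for e in root.iter("employee")]

# Aggregate: "get average salary"
average_salary = sum(all_salaries) / len(all_salaries)
```

Each request returns a different amount of data from the same file, which is the gap a query language is meant to fill.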
XML-QL: A Query Language for XML (8/98)
  Features:
    – regular path expressions
    – patterns, templates
    – Skolem functions
  Based on the OEM data model
Pattern Matching in XML-QL
  where … Morgan Kaufmann … $a … in "…"
  construct $a
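The idea of the query above, binding $a for every book that matches a pattern mentioning Morgan Kaufmann, can be approximated in plain Python. This is a sketch, not XML-QL: the element names `bibliography`, `publisher`, and `name`, and the function `authors_by_publisher`, are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document layout; the bibliographic data comes from the slides.
BIB = """<bibliography>
  <book>
    <publisher><name>Morgan Kaufmann</name></publisher>
    <title>Data on the Web</title>
    <author>Abiteboul</author><author>Buneman</author><author>Suciu</author>
  </book>
  <book>
    <publisher><name>Addison Wesley</name></publisher>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author><author>Hull</author><author>Vianu</author>
  </book>
</bibliography>"""


def authors_by_publisher(xml_text: str, publisher: str) -> list:
    """Collect every author of every book whose publisher name matches."""
    root = ET.fromstring(xml_text)
    return [
        author.text
        for book in root.iter("book")                    # the "where" pattern
        if book.findtext("publisher/name") == publisher  # constant in the pattern
        for author in book.findall("author")             # one binding per $a
    ]


morgan_kaufmann_authors = authors_by_publisher(BIB, "Morgan Kaufmann")
```

As in the query, each matching author produces one result, so a book with three authors contributes three bindings.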
Simple Constructors in XML-QL
  where … $a … in "…"
  construct … $a … $l …
  Note: </> abbreviates the matching end tag, such as </book>
  Result:
    Smith English
    Smith Mandarin
    Doe English
Schemas in XML
  Document Type Definition (DTD)
  XML Schema
  RDF Schema
Document Type Definition: DTD
  Part of the original XML specification
  An XML document may have a DTD
  Terminology for XML:
    – well-formed: if tags are correctly closed
    – valid: if it has a DTD and conforms to it
  Validation is useful in data exchange
DTDs as Grammars
  <!DOCTYPE paper [ … ]>
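One way to see a DTD as a grammar: each element's content model is a regular expression over the names of its children. The sketch below translates a content model into a Python regular expression and checks a child sequence against it. The content model `(title, author+, year?)` and both function names are assumptions for illustration, and `#PCDATA`, `EMPTY`, and `ANY` are deliberately not handled.

```python
import re
import xml.etree.ElementTree as ET


def content_model_to_regex(model: str) -> str:
    """Translate a DTD content model such as "(title, author+, year?)".

    Every element name becomes the token "name;". The DTD operators
    *, +, ?, | and parentheses already mean the same thing in a regex,
    and the sequence operator "," becomes plain concatenation.
    """
    compact = re.sub(r"\s+", "", model)
    tokens = re.sub(
        r"[A-Za-z_][\w.-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + ";)",
        compact,
    )
    return tokens.replace(",", "")


def children_conform(element: ET.Element, model: str) -> bool:
    """Check the sequence of child tags of element against the content model."""
    sequence = "".join(child.tag + ";" for child in element)
    return re.fullmatch(content_model_to_regex(model), sequence) is not None


MODEL = "(title, author+, year?)"  # hypothetical content model
in_order = ET.fromstring(
    "<paper><title>t</title><author>a</author><author>b</author><year>1999</year></paper>"
)
out_of_order = ET.fromstring("<paper><author>a</author><title>t</title></paper>")

in_order_ok = children_conform(in_order, MODEL)
out_of_order_ok = children_conform(out_of_order, MODEL)
```

The second document is rejected only because its children appear in a different order, which shows how a content model fixes sibling order.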
DTDs as Schemas
  Not so well suited:
    – impose unwanted constraints on order
    – references cannot be constrained
    – can be too vague
XML Storage
  Text file (XML)
  Store in a ternary relation
  Use the DTD to derive a schema
  Mine the data to derive a schema
  Build a special-purpose repository (Lore)
XML Storage: Text File
  Advantages
    – simple
    – less space than one thinks
    – reasonable clustering
  Disadvantages
    – no updates
    – requires a special-purpose query processor
Store XML in Ternary Relation [Florescu, Kossmann 1999]
  [Figure: a tree of objects &o1 to &o6 with edges labeled paper, title, author, and year, and leaf values such as “…” and “1986”, stored in two relations, Ref and Val]
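A minimal sketch of this storage scheme, assuming SQLite and the column names `source`, `label`, `dest` for Ref and `node`, `value` for Val (the slide names only the two relations, so the columns and the function `shred` are assumptions). Each edge of the XML tree becomes one Ref row, and each leaf's text becomes one Val row.

```python
import itertools
import sqlite3
import xml.etree.ElementTree as ET


def shred(xml_text: str, conn: sqlite3.Connection) -> int:
    """Store an XML document as edges (Ref) and leaf values (Val)."""
    conn.execute("CREATE TABLE Ref (source INTEGER, label TEXT, dest INTEGER)")
    conn.execute("CREATE TABLE Val (node INTEGER, value TEXT)")
    ids = itertools.count(1)
    root_id = next(ids)  # synthetic root object, playing the role of &o1

    def visit(element: ET.Element, parent_id: int) -> None:
        node_id = next(ids)
        conn.execute(
            "INSERT INTO Ref VALUES (?, ?, ?)", (parent_id, element.tag, node_id)
        )
        children = list(element)
        if children:
            for child in children:
                visit(child, node_id)
        else:
            # Leaf element: keep its text in the value relation.
            conn.execute(
                "INSERT INTO Val VALUES (?, ?)",
                (node_id, (element.text or "").strip()),
            )

    visit(ET.fromstring(xml_text), root_id)
    return root_id


conn = sqlite3.connect(":memory:")
shred(
    "<paper><title>A title</title><author>An author</author><year>1986</year></paper>",
    conn,
)

# A path step such as "year" turns into a join between Ref and Val.
years = conn.execute(
    "SELECT v.value FROM Ref r JOIN Val v ON v.node = r.dest WHERE r.label = 'year'"
).fetchall()
```

Longer paths need one self-join on Ref per step, which is the main cost of this generic mapping.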
Use DTD to Derive Schema [Christophides et al. 1994, Shanmugasundaram et al. 1999]
  DTD:
  ODMG classes:
    class Employee public type tuple (name:string, address:Address, project:List(Project))
    class Address public type tuple (street:string, …)
Mine Data to Derive Schema [Deutsch et al. 1999]
  [Figure: schema derived from the instances Paper1 and Paper2, with element labels paper, author, title, year, fn, ln]
XML and Databases (1)
  “Is XML a database?”
  In a strict sense, no.
  In a more liberal sense, yes, but …
    – XML has:
        Storage (the XML document)
        A schema (DTD)
        Query languages (XQL, XML-QL, …)
        Programming interfaces (SAX, DOM)
    – XML lacks:
        Efficient storage, indexes, security, transactions, multi-user access, triggers, queries across multiple documents
XML and Databases (2)
  Data versus Documents
    – There are two ways to use XML in a database environment:
        Use XML as a data transport, i.e., to get data in and out of the database
          – Data is stored in a relational or object-oriented database
          – Middleware converts between the database and XML
        Use a “native XML” database, i.e., store data in document form
          – Use a content management system
XML and Databases (3)
  Data-centric documents
    – Fairly regular structure
    – Fine-grained data
    – Little or no mixed content
    – Order of sibling elements often not significant
  Document-centric documents
    – Irregular structure
    – Larger-grained data
    – Lots of mixed content
    – Order of sibling elements is significant
XML and Databases (4)
  Data-centric storage and retrieval systems
    – Use a database
        Add middleware to convert to/from XML
    – Use an XML server (specialized product for e-commerce)
    – Use an XML-enabled web server with a database backend
  Document-centric storage and retrieval systems
    – Content management system
    – Persistent DOM implementation
XML and Databases (5)
  Mapping document structure to database structure
    – Template-driven
        No predefined mapping
        Embedded commands process (retrieve) data
        Currently only available from RDBMS to XML
  Example template:
    The following flights have available seats:
    SELECT Airline, FltNumber, Depart, Arrive FROM Flights
    We hope one of these meets your needs
XML and Databases (6)
    – Template-driven, example result:
        The following flights have available seats:
        ACME 123 Dec 12, 2000, 13:43 Dec 13, 2000, 01:21
        We hope one of these meets your needs
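The template-driven approach can be sketched as follows: a document carries an embedded SELECT statement, and a processor replaces that statement with the rows it returns. The element names (`FlightInfo`, `Intro`, `SelectStmt`, `Conclude`, `Flights`, `Row`) and the function `expand_template` are assumptions for illustration, and SQLite stands in for the RDBMS.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical template markup; only the text and the query come from the slide.
TEMPLATE = """<FlightInfo>
  <Intro>The following flights have available seats:</Intro>
  <SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
  <Conclude>We hope one of these meets your needs</Conclude>
</FlightInfo>"""


def expand_template(template: str, conn: sqlite3.Connection) -> ET.Element:
    """Replace every SelectStmt element with the rows its query returns."""
    root = ET.fromstring(template)
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag != "SelectStmt":
                continue
            cursor = conn.execute(child.text)
            columns = [description[0] for description in cursor.description]
            result = ET.Element("Flights")
            result.tail = child.tail
            for row in cursor:
                row_element = ET.SubElement(result, "Row")
                for name, value in zip(columns, row):
                    ET.SubElement(row_element, name).text = str(value)
            # One-for-one replacement keeps the remaining indexes valid.
            parent.remove(child)
            parent.insert(index, result)
    return root


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Flights (Airline TEXT, FltNumber INTEGER, Depart TEXT, Arrive TEXT)"
)
conn.execute(
    "INSERT INTO Flights VALUES ('ACME', 123, 'Dec 12, 2000, 13:43', 'Dec 13, 2000, 01:21')"
)
doc = expand_template(TEMPLATE, conn)
```

No mapping between document and database is defined in advance; the shape of the output follows entirely from the query embedded in the template.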
XML and Databases (7)
  Mapping document structure to database structure
    – Model-driven
        A data model is imposed on the structure of the XML document
        This model is mapped to the structures in the database
        There are two common models:
          – Model the XML document as a single table or a set of tables
          – Model the XML document as a tree of data-specific objects (good for OODBMS mapping)
XML and Databases (8)
    – Single table or set of tables:
    – Tree organization:

                 Orders
                   |
               SalesOrder
              /    |    \
       Customer   Item   Item
                   |      |
                  Part   Part
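For the table model, a document is read as a set of tables, each holding rows, each row holding column values. The sketch below assumes one particular layout (root element, then one element per table, then `row` elements, then one element per column); that layout, the sample data, and the function name are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical table-shaped document.
TABLE_XML = """<database>
  <SalesOrders>
    <row><Number>123</Number><Customer>ACME</Customer></row>
    <row><Number>124</Number><Customer>Initech</Customer></row>
  </SalesOrders>
</database>"""


def rows_from_table_xml(xml_text: str) -> dict:
    """Return {table_name: [row_dict, ...]} for a table-shaped document."""
    tables = {}
    for table in ET.fromstring(xml_text):
        tables[table.tag] = [
            {column.tag: column.text for column in row} for row in table
        ]
    return tables


tables = rows_from_table_xml(TABLE_XML)
```

A document with deeper nesting, such as the Orders tree, does not fit this shape directly, which is where the object-tree model comes in.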
XML and Databases (9)
  Generating DTDs from a database schema and vice versa
    – Often the DTD for an application changes rarely and does not need to be generated automatically.
    – Some simple conversions are possible
        Example: DTD from relational schema:
          1. For each table, create an ELEMENT.
          2. For each column in a table, create an attribute or a PCDATA-only child ELEMENT.
          3. For each primary key/foreign key relationship in which a column of the table contributes the primary key, create a child ELEMENT.
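The three conversion rules can be sketched as a small generator. This is an illustration under stated assumptions: the schema is passed in as plain Python data rather than read from a database, columns always become PCDATA-only child elements (not attributes), column names are unique across tables, and the table names and the function `dtd_from_schema` are hypothetical.

```python
def dtd_from_schema(tables: dict, foreign_keys: list) -> str:
    """Generate ELEMENT declarations from a relational schema.

    tables:       {table_name: [column_name, ...]}
    foreign_keys: [(referencing_table, referenced_table), ...]
    """
    lines = []
    for table, columns in tables.items():
        # Rule 3: a table whose primary key is referenced gets the
        # referencing table as a repeatable child element.
        nested = [child + "*" for child, parent in foreign_keys if parent == table]
        # Rule 1: one ELEMENT per table.
        content = ", ".join(list(columns) + nested)
        lines.append(f"<!ELEMENT {table} ({content})>")
        # Rule 2: one PCDATA-only child ELEMENT per column.
        for column in columns:
            lines.append(f"<!ELEMENT {column} (#PCDATA)>")
    return "\n".join(lines)


# Hypothetical schema: Items.OrderNumber references Orders.OrderNumber.
dtd = dtd_from_schema(
    {"Orders": ["OrderNumber", "Date"], "Items": ["ItemNumber", "Quantity"]},
    [("Items", "Orders")],
)
```

For this schema the generator emits an `Orders` element containing `OrderNumber`, `Date`, and zero or more `Items`, followed by PCDATA declarations for each column.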
XML and Databases (10)
  Document-centric storage and retrieval systems
    – Content management system
        Allows the storage of discrete content fragments, such as examples, procedures, and chapters, as well as metadata such as author names and revision dates.
        Many content management systems are built on top of relational or object-oriented database systems.
        Examples:
          – BladeRunner (Interleaf), SigmaLink (STEP), Parlance Content Manager (XyEnterprise), Target 2000 (Progressive Information Technology)
    – Persistent DOM implementation
Further Readings
  www.w3.org/XML
  www-db.stanford.edu/~widom
  www-rocq.inria.fr/~abiteboul
  db.cis.upenn.edu
  Abiteboul, Buneman, Suciu. Data on the Web: From Relations to Semistructured Data and XML. Morgan Kaufmann, 1999 (appears in October)