XML + Databases = ? (DIMACS Workshop, 3/2000) Mike Carey Exploratory Database Systems Department IBM Almaden Research Center

XML + Databases = ? (DIMACS Workshop, 3/2000) Mike Carey Exploratory Database Systems Department IBM Almaden Research Center carey@almaden.ibm.com

Plan for Today’s Talk
• Thoughts on DB and web technologies
  – The web and web “querying”
  – Semistructured databases
  – Object-relational databases
  – XML and databases
• XML/DB research at IBM Almaden
  – The XPERANTO project
    Motivation and approach
    Whirlwind tour of the system

The Web is Great at Supporting URL-Based Sharing
• Ex: Online conference proceedings
• Web browsers have given us
  – Universal file access (ftp++)
  – Universal document access (html)
  – Universal service access (forms)
• What more could we navigational couch potatoes possibly want?
  – Universal platform for e-shopping!

The Web is Lousy at Supporting Parametric Searches
• Ex: Find all the used Musicman Sterling bass guitars currently available for under $750 within a 50-mile radius of my San Jose home
• This is hard for a number of reasons
  – Data buried in web pages, news groups, classified ads, store sites, auction sites, …
  – No schema (no metal fish, please!)
  – No data types (miles, US$, instruments)
  – No regularity within/across (good!) sites

Aren’t We Supposed to be the Experts on Data Management?
• The DB community brought the world
  – Data models, schemas, and views
  – Query languages, optimizers, fast joins
  – Scalable parallel servers
  – Federated database systems
• What do we have in our bag of tricks?
  – Semistructured databases
  – Object-relational database systems

Is Semistructured Database Technology the Answer?
• Database characteristics
  – Collections of [name, value] pairs or maybe [name, type, value] triples
  – Collections typically set or list
• System characteristics
  – “Typeloose” query languages
  – Indexes for nested, typeloose structures
  – Appropriate query processing techniques
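To make the [name, value] model concrete, here is a minimal Python sketch. The `listings` data and the `find` and `cheap` helpers are hypothetical illustrations, not part of any system named in the talk. It shows how a "typeloose" path query tolerates missing names, repeated names, and values of varying type.

```python
from typing import Any, Iterator

# A semistructured record is just a list of (name, value) pairs;
# a value may itself be a nested list of pairs. (Illustrative data only.)
Pair = tuple[str, Any]

listings: list[list[Pair]] = [
    [("instrument", "bass guitar"), ("price", 700),
     ("seller", [("city", "San Jose")])],
    [("instrument", "bass guitar"), ("price", "call for quote")],  # non-numeric price
    [("item", "amplifier"), ("price", 250), ("price", 300)],       # repeated name, no "instrument"
]

def find(pairs: Any, path: list[str]) -> Iterator[Any]:
    """Yield every value reached by following the names in `path`.

    Missing names simply yield nothing, so no schema is required.
    """
    if not path:
        yield pairs
        return
    if not isinstance(pairs, list):
        return  # an atomic value has no children to descend into
    for name, value in pairs:
        if name == path[0]:
            yield from find(value, path[1:])

def cheap(listing: list[Pair], limit: float) -> bool:
    """Typeloose range predicate: only numeric prices take part in the comparison."""
    return any(isinstance(p, (int, float)) and p < limit
               for p in find(listing, ["price"]))

prices = [list(find(l, ["price"])) for l in listings]
under_750 = [cheap(l, 750) for l in listings]
```

The `isinstance` check inside `cheap` is exactly the kind of per-value type test that a schema would let a query processor avoid.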

Are Semistructured Databases the Answer? (2)
• No, because schemas are critical for
  – Data readers
    What info is in a given collection?
    Thus, what queries might make sense?
  – Data writers
    What should I call this piece of info?
    Is it okay to put this kind of data here?
  – Efficient/effective query processors
    Indexing, statistics, ... (e.g., range queries)
    Integration mappings (e.g., unit conversions)

Are Semistructured Databases the Answer? (3)
• They have some nice features, though
  – Flexible, dynamic schemas
    Forgiving w.r.t. variations and exceptions
    Schema evolution is not a big deal
  – Richer data modeling (vs. relational)
    Nested structures, ordered collections
  – More powerful query languages
    Blurring of schema and data querying
    Ordering, nesting, restructuring handled

Is Object-Relational Database Technology the Answer?
• Database characteristics
  – Base types, user-defined structured types, inheritance, reference types, collections
  – Collections are well-typed
• System characteristics
  – Extended SQL-based query languages
  – Support for methods (fenced/unfenced)
  – Also triggers, LOBs, extensible indexes

Are Object-Relational Databases the Answer? (2)
• No, because most O-R DBMSs have
  – Overly rigid schemas
    Every instance is of one (known) type
    Evolving a type can be a major burden
    Distributed type management is hard
  – Crufty old storage managers
    Ragged or sparse records poorly supported
  – Insufficient power in extended SQL
    Prehistoric assumptions get in the way
    Weak on restructuring, schema-querying

Is XML the Answer? (Yes!!...What Was the Question Again?)
• Structured documents (for the web)
    Tables Are The Answer
    Chris Date
    Saratoga
    CA

Is XML the Answer? (2)
• W3C’s XML Schema working group
  – Typed elements, attributes, documents
  – Simple types and complex types
  – Derived types (extension, restriction)
  – Facets, anonymous types, groups, …
  – Uniqueness, keys and key references
• W3C’s XML Query working group
  – XML-QL, XPath, XQL, XSL/T, XSQL, …
  – Recommendation due in late 2000 (?)

Is XML the Answer? (3)
• XML Schema might help because
  – XML has achieved a huge mindshare for data interchange on the web
  – DTD standardization is happening for documents within vertical industries, and XML Schemas should take over
  – When finished, XML Schema should be a widely used schema description tool
    Similar to O-R schemas, but with more flexibility (and web-based sex appeal)

Some Useful XML+DB Topics
• Publish documents with XML Schemas from O-R databases
  – B2B e-commerce messages
  – B2C comparison shopping (if permitted!)
  – Robust O-R DB-resident web sites with XML for page content generation
• Use XML Schema as the central data model for data integration middleware
  – I.e., web information integration

Useful XML+DB Topics (2)
• Build a “native” XML Repository on top of an O-R DBMS
  – Map from XML Schema model to O-R DBMS modeling constructs
  – Map from XML queries to O-R queries (including tag variables and loose typing)
  – Thereby provide XML document storage management with industrial-strength robustness, scalability, and performance

Useful XML+DB Topics (3)
• Evolve XML-QL into a complete web data manipulation language
  – Typing a la XML Schema
  – Ordered/unordered collections
  – XPath-inspired expressions
  – Easier grouping and aggregation
  – Updates (insert/delete, modify)
  – Etc.

The XPERANTO Project
• Middleware for publishing O-R (or plain relational) DB content on the web
  – Provides a virtual XML document view
  – Based on a “pure XML” approach
  – Using XML-QL (as W3C placeholder)
• Born at Almaden in summer of 1999
  – Mike Carey, Dana Florescu, Zack Ives, Ying Lu, Jai Shanmugasundaram, Beau Shekita, Subbu Subramanian

The XPERANTO Belief System
• Databases contain, and will continue to contain, the world’s “data jewels”
  – Transactional data (RDBMS)
  – Important multimedia assets (ORDBMS)
• XML application developers of the future may not love SQL like we do
  – View databases as default XML documents
  – Let them define appropriate (query-able) views of these XML documents

XPERANTO Architecture
[Figure: architecture diagram. Component labels: Query Translation, XML-QL Parser, Query Rewrite, SQL Translation, XML Tagger, Metadata Services, View Services, Type & Table Services, XML Schema Generator, O-R Database, SQL Query Processor, Stored Tables, System Catalog. Flow labels: Views, XML Schema, XQGM, SQL Queries, Data Tuples, Catalog Info, Table & Type Info.]

XPERANTO Components
• XML-QL Parser
  – Neutral query representation (XQGM)
• Query Rewrite
  – View composition and other rewrites
• SQL Translation
  – Produce SQL query(s) to get the required data from the underlying DBMS
• XML Tagger
  – Tag and structure the tabular results

XPERANTO Components
• View Services
  – Repository for XML view definitions
• Type & Table Services
  – Interface (and cache) for DB catalog info
• XML Schema Generator
  – Give DB catalog info in XML Schema form for default views
  – Infer XML Schema info for queries and non-default view definitions

Consider a Simple O-R Schema

  -- Plain relational tables
  CREATE TABLE book (
    bookID    CHAR(30),
    name      VARCHAR(255),
    publisher VARCHAR(30));

  CREATE TABLE publisher (
    name    VARCHAR(30),
    address VARCHAR(255));

  -- User-defined structured type and a typed table built on it;
  -- ssn is the user-generated object identifier (OID) column
  CREATE TYPE author_type AS (
    bookID CHAR(30),
    first  VARCHAR(30),
    last   VARCHAR(30));

  CREATE TABLE author OF author_type (REF IS ssn USER GENERATED);

Part of the Default XML View.

XPERANTO’s Default Views
• XPERANTO generates default O-R to XML Schema mappings
  – Each DB shown as an XML file
  – Subtyping handled via XML Schema’s refinement facilities
  – OIDs and references become ids/idrefs
• “Don’t use this at home!”
  – Application developers are expected to define the real view(s) using XML-QL
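The idea of a default view can be illustrated with a small Python sketch. It is not XPERANTO's actual mapping. It assumes SQLite as a stand-in for the O-R DBMS (so typed tables, OIDs, and references are not covered), and the `default_xml_view` function, the `row` element name, and the sample rows are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

def default_xml_view(conn: sqlite3.Connection, db_name: str) -> ET.Element:
    """Expose a whole database as one XML document:
    one element per table, one <row> per tuple, one element per non-NULL column."""
    root = ET.Element(db_name)
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        table_el = ET.SubElement(root, table)
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        columns = [d[0] for d in cursor.description]  # column names from the catalog
        for row in cursor:
            row_el = ET.SubElement(table_el, "row")
            for col, val in zip(columns, row):
                if val is not None:  # NULL columns are simply omitted
                    ET.SubElement(row_el, col).text = str(val)
    return root

# Made-up sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (bookID CHAR(30), name VARCHAR(255), publisher VARCHAR(30));
CREATE TABLE publisher (name VARCHAR(30), address VARCHAR(255));
INSERT INTO book VALUES ('b1', 'Sample Title', 'Kluwer');
INSERT INTO publisher VALUES ('Kluwer', 'Sample Address');
""")
view = default_xml_view(conn, "library")
xml_text = ET.tostring(view, encoding="unicode")
```

A view like this mirrors the table layout one-to-one, which is why the slide expects developers to define a better-shaped view on top of it.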

Creating a Better XML View

  WHERE $bid $name $bpub IN "db2:xml:books/library", $bpub = "Kluwer"
  CONSTRUCT $bname
    { WHERE $bpub $addr IN "db2:xml:books/library"
      CONSTRUCT $addr }
    { WHERE $bid $fname $lname IN "db2:xml:books/library"
      CONSTRUCT }.

XPERANTO Query Rewrite
• XML-QL queries first translated into XQGM representation
  – Neutral, well-poised for more features
  – Easier to go from XML-QL to SQL
  – Borrow rewrites from DB2 UDB engine
• XQGM is an extension of DB2’s QGM
  – XML data type for “columns”
  – Set of XML-specific functions

SQL Generation and XML Document Tagging/Structuring
• Sorted Outer Union queries are used to obtain the data
  – Fetch the data in one query that brings it back in the appropriate order
  – Tag and nest it to create XML document
• Advantages of this approach
  – Shown to be stable as well as fast
  – Simple (linear-space) tagging possible
    Just watch for nesting-related changes

Outer Union Query Example

  WITH OuterUnion (type, bookID, bookName, pubName, pubAddr, authFirst, authLast) AS (
    -- type '0': one row per book
    SELECT '0', b.bookID, b.name, NULL, NULL, NULL, NULL
    FROM book b
    WHERE b.publisher = 'Kluwer'
    UNION ALL
    -- type '1': the publisher of each book
    SELECT '1', b.bookID, NULL, p.name, p.address, NULL, NULL
    FROM book b, publisher p
    WHERE b.publisher = 'Kluwer' AND b.publisher = p.name
    UNION ALL
    -- type '2': one row per author of each book
    SELECT '2', b.bookID, NULL, NULL, NULL, a.first, a.last
    FROM book b, author a
    WHERE b.publisher = 'Kluwer' AND b.bookID = a.bookID
  )
  SELECT * FROM OuterUnion
  ORDER BY bookID
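The tagging step can be sketched in Python as a single pass over the sorted rows. This is a minimal illustration, not the XPERANTO tagger, and it rests on several assumptions. SQLite stands in for the DBMS. The author columns are renamed `firstName` and `lastName`. The query adds `type` and `authLast` as secondary sort keys so the row order is deterministic. The output element names (`books`, `book`, `name`, `publisher`, `address`, `author`) and the sample rows are made up.

```python
import sqlite3
import xml.etree.ElementTree as ET

SORTED_OUTER_UNION = """
WITH OuterUnion (type, bookID, bookName, pubName, pubAddr, authFirst, authLast) AS (
  SELECT '0', b.bookID, b.name, NULL, NULL, NULL, NULL
  FROM book b WHERE b.publisher = 'Kluwer'
  UNION ALL
  SELECT '1', b.bookID, NULL, p.name, p.address, NULL, NULL
  FROM book b, publisher p
  WHERE b.publisher = 'Kluwer' AND b.publisher = p.name
  UNION ALL
  SELECT '2', b.bookID, NULL, NULL, NULL, a.firstName, a.lastName
  FROM book b, author a
  WHERE b.publisher = 'Kluwer' AND b.bookID = a.bookID
)
SELECT * FROM OuterUnion ORDER BY bookID, type, authLast
"""

def tag(rows) -> ET.Element:
    """One pass over the sorted rows; a change of bookID opens a new <book>."""
    root = ET.Element("books")
    current_id, book = None, None
    for kind, book_id, book_name, pub_name, pub_addr, first, last in rows:
        if book_id != current_id:  # nesting-related change: next book begins
            book = ET.SubElement(root, "book", id=book_id)
            current_id = book_id
        if kind == "0":
            ET.SubElement(book, "name").text = book_name
        elif kind == "1":
            pub = ET.SubElement(book, "publisher")
            ET.SubElement(pub, "name").text = pub_name
            ET.SubElement(pub, "address").text = pub_addr
        elif kind == "2":
            ET.SubElement(book, "author").text = f"{first} {last}"
    return root

# Made-up sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (bookID TEXT, name TEXT, publisher TEXT);
CREATE TABLE publisher (name TEXT, address TEXT);
CREATE TABLE author (bookID TEXT, firstName TEXT, lastName TEXT);
INSERT INTO book VALUES ('b1', 'Sample Book One', 'Kluwer'),
                        ('b2', 'Sample Book Two', 'Other Press'),
                        ('b3', 'Sample Book Three', 'Kluwer');
INSERT INTO publisher VALUES ('Kluwer', 'Sample Address');
INSERT INTO author VALUES ('b1', 'Bob', 'Baker'), ('b1', 'Ann', 'Able'),
                          ('b3', 'Cy', 'Cole'), ('b2', 'Dee', 'Dunn');
""")
root = tag(conn.execute(SORTED_OUTER_UNION))
```

Because the rows arrive grouped by `bookID`, the tagger only needs to remember the current book, which is what keeps the tagging logic simple.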

XPERANTO Project Summary
• Goal is to publish O-R data in XML form
  – Default XML views
  – XML-QL for defining useful views
  – “Look Ma, no SQL!”
• Currently (re)building our prototype
  – View composition is our first stop
  – Updates in addition to queries
  – Queries over both data and metadata
  – Other needs for XML web sites...?

A Few Closing Remarks
• DB community must ensure that the web will support real queries…!
  – XML Schema and XML Query standards need ongoing input from DB researchers
  – Large-scale technologies needed for XML indexing, caching, querying, etc.
• DB community should also work on important underlying technologies
  – Publishing XML both from and to RDBMSs and ORDBMSs, for example!
