An Extension to XML Schema for Structured Data Processing
Presented by: Jacky Ma
Date: 10 April 2002
Presentation Outline
- The Problems
- Research Objectives
- The Schema Extension: MMX
- MMX Query System
- Discussion
- Conclusion
The Problems
- Mapping XML data into relational tables
  - Not natural to the XML structure
  - Efficient, but may not be an effective method
- Legacy application-specific structured data
  - Similar modeling but proprietary implementations
  - Not interoperable, and difficult to maintain
  - Lack of modular design, and thus difficult to combine into more complex data structures
- Meta-data can serve a wide range of needs, yet XML Schema is currently used solely for physical data validation
Research Objectives
- To facilitate more effective searching and storing of XML content by making use of meta-data (XML Schema)
- To propose a data-oriented model that allows different storage mechanisms, processing models, and query models on XML content
Our Approach: MMX
- Use meta-data to map XML data into structured data objects
- Define the structured data models “conceptually” and link the models to the XML document structure “syntactically”
- The meta-data is defined as an extension of XML Schema
- The extension is called MMX (Multi Model XML)
Program Driven vs. Data Driven
[Diagram labels: Raw Data; Data with Program Codes; Data with Modeling Information; Structured Data (XML); Program Driven; Data Driven]
- Program driven: information for processing is hard-coded in the program
- Data driven: processing instructions are hard-coded in the data?!
- MMX!
A Glance at the XML Data
A Glance at the Linked Schema
Schema Extension
- The extended schema is associated with a namespace
- The extended schema goes within a schema element, as in the example
- … specifies a single structure object instance
- Name association for elements and attributes
- Class hierarchies: … -> … -> finally to the structure specified in …
- Additional properties in …, … and …
- The schema writer has to know the structure model specification, while the XML writer only needs to know the given schema
Modeling
- An instance of an “MMX data object” can be viewed:
  - As an encapsulated information object accessible only from its root, and thus as a “single tree node”
  - As a mapping from root node, query method, and query parameters to the values at the leaf nodes
- Leaf nodes may contain any valid XML content, as long as it is defined in the Schema; i.e., a leaf may contain another “MMX data object”
- A query is modeled as a 3-tuple: [accessing-node, query-method, query-parameters]
  - Accessing-node is specified by an XPath
  - Query-method is specified as a string value
  - Query-parameters are multi-dimensional; their form depends on the current model
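
The 3-tuple query above can be sketched as a small immutable Java value class. This is only an illustration: the class name `MmxQuery`, its field names, and the XPath `/map/cities` are hypothetical and do not come from the slides, and the code uses current Java syntax rather than JDK 1.4.

```java
import java.util.Arrays;

// Hypothetical value class for the tuple [accessing-node, query-method, query-parameters].
final class MmxQuery {
    final String accessingNode;  // XPath of the root of the MMX data object
    final String queryMethod;    // method name as a string, e.g. "spatial-search"
    final Object[] parameters;   // number and type of parameters depend on the model

    MmxQuery(String accessingNode, String queryMethod, Object... parameters) {
        this.accessingNode = accessingNode;
        this.queryMethod = queryMethod;
        this.parameters = parameters.clone(); // defensive copy keeps the tuple immutable
    }

    @Override
    public String toString() {
        return "[" + accessingNode + ", \"" + queryMethod + "\", "
                + Arrays.toString(parameters) + "]";
    }
}

class MmxQueryDemo {
    public static void main(String[] args) {
        // The XPath below is an invented example, not taken from the presentation.
        MmxQuery query = new MmxQuery("/map/cities", "spatial-search", 3, 5);
        System.out.println(query);
    }
}
```

Keeping the parameters as an open-ended array reflects the point that their dimensionality is decided by the structure model, not by the query format.
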
Modeling (2)
[Diagram: Tree(1) and Tree(2) among XML elements, with points A and B]
- Tree(1) is accessible from point A. In some cases a query (e.g. [A, “spatial-search”, (3, 5)], assuming Tree(1) accepts a spatial search with two coordinates) may return point B as the answer, either as the XPath of B or as the XML subtree of B.
- From point B, the user may drill down the tree by issuing another query on Tree(2).
Query with and without MMX
- From the original XML data alone, we cannot assume the semantics of the data:
  - We can ONLY do XML-based queries such as XPath
  - We can do a spatial query ONLY IF we can map the data into an R-Tree
- After mapping the data into an R-Tree:
  - Spatial queries
    - Give me the point at (2,7)
    - Give me the point nearest to (4,4)
  - Nearest neighbor search
    - Give me the point nearest to “Franklin”
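
The three example queries can be illustrated with a minimal Java sketch. Note the assumptions: it uses a linear scan over a list as a stand-in for the R-Tree (same answers on small data, without the pruning), the class names `NamedPoint` and `PointIndex` are hypothetical, and the sample points, including the coordinates given to “Franklin”, are invented.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical named point; all names and coordinates used below are invented sample data.
final class NamedPoint {
    final String name;
    final double x;
    final double y;

    NamedPoint(String name, double x, double y) {
        this.name = name;
        this.x = x;
        this.y = y;
    }
}

// Stand-in for a spatial index: scans every point instead of using a real R-Tree.
final class PointIndex {
    private final List<NamedPoint> points;

    PointIndex(List<NamedPoint> points) {
        this.points = points;
    }

    // "Give me the point at (x, y)": exact match, or null if there is none.
    NamedPoint pointAt(double x, double y) {
        for (NamedPoint p : points) {
            if (p.x == x && p.y == y) {
                return p;
            }
        }
        return null;
    }

    // "Give me the point nearest to (x, y)".
    NamedPoint nearestTo(double x, double y) {
        return nearest(x, y, null);
    }

    // "Give me the point nearest to <name>": the nearest other point to the named one.
    NamedPoint nearestToName(String name) {
        for (NamedPoint p : points) {
            if (p.name.equals(name)) {
                return nearest(p.x, p.y, p);
            }
        }
        return null;
    }

    // Picks the point with the smallest squared Euclidean distance, skipping 'exclude'.
    private NamedPoint nearest(double x, double y, NamedPoint exclude) {
        NamedPoint best = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (NamedPoint p : points) {
            if (p == exclude) {
                continue;
            }
            double dx = p.x - x;
            double dy = p.y - y;
            double dist = dx * dx + dy * dy;
            if (dist < bestDist) {
                bestDist = dist;
                best = p;
            }
        }
        return best;
    }
}

class SpatialQueryDemo {
    public static void main(String[] args) {
        PointIndex index = new PointIndex(Arrays.asList(
                new NamedPoint("Franklin", 2, 7),
                new NamedPoint("Ashland", 5, 4),
                new NamedPoint("Dover", 9, 1)));
        System.out.println(index.pointAt(2, 7).name);
        System.out.println(index.nearestTo(4, 4).name);
        System.out.println(index.nearestToName("Franklin").name);
    }
}
```

None of these queries can be answered from the XML tree by XPath alone; they only become available once the schema tells the system that the elements carry coordinates.
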
Processing
- Users might not know the “type” of a node, and do not need to; they are interested in what they can do with it
- Users retrieve the list of possible operations by issuing a LIST-OPERATION method to the root element of an MMX object
- Possible operations may include queries, updates, and other model-specific operations
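
The discover-then-invoke flow can be sketched in Java as follows. The names `Operation`, `MmxObject`, `register`, and `invoke` are hypothetical, chosen for illustration only; the slides name the LIST-OPERATION method but do not define its API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical handler type: takes the query parameters and returns a result as text.
interface Operation {
    String apply(Object... params);
}

// Hypothetical MMX object that advertises its operations by name.
final class MmxObject {
    private final Map<String, Operation> operations = new LinkedHashMap<String, Operation>();

    void register(String name, Operation operation) {
        operations.put(name, operation);
    }

    // Corresponds to LIST-OPERATION: the client learns what it can do
    // without knowing which structure model sits behind the node.
    List<String> listOperations() {
        return new ArrayList<String>(operations.keySet());
    }

    String invoke(String name, Object... params) {
        Operation operation = operations.get(name);
        if (operation == null) {
            throw new IllegalArgumentException("Unsupported operation: " + name);
        }
        return operation.apply(params);
    }
}

class ListOperationDemo {
    public static void main(String[] args) {
        MmxObject object = new MmxObject();
        // Placeholder handlers that only echo their parameters.
        object.register("spatial-search", params -> "spatial-search" + Arrays.toString(params));
        object.register("insert", params -> "insert" + Arrays.toString(params));

        for (String name : object.listOperations()) {
            System.out.println("available: " + name);
        }
        System.out.println(object.invoke("spatial-search", 3, 5));
    }
}
```
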
MMX Query System
- To show that the schema, modeling, and processing of the MMX extension are workable
- To illustrate how it assists in querying XML data
- To serve as a platform for testing implementations of arbitrary structured models
- Implemented with JDK 1.4
System Design
[Diagram labels: XML Schema; R-Tree Schema; R-Tree, VP-Tree, X-Tree, …; DOM; MMX Document; Parse Schema; Fetch Classes; MMX Element; Node Data; Abstract MMX Element; Extends class; Maps; (Partly) Defines; Clients]
- The abstract class defines the common interface that has to be implemented in each MMX Element, such as LIST-OPERATION, QUERY, BUILD, etc.
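
A minimal Java sketch of this design follows, under several assumptions. The method signatures on `AbstractMmxElement` are guesses, since the slides name the operations but not their parameters. `RTreeElement` is a toy that stores points in a list rather than a real R-Tree, its `range-count` query is invented, and `ElementRegistry` approximates the "Fetch Classes" step with a name-to-factory map. The code uses current Java syntax rather than JDK 1.4.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Common interface for every structure model (signatures are assumptions).
abstract class AbstractMmxElement {
    abstract List<String> listOperations();                   // LIST-OPERATION
    abstract void build(List<double[]> nodeData);             // BUILD from the XML node data
    abstract String query(String method, double... params);   // QUERY
}

// Toy element standing in for an R-Tree model; it keeps the points in a plain list.
final class RTreeElement extends AbstractMmxElement {
    private final List<double[]> points = new ArrayList<double[]>();

    @Override
    List<String> listOperations() {
        return Arrays.asList("range-count");
    }

    @Override
    void build(List<double[]> nodeData) {
        points.clear();
        points.addAll(nodeData);
    }

    // "range-count" takes (xMin, yMin, xMax, yMax) and counts the points inside the box.
    @Override
    String query(String method, double... params) {
        if (!"range-count".equals(method) || params.length != 4) {
            throw new IllegalArgumentException("Unsupported query: " + method);
        }
        int count = 0;
        for (double[] p : points) {
            if (p[0] >= params[0] && p[1] >= params[1]
                    && p[0] <= params[2] && p[1] <= params[3]) {
                count++;
            }
        }
        return String.valueOf(count);
    }
}

// Approximates "Fetch Classes": maps a model name found in the schema to an element factory.
final class ElementRegistry {
    private final Map<String, Supplier<AbstractMmxElement>> factories =
            new HashMap<String, Supplier<AbstractMmxElement>>();

    void register(String modelName, Supplier<AbstractMmxElement> factory) {
        factories.put(modelName, factory);
    }

    AbstractMmxElement create(String modelName) {
        Supplier<AbstractMmxElement> factory = factories.get(modelName);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown model: " + modelName);
        }
        return factory.get();
    }
}

class SystemDesignDemo {
    public static void main(String[] args) {
        ElementRegistry registry = new ElementRegistry();
        registry.register("r-tree", RTreeElement::new);

        // Clients only see the abstract type, never the concrete model class.
        AbstractMmxElement element = registry.create("r-tree");
        element.build(Arrays.asList(new double[] {1, 1}, new double[] {2, 7}, new double[] {9, 9}));
        System.out.println(element.listOperations());
        System.out.println(element.query("range-count", 0, 0, 5, 8));
    }
}
```

Adding a VP-Tree or X-Tree model in this sketch would mean one more subclass and one more `register` call, which is the modularity the design aims for.
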
Discussion - Pros
- Compatible with the relational approach, and supersedes it
- Modular design promotes reusability and maintainability
- XML “flattens” the legacy structured data, making it text-editable and easy to transport and process by different systems
Discussion - Cons
- There is no generic syntax that precisely describes all kinds of structure models
- An XML file is often larger than the legacy data file
- Each structure model needs additional implementation effort
- The schema specification quickly grows longer as the number of supported models increases
Conclusion
- Propose a representation to encapsulate data structures
- Describe XML data with the Schema conceptually as well as syntactically
- Map legacy structure models into the Schema, and map XML data to the structure models through the Schema
- A structured data repository with increased interoperability, reusability, and transportability
Q&A