The Structured-Element Object Model for XML

Slides:

Advertisements

Similar presentations

XML-XSL Introduction SHIJU RAJAN SHIJU RAJAN Outline Brief Overview Brief Overview What is XML? What is XML? Well Formed XML Well Formed XML Tag Name.

Advertisements

Advanced XSLT. Branching in XSLT XSLT is functional programming –The program evaluates a function –The function transforms one structure into another.

XML: Extensible Markup Language

1 Introduction to XML. XML eXtensible implies that users define tag content Markup implies it is a coded document Language implies it is a metalanguage.

Visual Web Information Extraction With Lixto Robert Baumgartner Sergio Flesca Georg Gottlob.

The Structured-Element Object Model for XML Committee Members Prof. Y.S. Moon(Chairman) Prof. Irwin King Prof. Michael Lyu(Supervisor) Oral Defense for.

UML CASE Tool. ABSTRACT Domain analysis enables identifying families of applications and capturing their terminology in order to assist and guide system.

XML –Query Languages, Extracting from Relational Databases ADVANCED DATABASES Khawaja Mohiuddin Assistant Professor Department of Computer Sciences Bahria.

C++ fundamentals.

1 Advanced Topics XML and Databases. 2 XML u Overview u Structure of XML Data –XML Document Type Definition DTD –Namespaces –XML Schema u Query and Transformation.

Overview of XPath Author: Dan McCreary Date: October, 2008 Version: 0.2 with TEI Examples M D.

XML files (with LINQ). Introduction to LINQ ( Language Integrated Query ) C#’s new LINQ capabilities allow you to write query expressions that retrieve.

DHTML. What is DHTML?  DHTML is the combination of several built-in browser features in fourth generation browsers that enable a web page to be more.

Aurora: A Conceptual Model for Web-content Adaptation to Support the Universal Accessibility of Web-based Services Anita W. Huang, Neel Sundaresan Presented.

An Extension to XML Schema for Structured Data Processing Presented by: Jacky Ma Date: 10 April 2002.

REFACTORING Lecture 4. Definition Refactoring is a process of changing the internal structure of the program, not affecting its external behavior and.

XSLT for Data Manipulation By: April Fleming. What We Will Cover The What, Why, When, and How of XSLT What tools you will need to get started A sample.

XML and its applications: 4. Processing XML using PHP.

WORKING WITH XSLT AND XPATH

1 CIS336 Website design, implementation and management (also Semester 2 of CIS219, CIS221 and IT226) Lecture 6 XSLT (Based on Møller and Schwartzbach,

1 HKU CSIS DB Seminar: HKU CSIS DB Seminar: Web Services Oriented Data Processing and Integration Speaker: Eric Lo.

Chapter 27 The World Wide Web and XML. Copyright © 2004 Pearson Addison-Wesley. All rights reserved.27-2 Topics in this Chapter The Web and the Internet.

RELATIONAL FAULT TOLERANT INTERFACE TO HETEROGENEOUS DISTRIBUTED DATABASES Prof. Osama Abulnaja Afraa Khalifah

WEB BASED DATA TRANSFORMATION USING XML, JAVA Group members: Darius Balarashti & Matt Smith.

VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation An Introduction to XML.

Copyright 2003 Scott/Jones Publishing Standard Version of Starting Out with C++, 4th Edition Chapter 13 Introduction to Classes.

Introduction to XML This presentation covers introductory features of XML. What XML is and what it is not? What does it do? Put different related technologies.

Efficient RDF Storage and Retrieval in Jena2 Written by: Kevin Wilkinson, Craig Sayers, Harumi Kuno, Dave Reynolds Presented by: Umer Fareed 파리드.

Object-Oriented Modeling: Static Models. Object-Oriented Modeling Model the system as interacting objects Model the system as interacting objects Match.

XML and Database.

Web Technologies Lecture 4 XML and XHTML. XML Extensible Markup Language Set of rules for encoding a document in a format readable – By humans, and –

Review of Parnas’ Criteria for Decomposing Systems into Modules Zheng Wang, Yuan Zhang Michigan State University 04/19/2002.

Copyright 2007, Information Builders. Slide 1 iWay Web Services and WebFOCUS Consumption Michael Florkowski Information Builders.

 XML derives its strength from a variety of supporting technologies.  Structure and data types: When using XML to exchange data among clients, partners,

Jackson, Web Technologies: A Computer Science Perspective, © 2007 Prentice-Hall, Inc. All rights reserved Chapter 7 Representing Web Data:

XML Schema – XSLT Week 8 Web site:

Topic 4: Distributed Objects Dr. Ayman Srour Faculty of Applied Engineering and Urban Planning University of Palestine.

Unit 4 Representing Web Data: XML

Visit for more Learning Resources

Querying and Transforming XML Data

Object-Oriented Analysis and Design

SOFTWARE DESIGN AND ARCHITECTURE

System Design Ashima Wadhwa.

The Object-Oriented Thought Process Chapter 1

XML QUESTIONS AND ANSWERS

Software Design and Architecture

XML in Web Technologies

Data Modeling II XML Schema & JAXB Marc Dumontier May 4, 2004

PHP / MySQL Introduction

Database Processing with XML

Relational Algebra Chapter 4, Part A

Web Data Extraction Based on Partial Tree Alignment

Top Reasons to Choose Angular. Angular is well known for developing robust and adaptable Single Page Applications (SPA). The Application structure is.

Relational Algebra 461 The slides for this text are organized into chapters. This lecture covers relational algebra, from Chapter 4. The relational calculus.

Computer Programming.

Chapter 7 Representing Web Data: XML

Introducing HTML & XHTML:

Chapter 27 WWW and HTTP.

File Systems and Databases

Lecture 1: Multi-tier Architecture Overview

ISC321 Database Systems I Chapter 10: Object and Object-Relational Databases: Concepts, Models, Languages, and Standards Spring 2015 Dr. Abdullah Almutairi.

Relational Algebra Chapter 4, Sections 4.1 – 4.2

MANAGING DATA RESOURCES

Presented by: Jacky Ma Date: 11 Dec 2001

Introduction to Data Structure

Object-Oriented PHP (1)

XML and its applications: 4. Processing XML using PHP

A Semantic Peer-to-Peer Overlay for Web Services Discovery

Presentation transcript:

The Structured-Element Object Model for XML Oral Defense for the degree of Master of Philosophy Presented by: Ma Chak Kei, Jacky welcome guests Introduce myself and thesis title Committee Members Prof. Y.S. Moon(Chairman) Prof. Irwin King Prof. Michael Lyu(Supervisor)

The Problem XML flexibly represents semi-structured data, however, it lacks the concepts of data encapsulation and object methods. Programmers have to identify each pieces of data to do the corresponding processing. There is a modeling gap between XML representation and OO programming representation. As the thesis title suggested, we are working with modeling problem on XML Now, at the very beginning of my presentation, let us firstly have a glance on what are the problems that we are talking and we are trying to solve As stated on the slide, XML can represent semi-structured data in a flexibly way, however it lacks the concepts of data encapsulation and object methods. This increases the burden of application programmers that they have to identify each pieces of data from a large XML document and retrieve the data from a generic tree structure, before they can take any actions on the data. In short, there is a modeling gap between XML data representation and the OO programming representation

Motivation XML data processing is important due to the increasing uses of XML in Internet applications and databases applications OO programming languages are widely used in developing for Internet applications Inadequate research on using XML data in OO programs XML Schema is popular in defining XML meta-data, but the current practice is limited to physical data validation Our objective is to construct a data model that better facilitate XML usage in OO programming, which supports data encapsulation and object methods. The model makes use of schema to define objects, and includes mechanisms for parsing and querying these objects After seeing what’s the problem we are working with, I would also like to state the motivation of our research, and our major contributions on next slide Here are several motivations: First is that XML data processing becomes more and more important in Internet applications and database applications development This is closely related to the second point, that is OO programming favors software reusability and modular design, and thus widely adopted in Internet applications development However, it seems that there is inadequate researches on using XML data in OO programming What’s more, is that XML schema is often under-utilized. While it may provide rich information on the XML data, it is currently used only in data validation. To put it into one, our objective is to construct a data model that better facilitate XML usage in OO programming, which supports data encapsulation and object methods. The model will make use of schema to define objects. It also includes mechanisms for parsing and querying these data objects.

Contribution We propose the Structured-Element Object Model (SEOM) for handling XML data. The Model is extensible and flexible for using XML data in OO programming We propose the SEOM Schema for mapping XML Element data into SEOM class objects. The schema is generic to allow flexible representation of Element objects in XML data We propose a query wrapper technique for querying Structured-Element objects, which wraps the information of query details in an XML message We extend the XPath function to perform queries on Structured-Element objects We implement a Web-based SEOM Document Query System to demonstrate the feasibility of the model Before going into the details of our research, I would like to put our contributions in the front, such that you may easier to understand our research There are five major contributions: We propose the Structured-Element Object Model (SEOM) for handling XML data. The Model is extensible and flexible for using XML data in OO programming We propose the SEOM Schema for mapping XML Element data into SEOM class objects. The schema is generic to allow flexible representation of Element objects in XML data We propose a query wrapper technique for querying Structured-Element objects, which wraps the information of query details in an XML message We extend the XPath function to perform queries on Structured-Element objects, and finally, We implement a Web-based SEOM Document Query System to demonstrate the feasibility of the model

Presentation Outline Overview of XML and XML Data Modeling Examples on XML Data Modeling The Structured-Element Object Model Data Modeling Schema Modeling Classes Architecture Parsing and Querying Web-Based SEOM Document Query System Evaluation & Conclusion After the brief introduction on our research, here I would like to give a presentation outline for the later parts that covers the details of the research In the following presentation, we will first have an overview on XML and XML data modeling. We will spend some times on different XML data modeling to show the importance of that Then we will go through the SEOM model, includes its data modeling, schema modeling, classes architecture, and parsing and querying processes After that I will introduce a web-base SEOM document query system, which is a prototype system to show the feasibility of the our model Finally, we will have some evaluation and conclusion for the research, and that ends our presentation

Overview of XML and XML Data Modeling Now, let us begins an overview of XML and XML data modeling

Overview of XML XML is a markup language for describing data. Its specification describes the XML data format and grammar. It can flexibly represent semi-structured data The family of XML technologies offers rich supporting functionalities, e.g.: XPath, Schema, DOM, namespaces, etc. XML provides a data exchange and representation standard XML favors cross-platform development of Internet applications For XML, it is a markup language for describing data. Its specification describes the XML data format and grammar. It can flexibly represent semi-structured data There is a family of XML technologies that offers rich supporting features. The family includes XPath, XML Schema, DOM, namespaces and etc. With its open standard and good supporting technologies. XML becomes a data exchange and representation standard And is widely adopted in cross-platform development of Internet applications

An XML Example Here you may see an XML example. We won’t spend too long on this, but having a glance would make it more clear. You may see that an XML document is indeed a text document. The document embeds some tags, which makes the document into segments The tags may include text contents, attributes, Or other tags in a nested manner The overall effect is to create an hierarchically structured document.

Overview of XML Modeling A data model describes the structure, function, and constraints of the data; it affects how the data are manipulated in programs and how they are queried Two basic XML modeling: Relational Model Document Object Model Legacy application-specific data structures E.g. various indexing trees Similar modeling but proprietary implementation Not interoperable, and difficult to maintain Non-modular design and thus difficult to combine into more complex data structure For XML modeling, first we have to know what is data model A data model describes the structure, function, and constraints of the data; it affects how the data are manipulated in programs and how they are queried Based solely on the XML tags and contents, there are two common XML modeling, that is the relational model and the document object model On the other hand, in application-specific domain, XML may be used to represent any proprietary data structures, such as various indexing trees They may have similar modeling, but for a proprietary implementation, they are not interoperable and difficult to maintain, And they do not possess a modular design and therefore difficult to combine into a more complex data structure

XML Modeling – Example <key N="7" S="3" E="16" W="1"/> <node id="1"> <key N="7" S="3" E="6" W="1"/> <key N="7" S="5" E="2" W="1"/> <data x="1" y="5">itemA</data> <data x="2" y="7">itemB</data> </node> <node id="2"> <key N="4" S="3" E="6" W="4"/> <data x="4" y="3">itemA</data> <data x="6" y="4">itemC</data> <key N="6" S="4" E="16" W="8"/> <key N="5" S="4" E="8" W="9"/> <data x="8" y="4">itemG</data> <data x="9" y="5">itemD</data> <key N="6" S="5" E="11" W="16"/> <data x="11" y="5">itemF</data> <data x="16" y="6">itemE</data> To see how the modeling affects data queries, let’s first see an example Here is an XML document, it consists of mainly three types of elements, namely “key”, “node”, and “data” On the right hand side is a visualization of the document, and the three elements are colored into red, green, and blue balls respectively For each element, there are some attributes And for the data element, it also has a string value as its content

Example – Relational Model Under the relational model, data are put into a table, regardless the difference in its role. Information are then retrieved using the “relation” of fields with SQL statements. Structural information in XML data is lost id1 N1 S1 E1 W1 id0 N0 S0 E0 W0 x y data 1 7 3 6 5 2 itemA itemB 4 itemC 16 8 9 itemD itemG 11 itemE In data management, the relational model is well established However, it is not natural to XML structure and Some features in XML is not addressed properly in a relational database Under the relational model, data are put into a table, regardless the difference in its role Information are then retrieved using the relation of fields In many cases, the structural information in the original document is lost. For those which can store the structural information, however, the number of tables usually grows into a large number and the complexity increase dramatically

Example – Document Object Model DOM maintains the structure of XML data: Retrieve the node “data” containing attribute “x=8” //data[@x=‘8’] Can retrieve parent-node, sibling-node, etc. /node[1]/node[1]/key/following-sibling::* It is based on a generic tree structure, which does not require any assumption on the data Since it does not assume any knowledge on the data, all data are treated equally and little can be done on optimizing the manipulations On the other hand, the Document Object Model is designed for XML and it can maintains the structure of XML document In DOM, it addresses the XML data types, for example, you may retrieve the element node “data” which has an attribute x equals to 8; or you may navigate the document tree up, down, or horizontally It is advantageous that it is based on the generic XML tree structure and does not require any assumption on the data, however, this means that it can only treat all the data equally, and therefore little can be done on tailoring solution for different data

Example – Specific data structure, R-Tree For the same piece of XML data, if we know that represents an R-Tree structure, we can build a corresponding indexing structure in memory, and define meaningful methods on it: Spatial Queries Give me the point at (2,7) Give me the point nearest to (4,4) Give me the points bounded by (2,2) to (4,4) Nearest Neighbor Search Give me the point nearest to “itemB” For the last modeling of the same example, assume that we know the data represents an R-Tree. We know how are the attributes representing various parameters of the R-Tree, then we can build a corresponding indexing structure in memory and define meaningful methods on it For an R-Tree, there are spatial queries such as Querying a data on a exact position Or querying the data point in the proximity of a position Or search an given area for data points Or even search for the nearest neighbor of any given data point Some of these operations are not possible to be issued in the previous models directly, and some of the operations may take a much longer time to complete Only by knowing the semantics of data or its intrinsic orders, we can use a appropriate method to search data effectively

Comparison From the original XML data, we could not assume the semantics of the data: We can do XML-based queries as in XPath Or we can do queries based on the relationships in the tags as in the relational model From a model-based approach, By using meta-data, we can define the model of a piece of XML data We can define non-generic methods on data for a known model, such as the spatial queries in R-Tree model Beside legacy data structure (like R-tree), there are also business data objects that may have its own data representation and manipulating/querying methods As a little comparison for the three modeling in the example We may see that, as we could not assume the semantics of data from the original XML data, thus we can only performs queries based on relationship in the tags, such as in XPath or in relational model On the other hand, by using meta-data, we can define the model of a piece of XML data. Thus we can define non-generic methods on the data, such as the spatial queries in the R-Tree model. Moreover, besides these legacy data structures, although we won’t give an example here, but there are also business data objects that may need proprietary data representation and accessing methods, and our model will apply to them too.

The Structured-Element Object Model After an overview on XML and XML modeling, now we are going to see how we define a model that can adapt to various models easily, and thus improve the efficiency in querying the XML data

SEOM – General Concepts Data Representation Physical Data Representation – how the data are stored as files Human-friendly: tables, hierarchical relationship Machine-friendly: indexing trees Logical Data Representation – how the data are represented as data objects E.g. a “tree object”, a business logic object, etc. Data Binding – the process of translating physical data representation to logical data representation Data Access – retrieve a particular record from a data object, e.g., search for a data point from a “search tree” object To begin with, I will briefly talk about some general concepts used. First is data representation, when we say physical data representation, it means how the data are stored as files, here we mean how are the XML document structured. It may have a human-friendly structure like tables, or hierarchical relationship that is easy to understand by a human reader. Or it may store the relational tables as indexing trees, which is not to be understandable by human readers, but can support bulk loading on the machine-side For logical data representation, it means how the data are represented as data objects and presented to the programmer, examples are tree objects, or business logic objects, etc. Second is data binding, this is the process of translating physical data representation to a logical data representation And lastly, data access is to retrieve a data record from a data object, in SEOM, it usually means searching for a data object within an Element object.

SEOM – Modeling Simple XML Data Model Document Object Model Document, Element, Attribute, Character Data Document Object Model Including Node, NodeList, AttributeSet for better management SEOM Data Model Including an additional SElement type Now, I will introduce the SEOM modeling. From the XML specification, it defines a simple XML data model, that is the data types available in XML, namely, document, element, attribute, and character data. The Document object model is built upon the simple XML data model, it adds new data types such as Node, nodelist, and attributeSet for better data management. The SEOM data model is based on the Document Object Model, we add an additional SElement type to the model to achieve our objectives.

SEOM – SElement Structured-Element (SElement) is an extension of DOM Element An data object encapsulates private information XML representation is defined by schema, including the internal branching and the child nodes Act as a mapping from data object root to leaf, with query method and query parameters as the selection criteria A query is modeled as a 3-dimension tuple: [node, method, parameters] node is specified by XPath method is specified by a string value parameters are specified in a multi-dimensional tuple, which varies for different methods Therefore, the most important concept in SEOM is the SElement. SElement is the short form of Structured-Element. It is an extension of the DOM Element type. It differs from DOM Element mainly in that it encapsulates an internal data structure that carries private information It is not mapped to a single XML Element, instead, its XML representation is defined by a schema; the schema defines the XML branching and the XML nodes to be converted into a single SElement In essence, it acts as a mapping from the data object’s root to the leaf, with querying methods and parameters as the selection criteria. A more formal representation for query is to model it as a 3-dimension tuple, Includes the node, the method, and the parameters. Here, the node is specified in XPath representation The method is an string identifier And the parameters are specified as a multi-dimensional tuple that varies for different methods.

SEOM – SElement XML - DOM Structure SEOM Structure The figure illustrates the relation between an XML document and an SElement. The left hand-side shows a structure of XML document tree or a DOM structure, which are identical. Here, we define the small blue node as the root of an SElement And the small red nodes as the child nodes When we construct an SElement, the branches and nodes between the root and leaf will be converted into internal data structure of the SElement and is invisible from outside. The tree may considered to be shortened, and all the leaf nodes are now at the same level and directly under a single SElement. SElement Root of data object in XML leaf of data object

SEOM – SElement Major methods (in addition to DOM Element’s method): getTypeName() – get the type name getSchema() – get the schema document queryMethods() – query for available query methods query() – submit query to the SElement path() – submit an XPath query to the SElement In addition to methods defined under DOM Element, the SElement has some additional interfaces. getTypeName will get the name of the model getSchema will return the schema element of the current SElement queryMethods will retrieve the list of available query methods For query, it submits a query to the SElement And path is responsible for submitting an XPath query

SEOM – Schema Provides meta data for describing the grammar of XML document to match a target model Defines: the range of XML segments to be transformed from original DOM tree to SEOM SElement objects the internal branching structure, e.g. number of branches, ordering, etc. the data types of leaf nodes the mapping from XML element values and attribute values to required parameters of the target model The extended schema is associated with a namespace with prefix “seom” As we had mentioned, the SElement is defined by a schema. We use schema provide meta data for describing the grammar of XML document such that the XML document can match a target model It defines the range of XML segments to be transformed to an SElement, that is the tag names of SElement root, internal nodes, and leaf nodes It also defines how the internal nodes branched, such as the number of branches and the ordering, etc. It defines what kind of data is allowed at the leaf nodes, and also defines the mapping from XML element values and attribute values to the parameter required by the target model In the XML schema file, we add the namespace prefix “seom” to tags that belongs to SEOM such that it won’t mixed up with the W3C XML schema.

SEOM - Schema Major schema elements for SElement: seom:selement – encapsulate an SElement definition seom:rootNode – defines the root of the SElement seom:internalNode – defines internal nodes of the SElement seom:leafNode – defines leaf nodes of the SElement seom:attribute – defines the attributes in root node, internal nodes and leaf nodes; may specifies model parameters by values or by referencing XML attribute values seom:value – defines the structure under a root node, internal nodes and leaf nodes; may use XML schema elements to refine the constraints Here are the major schema elements that defines an SElement selement – encapsulate an SElement definition rootNode, internalNode, leafNode– defines the XML elements to be mapped into SElement attribute – defines the attributes in the nodes; and may specifies parameters for the model by giving values directly or by referencing an attribute in the XML data value – defines the structure and content under a node; it may use XML schema elements to refine the constraints

A Glance of XML Data Here I show a piece of simple XML data to demonstrate the idea There are “map” element, “node” and “data” elements, “country” and “city” elements. Indeed, each “map” element corresponds to a r-tree data object, node and data forms the internal data structure of the data object, and country and city are the user data Here, the outer one is a world map, while one of the leaf node of the world map is US, and US contains another local map for itself.

A Glance of The Linked Schema For a schema that links to the previous XML data, we may see that it defines an R-tree model class. The rootNode of the SElement is “Map”, the internal node is “node”, and the leafNode is “data” For the rootNode, we specify the parameters branch factor and leaf number by giving a value. And for internal nodes and data nodes, the attribute element makes a linkage from the model’s parameter to an attribute in the XML document In the “value” element, you may see that a root node or internal node may contains either an internal node or a leaf node And a leafNode may contains any valid XML constructs.

SEOM – Implementation Issues A family of Java classes materialize the models. Instances of data objects are built from the classes with data from XML To construct an SEOM Document instance, it involves five types of classes: Classes inherited from DOM SEOM Document class Abstract SElement class Generic SElement class Implement SElement classes Document processing Parsing Query The Structured-Element Object Model is implemented by a family of Java classes. Data objects are instances of these Java classes with data from the XML document. There are five types of classes, includes Classes inherited from DOM SEOM Document class Abstract SElement class Generic SElement class Implement SElement classes We will address these classes one by one in the following slides After that, we may also look into the parsing and querying processes

SEOM – Classes Classes inherited from DOM SEOM Document class Nodes, Elements, Attributes, etc. Form the basic backbone of a DOM tree SEOM Document class Corresponds to an XML document with additional interface for the SEOM-extended features Constructor – take a DOM document and an XML Schema as parameters. Matching DOM elements will be send to a SElement constructor Query – three query operations are implemented at this level: DOM() – retrieve all direct children of a target node Data() – retrieve the sub-tree of a target node in XML form query() – generic interface for accepting user queries Since SEOM is extended from DOM, it is obvious that the classes from DOM are used. For example, the Nodes, elements, attributes are widely used and they form the basic backbone of a DOM tree The SEOM Document class overrides the DOM Document class to include additional interfaces for the SEOM-extended features It has a constructor that takes a DOM document and an XML schema as parameters. DOM elements that has matching definition in the schema will be converted to SElements. The SEOM Document class also implements several query functions. Include a DOM function that retrieve all direct children of a target node, a Data function to retrieve the sub-tree of a target node in XML form, and a query function that act as a generic interface to accept user queries.

SEOM – Classes Abstract SElement Class Generic SElement Class is the abstract superclass for all SElement data types extends the DOM Element class to inherit its methods defines abstract methods query() and queryMethod() Generic SElement Class the only SElement class accessible to programmers can instantiate an SElement object wraps an implementation SElement class fetch the needed implementation class make the actual class transparent to the programmers handle exceptions in creating SElement Abstract SElement Class is the abstract superclass for all SElement data types It extends the DOM Element class to inherit its methods and define the abstract methods query() and queryMethod() For Generic SElement Class, it is the only SElement class accessible to programmers It can be used to instantiated an SElement object by wrapping an implementation SElement class It fetches the needed implementation class, handle exceptions in creating SElement Then it can make the actual implementation class transparent to the programmers

SEOM – Classes Implementation SElement Classes It is indeed a group of classes, each class corresponds to one specific model It has an internal data structure (instead of a DOM tree) to hold the data It implements the constructor method to load data from XML to its internal data structure It implements the query methods to fetch data from the internal data structure Implemented classes: An R-tree Class with exact search, range search, and k-nearest neighbor search A Table Class with limited “select-from” clauses For Implementation SElement Classes, It is indeed a group of classes, that each class corresponds to one specific model These classes use private data structure instead of a DOM tree to hold the data They implement the constructor method to load data from XML to its internal data structure And the query methods to fetch data from the internal data structure In our system, we had two implemented classes First is an R-tree Class with exact search, range search, and k-nearest neighbor search And second is a Table Class that can perform some “select-from” clauses

SEOM – SElement Classes This figure gives an overview on the SElement classes First is that the generic SElement class and Implementation SElement classes are subclasses of Abstract SElement Next is that when a new SElement is needed, the generic SElement constructor is called. That will instantiate a new generic SElement object, the object will fetch the required implementation class instantiate an SElement of that class as a member of itself. Finally the generic SElement object will be returned to the programmer.

Parsing After seeing the classes architecture, we will look into the parsing process. First, we need an XML Source as well as a SEOM Schema They are processed by a DOM parser to become Data Element and Schema Element respectively Both of them are passed into the constructor of SEOM Document The SEOM Document will analyze the data element and the schema element, Find elements that are declared to be SElement and send them to the generic SElement constructor The generic SElement constructor in turns invokes the implementation SElement and wraps the resulting SElement Finally, the returned generic SElement will replace the original Elements in the SEOM document

Query Two approaches: query wrapper and XPath Query Wrapper XPath Based on exchanging XML messages Suitable for interactive querying between client and server XPath Extended the W3C XPath with additional function Suitable for pointing and referencing nodes for direct use After parsing the SEOM Document, it will be ready for being queried. In Structured-Element Object Model, we have two query approaches, one is named query wrapper, and another is an extended version of XPath The query wrapper is based on exchanging XML messages and more suitable for interactive querying between client and server The XPath approach extends the W3C XPath with an additional function, and is more suitable for pointing and referencing nodes for direct uses

Query Wrapper Skeleton of query wrapper: Querying processes: <query path=“” queryMethod=“”></query> path specifies the target node using unique XPath expression queryMethod specifies the name of method to be called Querying processes: An query wrapper with empty queryMethod will retrieve the list of available query methods The query wrappers for each query types will be returned as a NodeList The user fill the parameters of a selected query wrapper and submit the query Individual results are wrapped in <result> Elements; all <result> Elements are grouped under a single <results> Element; the results may in form of: simple values (string, number) composite values (XML Data) child nodes of current SElement (wrapped in <node> element and specified in XPath) Query wrapper is an ordinary XML message, it’s skeleton is shown here; And “query” element, with two attributes, “path”, and “queryMethod”. Where path specifies the target node by using the unique XPath expression The queryMethod is used to specify the name of method to be called. The query process may begin by sending an query wrapper with empty “queryMethod” to the server. This will retrieve a list of available query methods in form of NodeList of query wrappers. The client then select one of the query wrappers, fill the required parameters, and submit the query. Results of query are wrapped in <result> Elements, and multiple <result> elements are grouped under a single <results> element Results of query may in form of simple values, composite values, or child nodes of the SElement

Query Wrapper Exact Range KNN An R-Tree SElement A Query Client <queries> <query path=”/rtree” queryMethod=”exact”> <x/> <y/> </query> <query path=”/rtree” queryMethod=”range”> <x1/> <x2/> <y1/> <y2/> <query path=”/rtree” queryMethod=”knn”> <point/> <k/> </queries> <results path="/rtree" queryMethod="exact"> <result> <node path="/rtree/data[1]"/> <node path="/rtree/data[3]"/> </results> <query path=”/rtree” queryMethod=”exact”> <x>3</x> <y>4</y> </query> <query target=“/rtree” queryMethod=“”> </query> Here we have an animation for the query process First, the client submit an query wrapper with empty queryMethod The R-Tree SElement receive the query wrapper, There are three types of queries implemented in this SElement, that is “exact”, “range”, and “knn”. Therefore it returns a list of available queries, each with its required parameters. The client select the “exact” query, fills the parameter “x=3” and “y=4”, and submit this query Finally the query is processed and the client is happy now An R-Tree SElement A Query Client

Extended XPath In XPath, there are functions for manipulating strings, numbers, and Booleans. We introduce a function to allow queries to be made to SElements. The basic query form is to specify a target SElement node, a method name, and a set of parameters in name-value pairs, e.g. query(“/document/selement”, “exact”, “x=3”, “y=4”) A more common use of XPath function is to select nodes with predicate The predicate is added as a filter to the context node, i.e., the leaf nodes of SElement /document/selement/data[query(“exact”, “x=3”, “y=4”)] The function itself results in a boolean value. It takes the context position implicitly and evaluates the query according to that In another approach, we extends the XPath to do querying. Function is a feature for XPath to manipulate strings, numbers, and booleans. We introduce a function to allow queries to be made to SElement The basic query form is to specify a target SElement node, a method name, and a set of parameters in name-value pairs A more common use of XPath function is to select nodes with predicate, that is to do a filtering to the context node. A difference rise here since we have to work on the leaf node of SElement instead of the SElement itself. The function will take the context position, call the parent SElement to evaluate the query and returns a boolean value. For a true value, the leaf node will be included in the result set, While for a false value, the leaf node will be excluded.

A Web-Based SEOM Document Query System We have go through the modeling, schema, classes and architecture of the Structured-Element Object Model. Now we will see a web-based SEOM document query system

Web-Based SEOM Document Query System Objective To demonstrate the feasibility of our model, including the schema, the parsing process, as well as the query process To illustrate how it assists in querying XML data To facilitate as the platform for testing the implementation of arbitrary structured models Implemented with JDK1.4 Available models: R-Tree, Table The web-based SEOM document query system is a prototype system to demonstrate the feasibility of our model, and as the test bed of schema, parsing process, and query process. Not only it can illustrate how the model helps in querying XML data, It also facilitate as the platform for testing the implementation of arbitrary models The system is implemented in JDK1.4, and currently there are two available models R-Tree and Table The system is extensible an more models can be added easily

System Design (Server) Table R-Tree … Intermediate DOM Structure SEOM Document Parse Schema Defining Schema Generic SElement Abstract SElement Extends class Clients XML Declare Elements SElement Classes of specific models Parse Data Fetch Classes Network Interface Convert to Querying XML Data through HTTP Create Now, let’s move on to the server’s system design. Again, the Abstract SElement is the super class for SEOM Document, Generic SElement, and various Implementation Classes. The Implementation classes are closely associated with its schema, that the schema is defined under a certain model. In turns the schema will define SElements for an XML document. As stated before, both XML document and schema will be parsed into DOM structures, and then feed into a SEOM Document constructor. Generic SElements will be created, with the needed implementation classes fetched, instantiated, and wrapped inside. When the SEOM Document is ready, Clients may connected to the server through HTTP, and performs queries.

System Design (Client) Internet Explorer DOM Request Object DOM Data Objects XSL Objects Parse Data HTML for display XSLT Transform XML Data through HTTP Server Query Input Form Navigation Panel Information Panel Request Encoding Serialize into XML On the other hand, the client side is much more simple. The client is running on the Internet Explorer. Its code is written in Javascript with ActiveX technology and embedded in an HTML file. When it firstly connected to a server, it will send a query wrapper with empty queryMethod in order to retrieve the list of available methods as well as the information on the current node. The returned XML message will be transformed by an XSL object, rendered as HTML and displayed on the navigation panel and the information panel. When the user select a queryMethod from the navigation panel, a dialog box will pop up to prompt the user to input the required parameters. The query will then encoded as a DOM object, and serialize into XML message before sending back to the server. When the server receives the query and replies. The returned message is again transformed by an XSL object. Displaying the resulting node set on the navigation Panel. User can then select a node from the result and continue the navigation.

Interface Here is some screen dump of the client program. The left hand side is the outlook of the client program. Right hand side displays the connected server, current document and current node.

Interface Then comes to the navigation panel and the information panel. When connected to a node, its information will be displayed on the information panel. On the right hand side, you may see there are some methods and some result nodes to be chosen in the navigation panel. And a dialog is pop up to prompt the user to input some required parameters.

Evaluation This finished the parts on this prototype query system. Finally, we may have some evaluation on our research.

Discussions - Pros Separates logical data representation from physical data representation, thus hides the unnecessary details from the programmers Data can be validated semantically during object instantiation Flexible internal data structure implementation for SElement allows better optimization on data processing Our Structured-Element Object Model for XML data described in this thesis has certain strengths and weaknesses. Generally speaking, our model enjoys the following advantages: It separates logical data representation from the physical data representation, hides unnecessary details that programmers would not be interested in. By using a model based approach, data can be validate semantically during object instantiation. SElement has an internal data structure that allows flexible implementation and better optimization on data processing.

Discussion – Pros(2) Coincides with object-oriented programming paradigm: object encapsulation, modular development, and reusable software components Flattens legacy data structures into XML, which is text-editable, easy to transport and process by different systems Facilitates interoperability through the use of schema Inherits the DOM interface, and can use other technologies built for DOM It coincides with the object-oriented programming paradigm with object encapsulation, modular development and reuse of software components For some legacy applications, it flatten their data structure into XML, which is text-editable, easy to transport and process by different systems. By using the schema approach, the interoperability between multiple system is maintained. And finally, it inherits the DOM interface and therefore most technologies that applies to DOM will also applies to SEOM.

Discussion - Cons The size of XML file is often larger than legacy data file Overhead in parsing is longer Each structure model needs additional implementation effort Proprietary schema specification may not attract people to use Though our model has many advantages for representing complex data objects in XML, it also has some drawbacks, however. Here listed below are the disadvantages of our model. First is that the size of XML file often becomes larger then they were, and the overhead in parsing the files becomes longer. Next is that there does not exist a generic model for all data. Therefore we have to implement each model when they are needed for the first time. Lastly, our model relies on using our own schema, which may brings sort of inconvenience and could not attract people to use. Actually, these disadvantages are the tradeoffs of some good features. Though building of index structures add extra workload to the system, they provide great improvement in efficiency in use. Also, the use of schema can easily to be employed in other XML applications, hence increasing the extensibility and flexibility of the model.

Means of Enhancement Include other manipulation methods such as inserting data, removing data, serialize to XML etc. Include referencing mechanisms to define graph relation, which is more general and expressive than the hierarchical relationship Despite that our model enhances the DOM to make it more usable with complex data object, there are rooms for improvement. First is that currently the model is addressing data access only. It is possible to include other manipulation methods. Operations for inserting child nodes, removing child nodes, output to XML, etc. These operations may involve dynamic changes in the internal data structure, and leads to more complicated considerations. Another improvement may be to include referencing mechanisms to define a graph relation within the schema. This is more general and expressive than the currently employed hierarchical relationship.

Conclusion An object model combining the features of DOM and data binding technology A schema for mapping physical XML data into Java classes that implement logical entities A framework to facilitate marshaling and unmarshaling between XML data and the data objects A mechanism to support querying SElements by exchanging XML wrapper messages An extension of the XPath to support filtering nodes using the query function in SElement A web-based XML query system using SEOM has been implemented to demonstrate our work To conclude, there are a few points worth mentioned again. First is that we have proposed an object model that combins the features of DOM and data binding technology Second is that we define a schema that maps physical XML data into Java classes, where the classes are the implementation for logical entities. As a whole, there is a framework to facilitate marshaling and unmarshaling between XML data and Java data objects. For data query, we develop an query wrapper mechanism which based on exchanging XML message and we extend the XPath to support filtering the nodes using the query function in SElement. And last, we have implemented a web-based XML query system with SEOM in order to demonstrate our work.

Q&A That’s all for my presentation, and welcome for any questions.

Program Driven vs. Data Driven Information for processing is hard-coded in program Program Driven Data Driven Raw Data Data with Program Codes Data with Modeling Information Structured Data (XML) SEOM Processing instruction is hard-coded in data?!