Geoffrey Fox and Bryan Carpenter PTLIU Laboratory for Community Grids

1 Geoffrey Fox and Bryan Carpenter PTLIU Laboratory for Community Grids
.opennet Technologies XML Document Object Model and XML-Java Interfaces Fall Semester MW 5:00 pm - 6:20 pm CENTRAL (not Indiana) Time Geoffrey Fox and Bryan Carpenter PTLIU Laboratory for Community Grids Computer Science, Informatics, Physics Indiana University Bloomington IN 47404 11/13/2018 xmldomfall01

2 The two XML World Views
There are the Data Object view and the Document Object view, both defined in XML Schema.
Figure labels: Database; (Virtual) XML Layer; Enterprise JavaBeans; Java Servlet; Persistent Managed Store; Object Layer; Virtual Machine; Control; Form Input/Output; Processing System; User; Java Web Server; XML Web page.

3 Java XML Interfaces -- SAX
The appropriate way to interface Java to XML is still being debated, and there are several different approaches.
One is SAX (Simple API for XML), where a SAX parser reads an XML data stream and hands nuggets of information to a user program. These nuggets are called events; typical events are Start Tag, End Tag, and Content within a Tag.
[link not preserved] has a recent SAX tutorial. A SAX resource is [link not preserved].
Figure labels: XML Data Schema/Instance(s); XML SAX Parser; User Code.
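The event idea can be sketched with the SAX classes that ship with the JDK. This is a minimal illustration, not the course's own code: the class name `SaxEventDemo`, the `events` helper, and the sample XML are made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventDemo {
    /** Parses the XML text and returns the events in the order the parser reported them. */
    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        // The handler is the "user code": the parser calls it back once per event.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                log.add("start:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    log.add("text:" + text);
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(events("<course><title>XML</title></course>"));
    }
}
```

Nothing is kept in memory except what the handler chooses to record, which is why SAX does not by itself give a tree.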

4 Java XML Interfaces – Document Object Model DOM
The DOM is an unfortunate name, as it is useful whether the XML defines a document or any other object, i.e. it can be used whether one is supporting an XML Web page or XML data.
Further, the OM part of the name is confusing, as XML defines an object and the DOM describes a different object.
DOM is really TOSXO (Tree Object Structure of an XML Object) or perhaps TOM (Tree Object Model).
In fact, if you look at almost any structured information, you will find that it has a tree structure, and of course we saw that any Schema- or DTD-defined XML produces a tree.

5 Example of HTML DOM
Here is an example of a fragment of HTML and how it can be thought of as a tree. This is called a “document fragment” in DOM (a lightweight tree).

6 IMS and DOM
As an example, consider the recent definition of an object structure for course material from the so-called ADL (Advanced Distributed Learning) effort from the DoD; see [link not preserved].
Here we have a hierarchy with an element called block to define nodes in a tree. This block tree node has various other elements which are specific to this application.
In this specification the leaves of the tree are <au> (assignable unit) tags, each of which is typically a Web page.
So ADL has superimposed a tree for document organization on top of the tree for each document given by the DOM. Of course the DOM applies to either tree and describes the way one navigates through it.

7 Example Tree based Course Structure

8 XML DTD Structure for Block Element

9 Tree or Structured Data
Yahoo and Google offer structured (tree) or unstructured data access.
Figure label: Tree Nodes.

10 Unstructured Data
The Gallimaufry of Web Search Engines

11 Java XML Interfaces – DOM
Apache has two so-called DOM parsers, Xerces and Crimson, which read the full tree into memory and allow you to browse it.
Note that these are built on top of SAX parsers and provide an additional layer of capability.
In all these architectures, one can choose whether or not to validate the XML.
Figure labels: XML Data Schema/Instance(s); XML DOM Parser; Tree Representation of XML Instance; User Code.
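As a rough sketch of the DOM style, the JDK's standard `DocumentBuilder` reads the whole instance into a tree that user code can then browse. Which parser sits behind the factory depends on the Java installation; the class name `DomParseDemo` and the sample XML are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomParseDemo {
    /** Reads the full XML text into an in-memory DOM tree. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Validation is a choice; it is off here because the sample has no DTD.
            factory.setValidating(false);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<course><title>XML</title></course>");
        Element root = doc.getDocumentElement();
        Element title = (Element) root.getFirstChild();
        // Prints: course > title = XML
        System.out.println(root.getTagName() + " > " + title.getTagName() + " = " + title.getTextContent());
    }
}
```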

12 Java XML Interfaces -- XPP
XPP is a “pull” parser written by Aleksander Slominski, a graduate student of Dennis Gannon at Indiana University.
It has a similar interface to SAX, but you can “backtrack”. For instance, you could decide that you do not want to read all the events in a particular element
<xmlnode> Other Nodes </xmlnode>
and later go back if it turns out you need them.
In the DOM view of the Java interface, XPP supports choosing whether or not to expand nodes of the XML tree.
XPP was the fastest parser in a recent survey (which excluded SAX, as it doesn’t preserve tree structure).
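The pull style, where user code asks for the next event instead of being called back, can be illustrated with StAX (`javax.xml.stream`), a pull API included in the JDK. This is not XPP's API and it cannot backtrack; the sketch only shows the application deciding to step over the contents of one element. The names `PullParseDemo`, `elementNames` and `skipSubtree` are made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class PullParseDemo {
    /** Returns the element names seen, ignoring the element named skip and everything inside it. */
    static List<String> elementNames(String xml, String skip) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                // The application pulls each event when it is ready for it.
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    if (reader.getLocalName().equals(skip)) {
                        skipSubtree(reader);
                    } else {
                        names.add(reader.getLocalName());
                    }
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return names;
    }

    /** Consumes events up to and including the end tag that matches the current start tag. */
    private static void skipSubtree(XMLStreamReader reader) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
    }

    public static void main(String[] args) {
        // Prints: [doc, c]
        System.out.println(elementNames("<doc><xmlnode><a/><b/></xmlnode><c/></doc>", "xmlnode"));
    }
}
```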

13 Performance of XML DOM like Parsers
This took a variety of documents and summed the times.
The current XPP does not support one of the documents, which has entities and other not-so-useful XML constructs.
Smaller numbers are better.
The article has links to all systems.

14 Java XML Interfaces – JDOM I
DOM has perhaps two difficulties:
A lot of DOM features are aimed at Web pages, not XML data applications (the tree structure is common to both).
It is not especially well designed to exploit Java.
JDOM is designed to produce a natural Java-XML interface. It exploits Java Collections to organize nodes and other features of an XML instance.
For more information on JDOM, visit [link not preserved]. For information on the Java Community Process (JCP) standards effort for JDOM, see [link not preserved].
JDOM appears immature, and its description in the performance review is not so positive. Surprisingly, it is no faster than Java DOM.

15 Party Line on JDOM, DOM4J
The standard DOM is a very simple data structure that intermixes text nodes, element nodes, processing instruction nodes, CDATA nodes, entity references, and several other kinds of nodes. That makes it difficult to work with in practice, because you are always sifting through collections of nodes, discarding the ones you don't need in order to process the ones you are interested in.
JDOM, on the other hand, creates a tree of objects from an XML structure. The resulting tree is much easier to use, and it can be created from an XML structure without a compilation step.
Although it is not on the JCP standards track, DOM4J is an open-source, object-oriented alternative to DOM that is in many ways ahead of JDOM in terms of implemented features. As such, it represents an excellent alternative for Java developers who need to manipulate XML-based data.
For more information on DOM4J, see [link not preserved].

16 Java XML Interfaces – Castor I
Castor [link not preserved] is an open source project that supports a different model, where you map XML Schema objects one-to-one to Java classes:
Map Class <--> Schema
Map Java Instance <--> XML Instance
This uses Java object references to traverse the tree, not an explicit tree structure.
It looks best if the Schema reflects an integrated object and the names of properties mean something. If the Schema (as in ADL) is just a “tree”, then it is maybe not so natural.
The next page is a Castor advertisement!
There is a partial standards effort for this type of binding called JAXB (Java Architecture for XML Binding).
See [link not preserved] for Sun’s attempt to deconfuse these approaches.

17 Java XML Interfaces – Castor II
Castor XML: Java object model to and from XML
Generate source code from an XML Schema
Castor JDO: Java object persistence to RDBMS
Castor DAX: Java object persistence to LDAP
Castor DSML: LDAP directory exchange through XML
XML-based mapping files specify the mapping between one model and another
Support for schema-less Java to XML binding
In-memory caching and write-at-commit reduce JDBC operations
Two-phase commit transactions, object rollback and deadlock detection
OQL query mapping to SQL queries
EJB container-managed persistence provider for OpenEJB

18 Java XML Interfaces – Castor III
Note the comparison of DOM versus Castor/JAXB.
Maybe we have a tree corresponding to a parent class docroot and child properties called, say, fred. Let fred have children of the same name.
The Castor way of accessing information would be the reference Docroot.fred.fred.fred.finalproperty. In practice one uses methods (setters/getters), as the properties are private.
The DOM model would reference the tree 4 levels down, at the node whose name is finalproperty.
Castor has a document handler which will return the XML associated with any Java object generated from XML, in text format as well as through SAX DocumentHandlers and DOM trees.
Is it best to combine the Castor and DOM models?
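The two access styles can be put side by side in a small sketch. The `Docroot` and `Fred` classes below are hand-written stand-ins for what a binding tool might generate; they are hypothetical and do not use Castor or JAXB. The DOM half uses the standard `org.w3c.dom` interfaces, and the sample XML is invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class BindingVersusDom {
    // Hypothetical stand-ins for generated classes: one Java class per schema type.
    static final class Fred {
        private final Fred fred;             // nested <fred>, or null at the bottom
        private final String finalproperty;  // text of <finalproperty>, or null
        Fred(Fred fred, String finalproperty) { this.fred = fred; this.finalproperty = finalproperty; }
        Fred getFred() { return fred; }
        String getFinalproperty() { return finalproperty; }
    }

    static final class Docroot {
        private final Fred fred;
        Docroot(Fred fred) { this.fred = fred; }
        Fred getFred() { return fred; }
    }

    static final String XML =
            "<docroot><fred><fred><fred><finalproperty>42</finalproperty></fred></fred></fred></docroot>";

    static Docroot sampleObjects() {
        return new Docroot(new Fred(new Fred(new Fred(null, "42"), null), null));
    }

    static Document sampleDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Binding style: follow typed getters, one per property. */
    static String viaBinding(Docroot root) {
        return root.getFred().getFred().getFred().getFinalproperty();
    }

    /** DOM style: walk generic nodes and match on their names. */
    static String viaDom(Document doc) {
        Node node = doc.getDocumentElement();
        for (String name : new String[] {"fred", "fred", "fred", "finalproperty"}) {
            node = childElement(node, name);
        }
        return node.getTextContent();
    }

    static Node childElement(Node parent, String name) {
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE && c.getNodeName().equals(name)) {
                return c;
            }
        }
        throw new IllegalArgumentException("no child element named " + name);
    }

    public static void main(String[] args) {
        System.out.println(viaBinding(sampleObjects())); // prints 42
        System.out.println(viaDom(sampleDocument()));    // prints 42
    }
}
```

The binding version is checked by the compiler but only works for this schema; the DOM version works for any tree but finds names at run time.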

19 Java XML Interfaces – Castor IV
This diagram illustrates the Castor versus DOM model.
Figure labels: Node Instance; Docroot Instance; fred Property; finalproperty; child; parent.
See the online book chapter (Professional XML, 2nd Ed., Wrox).

20 Java XML Interfaces – JAXP
Java(TM) APIs for XML Processing (JAXP) is a collection of technologies allowing you to interface with many different types of XML Java interfaces.
This link [not preserved] has several good online tutorials. This tutorial [link not preserved] discusses JAXP and its relation to SAX, DOM, XSLT and JDOM.
JAXP is an approved Java standard which is meant to allow you to keep the same interface and change the implementation.
It is not clear that this is efficient or that it will catch on.

21 The Origins of the W3C DOM
The idea of the DOM came from the need to build interactive web pages and to identify parts of a document uniquely, so that one can, for example:
Associate a mouse event with a particular page element.
Associate input of text into a form with a particular text area.
Dynamic HTML was introduced in Netscape 4 and IE5 and allows one both to associate events with HTML elements and to change the HTML structure, e.g.
Move a “layer” around within the browser.
Change text and color in a “document fragment”.
Netscape’s implementation of Dynamic HTML had many bugs and was inferior to Microsoft’s, although it had the essential functionality.

22 The 4 levels of DOM
Level 0: Functionality equivalent to that evident in Netscape Navigator 3.0 and Microsoft Internet Explorer 3.0.
Levels 1 and 2 include what is called Dynamic HTML but make this much more complete.
Level 1: This concentrates on the general API to an XML document. It contains functionality for document (tree) navigation and manipulation. It defines the special case of DOM applied to HTML, with specific APIs for the different HTML elements.
Level 2: Includes a style sheet object model and defines functionality for manipulating the style information attached to a document. It also enables traversals on the document (i.e. for manipulating collections of nodes), defines an event model (very important!) and provides support for XML namespaces.
Level 3: Still being developed; see next page.

23 Level 3
DOM Level 3, which is at Working Draft stage, includes the following items:
Extending the DOM Level 2 Object Model: allowing users and applications to access keyboard events, and adding the ability to define groups of events.
Content Models (DTD, Schema) and Validation: an object model for accessing and modifying a Content Model for a document.
Load and Save interfaces: for loading XML source documents into a DOM representation and for saving a DOM representation as an XML document.
Embedded Document Object Model: currently, the Web is moving towards documents with mixed markup vocabularies, e.g. SVG fragments can be embedded in an XHTML document. This creates new challenges for the DOM, since it also means that DOM APIs and implementations of the different vocabularies need to work together.
Adaptation to changes to core XML functionality: the DOM is an API to an XML document. As auxiliary functionality to XML 1.0 is developed (namespaces, XML Base), the DOM API should model this.
XPath DOM: a simple solution to query a DOM tree using XPath will also be included.
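To give a feel for the last item, querying a DOM tree with XPath, here is a sketch using the JDK's `javax.xml.xpath` package. That package is a Java (JAXP) API and is not the W3C DOM Level 3 XPath interface described on the slide; it is used only because it is readily available. The class name, helper names and sample XML are assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathQueryDemo {
    static final String XML =
            "<course><block><au href='intro.html'/><au href='dom.html'/></block></course>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Returns the href attribute of the first au element under /course/block. */
    static String firstHref(Document doc) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate("/course/block/au[1]/@href", doc);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Counts the au elements anywhere in the tree. */
    static int countUnits(Document doc) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            Double n = (Double) xpath.evaluate("count(//au)", doc, XPathConstants.NUMBER);
            return n.intValue();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println(firstHref(doc));  // prints intro.html
        System.out.println(countUnits(doc)); // prints 2
    }
}
```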

24 What the DOM is not ….. I
Although the Document Object Model was strongly influenced by "Dynamic HTML", in Level 1, it does not implement all of "Dynamic HTML". In particular, events have not yet been defined. Level 1 is designed to lay a firm foundation for this kind of functionality by providing a robust, flexible model of the document itself.
The Document Object Model is not a binary specification. DOM programs written in the same language will be source code compatible across platforms, but the DOM does not define any form of binary interoperability.
The Document Object Model is not a way of persisting objects to XML or HTML. Instead of specifying how objects may be represented in XML, the DOM specifies how XML and HTML documents are represented as objects, so that they may be used in object oriented programs.
The Document Object Model is not a set of data structures, it is an object model that specifies interfaces. Although this document contains diagrams showing parent/child relationships, these are logical relationships defined by the programming interfaces, not representations of any particular internal data structures.

25 What the DOM is not ….. II
The Document Object Model does not define "the true inner semantics" of XML or HTML. The semantics of those languages are defined by W3C Recommendations for these languages. The DOM is a programming model designed to respect these semantics. The DOM does not have any ramifications for the way you write XML and HTML documents; any document that can be written in these languages can be represented in the DOM.
The Document Object Model, despite its name, is not a competitor to the Component Object Model (COM). COM, like CORBA, is a language independent way to specify interfaces and objects; the DOM is a set of interfaces and objects designed for managing HTML and XML documents. The DOM may be implemented using language-independent systems like COM or CORBA; it may also be implemented using language-specific bindings like the Java or ECMAScript bindings specified in this document.

26 Language Bindings
The DOM specifies a set of methods and properties that form the interface through which a user accesses the static or dynamic (event) aspects of an XML structure. It also allows one to create or modify such structures.
The specification gives this interface for IDL (CORBA), Java and ECMAScript.
For Web pages, Java (in Java Server Pages) or ECMAScript are the most important.
ECMAScript is a general object-based scripting language. ECMAScript plus the DOM bindings is essentially JavaScript.
Of course Netscape 4 and IE5 do not follow the W3C DOM exactly.
Mozilla (Netscape 6) does support the W3C DOM interface: fully at Level 1 and partially at Level 2.

27 Netscape 6 and Level 1 DOM
Note that Netscape 6 supports XML.
This comes from [link not preserved].
In Netscape 6 and Mozilla, “everything” (Web page and browser adornments) is controlled by the DOM interface.

28 DOM Level 1 Core
In the DOM, one builds a tree out of a set of Node objects. Each Node object has a set of generic capabilities (properties and methods) and also implements specific interfaces.
In the Core one defines a set of Node types to reflect the structure of XML. Each Node type has its own interface to reflect its special features.
Figure labels: Node, etc.

29 Node Types in Level 1 Core I
For each Node type, we give the allowed children:
Document: Element (maximum of one), ProcessingInstruction, Comment, DocumentType
DocumentFragment: Element, ProcessingInstruction, Comment, Text, CDATASection, EntityReference
DocumentType: no children
EntityReference: Element, ProcessingInstruction, Comment, Text, CDATASection, EntityReference
Element: Element, Text, Comment, ProcessingInstruction, CDATASection, EntityReference

30 Node Types in Level 1 Core II
Attr: Text, EntityReference
ProcessingInstruction: no children
Comment: no children
Text: no children
CDATASection: no children
Entity: Element, ProcessingInstruction, Comment, Text, CDATASection, EntityReference
Notation: no children
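A short sketch can show several of these node types turning up as children in a parsed tree, using the standard `Node.getNodeType()` codes. The class name `NodeTypeDemo`, the `childTypes` helper and the sample XML are made up for the illustration, and only the types present in the sample are named.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeTypeDemo {
    static final String SAMPLE =
            "<?pi data?><!--c--><root a='1'>text<![CDATA[raw]]><child/></root>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Lists the node type of each direct child of parent, in document order. */
    static List<String> childTypes(Node parent) {
        List<String> types = new ArrayList<>();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            types.add(typeName(c));
        }
        return types;
    }

    static String typeName(Node n) {
        switch (n.getNodeType()) {
            case Node.ELEMENT_NODE: return "Element";
            case Node.TEXT_NODE: return "Text";
            case Node.CDATA_SECTION_NODE: return "CDATASection";
            case Node.COMMENT_NODE: return "Comment";
            case Node.PROCESSING_INSTRUCTION_NODE: return "ProcessingInstruction";
            default: return "other(" + n.getNodeType() + ")";
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        // Prints: [ProcessingInstruction, Comment, Element]
        System.out.println(childTypes(doc));
        // Prints: [Text, CDATASection, Element]
        System.out.println(childTypes(doc.getDocumentElement()));
    }
}
```

Note that the attribute `a` does not appear among the children of `root`; attributes are reached through the attributes property instead.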

31 The Node Interface in CORBA IDL
Figure labels: Constants; Properties; Methods.

32 nodeName nodeValue attributes
Each Node type has particular rules for the values of some of the properties, most importantly nodeName and nodeValue.
The attributes property is only allowed for an Element.
[Table of values by Node Type not preserved]

33 Document Fragment
This is a lightweight “document” used to denote a part of a tree. As it does not carry all the overhead of an XML object instance, it is a convenient way of denoting a subtree, including all leaf nodes below a certain internal node.
This is an important building block for documents.
Figure labels: Node (repeated); Document Fragment.

34 This page is full of Document Fragments
[Two example fragments shown on the slide are not preserved]

35 Properties of a Node I
nodeName: The name of this node, depending on its type; see the table above.
nodeValue: The value of this node, depending on its type; see the table above.
Exceptions on setting: DOMException NO_MODIFICATION_ALLOWED_ERR, raised when the node is readonly.
Exceptions on retrieval: DOMException DOMSTRING_SIZE_ERR, raised when it would return more characters than fit in a DOMString variable on the implementation platform.
nodeType: A code representing the type of the underlying object, as defined above.
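How nodeName and nodeValue vary with node type can be seen in a small Java sketch, where the properties appear as `getNodeName()` and `getNodeValue()`. The class name, the `describe` helper and the sample tree are assumptions made for the example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class NodeNameValueDemo {
    /** Builds <a id="1">hi<!--note--></a> programmatically and returns the element. */
    static Element sample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element a = doc.createElement("a");
            a.setAttribute("id", "1");
            a.appendChild(doc.createTextNode("hi"));
            a.appendChild(doc.createComment("note"));
            doc.appendChild(a);
            return a;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    static String describe(Node n) {
        return n.getNodeName() + "=" + n.getNodeValue();
    }

    public static void main(String[] args) {
        Element a = sample();
        System.out.println(describe(a));                        // a=null
        System.out.println(describe(a.getAttributeNode("id"))); // id=1
        System.out.println(describe(a.getFirstChild()));        // #text=hi
        System.out.println(describe(a.getLastChild()));         // #comment=note
    }
}
```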

36 Properties of a Node II
parentNode: The parent of this node. All nodes, except Document, DocumentFragment, and Attr may have a parent. However, if a node has just been created and not yet added to the tree, or if it has been removed from the tree, this is null.
childNodes: A NodeList that contains all children of this node. If there are no children, this is a NodeList containing no nodes. The content of the returned NodeList is "live" in the sense that, for instance, changes to the children of the node object that it was created from are immediately reflected in the nodes returned by the NodeList accessors; it is not a static snapshot of the content of the node. This is true for every NodeList, including the ones returned by the getElementsByTagName method.
firstChild: The first child of this node. If there is no such node, this returns null.
lastChild: The last child of this node. If there is no such node, this returns null.
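The "live" behaviour of childNodes is easy to miss, so here is a minimal sketch of it with the JDK's DOM implementation. The class and method names are made up for the example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class LiveNodeListDemo {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the length of one NodeList before and after a child is appended. */
    static int[] lengthsBeforeAndAfterAppend() {
        Document doc = newDocument();
        Element list = doc.createElement("list");
        doc.appendChild(list);
        list.appendChild(doc.createElement("item"));

        NodeList children = list.getChildNodes(); // obtained once, never refreshed
        int before = children.getLength();
        list.appendChild(doc.createElement("item"));
        int after = children.getLength();         // the same NodeList sees the new child
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] lengths = lengthsBeforeAndAfterAppend();
        System.out.println(lengths[0] + " then " + lengths[1]); // prints: 1 then 2
    }
}
```

One practical consequence is that a loop which removes nodes while indexing into a live NodeList needs care, since the indices shift as the list shrinks.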

37 Properties of a Node III
previousSibling: The node immediately preceding this node. If there is no such node, this returns null.
nextSibling: The node immediately following this node. If there is no such node, this returns null.
attributes: A NamedNodeMap containing the attributes of this node (if it is an Element) or null otherwise.
ownerDocument: The Document object associated with this node. This is also the Document object used to create new nodes. When this node is a Document this is null.

38 Methods of a Node I
insertBefore (newChild, refChild): Inserts the node newChild before the existing child node refChild. If refChild is null, insert newChild at the end of the list of children. If newChild is a DocumentFragment object, all of its children are inserted, in the same order, before refChild. If the newChild is already in the tree, it is first removed.
replaceChild (newChild, oldChild): Replaces the child node oldChild with newChild in the list of children, and returns the oldChild node. If the newChild is already in the tree, it is first removed.
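The DocumentFragment case of insertBefore, together with replaceChild, might look like the following in Java. This is a sketch with invented names (`InsertBeforeDemo`, `insertFragment`, `childNames`) and an invented sample tree.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class InsertBeforeDemo {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Joins the names of the direct children of parent with spaces. */
    static String childNames(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(c.getNodeName());
        }
        return sb.toString();
    }

    /** Builds root with children a, d and inserts a fragment holding b, c before d. */
    static Element insertFragment() {
        Document doc = newDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);
        Element d = doc.createElement("d");
        root.appendChild(doc.createElement("a"));
        root.appendChild(d);

        DocumentFragment fragment = doc.createDocumentFragment();
        fragment.appendChild(doc.createElement("b"));
        fragment.appendChild(doc.createElement("c"));

        // The fragment's children move into root; the fragment node itself is not inserted.
        root.insertBefore(fragment, d);
        return root;
    }

    public static void main(String[] args) {
        Element root = insertFragment();
        System.out.println(childNames(root)); // prints: a b c d

        Node e = root.getOwnerDocument().createElement("e");
        Node old = root.replaceChild(e, root.getFirstChild());
        // Prints: a replaced, now e b c d
        System.out.println(old.getNodeName() + " replaced, now " + childNames(root));
    }
}
```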

39 Methods of a Node II
removeChild (oldChild): Removes the child node indicated by oldChild from the list of children, and returns it.
appendChild (newChild): Adds the node newChild to the end of the list of children of this node. If the newChild is already in the tree, it is first removed.
hasChildNodes: This is a convenience method to allow easy determination of whether a node has any children. It returns true if there are any child nodes.
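The rule that appendChild first removes a node that is already in the tree means it can be used to move nodes. A short sketch, with invented class and helper names:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class AppendRemoveDemo {
    /** Builds <root><x/><y/></root> and returns root. */
    static Element sample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            doc.appendChild(root);
            root.appendChild(doc.createElement("x"));
            root.appendChild(doc.createElement("y"));
            return root;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    static String childNames(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(c.getNodeName());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Element root = sample();
        Node x = root.getFirstChild();

        root.appendChild(x);                  // x is already a child, so it is moved to the end
        System.out.println(childNames(root)); // prints: y x

        Node removed = root.removeChild(root.getFirstChild());
        // Prints: y removed, parent is null
        System.out.println(removed.getNodeName() + " removed, parent is " + removed.getParentNode());

        root.removeChild(x);
        System.out.println(root.hasChildNodes()); // prints: false
    }
}
```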

40 Methods of a Node III
cloneNode (deep): Returns a duplicate of this node, i.e., serves as a generic copy constructor for nodes. The duplicate node has no parent (parentNode returns null).
Cloning an Element copies all attributes and their values, including those generated by the XML processor to represent defaulted attributes, but this method does not copy any text it contains unless it is a deep clone, since the text is contained in a child Text node. Cloning any other type of node simply returns a copy of this node.
Parameter deep: If true, recursively clone the subtree under the specified node; if false, clone only the node itself (and its attributes, if it is an Element).
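The difference between a shallow and a deep clone can be checked directly. In this sketch the class name `CloneNodeDemo` and the sample element are assumptions.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class CloneNodeDemo {
    /** Builds <p class="note">hello</p> attached to a new document and returns the element. */
    static Element sample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element p = doc.createElement("p");
            p.setAttribute("class", "note");
            p.appendChild(doc.createTextNode("hello"));
            doc.appendChild(p);
            return p;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element p = sample();
        Element shallow = (Element) p.cloneNode(false);
        Element deep = (Element) p.cloneNode(true);

        // Attributes are copied in both cases.
        System.out.println(shallow.getAttribute("class")); // prints: note
        // The text lives in a child Text node, so only the deep clone has it.
        System.out.println(shallow.hasChildNodes());       // prints: false
        System.out.println(deep.getTextContent());         // prints: hello
        // Neither clone is attached to the tree.
        System.out.println(deep.getParentNode());          // prints: null
    }
}
```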

41 Two Specific Interfaces
DocumentFragment and Document [interface listings not preserved]

42 HTML Level 1 DOM
This has several extensions, basically inheriting the XML interfaces of the Core and specializing them to each HTML tag:
An HTMLDocument interface, derived from the core Document interface. HTMLDocument specifies the operations and queries that can be made on an HTML document.
An HTMLElement interface, derived from the core Element interface. HTMLElement specifies the operations and queries that can be made on any HTML element. Methods on HTMLElement include those that allow for the retrieval and modification of attributes that apply to all HTML elements.
Specializations for all HTML elements that have attributes that extend beyond those specified in the HTMLElement interface. For all such attributes, the derived interface for the element contains explicit methods for setting and getting the values.

43 HTMLDocument Interface
This uses another special interface data structure, HTMLCollection, to hold lists of sub-components.

44 HTMLElement and Specializations
Any HTML element adds to Node: [listing not preserved]
The <body> tag adds: [listing not preserved]

45 Two HTML DOM APIs
The <a> </a> link tag adds [listing not preserved], while the select element in a form has a bunch of new properties and methods.

46 Highlights of Event Model in Level 2 DOM
Every Node can have Event Listeners added for particular types of Event.
For example, for mouse events the types are click, mousedown, mouseup, mouseover, mousemove and mouseout.

47 Sample Event in DOM Level 2
Here is a MouseEvent: [listing not preserved]
Note that in the DOM you can both receive events and create them programmatically. This capability was not implemented properly in Netscape 4: sometimes you could and sometimes you couldn’t.
