XML processing as a tree

XPath

Introduction
Although XML provides a flexible and expressive way of describing data, it has no built-in mechanism for locating specific pieces of structured data within a document. Without one, finding information means parsing the document and then examining the returned elements one by one, which is inefficient for large documents. XPath provides a way of locating specific parts of an XML document. Unlike XML, XPath is not a structured markup language: it is a string-based expression language, and its expressions are shared by other XML technologies such as XSLT (which can be used to convert XML to HTML, for example) and XPointer (XML Pointer Language), which can point to information inside an XML document. XSLT and XPointer are covered later in the text.

XPath views the XML document as a tree
XPath regards an XML document as a tree structure made up of nodes. A node may have child nodes, so the structure is recursive. There are seven node types: root, element, attribute, text, comment, processing instruction (PI) and namespace. Only element, comment, text and PI nodes may be child nodes. Attributes and namespaces are not considered child nodes of their parent: they are not “contained in” the parent, they describe it. The root has no parent. The XPath tree is similar to the DOM tree (see chapter 8).
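Since the XPath tree is similar to the DOM tree, the child-node rule can be illustrated with Python's standard DOM implementation. This is only a sketch: xml.dom.minidom is a DOM library rather than an XPath engine, and the one-line document is a hypothetical sample, not one of the figures from the text.

```python
from xml.dom import Node, minidom

# Hypothetical one-line document with one child of each permitted kind.
doc = minidom.parseString(
    '<book edition="3"><!-- note --><title>XPath</title>text<?target data?></book>'
)
book = doc.documentElement

# Comment, element, text and PI nodes appear among the children, in document order.
child_types = [child.nodeType for child in book.childNodes]
print(child_types)

# The attribute is reached through the element, not through its child list.
edition = book.getAttributeNode("edition")
print(edition.nodeType == Node.ATTRIBUTE_NODE, edition.value)
```

The attribute describes book but never shows up in book.childNodes, which mirrors the XPath rule stated above.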

XPath nodes have a string representation
Each node has a string-value, which XPath uses to compare nodes. The string-value of a text node is the character data contained in it. The CDATA delimiters <![CDATA[ and ]]> are markup and are not included in the text. The string-value of a root or element node is the document-order concatenation of its descendant text nodes. The string-value of an attribute node is the normalized attribute value. The string-value of a comment is just the comment text, excluding the delimiters <!-- and -->. Text that does not fall in a CDATA section is normalized by the parser (entity and character references are expanded and line endings are normalized). Document order is the order of a depth-first traversal, which is the order in which nodes appear in the document. XPath also uses reverse document order, the same sequence read backwards, for axes that work back up or back through the tree.

An example
<?xml version = "1.0"?>
<book title = "C++ How to Program" edition = "3">
   <sample>
      <![CDATA[
         // C++ comment
         if ( this->getX() < 5 && value[ 0 ] != 3 )
            cerr << this->displayError();
      ]]>
   </sample>
   C++ How to Program by Deitel &amp; Deitel
</book>

[Figure: XPath tree structure of the document above]

String values
The string-value of book in this XML is the concatenation of its two descendant text nodes in document order: // C++ comment if ….. cerr << … followed by C++ How to …. The second text node (C++ …) is not in a CDATA section, so it is normalized by the parser. The string-value of the root is the same as that of book (its child node). The string-value of the edition attribute is 3. The string-value of a comment is its text without the delimiters; in this example: Simple XML document.
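The string-value rules can be checked with a short sketch using Python's standard xml.etree.ElementTree. The helper string_value is a hypothetical name introduced here; it approximates the XPath string-value of an element by joining its descendant text in document order.

```python
import xml.etree.ElementTree as ET

SIMPLE_XML = """<?xml version="1.0"?>
<book title="C++ How to Program" edition="3">
   <sample>
      <![CDATA[
         // C++ comment
         if ( this->getX() < 5 && value[ 0 ] != 3 )
            cerr << this->displayError();
      ]]>
   </sample>
   C++ How to Program by Deitel &amp; Deitel
</book>"""


def string_value(element):
    """Hypothetical helper: join all descendant text in document order."""
    return "".join(element.itertext())


book = ET.fromstring(SIMPLE_XML)
sample = book.find("sample")

# The CDATA delimiters are markup, so they do not appear in the text.
print(string_value(sample))

# The entity reference &amp; has been expanded to a plain ampersand.
print(string_value(book))

# An attribute's string-value is its (normalized) value.
print(book.get("edition"))
```

The CDATA text comes before the trailing text in the result for book, which is the document order described above.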

Another example
<?xml version = "1.0"?>
<html xmlns = "
   <head>
      <title>Processing Instruction and Namespace Nodes</title>
   </head>
   <?deitelprocessor example = "fig11_03.xml"?>
   <body>
      <deitel:book deitel:edition = "1" xmlns:deitel = "
         <deitel:title>XML How to Program</deitel:title>
      </deitel:book>
   </body>
</html>

XPath for this example
The root node contains 3 nodes: 2 comments and the html element. The parent of the default namespace node is html. html has 3 child nodes: head, the PI and body. head contains only title, which in turn contains only a text node. book contains a namespace node bound to the prefix deitel; that namespace node's parent is book. The title element is the only child of book. The string-value of a namespace node is its URI. The string-value of a PI node is the text after the target, omitting the delimiters but including whitespace; in this case, the text: example=“…”. A summary of node types appears in the XML text on page 304.

XPath node types
Node type | String-value | Expanded name | Description
root | Concatenation of the string-values of all text-node descendants in document order | none | The root node; may contain any other type of node
element | ditto | Element name including namespace prefix, if any | An XML element
attribute | Normalized attribute value | Name including namespace prefix, if any | Attribute of an element
text | Character data in the node | none | Character data content of an element
comment | Comment content | none | XML comment
processing instruction | The part of the PI following the target and any whitespace | The target of the PI | XML PI
namespace | URI of the namespace | The namespace prefix | XML namespace

Location paths
Location paths are expressions that describe how to navigate the XPath tree from one node to another. A location path consists of one or more location steps. Each location step consists of an axis, a node test and an optional predicate. The context node is the node from which the search starts. The axis defines which nodes, relative to the context node, are candidates for the node test. There are forward and reverse axes, which follow document order and reverse document order, respectively.

XPath “axes” (searches) have forward or reverse document order
self: the context node itself.
parent: the context node's parent, if any (forward order).
child: children of the context node, if any (forward order).
ancestor: the context node's ancestors (reverse order).
ancestor-or-self: the context node and its ancestors (reverse order).
descendant: all descendants of the context node (forward order).
descendant-or-self: the context node and all its descendants (forward order).
following: nodes that follow the context node in document order, not including descendants (forward order).
following-sibling: siblings that follow the context node (forward order).
preceding: nodes that precede the context node, not including ancestors (reverse order).
preceding-sibling: siblings that precede the context node (reverse order).
attribute: attribute nodes of the context node.
namespace: namespace nodes of the context node (forward order).
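To make the ordering of axes concrete, here is a rough emulation of one reverse axis (ancestor) and one forward axis (following-sibling) in Python. The standard xml.etree.ElementTree module does not implement these axes, so the helpers ancestors and following_siblings, the parent map and the small sample document are all assumptions made for illustration. ElementTree also does not model the root node that sits above the document element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one book with a title and two translations.
root = ET.fromstring(
    "<books><book><title>A</title>"
    "<translation>X</translation><translation>Y</translation></book></books>"
)

# ElementTree elements do not store their parent, so build a lookup table.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(node):
    """Emulates the ancestor axis: nearest ancestor first (reverse document order)."""
    result = []
    while node in parent_of:
        node = parent_of[node]
        result.append(node)
    return result


def following_siblings(node):
    """Emulates the following-sibling axis: later siblings in document order."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[siblings.index(node) + 1:]


title = root.find("book/title")
print([node.tag for node in ancestors(title)])
print([node.text for node in following_siblings(title)])
```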

Node tests
*: selects all nodes of the axis's principal node type (elements for most axes, attributes for the attribute axis, namespaces for the namespace axis).
node(): selects all nodes regardless of type.
The following tests select nodes based on the type specified: text(), comment(), processing-instruction().
node-name: selects nodes with the given name.

Some examples
child::* selects all element-node children of the context node. child::text() selects all text-node children of the context node. Location steps can be combined using /. For example, child::*/child::text() selects the text-node grandchildren of the context node, since the second step applies to the results of the first.
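The three selections can be imitated on a DOM tree. Python's standard library has no text() node test, so the sketch below filters childNodes by node type instead; the function names and the sample document are hypothetical, and this is an emulation of the selections rather than an XPath evaluation.

```python
from xml.dom import Node, minidom

# Hypothetical sample: an element child, a text child and a comment child.
doc = minidom.parseString(
    "<book><title>XML How to Program</title>by Deitel<!-- note --></book>"
)
context = doc.documentElement


def child_elements(node):
    """Roughly child::* (element children only)."""
    return [c for c in node.childNodes if c.nodeType == Node.ELEMENT_NODE]


def child_text(node):
    """Roughly child::text() (text-node children only)."""
    return [c for c in node.childNodes if c.nodeType == Node.TEXT_NODE]


def grandchild_text(node):
    """Roughly child::*/child::text(): the second step runs on each result of the first."""
    return [t for e in child_elements(node) for t in child_text(e)]


print([e.tagName for e in child_elements(context)])
print([t.data for t in child_text(context)])
print([t.data for t in grandchild_text(context)])
```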

Abbreviations
child:: is the default axis, so it may always be omitted.
attribute:: is abbreviated as @.
/descendant-or-self::node()/ is abbreviated as //.
self::node() is abbreviated as a period (.).
parent::node() is abbreviated as two periods (..).
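Python's standard xml.etree.ElementTree understands only a limited subset of XPath, but that subset is written in the abbreviated syntax, so it can serve as a quick sketch of the abbreviations. The two-book document is a shortened, hypothetical sample.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<books>
   <book>
      <title>The Color Purple</title>
      <translation edition="1">Spanish</translation>
      <translation edition="2">French</translation>
   </book>
   <book>
      <title>Moby Dick</title>
      <translation edition="2">Dutch</translation>
   </book>
</books>""")

# Omitted axis: book/title stands for child::book/child::title.
titles = [t.text for t in root.findall("book/title")]

# // searches all descendants; the leading . is the context node (self::node()).
all_translations = root.findall(".//translation")

# @ abbreviates the attribute axis inside a predicate.
second_editions = [t.text for t in root.findall(".//translation[@edition='2']")]

# .. steps from each title back to its parent.
parents = [p.tag for p in root.findall(".//title/..")]

print(titles, len(all_translations), second_editions, parents)
```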

Another example: a reading list
<?xml version = "1.0"?>
<!-- reading list -->
<?xml-stylesheet type = "text/xsl" href = "usage.xsl"?>
<books>
   <book>
      <title>The Color Purple</title>
      <translation edition = "1">Spanish</translation>
      <translation edition = "1">Czech</translation>
      <translation edition = "1">Mandarin</translation>
      <translation edition = "2">French</translation>
   </book>
   <book>
      <title>The Hamlet</title>
      <translation edition = "1">Chinese</translation>
      <translation edition = "1">Latin</translation>
      <translation edition = "2">English</translation>
   </book>
   <book>
      <title>The Old Man and the Sea</title>
      <translation edition = "1">Spanish</translation>
      <translation edition = "1">Chinese</translation>
      <translation edition = "1">Japanese</translation>
      <translation edition = "2">French</translation>
      <translation edition = "2">Russian</translation>
   </book>
   <book>
      <title>Moby Dick</title>
      <translation edition = "1">Tagalog</translation>
      <translation edition = "2">Portuguese</translation>
      <translation edition = "2">Dutch</translation>
      <translation edition = "3">Italian</translation>
      <translation edition = "3">Japanese</translation>
   </book>
   <book>
      <title>Grapes of Wrath</title>
      <translation edition = "1">Korean</translation>
      <translation edition = "2">German</translation>
   </book>
</books>

XML Structure

Xml structure for a book node

Example continued: an XSL stylesheet to list books in Japanese
<?xml version = "1.0"?>
<xsl:stylesheet version = "1.0" xmlns:xsl = "
   <xsl:template match = "/books">
      <html>
         <h1>books in Japanese</h1>
         <body>
            <ul>
               <xsl:for-each select = "book">
                  <xsl:if test = "translation='Japanese'">
                     <li>
                        <xsl:value-of select = "title"/>
                     </li>
                  </xsl:if>
               </xsl:for-each>
            </ul>
         </body>
      </html>
   </xsl:template>
</xsl:stylesheet>
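The test translation='Japanese' is true for a book when at least one of its translation children has that string-value. The same selection can be sketched outside XSLT with Python's standard xml.etree.ElementTree, whose XPath subset includes this kind of predicate. The document below is a shortened copy of the reading list (three books, fewer translations), trimmed only to keep the sketch small.

```python
import xml.etree.ElementTree as ET

books = ET.fromstring("""<books>
   <book>
      <title>The Color Purple</title>
      <translation edition="1">Spanish</translation>
   </book>
   <book>
      <title>The Old Man and the Sea</title>
      <translation edition="1">Japanese</translation>
      <translation edition="2">French</translation>
   </book>
   <book>
      <title>Moby Dick</title>
      <translation edition="3">Italian</translation>
      <translation edition="3">Japanese</translation>
   </book>
</books>""")

# Keep each book that has some translation child equal to 'Japanese',
# then step down to its title.
japanese_titles = [
    title.text for title in books.findall("book[translation='Japanese']/title")
]
print(japanese_titles)
```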

Node set operators
(|) pipe operator: union of two node-sets.
(/) slash: separator between location steps.
(//) double slash: abbreviates the path /descendant-or-self::node()/

node-set functions
last(): the number of nodes in the current node-set, i.e. the position of its last node.
position(): the position number of the current node in the node-set.
count(node-set): the number of nodes in the node-set.
id(string): the element node whose ID matches the string.
local-name(node-set): the local part of the expanded name of the first node in the node-set.
namespace-uri(node-set): the namespace URI of the expanded name of the first node in the node-set.
name(node-set): the qualified name of the first node in the node-set.

examples from the reading list
head/title[last()] selects the last title child of each head child of the context node (as in the html example). book[position()=3] selects the 3rd book child of the context node. //book selects all book elements in the document. count(*) returns the number of element-node children of the context node.
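A few of these can be tried with Python's standard xml.etree.ElementTree, with two caveats that are assumptions of this sketch rather than part of the text: ElementTree supports only the shorthand book[3] instead of book[position()=3], and it has no count() function, so Python's len() stands in for it. The document keeps only the five titles of the reading list.

```python
import xml.etree.ElementTree as ET

books = ET.fromstring("""<books>
   <book><title>The Color Purple</title></book>
   <book><title>The Hamlet</title></book>
   <book><title>The Old Man and the Sea</title></book>
   <book><title>Moby Dick</title></book>
   <book><title>Grapes of Wrath</title></book>
</books>""")

# book[3] is the short form of book[position()=3].
third_title = books.find("book[3]/title").text

# last() picks the final book child of the context node.
last_title = books.find("book[last()]/title").text

# .//book reaches every book below the context node.
all_books = books.findall(".//book")

# Stand-in for count(*): the number of element children of the context node.
child_count = len(list(books))

print(third_title, last_title, len(all_books), child_count)
```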

another example: stocks.xml
<?xml version = "1.0"?>
<!-- Stock list -->
<?xml-stylesheet type = "text/xsl" href = "stocks.xsl"?>
<stocks>
   <stock symbol = "INTC">
      <name>Intel Corporation</name>
   </stock>
   <stock symbol = "CSCO">
      <name>Cisco Systems, Inc.</name>
   </stock>
   <stock symbol = "DELL">
      <name>Dell Computer Corporation</name>
   </stock>
   <stock symbol = "MSFT">
      <name>Microsoft Corporation</name>
   </stock>
   <stock symbol = "SUNW">
      <name>Sun Microsystems, Inc.</name>
   </stock>
   <stock symbol = "CMGI">
      <name>CMGI, Inc.</name>
   </stock>
</stocks>

the stylesheet
<?xml version = "1.0"?>
<xsl:stylesheet version = "1.0" xmlns:xsl = "
   <xsl:template match = "/stocks">
      <html>
         <body>
            <ul>
               <xsl:for-each select = "stock">
                  <xsl:if test = "starts-with(@symbol, 'C')">
                     <li>
                        <xsl:value-of select = "concat(@symbol, ' - ', name)"/>
                     </li>
                  </xsl:if>
               </xsl:for-each>
            </ul>
         </body>
      </html>
   </xsl:template>
</xsl:stylesheet>

stocks.xml in IE

The Xalan processor
Xalan is an XSLT processor. It can be used to apply transformations to XML, such as generating HTML from a given XML document.

remove the xsl reference in stocks.xml and run Xalan from the DOS command line
Microsoft(R) Windows DOS
(C)Copyright Microsoft Corp
C:\PROGRA~1\JAVA\JDK15~1.0_0\BIN>java org.apache.xalan.xslt.Process -INDENT 3 -IN stocks.xml -XSL stocks.xsl -OUT stocks.html
========= Parsing file:C:/PROGRA~1/Java/JDK15~1.0_0/bin/stocks.xsl ==========
Parse of file:C:/PROGRA~1/Java/JDK15~1.0_0/bin/stocks.xsl took 381 milliseconds
========= Parsing file:C:/PROGRA~1/Java/JDK15~1.0_0/bin/stocks.xml ==========
Parse of file:C:/PROGRA~1/Java/JDK15~1.0_0/bin/stocks.xml took 50 milliseconds
=============================
Transforming...
transform took 40 milliseconds
XSLProcessor: done
C:\PROGRA~1\JAVA\JDK15~1.0_0\BIN>

generates the html
<html>
   <body>
      <ul>
         <li>CSCO - Cisco Systems, Inc.</li>
         <li>CMGI - CMGI, Inc.</li>
      </ul>
   </body>
</html>
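The two list items in the generated HTML are consistent with keeping only the stocks whose symbol starts with C and joining symbol and name with " - ". That reading is inferred from the output, so the filter below is an assumption. A minimal Python sketch of the same selection, using the standard xml.etree.ElementTree and plain string methods in place of XPath string functions (which ElementTree does not provide):

```python
import xml.etree.ElementTree as ET

STOCKS_XML = """<stocks>
   <stock symbol="INTC"><name>Intel Corporation</name></stock>
   <stock symbol="CSCO"><name>Cisco Systems, Inc.</name></stock>
   <stock symbol="DELL"><name>Dell Computer Corporation</name></stock>
   <stock symbol="MSFT"><name>Microsoft Corporation</name></stock>
   <stock symbol="SUNW"><name>Sun Microsystems, Inc.</name></stock>
   <stock symbol="CMGI"><name>CMGI, Inc.</name></stock>
</stocks>"""

stocks = ET.fromstring(STOCKS_XML)

# Assumed filter: symbol begins with "C"; assumed format: "SYMBOL - name".
items = [
    f"{stock.get('symbol')} - {stock.findtext('name')}"
    for stock in stocks.findall("stock")
    if stock.get("symbol", "").startswith("C")
]

for item in items:
    print(f"<li>{item}</li>")
```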
