Processing XML Processing XML using XSLT Processing XML documents with Java (DOM) Next week -- Processing XML documents with Java (SAX)


1 Processing XML Processing XML using XSLT Processing XML documents with Java (DOM) Next week -- Processing XML documents with Java (SAX)

2 Processing XML using XSLT
To use James Clark’s xt program, visit his site at http://www.jclark.com/ and click on XML. The following programs were tested with the command line
C:\>xt somefile.xml somefile.xsl resultfile.html
The xt classes (and XSLT processing) may also be accessed via a servlet.

3 The Catcher in the Rye J. D. Salinger Little, Brown and Company Input

4 <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> Processing

5 The Catcher in the Rye J. D. Salinger Little, Brown and Company Output

6 The Catcher in the Rye J. D. Salinger Little, Brown and Company Input

7 <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> The default rules match the root, library and block elements.

8 The Catcher in the Rye J. D. Salinger Little, Brown and Company The output is the same.

9 The Catcher in the Rye J. D. Salinger Little, Brown and Company Cliff Notes on The Catcher in the Rye Two books in the input

10 <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> What’s the output?

11 The Catcher in the Rye J. D. Salinger Little, Brown and Company Cliff Notes on The Catcher in the Rye Illegal HTML

12 The Catcher in the Rye J. D. Salinger Little, Brown and Company Input

13 <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> <!-- --> We are not matching on publisher.

14 The Catcher in the Rye J. D. Salinger Little, Brown and Company We get the default rule matching the publisher and then printing its child.

15 The Catcher in the Rye J. D. Salinger Little, Brown and Company Input

16 <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> We can skip the publisher by matching and stopping the recursion.
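The effect of matching an element and stopping the recursion can be tried from Java. This is only a minimal sketch: it assumes the JDK's built-in javax.xml.transform API in place of xt, and the element names (library, book, title, author, publisher) and the stylesheet are guesses, since the slide markup was not preserved. The empty template for publisher produces nothing, so the built-in rules copy only the remaining text.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class SkipPublisherSketch {

    // Hypothetical input: the element names are assumptions, not the slide's markup.
    static final String INPUT =
        "<library><book>"
        + "<title>The Catcher in the Rye</title>"
        + "<author>J. D. Salinger</author>"
        + "<publisher>Little, Brown and Company</publisher>"
        + "</book></library>";

    // The empty template matches publisher and emits nothing, so the
    // built-in rules never descend to the publisher's text child.
    static final String STYLESHEET =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='publisher'/>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to the XML and returns the result as a string.
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(INPUT, STYLESHEET));
    }
}
```

Removing the publisher template from STYLESHEET lets the default rule reach the publisher's text again.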

17 The Catcher in the Rye J. D. Salinger

18 The Catcher in the Rye J. D. Salinger Little, Brown and Company The Catcher in the Rye J. D. Salinger Little, Brown and Company The Catcher in the Rye J. D. Salinger Little, Brown and Company A shelf has many books.

19 <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> Will this do the job?

20 The Catcher in the Rye J. D. Salinger Little, Brown and Company The Catcher in the Rye J. D. Salinger Little, Brown and Company The Catcher in the Rye J. D. Salinger Little, Brown and Company This is not what we want.

21 The Catcher in the Rye J. D. Salinger Little, Brown and Company The Catcher in the Rye J. D. Salinger Little, Brown and Company The Catcher in the Rye J. D. Salinger Little, Brown and Company Same input.

22 <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> Found a shelf Checks for a shelf and quits.

23 Found a shelf Output

24 The Catcher in the Rye J. D. Salinger Little, Brown and Company The Catcher in the Rye J. D. Salinger Little, Brown and Company The Catcher in the Rye J. D. Salinger Little, Brown and Company Same input.

25 <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> These are a few of my favorite books Produce a table of books.

26 These are a few of my favorite books
1 | The Catcher in the Rye | J. D. Salinger | Little, Brown and Company
2 | The XSLT Programmer's Reference | Michael Kay | Wrox Press
3 | Computer Organization and Design | Patterson and Hennessy | Morgan Kaufmann


28 Processing XML Documents with Java (DOM) The following examples were tested using Sun’s JAXP (Java API for XML Parsing). This is available at http://www.javasoft.com/ (click on XML).

29 XML DOM The World Wide Web Consortium’s Document Object Model provides a common vocabulary to use in manipulating XML documents. It may be used from C, Java, Perl, Python, or VB. Things may be quite different “under the hood”, but the interface to the document will be the same.

30 I am The Cat in The Hat I am Little Cat A I am Little Cat B I am Little Cat C The XML File “cats.xml”

31 DOM [Figure: DOM tree of cats.xml. Node labels: document, XML doc, doctype, element, text. The topcat element is called the Document Element.]

32 Agreement.xml Element content: Notional 100, Fixed_Rate 5, NumYears 3, NumPayments 6

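33 [Figure: DOM tree. document at the top, then XML doc, doctype and FixedFloatSwap; under FixedFloatSwap the elements Notional, FixedRate, NumYears and NumPayments, with text nodes 100, 5, 3 and 6.] All of these nodes implement the Node interface.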

34 Operation of a Tree-based Parser [Diagram labels: XML Document, DTD, Tree-Based Parser, Valid, Document Tree, Application Logic]

35 Some DOM Documentation from JavaSoft

36 The Node Interface The Node interface is the primary datatype for the entire Document Object Model. It represents a single node in the document tree. While all objects implementing the Node interface expose methods for dealing with children, not all objects implementing the Node interface may have children. For example, Text nodes may not have children.

37 Properties All Nodes have properties. Not all properties are needed by all types of nodes. The attributes property is an important part of the Element node but is null for Text nodes. We access the properties through methods…

38 Some Methods of Node Example methods are: String getNodeName(), whose result depends on the node type: for an Element node it returns the tag name; for a Text node it returns #text.

39 Some Methods of Node Example Methods are: short getNodeType() Might return a constant like ELEMENT_NODE or TEXT_NODE or …

40 Some Methods of Node Example methods are: String getNodeValue(): if the node is an Element node it returns null; if the node is a Text node it returns a String holding that text.

41 Some Methods of Node Example Methods are: Node getParentNode() returns a reference to the parent

42 Some Methods of Node Example Methods are: public Node getFirstChild() Returns the value of the firstChild property.

43 Some Methods of Node Example Methods are: public NodeList getChildNodes() returns a NodeList object NodeList is an interface and not a Node.
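The Node methods listed on the preceding slides can be exercised on a one-element document. This is a small sketch, not part of the original deck: the class and helper names are made up for illustration, and the XML is a cut-down fragment held in a string rather than a file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

final class NodeMethodsSketch {

    // Parses an XML string into a DOM Document (hypothetical helper).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Summarizes the three basic properties every Node exposes.
    static String describe(Node n) {
        return n.getNodeName() + " type=" + n.getNodeType() + " value=" + n.getNodeValue();
    }

    public static void main(String[] args) {
        Document doc = parse("<Notional>100</Notional>");
        Node element = doc.getDocumentElement();
        Node text = element.getFirstChild();

        System.out.println(describe(element)); // element: tag name, ELEMENT_NODE, null value
        System.out.println(describe(text));    // text: #text, TEXT_NODE, the character data
        System.out.println(text.getParentNode().isSameNode(element));
        System.out.println(element.getChildNodes().getLength());
    }
}
```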

44 The NodeList Interface The NodeList interface provides the abstraction of an ordered collection of nodes, without defining or constraining how this collection is implemented. The items in the NodeList are accessible via an integral index, starting from 0.

45 There are only two methods of the NodeList Interface public Node item(int index) Returns the item at index in the collection. If index is greater than or equal to the number of nodes in the list, this returns null.

46 There are only two methods of the NodeList Interface public int getLength() Returns the value of the length property.
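The two NodeList methods are enough to walk a list. Below is a minimal sketch; the class and helper names are illustrative, and the XML string is a shortened version of the swap document with no whitespace between elements, so the child list holds only elements.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NodeList;

final class NodeListSketch {

    // Shortened sample document (an assumption for this sketch).
    static final String SWAP =
        "<FixedFloatSwap><Notional>100</Notional><Fixed_Rate>5</Fixed_Rate></FixedFloatSwap>";

    // Returns the children of the document element as a NodeList.
    static NodeList children(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement()
                .getChildNodes();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Collects node names using only getLength() and item(int).
    static List<String> names(NodeList list) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < list.getLength(); i++) {
            result.add(list.item(i).getNodeName());
        }
        return result;
    }

    public static void main(String[] args) {
        NodeList list = children(SWAP);
        System.out.println(names(list));
        // An index at or past getLength() yields null rather than an exception.
        System.out.println(list.item(list.getLength()));
    }
}
```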

47 The Element Interface public interface Element extends Node By far the vast majority of objects (apart from text) that authors encounter when traversing a document are Element nodes. Inheritance: nothing prevents us from extending one interface in order to create another. Those who implement Element just have more promises to keep.

48 The Element Interface public interface Element extends Node Some methods in the Element interface String getAttribute(String name) Retrieves an attribute value by name.

49 The Element Interface public interface Element extends Node Some methods in the Element interface public String getTagName() Returns the value of the tagName property.

50 The Element Interface public interface Element extends Node Some methods in the Element interface public NodeList getElementsByTagName(String name) Returns a NodeList of all descendant elements with a given tag name, in the order in which they would be encountered in a preorder traversal of the Element tree.
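The three Element methods shown on these slides can be combined in a few lines. This is a sketch under assumptions: the shelf/book/title markup and the id attribute are invented for illustration, as are the class and helper names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class ElementSketch {

    // Hypothetical markup; element and attribute names are assumptions.
    static final String SHELF =
        "<shelf>"
        + "<book id=\"b1\"><title>A</title></book>"
        + "<book id=\"b2\"><title>B</title></book>"
        + "</shelf>";

    // Parses the string and returns the document element.
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Text of every descendant title element, in document (preorder) order.
    static List<String> titles(Element start) {
        NodeList found = start.getElementsByTagName("title");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < found.getLength(); i++) {
            result.add(found.item(i).getFirstChild().getNodeValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Element shelf = root(SHELF);
        Element firstBook = (Element) shelf.getElementsByTagName("book").item(0);
        System.out.println(shelf.getTagName());
        System.out.println(firstBook.getAttribute("id"));
        System.out.println(titles(shelf));
        // A missing attribute comes back as an empty string, not null.
        System.out.println(shelf.getAttribute("missing").isEmpty());
    }
}
```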

51 The CharacterData Interface public interface CharacterData extends Node The CharacterData interface extends Node with a set of attributes and methods for accessing character data in the DOM. For clarity this set is defined here rather than on each object that uses these attributes and methods. No DOM objects correspond directly to CharacterData, though Text and others do inherit the interface from it. All offsets in this interface start from 0.

52 The CharacterData Interface public interface CharacterData extends Node An example method: public String getData() Returns the value of the character data of the node that implements this interface. The Text interface extends CharacterData. public void setData(String data) is also available.
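A short sketch of getData() and setData() on a Text node, reached through the CharacterData interface. The class and helper names are illustrative only, and the helper assumes the element's first child is a text node.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Element;

final class CharacterDataSketch {

    // Parses the string and returns the document element.
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Replaces the element's text and returns the previous text.
    // Assumes the first child is a Text node (Text extends CharacterData).
    static String replaceText(Element e, String newText) {
        CharacterData data = (CharacterData) e.getFirstChild();
        String old = data.getData();
        data.setData(newText);
        return old;
    }

    public static void main(String[] args) {
        Element notional = root("<Notional>100</Notional>");
        System.out.println(replaceText(notional, "250"));
        System.out.println(notional.getFirstChild().getNodeValue());
    }
}
```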

53 The Document Interface public interface Document extends Node The Document interface represents the entire HTML or XML document. Conceptually, it is the root of the document tree, and provides the primary access to the document's data.

54 The Document Interface public interface Document extends Node Some methods: public Element getDocumentElement() Returns the value of the documentElement property. This is a convenience attribute that allows direct access to the child node that is the root element of the document. For HTML documents, this is the element with the tagName "HTML".

55 The Document Interface Some methods: public NodeList getElementsByTagName(String tagname) Returns a NodeList of all the Elements with a given tag name in the order in which they would be encountered in a preorder traversal of the Document tree. Parameters: tagname - The name of the tag to match on. The special value "*" matches all tags. Returns: A new NodeList object containing all the matched Elements.

56 FixedFloatSwap.xml Element content: Notional 100, Fixed_Rate 5, NumYears 3, NumPayments 6

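57 FixedFloatSwap.xml [Figure: DOM tree. document at the top, then XML doc, doctype and FixedFloatSwap; under FixedFloatSwap the elements Notional, FixedRate, NumYears and NumPayments, with text nodes 100, 5, 3 and 6.]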

58 An Example: process a local file
import java.io.File;
import org.w3c.dom.*;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

59 An Example (continued)
public class Simulator3 {
    public static void main(String argv[]) {
        Document doc;
        if (argv.length != 1) {
            System.err.println("usage: java Simulator3 documentname");
            System.exit(1);
        }
        try {
            DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();

60 An Example (continued)
            doc = docBuilder.parse(new File(argv[0]));
            Element top = doc.getDocumentElement();
            top.normalize(); // concatenate adjacent text nodes
            // "*" matches every descendant element of the root
            NodeList elementList = top.getElementsByTagName("*");
            int listLength = elementList.getLength();
            for (int i = 0; i < listLength; i++) {
                Element e = (Element) elementList.item(i);
                System.out.print(e.getNodeName());
                // assumes the first child of each element is its text
                Text t = (Text) e.getFirstChild();
                System.out.println(t.getNodeValue());
            }
        }

61 An Example (continued)
        catch (SAXParseException err) {
            System.out.println("Parsing error" + ", line " + err.getLineNumber()
                + ", URI " + err.getSystemId());
            System.out.println(" " + err.getMessage());
        }
        catch (SAXException e) {
            // print the wrapped exception if there is one
            Exception x = e.getException();
            ((x == null) ? e : x).printStackTrace();
        }
        catch (Throwable t) {
            t.printStackTrace();
        }
        System.exit(0);
    }
}

62 FixedFloatSwap.xml Element content: Notional 100, Fixed_Rate 5, NumYears 3, NumPayments 6

63 Output Notional100 Fixed_Rate5 NumYears3 NumPayments6

64 Another DOM Example A Java program that reads FixedFloatSwap.xml from Jigsaw and performs validation against the server-based DTD. The program then displays the DOM tree. More on DTDs next week.

65 Another DOM Example (continued): process a file on the Internet
import java.net.*;
import java.io.*;
import org.w3c.dom.*;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.xml.sax.*;

public class Simulator6 {
    public static void main(String argv[]) {
        try {
            DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
            docBuilderFactory.setValidating(true);
            docBuilderFactory.setNamespaceAware(true);

66 Another DOM Example (continued): register our own event handler
            DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
            docBuilder.setErrorHandler(
                new org.xml.sax.ErrorHandler() {
                    public void fatalError(SAXParseException e) throws SAXException {
                        System.out.println("Fatal error");
                        // an exception will be thrown by SAX
                    }
                    public void error(SAXParseException e) throws SAXParseException {
                        System.out.println("Validity error");
                        throw e;
                    }

67 Another DOM Example (continued)
                    public void warning(SAXParseException err) throws SAXParseException {
                        System.out.println("** Warning" + ", line " + err.getLineNumber()
                            + ", uri " + err.getSystemId());
                        System.out.println(" " + err.getMessage());
                        throw err;
                    }
                }
            );
public interface ErrorHandler
Basic interface for SAX error handlers. If a SAX application needs to implement customized error handling, it must implement this interface and then register an instance with the SAX parser using the parser's setErrorHandler method. The parser will then report all errors and warnings through this interface. The parser shall use this interface instead of throwing an exception: it is up to the application whether to throw an exception for different types of errors and warnings. Note, however, that there is no requirement that the parser continue to provide useful information after a call to fatalError (in other words, a SAX driver class could catch an exception and report a fatalError).

68 Another DOM Example (continued)
            // A single input source for an XML entity.
            // The URL uses Jigsaw’s port; the file is under WWW/fpml.
            InputSource is = new InputSource("http://mccarthy.heinz.cmu.edu:8001/fpml/Agreement.xml");
            Document doc = docBuilder.parse(is);
            System.out.println("No Problems found");
            // Let’s print the tree
            TreePrinter tp = new TreePrinter(doc);
            tp.print();
        }

69 Another DOM Example (continued)
        catch (SAXParseException err) {
            System.out.println("Catching raised exception");
            System.out.println("Parsing error" + ", line " + err.getLineNumber()
                + ", URI " + err.getSystemId());
            System.out.println(" " + err.getMessage());
        }
        catch (SAXException e) {
            System.out.println("Catch clause 2");
            Exception x = e.getException();
            ((x == null) ? e : x).printStackTrace();
        }
        catch (Throwable t) {
            System.out.println("Catch clause 3");
            t.printStackTrace();
        }
        System.exit(0);
    }
}
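As the ErrorHandler documentation quoted earlier says, the application decides what to do with each report. The following variation is only a sketch, not the handler used in Simulator6: it collects the reports in a list instead of printing them, and parses from a string so it runs without a server. The class and method names are invented for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

final class CollectingHandlerSketch {

    // Parses the XML and returns every problem the parser reported.
    static List<String> problems(String xml) {
        final List<String> found = new ArrayList<>();
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException e) {
                    found.add("warning: line " + e.getLineNumber());
                }
                @Override
                public void error(SAXParseException e) {
                    found.add("error: line " + e.getLineNumber());
                }
                @Override
                public void fatalError(SAXParseException e) throws SAXParseException {
                    found.add("fatal: line " + e.getLineNumber());
                    throw e; // parsing cannot usefully continue
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Parse failures were already recorded by the handler above;
            // other failures (for example, no parser available) are ignored here.
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(problems("<a><b/></a>")); // well-formed input
        System.out.println(problems("<a><b></a>"));  // mismatched tags
    }
}
```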

70 TreePrinter Class
import org.w3c.dom.*;

public class TreePrinter {
    private Document doc;
    private int currentIndent;

    public TreePrinter(Document d) {
        currentIndent = 2;
        doc = d;
    }

    public void print() {
        privatePrint(doc, currentIndent);
    }

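71 FixedFloatSwap.xml [Figure: DOM tree. document at the top, then XML doc, doctype and FixedFloatSwap; under FixedFloatSwap the elements Notional, FixedRate, NumYears and NumPayments, with text nodes 100, 5, 3 and 6.]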

72 TreePrinter Class (continued)
    public void privatePrint(Node n, int indent) {
        for (int i = 0; i < indent; i++) System.out.print(" ");
        // Case labels must be compile-time constants, so use Node.X rather than n.X
        switch (n.getNodeType()) {
            // Print information as each node type is encountered
            case Node.DOCUMENT_NODE:
                System.out.println(n.getNodeName() + "...Document Node");
                break;
            case Node.ELEMENT_NODE:
                System.out.println(n.getNodeName() + "...Element Node");
                break;
            case Node.TEXT_NODE:
                System.out.println(n.getNodeName() + "...Text Node");
                break;
            case Node.CDATA_SECTION_NODE:
                System.out.println(n.getNodeName() + "...CDATA Node");
                break;
            case Node.PROCESSING_INSTRUCTION_NODE:
                System.out.println(" " + "...PI Node");
                break;

73 TreePrinter Class (continued)
            case Node.COMMENT_NODE:
                System.out.println(" " + "...Comment node");
                break;
            case Node.ENTITY_NODE:
                System.out.println("ENTITY " + n.getNodeName() + "...Entity Node");
                break;
            case Node.ENTITY_REFERENCE_NODE:
                System.out.println("&" + n.getNodeName() + ";" + "...Entity Reference Node");
                break;
            case Node.DOCUMENT_TYPE_NODE:
                System.out.println("DOCTYPE" + n.getNodeName() + "...Document Type Node");
                break;
            default:
                System.out.println("?" + n.getNodeName());
        }
        // Recurse over the children, indenting one level deeper
        Node child = n.getFirstChild();
        while (child != null) {
            privatePrint(child, indent + currentIndent);
            child = child.getNextSibling();
        }
    }
}

74 Output No Problems found #document...Document Node DOCTYPEFixedFloatSwap...Document Type Node FixedFloatSwap...Element Node #text...Text Node Notional...Element Node #text...Text Node Fixed_Rate...Element Node #text...Text Node NumYears...Element Node #text...Text Node #text...Text Node NumPayments...Element Node #text...Text Node

75 Example of using XML: Web Applications Our example is an application called PowerWarning. Using PowerWarning we access a weather information site on the Web to obtain the current temperature for a particular location. Based on the temperature and if certain conditions have been met, the application sends a notice to clients of the service alerting them of the condition.

76 We know that the weather information is available from the Web at http://www.xweather.com/White_Plains_NY_US.html.
[1]
[2]
[3] Weather Report
[4]
[5]
[6] Weather Report -- White Plains, NY
[7]
[8] Date/Time 11 AM EDT Sat Jul 25 1998
[9] Current Tem. 70°
[10] Today’s High 82°
[11] Today’s Low 62°
[12]
[13]
[14]

77 Strategy 1: For the current temperature of White Plains, go to line 9, column 46 of the page and continue until reaching the next ampersand. Strategy 2: For the current temperature of White Plains, go to the first <TABLE> tag, then go to the second <TR> tag within the table, and then go to the second <TD> tag within the row.

78 <!DOCTYPE WeatherReport SYSTEM "http://www.xweather.com/WeatherReport.dtd"> White Plains NY Sat Jul 25 1998 11 AM EDT 70 82 62

79 Strategy 3: For the current temperature of White Plains, N.Y., go to the tag.
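With the DOM, Strategy 3 becomes a lookup by tag name instead of a position in the page. This is a sketch under assumptions: the tag names below (including CurrentTemp) are invented, because the slide's markup was not preserved, and the report is held in a string rather than fetched from the Web.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class CurrentTempSketch {

    // Hypothetical report; every tag name here is an assumption.
    static final String REPORT =
        "<WeatherReport>"
        + "<City>White Plains</City><State>NY</State>"
        + "<CurrentTemp>70</CurrentTemp><High>82</High><Low>62</Low>"
        + "</WeatherReport>";

    // Returns the text of the first element with the given tag name,
    // or null if there is no such element or it has no content.
    static String firstText(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList matches = doc.getElementsByTagName(tag);
            if (matches.getLength() == 0) {
                return null;
            }
            Node child = matches.item(0).getFirstChild();
            return (child == null) ? null : child.getNodeValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstText(REPORT, "CurrentTemp"));
    }
}
```

Unlike Strategies 1 and 2, this lookup keeps working if the page layout changes, as long as the tag name stays the same.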

80 [Diagram labels: http://www.xweather.com, XML, common logical data (DOM/XML), application programs, WeatherReport application, PowerWarning application, WML, HTML, mobile users, PC users] Next week -- SAX

81 Classpath Info
When I work with JAXP (DOM and SAX) my classpath contains the following (available at www.javasoft.com/xml):
c:\Program Files\JavaSoft\Jaxp1.0\jaxp.jar
c:\Program Files\JavaSoft\Jaxp1.0\parser.jar
When I work with xt (XSLT) my classpath contains the following (available at www.jclark.com/xml):
c:\Xt\xt.jar
c:\XP\xp.jar
For Jigsaw 2.0.5 my classpath looks like this (available at www.w3.org/jigsaw):
c:\Jigsaw\Jigsaw\classes\jigsaw.zip
c:\Jigsaw\Jigsaw\classes\servlet.jar
c:\Jigsaw\Jigsaw\classes\jigadmin.jar
For Jigsaw to run JSP pages, my classpath looks like this (available at www.klomp.org/gnujsp):
c:\gnujsp\gnujsp-1.0.1\lib\gnujsp10.jar

