Processing XML Part II
Parser Operations with DOM and SAX overview
XML Validation with examples
Processing XML with SAX (locally and on the internet)
FixedFloatSwap.xml
FixedFloatSwap.dtd <!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
Operation of a Tree-based Parser Tree-Based Parser Application Logic Document Tree Valid XML DTD XML Document
Tree Benefits
Some data preparation tasks require early access to data that is further along in the document (e.g. we wish to extract titles to build a table of contents).
New tree construction is easier (e.g. XSLT works from a tree to convert FpML to WML).
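The table-of-contents case can be sketched with the JAXP DOM API. This is a minimal illustration, not code from the course: the class name TitleExtractor, the element name "title" and the sample document are all assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class: collects every <title> from a parsed tree.
public class TitleExtractor {

    // Build the complete tree first, then query it for all title elements.
    public static List<String> titles(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        // The whole document is in memory, so titles that occur late
        // in the input are as easy to reach as early ones.
        NodeList nodes = doc.getElementsByTagName("title");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Sample document invented for this sketch.
        String xml = "<book>"
                   + "<chapter><title>Trees</title><p>text</p></chapter>"
                   + "<chapter><title>Events</title><p>text</p></chapter>"
                   + "</book>";
        System.out.println(titles(xml));   // prints [Trees, Events]
    }
}
```

The cost of this convenience is that the entire document is held in memory before the first title is returned.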
Operation of an Event Based Parser Event-Based Parser Application Logic Valid XML DTD XML Document
Operation of an Event Based Parser (continued)
The parser reports events by calling methods in the application logic:
public void startDocument ()
public void endDocument ()
public void startElement (String name, AttributeList attrs)
public void endElement (String name)
public void characters (char buf [], int offset, int len)
public void error(SAXParseException e) throws SAXException {
    System.out.println("\n\n--Invalid document ---" + e);
}
Event-Driven Benefits
We do not need the memory required for trees.
Parsing can be done faster with no tree construction going on.
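A small sketch of the memory point: the handler below keeps a single counter no matter how large the document is. Note the assumptions: it uses the SAX2 DefaultHandler class from org.xml.sax.helpers rather than the SAX1 HandlerBase used elsewhere in these slides, and the class name ElementCounter is invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: counts elements without building a tree.
public class ElementCounter extends DefaultHandler {

    private int count = 0;   // the only state kept while parsing

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        count++;             // one event per start tag, then it is forgotten
    }

    public static int countElements(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ElementCounter handler = new ElementCounter();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.count;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<a><b/><b/><c>x</c></a>"));  // prints 4
    }
}
```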
XML Validation A batch validating process involves comparing the DTD against a complete document instance and producing a report containing any errors or warnings. Software developers should consider batch validation to be analogous to program compilation, with similar errors detected. Interactive validation involves constant comparison of the DTD against a document as it is being created.
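Batch validation can be sketched as a handler that collects every warning and error into a report instead of stopping at the first one. The sketch assumes the SAX2 DefaultHandler class; the class name BatchValidator and the child element declarations in the DTD string are assumptions made for the example (only the FixedFloatSwap content model appears in these slides).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: validates a whole document and returns a report.
public class BatchValidator extends DefaultHandler {

    // Assumed internal DTD subset; the (#PCDATA) children are a guess.
    public static final String DTD =
          "<!DOCTYPE FixedFloatSwap [\n"
        + "<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments)>\n"
        + "<!ELEMENT Notional (#PCDATA)>\n"
        + "<!ELEMENT Fixed_Rate (#PCDATA)>\n"
        + "<!ELEMENT NumYears (#PCDATA)>\n"
        + "<!ELEMENT NumPayments (#PCDATA)>\n"
        + "]>\n";

    private final List<String> report = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        report.add("warning, line " + e.getLineNumber() + ": " + e.getMessage());
    }

    @Override
    public void error(SAXParseException e) {
        // Validity errors are recoverable, so parsing continues after each one.
        report.add("error, line " + e.getLineNumber() + ": " + e.getMessage());
    }

    // Returns an empty list for a valid document. A document that is not
    // well formed still ends the parse with an exception (fatalError).
    public static List<String> validate(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        BatchValidator handler = new BatchValidator();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.report;
    }

    public static void main(String[] args) throws Exception {
        String bad = DTD + "<FixedFloatSwap><Notional>100</Notional>"
            + "<Fixed_Rate>5</Fixed_Rate>"
            + "<NumYears><StartYear>2000</StartYear></NumYears>"
            + "<NumPayments>6</NumPayments></FixedFloatSwap>";
        for (String line : validate(bad)) {
            System.out.println(line);
        }
    }
}
```

As with a compiler, the whole input is checked in one pass and all problems are listed together.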
XML Validation
The benefits of validating documents against a DTD include:
Programmers can write extraction and manipulation filters without fear of their software ever processing unexpected input.
Using an XML-aware word processor, authors and editors can be guided and constrained to produce conforming documents.
XML Validation Examples XML elements may contain further, embedded elements, and the entire document must be enclosed by a single document element. The degree to which an element’s content is organized into child elements is often termed its granularity. Some hierarchical structures may be recursive. The Document Type Definition (DTD) contains rules for each element allowed within a specific class of documents.
We’ll run this program against several XML files with DTDs.

// Validate.java
import java.io.*;
import org.xml.sax.*;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;

public class Validate extends HandlerBase {

    // cleared when the parser reports a problem with the document
    public static boolean valid = true;

    public static void main (String argv []) {
        if (argv.length != 1) {
            System.err.println ("Usage: java Validate filename.xml");
            System.exit (1);
        }
        // ask for a validating parser so the document is checked against its DTD
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        try {
            SAXParser saxParser = factory.newSAXParser();
            saxParser.parse( new File(argv [0]), new Validate());
        }
        catch (Throwable t) {
            // a fatal error (e.g. not well formed) also means "not valid"
            valid = false;
            t.printStackTrace ();
        }
        System.out.println("Valid document is " + valid);
        System.exit (0);
    }

    // called by the parser for each recoverable (validity) error
    public void error(SAXParseException e) throws SAXException {
        System.out.println(e.toString());
        valid = false;
    }
}
XML Document DTD Valid document is true
XML Document DTD Valid document is false
XML Document
DTD
Quantity Indicators
? 0 or 1 time
+ 1 or more times
* 0 or more times
C:\McCarthy\www\46-928\examples\sax>java Validate FixedFloatSwap.xml
Valid document is true
The locations where document text data is allowed are indicated by the keyword ‘#PCDATA’ (Parsed Character Data).
XML Document
Output of program after being modified to display the error.
C:\McCarthy\www\46-928\examples\sax>java Validate FixedFloatSwap.xml
org.xml.sax.SAXParseException: Element "NumYears" does not allow "StartYear" -- (#PCDATA)
org.xml.sax.SAXParseException: Element type "StartYear" is not declared.
org.xml.sax.SAXParseException: Element "NumYears" does not allow "EndYear" -- (#PCDATA)
org.xml.sax.SAXParseException: Element type "EndYear" is not declared.
Valid document is false
DTD
There are strict rules which must be applied when an element is allowed to contain both text and child elements. The #PCDATA keyword must be the first token in the group, and the group must be a choice group (using “|” not “,”). The group must be optional and repeatable, so it ends with “*”. This is known as a mixed content model.
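To see what a mixed content model looks like to an application, the sketch below validates a small document and lists the children of its root element. Everything specific here is an assumption made for illustration: the element names para and sub, the sample sentence, and the class name MixedContent.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: shows text and elements interleaved under one parent.
public class MixedContent {

    // #PCDATA first, a choice group, and a trailing "*": a mixed content model.
    public static final String SAMPLE =
          "<!DOCTYPE para [\n"
        + "<!ELEMENT para (#PCDATA | sub)*>\n"
        + "<!ELEMENT sub (#PCDATA)>\n"
        + "]>\n"
        + "<para>H<sub>2</sub>O is water.</para>";

    public static List<String> childKinds(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Turn validity errors into exceptions so an invalid document fails loudly.
        builder.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) throws SAXException {
                throw e;
            }
        });
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        List<String> kinds = new ArrayList<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.TEXT_NODE) {
                kinds.add("text:" + n.getNodeValue());
            } else {
                kinds.add("element:" + n.getNodeName());
            }
        }
        return kinds;
    }

    public static void main(String[] args) throws Exception {
        // prints [text:H, element:sub, text:O is water.]
        System.out.println(childKinds(SAMPLE));
    }
}
```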
DTD H 2 O is water. XML Document Valid document is true
Attributes
An attribute is associated with a particular element by the DTD and is assigned an attribute type. The attribute type can restrict the range of values it can hold.
Example attribute types include:
CDATA indicates a simple string of characters
NMTOKEN indicates a word or token
A named token group such as (left | center | right)
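The effect of a named token group can be sketched with a validating SAX parse. The declaration below, including the default value "dollars", is an assumption made for the example, as are the class name CurrencyAttribute and its methods; the sketch uses the SAX2 DefaultHandler class.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: an attribute restricted to a named token group.
public class CurrencyAttribute extends DefaultHandler {

    // Assumed declaration: two allowed tokens, with "dollars" as the default.
    public static final String DTD =
          "<!DOCTYPE Notional [\n"
        + "<!ELEMENT Notional (#PCDATA)>\n"
        + "<!ATTLIST Notional currency (dollars | pounds) \"dollars\">\n"
        + "]>\n";

    private String currency;        // value the parser reported for <Notional>
    private boolean valid = true;   // cleared on the first validity error

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        if (qName.equals("Notional")) {
            currency = atts.getValue("currency");
        }
    }

    @Override
    public void error(SAXParseException e) {
        valid = false;
    }

    public String currency() { return currency; }
    public boolean isValid() { return valid; }

    // Parses DTD + body with validation switched on.
    public static CurrencyAttribute parse(String body) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        CurrencyAttribute handler = new CurrencyAttribute();
        factory.newSAXParser().parse(
            new InputSource(new StringReader(DTD + body)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        // Attribute omitted: the parser supplies the declared default.
        CurrencyAttribute a = parse("<Notional>100</Notional>");
        System.out.println(a.currency() + " " + a.isValid());

        // A value outside the token group is a validity error.
        CurrencyAttribute b = parse("<Notional currency=\"euros\">100</Notional>");
        System.out.println(b.currency() + " " + b.isValid());
    }
}
```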
DTD XML Document C:\McCarthy\www\46-928\examples\sax>java Validate FixedFloatSwap.xml org.xml.sax.SAXParseException: Attribute value for "currency" is #REQUIRED. Valid document is false
DTD XML Document Valid document is true
DTD XML Document Valid document is true #IMPLIED means optional
DTD XML Document Valid document is true
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd" [ ] > &bankname; <!ELEMENT FixedFloatSwap (Bank,Notional, Fixed_Rate, NumYears, NumPayments ) > DTD Document using a General Entity Validate is true
<xsl:stylesheet xmlns:xsl=" version="1.0"> XSLT Program
C:\McCarthy\www\46-928\examples\sax>java -Dcom.jclark.xsl.sax.parser=com.jclark.xml.sax.CommentDriver com.jclark.xsl.sax.Driver FixedFloatSwap.xml FixedFloatSwap.xsl FixedFloatSwap.wml
C:\McCarthy\www\46-928\examples\sax>type FixedFloatSwap.wml
Mellon National Bank and Trust
XSLT OUTPUT
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd" [ ] > &bankname; An external text entity
Mellon Bank And Trust Corporation When you need a friend! XSLT Output Mellon Bank And Trust Corporation When you need a friend! JustAFile.dat
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) > XML Document DTD Internal Parameter Entities
&bankname; <!ELEMENT FixedFloatSwap (Bank, Notional, Fixed_Rate, NumYears, NumPayments ) > XML Document DTD General Entity defined in the DTD
will not be parsed for markup]]> <!ELEMENT FixedFloatSwap ( Notional, Fixed_Rate, NumYears, NumPayments, Note ) > XML Document DTD CDATA Section
<xsl:stylesheet xmlns:xsl=" version="1.0"> h XSLT Program
This is text that <b>will not be parsed for markup XSLT Output
DTD Components Kevin Dick 123 Anywhere Lane Apt 1b Palo Alto CA USA Order.xml
Kevin Dick 123 Not The Same Lane Work Place Palo Alto CA USA An order may have more than one address.
440BX Motherboard MB PC-100 DIMM x CD-ROM 1 50 Several products may be purchased.
Kevin S. Dick /01 The payment is with a Visa card. Valid document is true
order.dtd <!ATTLIST ORDER SOURCE (web | phone | retail) #REQUIRED CUSTOMERTYPE (consumer | business) "consumer" CURRENCY CDATA "USD" > Define an order based on other elements.
%anAddress; %aLineItem; %aPayment; The other elements are in their own dtd files. External parameter entities
address.dtd <!ELEMENT address (firstname, middlename?, lastname, street+, city, state,postal,country)> <!ATTLIST address ADDTYPE (bill | ship | billship) "billship"> <!ATTLIST street ORDER CDATA #IMPLIED>
lineitem.dtd <!ATTLIST lineitem ID ID #REQUIRED> <!ATTLIST product CAT (CDROM|MBoard|RAM) #REQUIRED>
<!ATTLIST card CARDTYPE (VISA|MasterCard|Amex) #REQUIRED> payment.dtd
Processing XML with SAX
Important interfaces and classes are found in the org.xml.sax package.
We will look at the following interfaces and then study an example:
interface DocumentHandler -- reports on document events
interface ErrorHandler -- reports on validity errors
class HandlerBase -- implements both of the above plus two others
public interface DocumentHandler Receive notification of general document events. This is the main interface that most SAX applications implement: if the application needs to be informed of basic parsing events, it implements this interface and registers an instance with the SAX parser. The parser uses the instance to report basic document-related events like the start and end of elements and character data.
Some methods from the DocumentHandler interface:
void characters(char[] ch, int start, int length)
    Receive notification of character data.
void endDocument()
    Receive notification of the end of a document.
void endElement(java.lang.String name)
    Receive notification of the end of an element.
void startDocument()
    Receive notification of the beginning of a document.
void startElement(java.lang.String name, AttributeList atts)
    Receive notification of the beginning of an element.
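One detail of characters() is worth a sketch: a SAX parser is allowed to deliver the text of one element in several calls, so handlers usually append to a buffer and read it in endElement. The example assumes the SAX2 DefaultHandler class (whose startElement and endElement take extra namespace arguments) rather than the SAX1 DocumentHandler listed above; the class name TextCollector is invented for the example.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: maps each leaf element name to its text.
public class TextCollector extends DefaultHandler {

    private final Map<String, String> values = new LinkedHashMap<>();
    private final StringBuilder buffer = new StringBuilder();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        buffer.setLength(0);                 // start fresh for each element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length);    // may be called more than once
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String text = buffer.toString().trim();
        if (!text.isEmpty()) {
            values.put(qName, text);         // only elements that held text
        }
        buffer.setLength(0);
    }

    public static Map<String, String> collect(String xml) throws Exception {
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<FixedFloatSwap><Notional>100</Notional>"
                   + "<Fixed_Rate>5</Fixed_Rate><NumYears>3</NumYears>"
                   + "<NumPayments>6</NumPayments></FixedFloatSwap>";
        // prints {Notional=100, Fixed_Rate=5, NumYears=3, NumPayments=6}
        System.out.println(collect(xml));
    }
}
```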
public interface ErrorHandler
Basic interface for SAX error handlers. If a SAX application needs to implement customized error handling, it must implement this interface and then register an instance with the SAX parser. The parser will then report all errors and warnings through this interface.
Some methods are:
void error(SAXParseException exception)
    Receive notification of a recoverable error.
void fatalError(SAXParseException exception)
    Receive notification of a non-recoverable error.
void warning(SAXParseException exception)
    Receive notification of a warning.
public class HandlerBase extends java.lang.Object implements EntityResolver, DTDHandler, DocumentHandler, ErrorHandler Default base class for handlers. This class implements the default behaviour for four SAX interfaces: EntityResolver, DTDHandler, DocumentHandler, and ErrorHandler.
<!ELEMENT FixedFloatSwap ( Bank, Notional, Fixed_Rate, NumYears, NumPayments ) > FixedFloatSwap.dtd Input
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd" [ ] > &bankname; FixedFloatSwap.xml Input
Event-driven processing in Java

// NotifyStr.java
// Adapted from XML and Java by Maruyama, Tamura and Uramoto
// IBM Tokyo Research, Addison-Wesley
import java.io.*;
import org.xml.sax.*;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;

public class NotifyStr extends HandlerBase {

    public static void main (String argv []) {
        if (argv.length != 1) {
            System.err.println ("Usage: java NotifyStr filename.xml");
            System.exit (1);
        }
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        // the handler whose methods the parser will call
        NotifyStr myHandler = new NotifyStr();
        try {
            SAXParser saxParser = factory.newSAXParser();
            saxParser.parse( new File(argv [0]), myHandler);
        }
        catch (Throwable t) {
            t.printStackTrace ();
        }
        System.exit (0);
    }

    public NotifyStr() {}

    public void startDocument() throws SAXException {
        System.out.println("startDocument called:");
    }

    public void endDocument() throws SAXException {
        System.out.println("endDocument called:");
    }

    public void startElement(String Name, AttributeList aMap) throws SAXException {
        System.out.println("startElement called: element name =" + Name);
        // examine the attributes
        for(int i = 0; i < aMap.getLength(); i++) {
            String attName = aMap.getName(i);
            String type = aMap.getType(i);
            String value = aMap.getValue(i);
            System.out.println(" attribute name = " + attName +
                               " type = " + type + " value = " + value);
        }
    }

    public void endElement(String name) throws SAXException {
        System.out.println("endElement is called:" + name);
    }

    public void characters(char[] ch, int start, int length) throws SAXException {
        // build String from char array
        String dataFound = new String(ch,start,length);
        System.out.println("characters called:" + dataFound);
    }

    public void error(SAXParseException e) throws SAXException {
        System.out.println("Parsing error");
        System.out.println(e.toString());
    }
}
Output
C:\McCarthy\www\46-928\examples\sax>java NotifyStr FixedFloatSwap.xml
startDocument called:
startElement called: element name =FixedFloatSwap
startElement called: element name =Bank
characters called:Pittsburgh National Corporation
endElement is called:Bank
startElement called: element name =Notional
 attribute name = currency type = ENUMERATION value = pounds
characters called:100
endElement is called:Notional
startElement called: element name =Fixed_Rate
characters called:5
endElement is called:Fixed_Rate
startElement called: element name =NumYears
characters called:3
endElement is called:NumYears
startElement called: element name =NumPayments
characters called:6
endElement is called:NumPayments
endElement is called:FixedFloatSwap
endDocument called:
Accessing the swap from Jigsaw <!DOCTYPE FixedFloatSwap [ ] > &bankname; Saved under Www/fpml/ServerSwap.xml
Servlet Code

// This servlet file is stored in WWW/Jigsaw/servlet/GetXML.java
// This servlet reads a user selected xml file from
// the Www/fpml directory and returns it to the client.
import java.io.*;
import java.util.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class GetXML extends HttpServlet {

    public void doGet(HttpServletRequest req, HttpServletResponse res)
            throws ServletException, IOException {

        String theData = "";
        // the extra path names the requested file; drop its leading '/'
        String extraPath = req.getPathInfo();
        extraPath = extraPath.substring(1);

        // read the file and write it to the client
        try {
            // open the file and wrap it in a BufferedReader
            FileInputStream theFile = new FileInputStream(
                "c:\\Jigsaw\\Jigsaw\\Jigsaw\\Www\\fpml\\" + extraPath);
            InputStreamReader is = new InputStreamReader(theFile);
            BufferedReader br = new BufferedReader(is);

            // read the file into the string theData
            String thisLine;
            while((thisLine = br.readLine()) != null) {
                theData += thisLine + "\n";
            }
            br.close();
        }
        catch(Exception e) {
            System.err.println("Error " + e);
        }

        PrintWriter out = res.getWriter();
        out.write(theData);
        System.out.println("Wrote document to client");
        // write data to console
        System.out.println(theData);
        out.close();
    }
}
// Sax Client
import java.io.*;
import org.xml.sax.*;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;

public class JigsawNotifyStr extends HandlerBase {

    public static void main (String argv []) {
        if (argv.length != 1) {
            System.err.println ("Usage: java JigsawNotifyStr filename.xml");
            System.exit (1);
        }
        String serverString = "
        String fileName = argv[0];

        // the parser reads the document from the server rather than a local file
        InputSource is = new InputSource(serverString + fileName);
        System.out.println("Got the input source");

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        JigsawNotifyStr myHandler = new JigsawNotifyStr();
        try {
            SAXParser saxParser = factory.newSAXParser();
            saxParser.parse( is, myHandler);
        }
        catch (Throwable t) {
            System.out.println("Big error");
            t.printStackTrace ();
        }
        System.exit (0);
    }

    public JigsawNotifyStr() {}

    public void startDocument() throws SAXException {
        System.out.println("startDocument called:");
    }

    public void endDocument() throws SAXException {
        System.out.println("endDocument called:");
    }

    // startElement, endElement and characters are the same as before

    public void error(SAXParseException e) throws SAXException {
        // describe each error
        System.out.println("Parsing error");
        System.out.println(e.toString());
    }
}
Being served by the servlet <!DOCTYPE FixedFloatSwap [ ] > &bankname;
We have some parsing errors. Do you see why?
Got the input source
startDocument called:
Parsing error
org.xml.sax.SAXParseException: Element type "FixedFloatSwap" is not declared.
startElement called: element name =FixedFloatSwap
characters called:
Parsing error
org.xml.sax.SAXParseException: Element type "Bank" is not declared.
startElement called: element name =Bank
characters called:Pittsburgh National Corporation
endElement is called:Bank
characters called:
Parsing error
org.xml.sax.SAXParseException: Element type "Notional" is not declared.
Parsing error
org.xml.sax.SAXParseException: Attribute "currency" is not declared for element "Notional".
startElement called: element name =Notional
 attribute name = currency type = CDATA value = pounds
characters called:100
endElement is called:Notional
characters called:
Parsing error
org.xml.sax.SAXParseException: Element type "Fixed_Rate" is not declared.
startElement called: element name =Fixed_Rate
characters called:5
endElement is called:Fixed_Rate
characters called:
Parsing error
org.xml.sax.SAXParseException: Element type "NumYears" is not declared.
startElement called: element name =NumYears
characters called:3
endElement is called:NumYears
characters called:
Parsing error
org.xml.sax.SAXParseException: Element type "NumPayments" is not declared.
startElement called: element name =NumPayments
characters called:6
endElement is called:NumPayments
characters called:
endElement is called:FixedFloatSwap
endDocument called:
The InputSource Class
The SAX and DOM parsers need XML input. The “output” produced by these parsers amounts to a series of method calls (SAX) or an application programmer interface to the tree (DOM). An InputSource object can be used to provide input to the parser.
InputSource -> SAX or DOM parser -> Events or Tree -> application
So, how do we build an InputSource object?
Some InputSource constructors:
InputSource(String systemId);   // a URI (or file path) naming the document
InputSource(InputStream byteStream);
InputSource(Reader characterStream);
For example:
String text = " some xml ";
StringReader sr = new StringReader(text);
InputSource is = new InputSource(sr);
:
myParser.parse(is);
But what about the DTD? public interface EntityResolver Basic interface for resolving entities. If a SAX application needs to implement customized handling for external entities, it must implement this interface and register an instance with the SAX parser using the parser's setEntityResolver method. The parser will then allow the application to intercept any external entities (including the external DTD subset and external parameter entities, if any) before including them.
EntityResolver
public InputSource resolveEntity(String publicId, String systemId) {
    // Add this method to the client above. The systemId String
    // holds the path to the dtd as specified in the xml document.
    // We may now access the dtd from a servlet and return an
    // InputSource, or return null and let the parser resolve the
    // external entity.
    System.out.println("Attempting to resolve" +
                       " Public id :" + publicId +
                       " System id :" + systemId);
    return null;
}
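A resolver that actually supplies the DTD might look like the sketch below. It is only an illustration under stated assumptions: the DTD text is held in a string instead of being fetched from a servlet, the child element declarations are guesses, the class name DtdResolver is invented, and the handler extends the SAX2 DefaultHandler class (which implements EntityResolver) rather than HandlerBase.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: hands the parser a DTD it could not find itself.
public class DtdResolver extends DefaultHandler {

    // Stand-in for a DTD fetched from a server; declarations are assumed.
    public static final String DTD_TEXT =
          "<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments)>\n"
        + "<!ELEMENT Notional (#PCDATA)>\n"
        + "<!ELEMENT Fixed_Rate (#PCDATA)>\n"
        + "<!ELEMENT NumYears (#PCDATA)>\n"
        + "<!ELEMENT NumPayments (#PCDATA)>\n";

    public static final String DOCUMENT =
          "<!DOCTYPE FixedFloatSwap SYSTEM \"FixedFloatSwap.dtd\">\n"
        + "<FixedFloatSwap><Notional>100</Notional><Fixed_Rate>5</Fixed_Rate>"
        + "<NumYears>3</NumYears><NumPayments>6</NumPayments></FixedFloatSwap>";

    private final List<String> errors = new ArrayList<>();
    private int resolved = 0;   // how many entities this handler supplied

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // The parser may pass an absolute form of the system id,
        // so match on the end of the string.
        if (systemId != null && systemId.endsWith("FixedFloatSwap.dtd")) {
            resolved++;
            return new InputSource(new StringReader(DTD_TEXT));
        }
        return null;            // let the parser resolve anything else
    }

    @Override
    public void error(SAXParseException e) {
        errors.add(e.getMessage());
    }

    public List<String> errors() { return errors; }
    public int resolvedCount() { return resolved; }

    public static DtdResolver parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        DtdResolver handler = new DtdResolver();
        // SAXParser.parse registers the handler as the entity resolver too.
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        DtdResolver h = parse(DOCUMENT);
        System.out.println("resolved " + h.resolvedCount()
                           + ", errors " + h.errors().size());
    }
}
```

Returning null keeps the parser's default behaviour, so a resolver like this only needs to recognise the entities it wants to override.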