Processing XML Part II Parser Operations with DOM and SAX overview XML Validation with examples Processing XML with SAX (locally and on the internet)


FixedFloatSwap.xml

FixedFloatSwap.dtd

<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >

Operation of a Tree-based Parser

Figure: the XML document and its DTD feed a tree-based parser, which builds a document tree from valid XML; the application logic then works on that tree.

Tree Benefits

Some data preparation tasks require early access to data that appears later in the document (for example, extracting titles to build a table of contents). Constructing a new tree is also easier (for example, XSLT works from a tree to convert FpML to WML).
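A minimal sketch of tree-based processing with the JAXP DOM API (javax.xml.parsers.DocumentBuilder). It assumes a hypothetical in-memory version of FixedFloatSwap.xml whose values are taken from the NotifyStr output later in these slides; the class and method names are illustrative. Because the whole tree is built before the application looks at it, the program can read NumPayments, the last element, before NumYears.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class TreeDemo {
    // Hypothetical sample modelled on FixedFloatSwap.xml
    public static final String XML =
        "<FixedFloatSwap>"
      + "<Notional currency=\"pounds\">100</Notional>"
      + "<Fixed_Rate>5</Fixed_Rate>"
      + "<NumYears>3</NumYears>"
      + "<NumPayments>6</NumPayments>"
      + "</FixedFloatSwap>";

    // Returns the text of the first element with the given tag name
    static String text(Document doc, String tag) {
        return doc.getElementsByTagName(tag).item(0).getTextContent();
    }

    // The whole tree is in memory, so elements can be read in any order
    public static int paymentsPerYear(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        int payments = Integer.parseInt(text(doc, "NumPayments")); // last element first
        int years = Integer.parseInt(text(doc, "NumYears"));
        return payments / years;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Payments per year: " + paymentsPerYear(XML));
    }
}
```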

Operation of an Event Based Parser

Figure: the XML document and its DTD feed an event-based parser, which passes events for valid XML directly to the application logic; no document tree is built.

Operation of an Event Based Parser

As the event-based parser reads the XML document and checks it against the DTD, it calls methods in the application logic such as:

```java
public void startDocument ()
public void endDocument ()
public void startElement (String name, AttributeList attrs)
public void endElement (String name)
public void characters (char buf [], int offset, int len)
public void error(SAXParseException e) throws SAXException {
    System.out.println("\n\n--Invalid document ---" + e);
}
```

Event-Driven Benefits

We do not need the memory required for trees, and parsing can be faster because no tree is being constructed.
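For contrast, a sketch of event-driven processing with SAX 2's org.xml.sax.helpers.DefaultHandler (the slides use the older SAX 1 HandlerBase). The handler keeps only a counter, so its memory use does not grow with the size of the document. The class name CountElements and the sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CountElements extends DefaultHandler {
    private int elements = 0;

    // Called once per start tag; only a counter is kept, no tree is built
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elements++;
    }

    public static int count(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        CountElements handler = new CountElements();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.elements;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<FixedFloatSwap><Fixed_Rate>5</Fixed_Rate><NumYears>3</NumYears></FixedFloatSwap>";
        System.out.println(count(xml) + " elements");
    }
}
```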

XML Validation A batch validating process involves comparing the DTD against a complete document instance and producing a report containing any errors or warnings. Software developers should consider batch validation to be analogous to program compilation, with similar errors detected. Interactive validation involves constant comparison of the DTD against a document as it is being created.

XML Validation The benefits of validating documents against a DTD include: Programmers can write extraction and manipulation filters without fear of their software ever processing unexpected input. Using an XML-aware word processor, authors and editors can be guided and constrained to produce conforming documents.

XML Validation Examples XML elements may contain further, embedded elements, and the entire document must be enclosed by a single document element. The degree to which an element’s content is organized into child elements is often termed its granularity. Some hierarchical structures may be recursive. The Document Type Definition (DTD) contains rules for each element allowed within a specific class of documents.
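A recursive structure can be checked with a validating SAX parser. This sketch assumes a hypothetical DTD, supplied in an internal subset, in which a part element may contain further part elements. Validity errors are recoverable, so the parser reports them through the error callback and keeps going; a count of zero means the document conforms.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class RecursiveDtd extends DefaultHandler {
    // Hypothetical recursive rule: a part has a name and any number of sub-parts
    static final String DTD = "<!DOCTYPE part ["
        + "<!ELEMENT part (name, part*)>"
        + "<!ELEMENT name (#PCDATA)>"
        + "]>";

    private int errors = 0;

    // Validity errors arrive here; parsing continues afterwards
    @Override
    public void error(SAXParseException e) {
        errors++;
    }

    public static int errorCount(String body) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        RecursiveDtd handler = new RecursiveDtd();
        factory.newSAXParser().parse(new InputSource(new StringReader(DTD + body)), handler);
        return handler.errors;
    }

    public static void main(String[] args) throws Exception {
        String nested = "<part><name>engine</name><part><name>piston</name></part></part>";
        System.out.println("Errors: " + errorCount(nested));
    }
}
```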

We'll run this program against several XML files with DTDs.

```java
// Validate.java
import java.io.*;
import org.xml.sax.*;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;

public class Validate extends HandlerBase {
    public static boolean valid = true;

    public static void main (String argv []) {
        if (argv.length != 1) {
            System.err.println ("Usage: java Validate filename.xml");
            System.exit (1);
        }
        // Ask for a validating parser so the document is checked against its DTD
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        try {
            SAXParser saxParser = factory.newSAXParser();
            saxParser.parse( new File(argv [0]), new Validate());
        } catch (Throwable t) {
            t.printStackTrace ();
        }
        System.out.println("Valid document is " + valid);
        System.exit (0);
    }

    // Called for each recoverable (validity) error
    public void error(SAXParseException e) throws SAXException {
        System.out.println(e.toString());
        valid = false;
    }
}
```

XML document that conforms to its DTD: Valid document is true

XML document that violates its DTD: Valid document is false

XML Document

Quantity indicators used in the DTD:

? 0 or 1 time
+ 1 or more times
* 0 or more times

```
C:\McCarthy\www\46-928\examples\sax>java Validate FixedFloatSwap.xml
Valid document is true
```

The locations where text data is allowed are indicated by the keyword #PCDATA (parsed character data).

```
C:\McCarthy\www\46-928\examples\sax>java Validate FixedFloatSwap.xml
org.xml.sax.SAXParseException: Element "NumYears" does not allow "StartYear" -- (#PCDATA)
org.xml.sax.SAXParseException: Element type "StartYear" is not declared.
org.xml.sax.SAXParseException: Element "NumYears" does not allow "EndYear" -- (#PCDATA)
org.xml.sax.SAXParseException: Element type "EndYear" is not declared.
Valid document is false
```

Output of the program after being modified to display the errors.

There are strict rules that apply when an element may contain both text and child elements. The #PCDATA keyword must be the first token in the group, and the group must be a choice group (using "|", not ","). The group must also be optional and repeatable (marked with "*"). This is known as a mixed content model.

Mixed-content example, with the document text "H2O is water.": Valid document is true
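A sketch of checking a mixed content model with a validating SAX 2 parser. The element names formula and sub are assumptions, since the original DTD is not shown; the declaration follows the rules above: #PCDATA first, a choice group, made repeatable with *.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class MixedContent extends DefaultHandler {
    // Hypothetical mixed content model: text interleaved with sub elements
    static final String DTD = "<!DOCTYPE formula ["
        + "<!ELEMENT formula (#PCDATA | sub)*>"
        + "<!ELEMENT sub (#PCDATA)>"
        + "]>";

    private final List<String> errors = new ArrayList<>();

    @Override
    public void error(SAXParseException e) {
        errors.add(e.getMessage());
    }

    // Returns the validity error messages (empty when the document conforms)
    public static List<String> validate(String body) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        MixedContent handler = new MixedContent();
        factory.newSAXParser().parse(new InputSource(new StringReader(DTD + body)), handler);
        return handler.errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate("<formula>H<sub>2</sub>O is water.</formula>"));
    }
}
```

Replacing sub with an undeclared element such as sup should produce at least one validity error.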

Attributes

An attribute is associated with a particular element by the DTD and is assigned an attribute type. The attribute type can restrict the range of values the attribute can hold. Example attribute types include:

CDATA: a simple string of characters
NMTOKEN: a single word or token
a named token group, such as (left | center | right)
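To see attribute types from Java, a SAX 2 handler can call Attributes.getType. This sketch uses a hypothetical para element with one attribute of each kind listed above. One caution: the parser used in these slides printed ENUMERATION for a named token group, but the SAX 2 Attributes documentation says such a type is reported as NMTOKEN, so only the CDATA and NMTOKEN results are relied on here.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class AttributeTypes extends DefaultHandler {
    // Hypothetical element with a CDATA, an NMTOKEN and an enumerated attribute
    public static final String XML = "<!DOCTYPE para ["
        + "<!ELEMENT para (#PCDATA)>"
        + "<!ATTLIST para label CDATA #IMPLIED"
        + " code NMTOKEN #IMPLIED"
        + " align (left | center | right) \"left\">"
        + "]>"
        + "<para label=\"Intro text\" code=\"p1\" align=\"center\">Hello</para>";

    private final Map<String, String> types = new LinkedHashMap<>();

    // Record the declared type of every attribute on every element
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        for (int i = 0; i < atts.getLength(); i++) {
            types.put(atts.getQName(i), atts.getType(i));
        }
    }

    public static Map<String, String> attributeTypes(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // read the ATTLIST declarations and check the values
        AttributeTypes handler = new AttributeTypes();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.types;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(attributeTypes(XML));
    }
}
```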

```
C:\McCarthy\www\46-928\examples\sax>java Validate FixedFloatSwap.xml
org.xml.sax.SAXParseException: Attribute value for "currency" is #REQUIRED.
Valid document is false
```

DTD XML Document Valid document is true

DTD and XML document: Valid document is true

#IMPLIED means the attribute is optional.
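A sketch contrasting #REQUIRED and #IMPLIED, assuming a hypothetical Notional declaration in which currency is declared CDATA for simplicity and note is an invented optional attribute. Omitting the required attribute should produce a validity error, while omitting the implied one should not.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class RequiredVsImplied extends DefaultHandler {
    // Hypothetical declaration: currency must be given, note may be left out
    static final String DTD = "<!DOCTYPE Notional ["
        + "<!ELEMENT Notional (#PCDATA)>"
        + "<!ATTLIST Notional currency CDATA #REQUIRED note CDATA #IMPLIED>"
        + "]>";

    private int errors = 0;

    @Override
    public void error(SAXParseException e) {
        errors++;
    }

    public static int errorCount(String body) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        RequiredVsImplied handler = new RequiredVsImplied();
        factory.newSAXParser().parse(new InputSource(new StringReader(DTD + body)), handler);
        return handler.errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("note omitted: " + errorCount("<Notional currency=\"pounds\">100</Notional>") + " errors");
        System.out.println("currency omitted: " + errorCount("<Notional>100</Notional>") + " errors");
    }
}
```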

DTD XML Document Valid document is true

Document using a general entity

The XML document begins with <!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd" [ ] > and refers to the entity as &bankname;. The DTD declares:

<!ELEMENT FixedFloatSwap (Bank, Notional, Fixed_Rate, NumYears, NumPayments ) >

Valid document is true
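A sketch of general entity expansion with SAX 2. It assumes the entity is declared in the internal subset, with the bank name that appears in the XSLT output on these slides as its replacement text; the declaration itself is hypothetical. The handler accumulates character data because the parser may deliver the expanded text in more than one characters call.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EntityExpansion extends DefaultHandler {
    // Hypothetical document: &bankname; is declared in the internal subset
    public static final String SAMPLE = "<!DOCTYPE FixedFloatSwap ["
        + "<!ENTITY bankname \"Mellon National Bank and Trust\">"
        + "]>"
        + "<FixedFloatSwap><Bank>&bankname;</Bank></FixedFloatSwap>";

    private final StringBuilder bank = new StringBuilder();
    private boolean inBank = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("Bank")) inBank = true;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("Bank")) inBank = false;
    }

    // The parser substitutes the entity text; it may arrive in several chunks
    @Override
    public void characters(char[] ch, int start, int length) {
        if (inBank) bank.append(ch, start, length);
    }

    public static String bankName(String xml) throws Exception {
        EntityExpansion handler = new EntityExpansion();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.bank.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(bankName(SAMPLE));
    }
}
```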

<xsl:stylesheet xmlns:xsl=" version="1.0"> XSLT Program

```
C:\McCarthy\www\46-928\examples\sax>java -Dcom.jclark.xsl.sax.parser=com.jclark.xml.sax.CommentDriver com.jclark.xsl.sax.Driver FixedFloatSwap.xml FixedFloatSwap.xsl FixedFloatSwap.wml
C:\McCarthy\www\46-928\examples\sax>type FixedFloatSwap.wml
Mellon National Bank and Trust
```

XSLT output
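The slides run James Clark's XT from the command line. For comparison, this sketch uses the JAXP transformation API bundled with the JDK (javax.xml.transform), which is a different XSLT processor. The stylesheet is a hypothetical minimal one that outputs only the Bank text; it is not the WML stylesheet used above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltDemo {
    // Hypothetical stylesheet: emit the text of FixedFloatSwap/Bank as plain text
    static final String XSL =
        "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/\"><xsl:value-of select=\"FixedFloatSwap/Bank\"/></xsl:template>"
      + "</xsl:stylesheet>";

    public static String transform(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(
            "<FixedFloatSwap><Bank>Mellon National Bank and Trust</Bank></FixedFloatSwap>"));
    }
}
```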

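An external text entity

The document again begins with <!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd" [ ] > and uses &bankname;, but the entity's replacement text now comes from a separate file, JustAFile.dat.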

Contents of JustAFile.dat: Mellon Bank And Trust Corporation When you need a friend!

XSLT output: Mellon Bank And Trust Corporation When you need a friend!

Internal parameter entities

DTD: <!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >

General entity defined in the DTD

The XML document uses &bankname;, and the DTD declares <!ELEMENT FixedFloatSwap (Bank, Notional, Fixed_Rate, NumYears, NumPayments ) >

CDATA section

The XML document contains a CDATA section ending in "will not be parsed for markup]]>", and the DTD adds a Note element: <!ELEMENT FixedFloatSwap ( Notional, Fixed_Rate, NumYears, NumPayments, Note ) >

<xsl:stylesheet xmlns:xsl=" version="1.0"> h XSLT Program

XSLT output: This is text that <b>will not be parsed for markup
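A sketch showing that a SAX parser hands the contents of a CDATA section to the application as ordinary character data, with the <b> kept as literal text rather than treated as markup. The one-element document is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CdataDemo extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    // CDATA content arrives as character data, possibly in several chunks
    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    public static String noteText(String xml) throws Exception {
        CdataDemo handler = new CdataDemo();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Note><![CDATA[This is text that <b>will not be parsed for markup]]></Note>";
        System.out.println(noteText(xml));
    }
}
```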

DTD Components

Order.xml begins with an address: Kevin Dick, 123 Anywhere Lane, Apt 1b, Palo Alto, CA, USA.

A second address follows (Kevin Dick, 123 Not The Same Lane, Work Place, Palo Alto, CA, USA): an order may have more than one address.

Line items follow (440BX Motherboard MB PC-100 DIMM x CD-ROM 1 50): several products may be purchased.

Payment details follow (Kevin S. Dick /01): the payment is with a Visa card.

Output: Valid document is true

order.dtd defines an order based on other elements:

```
<!ATTLIST ORDER SOURCE       (web | phone | retail) #REQUIRED
                CUSTOMERTYPE (consumer | business)  "consumer"
                CURRENCY     CDATA                  "USD" >
```

External parameter entities: the other elements are declared in their own DTD files, which order.dtd pulls in with the parameter entity references %anAddress; %aLineItem; %aPayment;
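A sketch of external parameter entities, assuming a hypothetical cut-down pair of files in which order.dtd pulls in address.dtd through %anAddress;. The program writes both DTD files and the document into a temporary directory, so the relative system identifier "address.dtd" is resolved against the location of order.dtd, just as the separate DTD files would be on disk.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class ModularDtd extends DefaultHandler {
    private int errors = 0;

    @Override
    public void error(SAXParseException e) {
        errors++;
    }

    // Writes hypothetical DTD modules and a document to a temp directory, then validates it
    public static int errorCount(String body) throws Exception {
        Path dir = Files.createTempDirectory("dtd-demo");
        Files.writeString(dir.resolve("address.dtd"),
            "<!ELEMENT address (#PCDATA)>\n");
        Files.writeString(dir.resolve("order.dtd"),
            "<!ENTITY % anAddress SYSTEM \"address.dtd\">\n"
          + "%anAddress;\n"                     // include the address module here
          + "<!ELEMENT order (address+)>\n");
        Path doc = dir.resolve("order.xml");
        Files.writeString(doc, "<!DOCTYPE order SYSTEM \"order.dtd\">" + body);

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        ModularDtd handler = new ModularDtd();
        factory.newSAXParser().parse(doc.toFile(), handler);
        return handler.errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Errors: " + errorCount("<order><address>Kevin Dick</address></order>"));
    }
}
```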

address.dtd

```
<!ELEMENT address (firstname, middlename?, lastname, street+, city, state, postal, country)>
<!ATTLIST address ADDTYPE (bill | ship | billship) "billship">
<!ATTLIST street ORDER CDATA #IMPLIED>
```

lineitem.dtd

```
<!ATTLIST lineitem ID ID #REQUIRED>
<!ATTLIST product CAT (CDROM|MBoard|RAM) #REQUIRED>
```

payment.dtd

```
<!ATTLIST card CARDTYPE (VISA|MasterCard|Amex) #REQUIRED>
```

Processing XML with SAX

The important interfaces and classes are found in the org.xml.sax package. We will look at the following and then study an example:

- interface DocumentHandler: reports document events
- interface ErrorHandler: reports errors and warnings, including validity errors
- class HandlerBase: implements both of the above plus two others (EntityResolver and DTDHandler)

public interface DocumentHandler

Receive notification of general document events. This is the main interface that most SAX applications implement: if the application needs to be informed of basic parsing events, it implements this interface and registers an instance with the SAX parser. The parser uses the instance to report basic document-related events like the start and end of elements and character data.

Some methods from the DocumentHandler interface:

- void characters(char[] ch, int start, int length): receive notification of character data.
- void endDocument(): receive notification of the end of a document.
- void endElement(java.lang.String name): receive notification of the end of an element.
- void startDocument(): receive notification of the beginning of a document.
- void startElement(java.lang.String name, AttributeList atts): receive notification of the beginning of an element.

public interface ErrorHandler

Basic interface for SAX error handlers. If a SAX application needs to implement customized error handling, it must implement this interface and then register an instance with the SAX parser. The parser will then report all errors and warnings through this interface. Some methods are:

- void error(SAXParseException exception): receive notification of a recoverable error.
- void fatalError(SAXParseException exception): receive notification of a non-recoverable error.
- void warning(SAXParseException exception): receive notification of a warning.
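A sketch of how the ErrorHandler methods divide the work, written with SAX 2's DefaultHandler. A validity error goes to error and parsing continues, while a well-formedness error goes to fatalError and parsing stops. The one-element documents are hypothetical; warning is recorded but not exercised.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class ErrorKinds extends DefaultHandler {
    private final List<String> calls = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) { calls.add("warning"); }

    // Recoverable: record it and let parsing continue
    @Override
    public void error(SAXParseException e) { calls.add("error"); }

    // Non-recoverable: record it and stop parsing
    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        calls.add("fatalError");
        throw e;
    }

    public static List<String> check(String xml) {
        ErrorKinds handler = new ErrorKinds();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // parsing stopped early; the recorded calls show why
        }
        return handler.calls;
    }

    public static void main(String[] args) {
        String dtd = "<!DOCTYPE a [<!ELEMENT a EMPTY>]>";
        System.out.println("valid:     " + check(dtd + "<a/>"));
        System.out.println("invalid:   " + check(dtd + "<a>x</a>"));
        System.out.println("malformed: " + check(dtd + "<a>"));
    }
}
```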

public class HandlerBase extends java.lang.Object implements EntityResolver, DTDHandler, DocumentHandler, ErrorHandler

Default base class for handlers. This class implements the default behaviour for four SAX interfaces: EntityResolver, DTDHandler, DocumentHandler, and ErrorHandler.
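HandlerBase and DocumentHandler belong to SAX 1 and are deprecated in SAX 2, where org.xml.sax.helpers.DefaultHandler provides default implementations of EntityResolver, DTDHandler, ContentHandler and ErrorHandler. A sketch of NotifyStr-style tracing with it; note that startElement now receives a namespace URI, local name and qualified name plus an Attributes object. The class name Sax2Trace and the event labels are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class Sax2Trace extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("startDocument"); }
    @Override public void endDocument() { events.add("endDocument"); }

    // SAX 2 passes the namespace URI, local name and qualified name
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end " + qName);
    }

    // Skip whitespace-only text between elements
    @Override
    public void characters(char[] ch, int start, int length) {
        String data = new String(ch, start, length).trim();
        if (!data.isEmpty()) events.add("text " + data);
    }

    @Override
    public void error(SAXParseException e) { events.add("error " + e.getMessage()); }

    public static List<String> trace(String xml) throws Exception {
        Sax2Trace handler = new Sax2Trace();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        trace("<Notional currency=\"pounds\">100</Notional>").forEach(System.out::println);
    }
}
```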

Input: FixedFloatSwap.dtd

<!ELEMENT FixedFloatSwap ( Bank, Notional, Fixed_Rate, NumYears, NumPayments ) >

Input: FixedFloatSwap.xml

<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd" [ ] > &bankname;

Event-driven processing in Java:

```java
// NotifyStr.java
// Adapted from XML and Java by Maruyama, Tamura and Uramoto
// IBM Tokyo Research, Addison-Wesley
import java.io.*;
import org.xml.sax.*;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;

public class NotifyStr extends HandlerBase {
    public static void main (String argv []) {
        if (argv.length != 1) {
            System.err.println ("Usage: java NotifyStr filename.xml");
            System.exit (1);
        }
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        NotifyStr myHandler = new NotifyStr();
        try {
            SAXParser saxParser = factory.newSAXParser();
            saxParser.parse( new File(argv [0]), myHandler);
        } catch (Throwable t) {
            t.printStackTrace ();
        }
        System.exit (0);
    }

    public NotifyStr() {}

    public void startDocument() throws SAXException {
        System.out.println("startDocument called:");
    }

    public void endDocument() throws SAXException {
        System.out.println("endDocument called:");
    }

    public void startElement(String name, AttributeList aMap) throws SAXException {
        System.out.println("startElement called: element name =" + name);
        // examine the attributes
        for (int i = 0; i < aMap.getLength(); i++) {
            String attName = aMap.getName(i);
            String type = aMap.getType(i);
            String value = aMap.getValue(i);
            System.out.println(" attribute name = " + attName + " type = " + type + " value = " + value);
        }
    }

    public void endElement(String name) throws SAXException {
        System.out.println("endElement is called:" + name);
    }

    public void characters(char[] ch, int start, int length) throws SAXException {
        // build String from char array
        String dataFound = new String(ch, start, length);
        System.out.println("characters called:" + dataFound);
    }

    public void error(SAXParseException e) throws SAXException {
        System.out.println("Parsing error");
        System.out.println(e.toString());
    }
}
```

Output:

```
C:\McCarthy\www\46-928\examples\sax>java NotifyStr FixedFloatSwap.xml
startDocument called:
startElement called: element name =FixedFloatSwap
startElement called: element name =Bank
characters called:Pittsburgh National Corporation
endElement is called:Bank
startElement called: element name =Notional
 attribute name = currency type = ENUMERATION value = pounds
characters called:100
endElement is called:Notional
startElement called: element name =Fixed_Rate
characters called:5
endElement is called:Fixed_Rate
startElement called: element name =NumYears
characters called:3
endElement is called:NumYears
startElement called: element name =NumPayments
characters called:6
endElement is called:NumPayments
endElement is called:FixedFloatSwap
endDocument called:
```

Accessing the swap from Jigsaw

The document is saved under Www/fpml/ServerSwap.xml. It begins with <!DOCTYPE FixedFloatSwap [ ] > and uses &bankname;.

Servlet code:

```java
// This servlet file is stored in WWW/Jigsaw/servlet/GetXML.java
// This servlet returns a user-selected XML file from
// the Www/fpml directory to the client.
import java.io.*;
import java.util.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class GetXML extends HttpServlet {
    public void doGet(HttpServletRequest req, HttpServletResponse res)
            throws ServletException, IOException {
        String theData = "";
        // The file name is the extra path after the servlet name, e.g. /ServerSwap.xml
        String extraPath = req.getPathInfo();
        if (extraPath == null || extraPath.contains("..")) {
            res.sendError(HttpServletResponse.SC_BAD_REQUEST); // no file named, or outside Www/fpml
            return;
        }
        extraPath = extraPath.substring(1);

        // read the file into the string theData
        try {
            FileInputStream theFile = new FileInputStream(
                "c:\\Jigsaw\\Jigsaw\\Jigsaw\\Www\\fpml\\" + extraPath);
            BufferedReader br = new BufferedReader(new InputStreamReader(theFile));
            String thisLine;
            while ((thisLine = br.readLine()) != null) {
                theData += thisLine + "\n";
            }
            br.close();
        } catch (Exception e) {
            System.err.println("Error " + e);
        }

        // write the data to the client
        res.setContentType("text/xml");
        PrintWriter out = res.getWriter();
        out.write(theData);
        System.out.println("Wrote document to client");
        // write data to console
        System.out.println(theData);
        out.close();
    }
}
```

```java
// Sax Client
import java.io.*;
import org.xml.sax.*;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;

public class JigsawNotifyStr extends HandlerBase {
    public static void main (String argv []) {
        if (argv.length != 1) {
            System.err.println ("Usage: java JigsawNotifyStr filename.xml");
            System.exit (1);
        }
        String serverString = "…"; // base URL of the GetXML servlet
        String fileName = argv[0];

        // The parser opens the URL itself and reads the document served by GetXML
        InputSource is = new InputSource(serverString + fileName);
        System.out.println("Got the input source");
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        JigsawNotifyStr myHandler = new JigsawNotifyStr();
        try {
            SAXParser saxParser = factory.newSAXParser();
            saxParser.parse( is, myHandler);
        } catch (Throwable t) {
            System.out.println("Big error");
            t.printStackTrace ();
        }
        System.exit (0);
    }

    public JigsawNotifyStr() {}

    public void startDocument() throws SAXException {
        System.out.println("startDocument called:");
    }

    public void endDocument() throws SAXException {
        System.out.println("endDocument called:");
    }

    // startElement, endElement and characters: same as in NotifyStr

    public void error(SAXParseException e) throws SAXException {
        // describe each error
        System.out.println("Parsing error");
        System.out.println(e.toString());
    }
}
```

Being served by the servlet <!DOCTYPE FixedFloatSwap [ ] > &bankname;

```
Got the input source
startDocument called:
Parsing error
org.xml.sax.SAXParseException: Element type "FixedFloatSwap" is not declared.
startElement called: element name =FixedFloatSwap
characters called:
Parsing error
org.xml.sax.SAXParseException: Element type "Bank" is not declared.
startElement called: element name =Bank
characters called:Pittsburgh National Corporation
endElement is called:Bank
characters called:
Parsing error
org.xml.sax.SAXParseException: Element type "Notional" is not declared.
Parsing error
org.xml.sax.SAXParseException: Attribute "currency" is not declared for element "Notional".
startElement called: element name =Notional
 attribute name = currency type = CDATA value = pounds
characters called:100
endElement is called:Notional
characters called:
Parsing error
org.xml.sax.SAXParseException: Element type "Fixed_Rate" is not declared.
startElement called: element name =Fixed_Rate
characters called:5
endElement is called:Fixed_Rate
characters called:
Parsing error
org.xml.sax.SAXParseException: Element type "NumYears" is not declared.
startElement called: element name =NumYears
characters called:3
endElement is called:NumYears
characters called:
Parsing error
org.xml.sax.SAXParseException: Element type "NumPayments" is not declared.
startElement called: element name =NumPayments
characters called:6
endElement is called:NumPayments
characters called:
endElement is called:FixedFloatSwap
endDocument called:
```

We have some parsing errors. Do you see why?

The InputSource Class

The SAX and DOM parsers need XML input. The "output" produced by these parsers amounts to a series of method calls (SAX) or an application programmer interface to the tree (DOM). An InputSource object can be used to provide input to the parser.

Figure: InputSource → SAX or DOM parser → events (SAX) or tree (DOM) → application.

So, how do we build an InputSource object?

Some InputSource constructors:

```java
InputSource(String systemId);          // a system identifier (URI), such as a URL
InputSource(InputStream byteStream);
InputSource(Reader characterStream);
```

For example:

```java
String text = " some xml ";
StringReader sr = new StringReader(text);
InputSource is = new InputSource(sr);
// ...
myParser.parse(is);
```
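A sketch of building InputSource objects from a character stream and from a byte stream, and handing each to a DOM parser (javax.xml.parsers.DocumentBuilder also accepts an InputSource). The document text is hypothetical; setEncoding tells the parser how the bytes were encoded.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class InputSourceDemo {
    // Parses with a DOM builder and returns the name of the root element
    public static String rootName(InputSource source) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(source).getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        String text = "<FixedFloatSwap><NumYears>3</NumYears></FixedFloatSwap>";

        // From a character stream
        InputSource fromChars = new InputSource(new StringReader(text));

        // From a byte stream, stating the encoding used to produce the bytes
        InputSource fromBytes = new InputSource(
            new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        fromBytes.setEncoding("UTF-8");

        System.out.println(rootName(fromChars) + " " + rootName(fromBytes));
    }
}
```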

But what about the DTD? public interface EntityResolver Basic interface for resolving entities. If a SAX application needs to implement customized handling for external entities, it must implement this interface and register an instance with the SAX parser using the parser's setEntityResolver method. The parser will then allow the application to intercept any external entities (including the external DTD subset and external parameter entities, if any) before including them.

EntityResolver

```java
public InputSource resolveEntity(String publicId, String systemId) {
    // Add this method to the client above. The systemId String
    // holds the system identifier of the DTD named in the XML document
    // (SAX reports it fully resolved if it is a URL).
    // We may now access the DTD from a servlet and return an
    // InputSource, or return null and let the parser resolve the
    // external entity.
    System.out.println("Attempting to resolve " +
        "Public id: " + publicId + " System id: " + systemId);
    return null;
}
```
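A sketch of a resolver that supplies the DTD itself instead of returning null, so a validating parser can check a document whose DTD is not where the parser would otherwise look. As an assumption for a self-contained example, the DTD text is held in a string; the Jigsaw client could instead open a stream to wherever the DTD is served. Because the system identifier may arrive fully resolved, the sketch matches only on its ending.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class DtdResolver extends DefaultHandler {
    // Hypothetical cut-down DTD for the example document
    static final String DTD =
        "<!ELEMENT FixedFloatSwap (NumYears)>"
      + "<!ELEMENT NumYears (#PCDATA)>";

    public final List<String> resolved = new ArrayList<>();
    public int errors = 0;

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        resolved.add(systemId);
        if (systemId != null && systemId.endsWith("FixedFloatSwap.dtd")) {
            return new InputSource(new StringReader(DTD)); // supply the DTD ourselves
        }
        return null; // let the parser resolve anything else in the usual way
    }

    @Override
    public void error(SAXParseException e) {
        errors++;
    }

    public static DtdResolver parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        DtdResolver handler = new DtdResolver();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        DtdResolver h = parse("<!DOCTYPE FixedFloatSwap SYSTEM \"FixedFloatSwap.dtd\">"
            + "<FixedFloatSwap><NumYears>3</NumYears></FixedFloatSwap>");
        System.out.println("Resolved " + h.resolved + ", errors: " + h.errors);
    }
}
```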