© 2001-2002 Marty Hall, Larry Brown Web core programming 1 Simple API for XML SAX.

Slides:



Advertisements
Similar presentations
J0 1 Marco Ronchetti - Web architectures – Laurea Specialistica in Informatica – Università di Trento Java XML parsing.
Advertisements

Technische universität dortmund Service Computing Service Computing Prof. Dr. Ramin Yahyapour IT & Medien Centrum 22. Oktober 2009.
SDPL 2002Notes 3: XML Processor Interfaces1 3.3 JAXP: Java API for XML Processing n How can applications use XML processors? –A Java-based answer: through.
SDPL 2003Notes 3: XML Processor Interfaces1 3.3 JAXP: Java API for XML Processing n How can applications use XML processors? –A Java-based answer: through.
XML Parsers By Chongbing Liu. XML Parsers  What is a XML parser?  DOM and SAX parser API  Xerces-J parsers overview  Work with XML parsers (example)
1 SAX and more… CS , Spring 2008/9. 2 SAX Parser SAX = Simple API for XML XML is read sequentially When a parsing event happens, the parser invokes.
SAX A parser for XML Documents. XML Parsers What is an XML parser? –Software that reads and parses XML –Passes data to the invoking application –The application.
1 The Simple API for XML (SAX) Part I ©Copyright These slides are based on material from the upcoming book, “XML and Bioinformatics” (Springer-
Tomcat Java and XML. Announcements  Final homework assigned Wednesday  Two week deadline  Will cover servlets + JAXP.
Xerces The Apache XML Project Yvonne Yao. Introduction Set of libraries that provides functionalities to parse XML documents Set of libraries that provides.
21-Jun-15 SAX (Abbreviated). 2 XML Parsers SAX and DOM are standards for XML parsers-- program APIs to read and interpret XML files DOM is a W3C standard.
26-Jun-15 SAX. SAX and DOM SAX and DOM are standards for XML parsers--program APIs to read and interpret XML files DOM is a W3C standard SAX is an ad-hoc.
Generic Connection Framework Connection FileConnectionSocketConnectionHTTPConnection InputConnection OutputConnection StreamConnection.
28-Jun-15 StAX Streaming API for XML. XML parser comparisons DOM is Memory intensive Read-write Typically used for documents smaller than 10 MB SAX is.
JAX- Java APIs for XML by J. Pearce. Some XML Standards Basic –SAX (sequential access parser) –DOM (random access parser) –XSL (XSLT, XPATH) –DTD Schema.
17 Apr 2002 XML Programming: SAX Andy Clark. SAX Design Premise Generic method of creating XML parser, parsing documents, and receiving document information.
Processing of structured documents Spring 2003, Part 5 Helena Ahonen-Myka.
CSE 6331 © Leonidas Fegaras XML Tools1 XML Tools Leonidas Fegaras.
CSE 6331 © Leonidas Fegaras XML Tools1 XML Tools Leonidas Fegaras.
17 Apr 2002 XML Programming: JAXP Andy Clark. Java API for XML Processing Standard Java API for loading, creating, accessing, and transforming XML documents.
The Joy of SAX (and DOM, and JDOM…) Bill MacCartney 11 October 2004.
SDPL 2003Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How can applications manipulate structured documents? –An overview of document parser.
XML for E-commerce II Helena Ahonen-Myka. XML processing model n XML processor is used to read XML documents and provide access to their content and structure.
SAX Parsing Presented by Clifford Lemoine CSC 436 Compiler Design.
CSE 6331 © Leonidas Fegaras XML Tools1 XML Tools.
3/29/2001 O'Reilly Java Java API for XML Processing 1.1 What’s New Edwin Goei Engineer, Sun Microsystems.
1 Java and XML Modified from presentation by: Barry Burd Drew University Portions © 2002 Hungry Minds, Inc.
EXtensible Markup Language (XML) James Atlas July 15, 2008.
SDPL 2002Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How can applications manipulate structured documents? –An overview of document parser.
SDPL 20113: XML APIs and SAX1 3. XML Processor APIs n How can (Java) applications manipulate structured (XML) documents? –An overview of XML processor.
By Nicholas Policelli An Introduction to Java. Basic Program Structure public class ClassName { public static void main(String[] args) { program statements.
XML Parsers Overview  Types of parsers  Using XML parsers  SAX  DOM  DOM versus SAX  Products  Conclusion.
SAX. What is SAX SAX 1.0 was released on May 11, SAX is a common, event-based API for parsing XML documents Primarily a Java API but there implementations.
Beginning XML 4th Edition. Chapter 12: Simple API for XML (SAX)
XML Processing in Java. Required tools Sun JDK 1.4, e.g.: JAXP (part of Java Web Services Developer Pack, already in Sun.
Java API for XML Processing (JAXP) Dr. Rebhi S. Baraka Advanced Topics in Information Technology (SICT 4310) Department of Computer.
Sheet 1XML Technology in E-Commerce 2001Lecture 3 XML Technology in E-Commerce Lecture 3 DOM and SAX.
SDPL Streaming API for XML1 3.4 Streaming API for XML (StAX) n Could we process XML documents more conveniently than with SAX, and yet more efficiently?
1 Recitation 8. 2 Outline Goals of this recitation: 1.Learn about loading files 2.Learn about command line arguments 3.Review of Exceptions.
JSP Tag Libraries Lec Last Lecture Example We incorporated JavaBeans in “Course Outline” Example But still have to write java code inside java.jsp.
CSE 6331 © Leonidas Fegaras XML Tools1 XML Tools.
XML Study-Session: Part III
Document Object Model DOM. Agenda l Introduction to DOM l Java API for XML Parsing (JAXP) l Installation and setup l Steps for DOM parsing l Example –Representing.
XML and SAX (A quick overview) ● What is XML? ● What are SAX and DOM? ● Using SAX.
Core Java Introduction Byju Veedu Ness Technologies httpdownload.oracle.com/javase/tutorial/getStarted/intro/definition.html.
When we create.rtf document apart from saving the actual info the tool saves additional info like start of a paragraph, bold, size of the font.. Etc. This.
1 Introduction JAXP. Objectives  XML Parser  Parsing and Parsers  JAXP interfaces  Workshops 2.
SDPL 20063: XML Processor Interfaces1 3. XML Processor APIs n How can (Java) applications manipulate structured (XML) documents? –An overview of XML processor.
Simple API for XML (SAX) Aug’10 – Dec ’10. Introduction to SAX Simple API for XML or SAX was developed as a standardized way to parse an XML document.
7-Mar-16 Simple API XML.  SAX and DOM are standards for XML parsers-- program APIs to read and interpret XML files  DOM is a W3C standard  SAX is an.
SDPL 2001Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How applications can manipulate structured documents? –An overview of document parser.
1 Validation SAX-DOM. Objectives 2  Schema Validation Framework  XML Validation After Transformation  Workshops.
1 Introduction SAX. Objectives 2  Simple API for XML  Parsing an XML Document  Parsing Contents  Parsing Attributes  Processing Instructions  Skipped.
Java API for XML Processing
May 8, 2006 MAGE v1 and MAGE v2 Michael Miller Lead Software Developer Rosetta Biosoftware NCI MAGE Jamboree.
Simple API for XML SAX. Agenda l Introduction to SAX l Installation and setup l Steps for SAX parsing l Defining a content handler l Examples Printing.
Lecture Transforming Data: Using Apache Xalan to apply XSLT transformations Marc Dumontier Blueprint Initiative Samuel Lunenfeld Research Institute.
FILES AND EXCEPTIONS Topics Introduction to File Input and Output Using Loops to Process Files Processing Records Exceptions.
Parsing with SAX using Java Kanda Runapongsa Dept. of Computer Engineering Khon Kaen University.
{ XML Technologies } BY: DR. M’HAMED MATAOUI
XML Parsers Overview Types of parsers Using XML parsers SAX DOM
In this session, you will learn to:
Parsing XML into programming languages
XML Parsers By Chongbing Liu.
Jagdish Gangolly State University of New York at Albany
XML Parsers Overview Types of parsers Using XML parsers SAX DOM
Java API for XML Processing
A parser for XML Documents
SAX2 29-Jul-19.
Presentation transcript:

© Marty Hall, Larry Brown Web core programming 1 Simple API for XML SAX

SAX2 Agenda Introduction to SAX Installation and setup Steps for SAX parsing Defining a content handler Examples –Printing the Outline of an XML Document –Counting Book Orders Defining an error handler Validating a document

SAX3 Simple API for XML (SAX) Parse and process XML documents Documents are read sequentially and callbacks are made to handlers Event-driven model for processing XML content SAX Versions –SAX 1.0 (May 1998) –SAX 2.0 (May 2000) Namespace addition –Official Website for SAX

SAX4 SAX Advantages and Disadvantages Advantages –Do not need to process and store the entire document (low memory requirement) Can quickly skip over parts not of interest –Fast processing Disadvantages –Limited API Every element is processed through the same event handler Need to keep track of location in document and, in cases, store temporary data –Only traverse the document once

SAX5 Java API for XML Parsing (JAXP) SAXParser XMLReaderParser SAX 1.0SAX 2.0 JAXP provides a vendor-neutral interface to the underlying SAX 1.0/2.0 parser

SAX6 SAX Installation and Setup 1.Download a SAX 2-compliant parser Java-based XML parsers at Recommend Apache Xerces-J parser at 2.Download the Java API for XML Processing (JAXP) JAXP is a small layer on top of SAX which supports specifying parsers through system properties versus hard coded See Note: Apache Xerces-J already incorporates JAXP

SAX7 SAX Installation and Setup (continued) 3.Set your CLASSPATH to include the SAX (and JAXP) classes set CLASSPATH=xerces_install_dir\xerces.jar; %CLASSPATH% or setenv CLASSPATH xerces_install_dir/xerces.jar: $CLASSPATH For servlets, place xerces.jar in the server’s lib directory Note: Tomcat 4.0 is prebundled with xerces.jar Xerces-J already incorporates JAXP For other parsers you may need to add jaxp.jar to your classpath and servlet lib directory

SAX8 Aside: Using Xerces with Tomcat 3.2.x Problem –Tomcat 3.2.x may load the provided SAX 1.0 parser first ( parser.jar ) before xerces.jar, effectively eliminating namespace support required in SAX 2.0 Solutions 1.Set up a static CLASSPATH and place xerces.jar first in the list 2.As the files are loaded alphabetically, rename parser.jar to z_parser.jar and xml.jar to z_xml.jar

SAX9 SAX Parsing SAX parsing has two high-level tasks: 1.Creating a content handler to process the XML elements when they are encountered 2.Invoking a parser with the designated content handler and document

Callbacks SAX works through callbacks: you call the parser, it calls methods that you supply Your program main(...) startDocument(...) startElement(...) characters(...) endElement( ) endDocument( ) parse(...) The SAX parser

SAX11 Steps for SAX Parsing 1.Tell the system which parser you want to use 2.Create a parser instance 3.Create a content handler to respond to parsing events 4.Invoke the parser with the designated content handler and document

SAX12 Step 1: Specifying a Parser Approaches to specify a parser –Set a system property for javax.xml.parsers.SAXParserFactory –Specify the parser in jre_dir/lib/jaxp.properties –Through the J2EE Services API and the class specified in META-INF/services/ javax.xml.parsers.SAXParserFactory –Use system-dependant default parser (check documentation)

SAX13 Specifying a Parser, Example The following example: –Permits the user to specify the parser through the command line –D option java –Djavax.xml.parser.SAXParserFactory= com.sun.xml.parser.SAXParserFactoryImpl... –Uses the Apache Xerces parser otherwise public static void main(String[] args) { String jaxpPropertyName = "javax.xml.parsers.SAXParserFactory"; if (System.getProperty(jaxpPropertyName) == null) { String apacheXercesPropertyValue = "org.apache.xerces.jaxp.SAXParserFactoryImpl"; System.setProperty(jaxpPropertyName, apacheXercesPropertyValue); }... }

SAX14 Step 2: Creating a Parser Instance First create an instance of a parser factory, then use that to create a SAXParser object SAXParserFactory factory = SAXParserFactory.newInstance(); SAXParser parser = factory.newSAXParser(); –To set up namespace awareness and validation, use factory.setNamespaceAware(true) factory.setValidating(true)

SAX15 Step 3: Create a Content Handler Content handler responds to parsing events –Typically a subclass of DefaultHandler public class MyHandler extends DefaultHandler { // Callback methods... } Primary event methods (callbacks) –startDocument, endDocument Respond to the start and end of the document –startElement, endElement Respond to the start and end tags of an element –characters, ignoreableWhitespace Respond to the tag body

SAX16 ContentHandler startElement Method Declaration public void startElement(String nameSpaceURI, String localName, String qualifiedName, Attributes attributes) throws SAXException Arguments –namespaceUri URI uniquely identifying the namespace –localname Element name without prefix –qualifiedName Complete element name, including prefix –attributes Attributes object representing the attributes of the element

SAX17 Anatomy of an Element XML Processing with Java namespaceUri qualifiedNameattribute[1] localname

SAX18 ContentHandler characters Method Declaration public void characters(char[] chars, int startIndex, int length) throws SAXException Arguments –chars Relevant characters form XML document To optimize parsers, the chars array may represent more of the XML document than just the element PCDATA may cause multiple invocations of characters –startIndex Starting position of element –length The number of characters to extract

SAX19 Step 4: Invoke the Parser Call the parse method, supplying: 1.The content handler 2.The XML document File, input stream, or org.xml.sax.InputSource parser.parse(filename, handler)

SAX20 SAX Example 1: Printing the Outline of an XML Document Approach –Define a content handler to respond to three parts of an XML document: start tags, end tag, and tag bodies –Content handler implementation overrides the following three methods: startElement –Prints a message when start tag is found with attributes listed in parentheses –Adjusts (increases by 2 spaces) the indentation endElement –Subtracts 2 from the indentation and prints a message indicating that an end tag was found characters –Prints the first word of the tag body

SAX21 SAX Example 1: PrintHandler import org.xml.sax.*; import org.xml.sax.helpers.*; import java.util.StringTokenizer; public class PrintHandler extends DefaultHandler { private int indentation = 0; /** When you see a start tag, print it out and then * increase indentation by two spaces. If the * element has attributes, place them in parens * after the element name. */ public void startElement(String namespaceUri, String localName, String qualifiedName, Attributes attributes) throws SAXException { indent(indentation); System.out.print("Start tag: " + qualifiedName);

SAX22 SAX Example 1: PrintHandler (continued)... int numAttributes = attributes.getLength(); // For just print out "someTag". But for //, print out // "someTag (att1=Val1, att2=Val2). if (numAttributes > 0) { System.out.print(" ("); for(int i=0; i<numAttributes; i++) { if (i>0) { System.out.print(", "); } System.out.print(attributes.getQName(i) + "=" + attributes.getValue(i)); } System.out.print(")"); } System.out.println(); indentation = indentation + 2; }...

SAX23 SAX Example 1: PrintHandler (continued) /** When you see the end tag, print it out and decrease * indentation level by 2. */ public void endElement(String namespaceUri, String localName, String qualifiedName) throws SAXException { indentation = indentation - 2; indent(indentation); System.out.println("End tag: " + qualifiedName); } private void indent(int indentation) { for(int i=0; i<indentation; i++) { System.out.print(" "); }...

SAX24 SAX Example 1: PrintHandler (continued) /** Print out the first word of each tag body. */ public void characters(char[] chars, int startIndex, int length) { String data = new String(chars, startIndex, length); // Whitespace makes up default StringTokenizer delimeters StringTokenizer tok = new StringTokenizer(data); if (tok.hasMoreTokens()) { indent(indentation); System.out.print(tok.nextToken()); if (tok.hasMoreTokens()) { System.out.println("..."); } else { System.out.println(); }

SAX25 SAX Example 1: SAXPrinter import javax.xml.parsers.*; import org.xml.sax.*; import org.xml.sax.helpers.*; public class SAXPrinter { public static void main(String[] args) { String jaxpPropertyName = "javax.xml.parsers.SAXParserFactory"; // Pass the parser factory in on the command line with // -D to override the use of the Apache parser. if (System.getProperty(jaxpPropertyName) == null) { String apacheXercesPropertyValue = "org.apache.xerces.jaxp.SAXParserFactoryImpl"; System.setProperty(jaxpPropertyName, apacheXercesPropertyValue); }

SAX26 SAX Example 1: SAXPrinter (continued)... String filename; if (args.length > 0) { filename = args[0]; } else { String[] extensions = { "xml", "tld" }; WindowUtilities.setNativeLookAndFeel(); filename = ExtensionFileFilter.getFileName(".", "XML Files", extensions); if (filename == null) { filename = "test.xml"; } printOutline(filename); System.exit(0); }...

SAX27 SAX Example 1: SAXPrinter (continued)... public static void printOutline(String filename) { DefaultHandler handler = new PrintHandler(); SAXParserFactory factory = SAXParserFactory.newInstance(); try { SAXParser parser = factory.newSAXParser(); parser.parse(filename, handler); } catch(Exception e) { String errorMessage = "Error parsing " + filename + ": " + e; System.err.println(errorMessage); e.printStackTrace(); }

SAX28 SAX Example 1: orders.xml Luxury Yachts, Inc. M-1 <standardFeatures oars="plastic" lifeVests="none"> false...

SAX29 SAX Example 1: Result Start tag: orders Start tag: order Start tag: count 1 End tag: count Start tag: price 9.95 End tag: price Start tag: yacht Start tag: manufacturer Luxury... End tag: manufacturer Start tag: model M-1 End tag: model Start tag: standardFeatures (oars=plastic, lifeVests=none) false End tag: standardFeatures End tag: yacht End tag: order... End tag: orders

SAX30 SAX Example 2: Counting Book Orders Objective –To process XML files that look like: and count up how many copies of Core Web Programming (ISBN ) are contained in the order

SAX31 SAX Example 2: Counting Book Orders (continued) Problem –SAX doesn’t store data automatically –The isbn element comes after the count element –Need to record every count temporarily, but only add the temporary value (to the running total) when the ISBN number matches

SAX32 SAX Example 2: Approach Define a content handler to override the following four methods: –startElement Checks whether the name of the element is either count or isbn Set flag to tell characters method be on the lookout –endElement Again, checks whether the name of the element is either count or isbn If so, turns off the flag that the characters method watches

SAX33 SAX Example 2: Approach (continued) –characters Subtracts 2 from the indentation and prints a message indicating that an end tag was found –endDocument Prints out the running count in a Message Dialog

SAX34 SAX Example 2: CountHandler import org.xml.sax.*; import org.xml.sax.helpers.*;... public class CountHandler extends DefaultHandler { private boolean collectCount = false; private boolean collectISBN = false; private int currentCount = 0; private int totalCount = 0; public void startElement(String namespaceUri, String localName, String qualifiedName, Attributes attributes) throws SAXException { if (qualifiedName.equals("count")) { collectCount = true; currentCount = 0; } else if (qualifiedName.equals("isbn")) { collectISBN = true; }...

SAX35 SAX Example 2: CountHandler (continued)... public void endElement(String namespaceUri, String localName, String qualifiedName) throws SAXException { if (qualifiedName.equals("count")) { collectCount = false; } else if (qualifiedName.equals("isbn")) { collectISBN = false; } public void endDocument() throws SAXException { String message = "You ordered " + totalCount + " copies of \n" + "Core Web Programming Second Edition.\n"; if (totalCount < 250) { message = message + "Please order more next time!"; } else { message = message + "Thanks for your order."; } JOptionPane.showMessageDialog(null, message); }

SAX36 SAX Example 2: CountHandler (continued)... public void characters(char[] chars, int startIndex, int length) { if (collectCount || collectISBN) { String dataString = new String(chars, startIndex, length).trim(); if (collectCount) { try { currentCount = Integer.parseInt(dataString); } catch(NumberFormatException nfe) { System.err.println("Ignoring malformed count: " + dataString); } } else if (collectISBN) { if (dataString.equals(" ")) { totalCount = totalCount + currentCount; }

SAX37 SAX Example 2: CountBooks import javax.xml.parsers.*; import org.xml.sax.*; import org.xml.sax.helpers.*; public class CountBooks { public static void main(String[] args) { String jaxpPropertyName = "javax.xml.parsers.SAXParserFactory"; // Use -D to override the use of the Apache parser. if (System.getProperty(jaxpPropertyName) == null) { String apacheXercesPropertyValue = "org.apache.xerces.jaxp.SAXParserFactoryImpl"; System.setProperty(jaxpPropertyName, apacheXercesPropertyValue); } String filename; if (args.length > 0) { filename = args[0]; } else {... } countBooks(filename); System.exit(0); }

SAX38 SAX Example 2: CountBooks (continued) private static void countBooks(String filename) { DefaultHandler handler = new CountHandler(); SAXParserFactory factory = SAXParserFactory.newInstance(); try { SAXParser parser = factory.newSAXParser(); parser.parse(filename, handler); } catch(Exception e) { String errorMessage = "Error parsing " + filename + ": " + e; System.err.println(errorMessage); e.printStackTrace(); }

SAX39 SAX Example 2: orders.xml Core Web Programming Second Edition Marty Hall Larry Brown...

SAX40 SAX Example 2: Result

SAX41 Error Handlers Responds to parsing errors –Typically a subclass of DefaultErrorHandler Useful callback methods –error Nonfatal error Usual a result of document validity problems –fatalError A fatal error resulting from a malformed document –Receive a SAXParseException from which to obtain the location of the problem ( getColumnNumber, getLineNumber )

SAX42 import org.xml.sax.*; import org.apache.xml.utils.*; class MyErrorHandler extends DefaultErrorHandler { public void error(SAXParseException exception) throws SAXException { System.out.println( "**Parsing Error**\n" + " Line: " + exception.getLineNumber() + "\n" + " URI: " + exception.getSystemId() + "\n" + " Message: " + exception.getMessage() + "\n"); throw new SAXException("Error encountered"); } Error Handler Example

SAX43 Namespace Awareness and Validation Approaches 1.Through the SAXParserFactory factory.setNamespaceAware(true) factory.setValidating(true) SAXParser parser = factory.newSAXParser(); 2.By setting XMLReader features XMLReader reader = parser.getXMLReader(); reader.setFeature( " true); reader.setFeature( " false); Note: a SAXParser is a vendor-neutral wrapper around a SAX 2 XMLReader

SAX44 Validation Example public class SAXValidator { public static void main(String[] args) { String jaxpPropertyName = "javax.xml.parsers.SAXParserFactory"; // Use -D to override the use of the Apache parser. if (System.getProperty(jaxpPropertyName) == null) { String apacheXercesPropertyValue = "org.apache.xerces.jaxp.SAXParserFactoryImpl"; System.setProperty(jaxpPropertyName, apacheXercesPropertyValue); } String filename; if (args.length > 0) { filename = args[0]; } else {... } validate(filename); System.exit(0); }...

SAX45 Validation Example (continued)... public static void validate(String filename) { DefaultHandler contentHandler = new DefaultHandler(); ErrorHandler errHandler = new MyErrorHandler(); SAXParserFactory factory = SAXParserFactory.newInstance(); factory.setValidating(true); try { SAXParser parser = factory.newSAXParser(); XMLReader reader = parser.getXMLReader(); reader.setContentHandler(contentHandler); reader.setErrorHandler(errHandler); reader.parse(new InputSource(filename)); } catch(Exception e) { String errorMessage = "Error parsing " + filename; System.out.println(errorMessage); }

SAX46 Instructors.xml <!DOCTYPE jhu [ ]> Larry Brown Hall Marty

SAX47 Validation Results >java SAXValidator Parsing Error: Line: 16 URI: file:///C:/CWP2-Book/chapter23/Instructors.xml Message: The content of element type "instructor“ must match "(firstname,lastname)+". Error parsing C:\CWP2-Book\chapter23\Instructors.xml

SAX48 Summary SAX processing of XML documents is fast and memory efficient JAXP is a simple API to provide vendor neutral SAX parsing –Parser is specified through system properties Processing is achieved through event call backs –Parser communicates with a DocumentHandler –May require tracking the location in document and storing data in temporary variables Parsing properties (validation, namespace awareness) are set through the SAXParser or underlying XMLReader

© Marty Hall, Larry Brown Web core programming 49 Questions?