SDPL 2011, 3.4 Streaming API for XML (StAX)


1 3.4 Streaming API for XML (StAX)
• Could we process XML documents more conveniently than with SAX, and yet more efficiently?
• A: Yes, with the Streaming API for XML (StAX)
  – general introduction
  – an example
  – comparison with SAX

2 StAX: General
• Latest of the standard Java XML parser interfaces
  – Origin: the XMLPull API (A. Slominski, ~2000)
  – developed through the Java Community Process, led by BEA Systems (2003)
  – included in JAXP 1.4, in Java WSDP 1.6, and in Java SE 6 (JDK 1.6)
• An event-driven streaming API, like SAX
  – does not build an in-memory representation
• A "pull API"
  – lets the application ask for individual events
  – unlike a "push API" such as SAX

3 Advantages of Pull Parsing
• A pull API provides events, on demand, from the chosen stream
  – can cancel parsing, say, after processing the header of a long message
  – can read multiple documents simultaneously
  – application-controlled access (~ iterator design pattern) is usually simpler than SAX-style callbacks (~ observer design pattern)
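
The first advantage can be sketched as follows. This is a minimal illustration, not code from the slides: the class name HeaderOnly, the method firstSubject and the element name subject are made up for the example. The application simply stops asking for events once it has what it needs.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class HeaderOnly {
    // Returns the text of the first <subject> element (a hypothetical
    // header field), or null if the document has none.
    static String firstSubject(String xml) {
        try {
            XMLStreamReader xsr = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (xsr.hasNext()) {
                    if (xsr.next() == XMLStreamConstants.START_ELEMENT
                            && "subject".equals(xsr.getLocalName())) {
                        // Returning here ends the parse: no further
                        // events are requested from the reader.
                        return xsr.getElementText();
                    }
                }
                return null;
            } finally {
                xsr.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String msg = "<message><header><subject>Hello</subject></header>"
                + "<body>a very long body</body></message>";
        System.out.println(firstSubject(msg)); // prints "Hello"
    }
}
```

With SAX, stopping early typically means throwing an exception from a callback; here it is an ordinary return.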

4 Cursor and Iterator APIs
• StAX consists of two sets of APIs
  – (1) cursor APIs, and (2) iterator APIs
  – differ by representation of parse events
• (1) cursor API XMLStreamReader
  – lower-level
  – methods hasNext() and next() to scan events, represented as int constants START_DOCUMENT, START_ELEMENT, ...
  – access methods, depending on current event type: getName(), getAttributeValue(..), getText(), ...
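
A small sketch of the cursor API in use, assuming an in-memory document. The class CursorDemo, the method describe and the attribute name id are illustrative only. Which access methods are legal depends on the int event type returned by next().

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class CursorDemo {
    // Produces one line of text per start tag, text node and end tag.
    static String describe(String xml) {
        StringBuilder out = new StringBuilder();
        try {
            XMLInputFactory xif = XMLInputFactory.newInstance();
            // Ask the parser to deliver adjacent text as a single event.
            xif.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLStreamReader xsr =
                    xif.createXMLStreamReader(new StringReader(xml));
            while (xsr.hasNext()) {
                switch (xsr.next()) {
                    case XMLStreamConstants.START_ELEMENT: {
                        out.append("start ").append(xsr.getName().getLocalPart());
                        // A null namespace URI matches the attribute by local name only.
                        String id = xsr.getAttributeValue(null, "id");
                        if (id != null) out.append(" id=").append(id);
                        out.append('\n');
                        break;
                    }
                    case XMLStreamConstants.CHARACTERS:
                        out.append("text ").append(xsr.getText()).append('\n');
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        out.append("end ").append(xsr.getName().getLocalPart()).append('\n');
                        break;
                    default:
                        break; // ignore other event types
                }
            }
            xsr.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe("<item id=\"7\">tea</item>"));
    }
}
```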

5 (2) XMLEventReader
• Iterator API XMLEventReader provides the contents of an XML document to the application using an event object iterator
• Parse events represented as immutable XMLEvent objects
  – received using methods hasNext() and nextEvent()
  – event properties accessed through their methods
  – can be stored (if needed)
  – require more resources than the cursor API (see later)
• Event lookahead, without advancing in the stream, with XMLEventReader.peek() and XMLStreamReader.getEventType()
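
Lookahead with peek() might be used as in the sketch below, which counts elements whose start tag is immediately followed by their end tag. The class PeekDemo and its method are hypothetical names introduced for this example.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class PeekDemo {
    // Counts start tags whose very next event is an end tag.
    static int countEmptyElements(String xml) {
        int empty = 0;
        try {
            XMLEventReader xer = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            while (xer.hasNext()) {
                XMLEvent event = xer.nextEvent();
                if (event.isStartElement()) {
                    // peek() returns the upcoming event without consuming it,
                    // so the main loop still sees it on the next iteration.
                    XMLEvent upcoming = xer.peek();
                    if (upcoming != null && upcoming.isEndElement()) {
                        empty++;
                    }
                }
            }
            xer.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        return empty;
    }

    public static void main(String[] args) {
        System.out.println(countEmptyElements("<a><b/><c>x</c><d></d></a>"));
    }
}
```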

6 Writing APIs
• StAX is a bidirectional API
  – also allows writing XML data, through an XMLStreamWriter or an XMLEventWriter
• Useful for "marshaling" data structures into XML
• Writers are not required to enforce well-formedness (not to mention validity)
  – they provide some support: escaping of reserved chars like & and <, and adding end tags for elements left unclosed
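
The two kinds of writer support mentioned above can be seen in a short sketch with XMLStreamWriter. The class WriterDemo and the element and attribute names are made up; the exact form of the XML declaration in the output may vary between implementations.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class WriterDemo {
    // Writes a one-element document into a string and returns it.
    static String write() {
        StringWriter sw = new StringWriter();
        try {
            XMLStreamWriter xsw =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(sw);
            xsw.writeStartDocument();
            xsw.writeStartElement("note");
            xsw.writeAttribute("lang", "en");
            // Reserved characters in text are escaped by the writer.
            xsw.writeCharacters("AT&T < IBM");
            // No explicit writeEndElement(): writeEndDocument() closes
            // any start tags that are still open.
            xsw.writeEndDocument();
            xsw.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write XML", e);
        }
        return sw.toString();
    }

    public static void main(String[] args) {
        System.out.println(write());
    }
}
```

Nothing stops the application from, say, writing two root elements, which is why the slide stresses that well-formedness is not enforced.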

7 Example of Using StAX (1/6)
• Use StAX iterator interfaces to
  – fold element tag names to uppercase, and to
  – strip comments
• Outline:
  – Initialize
    » an XMLEventReader for the input document
    » an XMLEventWriter (for System.out)
    » an XMLEventFactory for creating modified StartElement and EndElement events
  – Use them to read all input events, and to write some of them, possibly modified

8 StAX example (2/6)
First import relevant interfaces & classes:

    import java.io.*;
    import javax.xml.stream.*;
    import javax.xml.stream.events.*;
    import javax.xml.namespace.QName;

    public class capitalizeTags {
      public static void main(String[] args)
          throws FactoryConfigurationError, XMLStreamException, IOException {
        if (args.length != 1) System.exit(1);
        InputStream input = new FileInputStream(args[0]);

9 StAX example (3/6)
Initialize XMLEventReader/Writer/Factory:

        XMLInputFactory xif = XMLInputFactory.newInstance();
        xif.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, true);
        XMLEventReader xer = xif.createXMLEventReader(input);
        XMLOutputFactory xof = XMLOutputFactory.newInstance();
        XMLEventWriter xew = xof.createXMLEventWriter(System.out);
        XMLEventFactory xef = XMLEventFactory.newInstance();

10 StAX example (4/6)
• Iterate over events of the InputStream:

        while (xer.hasNext()) {
          XMLEvent inEvent = xer.nextEvent();
          if (inEvent.isStartElement()) {
            StartElement se = (StartElement) inEvent;
            QName inQName = se.getName();
            String localName = inQName.getLocalPart();
            xew.add(xef.createStartElement(
                inQName.getPrefix(),
                inQName.getNamespaceURI(),
                localName.toUpperCase(),
                se.getAttributes(),
                se.getNamespaces()));

11 StAX example (5/6)
• Event iteration continues, to capitalize end tags:

          } else if (inEvent.isEndElement()) {
            EndElement ee = (EndElement) inEvent;
            QName inQName = ee.getName();
            String localName = inQName.getLocalPart();
            xew.add(xef.createEndElement(
                inQName.getPrefix(),
                inQName.getNamespaceURI(),
                localName.toUpperCase(),
                ee.getNamespaces()));

12 StAX example (6/6)
Output other events, except for comments; finish when input ends:

          } else if (inEvent.getEventType() != XMLStreamConstants.COMMENT) {
            xew.add(inEvent);
          }
        } // while (xer.hasNext())
        xer.close(); input.close();
        xew.flush(); xew.close();
      } // main()
    } // class capitalizeTags

13 Efficiency of Streaming APIs?
• An experiment comparing SAX and StAX for scanning documents
• Task: count and report the number of elements, attributes, character fragments, and total character length
• Inputs: similar prose-oriented documents of different sizes
  – repeated fragments of the W3C XML Schema Rec (Part 1)
• Tested on OpenJDK 1.6.0 (different updates), with
  – Red Hat Linux 6.0.52, 3 GHz Pentium, 1 GB RAM ("OLD")
  – 64-bit CentOS Linux 5, 2.93 GHz Intel Core 2 Duo, 4 GB RAM ("NEW")
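
The slides do not show the measurement harness. The following is only a rough sketch of how such a scan could be timed on inputs built from a repeated fragment. The class ScanTiming, the fragment text and the document sizes are all assumptions, and a single System.nanoTime() measurement without JVM warm-up gives indicative numbers at best.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class ScanTiming {
    // Builds a document by repeating a small prose-like fragment
    // (a hypothetical stand-in for the schema-spec fragments).
    static String makeDoc(int copies) {
        StringBuilder sb = new StringBuilder("<doc>");
        for (int i = 0; i < copies; i++) {
            sb.append("<p class=\"x\">Some <em>prose</em> text.</p>");
        }
        return sb.append("</doc>").toString();
    }

    // Scans the document with the StAX cursor API and counts elements.
    static int countElements(String xml) {
        int elemCount = 0;
        try {
            XMLStreamReader xsr = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (xsr.hasNext()) {
                if (xsr.next() == XMLStreamConstants.START_ELEMENT) elemCount++;
            }
            xsr.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        return elemCount;
    }

    public static void main(String[] args) {
        for (int copies : new int[] {1000, 10000, 100000}) {
            String doc = makeDoc(copies);
            long start = System.nanoTime();
            int elems = countElements(doc);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println(doc.length() + " chars, " + elems
                    + " elements, " + millis + " ms");
        }
    }
}
```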

14 Essentials of the SAX Solution
• Obtain and use a JAXP SAX parser:

    String docFile; // initialized from cmd line
    SAXParserFactory spf = SAXParserFactory.newInstance();
    spf.setValidating(validate); // from cmd option
    spf.setNamespaceAware(true);
    SAXParser sp = spf.newSAXParser();
    CountHandler ch = new CountHandler();
    sp.parse(new File(docFile), ch);
    ch.printResult(); // print the statistics

15 SAX Solution: CountHandler

    public static class CountHandler extends DefaultHandler {
      // Instance vars for statistics:
      int elemCount = 0, charFragCount = 0,
          totalCharLen = 0, attrCount = 0;

      public void startElement(String nsURI, String locName,
                               String qName, Attributes atts) {
        elemCount++;
        attrCount += atts.getLength();
      }

      public void characters(char[] buf, int start, int length) {
        charFragCount++;
        totalCharLen += length;
      }
      // printResult() is not shown on the slide
    }
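
For experimenting with the SAX solution without a file or command-line options, the two slides above can be condensed into a self-contained sketch. The wrapper class SaxCount, its count method and the int[] result layout are additions for this example; validation is left off.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxCount {
    static class CountHandler extends DefaultHandler {
        int elemCount, attrCount, charFragCount, totalCharLen;

        @Override
        public void startElement(String nsURI, String locName,
                                 String qName, Attributes atts) {
            elemCount++;
            attrCount += atts.getLength();
        }

        @Override
        public void characters(char[] buf, int start, int length) {
            // The parser may deliver one text node in several fragments.
            charFragCount++;
            totalCharLen += length;
        }
    }

    // Returns {elements, attributes, character fragments, total char length}.
    static int[] count(String xml) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            SAXParser sp = spf.newSAXParser();
            CountHandler ch = new CountHandler();
            sp.parse(new InputSource(new StringReader(xml)), ch);
            return new int[] {ch.elemCount, ch.attrCount,
                              ch.charFragCount, ch.totalCharLen};
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        int[] r = count("<a x=\"1\" y=\"2\"><b>hi</b><c>there</c></a>");
        System.out.println("elements=" + r[0] + " attributes=" + r[1]
                + " fragments=" + r[2] + " chars=" + r[3]);
    }
}
```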

16 Essentials of the StAX Solution
• First, initialize:

    XMLInputFactory xif = XMLInputFactory.newInstance();
    xif.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, true);
    InputStream input = new FileInputStream(docFile);
    int elemCount = 0, charFragCount = 0,
        totalCharLen = 0, attrCount = 0;

• Then parse the InputStream, using (a) the cursor API, or (b) the event iterator API

17 (a) StAX Cursor API Solution (1)

    XMLStreamReader xsr = xif.createXMLStreamReader(input);
    while (xsr.hasNext()) {
      int eventType = xsr.next();
      switch (eventType) {
        case XMLEvent.START_ELEMENT:
          elemCount++;
          attrCount += xsr.getAttributeCount();
          break;

18 (a) StAX Cursor API Solution (2)

        case XMLEvent.CHARACTERS:
          charFragCount++;
          totalCharLen += xsr.getTextLength();
          break;
        default: break;
      } // switch
    } // while (xsr.hasNext())
    xsr.close();
    input.close();

19 (b) StAX Iterator API Solution (1)

    XMLEventReader xer = xif.createXMLEventReader(input);
    while (xer.hasNext()) {
      XMLEvent event = xer.nextEvent();
      if (event.isStartElement()) {
        elemCount++;
        Iterator attrs = event.asStartElement().getAttributes();
        while (attrs.hasNext()) {
          attrs.next(); attrCount++;
        }
      } // if (event.isStartElement())

20 (b) StAX Iterator API Solution (2)

      if (event.isCharacters()) {
        charFragCount++;
        totalCharLen += ((Characters) event).getData().length();
      }
    } // while (xer.hasNext())
    xer.close();
    input.close();
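
Solutions (a) and (b) can be put side by side in one runnable sketch that reads from a string and checks that both report the same statistics. The class StaxCount, its method names and the int[] result layout are introduced here for illustration and are not part of the original program.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.Iterator;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.events.XMLEvent;

class StaxCount {
    // (a) cursor API: returns {elements, attributes, total char length}.
    static int[] withCursor(String xml) {
        int elemCount = 0, attrCount = 0, totalCharLen = 0;
        try {
            XMLStreamReader xsr = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (xsr.hasNext()) {
                switch (xsr.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        elemCount++;
                        attrCount += xsr.getAttributeCount();
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        totalCharLen += xsr.getTextLength();
                        break;
                    default:
                        break;
                }
            }
            xsr.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        return new int[] {elemCount, attrCount, totalCharLen};
    }

    // (b) iterator API: same statistics, via XMLEvent objects.
    static int[] withIterator(String xml) {
        int elemCount = 0, attrCount = 0, totalCharLen = 0;
        try {
            XMLEventReader xer = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            while (xer.hasNext()) {
                XMLEvent event = xer.nextEvent();
                if (event.isStartElement()) {
                    elemCount++;
                    Iterator<?> attrs = event.asStartElement().getAttributes();
                    while (attrs.hasNext()) { attrs.next(); attrCount++; }
                } else if (event.isCharacters()) {
                    totalCharLen += event.asCharacters().getData().length();
                }
            }
            xer.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        return new int[] {elemCount, attrCount, totalCharLen};
    }

    public static void main(String[] args) {
        String doc = "<a x=\"1\" y=\"2\"><b>hi</b><c>there</c></a>";
        System.out.println(Arrays.toString(withCursor(doc)));
        System.out.println(Arrays.toString(withIterator(doc)));
    }
}
```

The iterator version allocates an XMLEvent object per event, which is the overhead the efficiency comparison in the following slides refers to.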

21 Efficiency of SAX vs StAX

22 Efficiency of SAX vs StAX (NEW)

23 Observations
• StAX cursor API is the most efficient
  – overhead of XMLEvent objects makes the StAX iterator some 50–80% slower
• On small documents, SAX is ~40–100% slower than the StAX cursor API
• Overhead of DTD validation adds ~5–10% to SAX parsing time
• StAX loses its advantage with bigger documents:

24 Times on Larger Documents
Why? Let's take a look at memory usage

25 Memory Usage of SAX vs StAX
StAX implementation has a memory leak! (Should get fixed in future releases)
(chart label: < 6 MB)

26 Memory Usage of SAX vs StAX (NEW)
Memory leak also in the SAX implementation!

27 Circumventing the Memory Leak
• The bug appears to be related to a DOCTYPE declaration with an external DTD
• Without a DOCTYPE declaration
  – in the first experiment, each API uses less than 6 MB
  – in the second experiment, the StAX event objects still require increasing amounts of memory; see next

28 SAX vs StAX memory need (w.o. DTD)

29 Speed on documents without DTD

30 Speed on documents without DTD (NEW)

31 StAX: Summary
• Event-based streaming pull API for XML documents
• More convenient than SAX
  – and often more efficient, esp. the cursor API with small docs
• Also supports writing of XML data
• A potential substitute for SAX
  – NB: Sun Java Streaming XML Parser (in JDK 1.6) is non-validating (but the API allows validation, too)
  – once some implementation bugs (in JDK 1.6) get eliminated

