21-Jun-15 SAX (Abbreviated)

2 XML Parsers
SAX and DOM are standards for XML parsers: program APIs to read and interpret XML files. DOM is a W3C standard. SAX is an ad-hoc (but very popular) standard. It was developed by David Megginson and is open source. XOM is a parser by Elliotte Rusty Harold. StAX is a new parser API from Sun and BEA Systems. Some others are XNI and JDOM. Unlike many XML technologies, XML parsers are relatively easy to use.

3 Types of XML parsers
XML parsers can be classified as tree-based or event-based.
- Tree-based parsers read the entire XML document into memory and store it as a tree data structure. They allow random access to the XML, so they are usually more convenient to work with. It's usually possible to manipulate the tree and write out the modified XML. However, parse trees can take up a lot of memory. DOM and XOM are tree-based.
- Event-based (or streaming) parsers read sequentially through the XML file. They are usually faster and use very little memory, which is important for large documents and for web sites. SAX and StAX are event-based.
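The following is a minimal sketch of the tree-based approach. It uses the JDK's built-in DOM API (`DocumentBuilder`) rather than XOM. It parses the whole document into memory, jumps directly to the first `<i>` element, changes that element's text, and writes the modified tree back out. The class and method names (`TreeDemo`, `replaceFirstItalic`) are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class TreeDemo {
    // Hypothetical helper: replace the text of the first <i> element and return the new XML.
    // Assumes the document contains at least one <i> element.
    public static String replaceFirstItalic(String xml, String newText) throws Exception {
        // The whole document is parsed into an in-memory tree.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        // Random access: go straight to the first <i> anywhere in the tree.
        Element italic = (Element) doc.getElementsByTagName("i").item(0);
        italic.setTextContent(newText);

        // Write the modified tree back out as text, without an XML declaration.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(replaceFirstItalic("<display><i>Hello</i>World!</display>", "Goodbye"));
    }
}
```

An event-based parser could not make this edit in place. It would have to copy the document while streaming through it.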

4 Types of streaming XML parsers
Streaming XML parsers can be classified as push or pull parsers.
- Push parsers take control and call your methods with the XML constituents as they are encountered. Many Java programmers find this style of programming unnatural. SAX and XNI are push parsers.
- Pull parsers let you ask for the next XML constituent. Pull parsers are similar to iterators. StAX is a pull parser.
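The pull style can be sketched with StAX's `XMLStreamReader`, which ships with the JDK in `javax.xml.stream`. In this style the program's own loop asks for each event, much like an iterator. The class and method names are illustrative assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class PullDemo {
    // The application drives the loop: it asks the reader for the next event.
    public static List<String> events(String xml) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> seen = new ArrayList<>();
        try {
            while (r.hasNext()) {
                switch (r.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        seen.add("start " + r.getLocalName());
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        seen.add("text " + r.getText());
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        seen.add("end " + r.getLocalName());
                        break;
                    default:
                        break; // ignore comments, processing instructions, end of document, etc.
                }
            }
        } finally {
            r.close();
        }
        return seen;
    }

    public static void main(String[] args) throws Exception {
        for (String event : events("<display><i>Hello</i>World!</display>")) {
            System.out.println(event);
        }
    }
}
```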

5 SAX vs. StAX
At this point, there seems to be no reason to use SAX rather than StAX, if you have a choice. SAX and StAX are both streaming parsers, but StAX is simpler and often faster. With StAX your program has control, rather than being controlled by the parser. This means:
- You can choose which tags to look at, rather than having to deal with them all.
- You can stop whenever you like.
However, StAX is newer, so most existing projects use SAX. Many ideas, such as the use of factories, are the same for both.
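The two advantages listed above can be sketched with StAX. The loop below skips every event it does not care about and returns as soon as it has read the first `<title>` element, without reading the rest of the document. The element name `title` and the helper name are assumptions for illustration. With SAX, stopping early usually means throwing an exception from inside a callback.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StopEarly {
    // Returns the text of the first <title> element, or null if there is none.
    public static String firstTitle(String xml) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (r.hasNext()) {
                // Only look at the events we care about; everything else is skipped.
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && r.getLocalName().equals("title")) {
                    return r.getElementText(); // reads up to the matching end tag, then we stop
                }
            }
            return null;
        } finally {
            r.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library><book><title>SAX</title></book>"
                + "<book><title>StAX</title></book></library>";
        System.out.println(firstTitle(xml)); // SAX
    }
}
```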

6 SAX uses callbacks
SAX works through callbacks: you call the parser, and it calls methods that you supply.
[Figure: your program's main(...) calls parse(...) on the SAX parser, which then calls back your startDocument(...), startElement(...), characters(...), endElement(...), and endDocument(...) methods.]
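The callback flow in the figure can be observed with a handler that simply records which of its methods the parser calls, and in what order. This is a sketch. `CallbackOrder` and `Recorder` are illustrative names, and the input is passed as a string instead of a file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CallbackOrder {
    // Records the name of each callback as the parser makes it.
    static class Recorder extends DefaultHandler {
        final List<String> calls = new ArrayList<>();

        @Override public void startDocument() { calls.add("startDocument"); }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            calls.add("startElement " + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            calls.add("characters \"" + new String(ch, start, length) + "\"");
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            calls.add("endElement " + qName);
        }

        @Override public void endDocument() { calls.add("endDocument"); }
    }

    public static List<String> record(String xml) throws Exception {
        Recorder recorder = new Recorder();
        // We call parse(...) once; the parser then calls back into the Recorder.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), recorder);
        return recorder.calls;
    }

    public static void main(String[] args) throws Exception {
        record("<display>Hello World!</display>").forEach(System.out::println);
    }
}
```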

7 Simple SAX program
The following program is adapted from CodeNotes® for XML by Gregory Brill, pages
The program consists of two classes:
- Sample: this class contains the main method. It
  - gets a factory to make parsers,
  - gets a parser from the factory,
  - creates a Handler object to handle callbacks from the parser,
  - tells the parser which handler to send its callbacks to, and
  - reads and parses the input XML file.
- Handler: this class contains handlers for three kinds of callbacks:
  - startElement callbacks, generated when a start tag is seen,
  - endElement callbacks, generated when an end tag is seen, and
  - characters callbacks, generated for the contents of an element.

8 The Sample class, I

    import javax.xml.parsers.*; // for both SAX and DOM
    import org.xml.sax.*;
    import org.xml.sax.helpers.*;

    // For simplicity, main just declares "throws Exception" and lets the JVM
    // report any exception. In "real life" this is poor programming practice.
    public class Sample {
        public static void main(String[] args) throws Exception {
            // Create a parser factory
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Tell the factory that the parser must understand namespaces
            factory.setNamespaceAware(true);
            // Make the parser
            SAXParser saxParser = factory.newSAXParser();
            XMLReader parser = saxParser.getXMLReader();

9 The Sample class, II
In the previous slide we made a parser of type XMLReader.

            // Create a handler (Handler is my class)
            Handler handler = new Handler();
            // Tell the parser to use this handler
            parser.setContentHandler(handler);
            // Finally, read and parse the document
            parser.parse("hello.xml");
        } // end of main
    } // end of Sample class

You will need to put the file hello.xml:
- in the directory you run the program from, if you run it from the command line, or
- where it can be found by the particular IDE you are using.
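If hello.xml is hard to place where the parser looks for it, one option is to give the parser an absolute `file:` URI instead of a bare file name. This is a minimal sketch. The temporary file, the element-counting handler, and the class name `ParseByUri` are illustrative assumptions, not part of the original program.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ParseByUri {
    // Counts the start tags in the file at 'path'. The parser receives an absolute
    // file: URI, so the result does not depend on the current working directory.
    public static int countElements(Path path) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader parser = factory.newSAXParser().getXMLReader();
        int[] count = {0};
        parser.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        });
        parser.parse(path.toUri().toString());
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("hello", ".xml");
        Files.write(file, "<display>Hello World!</display>".getBytes(StandardCharsets.UTF_8));
        System.out.println(countElements(file)); // 1
        Files.delete(file);
    }
}
```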

10 The Handler class, I
DefaultHandler is an adapter class that defines these methods and others as do-nothing methods, to be overridden as desired. We will define three very similar methods to handle (1) start tags, (2) contents, and (3) end tags. Our methods will just print a line. Each of these three methods could throw a SAXException.

    import org.xml.sax.*;
    import org.xml.sax.helpers.*;

    public class Handler extends DefaultHandler {
        // SAX calls this method when it encounters a start tag
        @Override
        public void startElement(String namespaceURI, String localName,
                                 String qualifiedName, Attributes attributes)
                throws SAXException {
            System.out.println("startElement: " + qualifiedName);
        }

11 The Handler class, II

        // SAX calls this method to pass in character data
        @Override
        public void characters(char[] ch, int start, int length)
                throws SAXException {
            System.out.println("characters: \"" + new String(ch, start, length) + "\"");
        }

        // SAX calls this method when it encounters an end tag
        @Override
        public void endElement(String namespaceURI, String localName,
                               String qualifiedName) throws SAXException {
            System.out.println("Element: /" + qualifiedName);
        }
    } // End of Handler class
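The Handler above ignores the `attributes` parameter of `startElement`. The sketch below shows one way to read it, using the `Attributes` methods `getLength()`, `getQName(i)`, and `getValue(i)`. The `lang` and `size` attributes in the sample input are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class AttributeDemo {
    // Records each start tag, followed by its attributes, one per line.
    static class AttributePrinter extends DefaultHandler {
        final StringBuilder out = new StringBuilder();

        @Override
        public void startElement(String namespaceURI, String localName,
                                 String qualifiedName, Attributes attributes) {
            out.append("startElement: ").append(qualifiedName).append('\n');
            for (int i = 0; i < attributes.getLength(); i++) {
                out.append("  ").append(attributes.getQName(i))
                   .append("=\"").append(attributes.getValue(i)).append("\"\n");
            }
        }
    }

    public static String describe(String xml) throws Exception {
        AttributePrinter printer = new AttributePrinter();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), printer);
        return printer.out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(describe("<display lang=\"en\" size=\"12\">Hello World!</display>"));
    }
}
```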

12 Results
If the file hello.xml contains:

    <display>Hello World!</display>

then the output from running java Sample will be:

    startElement: display
    characters: "Hello World!"
    Element: /display

13 More results
Now suppose the file hello.xml contains:

    <display>
       <i>Hello</i>World!
    </display>

Notice that the root element, <display>, now contains a nested element, <i>, and some whitespace (including newlines). The result will be:

    startElement: display
    characters: ""          // empty string
    characters: "
    "                       // newline
    characters: "   "       // spaces
    startElement: i
    characters: "Hello"
    Element: /i
    characters: "World!"
    characters: "
    "                       // another newline
    Element: /display

How a run of character data is split into characters callbacks depends on the parser. Another parser may report these chunks differently, for example without the empty string.
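A parser may deliver one run of text in several `characters` calls, as the output above shows. A common pattern is therefore to buffer the text and use it only at the next tag boundary. The sketch below does that. As one possible policy (an assumption, not the only choice), it trims the text and drops whitespace-only runs. That suits reading data but would lose meaningful leading or trailing spaces. The class and method names are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TextCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> texts = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length); // text may arrive in pieces; just accumulate
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flush();
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush();
    }

    // At a tag boundary, keep the buffered text unless it is only whitespace.
    private void flush() {
        String text = buffer.toString().trim();
        if (!text.isEmpty()) {
            texts.add(text);
        }
        buffer.setLength(0);
    }

    public static List<String> collect(String xml) throws Exception {
        TextCollector collector = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), collector);
        return collector.texts;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect("<display>\n   <i>Hello</i>World!\n</display>")); // [Hello, World!]
    }
}
```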

14 Factories
SAX uses a parser factory. A factory is an alternative to constructors. Factories allow the programmer to:
- decide whether or not to create a new object, and
- decide what kind (subclass, implementation) of object to create.
Trivial example:

    class TrustMe {
        private TrustMe() { }                    // private constructor

        public static TrustMe makeTrust() {      // factory method
            if (testOfSomeSort()) {
                return new TrustMe();
            }
            return null;                         // the factory chose not to create an object
        }

        // Hypothetical placeholder for whatever test the factory applies
        private static boolean testOfSomeSort() {
            return true;
        }
    }

15 Parser factories
To create a SAX parser factory, call this method:

    SAXParserFactory.newInstance()

This returns an object of type SAXParserFactory. It may throw a FactoryConfigurationError.
You can then customize your parser:

    public void setNamespaceAware(boolean awareness)

Call this with true if you are using namespaces. The default (if you don't call this method) is false.

    public void setValidating(boolean validating)

Call this with true if you want to validate against a DTD. The default (if you don't call this method) is false. With validation turned on, the parser reports an error if the document has no DTD.
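To see what `setNamespaceAware` changes, the sketch below reports the namespace URI, local name, and qualified name that `startElement` receives for a prefixed root element. The `urn:example` namespace and the helper name are made up. With namespace processing turned off, SAX allows the URI and local name to be empty strings, so in that mode only the qualified name is reliable.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class NamespaceDemo {
    // Returns "uri|localName|qName" as reported for the root element.
    public static String rootNames(String xml, boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        StringBuilder result = new StringBuilder();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (result.length() == 0) { // only record the first (root) element
                    result.append(uri).append('|').append(localName).append('|').append(qName);
                }
            }
        });
        return result.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<ex:item xmlns:ex=\"urn:example\"/>";
        System.out.println(rootNames(xml, true));  // urn:example|item|ex:item
        System.out.println(rootNames(xml, false));
    }
}
```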

16 Getting a parser
Once you have a SAXParserFactory set up (say it's named factory), you can create a parser with:

    SAXParser saxParser = factory.newSAXParser();
    XMLReader parser = saxParser.getXMLReader();

Note: older texts may use Parser in place of XMLReader. Parser is SAX1, not SAX2, and is now deprecated. SAX2 supports namespaces and some new parser properties.
Note: SAXParser is not guaranteed to be thread-safe. To use it in multiple threads, create a separate SAXParser for each thread. This is unlikely to be a problem in class projects.
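One way to follow the thread-safety advice is to give each thread its own parser through a `ThreadLocal`. This sketch assumes each thread parses many documents. If that is not the case, simply creating a new parser for each parse is also fine. The class and method names are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class PerThreadParsers {
    // Each thread lazily gets its own SAXParser; parsers are never shared between threads.
    private static final ThreadLocal<SAXParser> PARSER = ThreadLocal.withInitial(() -> {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    });

    public static int countElements(String xml) throws Exception {
        SAXParser parser = PARSER.get();
        parser.reset(); // clear any state left over from a previous parse on this thread
        int[] count = {0};
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<Integer>> results = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            results.add(pool.submit(() -> countElements("<a><b/><c/></a>")));
        }
        for (Future<Integer> f : results) {
            System.out.println(f.get()); // 3
        }
        pool.shutdown();
    }
}
```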

17 Declaring which handler to use
Since the SAX parser will be calling our methods, we need to supply these methods. In the example they are in a separate class, Handler. We need to tell the parser where to find the methods:

    Handler handler = new Handler();
    parser.setContentHandler(handler);

These statements could be combined:

    parser.setContentHandler(new Handler());

Finally, we call the parser and tell it what file to parse:

    parser.parse("hello.xml");

Everything else will be done in the handler methods.
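As an alternative to `getXMLReader` and `setContentHandler`, `SAXParser` also has `parse` methods that take the input and a `DefaultHandler` directly. The JDK's implementation then registers that one handler for all its roles: content, DTD, entity resolution, and errors. The sketch below reads from an in-memory stream. The anonymous text-collecting handler and the class name are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class ShortcutParse {
    // Returns all character data in the document, concatenated.
    public static String allText(InputStream in) throws Exception {
        SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
        StringBuilder text = new StringBuilder();
        // The handler is passed straight to parse(); there is no separate setContentHandler call.
        saxParser.parse(in, new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<display>Hello World!</display>".getBytes(StandardCharsets.UTF_8);
        System.out.println(allText(new ByteArrayInputStream(xml))); // Hello World!
    }
}
```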

18 SAX handlers
SAX defines four callback interfaces. You register an object with the parser for each one you need:
- interface ContentHandler: this is the most important interface. It handles elements (tags), attributes, and text content. Again, this is done by callbacks: you have to supply a method for each type of content.
- interface DTDHandler: handles only notation and unparsed entity declarations.
- interface EntityResolver: does customized handling for external entities.
- interface ErrorHandler: must be implemented, or recoverable parsing errors (such as validity errors) will be silently ignored! Fatal well-formedness errors still throw an exception.
Adapter classes are provided for these interfaces. For example, org.xml.sax.helpers.DefaultHandler implements all four.
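The ErrorHandler warning can be illustrated with a handler that turns recoverable errors into exceptions, so that an invalid document actually fails. This is a sketch. It assumes the documents carry an internal DTD, and the names `StrictValidation`, `Strict`, and `isValid` are made up.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

public class StrictValidation {
    // Recoverable errors (such as validity errors) are rethrown instead of being ignored.
    static class Strict implements ErrorHandler {
        @Override public void warning(SAXParseException e) {
            System.err.println("warning: " + e.getMessage());
        }
        @Override public void error(SAXParseException e) throws SAXException { throw e; }
        @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
    }

    // Returns true if the document is well-formed and valid against its DTD.
    public static boolean isValid(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setErrorHandler(new Strict());
        try {
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String dtd = "<!DOCTYPE note [<!ELEMENT note (#PCDATA)>]>";
        System.out.println(isValid(dtd + "<note>hi</note>"));    // true
        System.out.println(isValid(dtd + "<note><b/></note>"));  // false: <b> is not allowed
    }
}
```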

19 Whitespace
Whitespace is usually a problem when parsing XML.
- A nonvalidating parser cannot ignore whitespace, because it cannot distinguish it from real data. You have to write code to recognize and discard whitespace.
- A validating parser knows from the DTD where character data is not allowed. It reports whitespace in those places through ignorableWhitespace rather than characters, so a handler that only overrides characters effectively ignores it. For processing XML, this is usually what you want. However, if you are manipulating and writing out XML, discarding whitespace ruins your indentation.
To capture ignorable whitespace, SAX provides a method, ignorableWhitespace, that you can override.
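The sketch below shows how whitespace is routed by a validating parser. It assumes a small internal DTD in which `<list>` may contain only `<item>` elements, so the indentation inside `<list>` should arrive through `ignorableWhitespace`, while the text inside `<item>` arrives through `characters`. The class and field names are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class IgnorableWhitespaceDemo {
    // Keeps real character data and ignorable whitespace apart.
    static class Counter extends DefaultHandler {
        final StringBuilder text = new StringBuilder();
        int ignorableChars = 0;

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            ignorableChars += length; // whitespace where the DTD allows no character data
        }
    }

    public static Counter parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // the DTD tells the parser where text is not allowed
        Counter counter = new Counter();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), counter);
        return counter;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE list [<!ELEMENT list (item+)><!ELEMENT item (#PCDATA)>]>\n"
                + "<list>\n  <item>Hi</item>\n</list>";
        Counter c = parse(xml);
        System.out.println("text: \"" + c.text + "\", ignorable characters: " + c.ignorableChars);
    }
}
```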

20 Summing up
These slides provide only the very basics of using SAX. I've omitted descriptions of the major methods.
Important concepts:
- "Push" technology, using callbacks (common in C++)
- Factory methods
If you would like to know more, see my slides from previous years. For a good quick start, I recommend the book referenced earlier: CodeNotes® for XML by Gregory Brill.

21 The End