XML Parsers By Chongbing Liu


XML Parsers
• What is an XML parser?
• DOM and SAX parser APIs
• Xerces-J parsers overview
• Working with XML parsers (example)

What is an XML Parser?
• It is a software library (or package) that provides methods (or interfaces) for client applications to work with XML documents
• It checks that documents are well-formed
• It may also validate documents (against a DTD or schema)
• It handles many other low-level details, so the client is shielded from that complexity
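As a rough illustration of the well-formedness check, here is a minimal sketch. It assumes the JAXP API bundled with the JDK (javax.xml.parsers) instead of the Xerces-J classes used later in these slides; the class name WellFormedCheck and the sample strings are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the original slides.
class WellFormedCheck {

    /** Returns true if the parser accepts the text as well-formed XML. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and ignores warnings,
            // which avoids the default handler's messages on stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser reports a well-formedness violation as a SAXException.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<shapes><circle/></shapes>"));  // true
        System.out.println(isWellFormed("<shapes><circle></shapes>"));   // false: <circle> is never closed
    }
}
```

The client never inspects individual characters or tags here; the library does that work and only reports the outcome.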


DOM and SAX Parsers: in general
• DOM: Document Object Model
• SAX: Simple API for XML
• A DOM parser implements the DOM API
• A SAX parser implements the SAX API
• Most major parsers implement both the DOM and SAX APIs

DOM and SAX Parsers: DOM parsers
• DOM Document object
• Main features of DOM parsers

DOM and SAX Parsers: DOM Document Object
• A DOM document is an object containing all the information of an XML document
• It is composed of a tree (the DOM tree) of nodes, plus further nodes that are associated with nodes in the tree but are not themselves part of it (attribute nodes, for example, belong to an element but are not its children)

DOM and SAX Parsers: DOM Document Object
There are 12 types of nodes in a DOM Document object, including:
• Document node
• Element node
• Text node
• Attribute node
• Processing instruction node
• ……
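To make the node types concrete, the following sketch walks a small document and lists the type of each node it meets. It assumes the JDK's built-in JAXP parser in place of Xerces-J; the class NodeTypes and the sample document are hypothetical, and only the five node types named above are given readable labels.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the original slides.
class NodeTypes {

    /** Maps the numeric DOM node type to a readable label. */
    static String typeName(Node n) {
        switch (n.getNodeType()) {
            case Node.DOCUMENT_NODE:               return "Document";
            case Node.ELEMENT_NODE:                return "Element";
            case Node.TEXT_NODE:                   return "Text";
            case Node.ATTRIBUTE_NODE:              return "Attribute";
            case Node.PROCESSING_INSTRUCTION_NODE: return "ProcessingInstruction";
            default:                               return "Other(" + n.getNodeType() + ")";
        }
    }

    /** Depth-first walk; attributes are listed separately because they are not child nodes. */
    static void walk(Node n, String indent, List<String> out) {
        out.add(indent + typeName(n) + " " + n.getNodeName());
        NamedNodeMap attrs = n.getAttributes();      // null for non-element nodes
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                Node a = attrs.item(i);
                out.add(indent + "  " + typeName(a) + " " + a.getNodeName());
            }
        }
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            walk(c, indent + "  ", out);
        }
    }

    static List<String> describe(String xml) {
        try {
            Node doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> out = new ArrayList<>();
            walk(doc, "", out);
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample document, made up for this sketch.
        String xml = "<?style simple?><shapes><circle color=\"BLUE\"><x>20</x></circle></shapes>";
        describe(xml).forEach(System.out::println);
    }
}
```

Note that the attribute has to be fetched through getAttributes(): the child-node walk alone never reaches it.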

DOM and SAX Parsers: DOM parsers, continued (Appendix)
Sample XML document: …… 20 ……


DOM and SAX Parsers: main features of DOM parsers
• A DOM parser creates an internal structure in memory, which is a DOM Document object
• Client applications get the information of the original XML document by invoking methods on this Document object or on the other objects it contains
• A DOM parser is tree-based (DOM-object-based)
• From the data-flow point of view, the client application actively pulls the data

DOM and SAX Parsers: main features of DOM parsers (cont.)
Advantages:
(1) It works well when random access to widely separated parts of a document is required
(2) It supports both read and write operations
Disadvantages:
(1) It is memory-inefficient, because the whole document is held in memory
(2) It looks complicated at first, although in practice it is not
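The read-and-write advantage can be sketched as follows. This is only an illustration under assumptions: it uses the JDK's JAXP DocumentBuilder in place of Xerces-J, and the class DomReadWrite, the method recolor, and the sample input are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the original slides.
class DomReadWrite {

    /** Parses the text, recolors every circle, and appends one new circle. */
    static Document recolor(String xml, String newColor) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Read and modify: the whole tree is in memory, so any node is reachable.
            NodeList circles = doc.getElementsByTagName("circle");
            for (int i = 0; i < circles.getLength(); i++) {
                ((Element) circles.item(i)).setAttribute("color", newColor);
            }

            // Write: create a new element and attach it under the root element.
            Element extra = doc.createElement("circle");
            extra.setAttribute("color", newColor);
            doc.getDocumentElement().appendChild(extra);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = recolor("<shapes><circle color=\"BLUE\"/></shapes>", "RED");
        NodeList circles = doc.getElementsByTagName("circle");
        System.out.println(circles.getLength());                                 // 2
        System.out.println(((Element) circles.item(0)).getAttribute("color"));   // RED
    }
}
```

A SAX client could not do this directly, since it never holds the document as a whole.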

DOM and SAX Parsers: SAX parsers
• A SAX parser does not first create any internal structure
• The client does not decide which methods are called, or when
• The client just overrides the methods of the API and places its own code inside them
• When the parser encounters a start-tag, an end-tag, etc., it treats them as events

DOM and SAX Parsers: SAX parsers (cont.)
• When such an event occurs, the parser automatically calls back the corresponding method overridden by the client and passes what it has just seen as the method's arguments
• A SAX parser is event-based; it works like an event handler in Java (e.g. MouseAdapter)
• From the data-flow point of view, the client application passively receives the data

DOM and SAX Parsers: SAX parsers (cont.)
Advantages:
(1) It is simple
(2) It is memory-efficient
(3) It works well in streaming applications
Disadvantage:
The data is delivered in pieces, and clients never have all the information as a whole unless they build their own data structure
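The event-driven flow described above can be sketched by logging the callbacks in the order the parser fires them. The sketch assumes the JDK's JAXP SAXParser in place of Xerces-J; the class SaxEvents and the sample input are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the original slides.
class SaxEvents {

    /** Returns the sequence of document and element events reported by the parser. */
    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        // The client only overrides callbacks; the parser decides when to call them.
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startDocument() { log.add("startDocument"); }
            @Override public void startElement(String uri, String localName,
                                               String qName, Attributes atts) {
                log.add("start " + qName);
            }
            @Override public void endElement(String uri, String localName, String qName) {
                log.add("end " + qName);
            }
            @Override public void endDocument() { log.add("endDocument"); }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        // Prints: [startDocument, start shapes, start circle, end circle, end shapes, endDocument]
        System.out.println(events("<shapes><circle/></shapes>"));
    }
}
```

No tree is built at any point; if the client wants to keep anything beyond the current event, it has to store it itself, as the log list does here.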

Appendix: Callback in Java

    import java.awt.Button;
    import java.awt.event.MouseAdapter;
    import java.awt.event.MouseEvent;

    class MyMouseListener extends MouseAdapter {
        /** Overrides mousePressed(); AWT calls it back when the mouse is pressed. */
        @Override
        public void mousePressed(MouseEvent event) {
            // …… do something here after the mouse is pressed ……
        }

        /** Overrides mouseReleased(); AWT calls it back when the mouse is released. */
        @Override
        public void mouseReleased(MouseEvent event) {
            // …… do something here after the mouse is released ……
        }
    }

    // Register the listener; from here on the button invokes the overridden methods.
    MyMouseListener listener = new MyMouseListener();
    Button myButton = new Button("ok");
    myButton.addMouseListener(listener);


Xerces-J Parser Overview
• It is a Java package
• It provides two parsers: a DOM parser and a SAX parser
• It is a validating parser
• It fully supports DOM2 and SAX2, and partially supports DOM3 (W3C XML Schema)
• It is very popular

Xerces-J Parser Overview: package structure

    java.lang.Object
      |
      +-- org.apache.xerces.framework.XMLParser
            |
            +-- org.apache.xerces.parsers.DOMParser
            +-- org.apache.xerces.parsers.SAXParser

Xerces-J Parser Overview: DOMParser methods
• void parse(java.lang.String systemId)
  Parses the input source specified by the given system identifier.
• Document getDocument()
  Returns the document.
• ……

Xerces-J DOMParser: DOM interfaces
• Document
• Element
• Attr
• NodeList
• ProcessingInstruction
• NamedNodeMap
• ……

Xerces-J Parser Overview: SAXParser methods
• void parse(java.lang.String systemId)
  Parses the input source specified by the given system identifier.
• void setContentHandler(ContentHandler handler)
  Allows an application to register a content event handler.
• void setErrorHandler(ErrorHandler handler)
  Sets the error handler.
• ……

Xerces-J Parser Overview: SAXParser interfaces
• ContentHandler
• DTDHandler
• EntityResolver
• ErrorHandler
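The handler registration listed above can be sketched with the standard org.xml.sax.XMLReader interface, which declares the same setContentHandler() and setErrorHandler() methods. The sketch obtains the reader from the JDK's JAXP factory rather than instantiating the Xerces-J SAXParser class directly; the class ErrorHandlerDemo and its inputs are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the original slides.
class ErrorHandlerDemo {

    /** Parses the text and returns the problems reported to the error handler. */
    static List<String> parseAndCollect(String xml) {
        List<String> problems = new ArrayList<>();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(new DefaultHandler());   // ignore content events
            reader.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) {
                    problems.add("warning at line " + e.getLineNumber());
                }
                @Override public void error(SAXParseException e) {
                    problems.add("error at line " + e.getLineNumber());
                }
                @Override public void fatalError(SAXParseException e) throws SAXException {
                    problems.add("fatal at line " + e.getLineNumber());
                    throw e;   // a fatal error ends the parse
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // Already recorded by the error handler above.
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(parseAndCollect("<shapes><circle/></shapes>"));   // []
        System.out.println(parseAndCollect("<shapes>\n<circle>\n</shapes>"));
    }
}
```

Well-formedness violations arrive through fatalError(); warning() and error() are used for recoverable conditions such as validation errors when validation is switched on.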

Work with XML Parsers: Example
Task: Extract all information about circles

Example (DOMParser: create client class)

    public class shapes_DOM {
        static int numberOfCircles = 0;
        static int x[] = new int[1000];
        static int y[] = new int[1000];
        static int r[] = new int[1000];
        static String color[] = new String[1000];

        public static void main(String[] args) {
            ……
        }
    }

Example (DOMParser: create a DOMParser)

    import org.w3c.dom.*;
    import org.apache.xerces.parsers.DOMParser;

    public class shapes_DOM {
        ……
        public static void main(String[] args) {
            try {
                DOMParser parser = new DOMParser();
                parser.parse(args[0]);                 // args[0] is the system identifier of the XML file
                Document doc = parser.getDocument();   // the in-memory DOM Document object
                ……
            } catch (Exception e) {
                e.printStackTrace(System.err);
            }
        }
    }

Example (DOMParser: get all the circle nodes)

    NodeList nodelist = doc.getElementsByTagName("circle");
    numberOfCircles = nodelist.getLength();

Example (DOMParser: iterate over circle nodes)

    for (int i = 0; i < nodelist.getLength(); i++) {
        Node node = nodelist.item(i);
        ……
    }

Example (DOMParser: get color attribute)

    NamedNodeMap attrs = node.getAttributes();
    // getNamedItem() returns null when the circle has no color attribute
    Node colorAttr = attrs.getNamedItem("color");
    if (colorAttr != null)
        color[i] = colorAttr.getNodeValue();

Example (DOMParser: get child nodes)

    // get the child nodes of a circle
    NodeList childnodelist = node.getChildNodes();
    // get x, y and the radius
    for (int j = 0; j < childnodelist.getLength(); j++) {
        Node childnode = childnodelist.item(j);
        Node textnode = childnode.getFirstChild();   // text inside <x>, <y> or <radius>
        String childnodename = childnode.getNodeName();
        if (childnodename.equals("x"))
            x[i] = Integer.parseInt(textnode.getNodeValue().trim());
        else if (childnodename.equals("y"))
            y[i] = Integer.parseInt(textnode.getNodeValue().trim());
        else if (childnodename.equals("radius"))
            r[i] = Integer.parseInt(textnode.getNodeValue().trim());
    }
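Putting the DOM steps above together, a self-contained version might look like the sketch below. It rests on several assumptions: it uses the JDK's JAXP DocumentBuilder in place of org.apache.xerces.parsers.DOMParser, it collects results in a list where the slides use fixed-size arrays, and the document in SAMPLE is made up for illustration (the slides' own sample document did not survive). The class name CirclesDom is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the original slides.
class CirclesDom {

    // Assumed input shape: <circle color="..."> with <x>, <y>, <radius> children.
    static final String SAMPLE =
          "<shapes>"
        + "<circle color=\"BLUE\"><x>20</x><y>20</y><radius>20</radius></circle>"
        + "<circle color=\"RED\"><x>5</x><y>7</y><radius>3</radius></circle>"
        + "</shapes>";

    /** Returns one formatted line per circle element found in the document. */
    static List<String> extract(String xml) {
        List<String> result = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList circles = doc.getElementsByTagName("circle");
            for (int i = 0; i < circles.getLength(); i++) {
                Element circle = (Element) circles.item(i);
                int x = 0, y = 0, r = 0;
                for (Node child = circle.getFirstChild(); child != null;
                         child = child.getNextSibling()) {
                    // Skip whitespace text nodes between the child elements.
                    if (child.getNodeType() != Node.ELEMENT_NODE) continue;
                    String text = child.getTextContent().trim();
                    switch (child.getNodeName()) {
                        case "x":      x = Integer.parseInt(text); break;
                        case "y":      y = Integer.parseInt(text); break;
                        case "radius": r = Integer.parseInt(text); break;
                        default:       break;   // ignore unrelated children
                    }
                }
                result.add("(x=" + x + ",y=" + y + ",r=" + r
                        + ",color=" + circle.getAttribute("color") + ")");
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints (x=20,y=20,r=20,color=BLUE) and then (x=5,y=7,r=3,color=RED)
        extract(SAMPLE).forEach(System.out::println);
    }
}
```

Element.getAttribute() returns an empty string when the attribute is absent, so no null check is needed in this variant.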

Example (SAXParser: create client class)

    public class shapes_SAX extends DefaultHandler {
        static int numberOfCircles = 0;
        static int x[] = new int[1000];
        static int y[] = new int[1000];
        static int r[] = new int[1000];
        static String color[] = new String[1000];

        public static void main(String[] args) {
            ……
        }
    }

Example (SAXParser: create a SAXParser)

    import org.xml.sax.*;
    import org.xml.sax.helpers.DefaultHandler;
    import org.apache.xerces.parsers.SAXParser;

    public class shapes_SAX extends DefaultHandler {
        public static void main(String[] args) {
            try {
                shapes_SAX SAXHandler = new shapes_SAX();
                SAXParser parser = new SAXParser();
                parser.setContentHandler(SAXHandler);   // register the callbacks
                parser.parse(args[0]);                  // events are delivered during this call
            } catch (Exception e) {
                … …
            }
        }
    }

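Example (SAXParser: override methods of interest)
• startDocument(), endDocument()
• startElement(), endElement()
• startCDATA(), endCDATA()
• startDTD(), endDTD()
• characters()
• … …
Note: in SAX2, startCDATA(), endCDATA(), startDTD() and endDTD() are declared by org.xml.sax.ext.LexicalHandler, not by ContentHandler or DefaultHandler, so they require a separately registered lexical handler.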

Example (SAXParser: override startElement() )

    // flagX, flagY and flagR are fields of shapes_SAX; their declarations are not shown on the slides
    public void startElement(String uri, String localName, String rawName, Attributes attributes) {
        if (rawName.equals("circle"))
            color[numberOfCircles] = attributes.getValue("color");
        else if (rawName.equals("x"))
            flagX = 1;
        else if (rawName.equals("y"))
            flagY = 1;
        else if (rawName.equals("radius"))
            flagR = 1;
    }

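Example (SAXParser: override endElement() )

    public void endElement(String uri, String localName, String rawName) {
        // count a circle only when </circle> is reached, not on </x>, </y> or </radius>
        if (rawName.equals("circle"))
            numberOfCircles += 1;
    }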

Example (SAXParser: override characters() )

    public void characters(char characters[], int start, int length) {
        String characterData = (new String(characters, start, length)).trim();
        // the flag set in startElement() tells which value this text belongs to
        if (flagX == 1) {
            x[numberOfCircles] = Integer.parseInt(characterData);
            flagX = 0;
        }
        if (flagY == 1) {
            y[numberOfCircles] = Integer.parseInt(characterData);
            flagY = 0;
        }
        if (flagR == 1) {
            r[numberOfCircles] = Integer.parseInt(characterData);
            flagR = 0;
        }
    }

Example (SAXParser: override endDocument() )

    public void endDocument() {
        // print the result
        System.out.println("circles=" + numberOfCircles);
        for (int i = 0; i < numberOfCircles; i++) {
            String line = "";
            line = line + "(x=" + x[i] + ",y=" + y[i] + ",r=" + r[i] + ",color=" + color[i] + ")";
            System.out.println(line);
        }
    }
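The SAX steps above can be combined into one self-contained handler, sketched below under assumptions. It uses the JDK's JAXP SAXParser in place of org.apache.xerces.parsers.SAXParser, and instead of the flag fields it buffers character data and reads the value in endElement(), a swapped-in technique that also copes with text delivered in several characters() calls. The class CirclesSax and the sample input are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the original slides.
class CirclesSax extends DefaultHandler {
    private final List<String> result = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();   // text of the current element
    private int x, y, r;
    private String color;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.setLength(0);                       // start collecting fresh text
        if (qName.equals("circle")) {
            x = y = r = 0;
            color = attributes.getValue("color");
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);          // may be called more than once per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        switch (qName) {
            case "x":      x = Integer.parseInt(value); break;
            case "y":      y = Integer.parseInt(value); break;
            case "radius": r = Integer.parseInt(value); break;
            case "circle": // the circle is complete only at its end tag
                result.add("(x=" + x + ",y=" + y + ",r=" + r + ",color=" + color + ")");
                break;
            default:       break;
        }
        text.setLength(0);
    }

    /** Parses the text and returns one formatted line per circle. */
    static List<String> extract(String xml) {
        CirclesSax handler = new CirclesSax();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.result;
    }

    public static void main(String[] args) {
        // Assumed sample input, made up for this sketch; prints (x=20,y=20,r=20,color=BLUE)
        String xml = "<shapes><circle color=\"BLUE\"><x>20</x><y>20</y>"
                   + "<radius>20</radius></circle></shapes>";
        extract(xml).forEach(System.out::println);
    }
}
```

The handler keeps only the fields of the circle currently being read, which is where the memory advantage of SAX comes from.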
