1 Processing XML with Java Representation and Management of Data on the Internet.

2 XML
XML is the eXtensible Markup Language.
It is a metalanguage:
– a language used to describe other languages, using “markup” tags that describe properties of the data
It is designed to be structured:
– strict rules govern how data can be formatted
It is designed to be extensible:
– you can define your own terms and markup

3 XML Family
XML is an official recommendation of the W3C.
It aims to accomplish what HTML cannot, while being simpler to use and implement than SGML.
[Figure: the markup-language family: SGML, HTML, XML, XHTML]

4 The Essence of XML
Syntax: the permitted arrangement or structure of letters and words in a language, as defined by a grammar (XML)
Semantics: the meaning of letters or words in a language
XML uses syntax to add semantics to documents.

5 Using XML
In XML, content is separated from its display.
XML can be used for:
– data representation
– data exchange

6 Databases and XML
Database content can be presented in XML:
– an XML processor can access a DBMS or file system and convert the data to XML
– a web server can serve the content as either XML or HTML

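7 HTML vs. XML
HTML | XML
Improper nesting is tolerated, e.g. <b><i>text</b></i> | Elements must be properly nested
Start tags are allowed without end tags, e.g. <p> | Empty-element tags must have a trailing slash, as in <br/>
Unquoted attribute values | Quoted attribute values
HTML is case insensitive | XML is case sensitive
Whitespace is largely ignored | Whitespace is significant
– | Typically begins with an XML declaration, <?xml version="1.0"?>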

8 HTML vs. XML
HTML | XML
Well-defined set of tags | You can use any tag you like
Tags have a known meaning | Tags have no predefined meaning

9 Some Things in Common
– Comments are allowed: <!-- comment -->
– Special characters must be escaped (e.g., &lt; for <, &gt; for >)

10 Processing XML – The Idea

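11 Sample Document
[Figure: the sample XML document: a transaction containing an account, an order to buy 100 shares of WEBM and an order to sell 30 shares of GE]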

12 DOM Parser
DOM = Document Object Model
The parser creates a tree object out of the document.
The user accesses data by traversing the tree.
The API allows for constructing, accessing and manipulating the structure and content of XML documents.

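13 Document as Tree
[Figure: the sample document as a tree: a transaction node with account, buy and sell children; buy carries ticker WEBM, shares 100 and exch NASDAQ; sell carries ticker GE, shares 30 and exch NYSE]
Methods like getRoot, getChildren, getAttributes, etc. (in the W3C DOM API: getDocumentElement(), getChildNodes(), getAttributes())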

14 Advantages and Disadvantages
Advantages:
– natural and relatively easy to use
– the tree can be traversed repeatedly
Disadvantages:
– high memory requirements: the whole document is kept in memory
– the whole document must be parsed before it can be used

15 SAX Parser
SAX = Simple API for XML
The parser generates "events" while reading through the document.
The parser calls methods (that you write) to deal with the events.
Like an I/O stream, it goes in one direction only.

16 Document as Events
Reading the sample document produces a sequence of events such as:
– Start tag: transaction
– Start tag: account
– Text: …
– End tag: account
– Start tag: buy
– Attribute: shares, Value: 100
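To make the event sequence concrete, here is a minimal sketch that uses the JAXP SAXParserFactory bundled with the JDK (rather than the Xerces-specific setup shown later in these slides). The SAMPLE string is an assumed reconstruction of the sample document: the element/attribute split follows the tree slide, and the account number A-123 is hypothetical. The handler logs one line per event, in the same style as the list above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventLog {
    // Assumed reconstruction of the sample document; the account number is hypothetical
    static final String SAMPLE =
        "<transaction>"
        + "<account>A-123</account>"
        + "<buy shares=\"100\" exch=\"NASDAQ\"><ticker>WEBM</ticker></buy>"
        + "<sell shares=\"30\" exch=\"NYSE\"><ticker>GE</ticker></sell>"
        + "</transaction>";

    /** Parses xml and returns one line per SAX event, in the order the parser reported them. */
    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                log.add("Start tag: " + qName);
                // Attributes arrive together with their start tag, not as separate events
                for (int i = 0; i < atts.getLength(); i++) {
                    log.add("Attribute: " + atts.getQName(i) + " Value: " + atts.getValue(i));
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("End tag: " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) log.add("Text: " + text);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return log;
    }

    public static void main(String[] args) {
        events(SAMPLE).forEach(System.out::println);
    }
}
```

Note that nothing is kept once an event has been handled; the only state is whatever the handler chooses to record.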

17 Advantages and Disadvantages
Advantages:
– requires little memory
– fast
Disadvantages:
– cannot reread the document
– less natural for object-oriented programmers (perhaps)

18 Which should we use? DOM vs. SAX
– If your document is very large and you only need a few elements, use SAX.
– If you need to manipulate (i.e., change) the XML, use DOM.
– If you need to access the XML many times, use DOM.
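As an illustration of the first rule, the sketch below pulls only the ticker values out of a transaction document without ever building a tree. The class name and the element names are hypothetical (taken from the sample document). Because characters() may deliver the text of one element in several chunks, the handler accumulates them until the closing tag.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TickerExtractor extends DefaultHandler {
    private final List<String> tickers = new ArrayList<>();
    private StringBuilder current; // non-null only while inside a <ticker> element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("ticker")) current = new StringBuilder();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text outside <ticker> is simply ignored, so memory use stays small
        if (current != null) current.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("ticker")) {
            tickers.add(current.toString().trim());
            current = null;
        }
    }

    /** Returns the text of every <ticker> element, in document order. */
    static List<String> extract(String xml) {
        TickerExtractor handler = new TickerExtractor();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return handler.tickers;
    }

    public static void main(String[] args) {
        String xml = "<transaction><buy shares=\"100\"><ticker>WEBM</ticker></buy>"
            + "<sell shares=\"30\"><ticker>GE</ticker></sell></transaction>";
        System.out.println(extract(xml)); // prints [WEBM, GE]
    }
}
```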

19 XML Parsers

20 XML Parsers
There are several different ways to categorise parsers:
– validating versus non-validating parsers
– DOM parsers versus SAX parsers
– parsers written in a particular language (Java, C++, Perl, etc.)

21 Validating Parsers
A validating parser makes sure that the document conforms to the specified DTD.
Validation is time-consuming, so a non-validating parser is faster.
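A minimal sketch of switching validation on with JAXP: setValidating(true) asks the parser to check the document against its DTD, and validity violations are reported to the ErrorHandler's error() method rather than thrown. The DTD and the two documents below are hypothetical examples.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class ValidationDemo {
    // Hypothetical DTD (internal subset): <buy> must carry a shares attribute
    static final String DTD =
        "<!DOCTYPE buy [<!ELEMENT buy (ticker)> <!ELEMENT ticker (#PCDATA)>"
        + " <!ATTLIST buy shares CDATA #REQUIRED>]>";
    static final String VALID = DTD + "<buy shares=\"100\"><ticker>WEBM</ticker></buy>";
    static final String INVALID = DTD + "<buy><ticker>WEBM</ticker></buy>"; // shares missing

    /** Parses xml with a validating SAX parser and returns the validity errors it reported. */
    static List<String> validityErrors(String xml) {
        List<String> errors = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                // Validity errors are recoverable: record them and let parsing continue
                errors.add("line " + e.getLineNumber() + ": " + e.getMessage());
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // check the document against its DTD
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println("valid:   " + validityErrors(VALID));
        System.out.println("invalid: " + validityErrors(INVALID));
    }
}
```

Running the same inputs with setValidating(false) reports no errors for either document, which is the speed/safety trade-off described on the slide.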

22 Using an XML Parser
Three basic steps:
– create a parser object
– pass the XML document to the parser
– process the results
Generally, writing out XML is not in the scope of parsers (though some may implement proprietary mechanisms).
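The three steps map directly onto the JAXP API in the JDK. A minimal sketch (the class name and input are hypothetical) that counts the elements in a document; with SAX, "processing the results" happens inside the handler while the parser runs.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ElementCounter extends DefaultHandler {
    private int count = 0;

    // Step 3: process the results, here by counting start tags as they are reported
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        count++;
    }

    static int countElements(String xml) {
        try {
            // Step 1: create a parser object
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // Step 2: pass the XML document to the parser, together with our handler
            ElementCounter counter = new ElementCounter();
            parser.parse(new InputSource(new StringReader(xml)), counter);
            return counter.count;
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<transaction><account/><buy><ticker>WEBM</ticker></buy></transaction>";
        System.out.println(countElements(xml)); // prints 4
    }
}
```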

23 SAX – Simple API for XML

24 The SAX Parser
The SAX parser is an event-driven API:
– an XML document is sent to the SAX parser
– the XML file is read sequentially
– the parser notifies the handler class when events happen, including errors
– the events are handled by the handler methods that the programmer implements

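25 [Figure: the main SAX interfaces and classes]
– XMLReaderFactory: used to create a SAX parser (an XMLReader)
– ContentHandler: handles document events: start tag, end tag, etc.
– ErrorHandler: handles parser errors
– DTDHandler and EntityResolver: handle DTDs and entities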

26 Problem
The SAX interface is an accepted standard, and there are many implementations.
We would like to be able to change the implementation used without changing any code in the program.
How is this done?

27 Factory Design Pattern
Have a "Factory" class that creates the actual parsers.
The factory checks the value of a system property that states which implementation should be used.
To change the implementation, simply change the system property.

28 Creating a SAX Parser
Import the following packages:
– org.xml.sax.*;
– org.xml.sax.helpers.*;
Set the following system property:
– System.setProperty("org.xml.sax.driver", "org.apache.xerces.parsers.SAXParser");
Create the instance from the factory:
– XMLReader reader = XMLReaderFactory.createXMLReader();
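XMLReaderFactory still works but has been deprecated since Java 9; the JAXP SAXParserFactory applies the same factory idea. Its newInstance() method first consults the javax.xml.parsers.SAXParserFactory system property and, absent other configuration, falls back to the parser built into the JDK. A minimal sketch that reports which implementation was chosen (the printed class name depends on the runtime):

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

public class FactoryLookup {
    /** Returns the class name of the XMLReader selected by the JAXP factory lookup. */
    static String readerClassName() {
        try {
            // The lookup checks the javax.xml.parsers.SAXParserFactory system property first,
            // so the implementation can be swapped without touching this code
            SAXParserFactory factory = SAXParserFactory.newInstance();
            XMLReader reader = factory.newSAXParser().getXMLReader();
            return reader.getClass().getName();
        } catch (Exception e) {
            throw new IllegalStateException("no SAX parser available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readerClassName());
    }
}
```

To switch implementations, run the program with -Djavax.xml.parsers.SAXParserFactory=<factory class name> and the desired parser on the classpath; the code itself stays unchanged.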

29 Receiving Parsing Information
A SAX parser calls methods such as startDocument, startElement, etc., as it runs.
To react to such events we must:
– implement the ContentHandler interface
– set the parser's content handler to an instance of our class (reader.setContentHandler(...))

30 ContentHandler
// Methods (partial list)
public void startDocument();
public void endDocument();
public void characters(char[] ch, int start, int length);
public void startElement(String namespaceURI, String localName, String qName, Attributes atts);
public void endElement(String namespaceURI, String localName, String qName);

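31 Namespaces and Element Names
[Example document: a forsale element with date="12/2/03" that declares the prefix xhtml (xmlns:xhtml = "urn: …"); it contains a book with the title "DBI: The Course I Wish I never Took" and the comment "My favorite book!", part of which is marked up with the element xhtml:em]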

32 Namespaces and Element Names (cont.)
For the same document, the parser reports:
– for the xhtml:em element: namespaceURI = "urn: …" (the URI declared for the xhtml prefix), localName = em, qName = xhtml:em
– for the book element: namespaceURI = "" (no namespace), localName = book, qName = book
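The sketch below reproduces the three names for a cut-down version of the example. With JAXP the factory must be made namespace-aware; otherwise namespaceURI and localName are reported as empty strings. The URI urn:example:xhtml is a hypothetical stand-in for the one declared on the slide, and the placement of xhtml:em is assumed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class NamespaceNames {
    // Cut-down version of the slide's example; urn:example:xhtml is a hypothetical URI
    static final String DOC =
        "<forsale date=\"12/2/03\" xmlns:xhtml=\"urn:example:xhtml\">"
        + "<book><title><xhtml:em>DBI:</xhtml:em> The Course I Wish I never Took</title></book>"
        + "</forsale>";

    /** Returns "uri=... local=... qName=..." for every element, in document order. */
    static List<String> names(String xml) {
        List<String> out = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                out.add("uri=" + uri + " local=" + localName + " qName=" + qName);
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP factories are not namespace-aware by default
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return out;
    }

    public static void main(String[] args) {
        names(DOC).forEach(System.out::println);
    }
}
```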

33 Receiving Parsing Information (cont.)
An easy way to implement the ContentHandler interface is to extend DefaultHandler, which implements this interface (and a few others) with empty methods.
To actually parse a document, create an InputSource from the document and pass it to the parse method of the XMLReader.

34
import java.io.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class InfoWithSax extends DefaultHandler {
    public static void main(String[] args) {
        // Select the SAX implementation (requires Apache Xerces on the classpath)
        System.setProperty("org.xml.sax.driver", "org.apache.xerces.parsers.SAXParser");
        try {
            XMLReader reader = XMLReaderFactory.createXMLReader();
            reader.setContentHandler(new InfoWithSax());
            reader.parse(new InputSource(new FileReader(args[0])));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

35
@Override
public void startDocument() throws SAXException {
    System.out.println("START DOCUMENT");
}

@Override
public void endDocument() throws SAXException {
    System.out.println("END DOCUMENT");
}

// Current nesting depth, used to indent the output
int depth;
String indent = " ";

private void println(String header, String value) {
    for (int i = 0; i < depth; i++) System.out.print(indent);
    System.out.println(header + ": " + value);
}

36
public void characters(char[] buf, int offset, int len) throws SAXException {
    String s = (new String(buf, offset, len)).trim();
    if (!"".equals(s)) println("CHARACTERS", s);
}

public void endElement(String namespaceURI, String localName, String name) throws SAXException {
    depth--;
    String elementName = name;
    if (!"".equals(namespaceURI) && !"".equals(localName))
        elementName = namespaceURI + ":" + localName;
    println("END ELEMENT", elementName);
}

37
@Override
public void startElement(String namespaceURI, String localName, String name, Attributes attrs)
        throws SAXException {
    String elementName = name;
    // Prefer the namespace-qualified form when the parser supplies it
    if (!"".equals(namespaceURI) && !"".equals(localName))
        elementName = namespaceURI + ":" + localName;
    println("START ELEMENT", elementName);
    if (attrs != null && attrs.getLength() > 0) {
        for (int i = 0; i < attrs.getLength(); i++)
            println("ATTRIBUTE", attrs.getLocalName(i) + "=" + attrs.getValue(i));
    }
    depth++;
}
} // end of class InfoWithSax

38 Bachelor Tags
What do you think happens when the parser parses a bachelor tag, i.e., an empty-element tag such as <br/>?
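One way to answer the question is to try it. The minimal sketch below (class name and input are hypothetical) records start and end events for a paragraph containing <br/>. The parser reports an empty-element tag exactly as if it had been written <br></br>: a startElement immediately followed by the matching endElement, so handlers need no special case for it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EmptyElementEvents {
    /** Returns "start:name" / "end:name" entries in the order the parser reports them. */
    static List<String> events(String xml) {
        List<String> out = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                out.add("start:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                out.add("end:" + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(events("<p>one<br/>two</p>"));
        // prints [start:p, start:br, end:br, end:p]
    }
}
```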

39 Attributes Interface
Elements may have attributes.
The Attributes interface makes no distinction between attributes specified explicitly in the document and those supplied by the DTD as default values.

40 Attributes Interface (cont.)
int getLength();
String getQName(int i);
String getType(int i);
String getValue(int i);
String getType(String qname);
String getValue(String qname);
etc.

41 Attribute Types
The following are the possible types for attributes:
– "CDATA"
– "ID"
– "IDREF", "IDREFS"
– "NMTOKEN", "NMTOKENS"
– "ENTITY", "ENTITIES"
– "NOTATION"
If the parser has not read a declaration for an attribute, getType reports it as "CDATA".
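A small experiment illustrating the last three slides. The hypothetical internal DTD below declares id as an ID attribute and gives status a default value. Even a non-validating parser reads the internal DTD subset, so the handler sees the defaulted status attribute just as if it had been written in the document, and getType reports the declared types.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class AttributeTypes {
    // Hypothetical document: status is not written on <item>, but the DTD supplies "new"
    static final String DOC =
        "<!DOCTYPE item [<!ELEMENT item EMPTY>"
        + " <!ATTLIST item id ID #REQUIRED status CDATA \"new\">]>"
        + "<item id=\"a1\"/>";

    /** Returns "name=value (type)" for every attribute the parser reports. */
    static List<String> describe(String xml) {
        List<String> out = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                for (int i = 0; i < atts.getLength(); i++) {
                    out.add(atts.getQName(i) + "=" + atts.getValue(i) + " (" + atts.getType(i) + ")");
                }
            }
        };
        try {
            // Non-validating by default; the internal DTD subset is still processed
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return out;
    }

    public static void main(String[] args) {
        describe(DOC).forEach(System.out::println);
    }
}
```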

42 Setting Features
It is possible to set the features of a parser using the setFeature method. Examples:
– reader.setFeature("http://xml.org/sax/features/namespaces", true);
– reader.setFeature("http://xml.org/sax/features/validation", false);
For a full list, see:
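A minimal sketch of setting and reading back the two features named above. It obtains the XMLReader through JAXP rather than XMLReaderFactory; the feature URIs are the standard SAX ones. Unknown feature URIs cause SAXNotRecognizedException, and values a parser cannot honour cause SAXNotSupportedException (both subclasses of SAXException).

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

public class FeatureDemo {
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";
    static final String VALIDATION = "http://xml.org/sax/features/validation";

    /** Configures a reader and returns the two feature values read back from it. */
    static boolean[] configure() {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // Features must be set before parse() is called
            reader.setFeature(NAMESPACES, true);
            reader.setFeature(VALIDATION, false);
            return new boolean[] { reader.getFeature(NAMESPACES), reader.getFeature(VALIDATION) };
        } catch (ParserConfigurationException | SAXException e) {
            // SAXNotRecognizedException and SAXNotSupportedException both land here
            throw new IllegalStateException("feature not available", e);
        }
    }

    public static void main(String[] args) {
        boolean[] values = configure();
        System.out.println("namespaces=" + values[0] + " validation=" + values[1]);
    }
}
```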

43 ErrorHandler Interface
We implement ErrorHandler to receive error events (similar to implementing ContentHandler).
DefaultHandler also implements ErrorHandler, so we can extend it (as before); its warning and error methods do nothing, while its fatalError method throws the exception it receives.
An ErrorHandler is registered with:
– reader.setErrorHandler(handler);
Three methods:
– void error(SAXParseException ex);
– void fatalError(SAXParseException ex);
– void warning(SAXParseException ex);

44 Extending the InfoWithSax Program
@Override
public void warning(SAXParseException err) throws SAXException {
    System.out.println("Warning in line " + err.getLineNumber()
        + " and column " + err.getColumnNumber());
}

@Override
public void error(SAXParseException err) throws SAXException {
    System.out.println("Oy va’avoi, an error!");
}

@Override
public void fatalError(SAXParseException err) throws SAXException {
    System.out.println("OY VA’AVOI, a fatal error!");
}
Will these methods be called in the case of a problem?
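The short answer is: only if the handler is registered. InfoWithSax calls only setContentHandler, so the parser never calls these error methods on it; the reader must also be given the handler through setErrorHandler. The sketch below does that (class name and inputs are hypothetical). Note that a non-validating parser mostly reports well-formedness problems, which go to fatalError; error() is typically reached for validity violations when validation is switched on.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ErrorHandlerDemo extends DefaultHandler {
    int warnings, errors, fatalErrors;

    @Override
    public void warning(SAXParseException e) { warnings++; }

    @Override
    public void error(SAXParseException e) { errors++; }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        fatalErrors++;
        System.out.println("Fatal error in line " + e.getLineNumber()
            + " and column " + e.getColumnNumber());
        throw e; // the document is not well-formed, so stop parsing
    }

    static ErrorHandlerDemo parse(String xml) {
        ErrorHandlerDemo handler = new ErrorHandlerDemo();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.setErrorHandler(handler); // without this, our error methods are never called
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // Expected for malformed input: fatalError rethrew the exception
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler;
    }

    public static void main(String[] args) {
        System.out.println("fatal errors: " + parse("<a><b></a>").fatalErrors);
    }
}
```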

45 Lexical Events
Lexical events have to do with the way a document was written, not with its content.
Examples:
– a comment is a lexical event (<!-- comment -->)
– the use of an entity is a lexical event (&gt;)
These can be handled by implementing the LexicalHandler interface and setting it on a parser with:
– reader.setProperty("http://xml.org/sax/properties/lexical-handler", mylexicalhandler);

46 LexicalHandler
// Methods (partial list)
public void startEntity(String name);
public void endEntity(String name);
public void comment(char[] ch, int start, int length);
public void startCDATA();
public void endCDATA();
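A minimal sketch using org.xml.sax.ext.DefaultHandler2, a JDK class that implements LexicalHandler (along with the ordinary handler interfaces) with empty methods. Lexical callbacks are only made once the handler is installed through the lexical-handler property; here the property is set on a JAXP SAXParser. The counters and the input document are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

public class LexicalDemo extends DefaultHandler2 {
    static final String DOC =
        "<!-- prolog comment --><a><![CDATA[x < y]]><!-- inner comment --></a>";

    int comments, cdataSections;

    @Override
    public void comment(char[] ch, int start, int length) { comments++; }

    @Override
    public void startCDATA() { cdataSections++; }

    static LexicalDemo parse(String xml) {
        LexicalDemo handler = new LexicalDemo();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // Lexical events are delivered only through this standard SAX property
            parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        LexicalDemo h = parse(DOC);
        System.out.println("comments=" + h.comments + " cdata=" + h.cdataSections);
    }
}
```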

47 DOM – Document Object Model

48 Creating a DOM Tree
How can we create a DOM tree independently of the implementation chosen?
Creating a DOM tree using the Apache Xerces package:
– import org.apache.xerces.parsers.DOMParser;
– import org.w3c.dom.*;
– use the following lines of code:
    DOMParser dom = new DOMParser();
    dom.parse(fileName);
    Document doc = dom.getDocument();
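One answer to the question is JAXP: DocumentBuilderFactory.newInstance() locates a DOM implementation (configurable through the javax.xml.parsers.DocumentBuilderFactory system property), so the code never names Xerces classes. A minimal sketch; parsing from a string is for illustration, and builder.parse(new File(fileName)) works the same way for a file.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class JaxpDom {
    /** Parses an XML string into a DOM Document using whatever implementation JAXP selects. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP builders are not namespace-aware unless asked
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<transaction><buy shares=\"100\"/></transaction>");
        System.out.println(doc.getDocumentElement().getNodeName()); // prints transaction
    }
}
```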

49 Using a DOM Tree
[Figure: XML file → DOM parser → DOM tree, which the application accesses through the DOM API]

50 Nodes in a DOM Tree
[Figure: the DOM interface hierarchy]
Node
– DocumentFragment
– Document
– CharacterData
  – Text
    – CDATASection
  – Comment
– Attr
– Element
– DocumentType
– Notation
– Entity
– EntityReference
– ProcessingInstruction
NodeList and NamedNodeMap are separate collection interfaces.
Figure as appears in: “The XML Companion” - Neil Bradley

51 DOM Tree
[Figure: an example DOM tree containing Document, DocumentType, Element, Attribute, Text, EntityReference and Comment nodes]

52 Normalizing a Tree
Normalizing a DOM tree has two effects:
– adjacent text nodes are combined
– empty text nodes are eliminated
To normalize, apply the normalize() method to the document element.
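A minimal sketch of both effects. It builds, by hand, an element with three text children ("WE", an empty string, and "BM") and then calls normalize() on the document element. The element name and text values are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class NormalizeDemo {
    /** Returns "childrenBefore -> childrenAfter: text" for a hand-built, un-normalized element. */
    static String demo() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element ticker = doc.createElement("ticker");
            doc.appendChild(ticker);
            // Two adjacent text pieces with an empty text node between them
            ticker.appendChild(doc.createTextNode("WE"));
            ticker.appendChild(doc.createTextNode(""));
            ticker.appendChild(doc.createTextNode("BM"));
            int before = ticker.getChildNodes().getLength();
            doc.getDocumentElement().normalize(); // merge adjacent text, drop empty text
            int after = ticker.getChildNodes().getLength();
            return before + " -> " + after + ": " + ticker.getFirstChild().getNodeValue();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 3 -> 1: WEBM
    }
}
```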

53 Node Methods
Three categories of methods:
– node characteristics: name, type, value
– contextual location and access to relatives: parents, siblings, children, ancestors, descendants
– node modification: edit, delete, rearrange child nodes

54 Node Methods (2)
short getNodeType();
String getNodeName();
String getNodeValue() throws DOMException;
void setNodeValue(String value) throws DOMException;
boolean hasChildNodes();
NamedNodeMap getAttributes();
Document getOwnerDocument();

55 Node Types - getNodeType()
ELEMENT_NODE = 1
ATTRIBUTE_NODE = 2
TEXT_NODE = 3
CDATA_SECTION_NODE = 4
ENTITY_REFERENCE_NODE = 5
ENTITY_NODE = 6
PROCESSING_INSTRUCTION_NODE = 7
COMMENT_NODE = 8
DOCUMENT_NODE = 9
DOCUMENT_TYPE_NODE = 10
DOCUMENT_FRAGMENT_NODE = 11
NOTATION_NODE = 12

if (myNode.getNodeType() == Node.ELEMENT_NODE) {
    // process node …
}

56

57 Node Navigation
Every node has a specific location in the tree.
The Node interface specifies methods to find the surrounding nodes:
– Node getFirstChild();
– Node getLastChild();
– Node getNextSibling();
– Node getPreviousSibling();
– Node getParentNode();
– NodeList getChildNodes();

58 Node Navigation (2)
[Figure: a node and its relatives, reached via getFirstChild(), getLastChild(), getChildNodes(), getPreviousSibling(), getNextSibling() and getParentNode()]
Figure as appears in “The XML Companion” - Neil Bradley
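A minimal sketch of the navigation methods on a small hypothetical document. The input deliberately contains no whitespace between tags: in a pretty-printed file, the line breaks and indentation become Text nodes, and getFirstChild() or getNextSibling() would return those instead of the neighbouring elements.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NavigationDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    /** Visits the root's relatives with the Node navigation methods and lists what it finds. */
    static String walk(String xml) {
        Element root = parse(xml).getDocumentElement();
        Node first = root.getFirstChild();
        Node last = root.getLastChild();
        return first.getNodeName()                          // first child
            + " " + first.getNextSibling().getNodeName()    // its next sibling
            + " " + last.getPreviousSibling().getNodeName() // sibling before the last child
            + " " + last.getParentNode().getNodeName()      // back up to the root
            + " " + root.getChildNodes().getLength();       // number of children
    }

    public static void main(String[] args) {
        System.out.println(walk("<transaction><account/><buy/><sell/></transaction>"));
        // prints account buy buy transaction 3
    }
}
```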

59
import org.apache.xerces.parsers.DOMParser;
import org.w3c.dom.*;

public class InfoWithDom {
    public static void main(String[] args) {
        try {
            DOMParser dom = new DOMParser();
            dom.parse(args[0]);
            Document doc = dom.getDocument();
            new InfoWithDom().echo(doc);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

60
private int depth = 0;
private final String indent = " ";
// Entry i names node type i (Node.ELEMENT_NODE == 1, ..., Node.NOTATION_NODE == 12)
private static final String[] NODE_TYPES = {"", "ELEMENT", "ATTRIBUTE", "TEXT", "CDATA",
    "ENTITY_REF", "ENTITY", "PROCESSING_INST", "COMMENT", "DOCUMENT",
    "DOCUMENT_TYPE", "DOCUMENT_FRAG", "NOTATION"};

private void outputIndentation() {
    for (int i = 0; i < depth; i++)
        System.out.print(indent);
}

61
private void printlnCommon(Node n) {
    System.out.print(NODE_TYPES[n.getNodeType()] + ":");
    System.out.print(" nodeName=" + n.getNodeName());
    String val;
    if ((val = n.getNamespaceURI()) != null) System.out.print(" uri=" + val);
    if ((val = n.getPrefix()) != null) System.out.print(" pre=" + val);
    if ((val = n.getLocalName()) != null) System.out.print(" local=" + val);
    if ((val = n.getNodeValue()) != null && !val.trim().equals(""))
        System.out.print(" nodeValue=" + val);
    System.out.println();
}

62
private void echo(Node n) {
    outputIndentation();
    printlnCommon(n);
    if (n.getNodeType() == Node.ELEMENT_NODE) {
        // Attributes are not children in the DOM, so visit them explicitly
        NamedNodeMap atts = n.getAttributes();
        depth += 2;
        for (int i = 0; i < atts.getLength(); i++)
            echo(atts.item(i));
        depth -= 2;
    }
    depth++;
    for (Node child = n.getFirstChild(); child != null; child = child.getNextSibling())
        echo(child);
    depth--;
}
} // end of class InfoWithDom

63 Node Manipulation
Children of a node in a DOM tree can be manipulated: added, edited, deleted, moved, copied, etc.
Node removeChild(Node oldChild) throws DOMException;
Node insertBefore(Node newChild, Node refChild) throws DOMException;
Node appendChild(Node newChild) throws DOMException;
Node replaceChild(Node newChild, Node oldChild) throws DOMException;
Node cloneNode(boolean deep);

64 Node Manipulation (2)
[Figure: insertBefore places a new node before a reference node; replaceChild swaps an old node for a new one; cloneNode makes a shallow ('false') or deep ('true') copy]
Figure as appears in “The XML Companion” - Neil Bradley
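A minimal sketch exercising each manipulation method on a small hypothetical document. The comments track the root's children after each call; the two clones show the difference between a shallow copy (the element alone) and a deep copy (the element with its whole subtree).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ManipulationDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    static String edit() {
        Document doc = parse("<transaction><buy><ticker>WEBM</ticker></buy><sell/></transaction>");
        Element root = doc.getDocumentElement();
        Node buy = root.getFirstChild();
        Node sell = root.getLastChild();
        Node shallow = buy.cloneNode(false); // <buy> without its <ticker> child
        Node deep = buy.cloneNode(true);     // <buy> with a copy of the whole subtree
        root.insertBefore(doc.createElement("account"), buy); // account, buy, sell
        root.replaceChild(doc.createElement("hold"), sell);   // account, buy, hold
        root.removeChild(buy);                                 // account, hold
        root.appendChild(deep);                                // account, hold, buy
        StringBuilder sb = new StringBuilder();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling())
            sb.append(n.getNodeName()).append(' ');
        sb.append("| shallow children=").append(shallow.getChildNodes().getLength())
          .append(" deep children=").append(deep.getChildNodes().getLength());
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(edit()); // prints account hold buy | shallow children=0 deep children=1
    }
}
```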

65 Other Interfaces
We have discussed methods of the Node interface.
Each specific type of node (Element, Attr, Text, etc.) has additional methods; see the API for details.
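As one example of type-specific methods, the sketch below uses a few methods that Element (and Document) add on top of Node: getElementsByTagName, getTagName, getAttribute, setAttribute and hasAttribute. The document and attribute names are hypothetical. Note that getAttribute returns an empty string, not null, for a missing attribute, which is why hasAttribute exists.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ElementMethods {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    static String summary(String xml) {
        Document doc = parse(xml);
        NodeList buys = doc.getElementsByTagName("buy"); // all <buy> elements, in document order
        Element buy = (Element) buys.item(0);
        buy.setAttribute("shares", "150");               // update an attribute by name
        return buy.getTagName()
            + " " + buy.getAttribute("exch")
            + " " + buy.getAttribute("shares")
            + " " + buy.hasAttribute("price")
            + " " + buys.getLength();
    }

    public static void main(String[] args) {
        String xml = "<transaction><buy shares=\"100\" exch=\"NASDAQ\">"
            + "<ticker>WEBM</ticker></buy></transaction>";
        System.out.println(summary(xml)); // prints buy NASDAQ 150 false 1
    }
}
```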

66 Note about DOM Objects
A DOM object is, in effect, compiled XML.
You can save time and effort by sending and receiving DOM objects instead of XML source:
– this saves having to parse XML files into DOM at both sender and receiver
– but a DOM object may be larger than the XML source