

Java and XML

What is XML?
XML stands for eXtensible Markup Language. A markup language is used to provide information about a document: tags are added to the document to provide the extra information. HTML tags tell a browser how to display the document. XML tags give a reader some idea of what the data means.

Advantages of XML
XML is text (Unicode) based. One XML document can be displayed differently in different media (HTML, video, CD, DVD): you only have to change the XML document in order to change all the rest. XML documents can be modularized, so parts can be reused.

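Example of an HTML Document
[The HTML markup on this slide did not survive in the transcript. Recoverable text: the title "Example", then "This is an example of a page." and "Some information goes here."]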

Example of an XML Document
[The XML markup on this slide did not survive in the transcript. Recoverable text: the name "Alice Lee".]

Difference Between HTML and XML
HTML tags have a fixed meaning, and browsers know what that meaning is. XML tags differ from application to application, and the users of each application know what they mean. HTML tags are used for display. XML tags are used to describe documents and data.

XML Rules
Tags are enclosed in angle brackets. Tags come in pairs, with start-tags and end-tags. Tags must be properly nested.
– Overlapping tags such as <b><i>…</b></i> are not allowed.
– Properly nested tags such as <b><i>…</i></b> are.
Tags that do not have end-tags must be terminated by a '/'.
– <br /> is an HTML example.

More XML Rules
Tags are case sensitive.
– <address> is not the same as <Address>.
Names beginning with 'xml', in any combination of cases, are reserved and may not be used for tags. Tags may not contain '<' or '&'. Tag names follow Java naming conventions, except that a single colon and some other characters (such as '-' and '.') are allowed. They must begin with a letter or underscore and may not contain white space. Documents must have a single root tag that encloses the entire document.

Well-Formed Documents
An XML document is said to be well-formed if it follows all the rules. An XML parser is used to check that all the rules have been obeyed. Recent browsers such as Internet Explorer 5 and Netscape 7 come with XML parsers. Parsers are also available for free download over the Internet. One is Xerces, from the Apache open-source project. Java 1.4 also ships with an open-source parser.
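The well-formedness check described above can be tried from Java with the JAXP parser bundled in the JDK. This is a minimal sketch, not the slides' own code: the class name `WellFormedCheck`, the method `isWellFormed`, and the sample strings are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the original slides.
class WellFormedCheck {
    /** Returns true if the JAXP parser accepts the text as well-formed XML. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors but prints nothing itself.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // one of the XML rules was broken
        } catch (Exception e) {
            throw new IllegalStateException(e); // configuration or I/O problem
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b>text</b></a>")); // properly nested
        System.out.println(isWellFormed("<a><b>text</a></b>")); // overlapping tags
        System.out.println(isWellFormed("<a/><b/>"));           // two root elements
        System.out.println(isWellFormed("<Name>x</name>"));     // case mismatch
    }
}
```

Only the first input follows all the rules; the other three each break one rule from the two preceding slides.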

Expanded Example
[The XML markup on this slide did not survive in the transcript. Recoverable text: the name "Alice Lee".]

XML Files are Trees
address
    name
        first
        last
    phone
    birthday
        year
        month
        day
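The tree view can be made concrete by parsing a document and printing one element name per line, indented by depth. The sketch below assumes a document shaped like the tree on this slide; the phone number and birthday values are made up for illustration (only "Alice" and "Lee" appear in the slides), and `TreeOutline` is a hypothetical class name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original slides.
class TreeOutline {
    // Assumed sample document; values other than the name are invented.
    static final String ADDRESS_XML =
        "<address>"
        + "<name><first>Alice</first><last>Lee</last></name>"
        + "<phone>555-0100</phone>"
        + "<birthday><year>1990</year><month>01</month><day>15</day></birthday>"
        + "</address>";

    /** Returns the element names of the document, indented two spaces per level. */
    static String outline(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
            StringBuilder out = new StringBuilder();
            walk(root, 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Depth-first traversal that visits only element children.
    private static void walk(Element element, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(element.getNodeName()).append('\n');
        for (Node c = element.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) {
                walk((Element) c, depth + 1, out);
            }
        }
    }

    public static void main(String[] args) {
        System.out.print(outline(ADDRESS_XML));
    }
}
```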

Validity
A well-formed document has a tree structure and obeys all the XML rules. A particular application may add more rules in either a DTD (document type definition) or a schema. Many specialized DTDs and schemas have been created to describe particular areas, ranging from disseminating news bulletins (RSS) to chemical formulas. DTDs were developed first, so they are not as comprehensive as schemas.

Document Type Definitions
A DTD describes the tree structure of a document and something about its data. There are two data types, PCDATA and CDATA.
– PCDATA is parsed character data.
– CDATA is character data, not usually parsed.
A DTD determines how many times a node may appear and how child nodes are ordered.

DTD for address Example
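The DTD from this slide is not present in the transcript. As a stand-in, here is a small hypothetical DTD (declared in the document's internal subset) together with a sketch of how a validating JAXP parser reports documents that break it. The element declarations, the class name `DtdValidation`, and the sample data are all assumptions; they show child order (`name` before `phone`) and repetition (`phone+`, one or more).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the original slides.
class DtdValidation {
    // Assumed DTD, written only for this illustration.
    static final String DTD =
        "<!DOCTYPE address [\n"
        + "  <!ELEMENT address (name, phone+)>\n"
        + "  <!ELEMENT name (first, last)>\n"
        + "  <!ELEMENT first (#PCDATA)>\n"
        + "  <!ELEMENT last (#PCDATA)>\n"
        + "  <!ELEMENT phone (#PCDATA)>\n"
        + "]>\n";

    /** Parses DTD + body with validation on and counts reported validity errors. */
    static int countValidityErrors(String body) {
        final int[] errors = {0};
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // validation is off by default
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors[0]++; // validity errors are recoverable, so just count them
                }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return errors[0];
    }

    public static void main(String[] args) {
        String name = "<name><first>Alice</first><last>Lee</last></name>";
        String phone = "<phone>555-0100</phone>";
        System.out.println(countValidityErrors("<address>" + name + phone + phone + "</address>"));
        System.out.println(countValidityErrors("<address>" + phone + name + "</address>"));
    }
}
```

The first document matches the content model; the second puts `phone` before `name`, so the parser reports at least one validity error even though the document is still well-formed.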

Schemas
Schemas are themselves XML documents. They were standardized after DTDs and provide more information about the document. They have a number of data types, including string, decimal, integer, boolean, date, and time. They divide elements into simple and complex types. They also determine the tree structure and how many children a node may have.

Schema for First address Example
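The schema from this slide is not present in the transcript. The sketch below uses a small hypothetical schema for a `birthday` element to show typed simple elements inside a complex type, and validates documents against it with the `javax.xml.validation` API. That API arrived in Java 5, later than the JAXP 1.2 era of these slides, so treat it as one possible approach; the schema, the class name `SchemaValidation`, and the sample values are assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the original slides.
class SchemaValidation {
    // Assumed schema: a complex type holding three simple-typed children in order.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='birthday'>"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name='year' type='xs:integer'/>"
        + "<xs:element name='month' type='xs:string'/>"
        + "<xs:element name='day' type='xs:integer'/>"
        + "</xs:sequence></xs:complexType>"
        + "</xs:element>"
        + "</xs:schema>";

    /** Returns true if the document is valid against XSD. */
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            // With no error handler set, the validator throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(
            "<birthday><year>2001</year><month>November</month><day>21</day></birthday>"));
        System.out.println(isValid(
            "<birthday><year>soon</year><month>November</month><day>21</day></birthday>"));
    }
}
```

The second document is well-formed but fails the `xs:integer` type on `year`, which is a kind of check a DTD cannot express.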

Parsers
There are two principal models for parsers.
SAX (Simple API for XML)
– Uses call-back methods
– Similar to Java event listeners
DOM (Document Object Model)
– Creates a parse tree
– Requires a tree traversal

DOM Parser

About DOM
DOM stands for Document Object Model. It is a World Wide Web Consortium (W3C) standard. The standard is constantly adding new features.
– Level 3 Core was just released this month.
We'll cover most of the basics. There's always more, and it's always changing.

DOM abstraction layer in Java: architecture
[Diagram: the factory returns a specific parser implementation, which produces an org.w3c.dom.Document.]
The emphasis is on allowing vendors to supply their own DOM implementation without requiring changes to source code.

Sample Code
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    /* set some factory options here */
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document doc = builder.parse(xmlFile);
The factory instance selects the parser implementation. It can be changed with a runtime system property. The JDK has a default; Xerces is much better. From the factory one obtains an instance of the parser. xmlFile can be a java.io.File, an InputStream, etc.
For reference, the classes involved are:
    javax.xml.parsers.DocumentBuilderFactory
    javax.xml.parsers.DocumentBuilder
    org.w3c.dom.Document
Notice that the Document class comes from the W3C-specified bindings.

Validation
Note that by default the parser will not validate against a schema or DTD. As of JAXP 1.2, Java provides a default parser that can handle most schema features. See the next slide for details on how to set this up.

Document object
Once a Document object is obtained, there is a rich API to manipulate it. The first call is usually
    Element root = doc.getDocumentElement();
This gets the root element of the Document as an instance of the Element interface. Note that Element extends Node and so has the methods getNodeType(), getNodeName(), getNodeValue(), and getChildNodes().
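Putting the factory, builder, and root-element steps together, a minimal sketch might look like the following. It parses from a string rather than a file so that it is self-contained; the class name `RootElementDemo` and the sample document are assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original slides.
class RootElementDemo {
    /** Parses the text and returns the root element of the resulting Document. */
    static Element root(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample document.
        Element root = root(
            "<address><name>Alice Lee</name><phone>555-0100</phone></address>");
        System.out.println(root.getNodeName());                       // tag name
        System.out.println(root.getNodeType() == Node.ELEMENT_NODE);  // type constant
        System.out.println(root.getNodeValue());                      // null for elements
        System.out.println(root.getChildNodes().getLength());         // direct children
    }
}
```

Note that `getNodeValue()` on an element is `null`; the text lives in child Text nodes.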

Types of Nodes
Note that there are many types of nodes, i.e. subinterfaces of Node: Attr, CDATASection, Comment, Document, DocumentFragment, DocumentType, Element, Entity, EntityReference, Notation, ProcessingInstruction, Text. Each of these has a special and non-obvious associated type, value, and name. The standards are language-neutral, and the values are specified in the chart on the following slide. Important: keep this chart nearby when using DOM.

Node | nodeName | nodeValue | attributes | nodeType
Attr | name of attribute | value of attribute | null | 2
CDATASection | #cdata-section | content of the CDATA section | null | 4
Comment | #comment | content of the comment | null | 8
Document | #document | null | null | 9
DocumentFragment | #document-fragment | null | null | 11
DocumentType | document type name | null | null | 10
Element | tag name | null | NamedNodeMap | 1
Entity | entity name | null | null | 6
EntityReference | name of entity referenced | null | null | 5
Notation | notation name | null | null | 12
ProcessingInstruction | target | entire content excluding the target | null | 7
Text | #text | content of the text node | null | 3
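A quick way to check rows of this chart is to print the type, name, and value of a few nodes from a small document. The sketch below does that for an element, its attribute, and its children; the class name `NodeChart` and the sample document are assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original slides.
class NodeChart {
    /** Formats one node as "nodeType nodeName nodeValue". */
    static String row(Node n) {
        return n.getNodeType() + " " + n.getNodeName() + " " + n.getNodeValue();
    }

    /** Describes the root element, then its attributes, then its direct children. */
    static List<String> describe(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
            List<String> rows = new ArrayList<>();
            rows.add(row(root));
            NamedNodeMap attrs = root.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                rows.add(row(attrs.item(i)));
            }
            for (Node c = root.getFirstChild(); c != null; c = c.getNextSibling()) {
                rows.add(row(c));
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample: one attribute, text, a comment, a CDATA section, a PI.
        String xml = "<note id=\"7\">hi<!--c--><![CDATA[x<y]]><?pi data?></note>";
        for (String r : describe(xml)) {
            System.out.println(r);
        }
    }
}
```

With the default (non-coalescing) factory settings, the CDATA section stays a separate node of type 4 instead of being merged into the neighboring text.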

Transformer Architecture

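Writing DOM to XML
    import java.io.File;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.w3c.dom.Document;

    public class WriteDOM {
        public static void main(String[] argv) throws Exception {
            // Parse the file named on the command line into a DOM tree.
            File f = new File(argv[0]);
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.parse(f);

            // A Transformer created without a stylesheet copies source to result.
            TransformerFactory tFactory = TransformerFactory.newInstance();
            Transformer transformer = tFactory.newTransformer();
            DOMSource source = new DOMSource(document);
            StreamResult result = new StreamResult(System.out);
            transformer.transform(source, result);
        }
    }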

Creating a DOM from scratch
Sometimes you may want to create a DOM tree directly in memory. This is done with:
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document document = builder.newDocument();
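An empty Document is only the starting point; elements are then created from it and attached. The sketch below builds a two-element tree in memory and serializes it to a string with the identity Transformer. The class name `BuildDom`, the element names, and the text are assumptions chosen to match the address example.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, not part of the original slides.
class BuildDom {
    /** Builds <address><name>...</name></address> in memory and returns it as text. */
    static String build(String personName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();

            // Nodes are created by the Document, then attached to the tree.
            Element address = doc.createElement("address");
            doc.appendChild(address);
            Element name = doc.createElement("name");
            name.setTextContent(personName);
            address.appendChild(name);

            // Identity transform: DOM in, characters out.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("Alice Lee"));
    }
}
```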

Manipulating Nodes
Once the root node is obtained, typical tree methods exist to manipulate other elements:
    boolean node.hasChildNodes()
    NodeList node.getChildNodes()
    Node node.getNextSibling()
    Node node.getParentNode()
    String node.getNodeValue()
    String node.getNodeName()
    String node.getTextContent()
    void node.setNodeValue(String nodeValue)
    Node node.insertBefore(Node newChild, Node refChild)
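As an illustration of these tree methods, the sketch below inserts a new element in front of the existing children and then walks the sibling chain to read the result. The class name `NodeEdit`, the method `prepend`, and the `list`/`item` element names are assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original slides.
class NodeEdit {
    /**
     * Adds <item>text</item> as the first child of the root element and
     * returns the text of all children, comma-separated, in document order.
     */
    static String prepend(String xml, String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();

            Element added = doc.createElement("item");
            added.setTextContent(text);
            // A null reference child makes insertBefore append at the end.
            root.insertBefore(added, root.getFirstChild());

            StringBuilder out = new StringBuilder();
            for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (out.length() > 0) {
                    out.append(',');
                }
                out.append(n.getTextContent());
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(prepend("<list><item>b</item><item>c</item></list>", "a"));
    }
}
```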

SAX: Simple API for XML

About SAX
SAX in Java is hosted on SourceForge. SAX is not a W3C standard. It originated purely in Java. Other languages have chosen to implement it in their own ways, based on this prototype.

SAX vs DOM
Please don't compare unrelated things:
– SAX is an alternative to DOM, but realize that DOM is often built on top of SAX.
– SAX and DOM do not compete with JAXP.
– They do both compete with JAXB implementations.

How a SAX parser works
A SAX parser scans an XML stream on the fly and responds to certain parsing events as it encounters them. This is very different from digesting an entire XML document into memory. It is much faster and requires less memory. However, you need to reparse if you need to revisit data.

Obtaining a SAX parser
Important classes:
    javax.xml.parsers.SAXParserFactory
    javax.xml.parsers.SAXParser
    javax.xml.parsers.ParserConfigurationException
Getting the parser and parsing a document:
    // get the parser
    SAXParserFactory factory = SAXParserFactory.newInstance();
    SAXParser saxParser = factory.newSAXParser();
    // parse the document
    saxParser.parse(new File(argv[0]), handler);

DefaultHandler
Note that an event handler has to be passed to the SAX parser. This must implement the interface org.xml.sax.ContentHandler. It is easier to extend the adapter org.xml.sax.helpers.DefaultHandler.

Overriding Handler methods
The most important methods to override:
– void startDocument(): called once when document parsing begins
– void endDocument(): called once when parsing ends
– void startElement(...): called each time an element begin tag is encountered
– void endElement(...): called each time an element end tag is encountered
– void characters(...): called zero or more times between startElement and endElement calls to deliver accumulated character data
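To see the order in which these callbacks fire, one can extend DefaultHandler and record each event. The sketch below logs document and element events only (character events are left out because their chunking is parser-dependent); the class name `SaxEvents` and the sample document are assumptions. The factory is used with its default settings, so element names arrive in `qName`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the original slides.
class SaxEvents {
    /** Parses the text and returns the document/element events in callback order. */
    static List<String> events(String xml) {
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() {
                log.add("startDocument");
            }

            @Override
            public void endDocument() {
                log.add("endDocument");
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                log.add("start " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end " + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        for (String event : events("<a><b>hi</b><c/></a>")) {
            System.out.println(event);
        }
    }
}
```

An empty element such as `<c/>` still produces both a start and an end event.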

startElement
    public void startElement(
        String namespaceURI,  // namespace URI, if a namespace is associated
        String sName,         // simple (nonqualified) name
        String qName,         // qualified name
        Attributes attrs)     // list of attributes
Attribute info is obtained by querying the Attributes object.
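Querying the Attributes object usually means looping over it by index. The sketch below collects the attributes of one named element into a sorted map, so the result does not depend on the order in which the parser reports them. The class name `SaxAttributes`, the method `attributesOf`, and the sample data are assumptions.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the original slides.
class SaxAttributes {
    /** Returns the attributes of every element with the given qualified name. */
    static Map<String, String> attributesOf(String xml, final String element) {
        final Map<String, String> found = new TreeMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if (!qName.equals(element)) {
                    return;
                }
                // Attributes are read by index; name and value come separately.
                for (int i = 0; i < attrs.getLength(); i++) {
                    found.put(attrs.getQName(i), attrs.getValue(i));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        String xml = "<people><person id='42' role='admin'/></people>";
        System.out.println(attributesOf(xml, "person"));
    }
}
```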

Characters
    public void characters(
        char buf[],   // buffer of chars accumulated
        int offset,   // index of the first char to read
        int len)      // number of chars
Note that characters may be called more than once between a begin tag and an end tag. Also, mixed-content elements require careful handling.
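Because characters() may deliver one run of text in several pieces, a common pattern is to append every piece to a buffer and read the buffer only in endElement(). The sketch below applies that pattern to leaf elements; it deliberately ignores mixed content by clearing the buffer at every start tag. The class name `SaxText` and the sample document are assumptions.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the original slides.
class SaxText {
    /** Maps each text-bearing leaf element name to its trimmed text. */
    static Map<String, String> leafText(String xml) {
        final Map<String, String> text = new LinkedHashMap<>();
        final StringBuilder buf = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                buf.setLength(0); // discard text that preceded this start tag
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                buf.append(ch, start, length); // may be called several times
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String s = buf.toString().trim();
                if (!s.isEmpty()) {
                    text.put(qName, s);
                }
                buf.setLength(0);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return text;
    }

    public static void main(String[] args) {
        System.out.println(leafText(
            "<name>\n  <first>Alice</first>\n  <last>Lee</last>\n</name>"));
    }
}
```

The whitespace between the child elements also arrives through characters(), which is why the sketch trims the buffer and skips empty results.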

Entity references
Recall that entity references are special character sequences for referring to characters that have special meaning in XML syntax:
– '<' is &lt;
– '>' is &gt;
In SAX these are automatically converted and passed to the characters stream unless they are part of a CDATA section.
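The conversion can be observed by collecting everything passed to characters(). In this sketch (class name `SaxEntities` and the sample text are assumptions), the reference outside the CDATA section arrives as a plain '<', while the same sequence inside the CDATA section arrives untouched.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the original slides.
class SaxEntities {
    /** Returns all character data of the document as delivered by the parser. */
    static String text(String xml) {
        final StringBuilder buf = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                buf.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        // Outside CDATA, &lt; becomes '<'; inside CDATA it is literal text.
        System.out.println(text("<m>1 &lt; 2 <![CDATA[&lt;raw&gt;]]></m>"));
    }
}
```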

Choosing a Parser
Choosing your parser implementation: if no other factory class is specified, the default SAXParserFactory class is used. To use a different manufacturer's parser, you can change the value of the system property that points to it. You can do that from the command line, like this:
    java -Djavax.xml.parsers.SAXParserFactory=yourFactoryHere ...
The factory name you specify must be a fully qualified class name (all package prefixes included). For more information, see the documentation of the newInstance() method of the SAXParserFactory class.
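To find out which factory class is actually in effect on a given JVM, it is enough to ask the object returned by newInstance() for its class name. This is a small sketch with a hypothetical class name `WhichParser`; the name it prints depends on the JDK and on whether the system property above has been set, so no particular output is assumed.

```java
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper class, not part of the original slides.
class WhichParser {
    /** Returns the fully qualified class name of the SAXParserFactory in use. */
    static String factoryClassName() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // null here means the -D option was not given and the default lookup applies.
        System.out.println(System.getProperty("javax.xml.parsers.SAXParserFactory"));
        System.out.println(factoryClassName());
    }
}
```

Running it once without and once with the -D option is a simple way to confirm that the override took effect.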