Chapter 24 XML.

CHAPTER GOALS
- Understanding XML elements and attributes
- Understanding the concept of an XML parser
- Being able to read and write XML documents
- Being able to design Document Type Definitions for XML documents

XML
- Stands for Extensible Markup Language
- Lets you encode complex data in a form that the recipient can parse easily
- Is independent of any programming language

XML Encoding of Coin Data
    <coin>
       <value>0.5</value>
       <name>half dollar</name>
    </coin>

Advantages of XML
- XML files are readable by both computers and humans
- XML-formatted data is resilient to change
  - It is easy to add new data elements
  - Old programs can process the old information in the new data format
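The resilience claim can be illustrated with a small sketch. The class name CoinReader and the extra <year> element are hypothetical, not from the slides. The reader looks up only the elements it knows by name, so a newer document with an added element is still processed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class CoinReader
{
   /** Reads only the <value> and <name> children of a <coin> document. */
   static String describe(String xml) throws Exception
   {
      DocumentBuilder builder
         = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      Document doc = builder.parse(
         new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      Element coin = doc.getDocumentElement();
      // Look up the known elements by name; unknown siblings are ignored
      String value = coin.getElementsByTagName("value").item(0).getTextContent();
      String name = coin.getElementsByTagName("name").item(0).getTextContent();
      return name + ": " + value;
   }

   public static void main(String[] args) throws Exception
   {
      String oldFormat
         = "<coin><value>0.5</value><name>half dollar</name></coin>";
      // Hypothetical newer format with an added <year> element
      String newFormat
         = "<coin><value>0.5</value><name>half dollar</name>"
         + "<year>1964</year></coin>";
      System.out.println(describe(oldFormat));
      System.out.println(describe(newFormat));
   }
}
```

Both calls print the same description, because the added element is simply never asked for.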

Differences Between XML and HTML
- Both are descendants of SGML (Standard Generalized Markup Language)
- XML is a simplified version of SGML
- XML is very strict, but HTML (as used today) is not
- XML tells what the data means; HTML tells how to display data

Differences Between XML and HTML
- XML tags are case-sensitive: <LI> is different from <li>
- Every XML start tag must have a matching end tag
- If a tag has no end tag, it must end in />
    <img src="hamster.jpeg"/>
- XML attribute values must be enclosed in quotes
    <img src="hamster.jpeg" width="400" height="300"/>
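One way to see how strict XML is: hand a few fragments to a parser and check whether it accepts them. This is a minimal sketch using the standard JAXP DocumentBuilder; the class name WellFormedCheck and the sample strings are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class WellFormedCheck
{
   /** Returns true if the parser accepts the text as well-formed XML. */
   static boolean isWellFormed(String xml)
      throws ParserConfigurationException, IOException
   {
      DocumentBuilder builder
         = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      try
      {
         builder.parse(new InputSource(new StringReader(xml)));
         return true;
      }
      catch (SAXException e)
      {
         // The parser rejects documents that break the syntax rules
         return false;
      }
   }

   public static void main(String[] args) throws Exception
   {
      // Quoted attribute value, tag closed with />
      System.out.println(isWellFormed("<img src=\"hamster.jpeg\"/>"));
      // Attribute value without quotes
      System.out.println(isWellFormed("<img src=hamster.jpeg/>"));
      // Start and end tag differ in case
      System.out.println(isWellFormed("<LI>item</li>"));
      // Missing end tag
      System.out.println(isWellFormed("<li>item"));
   }
}
```

Only the first fragment is accepted. Depending on the parser implementation, a diagnostic for each rejected fragment may also appear on standard error.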

Structure of an XML Document
- An XML data set is called a document
- The document starts with a header
    <?xml version="1.0"?>
- The data are contained in a root element
    <purse>
       more data
    </purse>
- The document contains elements and text

Structure of an XML Document
- An XML element has one of two forms
    <elementTag optional attributes> contents </elementTag>
  or
    <elementTag optional attributes/>
- The contents can be elements or text or both
- An example of an element with both elements and text (mixed content):
    <p>Use XML for <strong>robust</strong> data formats.</p>
- Avoid mixed content for data descriptions

Structure of an XML Document
- An element can have attributes
- The a element in HTML has an href attribute
    <a href="http://java.sun.com"> ... </a>
- An attribute has a name (such as href) and a value
- The attribute value is enclosed in either single or double quotes
- An attribute is intended to provide information about the content
    <value currency="USD">0.5</value>
  or
    <value currency="EUR">0.5</value>
- An element can have multiple attributes

Parsing XML Documents
- A parser is a program that
  - Reads a document
  - Checks whether it is syntactically correct
  - Takes some action as it processes the document
- There are two kinds of XML parsers
  - SAX (Simple API for XML)
  - DOM (Document Object Model)

Parsing XML Documents
- SAX parser
  - Event-driven
  - It calls a method you provide to process each construct it encounters
  - More efficient for handling large XML documents
- DOM parser
  - Builds a tree that represents the document
  - When the parser is done, you can analyze the tree
  - Easier to use for most applications
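The slides go on to use DOM only, so here is a rough sketch of what the event-driven SAX style looks like. The class name SaxItemCounter and the sample document are assumptions for illustration. The parser calls startElement for every start tag, and no tree is kept in memory.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxItemCounter
{
   /** Counts the elements with the given tag name in the XML text. */
   static int countElements(String xml, String tagName) throws Exception
   {
      final int[] count = { 0 };
      DefaultHandler handler = new DefaultHandler()
      {
         // The parser calls this method for every start tag it encounters
         @Override
         public void startElement(String uri, String localName,
            String qName, Attributes attributes)
         {
            if (qName.equals(tagName)) count[0]++;
         }
      };
      SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
      parser.parse(new InputSource(new StringReader(xml)), handler);
      return count[0];
   }

   public static void main(String[] args) throws Exception
   {
      String xml = "<items><item/><item/><item/></items>";
      System.out.println(countElements(xml, "item"));
   }
}
```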

JAXP
- Stands for Java API for XML Processing
- Provides a standard mechanism for DOM parsers to read and create documents
- Part of Java 1.4 and above
- Users of earlier versions need to download additional libraries

Parsing XML Documents
- The Document interface describes the tree structure of an XML document
- A DocumentBuilder can generate an object of a class that implements the Document interface
- Get a factory by calling the static newInstance method of the DocumentBuilderFactory class
- Call the newDocumentBuilder method of the factory to get a DocumentBuilder
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();

Parsing XML Documents
- To read a document from a file
    String fileName = . . . ;
    File f = new File(fileName);
    Document doc = builder.parse(f);
- To read a document from a URL on the Internet
    String urlName = . . . ;
    URL u = new URL(urlName);
    Document doc = builder.parse(u.openStream());
- To read from an input stream
    InputStream in = . . . ;
    Document doc = builder.parse(in);
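The steps above can be put together in a short sketch. To stay self-contained it reads from an in-memory stream rather than a file or URL; the class name ParseFromStream and the sample document are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class ParseFromStream
{
   /** Parses the stream and returns the tag name of the root element. */
   static String rootTagName(InputStream in) throws Exception
   {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      DocumentBuilder builder = factory.newDocumentBuilder();
      Document doc = builder.parse(in);
      return doc.getDocumentElement().getTagName();
   }

   public static void main(String[] args) throws Exception
   {
      String xml = "<?xml version=\"1.0\"?><purse><coin/></purse>";
      InputStream in
         = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
      System.out.println(rootTagName(in));
   }
}
```

A FileInputStream, or the stream returned by the openStream method of a URL, could be passed to rootTagName in the same way.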

Parsing XML Documents
- You can inspect or modify the document
- The document tree consists of nodes
- Two node types are Element and Text
- Element and Text are subinterfaces of the Node interface

An XML Document
    <?xml version="1.0"?>
    <items>
       <item>
          <product>
             <description>Ink Jet Refill Kit</description>
             <price>29.95</price>
          </product>
          <quantity>8</quantity>
       </item>
       <item>
          <product>
             <description>4-port Mini Hub</description>
             <price>19.95</price>
          </product>
          <quantity>4</quantity>
       </item>
    </items>

Tree View of XML Document [figure not included in transcript]

Parsing XML Documents
- Start inspection of the tree by getting the root element
    Element root = doc.getDocumentElement();
- To get the child elements of an element
  - Use the getChildNodes method of the Element interface
  - The nodes are stored in an object of a class that implements the NodeList interface
- Use a NodeList to visit the child nodes of an element
  - The getLength method gives the number of nodes
  - The item method gets an item in the node list
- Code to get a child node
    NodeList nodes = root.getChildNodes();
    int i = . . . ; // a value between 0 and getLength() - 1
    Node child = nodes.item(i);
- The XML parser keeps all white space if you don't use a DTD
- You can include a test to ignore the white space
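The white space test mentioned above might look like the following sketch. The class name ChildElementLister and the sample document are assumptions. The instanceof check skips the Text nodes that hold the white space between tags.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildElementLister
{
   /** Parses the XML text (no DTD) and returns its root element. */
   static Element parseRoot(String xml) throws Exception
   {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
         .parse(new InputSource(new StringReader(xml)))
         .getDocumentElement();
   }

   /** Returns the tag names of the child elements, skipping other nodes. */
   static List<String> childTagNames(Element parent)
   {
      List<String> names = new ArrayList<>();
      NodeList nodes = parent.getChildNodes();
      for (int i = 0; i < nodes.getLength(); i++)
      {
         Node child = nodes.item(i);
         // White space between tags shows up as Text nodes; ignore them
         if (child instanceof Element)
            names.add(((Element) child).getTagName());
      }
      return names;
   }

   public static void main(String[] args) throws Exception
   {
      Element root = parseRoot("<items>\n  <item/>\n  <item/>\n</items>");
      System.out.println("All child nodes: " + root.getChildNodes().getLength());
      System.out.println("Child elements: " + childTagNames(root));
   }
}
```

For this document the node list has five entries (three white space Text nodes and two elements), while the filtered list holds only the two item elements.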

Parsing XML Documents
- Get an element name with the getTagName method
    Element priceElement = . . . ;
    String name = priceElement.getTagName();
- To find the value of the currency attribute
    String attributeValue = priceElement.getAttribute("currency");
- You can also iterate through all attributes
  - Use a NamedNodeMap
  - Each attribute is stored in a Node
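Iterating through all attributes could be sketched as follows. The class name AttributeLister and the extra unit attribute are hypothetical, added only so that there is more than one attribute to visit.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AttributeLister
{
   /** Parses the XML text and returns its root element. */
   static Element parseRoot(String xml) throws Exception
   {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
         .parse(new InputSource(new StringReader(xml)))
         .getDocumentElement();
   }

   /** Collects all attributes of an element, sorted by name. */
   static Map<String, String> attributesOf(Element e)
   {
      Map<String, String> result = new TreeMap<>();
      NamedNodeMap attributes = e.getAttributes();
      for (int i = 0; i < attributes.getLength(); i++)
      {
         // Each attribute is a Node with a name and a value
         Node attribute = attributes.item(i);
         result.put(attribute.getNodeName(), attribute.getNodeValue());
      }
      return result;
   }

   public static void main(String[] args) throws Exception
   {
      Element priceElement = parseRoot(
         "<price currency=\"USD\" unit=\"each\">29.95</price>");
      System.out.println(priceElement.getAttribute("currency"));
      System.out.println(attributesOf(priceElement));
   }
}
```

Note that getAttribute returns an empty string, not null, when the element has no attribute of that name.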

Parsing XML Documents
- Some elements have children that contain text
- The document builder creates nodes of type Text
- If you don't use mixed content elements
  - Any element containing text has a single Text child node
  - Use the getFirstChild method to get it
  - Use the getData method to read the text
- To determine the price stored in the price element
    Element priceNode = . . . ;
    Text priceData = (Text) priceNode.getFirstChild();
    String priceString = priceData.getData();
    double price = Double.parseDouble(priceString);
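Here is a small sketch of reading the text of an element, with one extra guard that the slides do not show: an empty element such as <price/> has no Text child, so getFirstChild returns null. The class name PriceReader is an assumption.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

class PriceReader
{
   /** Reads the number stored as text in the given element. */
   static double priceOf(Element priceElement)
   {
      Node first = priceElement.getFirstChild();
      // An empty element has no Text child at all
      if (!(first instanceof Text))
         throw new IllegalArgumentException(
            "No text in <" + priceElement.getTagName() + ">");
      Text priceData = (Text) first;
      return Double.parseDouble(priceData.getData().trim());
   }

   /** Parses the XML text and reads the price in its root element. */
   static double priceIn(String xml) throws Exception
   {
      Element root = DocumentBuilderFactory.newInstance()
         .newDocumentBuilder()
         .parse(new InputSource(new StringReader(xml)))
         .getDocumentElement();
      return priceOf(root);
   }

   public static void main(String[] args) throws Exception
   {
      System.out.println(priceIn("<price currency=\"USD\">29.95</price>"));
   }
}
```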

File ItemListParser.java

    import java.io.File;
    import java.io.IOException;
    import java.util.ArrayList;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;
    import org.w3c.dom.Text;
    import org.xml.sax.SAXException;

    /**
       An XML parser for item lists
    */
    public class ItemListParser
    {
       /**
          Constructs a parser that can parse item lists
       */
       public ItemListParser()
          throws ParserConfigurationException
       {
          DocumentBuilderFactory factory
             = DocumentBuilderFactory.newInstance();
          builder = factory.newDocumentBuilder();
       }

       /**
          Parses an XML file containing an item list
          @param fileName the name of the file
          @return an array list containing all items in the XML file
       */
       public ArrayList parse(String fileName)
          throws SAXException, IOException
       {
          File f = new File(fileName);
          Document doc = builder.parse(f);

          // get the <items> root element

          Element root = doc.getDocumentElement();
          return getItems(root);
       }

       /**
          Obtains an array list of items from a DOM element
          @param e an <items> element
          @return an array list of all <item> children of e
       */
       private static ArrayList getItems(Element e)
       {
          ArrayList items = new ArrayList();

          // get the <item> children

          NodeList children = e.getChildNodes();
          for (int i = 0; i < children.getLength(); i++)
          {
             Node childNode = children.item(i);
             if (childNode instanceof Element)
             {
                Element childElement = (Element)childNode;
                if (childElement.getTagName().equals("item"))
                {
                   Item c = getItem(childElement);
                   items.add(c);
                }
             }
          }
          return items;
       }

       /**
          Obtains an item from a DOM element
          @param e an <item> element
          @return the item described by the given element
       */
       private static Item getItem(Element e)
       {
          NodeList children = e.getChildNodes();
          Product p = null;
          int quantity = 0;
          for (int j = 0; j < children.getLength(); j++)
          {
             Node childNode = children.item(j);
             if (childNode instanceof Element)
             {
                Element childElement = (Element)childNode;
                String tagName = childElement.getTagName();
                if (tagName.equals("product"))
                   p = getProduct(childElement);
                else if (tagName.equals("quantity"))
                {
                   Text textNode = (Text)childElement.getFirstChild();
                   String data = textNode.getData();
                   quantity = Integer.parseInt(data);
                }
             }
          }
          return new Item(p, quantity);
       }

       /**
          Obtains a product from a DOM element
          @param e a <product> element
          @return the product described by the given element
       */
       private static Product getProduct(Element e)
       {
          NodeList children = e.getChildNodes();
          String name = "";
          double price = 0;
          for (int j = 0; j < children.getLength(); j++)
          {
             Node childNode = children.item(j);
             if (childNode instanceof Element)
             {
                Element childElement = (Element)childNode;
                String tagName = childElement.getTagName();
                Text textNode = (Text)childElement.getFirstChild();

                String data = textNode.getData();
                if (tagName.equals("description"))
                   name = data;
                else if (tagName.equals("price"))
                   price = Double.parseDouble(data);
             }
          }
          return new Product(name, price);
       }

       private DocumentBuilder builder;
    }

File ItemListParserTest.java

    import java.util.ArrayList;

    /**
       This program parses an XML file containing an item list.
       It prints out the items that are described in the XML file.
    */
    public class ItemListParserTest
    {
       public static void main(String[] args) throws Exception
       {
          ItemListParser parser = new ItemListParser();
          ArrayList items = parser.parse("items.xml");
          for (int i = 0; i < items.size(); i++)
          {
             Item anItem = (Item)items.get(i);
             System.out.println(anItem.format());
          }
       }
    }
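The listings rely on Item and Product classes that do not appear in this transcript. The following is a guess at a minimal version, based only on the methods the listings call (the constructors, getProduct, getQuantity, getDescription, getPrice and format). The fields and the output of format are assumptions, not the original classes.

```java
/** Hypothetical stand-in for the Product class used by the listings. */
class Product
{
   private final String description;
   private final double price;

   Product(String description, double price)
   {
      this.description = description;
      this.price = price;
   }

   String getDescription() { return description; }

   double getPrice() { return price; }
}

/** Hypothetical stand-in for the Item class used by the listings. */
class Item
{
   private final Product product;
   private final int quantity;

   Item(Product product, int quantity)
   {
      this.product = product;
      this.quantity = quantity;
   }

   Product getProduct() { return product; }

   int getQuantity() { return quantity; }

   /** Formats the item as text; the layout is an assumption. */
   String format()
   {
      return quantity + " x " + product.getDescription()
         + " @ " + product.getPrice();
   }
}

class ItemDemo
{
   public static void main(String[] args)
   {
      Item anItem = new Item(new Product("Toaster", 29.95), 3);
      System.out.println(anItem.format());
   }
}
```

The classes are package-private here so that all three fit in one source file; in a real project each would normally be public and live in its own file.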

Creating XML Documents
- We can build a Document object in a Java program and then save it as an XML document
- We need a DocumentBuilder object to create a new, empty document
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document doc = builder.newDocument(); // empty document
- The Document interface has methods to create elements and text nodes

Creating XML Documents
- To create an element, use the createElement method and pass it a tag name
    Element itemElement = doc.createElement("item");
- To create a text node, use createTextNode and pass it a string
    Text quantityText = doc.createTextNode("8");
- Use the setAttribute method to add an attribute to an element
    priceElement.setAttribute("currency", "USD");

Creating XML Documents
- To construct the tree structure of a document
  - Start with the root
  - Add children with appendChild
- To build an XML tree that describes an item
    // create elements
    Element itemElement = doc.createElement("item");
    Element productElement = doc.createElement("product");
    Element descriptionElement = doc.createElement("description");
    Element priceElement = doc.createElement("price");
    Element quantityElement = doc.createElement("quantity");
    Text descriptionText = doc.createTextNode("Ink Jet Refill Kit");
    Text priceText = doc.createTextNode("29.95");
    Text quantityText = doc.createTextNode("8");

    // add elements to the document
    doc.appendChild(itemElement);
    itemElement.appendChild(productElement);
    itemElement.appendChild(quantityElement);
    productElement.appendChild(descriptionElement);
    productElement.appendChild(priceElement);
    descriptionElement.appendChild(descriptionText);
    priceElement.appendChild(priceText);
    quantityElement.appendChild(quantityText);

Creating XML Documents
- Use a Transformer to write an XML document to a stream
- Create a transformer
    Transformer t = TransformerFactory.newInstance().newTransformer();
- Create a DOMSource from your document
- Create a StreamResult from your output stream
- Call the transform method of your transformer
    t.transform(new DOMSource(doc), new StreamResult(System.out));
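Creating and writing a document can be combined in one short sketch. It writes to a StringWriter instead of System.out so that the result can be inspected, and it sets the standard OMIT_XML_DECLARATION output property to leave out the header. The class name PriceDocument is an assumption.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class PriceDocument
{
   /** Builds a one-element document and returns it as XML text. */
   static String build(String amount, String currency) throws Exception
   {
      Document doc = DocumentBuilderFactory.newInstance()
         .newDocumentBuilder().newDocument();

      // Create the element, give it an attribute and a text child
      Element priceElement = doc.createElement("price");
      priceElement.setAttribute("currency", currency);
      priceElement.appendChild(doc.createTextNode(amount));
      doc.appendChild(priceElement);

      Transformer t = TransformerFactory.newInstance().newTransformer();
      // Leave out the <?xml ...?> header so only the element is written
      t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
      StringWriter out = new StringWriter();
      t.transform(new DOMSource(doc), new StreamResult(out));
      return out.toString();
   }

   public static void main(String[] args) throws Exception
   {
      System.out.println(build("29.95", "USD"));
   }
}
```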

File ItemListBuilder.java

    import java.util.ArrayList;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.Text;

    /**
       Builds a DOM document for an array list of items.
    */
    public class ItemListBuilder
    {
       /**
          Constructs an item list builder.
       */
       public ItemListBuilder()
          throws ParserConfigurationException
       {
          DocumentBuilderFactory factory
             = DocumentBuilderFactory.newInstance();
          builder = factory.newDocumentBuilder();
       }

       /**
          Builds a DOM document for an array list of items.
          @param items the items
          @return a DOM document describing the items
       */
       public Document build(ArrayList items)
       {
          doc = builder.newDocument();
          Element root = createItemList(items);
          doc.appendChild(root);
          return doc;
       }

       /**
          Builds a DOM element for an array list of items.
          @param items the items
          @return a DOM element describing the items
       */
       private Element createItemList(ArrayList items)
       {
          Element itemsElement = doc.createElement("items");
          for (int i = 0; i < items.size(); i++)
          {
             Item anItem = (Item)items.get(i);
             Element itemElement = createItem(anItem);
             itemsElement.appendChild(itemElement);
          }
          return itemsElement;
       }

       /**
          Builds a DOM element for an item.
          @param anItem the item
          @return a DOM element describing the item
       */
       private Element createItem(Item anItem)
       {
          Element itemElement = doc.createElement("item");
          Element productElement
             = createProduct(anItem.getProduct());
          Text quantityText = doc.createTextNode(
             "" + anItem.getQuantity());
          Element quantityElement = doc.createElement("quantity");
          quantityElement.appendChild(quantityText);

          itemElement.appendChild(productElement);
          itemElement.appendChild(quantityElement);
          return itemElement;
       }

       /**
          Builds a DOM element for a product.
          @param p the product
          @return a DOM element describing the product
       */
       private Element createProduct(Product p)
       {
          Text descriptionText
             = doc.createTextNode(p.getDescription());
          Text priceText = doc.createTextNode("" + p.getPrice());

          Element descriptionElement
             = doc.createElement("description");
          Element priceElement = doc.createElement("price");

          descriptionElement.appendChild(descriptionText);
          priceElement.appendChild(priceText);

          Element productElement = doc.createElement("product");

          productElement.appendChild(descriptionElement);
          productElement.appendChild(priceElement);

          return productElement;
       }

       private DocumentBuilder builder;
       private Document doc;
    }

File ItemListBuilderTest.java

    import java.util.ArrayList;
    import org.w3c.dom.Document;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;

    /**
       This program tests the item list builder. It prints the
       XML file corresponding to a DOM document containing a list
       of items.
    */
    public class ItemListBuilderTest
    {
       public static void main(String[] args) throws Exception
       {
          ArrayList items = new ArrayList();
          items.add(new Item(new Product("Toaster", 29.95), 3));
          items.add(new Item(new Product("Hair dryer", 24.95), 1));

          ItemListBuilder builder = new ItemListBuilder();
          Document doc = builder.build(items);
          Transformer t = TransformerFactory
             .newInstance().newTransformer();
          t.transform(new DOMSource(doc),
             new StreamResult(System.out));
       }
    }

Document Type Definitions
- A DTD is a set of rules for correctly formed documents of a particular type
  - Describes the legal attributes for each element type
  - Describes the legal child elements for each element type
- Legal child elements are described with an ELEMENT rule
    <!ELEMENT items (item*)>
- The items element (the root in this case) can have 0 or more item elements
- Definition of an item node
    <!ELEMENT item (product, quantity)>
- Children of the item node must be a product node followed by a quantity node

Document Type Definitions
- Definition of a product node
    <!ELEMENT product (description, price)>
- The other nodes
    <!ELEMENT quantity (#PCDATA)>
    <!ELEMENT description (#PCDATA)>
    <!ELEMENT price (#PCDATA)>
- #PCDATA stands for parsed character data, which is just text
  - Can contain any characters
  - Special characters such as < and & have to be encoded when they occur in character data

Encodings for Special Characters
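The table for this slide is not part of the transcript. As a reminder, XML predefines five entity references: &lt; for <, &gt; for >, &amp; for &, &quot; for " and &apos; for '. When a document is written from a DOM tree, the serializer applies the encoding for you, as this sketch suggests (the class name EscapeDemo and the sample text are made up):

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class EscapeDemo
{
   /** Wraps the text in a <description> element and returns the XML. */
   static String serializeText(String text) throws Exception
   {
      Document doc = DocumentBuilderFactory.newInstance()
         .newDocumentBuilder().newDocument();
      Element description = doc.createElement("description");
      // The text node holds the raw characters, not entity references
      description.appendChild(doc.createTextNode(text));
      doc.appendChild(description);

      Transformer t = TransformerFactory.newInstance().newTransformer();
      t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
      StringWriter out = new StringWriter();
      // Special characters are encoded while the document is written
      t.transform(new DOMSource(doc), new StreamResult(out));
      return out.toString();
   }

   public static void main(String[] args) throws Exception
   {
      System.out.println(serializeText("Salt & Pepper < $5"));
   }
}
```

In the output, the & and < of the text appear as &amp; and &lt;. A parser that reads the document back turns them into the original characters again.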

DTD for Item List
    <!ELEMENT items (item)*>
    <!ELEMENT item (product, quantity)>
    <!ELEMENT product (description, price)>
    <!ELEMENT quantity (#PCDATA)>
    <!ELEMENT description (#PCDATA)>
    <!ELEMENT price (#PCDATA)>

Regular Expressions for Element Content

Document Type Definitions
- A DTD gives you control over the allowed attributes of an element
    <!ATTLIST Element Attribute Type Default>
- Type can be any sequence of character data, specified as CDATA
- Type can also specify a finite number of choices
    <!ATTLIST price currency (USD | EUR | JPY) #REQUIRED>

Common Attribute Types

Attribute Defaults

Document Type Definitions
- The #IMPLIED keyword means you can supply an attribute or not
    <!ATTLIST price currency CDATA #IMPLIED>
- If you omit the attribute, the application processing the XML data implicitly assumes some default value
- You can specify a default to be used if the attribute is not specified
    <!ATTLIST price currency CDATA "USD">
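The difference between #IMPLIED and an explicit default can be observed from a program. In this sketch (the class name AttributeDefaultDemo and the helper currencyOf are assumptions), the price element never specifies a currency attribute; what getAttribute returns depends only on the ATTLIST rule in the internal DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class AttributeDefaultDemo
{
   /**
      Parses a <price> element without a currency attribute.
      @param defaultDecl the default part of the ATTLIST rule,
      such as #IMPLIED or "USD" (including the quotes)
      @return the value that getAttribute reports for currency
   */
   static String currencyOf(String defaultDecl) throws Exception
   {
      String xml = "<?xml version=\"1.0\"?>"
         + "<!DOCTYPE price ["
         + "<!ELEMENT price (#PCDATA)>"
         + "<!ATTLIST price currency CDATA " + defaultDecl + ">"
         + "]>"
         + "<price>29.95</price>";
      Element price = DocumentBuilderFactory.newInstance()
         .newDocumentBuilder()
         .parse(new InputSource(new StringReader(xml)))
         .getDocumentElement();
      return price.getAttribute("currency");
   }

   public static void main(String[] args) throws Exception
   {
      // With #IMPLIED the attribute is simply absent (empty string)
      System.out.println("[" + currencyOf("#IMPLIED") + "]");
      // With a default in the DTD the parser supplies the value
      System.out.println("[" + currencyOf("\"USD\"") + "]");
   }
}
```

With #IMPLIED, the application has to decide for itself what a missing currency means; with the "USD" default, the parser fills in the value.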

Parsing with Document Type Definitions
- Specify a DTD with every XML document
- Instruct the parser to check that the document follows the rules of the DTD
- Then the parser can be more intelligent about parsing
- If the parser knows that the children of an element are elements, it can suppress white space

Parsing with Document Type Definitions
- An XML document can reference a DTD in one of two ways
  - The document may contain the DTD
  - The document may refer to a DTD stored elsewhere
- A DTD is introduced with a DOCTYPE declaration

Parsing with Document Type Definitions
- If the document contains the DTD, the declaration looks like this:
    <!DOCTYPE rootElement [ rules ]>
- Example
    <?xml version="1.0"?>
    <!DOCTYPE items [
    <!ELEMENT items (item*)>
    <!ELEMENT item (product, quantity)>
    <!ELEMENT product (description, price)>
    <!ELEMENT quantity (#PCDATA)>
    <!ELEMENT description (#PCDATA)>
    <!ELEMENT price (#PCDATA)>
    ]>
    <items>
       <item>
          <product>
             <description>Ink Jet Refill Kit</description>
             <price>29.95</price>
          </product>
          <quantity>8</quantity>
       </item>
       <item>
          <product>
             <description>4-port Mini Hub</description>
             <price>19.95</price>
          </product>
          <quantity>4</quantity>
       </item>
    </items>

Parsing with Document Type Definitions
- If the DTD is stored outside the document, use the SYSTEM keyword inside the DOCTYPE declaration
  - This indicates that the system must locate the DTD
  - The location of the DTD follows the SYSTEM keyword
- A DOCTYPE declaration can point to a local file
    <!DOCTYPE items SYSTEM "items.dtd">
- A DOCTYPE declaration can point to a URL
    <!DOCTYPE items SYSTEM "http://www.mycompany.com/dtds/items.dtd">
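How the parser locates an external DTD can be made visible with an EntityResolver, which the parser consults before it opens the file or URL itself. This is only a sketch: the class name ExternalDtdDemo and the shortened DTD text are assumptions, and the resolver hands back the DTD from a string so that no items.dtd file is needed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class ExternalDtdDemo
{
   // Stand-in for the contents of items.dtd (shortened, hypothetical)
   static final String DTD_TEXT
      = "<!ELEMENT items (item*)>\n<!ELEMENT item (#PCDATA)>\n";

   /** Parses the XML text and returns the system IDs the parser asked for. */
   static List<String> requestedDtds(String xml) throws Exception
   {
      List<String> requested = new ArrayList<>();
      DocumentBuilder builder
         = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      // Supply the DTD text instead of letting the parser open a file or URL
      builder.setEntityResolver((publicId, systemId) ->
      {
         requested.add(systemId);
         return new InputSource(new StringReader(DTD_TEXT));
      });
      builder.parse(new InputSource(new StringReader(xml)));
      return requested;
   }

   public static void main(String[] args) throws Exception
   {
      String xml = "<?xml version=\"1.0\"?>"
         + "<!DOCTYPE items SYSTEM \"items.dtd\">"
         + "<items><item>Toaster</item></items>";
      System.out.println(requestedDtds(xml));
   }
}
```

The system ID that the resolver receives is typically the location from the DOCTYPE declaration, expanded to an absolute URI by the parser.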

Parsing with Document Type Definitions
- When your XML document has a DTD, use validation when parsing
- Then the parser will check that all child elements and attributes conform to the ELEMENT and ATTLIST rules in the DTD
- The parser reports an error if the document is invalid; the default error handler may only print the error, so install your own with the setErrorHandler method of the DocumentBuilder if you want an exception
- Use the setValidating method of the DocumentBuilderFactory before calling the newDocumentBuilder method
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setValidating(true);
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document doc = builder.parse(. . .);
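A validating parse might be set up as in the following sketch. It installs an ErrorHandler that rethrows validation errors, so that an invalid document reliably ends in an exception rather than a printed message. The class name ValidationDemo and the helper isValid are assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class ValidationDemo
{
   static final String DOCTYPE
      = "<!DOCTYPE items ["
      + "<!ELEMENT items (item*)>"
      + "<!ELEMENT item (product, quantity)>"
      + "<!ELEMENT product (description, price)>"
      + "<!ELEMENT quantity (#PCDATA)>"
      + "<!ELEMENT description (#PCDATA)>"
      + "<!ELEMENT price (#PCDATA)>"
      + "]>";

   /** Returns true if the <items> element conforms to the DTD above. */
   static boolean isValid(String itemsXml) throws Exception
   {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setValidating(true);
      DocumentBuilder builder = factory.newDocumentBuilder();
      builder.setErrorHandler(new ErrorHandler()
      {
         public void warning(SAXParseException e) {}
         // Validation errors arrive here; rethrow to stop the parse
         public void error(SAXParseException e) throws SAXException
         {
            throw e;
         }
         public void fatalError(SAXParseException e) throws SAXException
         {
            throw e;
         }
      });
      try
      {
         builder.parse(new InputSource(new StringReader(DOCTYPE + itemsXml)));
         return true;
      }
      catch (SAXParseException e)
      {
         return false;
      }
   }

   public static void main(String[] args) throws Exception
   {
      String good = "<items><item><product>"
         + "<description>Ink Jet Refill Kit</description>"
         + "<price>29.95</price></product>"
         + "<quantity>8</quantity></item></items>";
      // The product element is missing, which violates the item rule
      String bad = "<items><item><quantity>8</quantity></item></items>";
      System.out.println(isValid(good));
      System.out.println(isValid(bad));
   }
}
```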

Parsing with Document Type Definitions
- If the parser validates the document with a DTD, you can avoid validity checks in your code
- You can tell the parser to ignore white space in non-text elements
    factory.setValidating(true);
    factory.setIgnoringElementContentWhitespace(true);
- If the parser has access to a DTD, it can fill in defaults for attributes
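The effect of the two factory settings can be checked by counting the child nodes of the root element. This is a sketch with a shortened, hypothetical DTD; the class name WhitespaceDemo and the helper rootChildCount are assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class WhitespaceDemo
{
   static final String XML
      = "<!DOCTYPE items ["
      + "<!ELEMENT items (item*)>"
      + "<!ELEMENT item (#PCDATA)>"
      + "]>\n"
      + "<items>\n"
      + "   <item>Toaster</item>\n"
      + "   <item>Hair dryer</item>\n"
      + "</items>";

   /** Counts the child nodes of the root element of the sample document. */
   static int rootChildCount(boolean ignoreWhitespace) throws Exception
   {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      if (ignoreWhitespace)
      {
         // The parser needs the DTD to know where white space is ignorable
         factory.setValidating(true);
         factory.setIgnoringElementContentWhitespace(true);
      }
      Element root = factory.newDocumentBuilder()
         .parse(new InputSource(new StringReader(XML)))
         .getDocumentElement();
      return root.getChildNodes().getLength();
   }

   public static void main(String[] args) throws Exception
   {
      System.out.println("Default parser: " + rootChildCount(false));
      System.out.println("Ignoring white space: " + rootChildCount(true));
   }
}
```

With the default settings the root has five child nodes, three of them white space. With validation and white space suppression only the two item elements remain, which is what allows code to cast every child directly to Element.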

File ItemListParser.java

    import java.io.File;
    import java.io.IOException;
    import java.util.ArrayList;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;
    import org.w3c.dom.Text;
    import org.xml.sax.SAXException;

    /**
       An XML parser for item lists
    */
    public class ItemListParser
    {
       /**
          Constructs a parser that can parse item lists
       */
       public ItemListParser()
          throws ParserConfigurationException
       {
          DocumentBuilderFactory factory
             = DocumentBuilderFactory.newInstance();
          factory.setValidating(true);
          factory.setIgnoringElementContentWhitespace(true);
          builder = factory.newDocumentBuilder();
       }

       /**
          Parses an XML file containing an item list
          @param fileName the name of the file
          @return an array list containing all items in the XML file
       */
       public ArrayList parse(String fileName)
          throws SAXException, IOException
       {
          File f = new File(fileName);
          Document doc = builder.parse(f);

          // get the <items> root element

          Element root = doc.getDocumentElement();
          return getItems(root);
       }

       /**
          Obtains an array list of items from a DOM element
          @param e an <items> element
          @return an array list of all <item> children of e
       */
       private static ArrayList getItems(Element e)
       {
          ArrayList items = new ArrayList();

          // get the <item> children

          NodeList children = e.getChildNodes();
          for (int i = 0; i < children.getLength(); i++)
          {
             Element childElement = (Element)children.item(i);
             Item c = getItem(childElement);
             items.add(c);
          }
          return items;
       }

       /**
          Obtains an item from a DOM element
          @param e an <item> element
          @return the item described by the given element
       */
       private static Item getItem(Element e)
       {
          NodeList children = e.getChildNodes();

          Product p = getProduct((Element)children.item(0));

          Element quantityElement = (Element)children.item(1);
          Text quantityText
             = (Text)quantityElement.getFirstChild();
          int quantity = Integer.parseInt(quantityText.getData());

          return new Item(p, quantity);
       }

       /**
          Obtains a product from a DOM element
          @param e a <product> element
          @return the product described by the given element
       */
       private static Product getProduct(Element e)
       {
          NodeList children = e.getChildNodes();

          // the description is the first child, the price the second
          Element descriptionElement = (Element)children.item(0);
          Text descriptionText
             = (Text)descriptionElement.getFirstChild();
          String description = descriptionText.getData();

          Element priceElement = (Element)children.item(1);
          Text priceText
             = (Text)priceElement.getFirstChild();
          double price = Double.parseDouble(priceText.getData());

          return new Product(description, price);
       }

       private DocumentBuilder builder;
    }

File ItemListParserTest.java

    import java.util.ArrayList;

    /**
       This program parses an XML file containing an item list.
       The XML file should reference the items.dtd
    */
    public class ItemListParserTest
    {
       public static void main(String[] args) throws Exception
       {
          ItemListParser parser = new ItemListParser();
          ArrayList items = parser.parse("items.xml");
          for (int i = 0; i < items.size(); i++)
          {
             Item anItem = (Item)items.get(i);
             System.out.println(anItem.format());
          }
       }
    }