Java WWW Week 10 Version 2.1 Mar 2008

Java (JSP) and XML

Format of lecture:
- What is XML?
- A sample XML file
- How to use XML with JSP
- Example code for parsing an XML file in JSP

What is XML?
- XML is principally concerned with the description and structure of data (content and presentation are kept separate)
- Traditional methods of data storage and exchange employ a variety of schemes
- Electronically, these usually take the form of simple text files (Comma Separated Values etc.) or binary files
- Both methods have their own advantages and disadvantages
  - CSV files contain data that can be easily read but lack a description of their own format
  - Binary files contain data and a description of its format (as a Word document does) but are often proprietary schemes that require specific applications to read them

What is XML?
- XML stands for eXtensible Markup Language
- Developed by the World Wide Web Consortium (W3C)
- Aims to provide the best of both worlds
  - Stores data in an easy-to-read text format
  - Also contains a description of the data
- Open standard: looks similar to HTML (large user base), except that the extensible nature of XML allows for the creation of user-defined tags
- It is not a replacement for HTML (although it may eventually supplant it)

What is XML?
- XML offers a standard method for storing structured data
- Language independence (English, Chinese etc.)
- Hierarchical structure allows for simple and efficient querying/parsing of the document
- Simple data interchange between applications and/or distributed objects

What is XML?
- Can improve upon and replace existing EDI (Electronic Data Interchange) solutions such as EDIFACT
- Websites gain by having content and presentation separate
  - A site could be developed purely in XML
- Cost benefits
  - No need for private EDI networks: the Internet is used as the exchange medium
  - Reduced time to implement and reduced maintenance

Illustrative Example
- Paper-based or Simple Text File
    Jonathan Westlake
    Staffordshire University
- Comma Separated Values
    Jonathan,Westlake,Staffordshire
- Binary: a string of 0s and 1s etc.

Illustrative Example
- Simple 'Freeform' (i.e. without an XML schema) XML File

    <contacts>
      <contact>
        <name>
          <firstname>Jonathan</firstname>
          <lastname>Westlake</lastname>
        </name>
        <workplace>Staffordshire University</workplace>
      </contact>
    </contacts>

How to use XML with JSP
- XML is a big topic! We are just going to look at one of its more fundamental aspects
- An XML parser checks that your XML document is syntactically correct (well-formed); a validating parser also checks that it conforms to its DTD or schema (valid)
- Once an XML document is parsed, the information it contains is accessible inside our web application
- There are two common ways of parsing an XML document: Simple API for XML (SAX) and Document Object Model (DOM)
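
The well-formedness check can be tried out with the standard JAXP parser that ships with the JDK. The following is only a minimal sketch: the class name WellFormedDemo and the method isWellFormed are made up for illustration and are not part of the lecture code. It parses XML held in a String and reports whether the parser accepted it.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative class name; not part of the lecture code.
class WellFormedDemo {

    // Returns true if the parser accepts the text as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // closing tags match
        System.out.println(isWellFormed("<contact><workplace>Staffordshire University</workplace></contact>"));
        // the workplace element is never closed
        System.out.println(isWellFormed("<contact><workplace>Staffordshire University</contact>"));
    }
}
```

This only tests well-formedness. Checking validity would additionally need a DTD or schema and a factory configured to validate.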

SAX Parser
- Event driven: an event is triggered each time the parser encounters a beginning or ending tag
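
To make the event-driven idea concrete, here is a small sketch using the JDK's SAXParser. The class name SaxEventsDemo and the "start:"/"end:" labels are illustrative choices, not lecture code. The handler simply records each start-tag and end-tag event in the order the parser fires them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative class name; not part of the lecture code.
class SaxEventsDemo {

    // Parses the XML text and returns the tag events in the order they were fired.
    static List<String> collectEvents(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                events.add("start:" + qName); // fired for every opening tag
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName); // fired for every closing tag
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return events;
    }

    public static void main(String[] args) {
        String xml = "<contacts><contact><workplace>Staffordshire University</workplace></contact></contacts>";
        for (String event : collectEvents(xml)) {
            System.out.println(event);
        }
    }
}
```

Note that nothing is kept in memory unless the handler stores it, which is the main practical difference from DOM.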

DOM Parser
- A tree representing the document is built in memory.
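
As a rough sketch of what "built in memory" means in practice, the following parses a String into a Document and then reads from the tree without touching the source text again. The class name DomTreeDemo and its parse method are illustrative names, not lecture code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Illustrative class name; not part of the lecture code.
class DomTreeDemo {

    // Builds the in-memory DOM tree for the given XML text.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<contacts><contact><workplace>Staffordshire University</workplace></contact></contacts>");
        // the root element of the tree
        Element root = doc.getDocumentElement();
        System.out.println(root.getTagName());
        // any element can be reached from the tree once it has been built
        System.out.println(root.getElementsByTagName("workplace").item(0).getTextContent());
    }
}
```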

DOM Tree Structure
For our contacts example:

    contacts---contact
                |
                |---name
                |    |
                |    |---firstname
                |    |    └---Text
                |    |
                |    └---lastname
                |         └---Text
                |
                |---workplace
                |    └---Text
                |
                └---
                     └---Text

Standard DOM Parsing
Searching for a contact based on their address
- Create a Document object
- Load the XML file into the Document
- By iterating over the Document nodes, you can access any of the values or attributes that are stored in the XML file
- When you find the record that contains the address that you are looking for, do something with the information

Standard DOM Parsing

    import java.io.File;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;

    // Get a factory object (many ways to build a Document, this lets us choose)
    DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();

    // ensures the factory object returned is set to validate the XML data with the schema
    // docBuilderFactory.setAttribute(" ", ...);  // arguments not preserved on the slide
    docBuilderFactory.setValidating(true);
    docBuilderFactory.setNamespaceAware(true);

    // get the Document builder object that we will use to build the DOM tree in memory
    DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();

    // load the XML file into a new Document object
    Document doc = docBuilder.parse(new File("myfile.xml"));

    // does a bit of extra DOM processing (merges adjacent text nodes, strips out empty ones)
    doc.normalize();
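
Standard DOM Parsing

    // get the whole node structure (getNodeList is a helper that is not shown on the slides)
    NodeList nodelist = getNodeList("/contacts", doc);
    for (int index = 0; index < nodelist.getLength(); index++) {
        Node node = nodelist.item(index);
        // element nodes carry the tag name; their text is read with getTextContent()
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            if (node.getNodeName().equals("firstname")) firstname = node.getTextContent();
            if (node.getNodeName().equals("lastname")) lastname = node.getTextContent();
            if (node.getNodeName().equals("workplace")) workplace = node.getTextContent();
            // the fourth field is read the same way; its name is not preserved on the slide
            // if (node.getNodeName().equals(" ")) ... = node.getTextContent();
        }
        // once the record with the address being searched for has been read
        // (the test for this is not preserved on the slide):
        // { displayRecord(firstname, lastname, workplace, ...); break; }
    }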
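
For comparison, here is a self-contained sketch of the same search that can be compiled and run. It rests on several assumptions: the fourth field is called address (the slides do not preserve its name), the second contact and both address values are made-up sample data, and the names DomSearchDemo, findByAddress and text are illustrative. It walks the contact elements with a plain for loop and compares each contact's address.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative class; the <address> element name and the sample data are assumptions.
class DomSearchDemo {

    static final String SAMPLE =
            "<contacts>"
            + "<contact><name><firstname>Jonathan</firstname><lastname>Westlake</lastname></name>"
            + "<workplace>Staffordshire University</workplace><address>Address A</address></contact>"
            + "<contact><name><firstname>Sample</firstname><lastname>Person</lastname></name>"
            + "<workplace>Sample Workplace</workplace><address>Address B</address></contact>"
            + "</contacts>";

    // Returns "firstname lastname, workplace" for the first contact whose
    // <address> text equals the one given, or null if no contact matches.
    static String findByAddress(String xml, String address) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList contacts = doc.getElementsByTagName("contact");
            for (int i = 0; i < contacts.getLength(); i++) {
                Element contact = (Element) contacts.item(i);
                if (address.equals(text(contact, "address"))) {
                    return text(contact, "firstname") + " " + text(contact, "lastname")
                            + ", " + text(contact, "workplace");
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("could not search XML", e);
        }
    }

    // Text of the first descendant element with the given tag name, or "" if absent.
    static String text(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? "" : list.item(0).getTextContent();
    }

    public static void main(String[] args) {
        System.out.println(findByAddress(SAMPLE, "Address B"));
    }
}
```

Every contact before the matching one is visited in turn, so the cost of the search grows with the number of contacts.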

Standard DOM Parsing Problems
- The previous procedure is OK if you have a fairly 'flat' XML structure that does not contain many different node types
- If you have many contacts stored, it may be very slow to iterate through the nodes until you find the contact you are looking for
- For more complex XML documents, you can't use a simple for loop to iterate
- You end up with code that looks more like…

Not Nice!

    Element root = doc.getDocumentElement();
    Node configNode = root.getFirstChild();
    NodeList childNodes = configNode.getChildNodes();
    for (int childNum = 0; childNum < childNodes.getLength(); childNum++) {
        if (childNodes.item(childNum).getNodeType() == Node.ELEMENT_NODE) {
            Element child = (Element) childNodes.item(childNum);
            if (child.getTagName().equals("header")) {
                // Do something with the header
                System.out.print("Got a header!\n");
            }
        }
    }

XPath
- XPath (XML Path Language) is a terse (short, non-XML) syntax for addressing portions of an XML document
- A typical XPath expression is a Location Path consisting of a string of element or attribute qualifiers separated by forward slashes ("/"), similar in appearance to a file system path
- E.g. this gets the field of the first contact
    //contact[1]/
- NB would you expect to have seen //contact[0]/...? (XPath positions are counted from 1, not 0)
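
The indexing question can be checked directly. The sketch below is illustrative only: the class name XPathIndexDemo, the method eval and the placeholder values First and Second are not from the lecture. It evaluates a location path against XML held in a String, using the workplace field since that name is known from the contacts example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Illustrative class name; not part of the lecture code.
class XPathIndexDemo {

    // Two contacts with placeholder workplace values.
    static final String SAMPLE =
            "<contacts>"
            + "<contact><workplace>First</workplace></contact>"
            + "<contact><workplace>Second</workplace></contact>"
            + "</contacts>";

    // Evaluates the expression against the XML text and returns the result as a String.
    static String eval(String expression, String xml) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            return xPath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("bad XPath expression: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // position 1 is the first contact
        System.out.println(eval("//contact[1]/workplace", SAMPLE));
        // position 0 matches nothing, so the String result is empty
        System.out.println("[" + eval("//contact[0]/workplace", SAMPLE) + "]");
    }
}
```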

Using XPath
- We can use the XPath syntax to retrieve any information we like from the Document
- XPath is a new(ish) specification and initially it was only available via Java extensions
- As of version 1.5 of the JDK, Java natively supports XPath

Search Using XPath

    // create the XPath and initialize it
    XPath xPath = XPathFactory.newInstance().newXPath();

    // now execute the XPath select statement to get the contact that matches the address
    // (select stands for the select expression, which is not preserved on the slide)
    NodeList nodes = (NodeList) xPath.evaluate(select, doc, XPathConstants.NODESET);
    Node contact = nodes.item(0);

    // read each field relative to the matching contact node
    Node firstNameNode = (Node) xPath.evaluate(".//firstname", contact, XPathConstants.NODE);
    String firstname = firstNameNode.getTextContent();
    Node lastNameNode = (Node) xPath.evaluate(".//lastname", contact, XPathConstants.NODE);
    String lastname = lastNameNode.getTextContent();
    Node workPlaceNode = (Node) xPath.evaluate(".//workplace", contact, XPathConstants.NODE);
    String workplace = workPlaceNode.getTextContent();
    // the fourth field is read the same way; its name is not preserved on the slide
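
A runnable sketch of an XPath-based search follows. As before, it assumes the fourth field is an element called address (the slides do not preserve its name), uses made-up sample data for the second contact, and the names XPathSearchDemo and findByAddress are illustrative. A predicate selects the matching contact in one step, and the remaining fields are then read relative to that node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Illustrative class; the <address> element name and the sample data are assumptions.
class XPathSearchDemo {

    static final String SAMPLE =
            "<contacts>"
            + "<contact><name><firstname>Jonathan</firstname><lastname>Westlake</lastname></name>"
            + "<workplace>Staffordshire University</workplace><address>Address A</address></contact>"
            + "<contact><name><firstname>Sample</firstname><lastname>Person</lastname></name>"
            + "<workplace>Sample Workplace</workplace><address>Address B</address></contact>"
            + "</contacts>";

    // Returns "firstname lastname, workplace" for the first contact whose address
    // matches, or null if there is none. Assumes the address contains no apostrophe,
    // because it is pasted into the expression between single quotes.
    static String findByAddress(String xml, String address) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            // the predicate in square brackets does the searching
            Node contact = (Node) xPath.evaluate(
                    "//contact[address='" + address + "']", doc, XPathConstants.NODE);
            if (contact == null) {
                return null;
            }
            // relative paths are evaluated from the matching contact node
            String firstname = xPath.evaluate("name/firstname", contact);
            String lastname = xPath.evaluate("name/lastname", contact);
            String workplace = xPath.evaluate("workplace", contact);
            return firstname + " " + lastname + ", " + workplace;
        } catch (Exception e) {
            throw new IllegalStateException("could not search XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(findByAddress(SAMPLE, "Address B"));
    }
}
```

The relative paths (no leading slash) matter here: a path beginning with // always searches from the document root, whatever context node is passed in.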

XMLHelper.java

    package xmlhelper;

    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathExpressionException;
    import javax.xml.xpath.XPathFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;

So JSP has access to a set of packages which include DOM and XPath

getNodeListXPath()

    public static NodeList getNodeListXPath(String expression, Document target)
            throws XPathExpressionException {
        // create the XPath and initialize it
        XPath xPath = XPathFactory.newInstance().newXPath();
        // now execute the XPath select statement
        NodeList nodeList = (NodeList) xPath.evaluate(expression, target, XPathConstants.NODESET);
        // return the resulting nodes
        return nodeList;
    }

getBooleanXPath()

    public static boolean getBooleanXPath(String expression, Document target)
            throws XPathExpressionException {
        // create the XPath and initialize it
        XPath xPath = XPathFactory.newInstance().newXPath();
        // now execute the XPath select statement
        Boolean nodeBoolean = (Boolean) xPath.evaluate(expression, target, XPathConstants.BOOLEAN);
        // return the result as a primitive boolean
        return nodeBoolean.booleanValue();
    }

getNumberXPath()

    public static double getNumberXPath(String expression, Document target)
            throws XPathExpressionException {
        // create the XPath and initialize it
        XPath xPath = XPathFactory.newInstance().newXPath();
        // now execute the XPath select statement
        Double nodeNumber = (Double) xPath.evaluate(expression, target, XPathConstants.NUMBER);
        // return the result as a primitive double
        return nodeNumber.doubleValue();
    }

getStringXPath()

    public static String getStringXPath(String expression, Document target)
            throws XPathExpressionException {
        // create the XPath and initialize it
        XPath xPath = XPathFactory.newInstance().newXPath();
        // now execute the XPath select statement
        String nodeText = (String) xPath.evaluate(expression, target, XPathConstants.STRING);
        // return the resulting text
        return nodeText;
    }
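
The four helper methods differ only in the XPathConstants return type they request. The sketch below shows what each return type yields for a small document. The class XPathReturnTypesDemo, its eval and parse methods, the expressions and the second contact's data are illustrative assumptions, not part of XMLHelper.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative class name; not part of XMLHelper.
class XPathReturnTypesDemo {

    // The second contact is made-up sample data.
    static final String SAMPLE =
            "<contacts>"
            + "<contact><workplace>Staffordshire University</workplace></contact>"
            + "<contact><workplace>Sample Workplace</workplace></contact>"
            + "</contacts>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Evaluates the expression and returns the result in the requested type.
    static Object eval(String expression, Document target, QName returnType) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expression, target, returnType);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("bad XPath expression: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        // NODESET: every node that matches
        NodeList contacts = (NodeList) eval("//contact", doc, XPathConstants.NODESET);
        // BOOLEAN: true if any workplace has this text
        Boolean found = (Boolean) eval("//workplace = 'Staffordshire University'", doc, XPathConstants.BOOLEAN);
        // NUMBER: XPath numbers always come back as Double
        Double count = (Double) eval("count(//contact)", doc, XPathConstants.NUMBER);
        // STRING: the text of the first matching node
        String first = (String) eval("//contact[1]/workplace", doc, XPathConstants.STRING);
        System.out.println(contacts.getLength() + " " + found + " " + count + " " + first);
    }
}
```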

Summary
- XML is important as it offers:
  - Neutral data exchange
  - Can be built into a web application
  - Can be searched for content using XPath
- Java (and therefore JSP) can use XML and XPath
- Used widely in enterprise-scale information systems
- No lecture next week, but the first revision session in preparation for the module class test (short exam)