Internet Technologies1 XML Grammars Internet Technologies
XML Grammars: Three Major Uses
1. Validation
2. Code Generation
3. Communication
XML Validation
Sources for this lecture:
- “Data on the Web”, Abiteboul, Buneman and Suciu
- “XML in a Nutshell”, Harold and Means
- “The XML Companion”, Bradley
The validation examples were originally tested with an older parser, so the specific outputs may differ from those shown.
XML Validation
Batch validation compares a complete document instance against the DTD and produces a report containing any errors or warnings. Consider batch validation to be analogous to program compilation, with similar errors detected.
Interactive validation checks the document against the DTD continuously, while the document is being created.
XML Validation
The benefits of validating documents against a DTD include:
- Programmers can write extraction and manipulation filters without fear of their software ever processing unexpected input.
- Using an XML-aware word processor, authors and editors can be guided and constrained to produce conforming documents. Consider how NetBeans allows you to edit web.xml files.
Internet Technologies6 XML Validation Examples XML elements may contain further, embedded elements, and the entire document must be enclosed by a single document element. These are recursive hierarchical structures. A Document Type Definition (DTD) contains rules for each element allowed within a specific class of documents.
Things the DTD does not do:
- Specify the document root.
- Specify the number of instances of each kind of element. (Or, it’s rather hard to do.)
- Describe the character data inside an element (the precise syntax).
- Handle namespaces naturally.
The XML Schema language is much more recent and improves on DTDs. We have “programmer level” type specifications.
To see a real DTD, view source on
We’ll run this program against several XML files with DTDs. We’ll study the code soon.

// Validate.java using Xerces
import java.io.*;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.XMLReaderFactory;
import org.xml.sax.helpers.DefaultHandler;

This slide shows the imported classes.

public class Validate {
    public static boolean valid = true;
    public static void main (String argv []) {
        if (argv.length != 1) {
            System.err.println ("Usage: java Validate filename.xml");
            System.exit (1);
        }

Here we check if the command line is correct.

        try {
            // get a parser
            XMLReader reader = XMLReaderFactory.createXMLReader(
                "org.apache.xerces.parsers.SAXParser");
            // request validation
            reader.setFeature(" true);
            // associate an InputSource object with the file name
            InputSource inputSource = new InputSource(argv[0]);
            // go ahead and parse
            reader.parse(inputSource);
        }
        catch(org.xml.sax.SAXException e) {
            System.out.println("Error in parsing " + e);
            valid = false;
        }
        catch(java.io.IOException e) {
            System.out.println("Error in I/O " + e);
            System.exit(0);
        }
        System.out.println("Valid Document is " + valid);
    }

// Catch any errors or fatal errors here.
// The parser will handle simple warnings.
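The feature name passed to setFeature was lost from the listing. The sketch below assumes it was the standard SAX2 validation feature, http://xml.org/sax/features/validation, and uses the parser bundled with the JDK (via SAXParserFactory) instead of naming the Xerces class, so it runs without extra jars. The class name, the sample documents and the error handler are illustrative assumptions, not the lecture's code.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Sketch only: names and sample documents are hypothetical.
class ValidateSketch {
    static final String DTD =
        "<!DOCTYPE FixedFloatSwap [\n"
      + "  <!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments)>\n"
      + "  <!ELEMENT Notional (#PCDATA)>\n"
      + "  <!ELEMENT Fixed_Rate (#PCDATA)>\n"
      + "  <!ELEMENT NumYears (#PCDATA)>\n"
      + "  <!ELEMENT NumPayments (#PCDATA)>\n"
      + "]>\n";

    static final String GOOD = DTD
      + "<FixedFloatSwap><Notional>100</Notional><Fixed_Rate>5</Fixed_Rate>"
      + "<NumYears>3</NumYears><NumPayments>6</NumPayments></FixedFloatSwap>";

    // NumYears is declared as text only, so a child element breaks validity.
    static final String BAD = DTD
      + "<FixedFloatSwap><Notional>100</Notional><Fixed_Rate>5</Fixed_Rate>"
      + "<NumYears><StartYear>2000</StartYear></NumYears>"
      + "<NumPayments>6</NumPayments></FixedFloatSwap>";

    // Returns true when the document is well formed and satisfies its DTD.
    static boolean isValid(String xml) {
        final boolean[] valid = {true};
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // Assumed feature name: the standard SAX2 validation switch.
            reader.setFeature("http://xml.org/sax/features/validation", true);
            reader.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    // Validity errors are recoverable, so record them and continue.
                    System.out.println("Validity error: " + e.getMessage());
                    valid[0] = false;
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Fatal (well-formedness) errors and I/O problems end up here.
            valid[0] = false;
        }
        return valid[0];
    }

    public static void main(String[] args) {
        System.out.println("Valid Document is " + isValid(GOOD));
        System.out.println("Valid Document is " + isValid(BAD));
    }
}
```

Without an error handler, a SAX parser does not throw on validity errors, which is why the sketch records them in the handler rather than relying on the catch block.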
Internet Technologies XML Document DTD Valid document is true
Internet Technologies XML Document DTD on the Web? VERY NICE Valid document is true
Internet Technologies14 <!DOCTYPE FixedFloatSwap [ ]> XML Document with an internal subset Valid document is true
Internet Technologies XML Document DTD Valid document is false
Internet Technologies XML Document
DTD
Quantity Indicators
?  0 or 1 time
+  1 or more times
*  0 or more times
C:\McCarthy\www\examples\sax>java Validate FixedFloatSwap.xml
Valid document is true
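To make the three quantity indicators concrete, here is a small sketch that validates documents against an internal DTD subset using the JDK's bundled parser. The person content model and all element names are hypothetical and chosen only to exercise ?, + and *.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Sketch only: the DTD below is a made-up illustration.
class QuantityIndicatorDemo {
    // name: exactly once, nickname: 0 or 1, profession: 1 or more, award: 0 or more
    static final String DTD =
        "<!DOCTYPE person [\n"
      + "  <!ELEMENT person (name, nickname?, profession+, award*)>\n"
      + "  <!ELEMENT name (#PCDATA)>\n"
      + "  <!ELEMENT nickname (#PCDATA)>\n"
      + "  <!ELEMENT profession (#PCDATA)>\n"
      + "  <!ELEMENT award (#PCDATA)>\n"
      + "]>\n";

    // Validates the given document body against the DTD above.
    static boolean isValid(String body) {
        final boolean[] valid = {true};
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    valid[0] = false;
                }
            });
            reader.parse(new InputSource(new StringReader(DTD + body)));
        } catch (Exception e) {
            valid[0] = false;
        }
        return valid[0];
    }

    public static void main(String[] args) {
        String twoProfessions = "<person><name>Alan Turing</name>"
            + "<profession>computer scientist</profession>"
            + "<profession>cryptographer</profession></person>";
        String noProfession = "<person><name>Alan Turing</name></person>";
        System.out.println("two professions: " + isValid(twoProfessions));
        System.out.println("no profession:   " + isValid(noProfession));
    }
}
```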
Internet Technologies18 Is this a valid document? <!DOCTYPE person [ ]> Alan Turing computer scientist cryptographer Sure!
The locations where document text data is allowed are indicated by the keyword ‘#PCDATA’ (Parsed Character Data).
XML Document
DTD
Output
C:\McCarthy\www\46-928\examples\sax>java Validate FixedFloatSwap.xml
org.xml.sax.SAXParseException: Element "NumYears" does not allow "StartYear" -- (#PCDATA)
org.xml.sax.SAXParseException: Element type "StartYear" is not declared.
org.xml.sax.SAXParseException: Element "NumYears" does not allow "EndYear" -- (#PCDATA)
org.xml.sax.SAXParseException: Element type "EndYear" is not declared.
Valid document is false
Mixed Content
There are strict rules which must be applied when an element is allowed to contain both text and child elements:
- The #PCDATA keyword must be the first token in the group.
- The group must be a choice group (using “|”, not “,”).
- The group must be optional and repeatable (that is, followed by “*”).
This is known as a mixed content model.
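The mixed content rules can be checked with a short sketch. The para and sub element names are hypothetical. The second call uses a sequence group after #PCDATA, which the XML grammar does not allow, so the parser rejects the DTD itself.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Sketch only: element names are made up for illustration.
class MixedContentDemo {
    // Builds a document whose para element uses the given content model,
    // then validates it with the JDK's bundled parser.
    static boolean isValid(String paraModel, String body) {
        String xml =
            "<!DOCTYPE para [\n"
          + "  <!ELEMENT para " + paraModel + ">\n"
          + "  <!ELEMENT sub (#PCDATA)>\n"
          + "]>\n" + body;
        final boolean[] valid = {true};
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    valid[0] = false;
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // A malformed content model is a fatal error and lands here.
            valid[0] = false;
        }
        return valid[0];
    }

    public static void main(String[] args) {
        String water = "<para>H<sub>2</sub>O is water.</para>";
        // Legal mixed content: #PCDATA first, choice group, repeatable.
        System.out.println(isValid("(#PCDATA | sub)*", water));
        // Illegal: a comma (sequence) is not permitted after #PCDATA.
        System.out.println(isValid("(#PCDATA, sub)*", water));
    }
}
```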
Internet Technologies22 DTD H 2 O is water. XML Document Valid document is true
Internet Technologies23 Is this a valid document? <!DOCTYPE page [ ]> Alan Turing broke codes during World War II. He very precisely defined the notion of "algorithm". And so he had several professions: computer scientist cryptographer And mathematician Sure!
How about this one?
<!DOCTYPE page [ ]> The following is a paragraph marked up in XML. Alan Turing broke codes during World War II. He very precisely defined the notion of "algorithm". And so he had several professions: computer scientist cryptographer And mathematician
java Validate mixed.xml
org.xml.sax.SAXParseException: The content of element type "page" must match "(paragraph)+".
Valid document is false
CDATA Section
XML Document: will not be parsed for markup]]>
DTD: <!ELEMENT FixedFloatSwap ( Notional, Fixed_Rate, NumYears, NumPayments, Note ) >
Internet Technologies26 Recursion <!DOCTYPE tree [ ]> A DTD is a context-free grammar java Validate recursive1.xml Valid document is true
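A recursive content model can be sketched as follows. The DTD here is an assumption (the declarations on the slide were lost): a tree holds exactly one node, and a node is either a leaf or a pair of nodes, so documents of any depth match the same finite set of rules.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Sketch only: the grammar is a hypothetical reconstruction.
class RecursiveDtdDemo {
    static final String DTD =
        "<!DOCTYPE tree [\n"
      + "  <!ELEMENT tree (node)>\n"
      + "  <!ELEMENT node (leaf | (node, node))>\n"   // node refers to itself
      + "  <!ELEMENT leaf (#PCDATA)>\n"
      + "]>\n";

    static boolean isValid(String body) {
        final boolean[] valid = {true};
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    valid[0] = false;
                }
            });
            reader.parse(new InputSource(new StringReader(DTD + body)));
        } catch (Exception e) {
            valid[0] = false;
        }
        return valid[0];
    }

    public static void main(String[] args) {
        String leaf = "<node><leaf>Alan Turing would like this</leaf></node>";
        // One root node containing two child nodes: matches (node).
        System.out.println(isValid("<tree><node>" + leaf + leaf + "</node></tree>"));
        // Two nodes directly under tree: does not match (node).
        System.out.println(isValid("<tree>" + leaf + leaf + "</tree>"));
    }
}
```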
Internet Technologies27 How about this one? <!DOCTYPE tree [ ]> Alan Turing would like this Alan Turing would like this java Validate recursive1.xml org.xml.sax.SAXParseException: The content of element type "tree" must match "(node)". Valid document is false
Relational Databases and XML
Consider the relational database r1(a,b,c), r2(c,d)

r1:  a   b   c        r2:  c   d
     a1  b1  c1            c2  d2
     a2  b2  c2            c3  d3
                           c4  d4

How can we represent this database with an XML DTD?
Internet Technologies29 Relations <!DOCTYPE db [ ]> a1 b1 c1 c2 d2 c3 d3 c4 d4 java Validate Db.xml Valid document is true There is a small problem….
Internet Technologies30 Relations <!DOCTYPE db [ ]> a1 b1 c1 c2 d2 c3 d3 c4 d4 The order of the relations should not count and neither should the order of columns within rows.
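The ordering problem can be seen in a small experiment. The declarations below are an assumed encoding of r1 and r2 (the slide's DTD was lost). With db declared as (r1*, r2*) the relations must appear in that order, while the choice group (r1 | r2)* accepts tuples in any order. Column order inside a row would need the same treatment, at the cost of listing permutations.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Sketch only: an assumed XML encoding of the relations r1(a,b,c) and r2(c,d).
class RelationsDtdDemo {
    static final String R1 = "<r1><a>a1</a><b>b1</b><c>c1</c></r1>";
    static final String R2 = "<r2><c>c2</c><d>d2</d></r2>";

    // Validates <db>rows</db> with the given content model for db.
    static boolean isValid(String dbModel, String rows) {
        String xml =
            "<!DOCTYPE db [\n"
          + "  <!ELEMENT db " + dbModel + ">\n"
          + "  <!ELEMENT r1 (a, b, c)>\n"
          + "  <!ELEMENT r2 (c, d)>\n"
          + "  <!ELEMENT a (#PCDATA)>\n"
          + "  <!ELEMENT b (#PCDATA)>\n"
          + "  <!ELEMENT c (#PCDATA)>\n"
          + "  <!ELEMENT d (#PCDATA)>\n"
          + "]>\n<db>" + rows + "</db>";
        final boolean[] valid = {true};
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    valid[0] = false;
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            valid[0] = false;
        }
        return valid[0];
    }

    public static void main(String[] args) {
        System.out.println(isValid("(r1*, r2*)", R1 + R2));   // ordered model, ordered data
        System.out.println(isValid("(r1*, r2*)", R2 + R1));   // ordered model, swapped data
        System.out.println(isValid("(r1 | r2)*", R2 + R1));   // order-insensitive model
    }
}
```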
Attributes
An attribute is associated with a particular element by the DTD and is assigned an attribute type. The attribute type can restrict the range of values it can hold. Example attribute types include:
- CDATA indicates a simple string of characters
- NMTOKEN indicates a word or token
- A named token group such as (left | center | right)
- ID an element id that holds a unique value (among other element IDs in the document)
- IDREF attributes refer to an ID
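A minimal sketch of an attribute declaration follows. It assumes a currency attribute on a Notional element, declared as a named token group and marked #REQUIRED. The element choice and the permitted values are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Sketch only: the attribute list is hypothetical.
class AttributeDtdDemo {
    static final String DTD =
        "<!DOCTYPE Notional [\n"
      + "  <!ELEMENT Notional (#PCDATA)>\n"
      + "  <!ATTLIST Notional currency (USD | EUR | GBP) #REQUIRED>\n"
      + "]>\n";

    static boolean isValid(String body) {
        final boolean[] valid = {true};
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    valid[0] = false;
                }
            });
            reader.parse(new InputSource(new StringReader(DTD + body)));
        } catch (Exception e) {
            valid[0] = false;
        }
        return valid[0];
    }

    public static void main(String[] args) {
        System.out.println(isValid("<Notional currency='USD'>100</Notional>")); // listed value
        System.out.println(isValid("<Notional>100</Notional>"));                // attribute missing
        System.out.println(isValid("<Notional currency='XYZ'>100</Notional>")); // value not in group
    }
}
```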
Internet Technologies32 DTD XML Document C:\McCarthy\www\46-928\examples\sax>java Validate FixedFloatSwap.xml org.xml.sax.SAXParseException: Attribute value for "currency" is #REQUIRED. Valid document is false
Internet Technologies33 DTD XML Document Valid document is true
Internet Technologies34 DTD XML Document Valid document is true #IMPLIED means optional
Internet Technologies35 DTD XML Document Valid document is true
Internet Technologies36 ID and IDREF Attributes We can represent complex relationships within an XML document using ID and IDREF attributes.
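As a rough sketch of how a validating parser enforces these links, the following uses a hypothetical course catalogue: each Course carries an ID, and an EMPTY Prerequisites element carries an IDREFS attribute. The element and attribute names are assumptions and are not taken from Course_Descriptions.dtd.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Sketch only: the DTD is a made-up stand-in for a course catalogue.
class IdRefDemo {
    static final String DTD =
        "<!DOCTYPE Courses [\n"
      + "  <!ELEMENT Courses (Course+)>\n"
      + "  <!ELEMENT Course (Title, Prerequisites?)>\n"
      + "  <!ATTLIST Course id ID #REQUIRED>\n"
      + "  <!ELEMENT Title (#PCDATA)>\n"
      + "  <!ELEMENT Prerequisites EMPTY>\n"
      + "  <!ATTLIST Prerequisites courses IDREFS #REQUIRED>\n"
      + "]>\n";

    // Builds a two-course catalogue; the second course lists one prerequisite.
    static String catalogue(String firstId, String secondId, String prereq) {
        return DTD + "<Courses>"
            + "<Course id='" + firstId + "'><Title>Algebra I</Title></Course>"
            + "<Course id='" + secondId + "'><Title>Calculus I</Title>"
            + "<Prerequisites courses='" + prereq + "'/></Course>"
            + "</Courses>";
    }

    static boolean isValid(String xml) {
        final boolean[] valid = {true};
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    valid[0] = false;
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            valid[0] = false;
        }
        return valid[0];
    }

    public static void main(String[] args) {
        System.out.println(isValid(catalogue("Math100", "Calc100", "Math100"))); // reference resolves
        System.out.println(isValid(catalogue("Math100", "Calc100", "Geom100"))); // dangling IDREF
        System.out.println(isValid(catalogue("Math100", "Math100", "Math100"))); // duplicate ID
    }
}
```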
An Undirected Graph
Figure: vertices u, v, w, x, y, z joined by edges, with one vertex and one edge labeled.
A Directed Graph
Figure: a directed graph on the vertices u, v, w, x, y.
Figure: a course prerequisite graph with nodes Math100, Geom100, Calc100, Calc200, Calc300, Philo45, CS1, CS2.
This is called a DAG (Directed Acyclic Graph)
Internet Technologies40 Algebra I Students in this course study introductory algebra. This course has an ID But no prerequisites
Internet Technologies41 Geometry I Students in this course study how to prove several theorems in geometry. The DTD will force this to be unique.
Internet Technologies42 Calculus I Students in this course study the derivative. These are references to ID’s. (IDREFS)
Internet Technologies43 Calculus II Students in this course study the integral. The DTD requires that this name be a unique id defined within this document. Otherwise, the document is invalid.
Internet Technologies44 Calculus II Students in this course study the derivative and the integral (in 3-space). Prerequisites is an EMPTY element. It’s used only for its attributes.
Introduction to Computer Science I In this course we study Turing machines.
IDREF to ID: a one-to-one link
Internet Technologies46 Introduction to Computer Science II In this course we study basic data structures. IDREFS ID One-to-many links
Internet Technologies47 Ethical Implications of Information Technology TBA
Internet Technologies48 The Course_Descriptions.dtd
General Entities (&)
General entities are used to place text into the XML document. They may be declared in the DTD and referenced in the document. They may also be declared in the DTD as residing in a file, and are then referenced in the document in the same way.
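A small sketch of general entity expansion, assuming an internal entity named bankname declared in the document's internal subset. The parser replaces &bankname; with the entity's text before the application sees the element content. The class name and document are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Sketch only: a made-up document with one internal general entity.
class GeneralEntityDemo {
    static final String XML =
        "<!DOCTYPE FixedFloatSwap [\n"
      + "  <!ENTITY bankname 'Mellon National Bank and Trust'>\n"
      + "]>\n"
      + "<FixedFloatSwap><Bank>&bankname;</Bank></FixedFloatSwap>";

    // Parses the document and returns the text content of the Bank element.
    static String bankText() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return doc.getElementsByTagName("Bank").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The entity reference has already been replaced by its text.
        System.out.println(bankText());
    }
}
```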
Internet Technologies50 <!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd" [ ] > &bankname; <!ELEMENT FixedFloatSwap (Bank,Notional, Fixed_Rate, NumYears, NumPayments ) > DTD Document using a General Entity Validate is true
Internet Technologies51 <xsl:stylesheet xmlns:xsl=" version="1.0"> XSLT Program The general entity is replaced before xslt sees it.
XSLT Output
C:\McCarthy\www\46-928\examples\sax>java -Dcom.jclark.xsl.sax.parser=com.jclark.xml.sax.CommentDriver com.jclark.xsl.sax.Driver FixedFloatSwap.xml FixedFloatSwap.xsl FixedFloatSwap.wml
C:\McCarthy\www\46-928\examples\sax>type FixedFloatSwap.wml
Mellon National Bank and Trust
Internet Technologies53 <!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd" [ ] > &bankname; An external text entity
Internet Technologies54 Mellon Bank And Trust Corporation Pittsburgh PA XSLT Output Mellon Bank And Trust Corporation Pittsburgh PA JustAFile.dat
Parameter Entities (%)
While general entities are used to place text into the XML document, parameter entities are used to modify the DTD. We want to build modular DTDs so that we can create new DTDs from existing ones. We’ll look at one slide and then see some examples.
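A sketch of an internal parameter entity, under the assumption that a group of leaf element declarations is packaged as an entity named leafDecls (a hypothetical name) and pulled into the DTD with %leafDecls;. Within the internal subset, a parameter entity reference may only appear between declarations, which is how it is used here.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Sketch only: entity and element grouping are assumptions.
class ParameterEntityDemo {
    static final String DTD =
        "<!DOCTYPE FixedFloatSwap [\n"
      + "  <!ENTITY % leafDecls '<!ELEMENT Notional (#PCDATA)> <!ELEMENT Fixed_Rate (#PCDATA)>'>\n"
      + "  <!ELEMENT FixedFloatSwap (Notional, Fixed_Rate)>\n"
      + "  %leafDecls;\n"   // expands into the two element declarations
      + "]>\n";

    static boolean isValid(String body) {
        final boolean[] valid = {true};
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    valid[0] = false;
                }
            });
            reader.parse(new InputSource(new StringReader(DTD + body)));
        } catch (Exception e) {
            valid[0] = false;
        }
        return valid[0];
    }

    public static void main(String[] args) {
        // Valid only because %leafDecls; supplied the Notional and Fixed_Rate declarations.
        System.out.println(isValid(
            "<FixedFloatSwap><Notional>100</Notional><Fixed_Rate>5</Fixed_Rate></FixedFloatSwap>"));
        System.out.println(isValid(
            "<FixedFloatSwap><Notional>100</Notional></FixedFloatSwap>"));
    }
}
```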
Internet Technologies56 FpML is a Complete Description of the Trade Pool of modular components grouped into separate namespaces Date Schedule Product Rate Adjustable Period Notional Party Trade Trade ID Product Rate Adjustable Period Notional Party Vanilla Swap Vanilla Fixed Float Swap Cancellable Swaption FX Spot FX Outright FX Swap Forward Rate Agreement... Money Date
Internet Technologies57 <!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) > XML Document DTD Internal Parameter Entities
Internet Technologies58 External Parameter Entities and DTD Components Kevin Dick 123 Anywhere Lane Apt 1b Palo Alto CA USA Order.xml
Internet Technologies59 Kevin Dick 123 Not The Same Lane Work Place Palo Alto CA USA An order may have more than one address.
Internet Technologies60 440BX Motherboard MB PC-100 DIMM x CD-ROM 1 50 Several products may be purchased.
Internet Technologies61 Kevin S. Dick /01 The payment is with a Visa card. We want this document to be validated.
Internet Technologies62 order.dtd <!ATTLIST ORDER SOURCE (web | phone | retail) #REQUIRED CUSTOMERTYPE (consumer | business) "consumer" CURRENCY CDATA "USD" > Define an order based on other elements.
Internet Technologies63 %anAddress; %aLineItem; %aPayment; External parameter entity declaration % External parameter entity reference %
Internet Technologies64 address.dtd <!ELEMENT address (firstname, middlename?, lastname, street+, city, state,postal,country)> <!ATTLIST address ADDTYPE (bill | ship | billship) "billship"> <!ATTLIST street ORDER CDATA #IMPLIED>
Internet Technologies65 lineitem.dtd <!ATTLIST lineitem ID ID #REQUIRED> <!ATTLIST product CAT (CDROM|MBoard|RAM) #REQUIRED>
Internet Technologies66 <!ATTLIST card CARDTYPE (VISA|MasterCard|Amex) #REQUIRED> payment.dtd
XML Schemas
- Improve on DTDs
- XML Schema is the official name
- XSDL (XML Schema Definition Language) is the language used to create schema definitions
- XML syntax
- Can be used to more tightly constrain a document instance
- Supports namespaces
- Permits type derivation
- Harder than DTDs
Other Grammars Include
- RELAX
- TREX (James Clark; Tree Regular Expressions for XML)
- RELAX NG (RELAX and TREX combined into RELAX Next Generation)
- Schematron (“rule based” rather than “grammar based”; based on XSLT and XPath; see www.ascc.net/xml/schematron)
Internet Technologies69 XSDL - A Simple Purchase Order <purchaseOrder orderDate=" " xmlns=" xmlns:xsi=" xsi:schemaLocation=" po.xsd" >
Internet Technologies70 Dennis Scannel 175 Perry Lea Side Road Waterbury VT 15216
Internet Technologies71 Purchase Order XSDL <xs:schema xmlns:xs=" xmlns=" targetNamespace=" >
Validate.java

// Validate.java using Xerces
import java.io.*;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.XMLReaderFactory;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

public class Validate extends DefaultHandler {
    public static boolean valid = true;

    public void error(SAXParseException exception) {
        System.out.println("Received notification of a recoverable error." + exception);
        valid = false;
    }

    public void fatalError(SAXParseException exception) {
        System.out.println("Received notification of a non-recoverable error." + exception);
        valid = false;
    }

    public void warning(SAXParseException exception) {
        System.out.println("Received notification of a warning." + exception);
    }

    public static void main (String argv []) {
        if (argv.length != 1) {
            System.err.println ("Usage: java Validate filename.xml");
            System.exit (1);
        }
        try {
            // get a parser
            XMLReader reader = XMLReaderFactory.createXMLReader(
                "org.apache.xerces.parsers.SAXParser");
            // request validation
            reader.setFeature("
            reader.setFeature( "
            reader.setErrorHandler(new Validate());
            // associate an InputSource object with the file name
            InputSource inputSource = new InputSource(argv[0]);
            // go ahead and parse
            reader.parse(inputSource);
        }
        catch(org.xml.sax.SAXException e) {
            System.out.println("Error in parsing " + e);
            valid = false;
        }
        catch(java.io.IOException e) {
            System.out.println("Error in I/O " + e);
            System.exit(0);
        }
        System.out.println("Valid Document is " + valid);
    }
Internet Technologies81 XML Document <itemList xmlns:xsi=' xsi:noNamespaceSchemaLocation="itemList.xsd"> pen 5 eraser 7 stapler 2
Internet Technologies82 XSDL Grammar itemList.xsd <xsd:element ref="item" minOccurs="0" maxOccurs="3"/>
D: \examples\XSDL\testing>ant run
Buildfile: build.xml
run:
Running Validate.java on itemList-xsd.xml
Valid Document is true
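The effect of minOccurs="0" maxOccurs="3" can be tried with the javax.xml.validation API that ships with the JDK. The schema below is an assumed reconstruction built around the one surviving declaration from itemList.xsd. The name and quantity declarations and their types are guesses made for illustration.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

// Sketch only: everything except the item reference is hypothetical.
class ItemListXsdDemo {
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>\n"
      + "  <xsd:element name='itemList'>\n"
      + "    <xsd:complexType><xsd:sequence>\n"
      + "      <xsd:element ref='item' minOccurs='0' maxOccurs='3'/>\n"
      + "    </xsd:sequence></xsd:complexType>\n"
      + "  </xsd:element>\n"
      + "  <xsd:element name='item'>\n"
      + "    <xsd:complexType><xsd:sequence>\n"
      + "      <xsd:element name='name' type='xsd:string'/>\n"
      + "      <xsd:element name='quantity' type='xsd:short'/>\n"
      + "    </xsd:sequence></xsd:complexType>\n"
      + "  </xsd:element>\n"
      + "</xsd:schema>";

    // Builds an itemList holding the given number of identical items.
    static String itemList(int count) {
        StringBuilder sb = new StringBuilder("<itemList>");
        for (int i = 0; i < count; i++) {
            sb.append("<item><name>pen</name><quantity>5</quantity></item>");
        }
        return sb.append("</itemList>").toString();
    }

    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            // With no error handler set, validate() throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 4; n++) {
            System.out.println(n + " items: " + isValid(itemList(n)));
        }
    }
}
```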
Internet Technologies85 Another Example <myns:purchaseOrder orderDate=" " xmlns:myns=" xmlns:xsi= " xsi:schemaLocation= " po.xsd" >
Internet Technologies86 Dennis Scannel 175 Perry Lea Side Road Waterbury VT 05675A Note that there is a problem with this document.
Internet Technologies88 XSDL Grammar po.xsd <xs:schema xmlns:xs=" xmlns=" targetNamespace=" >
Internet Technologies91 <xs:attribute name="artist" type="xs:string" />
Running Validate
D:..\examples\XSDL\testing>ant run
Buildfile: build.xml
run:
Running Validate.java on po.xml
Received notification of a recoverable error.org.xml.sax.SAXParseException: cvc-datatype-valid.1.2.1: '05675A' is not a valid 'integer' value.
Received notification of a recoverable error.org.xml.sax.SAXParseException: cvc-type.3.1.3: The value '05675A' of element 'myns:postalCode' is not valid.
Valid Document is false
Fix the error and run again
D:\..\XSDL\testing>ant run
Buildfile: build.xml
run:
Running Validate.java on po.xml
Valid Document is true
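The datatype check behind this error can be reproduced in isolation. The one-element schema below is a hypothetical stand-in for po.xsd. It declares postalCode as xs:integer, so '05675A' is rejected while '05675' is accepted (leading zeros are legal in the integer lexical form).

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

// Sketch only: a single-element schema, not the lecture's po.xsd.
class PostalCodeTypeDemo {
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>\n"
      + "  <xs:element name='postalCode' type='xs:integer'/>\n"
      + "</xs:schema>";

    // Validates <postalCode>value</postalCode> against the schema above.
    static boolean isValid(String value) {
        String xml = "<postalCode>" + value + "</postalCode>";
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("05675A: " + isValid("05675A"));
        System.out.println("05675:  " + isValid("05675"));
    }
}
```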
Internet Technologies94 Introduce a Namespace Error <myns:purchaseOrder orderDate=" " xmlns:myns=" xmlns:xsi= " xsi:schemaLocation=" po.xsd" >
Internet Technologies95 Dennis Scannel 175 Perry Lea Side Road Waterbury VT 05675
And run validate
run:
Running Validate.java on po.xml
Received notification of a recoverable error.org.xml.sax.SAXParseException: cvc-elt.1: Cannot find the declaration of element 'myns:purchaseOrder'.
Valid Document is false
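A namespace mismatch of this kind can be sketched as follows. The namespace URIs http://example.org/po and http://example.org/other are placeholders (the real URIs were lost from the slides), and the schema is a cut-down, hypothetical purchase order. When the prefix in the document is bound to a URI other than the schema's targetNamespace, the validator finds no declaration for the root element.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

// Sketch only: placeholder namespace URIs and a minimal schema.
class NamespaceMismatchDemo {
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'\n"
      + "           targetNamespace='http://example.org/po'\n"
      + "           elementFormDefault='qualified'>\n"
      + "  <xs:element name='purchaseOrder'>\n"
      + "    <xs:complexType><xs:sequence>\n"
      + "      <xs:element name='postalCode' type='xs:integer'/>\n"
      + "    </xs:sequence></xs:complexType>\n"
      + "  </xs:element>\n"
      + "</xs:schema>";

    // Builds a purchase order whose myns prefix is bound to the given URI.
    static String order(String namespaceUri) {
        return "<myns:purchaseOrder xmlns:myns='" + namespaceUri + "'>"
            + "<myns:postalCode>05675</myns:postalCode>"
            + "</myns:purchaseOrder>";
    }

    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(order("http://example.org/po")));     // matches targetNamespace
        System.out.println(isValid(order("http://example.org/other")));  // no matching declaration
    }
}
```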
Code Generation
- Run JAXB against the .xsd file.
- The generated code presents an API that lets us process documents of that style.
Internet Technologies99 itemList.xsd again <xsd:element ref="item" minOccurs="0" maxOccurs="3"/>
Run xjc
D:..XSDL\testing>xjc itemList.xsd
D:\McCarthy\www\95-733\examples\XSDL\testing>java -jar D:\jwsdp-1.1\jaxb-1.0\lib\jaxb-xjc.jar itemList.xsd
parsing a schema...
compiling a schema...
generated\impl\ItemImpl.java
generated\impl\ItemListImpl.java
generated\impl\ItemListTypeImpl.java
generated\impl\ItemTypeImpl.java
generated\impl\NameImpl.java
generated\impl\QuantityImpl.java
generated\Item.java
generated\ItemList.java
generated\ItemListType.java
generated\ItemType.java
generated\Name.java
generated\ObjectFactory.java
generated\Quantity.java
generated\bgm.ser
generated\jaxb.properties
Write Java code that uses the new API.
Internet Technologies103 The build script used for these examples
Internet Technologies104 <fileset dir="D:/jwsdp-1.1/jaxp-1.2.2/lib/endorsed" includes="*.jar"/>