Effective XML
XML Developers Network of the Capital District
Elliotte Rusty Harold
elharo@metalab.unc.edu
http://www.cafeconleche.org/

Part I: Syntax

Item 1: Include an XML declaration. Optional, but treat it as required. Specifies the XML version and the character encoding. Very important for detecting the encoding. Identifies the file as XML when file and media type information is unavailable or unreliable.
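A minimal sketch of always emitting the declaration. It assumes the JDK's built-in JAXP serializer (javax.xml.transform). The element name greeting is illustrative only.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class XmlDeclarationDemo {

    // Serializes a tiny, illustrative document with an explicit XML declaration.
    public static String serialize() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("greeting")).setTextContent("Hello");

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no"); // write <?xml ...?>
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");           // name the encoding in it
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize());
    }
}
```

When writing to a file, give StreamResult an OutputStream rather than a Writer. That way the bytes actually match the declared encoding.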

Item 3: Stay with XML 1.0. XML 1.1 adds: new name characters; C0 control characters; C1 control characters; NEL as a line break; undeclaring namespace prefixes. It is incompatible with: most XML parsers; the W3C XML Schema and RELAX NG schema languages; XOM and JDOM.
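One concrete incompatibility: XML 1.1 allows C0 control characters such as &#1; as character references, but an XML 1.0 parser must reject them as a well-formedness error. The sketch below assumes the JDK's built-in DOM parser and shows only the XML 1.0 behavior.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class Xml10ControlCharDemo {

    // Returns true if the string parses as a well-formed document.
    public static boolean isWellFormed(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // &#1; is not a legal XML 1.0 character, even as a character reference.
        System.out.println(isWellFormed("<?xml version=\"1.0\"?><a>&#1;</a>")); // false
        // An ordinary character reference is fine.
        System.out.println(isWellFormed("<?xml version=\"1.0\"?><a>&#65;</a>")); // true
    }
}
```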

Part II: Structure

The XML Stack

Item 14: Allow all XML syntax: CDATA sections, entity references, processing instructions, comments, numeric character references, document type declarations. Different ways of representing the same core content; not different information.
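To see that these really are different spellings of the same content, the sketch below parses five variants with the JDK's DOM parser and compares their text. The element a and entity b are illustrative names. DOM's getTextContent skips comments and processing instructions, which is what lets the fourth form match.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class SameContentDemo {

    // Five spellings of the same content: the text "A&B".
    public static final String[] FORMS = {
        "<a>A&amp;B</a>",                                 // entity reference
        "<a>&#65;&#38;B</a>",                             // numeric character references
        "<a><![CDATA[A&B]]></a>",                         // CDATA section
        "<a>A<!-- note --><?app hint?>&amp;B</a>",        // comment and processing instruction
        "<!DOCTYPE a [<!ENTITY b 'B'>]><a>A&amp;&b;</a>"  // entity declared in the DTD
    };

    // Returns the root element's text content.
    public static String text(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        for (String xml : FORMS) {
            System.out.println(text(xml)); // A&B every time
        }
    }
}
```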

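Item 9: Distinguish text from markup. A DocBook element containing a CDATA section: <![CDATA[<value><double>28657</double></value>]]> The content is the text <value><double>28657</double></value>, not markup. This is the same: &lt;value&gt;&lt;double&gt;28657&lt;/double&gt;&lt;/value&gt;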
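A small check of this point with the JDK's DOM parser. DocBook's programlisting element is used as the container here purely for illustration. Both forms yield identical text, and neither contains a value element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class TextNotMarkupDemo {

    // Illustrative container element; the CDATA section holds tag-like text.
    public static final String CDATA_FORM =
        "<programlisting><![CDATA[<value><double>28657</double></value>]]></programlisting>";
    // The same text written with escaped reserved characters.
    public static final String ESCAPED_FORM =
        "<programlisting>&lt;value&gt;&lt;double&gt;28657&lt;/double&gt;&lt;/value&gt;</programlisting>";

    public static Element parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element a = parse(CDATA_FORM);
        Element b = parse(ESCAPED_FORM);
        System.out.println(a.getTextContent());                            // <value><double>28657</double></value>
        System.out.println(a.getTextContent().equals(b.getTextContent())); // true
        System.out.println(a.getElementsByTagName("value").getLength());   // 0: there is no value element
    }
}
```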

The reverse problem: tools that create XML from strings, such as tree-based editors like <oXygen/> or XML Spy, WYSIWYG applications like OpenOffice Writer, and programming APIs such as DOM, JDOM, and XOM. The tool automatically escapes reserved characters like <, >, or &. Just because something looks like an XML tag does not mean it is an XML tag.
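The reverse problem seen from the DOM side. The sketch below is a minimal illustration using the JDK's DOM and JAXP serializer; the element name p is illustrative. A string that looks like a tag is stored as a text node and escaped on output.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class EscapingDemo {

    // Puts a tag-like string into a text node and serializes the document.
    public static String serialize(String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element p = doc.createElement("p");
        p.appendChild(doc.createTextNode(text)); // stored as text, never as markup
        doc.appendChild(p);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // The serializer escapes < and & so the string stays text.
        System.out.println(serialize("<value> & more"));
    }
}
```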

Item 10: White space matters. Parsers report all white space in element content, including boundary white space. An xml:space attribute is for the client application only, not the parser. White space in attribute values is normalized. Parsers do not report white space in the prolog, the epilog, the document type declaration, or tags.
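These rules can be observed directly with the JDK's DOM parser. The sketch below uses the illustrative elements a and b. Indentation shows up as text nodes, boundary white space is kept, and a tab and newline inside an attribute value become spaces.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class WhiteSpaceDemo {

    // A tab and a newline inside an attribute value, plus indentation around <b>.
    public static final String XML = "<a x='1\t2\n3'>\n  <b> hi </b>\n</a>";

    public static Element root() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element a = root();
        // Attribute value normalization turns the tab and newline into spaces.
        System.out.println("[" + a.getAttribute("x") + "]"); // [1 2 3]
        // The indentation is reported as text nodes around <b>.
        System.out.println(a.getChildNodes().getLength());   // 3
        // Boundary white space inside <b> is kept.
        System.out.println("[" + a.getElementsByTagName("b").item(0).getTextContent() + "]"); // [ hi ]
    }
}
```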

Item 11: Make structure explicit through markup. Bad: Withdrawal 2003 12 15 200.00. Better: 2003-12-15 200.00, with each field marked up separately.
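The slide's original tags were not preserved, so the element names transaction, date, and amount below are hypothetical. This is a sketch of why explicit markup helps. With the unstructured form, code must guess fields by position. With the structured form, it asks for the amount by name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class ExplicitStructureDemo {

    // Hypothetical element names for illustration.
    public static final String BAD =
        "<transaction>Withdrawal 2003 12 15 200.00</transaction>";
    public static final String BETTER =
        "<transaction type='withdrawal'><date>2003-12-15</date><amount>200.00</amount></transaction>";

    static Element parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // Unstructured: the amount is whatever happens to be the fifth token.
    public static String amountFromBad() throws Exception {
        String[] tokens = parse(BAD).getTextContent().trim().split("\\s+");
        return tokens[4];
    }

    // Structured: the amount is found by name.
    public static String amountFromBetter() throws Exception {
        return parse(BETTER).getElementsByTagName("amount").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(amountFromBad());    // 200.00
        System.out.println(amountFromBetter()); // 200.00
    }
}
```

Both return the same value today. Only the structured version keeps working if a field is added or reordered.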

Item 12: Store metadata in attributes. Material the reader doesn't want to see: URLs, IDs, styles, revision dates, the author's name. But attributes allow no substructure (revision tracking, citations) and no multiple elements.

Item 13: Remember mixed content. Narrative documents; record-like documents. The RSS problem: Xerlin 1.3 released. Xerlin 1.3, an open source XML Editor written in Java, has been released. Users can extend the application via custom editor interfaces for specific DTDs. New features in version 1.3 include XML Schema support, WebDAV capabilities, and various user interface enhancements. Java 1.2 or later is required. http://www.cafeconleche.org/#news2003April7

What you really want is this: Xerlin 1.3, an open source XML Editor written in Java, has been released. Users can extend the application via custom editor interfaces for specific DTDs. New features in version 1.3 include: XML Schema support, WebDAV capabilities, various user interface enhancements. Java 1.2 or later is required.

What people do is this: <a href="http://www.xerlin.org">Xerlin 1.3</a>, an open source XML Editor written in Java, has been released. Users can extend the application via custom editor interfaces for specific DTDs. New features in version 1.3 include: <ul> <li>XML Schema support</li> <li>WebDAV capabilities</li> <li>Various user interface enhancements</li> </ul> Java 1.2 or later is required.
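The difference matters to software. This is a sketch with the JDK's DOM parser, reusing the XHTML ul and li names for illustration. Escaped HTML in an RSS-style description contains no elements at all. Real mixed content contains elements interleaved with text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class MixedContentDemo {

    // What feeds often do: the list is escaped, so it is only text.
    public static final String ESCAPED =
        "<description>New features include: "
      + "&lt;ul&gt;&lt;li&gt;XML Schema support&lt;/li&gt;&lt;/ul&gt;</description>";
    // Real mixed content: elements interleaved with text.
    public static final String MIXED =
        "<description>New features include: <ul><li>XML Schema support</li></ul></description>";

    // Counts the element descendants of the root element.
    public static int countElements(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        return root.getElementsByTagName("*").getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements(ESCAPED)); // 0
        System.out.println(countElements(MIXED));   // 2
    }
}
```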

Item 16: Prefer URLs to unparsed entities and notations. URLs are simple and well understood. Notations and unparsed entities are confusing and little used. URLs don’t require the DTD to be read. Many APIs don’t even support notations and unparsed entities.

Part III: Semantics

Item 17: Use processing instructions for process-specific content. Appropriate when the content: is for a very particular, even local, process; describes how a particular process acts on the data in the document; does not describe or add to the content itself; is a unit that can be treated in isolation; is not XML-like; applies to the entire document.

Processing instructions are not appropriate when the content is closely related to the content of the document itself, when its structure extends beyond a single processing instruction, or when it needs to be validated.
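A standard example is xml-stylesheet. It tells one kind of process (a browser) how to style the whole document without adding to its content. The sketch below assumes the JDK's DOM parser and reads the instruction's target and data from the prolog. The file name style.css is illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

public class ProcessingInstructionDemo {

    public static final String XML =
        "<?xml-stylesheet href=\"style.css\" type=\"text/css\"?><doc>content</doc>";

    // Returns the first processing instruction among the document's top-level children.
    public static ProcessingInstruction firstPi() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
                return (ProcessingInstruction) n;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        ProcessingInstruction pi = firstPi();
        System.out.println(pi.getTarget()); // xml-stylesheet
        System.out.println(pi.getData());   // href="style.css" type="text/css"
    }
}
```

Note that the data is a single opaque string, not parsed attributes. This is one reason PIs suit content that is not XML-like.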

Item 18: Include all information in instance documents. Not all parsers read the DTD, especially browsers. Beware of: default attribute values; parsed entity references; XInclude; ID type dependence (XPath, DOM, etc.).
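Default attribute values show the risk. In the sketch below (JDK DOM parser; the order vocabulary is hypothetical), the currency attribute exists only because the DTD was read. DOM flags it as not specified in the instance. A process that skips the DTD would never see it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DefaultAttributeDemo {

    // Hypothetical vocabulary: currency is supplied only by the DTD default.
    public static final String XML =
        "<!DOCTYPE order [<!ELEMENT order EMPTY>"
      + "<!ATTLIST order total CDATA #IMPLIED currency CDATA 'USD'>]>"
      + "<order total='200.00'/>";

    public static Element root() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element order = root();
        System.out.println(order.getAttribute("currency"));                    // USD
        // specified is false: the value came from the DTD, not the instance document.
        System.out.println(order.getAttributeNode("currency").getSpecified()); // false
        System.out.println(order.getAttributeNode("total").getSpecified());    // true
    }
}
```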

Item 19: Encode binary data using quoted printable and/or Base64. Quoted printable works well for mostly text; Base64 for non-text data. Can you link to the data with a URL instead?
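A minimal Base64 round trip using java.util.Base64 and the JDK's DOM. The data element and its encoding attribute are hypothetical names, not a standard vocabulary. The JDK has no built-in quoted-printable codec, so only Base64 is shown.

```java
import java.util.Arrays;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class Base64Demo {

    // Stores bytes as Base64 text inside a hypothetical <data> element.
    public static Document embed(byte[] bytes) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element data = doc.createElement("data");
        data.setAttribute("encoding", "base64");
        // The MIME encoder wraps lines at 76 characters, keeping long payloads readable.
        data.setTextContent(Base64.getMimeEncoder().encodeToString(bytes));
        doc.appendChild(data);
        return doc;
    }

    // Reads the bytes back; the MIME decoder ignores the line breaks.
    public static byte[] extract(Document doc) {
        return Base64.getMimeDecoder().decode(doc.getDocumentElement().getTextContent());
    }

    public static void main(String[] args) throws Exception {
        byte[] original = {0, 1, 2, (byte) 0xFF, 'X'};
        Document doc = embed(original);
        System.out.println(doc.getDocumentElement().getTextContent());
        System.out.println(Arrays.equals(original, extract(doc))); // true
    }
}
```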

Items 20-22: Use namespaces for modularity and extensibility. Not hard; simple cases can use one default namespace. http URIs are normally preferred. DTD validation is tricky. Code to namespace URIs, not prefixes. Avoid namespace prefixes in element content and attribute values.
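Coding to namespace URIs rather than prefixes means matching on the namespace URI plus local name. This is a sketch with the JDK's DOM parser. The namespace URI below is hypothetical. JAXP parsers are not namespace-aware unless asked.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class NamespaceDemo {

    // Hypothetical namespace URI for illustration.
    public static final String NS = "http://www.example.com/invoice";

    // True if the root element is {NS}invoice, whatever prefix the document chose.
    public static boolean isInvoice(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // off by default in JAXP
        Element root = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        return NS.equals(root.getNamespaceURI()) && "invoice".equals(root.getLocalName());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isInvoice("<invoice xmlns='" + NS + "'/>"));         // true: default namespace
        System.out.println(isInvoice("<inv:invoice xmlns:inv='" + NS + "'/>")); // true: any prefix
        System.out.println(isInvoice("<inv:invoice xmlns:inv='urn:other'/>"));  // false: same prefix, different URI
    }
}
```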

Item 23: Reuse XHTML for generic narrative content

Item 24: Choose the right schema language for the job: DTDs, the W3C XML Schema Language, RELAX NG, Schematron.

Item 25: Pretend there's no such thing as the PSVI. The Post-Schema-Validation Infoset adds types like int and gYear to elements. These are often not specific enough; element and attribute names are types.

Item 28: Use only what you need. You need: well-formed XML 1.0 and a parser. You probably need: namespaces. You may not need: DTDs, schemas, XInclude, WS-Kitchen-Sink, etc.

Item 29: Always use a parser. You can't use regular expressions, because of: detecting the encoding; comments and processing instructions that contain tags; CDATA sections; unexpected placement of spaces and line breaks within tags; default attribute values; character and entity references; malformed documents; the internal DTD subset. Why not use a parser? Unfamiliarity with parsers; too slow.
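A small demonstration of two of these traps (a comment and a CDATA section that contain tags). It compares a naive regular expression with the JDK's DOM parser. The item element is an illustrative name.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class RegexVsParserDemo {

    // Only the middle <item> is a real element; the others sit in a comment and a CDATA section.
    public static final String XML =
        "<doc><!-- <item>old</item> --><item>new</item><![CDATA[<item>fake</item>]]></doc>";

    public static int countWithRegex() {
        Matcher m = Pattern.compile("<item>(.*?)</item>").matcher(XML);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static int countWithParser() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)))
                .getElementsByTagName("item").getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countWithRegex());  // 3: fooled by the comment and CDATA section
        System.out.println(countWithParser()); // 1
    }
}
```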

Item 30: Layer Functionality

Items 31-33: Program to standard APIs. Easier to deploy in Java 1.4/1.5. Different implementations have different performance characteristics. SAX is fast; DOM interoperates. Semi-standard: JDOM, XOM. Bleeding edge: StAX, JAXB.
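A minimal SAX sketch using only the standard JAXP API. It streams through a document once and counts start-tags without building a tree, which is where SAX's speed comes from.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxCountDemo {

    // Counts every element in the document in a single streaming pass.
    public static int countElements(String xml) throws Exception {
        int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<a><b/><c><d/></c></a>")); // 4
    }
}
```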

Item 34: Read the complete DTD. Be conservative in what you generate; liberal in what you accept. Important content from the DTD: default attribute values, namespace declarations, entity references.

Item 35: Navigate with XPath. More robust against unexpected structure. Allows optimization by the engine. Easier to code; enhanced programmer productivity.
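A sketch using the JDK's javax.xml.xpath API. The statement vocabulary is hypothetical. One transaction is nested inside an extra section wrapper, and the //amount path still finds it, where hand-written child-by-child navigation would likely miss it.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathDemo {

    // Hypothetical statement; one transaction sits inside an unexpected wrapper.
    public static final String XML =
        "<statement>"
      + "<transaction><amount>200.00</amount></transaction>"
      + "<section><transaction><amount>50.00</amount></transaction></section>"
      + "</statement>";

    public static double total() throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // //amount matches at any depth, so the extra wrapper does not break the query.
        return (Double) xpath.evaluate("sum(//amount)",
                new InputSource(new StringReader(XML)), XPathConstants.NUMBER);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(total()); // 250.0
    }
}
```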

Item 36: Serialize XML with XML

Item 37: Validate inside your program with schemas
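A minimal sketch of validating in-process with the JDK's javax.xml.validation API and a W3C XML Schema. The one-element schema is illustrative only. The same pattern applies to other schema languages when a SchemaFactory for them is installed.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class ValidationDemo {

    // Illustrative schema: a single element holding a decimal number.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='amount' type='xs:decimal'/>"
      + "</xs:schema>";

    public static boolean isValid(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // e.getMessage() explains what was wrong
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<amount>200.00</amount>")); // true
        System.out.println(isValid("<amount>lots</amount>"));   // false
    }
}
```

In a real program, compile the Schema once and reuse it. Schema objects are thread-safe, while each thread should get its own Validator.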

Part IV: Implementation

Item 38: Write documents in Unicode. Prefer UTF-8: smaller in English, ASCII compatible. Normalization: É, ü, ì and so forth; NFC; ICU.
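Both points can be checked with the JDK alone. java.text.Normalizer is used here in place of ICU, which offers the same normalization forms. NFC turns an e followed by a combining acute accent into the single precomposed character é. For English text, UTF-8 needs half the bytes of UTF-16.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class UnicodeDemo {

    // "é" spelled as e followed by COMBINING ACUTE ACCENT (U+0301).
    public static final String DECOMPOSED = "e\u0301";

    public static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String nfc = toNfc(DECOMPOSED);
        System.out.println(DECOMPOSED.length() + " -> " + nfc.length()); // 2 -> 1
        System.out.println(nfc.equals("\u00e9"));                        // true
        // One byte per character in UTF-8 for ASCII text, two in UTF-16.
        System.out.println("Hello".getBytes(StandardCharsets.UTF_8).length);    // 5
        System.out.println("Hello".getBytes(StandardCharsets.UTF_16BE).length); // 10
    }
}
```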

Item 40: Avoid vendor lock-in. Beware: opaque, binary data used in place of marked-up text; over-abbreviated, non-obvious names like F17354 and grgyt; APIs that hide the XML; products that focus on the "Infoset"; alternate serializations of XML; patented formats.

Item 41: Hang on to your relational database

Item 42: Document namespaces with RDDL.
<!DOCTYPE html PUBLIC "-//XML-DEV//DTD XHTML RDDL 1.0//EN" "http://www.rddl.org/rddl-xhtml.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:rddl="http://www.rddl.org/">
MegaBank Statement Markup Language (MBSML)
This is the XML namespace for the <a href="http://developer.megabank.com/xml/">MegaBank Statement Markup Language</a>.
<rddl:resource xlink:type="simple" xlink:href="http://developer.megabank.com/xml/spec.html" xlink:role="http://www.w3.org/TR/html4/" xlink:arcrole="http://www.rddl.org/purposes#normative-reference">
The MegaBank Statement Markup Language Specification 1.0
</rddl:resource>

Item 43: Preprocess XSLT on the server side
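A sketch of the server-side half using the JDK's JAXP XSLT engine. The stylesheet and the greeting element are illustrative. A server (for example, a servlet; not shown) would write the returned HTML to the response, so the client never has to run XSLT itself.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltDemo {

    // Illustrative stylesheet: turns <greeting> into an HTML paragraph.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html'/>"
      + "<xsl:template match='/'><p><xsl:value-of select='greeting'/></p></xsl:template>"
      + "</xsl:stylesheet>";

    // Compiled once; Templates objects are thread-safe and can be shared across requests.
    static final Templates TEMPLATES = compile();

    static Templates compile() {
        try {
            return TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(XSL)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Transforms one document to HTML.
    public static String toHtml(String xml) throws Exception {
        Transformer t = TEMPLATES.newTransformer(); // not thread-safe: one per request
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toHtml("<greeting>Hello</greeting>"));
    }
}
```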

Item 44: Serve XML+CSS to the client. Supported by Safari, IE 5.0 and later, Mozilla, Netscape 6 and later, Konqueror, Opera, Firefox, OmniWeb.

Item 45: Pick the correct MIME type. application/xml, not text/xml! Don't use charset. Vocabulary-specific types: application/mathml+xml, image/svg+xml, application/xslt+xml.

Item 46: TagSoup Your HTML

Item 47: Catalog common resources.
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//OASIS//DTD DocBook XML V4.2//EN"
          uri="file:///opt/xml/docbook/docbookx.dtd"/>
</catalog>

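Item 50: Compress if space is a problem
// OutputFormat and XMLSerializer come from Apache Xerces (org.apache.xml.serialize, now deprecated)
import java.io.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.*;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.w3c.dom.Document;
// output: doc is an existing org.w3c.dom.Document
try (OutputStream out = new GZIPOutputStream(new FileOutputStream("data.xml.gz"))) {
    OutputFormat format = new OutputFormat(doc);
    XMLSerializer output = new XMLSerializer(out, format);
    output.serialize(doc);
} // closing the stream writes the gzip trailer
// input
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = factory.newDocumentBuilder();
try (InputStream in = new GZIPInputStream(new FileInputStream("data.xml.gz"))) {
    Document restored = parser.parse(in);
    // work with the document...
}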
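A self-contained variant that needs only the JDK. JAXP's Transformer stands in for Xerces' XMLSerializer. The bytes stay in memory instead of going to data.xml.gz, so the round trip can be checked. The statement and amount names are hypothetical sample data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class GzipDemo {

    // Serializes doc through a GZIP stream; closing the stream writes the gzip trailer.
    public static byte[] compress(Document doc) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (OutputStream out = new GZIPOutputStream(bytes)) {
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
        }
        return bytes.toByteArray();
    }

    // Parses straight from the decompressing stream.
    public static Document decompress(byte[] gz) throws Exception {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        }
    }

    // Builds a repetitive sample document (hypothetical names) that compresses well.
    public static Document sample(int n) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("statement");
        for (int i = 0; i < n; i++) {
            Element amount = doc.createElement("amount");
            amount.setTextContent("200.00");
            root.appendChild(amount);
        }
        doc.appendChild(root);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        byte[] gz = compress(sample(1000));
        Document back = decompress(gz);
        System.out.println(back.getElementsByTagName("amount").getLength()); // 1000
        System.out.println(gz.length + " compressed bytes");
    }
}
```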

To Learn More
This presentation: http://cafeconleche.org/slides/albany/effectivexml
Effective XML: 50 Specific Ways to Improve Your XML Documents, Elliotte Rusty Harold, Addison-Wesley, 2003, ISBN 0-321-15040-6, $44.99
http://cafeconleche.org/books/effectivexml
http://cafeconleche.org/books/
