Effective XML XML Developers Network of the Capital District Elliotte Rusty Harold


Part I: Syntax

Item 1: Include an XML declaration. Optional, but treat it as required. Specifies the XML version, the character encoding, and the standalone status. Very important for detecting the encoding. Identifies the file as XML when file and media type information is unavailable or unreliable.
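A minimal sketch of why the declaration matters for encoding detection, using the JDK's built-in DOM parser. The class name `DeclarationDemo` and the sample `name` element are illustrative assumptions, not part of the talk.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class: parses raw bytes the way a parser sees a file on disk.
final class DeclarationDemo {

    /** Returns the root element's text, or null if the parser rejects the bytes. */
    static String rootText(byte[] xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            return null; // the bytes were not well-formed in the encoding the parser assumed
        }
    }

    public static void main(String[] args) {
        String body = "<name>\u00C9lise</name>";
        // Same Latin-1 bytes, with and without an encoding declaration.
        byte[] declared = ("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" + body)
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] undeclared = body.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("declared:   " + rootText(declared));
        System.out.println("undeclared: " + rootText(undeclared));
    }
}
```

With the declaration the parser can decode the Latin-1 bytes correctly. Without it, the parser has to assume UTF-8 (or UTF-16), and the same bytes are not a valid UTF-8 sequence, so a conforming parser should reject them.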

Item 3: Stay with XML 1.0. XML 1.1 adds: new name characters; C0 control characters; C1 control characters; NEL; undeclaring namespace prefixes. It is incompatible with: most XML parsers; the W3C and RELAX NG schema languages; XOM and JDOM.

Part II: Structure

The XML Stack

Item 14: Allow all XML syntax: CDATA sections; entity references; processing instructions; comments; numeric character references; document type declarations. These are different ways of representing the same core content, not different information.

Item 9: Distinguish text from markup A DocBook element ]]> The content is: This is the same: <value> <double>28657</double> </value>
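The distinction between text and markup can be sketched with the JDK DOM parser. The `listing` wrapper element and the class name `TextVersusMarkup` are assumptions made for illustration; only the `double` content comes from the slide.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo: three spellings of the same text, plus real markup for contrast.
final class TextVersusMarkup {

    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("sample document failed to parse", e);
        }
    }

    /** Text content of the root element, as an application would see it. */
    static String text(String xml) {
        return root(xml).getTextContent();
    }

    /** Number of descendant elements under the root. */
    static int descendantElements(String xml) {
        return root(xml).getElementsByTagName("*").getLength();
    }

    public static void main(String[] args) {
        String cdata    = "<listing><![CDATA[<double>28657</double>]]></listing>";
        String escaped  = "<listing>&lt;double&gt;28657&lt;/double&gt;</listing>";
        String charRefs = "<listing>&#60;double&#62;28657&#60;/double&#62;</listing>";
        String markup   = "<listing><double>28657</double></listing>";

        // The first three carry identical text and no child elements.
        System.out.println(text(cdata).equals(text(escaped)));
        System.out.println(text(escaped).equals(text(charRefs)));
        // Only the last one actually contains a double element.
        System.out.println(descendantElements(markup));
    }
}
```

The CDATA section, the entity references, and the numeric character references all yield the same string of text. Only the unescaped form produces an element in the tree.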

The reverse problem. Tools that create XML from strings: tree-based editors such as XML Spy; WYSIWYG applications like OpenOffice Writer; programming APIs such as DOM, JDOM, and XOM. The tool automatically escapes reserved characters like <, >, or &. Just because something looks like an XML tag does not mean it is an XML tag.
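A short sketch of the reverse problem with the DOM API and the JDK's identity transformer: a string that looks like a tag is stored as text and escaped on output. The `para` element and the class name `EscapingDemo` are illustrative assumptions.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo: build a one-element document from a string and serialize it.
final class EscapingDemo {

    static String serializeWithText(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element para = doc.createElement("para");
            para.setTextContent(text); // stored as character data, never parsed as markup
            doc.appendChild(para);

            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    public static void main(String[] args) {
        // The "tag" comes out escaped, so no b element exists in the result.
        System.out.println(serializeWithText("<b>bold</b> & more"));
    }
}
```

If the intent was a real `b` element, the program has to create one with `createElement` rather than passing tag-shaped text.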

Item 10: White space matters. Parsers report all white space in element content, including boundary white space. An xml:space attribute is for the client application only, not the parser. White space in attribute values is normalized. Parsers do not report white space in the prolog, the epilog, the document type declaration, or tags.
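Two of these white space rules can be observed directly with the JDK DOM parser. This is a sketch; the sample documents and the class name `WhiteSpaceDemo` are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo of boundary white space and attribute value normalization.
final class WhiteSpaceDemo {

    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("sample document failed to parse", e);
        }
    }

    /** Number of child nodes of the root, including white-space-only text nodes. */
    static int childCount(String xml) {
        return root(xml).getChildNodes().getLength();
    }

    /** Value of the named attribute on the root after the parser's normalization. */
    static String attribute(String xml, String name) {
        return root(xml).getAttribute(name);
    }

    public static void main(String[] args) {
        // Text node, item element, text node: the indentation is reported.
        System.out.println(childCount("<list>\n  <item/>\n</list>"));
        // The literal line break inside the attribute value becomes a space.
        System.out.println(attribute("<a title=\"one\ntwo\"/>", "title"));
    }
}
```

Code that walks child nodes by position therefore has to expect white-space-only text nodes between elements.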

Item 11: Make structure explicit through markup Bad Withdrawal Better

Item 12: Store metadata in attributes. Material the reader doesn't want to see: URLs; IDs; styles; revision dates; author's name. No substructure: revision tracking; citations. No multiple elements.

Item 13: Remember mixed content Narrative documents Record-like documents The RSS problem Xerlin 1.3 released Xerlin 1.3, an open source XML Editor written in Java, has been released. Users can extend the application via custom editor interfaces for specific DTDs. New features in version 1.3 include XML Schema support, WebDAV capabilities, and various user interface enhancements. Java 1.2 or later is required.

What you really want is this: Xerlin 1.3, an open source XML Editor written in Java, has been released. Users can extend the application via custom editor interfaces for specific DTDs. New features in version 1.3 include: XML Schema support; WebDAV capabilities; various user interface enhancements. Java 1.2 or later is required.

What people do is this: <p><a href=" 1.3</strong></a>, an open source XML Editor written in Java, has been released. Users can extend the application via custom editor interfaces for specific DTDs. New features in version 1.3 include:</p> <ul> <li>XML Schema support</li> <li>WebDAV capabilities</li> <li>Various user interface enhancements</li> </ul> <p>Java 1.2 or later is required.</p>

Item 16: Prefer URLs to unparsed entities and notations. URLs are simple and well understood. Notations and unparsed entities are confusing and little used. URLs don't require the DTD to be read. Many APIs don't even support notations and unparsed entities.

Part III: Semantics

Item 17: Use processing instructions for process-specific content. For a very particular, even local, process. Describes how a particular process acts on the data in the document. Does not describe or add to the content itself. A unit that can be treated in isolation. Content is not XML-like. Applies to the entire document.
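As a sketch of how a process finds its own instruction, the following reads the first processing instruction in the prolog with the JDK DOM parser. The `xml-stylesheet` instruction is a common real-world case; the class name `InstructionDemo` and the `style.css` reference are assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

// Hypothetical demo: a processing instruction is a separate node, not document content.
final class InstructionDemo {

    /** Returns {target, data} of the first top-level processing instruction, or null. */
    static String[] firstInstruction(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
                    ProcessingInstruction pi = (ProcessingInstruction) n;
                    return new String[] { pi.getTarget(), pi.getData() };
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("sample document failed to parse", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml-stylesheet type=\"text/css\" href=\"style.css\"?><doc>content</doc>";
        String[] pi = firstInstruction(xml);
        System.out.println(pi[0]); // the target names the process
        System.out.println(pi[1]); // the data is an opaque string for that process
    }
}
```

The data is handed over as one unparsed string, which is why content that needs structure or validation is a poor fit for a processing instruction.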

Processing instructions are not appropriate when: the content is closely related to the content of the document itself; the structure extends beyond a single processing instruction; the content needs to be validated.

Item 18: Include all information in instance documents. Not all parsers read the DTD, especially browsers. Beware: default attribute values; parsed entity references; XInclude; ID type dependence (XPath, DOM, etc.).
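The default attribute hazard can be sketched as follows. The `note` element, its `lang` attribute, and the class name `DefaultedAttributeDemo` are invented for illustration; the parser is the JDK's built-in DOM parser, which reads the internal DTD subset.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo: an attribute value that exists only because the DTD was read.
final class DefaultedAttributeDemo {

    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("sample document failed to parse", e);
        }
    }

    /** True if the attribute is present but was supplied by the DTD rather than the instance. */
    static boolean isDefaulted(String xml, String name) {
        Attr attr = root(xml).getAttributeNode(name);
        return attr != null && !attr.getSpecified();
    }

    public static void main(String[] args) {
        String withDtd = "<!DOCTYPE note [<!ATTLIST note lang CDATA \"en\">]><note/>";
        String explicit = "<note lang=\"en\"/>";

        System.out.println(root(withDtd).getAttribute("lang"));
        System.out.println(isDefaulted(withDtd, "lang"));  // value came from the DTD
        System.out.println(isDefaulted(explicit, "lang")); // value is in the instance
    }
}
```

A processor that skips the DTD would see no `lang` attribute at all in the first document, which is the argument for writing the value in the instance.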

Item 19: Encode binary data using quoted printable and/or Base64. Quoted printable works well for mostly-text data. Base64 for non-text data. Can you link to the data with a URL instead?
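A minimal Base64 round trip using `java.util.Base64` from the JDK. The `data` element, its `encoding` attribute, and the class name `Base64Payload` are hypothetical names chosen for the sketch.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical demo: carry arbitrary bytes as element text.
final class Base64Payload {

    /** Wraps the bytes in a data element. Base64 output never contains < or &. */
    static String toElement(byte[] bytes) {
        return "<data encoding=\"base64\">"
                + Base64.getEncoder().encodeToString(bytes)
                + "</data>";
    }

    /** Parses the element and decodes its text back into bytes. */
    static byte[] fromElement(String xml) {
        try {
            String text = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
            return Base64.getDecoder().decode(text.trim());
        } catch (Exception e) {
            throw new IllegalStateException("could not read payload", e);
        }
    }

    public static void main(String[] args) {
        // Includes a NUL byte and the byte values of '<' and '&', none of which
        // could appear safely as raw element content.
        byte[] original = { 0, (byte) 0xFF, 0x3C, 0x26 };
        byte[] copy = fromElement(toElement(original));
        System.out.println(Arrays.equals(original, copy));
    }
}
```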

Items 20-22: Use namespaces for modularity and extensibility. Not hard; simple cases can use one default namespace. http URIs are normally preferred. DTD validation is tricky. Code to namespace URIs, not prefixes. Avoid namespace prefixes in element content and attribute values.
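"Code to namespace URIs, not prefixes" might look like this with a namespace-aware DOM parser. The namespace URI `http://example.com/ns/statement`, the `statement` element, and the class name `NamespaceLookup` are assumptions for the sketch.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical demo: look elements up by (namespace URI, local name), never by prefix.
final class NamespaceLookup {

    // Made-up namespace URI used only in this example.
    static final String NS = "http://example.com/ns/statement";

    static int countStatements(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in JAXP
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getElementsByTagNameNS(NS, "statement")
                    .getLength();
        } catch (Exception e) {
            throw new IllegalStateException("sample document failed to parse", e);
        }
    }

    public static void main(String[] args) {
        // Same element, three spellings: default namespace, prefix, no namespace.
        System.out.println(countStatements("<statement xmlns=\"" + NS + "\"/>"));
        System.out.println(countStatements("<mb:statement xmlns:mb=\"" + NS + "\"/>"));
        System.out.println(countStatements("<statement/>"));
    }
}
```

The first two documents match regardless of prefix, and the unqualified `statement` element is correctly treated as a different name.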

Item 23: Reuse XHTML for generic narrative content

Item 24: Choose the right schema language for the job: DTDs; the W3C XML Schema Language; RELAX NG; Schematron.

Item 25: Pretend there's no such thing as the PSVI (Post Schema Validation Infoset). Adds types like int and gYear to elements. Often not specific enough. Element/attribute names are types.

Item 28: Use only what you need. You need: well-formed XML 1.0; a parser. You probably need: namespaces. You may not need: DTDs; schemas; XInclude; WS-Kitchen-Sink; etc.

Item 29: Always use a parser. Regular expressions can't handle: detecting encoding; comments and processing instructions that contain tags; CDATA sections; unexpected placement of spaces and line breaks within tags; default attribute values; character and entity references; malformed documents; the internal DTD subset. Why not use a parser? Unfamiliarity with parsers; "too slow".
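A small sketch of two of the failure cases listed above, comparing a naive regular expression with the JDK DOM parser. The `order` and `item` elements and the class name `RegexVersusParser` are made up for the example.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical demo: counting item elements the wrong way and the right way.
final class RegexVersusParser {

    // One real item, one inside a comment, one inside a CDATA section.
    static final String SAMPLE =
            "<order><!-- <item>draft</item> --><item>final</item><![CDATA[<item>]]></order>";

    /** Counts every textual occurrence of the start-tag, wherever it appears. */
    static int regexCount(String xml) {
        Matcher m = Pattern.compile("<item>").matcher(xml);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    /** Counts actual item elements in the parsed document. */
    static int parserCount(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getElementsByTagName("item")
                    .getLength();
        } catch (Exception e) {
            throw new IllegalStateException("sample document failed to parse", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("regex:  " + regexCount(SAMPLE));
        System.out.println("parser: " + parserCount(SAMPLE));
    }
}
```

The regular expression also counts the tag-shaped text in the comment and the CDATA section; the parser reports only the one real element.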

Item 30: Layer Functionality

Items 31-33: Program to standard APIs. Easier to deploy in Java 1.4/1.5. Different implementations have different performance characteristics. SAX is fast. DOM interoperates. Semi-standard: JDOM; XOM. Bleeding edge: StAX; JAXB.
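As one illustration of programming to a standard API, here is a minimal SAX sketch written only against the JAXP and `org.xml.sax` interfaces, so any conforming parser implementation can sit underneath. The class name `SaxNames` and the sample input are assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: collect element names in document order with a streaming handler.
final class SaxNames {

    static List<String> elementNames(String xml) {
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                names.add(qName); // called once per start-tag as the parser streams the input
            }
        };
        try {
            // The factory picks whichever SAX implementation is installed.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("sample document failed to parse", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames("<a><b/><c><b/></c></a>"));
    }
}
```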

Item 34: Read the complete DTD. Be conservative in what you generate; liberal in what you accept. Important content from the DTD: default attribute values; namespace declarations; entity references.

Item 35: Navigate with XPath. More robust against unexpected structure. Allows optimization by the engine. Easier to code; enhanced programmer productivity.
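A brief sketch of XPath navigation with the JDK's `javax.xml.xpath` API. The `catalog` document and the class name `XPathLookup` are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical demo: select by what the data means, not by child position.
final class XPathLookup {

    // The stray note element would break code that assumes "second child is the second book".
    static final String CATALOG =
            "<catalog>"
            + "<book id=\"b1\"><title>First</title></book>"
            + "<note>reordered last week</note>"
            + "<book id=\"b2\"><title>Second</title></book>"
            + "</catalog>";

    /** Evaluates the expression against the document and returns the string result. */
    static String evaluate(String expression, String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate("/catalog/book[@id='b2']/title", CATALOG));
        System.out.println(evaluate("count(//book)", CATALOG));
    }
}
```

Because the expression names what it wants, the extra `note` element and any white space between elements do not affect the result.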

Item 36: Serialize XML with XML

Item 37: Validate inside your program with schemas
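In-program validation might be sketched with the JDK's `javax.xml.validation` API and the W3C XML Schema language. The one-element schema, the `amount` element, and the class name `InProgramValidation` are assumptions; a real application would load its schema from a file or resource.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo: check input against a schema before acting on it.
final class InProgramValidation {

    // Made-up schema: a single amount element holding a decimal number.
    static final String XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
            + "<xs:element name=\"amount\" type=\"xs:decimal\"/>"
            + "</xs:schema>";

    /** Returns true if the document is valid against the schema, false otherwise. */
    static boolean isValid(String xsd, String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
            // With no error handler installed, the validator throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // invalid or malformed document (or a broken schema)
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(XSD, "<amount>12.50</amount>"));
        System.out.println(isValid(XSD, "<amount>twelve</amount>"));
    }
}
```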

Part IV: Implementation

Item 38: Write documents in Unicode. Prefer UTF-8: smaller in English; ASCII compatible. Normalization: É, ü, ì, and so forth; NFC; ICU.
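A short sketch of NFC normalization and UTF-8 sizes. The slide mentions ICU; this example swaps in the JDK's own `java.text.Normalizer` so that it runs without extra libraries. The class name `UnicodeDemo` is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

// Hypothetical demo: composed and decomposed forms of the same character.
final class UnicodeDemo {

    /** Normalizes text to Unicode Normalization Form C (composed). */
    static String toNfc(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    /** Number of bytes the text occupies when encoded as UTF-8. */
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String decomposed = "E\u0301"; // E followed by a combining acute accent
        String composed = "\u00C9";    // precomposed E with acute

        // The two strings render the same but compare unequal until normalized.
        System.out.println(decomposed.equals(composed));
        System.out.println(toNfc(decomposed).equals(composed));

        // ASCII stays one byte per character in UTF-8; the accented letter takes two.
        System.out.println(utf8Length("E"));
        System.out.println(utf8Length(composed));
    }
}
```

Normalizing before comparing or writing avoids documents in which two visually identical names fail to match.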

Item 40: Avoid vendor lock-in. Beware: opaque binary data used in place of marked-up text; over-abbreviated, unobvious names like F17354 and grgyt; APIs that hide the XML; products that focus on the "Infoset"; alternate serializations of XML; patented formats.

Item 41: Hang on to your relational database

Item 42: Document Namespaces with RDDL <!DOCTYPE html PUBLIC "-//XML-DEV//DTD XHTML RDDL 1.0//EN" " <html xmlns=" xmlns:xlink=" xmlns:rddl=" MegaBank Statement Markup Language (MBSML) This is the XML namespace for the <a href=" Statement Markup Language. <rddl:resource xlink:type="simple" xlink:href=" xlink:role=" xlink:arcrole =" > The MegaBank Statement Markup Language Specification 1.0

Item 43: Preprocess XSLT on the server side
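Server-side preprocessing can be sketched with the JDK's `javax.xml.transform` API: the server applies the stylesheet and sends the client plain HTML. The tiny stylesheet, the `greeting` element, and the class name `ServerSideXslt` are all made up for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo: transform XML to HTML before it leaves the server.
final class ServerSideXslt {

    // Made-up stylesheet that turns a greeting element into an HTML paragraph.
    static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"html\" indent=\"no\"/>"
            + "<xsl:template match=\"/greeting\"><p><xsl:value-of select=\".\"/></p></xsl:template>"
            + "</xsl:stylesheet>";

    /** Applies the stylesheet to the document and returns the serialized result. */
    static String transform(String xslt, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // In a servlet the result would be written to the response instead of printed.
        System.out.println(transform(STYLESHEET, "<greeting>Hello</greeting>"));
    }
}
```

In a long-running server it is usually worth compiling the stylesheet once into a `javax.xml.transform.Templates` object and reusing it across requests.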

Item 44: Serve XML+CSS to the client. Supported by: Safari; IE 5.0 and later; Mozilla; Netscape 6 and later; Konqueror; Opera; Firefox; OmniWeb.

Item 45: Pick the correct MIME type. application/xml, not text/xml! Don't use charset. Also: application/mathml+xml; image/svg+xml; application/xslt+xml.

Item 46: TagSoup Your HTML

Item 47: Catalog common resources
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//OASIS//DTD DocBook XML V4.2//EN"
          uri="file:///opt/xml/docbook/docbookx.dtd"/>
</catalog>

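Item 50: Compress if space is a problem
import java.io.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.apache.xml.serialize.OutputFormat;   // Xerces serializer classes
import org.apache.xml.serialize.XMLSerializer;
import org.w3c.dom.Document;

// output: "document" is an existing org.w3c.dom.Document
OutputStream fout = new FileOutputStream("data.xml.gz");
OutputStream out = new GZIPOutputStream(fout);
OutputFormat format = new OutputFormat(document);
XMLSerializer output = new XMLSerializer(out, format);
output.serialize(document);
out.close(); // finishes the gzip stream and flushes it to disk

// input
InputStream fin = new FileInputStream("data.xml.gz");
InputStream in = new GZIPInputStream(fin);
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = factory.newDocumentBuilder();
Document doc = parser.parse(in);
// work with the document...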

To Learn More This Presentation: effectivexml Effective XML: 50 Specific Ways to Improve Your XML Documents Elliotte Rusty Harold Addison-Wesley, 2003 ISBN $ effectivexml