Effective XML Elliotte Rusty Harold


Part 0: Should We Use XML?

The XML Backlash “With proper mark-up/logic separation, a POJO data model, and a refreshing lack of XML, Apache Wicket makes developing web-apps simple and enjoyable again. Swap the boilerplate, complex debugging and brittle code for powerful, reusable components written with plain Java and HTML.” -- Apache Wicket

Choose XML ● For data that must be exchanged ● Or extended ● Or stored

Don’t Choose XML for ● Purely local, transient data (e.g. internal method arguments) ● RPC is an edge case

Why Use XML ● Well-defined, well understood ● Secure ● Extensible ● Fast ● Easy ● Robust ● Internationalizable ● Platform independent ● Language independent ● Not executable ● Standard parsers easily available

Avoid ● JSON ● YAML ● Java Properties ● Custom syntax ● Etc.

Why? 2 usually orthogonal reasons ● Mixing Data with Code is Bad – Unportable data – Opens big security holes – This is why you want to use XML instead of Ruby, Python, PHP, etc. ● Weak Parsers – Bugs and security holes – Not internationalizable – This is why you don’t want to use YAML, custom file formats parsed by regular expressions, etc.

Limited Use Cases ● Works for: – Lists – Maps – Sets – Simple config files ● Not so well for: – Trees – Networks – Narrative data – Annotated data

Choose the right tools: ● XPath, XSLT, XQuery ● E4X, XOM, JDOM ● RELAX NG ● Avoid – Regular expressions – DOM – W3C XSD Schemas

Part I: Syntax

Stay with XML 1.0 ● XML 1.1: – New name characters – C0 control characters – C1 control characters – NEL – Undeclare namespace prefixes ● Incompatible with: – Most XML parsers – W3C and RELAX NG schema languages – XOM, JDOM – Many browsers

Part II: Structure

The XML Stack

Allow All XML syntax ● CDATA sections ● Entity references ● Processing instructions ● Comments ● Numeric character references ● Document type declarations ● Different ways of representing the same core content; not different information

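Distinguish text from markup ● A DocBook element containing a CDATA section: <![CDATA[<value> <double>28657</double> </value>]]> ● The content is: <value> <double>28657</double> </value> ● This is the same: &lt;value&gt; &lt;double&gt;28657&lt;/double&gt; &lt;/value&gt;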
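
The point of this slide can be checked with a small experiment. The following is a minimal sketch using the JDK's built-in DOM parser; the `listing` element name and the `TextVersusMarkup` class are hypothetical, chosen only for illustration. It shows that a CDATA section and escaped character references deliver the same text to the application.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class TextVersusMarkup {
    /** Parses the XML string and returns the text content of the root element. */
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // "listing" is a made-up element name, not taken from the slides.
        String cdata = "<listing><![CDATA[<double>28657</double>]]></listing>";
        String escaped = "<listing>&lt;double&gt;28657&lt;/double&gt;</listing>";
        // Both forms carry the same characters; neither contains a <double> element.
        System.out.println(rootText(cdata));
        System.out.println(rootText(cdata).equals(rootText(escaped)));
    }
}
```

In both documents the root element has only text inside it, so code that looks for a `double` child element would find nothing.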

The reverse problem ● Tools that create XML from strings: – Tree-based editors like <oXygen/> or XML Spy – WYSIWYG applications like OpenOffice Writer – Programming APIs such as DOM, JDOM, and XOM ● The tool automatically escapes reserved characters like <, >, or &. ● Just because something looks like an XML tag does not mean it is an XML tag.

White space matters ● Parsers report all white space in element content, including boundary white space ● An xml:space attribute is for the client application only, not the parser ● White space in attribute values is normalized ● Parsers do not report white space in the prolog, epilog, the document type declaration, and tags.
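
Two of these rules are easy to observe directly. The following is a minimal sketch with the JDK's default, non-validating DOM parser; the element and attribute names and the `WhiteSpaceDemo` class are hypothetical. It counts the child nodes of an element with and without boundary white space, then reads back an attribute value that contained a literal line break.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class WhiteSpaceDemo {
    /** Parses the XML string with a default DOM parser and returns the root element. */
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // Boundary white space around <b/> is reported as two extra text nodes.
        System.out.println(root("<a>\n  <b/>\n</a>").getChildNodes().getLength());
        System.out.println(root("<a><b/></a>").getChildNodes().getLength());
        // A literal line break inside an attribute value is normalized to a space.
        System.out.println(root("<a x=\"one\ntwo\"/>").getAttribute("x"));
    }
}
```

Code that walks child nodes by index therefore needs to skip or tolerate white-space-only text nodes.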

Make structure explicit through markup Bad Withdrawal Better

Store metadata in attributes Material the reader doesn’t want to see URLs IDs Styles Revision dates Author’s name No substructure Revision tracking Citations Single item only

Remember mixed content ● Narrative documents ● Record-like documents ● The RSS problem: Xerlin 1.3 released Xerlin 1.3, an open source XML Editor written in Java, has been released. Users can extend the application via custom editor interfaces for specific DTDs. New features in version 1.3 include XML Schema support, WebDAV capabilities, and various user interface enhancements. Java 1.2 or later is required.

What you really want is this: Xerlin 1.3, an open source XML Editor written in Java, has been released. Users can extend the application via custom editor interfaces for specific DTDs. New features in version 1.3 include: XML Schema support WebDAV capabilities Various user interface enhancements Java 1.2 or later is required.

What people do is this: <p><a href=" 1.3</strong></a>, an open source XML Editor written in Java, has been released. Users can extend the application via custom editor interfaces for specific DTDs. New features in version 1.3 include:</p> <ul> <li>XML Schema support</li> <li>WebDAV capabilities</li> <li>Various user interface enhancements</li> </ul> <p>Java 1.2 or later is required.</p>

Prefer URLs to unparsed entities and notations ● URLs are simple and well understood ● Notations and unparsed entities are confusing and little used ● URLs don’t require the DTD to be read ● Many APIs don’t even support notations and unparsed entities

Part III: Semantics

Use processing instructions for process-specific content ● For a very particular, even local, process ● Describes how a particular process acts on the data in the document ● Does not describe or add to the content itself ● A unit that can be treated in isolation ● Content is not XML-like ● Applies to the entire document

Processing instructions are not appropriate when: ● Content is closely related to the content of the document itself ● Structure extends beyond a single processing instruction ● Needs to be validated

Include all information in instance documents ● Not all parsers read the DTD – Especially browsers ● Beware: – Default attribute values – Parsed entity references – XInclude – ID type dependence (XPath, DOM, etc.)

Encode binary data using quoted printable and/or Base64 ● Quoted printable works well for mostly text ● Base64 for non-text data ● Can you link to the data with a URL instead? ● Can you bundle the data with XML using zip, jar, XOP, or MIME?
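
The Base64 option can be sketched with `java.util.Base64` from the JDK. The `payload` element, its `encoding` attribute, and the `Base64Payload` class are hypothetical names used only for illustration. The sketch wraps arbitrary bytes in an element and recovers them after parsing.

```java
import java.io.StringReader;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class Base64Payload {
    /** Wraps raw bytes in a (hypothetical) payload element as Base64 text. */
    static String wrap(byte[] data) {
        // The Base64 alphabet contains no <, > or &, so no escaping is needed.
        return "<payload encoding=\"base64\">"
                + Base64.getEncoder().encodeToString(data)
                + "</payload>";
    }

    /** Parses the element and decodes its text content back into bytes. */
    static byte[] unwrap(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            String text = doc.getDocumentElement().getTextContent().trim();
            return Base64.getDecoder().decode(text);
        } catch (Exception e) {
            throw new IllegalStateException("could not decode payload", e);
        }
    }

    public static void main(String[] args) {
        byte[] original = {0, (byte) 0xFF, 60, 38}; // includes bytes for '<' and '&'
        String xml = wrap(original);
        System.out.println(xml);
        System.out.println(java.util.Arrays.equals(original, unwrap(xml)));
    }
}
```

Base64 grows the data by roughly a third, which is one reason the slide asks whether a link or a bundled file would serve instead.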

Use namespaces for modularity and extensibility ● Simple cases can use one default namespace ● http URIs are normally preferred ● DTD validation is tricky ● Code to namespace URIs, not prefixes ● Avoid namespace prefixes in element content and attribute values
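
"Code to namespace URIs, not prefixes" can be illustrated with the JDK's DOM API. In this minimal sketch the namespace URI `http://example.com/ns/statement`, the `Statement` element, and the `NamespaceLookup` class are all hypothetical. The lookup matches on URI and local name, so it gives the same answer whatever prefix the document author picked.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class NamespaceLookup {
    // Hypothetical namespace URI, used only for this illustration.
    static final String STATEMENT_NS = "http://example.com/ns/statement";

    /** Counts Statement elements in the namespace, regardless of prefix. */
    static int countStatements(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP parsers ignore namespaces by default
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagNameNS(STATEMENT_NS, "Statement").getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String prefixed = "<mb:Statement xmlns:mb=\"" + STATEMENT_NS + "\"/>";
        String defaulted = "<Statement xmlns=\"" + STATEMENT_NS + "\"/>";
        String unqualified = "<Statement/>";
        System.out.println(countStatements(prefixed));
        System.out.println(countStatements(defaulted));
        System.out.println(countStatements(unqualified));
    }
}
```

A lookup on the qualified name `mb:Statement` would miss the second document even though both mean the same thing.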

Reuse XHTML for generic narrative content %xhtml1;

Choose the right schema language for the job ● DTDs ● The W3C XML Schema Language ● RELAX NG ● Schematron

Use only what you need ● You need: – Well-formed XML 1.0 – A parser ● You probably need: – Namespaces ● You may not need: – DTDs – Schemas – XInclude – SOAP – WS-Kitchen-Sink – etc.

Always use a parser ● Can’t use regular expressions: – Detecting encoding – Comments and processing instructions that contain tags – CDATA sections – Unexpected placement of spaces and line breaks within tags – Default attribute values – Character and entity references – Malformed documents – Internal DTD Subset ● Why not? – Unfamiliarity with parsers – Too slow
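
Two of the failure cases listed here, comments and CDATA sections that contain tags, can be reproduced in a few lines. This is a minimal sketch; the `list` and `item` element names and the `ParserVersusRegex` class are hypothetical. It counts `item` elements once with a naive regular expression and once with the JDK's DOM parser.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class ParserVersusRegex {
    /** Naive approach: counts every occurrence of the literal start-tag. */
    static int countWithRegex(String xml) {
        Matcher matcher = Pattern.compile("<item>").matcher(xml);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    /** Parser approach: counts only real item elements. */
    static int countWithParser(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("item").getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // One real element, one tag inside a comment, one inside a CDATA section.
        String xml = "<list><!-- <item>old</item> --><item>new</item>"
                + "<![CDATA[<item>raw</item>]]></list>";
        System.out.println("regex:  " + countWithRegex(xml));
        System.out.println("parser: " + countWithParser(xml));
    }
}
```

The regular expression reports three matches where the document contains a single `item` element.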

Layer Functionality

Program to standard APIs ● Easier to deploy in Java 1.4/1.5 ● Different implementations have different performance characteristics ● SAX is fast ● DOM interoperates

Program to non-standard APIs for ease of development ● JDOM, XOM ● E4X

Read the complete DTD ● Be conservative in what you generate; liberal in what you accept ● Important content from DTD: – Default attribute values – Namespace declarations – Entity references – ID types

Navigate with XPath ● More robust against unexpected structure ● Allows optimization by engine ● Easier to code; enhanced programmer productivity ● Might be slower
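
The robustness claim can be sketched with `javax.xml.xpath` from the JDK. The RSS-like documents, the extra `section` wrapper, and the `XPathNavigation` class below are hypothetical. The same expression finds the title in both documents, even though the second one nests the item one level deeper than the code's author might have expected.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class XPathNavigation {
    /** Evaluates the XPath expression and returns its string value. */
    static String evaluate(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String flat = "<rss><channel><item><title>Xerlin 1.3 released</title>"
                + "</item></channel></rss>";
        // Hypothetical variant with an unexpected wrapper element around the item.
        String nested = "<rss><channel><section><item><title>Xerlin 1.3 released</title>"
                + "</item></section></channel></rss>";
        System.out.println(evaluate(flat, "//item/title"));
        System.out.println(evaluate(nested, "//item/title"));
    }
}
```

Equivalent hand-written tree walking that assumes `item` is a direct child of `channel` would fail on the second document.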

Validate inside your program with schemas
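
One way to do this in Java is the `javax.xml.validation` API. The JDK ships with support for the W3C XML Schema Language only, so this minimal sketch uses a tiny inline XSD; validating against RELAX NG would need a third-party library. The `amount` element, the schema text, and the `InProgramValidation` class are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

final class InProgramValidation {
    // Hypothetical one-element schema: <amount> must contain a decimal number.
    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='amount' type='xs:decimal'/>"
            + "</xs:schema>";

    /** Compiles the inline schema; a failure here is a programming error. */
    static Schema compileSchema() {
        try {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema itself is broken", e);
        }
    }

    /** Returns true if the document is valid against the inline schema. */
    static boolean isValid(String xml) {
        Validator validator = compileSchema().newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed or invalid input
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<amount>200.00</amount>"));
        System.out.println(isValid("<amount>lots</amount>"));
    }
}
```

Validating at the point of input lets the rest of the program assume the constraints the schema expresses.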

Part IV: Implementation

Write documents in Unicode ● Prefer UTF-8 – Smaller in English – ASCII compatible ● Normalization – É, ü, ì and so forth – NFC – ICU
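
The normalization issue can be shown with `java.text.Normalizer`, which is used here in place of the ICU library named on the slide because it ships with the JDK. The `NfcDemo` class name is hypothetical. The sketch composes "E" followed by a combining acute accent into the single precomposed character É.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

final class NfcDemo {
    /** Returns the string in Unicode Normalization Form C (composed). */
    static String nfc(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String decomposed = "E\u0301";  // E + COMBINING ACUTE ACCENT
        String composed = "\u00C9";     // LATIN CAPITAL LETTER E WITH ACUTE
        // The two strings look identical on screen but compare unequal.
        System.out.println(decomposed.equals(composed));
        // After NFC normalization they are the same sequence of characters.
        System.out.println(nfc(decomposed).equals(composed));
        // Normalize before encoding the document as UTF-8.
        byte[] utf8 = nfc(decomposed).getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length);
    }
}
```

Normalizing before writing keeps string comparisons, searches, and name matching consistent across documents produced by different tools.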

Avoid Vendor Lock-in; Beware ● Opaque, binary data used in place of marked up text ● Over-abbreviated, non-obvious names like F17354 and grgyt ● APIs that hide the XML ● Products that focus on the "Infoset" ● Alternate serializations of XML ● Patented formats

Hang on to your relational database ● For tabular data ● But consider native XML databases going forward

Document Namespaces with RDDL <!DOCTYPE html PUBLIC "-//XML-DEV//DTD XHTML RDDL 1.0//EN" " <html xmlns=" xmlns:xlink=" xmlns:rddl=" MegaBank Statement Markup Language (MBSML) This is the XML namespace for the <a href=" Statement Markup Language. <rddl:resource xlink:type="simple" xlink:href=" xlink:role=" xlink:arcrole =" > The MegaBank Statement Markup Language Specification 1.0

Pick the correct MIME type ● application/xml ● Not text/xml! ● Don't use charset ● application/mathml+xml ● image/svg+xml ● application/xslt+xml

TagSoup Your HTML

Catalog common resources <catalog xmlns= "urn:oasis:names:tc:entity:xmlns:xml:catalog" > <public publicId= "-//OASIS//DTD DocBook XML V4.2//EN" uri= "file:///opt/xml/docbook/docbookx.dtd"/> </catalog>

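Compress if space is a problem

import java.io.*;
import java.util.zip.*;
import javax.xml.parsers.*;
import org.w3c.dom.Document;
import org.apache.xml.serialize.*; // Xerces-specific OutputFormat and XMLSerializer

// output: serialize an existing DOM Document named document through a gzip stream
OutputStream fout = new FileOutputStream("data.xml.gz");
OutputStream out = new GZIPOutputStream(fout);
OutputFormat format = new OutputFormat(document);
XMLSerializer output = new XMLSerializer(out, format);
output.serialize(document);
out.close(); // finishes the gzip stream and closes the file

// input: the parser reads the decompressed bytes like any other stream
InputStream fin = new FileInputStream("data.xml.gz");
InputStream in = new GZIPInputStream(fin);
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = factory.newDocumentBuilder();
Document doc = parser.parse(in);
// work with the document...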

To Learn More ● Effective XML: 50 Specific Ways to Improve Your XML Documents ● Elliotte Rusty Harold ● Addison-Wesley, 2003