1 MicroXML: Who, What, When, Where, Why Copyright 2012 by John Cowan Licensed under CC-BY-SA 3.0.


2 Who, When, Where Two blog posts by James Clark in December 2010 Defining MicroXML as a concept Specifying what's in, what's out, and a grammar About ten other blog posts by various people, including me There is now a W3C Community Group, with Uche Ogbuji as chair and James and me as co-editors

3 Why: tentative design goals The syntax of MicroXML shall be a subset of XML 1.0 MicroXML shall define a data model as well as a syntax MicroXML shall be dramatically simpler than XML as regards its specification, syntax and data model – This applies even though XML tools already exist – SGML tools exist, but XML was not pointless – “To boldly go where no XML has gone before”

4 Why: tentative design goals MicroXML shall be designed to complement rather than replace XML, JSON and HTML MicroXML shall support the needs of documents, in particular mixed content MicroXML shall support Unicode MicroXML shall support the use of text editors for authoring MicroXML shall be able to straightforwardly represent HTML

5 Two example documents Hello world ! Habe nun, ach! Philosophie, Juristerei, und Medizin und leider auch Theologie durchaus studiert mit heißem Bemüh'n.

6 What's in like Flynn Elements and attributes ASCII name characters Unicode character content and attribute values A single normative data model Hex character references &lt; &gt; &amp; &quot; &apos;
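As an illustration of the reference forms this slide lists as "in", here is a minimal Java sketch that decodes hex character references and the five predefined entities. The class name ReferenceDecoder and its method are hypothetical and belong to no MicroXML draft or to MicroLark. Decimal character references are rejected only because the deck lists them as undecided.

```java
import java.util.Map;

// Hypothetical helper, not part of MicroLark or any MicroXML draft.
final class ReferenceDecoder {
    private static final Map<String, String> PREDEFINED = Map.of(
        "lt", "<", "gt", ">", "amp", "&", "quot", "\"", "apos", "'");

    /** Replaces hex character references and the five predefined entities in text. */
    static String decode(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c != '&') {
                out.append(c);
                i++;
                continue;
            }
            int end = text.indexOf(';', i);
            if (end < 0) {
                throw new IllegalArgumentException("Unterminated reference at index " + i);
            }
            String name = text.substring(i + 1, end);
            if (name.startsWith("#x")) {
                // Hex character reference such as &#x41;
                out.appendCodePoint(Integer.parseInt(name.substring(2), 16));
            } else if (PREDEFINED.containsKey(name)) {
                out.append(PREDEFINED.get(name));
            } else {
                // Anything else, including decimal references, is rejected in this sketch.
                throw new IllegalArgumentException("Unknown reference: &" + name + ";");
            }
            i = end + 1;
        }
        return out.toString();
    }
}
```

A permissive tool could pass unknown references through unchanged instead of throwing. The deck leaves that choice to the tool.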

7 What's out beyond doubt Mandatory draconian error handling – Tools can be either draconian or permissive The whole DTD ball of mud Encodings other than UTF-8 XML declaration Attribute value normalization CDATA sections

8 No verdict yet Unicode names for elements/attributes – Old XML, or XML 1.0 Fifth Edition / XML 1.1? empty tags comments bare DOCTYPE declarations – > in attribute values (for Canonical XML compatibility) decimal character references

9 Data model Like the XML Infoset or the XPath Data Model – But as simple as possible Just a tree of element objects – No document, attribute, entity, etc. etc. info items MicroXML processors: – MUST expose the whole data model – MAY do so by constructing a tree, pushing events, letting events be pulled, or otherwise

10 Element objects An element object has three parts: – Name (a string) – Attributes (a map from strings to strings) – Children (a sequence of elements and/or strings) Element objects form a tree mirroring the doc Consecutive string children are consolidated Namespaces don't affect the data model – Namespace declarations are just attributes
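The three-part element object described here can be sketched in a few lines of Java. The class name ElementSketch and its methods are assumptions made for illustration. This is not MicroLark's actual Element class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the three-part element object; not MicroLark's Element class.
final class ElementSketch {
    private final String name;
    private final Map<String, String> attributes = new LinkedHashMap<>();
    // Each child is either an ElementSketch or a String.
    private final List<Object> children = new ArrayList<>();

    ElementSketch(String name) {
        this.name = name;
    }

    String name() { return name; }
    Map<String, String> attributes() { return attributes; }
    List<Object> children() { return Collections.unmodifiableList(children); }

    /** Sets an attribute; a namespace declaration is stored like any other attribute. */
    ElementSketch attribute(String key, String value) {
        attributes.put(key, value);
        return this;
    }

    /** Appends a child element. */
    ElementSketch add(ElementSketch child) {
        children.add(child);
        return this;
    }

    /** Appends text, merging it into a preceding string child so strings are never adjacent. */
    ElementSketch add(String text) {
        int last = children.size() - 1;
        if (last >= 0 && children.get(last) instanceof String) {
            children.set(last, (String) children.get(last) + text);
        } else {
            children.add(text);
        }
        return this;
    }
}
```

For example, new ElementSketch("p").add("Hello ").add("world") ends up with a single string child, which is the consolidation rule from the slide.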

11 Processing instructions No PIs? PIs in the syntax but not the data model? PIs in the prolog but not the epilog? PIs that look like start-tags only?

12 Prefixes: three possibilities No prefixes except xml: – Elements can be namespaced by convention, using xmlns attributes Prefixes on attributes only – Attributes can be namespaced too, using xmlns attributes Prefixes on both attributes and elements

13 Processing instructions The xml-stylesheet and xml-model PIs in the document prolog are standardized by W3C The php PI is not, but is very common Adding PIs to the data model would greatly complicate it: – Without PIs: just element objects – With PIs: document, element, and PI objects

14 xml:* Four attributes with the xml: prefix: – xml:space for application whitespace handling – xml:lang for language identification – xml:base for base URI – xml:id for element identification Semantics are the same as for their full XML equivalents
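Since the semantics match full XML, an xml:lang value applies to an element and its descendants until a descendant overrides it. A minimal Java sketch of that lookup follows. The class LangLookup and its path-of-attribute-maps input are assumptions made for illustration, not part of any MicroXML tool.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper: finds the language in effect for an element.
final class LangLookup {
    /**
     * Takes the attribute maps on the path from the root element (first)
     * to the element of interest (last) and returns the nearest xml:lang
     * value, or null if no element on the path declares one.
     */
    static String effectiveLang(List<Map<String, String>> pathFromRoot) {
        for (int i = pathFromRoot.size() - 1; i >= 0; i--) {
            String lang = pathFromRoot.get(i).get("xml:lang");
            if (lang != null) {
                return lang;
            }
        }
        return null;
    }
}
```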

15 MicroXML and JSON MicroXML is both richer and poorer than JSON as a data format Interoperability is a Good Thing Convert any JSON to MicroXML and back Convert any MicroXML to JSON and back Convert MicroXML to clean JSON and back – There's more than one way to do it – No consensus in the JSON and XML communities
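One of the several possible ways to carry a MicroXML tree in JSON is an array form in the style of JsonML, where each element becomes [name, {attributes}, child, ...]. The Java sketch below shows only that direction. The class JsonArrayForm and its Node type are hypothetical, and the deck does not endorse this or any other particular mapping.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: serializes an element tree as nested JSON arrays.
final class JsonArrayForm {
    static final class Node {
        final String name;
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<Object> children = new ArrayList<>(); // each child is a Node or a String

        Node(String name) {
            this.name = name;
        }
    }

    /** Writes [name, {attributes}, child, ...] for the given element. */
    static String toJson(Node element) {
        StringBuilder out = new StringBuilder("[");
        out.append(quote(element.name)).append(",{");
        boolean first = true;
        for (Map.Entry<String, String> attribute : element.attributes.entrySet()) {
            if (!first) {
                out.append(',');
            }
            out.append(quote(attribute.getKey())).append(':').append(quote(attribute.getValue()));
            first = false;
        }
        out.append('}');
        for (Object child : element.children) {
            out.append(',');
            out.append(child instanceof Node ? toJson((Node) child) : quote((String) child));
        }
        return out.append(']').toString();
    }

    /** Quotes a string as a JSON string literal. */
    static String quote(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"': out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.append('"').toString();
    }
}
```

Because the array form keeps child order and mixed content, it can be read back into the same tree. A "clean" JSON form built from objects keyed by element name cannot guarantee that round trip.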

16 MicroLark The first parser just for MicroXML – The name alludes to Tim Bray's original 1998 XML parser Lark, but no code is shared Written in Java by John Cowan – Open source (Apache 2.0 license) Provides pull parser, push parser, and element tree constructor in about 1/3 the LOCs of Lark – Applications can provide their own subclasses of Element to customize the tree

17 MicroLark Iterators “Poor man's XPath” based on chained method calls For “/html//p[3]” write: new ElementIterator(document).root("html").descendants("p").positionFilter(3) Then iterate on the result to fetch the elements
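To show how such a chain of calls can work, here is a self-contained Java imitation. It is not MicroLark's ElementIterator. The class ElementQuery, its Node type, and the reading of positionFilter as "the n-th element of the current sequence, counting from 1" are all assumptions, and MicroLark may interpret the filter differently.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical imitation of the chained-call style; not MicroLark's ElementIterator.
final class ElementQuery implements Iterable<ElementQuery.Node> {
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>(); // element children only; text is omitted

        Node(String name, Node... kids) {
            this.name = name;
            for (Node kid : kids) {
                children.add(kid);
            }
        }
    }

    private final List<Node> current;

    ElementQuery(Node document) {
        this(List.of(document));
    }

    private ElementQuery(List<Node> current) {
        this.current = current;
    }

    /** Keeps the current elements only if they have the given name. */
    ElementQuery root(String name) {
        List<Node> out = new ArrayList<>();
        for (Node node : current) {
            if (node.name.equals(name)) {
                out.add(node);
            }
        }
        return new ElementQuery(out);
    }

    /** Collects all descendants with the given name, in document order. */
    ElementQuery descendants(String name) {
        List<Node> out = new ArrayList<>();
        for (Node node : current) {
            collect(node, name, out);
        }
        return new ElementQuery(out);
    }

    private static void collect(Node parent, String name, List<Node> out) {
        for (Node child : parent.children) {
            if (child.name.equals(name)) {
                out.add(child);
            }
            collect(child, name, out);
        }
    }

    /** Keeps only the element at the given 1-based position, if there is one. */
    ElementQuery positionFilter(int position) {
        List<Node> out = new ArrayList<>();
        if (position >= 1 && position <= current.size()) {
            out.add(current.get(position - 1));
        }
        return new ElementQuery(out);
    }

    @Override
    public Iterator<Node> iterator() {
        return current.iterator();
    }
}
```

Each call returns a new query object, so the steps compose left to right the way XPath location steps do.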

18 MicroLark Validators (not yet) ID/IDREF/IDREFS validator – Specify namespace/element/attribute triples and namespace/attribute pairs via Java API – Retrieve elements by ID – Validate IDs for uniqueness – Validate IDREF and IDREFS attributes against IDs Examplotron validator – Validate a document against a parsed Examplotron schema
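The ID checks planned here amount to an index from ID values to elements. The Java sketch below is a simplified illustration that ignores namespaces and the triple/pair configuration. The class IdIndexSketch and its method names are hypothetical, since the slide says the MicroLark validators did not exist yet.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the MicroLark validator, which the slide marks as "not yet".
final class IdIndexSketch<E> {
    private final Map<String, E> byId = new HashMap<>();

    /** Registers an element under its ID; returns false if the ID is already taken. */
    boolean register(String id, E element) {
        return byId.putIfAbsent(id, element) == null;
    }

    /** Retrieves the element with the given ID, or null if there is none. */
    E lookup(String id) {
        return byId.get(id);
    }

    /** Returns the tokens of a whitespace-separated IDREFS value that match no registered ID. */
    List<String> unresolved(String idrefs) {
        List<String> missing = new ArrayList<>();
        for (String token : idrefs.trim().split("\\s+")) {
            if (!token.isEmpty() && !byId.containsKey(token)) {
                missing.add(token);
            }
        }
        return missing;
    }
}
```

A single IDREF is just an IDREFS value with one token, so the same method covers both attribute types. References can only be checked after every ID in the document has been registered.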

19 Examplotron Defined for XML by Eric van der Vlist in 2003 Examplotron schemas are just sample documents Element content models are deduced from content Elements in the schema are required by default Attributes in the schema are permitted but not required in the document

20 Examplotron eg:occurs indicates repetition of an element – Can be “*”, “+”, “?”, “.” (once), “-” (zero times) eg:content indicates specific content model – ordered, unordered, or mixed-content child elements – a few simple types: string, integer, decimal, double, date, dateTime, boolean, base64Binary More at
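The five eg:occurs values listed above translate directly into a count check. The Java helper below is a minimal sketch of that mapping only. The class OccursSketch is hypothetical and is not taken from any Examplotron implementation.

```java
// Hypothetical helper illustrating the eg:occurs values listed on the slide.
final class OccursSketch {
    /** Returns true if an element appearing count times satisfies the eg:occurs value. */
    static boolean allows(String occurs, int count) {
        switch (occurs) {
            case "*": return count >= 0;               // any number of times
            case "+": return count >= 1;               // at least once
            case "?": return count == 0 || count == 1; // optional
            case ".": return count == 1;               // exactly once
            case "-": return count == 0;               // must not appear
            default:
                throw new IllegalArgumentException("Unknown eg:occurs value: " + occurs);
        }
    }
}
```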

21 When: Now! Join the community group – As an individual – As a representative of your organization (which does not need to belong to W3C) Post to the mailing list Add pages to the wiki Talk to your friends Make MicroXML a reality