Efficient XML Interchange What is it? Why is it? How does it fit in?
What is Efficient XML Interchange?
Alternative Representation of XML Infoset
–support full XML (Infoset) data model
–not a subset
–no really, not a subset!
Interchange Format
–optimized for data exchange
–transmission, storage, processing
–can use Schema, conventional compression
Why?
Expand the Web
–limited uptake of XML & friends in certain domains: performance is a problem
–noteworthy domains: mobile, embedded, scientific, …
Lesson From Binary XML Formats
–real need, and real solutions
–widely applicable, win-win
–multiple formats cause segregation, limit adoption
Integration into XML Stack
Same Data Model
–merely an alternative encoding
Open Issues
–format, or encoding?
–content negotiation?
–schema knowledge vs content negotiation
–modes, configurability (e.g. simple types)
WebAPI / EXI?
Impact on…
–APIs: initialisation (encoding modes, schema info?)
–XMLHttpRequest: again, modes and schema info? diversity of formats?
–Are data models in sync? HTML as XML?
–REX: fragment support?
Efficient XML Interchange Format Basics
Efficient XML Interchange
Goal(s)
–maintain XML (Infoset) data model
–seamless integration into XML software stack
–improve compaction AND processing
Observation:
–smallness has multiple benefits
–e.g. energy consumption during transmission
–allows XML deployment in new scenarios
Underlying Philosophy:
–exploit a-priori knowledge of (likely) content
How does it work?
Exploit Knowledge, at Several Different Levels
–XML knowledge: copious syntactic redundancy
–Schema knowledge: schema describes content in detail
–heuristics: e.g. (declared) elements >> processing instructions; e.g. repeated string elements; e.g. small numbers >> large numbers
Cooperation with Conventional Compression
–heavily biased data stream as compressor input
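The "small numbers >> large numbers" heuristic can be illustrated with a variable-length integer code in which small values take fewer bytes. The following is a minimal sketch of a generic 7-bits-per-byte scheme with a continuation flag. It is meant to show the idea only and is not claimed to reproduce the exact EXI wire format; the function names `encode_varuint` and `decode_varuint` are hypothetical.

```python
def encode_varuint(n: int) -> bytes:
    """Encode a non-negative integer, 7 value bits per byte, low group first.

    The top bit of each byte is a continuation flag: 1 means more bytes follow.
    """
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        low = n & 0x7F          # lowest 7 bits of the remaining value
        n >>= 7
        if n:
            out.append(low | 0x80)  # more groups follow
        else:
            out.append(low)         # last group, flag cleared
            return bytes(out)


def decode_varuint(data: bytes) -> int:
    """Inverse of encode_varuint; reads until a byte with the flag cleared."""
    value = 0
    shift = 0
    for b in data:
        value |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            return value
    raise ValueError("truncated input")


small = encode_varuint(5)      # one byte
large = encode_varuint(300)    # two bytes
```

Values below 128 cost a single byte, so a stream dominated by small numbers stays compact even before a conventional compressor sees it.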
EXI Base Format
Coding Grammars
–generic grammar: describes the full XML Infoset (arbitrary elements, PIs, comments, entity references, etc.)
–schema-derived grammar: describes a specific format
–content-derived grammar: adds rules depending on encountered elements
–splice these together, at very fine granularity: allow anything, but know what is (currently) likely; likely content gets a more efficient encoding
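Why a grammar that knows what is likely yields a more efficient encoding can be seen by counting bits: if a grammar state permits n different events, a fixed-width code needs about log2(n) bits to say which one occurred. The sketch below compares a generic state with two narrower ones. The state contents and the helper name `event_code_bits` are assumptions made for illustration.

```python
def event_code_bits(n_alternatives: int) -> int:
    """Bits needed for a fixed-width code distinguishing n alternatives."""
    if n_alternatives < 1:
        raise ValueError("a state needs at least one alternative")
    # (n - 1).bit_length() equals ceil(log2(n)) without floating point.
    return (n_alternatives - 1).bit_length()


# Hypothetical grammar states, each listing the events allowed next.
GENERIC_STATE = ["EE", "AT(*)", "NS", "SE(*)", "CH", "ER", "CM", "PI"]
TWO_WAY_STATE = ["AT(color)", "SE(quantity)"]
FORCED_STATE = ["SE(price)"]   # only one event is possible

generic_bits = event_code_bits(len(GENERIC_STATE))   # 3
two_way_bits = event_code_bits(len(TWO_WAY_STATE))   # 1
forced_bits = event_code_bits(len(FORCED_STATE))     # 0
```

A generic event such as SE(*) must additionally carry the element name, whereas a schema-derived event such as SE(quantity) already implies it, so the real saving is larger than the bit counts alone suggest.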
EXI Base Format
Built-in, Generic Element Grammar
[Diagram: grammar states StartTag and Element; transitions labelled EE, AT(*), NS, SE(*), CH, ER, CM, PI]
EXI Base Format
A Schema-Based Grammar
Element Content Model:
–(optional) attribute color
–(optional) element desc
–(mandatory) elements quantity, price
[Diagram: grammar states connected by transitions AT(color), SE(desc), SE(quantity), SE(price), EE]
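The content model on this slide can be written down as a small state machine in which each state lists only the events the schema permits next. This is a sketch under stated assumptions: the child order desc?, quantity, price is assumed (the slide does not fix it), child elements are treated as single events with their content ignored, and the names `GRAMMAR` and `encode_events` are hypothetical.

```python
# Hypothetical schema-derived grammar. Each state maps an allowed event to
# the next state; an event's position in the dict serves as its code.
# Assumption: children appear in the order desc?, quantity, price.
GRAMMAR = {
    "start": {
        "AT(color)": "after_color",
        "SE(desc)": "after_desc",
        "SE(quantity)": "after_quantity",
    },
    "after_color": {"SE(desc)": "after_desc", "SE(quantity)": "after_quantity"},
    "after_desc": {"SE(quantity)": "after_quantity"},
    "after_quantity": {"SE(price)": "after_price"},
    "after_price": {"EE": "done"},
}


def encode_events(events):
    """Return one (code, bit_width) pair per event in the sequence."""
    state = "start"
    out = []
    for ev in events:
        alternatives = list(GRAMMAR[state])
        if ev not in alternatives:
            raise ValueError(f"{ev} is not allowed in state {state}")
        width = (len(alternatives) - 1).bit_length()
        out.append((alternatives.index(ev), width))
        state = GRAMMAR[state][ev]
    return out


codes = encode_events(["AT(color)", "SE(quantity)", "SE(price)", "EE"])
total_bits = sum(width for _, width in codes)
```

Under these assumptions the four structural events cost 3 bits in total, because the mandatory steps (price after quantity, EE after price) have only one alternative and need no bits at all.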
EXI Base Format
Merged Generic & Schema Derived Grammar
[Diagram: the schema-derived transitions SE(quantity), SE(desc), SE(price), EE, shown alongside the generic alternatives SE(*), CH, ER, EE, CM, PI; states labelled quantity and desc]
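One way to picture the merged grammar is a two-level code: schema-expected events get short first-level codes, and one extra first-level code acts as an escape into the generic alternatives, which then cost additional bits. The sketch below illustrates that idea only. The escape layout, the list `GENERIC_EVENTS`, and the function `encode_event` are assumptions for illustration, not the normative EXI event-code assignment.

```python
GENERIC_EVENTS = ["SE(*)", "CH", "ER", "CM", "PI"]


def bits_for(n: int) -> int:
    """Bits needed for a fixed-width code distinguishing n alternatives."""
    return (n - 1).bit_length()


def encode_event(expected, event):
    """Encode one event as a list of (code, bit_width) parts.

    expected: schema-derived events allowed in the current state.
    Expected events use one part; anything else uses an escape code
    followed by a second part selecting the generic event.
    """
    first_level = len(expected) + 1          # +1 for the escape code
    w1 = bits_for(first_level)
    if event in expected:
        return [(expected.index(event), w1)]
    if event in GENERIC_EVENTS:
        w2 = bits_for(len(GENERIC_EVENTS))
        return [(len(expected), w1), (GENERIC_EVENTS.index(event), w2)]
    raise ValueError(f"unknown event {event}")


likely = encode_event(["SE(price)"], "SE(price)")   # 1 bit
unlikely = encode_event(["SE(price)"], "CM")        # 1 + 3 bits
```

Anything remains expressible, but the expected event pays only for the single bit that distinguishes it from the escape, while an unexpected comment pays for both levels.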
Other, Major EXI Features
Simple Type Values
–optimized codecs
–type assignment through grammar; generic text coding always available
–string / value tables
Bit-Packed vs Byte-Aligned Codec
–biased input into deflate compression
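The string / value table idea can be sketched as follows: the first occurrence of a string is sent literally and entered into a table, and later occurrences are sent as a small table index. This is a simplified illustration with a single global table. The token layout and the function names `encode_strings` and `decode_strings` are hypothetical and do not describe the actual EXI table partitioning.

```python
def encode_strings(values):
    """Replace repeated strings by ("hit", index); first uses are ("miss", text)."""
    table = {}
    out = []
    for v in values:
        if v in table:
            out.append(("hit", table[v]))
        else:
            table[v] = len(table)   # next free index
            out.append(("miss", v))
    return out


def decode_strings(tokens):
    """Rebuild the original strings, growing the same table as the encoder."""
    table = []
    out = []
    for kind, payload in tokens:
        if kind == "miss":
            table.append(payload)
            out.append(payload)
        else:
            out.append(table[payload])
    return out


tokens = encode_strings(["red", "blue", "red", "red"])
restored = decode_strings(tokens)
```

Because encoder and decoder build the table in the same order, no table needs to be transmitted; repeated values shrink to an index.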
Impact on the XML Stack
Questions
–content negotiation, header: HTTP integration? what do you need? what would be a problem? pre-shared schemas
–which formats? samples? (X)HTML? AJAX?
–need hooks in the specification?
–options / variables: different schemas, different options?