Data Format Description Language (DFDL) WG
Martin Westhead, EPCC, University of Edinburgh
Alan Chappell, PNNL
Agenda
Introduction and welcome - Martin Westhead (10 mins)
Binary Format Description Language (BFD) - Alan Chappell (10 mins)
Binary XML (BinX) - Stephen Rutherford (10 mins)
DFDL - Martin Westhead (15 mins)
  – Big picture
  – Structural Description Language
  – Charter (20 mins discussion)
Examples repository - Alan Chappell (10 mins)
  – Bruce Barkstrom: Examples at NASA (15 mins discussion)
Motivation
There will never be a standard data format
  – E.g. XML: verbose, tree-based, explicit structure
  – Legacy formats
  – Application-specific formats
  – One size will never fit all
But could we provide a language for describing formats?
  – Transparency of physical representation
  – Automatic format conversion
  – Unambiguous description of data
There's more…
Explicit structure enables:
Standard transformation to/from XML representation
  – Could allow applications to read/write XML
  – But provide an underlying efficient binary representation
Data stream/file becomes a database
  – Point to parts of the structure
  – Extract parts of the structure
  – Modify parts of the structure
  – Integrate parts of different structures
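A minimal Python sketch of the XML round trip, under stated assumptions: the record layout `DESCRIPTION` (a hypothetical list of label/format pairs, using `struct` codes) and the names `binary_to_xml` and `xml_to_binary` are illustrative, not DFDL syntax. It shows that once the structure is explicit, one description drives both directions, and individual fields can then be pointed to by label.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical description: ordered (label, struct format) pairs.
# "<" = little-endian, "i" = 32-bit signed int, "f" = 32-bit float.
DESCRIPTION = [("id", "<i"), ("x", "<f"), ("y", "<f")]


def binary_to_xml(data: bytes, description=DESCRIPTION, root_tag="record") -> ET.Element:
    """Decode a compact binary record into an XML view, one child per field."""
    root = ET.Element(root_tag)
    offset = 0
    for label, fmt in description:
        (value,) = struct.unpack_from(fmt, data, offset)
        offset += struct.calcsize(fmt)
        ET.SubElement(root, label).text = repr(value)
    return root


def xml_to_binary(root: ET.Element, description=DESCRIPTION) -> bytes:
    """Re-encode the XML view into the same compact binary layout."""
    out = bytearray()
    for label, fmt in description:
        text = root.find(label).text
        value = float(text) if fmt.endswith("f") else int(text)
        out += struct.pack(fmt, value)
    return bytes(out)


raw = struct.pack("<iff", 7, 1.5, -2.25)
xml_view = binary_to_xml(raw)
print(ET.tostring(xml_view, encoding="unicode"))
# <record><id>7</id><x>1.5</x><y>-2.25</y></record>
print(xml_view.find("y").text)  # point to one part of the structure: -2.25
```

An application could work only with the XML view while storage stays binary, since `xml_to_binary(xml_view)` reproduces `raw` byte for byte in this example.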
And more…
Generic tools possible
  – Browsing
  – Conversion and transformation
Annotation of data
  – E.g. identify the bits that depict a hurricane in an image
Enables general semantic labels; many ontologies could be developed, e.g.:
  – S.I. units, SQL types, Time
  – Community-specific labels, starClass = whiteDwarf
  – Application-specific labels, nodeColour = green
Could lead to a standard transformation language
Not fairy tales
Based on implemented work
  – BinX
  – BFD, part of the Scientific Annotation Middleware project (
Generalized and extended a little
Formal semantics
Foundation for extensibility
Approach
Separate out structure and semantics
General structural language
  – Repetition
  – Pointers
  – References to data
  – New structures can be built (compositionality)
Semantics
  – Hard to express, so… we don't
  – General labeling
  – Label semantics defined elsewhere (ontologies)
  – Labels can be added (extensibility)
Structure – arbitrary labels
[Figure: hierarchy of labels over a binary sequence, e.g. fooSet, fooPair, foo, bunchThings, thing, …]
Structure – example labels
[Figure: the same kind of hierarchy with example labels, e.g. Array, complex, float, byte, bit, …]
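A sketch of how the example labels could be read, assuming `float` means a 4-byte little-endian IEEE 754 value and `complex` means two consecutive floats (real part, then imaginary part). Python's `struct` module stands in for a DFDL processor here; `read_complex_array` is a hypothetical name.

```python
import struct

# Assumption: complex := [float; float], each float 4 bytes, little-endian.
COMPLEX = struct.Struct("<ff")


def read_complex_array(data: bytes) -> list[complex]:
    """Interpret a byte sequence as an Array of complex, i.e. [complex.*]."""
    if len(data) % COMPLEX.size:
        raise ValueError("trailing bytes do not form a whole complex value")
    return [complex(re, im) for re, im in COMPLEX.iter_unpack(data)]


raw = struct.pack("<4f", 1.0, 2.0, -0.5, 0.25)
print(read_complex_array(raw))  # [(1+2j), (-0.5+0.25j)]
```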
Structural language
Formal semantics
  – Structured binary sequence
  – Defines hierarchical structure over an underlying sequence of binary values
Language for describing hierarchical structure
  – Repetition: explicit number of repeats; termination characters
  – Data reference: conditionals; data size
  – Pointers
Scope
  – As general as possible, but
  – Must be concise and implementable
Draft language definition on web page (
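The structural features listed above can be sketched as small parser combinators, each taking a byte sequence and an offset and returning a value plus the new offset. All names here (`byte`, `repeat_n`, `repeat_until`, `counted`, `pointer`, `seq`) are hypothetical illustrations, not part of the draft language; the sketch only shows how repetition, termination characters, data references and pointers can compose.

```python
from typing import Any, Callable, Tuple

# A parser takes (data, offset) and returns (value, new_offset).
Parser = Callable[[bytes, int], Tuple[Any, int]]


def byte() -> Parser:
    """The primitive unit: one unsigned byte."""
    def parse(data: bytes, pos: int):
        return data[pos], pos + 1
    return parse


def repeat_n(item: Parser, n: int) -> Parser:
    """Repetition with an explicit number of repeats."""
    def parse(data: bytes, pos: int):
        values = []
        for _ in range(n):
            value, pos = item(data, pos)
            values.append(value)
        return values, pos
    return parse


def repeat_until(item: Parser, terminator: int) -> Parser:
    """Repetition ended by a termination byte (consumed, not returned)."""
    def parse(data: bytes, pos: int):
        values = []
        while data[pos] != terminator:  # IndexError if the terminator is missing
            value, pos = item(data, pos)
            values.append(value)
        return values, pos + 1
    return parse


def counted(count: Parser, item: Parser) -> Parser:
    """Data reference: a value read earlier sets the size of a later repetition."""
    def parse(data: bytes, pos: int):
        n, pos = count(data, pos)
        return repeat_n(item, n)(data, pos)
    return parse


def pointer(offset: Parser, item: Parser) -> Parser:
    """Pointer: read an absolute offset, parse the target there, resume after the pointer."""
    def parse(data: bytes, pos: int):
        target, pos = offset(data, pos)
        value, _ = item(data, target)
        return value, pos
    return parse


def seq(*parts: Parser) -> Parser:
    """Compositionality: a sequence of smaller structures forms a new one."""
    def parse(data: bytes, pos: int):
        values = []
        for part in parts:
            value, pos = part(data, pos)
            values.append(value)
        return values, pos
    return parse


# A count byte, that many bytes, then a zero-terminated run of bytes.
record = seq(counted(byte(), byte()), repeat_until(byte(), 0))
print(record(bytes([3, 10, 20, 30, 65, 66, 0]), 0))  # ([[10, 20, 30], [65, 66]], 7)
```

Each combinator returns a new parser, so new structures are built from existing ones without changing the primitives.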
CSV file example
  char:=byte
  data:=[(char - [',']).*]
  field:=[data; [',']]
  finalField:=[data; [\n]]
  row:=[field.*] :: [finalField]
  table:=[row.*]
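A hand-written Python reading of the CSV grammar, one function per rule. One assumption: the sketch also stops `data` at a newline, since `(char - [','])` as literally written would let the final field run past the end of its row. The function names are hypothetical.

```python
def parse_data(text: str, pos: int) -> tuple[str, int]:
    """data := [(char - [',']).*]; this sketch also stops at a newline (assumption)."""
    start = pos
    while pos < len(text) and text[pos] not in ",\n":
        pos += 1
    return text[start:pos], pos


def parse_row(text: str, pos: int) -> tuple[list[str], int]:
    """row := [field.*] :: [finalField]"""
    values = []
    while True:
        value, pos = parse_data(text, pos)
        values.append(value)
        if pos >= len(text):
            raise ValueError("row is missing its terminating newline")
        if text[pos] == ",":      # field := [data; [',']]
            pos += 1
            continue
        return values, pos + 1    # finalField := [data; [\n]]


def parse_table(text: str) -> list[list[str]]:
    """table := [row.*]"""
    rows, pos = [], 0
    while pos < len(text):
        row, pos = parse_row(text, pos)
        rows.append(row)
    return rows


print(parse_table("a,b,c\n1,2,3\n"))  # [['a', 'b', 'c'], ['1', '2', '3']]
```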
Semantic labels
Many ontologies possible
Initial scope probably:
  – Basic types (floating point, integer, character)
  – Simple structures (structs, arrays, tables)
Obvious extensions:
  – SQL types
  – XML Schema types
Key WG goal:
  – Define the form and requirements of new ontologies
What is an Ontology?
XML Schema for new types
Structural description of new types
Definition of core API behaviour on new types
API extensions
Relationships to other types
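A sketch of what a basic-types ontology entry might carry, covering three of the items above: the XML Schema type, a structural description (here a little-endian `struct` format code, which is an assumption), and a core API behaviour (`decode`). `TypeEntry` and `BASIC_TYPES` are hypothetical names.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeEntry:
    """One hypothetical ontology entry for a basic type."""
    xsd_type: str      # corresponding XML Schema built-in type
    struct_code: str   # structural description: byte layout (little-endian)

    @property
    def size(self) -> int:
        return struct.calcsize(self.struct_code)

    def decode(self, data: bytes, offset: int = 0):
        """Core API behaviour: read one value of this type at `offset`."""
        return struct.unpack_from(self.struct_code, data, offset)[0]


BASIC_TYPES = {
    "byte":   TypeEntry("xs:byte", "<b"),    # signed 8-bit
    "int":    TypeEntry("xs:int", "<i"),     # signed 32-bit
    "float":  TypeEntry("xs:float", "<f"),   # IEEE 754 single precision
    "double": TypeEntry("xs:double", "<d"),  # IEEE 754 double precision
}

print(BASIC_TYPES["double"].size)  # 8
```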
WG goals
Formal language for DFDL data structure
Standard representation of this language in XML
Requirements for DFDL ontology
Basic types ontology
Basic structures ontology
Currently under discussion
Abstraction from the underlying binary
  – Compression, encoding, encryption
  – Physical vs. conceptual binary sequence
Abstraction of description
  – complex:=[foo; foo]
  – Instantiate foo:= float or foo:= double at use time
Filtering of results
  – Getting to the data model and leaving the format behind
  – CSV -> [[value; value; value]; [value; value; value]]
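The abstraction-of-description idea, `complex:=[foo; foo]` with `foo` bound at use time, can be sketched as a function that builds the concrete layout only once the parameter is known. The `BASIC` mapping (little-endian `struct` codes) and the function names are assumptions for illustration.

```python
import struct

# Hypothetical mapping from basic-type label to a little-endian struct code.
BASIC = {"float": "<f", "double": "<d"}


def make_complex(foo: str) -> struct.Struct:
    """complex := [foo; foo], with foo bound to a concrete basic type at use time."""
    code = BASIC[foo]
    return struct.Struct(code + code[1:])  # e.g. "<f" -> "<ff"


def read_complex(data: bytes, foo: str) -> complex:
    re, im = make_complex(foo).unpack(data)
    return complex(re, im)


as_float = struct.pack("<ff", 1.5, -3.0)
as_double = struct.pack("<dd", 1.5, -3.0)
print(make_complex("float").size, make_complex("double").size)  # 8 16
print(read_complex(as_float, "float") == read_complex(as_double, "double"))  # True
```

The description of `complex` is written once; only the binding of `foo` changes the physical layout.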
DFDL in the VO
Generic tools
Metadata possibilities
  – Ontologies can define relationships between types
  – E.g. polar to Cartesian
  – Standard classes over data objects
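A sketch of ontology-defined relationships between types: conversions registered by label pair, so a generic tool can move polar data to Cartesian without knowing the maths itself. The registry, the `relation` decorator and the labels `"polar"`/`"cartesian"` are hypothetical.

```python
import math

# Hypothetical registry: (from_label, to_label) -> conversion function.
CONVERSIONS = {}


def relation(src: str, dst: str):
    """Register a conversion between two labelled types."""
    def register(fn):
        CONVERSIONS[(src, dst)] = fn
        return fn
    return register


@relation("polar", "cartesian")
def polar_to_cartesian(r: float, theta: float) -> tuple[float, float]:
    return r * math.cos(theta), r * math.sin(theta)


@relation("cartesian", "polar")
def cartesian_to_polar(x: float, y: float) -> tuple[float, float]:
    return math.hypot(x, y), math.atan2(y, x)


def convert(value, src: str, dst: str):
    """A generic tool converts labelled data using only the registered relations."""
    return CONVERSIONS[(src, dst)](*value)


x, y = convert((2.0, math.pi / 2), "polar", "cartesian")
print(round(x, 9), round(y, 9))  # 0.0 2.0
```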
Getting involved
Webpages:
Mailing list:
My address: