Data Format Description Language (DFDL) WG
Martin Westhead
EPCC, University of Edinburgh
Overview
Aims
Approach
Documents
Discussion
Aims
A language for describing formats
–Transparency of physical representation
–Unambiguous (persistent) description of data
–Standard implicit XML “view” of data
–Generic tools (browsing, conversion)
Existing work
–BinX
–BFD, part of the Scientific Annotation Middleware project
–ESML
Basic Mechanism
XML description of structure
Description can be annotated
–byteOrder=“littleEndian”
In DFDL, the description language is extensible
(Diagram labels: DFDL description, Data file)
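A minimal sketch of the mechanism above: an XML description, annotated with a byte order, drives the decoding of a binary record. Only the `byteOrder="littleEndian"` annotation comes from the slide. The `struct` and `field` element names, the `name` and `type` attributes, the type names, and the `bigEndian` value are assumptions made for illustration, not DFDL syntax.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical description: everything except the byteOrder annotation
# is an illustrative assumption, not DFDL syntax.
DESCRIPTION = """
<struct byteOrder="littleEndian">
  <field name="count" type="int32"/>
  <field name="scale" type="float32"/>
</struct>
"""

# Map the assumed type names onto struct format codes.
TYPE_CODES = {"int32": "i", "float32": "f", "byte": "B"}
ORDER_PREFIX = {"littleEndian": "<", "bigEndian": ">"}


def read_record(description: str, data: bytes) -> dict:
    """Decode `data` according to the XML description."""
    root = ET.fromstring(description)
    prefix = ORDER_PREFIX[root.get("byteOrder", "bigEndian")]
    fields = root.findall("field")
    fmt = prefix + "".join(TYPE_CODES[f.get("type")] for f in fields)
    values = struct.unpack_from(fmt, data)
    return {f.get("name"): v for f, v in zip(fields, values)}


data = struct.pack("<if", 3, 0.5)
record = read_record(DESCRIPTION, data)
print(record)
```

Changing only the annotation changes how the same description reads the bytes, which is the point of keeping the physical representation in the description rather than in the reading code.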
Approach
Separate out structure and semantics
Tried to avoid a data model
General structural language
–Repetition
–Pointers
–References to data
–New structures can be built (compositionality)
Semantics
–Hard to express, so… we don’t
–General labeling
–Label semantics defined elsewhere (ontologies)
–Labels can be added (extensibility)
Structure – arbitrary labels
(Diagram: hierarchy of structure labels: fooSet, fooPair, foo, bunchThings, thing, …)
Structure – example labels
(Diagram: hierarchy of structure labels: complex Array, complex, float, byte, bit, …)
Structural language
Formal semantics
–Structured binary sequence
–Defines hierarchical structure over underlying sequence of binary values
Language for describing hierarchical structure
–Repetition: explicit number of repeats, termination characters
–Data reference: conditionals, data size
–Pointers
Scope
–As general as possible, but
–Must be concise and implementable
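The repetition and data-reference constructs listed above can be sketched as three small readers over a byte sequence. This is an illustration only: the function names, the zero-byte terminator, and the little-endian 16-bit count are assumptions, not part of the proposed language.

```python
import struct


def read_counted(data: bytes, offset: int, count: int):
    """Repetition with an explicit number of repeats: `count` bytes."""
    values = list(data[offset:offset + count])
    return values, offset + count


def read_terminated(data: bytes, offset: int, terminator: int = 0):
    """Repetition ended by a termination character (here a zero byte)."""
    end = data.index(bytes([terminator]), offset)
    return data[offset:end], end + 1  # skip the terminator itself


def read_length_prefixed(data: bytes, offset: int):
    """Data reference: the repeat count is itself read from the data."""
    (count,) = struct.unpack_from("<H", data, offset)
    return read_counted(data, offset + 2, count)


# Layout assumed for this example: 2 fixed bytes, a zero-terminated
# name, then a little-endian 16-bit count followed by that many bytes.
blob = bytes([7, 9]) + b"foo\x00" + struct.pack("<H", 3) + bytes([1, 2, 3])

pair, pos = read_counted(blob, 0, 2)
name, pos = read_terminated(blob, pos)
items, pos = read_length_prefixed(blob, pos)
print(pair, name, items, pos)
```

Each reader returns the decoded value together with the next offset, so readers can be chained to impose a hierarchy on the underlying flat sequence.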
Ontologies
Define mappings from binary structure to language primitive
SDL defines binary structure
Core API:
–getAsInt, getAsFloat, getAsByte…
–getAsIntArray, getAsFloatArray…
For each structure, the ontology defines semantics for each method
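One way to picture the core API is a base class whose accessor methods every structure supports. The method names below are taken from the slide; the `Structure` base class, the `Short` and `FloatPair` subclasses, and the use of `struct` format strings are assumptions for the sake of a runnable sketch.

```python
import struct


class Structure:
    """Base for structures readable through the core API (assumed design)."""
    fmt = ""  # struct format string, set by each subclass

    def __init__(self, raw: bytes):
        self.raw = raw

    def _values(self):
        return struct.unpack(self.fmt, self.raw)

    def getAsInt(self) -> int:
        return int(self._values()[0])

    def getAsFloat(self) -> float:
        return float(self._values()[0])

    def getAsIntArray(self) -> list:
        return [int(v) for v in self._values()]

    def getAsFloatArray(self) -> list:
        return [float(v) for v in self._values()]


class Short(Structure):
    fmt = "<h"  # one little-endian 16-bit integer


class FloatPair(Structure):
    fmt = "<2f"  # two little-endian 32-bit floats


s = Short(struct.pack("<h", -12))
p = FloatPair(struct.pack("<2f", 1.5, 2.25))
print(s.getAsInt(), s.getAsFloat(), p.getAsFloatArray(), p.getAsIntArray())
```

Here the conversion semantics are fixed in the base class (integer conversion truncates). In the scheme the slide describes, the ontology would state these semantics per structure and per method.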
Ontology 2
(Diagram labels: short, …, byte, byte, byte, …; Java primitives)
Ontology defines:
–New structures
–New XML associated with them
–Mappings from new structures to language primitives via core API
–API extensions
–Relationships between structures (RDF/OWL)?
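A rough sketch of how an ontology might add new structures and map them onto language primitives. The registry layout, the `define_structure` and `get_as` helpers, and the `complexPair` structure are hypothetical. The sketch maps onto Python values rather than the Java primitives named in the slide.

```python
import struct

# Primitive structures: name -> (size in bytes, decoder to a Python value).
# The names and this registry layout are illustrative assumptions.
ONTOLOGY = {
    "byte": (1, lambda raw: raw[0]),
    "short": (2, lambda raw: struct.unpack("<h", raw)[0]),
    "float": (4, lambda raw: struct.unpack("<f", raw)[0]),
}


def define_structure(name, parts, combine):
    """Add a new structure composed of already-defined structures.

    `parts` lists the component structure names in order; `combine`
    maps the decoded component values onto one language-level value.
    """
    sizes = [ONTOLOGY[p][0] for p in parts]

    def decode(raw):
        values, offset = [], 0
        for part, size in zip(parts, sizes):
            values.append(ONTOLOGY[part][1](raw[offset:offset + size]))
            offset += size
        return combine(*values)

    ONTOLOGY[name] = (sum(sizes), decode)


def get_as(name, raw):
    """Decode `raw` as the named structure."""
    size, decode = ONTOLOGY[name]
    if len(raw) != size:
        raise ValueError(f"{name} needs {size} bytes, got {len(raw)}")
    return decode(raw)


# New structure built from two floats, mapped to Python's complex type.
define_structure("complex", ["float", "float"], complex)
# Structures compose further: a pair of complex values.
define_structure("complexPair", ["complex", "complex"], lambda a, b: (a, b))

raw = struct.pack("<4f", 1.0, -2.0, 0.5, 0.25)
print(get_as("complex", raw[:8]), get_as("complexPair", raw))
```

New structures are defined purely in terms of existing ones, so the set of primitives stays small while the vocabulary of labels can grow.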
Current documents
SDL formal definition
XML representation of SDL
Primitives ontology
Basic structures ontology
Primer
All are strawmen, known to be incomplete/flawed; the aim is to kick-start discussion
Discussion issues
Transformations
–Low level: encodings, compression, blocking
–High level: filtering out formatting
Concept of type needed
–distinct from binary representation
Programming language independence
Pointer semantics
Expressive power of SDL vs. implementability
–Layered standard (?)
Current proposals
More transformation-oriented view
–as opposed to representation-oriented
Introduction of data model
(Diagram labels: …, dfdl:char, Java char, Java byte, C char; bindings, representations)