Data Format Description Language (DFDL) WG
  Martin Westhead
  EPCC, University of Edinburgh
Overview
  Background
  Motivation
  Approach
  Current status
Motivation
  There will never be a standard data format
    – E.g. XML: verbose, tree-based, explicit structure
    – Legacy formats
    – Application-specific formats
    – One size will never fit all
  But could we provide a language for describing formats?
    – Transparency of physical representation
    – Automatic format conversion
    – Unambiguous description of data
There’s more…
  Explicit structure enables:
  Standard transformation to/from XML representation
    – Could allow application to read/write XML
    – But provide underlying efficient binary representation
  Data stream/file becomes database
    – Point to parts of the structure
    – Extract parts of the structure
    – Modify parts of the structure
    – Integrate parts of different structures
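The round trip between an efficient binary representation and an XML view can be sketched as follows. This is only an illustration: the `RECORD_FORMAT` list of (field name, type code) pairs, the little-endian layout, and the helper names `binary_to_xml` and `xml_to_binary` are assumptions made for the example, not DFDL syntax or any WG-defined API.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical format description: (field name, struct type code) pairs.
# An illustrative stand-in for a real format description, not DFDL syntax.
RECORD_FORMAT = [("id", "i"), ("temperature", "d"), ("flag", "B")]
BYTE_ORDER = "<"  # assumed: little-endian, no padding


def _struct_for(fmt):
    """Build a struct.Struct matching the field list."""
    return struct.Struct(BYTE_ORDER + "".join(code for _, code in fmt))


def binary_to_xml(data, fmt=RECORD_FORMAT):
    """Decode one binary record into an XML element tree."""
    values = _struct_for(fmt).unpack(data)
    root = ET.Element("record")
    for (name, _), value in zip(fmt, values):
        ET.SubElement(root, name).text = repr(value)
    return root


def xml_to_binary(root, fmt=RECORD_FORMAT):
    """Encode the XML view back into the binary layout."""
    values = []
    for name, code in fmt:
        text = root.find(name).text
        values.append(float(text) if code in "fd" else int(text))
    return _struct_for(fmt).pack(*values)


raw = struct.pack("<idB", 7, 21.5, 1)   # 13 bytes on disk
doc = binary_to_xml(raw)                # application sees XML
doc.find("temperature").text = "22.0"   # modify one part of the structure
updated = xml_to_binary(doc)            # write back as binary
```

The application only touches the XML view, while the stored bytes keep their compact layout; the same field list drives both directions.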
And more…
  Generic tools possible
    – Browsing
    – Conversion and transformation
  Annotation of data
    – E.g. identify bits that depict a hurricane in an image
  Enables general semantic labels; many ontologies could be developed, e.g.:
    – S.I. units, SQL types, Time
    – Community-specific labels, “starClass = whiteDwarf”
    – Application-specific labels, “nodeColour = green”
  Could lead to a standard transformation language
Not fairy tales
  Based on implemented work
    – BinX
    – BFD, part of the Scientific Annotation Middleware project
    – ESML
  Generalized and extended a little
  Clear semantics
  Foundation for extensibility
Layers
  [Diagram; recoverable labels: Data Model, Structure, Primitives; Fortran, C/C++, Java; Binary file, Text file, Data stream; API, Data Model, Transformations]
Approach
  Data model
    – XML infoset
    – Obvious way to describe it: XSD
  API
    – DOM/SAX
    – Extended to provide non-string value access
  Transformations
    – Ontology of predefined transformations (extensible)
    – XML language for:
        Composition
        Attaching to file contents
        Populating the model
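The idea of an extensible set of predefined transformations that can be composed might look like the sketch below. Everything named here is hypothetical: the `TRANSFORMS` registry, the `transform` decorator, the `compose` helper, and the transformation names `decodeInt16BE` and `scaleByTenth` are invented for illustration. The slides propose an XML language for composition, which this Python stand-in does not attempt to reproduce.

```python
import struct
from functools import reduce

# Hypothetical registry of named, predefined transformations.
TRANSFORMS = {}


def transform(name):
    """Register a function under a name so it can be referred to by label."""
    def register(fn):
        TRANSFORMS[name] = fn
        return fn
    return register


@transform("decodeInt16BE")
def decode_int16_be(data):
    """Attach to file contents: raw bytes -> list of big-endian 16-bit ints."""
    count = len(data) // 2
    return list(struct.unpack(">%dh" % count, data))


@transform("scaleByTenth")
def scale_by_tenth(values):
    """Populate the model with scaled values (stored in tenths of a unit)."""
    return [v / 10 for v in values]


def compose(*names):
    """Chain registered transformations, applied left to right."""
    steps = [TRANSFORMS[n] for n in names]
    return lambda data: reduce(lambda acc, step: step(acc), steps, data)


pipeline = compose("decodeInt16BE", "scaleByTenth")
raw = struct.pack(">3h", 215, -40, 0)
print(pipeline(raw))  # [21.5, -4.0, 0.0]
```

New transformations are added by registering another function, which is one simple way to read "extensible" here.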
Or to put it another way…
  XSD defines models for XML documents
  DFDL extends XSD to define models for data in different formats
  Efficient read/write access to binary and text data sources using DOM/SAX
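As a rough illustration of element-style access with non-string values, the sketch below wraps fields of a binary buffer in node objects that decode on demand and write back in place. The `BinaryNode` class, the `build_nodes` helper, and the field list are assumptions for this example only; they are not the DOM or SAX interfaces, nor an API defined by the WG.

```python
import struct


class BinaryNode:
    """Element-like view over one field of a binary buffer (hypothetical API)."""

    def __init__(self, name, buffer, offset, code):
        self.name = name
        self._buffer = buffer
        self._offset = offset
        self._struct = struct.Struct("<" + code)  # assumed little-endian

    @property
    def value(self):
        """Typed (non-string) access: decode straight from the buffer."""
        return self._struct.unpack_from(self._buffer, self._offset)[0]

    @value.setter
    def value(self, new_value):
        """Write the typed value back into the buffer in place."""
        self._struct.pack_into(self._buffer, self._offset, new_value)

    @property
    def text(self):
        """String access, as a conventional DOM-style API would offer."""
        return str(self.value)


def build_nodes(buffer, fields):
    """Lay out (name, type code) fields consecutively over the buffer."""
    nodes, offset = {}, 0
    for name, code in fields:
        node = BinaryNode(name, buffer, offset, code)
        nodes[name] = node
        offset += node._struct.size
    return nodes


buffer = bytearray(struct.pack("<id", 42, 1.5))
nodes = build_nodes(buffer, [("count", "i"), ("ratio", "d")])
nodes["count"].value += 1  # read and write without any string conversion
```

No intermediate XML text is produced: values are decoded only when a node is read, which is the sense in which such access could stay efficient.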
Current status
  WG status
    – Formed 1 year ago
    – 6 months spent on a false start
    – First draft expected at GGF11
  Key discussion:
    – Mapping/transformation language
    – Linking mechanisms
    – XML representation
    – Flexibility
Getting involved
  Webpages:
  Mailing list:
  My address: