Introduction to the BinX Library
eDIKT project team
Ted Wen, Robert Carroll
Agenda
- About the BinX project
- A brief introduction to the BinX language
- Introduction to the BinX library
- Advanced API to the BinX library
- Use cases and requirements
  - Dr Bob Mann
  - Dr Chris Maynard
- Discussion
About the BinX project
The problem
- XML is useful to represent metadata
- Scientific datasets can be too large in XML
- Most scientific data are in binary files
- Binary data files are not all standardized
- Binary data files are platform-dependent
BinX – a solution
- Initially designed for the Grid environment
- Annotate data schema for any binary file
- Data elements are marked up in XML
- Describe three levels of features in a binary file
  - Underlying physical representation (byte order)
  - Primitive data types (integer, float)
  - Structure of the dataset (array, table)
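The three levels can be illustrated with a short sketch. It uses Python's standard `struct` module rather than the BinX library, and the four sample bytes are made up for the illustration. The same bytes yield different values depending on the byte order, the primitive type, and the structure assumed, which is the information a BinX description records.

```python
import struct

# Four bytes as they might sit in a binary file (made-up sample data).
raw = bytes([0x00, 0x00, 0x01, 0x00])

# Level 1: physical representation. The byte order changes the value.
as_big_endian = struct.unpack(">i", raw)[0]     # 32-bit integer, big-endian
as_little_endian = struct.unpack("<i", raw)[0]  # 32-bit integer, little-endian

# Level 2: primitive data type. The same bytes read as a 32-bit float.
as_float = struct.unpack(">f", raw)[0]

# Level 3: structure. The same bytes read as an array of two 16-bit integers.
as_pair = struct.unpack(">2h", raw)

print(as_big_endian, as_little_endian, as_float, as_pair)
```

Without a description of all three levels, a reader of the file cannot tell which of these interpretations the producer intended.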
The BinX project at eDIKT
- Implement a software library for BinX
- Develop a series of tools based on the library
- Use C++ for performance
- Write portable code for different platforms
- Make the library robust and easy to use
Development status
- Requirements gathering from July 2002
- Development started in October 2002
- Prototype finished in December 2002
- Alpha version complete in April 2003
- Beta version to be released in June 2003
The deliverables
- The BinX library
  - Compiled code on different platforms
  - Source code with Open Source license
- Documentation
  - User's guide
  - Developer's guide
- Utilities and examples
The BinX Language
What is BinX?
- The Binary XML Description Language
- A language for annotating binary data files
- It describes data types, data structures and attributes such as byte order
- A BinX document is an XML file with metadata of a binary data file
A BinX document
[Figure: annotated BinX document with labels for the root element, data class section, data instance section, and abstract data type]
Data elements
- Primitive data elements
  - Byte, character, integer, real
- Complex data elements
  - Arrays, struct, union
- User-defined data elements
Primitive data types
- Bit
- Character
- Integer
- Real
Complex data types
- Arrays
  - Repetitive collection of any data element
  - Multidimensional
  - Three types of arrays
    - Fixed-length array
    - Variable-length array
    - Streamed array
- Struct
  - A sequence of data elements
- Union
  - One of a group of possible data elements, selected by the discriminant
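The difference between a struct and a union can be sketched as follows. This is an illustration with Python's `struct` module, not the BinX API: the field names, the one-byte discriminant, and the mapping of discriminant values to member types are all assumptions made for the example.

```python
import struct

def read_struct(buf, offset=0):
    """A struct is a fixed sequence of elements read one after another.
    Hypothetical layout: a 16-bit integer followed by a 32-bit float."""
    ident, weight = struct.unpack_from(">hf", buf, offset)
    return {"id": ident, "weight": weight}, offset + 6

def read_union(buf, offset=0):
    """A union holds exactly one of several possible elements.
    Hypothetical layout: a one-byte discriminant selects the member type."""
    tag = buf[offset]
    if tag == 0:  # assumed: 0 means a 32-bit integer follows
        (value,) = struct.unpack_from(">i", buf, offset + 1)
        return value, offset + 5
    if tag == 1:  # assumed: 1 means a 64-bit float follows
        (value,) = struct.unpack_from(">d", buf, offset + 1)
        return value, offset + 9
    raise ValueError(f"unknown discriminant {tag}")

# Sample data: one struct, then two unions with different discriminants.
data = (struct.pack(">hf", 3, 0.5)
        + bytes([0]) + struct.pack(">i", 42)
        + bytes([1]) + struct.pack(">d", 2.5))

record, pos = read_struct(data, 0)
first, pos = read_union(data, pos)
second, pos = read_union(data, pos)
print(record, first, second)
```

The reader of a union cannot know how many bytes to consume until it has read the discriminant, which is why the discriminant has to be part of the description.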
Arrays
- Fixed-length array
- Variable-length array
- Streamed array
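One plausible reading of the three array kinds is sketched below, again with Python's `struct` module rather than the BinX library. The assumptions are: a fixed-length array takes its element count from the description, a variable-length array stores its count in the data ahead of the elements (here as a 16-bit prefix), and a streamed array runs to the end of the data.

```python
import struct

ELEMENT_SIZE = 4  # all three sketches use big-endian 32-bit integers

def read_fixed(buf, offset, count):
    """Fixed-length: the count comes from the description, not the data."""
    values = list(struct.unpack_from(f">{count}i", buf, offset))
    return values, offset + count * ELEMENT_SIZE

def read_variable(buf, offset):
    """Variable-length: assumed 16-bit count stored before the elements."""
    (count,) = struct.unpack_from(">H", buf, offset)
    return read_fixed(buf, offset + 2, count)

def read_streamed(buf, offset):
    """Streamed: no count at all; elements continue to the end of the data."""
    count = (len(buf) - offset) // ELEMENT_SIZE
    return read_fixed(buf, offset, count)

data = (struct.pack(">3i", 1, 2, 3)          # fixed-length array of 3
        + struct.pack(">H2i", 2, 10, 20)     # count prefix, then 2 elements
        + struct.pack(">4i", 7, 8, 9, 10))   # streamed: the rest of the data

fixed, pos = read_fixed(data, 0, 3)
variable, pos = read_variable(data, pos)
streamed, pos = read_streamed(data, pos)
print(fixed, variable, streamed)
```

A streamed array can only be the last element of a dataset under this reading, because nothing marks where it ends other than the end of the file.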
Struct
Union
User-defined data type
Data elements as instances
Reference defined elements
The BinX Library Alpha version
Fundamental requirements
- Access to data elements in binary files via BinX
  - Parse the BinX document
  - Build in-memory data structures
  - Read data values from the binary file
- Automatic conversion
  - Byte ordering
  - Padding
- Producing BinX document and binary data
  - Generate BinX document for data structures
  - Save assigned data values into binary files
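The read path in these requirements can be sketched in a few lines. The `SCHEMA` dictionary below is a hypothetical stand-in for a parsed BinX document, not BinX syntax, and the function names are invented for the illustration. The sketch shows the description driving both the byte-order conversion and the skipping of padding bytes.

```python
import struct

# Hypothetical stand-in for a parsed BinX document: a byte order plus an
# ordered list of (name, struct format code). A name of None marks padding.
SCHEMA = {
    "byte_order": "big",
    "fields": [("id", "h"), (None, "2x"), ("temperature", "f")],
}

def build_format(schema):
    """Turn the description into a struct format string."""
    prefix = ">" if schema["byte_order"] == "big" else "<"
    return prefix + "".join(code for _, code in schema["fields"])

def read_record(schema, buf, offset=0):
    """Read one record; padding bytes are skipped and produce no value."""
    names = [name for name, _ in schema["fields"] if name is not None]
    values = struct.unpack_from(build_format(schema), buf, offset)
    return dict(zip(names, values))

# A record written big-endian with two padding bytes after the id.
data = struct.pack(">h2xf", 7, 1.5)
record = read_record(SCHEMA, data)
print(record)
```

The application code never mentions byte order or padding; both come from the description, which is the point of keeping the annotation separate from the data.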
General use cases
- Data conversion (byte order)
- Data extraction (sub-dataset)
- Data combination (two arrays to one)
- Data presentation (browse, pure XML)
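The first three use cases reduce to simple operations once the element type is known. The sketch below uses Python's `struct` module and hypothetical helper names. It assumes a flat array of 32-bit integers and treats combination as concatenation, which is only one possible meaning of "two arrays to one".

```python
import struct

def big_to_little(buf, code="i"):
    """Conversion: rewrite a big-endian array in little-endian byte order."""
    size = struct.calcsize(">" + code)
    count = len(buf) // size
    values = struct.unpack_from(f">{count}{code}", buf)
    return struct.pack(f"<{count}{code}", *values)

def extract(buf, start, stop, size=4):
    """Extraction: take elements start..stop-1 as a sub-dataset."""
    return buf[start * size:stop * size]

def combine(first, second):
    """Combination: append one array to another (same type and byte order)."""
    return first + second

big = struct.pack(">4i", 1, 2, 3, 4)
little = big_to_little(big)
middle = extract(little, 1, 3)
joined = combine(middle, struct.pack("<i", 99))
print(struct.unpack("<3i", joined))
```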
BinX Components
- The library has core functionality to support generic utilities and applications
[Diagram: three layers]
- Applications: domain-specific
- Utilities: generic tools (data conversion, extraction, packing/unpacking)
- BinX library core: BinX core functionality (parse BinX document, read binary data)
The BinX library core
- Input: SchemaBinX, binary data file
- Output: DataBinX, in-memory dataset
[Diagram: the BinX library producing an in-memory data structure, with values loaded on demand]
The BinX Utilities
- DataBinX generator
- DataBinX splitter
- SchemaBinX creator
- Binary file indexer
DataBinX generator
- Put binary data inside XML
- For browsing, web service return, query result set
[Diagram: the BinX library]
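Putting binary values inside XML can be sketched as below. The element names `dataset`, `array`, and `integer` are placeholders chosen for the illustration and should not be read as actual DataBinX markup. The code uses Python's standard `struct` and `xml.etree.ElementTree` modules, not the BinX library.

```python
import struct
import xml.etree.ElementTree as ET

def to_xml(buf, count):
    """Wrap each 32-bit integer in the buffer in its own XML element.
    Element names are hypothetical, not real DataBinX tags."""
    root = ET.Element("dataset")
    array = ET.SubElement(root, "array")
    for value in struct.unpack_from(f">{count}i", buf):
        ET.SubElement(array, "integer").text = str(value)
    return ET.tostring(root, encoding="unicode")

binary = struct.pack(">3i", 1, 2, 3)
xml_text = to_xml(binary, 3)
print(len(binary), "bytes of binary,", len(xml_text), "characters of XML")
print(xml_text)
```

Each value gains an opening and a closing tag, so the XML form is several times larger than the binary it carries. That suits browsing and small result sets better than bulk storage.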
DataBinX splitter
- The reverse of DataBinX generator
- Generate binary file for testing, transportation
- Cross-platform (byte order)
[Diagram: the BinX library]
SchemaBinX creator
- GUI and Web-based utilities
- Build BinX document interactively
- Create a BinX document based on another
Binary file indexer
- Generating indices for binary data files
- Such indices can be used for fast data access
[Diagram: the BinX library]
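The idea behind an index can be shown with a minimal sketch. It assumes a file of variable-length records, each preceded by a 16-bit length; that layout and the function names are assumptions for the illustration, not the indexer's actual format. One sequential pass records where each record starts, and later lookups jump straight to that offset.

```python
import struct

def build_index(buf):
    """Scan the data once and record the byte offset of every record."""
    offsets = []
    pos = 0
    while pos < len(buf):
        offsets.append(pos)
        (length,) = struct.unpack_from(">H", buf, pos)
        pos += 2 + length  # skip the length prefix and the payload
    return offsets

def get_record(buf, index, n):
    """Fetch record n directly, without scanning the records before it."""
    pos = index[n]
    (length,) = struct.unpack_from(">H", buf, pos)
    return buf[pos + 2:pos + 2 + length]

records = [b"ab", b"cdef", b"g"]
data = b"".join(struct.pack(">H", len(r)) + r for r in records)

index = build_index(data)
print(index, get_record(data, index, 1))
```

With a real file the same offsets would be passed to a seek call, so that reading one record does not require reading everything before it.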
Applications for astronomy
- FITS and VOTable conversion
[Diagram: a FITS file (SIMPLE = T … END) and an XML document (<?xml version=. …) connected through the DataBinX utility and the BinX library core]
FITS to VOTable conversion
[Diagram: FITS to DataBinX to VOTable. Components shown: FITS, preprocessor, SchemaBinX, DataBinX utility, DataBinX, XSLT transformer, VOTable]
VOTable to FITS conversion
[Diagram: VOTable to DataBinX to FITS. Components shown: VOTable, XSLT transformer, XSLT preprocessor, DataBinX, SchemaBinX, DataBinX utility, binary data, post-processor, FITS header, FITS]
FITS-VOTable experiment
- Sample FITS file
  - A data table of 82 rows x 20 fields
  - File size: 37 KB
- Generated DataBinX by DataBinX utility
  - Time spent: 268 ms
  - DataBinX document size: 1.2 MB
- VOTable transformed by MSXML
  - Time spent: about 1 second
  - VOTable document size: 51 KB
Possible future releases
- DataBinX parsing
- Utilities (GUI BinX editor)
- XPath-based data query
- DFDL support
- Preserving special tags
  - For comments, application-specific tags
- Text file support
Features or issues to consider
- Converting floating point numbers
  - 80-bit, 96-bit, 128-bit floating point
- Array manipulation (slice, section)
- SAX-based XML document parsing
  - Use cases in place of DOM parsing
  - Built in the library or as add-on component?
- Database support
  - Annotating database tables?
  - Query database tables through BinX?
- Java version of the library
  - Keeping exactly the same features as the C++ version?
- Supporting XQuery
  - Query binary data files with XQuery on BinX
Support
- For problems of usage: (coming soon)
- For requirements and suggestions: