e-Science Data Information and Knowledge Transformation The BinX Language
What is BinX?
Binary in XML
–Use XML to mark up binary data
–Mark up data types
–Mark up sequences
–Mark up arrays
–Mark up complex structures
Primitive Data Types
Mark up data types
[Figure: sample byte values (FF 7F 7F FF FF FF C C) annotated with primitive data types]
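Why a byte pair such as FF 7F needs a declared type: the same two bytes give different values depending on signedness and byte order. The following is a minimal sketch using Python's standard `struct` module, not the BinX library; the variable names are illustrative only.

```python
import struct

raw = bytes([0xFF, 0x7F])  # the two bytes FF 7F

# The same bytes read under four different type/byte-order declarations.
little_signed = struct.unpack("<h", raw)[0]    # little-endian signed 16-bit: 0x7FFF
big_signed = struct.unpack(">h", raw)[0]       # big-endian signed 16-bit: 0xFF7F
little_unsigned = struct.unpack("<H", raw)[0]  # little-endian unsigned 16-bit
big_unsigned = struct.unpack(">H", raw)[0]     # big-endian unsigned 16-bit

print(little_signed, big_signed, little_unsigned, big_unsigned)
```

Marking up the type and byte order removes this ambiguity for any reader of the file.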
Abstract “struct” types
Mark up a sequence
Screen descriptor in GIF:
–Screen width: unsigned short
–Screen height: unsigned short
–Packed field: byte
–Background colour index: byte
–Pixel aspect ratio: byte
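The GIF screen descriptor can be read as a fixed sequence of five fields. The sketch below uses Python's `struct` module rather than BinX itself. It assumes the descriptor follows the 6-byte GIF signature and is stored little-endian, as in the GIF format; the names `ScreenDescriptor`, `LAYOUT` and `read_screen_descriptor` are made up for this illustration.

```python
import struct
from collections import namedtuple

# Field order follows the slide: two unsigned shorts, then three bytes.
ScreenDescriptor = namedtuple(
    "ScreenDescriptor",
    "width height packed background_index pixel_aspect_ratio",
)
LAYOUT = "<HHBBB"     # little-endian, no padding: 2 + 2 + 1 + 1 + 1 = 7 bytes
SIGNATURE_LENGTH = 6  # "GIF87a" or "GIF89a"

def read_screen_descriptor(gif_bytes):
    """Unpack the screen descriptor that follows the GIF signature."""
    fields = struct.unpack_from(LAYOUT, gif_bytes, SIGNATURE_LENGTH)
    return ScreenDescriptor(*fields)

# Synthetic input built in memory so the example is self-contained.
sample = b"GIF89a" + struct.pack(LAYOUT, 640, 480, 0xF7, 0, 0)
desc = read_screen_descriptor(sample)
print(desc)
```

The format string plays the role of the struct markup: it fixes the order, width and byte order of each field.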
Abstract “array” types
Mark up an array
A 2-dimensional array of 10-by-100 32-bit integers
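A 10-by-100 array of 32-bit integers occupies 4000 bytes, and the dimensions are what allow a reader to recover rows and columns from that flat run. The sketch below assumes big-endian, row-major storage; the slide specifies neither, and the data is synthetic.

```python
import struct

ROWS, COLS = 10, 100
COUNT = ROWS * COLS

# Synthetic binary payload: the integers 0..999, big-endian (assumed).
raw = struct.pack(f">{COUNT}i", *range(COUNT))

# Decode the flat run, then regroup it into rows (row-major order assumed).
flat = struct.unpack(f">{COUNT}i", raw)
grid = [list(flat[r * COLS:(r + 1) * COLS]) for r in range(ROWS)]

print(len(raw), len(grid), len(grid[0]), grid[3][7])
```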
Embedded abstract types Complex structures
User-defined metadata Label the data types and structures
Reusable type definitions Define macros for reuse
Linking to binary data Reference the binary data file … …
A BinX document
–Root element
–Data class section
–Data instance section
–Abstract data type
DataBinX DataBinX = BinX with Data
The BinX Library
BinX Components
The library provides core functionality to support generic utilities and applications
BinX Library Core (BinX core functionality)
–Parse/generate BinX documents
–Read/write binary data
–Parse/generate DataBinX
Utilities (generic tools)
–DataBinX pack/unpack
–Extractor, viewer
–BinX editor
Applications
–Domain-specific
BinX application models
–Data catalogue model
–Data manipulation model
–Data query model
–Data service model
–Data transportation model
Data catalogue model
Primary storage: Binary data files
Metadata: Syntactic annotation, Semantic annotation
Classification: Domain specific
Cross-reference: XLink
[Figure: BinX documents (BinX 1, BinX 1.1, BinX 1.2) layered between the binary data and the metadata, ranging from detailed to abstract]
Data manipulation model
Extraction
–Subset of a dataset
Combination
–Merge several datasets
Transformation
–Conversion of data types
–Change of sequence order
–Transposition of array dimensions
Transparency
–Automatic change of byte order
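Two of the manipulations listed, byte-order change and transposition of array dimensions, can be sketched in a few lines of plain Python. This is an illustration of the operations only, not the BinX library's interface; `swap_byte_order` and `transpose` are hypothetical helper names.

```python
import struct

def swap_byte_order(raw, width):
    """Reverse the bytes of every fixed-width element in a buffer."""
    if len(raw) % width != 0:
        raise ValueError("buffer length is not a multiple of the element width")
    return b"".join(raw[i:i + width][::-1] for i in range(0, len(raw), width))

def transpose(grid):
    """Swap the two dimensions of a rectangular list of rows."""
    return [list(column) for column in zip(*grid)]

# Byte-order change: three big-endian 32-bit integers become little-endian.
big = struct.pack(">3i", 1, 256, -2)
little = swap_byte_order(big, 4)
decoded = struct.unpack("<3i", little)

# Transposition: a 2-by-3 array becomes 3-by-2.
transposed = transpose([[1, 2, 3], [4, 5, 6]])

print(decoded, transposed)
```

Because the markup already records element width and dimensions, such conversions can be driven by the description rather than written per file format.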
Data query model
In-dataset query
–XPath against virtual XML
Cross-dataset query
–Link into multiple datasets
Defining result format
–XQuery-based return fragment
Output interface
–SAX events
[Figure: applications querying BinX data sources through the BinX library and utility using XPath, XLink and custom XQuery, with results delivered as DataBinX, as VOTable via a transform, or as SAX events]
Data service model
Publishing logical datasets in BinX
–Dataset from one binary file
–Dataset from several binary files
–Dataset from multiple data sources
[Figure: DB, client and Grid connected through BinX]
Data transportation model
DataBinX as interlingua
[Figure: matching send and receive pipelines through the stages XML document, DataBinX, BinX + Binary and ZIP (MIME), using XSLT, the BinX utility with a schema BinX document, and a ZIP tool]
Application in Astronomy Case Study 1 Data Conversion Between FITS and VOTable
Application in astronomy
FITS and VOTable conversion
[Figure: FITS files (SIMPLE = T … END) and VOTable documents (<?xml version=. …) converted in both directions by the DataBinX utility, built on the BinX library core]
FITS file
Primary HDU
–Header (card columns 0 to 79):
SIMPLE = T / file does conform to FITS standard
BITPIX = 8 / number of bits per data pixel
NAXIS = 1 / number of data axes
…
END
–Data: 3D 4A 14 0F 1C FE … …
Extension
–Header:
XTENSION= 'BINTABLE' / binary table extension
BITPIX = 8 / 8-bit bytes
NAXIS = 2 / 2-dimensional binary table
…
END
–Data: 7B 3E 40 2C E7 6F … …
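The header/data split shown for a FITS file can be sketched as follows. The code relies on two properties of the FITS standard: header cards are 80 characters wide, and the header is padded to a multiple of 2880 bytes. Splitting each card at the first "/" is a simplification that would mishandle string values containing a slash. `parse_header` and `make_card` are hypothetical helpers, not part of BinX, and the sample header is synthetic.

```python
CARD = 80     # width of one header card in bytes
BLOCK = 2880  # FITS headers and data are padded to this block size

def parse_header(raw):
    """Return (keyword -> value text, byte offset where the data unit starts)."""
    cards = {}
    for start in range(0, len(raw), CARD):
        card = raw[start:start + CARD].decode("ascii")
        keyword = card[:8].strip()
        if keyword == "END":
            header_length = start + CARD
            blocks = -(-header_length // BLOCK)  # ceiling division
            return cards, blocks * BLOCK
        if card[8:10] == "= ":
            # Simplification: treat everything after the first "/" as comment.
            cards[keyword] = card[10:].split("/", 1)[0].strip()
    raise ValueError("no END card found")

def make_card(text):
    """Pad a card to 80 characters (for building the synthetic sample)."""
    return text.ljust(CARD).encode("ascii")

header = b"".join(make_card(text) for text in [
    "SIMPLE  =                    T / file does conform to FITS standard",
    "BITPIX  =                    8 / number of bits per data pixel",
    "NAXIS   =                    1 / number of data axes",
    "END",
]).ljust(BLOCK, b" ")

cards, data_offset = parse_header(header)
print(cards, data_offset)
```

A preprocessor of this kind is what lets the binary data unit be described separately from the textual header.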
VOTable Procyon
FITS → DataBinX → VOTable
FITS to VOTable conversion
[Figure: conversion pipeline with the components FITS, preprocessor, schema BinX, DataBinX utility, DataBinX, XSLT transformer and VOTable]
VOTable → DataBinX → FITS
VOTable to FITS conversion
[Figure: conversion pipeline with the components VOTable, XSLT, XSLT transformer, preprocessor, DataBinX, schema BinX, DataBinX utility, binary data, post processor, FITS header and FITS]
FITS-VOTable experiment
Sample FITS file
–A data table of 82 rows × 20 fields
–File size: 37 KB
DataBinX generated by the DataBinX utility
–Time spent: 268 ms
–DataBinX document size: 1.2 MB
VOTable transformed by MSXML
–Time spent: about 1 second
–VOTable document size: 51 KB
Application in Astronomy Case Study 2 Data Transportation by pipelining BinX and VOTable
The Problem
Three kinds of VOTable data sources
–Pure XML VOTable (large)
–VOTable + FITS (small)
–VOTable + Binary (smaller)
Difficulties
–Additional parser for VOTable+Binary
–Limited binary format
–Byte order and data types
The Solution: VOTable + BinX
–No coding necessary
–Smaller data files
–Easy to separate and restore
–Pipelined to work in the background
–Platform independent
Approaches
1. Embedded BinX
2. BinX document linking
Perhaps another method?
Embedded BinX Example:
BinX Document Linking Example:
Comparison of the two approaches
Embedded BinX
–Advantages: one annotation file; consistency with VOTable definitions
–Disadvantages: spoils the VOTable document; difficult to parse
BinX document linking
–Advantages: keeps the VOTable clean; easy to parse
–Disadvantages: needs a separate BinX document; difficult to keep consistent
BinX Software Today and the Future
Future releases
–Utilities (GUI BinX editor)
–XPath-based data query
–DFDL support
–Text file support
–Output through SAX events
–Output as XQuery return
–Database interfacing
–Java wrapper for utilities
Support
Information and software download:
– (coming soon)
Questions:
Requirements and suggestions: