Introduction to the BinX Library eDIKT project team Ted Wen Robert Carroll



Agenda
- About the BinX project
- A brief introduction to the BinX language
- Introduction to the BinX library
- Advanced API to the BinX library
- Use cases and requirements
  - Dr Bob Mann
  - Dr Chris Maynard
- Discussion

About the BinX project

The problem
- XML is useful to represent metadata
- Scientific datasets can be too large in XML
- Most scientific data are in binary files
- Binary data files are not all standardized
- Binary data files are platform-dependent

BinX – a solution
- Initially designed for the Grid environment
- Annotate data schema for any binary file
- Data elements are marked up in XML
- Describe three levels of features in a binary file
  - Underlying physical representation (byte order)
  - Primitive data types (integer, float)
  - Structure of the dataset (array, table)
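The three levels listed above can be illustrated with a minimal Python sketch. It uses only the standard `struct` module and is not BinX code; the sample bytes are made up for illustration. The same four bytes give different values depending on the byte order, the primitive type and the structure assumed, which is why a description of the file has to state all three.

```python
import struct

# Four bytes as they might appear in a binary file (made-up sample data).
raw = bytes([0x00, 0x00, 0x01, 0x00])

# Level 1: physical representation (byte order).
big = struct.unpack(">i", raw)[0]      # read as big-endian 32-bit integer: 256
little = struct.unpack("<i", raw)[0]   # read as little-endian 32-bit integer: 65536

# Level 2: primitive data type. The same bytes read as a 32-bit float
# give a tiny positive number instead of an integer.
as_float = struct.unpack(">f", raw)[0]

# Level 3: structure of the dataset. The same bytes read as an array
# of two big-endian 16-bit integers.
as_array = struct.unpack(">2h", raw)   # (0, 256)
```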

The BinX project at eDIKT
- Implementing a software library for BinX
- Develop a series of tools based on the library
- Choose C++ for performance
- Write portable code for different platforms
- Robust and easy to use

Development status
- Requirement gathering from July 2002
- Development started in October 2002
- Prototype finished in December 2002
- Alpha version complete in April 2003
- Beta version to be released in June 2003

The deliverables
- The BinX library
  - Compiled code on different platforms
  - Source code with Open Source license
- Documentation
  - Users guide
  - Developers guide
- Utilities and examples

The BinX Language

What is BinX?
- The Binary XML Description Language
- A language for annotating binary data files
- It describes data types, data structures and attributes such as byte order
- A BinX document is an XML file with metadata of a binary data file

A BinX document
(Figure labels: root element, data class section, data instance section, abstract data type)

Data elements
- Primitive data elements
  - Byte, character, integer, real
- Complex data elements
  - Arrays, struct, union
- User-defined data elements

Primitive data types
- Bit
- Character
- Integer
- Real

Complex data types
- Arrays
  - Repetitive collection of any data element
  - Multidimensional
  - Three types of arrays
    - Fixed-length array
    - Variable-length array
    - Streamed array
- Struct
  - A sequence of data elements
- Union
  - One of a group of possible data elements conditional to the discriminant
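The complex types above can be sketched as plain Python readers over a byte stream. This is an illustration using the standard `struct` and `io` modules, not the BinX library; the function names, the count prefix on the variable-length array and the one-byte discriminant are all assumptions made for the example.

```python
import io
import struct

def read_fixed_array(stream, count):
    """Fixed-length array: the element count is known in advance."""
    return list(struct.unpack(f">{count}i", stream.read(4 * count)))

def read_variable_array(stream):
    """Variable-length array: assumed here to be prefixed by its element count."""
    (count,) = struct.unpack(">i", stream.read(4))
    return read_fixed_array(stream, count)

def read_streamed_array(stream):
    """Streamed array: elements are read until the data runs out."""
    values = []
    while True:
        chunk = stream.read(4)
        if len(chunk) < 4:
            break
        values.append(struct.unpack(">i", chunk)[0])
    return values

def read_struct(stream):
    """Struct: a fixed sequence of members, here an int32 then a float64."""
    ident, value = struct.unpack(">id", stream.read(12))
    return {"id": ident, "value": value}

def read_union(stream):
    """Union: a discriminant (assumed one byte) selects the member that follows."""
    (tag,) = struct.unpack(">B", stream.read(1))
    if tag == 0:
        return struct.unpack(">i", stream.read(4))[0]
    if tag == 1:
        return struct.unpack(">d", stream.read(8))[0]
    raise ValueError(f"unknown discriminant {tag}")

# Made-up sample data, written big-endian with no padding.
data = io.BytesIO(
    struct.pack(">3i", 1, 2, 3)        # fixed-length array of 3 integers
    + struct.pack(">i2i", 2, 10, 20)   # count, then 2 integers
    + struct.pack(">id", 7, 0.5)       # struct members
    + struct.pack(">Bd", 1, 2.5)       # union: tag 1 selects the float64 member
    + struct.pack(">2i", 8, 9)         # streamed array running to end of data
)

fixed = read_fixed_array(data, 3)
variable = read_variable_array(data)
record = read_struct(data)
choice = read_union(data)
streamed = read_streamed_array(data)
```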

Arrays
- Fixed-length array
- Variable-length array
- Streamed array

Struct

Union

User-defined data type

Data elements as instances

Reference defined elements

The BinX Library Alpha version

Fundamental requirements
- Access to data elements in binary files via BinX
  - Parse the BinX document
  - Build in-memory data structures
  - Read data values from the binary file
- Automatic conversion
  - Byte ordering
  - Padding
- Producing BinX document and binary data
  - Generate BinX document for data structures
  - Save assigned data values into binary files
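The first group of requirements (parse the description, build an in-memory structure, read values with byte-order conversion) can be sketched in a few lines of Python. The XML vocabulary below (`schema`, `field`, `byteOrder`, `int32`, `float64`) is a hypothetical stand-in invented for this example and is not BinX syntax; the function names are likewise illustrative.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical description of a two-field binary record (not real BinX syntax).
SCHEMA = """
<schema byteOrder="bigEndian">
  <field name="count" type="int32"/>
  <field name="scale" type="float64"/>
</schema>
"""

# Mapping from the hypothetical type names to struct format codes.
FORMATS = {"int32": "i", "float64": "d"}

def build_format(schema_xml):
    """Parse the description and return (struct format, field names)."""
    root = ET.fromstring(schema_xml)
    # '>' and '<' also switch off native alignment, so no padding is assumed.
    prefix = ">" if root.get("byteOrder") == "bigEndian" else "<"
    fields = root.findall("field")
    fmt = prefix + "".join(FORMATS[f.get("type")] for f in fields)
    return fmt, [f.get("name") for f in fields]

def read_record(schema_xml, data):
    """Read one record from bytes into a dict, converting byte order."""
    fmt, names = build_format(schema_xml)
    values = struct.unpack(fmt, data[:struct.calcsize(fmt)])
    return dict(zip(names, values))

def write_record(schema_xml, record):
    """Save assigned values back into bytes in the described layout."""
    fmt, names = build_format(schema_xml)
    return struct.pack(fmt, *(record[n] for n in names))

sample = struct.pack(">id", 5, 0.5)     # made-up big-endian input
record = read_record(SCHEMA, sample)    # {'count': 5, 'scale': 0.5}
round_trip = write_record(SCHEMA, record)
```

A file written with native alignment would need padding bytes accounted for in the format string; this sketch assumes a packed layout.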

General use cases
- Data conversion (byte order)
- Data extraction (sub-dataset)
- Data combination (two arrays to one)
- Data presentation (browse, pure XML)

BinX Components
- The library has core functionality to support generic utilities and applications
(Diagram layers:)
- BinX Library Core: BinX core functionality (parse BinX document, read binary data)
- Utilities: generic tools (data conversion, extraction, packing/unpacking)
- Applications: domain-specific

The BinX library core
- Input: SchemaBinX, binary data file
- Output: DataBinX, in-memory dataset
(Diagram: the BinX library builds an in-memory data structure; values are loaded on demand)
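Loading values on demand can be sketched as follows. This is an illustrative Python class, not the library's actual design: `LazyInt32Array` and its `reads` counter are hypothetical names, and the example assumes a packed array of 32-bit integers at a known offset.

```python
import io
import struct

class LazyInt32Array:
    """Array view over a stream that reads an element only when it is indexed."""

    def __init__(self, stream, offset, count, byte_order=">"):
        self._stream = stream
        self._offset = offset
        self._count = count
        self._order = byte_order
        self.reads = 0  # number of elements actually fetched from the stream

    def __len__(self):
        return self._count

    def __getitem__(self, index):
        if not 0 <= index < self._count:
            raise IndexError(index)
        # Seek straight to the element instead of loading the whole array.
        self._stream.seek(self._offset + 4 * index)
        self.reads += 1
        return struct.unpack(self._order + "i", self._stream.read(4))[0]

# Made-up sample: five big-endian integers held in memory as a stand-in for a file.
stream = io.BytesIO(struct.pack(">5i", 10, 20, 30, 40, 50))
array = LazyInt32Array(stream, offset=0, count=5)
fourth = array[3]   # only this element is read
```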

The BinX Utilities
- DataBinX generator
- DataBinX splitter
- SchemaBinX creator
- Binary file indexer

DataBinX generator
- Put binary data inside XML
- For browsing, web service return, query result set
(Diagram: the BinX library)
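The idea of putting binary data inside XML can be sketched with the Python standard library. The element names (`dataset`, `array`, `value`) are hypothetical and do not reproduce the DataBinX format; the function name is also an assumption.

```python
import struct
import xml.etree.ElementTree as ET

def binary_to_xml(data, count, byte_order=">"):
    """Decode `count` 32-bit integers and wrap each value in an XML element."""
    values = struct.unpack(f"{byte_order}{count}i", data)
    root = ET.Element("dataset")
    array = ET.SubElement(root, "array", type="int32")
    for value in values:
        ET.SubElement(array, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")

binary = struct.pack(">3i", 1, 2, 3)   # 12 bytes of made-up data
document = binary_to_xml(binary, 3)
```

Even in this tiny case the XML text is several times longer than the 12 input bytes, which is the size trade-off the "too large in XML" point refers to.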

DataBinX splitter
- The reverse of DataBinX generator
- Generate binary file for testing, transportation
- Cross-platform (byte order)
(Diagram: the BinX library)
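The reverse direction can be sketched the same way: values held in XML are written back out as binary in whichever byte order the target platform needs. The XML layout and the function name are hypothetical, chosen only for this example.

```python
import struct
import xml.etree.ElementTree as ET

def xml_to_binary(xml_text, byte_order="<"):
    """Collect integer values from the XML and pack them as 32-bit integers."""
    root = ET.fromstring(xml_text)
    values = [int(element.text) for element in root.iter("value")]
    return struct.pack(f"{byte_order}{len(values)}i", *values)

# Made-up input document in a hypothetical layout.
document = (
    '<dataset><array type="int32">'
    "<value>1</value><value>256</value>"
    "</array></dataset>"
)

little_endian = xml_to_binary(document, "<")
big_endian = xml_to_binary(document, ">")
```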

SchemaBinX creator
- GUI and Web-based utilities
- Build BinX document interactively
- Create a BinX document based on another

Binary file indexer
- Generating indices for binary data files
- Such indices can be used for fast data access
(Diagram: the BinX library, with X and Y labels)
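One way an index can speed up access is to record where each record starts, so a reader can seek directly to it. The sketch below assumes a made-up layout of length-prefixed records; neither the layout nor the function names come from the BinX indexer.

```python
import io
import struct

def build_index(stream):
    """Scan once and return the byte offset of every length-prefixed record."""
    offsets = []
    stream.seek(0)
    while True:
        position = stream.tell()
        header = stream.read(4)
        if len(header) < 4:
            break
        (length,) = struct.unpack(">i", header)
        offsets.append(position)
        stream.seek(length, io.SEEK_CUR)  # skip the payload without reading it
    return offsets

def read_record(stream, offsets, n):
    """Jump to record n using the index and return its payload."""
    stream.seek(offsets[n])
    (length,) = struct.unpack(">i", stream.read(4))
    return stream.read(length)

# Made-up sample: three records of different lengths.
payloads = [b"ab", b"cdef", b"g"]
stream = io.BytesIO(b"".join(struct.pack(">i", len(p)) + p for p in payloads))

index = build_index(stream)             # [0, 6, 14]
second = read_record(stream, index, 1)  # b"cdef"
```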

Applications for astronomy
- FITS and VOTable conversion
(Diagram: a FITS file ("SIMPLE = T … END") and an XML document ("<?xml version=. …") connected through the DataBinX utility and the BinX library core)

FITS to VOTable conversion
(Diagram: FITS, DataBinX, VOTable. Labels: FITS, Preprocessor, Schema BinX, DataBinX utility, DataBinX, XSLT transformer, VOTable)

VOTable to FITS conversion
(Diagram: VOTable, DataBinX, FITS. Labels: VOTable, XSLT transformer, XSLT, Preprocessor, DataBinX, Schema BinX, DataBinX utility, binary data, post processor, FITS header, FITS)

FITS-VOTable experiment
- Sample FITS file
  - A data table of 82 rows x 20 fields
  - File size: 37KB
- Generated DataBinX by DataBinX utility
  - Time spent: 268 ms
  - DataBinX document size: 1.2MB
- VOTable transformed by MSXML
  - Time spent: about 1 second
  - VOTable document size: 51KB

Possible future releases
- DataBinX parsing
- Utilities (GUI BinX editor)
- XPath-based data query
- DFDL support
- Preserving special tags
  - For comments, application-specific tags
- Text file support

Features or issues to consider
- Converting floating point numbers
  - 80-bit, 96-bit, 128-bit floating point
- Array manipulation (slice, section)
- SAX-based XML document parsing
  - Use cases in place of DOM parsing
  - Built in the library or as add-on component?
- Database support
  - Annotating database tables?
  - Query database tables through BinX?
- Java version of the library
  - Keeping exactly the same features with the C++ version?
- Supporting XQuery
  - Query binary data files with XQuery on BinX

Support
- For problems of usage: (coming soon)
- For requirements and suggestions: