Usability Issues Facing 21st Century Data Archives
Joey Mukherjee and David Winningham
Current Archiving Goal
  (Flow diagram; labels in slide order: Mission Team, Raw Data, Processed Data, Write Papers, Data Iteration, Quality Data, Archive, Future Scientists, Quality Data)
Current Archiving Reality
  (Flow diagram; labels in slide order: Mission Team, Raw Data, Processed Data, Write Papers, Data Iteration, Data Subsets, Permanent Archive, Future Scientists, Unchecked Data, Home Institution Archive, Public Data)
New Goal
  (Flow diagram; labels in slide order: Mission Team, Raw Data, Processed Data, Write Papers, Data Iteration, Processed Data, Archive, Future Scientists, Processed Data)
Standardizing HOWTO
  Make it easy
  Make it useful
  Make it extensible
Make it Easy
  Reading / writing files must be super easy (i.e. cheap!)
    –Either with tools or libraries
  Tools can be command line or GUI
Make it Useful
  How do I look at it?
    –Plots/Analysis
  What else can I do with it?
    –Read into IDL, MATLAB, Excel, etc.
  Must have immediate benefits
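One reading of "Read into IDL, Matlab, Excel" is that data in the common format should export to something those tools already open. The sketch below is only an illustration under that assumption: the function name `to_csv` and the column-dictionary layout are hypothetical and do not come from the talk or from any existing archive tool.

```python
import csv
import io


def to_csv(times, columns):
    """Return a time series as CSV text that spreadsheet tools can open.

    `times` is a list of timestamp strings; `columns` maps a (hypothetical)
    variable name to a list of values, one value per timestamp.
    """
    names = list(columns)
    for name in names:
        if len(columns[name]) != len(times):
            raise ValueError(f"column {name!r} does not match the time axis")

    buf = io.StringIO(newline="")
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["time"] + names)  # header row: time first, then variables
    for i, t in enumerate(times):
        writer.writerow([t] + [columns[name][i] for name in names])
    return buf.getvalue()
```

A call such as `to_csv(times, {"flux": flux_values})` yields text that can be saved with a `.csv` extension and opened directly, which is the kind of immediate benefit the slide asks for.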
Make it Extensible
  Must be possible for others to add value-added services
  Must be able to hold varieties of data
  Must agree to give up control over content
Case Studies: HTML
  Easy to create!
  Once done, look at in browser
  Embrace / Extend
Case Studies: SPASE
  Creation is slow and difficult
  Once created, no real benefits yet
  VxOs have embraced, no one extended yet
Case Studies: IDFS
  Until recently, difficult to create, complex
  Once in, easy to look at, use, archive, etc.
  Somewhat extensible
Things right with IDFS
  Efficient
  Self-documenting
  Calibrations stored in a text file
  Science units derived instead of stored
  Little to no reprocessing ever needed
Other IDFS Benefits
  Can store most types of space physics data from raw telemetry to highly processed science units
  Reversible from science units to raw telemetry
  Usable by data processor, scientist, and data archiver
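The idea of keeping calibrations in a text file and deriving science units on demand, reversibly, can be sketched in a few lines. This is not the IDFS format or API: the `key = value` calibration syntax, the linear gain/offset model, and all function names below are assumptions made purely for illustration.

```python
def parse_calibration(text):
    """Parse a hypothetical calibration file with lines like 'gain = 0.5'.

    Blank lines and lines starting with '#' are ignored.
    """
    cal = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        cal[key.strip()] = float(value)
    return cal


def to_science_units(raw_counts, cal):
    """Derive science units from stored raw counts (assumed linear model)."""
    return [cal["gain"] * c + cal["offset"] for c in raw_counts]


def to_raw_counts(science_values, cal):
    """Invert the linear model to recover the integer raw counts."""
    return [round((v - cal["offset"]) / cal["gain"]) for v in science_values]
```

Because only the raw counts are stored, correcting a calibration means editing the text file rather than reprocessing the archive, which is the point behind "little to no reprocessing ever needed".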
Things wrong with IDFS
  Overly complex format and API
  Not enough support in other tools - poor buy-in
  Analysis routines merged with the file format - tried to do too much!
Implementation Plan
  Develop a simple file format that can contain any and all types of time series space physics data
  Develop tools that allow someone to create and inspect files in this format
  Merge in the best parts of IDFS, CDF, netCDF, HDF, FITS, etc. without breaking the paradigm of simplicity
Simple File Format
  Format might already exist:
    –HDF5
    –XML
    –JSON
    –Other data models?
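To give a sense of how little code a simple container needs, here is a sketch that uses JSON, one of the candidates listed above, with only the Python standard library. The field names (`variable`, `units`, `times`, `values`) and both function names are hypothetical; the talk does not define a schema.

```python
import json


def write_series(path, variable, units, times, values):
    """Write one time series to a small self-describing JSON file."""
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    doc = {
        "variable": variable,  # hypothetical field names, not a standard
        "units": units,
        "times": list(times),
        "values": list(values),
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(doc, f, indent=2)


def read_series(path):
    """Read a file written by write_series back into a dictionary."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

A text format like this is easy to inspect by eye but bulky for large data volumes, which is one reason a binary option such as HDF5 is on the same list.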
Making it useful
  Get buy-in from visualization tools (SDDAS, DataShop, VisBard, IDL DLM, etc.)
  Get buy-in from archive sites (PDS, PSA, NSSDC, etc.)
  Seed money is essential
Advantages
  Providers
  Users
  Management
Advantages: Providers
  Instrument teams now have something to work toward
  Can develop expertise
Advantages: Users
  Quick ways to create plots or access data
  Expertise again!
Advantages: Management
  Homogeneous archives are infinitely easier to manage and maintain
  Value-added services are a natural extension of quality archives
Conclusion
  Why now?
  Because SPASE is gaining traction, this is the next logical step.
  This will save money for everyone in the long run.
  Everyone benefits from value-added services.