SUMMARY
Jane Russell
Perot Systems Corp & NASA/GSFC
KEYNOTE from Ray Walker

A PERSISTENT DREAM
A global data environment in which all Earth and space science data are organized in a common way with “one stop shopping” for any data product.

GOALS
Help scientists locate data required for a given study.
Provide scientists with access to those data.
Assure that those data are useable.
Preserve the data forever.
Aid scientists in using the data.

CHALLENGES
Data are found worldwide.
Science may require data from multiple sources.
Missions & instruments are more complex.
Data volumes are increasing.
Data complexity is increasing.
–Not all flat files, images
–Now databases, animations, what next?

MORE CHALLENGES
User wants/needs are evolving from “just the data” to high-level products, correlated searches, and usable tools to process and manipulate data
Put it on a shelf vs curation
Involvement from conception vs data falling over the transom
Training our communities that archiving is a required function, not just an option
Metadata curation can be mostly automated, but not completely
Community-wide standards; the key is metadata
Identifier – journals, pubs, societies

& MORE CHALLENGES
Large vs Small (& mission) data centers
Domestic vs International
Long term preservation
–Government vs university
–Hardware/technology evolves
–Software won’t rust, but will go bust with new systems
–Human troubles, operators & hackers
Emphasize service, not formats, for providers

QUESTIONS
Centralized vs distributed?
Archive like a Scientist or a Librarian?
Metadata, when is enough enough?
To archive higher-level products, curate the data or curate the s/w to create them?

COST CONSIDERATIONS
Evolution vs Revolution – remember revolutions are expensive
Need leverage: compliance needs to be a contractual obligation mandated by the funding agency.

INFO & LESSONS LEARNED
Audits & Checklists
How to Morph an Archive
Goal should be a standard format, e.g. html
Persistent Dataset Identifiers & Bibcodes
Data Grids
ESAC
Document libraries

QUOTES
Clarity comes from usage.
What about the unborn users?
So far I’ve learned “centralized” is bad.
It’s the metadata, stupid.
They’ve been very flexible – for engineers.
Science would be much better if you didn’t have to mess with formats.
It’s very hard to produce well-documented data.