ICOADS Archive Practices at NCAR
JCOMM ETMC-III, 9-12 February 2010
Steven Worley
Topics
  Environment setting
  Data management tools and principles
  ICOADS NCAR Release 2.5 contributions
  Background Collections
  Future Challenges
Environment Setting
  ICOADS is part of a larger collection called the Research Data Archive (RDA)
  RDA, briefly:
    600+ datasets (atmosphere, ocean, geosciences)
    4.3M files, 462 TB (primary data)
    unique users annually, including ICOADS
    Staff: 7 scientific programmers (M.S. degrees), me, and an administrative assistant
Data management principles
  Always archive 2 copies of observational data
    3rd copy at a partner center (disaster recovery)
  Free and open data access world-wide
    Internet
    Past: other media (CD-ROMs, tapes, etc.)
  Share what we have to build archives
    E.g. digitization of Maury data in China in exchange for global land surface data
Data Management Tools
  Old system: specialized software to manage each data input
    Inefficient
    Difficult to scale
  New system: common RDA tools that homogenize data management
    Efficient
    Scalable
  [Diagram labels: RDA Metadata Database, Unidata Server, University Server, NWP Server, Online Disk, Tape Storage, GCMD Metadata Server, RDA Data Server, Specialized Software Packages 1-3, RDA Data Management Common Tool Set]
Data management tools: a few details
  Common scripting structure to do routine dataset updates (dsupdt)
    Very tunable: frequency, multiple server priority list, validation
    Fully integrated with RDADB
    User's view is automatically updated and therefore always current
  Common single archiving function (dsarch)
    Location and copy control (MSS/HPSS storage, and online disk)
    Fills all DB entries (e.g. file and dataset relationships)
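The "multiple server priority list" and "validation" options suggest a fetch-with-fallback pattern. The following is only a minimal sketch of that idea, not the actual dsupdt code: the function name `fetch_with_priority` and the `fetch` and `validate` callables are hypothetical and supplied by the caller.

```python
from typing import Callable, Iterable, Optional, Tuple


def fetch_with_priority(
    servers: Iterable[str],
    fetch: Callable[[str], bytes],
    validate: Callable[[bytes], bool],
) -> Optional[Tuple[str, bytes]]:
    """Try servers in priority order; return the first payload that validates.

    `fetch` and `validate` are hypothetical caller-supplied hooks; the real
    dsupdt interfaces are not described in the slides.
    """
    for server in servers:
        try:
            payload = fetch(server)
        except OSError:
            # Server unreachable: move on to the next one in the list.
            continue
        if validate(payload):
            return server, payload
        # Payload failed validation: also fall through to the next server.
    return None
```

Returning `None` when every server fails leaves the caller free to decide whether to retry later or raise an alert.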
Data management tools
  Harvest file level metadata (gatherxml)
    Handle various formats (GRIB1, GRIB2, netCDF, BUFR, IMMA, ON29, etc.)
    Save as and populate DB
  Benefits
    Problem detection
    Versioning, replacement, extension
    Inventory information
    Drive better data service for users
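As an illustration of what file level "inventory information" could contain, here is a small sketch that summarizes the observations in one file. It is an assumption-laden example, not gatherxml itself: the `FileInventory` fields and the `summarize` function are hypothetical, and the input is assumed to be already decoded into (time, latitude, longitude) tuples.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Tuple


@dataclass
class FileInventory:
    """Hypothetical per-file summary; field names are illustrative only."""
    count: int
    start: datetime
    end: datetime
    south: float
    north: float
    west: float
    east: float


def summarize(observations: Iterable[Tuple[datetime, float, float]]) -> FileInventory:
    """Reduce (time, lat, lon) tuples to a count, time range, and bounding box."""
    obs = list(observations)
    if not obs:
        raise ValueError("no observations to summarize")
    times = [t for t, _, _ in obs]
    lats = [lat for _, lat, _ in obs]
    lons = [lon for _, _, lon in obs]
    return FileInventory(
        count=len(obs),
        start=min(times),
        end=max(times),
        south=min(lats),
        north=max(lats),
        west=min(lons),
        east=max(lons),
    )
```

A summary like this is enough to spot an empty file or an unexpected time range before the file is archived.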
Data management tools
  Provide access to data in tape storage archive (dsrqst)
    Relatively new, not yet universally available across RDA
    Delayed mode, with DB control (many details)
  Why?
    RDA holds 462 TB
    40 TB online: most popular small scale products
    Access to more products for greater community
ICOADS Release 2.5 NCAR
  Data preparation: format evaluations, translate native formats to IMMA format
    Moored research buoy delayed mode archives
      TAO, PIRATA (PMEL, JAMSTEC)
    World Ocean Database 2005
      Multiple ocean profile types (NODC)
  Receive/archive ICOADS data processing results
    NOAA/ESRL does processing: source merging, duplicate elimination, preconditioning deletion and fixes, etc.
ICOADS Release 2.5 NCAR
  Create and maintain user data access interfaces
    File access
      IMMA and binary (observations, monthly summary statistics)
    Sub-selection (time, space, parameter)
      Example coming.
      Output is ASCII tabular format
      Runs automatically; nearly all requests completed in 10 minutes
  Keep user metrics
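The sub-selection step (time, space, parameter, with ASCII tabular output) can be sketched as a simple filter. This is an illustrative example only and not the RDA interface: the `subset` function and the record keys (`time`, `lat`, `lon`, `sst`, `slp`) are hypothetical and are not IMMA field names.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, List, Sequence, Tuple


def subset(
    records: Iterable[Dict[str, Any]],
    start: datetime,
    end: datetime,
    lat_range: Tuple[float, float],
    lon_range: Tuple[float, float],
    parameters: Sequence[str],
) -> List[str]:
    """Filter records by time window and lat/lon box; return ASCII table lines.

    Assumes each record has 'time', 'lat', and 'lon' keys (hypothetical names).
    Boxes that cross the dateline are not handled in this sketch.
    """
    lines = [" ".join(["time", "lat", "lon", *parameters])]  # header row
    for rec in records:
        if not (start <= rec["time"] <= end):
            continue
        if not (lat_range[0] <= rec["lat"] <= lat_range[1]):
            continue
        if not (lon_range[0] <= rec["lon"] <= lon_range[1]):
            continue
        cells = [
            rec["time"].strftime("%Y-%m-%dT%H:%M"),
            f"{rec['lat']:.2f}",
            f"{rec['lon']:.2f}",
        ]
        # Missing parameters are written as NA so every row has the same width.
        for name in parameters:
            value = rec.get(name)
            cells.append("NA" if value is None else str(value))
        lines.append(" ".join(cells))
    return lines
```

Writing only the requested parameters keeps the output small, which matters when most requests are expected to finish within minutes.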
ICOADS Release 2.5 NCAR
  Near-term preliminary extensions to R2.5
    Beginning with data in 2008 and forward
    Based on NCEP GTS compilation/merge
    Runs on day 2 of each month; processes the previous month
      Create IMMA observations and binary monthly summary statistics
      Harvest file level metadata
      Do all archiving of original and processed files
      Automatically update user interfaces
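The "runs on day 2, processes the previous month" schedule comes down to a small date calculation, including the January-to-December rollover. A minimal sketch follows; the function names `should_run` and `previous_month_bounds` are hypothetical and do not reflect the actual NCAR scripts.

```python
from datetime import date, timedelta
from typing import Tuple


def should_run(today: date, run_day: int = 2) -> bool:
    """True on the scheduled day of the month (day 2, per the slide)."""
    return today.day == run_day


def previous_month_bounds(run_date: date) -> Tuple[date, date]:
    """Return the first and last day of the month before `run_date`."""
    first_of_this_month = run_date.replace(day=1)
    # Stepping back one day from the 1st always lands in the previous month,
    # which handles year rollover and varying month lengths.
    last_of_previous = first_of_this_month - timedelta(days=1)
    return last_of_previous.replace(day=1), last_of_previous
```

For example, a run on 2 January 2010 would cover 1 December 2009 through 31 December 2009.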
Brief drive through NCAR
World-wide User Access
File Level Metadata – ICOADS IMMA Example
8 pages of information like this
A look at 2009
What is happening in 2009?
World-wide User Access
Similar service for the monthly summary statistics
Who uses the sub-setting interfaces? Countries
Background Collections
  Historical
    Most complete set of ALL source data used to create ALL ICOADS Releases
      Beginning in mid-1980s
    Copies of ALL ICOADS Releases
    We do not delete any files
Background Collections
  Ongoing / routine data receipts
  Format conversions are done at NCDC

  Description         Source        Frequency
  Marine Surface GTS  NCEP (BUFR)   Monthly
  Marine Surface GTS  NCDC (IMMA)   Monthly
  SEAS                NCDC (IMMA)   Monthly
  Keyed               NCDC (IMMA)   Monthly (nominally)
  GCC                 NCDC (IMMA)   Quarterly (nominally)
  VOSClim             NCDC (IMMA)   Monthly
Future Challenges
  Eliminate user interface dependency on Java applets; deploy JavaScript instead
  Support “advanced” ICOADS initiative
    Bias adjusted / corrected observations
    Serve as a central DB / handle data ingest
    Build a user interface
  Continue as a full U.S. partner