Gridded Data Sub-setting Services through the RDA at NCAR
Doug Schuster, Steve Worley, Bob Dattore, Dave Stepaniak
Research Data Archive (RDA) Overview
Problem Background
Required Infrastructure
Current Services
Future Directions
RDA Overview
Total archive volume over 1.3 PB
unique users annually
– Meteorological and Oceanographic Observations
– Operational and Reanalysis Model Outputs
– Remote Sensing Observations
– Topography/Bathymetry, Vegetation, Land Use
Problem Background
[Figure: Data Volume]
Problem Background
Large computational/storage resources are needed to:
– Store data
– Extract desired data from large grids/files
– Convert data to desirable format(s)
Scientific data centers have these resources; individual researchers generally don't.
Problem Background
Goals
– Make data more accessible and easier to use for individual researchers: reasonable access volumes, desired data formats, and user-defined parameters/grids
– Let researchers stay focused on research
Required Infrastructure
Powerful Computing: NCAR HPC/DAV
Large Disk Storage (500 TB)
Rich and Detailed Metadata Databases (RDADB)
Generalized Software Tools
– Control system (RDAMS)
– Sub-setting
– Format conversion
Web Interface
Command Line Interface
Required Infrastructure
Rich Metadata Databases (key ingredient)
Metadata DB
– File attribute metadata: Name, Dataset, Location, Format
– File content metadata: T(C,D,T,L,L), RH(C,D,T,L,L), Vort(C,D,T,L,L), Vis(C,D,T,L,L), PcpR(C,D,T,L,L)
The metadata databases:
– Drive Interfaces
– Support Efficient Backend Processing
– Provide Scalability
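The idea that file attribute and file content metadata let the backend find the right files without opening them can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical record layout (`FileRecord`, `files_for_request`, the dataset names and paths are all invented here); the actual RDADB schema is not described in these slides, and the content dimensions are reduced to a set of valid times per parameter.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Hypothetical metadata record: file attributes plus a content inventory."""
    name: str
    dataset: str
    location: str
    fmt: str  # file format, e.g. "GRIB2" or "NetCDF"
    # Content metadata (simplified): parameter name -> set of ISO date strings
    contents: dict[str, set[str]] = field(default_factory=dict)


def files_for_request(catalog, dataset, parameter, start, end):
    """Names of files in `dataset` holding `parameter` at any time in [start, end].

    Only metadata is consulted; no data file is opened. ISO dates in the
    same format compare correctly as plain strings.
    """
    hits = []
    for rec in catalog:
        if rec.dataset != dataset:
            continue
        times = rec.contents.get(parameter, set())
        if any(start <= t <= end for t in times):
            hits.append(rec.name)
    return sorted(hits)


# Hypothetical catalog entries
catalog = [
    FileRecord("f1.grb2", "ds-A", "/disk/ds-A", "GRIB2",
               {"T": {"2010-01-01", "2010-01-02"}, "RH": {"2010-01-01"}}),
    FileRecord("f2.grb2", "ds-A", "/disk/ds-A", "GRIB2", {"T": {"2010-02-01"}}),
    FileRecord("f3.nc", "ds-B", "/disk/ds-B", "NetCDF", {"T": {"2010-01-01"}}),
]

print(files_for_request(catalog, "ds-A", "T", "2010-01-01", "2010-01-31"))  # ['f1.grb2']
```

Because the query touches only the inventory, the same lookup can drive a web form (which parameters and dates exist) and the backend job (which files to read), which is the scalability argument the slide makes.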
Current Services
Sub-setting available on 13 datasets
– ERA-I, CFSR, Operational Model, EaSM
– Also available on select observation sets
Sub-setting options
– Parameter selection
– Spatial region selection (limited availability)
Available output formats
– Native GRIB formats
– NetCDF format
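Parameter and spatial region selection on a regular latitude/longitude grid reduce to picking a few named fields and slicing index ranges. The sketch below is illustrative only and is not the RDA implementation: it assumes fields are held as plain 2-D lists indexed [lat][lon], coordinates are monotonic, and the requested box does not cross the grid's longitude seam (west <= east in the grid's own longitude convention). The names `index_range` and `subset` are hypothetical.

```python
def index_range(coords, lo, hi):
    """First and last index i with lo <= coords[i] <= hi.

    For monotonic coordinates (ascending or descending) the matching
    indices are contiguous, so a start/end pair describes the whole slice.
    """
    idx = [i for i, c in enumerate(coords) if lo <= c <= hi]
    if not idx:
        raise ValueError("requested region does not overlap the grid")
    return idx[0], idx[-1]


def subset(fields, lats, lons, params, south, north, west, east):
    """Keep only `params`, cropped to the lat/lon box (edges inclusive).

    fields: dict mapping parameter name -> 2-D list indexed [lat][lon].
    Assumes the box does not wrap across the longitude seam.
    """
    j0, j1 = index_range(lats, south, north)
    i0, i1 = index_range(lons, west, east)
    out = {}
    for p in params:
        grid = fields[p]  # raises KeyError if the parameter is not in the file
        out[p] = [row[i0:i1 + 1] for row in grid[j0:j1 + 1]]
    return lats[j0:j1 + 1], lons[i0:i1 + 1], out


# Hypothetical 10-degree global grid: latitudes 90..-90, longitudes 0..350
lats = [90.0 - 10.0 * j for j in range(19)]
lons = [10.0 * i for i in range(36)]
fields = {
    "T": [[100 * j + i for i in range(36)] for j in range(19)],
    "RH": [[j + i for i in range(36)] for j in range(19)],
}

sub_lats, sub_lons, out = subset(fields, lats, lons, ["T"], 20, 40, 100, 120)
print(sub_lats, sub_lons)  # [40.0, 30.0, 20.0] [100.0, 110.0, 120.0]
print(out["T"])            # [[510, 511, 512], [610, 611, 612], [710, 711, 712]]
```

Here one parameter on a 3 x 3 box replaces two parameters on 19 x 36 points, which is the kind of volume reduction that makes requests manageable for individual researchers.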
Current Services
Sub-set requests
– Processed in delayed mode
– User notified when the request is ready
– Data downloaded via server-provided wget scripts
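Once a delayed-mode request finishes, the server can hand the user a small shell script that fetches every output file. A minimal sketch of generating such a script, assuming a hypothetical base URL and file list (`make_wget_script` and the example.org address are invented for illustration; the real RDA scripts may add authentication or other options not shown here):

```python
import shlex


def make_wget_script(base_url, filenames):
    """Build a shell script that downloads each file of a completed request.

    base_url and filenames are hypothetical placeholders for whatever the
    server records for a finished request.
    """
    lines = ["#!/bin/sh", "# Download the files of a completed sub-set request"]
    for name in filenames:
        url = base_url.rstrip("/") + "/" + name
        # -c: resume a partially downloaded file if the script is re-run;
        # shlex.quote protects URLs containing spaces or shell metacharacters
        lines.append("wget -c " + shlex.quote(url))
    return "\n".join(lines) + "\n"


script = make_wget_script("https://example.org/req/123/", ["T.grb", "RH.grb"])
print(script, end="")
```

Generating a script rather than a single archive keeps each download small and restartable, which suits multi-file requests that may run to many gigabytes.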
Future Directions
Spatial Interpolation
Faster Request Processing (NWSC)
Include More RDA Datasets
Improved Access Portals
Additional Output Formats
Web Service Access
Summary
Data Analysis Research Challenges
– Large and Growing Data Volumes
– Numerous Formats
RDA: Supply “User Friendly” Data
– Parameter and Spatial Sub-Setting
– Format Conversion
– Improved and Additional Services