The Research Data Archive at NCAR
Doug Schuster and Steve Worley
NCAR
Topic Outline
- Introduction/History
- Core Data Categories/Featured Datasets
- Archive Management/Tools
- New Supporting IT Infrastructure
- Future Possibilities
1/25/2011 AMS
Introduction/History
- Data Support Section (Founded 1965)
- Paper -> Punch Cards -> Tapes -> CD/DVDs -> Hard Drives -> Network Based Storage and Transfer
- KB of observations -> Terabytes of Model Generated Data (total archive volume over 600 TB)
- Weeks or months for a user to get data -> Users want data access now (over 7000 registered users)
- Pay for Data -> Free and open access to all datasets that aren't subject to source restrictions
Introduction/History
- How do we evolve to support the growing needs of data users and generators?
  - Stay aware of current research uses
  - Strengthen datasets supporting core research data categories
  - Update archive management tools
  - Rebuild/Augment IT infrastructure
  - Educate supporting staff
Core Data Categories
- Content to support atmospheric and geosciences research
- Some research examples:
  - Climate
  - Oceanographic
  - Hydrologic
  - Weather Prediction
  - Renewable Energy (Wind/Solar)
Core Data Categories
- Operational and Reanalysis model outputs
- Meteorological and Oceanographic Observations
- Remote Sensing Observations
- Topography/Bathymetry, Vegetation, Land Use
Featured Datasets: Global Platform Observations
Dataset Title | Coverage | Update Frequency
NCEP GDAS observations (PREPBUFR and NetCDF) | Global 1999 – Present | Daily
RDA Upper Air Database | Global 1920 – Present | Monthly
NCDC TD3200 U.S. Cooperative Summary of Day | U.S. – Present | Monthly
Unidata IDD GTS based observations (NetCDF) | Global 2002 – Present | Daily
NCEP operational observations (ON-29 Format) | Global 1975 – 2007 | Fixed
International Comprehensive Ocean-Atmosphere Data Set (ICOADS) | Global 1662 – Present | Monthly
Featured Datasets: Analysis and Forecast Model Data
Dataset Title | Coverage | Update Frequency
Thorpex Interactive Grand Global Ensemble (TIGGE) | Global Present | Hourly
Unidata IDD (GFS 0.5deg, RUC 20km, NAM 12km) | Global and Regional Present | Daily
NCEP ETA/NAM (40km) | North America Present | Monthly
ECMWF Operational Deterministic (1.25 x 1.25 Deg) | Global Present | Bi-Yearly
NCEP GDAS Final Analysis (1x1 Deg) | Global Present | Daily
NCEP OI Global SST (1x1 Deg) | Global Present | Weekly
NOAA OI Global SST (0.25 x 0.25 Deg) | Global Present | Monthly
Hadley Centre Global Sea Ice and SST | Global Present | Monthly
Featured Datasets: High Resolution Re-Analysis
Dataset Title | Coverage | Update Frequency
ERA-40 (T159) | Global | Static Set
ERA-Interim (N128 Gaussian) | Global Present | Yearly
JRA-25 (1.125 Deg Gaussian) | Global 1979 – Present | Yearly
NCEP/DOE (T62) | Global Present | Static Set
NCEP/NCAR (T62) | Global Present | Quarterly
NARR (32 x 32 km) | North America Present | Quarterly
CFSR (0.5 x 0.5 Deg) | Global Present | Monthly
NOAA-CIRES 20th Century | Global 1870 – 2008 | Static Set
Archive Management
How can we support an archive that continuously grows in volume and complexity with a fixed number of supporting staff?
Archive Management
- Common Data Management Tools
  - Functionality Requirements
    - Scalable
    - Integrated: one call does all
    - Automatable
Archive Management
- Common Data Management Tools
  - Task Completion Requirements
    1. Data Acquisition
       - Get data (daily or irregularly)
    2. Data Archival
       - Archive to disk and tape
    3. Metadata Collection
       - Collect metadata
       - Update metadata databases
    4. Metadata Publishing
       - Update web server pages
       - Update internal metadata access points
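The four task steps above, combined with the "one call does all" requirement, can be sketched roughly as follows. This is only an illustrative sketch, not the RDA's actual tooling: the `Archive` class, the function names, and the in-memory dictionaries standing in for disk, tape, the metadata database, and the web pages are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Archive:
    """Hypothetical in-memory stand-ins for disk, tape, metadata DB, and web pages."""
    disk: Dict[str, bytes] = field(default_factory=dict)
    tape: Dict[str, bytes] = field(default_factory=dict)
    metadata_db: Dict[str, dict] = field(default_factory=dict)
    web_pages: Dict[str, list] = field(default_factory=dict)


def acquire(source: Dict[str, bytes]) -> Dict[str, bytes]:
    # Step 1: get data (the "remote source" here is just a dict of name -> bytes).
    return dict(source)


def archive_files(archive: Archive, files: Dict[str, bytes]) -> None:
    # Step 2: archive every file to both disk and tape.
    for name, payload in files.items():
        archive.disk[name] = payload
        archive.tape[name] = payload


def collect_metadata(archive: Archive, dataset: str, files: Dict[str, bytes]) -> None:
    # Step 3: collect metadata and update the metadata database.
    for name, payload in files.items():
        archive.metadata_db[name] = {
            "dataset": dataset,
            "size_bytes": len(payload),
            "format": name.rsplit(".", 1)[-1],  # assumed: format taken from the extension
        }


def publish(archive: Archive, dataset: str) -> None:
    # Step 4: rebuild the dataset's file listing from the metadata database.
    archive.web_pages[dataset] = sorted(
        name for name, meta in archive.metadata_db.items()
        if meta["dataset"] == dataset
    )


def ingest(archive: Archive, dataset: str, source: Dict[str, bytes]) -> int:
    """One call that runs all four steps; returns the number of files handled."""
    files = acquire(source)
    archive_files(archive, files)
    collect_metadata(archive, dataset, files)
    publish(archive, dataset)
    return len(files)
```

Because `ingest` takes no interactive input, a scheduler could call it once per dataset, which is one way to read the "automatable" requirement.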
Integrated Archival Tools
Diagram: data types and formats handled on the RDA/CISL Servers
- Model Generated Data: GRIB, NetCDF
- Obs Data: BUFR, ASCII, etc.
- Topography: Vector Image, Binary, etc.
- Remote Sensing Data: Binary
Integrated Archival Tools
Diagram: archiving a model generated data file with dsarch
- Model Generated Data File (GRIB-2)
- dsarch
- Storage: DISK, HPSS
- RDA Database
  - File attribute metadata: Name, Dataset, Location, Format
Integrated Archival Tools
Diagram: gathering metadata from a file on the RDA/CISL Servers
- Model Generated File, GRIB-2 Format
  - Temperature (Center, Date, Time, Level, Location)
  - Humidity (Center, Date, Time, Level, Location)
  - Vorticity (Center, Date, Time, Level, Location)
  - Visibility (Center, Date, Time, Level, Location)
  - Precip Rate (Center, Date, Time, Level, Location)
- Gather Metadata
- RDA DB
  - File attribute metadata: Name, Dataset, Location, Format
  - File content metadata: T(C,D,T,L,L), RH(C,D,T,L,L), Vort(C,D,T,L,L), Vis(C,D,T,L,L), PcpR(C,D,T,L,L)
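The two kinds of metadata named on this slide could be modeled along the following lines. This is a sketch under assumptions: the class names, field types, and the `files_with_parameter` helper are hypothetical and do not describe the actual RDA database schema. Only the field lists (Name, Dataset, Location, Format; Center, Date, Time, Level, Location) come from the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ContentRecord:
    """File content metadata: one parameter entry found inside a file."""
    parameter: str   # e.g. Temperature, Humidity, Vorticity
    center: str
    date: str
    time: str
    level: str
    location: str    # geographic location covered by the entry


@dataclass
class FileRecord:
    """File attribute metadata, plus the content records gathered from the file."""
    name: str
    dataset: str
    location: str    # where the file is stored (assumed meaning)
    format: str
    contents: List[ContentRecord] = field(default_factory=list)


def files_with_parameter(records: List[FileRecord], parameter: str) -> List[str]:
    # Return the names of files whose content metadata lists the parameter.
    return [
        r.name for r in records
        if any(c.parameter == parameter for c in r.contents)
    ]
```

Keeping content records attached to each file record is one way a search tool could answer "which files contain temperature?" without opening the data files themselves.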
Integrated Archival Tools
Diagram: services on the RDA/CISL Servers that draw on the RDA DB
- RDA Web Server
  - Dynamic file lists
  - Data search tools
  - Detailed content metadata
  - Data subsetting interfaces
- CISL Computational Node
  - Detailed metadata for files on disk
  - Data subsetting
- RDA DB
  - File attribute metadata: Name, Dataset, Location, Format
  - File content metadata: T(C,D,T,L,L), RH(C,D,T,L,L), Vort(C,D,T,L,L), Vis(C,D,T,L,L), PcpR(C,D,T,L,L)
New Supporting IT/Infrastructure
- Online Disk Upgrades
  - Larger Disk (450 TB)
  - Common Disk Interfaces (webserver and compute nodes)
- Tape Archive Upgrades
  - High Performance Storage System (HPSS)
- Computing Power Upgrades
  - Additional and more powerful servers
New Supporting IT/Infrastructure
Complete User Community
  Pros:
  - Fast access to online data.
  - Access to all RDA metadata.
  - Access to RDA data processing services.
  Cons:
  - Small fraction of RDA online.
  - Slow access to offline data.
  - Data processing requests take a long time to finish.
NCAR User Community
  Pros:
  - Access to full RDA.
  - Fast computing.
  Cons:
  - No access to online data.
  - Forced to use MSS as a file server: access is too slow.
  - No direct access to RDA metadata.
New Supporting IT/Infrastructure
Complete User Community Improvements:
- Faster access to full RDA.
- Expanded data processing services available.
- Faster turnaround on data processing requests.
NCAR User Community Improvements:
- Faster access to full RDA.
- Direct access to all RDA metadata.
Future Possibilities
- Leverage New IT Infrastructure
  - Server side parameter and spatial sub-setting across multiple datasets
    - Model or In-Situ observations
  - Data provided in multiple output formats
  - Web services based requests (REST, etc.)
- Addition of large and diverse data sets to the RDA.
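As a rough illustration of server side parameter and spatial sub-setting driven by a web request, the sketch below parses a REST-style query string and filters records by parameter and bounding box. The query keys (`param`, `bbox`), the south/north/west/east ordering, and the record layout are assumptions made for this example only; the slides do not define a request format.

```python
from typing import Dict, List, Tuple
from urllib.parse import parse_qs

BBox = Tuple[float, float, float, float]  # (south, north, west, east), assumed order


def parse_request(query: str) -> Tuple[List[str], BBox]:
    # Parse a hypothetical query string such as "param=T,RH&bbox=30,50,-110,-90".
    fields = parse_qs(query)
    params = fields["param"][0].split(",")
    south, north, west, east = (float(x) for x in fields["bbox"][0].split(","))
    return params, (south, north, west, east)


def subset(records: List[Dict], params: List[str], bbox: BBox) -> List[Dict]:
    # Keep only records for the requested parameters that fall inside the box.
    south, north, west, east = bbox
    return [
        r for r in records
        if r["param"] in params
        and south <= r["lat"] <= north
        and west <= r["lon"] <= east
    ]
```

Filtering on the server means only the requested parameters and region are sent to the user, rather than whole files. This simple box test does not handle regions that cross the date line.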