Scientific Investigations: Support from Research Data Archives
Computing in Atmospheric Sciences, October 2001
Steven Worley
National Center for Atmospheric Research, Scientific Computing Division
Key Steps of Scientific Investigations
Formulate the questions and review the state of understanding
Search and discover data
Access data
Analyze data
Community sharing and archive
Document new understandings
Search and Discover Data
How? Web based Information Server
Salient Features
  – 2.5K+ HTML pages (metadata)
  – All datasets are described (500+)
  – Location of all data files in MSS
  – Higher level information
      Catalogs
      Project specific descriptions
      Always current dataset descriptions
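The search feature described above can be pictured as a keyword lookup over per-dataset metadata records. The following is only a minimal sketch under stated assumptions: `DatasetRecord`, its fields, and `search` are hypothetical names, not the information server's actual design, and the three sample records simply reuse dataset names and periods of record listed elsewhere in this talk.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical metadata record for one archived dataset."""
    title: str
    period_of_record: str
    keywords: list = field(default_factory=list)


def search(catalog, term):
    """Return records whose title or keywords contain the term (case-insensitive)."""
    term = term.lower()
    return [
        rec for rec in catalog
        if term in rec.title.lower()
        or any(term in kw.lower() for kw in rec.keywords)
    ]


# Illustrative records only; the real catalog describes 500+ datasets.
catalog = [
    DatasetRecord("Rawinsondes", "1946-on", ["upper air"]),
    DatasetRecord("Pibals", "1942-on", ["upper air", "wind"]),
    DatasetRecord("Ocean Surface", "1794-on", ["COADS"]),
]

for rec in search(catalog, "upper air"):
    print(rec.title, rec.period_of_record)
```

A substring match keeps the sketch short; a production search would more likely index the metadata pages.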
Features
Organization Navigation
Archive Navigation
Pull down menus
Search
Project Links
Dataset Page
Title and brief description
Systematic navigation
Metadata highlights
  – Period of record
  – Usage
  – Variables
  – Related sites (NOAA)
  – Contact person
  – Related datasets
Brief Archive History and Specifications
Started in the mid-1960s (35 years)
Managed by nine people
211K data files
17 TB in an MSS
530 datasets, all sizes
Global Observations
                        P.O.R.    # Yrs  Incep. Date  Comments
Rawinsondes             1946-on   55     1967         Upper Air
Pibals                  1942-on                       Upper Air, wind
Aircraft                1947-on                       USAF and Commer.
Sat. cloud wind drift   1967-on                       GOES and GTS
Satellite Soundings                                   TOVS + irradiance
Surface Synoptic        1948-on                       some much older
Ocean Surface           1794-on                       COADS
Usages:
  Input for global atmospheric reanalysis
  Basic long term climate assessment and case studies
Operational and Composite Analyses
Daily SLP is a small but very popular dataset, e.g. NAO evaluations
Two main operational centers provide the best current analyses
Key Aspects
Medium size archive: 170 Gigabytes
Multi-(product, temporal res., spatial res.), complex
Concerns:
  Restricted distribution
    U.S. non-profits and UCAR members only
  Need online authentication and authorization for easy access
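The restricted-distribution concern above combines two checks: authentication (who is asking) and authorization (are they eligible). A minimal sketch follows, assuming the eligibility rule is "U.S. non-profit or UCAR member"; the `Requester` class, its flags, and `may_download_restricted` are hypothetical and do not describe any deployed system.

```python
from dataclasses import dataclass


@dataclass
class Requester:
    """Hypothetical description of a person requesting restricted data."""
    name: str
    authenticated: bool          # identity already verified, e.g. by login
    us_non_profit: bool = False  # assumed eligibility flag
    ucar_member: bool = False    # assumed eligibility flag


def may_download_restricted(user):
    """Authentication first, then the eligibility rule stated on the slide."""
    if not user.authenticated:
        return False
    return user.us_non_profit or user.ucar_member


print(may_download_restricted(Requester("A", True, ucar_member=True)))
print(may_download_restricted(Requester("B", True)))
```

Keeping the two checks separate lets the eligibility rule change without touching the login step.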
Highlights
Frequent updates to FNL, 1°, daily via FTP
High resolution N. America product, ETA at 40 km
No distribution restrictions or cost
Reanalyses
P.O.R.   # Yrs   Incep. Date
NCEP/NCAR Reanalysis I /
ECMWF ERA
NCEP Reanalysis II /
Notes:
  ERA-15 is finished, ERA-40 is running now
  NCEP II, primarily experimental run
Outstanding Features
Three different coordinate surfaces
Very long analysis, 2+ Terabytes size
Unrestricted distribution
CD-ROMs are very popular
Countries Receiving Reanalysis CD-ROMs
Highlights
Over 8900 CD-ROMs /2001
Recipients: U.S. 46%, Japan 11%, (Canada, UK) 4%, (Germany, India) 3%, (Australia, S. Korea, Spain, Mexico, Norway, Russia, France) 2%
Reanalysis Users for 2001 (4th qtr estimated)
209 from the MSS [157 Jan.-Sep.]
47 on CD-ROM [35]
48 custom data orders on FTP or tape [36]
540 from the online server [406]
844 total served
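The full-year user counts above can be cross-checked against the bracketed Jan.-Sep. counts. The slide does not state how the fourth quarter was estimated; the sketch below assumes a simple scaling of the nine-month counts by 1.33, which happens to reproduce every listed figure. The variable and function names are illustrative.

```python
# Jan.-Sep. counts (in brackets on the slide) and the listed full-year figures.
jan_sep = {"MSS": 157, "CD-ROM": 35, "custom orders": 36, "online server": 406}
full_year = {"MSS": 209, "CD-ROM": 47, "custom orders": 48, "online server": 540}

# Assumption, not stated in the talk: 4th quarter estimated by scaling by 1.33.
SCALE = 1.33


def estimate_full_year(count, scale=SCALE):
    """Scale a nine-month count to a twelve-month estimate."""
    return round(count * scale)


estimates = {channel: estimate_full_year(n) for channel, n in jan_sep.items()}
total_users = sum(full_year.values())

print(estimates)
print(total_users)
```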
Reanalysis Data Distributed for 2001 (4th qtr estimated)
9616 GB from the MSS [7230 GB Jan.-Sep.]
808 GB on CD-ROM
1383 GB custom orders, FTP and tape [1040]
88 GB from the online server [66 GB]
11.9 TB total
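The volume total above can be checked by summing the per-channel figures. This sketch assumes decimal units (1 TB = 1000 GB), since that is the convention under which the channel figures add up to the stated 11.9 TB; the dictionary keys are shorthand labels, not official channel names.

```python
# Full-year data volumes by delivery channel, in GB, as listed on the slide.
gb_by_channel = {
    "MSS": 9616,
    "CD-ROM": 808,
    "custom orders": 1383,
    "online server": 88,
}

total_gb = sum(gb_by_channel.values())
total_tb = total_gb / 1000  # assumption: decimal terabytes

# Share of the total volume carried by each channel, in percent.
shares = {ch: 100 * gb / total_gb for ch, gb in gb_by_channel.items()}

print(f"{total_gb} GB = {total_tb:.3f} TB")
for ch, pct in shares.items():
    print(f"{ch}: {pct:.1f}%")
```

Compared with the user counts for the same year, the online server reached the most users while moving the least data, and the MSS carried most of the volume.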
GCIP Model Data Center Collection
High resolution atmospheric models focused on energy and hydrology cycles.
GCIP: GEWEX Continental-Scale International Project
GEWEX: Global Energy and Water Cycle Experiment
Critical data for N. American mesoscale studies
Complete archive is about 1 Terabyte
Eta – NCEP         3 hr   40 km   25 lvs   5/1995 – 7/2001
MAPS – FSL NOAA    3 hr   40 km   5 lvs    8/ /2001
GEM – Canadian     6 hr   41 km   28 lvs   4/1997 – 6/2001
Ocean Model Data
MICOM: Miami Isopycnic Coordinate Ocean Model, 1/12th degree, 70N to 28S, layers
COADS Clim. Forcing    6 yrs   305 Gigabytes
ECMWF Clim. Forcing    2 yrs   164 Gigabytes
ECMWF Daily Forcing    5 yrs   415 Gigabytes ( )
University of Miami
6-yr Mean T at 5 meters
Dataset Sizes and Scales
Today
  – ~800 unique users
  – ~12 Terabytes data transferred
  – 2 Terabyte dataset size
  – Example: NCEP/NCAR Reanalysis
Near Future
Excludes TB-PB Level 0 and 1 satellite and the super scale experimental models
  – Number of users, ~same
  – Data transferred, 5x to 10x more?
  – Dataset size, 2-20 TB
  – Examples: Ocean and Atmosphere models, ECMWF Reanalysis (ERA40)
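The "5x to 10x more" projection above can be turned into rough numbers. The sketch below assumes decimal units and a transfer load spread evenly over a 365-day year, which real traffic will not be; `average_rate_mb_per_s` is an illustrative helper, not an archive metric.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

current_tb_per_year = 12  # "~12 Terabytes data transferred" today
growth_low, growth_high = 5, 10

projected_tb = (current_tb_per_year * growth_low,
                current_tb_per_year * growth_high)


def average_rate_mb_per_s(tb_per_year):
    """Average sustained rate in MB/s, assuming 1 TB = 1,000,000 MB."""
    return tb_per_year * 1_000_000 / SECONDS_PER_YEAR


print(projected_tb)
print(f"today: {average_rate_mb_per_s(current_tb_per_year):.2f} MB/s")
for tb in projected_tb:
    print(f"{tb} TB/yr: {average_rate_mb_per_s(tb):.2f} MB/s")
```

Even the upper projection is a modest average rate; the harder constraint is more likely peak demand and the size of individual datasets (2-20 TB).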
Access to Data
Methods
  NCAR computers
    – From the local MSS
  Web data server
  Custom data packages, by request (FTP, tape, CD-ROM)
Users
  World class programmer
  Research scientist
  Graduate students
  Undergraduate students
Data Access in the Future
Do we continue doing what we are doing? “Absolutely”
Why? It works
  – Over 1000 users annually
      Very diverse skills
  – The archive is a heterogeneous collection
      Many formats (ASCII, binary, GRIB, BUFR, netCDF, HDF)
      Many sizes (1 MB to 2 TB)
  – Capable of serving large and small projects
Maintain a variety of flexible methods
Data Access in the Future
Keys to handling future larger collections
  – Plan to create useful data products
      Condensed datasets from high resolution output
      Group the most popular variables together in products
  – Serve many, e.g. CD-ROMs and WWW
  – Continue to develop emerging online data systems
      User driven subset selection with graphics and data download options
      Server-side elementary analysis
        – Multi-dataset comparisons
        – Statistical summaries and basic meteorological calculations
  – Our development is the “Community Data Portal”
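User-driven subset selection followed by a server-side statistical summary, as listed above, might look roughly like the following. This is a sketch under stated assumptions, not the Community Data Portal's implementation: `subset` and `summarize` are hypothetical names, and the grid values are made-up sea level pressures used only to exercise the code.

```python
from statistics import mean


def subset(points, lat_range, lon_range):
    """Keep (lat, lon, value) points that fall inside the requested box."""
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    return [
        p for p in points
        if lat_min <= p[0] <= lat_max and lon_min <= p[1] <= lon_max
    ]


def summarize(points):
    """Elementary statistics over the values of the selected points."""
    values = [value for _, _, value in points]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
    }


# Made-up sample: (latitude, longitude, sea level pressure in hPa).
grid = [
    (40.0, -105.0, 1012.0),
    (40.0, -100.0, 1010.0),
    (45.0, -105.0, 1008.0),
    (10.0, 20.0, 1015.0),
]

box = subset(grid, lat_range=(35.0, 50.0), lon_range=(-110.0, -95.0))
print(summarize(box))
```

Running the summary on the server means only a handful of numbers, rather than the full subset, has to travel to the user.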
Data Analysis
Tools
  – NCAR Command Language (NCL) software
      Features in brief
        – I/O for many ‘standard’ data formats
        – Easy adaptations to read any format
        – 100’s of meteorological functions
        – “Publication quality” graphics
  – The CDP is capable of analysis
      NCL is one of several middleware packages
Community Sharing
Support for the scientist
  – A place to distribute new data results
      Possibly with authentication and authorization control
      E.g. model outputs
  – Spin off benefit
      New data resources for the archive
      Many users can then use the new product
NCEP Operational Analyses blended with QSCAT Satellite data
Wind Stress Curl, 01/24/ UTC
a) NCEP Operational ONLY
b) NCEP + QSCAT swaths
c) OI blend of NCEP + QSCAT
Blending by Colorado Research Associates
We archive all three products.
Key Steps of Scientific Investigations
Formulate the questions and review the state of understanding
Search and discover data
Access data
Analyze data
Community sharing and archive
Document new understandings