Growing and Future Datasets in the SCD Research Data Archives
for NSF SCD Review Panel
16 October 2001
Steven Worley
Scientific Computing Division
Data Support Section
Outline
  Introduction
  Extant Growing Archives
  Data Archive Assistance for the Research Community
  Future Archives
Introduction
Components of a good archive
  – Maintained in a reliable system (MSS)
  – Clear and concise information interface
      Complete discovery metadata
  – Convenient data access for many users
      Local computing platforms
      Transfer to remote computing platforms
  – Consultants for assistance
      Guidance to the best products, sometimes within complex multi-product collections
Components, continued
  – Underlying archive with rich content
      Many historical reference datasets (we have hundreds of these, but they are not discussed here)
      Relevant new and frequently updated datasets
Focus today: Growing and Future datasets
Global Observations
                        P.O.R     # Yrs  Incep. Date  Comments
Rawinsondes             1946-on   55     1967         Upper Air
Pibals                  1942-on                       Upper Air, wind
Aircraft                1947-on                       USAF and Commer.
Sat. cloud wind drift   1967-on                       GOES and GTS
Satellite Soundings                                   TOVS + irradiance
Surface Synoptic        1948-on                       some much older
Ocean Surface           1794-on                       COADS
Usages:
  Input for global atmospheric reanalysis
  Basic long term climate assessment and case studies
Operational and Composite Analyses
  Special analyses were discontinued when global operational analyses became very good
  Daily SLP is a small but very popular dataset, e.g. for NAO evaluations
Highlights
  Up to date, 1985 – June 2001
  Different temporal resolutions, 6 hr to 1 month
  Different spatial resolutions, ~1 degree to 2.5 degree
  Many atmospheric levels and variables
Details and Drawbacks
  Distribution restriction: U.S. non-profits and UCAR members only
  Cost, increasing and unpredictable: $11K in 1999, $16K in 2000, $19K in 2001
  We get only modest resolution (T106, N80); T319 and N256 are available, but again cost is an issue
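The cost trend quoted on this slide can be made concrete with a little arithmetic. The sketch below is only illustrative: it takes the three dollar figures from the slide and computes the year-over-year change. The names `COSTS` and `yearly_increases` are hypothetical, introduced here for the example.

```python
# Annual cost figures as quoted on the slide (U.S. dollars).
COSTS = {1999: 11_000, 2000: 16_000, 2001: 19_000}


def yearly_increases(costs):
    """Return (year, dollar_change, percent_change) for each consecutive pair of years."""
    years = sorted(costs)
    changes = []
    for prev, curr in zip(years, years[1:]):
        delta = costs[curr] - costs[prev]
        percent = 100.0 * delta / costs[prev]
        changes.append((curr, delta, percent))
    return changes


increases = yearly_increases(COSTS)
for year, delta, percent in increases:
    print(f"{year}: +${delta:,} ({percent:.1f}% over the previous year)")
```

The two increments differ considerably in both dollar and percentage terms, which is consistent with the slide's description of the cost as unpredictable.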
Highlights
  Very current, FNL 1.0 is done daily
  High resolution N. America, ETA at 40 km
  No cost or distrib. restrictions from NCEP - GREAT
Reanalyses
                            P.O.R     # Yrs  Incep. Date
NCEP/NCAR Reanalysis I /
ECMWF ERA
NCEP Reanalysis II /
Notes:
  ERA-15 is terminal; ERA-40 is under development now
  NCEP II is an experimental run (testing new data and schemes)
Outstanding Features
  Three different coordinate surfaces
  Very long analysis
  Unrestricted distribution
  CD-ROMs are very popular
Countries Receiving Reanalysis CD-ROMs
Highlights
  Over 8900 CD-ROMs /2001
  Top 13: U.S. 46%, Japan 11%, (Canada, 4%, (Germany, 3%, (Australia, S. Korea, Spain, Mexico, Norway, Russia, 2%
Other Users, Jan.-Sep Received CD-ROMs 80 Custom data orders (FTP, tape) 906 Data downloads from the online server 406 taking more than one file (66 GB) 678 Served
Data Archive Assistance for the Research Community
Who is the Research Community?
  – International
  – University
  – Other UCAR Research Programs
Why?
  – Helps others achieve goals
  – Provides additional resources at NCAR
  – Can lead to future opportunities
International collaboration
GCIP Model Data Center
  GCIP: GEWEX Continental-Scale International Project
  GEWEX: Global Energy and Water Cycle Experiment
  High res. atmos. models focused on energy and hydrology cycles, with many surface boundary layer data
  Critical data for N. American mesoscale studies
  Complete archive is approx. 1 Terabyte
  JOSS/UCAR has many of the GCIP observations
Eta – NCEP        3 hr  40 km  25 lvs  5/1995 – 7/2001
MAPS – FSL NOAA   3 hr  40 km  5 lvs   8/ /2001
GEM – Canadian    6 hr  41 km  28 lvs  4/1997 – 6/2001
University collaboration
U. Miami, Ocean Model Data
MICOM: Miami Isopycnic Coordinate Ocean Model, 1/12th degree, 70N to 28S, layers
  COADS Clim. Forcing   6 yrs  305 Gigabytes
  ECMWF Clim. Forcing   2 yrs  164 Gigabytes
  ECMWF Daily Forcing   5 yrs  415 Gigabytes ( )
Why?
  Needed help getting the data to users
How?
  – Web order interface
  – Automatic processes to stage the data from the MSS and create subsets
  – Data then staged for FTP pickup
Figure: 6-yr Mean T at 5 meters
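The subsetting step in the workflow above can be sketched in a few lines. This is a minimal illustration, not the SCD implementation: it assumes a regular latitude grid with 12 points per degree running south to north from 28S to 70N, and the function name `latitude_slice` and its parameters are hypothetical. The real model grid may be laid out differently.

```python
import math


def latitude_slice(south, north, grid_south=-28.0, grid_north=70.0, per_degree=12):
    """Return (first, last) inclusive row indices for the band [south, north].

    Assumes rows run south to north on a regular grid with `per_degree`
    rows per degree of latitude (an illustrative simplification).
    Returns None when no grid row falls inside the requested band.
    """
    if south > north:
        raise ValueError("south must not exceed north")
    # Clip the request to the model domain.
    south = max(south, grid_south)
    north = min(north, grid_north)
    if south > north:
        return None  # request lies entirely outside the domain
    first = math.ceil((south - grid_south) * per_degree)
    last = math.floor((north - grid_south) * per_degree)
    if first > last:
        return None  # band is narrower than the grid spacing
    return first, last


# Example: rows covering the equator to 10N.
rows = latitude_slice(0.0, 10.0)
print(rows)
```

Computing index ranges first means only the requested rows need to be read from a staged file, which matters when whole collections run to hundreds of gigabytes.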
UCAR research collaboration
SuomiNet Data from GST
GST: GPS Science and Technology Program, leveraging GPS satellites for science
  GPS satellite signals at receivers
  Estimate integrated water vapor
  Total electron content (ionosphere)
Why?
  The GST project is focused on real-time data capture and provision
  We want to preserve the archive for long term studies in the future
How?
  GST staff stage data to the MSS
  SCD staff perform archive maintenance and access
Figure: Receiver Sites
UCAR research collaboration
Support real-time data services
Unidata's (UCAR) role
  – Runs full capacity IDD/LDM application
  – Serves approximately 20 universities downstream
SCD's role
  – 24 hr x 7 day operation monitoring
  – Routine system backup
  – Data archive backup and maintenance
      Covers one year, done daily
      2.3 GB per day to the MSS
      Observations, NCEP model data, wind profilers
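For a sense of scale, the daily rate quoted on this slide can be turned into a yearly volume. The sketch below assumes that "one year" means 365 days and that GB and TB are decimal units; both are assumptions made for illustration, and the names are hypothetical.

```python
GB_PER_DAY = 2.3       # daily volume sent to the MSS, from the slide
RETENTION_DAYS = 365   # assumption: "one year" taken as 365 days


def archive_volume_gb(gb_per_day, days):
    """Total volume in GB accumulated over `days` at a constant daily rate."""
    return gb_per_day * days


total_gb = archive_volume_gb(GB_PER_DAY, RETENTION_DAYS)
print(f"{total_gb:.1f} GB, about {total_gb / 1000:.2f} TB")
```

Under these assumptions the one-year archive comes to roughly 840 GB, which is comparable to the approx. 1 Terabyte GCIP model archive described earlier.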
Future Data Collection
Observations
Rescue of African Data (NOAA/FSU)
  – Upper air and surface data
  – Digitize and save magnetic tapes
  – Why? Improve coverage
Rescue of Russian Ocean Observations (NSF)
  – Digitize six million marine surface observations
  – "Cold War" archive: global coverage, 1960s and 1970s
Map reanalysis model QC back onto original data
  – Merge many sources of observations together
Many others
  – E.g. more global river flow data
Future Data Collection
Reanalyses
ECMWF's ERA-40 –
  – SCD has provided observational datasets
  – Three time periods computed at once, completion approx. late
  – T159 (~82 km), 60 levels, and 6 hr resolutions
  – 15 TB of data
  – Cost = $100K, and restricted distribution
Future Data Collection
Reanalyses
NCEP N. America Mesoscale
  – Probably, 30 km, levels
  – Starting late 2001, maybe later
  – Based on the ETA model
Next U.S. Global Reanalysis
  – Based on NCEP experiments and ERA-40 results
  – Might be under a combined NASA and NOAA project
Generally: Stay informed about science activities, sometimes through participation, and then collect new data sources as they emerge.
Key Summary Points
Update many archives to support research
  – Observational collections
  – Operational Analyses
  – Reanalyses
Collaborations to promote great science
  – International
  – University
  – NCAR/UCAR
Have plans to collect emerging new data resources