Data Curation in Climate and Weather

1 Data Curation in Climate and Weather
Clifford Jacobs, National Science Foundation, US
Steven J. Worley, National Center for Atmospheric Research, US
Joseph L. “Joey” Comeaux II

2 Outline
Sustainable Data Curation; NCAR Research Data Archive; Examples of Sustainable Data Curation: Atmospheric Re-analyses and TIGGE; Summary

3 Sustainable Data Curation
Stable Funding; Enriched Staff (Knowledgeable, Consistent Levels); Robust Storage; Backup Plans; Data Formats; Partnerships

4 Sustainable Data Curation
Stable Funding: Focused on data management rather than on specific projects; allows flexibility; necessary to keep the curated collection viable.
Staff (knowledgeable): Educated in the specific discipline, which is important for checking the integrity of the data, choosing how to organize the data, creating adequate metadata, and designing the access system and assisting users.
Staff (consistent staffing levels): Dedicated to best practices in archiving and stewardship. Staff hold a great deal of knowledge regardless of documentation, and the value of human-based knowledge cannot be underestimated. We find ~10 years of continuity is good.

5 Sustainable Data Curation
Robust Storage Facilities: Must be capable of meeting growth needs. NCAR uses a tape-based Mass Storage System (MSS) whose size more than doubles every 2.5 years; it currently holds > 6 PB. The system must handle data migration across generations of media ("oozing"): tape sizes in the MSS have grown from 20 GB to 60 GB to 200 GB to 1000 GB. Oozing must not interrupt normal day-to-day operations, and access speeds must be able to handle daily curation and stewardship activities.
Backups: Loss of data is attributed to two general causes: environmental (fire, flood, earthquake, ...) and equipment (drive failures, tape deterioration). Resolution: store copies of irreplaceable data at separate facilities, and store backup copies on different drives/tapes than the originals.
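The growth figures on this slide (size more than doubling every 2.5 years, > 6 PB today, tapes growing from 20 GB to 1000 GB) can be turned into a rough capacity projection. This is a minimal sketch, assuming exact doubling every 2.5 years from 6 PB and decimal units; because the slide gives "> 2x" and "> 6 PB", the results are lower bounds. The function names are hypothetical.

```python
import math

GB_PER_PB = 1_000_000  # decimal units: 1 PB = 1,000,000 GB (assumption)

def projected_size_pb(current_pb: float, years: float, doubling_years: float = 2.5) -> float:
    """Archive size after `years`, assuming it doubles exactly every `doubling_years`."""
    return current_pb * 2 ** (years / doubling_years)

def tapes_needed(size_pb: float, tape_capacity_gb: float) -> int:
    """Tapes required to hold one copy of `size_pb` petabytes (rounded up)."""
    return math.ceil(size_pb * GB_PER_PB / tape_capacity_gb)

# Growth: 12 PB after 2.5 years, 24 PB after 5, 96 PB after 10.
for years in (2.5, 5, 10):
    print(years, projected_size_pb(6, years))

# One copy of 6 PB on each tape generation named on the slide.
for capacity_gb in (20, 60, 200, 1000):
    print(capacity_gb, tapes_needed(6, capacity_gb))
```

The tape counts (300,000 tapes at 20 GB versus 6,000 at 1000 GB) show why migrating to each new media generation matters; keeping backup copies on separate tapes, as recommended above, adds to these counts for every dataset that is backed up.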

6 Sustainable Data Curation
Formats: Ensure data access for the long term. Formats should be fully documented to the byte level and non-proprietary. Practices to avoid: formats that depend on a particular OS, hardware, or application; assuming the latest/greatest format is best for your situation.
Partnerships: Worldwide sharing of data through unrestricted open access provides research opportunities greater than any one center can provide. Partnerships are both national and international. No single institute can "do it all", yet most users "need/want it all". Partnerships are also a good way to share some costs.
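To illustrate what "fully documented to the byte level" and "not dependent on OS or hardware" can mean in practice, here is a minimal sketch using a hypothetical fixed-length record layout (it is not an actual RDA format). Every byte position, type, and byte order is spelled out, so the file can be decoded on any platform without the software that wrote it.

```python
import struct

# Hypothetical record layout, documented byte by byte:
#   bytes 0-3   station id        unsigned 32-bit integer, big-endian
#   bytes 4-7   observation time  unsigned 32-bit integer, big-endian (e.g. seconds since an epoch)
#   bytes 8-11  temperature       IEEE 754 32-bit float, big-endian, kelvin
# The leading ">" fixes big-endian byte order and disables native alignment
# padding, so the layout does not depend on the machine that reads or writes it.
RECORD_FORMAT = ">IIf"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # 12 bytes

def pack_record(station_id: int, obs_time: int, temperature_k: float) -> bytes:
    """Encode one record exactly as documented above."""
    return struct.pack(RECORD_FORMAT, station_id, obs_time, temperature_k)

def unpack_records(buffer: bytes) -> list:
    """Decode a buffer of whole records; reject truncated data instead of guessing."""
    if len(buffer) % RECORD_SIZE != 0:
        raise ValueError("buffer is not a whole number of records")
    return list(struct.iter_unpack(RECORD_FORMAT, buffer))

data = pack_record(12345, 0, 273.25) + pack_record(12345, 21600, 275.5)
print(unpack_records(data))  # [(12345, 0, 273.25), (12345, 21600, 275.5)]
```

The design choice here is to use only the language's standard library and an explicit, written layout rather than a proprietary or application-specific container.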

7 NCAR Research Data Archive (RDA)
Reference datasets maintained for use by the research community; they receive a high level of curation and stewardship. Primarily meteorological and oceanographic datasets. > 200 person-years invested in the RDA. The RDA is managed by 8 staff members. 246 TB (currently); 580 datasets (~ new datasets added annually).

8 Contents of the RDA (580 datasets)

9 Comeaux/Worley/Dattore - SCD/DSS
10/1/2019

10 Case Studies: Atmospheric Re-analyses and TIGGE

11 Reanalysis Projects
Reanalyses are a prime example of data curation and stewardship. They encompass all 6 major aspects of good data curation, are a main feature of the RDA, and have been a very valuable resource for a wide variety of climate and weather studies.

12 Input Data Sources for ERA-40 Reanalysis

13 Assimilation Model System
A reanalysis processes data for long periods using the most current system available at the time the reanalysis is run. The critical aspect is that the model system does not change during the process, so any detected changes or trends in the atmosphere cannot be attributed to changes in the model.
Diagram: Surface/atmospheric observations (radiosondes, surface data, oceanographic, satellite) -> Assimilation Model System -> Gridded analyses
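The key property above, one unchanging system applied across the whole period, can be sketched in code. This is a toy illustration, not how any reanalysis is implemented: `AssimilationSystem`, `run_reanalysis`, and the fingerprint are hypothetical names, and the "assimilation" is only the textbook scalar optimal-interpolation update x_a = x_b + K(y - x_b) with K = σb² / (σb² + σo²).

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: the configuration cannot be modified after creation
class AssimilationSystem:
    """Hypothetical, highly simplified stand-in for a frozen assimilation system."""
    version: str
    background_error_var: float   # sigma_b^2
    observation_error_var: float  # sigma_o^2

    def fingerprint(self) -> str:
        """Short identifier of the exact configuration used for every analysis."""
        text = f"{self.version}|{self.background_error_var}|{self.observation_error_var}"
        return hashlib.sha256(text.encode()).hexdigest()[:12]

    def analyze(self, background: float, observation: float) -> float:
        """Scalar optimal-interpolation update: x_a = x_b + K (y - x_b)."""
        gain = self.background_error_var / (self.background_error_var + self.observation_error_var)
        return background + gain * (observation - background)

def run_reanalysis(system, backgrounds, observations):
    """Apply one fixed system to every time step and tag each analysis with its fingerprint."""
    tag = system.fingerprint()
    return [(system.analyze(b, y), tag) for b, y in zip(backgrounds, observations)]

system = AssimilationSystem(version="frozen-1", background_error_var=1.0, observation_error_var=1.0)
results = run_reanalysis(system, [280.0, 281.0], [282.0, 279.0])
# With equal error variances K = 0.5, so the analyses are 281.0 and 280.0,
# and every result carries the same configuration fingerprint.
print(results)
```

Tagging each output with a configuration fingerprint makes the "model system does not change" property checkable after the fact: a single distinct fingerprint across the archive means no model change can explain a detected trend.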

14 Most Current Reanalysis Projects
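Name            Start   End       Time step   Horizontal   Vertical
NCEP/NCAR       1948    Ongoing   6 hours     209 km       17 Plvl
NCEP-DOE        1979
ECMWF ERA-40    1957    2002                  125 km       23 Plvl
NCEP NARR                         3 hours     32 km        29 Plvl
Japanese JRA
(Start and End give the temporal range; Time step, Horizontal, and Vertical give the highest resolution. Plvl = pressure levels.)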

15 The Observing System Research and Predictability Experiment (THORPEX) Interactive Grand Global Ensemble (TIGGE)
Goal: Accelerate improvements in 1-day to 14-day high-impact weather forecasts.

16 TIGGE
Output from daily numerical weather model runs (forecasts) is provided by 10 data providers to 3 archive centers (NCAR is one). NCAR receives 1.6 million fields and 240 GB per day (~87 TB/year).
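The daily TIGGE figures can be cross-checked with a little arithmetic. This sketch assumes decimal units (1 TB = 1000 GB, 1 GB = 10^9 bytes) and a 365-day year; the function names are illustrative only.

```python
FIELDS_PER_DAY = 1_600_000  # fields received at NCAR per day (from the slide)
GB_PER_DAY = 240            # volume received at NCAR per day (from the slide)

def annual_volume_tb(gb_per_day: float, days: int = 365) -> float:
    """Yearly volume in TB, assuming a constant daily intake."""
    return gb_per_day * days / 1000

def mean_field_size_kb(gb_per_day: float, fields_per_day: int) -> float:
    """Average size of one field in kilobytes (decimal)."""
    return gb_per_day * 1e9 / fields_per_day / 1000

print(annual_volume_tb(GB_PER_DAY))                    # 87.6, consistent with "~87 TB/year"
print(mean_field_size_kb(GB_PER_DAY, FIELDS_PER_DAY))  # 150.0 KB per field on average
```

The small average field size combined with the very large field count is one reason a real-time archive like this is demanding: per-field bookkeeping (indexing, checking, and serving each field) scales with the 1.6 million daily fields, not just with the byte volume.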

17 TIGGE Dataflow Between Archive and Provider Centers
Centers shown: NCAR, NCEP, CMC, CPTEC, UKMO, Meteo-France, KMA, JMA, BoM, ECMWF, CMA

18 Major Modern Challenges for Sustainable Data Curation
The real-time data aspect; data volume; recovery from network outages, production delays, and power-down events spread across 10 production centers and 3 archive centers; providing access to such a large and dynamic archive.
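Recovering from outages spread across many centers starts with knowing exactly what is missing. A minimal sketch, assuming the archive can enumerate the deliveries it expects as (provider, date, forecast cycle) triples; the provider list, the 00/12 UTC cycles, and the function names are illustrative assumptions, not the actual TIGGE procedure.

```python
from datetime import date, timedelta
from itertools import product

def expected_deliveries(providers, start: date, end: date, cycles=(0, 12)):
    """Every (provider, date, cycle) delivery expected between start and end, inclusive."""
    days = (end - start).days + 1
    dates = [start + timedelta(days=i) for i in range(days)]
    return set(product(providers, dates, cycles))

def missing_deliveries(expected, received):
    """Deliveries to re-request after an outage, sorted for a reproducible recovery queue."""
    return sorted(expected - set(received))

expected = expected_deliveries(["ECMWF", "NCEP"], date(2008, 1, 1), date(2008, 1, 2))
# Simulate a network outage that dropped a single delivery.
received = expected - {("NCEP", date(2008, 1, 2), 12)}
print(missing_deliveries(expected, received))  # [('NCEP', datetime.date(2008, 1, 2), 12)]
```

Computing the gap as a set difference keeps recovery idempotent: re-running the check after a partial backfill simply returns whatever is still missing.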

19 Summary: Important Factors in Sustainable Data Curation
Stable Funding; Enriched Staff; Robust Storage; Backups; Formats; Partnerships

20 Questions and/or comments
Thank you


