Climate Data Records and Science Data Stewardship: Playing for Keeps
Bruce R. Barkstrom
National Climatic Data Center, NOAA

Outline
What are CDRs?
–An Example
–General Characteristics
What's Involved in SDS?
–Assuring that the data and context are valuable to the future
–Making sure data are ready to preserve
–Making sure data and context will be useful
–Making sure data and context will survive
–Being cost effective

An Example CDR – Solar Constant
Original data cover several decades
Multiple data sources
Work needed:
–Physical model of causes of differences
–Development of homogeneous data set versions
–Estimation of detectable variability and trends

CDR Characteristics
Covers a long time period (decades or more if possible)
Likely to have multiple data sources
Every attempt to deal with errors on a physical basis
Every attempt to make errors homogeneous over the record
–Software must have full configuration management
–Input data sources should be as homogeneous as possible
Intent is to provide
–Quantified variability: Cumulative Distribution Functions (CDFs) of parameter variations, not only for global averages but also for regional values and extreme value statistics
–Quantified change detection: the ability to test observed CDFs against the expected CDFs of potential changes (see the sketch below)
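
As a concrete reading of the change-detection idea, the minimal sketch below compares an observed sample of a parameter against a reference sample with a two-sample Kolmogorov–Smirnov test. The data are synthetic and the variable names are hypothetical; the slide does not prescribe a particular test.

```python
# Minimal sketch: test whether an observed sample of a climate parameter is
# consistent with a reference (expected) sample using a two-sample
# Kolmogorov-Smirnov test.  All data and thresholds here are synthetic.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Hypothetical reference record (e.g., an earlier, well-characterized period)
reference = rng.normal(loc=1361.0, scale=0.5, size=500)   # W m^-2
# Hypothetical observed record (possible small shift to be detected)
observed = rng.normal(loc=1361.2, scale=0.5, size=500)    # W m^-2

stat, p_value = ks_2samp(observed, reference)
print(f"KS statistic = {stat:.3f}, p-value = {p_value:.4f}")
if p_value < 0.05:
    print("Observed CDF differs from the expected CDF at the 5% level.")
else:
    print("No detectable change at the 5% level.")
```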

How Do We Assess the Value of a CDR?
3 Approaches:
–Cost of Acquiring the CDR
–Cost of Reconstruction – if possible
  Need to have the original data, need to assemble hardware and software, need to run (maybe 2 or 3 million jobs)
–Present Value of Future Use
  Economists discount future benefits at 7% per year (illustrated below)
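
To make the discounting point concrete, this minimal sketch computes the present value of a hypothetical stream of future annual benefits at a 7% discount rate. The benefit figures are invented purely for illustration.

```python
# Minimal sketch: present value of an assumed stream of future annual
# benefits from a CDR, discounted at 7% per year.  Benefit values are
# purely illustrative.
DISCOUNT_RATE = 0.07

def present_value(annual_benefits, rate=DISCOUNT_RATE):
    """Discount a list of future annual benefits (years 1, 2, ...) to today."""
    return sum(b / (1.0 + rate) ** t for t, b in enumerate(annual_benefits, start=1))

# Hypothetical: $1M of benefit per year for the next 30 years
benefits = [1.0e6] * 30
pv = present_value(benefits)
print(f"Present value of 30 years of $1M/yr at 7%: ${pv:,.0f}")
# ~ $12.4M, far less than the undiscounted $30M, which is why long-term
# benefits are hard to use to justify near-term archive costs.
```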

Valuation is Tough
OMB Question: Why do we need more than $2B/year for climate?
CCSP and CEOS have both had trouble prioritizing
Probably two scales of value
–Scientific "Value" – represented by the "Bretherton Issues"
–Societal Benefit – represented by reduction in damage, lives saved, new industries created
Quantifying to OMB's satisfaction is difficult
Question 1: Can CI help with justifying priorities?

Good Archival Practice
ISO Standard for "What an Archive Should Do for Long-Term Preservation"
–the OAIS Reference Model (ISO 14721)
Recommendations:
–Prepare a Submission Agreement between an Archive and a Data Provider
–Evaluate the condition and completeness of candidate data and metadata
–Plan the work required to repair deficiencies
SDS Preferred Approach – use a "Maturity Model"

Maturity Model
Evaluate maturity along 3 axes:
–Scientific Maturity
–Preservation Maturity
–Societal Benefit
For each axis:
–Reduce the evaluation to a non-dimensional scaling of attributes
–Ask experts for evaluations (see the scoring sketch below)
Question 2: Can CI help with the evaluation of maturity?
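
One way to read "non-dimensional scaling of attributes" is to map each attribute onto a small ordinal scale, combine expert ratings per attribute, and normalize so the three axes are comparable. The attribute names, levels, and simple averaging rule below are assumptions made only to illustrate the idea; they are not the NCDC scheme.

```python
# Minimal sketch: combine expert ratings (1-5 ordinal levels) into a
# non-dimensional maturity score per axis.  Attribute names, levels, and
# the averaging rule are illustrative assumptions.
from statistics import mean

# Hypothetical ratings from three experts for one candidate CDR
ratings = {
    "scientific":   {"peer_review": [4, 3, 4], "error_budget": [3, 3, 2]},
    "preservation": {"documentation": [2, 3, 2], "config_mgmt": [4, 4, 5]},
    "societal":     {"user_base": [3, 2, 3], "decision_support": [2, 2, 1]},
}

MAX_LEVEL = 5  # top of the ordinal scale

for axis, attributes in ratings.items():
    # Average experts within each attribute, then average the attributes,
    # and normalize to [0, 1] so the axes are dimensionless and comparable.
    attr_scores = [mean(scores) / MAX_LEVEL for scores in attributes.values()]
    print(f"{axis:12s} maturity = {mean(attr_scores):.2f}")
```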

Work Required to Produce CDRs
Evaluation of the Available Record for Gaps and Understandability
–Gaps
–Documentation
Evaluation of Candidate CDR Uncertainties
–Error sources considered
–Calibration and validation
Evaluation of Record Repair Work
–Gaps
–Recalibration
–Uncertainty estimation

Roles of Satellite Data and In-Situ Data
In-situ data complement satellite data
–Satellites for coverage – although the challenge is getting adequate length of record
–In-situ data for calibration and validation
For data stewardship
–Need preservation of context: cal/val data, source code, documentation of procedures, metadata
–Results of intercomparisons should show measurable improvement in uncertainty

Some Thoughts on Quantifying the Impact of In-Situ Data
Errors in satellite measurements
–Estimates should be based on physical causes
–Stewardship needs a way of making them publicly available – and of accommodating changes in the community's assessments over time
–Statistical in nature
–Delimited by time interval and spatial region
–Most rigorously specified as a CDF of the error
–Might be simply specified in terms of the standard deviation of the error about the "average" measured value
Cal/val efforts should improve the "error bars"
–Stringency: ratio of the error dispersion about the mean before cal/val to the dispersion after (the factor by which cal/val must shrink the error bars)
  1 – no improvement; 2 to 5 – moderate improvement; >10 – a really stringent requirement on cal/val
  Related to the number of independent samples in the cal/val set
–Plausibility: significance of the improvement
  Unsuspicious – p of difference ~20%; somewhat convincing – p ~ 5%; fairly confident – p ~ 1%
Number of iterations in reprocessing
–Inversely proportional to experience
–Increases with required stringency and plausibility
Question 3: Can CI help evaluate proposed in-situ validation data sets for error reductions, stringency, and plausibility? (See the sketch below.)
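
One possible reading of the stringency and plausibility metrics, sketched in code: stringency as the factor by which cal/val shrinks the error dispersion, and plausibility as the significance of that variance reduction via a one-sided F-test. The data are synthetic, and the choice of an F-test is an assumption; the slide does not prescribe a particular test.

```python
# Minimal sketch: stringency = dispersion(before) / dispersion(after),
# plausibility = significance of the variance reduction (one-sided F-test).
# Synthetic data; the F-test choice is an assumption, not prescribed above.
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(0)

# Hypothetical satellite-minus-truth residuals before and after cal/val
errors_before = rng.normal(0.0, 2.0, size=60)   # e.g., in K or W m^-2
errors_after  = rng.normal(0.0, 0.8, size=60)

s_before = np.std(errors_before, ddof=1)
s_after  = np.std(errors_after, ddof=1)

stringency = s_before / s_after
print(f"Stringency (dispersion reduction factor): {stringency:.1f}")

# One-sided F-test that the 'after' variance is smaller than the 'before' variance
F = s_before**2 / s_after**2
p_value = f.sf(F, dfn=len(errors_before) - 1, dfd=len(errors_after) - 1)
print(f"Plausibility: p = {p_value:.3g}")
# p ~ 0.2 -> unsuspicious; p ~ 0.05 -> somewhat convincing; p ~ 0.01 -> fairly confident
```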

The Odds for Long-Term Preservation
Preservation inclines one toward pessimism
–If p is the annual probability of loss and
–N is the number of years to survive,
–the probability of survival is (1 – p)^N
–To have a 99% probability of survival for 200 years requires p ≈ 5 × 10^-5
Standard approach to reducing risk
–Assess mechanisms of loss
–Quantify the annual probability of loss and the probable value of the loss [note the return to the valuation issue]
–Find an affordable risk-mitigation approach
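
A quick check of the arithmetic behind the 200-year figure (a minimal sketch; the 1%/yr comparison case is added here only for contrast):

```python
# Minimal sketch: required annual loss probability p so that the survival
# probability (1 - p)**N meets a target over N years.
def required_annual_loss_prob(target_survival, years):
    """Solve (1 - p)**years = target_survival for p."""
    return 1.0 - target_survival ** (1.0 / years)

p = required_annual_loss_prob(0.99, 200)
print(f"Annual loss probability must be <= {p:.2e}")   # ~5.0e-05

# Conversely, a seemingly small 1%/yr loss rate is ruinous over two centuries:
print(f"Survival over 200 yr at 1%/yr loss: {(1 - 0.01) ** 200:.1%}")  # ~13.4%
```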

Science Data Stewardship: What Are the Odds?
Important Risks
–IT Security Incidents: 10% per year probability; maybe 10% of the collection at risk of corruption (p = 1%/yr – need dispersion across systems)
–Operator Error: 10% per year probability; loss depends on the time operators work and the degree of automation (p = 1%/yr – need QA)
–Hardware or Software Error: 5% per year probability; loss as for operator error
–Hardware or Software Obsolescence: 100% probability of loss in 5 to 10 years (p = 20%/yr)
Suggests treating the expenses of hardware and software replacement as "insurance expenses" – not assets
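
One way to use these figures is to turn each risk into an expected annual fraction of the collection lost, which is what the per-year p values in parentheses summarize. A sketch under the slide's numbers follows; the collection replacement value is a hypothetical figure added for illustration.

```python
# Minimal sketch: expected annual loss from the risk figures on this slide.
# The collection replacement value is a hypothetical number for illustration.
risks = {
    # name: (annual probability of incident, fraction of collection lost if it occurs)
    "IT security incident":           (0.10, 0.10),
    "Operator error":                 (0.10, 0.10),
    "Hardware/software error":        (0.05, 0.10),
    "Hardware/software obsolescence": (0.20, 1.00),  # slide's p = 20%/yr, total loss if unmitigated
}

collection_value = 50e6  # hypothetical replacement cost, dollars

for name, (p_incident, fraction_lost) in risks.items():
    expected_fraction = p_incident * fraction_lost
    print(f"{name:32s} expected annual loss ~ {expected_fraction:5.1%} "
          f"(${expected_fraction * collection_value:,.0f})")
# Mitigation (migration, QA, replication) that costs less than the expected
# loss it removes is worth buying: the 'insurance expense' framing above.
```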

Science Data Stewardship: How Do We Improve the Odds?
SDS will require several new things:
–Making the history and details of data provenance public (anything proprietary dies)
–Capturing now-tacit knowledge before it disappears (knowledge not captured dies when the knower retires, gets sick, or dies)
–Creating methods of tracing the evolution of data, metadata, and assessments of same
Expectation: the SDS grants program provides an avenue for bringing in ideas that
–improve information survivability
–reduce the cost of archival
–make data and context more useful for those who come after
If we don't succeed, we've all been publishing in The Journal of Irreproducible Results