Data Quality: A Science Community Perspective. K. Lehnert, ESIP Panel on Data Quality, 7/13/11. Kerstin Lehnert, Lamont-Doherty Earth Observatory, Columbia University.

Presentation transcript:

1 Data Quality: A Science Community Perspective
K. Lehnert, ESIP Panel on Data Quality, 7/13/11
Kerstin Lehnert, Lamont-Doherty Earth Observatory, Columbia University
Thanks for helpful comments: Mark Ghiorso, Ken Ferrier, Al Hofmann, Alexey Kaplan, Roger Nielsen, Mohan Ramamoorthy, Tom Whittaker

2 DQ & Science
[Diagram labels: Science, Technology, Norms, Standards, Tools]

3 The Social Side of DQ
“The reliability of knowledge about climate change depends on the commensurability of data in space and time.”
From Paul N. Edwards: "A Vast Machine": Standards as Social Technology, Science, vol. 304, 2004, DOI: /science
[Figure: Matthew Maury's 1858 diagram of the global atmospheric circulation.]

4 Earth Science Data

5 Error Budgets
[Diagram from the White Paper on the SST Error Budget, produced by the U.S. SST Science Team]

6 DQ: Instrument Errors
“Most of the rapid decrease in globally integrated upper (0–750 m) ocean heat content anomalies (OHCA) between 2003 and 2005 reported by Lyman et al. [2006] appears to be an artifact resulting from the combination of two different instrument biases recently discovered in the in situ profile data.”

7 DQ: Precision
“Mantle Myths, Reservoirs, and Databases”: presentation by A. Hofmann at the Goldschmidt Conference 2008

8 What Defines DQ?
• “Knowing that I can trust the numbers.”
• “Data having an uncertainty that actually corresponds to the uncertainty stated in the source.”
• “In one word, ‘completeness’.” (Completeness allows others to assess the validity of data, because then you can check for standards used, techniques, reproducibility, etc.)
• Reproducibility, precision, …

9 How Do You Evaluate DQ?
• ‘Analytical completeness’, including uncertainties and metadata.
• Statistical tests, internal consistency.
• Rely on reputation of the investigator, either directly or by association.
• “Well, usually I don't, because that's a lot of work.”
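One way to picture the "statistical tests, internal consistency" item is a reduced chi-square check on replicate measurements. The sketch below is only an illustration: the function names, the sample values, and the acceptance band are assumptions and do not come from the talk. It assumes all replicates measure the same quantity and share one stated 1-sigma uncertainty. A result near 1 suggests the stated uncertainty matches the observed scatter.

```python
from statistics import mean


def reduced_chi_square(values, stated_sigma):
    """Return the reduced chi-square of replicates about their mean.

    Hypothetical helper: assumes all replicates share one stated
    1-sigma uncertainty and measure the same true value.
    """
    if len(values) < 2:
        raise ValueError("need at least two replicates")
    if stated_sigma <= 0:
        raise ValueError("stated_sigma must be positive")
    center = mean(values)
    chi2 = sum(((v - center) / stated_sigma) ** 2 for v in values)
    # One degree of freedom is used up by estimating the mean.
    return chi2 / (len(values) - 1)


def uncertainty_is_plausible(values, stated_sigma, low=0.5, high=2.0):
    """Flag whether scatter and stated uncertainty roughly agree.

    The [low, high] band is an arbitrary illustrative choice,
    not a community standard.
    """
    return low <= reduced_chi_square(values, stated_sigma) <= high


# Made-up replicate values, for illustration only.
replicates = [10.1, 9.9, 10.2, 9.8, 10.0]
print(reduced_chi_square(replicates, stated_sigma=0.15))
print(uncertainty_is_plausible(replicates, stated_sigma=0.15))
print(uncertainty_is_plausible(replicates, stated_sigma=0.01))
```

A value far above 1 would hint that the stated uncertainty is too small for the scatter actually seen, and a value far below 1 that it is too generous.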

10 DQ Needs Carrots & Sticks
• Tools for DQ metadata management, e.g. capture during data acquisition
• Software for using DQ metadata in data analysis, synthesis, modeling
• Policies for and enforcement of data & metadata reporting
• Peer-review of data

11 Data Publication
• Publication of data in repositories
• QC/QA at repository (completeness, consistency)
• Open Access
• Long-term archiving
• Link to scientific articles via unique identifiers
• Support for investigators to comply with agency policies
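The completeness part of repository QC/QA could look roughly like the sketch below. It is a minimal illustration under stated assumptions: the required field names, the helper functions, and the sample records are all hypothetical and are not taken from the talk or from any particular repository.

```python
# Hypothetical list of fields a repository might require per measurement.
REQUIRED_FIELDS = ("value", "unit", "uncertainty", "method", "reference_standard")


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in a record."""
    return [f for f in required if record.get(f) in (None, "")]


def completeness_report(records, required=REQUIRED_FIELDS):
    """Map record index to its missing fields; complete records are omitted."""
    report = {}
    for i, rec in enumerate(records):
        gaps = missing_fields(rec, required)
        if gaps:
            report[i] = gaps
    return report


# Made-up submissions, for illustration only.
records = [
    {"value": 0.7035, "unit": "ratio", "uncertainty": 0.00002,
     "method": "TIMS", "reference_standard": "NBS 987"},
    {"value": 12.4, "unit": "ppm", "uncertainty": None, "method": "ICP-MS"},
]
print(completeness_report(records))
```

A report like this could be returned to the investigator at submission time, so that gaps in uncertainties or metadata are fixed before the data are published.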

12 Conclusions (I): Science Community
• Needs to define the disciplinary norms for DQ measures
• Needs to drive the implementation of disciplinary standards
• Policies for data reporting & publication
• Recommendations for data acquisition

13 Conclusions (II): Technology
• Needs to translate disciplinary standards to technical standards
• Needs to provide software tools that facilitate DQ management (capture, communication, & assessment)

14 Conclusion (III)
• Science and technology need to work closely to develop meaningful solutions for DQ management.
• The process needs to take into account the diversity of Earth Science disciplines and data types.