1
Data Quality: A Science Community Perspective
Kerstin Lehnert, Lamont-Doherty Earth Observatory, Columbia University
K. Lehnert, ESIP Panel on Data Quality, 7/13/11
Thanks for helpful comments: Mark Ghiorso, Ken Ferrier, Al Hofmann, Alexey Kaplan, Roger Nielsen, Mohan Ramamoorthy, Tom Whittaker
2
DQ & Science Science Technology Standards Norms Tools
DQ is a shared norm against which the work of scholars is measured. Scientists produce and use data that are the foundation of theories, hypotheses, and models; the quality of the data determines the quality of the science built on them. Before we can start thinking about the development and implementation of metadata schemas and protocols, science communities need to understand these norms. Attacking large-scale problems in the Earth Sciences, from climate change to convection of the Earth's mantle, requires scientists to use large datasets, many of which need to be compiled from multiple sources.
3
The Social Side of DQ

"The reliability of knowledge about climate change depends on the commensurability of data in space and time."

From Paul N. Edwards, "'A Vast Machine': Standards as Social Technology," Science, vol. 304, 2004:

"Yet even these changes have not eliminated the importance of disciplined human beings for the successful implementation of standards. For example, in the late 1980s, the U.S. Weather Service replaced liquid-in-glass thermometers with digital electronic ones at thousands of stations in its Cooperative Station Network. The new thermometers displayed Fahrenheit in units of 0.1°. Observers were meant to round off the readings to the nearest degree, but about 10% of observers simply entered the entire figure. Many who did round off probably did so incorrectly (6). Furthermore, the new, more accurate instruments did not correlate exactly with the old ones. Network-wide, the new instrumentation altered the mean daily temperature range by −0.7°C and the average daily temperature by −0.1°C compared with the previous system (7). This example illustrates the complex combination of social and technical problems that affect the implementation of standards. The consequences for the detection of climatic change can be profound: The biases discovered in the U.S. Cooperative Station Network, although correctable, 'are of the same magnitude as the changes of global and United States mean temperatures since the turn of the 20th century' (6)."

[Figure: Matthew Maury's 1858 diagram of the global atmospheric circulation.]
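To make the rounding problem concrete, here is a minimal Python simulation of how mixed observer behavior can bias a station network's mean. The behavior mix (70% round correctly, 20% truncate, 10% enter the full figure) and the reading distribution are illustrative assumptions, not figures from Edwards; the quoted passage reports only that about 10% entered the full figure and that many others rounded incorrectly.

```python
import random

random.seed(0)

def reported_value(reading_f, behavior):
    """Simulate one observer's report of a 0.1 degF digital reading.

    The protocol asks observers to round to the nearest whole degree.
    'truncate' models one plausible form of incorrect rounding;
    'full_figure' models entering the entire 0.1 degF reading.
    """
    if behavior == "round":
        return float(round(reading_f))
    if behavior == "truncate":
        return float(int(reading_f))  # drops the decimal: biased low
    return reading_f  # 'full_figure': unrounded, protocol not followed

# Hypothetical true readings; the uniform distribution is an assumption.
true_readings = [round(random.uniform(30.0, 90.0), 1) for _ in range(100_000)]

# Illustrative (assumed) mix of observer behaviors.
behaviors = ["round"] * 70 + ["truncate"] * 20 + ["full_figure"] * 10

reports = [reported_value(t, random.choice(behaviors)) for t in true_readings]

bias = sum(r - t for r, t in zip(reports, true_readings)) / len(reports)
print(f"network-mean bias from mixed observer behavior: {bias:+.2f} degF")
# With ~20% of observers truncating, the network mean drifts low by
# roughly 0.1 degF -- the same order as the signals being measured.
```

Correct rounding is unbiased on average, so the entire drift comes from the minority that deviates from the standard, which is exactly Edwards' point about the continued importance of disciplined human beings.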
4
Earth Science Data
- Diversity of data: many disciplines, mostly observational data and model outputs
- Big data from sensors and sensor networks: standardized data acquisition; the scientist is not the producer of the raw data but synthesizes it and generates derived products
- Small data generated in the lab or in the field: sometimes not even numerical, acquired by a wide range of methods, often personalized
5
Error Budgets
http://www.ssterrorbudget.org/ISSTST/White_Paper.html
[Diagram from the White Paper on the SST Error Budget, produced by the U.S. SST Science Team.]
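The white paper's diagram is not reproduced here, but the generic arithmetic of an error budget is easy to sketch: independent error components combine in quadrature (root-sum-square). The component names and magnitudes below are hypothetical placeholders, not values from the SST white paper.

```python
import math

# Hypothetical error components for a single SST retrieval, in kelvin.
# Names and magnitudes are illustrative assumptions.
components_k = {
    "sensor noise": 0.15,
    "calibration": 0.10,
    "atmospheric correction": 0.20,
    "sampling / representativeness": 0.12,
}

# Independent error sources combine in quadrature (root-sum-square);
# correlated sources would require a full covariance treatment.
total = math.sqrt(sum(sigma ** 2 for sigma in components_k.values()))

for name, sigma in components_k.items():
    share = (sigma / total) ** 2
    print(f"{name:>30s}: {sigma:.2f} K  ({share:6.1%} of total variance)")
print(f"{'total (RSS)':>30s}: {total:.2f} K")
```

Printing each component's share of the total variance is what makes this a budget: it shows which error source dominates and therefore where improvement effort pays off.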
6
DQ: Instrument Errors
"Most of the rapid decrease in globally integrated upper (0–750 m) ocean heat content anomalies (OHCA) between 2003 and 2005 reported by Lyman et al. [2006] appears to be an artifact resulting from the combination of two different instrument biases recently discovered in the in situ profile data."
7
DQ: Precision
"Mantle Myths, Reservoirs, and Databases"
Presentation by A. Hofmann at the Goldschmidt Conference 2008
8
What Defines DQ?
"Knowing that I can trust the numbers."
"Data having an uncertainty that actually corresponds to the uncertainty stated in the source."
"In one word, 'completeness'." (Completeness allows others to assess the validity of the data, because then you can check the standards used, techniques, reproducibility, etc.)
Reproducibility, precision, …
9
How Do You Evaluate DQ?
- "Analytical completeness", including uncertainties and metadata.
- Statistical tests, internal consistency.
- Rely on the reputation of the investigator, either directly or by association.
- "Well, usually I don't, because that's a lot of work."
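The first two answers, together with the earlier definition "an uncertainty that actually corresponds to the uncertainty stated in the source," suggest a simple automated check: compare the scatter of replicate measurements against the stated uncertainty. A minimal sketch, with an assumed 2x tolerance band and invented numbers:

```python
import statistics

def uncertainty_is_credible(replicates, stated_sigma, tolerance=2.0):
    """Check that replicate scatter matches a source's stated 1-sigma.

    Returns the observed/stated ratio and whether it falls within an
    (arbitrary, assumed) tolerance band. A ratio well above 1 suggests
    the source understates its uncertainty; well below 1, overstates it.
    """
    observed_sigma = statistics.stdev(replicates)
    ratio = observed_sigma / stated_sigma
    return ratio, (1.0 / tolerance) <= ratio <= tolerance

# Invented replicate analyses of one sample, with a stated sigma of 0.05.
ratio, ok = uncertainty_is_credible([10.02, 9.95, 10.11, 9.88, 10.20], 0.05)
print(f"observed/stated sigma = {ratio:.1f} -> stated uncertainty credible: {ok}")
```

The point of automating even a crude test like this is the last answer on the slide: people skip evaluation when it is a lot of work.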
10
DQ Needs Carrots & Sticks
- Tools for DQ metadata management, e.g. capture during data acquisition
- Software for using DQ metadata in data analysis, synthesis, and modeling (a minimal sketch follows this list)
- Policies for, and enforcement of, data & metadata reporting
- Peer review of data
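As one concrete example of "software for using DQ metadata," analysis code can weight compiled values by their stated uncertainties instead of treating all inputs as equal. A minimal sketch of the standard inverse-variance weighted mean; the input values are invented:

```python
import math

def weighted_mean(values, sigmas):
    """Inverse-variance weighted mean and its standard error.

    A standard way for analysis software to consume per-value
    uncertainty metadata instead of treating all inputs as equal.
    """
    weights = [1.0 / s ** 2 for s in sigmas]
    mean = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return mean, math.sqrt(1.0 / sum(weights))

# Invented compiled values with heterogeneous stated uncertainties.
values = [3.42, 3.55, 3.48]
sigmas = [0.02, 0.10, 0.05]

m, se = weighted_mean(values, sigmas)
print(f"weighted mean   = {m:.3f} +/- {se:.3f}")
print(f"unweighted mean = {sum(values) / len(values):.3f}")
```

This only works if the DQ metadata was captured in the first place, which is why tools for capture during acquisition come first in the list above.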
11
Data Publication
- Publication of data in repositories
- QC/QA at the repository (completeness, consistency; a sketch follows this list)
- Open access
- Long-term archiving
- Links to scientific articles via unique identifiers
- Support for investigators to comply with agency policies
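A hedged sketch of what a repository's completeness/consistency gate might look like; the required fields and the submission record are hypothetical illustrations, not any actual repository's schema:

```python
# Hypothetical minimal reporting requirements; a real repository's
# schema would be discipline-specific and far richer.
REQUIRED_FIELDS = {
    "sample_id", "latitude", "longitude", "method",
    "analyte", "value", "unit", "uncertainty",
}

def completeness_check(record: dict) -> list[str]:
    """Return human-readable problems; an empty list means the record
    passes this (deliberately minimal) completeness/consistency gate."""
    problems = [f"missing field: {f}"
                for f in sorted(REQUIRED_FIELDS - record.keys())]
    lat, lon = record.get("latitude"), record.get("longitude")
    if lat is not None and not -90.0 <= lat <= 90.0:
        problems.append(f"latitude out of range: {lat}")
    if lon is not None and not -180.0 <= lon <= 180.0:
        problems.append(f"longitude out of range: {lon}")
    return problems

# An invented, deliberately incomplete submission.
submission = {"sample_id": "ABC-001", "latitude": 95.0, "longitude": 12.3,
              "analyte": "SiO2", "value": 49.7, "unit": "wt%"}
for problem in completeness_check(submission):
    print(problem)
```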
12
Conclusions (I): Science Community
- Needs to define the disciplinary norms for DQ measures
- Needs to drive the implementation of disciplinary standards:
  - Policies for data reporting & publication
  - Recommendations for data acquisition
13
Conclusions (II): Technology
- Needs to translate disciplinary standards into technical standards
- Needs to provide software tools that facilitate DQ management (capture, communication, & assessment)
14
Conclusions (III)
Science and technology need to work closely together to develop meaningful solutions for DQ management. The process needs to take into account the diversity of Earth Science disciplines and data types.