Science Data Management Implications for the Ethos of Science
Ruth Duerr, Mark A. Parsons
Science Data Management and Preservation: Implications for the Ethos of Science
Presented April 7, 2005 at the 2005 AAG meeting, Denver, CO
On the Ethos of Science
  A fundamental component of science is trust
  Society must trust that the outputs of science are accurate and unbiased
  Scientists must be able to trust the work of others in the field
  National Academies pamphlet “On Being a Scientist”
Concepts of the Scientific Method
  Results should be repeatable
  Results should be published in peer-reviewed journals
  Source materials should be explicitly acknowledged
  Data and information should be available so that the results may be verified by others
On the Importance of Data
  “… a scholar’s contribution is measured by the sum of the original data that he contributes. Hypotheses come and go but data remain.”
  - Santiago Ramon y Cajal, Advice to a Young Investigator (1897)
  Shared the Nobel Prize with Golgi for work on the structure of the nervous system
What is the Problem?
  Digital data has changed the paradigm
On the Importance of Data Preservation
  “Preservation without access is pointless; Access without preservation is impossible!”
  - heard in the halls of NSIDC, 2004
  There are no good business models; funding for preservation is difficult to find
Digital Data is Difficult to Preserve
  “digital objects require constant and perpetual maintenance, and they depend on elaborate systems of hardware, software, data and information models, and standards that are upgraded or replaced every few years”
  - NSF and Library of Congress, August 2003
  Previously, data was small in quantity and could physically be published in a journal or special report of some kind. The simple act of publishing the data was sufficient to ensure its preservation and availability for many years.
  Scientists often cannot provide the level of support needed to ensure preservation; data management has become its own specialty
Making Data Available
  Historically the scientist acquired and published the data directly
  Now many data sets come from large institutional programs
  Many datasets are so large that publishing them in a normal journal (even an electronic journal) is not feasible
  In many cases, there may be several versions of the same dataset
  It is unclear what role an investigator who uses such data has in making sure others can find and use the same data
Trusting Provided Data
  Three main components to ensuring the integrity of data:
    The data must demonstrate scientific integrity
    The data repository must be trustworthy
    The data must not have been altered since creation (or any alterations must be well described)
Suggestions for Institutional Data Providers
  Follow the OAIS Reference Model
  Implement a method to detect any corruption of the data, be it intentional or inadvertent (fixity from the OAIS model)
  Institute peer-review of datasets
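One common way to detect corruption is to record a checksum for each file when it enters the archive and recompute it later. The sketch below is only an illustration under stated assumptions: the choice of SHA-256, the in-memory manifest, and the function names (sha256_of, build_manifest, verify_manifest) are hypothetical and are not prescribed by the OAIS model or taken from the presentation.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths):
    """Record a checksum for every file at ingest time (hypothetical manifest format)."""
    return {str(path): sha256_of(path) for path in paths}


def verify_manifest(manifest):
    """Return the names of files that are missing or whose checksum no longer matches."""
    changed = []
    for name, recorded in manifest.items():
        path = Path(name)
        if not path.exists() or sha256_of(path) != recorded:
            changed.append(name)
    return changed
```

A checksum comparison catches inadvertent corruption directly. Catching intentional alteration additionally assumes the manifest itself is stored where the person altering the data cannot rewrite it.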
Suggestions for Investigators
  Publish small datasets
  Ensure large datasets are transferred to a data center
  Acknowledgement doesn’t work
    May not specify which data set was used
  Citation of an article published by the data provider that describes the data set and its collection
    May not exist in the peer-reviewed literature
    May only describe a portion of the data set
    May not be relevant to this new application of the data
    May not allow readers to acquire the data, and even if it does, the information may degrade over time
Suggestions for Investigators (continued)
  Use data citations to reference institutional datasets used
  What is a data citation?
    A mechanism to properly credit the creator of a data set
    A mechanism to credit the publisher of the data set
    A mechanism to allow your readers to find the data you used in your paper
Suggestions for Investigators (continued)
  What do they look like?
    Like a book or paper reference (see examples below)
      Hall, D.K., G.A. Riggs, and V.V. Salomonson. 2000, updated daily. MODIS/Terra Snow Cover 5-Min L2 Swath 500m V004, September - December 2003. Boulder, CO, USA: National Snow and Ice Data Center. Digital media.
      Armstrong, R., J. Francis, J. Key, J. Maslanik, T. Scambos, and A. Schweiger. 1998. Polar Pathfinder sampler: Combined AVHRR, SMMR-SSM/I, and TOVS time series and full-resolution samples. Compiled by S. Khalsa. Boulder, CO, U.S.A.: National Snow and Ice Data Center. CD-ROM.
    If Digital Object Identifiers (DOIs) are available for the data, they should be included in the citation.
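The parts of a data citation can be treated as a small record and assembled mechanically. The sketch below is an assumption-laden illustration, not a standard: the DataCitation class, its field names, and the "doi:" suffix are hypothetical, and the layout is simply modeled on the Hall et al. example above. It does not cover extra elements such as "Compiled by".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """Hypothetical record holding the elements seen in the example citations."""
    authors: str                  # creators of the data set
    year: str                     # release year, optionally with an update note
    title: str                    # data set title, including version if any
    place: str                    # location of the publisher
    publisher: str                # data center that distributes the data
    medium: str                   # e.g. "Digital media" or "CD-ROM"
    subset: Optional[str] = None  # portion of the data actually used
    doi: Optional[str] = None     # appended only when one is available

    def format(self) -> str:
        """Assemble the citation in the order used by the Hall et al. example."""
        title = self.title if self.subset is None else f"{self.title}, {self.subset}"
        text = (
            f"{self.authors}. {self.year}. {title}. "
            f"{self.place}: {self.publisher}. {self.medium}."
        )
        if self.doi:
            text += f" doi:{self.doi}"
        return text


modis = DataCitation(
    authors="Hall, D.K., G.A. Riggs, and V.V. Salomonson",
    year="2000, updated daily",
    title="MODIS/Terra Snow Cover 5-Min L2 Swath 500m V004",
    subset="September - December 2003",
    place="Boulder, CO, USA",
    publisher="National Snow and Ice Data Center",
    medium="Digital media",
)
print(modis.format())
```

Keeping the subset (here, the date range) as its own field reflects the point that a citation should let readers find the same data the author used, not just the data set as a whole.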
For More Information
  About NSIDC in general
    http://nsidc.org
    nsidc@nsidc.org
  About data management or archiving at NSIDC
    rduerr@nsidc.org
    (303) 735-0136