Data Management 101 for Earth Scientists: Managing Your Data
Robert Cook, Environmental Sciences Division, Oak Ridge National Laboratory

Roadmap: Introduction; Metadata; Best Practices for Data Management

Benefits of Good Data Management Practices
Short-term:
- Spend less time doing data management and more time doing research.
- Prepare and use your own data more easily.
- Collaborators can readily understand and use your data files.
Long-term (data publication):
- Scientists outside your project can find, understand, and use your data to address broad questions.
- You get credit for archived data products and for their use in other papers.
- Sponsors protect their investment.

Metadata
Information that lets you find, understand, and use the data: descriptors and documentation.

Metadata Needed to Understand Data
The details of the data: measurement date, sample ID, parameter name, and location.

Metadata Needed to Understand Data
[Diagram: a measurement record (date, sample ID, parameter name, location, method, QA flag, media, generator) linked to supporting definitions for samples (type, date, location, lab or field generator), methods, parameters, units, QA flags, locations (coordinates, elevation, type, depth), organizations (name, type, custodian, address), the record system, and GIS.]

The 20-Year Rule (NRC 1991)
The metadata accompanying a data set should be written for a user 20 years into the future: what does that investigator need to know to use the data?
Prepare the data and metadata/documentation for a user who is unfamiliar with the details of your project, methods, and observations.

Proper Curation Enables Data Reuse
[Figure: information content plotted against time through the stages Planning, Collection, Assure, Metadata and Documentation, and Archive, with a level marked "Sufficient for Sharing and Reuse".]

Fundamental Data Practices
1. Define the contents of your data files
2. Use consistent data organization
3. Use stable file formats (Curt Tilmes)
4. Assign descriptive file names
5. Perform basic quality assurance
6. Provide documentation
7. Protect your data

1. Define the contents of your data files
Content flows from the science plan (hypotheses) and is informed by the requirements of the final archive.
Keep a set of similar measurements together in one file: same investigator, methods, time basis, and instrument.
There are no hard and fast rules about the contents of each file.

1. Define the contents of your data files (cont.)
Define the parameters:
- Use commonly accepted parameter names and units (SI units).
- Be consistent.
- Explicitly state units.
- Use ISO formats (for example, ISO 8601 for dates and times).
[Figure: example parameter table, from Scholes (2005).]
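As a minimal sketch of these rules in Python, a parameter table can be kept alongside the data with explicit units for every column, and dates and times can be written in ISO 8601 form. The `Parameter` class, the table entries, and the helper names are hypothetical illustrations, not part of any standard.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone


@dataclass(frozen=True)
class Parameter:
    """One row of a parameter table: column name, units, and definition."""
    name: str
    units: str
    definition: str


# Hypothetical parameter table; the column names mirror the station example used later.
PARAMETERS = [
    Parameter("Station", "none", "Station identifier"),
    Parameter("Date", "YYYYMMDD", "Observation date, ISO 8601 basic format"),
    Parameter("Temp", "C", "Temperature"),
    Parameter("Precip", "mm", "Precipitation"),
]


def iso_date(d: date, basic: bool = False) -> str:
    """Format a date in ISO 8601 extended (YYYY-MM-DD) or basic (YYYYMMDD) form."""
    return d.strftime("%Y%m%d") if basic else d.isoformat()


def iso_timestamp(dt: datetime) -> str:
    """Convert a timezone-aware datetime to UTC and format it in ISO 8601."""
    if dt.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware so its offset is explicit")
    return dt.astimezone(timezone.utc).isoformat()


def parameter_table(params: list[Parameter]) -> str:
    """Render the parameter table as tab-separated text with a header row."""
    lines = ["Name\tUnits\tDefinition"]
    lines += [f"{p.name}\t{p.units}\t{p.definition}" for p in params]
    return "\n".join(lines)
```

Refusing naive timestamps is a deliberate choice: a time without an offset is exactly the kind of unstated unit the slide warns against.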

2. Use consistent data organization (one good approach)
Example layout: a header row (Station, Date, Temp, Precip), a units row (Date as YYYYMMDD, Temp in C, Precip in mm), and then one row per station and date (station HOGI in the example).
Note: a single designated missing value code is used throughout the data set.
Each row in a file represents a complete record, and the columns represent all the parameters that make up the record.

2. Use consistent data organization (a second good approach)
Example layout: columns Station, Date, Parameter, Value, and Unit. Example rows for station HOGI: Temp 12 C; Temp 14 C; Precip 0 mm; Precip 3 mm.
Parameter name, value, and units are placed in individual columns. This approach is used in relational databases.
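The two layouts hold the same information, so one can be generated from the other. The sketch below, assuming wide-format files are CSV with a single header row and a known missing value code, reshapes the first layout into the second. The function name, the example values, and the code -9999 are hypothetical, not taken from the slides.

```python
import csv
import io


def wide_to_long(rows, units, missing_code):
    """Reshape one-record-per-row data into Station/Date/Parameter/Value/Unit rows.

    rows: dicts with "Station", "Date", and one key per measured parameter.
    units: maps each measured parameter name to its unit string.
    Cells equal to missing_code are skipped instead of being copied as values.
    """
    long_rows = []
    for row in rows:
        for param, unit in units.items():
            value = row[param]
            if value == missing_code:
                continue
            long_rows.append({
                "Station": row["Station"],
                "Date": row["Date"],
                "Parameter": param,
                "Value": float(value),
                "Unit": unit,
            })
    return long_rows


# Hypothetical example data; -9999 is an assumed missing value code.
WIDE_CSV = """Station,Date,Temp,Precip
HOGI,20080101,12,0
HOGI,20080102,14,-9999
"""

rows = list(csv.DictReader(io.StringIO(WIDE_CSV)))
long_rows = wide_to_long(rows, {"Temp": "C", "Precip": "mm"}, missing_code="-9999")
```

Dropping missing cells keeps the long table free of sentinel values; an alternative is to keep those rows with an explicit missing flag, which preserves the fact that a measurement was attempted.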

4. Assign descriptive file names
File names should be unique, reflect their contents, use ASCII characters only, and avoid spaces.
Bad: Mydata.xls, 2001_data.csv, best version.txt
Better: bigfoot_agro_2000_gpp.tiff (project name: bigfoot; site name: agro; year: 2000; what was measured: gpp; file format: tiff)
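Naming rules like these are easy to enforce with a small helper. This sketch assumes names are built as project_site_year_measured.extension, as in the bigfoot example; `make_file_name` and `is_valid_file_name` are hypothetical helpers. A pattern check can only enforce ASCII characters and the absence of spaces: whether a name is unique and reflects its contents still needs human judgment, which is why Mydata.xls passes it.

```python
import re

# ASCII letters, digits, underscores, and hyphens, then one dot and an ASCII extension.
# Explicit [A-Za-z0-9] ranges are used because \w and \d also match non-ASCII characters.
_VALID_NAME = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z0-9]+$")


def is_valid_file_name(name: str) -> bool:
    """True if the name uses only safe ASCII characters and contains no spaces."""
    return bool(_VALID_NAME.match(name))


def make_file_name(project, site, year, measured, extension) -> str:
    """Join the descriptive parts with underscores: project_site_year_measured.ext."""
    stem = "_".join(str(part).strip().lower() for part in (project, site, year, measured))
    name = f"{stem}.{extension.strip().lower()}"
    if not is_valid_file_name(name):
        raise ValueError(f"not a safe descriptive file name: {name!r}")
    return name
```

For example, `make_file_name("bigfoot", "agro", 2000, "gpp", "tiff")` yields the slide's "better" name, while a part containing a space raises an error instead of silently producing a name like "best version.txt".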

[Comic strip. Source: PhD Comics]

5. Perform basic quality assurance
- Perform and review statistical summaries.
- Plot the data and assess errors.
- There is no better QA than analyzing the data.
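Statistical summaries can be computed with the Python standard library before any plotting. In this sketch, missing values are assumed to be represented as None after the file is read, and the plausible range passed to `range_violations` is a user-chosen assumption rather than a fixed rule; both function names are hypothetical.

```python
import statistics


def summarize(values):
    """Count, missing count, min, max, mean, and sample standard deviation of one column."""
    present = [v for v in values if v is not None]
    summary = {"n": len(present), "missing": len(values) - len(present)}
    if present:
        summary.update(min=min(present), max=max(present), mean=statistics.fmean(present))
    if len(present) >= 2:
        summary["stdev"] = statistics.stdev(present)
    return summary


def range_violations(values, low, high):
    """Indices of non-missing values outside the plausible range [low, high]."""
    return [i for i, v in enumerate(values) if v is not None and not low <= v <= high]
```

A maximum far from the mean, or a nonzero count from `range_violations`, is a prompt to plot that column and check the source records.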

5. Perform basic quality assurance (cont.)
Plot information to examine outliers.
[Figure: Model-Observation Intercomparison. Data from the North American Carbon Program Interim Synthesis (courtesy of Yaxing Wei, ORNL).]
The plot reveals a timing problem: Model X uses UTC, while all the others use Eastern Time.
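A mismatch like the Model X one disappears once every source is converted to a common time standard before comparison. This sketch assumes each source's fixed UTC offset is known (UTC-5 for Eastern Standard Time) and ignores daylight saving time; for DST-aware conversion, `zoneinfo.ZoneInfo("America/New_York")` can replace the fixed offset where the time zone database is available. The source names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sources and their assumed fixed offsets from UTC, in hours.
# Eastern Standard Time is UTC-5; daylight saving time is ignored in this sketch.
SOURCE_OFFSETS_HOURS = {"model_x": 0, "other_models": -5, "observations": -5}


def to_utc(local_naive: datetime, source: str) -> datetime:
    """Attach the source's fixed UTC offset to a naive timestamp and convert it to UTC."""
    tz = timezone(timedelta(hours=SOURCE_OFFSETS_HOURS[source]))
    return local_naive.replace(tzinfo=tz).astimezone(timezone.utc)


def align_series(series, source):
    """Convert a list of (naive timestamp, value) pairs from one source to UTC."""
    return [(to_utc(t, source), v) for t, v in series]
```

After alignment, a noon observation in Eastern Standard Time and a 17:00 UTC model value land on the same instant, so a diurnal peak no longer appears shifted by five hours.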

5. Perform basic quality assurance (cont.)
Plot information to examine outliers.
[Figure: Model-Observation Intercomparison. Data from the North American Carbon Program Interim Synthesis (courtesy of Yaxing Wei, ORNL).]
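Plots remain the primary tool here, but a numeric screen can point to which points deserve a closer look. This sketch swaps in Tukey's interquartile-range fences, a common rule of thumb that the slide does not specify; the multiplier k=1.5 is a conventional default, `flag_outliers` is a hypothetical name, and flagged values are candidates for review, not values to delete.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return indices of values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR].

    Needs at least two values. Quartiles come from statistics.quantiles with its
    default "exclusive" method; other quartile definitions can shift the fences slightly.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]
```

Because the fences come from quartiles rather than the mean, a single extreme value does not drag the thresholds toward itself.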

6. Provide documentation / metadata
- What does the data set describe?
- Why was the data set created?
- Who produced the data set, and who prepared the metadata?
- How was each parameter measured?
- What assumptions were used to create the data set?
- When and how frequently were the data collected?
- Where were the data collected, and at what spatial resolution? (Include the coordinate reference system.)
- How reliable are the data? What are the uncertainty and measurement accuracy? What problems remain in the data set?
- What is the use and distribution policy of the data set?
- How can someone get a copy of the data set?
- Provide any references to use of the data in publications.
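One way to make sure every question has an answer is to keep the metadata as a structured record and check it for blanks. The field names below are hypothetical shorthand for the questions on the slide, not elements of a metadata standard such as FGDC CSDGM or ISO 19115; references to publications are treated as optional because the slide asks for "any" references.

```python
import json

# Hypothetical field names, one per required documentation question.
REQUIRED_FIELDS = {
    "what": "What does the data set describe?",
    "why": "Why was the data set created?",
    "producer": "Who produced the data set?",
    "metadata_author": "Who prepared the metadata?",
    "methods": "How was each parameter measured?",
    "assumptions": "What assumptions were used to create the data set?",
    "temporal_coverage": "When and how frequently were the data collected?",
    "spatial_coverage": "Where were the data collected, at what resolution, in what coordinate reference system?",
    "quality": "How reliable are the data (uncertainty, accuracy, remaining problems)?",
    "use_policy": "What is the use and distribution policy of the data set?",
    "access": "How can someone get a copy of the data set?",
}


def missing_fields(record: dict) -> list:
    """Return the questions whose answers are absent or blank in a metadata record."""
    return [q for key, q in REQUIRED_FIELDS.items() if not str(record.get(key, "")).strip()]


def to_json(record: dict) -> str:
    """Serialize the record with sorted keys so successive versions diff cleanly."""
    return json.dumps(record, indent=2, sort_keys=True)
```

Running `missing_fields` before submitting a data set to an archive turns the slide's checklist into a repeatable check.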

7. Protect your data
- Create backup copies often.
- Ideally keep three copies: the original, one on-site (external), and one off-site.
- Base the backup frequency on need and risk.
- Know that you can recover from a data loss: periodically test your ability to restore information.
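Testing a restore can be as simple as comparing checksums of the copy against those of the originals. This sketch copies a directory tree, records SHA-256 digests in a manifest, and later reports files in a copy that are missing or altered. The function names and directory layout are hypothetical, and moving a copy off-site is outside its scope.

```python
import hashlib
import shutil
from pathlib import Path


def sha256sum(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(src_dir: Path, dest_dir: Path) -> dict:
    """Copy every file under src_dir into dest_dir; return {relative path: checksum}."""
    manifest = {}
    for src in sorted(p for p in src_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(src_dir)
        dest = dest_dir / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 also preserves modification times
        manifest[rel.as_posix()] = sha256sum(src)
    return manifest


def verify_restore(copy_dir: Path, manifest: dict) -> list:
    """Return relative paths that are missing from copy_dir or whose checksum differs."""
    problems = []
    for rel, expected in manifest.items():
        candidate = copy_dir / rel
        if not candidate.is_file() or sha256sum(candidate) != expected:
            problems.append(rel)
    return problems
```

Storing the manifest alongside each copy lets the same check run against the on-site and off-site copies.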

DataONE Resources (dataone.org)
- Data Management Plans: links to the DMP Tool
- Best Practices
- Tools Database
- "Everything you wanted to know about data management, but were afraid to ask" (Carly Strasser et al.)

Best Practices: Conclusions
Data management is important in today's science. Well-organized data:
- enable researchers to work more efficiently
- can be shared easily by collaborators
- can potentially be reused in ways not imagined when the data were originally collected

Questions?

References and Resources
Ball, C. A., G. Sherlock, and A. Brazma. Funding high-throughput data sharing. Nature Biotechnology 22. doi: /nbt
Borer, E. T., E. W. Seabloom, M. B. Jones, and M. Schildhauer. Some Simple Guidelines for Effective Data Management. Bulletin of the Ecological Society of America 90(2).
Christensen, S. W. and L. A. Hook. NARSTO Data and Metadata Reporting Guidance. Provides reference tables of chemical, physical, and metadata variable names for atmospheric measurements. Available online at:
Cook, R. B., R. J. Olson, P. Kanciruk, and L. A. Hook. Best Practices for Preparing Ecological Data Sets to Share and Archive. Bulletin of the Ecological Society of America 82(2), April.
Hook, L. A., T. W. Beaty, S. Santhana-Vannan, L. Baskaran, and R. B. Cook. Best Practices for Preparing Environmental Data Sets to Share and Archive. June.
Kanciruk, P., R. J. Olson, and R. A. McCord. Quality Control in Research Databases: The US Environmental Protection Agency National Surface Water Survey Experience. In: W. K. Michener (ed.), Research Data Management in the Ecological Sciences. The Belle W. Baruch Library in Marine Science, No. 16.
Michener, W. K. Meta-information concepts for ecological data management. Ecological Informatics 1:3-7.
Michener, W. K. and J. W. Brunt (eds.). Ecological Data: Design, Management and Processing. Methods in Ecology. Blackwell Science. 180 p.
Michener, W. K., J. W. Brunt, J. Helly, T. B. Kirchner, and S. G. Stafford. Non-Geospatial Metadata for Ecology. Ecological Applications 7.
National Research Council (NRC). 1991. Solving the Global Change Puzzle: A U.S. Strategy for Managing Data and Information. Report by the Committee on Geophysical Data of the National Research Council Commission on Geosciences, Environment and Resources. National Academy Press, Washington, D.C.
U.S. Environmental Protection Agency (EPA). Substance Registry System (SRS). SRS provides information on substances and organisms and how they are represented in the EPA information systems. Available online at:
USGS. Metadata in plain language. Available online at: