U.S. Department of the Interior, U.S. Geological Survey
Best Practices for Preparing Science Data to Share

Objectives
· Outline 9 fundamental data management habits for preparing data to share
· For each data management habit, list the associated best practices

Problem Statement
Good data management is crucial to achieving better and more streamlined data integration. There tends to be an underlying assumption that most science data are available and ready for integration and reuse. Unfortunately, this is not the reality for most data. When scientists find data to integrate with other data, a common problem is that the data sets are incompatible, and scientists can spend a lot of time transforming data to fit the needs of their project.

Benefits of Good Data Management Practices
· Short-term benefits
  · Spend less time doing data management and more time doing research
  · Easier to prepare and use data
  · Collaborators can readily understand and use data files
· Long-term benefits
  · Scientists outside your project can find, understand, and use your data to address broad questions
  · You get credit for preserving data products and for their use in other papers
  · Sponsors protect their investment

Fundamental Practices
1. Define the contents of your data files
2. Use consistent data organization
3. Use stable file formats
4. Assign descriptive file names
5. Preserve processing information
6. Perform basic quality assurance
7. Provide documentation
8. Protect your data
9. Preserve your data

Fundamental Practice #1: Define the contents of your data files
· Use commonly accepted parameter names, descriptions, and units
· Be consistent
· Explicitly state units
· Choose a format for each parameter, explain the format in the metadata, and use that format throughout the file
· Use ISO formats. Examples:
  · Dates: use yyyymmdd; January 2, 1999 is 19990102
  · Times: use 24-hour notation (13:30 hrs instead of 1:30 p.m., and 04:30 instead of 4:30 a.m.)
[Figure: Example of a Parameter Table (Scholes, 2005)]
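A minimal R sketch of the conventions above, assuming input dates arrive as yyyy-mm-dd strings and times are recorded in one explicitly stated time zone (UTC here). The parameter table rows and the helpers `to_yyyymmdd()` and `to_24h()` are hypothetical, for illustration only.

```r
# Hypothetical parameter table: one row per column in the data file,
# stating each parameter's description, units, and format.
param_table <- data.frame(
  parameter   = c("Station", "Date", "Time", "Temp", "Precip"),
  description = c("Station identifier", "Observation date",
                  "Observation time", "Air temperature", "Precipitation"),
  units       = c("none", "yyyymmdd", "hh:mm (24-hour, UTC)", "C", "mm"),
  stringsAsFactors = FALSE
)

# Format a date in ISO 8601 basic form (yyyymmdd).
to_yyyymmdd <- function(d) format(as.Date(d), "%Y%m%d")

# Format a date-time as 24-hour hh:mm in an explicitly stated time zone.
to_24h <- function(dt, tz = "UTC") format(as.POSIXct(dt, tz = tz), "%H:%M")

to_yyyymmdd("1999-01-02")       # "19990102"
to_24h("1999-01-02 13:30:00")   # "13:30"
to_24h("1999-01-02 04:30:00")   # "04:30"
```

Stating the time zone in both the parameter table and the code keeps the file self-describing when it is merged with data recorded elsewhere.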

Fundamental Practice #2: Use consistent data organization

Station | Date | Temp | Precip
Units | YYYYMMDD | C | mm
HOGI | … | … | …
HOGI | … | … | …
HOGI | … | … | …
Example 1: Each row in a file represents a complete record, and the columns represent all the parameters that make up the record.

Station | Date | Parameter | Value | Unit
HOGI | … | Temp | 12 | C
HOGI | … | Temp | 14 | C
HOGI | … | Precip | 0 | mm
HOGI | … | Precip | 3 | mm
Example 2: The parameter name, value, and units are placed in individual rows. This approach is used in relational databases.

· Don't change or rearrange columns
· Include header rows (the first row contains the file name, data set title, author, date, and companion file names)
· Column headings should describe the content of each column
· Include one row for parameter names and one for parameter units
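To illustrate the two layouts, the sketch below converts a small wide table (Example 1 style) into the long, one-value-per-row form of Example 2. It is a minimal base R sketch; the dates are hypothetical placeholders, and the helper `to_long()` is illustrative, not part of any package.

```r
# Wide layout (Example 1): each row is a complete record.
# The dates are hypothetical placeholders; the values match Example 2.
wide <- data.frame(
  Station = c("HOGI", "HOGI"),
  Date    = c("19990102", "19990103"),
  Temp    = c(12, 14),
  Precip  = c(0, 3),
  stringsAsFactors = FALSE
)
units <- c(Temp = "C", Precip = "mm")

# Long layout (Example 2): one parameter value per row, with its unit.
to_long <- function(wide, units) {
  rows <- lapply(names(units), function(p) {
    data.frame(Station = wide$Station, Date = wide$Date,
               Parameter = p, Value = wide[[p]], Unit = units[[p]],
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

long <- to_long(wide, units)
print(long)
```

Base R's `reshape()` can perform the same conversion; the explicit loop is used here only to keep each step visible.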

[Figure: Example of Poor Data Practice for Collaboration and Sharing]

[Figure: Example of Good Data Practice for Collaboration and Sharing]

Fundamental Practice #3: Use stable file formats
· Use text (ASCII) file formats for tabular data, for example .txt or .csv (comma-separated values)
· Suggested geospatial file formats:
  · Raster formats: GeoTIFF; netCDF (CF conventions preferred); HDF; ASCII (plain-text gridded format with external projection information)
  · Vector formats: Shapefile; KML/GML
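As a sketch of the tabular recommendation, the R code below writes a small, hypothetical table to a plain-text CSV file, checks that the file contains only ASCII characters, and reads it back. Reading the date column as text (via `colClasses`) is an assumption about how you want yyyymmdd dates handled; without it, `read.csv()` would parse them as numbers.

```r
# Hypothetical observation table.
obs <- data.frame(Station = "HOGI", Date = "19990102", Temp = 12.5,
                  stringsAsFactors = FALSE)

# Write plain-text CSV with a header row and no R row names.
path <- tempfile(fileext = ".csv")
write.csv(obs, path, row.names = FALSE)

# Confirm every character in the file is 7-bit ASCII (code points below 128).
is_ascii <- all(utf8ToInt(paste(readLines(path), collapse = "")) < 128)

# Read it back; keep Date as text so "19990102" is not turned into a number.
back <- read.csv(path, colClasses = c(Date = "character"))
```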

Fundamental Practice #4: Assign descriptive file names
File names should:
· Be unique
· Reflect their contents
· Use ASCII characters only
· Use lower-case letters, numbers, dashes, and underscores
· Avoid spaces and special characters
Example: bigfoot_agro_2000_gpp.tiff
· bigfoot: project name
· agro: site name
· 2000: year
· gpp: what was measured
· .tiff: file format
Make sure your file system is logical and efficient.
[Figure: example folder hierarchy with folders such as Biodiversity, Lake, Experiments, Field Work, and Grasslands]
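The naming rules above can be checked automatically. This is a minimal sketch: `make_file_name()` and `is_valid_file_name()` are hypothetical helpers, the project_site_year_measurement pattern is taken from the bigfoot example, and uniqueness is not checked here.

```r
# Hypothetical helper: build a file name from the parts in the example
# (project_site_year_measurement.extension), forced to lower case.
make_file_name <- function(project, site, year, measured, ext) {
  tolower(paste0(paste(project, site, year, measured, sep = "_"), ".", ext))
}

# Hypothetical check against the naming rules: only lower-case ASCII letters,
# digits, dashes, and underscores, then one dot and an extension.
# (Uniqueness must be checked separately, e.g. against a directory listing.)
is_valid_file_name <- function(name) {
  grepl("^[a-z0-9_-]+\\.[a-z0-9]+$", name, perl = TRUE)
}

make_file_name("bigfoot", "agro", 2000, "gpp", "tiff")  # "bigfoot_agro_2000_gpp.tiff"
is_valid_file_name("bigfoot_agro_2000_gpp.tiff")         # TRUE
is_valid_file_name("Big Foot GPP (final).tiff")          # FALSE
```

`perl = TRUE` is used so the character ranges are matched by code point rather than by locale collation order.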

Fundamental Practice #5: Preserve processing information
Keep raw data raw:
· Do not include transformations, interpolations, etc. in the raw file
· Consider making your raw data files read-only to ensure that no changes are made
[Figure: raw data file Giles_zoopCount_Diel_2001_2003.csv, with columns TAX, COUNT, and TEMPC]
Processing script:

    ### Giles_zoop_temp_regress_4jun08.r
    ### Load data
    Giles <- read.csv("Giles_zoopCount_Diel_2001_2003.csv")
    ### Look at the data
    Giles
    plot(COUNT ~ TEMPC, data = Giles)
    ### Log-transform the dependent variable: log(COUNT + 1)
    Giles$Lcount <- log(Giles$COUNT + 1)
    ### Plot the log-transformed y against x
    plot(Lcount ~ TEMPC, data = Giles)

When processing data:
· Use a scripting language (e.g., R, SAS, MATLAB)
· Processing scripts are a record of the processing done
· Scripts can be revised and rerun
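The separation between raw and processed data can be sketched in R as follows. This assumes a Unix-like file system for the permission bits (on Windows, `Sys.chmod()` only toggles the read-only flag); the file contents are invented stand-ins for Giles_zoopCount_Diel_2001_2003.csv, and the `_derived.csv` suffix is a hypothetical naming convention.

```r
# Hypothetical stand-in for the raw file (columns follow the slide; values invented).
raw_path <- tempfile(pattern = "Giles_zoopCount_Diel_2001_2003_", fileext = ".csv")
write.csv(data.frame(TAX = c("C", "C", "C"), COUNT = c(0, 4, 9),
                     TEMPC = c(11.9, 12.4, 13.0)),
          raw_path, row.names = FALSE)

# Make the raw file read-only (mode 0444: read permission only, no write bits).
Sys.chmod(raw_path, mode = "0444")

# Process the data in memory and write derived values to a separate file,
# so the raw file is never overwritten.
Giles <- read.csv(raw_path)
Giles$Lcount <- log(Giles$COUNT + 1)
derived_path <- sub("\\.csv$", "_derived.csv", raw_path)
write.csv(Giles, derived_path, row.names = FALSE)
```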

Fundamental Practice #6: Perform basic quality assurance
· Ensure that data are delimited and line up in the proper columns
· Check for missing values in key parameters
· Scan for impossible and anomalous values
· Perform and review statistical summaries
· Map location data and assess any errors
Example: Model X uses UTC time, while all others use Eastern Time
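A minimal sketch of three of the checks above (missing values in key parameters, impossible values, and a statistical summary), using hypothetical data. The plausible ranges in `valid_ranges` are assumptions for illustration; real limits should come from knowledge of the site and instruments.

```r
# Hypothetical observations with one missing and two impossible values.
obs <- data.frame(
  Station = c("HOGI", "HOGI", "HOGI", "HOGI"),
  Temp    = c(12, 14, NA, 85),   # C
  Precip  = c(0, 3, 1, -2),      # mm
  stringsAsFactors = FALSE
)

# Assumed plausible ranges (min, max) for each numeric parameter.
valid_ranges <- list(Temp = c(-60, 60), Precip = c(0, 500))

qa_report <- function(df, key_cols, ranges) {
  # Count missing values in key parameters.
  missing <- vapply(df[key_cols], function(x) sum(is.na(x)), integer(1))
  # Count non-missing values outside the plausible range.
  out_of_range <- vapply(names(ranges), function(col) {
    x <- df[[col]]
    sum(!is.na(x) & (x < ranges[[col]][1] | x > ranges[[col]][2]))
  }, integer(1))
  list(missing = missing, out_of_range = out_of_range,
       summary = summary(df[names(ranges)]))
}

qa <- qa_report(obs, key_cols = c("Station", "Temp"), ranges = valid_ranges)
qa$missing       # Station 0, Temp 1
qa$out_of_range  # Temp 1, Precip 1
```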

Fundamental Practice #7: Provide documentation / metadata that follows standards
Who
· Who collected the data?
· Who processed the data?
· Who wrote the metadata?
· Who to contact for questions?
· Who to contact to order?
· Who owns the data?
Where
· Where were the data collected?
· Where were the data processed?
· Where are the data located?
When
· When were the data collected?
· When were the data processed?
How
· How were the data collected?
· How were the data processed?
· How do I access the data?
· How do I order the data?
· How much do the data cost?
· How was the quality assessed?
Why
· Why were the data collected?
What
· What are the data about?
· What project were they collected under?
· What are the constraints on their use?
· What is the quality?
· What are appropriate uses?
· What parameters were measured?
· What format are the data in?
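One way to keep these questions from being skipped is to check a draft record for unanswered items before writing the formal metadata. The sketch below is hypothetical: the field names are illustrative and do not correspond to the element names of FGDC, ISO 19115, or any other standard, and the values are invented.

```r
# Hypothetical required fields, loosely mirroring the Who/Where/When/How/Why/What questions.
required_fields <- c("collected_by", "contact", "owner", "location",
                     "collection_dates", "methods", "purpose",
                     "parameters", "format", "use_constraints")

# Hypothetical draft record (invented values).
metadata <- list(
  collected_by     = "Field crew A",
  contact          = "data-manager@example.org",
  location         = "HOGI station",
  collection_dates = "19990102-19991231",
  parameters       = "Temp (C), Precip (mm)",
  format           = "CSV (comma-separated values)"
)

# Return the required fields that are absent or empty.
missing_fields <- function(md, required) {
  answered <- vapply(required, function(f) {
    !is.null(md[[f]]) && nzchar(md[[f]])
  }, logical(1))
  required[!answered]
}

missing_fields(metadata, required_fields)
# "owner" "methods" "purpose" "use_constraints"
```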

Fundamental Practice #8: Protect your data
· Create backup copies often
  · Ideally keep three copies: the original, one on-site (external), and one off-site
  · Base the backup frequency on need and risk
· Ensure that you can recover from a data loss
  · Periodically test your ability to restore information
· Ensure that file transfers are done without error
  · Compare checksums before and after transfers
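The checksum comparison can be sketched with `tools::md5sum()`, which ships with base R. Here a local `file.copy()` stands in for the real transfer and the file contents are hypothetical; MD5 is adequate for detecting accidental corruption in transit, though another digest could be substituted.

```r
# Hypothetical source file standing in for a data file to be transferred.
src <- tempfile(fileext = ".csv")
write.csv(data.frame(Station = "HOGI", Temp = 12), src, row.names = FALSE)

# Simulate the transfer (e.g., to an external drive) with a local copy.
dest <- tempfile(fileext = ".csv")
stopifnot(file.copy(src, dest))

# Compare MD5 checksums of the source and the copy.
same_checksum <- function(a, b) {
  unname(tools::md5sum(a)) == unname(tools::md5sum(b))
}

same_checksum(src, dest)  # TRUE when the copy is byte-for-byte identical
```

Recording the source checksums alongside the backup also makes it possible to verify a restore later, which supports the "test your ability to restore" practice.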

Fundamental Practice #9: Preserve your data
What should you preserve from the research project?
· Well-structured data files, with variables, units, and values well defined
· A metadata record describing the data, structured using Federal standards
· Additional information that provides context:
  · Materials from project wikis or websites
  · Files describing the project, protocols, or field sites (including photos)
  · Publication(s)

Key Points
· Data management is critical in today's science
· Well-organized and documented data:
  · Enable researchers to work more efficiently
  · Can be shared easily by collaborators
  · Can potentially be reused in ways not imagined when the data were originally collected
· Include data management in your research workflow. Make it a habit to manage your data well.

Resources
· USGS Data Management Website
· DataONE Data Management Primer
· Best Practices for Preparing Environmental Data Sets to Share and Archive (Hook et al., 2010)
· ORNL DAAC Data Management for Data Providers