U.S. Department of the Interior U.S. Geological Survey Data Management Training Modules: Best Practices for Preparing Science Data to Share.

Slides:



Advertisements
Similar presentations
Organising and Documenting Data Stuart Macdonald EDINA & Data Library DIY Research Data Management Training Kit for Librarians.
Advertisements

October 5, 2007 Weed Watching Tips for Citizen Scientists Monitoring Invasive Plant Species Citizen Based Monitoring Conference Devil's Head Resort and.
V Alyssa Rosemartin 1, Lee Marsh 1, Ellen Denny 1, Bruce Wilson USA National Phenology Network, Tucson, AZ; 2 - Oak Ridge National Laboratory, Oak.
John Porter Why this presentation? The forms data take for analysis are often different than the forms data take for archival storage Spreadsheets are.
U.S. Department of the Interior U.S. Geological Survey Tom Armstrong Senior Advisor for Global Change Programs U.S. Geological Survey
Brent Frakes, Functional Analyst.  Need to discover, share, have access to large holdings of data and information to address rapid climate change  e.
Advantages of Monitoring Vegetation Restoration With the Carolina Vegetation Survey Protocol M. Forbes Boyle, Robert K. Peet, Thomas R. Wentworth, and.
Modules, Hierarchy Charts, and Documentation
Lineage February 13, 2006 Geog 458: Map Sources and Errors.
Publishing your paper. Learning About You What journals do you have access to? Which do you read regularly? Which journals do you aspire to publish in.
U.S. Department of the Interior U.S. Geological Survey CDI Webinar Sept. 5, 2012 Heather Henkel and Viv Hutchison September 5, 2012 A USGS Website and.
Fundamental Practices for Preparing Data Sets Bob Cook Environmental Sciences Division Oak Ridge National Laboratory 5 th NACP Principal Investigator’s.
Elements of a Data Management Plan Alison Boyer Environmental Sciences Division Oak Ridge National Laboratory.
Elements of a Data Management Plan
IPUMS to IHSN: Leveraging structured metadata for discovering multi-national census and survey data Wendy L. Thomas 4 th Conference of the European Survey.
Best Practices for Preserving Data Bob Cook Environmental Sciences Division Oak Ridge National Laboratory.
U.S. Department of the Interior U.S. Geological Survey USGS Data Management Training Modules: Value of Data Management “Data is a precious thing and will.
U.S. Department of the Interior U.S. Geological Survey Data Management Training Modules: Value of Data Management “Data is a precious thing and will last.
ORGANIZING AND STRUCTURING DATA FOR DIGITAL PROJECTS Suzanne Huffman Digital Resources Librarian Simpson Library.
From Best Practices for Preserving Data by Bob Cook, Environmental Sciences Division Oak Ridge National Laboratory Module 9.
Fundamental Practices for Preparing Data Sets Robert Cook ORNL Distributed Active Archive Center Environmental Sciences Division Oak Ridge National Laboratory.
U.S. Department of the Interior U.S. Geological Survey Best Practices for Preparing Science Data to Share.
FCM Quality of Life Reporting System Metadata By: Acacia Consulting and Research June 2002.
U.S. Department of the Interior U.S. Geological Survey Planning for Data Management Creating data management plans for your project.
AON Data Questionnaire Results 21 Respondents Last Updated 27 March 2007 First AON PI Meeting Scot Loehrer, Jim Moore.
Recordkeeping for Good Governance Toolkit Digital Recordkeeping Guidance Funafuti, Tuvalu – June 2013.
Best Practices for Preparing Data Sets Non-CO2 Synthesis Workshop Boulder, Colorado October 2008 Compiled by: A. Dayalu, Harvard University Adapted.
NEPTUNE Canada Workshop Oceans 2.0 Project Environment NEPTUNE Canada DMAS Team Victoria, BC February 16, 2009.
Elements of a Data Management Plan Bill Michener University Libraries University of New Mexico Data Management Practices for.
Introduction to Geospatial Metadata – ISO 191** Metadata National Centers for Environmental Information (NCEI)
Managing Your Data: Backing Up Your Data Robert Cook Oak Ridge National Laboratory Version 1.0 Review Date.
Preserving the Scientific Record: Preserving a Record of Environmental Change Matthew Mayernik National Center for Atmospheric Research Version 1.0 [Review.
An Introduction to Metadata Tammy Walker Beaty Environmental Sciences Division Oak Ridge National Laboratory Oak Ridge, TN Data Management.
Pipelines and Scientific Workflows with Ptolemy II Deana Pennington University of New Mexico LTER Network Office Shawn Bowers UCSD San Diego Supercomputer.
CC&E Best Data Management Practices, April 19, 2015 Please take the Workshop Survey 1.
Introduction to Geospatial Metadata – ISO 191** Metadata National Coastal Data Development Center A division of the National Oceanographic Data Center.
Fundamental Practices for Preparing Data Sets Bob Cook Environmental Sciences Division Oak Ridge National Laboratory.
Discovering Computers Fundamentals Fifth Edition Chapter 9 Database Management.
U.S. Department of the Interior U.S. Geological Survey Using Advanced Satellite Products to Better Understand I&M Data within the Context of the Larger.
Module 6. Data Management Plans  Definitions ◦ Quality assurance ◦ Quality control ◦ Data contamination ◦ Error Types ◦ Error Handling  QA/QC best practices.
Data Management Practices for Early Career Scientists: Closing Robert Cook Environmental Sciences Division Oak Ridge National Laboratory Oak Ridge, TN.
Reasons our analysis needs to be reproducible… Efficiency on a given topic or study Efficiency across groups (especially critical with new “machine services”
Introduction to Geospatial Metadata – FGDC CSDGM National Coastal Data Development Center A division of the National Oceanographic Data Center Please .
Managing Your Data: Backing Up Your Data Robert Cook Oak Ridge National Laboratory Section: Local Data Management Version 1.0 October 2012.
Microsoft ® Office Access ™ 2007 Training Datasheets I: Create a table by entering data ICT Staff Development presents:
WK 13 - How to Prepare Ecological Data Sets for Effective Analysis and Sharing 2:00 PM-5:00 PM August 1 st, 2010.
Fundamental Practices for Preparing Data Sets Bob Cook Environmental Sciences Division Oak Ridge National Laboratory.
Managing the Impacts of Change on Archiving Research Data A Presentation for “International Workshop on Strategies for Preservation of and Open Access.
Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences Division Oak Ridge National Laboratory.
U.S. Department of the Interior U.S. Geological Survey CDI Webinar Series 2013 Data Management at the National Climate Change and Wildlife Science Center.
Managing Your Data: Assign Descriptive File Names Robert Cook Oak Ridge National Laboratory Section: Local Data Management Version 1.0 October 2012.
DataONE: Preserving Data and Enabling Data-Intensive Biological and Environmental Research Bob Cook Environmental Sciences Division Oak Ridge National.
IDigBio is funded by a grant from the National Science Foundation’s Advancing Digitization of Biodiversity Collections Program (Cooperative Agreement EF ).
Cyberinfrastructure to promote Model - Data Integration Robert Cook, Yaxing Wei, and Suresh S. Vannan Oak Ridge National Laboratory Presented at the Model-Data.
Digital Preservation 8/7/2012 Karen Estlund Head, Digital Library Services
The US Long Term Ecological Research (LTER) Network: Site and Network Level Information Management Kristin Vanderbilt Department of Biology University.
Metadata Training for Gulf Restoration Partners Module 1 – Introduction to Metadata and Metadata Standards.
Data Organization Quality Assurance and Transformations.
U.S. Department of the Interior U.S. Geological Survey August 24-25, 2011 Data Management Best Practices: FY11 Report.
SEDAC Long-Term Archive Development Robert R. Downs Socioeconomic Data and Applications Center Center for International Earth Science Information Network.
Data Management Practices for Early Career Scientists: Closing Robert Cook Environmental Sciences Division Oak Ridge National Laboratory Oak Ridge, TN.
NVS New Zealand National Vegetation Survey. What is NVS? NVS (National Vegetation Survey) – New Zealand’s largest archive facility for plot-based vegetation.
Building Preservation Environments with Data Grid Technology Reagan W. Moore Presenter: Praveen Namburi.
Data Management: Data Processing Types of Data Processing at USGS There are several ways to classify Data Processing activities at USGS, and here are some.
NASA Earth Science Data Stewardship
Slides Template for Module 3 Contextual details needed to make data meaningful to others CC BY-NC.
Standardization Promotes Biogeochemical Data Management and Use in Multidisciplinary Environmental Research Yaxing Wei, Suresh Vannan, Robert B. Cook,
Research Data Management: Store and Analyze
Fundamental Practices for Preparing Data Sets
Fundamental Science Practices (FSP) of the U.S. Geological Survey
Presentation transcript:

U.S. Department of the Interior U.S. Geological Survey Data Management Training Modules: Best Practices for Preparing Science Data to Share

Module Objectives · Describe the importance of maintaining well- managed science data. · Outline 9 fundamental data management habits for preparing data to share. · For each data management habit, list associated best practices.

Problem Statement · It is important to understand that good data management is crucial to achieving better and more streamlined data integration. There tends to be an underlying assumption that a majority of science data is available and poised for integration and re- use. Unfortunately, this is not the reality for most data. One problem scientists encounter when they discover data to integrate with other data, is the incompatibility of the data. Scientists can spend a lot of time trying to transform data to fit the needs of their project.

Importance of well-managed data · Why manage our data? · Aids in the reproducibility of science. · Creates efficiencies in how science is done. · Makes sharing across groups more efficient. · Creates improved provenance in the science iteration process. · Aids responses to legal requirements such as Freedom of Information Act (FOIA) and Information Quality Act (IQA). · Supports scientific review and integrity.

Example: National Tamarisk Habitat Map: 2006 This national map of habitat suitability for tamarisk was developed to aid land managers in early detection of this invasive plant species. The map produced by combining MODIS satellite products (vegetation indices and land cover type) with field observations to predict suitable habitat for tamarisk. Morisette, J.T., C. S. Jernevich, A. Ullah, W. Cai, J.A. Pedelty, J. Gentle, T.J.Stohlgren, J.L. Schnase, A tamarisk habitat suitability map for the continental US., Frontiers in Ecology and the Environment, Volume 4, Issue 1 (February 2006) pp. 11–17

Example: National Tamarisk Habitat Map: 2013 “Perhaps one of the most important implications of this work is first proof that invasive species habitat suitability mapping should be seen as an ongoing and iterative process…” “Careful and continued iteration between the model development and model-use communities can help ensure the most prudent use of modeling tools." Crall et al., Using habitat suitability models to target invasive plant species surveys. Ecological Applications, 23(1), 2013, pp. 60–72

Example: Python Maps Rodda GH, Jarnevich CS, Reed RN (2011) Challenges in Identifying Sites Climatically Matched to the Native Ranges of Animal Invaders. PLoS ONE 6(2):e doi: /journal.pone Jarnevich, C.S., G.H. Rodda, and R.N. Reed Data for giant constrictors—Biological management profiles and an establishment risk assessment for nine large species of pythons, anacondas, and the boa constrictor: U.S. Geological Survey Data Series 579.

Example: Python Maps Rodda GH, Jarnevich CS, Reed RN (2011) Challenges in Identifying Sites Climatically Matched to the Native Ranges of Animal Invaders. PLoS ONE 6(2):e doi: /journal.pone Jarnevich, C.S., G.H. Rodda, and R.N. Reed Data for giant constrictors—Biological management profiles and an establishment risk assessment for nine large species of pythons, anacondas, and the boa constrictor: U.S. Geological Survey Data Series 579.

Importance of well-managed data · Down the road, you may need the documentation for your data to be readily available. · It’s not just well defined data, but also the analysis that needs to be documented. · You may have to justify the conclusions and defend the results.

Benefits of Good Data Management Practices · Short-term benefits · Spend less time doing data management and more time doing research. · Easier to prepare and use data. · Collaborators can readily understand and use data files. · Long-term benefits · Scientists outside your project can find, understand, and use your data to address broad questions. · You get credit for preserving data products and for their use in other papers. · Sponsors protect their investment.

Fundamental Practices 1.Define the contents of your data files 2.Use consistent data organization 3.Use stable file formats 4.Assign descriptive file names 5.Preserve processing information 6.Perform basic quality assurance 7.Provide documentation 8.Protect your data 9.Preserve your data

Fundamental Practice #1 Define the contents of your data files: Use commonly accepted parameter names, descriptions, and units. Be consistent. Explicitly state units. Choose a format for each parameter, explain the format in the metadata, and use that format throughout the file. Use ISO formats. Example of a Parameter Table Scholes (2005) Examples: Use yyyymmdd; January 2, 1999 is Use 24-hour notation (13:30 hrs instead of 1:30 p.m.).

Fundamental Practice #2 Use consistent data organization: StationDateTempPrecip UnitsYYYYMM DD Cmm HOGI HOGI HOGI * Example 1: Each row in a file represents a complete record, and the columns represent all the parameters that make up the record. StationDateParameterValueUnit HOGI Temp12C HOGI Temp14C HOGI Precip0mm HOGI Precip3mm Example 2: Parameter name, value, and units are placed in individual rows. This approach is used in relational databases. · Don’t change or re-arrange columns · Include header rows (first row contains file name, data set title, author, date, and companion file names). · Column headings should describe content of each column. · include one row for parameter names and one for parameter units.

Example of Poor Data Practice for Collaboration and Sharing Courtesy of S. Hampton, National Center for Ecological Analysis and Synthesis

Example of Good Data Practice for Collaboration and Sharing Courtesy of Oak Ridge National Laboratory

Fundamental Practice #3 Use stable file formats: Use text (ASCII) file formats for tabular data. Examples:.txt or.csv (comma-separated values) Suggested geospatial file formats: Raster formats Geotiff netCDF (with CF convention preferred) HDF ASCII (plain text file gridded format with external projection information) Vector Shapefile KML/GML Aranibar et al. (2005) Ackerman et al. (2012)

Fundamental Practice #4 Assign descriptive file names File names should: Be unique Reflect contents Use ASCII characters only Use lower case letters, numbers, dashes, and underscores Avoid spaces and special characters bigfoot_agro_2000_gpp.tiff Site name Year What was measured Project Name File Format Biodiversity Lake Experiments Field Work Grasslands Make sure your file system is logical and efficient Aranibar and Macko (2005)

Fundamental Practice #5 Preserve processing information Keep raw data raw: Do not Include transformations, interpolations, etc in raw file. Consider making your raw data “read only” to ensure no changes. Giles_zoopCount_Diel_2001_2003.csv TAXCOUNTTEMPC C F M F011.9 C F M N … Raw Data File ### Giles_zoop_temp_regress_4jun08.r ### Load data Giles<-read.csv("Giles_zoopCount_Diel_2001_2003.csv") ### Look at the data Giles plot(COUNT~ TEMPC, data=Giles) ### Log Transform the independent variable (x+1) Giles$Lcount<-log(Giles$COUNT+1) ### Plot the log-transformed y against x plot(Lcount ~ TEMPC, data=Giles) When processing data: Use a scripted language (e.g., R, SAS, MATLAB). Processing scripts are records of the processing done. Scripts can be revised and rerun.

Fundamental Practice #6 Perform Basic Quality Assurance Assure data are delimited and line up in proper columns. Check for missing values in key parameters. Scan for impossible and anomalous values. Perform and review statistical summaries. Map location data and assess any errors. Example: Model X in red uses UTC time, all others use Eastern Time. Photo Courtesy of Dan Ricciuto & Yaxing Wei, ORNL

Fundamental Practice #7 Provide Documentation / Metadata that follows standards Who Who collected the data? Who processed the data? Who wrote the metadata? Who to contact for questions? Who to contact to order? Who owns the data? Where Where were the data collected? Where were the data processed? Where are the data located? When When were the data collected? When were the data processed? How How were the data collected? How were the data processed? How do I access the data? How do I order the data? How much do the data cost? How was the quality assessed? Why Why were the data collected? What What are the data about? What project were they collected under? What are the constraints on their use? What is the quality? What are appropriate uses? What parameters were measured? What format are the data in? The Speech Dudes (2012)

Fundamental Practice #8 Protect Your Data Create back-up copies often Ideally three copies: original, one on-site (external), and one off-site Frequency based on need / risk Ensure that you can recover from a data loss Periodically test your ability to restore information Ensure file transfers are done without error Compare checksums before and after transfers From Flickr The Commons From Flickr by Brian J Matis

Fundamental Practice #9 Preserve Your Data What to preserve from the research project? Well-structured data files, with variables, units, and values well-defined Metadata record describing the data structured using Federal standards Additional information (provides context) Materials from project wiki/websites Files describing the project, protocols, or field sites (including photos) Publication(s) The Speech Dudes (2012) Courtesy of ORNL

Key Points · Data Management is important and critical in today’s science. · Well-organized and documented data: · Enables researchers to work more efficiently. · Can be shared easily by collaborators. · Can potentially be re-used in ways not imagined when originally collected. · Include data management in your research workflow. Make it a habit to manage your data well.

Data Management Modules developed by: · Viv Hutchison (USGS) · Thomas Burley (USGS) · Michelle Chang (USGS) · Thomas Chatfield (BLM) · Robert Cook (ORNL) · Heather Henkel (USGS) · Carly Strasser (California Digital Library) · Lisa Zolly (USGS) This effort was funded by the USGS Community for Data (CDI) Integration in 2013 in collaboration with the U.S. Geological Survey, Bureau of Land Management, California Digital Library, and the Oak Ridge National Laboratory. These modules were developed through the USGS Office of Organizational and Employee Development's (OED) Technology Enabled Learning (TEL) Program.