Best Practices for Collecting Data Bill Corey Data Consultant University of Virginia Library Andrea Horne Denton Health Sciences Data.

Best Practices for Collecting Data Bill Corey Data Consultant University of Virginia Library Andrea Horne Denton Health Sciences Data Consultant Claude Moore Health Sciences Library © 2013 by the Rector and Visitors of the University of Virginia. This work is made available under the terms of the Creative Commons Attribution-ShareAlike 4.0 International license

Goals for the workshop
Learn about why this is important
Learn about common problems
Learn about 7 best practice areas
Complete hands-on exercises
Gain peer and expert feedback

Website with Sample Files Go to:

WHY?
Following these Best Practices:
–Will improve the usability of the data by you or by others
–Will make your data “computer ready”

Spreadsheet Examples

Spreadsheet Problems? Pause for Exercise

Problems
Dates are not stored consistently
Values are labeled inconsistently
Data coding is inconsistent
Order of values is different

Problems
Confusion between numbers and text
Different types of data are stored in the same columns
The spreadsheet loses interpretability if it is sorted

Next Exercise Possible Solution

Best Practices Data Organization
Lines or rows of data should be complete
–Designed to be machine readable, not human readable (each row must still make sense after a sort)

Possible Solution

Best Practices Data Organization
Include a Header Line
1st line (or record)
Label each column with a short but descriptive name
–Names should be unique
–Use letters, numbers, or “_” (underscore)
–Do not include blank spaces or symbols (+ - & ^ *)
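The header rules above can be checked automatically. The following is a minimal sketch in Python, not part of the original workshop materials; the function name `check_header` and the sample column names are made up for illustration.

```python
import re

# Rule from the slide: letters, numbers, or "_" only (no spaces or symbols).
VALID_NAME = re.compile(r"[A-Za-z0-9_]+")


def check_header(names):
    """Return a list of human-readable problems found in a header row."""
    problems = []
    seen = set()
    for name in names:
        if not VALID_NAME.fullmatch(name):
            problems.append(f"invalid characters in column name: {name!r}")
        if name in seen:
            problems.append(f"duplicate column name: {name!r}")
        seen.add(name)
    return problems


# Hypothetical headers: the first follows the rules, the second does not.
print(check_header(["site_id", "obs_date", "temp_C"]))
print(check_header(["site id", "temp+C", "site id"]))
```

An empty list means the header passed both checks. "Short but descriptive" still needs a human judgment.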

Best Practices Data Organization
Columns of data should be consistent
–Use the same naming convention for text data
Columns should include only a single kind of data
–Text or “string” data
–Integer numbers
–Floating point or real numbers
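One way to spot columns that mix kinds of data is to classify every cell and list the columns where more than one kind appears. This is a sketch under assumptions: the three kinds mirror the slide, and the function names and sample rows are invented for the example.

```python
def kind_of(value):
    """Classify one cell as 'integer', 'float', or 'text'."""
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        return "text"


def column_kinds(rows, header):
    """Map each column name to the set of kinds found in that column."""
    kinds = {name: set() for name in header}
    for row in rows:
        for name, value in zip(header, row):
            kinds[name].add(kind_of(value.strip()))
    return kinds


# Hypothetical data: the "count" column mixes integers with the text "n/a".
header = ["site_id", "count", "temp_C"]
rows = [["A1", "12", "18.5"], ["A2", "n/a", "19.0"], ["A3", "7", "17.25"]]
mixed = sorted(name for name, found in column_kinds(rows, header).items()
               if len(found) > 1)
print(mixed)
```

This strict version also flags a column that mixes values such as "18" and "18.5". That is often worth a look, but it may be acceptable for real-valued measurements.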

Use Standardized Formats
ISO 8601 Standard for Date and Time
–YYYY-MM-DDThh:mm:ss.sTZD, where TZD is either Z (UTC) or an offset of the form +hh:mm or -hh:mm
Spatial Coordinates for Latitude/Longitude
–+/- DD.DDDDD (signed decimal degrees) for both longitude and latitude
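As an illustration of these formats, the sketch below writes a timestamp in ISO 8601 form and a coordinate as signed decimal degrees using only the Python standard library. The helper names and the sample date and coordinates are made up; they are not values from the slides.

```python
from datetime import datetime, timezone


def iso_timestamp(dt):
    """Format a time-zone-aware datetime as an ISO 8601 string."""
    if dt.tzinfo is None:
        # A timestamp without a zone designator is ambiguous, so refuse it.
        raise ValueError("timestamp needs an explicit time zone")
    return dt.isoformat()


def signed_degrees(value, places=5):
    """Format a coordinate as +/-DD.DDDDD (signed decimal degrees)."""
    return f"{value:+.{places}f}"


# Hypothetical observation time and location.
when = datetime(2013, 5, 1, 9, 12, 34, 900000, tzinfo=timezone.utc)
print(iso_timestamp(when))
print(signed_degrees(-78.5), signed_degrees(38.03))
```

Python writes UTC as the offset `+00:00` rather than `Z`; both are valid ISO 8601 zone designators.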

File Names

Use descriptive names
Not too long
Don’t use spaces
Try to include time, place & theme
May use “-” or “_”

File Names
String words together with Caps (VegBiodiv_2007)
Think about using version numbers
Don’t change default extensions (txt, jpg, csv,…)
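A small helper can keep file names consistent with these rules. The sketch below is only one possible convention (theme, place, year, version joined by underscores); the function name, the `v01` version style, and the place "Lake" are assumptions made for the example.

```python
import re


def build_file_name(theme, place, year, version, extension="csv"):
    """Join theme, place, year, and a version number into one file name."""
    parts = [theme, place, str(year), f"v{version:02d}"]
    for part in parts:
        # Only letters, digits, and "-" inside a part; "_" separates parts.
        if not re.fullmatch(r"[A-Za-z0-9-]+", part):
            raise ValueError(f"unsuitable file name part: {part!r}")
    return "_".join(parts) + "." + extension


print(build_file_name("VegBiodiv", "Lake", 2007, 1))
```

Rejecting spaces and symbols at creation time is easier than renaming files later.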

Organize Files Logically
Make sure your file system is logical and efficient
Biodiversity
    Lake
        Experiments
            Biodiv_H20_heatExp_2005_2008.csv
            Biodiv_H20_predatorExp_2001_2003.csv
            …
        Field Work
            Biodiv_H20_planktonCount_start2001_active.csv
            Biodiv_H20_chla_profiles_2003.csv
            …
    Grassland

Quality Assurance / Control
QA: Manually check 5 – 10% of data records
QA: Check for out-of-range values (plotting)
QA: Map Location Data
QC: Use a data entry program
–Program to catch typing errors
–Program pull-down menu option
QC: Double entry keying
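Three of these checks are easy to script. The sketch below picks a random sample of records for manual review, lists out-of-range values, and compares two independent keyings of the same records. All function names, the fixed random seed, and the sample values and limits are assumptions for illustration.

```python
import random


def records_to_check(n_records, fraction=0.05, seed=0):
    """Pick a random sample of row indices for manual checking."""
    k = max(1, round(n_records * fraction))
    return sorted(random.Random(seed).sample(range(n_records), k))


def out_of_range(values, low, high):
    """Return (index, value) pairs that fall outside [low, high]."""
    return [(i, v) for i, v in enumerate(values) if not low <= v <= high]


def double_entry_mismatches(first, second):
    """Return indices of rows where two independent keyings disagree."""
    if len(first) != len(second):
        raise ValueError("the two entries have different numbers of rows")
    return [i for i, (a, b) in enumerate(zip(first, second)) if a != b]


# Hypothetical air temperatures in degrees C; 185.0 looks like a typo.
print(out_of_range([18.5, 19.0, 185.0, 17.2], -40, 60))
print(double_entry_mismatches([["A1", "12"], ["A2", "7"]],
                              [["A1", "12"], ["A2", "1"]]))
print(len(records_to_check(200, 0.05)))
```

A range check only catches implausible values. Plotting and mapping, as the slide suggests, can reveal errors that still fall inside the allowed range.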

Preserve Information
Keep Original (Raw) File
–Uncorrected copy, make “read-only”
Use scripted code to transform and correct data
Save as a new file
[Figure: Raw Data File, Processing Script (R)]

Preserving: Scripted Notes
Use a scripted language to process data
–R Statistical package (free, powerful)
–SAS
–MATLAB
Processing scripts record the processing steps
–Steps are recorded in textual format
–Can be easily revised and re-executed
–Easy to document
GUI-based analysis may be easier, but harder to reproduce
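The workflow described here (keep the raw file read-only, correct the data in a script, save the result as a new file) might look like the following. The slides name R, SAS, and MATLAB; this sketch uses Python purely as an illustration, and the file names, column names, and the missing-value correction are all hypothetical.

```python
import csv
import os
import stat
import tempfile


def make_read_only(path):
    """Remove write permission so the raw file cannot be edited by accident."""
    os.chmod(path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)


def process(raw_path, clean_path):
    """Read the raw file, apply corrections, and write a new file."""
    with open(raw_path, newline="") as src, \
            open(clean_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Hypothetical correction: trim spaces, unify missing-value codes.
            row = {k: ("NA" if v.strip() in ("", "n/a") else v.strip())
                   for k, v in row.items()}
            writer.writerow(row)


# Demonstration in a temporary folder so nothing real is touched.
with tempfile.TemporaryDirectory() as folder:
    raw = os.path.join(folder, "counts_raw.csv")
    clean = os.path.join(folder, "counts_clean.csv")
    with open(raw, "w", newline="") as f:
        f.write("site_id,count\nA1, 12\nA2,n/a\n")
    make_read_only(raw)
    process(raw, clean)
    with open(clean, newline="") as f:
        print(f.read())
```

Because every correction lives in the script, the cleaned file can be regenerated from the raw file at any time, and the script itself documents what was changed.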

Define Contents of Data Files
Create a Project Document File (Lab Notebook)
Details such as:
–Names of data & analysis files associated with study
–Definitions for data and codes (include missing value codes, names)
–Units of measure (accuracy and precision)
–Standards or instrument calibrations
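A data dictionary can be started from the header of the data file itself. The sketch below produces an empty template with one row per column, to be filled in by hand; the field list is an assumption loosely based on the details above, and it is not the workshop's own template.

```python
import csv
import io

# Assumed fields for the dictionary; adapt to your project's needs.
FIELDS = ["column_name", "definition", "units", "missing_value_code"]


def data_dictionary_skeleton(header):
    """Return CSV text with one empty dictionary row per data column."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for name in header:
        row = dict.fromkeys(FIELDS, "")
        row["column_name"] = name
        writer.writerow(row)
    return out.getvalue()


# Hypothetical column names.
print(data_dictionary_skeleton(["site_id", "obs_date", "temp_C"]))
```

Generating the template from the header keeps the dictionary and the data file in step when columns are added or renamed.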

Next Exercise Create a Data Dictionary (Document) for the file “sortdata-good” Template

Possible Solution

Data Dictionary Example

File Format Sustainability
Types                 Examples
Text                  ASCII, Word, PDF
Numerical             ASCII, SPSS, STATA, Excel, Access, MySQL
Multimedia            JPEG, TIFF, MPEG, QuickTime
Models                3D, statistical
Software              Java, C, Fortran
Domain-specific       FITS in astronomy, CIF in chemistry
Instrument-specific   Olympus Confocal Microscope Data Format

Choosing File Formats
Accessible Data (in the future)
–Non-proprietary (software formats)
–Open, documented standard
–Common, used by the research community
–Standard representation (ASCII, Unicode)
–Unencrypted & Uncompressed

Best Practices Creating Data
1. Use Consistent Data Organization
2. Use Standardized Formats
3. Assign Descriptive File Names
4. Perform Basic Quality Assurance / Quality Control
5. Preserve Information - Use Scripted Languages
6. Define Contents of Data Files; Create Metadata
7. Use Consistent, Stable and Open File Formats

Why Manage Data?
Saves time
Others can understand your data
Makes sharing data easier
–Increases the visibility of your research
–Facilitates new discoveries
–Reduces costs by avoiding duplication
–Required by funding agencies

Research Life Cycle and Data Life Cycle
[Figure labels: Proposal Planning Writing, Project Start Up, Data Collection, Data Analysis, Data Sharing, End of Project, Data Discovery, Data Archive, Deposit, Re-Use, Re-Purpose]

Managing Data in the Data Life Cycle
Choosing file formats
File naming conventions
Documentation and metadata
Access control & security
Backup & storage

Data Security & Access Control
Network security
–Keep confidential or sensitive data off internet servers or computers connected to the internet
Physical security
–Access to buildings and rooms
Computer Systems & Files
–Use passwords on files/system
–Virus protection

Backup Your Data
Reduce the risk of damage or loss
Use multiple locations (here, near, far)
Create a backup schedule
Use reliable backup medium
Test your backup system (i.e., test file recovery)
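One simple way to test file recovery is to restore a file from the backup and compare its checksum with the original. The sketch below uses SHA-256 from the Python standard library; the function names are made up, and the "backup" in the demonstration is just a local copy standing in for a real restore.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original_path, restored_path):
    """Return True when the restored file is byte-for-byte identical."""
    return sha256_of(original_path) == sha256_of(restored_path)


# Demonstration with a temporary file and a copy standing in for a restore.
with tempfile.TemporaryDirectory() as folder:
    original = os.path.join(folder, "data.csv")
    restored = os.path.join(folder, "data_restored.csv")
    with open(original, "w") as f:
        f.write("site_id,count\nA1,12\n")
    shutil.copyfile(original, restored)
    print(backup_matches(original, restored))
```

Recording the checksum when the backup is made also lets you detect silent corruption later, even if the original file is gone.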

Storage & Backup

Sustainable Storage Lifespan of Storage Media:

Mailing List Subscription Please check the box on our sign-in sheet to receive occasional emails to keep up with our services, training, and news. Please encourage others to subscribe:

More Research Data Services in the Library
Offering expert data assistance at every stage of the research process.
PLANNING
Need a data management plan? We can assist you with developing a data management plan that meets increasingly stringent criteria from funding agencies, including:
–Implementation of procedures, tools, and workflows for managing data sets
–Designing a strong study that yields reliable statistics
FINDING & COLLECTING
Need help finding data or collecting your own? We have thousands of sources with the data you seek and experts who will help you:
–Locate, evaluate and format data
–Design metadata and data documentation protocols for new data collection
–Capture data with the appropriate technology tools for your needs
SHARING
Ready to share or archive your data? We can consult with you on strategies to help others discover or access your research by:
–Adhering to data sharing policies and norms
–Selecting a data-sharing repository
–Making your data easier to discover and link
ANALYZING
Want help uncovering unique and compelling insights? Get expert assistance from statistical, spatial, or media specialists to analyze your data and convey your research message:
–Learn how to use cutting-edge tools and methods
–Experiment with high-resolution visualization technologies
–Develop graphical representations that bring impact to your analysis
Workshops, 1:1 Consultations, Class Presentations
Contact me at to find out more.

QUESTIONS? Bill Corey Data Consultant Data Management Consulting Group University of Virginia Library Andrea Horne Denton Health Sciences Data Consultant Claude Moore Health Sciences Library Data Management Consulting Group University of Virginia Library