How NOT to share your data: Avoiding data horror stories

How NOT to share your data: Avoiding data horror stories
Rosie Higman, Office of Scholarly Communication, 8th March 2017

Where? What? File formats. Formatting your spreadsheet*. Document and describe your data!
* Based on the Avoiding data disasters course by Mark Dunning, CRUK-CI: http://bioinformatics-core-shared-training.github.io//avoid-data-disaster/

Warning! Every discipline is different. These are general principles; how you apply them will vary according to your research.

Where NOT to share your data: real examples, many of them found in various DAF surveys.

Where SHOULD you share your data? In disciplinary repositories where possible, with Apollo as the repository of last resort for Cambridge researchers. Look for a repository that offers a DOI, a preservation policy, and good indexing in search engines.

Once you’ve decided to put your data in a repository, you need to decide what data to include and how to present it. Source: Roche DG, Lanfear R, Binning SA, Haff TM, Schwanz LE, et al. (2014) Troubleshooting Public Data Archiving: Suggestions to Increase Participation. PLoS Biol 12(1): e1001779. doi:10.1371/journal.pbio.1001779, CC BY 4.0.

What data should you include? Graphs without the underlying data do not add much to your article.

What data should you include? Numerical data in Word documents is not hugely helpful, because it cannot be read directly by analysis software.

What data should you include? More than your figures! Include the data and code necessary to recreate your results.
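One way to act on this is to have the script that draws a figure also write the exact values it plots to a CSV file shared alongside the paper. This is a minimal sketch using only Python's standard library; the function name `export_figure_data`, the file name and the column names are hypothetical, chosen for illustration.

```python
import csv
from pathlib import Path


def export_figure_data(path, xs, ys, x_header, y_header):
    """Write the (x, y) points plotted in a figure to a plain CSV file.

    Put units in the headers (e.g. "time_h") so every value stays numeric.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow([x_header, y_header])
        writer.writerows(zip(xs, ys))


if __name__ == "__main__":
    # Hypothetical values standing in for the points drawn in a figure.
    hours = [0, 1, 2, 3]
    od600 = [0.05, 0.11, 0.24, 0.47]
    out = Path("figure2_data.csv")
    export_figure_data(out, hours, od600, "time_h", "od600")
    print(out.read_text(encoding="utf-8"))
```

Because the same script produces both the figure and the CSV, the shared numbers cannot drift out of step with what readers see in the plot.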

PowerPoint is for presentations, NOT data! Researchers often want to annotate images to highlight important areas and add metadata. Pasting images into PowerPoint, however, loses the metadata embedded in the original image files. This is particularly important because some instruments, such as microscopes, automatically embed substantial metadata. PowerPoint is a poor format for both re-use and preservation.

PowerPoint is for presentations, NOT data! Instead, share the original image files, in appropriate formats, with your annotations recorded in a separate PDF, CSV or TXT file (a README file).

Find your file format. Think! Preservation vs access/re-use.
Textual data = XML, TXT, HTML, PDF/A (Archival PDF)
Tabular data (spreadsheets) = CSV
Databases = XML, CSV
Images = TIFF, PNG, JPEG*
Audio = FLAC, WAV, MP3
You may need to submit multiple copies of the same data: one which facilitates easy re-use and one which can be preserved. This is important for many research funders.
*JPEG is a lossy format (some data is lost when images are compressed into JPEGs), but it may be necessary in some cases, as TIFF files can be very large and so expensive to store.
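The format suggestions above can be kept as a simple lookup table, for example to check a dataset folder before deposit. This is a hypothetical sketch: the dictionary mirrors the slide's list, the function name `check_file` is invented, and it judges files by extension only. Which formats a particular repository or funder accepts should be checked against their own guidance.

```python
from pathlib import Path

# Suggested formats from the slide, grouped by kind of data.
# Note: a ".pdf" extension cannot show whether a file is PDF/A.
PREFERRED_FORMATS = {
    "text": {".xml", ".txt", ".html", ".pdf"},
    "tabular": {".csv"},
    "database": {".xml", ".csv"},
    "image": {".tif", ".tiff", ".png"},
    "audio": {".flac", ".wav", ".mp3"},
}
# JPEG is lossy: acceptable as an access copy, not as the only copy.
ACCESS_ONLY = {".jpg", ".jpeg"}


def check_file(path):
    """Return a short verdict for one file, judged by its extension only."""
    suffix = Path(path).suffix.lower()
    if suffix in ACCESS_ONLY:
        return "lossy: fine for access, but also deposit a lossless copy such as TIFF"
    for kind, suffixes in PREFERRED_FORMATS.items():
        if suffix in suffixes:
            return f"ok ({kind})"
    return "check: not in the suggested list"


if __name__ == "__main__":
    for name in ["results.csv", "gel.TIF", "figures.pptx", "cells.jpg"]:
        print(name, "->", check_file(name))
```

An extension check only catches the obvious cases; it says nothing about whether the content itself is tidy or documented.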

Once you’ve found an appropriate file format, don’t forget to publish in a way which allows for re-use; otherwise there is no point in publishing! Read-only files make it less likely that other researchers will re-use your data, and so cite you.

Messy spreadsheets are harder to re-use

Keep your spreadsheets tidy: graphs in a separate sheet; no highlighting; no colours; no formulas* (*in your raw data). Graphs should not obscure the underlying data. Highlighting and colours are not saved in CSV format and are not understood by computers, so they are of no use to others automating their analysis. Where you have used many formulas and they add to the data, it may be helpful to include an .xlsx file with the formulas as well as a CSV file containing just the raw data.

Keep your spreadsheets tidy: no blank cells; 1 piece of data per cell; keep units out of cells; use data validation. Instead of leaving cells blank, choose a null value which cannot be confused with real data. Units should be in column titles, not in individual cells.
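Two of these rules can be checked automatically before sharing. The sketch below, using only Python's `csv` and `re` modules, flags blank cells and cells where a unit has been typed next to a number. The function name `find_problems` and the short unit list are assumptions for illustration; a real check would need the units used in your own field, and rules such as "1 piece of data per cell" still need a human eye.

```python
import csv
import io
import re

# A number followed by a unit, e.g. "12.5 kg" or "37C".
# The unit list is illustrative only; extend it for your own data.
NUMBER_WITH_UNIT = re.compile(
    r"^\s*-?\d+(\.\d+)?\s*(kg|g|mg|ml|l|cm|mm|m|c|s|h)\s*$", re.IGNORECASE
)


def find_problems(csv_text):
    """Return (row, column, message) tuples for cells that break the tidy rules.

    Rows are numbered from 1 with the header as row 1, as in a spreadsheet.
    """
    problems = []
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    for row_number, row in enumerate(reader, start=2):
        # Pad short rows so missing trailing cells count as blank.
        row = row + [""] * (len(header) - len(row))
        for column, value in zip(header, row):
            if value.strip() == "":
                problems.append((row_number, column, "blank cell: use an explicit null value such as NA"))
            elif NUMBER_WITH_UNIT.match(value):
                problems.append((row_number, column, "unit inside cell: move it to the column title"))
    return problems


if __name__ == "__main__":
    sample = "sample_id,mass_kg,temp\nA1,12.5,37C\nA2,,36.8\n"
    for problem in find_problems(sample):
        print(problem)
```

Here the check reports the "37C" entry in row 2 and the empty mass in row 3; renaming the column to something like `temp_c` and typing "37" fixes the first.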

Keep your spreadsheets tidy. One paper found that Excel was automatically converting gene names into dates (the gene symbols SEPT2 (Septin 2) and MARCH1 are converted by default to ‘2-Sep’ and ‘1-Mar’). This has corrupted many spreadsheets submitted as supporting information in genomics. It is important to be aware of what is in your spreadsheet and to use data validation when appropriate.
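A quick way to spot this kind of corruption is to scan a gene-symbol column for values that look like Excel's day-month dates, such as ‘2-Sep’ or ‘1-Mar’. The sketch below is a heuristic only: the pattern and the function name `suspicious_symbols` are assumptions, it only recognises the day-month form shown above, and it cannot restore the original symbol.

```python
import re

# Day-month strings like "2-Sep" or "1-Mar": what Excel shows after
# converting symbols such as SEPT2 and MARCH1 with default settings.
MONTHS = "jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec"
DATE_LIKE = re.compile(rf"^\d{{1,2}}-({MONTHS})$", re.IGNORECASE)


def suspicious_symbols(symbols):
    """Return the entries in a gene-symbol column that look like dates."""
    return [s for s in symbols if DATE_LIKE.match(s.strip())]


if __name__ == "__main__":
    column = ["TP53", "2-Sep", "BRCA1", "1-Mar", "MARCH1"]
    print(suspicious_symbols(column))
```

Python's `csv` module reads every cell as a string, so loading a CSV this way does not itself reproduce the conversion; the damage happens when the file is opened and re-saved in Excel.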

How NOT to describe your data

How you SHOULD describe your data: a good description tells you what is in the dataset and how the data were collected.

Remember to document your data! A good README file makes data much more usable. It gives you space to describe your methods and how you cleaned and analysed the data, and an opportunity to make your data a valuable resource.
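A README does not have to start from a blank page. This sketch, in standard-library Python with a hypothetical function name `readme_skeleton` and section headings chosen for illustration, lists every CSV file in a dataset folder together with its column titles, leaving the methods and the cleaning and analysis steps for you to fill in.

```python
import csv
import tempfile
from pathlib import Path


def readme_skeleton(folder, title):
    """Build README text listing each CSV file in `folder` and its column titles."""
    lines = [
        title,
        "=" * len(title),
        "",
        "Methods:",
        "  TODO: how the data were collected",
        "",
        "Processing:",
        "  TODO: how the data were cleaned and analysed",
        "",
        "Files:",
    ]
    for path in sorted(Path(folder).glob("*.csv")):
        with open(path, newline="", encoding="utf-8") as handle:
            header = next(csv.reader(handle), [])  # first row, or [] if empty
        lines.append(f"  {path.name}: columns {', '.join(header) or '(empty file)'}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        Path(folder, "growth.csv").write_text("time_h,od600\n0,0.05\n", encoding="utf-8")
        print(readme_skeleton(folder, "Growth curves dataset"))
```

Generating the file list from the folder itself means the README at least never omits a file; the TODO sections are where the real documentation goes.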

Choose a repository. Choose open file formats. Choose sharing more than your figures. Choose a tidy spreadsheet. Choose to describe your data. Choose decent documentation so your research is reproducible. CHOOSE DATA SHARING.
info@data.cam.ac.uk @CamOpenData www.data.cam.ac.uk