WHAT ARE WE GOING TO DO WITH DATA? Rob L Davidson #WCSJ2015 This presentation DOI: /m9.figshare
Up ahead
– The need for Open Data in science
– GigaScience and GigaDB
– Everything is data
– Open is accessible
– Literate programming
– So, what are we going to do with data?
THE NEED FOR OPEN DATA IN SCIENCE
Researcher bias
– Positive result bias
– 20 teams run a study; only the 1 that finds p < 0.05 publishes
– Poorly explained analyses
DOI: /journal.pmed
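The arithmetic behind the "20 teams" point can be sketched in a few lines. This is only an illustration under stated assumptions: the 20 studies are independent, there is no real effect, and the significance threshold is 0.05. The function names are hypothetical.

```python
def prob_at_least_one_positive(n_studies, alpha=0.05):
    """Chance that at least one of n independent studies of a
    non-existent effect still reports p < alpha (assumed model)."""
    return 1.0 - (1.0 - alpha) ** n_studies


def expected_false_positives(n_studies, alpha=0.05):
    """Average number of 'significant' results expected by chance alone."""
    return n_studies * alpha


p = prob_at_least_one_positive(20)
print(f"P(at least one false positive in 20 studies) = {p:.2f}")  # about 0.64
print(f"Expected false positives = {expected_false_positives(20):.1f}")  # 1.0
```

Under these assumptions, one chance "positive" among 20 studies is the expected outcome, which is why publishing only the significant result is misleading.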
Problem: Reproducibility
Out of 18 microarray papers, results from 10 could not be reproduced
DOI: /ng.295
Software?
“The good news is that I was able to find some code. I am just hoping that it is a stable working version of the code... I have lost some data... The bad news is that the code is not commented and/or clean. So, I cannot really guarantee that you will enjoy playing with it.”
– 613 papers tested
– 123 successful reproductions (about 20%)
% of research resources are wasted!
– We must... favor... unbiased, transparent, collaborative research with greater standardization
– Share data, protocols, materials, software, other tools
DOI: /journal.pmed
What, why Open Data?
Knowledge is open if anyone is free to access, use, modify, and share it, subject, at most, to measures that preserve provenance and openness.
FAIR Data
GIGASCIENCE AND GIGADB
The publishing tradition
– Aimed at paper product
– Limited length
– Limited detail
– No supporting data
– No supporting code
– Poor images
– Limited figures
Anatomy of a traditional Publication
[Diagram labels: Data, Idea, Study, Analysis, Answer, Metadata]
Anatomy of an Open Data Publication
[Diagram labels: Data, Idea, Study, Analysis, Answer, Metadata]
Multi-faceted publication
– Open-access journal
– Data Publishing Platform
– Data Analysis Platform
[Diagram labels: Data, Metadata, Methods, Analyses]
[Diagram labels: “Deconstructed” Journal, “Regular” Journal, “Conscientious” Online Journal]
EVERYTHING IS DATA
Data is data
DOI: / X-3-7
Software is data
“For loading data from the provided datasets, a script that can load individual spectra or images is provided”
DOI:
Metadata is data
Findable, reusable…
– Bioontologies/ISA-Tab: standard language
– ORCID: unique, traceable authors
– FundRef: track funding outputs
– APIs: easy search
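To make "metadata is data" concrete, here is a minimal sketch of a machine-readable dataset record and a keyword search over it. Everything in it is an assumption made for illustration: the `DatasetRecord` class, its field names, and the placeholder DOI, ORCID and funder values are hypothetical and do not reflect the GigaDB schema or any real API.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    title: str
    doi: str
    author_orcids: List[str] = field(default_factory=list)
    funders: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)


def find_by_keyword(records, keyword):
    """Return the records tagged with `keyword` (case-insensitive)."""
    wanted = keyword.lower()
    return [r for r in records if wanted in {k.lower() for k in r.keywords}]


# Placeholder values, not real identifiers.
records = [
    DatasetRecord(
        title="Example genome assembly",
        doi="10.0000/example.1",
        author_orcids=["0000-0000-0000-0000"],
        funders=["Example Funder"],
        keywords=["genome", "assembly"],
    ),
    DatasetRecord(
        title="Example imaging set",
        doi="10.0000/example.2",
        keywords=["imaging"],
    ),
]

hits = find_by_keyword(records, "Genome")
# Structured metadata serialises cleanly, so it can be shared and indexed.
print(json.dumps([asdict(r) for r in hits], indent=2))
```

Because authors, funders and keywords are separate structured fields rather than free text, the same record can be searched, linked to identifiers, and exported without manual cleanup.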
ACCESSIBLE, USABLE DATA
Curation
– Not all science data is pretty
– ISA-Tab and SRA help
– Peer-reviewed data is better data
Software pipelines
Gigagalaxy.net
[Screenshot labels: Tool List, Tool Parameters, History/results]
Visualise pipelines
Gigagalaxy.net
Reproducing results?
SOAPdenovo2 S. aureus pipeline
DOI: / X-1-18
Easy installation
Virtual machine
– Pre-installed
– Peer-reviewed
– Reproducibility, frozen in time
DOI: / X-3-23
Literate programming
– Data journalism for all!
– knitr, IPython, Project Jupyter
DOI: / X-3-3
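Tools such as knitr and Jupyter interleave narrative and code so a reader can re-run the analysis. A much smaller relative of that idea, executable documentation with Python's standard `doctest` module, is sketched below. The function names `share` and `run_examples` are hypothetical; the numbers are the reproducibility counts quoted earlier in this talk (10 of 18, and 123 of 613).

```python
import doctest


def share(part, total):
    """Return part/total as a fraction.

    The examples double as documentation and as checks:

    >>> round(share(10, 18), 3)    # microarray papers not reproduced
    0.556
    >>> round(share(123, 613), 3)  # successful software reproductions
    0.201
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return part / total


def run_examples():
    """Execute the docstring examples; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for test in finder.find(share, name="share", globs={"share": share}):
        runner.run(test)
    results = runner.summarize(verbose=False)
    return results.failed, results.attempted


failed, attempted = run_examples()
print(f"{attempted} documented examples run, {failed} failed")
```

If the text and the code ever disagree, the examples fail, which is the property that makes literate documents easier to trust.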
WHAT ARE WE GOING TO DO WITH DATA?
Add value
Do science?
– Data: DOI: /
– Subsequent analysis: DOI: /scitranslmed
– Science journalism: why not do part 2 as well?
Summary
– Science has problems, so how good can science journalism be?
– Things are changing, slowly
– The future is bright
– The future is data-driven
– Will data journalists be the new scientists?
THANKS!
GigaScience team: Scott Edmunds, Peter Li, Chris Hunter, Jesse Xiao, Rob Davidson