WHAT ARE WE GOING TO DO WITH DATA? Rob L Davidson #WCSJ2015 This presentation DOI:10.6084/m9.figshare.1439750.


1 WHAT ARE WE GOING TO DO WITH DATA? Rob L Davidson GigaScience @bobbledavidson #WCSJ2015 This presentation DOI:10.6084/m9.figshare.1439750

2 Up ahead: The need for Open Data in science; GigaScience and GigaDB; Everything is data; Open is accessible; Literate programming; So, what are we going to do with data?

3 THE NEED FOR OPEN DATA IN SCIENCE

4 Researcher bias. Positive result bias: 20 teams do studies, 1 publishes p<0.05. Poorly explained analyses. DOI: 10.1371/journal.pmed.0020124
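The "20 teams, 1 publishes" point can be made concrete with a little arithmetic. The following is only a minimal sketch under assumptions that are not on the slide: every team tests an effect that does not really exist, the studies are independent, and each uses the usual 0.05 significance threshold. The function names are illustrative.

```python
def prob_at_least_one_significant(n_studies: int, alpha: float) -> float:
    """Chance that at least one of n independent studies of a
    non-existent effect still reaches p < alpha (assumed model)."""
    # Each study avoids a false positive with probability (1 - alpha);
    # all n avoid it with probability (1 - alpha) ** n.
    return 1.0 - (1.0 - alpha) ** n_studies


def expected_significant(n_studies: int, alpha: float) -> float:
    """Expected number of false-positive studies among n."""
    return n_studies * alpha


if __name__ == "__main__":
    n, alpha = 20, 0.05  # 20 teams, threshold p < 0.05 (from the slide)
    print(f"P(at least one p < {alpha}): "
          f"{prob_at_least_one_significant(n, alpha):.2f}")
    print(f"Expected false positives: {expected_significant(n, alpha):.1f}")
```

Under these assumptions, about one of the 20 studies is expected to cross the threshold by chance alone, and the probability that at least one does is roughly 64%. If only that study is published, the literature records an effect that is not there.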

5 Problem: Reproducibility. Out of 18 microarray papers, results from 10 could not be reproduced. DOI: 10.1038/ng.295

6 Software? http://reproducibility.cs.arizona.edu/ “The good news is that I was able to find some code. I am just hoping that it is a stable working version of the code... I have lost some data... The bad news is that the code is not commented and/or clean. So, I cannot really guarantee that you will enjoy playing with it.” 613 papers tested; 123 successful reproductions.

7 DOI: 10.1371/journal.pmed.1001747 85% of research resources are wasted! We must... favor... unbiased, transparent, collaborative research with greater standardization. Share data, protocols, materials, software, other tools.

8 What, why Open Data? Knowledge is open if anyone is free to access, use, modify, and share it, subject, at most, to measures that preserve provenance and openness. http://opendefinition.org/od/

9 FAIR Data http://datafairport.org/

10 GIGASCIENCE AND GIGADB

11 The publishing tradition 1812 1665 1869

12 The publishing tradition: Aimed at paper product; Limited length; Limited detail; No supporting data; No supporting code; Poor images; Limited figures

13 Anatomy of a traditional Publication (diagram labels: Data, Idea, Study, Analysis, Answer, Metadata)

14 Anatomy of an Open Data Publication (diagram labels: Data, Idea, Study, Analysis, Answer, Metadata)

15 Multi-faceted publication (diagram labels: Open-access journal, Data Publishing Platform, Data Analysis Platform, Data, Metadata, Methods, Analyses)

16 “Deconstructed” Journal “Regular” Journal “Conscientious” Online Journal

17 “Deconstructed” Journal “Regular” Journal “Conscientious” Online Journal

18 “Deconstructed” Journal “Regular” Journal “Conscientious” Online Journal

19 Image Source: http://commons.wikimedia.org/wiki/File:System-Mechanic-California.jpg “Deconstructed” Journal “Regular” Journal “Conscientious” Online Journal

20 EVERYTHING IS DATA

21 Data is data DOI:10.1186/2047-217X-3-7

22 Software is data “For loading data from the provided datasets, a script that can load individual spectra or images is provided” DOI:

23 Metadata is data: Findable, reusable… Bioontologies/ISA-Tab – Standard language; ORCID – Unique, traceable authors; FundRef – Track funding outputs; APIs – Easy search

24 ACCESSIBLE, USABLE DATA

25 Curation: Not all science data is pretty; ISA-Tab and SRA help; Peer-reviewed data is better data. http://bit.ly/1F47YZz

26 Software pipelines Gigagalaxy.net (screenshot labels: Tool List, Tool Parameters, History/results)

27 Visualise pipelines Gigagalaxy.net

28 Reproducing results? SOAPdenovo2 S. aureus pipeline DOI: 10.1186/2047-217X-1-18

29 Easy installation. Virtual machine: Pre-installed; Peer-reviewed; Reproducibility, frozen in time. DOI:10.1186/2047-217X-3-23

30 Literate programming: Data journalism for all! knitr, IPython, Project Jupyter. DOI:10.1186/2047-217X-3-3

31 WHAT ARE WE GOING TO DO WITH DATA?

32 Add value http://bit.ly/1JyTfxO

33 Do science? Data – DOI: 10.5524/100034; Subsequent analysis – DOI: 10.1126/scitranslmed.3006086; Science journalism – http://bit.ly/1AXEkKJ. Why not do part 2 as well?

34 Summary: Science has problems – so how good can science journalism be? Things are changing – slowly. The future is bright. The future is data-driven. Data journalists will be the new scientists?

35 THANKS! GigaScience team: Scott Edmunds, Peter Li, Chris Hunter, Jesse Xiao, Rob Davidson

