Open Data, Open Source: preparing for Big Data in Metabolomics Rob L Davidson #MetSoc2015 This presentation DOI: 10.6084/m9.figshare.1466889.

1 Open Data, Open Source: preparing for Big Data in Metabolomics Rob L Davidson GigaScience @bobbledavidson #MetSoc2015 This presentation DOI: 10.6084/m9.figshare.1466889

2 Big Science

3 R&D is getting bigger http://www.battelle.org/docs/tpp/2014_global_rd_funding_forecast.pdf

4 More PhDs doi:10.1038/472276a

5 More postdocs http://www.nature.com/news/the-future-of-the-postdoc-1.17253

6 Not so much at the top http://bit.ly/1yLO2de

7 Big is at the bottom http://www.phdcomics.com/comics/archive.php?comicid=1144

8 THE NEED FOR OPEN DATA IN SCIENCE

9 Researcher bias
- Positive result bias: 20 teams do studies, 1 publishes p<0.05
- Poorly explained analyses
DOI: 10.1371/journal.pmed.0020124
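The "20 teams, 1 publishes" point can be made concrete with a little arithmetic. The following is a minimal sketch under assumptions that are not on the slide: the 20 studies are independent, there is no real effect to find, and each uses a 0.05 significance threshold. The function name is illustrative only.

```python
# Chance that at least one of n independent null studies reaches p < alpha.
# Assumptions (not from the slide): independent studies, no true effect.

def prob_at_least_one_false_positive(n_studies: int, alpha: float = 0.05) -> float:
    """Return 1 - (1 - alpha)^n, the probability of one or more false positives."""
    return 1.0 - (1.0 - alpha) ** n_studies

p_any = prob_at_least_one_false_positive(20)
print(f"P(at least one p < 0.05 among 20 studies) = {p_any:.3f}")
```

Under these assumptions the probability is roughly 64%, so a single published "significant" result among 20 attempts is unsurprising even when nothing is there.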

10 Problem: Reproducibility
Out of 18 microarray papers, results from 10 could not be reproduced
DOI: 10.1038/ng.295

11 Software? http://reproducibility.cs.arizona.edu/
“The good news is that I was able to find some code. I am just hoping that it is a stable working version of the code... I have lost some data... The bad news is that the code is not commented and/or clean. So, I cannot really guarantee that you will enjoy playing with it.”
- 613 papers tested
- 123 successful reproductions
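As a quick check on what those two counts imply, the reproduction rate can be worked out directly. This sketch uses only the figures quoted on the slide; the variable names are illustrative.

```python
# Figures quoted on the slide: 613 papers tested, 123 successful reproductions.
papers_tested = 613
successful = 123

success_rate = successful / papers_tested
print(f"Reproduced: {success_rate:.1%}; not reproduced: {1 - success_rate:.1%}")
```

That is about one paper in five.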

12 DOI: 10.1371/journal.pmed.1001747
85% of research resources are wasted!
We must... favor... unbiased, transparent, collaborative research with greater standardization
Share data, protocols, materials, software, other tools

13 OPEN DATA CASE STUDY

14 Pregnancy-Induced Metabolic Phenotype Variations in Maternal Plasma DOI: 10.1021/pr401068k

15 Data Note

16

17 Devil in the detail

18 Minor discrepancies Major considerations

19 Open Data
- Release data prior to peer review
- Produce highly detailed metadata descriptions
  - ISA-Tab
- Expect/accept updates, ‘ongoing review’
- Release ‘negative data’
  - Get credit for ALL work

20 OPEN SOURCE CASE STUDY

21 Birmingham metabolomics workflow Many tools Many languages Complex to learn Many parameters Complex to report

22 Galaxy-M GUI

23 Galaxy-M Workflows

24 Accessible, reusable
- GitHub
  - Ease of access
- Galaxy
  - Ease of use
  - Ease of reporting
  - Ease of adaptation
- Virtual Machine
  - Ease of installation
  - Guaranteed reproducibility
- Test Datasets
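One way a bundled test dataset can support reproducibility is as a regression check: run the tool on the test input and compare the result with a stored reference. The sketch below is purely illustrative and is not Galaxy-M code; `process`, `TEST_INPUT` and `REFERENCE_OUTPUT` are hypothetical stand-ins for a real workflow step and its shipped test data.

```python
import hashlib

def process(values):
    """Hypothetical stand-in for one workflow step: scale values to unit total."""
    total = sum(values)
    return [round(v / total, 6) for v in values]

def digest(values) -> str:
    """Stable SHA-256 fingerprint of a list of numbers (6 decimal places)."""
    text = ",".join(f"{v:.6f}" for v in values)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hypothetical test dataset shipped alongside the tool.
TEST_INPUT = [2.0, 3.0, 5.0]
# Reference output recorded once from a trusted run and stored with the test data.
REFERENCE_OUTPUT = [0.2, 0.3, 0.5]

def matches_reference() -> bool:
    """True if the current code reproduces the stored reference result."""
    return digest(process(TEST_INPUT)) == digest(REFERENCE_OUTPUT)

print("reproduces reference:", matches_reference())
```

Comparing fingerprints of rounded values, rather than raw floats, is a design choice in this sketch to avoid spurious failures from tiny floating-point differences between machines.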

25 And yet… referee 2
“I think important aspects of reproducibility are lost when building on closed source and non-free applications.”
“To be frank, if this were a genomics article I would recommend not publishing a purely computational methods paper when large parts of the pipeline are non-free and closed source - limiting both the reproducibility and transparency of the pipeline. Realistically though my understanding is that this is quite common in metabolomics”
“I would have indicated the paper was of more broad interest if there was at least one complete open source pipeline for data analysis”

26 Solution
- Compiled all Matlab code
- REMOVED PLS Toolbox analysis
- Will work towards Matlab-free system in future

27 Open Source
- Use all the tools for
  - sharing,
  - installing,
  - reusing
- Do not use proprietary systems
  - To increase collaboration
  - To increase interest and citations
  - Sorry Eigenvector

28 THANKS!
GigaScience team: Scott Edmunds, Peter Li, Chris Hunter, Jesse Xiao, Rob Davidson

29 Call for papers
Plant Metabolomics
Guest Edited by: Ute Roessner and Ruth Welti
Open Access - Citable Data - Integrated Tools - Signed Peer Review
- Activities of plant metabolomics consortia
- Metabolomics and physiology of plant-environment interactions
- Insights into biochemical pathways and related physiology
- Plant MS-imaging
www.gigasciencejournal.com editorial@gigasciencejournal.com

