Open Data, Open Source: preparing for Big Data in Metabolomics Rob L Davidson #MetSoc2015 This presentation DOI:10.6084/m9.figshare.1439750.

Presentation transcript:

1 Open Data, Open Source: preparing for Big Data in Metabolomics Rob L Davidson GigaScience @bobbledavidson #MetSoc2015 This presentation DOI:10.6084/m9.figshare.1439750

2 Big Science

3 R&D is getting bigger http://www.battelle.org/docs/tpp/2014_global_rd_funding_forecast.pdf

4 More PhDs doi:10.1038/472276a

5 More postdocs http://www.nature.com/news/the-future-of-the-postdoc-1.17253

6 Not so much at the top https://royalsociety.org/~/media/Royal_Society_Content/policy/publications/2010/4294970126.pdf

7 Big is at the bottom http://www.phdcomics.com/comics/archive.php?comicid=1144

8 THE NEED FOR OPEN DATA IN SCIENCE

9 Let me tell you about… “I am appalled sometimes at some papers today: they are so data-heavy and I don't think that makes them better papers.” – Tim Hunt 2014 Lab – http://www.labtimes.org/i50/i_01.lasso

10 Researcher bias. Positive result bias: 20 teams do studies, 1 publishes p<0.05. Poorly explained analyses. DOI: 10.1371/journal.pmed.0020124
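The arithmetic behind the positive-result-bias point on slide 10 can be sketched as below. This is an illustration rather than part of the talk: it assumes the 20 teams test independently, that there is no real effect to find, and that each team uses a significance threshold of 0.05. The function names are hypothetical.

```python
# Illustrative sketch only. Assumptions (not stated on the slide):
# - the studies are independent,
# - the null hypothesis is true in every study,
# - each study declares significance at p < alpha.

def prob_at_least_one_false_positive(n_studies: int, alpha: float) -> float:
    """Chance that at least one of n independent null studies reaches p < alpha."""
    return 1.0 - (1.0 - alpha) ** n_studies


def expected_false_positives(n_studies: int, alpha: float) -> float:
    """Expected number of null studies that reach p < alpha by chance."""
    return n_studies * alpha


if __name__ == "__main__":
    n, alpha = 20, 0.05
    print(f"P(at least one p < {alpha}): "
          f"{prob_at_least_one_false_positive(n, alpha):.3f}")
    print(f"Expected false positives: {expected_false_positives(n, alpha):.2f}")
```

Under these assumptions the chance that at least one team sees p<0.05 is roughly 64%, and the expected number of chance "positives" is 1, which is the single published study in the slide's scenario.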

11 Problem: Reproducibility. Out of 18 microarray papers, results from 10 could not be reproduced. DOI: 10.1038/ng.295

12 Software? http://reproducibility.cs.arizona.edu/ “The good news is that I was able to find some code. I am just hoping that it is a stable working version of the code... I have lost some data... The bad news is that the code is not commented and/or clean. So, I cannot really guarantee that you will enjoy playing with it.” 613 papers tested, 123 successful reproductions.

13 DOI: 10.1371/journal.pmed.1001747 85% of research resources are wasted! We must... favor... unbiased, transparent, collaborative research with greater standardization. Share data, protocols, materials, software, other tools.

14 OPEN DATA CASE STUDY

15 Pregnancy-Induced Metabolic Phenotype Variations in Maternal Plasma DOI: 10.1021/pr401068k

16 Data Note

17

18 Devil in the detail

19 Minor discrepancies Major considerations

20 Open Data: Release data prior to peer review. Produce highly detailed metadata descriptions – ISA-Tab. Expect/accept updates, ‘ongoing review’. Release ‘negative data’ – get credit for ALL work.

21 OPEN SOURCE CASE STUDY

22 Birmingham metabolomics workflow: Many tools, many languages – complex to learn. Many parameters – complex to report.

23 Galaxy-M GUI

24 Galaxy-M Workflows

25 Accessible, reusable. GitHub – ease of access. Galaxy – ease of use – ease of reporting – ease of adaptation. Virtual Machine – ease of installation – guaranteed reproducibility. Test datasets.

26 And yet… referee 2: “I think important aspects of reproducibility are lost when building on closed source and non-free applications.” “To be frank, if this were a genomics article I would recommend not publishing a purely computational methods paper when large parts of the pipeline are non-free and closed source - limiting both the reproducibility and transparency of the pipeline. Realistically though my understanding is that this is quite common in metabolomics.” “I would have indicated the paper was of more broad interest if there was at least one complete open source pipeline for data analysis.”

27 Solution: Compiled all Matlab code. REMOVED PLS Toolbox analysis. Will work towards a Matlab-free system in future.

28 Open Source: Use all the tools for – sharing, – installing, – reusing. Do not use proprietary systems – to increase collaboration – to increase interest and citations – sorry Eigenvector.

29 THANKS! GigaScience team: Scott Edmunds, Peter Li, Chris Hunter, Jesse Xiao, Rob Davidson

30 Call for papers: Plant Metabolomics. Guest Edited by: Ute Roessner and Ruth Welti. Open Access - Citable Data - Integrated Tools - Signed Peer Review. Topics: Activities of plant metabolomics consortia; Metabolomics and physiology of plant-environment interactions; Insights into biochemical pathways and related physiology; Plant MS-imaging. www.gigasciencejournal.com editorial@gigasciencejournal.com

