Published by Philippa Curtis. Modified over 9 years ago.
1
Open Data, Open Source: preparing for Big Data in Metabolomics
Rob L Davidson, GigaScience
@bobbledavidson #MetSoc2015
This presentation DOI: 10.6084/m9.figshare.1466889
2
Big Science
3
R&D is getting bigger http://www.battelle.org/docs/tpp/2014_global_rd_funding_forecast.pdf
4
More PhDs doi:10.1038/472276a
5
More postdocs http://www.nature.com/news/the-future-of-the-postdoc-1.17253
6
Not so much at the top http://bit.ly/1yLO2de
7
Big is at the bottom http://www.phdcomics.com/comics/archive.php?comicid=1144
8
THE NEED FOR OPEN DATA IN SCIENCE
9
Researcher bias
Positive result bias: 20 teams study the same question; only the 1 that gets p < 0.05 publishes
Poorly explained analyses
DOI: 10.1371/journal.pmed.0020124
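The "20 teams, 1 publishes" point is the arithmetic of a 5% significance threshold. If 20 independent teams test an effect that does not exist, the expected number of chance "hits" at p < 0.05 is 20 × 0.05 = 1, and the probability that at least one team gets one is 1 - 0.95^20. A minimal sketch, assuming fully independent studies and no true effect; the function names are illustrative only.

```python
"""Chance findings when many independent teams test a null effect.

Assumes: no true effect, independent studies, each tested at level alpha.
"""


def prob_any_false_positive(n_teams: int, alpha: float = 0.05) -> float:
    # Each null study is a false positive with probability alpha, so the
    # chance that none of them is, is (1 - alpha) ** n; take the complement.
    return 1.0 - (1.0 - alpha) ** n_teams


def expected_false_positives(n_teams: int, alpha: float = 0.05) -> float:
    # Sum of n Bernoulli(alpha) trials: expectation is n * alpha.
    return n_teams * alpha


if __name__ == "__main__":
    n = 20
    print(f"P(at least one p < 0.05 among {n} null studies) = "
          f"{prob_any_false_positive(n):.3f}")
    print(f"Expected chance 'hits': {expected_false_positives(n):.1f}")
```

With n = 20 the probability is about 0.64 and the expected number of chance hits is 1.0, which is the single publishable "positive" result on the slide.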
10
Problem: Reproducibility
Out of 18 microarray papers, results from 10 could not be reproduced
DOI: 10.1038/ng.295
11
Software?
http://reproducibility.cs.arizona.edu/
“The good news is that I was able to find some code. I am just hoping that it is a stable working version of the code... I have lost some data... The bad news is that the code is not commented and/or clean. So, I cannot really guarantee that you will enjoy playing with it.”
613 papers tested, 123 successful reproductions
12
85% of research resources are wasted!
We must... favor... unbiased, transparent, collaborative research with greater standardization
Share data, protocols, materials, software, other tools
DOI: 10.1371/journal.pmed.1001747
13
OPEN DATA CASE STUDY
14
Pregnancy-Induced Metabolic Phenotype Variations in Maternal Plasma DOI: 10.1021/pr401068k
15
Data Note
17
Devil in the detail
18
Minor discrepancies, major considerations
19
Open Data
Release data prior to peer review
Produce highly detailed metadata descriptions
– ISA-Tab
Expect/accept updates: ‘ongoing review’
Release ‘negative data’
– Get credit for ALL work
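ISA-Tab records experimental metadata as tab-delimited tables (an investigation file, study files and assay files). As a rough illustration of the kind of structured description meant here, this minimal sketch writes only a simplified study-sample table. The column headings follow the ISA-Tab naming pattern (`Source Name`, `Characteristics[...]`, `Protocol REF`, `Sample Name`, `Factor Value[...]`), but the characteristic, factor, protocol and sample values are hypothetical placeholders, not data from the case study, and the output is not a complete, validated ISA-Tab submission.

```python
import csv
import io

# Simplified ISA-Tab-style study table header (illustrative subset only).
HEADER = [
    "Source Name",
    "Characteristics[organism]",
    "Protocol REF",
    "Sample Name",
    "Factor Value[group]",
]

# Hypothetical rows for illustration; not taken from any real study.
ROWS = [
    ["subject_01", "Homo sapiens", "sample collection", "plasma_01", "control"],
    ["subject_02", "Homo sapiens", "sample collection", "plasma_02", "case"],
]


def write_study_table(header, rows) -> str:
    """Serialise a simplified study table as tab-separated text."""
    buf = io.StringIO()
    # ISA-Tab tables are tab-delimited; csv quotes any value containing a tab.
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        # Every row must supply one value per column.
        if len(row) != len(header):
            raise ValueError(f"row has {len(row)} fields, expected {len(header)}")
        writer.writerow(row)
    return buf.getvalue()


if __name__ == "__main__":
    print(write_study_table(HEADER, ROWS), end="")
```

Keeping one row per sample, with each experimental factor in its own column, is what lets a reader or tool regroup the data without guessing at the design. For real submissions the table should be checked against the ISA-Tab specification.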
20
OPEN SOURCE CASE STUDY
21
Birmingham metabolomics workflow
Many tools
Many languages
Complex to learn
Many parameters
Complex to report
22
Galaxy-M GUI
23
Galaxy-M Workflows
24
Accessible, reusable
GitHub
– Ease of access
Galaxy
– Ease of use
– Ease of reporting
– Ease of adaptation
Virtual Machine
– Ease of installation
– Guaranteed reproducibility
Test Datasets
25
And yet… referee 2
“I think important aspects of reproducibility are lost when building on closed source and non-free applications.”
“To be frank, if this were a genomics article I would recommend not publishing a purely computational methods paper when large parts of the pipeline are non-free and closed source - limiting both the reproducibility and transparency of the pipeline. Realistically though my understanding is that this is quite common in metabolomics”
“I would have indicated the paper was of more broad interest if there was at least one complete open source pipeline for data analysis”
26
Solution
Compiled all Matlab code
REMOVED PLS Toolbox analysis
Will work towards a Matlab-free system in future
27
Open Source
Use all the tools for
– sharing,
– installing,
– reusing
Do not use proprietary systems
– To increase collaboration
– To increase interest and citations
– Sorry, Eigenvector
28
THANKS!
GigaScience team: Scott Edmunds, Peter Li, Chris Hunter, Jesse Xiao, Rob Davidson
29
Call for papers
Plant Metabolomics
Guest Edited by: Ute Roessner and Ruth Welti
Open Access - Citable Data - Integrated Tools - Signed Peer Review
Activities of plant metabolomics consortia
Metabolomics and physiology of plant-environment interactions
Insights into biochemical pathways and related physiology
Plant MS-imaging
www.gigasciencejournal.com
editorial@gigasciencejournal.com