Merging and sharing Metabolomics analysis tools with Galaxy: transparent, reproducible, open 'omics Robert L Davidson #MMW2014 Merlion.
Merging and sharing Metabolomics analysis tools with Galaxy: transparent, reproducible, open 'omics Robert L Davidson #MMW2014 Merlion Metabolomics Workshop

Researcher bias
- Positive result bias: 20 teams do studies, 1 publishes p<0.05
- Poorly explained analyses
DOI: /journal.pmed
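The "1 in 20" figure follows from the 0.05 significance threshold. Here is a worked sketch. It assumes every team tests an effect that does not exist, the studies are independent, and each uses the conventional 0.05 cut-off.

```latex
\begin{aligned}
\mathbb{E}[\text{teams with } p<0.05] &= 20 \times 0.05 = 1\\
\Pr(\text{at least one team has } p<0.05) &= 1 - (1-0.05)^{20} \approx 1 - 0.358 = 0.642
\end{aligned}
```

So roughly two times in three, at least one team obtains a "positive" result. If only that study is published, the literature reports an effect that is not there.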

85% of research resources are wasted!
"We must... favor... unbiased, transparent, collaborative research with greater standardization"
- Share data, protocols, materials, software and other tools
DOI: /journal.pmed

Data sharing
- Supported by government policy, e.g. in the UK and by the NIH
- MetaboLights repository
- NIH Metabolomics Data Repository
- ISA-Tab for metadata
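ISA-Tab stores study metadata as plain tab-separated tables, so it can be read with standard tools. The sketch below is a minimal illustration under stated assumptions. The study table is hypothetical, and its column names follow the ISA-Tab "Source Name / Characteristics[...] / Protocol REF / Sample Name / Factor Value[...]" pattern. It groups samples by one experimental factor using only Python's `csv` module.

```python
import csv
import io
from collections import defaultdict

# Hypothetical, minimal ISA-Tab-style study table (tab-separated, one row per sample).
STUDY_TABLE = (
    "Source Name\tCharacteristics[Organism]\tProtocol REF\tSample Name\tFactor Value[Treatment]\n"
    "donor1\tHomo sapiens\tsample collection\tS1\tcontrol\n"
    "donor2\tHomo sapiens\tsample collection\tS2\tcontrol\n"
    "donor3\tHomo sapiens\tsample collection\tS3\ttreated\n"
)


def samples_by_factor(table_text, factor_column):
    """Group the "Sample Name" values by the value in one "Factor Value[...]" column."""
    # csv.reader also handles the double-quoted values common in ISA-Tab files.
    rows = list(csv.reader(io.StringIO(table_text), delimiter="\t"))
    header, records = rows[0], rows[1:]
    # ISA-Tab may repeat some headers (e.g. "Term Source REF"), so look columns up
    # by position rather than with csv.DictReader, which would keep only one of them.
    sample_idx = header.index("Sample Name")
    factor_idx = header.index(factor_column)
    groups = defaultdict(list)
    for record in records:
        groups[record[factor_idx]].append(record[sample_idx])
    return dict(groups)


print(samples_by_factor(STUDY_TABLE, "Factor Value[Treatment]"))
# {'control': ['S1', 'S2'], 'treated': ['S3']}
```

Reading columns by position keeps the sketch robust to ISA-Tab's repeated annotation headers.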

What about methods?
“The good news is that I was able to find some code. I am just hoping that it is a stable working version of the code... I have lost some data... The bad news is that the code is not commented and/or clean. So, I cannot really guarantee that you will enjoy playing with it.”
613 papers tested, 123 successful reproductions (about 20%)

Problem
- There is a reproducibility crisis
- Published results are untrustworthy
- Government research funding is wasted (85% of resources)
What's the solution?
- Share data AND methods

Galaxy
- Over 36,000 users of the main Galaxy server
- Over 1,000 papers citing Galaxy use
- Over 55 Galaxy servers deployed
- Open source

Galaxy Toolshed
- Many 'omics, statistics and visualisation tools
- Metabolomics can plug into these tools!
- Download; run instantly

Any tool in Galaxy, e.g. python myfunction input1
- Basic XML 'wrapper'
- Describes inputs and outputs
- Calls the command
- Monitors for output
- Logs/returns results to the 'history'
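The wrapper idea can be sketched in Python. This minimal sketch only builds the XML text. The element names (`tool`, `command`, `inputs/param type="data"`, `outputs/data`) and the `$__tool_directory__` variable follow Galaxy's tool XML format. The tool id, the script name `myfunction.py`, the `tabular` formats and the label are hypothetical. When Galaxy runs the command, it substitutes `$input1` and `$output1` with dataset paths, then picks up the output file and adds it to the user's history.

```python
import xml.etree.ElementTree as ET


def make_tool_wrapper(tool_id, name, script, in_format="tabular", out_format="tabular"):
    """Build a minimal Galaxy-style tool XML wrapper for `python <script> input1 output1`.

    Illustrative sketch only: real wrappers usually also declare
    requirements, tests and help text.
    """
    tool = ET.Element("tool", id=tool_id, name=name, version="0.1.0")
    # The command Galaxy runs; $input1/$output1 are template variables that
    # Galaxy replaces with the selected input dataset and the new output file.
    command = ET.SubElement(tool, "command")
    command.text = f"python '$__tool_directory__/{script}' '$input1' '$output1'"
    # Inputs: one dataset chosen from the user's history.
    inputs = ET.SubElement(tool, "inputs")
    ET.SubElement(inputs, "param", name="input1", type="data",
                  format=in_format, label="Input table")
    # Outputs: one new dataset that Galaxy adds to the history.
    outputs = ET.SubElement(tool, "outputs")
    ET.SubElement(outputs, "data", name="output1", format=out_format)
    return ET.tostring(tool, encoding="unicode")


print(make_tool_wrapper("myfunction", "My function", "myfunction.py"))
```

In practice the XML is usually written by hand. Generating it here simply makes each part of the wrapper (inputs, outputs, command) explicit.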

[Screenshot: Galaxy interface with tool list, tool parameters and history/results panels]

Birmingham metabolomics workflow
- SIM-Stitch (DOI: /j.jasms)
- XCMS (DOI: /ac051437y)
- MI-Pack (DOI: /j.chemolab)
- KNN Impute (DOI: /s)
- PQ-Normalisation (DOI: /ac051632c)
- G-Log transform (DOI: /)
- PCA (with statistical test of scores)
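Three of the listed steps are simple enough to sketch in plain Python. This is a generic illustration, not the Birmingham tools' code. The KNN distance, the value of k and the glog parameter `lam=1.0` are assumptions; in practice `lam` is estimated from the data. The sketch also skips the total-area pre-normalisation often applied before PQN. XCMS, SIM-Stitch, MI-Pack and PCA are left out.

```python
import math
from statistics import median


def knn_impute(samples, k=3):
    """Fill None entries with the mean of that feature in the k nearest samples.

    Distance is Euclidean over features observed in both samples; this is a
    generic choice, not necessarily the one used by the cited KNN Impute tool.
    """
    filled = [list(s) for s in samples]
    for i, sample in enumerate(samples):
        for j, value in enumerate(sample):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(samples):
                if m == i or other[j] is None:
                    continue
                shared = [(a - b) ** 2 for a, b in zip(sample, other)
                          if a is not None and b is not None]
                if shared:
                    candidates.append((math.sqrt(sum(shared)), other[j]))
            if not candidates:
                raise ValueError(f"no donor samples for feature {j}")
            candidates.sort()
            nearest = [v for _, v in candidates[:k]]
            filled[i][j] = sum(nearest) / len(nearest)
    return filled


def pqn_normalise(samples):
    """Probabilistic quotient normalisation (median-quotient idea).

    Reference profile = feature-wise median across samples; each sample is
    divided by the median of its quotients against that reference.
    """
    n_features = len(samples[0])
    reference = [median(s[j] for s in samples) for j in range(n_features)]
    normalised = []
    for s in samples:
        quotients = [s[j] / reference[j] for j in range(n_features)
                     if reference[j] > 0 and s[j] > 0]
        factor = median(quotients)
        normalised.append([x / factor for x in s])
    return normalised


def glog(x, lam=1.0):
    """Generalised log: ln(x + sqrt(x^2 + lam)). lam=1.0 is a placeholder value."""
    return math.log(x + math.sqrt(x * x + lam))


# Toy data: rows = samples, columns = features. Row 2 is row 1 scaled by 2
# (e.g. a concentration difference); row 3 has one missing value (None).
data = [
    [10.0, 20.0, 30.0],
    [20.0, 40.0, 60.0],
    [10.0, None, 30.0],
]
imputed = knn_impute(data, k=1)
normalised = pqn_normalise(imputed)
transformed = [[glog(x) for x in row] for row in normalised]
print(normalised)  # [[10.0, 20.0, 30.0], [10.0, 20.0, 30.0], [10.0, 20.0, 30.0]]
```

PQN removes the overall scaling between samples. glog behaves like a log for large values but stays finite at zero, which matters for low-intensity peaks.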

Birmingham metabolomics workflow
- Many tools
- Many languages
- Complex to learn
- Many parameters
- Complex to report

Metabolomics workflow in Galaxy
- User sees a website (intuitive)
- Centrally stored (secure)
- Workflow is recorded
- Methods shareable
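Because the workflow is recorded, the "complex to report" problem can be eased by generating the methods list from the saved workflow. This is a sketch under assumptions. It presumes an export shaped like a Galaxy `.ga` JSON file: a `steps` mapping keyed by string indices, where each tool step has a `tool_id` and its parameters stored as a JSON-encoded `tool_state` string. The workflow, tool ids and parameters below are hypothetical, and real exports contain many more keys.

```python
import json

# Hypothetical, heavily trimmed workflow in the style of a Galaxy ".ga" export.
# Tool ids and parameters are illustrative; real exports carry many more keys.
EXPORTED = {
    "a_galaxy_workflow": "true",
    "format-version": "0.1",
    "name": "Example metabolomics workflow",
    "steps": {
        "0": {"id": 0, "type": "data_input", "tool_id": None, "tool_state": "{}"},
        "1": {"id": 1, "type": "tool", "tool_id": "pqn_normalise",
              "tool_state": json.dumps({"reference": "median", "__page__": None})},
        "2": {"id": 2, "type": "tool", "tool_id": "glog_transform",
              "tool_state": json.dumps({"lambda": "1.0"})},
    },
}


def methods_report(workflow):
    """One line per tool step: step id, tool id and recorded parameter values."""
    lines = []
    # Steps are keyed by string indices; sort numerically to keep their order.
    for key in sorted(workflow["steps"], key=int):
        step = workflow["steps"][key]
        if step.get("type") != "tool":
            continue  # skip input datasets and other non-tool steps
        params = json.loads(step.get("tool_state") or "{}")
        # Drop internal bookkeeping keys (assumed to start with "__", e.g. "__page__").
        settings = ", ".join(f"{k}={v}" for k, v in sorted(params.items())
                             if not k.startswith("__"))
        lines.append(f"Step {step['id']}: {step['tool_id']} ({settings})")
    return lines


for line in methods_report(EXPORTED):
    print(line)
```

For a real export, load the downloaded file with `json.load` and pass the resulting dict to `methods_report`.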

View, share, edit, rerun workflow

Citable workflow
- Add as supplemental files, or
- Publish with a distinct DOI via GigaDB or FigShare

Where to get our workflow (coming soon!)
- Galaxy Toolshed
- GitHub
- Submitted to GigaScience (gigasciencejournal.com)
- VM, code and test data to be available on GigaDB.org
- Test server to be available at GigaGalaxy

Summary
- Share your data
- Share your software
- Share your workflow, in full
- Galaxy is not new software; it's a flexible sharing platform
- Add your tools to ours, in the Galaxy Toolshed
Help make metabolomics trustworthy, meaningful and reproducible

Acknowledgements
- University of Birmingham: Ralf Weber, Mark Viant
- GigaScience: Pete Li
- Funding: NERC NE/K011294/1

Me: Rob L. Davidson
This presentation: