Published by Veronica Harrell. Modified over 9 years ago.
1
Tools for reproducible and accessible science: VMs, KnitR and OMERO. Rob Davidson. Cardiac Physiome Workshop, Auckland, April 8th 2015.
2
All Your Research Objects: Project proposal; Project experimental SOPs; Images of equipment, subjects, conditions; RAW data; Meta-data; Analysis code, parameters, pipelines; Analysis environment, VM or provisioning script; Intermediate results; Publication figures/images/tables: codify; Publication text. Source: DOI: 10.6084/m9.figshare.1330219
3
GigaSolution: deconstructing the paper Combines and integrates: Open-access journal Data Publishing Platform Data Analysis Platform
4
Today’s message Tools that fit with GigaDB – General purpose Research Object store Enhancing – Accessibility – Reproducibility Of some of your research objects – Software – images
5
Problems with scientific software - reproducibility
6
Measuring software reproducibility Systematic study: 515 papers (429 conference, 86 journal) <30% reproducible DOI: 10.6084/m9.figshare.1330219 http://reproducibility.cs.arizona.edu
7
Measuring software reproducibility DOI: 10.6084/m9.figshare.1330219 http://reproducibility.cs.arizona.edu
8
Reasons for failure “The good news is that I was able to find some code. I am just hoping that it is a stable working version of the code... I have lost some data... The bad news is that the code is not commented and/or clean. So, I cannot really guarantee that you will enjoy playing with it.” DOI: 10.6084/m9.figshare.1330219 http://reproducibility.cs.arizona.edu
9
Cost of failure Wasted time Wasted money – Ioannidis 2014 – 85% of resources wasted Frustration Distrust DOI: 10.6084/m9.figshare.1330219 DOI: 10.1371/journal.pmed.1001747
10
Literate programming - KnitR
11
Literate programming “Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.” – Donald E. Knuth, Literate Programming, 1984
12
Literate programming options See listing: http://www.gigasciencejournal.com/content/3/1/19 – R: KnitR, Sweave, R Markdown – JavaScript: Tangle, Active Markdown (CoffeeScript) – Python: IPython Notebooks – iReport links this functionality for Galaxy DOI: 10.6084/m9.figshare.1330219
13
KnitR is versatile: R, Python, Ruby, Haskell, Perl, SAS, CoffeeScript, .txt, LaTeX, HTML, D3.js, R Markdown, HTML5 slides, command line, any text?, WordPress
14
KnitR – how does it work? Code chunks – Basic text (or LaTeX or markdown), interrupted by ‘chunks’ of code. For LaTeX, similar to Sweave. Inline expression: …some text \Sexpr{rfunc(var)} more text… Code chunk: …some text <<>>= Some code @ Process this combined text/code with knit() in R
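To make the chunk mechanism concrete, here is a minimal sketch of a "weave" step. It is a toy written in Python, not knitr and not knitr's implementation: it only assumes the Sweave-style markers described on the slide (`<<>>=` opens a chunk, `@` on its own line closes it, `\Sexpr{...}` is an inline expression), runs Python rather than R inside the chunks, and does not handle nested braces. The `weave` function and the sample document are illustrative names, not part of any library.

```python
import contextlib
import io
import re

# Toy "weave" step, NOT knitr itself: chunks open with <<>>= and close with @,
# and \Sexpr{...} marks an inline expression. The chunk language here is Python.
CHUNK = re.compile(r"^<<.*?>>=\n(.*?)^@\n?", re.DOTALL | re.MULTILINE)
INLINE = re.compile(r"\\Sexpr\{(.*?)\}")


def weave(source: str) -> str:
    """Run every code chunk, then substitute inline expressions."""
    env = {}  # shared namespace, so later chunks and inline code see earlier results

    def run_chunk(match):
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(match.group(1), env)
        return buffer.getvalue()  # the chunk is replaced by whatever it printed

    def run_inline(match):
        return str(eval(match.group(1), env))

    woven = CHUNK.sub(run_chunk, source)
    return INLINE.sub(run_inline, woven)


document = (
    "We measured three samples.\n"
    "<<>>=\n"
    "values = [2, 4, 6]\n"
    "print('n =', len(values))\n"
    "@\n"
    "The mean is \\Sexpr{sum(values) / len(values)}.\n"
)

woven_document = weave(document)
print(woven_document)
```

The point of the sketch is the shared namespace: the inline expression reads `values` defined in the earlier chunk, so the number in the text can never drift away from the data that produced it.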
15
KnitR uses: easy to explain DOI: 10.6084/m9.figshare.1330219 http://reproducibility.cs.arizona.edu
16
KnitR uses: reproducible analysis Can string different tools/languages together Stores parameters Just like a pipeline/workflow system – e.g. Galaxy, Taverna, KNIME But also: codifies your figures…
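The "stores parameters" idea can be sketched in a few lines. This is an illustration in Python under stated assumptions, not how knitr or any workflow system does it: the two steps, the parameter names `threshold` and `scale`, and the `run_pipeline` function are all hypothetical.

```python
import hashlib
import json


def run_pipeline(data, params):
    """Apply two steps in order and return the result with its provenance."""
    # Step 1: drop values below the threshold.
    kept = [x for x in data if x >= params["threshold"]]
    # Step 2: rescale what is left.
    scaled = [x * params["scale"] for x in kept]
    # Hash of the exact parameters, so a result can be matched to its settings.
    encoded = json.dumps(params, sort_keys=True).encode("utf-8")
    return {
        "params": params,
        "params_sha256": hashlib.sha256(encoded).hexdigest(),
        "result": scaled,
    }


record = run_pipeline([1, 5, 10], {"threshold": 5, "scale": 2})
print(json.dumps(record, indent=2))
```

Keeping the parameters in the same record as the output means a later reader does not have to guess which settings produced a given result.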
17
KnitR uses – codified figures DOI: 10.6084/m9.figshare.1330219 Classic problems: No description of error bars No description of distributions Admittedly this could be fixed by ‘proper’ peer review Source code: http://bit.ly/1NQZlHh
18
KnitR uses: codified figures DOI: 10.6084/m9.figshare.1330219 Code can be found quickly Using text as markers Plot can be altered – 1 line of code New visualisation produced instantaneously Better evaluation of results Source code: http://bit.ly/1NQZlHh
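As a rough illustration of a figure being "codified", the sketch below computes the numbers behind one plotted point and carries an explicit description of the error bar with them. It is written in Python rather than R, the data are made up for the example, and `summarise` and `ERROR_BAR` are hypothetical names. Switching the error bar type is a one-line change, which is the property the slide describes.

```python
import math
import statistics


def summarise(values, error="sd"):
    """Return (mean, error bar half-width, description of the error bar)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    if error == "sd":
        return mean, sd, "error bars: sample standard deviation"
    if error == "sem":
        sem = sd / math.sqrt(len(values))
        return mean, sem, "error bars: standard error of the mean"
    raise ValueError(f"unknown error bar type: {error}")


# Made-up measurements, for illustration only.
measurements = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Changing this one line switches what the figure shows and how it is labelled.
ERROR_BAR = "sem"

mean, half_width, caption = summarise(measurements, error=ERROR_BAR)
print(f"{mean:.2f} +/- {half_width:.2f} ({caption})")
```

Because the caption is produced by the same code path as the number, the "no description of error bars" problem cannot arise silently.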
19
GigaScience KnitR example “This article is an example of a literate programming document. It has been created in R using the knitr package. Figures and tables in this paper are generated dynamically as the document is compiled. Several R packages are required to run the analysis. Materials are archived in the Gigascience database” DOI: 10.6084/m9.figshare.1330219 DOI: 10.1186/2047-217X-3-3
20
Environment wrappers - VMs DOI: 10.6084/m9.figshare.1330219
21
Measuring software reproducibility DOI: 10.6084/m9.figshare.1330219 http://reproducibility.cs.arizona.edu
22
Your environment How hard would it be to start from scratch? What if you move from Ubuntu to CentOS? Or just upgrade? Dependencies / versions, system settings: hard for you, horrendous for others! DOI: 10.6084/m9.figshare.1330219
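A first step toward answering "how hard would it be to start from scratch?" is simply writing down what the environment contains. The sketch below does that for a Python installation using only the standard library; it is an illustration, not a tool from the talk, and `describe_environment` is a hypothetical name. It records less than a VM would (no system libraries or settings), which is exactly the gap the following slides address.

```python
import json
import platform
import sys
from importlib import metadata


def describe_environment():
    """Collect OS, interpreter and installed-package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata has no name
            packages[name] = dist.version
    return {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "packages": dict(sorted(packages.items())),
    }


environment = describe_environment()
print(json.dumps(environment, indent=2))
```

Saving this output next to the analysis results gives a reader at least the version list to compare against when something fails to run.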
23
Share your environment Virtual machine – Copy your exact environment – If it works for you, it works for anyone – Reproducibility, frozen in time DOI: 10.6084/m9.figshare.1330219 DOI:10.1186/2047-217X-3-23
24
Share your environment Docker – ‘light’ VM – Discrete unit of code+environment – Can be called from command line – Can be linked together New possibilities e.g. nucleotid.es – Benchmarking -> “data-driven peer-review”? DOI: 10.6084/m9.figshare.1330219 http://nucleotid.es/
25
Share your environment Some concerns: – http://ivory.idyll.org/blog/vms-considered-harmful.html – VM = black box? – Docker == black box! Solution -> codify the environment DOI: 10.6084/m9.figshare.1330219
26
Codify your environment Provisioning scripts are ‘research objects’ Improves adaptability (easier to recode for alternative OS etc) Builds in extra documentation Easier to share – although GigaDB still wants a compiled snapshot (i.e. full machine) DOI: 10.6084/m9.figshare.1330219
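One way to picture a provisioning script as a research object is to keep the environment as a small, readable description and generate the build recipe from it. The sketch below renders Dockerfile text from such a description. It is an assumption-laden illustration: the `spec` layout and `render_dockerfile` are hypothetical, the base image and package names are placeholders rather than the environment used in the talk, and the `apt-get` line assumes a Debian or Ubuntu base.

```python
def render_dockerfile(spec):
    """Turn a small environment description into Dockerfile text."""
    lines = [f"FROM {spec['base_image']}"]
    if spec["system_packages"]:
        packages = " ".join(sorted(spec["system_packages"]))
        # Assumes a Debian/Ubuntu base image; another OS needs another installer.
        lines.append("RUN apt-get update && apt-get install -y " + packages)
    for command in spec["setup_commands"]:
        lines.append(f"RUN {command}")
    return "\n".join(lines) + "\n"


# Hypothetical environment: image tag, packages and command are placeholders.
spec = {
    "base_image": "ubuntu:14.04",
    "system_packages": ["r-base", "pandoc"],
    "setup_commands": [
        "Rscript -e \"install.packages('knitr', repos='https://cloud.r-project.org')\"",
    ],
}

dockerfile = render_dockerfile(spec)
print(dockerfile)
```

Porting to another OS then means changing the description and the one installer line rather than reverse-engineering a finished machine image, which is the adaptability argument made on the slide.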
27
Short list of provisioning systems Vagrant Chef Salt Puppet Ansible Many more – see link for info DOI: 10.6084/m9.figshare.1330219 Source: http://bit.ly/1wrYiuI
28
Images: release ALL the images with OMERO “And now for something completely different”
29
NO Phenotyping with microCT doi:10.1186/2047-217X-2-14
30
NO Phenotyping with microCT doi:10.1186/2047-217X-3-6
31
Hosting Images Image LIMS Links to GigaDB Can handle most formats Web embedding View online, no need for software Open Source www.openmicroscopy.org/site/products/omero
33
OMERO: providing access to imaging data View, filter, measure raw images with direct links from journal article. See all image data, not just cherry picked examples. Download and reprocess.
34
OMERO: Adding value http://jcb-dataviewer.rupress.org/
35
The alternative......look but don't touch
36
Thanks for listening! Acknowledgements GigaTeam – Scott Edmunds – Peter Li – Chris Hunter – Jesse Xiao – Nicole Edmunds – Laurie Goodman Where to get these slides FigShare DOI: