Tools for reproducible and accessible science
KnitR, VMs and OMERO
Rob Davidson
Cardiac Physiome Workshop, Auckland, April 8th 2015
DOI for this talk: /m9.figshare

Today’s message
Tools that fit with GigaDB
  – General purpose Research Object store
Enhancing
  – Accessibility
  – Reproducibility
Of some of your research objects
  – Software
  – Images

Problems with scientific software - reproducibility

Measuring software reproducibility
Systematic study: 515 papers (429 conference, 86 journal)
<30% reproducible

Measuring software reproducibility

Reasons for failure
“The good news is that I was able to find some code. I am just hoping that it is a stable working version of the code... I have lost some data... The bad news is that the code is not commented and/or clean. So, I cannot really guarantee that you will enjoy playing with it.”

Cost of failure
Wasted time
Wasted money
  – Ioannidis 2014
  – 85% of resources wasted
Frustration
Distrust
DOI: /journal.pmed

Literate programming - KnitR

Literate programming
Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to humans what we want the computer to do.
  – Donald E. Knuth, Literate Programming, 1984

Literate programming options
See listing: 3/1/19
  – R: KnitR, Sweave, R Markdown
  – JavaScript: Tangle, Active Markdown (CoffeeScript)
  – Python: IPython Notebooks
  – iReport links this functionality for Galaxy

KnitR is versatile
R, Python, Ruby, Haskell, Perl, SAS, CoffeeScript
.txt, LaTeX, HTML, D3.js, R Markdown, HTML5 slides, command line, any text?, WordPress

KnitR – how does it work?
Code chunks
  – Basic text (or LaTeX or Markdown), interrupted by ‘chunks’ of code
For LaTeX, similar to Sweave
  – Inline: …some text \Sexpr{rfunc(var)} more text…
  – Chunk: …some text <<…>>= Some …
Process this combined text/code with knit() in R

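The idea of processing combined text and code can be sketched outside R. The following is a minimal Python illustration, not knitr itself: a hypothetical `weave` function stands in for knit() and evaluates each inline `\Sexpr{...}` marker as a Python expression. The sample sentence and the variable `xs` are made up for the example.

```python
import re

# Matches inline markers of the form \Sexpr{expression} (no nested braces).
SEXPR = re.compile(r"\\Sexpr\{([^{}]*)\}")


def weave(source: str, env: dict) -> str:
    """Return source with every inline expression replaced by its value.

    Hypothetical stand-in for knit(); this is not how knitr is implemented.
    """
    def evaluate(match: re.Match) -> str:
        # eval is used only because the document is trusted, hand-written input.
        return str(eval(match.group(1), {}, env))

    return SEXPR.sub(evaluate, source)


# Text interrupted by code: the numbers in the output come from the data,
# so they cannot drift out of sync with the prose.
document = r"The mean of the sample is \Sexpr{sum(xs) / len(xs)} (n = \Sexpr{len(xs)})."
woven = weave(document, {"xs": [2, 4, 6, 8]})
print(woven)
```

Changing the values in `xs` and re-running regenerates the sentence, which is the property the slides rely on for reproducible reports.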
KnitR uses: easy to explain

KnitR uses: reproducible analysis
Can string different tools/languages together
Stores parameters
Just like a pipeline/workflow system
  – E.g. Galaxy, Taverna, KNIME
But also: codifies your figures…

KnitR uses: codified figures
Classic problems:
  – No description of error bars
  – No description of distributions
Admittedly this could be fixed by ‘proper’ peer review
Source code:

KnitR uses: codified figures
Code can be found quickly
  – Using text as markers
Plot can be altered
  – 1 line of code
New visualisation produced instantaneously
Better evaluation of results
Source code:

GigaScience KnitR example
“This article is an example of a literate programming document. It has been created in R using the knitr package. Figures and tables in this paper are generated dynamically as the document is compiled. Several R packages are required to run the analysis. Materials are archived in the Gigascience database”
DOI: / X-3-3

Environment wrappers - VMs

Measuring software reproducibility

Your environment
How hard would it be to start from scratch?
What if you move from Ubuntu to CentOS? Or just upgrade?
  – Dependencies / versions
  – System settings
Hard for you, horrendous for others!

Share your environment
Virtual machine
  – Copy your exact environment
  – If it works for you, it works for anyone
  – Reproducibility, frozen in time
DOI: / X-3-23

Share your environment
Docker
  – ‘Light’ VM
  – Discrete unit of code + environment
  – Can be called from command line
  – Can be linked together
New possibilities, e.g. nucleotid.es
  – Benchmarking -> “data-driven peer-review”?

Share your environment
Some concerns:
  – harmful.html
  – VM = black box?
  – Docker == black box!
Solution -> codify the environment

Codify your environment
Provisioning scripts are ‘research objects’
Improves adaptability (easier to recode for alternative OS etc.)
Builds in extra documentation
Easier to share
  – although GigaDB still wants a compiled snapshot (i.e. full machine)

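As a small first step towards codifying an environment, the versions it depends on can be captured by a script. This is a minimal sketch, assuming a Python-based analysis; `describe_environment` is a hypothetical helper, not part of any provisioning system named in this talk, and it only records the environment rather than rebuilding it.

```python
import json
import platform
from importlib import metadata


def describe_environment() -> dict:
    """Collect the OS, interpreter version and installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip installs whose metadata has no name
            packages[name] = dist.version
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        "packages": dict(sorted(packages.items())),
    }


# Write the snapshot as JSON so it can be archived next to the analysis code.
snapshot = describe_environment()
print(json.dumps(snapshot, indent=2))
```

The resulting JSON can be stored alongside a provisioning script so that others can compare their own setup against the one that produced the results.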
Short list of provisioning systems
Vagrant
Chef
Salt
Puppet
Ansible
Many more – see link for info
Source:

Images: release ALL the images with OMERO
“And now for something completely different”

NO Phenotyping with microCT
doi: / X-2-14

NO Phenotyping with microCT
doi: / X-3-6

Hosting Images
Image LIMS
MetaData!!!
Can handle most formats
Web embedding
View online, no need for software
Open Source

OMERO: providing access to imaging data
View, filter and measure raw images with direct links from the journal article.
See all image data, not just cherry-picked examples.
Download and reprocess.

OMERO: Adding value

The alternative... look but don't touch

Thanks for listening!
Acknowledgements
GigaTeam
  – Scott Edmunds
  – Peter Li
  – Chris Hunter
  – Jesse Xiao
  – Nicole Edmunds
  – Laurie Goodman
Where to get these slides
  – FigShare DOI: /m9.figshare