Published by Noel Park. Modified over 9 years ago.
1
Tools for reproducible and accessible science: KnitR, VMs and OMERO
Rob Davidson
Cardiac Physiome Workshop, Auckland, April 8th 2015
DOI for this talk: 10.6084/m9.figshare.1368774
2
Today’s message
Tools that fit with GigaDB
– General purpose Research Object store
Enhancing
– Accessibility
– Reproducibility
Of some of your research objects
– Software
– Images
3
Problems with scientific software - reproducibility
4
Measuring software reproducibility
Systematic study: 515 papers (429 conference, 86 journal)
<30% reproducible
http://reproducibility.cs.arizona.edu
5
Measuring software reproducibility
http://reproducibility.cs.arizona.edu
6
Reasons for failure
“The good news is that I was able to find some code. I am just hoping that it is a stable working version of the code... I have lost some data... The bad news is that the code is not commented and/or clean. So, I cannot really guarantee that you will enjoy playing with it.”
http://reproducibility.cs.arizona.edu
7
Cost of failure
Wasted time
Wasted money
– Ioannidis 2014
– 85% of resources wasted
Frustrating
Distrust
DOI: 10.1371/journal.pmed.1001747
8
Literate programming - KnitR
9
Literate programming
“Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.”
– Donald E. Knuth, Literate Programming, 1984
10
Literate programming options
See listing: http://www.gigasciencejournal.com/content/3/1/19
– R: KnitR, Sweave, R-Markdown
– JavaScript: Tangle, Active Markdown (CoffeeScript)
– Python: IPython Notebooks
– iReport links this functionality for Galaxy
11
KnitR is versatile
[Diagram labels: R, Python, Ruby, Haskell, Perl, SAS, CoffeeScript, .txt, LaTeX, HTML, D3.js, R Markdown, HTML5 slides, command line, WordPress, any text?]
12
KnitR – how does it work?
Code chunks
– Basic text (or LaTeX or Markdown), interrupted by ‘chunks’ of code
For LaTeX, similar to Sweave
Inline code:
…some text \Sexpr{rfunc(var)} more text…
Code chunk:
…some text
<<>>=
Some code
@
Process this combined text/code with knit() in R
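The chunk idea can be sketched in a few lines of Python. This is a toy illustration only, not how knitr is implemented: it assumes noweb-style chunks that open with `<<label>>=` and close with `@` on their own lines, and it runs Python rather than R inside them. The names `weave`, `CHUNK` and the sample `document` are hypothetical.

```python
import contextlib
import io
import re

# A chunk opens with <<label>>= and closes with @, each on its own line.
CHUNK = re.compile(r"^<<(.*?)>>=\n(.*?)^@[ \t]*\n?", re.DOTALL | re.MULTILINE)


def weave(source):
    """Replace each chunk with its code followed by the output it prints."""
    env = {}  # shared namespace, so later chunks can reuse earlier results

    def run_chunk(match):
        code = match.group(2)
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, env)  # toy only: never do this with untrusted text
        # Prefix every output line so it stands apart from the code.
        output = "".join(
            "## " + line for line in buffer.getvalue().splitlines(keepends=True)
        )
        return code + output

    return CHUNK.sub(run_chunk, source)


document = """Some text before the chunk.
<<mean-example>>=
values = [2, 4, 6]
print(sum(values) / len(values))
@
More text after the chunk.
"""

woven = weave(document)
print(woven)
```

The surrounding text passes through untouched; only the chunks are executed and replaced, which is the same division of labour the slide describes for knit().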
13
KnitR uses: easy to explain
http://reproducibility.cs.arizona.edu
14
KnitR uses: reproducible analysis
Can string different tools/languages together
Stores parameters
Just like a pipeline/workflow system
– E.g. Galaxy, Taverna, KNIME
But also: codifies your figures…
15
KnitR uses: codified figures
Classic problems:
– No description of error bars
– No description of distributions
Admittedly this could be fixed by ‘proper’ peer review
Source code: http://bit.ly/1NQZlHh
16
KnitR uses: codified figures
Code can be found quickly
Using text as markers
Plot can be altered
– 1 line of code
New visualisation produced instantaneously
Better evaluation of results
Source code: http://bit.ly/1NQZlHh
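A minimal sketch of what "codified" buys you, written in Python rather than R and using made-up numbers: the kind of error bar is an explicit, named setting, so a reader can see what the bars mean and switch them by editing one line. The data, `ERROR_BAR` and `summarise` are hypothetical and are not taken from the linked source code.

```python
import math
import statistics

# Hypothetical measurements for two groups; values are invented for illustration.
measurements = {
    "control": [4.1, 3.9, 4.4, 4.0],
    "treated": [5.2, 5.6, 4.9, 5.5],
}

# The one line a reader would change to redraw the summary with different bars.
ERROR_BAR = "sem"  # "sd" = standard deviation, "sem" = standard error of the mean


def summarise(values, error_bar):
    """Return (mean, half-width of the error bar) for one group."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    if error_bar == "sd":
        return mean, sd
    if error_bar == "sem":
        return mean, sd / math.sqrt(len(values))
    raise ValueError(f"unknown error bar type: {error_bar}")


summary = {
    group: summarise(values, ERROR_BAR) for group, values in measurements.items()
}
for group, (mean, half_width) in summary.items():
    n = len(measurements[group])
    print(f"{group}: {mean:.2f} +/- {half_width:.2f} ({ERROR_BAR}, n={n})")
```

Because the label printed next to each value comes from the same setting that computes it, the description of the error bars cannot drift away from what was actually plotted.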
17
GigaScience KnitR example
“This article is an example of a literate programming document. It has been created in R using the knitr package. Figures and tables in this paper are generated dynamically as the document is compiled. Several R packages are required to run the analysis. Materials are archived in the Gigascience database”
DOI: 10.1186/2047-217X-3-3
18
Environment wrappers - VMs
19
Measuring software reproducibility
http://reproducibility.cs.arizona.edu
20
Your environment
How hard would it be to start from scratch?
What if you move from Ubuntu to CentOS?
Or just upgrade?
Dependencies / Versions
System settings
Hard for you, horrendous for others!
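To get a feel for how much of an analysis depends on the environment, the sketch below records the operating system, interpreter version and installed package versions using only the Python standard library. It is an illustrative assumption, not a tool from the talk, and it covers only the Python side of a machine; system libraries and settings would need a VM image or provisioning script. The name `snapshot_environment` is hypothetical.

```python
import json
import platform
from importlib import metadata


def snapshot_environment():
    """Collect OS, interpreter and installed-package versions as a plain dict."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that lack a name
            packages[name] = dist.version
    return {
        "os": platform.platform(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "packages": dict(sorted(packages.items())),
    }


manifest = snapshot_environment()
# Saving this next to the results records what was installed when they were made.
print(json.dumps(manifest, indent=2))
```

Comparing two such manifests, for example before and after an upgrade, shows which versions changed underneath the analysis.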
21
Share your environment
Virtual machine
– Copy your exact environment
– If it works for you, it works for anyone
– Reproducibility, frozen in time
DOI: 10.1186/2047-217X-3-23
22
Share your environment
Docker
– ‘Light’ VM
– Discrete unit of code + environment
– Can be called from command line
– Can be linked together
New possibilities, e.g. nucleotid.es
– Benchmarking -> “data-driven peer-review”?
http://nucleotid.es/
23
Share your environment
Some concerns:
– http://ivory.idyll.org/blog/vms-considered-harmful.html
– VM = black box?
– Docker == black box!
Solution -> codify the environment
24
Codify your environment
Provisioning scripts are ‘research objects’
Improves adaptability (easier to recode for alternative OS etc.)
Builds in extra documentation
Easier to share
– although GigaDB still wants a compiled snapshot (i.e. full machine)
25
Short list of provisioning systems
– Vagrant
– Chef
– Salt
– Puppet
– Ansible
Many more: see link for info
Source: http://bit.ly/1wrYiuI
26
Images: release ALL the images with OMERO
“And now for something completely different”
27
NO Phenotyping with microCT
DOI: 10.1186/2047-217X-2-14
28
NO Phenotyping with microCT
DOI: 10.1186/2047-217X-3-6
29
Hosting Images
Image LIMS
Metadata!!!
Can handle most formats
Web embedding
View online, no need for software
Open Source
www.openmicroscopy.org/site/products/omero
30
www.openmicroscopy.org/site/products/omero
31
OMERO: providing access to imaging data
View, filter and measure raw images via direct links from the journal article.
See all image data, not just cherry-picked examples.
Download and reprocess.
32
OMERO: Adding value
http://jcb-dataviewer.rupress.org/
33
The alternative... look but don't touch
34
Thanks for listening!
Acknowledgements
GigaTeam
– Scott Edmunds
– Peter Li
– Chris Hunter
– Jesse Xiao
– Nicole Edmunds
– Laurie Goodman
Where to get these slides
FigShare DOI:
– 10.6084/m9.figshare.1368774
http://bit.ly/1JmnRiU