Tools for reproducible and accessible science KnitR, VMs and OMERO Rob Davidson Cardiac Physiome Workshop Auckland, April 8th 2015 DOI for this talk: 10.6084/m9.figshare.1368774.

Presentation transcript:


Today’s message
Tools that fit with GigaDB
– General-purpose Research Object store
Enhancing
– Accessibility
– Reproducibility
Of some of your research objects
– Software
– Images

Problems with scientific software - reproducibility

Measuring software reproducibility
Systematic study: 515 papers (429 conference, 86 journal)
<30% reproducible

Measuring software reproducibility

Reasons for failure
“The good news is that I was able to find some code. I am just hoping that it is a stable working version of the code... I have lost some data... The bad news is that the code is not commented and/or clean. So, I cannot really guarantee that you will enjoy playing with it.”

Cost of failure
Waste time
Waste money
– Ioannidis 2014
– 85% resources wasted
Frustrating
Distrust
DOI: /journal.pmed

Literate programming - KnitR

Literate programming
Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to humans what we want the computer to do.
– Donald E. Knuth, Literate Programming, 1984

Literate programming options
See listing: 3/1/19
– R: KnitR, Sweave, R Markdown
– JavaScript: Tangle, Active Markdown (CoffeeScript)
– Python: IPython Notebooks
– iReport links this functionality for Galaxy

KnitR is versatile
R, Python, Ruby, Haskell, Perl, SAS, CoffeeScript, .txt, LaTeX, HTML, D3.js, R Markdown, HTML5 slides, command line, any text?, WordPress

KnitR – how does it work?
Code chunks
– Basic text (or LaTeX or Markdown), interrupted by ‘chunks’ of code
For LaTeX, similar to Sweave
…some text \Sexpr{rfunc(var)} more text…
…some text >= Some
Process this combined text/code with knit() in R
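
The inline-expression idea on this slide can be illustrated with a toy sketch. This is not knitr and not R: it is a minimal Python analogue, assuming a hypothetical helper named `weave` and reusing only the `\Sexpr{...}` marker quoted above. Real knitr also handles multi-line chunks, figures, and caching, none of which is attempted here.

```python
import re

# Matches inline expressions written with the \Sexpr{...} marker from the slide.
INLINE = re.compile(r"\\Sexpr\{([^}]*)\}")


def weave(text, variables):
    """Replace each inline expression in text with its evaluated result.

    Toy illustration only: eval() is unsafe on untrusted input.
    """
    def evaluate(match):
        # Evaluate the expression with only the supplied variables in scope.
        return str(eval(match.group(1), {"__builtins__": {}}, dict(variables)))

    return INLINE.sub(evaluate, text)


# Hypothetical document text and values, for illustration only.
document = r"We measured \Sexpr{n} samples with a mean of \Sexpr{total / n}."
print(weave(document, {"n": 4, "total": 10}))
```

The point of the design is that the numbers in the prose are computed when the document is built, so text and results cannot drift apart.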

KnitR uses: easy to explain

KnitR uses: reproducible analysis
Can string different tools/languages together
Stores parameters
Just like a pipeline/workflow system
– E.g. Galaxy, Taverna, KNIME
But also: codifies your figures…

KnitR uses – codified figures
Classic problems:
– No description of error bars
– No description of distributions
Admittedly this could be fixed by ‘proper’ peer review
Source code:
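
As a hedged illustration of what codifying a figure can mean, the sketch below computes a summary and writes the meaning of the error bars into the caption itself. It is a Python stand-in, not the talk's R code; the function name `describe` and the sample values are hypothetical.

```python
import math
import statistics


def describe(values, label):
    """Return the mean, the standard error, and a caption that names both."""
    mean = statistics.mean(values)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    sem = statistics.stdev(values) / math.sqrt(len(values))
    caption = (
        f"{label}: mean = {mean:.2f}, "
        f"error bars = +/- 1 SEM ({sem:.2f}), n = {len(values)}"
    )
    return mean, sem, caption


# Hypothetical measurements, for illustration only.
mean, sem, caption = describe([1.0, 2.0, 3.0, 4.0, 5.0], "Group A")
print(caption)
```

Because the caption is generated from the same code that produces the numbers, a reader never has to guess whether the bars show standard deviation, standard error, or something else.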

KnitR uses: codified figures
Code can be found quickly
– Using text as markers
Plot can be altered
– 1 line of code
New visualisation produced instantaneously
Better evaluation of results
Source code:

GigaScience KnitR example
“This article is an example of a literate programming document. It has been created in R using the knitr package. Figures and tables in this paper are generated dynamically as the document is compiled. Several R packages are required to run the analysis. Materials are archived in the Gigascience database”
DOI: / X-3-3

Environment wrappers - VMs

Measuring software reproducibility

Your environment
How hard would it be to start from scratch?
What if you move from Ubuntu to CentOS? Or just upgrade?
Dependencies / Versions
System settings
Hard for you, horrendous for others!

Share your environment
Virtual machine
– Copy your exact environment
– If it works for you, it works for anyone
– Reproducibility, frozen in time
DOI: / X-3-23

Share your environment
Docker
– ‘Light’ VM
– Discrete unit of code+environment
– Can be called from command line
– Can be linked together
New possibilities e.g. nucleotid.es
– Benchmarking -> “data-driven peer-review”?

Share your environment
Some concerns:
– harmful.html
– VM = black box?
– Docker == black box!
Solution -> codify the environment

Codify your environment
Provisioning scripts are ‘research objects’
Improves adaptability (easier to recode for alternative OS etc.)
Builds in extra documentation
Easier to share
– although GigaDB still wants a compiled snapshot (i.e. full machine)
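
A first step towards codifying an environment is simply recording it in a form that can be shared and compared. The sketch below is a minimal Python example using only the standard library; the function name `snapshot_environment` is hypothetical, and it covers only the Python side of a system. A real provisioning script (Vagrant, Ansible and similar) goes further and rebuilds the environment rather than just describing it.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment():
    """Collect OS, Python, and installed-package versions as a plain dict."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with missing metadata
            packages[name] = dist.version
    return {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "packages": dict(sorted(packages.items())),
    }


if __name__ == "__main__":
    # Print the snapshot as JSON so it can be archived next to the analysis.
    print(json.dumps(snapshot_environment(), indent=2))
```

Archiving this output alongside the code gives others a checklist of dependencies and versions, even when a full machine snapshot is also deposited.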

Short list of provisioning systems
– Vagrant
– Chef
– Salt
– Puppet
– Ansible
Many more – see link for info
Source:

Images: release ALL the images with OMERO
“And now for something completely different”

NO Phenotyping with microCT doi: / X-2-14

NO Phenotyping with microCT doi: / X-3-6

Hosting Images
Image LIMS
MetaData!!!
Can handle most formats
Web embedding
View online, no need for software
Open Source


OMERO: providing access to imaging data
View, filter, measure raw images with direct links from journal article.
See all image data, not just cherry-picked examples.
Download and reprocess.

OMERO: Adding value

The alternative... look but don't touch

Thanks for listening!
Acknowledgements
GigaTeam
– Scott Edmunds
– Peter Li
– Chris Hunter
– Jesse Xiao
– Nicole Edmunds
– Laurie Goodman
Where to get these slides
– FigShare DOI: /m9.figshare