Reproducible Bioinformatics: Bioconda and BioContainers enabling sustainable bioinformatics infrastructure
Yasset Perez-Riverol, PhD
GitHub: github.com/ypriverol
Twitter: @ypriverol
Bioconda/BioContainers community
Outline
- Reproducibility in science
- What is a container, and why are containers so popular?
- BioContainers architecture
- Future directions
Bioinformatics Software: The Reproducibility/Usability Challenge
[Diagram labels: Publication, Software, Research, My Data (?). Issues shown: dependency issues, versioning, testing/integration.]
Learning from Some Communities: Bioconductor
[Diagram labels: Publication, Software (R package), Research, My Data (?).]
Implementation of guidelines:
- Versioning
- Testing
- Dependency management
- Documentation
Matrix of Hell
The Container Solution
Docker Architecture
Why are containers so popular?
- Build once, run anywhere.
- A clean, safe and portable runtime environment for your app.
- No worries about missing dependencies, packages and other pain points during subsequent deployments.
- Run each app in its own isolated container.
- Automate testing, integration, packaging... anything you can script.
- VM-like isolation without the overhead of a VM.
Current BioContainers Architecture
Dockerfile Container

    # Base Image
    FROM biocontainers/biocontainers:latest

    # Metadata
    LABEL base.image="biocontainers:latest"
    LABEL version="3"
    LABEL software="Comet"
    LABEL software.version="2016012"
    LABEL description="an open source tandem mass spectrometry sequence database search tool"
    LABEL website="http://comet-ms.sourceforge.net/"
    LABEL documentation="http://comet-ms.sourceforge.net/parameters/parameters_2016010/"
    LABEL license="http://comet-ms.sourceforge.net/"
    LABEL tags="Proteomics"

    # Maintainer
    MAINTAINER Felipe da Veiga Leprevost <felipe@leprevost.com.br>

    USER biodocker

    RUN ZIP=comet_binaries_2016012.zip && \
        wget https://github.com/BioDocker/software-archive/releases/download/Comet/$ZIP -O /tmp/$ZIP && \
        unzip /tmp/$ZIP -d /home/biodocker/bin/Comet/ && \
        chmod -R 755 /home/biodocker/bin/Comet/* && \
        rm /tmp/$ZIP

    RUN mv /home/biodocker/bin/Comet/comet_binaries_2016012/comet.2016012.linux.exe /home/biodocker/bin/Comet/comet

    ENV PATH /home/biodocker/bin/Comet:$PATH

    WORKDIR /data/

    CMD ["comet"]
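To make the Dockerfile-to-container step concrete, here is a minimal sketch of how the Comet Dockerfile might be built and run locally. The image tag `comet-example`, the helper names `run` and `build_and_run_comet`, and the `DRY_RUN` switch are illustrative assumptions, not part of BioContainers. It assumes the Dockerfile sits in the current directory and that Docker is installed.

```shell
# Sketch only: "comet-example" is a hypothetical local tag, not an official image name.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# build_and_run_comet [TAG] [DATA_DIR]
# Builds the image from ./Dockerfile, then starts a container from it.
build_and_run_comet() {
  local tag="${1:-comet-example}"   # hypothetical local image tag
  local data_dir="${2:-$PWD}"       # host directory mounted at /data (the WORKDIR)
  run docker build -t "$tag" . &&
  run docker run --rm -v "$data_dir":/data "$tag"
}

# Usage (commented out so that reading in this file has no side effects):
#   DRY_RUN=1 build_and_run_comet      # only show the commands
#   build_and_run_comet                # build and run for real
```

With no extra arguments, `docker run` falls back to the image's `CMD`, which in the Dockerfile above is `comet`. Mounting a host directory at `/data` matches the `WORKDIR`, so input files placed there are visible to the tool.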
Tool deployment & sustainability in science
What is needed?
- Programming-language agnostic
- OS independent
- No root privileges needed
- Management of multiple versions
- HPC and cloud compatible
- Easy to maintain
Tool deployment & sustainability in science
Conda is an open-source package and environment management system for installing multiple versions of software packages and their dependencies, and for switching easily between them.
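As a rough illustration of "multiple versions, easy switching", the sketch below creates one Conda environment per tool version from the Bioconda channel. The helper names `run` and `create_tool_env`, the `DRY_RUN` switch, and the version numbers in the usage comments are assumptions for illustration. Check which versions actually exist with `conda search -c bioconda <tool>`.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# create_tool_env TOOL VERSION
# Creates an isolated environment named TOOL-VERSION holding exactly that version.
create_tool_env() {
  local tool="$1" version="$2"
  run conda create -y -n "${tool}-${version}" \
      -c conda-forge -c bioconda "${tool}=${version}"
}

# Usage (version numbers are illustrative assumptions):
#   create_tool_env bedtools 2.26.0
#   create_tool_env bedtools 2.27.1
#   conda activate bedtools-2.26.0    # switch to one version
#   conda activate bedtools-2.27.1    # switch to the other
```

Because each environment lives in the user's own Conda installation, no root privileges are needed, which is one of the requirements listed for tool deployment.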
BioContainers: Mulled-based Containers
Meeting, August 2017
But the new cool kid is called containers
Get all the other nice technologies for free:
- rkt
- Singularity
- modules
How to find a container
http://biocontainers.pro/registry/#/
Namespaces
Namespace for Dockerfile-based containers:
    docker pull biocontainers/blast
Namespace for Dockerfile-free (mulled) containers:
    docker pull quay.io/biocontainers/bedtools
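The two namespaces can be captured in a small helper that builds the image reference to pull. This is a sketch: the function name `image_ref` and the kind labels `dockerfile` and `mulled` are hypothetical, and the tag is left as a placeholder to look up in the registry. Images under quay.io/biocontainers are typically published under explicit version tags, so a tag may be required there.

```shell
# image_ref KIND TOOL [TAG]
# KIND is "dockerfile" (Docker Hub namespace) or "mulled" (quay.io namespace).
# Prints the image reference; the tag is appended only when one is given.
image_ref() {
  local kind="$1" tool="$2" tag="${3:-}"
  case "$kind" in
    dockerfile) printf 'biocontainers/%s' "$tool" ;;
    mulled)     printf 'quay.io/biocontainers/%s' "$tool" ;;
    *)          echo "unknown kind: $kind" >&2; return 1 ;;
  esac
  if [ -n "$tag" ]; then
    printf ':%s' "$tag"
  fi
  printf '\n'
}

# Usage (TAG is a placeholder; look up a real tag in the registry first):
#   docker pull "$(image_ref dockerfile blast)"
#   docker pull "$(image_ref mulled bedtools "$TAG")"
```

Keeping the namespace logic in one place means a workflow only has to name the tool and the kind of container it wants.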
Who is using it?
What is next: MultiContainers http://biocontainers.pro/multi-package-containers/
Some Numbers
- More than 2000 containers.
- Used in production by Galaxy, Phenomenal2020, CyVerse and OSG.
- 210 issues discussed.
- More than 30 contributors.
Relevant Links
- http://biocontainers.pro
- http://biocontainers.pro/registry
- http://github.com/BioContainers/