Transforming Science Through Data-driven Discovery
Bringing your Bioinformatics tools to CyVerse’s Discovery Environment using Docker
Upendra Kumar Devisetty

Upendra Kumar Devisetty, Science Informatician

Outline
- Overview of CyVerse
- Overview of the CyVerse Discovery Environment (DE)
- Overview of Docker technology
- Bringing tools to the Discovery Environment using Docker
  - How Docker can help bring bioinformatics tools to the DE
  - Benefits of running your software in the DE
  - Process of Dockerizing tools in the DE
  - How to get started
  - Word of caution

Evolution of CyVerse
- iPlant (2008): Empowering a New Plant Biology
- iPlant (2013): Cyberinfrastructure for Life Science
- CyVerse (2016): Transforming Science Through Data-Driven Discovery

Overview of CyVerse
- We are funded by the National Science Foundation (DBI and DBI)
- We are your colleagues and collaborators!
- $100 million in investment
- Freely available to the community
- Spurs national and international collaboration
- Cite CyVerse: CyVerse.org/acknowledge-cite-cyverse

CyVerse 2016: Transforming Science Through Data-Driven Discovery
- Vision: Transforming science through data-driven discovery
- Mission: Design, develop, deploy, and expand a national cyberinfrastructure for life science research, and train scientists in its use
- More than 40K users, petabytes of data, and hundreds of publications, courses, and discoveries

What is cyberinfrastructure?
- Platforms, tools, datasets
- Storage and compute, HPC
- Training and support
- People
Cyberinfrastructure (CI) provides solutions to challenges in large-scale computational science that were previously unapproachable because the computational requirements were too large, too complex, or simply unknown.

CyVerse supports all domains of life science: plant, microbial, animal, biomedical, and ecological/climate. CyVerse is built for data.

CyVerse architecture (ranging from ease of use to flexibility)
- Ready-to-use platforms
- Foundational capabilities
- Established CI components
- Extensible services

CyVerse products: from plant science, to life science, and beyond…
- Data Store: the resources you need to share and manage data with your lab, colleagues, and community
- Discovery Environment: hundreds of bioinformatics apps in an easy-to-use interface
- Atmosphere: cloud computing for the life sciences
- Science APIs: fully customize CyVerse resources
- DNA Subway: educational workflows for genomes, DNA barcoding, and RNA-Seq
- BisQue: image analysis, management, and metadata

Discovery Environment
- Hundreds of bioinformatics apps in an easy-to-use interface
- A platform that can run almost any bioinformatics application
- Seamlessly integrated with data and high-performance computing
- User extensible: add your own applications
- Supports the bioinformatics workflow: data management, analysis, and sharing of large datasets

Discovery Environment Overview: access your computational science through a single portal.

Discovery Environment Overview: Data (manage data)
- Upload and download files and folders
- Share files via URL (public links)
- Share files and folders with other users

Discovery Environment Overview: Apps (analyze data and customize applications)
- Run hundreds of bioinformatics apps
- Build automated workflows
- Modify apps or integrate new ones

Discovery Environment Overview: Analyses
- Monitor job status and find results
- Cancel or relaunch jobs
- Detailed job history: view history, find results, reproduce analyses, optimize parameters

Benefits: get science done, with reproducibility and productivity
- Use hundreds of bioinformatics apps without the command line
- Add your own applications: an extensible, scalable platform
- Create and publish apps and workflows so anyone can use them
- Analysis history and provenance: “avoid forensic bioinformatics”
- High-performance computing: not dependent on your hardware
- Manage a secure data repository and share data easily

Discovery Environment Overview: user perspectives and possible applications
Personas: bench scientist, bioinformatician, core facilities
- Does most of his data uploads, downloads, and sharing here
- Pushes results from his lab’s workflow into a common folder
- Installed an HPC application here so that anyone can use it
- Creates custom applications with default parameters exposed
- Developed a workflow to QC and filter reads for his users
- Teaches about genome assembly with examples in the DE
Persona images based on: Bioinformatics Curriculum Guidelines: Toward a Definition of Core Competencies, PLOS Computational Biology, DOI: /journal.pcbi

[Figure: Simple Formula for Success]

The reality
- Languages and tools: Excel, R, Perl, Python, ArcGIS, Java, Ruby, Fortran, C, C#, C++, MATLAB, etc.
- Compute platforms: Amazon, Azure, Rackspace, campus HPC, XSEDE, etc.
- …and lots of glue

[Figure: Simple Formula]

Docker, a form of operating-system-level virtualization for software distribution, has revolutionized the way scientific software and all of its dependencies are packaged, distributed, and deployed. Docker turns the complex, time-consuming installation procedures that scientific software often needs into a one-time process. It enables platform-independent installation, easy versioning and redeployment of software, and reproducibility across environments and versions. This makes Docker an ideal candidate for deploying software on different compute environments (XSEDE, Amazon AWS, etc.).

Container technology: what is it about?
It allows you to create a self-contained package that contains:
- A specific operating system version (say, Ubuntu)
- Your application
- All of the parts your application needs (such as libraries and other dependencies)
You can share this package with other users, and it can run on any computing system that supports container technology, regardless of that system's own operating system version. (Linux containers share the host's kernel; on macOS and Windows, Docker supplies one through a lightweight virtual machine.)
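Container images are built as stacked layers (base operating system, then libraries, then your application), and the running container sees them merged into a single file system in which the topmost layer wins. The minimal Python sketch below models that merge with `collections.ChainMap`. The layer contents are hypothetical, and real union file systems also handle deletions (whiteout files) and copy-on-write, which this sketch ignores.

```python
from collections import ChainMap


def union_view(layers):
    """Merge image layers, listed bottom (base OS) to top (application).

    ChainMap searches its mappings left to right, so the list is reversed
    to let the topmost layer win when the same path appears in several layers.
    Deletions (whiteout files) and copy-on-write are not modelled.
    """
    return ChainMap(*reversed(layers))


# Hypothetical layer contents: path -> file content
base_os = {"/etc/os-release": "Ubuntu", "/usr/bin/python": "python 2.7"}
libraries = {"/usr/lib/libz.so": "zlib"}
application = {"/usr/local/bin/mytool": "mytool v1.0",
               "/usr/bin/python": "python 2.7 (patched)"}

image = union_view([base_os, libraries, application])
print(image["/usr/local/bin/mytool"])  # mytool v1.0
print(image["/usr/bin/python"])        # python 2.7 (patched): top layer wins
print(sorted(image))                   # every path visible inside the container
```

Because each layer stays a separate mapping, a base layer can be shared unchanged by many images, which is why building on a common base image keeps containers light.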

How does it work together?

What happens when you run a job in the DE?
CyVerse has adopted Docker for integrating software that runs on the DE's compute cluster (Condor).
1. Condor looks for a machine that matches your criteria (RAM, CPU, disk space).
2. Once it finds a suitable match, a data-placement container runs and brings the data you want to operate on from the Data Store to that node.
3. Your app (a Docker container) runs, with the data visible to it as a union file system.
4. A data-placement container returns the results to the Data Store.
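The job flow described above can be sketched as a toy Python simulation: pick the first node whose resources satisfy the request, stage the inputs from a data store onto it, run the app, and copy the outputs back. The `Node`, `Job`, `find_match`, and `run_job` names, the first-fit matching, and the dictionaries standing in for the Data Store are all hypothetical; Condor's real matchmaking (ClassAds, ranking, queues) and the DE's data-placement containers are considerably more involved.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    name: str
    ram_gb: int
    cpus: int
    disk_gb: int
    files: Dict[str, str] = field(default_factory=dict)  # node-local scratch space


@dataclass
class Job:
    ram_gb: int
    cpus: int
    disk_gb: int
    inputs: List[str]
    app: Callable[[Dict[str, str]], Dict[str, str]]  # stands in for the Docker container


def find_match(nodes: List[Node], job: Job) -> Optional[Node]:
    """First-fit match on RAM, CPU and disk space (a simplification of Condor)."""
    for node in nodes:
        if (node.ram_gb >= job.ram_gb and node.cpus >= job.cpus
                and node.disk_gb >= job.disk_gb):
            return node
    return None


def run_job(nodes: List[Node], job: Job, data_store: Dict[str, str]) -> str:
    node = find_match(nodes, job)
    if node is None:
        raise RuntimeError("no node matches the requested RAM/CPU/disk")
    # 1. Data placement: copy the inputs from the data store onto the node.
    for path in job.inputs:
        node.files[path] = data_store[path]
    # 2. Run the app with only its staged inputs visible.
    outputs = job.app({path: node.files[path] for path in job.inputs})
    # 3. Data placement: return the outputs to the data store.
    data_store.update(outputs)
    return node.name


def count_bases(files: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical app: write the length of reads.fa to a new file."""
    return {"reads.len.txt": str(len(files["reads.fa"]))}


if __name__ == "__main__":
    store = {"reads.fa": "ACGTACGT"}
    cluster = [Node("small", ram_gb=4, cpus=2, disk_gb=50),
               Node("large", ram_gb=64, cpus=16, disk_gb=500)]
    job = Job(ram_gb=16, cpus=4, disk_gb=100, inputs=["reads.fa"], app=count_bases)
    print(run_job(cluster, job, store))  # large
    print(store["reads.len.txt"])        # 8
```

The app only ever sees the files staged for it, which mirrors the point of the data-placement steps: the container itself holds no data between jobs.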

The Process for Dockerizing Tools in the DE
Dockerfile -> Docker image -> DE app (roles: user, CyVerse staff)

Benefits of running your software in the DE
1. Integrating a Dockerized tool into the DE enables users to begin creating apps built on the tool.
2. Because Dockerized apps are lightweight and use fewer resources, their analyses can run more quickly.
3. Compared to the previous method for tool integration in the DE, this method streamlines the process and makes it more likely that the final DE app will function as the user intended.
4. It also increases the likelihood that more complicated, hard-to-install software can be used in the DE.
5. You can use your Dockerized apps in both the CyVerse Discovery Environment and Atmosphere.
6. If you are a developer, write the occasional script, teach classes, have a collaborative project, or are publishing a paper that uses a specific workflow, you can:
   a. Share a specific app.
   b. Share a specific version of an app.
   c. Share a whole analysis pipeline.

How to get started
1. Set up Docker on your local machine (Windows, macOS, Linux) or use Atmosphere.
2. Plan your steps, i.e., what you want to do.
3. Carry out those steps and verify that things work.
4. Create a Dockerfile from those steps.
5. Submit a request for a “new tool”.
6. Once you hear back, design your interface (and profit).
Detailed instructions with videos, manuals, and documentation:
- F1000 publication:
- Focus forum webinar:
- CyVerse wiki:
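Steps 3 and 4 above (verify the commands, then capture them in a Dockerfile) can be partly automated. The Python sketch below turns a list of shell commands you have already tested by hand into Dockerfile text. `make_dockerfile` is a hypothetical helper, and the base image and package (`ubuntu:14.04`, `samtools`) are placeholder choices, not CyVerse requirements.

```python
import json
from typing import List, Optional


def make_dockerfile(base_image: str, steps: List[str], entrypoint: List[str],
                    maintainer: Optional[str] = None) -> str:
    """Build Dockerfile text from install commands already verified by hand.

    All steps are chained into a single RUN instruction so they produce one
    image layer, and the entrypoint uses the JSON (exec) form.
    """
    lines = [f"FROM {base_image}"]
    if maintainer:
        lines.append(f'LABEL maintainer="{maintainer}"')
    if steps:
        # " && \" plus a newline continues the RUN instruction onto the next line.
        lines.append("RUN " + " && \\\n    ".join(steps))
    lines.append("ENTRYPOINT " + json.dumps(entrypoint))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_dockerfile(
        base_image="ubuntu:14.04",  # pin a version tag rather than "latest"
        steps=["apt-get update", "apt-get install -y samtools"],
        entrypoint=["samtools"],
    ))
```

The example prints a three-instruction Dockerfile: `FROM ubuntu:14.04`, one `RUN` instruction chaining the two `apt-get` commands, and `ENTRYPOINT ["samtools"]`. Keeping `apt-get update` and `apt-get install` in the same `RUN` instruction is standard Docker practice, because a separately cached update layer can leave the install step working from stale package lists.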

Word of caution
- Containers are very powerful and have many bells and whistles; choose only the parts you really need!
- Avoid storing data inside containers.
- Keep containers light and nimble: build on provided base images from a trusted source (iPlant prefers Ubuntu 14.X and CentOS 7.X from Docker Hub).
- Do not trust an app without a Dockerfile: it is not easy to recreate and is a black box, which is bad for reproducibility.
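Some of these cautions can be checked mechanically before you submit a tool. The Python sketch below scans Dockerfile text and flags a base image outside an allow-list, a base image without a pinned version tag, and `COPY`/`ADD` instructions that appear to bake sequence data into the image. The allow-list (`TRUSTED_BASES`) and the data file extensions (`DATA_SUFFIXES`) are assumptions chosen for illustration, and the line-by-line parsing is deliberately simple: it ignores continuation lines, registry hosts, and instruction flags.

```python
from typing import List

# Assumed policy for illustration: official Ubuntu and CentOS images only.
TRUSTED_BASES = {"ubuntu", "centos"}
# Assumed extensions that suggest data is being copied into the image.
DATA_SUFFIXES = (".fastq", ".fq", ".fasta", ".fa", ".bam", ".sam", ".vcf")


def check_dockerfile(text: str) -> List[str]:
    """Return warnings for a Dockerfile; an empty list means no issues found."""
    warnings = []
    saw_from = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        instruction = parts[0].upper()
        if instruction == "FROM":
            saw_from = True
            image = parts[1] if len(parts) > 1 else ""
            repo, _, tag = image.partition(":")
            if repo not in TRUSTED_BASES:
                warnings.append(f"base image '{image}' is not on the trusted list")
            if tag in ("", "latest"):
                warnings.append(f"base image '{image}' has no pinned version tag")
        elif instruction in ("COPY", "ADD"):
            # Every argument except the last is a source path.
            for src in parts[1:-1]:
                if src.lower().endswith(DATA_SUFFIXES):
                    warnings.append(
                        f"{instruction} {src}: keep data in the Data Store, not in the image")
    if not saw_from:
        warnings.append("no FROM instruction found")
    return warnings


if __name__ == "__main__":
    good = 'FROM ubuntu:14.04\nRUN apt-get update && apt-get install -y samtools\nENTRYPOINT ["samtools"]\n'
    bad = "FROM someuser/blast\nCOPY reads.fastq /data/\n"
    print(check_dockerfile(good))       # []
    print(len(check_dockerfile(bad)))   # 3
```

A check like this cannot replace the last caution: it only helps if a Dockerfile exists to be read in the first place.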

CyVerse Executive Team: Parker Antin, Nirav Merchant, Eric Lyons, Matt Vaughn, Doreen Ware, Dave Micklos
CyVerse is supported by the National Science Foundation under Grant No. DBI and DBI.