University of Chicago and ANL

Presentation transcript:

Experiences in Building Globus Genomics Using Galaxy, Globus Online and AWS
Ravi K Madduri
University of Chicago and ANL

Outline
  Challenges in Sequencing Analysis
  Proposed Approach Using Globus Genomics
  Example Collaborations
  Relevance to XSEDE
  Q&A

Challenges in Sequencing Analysis
  Data Movement and Access Challenges
    Data is distributed in different locations
    Research labs need access to the data for analysis
    Be able to share data with other researchers/collaborators
    Inefficient ways of data movement
    Data needs to be available on the local and distributed compute resources: local clusters, cloud, grid
  Manual Data Analysis
    How do we analyze the sequence data once we have it?
    Manually move the data to the compute node
    Install all the tools required for the analysis: BWA, Picard, GATK, filtering scripts, etc.
    Shell scripts to sequentially execute the tools
    Manually modify the scripts for any change
    Error prone, difficult to keep track of, messy
    Difficult to maintain and transfer the knowledge
  [Diagram: public data, sequencing centers, and research lab storage feed a local cluster/cloud over FTP, SCP, and HTTP; tools (Picard, GATK) are installed there and scripts are modified and (re)run to take Fastq and reference genome inputs through alignment and variant calling.]
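The manual approach listed above (scripts that execute the tools one after another) can be sketched roughly as follows. This is only an illustration: the slides do not give the real command lines, so each step is a hypothetical stand-in that runs a trivial Python command, and `STEPS` and `run_pipeline` are names invented for this sketch.

```python
import subprocess
import sys

# Hypothetical stand-ins for the real tool invocations (alignment, variant
# calling). The actual command lines are not given in the slides, so each
# step only runs a trivial Python command to keep the sketch runnable.
STEPS = [
    ("alignment", [sys.executable, "-c", "print('aligned')"]),
    ("variant_calling", [sys.executable, "-c", "print('variants called')"]),
]


def run_pipeline(steps):
    """Run the steps in order and stop at the first failure.

    Returns a list of (step name, exit code) for every step that was started.
    """
    results = []
    for name, cmd in steps:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results.append((name, proc.returncode))
        if proc.returncode != 0:
            # Later steps depend on this output, so do not continue.
            break
    return results


if __name__ == "__main__":
    for name, code in run_pipeline(STEPS):
        print(f"{name}: exit code {code}")
```

Every change to a tool, parameter, or input path means editing a script like this by hand, which is the maintenance burden the slide describes.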

Globus Genomics
  Data Management
    Globus Online provides a high-performance, fault-tolerant, secure file transfer service between all data endpoints
  Data Analysis
    Galaxy-based workflow management system
    Globus Online integrated within Galaxy
    Web-based UI
    Drag-and-drop workflow creation
    Easily modify workflows with new tools
    Analytical tools are automatically run on the scalable compute resources when possible
  [Diagram: public data, sequencing centers, research lab storage, and a local cluster/cloud connect through Globus Online endpoints (FTP, SCP, HTTP, others) to Galaxy data libraries in Globus Genomics on Amazon EC2, where Fastq and reference genome inputs pass through alignment and variant calling (Picard, GATK).]

Globus Genomics
  Workflows can be easily defined and automated with integrated Galaxy platform capabilities
  Data movement is streamlined with integrated Globus file-transfer functionality
  Resources can be provisioned on demand with Amazon Web Services cloud-based infrastructure
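One way to picture a defined workflow is as a dependency graph whose steps run once their inputs are ready. The sketch below is not how Galaxy represents or schedules workflows; it is a minimal illustration that uses Python's standard `graphlib` module (Python 3.9+), with step names taken from the diagram labels in this deck and a hypothetical `execution_order` helper.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps whose outputs it needs.
# Step names follow the diagram labels; the structure is an assumption.
WORKFLOW = {
    "fastq": set(),
    "ref_genome": set(),
    "alignment": {"fastq", "ref_genome"},
    "variant_calling": {"alignment"},
}


def execution_order(workflow):
    """Return one valid order in which the steps can run."""
    return list(TopologicalSorter(workflow).static_order())


if __name__ == "__main__":
    print(execution_order(WORKFLOW))
```

Adding a new tool then amounts to adding one entry and its dependencies rather than rewriting a script.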

Additional Capabilities
  Professionally managed and supported platform
  Best practice pipelines
  Enhanced workbench with breadth of analytic tools
  Technical support and bioinformatics consulting
  Access to pre-integrated endpoints for reliable and high-performance data transfer (e.g. Broad Institute, Perkin Elmer, etc.)
  Cost-effective solution with subscription-based pricing

Globus Genomics – A flexible, scalable, simplified analysis platform
  Accessibility
    Unified web interface for obtaining genomic data and applying computational tools to analyze the data
    Easily integrate your own tools and scripts for analysis (CLI-based tools)
    Collection of tools (Tools Panel) that reflect good practices and community insights
    Access every step of analysis and intermediate results: View, Download, Visualize, Reuse (History Panel)
  Reproducibility
    Track provenance and ensure repeatability of each analysis step: input datasets, tools used, parameter values, and output datasets
    Annotate each step or collection of steps to track and reproduce results
    Intuitive Workflow Editor to create or modify complex workflows and use them as templates: reusable and reproducible templates
  Transparency
    Publish and share metadata, histories, and workflows at multiple levels
    Store public and generated datasets as Data Libraries, e.g. hg19 Ref Genome
    Shared datasets and workflows can be imported by other users for reuse
  Globus Online Integration
    Access GO endpoints and transfer data from within the Galaxy UI and into the Galaxy workspace
    Leverage local cluster or cloud-based scalable computational resources for parallelizing the tools
  [Diagram labels: Data and Tools; Publish]
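The provenance fields named under Reproducibility (input datasets, tool used, parameter values, output datasets) can be captured in a small record. The sketch below is an assumption-laden illustration, not Galaxy's data model: `StepRecord`, `fingerprint`, the file names, and the parameter names are all hypothetical.

```python
from dataclasses import dataclass, field, asdict
import hashlib
import json


@dataclass(frozen=True)
class StepRecord:
    """Provenance for one analysis step (hypothetical structure)."""
    tool: str
    inputs: tuple
    outputs: tuple
    parameters: dict = field(default_factory=dict)


def fingerprint(record):
    """Hash the record so two runs can be compared for repeatability."""
    # sort_keys makes the hash independent of parameter ordering.
    canonical = json.dumps(asdict(record), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# File and parameter names below are placeholders, not real tool options.
first = StepRecord(
    tool="bwa",
    inputs=("sample.fastq", "hg19.fa"),
    outputs=("sample.bam",),
    parameters={"param_a": 4, "param_b": "x"},
)
rerun = StepRecord(
    tool="bwa",
    inputs=("sample.fastq", "hg19.fa"),
    outputs=("sample.bam",),
    parameters={"param_b": "x", "param_a": 4},
)

if __name__ == "__main__":
    print(fingerprint(first) == fingerprint(rerun))
```

Two steps with the same tool, inputs, outputs, and parameter values produce the same fingerprint, so a changed parameter is detectable by comparing hashes.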

Example Collaborations
  Dobyns Lab
    Background: Investigate the nature and causes of a wide range of human developmental brain disorders
    Approach: Replaced manual analysis with Globus Genomics
    Results: Achieved greater than 10X speed-up in analysis of exome data
    Future Plans: Leverage scale-out capability of Globus Genomics by running increasingly larger data sets

Relevance to XSEDE
  XSEDE’s Mission Statement
    “accelerat[ing] open scientific discovery by enhancing the productivity of researchers, engineers, and scholars and making advanced digital resources easier to use.”
  Key XSEDE Goals That Globus Genomics Addresses
    “Deepen and extend the impact of eScience infrastructure on research and education; in particular, to reach communities that have not previously made use of it; and
    Expand the environment through the integration of new capabilities and resources such as instruments and data repositories based on the identified needs of the community.”

Relevance to XSEDE (Cont.)
  Globus Genomics leverages an XSEDE service
    Globus Transfer for data movement
    Globus Nexus for identity management
    Globus Groups for group-based access management
  Integrates advanced digital resources: sequencing centers, a commercial cloud provider, and NGS analysis pipelines
  Reduces the cost and complexity of scientific discovery for a new community (NGS researchers) who have not historically made much use of advanced eScience infrastructures

XSEDE vs. AWS
  Globus Genomics achieves these goals without making use of XSEDE supercomputers
  The choice to use Amazon cloud services rather than XSEDE systems for Globus Genomics computations is deliberate
    At the scales at which our target users operate today, the costs associated with the use of Amazon cloud computers are modest, and
    Amazon’s on-demand, pay-as-you-go storage and computing capabilities match user needs better than the proposal- and queue-based access policies provided by XSEDE computers
  We plan to explore using XSEDE resources to execute Globus Genomics pipelines

Acknowledgments
  This work was supported in part by the NIH through the NHLBI grant: The Cardiovascular Research Grid (R24HL085343) and by the U.S. Department of Energy under contract DE-AC02-06CH11357.
  We are grateful to Amazon, Inc., for an award of Amazon Web Services time that facilitated early experiments.
  The Globus Genomics and Globus Online teams at University of Chicago and Argonne National Laboratory

For more information
  More information on Globus Genomics and to sign up: www.globus.org/genomics
  More information on Globus Online: www.globusonline.org
  Questions?
  Thank you!