Building Data-intensive Pipelines Ravi K Madduri Argonne National Lab University of Chicago.

Recap from other talks on genomics
FBIRN: combining imaging, clinical and genetics data
CIDR: providing better value to end users
  – Globus Online is helping CIDR reliably transfer large sequencing data sets to end users
Ivo and Fabio presented various challenges in building pipelines in genomics
  – Large data volumes
  – Multiple, complex analytical tools
In this talk we will focus on how we can provide workflow capabilities to end users in a way that is both easy to use and scalable

Enter Galaxy
A free (for everyone) web service integrating a wealth of tools, compute resources, terabytes of reference data and permanent storage
Open source software that makes it easy to integrate your own tools and data and customize your own site
Flexible architecture -> Customizable

Galaxy Adoption
~50 deployments of Galaxy
  – Galaxy for microarray analysis, machine learning, drug discovery, etc.
~130,000 jobs a month and growing on the public instance of Galaxy
1 TB/week in user uploads
  – 60 TB from China
150+ attendees at the Galaxy users conference
  – From 6 continents
Adoption driven primarily by
  – Ease of use
  – Software as a service
  – Responsiveness to user needs
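
To give a feel for the scale behind these adoption figures, the short calculation below converts them into sustained rates. It is only a back-of-the-envelope sketch: the 30-day month and the decimal terabyte (10^12 bytes) are assumptions made here, not figures from the slides.

```python
# Back-of-the-envelope rates derived from the figures on the slide.
# Assumptions (not from the slides): a 30-day month and a decimal
# terabyte of 10**12 bytes.
JOBS_PER_MONTH = 130_000          # "~130,000 jobs a month"
UPLOAD_BYTES_PER_WEEK = 1e12      # "1 TB/week in user uploads"

SECONDS_PER_WEEK = 7 * 24 * 3600
HOURS_PER_MONTH = 30 * 24

# Average job submission rate on the public instance.
jobs_per_hour = JOBS_PER_MONTH / HOURS_PER_MONTH

# Average upload throughput if uploads were spread evenly over the week.
upload_mb_per_s = UPLOAD_BYTES_PER_WEEK / SECONDS_PER_WEEK / 1e6

print(f"{jobs_per_hour:.0f} jobs/hour on average")
print(f"{upload_mb_per_s:.2f} MB/s sustained upload on average")
```

These are averages; real load is likely to be burstier, so peak rates would be higher than the numbers printed here.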

Opportunities for BIRN collaborators
Galaxy for biomedical informatics
  – Researchers can discover and download interesting and useful datasets provided by BIRN
  – Analyze data using various BIRN tools
  – Create and share pipelines with other researchers
  – Create virtual collaborations by leveraging flexible, secure user and group management
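
The idea of creating a pipeline once and sharing it with other researchers can be sketched as follows. This is a minimal illustration in plain Python, not Galaxy's or BIRN's actual API: the `Pipeline` class, the `TOOLS` registry and the two toy tools are hypothetical names introduced only for this example. A pipeline is modeled as an ordered list of tool names, so it can be serialized, handed to a collaborator, and re-run on their own data.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def drop_negatives(values: List[float]) -> List[float]:
    """Toy 'tool': discard negative measurements."""
    return [v for v in values if v >= 0]


def normalize(values: List[float]) -> List[float]:
    """Toy 'tool': scale values by the maximum (assumes a positive maximum)."""
    peak = max(values)
    return [v / peak for v in values]


# Hypothetical tool registry; these are illustrative stand-ins,
# not real BIRN or Galaxy tools.
TOOLS: Dict[str, Callable[[List[float]], List[float]]] = {
    "drop_negatives": drop_negatives,
    "normalize": normalize,
}


@dataclass
class Pipeline:
    """An ordered list of tool names that can be run and shared."""
    name: str
    steps: List[str] = field(default_factory=list)

    def run(self, data: List[float]) -> List[float]:
        # Feed the output of each tool into the next one.
        for step in self.steps:
            data = TOOLS[step](data)
        return data

    def to_json(self) -> str:
        # Sharing a pipeline means sharing its description, not its data.
        return json.dumps({"name": self.name, "steps": self.steps})

    @classmethod
    def from_json(cls, text: str) -> "Pipeline":
        doc = json.loads(text)
        return cls(name=doc["name"], steps=list(doc["steps"]))


# One researcher defines the pipeline and exports it ...
pipeline = Pipeline("clean-and-scale", ["drop_negatives", "normalize"])
exported = pipeline.to_json()

# ... and a collaborator imports it and runs it on their own data.
shared = Pipeline.from_json(exported)
result = shared.run([4.0, -1.0, 2.0])
print(result)
```

Keeping the pipeline description separate from the data is the design choice that makes sharing cheap: only the list of steps travels between researchers, while each dataset stays where it is.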

Use case: CVRG-Galaxy
Created a Galaxy instance for the CVRG community
Integrated it with Globus Online file transfer capabilities so researchers can get data for analysis
Created a CVRG Toolbox in Galaxy with Bioconductor tools from CRData.org
Investigating how individual PIs can contribute their own compute and storage

CVRG CRData Galaxy