Globus online Cloud Based Services for Science Steve Tuecke Computation Institute University of Chicago and Argonne National Laboratory.

Slides:



Advertisements
Similar presentations
Industry and economics Ian Foster Computation Institute Department of Computer Science Mathematics and Computer Science Division Institute for Genomic.
Advertisements

High Performance Computing Course Notes Grid Computing.
Collaboration on Large Datasets using Globus Rachana Ananthakrishnan University of Chicago.
1 Cyberinfrastructure Framework for 21st Century Science & Engineering (CIF21) NSF-wide Cyberinfrastructure Vision People, Sustainability, Innovation,
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWS Ravi K Madduri University of Chicago and ANL.
Hydra Partners Meeting March 2012 Bill Branan DuraCloud Technical Lead.
11© 2011 Hitachi Data Systems. All rights reserved. HITACHI DATA DISCOVERY FOR MICROSOFT® SHAREPOINT ® SOLUTION SCALING YOUR SHAREPOINT ENVIRONMENT PRESENTER.
Ian Foster Computation Institute Argonne National Lab & University of Chicago Education in the Science 2.0 Era.
1 Software & Grid Middleware for Tier 2 Centers Rob Gardner Indiana University DOE/NSF Review of U.S. ATLAS and CMS Computing Projects Brookhaven National.
Office of Science U.S. Department of Energy Grids and Portals at NERSC Presented by Steve Chan.
Milos Kobliha Alejandro Cimadevilla Luis de Alba Parallel Computing Seminar GROUP 12.
Accelerating data-intensive science by outsourcing the mundane Ian Foster.
SaaS, PaaS & TaaS By: Raza Usmani
INTRODUCTION TO CLOUD COMPUTING Cs 595 Lecture 5 2/11/2015.
Office 365: Efficient Cloud Solutions Wednesday March 12, 9AM Chaz Vossburg / Gabe Laushbaugh.
Building Data-intensive Pipelines Ravi K Madduri Argonne National Lab University of Chicago.
Assessment of Core Services provided to USLHC by OSG.
Cloud Computing Source:
3 Cloud Computing.
PhD course - Milan, March /09/ Some additional words about cloud computing Lionel Brunie National Institute of Applied Science (INSA) LIRIS.
Cloud computing is the use of computing resources (hardware and software) that are delivered as a service over the Internet. Cloud is the metaphor for.
Lessons from industry for science cyberinfrastructure Simplicity, scale, and sustainability via SaaS/PaaS.
EGI-Engage EGI-Engage Engaging the EGI Community towards an Open Science Commons Project Overview 9/14/2015 EGI-Engage: a project.
An emerging computing paradigm where data and services reside in massively scalable data centers and can be ubiquitously accessed from any connected devices.
Globus Genomics – Science as a Service for large scale NGS analysis
IPlant Collaborative Tools and Services Workshop iPlant Collaborative Tools and Services Workshop Collaborating with iPlant.
Globus online Reliable, high-performance file transfer… made easy. XSEDE ECSS Symposium, Dec.12, 2011 Presenter: Steve Tuecke, Deputy Director Computation.
IPlant Collaborative Tools and Services Workshop iPlant Collaborative Tools and Services Workshop Collaborating with iPlant.
The Future of the iPlant Cyberinfrastructure: Coming Attractions.
Leveraging Globus Services to Support Climate Model Data Access Through the Earth System Grid Federation (ESGF) Brian Knosp 1, Luca Cinquini 1, Lukasz.
The Earth System Grid (ESG) Computer Science and Technologies DOE SciDAC ESG Project Review Argonne National Laboratory, Illinois May 8-9, 2003.
GRID Overview Internet2 Member Meeting Spring 2003 Sandra Redman Information Technology and Systems Center and Information Technology Research Center National.
GridFTP GUI: An Easy and Efficient Way to Transfer Data in Grid
Ruth Pordes November 2004TeraGrid GIG Site Review1 TeraGrid and Open Science Grid Ruth Pordes, Fermilab representing the Open Science.
Data Transfers in the ALCF Robert Scott Technical Support Analyst Argonne Leadership Computing Facility.
CEOS Working Group on Information Systems and Services - 1 Data Services Task Team Discussions on GRID and GRIDftp Stuart Doescher, USGS WGISS-15 May 2003.
Microsoft Management Seminar Series SMS 2003 Change Management.
May 6, 2002Earth System Grid - Williams The Earth System Grid Presented by Dean N. Williams PI’s: Ian Foster (ANL); Don Middleton (NCAR); and Dean Williams.
IBM Bluemix Ecosystem Development Hands on Workshop Section 1 - Overview.
Globus online Software-as-a-Service for Research Data Management Steve Tuecke Deputy Director, Computation Institute University of Chicago & Argonne National.
CLOUD COMPUTING. What is cloud computing ??? What is cloud computing ??? Cloud computing is a general term for anything that involves delivering hosted.
1 Overall Architectural Design of the Earth System Grid.
| nectar.org.au NECTAR TRAINING Module 2 Virtual Laboratories and eResearch Tools.
Super Computing 2000 DOE SCIENCE ON THE GRID Storage Resource Management For the Earth Science Grid Scientific Data Management Research Group NERSC, LBNL.
Research data management using Globus ESIP Summer Meeting 2015 Rachana Ananthakrishnan University of Chicago
Web Technologies Lecture 13 Introduction to cloud computing.
Globus and ESGF Rachana Ananthakrishnan University of Chicago
Nanbor Wang, Balamurali Ananthan Tech-X Corporation Gerald Gieraltowski, Edward May, Alexandre Vaniachine Argonne National Laboratory 2. ARCHITECTURE GSIMF:
Globus.org/genomics Globus Galaxies Science Gateways as a Service Ravi K Madduri, University of Chicago and Argonne National Laboratory
Tools for Portals, Search, Assimilation, Provenance Computing Infrastructure for Science Individual University and Lab PIs National and Int’l collabs Research.
Support to scientific research on seasonal-to-decadal climate and air quality modelling Pierre-Antoine Bretonnière Francesco Benincasa IC3-BSC - Spain.
Built on the Microsoft Azure Platform, UberCloud Helps Engineers and Software Providers to Offer and Deploy Powerful Cloud Services On Demand MICROSOFT.
EGI-InSPIRE RI EGI Webinar EGI-InSPIRE RI Porting your application to the EGI Federated Cloud 17 Feb
Globus online Delivering a scalable service Steve Tuecke Computation Institute University of Chicago and Argonne National Laboratory.
Simplifying Large-Scale Data Movement with Globus Steve Tuecke Deputy Director, Computation Institute University of Chicago & Argonne National Laboratory.
INTRODUCTION TO XSEDE. INTRODUCTION  Extreme Science and Engineering Discovery Environment (XSEDE)  “most advanced, powerful, and robust collection.
EGI-InSPIRE RI EGI Compute and Data Services for Open Access in H2020 Tiziana Ferrari Technical Director, EGI.eu
Accessing the VI-SEEM infrastructure
By: Raza Usmani SaaS, PaaS & TaaS By: Raza Usmani
Computing Clusters, Grids and Clouds Globus data service
Tools and Services Workshop
University of Chicago and ANL
Joslynn Lee – Data Science Educator
Software infrastructure for a National Research Platform
Study course: “Computing clusters, grids and clouds” Andrey Y. Shevel
Dev Test on Windows Azure Solution in a Box
Dell Data Protection | Rapid Recovery: Simple, Quick, Configurable, and Affordable Cloud-Based Backup, Retention, and Archiving Powered by Microsoft Azure.
XtremeData on the Microsoft Azure Cloud Platform:
Cloud Computing: Concepts
Using and Building Infrastructure Clouds for Science
Presentation transcript:

globus online Cloud Based Services for Science Steve Tuecke Computation Institute University of Chicago and Argonne National Laboratory

Promulgate via new educational methods Promulgate via new educational methods Apply to challenging problems Computation Institute (CI) Accelerate by building the research cloud

Cloud Layers 3 Software-as-a-Service (SaaS) Platform-as-a-Service (PaaS) Infrastructure-as-a-Service (IaaS)

The data deluge

The data deluge in biology x10 5 in 6 years x10 in 6 years

Number of sequencing machines

18 orders of magnitude in 5 decades ! 12 orders of magnitude in 6 decades Moore’s Law for X-ray sources Credit: Linda Young

Exploding data volumes in astronomy 100,000 TB MACHO et al.: 1 TB Palomar: 3 TB 2MASS: 10 TB GALEX: 30 TB Sloan: 40 TB Pan-STARRS: 40,000 TB

Exploding data volumes in climate science Climate model intercomparison project (CMIP) of the IPCC 2004: 36 TB 2012: 2,300 TB

Big science has been successful All build on Globus Toolkit software LIGO: 1 PB data in last science run, distributed to 800 scientists worldwide ESG: 1.2 PB climate data delivered to 23,000 users; 600+ pubs OSG: 1.4M CPU-hours/day, >90 sites, >3000 users, >260 pubs in 2010 Substantial teams Sustained effort Leverage common technology Application-specific solutions Production focus

But small and medium science is suffering Data deluge Ad-hoc solutions Inadequate software, hardware & IT staff

Dark data in the long tail of science NSF grant awards, 2007 (Bryan Heidorn)

A crisis that demands new approaches We have exceptional infrastructure for the 1% (e.g., supercomputers, Large Hadron Collider, …) But not for the 99% (e.g., the vast majority of the 1.8M publicly funded researchers in the EU) We need new approaches to providing research cyberinfrastructure, that: — Provide advanced functionality — Reduce barriers to entry — Are cheaper — Are sustainable

Run a research project from a coffee shop? 14

Accelerate discovery and innovation worldwide by providing research IT as a service Leverage the cloud to provide millions of researchers with unprecedented access to powerful tools; enable a massive shortening of cycle times in time-consuming research processes; and reduce research IT costs dramatically via economies of scale Rethinking how we provide research IT

Time-consuming Tasks in Research Run experiments Collect data Manage data Move data Acquire computers Analyze data Run simulations Compare experiment with simulation Search the literature Communicate with colleagues Publish papers Find, configure, install relevant software Find, access, analyze relevant data Order supplies Write proposals Write reports … 16 “Civilization advances by extending the number of important operations which we can perform without thinking of them” —Alfred North Whitehead, 1911

Analysis Staging Ingest Community Repository ArchiveMirror Registry Next-gen genome sequencer Telescope In millions of labs worldwide, researchers struggle with massive data, advanced software, complex protocols, burdensome reporting Addressing the big data challenge Accelerate discovery and innovation by outsourcing difficult tasks Simulation

The Research Cloud Grid meets Cloud Millions of researchers worldwide need advanced IT to tackle important and urgent problems

Transfer and synchronize files –Easy “fire-and-forget” transfers –Automatic fault recovery –High performance –Across multiple security domains Minimize IT costs –Software as a Service (SaaS) No client software installation New features automatically available –Consolidated support & troubleshooting –Simple endpoint installation with Globus Connect and GridFTP Recommended by XSEDE, Blue Water, NERSC, ALCF, ESnet, many Universities What is Globus Online? 20

Case Study: Lattice QCD 21 Fast: Reduced transfer times Easy: Fire-and-forget transfers Automated retry No file pre-staging No complex infrastructure Convenient CLI and GUI interfaces “Globus Online frees up my time to do more creative work than typing scp commands or devising scripts to initiate and monitor progress to move many files.” Indiana University researcher moved ~6 TB from Tennessee to Texas in 2 days “I moved GB files tonight in about 1.5 hours. I am very impressed. I also like the new commands and help system.”

Challenge –“We need to provide web-based ways to accomplish computing tasks – it’s what our scientists expect. And it will make them more productive.” Solution –Globus Online endpoints maintained by NERSC –Globus = recommended transfer method Benefits for NERSC users –Drag and drop archiving –Easy to use –Users can focus on their research (not on IT) Benefits for NERSC –Operations and support outsourced to Globus –Fast and easy to make endpoints available –Automated authentication –Reliable performance and support Case Study: Enabling NERSC 22 Hopper, Franklin and HPSS are among the NERSC resources leveraged by Globus Online. “Fantastic! I have already started using Globus Connect to transfer data, and it only took me 5 minutes to set up. Thank you!” – NERSC user

23

EU-friendly identity providers EU relevant endpoints Integrated support with EGI & IGE EU Cookie Law compliant Consent to store log data in US globusonline.eu (coming soon) 24 Partners

Why SaaS? Deliver advanced functionality that: –Requires no user software installation or operation Minimal IT proficiency required –Can be cheaply and incrementally adopted Usage-based subscription pricing; no big up-front costs –Consolidates troubleshooting and support An expert group can proactively detect and correct problems –Utilizes an efficient software delivery lifecycle Updates developed, tested and deployed quickly Dominates commercial & consumer markets –What about the research market? Software-as-a-Service (SaaS) Platform-as-a-Service (PaaS) Infrastructure-as-a-Service (IaaS)

SaaS changes assumptions and approach throughout the software lifecycle –Architecture and design Designed for specific environment (e.g., AWS) –Software development No porting or cross platform testing. Focus on functionality. –Operations Continuous update cycle. Nobody else will operate. Focus on availability, automation, monitoring –Support Tightly integrated with operations We are delivering a service, not software Software as a Service (SaaS) vs Traditional software delivery 26

Potential economies of scale Small laboratories –PI, postdoc, technician, grad students –Estimate 10,000 across US research community –Average ill-spent/unmet need of 0.5 FTE/lab? + Medium-scale projects –Multiple PIs, a few software engineers –Estimate 1,000 across US research community –Average ill-spent/unmet need of 3 FTE/project? = Total 8,000 FTE: at ~$100K/FTE => $800M/yr (If we could even find 8,000 skilled people) Plus computers, storage, opportunity costs, …

Other innovative science SaaS projects

Other innovative science SaaS projects

Other innovative science SaaS projects

Other innovative science SaaS projects

Other innovative science SaaS projects

Towards “research IT as a service” Research Data Management-as-a-Service Globus Integrate Globus Transfer Globus Storage Globus Catalog Globus Collaborate …SaaS …PaaS

No one SaaS provider can deliver it all Must create ecosystem that: –Allows any SaaS provider to easily participate –Dramatically reduces the cost of creating and operating services within the ecosystem –Provides seamless user experience across services –Agnostic to / works across any cloud IaaS provider Ecosystem requires Platform as a Service –Target the unique needs of the research community PaaS for Research 34 Software-as-a-Service (SaaS) Platform-as-a-Service (PaaS) Infrastructure-as-a-Service (IaaS)

Integrate with the Globus research cloud ecosystem Write programs that leverage: –(federated) user identities, profiles, groups –data, compute and collaboration … via REST APIs and command line programs Globus Integrate Globus Integrate: For when you want to… Globus Transfer Globus Storage Globus Catalog Globus Connect Multi User Globus Connect Globus Nexus Globus Toolkit Globus Collaborate

Every night, they receive 100,000 files in Illinois They transmit files to Texas for analysis … then move results back to Illinois … and make them available to users Process must be reliable, routine, and efficient The cyberinfrastructure team is not large! Dark Energy Survey – Leveraging transfer Image credit: Roger Smith/NOAO/AURA/NSF Blanco 4m on Cerro Tololo

Earth System Grid – Portal integration 37 Outsource data transfer to Globus – Data download to user machine from search – Data transfer to another server by user – Replication of data between sites by administrator No ESGF client software needed

KBase – Leveraging the full platform 38

Genomics Analysis using Globus Online and Galaxy Sequencing Centers Public Data Public Data Storage Local Cluster/ Cloud Seq Center Research Lab Globus Online Provides a High-performance Fault-tolerant Secure file transfer Service between all data-endpoints Data ManagementData Analysis Picard GATK Fastq Ref Genome Alignment Variant Calling Galaxy Data Libraries Globus Online Integrated within Galaxy Web-based UI Drag-Drop workflow creations Easily modify Workflows with new tools Galaxy in Cloud Analytical tools are automatically run on the scalable compute resources when possible Galaxy Based Workflow Management System FTP, SCP, others FTP, SCP SCP Globus Online FTP, SCP, HTTP

Thank you! 40