Simplifying Large-Scale Data Movement with Globus
Steve Tuecke, Deputy Director, Computation Institute, University of Chicago & Argonne National Laboratory


Initiative for Globus in Europe (IGE)
– Coordination of widespread European Globus development and operation activities
– Central point of contact in Europe for Globus
– Add the European perspective to Globus
– Globus service provider for European e-Infrastructures such as DEISA, EGI, PRACE
– Help scientists in the European Research Area

Globus Toolkit: Build the Grid
– Components for building custom grid solutions
– globustoolkit.org
Globus Online: Use the Grid
– Reliable file transfer
– Software-as-a-Service
– globusonline.org


Big Science Built on Globus Toolkit
– Earth System Grid
– Cancer Biomedical Informatics Grid
– LIGO data grid
– LHC

GT GridFTP: File transfer workhorse
Extensions to FTP standard
– High performance, security, reliability, etc.
Used in XSEDE, LHC, ESG, OSG/VDT, LIGO, BIRN, …
>10M files, >½ PB transferred per day

But small and medium science is suffering
– Ad-hoc solutions
– Inadequate software
– Inadequate hardware
– Data plan mandates


Globus Online Vision
Goal: Accelerate discovery and innovation worldwide by providing research IT as a service
Leverage software-as-a-service to:
– provide millions of researchers with unprecedented access to powerful tools
– reduce research IT costs dramatically via economies of scale
“Civilization advances by extending the number of important operations which we can perform without thinking of them” --Alfred North Whitehead, 1911

The Challenge: Moving Big Data Easily
What should be trivial…
“I need my data over there – at my _____” (supercomputing center, campus server, etc.)
Data Source to Data Destination
… can be painfully tedious and time-consuming
“GAAAH!”
! Config issues
! Firewall issues
! Unexpected failure = manual retry

What is Globus Online?
Reliable file transfer.
– Easy “fire and forget” file transfers
– Automatic fault recovery
– High performance
– Across multiple security domains
No IT required.
– Software as a Service (SaaS)
– No client software installation
– New features automatically available
– Consolidated support and troubleshooting
– Works with existing GridFTP servers
– Globus Connect solves “last mile problem”
“I moved 400 GB of files and didn’t even have to think about it.” −Lawrence Berkeley National Lab
“It’s just not a big deal to move big data anymore.” −Initiative for Biomedical Informatics
“Fantastic! I have started using globus connect to transfer data, and it only took me 5 minutes to set up. Thank you!” −NERSC user

Case Study: Lattice QCD
Indiana University researcher moved ~6 TB from Oak Ridge to TACC in 2 days
– Fast: Reduced transfer times
– Easy: Fire-and-forget transfers
– Automated retry
– No file pre-staging
– No complex infrastructure
– Convenient CLI or GUI interfaces
“Globus Online frees up my time to do more creative work than typing scp commands or devising scripts to initiate and monitor progress to move many files.”
“I moved GB files tonight in about 1.5 hours. I am very impressed. I also like the new commands and help system.”
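To put the "~6 TB in 2 days" figure from the Lattice QCD case study in perspective, the sketch below converts it to an average sustained rate. It assumes decimal units (1 TB = 10^12 bytes) and exactly 48 hours of wall-clock time; neither is stated on the slide, and the function name is illustrative only.

```python
def average_rate_mb_per_s(total_bytes: float, seconds: float) -> float:
    """Average transfer rate in decimal megabytes per second."""
    return total_bytes / seconds / 1e6


TB = 1e12            # assumption: decimal terabyte
DAY = 24 * 60 * 60   # seconds in one day

rate = average_rate_mb_per_s(6 * TB, 2 * DAY)
# Multiply by 8 to express the same rate in megabits per second.
print(f"{rate:.1f} MB/s, about {rate * 8:.0f} Mbit/s")
# prints "34.7 MB/s, about 278 Mbit/s"
```

This is an average over the whole period, so it includes any idle time and retries; the peak rate on the wire would have been higher.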

Case Study: Enabling NERSC
Challenge
– “We need to provide web-based ways to accomplish computing tasks – it’s what our scientists expect, and it will make them more productive.”
Solution
– Globus Online endpoints maintained by NERSC
– GO = recommended transfer method
Benefits for NERSC users
– Drag and drop archiving
– Easy to use
– Users can focus on their research (not on IT)
Benefits for NERSC
– Operations and support outsourced to GO
– Fast and easy to make endpoints available
– Automated authentication
– Reliable performance and support
Hopper, Franklin and HPSS are among the NERSC resources leveraged by Globus Online.
“Fantastic! I have already started using Globus Connect to transfer data, and it only took me 5 minutes to set up. Thank you!” – NERSC user

How It Works
1. User initiates transfer request
2. Globus Online moves files from Data Source to Data Destination
3. Globus Online notifies user
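The three steps above, combined with the "automatic fault recovery" promise, can be sketched as a small simulation. This is not Globus Online code: the `TransferTask` class, the status strings, the `flaky_copy` stand-in and the retry limit are all hypothetical, chosen only to illustrate a fire-and-forget request that retries failed files and then notifies the user.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TransferTask:
    """Hypothetical record of one submitted transfer request."""
    source: str
    destination: str
    files: List[str]
    status: str = "ACTIVE"
    attempts: int = 0
    moved: List[str] = field(default_factory=list)


def flaky_copy(name: str, rng: random.Random, failure_rate: float) -> None:
    """Stand-in for copying one file; fails at random to mimic faults."""
    if rng.random() < failure_rate:
        raise IOError(f"transient failure copying {name}")


def run_transfer(task: TransferTask, rng: random.Random,
                 failure_rate: float = 0.3, max_attempts: int = 5,
                 notify: Callable[[str], None] = print) -> TransferTask:
    """Step 2: move every file, retrying each one up to max_attempts times.
    Step 3: notify the user once with the final outcome."""
    for name in task.files:
        for _ in range(max_attempts):
            task.attempts += 1
            try:
                flaky_copy(name, rng, failure_rate)
            except IOError:
                continue  # automatic retry, no user involvement
            task.moved.append(name)
            break
        else:
            task.status = "FAILED"
            notify(f"Transfer failed on {name} after {max_attempts} attempts")
            return task
    task.status = "SUCCEEDED"
    notify(f"Transfer complete: {len(task.moved)} files in {task.attempts} attempts")
    return task


if __name__ == "__main__":
    # Step 1: the user submits a request and walks away.
    request = TransferTask("alcf#dtn", "nersc#dtn", ["a.dat", "b.dat", "c.dat"])
    run_transfer(request, random.Random(0))
```

The point of the design is that the retry loop lives in the service rather than in the user's script, so a failure only reaches the user if retries are exhausted.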

Globus Online Interface
Web interface
Command line interface
    ls alcf#dtn:~
    scp alcf#dtn:~/myfile \
        nersc#dtn:~/myfile
HTTP REST interface
    POST globusonline.org/v0.10/transfer
Globus Online moves data between:
– GridFTP servers, FTP servers
– High-performance data transfer nodes
– Globus Connect on local computers
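The command line examples above name each location as an endpoint followed by a colon and a path, such as `alcf#dtn:~/myfile`. The sketch below splits that notation and assembles a JSON body of the kind a client might send to the REST interface. The field names (`source_endpoint`, `destination_endpoint`, `items`, and so on) are assumptions made for illustration and are not taken from the Globus Online API; no request is actually sent.

```python
import json
from typing import NamedTuple


class Location(NamedTuple):
    endpoint: str
    path: str


def parse_location(spec: str) -> Location:
    """Split 'alcf#dtn:~/myfile' into endpoint 'alcf#dtn' and path '~/myfile'."""
    endpoint, sep, path = spec.partition(":")
    if not sep or not endpoint or not path:
        raise ValueError(f"expected ENDPOINT:PATH, got {spec!r}")
    return Location(endpoint, path)


def build_transfer_request(source: str, destination: str) -> str:
    """Return a JSON document describing a single-file transfer.

    The field names are hypothetical, not the real service schema.
    """
    src = parse_location(source)
    dst = parse_location(destination)
    body = {
        "source_endpoint": src.endpoint,
        "destination_endpoint": dst.endpoint,
        "items": [{"source_path": src.path, "destination_path": dst.path}],
    }
    return json.dumps(body, indent=2)


if __name__ == "__main__":
    print(build_transfer_request("alcf#dtn:~/myfile", "nersc#dtn:~/myfile"))
```

Splitting on the first colon only keeps any later colons inside the path, which matters for file names that contain them.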

Getting Started (2 easy steps)
1. Sign up: Visit www.globusonline.org to create an account

Getting Started (2 easy steps)
2. Start moving files: Pick your data and where you want to move it, then click to transfer

Globus Connect to/from your Laptop

Using the CLI
Interactive login to command line interface:
    $ ssh
Running commands remotely:
    $ ssh scp -r -s 3 -D olcf#/~/myfile* mylaptop:/~/projects/p1
    Task ID: 4a3c471e-edef-11df-aa b1
    $ _
Using CLI with gsissh:
    $ gsissh
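Because a remote `scp` returns immediately with a task ID, a script can capture that ID and check on the transfer later. The sketch below builds the remote command as an argument list and extracts the ID from the reply. The host is a caller-supplied placeholder (the slide's host name did not survive), the flags are copied from the slide without interpretation, and the assumption that task IDs are standard 36-character UUIDs is a guess based on the truncated ID shown.

```python
import re
from typing import List, Optional

# Assumption: the task ID is a lowercase hexadecimal UUID on its own line.
TASK_ID_RE = re.compile(
    r"^Task ID:\s*([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\s*$",
    re.MULTILINE,
)


def build_remote_scp(host: str, source: str, destination: str) -> List[str]:
    """Argument list for running the CLI's scp through ssh.

    'host' is a placeholder chosen by the caller; the flags mirror the slide.
    """
    return ["ssh", host, "scp", "-r", "-s", "3", "-D", source, destination]


def parse_task_id(output: str) -> Optional[str]:
    """Return the task ID found in the CLI output, or None if absent."""
    match = TASK_ID_RE.search(output)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Made-up reply with a made-up ID, used only to exercise the parser.
    reply = "Task ID: 123e4567-e89b-12d3-a456-426614174000\n"
    print(build_remote_scp("cli.example.org", "src:/~/data", "dst:/~/data"))
    print(parse_task_id(reply))
```

Passing the list to `subprocess.run` would execute it; that step is left out here because it needs a real account and host.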

For More Information
Visit to:
– Get a free account and start moving files
Visit for:
– Tutorials
– FAQs
– Pro Tips
– Troubleshooting
Contact
– Help getting started
– Help using the service