1 TeraGrid Data Transfer Jeffrey P. Gardner Pittsburgh Supercomputing Center

2 CIG MCW, Boulder, CO
Outline
- GSISSH
  - Use passwordless login between TeraGrid machines
  - Hands-on Exercises
- TeraGrid File Management
  - Data Transfer Performance
- GridFTP
  - Terminology
  - TeraGrid Deployment
  - Hands-on Exercises: use of GridFTP clients & servers to transfer files

3 Hands-on: Preparation
Prepare for the exercises by logging in to NCSA and getting a valid proxy certificate.
Log in to tg-login.ncsa.teragrid.org:
  ssh
  Enter your password: xxxxxx
Get a valid proxy certificate:
  tg-login1> grid-proxy-init
  Enter GRID pass phrase for this identity: yyyyyy
  Creating proxy Done
  Your proxy is valid until: Tue Jun 21 08:06:

4 GSISSH: SSH using TG Certificates
Now log in to TACC using GSISSH:
  tg-login> gsissh tg-login.tacc.teragrid.org
TA DA!
Check that your NCSA certificate DN and user account name have been entered into TACC's grid-mapfile:
  > grep -i userid /etc/grid-security/grid-mapfile
  "/C=US/O=National Center for Supercomputing Applications/CN=Jeff Gardner" gardnerj
Log out of TACC:
  > exit
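The grep check above can also be done from a script. A minimal sketch in Python, assuming the entry layout shown on this slide (a double-quoted certificate DN followed by one or more comma-separated local account names); the function names are hypothetical helpers, not part of any TeraGrid tool.

```python
import shlex


def parse_grid_mapfile_line(line):
    """Split one grid-mapfile entry into (DN, [local account names]).

    Assumes a double-quoted DN followed by comma-separated account names.
    Returns None for blank lines and comment lines.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # shlex keeps the quoted DN (which contains spaces) as a single field
    fields = shlex.split(line)
    dn = fields[0]
    # Rejoin the remaining fields, then split on commas to get the accounts
    accounts = ",".join(fields[1:]).split(",")
    return dn, [a for a in accounts if a]


def accounts_for_dn(lines, dn):
    """Return every local account that the given DN is mapped to."""
    found = []
    for line in lines:
        entry = parse_grid_mapfile_line(line)
        if entry and entry[0] == dn:
            found.extend(entry[1])
    return found
```

For example, `accounts_for_dn(open("/etc/grid-security/grid-mapfile"), my_dn)` would list the accounts your DN maps to on that machine, where `my_dn` is your own certificate DN.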

5 TeraGrid File Placement
- No common cross-site filesystems (currently)
  - This will change very shortly!
  - NCSA, SDSC, TACC, and ANL will install GPFS ("General Parallel File System")
- User controls where their data resides
  - Appropriate site(s)
  - Appropriate storage
    - Online filesystem(s): speed, visibility, quotas, backup policy; each filesystem is directly accessible from a single site
    - Mass storage systems: long-term storage, slower access

6 TeraGrid File Movement
- File movement is the responsibility of the user
- Between online filesystems
  - Intra-site
  - Cross-site*
- Between mass storage and online filesystems
  - Intra-site*
  - Cross-site*
* This session focuses on these types of transfers.

7 TeraGrid Transfer Environment
- TeraGrid backbone bandwidth means the Wide Area Network is rarely a bottleneck
  - Links among SDSC, Caltech, NCSA, and PSC: 40 Gb/sec
  - NCSA to TACC: 10 Gb/sec
- GSI authentication and proxy certificates provide automagic security for transfers
  - Just run “grid-proxy-init” and you're in
- Transfer requests can be integrated into job execution scripts
  - Moving input data to the site(s) of job execution
  - Moving results to another filesystem, site, or archive
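Integrating transfers into a job script amounts to running a GridFTP client before and after the computation. A minimal sketch in Python that builds and runs globus-url-copy commands for stage-in and stage-out; the helper names and the file paths are hypothetical, and the hosts are the TeraGrid dedicated GridFTP servers used in the exercises.

```python
import subprocess


def gsiftp_url(host, path):
    """Build a gsiftp:// URL; `path` must start with '/' (e.g. '/~/file')."""
    if not path.startswith("/"):
        raise ValueError(f"path must be absolute: {path!r}")
    return f"gsiftp://{host}{path}"


def transfer_argv(src_host, src_path, dst_host, dst_path, extra_opts=()):
    """Argument list for one globus-url-copy transfer."""
    return (["globus-url-copy", *extra_opts]
            + [gsiftp_url(src_host, src_path), gsiftp_url(dst_host, dst_path)])


def run_transfer(argv):
    """Run one transfer. check=True raises CalledProcessError on failure,
    so a failed stage-in stops the job before the computation starts."""
    subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical file names; requires a valid proxy (grid-proxy-init).
    stage_in = transfer_argv("tg-gridftp.ncsa.teragrid.org", "/~/input.dat",
                             "tg-gridftp.tacc.teragrid.org", "/~/input.dat")
    stage_out = transfer_argv("tg-gridftp.tacc.teragrid.org", "/~/results.dat",
                              "tg-gridftp.ncsa.teragrid.org", "/~/results.dat")
    run_transfer(stage_in)
    # ... run the computation here ...
    run_transfer(stage_out)
```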

8 Data Transfer Performance
What impacts transfer rates?
- Disk and filesystem speed
- Connectivity of filesystem to node
- Node characteristics & load
- Connectivity of node to WAN
- For all networks: bandwidth, latency, buffer size, protocol, load, encryption, …
Don't expect 40 Gb/sec!
[Figure: nodes connected through site switches to the WAN (TG backbone); link labels 40 Gb/s on the backbone, 30 Gb/s, and 1 Gb/s.]
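One way to see why 40 Gb/sec is out of reach: an end-to-end transfer can go no faster than the slowest link on its path. A small worked example in Python, assuming a hypothetical path that includes the 1 Gb/s link from the figure (the exact topology is not recoverable from the slide).

```python
def path_throughput_gbps(link_rates_gbps):
    """Upper bound on the end-to-end rate: the slowest link on the path.

    Disk speed, node load, buffer sizes, protocol and encryption overheads
    can only push the real rate below this bound.
    """
    return min(link_rates_gbps)


def gbps_to_mbytes_per_sec(gbps):
    """Convert Gbit/s to MByte/s using decimal units (1 Gbit = 1000 Mbit)."""
    return gbps * 1000 / 8


# Hypothetical path: node -> switch (1 Gb/s) -> TG backbone (40 Gb/s)
# -> remote switch (30 Gb/s) -> node
links = [1, 40, 30]
best_case = path_throughput_gbps(links)
print(best_case, "Gb/s at most, i.e.", gbps_to_mbytes_per_sec(best_case), "MBytes/sec")
```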

9 Performance – Choices Matter
- Transfer large files for best performance
- Use fast filesystems, dedicated transfer nodes, and optimized transfer parameters
- Transfer of a 1 GByte file from NCSA to SDSC (10/6/2004):

  Choices                                                     | Transfer time | Transfer rate
  Home filesystems, login nodes, default parameters           | 20 min 18 sec | 0.845 MBytes/sec (0.0066 Gbits/sec)
  Parallel filesystems, transfer nodes, optimized parameters  | 11 sec        | 0.727 Gbits/sec
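The rates in this comparison follow from file size and elapsed time. A quick check in Python, assuming 1 GByte = 1024 MBytes and the conversion (×8, ÷1024) that reproduces the slide's .0066 and .727 Gbits/sec figures. It gives about 0.84 MBytes/sec for the first row, close to the quoted .845, and about 93 MBytes/sec for the second row, roughly a 111× speedup.

```python
def rate_mbytes_per_sec(size_mbytes, seconds):
    return size_mbytes / seconds


def mbytes_to_gbits(mbytes_per_sec):
    # Conversion that reproduces the slide's figures: x 8 bits/byte, / 1024
    return mbytes_per_sec * 8 / 1024


SIZE_MBYTES = 1024          # assumes 1 GByte = 1024 MBytes
slow_secs = 20 * 60 + 18    # home filesystems, login nodes, default parameters
fast_secs = 11              # parallel filesystems, transfer nodes, optimized

slow = rate_mbytes_per_sec(SIZE_MBYTES, slow_secs)
fast = rate_mbytes_per_sec(SIZE_MBYTES, fast_secs)
print(f"slow: {slow:.3f} MBytes/sec = {mbytes_to_gbits(slow):.4f} Gbits/sec")
print(f"fast: {fast:.1f} MBytes/sec = {mbytes_to_gbits(fast):.3f} Gbits/sec")
print(f"speedup: {slow_secs / fast_secs:.0f}x")
```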

10 GridFTP Terminology - Protocol
“GridFTP is a high-performance, secure, reliable data transfer protocol optimized for high-bandwidth, wide-area networks. GridFTP is based on FTP, the highly popular Internet file transfer protocol.”
- Quoted from Globus Alliance website

11 Terminology - Client
- GridFTP client programs issue requests that adhere to the GridFTP protocol
- Users run GridFTP client programs to transfer files
- There is no client program named gridFTP, which can be confusing because users are told “use gridFTP to transfer your files”
- tgcp, globus-url-copy, and uberftp are three GridFTP client programs that are part of the Common TeraGrid Software Stack (CTSS)

12 Terminology – 3rd Party Transfer
- A GridFTP transfer between two GridFTP servers, rather than between a server and a client, is called a third-party transfer
- A third-party transfer occurs when the GridFTP client initiating the transfer runs on a system that is neither the source nor the destination of the transfer
- Allows use of dedicated transfer nodes
[Figure: the user runs a GridFTP client on Host A to request the transfer; the client sends requests in the GridFTP protocol to the GridFTP server processes on Host B (source of data) and Host C (destination of data), and the data moves between Host B and Host C.]
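With globus-url-copy, whether a request is a third-party transfer can be read off the two URLs: when both are gsiftp:// URLs, the client only issues requests and the data moves between the two servers. A minimal sketch in Python (a hypothetical helper) that checks only the URL schemes; it does not verify the slide's stricter condition that the client host is neither endpoint.

```python
from urllib.parse import urlparse


def is_third_party(src_url, dst_url):
    """True when both ends of a globus-url-copy request are GridFTP servers.

    If either URL is a local file: URL, the machine running the client is
    itself one end of the transfer, so it is an ordinary client-server copy.
    """
    return all(urlparse(u).scheme == "gsiftp" for u in (src_url, dst_url))
```

Applied to the exercises, a `file:` source copied to `gsiftp://tg-login.tacc.teragrid.org/...` is a client-server transfer, while `gsiftp://tg-gridftp.ncsa...` to `gsiftp://tg-gridftp.tacc...` is third-party.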

13 Terminology - Server
A GridFTP server process understands requests that adhere to the GridFTP protocol, and performs authentication and data transfer operations based on those requests.
TeraGrid GridFTP servers usually run on:
- Login nodes: tg-login.<site>.teragrid.org
- Dedicated GridFTP nodes: tg-gridftp.<site>.teragrid.org
- Some mass storage front-ends, which are GridFTP servers: mss.ncsa.teragrid.org

14 TG GridFTP Server Deployment
- tg-login.<site>.teragrid.org is a login node and also runs a GridFTP server
  - Shared resource; many tasks
- tg-gridftp.<site>.teragrid.org is a dedicated GridFTP server
  - Dedicated file transfer resource
  - Usually better connectivity
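Scripts that pick a server per site can encode this naming pattern. A small sketch in Python, assuming the tg-login / tg-gridftp pattern seen in hosts such as tg-login.ncsa.teragrid.org and tg-gridftp.tacc.teragrid.org; the helper is hypothetical, and mass storage front ends such as mss.ncsa.teragrid.org do not follow it.

```python
def gridftp_host(site, dedicated=True):
    """Hostname of a site's GridFTP server.

    dedicated=True  -> tg-gridftp.<site>.teragrid.org (dedicated transfer node)
    dedicated=False -> tg-login.<site>.teragrid.org   (shared login node)
    """
    prefix = "tg-gridftp" if dedicated else "tg-login"
    return f"{prefix}.{site.lower()}.teragrid.org"
```

Preferring `dedicated=True` follows the advice above: the dedicated node is a file transfer resource and usually has better connectivity.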

15 TG GridFTP Client Deployment
uberftp
- Interactive GridFTP transfer client
- Configurable TCP buffer size and number of parallel streams

16 TG GridFTP Client Deployment
globus-url-copy
- Command-line interface
  -tcp-bs | -tcp-buffer-size   specify the size (in bytes) of the buffer to be used by the underlying FTP data channels
  -p | -parallel               specify the number of streams to be used in the FTP transfer
tgcp [gridFTP-server1:]file1 [gridFTP-server2:]file2
- Command-line interface
- Friendly “scp-like” wrapper around globus-url-copy
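To illustrate what an “scp-like” wrapper does, here is a hypothetical sketch in Python that turns `[server:]file` arguments into a globus-url-copy argument list using the `-p` and `-tcp-bs` options listed above. It is not tgcp's actual code: it assumes a bare path is served by the local site's GridFTP server (which must see the same filesystem), defaults to 4 streams as used in the exercises, and leaves `-tcp-bs` unset unless a size is given.

```python
import os


def tgcp_like_argv(src, dst, local_server, streams=4, tcp_bs=None):
    """Build a globus-url-copy argument list from scp-style arguments.

    `src`/`dst` are "server:/abs/path" or a bare local path. A bare path is
    served by `local_server`, so both ends are GridFTP servers and the copy
    runs as a third-party transfer.
    """
    def to_url(spec):
        server, sep, path = spec.partition(":")
        if not sep:
            # No "server:" prefix: assume local_server sees this filesystem
            server, path = local_server, os.path.abspath(spec)
        elif not path.startswith("/"):
            raise ValueError(f"remote path must be absolute: {spec!r}")
        return f"gsiftp://{server}{path}"

    argv = ["globus-url-copy", "-p", str(streams)]
    if tcp_bs is not None:
        argv += ["-tcp-bs", str(tcp_bs)]
    return argv + [to_url(src), to_url(dst)]
```

The resulting list can be passed to `subprocess.run(..., check=True)` on a login node with a valid proxy.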

17 CIG MCW, Boulder, CO Hands-on:  Participants will be led through a series of exercises using tgcp, globus-url-copy and uberftp.  Demonstrates transferring files  Between TeraGrid sites  Between TG machines and archival storage systems

18 Hands-on preparation:
- Log in to tg-login.ncsa.teragrid.org if you have not already done so
- Get the test data file: wget

19 Hands-on: Exercise 1
GridFTP between login nodes
Copy a 9 MByte file from the current directory at NCSA to your home directory at TACC. Use the login node at TACC as the remote GridFTP server. Use default transfer parameters.
Use globus-url-copy to transfer the file (type the command on a single line, with no carriage return!):
  tg-login1> /usr/bin/time -f %e globus-url-copy file:`pwd`/test.file gsiftp://tg-login.tacc.teragrid.org/~/test.file-Ex1
  3.18

20 Hands-on: Exercise 2
GridFTP between GridFTP servers
Copy a 9 MByte file from the current directory at NCSA to your home directory at TACC. Use a third-party transfer and the GridFTP server nodes at both NCSA and TACC.
Use globus-url-copy to transfer the file:
  tg-login1> /usr/bin/time -f %E globus-url-copy gsiftp://tg-gridftp.ncsa.teragrid.org/`pwd`/test.file gsiftp://tg-gridftp.tacc.teragrid.org/~/test.file-Ex2
  3.01

21 Hands-on: Exercise 3
GridFTP between GridFTP servers
Copy a 9 MByte file from the current directory at NCSA to your home directory at TACC. Use a third-party transfer and the GridFTP server nodes at both NCSA and TACC. Use optimized transfer parameters.
Use globus-url-copy to transfer the file:
  tg-login1> /usr/bin/time -f %E globus-url-copy -tcp-bs -p 4 gsiftp://tg-gridftp.ncsa.teragrid.org/`pwd`/test.file gsiftp://tg-gridftp.tacc.teragrid.org/~/test.file-Ex3
  2.54

22 Hands-on: Exercise 4
Using tgcp
Copy a 9 MByte file from your home directory at NCSA to your home directory at TACC using tgcp. tgcp automatically uses third-party transfers and optimized transfer parameters.
Add tgcp to your path (it is not there by default):
  tg-login1> soft add +tgcp
Use tgcp to transfer the file:
  tg-login1> /usr/bin/time -f %E tgcp test.file tg-gridftp.tacc.teragrid.org:/home/userid/test.file-Ex4
  globus-url-copy -p 4 -tcp-bs gsiftp://tg-gridftp.ncsa.teragrid.org:2812/home/ac/gardnerj/test.file gsiftp://tg-gridftp.tacc.teragrid.org:2812/home/gardnerj/test.file
  4.06 (?!!)
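Converting the elapsed times from Exercises 1-4 into rates makes the comparison easier to read. A small Python calculation using only the timings reported above for the 9 MByte file: the optimized third-party transfer (Exercise 3) comes out fastest at about 3.5 MBytes/sec, and tgcp slowest at about 2.2 MBytes/sec. With a file this small, fixed per-transfer costs such as connection setup and authentication plausibly dominate, which fits the advice to transfer large files for best performance.

```python
SIZE_MBYTES = 9

# Elapsed times (seconds) reported in Exercises 1-4
timings = {
    "Ex1: login nodes, default parameters": 3.18,
    "Ex2: third-party, default parameters": 3.01,
    "Ex3: third-party, optimized parameters": 2.54,
    "Ex4: tgcp": 4.06,
}


def rates(times, size_mbytes):
    """Effective MBytes/sec for each run."""
    return {label: size_mbytes / secs for label, secs in times.items()}


for label, r in rates(timings, SIZE_MBYTES).items():
    print(f"{label:40s} {r:5.2f} MBytes/sec")
```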

23 Hands-on: Exercise 5 – pg 1
UberFTP between login nodes
Copy a 9 MByte file from your NCSA home directory to TACC. Use optimized transfer parameters. Interactive session.
Start uberftp and set transfer parameters:
  tg-login1> uberftp
  uberftp> parallel 4
  uberftp> tcpbuf
  TCP buffer set to bytes
Open a connection to TACC:
  uberftp> open tg-login.tacc.teragrid.org
  %% BANNER %%
  220 UNIX Archive FTP server ready.
  230 User xxx logged in.

24 Hands-on: Exercise 5 – pg 2
UberFTP between login nodes
Copy the file:
  uberftp> put test.file test.file-Ex5
  150 Opening BINARY connection(s) for test.file-Ex
  Transfer complete.
  Transfer rate bytes in 0.51 seconds KB/sec
Get a listing of the TACC home directory:
  uberftp> ls
  -rw---- user group date test.file-Ex1
  -rw---- user group date test.file-Ex2
  -rw---- user group date test.file-Ex3
  ...
Exit UberFTP:
  uberftp> quit

25 Hands-on: Exercise 6 – pg 1
UberFTP between GridFTP servers
Copy a 9 MByte file from your NCSA home directory to TACC using third-party transfers. Use optimized transfer parameters. Interactive session.
Start uberftp and set transfer parameters:
  tg-login1> uberftp
  uberftp> parallel 4
  uberftp> tcpbuf
  TCP buffer set to bytes

26 Hands-on: Exercise 6 – pg 2
UberFTP between GridFTP servers
Open a “local” connection to the NCSA dedicated GridFTP server:
  uberftp> lopen tg-gridftp.ncsa.teragrid.org
  220 tg-gridftp4.ncsa...blah..blah ready.
  230 User xxx logged in.
Open a “remote” connection to the TACC dedicated GridFTP server:
  uberftp> open tg-gridftp.tacc.teragrid.org
  220 lonestar GridFTP...blah..blah ready.
  230 User xxx logged in.

27 Hands-on: Exercise 6 – pg 3
UberFTP between GridFTP servers
Copy the file:
  uberftp> put test.file test.file-ex6
  src> 150 Opening BINARY mode data connection(s).
  dst> 150 Opening BINARY mode data connection(s).
  src> 226 Transfer complete.
  dst> 226 Transfer complete.
Exit UberFTP:
  uberftp> quit

28 Useful UberFTP commands
- Unix-like commands: ls, cd, mkdir, rmdir, pwd, rm
- Put “l” in front for “local” versions of commands: lls, lcd, lmkdir, lrmdir, lpwd, lrm
- put: transfer from local host to remote host
- get: transfer from remote host to local host
- mput, mget: transfer multiple files between hosts
- help

29 Tweaking Optimization Parameters
globus-url-copy
  -tcp-bs | -tcp-buffer-size: specify the size (in bytes) of the buffer to be used by the underlying FTP data channels
    “Low” network traffic:
    “High” network traffic:
  -p | -parallel: specify the number of streams to be used in the FTP transfer
    Low network traffic: 1
    High network traffic: 2 - 4
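A common rule of thumb for the TCP buffer size (general TCP guidance, not a value from this talk) is the bandwidth-delay product: the number of bytes that must be in flight to keep the path full. With parallel streams, the total is often split across the streams. A minimal sketch in Python; the 1 Gb/s bandwidth and 60 ms round-trip time are hypothetical inputs, and operating-system limits on socket buffers may cap what is actually granted.

```python
def bdp_bytes(bandwidth_bps, rtt_ms):
    """Bandwidth-delay product in bytes (integer arithmetic)."""
    return bandwidth_bps * rtt_ms // (1000 * 8)


def per_stream_buffer(bandwidth_bps, rtt_ms, streams):
    """Rule-of-thumb buffer per stream when the BDP is split across
    `streams` parallel streams, rounded up."""
    total = bdp_bytes(bandwidth_bps, rtt_ms)
    return -(-total // streams)


# Hypothetical path: 1 Gb/s bottleneck, 60 ms round-trip time
print(bdp_bytes(1_000_000_000, 60))             # one stream
print(per_stream_buffer(1_000_000_000, 60, 4))  # e.g. with -p 4
```

The resulting number is what would be passed to `-tcp-bs` (or uberftp's `tcpbuf`).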

30 Tweaking Optimization Parameters
uberftp
  tcpbuf: specify the size (in bytes) of the buffer to be used by the underlying FTP data channels
    “Low” network traffic:
    “High” network traffic:
  parallel: specify the number of streams to be used in the FTP transfer
    Low network traffic: 1
    High network traffic: 2 - 4

31 Using Robotic-Tape Archival Resources
- NCSA Mass Storage System (MSS): accessible using GridFTP to mss.ncsa.teragrid.org
- TACC SGI Data Migration Facility (DMF): accessible by simply placing files in the $ARCHIVE directory
- SDSC HPSS archival storage system: use HSI from the SDSC cluster only
- PSC “Golem”: accessible using GridFTP to tg-gridftp.psc.teragrid.org

32 Using Robotic-Tape Archival Resources
- Files on these machines are transferred to their local disks, but may be automatically migrated to tape if necessary.
- If you access a file that has been migrated to tape, it will be retrieved automatically, but expect some delay (up to a few minutes).
- Storage capacity is essentially infinite!

33 Hands-on: Exercise 7 – pg 1
Copy several 9 MByte files from your home directory at TACC to the NCSA Mass Storage System. Use a third-party transfer from TACC.
GSISSH from NCSA to TACC:
  tg-login> gsissh tg-login.tacc.teragrid.org
Start an uberftp session:
  lonestar> uberftp
Establish a “local” connection to the TACC dedicated GridFTP server:
  uberftp> lopen tg-gridftp.tacc.teragrid.org
  220 lonestar GridFTP..blah..blah..ready.
  230 User xxx logged in.
Establish a “remote” connection to the NCSA Mass Storage System:
  uberftp> open mss.ncsa.teragrid.org
  %%%Lots of Stuff%%%%
  230 User xxx logged in.

34 Hands-on: Exercise 7 – pg 2
Put multiple files to the NCSA MSS:
  uberftp> mput test.file*
  src> 150 Opening BINARY mode data connection for test file...
  dst> 150 Opening BINARY mode data connection for test file...
  src> 226 Transfer complete.
  dst> 226 Transfer complete.
  ...

35 Hands-on: Exercise 7 – pg 3
Get a listing of the Mass Storage System directory:
  uberftp> ls
  -rw---- user group DK common date test.file-Ex1
  -rw---- user group DK common date test.file-Ex2
  -rw---- user group DK common date test.file-Ex3
  ...
Quit uberftp:
  uberftp> quit
DK indicates that the file is on disk; AR indicates that the file is on tape.
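Because a tape-resident file can take minutes to retrieve, a script may want to check the DK/AR column before reading a file back. A hypothetical sketch in Python, assuming the listing layout shown on this slide (it looks for the DK or AR token among the fields before the file name); real MSS listings may contain additional columns.

```python
def storage_location(ls_line):
    """Classify an MSS `ls` line as 'disk', 'tape' or 'unknown'.

    Assumes DK marks a file on disk and AR a file migrated to tape. The last
    field (the file name) is excluded so a file named "AR" is not misread.
    """
    fields = ls_line.split()[:-1]
    if "DK" in fields:
        return "disk"
    if "AR" in fields:
        return "tape"
    return "unknown"
```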

36 Using PSC “Golem”
tg-gridftp.psc.teragrid.org maps directly onto Golem's filesystem. Example:
  tg-login1> globus-url-copy -tcp-bs -p 4 gsiftp://tg-gridftp.ncsa.teragrid.org/`pwd`/test.file gsiftp://tg-gridftp.psc.teragrid.org/~/test.file

37 Using TACC DMF
- Simply copy files to the $ARCHIVE directory
- Files in this directory are automatically migrated to tape if necessary.
- If you access a file that has been migrated to tape, it will be retrieved automatically, but expect some delay (up to a few minutes).
- /archive/teragrid/username is visible from the login nodes, but not from the TACC dedicated GridFTP servers.
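Since archiving on TACC's DMF is just a copy into $ARCHIVE, it is easy to add to the end of a job script. A minimal sketch in Python, assuming `$ARCHIVE` is defined in your login environment and the script runs on a login node (the archive area is not visible from the dedicated GridFTP servers); `archive` is a hypothetical helper name, and the demo uses a temporary directory in place of the real archive.

```python
import os
import shutil
import tempfile


def archive(path):
    """Copy `path` into the directory named by $ARCHIVE; return the copy's path.

    Nothing else is required: DMF decides when to migrate the copy to tape.
    """
    return shutil.copy2(path, os.environ["ARCHIVE"])


if __name__ == "__main__":
    # Demo: a temporary directory stands in for the real $ARCHIVE.
    os.environ["ARCHIVE"] = tempfile.mkdtemp()
    src = os.path.join(tempfile.mkdtemp(), "results.dat")
    with open(src, "w") as f:
        f.write("example data\n")
    print("archived to", archive(src))
```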

38 Hands-on: Wrapup
Log out of the TACC gsissh session:
  lonestar> exit
Destroy your proxy:
  tg-login> grid-proxy-destroy
Log out of the NCSA ssh session:
  tg-login> exit

39 Data Transfer Summary
- The GridFTP clients tgcp, globus-url-copy, and uberftp can be used to perform transfers between many TeraGrid online filesystems and mass storage systems accessible via GridFTP servers.
- Users are responsible for managing data transfers, including job-related data movement, which can be incorporated into job scripts.
- Choose servers, filesystems, and transfer parameters wisely to optimize performance.
- Ongoing efforts aim to improve rates and usability.

40 Useful URLs for help
- TeraGrid user information overview
- Summary of TG Resources
- Summary of machines with links to site-specific user guides (just click on the name of each site)
- Data Transfer guide
- Archival Storage guide