Published by Horace Rodgers. Modified over 9 years ago.
1
GridFTP GUI: An Easy and Efficient Way to Transfer Data in Grid
Wantao Liu (1,2), Raj Kettimuthu (2,3), Brian Tieman (3), Ravi Madduri (2,3), Bo Li (1), and Ian Foster (2,3)
(1) Beihang University, Beijing, China
(2) The University of Chicago, Chicago, USA
(3) Argonne National Laboratory, Argonne, USA
2
Outline
  GridFTP overview
  GridFTP challenges
  Commonly used GridFTP clients
  Zero-configuration GUI client
  Experimental results
3
GridFTP
  A secure, robust, fast, efficient, standards-based, widely accepted data transfer protocol
  We also supply a reference implementation:
    Server
    Client tools (globus-url-copy)
    Development libraries
  Multiple independent implementations can interoperate
    University of Virginia and Fermilab have home-grown servers that work with ours
    Many people have developed clients independent of the Globus Project
  Widely used high-performance data movement protocol
  Based on the FTP protocol; defines extensions for high-performance operation and security
  Standardized through the Open Grid Forum (OGF)
4
GridFTP
  Two-channel protocol like FTP
  Control channel
    Communication link (TCP) over which commands and responses flow
    Low bandwidth; encrypted and integrity protected by default
  Data channel
    Communication link(s) over which the actual data of interest flows
    High bandwidth; authenticated by default; encryption and integrity protection optional
  Supports both client/server transfers and third-party transfers (a remote client can initiate a transfer between two servers)
5
Striping
  GridFTP offers a powerful feature called striped transfers (cluster-to-cluster transfers)
  If each node at the source and destination has a 1 Gbps NIC, and the capacity of the network connecting the source and destination clusters is on the order of 10 Gbps, this configuration can achieve throughput close to 10 Gbps
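The arithmetic behind the striping claim can be sketched in a few lines. This is a back-of-the-envelope upper bound, not a model of GridFTP itself: it assumes each stripe pairs one source node with one destination node, and the node counts below (10 and 4 per cluster) are illustrative values that do not come from the slides.

```python
def striped_throughput_gbps(src_nodes, dst_nodes, nic_gbps, network_gbps):
    """Rough upper bound on aggregate throughput of a striped transfer."""
    # Assumption: one stripe per (source node, destination node) pair,
    # so the smaller cluster limits the number of stripes.
    stripes = min(src_nodes, dst_nodes)
    # Each stripe is capped by the per-node NIC speed.
    node_limit = stripes * nic_gbps
    # The link between the clusters caps the total.
    return min(node_limit, network_gbps)


# Ten 1 Gbps nodes per side over a 10 Gbps link: the link can be filled.
print(striped_throughput_gbps(10, 10, 1.0, 10.0))
# Four nodes per side: the NICs, not the link, are the bottleneck.
print(striped_throughput_gbps(4, 4, 1.0, 10.0))
```

Real transfers fall short of this bound because of protocol overhead, disk speed, and competing traffic, which is why the slide says "close to" 10 Gbps.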
6
GridFTP Servers Around the World
Created by Lydia Prieto, G. Zarrate, and Anda Imanitchi (Florida State University) using MaxMind's GeoIP technology
7
GridFTP in production
  Many scientific communities rely on GridFTP
    High Energy Physics: tiered data movement infrastructure for the LHC Computing Grid
    LIGO routinely uses GridFTP to move 1 TB a day
    Southern California Earthquake Center (SCEC), Earth System Grid (ESG), Relativistic Heavy Ion Collider (RHIC), European Space Agency, and BBC use GridFTP for data movement
  GridFTP facilitates an average of more than 5 million data transfers every day
8
Challenges
  Past success
    Standard: big selling point for adoption
    Throughput: GridFTP was sold on speed
    Robustness: has to work all the time
  Current and future
    Ease of use
    Zero-configuration clients
    Firewall
    Scalable
    Extensible
9
globus-url-copy
  Commonly used, scriptable command-line client
  globus-url-copy [options] srcURL dstURL
  URL format -
  Users can do client/server and third-party transfers using globus-url-copy
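Because globus-url-copy is scriptable, a wrapper can assemble the command line shown above. The sketch below only builds the argument list and does not run anything. The host names and paths are hypothetical, the gsiftp:// and file:// URL schemes are filled in from general GridFTP usage rather than from this slide, and -p (parallel streams) and -vb (show transfer performance) are the only options used.

```python
import shlex


def build_copy_command(src_url, dst_url, parallel_streams=None, verbose=False):
    """Build an argument list of the form: globus-url-copy [options] srcURL dstURL."""
    cmd = ["globus-url-copy"]
    if verbose:
        cmd.append("-vb")  # report bytes transferred and transfer rate
    if parallel_streams is not None:
        cmd += ["-p", str(parallel_streams)]  # number of parallel data streams
    cmd += [src_url, dst_url]
    return cmd


# Client/server transfer: remote server to local disk (hypothetical host and paths).
download = build_copy_command(
    "gsiftp://gridftp.example.org/data/run1.dat",
    "file:///tmp/run1.dat",
    parallel_streams=4,
    verbose=True,
)

# Third-party transfer: both URLs name remote servers (hypothetical hosts).
third_party = build_copy_command(
    "gsiftp://source.example.org/data/run1.dat",
    "gsiftp://dest.example.org/archive/run1.dat",
)

print(shlex.join(download))
print(shlex.join(third_party))
```

The lists can be passed to subprocess.run on a machine where the Globus tools and valid credentials are available.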
10
Other clients
  UberFTP
  Reliable File Transfer (RFT) service
  Custom clients using the Globus C and Java client libraries
  All these clients require non-trivial configuration
    Security setup
  None of these clients provides a graphical user interface
11
GridFTP GUI
  Drag and drop
  Zero configuration
    Integrated with MyProxy
    Automatically trusts the CAs that are part of the IGTF distribution
  Fault tolerant
  Transfer status monitoring
  Optimized for performance
12
Snapshot of the GUI
13
Fault tolerant
  Better fault tolerance than other GridFTP clients
    Like other clients, the GUI can recover from transient server and network failures
    globus-url-copy cannot recover from its own failures
    The GUI can recover from its own failures
    Unlike RFT, the GUI stores information on the local file system
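One way a client can recover from its own failures is to record completed transfers in a file on the local file system and skip them after a restart. The sketch below illustrates that general idea only. The JSON layout, the file name transfer_state.json, and the function names are assumptions made for this example; the slides do not describe the GUI's actual on-disk format.

```python
import json
import os
import tempfile


def load_state(path):
    """Return the saved state, or an empty state if no file exists yet."""
    if not os.path.exists(path):
        return {"done": []}
    with open(path) as f:
        return json.load(f)


def save_state(path, state):
    """Write the state atomically so a crash cannot leave a half-written file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def run_transfers(files, state_path, transfer):
    """Transfer every file not already recorded as done; checkpoint after each."""
    done = set(load_state(state_path)["done"])
    for name in files:
        if name in done:
            continue  # finished before the restart, skip it
        transfer(name)
        done.add(name)
        save_state(state_path, {"done": sorted(done)})
    return sorted(done)


def demo():
    """Simulate a client crash on the second file, then a restart."""
    files = ["a.dat", "b.dat", "c.dat"]
    attempts = []

    def flaky(name):
        attempts.append(name)
        if name == "b.dat" and attempts.count("b.dat") == 1:
            raise RuntimeError("simulated client crash")

    with tempfile.TemporaryDirectory() as d:
        state_path = os.path.join(d, "transfer_state.json")
        try:
            run_transfers(files, state_path, flaky)
        except RuntimeError:
            pass  # the "client" died here
        finished = run_transfers(files, state_path, flaky)  # restart
    return attempts, finished


print(demo())
```

After the restart, a.dat is not sent again; only b.dat is retried and c.dat is sent for the first time.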
14
Lots of small files
  Scientific experiments produce huge volumes of data
    Individual file size is modest, on the order of kilobytes or megabytes
    Hundreds of thousands of files to transfer every day
    The size of the entire dataset is tremendous, from hundreds of gigabytes to hundreds of terabytes
15
Advanced Photon Source
  Advanced Photon Source at Argonne
    Dozens of samples may be acquired for one experiment every day
    Each sample generates about 2,000 raw data files
    After processing, each sample produces an additional 2,000 reconstructed files
    Each file is 8 to 16 MB in size
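The figures above imply a daily file count and data volume that can be worked out directly. In this sketch the per-sample numbers come from the slide, while the 24 samples per day is a stand-in for "dozens" chosen only for illustration, and 1 GB is taken as 1,000 MB.

```python
RAW_FILES_PER_SAMPLE = 2000      # from the slide
RECON_FILES_PER_SAMPLE = 2000    # from the slide
MIN_FILE_MB, MAX_FILE_MB = 8, 16  # from the slide


def daily_load(samples_per_day):
    """Return (file count, minimum GB, maximum GB) for one day of acquisition."""
    files = samples_per_day * (RAW_FILES_PER_SAMPLE + RECON_FILES_PER_SAMPLE)
    min_gb = files * MIN_FILE_MB / 1000
    max_gb = files * MAX_FILE_MB / 1000
    return files, min_gb, max_gb


# One sample: 4,000 files, 32 to 64 GB.
print(daily_load(1))
# Assumed 24 samples per day: 96,000 files, 768 to 1,536 GB.
print(daily_load(24))
```

Even a single sample means thousands of separate transfers, which is the motivation for handling many small files efficiently.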
16
Lots of small files
  Transfer thread pool
    Move multiple files concurrently
    Maximize the utilization of network bandwidth
    Improve transfer performance
  Two windows for status information
    Directory window lists all directories and their transfer status
    File window lists all files under the active directory
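The thread pool idea can be sketched with Python's standard library. This is an illustration of the pattern, not the GUI's implementation: a local file copy stands in for a GridFTP transfer, the pool size of 4 is arbitrary, and the status dictionary is a simplified stand-in for the per-file status window.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def transfer_file(src, dst_dir):
    """Stand-in for one GridFTP transfer: copy a single file locally."""
    return Path(shutil.copy2(src, dst_dir / src.name))


def transfer_directory(src_dir, dst_dir, workers=4):
    """Move all files in src_dir concurrently and track per-file status."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    files = sorted(p for p in src_dir.iterdir() if p.is_file())
    status = {p.name: "pending" for p in files}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(transfer_file, p, dst_dir): p for p in files}
        for fut in as_completed(futures):
            name = futures[fut].name
            try:
                fut.result()
                status[name] = "done"
            except OSError:
                status[name] = "failed"
    return status


def demo():
    """Create 20 small files and transfer them with a pool of 4 threads."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src"
        dst = Path(tmp) / "dst"
        src.mkdir()
        for i in range(20):
            (src / f"file_{i:03d}.dat").write_bytes(b"x" * 1024)
        status = transfer_directory(src, dst, workers=4)
        copied = sorted(p.name for p in dst.iterdir())
    return status, copied


status, copied = demo()
print(len(copied), "files transferred")
```

With many small files, per-file setup cost dominates, so keeping several transfers in flight at once keeps the network busy while individual transfers start and finish.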
17
Experiment Setup
  We conducted all of our experiments using TeraGrid NCSA nodes and University of Chicago nodes
  GridFTP GUI is compared with scp and globus-url-copy
  TCP is configured as the underlying data transport protocol
18
Experiment Results
  Time taken to transfer a single file
  Eight file sizes were used: 1 MB, 5 MB, 10 MB, 50 MB, 100 MB, 300 MB, 500 MB, and 1000 MB
  Data was moved from NCSA to the University of Chicago
19
Experiment Results (cont.)
  Data transfer time for a directory with tens of thousands of small files
  Each file is 1 MB
  Directories of 10,000, 20,000, 30,000, 40,000, and 50,000 files were created
  Data flowed from the University of Chicago to NCSA
20
Questions