The Globus GridFTP Framework and Server John Bresnahan, Mike Link and Raj Kettimuthu (Presenting) Math & Computer Science Division, Argonne National Laboratory,


The Globus GridFTP Framework and Server John Bresnahan, Mike Link and Raj Kettimuthu (Presenting) Math & Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, U.S.A.

12/04/07 National Tsing Hua University

Outline
- Motivation
- GridFTP Protocol
- Globus GridFTP design and architecture
- Performance
- New features
- Summary

Motivation

Motivation
- Science is increasingly data-driven
- Geographically distributed communities of scientists need to access and analyze large amounts of data
  - Simulation science applications such as climate modeling
  - Experimental science applications such as high-energy physics
- Rapid increase in the raw capacity of wide area networks makes it feasible to move large amounts of data across the WAN

Motivation
- NSF TeraGrid links large clusters and storage systems at nine sites
  - With a network providing up to 30 Gbit/s
- In principle, data can move across this network at > 3 Gbyte/s or 10 Tbyte/hr
- In practice, orchestration of such transfers is technically challenging
  - Exploit parallelism in multiple dimensions
  - Deal with failures of various sorts
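The rates quoted on this slide follow from a simple unit conversion. The sketch below assumes decimal units (1 Tbyte = 1000 Gbyte) and ignores protocol overhead, so the results are upper bounds that sit above the "> 3 Gbyte/s or 10 Tbyte/hr" figures.

```python
# Unit conversion for the TeraGrid figure quoted above.
# Assumption: decimal units and zero protocol overhead.
LINK_GBIT_PER_S = 30  # peak network rate from the slide

gbyte_per_s = LINK_GBIT_PER_S / 8          # 8 bits per byte
tbyte_per_hr = gbyte_per_s * 3600 / 1000   # 3600 s per hour, 1000 Gbyte per Tbyte

print(f"{gbyte_per_s} Gbyte/s, {tbyte_per_hr} Tbyte/hr")
```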

Motivation
- Effective end-to-end data transfers thus demand a systems approach
  - File systems, computers, network interfaces, and network protocols are managed in an integrated fashion to meet performance and robustness goals
- These considerations motivate the work that I describe here
  - Design, implementation, and evaluation of a modular and extensible data transfer system suitable for wide-area and high-performance environments

The GridFTP Protocol

What is GridFTP?
- A secure, robust, fast, efficient, standards-based, widely accepted data transfer protocol
- A protocol
  - Multiple independent implementations can interoperate
    - This works: Fermilab has an implementation with their dCache system, and U. Virginia has a .NET implementation; both work with ours
    - Lots of people have developed clients independent of the Globus Project
- The Globus Toolkit supplies a reference implementation:
  - Server
  - Client tools (globus-url-copy)
  - Development libraries

GridFTP: The Protocol
- Existing standards
  - RFC 959: File Transfer Protocol
  - RFC 2228: FTP Security Extensions
  - RFC 2389: Feature Negotiation for the File Transfer Protocol
  - Draft: FTP Extensions
  - GridFTP: Protocol Extensions to FTP for the Grid
    - Grid Forum Recommendation
    - GFD.20

Understanding GridFTP
- GridFTP (and normal FTP) use (at least) two separate socket connections
- Control channel
  - Command/response
  - Basic file system operations, e.g. mkdir, delete
  - Used to establish data channels
- Data channel
  - Pathway over which the file is transferred
  - Many different underlying protocols can be used
    - The MODE command determines the protocol

Simple Two Party Transfer
- GridFTP (and normal FTP) has 3 distinct components:
  - Client and server protocol interpreters (PIs), which handle the control channel protocol
  - Data Transfer Process (DTP), which handles access to the actual data and its movement via the data channel
[Diagram: a standard FTP client (Client PI, DTP) and a standard FTP server (Server PI, DTP)]

Simple Third Party Transfer
- Client initiates data transfer between 2 servers
- Client forms a control channel (CC) with each of the 2 servers
- Commands are routed through the client to establish a data channel (DC) between the two servers
- Data flows directly between servers
  - Client is notified by each server PI when the transfer is complete
[Diagram: one Client PI and two servers, each with a Server PI and a DTP]
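The control-channel traffic for a third-party transfer can be sketched as an ordered list of standard FTP commands (PASV, PORT, STOR, RETR from RFC 959). The function name and the "src"/"dst" labels are hypothetical, and a real client would also wait for and check each reply before sending the next command.

```python
def third_party_commands(src_path, dst_path, pasv_hostport):
    """Return (server, command) pairs for a server-to-server transfer.

    pasv_hostport is the "h1,h2,h3,h4,p1,p2" string taken from the
    destination server's reply to PASV, so PASV must already have been
    sent and answered before the remaining commands go out.
    """
    return [
        ("dst", "PASV"),                   # destination opens a listener
        ("src", f"PORT {pasv_hostport}"),  # source learns where to connect
        ("dst", f"STOR {dst_path}"),       # destination stores what arrives
        ("src", f"RETR {src_path}"),       # source sends the file over the DC
    ]

for server, command in third_party_commands("/data/in.dat", "/data/out.dat",
                                            "10,0,0,2,19,137"):
    print(server, command)
```

The file data never passes through the client; the client only relays the listener address from one control channel to the other.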

Control Channel Establishment
- Server listens on a well-known port (2811)
- Client forms a TCP connection to the server
- Banner message
- Authentication
  - Anonymous
  - Clear text USER/PASS
  - Base64-encoded GSI handshake (Grid Security Infrastructure, based on PKI)
- Accepted / Rejected

Data Channel Establishment
- PASV command
  - Sent to the passive side of the transfer
  - The passive side listens for a connection and replies with its contact information (IP address and port)
- PORT IP,PORT
  - Actively establishes a connection to a given passive listener
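The contact information exchanged by PASV and PORT uses the RFC 959 encoding h1,h2,h3,h4,p1,p2, where the port is p1 * 256 + p2. A minimal sketch of both directions follows; the function names are illustrative and the address is made up.

```python
import re

def parse_pasv_reply(reply):
    """Extract (ip, port) from a 227 reply such as
    '227 Entering Passive Mode (192,168,1,10,19,137)'."""
    match = re.search(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)", reply)
    if match is None:
        raise ValueError(f"no host/port found in reply: {reply!r}")
    h1, h2, h3, h4, p1, p2 = (int(g) for g in match.groups())
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2

def format_port_command(ip, port):
    """Build the PORT command that points the active side at a listener."""
    fields = ip.split(".") + [str(port // 256), str(port % 256)]
    return "PORT " + ",".join(fields)

ip, port = parse_pasv_reply("227 Entering Passive Mode (192,168,1,10,19,137)")
print(ip, port)
print(format_port_command(ip, port))
```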

Data Channel Establishment
[Diagram: Client PI, Server PI, and DTP; the DTP connects to IP:PORT]

Data Channel Protocols
- MODE command
  - Allows the client to select the data channel protocol
- MODE S
  - Stream mode, no framing
  - Legacy RFC 959
- MODE E
  - GridFTP extension
  - Parallel TCP streams
  - Data channel caching
[Block header layout: Descriptor (8 bits) | Size (64 bits) | Offset (64 bits)]
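The MODE E block header shown on the slide is 8 + 64 + 64 bits, or 17 bytes. The sketch below packs and unpacks such a header with Python's struct module. It assumes network (big-endian) byte order and treats the descriptor as an opaque flags byte; the function names are illustrative.

```python
import struct

# ">BQQ": big-endian, 1-byte descriptor, 8-byte size, 8-byte offset.
# Byte order is an assumption here; check GFD.20 before relying on it.
HEADER = struct.Struct(">BQQ")

def pack_block(descriptor, offset, payload):
    """Prefix a payload with a descriptor/size/offset header."""
    return HEADER.pack(descriptor, len(payload), offset) + payload

def unpack_block(block):
    """Split a framed block into (descriptor, offset, payload)."""
    descriptor, size, offset = HEADER.unpack_from(block)
    payload = block[HEADER.size:HEADER.size + size]
    return descriptor, offset, payload

framed = pack_block(0, 4096, b"example data")
print(HEADER.size, unpack_block(framed))
```

Because every block carries its own offset, blocks can travel over any of the parallel streams and still be placed correctly at the receiver.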

What issues are we addressing?
- Striping
  - Storage systems are often clusters, and we need to be able to utilize all of that parallelism
- Collective operations
  - Essentially, the striping should be invisible to the outside world
- Uniform interface
  - Ideally, any data source can be treated the same way

What issues are we addressing?
- Network protocol independence
  - TCP has well-known issues with high bandwidth-delay product networks
  - Need to be able to take advantage of aggressive protocols on circuits
- Diverse failure modes
  - Much is happening under the covers, so the system must be resilient to failures
- End-to-end performance
  - We need to be able to manage performance for a wide range of resources
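The bandwidth-delay product (BDP) is the amount of data that must be in flight to keep a path full, and a single TCP connection cannot exceed window / RTT. A worked example follows, using illustrative figures of 1 Gbit/s and 60 ms RTT; the 64 KiB window is an assumed untuned value, not a number from the talk.

```python
def bdp_bytes(bandwidth_bit_per_s, rtt_ms):
    """Bytes that must be in flight to fill the path."""
    return bandwidth_bit_per_s * rtt_ms // 1000 // 8

def max_throughput_bit_per_s(window_bytes, rtt_ms):
    """Upper bound on one TCP stream: one window per round trip."""
    return window_bytes * 8 * 1000 // rtt_ms

needed = bdp_bytes(1_000_000_000, 60)               # 1 Gbit/s, 60 ms RTT
limited = max_throughput_bit_per_s(64 * 1024, 60)   # assumed 64 KiB window

print(f"window needed: {needed} bytes")
print(f"one stream with a 64 KiB window: {limited} bit/s")
```

Under these assumptions the path needs about 7.5 Mbyte in flight, while an untuned stream tops out below 9 Mbit/s, which is why buffer tuning and parallel streams matter on such links.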

What did the GridFTP protocol add?
- Extended Block Mode
  - Data is sent in "packets" with a header containing a 64-bit offset and length
  - Allows out-of-order reception of packets
- Restart and performance markers
  - Allow for robust restart and performance monitoring
- SPAS/SPOR
  - Striped PASV and striped PORT
  - Allows a list of IP/ports to be returned
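Out-of-order reception works because each block names its own position in the file. A minimal receiver-side sketch follows, assuming blocks arrive as (offset, payload) pairs that do not overlap; the function name is illustrative.

```python
def reassemble(blocks, total_size):
    """Place (offset, payload) blocks into a buffer in arrival order."""
    buffer = bytearray(total_size)
    received = 0
    for offset, payload in blocks:
        end = offset + len(payload)
        if end > total_size:
            raise ValueError("block extends past end of file")
        buffer[offset:end] = payload   # write at the position the header names
        received += len(payload)
    if received != total_size:
        raise ValueError("transfer incomplete")
    return bytes(buffer)

# Blocks arriving out of order, as they might over parallel streams.
arrived = [(6, b"world"), (0, b"hello "), (11, b"!")]
print(reassemble(arrived, 12))
```

The same bookkeeping is what makes restart possible: the receiver knows exactly which byte ranges it already holds.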

What did the GridFTP protocol add?
- Data channel authentication
  - Needed because, in a third-party transfer, you don't know who will connect to the listener
- ESTO/ERET
  - Allows for additional processing on the data prior to storage/transmission
  - We use this for partial file transfers
- SBUF/ABUF
  - Manual and automatic TCP buffer tuning
- Options to set parallelism/striping parameters

Parallelism vs Striping

Architecture / Design of our Implementation

Overall Architecture
[Diagram: a Client PI with control channels to two Server PIs; each Server PI is linked to its DTP by an internal IPC API; a data channel connects the DTPs]
- Description of transfer: completely server-internal communication. The protocol is unspecified and left up to the implementation.
- Info on transfer: restart markers, performance markers, etc. The Server PI optionally processes these, then sends them to the Client PI.

Possible Configurations
[Diagram: control and data paths in four configurations: Typical Installation, Separate Processes, Striped Server, Striped Server (future)]

Data Transfer Processor
[Diagram: data source or sink, Data Storage Interface, Data Processing Module, Data Channel Protocol Module, data channel]

Data Storage Interface
- This is a very powerful abstraction
- Several can be available and loaded dynamically via the ERET/ESTO commands
- Anything that can implement the interface can be accessed via the GridFTP protocol
- We have implemented
  - POSIX file (used for performance testing)
  - HPSS (tape system; IBM)
  - Storage Resource Broker (SRB; SDSC)
  - NeST (disk space reservation; UWis/Condor)
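The idea of a storage interface can be sketched as an abstract class that any backend implements, so the transfer logic never needs to know what kind of storage sits behind it. Everything below is hypothetical: the class and method names are invented for illustration and are not the Globus DSI API, which is a C interface.

```python
from abc import ABC, abstractmethod

class StorageInterface(ABC):
    """Hypothetical stand-in for a data storage interface."""

    @abstractmethod
    def read(self, name, offset, length): ...

    @abstractmethod
    def write(self, name, offset, data): ...

class InMemoryStorage(StorageInterface):
    """Toy backend; a real one might wrap POSIX files or a tape system."""

    def __init__(self):
        self._objects = {}

    def read(self, name, offset, length):
        return bytes(self._objects[name][offset:offset + length])

    def write(self, name, offset, data):
        buf = self._objects.setdefault(name, bytearray())
        end = offset + len(data)
        if len(buf) < end:
            buf.extend(bytes(end - len(buf)))  # zero-fill up to the new end
        buf[offset:end] = data

def copy_object(src, dst, name, size, block=4):
    """Transfer logic written only against the interface."""
    for offset in range(0, size, block):
        dst.write(name, offset, src.read(name, offset, block))

source, sink = InMemoryStorage(), InMemoryStorage()
source.write("f", 0, b"hello world")
copy_object(source, sink, "f", 11)
print(sink.read("f", 0, 11))
```

Offset-based read and write calls are used here because they fit partial transfers and out-of-order blocks; that choice is part of the sketch, not a statement about the real interface.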

Globus Extensible IO (XIO) System
- The XIO framework presents a standard open/close/read/write interface to many different protocol implementations
  - Including TCP, UDP, HTTP, and now UDT
- The protocol implementations are called drivers
  - A driver can be dynamically loaded and stacked by any Globus XIO application
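Driver stacking can be illustrated with a toy model in which every driver exposes the same read/write calls and a transform driver wraps the driver below it. The classes here are hypothetical and only mimic the concept; Globus XIO itself is a C library with its own API.

```python
import zlib

class MemoryTransport:
    """Bottom of the stack: stands in for a transport driver such as TCP."""

    def __init__(self):
        self._wire = b""

    def write(self, data):
        self._wire = data

    def read(self):
        return self._wire

class CompressTransform:
    """Transform driver: changes the data, then passes the call down."""

    def __init__(self, below):
        self._below = below

    def write(self, data):
        self._below.write(zlib.compress(data))

    def read(self):
        return zlib.decompress(self._below.read())

def build_stack(transport, *transform_classes):
    """Wrap the transport in each transform, bottom to top."""
    top = transport
    for cls in transform_classes:
        top = cls(top)
    return top

transport = MemoryTransport()
stack = build_stack(transport, CompressTransform)
stack.write(b"a" * 1000)
print(len(transport.read()), stack.read() == b"a" * 1000)
```

The application only ever talks to the top of the stack, so swapping the transport (for example TCP for UDT) does not change application code.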

Globus XIO Approach
[Diagram: an application reaches disk, network protocols, and special devices through Globus XIO drivers]

Globus XIO Framework
- Moves the data from user to driver stack
- Manages the interactions between drivers
- Assists in the creation of drivers
  - Asynchronous support
  - Close and EOF barriers
  - Error checking
  - Internal API for passing operations down the stack
[Diagram: User API, Framework, Driver Stack (Transform, Transport)]

Performance Results

Experimental results
- Three settings
  - LAN msec RTT and 622 Mbit/s
  - MAN msec RTT and 1 Gbit/s
  - WAN - 60 msec RTT and 30 Gbit/s
  - MAN - Distributed Optical Testbed in the Chicago area
  - WAN - TeraGrid link between NCSA in Illinois and SDSC in California; each individual has a 1 Gbit/s bottleneck link

Experimental results - LAN

Experimental results - MAN

Experimental results - WAN

Memory to Memory Striping Performance

Disk to Disk Striping Performance

Scalability tests
- Evaluate performance as a function of the number of clients
- DiPerf test framework used to deploy the clients
- Ran the server on a 2-processor 1125 MHz x86 machine running Linux
  - 1.5 GB memory and 2 GB swap space
  - 1 Gbit/s Ethernet network connection and 1500 B MTU
- Clients created on hosts distributed over PlanetLab and at the University of Chicago (UofC)

Scalability Results
[Plot: left axis shows load, response time, and memory allocated; right axis shows throughput and CPU %]

Scalability results
- 1800 clients mapped in a round-robin fashion onto 100 PlanetLab hosts and 30 UofC hosts
- A new client was created once a second and ran for 2400 seconds
  - During this time, it repeatedly requested the transfer of a 10 Mbyte file from the server's disk to the client's /dev/null
- Total of Gbytes transferred in 15,428 transfers

Scalability results
- Server sustained 1800 concurrent requests with 70% CPU and 0.94 Mbyte memory per request
- CPU usage, throughput, and response time remain reasonable even when allocated memory exceeds physical memory
  - Meaning paging is occurring
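The paging claim can be checked against the figures quoted in the talk: 1800 requests at 0.94 Mbyte each, on a server with 1.5 GB of physical memory. The sketch assumes 1 GB = 1024 Mbyte and ignores memory used by the operating system, which would only widen the gap.

```python
CLIENTS = 1800
MBYTE_PER_REQUEST = 0.94
PHYSICAL_MBYTE = 1.5 * 1024   # assumption: 1 GB = 1024 Mbyte

allocated_mbyte = CLIENTS * MBYTE_PER_REQUEST
overflow_mbyte = allocated_mbyte - PHYSICAL_MBYTE

print(f"allocated: {allocated_mbyte:.0f} Mbyte, "
      f"physical: {PHYSICAL_MBYTE:.0f} Mbyte, "
      f"overflow: {overflow_mbyte:.0f} Mbyte")
```

At full load the request memory alone exceeds physical memory by roughly 150 Mbyte, so some of it must live in swap.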

New features

SSH security mechanism for GridFTP
- GridFTP traditionally uses GSI for establishing secure connections
- In some situations, it is preferable to use the SSH security mechanism
- Leverages the fact that an SSH client can remotely execute programs by forming a secure connection with sshd

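SSHFTP Interactions
[Diagram: the client PI (CPI) runs ssh to sshd on port 22; sshd (root) starts a GridFTP server as the user and the control channel is carried over stdin/stdout; the standard GridFTP server port 2811 is shown for comparison]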

GridFTP over UDT
- UDT is an application-level data transport protocol that uses UDP to transfer data
- Implements its own reliability and congestion control mechanisms
- Achieves good performance on high-bandwidth, high-delay networks where TCP has significant limitations
- GridFTP uses the Globus XIO interface to invoke network I/O operations
  - Created an XIO driver for the UDT reference implementation
  - Enabled GridFTP to use it as an alternate transport protocol

GridFTP over UDT
[Table: throughput in Mbit/s, Argonne to NZ and Argonne to LA, for Iperf (1 stream, 8 streams), GridFTP mem TCP (1 stream, 8 streams), GridFTP disk TCP (1 stream, 8 streams), GridFTP mem UDT, GridFTP disk UDT, UDT mem, UDT disk]

Summary
- The GridFTP protocol provides a good set of features for data movement requirements in the Grid
- The Globus implementation of this protocol provides a flexible design and architecture for integrating with different communities, storage systems, and protocols
- Our implementation is robust and performant over a range of environments

Questions?