The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.


1 The Globus Striped GridFTP Framework and Server
Bill Allcock¹ (presenting), John Bresnahan¹, Raj Kettimuthu¹, Mike Link², Catalin Dumitrescu², Ioan Raicu², Ian Foster¹,²
¹ Math & Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, U.S.A.
² Dept of Computer Science, University of Chicago, Chicago, IL 60615, U.S.A.

2 Introduction
- Problem we are addressing
- A brief discussion of the GridFTP Protocol
- Design / Architecture of our implementation
- Performance results

3 Technology Drivers
- Internet revolution: 100M+ hosts
  - Collaboration & sharing the norm
- Universal Moore's law: ×10^3 per 10 yrs
  - Sensors as well as computers
- Petascale data tsunami
  - Gating step is analysis
- & our old infrastructure?
[Figure labels: "114 genomes", "735 in progress", "You are here"]

4 What issues are we addressing?
- Striping
  - Storage systems are often clusters, and we need to be able to utilize all of that parallelism
- Collective Operations
  - Essentially, the striping should be invisible to the outside world
- Uniform interface
  - Ideally, any data source can be treated the same way

5 What issues are we addressing?
- Network Protocol Independence
  - TCP has well-known issues with high Bandwidth-Delay Product networks
  - Need to be able to take advantage of aggressive protocols on circuits
- Diverse Failure Modes
  - Much happening under the covers, so must be resilient to failures
- End-to-End Performance
  - We need to be able to manage performance for a wide range of resources

6 The GridFTP Protocol

7 What is GridFTP?
- A secure, robust, fast, efficient, standards-based, widely accepted data transfer protocol
- A Protocol
  - Multiple independent implementations can interoperate
- This works. Fermilab has an implementation with their dCache system, and U. Virginia has a .NET implementation; both work with ours.
- Lots of people have developed clients independent of the Globus Project.
- The Globus Toolkit supplies a reference implementation:
  - Server
  - Client tools (globus-url-copy)
  - Development Libraries

8 GridFTP: The Protocol
- Existing standards
  - RFC 959: File Transfer Protocol
  - RFC 2228: FTP Security Extensions
  - RFC 2389: Feature Negotiation for the File Transfer Protocol
  - Draft: FTP Extensions
  - GridFTP: Protocol Extensions to FTP for the Grid
- Grid Forum Recommendation
- GFD.20
- http://www.ggf.org/documents/GWD-R/GFD-R.020.pdf

9 What did the GridFTP protocol add?
- Extended Block Mode
  - Data is sent in packets with a header containing a 64-bit offset and length
  - Allows out-of-order reception of packets
- Restart and Performance Markers
  - Allows for robust restart and performance monitoring
- SPAS/SPOR
  - Striped PASV and striped PORT
  - Allows a list of IP/ports to be returned
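The Extended Block Mode bullets above can be illustrated with a small Python sketch. It assumes a 17-byte header (one descriptor byte, then a 64-bit byte count and a 64-bit offset, in network byte order); that layout and the helper names `pack_block`, `unpack_block`, and `reassemble` are illustrative assumptions, not taken from the slides or from the Globus code.

```python
import struct

# Assumed header layout for illustration: 1 descriptor byte, then a 64-bit
# byte count and a 64-bit offset, all in network byte order (17 bytes total).
HEADER = struct.Struct("!BQQ")


def pack_block(offset: int, payload: bytes, descriptor: int = 0) -> bytes:
    """Prefix a payload with its length and its offset in the file."""
    return HEADER.pack(descriptor, len(payload), offset) + payload


def unpack_block(block: bytes):
    """Return (offset, payload) for one framed block."""
    _descriptor, count, offset = HEADER.unpack_from(block)
    payload = block[HEADER.size:HEADER.size + count]
    return offset, payload


def reassemble(blocks, total_size: int) -> bytes:
    """Place each block at its own offset, so arrival order does not matter."""
    out = bytearray(total_size)
    for block in blocks:
        offset, payload = unpack_block(block)
        out[offset:offset + len(payload)] = payload
    return bytes(out)


data = b"GridFTP extended block mode"
chunk = 8
blocks = [pack_block(i, data[i:i + chunk]) for i in range(0, len(data), chunk)]
blocks.reverse()  # simulate out-of-order arrival
result = reassemble(blocks, len(data))
```

Because every block carries its own offset, the receiver never needs the blocks in sequence, which is what makes parallel and striped data channels practical.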

10 What did the GridFTP Protocol add?
- Data Channel Authentication
  - Needed since in third-party transfer, you don't know who will connect to the listener
- ESTO/ERET
  - Allows for additional processing on the data prior to storage/transmission
  - We use this for partial file transfers
- SBUF/ABUF
  - Manual and automatic TCP buffer tuning
- Options to set parallelism/striping parameters
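As a rough illustration of why TCP buffer tuning matters, the sketch below computes the bandwidth-delay product (BDP), the amount of data that must be in flight to keep a link full. The function names and the example link figures are hypothetical, and dividing the BDP evenly across parallel streams is a common rule of thumb rather than something stated in the slides.

```python
def bdp_bytes(bandwidth_bits_per_s: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes: bits/s * seconds / 8."""
    return bandwidth_bits_per_s * rtt_ms // 8000


def per_stream_buffer(bandwidth_bits_per_s: int, rtt_ms: int, streams: int) -> int:
    """Rule-of-thumb buffer per stream when the BDP is shared by several streams."""
    total = bdp_bytes(bandwidth_bits_per_s, rtt_ms)
    return -(-total // streams)  # ceiling division


# Hypothetical link: 1 Gbit/s with a 100 ms round-trip time.
single = bdp_bytes(1_000_000_000, 100)
shared = per_stream_buffer(1_000_000_000, 100, 4)
```

With these example figures a single stream would need a buffer of about 12.5 MB, far above typical default TCP buffer sizes, which is why manual or automatic tuning is exposed in the protocol.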

11 Client/Server vs 3rd Party
- Client/Server Model (Client, Server)
  1. Establish control connection
  2. Establish data connection
- 3rd Party Transfer Model (Client, Server, Server)
  1. Establish control connection
  2. Establish control connection
  3. Establish data connection
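The third-party model can be sketched as the order of control-channel commands a client might issue, using the standard FTP commands PASV, PORT, STOR, and RETR from RFC 959. This is a simulation only: the helper names, server labels, and the address are hypothetical, and a real client would parse the listener address from the PASV reply instead of receiving it as a parameter.

```python
def port_argument(host: str, port: int) -> str:
    """Encode host and port as the h1,h2,h3,h4,p1,p2 argument used by PORT."""
    return ",".join(host.split(".") + [str(port // 256), str(port % 256)])


def third_party_plan(source: str, dest: str, path: str,
                     listener_host: str, listener_port: int):
    """Return (server, command) pairs in the order the client would send them.

    listener_host and listener_port stand in for the address the destination
    server would report in its reply to PASV.
    """
    arg = port_argument(listener_host, listener_port)
    return [
        (dest, "PASV"),           # destination opens a listener
        (source, f"PORT {arg}"),  # source is told where to connect
        (dest, f"STOR {path}"),   # destination prepares to store the file
        (source, f"RETR {path}"), # source sends the file over the data channel
    ]


plan = third_party_plan("serverA", "serverB", "/data/file.dat", "192.0.2.10", 50000)
```

The client holds two control connections and never touches the data itself; the data connection runs directly between the two servers.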

12 Parallelism vs Striping

13 Architecture / Design of our Implementation

14 Overall Architecture
[Diagram labels: Client PI; Control Channels; Server PI and DTP (two servers); Internal IPC API; Data Channel]
- Description of transfer: completely server-internal communication. Protocol is unspecified and left up to the implementation.
- Info on transfer: restart markers, performance markers, etc. Server PI optionally processes these, then sends them to the client PI.

15 Possible Configurations
[Diagram: four configurations, each showing Control and Data components]
- Typical Installation
- Separate Processes
- Striped Server
- Striped Server (future)

16 Data Transfer Processor
[Diagram labels: Data source or sink; Data Storage Interface; Data Processing Module; Data Channel Protocol Module; Data channel]

17 Data Storage Interface
- This is a very powerful abstraction
- Several can be available and loaded dynamically via the ERET/ESTO commands
- Anything that can implement the interface can be accessed via the GridFTP protocol
- We have implemented
  - POSIX file (used for performance testing)
  - HPSS (tape system; IBM)
  - Storage Resource Broker (SRB; SDSC)
  - NeST (disk space reservation; UWis/Condor)
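The idea that anything implementing the interface can be served the same way can be sketched in Python as follows. `StorageInterface`, `MemoryStorage`, and `transfer` are hypothetical names for illustration; the actual Globus Data Storage Interface is a C API with different signatures.

```python
from abc import ABC, abstractmethod


class StorageInterface(ABC):
    """Hypothetical minimal storage abstraction (not the real Globus DSI)."""

    @abstractmethod
    def read(self, offset: int, length: int) -> bytes: ...

    @abstractmethod
    def write(self, offset: int, data: bytes) -> None: ...

    @abstractmethod
    def size(self) -> int: ...


class MemoryStorage(StorageInterface):
    """In-memory backend; a file, tape, or broker backend would look the same."""

    def __init__(self, initial: bytes = b""):
        self._buf = bytearray(initial)

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self._buf[offset:offset + length])

    def write(self, offset: int, data: bytes) -> None:
        end = offset + len(data)
        if end > len(self._buf):
            self._buf.extend(bytes(end - len(self._buf)))  # grow with zero bytes
        self._buf[offset:end] = data

    def size(self) -> int:
        return len(self._buf)


def transfer(source: StorageInterface, sink: StorageInterface, block_size: int = 4) -> None:
    """Copy block by block using only the abstract interface."""
    offset = 0
    while offset < source.size():
        block = source.read(offset, block_size)
        sink.write(offset, block)
        offset += len(block)


src = MemoryStorage(b"any data source")
dst = MemoryStorage()
transfer(src, dst)
```

The `transfer` function never learns what kind of storage it is talking to, which is the property the slide describes as a uniform interface.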

18 Extensible IO (XIO) system
- Provides a framework that implements a Read/Write/Open/Close abstraction
- Drivers are written that implement the functionality (file, TCP, UDP, GSI, etc.)
- Different functionality is achieved by building protocol stacks
- GridFTP drivers will allow 3rd-party applications to easily access files stored under a GridFTP server
- Other drivers could be written to allow access to other data stores
- Changing drivers requires minimal change to the application code
- Ported GridFTP to use UDT in less than a day
  - AFTER the UDT driver was written

19 Globus XIO Approach
[Diagram labels: Application; Globus XIO; Driver; Network Protocol (twice); Disk; Special Device]

20 Globus XIO Framework
- Moves the data from user to driver stack
- Manages the interactions between drivers
- Assists in the creation of drivers
  - Asynchronous support
  - Close and EOF barriers
  - Error checking
  - Internal API for passing operations down the stack
[Diagram labels: User API; Framework; Driver Stack (Transform, Transport)]
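The driver-stack idea can be sketched in Python with one transform driver layered over one transport driver. All class names here are hypothetical and the XOR transform is only a stand-in for a real transform such as GSI; the actual Globus XIO library is a C API and is asynchronous, which this sketch does not attempt to model.

```python
class Driver:
    """Hypothetical base driver: forwards every operation down the stack."""

    def __init__(self, below=None):
        self.below = below

    def open(self):
        if self.below:
            self.below.open()

    def write(self, data: bytes):
        self.below.write(data)

    def read(self, n: int) -> bytes:
        return self.below.read(n)

    def close(self):
        if self.below:
            self.below.close()


class MemoryTransport(Driver):
    """Bottom of the stack: stands in for a file, TCP, or UDP transport."""

    def __init__(self):
        super().__init__()
        self.buffer = bytearray()
        self.is_open = False

    def open(self):
        self.is_open = True

    def write(self, data: bytes):
        self.buffer.extend(data)

    def read(self, n: int) -> bytes:
        data = bytes(self.buffer[:n])
        del self.buffer[:n]
        return data

    def close(self):
        self.is_open = False


class XorTransform(Driver):
    """Transform driver: alters the bytes on the way down and on the way up."""

    def __init__(self, below, key: int):
        super().__init__(below)
        self.key = key

    def write(self, data: bytes):
        self.below.write(bytes(b ^ self.key for b in data))

    def read(self, n: int) -> bytes:
        return bytes(b ^ self.key for b in self.below.read(n))


transport = MemoryTransport()
stack = XorTransform(transport, 0x5A)  # application talks only to the top
stack.open()
stack.write(b"hello")
wire = bytes(transport.buffer)  # what the transport actually holds
echoed = stack.read(5)
stack.close()
```

Swapping `MemoryTransport` for another transport would leave the application-side calls unchanged, which mirrors the slide's point that changing drivers requires minimal change to application code.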

21 What issues are we addressing?
- Striping
  - Storage systems are often clusters, and we need to be able to utilize all of that parallelism
- Collective Operations
  - Essentially, the striping should be invisible to the outside world
- Uniform interface
  - Ideally, any data source can be treated the same way

22 What issues are we addressing?
- Network Protocol Independence
  - TCP has well-known issues with high Bandwidth-Delay Product networks
  - Need to be able to take advantage of aggressive protocols on circuits
- Diverse Failure Modes
  - Much happening under the covers, so must be resilient to failures
- End-to-End Performance
  - We need to be able to manage performance for a wide range of resources

23 Performance Results

24 Comparison in Stream Mode

25 Parallel Stream Performance

26 Memory to Memory Striping Performance

27 Disk to Disk Striping Performance

28 Storage Performance SDSC

29 Storage Performance NCSA

30 Scalability Results

31 Summary
- The GridFTP protocol provides a good set of features for data movement requirements in the Grid.
- The Globus Striped Server (Zebra) implementation of this protocol provides a flexible design / architecture for integrating with different communities, storage systems, and protocols.
- Our implementation is robust and performant over a range of environments.

32 Questions?
