Globus Striped GridFTP Framework and Server Raj Kettimuthu, ANL and U. Chicago.

Slides:



Advertisements
Similar presentations
The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.
Advertisements

Cross-site data transfer on TeraGrid using GridFTP TeraGrid06 Institute User Introduction to TeraGrid June 12 th by Krishna Muriki
Middleware Support for RDMA-based Data Transfer in Cloud Computing Yufei Ren, Tan Li, Dantong Yu, Shudong Jin, Thomas Robertazzi Department of Electrical.
Chapter 17 Networking Patricia Roy Manatee Community College, Venice, FL ©2008, Prentice Hall Operating Systems: Internals and Design Principles, 6/E William.
GridFTP: File Transfer Protocol in Grid Computing Networks
GridFTP Introduction – Page 1Grid Forum 5 GridFTP Steve Tuecke Argonne National Laboratory.
Distributed Processing, Client/Server, and Clusters
Technical Architectures
VIA and Its Extension To TCP/IP Network Yingping Lu Based on Paper “Queue Pair IP, …” by Philip Buonadonna.
Socket Programming.
© 2007 Pearson Education Inc., Upper Saddle River, NJ. All rights reserved.1 Computer Networks and Internets with Internet Applications, 4e By Douglas.
16: Distributed Systems1 DISTRIBUTED SYSTEM STRUCTURES NETWORK OPERATING SYSTEMS The users are aware of the physical structure of the network. Each site.
Chapter 4.1 Interprocess Communication And Coordination By Shruti Poundarik.
Introduction to client/server architecture
File Transfer Protocol (FTP)
Lecture slides prepared for “Business Data Communications”, 7/e, by William Stallings and Tom Case, Chapter 8 “TCP/IP”.
Chapter 17 Networking Dave Bremer Otago Polytechnic, N.Z. ©2008, Prentice Hall Operating Systems: Internals and Design Principles, 6/E William Stallings.
Chapter Two Application Layer Prepared by: Dr. Bahjat Qazzaz CS Dept. Sept
Presentation on Osi & TCP/IP MODEL
Pooja Shetty Usha B Gowda.  Network File Systems (NFS)  Drawbacks of NFS  Parallel Virtual File Systems (PVFS)  PVFS components  PVFS application.
Layer Architecture of Network Protocols
Protocol Architectures. Simple Protocol Architecture Not an actual architecture, but a model for how they work Similar to “pseudocode,” used for teaching.
Data Management Kelly Clynes Caitlin Minteer. Agenda Globus Toolkit Basic Data Management Systems Overview of Data Management Data Movement Grid FTP Reliable.
Globus GridFTP: What’s New in 2007 Raj Kettimuthu Argonne National Laboratory and The University of Chicago.
GT Components. Globus Toolkit A “toolkit” of services and packages for creating the basic grid computing infrastructure Higher level tools added to this.
1 Version 3.0 Module 11 TCP Application and Transport.
Secure, Collaborative, Web Service enabled and Bittorrent Inspired High-speed Scientific Data Transfer Framework.
UDT: UDP based Data Transfer Protocol, Results, and Implementation Experiences Yunhong Gu & Robert Grossman Laboratory for Advanced Computing / Univ. of.
The Globus GridFTP Framework and Server John Bresnahan, Mike Link and Raj Kettimuthu (Presenting) Math & Computer Science Division, Argonne National Laboratory,
File and Object Replication in Data Grids Chin-Yi Tsai.
Reliable Data Movement using Globus GridFTP and RFT: New Developments in 2008 John Bresnahan Michael Link Raj Kettimuthu Argonne National Laboratory and.
Globus GridFTP and RFT: An Overview and New Features Raj Kettimuthu Argonne National Laboratory and The University of Chicago.
1 Networking Chapter Distributed Capabilities Communications architectures –Software that supports a group of networked computers Network operating.
UDT as an Alternative Transport Protocol for GridFTP Raj Kettimuthu Argonne National Laboratory The University of Chicago.
Introduction to dCache Zhenping (Jane) Liu ATLAS Computing Facility, Physics Department Brookhaven National Lab 09/12 – 09/13, 2005 USATLAS Tier-1 & Tier-2.
Application Layer Khondaker Abdullah-Al-Mamun Lecturer, CSE Instructor, CNAP AUST.
Managed Object Placement Service John Bresnahan, Mike Link and Raj Kettimuthu (Presenting) Argonne National Lab.
Communicating Security Assertions over the GridFTP Control Channel Rajkumar Kettimuthu 1,2, Liu Wantao 3,4, Frank Siebenlist 1,2 and Ian Foster 1,2,3 1.
OS Services And Networking Support Juan Wang Qi Pan Department of Computer Science Southeastern University August 1999.
DYNES Storage Infrastructure Artur Barczyk California Institute of Technology LHCOPN Meeting Geneva, October 07, 2010.
LEGS: A WSRF Service to Estimate Latency between Arbitrary Hosts on the Internet R.Vijayprasanth 1, R. Kavithaa 2,3 and Raj Kettimuthu 2,3 1 Coimbatore.
Data Management and Transfer in High-Performance Computational Grid Environments B. Allcock, J. Bester, J. Bresnahan, A. L. Chervenak, I. Foster, C. Kesselman,
CEOS Working Group on Information Systems and Services - 1 Data Services Task Team Discussions on GRID and GRIDftp Stuart Doescher, USGS WGISS-15 May 2003.
Chapter 131 Distributed Processing, Client/Server, and Clusters Chapter 13.
AMQP, Message Broker Babu Ram Dawadi. overview Why MOM architecture? Messaging broker like RabbitMQ in brief RabbitMQ AMQP – What is it ?
Globus and PlanetLab Resource Management Solutions Compared M. Ripeanu, M. Bowman, J. Chase, I. Foster, M. Milenkovic Presented by Dionysis Logothetis.
The Globus eXtensible Input/Output System (XIO): A protocol independent IO system for the Grid Bill Allcock, John Bresnahan, Raj Kettimuthu and Joe Link.
ALCF Argonne Leadership Computing Facility GridFTP Roadmap Bill Allcock (on behalf of the GridFTP team) Argonne National Laboratory.
AFS/OSD Project R.Belloni, L.Giammarino, A.Maslennikov, G.Palumbo, H.Reuter, R.Toebbicke.
Globus Data Storage Interface (DSI) - Enabling Easy Access to Grid Datasets Raj Kettimuthu, ANL and U. Chicago DIALOGUE Workshop August 2, 2005.
DMLite GridFTP frontend Andrey Kiryanov IT/SDC 13/12/2013.
Multimedia Retrieval Architecture Electrical Communication Engineering, Indian Institute of Science, Bangalore – , India Multimedia Retrieval Architecture.
Protocols and Services for Distributed Data- Intensive Science Bill Allcock, ANL ACAT Conference 19 Oct 2000 Fermi National Accelerator Laboratory Contributors:
1 Device Controller I/O units typically consist of A mechanical component: the device itself An electronic component: the device controller or adapter.
New Development Efforts in GridFTP Raj Kettimuthu Math & Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, U.S.A.
A Sneak Peak of What’s New in Globus GridFTP John Bresnahan Michael Link Raj Kettimuthu (Presenting) Argonne National Laboratory and The University of.
TCP/IP1 Address Resolution Protocol Internet uses IP address to recognize a computer. But IP address needs to be translated to physical address (NIC).
Connect communicate collaborate Performance Metrics & Basic Tools Robert Stoy, DFN EGI TF, Madrid September 2013.
A System for Monitoring and Management of Computational Grids Warren Smith Computer Sciences Corporation NASA Ames Research Center.
IST 201 Chapter 11 Lecture 2. Ports Used by TCP & UDP Keep track of different types of transmissions crossing the network simultaneously. Combination.
Chapter Objectives In this chapter, you will learn:
Module 4 Remote Login.
Study course: “Computing clusters, grids and clouds” Andrey Y. Shevel
Working at a Small-to-Medium Business or ISP – Chapter 7
Chapter 3: Windows7 Part 4.
Working at a Small-to-Medium Business or ISP – Chapter 7
An Introduction to Computer Networking
Working at a Small-to-Medium Business or ISP – Chapter 7
Distributed Systems Bina Ramamurthy 12/2/2018 B.Ramamurthy.
Outline Problem DiskRouter Overview Details Real life DiskRouters
Presentation transcript:

Globus Striped GridFTP Framework and Server Raj Kettimuthu, ANL and U. Chicago

3 August 2005The Ohio State University2 Outline l Introduction l Features l Motivation l Architecture l Globus XIO l Experimental Results

3 August 2005The Ohio State University3 What is GridFTP? l In Grid environments, access to distributed data is very important l Distributed scientific and engineering applications require: u Transfers of large amounts of data between storage systems, and u Access to large amounts of data by many geographically distributed applications and users for analysis, visualization etc l GridFTP - a general-purpose mechanism for secure, reliable and high performance data movement.

3 August 2005The Ohio State University4 Features l Standard FTP features get/put etc, third party control of data transfer u User at one site to initiate, monitor and control data transfer operation between two other sites l Authentication, data integrity, data confidentiality u Support Generic Security Services (GSS)- API authentication of the connections u Support user-controlled levels of data integrity and/or confidentiality

3 August 2005The Ohio State University5 Features l Parallel data transfer u Multiple transport streams between a single source and destination l Striped data transfer u 1 or more transport streams between m network endpoints on the sending side and n network endpoints on the receiving side

3 August 2005The Ohio State University6 Features l Partial file transfer u Some applications can benefit from transferring portions of file u For example, analyses that require access to subsets of massive object-oriented database files l Manual/Automatic control of TCP buffer sizes l Support for reliable and restartable data transfer

3 August 2005The Ohio State University7 Motivation l Continued commoditization of end system devices means the data sources and sinks are often clusters u A common configuration might have individual nodes connected by 1 Gbit/s Ethernet connection to a switch connected to external network at 10 Gbit/s or faster u Striped data movement - data distributed across a set of computers at one end is transferred to remote set of computers

3 August 2005The Ohio State University8 Motivation l Data sources and sinks come in many shapes and sizes u Clusters with local disks, clusters with parallel file system, geographically distributed data sources, archival storage systems u Enable clients to access such sources and sinks via a uniform interface u Make it easy to adapt our system to support different sources and sinks

3 August 2005The Ohio State University9 Motivation l Standard protocol for network data transfer remains TCP. l TCP’s congestion control algorithm can lead to poor performance u Particularly in default configurations and on paths with high bandwidth and high round trip time l Solutions include: u Careful tuning of TCP parameters, multiple TCP connections and substitution of alternate UDP based reliable protocols l We want to support such alternatives

3 August 2005The Ohio State University10 Architecture l GridFTP (and normal FTP) use (at least) two separate socket connections: u A control channel for carrying the commands and responses u A Data Channel for actually moving the data l GridFTP (and normal FTP) has 3 distinct components: u Client and server protocol interpreters which handle control channel protocol u Data Transfer Process which handles the accessing of actual data and its movement via the data channel

3 August 2005The Ohio State University11 Architecture l These components can be combined in various ways to create servers with different capabilities. u For example, combining the server PI and DTP components in one process creates a conventional FTP server u A striped server might use one server PI on the head node of a cluster and a DTP on all other nodes.

3 August 2005The Ohio State University12 Globus GridFTP Architecture

3 August 2005The Ohio State University13 Architecture l The server PI handles the control channel exchange. l In order for a client to contact a GridFTP server u Either the server PI must be running as a daemon and listening on a well known port (2811 for GridFTP), or u Some other service (such as inetd) must be listening on the port and be configured to invoke the server PI. l The client PI then carries out its protocol exchange with the server PI.

3 August 2005The Ohio State University14 Architecture l When the server PI receives a command that requires DTP activity, the server PI passes the description of the transfer to DTP l After that, DTP can carry out the transfer on its own. l The server PI then simply acts as a relay for transfer status information. u Performance markers, restart markers, etc.

3 August 2005The Ohio State University15 Architecture l PI-to-DTP communications are internal to the server u Protocol used can evolve with no impact on the client. l The data channel communication structure is governed by data layout. u If the number of nodes at both ends is equal, each node communicates with just one other node. u Otherwise, each sender makes a connection to each receiver, and sends data based on data offsets.

3 August 2005The Ohio State University16

3 August 2005The Ohio State University17 Data transfer pipeline l Data Transfer Process is architecturally, 3 distinct pieces: u Data Access Module, Data processing module and Data Channel Protocol Module

3 August 2005The Ohio State University18 Data access module l Number of storage systems in use by the scientific and engineering community u Distributed Parallel Storage System (DPSS) u High Performance Storage System (HPSS) u Distributed File System (DFS) u Storage Resource Broker (SRB) u HDF5 l Use incompatible protocols for accessing data and require the use of their own clients

3 August 2005The Ohio State University19 Data access module l It provides a modular pluggable interface to data storage systems. l Conceptually, the data access module is very simple. l It consist of several function signatures and a set of semantics. l When a new module is created, programmer implements the functions to provide the semantics associated with them.

3 August 2005The Ohio State University20 Data processing module l The data processing module - provides ability to manipulate the data prior to transmission. u Compression, on-the-fly concatenation of multiple files etc u Currently handled via the data access module u In future we plan to make this a separate module

3 August 2005The Ohio State University21 Data channel protocol module l This module handles the operations required to fetch data from, or send data to, the data channel. u A single server may support multiple data channel protocols u the MODE command is used to select the protocol to be used for a particular transfer. We use Globus eXtensible Input/Output (XIO) system as the data channel protocol module interface u Currently support two bindings: Stream-mode TCP and Extended Block Mode TCP

3 August 2005The Ohio State University22 Extended block mode l Striped and parallel data transfer require support for out-of-order data delivery l Extended block mode supports out-of- sequence data delivery l Extended block mode header l Descriptor is used to indicate if this block is a EOD marker, restart marker etc Descriptor 8 bits Byte Count 64 bits Offset 64 bits

3 August 2005The Ohio State University23 Globus XIO l Simple Open/Close/Read/Write API l Make single API do many types of IO l Many Grid IO needs can be treated as a stream of bytes l Open/close/read/write functionality satisfies most requirements l Specific drivers for specific protocols/devices

3 August 2005The Ohio State University24 Typical approach

3 August 2005The Ohio State University25 Globus XIO approach

3 August 2005The Ohio State University26 Experimental results l Three settings u LAN msec RTT and 622 Mbit/s u MAN msec RTT and 1 Gbit/s u WAN - 60 msec RTT and 30 Gbit/s u MAN - Distributed Optical Testbed in the Chicago area u WAN - TeraGrid link between NCSA in Illinois and SDSC in California - each individual has a 1Gbit/s bottleneck link

3 August 2005The Ohio State University27 Experimental results - LAN

3 August 2005The Ohio State University28 Experimental results - MAN

3 August 2005The Ohio State University29 Experimental results - WAN

3 August 2005The Ohio State University30 Memory to Memory Striping Performance

3 August 2005The Ohio State University31 Disk to Disk Striping Performance

3 August 2005The Ohio State University32 Scalability tests l Evaluate performance as a function of the number of clients l DiPerf test framework to deploy the clients l Ran server on a 2-processor 1125 MHz x86 machine running Linux u 1.5 GB memory and 2 GB swap space u 1 Gbit/s Ethernet network connection and 1500 B MTU l Clients created on hosts distributed over PlanetLab and at the University of Chicago (UofC)

3 August 2005The Ohio State University33 Scalability Results Left axis - load, response time, memory allocated Right axis - Throughput and CPU %

3 August 2005The Ohio State University34 Scalability results l 1800 clients mapped in a round robin fashion on 100 PlanetLab hosts and 30 UofC hosts l A new client created once a second and ran for 2400 seconds u During this time, repeatedly requests the transfer of 10 Mbyte file from server’s disk to client’s /dev/null l Total of Gbytes transferred in 15,428 transfers

3 August 2005The Ohio State University35 Scalability results l Server sustained 1800 concurrent requests with 70% CPU and 0.94 Mbyte memory per request l CPU usage, throughput, response time remain reasonable even when allocated memory exceeds physical memory u Meaning paging is occuring