A Managed Object Placement Service (MOPS) using NEST and GridFTP Dr. Dan Fraser John Bresnahan, Nick LeRoy, Mike Link, Miron Livny, Raj Kettimuthu SCIDAC.

Presentation transcript:

A Managed Object Placement Service (MOPS) using NEST and GridFTP
Dr. Dan Fraser
John Bresnahan, Nick LeRoy, Mike Link, Miron Livny, Raj Kettimuthu
SCIDAC Center for Enabling Distributed Petascale Science (CEDPS)

Overview
Brief CEDPS overview
Focus on data movement
Managed Object Placement Service (MOPS)
–Internal resource management (awareness): GFork capability
–External awareness & interaction: NEST (Network Storage Technology)

Petascale Data Challenge
DOE facilities generate many petabytes of data (2 petabytes = all U.S. academic research libraries!)
Remote users (at labs, universities, industry) need data!
Rapid, reliable access key to maximizing value of $B facilities
[Diagram: massive data at DOE facilities; remote distributed users]

Bridging the Divide (1): Move Data to Users When & Where Needed
“Deliver this 100 Terabytes to locations A, B, C by 9am tomorrow”
Fast: >10,000x faster than usual Internet
Reliable: recover from many failures
Predictable: data arrives when scheduled
Secure: protect expensive resources & data
Scalable: deal with many users & much data

Bridging the Divide (2): Allow Users to Move Computation Near Data
“Perform my computation F on datasets X, Y, Z”
Science services: provide analysis functions near data source
Flexible: easy integration of functions
Secure: protect expensive resources & data
Scalable: deal with many users & much data

Bridging the Divide (3): Troubleshoot End-to-End Problems
“Why did my data transfer (or remote operation) fail?”
Identify & diagnose failures & performance problems
Instrument: include monitoring points in all system components
Monitor: collect data in response to problems
Diagnose: identify the source of problems

What is GridFTP
Widely used, open source, production quality data mover
–Separate control and data channels
–Parallel streams (~3-5x faster than a single TCP stream)
–Parallel stripes (multiple servers)
–Partial file transfer
–Multiple security options (GSI, SSH)
–Third party control
–Extensible for both file systems & protocols
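As a rough illustration of how partial file transfer and parallel streams fit together, the sketch below divides a file into contiguous byte ranges, one per stream. The function name `split_into_ranges` and the even-split policy are assumptions made for this example; the slides do not describe how GridFTP actually assigns blocks to streams.

```python
def split_into_ranges(file_size: int, streams: int) -> list[tuple[int, int]]:
    """Return (offset, length) pairs that together cover file_size bytes.

    Hypothetical helper: an even split is only one possible policy and is
    not taken from the GridFTP protocol.
    """
    if file_size < 0 or streams < 1:
        raise ValueError("file_size must be >= 0 and streams must be >= 1")
    base, extra = divmod(file_size, streams)
    ranges = []
    offset = 0
    for i in range(streams):
        # The first `extra` streams carry one additional byte each.
        length = base + (1 if i < extra else 0)
        if length == 0:
            continue  # more streams than bytes: leave the surplus streams idle
        ranges.append((offset, length))
        offset += length
    return ranges
```

Each pair can be handed to a separate stream, and a single pair on its own corresponds to a partial transfer of that byte range.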

GridFTP Modularity
Data Storage Interfaces (DSI)
-POSIX
-SRB
-HPSS
-NEST
GridFTP Server
-separate control, data
-striping
XIO Drivers
-TCP
-UDT (UDP)
-parallel streams
-GSI
-SSH
Client Interfaces
-Globus-URL-Copy
-C Library
-RFT (3rd party)
[Diagram labels: I/O, File Systems, Clients]

GridFTP Advanced Configurations
GFork (Internal awareness)
–Robust unix fork/setuid model
–Allows server state to be maintained across connections
Dynamic backends
–Stability in the event of backend failure
–Growing resource pools for peak demands
Storage/Access Allocation (External awareness)
–NEST (Network Storage Technology)

Why is awareness important?
Currently, GridFTP does everything it is asked
If asked, GridFTP in a worst case scenario could:
–Use all available memory & buffers on the server
–Write until the file system is full
–Slow down all the transfers when overloaded
(Worst case scenarios do not happen very often)
Many tools designed to work around these limitations
–SRM, DCache, …
Services should be able to protect both themselves and their environments

GFork (Internal Awareness)
[Diagram labels: Client, Server Host, GFork Server, GridFTP Plugin, Fork, GridFTP Server Instance (x3), State Sharing Link, Inherited Links, Control Channel Connections]
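One way to picture state that outlives a single connection is a master that tracks every live server instance and hands each one a share of a fixed buffer budget. The sketch below simulates this in a single process; the class `GForkMaster`, its methods, and the equal-share policy are hypothetical names and assumptions for illustration, not the GFork API, and no real `fork` call is made.

```python
class GForkMaster:
    """In-process stand-in for a master that shares state with its instances."""

    def __init__(self, total_buffer_bytes: int):
        self.total = total_buffer_bytes
        self.budgets = {}   # instance id -> buffer bytes it may use
        self._next_id = 0

    def spawn_instance(self) -> int:
        """Register a new server instance (stands in for fork on connect)."""
        instance_id = self._next_id
        self._next_id += 1
        self.budgets[instance_id] = 0
        self._rebalance()
        return instance_id

    def retire_instance(self, instance_id: int) -> None:
        """Remove an instance whose connection has closed."""
        del self.budgets[instance_id]
        self._rebalance()

    def budget(self, instance_id: int) -> int:
        """Buffer bytes currently granted to one instance."""
        return self.budgets[instance_id]

    def _rebalance(self) -> None:
        # Assumed policy: split the total evenly among the live instances.
        if not self.budgets:
            return
        share = self.total // len(self.budgets)
        for instance_id in self.budgets:
            self.budgets[instance_id] = share
```

Because the master sees every instance, the total never exceeds the configured limit, which is the kind of self-protection the awareness slides argue for.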

External Awareness: Why storage allocations?
Users need both temporary storage and long-term guaranteed storage.
Administrators need a storage solution with configurable limits and policy.
Administrators will benefit from NeST’s autonomous reclamation of expired storage allocations.

External Awareness: GridFTP + NeST
[Diagram labels: GridFTP Server, NeST Callout, Disk Storage, NeST Server, NeST Client, Negotiator, globus-url-copy; links labeled (Lot operations, etc.), (File transfers), (GSI-FTP)]

Overview of NeST
NeST: Network Storage Technology
Lightweight: Configuration and installation can be performed in minutes.
Multi-protocol: Supports Chirp, GridFTP, NFS, HTTP
–Chirp is NeST’s internal protocol
Secure: GSI authentication
Allocation: NeST negotiates “mini storage contracts” between users and server.

Storage allocations in NeST
Lot – abstraction for storage allocation with an associated handle
–Handle is used for all subsequent operations on this lot
Client requests lot of a specified size and duration.
Server accepts or rejects client request.
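The lot request described above can be sketched as a small server that accepts a request when the space fits and rejects it otherwise, reclaiming expired lots first. The names `Lot`, `LotServer`, and `request_lot` are hypothetical, and the capacity-based acceptance rule is an assumption; the slides say only that the server accepts or rejects.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lot:
    handle: int        # used for all later operations on this lot
    size: int          # bytes reserved
    expires_at: float  # time after which the lot may be reclaimed


class LotServer:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lots = {}
        self._handles = itertools.count(1)

    def reclaim(self, now: float) -> None:
        """Drop every lot whose duration has run out."""
        expired = [h for h, lot in self.lots.items() if lot.expires_at <= now]
        for handle in expired:
            del self.lots[handle]

    def request_lot(self, size: int, duration: float, now: float) -> Optional[int]:
        """Return a handle if the request is accepted, or None if rejected."""
        self.reclaim(now)
        in_use = sum(lot.size for lot in self.lots.values())
        if size <= 0 or duration <= 0 or in_use + size > self.capacity:
            return None
        handle = next(self._handles)
        self.lots[handle] = Lot(handle, size, now + duration)
        return handle
```

Passing `now` explicitly keeps the sketch deterministic; a real server would read a clock instead.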

External Awareness Architecture
[Diagram labels: Client, GridFTP Server (Main Codebase, ACL Plugin, DSI Plugin), NEST]

ACL Plugin
Authorize/Init
–Grant access Yes/No
–Plugin establishes context (initializes state for future requests)
Create/Modify/Read a file
–Given pathname and size
–Creates a transaction
Update Transaction
–Plugin may time out waiting
–Progressively commit bytes as ‘complete’
–Finished flag
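The three plugin entry points listed above might look like the following interface. This is a sketch under assumptions: `AclPlugin`, `AllowListPlugin`, and every method signature here are hypothetical and do not reproduce the Globus plugin API, and the allow-list plus size-limit policy is invented for the example.

```python
from abc import ABC, abstractmethod
from typing import Optional


class AclPlugin(ABC):
    @abstractmethod
    def authorize(self, identity: str) -> bool:
        """Authorize/Init: grant access yes/no and establish context."""

    @abstractmethod
    def begin(self, action: str, pathname: str, size: int) -> Optional[int]:
        """Create/Modify/Read: return a transaction id, or None if refused."""

    @abstractmethod
    def update(self, txn: int, committed: int, finished: bool) -> None:
        """Update Transaction: record committed bytes and the finished flag."""


class AllowListPlugin(AclPlugin):
    """Toy policy: known identities only, with a per-file size limit."""

    def __init__(self, allowed, max_size: int):
        self.allowed = set(allowed)
        self.max_size = max_size
        self.identity = None  # context established by authorize()
        self.txns = {}

    def authorize(self, identity: str) -> bool:
        if identity in self.allowed:
            self.identity = identity
            return True
        return False

    def begin(self, action: str, pathname: str, size: int) -> Optional[int]:
        if self.identity is None or size > self.max_size:
            return None
        txn = len(self.txns) + 1
        self.txns[txn] = {"action": action, "path": pathname,
                          "size": size, "committed": 0, "finished": False}
        return txn

    def update(self, txn: int, committed: int, finished: bool) -> None:
        state = self.txns[txn]
        # Committed byte counts only move forward.
        state["committed"] = max(state["committed"], committed)
        state["finished"] = finished
```

Here `committed` is treated as a running total, so repeated updates can only advance the number of bytes marked complete.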

Granting Access
[Diagram labels: Client, GridFTP Server (Main Codebase, ACL Plugin, DSI Plugin), NEST]
Client connects
Enter GSI handshake
Now-known ID sent to auth plugin (GSI ID, Allow?)
Do whatever needed to determine if allowed (Y)
Notify client of access (230)

Receiving a File
[Diagram labels: Client, GridFTP Server (Main Codebase, ACL Plugin, DSI Plugin), NEST]
Begin RECV file
Path/size sent to ACL plugin (Allow?)
Reserve space
Allowed (Y): reply 150, start transfer
Receive bytes
Update transaction
Transaction complete

Notes
Sending a file
–Same interactions as receiving, only simpler (no space reservation)
ACLs can be chained together
–Chaining semantics still being worked out

Using NeST
Init
–NeST can use the client username/GSI subject to initialize.
Create/modify
–Reserve space with a given timeout
–Pathname is key to transaction
–If the timeout expires, the reservation and uncommitted data are lost
Update
–Commit bytes, reset timeout.
Complete
–Clean up state
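The create, update, and complete steps above amount to a reservation with a sliding timeout, keyed by pathname. The sketch below shows one possible bookkeeping scheme; the class `Reservations` and its methods are hypothetical names for illustration and are not NeST or Chirp calls.

```python
class Reservations:
    """Pathname-keyed reservations whose timeout resets on every commit."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.txns = {}  # pathname -> {"size", "committed", "deadline"}

    def create(self, pathname: str, size: int, now: float) -> None:
        """Reserve space for a file, with a deadline `timeout` from now."""
        self.txns[pathname] = {"size": size, "committed": 0,
                               "deadline": now + self.timeout}

    def update(self, pathname: str, nbytes: int, now: float) -> bool:
        """Commit nbytes and reset the timeout; False if the reservation is gone."""
        txn = self.txns.get(pathname)
        if txn is None:
            return False
        if now > txn["deadline"]:
            # Expired: the reservation is dropped and these bytes stay uncommitted.
            del self.txns[pathname]
            return False
        txn["committed"] = min(txn["size"], txn["committed"] + nbytes)
        txn["deadline"] = now + self.timeout
        return True

    def complete(self, pathname: str) -> int:
        """Clean up state and return the number of bytes committed."""
        return self.txns.pop(pathname)["committed"]
```

A transfer that keeps sending updates inside the timeout window holds its reservation indefinitely, while a stalled one releases its space without any explicit cleanup call.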

Conclusion
Services must be able to protect themselves
Awareness of environment (Internal & External) is key
Managed Object Placement Service
–Straightforward technology advancements
–Capability greater than sum of parts
Invitation to work together…