Part Three: Data Management

3: Data Management
  A: Data Management - The Problem
  B: Moving Data on the Grid
    FTP, SCP
    GridFTP, UberFTP
    globus-url-copy
    RFT
  C: Lab 3 - Data Management

A: Data Management — The Problem

General Principle
  Not all pipes are created equal.

Extremely Large Data Sets
  LIGO
    Generates data at 10 MB per second, just under 1 TB (= 1000 GB) per day
  Sloan Digital Sky Survey
    More than 15 TB of data catalogs
  Compact Muon Solenoid and ATLAS
    100 MB per second, about 1 Petabyte (= 1000 TB) per year (per detector)
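The rates quoted above convert to daily and yearly volumes with simple arithmetic. The following is a minimal sketch, assuming decimal units (1 TB = 1000 GB, as on the slide) and a steady recording rate; the function name `volume_tb` is illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def volume_tb(rate_mb_per_s: float, seconds: float) -> float:
    """Data volume in TB for a steady rate, using decimal units (1 TB = 10**6 MB)."""
    return rate_mb_per_s * seconds / 1_000_000

# LIGO: 10 MB per second, recorded for one full day
ligo_per_day = volume_tb(10, SECONDS_PER_DAY)
print(f"LIGO: {ligo_per_day:.3f} TB/day")

# Seconds of recording at 100 MB/s needed to accumulate 1 PB (= 1000 TB)
seconds_for_1pb = 1000 * 1_000_000 / 100
print(f"1 PB at 100 MB/s takes {seconds_for_1pb / SECONDS_PER_DAY:.0f} days of recording")
```

At 10 MB per second this gives 0.864 TB per day, consistent with "just under 1 TB". One petabyte at 100 MB per second corresponds to roughly 116 days of recording, so the per-year figure presumably assumes the detectors do not take data all year round.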

Big Files, Big Directories
  There are really two issues here.
  The individual files can be quite large
    How do you move such big blocks of data?
    How do you store such big blocks of data?
  The number of files to be handled can also be quite large
    Literally billions of filenames alone throughout a project

Data Duplication
  Sometimes the best way to store a file is to store it twice
    Local copies save transmission time
  But this approach introduces new problems
    Maintaining copies
    Locating copies

Data Management Questions
  What data and/or files exist on the grid?
  Where is a given file actually stored on the grid?
  How do I move a file from Point A to Point B?

B: Moving Data on the Grid

Requirements for Moving Data
  Speed
    Preferably, as fast as the wires will allow, i.e. no significant performance overhead
  Security
    Files should be shared only with authenticated clients
  Robustness
    Fault tolerance and general code stability

GridFTP
  Extends established FTP (File Transfer Protocol)
    Authentication via GSI
    Encryption
    Multiple parallel channels
    Third-party transfers
    Tunability for network and I/O parameters

Pedantic Semantics
  GridFTP is a protocol, not a utility
    A server or client is “GridFTP-enabled”
  “GridFTP” doesn’t always mean “Globus’ GridFTP-enabled server”
    … except that it usually does.

Globus GridFTP Server
  Built on top of wuftpd
    Hence, configuration is similar to wuftpd
  Runs as an inetd (xinetd) service
    Connection is attempted on port 2811
    xinetd looks up the port in /etc/services and finds the responsible service
    xinetd starts the service according to its configuration, with data from the connection sent on stdin
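The lookup step can be illustrated with a small parser for /etc/services-style lines. This is only a sketch of the mapping between port 2811 and a service name, not a description of how xinetd is implemented; the sample text, including the service name `gsiftp`, is an assumption and not taken from a real system file.

```python
SAMPLE_SERVICES = """\
# name   port/protocol   (sample entries, not a real system file)
ftp      21/tcp
ssh      22/tcp
gsiftp   2811/tcp
"""

def service_for_port(services_text: str, port: int, protocol: str = "tcp"):
    """Return the service name registered for port/protocol, or None."""
    for line in services_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue  # not a "name port/protocol" entry
        number, proto = fields[1].split("/", 1)
        if number.isdigit() and int(number) == port and proto == protocol:
            return fields[0]
    return None

print(service_for_port(SAMPLE_SERVICES, 2811))
```

On a real host the same function could be given the contents of /etc/services to check which service name, if any, is registered for port 2811.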

GridFTP Environment Variables
  LD_LIBRARY_PATH
    Point to $GLOBUS_LOCATION/lib
  GRIDMAP (server side only!)
    Path to grid-mapfile for authentication
    Generic GSI environment variable
  X509_CERT_DIR
    Directory in which CA signing certificates are held
    Generic GSI environment variable
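A quick pre-flight check for these variables might look like the sketch below. The variable names come from the slide; treating all of them as mandatory is an assumption made for illustration, and `missing_gridftp_vars` is a hypothetical helper, not part of any Globus tool.

```python
import os

CLIENT_VARS = ["LD_LIBRARY_PATH", "X509_CERT_DIR"]
SERVER_ONLY_VARS = ["GRIDMAP"]  # the slide marks this one as server side only

def missing_gridftp_vars(env, server_side=False):
    """List the variables named on the slide that are unset or empty in env."""
    wanted = CLIENT_VARS + (SERVER_ONLY_VARS if server_side else [])
    return [name for name in wanted if not env.get(name)]

# Report anything missing from the current (client-side) environment.
for name in missing_gridftp_vars(os.environ, server_side=False):
    print(f"warning: {name} is not set")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to try without changing the real environment.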

globus-url-copy
  Another GridFTP client from Globus
  Copy files from one URL to another URL
    One URL is usually a gsiftp:// URL
    Another URL is usually a file:// URL
    A file, not a directory!

“globus-url-copy” syntax
  Server to local:
    $ globus-url-copy gsiftp:// file:/
  Local to server:
    $ globus-url-copy file:/ gsiftp://
  Remote server A to remote server B:
    $ globus-url-copy gsiftp:// \
        gsiftp://
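The three patterns differ only in the URL schemes of the source and destination. The sketch below classifies a pair of URLs accordingly; the host `gridftp.example.org` and the file paths are made-up placeholders, and `transfer_kind` is a hypothetical helper written for this illustration.

```python
from urllib.parse import urlsplit

def transfer_kind(source_url: str, dest_url: str) -> str:
    """Name the transfer pattern implied by the two URL schemes."""
    schemes = (urlsplit(source_url).scheme, urlsplit(dest_url).scheme)
    kinds = {
        ("gsiftp", "file"): "server to local",
        ("file", "gsiftp"): "local to server",
        ("gsiftp", "gsiftp"): "third-party (server to server)",
    }
    return kinds.get(schemes, "other")

print(transfer_kind("gsiftp://gridftp.example.org/data/run1.dat", "file:/tmp/run1.dat"))
```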

Single and Multiple Channels
  By default, globus-url-copy uses 1 channel
  Monitor performance using the -vb flag
    $ globus-url-copy -vb gsiftp://ldas-cit.ligo.caltech.edu:15000/usr1/grid/smallfile file:/tmp/smallfile
    bytes KB/sec avg KB/sec inst
  Multiple channels dramatically boost the transfer rate
    $ globus-url-copy -vb -p 4 gsiftp://ldas-cit.ligo.caltech.edu:15000/usr1/grid/largefile file:/tmp/largefile
    bytes KB/sec avg KB/sec inst
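When the same transfer is scripted with different channel counts, it can help to assemble the command line programmatically. The sketch below only builds the argument list using the -vb and -p flags shown on the slide; it does not run anything, and `build_copy_command` is a hypothetical helper.

```python
import shlex

def build_copy_command(source_url, dest_url, parallel=1, verbose=True):
    """Return the argv list for a globus-url-copy call (flags as on the slide)."""
    argv = ["globus-url-copy"]
    if verbose:
        argv.append("-vb")          # print transfer performance while copying
    if parallel > 1:
        argv += ["-p", str(parallel)]  # number of parallel channels
    argv += [source_url, dest_url]
    return argv

cmd = build_copy_command(
    "gsiftp://ldas-cit.ligo.caltech.edu:15000/usr1/grid/largefile",
    "file:/tmp/largefile",
    parallel=4,
)
print(shlex.join(cmd))
```

On a machine with the Globus client tools installed and a valid proxy, the list could be handed to `subprocess.run(cmd, check=True)`.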

More Performance Tweakage
  Still faster by using large TCP windows
    $ globus-url-copy -vb -p 4 -tcp-bs gsiftp://ldas-cit.ligo.caltech.edu:15000/usr1/grid/largefile file:/tmp/largefile
    bytes KB/sec avg KB/sec inst
  Still faster by using large memory buffers
    $ globus-url-copy -vb -p 4 -bs -tcp-bs gsiftp://ldas-cit.ligo.caltech.edu:15000/usr1/grid/largefile file:/tmp/largefile
    bytes KB/sec avg KB/sec inst
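A common rule of thumb for sizing a TCP window is the bandwidth-delay product, the amount of data in flight on the path at any moment. The slide does not say how its buffer sizes were chosen, so the sketch below rests on that assumption; the 1 Gbit/s link speed and 50 ms round-trip time are made-up illustrative numbers.

```python
def bandwidth_delay_product_bytes(bandwidth_bits_per_s: int, rtt_ms: int) -> int:
    """Bytes in flight on a path: bandwidth times round-trip time, converted to bytes."""
    return bandwidth_bits_per_s * rtt_ms // (8 * 1000)

# Illustrative path: 1 Gbit/s link with a 50 ms round-trip time
bdp = bandwidth_delay_product_bytes(1_000_000_000, 50)
print(f"bandwidth-delay product: {bdp} bytes")
```

A value of this order is what one might try as the argument to -tcp-bs, then adjust based on the rates reported by -vb.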

What If You Can’t Authenticate?
  Unauthenticated, globus-url-copy is still a general-purpose, single-channel URL copying tool
    No GSI authentication used
    Parallel channels etc. won’t work
  $ globus-url-copy file:/tmp/news

UberFTP
  Developed and supported at NCSA
  Interactive like ftp
  Use -a gsi for GSI authentication
  Supports multiple channels using the -c flag
  $ uberftp -H ldas-grid.ligo-la.caltech.edu -a gsi
  220 ligo-server.ncsa.uiuc.edu GridFTP Server 1.12 GSSAPI type Globus/GSI wu (gcc32dbg, ) ready.
  230 User mfreemon logged in.
  uberftp>

SCP: Secure Copy
  scp from […] to scp scp host: scp
  Syntax is like cp
  -r flag to recursively copy directories
  man scp for more options

Trebuchet
  GUI for Grid-enabled file transfer
  Developed at NCSA

RFT: Reliable File Transfer
  An OGSA service for queuing file transfer requests
    Server-to-server transfers
    Checkpointing for restarts
    Database back-end for failovers
  Allows clients to request transfers and then “disappear”
    No need to manage the transfer
    Status monitoring available if desired
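The idea behind checkpointing for restarts can be shown with a local file copy that records its progress after every chunk. This is a toy sketch and not RFT's implementation: it uses a JSON file where RFT uses a database, it copies between local files only, and the `fail_after` parameter exists purely to simulate a crash.

```python
import json
import os
import tempfile

def copy_with_checkpoint(src, dst, checkpoint, chunk_size=1024 * 1024, fail_after=None):
    """Copy src to dst in chunks, saving the byte offset after each chunk.

    If both a checkpoint and a partial dst exist, resume from the saved offset.
    fail_after simulates a crash after that many chunks (demonstration only).
    """
    offset = 0
    if os.path.exists(checkpoint) and os.path.exists(dst):
        with open(checkpoint) as f:
            offset = json.load(f)["offset"]
    mode = "r+b" if os.path.exists(dst) else "wb"
    chunks = 0
    with open(src, "rb") as fin, open(dst, mode) as fout:
        fin.seek(offset)
        fout.seek(offset)
        fout.truncate()  # discard anything written after the last checkpoint
        while True:
            if fail_after is not None and chunks >= fail_after:
                raise RuntimeError("simulated failure")
            data = fin.read(chunk_size)
            if not data:
                break
            fout.write(data)
            fout.flush()
            offset += len(data)
            with open(checkpoint, "w") as f:
                json.dump({"offset": offset}, f)
            chunks += 1
    if os.path.exists(checkpoint):
        os.remove(checkpoint)  # transfer complete, checkpoint no longer needed
    return offset

with tempfile.TemporaryDirectory() as d:
    src, dst, ckpt = (os.path.join(d, n) for n in ("src.bin", "dst.bin", "transfer.ckpt"))
    with open(src, "wb") as f:
        f.write(os.urandom(10_000))
    try:
        copy_with_checkpoint(src, dst, ckpt, chunk_size=1000, fail_after=3)
    except RuntimeError:
        pass  # the first attempt "crashes" after 3000 bytes
    total = copy_with_checkpoint(src, dst, ckpt, chunk_size=1000)  # resumes at 3000
    with open(src, "rb") as a, open(dst, "rb") as b:
        same = a.read() == b.read()

print(total, same)
```

The second call picks up at the saved offset instead of starting over, which is what lets a client submit a request and walk away while the service survives interruptions.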

Lab 3: Data Management

In this lab:
  Use SCP (Secure Copy)
  Use globus-url-copy
  Use UberFTP
  Use UberFTP for a third-party file move

Credits
  NSF disclaimer
  Portions of this presentation were adapted from the following sources:
    GriPhyN Grid Summer Workshop
    Jaime Frey, UW-Madison Condor Group