The Data Replication Service Ann Chervenak Robert Schuler USC Information Sciences Institute.

The Data Replication Service
- Included in the Tech Preview of the GT4.0 release
- Design is based on the publication component of the Lightweight Data Replicator system
  - Developed by Scott Koranda from U. Wisconsin at Milwaukee
- Functionality
  - Replicate a set of files in the Grid on a local site
  - Users identify a set of desired files
  - DRS queries Replica Location Service to discover current locations of these files
  - Creates local replicas of desired files using the Reliable File Transfer Service
  - Registers new replicas in Replica Location Service for discovery

Motivation for DRS
- Need for higher-level data management services that integrate lower-level Grid functionality
  - Efficient data transfer (GridFTP, RFT)
  - Replica registration and discovery (RLS)
  - Eventually validation of replicas, etc.
- Goal is to generalize the custom data management systems developed by several application communities
- Eventually plan to provide a suite of general, configurable, higher-level data management services
- DRS is the first of these services

Relationship to Other Globus Services
At requesting site, deploy:
- WS-RF Services
  - Data Replication Service
  - Delegation Service
  - Reliable File Transfer Service
- Pre WS-RF Components
  - Replica Location Service (Local Replica Catalog and Replica Location Index)
  - GridFTP Server

DRS Functionality
- Initiate a DRS Request
- Discover and select among replicas that act as source locations for data copies
- Transfer data to local site to create new replicas
- Register new replicas in catalogs

Initiating a DRS Request
- Client uses GT4 Delegation Service to create a delegated credential that may be used by other services to act on behalf of user
- Client creates a request file containing a replication request description including:
  - desired logical files
  - destination URLs
- Client sends message to DRS to create the Replicator resource and passes the request file’s URL
- Replicator retrieves the request file
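A request file pairing logical files with destination URLs could be produced as in the sketch below. The tab-separated layout, the file name, the example URLs, and the helpers `write_request_file` and `read_request_file` are all assumptions made for illustration; the slides do not specify the actual DRS request file format.

```python
import csv
import os
import tempfile


def write_request_file(path, entries):
    """Write one (logical file name, destination URL) pair per line.

    The tab-separated layout is an assumption, not the documented DRS format.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        for logical_name, destination_url in entries:
            writer.writerow([logical_name, destination_url])


def read_request_file(path):
    """Read the pairs back, as a Replicator-like consumer might."""
    with open(path, newline="") as f:
        return [tuple(row) for row in csv.reader(f, delimiter="\t")]


# Hypothetical logical file names and destination URLs.
entries = [
    ("lfn-example-001", "gsiftp://local.example.org/data/example-001"),
    ("lfn-example-002", "gsiftp://local.example.org/data/example-002"),
]

request_path = os.path.join(tempfile.mkdtemp(), "drs_request.txt")
write_request_file(request_path, entries)
loaded = read_request_file(request_path)
```

The client would then pass the URL of this file to DRS when creating the Replicator resource; that step is not modeled here.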

Replica Discovery and Selection
- Replicator queries the Globus Replica Location Service in a two-step process to discover locations of desired files:
  - Query local site’s Replica Location Index to find the catalogs at remote sites that contain mappings for the requested files
  - Query remote Local Replica Catalogs to get the physical file names of the replicas
- Replicator selects source file for each file to be copied
  - Current implementation chooses randomly
  - A callout is provided for more sophisticated replica selection decisions based on state of Grid
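The two-step lookup and the selection callout can be sketched with in-memory dictionaries. Everything named here (`replica_location_index`, `local_replica_catalogs`, `discover_replicas`, `select_source`, the site names and URLs) is a hypothetical stand-in, not the RLS client API; only the two-step structure and the random default come from the slide.

```python
import random

# Step 1 data: the index maps a logical file name to catalogs that may hold it.
replica_location_index = {
    "lfn-a": ["lrc-site1", "lrc-site2"],
    "lfn-b": ["lrc-site2"],
}

# Step 2 data: each catalog maps logical file names to physical file names.
local_replica_catalogs = {
    "lrc-site1": {"lfn-a": ["gsiftp://site1.example.org/data/a"]},
    "lrc-site2": {
        "lfn-a": ["gsiftp://site2.example.org/data/a"],
        "lfn-b": ["gsiftp://site2.example.org/data/b"],
    },
}


def discover_replicas(lfn):
    """Query the index for catalogs, then each catalog for physical names."""
    replicas = []
    for catalog in replica_location_index.get(lfn, []):
        replicas.extend(local_replica_catalogs[catalog].get(lfn, []))
    return replicas


def random_selection(lfn, replicas):
    """Default policy: pick any replica at random."""
    return random.choice(replicas)


def select_source(lfn, selector=random_selection):
    """Choose a source replica; `selector` plays the role of the callout."""
    replicas = discover_replicas(lfn)
    if not replicas:
        raise LookupError(f"no replicas found for {lfn}")
    return selector(lfn, replicas)
```

A different policy is swapped in by passing another function, for example `select_source("lfn-a", selector=lambda lfn, replicas: replicas[0])`.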

File Transfers to Create New Replicas
- The Replicator initiates a request with the Globus Reliable File Transfer Service
  - Creates RFT resource that holds state for each data transfer
- Control passes from DRS to RFT, which also retrieves the delegated credential from the Delegation Service
- RFT coordinates the file transfers
- Transfers are performed by GridFTP servers at the source and destination sites
- After transfers complete, the Replicator checks status of each file in the transfer request
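The per-transfer state and the final status check can be modeled roughly as follows. `TransferStatus`, `TransferRequest`, and the URLs are hypothetical names for this sketch; they do not mirror the RFT resource interface, and no data is actually moved.

```python
from dataclasses import dataclass, field
from enum import Enum


class TransferStatus(Enum):
    PENDING = "pending"
    DONE = "done"
    FAILED = "failed"


@dataclass
class TransferRequest:
    """Holds one status entry per (source URL, destination URL) pair."""
    status: dict = field(default_factory=dict)

    def add(self, source, destination):
        self.status[(source, destination)] = TransferStatus.PENDING

    def mark(self, source, destination, new_status):
        self.status[(source, destination)] = new_status

    def with_status(self, wanted):
        """Return the transfer pairs currently in the given state."""
        return [pair for pair, s in self.status.items() if s is wanted]


request = TransferRequest()
request.add("gsiftp://remote.example.org/a", "gsiftp://local.example.org/a")
request.add("gsiftp://remote.example.org/b", "gsiftp://local.example.org/b")

# Simulated outcomes; in DRS these would be reported by RFT.
request.mark("gsiftp://remote.example.org/a", "gsiftp://local.example.org/a",
             TransferStatus.DONE)
request.mark("gsiftp://remote.example.org/b", "gsiftp://local.example.org/b",
             TransferStatus.FAILED)

# After the transfers finish, only completed files are candidates for registration.
to_register = request.with_status(TransferStatus.DONE)
failed = request.with_status(TransferStatus.FAILED)
```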

Registration of New Replicas
- Replicator adds mappings for the newly created replicas to its Globus RLS Local Replica Catalog
- Local Replica Catalog updates Replica Location Indexes to make new replicas visible throughout Grid
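Registration followed by index propagation can be sketched as below. The classes `LocalReplicaCatalog` and `ReplicaLocationIndex` and their methods are illustrative stand-ins, not the RLS API, and the explicit `publish` call is a simplification of however the real catalog sends its updates.

```python
class ReplicaLocationIndex:
    """Records which catalogs hold mappings for each logical file name."""

    def __init__(self):
        self.catalogs_by_lfn = {}

    def update(self, catalog_name, lfns):
        for lfn in lfns:
            self.catalogs_by_lfn.setdefault(lfn, set()).add(catalog_name)


class LocalReplicaCatalog:
    """Maps logical file names to physical file names at one site."""

    def __init__(self, name, indexes):
        self.name = name
        self.indexes = indexes
        self.mappings = {}

    def add_mapping(self, lfn, pfn):
        self.mappings.setdefault(lfn, []).append(pfn)

    def publish(self):
        """Tell every known index which logical names this catalog holds."""
        for index in self.indexes:
            index.update(self.name, self.mappings.keys())


index = ReplicaLocationIndex()
catalog = LocalReplicaCatalog("lrc-local", [index])

# Register a newly created replica, then make it discoverable.
catalog.add_mapping("lfn-a", "gsiftp://local.example.org/data/a")
catalog.publish()
```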

Performance Measurements: Wide Area Testing
- The destination for the pull-based transfers is located in Los Angeles
  - Dual-processor, 1.1 GHz Pentium III workstation with 1.5 GBytes of memory and a 1 Gbit Ethernet
  - Runs a GT4 container and deploys services including RFT and DRS as well as GridFTP and RLS
- The remote site where desired data files are stored is located at Argonne National Laboratory in Illinois
  - Dual-processor, 3 GHz Intel Xeon workstation with 2 gigabytes of memory with 1.1 terabytes of disk
  - Runs a GT4 container as well as GridFTP and RLS services

DRS Operations Measured
- Create the DRS Replicator resource
- Discover source files for replication using local RLS Replica Location Index and remote RLS Local Replica Catalogs
- Initiate a Reliable File Transfer operation by creating an RFT resource
- Perform RFT data transfer(s)
- Register the new replicas in the RLS Local Replica Catalog
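Per-operation timings in milliseconds could be collected with a small wrapper like the one below. The slides do not say how the measurements were taken, so this is only one plausible approach; `timed` is a hypothetical helper and the `lambda: None` callables are placeholders for the real operations.

```python
import time


def timed(label, operation, timings):
    """Run `operation`, store its wall-clock duration in milliseconds."""
    start = time.perf_counter()
    result = operation()
    timings[label] = (time.perf_counter() - start) * 1000.0
    return result


timings = {}
phases = [
    "Create Replicator Resource",
    "Discover Files in RLS",
    "Create RFT Resource",
    "Transfer Using RFT",
    "Register Replicas in RLS",
]
for phase in phases:
    # Placeholder work; a real harness would call the corresponding service.
    timed(phase, lambda: None, timings)
```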

Experiment 1: Replicate 10 Files of Size 10 Gigabytes

Component of Operation        Time (milliseconds)
Create Replicator Resource    317.0
Discover Files in RLS
Create RFT Resource
Transfer Using RFT
Register Replicas in RLS

- Data transfer time dominates
- Wide area data transfer rate of 67.4 Mbits/sec

Experiment 2: Replicate 1000 Files of Size 10 Megabytes

Component of Operation        Time (milliseconds)
Create Replicator Resource
Discover Files in RLS         9.8
Create RFT Resource
Transfer Using RFT
Register Replicas in RLS

- Time to create Replicator and RFT resources is larger
  - Need to store state for 1000 outstanding transfers
- Data transfer time still dominates
- Wide area data transfer rate of 85 Mbits/sec
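The relation between data volume, transfer time, and the quoted rate is simple arithmetic, sketched here for Experiment 2. Two assumptions are made that the slide does not state: a megabyte is taken as 10^6 bytes, and the rate is taken to be total data divided by transfer time. The function names are illustrative.

```python
def transfer_rate_mbits_per_sec(total_bytes, seconds):
    """Average rate in megabits per second (1 Mbit = 10**6 bits)."""
    return total_bytes * 8 / seconds / 1e6


def transfer_time_seconds(total_bytes, rate_mbits_per_sec):
    """Time needed to move `total_bytes` at the given average rate."""
    return total_bytes * 8 / (rate_mbits_per_sec * 1e6)


# 1000 files of 10 megabytes each, decimal units assumed.
total_bytes = 1000 * 10 * 10**6

# Transfer time implied by the quoted 85 Mbits/sec under these assumptions.
implied_seconds = transfer_time_seconds(total_bytes, 85.0)
```

Under these assumptions the implied transfer time is roughly 941 seconds, which is consistent with the observation that data transfer dominates the millisecond-scale setup steps.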

Future Work
- We will continue performance testing of DRS:
  - Increasing the size of the files being transferred
  - Increasing the number of files per DRS request
- Add and refine DRS functionality as it is used by applications
  - E.g., add a push-based replication capability
- We plan to develop a suite of general, configurable, composable, high-level data management services