Globus Replica Management Bill Allcock, ANL PPDG Meeting at SLAC 20 Sep 2000.

Presentation transcript:

Globus Replica Management Bill Allcock, ANL PPDG Meeting at SLAC 20 Sep 2000

Replica Management
- Maintain a mapping between logical names for files and collections and one or more physical locations.
- We define a replica to be a "managed copy of a file".
  - The replica management system controls where and when copies are created, and provides information about where copies are located.
  - However, the system does not make any statements about file consistency. In other words, copies can get out of date with respect to one another if a user chooses to modify a copy.

Our Approach to Replica Management
- Identify replica cataloging and reliable replication as two fundamental services
  - Layer on other Grid services: GSI, transport, information service
  - Use LDAP as catalog format and protocol, for consistency
  - Use as a building block for other tools
- Advantage
  - These services can be used in a wide variety of situations

A Model Architecture for Data Grids
[Figure: block diagram. Labels: Application; Metadata Catalog (input: Attribute Specification; output: Logical Collection and Logical File Name); Replica Catalog (output: Multiple Locations); Replica Selection (inputs: Performance Information and Predictions from NWS and MDS; output: Selected Replica); Replica Location 1, Replica Location 2, Replica Location 3 (Tape Library, Disk Cache, Disk Array); Reliable Transport; Reliable Replication.]

Replica Manager Components
- Replica catalog definition
  - LDAP object classes for representing logical-to-physical mappings in an LDAP catalog
- Low-level replica catalog API
  - globus_replica_catalog library
  - Manipulates replica catalog: add, delete, etc.
- High-level reliable replication API
  - globus_replica_manager library
  - Combines calls to file transfer operations and calls to low-level API functions: create, destroy, etc.

Replica Catalog Structure: A Climate Modeling Example
[Figure: tree of catalog entries]
Replica Catalog
  Logical Collection: CO2 measurements 1998
    Filename: Jan 1998
    Filename: Feb 1998
    …
    Location: jupiter.isi.edu
      Filename: Mar 1998
      Filename: Jun 1998
      Filename: Oct 1998
      Protocol: gsiftp
      UrlConstructor: gsiftp://jupiter.isi.edu/nfs/v6/climate
    Location: sprite.llnl.gov
      Filename: Jan 1998
      …
      Filename: Dec 1998
      Protocol: ftp
      UrlConstructor: ftp://sprite.llnl.gov/pub/pcmdi
    Logical File Parent
      Logical File: Jan 1998
      Logical File: Feb 1998
        Size:
  Logical Collection: CO2 measurements 1999
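The catalog structure in this example can be modeled in a few lines. What follows is a minimal in-memory sketch in Python, not the LDAP object classes or the Globus C library. The names `Location`, `LogicalCollection`, and `physical_urls` are hypothetical, and the sketch assumes that a physical URL is the UrlConstructor joined to the percent-encoded filename with a slash, and that the elided entries stand for all twelve months of 1998.

```python
from dataclasses import dataclass, field
from urllib.parse import quote


@dataclass
class Location:
    """One replica location: a complete or partial copy of a collection."""
    host: str
    protocol: str
    url_constructor: str                       # URL prefix, as on the slide
    filenames: set = field(default_factory=set)


@dataclass
class LogicalCollection:
    """A named set of logical files plus the locations that hold copies."""
    name: str
    filenames: set = field(default_factory=set)
    locations: list = field(default_factory=list)


def physical_urls(collection, filename):
    """Return one URL per location that holds a copy of `filename`.

    Assumption: URL = UrlConstructor + "/" + percent-encoded filename.
    """
    return [
        loc.url_constructor.rstrip("/") + "/" + quote(filename)
        for loc in collection.locations
        if filename in loc.filenames
    ]


MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
ALL_1998 = {f"{m} 1998" for m in MONTHS}       # assumed full-year file list

co2_1998 = LogicalCollection(
    name="CO2 measurements 1998",
    filenames=set(ALL_1998),
    locations=[
        # Partial replica: three months only.
        Location("jupiter.isi.edu", "gsiftp",
                 "gsiftp://jupiter.isi.edu/nfs/v6/climate",
                 {"Mar 1998", "Jun 1998", "Oct 1998"}),
        # Complete replica (assumed to hold every month).
        Location("sprite.llnl.gov", "ftp",
                 "ftp://sprite.llnl.gov/pub/pcmdi",
                 set(ALL_1998)),
    ],
)
```

With this model, `physical_urls(co2_1998, "Mar 1998")` yields one URL on each host, while `physical_urls(co2_1998, "Jan 1998")` yields only the sprite.llnl.gov URL, because jupiter.isi.edu holds a partial copy.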

Replica Catalog API
- globus_replica_catalog_collection_create()
  - Create a new logical collection
- globus_replica_catalog_collection_open()
  - Open a connection to an existing collection
- globus_replica_catalog_location_create()
  - Create a new location (replica) of a complete or partial logical collection
- globus_replica_catalog_collection_list_filenames()
  - List all logical files in a collection
- globus_replica_catalog_location_search_filenames()
  - Search for the locations (replicas) that contain a copy of all the specified files
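The slides give only the names of these C functions, not their signatures. To illustrate the semantics described for the location search (find the locations that hold a copy of all the specified files), here is a small Python sketch. The function `search_locations` and the sample data are hypothetical stand-ins, not part of the Globus library.

```python
def search_locations(locations, wanted):
    """Return the names of locations that hold every file in `wanted`.

    `locations` maps a location name to the set of filenames stored there.
    A location qualifies only if `wanted` is a subset of its files.
    """
    wanted = set(wanted)
    return [name for name, files in locations.items() if wanted <= set(files)]


# Hypothetical sample data: one partial and one larger replica.
locations = {
    "jupiter.isi.edu": {"Mar 1998", "Jun 1998", "Oct 1998"},
    "sprite.llnl.gov": {"Jan 1998", "Mar 1998", "Jun 1998", "Dec 1998"},
}
```

A request for "Mar 1998" and "Jun 1998" matches both locations, while a request that includes "Jan 1998" matches only sprite.llnl.gov. The subset test is what makes partial replicas usable: a location does not need the whole collection, only the requested files.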

Replica Management API
- globus_replica_management_register_files:
  - Register a set of files at a source location in a replica catalog.
- globus_replica_management_copy_files:
  - Replicate a set of files from a source location to a destination: i.e., copy the files and update the replica catalog.
- globus_replica_management_is_current:
  - Function that returns a boolean vector that indicates if the specified files are out of date, with respect to a user-provided comparison function (Note: Just how to implement this function remains to be seen.)
- globus_replica_management_update_files:
  - Update a set of files from a source location to a destination.
- globus_replica_management_delete_files:
  - Delete a set of replicas from a specified location: i.e., delete the files and update the replica catalog.
- globus_replica_management_synchronize_filenames()
  - Ensure that the location object for a physical storage directory correctly reflects the contents of the directory
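Two ideas in this API lend themselves to a short illustration: replication as "copy the files, then update the catalog", and a currency check driven by a user-provided comparison function. The Python sketch below is an assumption-laden stand-in, not the Globus API (which the slides describe as still under design). It uses local directories in place of remote storage, a plain dict in place of the replica catalog, and the hypothetical names `copy_files`, `is_current`, `same_bytes`, and `demo`. Note that this `is_current` returns True for files that are up to date.

```python
import shutil
import tempfile
from pathlib import Path


def copy_files(catalog, filenames, src, dst):
    """Copy files from `src` to `dst`, then record them in the catalog.

    `catalog` maps a location (directory path as str) to a set of filenames.
    """
    src, dst = Path(src), Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    for name in filenames:
        shutil.copy2(src / name, dst / name)           # transfer step
        catalog.setdefault(str(dst), set()).add(name)  # catalog update step


def is_current(filenames, src, dst, same):
    """Return one boolean per file: True if the copy at `dst` is current
    according to the caller-supplied comparison function `same`."""
    return [same(Path(src) / n, Path(dst) / n) for n in filenames]


def same_bytes(a, b):
    """One possible comparison: the replica exists and has identical bytes."""
    return b.exists() and a.read_bytes() == b.read_bytes()


def demo():
    """Replicate one file, then modify the source and re-check currency."""
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp) / "src", Path(tmp) / "dst"
        src.mkdir()
        (src / "jan.dat").write_text("v1")
        catalog = {str(src): {"jan.dat"}}
        copy_files(catalog, ["jan.dat"], src, dst)
        before = is_current(["jan.dat"], src, dst, same_bytes)
        (src / "jan.dat").write_text("v2")   # source changes after replication
        after = is_current(["jan.dat"], src, dst, same_bytes)
        return sorted(Path(k).name for k in catalog), before, after
```

Passing the comparison in as a parameter keeps the policy (byte equality, timestamps, checksums, version numbers) out of the replication code, which matches the slide's point that the system itself makes no consistency guarantees.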

Replica Catalog Services as Building Blocks: Examples
- Combine with information service to build replica selection services
  - E.g., "find best replica" using performance info from NWS and MDS
  - Use of LDAP as common protocol for info and replica services makes this easier
- Combine with application managers to build data distribution services
  - E.g., build new replicas in response to frequent accesses
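A "find best replica" service can be pictured as a ranking over the locations returned by the catalog. The following Python sketch assumes a single metric, predicted bandwidth in Mbit/s, supplied as a plain dict. The function name, the metric, and the numbers are all hypothetical; a real service would obtain predictions from NWS and MDS, whose query interfaces are not shown in these slides.

```python
def select_best_replica(replica_urls, predicted_mbps):
    """Pick the replica URL with the highest predicted bandwidth.

    `replica_urls` comes from a catalog lookup; `predicted_mbps` maps a URL
    to a bandwidth prediction. Replicas without a prediction are skipped.
    """
    candidates = [u for u in replica_urls if u in predicted_mbps]
    if not candidates:
        raise LookupError("no performance prediction for any replica")
    return max(candidates, key=lambda u: predicted_mbps[u])


# Hypothetical inputs for illustration only.
replicas = [
    "gsiftp://jupiter.isi.edu/nfs/v6/climate/Mar%201998",
    "ftp://sprite.llnl.gov/pub/pcmdi/Mar%201998",
]
predictions = {
    "gsiftp://jupiter.isi.edu/nfs/v6/climate/Mar%201998": 42.0,
    "ftp://sprite.llnl.gov/pub/pcmdi/Mar%201998": 17.5,
}
```

Here `select_best_replica(replicas, predictions)` returns the jupiter.isi.edu URL. Keeping selection separate from the catalog lookup mirrors the building-block approach: the catalog answers "where are the copies", and a separate service answers "which copy is best right now".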

Relationship to Metadata Catalogs
- Metadata services describe data contents
  - Have defined a simple set of object classes
- Must support a variety of metadata catalogs
  - MCAT being one important example
  - Others include LDAP catalogs, HDF
- Community metadata catalogs
  - Agree on set of attributes
  - Produce names needed by replica catalog:
    - Logical collection name
    - Logical file name

Globus and SRB: Integration Plan
- FTP access to SRB-managed collections
- SRB access to Grid-enabled storage systems
[Figure: block diagram. Labels: SRB Server; MCAT; FTP Transport Interface; GSI Enabled FTP Server; Globus Client Transport API; Globus Server Transport API; SRB Client API; GSI FTP Protocol; Misc. FTP Clients.]

Outstanding Issues
- What write consistency should we support?
- Methodology for handling updates
- Access Control
- Replicating the replica catalog
- Replication of partial files
- Alternate catalog views: files belong to more than one logical collection
- Intermediate feedback required (callbacks)
- Timing

Status
- Grid FTP and catalog management API and tools in alpha test
- Demonstration applications with climate data
- SRB/Globus data grid services integration underway
- Replica Management API under design
- Grid based access control strategy under design

Globus Data-Intensive Services Architecture
[Figure: layered diagram of libraries and programs. Programs: globus-url-copy, Replica Programs, Custom Servers, Custom Clients. Libraries: globus_gass_copy, globus_ftp_client, globus_ftp_control, globus_common, GSI (security), globus_io, OpenLDAP client, globus_replica_catalog, globus_replica_manager, globus_gass_transfer, globus_gass. Legend: Library, Program; Released, In Alpha.]

The End