Grid computing Globus GridFTP & Replica Management Robert Nickel BTU - Mathematik 01.Februar 2002.

Slides:



Advertisements
Similar presentations
Globus DataGrid Overview Bill Allcock, ANL GridPP Meeting 30 June 2003.
Advertisements

The Anatomy of the Grid: An Integrated View of Grid Architecture Carl Kesselman USC/Information Sciences Institute Ian Foster, Steve Tuecke Argonne National.
High Performance Computing Course Notes Grid Computing.
GridFTP Introduction – Page 1Grid Forum 5 GridFTP Steve Tuecke Argonne National Laboratory.
1 Software & Grid Middleware for Tier 2 Centers Rob Gardner Indiana University DOE/NSF Review of U.S. ATLAS and CMS Computing Projects Brookhaven National.
USING THE GLOBUS TOOLKIT This summary by: Asad Samar / CALTECH/CMS Ben Segal / CERN-IT FULL INFO AT:
Introduction to Grid Computing The Globus Project™ Argonne National Laboratory USC Information Sciences Institute Copyright (c)
GRID DATA MANAGEMENT PILOT (GDMP) Asad Samar (Caltech) ACAT 2000, Fermilab October , 2000.
Workload Management Workpackage Massimo Sgaravatto INFN Padova.
Data Grids: Globus vs SRB. Maturity SRB  Older code base  Widely accepted across multiple communities  Core components are tightly integrated Globus.
70-290: MCSE Guide to Managing a Microsoft Windows Server 2003 Environment Chapter 1: Introduction to Windows Server 2003.
Milos Kobliha Alejandro Cimadevilla Luis de Alba Parallel Computing Seminar GROUP 12.
4b.1 Grid Computing Software Components of Globus 4.0 ITCS 4010 Grid Computing, 2005, UNC-Charlotte, B. Wilkinson, slides 4b.
Workload Management Massimo Sgaravatto INFN Padova.
Web-based Portal for Discovery, Retrieval and Visualization of Earth Science Datasets in Grid Environment Zhenping (Jane) Liu.
Data Grid Web Services Chip Watson Jie Chen, Ying Chen, Bryan Hess, Walt Akers.
70-290: MCSE Guide to Managing a Microsoft Windows Server 2003 Environment, Enhanced Chapter 1: Introduction to Windows Server 2003.
An easy way to manage Relational Databases in the Globus Community Sandro Fiore ISUFI/ Center for Advanced Computational Technologies Director: prof. Giovanni.
Data Management Kelly Clynes Caitlin Minteer. Agenda Globus Toolkit Basic Data Management Systems Overview of Data Management Data Movement Grid FTP Reliable.
Presenter: Dipesh Gautam.  Introduction  Why Data Grid?  High Level View  Design Considerations  Data Grid Services  Topology  Grids and Cloud.
Globus Data Replication Services Ann Chervenak, Robert Schuler USC Information Sciences Institute.
A Metadata Catalog Service for Data Intensive Applications Presented by Chin-Yi Tsai.
DataGrid Middleware: Enabling Big Science on Big Data One of the most demanding and important challenges that we face as we attempt to construct the distributed.
ARGONNE  CHICAGO Ian Foster Discussion Points l Maintaining the right balance between research and development l Maintaining focus vs. accepting broader.
CoG Kit Overview Gregor von Laszewski Keith Jackson.
GT Components. Globus Toolkit A “toolkit” of services and packages for creating the basic grid computing infrastructure Higher level tools added to this.
1 School of Computer, National University of Defense Technology A Profile on the Grid Data Engine (GridDaEn) Xiao Nong
The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Dataset Caitlin Minteer & Kelly Clynes.
ESP workshop, Sept 2003 the Earth System Grid data portal presented by Luca Cinquini (NCAR/SCD/VETS) Acknowledgments: ESG.
Grid Resource Allocation and Management (GRAM) Execution management Execution management –Deployment, scheduling and monitoring Community Scheduler Framework.
Secure, Collaborative, Web Service enabled and Bittorrent Inspired High-speed Scientific Data Transfer Framework.
File and Object Replication in Data Grids Chin-Yi Tsai.
PPDG and ATLAS Particle Physics Data Grid Ed May - ANL ATLAS Software Week LBNL May 12, 2000.
The Anatomy of the Grid: An Integrated View of Grid Architecture Ian Foster, Steve Tuecke Argonne National Laboratory The University of Chicago Carl Kesselman.
CYBERINFRASTRUCTURE FOR THE GEOSCIENCES Data Replication Service Sandeep Chandra GEON Systems Group San Diego Supercomputer Center.
Major Grid Computing Initatives Ian Foster Mathematics and Computer Science Division Argonne National Laboratory and Department of Computer Science The.
Virtual Data Grid Architecture Ewa Deelman, Ian Foster, Carl Kesselman, Miron Livny.
Author - Title- Date - n° 1 Partner Logo EU DataGrid, Work Package 5 The Storage Element.
Globus Replica Management Bill Allcock, ANL PPDG Meeting at SLAC 20 Sep 2000.
Grid Computing Environments Grid: a system supporting the coordinated resource sharing and problem-solving in dynamic, multi-institutional virtual organizations.
Data Intensive Computing on the Grid: Architecture & Technologies Presented by: Ian Foster Mathematics and Computer Science Division Argonne National Laboratory.
The Replica Location Service The Globus Project™ And The DataGrid Project Copyright (c) 2002 University of Chicago and The University of Southern California.
Ames Research CenterDivision 1 Information Power Grid (IPG) Overview Anthony Lisotta Computer Sciences Corporation NASA Ames May 2,
NA-MIC National Alliance for Medical Image Computing UCSD: Engineering Core 2 Portal and Grid Infrastructure.
The Earth System Grid (ESG) Computer Science and Technologies DOE SciDAC ESG Project Review Argonne National Laboratory, Illinois May 8-9, 2003.
Data Management and Transfer in High-Performance Computational Grid Environments B. Allcock, J. Bester, J. Bresnahan, A. L. Chervenak, I. Foster, C. Kesselman,
GRIDS Center Middleware Overview Sandra Redman Information Technology and Systems Center and Information Technology Research Center National Space Science.
CEOS Working Group on Information Systems and Services - 1 Data Services Task Team Discussions on GRID and GRIDftp Stuart Doescher, USGS WGISS-15 May 2003.
GCRC Meeting 2004 BIRN Coordinating Center Software Development Vicky Rowley.
Globus – Part II Sathish Vadhiyar. Globus Information Service.
Introduction to Grids By: Fetahi Z. Wuhib [CSD2004-Team19]
Replica Management Kelly Clynes. Agenda Grid Computing Globus Toolkit What is Replica Management Replica Management in Globus Replica Management Catalog.
1 e-Science AHM st Aug – 3 rd Sept 2004 Nottingham Distributed Storage management using SRB on UK National Grid Service Manandhar A, Haines K,
Globus Presented by: Yayati Kasralikar for CPA 5937.
GRID ANATOMY Advanced Computing Concepts – Dr. Emmanuel Pilli.
Super Computing 2000 DOE SCIENCE ON THE GRID Storage Resource Management For the Earth Science Grid Scientific Data Management Research Group NERSC, LBNL.
Globus: A Report. Introduction What is Globus? Need for Globus. Goal of Globus Approach used by Globus: –Develop High level tools and basic technologies.
Globus Data Storage Interface (DSI) - Enabling Easy Access to Grid Datasets Raj Kettimuthu, ANL and U. Chicago DIALOGUE Workshop August 2, 2005.
Protocols and Services for Distributed Data- Intensive Science Bill Allcock, ANL ACAT Conference 19 Oct 2000 Fermi National Accelerator Laboratory Contributors:
Building Preservation Environments with Data Grid Technology Reagan W. Moore Presenter: Praveen Namburi.
Data Infrastructure in the TeraGrid Chris Jordan Campus Champions Presentation May 6, 2009.
The Data Grid: Towards an architecture for Distributed Management
Evaluation of “data” grid tools
University of Technology
Distributed Systems Bina Ramamurthy 11/30/2018 B.Ramamurthy.
Distributed Systems Bina Ramamurthy 12/2/2018 B.Ramamurthy.
Proposed Grid Protocol Architecture Working Group
Distributed Systems Bina Ramamurthy 4/22/2019 B.Ramamurthy.
INFNGRID Workshop – Bari, Italy, October 2004
The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets A.Chervenak, I.Foster, C.Kesselman, C.Salisbury,
Presentation transcript:

Grid computing Globus GridFTP & Replica Management Robert Nickel BTU - Mathematik 01.Februar 2002

Gliederung Globus Architecture GridFTP – Transportmechanismen bisher – Warum gerade FTP ? – Features von GridFTP – Ist-Stand der Implementation Globus Replica-Management 1.Globus 2.GridFTP 1.bisher 2.warum FTP 3.Features 4.Ist-Stand 3.Replica Management

Globus Architecture 1.Globus 2.GridFTP 1.bisher 2.warum FTP 3.Features 4.Ist-Stand 3.Replica Management Grid Security Interface Resource Management Architecture Information Management Architecture Data Management Architecture

Motivation l Need to manage large scientific computing datasets –Terabytes or petabytes shared by researchers around the world –Read-only data, “published” by experiments l Replicate portions of the data set in multiple locations –Local control, reduce access times, provide fault tolerance l Discover replicas and select the best replica for a necessary data transfer

Beispielanwengungen Klima Gemeinschaft – Sharing, remote access to and analysis of Terascale climate model datasets GriPhyN (Grid Physics Network) – Petascale Virtual Data Grids Distance visualization – Remote navigation through large datasets, with local and/or remote computing

Daten Intensive Belange beinhalten … potenziell große Anzahl von Daten, Speicher, Netzwerkressourcen verteilt in verschiedenen administrativen domains Respect lokale und globale policies governing what can be used Schedule Ressourcen effizient,gegenüber dem subjekt welches lokalisiert and global constraints Achieve high performance, with respect to both speed and reliability Catalog software and virtual data

Data Intensive Computing and Grids The term “Data Grid” is often used – Unfortunate as it implies a distinct infrastructure, which it isn’t; but easy to say Data-intensive computing shares numerous requirements with collaboration, instrumentation, computation, … Important to exploit commonalities as very unlikely that multiple infrastructures can be maintained Fortunately this seems easy to do!

Examples of Desired Data Grid Functionality High-speed, reliable access to remote data Automated discovery of “best” copy of data Manage replication to improve performance Co-schedule compute, storage, network “Transparency” wrt delivered performance Enforce access control on data Allow representation of “global” resource allocation policies Central Q: How must Grid architecture be extended to support these functions?

Grid Protocols, Services, Tools Protocol-mediated access to resources – Mask local heterogeneities – Extensible to allow for advanced features – Negotiate multi-domain security, policy – “Grid-enabled” resources speak protocols – Multiple implementations are possible Broad deployment of protocols facilitates creation of Services that provide integrated view of distributed resources Tools use protocols and services to enable specific classes of applications

“Daten Grid” Architektur- elemente CPU resource manager Enquiry (LDAP) Access (GRAM) Stor age Storage resource manager Enquiry (LDAP) Access (???) … Location cataloging Metadata cataloging Virtual Data cataloging … Replica selection Attribute-based lookup Reliable replication Virtual Data Caching Task mgmt (Condor-G) Data request management … A P P L I C A T I O N S

The Globus Data Grid Services Two major components: 1. Data Transport and Access l Common protocol l Secure, efficient, flexible, extensible data movement l Family of tools supporting this protocol 2. Replica Management Architecture l Simple scheme for managing: l multiple copies of files l collections of files APIs, white papers:

Motivation for a Common Data Access Protocol Existing distributed data storage systems – DPSS, HPSS: focus on high-performance access, utilize parallel data transfer, striping – DFS: focus on high-volume usage, dataset replication, local caching – SRB: connects heterogeneous data collections, uniform client interface, metadata queries Problems – Incompatible protocols Each require custom client Partitions available data sets and storage devices – Each protocol has subset of desired functionality

A Common, Secure, Efficient Data Access Protocol Common, extensible transfer protocol Decouple low-level data transfer mechanisms from the storage service Advantages: – New, specialized storage systems are automatically compatible with existing systems – Existing systems have richer data transfer functionality Interface to many storage systems – HPSS, DPSS, file systems – Plan for SRB integration

Common Data Access Protocol and Storage Resource Managers Grid encompasses “dumb” & “smart” storage All support base functionality – “Put” and “get” as essential mechanisms – Integrated security mechanisms, of course Storage Resource Managers can enhance functionality of selected storage systems – E.g., progress, reservation, queuing, striping – Plays a role exactly analogous to “Compute Resource Manager” Common protocol means all can interoperate

And the Universal Protocol is … Grid-FTP Why FTP? – Ubiquity enables interoperation with many commodity tools – Already supports many desired features, easily extended to support others – Well understood and supported We use the term Grid-FTP to refer to – Transfer protocol which meets requirements – Family of tools which implement the protocol Note Grid-FTP > FTP Note that despite name, Grid-FTP is not restricted to file transfer!

Grid-FTP: Basic Approach FTP is defined by several IETF RFCs Start with most commonly used subset – Standard FTP: get/put etc., 3 rd -party transfer Implement standard but often unused features – GSS binding, extended directory listing, simple restart Extend in various ways, while preserving interoperability with existing servers – Striped/parallel data channels, partial file, automatic & manual TCP buffer setting, progress monitoring, extended restart

The Grid-FTP Family of Tools Patches to existing FTP code – GSI-enabled versions of existing FTP client and server, for high-quality production code Custom-developed libraries – Implement full GSI-FTP protocol, targeting custom use, high-performance Custom-developed tools – Servers and clients with specialized functionality and performance

Replica Management Maintain a mapping between logical names for files and collections and one or more physical locations Important for many applications – Example: CERN HLT data Multiple petabytes of data per year Copy of everything at CERN (Tier 0) Subsets at national centers (Tier 1) Smaller regional centers (Tier 2) Individual researchers will have copies

Our Approach to Replica Management Identify replica cataloging and reliable replication as two fundamental services – Layer on other Grid services: GSI, transport, information service – Use LDAP as catalog format and protocol, for consistency – Use as a building block for other tools Advantage – These services can be used in a wide variety of situations

Replica Manager Components Replica catalog definition – LDAP object classes for representing logical-to- physical mappings in an LDAP catalog Low-level replica catalog API – globus_replica_catalog library – Manipulates replica catalog: add, delete, etc. High-level reliable replication API – globus_replica_manager library – Combines calls to file transfer operations and calls to low-level API functions: create, destroy, etc.

Replica Catalog Structure: A Climate Modeling Example Logical File Parent Logical File Jan 1998 Logical Collection C02 measurements 1998 Replica Catalog Location jupiter.isi.edu Location sprite.llnl.gov Logical File Feb 1998 Size: Filename: Jan 1998 Filename: Feb 1998 … Filename: Mar 1998 Filename: Jun 1998 Filename: Oct 1998 Protocol: gsiftp UrlConstructor: gsiftp://jupiter.isi.edu/ nfs/v6/climate Filename: Jan 1998 … Filename: Dec 1998 Protocol: ftp UrlConstructor: ftp://sprite.llnl.gov/ pub/pcmdi Logical Collection C02 measurements 1999

A Model Architecture for Data Grids Metadata Catalog Replica Catalog Tape Library Disk Cache Attribute Specification Logical Collection and Logical File Name Disk ArrayDisk Cache Application Replica Selection Multiple Locations NWS Selected Replica gsiftp commands Performance Information and Predictions Replica Location 1Replica Location 2Replica Location 3 MDS

Relationship to Metadata Catalogs Metadata services describe data contents – Have defined a simple set of object classes Must support a variety of metadata catalogs – MCAT being one important example – Others include LDAP catalogs, HDF Community metadata catalogs – Agree on set of attributes – Produce names needed by replica catalog: Logical collection name Logical file name

Globus and SRB: Integration Plan SRB Server MCAT FTP Transport Interface GSI Enabled FTP Server Globus Client Transport API GSI FTP Protocol Misc. FTP Clients Globus Server Transport API SRB Client API GSI FTP Protocol FTP access to SRB-managed collections SRB access to Grid-enabled storage systems

Status Grid FTP and catalog management API and tools in alpha test Demonstration applications with climate data SRB/Globus data grid services integration underway Replica Management API under design Grid based access control strategy under design