Grid Coordination by Using the Grid Coordination Protocol


Grid Coordination by Using the Grid Coordination Protocol
R. Harakaly, F. Bonnassieux, P. Primet
Presented by: Laurent LEFEVRE
CNRS-UREC, Lyon, FRANCE
INRIA RESO, LIP (UMR CNRS, ENS, INRIA, UCB), Lyon, FRANCE

Outline
- Why do we need grid scheduling?
- Grid Coordination Protocol
  - Features
  - Architecture
  - Multiple ring support
  - Robustness
  - Security
  - One time token
  - User Interface
- Implementation and Results
  - Network monitoring
  - Configuration coordination
  - Network Topology Discovery
- Summary
17 January 2019 GAN 2004

Why do we need grid scheduling?
- Centralized services:
  - VO servers
  - CRL distribution servers
  - Configuration servers
- Distributed services:
  - Network monitoring and discovery

Grid Coordination Protocol
- Based on the Probes Coordination Protocol (PCP)
- Generalized functions, not limited to network monitoring
- Token-passing ring approach
- Multiple ring support with inter-ring host locking for scalability
- Used for:
  - Network monitoring synchronization
  - Coordination of configuration updates
  - Scheduling of information distribution
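The token-passing ring idea can be illustrated with a minimal sketch. The `Ring` class, its fields, and the host names below are hypothetical and are not taken from the GCP implementation; the only thing the sketch shows is that a member runs its service only while it holds the token, so members of one ring never act at the same time.

```python
from dataclasses import dataclass, field


@dataclass
class Ring:
    """Hypothetical model of a ring: an ordered list of members sharing one token."""
    name: str
    members: list[str]
    log: list[str] = field(default_factory=list)

    def circulate(self, passes: int = 1) -> list[str]:
        """Pass the token around the ring `passes` times and record who acted."""
        for _ in range(passes):
            for host in self.members:
                # Holding the token is the permission to act, so the members
                # of this ring run the service one after another.
                self.log.append(f"{host} runs service for ring {self.name}")
        return self.log


ring = Ring("pinger", ["host-a", "host-b", "host-c"])
order = ring.circulate(passes=2)
for entry in order:
    print(entry)
```

A periodic service corresponds to the token circulating again after each period; the sketch leaves timing out on purpose.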

Features
- Openness: any required service can be scheduled
- Flexibility/Customizability: full and easy (re)configuration and parametrization of the service on remote nodes
- Robustness/Reliability: the service must be fully reliable
- Scalability: a large number of members can be scheduled
- Security: the distributed information and the participating member nodes must be secure
- One time token: information distribution on demand

GCP Architecture
- Distributed architecture
  - Scalability
  - No central information source
  - No single point of failure
  - Distributed token registration
- Distributed functions
  - Scalability
- Ring: logical group of services
- Support of multiple rings
  - Possibility to build a hierarchy of rings

Multi-ring support
- Required for:
  - Scalability, through the creation of a ring hierarchy
  - Scheduling of different services (e.g. CRL update, topogrid, Iperf, etc.)
- Multiple independent rings: danger of collisions
  - Critical for active network measurements

Inter-Ring Experiment Collision
- Collision possibility: multiple independent rings sharing one or more hosts
  - Ring1 members {1, 2, 6, 7}
  - Ring2 members {3, 4, 5, 7}
  - Two measurements on the same host
- Solution: inter-ring host locking
[Figure: hosts 1 to 7 arranged in two rings, with a collision marker (!)]

GCP host locking mechanism
- Source and destination host locking
- Conflicting experiments are delayed because of the lock on the host
[Figure: hosts 1 to 7 arranged in two rings; one experiment is BLOCKED because it is unable to lock its destination]
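The locking rule on this slide can be sketched as follows. This is an illustration under assumptions, not the GCP code: `HostLocks`, `try_lock_pair`, and `unlock_pair` are hypothetical names, and the lock table is kept in one process for simplicity. The host numbers reuse the example rings {1, 2, 6, 7} and {3, 4, 5, 7}, which share host 7.

```python
class HostLocks:
    """Hypothetical lock table: which ring currently occupies which host."""

    def __init__(self) -> None:
        self._owner: dict[int, str] = {}  # host -> name of the ring holding it

    def try_lock_pair(self, ring: str, source: int, destination: int) -> bool:
        """Lock source and destination together, or lock nothing at all."""
        if source in self._owner or destination in self._owner:
            return False  # conflicting experiment: the caller delays and retries
        self._owner[source] = ring
        self._owner[destination] = ring
        return True

    def unlock_pair(self, ring: str, source: int, destination: int) -> None:
        """Release both hosts, but only if this ring is the one holding them."""
        for host in (source, destination):
            if self._owner.get(host) == ring:
                del self._owner[host]


locks = HostLocks()
first = locks.try_lock_pair("ring1", 6, 7)    # ring1 measures from 6 to 7
blocked = locks.try_lock_pair("ring2", 4, 7)  # ring2 also needs host 7
locks.unlock_pair("ring1", 6, 7)
retry = locks.try_lock_pair("ring2", 4, 7)    # succeeds once host 7 is free
print(first, blocked, retry)
```

Locking both endpoints in one step avoids the case where an experiment holds its source while waiting for a destination that another ring already occupies.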

GCP Robustness
- Distributed architecture
  - No single point of failure
- If one measurement host fails, GCP bypasses it without any impact on the periodicity of the service
- For a reliable service, a failure report can be created so that the task can be completed successfully later
- Token-passing protocols face problems with lost and/or duplicated tokens:
  - Timeout-based token recovery mechanism
  - Duplicate token elimination based on Token_ID and regenerating_host_ID
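A minimal sketch of the two recovery mechanisms named above is given here. The slides only state which fields are used; the tie-break rule in this sketch (the token with the lowest regenerating host ID wins, and 0 marks the original token) is an assumption made for illustration, as are the `Token` and `Member` names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Token:
    token_id: int
    regenerating_host_id: int  # assumption: 0 means "original, never regenerated"


class Member:
    """Hypothetical ring member that filters duplicates and recovers lost tokens."""

    def __init__(self, host_id: int, timeout: float) -> None:
        self.host_id = host_id
        self.timeout = timeout
        self.last_seen = 0.0
        self.accepted: dict[int, int] = {}  # token_id -> winning regenerating_host_id

    def accept(self, token: Token, now: float) -> bool:
        """Return True if the token should be processed and forwarded."""
        known = self.accepted.get(token.token_id)
        if known is not None and token.regenerating_host_id > known:
            return False  # duplicate with lower priority: drop it
        self.accepted[token.token_id] = token.regenerating_host_id
        self.last_seen = now
        return True

    def maybe_regenerate(self, token_id: int, now: float) -> Optional[Token]:
        """Recreate the token if none has been seen for `timeout` seconds."""
        if now - self.last_seen < self.timeout:
            return None
        self.last_seen = now
        self.accepted[token_id] = self.host_id
        return Token(token_id, self.host_id)


waiting = Member(host_id=7, timeout=600.0)
waiting.accept(Token(940, 0), now=0.0)
too_early = waiting.maybe_regenerate(940, now=100.0)   # timeout not reached
regenerated = waiting.maybe_regenerate(940, now=700.0)  # token presumed lost

other = Member(host_id=3, timeout=600.0)
took_regenerated = other.accept(regenerated, now=701.0)
took_original = other.accept(Token(940, 0), now=702.0)  # late original wins
took_duplicate = other.accept(regenerated, now=703.0)   # now dropped
```

With this rule every member eventually forwards only one of the competing tokens, so a late original and a regenerated copy cannot both keep circulating.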

GCP Security
Three main security issues:
- Host security: it must be impossible to start a non-approved service on the host, or any action that compromises host security
- Token security: the token must not be modifiable in transit (integrity)
- User authentication: an owner is assigned to the token, and every token manipulation and service is based on this information
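The slides do not say which mechanisms GCP uses to meet these requirements. As one possible illustration, the sketch below protects token integrity with an HMAC over the token fields and restricts execution to an allow-list of commands. The shared key, the field names, and the `APPROVED_COMMANDS` set are all assumptions made for this example.

```python
import hashlib
import hmac
import json

# Assumption: each host keeps a local allow-list of services it may start.
APPROVED_COMMANDS = {"edg-pinger"}


def sign_token(fields: dict, key: bytes) -> str:
    """Compute an integrity tag over the token fields (HMAC-SHA256)."""
    payload = json.dumps(fields, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_token(fields: dict, tag: str, key: bytes) -> bool:
    """Check the tag in constant time; any modified field invalidates it."""
    return hmac.compare_digest(sign_token(fields, key), tag)


def may_execute(fields: dict, tag: str, key: bytes) -> bool:
    """Run a command only if the token is intact and the command is approved."""
    return verify_token(fields, tag, key) and fields.get("command") in APPROVED_COMMANDS


key = b"demo-key-for-illustration-only"
token = {"ring": "pinger", "token_id": 940, "owner": "hary", "command": "edg-pinger"}
tag = sign_token(token, key)

tampered = dict(token, command="some-other-command")
print(may_execute(token, tag, key))     # intact token, approved command
print(may_execute(tampered, tag, key))  # modified in transit
```

A shared-key HMAC keeps the example short; a deployment that has to authenticate individual users would more plausibly rely on public-key signatures tied to the owner's identity.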

One Time Token
- New feature
- The token passes once through all member nodes
- Used for non-periodic/on-demand/interactive services:
  - On-demand CRL update
  - Ad hoc monitoring measurements
  - On-demand/interactive active network monitoring probes
- Plan: add the possibility to define an arbitrary number of passes
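The difference between a periodic token, a one time token, and the planned token with an arbitrary number of passes can be sketched with a pass counter. The `Token` fields and the `run` helper are hypothetical; `max_visits` exists only so that the demonstration of a periodic token terminates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    token_id: int
    passes_left: Optional[int] = None  # None = periodic token that never expires


def run(token: Token, members: list[str], max_visits: int = 100) -> list[str]:
    """Circulate the token and return the hosts visited, in order."""
    visited: list[str] = []
    while token.passes_left != 0 and len(visited) < max_visits:
        for host in members:
            visited.append(host)  # the host would execute the service here
        if token.passes_left is not None:
            token.passes_left -= 1  # one full pass through the ring is used up
    return visited


members = ["host-a", "host-b", "host-c"]
one_time = run(Token(1, passes_left=1), members)   # single pass, then discarded
two_passes = run(Token(2, passes_left=2), members)  # the planned extension
periodic = run(Token(3), members, max_visits=7)     # stopped only by the demo bound
```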

User Interface
- A set of utilities is provided for easy manipulation (creation, deletion, update, ..) of rings and for external GCP host (un)locking.
- C and Java APIs for embedding GCP client functionality (ring creation, modification, etc.) are prepared.

edg-gcpd-admin output
    [hary@ccwp7 bin]$ ./edg-gcpd-admin -L grid-nm.ifae.es
    GCP daemon version: 2.0.7
    Reporting node: 192.101.162.78
    Ring name: pinger, token id: 940, options = 0
    Token status: NORMAL
    Token state: WAITING
    Period 1800, Delay 60, Timeout 600
    Command: edg-pinger
    Last execution timestamp: Fri Apr 9 10:50:14 2004
    Members:
    134.158.105.254
    137.138.225.18
    141.52.160.24
    130.246.187.145
    193.136.90.138
    193.206.210.133
    131.154.99.101
    192.101.162.78
    192.16.186.229
    ...
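For scripting around this status report, the fields can be pulled out with regular expressions. The sketch below is not part of the GCP tools: `parse_ring_status` is a hypothetical helper, the sample string only repeats text shown in the output above, and the patterns assume the field labels appear exactly as on the slide.

```python
import re

SAMPLE = (
    "Ring name: pinger, token id: 940, options = 0 "
    "Token status: NORMAL Token state: WAITING "
    "Period 1800, Delay 60, Timeout 600 Command: edg-pinger"
)

# One pattern per field; whitespace between fields may be spaces or newlines.
PATTERNS = {
    "ring": r"Ring name:\s*(\S+?),",
    "token_id": r"token id:\s*(\d+)",
    "status": r"Token status:\s*(\w+)",
    "state": r"Token state:\s*(\w+)",
    "period": r"Period\s+(\d+)",
    "delay": r"Delay\s+(\d+)",
    "timeout": r"Timeout\s+(\d+)",
    "command": r"Command:\s*(\S+)",
}


def parse_ring_status(text: str) -> dict:
    """Extract the ring fields that are present; numeric values become ints."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            value = match.group(1)
            result[name] = int(value) if value.isdigit() else value
    return result


status = parse_ring_status(SAMPLE)
print(status)
```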

Implementation and results
Most of the presented use cases are already deployed on the application testbed of the European DataGrid project.

Network monitoring
- Scheduling of a set of distributed network monitoring sensors
- Scalability problems are solved by a multilayer monitoring architecture
- Inter-ring locking is used to avoid concurrent measurements between two rings
[Figure: ring diagram with the labels Fr, Es, It and Backbone ring]

Experiment periodicity
[Figure: plot with the labels measurement count, Periodicity [s], period, and token regeneration]

Network monitoring configuration
- Network monitoring management cannot be completely distributed. It is always centralized in one (or several) network operation centers.
- Monitoring nodes then download their configuration files from these centers.
- GCP makes it possible to create easily maintainable and configurable upgrade scenarios.
- This approach is easily applicable to any service that publishes its information on a central node (CA CRL updates, VO servers, etc.).

Network Topology Discovery

Summary
- GCP is a generic coordination protocol for grid control and management services
- Stability and usability were demonstrated on the use cases already implemented in the European DataGrid (EDG) project
- Download: http://ccwp7.in2p3.fr
- Questions: robert.harakaly@urec.cnrs.fr