Grid Coordination by Using the Grid Coordination Protocol
  R. Harakaly, F. Bonnassieux, P. Primet
  Presented by: Laurent LEFEVRE
  CNRS-UREC, Lyon, FRANCE
  INRIA RESO, LIP (UMR CNRS, ENS, INRIA, UCB), Lyon, FRANCE
Outline
  Why do we need grid scheduling?
  Grid Coordination Protocol
  Features
  Architecture
  Multiple ring support
  Robustness
  Security
  One time token
  User Interface
  Implementation and Results
  Network monitoring
  Configuration coordination
  Network Topology Discovery
  Summary
17 January 2019 GAN 2004
Why do we need grid scheduling?
  Centralized services:
    VO servers
    CRL distribution servers
    Configuration servers
  Distributed services:
    Network monitoring and discovery
Grid Coordination Protocol
  Based on the Probes Coordination Protocol (PCP)
  Generalized functions, not limited to network monitoring
  Ring with token approach
  Multiple ring support with inter-ring host locking for scalability
  Used for:
    Network monitoring synchronization
    Coordination of configuration updates
    Scheduling of information distribution
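The "ring with token" approach can be illustrated with a minimal single-process sketch. It is not the GCP implementation: the `Ring` class, its fields, and the `run_service` callback are hypothetical names chosen for illustration. It only shows the core idea that a member runs its service when it holds the token and then passes the token to the next member.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Ring:
    """Hypothetical in-memory model of one ring (not the GCP data model)."""
    name: str
    members: List[str]
    holder_index: int = 0                      # position of the current token holder
    history: List[str] = field(default_factory=list)

    def step(self, run_service: Callable[[str], None]) -> str:
        """Let the token holder run its service, then pass the token on."""
        holder = self.members[self.holder_index]
        run_service(holder)                    # only the token holder acts
        self.history.append(holder)
        self.holder_index = (self.holder_index + 1) % len(self.members)
        return holder


if __name__ == "__main__":
    ring = Ring("pinger", ["host-a", "host-b", "host-c"])
    for _ in range(4):
        ring.step(lambda host: print(f"{host} runs its service"))
```

Because only one member holds the token at a time, the services in one ring never run concurrently, which is the property the coordination relies on.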
Features
  Openness: any required service can be scheduled
  Flexibility/Customizability: full and easy (re)configuration and parametrization of the service on remote nodes
  Robustness/Reliability: the service provided must be fully reliable
  Scalability: a large number of members can be scheduled
  Security: the distributed information and the participating member nodes must be secure
  One time token: information distribution on demand
GCP Architecture
  Distributed architecture
    Scalability
    No central information source
    No single point of failure
  Distributed token registration
  Distributed functions
    Scalability
  Ring: logical group of services
    Support of multiple rings
    Possibility to build a hierarchy of rings
Multi-ring support
  Required for:
    Scalability, through the creation of a ring hierarchy
    Scheduling of different services (e.g. CRL update, topogrid, Iperf)
  Multiple independent rings: danger of collisions
    Critical for active network measurements
Inter-Ring Experiment Collision
  Collision possibility: multiple independent rings sharing one or more hosts
    Ring1 members {1, 2, 6, 7}
    Ring2 members {3, 4, 5, 7}
  Solution: inter-ring host locking
  [Figure: hosts 1 to 7 in two rings, with two measurements on the same host]
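The collision condition on this slide amounts to a set intersection over ring memberships. The sketch below uses the two rings from the slide; the helper name `shared_hosts` is an assumption made for illustration.

```python
ring1 = {1, 2, 6, 7}
ring2 = {3, 4, 5, 7}


def shared_hosts(*rings):
    """Return the hosts that are members of more than one ring."""
    seen, shared = set(), set()
    for ring in rings:
        shared |= seen & ring   # hosts already seen in an earlier ring
        seen |= ring
    return shared


if __name__ == "__main__":
    print(shared_hosts(ring1, ring2))  # {7}
```

Any host returned here can be the target of two independent measurements at once, so it is a candidate for inter-ring locking.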
GCP host locking mechanism
  Source and destination host locking
  Conflicting experiments are delayed due to the lock on the host
  [Figure: hosts 1 to 7; one experiment is BLOCKED because it is unable to lock its destination]
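A minimal sketch of source and destination locking, under stated assumptions: the real mechanism is distributed, whereas this is a single in-memory lock table, and `HostLockTable`, `try_lock`, and `unlock` are hypothetical names. An experiment either locks both hosts or none, so a blocked experiment leaves no partial lock behind and can simply be retried later.

```python
class HostLockTable:
    """Hypothetical single-process stand-in for inter-ring host locking."""

    def __init__(self):
        self._owner = {}  # host -> experiment currently holding the lock

    def try_lock(self, experiment, source, destination):
        """Lock both hosts for `experiment`, or lock nothing and return False."""
        hosts = (source, destination)
        if any(self._owner.get(h, experiment) != experiment for h in hosts):
            return False  # at least one host is locked by another experiment
        for h in hosts:
            self._owner[h] = experiment
        return True

    def unlock(self, experiment):
        """Release every host held by `experiment`."""
        for host in [h for h, o in self._owner.items() if o == experiment]:
            del self._owner[host]


if __name__ == "__main__":
    locks = HostLockTable()
    print(locks.try_lock("ring1", 6, 7))  # ring1 measures 6 -> 7
    print(locks.try_lock("ring2", 4, 7))  # host 7 is busy, ring2 is delayed
    locks.unlock("ring1")
    print(locks.try_lock("ring2", 4, 7))  # retry succeeds after the release
```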
GCP Robustness
  Distributed architecture
    No single point of failure
    If one measurement host fails, GCP bypasses it without any impact on the periodicity of the service
    For a reliable service, a failure report can be created so that the task can be completed successfully later
  Token-passing protocols face problems with lost and/or duplicated tokens
    Timeout-based token recovery mechanism
    Duplicate token elimination based on Token_ID and regenerating_host_ID
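The slide names the two mechanisms but not their exact rules, so the following is a sketch under explicit assumptions: a member regenerates the token with an incremented `token_id` when it has not seen one for `timeout` seconds, and duplicates are resolved by keeping the token with the highest `(token_id, regenerating_host_id)` pair. The `Token` and `Member` classes and this ordering rule are illustrative, not taken from GCP.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Token:
    ring: str
    token_id: int
    regenerating_host_id: int


class Member:
    """Hypothetical ring member handling lost and duplicated tokens."""

    def __init__(self, host_id: int, timeout: float):
        self.host_id = host_id
        self.timeout = timeout
        self.last_seen_at = 0.0
        self.newest: Optional[Token] = None

    @staticmethod
    def _key(token: Token):
        return (token.token_id, token.regenerating_host_id)

    def accept(self, token: Token, now: float) -> bool:
        """Accept the token unless a newer one has already been seen."""
        if self.newest is not None and self._key(token) < self._key(self.newest):
            return False  # stale duplicate: drop it (assumed rule)
        self.newest = token
        self.last_seen_at = now
        return True

    def maybe_regenerate(self, ring: str, now: float) -> Optional[Token]:
        """Create a replacement token if none arrived within the timeout."""
        if now - self.last_seen_at < self.timeout:
            return None
        next_id = self.newest.token_id + 1 if self.newest else 1
        token = Token(ring, next_id, self.host_id)
        self.newest = token
        self.last_seen_at = now
        return token
```

With this ordering, if two members regenerate the token at the same time, both copies carry the same `token_id` and the one from the higher host ID survives, so the ring converges back to a single token.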
GCP Security
  Three main security issues:
    Host Security: it must be impossible to start a non-approved service on the host, or any action that compromises host security
    Token Security: the token must not be modifiable in transit (token integrity)
    User Authentication: each token is assigned an owner, and any token manipulation or service is authorized on the basis of this information
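The slides do not say how token integrity is enforced. As one possible illustration, the sketch below protects a token with an HMAC over its serialized fields, assuming a secret key shared by the ring members. This is a swapped-in, generic technique, not GCP's documented mechanism, and the token fields and the `sign_token` and `verify_token` helpers are hypothetical.

```python
import hashlib
import hmac
import json


def sign_token(token: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over a canonical JSON form of the token."""
    payload = json.dumps(token, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_token(token: dict, signature: str, key: bytes) -> bool:
    """Check the tag in constant time; False means the token was altered."""
    return hmac.compare_digest(sign_token(token, key), signature)


if __name__ == "__main__":
    key = b"shared-ring-secret"  # assumption: key distributed out of band
    token = {"ring": "pinger", "token_id": 940,
             "owner": "hary", "command": "edg-pinger"}
    tag = sign_token(token, key)
    print(verify_token(token, tag, key))

    tampered = dict(token, command="rm -rf /")
    print(verify_token(tampered, tag, key))
```

Including the owner and the command in the signed payload ties the integrity check to the user authentication and host security points: a member can refuse a token whose command or owner was changed in transit.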
One Time Token
  New feature
  The token passes once through all member nodes
  Used for non-periodic, on-demand and interactive services:
    On-demand CRL update
    Ad hoc monitoring measurements
    On-demand/interactive active network monitoring probes
  Plan: add the possibility to define an arbitrary number of passes
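The difference between a periodic token, a one time token, and the planned token with an arbitrary number of passes can be sketched as a single generator. The function name `circulate` and the `passes` parameter are assumptions for illustration; `passes=None` stands for a periodic token that never retires.

```python
from itertools import islice


def circulate(members, passes=1):
    """Yield members in ring order for `passes` laps; None means forever."""
    lap = 0
    while passes is None or lap < passes:
        yield from members
        lap += 1


if __name__ == "__main__":
    members = ["host-a", "host-b", "host-c"]
    print(list(circulate(members)))                         # one time token
    print(list(circulate(members, passes=2)))               # planned: N passes
    print(list(islice(circulate(members, passes=None), 5))) # periodic token
```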
User Interface
  A set of utilities is provided for easy manipulation (creation, deletion, update, etc.) of rings and for external GCP host (un)locking.
  C and Java APIs for embedding GCP client functionality (ring creation, modification, etc.) are prepared.
edg-gcpd-admin output
  [hary@ccwp7 bin]$ ./edg-gcpd-admin -L grid-nm.ifae.es
  GCP daemon version: 2.0.7
  Reporting node: 192.101.162.78
  Ring name: pinger, token id: 940, options = 0
  Token status: NORMAL
  Token state: WAITING
  Period 1800, Delay 60, Timeout 600
  Command: edg-pinger
  Last execution timestamp: Fri Apr 9 10:50:14 2004
  Members:
    134.158.105.254 137.138.225.18 141.52.160.24
    130.246.187.145 193.136.90.138 193.206.210.133
    131.154.99.101 192.101.162.78 192.16.186.229
    ...
Implementation and results
  Most of the presented use cases are already deployed on the application testbed of the European DataGrid project.
Network monitoring
  Scheduling of a set of distributed network monitoring sensors
  Scalability problems are solved by a multilayer monitoring architecture
  Inter-ring locking is used to avoid concurrent measurements between two rings
  [Figure labels: Fr, Backbone ring, Es, It]
Experiment periodicity
  [Figure: plot of periodicity [s] against measurement count; labels: period, token regeneration]
Network monitoring configuration
  Network monitoring management cannot be completely distributed. It is always centralized in one (or several) network operation centers.
  Monitoring nodes then download their configuration files from these centers.
  GCP makes it possible to create easily maintainable and configurable upgrade scenarios.
  This approach is easily applicable to any service that publishes its information on a central node (CA CRL updates, VO servers, etc.).
Network Topology Discovery
Summary
  GCP is a generic coordination protocol for grid control and management services
  Stability and usability were demonstrated on the use cases already implemented in the European DataGrid (EDG) project
  Download: http://ccwp7.in2p3.fr
  Questions: robert.harakaly@urec.cnrs.fr