Paul Graham, Software Architect, EPCC, +44 131 650 4992. PCP – The Probes Coordination Protocol: a secure, robust framework.


1 PCP – The Probes Coordination Protocol
A secure, robust framework for scheduling and coordinating regular tasks across multiple sites
Paul Graham, Software Architect, EPCC
p.graham@epcc.ed.ac.uk
+44 131 650 4992

2 AHM 2008 Overview
Background
Motivation
The Probes Coordination Protocol
New implementation
PCP implementation features
Summary

3 Background
Work has spanned three projects
  –European Data Grid (EDG), 2001-2004
  –Enabling Grids for E-sciencE (EGEE/EGEE-II), 2004-2008
  –Joint Information Systems Committee (JISC) NPM, 2008-2009
Network performance measurements
  –The collection of monitoring data in a Grid environment
  –Grid users want to know the expected performance of their network-based applications
  –e2emonit, gridmon

4 Motivation
Issues for collecting monitoring data
  –Different measurement types
    –End to end
    –Backbone
  –Different tools
  –Different formats
  –Heterogeneous environments
    –Grid!
    –Many administrative domains
    –Different user groups

5 The problem - sites
Deployment of monitoring tools is not so easy
  –There has to be a clear benefit to the site before they install tools
    –This benefit is not obvious until after an incident has occurred, by which time it is too late…
  –Firewall changes may be difficult
    –Technically or politically
  –Tools need to be trivial to install and robust when running
    –Sys-admins are very busy
  –Need to carefully consider scheduling for end-to-end tests
    –Overlapping measurements
    –Network overload

6 The problem - users
Users need to be able to start, stop and adjust the measurements
  –Potentially on remote administrative domains
Traditionally, system administrators manually set up, start and stop cron jobs for the tools
  –This caused various problems for scalability, coordination and basic practicalities

7 Solution: The Probes Coordination Protocol
Developed to solve the management overhead of running active measurement probes
Token-based mechanism to co-ordinate periodic execution of monitoring tasks
  –But has other applications
Initially developed as part of EDG (Robert Harakaly et al.)
  –Prototype implementation in C: usable but lacking some features
Re-engineered and extended by EPCC to address these issues

8 PCP Operation
Client/Server model
Based on a system of tokens passed between sites
Client submits tokens to a site
Server acts upon the arrival of a token
  –Registers and monitors job tokens
  –Performs the function defined by an admin token
Sites are grouped into cliques

9 PCP Token
Trigger for activity at a site
Job token
  –Name – an identifier
  –Delay – time to wait before executing the job for the first time
  –Period – frequency of command
  –Command – indicator of which command to run at the sites
  –Member(s) – sites in the clique to run the command
Admin token
  –List – retrieves data about the activities currently registered at a site
  –Kill – destroys the named clique activity
  –Clear – removes (i.e. deregisters) all the activities from a site
  –Update – modifies the named clique activity with the new token message (enables changes to values such as the period)
  –Exit – stops the PCP server at the given site
Tokens can also include security information
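The token types above can be illustrated with a minimal Python sketch of a per-site registry. This is not the PCP implementation (which is written in Java); the names `JobToken`, `SiteRegistry` and `handle_admin` are hypothetical, and treating delay and period as seconds is an assumption. It only shows how the five admin actions might act on the set of registered job tokens.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class JobToken:
    """Job token fields as listed on the slide (units assumed to be seconds)."""
    name: str
    delay: int
    period: int
    command: str
    members: tuple


class SiteRegistry:
    """Hypothetical in-memory store of the activities registered at one site."""

    def __init__(self):
        self._jobs = {}
        self.running = True

    def register(self, token):
        # A job token arriving at the site registers a clique activity.
        self._jobs[token.name] = token

    def handle_admin(self, action, name=None, **changes):
        action = action.lower()
        if action == "list":      # report the registered activities
            return sorted(self._jobs)
        if action == "kill":      # destroy the named activity
            return self._jobs.pop(name, None) is not None
        if action == "clear":     # deregister everything at this site
            self._jobs.clear()
            return True
        if action == "update":    # change fields such as the period
            if name not in self._jobs:
                return False
            self._jobs[name] = replace(self._jobs[name], **changes)
            return True
        if action == "exit":      # stop the server at this site
            self.running = False
            return True
        raise ValueError(f"unknown admin action: {action}")
```

For example, `handle_admin("Update", "PJG-EPCC-PCP_TEST", period=900)` would replace the period of that activity and leave its other fields untouched.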

10 PCP Clique
The clique represents a group of sites, all of which are required to run a particular activity at particular intervals
Example: a clique with three sites, A, B and C
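As a rough illustration of the clique idea, the Python sketch below passes a single token around sites A, B and C and lets only the current holder run its probe. The ring ordering and the function names `token_holders` and `run_clique` are assumptions made for this example; the slides do not specify the order in which PCP passes tokens, and site failure is not modelled.

```python
from itertools import cycle, islice


def token_holders(members, rounds):
    """Return the site holding the token at each step (assumed ring order)."""
    return list(islice(cycle(members), rounds * len(members)))


def run_clique(members, rounds, probe):
    """Run `probe` at each site in turn; only the token holder ever runs."""
    log = []
    for site in token_holders(members, rounds):
        # One token means one active site, so probes cannot overlap.
        log.append(probe(site))
    return log


if __name__ == "__main__":
    for entry in run_clique(["A", "B", "C"], rounds=2, probe=lambda s: f"probe@{s}"):
        print(entry)
```

Because there is exactly one token in this sketch, two sites can never measure at the same time, which is the property the protocol relies on to avoid overlapping measurements.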

11 Example PCP Token
  # Lines beginning with # are ignored as comments
  #
  name:PJG-EPCC-PCP_TEST
  member:sitea.epcc.ed.ac.uk
  member:siteb.epcc.ed.ac.uk
  member:sitec.epcc.ed.ac.uk
  period:1800
  timeout:0
  delay:300
  command:pcp_test
  owner:somebody@epcc.ed.ac.uk
  lockDependent:true
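The token format shown above can be read with a few lines of Python. This is a sketch under stated assumptions, not the PCP parser: it assumes one `key:value` pair per line, that `member` may repeat, and that values are kept as plain strings. The function name `parse_token` is hypothetical.

```python
def parse_token(text):
    """Parse 'key:value' lines; '#' lines are comments; 'member' may repeat."""
    token = {"member": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed token line: {raw!r}")
        key, value = key.strip(), value.strip()
        if key == "member":
            token["member"].append(value)  # collect every clique member
        else:
            token[key] = value
    return token


EXAMPLE = """\
# Lines beginning with # are ignored as comments
#
name:PJG-EPCC-PCP_TEST
member:sitea.epcc.ed.ac.uk
member:siteb.epcc.ed.ac.uk
member:sitec.epcc.ed.ac.uk
period:1800
timeout:0
delay:300
command:pcp_test
owner:somebody@epcc.ed.ac.uk
lockDependent:true
"""

token = parse_token(EXAMPLE)
```

Splitting on the first colon only means a value that itself contains a colon would still be kept whole.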

12 PCP normal operation

13 PCP Site failure operation

14 PCP Lock operation
Individual sites may wish to drop out of a clique temporarily
Previously this required inter-site coordination to stop and restart commands
Now enabled via a locking mechanism
  –Administrator sets the lock
  –Lock-dependent tokens are not allowed to execute
  –Lock either expires or is removed by the administrator
  –The site then operates normally as part of the clique again
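The locking steps above can be sketched in Python as follows. The class `SiteLock` and the function `may_execute` are hypothetical names, and the clock is passed in explicitly (as `now`, in seconds) so the behaviour is easy to follow; how PCP actually stores and expires locks is not described in the slides.

```python
class SiteLock:
    """Hypothetical site-wide lock that an administrator sets or removes."""

    def __init__(self):
        self._expires_at = None  # None means the site is not locked

    def set(self, now, duration=None):
        # Without a duration the lock stays until it is removed explicitly.
        self._expires_at = float("inf") if duration is None else now + duration

    def remove(self):
        self._expires_at = None

    def is_active(self, now):
        return self._expires_at is not None and now < self._expires_at


def may_execute(lock, lock_dependent, now):
    """A lock-dependent token must not execute while the lock is active."""
    return not (lock_dependent and lock.is_active(now))
```

A token whose `lockDependent` value is false is unaffected by the lock in this sketch, which is one reading of why that field appears in the example token.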

15 PCP Features
For NPM, prevents overlapping measurements
  –Probe will not run until token received
Extensible “plug-in” design
Communication
  –TCP/IP
Security
  –VOMS/X.509 based authentication
  –Limited set of commands can be run
Logging
  –Configurable to various levels
  –Security-related messages are easy to distinguish
Portable
  –Pure Java
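One way to read "limited set of commands can be run" is an allowlist: the token carries only an indicator, and the site maps it to a command it has already approved. The Python sketch below illustrates that idea; the mapping `ALLOWED_COMMANDS`, its contents and the function `resolve_command` are assumptions for this example and do not describe how PCP configures its commands.

```python
# Hypothetical site-side mapping from command indicator to an approved argv.
ALLOWED_COMMANDS = {
    "pcp_test": ["/bin/echo", "pcp test probe"],
}


def resolve_command(indicator):
    """Return the approved argv for an indicator, or refuse unknown ones."""
    try:
        return list(ALLOWED_COMMANDS[indicator])
    except KeyError:
        raise PermissionError(f"command not permitted: {indicator!r}") from None
```

Because the token never carries the command line itself, a remote user cannot make a site run arbitrary programs in this scheme.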

16 Summary
Protocol provides a means for scheduling regular tasks at multiple sites with minimal overheads for both users and administrators
Software is:
  –Portable
  –Secure
  –Robust
  –Extensible
Available for download: http://www.egee-npm.org/pcp/
Any questions? Thank you
p.graham@epcc.ed.ac.uk

