Paul Graham, Software Architect, EPCC, +44 131 650 4992. PCP – The Probes Coordination Protocol: A secure, robust framework.


Paul Graham, Software Architect, EPCC
PCP – The Probes Coordination Protocol
A secure, robust framework for scheduling and coordinating regular tasks across multiple sites

AHM
Overview
Background
Motivation
The Probes Coordination Protocol
New implementation
PCP implementation features
Summary

Background
Work has spanned three projects
  – European Data Grid (EDG)
  – Enabling Grids for E-sciencE (EGEE/EGEE-II)
  – Joint Information Systems Committee (JISC) NPM
Network performance measurements
  – The collection of monitoring data in a Grid environment
  – Grid users want to know the expected performance of their network-based applications
  – e2emonit, gridmon

Motivation
Issues for collecting monitoring data
  – Different measurement types
      – End to end
      – Backbone
  – Different tools
  – Different formats
  – Heterogeneous environments
      – Grid!
      – Many administrative domains
      – Different user groups

The problem - sites
Deployment of monitoring tools is not easy
  – There has to be a clear benefit to the site before they install tools
  – This benefit is not obvious until after an incident has occurred, by which time it is too late…
  – Firewall changes may be difficult
      – Technically or politically
  – Tools need to be trivial to install and robust when running
      – Sys-admins are very busy
  – Scheduling for end-to-end tests needs careful consideration
      – Overlapping measurements
      – Network overload

The problem - users
Users need to be able to start, stop and adjust the measurements
  – Potentially on remote administrative domains
Traditionally, system administrators manually set up, started and stopped cron jobs for the tools
  – This caused various problems for scalability, coordination and basic practicalities

Solution: The Probes Coordination Protocol
Developed to reduce the management overhead of running active measurement probes
Token-based mechanism to co-ordinate periodic execution of monitoring tasks
  – But has other applications
Initially developed as part of EDG (Robert Harakaly et al.)
  – Prototype implementation in C: usable but lacking some features
Re-engineered and extended by EPCC to address these issues

PCP Operation
Client/Server model
Based on a system of tokens passed between sites
Client submits tokens to a site
Server acts upon the arrival of a token
  – Registers and monitors job tokens
  – Performs function defined by an admin token
Sites are grouped into cliques

PCP Token
Trigger for activity at a site
Job token
  – Name – an identifier
  – Delay – time to wait before executing the job for the first time
  – Period – frequency of command
  – Command – indicator of which command to run at the sites
  – Member(s) – sites in the clique to run the command
Admin token
  – List – for retrieving data about the activities currently registered at a site
  – Kill – destroys the named clique activity
  – Clear – removes (i.e. deregisters) all the activities from a site
  – Update – modifies the named clique activity with the new token message (enables changes to values such as the period)
  – Exit – stops the PCP server at the given site
Also can include security information
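The token fields and admin actions listed above can be modelled roughly as follows. This is only an illustrative sketch in Python, not the PCP implementation (which the slides describe as Java): the class names `JobToken`, `AdminAction` and `SiteRegistry` are hypothetical, and treating delay and period as integer seconds is an assumption, since the slides do not state units.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class AdminAction(Enum):
    """The five admin token actions named on the slide."""
    LIST = "list"
    KILL = "kill"
    CLEAR = "clear"
    UPDATE = "update"
    EXIT = "exit"


@dataclass
class JobToken:
    """Job token fields from the slide; units are assumed to be seconds."""
    name: str
    delay: int
    period: int
    command: str
    members: List[str] = field(default_factory=list)


class SiteRegistry:
    """Hypothetical in-memory record of the activities registered at one site."""

    def __init__(self) -> None:
        self.activities: Dict[str, JobToken] = {}
        self.running = True

    def register(self, token: JobToken) -> None:
        self.activities[token.name] = token

    def handle_admin(self, action: AdminAction, name: Optional[str] = None,
                     replacement: Optional[JobToken] = None):
        if action is AdminAction.LIST:
            # Report the names of the currently registered activities.
            return sorted(self.activities)
        if action is AdminAction.KILL:
            # Remove only the named activity.
            self.activities.pop(name, None)
        elif action is AdminAction.CLEAR:
            # Deregister everything at this site.
            self.activities.clear()
        elif action is AdminAction.UPDATE:
            # Replace the named activity with the new token message.
            if name in self.activities and replacement is not None:
                self.activities[name] = replacement
        elif action is AdminAction.EXIT:
            # Stop the (simulated) server.
            self.running = False
        return None
```

Keeping the admin actions in an enum means an unknown action fails early rather than being silently ignored; how the real server validates admin tokens is not described in the slides.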

PCP Clique
The clique represents a group of sites, all of which are required to run a particular activity at particular intervals
Example: we will look at a clique with three sites, A, B and C...
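As a rough illustration of why passing a token around a clique keeps activities from overlapping, the sketch below simulates one token circulating among sites A, B and C. The round-robin order and the function name `circulate` are assumptions made for the example; the slides only state that tokens are passed between sites and that a probe does not run until the token is received.

```python
from collections import deque
from typing import List


def circulate(members: List[str], rounds: int) -> List[str]:
    """Return the order in which sites run when a single token is passed around.

    Assumption: the token visits members in list order, then wraps around.
    Only the current holder runs, so no two sites run at the same time.
    """
    ring = deque(members)
    history = []
    for _ in range(rounds * len(members)):
        holder = ring[0]        # the site currently holding the token
        history.append(holder)  # the holder runs its activity
        ring.rotate(-1)         # hand the token to the next site
    return history


if __name__ == "__main__":
    print(circulate(["A", "B", "C"], rounds=2))
```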

Example PCP Token
# Lines beginning with # are ignored as comments
#
name:PJG-EPCC-PCP_TEST
member:sitea.epcc.ed.ac.uk
member:siteb.epcc.ed.ac.uk
member:sitec.epcc.ed.ac.uk
period:1800
timeout:0
delay:300
command:pcp_test
lockDependent:true
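A token in this key:value form is straightforward to read. The following is a minimal parsing sketch, assuming one field per line, `#` comment lines, and a repeatable `member` key, as in the example token; the function name `parse_token` is hypothetical, and values are kept as strings because the slides do not specify field types.

```python
from typing import Dict, List, Union

EXAMPLE = """\
# Lines beginning with # are ignored as comments
#
name:PJG-EPCC-PCP_TEST
member:sitea.epcc.ed.ac.uk
member:siteb.epcc.ed.ac.uk
member:sitec.epcc.ed.ac.uk
period:1800
timeout:0
delay:300
command:pcp_test
lockDependent:true
"""


def parse_token(text: str) -> Dict[str, Union[str, List[str]]]:
    """Parse key:value lines into a dict; 'member' may repeat and becomes a list."""
    token: Dict[str, Union[str, List[str]]] = {"member": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected key:value, got {line!r}")
        key, value = key.strip(), value.strip()
        if key == "member":
            token["member"].append(value)
        else:
            token[key] = value
    return token


if __name__ == "__main__":
    print(parse_token(EXAMPLE))
```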

PCP normal operation

PCP site failure operation

PCP Lock operation
Individual sites may wish to drop out of a clique temporarily
Previously this required inter-site coordination to stop/restart commands
Now enabled via a locking mechanism
  – Administrator sets the lock
  – Lock-dependent tokens are not allowed to execute
  – Lock either expires or is removed by administrator
  – The site then operates normally as part of the clique again
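The locking steps above can be sketched as a lock with an expiry time that is checked before a lock-dependent token runs. This is an assumption-laden illustration, not the PCP code: `SiteLock` and `may_execute` are hypothetical names, and the injectable clock exists only to make the example testable.

```python
import time
from typing import Callable, Optional


class SiteLock:
    """Hypothetical site lock that expires on its own or is removed by an admin."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expires_at: Optional[float] = None  # None means no lock is set

    def set(self, duration_seconds: float) -> None:
        """Administrator sets the lock for a limited time."""
        self._expires_at = self._clock() + duration_seconds

    def remove(self) -> None:
        """Administrator removes the lock early."""
        self._expires_at = None

    def is_active(self) -> bool:
        if self._expires_at is None:
            return False
        if self._clock() >= self._expires_at:
            self._expires_at = None  # the lock has expired
            return False
        return True


def may_execute(lock_dependent: bool, lock: SiteLock) -> bool:
    """Lock-dependent tokens are blocked while the lock is active; others still run."""
    return not (lock_dependent and lock.is_active())
```

In this sketch a token whose `lockDependent` field is false keeps running while the site is locked, which is one reading of the example token's `lockDependent:true` field.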

PCP Features
For NPM, prevents overlapping measurements
  – Probe will not run until token received
Extensible “plug-in” design
Communication
  – TCP/IP
Security
  – VOMS/X.509 based authentication
  – Limited set of commands can be run
Logging
  – Configurable to various levels
  – Security-related messages straightforwardly distinguishable
Portable
  – Pure Java
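One way to read "limited set of commands can be run" is that the token carries only a command indicator, which the site maps to a locally configured command. The sketch below shows that idea under stated assumptions: the `ALLOWED_COMMANDS` table, its contents and the `resolve_command` function are hypothetical and are not taken from the PCP software.

```python
from typing import Dict, List

# Hypothetical site-local table: indicator from the token -> command to run.
ALLOWED_COMMANDS: Dict[str, List[str]] = {
    "pcp_test": ["echo", "pcp test probe"],
}


def resolve_command(indicator: str) -> List[str]:
    """Return the configured command for an indicator, or refuse it.

    The token never supplies the command line itself, only a key into
    the site's own table, so arbitrary commands cannot be requested.
    """
    try:
        return list(ALLOWED_COMMANDS[indicator])
    except KeyError:
        raise PermissionError(
            f"command {indicator!r} is not permitted at this site"
        ) from None


if __name__ == "__main__":
    print(resolve_command("pcp_test"))
```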

Summary
Protocol provides a means for scheduling regular tasks at multiple sites with minimal overheads for both users and administrators
Software is:
  – Portable
  – Secure
  – Robust
  – Extensible
Available for download:
Any questions?
Thank you