Monitoring and performance measurement in Production Grid Environments David Wallom.

1 Monitoring and performance measurement in Production Grid Environments David Wallom

2 Overview
Who uses monitoring?
Aspects of performance measurement
Tools for monitoring
Adding a new service into a monitoring framework

3 Who are the consumers of monitoring?
Grid/VO management
–Responsible for designing & maintaining requirements
–Verify fulfillment of SLAs by resource providers
System administrators
–Notified of problems
–Enough information to understand context of problem
End users
–View results and compare to problems they are having
–Debug user account/environment issues
–Advanced users: feedback to Grid/VO

4 Monitoring from a user perspective
What needs to work for the Grid to be usable?
–Can I log in?
–Is my application (or applications) available on connected systems?
–Can I get to my input data?
–What credentials do I need?
–Can I get the input data to the application?
–How long will my application take to run?
–…

5 Performance Measurement
Depends on monitoring of:
–Availability
–Usage

6 Measuring Availability
Test the following grid functionality:
–User authorization
–System information publishing
–Data transfer to and from system
–Submission of tasks onto the system
Measurement of other functionality:
–Type of system
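The availability tests listed on this slide can be sketched as a set of named pass/fail checks. This is a minimal illustration, not the tooling used in the talk: `run_checks`, `availability`, and the stub checks are hypothetical names, and each stub stands in for a real probe (for example a login attempt or a test transfer) that a site would have to supply.

```python
from typing import Callable, Dict

# A check is any zero-argument callable that returns True on success.
Check = Callable[[], bool]


def run_checks(checks: Dict[str, Check]) -> Dict[str, bool]:
    """Run each named check; a check that raises counts as a failure."""
    results: Dict[str, bool] = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results


def availability(results: Dict[str, bool]) -> float:
    """Fraction of checks that passed (0.0 when there are no checks)."""
    if not results:
        return 0.0
    return sum(results.values()) / len(results)


def broken_submission() -> bool:
    """Stub that simulates a probe failing with an exception."""
    raise TimeoutError("job submission timed out")


# Stub checks (assumptions for illustration only); replace each with a
# real probe against the system under test.
example_checks: Dict[str, Check] = {
    "user_authorization": lambda: True,
    "information_publishing": lambda: True,
    "data_transfer": lambda: False,
    "task_submission": broken_submission,
}

example_results = run_checks(example_checks)
print(example_results)
print(f"availability: {availability(example_results):.0%}")
```

Treating an exception as a failed check keeps one broken probe from hiding the results of the others.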

7 Measuring Usage
Within each system we need to know:
–Current load, e.g. queue lengths, number of running processes on an SMP system
–Network connectivity
–Total throughput rate for a submitted user job
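As a rough sketch of sampling current load on a single system, the following uses only the Python standard library. `UsageSample` and `sample_usage` are hypothetical names, and the queue length is passed in by the caller because obtaining it depends on the local batch scheduler, which the slides do not specify.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsageSample:
    load_1min: Optional[float]   # 1-minute load average, None if unavailable
    cpu_count: int               # number of CPUs visible to the OS
    queued_jobs: Optional[int]   # queue length, supplied by the caller

    def load_per_cpu(self) -> Optional[float]:
        """Load average normalised by CPU count, or None if load is unknown."""
        if self.load_1min is None:
            return None
        return self.load_1min / self.cpu_count


def sample_usage(queued_jobs: Optional[int] = None) -> UsageSample:
    """Take one usage sample from the local machine."""
    try:
        load_1min: Optional[float] = os.getloadavg()[0]
    except (AttributeError, OSError):
        # os.getloadavg() is not available on every platform.
        load_1min = None
    return UsageSample(load_1min, os.cpu_count() or 1, queued_jobs)


# The queue length here is a made-up value for illustration.
print(sample_usage(queued_jobs=12))
```

Normalising by CPU count makes samples from differently sized SMP systems comparable.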

8 Tools for monitoring availability
Systems status
Grid status
All system and grid status monitoring

9 Ganglia
Developed out of the HPC community
Monitors worker nodes as well as system head nodes
Sub-nodes can report to a master to create grid-wide monitoring
Example:
–http://oxgrid-vom.ierc.ox.ac.uk/ganglia/

11 Big Brother
Designed to monitor individual systems
Simple interface giving immediate feedback on overall system status
Different providers can be added for additional services, such as further processes to be monitored
Looking at historical trends can be difficult, though
Example:
–http://cerb-mds.bris.ac.uk/bb/bb.html

13 Grid Interoperability Test Scripts
Developed by Southampton e-Science Centre
Tests each of the standard grid functionalities in series for a specified node
Wrapper to test many systems in parallel
Example of the results:
–http://www.ngs.ac.uk/ops/gits/oxford/National GridService.html

15 INCA
Developed by SDSC and TeraGrid
Extensible framework for monitoring
Tests the following as standard:
–Static system information
–Installed software versions
–Network performance
–Load on both the head node and the queue system, if available
Additionally, the UK NGS has developed a plug-in for the GITS tests.
Example:
–http://inca.grid-support.ac.uk/

17 Testing the behaviour of a Grid
1. Define a set of concrete requirements for connected systems
2. Write tests to verify requirements
3. Periodically run tests and collect data across all of the systems
4. Publish data and archive for reporting
5. Automate Steps 3 and 4 to provide real time system status information
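The run, collect, publish, and archive steps on this slide might be automated along the following lines. This is a sketch under stated assumptions: all function names, the JSON-lines archive format, the system names, and the stub tests are hypothetical, and an in-memory stream stands in for a real archive file.

```python
import io
import json
import time
from typing import Callable, Dict, List, Optional, TextIO

# A grid test takes a system name and returns True if the requirement holds.
GridTest = Callable[[str], bool]


def run_cycle(systems: List[str], tests: Dict[str, GridTest],
              now: Optional[float] = None) -> List[dict]:
    """Run every test against every system and collect one record per result."""
    timestamp = time.time() if now is None else now
    records = []
    for system in systems:
        for name, test in tests.items():
            try:
                passed = bool(test(system))
            except Exception:
                passed = False  # a crashing test counts as a failure
            records.append({"time": timestamp, "system": system,
                            "test": name, "passed": passed})
    return records


def archive(records: List[dict], stream: TextIO) -> None:
    """Append records to the archive, one JSON object per line."""
    for record in records:
        stream.write(json.dumps(record) + "\n")


def status(records: List[dict]) -> Dict[str, str]:
    """Publishable summary: a system is OK only if all of its tests passed."""
    summary: Dict[str, str] = {}
    for record in records:
        if not record["passed"]:
            summary[record["system"]] = "FAIL"
        else:
            summary.setdefault(record["system"], "OK")
    return summary


def monitor(systems: List[str], tests: Dict[str, GridTest], stream: TextIO,
            cycles: int, interval_s: float) -> Dict[str, str]:
    """Repeat the run/archive/summarise cycle; return the latest status."""
    latest: Dict[str, str] = {}
    for i in range(cycles):
        records = run_cycle(systems, tests)
        archive(records, stream)
        latest = status(records)
        if i < cycles - 1:
            time.sleep(interval_s)
    return latest


# Stub tests and system names, made up for illustration.
demo_tests: Dict[str, GridTest] = {
    "login": lambda system: True,
    "data_transfer": lambda system: system != "node-b",
}
demo_archive = io.StringIO()
demo_status = monitor(["node-a", "node-b"], demo_tests, demo_archive,
                      cycles=2, interval_s=0)
print(demo_status)
```

In practice the loop would more likely be driven by a scheduler such as cron than by `time.sleep`, with the archive written to durable storage.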

18 Connecting to existing production systems
Determine monitoring requirements for systems to be connected
Write independent tests for the service being provided
Write information providers to fit tests into existing monitoring frameworks
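One way to picture an information provider is as a thin adapter that takes the result of an independent test and renders it in whatever form a given framework accepts. The sketch below is illustrative only: `CheckResult`, `to_gmetric_args`, and `to_status_line` are hypothetical names, the status-line layout is invented, and the `gmetric` argument list assumes Ganglia's `gmetric` command-line tool is installed (the command is built here but not executed).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CheckResult:
    service: str      # name of the service that was tested
    passed: bool      # outcome of the independent test
    detail: str = ""  # optional human-readable explanation


def to_gmetric_args(result: CheckResult, prefix: str = "grid_test_") -> List[str]:
    """Build an argument list for Ganglia's gmetric tool (1 = pass, 0 = fail).

    The metric naming scheme and units are assumptions for illustration.
    """
    return [
        "gmetric",
        "--name", f"{prefix}{result.service}",
        "--value", "1" if result.passed else "0",
        "--type", "uint8",
        "--units", "pass",
    ]


def to_status_line(result: CheckResult, host: str) -> str:
    """Render a colour-coded status line (hypothetical plain-text format)."""
    colour = "green" if result.passed else "red"
    return f"{host} {result.service} {colour} {result.detail}".rstrip()


# Example result; the host and service names are made up.
example = CheckResult("data_transfer", False, "transfer timed out")
print(to_gmetric_args(example))
print(to_status_line(example, "head.example.org"))
```

Keeping the test itself independent of the adapters means the same test can feed several frameworks without being rewritten.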

19 Conclusions
Monitoring must be based on a well-known set of requirements for admins (both VO and systems) & users
Several products are available to provide monitoring frameworks; each can be extended beyond its initial capabilities
Life would be a lot simpler if there were a standard monitoring schema that could be used to plug grid and system information into all monitoring frameworks!

