A Mobile-Agent-Based Performance-Monitoring System at RHIC
Richard Ibbotson

1 A Mobile-Agent-Based Performance-Monitoring System at RHIC
Richard Ibbotson

2 Overview
- Motivation for a new monitoring system
- Design of the Instrumentation system
  - Use of mobile agents (mobile programs vs. remote procedures)
  - How it works, what it does and doesn't do
- Practical experiences with a test instrument
  - What works well and what doesn't
- Future enhancements

3 Monitoring System Purpose
The system should:
- Provide performance monitoring at the service level
  - “End-to-end” tests yielding combined information on the functioning of several services
  - Track performance changes across configuration changes
- Monitor the current health of the system
- Provide some error-tracking/reporting capabilities
- Be a tool for administrators and experimenters
It will not:
- Provide detailed system information for fault diagnosis (system-specific, vendor-supplied tools already exist)

4 Desired Features of the System
- View/compare past and current measurements
- Inspect correlations between metrics
- Allow variation of the sampling rate
  - Automatically execute scheduled measurements
  - Perform measurements on demand at shorter intervals
- Perform OS-independent measurements
- Use a small fraction of available resources
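As a rough illustration of scheduled versus on-demand sampling, the Java sketch below polls a hypothetical in-memory schedule table and reports which tests are due. The classes `ScheduleEntry` and `PollingScheduler`, their fields and the test names are assumptions made for illustration, not part of the system described in the talk.

```java
import java.util.ArrayList;
import java.util.List;

/** One row of a hypothetical schedule table: a test and how often to run it. */
final class ScheduleEntry {
    final String testName;
    final long intervalSec;
    long lastRunSec;

    ScheduleEntry(String testName, long intervalSec, long lastRunSec) {
        this.testName = testName;
        this.intervalSec = intervalSec;
        this.lastRunSec = lastRunSec;
    }
}

final class PollingScheduler {
    /**
     * Returns the names of tests whose interval has elapsed at nowSec,
     * in table order, and records nowSec as their last run time.
     */
    static List<String> dueTests(List<ScheduleEntry> schedule, long nowSec) {
        List<String> due = new ArrayList<>();
        for (ScheduleEntry e : schedule) {
            if (nowSec - e.lastRunSec >= e.intervalSec) {
                due.add(e.testName);
                e.lastRunSec = nowSec;
            }
        }
        return due;
    }
}
```

In this sketch an on-demand run at a shorter interval is just another row with a smaller `intervalSec`; a central server would call `dueTests` periodically and dispatch one agent per returned name.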

5 Components of the System
- “Instruments” which perform measurements
- Centralized database of Instruments (code) and time-stamped results
  - Allows simple addition of new metrics
  - Allows previously run tests to be reproduced
- Mechanism for remote execution of Instruments
  - IBM “Aglets” mobile-agent system (http://www.trl.ibm.co.jp/aglets)
[Diagram labels: code, monitor, sequence of measurements, parameters]

6 Mobile Agents vs. RPC
Remote Procedure Call
- A pre-defined procedure on the remote host executes and returns the result
Mobile Agent
- A daemon on the remote host accepts the agent and allows it to execute
- Increased network load for large agents
[Diagram for each case, labelled: User's system, Remote system, Search request, Local search utility, Dataset to search]
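The difference between the two models can be sketched in a single process. In this hypothetical Java sketch, `RpcHost` can only run procedures installed on it beforehand, while `AgentDaemon` accepts any arriving agent and lets it run locally. Networking, code transfer and security checks are all omitted; in a real agent system the agent's code and state travel over the network, which is where the extra load for large agents comes from.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** RPC style: the remote host exposes only procedures installed on it in advance. */
final class RpcHost {
    private final Map<String, Supplier<String>> procedures = new HashMap<>();

    void install(String name, Supplier<String> procedure) {
        procedures.put(name, procedure);
    }

    String call(String name) {
        Supplier<String> procedure = procedures.get(name);
        if (procedure == null) {
            throw new IllegalArgumentException("no such procedure: " + name);
        }
        return procedure.get();
    }
}

/** Agent style: the agent carries its own logic and runs on whichever host accepts it. */
interface Agent {
    String runOn(String hostName);
}

/** A daemon that accepts any arriving agent and lets it execute locally. */
final class AgentDaemon {
    private final String hostName;

    AgentDaemon(String hostName) {
        this.hostName = hostName;
    }

    String accept(Agent agent) {
        return agent.runOn(hostName);
    }
}
```

In this model a new metric needs a change on every remote host under RPC, but only a new agent under the mobile-agent approach.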

7 Advantages of Mobile Agents
- Metrics can be defined at any time and implemented on the central host
- Performance is measured on the relevant host
- The Aglets system is Java-based, providing platform-independent execution
- A sophisticated security model exists for restricting the actions of agents

8 Use of Mobile Agents in Monitoring
- The simplest approach, “Single Remote Host”, was implemented for the initial configuration
- Waiting between tests is done on the central server, for reliability
[Diagram comparing the “Itinerary” approach with the “Single Remote Host” approach (Central server → Target host → Central server)]
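The two mobility patterns can be sketched as routes that an agent follows after leaving the central server. The `startTrip()`/`nextTransfer()` names are borrowed from the `MobilityPattern` class on slide 9, but their behaviour here, and the classes `SingleRemoteHostPattern` and `ItineraryPattern`, are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical mobility pattern: the hosts an agent visits after leaving the central server. */
abstract class MobilityPattern {
    private Iterator<String> hops;

    /** The ordered hosts to visit; the last hop is always the central server. */
    protected abstract List<String> route();

    /** Begins a new trip from the first hop. */
    void startTrip() {
        hops = route().iterator();
    }

    /** Returns the next host to move to, or null once the trip is complete. Call startTrip() first. */
    String nextTransfer() {
        return hops.hasNext() ? hops.next() : null;
    }
}

/** One target per trip: out to the target, then straight back. */
final class SingleRemoteHostPattern extends MobilityPattern {
    private final String central;
    private final String target;

    SingleRemoteHostPattern(String central, String target) {
        this.central = central;
        this.target = target;
    }

    @Override
    protected List<String> route() {
        return Arrays.asList(target, central);
    }
}

/** Several targets per trip (a "circuit"), returning to the central server only at the end. */
final class ItineraryPattern extends MobilityPattern {
    private final String central;
    private final List<String> targets;

    ItineraryPattern(String central, List<String> targets) {
        this.central = central;
        this.targets = new ArrayList<>(targets);
    }

    @Override
    protected List<String> route() {
        List<String> r = new ArrayList<>(targets);
        r.add(central);
        return r;
    }
}
```

With N target hosts, the single-remote-host pattern returns to the central server N times per round, while an itinerary returns once; that difference is what lets an itinerary relieve congestion at the central host.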

9 Anatomy of an Instrument
[Class diagram with two “Inherits from” arrows; classes and example methods:]
- Aglet: onCreation(), run(), ...
- Instrument: loadParams(), storeResult(), ...
- SpecificInstrument: onMeasuring(), onInstrumentCreation(), ...
- Result: storeInDB(), setInvalid(), ...
- MobilityPattern: startTrip(), nextTransfer(), ...
- StatusUpdater: registerWithMonitor(), updateMonitor(), ...
- ParameterList: loadParams(), getValue(key), ...
The code defining a specific implementation of an Instrument is ~30 lines.
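A minimal plain-Java sketch of the pattern the diagram suggests: a generic base class handles parameters and result storage, so a specific instrument only supplies its measurement. All class bodies are hypothetical; only the names `ParameterList`, `getValue`, `loadParams`, `Result`, `setInvalid`, `Instrument`, `storeResult` and `onMeasuring` come from the slide. The Aglets base class is left out so the sketch compiles on its own, and the buffer-fill measurement is a placeholder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical parameter set, standing in for one loaded from the central database. */
final class ParameterList {
    private final Map<String, String> values = new HashMap<>();

    ParameterList put(String key, String value) {
        values.put(key, value);
        return this;
    }

    String getValue(String key) {
        return values.get(key);
    }
}

/** Hypothetical measurement result; setInvalid() marks a failed measurement. */
final class Result {
    final String instrument;
    final double value;
    private boolean valid = true;

    Result(String instrument, double value) {
        this.instrument = instrument;
        this.value = value;
    }

    void setInvalid() { valid = false; }

    boolean isValid() { return valid; }
}

/** Template-method base class: shared plumbing here, only onMeasuring() varies. */
abstract class Instrument {
    /** In-memory stand-in for the central results database. */
    static final List<Result> RESULTS = new ArrayList<>();

    protected ParameterList params = new ParameterList();

    void loadParams(ParameterList p) { params = p; }

    /** The only method a new metric has to supply. */
    protected abstract double onMeasuring() throws Exception;

    /** Runs one measurement and stores the outcome, flagging failures as invalid. */
    final void measureAndStore() {
        String name = getClass().getSimpleName();
        Result r;
        try {
            r = new Result(name, onMeasuring());
        } catch (Exception e) {
            r = new Result(name, Double.NaN);
            r.setInvalid();
        }
        storeResult(r);
    }

    void storeResult(Result r) { RESULTS.add(r); }
}

/** Hypothetical specific instrument: times filling a buffer of "sizeKB" kilobytes. */
final class BufferFillInstrument extends Instrument {
    @Override
    protected double onMeasuring() {
        // A missing "sizeKB" parameter makes parseInt throw, so the result is stored as invalid.
        int sizeKB = Integer.parseInt(params.getValue("sizeKB"));
        long start = System.nanoTime();
        byte[] buf = new byte[sizeKB * 1024];
        Arrays.fill(buf, (byte) 1);
        return (System.nanoTime() - start) / 1e6; // elapsed milliseconds
    }
}
```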

10 Test Instrument: File Access
- NFS write-access time used as a proof of concept
- File size and location (file system) are passed as parameters stored in the database (specified at run time)
- Measurements are started by an automated process, as specified by the Schedule table in the database
- Tested access to one file system from several client computers:
  - Linux (PIII) system with NFSv2, 1KB block size
  - Linux (PIII) system with NFSv2, 8KB block size
  - Linux (PIII) system with NFSv3
  - Solaris system with NFSv3
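A sketch of the write-timing measurement, assuming the file size and target directory arrive as parameters, as on the slide. `NfsWriteProbe` and `timeWriteMillis` are hypothetical names. The NFS version and block size are properties of how the client mounts the file system, so they are not set by this code.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class NfsWriteProbe {
    /**
     * Writes sizeBytes of zeros to a fresh file in dir, asks the OS to flush the
     * data to storage with sync(), and returns the elapsed wall-clock time in
     * milliseconds. The probe file is deleted afterwards.
     */
    static double timeWriteMillis(Path dir, int sizeBytes) throws IOException {
        byte[] data = new byte[sizeBytes];
        Path target = Files.createTempFile(dir, "write-probe-", ".dat"); // file creation is not timed
        try {
            long start = System.nanoTime();
            try (FileOutputStream out = new FileOutputStream(target.toFile())) {
                out.write(data);
                out.getFD().sync(); // include the flush (for an NFS mount, to the server) in the timing
            }
            return (System.nanoTime() - start) / 1e6;
        } finally {
            Files.deleteIfExists(target);
        }
    }
}
```

Pointing `dir` at a directory on an NFS mount and repeating the call on each client would give a per-client comparison of the kind listed above.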

11 Report Generation Tool
- Sample tests are carried out automatically by a “Scheduler” Aglet
- Reports are requested via an HTML form: users specify a test type, parameter set and target host, and a Perl CGI script queries the database and plots the results using Gnuplot
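The report script itself was written in Perl; to keep one language, this Java sketch shows only the data step it implies: select one (test type, parameter set, target host) combination from time-stamped results and emit two whitespace-separated columns that Gnuplot can plot, e.g. with `plot "nfs.dat" using 1:2 with linespoints`. The `Measurement` fields, the `GnuplotExport` class and the choice to skip invalid results are assumptions; the real script queried a database rather than an in-memory list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical time-stamped result row. */
final class Measurement {
    final String testType;
    final String paramSet;
    final String host;
    final long epochSec;
    final double value;
    final boolean valid;

    Measurement(String testType, String paramSet, String host, long epochSec, double value, boolean valid) {
        this.testType = testType;
        this.paramSet = paramSet;
        this.host = host;
        this.epochSec = epochSec;
        this.value = value;
        this.valid = valid;
    }
}

final class GnuplotExport {
    /** Returns "time value" lines for one test/parameter-set/host, sorted by time. */
    static String dataFor(List<Measurement> rows, String testType, String paramSet, String host) {
        List<Measurement> selected = new ArrayList<>();
        for (Measurement m : rows) {
            if (m.valid && m.testType.equals(testType) && m.paramSet.equals(paramSet) && m.host.equals(host)) {
                selected.add(m);
            }
        }
        selected.sort(Comparator.comparingLong(m -> m.epochSec));
        StringBuilder sb = new StringBuilder("# epoch_seconds value\n"); // '#' lines are comments in Gnuplot
        for (Measurement m : selected) {
            sb.append(m.epochSec).append(' ').append(m.value).append('\n');
        }
        return sb.toString();
    }
}
```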

12 Sample Report for File Access
[Plot of file-access results, annotated “Nightly backups” and “Weekly de-frag”]
- Results indicate server load and client configuration

13 Problems with the Mobile Agents
- Transfers are interrupted when several agents move to/from the same host within ~1-2 s
  - The small size of the Aglets currently used (~15KB) cannot explain the effective dead time
  - The failure is presented to the Aglet as a refusal (it can detect this, wait and retry)
  - Congestion at the central host can be relieved by following a “circuit” of multiple hosts before returning
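Because a refused transfer can be detected, one plausible mitigation is to retry with a growing delay so that agents arriving within the same 1-2 s window do not collide again. This is a sketch under that assumption: `RetryingDispatch` is hypothetical, and the attempt is modelled as a `BooleanSupplier` that returns `false` on refusal rather than as a real Aglets dispatch call.

```java
import java.util.function.BooleanSupplier;

final class RetryingDispatch {
    /**
     * Calls attempt until it reports acceptance, sleeping between refusals with a
     * doubling delay. Returns the 1-based number of the attempt that succeeded,
     * or -1 if every attempt was refused (or the thread was interrupted).
     */
    static int dispatchWithRetry(BooleanSupplier attempt, int maxAttempts, long initialDelayMs) {
        long delayMs = initialDelayMs;
        for (int i = 1; i <= maxAttempts; i++) {
            if (attempt.getAsBoolean()) {
                return i;
            }
            if (i < maxAttempts) {
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                    return -1;
                }
                delayMs *= 2; // exponential backoff
            }
        }
        return -1;
    }
}
```

Doubling the delay keeps the number of retries small; adding a random jitter to each delay would further spread out agents that were refused at the same moment.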

14 Future System Development
- Solve the transfer-interruption problem
- Develop other mobility patterns
  - NFS read access may be tested by writing on one host and timing a read on a different host (to avoid caching)
  - Use of an “itinerary” can ease network congestion at the central server
- A tracking/error-reporting system is being developed and will be connected to a paging system

15 Summary
- The initial implementation is proving useful
- The mobile-agent architecture adds design work but eases implementation and adds flexibility
- Transfer interruption is causing scalability problems, but these are not insurmountable
- Plan to have the expanded system running before data-taking begins

16 Thanks to…
- David Stampf, BNL
- Tom Throwe, BNL
- Bruce Gibbard, BNL
Questions...
Richard Ibbotson, BNL
ibbotson@bnl.gov
