Published by Jocelyn Davis. Modified over 9 years ago.
1
Performance Monitoring
Gidon Moont, g.moont@imperial.ac.uk
e-Science, HEP, Imperial College London
Talk to JRA1 All-Hands Meeting @ CERN
2
24 March 2006, Performance Monitoring
Introduction
How we gather data.
How we release the information.
– Real Time Monitor
– LCG Load Monitor
– Daily Reports
– XML files and ROOT analysis
Interesting metrics
3
How we gather data
The data comes from direct queries of the MySQL databases of Resource Brokers (RBs).
Around 30 Resource Brokers are currently monitored.
Each is queried once a minute:
– find all jobs that had an event in the last minute
– retrieve status and CE/WN information
– write a complete (XML) description of all jobs
– remove jobs 2 hours after they reach a finished status (or once they are Cleared)
– as a job is removed, query all its events and write a summary file
The collector is a multithreaded Java program (one thread per RB).
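The removal rule in the list above can be sketched as follows. This is a minimal Python illustration, not the Java program described on the slide. The `Job` fields, the set of statuses treated as "finished", the choice to drop Cleared jobs without waiting, and the function names are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: the slide does not say which statuses count as "finished".
FINISHED_STATUSES = {"Done", "Aborted", "Cancelled"}
RETENTION = timedelta(hours=2)  # from the slide: finished jobs are kept for 2 hours


@dataclass
class Job:
    job_id: str           # hypothetical field names
    status: str
    last_event: datetime  # time of the most recent event seen for this job


def should_remove(job: Job, now: datetime) -> bool:
    """Return True if the job should leave the monitored set."""
    if job.status == "Cleared":
        return True  # assumption: Cleared jobs are dropped without waiting
    if job.status in FINISHED_STATUSES:
        return now - job.last_event >= RETENTION
    return False


def prune(jobs: dict, now: datetime) -> list:
    """Remove expired jobs from `jobs` in place and return them.

    The returned jobs are the ones for which a summary would be written.
    """
    removed = [job for job in jobs.values() if should_remove(job, now)]
    for job in removed:
        del jobs[job.job_id]
    return removed
```

In a once-a-minute polling loop, `prune` would run after the status update for each RB, and each returned job would trigger the event query and summary file.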
4
Current RB List
gdrb01.cern.ch    lcgrb01.gridpp.rl.ac.uk        rb01.pic.es
gdrb02.cern.ch    gfe01.hep.ph.ic.ac.uk          rb-egee.bifi.unizar.es
gdrb03.cern.ch    egee-rb-01.cnaf.infn.it        grid09.lal.in2p3.fr
gdrb04.cern.ch    egee-rb-02.cnaf.infn.it        node04.datagrid.cea.fr
gdrb06.cern.ch    egee-rb-03.cnaf.infn.it        mu3.matrix.sara.nl
gdrb07.cern.ch    gridit-rb-01.cnaf.infn.it      rb.isabella.grnet.gr
gdrb08.cern.ch    a01-004-127.gridka.de          rb101.grid.ucy.ac.cy
gdrb09.cern.ch    grid-rb0.desy.de               grid151.kfki.hu
gdrb10.cern.ch    grid-rb2.desy.de               lcg16.sinp.msu.ru
gdrb11.cern.ch    lcg00124.grid.sinica.edu.tw    rb.phy.bg.ac.yu
                                                 ui.ulakbim.gov.tr
5
Real Time Monitor
The Real Time Monitor has developed from a demo showing real-time usage of the LCG.
Further development will include sortable tables of RB/CE info.
Java applet: does not require extra libraries.
6
LCG Load Monitor
Requested as a tool to monitor the London Tier 2.
Java application.
Can monitor RBs, CEs, and groups of CEs (e.g. a T2).
Jobs colour coded by VO (stacked).
Sortable table of all current jobs.
7
Daily Reports
PDF documents are created automatically at 3am.
They provide counts and metrics for all jobs that left the RTM in a 24-hour period.
Analysis is split by
– Resource Broker
– Virtual Organisation
– Computing Element
Metrics can identify problems.
The data used to generate the reports is available as a tab-delimited plain text file on request.
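As an illustration of the per-RB, per-VO, and per-CE split, the sketch below counts jobs by one column of tab-delimited text. The layout of the real report file is not given on the slide, so the column names (`rb`, `vo`, `ce`, `status`) and the sample rows are invented for the example.

```python
import csv
import io
from collections import Counter

# Invented sample; the real file's columns and values are not described on the slide.
SAMPLE = (
    "rb\tvo\tce\tstatus\n"
    "rb-a\tvo1\tce-x\tDone\n"
    "rb-a\tvo2\tce-x\tAborted\n"
    "rb-b\tvo1\tce-y\tDone\n"
)


def count_by(tsv_text: str, column: str) -> Counter:
    """Count jobs per distinct value of `column` in tab-delimited text with a header row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row[column] for row in reader)
```

Calling `count_by(SAMPLE, "vo")` gives the job count per VO; passing `"rb"` or `"ce"` gives the other two splits.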
8
XML Files and ROOT
Information from each RB is presented as an XML file.
For efficiency reasons the RTM and LCG Load programs use a single plain text file.
To see long-term trends, the data is imported into ROOT. Graphs can then be made with larger data sets, and time-dependent trends can be shown.
We currently have data for half a year (from September 2005 to now).
ROOT file available on request.
9
Interesting Metrics
We can identify RB problems by looking at the match time for jobs. We have established that all RBs slow down when more than 10 jobs/second are being submitted.
We can show VO behaviour by average job lengths and success rates, as well as by the usage of LCG components (RBs/CEs used) and the number of users (unique DNs).
We can measure CE/VO efficiency in two ways: by the fraction of successful jobs, and by "Useful Time", the computational WN time that resulted in a Done (Success) state as a fraction of the total time of all jobs (including those that failed).
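The two efficiency measures can differ noticeably, which a small sketch makes concrete. The input format, a list of `(succeeded, wn_seconds)` pairs, and the function names are assumptions for illustration; the slide only defines the ratios themselves.

```python
def success_fraction(jobs):
    """Fraction of jobs that ended in Done (Success)."""
    jobs = list(jobs)
    if not jobs:
        return 0.0
    return sum(1 for succeeded, _ in jobs if succeeded) / len(jobs)


def useful_time(jobs):
    """WN time spent in successful jobs divided by WN time of all jobs."""
    jobs = list(jobs)
    total = sum(seconds for _, seconds in jobs)
    if total == 0:
        return 0.0
    useful = sum(seconds for succeeded, seconds in jobs if succeeded)
    return useful / total


# Invented example: two long successful jobs and two short failures.
example = [(True, 3600), (False, 600), (False, 600), (True, 1200)]
print(success_fraction(example))  # 0.5
print(useful_time(example))       # 0.8
```

Here half the jobs fail, yet 80% of the WN time was useful because the failures were short; a site where failures run for a long time before dying would show the opposite pattern.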
10
RB Match Times
[Figure: job scheduling (Match Time) versus load (mean number of jobs/sec during the matching)]
11
DNs over time / VO
[Figure: unique DNs over time, per VO]
We can see weekends, as well as relative users per VO.
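Counting users as unique DNs per VO per day amounts to collecting DNs into sets. The sketch below assumes job records are available as `(day, vo, dn)` tuples; that record shape and the sample DNs are hypothetical.

```python
from collections import defaultdict
from datetime import date


def unique_users(records):
    """Return {(day, vo): number of distinct DNs} for (day, vo, dn) records."""
    seen = defaultdict(set)
    for day, vo, dn in records:
        seen[(day, vo)].add(dn)  # a set ignores repeat submissions by the same DN
    return {key: len(dns) for key, dns in seen.items()}


# Invented sample records: a weekday and the following Saturday.
records = [
    (date(2006, 3, 24), "vo1", "/CN=user one"),
    (date(2006, 3, 24), "vo1", "/CN=user one"),
    (date(2006, 3, 24), "vo1", "/CN=user two"),
    (date(2006, 3, 24), "vo2", "/CN=user three"),
    (date(2006, 3, 25), "vo1", "/CN=user one"),
]
users = unique_users(records)
```

Plotting the values of `users` against the day, one series per VO, would give the kind of curve in which weekend dips are visible.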
12
Useful Time
[Figure: Useful Time for those CEs that had more than 30000 jobs submitted from September 2005 to February 2006 inclusive]
13
URLs etc.
http://gridportal.hep.ph.ic.ac.uk/rtm/
lcg-monitor@imperial.ac.uk