Published by Sara MacLeod. Modified over 10 years ago.
1
DPM Monitoring
Wahid Bhimji, University of Edinburgh
Apr-10, Wahid Bhimji – Files access
2
Intro
New DPM developer Alejandro Álvarez Ayllón is working on new Nagios-based DPM monitoring
List of probes: https://twiki.cern.ch/twiki/bin/view/EGEE/LCGDMMonitoring
Bridge to examples running at CERN: http://aalvarez.web.cern.ch/aalvarez/cgi/bridge.py/gt-septic/nagios3/
He's happy to add more probes (very responsive)
He also wants feedback on sensible WARN / FAIL values
We can also contribute our own probes
3
LCGDM plugins
Check validity of host certificates
– check_hostcert
– Warning and critical configurable: days until the certificate expires
DB password lifetime
– check_oracle_expiration
– Warning and critical configurable: days until the password expires
– Connection string, user and password can be specified
Disk partitions activity (bytes/s in and out)
– check_partition_activity
– No warning or critical criteria
– Individual disks can be selected
CPU utilization (System/Idle/IOwait/IRQ)
– check_cpu
– Warning and critical configurable: upper limit of CPU percentage per category
Network activity: bytes/s in and out (and error percentage)
– check_network
– No warning or critical criteria
– Individual interfaces can be selected
Pool free space plus filesystem status
– check_dpm_pool
– Warning and critical configurable: free space per subsystem or per pool, specified as bytes (with suffixes K, M, G, T, P)
– Individual pools can be selected, but not filesystems
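The configurable warning and critical values above all follow the same pattern: a measured value is compared against two limits and mapped to a Nagios state. The sketch below illustrates that pattern for "low is bad" metrics such as days until expiry or free space. It is not the code of check_hostcert or check_dpm_pool; the function names are hypothetical, the inclusive comparisons are an assumption, and the suffixes are assumed to be binary multiples (1K = 1024 bytes), which the slides do not specify.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumption: suffixes are binary multiples (1K = 1024 bytes).
_SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4, "P": 1024**5}


def parse_size(text):
    """Convert a string such as '500G' or '1024' into a number of bytes."""
    text = text.strip().upper()
    if text and text[-1] in _SUFFIXES:
        return int(float(text[:-1]) * _SUFFIXES[text[-1]])
    return int(text)


def classify_lower_bound(value, warn, crit):
    """Return a Nagios state for a metric where low values are bad.

    At or below `crit` is CRITICAL, at or below `warn` is WARNING,
    anything else is OK. Works for days left or for free bytes.
    """
    if value <= crit:
        return CRITICAL
    if value <= warn:
        return WARNING
    return OK
```

For example, a hypothetical pool with 3T free, a warning limit of 5T and a critical limit of 1T would be classified with `classify_lower_bound(parse_size("3T"), parse_size("5T"), parse_size("1T"))`, and a plugin would then exit with the returned state.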
4
LCGDM probes cont.
Collecting information about disk server activity (network, disk I/O, memory, number of connections), splitting the information between sequential I/O (gridFTP and rfcp) and random I/O (rfio and xroot)
– check_process can be used for that, except for disk I/O and network usage (apparently a kernel patch is needed for that)
– Warning and critical configurable: number of instances, % of CPU, % of memory, number of threads, number of connections, number of file descriptors
– Individual processes can be selected
DPNS ping
– check_dpns
– Warning and critical configurable: ping time in milliseconds
– Can be used remotely
GridFTP
– check_gridftp
– No warning criteria. Critical if a file cannot be uploaded or downloaded, or if the comparison is not successful
– Can be used remotely
Published information
– check_dpm_infosys
– No warning criteria. Critical if any of the requested information is not being published
– Can be used remotely
RFIO
– check_rfio
– Everything that applies to the GridFTP probe. Can NOT be executed locally
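The GridFTP and RFIO probes are described as an upload, a download and a comparison, with any failure being critical. A minimal sketch of that logic follows. It is not the implementation of check_gridftp or check_rfio: the `upload` and `download` arguments are hypothetical caller-supplied callables standing in for the real transfer commands, and the SHA-256 comparison is an assumption about how "comparison" might be done.

```python
import hashlib
import os
import tempfile

# Standard Nagios plugin exit codes used by this sketch.
OK, CRITICAL = 0, 2


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def round_trip_check(upload, download, payload=b"nagios probe test\n"):
    """Upload a test file, fetch it back, and compare the two copies.

    `upload(local_path)` sends the file to the storage element and
    `download(local_path)` writes the remote copy to `local_path`.
    Both are hypothetical hooks; any exception counts as CRITICAL.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe.src")
        dst = os.path.join(tmp, "probe.dst")
        with open(src, "wb") as handle:
            handle.write(payload)
        try:
            upload(src)
        except Exception as exc:
            return CRITICAL, "upload failed: %s" % exc
        try:
            download(dst)
        except Exception as exc:
            return CRITICAL, "download failed: %s" % exc
        try:
            same = sha256_of(src) == sha256_of(dst)
        except OSError as exc:
            return CRITICAL, "comparison failed: %s" % exc
        if not same:
            return CRITICAL, "downloaded file differs from uploaded file"
    return OK, "upload, download and comparison succeeded"
```

There is deliberately no WARNING branch, matching the "no warning criteria" description of these probes.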
5
From NAGIOS itself
DB activity and size
– NAGIOS: check_oracle, check_mysql
Number of processes and threads in use
– NAGIOS: check_procs (not threads, though)
Check if filesystem correctly mounted
– NAGIOS: check_disk already does this
Disk partitions: used and free
– NAGIOS: check_disk
Memory: swap, free and used
– NAGIOS: check_swap
Load average
– NAGIOS: check_load
6
From grid-monitoring
Check validity of CRLs
– crls from org.sam.sec
Check validity of CAs
– check_ca_dist
Number of sockets used for RFIO and number of sockets used for gridFTP
– check_netstat.pl from Nagios Exchange can be used for that
Socket count
– check_netstat.pl does that and much more
Directory size
– check_dirsize.sh may be useful
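To illustrate what counting sockets per protocol involves, here is a rough sketch that tallies established TCP connections by local port from `netstat -tn` style text. It is not how check_netstat.pl works internally. The port numbers are assumptions (2811 for the gridFTP control channel and 5001 for the RFIO daemon are common defaults, but a site may differ), and gridFTP data channels on other ports are not counted.

```python
from collections import Counter

# Assumption: default ports; adjust to the site's configuration.
SERVICE_PORTS = {2811: "gridftp", 5001: "rfio"}


def count_service_sockets(netstat_output, ports=SERVICE_PORTS):
    """Count ESTABLISHED TCP sockets per service from `netstat -tn` text."""
    counts = Counter({name: 0 for name in ports.values()})
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns:
        # Proto Recv-Q Send-Q Local-Address Foreign-Address State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "ESTABLISHED":
            continue
        # The local port is whatever follows the last colon.
        port_text = fields[3].rsplit(":", 1)[-1]
        if port_text.isdigit() and int(port_text) in ports:
            counts[ports[int(port_text)]] += 1
    return dict(counts)
```

The returned dictionary always contains every service name, so a service with no connections reports 0 rather than being absent.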
8
Probe results can be plotted with pnp4nagios
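pnp4nagios draws its graphs from the performance data that a plugin prints after the `|` character in its output line, in the standard `label=value[UOM];warn;crit;min;max` form. The sketch below shows one way a probe could build such a line. The helper names and the example metric are hypothetical and are not taken from the LCGDM probes.

```python
def format_perfdata(label, value, uom="", warn=None, crit=None,
                    minimum=None, maximum=None):
    """Build one Nagios perfdata item: label=value[UOM];warn;crit;min;max."""
    def fmt(item):
        return "" if item is None else str(item)

    fields = [fmt(warn), fmt(crit), fmt(minimum), fmt(maximum)]
    # Trailing empty fields may be dropped; inner ones must be kept.
    while fields and fields[-1] == "":
        fields.pop()
    # Labels containing spaces have to be wrapped in single quotes.
    quoted = "'%s'" % label if " " in label else label
    return ";".join(["%s=%s%s" % (quoted, value, uom)] + fields)


def plugin_line(service, state_name, message, perfdata_items):
    """Build the single output line a plugin prints before exiting."""
    return "%s %s - %s | %s" % (
        service, state_name, message, " ".join(perfdata_items))
```

A hypothetical DPNS ping result of 12 ms with limits of 100 ms and 500 ms would be printed with `plugin_line("DPNS", "OK", "ping 12 ms", [format_perfdata("time", 12, "ms", warn=100, crit=500)])`.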
9
Conclusions / Questions
This is nice: take a look at the probes and give me or Alex some feedback
Or try it out yourself. Not tied to any release
http://etics-repository.cern.ch:8080/repository/pm/volatile/repomd/name/lcgdm_head_sl5_x86_64_gcc412/index.html
Do we want to add performance info into this?
– Like what was in GridPPDPMMonitor
– Summer student Martin (see DPM Stressing talk) could _maybe_ do some of that