DPM Monitoring: Wahid Bhimji, University of Edinburgh, Apr-10

Presentation transcript:

DPM Monitoring
Wahid Bhimji, University of Edinburgh
Apr-10, Wahid Bhimji – Files access

Intro
New DPM developer Alejandro Álvarez Ayllón is working on new Nagios-based DPM monitoring
  – List of Probes:
  – Bridge to examples running at CERN: septic/nagios3/
He's happy to add more probes (very responsive).
He also wants feedback on sensible WARN / FAIL values
We can also contribute our own probes

LCGDM plugins
Check validity of host certificates
  – check_hostcert
  – Warning and critical configurable: days until the certificate expires
DB password lifetime
  – check_oracle_expiration
  – Warning and critical configurable: days until the password expires
  – Connection string, user and password can be specified
Disk partitions activity (bytes/s in and out)
  – check_partition_activity
  – No warning or critical criteria
  – Individual disks can be selected
CPU utilization (System/Idle/IOwait/IRQ)
  – check_cpu
  – Warning and critical configurable: upper limit of CPU percentage per category
Network activity: bytes/s in and out (and error percentage)
  – check_network
  – No warning or critical criteria
  – Individual interfaces can be selected
Pool free space plus filesystem status
  – check_dpm_pool
  – Warning and critical configurable: free space per subsystem or per pool, specified as bytes (with suffixes K, M, G, T, P)
  – Individual pools can be selected, but no filesystems
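Several of the plugins above share one pattern: a measured value is compared against a configurable warning and critical threshold. The sketch below illustrates that pattern for "days until the certificate expires". It is not the code of check_hostcert; the function names are hypothetical, and the only thing taken as given is the standard Nagios plugin exit-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
from datetime import datetime, timezone

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def days_until(expiry, now):
    """Return the (possibly fractional) number of days from now to expiry."""
    return (expiry - now).total_seconds() / 86400.0


def evaluate_expiry(days_left, warn_days, crit_days):
    """Map remaining days onto a Nagios status and a one-line message.

    Hypothetical helper: the critical threshold is checked first so that
    it wins when both thresholds are crossed.
    """
    if days_left <= crit_days:
        return CRITICAL, "CRITICAL: expires in %.1f days" % days_left
    if days_left <= warn_days:
        return WARNING, "WARNING: expires in %.1f days" % days_left
    return OK, "OK: expires in %.1f days" % days_left


if __name__ == "__main__":
    import sys

    # Illustrative values only: a certificate expiring 20 days from "now",
    # with warning at 30 days and critical at 7 days.
    now = datetime(2010, 4, 1, tzinfo=timezone.utc)
    expiry = datetime(2010, 4, 21, tzinfo=timezone.utc)
    status, message = evaluate_expiry(days_until(expiry, now), 30, 7)
    print(message)
    sys.exit(status)
```

The same comparison would apply to the password-lifetime probe; only the source of the expiry date changes.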

LCGDM probes cont.
Collecting information about disk server activity (network, disk I/O, memory, number of connections), splitting the information between sequential I/O (gridFTP and rfcp) and random I/O (rfio and xroot)
  – check_process can be used for that, except for disk I/O and network usage (apparently a kernel patch is needed for those)
  – Warning and critical configurable: number of instances, % of CPU, % of memory, number of threads, number of connections, number of file descriptors
  – Individual processes can be selected
DPNS ping
  – check_dpns
  – Warning and critical configurable: ping time in milliseconds
  – Can be used remotely
GridFTP
  – check_gridftp
  – No warning criteria. Critical if a file cannot be uploaded or downloaded, or if the comparison is not successful
  – Can be used remotely
Published information
  – check_dpm_infosys
  – No warning criteria. Critical if any of the requested information is not being published
  – Can be used remotely
RFIO
  – check_rfio
  – Everything that applies to the GridFTP probe. Can NOT be executed locally
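The GridFTP and RFIO probes are described as an upload, a download, and a comparison, with any failure being critical. A minimal sketch of that logic follows. It does not reproduce check_gridftp or check_rfio: the transfer steps are passed in as callables (hypothetical `upload` and `download` parameters), so that the control flow can be shown without assuming any particular client command.

```python
import hashlib
import os
import shutil
import tempfile

# Standard Nagios plugin exit codes used by this sketch.
OK, CRITICAL = 0, 2


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def round_trip_check(upload, download, payload=b"nagios probe test\n"):
    """Upload a test file, download it again and compare the two copies.

    upload(local_path) and download(local_path) are caller-supplied
    callables (hypothetical); they should raise an exception on failure.
    There is no warning state: every failure is CRITICAL.
    """
    workdir = tempfile.mkdtemp(prefix="probe_")
    source = os.path.join(workdir, "source")
    copy = os.path.join(workdir, "copy")
    try:
        with open(source, "wb") as handle:
            handle.write(payload)
        try:
            upload(source)
        except Exception as exc:
            return CRITICAL, "CRITICAL: upload failed: %s" % exc
        try:
            download(copy)
        except Exception as exc:
            return CRITICAL, "CRITICAL: download failed: %s" % exc
        if sha256_of(source) != sha256_of(copy):
            return CRITICAL, "CRITICAL: downloaded file differs from original"
        return OK, "OK: upload, download and comparison succeeded"
    finally:
        # Always clean up the local scratch directory.
        shutil.rmtree(workdir, ignore_errors=True)
```

In a real probe the two callables would wrap whichever transfer client the site uses; here they can be replaced by local file copies to exercise the logic.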

From NAGIOS itself
DB activity and size
  – NAGIOS: check_oracle, check_mysql
Number of processes and threads in use
  – NAGIOS: check_procs (not threads, though)
Check if filesystem correctly mounted
  – NAGIOS: check_disk already does this
Disk partitions: used and free
  – NAGIOS: check_disk
Memory: swap, free and used
  – NAGIOS: check_swap
Load average
  – NAGIOS: check_load

From grid-monitoring
Check validity of CRLs
  – crls from org.sam.sec
Check validity of CAs
  – check_ca_dist
Number of sockets used for RFIO and number of sockets used for gridFTP
  – check_netstat.pl from Nagios Exchange can be used for that
Socket count
  – check_netstat.pl does that and much more
Directory size
  – check_dirsize.sh may be useful
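To illustrate what counting sockets per service involves, here is a small sketch that tallies TCP connections by local port from `netstat -tn` style text. It is not check_netstat.pl. The port numbers (2811 for the gridFTP control channel, 5001 for RFIO) are assumed defaults and may differ on a given site, and the sample addresses are made up.

```python
# Assumed default ports; adjust to the local configuration.
GRIDFTP_PORT = 2811
RFIO_PORT = 5001

# Made-up sample in the layout printed by "netstat -tn".
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 192.0.2.10:2811         198.51.100.7:40112      ESTABLISHED
tcp        0      0 192.0.2.10:2811         198.51.100.8:40200      ESTABLISHED
tcp        0      0 192.0.2.10:5001         198.51.100.9:51000      ESTABLISHED
tcp        0      0 192.0.2.10:5001         198.51.100.9:51001      TIME_WAIT
tcp        0      0 192.0.2.10:22           198.51.100.1:60000      ESTABLISHED
"""


def count_sockets_by_port(netstat_text, ports, states=("ESTABLISHED",)):
    """Count TCP sockets whose local port is in ports and state in states."""
    counts = {port: 0 for port in ports}
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP socket line.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_address, state = fields[3], fields[5]
        try:
            port = int(local_address.rsplit(":", 1)[1])
        except (IndexError, ValueError):
            continue
        if port in counts and state in states:
            counts[port] += 1
    return counts


if __name__ == "__main__":
    result = count_sockets_by_port(SAMPLE, [GRIDFTP_PORT, RFIO_PORT])
    print("gridFTP sockets: %d, RFIO sockets: %d"
          % (result[GRIDFTP_PORT], result[RFIO_PORT]))
```

Only established connections are counted by default; passing a different `states` tuple would include sockets in other states.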


Can plot stuff with pnp4nagios
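pnp4nagios builds its graphs from the performance data that a plugin prints after a "|" on its output line, in the form label=value[UOM];warn;crit;min;max. The sketch below formats such a line. The helper names are hypothetical and the DPNS ping numbers are illustrative only, not measured values.

```python
def perf_item(label, value, uom="", warn=None, crit=None,
              minimum=None, maximum=None):
    """Format one performance-data item: label=value[UOM];warn;crit;min;max."""
    def fmt(number):
        return "" if number is None else str(number)

    # Labels containing spaces are wrapped in single quotes.
    if " " in label:
        label = "'%s'" % label
    fields = [fmt(warn), fmt(crit), fmt(minimum), fmt(maximum)]
    # Trailing empty fields can be left out.
    while fields and fields[-1] == "":
        fields.pop()
    item = "%s=%s%s" % (label, fmt(value), uom)
    if fields:
        item += ";" + ";".join(fields)
    return item


def plugin_line(message, perf_items):
    """Join the human-readable message and the performance data with '|'."""
    if not perf_items:
        return message
    return "%s | %s" % (message, " ".join(perf_items))


if __name__ == "__main__":
    # Illustrative numbers only: a 12.5 ms ping with thresholds 100/500 ms.
    print(plugin_line("DPNS OK: ping 12.5 ms",
                      [perf_item("time", 12.5, "ms", warn=100, crit=500)]))
```

Any probe that already reports a number (free space, CPU percentage, socket count) could expose it this way so that it becomes plottable.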

Conclusions / Questions
This is nice: take a look at the probes and give me or Alex some feedback
Or try it out yourself. Not tied to any release
  repository.cern.ch:8080/repository/pm/volatile/repomd/name/lcgdm_head_sl5_x86_64_gcc412/index.html
Do we want to add performance info into this?
  – Like what was in GridPPDPMMonitor
  – Summer student Martin (see DPM Stressing talk) could _maybe_ do some of that