Using Check_MK to Monitor perfSONAR
Shawn McKee / University of Michigan
North American Throughput Meeting, March 9, 2016
Overview of Talk
- Introduction: the need to monitor perfSONAR itself
- Check_MK overview
- Current check_mk services monitoring perfSONAR
- How to install check_mk agents on your perfSONAR hosts
- Summary and questions
Monitoring perfSONAR
- As most of this group should know, perfSONAR is being used to monitor our networks for OSG and WLCG.
- WLCG/OSG deployment status as of today (great progress): 6 : 8 3.5 : 2 : 42 : 165 Unknown: 23 (these nodes are either down or hung)
- One challenge we face is keeping perfSONAR operating correctly across our ~125 sites globally.
- When data isn’t being measured, how do we know? (MaDDash!)
- When data isn’t being measured, what is the reason? (check_mk!)
About OMD/Check_MK
- We need ways to track how our perfSONAR toolkit installations are performing and whether there are issues with their many services or the underlying OS.
- To do this we can use a Nagios-like capability to check that the services running on a specific toolkit instance are functioning.
- ESnet perfSONAR developers have provided a set of Nagios checks to verify that the various perfSONAR toolkit services are functioning correctly.
- Rather than using Nagios alone, we have selected the Open Monitoring Distribution (OMD) for this task.
- OMD combines Nagios, PNP4Nagios, NagVis and Check_MK.
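To make the "Nagios-like check" idea concrete, here is a minimal Python sketch of a plugin-style probe. It is not one of the ESnet-provided checks. It only illustrates the standard Nagios convention of reporting state through exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) plus a one-line message. The function names, the service label and the injectable `connect` parameter are assumptions made for illustration.

```python
import socket

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_tcp_port(host, port, timeout=5.0, connect=socket.create_connection):
    """Return (exit_code, message) for a simple TCP reachability probe.

    `connect` is injectable (hypothetical design choice) so the logic can be
    exercised without touching the network.
    """
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError as exc:
        # Refused connections, timeouts and DNS failures are all OSError subclasses.
        return CRITICAL, f"cannot connect to {host}:{port} ({exc})"
    conn.close()
    return OK, f"{host}:{port} accepts connections"


def format_plugin_output(service, code, message):
    """Build the single status line a Nagios-style plugin prints."""
    return f"{service} {STATE_NAMES[code]} - {message}"
```

A wrapper script would print `format_plugin_output(...)` and then call `sys.exit(code)` so that Nagios (or OMD) can read the state from the exit status. The host and port to probe depend on which toolkit service is being checked.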
Check_mk Features
- We have focused on Check_mk because it provides a number of very nice features:
- We can easily discover, monitor and track services and their performance data.
- It integrates well with Linux OSes.
- It provides graphing, history and availability data automatically.
- Within the WLCG Network and Transfer Metrics WG we have enabled access to OMD/Check_mk via x509 certificates; any valid certificate installed in a browser should work.
perfSONAR Monitoring Pages
- We have 3 versions of our perfSONAR monitoring pages:
- Prototype at maddash.aglt2.org (intending to phase this out soon)
- Testing at OSG’s ITB instance
- Production at OSG’s production instance
- Main monitoring types are MaDDash and OMD/Check_MK.
- Notes: OSG instances rely upon the OSG Datastore. An x509 cert is needed to view check_mk/OMD pages (any IGTF cert).
OSG Network Datastore Diagram
- OSG is gathering relevant metrics from the complete set of OSG and WLCG perfSONAR instances
- Operating now
- Running VMs on dedicated hardware
- Data also published to CERN ActiveMQ instance and available for user subscription
- Actively tuning and debugging
- 8 VMs
- Storage must host 7 distinct areas
OMD for LHCONE/LHCOPN perfSONARs
- We monitor:
- “Expected” test coverage
- NDT/NPAD running?
- Memory on hosts (<4GB)
- New “version” test
- Access requires x509 credential from IGTF CA
- Gives us a good view into where problems still exist
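As an illustration of how a "Memory on hosts (<4GB)" test could be expressed, the following is a minimal sketch of a Check_MK local check in Python. It is an assumption about how such a test might look, not the check actually deployed. Local checks print one line per service in the form `<status> <service_name> <perfdata> <text>`, with status 0 (OK), 1 (WARN), 2 (CRIT) or 3 (UNKNOWN). The service name `perfSONAR_Memory`, the WARN severity and the exact threshold are hypothetical.

```python
import re

# Assumption: flag hosts whose total RAM is below 4 GB (the slide's "<4GB").
MIN_TOTAL_KB = 4 * 1024 * 1024


def mem_total_kb(meminfo_text):
    """Extract MemTotal (in kB) from the contents of /proc/meminfo."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found")
    return int(match.group(1))


def local_check_line(meminfo_text, service="perfSONAR_Memory"):
    """Return one Check_MK local-check line: status, name, perfdata, text."""
    try:
        total_kb = mem_total_kb(meminfo_text)
    except ValueError as exc:
        # "-" means "no performance data" in the local-check format.
        return f"3 {service} - UNKNOWN - {exc}"
    total_gb = total_kb / (1024 * 1024)
    perfdata = f"mem_total_gb={total_gb:.2f}"
    if total_kb < MIN_TOTAL_KB:
        return f"1 {service} {perfdata} WARN - only {total_gb:.2f} GB RAM, below 4 GB"
    return f"0 {service} {perfdata} OK - {total_gb:.2f} GB RAM"
```

On a host, a small script would read `/proc/meminfo`, pass its contents to `local_check_line` and print the result. Note that a machine with nominally 4 GB installed usually reports a slightly smaller MemTotal, so a real check would likely use a somewhat lower threshold.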
OMD Hostgroup Summary: LHCOPN/LHCONE
Jump in… Live Demonstration
- Let’s go to the ITB instance and I will try to demonstrate some features.
- I will be sharing my screen for those attached to Vidyo. Sorry for those on the phone only.
- Open the ITB instance URL from a browser with your x509 certificate installed.
- Let’s start…
Installing Check_mk Agent
- On your perfSONAR toolkit run (as ‘root’):
  yum -y install http://omd.aglt2.org/check-mk-agent…noarch.rpm http://omd.aglt2.org/check-mk-agent-plugins…noarch.rpm
- Then notify Shawn so he can tag and re-inventory your host(s).
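After installing the agent, it can be useful to confirm that it responds before asking for a re-inventory. The sketch below is one hedged way to do that in Python. It assumes the agent is reachable on its default TCP port 6556 and that its plain-text output is divided into sections introduced by `<<<name>>>` header lines. The function names are hypothetical, and the monitoring server itself does not use this code.

```python
import re
import socket

# Section headers look like <<<name>>> or <<<name:options>>>.
SECTION_HEADER = re.compile(r"^<<<([^:>]+)(?::[^>]*)?>>>$")


def parse_agent_output(text):
    """Split raw agent output into {section_name: [lines]}."""
    sections = {}
    current = None
    for line in text.splitlines():
        match = SECTION_HEADER.match(line.strip())
        if match:
            current = match.group(1)
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections


def fetch_agent_output(host, port=6556, timeout=10.0):
    """Read everything the agent sends until it closes the connection.

    Assumption: the host allows connections to the agent port from where
    this runs; many sites restrict it to the monitoring server.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

For example, `sorted(parse_agent_output(fetch_agent_output("your-host")))` lists the sections the agent reported. An empty result or a connection error suggests the agent is not installed or not reachable.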
Discussion/Questions/Comments?
References
- Network documentation
- Deployment documentation for OSG and WLCG, hosted in OSG
- New MA guide
- Modular Dashboard and OMD prototypes
- OSG production instances for OMD, MaDDash and Datastore
- Mesh-config in OSG
- Use-cases document for experiments and middleware