Using Check_MK to Monitor perfSONAR
Shawn McKee / University of Michigan
North American Throughput Meeting, March 9th, 2016

Overview of Talk
- Introduction: the need to monitor perfSONAR itself
- Check_MK
  - Overview
  - Current check_mk services
- Monitoring perfSONAR
- How to install check_mk agents on your perfSONAR
- Summary and questions

Monitoring perfSONAR
- As most of this group should know, perfSONAR is being used to monitor our networks for OSG and WLCG
- WLCG/OSG deployment status as of today (great progress), by toolkit version:
  - [version missing]: 6
  - [version missing]: 8
  - 3.5: 2
  - [version missing]: 42
  - [version missing]: 165
  - Unknown: 23 (these nodes are either down or hung)
- One challenge we face is keeping perfSONAR operating correctly among our ~125 sites globally
- When data isn't being measured, how do we know? (MaDDash!)
- When data isn't being measured, what is the reason? (check_mk!)
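
The "how do we know?" question above comes down to a freshness test on each host's newest result. The following is a minimal sketch of that idea only, not how MaDDash itself works: the mapping of host name to last-measurement time, the host names, and the 24-hour threshold are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def stale_hosts(last_seen, now, max_age=timedelta(hours=24)):
    """Return sorted host names whose newest measurement is missing or older than max_age.

    last_seen: dict mapping host name -> datetime of newest result, or None (hypothetical input).
    max_age:   hypothetical threshold; pick whatever matches the test schedule.
    """
    stale = []
    for host, timestamp in last_seen.items():
        if timestamp is None or now - timestamp > max_age:
            stale.append(host)
    return sorted(stale)


if __name__ == "__main__":
    now = datetime(2016, 3, 9, 12, 0, tzinfo=timezone.utc)
    sample = {
        "ps-latency.example.org": now - timedelta(hours=2),    # fresh
        "ps-bandwidth.example.org": now - timedelta(hours=40), # stale
        "ps-down.example.org": None,                           # never reported
    }
    for host in stale_hosts(sample, now):
        print("no recent data from", host)
```

A freshness test like this only says that data is missing; finding the reason still needs host-level checks, which is where check_mk comes in.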

About OMD/Check_MK
- We need ways to track how our perfSONAR toolkit installations are performing and whether there are issues with their many services or the underlying OS.
- To do this we can use a Nagios-like capability to check that the services running on a specific toolkit instance are functioning.
- ESnet perfSONAR developers have provided a set of Nagios checks to monitor and verify that the various perfSONAR toolkit services are functioning correctly.
- Rather than just using Nagios, we have selected the Open Monitoring Distribution (OMD) for this task.
- OMD combines Nagios, PNP4Nagios, NagVis and Check_MK.
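
One generic way a Check_MK agent can report a custom service state is a "local check": a script whose output is one line per service in the form `<status> <service_name> <perfdata> <text>`, where status is 0 (OK), 1 (WARN), 2 (CRIT) or 3 (UNKNOWN) and perfdata is `-` when there is none. The sketch below only formats such lines; the service name and metric are hypothetical, and this is not a description of how the ESnet-provided Nagios checks are wired in.

```python
def local_check_line(status, name, text, perfdata=None):
    """Format one Check_MK local-check result line.

    status:   0 OK, 1 WARN, 2 CRIT, 3 UNKNOWN
    name:     service name, must not contain whitespace
    perfdata: optional dict of metric name -> value, joined with '|'
    """
    if status not in (0, 1, 2, 3):
        raise ValueError("status must be 0, 1, 2 or 3")
    if not name or any(ch.isspace() for ch in name):
        raise ValueError("service name must be non-empty and contain no whitespace")
    if perfdata:
        perf = "|".join(f"{key}={value}" for key, value in perfdata.items())
    else:
        perf = "-"
    return f"{status} {name} {perf} {text}"


if __name__ == "__main__":
    # Hypothetical service name and metric, for illustration only.
    print(local_check_line(0, "perfsonar_example_service", "service is running"))
    print(local_check_line(2, "perfsonar_example_service", "service is not running",
                           perfdata={"procs": 0}))
```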

Check_mk Features
- We have focused on Check_mk because it provides a number of very nice features
  - We can easily discover, monitor and track services and their performance data
  - Integrates well with Linux OSes
  - Provides graphing, history and availability data automatically
  - See [URL missing]
- Within the WLCG Network and Transfer Metrics WG we have enabled access to OMD/Check_mk via x509 certificates; any valid certificate in a browser should work

perfSONAR Monitoring Pages
- We have 3 versions of our perfSONAR monitoring pages
  - Prototype at maddash.aglt2.org (intending to phase this out soon)
  - Testing at OSG's ITB instance
  - Production at OSG's production instance
- Main monitoring types are MaDDash and OMD/Check_MK
  - Prototype: [URLs missing]
  - Testing: [URLs missing]
  - Production: [URLs missing]
- Notes:
  - OSG instances rely upon the OSG Datastore: [URL missing]
  - X509 cert needed to view check_mk/OMD pages (any IGTF cert)

OSG Network Datastore Diagram
- OSG is gathering relevant metrics from the complete set of OSG and WLCG perfSONAR instances
- Operating now
- Running VMs on dedicated hardware
- Data also published to a CERN ActiveMQ instance and available for user subscription
- Actively tuning and debugging
- 8 VMs
- Storage must host 7 distinct areas

OMD for LHCONE/LHCOPN perfSONARs
- Instances: [URL missing] (Prototype), [URL missing] (Production)
- We monitor:
  - "Expected" test coverage
  - NDT/NPAD running?
  - Memory on hosts (<4GB)
  - New "version" test
- Access requires an x509 credential from an IGTF CA
- Gives us a good view into where problems still exist
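
As an illustration of the "Memory on hosts (<4GB)" item, here is a minimal sketch of a host-side memory test that parses `/proc/meminfo`. The function names, the WARN status, and the exact threshold handling are assumptions, not the check used on the OMD instances. Note that `MemTotal` is usually a little below the installed RAM, so a real check would likely use a threshold somewhat under 4 GB.

```python
import os
import re

KB_PER_GB = 1024 * 1024


def mem_total_kb(meminfo_text):
    """Extract the MemTotal value (in kB) from /proc/meminfo-style text."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found")
    return int(match.group(1))


def memory_status(meminfo_text, min_gb=4.0):
    """Return (status, message): 1 (WARN) if total memory is below min_gb, else 0 (OK)."""
    total_gb = mem_total_kb(meminfo_text) / KB_PER_GB
    if total_gb < min_gb:
        return 1, f"only {total_gb:.1f} GB RAM, below {min_gb:g} GB"
    return 0, f"{total_gb:.1f} GB RAM"


if __name__ == "__main__":
    # Only meaningful on Linux; skipped elsewhere.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as handle:
            print(memory_status(handle.read()))
```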

OMD Hostgroup Summary LHCOPN/LHCONE

Jump in… Live Demonstration
- Let's go to the ITB instance and I will try to demonstrate some features. I will be sharing my screen for those connected via Vidyo. Sorry for those on the phone only.
- Open the following URL from a browser with your x509 certificate installed: [URL missing]
- Let's start…

Installing Check_mk Agent
- See [URL missing]
- On your perfSONAR toolkit run (as 'root'):
    yum -y install http://omd.aglt2.org/check-mk-agent… http://omd.aglt2.org/check-mk-agent-plugins…
  (the RPM file names are truncated in this transcript; the surviving fragments end in "el6.noarch.rpm" and "p16-1.noarch.rpm")
- Then notify Shawn so he can tag and re-inventory your host(s)
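
After installing the agent, one way to confirm it responds is to read its output and look for the `<<<check_mk>>>` section. The sketch below assumes the classic plain-text agent listening on TCP port 6556, which is Check_MK's default and is not stated in the talk. The host name in the usage note is hypothetical, and the monitoring server must be allowed to reach the port.

```python
import socket


def fetch_agent_output(host, port=6556, timeout=10.0):
    """Read the full plain-text output of a Check_MK agent (assumes default port 6556)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_sections(output):
    """Split agent output into {section_name: [lines]} using the <<<name>>> headers."""
    sections = {}
    current = None
    for line in output.splitlines():
        if line.startswith("<<<") and line.endswith(">>>"):
            # Headers may carry options after a colon, e.g. <<<name:sep(58)>>>.
            current = line[3:-3].split(":", 1)[0]
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections


if __name__ == "__main__":
    sample = "<<<check_mk>>>\nVersion: x.y.z\nAgentOS: linux\n<<<uptime>>>\n12345.67 54321.00\n"
    for name, lines in parse_sections(sample).items():
        print(name, len(lines), "line(s)")
```

Against a real host this would be called as `parse_sections(fetch_agent_output("ps.example.org"))`; a missing `check_mk` section or a refused connection suggests the agent is not installed or not reachable.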

Discussion/Questions/Comments?

References
- Network Documentation
- Deployment documentation for OSG and WLCG hosted in OSG
- New MA guide
- Modular Dashboard and OMD Prototypes
- OSG Production instances for OMD, MaDDash and Datastore
- Mesh-config in OSG
- Use-cases document for experiments and middleware