Network performance event management based upon the perfSONAR framework
Robert Szuman, Poznań Supercomputing and Networking Center, Poland
TERENA Networking Conference 2011, Prague
The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 238875 (GÉANT)
Agenda
- Introduction
- Requirements
- Alarms
- perfSONAR
- Event management architecture
- Architecture advantages
- Implementation
- Additional information
- About GÉANT
Introduction
- Efficient and scalable event management functionality for multi-domain network monitoring
- Helps NOCs avoid overlooking network performance incidents
- Prepared for extension when new alarms are needed
- Based on standards (OGF)
- Replacement for existing solutions
  - NPM Alarms Service (the event management tool created for the GÉANT network a few years ago)
- Software product to be deployed in the GN3 Service Area
Requirements (1)
- Scalability: applicable to a multi-domain, highly distributed environment
- Extensibility: new alarms are simple to add
- Flexibility: alarm conditions and threshold values are easily configurable
- Standards: used for communication between architecture components and for the information model (topology)
Requirements (2)
- Use of an existing framework that offers basic functionality suitable for the multi-domain environment: perfSONAR
- Integration with existing monitoring applications: access to multiple network performance data sources
- User-friendly visualisation with the Nagios application (client side)
Alarms (1)
User-selected alarms for the first release:
- RoutingAlarm (sourceSite, destinationSite): if the path, as determined by traceroute output, changes and no light-paths are down between the source and destination, raise an alarm
- InterfaceCongestionAlarm (router, interface): if a router interface's output drops are above a threshold and its utilisation is below a threshold, raise an alarm. Cross-checking the utilisation value suppresses alarms caused by output drops that are expected when utilisation is high
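The two conditions on this slide can be sketched in a few lines of Python. This is only an illustration of the rule logic, not the perfSONAR implementation (which is in Java): the function names, the `CongestionThresholds` class, and the units of the thresholds are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CongestionThresholds:
    """Hypothetical user-configurable limits; units are assumed for illustration."""
    max_output_drops: float  # output drops per measurement interval
    max_utilisation: float   # fraction of link capacity, 0.0 to 1.0


def routing_alarm(previous_path: Sequence[str],
                  current_path: Sequence[str],
                  lightpaths_down: int) -> bool:
    """True when the traceroute path changed and no light-path is down."""
    path_changed = list(previous_path) != list(current_path)
    return path_changed and lightpaths_down == 0


def interface_congestion_alarm(output_drops: float,
                               utilisation: float,
                               limits: CongestionThresholds) -> bool:
    """True when drops are high although utilisation is still low.

    High utilisation makes output drops expected, so that case is suppressed.
    """
    return (output_drops > limits.max_output_drops
            and utilisation < limits.max_utilisation)
```

With limits of 100 drops and 0.8 utilisation, 250 drops at 0.30 utilisation would raise the alarm, while the same 250 drops at 0.95 utilisation would be suppressed by the cross-check.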
Alarms (2)
- RoutingOutOfNetworkWarning (sourceSite, destinationSite): if the route changes and it includes a hop not in the user-defined list, raise an alarm
- InterfaceErrorsAlarm (router, interface): if a router interface's input errors are above a threshold, raise an alarm
The proposed alarm severities (Normal, Warning, Critical) follow the network management software convention because users are already familiar with them
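A minimal Python sketch of these two rules, returning one of the three severities named on the slide. The slide does not say which severity each alarm raises, so mapping RoutingOutOfNetworkWarning to Warning and InterfaceErrorsAlarm to Critical is an assumption, as are the function names and signatures.

```python
from enum import Enum
from typing import Iterable, Sequence


class Severity(Enum):
    NORMAL = "Normal"
    WARNING = "Warning"
    CRITICAL = "Critical"


def routing_out_of_network_warning(previous_path: Sequence[str],
                                   current_path: Sequence[str],
                                   allowed_hops: Iterable[str]) -> Severity:
    """Warn when the route changed and now contains a hop outside the user list.

    Returning WARNING here is an assumption based on the alarm's name.
    """
    allowed = set(allowed_hops)
    changed = list(previous_path) != list(current_path)
    leaves_network = any(hop not in allowed for hop in current_path)
    return Severity.WARNING if changed and leaves_network else Severity.NORMAL


def interface_errors_alarm(input_errors: float, threshold: float) -> Severity:
    """Escalate when input errors exceed the threshold (CRITICAL is assumed)."""
    return Severity.CRITICAL if input_errors > threshold else Severity.NORMAL
```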
perfSONAR
- Multi-domain infrastructure for network performance monitoring
- Set of services delivering performance measurements in a federated environment
- Services act as an intermediate layer between the performance measurement tools and the diagnostic or visualisation applications
- Standardised communication (NMC) and information model (NM) from the Open Grid Forum (OGF)
- Easy to add new functionality, in this case event management
- Wide collaboration: GNx projects, US institutions (led by Internet2 and ESnet), RNP (Brazil)
Event Management Architecture
- Several architecture variants (communication models) were considered
- The most distributed one was selected
- User-defined rules for creating events
- Clear separation of basic functions: storage, alarm detection and visualisation
- Runs within the perfSONAR infrastructure
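The separation of storage, alarm detection and visualisation, together with user-defined rules, can be illustrated with a small in-process Python sketch. In the real architecture these roles are distributed perfSONAR services; every class and function name below is hypothetical and stands in for a service, not for any perfSONAR API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Measurement = Dict[str, float]
# A rule returns a message when it fires, or None when conditions are normal.
Rule = Callable[[Measurement], Optional[str]]


@dataclass(frozen=True)
class Event:
    rule: str
    message: str


class MeasurementStore:
    """Storage role: an in-memory list standing in for a measurement archive."""

    def __init__(self) -> None:
        self._rows: List[Measurement] = []

    def add(self, row: Measurement) -> None:
        self._rows.append(row)

    def latest(self) -> Optional[Measurement]:
        return self._rows[-1] if self._rows else None


class Detector:
    """Detection role: evaluates user-defined rules, unaware of storage or display."""

    def __init__(self) -> None:
        self._rules: Dict[str, Rule] = {}

    def register(self, name: str, rule: Rule) -> None:
        self._rules[name] = rule

    def evaluate(self, row: Measurement) -> List[Event]:
        events = []
        for name, rule in self._rules.items():
            message = rule(row)
            if message is not None:
                events.append(Event(name, message))
        return events


def render(events: List[Event]) -> List[str]:
    """Visualisation role: formats events for a client."""
    return [f"[{event.rule}] {event.message}" for event in events]
```

Because each role only depends on the data passed to it, a new rule can be registered, or the renderer swapped, without touching the other two parts.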
Event Management Architecture – single-domain view
Event Management Architecture – multi-domain view
Architecture advantages
- perfSONAR provides a set of existing functions (e.g. storage, service lookup) prepared for the multi-domain environment
- The distributed nature of perfSONAR services makes it easy to add new functionality or system components
- Separation of components allows them to be replaced dynamically (no need to stop the event management service)
Implementation
- Java implementation (web services)
- Uses the perfSONAR MDM development library for services (pSbase)
- Open source (BSD-like)
- Plugin for visualisation in Nagios (communication via the perfSONAR protocol, NMC)
- Implementation scheduled for the perfSONAR MDM release in August/September 2011
- An early pilot implementation will be provided for testing before the final version
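On the Nagios side, a plugin reports state through its exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and a one-line status message on standard output. The Python sketch below shows how the three alarm severities could map onto that convention. It is an assumption about the mapping, not the actual plugin: the function names are hypothetical and the NMC communication with the perfSONAR service is left out entirely.

```python
from typing import Tuple

# Standard Nagios plugin return codes.
NAGIOS_LABELS = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}
UNKNOWN = 3

# Assumed mapping from the alarm severities to Nagios return codes.
SEVERITY_TO_CODE = {"Normal": 0, "Warning": 1, "Critical": 2}


def nagios_result(severity: str, detail: str) -> Tuple[int, str]:
    """Translate an alarm severity into a Nagios exit code and status line."""
    code = SEVERITY_TO_CODE.get(severity, UNKNOWN)
    return code, f"{NAGIOS_LABELS[code]} - {detail}"


def main(severity: str, detail: str) -> int:
    """Print the status line and return the exit code for the caller to use."""
    code, line = nagios_result(severity, detail)
    print(line)
    return code
```

A real plugin would end with `sys.exit(main(...))` so that Nagios sees the return code; an unrecognised severity falls back to UNKNOWN rather than being reported as OK.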
Additional information
- A detailed specification document is available on the GN3 project web site (community access only)
  Intranet->Research->JRA2->Task3->Documents->Events_Management_in_pS
  https://intranet.geant.net/sites/Research/JRA2/T3/Documents/Events_Management_in_pS/Events_Management_in_perfSONAR.pdf
- The authors (the PSNC team) will be glad to answer any questions sent via email
- More information about perfSONAR MDM: http://perfsonar.geant.net/
The pan-European infrastructure and Global Connectivity
- GN3 contract: total funding from the EC of 93 million euro for four years from 1 April 2009
- Together with Europe's national research networks, GÉANT connects 40 million users in over 8,000 institutions across 40 countries
Thank you for your attention! Robert Szuman (PSNC): rszuman@man.poznan.pl