Connect. Communicate. Collaborate
GÉANT2 monitoring
Otto Kreiter, DANTE
Navneet Daga, DANTE
LHC Monitoring Workshop, Munich,

Agenda
Extraction of monitoring information from the GÉANT2 network
External application developed by DANTE for JRA-4
Demonstration of a home-grown weather-map
Conclusion

Network Element Manager (NM)
All network elements (NEs) communicate with the NM separately
The NM's task is to configure and monitor each NE one by one
It is not service aware: it has no knowledge of the intra-domain e2e path status

Regional Network Manager (RM)
Topology
Services
Correlation
“User” interface

How we export data
[Diagram labels: Alarms, Perf. Meas., Rem. Inv.]

Status via alarms
[Diagram labels: Alarms, SNMPTrapD, Monitoring station]

Alarm content
From the NM:
  – Information about interfaces and associated signal status, SDH timing problems
  – NE and ILA status
From the RM:
  – Information related to services
  – Information related to paths, trails and physical connections at all layers

One hop case
[Diagram: NMS vs JRA-4. Labels: Path – gen_mil_CERN; OCH trail; Phys-link; Phys link; Domain link; P. ID link; BOL-CERN-LHC-001]

Multiple hop case
[Diagram: NMS vs JRA-4. Labels: Path – gen_mil_CERN; OCH trail; Phys-link; Phys link; Domain link; P. IDLink; CERN-SARA-LHC-001; OCH trail; Phys-link; P. IDLink]

Alarm processing
SNMP traps from the Alcatel IOO module
Alcatel Enterprise v1/v2c MIB
SNMP traps received by a Linux station
  – snmptrapd to pick up all alarms
  – For each trap a bash script is called which performs:
      – Analysis
      – Selection
      – Action

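As an illustration of the "analysis" step, the sketch below parses one trap in the text form that snmptrapd hands to a traphandle program on standard input: a hostname line, an address line, then one "OID value" line per variable binding. The slides describe a bash script; Python is used here only for illustration. The short variable names (friendlyName, perceivedSeverity) and all sample values are assumptions, since real traps carry the full OIDs of the Alcatel MIB.

```python
import io


def parse_trap(stream):
    """Parse one trap as delivered by snmptrapd to a traphandle program."""
    lines = [line.rstrip("\n") for line in stream]
    if len(lines) < 2:
        raise ValueError("expected a hostname line and an address line")
    trap = {"hostname": lines[0], "address": lines[1], "varbinds": {}}
    for line in lines[2:]:
        parts = line.split(None, 1)  # OID, then the rest of the line
        if not parts:
            continue  # skip blank lines
        value = parts[1] if len(parts) > 1 else ""
        trap["varbinds"][parts[0]] = value.strip().strip('"')
    return trap


# Hypothetical sample input; names and values are made up for the example.
sample = io.StringIO(
    "nms.example.net\n"
    "UDP: [192.0.2.10]:162\n"
    'friendlyName "EXAMPLE-TRAIL-001"\n'
    "perceivedSeverity major\n"
)
trap = parse_trap(sample)
print(trap["varbinds"]["friendlyName"])
```

In a real handler the stream would be sys.stdin, and the resulting dictionary would feed the selection and action steps.
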
Alarm type & information
Alarm Raise:
  – friendlyName
  – probableCause
  – perceivedSeverity
  – currentAlarmId
  – eventTime
  – acknowledgementStatus
  – additionalInformation
  – eventType
  – snmpTrapAddress
Alarm Clear:
  – friendlyName
  – probableCause
  – currentAlarmId
  – eventTime
  – snmpTrapAddress

Used alarm information
Alarm Raise:
  – friendlyName
  – probableCause
  – perceivedSeverity
  – currentAlarmId
  – eventTime
  – acknowledgementStatus
  – additionalInformation
  – eventType
  – snmpTrapAddress
Alarm Clear:
  – friendlyName
  – probableCause
  – currentAlarmId
  – eventTime
  – snmpTrapAddress

Alarm analyzer process
[Flowchart]
SNMP trap received; snmpTrapAddress must be registered
Record all traps
Check for type of alarm
  – Raise: Additional Info (path, clientpath, ochtrail, omstrail, physicallink), recordAlarm, call external program
  – Clear: alarmID, read recordAlarm, call external program, delete recordAl

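One possible reading of this flowchart is sketched below. It is not the authors' bash script: the "kind" field, the in-memory dictionary standing in for recordAlarm, and the assumption that additionalInformation carries exactly one of the layer keywords are all hypothetical.

```python
# Layer keywords named in the flowchart for the "raise" branch.
MONITORED_LAYERS = {"path", "clientpath", "ochtrail", "omstrail", "physicallink"}


def analyze(trap, registered_sources, records, log, notify):
    """Process one trap dict and return a short outcome string."""
    log.append(trap)  # "Record all traps"
    if trap["snmpTrapAddress"] not in registered_sources:
        return "ignored: unregistered source"
    alarm_id = trap["currentAlarmId"]
    if trap["kind"] == "raise":
        if trap.get("additionalInformation") not in MONITORED_LAYERS:
            return "ignored: layer not monitored"
        records[alarm_id] = trap["friendlyName"]  # recordAlarm
        notify(trap["friendlyName"], "DOWN")      # call external program
        return "recorded"
    if trap["kind"] == "clear":
        name = records.pop(alarm_id, None)        # read, then delete the record
        if name is None:
            return "ignored: no matching raise"
        notify(name, "UP")                        # call external program
        return "cleared"
    return "ignored: unknown type"


# Hypothetical usage with made-up values.
calls, log, records = [], [], {}
sources = {"192.0.2.10"}
raise_trap = {"kind": "raise", "snmpTrapAddress": "192.0.2.10",
              "currentAlarmId": "42", "friendlyName": "EXAMPLE-TRAIL-001",
              "additionalInformation": "ochtrail"}
clear_trap = {"kind": "clear", "snmpTrapAddress": "192.0.2.10",
              "currentAlarmId": "42", "friendlyName": "EXAMPLE-TRAIL-001"}
first = analyze(raise_trap, sources, records, log, lambda n, s: calls.append((n, s)))
second = analyze(clear_trap, sources, records, log, lambda n, s: calls.append((n, s)))
print(first, second)
```

Looking up the clear by alarm ID rather than by name follows the flowchart, where the clear branch starts from alarmID and reads the stored record.
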
Alarm analyzer
Called every time a trap is received
Written in bash
Each trap is analyzed separately; a trap that arrives in the meantime waits in the snmptrapd queue
  – Possible problem: if an external program gets stuck, the script hangs and the alarms remain unprocessed in the queue
Must maintain state
  – SNMP traps may get lost, so a program needs to check from time to time whether the monitoring station is in sync with the NMS

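The periodic synchronisation check can be pictured as a set comparison between the alarm IDs recorded locally and those currently active on the NMS. This is a minimal sketch; how the list of active alarms is obtained from the NMS is not described in the slides and is left out here.

```python
def reconcile(local_alarm_ids, nms_alarm_ids):
    """Compare locally recorded alarm IDs with those active on the NMS."""
    local = set(local_alarm_ids)
    nms = set(nms_alarm_ids)
    return {
        # Active on the NMS but unknown locally: a raise trap was lost.
        "missed_raises": sorted(nms - local),
        # Recorded locally but no longer active: a clear trap was lost.
        "missed_clears": sorted(local - nms),
    }


# Made-up alarm IDs for illustration.
result = reconcile(["10", "11", "12"], ["11", "12", "13"])
print(result)
```
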
External applications
JRA-4 monitoring (XML file generation)
perfSONAR DB feeder
Project weather-map: LHC

JRA-4 monitoring (XML file generation)

E2E data transformation
Prototype applications developed in Java
  – E2EXMLWriter
  – XMLGenerator
E2EXMLWriter takes in a template XML and produces an XML file containing live e2e path status information conforming to the JRA4 e2e data model
  – Triggered by a script listening to SNMP alarms
  – Parameters passed
      – Trail ID
      – Status
XMLGenerator produces the template XML that E2EXMLWriter uses to export the domain’s e2e information

Design of E2EXMLWriter
Relies on 2 configuration files to produce live XML status information
  – Properties file (links.properties)
      – Contains key = value entries
      – Each key is one e2e path name
      – The value of each key is a CSV list of the trails that form one path
      – Currently manually maintained
  – Alarm register
      – A simple CSV file
      – Application maintained
      – An “alarm raise” registers the associated path
      – An “alarm clear” de-registers the associated path

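The interplay of the two files can be sketched as follows. The real prototype is written in Java; this Python version is only an illustration. It assumes plain "key = value" lines (not the full Java properties syntax) and one path name per row in the alarm register, whose actual layout the slides do not specify. Path and trail names are made up.

```python
import csv
import io


def load_links(text):
    """Parse 'path = trail1, trail2' lines into {path: [trail, ...]}."""
    links = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if not sep or key.startswith("#"):
            continue  # skip blanks, comments and malformed lines
        links[key.strip()] = [t.strip() for t in value.split(",") if t.strip()]
    return links


def apply_alarm(register_text, links, trail_id, status):
    """Return the new alarm-register CSV text after one raise or clear."""
    registered = {row[0] for row in csv.reader(io.StringIO(register_text)) if row}
    for path, trails in links.items():
        if trail_id in trails:
            if status == "raise":
                registered.add(path)       # register the associated path
            elif status == "clear":
                registered.discard(path)   # de-register the associated path
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for path in sorted(registered):
        writer.writerow([path])
    return out.getvalue()


links = load_links("path_A = TRAIL-1, TRAIL-2\npath_B = TRAIL-3\n")
register = apply_alarm("", links, "TRAIL-2", "raise")
register = apply_alarm(register, links, "TRAIL-3", "raise")
register = apply_alarm(register, links, "TRAIL-2", "clear")
print(register)
```

Because the register holds path names only, a clear on one trail de-registers the path even if another of its trails is still alarmed; tracking trail IDs instead would avoid that, at the cost of departing from the design described above.
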
Design (contd.)
The application sets the default status of every path to UP, with admin state NORMALOPERATION
Only the paths “registered” in the alarm-register CSV file are set to DOWN, with admin state MAINTENANCE
The status DEGRADED is not implemented at the moment
Other admin states are not implemented at the moment

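The status rule above amounts to a simple lookup, sketched here in Python for illustration. The function name and the sample path names are hypothetical; the status and admin-state values are the ones named on the slide.

```python
def path_states(path_names, registered):
    """Map each path to (operational status, admin state)."""
    states = {}
    for name in path_names:
        if name in registered:
            states[name] = ("DOWN", "MAINTENANCE")
        else:
            states[name] = ("UP", "NORMALOPERATION")  # default for every path
    return states


states = path_states(["path_A", "path_B"], registered={"path_B"})
print(states)
```
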
Design of XMLGenerator
Relies on 3 configuration files
  – Properties file (init.properties)
      – Contains a key = value entry
      – Key = DOMAIN
      – Value = the domain name
      – Enables on-the-fly domain name configuration
  – Config file (config.csv)
      – A simple CSV file
      – Contains node-link-node information
  – A sample XML file containing “pieces of XML” to be replicated for each node and link in the final output “template XML”
All configuration files are currently manually maintained

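To make the node-link-node idea concrete, the sketch below builds a template document from CSV rows. It swaps the "pieces of XML" replication for Python's xml.etree.ElementTree, and the element and attribute names (topology, node, link, from, to) are invented for the example; they are not the JRA4 e2e data model. The column order node, link, node is also an assumption.

```python
import csv
import io
import xml.etree.ElementTree as ET


def build_template(domain, config_csv):
    """Build a template XML string from 'node,link,node' rows."""
    root = ET.Element("topology", {"domain": domain})
    seen = set()
    for row in csv.reader(io.StringIO(config_csv)):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        node_a, link, node_b = (cell.strip() for cell in row)
        for node in (node_a, node_b):
            if node not in seen:  # emit each node only once
                seen.add(node)
                ET.SubElement(root, "node", {"name": node})
        ET.SubElement(root, "link", {"name": link, "from": node_a, "to": node_b})
    return ET.tostring(root, encoding="unicode")


# Made-up domain, node and link names.
template = build_template("EXAMPLE", "A,L1,B\nB,L2,C\n")
print(template)
```
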
Data provision
Currently, the final XML containing live e2e path status information is written to a URL for export
Later, maybe integration with the perfSONAR framework

perfSONAR feeder
Enters data in the perfSONAR MA
Takes as input:
  – Type of logical link: trunk, trail, physical link or path
  – Name: friendlyName
  – Time: the time when the event occurred
  – Status: UP/DOWN
  – Alarm ID

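The five inputs listed above can be pictured as one validated record. This is a hypothetical sketch of the feeder's input only: the class and field names are invented, and writing the record to the perfSONAR MA is not shown because the slides do not describe that interface.

```python
from dataclasses import dataclass
from datetime import datetime

LINK_TYPES = {"trunk", "trail", "physical link", "path"}
STATUSES = {"UP", "DOWN"}


@dataclass(frozen=True)
class LinkEvent:
    link_type: str   # trunk, trail, physical link or path
    name: str        # friendlyName from the alarm
    time: datetime   # when the event occurred
    status: str      # UP or DOWN
    alarm_id: str

    def __post_init__(self):
        # Reject values outside the sets named on the slide.
        if self.link_type not in LINK_TYPES:
            raise ValueError(f"unknown link type: {self.link_type!r}")
        if self.status.upper() not in STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")


# Made-up values for illustration.
event = LinkEvent("trail", "EXAMPLE-TRAIL-001", datetime(2007, 1, 1, 12, 0), "DOWN", "42")
print(event)
```
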
LHC weather-map live demonstration
1. CERN user-side down
2. CERN user-side up
3. GEN-MIL lambda down
4. GARR user-side down
5. Back-to-back interconnection in DE broken
6. AMS-FRA lambda down
7. DE interconnection up
8. AMS-FRA lambda up
9. GARR user-side up
10. GEN-MIL lambda up

Conclusion
Status monitoring via alarms is in an advanced phase and well understood
  – Once the characteristics of the equipment, alarms and faults were understood, the development was easy
The alarm collector can be reused by NRENs using Alcatel equipment
XMLGenerator and the perfSONAR feeder are not bound to specific equipment

Questions?

Backup

CERN user side down

Lambda CH-IT down

Lambda and user failure in IT

Lambda + POP interconnect failure

Multiple Lambda, user and POP interconnect failure