Published by Harold Lothrop. Modified over 10 years ago.
1
Connect. Communicate. Collaborate
GÉANT2 monitoring
Otto Kreiter, DANTE
Navneet Daga, DANTE
LHC Monitoring Workshop, Munich, 19.07.2006
2
Agenda
Extraction of monitoring information from the GÉANT2 network
External application developed by DANTE
Demonstration of a home-grown weather-map
Conclusion
3
Network Element Manager
All network elements communicate with the NM separately
The NM's task is to configure and monitor each NE one by one
It is not service aware: it has no knowledge of the intra-domain e2e path status
4
Regional Network Manager (RM)
Topology
Services
Correlation
“User” interface
5
How we export data
Alarms
Perf. Meas.
Rem. Inv.
6
Status via alarms
[Diagram labels: Alarms; SNMPTrapD; Alarms; Monitoring station]
7
Alarm content
From the NM:
– Information about interfaces and associated signal status, SDH timing problems
– NE and ILA status
From the RM:
– Information related to services
– Information related to path, trails and physical connections at all layers
8
One hop case
NMS vs JRA-4
[Diagram labels: Path – gen_mil_CERN; OCH trail; Phys-link; Phys link; Domain link; P. ID link; BOL-CERN-LHC-001]
9
Multiple hop case
NMS vs JRA-4
[Diagram labels: Path – gen_mil_CERN; OCH trail; Phys-link; Phys link; Domain link; P. IDLink; CERN-SARA-LHC-001; OCH trail; Phys-link; P. IDLink]
10
Alarm processing
SNMP traps from the Alcatel IOO module
Alcatel Enterprise v1/v2c MIB
SNMP traps received by a Linux station
– snmptrapd to pick up all alarms
– For each trap a bash script is called which performs:
    Analysis
    Selection
    Action
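As an illustration of the per-trap hand-off described on this slide: snmptrapd can start a program for each trap through a `traphandle` line in snmptrapd.conf, and it writes the sender's host name, its transport address and then one "OID VALUE" line per varbind to that program's standard input. The sketch below parses that input in Python rather than bash. The sample host, addresses and the short varbind names are assumptions made for the example; real traps would carry the full OIDs or MIB names from the Alcatel MIB.

```python
from dataclasses import dataclass, field


@dataclass
class Trap:
    hostname: str
    address: str
    varbinds: dict = field(default_factory=dict)


def parse_trap(text: str) -> Trap:
    """Parse the text snmptrapd writes to a traphandle program's stdin.

    Line 1 is the sender's host name, line 2 its transport address, and
    every following line is one varbind in the form "OID VALUE".
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("expected at least a host name and an address line")
    trap = Trap(hostname=lines[0], address=lines[1])
    for line in lines[2:]:
        oid, _, value = line.partition(" ")
        # Quoted string values arrive wrapped in double quotes.
        trap.varbinds[oid] = value.strip().strip('"')
    return trap


# Hypothetical trap text; names and addresses are illustrative only.
SAMPLE = """nms.example.net
UDP: [192.0.2.10]:162->[192.0.2.20]:162
friendlyName "GEN-MIL trail 1"
probableCause lossOfSignal
perceivedSeverity critical
"""

if __name__ == "__main__":
    trap = parse_trap(SAMPLE)
    print(trap.hostname, trap.varbinds["perceivedSeverity"])
```

In a live setup the handler would read `sys.stdin.read()` instead of `SAMPLE`.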
11
Alarm type & information
Alarm Raise:
– friendlyName
– probableCause
– perceivedSeverity
– currentAlarmId
– eventTime
– acknowledgementStatus
– additionalInformation
– eventType
– snmpTrapAddress
Alarm Clear:
– friendlyName
– probableCause
– currentAlarmId
– eventTime
– snmpTrapAddress
12
Used alarm information
Alarm Raise:
– friendlyName
– probableCause
– perceivedSeverity
– currentAlarmId
– eventTime
– acknowledgementStatus
– additionalInformation
– eventType
– snmpTrapAddress
Alarm Clear:
– friendlyName
– probableCause
– currentAlarmId
– eventTime
– snmpTrapAddress
13
Alarm analyzer process
[Flowchart labels:
SNMP trap received
snmpTrapAddress must be registered
Check for type of alarm
Raise: Additional Info (path, clientpath, ochtrail, omstrail, physicallink); recordAlarm; Call External Program
Clear: alarmID; Read recordAlarm; Call External Program; delete recordAl
Record all traps
friendlyName]
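The flow on this slide can be sketched roughly as follows, in Python rather than the original bash. This is one reading of the flowchart labels, not the actual script: the class name, the in-memory alarm record, the way the layer keyword is found inside additionalInformation, and the `notify` callback standing in for "Call External Program" are all assumptions.

```python
import re

# Layer keywords listed on the slide under "Additional Info".
LAYERS = {"path", "clientpath", "ochtrail", "omstrail", "physicallink"}


class AlarmAnalyzer:
    def __init__(self, registered_sources, notify):
        self.registered_sources = set(registered_sources)
        self.notify = notify   # stands in for "Call External Program"
        self.active = {}       # currentAlarmId -> friendlyName ("recordAlarm")
        self.log = []          # "Record all traps"

    def handle(self, kind, trap):
        """Process one trap; kind is "raise" or "clear"."""
        self.log.append((kind, dict(trap)))
        if trap.get("snmpTrapAddress") not in self.registered_sources:
            return "ignored: unregistered source"
        alarm_id = trap.get("currentAlarmId")
        if kind == "raise":
            # Assumption: the layer appears as a whole word in the field.
            words = re.findall(r"[a-z]+", trap.get("additionalInformation", "").lower())
            if not any(word in LAYERS for word in words):
                return "ignored: layer not monitored"
            self.active[alarm_id] = trap.get("friendlyName")
            self.notify(self.active[alarm_id], "DOWN")
            return "raised"
        if kind == "clear":
            name = self.active.pop(alarm_id, None)  # read, then delete the record
            if name is None:
                return "ignored: unknown alarm"
            self.notify(name, "UP")
            return "cleared"
        return "ignored: unknown type"


if __name__ == "__main__":
    analyzer = AlarmAnalyzer({"192.0.2.10"}, lambda name, status: print(name, status))
    analyzer.handle("raise", {
        "snmpTrapAddress": "192.0.2.10",
        "currentAlarmId": "42",
        "friendlyName": "GEN-MIL-OCH-1",
        "additionalInformation": "layer=ochtrail",
    })
    analyzer.handle("clear", {"snmpTrapAddress": "192.0.2.10", "currentAlarmId": "42"})
```

The clear branch looks the alarm up by its ID because the sketch keys the record on currentAlarmId, which both trap types carry.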
14
Alarm analyzer
Called every time a trap is received
Written in bash
Each trap is analyzed separately; if a new trap arrives in the meantime, it waits in the queue (snmptrapd)
– Possible problem if an external program gets stuck and the script hangs: the alarms remain unprocessed in the queue
Must maintain state
– SNMP traps may get lost, so a program needs to check from time to time whether the monitoring station is in sync with the NMS
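Both risks named on this slide have common mitigations, sketched below in Python under stated assumptions. `run_with_timeout` bounds how long an external program may run, so a stuck program cannot block the trap queue indefinitely. `reconcile` compares the locally recorded alarm IDs with a list of alarms currently active on the NMS. How that list is obtained from the NMS is not covered by the slides, so it is simply passed in here; both function names are hypothetical.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_s):
    """Run an external program; return True only if it exits with code 0.

    subprocess.run kills the child and raises TimeoutExpired when the
    timeout elapses, so a hung program cannot stall the caller forever.
    """
    try:
        completed = subprocess.run(command, timeout=timeout_s, check=False)
    except subprocess.TimeoutExpired:
        return False
    return completed.returncode == 0


def reconcile(local_ids, nms_ids):
    """Return (missed_raises, missed_clears) as two sets of alarm IDs.

    missed_raises: active on the NMS but unknown locally (lost raise trap).
    missed_clears: recorded locally but no longer active (lost clear trap).
    """
    local_ids, nms_ids = set(local_ids), set(nms_ids)
    return nms_ids - local_ids, local_ids - nms_ids


if __name__ == "__main__":
    # A child that sleeps longer than the allowed time is abandoned.
    print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], 0.5))
    print(reconcile({"1", "2"}, {"2", "3"}))
```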
15
XML file generation
16
E2E Data transformation
Prototype applications developed in Java:
– E2EXMLWriter
– XMLGenerator
E2EXMLWriter performs 2 functions:
– Takes in a template XML and produces an XML file containing live e2e path status information conforming to the JRA4 e2e data model
– Feeds a perfSONAR MA with live path status information
E2EXMLWriter is triggered by a script listening to SNMP alarms
– Parameters passed:
    Trail ID
    Status
XMLGenerator produces the template XML that E2EXMLWriter uses to export the domain’s e2e information
17
Design of E2EXMLWriter
Relies on 2 configuration files to produce live XML status information
– Properties file (links.properties)
    Properties file containing key = value entries
    Each key is one e2e path name
    The value of each key is a CSV list of the trails that form one Domain Link and/or Partial ID Link
    Currently manually maintained
– Alarm register
    A simple CSV file
    Application maintained
    An “alarm raise” registers the associated path
    An “alarm clear” de-registers the associated path
(contd.)
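A minimal Python sketch of the two files described on this slide (the original application is written in Java). It assumes links.properties holds one "path = trail,trail" entry per line and that the alarm register is a single CSV row of path names. The trail names in the sample and all function names are hypothetical.

```python
import csv
import io


def parse_links(text):
    """Parse "path = trail1,trail2" lines into {path: [trail, ...]}."""
    links = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        links[key.strip()] = [t.strip() for t in value.split(",") if t.strip()]
    return links


def paths_for_trail(links, trail_id):
    """Return the e2e paths that contain the given trail."""
    return sorted(p for p, trails in links.items() if trail_id in trails)


def update_register(register, path, status):
    """An alarm raise (DOWN) registers the path; an alarm clear removes it."""
    updated = set(register)
    if status == "DOWN":
        updated.add(path)
    else:
        updated.discard(path)
    return updated


def dump_register(register):
    """Serialize the register as one CSV row of path names."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(sorted(register))
    return buf.getvalue()


# Hypothetical file content: the trail names are invented for the example.
SAMPLE_LINKS = """# e2e path = trails
CERN-SARA-LHC-001 = trail-a, trail-b
BOL-CERN-LHC-001 = trail-c
"""

if __name__ == "__main__":
    links = parse_links(SAMPLE_LINKS)
    register = set()
    for path in paths_for_trail(links, "trail-b"):
        register = update_register(register, path, "DOWN")
    print(dump_register(register), end="")
```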
18
Design (contd.)
The application sets the default status of all paths to UP with admin state NORMALOPERATION
Only the paths “registered” in the alarm-register CSV file are set to DOWN with admin state MAINTENANCE
No implementation of the status DEGRADED at the moment
No implementation of other admin states at the moment
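The status rule on this slide is simple enough to state in a few lines. The Python sketch below applies it and serializes the result. The XML element and attribute names are placeholders chosen for the example and do not reproduce the JRA4 e2e data model; the function names are hypothetical as well.

```python
import xml.etree.ElementTree as ET


def path_states(all_paths, registered):
    """Map each path to (operational status, admin state).

    Every path defaults to UP / NORMALOPERATION; only the paths listed in
    the alarm register become DOWN / MAINTENANCE.
    """
    return {
        path: ("DOWN", "MAINTENANCE") if path in registered else ("UP", "NORMALOPERATION")
        for path in all_paths
    }


def to_xml(states):
    """Serialize the states; element and attribute names are illustrative."""
    root = ET.Element("paths")
    for name, (oper, admin) in sorted(states.items()):
        ET.SubElement(root, "path", name=name, operState=oper, adminState=admin)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    states = path_states(
        ["BOL-CERN-LHC-001", "CERN-SARA-LHC-001"],
        registered={"CERN-SARA-LHC-001"},
    )
    print(to_xml(states))
```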
19
Design of XMLGenerator
Relies on 3 configuration files:
– Properties file (init.properties)
    Contains a key = value entry
    Key = DOMAIN
    Value = Enables on-the-fly domain name configuration
– Config file (config.csv)
    A simple CSV file
    Contains node-link-node information
– A sample XML file containing “pieces of XML” to be replicated for each node and link in the final output “template XML”
All configuration files are currently manually maintained
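To make the replication idea concrete, here is a small Python sketch (the original tool is written in Java). It assumes each config.csv row is "node,link,node" and that the sample XML boils down to one piece per node and one per link. The XML pieces, the node and link names in the sample, and the function name are invented for illustration, and the sketch does not escape special XML characters.

```python
import csv
import io
from string import Template

# Hypothetical "pieces of XML" that get replicated per node and per link.
NODE_PIECE = Template('<node id="$name" domain="$domain"/>')
LINK_PIECE = Template('<link from="$a" to="$b" id="$link"/>')


def generate_template(config_csv, domain):
    """Build a template XML from node-link-node rows."""
    nodes, links = [], []
    for row in csv.reader(io.StringIO(config_csv)):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        a, link, b = (cell.strip() for cell in row)
        for node in (a, b):
            if node not in nodes:
                nodes.append(node)  # keep first-seen order, no duplicates
        links.append((a, link, b))
    parts = ["<topology>"]
    parts += ["  " + NODE_PIECE.substitute(name=n, domain=domain) for n in nodes]
    parts += ["  " + LINK_PIECE.substitute(a=a, b=b, link=l) for a, l, b in links]
    parts.append("</topology>")
    return "\n".join(parts)


if __name__ == "__main__":
    sample = "GEN,gen_mil,MIL\nAMS,ams_fra,FRA\n"
    print(generate_template(sample, domain="GEANT2"))
```

The domain value plays the role of the DOMAIN key from init.properties: changing it changes every generated node without touching the pieces.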
20
Monitoring data processing “e2e path”
21
LHC weather-map live demonstration
1. CERN user-side down
2. CERN user-side up
3. GEN-MIL lambda down
4. GARR user-side down
5. Back-to-back interconnection in DE broken
6. AMS-FRA lambda down
7. DE interconnection up
8. AMS-FRA lambda up
9. GARR user-side up
10. GEN-MIL lambda up
22
Conclusion
Status monitoring via SNMP alarms is in an advanced phase and well understood
– Once the characteristics of the equipment/alarms/faults were understood, the development was easy
XMLGenerator is not bound to specific equipment and can be used together with the JRA-4 MP and/or to feed a perfSONAR MA
23
Questions?
Otto.Kreiter@dante.org.uk
Navneet.Daga@dante.org.uk
24
T0-T1 CERN-CNAF
[Diagram labels: GARR; GÉANT2; CERN (CH); CNAF (IT)]
25
Technologies
26
Domain I – CERN
Partial ID Link corresponds to the status of the port
MP developed by Martin Swany: exports port status information
27
Domain II – GÉANT2
Partial ID link – status of the ports facing the adjacent domains
Domain Link – status of the lambda
perfSONAR MA and GN2-JRA4 MP used to export status information
28
Domain III – GARR
Inter Domain Link – status of the port facing GÉANT2
Domain link – status of the LSP between the two routers + status of the interface facing CNAF (T1)
GN2-JRA4 MP used to export measurement data
29
View on the E2E monitoring system
30
Conclusion
Fairly easy to establish the monitoring of the E2E path
– It took around two phone conferences with GARR plus around 10 e-mails
– 3-4 phone conferences with CERN and Martin Swany plus around 10-15 e-mails
– All parties were extremely familiar with their equipment and the required software
Questions started to pop up about whether we need to monitor an End-Point and how we should do it
– Is an EP a simple client?
– Or shall we redefine the “Client” as somebody who actively participates in the e2e monitoring?
31
Backup
32
CERN user side down
33
Lambda CH-IT down
34
Lambda and user failure in IT
35
Lambda + POP interconnect failure
36
Multiple Lambda, user and POP interconnect failure