1
SAM Alarm Triggering and Masking
Domenico Vicinanza, CERN. COD 13, Stockholm, June 2007
2
Alarm triggering
Procedure to trigger an alarm:
- The test result is ERROR or CRIT
- The node belongs to a certified site
- The VO is 'OPS'
- The test is critical for the OPS VO
- There is no alarm already for that test, VO and node
- The node is not in maintenance
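The conditions above can be read as a single conjunction. The following is a minimal sketch of that reading, not the SAM implementation: the `SamResult` type, the function name and the boolean parameters are hypothetical, and the lookups that would supply them (site certification, criticality, existing alarms, maintenance state) are assumed to happen elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamResult:
    """Hypothetical container for one test result."""
    vo: str
    node: str
    test: str
    status: str  # e.g. "INFO", "NOTE", "WARN", "ERROR", "CRIT"


def should_trigger(result, *, site_certified, test_critical_for_ops,
                   alarm_exists, node_in_maintenance):
    """Return True only if every triggering condition from the slide holds."""
    return bool(
        result.status in {"ERROR", "CRIT"}
        and site_certified
        and result.vo == "OPS"
        and test_critical_for_ops
        and not alarm_exists
        and not node_in_maintenance
    )
```

For instance, a CRIT result on a certified, non-maintenance node for the OPS VO triggers an alarm, while the same result on a node in maintenance does not.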
3
Alarms Info
Data stored for each alarm in the SAM DB:
- alarm identifier
- vo identifier
- test identifier
- node identifier
- weight (see next slides on masking)
- test exec time
- alarm status (new, assigned, masked, off)
- update time
- ticket id (GGUS)
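As an illustration of the record listed above, here is a sketch of how one alarm row might be modelled. The field names and types are assumptions chosen for the example; the slide gives only the list of attributes and the four status values, not the actual SAM DB schema.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class AlarmStatus(Enum):
    """The four alarm states named on the slide."""
    NEW = "new"
    ASSIGNED = "assigned"
    MASKED = "masked"
    OFF = "off"


@dataclass
class Alarm:
    """Hypothetical in-memory view of one alarm record."""
    alarm_id: int
    vo_id: int
    test_id: int
    node_id: int
    weight: int                      # priority score, see the scoring slides
    test_exec_time: datetime
    status: AlarmStatus
    update_time: datetime
    ticket_id: Optional[int] = None  # GGUS ticket, if one has been opened
```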
4
Alarms Masking
Automatic alarm masking:
- Simple rule-based correlation engine
- If there are one or more alarms with status='new' for this VO, node and test, the new alarm is triggered as masked
- Rules defining test relationships among alarms: (Not restricted right now. Restricting it from now on)
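The masking rule stated above can be sketched as a small function. This is only an illustration of that one rule: representing existing alarms as `(vo, node, test, status)` tuples and the function name are assumptions, and the test-relationship rules mentioned on the slide are not modelled.

```python
def initial_status(existing_alarms, vo, node, test):
    """Pick the status for a newly triggered alarm.

    existing_alarms: iterable of (vo, node, test, status) tuples
    (a hypothetical representation used only for this sketch).
    """
    for a_vo, a_node, a_test, a_status in existing_alarms:
        same_target = (a_vo, a_node, a_test) == (vo, node, test)
        if same_target and a_status == "new":
            # An open 'new' alarm already covers this VO, node and test.
            return "masked"
    return "new"
```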
6
Prioritisation of alarms
A prioritisation mechanism for the alarms is set up according to a scoring scheme. Depending on the service, a certain number of “points” is associated with an alarm according to its relevance (i.e. its responsibility for causing other services to fail). For example, LFC has a larger score (40000) than SE (10000), since if LFC is failing, SE will consequently fail.
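To make the effect of the service score concrete, the sketch below orders alarms by the two scores quoted above. Only the LFC and SE values come from the slide; the dictionary name, the alarm representation and the sorting helper are assumptions for illustration.

```python
# Service scores quoted on the slide; other services are intentionally omitted.
SERVICE_POINTS = {"LFC": 40000, "SE": 10000}


def by_priority(alarms):
    """Sort alarms (dicts with a 'service' key) from highest to lowest score."""
    return sorted(alarms, key=lambda a: SERVICE_POINTS[a["service"]], reverse=True)
```

With these values an LFC alarm is listed ahead of an SE alarm, which matches the reasoning that an LFC failure is the likely cause of the SE failure.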
7
Example of scoring alarms
Example of scoring mechanism depending on the service:
- points: VOBOX, BDII, VOMS, LFC, WMS, RB.
- points: SRM, MyProxy, FTS.
- points: RGMA, sBDII.
- points: gCE, CE, SE.
8
Alarms responsible for other alarms
If an alarm masks another one (so the alarm is "important" as it causes other alarms), 1000 points are added to the alarm weight to show that it is causing other failures as well and should be dealt with at high priority, up to a maximum of points.
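One possible reading of this rule is a capped bonus that grows by 1000 for each alarm masked. That per-alarm interpretation is an assumption, and the cap is not given in the text, so the sketch takes it as a required parameter (`max_bonus`, a hypothetical name) rather than guessing a value.

```python
def masking_bonus(masked_count, max_bonus):
    """Bonus added to an alarm's weight for the alarms it masks.

    Assumes 1000 points per masked alarm, capped at max_bonus.
    The cap value must be supplied by the caller.
    """
    if masked_count < 0:
        raise ValueError("masked_count must be non-negative")
    return min(1000 * masked_count, max_bonus)
```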
9
Prioritisation of alarms (cont.)
Depending on the test status:
- 100 points if 'INFO'
- 200 points if 'NOTE'
- 300 points if 'WARN'
- 400 points if 'ERROR'
- 500 points if 'CRIT'
Depending on the number of CPUs in the site:
- Value taken from the 'CE-totalcpu' test divided by 100. This gives a [0-50] number.
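The two contributions on this slide can be sketched as follows. The status table is taken from the slide; the use of integer division, the clamping to the 0 to 50 range, and the idea that the two parts are simply summed are assumptions made for the example.

```python
# Points per test status, as listed on the slide.
STATUS_POINTS = {"INFO": 100, "NOTE": 200, "WARN": 300, "ERROR": 400, "CRIT": 500}


def cpu_points(total_cpus):
    """CPU count (as reported by 'CE-totalcpu') divided by 100, kept within 0..50."""
    return min(max(total_cpus // 100, 0), 50)


def status_and_cpu_points(status, total_cpus):
    """Assumed combination: status points plus CPU points."""
    return STATUS_POINTS[status] + cpu_points(total_cpus)
```

Under these assumptions, a 'CRIT' result at a site with 1200 CPUs contributes 500 + 12 = 512 points.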
10
Happy End... Thanks!!!