1 © 2008 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. BSM (OMI) 9.2X STREAM-BASED EVENT CORRELATION TROUBLESHOOTING

2 © Copyright 2012 Hewlett-Packard Development Company, L.P. AGENDA
–Stream-based Event Correlation
–SBEC – General Feature Overview
–Details on Rules
–Troubleshooting
–Event Suppression

3 SBEC – GENERAL FEATURE OVERVIEW

4 WHAT IS STREAM-BASED EVENT CORRELATION?
Stream-based event correlation (SBEC) uses rules and filters to identify commonly occurring events or combinations of events. It helps simplify the handling of such events by automatically identifying events that can be withheld or removed, or that require a new event to be generated and displayed to the operators.
–The following types of SBEC rules can be configured:
  Repetition Rules: Frequent repetitions of the same event may indicate a problem that requires attention.
  Combination Rules: A combination of different events occurring together or in a particular order indicates an issue and requires special treatment.
  Missing Recurrence Rules: A regularly recurring event is missing, for example, a regular heartbeat event does not arrive when expected.
–SBEC rules are processed in the order defined in the rules list. Modifications are executed as soon as a rule is matched, and subsequent rules see the modifications made by earlier rules.

5 COMBINATION RULES
–When a combination of events occurs, sometimes in a precise order, within a short period of time, this may indicate a problem requiring corrective action. It may also be a scenario that initially appears to be a problem but does not require any intervention by an operator. For example, a node-down event followed by a node-up event within 2 minutes usually means that a system reboot has occurred. This is typically viewed as not significant, as long as reboots do not occur too frequently, and does not require action other than the automatic cleaning up of these events.
–Configuring a combination rule requires at least two filters to select the events to consider, for example, one to select events with a node-down indicator and one to select events with a node-up indicator. Certain attributes must be the same for events to be regarded as originating from the same source, for example, the node CI and source CI must be the same. The time interval between the related events must be short, for example, a maximum of five minutes. You can also specify whether the events must occur in a particular order for the rule to be matched and executed.
–It may be advantageous to hold back matching events during the time interval to reduce the number of unnecessary events being sent to the Event Browser. The operator needs to be informed only when the required combination of events is received within the specified time period. The action could be to close or discard all events, or to modify the last event to state that a reboot has taken place. Alternatively, a new event can be automatically generated. All matching events can be related to the new event as symptoms.
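The combination logic described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the product's implementation: the class name CombinationRule, the event dictionaries with "title", "node" and "time" keys, and the helper filters is_down and is_up are all hypothetical.

```python
class CombinationRule:
    """Hypothetical sketch: fires once every filter has matched for one source within the window."""

    def __init__(self, filters, key_attr, window_s, ordered=True):
        self.filters = filters    # one predicate per event type, in the expected order
        self.key_attr = key_attr  # attribute that must be equal across the events
        self.window_s = window_s  # maximum age of earlier events, in seconds
        self.ordered = ordered    # whether events must arrive in filter order
        self._hits = {}           # key -> list of (filter index, event time)

    def _complete(self, hits):
        if not self.ordered:
            return {i for i, _ in hits} == set(range(len(self.filters)))
        t_prev = float("-inf")
        for want in range(len(self.filters)):
            # Earliest hit for this filter that is not older than the previous step.
            later = [t for i, t in hits if i == want and t >= t_prev]
            if not later:
                return False
            t_prev = min(later)
        return True

    def on_event(self, event):
        idx = next((i for i, f in enumerate(self.filters) if f(event)), None)
        if idx is None:
            return False
        key, now = event[self.key_attr], event["time"]
        # Drop earlier hits that have fallen out of the time window.
        hits = [(i, t) for i, t in self._hits.get(key, []) if now - t <= self.window_s]
        hits.append((idx, now))
        if self._complete(hits):
            self._hits.pop(key, None)
            return True
        self._hits[key] = hits
        return False


def is_down(event):
    return event["title"] == "node down"


def is_up(event):
    return event["title"] == "node up"
```

With filters [is_down, is_up], key_attr="node" and window_s=120, a node-down event followed by a node-up event for the same node 60 seconds later makes on_event return True; the same two events in reverse order, or more than 120 seconds apart, do not.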

6 MISSING RECURRENCE RULES
–Events are sometimes generated regularly to signal that no problem has occurred, for example, "alive" events indicate that a system is running. As soon as the expected regular event is not received, it can be assumed that there is a problem, for example, if a system stops reporting "alive" events every 10 minutes, it has probably stopped running.
–Configuring a missing recurrence rule requires a filter to select the events to consider, for example, to select events with "node alive" in the title. Certain attributes must be the same for events to be regarded as originating from the same source, for example, the node CI and source CI must be the same. The allowable time interval before an expected event is considered to be missing must be specified, for example, a maximum of 10 minutes in our example.
–It may be advantageous to discard recurring events to reduce the number of unnecessary events being sent to the Event Browser.
–When the expected event is not received within the specified time period, it is necessary to inform the operator that action is required. A new event can be automatically generated. All matching events can be related to the new event as symptoms.
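A missing recurrence check boils down to remembering when each source last reported and comparing that against the allowed gap. The sketch below assumes hypothetical event dictionaries with "title", "node" and "time" keys and a caller that invokes missing() periodically; how the product schedules this check is not described in the slides.

```python
class MissingRecurrenceRule:
    """Hypothetical sketch: reports sources whose regular event is overdue."""

    def __init__(self, title_contains, key_attrs, max_gap_s):
        self.title_contains = title_contains  # filter: text expected in the title
        self.key_attrs = key_attrs            # attributes identifying the source
        self.max_gap_s = max_gap_s            # allowed silence, in seconds
        self._last_seen = {}                  # source key -> time of last matching event

    def on_event(self, event):
        if self.title_contains in event["title"]:
            key = tuple(event[a] for a in self.key_attrs)
            self._last_seen[key] = event["time"]

    def missing(self, now):
        # Sources that have been silent for longer than the allowed gap.
        return sorted(k for k, t in self._last_seen.items() if now - t > self.max_gap_s)
```

For the "alive every 10 minutes" example, max_gap_s would be 600; a source that last reported at time 0 is listed by missing(900) but not by missing(600).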

7 REPETITION RULES
–The repeated generation of the same event may indicate a problem. For example, more than 10 login failures for the same account within 2 minutes is typically viewed as requiring action and should create a security alert.
–Configuring a repetition rule requires a filter to select the events to consider, for example, the text "login failed" is contained in the title. Certain attributes must be the same for events to be regarded as originating from the same source, for example, the host name of the system and the user name being used to log in must be the same. The time interval between login attempts must be short, for example, a maximum of two minutes, and there must be a minimum number of failed login attempts before the scenario is considered to be a problem.
–It may be advantageous to hold back matching events during the time interval to reduce the number of unnecessary events being sent to the Event Browser. The operator needs to be informed only when the number of failed login attempts exceeds the specified threshold. The action could be to close or discard the failed login events, except for the last event, which is modified to report the series of failed logins. Alternatively, a new event can be automatically generated. All failed-login events can be related to the new event as symptoms.
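The repetition check can be pictured as a sliding time window per source. The following is a minimal sketch, not the SBEC engine itself: the class name, the event dictionaries with "title", "host", "user" and "time" keys, and the choice to fire when the count reaches the threshold are assumptions made for illustration.

```python
from collections import defaultdict, deque


class RepetitionRule:
    """Hypothetical sketch: fires when enough matching events from one source fall in the window."""

    def __init__(self, title_contains, key_attrs, window_s, threshold):
        self.title_contains = title_contains  # filter: text expected in the title
        self.key_attrs = key_attrs            # attributes that must be identical
        self.window_s = window_s              # length of the time window, in seconds
        self.threshold = threshold            # number of events needed to fire
        self._times = defaultdict(deque)      # source key -> event times inside the window

    def on_event(self, event):
        if self.title_contains not in event["title"]:
            return False
        key = tuple(event[a] for a in self.key_attrs)
        times = self._times[key]
        times.append(event["time"])
        # Forget events that are older than the window.
        while event["time"] - times[0] > self.window_s:
            times.popleft()
        if len(times) >= self.threshold:
            times.clear()
            return True
        return False
```

With window_s=120 and threshold=3, three "login failed" events for the same host and user at 0, 50 and 100 seconds fire the rule on the third event, whereas events at 0, 100 and 200 seconds never have three inside one window.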

8 Concept REPETITION
–Purpose: Event repetition indicates a problem
–Example: More than 3 reboots within 1 hour shall create a critical event
[Figure: timeline showing “Node rebooted” events 1, 2, 3 within time interval t]

9 Concept COMBINATION
–Purpose: Handle a combination of events in a certain way
–Example: When a node is down, events about failed SiS monitors should be related to the node down event
[Figure: timeline showing numbered events 1 to 4 within time interval t, labeled “Node down”, “SiS monitor failed”, and “TCP timeout occurred”]

10 Concept MISSING RECURRENCE
–Purpose: Detect that regularly received events are no longer arriving
–Example: For auditing and compliance purposes, detailed health data and statistics are collected every day using events. If these audit events do not arrive, a critical event should be sent
[Figure: timeline showing events 1 and 2, an event A, interval t, and a missing event marked “??”]

11 How SBEC engine works: RULE PROCESSING
Only when receiving a new event, for each rule:
–In the order defined, all input filters are checked to see whether they match the incoming event
–On every match of an input filter, a query is executed to check whether all conditions of the corresponding rule are met:
  Repetition: enough events received within the time frame
  Combination: at least one event for every filter (“event set”) received within the time frame
–If all conditions are met, the actions configured in that rule are executed with immediate effect on all corresponding events

12 SBEC RULES – GOOD TO KNOW, BEST PRACTICES

13 MULTIPLE SBEC RULES
–Any number of Repetition, Combination, and Missing Recurrence Rules can be created
–Rules are processed in the defined order (visible to the user, configurable)
–Rules can be chained together:
  The first rule that triggers can modify events (e.g. close, discard, create new)
  The next rule in line will see the event modifications
–Rules can filter for the same events (even use the same filter)
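The effect of rule order can be shown with a deliberately simplified model in which each rule is a (name, filter, action) triple applied to a single event. This is an assumption-laden sketch: real SBEC rules correlate several events, and the names run_rules, close and escalate are invented for the example.

```python
def run_rules(event, rules):
    """Apply (name, filter, action) triples in list order; actions modify the event in place."""
    fired = []
    for name, matches, action in rules:
        if matches(event):  # later rules see changes made by earlier ones
            action(event)
            fired.append(name)
    return fired


def close(event):
    event["state"] = "CLOSED"


def escalate(event):
    event["severity"] = "CRITICAL"


rules = [
    ("close reboots", lambda e: "rebooted" in e["title"], close),
    ("escalate open", lambda e: e["state"] != "CLOSED", escalate),
]
```

Because "close reboots" runs first, a "node rebooted" event is already closed when "escalate open" looks at it, so the second rule does not fire. Reversing the list would let both rules fire on the same event.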

14 EFFECT OF HOLD BACK WHEN MULTIPLE RULES MATCH THE SAME EVENTS
–Note 1: If at least one rule is holding back an event, it is held back, even if another rule is not holding it back
–Note 2: There is one holding area for all rules
Example
–Rule 1: combination rule looking for node down/node up events. It holds back node down because it wants to discard it and create a reboot event instead
–Rule 2: combination rule looking for node down & SiS events. It does not hold back the events
–Result: node down is held back as long as it is within the time window of Rule 1 (and as long as it is not released by any other rule)!
–Holding area: stored in the DB if the BSM server is stopped, but not persisted if opr-backend terminates abnormally

15 EFFECT OF RELEASE WHEN MULTIPLE RULES MATCH THE SAME EVENTS
–Note 3: When a rule triggers, all the corresponding input events are removed from the holding area, even if another rule put them there
  Why? The rule that triggered detected a situation in which the input events are relevant, so it can be seen as the master of these events. It has the right to release or even discard them.
–Note 4: If no rule was holding back an event, release has no effect
Example
–Rule 1: combination rule looking for node down/node up events. It holds back node down because it wants to discard it and create a reboot event instead
–Rule 2: combination rule looking for node down & SiS events. It does not hold back the events
–Rule 2 triggers after node down and one SiS monitor event were received, and releases the events
–Result: node down is no longer held back and is correlated with the SiS event
–Note: Rule 1 might still trigger later and create the reboot event!

16 EFFECT OF DISCARD IF POSSIBLE WHEN MULTIPLE RULES MATCH THE SAME EVENTS
–Note 5: Discard if possible only has an effect if the event is still in the holding area
  If no rule was holding the event back, or if another rule already triggered and released the event, discard has no effect (but the close operation is executed)
  If discard is possible, the event is deleted immediately. For other rules, it will look as if the event never arrived.
Example
–Rule 1: combination rule looking for node down/node up events. It holds back node down because it wants to discard it and create a reboot event instead
–Rule 2: combination rule looking for node down & SiS events. It does not hold back the events
–Rule 2 triggers after node down and one SiS monitor event were received, and releases the events
–Rule 1 triggers and wants to discard the node down event, but this is not possible because the event was already released by Rule 2
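The hold, release and discard-if-possible notes from the last three slides can be replayed with a small model of the single shared holding area. The class and method names below are hypothetical and only mirror the behavior the slides describe; they are not taken from the product.

```python
class HoldingArea:
    """Hypothetical model of the one holding area shared by all rules."""

    def __init__(self):
        self._held = set()      # events currently held back by at least one rule
        self.discarded = set()  # events deleted while they were still held

    def hold(self, event_id):
        self._held.add(event_id)

    def is_held(self, event_id):
        return event_id in self._held

    def release(self, event_ids):
        # A triggering rule releases its input events, whoever held them.
        # Releasing an event that is not held has no effect.
        for event_id in event_ids:
            self._held.discard(event_id)

    def discard_if_possible(self, event_id):
        # Only an event that is still held can be discarded.
        if event_id in self._held:
            self._held.remove(event_id)
            self.discarded.add(event_id)
            return True
        return False  # the caller would close the event instead
```

Replaying the example: Rule 1 calls hold("node_down"), Rule 2 triggers and calls release(["node_down", "sis_failed"]), and a later discard_if_possible("node_down") from Rule 1 returns False because the event has already left the holding area.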

17 GOTCHAS & BEST PRACTICES
–Gotcha: It is quite easy to create a simple repetition rule like this:
  The repetition rule uses the filter: title contains “rebooted”
  It creates a new event with the title “system rebooted 10 times in 2 hours”
  Guess what happens... The new event's title also contains “rebooted”, so it matches the rule's own input filter.
–Best Practices
  In a rule, don't create events that match the input filter of the rule
  Include a check for the event state in the filter: look for non-closed events only
  Avoid filters that are too generic (like contains “rebooted”)
  Add a custom attribute (e.g. “SBECcreated=true”) and check for it if you want to prevent a created event from being processed by subsequent rules
  If possible, avoid matching the same events in several rules. If unavoidable, make sure you understand the hold/release/discard behavior
  When you reuse the CI Hint in “Create New Event”, also reuse the Node Hint.
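The gotcha and the first best practices can be checked with two filter functions. This is an illustrative sketch only: the event layout (a dictionary with "title", "state" and a "custom" attribute map) is an assumption, and real filters are configured in the product rather than written in Python.

```python
def too_generic(event):
    """The filter from the gotcha: title contains "rebooted"."""
    return "rebooted" in event["title"]


def safer(event):
    """Same text match, but skips closed events and events created by SBEC rules."""
    return (
        "rebooted" in event["title"]
        and event.get("state") != "CLOSED"
        and event.get("custom", {}).get("SBECcreated") != "true"
    )


# The event the rule itself creates, tagged with the custom attribute.
created = {
    "title": "system rebooted 10 times in 2 hours",
    "state": "OPEN",
    "custom": {"SBECcreated": "true"},
}
```

Here too_generic(created) is True, which is the feedback loop from the gotcha, while safer(created) is False, so the created event is not counted as another repetition.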

18 EVENT SUPPRESSION

19 EVENT SUPPRESSION
–Purpose: All events matching a filter are discarded from the event pipeline
–Example: OMi is receiving unimportant events from a data source that is not under the control of the OpsBridge organization, so they can't be filtered out at the source
–Configurable by event suppression rules consisting of:
  Event Filter
  Name
  Description
  Enable/Disable
–Suppression rules are processed in the event pipeline at an early stage:
  Right after the resolution step, before Post-Resolution-EPI
  For suppressed events, no further processing occurs; they are lost and not stored in the OMi DB.
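A suppression rule as listed on this slide (filter, name, description, enable flag) can be modeled as a small record plus one pipeline step that drops matching events. The dataclass, the suppress function and the sample events are hypothetical; they only illustrate that disabled rules are ignored and that matching events never reach later steps.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SuppressionRule:
    name: str
    description: str
    enabled: bool
    event_filter: Callable[[dict], bool]  # returns True for events to drop


def suppress(events, rules):
    """Return only the events that no enabled suppression rule matches."""
    active = [rule for rule in rules if rule.enabled]
    return [e for e in events if not any(rule.event_filter(e) for rule in active)]


rules = [
    SuppressionRule("lab noise", "events from the lab source", True,
                    lambda e: e.get("source") == "lab"),
    SuppressionRule("info only", "currently switched off", False,
                    lambda e: e.get("severity") == "NORMAL"),
]
```

Given one event from source "lab" and one NORMAL-severity event from another source, suppress keeps only the second event, because the "info only" rule is disabled.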

20 TROUBLESHOOTING

21 EVENT HISTORY
–An event has changed and you have no idea why? Check the event history
  It records which user or component changed the event properties
–Common source of unexpected changes to events: Event Forwarding & Back-Synch

22 LOGGING / DEBUGGING
–Server: DPS
–Process: opr-backend
–Log config to enable log level “DEBUG”: / /conf/core/Tools/log4j/opr-backend/opr-backend.properties
–Log files:
  / /log/opr-backend/opr-backend.log (default location for all logging within this process)
  / /log/opr-backend_boot.log (for more severe issues, e.g. unhandled exceptions; everything dumped to stdout/stderr)

23 HOW TO TRACK AN SBEC RULE AS EVENTS ARRIVE
1. Set logging to DEBUG and watch opr-backend.log
2. Make sure the event has arrived; you will see an entry like this:
–2013-01-17 05:51:16,726 [Thread-44] DEBUG EventChannelCiResolver.logEvent(309) - resolving event: SBEC(01b0ceb8-8189-4fdb-ae74-cf38b698b6d9), nodeHints=bsm92, relatedCiHint=bsm92, service_id=null
3. Make sure the event matches the SBEC rule:
–2013-01-17 05:51:09,951 [Thread-44] DEBUG EventStreamCorrelator.evaluateEventInRule(95) - Event matches filter in rule 'SBEC 3 CRITICAL EVENT RULE'
–2013-01-17 05:51:09,951 [Thread-44] DEBUG FilterConfigManagerImpl.getFilterConfig(100) - get filter configuration with id: 93c74ff4-8cd0-463a-899b-2ffd41658d0f

24 4. SBEC RULE WILL BE EVALUATED
–2013-01-17 05:51:09,958 [Thread-44] DEBUG SbecRuleEvaluatorImpl.findSbecInstances(117) - Found 1 results
–2013-01-17 05:51:09,958 [Thread-44] DEBUG SbecRuleEvaluatorImpl.findSbecInstances(130) - New SbecInstance: com.hp.opr.common.streamcorrelation.result.SbecInstance@bfa28a9[RuleId=89299047-5a85-4755-945e-43e5b9ab837a,MatchedEvtSets=[com.hp.opr.common.streamcorrelation.result.MatchedEventSet@54837563[c43278af-044d-b827-ef0f-484e086159b7,[84ee3e3d-e19f-4f39-818f-c70be3746550]]]]
–2013-01-17 05:51:09,958 [Thread-44] DEBUG QueryResultProcessorImpl.resultMatches(82) - Repetition Scenario evaluated: 1 of 3 events collected
–2013-01-17 05:51:09,958 [Thread-44] DEBUG EventUpdater.storeCorrelations(247) - Storing correlations

25 5. AS THE SECOND EVENT ARRIVES WITHIN THE TIME FRAME, MAKE SURE IT IS STORED
–2013-01-17 05:51:13,560 [Thread-44] DEBUG SbecRuleEvaluatorImpl.findSbecInstances(117) - Found 2 results
–2013-01-17 05:51:13,560 [Thread-44] DEBUG SbecRuleEvaluatorImpl.findSbecInstances(130) - New SbecInstance: com.hp.opr.common.streamcorrelation.result.SbecInstance@5f5fc212[RuleId=89299047-5a85-4755-945e-43e5b9ab837a,MatchedEvtSets=[com.hp.opr.common.streamcorrelation.result.MatchedEventSet@7be5ca9[c43278af-044d-b827-ef0f-484e086159b7,[999d66c6-08cd-4b20-ac2c-6fc5dd19c11e, 84ee3e3d-e19f-4f39-818f-c70be3746550]]]]
–2013-01-17 05:51:13,560 [Thread-44] DEBUG QueryResultProcessorImpl.resultMatches(82) - Repetition Scenario evaluated: 2 of 3 events collected
–2013-01-17 05:51:13,560 [Thread-44] DEBUG EventUpdater.storeCorrelations(247) - Storing correlations

26 6. MAKE SURE THE 3RD EVENT ARRIVES
–2013-01-17 05:51:16,753 [Thread-44] DEBUG SbecRuleEvaluatorImpl.findSbecInstances(117) - Found 3 results
–2013-01-17 05:51:16,753 [Thread-44] DEBUG SbecRuleEvaluatorImpl.findSbecInstances(130) - New SbecInstance: com.hp.opr.common.streamcorrelation.result.SbecInstance@3dc63d85[RuleId=89299047-5a85-4755-945e-43e5b9ab837a,MatchedEvtSets=[com.hp.opr.common.streamcorrelation.result.MatchedEventSet@21f10672[c43278af-044d-b827-ef0f-484e086159b7,[01b0ceb8-8189-4fdb-ae74-cf38b698b6d9, 999d66c6-08cd-4b20-ac2c-6fc5dd19c11e, 84ee3e3d-e19f-4f39-818f-c70be3746550]]]]
–2013-01-17 05:51:16,754 [Thread-44] DEBUG QueryResultProcessorImpl.resultMatches(82) - Repetition Scenario evaluated: 3 of 3 events collected

27 7. ONCE THE RULE MATCHES, IT EXECUTES THE ACTIONS SPECIFIED
–2013-01-17 05:51:16,754 [Thread-44] DEBUG QueryResultProcessorImpl.resultMatches(82) - Repetition Scenario evaluated: 3 of 3 events collected
–2013-01-17 05:51:16,754 [Thread-44] DEBUG QueryResultProcessorImpl.processResult(53) - Rule 'SBEC 3 CRITICAL EVENT RULE' matches. Now executing Actions!
–2013-01-17 05:51:16,754 [Thread-44] DEBUG BSMConnectionProvider.logOpenConnection(188) - Connection has been retrieved from pool. Number of borrowed connections is now: 1

28 8. NEW SBEC EVENT GETS CREATED
–2013-01-17 05:51:16,811 [Thread-44] DEBUG PipelineEventPoolImpl.insertNewEvent(330) - New event being inserted into the pipeline: com.hp.opr.common.model.Event@c8dbe6a[dbf5cae8-7a31-4093-8853-1bf208a100f7,1,SBEC received 3 Critical Evenst in aminute,,OPEN,CRITICAL,,,,,bsm92,,,,com.hp.opr.common.model.ResolutionHints@2dd02796[bsm92,,, ],com.hp.opr.common.model.ResolutionHints@3cd70059[,,, ],,,false,-1,-1,[],{},Thu Jan 17 05:51:16 MST 2013,,Thu Jan 17 05:51:16 MST 2013,0,,,,,,,,,false,,,,,, ]
–2013-01-17 05:51:16,811 [Thread-44] DEBUG EventPipeline.reinsertEvent(440) - Event dbf5cae8-7a31-4093-8853-1bf208a100f7 is now waiting for reinsertion at step PipelineEntry
–2013-01-17 05:51:16,811 [Thread-44] DEBUG EventUpdater.storeCorrelations(247) - Storing correlations

29 9. TO TROUBLESHOOT, KNOW THE STEPS AND HOW THEY WORK
–If any of the above steps fails, opr-backend.log (in DEBUG mode) will give you the reason why
–To find corrupt people, follow the money; to find non-working SBEC events, follow the event through opr-backend.log
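Following an event through the log can be partly automated. The sketch below is one possible approach, not an HP tool: the function names are invented, the regular expressions are derived only from the log lines quoted in the slides above, and SAMPLE stands in for the lines of a real opr-backend.log read at DEBUG level.

```python
import re

# Patterns taken from the DEBUG lines shown in the previous slides.
PROGRESS = re.compile(r"Repetition Scenario evaluated: (\d+) of (\d+) events collected")
RULE_FIRED = re.compile(r"Rule '(.+)' matches\. Now executing Actions!")


def trace(lines, event_id):
    """Return every log line that mentions the given event ID."""
    return [line for line in lines if event_id in line]


def repetition_progress(lines):
    """Return (collected, required) pairs in the order they were logged."""
    pairs = []
    for line in lines:
        m = PROGRESS.search(line)
        if m:
            pairs.append((int(m.group(1)), int(m.group(2))))
    return pairs


def fired_rules(lines):
    """Return the names of rules that reported a match."""
    return [m.group(1) for m in map(RULE_FIRED.search, lines) if m]


SAMPLE = [
    "2013-01-17 05:51:09,958 [Thread-44] DEBUG QueryResultProcessorImpl.resultMatches(82) - Repetition Scenario evaluated: 1 of 3 events collected",
    "2013-01-17 05:51:13,560 [Thread-44] DEBUG QueryResultProcessorImpl.resultMatches(82) - Repetition Scenario evaluated: 2 of 3 events collected",
    "2013-01-17 05:51:16,726 [Thread-44] DEBUG EventChannelCiResolver.logEvent(309) - resolving event: SBEC(01b0ceb8-8189-4fdb-ae74-cf38b698b6d9), nodeHints=bsm92, relatedCiHint=bsm92, service_id=null",
    "2013-01-17 05:51:16,754 [Thread-44] DEBUG QueryResultProcessorImpl.resultMatches(82) - Repetition Scenario evaluated: 3 of 3 events collected",
    "2013-01-17 05:51:16,754 [Thread-44] DEBUG QueryResultProcessorImpl.processResult(53) - Rule 'SBEC 3 CRITICAL EVENT RULE' matches. Now executing Actions!",
]
```

If repetition_progress stops short of the required count, or fired_rules stays empty, the step where the chain breaks is the place to look in the surrounding log lines.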

