A statistical anomaly-based algorithm for on-line fault detection in complex software critical systems
A. Bovenzi – F. Brancati


1 A statistical anomaly-based algorithm for on-line fault detection in complex software critical systems
A. Bovenzi – F. Brancati
Università degli Studi di NAPOLI "Federico II", Dipartimento di Informatica e Sistemistica
Università degli Studi di Firenze, Dipartimenti di Sistemi e Informatica
DOTS-LCCI PM 5th Meeting, ROMA, 30-31 May 2011

2 Motivations and conclusion from last meeting
A. Bovenzi and F. Brancati, DOTS-LCCI - ROMA 30/05/2011
- Detect failures in complex and critical SW systems
  - Legacy and OTS (Off-The-Shelf) based
  - Several interacting components
  - Different configurations
- Detection performed at process (thread) level
  - Crash failures, which cause a process (thread) to terminate unexpectedly
  - Hang failures (active and passive), which cause a process (thread) to be suspended and its external state to remain constant
- Fail-halt (or fail-stop) systems

3 Motivations and conclusion from last meeting
Definition of anomaly
- With respect to a monitored variable characterizing the behavior of the system, an anomaly is a change in this variable caused by specific and non-random factors [Montgomery 00]
  - e.g., overload, the activation of faults, malicious attacks
- On-line anomaly detection is an essential means to guarantee the dependability of complex and critical software systems
- It is a difficult task because of system properties
  - Complexity (lots of interacting components)
  - Highly dynamic (frequent reconfigurations, updates)
  - Several sources of non-determinism

4 Motivations and conclusion from last meeting
- Anomaly detectors can take advantage of the possibility of evaluating online the expected behavior of monitored variables
- Internal R&SAClock algorithm (SPS) adapted for anomaly detection
- Comparison with static thresholds
- Preliminary results show improvements
  - Fewer False Positives
  - Better Precision and Recall
What has been done in the meantime?

5 Outline
1. SPS-based detection framework
   a) Static vs Adaptive Thresholds
2. Experimental Evaluation
   a) Case study
   b) Monitored variables
   c) The Experimental Phase
3. Metrics
4. Experimental results
5. Conclusion and future work

6 Detection framework
The Detection Framework
[Diagram: Application, Middleware, Kernel, Monitoring tool, training]
Static thresholds limitations
- They require operational conditions of the system similar to those of the training set

7 Static vs Adaptive Thresholds
- Failure at ~160 sec
- SPS signals the failure
- Static thresholds signal the failure but… produce lots of False Positives
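
The contrast between a fixed band and a band that follows the data can be illustrated with a minimal Python sketch. The adaptive detector here is a generic rolling mean ± k standard deviations, used as a stand-in and not the SPS algorithm; the function names, the window size, `k`, the band limits and the synthetic trace are all assumptions made for illustration.

```python
from collections import deque
from statistics import mean, stdev


def static_alarms(samples, lower, upper):
    """Flag every sample outside a fixed [lower, upper] band (e.g. chosen at training time)."""
    return [not (lower <= x <= upper) for x in samples]


def adaptive_alarms(samples, window=20, k=3.0):
    """Flag samples farther than k standard deviations from the mean of the
    last `window` non-anomalous samples.

    Generic rolling-statistics stand-in, NOT the SPS algorithm.
    """
    history = deque(maxlen=window)
    alarms = []
    for x in samples:
        if len(history) < window:
            # Warm-up: collect samples, raise no alarm yet.
            history.append(x)
            alarms.append(False)
            continue
        mu, sigma = mean(history), stdev(history)
        anomalous = abs(x - mu) > k * max(sigma, 1e-9)
        alarms.append(anomalous)
        if not anomalous:
            # Only normal samples update the band, so a failure does not drag it along.
            history.append(x)
    return alarms


# Synthetic trace (assumption): slow upward drift plus small periodic noise,
# then the monitored variable drops to 0 from sample 160 onwards.
samples = [10 + 0.05 * i + 0.2 * ((i % 5) - 2) for i in range(160)] + [0.0] * 10
static = static_alarms(samples, lower=9.0, upper=13.0)
adaptive = adaptive_alarms(samples)
print("static false positives:  ", sum(static[:160]))
print("adaptive false positives:", sum(adaptive[:160]))
print("failure flagged by both: ", all(static[160:]) and all(adaptive[160:]))
```

In this trace the fixed band fits only the early samples, so the drift alone pushes the variable outside it, while the rolling band moves with the drift and reacts only to the abrupt change.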

8 Detection framework
SPS-based Detection
[Diagram: Application, Middleware, Kernel, Monitoring tool, training]

9 Case Study: the SWIM-BOX®
- ATC domain
  - From fragmented systems
  - Towards a network of integrated co-operators
- SWIM-BOX
  - To cooperate and share information between distributed and heterogeneous ATC legacy systems
  - Web Services
  - Publisher/Subscriber
  - OTS-based: JBOSS AS, OSPL, RTI DDS, MySQL DB

10 Monitored variables
- Capture the application behavior indirectly
- Breakpoints placed in specific kernel functions
- Probe handlers to quickly collect data, e.g., input parameters, return values

11 The Experimental Phase
1. Workload selection (Swim-box Validation Plan)
   - Workloads differ in message rate, messages per burst, and time between bursts
2. Experiments execution
   - Golden Run
   - Faulty Run
   - Source code mutation tool (http://www.mobilab.unina.it/SFI.htm)
3. Post processing phase
   - Both algorithms applied to the monitored data
   - Varying several algorithm parameters
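
As a rough illustration of how the three workload parameters could interact, the sketch below builds a list of send timestamps for a bursty workload. The function name, its parameters and the exact timing model (evenly spaced messages inside a burst, a silent gap between bursts) are assumptions for illustration and are not taken from the Swim-box Validation Plan.

```python
def burst_schedule(message_rate, messages_per_burst, time_between_bursts, n_bursts):
    """Return send timestamps (in seconds) for a hypothetical bursty workload.

    Inside a burst, messages are spaced 1/message_rate seconds apart;
    consecutive bursts are separated by `time_between_bursts` seconds of silence.
    """
    if message_rate <= 0:
        raise ValueError("message_rate must be positive")
    gap = 1.0 / message_rate
    times = []
    burst_start = 0.0
    for _ in range(n_bursts):
        for k in range(messages_per_burst):
            times.append(burst_start + k * gap)
        # Next burst starts after the last message of this burst plus the silent gap.
        burst_start += (messages_per_burst - 1) * gap + time_between_bursts
    return times


# Two hypothetical workloads that differ only in the time between bursts.
light = burst_schedule(message_rate=10, messages_per_burst=3, time_between_bursts=5.0, n_bursts=4)
heavy = burst_schedule(message_rate=10, messages_per_burst=3, time_between_bursts=0.5, n_bursts=4)
print(len(light), light[-1])
print(len(heavy), heavy[-1])
```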

12 Post processing methodology

13 Quality metrics
Basic:
- True Positive (TP): a failure occurs and the detector triggers an alarm;
- False Positive (FP): no failure occurs and an alarm is given;
- True Negative (TN): no real failure occurs and no alarm is raised;
- False Negative (FN): the algorithm fails to detect an occurring failure.
Derived:
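
A minimal sketch of how the four basic counts can be obtained and turned into two derived metrics, precision and recall (both named among the preliminary results earlier in the talk). The standard textbook definitions are used; whether they match the formulas shown on the original slide is an assumption, and the function names and the per-sample representation of failures and alarms are hypothetical.

```python
def confusion_counts(failure, alarm):
    """Count TP, FP, TN, FN from two equally long boolean sequences:
    failure[i] is True if a failure is present, alarm[i] is True if the detector fired."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for f, a in zip(failure, alarm):
        if f and a:
            counts["TP"] += 1
        elif not f and a:
            counts["FP"] += 1
        elif not f and not a:
            counts["TN"] += 1
        else:
            counts["FN"] += 1
    return counts


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN).
    Returns 0.0 for a metric whose denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy example (made-up data, not experimental results).
failure = [False, False, True, True, False]
alarm = [False, True, True, False, False]
c = confusion_counts(failure, alarm)
print(c, precision_recall(c["TP"], c["FP"], c["FN"]))
```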

14 Experimental results
                 d    t     (c,m)
SPS              20   0.4   (0.99,20)
ST-Training      3    0.5
ST-No Training   3    0.3

15 Experimental results
                            aTM (sec)   aMR        aPA        A*C        Synthesis
SPS Algorithm               3.611111    0.009804   0.95098    0.96004    0.914551
Static T. without training  11.63951    0.032724   0.674494   0.582356   0.59768
Static T. with training     4.75        0.023715   0.85584    0.974436   0.842296
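
Assuming that aTM, aMR and aPA abbreviate the average mistake duration, mistake rate and query accuracy probability named in the conclusions, the sketch below shows how such quantities are commonly computed in the failure-detector literature from an alarm trace recorded while no failure is present. The function name, the fixed sampling period `dt` and the toy trace are assumptions; this is not the authors' post-processing code.

```python
def qos_metrics(alarms, dt=1.0):
    """Compute (mistake_rate, avg_mistake_duration, query_accuracy) from a
    boolean alarm trace sampled every `dt` seconds during a failure-free run.

    A "mistake" is a maximal run of consecutive alarms (every alarm is wrong,
    since no failure is present).
    """
    if not alarms:
        raise ValueError("alarm trace must not be empty")
    runs = []      # length, in samples, of each mistake
    current = 0
    for a in alarms:
        if a:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)

    total_time = len(alarms) * dt
    mistake_rate = len(runs) / total_time                       # mistakes per second
    avg_mistake_duration = (sum(runs) / len(runs)) * dt if runs else 0.0
    query_accuracy = 1.0 - (sum(runs) * dt) / total_time        # fraction of time with correct output
    return mistake_rate, avg_mistake_duration, query_accuracy


# Toy trace (made-up data): two mistakes, lasting 2 s and 1 s, in a 10 s window.
trace = [False, False, True, True, False, False, False, True, False, False]
print(qos_metrics(trace, dt=1.0))
```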

16 Conclusion
- Error detection exploiting OS-level indicators can be improved by means of the SPS algorithm
- Experimental results (achieved via fault injection) show the limitations of static threshold algorithms in scenarios where the operational conditions of the system differ from those of the training phase
- A detector equipped with SPS
  - copes with variable and non-stationary systems
  - needs no training phase
  - performs better in terms of Coverage, Query accuracy probability, Mistake rate and Mistake duration

17 Future work
- Investigate SPS-based Detector performance by varying the number and type of monitored variables
  - Is the detection framework application independent?
- Explore how the detection framework performs under different OSs
  - Is the detection framework OS independent?
  - Which OS is best suited for the proposed approach?
  - New experimental campaign planned: same case study under Windows Server 2008
- Compare the Detector performance by varying Predictors
  - Is the SPS-based predictor the best choice?
  - Compare SPS with ARIMA models, neural networks, …

18 Thank you for your attention
Questions? Insight for the future work?

