A statistical anomaly-based algorithm for on-line fault detection in complex software critical systems A. Bovenzi – F. Brancati Università degli Studi.


A statistical anomaly-based algorithm for on-line fault detection in complex software critical systems
A. Bovenzi – F. Brancati
Università degli Studi di NAPOLI "Federico II", Dipartimento di Informatica e Sistemistica
Università degli Studi di Firenze, Dipartimenti di Sistemi e Informatica
DOTS-LCCI PM 5th Meeting, ROMA, May 2011

A. Bovenzi and F. Brancati, DOTS-LCCI - ROMA 30/05/

Motivations and conclusion from last meeting

Detect failures in complex and critical SW systems
- Legacy and OTS (Off-The-Shelf) based
- Several interacting components
- Different configurations
Detection performed at process (thread) level
- Crash failures, which cause a process (thread) to terminate unexpectedly
- Hang failures (active and passive), which cause a process (thread) to be suspended and its external state to remain constant
Fail-halt (or Fail-stop) systems

Motivations and conclusion from last meeting

Definition of anomaly
- With respect to a monitored variable characterizing the behavior of the system, an anomaly is a change in this variable caused by specific, non-random factors [Montgomery 00]
- Examples: overload, the activation of faults, malicious attacks
On-line anomaly detection is an essential means to guarantee the dependability of complex and critical software systems
It is a difficult task because of system properties
- Complexity (many interacting components)
- Highly dynamic behavior (frequent reconfigurations, updates)
- Several sources of non-determinism

Motivations and conclusion from last meeting

Anomaly detectors can take advantage of the possibility to evaluate online the expected behavior of monitored variables
- Internal R&SAClock algorithm (SPS) adapted for anomaly detection
- Comparison with static thresholds
Preliminary results show improvements
- Fewer False Positives
- Better Precision and Recall
What has been done in the meantime?

Outline

1. SPS-based detection framework
   a) Static vs Adaptive Thresholds
2. Experimental Evaluation
   a) Case study
   b) Monitored variables
   c) The Experimental Phase
3. Metrics
4. Experimental results
5. Conclusion and future work

Detection framework

The Detection Framework
[Figure labels: Application, Middleware, Kernel, Monitoring tool, training]
Static thresholds limitations
- They require the operational conditions of the system to be similar to those of the training set

Static vs Adaptive Thresholds

- Failure at ~160 sec
- SPS signals the failure
- Static thresholds signal the failure but produce lots of False Positives
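The contrast between the two kinds of threshold can be illustrated with a small sketch. It does not reproduce SPS, whose internals are not given in these slides; it swaps in a generic sliding-window bound (mean plus k standard deviations) as a stand-in for an adaptive threshold. The function names, the `window` and `k` parameters, the static bound, and the synthetic signal are all assumptions made for illustration.

```python
import statistics
from collections import deque


def static_alarms(samples, upper):
    """Flag every sample above a fixed bound chosen once (e.g. at training time)."""
    return [x > upper for x in samples]


def adaptive_alarms(samples, window=20, k=4.0):
    """Flag samples above mean + k * stdev of the last `window` normal samples.

    Generic stand-in for an adaptive threshold; this is NOT the SPS algorithm.
    """
    history = deque(maxlen=window)
    alarms = []
    for x in samples:
        if len(history) < window:
            # Warm-up: not enough history yet, so no alarm is raised.
            alarms.append(False)
            history.append(x)
            continue
        mean = statistics.fmean(history)
        std = statistics.pstdev(history)
        bound = mean + k * max(std, 1e-9)
        is_anomaly = x > bound
        alarms.append(is_anomaly)
        if not is_anomaly:
            # Only normal samples update the expected behavior.
            history.append(x)
    return alarms


# Hypothetical signal: slow workload drift, then a jump at index 160
# standing in for the failure.
signal = [10 + 0.05 * i + (i % 3) * 0.2 for i in range(160)] + [40.0] * 5

static = static_alarms(signal, upper=14.0)
adaptive = adaptive_alarms(signal)

print("static false positives:", sum(static[:160]))
print("adaptive false positives:", sum(adaptive[:160]))
```

On this synthetic trace both detectors flag the jump, but the fixed bound also raises alarms during the failure-free drift, which is the behavior the slide attributes to static thresholds.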

Detection framework

SPS-based Detection
[Figure labels: Application, Middleware, Kernel, Monitoring tool, training]

Case Study: the SWIM-BOX®

ATC domain
- From fragmented systems
- Towards a network of integrated co-operators
SWIM-BOX
- To cooperate and share information between distributed and heterogeneous ATC legacy systems
- Web Services
- Publisher/Subscriber
- OTS-based: JBOSS AS, OSPL, RTI DDS, Mysql DB

Monitored variables

Capture the application behavior indirectly
- Breakpoints placed in specific kernel functions
- Probe handlers to quickly collect data, e.g., input parameters and return values

The Experimental Phase

1. Workload selection (SWIM-BOX Validation Plan)
   - Workloads differing in message rate, messages per burst, and time between bursts
2. Experiments execution
   - Golden Run
   - Faulty Run
   - Source code mutation tool (http://www.mobilab.unina.it/SFI.htm)
3. Post-processing phase
   - Both algorithms applied to the monitored data
   - Varying several algorithm parameters

Post-processing methodology

Quality metrics

Basic:
- True Positive (TP): a failure occurs and the detector triggers an alarm
- False Positive (FP): no failure occurs and an alarm is given
- True Negative (TN): no real failure occurs and no alarm is raised
- False Negative (FN): the algorithm fails to detect an occurring failure
Derived:
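The derived metrics themselves are not preserved in this transcript. Since an earlier slide names Precision and Recall, the sketch below computes those two from the four basic counts using their standard definitions. Whether the authors derived exactly these quantities here is an assumption, and the helper names and sample data are hypothetical.

```python
def confusion_counts(failure_present, alarm_raised):
    """Count TP, FP, TN, FN from two equally long lists of booleans."""
    tp = fp = tn = fn = 0
    for failure, alarm in zip(failure_present, alarm_raised):
        if failure and alarm:
            tp += 1
        elif not failure and alarm:
            fp += 1
        elif not failure and not alarm:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def precision(tp, fp):
    """Fraction of raised alarms that correspond to real failures."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of real failures for which an alarm was raised."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical outcomes of six experiments (not data from the presentation).
failure_present = [True, True, False, False, True, False]
alarm_raised = [True, False, True, False, True, False]

tp, fp, tn, fn = confusion_counts(failure_present, alarm_raised)
print("precision:", precision(tp, fp))
print("recall:", recall(tp, fn))
```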

Experimental results

Algorithm        d    t    (c, m)
SPS              20   0.4  (0.99, 20)
ST-Training      3    0.5
ST-No Training   3    0.3

Experimental results

Columns: aTM (sec), aMR, aPA, A*, C, Synthesis
Rows: SPS Algorithm, Static T. without training, Static T. with training

Conclusion

Error detection exploiting OS-level indicators can be improved by means of the SPS algorithm
Experimental results (achieved via fault injection) show the limitations of static threshold algorithms in scenarios where the operational conditions of the system differ from those of the training phase
A detector equipped with SPS
- copes with variable and non-stationary systems
- needs no training phase
- performs better in terms of Coverage, Query accuracy probability, Mistake rate, and Mistake duration
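Three of the metrics named above can be sketched from a log of false alarms. The sketch assumes the definitions commonly used in the failure-detector quality-of-service literature: mistake duration is how long a wrong alarm lasts, mistake rate is the number of wrong alarms per unit of time, and query accuracy probability is the fraction of failure-free time during which the detector output is correct. That the authors compute them exactly this way is an assumption, and the function name and sample intervals are hypothetical.

```python
def detector_qos(false_alarm_intervals, observation_time):
    """Summarize false alarms observed during a failure-free period.

    false_alarm_intervals: list of (start, end) times in seconds, each one
        a period during which the detector wrongly signalled a failure.
    observation_time: length of the failure-free period in seconds.
    """
    durations = [end - start for start, end in false_alarm_intervals]
    total_mistake_time = sum(durations)
    count = len(durations)
    return {
        # Average time needed to correct a wrong alarm.
        "mistake_duration": total_mistake_time / count if count else 0.0,
        # Wrong alarms per second of observation.
        "mistake_rate": count / observation_time,
        # Probability that the output is correct at a random instant.
        "query_accuracy_probability": 1.0 - total_mistake_time / observation_time,
    }


# Hypothetical false alarms during a 200-second golden run.
qos = detector_qos([(10.0, 12.0), (50.0, 51.0), (90.0, 93.0)], observation_time=200.0)
print(qos)
```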

Future work

Investigate SPS-based Detector performance by varying the number and type of monitored variables
- Is the detection framework application independent?
Explore how the detection framework performs under different OSs
- Is the detection framework OS independent?
- Which OS is best suited for the proposed approach?
- New experimental campaign planned: same case study under Windows Server 2008
Compare the Detector performance by varying Predictors
- Is the SPS-based predictor the best choice?
- Compare SPS with ARIMA models, neural networks, …

Thank you for your attention

Questions? Insights for future work?