Achieving self-healing in service delivery software systems by means of case-based reasoning
Stefania Montani, Cosimo Anglano
Presented by Tony Schneider

Introduction
Background | CBR Implementation | Experiment / Cavy | Results

Autonomic Systems Overview
The goal is a self-managing system
The system needs to exhibit:
‣ Self-Configuration
‣ Self-Optimization
‣ Self-Protection
‣ Self-Healing

Self-Healing
"Service Delivery Systems" (SDS)
‣ Aimed at delivering 24/7 services
These services are prone to breakage:
‣ Service failures
‣ Software, hardware, and network faults
‣ Failures can't be handled manually
‣ The system needs to repair itself autonomously

Self-Healing

Internalization
‣ The self-healing engine is integrated with the software
‣ Not extendable
‣ Depends on specific applications
Externalization
‣ Great for retrofitting current systems
‣ Allows a general method for SDS self-healing

Self-Healing
Problems with the current approach
‣ The MAPE (Monitor, Analyze, Plan, Execute) model assumes prior knowledge of the system
‣ The knowledge base is problematic
‣ Large, time-consuming, and laborious to build
‣ Needs to be kept up to date
Build the knowledge base automatically
‣ How?

Case-based Reasoning
Case-Based Reasoning (CBR)
‣ Uses previous experience for problem solving
‣ Retrieves cases similar to the current problem
‣ Reuses past successful solutions
‣ Revises the retrieved solution if necessary
‣ Retains the current case

Case-based Reasoning
The case base represents the "Knowledge" in the MAPE model
‣ Each case represents a previous problem and its solution
‣ Implicit versus explicit knowledge
‣ Explicit: rules and models
‣ Implicit: unstructured and based on experience
‣ Implicit knowledge tends to be easier to acquire and better suited to settings with limited human interaction

Case-based Reasoning
Cases are stored by identifying application features:
‣ The problem
‣ The applied solution
‣ The outcome of the solution
Avoids a bottleneck present in other learning methods
‣ E.g., online reinforcement learning
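A case, as listed above, bundles the problem (a set of application features), the applied solution, and its outcome. The following is a minimal Python sketch of that record and an in-memory case base; the names `Case`, `CaseBase`, `features`, `solution`, and `succeeded` are hypothetical and not taken from the authors' Java implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union

# A feature value may be symbolic (str/bool), linear (float), or missing (None).
FeatureValue = Optional[Union[str, bool, float]]


@dataclass
class Case:
    """One past failure: the problem, the applied solution, and its outcome."""
    features: Dict[str, FeatureValue]  # the problem, described by application features
    solution: str                      # the repair action that was applied (hypothetical label)
    succeeded: bool                    # the outcome of applying the solution


@dataclass
class CaseBase:
    """In-memory case base standing in for the 'Knowledge' part of the MAPE loop."""
    cases: List[Case] = field(default_factory=list)

    def retain(self, case: Case) -> None:
        # Store every case, including failed attempts, so outcomes are remembered.
        self.cases.append(case)

    def successful(self) -> List[Case]:
        # Only cases whose solution worked are candidates for reuse.
        return [c for c in self.cases if c.succeeded]
```

Keeping failed outcomes in the case base (rather than discarding them) lets later retrievals skip solutions already known not to work.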

Case-based Reasoning
CBR relies on a large number of past cases
Pros:
‣ Methods improve with time and experience
‣ Large systems host recurrent problems
Cons:
‣ The data must be stored
‣ The knowledge base must be populated

Case-based Reasoning
To reiterate: here, CBR is a methodology for assisting in the repair of failed systems.
Questions so far?

System Overview
The SDS is treated as a black box
‣ The self-healing CBR engine is entirely external to the SDS
‣ It controls the health of the SDS
‣ The CBR components map onto MAPE:
‣ Analysis: Retrieve
‣ Planning: Revise
‣ Knowledge: Case base

System Overview: MAPE Revised
[Figure: the original MAPE model and the model revised for CBR]

System Overview: MAPE Revised
Four additions:
‣ Monitoring
‣ Case Preparation
‣ Service Restoration
‣ Repair Module

System Overview: MAPE Revised
Application-agnostic portion
‣ Doesn't rely on specific environment variables
Application-specific portion
‣ Relies on data from the application
Both
‣ Interface between the two layers
The managed element is completely external to the healing system
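The split above can be illustrated as an interface boundary: an application-specific adapter is the only code that knows the SDS, while the application-agnostic engine sees only case features and action names. This is a sketch under that assumption; `ManagedSDS`, `HealingEngine`, and all method names are hypothetical and only loosely mirror the monitoring, case preparation, and repair additions.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ManagedSDS(ABC):
    """Application-specific adapter: the only code that knows the managed SDS.

    The healing engine treats the SDS as a black box and reaches it
    exclusively through these (hypothetical) methods.
    """

    @abstractmethod
    def failed_services(self) -> List[str]:
        """Monitoring: names of the services that currently fail."""

    @abstractmethod
    def describe_failure(self) -> Dict[str, bool]:
        """Case preparation: turn raw observations into case features."""

    @abstractmethod
    def apply(self, action: str) -> None:
        """Repair module: execute one repair action on the SDS."""


class HealingEngine:
    """Application-agnostic part: works only with features and action names."""

    def __init__(self, sds: ManagedSDS) -> None:
        self.sds = sds

    def check(self) -> Dict[str, bool]:
        # Healing is triggered only by an observed service failure;
        # an empty dict means there is nothing to repair.
        if not self.sds.failed_services():
            return {}
        return self.sds.describe_failure()
```

Retrofitting a new SDS would then mean writing one new adapter subclass, leaving the engine unchanged.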

System Overview
Assumptions
‣ Bad solutions have no effect on the SDS state; likewise, good solutions don't produce faults
‣ Deadlines for producing case solutions aren't fixed
‣ Every stored case has a unique solution
‣ No transient faults (faults that occur only once)
‣ No intermittent faults (faults that appear, disappear, then reappear)

CBR Cycle: Retrieve - Reuse/Revise - Retain
Retrieve
Every stored case represents some past failure; we need to find the cases that best approximate the current failure.
The distance between two cases is the average of the per-feature distances d_f(x, y), where:
‣ d_f(x, y) = 1 if x or y is missing
‣ d_f(x, y) = overlap(x, y) if f is a symbolic feature
‣ d_f(x, y) = … if f is a linear feature
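The per-feature distance above has the shape of the heterogeneous Euclidean-overlap metric (HEOM). The transcript does not preserve the formula for linear features, so this sketch assumes HEOM's usual range-normalized difference |x - y| / range_f, and takes overlap(x, y) to be 0 when the values match and 1 otherwise. The feature ranges are supplied by the caller and are hypothetical; the case distance is the average over features, as the slide states.

```python
from typing import Dict, Optional, Union

Value = Optional[Union[str, bool, float, int]]


def overlap(x: Value, y: Value) -> float:
    # Symbolic features: 0 if the values match, 1 otherwise.
    return 0.0 if x == y else 1.0


def feature_distance(x: Value, y: Value, value_range: Optional[float]) -> float:
    """d_f(x, y); value_range is None for symbolic features, else a positive span."""
    if x is None or y is None:
        return 1.0                     # missing value: maximal distance
    if value_range is None:
        return overlap(x, y)           # symbolic feature
    # Linear feature. ASSUMPTION: HEOM-style normalization by the feature's range.
    return abs(float(x) - float(y)) / value_range


def case_distance(a: Dict[str, Value], b: Dict[str, Value],
                  ranges: Dict[str, Optional[float]]) -> float:
    """Average of d_f over every feature that appears in either case."""
    feats = sorted(set(a) | set(b))
    if not feats:
        return 0.0
    total = sum(feature_distance(a.get(f), b.get(f), ranges.get(f)) for f in feats)
    return total / len(feats)
```

For example, with a latency range of 1000 ms, cases differing by 500 ms in latency, matching on one symbolic feature, and with one feature missing on one side get distances 0.5, 0, and 1, for an average of 0.5.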

CBR Cycle: Retrieve - Reuse/Revise - Retain
Reuse/Revise
‣ Apply the solutions of the retrieved cases in order of increasing average distance (best match first)
‣ Repeat for all retrieved cases until the problem is solved
‣ Also covers cases with multiple solutions (just use the best choice)
What if no solution works?
‣ Ask a human
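The reuse/revise loop above can be sketched as follows, assuming the retrieval step yields (distance, solution) pairs. `try_solution` (apply a fix and re-check the service) and `ask_human` are hypothetical callbacks, and skipping a solution already tried for a closer case is a design choice of this sketch, not something stated on the slide.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, object]


def repair(problem: Features,
           retrieved: List[Tuple[float, str]],
           try_solution: Callable[[str], bool],
           ask_human: Callable[[Features], str]) -> Tuple[str, bool]:
    """Try retrieved solutions, closest case first; fall back to a human.

    Returns (solution, solved_automatically).
    """
    tried = set()
    for _, solution in sorted(retrieved, key=lambda pair: pair[0]):
        if solution in tried:          # same fix suggested by several cases
            continue
        tried.add(solution)
        if try_solution(solution):     # e.g. apply the fix, then re-check the service
            return solution, True
    # No retrieved solution worked: ask a human operator.
    return ask_human(problem), False
```

The boolean in the result records whether a human was needed, which is the quantity the results slide reports.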

CBR Cycle: Retrieve - Reuse/Revise - Retain
Retain
Simply saves the case to the knowledge base:
‣ The problem
‣ The solution
‣ The outcome
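The retain step can be sketched as a single insert into a relational table. The experiment used MySQL for the case base; this sketch substitutes Python's built-in `sqlite3` so it runs without a server, and the table layout and JSON encoding of the problem features are assumptions.

```python
import json
import sqlite3


def open_case_base(path: str = ":memory:") -> sqlite3.Connection:
    # SQLite stands in for the MySQL case base used in the experiment.
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS cases (
                        id INTEGER PRIMARY KEY AUTOINCREMENT,
                        problem TEXT NOT NULL,      -- JSON-encoded features
                        solution TEXT NOT NULL,
                        outcome INTEGER NOT NULL    -- 1 = solved, 0 = failed
                    )""")
    return conn


def retain(conn: sqlite3.Connection, problem: dict, solution: str, outcome: bool) -> int:
    """Save the problem, the solution, and the outcome; return the new row id."""
    cur = conn.execute(
        "INSERT INTO cases (problem, solution, outcome) VALUES (?, ?, ?)",
        (json.dumps(problem, sort_keys=True), solution, int(outcome)))
    conn.commit()
    return cur.lastrowid


def load_cases(conn: sqlite3.Connection):
    rows = conn.execute("SELECT problem, solution, outcome FROM cases ORDER BY id")
    return [(json.loads(p), s, bool(o)) for p, s, o in rows]
```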

Odds and Ends
System initialization
‣ Bootstrap phase
Prototyping
‣ Builds a general case out of several similar cases in the case base
‣ Solves the storage space problem
‣ Turns implicit knowledge into explicit knowledge
‣ Used after the case base has grown
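The slides do not say how prototypes are built, so the following is only one plausible sketch: it assumes cases that share a solution are "similar" and keeps, as the prototype, just the feature values on which all of them agree. `build_prototypes` is a hypothetical name. It shows how several stored cases collapse into one generalized entry, which is what reduces storage.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Features = Dict[str, object]


def build_prototypes(cases: List[Tuple[Features, str]]) -> Dict[str, Features]:
    """Generalize similar cases into one prototype per solution.

    ASSUMPTION: cases sharing a solution are similar, and the prototype keeps
    only the feature values on which every one of them agrees.
    """
    groups: Dict[str, List[Features]] = defaultdict(list)
    for features, solution in cases:
        groups[solution].append(features)

    prototypes: Dict[str, Features] = {}
    for solution, members in groups.items():
        common = dict(members[0])
        for other in members[1:]:
            # Drop features that are missing from, or differ in, any member.
            common = {k: v for k, v in common.items() if other.get(k, object()) == v}
        prototypes[solution] = common
    return prototypes
```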

CBR Questions?
That wraps up the CBR portion. Any questions?

Experimental Setup
Implemented the CBR-based system in Java
‣ MySQL for case base storage
Used with an SDS testbed, "Cavy"
Cavy
‣ Configures, deploys, and operates SDS testbeds
‣ Framework that surrounds the healing engine
‣ Injects faults into testbed components

Cavy Components
[Figure: Fault managers, Diagnoser, Service Monitor, Integrator, Repairer, Injector]

Cavy Components
Basically...
‣ The injector breaks the system
‣ The service monitor detects the fault
‣ The diagnoser finds a similar FS pair
‣ The interrogator receives the solution
‣ The repairer tries each solution until one works

Cavy Components
Cavy implements pieces of the self-healing architecture:
‣ Interrogator: application-agnostic pieces
‣ Fault repairer: application-specific pieces
‣ Service monitor: Monitor
‣ Fault managers: Repair

The Experiment
RUBiS
‣ Mimics eBay
‣ Two tiers
‣ Customers interact with the web server on the first tier
‣ The database is stored on the second tier
‣ Several services are tested
‣ Register, Browse, Sell, Home

The Experiment
Potential RUBiS failures (each can apply to either tier)
‣ Network problems
‣ Configuration problems
‣ System restart
10 failure descriptors
‣ Boolean values
‣ Represent failed pieces of the system
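With 10 Boolean failure descriptors and no missing values, every feature is symbolic, so the averaged overlap distance reduces to the fraction of descriptors on which two failures disagree (a normalized Hamming distance). A minimal sketch; which descriptor stands for which failed piece of the system is not given in the transcript, so positions here are purely illustrative.

```python
from typing import Sequence

N_DESCRIPTORS = 10  # the experiment uses 10 Boolean failure descriptors


def descriptor_distance(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Average overlap distance over Boolean descriptors.

    With every descriptor present, this is the fraction of descriptors on
    which the two failures disagree (a normalized Hamming distance).
    """
    if len(a) != N_DESCRIPTORS or len(b) != N_DESCRIPTORS:
        raise ValueError(f"expected {N_DESCRIPTORS} descriptors")
    return sum(x != y for x, y in zip(a, b)) / N_DESCRIPTORS
```

For example, a current failure that matches a stored case on 9 of the 10 descriptors is at distance 0.1 from it.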

Initial Case Base (constructed by a human)
[Figure: automatically generated case]

Initial Case Base (constructed by a human)
[Figure: distances between the current failure and the cases in the case base]

Second Case

Results
Continued like this for 3 days
‣ Of 1016 cases, fewer than 11 (under 1%) needed human intervention
Prototypes functioned correctly
‣ Reduced the size of the database
‣ Handled new faults without human intervention
‣ Narrowed down the possible failures to 9 prototype cases
‣ Showed that "complex" problems were just simultaneous simple problems

Future Work
Use in real-world applications
Working around the given assumptions
Use of prototyping/generalization
Combine CBR with other knowledge sources
‣ Combine CBR with some other methodology

Conclusion
‣ CBR is a good solution for self-healing
‣ The repair procedure is triggered by service failures
‣ No structured knowledge is needed
‣ Worked well even with novel faults