Learning Software Behavior for Automated Diagnosis


Learning Software Behavior for Automated Diagnosis Ori Bar-ilan | Dr. Meir Kalech | Dr. Roni Stern

Background & Motivation. High-level research goal: integrate ML techniques into software diagnosis.

1. Background & Motivation Let’s talk about diagnosis for a moment

Background & Motivation: Model-Based Diagnosis [F. Wotawa, '02]. Traditional diagnosis is model-based: a system model plus observations (e.g., inputs A=1, B=1, C=0 and output Z=1 for a circuit with gates w, x, y) yields a diagnosis. Such a model is often absent in software.

Background & Motivation: Spectrum-Based Diagnosis [R. Abreu, '09]. Another approach: spectrum-based. Run the system through a suite of tests (test_1 ... test_N), observe which tests pass and which fail, and compare against the expected behavior.

Background & Motivation: Spectrum-Based Diagnosis [R. Abreu, '09]. The system's behavior representation in practice: an N x M activity matrix (N tests, M components) whose entry a_ij is 1 if test i executed component j, plus an error vector recording which tests failed.
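As a concrete sketch (not from the talk), the spectrum can be held as a plain activity matrix and error vector. The talk's ranker is BARINEL; the Ochiai similarity used below is only an illustrative baseline showing how the matrix alone already ranks components, and the matrix values are invented:

```python
import math

# a[i][j] == 1 iff component j was executed in test i; e[i] == 1 iff test i failed.
A = [
    [1, 1, 0],  # test 1 covered C1, C2
    [1, 0, 1],  # test 2 covered C1, C3
    [1, 0, 0],  # test 3 covered C1 only
]
e = [1, 1, 0]   # tests 1 and 2 failed

def ochiai(A, e, j):
    """Similarity of component j's activity pattern to the failure pattern."""
    n11 = sum(1 for i in range(len(A)) if A[i][j] == 1 and e[i] == 1)  # covered & failed
    n10 = sum(1 for i in range(len(A)) if A[i][j] == 1 and e[i] == 0)  # covered & passed
    n01 = sum(1 for i in range(len(A)) if A[i][j] == 0 and e[i] == 1)  # missed & failed
    denom = math.sqrt((n11 + n01) * (n11 + n10))
    return n11 / denom if denom else 0.0

scores = [ochiai(A, e, j) for j in range(3)]
```

Here C1, covered by both failing tests, scores highest, which is exactly the intuition the matrix representation enables.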

Background & Motivation: Candidate Ranking. Challenge: the spectrum yields too many candidate diagnoses, e.g., {C2, C3}, {C4, C8, C12}, {C2}, ... Which one should be inspected first? We need, for each diagnosis, its likelihood given the observation (the activity matrix [a_11 ... a_NM] and the error vector [e_1 ... e_N]), turning the set of diagnoses into an ordered list.

Background & Motivation: Motivation. A "Ranker" turns the set of diagnoses into an ordered list: given each observation, it computes the likelihood of each diagnosis being correct.

Background & Motivation: Motivation [R. Abreu, '09]. BARINEL's ranker does exactly this, where the observation is the test trace.

Background & Motivation: Motivation. But the observation need not be the test trace alone. What more can be observed?

2. Research: Research Method and Details

Research: Components' State - Intuition. Consider this spectrum (1 = executed / failed):

           C1  C2  C3   failed?
  test_1    1   1   0      1
  test_2    1   0   1      1
  test_3    1   0   0      0

Think of the possible diagnoses and their ranking.

Research: Components' State - Intuition. Now assume each component has 2 possible arguments, so each cell of the spectrum carries the component's observed state in addition to its activity. Think of the possible diagnoses and their ranking. And now?

Research: Components' State. A state-oriented ranker: the observation becomes the test trace plus the components' states, and the ranker computes the likelihood of each diagnosis given that richer observation.

Research: High-Level Methodology. Sample a project, model the components' behavior (synthetic model), create state-oriented input, and invoke the diagnosis algorithm.

Research: Granularity. Granularity of atomic components: statements, blocks, methods (chosen for this discussion), modules, ...

Research: Component's State. What is a method's state? For component j in test i: its invoker (self), its arguments, and its output, e.g., Function Foo(self invoker, boolean a, Object o).

Research: Enriching SFL Input. Enrich the activity matrix with the components' states: each cell (i, j) becomes the pair (a_ij, S_ij), where S_ij is the sampled state of component j in test i.
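A minimal sketch of that enriched structure (state tuples are invented placeholders; the talk does not fix a concrete encoding):

```python
# Activity matrix and error vector as before.
A = [[1, 1, 0],
     [1, 0, 1],
     [1, 0, 0]]
e = [1, 1, 0]

# S[i][j]: sampled state of component j in test i (None where not executed).
# Each state here is a made-up (invoker, argument) pair.
S = [[("self_a", 0.7), ("self_b", 3), None],
     [("self_a", 0.2), None, ("self_c", 9)],
     [("self_a", 0.7), None, None]]

# Enriched spectrum: every cell pairs activity with the sampled state.
enriched = [[(A[i][j], S[i][j]) for j in range(len(A[0]))]
            for i in range(len(A))]
```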

Research: Method. Observe the system over time and sample the components' states; learn which states correlate with failures; prioritize diagnoses with a stronger correlation to test failures.

Research: Learning Components' Behavior. Each sample of component C_j in test t_i (e.g., for method foo(): self = 21312, arg1 = 0.756, ..., argN = 1, output/exception = 5) is labeled with whether the test failed. From this train set we learn b_ij, the correlation of state S_ij with failure (e.g., 0.82): does state_ij predict that t_i fails?
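The talk does not name a specific learner, so as a minimal sketch, b_ij can be approximated by the empirical failure frequency of each sampled state. The `ComponentBehaviorModel` class and the sample state tuples below are invented for illustration:

```python
from collections import defaultdict

class ComponentBehaviorModel:
    """Per-component model estimating how strongly each sampled state
    correlates with test failure (a stand-in for the learned b_ij)."""
    def __init__(self):
        # state -> [failing observations, total observations]
        self.counts = defaultdict(lambda: [0, 0])

    def observe(self, state, test_failed):
        fail, total = self.counts[state]
        self.counts[state] = [fail + int(test_failed), total + 1]

    def failure_correlation(self, state):
        fail, total = self.counts[state]
        return fail / total if total else 0.0

model = ComponentBehaviorModel()
# Samples of foo()'s state as (self_id, arg1, output), with the test outcome.
model.observe((21312, 0.756, 5), test_failed=True)
model.observe((21312, 0.756, 5), test_failed=True)
model.observe((23423, 0.223, 7), test_failed=False)
```

A richer model (e.g., a classifier over the raw argument values) would generalize to unseen states; the frequency table only captures states seen during training.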

Research: Ranking Policy (The Goodness Function). Ranking the diagnosis candidates ω is done according to this policy, per test i:

  ε_i = ∏_{C_j ∈ ω, a_ij = 1} (1 − b_ij)        if e_i = 0
  ε_i = 1 − ∏_{C_j ∈ ω, a_ij = 1} (1 − b_ij)    if e_i = 1
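The policy above translates almost directly into code. This sketch multiplies the per-test factors into one score per candidate diagnosis; the b values are invented for the example:

```python
def goodness(diagnosis, A, e, b):
    """Score a candidate diagnosis (a set of component indices): per test i,
    take the product of (1 - b[i][j]) over the diagnosed components that
    test i executed; keep the product for passing tests (e[i] == 0) and use
    its complement for failing ones (e[i] == 1)."""
    score = 1.0
    for i, e_i in enumerate(e):
        prod = 1.0
        for j in diagnosis:
            if A[i][j] == 1:
                prod *= 1.0 - b[i][j]
        score *= prod if e_i == 0 else 1.0 - prod
    return score

A = [[1, 1, 0], [1, 0, 1], [1, 0, 0]]
e = [1, 1, 0]
b = [[0.9, 0.1, 0.0],   # invented failure-correlation values b[i][j]
     [0.9, 0.0, 0.1],
     [0.1, 0.0, 0.0]]
```

Note the built-in penalty: a failing test that executed none of the diagnosed components contributes a factor of 0, eliminating diagnoses that cannot explain that failure.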

3. Experiment: Setup, Evaluated Algorithms, Evaluation Metrics & Results

Experiment: Setup [Elmishali, 2015]. 4 real-world open-source Java projects with known bugs (mined from their issue trackers), generating 536 instances (134 per project):

  Project       Tests   Methods   Bugs Reported   Bugs Fixed
  OrientDB        790    19,207           4,625        2,459
  Eclipse CDT   3,990    66,982          17,713        9,091
  Apache Ant    5,190    10,830           5,890        1,176
  Apache POI    2,346    21,475           3,361        1,408

Experiment: Evaluated Algorithms. Our state-augmented diagnoser is compared against BARINEL (Abreu et al.) and a data-augmented variant of BARINEL (Elmishali et al.).

Experiment: Synthesizing Behavior. We use a synthetic behavior model to control the model's accuracy: starting from the true diagnosis (ground truth), we generate a behavior model of the software system, inject synthetic noise into it, and feed the result to the spectrum-based diagnosis algorithm.

Experiment: Synthesizing Behavior - Example. Given ground truth = {C1} and a synthetic error rate of 0.1, the perfect behavior values (1 for the faulty component in the tests it breaks, 0 elsewhere) are perturbed by the error rate, yielding values such as 0.9 and 0.1 for the executed components of each test.
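The slide does not spell out the exact corruption scheme; one simple reading, sketched here, blends each ground-truth value with its complement by the error rate, which reproduces the 0.9 / 0.1 / 0 values of the example:

```python
def synthesize_behavior(A, ground_truth, error_rate):
    """Sketch of the synthetic-behavior step: from the true diagnosis,
    emit failure-correlation values b[i][j] that would be perfect
    (1 for faulty components, 0 for healthy ones) in the cells each test
    executed, then shift each value toward its complement by error_rate
    to simulate an imperfect learned model."""
    N, M = len(A), len(A[0])
    b = [[0.0] * M for _ in range(N)]
    for i in range(N):
        for j in range(M):
            if A[i][j] == 1:
                true_val = 1.0 if j in ground_truth else 0.0
                b[i][j] = (1 - error_rate) * true_val + error_rate * (1 - true_val)
    return b
```

With error_rate = 0, the diagnoser receives a perfect model; raising the rate degrades it in a controlled way, which is what lets the experiment chart accuracy against model quality.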

Experiment: Evaluation Metrics. How to measure a diagnosis' quality? We used three known metrics: weighted average precision, weighted average recall, and health-state wasted effort.
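The slide does not define wasted effort; a common definition in the diagnosis literature, sketched here under that assumption, counts how many healthy components a developer would inspect, following the ranked list, before reaching the first real fault:

```python
def wasted_effort(ranking, faulty):
    """Number of healthy components inspected before the first actual
    fault, when components are inspected in ranked order."""
    wasted = 0
    for comp in ranking:
        if comp in faulty:
            return wasted
        wasted += 1
    return wasted  # no fault appeared in the ranking at all

# Example: the true fault C1 is ranked third, so two inspections are wasted.
effort = wasted_effort(["C4", "C2", "C1", "C3"], {"C1"})
```

Lower is better: a perfect ranker puts a faulty component first and wastes no effort.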

Experiment: Results Overview (Precision & Recall). With 0.15 synthetic error: results similar to the data-augmented (DA) diagnoser. With 0.2 synthetic error: significantly better results than BARINEL.

Experiment: Results (Health-State Wasted Effort). With a 0.3 synthetic error rate: superior results over both baseline diagnosers.

Experiment: Conclusions. Even with 30% model error, this technique provides a significant improvement in candidate ranking.

4. Roadmap: Challenges & Future Steps

Roadmap: Challenges. Dealing with small data sets (one model per component): use live systems / test generation, or diagnose at a higher level (e.g., class). Learning states from imbalanced data sets (only a few faults): learn abnormal states rather than states "correlative to faults".

Roadmap: Future Steps. Instrument real software for a non-synthetic behavior approximation; consider more ways to utilize the learned behavior; combine this work with orthogonal diagnosers.

THANKS! Any questions?