1
Learning Software Behavior for Automated Diagnosis
Ori Bar-ilan | Dr. Meir Kalech | Dr. Roni Stern
2
Background & Motivation
</> High Level Research Goal: Integrate ML techniques into software diagnosis
3
1. Background & Motivation
Let's talk about diagnosis for a moment
4
Background & Motivation Model-Based
[F. Wotawa, '02] </> Traditional diagnosis: model-based, and a model is often absent in software. A system model plus observations (e.g., A=1, B=1, C=0, D=1 on wires w, x, y) yields a diagnosis.
5
Background & Motivation Spectrum-based
[R. Abreu, '09] </> Another approach: spectrum-based. Observe the system for expected behavior: test_1 passes, test_2 fails, ..., test_n passes.
6
Background & Motivation Spectrum-based
[R. Abreu, '09] </> The system's behavior representation in practice: an N x M activity matrix over N tests and M components (a_ij = 1 iff component j participated in test i), plus a "test failed?" error vector.
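As an illustration, the activity matrix and error vector can be sketched in a few lines of code. The values below are made up, and the Ochiai score is a standard spectrum-based suspiciousness metric used here only to show how such a spectrum is consumed; it is not the ranker discussed later in this deck:

```python
# Sketch of a spectrum-based (SFL) input: an N-tests x M-components
# activity matrix A (A[i][j] = 1 iff component j ran in test i) and an
# error vector e (e[i] = 1 iff test i failed). Values are illustrative.
A = [
    [1, 1, 0],  # test 1 ran C1, C2
    [0, 1, 1],  # test 2 ran C2, C3
    [1, 0, 1],  # test 3 ran C1, C3
]
e = [1, 1, 0]   # tests 1 and 2 failed, test 3 passed

def ochiai(A, e, j):
    """Classic per-component suspiciousness score over this spectrum."""
    failed_and_active = sum(1 for i, row in enumerate(A) if row[j] and e[i])
    failed = sum(e)
    active = sum(row[j] for row in A)
    denom = (failed * active) ** 0.5
    return failed_and_active / denom if denom else 0.0

scores = [ochiai(A, e, j) for j in range(3)]
```

Here C2 (column index 1) is active in both failing tests and gets the highest score.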
7
Background & Motivation Motivation
</> Candidate Ranking Challenge: too many candidate diagnoses ({C2, C3}, {C4, C8, C12}, {C2}, ...). Which one to choose? Order the diagnoses by the likelihood of each diagnosis given the observation.
8
Background & Motivation Motivation
</> A "Ranker": given each observation, compute the likelihood of each diagnosis being correct, and output the diagnoses in that order ({C2}, {C2, C3}, ..., {C4, C8, C12}).
9
Background & Motivation Motivation
[R. Abreu, '09] </> BARINEL's Ranker: Observation = a test trace. Diagnoses are ordered by the likelihood of each diagnosis given the observation.
10
Background & Motivation Motivation
</> Ranker: Observation = test trace + what more can be observed? As before, order the diagnoses by the likelihood of each diagnosis given the observation.
11
2. Research
Research Method and Details
12
Research Components' State - Intuition
</> A spectrum over components C1, C2, C3: each row is a test (test_1 ... test_3) with a 0/1 activity entry per component and a "test failed?" flag. Think of the possible diagnoses and their ranking.
13
Research Components' State - Intuition
</> Now assume each component has 2 possible argument values, observed in each test alongside the activity rows and the "test failed?" flags. Think of the possible diagnoses and their ranking. And now?
14
Research Components' State
</> The State-Oriented Ranker: Observation = test trace + components' state. Diagnoses ({C2, C3}, {C4, C8, C12}, ...) are ordered by the likelihood of each diagnosis given this richer observation.
15
Research High Level Methodology
SYNTHETIC MODEL pipeline: Sample a Project -> Model Components' Behavior -> Create State-Oriented Input -> Invoke Diagnosis Algorithm
16
Research Granularity
Granularity of Atomic Components: Statements, Blocks, Methods (chosen for this discussion), Modules, ...
17
Research Component's State
</> Test i: What is a method's state? Consider component j, a method: Function Foo(self invoker, boolean a, Object o). Its state is sampled from the invoker, the arguments, and the output.
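One way to capture such a state is to wrap the method and record its arguments and output or exception on every call. This is a hypothetical Python instrumentation sketch (the deck's example is Java-like; `record_state` and the `samples` store are invented names):

```python
import functools

# Hypothetical instrumentation sketch: wrap a method so each call records
# its "state" (arguments plus output or exception), mirroring the
# Foo(self invoker, boolean a, Object o) example on the slide.
samples = []

def record_state(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        entry = {"method": fn.__name__, "args": args, "kwargs": kwargs}
        try:
            entry["output"] = fn(*args, **kwargs)
            entry["exception"] = None
        except Exception as exc:
            entry["output"] = None
            entry["exception"] = type(exc).__name__
            samples.append(entry)
            raise
        samples.append(entry)
        return entry["output"]
    return wrapper

@record_state
def foo(a, o):
    # Stand-in body for the slide's Foo(boolean a, Object o).
    return a and o is not None

foo(True, "x")
```

In a real setting this instrumentation would run under each test, tagging every recorded state with the test's pass/fail outcome.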
18
Research Enriching SFL Input
</> Enrich the binary activity matrix with components' states: each cell a_ij is paired with S_ij, the sampled state of component j in test i.
19
Research Method
Observe the system over time and sample components' states
Learn states that correlate with failures
Prioritize diagnoses with a stronger correlation to test failures
20
Research Learning Components' Behavior
Train Set for Method foo(), sampled from C_j in test_i:

Self  | Arg1  | ... | ArgN | Output/Exception | Failure
21312 | 0.756 | ... | 1    | 5                | ...
23423 | 0.223 | ... |      |                  | ...

Query state (Self=21312, Arg1=0.756, ..., ArgN=1, Output=5) -> correlation with failures: 0.82. Does State_ij imply that t_i fails? Add b_ij to the diagnoser's input.
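A toy version of this per-method learning step. The row fields mirror the slide's columns, and the "correlation" is a deliberately simple stand-in (the empirical failure rate among similar states), since the actual learner is not specified on this slide:

```python
# Hypothetical per-method train set: one row per sampled state of foo()
# in some test, labelled with whether that test failed.
rows = [
    {"self": 21312, "arg1": 0.756, "out": 1, "fail": 1},
    {"self": 23423, "arg1": 0.223, "out": 5, "fail": 0},
    {"self": 21312, "arg1": 0.801, "out": 1, "fail": 1},
    {"self": 99001, "arg1": 0.210, "out": 5, "fail": 0},
]

def failure_correlation(rows, out_value):
    """Failure rate among states 'similar' to a query (here: same output).

    A simple stand-in for the learned score b_ij; a real system would
    train a classifier over all state fields.
    """
    matching = [r for r in rows if r["out"] == out_value]
    return sum(r["fail"] for r in matching) / len(matching) if matching else 0.0

b_ij = failure_correlation(rows, out_value=1)
```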
21
Research Ranking Policy
The Goodness Function
Ranking the diagnosis candidates is done according to this policy:

Pr(obs | Δ) = ∏_{i : e_i = 0} ( ∏_{C_j ∈ Δ, a_ij = 1} g_j ) × ∏_{i : e_i = 1} ( 1 − ∏_{C_j ∈ Δ, a_ij = 1} g_j )

where g_j is the goodness of component C_j: the probability that C_j behaves correctly when involved in a test.
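The policy above can be sketched as follows, assuming an activity matrix A, error vector e, and per-component goodness values g (all values invented for illustration):

```python
# BARINEL-style likelihood of one candidate diagnosis D (a set of
# component indices), given activity matrix A, error vector e, and
# per-component goodness g[j] = Pr(C_j behaves correctly when involved).
def likelihood(D, A, e, g):
    p = 1.0
    for i, row in enumerate(A):
        # Probability that every suspected component involved in test i behaved well.
        ok = 1.0
        for j in D:
            if row[j]:
                ok *= g[j]
        # A failing test means at least one suspect misbehaved.
        p *= (1.0 - ok) if e[i] else ok
    return p

A = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
e = [1, 1, 0]
g = [0.9, 0.4, 0.9]

candidates = [{0}, {1}, {0, 2}]
ranked = sorted(candidates, key=lambda D: likelihood(D, A, e, g), reverse=True)
```

For this toy spectrum, the single-fault candidate {C2} (index 1) is involved in both failing tests and ranks first.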
22
3. Experiment
Setup, Evaluated Algorithms, Evaluation Metrics & Results
23
Experiment Setup [Elmishali, 2015]
4 real-world open-source Java projects with known bugs (from their issue trackers), generating 536 instances (134 per project):

Project     | Tests | Methods | Bugs Reported | Bugs Fixed
Orient DB   |   790 |  19,207 |         4,625 |      2,459
Eclipse CDT | 3,990 |  66,982 |        17,713 |      9,091
Apache Ant  | 5,190 |  10,830 |         5,890 |      1,176
Apache POI  | 2,346 |  21,475 |         3,361 |      1,408
24
Experiment Evaluated Algorithms
Our state-augmented diagnoser is compared against: BARINEL (Abreu et al.) and a data-augmented variant of BARINEL (Elmishali et al.)
25
Experiment Synthesizing Behavior
Using a synthetic behavior model to control the model's accuracy: Software System + Generated Model -> Spectrum-based Algorithm -> Diagnosis
26
Experiment Synthesizing Behavior
Using a synthetic behavior model to control the model's accuracy: the true diagnosis (ground truth) feeds the Generated Model; Software System + Generated Model -> Spectrum-based Algorithm -> Diagnosis
27
Experiment Synthesizing Behavior
Using a synthetic behavior model to control the model's accuracy: the true diagnosis (ground truth) plus synthetic noise feed the Generated Model; Software System + Generated Model -> Spectrum-based Algorithm -> Diagnosis
28
Experiment Synthesizing Behavior
Example. Given Ground Truth = {C1} and a synthetic error rate of 0.1, the generated model scores C1's state near 1 (e.g., 0.9) and the other components' states near 0 (e.g., 0.1) in each test: the ground truth perturbed by the error rate.
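A sketch of this synthesis step, under the assumption that each (test, component) score is the ground-truth fault indicator perturbed by up to the error rate; the function name and exact noise scheme are assumptions:

```python
import random

def synthesize(ground_truth, n_tests, n_components, eps, rng):
    """Noisy behavior model from ground truth.

    Faulty components score near 1 (at least 1 - eps), healthy ones near 0
    (at most eps), mimicking the 0.9 / 0.1 values on the slide for eps = 0.1.
    """
    model = []
    for _ in range(n_tests):
        row = []
        for j in range(n_components):
            true_score = 1.0 if j in ground_truth else 0.0
            row.append(abs(true_score - rng.uniform(0, eps)))
        model.append(row)
    return model

# Ground truth {C1} (index 0) over 3 tests x 3 components, error rate 0.1.
model = synthesize({0}, n_tests=3, n_components=3, eps=0.1, rng=random.Random(7))
```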
29
Experiment Evaluation Metric
How do we measure diagnosis quality? We used 3 known metrics: Weighted Average Precision, Weighted Average Recall, and Health State Wasted Effort.
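As an illustration of the effort-style metric, here is a common simple formulation of wasted effort: the number of healthy components a developer inspects before reaching all truly faulty ones, following the ranked order. This is not necessarily the paper's exact definition:

```python
def wasted_effort(ranking, faulty):
    """Count healthy components inspected before all faulty ones are found.

    `ranking` is the component inspection order produced by a diagnoser;
    `faulty` is the ground-truth set of faulty components.
    """
    inspected_healthy = 0
    remaining = set(faulty)
    for c in ranking:
        if c in remaining:
            remaining.remove(c)
            if not remaining:
                return inspected_healthy
        else:
            inspected_healthy += 1
    return inspected_healthy

# True fault is C3; it is ranked second, so one inspection is wasted.
effort = wasted_effort(["C2", "C3", "C1"], {"C3"})
```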
30
Experiment Overview Results
Precision and Recall: with 0.15 synthetic error, results similar to the data-augmented (DA) diagnoser; with 0.2 synthetic error, significantly better results than BARINEL
31
Experiment Results
Health State Wasted Effort: with a 0.3 synthetic error rate, superior results over both other diagnosers
32
Experiment Conclusions
Even with a 30% error rate, this technique can provide a significant improvement in candidate ranking
33
4. Roadmap
Challenges & Future Steps
34
Roadmap Challenges
Dealing with small data-sets (one model per component): live systems / test generation; diagnose at a higher level (e.g., class)
Learning states from imbalanced data-sets (only a few faults): learn abnormal states rather than states "correlative to faults"
35
Roadmap Future Steps
Instrument real software for a non-synthetic behavior approximation
Consider more ways to utilize the learned behavior
Combine this work with orthogonal diagnosers
36
THANKS! Any questions?