1
Learning Software Behavior for Automated Diagnosis
Ori Bar-ilan | Dr. Meir Kalech | Dr. Roni Stern
2
Background & Motivation
</> High Level Research Goal: Integrate ML techniques into software diagnosis
3
1. Background & Motivation
Let's talk about diagnosis for a moment
4
Background & Motivation Model-Based
[F. Wotawa, '02] </> Traditional diagnosis: model-based, and a model is often absent in software. A system model plus observations (e.g., A=1, B=1, C=0, D=1 on wires w, x, y) yields a diagnosis.
5
Background & Motivation Spectrum-based
[R. Abreu, '09] </> Another approach: spectrum-based. Observe the system for expected behavior: test_1 passes, test_2 fails, ..., test_n passes.
6
Background & Motivation Spectrum-based
[R. Abreu, '09] </> The system's behavior representation in practice: an N x M activity matrix over N tests and M components (a_ij = 1 iff component j participated in test i), plus a "test failed?" error vector.
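As an illustration, the activity matrix and error vector can be sketched in a few lines of code. The values below are made up, and the Ochiai score is a standard spectrum-based suspiciousness metric used here only to show how such a spectrum is consumed; it is not the ranker discussed later in this deck:

```python
# Sketch of a spectrum-based (SFL) input: an N-tests x M-components
# activity matrix A (A[i][j] = 1 iff component j ran in test i) and an
# error vector e (e[i] = 1 iff test i failed). Values are illustrative.
A = [
    [1, 1, 0],  # test 1 ran C1, C2
    [0, 1, 1],  # test 2 ran C2, C3
    [1, 0, 1],  # test 3 ran C1, C3
]
e = [1, 1, 0]   # tests 1 and 2 failed, test 3 passed

def ochiai(A, e, j):
    """Classic per-component suspiciousness score over this spectrum."""
    failed_and_active = sum(1 for i, row in enumerate(A) if row[j] and e[i])
    failed = sum(e)
    active = sum(row[j] for row in A)
    denom = (failed * active) ** 0.5
    return failed_and_active / denom if denom else 0.0

scores = [ochiai(A, e, j) for j in range(3)]
```

Here C2 (column index 1) is active in both failing tests and gets the highest score.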
7
Background & Motivation Motivation
</> Candidate Ranking Challenge: too many candidate diagnoses ({C2, C3}, {C4, C8, C12}, {C2}, ...). Which one to choose? Order the diagnoses by the likelihood of each diagnosis given the observation.
8
Background & Motivation Motivation
</> A "Ranker": given each observation, compute the likelihood of each diagnosis being correct, and output the diagnoses in that order ({C2}, {C2, C3}, ..., {C4, C8, C12}).
9
Background & Motivation Motivation
[R. Abreu, '09] </> BARINEL's Ranker: Observation = a test trace. Diagnoses are ordered by the likelihood of each diagnosis given the observation.
10
Background & Motivation Motivation
</> Ranker: Observation = test trace + what more can be observed? As before, order the diagnoses by the likelihood of each diagnosis given the observation.
11
2. Research
Research Method and Details
12
Research Components' State - Intuition
</> A spectrum over components C1, C2, C3: each row is a test (test_1 ... test_3) with a 0/1 activity entry per component and a "test failed?" flag. Think of the possible diagnoses and their ranking.
13
Research Components' State - Intuition
</> Now assume each component has 2 possible argument values, observed in each test alongside the activity rows and the "test failed?" flags. Think of the possible diagnoses and their ranking. And now?
14
Research Components' State
</> The State-Oriented Ranker: Observation = test trace + components' state. Diagnoses ({C2, C3}, {C4, C8, C12}, ...) are ordered by the likelihood of each diagnosis given this richer observation.
15
Research High Level Methodology
SYNTHETIC MODEL pipeline: Sample a Project -> Model Components' Behavior -> Create State-Oriented Input -> Invoke Diagnosis Algorithm
16
Research Granularity
Granularity of Atomic Components: Statements, Blocks, Methods (chosen for this discussion), Modules, ...
17
Research Component's State
</> Test i: What is a method's state? Consider component j, a method: Function Foo(self invoker, boolean a, Object o). Its state is sampled from the invoker, the arguments, and the output.
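One way to capture such a state is to wrap the method and record its arguments and output or exception on every call. This is a hypothetical Python instrumentation sketch (the deck's example is Java-like; `record_state` and the `samples` store are invented names):

```python
import functools

# Hypothetical instrumentation sketch: wrap a method so each call records
# its "state" (arguments plus output or exception), mirroring the
# Foo(self invoker, boolean a, Object o) example on the slide.
samples = []

def record_state(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        entry = {"method": fn.__name__, "args": args, "kwargs": kwargs}
        try:
            entry["output"] = fn(*args, **kwargs)
            entry["exception"] = None
        except Exception as exc:
            entry["output"] = None
            entry["exception"] = type(exc).__name__
            samples.append(entry)
            raise
        samples.append(entry)
        return entry["output"]
    return wrapper

@record_state
def foo(a, o):
    # Stand-in body for the slide's Foo(boolean a, Object o).
    return a and o is not None

foo(True, "x")
```

In a real setting this instrumentation would run under each test, tagging every recorded state with the test's pass/fail outcome.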
18
Research Enriching SFL Input
</> Enrich the binary activity matrix with components' states: each cell a_ij is paired with S_ij, the sampled state of component j in test i.
19
Research Method
Observe the system over time and sample components' states
Learn states that correlate with failures
Prioritize diagnoses with a stronger correlation to test failures
20
Research Learning Components' Behavior
Train Set for Method foo(), sampled from C_j in test_i:

Self  | Arg1  | ... | ArgN | Output/Exception | Failure
21312 | 0.756 | ... | 1    | 5                | ...
23423 | 0.223 | ... |      |                  | ...

Query state (Self=21312, Arg1=0.756, ..., ArgN=1, Output=5) -> correlation with failures: 0.82. Does State_ij imply that t_i fails? Add b_ij to the diagnoser's input.
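A toy version of this per-method learning step. The row fields mirror the slide's columns, and the "correlation" is a deliberately simple stand-in (the empirical failure rate among similar states), since the actual learner is not specified on this slide:

```python
# Hypothetical per-method train set: one row per sampled state of foo()
# in some test, labelled with whether that test failed.
rows = [
    {"self": 21312, "arg1": 0.756, "out": 1, "fail": 1},
    {"self": 23423, "arg1": 0.223, "out": 5, "fail": 0},
    {"self": 21312, "arg1": 0.801, "out": 1, "fail": 1},
    {"self": 99001, "arg1": 0.210, "out": 5, "fail": 0},
]

def failure_correlation(rows, out_value):
    """Failure rate among states 'similar' to a query (here: same output).

    A simple stand-in for the learned score b_ij; a real system would
    train a classifier over all state fields.
    """
    matching = [r for r in rows if r["out"] == out_value]
    return sum(r["fail"] for r in matching) / len(matching) if matching else 0.0

b_ij = failure_correlation(rows, out_value=1)
```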
21
Research Ranking Policy
The Goodness Function
Ranking the diagnosis candidates is done according to this policy:

Pr(obs | Δ) = ∏_{i : e_i = 0} ( ∏_{C_j ∈ Δ, a_ij = 1} g_j ) × ∏_{i : e_i = 1} ( 1 − ∏_{C_j ∈ Δ, a_ij = 1} g_j )

where g_j is the goodness of component C_j: the probability that C_j behaves correctly when involved in a test.
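The policy above can be sketched as follows, assuming an activity matrix A, error vector e, and per-component goodness values g (all values invented for illustration):

```python
# BARINEL-style likelihood of one candidate diagnosis D (a set of
# component indices), given activity matrix A, error vector e, and
# per-component goodness g[j] = Pr(C_j behaves correctly when involved).
def likelihood(D, A, e, g):
    p = 1.0
    for i, row in enumerate(A):
        # Probability that every suspected component involved in test i behaved well.
        ok = 1.0
        for j in D:
            if row[j]:
                ok *= g[j]
        # A failing test means at least one suspect misbehaved.
        p *= (1.0 - ok) if e[i] else ok
    return p

A = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
e = [1, 1, 0]
g = [0.9, 0.4, 0.9]

candidates = [{0}, {1}, {0, 2}]
ranked = sorted(candidates, key=lambda D: likelihood(D, A, e, g), reverse=True)
```

For this toy spectrum, the single-fault candidate {C2} (index 1) is involved in both failing tests and ranks first.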
22
3. Experiment
Setup, Evaluated Algorithms, Evaluation Metrics & Results
23
Experiment Setup [Elmishali, 2015]
4 real-world open-source Java projects with known bugs (from their issue trackers), generating 536 instances (134 per project):

Project     | Tests | Methods | Bugs Reported | Bugs Fixed
Orient DB   |   790 |  19,207 |         4,625 |      2,459
Eclipse CDT | 3,990 |  66,982 |        17,713 |      9,091
Apache Ant  | 5,190 |  10,830 |         5,890 |      1,176
Apache POI  | 2,346 |  21,475 |         3,361 |      1,408
24
Experiment Evaluated Algorithms
Our state-augmented diagnoser is compared against: BARINEL (Abreu et al.) and a data-augmented variant of BARINEL (Elmishali et al.)
25
Experiment Synthesizing Behavior
Using a synthetic behavior model to control the model's accuracy: Software System + Generated Model -> Spectrum-based Algorithm -> Diagnosis
26
Experiment Synthesizing Behavior
Using a synthetic behavior model to control the model's accuracy: the true diagnosis (ground truth) feeds the Generated Model; Software System + Generated Model -> Spectrum-based Algorithm -> Diagnosis
27
Experiment Synthesizing Behavior
Using a synthetic behavior model to control the model's accuracy: the true diagnosis (ground truth) plus synthetic noise feed the Generated Model; Software System + Generated Model -> Spectrum-based Algorithm -> Diagnosis
28
Experiment Synthesizing Behavior
Example. Given Ground Truth = {C1} and a synthetic error rate of 0.1, the generated model scores C1's state near 1 (e.g., 0.9) and the other components' states near 0 (e.g., 0.1) in each test: the ground truth perturbed by the error rate.
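A sketch of this synthesis step, under the assumption that each (test, component) score is the ground-truth fault indicator perturbed by up to the error rate; the function name and exact noise scheme are assumptions:

```python
import random

def synthesize(ground_truth, n_tests, n_components, eps, rng):
    """Noisy behavior model from ground truth.

    Faulty components score near 1 (at least 1 - eps), healthy ones near 0
    (at most eps), mimicking the 0.9 / 0.1 values on the slide for eps = 0.1.
    """
    model = []
    for _ in range(n_tests):
        row = []
        for j in range(n_components):
            true_score = 1.0 if j in ground_truth else 0.0
            row.append(abs(true_score - rng.uniform(0, eps)))
        model.append(row)
    return model

# Ground truth {C1} (index 0) over 3 tests x 3 components, error rate 0.1.
model = synthesize({0}, n_tests=3, n_components=3, eps=0.1, rng=random.Random(7))
```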
29
Experiment Evaluation Metric
How do we measure diagnosis quality? We used 3 known metrics: Weighted Average Precision, Weighted Average Recall, and Health State Wasted Effort.
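As an illustration of the effort-style metric, here is a common simple formulation of wasted effort: the number of healthy components a developer inspects before reaching all truly faulty ones, following the ranked order. This is not necessarily the paper's exact definition:

```python
def wasted_effort(ranking, faulty):
    """Count healthy components inspected before all faulty ones are found.

    `ranking` is the component inspection order produced by a diagnoser;
    `faulty` is the ground-truth set of faulty components.
    """
    inspected_healthy = 0
    remaining = set(faulty)
    for c in ranking:
        if c in remaining:
            remaining.remove(c)
            if not remaining:
                return inspected_healthy
        else:
            inspected_healthy += 1
    return inspected_healthy

# True fault is C3; it is ranked second, so one inspection is wasted.
effort = wasted_effort(["C2", "C3", "C1"], {"C3"})
```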
30
Experiment Overview Results
Precision and Recall: with 0.15 synthetic error, results similar to the data-augmented (DA) diagnoser; with 0.2 synthetic error, significantly better results than BARINEL
31
Experiment Results
Health State Wasted Effort: with a 0.3 synthetic error rate, superior results over both other diagnosers
32
Experiment Conclusions
Even with a 30% error rate, this technique can provide a significant improvement in candidate ranking
33
4. Roadmap
Challenges & Future Steps
34
Roadmap Challenges
Dealing with small data-sets (one model per component): live systems / test generation; diagnose at a higher level (e.g., class)
Learning states from imbalanced data-sets (only a few faults): learn abnormal states rather than states "correlative to faults"
35
Roadmap Future Steps
Instrument real software for a non-synthetic behavior approximation
Consider more ways to utilize the learned behavior
Combine this work with orthogonal diagnosers
36
THANKS! Any questions?