Learning Software Behavior for Automated Diagnosis

Presentation transcript:

1 Learning Software Behavior for Automated Diagnosis
Ori Bar-ilan | Dr. Meir Kalech | Dr. Roni Stern

2 Background & Motivation
High Level Research Goal: Integrate ML techniques into software diagnosis

3 1. Background & Motivation
Let's talk about diagnosis for a moment

4 Background & Motivation Model-Based
[F. Wotawa, '02] Traditional diagnosis: model-based. The required system model is OFTEN ABSENT IN SOFTWARE.
[Figure: a System Model (diagram labels w, x, y, W) combined with Observations (A = 1, B = 1, C = 0, Z = 1) yields a Diagnosis.]

5 Background & Motivation Spectrum-based
[R. Abreu, '09] Another approach: spectrum-based. Observing the system for expected behavior.
[Figure: the System is run against test_1, test_2, ..., test_N; each test is marked as passed (V) or failed (X), e.g. test_1 V, test_2 X, ..., test_N V.]

6 Background & Motivation Spectrum-based
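[R. Abreu, '09] System's behavior representation, in practice: a binary matrix with N rows (tests) and M columns (components), plus a "test failed?" column:
\[
\left[\begin{array}{ccc|c}
1 & \cdots & 0 & 1 \\
\vdots & \ddots & \vdots & \vdots \\
1 & \cdots & 1 & 0
\end{array}\right]
\]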
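A minimal sketch of how such a spectrum could be assembled, assuming each test trace is available as the set of components it executed. The names `build_spectrum`, `traces`, and `failed`, and the three-test data, are illustrative and not taken from the presentation.

```python
def build_spectrum(traces, failed, components):
    """Return (A, e): A[i][j] = 1 if test i executed component j,
    e[i] = 1 if test i failed."""
    index = {c: j for j, c in enumerate(components)}
    A = [[0] * len(components) for _ in traces]
    for i, trace in enumerate(traces):
        for c in trace:
            A[i][index[c]] = 1
    e = [1 if f else 0 for f in failed]
    return A, e


# Hypothetical example: 3 tests over 3 components.
components = ["C1", "C2", "C3"]
traces = [{"C1", "C2"}, {"C1", "C3"}, {"C1"}]
failed = [True, True, False]
A, e = build_spectrum(traces, failed, components)
```

Each row of `A` is one test trace and `e` is the "test failed?" column.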

7 Background & Motivation Motivation
Candidate Ranking. Challenge: too many candidate diagnoses. Which one to choose?
Input: the activity matrix and the error vector
\[
\begin{pmatrix} a_{11} & \cdots & a_{1M} \\ \vdots & \ddots & \vdots \\ a_{N1} & \cdots & a_{NM} \end{pmatrix},
\quad
\begin{pmatrix} e_1 \\ \vdots \\ e_N \end{pmatrix}
\]
Diagnoses: {C2, C3}, {C4, C8, C12}, ..., {C2}
Ordered Diagnoses: {C2}, {C2, C3}, ..., {C4, C8, C12}
The ordering uses the likelihood of each diagnosis given the observation.

8 Background & Motivation Motivation
"Ranker": Diagnoses ({C2, C3}, {C4, C8, C12}, ..., {C2}) -> Ordered Diagnoses ({C2}, {C2, C3}, ..., {C4, C8, C12}). Given each observation, compute the likelihood of each diagnosis being correct.

9 Background & Motivation Motivation
[R. Abreu, '09] BARINEL's Ranker: Diagnoses ({C2, C3}, {C4, C8, C12}, ..., {C2}) -> Ordered Diagnoses ({C2}, {C2, C3}, ..., {C4, C8, C12}). Observation = Test trace. The ranker computes the likelihood of each diagnosis given the observation.

10 Background & Motivation Motivation
Ranker: Diagnoses ({C2, C3}, {C4, C8, C12}, ..., {C2}) -> Ordered Diagnoses ({C2}, {C2, C3}, ..., {C4, C8, C12}). Observation = Test trace + what more can be observed? The ranker computes the likelihood of each diagnosis given the observation.

11 2. Research
Method and Details

12 Research Components' State - Intuition
        C1  C2  C3  test failed?
test1    1   1   0   1
test2    1   0   1   1
test3    1   0   0   0
Think of the possible diagnoses and their ranking
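To make "the possible diagnoses" concrete: in spectrum-based diagnosis, candidates are commonly taken to be the minimal sets of components that intersect every failed test trace (minimal hitting sets). The brute-force sketch below assumes that definition and a three-test spectrum in which test1 and test2 fail; `minimal_hitting_sets` is an illustrative name, and real diagnosers use far more scalable search.

```python
from itertools import combinations


def minimal_hitting_sets(A, e):
    """Enumerate minimal component sets that hit every failed trace."""
    m = len(A[0])
    # One conflict set per failed test: the components it executed.
    conflicts = [{j for j in range(m) if row[j]}
                 for row, failed in zip(A, e) if failed]
    found = []
    for size in range(1, m + 1):
        for cand in combinations(range(m), size):
            s = set(cand)
            if any(prev <= s for prev in found):
                continue  # a smaller diagnosis is contained in s: not minimal
            if all(s & conflict for conflict in conflicts):
                found.append(s)
    return found


A = [[1, 1, 0],
     [1, 0, 1],
     [1, 0, 0]]
e = [1, 1, 0]
diagnoses = minimal_hitting_sets(A, e)  # indices 0, 1, 2 stand for C1, C2, C3
```

Under these assumptions the candidates are {C1} and {C2, C3}; ranking them is the job of the ranker.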

13 Research Components' State - Intuition
Assume each component has 2 possible arguments.
[Table: the same spectrum over C1, C2, C3 and test1, test2, test3 (rows 1, 1, 0 / 1, 0, 1 / 1, 0, 0; test failed? = 1, 1, 0), with each component entry paired with one of the two argument values.]
Think of the possible diagnoses and their ranking. And now?

14 Research Components' State
State-Oriented Ranker: Diagnoses ({C2, C3}, {C4, C8, C12}, ..., {C2}) -> Ordered Diagnoses ({C2}, {C2, C3}, ..., {C4, C8, C12}). Observation = Test trace + Components' State. The ranker computes the likelihood of each diagnosis given the observation.

15 Research High Level Methodology
Sample a Project -> Model Components' Behavior (SYNTHETIC MODEL) -> Create State-Oriented Input -> Invoke Diagnosis Algorithm

16 Research Granularity
Granularity of atomic components: Statements, Blocks, Methods (chosen for this discussion), Modules, ...

17 Research Component's State
What is a method's state? For Component j in Test i:
Function Foo(self invoker, boolean a, Object o): ... output
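One way to picture a method's state is as the invoker, the arguments, and the output or exception of a single call. The sketch below records exactly that with a decorator; `record_state`, `samples`, and the `Example` class are hypothetical names, and the presentation does not say how sampling would actually be instrumented.

```python
import functools

samples = []  # hypothetical in-memory store of sampled states


def record_state(func):
    """Record invoker, arguments, and output/exception of every call."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        state = {"self": id(self), "args": args, "kwargs": kwargs}
        try:
            result = func(self, *args, **kwargs)
        except Exception as exc:
            state["exception"] = type(exc).__name__
            samples.append(state)
            raise  # keep the original behavior visible to the test
        state["output"] = result
        samples.append(state)
        return result
    return wrapper


class Example:
    @record_state
    def foo(self, a, o):
        if o is None:
            raise ValueError("o must not be None")
        return (a, o)


obj = Example()
obj.foo(True, "x")
try:
    obj.foo(False, None)
except ValueError:
    pass
```

After the two calls, `samples` holds one state with an output and one with an exception.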

18 Research Enriching SFL Input
Enrich with components' states, where S_{ij} = sampled state of component j in test i:
\[
\left[\begin{array}{ccc|c}
1 & 1 & 0 & 1 \\
1 & 0 & 1 & 1 \\
1 & 0 & 0 & 0
\end{array}\right]
\;\longrightarrow\;
\left[\begin{array}{ccc|c}
1, S_{11} & 1, S_{12} & 0, S_{13} & 1 \\
1, S_{21} & 0, S_{22} & 1, S_{23} & 1 \\
1, S_{31} & 0, S_{32} & 0, S_{33} & 0
\end{array}\right]
\]
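As a data-structure sketch of this enrichment, each activity bit can simply be paired with the state sampled for that component in that test. The function name `enrich` and the string placeholders for states are assumptions for illustration only.

```python
def enrich(A, S):
    """Pair each activity bit a_ij with the sampled state S_ij."""
    return [[(a, s) for a, s in zip(a_row, s_row)]
            for a_row, s_row in zip(A, S)]


A = [[1, 1, 0],
     [1, 0, 1],
     [1, 0, 0]]
# Placeholder states named "S11" ... "S33" (row = test, column = component).
S = [[f"S{i}{j}" for j in range(1, 4)] for i in range(1, 4)]
enriched = enrich(A, S)
```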

19 Research Method
Method: (1) Observe the system over time and sample components' state. (2) Learn states that correlate to failures. (3) Prioritize diagnoses with a stronger correlation to test failures.

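20 Research Learning Components' Behavior
Train set for method foo(): one row per sample from C_j in test_i, with columns Self, Arg1, ..., ArgN, Output/Exception, Failure (example values: Self = 21312, Arg1 = 0.756, ArgN = 1, Output/Exception = 5; a second row starts with Self = 23423, Arg1 = 0.223).
A sampled state S_ij (Self = 21312, Arg1 = 0.756, ArgN = 1, Output/Exception = 5) is passed to the learned model, which answers "State_ij -> t_i fails?" with b_ij, the correlation with failures (0.82 in the example). b_ij is then added to the input.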
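The presentation does not name the learner. As a stand-in, the sketch below estimates b for a state as the smoothed fraction of training samples with that exact state that belonged to failing tests. `learn_failure_rates` and the toy training rows are hypothetical; a real setup would likely use a proper classifier over the state features.

```python
from collections import defaultdict


def learn_failure_rates(train_rows, smoothing=1.0):
    """train_rows: iterable of (state, failed) pairs with hashable states.
    Returns a function mapping a state to an estimated failure rate b."""
    counts = defaultdict(lambda: [0, 0])  # state -> [failures, total]
    for state, failed in train_rows:
        counts[state][0] += int(failed)
        counts[state][1] += 1

    def b(state):
        fails, total = counts.get(state, (0, 0))
        # Laplace smoothing: unseen states get 0.5 rather than 0 or 1.
        return (fails + smoothing) / (total + 2 * smoothing)

    return b


# Toy train set for one method: state = (arg1, arg2), label = test failed?
train = [((True, "x"), True), ((True, "x"), True),
         ((True, "x"), True), ((False, "y"), False)]
b = learn_failure_rates(train)
b_seen_failing = b((True, "x"))
b_seen_passing = b((False, "y"))
b_unseen = b((False, "z"))
```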

21 Research Ranking Policy
The Goodness Function. Ranking the diagnosis candidates is done according to this policy, for a candidate diagnosis \omega and test i:
\[
\varepsilon =
\begin{cases}
\prod_{C_j \in \omega,\; a_{ij} = 1} (1 - b_{ij}) & \text{if } e_i = 0 \\
1 - \prod_{C_j \in \omega,\; a_{ij} = 1} (1 - b_{ij}) & \text{if } e_i = 1
\end{cases}
\]
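A small sketch of how this policy could be evaluated. The per-test function follows the formula on the slide; multiplying the per-test values into one likelihood and sorting by it is an assumption borrowed from BARINEL-style ranking, and the matrix `B` of b values is made up for illustration.

```python
from math import prod


def goodness(omega, a_row, b_row, e_i):
    """Per-test goodness of diagnosis omega (a set of component indices)."""
    p = prod(1.0 - b_row[j] for j in omega if a_row[j] == 1)
    return p if e_i == 0 else 1.0 - p


def likelihood(omega, A, B, e):
    """Assumed aggregation: product of the per-test goodness values."""
    return prod(goodness(omega, a, b, ei) for a, b, ei in zip(A, B, e))


def rank(diagnoses, A, B, e):
    """Order diagnoses from most to least likely."""
    return sorted(diagnoses, key=lambda d: likelihood(d, A, B, e), reverse=True)


A = [[1, 1, 0], [1, 0, 1], [1, 0, 0]]
e = [1, 1, 0]
# Hypothetical learned b_ij values (0.0 where the component did not run).
B = [[0.9, 0.1, 0.0], [0.9, 0.0, 0.1], [0.1, 0.0, 0.0]]
ranked = rank([{1, 2}, {0}], A, B, e)
score_c1 = likelihood({0}, A, B, e)
score_c2_c3 = likelihood({1, 2}, A, B, e)
```

With these made-up values the single-component diagnosis {C1} scores far higher than {C2, C3}, so it is ranked first.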

22 3. Experiment
Setup, Evaluated Algorithms, Evaluation Metrics & Results

23 Experiment Setup
[Elmishali, 2015] 4 real-world open source Java projects. Known bugs (using issue trackers). Generating 536 instances (134 per project).
Project | Tests | Methods | Bugs Reported | Bugs Fixed
Orient DB | 790 | 19,207 | 4,625 | 2,459
Eclipse CDT | 3,990 | 66,982 | 17,713 | 9,091
Apache Ant | 5,190 | 10,830 | 5,890 | 1,176
Apache POI | 2,346 | 21,475 | 3,361 | 1,408

24 Experiment Evaluated Algorithms
Our State-Augmented diagnoser compared against: BARINEL (Abreu et al.); Data-Augmented variant of BARINEL (Elmishali et al.)

25 Experiment Synthesizing Behavior
Using a synthetic behavior model to control the model's accuracy.
[Diagram: Software System; Generated Model; Spectrum-based Algorithm; Diagnosis]

26 Experiment Synthesizing Behavior
Using a synthetic behavior model to control the model's accuracy. Use the True Diagnosis (Ground Truth).
[Diagram: Software System; Generated Model; Spectrum-based Algorithm; Diagnosis]

27 Experiment Synthesizing Behavior
Using a synthetic behavior model to control the model's accuracy. Use the True Diagnosis (Ground Truth) plus Synthetic Noise.
[Diagram: Software System; Generated Model; Spectrum-based Algorithm; Diagnosis]

28 Experiment Synthesizing Behavior
Example. Given: a Ground Truth consisting of a single component, and a Synthetic Error of 0 versus 0.1.
[Table: the spectrum over C1, C2, C3 and test1, test2, test3 (rows 1, 1, 0 / 1, 0, 1 / 1, 0, 0; test failed? = 1, 1, 0) annotated with generated values: 1 and 0 with no synthetic error, 0.9 and 0.1 with a synthetic error of 0.1.]
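A rough sketch of how such synthetic values could be produced. It assumes the ideal value for an executed component is 1 when the component belongs to the ground truth and the test failed, and 0 otherwise, and that the synthetic error shifts every ideal value by a fixed amount. Both the function `synthetic_b` and this deterministic shift are assumptions; the actual generator may well inject random noise, and the choice of C1 as ground truth here is arbitrary.

```python
def synthetic_b(A, e, ground_truth, error):
    """Generate b_ij from the ground truth, degraded by a fixed error."""
    B = []
    for a_row, e_i in zip(A, e):
        row = []
        for j, a in enumerate(a_row):
            if not a:
                row.append(0.0)  # component did not run in this test
                continue
            ideal = 1.0 if (j in ground_truth and e_i == 1) else 0.0
            row.append(abs(ideal - error))  # 1 -> 1 - error, 0 -> error
        B.append(row)
    return B


A = [[1, 1, 0], [1, 0, 1], [1, 0, 0]]
e = [1, 1, 0]
B_exact = synthetic_b(A, e, ground_truth={0}, error=0.0)
B_noisy = synthetic_b(A, e, ground_truth={0}, error=0.1)
```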

29 Experiment Evaluation Metric
How to measure a diagnosis' quality? We used 3 known metrics: Weighted Average Precision, Weighted Average Recall, and Health State Wasted Effort.

30 Experiment Overview Results
Precision and Recall: with 0.15 synthetic error, similar results to the DA diagnoser; with 0.2 synthetic error, significantly better results than BARINEL.

31 Experiment Results
Health State Wasted Effort: with a 0.3 synthetic error rate, superior results over both diagnosers.

32 Experiment Conclusions
Even with 30% error, this technique can provide a significant improvement in candidate ranking

33 4. Roadmap
Challenges & Future Steps

34 Roadmap Challenges
Dealing with small data-sets (model per component): live systems / test generation; diagnosing on a higher level (e.g. class).
Learning states with imbalanced data-sets (only few faults): abnormal states rather than "correlative to faults".

35 Roadmap Future Steps
Instrument real software for a non-synthetic behavior approximation. Consider more variations for utilizing the learned behavior. Combine this work with orthogonal diagnosers.

36 THANKS! Any questions?

