Predicting Accurate and Actionable Static Analysis Warnings: An Experimental Approach
J. Ruthruff et al., University of Nebraska-Lincoln, NE, U.S.A., and Google Inc. ICSE 2008.
Presented 2015-06-02 by 박종화 (akdwhd0921@gmail.com), Computer Security & Operating Systems Lab
Index
Introduction
Background
Logistic regression models
Case study
Conclusions
Introduction
Static analysis tools detect software defects by analyzing a system without actually executing it. There are two well-known challenges. One challenge involves the accuracy of reported warnings. A second challenge, which has received less attention, is that warnings are not always acted on by developers even when they reveal true defects. The core elements of our approach are statistical models, built using screening, an incremental statistical process that quickly discards factors with low predictive power.
Background: FindBugs at Google
FindBugs is an open-source static analysis tool for Java programs. The tool analyzes Java bytecode and issues reports for 487 bug patterns, organized into seven categories: Bad Practice, Correctness, Internationalization, Malicious Code Vulnerability, Multithreaded Correctness, Performance, and Dodgy. At Google, FindBugs has been deployed using an enterprise-wide service model; a cost/benefit analysis identified this as a cost-effective approach for determining which defects are sufficiently interesting to report to developers.
Background: Logistic Regression Analysis
Logistic regression analysis is a type of categorical data analysis for predicting dependent variable values that follow binomial distributions. Logistic regression measures the relationship between the categorical dependent variable and one or more independent variables, which are usually (but not necessarily) continuous, by estimating probabilities. (Wikipedia)
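As a concrete illustration (not taken from the paper), the minimal sketch below fits a logistic regression model that maps warning factors to the probability that a warning is a true defect. The factor names, data values, and the use of scikit-learn are all assumptions made purely for illustration.

# Minimal sketch (not the paper's implementation): fit a logistic regression
# that predicts whether a static analysis warning is a true defect.
# Factor names and values are hypothetical, for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row is one warning; columns are illustrative factors such as
# FindBugs priority, file age in days, and recent change count.
X = np.array([
    [1, 120, 3],
    [2,  10, 0],
    [1, 400, 7],
    [3,   5, 1],
    [2, 250, 4],
    [3,  15, 0],
])
y = np.array([1, 0, 1, 0, 1, 0])  # 1 = true defect, 0 = false positive

model = LogisticRegression().fit(X, y)
# Predicted probability that a new, unseen warning is a true defect.
print(model.predict_proba([[2, 300, 5]])[0, 1])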
Logistic regression models
We aim to build statistical models that classify incoming static analysis warnings in order to reduce the cost of this process.
Logistic Regression Model Factors: we selected 33 factors to incorporate into the experimental screening methodology for generating our required models.
Experimental Screening Process
Screening experiments are designed to quickly yet systematically narrow down large groups of candidate factors and to focus the direction of research; they are used to discover the most significant factors. We consider a screening methodology with up to four stages that attempts to identify at least six predictive factors for a predictive model. The four stages examine 5%, 25%, 50%, and 100% of the total warnings.
Experimental Screening Process (continued)
The first stage of the screening methodology eliminates factors that appear to have little of the predictive power needed to build accurate models. The second stage examines an additional 20% of the static analysis warnings, bringing the total number of considered warnings to 25%. The third stage considers the next 25% of warnings, for a total of half of all warnings. The final stage uses the remaining 50% of the data. A sketch of this incremental screening loop follows.
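The sketch below illustrates the incremental screening idea in a hedged form. It assumes a factor is dropped when its coefficient is not statistically significant after fitting on the warnings examined so far; the slides do not state the exact elimination criterion, so the p-value cutoff and the function name are assumptions. The minimum of six retained factors and the 5%/25%/50%/100% stages come from the slides.

# Hedged sketch of the incremental screening idea (not the paper's exact
# procedure). Assumption: a factor is dropped when its coefficient's p-value
# exceeds a cutoff after fitting on the warnings examined so far.
import statsmodels.api as sm

def screen_factors(df, factors, label, stage_fracs=(0.05, 0.25, 0.50, 1.00),
                   p_cutoff=0.10, min_factors=6):
    """Fit logistic models on growing fractions of a pandas DataFrame df,
    discarding weak factors, but always keeping at least min_factors."""
    kept = list(factors)
    for frac in stage_fracs:
        sample = df.iloc[: int(len(df) * frac)]
        design = sm.add_constant(sample[kept])
        fit = sm.Logit(sample[label], design).fit(disp=False)
        weak = [f for f in kept if fit.pvalues[f] > p_cutoff]
        for f in weak:
            if len(kept) > min_factors:
                kept.remove(f)
    return kept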
Building Models From Screening Factors
Model for Predicting False Positives:
First stage (first 5% of warnings): the screening experiment eliminated 15 of the 33 factors.
Second stage (25% of warnings): five more factors were eliminated (5 of the remaining 18).
Third stage (50% of warnings): two more factors were eliminated (2 of the remaining 13).
Fourth stage (all warnings): two more factors were eliminated (2 of the remaining 11), leaving nine factors in the final model.
Output values close to 0.0 correspond to false-positive predictions, while values close to 1.0 correspond to true defects.
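To make the 0.0/1.0 reading concrete, here is a minimal sketch of turning the model's predicted probability into a decision; the 0.5 threshold and the function name are assumptions for illustration, not values stated in the slides.

# Minimal sketch: interpret the model's output probability for one warning.
# The 0.5 decision threshold is an assumption for illustration only.
def classify_warning(prob_true_defect, threshold=0.5):
    """Values near 0.0 suggest a false positive; values near 1.0 a true defect."""
    return "true defect" if prob_true_defect >= threshold else "false positive"

print(classify_warning(0.12))  # -> false positive
print(classify_warning(0.91))  # -> true defect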
Building Models From Screening Factors (continued)
Models for Actionable Warnings:
The first model is built using only those warnings identified as true defects (13 factors).
The second model is designed to predict actionable defects from all warnings, i.e., both false positives and legitimate warnings (15 factors).
Case study
The data set consists of 1,652 unique warnings selected from a population of tens of thousands of warnings seen over a nine-month period. The warnings in the data set were manually examined and classified as either false positives or true defects. Four models are compared:
Screening model: classifies warnings using the models built from our screening methodology.
All-Data model: collects data for every factor, for every sampled warning.
BOW model: based on the prior work of Bell, Ostrand, and Weyuker.
BOW+ model: the BOW model with the 'bug pattern' and 'priority' factors added.
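As a hedged illustration of how such candidate models might be compared on held-out warnings, the sketch below scores each factor set with plain accuracy on a random train/test split; the metric, the split, and the helper name are assumptions for illustration and are not taken from the paper's evaluation.

# Hedged sketch: compare candidate factor sets (e.g., Screening vs. All-Data)
# on held-out warnings using plain accuracy. The metric, the split, and the
# function name are assumptions for illustration, not the paper's setup.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def compare_models(feature_sets, X_all, y):
    """feature_sets maps a model name to the factor columns it uses;
    X_all is a pandas DataFrame of factor values, y the 0/1 labels."""
    results = {}
    for name, cols in feature_sets.items():
        X_train, X_test, y_train, y_test = train_test_split(
            X_all[cols], y, test_size=0.3, random_state=0)
        model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
        results[name] = accuracy_score(y_test, model.predict(X_test))
    return results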
Results and Discussion
Conclusions
The proposed screening approach to model building reduces cost by quickly discarding metrics with low predictive power. The screening-based models were able to accurately predict false-positive warnings over 85% of the time on average, and actionable warnings over 70% of the time. This work also indicates that regression models may be effective in settings involving static analysis warnings, and shows promise for future work in this area.
References
FindBugs. http://findbugs.sourceforge.net/
N. Ayewah, W. Pugh, J. D. Morgenthaler, J. Penix, and Y. Zhou. Evaluating static analysis defect warnings on production software. In Proc. 7th ACM Workshop on Prog. Analysis for Softw. Tools and Eng., pages 168–179, 2007.
Wikipedia. en.wikipedia.org/wiki/
Thank You!