Download presentation
Presentation is loading. Please wait.
Published byBrenda Farmer Modified over 9 years ago
1
Consensus-based Mining of API Preconditions in Big Code Hoan NguyenRobert DyerTien N. NguyenHridesh Rajan
2
API Preconditions 2 Constraints on the receiver and parameters that must hold right before calling the API Constraints on the receiver and parameters that must hold right before calling the API java.lang.String.substring(int start, int end) start >= 0 start <= end end <= length() start >= 0 start <= end end <= length()
3
API Preconditions 3 Must hold to guarantee the method behaves as expected Otherwise Bug
4
while (...) { start++; if (start >= lenMacro) break; ch = macro.substring(start, 1); } while (...) { start++; if (start >= lenMacro) break; ch = macro.substring(start, 1); } A bug with String.substring(int, int) in project MSS Code Factory 4
5
while (...) { start++; if (start >= lenMacro) break; ch = macro.substring(start, 1); } while (...) { start++; if (start >= lenMacro) break; ch = macro.substring(start, 1); } A bug with String.substring(int, int) in project MSS Code Factory 5 StringIndexOutOfBoundsException when start > 1
6
while (...) { start++; if (start >= lenMacro) break; ch = macro.substring(start, 1); } while (...) { start++; if (start >= lenMacro) break; ch = macro.substring(start, 1); } A bug with String.substring(int, int) in project MSS Code Factory 6 StringIndexOutOfBoundsException when start > 1
7
while (...) { start++; if (start >= lenMacro) break; ch = macro.substring(start, 1); } while (...) { start++; if (start >= lenMacro) break; ch = macro.substring(start, 1); } A bug with String.substring(int, int) in project MSS Code Factory 7
8
while (...) { start++; if (start >= lenMacro) break; ch = macro.substring(start, 1); } while (...) { start++; if (start >= lenMacro) break; ch = macro.substring(start, 1); } macro.substring(start, start + 1) A precondition-related bug fix in project MSS Code Factory 8 fix bug
9
Study on the precondition-related bug fixing Data collection – SourceForge projects: 3,413 – Revisions: ~2M – Fixing revisions: ~370,000 Method – Analyzing code changes in each revision – Using heuristics to identify candidate fixing changes that added precondition(s) – Verifying candidates manually 9
10
Study on the precondition-related bug fixing Result – Candidates: 3,130 (0.85%) fixing revisions 4,399 call sites – Manually verify a sample of 100 call sites 80 are actually related to missing preconditions 10
11
Use of Preconditions Writing code conforming to the specifications Automated program verification – Runtime assertion checking – Extended static checking Bug detection Automatic test case generation 11
12
Challenges Manually specifying the specifications is time-consuming Not many APIs are released with specified specifications 12 Mining Specifications Automatically
13
13 Prior Work Focuses on single projects This Work Uses consensus across large number of projects to separate API-specific preconditions (wheat) from project-specific constraints (chaff)
14
Precondition Mining – Key Ideas 14 Preconditions can be mined from guard conditions at the call sites of the code using the APIs Preconditions mined from multiple projects in a large-scale code corpus can be used to filter out chaff
15
15 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 Key Ideas Preconditions can be mined from guard conditions at the call sites of the code using the APIs Preconditions mined from multiple projects in a large-scale code corpus can be used to filter out chaff
16
16 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 Key Ideas Preconditions can be mined from guard conditions at the call sites of the code using the APIs Preconditions mined from multiple projects in a large-scale code corpus can be used to filter out chaff
17
17 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 Key Ideas Preconditions can be mined from guard conditions at the call sites of the code using the APIs Preconditions mined from multiple projects in a large-scale code corpus can be used to filter out chaff
18
api() C1 C3 C2 api() C1 C3 api() C1 C3 C2 18 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 Key Ideas Preconditions can be mined from guard conditions at the call sites of the code using the APIs Preconditions mined from multiple projects in a large-scale code corpus can be used to filter out chaff
19
19 Preconditions can be mined from guard conditions at the call sites of the code using the APIs Client code of API String.substring(int,int) in project SeMoA at revision 1929 completePath_.substring(servletPathStart, extraPathStart) servletPathStart >= 0 extraPathStart >= 0 servletPathStart <= completePath_.length() extraPathStart <= completePath_.length() servletPathStart <= extraPathStart servletPathStart >= 0 extraPathStart >= 0 servletPathStart <= completePath_.length() extraPathStart <= completePath_.length() servletPathStart <= extraPathStart
20
20 Project-specific guard conditions will be filtered completePath_.substring(servletPathStart, extraPathStart) completePath_.charAt(servletPathStart) == ‘/’ completePath_.charAt(extraPathStart) == ‘/’
21
Consensus-based Precondition Mining 21 Client method M 1 Client method M 1 Conditions 0 <= start start <= end end <= length contains(‘@’) 0 <= start start <= end end <= length contains(‘@’) Build CFG Extract and Normalize Conditions Extract and Normalize Conditions Infer 0 <= start start <= end end <= length 0 <= start start <= end end <= length Client method M N Client method M N 0 < start start <= end end <= length ends(‘\n’) 0 < start start <= end end <= length ends(‘\n’)... Client method M i Client method M i... Preconditions 0 = start start <= end end <= length starts(‘/’) 0 = start start <= end end <= length starts(‘/’) api(...) Build CFG Extract and Normalize Conditions Extract and Normalize Conditions Build CFG Extract and Normalize Conditions Extract and Normalize Conditions Filter and Rank Filter and Rank Conditions 0 <= start start <= end end <= length contains(‘@’) 0 <= start start <= end end <= length contains(‘@’) api(...)
22
Precondition Mining 22 Entry Exit string.substring (start, end) c1 start > end Exit do_true do_false true false true false client method class String... substring (int, int)... class String... substring (int, int)... Build CFG
23
Precondition Mining 23 Entry Exit string.substring (start, end) c1 start > end Exit do_true do_false true false true false substring (int, int): {start <= end} client method Build CFG control-dependent
24
Precondition Mining 24 Entry Exit string.substring (start, end) c1 start > end Exit do_true do_false true false true false substring (int, int): {start <= end} client method Build CFG
25
Precondition Mining 25 Entry Exit string.substring (start, end) c1 start > end Exit do_true do_false true false true false substring (int, int): {start <= end} client method Build CFG
26
Precondition Mining 26 Entry Exit string.substring (start, end) c1 start > end Exit do_true do_false true false true false substring (int, int): {start <= end} client method Build CFG Abstract substring (int, int): {arg0 <= arg1}
27
Precondition Mining 27 arg0 < arg1 arg1 > arg0 arg0 – arg1 < 0 arg1 – arg0 > 0
28
arg0 < arg1 arg1 > arg0 arg0 – arg1 < 0 arg1 – arg0 > 0 Precondition Mining 28 Normalize arg0 < arg1
29
Precondition Mining 29 arg0 – arg1 == 0 arg0 == arg1 arg0 < arg1 arg1 > arg0 arg0 – arg1 < 0 arg1 – arg0 > 0 Normalize
30
Precondition Mining 30 Infer arg0 – arg1 == 0 arg0 == arg1 arg0 < arg1 arg1 > arg0 arg0 – arg1 < 0 arg1 – arg0 > 0 Normalize arg0 <= arg1
31
Precondition Mining 31 confidence > Filter and Rank arg0 <= arg1 Infer arg0 – arg1 == 0 arg0 == arg1 arg0 < arg1 arg1 > arg0 arg0 – arg1 < 0 arg1 – arg0 > 0 Normalize arg0 <= arg1
32
Precondition Mining 32 confidence > confidence < Filter and Rank arg0 <= 12 arg0 <= arg1 Infer arg0 – arg1 == 0 arg0 == arg1 arg0 < arg1 arg1 > arg0 arg0 – arg1 < 0 arg1 – arg0 > 0 Normalize arg0 <= arg1 arg0 <= 12
33
Tool Implementation Eclipse plugin 33
34
Datasets 34 SourceForgeApache Projects3,413146 Total source files497,453132,951 Total classes600,274173,120 Total methods4,735,1511,243,911 Total SLOCs92,495,41025,117,837 Total used JDK classes806 (63%)918 (72%) Total used JDK methods7,592 (63%)6,109 (55%) Total method calls22,308,2515,544,437 Total JDK method calls5,588,4871,271,210
35
Datasets 35 SourceForgeApache Projects3,413146 Total source files497,453132,951 Total classes600,274173,120 Total methods4,735,1511,243,911 Total SLOCs92,495,41025,117,837 Total used JDK classes806 (63%)918 (72%) Total used JDK methods7,592 (63%)6,109 (55%) Total method calls22,308,2515,544,437 Total JDK method calls5,588,4871,271,210 Almost 120 millions lines of source code
36
Datasets 36 SourceForgeApache Projects3,413146 Total source files497,453132,951 Total classes600,274173,120 Total methods4,735,1511,243,911 Total SLOCs92,495,41025,117,837 Total used JDK classes806 (63%)918 (72%) Total used JDK methods7,592 (63%)6,109 (55%) Total method calls22,308,2515,544,437 Total JDK method calls5,588,4871,271,210 Not all JDK APIs are used in SourceForge and Apache
37
Datasets 37 SourceForgeApache Projects3,413146 Total source files497,453132,951 Total classes600,274173,120 Total methods4,735,1511,243,911 Total SLOCs92,495,41025,117,837 Total used JDK classes806 (63%)918 (72%) Total used JDK methods7,592 (63%)6,109 (55%) Total method calls22,308,2515,544,437 Total JDK method calls5,588,4871,271,210 More than 20% method calls are to JDK APIs
38
Evaluation – Accuracy 38 Data collection Building ground-truth Extracting preconditions from published formal specifications for JDK APIs on JML website Building ground-truth Extracting preconditions from published formal specifications for JDK APIs on JML website /*@ public normal_behavior @ requires 0 <= beginIndex @ && beginIndex <= endIndex @ && endIndex <= length(); @ … /*@ public behavior @ … @ signals (NoSuchElementException) isEmpty(); @*/
39
Building ground-truth 39 Number of methods: 797 Number of preconditions: 1155
40
Evaluation – Accuracy 40 Data collection Metrics Precision Recall Metrics Precision Recall
41
Running Time for Mining Preconditions of 797 JDK APIs 41 SLOCsClient methodsTime SourceForge92M4.7M17h35m Apache25M1.2M34m
42
Types of Incorrectly-mined Preconditions Type 1. The mined preconditions are stronger than specified – java.util.List.add(Object obj): obj != null 42 DatasetTotalStrongerSpecificAnalysis Error SourceForge173118532 Apache187121651 Both195129660 Type 2. The mined preconditions are project-specific, but common – java.lang.Math.min(double a, double b): a > 0, b > 0 Type 3. The mined preconditions are incorrect due to error in analysis – java.lang.StringBuffer.ensureCapacity(int capacity): capacity <= 0 Developers sometimes check stronger preconditions than specified
43
Types of Missing Preconditions 43 PrivateNo callNo occurLow confidence SF4% 9%3% Apache5% 12%3% Both5%2%10%4% Preconditions involve private element(s) of classes Type 1. Private Methods are never called Type 2. No call Preconditions are never checked Type 3. No occur Preconditions are checked with low confidence Type 4. Low frequency
44
Accuracy over Preconditions 44 PrecisionRecall SourceForge84%79% Apache82%75% Both83%80%
45
Newly Found Preconditions 45 5 preconditions are newly found for the JDK API methods that has already had JML specifications MethodPrecondition String. getChars(int,int,char[],int)arg3 >= 0 StringBuffer.append(char[])arg0 != null BitSet.flip(int, int)arg0 <= arg1 BitSet.set(int, int)arg0 <= arg1 BitSet.set(int, int, boolean)arg0 <= arg1
46
Accuracy by size 46 SourceForgeApache
47
Evaluation – Usefulness Suggesting preconditions for writing formal specification Web-based survey 47
48
Suggesting Preconditions for Writing Formal Specifications 48 ClassMethodSuggestAccept StringBufferdelete(int,int)3Y replace(int,int,String)2Y* setLength(int)1Y subSequence(int,int)3Y substring(int,int)3Y LinkedListadd(int,Object)2Y addAll(int,Collection)3Y get(int)2Y listIterator(int)2Y remove(int)2Y set(int,Object)2Y 2 classes11 methods25 miss
49
Conclusions Mining API preconditions from large code corpus – 120 million SLOCs on SourceForge and Apache Tool implementation: Eclipse plugin High accuracy – Recall: 75–80% and Precision: 82–84% Useful for helping write specifications – All suggestions are accepted by specification writer 49
50
Contact Hoan Nguyen hoan@iastate.edu 50
51
Web-based Survey http://boa.cs.iastate.edu/jml/ http://boa.cs.iastate.edu/jml/ 51 API method
52
Web-based Survey http://boa.cs.iastate.edu/jml/ http://boa.cs.iastate.edu/jml/ 52 Documentation links
53
Web-based Survey http://boa.cs.iastate.edu/jml/ http://boa.cs.iastate.edu/jml/ 53 Mined preconditions
54
Web-based Survey http://boa.cs.iastate.edu/jml/ http://boa.cs.iastate.edu/jml/ 54 Rating on correctness
55
Web-based Survey http://boa.cs.iastate.edu/jml/ http://boa.cs.iastate.edu/jml/ 55 Rating on correctness Rating on usefulness
56
Web-based Survey http://boa.cs.iastate.edu/jml/ http://boa.cs.iastate.edu/jml/ 56
57
Web-based Survey http://boa.cs.iastate.edu/jml/ http://boa.cs.iastate.edu/jml/ 57
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.