Presentation is loading. Please wait.

Presentation is loading. Please wait.

Consensus-based Mining of API Preconditions in Big Code Hoan NguyenRobert DyerTien N. NguyenHridesh Rajan.

Similar presentations


Presentation on theme: "Consensus-based Mining of API Preconditions in Big Code Hoan NguyenRobert DyerTien N. NguyenHridesh Rajan."— Presentation transcript:

1 Consensus-based Mining of API Preconditions in Big Code Hoan NguyenRobert DyerTien N. NguyenHridesh Rajan

2 API Preconditions 2 Constraints on the receiver and parameters that must hold right before calling the API Constraints on the receiver and parameters that must hold right before calling the API java.lang.String.substring(int start, int end) start >= 0 start <= end end <= length() start >= 0 start <= end end <= length()

3 API Preconditions 3 Must hold to guarantee the method behaves as expected Otherwise  Bug

4 while (...) { start++; if (start >= lenMacro) break; ch = macro.substring(start, 1); } while (...) { start++; if (start >= lenMacro) break; ch = macro.substring(start, 1); } A bug with String.substring(int, int) in project MSS Code Factory 4

5 while (...) { start++; if (start >= lenMacro) break; ch = macro.substring(start, 1); } while (...) { start++; if (start >= lenMacro) break; ch = macro.substring(start, 1); } A bug with String.substring(int, int) in project MSS Code Factory 5 StringIndexOutOfBoundsException when start > 1

6 while (...) { start++; if (start >= lenMacro) break; ch = macro.substring(start, 1); } while (...) { start++; if (start >= lenMacro) break; ch = macro.substring(start, 1); } A bug with String.substring(int, int) in project MSS Code Factory 6 StringIndexOutOfBoundsException when start > 1

7 while (...) { start++; if (start >= lenMacro) break; ch = macro.substring(start, 1); } while (...) { start++; if (start >= lenMacro) break; ch = macro.substring(start, 1); } A bug with String.substring(int, int) in project MSS Code Factory 7

8 while (...) { start++; if (start >= lenMacro) break; ch = macro.substring(start, 1); } while (...) { start++; if (start >= lenMacro) break; ch = macro.substring(start, 1); } macro.substring(start, start + 1) A precondition-related bug fix in project MSS Code Factory 8 fix bug

9 Study on the precondition-related bug fixing Data collection – SourceForge projects: 3,413 – Revisions: ~2M – Fixing revisions: ~370,000 Method – Analyzing code changes in each revision – Using heuristics to identify candidate fixing changes that added precondition(s) – Verifying candidates manually 9

10 Study on the precondition-related bug fixing Result – Candidates: 3,130 (0.85%) fixing revisions 4,399 call sites – Manually verify a sample of 100 call sites 80 are actually related to missing preconditions 10

11 Use of Preconditions Writing code conforming to the specifications Automated program verification – Runtime assertion checking – Extended static checking Bug detection Automatic test case generation 11

12 Challenges Manually specifying the specifications is time-consuming Not many APIs are released with specified specifications 12 Mining Specifications Automatically

13 13 Prior Work Focuses on single projects This Work Uses consensus across large number of projects to separate API-specific preconditions (wheat) from project-specific constraints (chaff)

14 Precondition Mining – Key Ideas 14 Preconditions can be mined from guard conditions at the call sites of the code using the APIs Preconditions mined from multiple projects in a large-scale code corpus can be used to filter out chaff

15 15 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 Key Ideas Preconditions can be mined from guard conditions at the call sites of the code using the APIs Preconditions mined from multiple projects in a large-scale code corpus can be used to filter out chaff

16 16 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 Key Ideas Preconditions can be mined from guard conditions at the call sites of the code using the APIs Preconditions mined from multiple projects in a large-scale code corpus can be used to filter out chaff

17 17 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 Key Ideas Preconditions can be mined from guard conditions at the call sites of the code using the APIs Preconditions mined from multiple projects in a large-scale code corpus can be used to filter out chaff

18 api() C1 C3 C2 api() C1 C3 api() C1 C3 C2 18 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 api() C1 C3 C2 Key Ideas Preconditions can be mined from guard conditions at the call sites of the code using the APIs Preconditions mined from multiple projects in a large-scale code corpus can be used to filter out chaff

19 19 Preconditions can be mined from guard conditions at the call sites of the code using the APIs Client code of API String.substring(int,int) in project SeMoA at revision 1929 completePath_.substring(servletPathStart, extraPathStart) servletPathStart >= 0 extraPathStart >= 0 servletPathStart <= completePath_.length() extraPathStart <= completePath_.length() servletPathStart <= extraPathStart servletPathStart >= 0 extraPathStart >= 0 servletPathStart <= completePath_.length() extraPathStart <= completePath_.length() servletPathStart <= extraPathStart

20 20 Project-specific guard conditions will be filtered completePath_.substring(servletPathStart, extraPathStart) completePath_.charAt(servletPathStart) == ‘/’ completePath_.charAt(extraPathStart) == ‘/’

21 Consensus-based Precondition Mining 21 Client method M 1 Client method M 1 Conditions 0 <= start start <= end end <= length contains(‘@’) 0 <= start start <= end end <= length contains(‘@’) Build CFG Extract and Normalize Conditions Extract and Normalize Conditions Infer 0 <= start start <= end end <= length 0 <= start start <= end end <= length Client method M N Client method M N 0 < start start <= end end <= length ends(‘\n’) 0 < start start <= end end <= length ends(‘\n’)... Client method M i Client method M i... Preconditions 0 = start start <= end end <= length starts(‘/’) 0 = start start <= end end <= length starts(‘/’) api(...) Build CFG Extract and Normalize Conditions Extract and Normalize Conditions Build CFG Extract and Normalize Conditions Extract and Normalize Conditions Filter and Rank Filter and Rank Conditions 0 <= start start <= end end <= length contains(‘@’) 0 <= start start <= end end <= length contains(‘@’) api(...)

22 Precondition Mining 22 Entry Exit string.substring (start, end) c1 start > end Exit do_true do_false true false true false client method class String... substring (int, int)... class String... substring (int, int)... Build CFG

23 Precondition Mining 23 Entry Exit string.substring (start, end) c1 start > end Exit do_true do_false true false true false substring (int, int): {start <= end} client method Build CFG control-dependent

24 Precondition Mining 24 Entry Exit string.substring (start, end) c1 start > end Exit do_true do_false true false true false substring (int, int): {start <= end} client method Build CFG

25 Precondition Mining 25 Entry Exit string.substring (start, end) c1 start > end Exit do_true do_false true false true false substring (int, int): {start <= end} client method Build CFG

26 Precondition Mining 26 Entry Exit string.substring (start, end) c1 start > end Exit do_true do_false true false true false substring (int, int): {start <= end} client method Build CFG Abstract substring (int, int): {arg0 <= arg1}

27 Precondition Mining 27 arg0 < arg1 arg1 > arg0 arg0 – arg1 < 0 arg1 – arg0 > 0

28 arg0 < arg1 arg1 > arg0 arg0 – arg1 < 0 arg1 – arg0 > 0 Precondition Mining 28 Normalize arg0 < arg1

29 Precondition Mining 29 arg0 – arg1 == 0 arg0 == arg1 arg0 < arg1 arg1 > arg0 arg0 – arg1 < 0 arg1 – arg0 > 0 Normalize

30 Precondition Mining 30 Infer arg0 – arg1 == 0 arg0 == arg1 arg0 < arg1 arg1 > arg0 arg0 – arg1 < 0 arg1 – arg0 > 0 Normalize arg0 <= arg1

31 Precondition Mining 31 confidence >  Filter and Rank arg0 <= arg1 Infer arg0 – arg1 == 0 arg0 == arg1 arg0 < arg1 arg1 > arg0 arg0 – arg1 < 0 arg1 – arg0 > 0 Normalize arg0 <= arg1

32 Precondition Mining 32 confidence >  confidence <  Filter and Rank arg0 <= 12 arg0 <= arg1 Infer arg0 – arg1 == 0 arg0 == arg1 arg0 < arg1 arg1 > arg0 arg0 – arg1 < 0 arg1 – arg0 > 0 Normalize arg0 <= arg1 arg0 <= 12

33 Tool Implementation Eclipse plugin 33

34 Datasets 34 SourceForgeApache Projects3,413146 Total source files497,453132,951 Total classes600,274173,120 Total methods4,735,1511,243,911 Total SLOCs92,495,41025,117,837 Total used JDK classes806 (63%)918 (72%) Total used JDK methods7,592 (63%)6,109 (55%) Total method calls22,308,2515,544,437 Total JDK method calls5,588,4871,271,210

35 Datasets 35 SourceForgeApache Projects3,413146 Total source files497,453132,951 Total classes600,274173,120 Total methods4,735,1511,243,911 Total SLOCs92,495,41025,117,837 Total used JDK classes806 (63%)918 (72%) Total used JDK methods7,592 (63%)6,109 (55%) Total method calls22,308,2515,544,437 Total JDK method calls5,588,4871,271,210 Almost 120 millions lines of source code

36 Datasets 36 SourceForgeApache Projects3,413146 Total source files497,453132,951 Total classes600,274173,120 Total methods4,735,1511,243,911 Total SLOCs92,495,41025,117,837 Total used JDK classes806 (63%)918 (72%) Total used JDK methods7,592 (63%)6,109 (55%) Total method calls22,308,2515,544,437 Total JDK method calls5,588,4871,271,210 Not all JDK APIs are used in SourceForge and Apache

37 Datasets 37 SourceForgeApache Projects3,413146 Total source files497,453132,951 Total classes600,274173,120 Total methods4,735,1511,243,911 Total SLOCs92,495,41025,117,837 Total used JDK classes806 (63%)918 (72%) Total used JDK methods7,592 (63%)6,109 (55%) Total method calls22,308,2515,544,437 Total JDK method calls5,588,4871,271,210 More than 20% method calls are to JDK APIs

38 Evaluation – Accuracy 38 Data collection Building ground-truth Extracting preconditions from published formal specifications for JDK APIs on JML website Building ground-truth Extracting preconditions from published formal specifications for JDK APIs on JML website /*@ public normal_behavior @ requires 0 <= beginIndex @ && beginIndex <= endIndex @ && endIndex <= length(); @ … /*@ public behavior @ … @ signals (NoSuchElementException) isEmpty(); @*/

39 Building ground-truth 39 Number of methods: 797 Number of preconditions: 1155

40 Evaluation – Accuracy 40 Data collection Metrics Precision Recall Metrics Precision Recall

41 Running Time for Mining Preconditions of 797 JDK APIs 41 SLOCsClient methodsTime SourceForge92M4.7M17h35m Apache25M1.2M34m

42 Types of Incorrectly-mined Preconditions Type 1. The mined preconditions are stronger than specified – java.util.List.add(Object obj): obj != null 42 DatasetTotalStrongerSpecificAnalysis Error SourceForge173118532 Apache187121651 Both195129660 Type 2. The mined preconditions are project-specific, but common – java.lang.Math.min(double a, double b): a > 0, b > 0 Type 3. The mined preconditions are incorrect due to error in analysis – java.lang.StringBuffer.ensureCapacity(int capacity): capacity <= 0 Developers sometimes check stronger preconditions than specified

43 Types of Missing Preconditions 43 PrivateNo callNo occurLow confidence SF4% 9%3% Apache5% 12%3% Both5%2%10%4% Preconditions involve private element(s) of classes Type 1. Private Methods are never called Type 2. No call Preconditions are never checked Type 3. No occur Preconditions are checked with low confidence Type 4. Low frequency

44 Accuracy over Preconditions 44 PrecisionRecall SourceForge84%79% Apache82%75% Both83%80%

45 Newly Found Preconditions 45 5 preconditions are newly found for the JDK API methods that has already had JML specifications MethodPrecondition String. getChars(int,int,char[],int)arg3 >= 0 StringBuffer.append(char[])arg0 != null BitSet.flip(int, int)arg0 <= arg1 BitSet.set(int, int)arg0 <= arg1 BitSet.set(int, int, boolean)arg0 <= arg1

46 Accuracy by size 46 SourceForgeApache

47 Evaluation – Usefulness Suggesting preconditions for writing formal specification Web-based survey 47

48 Suggesting Preconditions for Writing Formal Specifications 48 ClassMethodSuggestAccept StringBufferdelete(int,int)3Y replace(int,int,String)2Y* setLength(int)1Y subSequence(int,int)3Y substring(int,int)3Y LinkedListadd(int,Object)2Y addAll(int,Collection)3Y get(int)2Y listIterator(int)2Y remove(int)2Y set(int,Object)2Y 2 classes11 methods25 miss

49 Conclusions Mining API preconditions from large code corpus – 120 million SLOCs on SourceForge and Apache Tool implementation: Eclipse plugin High accuracy – Recall: 75–80% and Precision: 82–84% Useful for helping write specifications – All suggestions are accepted by specification writer 49

50 Contact Hoan Nguyen hoan@iastate.edu 50

51 Web-based Survey http://boa.cs.iastate.edu/jml/ http://boa.cs.iastate.edu/jml/ 51 API method

52 Web-based Survey http://boa.cs.iastate.edu/jml/ http://boa.cs.iastate.edu/jml/ 52 Documentation links

53 Web-based Survey http://boa.cs.iastate.edu/jml/ http://boa.cs.iastate.edu/jml/ 53 Mined preconditions

54 Web-based Survey http://boa.cs.iastate.edu/jml/ http://boa.cs.iastate.edu/jml/ 54 Rating on correctness

55 Web-based Survey http://boa.cs.iastate.edu/jml/ http://boa.cs.iastate.edu/jml/ 55 Rating on correctness Rating on usefulness

56 Web-based Survey http://boa.cs.iastate.edu/jml/ http://boa.cs.iastate.edu/jml/ 56

57 Web-based Survey http://boa.cs.iastate.edu/jml/ http://boa.cs.iastate.edu/jml/ 57


Download ppt "Consensus-based Mining of API Preconditions in Big Code Hoan NguyenRobert DyerTien N. NguyenHridesh Rajan."

Similar presentations


Ads by Google