Published by Pearl Mitchell. Modified over 9 years ago.
1
Mining Source Code Change History for Program Understanding
Chadd Williams
University of Maryland
2
Problem
How much do you know about your 10-year-old code base?
  – What types of bugs have been most common?
Implicit rules build up over time
  – What do you do with a return value from a function?
  – Didn't someone rewrite the matrix objects? How do you apply a transformation to an image now?
Failure to understand implicit rules leads to bugs
  – 32% of bugs detected during maintenance [1]
[1] Matsumura, T., Monden, A., Matsumoto, K., The Detection of Faulty Code Violating Implicit Coding Rules, IWPSE '02
3
Source Code Change History
We can discover important properties of the code by looking at code changes
  – every change is committed
  – changes highlight misunderstood code
  – changes highlight new code
Studying each commit gives fine-grained knowledge
  – how quickly does a property emerge?
  – how fast is a property adopted?
  – how often is it used later?
4
Applications
Bug finding
  – what types of bugs have been fixed in the past?
  – what functions were involved?
  – Return Value Check Bug Finder
Code writing
  – how do we use that API?
  – how do we access that data structure?
  – Function Usage Pattern Miner

    open(f)
    tmp = cnt = 0
    while(cnt < sz && tmp != -1)
        tmp = read(f,sz)
        if(tmp != -1)
            cnt += tmp
    close(f)
5
Return Value Check Bug
Returning an error code and valid data from the same function is a common C idiom

    int foo(){
        …
        if( error ){
            return error_code;
        }
        …
        return data;
    }
    …
    value = foo();
    newPosition += value; // ???

  – the return value should be checked before being used
  – lint checks for this error
Checking for this bug pattern has a high false positive rate
  – no error value returned
  – no useful return value
Build a bug checker
  – improve its results with data from CVS
6
Goal
Which warnings are most likely true errors?
  – where has the source code been changed to add such a check?
  – look at each revision of each file in CVS
  – flag a function when a commit in the CVS repository adds a check of its return value
Produce a ranking of the errors
  – group warnings by called function
  – rank functions that most likely need their return value checked higher

Before the CVS commit:
    value = foo();
    newPosition += value; // ???
After the CVS commit:
    value = foo();
    if( value != Error) // Check
        newPosition += value;
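The flagging step can be pictured with a minimal sketch. It assumes each revision of a file has already been reduced to a mapping from call site to "is the return value checked here?". The `CallSite` and `Revision` aliases and the `flag_functions` helper are hypothetical names introduced for illustration; the slides do not say how the tool matches call sites across revisions.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: a call site is (enclosing function, called function),
# and a revision maps each call site to True if the return value is checked there.
CallSite = Tuple[str, str]
Revision = Dict[CallSite, bool]


def flag_functions(history: List[Revision]) -> Set[str]:
    """Return the called functions for which some commit added a return value check."""
    flagged: Set[str] = set()
    # Compare every revision with the one immediately before it.
    for before, after in zip(history, history[1:]):
        for site, checked_now in after.items():
            # The call existed unchecked in the earlier revision and is checked now.
            if checked_now and before.get(site) is False:
                flagged.add(site[1])
    return flagged


# Two revisions of one file: the call to foo() gains a check in the second one.
history = [
    {("move", "foo"): False, ("draw", "bar"): True},
    {("move", "foo"): True, ("draw", "bar"): True},
]
print(flag_functions(history))
```

A call site that first appears already checked is not counted here, since nothing was fixed; that choice is an assumption of the sketch.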
7
HistoryAware Ranking
Split functions into two groups
  – flagged with a likely bug fix in a commit
  – not flagged with a likely bug fix in a commit
Rank by how often the function's return value is checked in the latest version
  – current context
[Figure: two groups, each ranked by current context data. Flagged with likely bug fix in CVS: 0.99 to 0.10. Not flagged with likely bug fix in CVS: 0.99 to 0.51.]
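The two-group ranking can be sketched as follows. This is an illustration under assumptions, not the tool's code: `check_ratio` (the fraction of call sites in the latest version that check the return value) and `flagged` (functions with a likely bug fix in the history) are hypothetical inputs, the function names are placeholders, and placing the flagged group ahead of the unflagged group is inferred from the slide's goal of ranking likely bugs higher.

```python
from typing import Dict, List, Set, Tuple


def history_aware_ranking(
    check_ratio: Dict[str, float], flagged: Set[str]
) -> List[Tuple[str, float]]:
    """Rank called functions: history-flagged group first, each group by check ratio."""
    in_history = [(f, r) for f, r in check_ratio.items() if f in flagged]
    not_in_history = [(f, r) for f, r in check_ratio.items() if f not in flagged]
    # Within each group, a higher check ratio in the current context ranks higher.
    in_history.sort(key=lambda item: item[1], reverse=True)
    not_in_history.sort(key=lambda item: item[1], reverse=True)
    return in_history + not_in_history


# Placeholder functions and ratios, chosen only to show the ordering.
check_ratio = {"foo": 0.10, "bar": 0.99, "baz": 0.51, "qux": 0.99}
flagged = {"foo", "bar"}
for name, ratio in history_aware_ranking(check_ratio, flagged):
    print(name, ratio)
```

Note that `foo`, checked at only 10% of its call sites, still outranks every unflagged function because the history says a missing check on it was once fixed.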
8
Case Studies
Does the HistoryAware ranking push likely bugs to the top?
Compare HistoryAware Ranking to Naïve Ranking
  – current context
Inspection criteria for warnings
  – functions flagged with a bug fix in a commit
  – functions with return value checked >50% in current context
Apache web server
  – 1,129 C source files
  – 41,000 CVS commits
Wine, OSS Windows API
  – 3,099 C source files
  – 70,000 CVS commits
9
Results - Apache

                                    Warnings   Likely Bugs   False Positive Rate
CVS Bug Fix flagged functions          284         101              64%
Non-CVS Bug Fix flagged functions      283          70              75%
Total                                  567         171              70%

Statistical Significance
  – Chi-square test finds the difference between the false positive rate of the CVS bug fix flagged functions and that of the functions checked > 50% in the current context to be significant
10
Results - Wine

                                    Warnings   Likely Bugs   False Positive Rate
CVS Bug Fix flagged functions          778         260              67%
Non-CVS Bug Fix flagged functions     1537         285              81%
Total                                 2315         545              76%

Statistical Significance
  – Chi-square test finds the difference between the false positive rate of the CVS bug fix flagged functions and that of the functions checked > 50% in the current context to be significant
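The reported rates and the significance claim can be reproduced from the warning and likely-bug counts in the two result tables. The sketch below assumes a plain Pearson chi-square test on a 2x2 table without continuity correction; the slides do not say which variant was used, and the helper names are illustrative.

```python
def false_positive_rate(warnings: int, likely_bugs: int) -> float:
    """Fraction of warnings that were not judged to be likely bugs."""
    return (warnings - likely_bugs) / warnings


def chi_square_2x2(bugs_a: int, warnings_a: int, bugs_b: int, warnings_b: int) -> float:
    """Pearson chi-square statistic (1 degree of freedom) for two warning groups."""
    a, b = bugs_a, warnings_a - bugs_a  # group A: likely bugs, false positives
    c, d = bugs_b, warnings_b - bugs_b  # group B: likely bugs, false positives
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Critical value of chi-square with 1 degree of freedom at the 5% level.
CRITICAL_5_PERCENT_DF1 = 3.841

# (warnings, likely bugs) for CVS-flagged and non-CVS-flagged functions.
results = {
    "Apache": ((284, 101), (283, 70)),
    "Wine": ((778, 260), (1537, 285)),
}
for project, ((w_cvs, b_cvs), (w_non, b_non)) in results.items():
    chi2 = chi_square_2x2(b_cvs, w_cvs, b_non, w_non)
    print(
        project,
        f"{false_positive_rate(w_cvs, b_cvs):.0%}",
        f"{false_positive_rate(w_non, b_non):.0%}",
        "significant" if chi2 > CRITICAL_5_PERCENT_DF1 else "not significant",
    )
```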
11
Function Usage Pattern Miner
System-specific rules that source code must follow
Function Usage Pattern
  – how functions are invoked with respect to each other in the source code
Find new instances of patterns added to the source code

    mdi = HeapAlloc(GetProcessHeap());
    if (!mdi) HeapFree(GetProcessHeap(), 0, cs);

    HDC hdc = BeginPaint( hwnd, &ps );
    if( hdc )
        DrawIcon( hdc, x, y, hIcon );
    EndPaint( hwnd, &ps );

Pattern types: Called After, Conditionally Called After
12
Our Tool
Analyze each revision of each file
  – record instances of the function usage patterns
Find new instances of the patterns
  – instances of a pattern in a revision of a file where that instance was not found in the revision immediately prior
  – per file, not per function
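A minimal sketch of the "new instance" definition, assuming each revision of a file has already been reduced to a list of pattern instances. The `Pattern` tuple layout and the `new_instances` helper are hypothetical; treating every instance in a file's first revision as new is also an assumption of this sketch.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical encoding of a called-after instance: (later function, earlier function).
Pattern = Tuple[str, str]


def new_instances(revisions: List[List[Pattern]]) -> Counter:
    """Count pattern instances that appear in a revision but not in the one before."""
    added: Counter = Counter()
    previous: Counter = Counter()
    for instances in revisions:
        current = Counter(instances)
        # Counter subtraction keeps only positive counts, i.e. newly added instances.
        added += current - previous
        previous = current
    return added


# Three revisions of one file, compared per file rather than per function.
revisions = [
    [("EndPaint", "BeginPaint")],
    [("EndPaint", "BeginPaint"), ("HeapFree", "HeapAlloc")],
    [("EndPaint", "BeginPaint"), ("EndPaint", "BeginPaint"), ("HeapFree", "HeapAlloc")],
]
print(new_instances(revisions))
```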
13
Filtering
Lots of instances identified in the Wine software repository
  – 50 million
Preliminary filtering heuristic
  – only look at pairs of functions that are separated by no more than 10 source lines
      minimal control flow information computed
  – many APIs contain functions that are called in quick succession
  – error handling code is close to the error producing function
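The distance heuristic can be sketched as below, assuming the calls in a file are available as (source line, function name) pairs. The sketch ignores control flow entirely, whereas the slide says the tool computes minimal control flow information; `Call`, `MAX_LINE_DISTANCE` and `called_after_pairs` are illustrative names.

```python
from typing import List, Tuple

Call = Tuple[int, str]  # (source line number, called function)
MAX_LINE_DISTANCE = 10  # threshold taken from the slide


def called_after_pairs(
    calls: List[Call], max_distance: int = MAX_LINE_DISTANCE
) -> List[Tuple[str, str]]:
    """Return (later function, earlier function) pairs at most max_distance lines apart."""
    ordered = sorted(calls)
    pairs = []
    for i, (first_line, first_fn) in enumerate(ordered):
        for later_line, later_fn in ordered[i + 1:]:
            if later_line - first_line > max_distance:
                break  # calls are sorted by line, so all remaining ones are too far away
            pairs.append((later_fn, first_fn))
    return pairs


calls = [(10, "BeginPaint"), (12, "DrawIcon"), (14, "EndPaint"), (40, "HeapFree")]
print(called_after_pairs(calls))
```

The call to `HeapFree` on line 40 pairs with nothing because it is more than 10 lines from every earlier call.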
14
Transitive Patterns
Called after may be a transitive pattern
  – only a binary pattern
  – allow larger patterns to be built
  – may need to add more context information
Patterns Identified
    SelectObject   called after   BeginPaint
    SetTextColor   called after   SelectObject
    TextOutA       called after   SetTextColor
    DeleteObject   called after   TextOutA
    EndPaint       called after   DeleteObject
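One way larger patterns might be built from binary called-after patterns is to follow them as a chain. The sketch below assumes each function has at most one successor, which is a simplification; `chain_patterns` is a hypothetical helper, and the input is the list of pairs identified on this slide.

```python
from typing import Dict, List, Tuple


def chain_patterns(patterns: List[Tuple[str, str]]) -> List[List[str]]:
    """Chain (later, earlier) called-after pairs into longer call sequences."""
    next_call: Dict[str, str] = {earlier: later for later, earlier in patterns}
    laters = {later for later, _ in patterns}
    # A chain starts at a function that is never the "later" half of a pair.
    starts = [earlier for _, earlier in patterns if earlier not in laters]
    chains = []
    for start in starts:
        chain = [start]
        # Follow successors, stopping at the end of the chain or on a cycle.
        while chain[-1] in next_call and next_call[chain[-1]] not in chain:
            chain.append(next_call[chain[-1]])
        chains.append(chain)
    return chains


patterns = [
    ("SelectObject", "BeginPaint"),
    ("SetTextColor", "SelectObject"),
    ("TextOutA", "SetTextColor"),
    ("DeleteObject", "TextOutA"),
    ("EndPaint", "DeleteObject"),
]
print(chain_patterns(patterns))
```

Because nothing here confirms that the same `hdc` or object flows through every call, the chain is only a candidate; this is where the extra context information mentioned on the slide would come in.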
15
Preliminary Case Study
Mined Wine CVS repository
  – 2,175 unique patterns added to the code 10 or more times
  – 65 unique patterns added 100 or more times
Different categories of function pairs
  – Debug functionality
  – Heap management
  – Paired functionality
  – Error Handling

    wine_tsx11_lock();
    XInternAtoms(thr_dis(), names, cnt, 0, atoms );
    wine_tsx11_unlock();

    if (RegOpenKeyA(HKEY, name, &key)) {
        TRACE(message);
        RegCloseKey(key);
        SetLastError(NOT_FOUND);
16
Called After Pattern

Category / New Instances (columns: > 99, 99 - 25, 24 - 10)
    Debug 1780278
    Heap 1416
    GUI 322271
    Paired Functionality 0839
    Error Handling 0930

1,253 unique patterns added 10 or more times

    wndClass.hCursor = LoadCursorA (0, (LPSTR)IDC_ARROW);
    RegisterClassA (&wndClass);

Obvious patterns
  – serve to validate our results
Surprising patterns
  – point to interesting relationships between functions

    RtlDeleteCriticalSection(&det->waiters_count_lock);
    …
    HeapFree(GetProcessHeap(), 0, det);
17
Conditionally Called After
922 unique patterns added 10 or more times

Category / New Instances (columns: > 99, 99 - 25, 24 - 10)
    Debug 1495341
    Heap 7811
    Paired Functionality 0626
    Error Handling 0334

    if (!(hModule = LoadLibraryExA(fileName, 0, LLDF)))
        WINE_ERR("LoadLibraryExA (%s) failed, %ld\n", fileName, GetLastError());

Error handling code
  – conditionally report error
  – which functions need errors handled
Debug code
  – conditionally call a debug function
19
RtlHeapFree Called After RtlHeapAlloc
Value: 8
dlls/kernel/heap.c
dlls/ntdll/loader.c
20
Future Work
Apply our tool to more projects
Track removed usage patterns
Better filtering heuristic
  – control flow based
  – data flow based
How do we use the patterns we find?
  – documentation
  – feed patterns to static source code checkers to find violations

    hdc = BeginPaint( hwnd, &ps );
    if( hdc )
        DrawIcon( hdc, x, y, hIcon );
    EndPaint( hwnd, &ps );
21
Demo
Demo of the visualization tool tomorrow