University of Maryland Mining Source Code Change History for Program Understanding Chadd Williams.

Presentation transcript:

University of Maryland Mining Source Code Change History for Program Understanding Chadd Williams

Problem
How much do you know about your 10-year-old code base?
– What types of bugs have been most common?
Implicit rules build up over time
– What do you do with a return value from a function?
– Didn’t someone rewrite the matrix objects? How do you apply a transformation to an image now?
Failure to understand implicit rules leads to bugs
– 32% of bugs detected during maintenance [1]
[1] Matsumura, T., Monden, A., Matsumoto, K., The Detection of Faulty Code Violating Implicit Coding Rules, IWPSE ’02

Source Code Change History
We can discover important properties of the code by looking at code changes
– every change is committed
– changes highlight misunderstood code
– changes highlight new code
Studying each commit gives fine-grain knowledge
– how quickly does a property emerge?
– how fast is a property adopted?
– how often is it used later?

Applications
Bug finding
– what types of bugs have been fixed in the past?
– what functions were involved?
– Return Value Check Bug Finder
Code writing
– how do we use that API?
– how do we access that data structure?
– Function Usage Pattern Miner
Example usage pattern:
    open(f)
    tmp = cnt = 0
    while(cnt < sz & tmp != -1)
        tmp = read(f,sz)
        if(tmp != -1)
            cnt += tmp
    close(f)

Return Value Check Bug
Returning error code and valid data from a function is a common C idiom
    int foo(){
        …
        if( error ){
            return error_code;
        }
        …
        return data;
    }
    …
    value = foo();
    newPosition += value; // ???
– the return value should be checked before being used
– lint checks for this error
This type of bug pattern has a high false positive rate
– no error value returned
– no useful return value
Build a bug checker
– improve its results with data from CVS

Goal
Which warnings are most likely true errors?
– where has the source code been changed to add such a check?
– look at each revision of each file in CVS
– flag a function when a CVS commit adds a check of its return value
Produce a ranking of the errors
– group warnings by called function
– rank functions that most likely need their return value checked higher
Before the CVS commit:
    value = foo();
    newPosition += value; // ???
After the CVS commit:
    value = foo();
    if( value != Error) // Check
        newPosition += value;

HistoryAware Ranking
Split functions into two groups
– flagged with a likely bug fix in a commit
– not flagged with a likely bug fix in a commit
Rank by how often the function’s return value is checked in the latest version
– current context
[Diagram: two groups, "Flagged with likely bug fix in CVS" and "Not flagged with likely bug fix in CVS", each ranked by current context data]
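
The two-group ranking described on this slide can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `FunctionStats` record, its field names, and the sample functions are hypothetical, and placing the flagged group ahead of the unflagged group is an assumption read from the order of the diagram.

```python
from dataclasses import dataclass


@dataclass
class FunctionStats:
    name: str
    flagged_in_history: bool  # assumption: some commit added a return value check for this function
    checked_calls: int        # call sites in the latest version that check the return value
    total_calls: int          # all call sites in the latest version

    @property
    def checked_ratio(self) -> float:
        # "Current context": how often the return value is checked right now.
        return self.checked_calls / self.total_calls if self.total_calls else 0.0


def history_aware_rank(functions):
    # Flagged group first (assumed order); inside each group, the most
    # frequently checked functions come first. The name breaks ties.
    return sorted(
        functions,
        key=lambda f: (not f.flagged_in_history, -f.checked_ratio, f.name),
    )


# Hypothetical sample data, for illustration only.
stats = [
    FunctionStats("alloc_buf", False, 9, 10),
    FunctionStats("read_cfg", True, 3, 10),
    FunctionStats("log_msg", False, 0, 25),
    FunctionStats("open_conn", True, 8, 10),
]
ranked = [f.name for f in history_aware_rank(stats)]
print(ranked)
```

A naïve ranking in the sense of the next slide would use only `checked_ratio` and ignore `flagged_in_history`.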

Case Studies
Does the HistoryAware ranking push likely bugs to the top?
Compare HistoryAware Ranking to Naïve Ranking
– current context
Inspection criteria for warnings
– functions flagged with a bug fix in a commit
– functions with return value checked >50% in current context
Apache web server
– 1,129 C source files
– 41,000 CVS commits
Wine, OSS Windows API
– 3,099 C source files
– 70,000 CVS commits

Results - Apache
[Table; cell values not present in the transcript]
Columns: Warnings | Likely Bugs | False Positive Rate (%)
Rows: CVS Bug Fix flagged functions; Non-CVS Bug Fix flagged functions; Total
Statistical Significance
– Chi-square test finds the difference between the false positive rate of the CVS bug fix flagged functions and that of functions checked > 50% in the current context to be significant

Results - Wine
[Table; cell values not present in the transcript]
Columns: Warnings | Likely Bugs | False Positive Rate (%)
Rows: CVS Bug Fix flagged functions; Non-CVS Bug Fix flagged functions; Total
Statistical Significance
– Chi-square test finds the difference between the false positive rate of the CVS bug fix flagged functions and that of functions checked > 50% in the current context to be significant

Function Usage Pattern Miner
System specific rules that source code must follow
Function Usage Pattern
– how functions are invoked with respect to each other in the source code
Find new instances of patterns added to the source code
Pattern types: Called After; Conditionally Called After
Examples:
    mdi = HeapAlloc(GetProcessHeap());
    if (!mdi)
        HeapFree(GetProcessHeap(), 0, cs);

    HDC hdc = BeginPaint( hwnd, &ps );
    if( hdc )
        DrawIcon( hdc, x, y, hIcon );
    EndPaint( hwnd, &ps );

Our Tool
Analyze each revision of each file
– record instances of the function usage patterns
Find new instances of the patterns
– instances of a pattern in a revision of a file where that instance was not found in the revision immediately prior
– per file, not per function
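
As a rough sketch of the "new instance" idea above, each revision of a file can be compared with the revision immediately before it. The representation is an assumption made for illustration: an instance is a hypothetical `(later, earlier)` tuple meaning `later` is called after `earlier`, instances are counted per file as a multiset, and the very first revision is not compared against anything. The slides do not say how the actual tool represents or counts instances.

```python
from collections import Counter


def new_instances(previous, current):
    """Instances found in `current` but not in `previous` (multiset difference)."""
    return Counter(current) - Counter(previous)


def new_instances_by_revision(revisions):
    """`revisions` holds the instance lists of ONE file, oldest revision first.

    Returns one Counter per consecutive pair of revisions.
    """
    return [new_instances(prev, curr) for prev, curr in zip(revisions, revisions[1:])]


# Hypothetical history of a single file (three revisions).
file_history = [
    [("EndPaint", "BeginPaint")],
    [("EndPaint", "BeginPaint"), ("HeapFree", "HeapAlloc")],
    [("EndPaint", "BeginPaint"), ("EndPaint", "BeginPaint"), ("HeapFree", "HeapAlloc")],
]
added = new_instances_by_revision(file_history)
for step, counter in enumerate(added, start=2):
    print(f"revision {step}: {dict(counter)}")
```

Comparing per file rather than per function, as the slide notes, means an instance that merely moves between functions in the same file is not counted as new.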

Filtering
Lots of instances identified in the Wine software repository
– 50 million
Preliminary filtering heuristic
– only look at pairs of functions that are separated by no more than 10 source lines
– minimal control flow information computed
– many APIs contain functions that are called in quick succession
– error handling code is close to the error producing function
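
The 10-line heuristic can be illustrated with a short sketch. It assumes the call sites of a file are already available as hypothetical `(line_number, function_name)` tuples; how the tool extracts calls, and whether the distance is measured exactly this way, is not stated in the slides.

```python
MAX_LINE_GAP = 10  # from the slide: no more than 10 source lines apart


def nearby_call_pairs(calls, max_gap=MAX_LINE_GAP):
    """Return (later, earlier) function pairs whose call sites are at most
    `max_gap` source lines apart.

    `calls` is a list of (line_number, function_name) tuples for one file.
    """
    ordered = sorted(calls)
    pairs = []
    for i, (line_a, func_a) in enumerate(ordered):
        for line_b, func_b in ordered[i + 1:]:
            if line_b - line_a > max_gap:
                break  # calls are sorted, so every later call is even farther away
            pairs.append((func_b, func_a))
    return pairs


# Hypothetical call sites in one file.
calls = [(10, "BeginPaint"), (12, "DrawIcon"), (14, "EndPaint"), (40, "HeapFree")]
print(nearby_call_pairs(calls))
```

The call at line 40 is dropped from every pair because it is more than 10 lines from all other calls, which is the kind of pruning the heuristic is meant to achieve.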

Transitive Patterns
called after may be a transitive pattern
– only a binary pattern
– allow larger patterns to be built
Patterns Identified
– may need to add more context information
    SelectObject   called after   BeginPaint
    SetTextColor   called after   SelectObject
    TextOutA       called after   SetTextColor
    DeleteObject   called after   TextOutA
    EndPaint       called after   DeleteObject
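
One way to read "allow larger patterns to be built" is to chain the binary called-after pairs listed above into a single sequence. The sketch below is an illustration under a simplifying assumption that is not in the slides: the pairs form one linear chain, so every function has at most one successor. The helper name `build_chain` is hypothetical.

```python
def build_chain(patterns):
    """Chain binary (later, earlier) called-after pairs into one call sequence.

    Assumes the pairs form a single linear chain; branching input is not
    handled and a cycle raises ValueError.
    """
    successor = {earlier: later for later, earlier in patterns}
    laters = set(successor.values())
    starts = [func for func in successor if func not in laters]
    if len(starts) != 1:
        raise ValueError("expected exactly one chain start")
    chain = [starts[0]]
    while chain[-1] in successor:
        nxt = successor[chain[-1]]
        if nxt in chain:
            raise ValueError("cycle in called-after patterns")
        chain.append(nxt)
    return chain


# The five binary patterns listed on the slide, as (later, earlier) pairs.
patterns = [
    ("SelectObject", "BeginPaint"),
    ("SetTextColor", "SelectObject"),
    ("TextOutA", "SetTextColor"),
    ("DeleteObject", "TextOutA"),
    ("EndPaint", "DeleteObject"),
]
chain = build_chain(patterns)
print(" -> ".join(chain))
```

Real mined data would contain many overlapping pairs, which is presumably why the slide notes that more context information may be needed.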

Preliminary Case Study
Mined Wine CVS repository
– 2,175 unique patterns added to the code 10 or more times
– 65 unique patterns added 100 or more times
Different categories of function pairs
– Debug functionality
– Heap management
– Paired functionality
– Error Handling
Examples:
    wine_tsx11_lock();
    XInternAtoms(thr_dis(), names, cnt, 0, atoms );
    wine_tsx11_unlock();

    if (RegOpenKeyA(HKEY, name, &key)) {
        TRACE(message);
        RegCloseKey(key);
        SetLastError(NOT_FOUND);

Called After
1,253 unique patterns added 10 or more times
[Table: Pattern Category vs. New Instances; cell values run together in the transcript]
    Debug
    Heap                  1416
    GUI
    Paired Functionality  0839
    Error Handling
Obvious patterns
– serves to validate our results
Surprising patterns
– point to interesting relationships between functions
Examples:
    wndClass.hCursor = LoadCursorA (0, (LPSTR)IDC_ARROW);
    RegisterClassA (&wndClass);

    RtlDeleteCriticalSection(&det->waiters_count_lock);
    …
    HeapFree(GetProcessHeap(), 0, det);

Conditionally Called After
922 unique patterns added 10 or more times
[Table: Category vs. New Instances; cell values run together in the transcript]
    Debug
    Heap                  7811
    Paired Functionality  0626
    Error Handling        0334
Error handling code
– conditionally report error
– which functions need errors handled
Debug code
– conditionally call a debug function
Example:
    if (!(hModule = LoadLibraryExA(fileName, 0, LLDF)))
        WINE_ERR("LoadLibraryExA (%s) failed, %ld\n", fileName, GetLastError());


RtlHeapFree Called After RtlHeapAlloc
Value: 8
    dlls/kernel/heap.c
    dlls/ntdll/loader.c

Future Work
Apply our tool to more projects
Track removed usage patterns
Better filtering heuristic
– control flow based
– data flow based
How do we use the patterns we find?
– documentation
– feed patterns to static source code checkers to find violations
    hdc = BeginPaint( hwnd, &ps );
    if( hdc )
        DrawIcon( hdc, x, y, hIcon );
    EndPaint( hwnd, &ps );

Demo
Demo of the visualization tool tomorrow