A Characterization Study of Repeated Bug Fixes
Ruru Yue1, Na Meng2, Qianxiang Wang1
1Peking University 2Virginia Tech
Motivation
Prior studies showed that developers apply repetitive code changes to multiple locations.
  Nguyen et al. found that 17-45% of bug fixes were repeated [1].
Tools were built to recommend similar bug fixes.
  Clever detects and tracks code clones [2].
  LASE suggests similar edits based on a program transformation learned from two or more similar edit examples [3].
Problem Statement
Some fundamental research questions are still unexplored.
  Q1: What is the frequency of repeated bug fixes?
  Q2: Where are repeated fixes usually applied?
  Q3: What are the common bugs and fix patterns of repeated fixes?
Study Findings
  48-70% of repeated fixes occurred 2 times.
    Code change suggestion tools based on single examples are more helpful.
  73-100% of repeated fixes spanned at most 3 commits.
    Coding assistance tools should provide edit suggestions as early as possible.
  39% of repeated fixes added or deleted whole if-structures.
    Automatic program repair tools should focus more on if-statements.
Outline
  Motivation & Related Work
  Study Approach
  Experiments
Exemplar Patch
  diff --git a/dom/CompilationUnit.java b/dom/CompilationUnit.java
  @@ -484,8 +484,8 @@
     * @since 3.0
     */
    public int getStartPosition(ASTNode node) {
  -   if (this.commentMapper == null) {
  -     return -1;
  +   if (this.commentMapper == null || node !=null) {
  +     return node.getStartPosition();
      } else {
        return this.commentMapper.getStartPosition(node);
      }
Exemplar Hunk, Code Changes, Context Lines, and Fix
The exemplar patch is annotated with the terms used in the study:
  Hunk: the region introduced by the header @@ -484,8 +484,8 @@.
  Code changes: the lines prefixed with - (deleted) or + (added).
  Context lines: the unchanged lines before and after the code changes.
  Fix: marked inside the hunk.
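The terms on the exemplar slides map directly onto the unified diff format. A minimal sketch, assuming git-style patches where every file section starts with `diff --git`; the `Hunk` class and `parse_hunks` function are hypothetical names for illustration, not part of the study's tooling:

```python
import re
from dataclasses import dataclass, field

# Matches hunk headers such as "@@ -484,8 +484,8 @@".
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")


@dataclass
class Hunk:
    old_start: int
    new_start: int
    context: list = field(default_factory=list)  # unchanged lines
    deleted: list = field(default_factory=list)  # code changes prefixed with '-'
    added: list = field(default_factory=list)    # code changes prefixed with '+'


def parse_hunks(patch_text):
    """Split a git-style patch into hunks (assumed input format)."""
    hunks, current = [], None
    for line in patch_text.splitlines():
        if line.startswith("diff --git"):
            current = None  # new file section; skip its headers
            continue
        header = HUNK_HEADER.match(line)
        if header:
            current = Hunk(int(header.group(1)), int(header.group(2)))
            hunks.append(current)
        elif current is None or line.startswith("\\"):
            continue  # file headers, or "\ No newline at end of file"
        elif line.startswith("-"):
            current.deleted.append(line[1:])
        elif line.startswith("+"):
            current.added.append(line[1:])
        else:
            current.context.append(line[1:])
    return hunks


# The exemplar patch, as far as the slide shows it.
PATCH = """\
diff --git a/dom/CompilationUnit.java b/dom/CompilationUnit.java
@@ -484,8 +484,8 @@
  * @since 3.0
  */
 public int getStartPosition(ASTNode node) {
-    if (this.commentMapper == null) {
-        return -1;
+    if (this.commentMapper == null || node !=null) {
+        return node.getStartPosition();
     } else {
         return this.commentMapper.getStartPosition(node);
     }
"""

hunks = parse_hunks(PATCH)
```

Here `hunks[0].deleted` and `hunks[0].added` hold the code changes, and `hunks[0].context` holds the context lines.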
Approach Overview
  1. Bug Fix Collection
  2. Repeated Bug Fix Detection
Bug Fix Collection
  1. Identify Fixing Patches
       Retrieve relevant commits using bug IDs in Bugzilla.
  2. Extract Bug Fixes
       Exclude less important hunks, e.g., hunks with changes to documentation.
       Extract fixes applied to methods, using AST parsers to identify methods’ code ranges.
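The two collection steps can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: the commit-message convention ("Bug 12345") is assumed, and the method ranges are assumed to come from an AST parser that is not shown. The names `fixing_commits` and `enclosing_method` are hypothetical.

```python
import re

# Assumed convention: commit messages mention the Bugzilla ID as "Bug 12345"
# or "bug #12345". Real projects may use other conventions.
BUG_ID = re.compile(r"\bbug\s*#?\s*(\d+)\b", re.IGNORECASE)


def fixing_commits(commits, resolved_bug_ids):
    """Map each resolved bug ID to the commits whose message mentions it.

    commits: iterable of (sha, message) pairs.
    resolved_bug_ids: set of integer bug IDs taken from the bug tracker.
    """
    by_bug = {}
    for sha, message in commits:
        for match in BUG_ID.finditer(message):
            bug_id = int(match.group(1))
            if bug_id in resolved_bug_ids:
                by_bug.setdefault(bug_id, []).append(sha)
    return by_bug


def enclosing_method(changed_start, changed_end, method_ranges):
    """Return the method whose line range contains the changed lines.

    method_ranges: list of (name, first_line, last_line) tuples, assumed to
    be produced by an AST parser. Returns None when the change is not
    contained in a single method.
    """
    for name, first_line, last_line in method_ranges:
        if first_line <= changed_start and changed_end <= last_line:
            return name
    return None


# Hypothetical data for illustration only.
commits = [
    ("a1", "Bug 101 - fix NPE in getStartPosition"),
    ("b2", "Refactor build scripts"),
    ("c3", "bug #101: follow-up; also fixes Bug 202"),
    ("d4", "Bug 999 unrelated"),
]
linked = fixing_commits(commits, {101, 202})
ranges = [("getStartPosition", 486, 492), ("getLength", 500, 510)]
```

Changes that fall outside every method range, such as edits to fields or file-level comments, yield `None` and can be dropped.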
Repeated Bug Fix Detection
  1. Format Bug Fixes
  2. Identify Clone Regions with CCFinder [4]
  3. Match Edit Operation Sequences
Two Exemplar Fixes
Fix1
  - if (this.commentMapper == null) {
  -   return -1;
  + if (this.commentMapper == null || node !=null) {
  +   return node.getStartPosition();
Fix2
  - if (this.commentMapper == null) {
  -   return -1;
  + if (this.commentMapper == null || astNode!= null){
  +   return astNode.getLength();
  + else{
  +   return this.commentMapper.getLength(astNode);
Format Bug Fixes
  Fix1 and Fix2 are formatted into FormattedFix1 and FormattedFix2.
Identify Clone Regions by CCFinder
  CCFinder [4] identifies CloneRegion1 and CloneRegion2 in the two formatted fixes.
Match Edit Operation Sequences
  EditOperSeq1 and EditOperSeq2, the edit operation sequences of the two fixes, are matched against each other.
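The idea behind matching edit operation sequences can be sketched on Fix1 and Fix2. The sketch below is a simplification under stated assumptions: a crude regex-based identifier abstraction stands in for CCFinder's token-based clone detection, the keyword list is a small assumed subset, and `normalize`, `edit_operations`, and `matched_operations` are hypothetical names.

```python
import re
from difflib import SequenceMatcher

# Assumption: a tiny keyword list, enough for the two exemplar fixes.
KEYWORDS = {"if", "else", "return", "this", "null"}
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")
IDENT = re.compile(r"[A-Za-z_]\w*")


def normalize(line):
    """Tokenize a line and replace every non-keyword identifier with $id."""
    tokens = []
    for tok in TOKEN.findall(line):
        if IDENT.fullmatch(tok) and tok not in KEYWORDS:
            tokens.append("$id")
        else:
            tokens.append(tok)
    return tuple(tokens)


def edit_operations(fix_lines):
    """Turn diff lines into a sequence of (operation, normalized tokens)."""
    return [
        (line[0], normalize(line[1:]))
        for line in fix_lines
        if line[:1] in ("+", "-")
    ]


def matched_operations(fix_a, fix_b):
    """Longest run of edit operations shared by both fixes."""
    a, b = edit_operations(fix_a), edit_operations(fix_b)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[m.a : m.a + m.size]


FIX1 = [
    "- if (this.commentMapper == null) {",
    "-   return -1;",
    "+ if (this.commentMapper == null || node !=null) {",
    "+   return node.getStartPosition();",
]
FIX2 = [
    "- if (this.commentMapper == null) {",
    "-   return -1;",
    "+ if (this.commentMapper == null || astNode!= null){",
    "+   return astNode.getLength();",
    "+ else{",
    "+   return this.commentMapper.getLength(astNode);",
]

shared = matched_operations(FIX1, FIX2)
shared_ops = [op for op, _ in shared]
```

After normalization, the differences in identifier names (`node` versus `astNode`) and in whitespace no longer prevent the two deletions and the first two additions from lining up. How long a shared run must be before two fixes count as similar is a threshold this sketch leaves open.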
An Exemplar Repeated-fix Group
Similar fixes (e.g., Fix1 and Fix2) are gathered into the same repeated-fix group.
Fix1
  - if (this.commentMapper == null) {
  -   return -1;
  + if (this.commentMapper == null || node !=null) {
  +   return node.getStartPosition();
Fix2
  - if (this.commentMapper == null) {
  -   return -1;
  + if (this.commentMapper == null || astNode!= null){
  +   return astNode.getLength();
  + else{
  +   return this.commentMapper.getLength(astNode);
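Gathering similar fixes into groups can be sketched with a union-find structure over pairwise similarity results. One assumption to flag: the sketch groups fixes transitively (if Fix1 is similar to Fix2 and Fix2 to Fix3, all three share a group), which the slides do not state. The function name `repeated_fix_groups` and the fix IDs are hypothetical.

```python
def repeated_fix_groups(fix_ids, similar_pairs):
    """Group fixes connected by similarity; fixes with no match are dropped.

    fix_ids: list of fix identifiers.
    similar_pairs: iterable of (fix_a, fix_b) pairs judged similar.
    """
    parent = {fix: fix for fix in fix_ids}

    def find(fix):
        # Walk to the representative, compressing the path on the way.
        while parent[fix] != fix:
            parent[fix] = parent[parent[fix]]
            fix = parent[fix]
        return fix

    for a, b in similar_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for fix in fix_ids:
        groups.setdefault(find(fix), []).append(fix)
    # A repeated-fix group needs at least two fix instances.
    return [members for members in groups.values() if len(members) >= 2]


# Hypothetical similarity results for illustration only.
groups = repeated_fix_groups(
    ["Fix1", "Fix2", "Fix3", "Fix4"],
    [("Fix1", "Fix2"), ("Fix2", "Fix3")],
)
```

In this example Fix4 matches nothing, so it does not form a group.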
Data Sets
  Property                 Eclipse JDT   Mozilla Firefox   LibreOffice
  Resolved period of bugs  2005          2014-2015         2014
  # of bugs                870           380               1,563
  # of fixing patches      1,378         10,051            7,846
  # of fixes               16,289        3,451             33,057
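Q1.D2. What is the distribution of repeated-fix groups based on fix instance counts?
For most bugs, repeated fixes did not occur many times.
Tools that suggest changes based on single examples, such as SYDIT [5] and LibSync [6], may therefore be more helpful than LASE [3], which needs two or more examples.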
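A distribution of this kind can be computed by counting how many repeated-fix groups have each size. A minimal sketch; the group sizes below are made-up illustration data, not the study's results, and `instance_count_distribution` is a hypothetical name.

```python
from collections import Counter


def instance_count_distribution(group_sizes):
    """Percentage of repeated-fix groups per fix instance count."""
    total = len(group_sizes)
    if total == 0:
        return {}
    counts = Counter(group_sizes)
    return {size: 100 * n / total for size, n in sorted(counts.items())}


# Made-up sizes: three groups with 2 instances, one with 3, one with 5.
distribution = instance_count_distribution([2, 2, 2, 3, 5])
```

The entry for size 2 corresponds to the share of groups whose fix occurred exactly two times.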
Q3. What are the common bugs and fix patterns of repeated fixes?
  1. Sample Repeated-fix Groups
       Randomly select 150 repeated-fix groups, with 50 groups from each project.
  2. Manually Analyze Repeated-fix Groups
       Bug component: the main syntax component of a fix
       Fix pattern: the way to resolve the bug
Exemplar Repeated Fix
   1. - if (aStart)
   2. -   currFrame = aStart->GetNextSibling();
   3. - else
   4. + if (aStart) {
   5. +   if (aStart->GetNextSibling())
   6. +     currFrame = aStart->GetNextSibling();
   7. +   else if (aStart->GetParent()->GetContent()->IsXUL(nsGkAtoms::menugroup))
   8. +     currFrame = aStart->GetParent()->GetNextSibling();
   9. + }
  10. + else
Similarity Relationship and Dependency Relationship
The exemplar repeated fix (lines 1-10) is annotated with two kinds of relationships between its lines: similarity, and dependency (labeled "control depend on").
Manual Analysis
  Bug Component: currFrame’s assignment in line 2
  Fix Pattern: modify the value assigned to currFrame
Q3. What are the common bugs and fix patterns of repeated fixes?
If-statements and if-conditions were the most prevalent bug components.
  Program repair tools like Prophet [7] and Angelix [8] can be useful for if-condition correction.
  We still need new tools to automate if-statement additions or deletions.
Conclusions
We studied the frequency, edit locations, and semantic meanings of repeated fixes.
  48-70% of repeated fixes occurred 2 times.
  73-100% of repeated fixes spanned at most 3 commits.
  39% of repeated fixes added or deleted whole if-structures.
Thank you!
Q1.D3: What is the distribution of repeated-fix groups based on patch counts?
Tools should generate edit suggestions as early as possible.
Integrating tools into IDEs seems more promising than integrating them into version control systems (VCS).
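The patch-count view asks how many distinct patches the fixes of one group are spread over. A minimal sketch with made-up data; `share_spanning_at_most` is a hypothetical name, and each fix is assumed to be recorded together with the patch it came from.

```python
def share_spanning_at_most(groups, max_patches=3):
    """Percentage of repeated-fix groups spread over at most max_patches patches.

    groups: list of groups, each a list of (fix_id, patch_id) pairs.
    """
    if not groups:
        return 0.0
    within = sum(
        1
        for group in groups
        if len({patch_id for _, patch_id in group}) <= max_patches
    )
    return 100 * within / len(groups)


# Made-up groups for illustration only.
example_groups = [
    [("f1", "p1"), ("f2", "p1")],                              # 1 patch
    [("f3", "p2"), ("f4", "p3"), ("f5", "p4")],                # 3 patches
    [("f6", "p5"), ("f7", "p6"), ("f8", "p7"), ("f9", "p8")],  # 4 patches
    [("f10", "p9"), ("f11", "p9")],                            # 1 patch
]
share = share_spanning_at_most(example_groups, max_patches=3)
```

A group whose fixes all sit in one patch is the case where a suggestion made during editing, before the commit, would help most.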
References
[1] T. T. Nguyen, H. A. Nguyen, N. H. Pham, J. Al-Kofahi, and T. N. Nguyen. Recurring bug fixes in object-oriented programs. In ACM/IEEE International Conference on Software Engineering, pages 315–324, 2010.
[2] T. T. Nguyen, H. A. Nguyen, N. H. Pham, J. M. Al-Kofahi, and T. N. Nguyen. Clone-aware configuration management. In ASE, pages 123–134, 2009.
[3] N. Meng, M. Kim, and K. McKinley. LASE: Locating and applying systematic edits. In ICSE, page 10, 2013.
[4] T. Kamiya, S. Kusumoto, and K. Inoue. CCFinder: A multilinguistic token-based code clone detection system for large scale source code. TSE, pages 654–670, 2002.
[5] N. Meng, M. Kim, and K. S. McKinley. Systematic editing: Generating program transformations from an example. In PLDI, pages 329–342, 2011.
[6] H. A. Nguyen, T. T. Nguyen, G. Wilson, Jr., A. T. Nguyen, M. Kim, and T. N. Nguyen. A graph-based approach to API usage adaptation. pages 302–321, 2010.
[7] F. Long and M. Rinard. Automatic patch generation by learning correct code. SIGPLAN Not., 2016.
[8] S. Mechtaev, J. Yi, and A. Roychoudhury. Angelix: Scalable multiline program patch synthesis via symbolic analysis. In ACM/IEEE International Conference on Software Engineering, pages 691–701, 2016.