Lase: Locating and Applying Systematic Edits by Learning from Examples
Na Meng*, Miryung Kim*, Kathryn S. McKinley*+
*The University of Texas at Austin  +Microsoft Research
Motivating Scenario
Pat needs to update database transaction code in several methods to prevent SQL injection attacks.
[Figure: old and new versions of methods A, B, and C]
Systematic Editing
– Similar but not identical changes to multiple contexts
– Manual, tedious, and error-prone
– Source transformation tools require describing edits in a formal language [CHP91, ER02, LR95]
– Bug fixing tools handle simple, stylized code changes [WNGF09, JZDLL12, SMS13]
– Sydit does not find edit locations automatically when applying systematic edits [MKM11]
Lase Workflow
– The user selects example edits (A old → A new, B old → B new)
– Lase selects methods (D, I, X, …) and suggests edits for them (D suggested, I suggested, X suggested, …)
Approach Overview
Phase I. Create edit script: syntactic program differencing of A old/A new and B old/B new, identify common edit, generalize identifier, extract context
Phase II. Find edit locations: match the extracted context against other methods (D old ✔ match; C old ✗ no match)
Phase III. Apply edit script to D old to produce D new
Step 1: Syntactic Program Differencing
Input: m_old, m_new
Output: edit operations
Step 2: Identify Common Edit
Longest common edit operation subsequence
Edit script A:
  insert(Object next = …)
  insert(if (next instanceof MVAction))
  insert(((MVAction) next).update())
  update(print(next.toString())) to …
Edit script B:
  insert(Object next = …)
  update(print(next.getString())) to …
  insert(if (next instanceof MVAction))
  insert(((MVAction) next).update())
  delete(System.out.println(…))
Common edit:
  insert(Object next = …)
  insert(if (next instanceof MVAction))
  insert(((MVAction) next).update())
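The common-edit step can be sketched in Java as a standard longest common subsequence (LCS) computation over two edit scripts. This is a minimal illustration under a simplifying assumption: each edit operation is a string label, and two operations count as equal only when their labels match exactly. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: longest common subsequence (LCS) of two edit scripts.
// Assumption (hypothetical representation): each edit operation is a string
// label, and two operations are "the same" only when their labels are equal.
final class CommonEdit {
    static List<String> longestCommonEdit(List<String> a, List<String> b) {
        int n = a.size(), m = b.size();
        // len[i][j] = length of the LCS of a[i..] and b[j..]
        int[][] len = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                if (a.get(i).equals(b.get(j))) {
                    len[i][j] = len[i + 1][j + 1] + 1;
                } else {
                    len[i][j] = Math.max(len[i + 1][j], len[i][j + 1]);
                }
            }
        }
        // Walk the table forward to recover one LCS in script order.
        List<String> common = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (a.get(i).equals(b.get(j))) {
                common.add(a.get(i));
                i++;
                j++;
            } else if (len[i + 1][j] >= len[i][j + 1]) {
                i++;
            } else {
                j++;
            }
        }
        return common;
    }
}
```

Run on edit scripts A and B above, this returns the three insert operations; the two update operations differ (toString vs. getString), so neither survives.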
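Step 3: Generalize Identifier
– Keep the original type, method, and variable names if the examples agree
– Abstract identifiers if the examples disagree
Example mapping (generalized name: name in mA, name in mB):
– Variable map: next: next, next; v$0: e, iter
– Method map: next: next, next
– Type map: Object: Object, Object; Iterator: Iterator, Iterator
The statement Object next = …; appears in both mA and mB; after generalization it reads Object next = v$…;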
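A minimal Java sketch of the keep-or-abstract rule, assuming the identifiers from the two examples have already been aligned pairwise. The class name is hypothetical; "v$" is the variable prefix used on these slides, and any other prefix passed in is an assumption. The same disagreeing pair always maps to the same abstract name, so e/iter becomes v$0 every time it occurs.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of identifier generalization for one identifier kind.
// Assumption: identifiers from the two examples are already aligned pairwise
// (same position in the common edit). "v$" is the slides' variable prefix;
// other prefixes are hypothetical.
final class IdentifierGeneralizer {
    private final String prefix;
    private final Map<String, String> pairToAbstract = new HashMap<>();
    private int next = 0;

    IdentifierGeneralizer(String prefix) {
        this.prefix = prefix;
    }

    /** Returns the shared name if the examples agree, otherwise a consistent abstract name. */
    String generalize(String nameInA, String nameInB) {
        if (nameInA.equals(nameInB)) {
            return nameInA; // examples agree: keep the original name
        }
        // Examples disagree: reuse the abstract name already assigned to this pair, if any.
        String key = nameInA + "\u0000" + nameInB;
        return pairToAbstract.computeIfAbsent(key, k -> prefix + (next++));
    }
}
```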
Step 4: Extract Context
Context in mA: Iterator e = fActions.values().iterator(); … while (e.hasNext())
Context in mB: Iterator iter = getActions().values().iterator(); … while (iter.hasNext())
Generalized context: Iterator v$0 = u$0:FieldAccessOrMethodInvocation.values().iterator(); … while (v$0.hasNext())
Mapping (generalized name: name in mA, name in mB):
– Uncertain map: u$0:FieldAccessOrMethodInvocation: fActions, getActions()
– Variable map: v$0: e, iter
– Method map: values, iterator, hasNext (unchanged)
– Type map: Iterator (unchanged)
Phase II: Find Edit Locations
Generalized context: Iterator v$0 = u$0:FieldAccessOrMethodInvocation.values().iterator();
Matching statement in D old: Iterator e = fActions.values().iterator();
Mapping (generalized name: name in mD):
– Uncertain map: u$0:FieldAccessOrMethodInvocation: fActions
– Variable map: v$0: e
– Method map: values, iterator (unchanged)
– Type map: Iterator (unchanged)
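The matching idea can be sketched in Java on tokens rather than AST nodes, which is a simplification and not Lase's actual matcher. Assumptions: statements arrive pre-tokenized, abstract identifiers are tokens starting with "v$" or "u$", and each abstract identifier binds to exactly one concrete token. The class name is hypothetical. A match succeeds only if every concrete token agrees and every abstract identifier binds consistently.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Sketch: match a generalized context statement against a concrete statement.
// Simplifications (assumptions): statements are pre-tokenized lists, abstract
// identifiers start with "v$" or "u$", and each binds to one concrete token.
// Injectivity (distinct abstract names -> distinct concrete names) is not checked.
final class ContextMatcher {
    static boolean isAbstract(String token) {
        return token.startsWith("v$") || token.startsWith("u$");
    }

    static Optional<Map<String, String>> match(List<String> pattern, List<String> concrete) {
        if (pattern.size() != concrete.size()) {
            return Optional.empty();
        }
        Map<String, String> binding = new HashMap<>();
        for (int i = 0; i < pattern.size(); i++) {
            String p = pattern.get(i);
            String c = concrete.get(i);
            if (isAbstract(p)) {
                String prev = binding.putIfAbsent(p, c);
                // The same abstract identifier must always map to the same concrete name.
                if (prev != null && !prev.equals(c)) {
                    return Optional.empty();
                }
            } else if (!p.equals(c)) {
                return Optional.empty(); // concrete tokens must match exactly
            }
        }
        return Optional.of(binding);
    }
}
```

For the statement in D old, the binding is v$0 → e and u$0 → fActions.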
Phase III: Apply Edit Script
Customize the general edit script:
– Identifier concretization
– Edit position concretization
Apply the customized edit script.
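Identifier concretization can be sketched in Java as a textual substitution over one edit operation, using the binding produced when the context matched. Assumptions: edit operations are strings, abstract identifiers look like v$N or u$N, and names with no binding are left abstract; the suggested version of getExtendedEnd on a following slide likewise keeps v$1, v$2, and v$3 abstract. The class name is hypothetical, and edit position concretization is not covered here.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of identifier concretization: rewrite abstract identifiers in one
// edit operation using the binding found when matching the context.
// Assumption: abstract identifiers look like v$N or u$N; names with no
// binding are left abstract for the developer to fill in.
final class Concretizer {
    private static final Pattern ABSTRACT_ID = Pattern.compile("[vu]\\$\\d+");

    static String concretize(String editOperation, Map<String, String> binding) {
        Matcher m = ABSTRACT_ID.matcher(editOperation);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String concrete = binding.getOrDefault(m.group(), m.group());
            // quoteReplacement keeps '$' or '\' in names from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(concrete));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```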
Example 1:
Comment[] getLeadingComments(ASTNode node) {
-   if (this.leadingComments != null) {
+   if (this.leadingPtr >= 0) {
-     int[] range = (int[]) this.leadingComments.get(node);
+     int[] range = null;
+     for (int i = 0; range == null && i <= this.leadingPtr; i++) {
+       if (this.leadingNodes[i] == node)
+         range = this.leadingIndexes[i];
+     }
      if (range != null) {
        … …
        return leadingComments;
      }
    }
    return null;
}
Example 2:
Comment[] getTrailingComments(ASTNode node) {
-   if (this.trailingComments != null) {
+   if (this.trailingPtr >= 0) {
-     int[] range = (int[]) this.trailingComments.get(node);
+     int[] range = null;
+     for (int i = 0; range == null && i <= this.trailingPtr; i++) {
+       if (this.trailingNodes[i] == node)
+         range = this.trailingIndexes[i];
+     }
      if (range != null) {
        … …
        return trailingComments;
      }
    }
    return null;
}
Generalized edit script:
  update (if (this.v$0 != null)) to (if (this.v$1 >= 0))
  insert (int[] range = null; …)
  insert (for (int i = 0; range == null && i <= this.v$1; i++) …)
  insert (if (this.v$2[i] == node) …)
  insert (range = this.v$3[i]; …)
  delete (int[] range = (int[]) this.v$0.get(node);)
Found location:
public int getExtendedEnd(ASTNode node) {
  int end = node.getStartPosition() + node.getLength();
  if (this.trailingComments != null) {
    int[] range = (int[]) this.trailingComments.get(node);
    if (range != null) {
      … …
    }
  } else {
    … …
  }
  return end - 1;
}
Suggested version:
public int getExtendedEnd(ASTNode node) {
  int end = node.getStartPosition() + node.getLength();
  if (this.v$1 >= 0) {
    int[] range = null;
    for (int i = 0; range == null && i <= this.v$1; i++) {
      if (this.v$2[i] == node)
        range = this.v$3[i];
    }
    if (range != null) {
      … …
    }
  } else {
    … …
  }
  return end - 1;
}
Outline
Phase I: Creating Abstract Edit Scripts
– Syntactic Program Diff
– Identify Common Edit
– Generalize Identifier
– Extract Context
Phase II: Find Edit Locations
Phase III: Apply Edit Script
Evaluation
Test Suite
24 repetitive bug fixes that require multiple check-ins [Park et al., MSR 2012]
– 2 from Eclipse JDT and 22 from Eclipse SWT
– Each bug is fixed in multiple commits
– Patches checked in at different times share clones of at least two lines
– We use the first two changed methods as input examples
37 systematic edits that require similar changes to different methods
RQ1: Precision, Recall, and Accuracy
– Precision (P): What percentage of all found locations are correctly identified?
– Recall (R): What percentage of all expected locations are correctly identified?
– Accuracy (A): How similar is the Lase-generated version to the developer-generated version?
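The first two questions correspond to the usual set-based ratios. In this formulation, F (the locations Lase finds) and E (the expected locations in the oracle) are symbols introduced here for illustration; the slides do not give a formula for accuracy, so none is shown.

```latex
P = \frac{|F \cap E|}{|F|} \times 100\%, \qquad
R = \frac{|F \cap E|}{|E|} \times 100\%
```

For instance, with hypothetical counts of 10 found locations, 9 of which are expected, and 12 expected locations overall, P = 9/10 = 90% and R = 9/12 = 75%.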
On average, Lase finds edit locations with 99% precision, 89% recall, and 91% accuracy.
For three bugs, Lase suggested a total of 9 edits that developers had missed and later confirmed.
[Table: per-bug results (index, bug and number of patches, edit locations, edit operations, P%, R%, A%); values not recoverable]
RQ2: Sensitivity to the Number of Exemplar Edits
– 7 cases in the oracle data set
– Enumerate subsets of exemplar edits
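Enumerating subsets of exemplar edits can be sketched in Java with a bitmask over the exemplar list. The minimum subset size is a parameter here because the slide does not say which sizes were used; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: enumerate every subset of exemplar edits with at least minSize
// members, e.g. to run the tool once per subset. minSize is an assumption;
// the slide does not specify which subset sizes were evaluated.
final class ExemplarSubsets {
    static <T> List<List<T>> subsets(List<T> exemplars, int minSize) {
        int n = exemplars.size();
        if (n > 30) {
            throw new IllegalArgumentException("too many exemplars for an int bitmask");
        }
        List<List<T>> result = new ArrayList<>();
        for (int mask = 0; mask < (1 << n); mask++) {
            if (Integer.bitCount(mask) < minSize) {
                continue; // skip subsets that are too small
            }
            List<T> subset = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    subset.add(exemplars.get(i)); // bit i set: include exemplar i
                }
            }
            result.add(subset);
        }
        return result;
    }
}
```

With four exemplars and a minimum size of two, this yields 6 + 4 + 1 = 11 subsets.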
[Figure: P%, R%, and A% per case index as a function of the number of exemplars]
– As the number of exemplar edits increases, P does not change because the exemplar edits are similar, except for case 12.
– R is more sensitive to the number of exemplar edits: R increases with the number of exemplar edits.
– A decreases when the exemplar edits differ from one another, and remains the same or increases when they are very similar.
Conclusion
Lase automates edit location search and program transformation application.
Lase achieves 99% precision, 89% recall, and 91% accuracy.
Future Work
– Integrate with automated compilation and testing
– Automatically detect repetitive change examples to infer program transformations
Acknowledgement
This work was supported in part by the National Science Foundation under grants CAREER , CCF , CCF , SHF , CCF , CCF , and a Microsoft SEIF award.
References I
[Meng et al. 2011] Na Meng, Miryung Kim, and Kathryn S. McKinley. Systematic editing: Generating program transformations from an example. In PLDI ’11.
[Kamiya et al. 2002] Toshihiro Kamiya, Shinji Kusumoto, and Katsuro Inoue. CCFinder: A multilinguistic token-based code clone detection system for large scale source code. In TSE ’02.
[Lozano et al. 2004] Antoni Lozano and Gabriel Valiente. On the maximum common embedded subtree problem for ordered trees. In C. Iliopoulos and T. Lecroq, editors, String Algorithmics,
[Park et al. MSR 2012] J. Park, M. Kim, B. Ray, and D.-H. Bae. An empirical study of supplementary bug fixes. In MSR ’12.
References II
[JZDLL12] G. Jin, W. Zhang, D. Deng, B. Liblit, and S. Lu. Automated concurrency bug fixing. In PLDI ’12.
[CHP91] J. R. Cordy, C. D. Halpern, and E. Promislow. TXL: A rapid prototyping system for programming language dialects. Computer Languages,
[G10] S. Gulwani. Dimensions in program synthesis. In PPDP ’10.
[WNGF09] W. Weimer, T. Nguyen, C. Le Goues, and S. Forrest. Automatically finding patches using genetic programming. In ICSE ’09.
References III
[ER02] M. Erwig and D. Ren. A rule-based language for programming software updates. In RULE ’02.
[LR95] D. A. Ladd and J. C. Ramming. A*: A language for implementing language processors. In TSE ’95.
[SMS13] S. Son, K. S. McKinley, and V. Shmatikov. Fix Me Up: Repairing access-control bugs in web applications. In NDSS ’13.
Step 4: Common Edit Context Extraction
Extract all potential common context.
Refine the common context:
– Consistent identifier mapping
– Embedded subtree isomorphism
– Program dependence equivalence
Step 4: Common Edit Context Extraction (1/4)
Finding common text with clone detection (CCFinder [Kamiya et al. 2002])
Step 4: Common Edit Context Extraction (2/4)
Identifier generalization
mA: Iterator e = fActions.values().iterator(); while (e.hasNext()) {
mB: Iterator iter = getActions().values().iterator(); while (iter.hasNext()) {
Generalized: Iterator v$0 = u$0:FieldAccessOrMethodInvocation.values().iterator(); while (v$0.hasNext()) {
Mapping (abstract identifier: identifier in mA, identifier in mB):
– Uncertain map: u$0:FieldAccessOrMethodInvocation: fActions, getActions()
– Variable map: v$0: e, iter
– Method map: values, iterator, hasNext (unchanged)
– Type map: Iterator (unchanged)
Step 4: Common Edit Context Extraction (3/4)
Maximum Common Embedded Subtree Extraction (MCESE) [Lozano et al. 2004]
[Figure: example trees written as number sequences, e.g., 1,2,3,-3,-2,-1 and 1,2,-2,3,-3,-1]
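The number sequences in the figure appear to encode ordered trees depth-first, emitting +label when a node is entered and -label when it is left. Assuming that reading (it is an interpretation of the figure, not stated on the slide), the encoding can be sketched in Java as below. This shows only the tree encoding, not the MCESE algorithm itself; the class names are hypothetical and labels are assumed to be positive integers.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: encode an ordered tree as a depth-first sequence that emits +label
// when a node is entered and -label when it is left. Assumption: the numeric
// sequences on the slide follow this encoding; labels are positive integers.
final class TreeEncoding {
    static final class Node {
        final int label;
        final List<Node> children;

        Node(int label, Node... children) {
            this.label = label;
            this.children = List.of(children); // children kept in left-to-right order
        }
    }

    static List<Integer> encode(Node root) {
        List<Integer> out = new ArrayList<>();
        encodeInto(root, out);
        return out;
    }

    private static void encodeInto(Node node, List<Integer> out) {
        out.add(node.label);        // enter node
        for (Node child : node.children) {
            encodeInto(child, out); // visit children left to right
        }
        out.add(-node.label);       // leave node
    }
}
```

Under this reading, 1,2,3,-3,-2,-1 is the chain 1 → 2 → 3, while 1,2,-2,3,-3,-1 is node 1 with children 2 and 3.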
Step 4: Common Edit Context Extraction (4/4)
Program dependence analysis
Example statements in mA:
  Object next = …;
  while (e.hasNext()) {
  Iterator e = fActions.values().iterator();
Mapping (abstract identifier: identifier in mA, identifier in mB):
– Variable map: v$0: e, iter
– Method map: values, …
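One way to use dependence information when refining context is to keep only the statements that an edited statement transitively depends on. The Java sketch below assumes the direct dependence edges (data or control) are already computed and supplied as a map; the class name and representation are hypothetical, and this is not presented as Lase's exact analysis. In the example statements above, and assuming Object next = … sits inside the loop, it would depend on while (e.hasNext()), which in turn uses e from Iterator e = fActions.values().iterator();.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: keep only context statements that an edited statement transitively
// depends on. Assumption: direct dependence edges (data or control) are
// precomputed and given as a map from a statement to the statements it
// directly depends on.
final class DependenceContext {
    static Set<String> relevantContext(String edited, Map<String, List<String>> dependsOn) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(edited);
        while (!work.isEmpty()) {
            String stmt = work.pop();
            for (String dep : dependsOn.getOrDefault(stmt, List.of())) {
                if (visited.add(dep)) {
                    work.push(dep); // explore dependences of a newly reached statement
                }
            }
        }
        visited.remove(edited); // the edited statement itself is not context
        return visited;
    }
}
```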
What if there are more than two examples?
[Figure: examples A, B, C, and D (old → new) combined into common edit scripts E_AB, E_AC, E_AD, E_ABC, E_ACD, and E_ABCD]
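A simple way to extend the two-example common edit to more examples is to fold a pairwise longest common subsequence step across the scripts, so that E_AB = lcs(A, B), E_ABC = lcs(E_AB, C), and so on. The Java sketch below illustrates that idea under the assumption that edit operations compare by string label; it is not necessarily Lase's exact procedure, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: common edit across more than two examples by folding a pairwise
// longest-common-subsequence step: E_AB = lcs(A, B), E_ABC = lcs(E_AB, C), ...
// Assumption: edit operations compare by string label. This illustrates the
// idea and is not necessarily Lase's exact procedure.
final class MultiExampleCommonEdit {
    static List<String> lcs(List<String> a, List<String> b) {
        // len[i][j] = LCS length of a[i..] and b[j..]
        int[][] len = new int[a.size() + 1][b.size() + 1];
        for (int i = a.size() - 1; i >= 0; i--) {
            for (int j = b.size() - 1; j >= 0; j--) {
                len[i][j] = a.get(i).equals(b.get(j))
                        ? len[i + 1][j + 1] + 1
                        : Math.max(len[i + 1][j], len[i][j + 1]);
            }
        }
        List<String> out = new ArrayList<>();
        for (int i = 0, j = 0; i < a.size() && j < b.size(); ) {
            if (a.get(i).equals(b.get(j))) {
                out.add(a.get(i));
                i++;
                j++;
            } else if (len[i + 1][j] >= len[i][j + 1]) {
                i++;
            } else {
                j++;
            }
        }
        return out;
    }

    /** Folds lcs over the scripts in the given order; requires at least one script. */
    static List<String> commonEdit(List<List<String>> scripts) {
        List<String> common = scripts.get(0);
        for (int k = 1; k < scripts.size(); k++) {
            common = lcs(common, scripts.get(k));
        }
        return common;
    }
}
```

Pairwise folding is a heuristic: in general its result can depend on the order of the examples and may be shorter than the longest subsequence common to all scripts.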
Before:
public void setBackgroundPattern(Pattern pattern) {
  if (handle == 0) SWT.error(SWT.ERROR_GRAPHIC_DISPOSED);
  if (pattern == null) SWT.error(SWT.ERROR_NULL_ARGUMENT);
  if (pattern.isDisposed()) SWT.error(SWT.ERROR_INVALID_ARGUMENT);
  initGdip(false, false);
  if (data.gdipBrush != 0) destroyGdipBrush(data.gdipBrush);
  data.gdipBrush = Gdip.Brush_Clone(pattern.handle);
  data.backgroundPattern = pattern;
}
After:
public void setBackgroundPattern(Pattern pattern) {
  if (handle == 0) SWT.error(SWT.ERROR_GRAPHIC_DISPOSED);
  if (pattern != null && pattern.isDisposed()) SWT.error(SWT.ERROR_INVALID_ARGUMENT);
  initGdip(false, false);
  if (data.gdipBrush != 0) destroyGdipBrush(data.gdipBrush);
  if (pattern != null) {
    data.gdipBrush = Gdip.Brush_Clone(pattern.handle);
  } else {
    data.gdipBrush = 0;
  }
  data.backgroundPattern = pattern;
}