Lase: Locating and Applying Systematic Edits by Learning from Examples
Na Meng*, Miryung Kim*, Kathryn S. McKinley*+
*The University of Texas at Austin, +Microsoft Research

Lase: Locating and Applying Systematic Edits by Learning from Examples
Na Meng*, Miryung Kim*, Kathryn S. McKinley*+
*The University of Texas at Austin
+Microsoft Research

Motivating Scenario
Pat needs to update database transaction code to prevent SQL injection attacks.
[Figure: three methods, each in an old and a new version: A old → A new, B old → B new, C old → C new]

Systematic Editing
– Similar but not identical changes to multiple contexts
– Manual, tedious, and error-prone
– Source transformation tools require describing edits in a formal language [CHP91, ER02, LR95]
– Bug fixing tools handle simple stylized code changes [WNGF09, JZDLL12, SMS13]
– Sydit does not find edit locations automatically when applying systematic edits [MKM11]

Lase Workflow
– User selects examples: A old → A new, B old → B new
– Lase selects methods and suggests edits: D old → D suggested, I old → I suggested, X old → X suggested, …

Approach Overview
– Phase I. Create edit script: syntactic program differencing (A old → A new, B old → B new), identify common edit, generalize identifier, extract context
– Phase II. Find edit location: each candidate method (C old, D old) is marked ✗ no match or ✔ match
– Phase III. Apply edit script: D old → D new
Generalization example: Object next = e.next(); and Object next = iter.next(); become Object next = v$0.next();

Step 1: Syntactic Program Differencing
Input: m old, m new
Output: Edit operations

Step 2: Identify Common Edit
Longest Common Edit Operation Subsequence
Edit script A:
  insert(Object next = e.next()…)
  insert(if(next instanceof MVAction))
  insert(((MVAction)next).update())
  update(print(next.toString())) to …
Edit script B:
  insert(Object next = iter.next()…)
  update(print(next.getString())) to …
  insert(if(next instanceof MVAction))
  insert(((MVAction)next).update())
  delete(System.out.println(…))
Common edit as it appears in A:
  insert(Object next = e.next()…)
  insert(if(next instanceof MVAction))
  insert(((MVAction)next).update())
Common edit as it appears in B:
  insert(Object next = iter.next()…)
  insert(if(next instanceof MVAction))
  insert(((MVAction)next).update())
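The longest-common-subsequence idea behind Step 2 can be sketched as follows. This is a minimal illustration, not Lase's implementation. It makes three assumptions: edit operations are simplified to plain strings, `CommonEditSketch` and `normalize` are hypothetical names, and two operations count as equivalent when they are equal after the variables `e` and `iter` are replaced by one placeholder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.regex.Matcher;

class CommonEditSketch {
    // Hypothetical normalization: treat the variables e and iter as one name.
    static String normalize(String op) {
        return op.replaceAll("\\b(e|iter)\\b", Matcher.quoteReplacement("v$0"));
    }

    // Classic dynamic-programming LCS over two edit scripts.
    // Returns the common operations as they appear in script a.
    static List<String> longestCommonEdits(List<String> a, List<String> b,
                                           BiPredicate<String, String> same) {
        int n = a.size(), m = b.size();
        int[][] len = new int[n + 1][m + 1]; // len[i][j]: LCS of a[i..] and b[j..]
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                len[i][j] = same.test(a.get(i), b.get(j))
                        ? len[i + 1][j + 1] + 1
                        : Math.max(len[i + 1][j], len[i][j + 1]);
            }
        }
        List<String> common = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n && j < m) { // walk the table to recover one LCS
            if (same.test(a.get(i), b.get(j))) {
                common.add(a.get(i));
                i++;
                j++;
            } else if (len[i + 1][j] >= len[i][j + 1]) {
                i++;
            } else {
                j++;
            }
        }
        return common;
    }

    public static void main(String[] args) {
        List<String> scriptA = Arrays.asList(
                "insert(Object next = e.next())",
                "insert(if(next instanceof MVAction))",
                "insert(((MVAction)next).update())",
                "update(print(next.toString()))");
        List<String> scriptB = Arrays.asList(
                "insert(Object next = iter.next())",
                "update(print(next.getString()))",
                "insert(if(next instanceof MVAction))",
                "insert(((MVAction)next).update())",
                "delete(System.out.println())");
        BiPredicate<String, String> same =
                (x, y) -> normalize(x).equals(normalize(y));
        for (String op : longestCommonEdits(scriptA, scriptB, same)) {
            System.out.println(op);
        }
    }
}
```

With the two scripts from the slide, the sketch keeps the three insert operations and drops the update and delete operations, which have no counterpart in the other script.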

Step 3: Generalize Identifier
– Keep the original type, method, and variable names if examples agree
– Abstract identifiers if examples disagree
Map | Generalized name | Name in mA | Name in mB
Variable Map | next | (same in mA and mB)
Variable Map | v$0 | e | iter
Method Map | next | (same in mA and mB)
Type Map | Object | (same in mA and mB)
mA: Object next = e.next();
mB: Object next = iter.next();
Generalized: Object next = v$0.next();
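The keep-or-abstract rule can be sketched on two aligned identifier sequences. This is an illustrative simplification with hypothetical names (`GeneralizeIdentifierSketch`, `generalize`). It assumes the identifiers of the two examples are already aligned position by position, and it uses a single `v$` namespace, whereas the slides keep separate variable, method, and type maps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GeneralizeIdentifierSketch {
    // abstractNames maps a disagreeing pair "nameInA|nameInB" to its abstract
    // name, so the same pair always receives the same abstract name.
    static List<String> generalize(List<String> idsA, List<String> idsB,
                                   Map<String, String> abstractNames) {
        if (idsA.size() != idsB.size()) {
            throw new IllegalArgumentException("identifier sequences must be aligned");
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < idsA.size(); i++) {
            String a = idsA.get(i);
            String b = idsB.get(i);
            if (a.equals(b)) {
                out.add(a); // examples agree: keep the original name
            } else {
                String key = a + "|" + b;
                String name = abstractNames.get(key);
                if (name == null) { // examples disagree: introduce v$k
                    name = "v$" + abstractNames.size();
                    abstractNames.put(key, name);
                }
                out.add(name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> names = new LinkedHashMap<>();
        // Identifiers of "Object next = e.next();" and "Object next = iter.next();"
        List<String> generalized = generalize(
                Arrays.asList("Object", "next", "e", "next"),
                Arrays.asList("Object", "next", "iter", "next"),
                names);
        System.out.println(generalized);
        System.out.println(names);
    }
}
```

Passing the same map to later calls keeps the mapping consistent across statements, so `e`/`iter` would again become `v$0` in the surrounding context.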

Step 4: Extract Context
mA edit: Object next = e.next();
mA context: Iterator e = fActions.values().iterator(); … while(e.hasNext())
mB edit: Object next = iter.next();
mB context: Iterator iter = getActions().values().iterator(); … while(iter.hasNext())
Generalized context: Iterator v$0 = u$0:FieldAccessOrMethodInvocation.values().iterator(); … while(v$0.hasNext())
Map | Generalized name | Name in mA | Name in mB
Uncertain Map | u$0:FieldAccessOrMethodInvocation | fActions | getActions()
Variable Map | v$0 | e | iter
Method Map | values, iterator, hasNext | (same in mA and mB)
Type Map | Iterator | (same in mA and mB)

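Phase II. Find Edit Locations
The generalized context learned from A old → A new and B old → B new is matched against candidate method D old.
D old: Iterator e = fActions.values().iterator();
Generalized context: Iterator v$0 = u$0:FieldAccessOrMethodInvocation.values().iterator();
Map | Generalized name | Name in mD
Uncertain Map | u$0:FieldAccessOrMethodInvocation | fActions
Variable Map | v$0 | e
Method Map | values, iterator | (same in mD)
Type Map | Iterator | (same in mD)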
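The matching step can be sketched as binding abstract names to concrete ones. This is a deliberately simplified sketch: it compares flat token lists, whereas the slides describe context matching on program structure. The names `FindLocationSketch`, `match`, and `isAbstract` are hypothetical, as is the tokenization used in `main`.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FindLocationSketch {
    // Tokens starting with v$ or u$ are abstract names from the generalized context.
    static boolean isAbstract(String token) {
        return token.startsWith("v$") || token.startsWith("u$");
    }

    // Returns the abstract-to-concrete binding if the candidate matches the
    // pattern, or null if it does not. Each abstract name must bind to the
    // same concrete token everywhere it occurs.
    static Map<String, String> match(List<String> pattern, List<String> candidate) {
        if (pattern.size() != candidate.size()) {
            return null;
        }
        Map<String, String> binding = new LinkedHashMap<>();
        for (int i = 0; i < pattern.size(); i++) {
            String p = pattern.get(i);
            String c = candidate.get(i);
            if (isAbstract(p)) {
                String previous = binding.putIfAbsent(p, c);
                if (previous != null && !previous.equals(c)) {
                    return null; // inconsistent binding
                }
            } else if (!p.equals(c)) {
                return null; // concrete tokens must agree exactly
            }
        }
        return binding;
    }

    public static void main(String[] args) {
        List<String> pattern = Arrays.asList(
                "Iterator", "v$0", "=", "u$0", ".values()", ".iterator()", ";",
                "while", "(", "v$0", ".hasNext()", ")");
        List<String> methodD = Arrays.asList(
                "Iterator", "e", "=", "fActions", ".values()", ".iterator()", ";",
                "while", "(", "e", ".hasNext()", ")");
        System.out.println(match(pattern, methodD)); // binding for v$0 and u$0
    }
}
```

A non-null result marks the candidate as an edit location, and the binding is what the next phase needs to turn the general edit script into a concrete one.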

Phase III. Applying Edit Script
Customize general edit scripts
– Identifier concretization
– Edit position concretization
Apply the customized edit scripts
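Identifier concretization can be sketched as a substitution of bound abstract names. This is a minimal sketch with hypothetical names (`ConcretizeSketch`, `concretize`) that works on edit text as plain strings. Edit position concretization is not covered here. Abstract names with no binding are left in place, which is one way a suggested edit can still contain names such as `v$1`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConcretizeSketch {
    // Abstract names look like v$0 or u$3.
    private static final Pattern ABSTRACT = Pattern.compile("[vu]\\$\\d+");

    // Replaces every bound abstract name with its concrete counterpart.
    static String concretize(String generalized, Map<String, String> binding) {
        Matcher m = ABSTRACT.matcher(generalized);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String concrete = binding.getOrDefault(m.group(), m.group());
            // quoteReplacement keeps '$' in unbound names from being read
            // as a regex group reference.
            m.appendReplacement(sb, Matcher.quoteReplacement(concrete));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> binding = new LinkedHashMap<>();
        binding.put("v$0", "e");
        System.out.println(concretize("Object next = v$0.next();", binding));
        System.out.println(concretize("if (this.v$1 >= 0)", binding)); // v$1 is unbound
    }
}
```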

Example 1:
  Comment[] getLeadingComments(ASTNode node) {
-   if (this.leadingComments != null) {
+   if (this.leadingPtr >= 0) {
-     int[] range = (int[]) this.leadingComments.get(node);
+     int[] range = null;
+     for (int i = 0; range == null && i <= this.leadingPtr; i++) {
+       if (this.leadingNodes[i] == node)
+         range = this.leadingIndexes[i];
+     }
      if (range != null) {
        … …
        return leadingComments;
      }
    }
    return null;
  }
Example 2:
  Comment[] getTrailingComments(ASTNode node) {
-   if (this.trailingComments != null) {
+   if (this.trailingPtr >= 0) {
-     int[] range = (int[]) this.trailingComments.get(node);
+     int[] range = null;
+     for (int i = 0; range == null && i <= this.trailingPtr; i++) {
+       if (this.trailingNodes[i] == node)
+         range = this.trailingIndexes[i];
+     }
      if (range != null) {
        … …
        return trailingComments;
      }
    }
    return null;
  }
Generalized edit script:
  update (if (this.v$0 != null)) to (if (this.v$1 >= 0))
  insert (int[] range = null; …)
  insert (for (int i = 0; range == null && i <= this.v$1; i++) …)
  insert (if (this.v$2[i] == node) …)
  insert (range = this.v$3[i]; …)
  delete (int[] range = (int[]) this.v$0.get(node);)

Found location:
  public int getExtendedEnd(ASTNode node) {
    int end = node.getStartPosition() + node.getLength();
    if (this.trailingComments != null) {
      int[] range = (int[]) this.trailingComments.get(node);
      if (range != null) {
        … …
      }
    } else {
      … …
    }
    return end - 1;
  }
Suggested version:
  public int getExtendedEnd(ASTNode node) {
    int end = node.getStartPosition() + node.getLength();
    if (this.v$1 >= 0) {
      int[] range = null;
      for (int i = 0; range == null && i <= this.v$1; i++) {
        if (this.v$2[i] == node)
          range = this.v$3[i];
      }
      if (range != null) {
        … …
      }
    } else {
      … …
    }
    return end - 1;
  }

Outline
Phase I: Creating Abstract Edit Scripts
– Syntactic Program Diff
– Identify Common Edit
– Generalize Identifier
– Extract Context
Phase II: Find Edit Locations
Phase III: Apply Edit Script
Evaluation

Test Suite
24 repetitive bug fixes that require multiple check-ins [Park et al., MSR 2012]
– 2 from Eclipse JDT and 22 from Eclipse SWT
– Each bug is fixed in multiple commits
– Clones of at least two lines between patches checked in at different times
– We use the first two changed methods as input examples
37 systematic edits that require similar changes to different methods

RQ1: Precision, Recall, and Accuracy
– Precision (P): What percentage of all found locations are correctly identified?
– Recall (R): What percentage of all expected locations are correctly identified?
– Accuracy (A): How similar is the Lase-generated version to the developer-generated version?
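The precision and recall definitions can be sketched over sets of method names. The names `LocationMetricsSketch`, `precision`, and `recall` are hypothetical, the method names in `main` are made-up illustration data rather than results from the evaluation, and returning 0 for an empty set is an assumed convention. Accuracy is left out because the slide does not state which similarity measure it uses.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class LocationMetricsSketch {
    // Number of found locations that are also expected locations.
    static int correct(Set<String> found, Set<String> expected) {
        Set<String> both = new HashSet<>(found);
        both.retainAll(expected);
        return both.size();
    }

    // Percentage of found locations that are correct.
    static double precision(Set<String> found, Set<String> expected) {
        if (found.isEmpty()) {
            return 0.0; // assumed convention when nothing is found
        }
        return 100.0 * correct(found, expected) / found.size();
    }

    // Percentage of expected locations that were found.
    static double recall(Set<String> found, Set<String> expected) {
        if (expected.isEmpty()) {
            return 0.0; // assumed convention when nothing is expected
        }
        return 100.0 * correct(found, expected) / expected.size();
    }

    public static void main(String[] args) {
        // Illustrative data only.
        Set<String> found = new HashSet<>(Arrays.asList("m1", "m2", "m3"));
        Set<String> expected = new HashSet<>(Arrays.asList("m1", "m2", "m3", "m4"));
        System.out.println("P% = " + precision(found, expected));
        System.out.println("R% = " + recall(found, expected));
    }
}
```

In this illustration every found location is expected but one expected location is missed, so precision is 100% and recall is 75%.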

On average, Lase finds edit locations with 99% precision, 89% recall, and 91% accuracy. For three bugs, Lase suggests in total 9 edits that developers missed and later confirmed.
[Table: per-bug results, indexed by bug (number of patches): edit locations (Σ, ✔, P%, R%, A%) and edit operations]

RQ2: Sensitivity to number of exemplar edits
– 7 cases in the oracle data set
– Enumerate subsets of exemplar edits

[Table: P%, R%, and A% by number of exemplars, for three case indexes]
As the number of exemplar edits increases:
– P does not change because exemplar edits are similar, except for case 12
– R is more sensitive to the number of exemplar edits
– R increases as a function of exemplar edits
– A decreases when exemplar edits are different
– A remains the same or increases when the exemplar edits are very similar

Conclusion
– Lase automates edit location search and program transformation application
– Lase achieves 99% precision, 89% recall, and 91% accuracy
Future Work
– Integrate with automated compilation and testing
– Automatically detect repetitive change examples to infer program transformations

Acknowledgement
This work was supported in part by the National Science Foundation under CAREER, CCF, and SHF grants, and by a Microsoft SEIF award.

References I
[Meng et al. 2011] N. Meng, M. Kim, and K. S. McKinley. Systematic editing: Generating program transformations from an example. In PLDI ’11.
[Kamiya et al. 2002] T. Kamiya, S. Kusumoto, and K. Inoue. CCFinder: A multilinguistic token-based code clone detection system for large scale source code. In TSE ’02.
[Lozano et al. 2004] A. Lozano and G. Valiente. On the maximum common embedded subtree problem for ordered trees. In C. Iliopoulos and T. Lecroq, editors, String Algorithmics.
[Park et al. MSR 2012] J. Park, M. Kim, B. Ray, and D.-H. Bae. An empirical study of supplementary bug fixes. In MSR ’12.

References II
[JZDLL12] G. Jin, W. Zhang, D. Deng, B. Liblit, and S. Lu. Automated concurrency bug fixing. In PLDI ’12.
[CHP91] J. R. Cordy, C. D. Halpern, and E. Promislow. TXL: A rapid prototyping system for programming language dialects. Computer Languages.
[G10] S. Gulwani. Dimensions in program synthesis. In PPDP ’10.
[WNGF09] W. Weimer, T. Nguyen, C. Le Goues, and S. Forrest. Automatically finding patches using genetic programming. In ICSE ’09.

References III
[ER02] M. Erwig and D. Ren. A rule-based language for programming software updates. In RULE ’02.
[LR95] D. A. Ladd and J. C. Ramming. A*: A language for implementing language processors. In TSE ’95.
[SMS13] S. Son, K. S. McKinley, and V. Shmatikov. Fix Me Up: Repairing access-control bugs in web applications. In NDSS ’13.

Step 4: Common Edit Context Extraction
Extract all potential common context
Refine the common context
– Consistent identifier mapping
– Embedded subtree isomorphism
– Program dependence equivalence

Step 4: Common Edit Context Extraction (1/4)
Finding common text with clone detection (CCFinder [Kamiya et al. 2002])

Step 4: Common Edit Context Extraction (2/4)
Identifier generalization
mA: Iterator e = fActions.values().iterator(); while (e.hasNext()) {
mB: Iterator iter = getActions().values().iterator(); while (iter.hasNext()) {
Generalized: Iterator v$0 = u$0:FieldAccessOrMethodInvocation.values().iterator(); while (v$0.hasNext()) {
Map | Abstract identifier | Identifier in mA | Identifier in mB
Uncertain Map | u$0:FieldAccessOrMethodInvocation | fActions | getActions()
Variable Map | v$0 | e | iter
Method Map | values, iterator, hasNext | (same in mA and mB)
Type Map | Iterator | (same in mA and mB)

Step 4: Common Edit Context Extraction (3/4)
Maximum Common Embedded Subtree Extraction (MCESE) [Lozano et al. 2004]
[Figure: two ordered trees encoded as node sequences (…,2,3,-3,-2,-1 and 1,2,-2,3,-3,-1) and their common embedded subtree (1,2,-2,…)]

Step 4: Common Edit Context Extraction (4/4)
Program dependence analysis
Map | Abstract identifier | Identifier in mA | Identifier in mB
Variable Map | v$0 | e | iter
Method Map | values … | ….
[Figure: dependences among the statements Object next = e.next();, while (e.hasNext()) {, and Iterator e = fActions.values().iterator();]

What if there are more than two examples?
[Figure: examples A, B, C, and D, each with an old and a new version; edit scripts E AB, E AC, and E AD lead to E ABC and E ACD, and then to E ABCD]

public void setBackgroundPattern(Pattern pattern) {
  if (handle == 0) SWT.error(SWT.ERROR_GRAPHIC_DISPOSED);
  if (pattern == null) SWT.error(SWT.ERROR_NULL_ARGUMENT);
  if (pattern.isDisposed()) SWT.error(SWT.ERROR_INVALID_ARGUMENT);
  initGdip(false, false);
  if (data.gdipBrush != 0) destroyGdipBrush(data.gdipBrush);
  data.gdipBrush = Gdip.Brush_Clone(pattern.handle);
  data.backgroundPattern = pattern;
}
public void setBackgroundPattern(Pattern pattern) {
  if (handle == 0) SWT.error(SWT.ERROR_GRAPHIC_DISPOSED);
  if (pattern != null && pattern.isDisposed()) SWT.error(SWT.ERROR_INVALID_ARGUMENT);
  initGdip(false, false);
  if (data.gdipBrush != 0) destroyGdipBrush(data.gdipBrush);
  if (pattern != null) {
    data.gdipBrush = Gdip.Brush_Clone(pattern.handle);
  } else {
    data.gdipBrush = 0;
  }
  data.backgroundPattern = pattern;
}