Accurate and Efficient Refactoring Detection in Commit History

Slides:



Advertisements
Similar presentations
Mahadevan Subramaniam and Bo Guo University of Nebraska at Omaha An Approach for Selecting Tests with Provable Guarantees.
Advertisements

A Randomized Dynamic Program Analysis for Detecting Real Deadlocks Koushik Sen CS 265.
Chapter 7 User-Defined Methods. Chapter Objectives  Understand how methods are used in Java programming  Learn about standard (predefined) methods and.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Extraction of.
Refactoring-aware Configuration Management for Object-Oriented Programs Written by: Danny Dig, Kashif Manzoor, Ralph Johnson, and Tien Nguyen ACM International.
Aki Hecht Seminar in Databases (236826) January 2009
A Tool Support to Merge Similar Methods with a Cohesion Metric COB ○ Masakazu Ioka 1, Norihiro Yoshida 2, Tomoo Masai 1,Yoshiki Higo 1, Katsuro Inoue 1.
Memories of Bug Fixes Sunghun Kim, Kai Pan, and E. James Whitehead Jr., University of California, Santa Cruz Presented By Gleneesha Johnson CMSC 838P,
Demo of Software Refactoring with Eclipse Giriprasad Sridhara CISC 879 Spring 2007 May
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University Regression Test Selection for AspectJ Software Guoqing Xu and Atanas.
Michael Ernst, page 1 Improving Test Suites via Operational Abstraction Michael Ernst MIT Lab for Computer Science Joint.
1 Program Comprehension through Dynamic Analysis Visualization, evaluation, and a survey Bas Cornelissen (et al.) Delft University of Technology IPA Herfstdagen,
Testing a program Remove syntax and link errors: Look at compiler comments where errors occurred and check program around these lines Run time errors:
Concordia University Department of Computer Science and Software Engineering Click to edit Master title style ADVANCED PROGRAMING PRACTICES API documentation.
Systematic Editing: Generating Program Transformations from an Example Na Meng Miryung Kim Kathryn S. McKinley The University of Texas at Austin.
Unit Testing & Defensive Programming. F-22 Raptor Fighter.
REFACTORING Lecture 4. Definition Refactoring is a process of changing the internal structure of the program, not affecting its external behavior and.
A Visual Comparison Approach to Automated Regression Testing (PDF to PDF Compare)
Mining Function Usage Patterns to Find Bugs Chadd Williams.
University of Maryland Bug Driven Bug Finding Chadd Williams.
Change Impact Analysis for AspectJ Programs Sai Zhang, Zhongxian Gu, Yu Lin and Jianjun Zhao Shanghai Jiao Tong University.
Mining and Analysis of Control Structure Variant Clones Guo Qiao.
Reviewing Recent ICSE Proceedings For:.  Defining and Continuous Checking of Structural Program Dependencies  Automatic Inference of Structural Changes.
CMCD: Count Matrix based Code Clone Detection Yang Yuan and Yao Guo Key Laboratory of High-Confidence Software Technologies (Ministry of Education) Peking.
Which Configuration Option Should I Change? Sai Zhang, Michael D. Ernst University of Washington Presented by: Kıvanç Muşlu.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University How to extract.
Generative Programming. Automated Assembly Lines.
Formal Semantics Chapter Twenty-ThreeModern Programming Languages, 2nd ed.1.
Mining Billions of AST Nodes to Study Actual and Potential Usage of Java Language Features Robert Dyer The research activities described in this talk were.
Exploiting Context Analysis for Combining Multiple Entity Resolution Systems -Ramu Bandaru Zhaoqi Chen Dmitri V.kalashnikov Sharad Mehrotra.
Chapter 5 Control Structure (Repetition). Objectives In this chapter, you will: Learn about repetition (looping) control structures Explore how to construct.
Date: November 9, 2011 Presenter – Munawar Hafiz Assistant Professor, CSSE, Auburn University A Tale of Four Research Ideas.
Automated Patch Generation Adapted from Tevfik Bultan’s Lecture.
Chapter 8 Testing. Principles of Object-Oriented Testing Å Object-oriented systems are built out of two or more interrelated objects Å Determining the.
 2003 Prentice Hall, Inc. All rights reserved. 1 IS 0020 Program Design and Software Tools Preprocessor Midterm Review Lecture 7 Feb 17, 2004.
Scalable Clone Detection and Elimination for Erlang Programs Huiqing Li, Simon Thompson University of Kent Canterbury, UK.
Consensus-based Mining of API Preconditions in Big Code Hoan NguyenRobert DyerTien N. NguyenHridesh Rajan.
How to execute Program structure Variables name, keywords, binding, scope, lifetime Data types – type system – primitives, strings, arrays, hashes – pointers/references.
What kind of and how clones are refactored? A case study of three OSS projects WRT2012 June 1, Eunjong Choi†, Norihiro Yoshida‡, Katsuro Inoue†
Refined Online Citation Matching and Adaptive Canonical Metadata Construction CSE 598B Course Project Report Huajing Li.
Chapter 12: Support for Object- Oriented Programming Lecture # 18.
An Algorithm for Detecting and Removing Clones in Java Code Nicolas Juillerat & Béat Hirsbrunner Pervasive and Artificial Intelligence ( ) Research Group,
Experience Report: System Log Analysis for Anomaly Detection
Migrating CSS to Preprocessors by Introducing Mixins
Advanced Programing practices
Why We Refactor? Confessions of GitHub Contributors
CS 326 Programming Languages, Concepts and Implementation
Loop Structures.
Effective Data-Race Detection for the Kernel
CBCD: Cloned Buggy Code Detector
A Refactoring Technique for Large Groups of Software Clones
Log20: Fully Automated Optimal Placement of Log Printing Statements under Specified Overhead Threshold Xu Zhao*, Kirk Rodrigues*, Yu Luo*, Michael Stumm*,
Clone Refactoring with Lambda Expressions
Rename Local Variable Refactoring Instances
Ruru Yue1, Na Meng2, Qianxiang Wang1 1Peking University 2Virginia Tech
High Coverage Detection of Input-Related Security Faults
: Clone Refactoring Davood Mazinanian Nikolaos Tsantalis Raphael Stein
Software Refactoring Group
Masatomo Hashimoto Akira Mori Tomonori Izumida
Assessing the Refactorability of Software Clones
Adapted from slides by Nicholas Shahan and Dan Grossman
Advanced Programing practices
Nicholas Shahan Spring 2016
Java Programming Loops
Precise Condition Synthesis for Program Repair
Automatically Diagnosing and Repairing Error Handling Bugs in C
By Hyunsook Do, Sebastian Elbaum, Gregg Rothermel
Bug Localization with Combination of Deep Learning and Information Retrieval A. N. Lam et al. International Conference on Program Comprehension 2017.
Pointer analysis John Rollinson & Kaiyuan Li
C++ Object Oriented 1.
Presentation transcript:

Accurate and Efficient Refactoring Detection in Commit History My name is Nikolaos Tsantalis, associate professor at Concordia University, and I will present to you our work on Refactoring detection. The rest of co-authors are Matin (Master student in my group), Laleh (PDF in my group), Davood (PhD in my group), and Danny from Oregon State Univeristy. Nikolaos Tsantalis Matin Mansouri Laleh Eshkevari Davood Mazinanian Danny Dig May 31, 2018

Refactoring is noise in evolution analysis Bug-inducing analysis (SZZ): flag refactoring edits as bug-introducing changes Tracing requirements to code: miss traceability links due to refactoring Regression testing: unnecessary execution of tests for refactored code with no behavioral changes Code review/merging: refactoring edits tangled with the actual changes intended by developers

There are many refactoring detection tools Demeyer et al. [OOPSLA’00] UMLDiff + JDevAn [Xing & Stroulia ASE’05] RefactoringCrawler [Dig et al. ECOOP’06] Weißgerber and Diehl [ASE’06] Ref-Finder [Kim et al. ICSM’10, FSE’10] RefDiff [Silva & Valente, MSR’17]

Limitations of previous approaches Dependence on similarity thresholds thresholds need calibration for projects with different characteristics Dependence on built versions only 38% of the change history can be successfully compiled [Tufano et al., 2017] Unreliable oracles for evaluating precision/recall Incomplete (refactorings found in release notes or commit messages) Biased (applying a single tool with two different similarity thresholds) Artificial (seeded refactorings)

Why do we need better accuracy? Empirical studies Library adaptation Framework migration Refactoring detection poor accuracy

Why do we need better accuracy?

Contributions First refactoring detection algorithm operating without any code similarity thresholds RefactoringMiner open-source tool with an API Oracle comprising 3,188 refactorings found in 538 commits from 185 open-source projects Evaluation of precision/recall and comparison with previous state-of-the-art Tool infrastructure for comparing multiple refactoring detection tools

Approach in a nutshell AST-based statement matching algorithm Input: code fragments T1 from parent commit and T2 from child commit Output: M set of matched statement pairs UT1 set of unmatched statements from T1 UT2 set of unmatched statements from T2 Code changes due to refactoring mechanics: abstraction, argumentization Code changes due to overlapping refactorings or bug fixes: syntax-aware AST node replacements

Before After private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); return addresses; private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); return addresses; Before After

Before After private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); return addresses; private static List<Address> createAddresses(int count) { List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); return addresses; Before After

Before After private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); return addresses; private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", ports.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); return addresses; Before After

Before After private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); return addresses; private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", ports.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); return addresses; Before After

Before After private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); return addresses; private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { } return addresses; try { addresses[i] = new Address("127.0.0.1", ports.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); Before After

Before After private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); return addresses; private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; protected static Address createAddress(String host, int port) { try { return new Address(host, port); } catch (UnknownHostException e) { e.printStackTrace(); return null; Before After

textual similarity  30% Before After private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); return addresses; private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; protected static Address createAddress(String host, int port) { try { return new Address(host, port); } catch (UnknownHostException e) { e.printStackTrace(); return null; textual similarity  30% Before After

(1) Abstraction Before After private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); return addresses; private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; protected static Address createAddress(String host, int port) { try { return new Address(host, port); } catch (UnknownHostException e) { e.printStackTrace(); return null; (1) Abstraction Before After

(1) Abstraction Before After private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); return addresses; private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; protected static Address createAddress(String host, int port) { try { return new Address(host, port); } catch (UnknownHostException e) { e.printStackTrace(); return null; (1) Abstraction Before After

(2) Argumentization Before After private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); return addresses; private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; protected static Address createAddress(String host, int port) { try { return new Address(host, port); } catch (UnknownHostException e) { e.printStackTrace(); return null; (2) Argumentization Before After

(2) Argumentization Before After private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); return addresses; private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; protected static Address createAddress(String host, int port) { try { return new Address("127.0.0.1", ports.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); return null; (2) Argumentization Before After

(3) AST Node Replacements private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); return addresses; private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; protected static Address createAddress(String host, int port) { try { return new Address("127.0.0.1", ports.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); return null; (3) AST Node Replacements Before After

(3) AST Node Replacements private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); return addresses; private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; protected static Address createAddress(String host, int port) { try { return new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); return null; (3) AST Node Replacements Before After

textual similarity = 100% Before After private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); return addresses; private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; protected static Address createAddress(String host, int port) { try { return new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); return null; textual similarity = 100% Before After

M = {(C, 4) (D, 5) (E, 6) (F, 7)} UT1 = {A, B, G} UT2 = {8} Before private static Address[] createAddresses(int count) { Address[] addresses = new Address[count]; for (int i = 0; i < count; i++) { try { addresses[i] = new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); return addresses; private static List<Address> createAddresses(AtomicInteger ports, int count){ List<Address> addresses = new ArrayList<Address>(count); for (int i = 0; i < count; i++) { addresses.add(createAddress("127.0.0.1", ports.incrementAndGet())); } return addresses; A 1 B 2 C D 3 E F G 9 protected static Address createAddress(String host, int port) { try { return new Address("127.0.0.1", PORTS.incrementAndGet()); } catch (UnknownHostException e) { e.printStackTrace(); return null; M = {(C, 4) (D, 5) (E, 6) (F, 7)} UT1 = {A, B, G}  UT2 = {8} 4 5 6 7 Before After 8

Extract Method detection rule (M, UT1, UT2) = statement-matching(createAddresses, createAddress) M = {(C, 4) (D, 5) (E, 6) (F, 7)} UT1 ={A, B, G} UT2 = {8} createAddress is a newly added method in child commit  createAddresses in parent commit does not call createAddress  createAddresses in child commit calls createAddress  |M| > |UT2|   createAddress has been extracted from createAddresses

Evaluation RQ1: What is the accuracy of RefactoringMiner and how does it compare to the state-of-the-art? RQ2: What is the execution time of RefactoringMiner and how does it compare to the state-of-the-art?

Oracle construction Public dataset with validated refactoring instances: 538 commits from 185 open-source projects [Silva et al., FSE’2016 Distinguished Artifact] We executed two tools: RefactoringMiner and RefDiff [Silva & Valente, MSR’2017] We manually validated all 4,108 detected instances with 3 validators for a period of 3 months 3,188 true positives and 920 false positives

Comparison with state-of-the-art RefDiff [Silva & Valente, MSR’2017] Commit-based refactoring detection tool Evaluation on 448 seeded refactorings in 20 open-source projects RefDiff has much higher precision/recall than Ref-Finder [Kim et al. 2010] and RefactoringCrawler [Dig et al. 2006] Ref-Finder and RefactoringCrawler need fully built Eclipse projects as input

RMiner better precision in all refactoring types In half of the types RefDiff has better recall Overall RMiner has +22% precision and +1.5% recall

Advantage of RefDiff Treats code fragments as bags of tokens and ignores the structure

Disadvantage of RefDiff Inability to deal with changes in the tokens

Execution time per commit [ms] On median, RefactoringMiner is 7 times faster than RefDiff ms

Limitations + Future work Missing context: Pull Up reported as Move, if a class between the source and destination is unchanged. Nested refactorings: unable to detect Extract Method applied within an extracted method Unsupported refactorings: refactoring types, such as Rename Variable/Parameter/Field, Extract/Inline Variable can be supported from the analysis of AST replacements. Oracle bias: plan to add more tools for constructing the oracle (challenge: make tools work without binding information)

Conclusions RefactoringMiner: commit-based refactoring detection https://github.com/tsantalis/RefactoringMiner RefactoringMiner: commit-based refactoring detection No similarity thresholds High accuracy: 98% precision, 87% recall Ultra-fast: 58ms on median per commit Better than competitive tools (RefDiff): +22% precision, 7 times faster Largest and least biased refactoring oracle up to date 3188 true refactoring instances 538 commits 185 open-source projects 3 validators over 3 months (9 person-months) http://refactoring.encs.concordia.ca/oracle/