Presentation is loading. Please wait.

Presentation is loading. Please wait.

On Refactoring Support Based on Code Clone Dependency Relation

Similar presentations


Presentation on theme: "On Refactoring Support Based on Code Clone Dependency Relation"— Presentation transcript:

1 On Refactoring Support Based on Code Clone Dependency Relation
Norihiro Yoshida1, Yoshiki Higo1, Toshihiro Kamiya2, Shinji Kusumoto1, Katsuro Inoue1 1Osaka University 2National Institute of Advanced Industrial Science and Technology Thank you chairperson, My name is Norihiro Yoshida, I’m a master student of the Osaka university in Japan. The title of my presentation is “On Refactoring Support Based on Code Clone Dependency Relation” 2005/09/20 METRICS 2005

2 Background(1) What is a code clone?
A set of code fragments identical or similar to each other Introduced in source program by various reasons such as reusing code by `copy-and-paste’ Make software maintenance more difficult Code Clone At first, I will talk about the background of our research. Code clone is a set of code fragments identical or similar to each other. Code clone is introduced by various reasons such as reusing code by ‘copy-and-paste’. And it is generally said that code clone is one of the factors that make software maintenance more difficult. This figure represents an example of code clone. Code fragments in the code clone are in the same or different files. If we modify one of them, it is necessary to determine whether or not we have to modify the others. 2005/09/20 METRICS 2005

3 Background(2) Refactoring on Code Clones
Refactoring[1] is a way to deal with code clone problem. Refactoring is a technique for restructuring an existing code Alter software’s internal structure without changing its external behavior [2] Improve the maintainability of software Number one in the stink parade is duplicate code [1] New method Call statements Refactoring is a way to deal with code clone problem. Refactoring is a technique for restructuring an existing code, altering its internal structure without changing its external behavior. That is to say, refactoring can improve the maintainability of a software. Martin Fowler is the expert in refactoring. He said that number one in the stink parade is duplicate code. This means that code clone provides a big opportunity to apply refactoring. This figure represents a refactoring on a code clone. In refactoring, we merge the code clone into a new method and replace it by call statements. [1] M. Fowler, Refactoring: improving the design of existing code, Addison Wesley, 1999. [2] 2005/09/20 METRICS 2005

4 Motivation There are dependency relations between methods belonging to the different code clone. Method a Merging (Refactoring) Method a1 Method a2 Code clone A Method b Merging (Refactoring) Method b1 Method b2 Code clone B Next, I will talk about the motivation of our research. In actual Java programs, there are dependency relations between methods belonging to the different code clones This figure is an example of such code clones. Method a1 and a2 belong to the code clone A, and method b1 and b2 belong to the code clone B. The methods in code clone A depend on the methods in code clone B. The relation between code clone B and code clone C is the same. In our previous research, we provided a refactoring guidance for each code clone. For example, in this case, the previous method suggests three separate refactorings. Method c Merging (Refactoring) Method c1 Method c2 Code clone C 2005/09/20 METRICS 2005

5 Motivation There are dependency relations between methods belonging to the different code clone.  Merging all of the code clones at once is more effective Method a1 Method a2 Method a Merging Method b1 Method b2 Method b But, we consider that merging all of the code clones at once is more effective. Method c1 Method c2 Method c 2005/09/20 METRICS 2005

6 Research Goal Define a set of code clones having dependency relations as a chained clone Propose a effective refactoring support method for the chained clone Chained Clone Method a1 Method a2 Method b1 Method b2 Therefore, we define a set of code clones having dependency relations as a chained clone and propose a more effective refactoring support method. Method c1 Method c2 2005/09/20 METRICS 2005

7 Definition of chained clone(1)
Chained Method A set of methods that hold dependency relations Chained Method Graph A node represents a method An edge represents a dependency relation Three types of labels for the dependency relation “Call” : Calling methods “Ai” : Sharing variable i in terms of assignment “Rj” : Sharing variable j in terms of reference A Chained Method Call Call Before going on with the definition of chained clone, let me explain some terminologies that will be used later. Chained method means a set of methods that hold dependency relations. Chained method can be transformed into a labeled graph representation named chained method graph. In chained method graph, a node represents a method and an edge represents a dependency relation. There are three types of labels for the dependency relation. The first one is “Call” It means calling a method. Second one is “Ai”(これはどう読みます? “えーさぶあい”でしょうか?) It means sharing variable I in terms of assignment. The third one is “Rj” It means sharing variable j in terms of reference. The upper-right figure shows an example of a chained method. Each Rectangle in this figure represent a method. In this example, the light blue method calls the black method and the yellow method. The yellow method and the green method share the variable x. The chained method can be transformed into the lower-right graph. Rx Ax A Chained Method Graph 2005/09/20 METRICS 2005

8 Definition of chained clone(2)
CM1 CM2 Chained Clone For 2 given chained methods CM1 and CM2, we transform them into chained method graphs G1 and G2. For G1 and G2, if the following three conditions are satisfied, we call the pair of CM1 and CM2 as a chained clone. G1 and G2 are isomorphic. Each pair of the corresponding nodes between G1 and G2, holds a clone relation. In G1 and G2, labels of the corresponding edge are identical. Chained Clone Set An equivalence class of chained clones G1 G2 Call Call Call Call Rx Ax Rx Ax Next, the definition of chained clone. For 2 given chained methods CM1 and CM2, we transform them into the chained method graph G1 and G2 For G1 and G2, if the three following conditions are satisfied, we call the pair of CM1 and CM2 as a chained clone. First, G1 and G2 are isomorphic. Second, each pair of the corresponding nodes in G1 and G2, holds a clone relation . Third, in G1 and G2, labels of the corresponding edge are identical. The figure on the right shows the transformation from the chained methods into the chained method graphs. In the chained method graphs, a pair of nodes filled with the same color represents a code clone, and these graphs are isomorphic. For each pair of the corresponding nodes between them, holds a clone relation. In addition, labels of the corresponding edge are identical. Therefore, a pair of CM1 and CM2 is called a chained clone. An equivalence class of a chained clone is called a Chained Clone Set. A pair of nodes filled with colored same color is a code clone 2005/09/20 METRICS 2005

9 Applicable Refactorings for Chained Clones
The following refactoring[1] can be applied to merge chained clones. Pull Up Method Refactoring Extract Method Refactoring Extract Super Class Refactoring According to the characteristics of a chained clone, we provide a different appropriate refactoring for it. “Pull Up Method” pattern”, “Extract Method” pattern and “Extract Super Class” pattern can be applied to merge chained clones. According to the characteristics of a chained clone, we provide a different appropriate refactoring pattern for it. [1] M. Fowler: Refactoring: Improving the Design of Existing Code, Addison-Wesley, 1999. 2005/09/20 METRICS 2005

10 Typical Chained Clones Case 1 : Extract Method Refactoring
All the methods in a chained clone that are contained in a single class. Before Refactoring After Refactoring Class A Class A Chained Clone Method a11 Method a12 Method a1 Method a21 Method a22 Method a2 To give a clear idea on the three refactoring patterns, I will use four cases to explain them. Case 1 is the case of all the methods in a chained clone that are contained in a single class. This figure shows an example of refactoring of Case 1. In this figure, there are two code clones. Method a11 and a12 have a clone relation. Method a21 and a12 also have a clone relation. In addition, method a11 depends on method a21. Also, method a12 depends on method a22. All methods can be merged into two new methods in the class A. And this can be achieved by applying the “Extract Method” refactoring. All methods can be merged into two new methods in the class A. (“Extract Method” Refactoring) 2005/09/20 METRICS 2005

11 Typical Chained Clones Case 2 : Pull Up Method Refactoring
All methods in a chained clone belong to classes that have common parent classes. All methods of each chained method are in the same class respectively. Before Refactoring After Refactoring Super Class Super Class Method 1 Class A Class B Chained Clone Method a1 Method 2 Method b1 Case 2 is when the chained clone satisfies these two conditions. First, all methods in a chained clone have a common parent class. Second, all methods of each chained method are in the same class respectively. This figure shows an example of refactoring of Case 2. Class A and B have the same superclass. /*Method a1 and b1 have a clone relation. Method a2 and b2 also have a clone relation. In addition, method a1 depends on method a2. Also, method b1 depends on method b2.*/ All methods of each code clone can be merged into a new method in the common parent class. And this can be achieved by applying the “Pull Up Method” refactoring. Method a2 Method b2 Class A Class B All methods of each code clone can be merged into a new method in the parent class. (“Pull Up Method” Refactoring) 2005/09/20 METRICS 2005

12 Typical Chained Clones Case 3 : Extract SuperClass Refactoring
Some methods in a chained clone belong to classes that have no common parent class. All method of each chained method are in the same class respectively. Before Refactoring After Refactoring New SuperClass Class A Class B Method 1 Chained Clone Method a1 Method b1 Method 2 Case 3 is when the chained clone satisfies these two conditions. First, some methods in a chained clone have no common parent class. Second, all methods of each chained method are in the same class respectively. This figure shows an example of refactoring of Case 3. There are class A and B in this figure. These classes do not have a common parent class. Class A has method a1 and a2. Class B has method b1 and b2. Method a1 and b1 have a clone relation. Method a2 and b2 also have a clone relation. In addition, method a1 depends on method a2. And, method b1 depends on method b2. All methods of each code clone can be merged into a new method in the new superclass. And this can be achieved by applying the “Extract SuperClass” refactoring. Method a2 Method b2 Class A Class B All methods of each code clone can be merged into a new method in the new superclass. (”Extract SuperClass” Refactoring) 2005/09/20 METRICS 2005

13 Typical Chained Clones Case 4 ( difficult to apply refactoring )
Chained methods exist in different classes. Class S1 Class S2 Chained Clone Class A Class B Class C Class D Method a Method b Method c Method d Case 4 is a case that chained methods exist in different classes This figure shows an example of refactoring of Case 4. There are classes A, B, C and D in this figure. Classes A, B, C and D have methods a, b, c and d respectively. In addition, method a depends on method c. Also, method b depends on method d. Each of the code clones can be achieved by applying “Pull Up Method” pattern. But, It is difficult to apply refactoring to all methods at one time. It is difficult to apply refactoring to all methods at one time. ( The “Pull Up Method” refactoring can be applied to each Code Clone.) 2005/09/20 METRICS 2005

14 Categorization of Chained Clones
We propose a method to classify chained clones by using two metrics. Two method groups for classifying chained clones G1 The group of methods having clone relations G2 The group of methods having dependency relations These metrics evaluate the relationship of distance and position in the class hierarchy among methods belonging to these two groups. R1 All methods belong to classes that exist in the same class. R2 All methods belong to classes that have common parent classes. R3 Some methods belongs to classes that have no common parent class. We propose a method to classify chained clones by using two metrics Now, we will consider these groups of methods for classifying chained clones. The first one is the method groups having clone relations. The second one is the method groups having dependency relations. These metrics evaluate the relationship of distance and position in the class hierarchy among methods belonging to these tow groups. We classify the relationships into three types. First, all methods in its group exist in the same class. Second, all methods in its group have common parent classes. Third, some methods in its group have no common parent class. 2005/09/20 METRICS 2005

15 The metric DCH(S) (the Dispersion in the Class Hierarchy)
DCH(S) : represents the dispersion in the class hierarchy among methods DCH(S) = 1 DCH(S) = 2 Class S3 Class S1 Class S2 2 1 Class A Class B Class C Class D Class E Method a Method b Method c Method d Method e We propose new metrics based on DCH. Now, I will explain the metrics DCH. DCH represents the dispersion in the class hierarchy among methods. In the left example the DCH is 1. The distance between class A and the common parent class S1 is 1. Therefore, the DCH value in this case is 1. In the left example the DCH is 2. The distance between class D and the common parent class S3 is 2. Therefore, the DCH value in this case is 2. If there are classes that have no common parent class, the value of its DCH is undefined. If there are classes that have no common parent class, the value of its DCH is undefined. 2005/09/20 METRICS 2005

16 Metrics to classify chained clone sets (1)
DCHS : Evaluates the dispersion of the methods belonging to G1 (The group of methods having clone relations) in the class hierarchy DCHD : Evaluates the dispersion of the methods belonging to G2 (The group of methods having dependency relations ) in the class hierarchy Calculate a set of DCH(S) metric from methods in each of chained method. Select the maximum value among them as a DCHD. Method a1 Method a2 Calculate a set of DCH(S) metric from methods having each of clone relations. Select the maximum value among them as a DCHS. Method b1 Method b2 We propose a method to classify chained clones by using two metrics The first metric is the DCHS metric. The DCHS metric evaluates the dispersion of the methods belonging to G1. G1 is the group of methods having clone relations in the class hierarchy. The way to calculate the DCHS metric is First, calculate a set of DCH metric from methods in each of chained clones. Then, select the maximum value among them as a DCHS. On the other hand, the DCHD metric evaluates the dispersion of the methods belonging to G2. G2 is the group of method have dependency relations in the class hierarchy. The calculation of DCHD metric is First, Calculate a set of DCH metrics from methods that have each of clone relations Then, select the maximum value among them as a DCHD. Method c1 Method c2 2005/09/20 METRICS 2005

17 Metrics to classify chained clone sets (2)
Using the two metrics, we classify the chained clones into 9 categories. DCHD DCHS greater than 0 cannot be defined Category 11 Category 12 Category 13 grater than 0 Category 21 Category 22 Category 23 cannot be defined Category 31 Category 32 Category 33 Using the two metrics, we classify chained clones into 9 categories 2005/09/20 METRICS 2005

18 Metrics to classify chained clone sets (2)
Using the two metrics, we classify the chained clones into 9 categories. DCHD DCHS grater than 0 cannot be defined Category 11 Category 12 Category 13 Category 21 Category 22 Category 23 cannot be defined Category 31 Category 32 Category 33 Extract Method Refactoring All the methods in a chained clone that are contained in a single class. A chained clone falls into the category 11 if both the values of DCHS and DCHD are 0. A chained clone in this category is contained in a single class. Therefore, Extract Method Pattern can be applied. 2005/09/20 METRICS 2005

19 Metrics to classify chained clone sets (2)
Using the two metrics, we classify the chained clones into 9 categories. DCHD DCHS grater than 0 cannot be defined Category 11 Category 12 Category 13 Category 21 Category 22 Category 23 cannot be defined Category 31 Category 32 Category 33 Pull Up Method Refactoring All methods in a chained clone belong to classes that have common parent classes. All methods of each chained method are in the same class respectively. A chained clone falls into the category 21 if the value of DCHS is greater than 0 and the value of DCHD is 0. A chained clone in the category 21 satisfies these two conditions. In this case, the Pull Up Method Pattern can be applied. 2005/09/20 METRICS 2005

20 Metrics to classify chained clone sets (2)
Using the two metrics, we classify the chained clones into 9 categories. DCHD DCHS grater than 0 cannot be defined Category 11 Category 12 Category 13 Category 21 Category 22 Category 23 cannot be defined Category 31 Category 32 Category 33 Extract SuperClass Refactoring Some methods in a chained clone belong to classes that have no common parent class. All method of each chained method are in the same class respectively. A chained clone falls into the category 31 if the value of DCHS cannot be defined and the value of DCHD is 0. A chained clone in this category satisfies the following two conditions. First, some methods in a chained clone have no common parent class. Second, all methods of each chained method are in the same class respectively. In this case, the Extract SuperClass Pattern can be applied. 2005/09/20 METRICS 2005

21 Metrics to classify chained clone sets (2)
Using the two metrics, we classify the chained clones into 9 categories. DCHD DCHS grater than 0 cannot be defined Category 11 Category 12 Category 13 Category 21 Category 22 Category 23 cannot be defined Category 31 Category 32 Category 33 A chained clone falls into one of these categories If the value of DCHD is greater than 0 or udefined. In these categories, chained methods exist in different classes. It is difficult to apply refactoring to all methods at one time. But, we consider that actually there are not many such chained clones. Difficult to apply refactoring to all methods at one time Chained methods exist in different classes. 2005/09/20 METRICS 2005

22 Evaluation Overview Objective
How many chained clone sets exist in actual Java programs? Is it possible to classify chained clone sets by using the proposed metrics and to apply suggested refactorings to them? Target software Open source software ANTLR (47,000 LOC, 285 Classes) Compiler-Compiler ( Java, C++, C# ) JBoss (640,000 LOC, 3364 Classes) J2EE Application Server Commercial software X ( 70,000 LOC, 309 Classes ) Y ( 81,000 LOC, 290 Classes ) We used CCFinder to detect code clones[1]. Next, I will talk about the evaluation. We have 2 main objectives in our evaluation. The objective of the evaluation is the following. First, how many chained clone sets exist in actual Java programs? Second, is it possible to classify chained clone sets by using the proposed metrics and to apply suggested refactoring patterns to them? We selected both open source software and commercial software as the target. We selected ANTLR from the open software projects. ANTLR is a compiler-compiler which supports Java, C++ and C#. Its LOC is 47,000, and the number of classes is 285. This excludes automatic generation codes. On the other hand, we selected Software X from the commercial software products. X's LOC is 320,000, and the number of classes is 289. This excludes automatic generation codes, too. [1] T. Kamiya, et. al., CCFinder: A multi-linguistic token-based code clone detection system for large scale source code, IEEE TSE, vol.28, no.7, pp , Jul 2005/09/20 METRICS 2005

23 Evaluation Detected chained clone sets
ANTLR 2.7.4 Software X Category # of chained clone sets # of methods max min 11 3 4 21 6 40 31 1 Other Total 10 Category # of chained clone sets # of methods max min 11 2 13 21 31 7 26 4 Other Total 9 In category 21, the maximum number of methods is very large.  Similar functionalities for Java, C# and C++ The number of chained clone sets in category 31 is large. Two packages have similar utility classes. Now, let’ see the number of chained clone sets from target software products. The left table shows the case of ANTLR. We detected 10 chained clone sets in ANTLR. In category 21, the maximum of the number of methods is very large. This is because ANTLR has similar functionalities for Java, C# and C++. The right table shows the case of JBoss. We detected 50 chained clone sets in JBoss. The number of chained clone sets in category 31 is large This is because JBoss contains several products. It has code clones among them. 2005/09/20 METRICS 2005

24 Evaluation Refactoring for Category 31(ANTLR)
We applied suggested refactorings to chained clone sets in ANTLR. Extract Super Class GeneralCharFormatter escapeString call escapeChar CSharp CharFormatter Java CharFormatter escapeString escapeString We applied suggested refactoring patterns to chained clone sets in ANTLR. As a result, all chained clone sets were merged by those refactoring patterns without changing external behaviour. ANTLR has CsharpCharFormatter class and JavaCharFormatter class. They don't have any common parent class. Also, they have the chained clone in category 31. Then we applied the Extract Super Class Pattern to the chained clone. That it to say, we created GeneralCharFormatter class as a superclass of CsharpCharFormatter class and JavaCharFormatter class. And we merged the chained clone into new methods in the superclass. CSharp CharFormatter Java CharFormatter call call escapeChar escapeChar After Refactoring Before Refactoring 2005/09/20 METRICS 2005

25 Conclusion We focus on refactoring for chained clones that consist of sets of the methods with dependency relations Define chained clone Two metrics to classify chained clones according to their applicable refactorings Case studies to show the usefulness of the proposed metrics Future Works Provide information about the internal structure of chained clones Apply our proposed method to some other Java programs As our conclusion, we focus on refactorings for chained clones that consist of sets of methods with dependency relations. At first, we defined a chained clone. Next, we proposed two metrics to classify chained clones according to their applicable refactoring patterns. Finally, we presented case studies to show the usefulness of the proposed metrics. For future works, we are going to provide information about the internal structure of chained clones. Then, we are going to apply our proposed method to some other Java programs. 2005/09/20 METRICS 2005


Download ppt "On Refactoring Support Based on Code Clone Dependency Relation"

Similar presentations


Ads by Google