Download presentation
Presentation is loading. Please wait.
Published byWilfrid Cody Stokes Modified over 9 years ago
1
Object-Oriented Reengineering Patterns 5. Problem Detection
2
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.2 Roadmap Detection Strategies Detecting Duplicated Code
3
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.3 Roadmap Detection Strategies —God Class —Feature Envy —Data Class —Shotgun Surgery Detecting Duplicated Code
4
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.4 Detection strategies A detection strategy is a metrics-based predicate to identify candidate software artifacts that conform to (or violate) a particular design rule
5
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.5 Filters and composition A data filter is a predicate used to focus attention on a subset of interest of a larger data set —Statistical filters – I.e., top and bottom 25% are considered outliers —Other relative thresholds – I.e., other percentages to identify outliers (e.g., top 10%) —Absolute thresholds – I.e., fixed criteria, independent of the data set A useful detection strategy can often be expressed as a composition of data filters
6
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.6 God Class A God Class centralizes intelligence in the system —Impacts understandibility —Increases system fragility
7
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.7 ModelFacade (ArgoUML) 453 methods 114 attributes over 3500 LOC all methods and all attributes are static
8
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.8 Feature Envy Methods that are more interested in data of other classes than their own [Fowler et al. 99]
9
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.9 ClassDiagramLayouter
10
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.10 Data Class A Data Class provides data to other classes but little or no functionality of its own
11
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.11 Data Class (2)
12
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.12 Property
13
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.13 Shotgun Surgery A change in an operation implies many (small) changes to a lot of different operations and classes
14
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.14 Project
15
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.15 Roadmap Detection Strategies Detecting Duplicated Code —What is duplicated code? —Detecting duplication —Visualizing duplication
16
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.16 Roadmap Detection Strategies Detecting Duplicated Code —What is duplicated code? —Detecting duplication —Visualizing duplication
17
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.17 Code is Copied Small Example from the Mozilla Distribution (Milestone 9) Extract from /dom/src/base/nsLocation.cpp
18
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.18 Case StudyLOC Duplication without comments with comments gcc460’0008.7%5.6% Database Server245’00036.4%23.3% Payroll40’00059.3%25.4% Message Board6’50029.4%17.4% How Much Code is Duplicated? Usual estimates: 8 to 12% in normal industrial code 15 to 25 % is already a lot!
19
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.19 What is Duplicated Code? Duplicated Code = Source code segments that are found in different places of a system. — in different files — in the same file but in different functions — in the same function The segments must contain some logic or structure that can be abstracted, i.e., Copied artifacts range from expressions, to functions, to data structures, and to entire subsystems.
20
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.20 Copied Code Problems General negative effect —Code bloat Negative effects on Software Maintenance —Copied Defects —Changes take double, triple, quadruple,... Work —Dead code —Add to the cognitive load of future maintainers Copying as additional source of defects —Errors in the systematic renaming produce unintended aliasing
21
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.21 Nontrivial problem: No a priori knowledge about which code has been copied How to find all clone pairs among all possible pairs of segments? Code Duplication Detection
22
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.22 AuthorLevelTransformed CodeComparison Technique Johnson 94LexicalSubstringsString-Matching Ducasse 99LexicalNormalized StringsString-Matching Baker 95SyntacticalParameterized StringsString-Matching Mayrand 96SyntacticalMetric TuplesDiscrete comparison Kontogiannis 97SyntacticalMetric TuplesEuclidean distance Baxter 98SyntacticalASTTree-Matching General Schema of Detection Process
23
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.23 Roadmap Detection Strategies Detecting Duplicated Code —What is duplicated code? —Detecting duplication —Visualizing duplication
24
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.24 A Map of Reengineering Patterns Tests: Your Life Insurance Detailed Model Capture Initial Understanding First Contact Setting Direction Migration Strategies Detecting Duplicated Code Redistribute Responsibilities Transform Conditionals to Polymorphism
25
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.25 Detecting Duplicated Code Compare Code Mechanically Visualize Code as Dotplots Redistribute Responsibilities Transform Conditionals to Polymorphism Detect Understand
26
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.26 Pattern: Compare Code Mechanically Problem: How do you discover which parts of an application code have been duplicated? Solution: —Textually compare each line of the software source code with all the other lines of code. Steps —Normalize the source files —Remove uninteresting code —Compare files line-by-line
27
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.27 … //assign same fastid as container fastid = NULL; const char* fidptr = get_fastid(); if(fidptr != NULL) { int l = strlen(fidptr); fastid = newchar[ l + 1 ]; … //assign same fastid as container fastid = NULL; const char* fidptr = get_fastid(); if(fidptr != NULL) { int l = strlen(fidptr); fastid = newchar[ l + 1 ]; … fastid=NULL; constchar*fidptr=get_fastid(); if(fidptr!=NULL) intl=strlen(fidptr) fastid = newchar[l+] … fastid=NULL; constchar*fidptr=get_fastid(); if(fidptr!=NULL) intl=strlen(fidptr) fastid = newchar[l+] Simple Detection Approach (i) Assumption: – Code segments are just copied and changed at a few places Noise elimination transformation – remove white space, comments – remove lines that contain uninteresting code elements – (e.g., just ‘else’ or ‘}’)
28
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.28 Simple Detection Approach (ii) Code Comparison Step —Line based comparison (Assumption: Layout did not change during copying) —Compare each line with each other line. —Reduce search space by hashing: – Preprocessing: Compute the hash value for each line – Actual Comparison: Compare all lines in the same hash bucket Evaluation of the Approach —Advantages: Simple, language independent —Disadvantages: Difficult interpretation
29
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.29 A Perl script for C++ (i)
30
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.30 A Perl script for C++ (ii) Handles multiple files Removes comments and white spaces Controls noise (if, {,) Granularity (number of lines) Possible to remove keywords
31
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.31 Output Sample Lines: create_property(pd,pnImplObjects,stReference,false,*iImplObjects); create_property(pd,pnElttype,stReference,true,*iEltType); create_property(pd,pnMinelt,stInteger,true,*iMinelt); create_property(pd,pnMaxelt,stInteger,true,*iMaxelt); create_property(pd,pnOwnership,stBool,true,*iOwnership); Locations: 6178/6179/6180/6181/6182 6198/6199/6200/6201/6202 Lines: create_property(pd,pnSupertype,stReference,true,*iSupertype); create_property(pd,pnImplObjects,stReference,false,*iImplObjects); create_property(pd,pnElttype,stReference,true,*iEltType); create_property(pd,pMinelt,stInteger,true,*iMinelt); create_property(pd,pnMaxelt,stInteger,true,*iMaxelt); Locations: 6177/6178 6229/6230 Lines = duplicated lines Locations = file names and line number
32
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.32 Enhanced Simple Detection Approach Code Comparison Step —As before, but now – Collect consecutive matching lines into match sequences – Allow holes in the match sequence Evaluation of the Approach —Advantages – Identifies more real duplication, language independent —Disadvantages – Less simple – Misses copies with (small) changes on every line
33
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.33 Abstraction —Abstracting selected syntactic elements can increase recall, at the possible cost of precision
34
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.34 Recall and Precision
35
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.35 Metrics-based detection strategy Duplication is significant if: —It is the largest possible duplication chain uniting all exact clones that are close enough to each other. —The duplication is large enough.
36
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.36 Automated detection in practice Wettel [ MSc thesis, 2004] uses three thresholds: —Minimum clone length: the minimum amount of lines present in a clone (e.g., 7) —Maximum line bias: the maximum amount of lines in between two exact chunks (e.g., 2) —Minimum chunk size: the minimum amount of lines of an exact chunk (e.g., 3) Mihai Balint, Tudor Gîrba and Radu Marinescu, “How Developers Copy,” ICPC 2006
37
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.37 Roadmap Detection Strategies Detecting Duplicated Code —What is duplicated code? —Detecting duplication —Visualizing duplication
38
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.38 Pattern: Visualize Code as Dotplots Problem —How can you effectively identify significant duplication in a complex software system? Solution —Visualize the code as a dotplot, where dots represent duplication. Steps —Normalize the source files —Compare files line-by-line —Visualize and interpret the dotplots
39
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.39 Duploc: Detecting Duplicated Code Compare, visualize and navigate multiple source files
40
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.40 Clone detection by string-matching Solid diagonals indicate significant duplication between or within source files.
41
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.41 Dotplot Visualization Sample Dot Configurations:
42
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.42 Detected Problem 4 Object factory clones: a switch statement over a type variable is used to call individual construction code Possible Solution Strategy Method Visualization of Repetitive Structures
43
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.43 Visualization of Cloned Classes Class A Class B Class A Detected Problem: Class A is an edited copy of class B. Editing & Insertion Possible Solution Subclassing …
44
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.44 20 Classes implementing lists for different data types Detail Overview Visualization of Clone Families
45
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.45 Conclusion Duplicated code is a real problem —makes a system progressively harder to change Detecting duplicated code is a hard problem —some simple techniques can help —tool support is needed Visualization of code duplication is useful —basic tool support is easy to build (e.g., 3 days with rapid-prototyping) Curing duplicated code is an active research area
46
© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz ASWEC 2007, Melbourne — OORP Tutorial — Problem Detection 5.46 License > http://creativecommons.org/licenses/by-sa/2.5/ Attribution-ShareAlike 2.5 You are free: to copy, distribute, display, and perform the work to make derivative works to make commercial use of the work Under the following conditions: Attribution. You must attribute the work in the manner specified by the author or licensor. Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under a license identical to this one. For any reuse or distribution, you must make clear to others the license terms of this work. Any of these conditions can be waived if you get permission from the copyright holder. Your fair use and other rights are in no way affected by the above. Attribution-ShareAlike 2.5 You are free: to copy, distribute, display, and perform the work to make derivative works to make commercial use of the work Under the following conditions: Attribution. You must attribute the work in the manner specified by the author or licensor. Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under a license identical to this one. For any reuse or distribution, you must make clear to others the license terms of this work. Any of these conditions can be waived if you get permission from the copyright holder. Your fair use and other rights are in no way affected by the above.
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.