Presentation on the article: Identifying effective software metrics using genetic algorithms
Presenters: Randy Hunt, Vitaliy Krestnikov
Date: April 27, 2009
Course: COMP 589
9/23/2018
Introduction
Team leaders commonly use software metrics to measure the overall quality of a system's design and its eventual implementation. Predicting the quality of a software object from a set of software metrics is, in essence, a classification problem.
Classification
Take a set of objects with known features (software metrics).
Combine them with group labels (quality rankings).
The result is a classifier that can predict the quality of new objects using only their computed metrics.
Software Metrics
Software metrics quantitatively map numerical values, such as the number of lines of code in a file or the number of methods in a class, to a subjective measure of quality in terms of apparent complexity, maintainability and usability. Not all metrics have the same classification power, however, and different combinations can yield the results a particular user is looking for.
Proposition
This article proposes a genetic algorithm (GA) feature selection procedure to identify the optimal metrics for the classification process. To test this proposal, the authors used the EvIdent software system.
Software Metrics
An experienced software architect subjectively labeled all 338 software objects in EvIdent in terms of maintainability, ranking each Java class as low, medium-low, medium or high. High means easy to modify; low means difficult to modify.
Software Metrics
Sixteen different software metrics were used: LOC, SLC, CLC, WLC, RCC, RCS, SMC, MET, ANL, CAN, AE, ALC, ASC, ASL, ACC, AEC.
The Genetic Algorithm
Genetic Algorithm, step 1: initialize the population
Population of genes: each gene is a column of bits, and each chromosome (bit) stands for one software metric.
[Table: five genes (gene #1 to gene #5) as columns; one row (chromosome) per metric: TYP, LOC, …, SMC, ANL, CAN, AIE, …]
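Step 1 can be sketched in Python under one assumption: bit i of a gene is 1 when metric i is used for classification. In the usual GA vocabulary, what the slides call a gene is an individual, meaning a whole bit string. The bit order below is the metric list from the metrics slide. That order, the population size and the helper names init_population and selected_metrics are hypothetical.

```python
import random

# Metric names from the metrics slide; using them in this order as bit positions is an assumption.
METRICS = ["LOC", "SLC", "CLC", "WLC", "RCC", "RCS", "SMC", "MET",
           "ANL", "CAN", "AE", "ALC", "ASC", "ASL", "ACC", "AEC"]

def init_population(num_genes, num_metrics=len(METRICS), rng=random):
    """Step 1: create num_genes random bit strings, one bit per metric."""
    return [[rng.randint(0, 1) for _ in range(num_metrics)]
            for _ in range(num_genes)]

def selected_metrics(gene, names=METRICS):
    """Return the names of the metrics whose bit is set to 1 in this gene."""
    return [name for name, bit in zip(names, gene) if bit]

# A five-gene population, as in the slide's example.
population = init_population(5, rng=random.Random(0))
for i, gene in enumerate(population, start=1):
    print(f"gene #{i}:", selected_metrics(gene))
```

A seeded random.Random instance keeps runs reproducible. Passing the random module itself gives fresh populations on each run.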
2. Begin the algorithm* for creating offspring for generation N, starting with generation 1.
* The algorithm is shown on the following slides.
3. Calculate fitness by LDA*
4. Select a pair based on fitness
* LDA is explained later in this presentation.
[Table: fitness (LDA %) of each gene: gene #1 = 44, gene #2 = 63, gene #3 = 37, gene #4 = 67, gene #5 = 50]
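The slides only say that the pair is selected "based on fitness". In their example, the pair is genes #2 and #4, the two fittest. One common way to make that choice, swapped in here as an assumption, is roulette-wheel (fitness-proportionate) selection: fitter genes are more likely to be picked, but not guaranteed. The function name select_pair is hypothetical.

```python
import random

def select_pair(fitness, rng=random):
    """Step 4 (roulette wheel): pick two distinct gene indices, each with
    probability proportional to its fitness. Assumes non-negative fitness."""
    indices = list(range(len(fitness)))
    first = rng.choices(indices, weights=fitness, k=1)[0]
    # Draw the second parent from the remaining genes so the pair is distinct.
    rest = [i for i in indices if i != first]
    second = rng.choices(rest, weights=[fitness[i] for i in rest], k=1)[0]
    return first, second

# Fitness (LDA %) of genes #1 to #5 from the slide.
fitness = [44, 63, 37, 67, 50]
a, b = select_pair(fitness, rng=random.Random(1))
print(f"parents: gene #{a + 1} and gene #{b + 1}")
```

Always taking the two fittest genes would be simpler. However, it would keep choosing the same parents, which is why proportional schemes are often preferred.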
5. Produce a child gene by swapping bits, starting from a randomly picked crossover point
[Table: genes #2 and #4 are crossed over at the marked point to form the child.]
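Here is a sketch of step 5 as single-point crossover, assuming genes are equal-length Python lists of bits. The child takes the bits before a randomly picked cut point from the first parent and the remaining bits from the second. The optional point argument is hypothetical, added so that a fixed cut can be shown. The parent bit strings in the demo are also made up.

```python
import random

def crossover(parent_a, parent_b, point=None, rng=random):
    """Step 5: single-point crossover of two equal-length bit lists."""
    if point is None:
        # Cut strictly inside the string so both parents contribute bits.
        point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]

# Hypothetical bit strings standing in for gene #2 and gene #4.
gene_2 = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
gene_4 = [0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0]
print(crossover(gene_2, gene_4, point=6))
```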
6. Mutate each child bit for which a random number falls below the control parameter (the mutation probability)*
* The control parameter should be small (e.g., 10%).
[Table: the child gene after crossover and after mutation, one bit per metric.]
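A minimal sketch of step 6 follows. Each bit is flipped independently when a uniform random number in [0, 1) falls below the mutation rate, so with the suggested 10% rate about one bit in ten changes on average. The function name mutate is hypothetical.

```python
import random

def mutate(bits, rate=0.10, rng=random):
    """Step 6: flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]

child = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0]
print(mutate(child, rng=random.Random(5)))
```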
7. Insert the child into the population, replacing the least fit gene
[Table: gene #1, gene #2, child, gene #4, gene #5 with fitness (LDA %) 44, 63, N/A, 67, 50; the child has taken the place of gene #3, the least fit.]
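Step 7 can be sketched as follows, using the slide's fitness values. The child overwrites the gene with the lowest fitness, and its own fitness stays unknown (the slide's "N/A") until step 3 runs again. The helper name and the short 4-bit genes are hypothetical.

```python
def replace_least_fit(population, fitness, child):
    """Step 7: overwrite the least fit gene with the child.
    Assumes every fitness entry is a number, i.e. step 3 has already run."""
    worst = min(range(len(fitness)), key=lambda i: fitness[i])
    population[worst] = child
    fitness[worst] = None  # shown as "N/A" until fitness is recalculated
    return worst

population = [[0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
fitness = [44, 63, 37, 67, 50]
replaced = replace_least_fit(population, fitness, [1, 1, 1, 0])
print(f"child replaced gene #{replaced + 1}; fitness now {fitness}")
```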
8. Return to step 3 and repeat this process until one generation has reproduced.
* A control parameter, the number of elite genes (those that survive to the next generation), determines when one generation is complete.
9. Return to step 2 and repeat for the next generation, until N generations have reproduced.
* A control parameter, the number of generations, determines when this loop terminates.
Control parameters for the GA
Number of genes in the population
Number of generations
Percent of elite genes (those that survive to the next generation)
Probability of mutation
* The previous example used a very small population and demonstrated only one reproduction; in practice there are many reproductions per generation.
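The sketch below ties steps 1 to 9 together, driven by the four control parameters above. It assumes a steady-state reading of the slides. Each generation performs (population size minus elite count) reproductions, and each one replaces the current least fit gene, so at least the elite count of genes carry over. The GAParams defaults and the toy fitness function are hypothetical. On the slides, the fitness is the LDA classification percentage.

```python
import random
from dataclasses import dataclass

@dataclass
class GAParams:
    """Control parameters listed on the slide; the default values are illustrative guesses."""
    population_size: int = 20     # number of genes in the population
    generations: int = 50         # number of generations
    elite_percent: float = 0.2    # share of genes guaranteed to carry over
    mutation_rate: float = 0.10   # per-bit flip probability
    num_metrics: int = 16         # one bit per software metric

def run_ga(fitness_fn, params, rng=random):
    """Steady-state GA over metric bit strings; fitness_fn must return a
    non-negative score. Returns (best gene, its fitness)."""
    n = params.population_size
    pop = [[rng.randint(0, 1) for _ in range(params.num_metrics)] for _ in range(n)]
    fit = [fitness_fn(g) for g in pop]                      # step 3
    n_elite = int(params.elite_percent * n)
    # Each reproduction replaces one gene, so at least n_elite genes survive a generation.
    reproductions = n - n_elite
    for _ in range(params.generations):                     # step 9 loop
        for _ in range(reproductions):                      # step 8 loop
            idx = list(range(n))
            w = [f + 1e-9 for f in fit]                     # epsilon keeps weights positive
            a = rng.choices(idx, weights=w, k=1)[0]         # step 4: roulette selection
            rest = [i for i in idx if i != a]
            b = rng.choices(rest, weights=[w[i] for i in rest], k=1)[0]
            point = rng.randint(1, params.num_metrics - 1)  # step 5: single-point crossover
            child = pop[a][:point] + pop[b][point:]
            child = [1 - bit if rng.random() < params.mutation_rate else bit
                     for bit in child]                      # step 6: mutation
            worst = min(idx, key=lambda i: fit[i])          # step 7: replace least fit
            pop[worst], fit[worst] = child, fitness_fn(child)
    best = max(range(n), key=lambda i: fit[i])
    return pop[best], fit[best]

# Stand-in fitness (hypothetical): percentage of bits matching a hidden target mask.
TARGET = [1, 0] * 8

def toy_fitness(gene):
    return 100.0 * sum(g == t for g, t in zip(gene, TARGET)) / len(TARGET)

best_gene, best_fit = run_ga(toy_fitness, GAParams(), rng=random.Random(42))
print(best_gene, best_fit)
```

Because the replaced gene is always the current minimum, the best fitness in the population never decreases from one reproduction to the next. Swapping toy_fitness for an LDA-based score is all the loop itself needs.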
Linear Discriminant Analysis (LDA)
Computing “fitness” using LDA
[Figure: Java objects Foo (quality ranking: high), Bar (low) and Zoo (low), each listed with its SW metrics (e.g., TYP: 1, LOC: 539, SLC: 401, CLC: 138, etc.), feed an objective function using LDA, which produces the fitness value.]
* This shows computing fitness for only one gene (set of SW metrics).
[Figure: software objects plotted on two SW metrics (“known features”), TYP and LOC, falling into the groups high, medium, medium-low and low.]
* In reality, there can be up to 16 dimensions (only 2 are shown here).
Aspects of LDA function logic
For a point on the previous graph, the LDA algorithm allocates it to the group whose probability distribution gives it the greatest (posterior) probability.
The prior probability of each group (typically the proportion of objects in that group) is also a factor.
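The sketch below shows one way the fitness of a single gene could be computed. It keeps only the metrics whose bits are set and fits LDA with a pooled within-group covariance plus a log-prior term per group. It then reports the percentage of objects assigned to their own quality group. It is plain Python so that it stays self-contained. Several choices are assumptions, because the slides do not say how the LDA percentage was estimated: the function names, the small ridge term added to the covariance diagonal, and scoring the same objects the model was fitted on (rather than, say, cross-validation). The toy data is also hypothetical.

```python
import math

def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def lda_fit(X, y, ridge=1e-6):
    """Fit LDA: per-group means, group priors and one pooled within-group covariance."""
    groups = sorted(set(y))
    n, d = len(X), len(X[0])
    means, priors = {}, {}
    for g in groups:
        rows = [x for x, label in zip(X, y) if label == g]
        means[g] = [sum(col) / len(rows) for col in zip(*rows)]
        priors[g] = len(rows) / n     # prior = share of objects in the group
    cov = [[0.0] * d for _ in range(d)]
    for x, label in zip(X, y):
        diff = [xi - mi for xi, mi in zip(x, means[label])]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j]
    for i in range(d):
        for j in range(d):
            cov[i][j] /= max(n - len(groups), 1)
        cov[i][i] += ridge            # hypothetical ridge keeps the matrix invertible
    model = {}
    for g in groups:
        w = _solve(cov, means[g])     # w = inverse(cov) @ mean
        bias = -0.5 * sum(mu * wi for mu, wi in zip(means[g], w)) + math.log(priors[g])
        model[g] = (w, bias)
    return model

def lda_predict(model, x):
    """Allocate x to the group with the largest linear discriminant score."""
    return max(model, key=lambda g: sum(xi * wi for xi, wi in zip(x, model[g][0])) + model[g][1])

def lda_fitness(gene, X, y):
    """Percentage of objects LDA assigns to their own group, using only the selected metrics."""
    cols = [i for i, bit in enumerate(gene) if bit]
    if not cols:
        return 0.0
    Xs = [[row[i] for i in cols] for row in X]
    model = lda_fit(Xs, y)
    correct = sum(lda_predict(model, x) == label for x, label in zip(Xs, y))
    return 100.0 * correct / len(y)

# Toy data (hypothetical): two metrics per object, two quality groups.
X = [[1, 1], [1, 2], [2, 1], [2, 2], [8, 8], [8, 9], [9, 8], [9, 9]]
y = ["low"] * 4 + ["high"] * 4
print(lda_fitness([1, 1], X, y))
```

With real data, X would hold one row of metric values per Java class and y the architect's rankings. Scoring the training objects tends to be optimistic, so a held-out or leave-one-out estimate may be the safer fitness.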
Results
Top 5
These are the 6 metrics common to the top 5 genes: SLC, WLC, RCC, AE, ASL, ACC.
Conclusion
The metrics selected by the GA appear to indicate that code that is easy to read, together with comments, helps developers understand the purpose of the code.