FUZZY CLUSTERING OF SOFTWARE METRICS Paper by Scott Dick and Abraham Kandel Presentation by Craig Castaneda.


1 FUZZY CLUSTERING OF SOFTWARE METRICS Paper by Scott Dick and Abraham Kandel Presentation by Craig Castaneda

2 Metrics, Quality, and Resources • Developers monitor various attributes of the development process through metrics in an effort to improve the quality of the final product. • The metrics can be used to guide the allocation of additional development resources to “high-risk” modules.

3 Metrics and the Problems with Metrics • A metric quantifies some aspect of the source code. • It may be a count of how many of something there are, or a measure of density. • Unfortunately, these metrics are strongly correlated not only with quality but also with each other, a problem known as multicollinearity.

4 Multicollinearity • Multicollinearity arises when the variables are not independent of each other; that is, they are not ORTHOGONAL. • When variables are orthogonal to each other, a point in the solution space can be written uniquely. • When variables are linearly dependent (or nearly so), the same point can be written in more than one way. Because of this, linear regression fails to produce a unique, stable answer.
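The non-uniqueness described on this slide can be sketched with a small example. The metric values below are hypothetical (not taken from the paper's datasets), and the second metric is made exactly twice the first to force perfect collinearity. Two different coefficient pairs then give identical predictions, and the determinant of the 2x2 matrix X^T X is zero, so the least-squares normal equations have no unique solution.

```python
# Hypothetical metric values for five modules (illustrative only).
# x2 is exactly 2 * x1, i.e. the two metrics are perfectly collinear.
x1 = [10.0, 20.0, 30.0, 40.0, 50.0]
x2 = [2.0 * v for v in x1]


def predict(b1, b2):
    """Linear model without intercept: y = b1 * x1 + b2 * x2."""
    return [b1 * a + b2 * b for a, b in zip(x1, x2)]


# Two different coefficient vectors describe the same predictions.
pred_a = predict(3.0, 0.0)
pred_b = predict(1.0, 1.0)
same_predictions = pred_a == pred_b

# Determinant of the 2x2 Gram matrix X^T X; zero means it cannot be inverted.
s11 = sum(a * a for a in x1)
s12 = sum(a * b for a, b in zip(x1, x2))
s22 = sum(b * b for b in x2)
det = s11 * s22 - s12 * s12

print(same_predictions, det)
```

Real metrics are rarely perfectly collinear; with strong but imperfect correlation the determinant is merely close to zero, and the fitted coefficients become very sensitive to small changes in the data.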

5 Other Problems • While there is data for small modules with small numbers of errors, there is essentially no sample data for large modules with large numbers of errors. • This means that the analysis is a valid predictor only over a limited range.

6

7 Let’s get fuzzy • In binary logic, things are one thing or another: “Raise your hand if you’re a boy.” • But what if you asked a group of kids, “Do you like school?” • We need a logic that allows for being a little of both, and a way to describe how much in either direction.

8 Fuzzy in groups • In classical set theory, an item either belongs to a set or it does not. • Fuzzy sets allow an item to belong to multiple sets, with a degree of membership indicating how strongly it belongs to each set.
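As a minimal sketch of partial membership, the functions below grade how strongly a module belongs to a "large" set and a "small" set based on its lines of code. The function names, the linear ramp, and the 100/500 thresholds are assumptions chosen for illustration; they do not come from the paper.

```python
def large_membership(loc, low=100.0, high=500.0):
    """Degree (0.0 to 1.0) to which a module with `loc` lines counts as 'large'.

    Hypothetical ramp: 0 below `low`, 1 above `high`, linear in between.
    """
    if loc <= low:
        return 0.0
    if loc >= high:
        return 1.0
    return (loc - low) / (high - low)


def small_membership(loc, low=100.0, high=500.0):
    """Complementary degree to which the module counts as 'small'."""
    return 1.0 - large_membership(loc, low, high)


# A 300-line module belongs partly to both sets.
print(large_membership(300), small_membership(300))
```

A crisp set would force the 300-line module into exactly one of the two groups; the fuzzy version records that it sits halfway between them.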

9 How is it used here • While we don’t know a module’s future failure rate, we do know the metrics measured for that module. • These metrics can be used to locate the module in feature space, and we can assign a “How much of a problem will this be?” number to it.

10 How is it used, part 2 • By doing this with all of the modules in the software, we can rank which modules are likely to cause the most problems. • Using a Pareto analysis, we can assign additional resources appropriately.
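The ranking and Pareto step might look like the sketch below. The module names, the risk scores, and the 80% cut-off are hypothetical, and `pareto_selection` is an illustrative helper rather than anything defined in the paper. It sorts modules by score and keeps the highest-ranked ones until they account for the requested share of the total.

```python
def pareto_selection(scores, share=0.8):
    """Return the top-ranked modules that together reach `share` of total risk."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(scores.values())
    selected = []
    running = 0.0
    for name, score in ranked:
        if running / total >= share:
            break  # the modules chosen so far already cover the target share
        selected.append(name)
        running += score
    return selected


# Hypothetical "how much of a problem will this be" scores per module.
scores = {"parser": 50, "ui": 5, "network": 30, "logging": 10, "config": 5}
high_risk = pareto_selection(scores)
print(high_risk)
```

In this made-up example two of the five modules carry 80% of the total score, which is the kind of concentration a Pareto analysis is meant to expose.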

11 A look at the data – MIS Dataset • 390 total modules • 11 software metrics per module • The number of changes to each module was also recorded. • The number of changes is assumed to represent the number of failures, but this assumption is unverified.

12 A look at the data – ProcSoft • Structured programming • Contains 422 modules • 11 metrics • No data on change metrics

13 A look at the data – OOSoft • Object-oriented • 562 methods • 11 metrics • Same functionality as ProcSoft • No data on change metrics

14 A look at the method • Fuzzy c-means clustering requires the number of clusters to be chosen beforehand. • They ran the data for 2 through 10 clusters. • They then analyzed which number of clusters would be best to use.
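For readers unfamiliar with the algorithm, here is a minimal pure-Python sketch of the standard fuzzy c-means iteration. It assumes Euclidean distance, a fuzzifier of m = 2, random initial memberships, and a fixed iteration count. The function name, parameters, and the six toy points are illustrative assumptions and are not the authors' implementation or data.

```python
import random


def fuzzy_c_means(points, c, m=2.0, iterations=100, seed=0):
    """Return (centers, memberships) after a fixed number of FCM iterations."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iterations):
        # Center update: mean of the points weighted by membership ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / w_sum
                 for d in range(dim)]
            )
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)).
        for i in range(n):
            dist = [
                max(sum((points[i][d] - centers[k][d]) ** 2
                        for d in range(dim)) ** 0.5, 1e-12)
                for k in range(c)
            ]
            for k in range(c):
                u[i][k] = 1.0 / sum(
                    (dist[k] / dist[j]) ** (2.0 / (m - 1.0)) for j in range(c)
                )
    return centers, u


# Toy 2-D data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centers, u = fuzzy_c_means(points, c=2)
labels = [row.index(max(row)) for row in u]
print(labels)
```

Because `c` is an input, trying several cluster counts as the slide describes amounts to calling the function once per value, for example in a loop over `range(2, 11)` on a dataset with enough points, and comparing the resulting partitions with a validity measure.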

15 Results – Before we look • Cluster compactness and separation: how distinct each cluster is from the other clusters, and how close the data points are to each cluster center. We want this high. • Fuzzy separation index: a measure of how fuzzy the boundary between clusters is. We want this low. • Avg SSE (average predicted error): a measure of our predicted error as a percentage. We want this low.

16 MIS – Data Set

17 Results 2 – Before we look • Mean: the average, total changes / number of modules. • Median: the middle number when the values are ordered by magnitude. • If mean < median, the low values lie farther from the median than the high values do.
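The mean/median comparison can be checked with Python's standard `statistics` module. The change counts below are made up for illustration and are not from the MIS dataset. One list has a single very high value and the other a single very low value, so the mean is pulled in opposite directions.

```python
import statistics

# Hypothetical change counts per module (illustrative only).
high_tail = [0, 1, 1, 2, 3, 4, 24]        # one module with many changes
low_tail = [1, 20, 21, 22, 23, 24, 25]    # one module with very few changes

mean_high = statistics.mean(high_tail)
median_high = statistics.median(high_tail)
mean_low = statistics.mean(low_tail)
median_low = statistics.median(low_tail)

# The extreme high value pulls the mean above the median;
# the extreme low value pulls the mean below the median.
print(mean_high, median_high)
print(mean_low, median_low)
```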

18 MIS – Data Set 2

19 MIS – Data Set 3

20 ProcSoft

21 ProcSoft 2

22 OOSoft

23 OOSoft 2

24 Conclusions • The technique selects modules for additional work that would not normally be found by ranking individual modules. That’s a good thing. • The technique gave repeatable results when the programs came from the same paradigm (procedural). • Programs from the object-oriented paradigm did not fit the pattern established by the procedural programs. This could be due to the paradigm itself, and metrics more suited to that paradigm may be needed.

