1
FUZZY CLUSTERING OF SOFTWARE METRICS
Paper by Scott Dick and Abraham Kandel
Presentation by Craig Castaneda
2
Metrics, Quality, and Resources
Developers monitor various attributes of the development process through metrics in an effort to improve the quality of the final product. The metrics can be used to guide the allocation of additional development resources to “high-risk” modules.
3
Metrics and the Problems with Metrics
A metric quantifies some aspect of the source code. It may be a count of how many of something there are, or it may be a measure of density. Unfortunately, these metrics are strongly correlated not only with quality but also with each other, a condition known as multicollinearity.
4
Multicollinearity
Multicollinearity arises when the variables are not independent of each other. Geometrically, this means they are not ORTHOGONAL. When variables are orthogonal to each other, a point in the solution space can be written in exactly one way. When variables are so strongly correlated that they are linearly dependent, the same point can be written in more than one way. Because of this, linear regression fails to produce a unique, stable answer.
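A minimal sketch of the "more than one way" problem, using made-up values that are not from the paper. The two hypothetical metrics `x1` and `x2` are exactly collinear (`x2` is twice `x1`), so several different coefficient pairs give identical predictions, and the determinant of X^T X is zero, which is why least squares has no unique solution here.

```python
# Hypothetical metric values: x2 is exactly twice x1, so the two are collinear.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0 * v for v in x1]

def predict(coeffs, a, b):
    """Linear model without intercept: y = c1*a + c2*b."""
    c1, c2 = coeffs
    return [c1 * ai + c2 * bi for ai, bi in zip(a, b)]

def gram_determinant(a, b):
    """Determinant of X^T X for the two-column design matrix [a, b]."""
    saa = sum(v * v for v in a)
    sbb = sum(v * v for v in b)
    sab = sum(p * q for p, q in zip(a, b))
    return saa * sbb - sab * sab

# Three different coefficient pairs that all satisfy c1 + 2*c2 = 5.
candidates = [(5.0, 0.0), (3.0, 1.0), (1.0, 2.0)]
predictions = [predict(c, x1, x2) for c in candidates]

# Every candidate describes the data equally well.
all_equal = all(p == predictions[0] for p in predictions)

# A zero determinant means X^T X cannot be inverted.
det = gram_determinant(x1, x2)

print("identical predictions:", all_equal)
print("det(X^T X):", det)
```

Real metric data is rarely perfectly collinear. With strong but imperfect correlation the determinant is merely close to zero, and the fitted coefficients become very sensitive to small changes in the data.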
5
Other Problems
While there is data for small modules with small numbers of errors, there is essentially no sample data for large modules with large numbers of errors. This means that the analysis is only a valid predictor over a limited range.
7
Let's get fuzzy
In binary logic, things are one thing or another: “Raise your hand if you’re a boy.” But what if you asked a group of kids, “Do you like school?” We need a logic that allows for being a little of both, and a way to describe how much in either direction.
8
Fuzzy in groups
In classical set theory, an item either belongs to a set or it does not. Fuzzy sets allow an item to belong to multiple sets, with an indicator of how strongly it belongs to each set.
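To make the contrast concrete, here is a small sketch. The function names, the 0 to 10 answer scale, and the linear ramp are all illustrative assumptions, not something taken from the paper. A crisp set gives a yes/no answer, while a fuzzy set gives a degree of membership between 0 and 1.

```python
def crisp_membership(value, threshold):
    """Classical set: the item is either in (1) or out (0)."""
    return 1 if value >= threshold else 0

def fuzzy_membership(value, low, high):
    """Fuzzy set with a linear ramp: 0 at or below low, 1 at or above high."""
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)

# Hypothetical answer to "Do you like school?" on a 0..10 scale.
answer = 6

# Crisp logic forces a yes/no split at an arbitrary cut-off.
crisp_likes = crisp_membership(answer, threshold=5)

# Fuzzy logic lets the same child belong partly to both sets.
likes = fuzzy_membership(answer, low=0, high=10)
dislikes = 1.0 - likes

print("crisp 'likes school':", crisp_likes)
print("fuzzy 'likes school':", likes)
print("fuzzy 'dislikes school':", dislikes)
```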
9
How is it used here
While we don’t know a module’s future failure rate, we do know the metrics measured for that module. These metrics can be used to place the module in feature space, and we can assign a “How much of a problem will this be?” number to it.
10
How is it used part 2
By doing this with all of the modules in the software, we can rank which modules are likely to cause the most problems. Using a Pareto analysis, we can assign additional resources appropriately.
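The ranking step can be sketched as follows. The module names, the risk scores, and the 80% cut-off are hypothetical; the slides do not say which threshold the authors used. The idea is to sort modules by their "problem" number and keep the smallest top group that accounts for the chosen share of the total.

```python
def pareto_select(scores, share=0.8):
    """Return the highest-scoring modules that together cover `share` of the total."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(scores.values())
    selected, running = [], 0.0
    for name, score in ranked:
        if running >= share * total:
            break
        selected.append(name)
        running += score
    return selected

# Hypothetical "how much of a problem will this be" numbers per module.
risk = {"parser": 50, "ui": 5, "net": 30, "db": 10, "log": 5}

high_risk = pareto_select(risk, share=0.8)
print("modules to receive extra resources:", high_risk)
```

In this made-up example, two of the five modules carry 80% of the total score, so they would be the first candidates for additional review or testing effort.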
11
A look at the data – MIS Dataset
390 total modules
11 software metrics per module
The number of changes to each module was also recorded.
The number of changes is assumed to represent the number of failures, but this is unverified.
12
A look at the data – ProcSoft
Structured programming
Contains 422 modules
11 metrics
No data on change metrics
13
A look at the data – OOSoft
Object oriented
562 methods
11 metrics
Same functionality as ProcSoft
No data on change metrics
14
A look at the method
Fuzzy c-means clustering requires the number of clusters to be chosen beforehand.
They ran the data for 2 through 10 clusters.
Afterwards, they analyzed which number of clusters would be the best to use.
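For readers who have not seen the algorithm, below is a minimal sketch of standard fuzzy c-means, not the authors' implementation. The fuzzifier `m = 2`, the Euclidean distance, the deterministic choice of starting centers, and the toy data are all assumptions made for illustration. Each pass recomputes every point's membership in every cluster from its distances to the centers, then moves each center to the membership-weighted mean of the points.

```python
import math

def fuzzy_c_means(points, c, m=2.0, iterations=100, tol=1e-6):
    """Return (centers, memberships) for c fuzzy clusters; memberships[k][i] is
    the degree to which point k belongs to cluster i."""
    n, dim = len(points), len(points[0])
    # Deterministic start (an assumption): pick centers spread across the data order.
    centers = [list(points[(i * (n - 1)) // max(c - 1, 1)]) for i in range(c)]
    memberships = []
    power = 2.0 / (m - 1.0)
    for _ in range(iterations):
        # Membership update: closer centers get a larger share; each row sums to 1.
        memberships = []
        for p in points:
            dists = [math.dist(p, v) for v in centers]
            if min(dists) == 0.0:
                # The point sits exactly on a center: full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                row = [h / sum(hits) for h in hits]
            else:
                row = [1.0 / sum((di / dj) ** power for dj in dists) for di in dists]
            memberships.append(row)
        # Center update: mean of the points, weighted by membership^m.
        new_centers = []
        for i in range(c):
            weights = [memberships[k][i] ** m for k in range(n)]
            total = sum(weights)
            new_centers.append(
                [sum(w * p[d] for w, p in zip(weights, points)) / total for d in range(dim)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships

# Toy 2-D data with two obvious groups (made-up values, not software metrics).
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centers, memberships = fuzzy_c_means(data, c=2)
for point, row in zip(data, memberships):
    print(point, [round(u, 3) for u in row])
```

The number of clusters `c` is an input, which is why the slide describes running the algorithm for every value from 2 through 10 and comparing the results afterwards.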
15
Results – Before we look
Cluster Compactness and Separation: how distinct each cluster is from the other clusters, and how close the data points are to each cluster center. We want this high.
Fuzzy Separation Index: a measure of how fuzzy the boundary between clusters is. We want this low.
Avg SSE – Average Predicted Error: a measure of our predicted error as a %. We want this low.
16
MIS – Data Set
17
Results 2 – Before we look
Mean: the average = total changes / number of modules.
Median: the middle number when the values are ordered by magnitude.
If mean < median, the low end is farther away from the median than the high end.
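A quick illustration of the mean/median comparison, using invented change counts rather than values from any of the three datasets. One unusually low value pulls the mean below the median.

```python
import statistics

# Hypothetical change counts for the modules in one cluster.
changes = [0, 8, 9, 10, 11, 12]

mean_changes = statistics.mean(changes)      # total changes / number of modules
median_changes = statistics.median(changes)  # middle value of the ordered list

print("mean:", mean_changes)
print("median:", median_changes)
print("mean < median:", mean_changes < median_changes)
```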
18
MIS – Data Set 2
19
MIS – Data Set 3
20
ProcSoft
21
ProcSoft 2
22
OOSoft
23
OOSoft 2
24
Conclusions
The technique selects modules for additional work that would not normally be found by ranking individual modules. That’s a good thing. The results were repeatable when the programs came from the same paradigm (procedural). Programs from the object-oriented paradigm did not fit the pattern established by the procedural programs. This could be due to the paradigm itself, and metrics better suited to that paradigm may be needed.