Identifying Reasons for Software Changes Using Historic Databases
The CISC 864 Analysis, by Lionel Marks
Purpose of the Paper
- Using the textual description of a change, try to understand why that change was performed (adaptive, corrective, or perfective).
- Observe difficulty, size, and interval for the different types of changes.
Three Different Types of Changes
Traditionally, the three types of changes are defined as follows (taken from the ELEC 876 slides):
(The traditional definitions were shown on the original slide; the next slide gives the definitions this paper uses.)
Three Types of Changes in This Paper
- Adaptive: adding new features wanted by the customer (switched with perfective)
- Corrective: fixing faults
- Perfective: restructuring code to accommodate future changes (switched with adaptive)
The authors did not say why they changed these definitions.
The Case Study Company
- The paper did not divulge the company used for its case study, but it is an actual business.
- Developer names and actions were kept anonymous in the study.
- This allowed the authors to study a real system that had been in use for many years and had a large (and old) version control system.
Structure of the ECMS
- The company's source code control system is the ECMS (Extended Change Management System).
- MRs vs. deltas: each MR can contain multiple deltas, and a delta is recorded each time a file is "touched".
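A minimal sketch of how MRs and deltas might be represented. The paper does not define a schema, so the field names below are hypothetical; only the MR/delta relationship comes from the slide.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Delta:
    """One 'touch' of a single file within an MR (hypothetical fields)."""
    file_path: str
    lines_added: int
    lines_deleted: int

@dataclass
class MR:
    """An MR: a logical change made up of one or more deltas."""
    mr_id: str
    abstract: str                          # short textual description, used for classification
    deltas: List[Delta] = field(default_factory=list)

# Example: one MR touching two files
mr = MR(mr_id="MR-1234", abstract="fix null pointer in parser",
        deltas=[Delta("parser.c", 12, 3), Delta("parser.h", 1, 0)])
print(len(mr.deltas))  # -> 2
```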
The Test System
- Called "System A" for anonymity purposes.
- Has: 2M lines of source code, 3000 files, 100 modules.
- Over the last 10 years: 33171 MRs, with an average of 4 deltas each.
How They Classified Maintenance Activities (Adaptive, Corrective, Perfective)
If you were given this project, you have:
- The CVS repository, and access to the descriptions along with the commits.
- The goal of labelling each commit as "Adaptive", "Corrective", or "Perfective".
What would you intuitively study in the descriptions?
How They Classified Maintenance Activities (Adaptive, Corrective, Perfective)
They had a five-step process:
1. Cleanup and normalization
2. Word frequency analysis
3. Keyword clustering and classification
4. MR abstract classification
5. Repeat the analysis from step 2 on unclassified MR abstracts
Step 1: Cleanup and Normalization
- Their approach used WordNet, a tool that strips prefixes and suffixes to reduce a word to its root form, e.g. "fixing" and "fixes" both reduce to the root "fix".
- WordNet also has a synonym feature, but it was not used: synonyms would be hard to correlate properly to the context of software maintenance and could be misinterpreted.
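The slide does not show an implementation, but the normalization step can be approximated with NLTK's WordNet lemmatizer. This is a sketch of the idea, not the authors' code; the helper name normalize() is ours and is reused in the later sketches.

```python
# Approximation of the cleanup/normalization step using NLTK's WordNet interface.
# Requires: pip install nltk, then nltk.download("wordnet")
import re
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

def normalize(abstract: str) -> list[str]:
    """Lowercase, strip punctuation, and reduce each word to its root form."""
    words = re.findall(r"[a-z]+", abstract.lower())
    # Lemmatize as verbs first (fixing/fixes -> fix), then as nouns (bugs -> bug)
    return [lemmatizer.lemmatize(lemmatizer.lemmatize(w, pos="v"), pos="n")
            for w in words]

print(normalize("Fixing memory bugs and fixes for the parser"))
# -> ['fix', 'memory', 'bug', 'and', 'fix', 'for', 'the', 'parser']
```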
Step 2: Word Frequency Analysis
- Determine the frequency of a set of words in the descriptions (a histogram for each description).
- Which words in the English language would be "neutral" to these classifications and act as noise in this experiment?
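A minimal sketch of the frequency analysis, assuming the normalize() helper from the previous sketch. The "neutral" stop-word list is a hypothetical placeholder for the noise words the slide asks about.

```python
from collections import Counter

# Hypothetical "neutral" words that carry no maintenance signal
NEUTRAL = {"the", "a", "an", "and", "or", "for", "to", "of", "in", "on", "with"}

def word_histogram(abstract: str) -> Counter:
    """Per-description histogram of normalized, non-neutral words."""
    return Counter(w for w in normalize(abstract) if w not in NEUTRAL)

def corpus_frequencies(abstracts: list[str]) -> Counter:
    """Aggregate word frequencies across all MR abstracts."""
    total = Counter()
    for a in abstracts:
        total += word_histogram(a)
    return total

print(word_histogram("Fix the bug in the parser and add a fix for the build"))
# -> Counter({'fix': 2, 'bug': 1, 'parser': 1, 'add': 1, 'build': 1})
```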
Step 3: Keyword Clustering
- Classification was validated by humans reading the descriptions of 20 randomly selected changes for each selected term in their set, e.g. "cleanup" indicating perfective maintenance.
- If a word matched its assumed class in fewer than 75% of cases, it was deemed "neutral".
- They found that "rework" was used heavily during "code inspection" (which became a new classification).
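A sketch of the keyword-validation idea under the 75% rule. The sample size (20) and threshold come from the slide; the function name, data shapes, and the labels dict standing in for the human reading are illustrative assumptions.

```python
import random

def validate_keyword(keyword: str, assumed_class: str,
                     abstracts: list[str], labels: dict[str, str],
                     sample_size: int = 20, threshold: float = 0.75) -> bool:
    """Sample MR abstracts containing the keyword and check whether the
    human-assigned class agrees with the assumed class often enough."""
    matching = [a for a in abstracts if keyword in normalize(a)]
    if not matching:
        return False                      # keyword never appears: treat as neutral
    sample = random.sample(matching, min(sample_size, len(matching)))
    agree = sum(1 for a in sample if labels[a] == assumed_class)
    return agree / len(sample) >= threshold   # below threshold => "neutral"
```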
Step 4: MR Classification Rules
- Like the "hard-coded" answer when the learning algorithm fails.
- If an inspection word is found, the MR is classified as an inspection change.
- If fix, bug, error, fixup, or fail is present, the change is corrective.
- If more than one type of keyword is present, the dominating frequency wins.
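A sketch of these rules as code. The corrective and inspection keywords are the ones named on the slide; the adaptive and perfective lists are placeholders, since the slide does not enumerate them, and the precedence is a simplification of the rules above.

```python
from collections import Counter

# Corrective and inspection keywords are from the slide; the adaptive and
# perfective lists below are illustrative placeholders only.
KEYWORDS = {
    "corrective": {"fix", "bug", "error", "fixup", "fail"},
    "inspection": {"inspection", "rework"},
    "adaptive":   {"add", "new", "feature"},       # placeholder
    "perfective": {"cleanup", "restructure"},      # placeholder
}

def classify(abstract: str) -> str:
    words = normalize(abstract)                    # from the Step 1 sketch
    counts = Counter({cls: sum(1 for w in words if w in kws)
                      for cls, kws in KEYWORDS.items()})
    if counts["inspection"]:                       # inspection words take priority
        return "inspection"
    if sum(counts.values()) == 0:
        return "unclassified"
    return counts.most_common(1)[0][0]             # dominating frequency wins

print(classify("Fix crash and add error handling"))   # -> corrective
```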
Step 5: Cycle Back to Step 2
- Since step 2 cannot cover the frequency of every word in the corpus at once, take in more words now.
- Perform more "learning" and see whether new frequent terms fit.
- Use static rules to resolve still-unclassified descriptions.
- When all else failed, fixes were considered corrective.
Case Study: Compare Against Human Classification
- 20 candidates, 150 MRs.
- More than 61% of the time, the tool and the human raters came to the same classification.
- Kappa and ANOVA were used to show the significance of the results.
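The slide does not show the computation, but agreement beyond chance between two raters is typically measured with Cohen's kappa. A sketch using scikit-learn; the library choice and the tiny label lists are assumptions, not the paper's tooling or data.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical labels: the tool's classification vs. a human rater's,
# one entry per MR (the actual study used 150 MRs and multiple raters).
tool  = ["corrective", "adaptive", "corrective", "perfective", "corrective"]
human = ["corrective", "adaptive", "perfective", "perfective", "corrective"]

agreement = sum(t == h for t, h in zip(tool, human)) / len(tool)
kappa = cohen_kappa_score(tool, human)
print(f"raw agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```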
How Purposes Affect Size and Interval
- Corrective and adaptive changes had the lowest change intervals.
- New code development and inspection changes added the most lines.
- Inspection changes deleted the most lines.
- The distribution functions are significant at the 0.01 level.
- ANOVA also indicated significance, but it is inappropriate due to the skewed distributions.
Change Difficulty
- 20 candidates, 150 MRs.
- Goal: to model the difficulty of each MR. Is the classification significant?
Modeling Difficulty
- Size was modeled with deltas (the number of file touches).
- Difficulty changed with the number of deltas, except for corrective and perfective (SW/HW) changes.
- The length of time (interval) was modeled as part of difficulty as well.
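A sketch of what such a difficulty model could look like: an ordinary least-squares fit of a reported difficulty score on the number of deltas and the change interval. The predictor variables follow the slide; the model form and the data below are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

# Hypothetical per-MR data: (number of deltas, interval in days, reported difficulty)
num_deltas = np.array([1, 2, 4, 8, 3, 6], dtype=float)
interval   = np.array([2, 5, 10, 30, 7, 14], dtype=float)
difficulty = np.array([1, 2, 3, 5, 2, 4], dtype=float)   # e.g. a 1-5 survey score

# Design matrix with an intercept column
X = np.column_stack([np.ones_like(num_deltas), num_deltas, interval])
coef, *_ = np.linalg.lstsq(X, difficulty, rcond=None)
print("intercept, deltas, interval coefficients:", np.round(coef, 3))
```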
Likes and Dislikes of This Paper
Likes:
- The algorithm used to make classifications is a good way to break down the problem.
- The accumulation graphs were interesting.
- Their use of a real company is a breath of fresh air: real data!
Dislikes:
- Asking developers months after the work how hard the changes were; there was no better way at the time, but the results can be skewed by the passage of time.
- Because a real company was used anonymously, the product comparison in the paper is less interesting.