Presented by: Abirami Poonkundran
This paper is a case study of the impact of three kinds of dependencies on a software development project:
◦ Syntactic dependencies
◦ Logical dependencies
◦ Work dependencies
It identifies which of these dependencies has the highest impact on fault proneness.
Introduction
Software Dependencies
◦ Syntactic Dependencies
◦ Logical Dependencies
◦ Work Dependencies
Data Collection
Measuring Failure
Results
Conclusion
Pros and Cons
Research has shown that software faults can be caused by violations of dependencies. Dependencies can be:
◦ Software dependencies: technical, created by developers in the code itself
◦ Work dependencies: organizational, caused by how the work is organized
This paper examines the relative impact that each of these dependency types has on the fault proneness of the software system.
Software dependencies can be:
◦ Syntactic
◦ Logical
Syntactic dependencies focus on control-flow and data-flow relationships. They are discovered by analyzing the source code or an intermediate representation such as byte code or syntax trees. These dependencies can be:
◦ Data-related dependency: e.g., a data structure modified by one function and used in another function
◦ Functional dependency: e.g., method A calls method B
Logical dependencies are dependencies between source-code files of a system that are changed together as part of software development. They often provide more valuable information than syntactic dependencies (e.g., with remote procedure calls, where caller and callee are not linked by a direct reference in the code). They can identify important dependencies that are not visible in syntactic code analysis.
Only recently has research started to shed light on the impact of human and organizational factors on the failure proneness of software systems. Problems with work dependencies arise from a lack of proper communication and coordination between developers. Research has shown that identifying and managing work dependencies is a major challenge.
Examined two large software development projects:
◦ Project A: complex distributed system
Data cover 3 years of development activity.
The company had 114 developers grouped into 8 development teams across 3 development locations.
About 5 million lines of C code distributed across 7,737 source-code files.
◦ Project B: embedded software system
40 developers worked on the project over a period of 5 years.
1.2 million lines of code written in C and C++.
In both projects, every change to the source code was controlled by a modification request (MR), and every change had to be committed to the version control system. Information used for this analysis:
◦ A total of 8,257 MRs for Project A and 3,372 MRs for Project B
◦ The version control systems of both projects
◦ The source code itself from both projects
The goal is to investigate failure proneness at the file level.
File Buggyness: indicates whether a file has been modified in the course of resolving a defect.
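To make the definition concrete, here is a minimal sketch of computing file buggyness as a 0/1 label per file. It assumes each MR record carries a flag saying whether it resolves a defect, plus the set of files it changed; the `ModificationRequest` type and `file_buggyness` function are hypothetical names for illustration, not the study's tooling.

```python
from dataclasses import dataclass, field


@dataclass
class ModificationRequest:
    """Hypothetical MR record: id, whether it fixes a defect, and the files it changed."""
    mr_id: str
    is_defect_fix: bool
    files: set[str] = field(default_factory=set)


def file_buggyness(mrs: list[ModificationRequest], all_files: set[str]) -> dict[str, int]:
    """Label a file 1 if at least one defect-fixing MR modified it, else 0."""
    buggy: set[str] = set()
    for mr in mrs:
        if mr.is_defect_fix:
            buggy |= mr.files
    return {f: int(f in buggy) for f in sorted(all_files)}


mrs = [
    ModificationRequest("MR-1", True, {"a.c", "b.c"}),   # defect fix touching two files
    ModificationRequest("MR-2", False, {"c.c"}),         # feature work, not a defect fix
]
labels = file_buggyness(mrs, {"a.c", "b.c", "c.c", "d.c"})
print(labels)  # → {'a.c': 1, 'b.c': 1, 'c.c': 0, 'd.c': 0}
```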
Used the C-REX tool to identify programming-language tokens and references in each entity of each source-code file. A snapshot of the source code was taken every quarter, and syntactic dependency analysis was done on each snapshot. Syntactic dependencies between source-code files were identified through data, function, and method references.
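C-REX's actual output format is not described here, so the sketch below assumes a hypothetical list of already-extracted references, each a tuple (referencing file, referenced file, kind), and shows only the aggregation step from entity-level references to file-level data and functional dependency counts. Function and method references are both treated as kind `"function"`; the names `References` and `syntactic_dependencies` are illustrative assumptions.

```python
from collections import Counter

# Hypothetical extractor output: (referencing file, referenced file, kind),
# where kind is "data" (shared data structure) or "function" (function/method call).
References = list[tuple[str, str, str]]


def syntactic_dependencies(refs: References) -> dict[str, Counter]:
    """Aggregate entity-level references into file-to-file dependency counts per kind."""
    deps: dict[str, Counter] = {"data": Counter(), "function": Counter()}
    for src, dst, kind in refs:
        if src != dst:  # references inside one file are not inter-file dependencies
            deps[kind][(src, dst)] += 1
    return deps


refs = [
    ("a.c", "b.c", "function"),  # a function in a.c calls one in b.c
    ("a.c", "b.c", "function"),
    ("c.c", "b.c", "data"),      # c.c uses a data structure that b.c modifies
    ("b.c", "b.c", "function"),  # intra-file call, ignored
]
deps = syntactic_dependencies(refs)
print(deps["function"][("a.c", "b.c")], deps["data"][("c.c", "b.c")])  # → 2 1
```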
Logical dependencies relate source-code files that are modified together as part of an MR. If only one file was changed in an MR, that MR contributes no logical dependencies. Using the commit information from the version control system, a logical dependency matrix (LDM) was created. The LDM is a symmetric matrix of source-code files where C_ij is the sum, across all releases, of the number of times files i and j were changed together as part of an MR.
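The LDM construction can be sketched as follows, assuming each MR has already been reduced to the set of files its commits touched, and that MRs from all releases are pooled into one list (so the per-release sums collapse into a single count). The function name `logical_dependency_matrix` is hypothetical.

```python
from itertools import combinations


def logical_dependency_matrix(mr_file_sets: list[set[str]]) -> tuple[list[str], list[list[int]]]:
    """Symmetric co-change matrix: C[i][j] = number of MRs that changed files i and j together."""
    files = sorted(set().union(*mr_file_sets))
    index = {f: k for k, f in enumerate(files)}
    C = [[0] * len(files) for _ in files]
    for changed in mr_file_sets:
        # An MR touching a single file yields no pairs, hence no logical dependency.
        for f, g in combinations(sorted(changed), 2):
            i, j = index[f], index[g]
            C[i][j] += 1
            C[j][i] += 1
    return files, C


mrs = [{"a.c", "b.c"}, {"a.c", "b.c", "c.c"}, {"c.c"}]
files, C = logical_dependency_matrix(mrs)
print(files)  # → ['a.c', 'b.c', 'c.c']
print(C)      # → [[0, 2, 1], [2, 0, 1], [1, 1, 0]]
```

Using `itertools.combinations` over each MR's file set counts every unordered pair once per MR, and writing both `C[i][j]` and `C[j][i]` keeps the matrix symmetric.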
Work dependencies were captured with two measures:
◦ Workflow dependencies
Capture the temporal aspects of the development effort.
Two developers i and j are said to be interdependent if an MR was transferred from developer i to developer j at some point during the life of that MR.
◦ Coordination requirements
Capture the inter-developer coordination requirements.
Use two matrices:
Task Assignment Matrix: developer-to-file matrix
Task Dependency Matrix: file-to-file matrix
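A minimal plain-Python sketch of both measures. For coordination requirements it assumes the product formulation C_R = T_A · T_D · T_Aᵀ used in the coordination-requirements literature (T_A is the task assignment matrix, T_D the task dependency matrix); the slide itself only names the two matrices. For workflow dependencies it assumes each MR's history is available as the ordered list of developers it passed through, and counts every handoff symmetrically. All function names are hypothetical.

```python
Matrix = list[list[int]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain-Python matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def coordination_requirements(TA: Matrix, TD: Matrix) -> Matrix:
    """C_R = T_A . T_D . T_A^T (developer x developer).

    TA[d][f] = 1 if developer d worked on file f (task assignment).
    TD[f][g] = 1 if files f and g depend on each other (task dependency).
    C_R[i][j] > 0 suggests developers i and j need to coordinate.
    """
    return matmul(matmul(TA, TD), transpose(TA))


def workflow_dependencies(histories: list[list[str]], developers: list[str]) -> Matrix:
    """Count MR handoffs; each transfer i -> j makes i and j interdependent."""
    idx = {d: k for k, d in enumerate(developers)}
    W = [[0] * len(developers) for _ in developers]
    for chain in histories:  # ordered developers an MR was assigned to
        for src, dst in zip(chain, chain[1:]):
            if src != dst:
                W[idx[src]][idx[dst]] += 1
                W[idx[dst]][idx[src]] += 1
    return W


developers = ["alice", "bob"]
TA = [[1, 1, 0],   # alice worked on f1 and f2
      [0, 0, 1]]   # bob worked on f3
TD = [[0, 0, 0],
      [0, 0, 1],   # f2 depends on f3
      [0, 1, 0]]
CR = coordination_requirements(TA, TD)
print(CR)  # → [[0, 1], [1, 0]]

histories = [["alice", "bob"], ["bob", "alice", "bob"], ["alice"]]
W = workflow_dependencies(histories, developers)
print(W)  # → [[0, 3], [3, 0]]
```

In the example, alice and bob never edit the same file, yet C_R links them because f2 and f3 depend on each other; this is the kind of coordination need that task assignment alone would not reveal.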
The analysis consists of two stages:
◦ First stage: examine the relative impact of each dependency type on the failure proneness of source-code files
◦ Second stage: verify the consistency of the initial results by conducting a number of confirmatory analyses
Several logistic regression models were constructed.
If the odds ratio is larger than 1, there is a positive relationship between the independent and dependent variables; if the odds ratio is less than 1, the relationship is negative.
Model I:
◦ Based on LOC and average lines changed
◦ LOC is positively associated with failure proneness
◦ Average lines changed is also positively associated with defects
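The odds-ratio rule can be checked numerically. In a logistic regression, the odds ratio of a predictor is exp(β): the factor by which the odds of a file being buggy multiply when that predictor increases by one unit. The sketch below uses made-up coefficients (not estimates from the paper) and confirms that the ratio of the odds at x + 1 and at x equals exp(β₁).

```python
import math


def probability(b0: float, b1: float, x: float) -> float:
    """Logistic model: P(buggy | x) = 1 / (1 + exp(-(b0 + b1 * x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def odds(p: float) -> float:
    """Convert a probability into odds."""
    return p / (1.0 - p)


def relationship(odds_ratio: float) -> str:
    """Interpret an odds ratio as on the slide."""
    if odds_ratio > 1:
        return "positive"
    if odds_ratio < 1:
        return "negative"
    return "none"


# Hypothetical coefficients for one predictor (e.g. a dependency count); NOT from the paper.
b0, b1 = -2.0, 0.3
ratio = odds(probability(b0, b1, 11.0)) / odds(probability(b0, b1, 10.0))
print(round(ratio, 4), round(math.exp(b1), 4), relationship(ratio))  # → 1.3499 1.3499 positive
```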
Model II:
◦ Introduces syntactic dependency measures:
Inflow Data: has a significant impact on failure proneness
Inflow Functional: this type of syntactic dependency has less impact on failure proneness
Model III:
◦ A higher number of logical dependencies is related to an increase in the likelihood of failure
Model IV:
◦ Workflow dependencies increase the likelihood of defects
Model V:
◦ Coordination requirements have a higher impact in Project A and a lesser impact in Project B
All dependency types increase fault proneness. Logical dependencies have the highest impact, followed by workflow dependencies and then syntactic dependencies.
The analysis is based on data collected from 2 projects. Logical dependencies have the highest impact compared with the other 2 dependency types.
Weaknesses:
◦ Data were collected from only 2 projects
◦ Dependencies other than software and work dependencies are not discussed
◦ No method is provided to resolve the errors caused by these dependencies
Need to provide a method to resolve the errors caused by the dependencies.
Need a discussion of other dependency types.
General concepts should be introduced.