Evaluating the Efficacy of Test-Driven Development: Industrial Case Studies
Joe Finley
Test-Driven Development
- First use: 1960, NASA Project Mercury.
- Test code is written prior to implementation code.
- Highly iterative process.
- The goal is to improve quality.
Benefits of Test-Driven Development
- Efficiency and feedback: the test-then-code cycle gives continuous feedback.
- Low-level design: tests provide a specification of low-level design decisions.
- Reduction of defects injected: patching code requires rerunning the automated test cases.
- Test assets: TDD requires writing code that is automatically testable, so regression testing assets are already built.
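The test-then-code cycle above can be sketched in a few lines. This is a minimal illustration, not taken from the case studies: the leap-year function, its name, and the test class are hypothetical, and Python's standard `unittest` module stands in for frameworks such as CppUnit or NUnit.

```python
import unittest


# Step 1: the tests are written first. They act as a specification of the
# low-level design: the function name, its argument, and its return value.
# (is_leap_year is a hypothetical example, not code from the studies.)
class LeapYearTest(unittest.TestCase):
    def test_divisible_by_four_is_leap(self):
        self.assertTrue(is_leap_year(2024))

    def test_century_is_leap_only_if_divisible_by_400(self):
        self.assertFalse(is_leap_year(1900))
        self.assertTrue(is_leap_year(2000))

    def test_ordinary_year_is_not_leap(self):
        self.assertFalse(is_leap_year(2023))


# Step 2: the simplest implementation that makes the tests pass.
def is_leap_year(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Step 3: the same tests are rerun after every later change, so they
# double as regression testing assets.
def run_tests() -> unittest.TestResult:
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(LeapYearTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

Running the tests before Step 2 exists would fail with a `NameError`, which is the expected starting point of each iteration.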
Related Case Studies
- Empirical study at IBM: 40% fewer defects in functional verification and regression tests than a baseline prior product, without reduced productivity.
- John Deere, RoleModel Software, and Ericsson: programmers wrote a small Java program using TDD while the control group used a waterfall-like approach.
  - TDD programmers passed 18% more tests.
  - TDD programmers used 16% more time (the control group did not write automated tests).
- Academic studies:
  1.) TDD variant (Müller and Hagner): complete automated unit test cases are written before any production code. Conclusion: no higher quality.
  2.) Test-first and test-last (Erdogmus): 24 undergraduates improved productivity but not quality. Conclusion: the effectiveness of test-first depends on backing up code with test cases.
  3.) XP: 11 undergraduates… From a testing perspective, 87% stated that executing test cases strengthened their confidence.
Microsoft Case Studies: Defect Measurement
- Quality is measured as defect density.
- Definition of a defect: “…a person makes an error that results in a physical fault (or defect) in a software element. When this element is executed, traversal of the fault/defect may put the element (or system) into an erroneous state. When this erroneous state results in an externally visible anomaly, we say that a failure has occurred.”
- Defects are counted under this definition and normalized per thousand lines of code (KLOC).
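As a rough sketch of the normalization, defect density is the defect count divided by the code size in KLOC. The function name and the sample numbers below are assumptions chosen for illustration; they are not figures from the Microsoft projects.

```python
def defects_per_kloc(defect_count: int, lines_of_code: int) -> float:
    """Return defect density: defects per thousand lines of code (KLOC)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defect_count / (lines_of_code / 1000)


if __name__ == "__main__":
    # Hypothetical example: 12 defects found in 6,000 lines of code.
    print(defects_per_kloc(12, 6000))
```

Normalizing by size is what allows projects of different sizes to be compared on the same scale.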
Microsoft Case Studies
- TDD was evaluated in two different divisions: Windows and MSN.
- Defect density is used to measure quality.
- The increase in development time due to TDD is also measured.
- Use of the CppUnit and NUnit frameworks suggests the results generalize across languages.
- Both project managers report to the same manager.
Setup: Context Factors
- Project A: Windows networking team. Project B: MSN.
- The expertise of the Project B team is lower than that of the Project A team.
- Project A uses C, while Project B uses C#.
Product Measures
- Comparable projects chosen by managers with similar responsibility.
Outcome Measures
- Project A took at least 10-20% more time than the comparable project over Project B.
- Project B was 5 times larger than Project A.
- Project B's comparable non-TDD project had 4.2 times as many defects/KLOC.
- Project A's comparable non-TDD project had 2.6 times as many defects/KLOC. Is the smaller ratio due to expertise or to project size?
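The defect ratios above can also be read as a relative reduction: if the non-TDD project had r times as many defects/KLOC, the TDD project's density is 1/r of the non-TDD density, a reduction of 1 - 1/r. The sketch below only restates the reported ratios in that form; the function name is an assumption.

```python
def relative_defect_reduction(non_tdd_to_tdd_ratio: float) -> float:
    """Fraction by which TDD defect density is lower than the non-TDD one.

    non_tdd_to_tdd_ratio is how many times more defects/KLOC the comparable
    non-TDD project had (for example 4.2 or 2.6).
    """
    if non_tdd_to_tdd_ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1 - 1 / non_tdd_to_tdd_ratio


if __name__ == "__main__":
    for project, ratio in (("A", 2.6), ("B", 4.2)):
        reduction = relative_defect_reduction(ratio)
        print(f"Project {project}: about {reduction:.0%} fewer defects/KLOC with TDD")
```

Under this reading, a ratio of 2.6 corresponds to roughly 62% fewer defects/KLOC and a ratio of 4.2 to roughly 76% fewer.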
Threats to Validity
- TDD developers may have been motivated to produce higher-quality code because it was a new process.
- The TDD projects may have been easier to develop.
- The analysis needs additional repetition in different contexts before the results can be generalized.
Conclusions and Future Work
- Additional research in industry, in differing contexts.
- Cost-benefit economic analysis of the utility of TDD.
Questions?