Presentation is loading. Please wait.

Presentation is loading. Please wait.

Celia Chen1, Lin Shi2, Kamonphop Srisopha1

Similar presentations


Presentation on theme: "Celia Chen1, Lin Shi2, Kamonphop Srisopha1"— Presentation transcript:

1 Maintainability Index Variation Among PHP, Java, and Python Open Source Projects
Celia Chen1, Lin Shi2, Kamonphop Srisopha1 1 Computer Science Department, USC 2 Laboratory for Internet Software Technologies, ISCAS

2 Agenda Future Work Results Research Question Motivation Conclusion
Data Collection Maintainability Index

3 Motivation Open Source Projects Low maintainability
Global Distributed Collaboration Voluntarily Low maintainability Increase the participate cost: Difficult to modify Increase the communication effort: Developers spend more time on finding solutions Not good for attractiveness Difficult to modify Increase the participation cost Difficult to find solutions for bugs Increase the maintainability effort

4 A successful OSS project requires to be highly maintainable
Motivation Open Source Projects Global Distributed Collaboration Voluntarily Low maintainability A successful OSS project requires to be highly maintainable Increase the participate cost: Difficult to modify Increase the communication effort: Developers spend more time on finding solutions Not good for attractiveness Difficult to modify Increase the participation cost Difficult to find solutions for bugs Increase the maintainability effort

5 Why Programming Languages?
“C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do it blows your whole leg off.” — Bjarne Stroustrup Impact of the language choice is significant “like choosing a wife“ — Barry W. Boehm Impact on design, development, later maintenance phases Impact of programming language on maintainability is rarely considered. Explain what Bjarne Stroustrup C++ is to a varying extent true for all powerful languages. As you protect people from simple dangers, they get themselves into new and less obvious problems. Someone who avoids the simple problems may simply be heading for a not-so-simple one. One problem with very supporting and protective environments is that the hard problems may be discovered too late or be too hard to remedy once discovered. Also, a rare problem is harder to find than a frequent one because you don't suspect it. Our goal: investigate the impact of programming language on maintainability

6 Maintainability “The ease in which a system can be modified or extended” Maintainability Index (MI) An index that represents the ease of maintaining the code Widely used in the industry

7 Maintainability Index
add footnote to reference Halstead Volume (HV) Cyclomatic complexity (CC) Count of lines (LLOC) Percent of lines of comments (CM) MI is developed by the University of Idaho in 1991 by Oman and Hagemeister

8 Halstead Volume According to Halstead, a computer program is an implementation of an algorithm considered to be a collection of tokens which can be classified as either operators or operands. Operators include: Operand includes: Reserved words (while, if, do, class, etc) Qualifier (const, static) expressions and arithmetic operators (+, >,=) etc. numeric constant literal identifiers etc. of course, program with a lot of expression, arithmatic operation, classifier, and other things will have high program volume which mean it’s more complex. hence, require more effort to maintain or it’s not easy to change. n1 = number of distinct operator n2 = number of distinct operands N1 = Total number of occurrences of operators N2 = Total number of occurrences of operands Program Length: N = N1 + N2 Vocabulary Size: n = n1 + n2 Program Volume = N * log2(n)

9 McCabe’s Cyclomatic Complexity
Cyclomatic Complexity aims to capture the complexity of a code function/method in a single number. The metric develops a Control Flow graph that measures the number of linearly independent paths through a program module* E = number of edges N = number of nodes P = number of module/ connected function/method. CC = E - N + 2 x P \ Lower the Program's cyclomatic complexity, lower the risk to modify and easier to understand. *

10 McCabe’s Cyclomatic Complexity
Cyclomatic Complexity aims to capture the complexity of a code function/method in a single number. The metric develops a Control Flow graph that measures the number of linearly independent paths through a program module* E = number of edges N = number of nodes P = number of module/ connected function/method. CC = E - N + 2 x P \ Lower the Program's cyclomatic complexity, lower the risk to modify and easier to understand. CC = (2 * 1) = 3 *

11 Logical Line of Code Logical Line of Code attempts to measure the number of executable expression/ statements Physical Line of Code Logical Line of Code Comment

12 Logical Line of Code Logical Line of Code attempts to measure the number of executable expression/ statements Physical Line of Code 13 Logical Line of Code 6 Comment 3

13 First Research Question
How does MI vary among Java, PHP, and Python open source projects? Null Hypothesis MI does not vary significantly across PHP, Java and Python OSS projects. Language Hypothesis For PHP, Java and Python OSS projects, MI varies significantly. separate them into 2 slides

14 Second Research Question
Does MI vary among various domains for these open source projects? If yes, does language choice affect MI within each domain? Null Hypothesis MI does not vary significantly across different software development domains Domain Hypothesis For different software development domains, MI of PHP, Java and Python OSS projects varies significantly separate them into 2 slides

15 Data Collection Selecting Criterion Has more than one official release
The latest stable release Selecting Criterion well-presented in the community separate them also and change to the table not the screen cap version Has fully accessible source code Has well- established sizing Exclusion Source codes under: test, example, sample, tutorial

16 Characteristics of project data sources
Language Average LLOC Metrics Collection Tools PHP 18643 Phpmetrics Java 33871 CodePro, LocMetrics Python 6644 Radon separate them also and change to the table not the screen cap version

17 Characteristics of project domains
separate them also and change to the table not the screen cap version * Excluding test, doc, example, tutorial folders

18 Classification on number of projects by LLOC in each domain

19 Results – RQ1 One-way ANOVA Results for language analysis
and this table P-Value <0.1 (Strongly suggestive) MI differs across the three languages at 90% confidence level Reject Null Hypothesis P-Value <0.1 (Strongly suggestive) MI differs across the three languages at 90% confidence level Reject Null Hypothesis

20 Maintainability Index without comment (MIWOC)
Consider the code: PHP>Python>Java(Average MIWOC)

21 Maintainability Index with comment (MIWC)
Comments Ratio: Java>PHP>Python (Average MIWC) How do we know that those comments are useful. Evaluate comment are in itself also a hard problem

22 Maintainability Index = MIWOC + MIWC
After consider CM: PHP > Java > Python (Average MI) Python is designed to be a highly readable language, frequently using English keywords where other languages use punctuation.

23 Results – RQ2 One-way ANOVA for domains P-Value <0.05 (Definitive)
MI differs across the five domains at 95% confidence level Reject Null Hypothesis

24 MI Variation among domains
Web Development Framework has shown the highest medians and the highest maximum value. Audio and Video has both the lowest maximum value and the lowest median value

25 Average MI for each Language
PHP may be a good option for projects that desires higher maintainability within Web Development Framework, Security/Cryptography and Audio and Video domain, Python may be a good option for System Administrative Software Java for Software Testing Tools.

26 Maintainability Index — To be Improved
Maintainability Index only consider Code Quality (Halstead Volume, Cyclomatic complexity), Size (Count of lines), and Comments Ratio as indicators. To comprehensively and accurately indicate the ease to maintain for OSS, there are more aspects need to be considered: For example: Code Structure: Cohesion & Coupling Application Clarity Documentation Quality Community Support

27 Conclusion Based on a dataset of 97 open source projects,
Employed one-way ANOVA to investigate How MI differs across Java, PHP and Python OSS projects How MI differs across 5 software domains. A reference to average OSS developers with more awareness that the potential options on Languages in terms of maintainability

28 Future Works Other languages, e.g., C/C++, Ruby, JavaScript, etc.
More language specific factors e.g. programming types, semantics, etc. The relationships between maintainability and other OSS quality attributes e.g. how does the maintainability impact on reliability of OSS projects? How to improve maintainability estimation for OSS projects? e.g. community contribution, personnel capability, etc. should also be explored and considered in the analysis to explain the results of why certain languages are better than others within specified domains


Download ppt "Celia Chen1, Lin Shi2, Kamonphop Srisopha1"

Similar presentations


Ads by Google