A Taxonomy of Evaluation Approaches in Software Engineering
A. Chatzigeorgiou, T. Chaikalis, G. Paschalidou, N. Vesyropoulos, C. K. Georgiadis, E. Stiakakis
University of Macedonia, Greece
BCI 2015, Craiova, Romania, September 2015
…we regret to inform you … the evaluation of your approach is rather weak …
…unfortunately we had to reject a number of good papers … the proposed approach lacks a thorough evaluation …
…we would like to thank you for your submission, BUT … further evaluation is required …
…Congratulations, your paper has been accepted …
…evaluation is backed up by systematic statistical results …
I need some proof = EVALUATION!!
Taxonomies
Taxonomy: Τάξις (Arrangement) + Νόμος (Law, Method)
A taxonomy “aims at organizing a collection of objects in a hierarchical manner to provide a conceptual framework for discussion and analysis”
Goal of Study To build a taxonomy of evaluation approaches in Software Engineering
Context of Study
3 PhD students, 3 faculty members
TSE: IEEE Transactions on Software Engineering
TOSEM: ACM Transactions on Software Engineering and Methodology
JSS: Elsevier's Journal of Systems and Software
Articles that appeared in the corresponding 2012 volume
Context of Study (2)
Data recorded: Title, Authors, Journal, Issue; Free Keywords & Classification (ACM); Employed Evaluation Approach; Pages devoted to the evaluation; Total #pages
Initial set: TSE: 81 articles, TOSEM: 24 articles, JSS: 207 articles
Filtered out: articles that clearly did not belong in the SE domain, and Empirical Studies (Systematic Literature Reviews, surveys, mapping studies)
After filtering: TSE: 58 articles, TOSEM: 22 articles, JSS: 53 articles (133 articles in total)
Key Terms
Performance: The most typical definition of performance originates from computer architecture: performance refers to the amount of work that a system/computer/program can perform in a given time or for given resources.
Effectiveness: By effectiveness we refer to the extent to which a proposed technique/methodology accomplishes the desired goal. For example, a testing approach is effective if it reveals a large number of bugs.
Benchmark: A benchmark is a standard, acknowledged data set (consisting of tasks, collections of items, software etc.) designed with the purpose of being representative of problems that occur frequently in real domains.
Proposed Taxonomy
The goal is to make clear the advantages and disadvantages over previous work, and usually to highlight the added value of the proposed technique.
By formal treatment we mean the use of a mathematically-based approach for proving theorems, properties, invariants or the correctness of a system. Not all software engineering research can benefit from the application of formal methods. The criterion is related to the completeness of the proof: 1. the mathematical reasoning validates the entire approach; 2. the mathematical reasoning ensures the fulfillment of certain properties.
Application of the proposed tool, algorithm or technique to artificially constructed or selected case studies. Results are obtained and discussed to demonstrate the feasibility, performance or effectiveness of the approach.
Empirical Evaluation, Case Studies, Case Study Evaluation, Empirical Results, Experiments, Experimental Results, …
Extent of Evaluation
Papers devoting just one page to the evaluation, and papers devoting as many as 24 pages, have been encountered.
Availability of Data
Validation of the Taxonomy
By definition, it is difficult to assess whether taxonomies are valid, since their construction relies on the subjective interpretation of categories.
We have applied the taxonomy to articles which were not considered during its development: we classified the papers from the Main Track of the 34th International Conference on Software Engineering (ICSE 2012). 87 articles have been considered.
We recorded:
a) Whether the paper actually introduces any technique
b) Whether the paper could be mapped to any of the derived classification categories
c) The corresponding category code
Validation of the Taxonomy (2)
Correlation between evaluation and area
RQ1: Is the evaluation approach correlated to the area of research?
H0: Variables "Area of Research" and "Evaluation Type" are independent
H1: Variables "Area of Research" and "Evaluation Type" are dependent
Areas of research correspond to a second-level classification based on the 2012 ACM Computing Classification System.
A chi-square test revealed that there is no statistically significant correlation between "Evaluation Type" and "Area of Research".
In Software Testing there is a tendency to employ case studies and analysis of effectiveness (i.e. how well a testing strategy achieves its goals)
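The independence test behind RQ1 can be illustrated with a minimal sketch. It assumes Pearson's chi-square test on a contingency table of counts (rows: areas of research, columns: evaluation types); the slides do not state which tool or variant was used. The function names `chi_square_independence` and `chi2_sf` and the example counts are hypothetical, not data from the study.

```python
import math

def chi2_sf(x, dof):
    """P(X > x) for a chi-square variable with `dof` degrees of freedom.

    Uses the power series of the regularized lower incomplete gamma function.
    """
    if x <= 0:
        return 1.0
    a, half_x = dof / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= half_x / (a + n)
        total += term
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def chi_square_independence(table):
    """Return (statistic, degrees of freedom, p-value) for a table of counts.

    Assumes every row and column total is positive; otherwise an expected
    count is zero and the statistic is undefined.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof, chi2_sf(stat, dof)

if __name__ == "__main__":
    # Hypothetical counts: 3 areas of research x 3 evaluation types.
    counts = [[12, 5, 7], [9, 8, 6], [4, 11, 10]]
    stat, dof, p = chi_square_independence(counts)
    # H0 (independence) is rejected at the 0.05 level only if p < 0.05.
    print(stat, dof, p, p < 0.05)
```

With many sparse cells, as is likely when 133 articles are spread over many categories, the chi-square approximation becomes unreliable, so a result like this should be read with care.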
Correlation between evaluation and area
RQ2: Is the extent of the evaluation correlated to the evaluation approach?
H0: The distribution of "Extent of Evaluation" is the same across categories of "Evaluation Type"
H1: The distribution of "Extent of Evaluation" is not the same across categories of "Evaluation Type"
We applied the non-parametric Independent-Samples Kruskal-Wallis test to compare the distributions across groups formed by the evaluation type variable.
The result is significant at the 0.05 level. In other words, the extent of evaluation is affected by the employed evaluation strategy.
Evaluation of efficiency on case studies, relying on explicitly stated research questions (E ) devotes a large percentage of the paper to the evaluation.
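The Kruskal-Wallis comparison used for RQ2 can be sketched as follows. This is a minimal illustration, assuming the standard H statistic with average ranks and tie correction; the function name `kruskal_wallis_h` and the page counts are hypothetical, not data from the study.

```python
import math

def kruskal_wallis_h(groups):
    """Return (H, degrees of freedom) for a list of samples.

    Ties receive the average of the ranks they span, and H is divided by the
    usual tie-correction factor. Undefined if all observations are equal.
    """
    pooled = sorted((value, g) for g, values in enumerate(groups) for value in values)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    h = 12.0 / (n * (n + 1)) * sum(
        r * r / len(values) for r, values in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    h /= 1.0 - tie_term / (n ** 3 - n)
    return h, len(groups) - 1

if __name__ == "__main__":
    # Hypothetical pages devoted to evaluation, grouped by evaluation type.
    pages = [[1, 2, 3, 2], [4, 6, 5, 7], [10, 12, 9, 24]]
    h, dof = kruskal_wallis_h(pages)
    # H is approximately chi-square distributed with `dof` degrees of freedom.
    # For three groups (dof == 2) the chi-square tail has the closed form exp(-H/2).
    p = math.exp(-h / 2.0)
    print(h, dof, p, p < 0.05)
```

The chi-square approximation for H is rough when groups are small, which is worth keeping in mind for evaluation types represented by only a few articles.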
Conclusion
In software engineering there is a vast number of different evaluation techniques, designed and executed to serve the needs of each particular research effort.
We have attempted to introduce a taxonomy of evaluation approaches.
We identified 17 evaluation types that any approach can adopt, either individually or in combination with other types, and 8 axes according to which evaluation approaches can be classified.
... We are glad to inform you that your paper: …. has been ACCEPTED by BCI 2015 Program Committee
Review 1 … the authors have done good job in supporting their methodology by a convincing evaluation approach …..
So, the next time you receive a review pointing to the strengths or weaknesses of the evaluation approach, you might be able to classify your approach based on the proposed taxonomy!
Thank you for your attention!!