A Taxonomy of Evaluation Approaches in Software Engineering A. Chatzigeorgiou, T. Chaikalis, G. Paschalidou, N. Vesyropoulos, C. K. Georgiadis, E. Stiakakis.


A Taxonomy of Evaluation Approaches in Software Engineering A. Chatzigeorgiou, T. Chaikalis, G. Paschalidou, N. Vesyropoulos, C. K. Georgiadis, E. Stiakakis University of Macedonia, Greece BCI 2015, Craiova Romania, September 2015

…we regret to inform you … the evaluation of your approach is rather weak …
…unfortunately we had to reject a number of good papers … the proposed approach lacks a thorough evaluation …
…we would like to thank you for your submission, BUT … further evaluation is required …
…Congratulations, your paper has been accepted …
…evaluation is backed up by systematic statistical results …

I need some proof = EVALUATION!!

Taxonomies
Taxonomy: Τάξις (Arrangement) + Νόμος (Law, Method)
A taxonomy “aims at organizing a collection of objects in a hierarchical manner to provide a conceptual framework for discussion and analysis”

Goal of Study
To build a taxonomy of evaluation approaches in Software Engineering

Context of Study
3 PhD students, 3 faculty members
TSE: IEEE Transactions on Software Engineering
TOSEM: ACM Transactions on Software Engineering and Methodology
JSS: Elsevier's Journal of Systems and Software
Articles that appeared in the corresponding 2012 volume

Context of Study (2)
Recorded for each article: Title, Authors, Journal, Issue; Free Keywords & Classification (ACM); Employed Evaluation Approach; Pages devoted to the evaluation; Total #pages
Before filtering: TSE: 81 articles, TOSEM: 24 articles, JSS: 207 articles
Filtered out: articles that clearly did not belong in the SE domain, and Empirical Studies (Systematic Literature Reviews, surveys, mapping studies)
After filtering: TSE: 58 articles, TOSEM: 22 articles, JSS: 53 articles (133 articles in total)
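The data sheet described on this slide can be sketched as a small record type. This is only an illustration: the class name, field names and the example article are assumptions, not part of the study. The journal counts are the ones reported on the slide, and the "extent" is computed here as the fraction of pages devoted to evaluation.

```python
from dataclasses import dataclass


@dataclass
class ArticleRecord:
    """One row of the data sheet; field names are illustrative assumptions."""
    title: str
    authors: str
    journal: str
    issue: str
    keywords: list
    evaluation_approach: str
    evaluation_pages: float
    total_pages: float

    def evaluation_extent(self) -> float:
        """Fraction of the article devoted to evaluation (0.0 to 1.0)."""
        if self.total_pages <= 0:
            raise ValueError("total_pages must be positive")
        return self.evaluation_pages / self.total_pages


# Article counts reported on the slide, before and after filtering.
initial = {"TSE": 81, "TOSEM": 24, "JSS": 207}
retained = {"TSE": 58, "TOSEM": 22, "JSS": 53}

total_retained = sum(retained.values())
excluded = {journal: initial[journal] - retained[journal] for journal in initial}

# Hypothetical article, used only to show the extent calculation.
example = ArticleRecord(
    title="(hypothetical)",
    authors="(hypothetical)",
    journal="JSS",
    issue="(hypothetical)",
    keywords=["testing"],
    evaluation_approach="case study",
    evaluation_pages=6,
    total_pages=24,
)

print(total_retained)               # 133, matching the slide
print(example.evaluation_extent())  # 0.25
```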

Key Terms
Performance: The most typical definition of performance originates from computer architecture: performance refers to the amount of work that a system/computer/program can perform in a given time or for given resources.
Effectiveness: By effectiveness we refer to the extent to which a proposed technique/methodology accomplishes the desired goal. For example, a testing approach is effective if it reveals a large number of bugs.
Benchmark: A benchmark is a standard, acknowledged data set (consisting of tasks, collections of items, software etc.) designed with the purpose of being representative of problems that occur frequently in real domains.

Proposed Taxonomy

The goal is to make clear the advantages and disadvantages over previous work, and usually to highlight the added value of the proposed technique.

Proposed Taxonomy

By formal treatment we mean the use of a mathematically-based approach for proving theorems, properties, invariants or the correctness of a system. Not all software engineering research can benefit from the application of formal methods.
The classification criterion is related to the completeness of the proof:
1. the mathematical reasoning validates the entire approach
2. the mathematical reasoning ensures the fulfillment of certain properties

Proposed Taxonomy

Application of the proposed tool, algorithm or technique on artificially constructed or selected case studies. Results are obtained and discussed to demonstrate the feasibility, performance or effectiveness of the approach.
Empirical Evaluation, Case Studies, Case Study Evaluation, Empirical Results, Experiments, Experimental Results, …

Extent of Evaluation
Papers with just one page and papers with as many as 24 pages devoted to the evaluation have been encountered.

Availability of Data

Validation of the Taxonomy
By definition, it is difficult to assess whether taxonomies are valid, since their construction relies on the subjective interpretation of categories.
We applied the taxonomy to articles that had not been considered during its development.
We classified the papers from the Main Track of the 34th International Conference on Software Engineering (ICSE'2012); 87 articles were considered.
We recorded:
a) Whether the paper actually introduces any technique
b) Whether the paper could be mapped to any of the derived classification categories
c) The corresponding category code

Validation of the Taxonomy (2)

Correlation between evaluation and area
RQ1: Is the evaluation approach correlated to the area of research?
H0: Variables "Area of Research" and "Evaluation Type" are independent
H1: Variables "Area of Research" and "Evaluation Type" are dependent
Areas of research correspond to a second-level classification based on the 2012 ACM Computing Classification System.
A chi-square test revealed that there is no statistically significant correlation between “Evaluation Type” and “Area of Research”.
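The chi-square test of independence used for RQ1 can be sketched in plain Python as follows. The contingency table below is made up purely for illustration (it is not the study's data), and the function name is an assumption. The decision compares the statistic against the standard 0.05 critical value for the table's degrees of freedom instead of computing an exact p-value.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees_of_freedom) for a contingency table of counts.

    Assumes every row total and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under H0 (independence of rows and columns).
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Standard chi-square critical values at the 0.05 level, keyed by degrees of freedom.
CRITICAL_005 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

# Hypothetical counts: rows = areas of research, columns = evaluation types.
toy_table = [
    [10, 10, 10],
    [12, 9, 9],
    [8, 11, 11],
]

statistic, dof = chi_square_statistic(toy_table)
reject_h0 = statistic > CRITICAL_005[dof]
print(statistic, dof, reject_h0)
```

With these toy counts the statistic stays well below the critical value, so H0 (independence) is not rejected, which is the same kind of outcome the slide reports. The approximation is usually considered reliable only when expected counts are not too small (a common rule of thumb is at least 5 per cell).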

In Software Testing there is a tendency to employ case studies and analysis of effectiveness (i.e. how well a testing strategy achieves its goals)

Correlation between evaluation and area
RQ2: Is the extent of the evaluation correlated to the evaluation approach?
H0: The distribution of "Extent of Evaluation" is the same across categories of "Evaluation Type"
H1: The distribution of "Extent of Evaluation" is not the same across categories of "Evaluation Type"
We applied the non-parametric Independent-Samples Kruskal-Wallis test to compare the distributions across groups formed by the evaluation type variable.
The result is significant at the 0.05 level. In other words, the extent of evaluation is affected by the employed evaluation strategy.
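A minimal sketch of the Kruskal-Wallis H statistic used for RQ2, written in plain Python with average ranks for ties and the usual tie correction. The three groups below are invented values (percentage of pages devoted to evaluation per evaluation type), not the study's data, and the function name is an assumption. Significance is judged against the chi-square critical value at 0.05, which is only an approximation for small groups.

```python
def kruskal_wallis_h(groups):
    """Return (H, degrees_of_freedom) for a list of samples.

    Assumes the pooled values are not all identical (the tie correction
    would otherwise be zero).
    """
    # Pool all observations, remembering which group each came from.
    pooled = sorted((value, g) for g, group in enumerate(groups) for value in group)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n:
        # Find the run of equal values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += average_rank
        run_length = j - i
        tie_term += run_length ** 3 - run_length
        i = j
    h = 12 / (n * (n + 1)) * sum(
        r * r / len(group) for r, group in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)
    return h / correction, len(groups) - 1


# Hypothetical extents (% of pages devoted to evaluation) for three evaluation types.
toy_groups = [
    [10, 15, 20],
    [25, 30, 35],
    [40, 45, 50],
]

h, dof = kruskal_wallis_h(toy_groups)
CRITICAL_005_DOF2 = 5.991  # chi-square critical value for 2 degrees of freedom
significant = h > CRITICAL_005_DOF2
print(h, dof, significant)
```

For these toy groups the distributions do not overlap at all, so H exceeds the critical value and H0 is rejected at the 0.05 level, mirroring the direction of the result reported on the slide.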

Evaluation of efficiency on case studies, relying on explicitly stated research questions (E ) devotes a large percentage of the paper to the evaluation.

Conclusion
In software engineering there is a vast number of different evaluation techniques, designed and executed to serve the needs of each particular piece of research.
We have attempted to introduce a taxonomy of evaluation approaches.
We identified 17 evaluation types, which any approach can adopt either individually or in combination with other types, and 8 axes according to which evaluation approaches can be classified.

... We are glad to inform you that your paper: …. has been ACCEPTED by BCI 2015 Program Committee
Review 1: … the authors have done good job in supporting their methodology by a convincing evaluation approach …..
So, the next time you receive a review pointing to the strengths or weaknesses of the evaluation approach, you might be able to classify your approach based on the proposed taxonomy!

BCI 2015, Craiova Romania, September 2015 Thank you for your attention!!