Experimentation in Software Engineering: An Introduction. Today we will talk about this book: Experimentation in Software Engineering. It gives an introduction to empirical software engineering, and particularly to the application and use of experimentation. Jianyun Zhou, Dept. of Computer Science, NTNU, 4th Oct 2002
Focus. The application of empirical studies, in particular experimentation, in software engineering, as one way of evaluating new methods and techniques. The focus of this book is the application of empirical studies,... 4/5/2019 Experimentation in software engineering
Content. Chapter 1-3: Introduction. Chapter 4-9: Experiment process. Chapter 10-14: Examples and exercises. Appendices: Statistical tables and process overview. The book contains fourteen chapters in total. Chapters one to nine will be presented in these two lectures, divided into two parts: an introduction part and an experimentation part.
Introduction part: Outline. Why empirical studies in software engineering? Fitting empirical studies to software engineering. Empirical strategies. Research environments in software engineering. Measurement theory. We start with the introduction part, which contains five points. First, why are empirical studies needed in software engineering, and how do we fit them to the software engineering context? Then we will look at some general empirical strategies and how they can be used in the different software engineering research environments. Finally, we will introduce measurement theory.
Why empirical studies in software engineering? To have control of the developed software: advocacy research has produced new methods based on marketing and conviction; instead, evaluate new methods and tools before using them. To turn software engineering into a science: put forward hypotheses; test hypotheses through empirical studies. More empirical studies need to be conducted in software engineering. Why are empirical studies needed? First, they are a way to gain control over the software we develop. Over the years, software engineering has been driven by so-called advocacy research: new methods and techniques have been invented and introduced based on marketing and conviction rather than scientific results. If we develop software using such methods, we have no control over the outcome. With empirical studies, we can evaluate new methods or tools before we introduce and use them, so the software developed can be controlled to a certain extent. Empirical studies are also a way to turn SE into a science. In science, we often build a theory in this way: first we put forward a hypothesis, then we run experiments to observe the phenomena, and based on the observations we either reject or accept the hypothesis. Experiments and empirical studies are thus important for testing whether a hypothesis holds. But experimentation is certainly not the most frequently used research method in SE today, so more empirical studies need to be conducted in this field.
Fitting empirical studies in SE. Software engineering context. Figure: a product idea and resources are the input to the software process, and the software product is the output. Application: improve the software process. This figure gives a simple view of the software process. The input to the process includes a product idea and some resources, for example people. The output from the software process is the software product. But the software process is not as simple as the figure shows: it includes many activities, can run over a long period and can be very complex. So, in order to improve the software product and lower the cost, companies also want to improve the software process. Empirical studies can be used in this context to improve the software process.
Improvement process. Two activities: (1) assessment of the software process, which identifies suitable areas for improvement (problems) and identifies improvement proposals; (2) evaluation of a software process improvement proposal. It is necessary to evaluate the proposal through empirical studies before making any major changes. In order to improve a software process, we often need to perform two activities. One is the assessment of the software process. Several models exist in SE for doing this assessment. Through these models, we can find where improvement is needed and then formulate improvement proposals. The next step is to evaluate the improvement proposals before making any changes to the software process. Because the process is a human-based activity, it is not possible to build a prototype of it; the only real evaluation of a process improvement proposal is to have people use it. Empirical studies are therefore very suitable in this context and can be used to evaluate different improvement proposals.
Empirical strategies. Three major strategies: Survey, Case study, Experiment. The fourth strategy: Post-mortem analysis (PMA), which uses the experience gained from projects within the organisation in order to learn. It is performed in retrospect of a project. In the book, three major strategies are introduced for empirical software engineering, they are… In addition, a fourth strategy is often mentioned in other literature: post-mortem analysis, which means that we learn from the experience gained in projects. It is performed in retrospect of a project. Compared with the other strategies, its main advantage is that the cost is rather low.
Survey. An investigation performed before a tool or technique is taken into use, or when it has been in use for a while. Primary means of gathering data: interviews, questionnaires. Provides no control. This strategy provides little control over the execution.
Case study. Case studies are used for monitoring projects or activities. Data is collected for a specific purpose throughout the study. The level of control is low. The level of control is also low in a case study: the experimenter has no control over the execution of the studied project.
Experiment. Experiments are normally done in a laboratory environment. Data is collected by performing the experiment. The experimenter has good control over the situation. We will learn more about this strategy in the next part.
Research environments in SE. How should the strategies be used when evaluating software process changes? Three research environments. Desktop: the change proposal is evaluated off-line; no people are involved in applying methods or tools; suitable for conducting surveys. Laboratory: the change proposal is evaluated in a laboratory setting; an experiment is conducted. Development projects: the change proposal is evaluated in a real development situation (observed on-line); case studies are more appropriate. We have introduced three main strategies for empirical studies; the question now is how to use them in the different research environments in SE. There are three research environments in software engineering: … In the desktop environment, … so the survey is the suitable strategy to use in this environment.
Research environments and strategies. Figure: the three research environments ordered by increasing risk, from Desktop (survey, low risk) through Laboratory (experiment) to Development projects (case study, high risk). The picture shows the placement of the three research environments and indicates the increasing risk. For example, in order to try out a new design method in a realistic environment, we may apply it in a real development project. This is, of course, riskier than desktop studies and experiments because it will affect the quality of the delivered product.
Measurement theory. Measurement is a central part of empirical studies. “You cannot control what you cannot measure.” Measure both inputs and outputs. Definition: a measure is the number or symbol assigned to an entity in order to characterise an attribute of the entity. Measurement is a mapping from the empirical world to the formal, relational world, i.e. providing a measure. Metrics: the attributes to be measured, how to measure them (scale), etc., e.g. LOC (lines of code). Now we will give a brief introduction to measurement theory, because measurement is a very important part of empirical studies and of experimentation. You cannot control something if you cannot measure it. Empirical studies are often used to investigate the effects of some input on the studied object, so there are two things we want to measure: the input and the effect (that is, the output). Measurement and measure are defined as on the slide. For example, we can measure the length of a table and map it to a number expressed in meters or inches. This number is a measure. The term metrics is often used in SE to denote something to be measured, for example LOC, the lines of code.
Scale and scale type. Scale: the different ways to map an attribute to a measure. Scale types are related to statistical analysis. Nominal scale: only maps the attribute to a name or symbol, least powerful, e.g. classification: IS, SU, DB… Ordinal scale: ranks according to an ordering criterion, more powerful than nominal, e.g. grades: poor (1), medium (2), good (3). Ratio scale: most powerful, e.g. length: 1.89m. Another important aspect of measurement theory is the scale. Scales are the different ways to measure the same thing. For example, if we want to measure the length of an object, we can measure it in meters, centimetres or inches. Each of them is a scale. Scale types classify scales into different types; the three most common scale types used in software engineering are: … One typical nominal scale is classification. For example, we can classify the students in computer science into different groups, and measure one student as “SE” and another student as “IS”. Grade, 100% or
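The link between scale type and statistical analysis can be illustrated with a minimal Python sketch. The data values below are invented for illustration only; the pairing of mode, median and mean with nominal, ordinal and ratio scales follows the usual convention in measurement theory and is an assumption here, not a quotation from the slides.

```python
import statistics

# Nominal scale: values are only names, so counting and the mode are meaningful.
groups = ["SE", "IS", "SE", "DB", "SE"]          # hypothetical student groups
typical_group = statistics.mode(groups)

# Ordinal scale: values can be ranked, so the median is meaningful,
# but differences between grades are not.
grades = [1, 3, 2, 2, 3]                          # poor=1, medium=2, good=3
typical_grade = statistics.median_low(grades)

# Ratio scale: differences and ratios are meaningful, so the mean can be used.
lengths_m = [1.89, 1.75, 1.80]                    # hypothetical lengths in meters
mean_length = statistics.mean(lengths_m)

print(typical_group, typical_grade, round(mean_length, 2))
```

Computing a mean of the nominal labels is not even possible, and a mean of the ordinal grades would treat the step from poor to medium as equal to the step from medium to good, which the scale does not guarantee.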
Classification of measures. Objective and subjective measures: an objective measure involves no judgment in the measurement value, e.g. LOC; a subjective measure is made by a person through judgment, e.g. personnel skill. Direct and indirect measures: a direct measure involves no other measurements, e.g. LOC; an indirect measure is derived from other measurements, e.g. defect rate = #defects/LOC. Objective: lines of code. Subjective: personnel skill. Direct: lines of code. Indirect: defect density, the number of defects divided by the number of lines of code.
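As a small illustration of an indirect measure, the defect density from the slide can be computed from two direct measures. This is a minimal sketch; the function name and the numbers are hypothetical.

```python
def defect_density(defects: int, loc: int) -> float:
    """Indirect measure: number of defects divided by lines of code."""
    if loc <= 0:
        raise ValueError("LOC must be positive")
    return defects / loc

# Hypothetical module: 12 defects found in 4000 lines of code.
density = defect_density(12, 4000)
print(f"{density * 1000:.1f} defects per KLOC")  # prints "3.0 defects per KLOC"
```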
Measurements in software engineering. Three classes of objects are of interest. Process, e.g. testing: effort (I), cost (E). Product, e.g. code: size (I), reliability (E). Resources, e.g. personnel: age (I), productivity (E). (I) and (E) denote internal and external attributes (Table 3 p.29). The objects that are of interest in SE can be divided into three classes. A process describes which activities are needed to produce software; an object belonging to this class is, for example, testing, for which we may want to measure cost or effort. A product is the result of a process. Resources are the objects needed for a process activity.
Experimentation part (outline). Experimentation basics. Experimentation principles. Terminology. Experiment process. Now we will talk about the experimentation part. It contains …, and terminology, that is, the concepts used in experimentation, and the experiment process.
Experimentation basics. Experiments are controlled studies, often used to compare one thing with another. They include a formal hypothesis and statistical tests. A hypothesis means that we have an idea of, for example, a relationship, which can be stated formally. The main objective of an experiment is mostly to evaluate a hypothesis or relationship.
Experiment principles. Figure: the experiment objective connects two levels. At the theory level, a cause construct is related to an effect construct (cause-effect construct). At the observation level, a treatment is related to an outcome (treatment-outcome construct). In the experiment operation, the treatment is the independent variable and the outcome is the dependent variable.
Terminology in experimentation.
Variables. Independent variables: all the variables in a process that are manipulated and controlled, e.g. design method, personnel experience, tool support. Dependent variables (or response variables): the variables we want to study to see the effect of the changes in the independent variables; often only one dependent variable in an experiment, e.g. efficiency, productivity.
Factors: one or more independent variables that are changed, e.g. design method.
Treatment (condition): one particular value of a factor, e.g. design method = OO method or FO method.
Object, subject and test: subjects (persons, students) apply treatments to objects (program, document). Each test is a combination of subject, treatment and object, e.g. student A uses the OO method to develop program N.
Illustration of experiment. Figure: a process with independent variables as input and the dependent variable as output. The experiment design applies the treatment to some independent variables, while the other independent variables are kept at a fixed level.
Experiment process. Figure: an experiment idea enters the experiment process, which consists of experiment definition, experiment planning, experiment operation, analysis & interpretation, and presentation & package, and which results in conclusions.
Definition phase. Figure: the experiment idea is the input to the "define experiment" step, and the experiment definition is the output. The purpose of the definition phase is to define the goals of an experiment according to a defined framework (the goal definition template).
Goal definition template. The goal template:
Object of the study: the entity that is studied in the experiment, e.g. methods, models, processes, final products.
Purpose: the intention of the experiment, e.g. evaluation.
Quality focus: the primary effect under study, e.g. reliability, cost.
Perspective: the viewpoint from which the results are interpreted, e.g. customer.
Context: the environment in which the experiment is run: single object study; multi-object variation study; multi-test within object study; blocked subject-object study.
Answering these questions is a good way to arrive at the experiment definition:
Analyze <Object(s) of study> for the purpose of <Purpose> with respect to <Quality focus> from the point of view of the <Perspective> in the context of <Context>.
An example definition. Analyze the PBR and checklist techniques for the purpose of evaluation with respect to effectiveness and efficiency from the point of view of the researcher in the context of students reading requirements documents.
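The goal definition template lends itself to a small data structure. The following Python sketch is only one possible way to capture it; the class name `ExperimentGoal` and its field names are hypothetical and chosen for illustration, while the field values are taken from the example definition above.

```python
from dataclasses import dataclass

@dataclass
class ExperimentGoal:
    """The five parts of the goal definition template (names are illustrative)."""
    object_of_study: str
    purpose: str
    quality_focus: str
    perspective: str
    context: str

    def render(self) -> str:
        # Fill the template sentence with the five parts.
        return (
            f"Analyze {self.object_of_study} "
            f"for the purpose of {self.purpose} "
            f"with respect to {self.quality_focus} "
            f"from the point of view of the {self.perspective} "
            f"in the context of {self.context}."
        )

goal = ExperimentGoal(
    object_of_study="the PBR and checklist techniques",
    purpose="evaluation",
    quality_focus="effectiveness and efficiency",
    perspective="researcher",
    context="students reading requirements documents",
)
print(goal.render())
```

Keeping the five parts as separate fields makes it harder to forget one of them when defining a new experiment.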
Experiment planning – phase overview. Figure: the experiment definition is the input to experiment planning, which consists of context selection, hypothesis formulation, variables selection, selection of subjects, experiment design, instrumentation and validity evaluation. The output is the experiment design.
Context selection. On-line or off-line. Students or professionals. Toy-size or real problems. Specific or general.
Hypothesis formulation. A hypothesis is a specific statement of prediction. Two hypotheses have to be formulated:
A null hypothesis, H0: the other possible outcomes.
An alternative hypothesis, H1: the one you support.
The objective is to reject the null hypothesis with a certain significance. If H0 cannot be rejected, no conclusion can be drawn? Hypothesis testing is the basis for the statistical analysis of an experiment.
Risks in hypothesis testing:
Type-I-error: P(type-I-error) = P(reject H0 | H0 true), the significance level.
Type-II-error: P(type-II-error) = P(not reject H0 | H0 false).
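The two error probabilities can be made concrete with a small Python sketch. Everything in it is a hypothetical illustration rather than an example from the book: it assumes 10 paired comparisons between a new and an old method, a sign-test style decision rule that rejects H0 (both methods equally likely to win) when the new method wins at least 9 times, and, for the type-II-error, an assumed true win probability of 0.8.

```python
from math import comb

def binom_tail(n: int, k_min: int, p: float) -> float:
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

N_PAIRS = 10      # hypothetical number of paired comparisons
REJECT_AT = 9     # hypothetical rule: reject H0 if the new method wins >= 9 times

# Type-I-error: H0 is true (win probability 0.5) but we reject it anyway.
alpha = binom_tail(N_PAIRS, REJECT_AT, 0.5)

# Type-II-error: H0 is false (assumed true win probability 0.8) but we do not reject it.
beta = 1 - binom_tail(N_PAIRS, REJECT_AT, 0.8)

print(f"alpha = {alpha:.4f}")  # alpha = 0.0107
print(f"beta  = {beta:.4f}")   # beta  = 0.6242
```

With so few comparisons, the strict rejection rule keeps the type-I-error small at the price of a large type-II-error; more subjects or a larger effect would be needed to reduce both.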
Planning: design.
Variables selection. Factors: controllable, changeable, have an effect on the dependent variable. Dependent variable: often one, affected by the treatments. Simultaneously or in reverse order.
Subjects: representative.
Design principles. Randomization: the allocation of subjects and objects, and the order of the tests. Blocking: “reducing noise”. Balancing: the same number of subjects in each group.
Design types presented: one factor with two treatments; one factor with more than two treatments; two factors with two treatments; more than two factors, each with two treatments. Different design types use different test methods.
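The three design principles can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the book's procedure: the function names, the subject labels and the blocking criterion (experience level) are all hypothetical.

```python
import random

def assign_balanced(subjects, treatments, seed=None):
    """Randomization + balancing: shuffle the subjects, then deal treatments round-robin."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    return {s: treatments[i % len(treatments)] for i, s in enumerate(shuffled)}

def assign_blocked(blocks, treatments, seed=None):
    """Blocking: randomize and balance separately inside each block."""
    rng = random.Random(seed)
    assignment = {}
    for members in blocks.values():
        shuffled = list(members)
        rng.shuffle(shuffled)
        for i, subject in enumerate(shuffled):
            assignment[subject] = treatments[i % len(treatments)]
    return assignment

treatments = ["OO method", "FO method"]
subjects = [f"student {c}" for c in "ABCDEFGH"]   # hypothetical subjects
plan = assign_balanced(subjects, treatments, seed=1)

# Hypothetical blocking factor: experience level of the subjects.
blocks = {"novice": subjects[:4], "experienced": subjects[4:]}
blocked_plan = assign_blocked(blocks, treatments, seed=1)
print(plan)
print(blocked_plan)
```

In the blocked plan, each experience level contributes equally to both treatments, so differences in experience cannot be confused with the effect of the design method.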
Instrumentation. The overall goal of the instrumentation is to provide means for performing the experiment and for monitoring it, without affecting the control of the experiment. The instruments for an experiment are of three types. Objects: e.g. specification or code documents. Guidelines: to guide the participants in the experiment. Measurement instruments: data collection via manual forms or in interviews.
Validity evaluation.
Conclusion validity: treatment to outcome (“right” analysis).
Internal validity: treatment causes outcome (“right” measures).
Construct validity: theory to observation (“right” metrics).
External validity: generalization (“right” context).
Validity.
Conclusion validity: is there a relationship between the two variables?
Internal validity: assuming that there is a relationship in this study, is the relationship a causal one?
Construct validity: assuming that there is a causal relationship in this study, can we claim that the treatments reflected our cause construct well and that the outcome reflected our idea of the effect construct well?
External validity: assuming that there is a causal relationship in this study between the constructs of the cause and the effect, can we generalize this effect to other persons, places or times?
Experiment operation. Figure: the experiment design is the input to experiment operation, which consists of preparation, execution and data validation; the output is the experiment data. Preparation: subjects are chosen and forms etc. are prepared. Execution: subjects perform their tasks according to the different treatments, and data is collected. Data validation: the collected data is validated.
Analysis and interpretation. Figure: the experiment data passes through descriptive statistics, data set reduction and hypothesis testing, which leads to conclusions.
Descriptive statistics and data set reduction. The goal is to get a feeling for how the data is distributed.
Descriptive statistics characterize the data by:
measures of central tendency: mean, median, mode etc.
measures of dispersion: variance, range, relative frequency etc.
measures of dependency: covariance etc.
graphical visualization: scatter plot, box plot, histogram etc.
The scale of the measurement restricts the type of statistics that can be used.
Data set reduction removes abnormal or false data points (outliers) and reduces the data set to a set of valid data points.
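A minimal Python sketch of both steps follows. The effort values are invented, and the outlier rule used for the data set reduction (points more than 1.5 interquartile ranges outside the quartiles) is a common convention assumed here for illustration; the slides do not prescribe a particular rule, and in practice each suspected outlier should be inspected before it is removed.

```python
import statistics

# Hypothetical effort data (hours) from seven subjects; 95 looks abnormal.
efforts = [10, 12, 11, 13, 12, 11, 95]

# Descriptive statistics: central tendency and dispersion.
print("mean    ", round(statistics.mean(efforts), 2))
print("median  ", statistics.median(efforts))
print("variance", round(statistics.variance(efforts), 2))   # sample variance
print("range   ", max(efforts) - min(efforts))

# Data set reduction with the assumed 1.5 * IQR rule.
q1, _, q3 = statistics.quantiles(efforts, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
reduced = [x for x in efforts if low <= x <= high]
print("reduced ", reduced)
```

Note how the single abnormal value pulls the mean far away from the median, which is one reason to look at several descriptive statistics before testing hypotheses.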
Hypothesis testing. Principle: the objective is to see whether it is possible to reject the null hypothesis. If the null hypothesis is not rejected, nothing can be said from the experiment, while if it is rejected, it can be stated that the null hypothesis is false with a significance (α). α = P(type-I-error) = P(reject H0 | H0 true).
The different types of tests are related to the different design types.
Parametric tests: based on a model that involves a specific distribution; the parameters must be measured on at least an interval scale (interval or ratio).
Non-parametric tests: based on a model with very general conditions; may be applied to nominal and ordinal scales.
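To show the principle for the simplest design type (one factor with two treatments), here is a sketch of an exact permutation test in Python. The permutation test is chosen here only because it is a non-parametric test that fits in a few lines of standard-library code; it is not necessarily the test the book recommends for this design, and the data (defects found by two groups of four subjects) are invented.

```python
from itertools import combinations
from statistics import fmean

def permutation_p_value(group_a, group_b):
    """One-sided exact permutation test for mean(group_a) > mean(group_b).

    Under H0 the treatment labels are exchangeable, so every split of the
    pooled data into groups of the original sizes is equally likely.
    """
    pooled = list(group_a) + list(group_b)
    observed = fmean(group_a) - fmean(group_b)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if fmean(a) - fmean(b) >= observed:
            extreme += 1
    return extreme / total

pbr = [7, 9, 8, 10]        # hypothetical defects found with PBR
checklist = [5, 6, 4, 7]   # hypothetical defects found with a checklist

p = permutation_p_value(pbr, checklist)
ALPHA = 0.05               # assumed significance level
print(f"p = {p:.4f}, reject H0: {p < ALPHA}")  # p = 0.0286, reject H0: True
```

Only 2 of the 70 possible splits give a difference at least as large as the observed one, so with the assumed significance level of 0.05 the null hypothesis of no difference would be rejected for these invented data.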
Presentation and packaging. It is essential not to forget important aspects or necessary information needed to enable others to replicate the experiment or to take advantage of it and of the knowledge gained through it. Report outline: Introduction; Problem statement; Experiment planning; Experiment operation; Data analysis; Interpretation of results; Discussion and conclusions; Appendix.
Rest of the book. Chapter 10: Literature survey, references to some published software engineering experiments. Chapter 11: An example of the process, an example to illustrate the experiment process. Chapter 12: Experiment example, how to report an experiment in a paper. Chapter 13: Exercises and data. Appendices: Statistical tables and process overview.