From myths and fashions to evidence-based software engineering Magne Jørgensen.


Most of the methods below have once been (some still are) fashionable... The Waterfall model, the sashimi model, agile development, rapid application development (RAD), unified process (UP), lean development, modified waterfall model, spiral model development, iterative and incremental development, evolutionary development (EVO), feature driven development (FDD), design to cost, 4 cycle of control (4CC) framework, design to tools, reuse-based development, rapid prototyping, timebox development, joint application development (JAD), adaptive software development, dynamic systems development method (DSDM), extreme programming (XP), pragmatic programming, scrum, test driven development (TDD), model-driven development, agile unified process, behavior driven development, code and fix, design driven development, V-model-based development, solution delivery, cleanroom development, ...

The paper clip was invented by a Norwegian

Short men are more aggressive (The Napoleon complex)

Most (93%) of our communication is non-verbal

There was (is?) a software crisis. From page 13 of The Standish Group's 1994 report: “We then called and mailed a number of confidential surveys to a random sample of top IT executives, asking them to share failure stories.”

45% of features of “traditional projects” are never used (source: The Standish Group, cited at XP 2002). No one seems to know (and the Standish Group does not tell) anything about this study! Why do so many believe (and use) this non-interpretable, non-validated claim? They benefit from it (the agile community) + confirmation bias (we all know at least one instance that fits the claim).

14% of Waterfall and 42% of Agile projects are successful (source: The Standish Group, The Chaos Manifesto 2012). Successful = “on cost, on schedule and with specified functionality”. Can you spot the serious error in this comparison?

The number one in the stink parade …

The ease of creating myths: Are risk-willing or risk-averse developers better?
Study design: research evidence + self-generated argument.
Question: “Based on your experience, do you think that risk-willing programmers are better than risk-averse programmers?” Scale: 1 (totally agree), 5 (no difference), 10 (totally disagree).
–Neutral group: average 5.0
–Group A: average 3.3 initially, 3.5 two weeks after debriefing
–Group B: average 5.4 initially, 4.9 two weeks after debriefing

“I see it when I believe it” vs. “I believe it when I see it”
26 experienced software managers with different preferences on contract type: fixed price or per hour.
–Clients tended to prefer fixed price, while providers were more in favor of per hour.
They were presented with a data set of 16 projects with information about contract type and project outcome (client benefits and cost-efficiency of the development work).
Result: the interpretation of the data depended on the manager's prior preference; a chi-square test of independence gives p=0.01.
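As an illustration of the mechanics, a chi-square test of independence for a 2x2 table takes only a few lines. The counts below are hypothetical (the study's actual table is not reproduced here); they are chosen only to show how 26 managers split by preference and interpretation could be tested:

```python
# Chi-square test of independence for a 2x2 contingency table.
# Rows: preferred contract type (fixed price / per hour);
# columns: how the manager interpreted the project data.
# All counts are hypothetical, for illustration only.

def chi_square_2x2(table):
    """Return the chi-square statistic for a 2x2 contingency table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, obs_row in enumerate(table):
        for j, obs in enumerate(obs_row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat

table = [[7, 2], [3, 14]]        # hypothetical split of the 26 managers
stat = chi_square_2x2(table)
print(round(stat, 2))
print(stat > 6.63)               # 6.63 is the df=1 critical value for p = 0.01
```

With a split this lopsided the statistic exceeds the df=1 critical value for p=0.01, i.e. the apparent dependence between preference and interpretation would be unlikely under independence.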

(Figure: effect sizes from studies comparing estimation models, with effect size = MMRE_analogy - MMRE_regression; one end of the axis favors regression-based models, the other analogy-based models. Bias among researchers?)

(Figure: the same effect sizes, split by whether the researchers had developed their own analogy-based model (vested interests); the axis again runs from regression-based models better to analogy-based models better.)

THE EFFECT OF LOW POWER, RESEARCHER BIAS AND PUBLICATION BIAS
How many results are incorrect?

1000 statistical tests: 500 true relationships and 500 false relationships.
Statistical power is 30% -> 150 true positives (green). Significance level is 5% -> 25 false positives (red).
Correct test results: (150 + 475)/1000 = 62.5%
Correct positive tests: 150/(150 + 25) = 85.7% (i.e., the probability of the null hypothesis being true when p<0.05 is 14.3%, not 5%)
Proportion of expected statistically significant results: (150 + 25)/1000 = 17.5%
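The bookkeeping above follows directly from the stated power and significance level, as a few lines of Python confirm:

```python
# Check the slide's arithmetic: 1000 tests, half on true relationships,
# 30% statistical power, 5% significance level.
tests = 1000
true_rel = false_rel = tests // 2        # 500 true, 500 false relationships

tp = 0.30 * true_rel                     # 150 true positives (power)
fn = true_rel - tp                       # 350 false negatives
fp = 0.05 * false_rel                    # 25 false positives (significance level)
tn = false_rel - fp                      # 475 true negatives

correct = (tp + tn) / tests              # proportion of correct test results
ppv = tp / (tp + fp)                     # proportion of positive tests that are correct
fdr = fp / (tp + fp)                     # prob. the null is true given p < 0.05
positives = (tp + fp) / tests            # proportion of significant results
print(correct, round(ppv, 3), round(fdr, 3), positives)
```

This reproduces the slide's 62.5% correct results, 85.7% correct positives (hence a 14.3% false-discovery rate, not 5%), and 17.5% expected significant results.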

We observe about 50% p<0.05 results in published SE experiments. We should expect 17.5%, and at most 30% even if we only tested true relationships. -> Researcher and publication bias.

EFFECT OF ADDING 20% RESEARCHER BIAS AND 30% PUBLICATION BIAS

1000 statistical tests. Statistical power is 30% -> 150 true positives (green). Significance level is 5% -> 25 false positives (red).
Researcher bias is 20% -> 70 more true positive tests and 95 more false positive tests (blue).
Publication bias (30%) removes 78 negative tests on the true-relationship side and 114 negative tests on the false-relationship side.
Of the remaining tests, 42% are positive.
Correct test results: 61% (just above half of the tests). Correct positive tests: 65%. One third of the reported positive tests are incorrect!
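Taking the slide's counts as given (70 and 95 bias-added positives, 78 and 114 suppressed negatives), the headline percentages can be reproduced:

```python
# Bookkeeping for the biased scenario, using the slide's counts as inputs.
tp = 150 + 70            # true positives, incl. 70 added by 20% researcher bias
fp = 25 + 95             # false positives, incl. 95 added by researcher bias
suppressed = 78 + 114    # negative results removed by 30% publication bias
published = 1000 - suppressed

positive_share = (tp + fp) / published   # share of published tests that are positive
wrong_positive = fp / (tp + fp)          # share of positive tests that are incorrect
print(round(positive_share, 2), round(wrong_positive, 2))
```

This gives roughly 42% positive tests among the published results, of which about 35% (one third) are incorrect.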

LOW PROPORTION OF CORRECT RESULTS! WE NEED TO IMPROVE STATISTICAL RESEARCH PRACTICES IN SOFTWARE ENGINEERING! IN PARTICULAR, WE NEED TO INCREASE STATISTICAL POWER (INCREASED SAMPLE SIZE)

Have you heard about the assumption of FIXED VARIABLES?

Illustration: Salary discrimination? Assume an IT company which:
–Has 100 different tasks it wants to complete and for each task hires one male and one female employee (200 workers).
–The “base salary” of a task varies randomly from task to task and is the same for the male and the female employee.
–The actual salary is the “base salary” plus a random, gender-independent bonus, drawn with a “lucky wheel”.
This should lead to (on average): salary of female = salary of male.
Let's do a regression analysis with the model “Salary of female = a + b*Salary of male”, where b<1 would mean that women are discriminated against. The regression analysis gives b=0.56. Strong discrimination of women!?
Let's repeat the analysis on the same data with the model “Salary of male = a* + b**Salary of female”. The regression analysis gives b*=0.56. Strong discrimination of men????
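The thought experiment is easy to simulate. The sketch below uses made-up salary ranges and many more pairs than the slide's 100 tasks (for a stable estimate), so the exact slope differs from the slide's 0.56, but the point survives: regressing either gender's salary on the other's gives a slope well below 1.

```python
import random

random.seed(1)

# Simulate the thought experiment: each task has a shared "base salary",
# and each of the two employees gets an independent, gender-independent bonus.
# All salary figures are made up for illustration.
BASE_MIN, BASE_SPREAD = 50_000, 20_000   # hypothetical base-salary range
BONUS_MAX = 20_000                       # hypothetical maximum bonus

def slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    return cov / var

men, women = [], []
for _ in range(100_000):                 # many pairs, for a stable estimate
    base = BASE_MIN + random.uniform(0, BASE_SPREAD)
    men.append(base + random.uniform(0, BONUS_MAX))
    women.append(base + random.uniform(0, BONUS_MAX))

b_wm = slope(men, women)   # "Salary of female = a + b * Salary of male"
b_mw = slope(women, men)   # "Salary of male = a* + b* * Salary of female"
print(round(b_wm, 2), round(b_mw, 2))   # both well below 1, and nearly equal
```

With equal base-salary and bonus spreads the expected slope is var(base)/(var(base) + var(bonus)) = 0.5 in both directions: the slope below 1 reflects regression toward the mean, not discrimination.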

(Scatter plot: salary of men vs. salary of women.)

How would you interpret these data? (from a published study) CR duration = Actual duration (effort) to complete a change request Interpretation by the author of the paper: Larger tasks are more under-estimated.

What about these data? They are from the exact same data set! The only difference is in the use of the estimated instead of actual duration as the task size variable.

Economy of scale? Probably not... (M. Jørgensen and B. Kitchenham, “Interpretation problems related to the use of regression models to decide on economy of scale in software development”, Journal of Systems and Software, 85(11), 2012.)
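The artifact is easy to reproduce by simulation: generate unbiased estimates (actual = estimate × noise, with mean-1 noise), and the "larger tasks are more under-estimated" pattern appears when task size is measured by actual duration, but vanishes when it is measured by estimated duration. All numbers below are made up for illustration.

```python
import random

random.seed(2)

def corr(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sum((v - mx) ** 2 for v in x) ** 0.5
    sy = sum((v - my) ** 2 for v in y) ** 0.5
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

estimates, actuals = [], []
for _ in range(10_000):
    est = random.uniform(10, 100)          # estimated duration of a change request
    act = est * random.uniform(0.5, 1.5)   # unbiased noise: mean 1, so E[actual] = estimate
    estimates.append(est)
    actuals.append(act)

# Positive overrun = under-estimation.
overrun = [a - e for a, e in zip(actuals, estimates)]
c_actual = corr(overrun, actuals)      # clearly positive: "larger tasks more under-estimated"?
c_estimate = corr(overrun, estimates)  # near zero: the effect vanishes with estimated size
print(round(c_actual, 2), round(c_estimate, 2))
```

The positive correlation against actual duration is purely mechanical: the noise term appears on both axes, so nothing about scale can be concluded from it.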

HOW TO MAKE SOFTWARE ENGINEERING MORE EVIDENCE-BASED?

Evidence-based software engineering (EBSE)
The main steps of EBSE are as follows:
1. Convert a relevant problem or need for information into an answerable question.
2. Search the literature and practice-based experience for the best available evidence to answer the question (+ create your own local evidence, if needed).
3. Critically appraise the evidence for its validity, impact, and applicability.
4. Integrate the appraised evidence with practical experience and the client's values and circumstances to make decisions about practice.
5. Evaluate performance in comparison with previous performance and seek ways to improve it.

The software industry should learn to formulate questions that are meaningful for its context/challenge/problem. The question “Is Agile better than Traditional methods?” is NOT answerable: What is agile? What is traditional? What is better? What is the context?

Learn to be more critical (myth busting) when claims are made
1. Find out what is meant by the claim.
–Is it possible to falsify the claim? If not, what is the function of the claim?
2. Put yourself in a “critical mode”.
–Raise awareness of the tendency to accept claims, even without valid evidence, when we agree or they seem intuitively correct.
–Reflect on what you would consider valid evidence for the claim.
–Are there vested interests?
–Do you agree because of the source?
3. Collect and evaluate evidence.
–Research-based, practice-based, and “own” evidence.
4. Synthesize the evidence and conclude (if possible).

Learn to question what statements and claims mean

Learn how to evaluate argumentation, e.g., with the elements of Toulmin's model: data, claim, warrant, backing, qualifier, reservation.

Learn how to use Google Scholar (or similar sources of research-based evidence)

Learn how to collect and evaluate practice-based experience. Use methods similar to those for evaluating research-based evidence and claims, and be aware of “organizational over-learning”.

Learn how to create local evidence. Experimentation is simpler than you think:
–Pilot studies
–Trial-sourcing
–Controlled experiments
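As an illustration of how simple local experimentation can be, here is a minimal permutation test comparing two samples from a hypothetical in-house trial (all numbers invented): no statistical tables needed, just shuffling.

```python
import random

random.seed(3)

# Hypothetical local trial: e.g. task cycle times (days) under two ways
# of working. All numbers are made up for illustration.
group_a = [12, 9, 14, 10, 11, 13, 9, 12]
group_b = [15, 14, 17, 13, 16, 12, 18, 15]

observed = sum(group_b) / len(group_b) - sum(group_a) / len(group_a)

# One-sided permutation test: how often does a random relabelling of the
# pooled data produce a difference at least as large as the observed one?
pooled = group_a + group_b
trials = 10_000
count = 0
for _ in range(trials):
    random.shuffle(pooled)
    a, b = pooled[:len(group_a)], pooled[len(group_a):]
    if sum(b) / len(b) - sum(a) / len(a) >= observed:
        count += 1

p_value = count / trials
print(p_value)   # small: the observed difference is unlikely under chance
```

The same recipe works for any locally collected outcome measure, which makes it a natural first step before investing in a more controlled experiment.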

Is it realistic to achieve an evidence-based software engineering profession? Yes, but there are challenges.
Main challenges:
–Not much research.
–A high number of different contexts.
–Much research has low reliability, which is sometimes hard to identify.
Opportunities:
–More and better use of practice-based evidence.
–More experimenting in local contexts.

Coffee dehydrates your body?