1
From myths and fashions to evidence-based software engineering Magne Jørgensen
2
Most of the methods below have once been (some still are) fashionable... The Waterfall model, the sashimi model, agile development, rapid application development (RAD), unified process (UP), lean development, modified waterfall model, spiral model development, iterative and incremental development, evolutionary development (EVO), feature driven development (FDD), design to cost, 4 cycle of control (4CC) framework, design to tools, reuse-based development, rapid prototyping, timebox development, joint application development (JAD), adaptive software development, dynamic systems development method (DSDM), extreme programming (XP), pragmatic programming, scrum, test driven development (TDD), model-driven development, agile unified process, behavior driven development, code and fix, design driven development, V-model-based development, solution delivery, cleanroom development, ...
3
The paper clip was invented by a Norwegian
4
Short men are more aggressive (The Napoleon complex)
5
Most (93%) of our communication is non-verbal
6
There was/is a software crisis? (page 13 of the Standish Group's 1994 report): “We then called and mailed a number of confidential surveys to a random sample of top IT executives, asking them to share failure stories.”
7
45% of features of “traditional projects” are never used (source: The Standish Group, XP 2002). No one seems to know (and the Standish Group does not tell) anything about this study! Why do so many believe (and use) this non-interpretable, non-validated claim? They benefit from it (the agile community) + confirmation bias (we all know at least one instance that fits the claim).
8
14% of Waterfall projects and 42% of Agile projects are successful (source: The Standish Group, The Chaos Manifesto 2012). Successful = “on cost, on schedule, and with specified functionality”. Can you spot a serious error in this comparison?
9
The number one in the stink parade …
10
The ease of creating myths: Are risk-willing or risk-averse developers better?
Study design: research evidence + self-generated argument.
Question: Based on your experience, do you think that risk-willing programmers are better than risk-averse programmers? Scale: 1 (totally agree) – 5 (no difference) – 10 (totally disagree).
Neutral group: average 5.0
Group A: initially 3.3, after debriefing 3.5, two weeks later 3.5
Group B: initially 5.4, after debriefing 5.0, two weeks later 4.9
11
“I see it when I believe it” vs “I believe it when I see it”
26 experienced software managers.
Different preferences on contract types (fixed price or per hour): clients tended to prefer fixed price, while providers were more in favor of per hour.
Presentation of a data set of 16 projects with information about contract type and project outcome (client benefits and cost-efficiency of the development work).
Results: a chi-square test of independence gives p = 0.01.
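The slide reports only the test result, not the underlying counts. Below is a minimal sketch of how such a chi-square test of independence is computed; the counts for the 26 managers are purely hypothetical, since the talk does not give them:

```python
# Illustration only: hypothetical counts, chosen just to show the mechanics
# of a chi-square test of independence between prior contract-type preference
# and how the same 16-project data set was interpreted.
from scipy.stats import chi2_contingency

# Rows: prior preference (fixed price, per hour)
# Columns: interpretation of the data (favors fixed price, favors per hour)
observed = [
    [10, 3],   # managers preferring fixed price (hypothetical)
    [3, 10],   # managers preferring per hour (hypothetical)
]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}, dof = {dof}")
```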
12
[Figure: distribution of effect sizes from studies comparing estimation models, where effect size = MMRE_analogy − MMRE_regression; one direction means regression-based models are better, the other that analogy-based models are better. Slide point: bias among researchers …]
13
[Figure: the same effect sizes (MMRE_analogy − MMRE_regression), grouped by whether the researchers had developed their own analogy-based model (vested interests).]
14
THE EFFECT OF LOW POWER, RESEARCHER BIAS, AND PUBLICATION BIAS
How many results are incorrect?
15
1000 statistical tests: 500 true relationships, 500 false relationships.
Statistical power is 30% -> 150 true positives (green).
Significance level is 5% -> 25 false positives (red).
Correct test results: (150 + 475)/1000 = 62.5%.
Correct positive tests: 150/(150 + 25) = 85.7% (so the probability of the null hypothesis being true when p < 0.05 is 14.3%, not 5%).
Proportion of expected statistically significant results: (150 + 25)/1000 = 17.5%.
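The same arithmetic as a small Python sketch, using only the numbers stated on the slide:

```python
# How many of 1000 tests come out correct when half the tested relationships
# are real, statistical power is 30%, and the significance level is 5%.
n_tests = 1000
true_share = 0.5          # 500 true and 500 false relationships
power = 0.30              # P(significant | relationship is true)
alpha = 0.05              # P(significant | relationship is false)

n_true = n_tests * true_share
n_false = n_tests - n_true

true_pos = power * n_true          # 150
true_neg = n_false * (1 - alpha)   # 475
false_pos = alpha * n_false        # 25

correct = (true_pos + true_neg) / n_tests             # 0.625
ppv = true_pos / (true_pos + false_pos)               # 0.857
p_null_given_sig = 1 - ppv                            # ~0.143, not 0.05
share_significant = (true_pos + false_pos) / n_tests  # 0.175

print(f"Correct test results:            {correct:.1%}")
print(f"Correct positive tests (PPV):    {ppv:.1%}")
print(f"P(null true | p < 0.05):         {p_null_given_sig:.1%}")
print(f"Proportion of significant tests: {share_significant:.1%}")
```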
16
We observe about 50% p < 0.05 in published SE experiments.
We should expect 17.5%, and at most 30%, even if we only tested true relationships.
-> Researcher and publication bias.
17
EFFECT OF ADDING 20% RESEARCHER BIAS AND 30% PUBLICATION BIAS
18
1000 statistical tests, same setup as before.
Statistical power is 30% -> 150 true positives (green); 20% researcher bias -> 70 more “true positive” tests (blue).
Significance level is 5% -> 25 false positives (red); 20% researcher bias -> 95 more false positive tests (blue).
30% publication bias removes 78 negative tests (true relationships) and 114 negative tests (false relationships).
Result: 42% positive tests.
Correct test results: 61% (just above half of the tests).
Correct positive tests: 65% — one third of the reported positive tests are incorrect!
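A sketch of one way to add the two biases to the earlier calculation. The slide does not spell out exactly how the percentages are applied, so the assumptions below (researcher bias converts 20% of negative outcomes into positives, publication bias then removes 30% of the remaining negatives) are mine; they approximately reproduce the slide's figures of about 42% positive tests, 61% correct results, and 65% correct positives:

```python
# Assumed model: researcher bias turns a share of negative test outcomes into
# positives; publication bias then removes a share of the remaining negatives.
def biased_outcomes(n_true=500, n_false=500, power=0.30, alpha=0.05,
                    researcher_bias=0.20, publication_bias=0.30):
    # True relationships
    tp = power * n_true
    fn = n_true - tp
    tp += researcher_bias * fn           # negatives pushed to "significant"
    fn *= (1 - researcher_bias)
    fn *= (1 - publication_bias)         # unpublished negatives disappear

    # False relationships
    fp = alpha * n_false
    tn = n_false - fp
    fp += researcher_bias * tn
    tn *= (1 - researcher_bias)
    tn *= (1 - publication_bias)

    total = tp + fn + fp + tn
    positives = tp + fp
    return {
        "share of positive tests": positives / total,   # ~42%
        "correct test results": (tp + tn) / total,      # ~61%
        "correct positive tests": tp / positives,       # ~65%
    }

for name, value in biased_outcomes().items():
    print(f"{name}: {value:.1%}")
```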
19
LOW PROPORTION OF CORRECT RESULTS! WE NEED TO IMPROVE STATISTICAL RESEARCH PRACTICES IN SOFTWARE ENGINEERING! IN PARTICULAR, WE NEED TO INCREASE STATISTICAL POWER (INCREASED SAMPLE SIZE)
20
Have you heard about the assumption of FIXED VARIABLES?
21
Illustration: Salary discrimination?
Assume an IT company that:
– has 100 different tasks it wants completed and, for each task, hires one male and one female employee (200 workers in total)
– the “base salary” of a task varies (randomly) from 50,000 to 60,000 USD and is the same for the male and the female employee
– the actual salary is the “base salary” plus a random, gender-independent bonus, drawn with a “lucky wheel” giving bonuses between 0 and 10,000 USD
This should lead to (on average): salary of female = salary of male.
Let's do a regression analysis with the model: Salary of female = a + b * Salary of male, where b < 1 would mean that women are discriminated against.
The regression analysis gives b = 0.56. Strong discrimination against women!?
Let's repeat the analysis on the same data with the model: Salary of male = a* + b* * Salary of female.
The regression analysis gives b* = 0.56. Strong discrimination against men????
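A minimal simulation of this setup, assuming the uniform distributions stated on the slide. The slide's b = 0.56 is one particular sample; a fresh simulation gives slopes of roughly 0.5 in both directions:

```python
# Base salary uniform in [50_000, 60_000], shared by the male/female pair on a
# task, plus an independent, gender-independent bonus uniform in [0, 10_000].
# Regressing either salary on the other gives a slope well below 1 in BOTH
# directions, although there is no discrimination at all: the predictor is
# itself random (the "fixed variables" assumption is violated), so we see
# regression toward the mean.
import numpy as np

rng = np.random.default_rng(0)
n_tasks = 100

base = rng.uniform(50_000, 60_000, n_tasks)       # same for both employees on a task
salary_female = base + rng.uniform(0, 10_000, n_tasks)
salary_male = base + rng.uniform(0, 10_000, n_tasks)

def slope(x, y):
    """Least-squares slope of y regressed on x."""
    return np.polyfit(x, y, 1)[0]

print(f"female ~ male:  b  = {slope(salary_male, salary_female):.2f}")   # ~0.5
print(f"male ~ female:  b* = {slope(salary_female, salary_male):.2f}")   # ~0.5
```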
22
[Scatter plot: salary of men plotted against salary of women.]
23
How would you interpret these data? (from a published study) CR duration = actual duration (effort) to complete a change request. Interpretation by the author of the paper: larger tasks are more underestimated.
24
What about these data? They are from exactly the same data set! The only difference is the use of the estimated instead of the actual duration as the task size variable.
25
Economy of scale? Probably not... (M. Jørgensen and B. Kitchenham. Interpretation problems related to the use of regression models to decide on economy of scale in software development, Journal of Systems and Software, 85(11):2494-2503, 2012.)
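The interpretation problem can be reproduced with a short simulation. This is a sketch under an assumed error model of my own (size-independent, lognormal estimation error), not the data from the paper:

```python
# Both "larger tasks are under-estimated" and "larger tasks are over-estimated"
# can be produced from the same data, depending on whether actual or estimated
# duration is used as the task size variable. Assumed model: actual and
# estimated duration are each the unknown true size times independent
# lognormal noise, i.e. estimation error has no real size dependency.
import numpy as np

rng = np.random.default_rng(1)
n = 200

true_size = rng.lognormal(mean=3.0, sigma=0.8, size=n)   # unknown true duration
actual = true_size * rng.lognormal(0, 0.5, n)
estimate = true_size * rng.lognormal(0, 0.5, n)

under_estimation = (actual - estimate) / actual           # > 0 means under-estimated

corr_vs_actual = np.corrcoef(actual, under_estimation)[0, 1]
corr_vs_estimate = np.corrcoef(estimate, under_estimation)[0, 1]

# The signs are what matter: positive against actual size, negative against
# estimated size, even though the error process is completely size-independent.
print(f"correlation with actual duration:    {corr_vs_actual:+.2f}  (larger tasks look under-estimated)")
print(f"correlation with estimated duration: {corr_vs_estimate:+.2f}  (larger tasks look over-estimated)")
```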
26
HOW TO MAKE SOFTWARE ENGINEERING MORE EVIDENCE-BASED?
27
Evidence-based software engineering (EBSE)
The main steps of EBSE are as follows:
1. Convert a relevant problem or need for information into an answerable question.
2. Search the literature and practice-based experience for the best available evidence to answer the question (+ create your own local evidence, if needed).
3. Critically appraise the evidence for its validity, impact, and applicability.
4. Integrate the appraised evidence with practical experience and the client's values and circumstances to make decisions about practice.
5. Evaluate performance in comparison with previous performance and seek ways to improve it.
28
The software industry should learn to formulate questions that are meaningful for its context/challenge/problem. The question “Is Agile better than Traditional methods?” is NOT answerable: What is agile? What is traditional? What is better? What is the context?
29
Learn to be more critical (myth busting) when claims are made
1. Find out what is meant by the claim.
   – Is it possible to falsify the claim? If not, what is the function of the claim?
2. Put yourself in a “critical mode”.
   – Raise the awareness of the tendency to accept claims, even without valid evidence, when we agree or they seem intuitively correct.
   – Reflect on what you would consider valid evidence to support the claim.
   – Vested interests?
   – Do you agree because of the source?
3. Collect and evaluate evidence.
   – Research-based, practice-based, and “own” evidence.
4. Synthesize the evidence and conclude (if possible).
30
Learn to question what statements and claims mean
31
Learn how to evaluate argumentation: data, claim, backing, warrant, qualifier, reservation.
32
Learn how to use Google Scholar (or similar sources of research-based evidence)
33
Learn how to collect and evaluate practice-based experience
– Methods similar to the evaluation of research-based evidence and claims
– Be aware of “organizational over-learning”
34
Learn how to create local evidence
Experimentation is simpler than you think:
– Pilot studies
– Trial-sourcing
– Controlled experiments
35
Is it realistic to achieve an evidence-based software engineering profession? Yes, but there are challenges.
Main challenges:
– Not much research
– A high number of different contexts
– Much research has low reliability, which is sometimes hard to identify
Opportunities:
– More and better use of practice-based evidence
– More experimenting in local contexts
36
Coffee dehydrates your body?