Magne Jørgensen Simula Research Laboratory
Evidence-based software engineering: A framework for collaboration between researchers and software professionals (30 minutes + questions)

Short abstract: An evidence-based software engineer is one who is able to:
1) Formulate a question, related to a decision or judgment, so that it can be answered by the use of evidence.
2) Collect, critically evaluate and summarise relevant evidence from research, practice and local studies.
3) Apply the evidence, integrated with knowledge about the local context, to guide decisions and judgments.

The presentation addresses what it means in practice to be evidence-based in software engineering contexts, where the number of different contexts is high and the research-based evidence sparse, and why there is a need for more collaboration between researchers and software professionals to increase the use of evidence-based practices. We summarise our experience from ten years of evidence-based software engineering in the context of university courses, training of and collaboration with software engineers, and systematic literature reviews of software engineering research. While there are challenges in training and collaborating with software engineers to improve practices through evidence-based software engineering, our experience suggests that it is feasible and that it can make an important difference in the quality of software engineering judgments and decisions. Based on our experience, we suggest changes in how researchers and software professionals may collaborate on evidence-based software engineering, and how to ease the transfer of research results into evidence-based practice.

Magne Jørgensen, Simula Research Laboratory, University of Oslo, Scienta
Who am I? Researcher (100%), professor (20%), consultant (10%)
Research on human judgment, empirical methods, software cost estimation and software project management.
Working with (and in) industry using:
  Action research
  Data analytics (project data analysis)
  Surveys (interviews, questionnaires)
  Controlled experiments
Member of a national board advising Norwegian public IT projects.
Founded, and actively promotes, evidence-based software engineering to software professionals and students.
Motivation: From myths and fashion to an evidence-based discipline
Both industry and academia see and claim patterns where there are none, and miss true patterns.
Confirmation bias in industry: “I see it when I believe it” vs “I believe it when I see it”
Experimental design: Data sets with randomly generated performance data comparing “traditional” and “agile” methods, plus a survey of each developer’s prior belief in agile methods.
Question: How much do you, based on the data set, agree with: “Use of agile methods has caused a better performance when looking at the combination of productivity and user satisfaction.”
Result: Prior belief in agile determined what the developers saw in the randomly generated data.
Any patterns here? Randomness?
(Random = each position in the square was equally likely.) R-square examples: nr 3: 11% (a small effect size), nr 5: 6%, nr 8: 23%. The last one is the only non-randomly generated data set, and it shows much less pattern than random data typically will. The first five were simply the first five produced by my random-data generation process – NOT a selection of “extreme” random data. How many would show a pattern if we were allowed to remove 1–2 “outliers”?
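How easily pure noise produces apparent “patterns” can be checked by simulation. The sketch below assumes small samples (n = 10 points per square; the slide does not state its sample size) and counts how often random data reaches the slide’s “small effect size” of R² = 11%:

```python
# How often does pure noise look like a "pattern"?
# Draw n random points in the unit square, fit y on x, record R-square.
import random

def r_squared(xs, ys):
    """R-square of a simple linear fit of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

rng = random.Random(7)
n, trials = 10, 2000
hits = 0
for _ in range(trials):
    xs = [rng.random() for _ in range(n)]
    ys = [rng.random() for _ in range(n)]
    if r_squared(xs, ys) >= 0.11:   # the slide's "small effect size"
        hits += 1
print(f"R-square >= 0.11 in {hits / trials:.0%} of random data sets")
```

With samples this small, a substantial share of purely random squares clears the 11% threshold, which is the slide’s point: eyeballed “effects” in small data sets are cheap.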
Sometimes we don’t see the patterns …
Assume a sequence of coin tosses and two people (A and B) who play against each other. The one who bets on the two-coin sequence that occurs first wins the game.
Person A bets on: Head-Head
Person B bets on: Tail-Head
Do they have the same probability of winning?
Example: Tail-Tail-Head-Head-Head-Tail-... Tail-Head occurs first and B wins.
(Speaker note: Start by pointing out that any sequence, in Lotto etc., is equally probable, whereas the representativeness bias makes us believe a regular-looking sequence is much less likely than the others. Then give this task.)
Answer: It is three times more likely to observe Tail-Head before Head-Head! Head-Head can only occur first if the two opening tosses are both heads; otherwise the first Head-Head is preceded by a tail, so Tail-Head has already occurred. (If you don’t believe me, we can make a bet where I bet 20 Euro on Tail-Head and you 10 Euro on Head-Head. First to win ten times keeps the 30 Euro.)
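The three-to-one answer is easy to verify with a quick Monte Carlo sketch (trial count and seed are arbitrary):

```python
# Monte Carlo check of the Head-Head vs Tail-Head bet.
# B (Tail-Head) should win about 3 out of 4 games: Head-Head can
# only occur first when the two opening tosses are both heads.
import random

def play(rng):
    """Toss coins until HH or TH appears; return the winner."""
    prev = rng.choice("HT")
    while True:
        cur = rng.choice("HT")
        if prev == "H" and cur == "H":
            return "A"          # Head-Head occurred first
        if prev == "T" and cur == "H":
            return "B"          # Tail-Head occurred first
        prev = cur

rng = random.Random(42)
trials = 100_000
b_wins = sum(play(rng) == "B" for _ in range(trials))
print(f"B wins {b_wins / trials:.3f} of games")  # close to 0.75
```

This is a special case of Penney's game: for any pair of distinct two-toss sequences, one of them is favoured, even though each individual sequence of tosses is equally probable.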
Many beliefs are based on very poor empirical basis and are easy to manipulate
The ease of creating beliefs:
Are risk-willing or risk-averse developers better?
Study design: Research evidence + a self-generated argument. Participants were afterwards informed that the evidence was misleading.
Question: Based on your experience, do you think that risk-willing programmers are better than risk-averse programmers? 1 (totally agree) – 5 (no difference) – 10 (totally disagree)
Neutral group: Average 5.0
Group A (B): Evidence and argument in favour of the risk-willing (risk-averse)

                 Group A   Group B
Manipulated      3.3       5.4
Debriefing       3.5       5.0
2 weeks later    3.5       4.9
Effect sizes in studies on pair programming
Source: Hannay, Jo E., et al. "The effectiveness of pair programming: A meta-analysis." Information and Software Technology 51.7 (2009). (n=50 gives ln(50) ≈ 3.9.)
How is this connected to industry collaboration?
You may at this stage start to wonder: how is this connected to industry collaboration?
First: What is Evidence-based software engineering (EBSE)?
Evidence-based software engineering (EBSE)
1. Convert a relevant problem or need for information into an answerable question.
2. Search the literature and practice-based experience for the best available evidence to answer the question (+ create own local evidence, if needed).
3. Critically appraise the evidence for its validity, impact, and applicability.
4. Integrate the appraised evidence with practical experience and the client's values and circumstances to make decisions about practice.
5. Evaluate performance in comparison with previous performance and seek ways to improve it.
How can this help collaboration with industry?
Warning: Unsurprisingly, it does not address all collaboration challenges
EBSE Step 1: Support in formulating the question
Example: A company wants to know whether agile methods lead to improvement or not!
Our role: Tell them that they need a more precise (answerable) question, and help them with this by:
  Clarifying which agile practices are/will be implemented, and in what context. Improvement compared to what?
  Clarifying which aspects of project success are important.
  Including question elements focused on understanding the mechanisms that make agile practices work well or not.
My own experience: This step is key to succeeding with industry collaboration. It avoids wasting time on non-answerable questions and collaborations that lead to nothing. A project I participated in a few years ago spent man-years on studying whether RUP worked well or not, without really defining what RUP is or what ”works well” means.
One example (not perfect, but …)
                  Agile   Frequent deliveries   Flexible scope
                          to production
Client benefits   16%     22%                   29%
Functionality     –       –                     –
Tech. quality     21%     6%                    32%
Budget control    2%      –                     –
Time control      8%      11%                   24%
Efficiency        5%      –                     –

Example of finding: Agile without frequent delivery to production and flexible scope had a negative effect on success measured as client benefit.
Also teach industry to question claims like this: “14% of Waterfall and 42% of Agile projects are successful” (source: The Standish Group, The Chaos Manifesto 2012). Here, successful = “On cost, on schedule and with specified functionality”. Can you spot a serious error in this analysis?
Example of the representativeness fallacy: At a seminar in Oslo (last week), the providers reported that on average 12% of their projects failed (were cancelled or did not deliver much benefit). Yet when immediately afterwards faced with the common finding that about 10-15% of projects fail, 45% said they thought this number would be higher, and 0% thought it would be lower. Media focus on failure gives a quite biased picture!
EBSE Steps 2 and 3: Identify, generate and evaluate evidence
Three main sources of evidence: research, practice-based experience, and local experiments.
Our role: Once an answerable question has been formulated in collaboration with ”industry” (a company or companies), help them collect all three types of evidence:
  Primary and secondary studies (systematic literature reviews)
  Collection of practice-based experience by interviewing software professionals
  Design of local experiments (from piloting to controlled experiments)
What we need to offer: Competence in critical collection, evaluation and summary of evidence. Good study designs.
What we get: Lots of good research data and, hopefully, interesting results.
Many companies (especially those agile and lean) are willing to experiment
Example experience with different models for experimentation (local evidence):
  Piloting new technology on a project while researchers observe how it goes, then summarize and compare with the default technology. Easy to start, but the data are complex to analyze. Typically only allowed on small, non-critical projects.
  Including a new tool/process in several existing projects (most recently we introduced “benefit management” in existing processes) and observing/measuring what happens. Requires some skill in convincing people that it’s worth it (people’s time is expensive). Using research grants to pay for the extra effort works well.
  Randomized controlled trials (randomized treatment). Costly and may require funding for participation. Why are so many skeptical about participation payment? It’s common practice elsewhere.
EBSE step 4: Integrate evidence
Many possible presentation formats:
  Evidence briefings (see sites.google.com/site/eseportal/evidence-briefings)
  Guidelines, checklists, principles, …
  Presentations
  Reports with concrete recommendations
Our role: Critical evaluation and summary of evidence (relevance and validity) [and possibly teaching them how to critically evaluate evidence].
What we get: Evidence summaries for selected contexts.
Example: Presentation of integrated evidence as principles
Example of an evidence-based principle:
7.1 Keep forecasting methods simple.
Description: Complex methods may include errors that propagate through the system or mistakes that are difficult to detect. Select simple methods initially (Principle 6.6). Then use Occam’s Razor; that is, use simple procedures unless you can clearly demonstrate that you must add complexity.
Purpose: To improve the accuracy and use of forecasts.
Conditions: Simple methods are important when many people participate in the forecasting process and when the users want to know how the forecasts are made. They are also important when uncertainty is high and few data are available.
Strength of evidence: Strong empirical evidence. Many analysts find this principle to be counterintuitive.
Source of evidence: This principle is based on evidence reviewed by Allen and Fildes (2001), Armstrong (1985), Duncan, Gorr and Szczypula (2001), and Wittink and Bergestuen (2001).
EBSE Step 5: Evaluate outcome
The industry is poor at evaluating the effect of a process/tool change, but has a strong wish to know it. Sometimes, however, they don’t really want to know if the outcome is ”bad” …
Typical situation: They have much data, but no one with the competence to analyse it.
Our role: Support with expertise on measurement, study design and analysis.
What’s in it for us: Lots of data to analyse. But be critical of data quality; personally, I find that in at least 50% of the cases the quality is rubbish.
Such analyses may be complex ....
A company measured an increase in the productivity (function points per man-month) of an IT department. Everybody was happy, especially since this “proved” that their newly implemented incremental processes were successful. To my surprise, when I grouped the projects into those using PowerBuilder and those using Cobol (and a third group), I found a productivity decrease in both groups. Was my analysis incorrect?
Arithmetic ”explanation”: a/b + c/d ≠ (a+c)/(b+d)
Period 1         PowerBuilder   Cobol   Total
FP               500            2000    2500
Effort           500            4500    5000
Productivity     1.0            0.44    0.50

Period 2         PowerBuilder   Cobol   Total
FP               2000           1000    3000
Effort           1800           3000    4800
Productivity     0.9            0.33    0.63

Change in prod.  -0.1           -0.11   +0.13

Also called ”Simpson’s paradox” and ”missing variables” (the proportion of work done in the different groups should have been included in the analysis).
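The aggregation effect can be reproduced with a short script. The numbers below are illustrative (chosen to show the slide's pattern cleanly, not to match its exact figures): productivity falls in every group, yet the aggregate rises because the mix of work shifts toward the more productive group.

```python
# Simpson's paradox sketch: per-group productivity (FP / effort)
# falls from period 1 to period 2 in BOTH groups, while aggregate
# productivity rises, because more of the period-2 work is done
# in the high-productivity PowerBuilder group.
# Illustrative numbers, not the exact slide data.
periods = {
    "Period 1": {"PowerBuilder": (500, 500),    # (function points, effort)
                 "Cobol":        (2000, 4500)},
    "Period 2": {"PowerBuilder": (1620, 1800),
                 "Cobol":        (1000, 3000)},
}

def productivity(fp, effort):
    return fp / effort

for name, groups in periods.items():
    total_fp = sum(fp for fp, _ in groups.values())
    total_effort = sum(e for _, e in groups.values())
    for group, (fp, e) in groups.items():
        print(f"{name} {group}: {productivity(fp, e):.2f}")
    print(f"{name} total: {productivity(total_fp, total_effort):.2f}")
```

The aggregate is an effort-weighted average of the group productivities, so shifting effort between groups can move it in the opposite direction of every group-level change.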
To summarize …
The software industry/companies have challenges/questions, but frequently lack the competence to properly address them (unless taught EBSE, of course ;-)).
We have much of that competence. Or at least we should have it.
EBSE may be used as a top-level framework/checklist for us, as researchers, to collaborate with them on these challenges. This has the potential to give us valid and relevant (= convincing) empirical research.
Many issues, some of them really core to successful collaboration, are not addressed in EBSE.
All collaborations need to cover all five steps, and all steps need collaboration.