GQM and data analysis Example from a Norwegian company Tor Stålhane IDI / NTNU.



The Problem
The company in question develops hardware and software. It has two software groups, each with about 15 developers. Part of the system is developed in SDL. To focus their V&V work better, they needed to know which SDL module characteristics caused errors. Possible candidates were the number of module states, the number of input signals, etc.

What to measure?
To get consistent and efficient data collection and analysis, we started with a GQM process:
- A half-day basic course in GQM.
- A half-day workshop where the developers identified
  - the questions that needed answers
  - the metrics they needed in order to answer these questions.
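The outcome of such a workshop can be recorded as a small goal/question/metric tree. The sketch below is only an illustration: the goal wording and the question text are paraphrased assumptions, and the class names (`Goal`, `Question`, `Metric`) and the helper `metrics_for` are hypothetical. The metric identifiers M5, M9 and M10 are the ones used later in this presentation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Metric:
    metric_id: str
    description: str


@dataclass
class Question:
    question_id: str
    text: str
    metrics: List[Metric] = field(default_factory=list)


@dataclass
class Goal:
    text: str
    questions: List[Question] = field(default_factory=list)


# Goal and question wording are assumptions made for this sketch.
goal = Goal(
    text="Focus V&V work on the SDL modules most likely to contain errors",
    questions=[
        Question(
            "Qc",
            "What makes an SDL block complex?",
            [
                Metric("M5", "Number of SDL states"),
                Metric("M9", "Number of signals out"),
                Metric("M10", "Number of pages of SDL description"),
            ],
        )
    ],
)


def metrics_for(goal: Goal, question_id: str) -> List[Metric]:
    """Return the metrics attached to one question of the goal."""
    for question in goal.questions:
        if question.question_id == question_id:
            return question.metrics
    return []


for metric in metrics_for(goal, "Qc"):
    print(metric.metric_id, metric.description)
```

Keeping the metrics attached to the question they answer makes it easy to check that every collected number serves a question and that no question lacks data.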

The Metrics
Some of the metrics collected for each SDL module:
- Number of errors found
- Subjective (perceived) complexity
- Number of pages of SDL description
- Number of SDL states
- Number of signals in
- Number of signals out

Qc – block complexity
When we defined the Qc question, we decided to use a Kiviat diagram to display the metrics included in this question. We show the following data:
- Non-modified metrics, just to show that plotting them is not a good idea
- Normalized metric values
- Mean metric values for each complexity class

Non-modified metrics
The large value of M6b reduces everything else to “noise”.

Normalized metrics values – 1

Module   M5 (raw)   M5 (normalized)
PMECDM   52         0,93
PMECOM   56         1,00
CAMCON   32         0,57
HPOTBL   21         0,38
TAMCON   25         0,45
TAMTST   10         0,18
TTRMOT   6          0,11
TTRTST   3          0,05
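The slide does not state the normalization rule. Dividing each module's value by the largest value of the metric reproduces the normalized M5 figures above, so the sketch below assumes that rule. The function name `normalize_by_max` is hypothetical.

```python
# Raw M5 values per SDL module, taken from the slide.
raw_m5 = {
    "PMECDM": 52,
    "PMECOM": 56,
    "CAMCON": 32,
    "HPOTBL": 21,
    "TAMCON": 25,
    "TAMTST": 10,
    "TTRMOT": 6,
    "TTRTST": 3,
}


def normalize_by_max(values):
    """Scale every value to the range 0..1 by dividing by the largest one.

    Assumption: this is the rule behind the slide's normalized column.
    """
    largest = max(values.values())
    return {name: value / largest for name, value in values.items()}


normalized_m5 = normalize_by_max(raw_m5)
for name, value in normalized_m5.items():
    print(f"{name}: {value:.2f}")
```

Scaling every metric to the same 0..1 range is what keeps one large metric, such as M6b in the non-modified diagram, from dominating the Kiviat plot.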

Normalized metrics values – 2
Diagrams: High complexity; Low complexity.
Beware of the different axis scales in the two diagrams.

Mean metrics values – 1
Beware of the different axis scales in the three diagrams.
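The mean metric value for each complexity class can be computed by grouping the modules on their perceived complexity. This is a minimal sketch with made-up numbers: the module names and values are hypothetical, not the company's data, and `mean_by_class` is an illustrative helper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (module, complexity class, metric value) rows.
rows = [
    ("mod_a", "High", 50),
    ("mod_b", "High", 60),
    ("mod_c", "Medium", 30),
    ("mod_d", "Medium", 20),
    ("mod_e", "Low", 5),
    ("mod_f", "Low", 9),
]


def mean_by_class(rows):
    """Average the metric value within each complexity class."""
    grouped = defaultdict(list)
    for _name, complexity, value in rows:
        grouped[complexity].append(value)
    return {complexity: mean(values) for complexity, values in grouped.items()}


class_means = mean_by_class(rows)
for complexity in ("High", "Medium", "Low"):
    print(complexity, class_means[complexity])
```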

Mean metrics values – 2

What makes it complex - 1
We went through all the hypotheses put forward by the developers during the GQM session. We will look at three of them:
- Number of states (M5)
- Number of signals out (M9)
- Number of pages in the SDL description (M10)

What makes it complex - 2
The data for the three metrics M5, M9 and M10 were sorted according to the complexity scores (High, Medium and Low). An ANOVA was then performed for each data set. We decided to require a p-value below 10%.
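The procedure can be sketched as a one-way ANOVA over the three complexity groups. The data below are hypothetical page counts, not the company's measurements, and the function names are illustrative. The p-value uses a closed form of the F distribution that holds only when the between-groups degrees of freedom equal 2, that is, for exactly three groups. For other group counts a statistics library would be needed.

```python
# Hypothetical metric values (e.g. pages of SDL) per perceived complexity class.
groups = {
    "High": [60, 48, 55],
    "Medium": [30, 41, 25],
    "Low": [12, 20],
}


def one_way_anova(groups):
    """Return the entries of a one-way ANOVA table for the given groups."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = 0.0
    ss_within = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in values)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f": ms_between / ms_within,
    }


def p_value_three_groups(f, df_within):
    """Upper-tail probability of F(2, df_within); valid only for three groups."""
    return (1 + 2 * f / df_within) ** (-df_within / 2)


table = one_way_anova(groups)
p_value = p_value_three_groups(table["f"], table["df_within"])
significant = p_value < 0.10  # the 10% limit chosen in the presentation
print(f"F = {table['f']:.2f}, p = {p_value:.4f}, significant: {significant}")
```

With only a handful of modules per class, the within-groups degrees of freedom are small, which is why a fairly large F value is needed to get below the 10% limit.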

ANOVA results - 1
Number of states (M5):

Source of Variation   SS   df   MS   P-value
Between Groups        1190, ,02 0,25
Within Groups         1631, ,37

The number of states does not contribute significantly to the complexity as perceived by the developers: the p-value of 0,25 is well above the 10% limit.

ANOVA results - 2
Number of signals out (M9):

Source of Variation   SS   df   MS   P-value
Between Groups        2779, ,521 0,098
Within Groups         1813, ,77

The number of signals out contributes significantly to the complexity as perceived by the developers: the p-value of 0,098 is just below the 10% limit.

ANOVA results - 3
Number of pages in the SDL description (M10):

Source of Variation   SS   df   MS   P-value
Between Groups        5586, ,02 0,04
Within Groups         2133, ,77

The number of pages in the SDL description contributes significantly to the complexity as perceived by the developers: the p-value of 0,04 is below the 10% limit.

Summary - 1
SDL module complexity as perceived by the developers depends on two factors:
- Number of signals out
- Number of pages in the SDL description
The other suspected factors identified during the GQM process did not contribute significantly.

What about errors?
We now have some idea of what makes a module look complex to the developers. The next step is to see whether there is any connection between module complexity and the number of errors in the modules. An ANOVA can give us an answer.

Complexity and Errors - 1
Errors and complexity:

Source of Variation   SS   df   MS   P-value
Between Groups        1646, ,42 0,06
Within Groups         770, ,13

With a p-value of 0,06, below the 10% limit, it is reasonable to assume that complex modules have more errors.

Complexity and Errors - 2
If we look at the ANOVA summary table, we see that the differences between the groups are quite large:

Groups   Count   Sum   Average   Variance
Column
Column ,33 41,33
Column

Because there are few observations for each complexity level, the variances are large. Thus, we should not be too categorical in our conclusions.

Conclusions
With all the necessary caveats in mind, the company decided as follows. In order to reduce the number of errors, we need to single out modules with:
- Large descriptions: more than 35 pages of SDL description
- Many signals out: more than 30
The limiting values are the average values from the ANOVA summary tables.
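The decision rule can be sketched as a simple screening step. The two limits are the ones stated above. The module names and values are hypothetical, and flagging a module when either limit is exceeded (rather than both) is an assumption, since the slide does not say how the two criteria are combined.

```python
PAGE_LIMIT = 35          # pages of SDL description, from the conclusions
SIGNALS_OUT_LIMIT = 30   # signals out, from the conclusions

# Hypothetical modules, used only to illustrate the rule.
modules = [
    {"name": "mod_a", "pages": 48, "signals_out": 12},
    {"name": "mod_b", "pages": 20, "signals_out": 18},
    {"name": "mod_c", "pages": 30, "signals_out": 41},
    {"name": "mod_d", "pages": 35, "signals_out": 30},
]


def needs_extra_vv(module):
    """Flag a module that exceeds either limit (assumed combination rule)."""
    return (
        module["pages"] > PAGE_LIMIT
        or module["signals_out"] > SIGNALS_OUT_LIMIT
    )


flagged = [m["name"] for m in modules if needs_extra_vv(m)]
print(flagged)
```

A module that sits exactly on a limit, such as `mod_d` here, is not flagged, because the rule asks for more than 35 pages or more than 30 signals out.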