Swami Natarajan, July 12, 2015, RIT Software Engineering: Measurement Fundamentals

Presentation transcript:

Measurement Fundamentals

Operational Definition

Concept -> Definition -> Operational Definition -> Measurements

The concept is what we want to measure, e.g. cycletime.
We need a definition for this: "elapsed time to do the task".
The operational definition spells out the procedural details of how exactly the measurement is done.

Operational Definition example

At Motorola India, our operational definition of "development cycletime":
– The cycletime clock starts when effort is first put into project requirements activities (still vague!)
– The cycletime clock ends on the date of release
– If development is suspended due to activities beyond our (local organization's) control, the cycletime clock is stopped, and restarted when development resumes. Who decides? The project manager.
– A separate "project cycletime" is defined with no clock stoppage, beginning at first customer contact.
The operational definition addresses various issues related to gathering the data, so that data gathering is more consistent.
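To make the procedural flavor of an operational definition concrete, here is a minimal Python sketch of the clock rules above (my own illustration, not code from the slides; the class name and dates are hypothetical):

```python
from datetime import date

class CycletimeClock:
    """Sketch of the Motorola-style operational definition: the clock
    starts at first requirements effort, stops during suspensions beyond
    the local organization's control, and ends on the release date."""

    def __init__(self):
        self.intervals = []    # completed (start, end) periods of development
        self._running_since = None

    def start(self, day: date):
        self._running_since = day

    def suspend(self, day: date):
        self.intervals.append((self._running_since, day))
        self._running_since = None

    def resume(self, day: date):
        self._running_since = day

    def release(self, day: date) -> int:
        """Close the clock and return development cycletime in days."""
        self.intervals.append((self._running_since, day))
        return sum((end - start).days for start, end in self.intervals)

clock = CycletimeClock()
clock.start(date(2015, 1, 5))     # first effort on requirements activities
clock.suspend(date(2015, 2, 1))   # suspension beyond local control
clock.resume(date(2015, 2, 15))   # development resumes
print(clock.release(date(2015, 4, 1)), "days of development cycletime")
```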

Measurement Scales

Nominal scale: categorization
– Different categories, not better or worse
– E.g. type of risk: business, technical, requirements…
Ordinal scale: categories with an ordering
– E.g. CMM maturity levels, defect severity
– Sometimes averages are quoted, but they are only marginally meaningful
Interval scale: numeric, but a "relative" scale
– E.g. GPAs: differences are more meaningful than ratios
– A "2" is not to be interpreted as twice as much as a "1"
Ratio scale: numeric scale with an "absolute" zero
– Ratios are meaningful
"Higher" scales carry more information.
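The practical consequence of the scale hierarchy is which statistics are meaningful. A small Python lookup (my own illustration, following the conventional treatment rather than anything on the slide) makes the point:

```python
# Which summary statistics are conventionally meaningful at each scale.
# Each scale admits everything the scales before it admit, plus more.
ADMISSIBLE_STATS = {
    "nominal":  {"mode", "frequency counts"},
    "ordinal":  {"mode", "frequency counts", "median", "percentiles"},
    "interval": {"mode", "frequency counts", "median", "percentiles",
                 "mean", "differences"},
    "ratio":    {"mode", "frequency counts", "median", "percentiles",
                 "mean", "differences", "ratios"},
}

def is_meaningful(stat: str, scale: str) -> bool:
    return stat in ADMISSIBLE_STATS[scale]

print(is_meaningful("mean", "ordinal"))   # False: averaging severities is dubious
print(is_meaningful("ratios", "ratio"))   # True: "twice the defects" makes sense
```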

Using basic measures (see text for good discussion)

Ratios are useful to compare magnitudes.
Proportions (fractions, decimals, percentages) are useful when discussing parts of a whole.
– Think pie chart!
When the number of cases is small, percentages are often less meaningful; the actual numbers may carry more information.
– Percentages can shift dramatically with single instances (high impact of randomness).
When using rates, it is better if the denominator is relevant to the opportunity for the event to occur.
– Requirements changes per month, per project, or per page of requirements are more meaningful than per staff member.
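A quick back-of-envelope illustration of the denominator point (the counts below are hypothetical, not from the slides): dividing the same raw count by different denominators tells very different stories.

```python
requirements_changes = 30
pages_of_requirements = 120   # reflects the opportunity for changes to occur
staff_members = 5             # unrelated to that opportunity

print(f"{requirements_changes / pages_of_requirements:.2f} changes per page")
print(f"{requirements_changes / staff_members:.1f} changes per staff member")
# 0.25 per page tracks requirements volatility; 6.0 per staff member
# mostly tracks team size, so it is a weaker indicator.
```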

Reliability & Validity

Reliability is whether measurements are consistent when performed repeatedly.
– E.g. will maturity assessments produce the "same" outcomes when performed by different people?
– E.g. if we repeatedly measure the reliability of a product, will we get consistent numbers?
Validity is the extent to which the measurement measures what we intend to measure.
– Construct validity: the match between the operational definition and the objective
– Content validity: does it cover all aspects? (Do we need more measurements?)
– Predictive validity: how well does the measurement serve to predict whether the objective will be met?

Reliability vs. validity

Rigorous definitions of how the number will be collected can improve reliability, but worsen validity.
– E.g. "When does the cycletime clock start?"
If we allow too much flexibility in data gathering, the results may be more valid, but less reliable.
– Too much depends on who is gathering the data.
Good measurement systems design often requires a balance between reliability and validity.
– A common error is to focus on what can be gathered reliably ("observable & measurable") and lose out on validity.
– "We can't measure this, so I will ignore it", followed by "The numbers say this, hence it must be true", e.g. SAT scores.

Systematic & random error

Gaps in reliability lead to "random error".
– Variation between the "true value" and the "measured value"
Gaps in validity may lead to systematic error.
– "Biases" that lead to consistent underestimation or overestimation
– E.g. the cycletime clock stops on the release date rather than when the customer completes acceptance testing.
From a mathematical perspective:
– We want to minimize the sum of the two error terms, for single measurements to be meaningful.
– Trend information is better when random error is smaller.
– When we use averages of multiple measurements (e.g. organizational data), systematic error is more worrisome.
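The two error terms can be seen in a minimal simulation (my own sketch; the bias and noise values are assumptions): measured = true + bias + noise. Averaging many measurements shrinks the random term but leaves the systematic term untouched, which is why bias dominates for aggregated organizational data.

```python
import random

random.seed(42)
TRUE_VALUE = 100.0
BIAS = -5.0       # systematic error, e.g. clock stops at release, not acceptance
NOISE_SD = 10.0   # random error from inconsistent data gathering

def measure() -> float:
    return TRUE_VALUE + BIAS + random.gauss(0, NOISE_SD)

single = measure()
average = sum(measure() for _ in range(1000)) / 1000

print(f"single measurement: {single:.1f}")  # noise and bias both present
print(f"average of 1000:    {average:.1f}") # noise averaged away; bias (~95) remains
```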

Assessing reliability

We can fairly easily check whether measurements are highly subject to random variation:
– Split the sample into halves and see if the results match.
– Re-test and see if the results match.
We can figure out how reliable our results are, and factor that into metrics interpretation.
Reliability can also be used numerically to get a better statistical picture of the data.
– E.g. the text describes how the reliability measure can be used to correct for attenuation in correlation coefficients.
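A sketch of both ideas (illustrative data, not the text's worked example): a re-test check of reliability, and the classical correction for attenuation, r_true = r_observed / sqrt(r_xx * r_yy).

```python
import math
import random

random.seed(1)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Re-test check: two noisy measurements of the same 200 items.
true_scores = [random.gauss(50, 10) for _ in range(200)]
trial_1 = [t + random.gauss(0, 5) for t in true_scores]
trial_2 = [t + random.gauss(0, 5) for t in true_scores]
print(f"test-retest reliability: {pearson(trial_1, trial_2):.2f}")

# Correction for attenuation: unreliable measures understate correlations.
r_observed, r_xx, r_yy = 0.40, 0.70, 0.80
print(f"corrected correlation: {r_observed / math.sqrt(r_xx * r_yy):.2f}")  # ~0.53
```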

Correlation

Checking for relationships between two variables:
– E.g. does defect density increase with product size?
– Plot one against the other and see if there is a pattern.
There are statistical techniques to compute correlation coefficients.
– Most of the time, we only look for linear relationships.
– The text explains the possibility of non-linear relationships, and shows how the curves and data might look.
A common major error: assuming correlation implies causality (A changes as B changes, hence A causes B).
– E.g. defect density increases as product size increases -> writing more code increases the chance of coding errors!
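As a minimal concrete check (hypothetical module data, not from the slides), the standard Pearson coefficient from the Python standard library quantifies the linear relationship one would otherwise eyeball on a scatter plot:

```python
from statistics import correlation  # Python 3.10+

size_kloc = [1.2, 3.5, 2.1, 8.0, 5.4, 6.7, 0.9, 4.3]
defects   = [3,   9,   5,   22,  12,  18,  2,   10]

print(f"Pearson r = {correlation(size_kloc, defects):.2f}")  # near +1
# A strong r says nothing about WHY: size and defects may both be
# driven by something else, as the next slide discusses.
```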

Criteria for causality

The cause precedes the effect in time.
Observation indicates correlation.
Is it due to a spurious relationship?
– Not so easy to figure out! (See the diagrams in the book.)
– Maybe there is a common cause for both, e.g. problem complexity.
– Maybe there is an intermediate variable: size -> number of dependencies -> defect rate.
Why is this important? Because it affects the quality-management approach.
– E.g. we may focus on dependency reduction.
Maybe both are indicators of something else.
– E.g. developer competence (less competent: more size, more defects)
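The common-cause case is easy to demonstrate (a toy simulation of my own, not from the book's diagrams): when problem complexity drives both size and defects, the two correlate strongly even though neither causes the other.

```python
import random
from statistics import correlation  # Python 3.10+

random.seed(7)
complexity = [random.uniform(1, 10) for _ in range(500)]
size    = [2 * c + random.gauss(0, 1) for c in complexity]  # caused by complexity
defects = [3 * c + random.gauss(0, 2) for c in complexity]  # also caused by complexity

print(f"r(size, defects) = {correlation(size, defects):.2f}")  # high, no causal link
```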

Measuring Process Effectiveness

A major concern in process theory (particularly in manufacturing) is "reducing process variation".
– It is about "improving process effectiveness" so that the process consistently delivers non-defective results.
Process effectiveness is measured as a "sigma level".

The normal curve

The sigma level corresponds to the area under the normal curve between the tolerance limits, i.e. the percentage of situations that are "within tolerable limits".
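For a normal curve, the area within ±k sigma is erf(k / sqrt(2)); a few lines of Python (standard formula, not slide content) give the numbers behind the picture:

```python
import math

for k in (1, 2, 3, 6):
    within = math.erf(k / math.sqrt(2))
    print(f"+/-{k} sigma: {within:.9f} of outcomes within limits")
# +/-6 sigma leaves roughly 2 outcomes per billion outside the limits.
```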

Six sigma

Given "tolerance limits", i.e. the definition of what is defective, if we want ±6σ to fit within the limits, the curve must become very narrow.
– We must "reduce process variation" so that the outcomes are highly consistent.
– The area within ±6σ is 99.9999998%, i.e. ~2 defects per billion.
– This assumes a normal curve. The actual curve is often a "shifted" curve, for which the number is a bit different.
The Motorola (and generally accepted) definition is 3.4 defects per million operations.
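The gap between ~2 per billion and 3.4 per million comes from the "shifted" curve: the convention (an industry assumption, not derived on the slide) is that the process mean drifts 1.5 sigma toward one limit. A short sketch:

```python
from math import erf, sqrt

def phi(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

centered = 2 * (1 - phi(6.0))            # defects beyond +/- 6 sigma
shifted  = (1 - phi(4.5)) + phi(-7.5)    # mean drifted 1.5 sigma: limits at 4.5 and 7.5

print(f"centered: {centered * 1e9:.1f} defects per billion")  # ~2.0
print(f"shifted:  {shifted * 1e6:.1f} defects per million")   # ~3.4
```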

Why so stringent?

Because manufacturing involves thousands of process steps, and output quality depends on getting every single one of them right.
– Very high reliability is needed at each step to get a reasonable probability of end-to-end correctness.
– At 6 sigma, the product defect rate is ~10% with ~1200 process steps.
– The concept came originally from chip manufacturing.
Software has somewhat the same characteristics:
– To function correctly, each line has to be correct.
– A common translation is 3.4 defects per million lines of code.
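The end-to-end arithmetic is simple compounding: with per-step defect probability p, the chance of at least one defect over n steps is 1 - (1 - p)^n. The per-step rates below use the conventional shifted-sigma values, and the step/opportunity counts are my own choices for illustration:

```python
def defect_rate(p_step: float, n_steps: int) -> float:
    """Probability that at least one of n steps is defective."""
    return 1 - (1 - p_step) ** n_steps

# Conventional shifted-sigma per-step defect probabilities.
for label, p in [("3 sigma", 66_807e-6), ("4 sigma", 6_210e-6), ("6 sigma", 3.4e-6)]:
    print(f"{label}: {defect_rate(p, 1200):.1%} defective over 1,200 steps")

# With more opportunities per product (e.g. ~30,000), even 6 sigma
# compounds to roughly 10% defective products.
print(f"6 sigma: {defect_rate(3.4e-6, 30_000):.1%} over 30,000 opportunities")
```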

Six sigma focus

Six sigma is NOT actually about "achieving the numbers". It is about:
– A systematic quality-management approach
– Studying processes and identifying opportunities for defect elimination
– Defect-prevention approaches
– Measuring output quality and improving it constantly

Comments on process variation

Note that "reducing" process variation is a "factory view" of engineering development.
– We need to be careful about applying it to engineering processes.
– It is most applicable to activities performed repeatedly, e.g. writing code, running tests, creating releases.
– It is less applicable to activities that are different every time, e.g. innovation, learning a domain, architecting.
– Many "creative" activities do have a repetitive component, and are partly amenable to "systematic defect elimination", e.g. design.
A simple criterion: are there defects that can be eliminated by systematic process improvement?
– Reducing variation eliminates some kinds of defects.
– Defect elimination is a two-outcome model; it ignores excellence.

Measurements vs. Metrics

Definition of "metric":
– A quantitative indicator of the degree to which a system, component, or process possesses a given attribute
A measurement just provides information.
– E.g. "Number of defects found during inspection: 12"
A metric is derived from one or more measurements, and provides an assessment of some property of interest.
– It must facilitate comparisons; hence it must be meaningful across contexts, i.e. it has some degree of context independence.
– E.g. "Rate of finding defects during the inspection = 8/hour"
– E.g. "Defect density of the software inspected = 0.2 defects/KLOC"
– E.g. "Inspection effort per defect found = 0.83 hours"
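A tiny sketch of the derivation step (the raw inspection time, effort, and code size below are assumptions chosen to be consistent with the slide's example numbers):

```python
defects_found   = 12     # measurement from the slide
inspection_time = 1.5    # hours of inspection (assumed)
effort          = 10.0   # total person-hours across inspectors (assumed)
size_kloc       = 60.0   # KLOC inspected (assumed)

print(f"finding rate:   {defects_found / inspection_time:.0f} defects/hour")  # 8
print(f"defect density: {defects_found / size_kloc:.1f} defects/KLOC")        # 0.2
print(f"effort/defect:  {effort / defects_found:.2f} hours")                  # 0.83
```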

GQM approach for defining metrics

A technique called Goal-Question-Metric (GQM) has been developed for defining metrics.
– First, we define the goal of the metric, i.e. what attribute we are trying to measure.
– Then we identify the specific questions we are interested in, i.e. exactly what we want to know about the attribute.
– Based on these, we identify one or more metrics that would provide the desired information.

GQM example

Goal: Effectiveness of problem-based learning compared to lectures.
Questions:
– Do students find problem-based learning more interesting?
– Does problem-based learning result in improved student performance?
– Do students who used problem-based learning feel that they learned more?
Metrics:
– % of students who respond "agree" or "strongly agree" to "Class was interesting" in end-of-term surveys, in each case
– % of D/F grades and % of C grades, in each case
– Average score on the final exam in each case (reliability problems arise unless the same final exam and the same grading criteria are used)
– Average score on end-of-term surveys for the statement "I learned a lot from the course", using the scale StronglyAgree = 2, Agree = 1, NA/Neutral = 0, Disagree = -1, StronglyDisagree = -2
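The last metric is mechanical to compute once the scale is fixed; a minimal sketch with made-up survey responses:

```python
SCALE = {"strongly agree": 2, "agree": 1, "na/neutral": 0,
         "disagree": -1, "strongly disagree": -2}

responses = ["agree", "strongly agree", "agree", "na/neutral", "disagree"]
average = sum(SCALE[r] for r in responses) / len(responses)
print(f"average 'learned a lot' score: {average:.2f}")  # 0.60 on the -2..+2 scale
```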

Conclusion

Measurement starts with an operational definition.
– We need to put some effort into choosing appropriate measures and scales, and into understanding their limitations.
Measurements have both systematic and random error.
Measurements must have both reliability and validity.
– It is often hard to achieve both.
A common error is confusing correlation with causation.
– E.g. "There are more theft incidents reported in poor areas" does NOT necessarily indicate that poor people are more likely to steal!
A major concern in process design is reducing process variation.
– Six sigma is actually more about identifying and eliminating defects, and identifying opportunities for process improvement.
– Defects are NOT the sole concern in process design!
– Process optimization is oriented primarily toward repetitive activities.