
Slide 1: IDI, NTNU -- 78038 Programvarekvalitet og prosessforbedring (Software Quality and Process Improvement), spring 2000, 23.3.2000

Forrest Shull et al., Univ. of Maryland, and Reidar Conradi, NTNU (currently at Univ. of Maryland): Reflections on the OORT experiment (adapted from the CS-735 course at UMD, autumn 1999):
– Classifying the experiment » What happened?
– Goals, Questions, and Metrics » Why did it happen?
– Threats to validity » What should have happened?

Slide 2: Classifying (1)

Experience of subjects:
– Ranged from extreme novices to some development experience.
– 60% (??) had some industrial development experience.
– 60% (??) had done reviews in industry.
– 30% (??) had experience using requirements in industry.

Experimental setting:
– In vitro: in the “laboratory”, under controlled conditions.
– (In vivo would mean: introduced in a real work environment.)
– The classroom setting imposes its own constraints, e.g. no control group.

Slide 3: Classifying (2)

Types of analysis:
– Qualitative analysis (mostly): naturalistic observation, discovery-oriented. There are not many existing hypotheses; we want to be able to propose well-founded ones.
– Also some quantitative analysis (e.g. the number of defects detected).

Level of variable relationship:
– Descriptive: we will probably end up with statements like “The amount of expertise has something to do with how effectively the technique was applied.”
– Would ideally like correlational results, e.g. “As expertise increases past a certain level, effectiveness improves.” (A minimal analysis sketch follows below.)
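To make the correlational ambition concrete, here is a minimal sketch, not part of the original slides, of how such a claim could be tested on team-level data. The team numbers and the scipy-based approach are illustrative assumptions, not the experiment's actual data or analysis.

```python
# Minimal sketch: testing "as expertise increases, effectiveness
# improves" with a rank correlation. All data are hypothetical
# placeholders, not results from the OORT experiment.
from scipy.stats import spearmanr

# One entry per team: (expertise score, defects detected).
teams = [(1, 3), (2, 4), (2, 2), (3, 6), (4, 5), (5, 8)]

expertise = [e for e, _ in teams]
defects = [d for _, d in teams]

# Spearman's rho suits the ordinal expertise scale and assumes
# only a monotone relationship, not a linear one.
rho, p_value = spearmanr(expertise, defects)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```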

Slide 4: Classifying (3)

– 435 (feasibility study): one project, multiple teams -- check feasibility.
– 735: multiple projects, multiple teams -- check important variables.
– 78038 (NTNU): same as 735 (UMD).

Slide 5: Classifying (4)

– 435 (feasibility study): no predefined variables; just observe the process.
– 735: a few (loosely) predefined variables, based on earlier observation.
– 78038 (NTNU): same as 735 (UMD).

Slide 6: GQM: Goals for OORTs (1)

Reading techniques:
– We are just beginning to get experience with OORTs.
– How many of the lessons from other inspection techniques can be applied?
– What are we doing right with this approach? What are we doing wrong?

Slide 7: GQM: Goals for Observational Studies (2)

Observational studies:
– Do not suffer the drawbacks of retrospective studies, e.g. the difficulty of recreating and restructuring processes after the fact.
– Used by von Mayrhauser & Vans to understand program comprehension.
– Used by Singer & Lethbridge to understand maintenance processes.
– Used by Shneiderman et al. to improve user interfaces.

BUT... none of these studied debugging procedures.

Slide 8: GQM: Experiment Goals (3)

Research goals: to analyze...
– observational study methods, for the purpose of understanding and improving them, with respect to their feasibility and effectiveness, from the viewpoint of the researcher;
– OORTs, for the purpose of understanding and improving them, with respect to their feasibility, from the viewpoint of the researcher;
– OORTs, for the purpose of understanding them, with respect to their relationship with user experience, from the viewpoint of the researcher.

Pedagogical goals:
– To introduce formal software experimentation techniques.
– To demonstrate the difficulties of thinking about software processes, and to give experience with a strategy that may be applied (observational techniques).

(The GQM goal template used above is rendered as structured data in the sketch below.)
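The research goals above follow the standard GQM goal template: analyze an object of study, for a purpose, with respect to a quality focus, from a viewpoint. A minimal sketch of that template as structured data; the class and field names are hypothetical, not from the slides.

```python
# Minimal sketch: the GQM goal template as a data structure.
# Class and field names are illustrative, not from the slides.
from dataclasses import dataclass

@dataclass
class GQMGoal:
    object_of_study: str  # what is analyzed
    purpose: str          # why (understand, improve, ...)
    focus: str            # with respect to which quality focus
    viewpoint: str        # from whose perspective

goals = [
    GQMGoal("observational study methods", "understand and improve",
            "feasibility and effectiveness", "researcher"),
    GQMGoal("OORTs", "understand and improve", "feasibility", "researcher"),
    GQMGoal("OORTs", "understand",
            "relationship with user experience", "researcher"),
]

for g in goals:
    print(f"Analyze {g.object_of_study} for the purpose of {g.purpose}, "
          f"w.r.t. {g.focus}, from the viewpoint of the {g.viewpoint}.")
```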

Slide 9: GQM: Experiment Design (4)

Domains used:
– Parking Garage (PG) -- the known domain?
– Loan Arranger (LA) -- the unknown domain?

Slide 10: GQM: Questions, part 1 (5)

Is there a real distinction between horizontal and vertical reading in OORTs?
– Measure whether the types of defects found by each team are correlated with the kinds of OORTs used (10?? teams). (A contingency-table sketch follows below.)

Are OORTs applied more or less effectively by readers who have domain experience?
– Compare qualitative results (e.g. reported problems with the OORTs) and quantitative results (e.g. numbers of defects found) for teams with different levels of assumed experience (5?? teams each).
– Compare the types of defects found (related to the problem domain, or not?) for teams with different levels of experience (5?? teams each).
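One way to operationalize the first question is a contingency table of defect types against OORT kinds. A minimal sketch, with hypothetical counts and illustrative defect categories; none of these numbers come from the experiment.

```python
# Minimal sketch: is defect type associated with the kind of OORT
# used (horizontal vs. vertical)? Chi-squared test on a contingency
# table of hypothetical counts.
from scipy.stats import chi2_contingency

# Rows: OORT kind; columns: defect type (illustrative categories:
# consistency defects vs. omission defects).
table = [
    [18, 7],    # horizontal techniques
    [6, 15],    # vertical techniques
]

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}, dof = {dof}")
```

With only about ten teams, defects found by the same team are not independent observations, so such a test would at best be suggestive; this connects to the small-sample threats discussed on slide 12.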

Slide 11: GQM: Questions, part 2 (6)

What can we do to improve OORTs?
– Look for patterns in the answers to the observational questions.
– For example: Does the Executor think a particular step is worthwhile? Can the Executor suggest other techniques or methods that would be a better way to achieve the same goal as this step?

Slide 12: Threats to Validity (1)

Internal validity: potential problems in the interpretation of the data from the experiment.
– Not all questions can be addressed by this experiment, because certain variables are “merged” or hidden between groups, i.e. not all combinations of values occur in the experiment.
– The small number of subjects (10?? teams) leaves two options (the design sketch below illustrates the trade-off):
  – Make larger groups, but with “merged” variation (e.g. every team doing all OORTs).
  – Use many combinations of values, but have smaller groups (e.g. using two problem domains and two OORT combinations) -- as in our case.
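The trade-off can be made concrete by crossing the two factors and dividing the teams among the resulting cells. A minimal sketch; the round-robin assignment and the team count of 10 are illustrative (the slides themselves mark the number with "??").

```python
# Minimal sketch: a 2x2 design (domain x OORT combination) shared
# among ~10 teams. The assignment is illustrative, not the actual
# experimental design.
from itertools import cycle, product

domains = ["PG", "LA"]                   # Parking Garage, Loan Arranger
oort_kinds = ["horizontal", "vertical"]  # OORT combinations

cells = list(product(domains, oort_kinds))  # 2 x 2 = 4 design cells
teams = [f"team-{i}" for i in range(1, 11)]

# Round-robin over the cells: 10 teams across 4 cells leaves only
# 2-3 teams per cell, which is exactly the small-group problem.
for team, (domain, oorts) in zip(teams, cycle(cells)):
    print(f"{team}: domain={domain}, techniques={oorts}")
```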

Slide 13: Threats to Validity: Conclusions about Individual Techniques (2)

[Figure: classification of the seven OORTs. Vertical reading: techniques #5, #6, #7. Horizontal reading: techniques #1, #4 and #2, #3, grouped under “static” and “dynamic”.]

Slide 14: Threats to Validity (3)

Internal validity: potential problems in the collection or interpretation of the data from the experiment.
– History: results in later experiments may be influenced by events before those experiments.
– Maturation: processes occurring within the subjects (e.g. learning!) may change results over time.
– Learning the techniques: results may vary over time, as subjects get more comfortable with the actual procedures.
– Instrumentation: results may vary with different measurements.
– Selection: results may vary because of the type of subjects in the different groups.
– Process conformance: may be weak or erratic.

Slide 15: Threats to Validity (4)

External validity: generalizability of the results.
– Are the results valid for use outside this class (e.g. for subjects with more or less experience, or in industry)?
– Are the results valid for other design documents (problem domains)?

Slide 16: Final reminder: please fill out the final questionnaire (15-20 minutes).

