
Slide 1: UFRJ COPPE Department of Computer Science / Experimental Software Engineering Group / Fraunhofer Center - Maryland
Some experiences at NTNU and Ericsson with Object-Oriented Reading Techniques (OORTs) for Design Documents
INTER-PROFIT/CeBASE seminar on Empirical Software Engineering, Simula Research Lab, Oslo, 22-23 Aug. 2002
Reidar Conradi, NTNU
Forrest Shull, Victor R. Basili, FC-Maryland (FC-MD)
Guilherme H. Travassos, Jeff Carver, Univ. Maryland (UMD)
{travassos, basili, carver}@cs.umd.edu; fshull@fc-md.umd.edu
http://www.cs.umd.edu/projects/SoftEng/ESEG/
conradi@idi.ntnu.no, http://www.idi.ntnu.no/grupper/su/

Slide 2: Table of contents
Motivation and context (p. 3)
Reading and Inspections (p. 4)
OO Reading and UML Documents, OORTs (p. 8)
OO Reading and Defect Types (p. 13)
OO Reading-Related Concepts (p. 18)
Ex. Gas Station Control System (p. 20)
OORT experimentation (p. 27)
E0: Post-mortem of SDL inspections at Ericsson-Oslo, 1997-99 (p. 30)
E1: Student OORT experiment at NTNU, Spring 2000 (p. 32)
E2: Feasibility OORT study at NTNU, Autumn 2001 (p. 40)
E3: Student OORT experiment at NTNU, Spring 2002 (p. 42)
E4: Industrial OORT experiment at Ericsson-Grimstad, May 2002 (p. 47)
Conclusion (p. 50)
Appendix: OORT-1: Sequence Diagram x Class Diagram (p. 51)

Slide 3: Motivation and context
Need better techniques for OO reading – UML/Java. But OO software is not a set of simple, "linear" documents.
Norwegian OORT work started during Conradi's sabbatical at Univ. Maryland in 1999/2000: adapting OORT experimental material from the CS735 course at UMD, Fall 1999. All artifacts and instructions in English.
E0. Data mining of SDL inspections at Ericsson-Oslo, 1997-99. Internal database, but marginal data analysis. Done as 3 MSc theses.
E1. 1st OORT experiment at NTNU, 4th-year QA course, March 2000, 19 students. Leaner reading instructions as Qx.ij, no other changes. Pass/no-pass for students.
E2. OORT feasibility study at NTNU: two MSc students, Lars Christian Hegde and Tayyaba Arif, repeating E1 in Autumn 2001.
E3. 2nd OORT experiment at NTNU, 4th-year software architecture course, March 2002, 42 students. Adjusted reading instructions (for E3), removed trivial defects in UML artifacts.
E4. Industrial OORT experiment at Ericsson-Grimstad, 10 developers, split 50/50 on old/new reading techniques, with adjusted E3 techniques. Meagre internal baseline data. Part of the PROFIT project – developers were paid 500 NOK/h; 4 MSc students (NTNU, HiA) and one PhD student at NTNU/Ericsson (Mohagheghi) assisted.

Slide 4: Reading and Inspections – Why read software?
Reading (reviewing): systematic reading of most software documents/artifacts (requirements, design, code, test data etc.) can increase:
– Reliability: other persons are much better at finding defects ("errors") created by you. There is often a psychological block on this.
– Productivity: defect fixing is much cheaper in earlier life-cycle phases, e.g. $30 to correct a line in the requirements vs. $4000 to fix a code line. And 2/3 of all defects can be found before testing, at 1/3 of the price.
– Understanding: e.g. for maintenance or error fixing.
– Knowledge transfer: novices should read the code of experts, and vice versa.
– Maintainability: by suggesting a "better" solution/architecture, e.g. to increase reuse.
– General design quality/architecture: combining some of the above.
We should not only write software (write-once & never-read?), but also read it (our own and others'). But we need guidelines (e.g. OORTs) to learn to read efficiently, not ad hoc.

Slide 5: Reading and Inspections (2) – Why NOT read software?
– Logistical problems: takes effort; how to schedule and attend meetings?
– Proper skills are unavailable: needs domain and/or technological insight.
– Sensitive criticism: may offend the author, so counter-productive?
– Rewards or incentives: may be needed for both readers and writers.
– Boring: so need incentives, as above?
– General lack of motivation or insight.

Slide 6: Reading and Inspections (3) – Classic inspections
"Fagan" inspections for defect discovery in any software artifact:
– I1. Preparation: what is to be done, plan the work etc. (*)
– I2. Individual reading: with complementary perspectives to maximize effect (*)
– I3. Common inspection meeting: assessing the reported defects.
– I4. Follow-up: correct and re-check quality; revise guidelines?
We will look at steps I1-I2 here (*), with the OORT-1..7 guidelines as perspectives.
In industry: typically 10% of effort is spent on inspections, with a net saving of 10-25%. So quality is "free"!
Recently: much emphasis on OO software development, e.g. using the Rational Rose tool to create UML design diagrams. But few tailored reading techniques exist for such documents. Over 200,000 UML licenses, so big potential.
Example: Ericsson in Norway previously used SDL, now UML and Java. It had an old inspection process, but needs new reading techniques.

Slide 7: Reading and Inspections (4) – Different needs and techniques of reading
[Figure: a technology family of reading techniques, mapping a problem space (needs) to a solution space (techniques, perspectives). General goals (reading for analysis or construction) refine into specific goals (defect detection, usability), documents/artifacts (requirements, design, code, user interface, UML diagrams), notations (SCR, English, screen shots), forms and techniques. Example techniques: defect-based reading (inconsistent, incorrect, omission, ambiguity), perspective-based reading (tester, user, developer; novice, error, expert), and usability-based reading. Traceability reading of UML diagrams is horizontal or vertical.]

Slide 8: OO Reading and UML Documents
Unified Modeling Language, UML: just a notational approach; it does not propose/define how to organize the design tasks (process). It can be tailored to fit different development situations and software life-cycles (processes).
UML artifacts/diagrams, five used later (marked with *):
Dynamic view:
– Use cases (analysis) *
– Activities
– Interaction: sequences *, collaborations
– State machines *
Static view:
– Classes *: relationships (generalization – IsA, composition – PartsOf, association – HasA, dependency – DependsOn, realization), extensibility, constraints, stereotypes
– Descriptions (generated from UML) *
– Packages
– Deployment

Slide 9: OO Reading and UML Documents (2) – Our six relevant software artifacts
Requirements description, RD: here structured text with numbered items, e.g. for a Gas Station. In other contexts: possibly with extra ER and flow diagrams.
(Requirements) analysis documents, UC: here use case diagrams in UML, with associated pseudo-code or comments ("textual" use cases). A use case describes important concepts of the system and the functionalities it provides.
Design documents, also in UML:
– Class diagrams, CD: describe the classes and their attributes, behaviors (functions = message definitions) and relationships.
– Sequence diagrams (special interaction diagrams), SqD: describe how the system objects exchange messages.
– State diagrams, StD: describe the states of the main system objects, and how state transitions can take place.
Class descriptions, CDe: separate textual documentation of the classes, partly as UML-generated interfaces in some programming language.

Slide 10: OO Reading and UML Documents (3)
[Figure: the six software artifacts – requirements description, use cases, and the four design documents in UML.]

Slide 11: OO Reading and UML Documents (4)
[Figure: the software artifacts with the OO Reading Techniques (OORTs) indicated. Requirements descriptions and use-cases (requirements specification/analysis) sit above the high-level design artifacts (class diagrams, class descriptions, state diagrams, sequence diagrams). OORT-1..7 connect pairs of artifacts by horizontal reading (within the design) or vertical reading (between requirements/analysis and design); the exact pairs are listed on the next slide.]

Slide 12: OO Reading and UML Documents (5) – The seven OO Reading Techniques (OORTs)
OORT-1: Sequence Diagram x Class Diagram (horizontal, static)
OORT-2: State Diagram x Class Description (horizontal, dynamic)
OORT-3: State Diagram x Sequence Diagram (horizontal, dynamic)
OORT-4: Class Diagram x Class Description (horizontal, static)
OORT-5: Class Description x Requirements Description (vertical, static)
OORT-6: Sequence Diagram x Use Case Diagram (vertical, dynamic/static)
OORT-7: State Diagram x Requirements Description / Use Case Diagram (vertical, dynamic)
Abbreviations: Requirements Description (RD), Use Case Diagram (UC), Class Diagram (CD), Class Description (CDe, supplementing CD), State Diagram (StD), Sequence Diagram (SqD).

Slide 13: OO Reading and Defect Types
[Figure: reading techniques and defect types – domain knowledge (general, other domain, requirements) versus software (design) artifacts, with the defect types ambiguity, extraneous, incorrect fact, omission and inconsistency.]
Software reading techniques try to increase the effectiveness of inspections by providing procedural guidelines that individual reviewers can use to examine (or "read") a given software artifact (design document) and identify defects.
As mentioned, there is empirical evidence that tailored software reading increases the effectiveness of inspections for many software artifacts, not just source code.

Slide 14: OO Reading and Defect Types (2)

Slide 15: OO Reading and Defect Types (3) – Examples of defect types
Omission (conceptually using vertical information) – "too little": e.g. forgot to consider no coverage on credit cards, forgot a state transition.
Extraneous or irrelevant information (most often vertical) – "too much": e.g. has included both gasoline and diesel sales.
Incorrect fact (most often vertical) – "wrong": e.g. the maximum purchase limit is $1000, not $100.
Inconsistency (most often horizontal) – "wrong": e.g. a class name spelled differently in two diagrams, forgot to declare a class function/attribute, etc.
Ambiguity (most often horizontal) – "unclear": e.g. an unclear state transition, say how a gas pump returns to "vacant".
Miscellaneous: other kinds of defects or comments.
May also have defect severity: minor (comments), major, supermajor. There is also an IEEE standard on code defects: interface, initialization, sequencing, …

Slide 16: OO Reading and Defect Types (4) – Horizontal vs. Vertical Reading
Horizontal reading, for internal consistency of a design:
– Ensure that all design artifacts represent the same system.
– The design contains complementary views of the information: static (class diagrams) and dynamic (interaction diagrams).
– Not obvious how to compare these different perspectives.
Vertical reading, for traceability between requirements/analysis and design:
– Ensure that the design artifacts represent the same system as described by the requirements and use-cases.
– Compares documents from different life-cycle phases: the levels of abstraction and detail are different.

Slide 17: OO Reading and Defect Types (5) – The design inspection process with OO reading techniques
[Figure: three readers, each an "expert" in a different perspective – Reader 1 and Reader 2 looking for consistency (horizontal reading), Reader 3 looking for traceability (vertical reading). They meet as a team to discuss a comprehensive defect list; the final list of all defects is sent to the designer for repair.]

Slide 18: OO Reading-Related Concepts
Levels of functionality in a design (used later in the OORTs):
– Functionality: high-level behavior of the system, usually from the user's point of view. Often a use case. Ex. in a text editor: text formatting. Ex. at a gas station: fill-up-gasoline and pay.
– Service: medium-level action performed internally by the system; an "atomic unit" out of which system functionalities are composed. Often a part of a use-case, e.g. a step in the pseudo-code. Ex. in a text editor: select text, use pull-down menus, change font selection. Ex. at a gas station: transfer $$ from account N1 to N2, if there is coverage.
– Message (function): lowest-level behavior unit, out of which services and then functionalities are composed. Represents basic communication between cooperating objects to implement system behavior. Messages may be shown on sequence diagrams and must be defined in their respective classes. Ex. in a text editor: write out a character. Ex. at a gas station: add $$ to customer bill: add_to_bill(customer, $$, date).
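To make the three levels concrete, here is a minimal, hypothetical Java sketch of the gas-station example; all class and method names (Account, transfer, the Java spelling addToBill of add_to_bill, etc.) are illustrative assumptions, not taken from the actual GSCS design documents.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

public class LevelsSketch {
    static class Account {
        int balance;
        Account(int balance) { this.balance = balance; }
    }

    static final List<String> purchaseLog = new ArrayList<>();

    // Message (lowest level): one basic communication between objects,
    // defined in the receiving class: add_to_bill(customer, $$, date).
    static void addToBill(String customer, int amount, LocalDate date) {
        purchaseLog.add(date + ": billed " + customer + " $" + amount);
    }

    // Service (medium level): an "atomic unit" composed of messages:
    // transfer $$ from account N1 to N2, if there is coverage.
    static boolean transfer(Account n1, Account n2, int amount) {
        if (n1.balance < amount) return false;   // coverage check
        n1.balance -= amount;
        n2.balance += amount;
        return true;
    }

    // Functionality (high level, the user's point of view): the
    // fill-up-gasoline-and-pay use case, composed of services and messages.
    static void fillUpAndPay(String customer, Account from, Account to, int amount) {
        if (transfer(from, to, amount)) {                  // service
            addToBill(customer, amount, LocalDate.now());  // message
        }
    }

    public static void main(String[] args) {
        fillUpAndPay("N.N.", new Account(2000), new Account(0), 300);
        purchaseLog.forEach(System.out::println);
    }
}
```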

Slide 19: OO Reading-Related Concepts (2) – Constraints/Conditions in requirements
– Condition (i.e. local pre-condition): what must be true before a functionality/service etc. can be executed. Example from GSCS's 7. Payment (see slides 20-22): "If (payment time) is now, payment type must be by credit card or cash… If (payment time) is monthly, payment type must be by billing account…"
– Constraint (more global): must always be true for some system functionality etc. Example from GSCS's 9.2 Credit card problem: "The customer can only wait for 30 seconds for authorization from the Credit Card System."
– Constraints can, of course, be used in conditions to express exceptions.
– Both constraints and conditions can be expressed as notes in UML class / state / sequence diagrams.
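As an illustration, a small, hypothetical Java sketch of how the two notions differ when enforced in code – the payment-time condition from requirement 7 and the $1000 fill-up constraint from requirement 2. The names and structure are assumptions for this example only, not part of the GSCS design.

```java
public class PaymentRules {
    enum PayTime { NOW, MONTHLY }
    enum PayType { CREDIT_CARD, CASH, BILLING_ACCOUNT }

    // Constraint (global, requirement 2): gasoline for at most $1000
    // can be dispensed at a time -- must always hold.
    static final int MAX_PURCHASE = 1000;

    // Condition (local pre-condition, requirement 7): the chosen payment
    // time restricts which payment types are allowed.
    static boolean typeAllowed(PayTime time, PayType type) {
        return (time == PayTime.NOW)
                ? (type == PayType.CREDIT_CARD || type == PayType.CASH)
                : (type == PayType.BILLING_ACCOUNT);
    }

    static void pay(PayTime time, PayType type, int amount) {
        if (amount > MAX_PURCHASE)        // check the constraint
            throw new IllegalArgumentException("above $1000 limit");
        if (!typeAllowed(time, type))     // check the condition
            throw new IllegalStateException(type + " not allowed when paying " + time);
        System.out.println("Paid $" + amount + " by " + type);
    }

    public static void main(String[] args) {
        pay(PayTime.NOW, PayType.CASH, 300);                  // fulfils both
        pay(PayTime.MONTHLY, PayType.BILLING_ACCOUNT, 150);   // fulfils both
    }
}
```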

Slide 20: Ex. Gas Station Control System
Simplified requirements specification for a Gas Station Control System (GSCS), mainly for payment:
1. Gas station: sells gasoline from gas pumps, rents parking spots, has a cashier and a GSCS.
2. Gas pump: gasoline is sold at self-service gas pumps. The pump has a computer display and keyboard connected to the GSCS, and similarly a credit card reader. If the pump is vacant, the customer may dispense gasoline. He is assisted in this by the GSCS, which supervises payment (points 7-9), and finally resets the pump to vacant. Gasoline for up to $1000 can be dispensed at a time.
3. Parking spot: regular customers may rent parking spots at the gas station. The cashier queries the GSCS for the next available parking spot, and passes this information back to the customer. See points 7-9 for payment.
4. Cashier: an employee of the gas station, representing the gas station owner. One cashier is on duty at all times. The cashier has a PC and a credit card reader, both communicating with the GSCS. He can rent out parking spots, and receive payment from points 2 & 3 above, while returning a receipt.

Slide 21: Ex. Gas Station Control System (2)
5. Customer: may fill up gasoline at a vacant gas pump, rent a parking spot at the cashier, and pay at the gas pump (for gasoline) or at the cashier. Regular customers are employed in a local business, which is cleared for monthly billing.
6. GSCS:
– Keeps an inventory of parking spots and gasoline, a register of regular customers and their businesses and accounts, plus a log of purchases.
– Has a user interface at all gas pumps and at the cashier's PC, and is connected to an external Credit Card System and to local businesses (via the Internet).
– Computes the price for gasoline fill-ups, informs the cashier about this, and can reset the gas pump to vacant.
– Will assist in making payments (points 7-9).
7. Payment in general: payment time and type are selected by the customer. Payment time is either now or monthly:
– If it is now, payment type must be by credit card or cash (incl. personal check).
– If it is monthly, payment type must be by billing account to a local business.
There are two kinds of purchase items: gasoline fill-up and parking spot rental. A payment transaction involves only one such item.

Slide 22: Ex. Gas Station Control System (3)
8. Payment type:
8.1 By cash (or personal check): can only be done at the cashier.
8.2 By credit card: can be done either at the gas pump or at the cashier. The customer must swipe his credit card appropriately, but with no PIN code.
8.3 By billing account: the customer must give his billing account to the cashier, who adds the amount to the monthly bill of a given business account.
9. Payment exceptions:
9.1 Cash (check) problem: the cashier is authorized to improvise.
9.2 Credit card problem: the customer can only wait 30 seconds for authorization from the Credit Card System. If there is no response, or an incorrect credit card number / no coverage, the customer is asked for another payment type / credit card. At the gas pump only one payment attempt is allowed; otherwise the pump is reset to vacant (to not block the lane), and the customer is asked to see the cashier.
9.3 Business account problem: if the account is invalid, the customer is asked for another payment type / account number.

Slide 23: Ex. Gas Station Control System (4) – Example, part 1: possible weaknesses in the GSCS requirements
– What about no more gasoline, or no more parking spots?
– How should the user interface dialogs be structured?
– Are all credit cards allowed, including banking cards (VISA etc.)?
– What kind of information should be transferred between gas pumps and the GSCS, between the cashier and the GSCS, etc.?
– How to collect monthly payment from local businesses?
– How many payment attempts should the customer be given at the cashier?
– What if the customer ultimately cannot pay?
Such weaknesses can be found by special reading techniques for requirements, but this is outside our scope here.

Slide 24: Ex. Gas Station Control System (5)
Example, part 2: [Figure: parking-spot-related messages in a sequence diagram for the Gas Station.]

Slide 25: Ex. Gas Station Control System (6)
Example, part 3: [Figure: abstracting messages into two services for the Gas Station – GetParkingSpot (dotted lines) and PayParkingSpot (solid lines).]

Slide 26: Ex. Gas Station Control System (7)
Example, part 4: checking whether a constraint is fulfilled in the Gas Station class diagram:
Credit_Card System (from External Systems)
+ authorize_payment(customer, amount, date)
[UML note: response time should be less than 30 seconds for all Credit Card Systems]
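For intuition, a minimal Java sketch of what fulfilling this 30-second constraint could look like at run time, with a stand-in for the authorize_payment call against the external Credit Card System; the simulated delay and all names are illustrative assumptions only.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class AuthorizationTimeout {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Stand-in for authorize_payment(customer, amount, date) against the
        // external Credit Card System; here it just sleeps one second.
        Future<Boolean> authorization = pool.submit(() -> {
            Thread.sleep(1000);
            return true;
        });
        try {
            // The constraint: the customer waits at most 30 seconds.
            boolean ok = authorization.get(30, TimeUnit.SECONDS);
            System.out.println(ok ? "Payment authorized"
                                  : "Ask for another payment type / credit card");
        } catch (TimeoutException e) {
            // Requirement 9.2: at the gas pump only one attempt is allowed,
            // then the pump is reset to vacant and the customer sees the cashier.
            authorization.cancel(true);
            System.out.println("No response within 30 s: reset pump to vacant");
        } finally {
            pool.shutdownNow();
        }
    }
}
```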

Slide 27: OORT experimentation – Empirical evaluations of OORTs
Receiving feedback from users of the techniques: controlled experiments, observational studies.
Revising the techniques based on feedback: qualitative (mostly) and quantitative.
Continually evaluating the techniques to ensure they remain feasible and useful.
Negotiating with companies to apply OORTs on real development projects.
Goal: to assess effectiveness on industrial projects…
– Are time/effort requirements realistic?
– Do the techniques address real development needs?
… using experienced developers:
– Is there "value added" also for more experienced software engineers?

Slide 28: OORT experimentation (2)
What we know:
– The techniques are feasible.
– The techniques help find defects.
– Vertical reading finds more defects of omission and incorrect fact.
– Horizontal reading finds more defects of inconsistency and ambiguity.
What we don't know:
– What influence does domain knowledge have on the reading process (horizontal vs. vertical)?
– Can we automate a portion of the techniques, e.g. by an XMI-based UML tool? Some steps are repetitive and mechanical; we need to identify the clerical activities. (A sketch follows below.)
See also the conclusion.
– http://www.cs.umd.edu/Dienst/UI/2.0/Describe/ncstrl.umcp/CS-TR-4070
– http://fc-md.umd.edu/reading.html
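To illustrate what such automation could look like, here is a minimal sketch that treats OORT-1's Q12.a ("can every object/class/actor in the SqD be found in the CD?") as a set-difference check over an XMI export. The tag and attribute names (UML:Class, UML:ClassifierRole, name) are assumptions for illustration only – real XMI vocabularies vary by UML tool and version.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import java.io.File;
import java.util.HashSet;
import java.util.Set;

public class Oort1Q12aCheck {
    // Collect the values of one attribute over all elements with a given tag.
    static Set<String> names(Document doc, String tag, String attr) {
        Set<String> result = new HashSet<>();
        NodeList nodes = doc.getElementsByTagName(tag);
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(((Element) nodes.item(i)).getAttribute(attr));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Document xmi = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(new File(args[0])); // XMI export
        // Assumed element names, for illustration only:
        Set<String> classesInCd = names(xmi, "UML:Class", "name");
        Set<String> rolesInSqd  = names(xmi, "UML:ClassifierRole", "name");
        rolesInSqd.removeAll(classesInCd);   // Q12.a as a set difference
        if (!rolesInSqd.isEmpty()) {
            System.out.println("Possible inconsistency -- in SqD but not in CD: "
                    + rolesInSqd);
        }
    }
}
```

Only the mechanical cross-referencing is automatable this way; the semantic questions (such as Q12.f-g in the Appendix) still need a human reader.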

Slide 29: OORT experimentation (3) – Experiments so far
Controlled Experiment I, Autumn 1998: undergraduate software engineering class, UMD. Goal: feasibility and global improvement.
Observational Studies, FC-UMD, Summer 1999. Goal: feasibility and local improvements.
Observational Studies II, UMD, Autumn 1999: two graduate software engineering classes, UMD. Goal: observation and local improvement.
Controlled Experiment III, Spring 2000: undergraduate software engineering class, UMD. Goal: general life-cycle study (part of a larger experiment).
And more at UMD…
One post-mortem, three controlled experiments, and one feasibility study, 2000-02: the E0-E4 studies at NTNU/Ericsson.

Slide 30: E0: Post-mortem of SDL inspections at Ericsson-Oslo, 1997-99
Overall goals: study the existing inspection process, with SDL diagrams (emphasis) and PLEX programs. Possibly suggest process changes.
Context:
– Inspection process for SDL in Fagan style, adapted by Tom Gilb / Gunnar Ribe at Ericsson in the early 90s. But now UML, which is not as "linear" as SDL.
– Good company culture for inspections.
– Internal, file-based inspection database, also with data for some test phases.
– One project release A with 20,000 person-hours.
– Further releases B-F with 100,000 person-hours.
Post-mortem study:
– Three MSc students dug up and analyzed the data.
– To learn and to test hypotheses, e.g. about the role of complexity and persistent longitudinal relationships.
– Good support by the local product line manager, Torbjørn Frotveit.

Slide 31: E0: Post-mortem of SDL inspections at Ericsson-Oslo, 1997-99 (2)
Main findings (PROFES'99 and NASA-SEW'99 papers, chapter in the Gilb book '02):
– Three defect types: supermajor, major (+ comments) – and impossible for us to extend later.
– Effectiveness/efficiency: 70% of all defects caught in individual reading (1 ph/defect), 6% in later meetings (8 ph/defect). Average defect efficiency in later unit & function testing: 8-10 ph/defect.
– Actual inspection rate: 5 ph/SDL-page; recommended rate: 8 ph/SDL-page. This implies that 50% more defects could have been found, at 1/6 of the later test costs.
– Release A: 1474 ph on inspections saved a net 6700 ph (34%) of 20,000 ph.
– Releases B-F: 20,515 ph on inspections saved a net 21,000 ph (21%) of 100,000 ph.
– Release A: no correlation between module complexity (#states) and number of defects found in inspection – i.e. the best people were put on the hardest tasks!
– Releases B-F: many non-significant correlations on defect-proneness across phases and releases – but too little data at the individual module level.
But: lots of interesting data, yet little local after-analysis, e.g. to understand and tune the process. However, a new process with Java/UML came from 1999/2000.

Slide 32: E1: First OORT experiment, NTNU, Spring 2000
Overall goals: learn defect detection techniques and especially OORTs, check whether the OORTs are feasible, receive proposals to improve them, and compare discovered defects with similar experiments at Univ. Maryland (course CS735).
Context: QA/SPI course, 19 students, 9 groups, pass/no-pass, T. Dingsøyr as TA.
Process:
– Make groups: two students in each group (a pair), based on questionnaires. Half of the groups do OORTs 2,3,6,7 (mainly state diagrams), the other half OORTs 1,4,5,6 (mainly class/sequence diagrams).
– General preparation: two double lectures on principles and techniques (8 h).
– Special preparation (I1): look at requirements and guidelines (2 h, self-study).
– OO design reading (I2): read and fill out defect/observation reports (8 h, paired); one group member executes the reading, the other observes.
Given documents: lecture notes with guidelines for OORT-1..7 and observation studies, defect and observation report forms, questionnaires to form groups and the resulting group allocation, and a set of "defect-seeded" software documents (RD, UC, CD, CDe, SqD, and StD) – either for the Loan Arranger (LA) or Parking Garage (PG) example. Three markup pens.

Slide 33: E1: First OORT experiment, NTNU, Spring 2000 (2)
Differences from the UMD experiment, Autumn 1999:
– OORT instructions operationalized (as Qx.ij) for clarity and tracing by Conradi. See the Appendix for OORT-1.
– LA is considered harder and more static, PG easier and more dynamic?
– UMD and NTNU are different university contexts.
Qualitative findings:
– Big variation in effort, dedication and results: e.g. some teams did not report effort data, some even did the wrong OORTs.
– Big variation in UML expertise.
– Students felt frustrated by the extent of the assignment, and that the indicated effort was too low – they felt cheated.
– Lengthy and tedious pre-annotation of artifacts before real defect detection could start ("slave labor"). Many defects were discovered already during annotation, even defects that remained unreported.
– OORTs too "heavy" for the given (small) artifacts?
– Some confusion about the assignments: what, how, on which artifacts, …?
– But also many positive and concrete comments.

Slide 34: E1: First OORT experiment, NTNU, Spring 2000 (3)
Quantitative findings:
– Recorded defects and comments:
  Parking Garage: 17 (of 32) seeded defects & 3 more, + 43 comments.
  Loan Arranger: 17 (of 30) seeded defects & 4 more, + 44 comments.
– Defect and comment occurrences, 5 PG groups and 4 LA groups; sum, average (range):
  PG: sum 33, avg. 6 (4..10) seeded & 1 (0..2) new, + sum 68, avg. 14 (3..22) comments.
  LA: sum 52, avg. 11 (7..14) seeded & 2 (0..4) new, + sum 72, avg. 18 (9..37) comments.
– Duplicately reported defects:
  PG: 11 of 13 duplicate defect occurrences found by different OORTs.
  LA: 12 of 31 duplicate defect occurrences found by different OORTs.
– Effort spent (for 4 OORTs, counting one person per team; a discrepancy = defect or comment):
  PG: 6-7 person-hours, ca. 3 discrepancies/ph.
  LA: 10-13 person-hours, ca. 2.5 discrepancies/ph.
– Note: 2X more "comments" than pure defects … and long arguments on what is what! A comment can concern details as well as architecture.

Slide 35: E1: First OORT experiment, NTNU, Spring 2000 (4)
Quantitative findings (cont'd 2): defect/OORT types in the PG inspection – for 33 defect occurrences. (The full defect-type x OORT matrix did not survive transcription; the recoverable totals:)
– Per defect type: Omission 16, Extraneous 2, Incorrect Fact 11, Ambiguity 1, Inconsistency 3, Miscellaneous 0 – total 33.
– Per OORT: OORT-1: 3, OORT-2: 7, OORT-3: 8, OORT-4: 8, OORT-5: 4, OORT-6: 2, OORT-7: 1.
– A "mixed" profile (high to middle across the OORTs).

Slide 36: E1: First OORT experiment, NTNU, Spring 2000 (5)
Quantitative findings (cont'd 3): defect/OORT types in the LA inspection – for 52 defect occurrences. (Recoverable totals from the transcribed matrix:)
– Per defect type: Omission 7, Extraneous 0, Incorrect Fact 14, Ambiguity 0, Inconsistency 31, Miscellaneous 0 – total 52.
– Per OORT: OORT-1: 23, OORT-2: 7, OORT-3: 1, OORT-4: 10, OORT-5: 9, OORT-6: 0, OORT-7: 2.
– A "static" profile (OORT-1 very high; middle to high elsewhere).

Slide 37: E1: First OORT experiment, NTNU, Spring 2000 (6)
Quantitative findings (cont'd 4): comment types/causes in the PG inspection – for 43 comments (not the 68 occurrences). (Recoverable totals from the transcribed matrix:)
– Per cause: Omission 25, Extraneous 0, Incorrect Fact 6, Ambiguity 5, Inconsistency 4, Miscellaneous 3 – total 43.
– Per type: Missing Behavior 8, Missing Attribute 5, Typo/Spelling 4, System Border 12, Clarification 8, Other 6.

Slide 38: E1: First OORT experiment, NTNU, Spring 2000 (7)
Quantitative findings (cont'd 5): comment types/causes in the LA inspection – for 44 comments (not the 72 occurrences). (Recoverable totals from the transcribed matrix:)
– Per cause: Omission 20, Extraneous 0, Incorrect Fact 7, Ambiguity 7, Inconsistency 6, Miscellaneous 4 – total 44.
– Per type: Missing Behavior 17, Missing Attribute 2, Typo/Spelling 6, System Border 5, Clarification 7, Other 7.

Slide 39: E1: First OORT experiment, NTNU, Spring 2000 (8)
Lessons:
– Some unclear instructions: executor/observer roles, Norwegian file names, file access, some typos. Should the RD be read first?
– Some unclear concepts: service, constraint, condition, …
– UML: not familiar to some groups.
– Technical comments on artifacts and OORTs:
  Add comments/rationale to diagrams: UC and CD are too brief.
  CDe is hard to navigate in – add separators.
  SqD had method parameters, but the CD did not – how to check?
  Several artifacts (also the RD) are needed to understand some OORT questions.
  Many trivial typos and naming defects in the artifacts – from the UML tool?
  Parking Garage artifacts need more work.
  Fanny May = Loan Arranger? Lot = Parking Garage?
  LA vs. Loan Arranger vs. LoanArranger, gate vs. Gate, CardReaders vs. Card_Readers.
  All relations in the CD had cardinalities reversed!
  … really frustrating to deal with poor-quality artifacts.

Slide 40: E2: OORT feasibility study at NTNU, Autumn 2001
Overall goals: go through all OORT-related material to learn and propose changes, first for the NTNU experiment and later for the Ericsson experiment.
Context: two senior MSc students at NTNU, Hegde and Arif, in the Depth Project course (half a semester), each repeating E1 as a combined executor/observer.
Process:
– Repeat the E1 process, but so that each executor does all OORTs on either LA or PG.
– Analyze the data and suggest future improvements based on both E1 and E2.
Findings (details on the next slide):
– Used about 15 hours each, 2X that of the E1 groups – but did all 7 OORTs, not 4.
– 28 and 27 defects found in PG and LA, respectively: 3X as many per OORT as in E1.
– Found 9 more PG defects: 35+9 = 44; 4 more LA defects: 34+4 = 38.
– Found 34 more PG comments: 43+34 = 77; 10 more LA comments: 34+10 = 44.
– About 4 discrepancies/ph, 50% more than in E1.
– Many good suggestions for improvements of the OORTs, artifacts, and process.
– So motivation means a lot!

Slide 41: E2: OORT feasibility study at NTNU, Autumn 2001 (2)
Quantitative findings (cont'd 2): results from the PG/LA inspections – NB: #defects = #occurrences.

Executor (background and results)       | PG (by Hegde, all OORTs) | LA (by Arif, all OORTs)
Industrial background                   | Low-Med                  | –
UML background                          | Low                      | Med-High
#defects recorded                       | 28 (of 44, +9 new)       | 27 (of 38, +4 new)
#comments recorded                      | 40 (w/ 34 new)           | 25 (w/ 10 new)
Total effort (min)                      | 900                      | 910
Effort per discrepancy (defect+comment) | 13 min (4.5/ph)          | 17.5 min (3.8/ph)

Slide 42: E3: Second OORT experiment, NTNU, Spring 2002
Overall goals: as in E1, but also to try out certain OORT changes before the later industrial experiment E4.
Context: software architecture course, 42 students, 21 groups, pass/no-pass, Hegde and Arif as TAs.
Process:
– Mainly as in E1, but a web application was used to manage artifacts and filled-in forms.
– OORTs enhanced for readability (not as terse as in E1), and generally polished and corrected.
Given documents: mostly as in E1, but trivial defects (typos, spellings etc.) in the artifacts were corrected to reduce the amount of "comments".

Slide 43: E3: Second OORT experiment at NTNU, Spring 2002 (2)
Changes in the OORTs for the E3 experiment, based on E1/E2 insights:

Change type                      | Count
Error in original OORTs from UMD | 3
Error in E1 conversion to Qx.ij  | 10
Question rephrased               | 20
Comments added (more words)      | 22
Total                            | 55

Slide 44: E3: Second OORT experiment at NTNU, Spring 2002 (3)
Quantitative findings for PG, the 11 groups:

Group  | OORTs   | Defects seeded | Old defect occurr. | Old comment occurr. | New comment occurr. | Total discrepancies | Effort (min) | Efficiency (disc./ph)
10     | 2,3,6,7 | 44   |  4 |  1 |  3 |  8 | 337 | 1.42
11     | 2,3,6,7 | 44   |  4 |  0 |  4 |  8 | 370 | 1.30
12     | 2,3,6,7 | 44   |  3 |  0 |  5 |  8 | 210 | 2.29
13     | 2,3,6,7 | 44   |  4 |  9 | 23 | 36 | 870 | 2.48
14     | 1,4,5,6 | 44   |  8 |  5 |  7 | 20 | 232 | 5.17
15     | 1,4,5,6 | 44   |  5 | 11 |  8 | 24 | 445 | 3.24
16     | 1,4,5,6 | 44   |  5 |  2 |  3 | 10 | 215 | 2.79
17     | 1,4,5,6 | 44   |  7 |  7 |  7 | 21 | 230 | 5.48
18     | 1,4,5,6 | 44   |  3 |  4 |  9 | 16 | 266 | 3.61
19     | 1,4,5,6 | 44   |  6 | 10 | 11 | 27 | 320 | 5.06
30     | 1,4,5,6 | 44   |  4 | 10 | 11 | 25 | 390 | 3.85
Sum    |         | (44) | 53 (21 defects) | 59 (of 77 old) | 91 (78 new) | 203 | 3885 |
Mean   |         |      | 4.8 | 5.4 | 8.3 | 18.5 | 353.2 | 3.30
Median |         |      | 4   | 5   | 7   | 20   | 320   | 3.24
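To read the efficiency column: it is the total number of discrepancies divided by the effort in person-hours, e.g. for group 10: 8 / (337/60 ph) ≈ 1.42 discrepancies per person-hour.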

Slide 45: E3: Second OORT experiment, NTNU, Spring 2002 (4)
Quantitative findings for LA, the 11 groups:

Group  | OORTs   | Defects seeded | Old defect occurr. | Old comment occurr. | New comment occurr. | Total discrepancies | Effort (min) | Efficiency (disc./ph)
20     | 2,3,6,7 | 38   |  2 |  0 |  3 |  5 | 286 | 1.05
21     | 2,3,6,7 | 38   |  3 |  4 |  9 | 16 | 245 | 3.92
22     | 2,3,6,7 | 38   |  5 |  2 |  5 | 12 | 162 | 4.44
23     | 2,3,6,7 | 38   |  3 |  0 | 17 | 20 | 270 | 4.44
24     | 2,3,6,7 | 38   |  0 |  0 |  3 |  3 | 230 | 0.78
25     | 1,4,5,6 | 38   | 11 |  9 | 11 | 31 | 385 | 4.83
26     | 1,4,5,6 | 38   |  4 |  2 |  7 | 13 | 400 | 1.95
27     | 1,4,5,6 | 38   | 14 | 10 | 17 | 41 | 280 | 8.97
28     | 1,4,5,6 | 38   |  4 |  1 |  4 |  9 | 265 | 2.04
29     | 1,4,5,6 | 38   | 11 | 13 | 11 | 35 | 270 | 7.78
31     | 1,4,5,6 | 38   | 12 | 19 |  9 | 40 | 315 | 7.62
Sum    |         | (38) | 69 (21 defects) | 60 (of 44 old) | 96 (72 new) | 225 | 3108 |
Mean   |         |      | 6.3 | 5.5 | 8.7 | 20.5 | 282.5 | 4.30
Median |         |      | 4   | 2   | 9   | 16   | 270   | 4.44

Slide 46: E3: Second OORT experiment, NTNU, Spring 2002 (5)
General findings:
– PG: 4-5 defects found per group; in total 21 (of 44) defects found in 53 occurrences, and 150 comment occurrences with 78 new comments (*). Mean: 3.3 discrepancies/ph.
– LA: 4-6 defects found per group; in total 21 (of 38) defects found in 69 occurrences, and 156 comment occurrences with 72 new comments (*). Mean: 4.3 discrepancies/ph.
– OORTs 2,3,6,7 found fewer defects than OORTs 1,4,5,6.
– Efficiency: 3-4 discrepancies/ph, or ca. 1 in an industrial context (3-4 inspectors).
– Group variance: 1:4 (for PG) and 1:11 (for LA) in #defects found and in #defects/ph. Motivation was rather poor and variable?
– Cleaning up the artifacts for trivial defects (typos etc.) did not reduce the number of comments – still 3X as many comments as defects!
(*) New comments must be analyzed further; possibly there are new defects here also.

Slide 47: E4: Industrial OORT experiment at Ericsson-Grimstad, Spring 2002
Overall goals: investigate the feasibility of the new OORTs vs. the old checklist/view-based reading techniques in an industrial environment at Ericsson-Grimstad.
Context:
– Ericsson site in Grimstad, ca. 400 developers, 250 doing GPRS work; 10 developers paid to perform the experiment, as part of the INCO and PROFIT projects. Much internal turbulence due to down-sizing in Spring 2002.
– An overall inspection process was in place, but the individual techniques needed an upgrade for UML reading, and generally better metrics.
– Hegde and Arif as TAs, supplemented by two local MSc students from HiA, and NTNU PhD student Mohagheghi from Ericsson as internal coordinator.
Process:
– First a lecture with the revised OORTs, then individual reading (preparation), then a common inspection meeting. Later data analysis by HiA/NTNU.
– The OORTs were adapted for the lack of a CDe (OORT-4: CD x CDe) and an RD (using UC instead), e.g. no OORT-5: CDe x RD.
– Artifacts: on paper and in internal file catalogs; forms: on the web.
Given documents: no RD (too big, and with appendices), UC, CD (extended and also serving as CDe), one StD (made hastily from a DFD), and SqD. Only increments were to be inspected, according to a separate list.

Slide 48: E4: Industrial OORT experiment at Ericsson-Grimstad, Spring 2002 (2)
Quantitative results from Ericsson-Grimstad – defects and effort:

                           | Partial baseline, June 2001 - March 2002 | Current view-based techniques, 5 persons, May 02 | Revised OORTs, 4 persons, May 02
#defect occurr. by reading | 84                      | 17 (all different)    | 47 (39 different)
#defect occurr. in meeting | 82                      | 8                     | 1
Reading effort             | 99.92 ph (0.84 def/ph)  | 10 ph (1.70 def/ph)   | 25.5 ph (2.29 def/ph)
Meeting effort             | 214.92 ph (0.38 def/ph) | 8.25 ph (0.97 def/ph) | 9 ph (0.11 def/ph)
Total effort               | 314.84 ph (0.53 def/ph) | 18.25 ph (1.37 def/ph)| 34.5 ph (1.57 def/ph)

Slide 49: E4: Industrial OORT experiment at Ericsson-Grimstad, Spring 2002 (3)
Lessons learned in E4 at Ericsson:
General:
– Reading (preparation) is generally done too fast – already known.
– Weak existing baseline and metrics, but under improvement.
– Inspections are to find defects; design reviews give "deeper" design comments.
OORTs vs. view-based reading/inspection:
– The OORT data are for 4 inspectors; the 5th did not deliver his forms.
– The OORTs perform a bit better than view-based reading in efficiency.
– But the OORTs find 2X the defects.
– The view-based technique finds 1/3 of its total defects in the inspection meeting!
– Need to cross-check duplicate defect occurrences – but there was no overlap between the defects found by view-based reading and by the OORTs!
– Both outperform the inspection "baseline" by almost 3X in efficiency.
– So many aspects to follow up!

Slide 50: Conclusion
Lessons learned in general from E0-E4:
Industrial artifacts are enormous:
– Requirements of 100s of pages, with ER/DF/Sq diagrams (not only text).
– Software artifacts cover entire walls – use increments and their CRs?
Industrial baselines are thin, so it is hard to demonstrate a valid effect.
The OORTs seem to do the job, but:
– Still many detailed Qx.ij that are not answerable (make transition matrices).
– Some redundancies in annotations/questions, plus a need for industry tailoring.
– What to do with "comments" (2-3X the defects)?
– Domain knowledge and UML expertise: do they not matter?
On using students:
– Ethics: slave labor and intellectual rights – many angry comments back.
– Norwegian students are not as motivated as American ones – integrate OORTs better in courses, use paid volunteers? A 20-hour exercise is too large?
Conclusions:
– Must do more experimentation in industry.
– How to use more realistic artifacts for student experiments?

Slide 51: Appendix: OORT-1: Sequence Diagram x Class Diagram, Spring 2000 version (E1 and E2)
Inputs:
1. A class diagram, possibly in several packages.
2. Sequence diagrams.
Outputs:
1. Annotated versions of the above diagrams.
2. Discrepancy reports.
Goal: to verify that the class diagram for the system describes the classes and their relationships in such a way that the behaviors specified in the sequence diagrams are correctly captured.
Instructions: do steps R1.1 and R1.2.

Slide 52: OORT-1: Sequence Diagram x Class Diagram (2)
Step R1.1: From a sequence diagram – identify system objects, system services, and conditions.
Inputs:
1. Sequence diagram (SqD).
Outputs:
1. System objects, classes and actors (underlined in blue on the SqD).
2. System services (underlined in green on the SqD).
3. Constraints/conditions on the messages/services (circled in yellow on the SqD).
I.e., a marked-up SqD is produced, to be used in R1.2.
Instructions (matching the outputs above):
– Q11.a: Underline system objects, classes and actors in blue on the SqD.
– Q11.b: Underline system services in green on the SqD.
– Q11.c: Circle constraints/conditions on messages/services in yellow on the SqD.

Slide 53: OORT-1: Sequence Diagram x Class Diagram (3)
Step R1.2: Check the related class diagrams, to see if all system objects are covered.
Inputs:
1. Marked-up sequence diagrams (SqDs) – from R1.1.
2. Class diagrams (CDs).
Outputs:
1. Discrepancy reports.
Instructions (as questions – here and below):
– Q12.a: Can every object/class/actor in the SqD be found in the CD? [possible inconsistency?]
– Q12.b: Can every service/message in the SqD be found in the CD, and with the proper parameters? [inconsistency?]
– Q12.c: Are all system services covered by (low-level) messages in the SqD? [possible omission?]
– Q12.d: Is there an association or other relationship between two classes that exchange messages? [omission?]
– Q12.e: Is there a mismatch in behavior arguments, or in how constraints/conditions are formulated, between the two documents? [inconsistency?]

Slide 54: OORT-1: Sequence Diagram x Class Diagram (4)
Step R1.2 instructions (cont'd):
– Q12.f: Can the constraints from the SqD in R1.1 be fulfilled? E.g. the number of objects that can receive a message (check cardinality in the CD)? The range of data values? Dependencies between data or objects? Timing constraints? Report any problems. [inconsistency?]
– Q12.g: Overall design comments, based on your own experience, domain knowledge, and understanding. E.g. do the messages and their parameters make sense for this object? Are the stated conditions appropriate? Are all necessary attributes defined? Do the defined attributes/functions of a class make sense? Do the classes/attributes/functions have meaningful names? Are the class relationships reasonable and of the correct type (e.g. association vs. composition)? Report any problems. [incorrect fact?]

