University of Toronto Department of Computer Science © 2004-5 Steve Easterbrook. This presentation is available free for non-commercial use with attribution.

Slides:



Advertisements
Similar presentations
Integra Consult A/S Safety Assessment. Integra Consult A/S SAFETY ASSESSMENT Objective Objective –Demonstrate that an acceptable level of safety will.
Advertisements

Project What is a project A temporary endeavor undertaken to create a unique product, service or result.
HSE’s Ageing and Life Extension Key Programme (KP4) and Human Factors
Metrics for Process and Projects
Chapter 10 Schedule Your Schedule. Copyright 2004 by Pearson Education, Inc. Identifying And Scheduling Tasks The schedule from the Software Development.
F21DF1 : Databases & Information SystemsLachlan M. MacKinnon & Phil Trinder Introduction to Information Systems Databases & Information Systems Lachlan.
1 Independent Verification and Validation Current Status, Challenges, and Research Opportunities Dan McCaugherty IV&V Program Manager Titan Systems Corporation.
Reliability and Safety Lessons Learned. Ways to Prevent Problems Good computer systems Good computer systems Good training Good training Accountability.
Overview Lesson 10,11 - Software Quality Assurance
F29IF2 : Databases & Information Systems Lachlan M. MacKinnon The Domain of Information Systems Databases & Information Systems Lachlan M. MacKinnon.
Computer Engineering 203 R Smith Risk Management 7/ Risk Management The future can never be predicted with 100% accuracy. Failure to plan for risks.
SQM - 1DCS - ANULECTURE Software Quality Management Software Quality Management Processes V & V of Critical Software & Systems Ian Hirst.
Creator: ACSession No: 9 Slide No: 1Reviewer: SS CSE300Advanced Software EngineeringNovember 2005 Risk Management CSE300 Advanced Software Engineering.
University of Toronto Department of Computer Science CSC444 Lec03 1 Lecture 3: Project Management Project Management Planning Tools PERT charts, Gantt.
Title slide PIPELINE QRA SEMINAR. PIPELINE RISK ASSESSMENT INTRODUCTION TO RISK IDENTIFICATION 2.
Software Project Risk Management
Hazards Analysis & Risks Assessment By Sebastien A. Daleyden Vincent M. Goussen.
Software Evolution Planning CIS 376 Bruce R. Maxim UM-Dearborn.
System Software Integration Testing Mars Polar Lander Steven Ford SYSM /05/12.
S/W Project Management
Software Project Management Lecture # 8. Outline Chapter 25 – Risk Management  What is Risk Management  Risk Management Strategies  Software Risks.
University of Palestine software engineering department Testing of Software Systems Fundamentals of testing instructor: Tasneem Darwish.
Lecture 1 What is Modeling? What is Modeling? Creating a simplified version of reality Working with this version to understand or control some.
Topic 5 Understanding and learning from error. LEARNING OBJECTIVE Understand the nature of error and how health care can learn from error to improve patient.
Software Project Management Lecture # 8. Outline Earned Value Analysis (Chapter 24) Topics from Chapter 25.
SPEAKER : KAI-JIA CHANG ADVISER : QUINCY WU DATA : Spiral Model.
Project Tracking. Questions... Why should we track a project that is underway? What aspects of a project need tracking?
Software Safety CS3300 Fall Failures are costly ● Bhopal 1984 – 3000 dead and injured ● Therac – 6 dead ● Chernobyl / Three Mile.
Risk Management for Technology Projects Geography 463 : GIS Workshop May
ECE450 - Software Engineering II1 ECE450 – Software Engineering II Today: Risk.
1 Project Risk Management Project Risk Management Dr. Said Abu Jalala.
University of Toronto Department of Computer Science CSC444 Lec02 1 Lecture 2: Examples of Poor Engineering “Software Forensics” Case Studies Mars Pathfinder.
Chapter 12 Project Risk Management
Chapter 3 Project Management Details Tracking Project Progress Project Estimation Project Risk Analysis Project Organization RUP Project Management Workflow.
University of Toronto Department of Computer Science © Steve Easterbrook. This presentation is available free for non-commercial use with attribution.
Management & Development of Complex Projects Course Code MS Project Management Perform Qualitative Risk Analysis Lecture # 25.
© The McGraw-Hill Companies, Software Project Management 4th Edition Risk management Chapter 7.
Slide 1V&V 10/2002 Software Quality Assurance Dr. Linda H. Rosenberg Assistant Director For Information Sciences Goddard Space Flight Center, NASA
System Integration Testing Requirements Mars Polar Lander Steven Ford SYSM /11/12.
Building Dependable Distributed Systems Chapter 1 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
Telerik Software Academy Software Quality Assurance.
Pre-Project Components
Ch 10 - Risk Management Learning Objectives You should be able to: List and describe risk management processes, inputs, outputs, and tools List and describe.
1 Safety - definitions Accident - an unanticipated loss of life, injury, or other cost beyond a pre-determined threshhold.  If you expect it, it’s not.
SOFTWARE PROJECT MANAGEMENT
Software Architecture Evaluation Methodologies Presented By: Anthony Register.
SE&I Pre-Proposal Meeting GSFC - JPL Systems Engineering Management Colleen McGraw.
Chapter 6: THE EIGHT STEP PROCESS FOCUS: This chapter provides a description of the application of customer-driven project management.
Project management Topic 7 Controls. What is a control? Decision making activities – Planning – Monitor progress – Compare achievement with plan – Detect.
Software Project Management Lecture 5 Software Project Risk Management.
Information Technology Project Management Managing IT Project Risk.
1 Project Management C53PM Session 4 Russell Taylor Staff Work-base – 1 st Floor
Learning Outcomes Discuss current trends and issues in health care and nursing. Describe the essential elements of quality and safety in nursing and their.
MIS.
R i s k If you don’t attack risks, they will attack you.
Copy of the slides: (will also be put on the esse3 website)
Stoimen Stoimenov QA Engineer SitefinityLeads,SitefinityTeam6 Telerik QA Academy Telerik QA Academy.
Failure Modes, Effects and Criticality Analysis
ON “SOFTWARE ENGINEERING” SUBJECT TOPIC “RISK ANALYSIS AND MANAGEMENT” MASTER OF COMPUTER APPLICATION (5th Semester) Presented by: ANOOP GANGWAR SRMSCET,
PST Human Factors Jan Shaw Manchester Royal Infirmary CMFT.
Copy of the slides: (will also be put on the esse3 website)
Software Risk Management
Risk Management for Technology Projects
Software Processes (a)
Air Carrier Continuing Analysis and Surveillance System (CASS)
Software Project Management (SPM)
Knowing When to Stop: An Examination of Methods to Minimize the False Negative Risk of Automated Abort Triggers RAM XI Training Summit October 2018 Patrick.
Software Engineering for Safety: a Roadmap
Hazards Analysis & Risks Assessment
Quality Assurance in Clinical Trials
Presentation transcript:

University of Toronto Department of Computer Science © Steve Easterbrook. This presentation is available free for non-commercial use with attribution under a creative commons license. 1 Lecture 10: Risk Ü General ideas about Risk Ü Risk Management  Identifying Risks  Assessing Risks Ü Case Study:  Mars Polar Lander

University of Toronto Department of Computer Science © Steve Easterbrook. This presentation is available free for non-commercial use with attribution under a creative commons license. 2 Risk Management Ü About Risk  Risk is “the possibility of suffering loss”  Risk itself is not bad, it is essential to progress  The challenge is to manage the amount of risk Ü Two Parts:  Risk Assessment  Risk Control Ü Useful concepts:  For each risk: Risk Exposure  RE = p(unsat. outcome) X loss(unsat. outcome)  For each mitigation action: Risk Reduction Leverage  RRL = (RE before - RE after ) / cost of intervention

University of Toronto Department of Computer Science © Steve Easterbrook. This presentation is available free for non-commercial use with attribution under a creative commons license. 3 Principles of Risk Management Ü Global Perspective  View software in context of a larger system  For any opportunity, identify both:  Potential value  Potential impact of adverse results Ü Forward Looking View  Anticipate possible outcomes  Identify uncertainty  Manage resources accordingly Ü Open Communications  Free-flowing information at all project levels  Value the individual voice  Unique knowledge and insights Ü Integrated Management  Project management is risk management! Ü Continuous Process  Continually identify and manage risks  Maintain constant vigilance Ü Shared Product Vision  Everybody understands the mission  Common purpose  Collective responsibility  Shared ownership  Focus on results Ü Teamwork  Work cooperatively to achieve the common goal  Pool talent, skills and knowledge Source: Adapted from SEI Continuous Risk Management Guidebook

University of Toronto Department of Computer Science © Steve Easterbrook. This presentation is available free for non-commercial use with attribution under a creative commons license. 4 Continuous Risk Management Ü Identify:  Search for and locate risks before they become problems  Systematic techniques to discover risks Ü Analyse:  Transform risk data into decision- making information  For each risk, evaluate:  Impact  Probability  Timeframe  Classify and Prioritise Risks Ü Plan  Choose risk mitigation actions Ü Track  Monitor risk indicators  Reassess risks Ü Control  Correct for deviations from the risk mitigation plans Ü Communicate  Share information on current and emerging risks Source: Adapted from SEI Continuous Risk Management Guidebook

University of Toronto Department of Computer Science © Steve Easterbrook. This presentation is available free for non-commercial use with attribution under a creative commons license. 5 Fault Tree Analysis Wrong or inadequate treatment administered Vital signs erroneously reported as exceeding limits Vital signs exceed critical limits but not corrected in time Frequency of measurement too low Vital signs not reported Computer fails to raise alarm Nurse does not respond to alarm Computer does not read within required time limits Human sets frequency too low Sensor failure Nurse fails to input them or does so incorrectly etc Event that results from a combination of causes Basic fault event requiring no further elaboration Or-gate And-gate Source: Adapted from Leveson, “Safeware”, p321

University of Toronto Department of Computer Science © Steve Easterbrook. This presentation is available free for non-commercial use with attribution under a creative commons license. 6 Risk Assessment Ü Quantitative:  Measure risk exposure using standard cost & probability measures  Note: probabilities are rarely independent Ü Qualitative:  Develop a risk classification matrix  Eg one used by NASA:

University of Toronto Department of Computer Science © Steve Easterbrook. This presentation is available free for non-commercial use with attribution under a creative commons license. 7 Source: Adapted from Boehm, 1989 Top 10 Development Risks (+ Countermeasures) Ü Personnel Shortfalls  use top talent  team building  training Ü Unrealistic schedules/budgets  multisource estimation  designing to cost  requirements scrubbing Ü Developing the wrong Software functions  better requirements analysis  organizational/operational analysis Ü Developing the wrong User Interface  prototypes, scenarios, task analysis Ü Gold Plating  requirements scrubbing  cost benefit analysis  designing to cost Ü Continuing stream of requirements changes  high change threshold  information hiding  incremental development Ü Shortfalls in externally furnished components  early benchmarking  inspections, compatibility analysis Ü Shortfalls in externally performed tasks  pre-award audits  competitive designs Ü Real-time performance shortfalls  targeted analysis  simulations, benchmarks, models Ü Straining computer science capabilities  technical analysis  checking scientific literature

University of Toronto Department of Computer Science © Steve Easterbrook. This presentation is available free for non-commercial use with attribution under a creative commons license. 8 Case Study: Mars Polar Lander Ü Launched  3 Jan 1999 Ü Mission  Land near South Pole  Dig for water ice with a robotic arm Ü Fate:  Arrived 3 Dec 1999  No signal received after initial phase of descent Ü Cause:  Several candidate causes  Most likely is premature engine shutdown due to noise on leg sensors

University of Toronto Department of Computer Science © Steve Easterbrook. This presentation is available free for non-commercial use with attribution under a creative commons license. 9 What happened? Ü Investigation hampered by lack of data  spacecraft not designed to send telemetry during descent  This decision severely criticized by review boards Ü Possible causes:  Lander failed to separate from cruise stage (plausible but unlikely)  Landing site too steep (plausible)  Heatshield failed (plausible)  Loss of control due to dynamic effects (plausible)  Loss of control due to center-of- mass shift (plausible)  Premature Shutdown of Descent Engines (most likely!)  Parachute drapes over lander (plausible)  Backshell hits lander (plausible but unlikely)

University of Toronto Department of Computer Science © Steve Easterbrook. This presentation is available free for non-commercial use with attribution under a creative commons license. 10 Premature Shutdown Scenario Ü Cause of error  Magnetic sensor on each leg senses touchdown  Legs unfold at 1500m above surface  transient signals on touchdown sensors during unfolding  software accepts touchdown signals if they persist for 2 timeframes  transient signals likely to be long enough on at least one leg Ü Factors  System requirement to ignore the transient signals  But the software requirements did not describe the effect  s/w designers didn’t understand the effect, so didn’t implement the requirement  Engineers present at code inspection didn’t understand the effect  Not caught in testing because:  Unit testing didn’t include the transients  Sensors improperly wired during integration tests (no touchdown detected!)  Full test not repeated after re-wiring Ü Result of error  Engines shut down before spacecraft has landed  When engine shutdown s/w enabled, flags indicated touchdown already occurred  estimated at 40m above surface, travelling at 13 m/s  estimated impact velocity 22m/s (spacecraft would not survive this)  nominal touchdown velocity 2.4m/s

University of Toronto Department of Computer Science © Steve Easterbrook. This presentation is available free for non-commercial use with attribution under a creative commons license. 11 Figure 7-9. MPL System Requirements Mapping to Flight Software Requirements Adapted from the “Report of the Loss of the Mars Polar Lander and Deep Space 2 Missions -- JPL Special Review Board (Casani Report) - March 2000”. See

University of Toronto Department of Computer Science © Steve Easterbrook. This presentation is available free for non-commercial use with attribution under a creative commons license. 12 Learning the Right Lessons Ü Understand the Causality  Never a single cause; usually many complex interactions  Seek the set of conditions that are both necessary and sufficient…  …to cause the failure Ü Causal reasoning about failure is very subjective  Data collection methods may introduce bias  e.g. failure to ask the right people  e.g. failure to ask the right questions (or provide appropriate response modes)  Human tendency to over-simplify  e.g. blame the human operator  e.g. blame only the technical factors “In most of the major accidents of the past 25 years, technical information on how to prevent the accident was known, and often even implemented. But in each case… [this was] negated by organisational or managerial flaws.” (Leveson, Safeware)

University of Toronto Department of Computer Science © Steve Easterbrook. This presentation is available free for non-commercial use with attribution under a creative commons license. 13 Is there an existing “Safety Culture”? Ü Are overconfidence and complacency common?  the Titanic effect - “it can’t happen to us!”  Do managers assume it’s safe unless someone can prove otherwise? Ü Are warning signs routinely ignored?  What happens to diagnostic data during operations?  Does the organisation regularly collect data on anomalies?  Are all anomalies routinely investigated? Ü Is there an assumption that risk decreases?  E.g. Are successful missions used as an argument to cut safety margins? Ü Are the risk factors calculated correctly?  E.g. What assumptions are made about independence between risk factors? Ü Is there a culture of silence?  What is the experience of whistleblowers? (Can you even find any?)

University of Toronto Department of Computer Science © Steve Easterbrook. This presentation is available free for non-commercial use with attribution under a creative commons license. 14 Failure to manage risk Inadequate Margins Science (functionality) Fixed (growth) Schedule Fixed Cost Fixed Launch Vehicle Fixed (Some Relief) Risk Only variable Adapted from MPIAT - Mars Program Independent Assessment Team Summary Report, NASA JPL, March 14, See

University of Toronto Department of Computer Science © Steve Easterbrook. This presentation is available free for non-commercial use with attribution under a creative commons license. 15 Summary Ü Risk Management is a systematic activity  Requires both technical and management attention  Requires system-level view  Should continue throughout a project Ü Techniques exist to identify and assess risks  E.g. fault tree analysis  E.g. Risk assessment matrix Ü Risk and Requirements Engineering  Risk analysis can uncover new requirements  Especially for safety-critical or security-critical applications  Risk analysis can uncover feasibility concerns  Risk analysis will assist in appropriate management action