Reliability and Safety


1 Reliability and Safety
Week 7: What can go wrong?

2 Issues
- Hardware errors
- Software errors
- Fault vs. error

3 Causes of Computer Failure
- Faulty design
- Sloppy implementation
- Careless or insufficiently trained users
- Poor user interfaces
- Hardware/software malfunctions
- Specification errors
- Scope/application inconsistency

4 Computer User's Perspective
- Should understand the limitations of computers
- Need for proper training
- Need for responsible use
- Should know the difference between good products and bad ones

5 Computer Professional's Perspective
- Study computer failures
- Study computer ethics

6 Educated Member of Society's Perspective
- Helps us evaluate the reliability and safety of various computer applications
- Helps us evaluate computer technology

7 Three Categories of Failures
- Problems for individuals
- System failures that affect large numbers of people or cost large amounts of money
- Problems in safety-critical applications

8 Problems for Individuals
Billing errors, caused by:
- Design and/or implementation of programs
- Not enough care: input errors
- Not enough testing: no checks that values fall within a reasonable range
- Not enough training
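Slide 8 attributes some billing errors to programs that never check whether a value falls in a reasonable range. The Python sketch below shows one such check. The function name `validate_bill`, the dollar bounds, and the policy of holding suspicious bills for human review are all hypothetical choices made for illustration. This is not a description of any real billing system.

```python
from decimal import Decimal

# Hypothetical bounds for a typical household bill; a real system would
# derive them from the customer's history or from business rules.
MIN_REASONABLE = Decimal("0.00")
MAX_REASONABLE = Decimal("2000.00")


def validate_bill(customer_id: str, amount: Decimal) -> list[str]:
    """Return a list of problems; an empty list means the amount looks sane."""
    problems = []
    if amount < MIN_REASONABLE:
        problems.append(f"{customer_id}: negative amount {amount}")
    elif amount > MAX_REASONABLE:
        problems.append(
            f"{customer_id}: amount {amount} exceeds {MAX_REASONABLE}; "
            "hold for human review"
        )
    return problems


if __name__ == "__main__":
    bills = {"C-001": Decimal("84.20"), "C-002": Decimal("6300000.00")}
    for customer, amount in bills.items():
        for problem in validate_bill(customer, amount):
            print(problem)
```

Holding the bill instead of mailing it is a deliberate design choice. It puts back the human common sense that slide 10 says automated processing lacks. Using `Decimal` rather than `float` avoids rounding surprises in money amounts.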

9 Database Accuracy Problems
- Information in databases is not accurate
- Information is entered automatically, so mistakes can be overlooked
- Copies of incorrect information can be sent to other systems
- Users are not knowledgeable enough about the system

10 Causes
- Large population
- Most of our financial interactions are with strangers
- Automated processing without human common sense
- Overconfidence in the accuracy of data
- Lack of accountability

11 Consumer Hardware and Software
- Software usually has more serious errors in its first releases
- Products are regularly sold with known bugs
- Hardware also has flaws
- Tradeoff between cost, debugging, and marketing
- Dishonesty, denial of problems, and lack of adequate response to complaints

12 System Failures
- Cost large amounts of money ($$$$)
- Complete shutdown of basic services
- Areas: communications, business and financial systems, military

13 Why?
- Not enough testing
- Technical difficulties
- Poor management decisions
- Dishonesty in promoting the system and responding to problems

14 Communications
- Phone service (how bad?): pagers, phone calls, 911
- Communications for airports
- Cellular phones

15 Business and Financial Systems
- Stock exchanges
- ATMs
- Pepsi contest: too many winning tickets issued

16 Destroying Businesses
- Loss of sales
- Incorrect information affects business
- Dissatisfied customers
- Incorrect prices
- Loss of data

17 Military
- Data management
- Weapons system design
- Battle simulation
- Battle management: command/control, communications, intelligence
- Nuclear war

18 Why?
- Not enough testing
- Technical difficulties
- Poor management decisions
- Dishonesty in promoting the system and responding to problems
Result: delays and abandonment of projects. Heard this before?

19 The Denver Airport Baggage System
- Outbound luggage checked at ticket counters or curbside was to be delivered anywhere in under 10 minutes
- Delivery was by an automated system of cars on tracks, to connecting flights or other terminals
- Laser scanners tracked the cars

20 Problems Encountered
- Cars crashed into each other at intersections
- Luggage was misrouted, dumped, or flung
- Cars that were needed sat idle or were parked

21 Specific Problems
- Real-world problems: scanners got dirty or were knocked out of alignment
- Software error: cars were rerouted to a waiting area, where they sat idle

22 Causes
- Time allowed for development and testing was insufficient
- Significant changes in specifications were made after the project began
- Not enough debugging time
- Poor management
- Unrealistic plan

23 Safety-Critical Applications
- Use of computers is increasing rapidly in these areas
- Use of computers in these areas can save money
- Areas: military, medical applications, power plants, aircraft, trains, automobiles

24 Aircraft: Fly-by-Wire
- Pilots do not directly control the plane
- Pilots' actions are input to computers that control the aircraft systems
- Pilot interaction is critical
- Need for an easy way to override the computers
- Need for easy transfer between automatic and manual control
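The override requirements on slide 24 can be made concrete with a small arbiter that decides whether the autopilot's command or the pilot's command is passed on. This is a toy sketch under stated assumptions. The class `ControlArbiter`, the normalized stick values, and the 0.1 override threshold are hypothetical. Real flight-control software involves far more, such as redundant channels, control laws, and certification.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    AUTOMATIC = "automatic"
    MANUAL = "manual"


@dataclass
class ControlArbiter:
    """Hypothetical arbiter that decides whose command reaches the actuators."""

    # Stick deflection (normalized to -1..1) treated as deliberate pilot input.
    override_threshold: float = 0.1
    mode: Mode = Mode.AUTOMATIC

    def select_command(self, autopilot_cmd: float, pilot_cmd: float) -> float:
        # Any deliberate pilot input immediately hands control to the pilot;
        # no confirmation step stands in the way.
        if abs(pilot_cmd) >= self.override_threshold:
            self.mode = Mode.MANUAL
        if self.mode is Mode.MANUAL:
            return pilot_cmd
        return autopilot_cmd

    def engage_automatic(self) -> None:
        # Returning to automatic control is a separate, explicit action,
        # so the computer never silently takes control back.
        self.mode = Mode.AUTOMATIC
```

For example, `ControlArbiter().select_command(0.3, -0.5)` returns the pilot's `-0.5` and leaves the arbiter in manual mode. The asymmetry is intentional. Handing control to the pilot takes no extra step, while handing it back to the computer requires an explicit `engage_automatic()` call.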

25 Air Traffic Control
- Long delays
- Increased risk of collision
- Political: the government spends money elsewhere

26 Case Study: Therac-25
- Software-controlled radiation therapy machine used to treat people with cancer
- Dual-mode machine: electron beam or X-ray photon beam
- Problems: massive overdoses administered; repeated overdoses due to a faulty display; deaths

27 Why?
- Lapses in good safety design
- Insufficient testing
- Bugs in the software that controlled the machines
- Inadequate system for reporting and investigating accidents and deaths

28 Specific Problems
- Some hardware safety features were eliminated in newer models
- Software reused from older systems was assumed to be correct
- The machine malfunctioned frequently
- Weaknesses in the design of the operator interface
- Inadequate explanation of error messages, if any

29 Specific Problems (continued)
- The machine allowed one-key intervention (resuming with a single keystroke) rather than automatic shutdown
- Inadequate documentation
- Poor test plan
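Slide 29 contrasts one-key intervention with automatic shutdown. The sketch below illustrates the second approach as a latched interlock. A reported fault turns the beam off, and only a separate, logged reset by a named supervisor clears it. The class and method names are hypothetical. This is not a reconstruction of the actual Therac-25 software.

```python
from typing import Optional


class InterlockTripped(Exception):
    """Raised when the beam is requested while a safety fault is latched."""


class TreatmentController:
    """Hypothetical controller in which a fault shuts the beam off and stays latched."""

    def __init__(self) -> None:
        self.beam_on = False
        self.fault: Optional[str] = None
        self.log: list[str] = []

    def report_fault(self, description: str) -> None:
        # Automatic shutdown: the software turns the beam off itself
        # instead of asking the operator whether to continue.
        self.beam_on = False
        self.fault = description
        self.log.append(f"FAULT: {description}")

    def start_beam(self) -> None:
        # While a fault is latched, no keystroke can turn the beam on.
        if self.fault is not None:
            raise InterlockTripped(f"fault still latched: {self.fault}")
        self.beam_on = True
        self.log.append("beam on")

    def reset_fault(self, supervisor_id: str, reason: str) -> None:
        # Clearing a fault is a separate, recorded action by a named person,
        # not a single key press at the treatment console.
        if not supervisor_id or not reason:
            raise ValueError("reset requires a supervisor ID and a reason")
        self.log.append(f"RESET by {supervisor_id}: {reason}")
        self.fault = None
```

Because the fault stays latched, an operator who has grown used to frequent malfunctions cannot simply press a key and carry on. The log also provides the kind of audit trail that slide 47 calls critical.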

30 Software Errors (Bugs)
- The fatal error was simple to fix
- In general, fixes can be complex and expensive, and they prevent use of the machine while it is being fixed
- Bugs can be intermittent and hard to detect
- Importance of self-checking
- Importance of using good programming techniques

31 Overconfidence
- Leaving out necessary changes
- Ignoring error messages
- Not using backup devices (video or audio)

32 Conclusion and Perspective
- Irresponsibility can lead to criminal charges
- Responsibility can lead to merit awards
- Importance of good software development
- Consequences of carelessness, cutting corners, unprofessional work, or attempts to avoid responsibility
- Lack of appreciation for risks
- Poor training

33 Ways to Prevent Problems
- Good computer systems
- Good training
- Accountability: individual responsibility and management responsibility
- IEEE Code of Ethics

34 Increasing Reliability and Safety
- What goes wrong? Many lines of code and many programmers
- Problems are managerial, technical, social, legal, and ethical

35 Overconfidence
- Failing to appreciate risks
- Ignoring warnings
- Not consulting manuals

36 Professional Techniques
Use good software engineering techniques at all stages of development:
- Requirements
- Specifications
- Design
- Implementation
- Documentation
- Testing (verification and validation, V&V)

37 Professional Techniques
- Study the techniques and tools available
- Know or learn enough about the application field and the software or systems being used (domain knowledge)

38 Why Study Failures?
- They provide technical lessons, which lead to improved hardware and software products
- They provide ethical data, which lead to improved ethical codes and laws

39 Lessons Learned
- Accidents are not the result of unknown scientific principles but rather of a failure to apply well-known engineering practices
- Accidents will not be prevented by technological fixes alone; prevention requires control of all aspects of the development and operation of the system

40 Lessons Learned
- Software developers need to recognize the limitations of software and use hardware safety mechanisms

41 Redundancy and Self-Checking
- Redundancy with judging (comparing the results of redundant components) is expensive
- Complex systems collect information to diagnose and correct errors
- Audit trails are vital
- Detailed records help protect against theft and help trace and correct errors
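One common form of the redundancy with judging on slide 41 is majority voting. Several independent units compute the same result, and a voter accepts the value that most of them agree on. Triple modular redundancy uses three units, which is part of why it is expensive. The sketch below also writes every disagreement to an audit trail. The function name and log format are assumptions for illustration. A voter for analog sensor readings would compare within a tolerance rather than test exact equality.

```python
from collections import Counter
from datetime import datetime, timezone


def majority_vote(readings: list[int], audit_log: list[str]) -> int:
    """Return the value reported by a strict majority of redundant units.

    Every disagreement is written to the audit log, so a faulty unit can
    be traced and repaired even though the vote masked its error.
    """
    if not readings:
        raise ValueError("no readings to vote on")
    stamp = datetime.now(timezone.utc).isoformat()
    value, count = Counter(readings).most_common(1)[0]
    if count <= len(readings) // 2:
        # No strict majority: the judge cannot decide, so fail loudly.
        audit_log.append(f"{stamp} no majority among {readings}")
        raise RuntimeError(f"no majority among {readings}")
    for unit, reading in enumerate(readings):
        if reading != value:
            audit_log.append(f"{stamp} unit {unit} disagreed: {reading} != {value}")
    return value
```

With readings `[7, 9, 7]`, the vote returns `7` and logs that unit 1 disagreed. The faulty unit can therefore be found and fixed even though its error never reached the output.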

42 Redundancy and Self-Checking
- Systems designed to constantly monitor themselves and correct problems automatically
- Half of the computing power is devoted to checking the rest for errors
- When a problem is found, the system closes off part of itself, reroutes around it, corrects the problem, and reroutes again
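Slide 42 describes a system that checks itself, closes off a failing part, and reroutes around it. Once the part is repaired, the system reroutes again. The toy sketch below models that cycle. The class `SelfCheckingRouter`, the per-unit health-check functions, and the round-robin routing are illustrative assumptions. They do not describe how any particular telephone switch or other real system is built.

```python
from typing import Callable


class SelfCheckingRouter:
    """Hypothetical router that checks its own units and routes around failures."""

    def __init__(self, units: dict[str, Callable[[], bool]]) -> None:
        # Each unit is represented only by a health-check function that
        # returns True when the unit is working.
        self.health_checks = units
        self.in_service = set(units)

    def run_checks(self) -> set[str]:
        """Take failing units out of service; return the ones just closed off."""
        closed = set()
        for name in sorted(self.in_service):  # iterate over a copy
            if not self.health_checks[name]():
                self.in_service.discard(name)
                closed.add(name)
        return closed

    def restore(self, name: str) -> None:
        """Put a repaired unit back into service once it passes its check."""
        if self.health_checks[name]():
            self.in_service.add(name)

    def route(self, call_id: int) -> str:
        """Pick a healthy unit for a call (round-robin over healthy units)."""
        healthy = sorted(self.in_service)
        if not healthy:
            raise RuntimeError("no healthy units available")
        return healthy[call_id % len(healthy)]
```

A real system would run the checks continuously in the background. Here `run_checks()` is called explicitly, which keeps the close-off, reroute, and restore steps easy to follow.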

43 Testing Is Critical!
- Principles and techniques exist
- Another company can be used to perform independent verification and validation (IV&V)

44 Dangerous Tendencies
- Operators: bypass check mechanisms through familiarity
- Technicians: blame random mechanical or signal glitches rather than software
- Corporate managers: initially deny and ignore, then cover up, and finally deal with expensive fixes

45 Overall Lessons Learned
- Should not declare a problem understood on the basis of the first hypothesis
- Should not expect management to follow through on field reports
- Overconfidence in software leads to economically marginal designs

46 Overall Lessons Learned
- Enforcement of software engineering practices is often abysmal
- Basing risk assessments on individual subsystems often leads to unrealistic optimism

47 Lessons for Systems Engineering
- Hardware backups are valuable
- Software must not be presumed innocent
- Audit trails are critical
- Risk estimates are subjective
- User feedback is valuable

48 Lessons for Software Engineering
- Documentation should be ongoing
- Designs should be kept simple
- Testing should be built into software
- Software must be tested both out of the system and in the system
- Reused software should be tested like new software
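Slide 48 asks for testing built into the software and for reused software to be tested like new software. One simple way to do both is a known-answer self-test that runs every time the system starts. In the sketch below, `celsius_to_fahrenheit` stands in for a hypothetical routine reused from an older system. The known answers are values that can be checked by hand.

```python
import math


def celsius_to_fahrenheit(c: float) -> float:
    """Hypothetical routine reused from an older system."""
    return c * 9.0 / 5.0 + 32.0


# Known-answer tests: inputs paired with results verified independently.
KNOWN_ANSWERS = [(0.0, 32.0), (100.0, 212.0), (-40.0, -40.0)]


def self_test() -> list[str]:
    """Run the reused routine against every known answer; return any failures."""
    failures = []
    for given, expected in KNOWN_ANSWERS:
        got = celsius_to_fahrenheit(given)
        if not math.isclose(got, expected, abs_tol=1e-9):
            failures.append(f"f({given}) = {got}, expected {expected}")
    return failures


def start_system() -> None:
    # Refuse to start if the built-in test fails, rather than trusting old code.
    failures = self_test()
    if failures:
        raise SystemExit("self-test failed: " + "; ".join(failures))
    print("self-test passed; system starting")
```

If a later change breaks the reused routine, the system refuses to start instead of running with a wrong result. This is the opposite of assuming that software reused from older systems is correct.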

49 Lessons for Oversight
- Users are more likely than monitoring officials to make the initial observations of problems
- Users need reliable information

50 Laws and Regulations
- Criminal and civil penalties
  - Suits against the company that designs or sells the system
  - Criminal charges when fraud or criminal negligence occurs
- Need for contracts
- Need for well-designed laws and standards

51 Regulation
- Requirement for approval by a government agency before a new product can be sold, including specific testing requirements
- The profit motive causes skimping on safety and testing
- In some cases it is better to abandon a product
- Customers lack adequate ability to judge
- Large companies are hard to sue

52 Regulation
- Expensive and time-consuming
- Newer procedures may not be enforced
- Lots of paperwork

53 Professional Licensing
Licensing of software development professionals to protect against poor quality and unethical behavior:
- Specific training
- Passing a competency exam
- Ethical requirements
- Continuing education

