Reliability and Safety Week 7 What can go wrong?


1 Reliability and Safety Week 7 What can go wrong?

2 Issues:
- Hardware Errors
- Software Errors
- Fault vs Error

3 Computer failure causes:
- Faulty design
- Sloppy implementation
- Careless or insufficiently trained users
- Poor user interfaces
- Hardware/Software malfunctions
- Specification errors
- Scope/Application inconsistency

4 Computer User's Perspective
- Should understand the limitations of computers
- Need for proper training
- Need for responsible use
- Difference between good products and bad ones

5 Computer Professional Perspective
- Study computer failures
- Study computer ethics

6 Educated Member of Society Perspective
- Help us evaluate the reliability and safety of various computer applications
- Help evaluate computer technology

7 Three Categories of Failures
- Problems for individuals
- System failures that affect large numbers of people or cost large amounts of money
- Problems in safety-critical applications

8 Problems for Individuals
- Billing Errors
  - Design and/or implementation of programs
  - Not enough care - input error
  - Not enough testing - reasonable range
  - Not enough training
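The "reasonable range" point can be illustrated with a plausibility check that holds an out-of-range bill for human review instead of sending it out. This is only a minimal sketch: the function name `check_bill_amount` and the bounds are hypothetical and do not come from the slides.

```python
from decimal import Decimal

# Hypothetical bounds for a monthly bill; real limits would have to come
# from the billing domain, not from this example.
MIN_AMOUNT = Decimal("0.00")
MAX_AMOUNT = Decimal("5000.00")


def check_bill_amount(amount: Decimal) -> Decimal:
    """Return the amount unchanged if it is plausible, else raise ValueError."""
    if not (MIN_AMOUNT <= amount <= MAX_AMOUNT):
        raise ValueError(
            f"bill amount {amount} is outside the reasonable range "
            f"[{MIN_AMOUNT}, {MAX_AMOUNT}]; hold for human review"
        )
    return amount
```

A check like this does not fix the input error itself; it only keeps an obviously wrong figure from being processed automatically.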

9 Database Accuracy Problems
- Info in database is not accurate
- Automatic entering of info - mistakes can be overlooked
- Copies of incorrect info can be in other systems
- Not knowledgeable enough about the system

10 Causes
- Large population
- Most of our financial interactions are with strangers
- Automated processing without human common sense
- Overconfidence in accuracy of data
- Lack of accountability

11 Consumer Hardware and Software
- Usually have more serious errors in their first releases
- Regularly sold with known bugs
- Hardware also has flaws
- Tradeoff between cost, debugging, and marketing
- Dishonesty, denials of problems, lack of adequate response to complaints

12 System Failures
- Lots of $$$$
- Complete shutdown of basic services
- Areas:
  - Communications
  - Business and financial systems
  - Military

13 WHY?
- Not enough testing
- Technical difficulties
- Poor management decisions
- Dishonesty in promoting the system and responding to problems

14 Communications
- Phone Service
- How Bad?
  - Pagers
  - Phone calls
  - 911
  - Communications for airports
  - Cellular phones

15 Business and financial systems
- Stock exchange
- ATM
- Contest by Pepsi
  - Too many winning tickets issued

16 Destroying Business
- Loss of sales
- Incorrect info affects business
- Dissatisfied customers
- Incorrect prices
- Loss of data

17 Military
- Data management
- Weapons system design
- Battle simulation
- Battle management
  - Command/control
  - Communications
  - Intelligence
- Nuclear war

18 Why?
- Not enough testing
- Technical difficulties
- Poor management decisions
- Dishonesty in promoting the system and responding to problems
- Results in delays and abandonment of projects

19 The Denver Airport baggage system
- Outbound luggage checked at ticket counters or curbside
  - To be delivered anywhere in <10 minutes
  - Via automated system of cars on tracks
  - Connecting flights or terminals
- Laser scanners
- Tracks - 4000 cars

20 Problems Encountered
- Cars crash into each other at intersections
- Luggage misrouted, dumped or flung
- Needed cars were idle or put to rest

21 Specific problems
- Real world problems
  - Scanners got dirty
  - Knocked out of alignment
- Software error
  - Rerouting of cars to waiting area - idle

22 Causes
- Time allowed for development and testing was insufficient
- Significant changes in specifications were made after project began
- Not enough debug time
- Poor management
- Unrealistic plan

23 Safety Critical Applications
- Use of computers is increasing rapidly in these areas
- Use of computers in these areas can save $
- Areas
  - Military Medical Applications
  - Power plants
  - Aircraft
  - Trains

24 Aircraft - Fly by Wire
- Pilots do not directly control plane
- Actions are input to computers that control the aircraft systems
- Pilot interaction is critical
- Need for easy way to override computers
- Easy transfer between automatic and manual control

25 Air Traffic Control
- Long delays
- Increased risk of collisions
- Old machines - computer systems
- Political - government spends $ elsewhere

26 Case Study - Therac-25
- Software controlled radiation therapy machine used to treat people with cancer
- Problems:
  - Massive overdoses administered
  - Repeated overdoses due to faulty display
  - Death
- Operated in dual machine mode - electron beam or x-ray photon beam

27 Why?
- Lapses in good safety design
- Insufficient testing
- Bugs in software that controlled machines
- Inadequate system of reporting and investigating accidents and deaths

28 Specific problems
- Some hardware safety features were eliminated in newer models
- Software reused from older systems was assumed correct
- Malfunctioned frequently
- Weakness in design of operator interface
- Inadequate explanation of error messages, if any

29 Specific problems continued
- Machine allowed one-key intervention versus automatic shutdown
- Inadequate documentation
- Poor test plan

30 Software Errors - bugs
- Fatal error was a simple fix
- Fixes are complex and expensive, and prevent use of the machine while they are made
- Bugs
  - Can be intermittent and hard to detect
  - Importance of self checking
  - Importance of using good programming techniques
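Why a bug can be intermittent and hard to detect is easier to see in a small example. The sketch below is a generic illustration, not the machine's actual code: a one-byte counter doubles as a "run the safety check" flag, so the check is silently skipped whenever the counter wraps around to zero. All names are hypothetical.

```python
def check_needed(counter: int) -> bool:
    """Flawed convention: a nonzero counter means 'run the safety check'."""
    return counter != 0


def skipped_passes(n_passes: int) -> list:
    """Return the pass numbers on which the safety check is silently skipped."""
    counter = 0
    skipped = []
    for pass_number in range(1, n_passes + 1):
        # Simulate an 8-bit increment: 255 + 1 wraps around to 0.
        counter = (counter + 1) % 256
        if not check_needed(counter):
            skipped.append(pass_number)
    return skipped
```

In this sketch the check is skipped only on every 256th pass, so short test runs would never see it. Setting the flag to a fixed nonzero value instead of incrementing it is a one-line change, which fits the point that a fatal error can have a simple fix.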

31 Overconfidence
- Leaving out changes that are necessary
- Ignoring error messages
- Not using backup devices (video or audio)

32 Conclusion and Perspective
- Irresponsibility leads to criminal charges
- Responsibility leads to merit awards
- Importance of good software development
- Consequences of carelessness, cutting corners, unprofessional work, or attempts to avoid responsibility
- Lack of appreciation for risks
- Poor training

33 Ways to prevent problems
- Good computer systems
- Good training
- Accountability
- Individual responsibility
- Management responsibility
- i.e., IEEE Code of Ethics

34 Increasing Reliability and Safety
- What goes wrong?
  - Many lines of code and many programmers
  - Problems are managerial, technical, social, legal, ethical

35 Overconfidence
- Unappreciative of risks
- Ignore warnings
- Don’t consult manuals

36 Professional Techniques
- Use good software engineering techniques at all stages of development:
  - Requirements
  - Specs
  - Design
  - Implementation
  - Documentation
  - Testing

37 Professional Techniques
- Study the techniques and tools available
- Know or learn enough about the application field and the software or systems being used

38 Why Study Failures?
- Provides technical lessons
- Leads to improved hardware and software products
- Provides ethical data
- Leads to improved ethical codes/laws

39 Lessons Learned
- Accidents are not the result of unknown scientific principles but rather a failure to apply well-known engineering practices
- Accidents will not be prevented by technological fixes alone; prevention requires control of all aspects of the development and operation of the system

40 Lessons Learned
- Software developers need to recognize the limitations of software, and use hardware safety mechanisms

41 Redundancy and Self-checking
- Redundancy - judging - expensive
- Complex systems collect information to diagnose and correct errors
- Audit trails are vital
- Detailed records help protect against theft and help trace and correct errors
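An audit trail of the kind described above can be sketched as an append-only log. The hash chaining used here to make later tampering detectable is an added assumption, not something the slides specify, and the class and field names are hypothetical.

```python
import hashlib
import json

_FIELDS = ("seq", "user", "action", "detail", "prev")


def _digest(body: dict) -> str:
    """Hash the record body in a stable (sorted-key) JSON encoding."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditTrail:
    """Append-only log in which each entry stores the hash of the previous one."""

    def __init__(self):
        self._entries = []

    def record(self, user: str, action: str, detail: str) -> None:
        prev = self._entries[-1]["hash"] if self._entries else ""
        body = {"seq": len(self._entries), "user": user,
                "action": action, "detail": detail, "prev": prev}
        self._entries.append({**body, "hash": _digest(body)})

    def verify(self) -> bool:
        """Return True only if no stored entry was altered or removed mid-chain."""
        prev = ""
        for entry in self._entries:
            body = {key: entry[key] for key in _FIELDS}
            if entry["prev"] != prev or entry["hash"] != _digest(body):
                return False
            prev = entry["hash"]
        return True
```

Each record names who did what, which supports tracing and correcting errors; the chain check supports the protection-against-theft point, since a quietly edited record no longer verifies.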

42 Redundancy and Self-checking
- Designed to constantly monitor itself and correct problems automatically
- Half of the computing power is devoted to checking the rest for errors
  - Closes off part of the system
  - Reroutes
  - Corrects problems and reroutes again
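One common form of redundancy is to run several independent units and accept only the value a strict majority agrees on. The slides do not name this technique; majority voting is used here purely as an illustration, and `majority_vote` is a hypothetical helper.

```python
from collections import Counter


def majority_vote(readings):
    """Return the value reported by a strict majority of redundant units.

    Raises ValueError for an empty input and RuntimeError when no value
    has a strict majority, so the caller can fall back to a safe state.
    """
    if not readings:
        raise ValueError("no readings supplied")
    value, count = Counter(readings).most_common(1)[0]
    if count * 2 <= len(readings):
        raise RuntimeError("redundant units disagree; no majority value")
    return value
```

With three units, one faulty reading is outvoted. The cost is tripled hardware or computation, which is the sense in which redundancy is expensive.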

43 TESTING
- CRITICAL!
- Principles and techniques exist
- Can use another company to perform independent verification and validation

44 Dangerous Tendencies
- Operators
  - Bypass check mechanisms through familiarity
- Technicians
  - Blame random mechanical or signal glitches rather than software
- Corporate Managers
  - Initially deny and ignore - then cover up
  - Finally - deal with expensive fixes

45 Overall Lessons Learned
- Should not declare problem understood with first hypothesis
- Should not expect management to follow through on field reports
- Overconfidence in software leads to economical marginal designs

46 Overall Lessons Learned
- Enforcement of software engineering practices is often abysmal
- Basing risk assessments on individual subsystems often leads to unrealistic optimism
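The optimism that comes from judging subsystems one at a time can be shown with simple arithmetic. The sketch below assumes that subsystems fail independently and that the system works only if every subsystem works; both are simplifying assumptions for illustration, and the figures are made up.

```python
import math


def system_reliability(subsystem_reliabilities):
    """Probability that all subsystems work, assuming independent failures."""
    return math.prod(subsystem_reliabilities)


# Twenty subsystems, each 99% reliable on its own.
combined = system_reliability([0.99] * 20)
```

Under these assumptions `combined` is roughly 0.82, so a system built from parts that each look 99% reliable fails almost one time in five.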

47 Lessons for systems engineering
- Hardware backups valuable
- Software must not be presumed innocent
- Software errors related can be indistinguishable
- Audit trails are critical
- Risk estimates are subjective
- User feedback is valuable

48 Lessons for software engineering
- Documentation should be on-going
- Designs should be kept simple
- Testing should be built into software
- Software must be tested out of system and in system
- Reuse of software should be tested like new software
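One way to read "testing should be built into software" is to ship a self-test alongside the code, so the same checks can run standalone and again once the code is integrated or reused. This is a minimal sketch with hypothetical names (`clamp`, `self_test`), not a prescribed method.

```python
def clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(value, high))


def self_test():
    """Run the built-in checks and return a list of (got, expected) failures."""
    checks = [
        (clamp(5, 0, 10), 5),    # value inside the range is unchanged
        (clamp(-3, 0, 10), 0),   # value below the range is raised to low
        (clamp(99, 0, 10), 10),  # value above the range is cut to high
    ]
    return [(got, expected) for got, expected in checks if got != expected]
```

A host system could call `self_test()` at startup and refuse to proceed if the returned list is not empty, which also gives reused code the same scrutiny as new code.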

49 Lessons for oversight
- Users are more likely to make initial observations than monitoring officials
- Users need reliable information in order to be maximally valuable

50 Laws and Regulations
- Criminal and Civil penalties
- Suits against company that designs or sells the system
- Criminal charges when fraud or criminal negligence occurs
- Need contracts
- Need well designed laws and standards

51 Regulation
- Requirement for approval by a government agency before a new product can be sold
  - Including specific testing requirements
- The profit motive can cause skimping on safety
- Better to abandon in some cases
- Inadequate abilities to judge by customer
- Hard to sue large companies

52 Regulation
- Expensive and time-consuming
- Newer procedures may not be enforced
- Lots of paperwork

53 Professional licensing
- Licensing of software development professionals to protect against poor quality and unethical behavior
  - Specific training
  - Passing competency exam
  - Ethical requirements
  - Continuing education

