EN.600.424 Lecture Notes Spring 2016 FUNDAMENTALS OF SECURE DESIGN (SOFTWARE)

SECURITY AND RELIABILITY
- A reliable system is not necessarily secure
- However, it is highly unlikely that an unreliable system is secure
  - Attacks are often payloads piggy-backed on vulnerabilities
- Moreover, many of the principles of building a reliable system apply
  - Testing
  - Adversarial viewpoint

CASE STUDY – THERAC 25
- Computer-controlled medical radiation therapy machine
- Between 6/’85 and 1/’87, it overdosed 6 people (3 died)
- The problems were primarily software failures

ATOMIC ENERGY OF CANADA LIMITED (AECL)
- In conjunction with a company named CGR, built in the early ’70s:
  - The Therac-6
  - The Therac-20
- Afterwards, and on its own, built the Therac-25 between ’76 and ’82

RELATIONSHIP BETWEEN THE THERACS
- The Therac-6 and Therac-20:
  - Stand-alone machines with software added for convenience
  - Hardware safety locks
- The Therac-25:
  - Software built in from the beginning (although derived from the 6 and 20)
  - Some hardware safety locks replaced with software safety locks

THE FIRST ACCIDENT: JUNE ’85 IN GEORGIA
- 61-year-old woman received radiation treatment after a lumpectomy
- Felt heat during the treatment and told the tech, “you burned me”
- Nobody believed her; no action was taken
- Shortly after, the area where she was “treated” became red and swollen
- Two weeks later, reddening appeared on her back, as if a burn had gone through her
- Skin began to fall off
- A physicist later estimated she received one or two doses of 15k-20k rads
  - Typical single doses are in the 200 rad range
  - 500 rads to the whole body will kill 50% of people

FIRST ACCIDENT AFTERMATH
- Woman had her breast removed
- Shoulder and arm were paralyzed
- Constant pain
- Corporate/regulatory response:
  - Lawsuit settled out of court
  - Accident not reported to the FDA until after other accidents
  - Other Therac-25 users not informed

THE SECOND ACCIDENT: JULY ’85 IN ONTARIO
- 40-year-old woman received her 24th treatment for cervical cancer
- Tech tried to give the dose, but the machine shut down
  - “NO DOSE”
  - “Treatment Paused”
- Tried to give the dose 5 times before the machine suspended
- NOTE: techs frequently saw malfunctions like this, with no apparent harm
- Patient complained about a burning sensation in her hip

SECOND ACCIDENT AFTERMATH
- Woman died from the cancer
- But it was determined that, had she lived, she would have needed a hip replacement
- AECL tech later estimated she received 13k-17k rads
- Corporate response:
  - FDA, users, and others were told there was a problem and to visually inspect the “turntable”
  - Investigated the problem, assumed it was with the turntable, and “fixed” it
  - However, AECL fully admitted it could not reproduce the failure or be sure of the cause
  - Still, claimed the machine was better by “five orders of magnitude”
  - Was told to reduce the number of NO DOSE retries allowed; DID NOT!
  - Was asked to install an independent turntable safety mechanism; DID NOT!

THIRD ACCIDENT: DEC ’85 IN WASHINGTON
- Not reported until a second incident later
- The damage was much smaller and the patient lived
- AECL said it could not be the fault of the Therac-25 because it was “fixed”
- Hospital wasn’t told about other failures and assumed that the Therac-25 had a good record!

FOURTH ACCIDENT: MAR ’86 IN TEXAS
- Male patient came for his 9th treatment after removal of a tumor from his back
- Tech, in a separate room, quickly entered/corrected values and started treatment
- Got a weird error (internal error!) and a pause; she hit “proceed”
- Turned out the first “error” had sent him a huge dose; he got up to get help
- The “proceed” sent a second dose as he was getting up (into his arm)
- He pounded on the door to stop the procedure
- Estimated he received between 16.5k and 25k rads
- His entire body was damaged and he died 5 months later
- AECL tech came the next day and said the machine CAN’T overdose
- Also said there were no reports of overdosing patients (!!!!!!)

FIFTH ACCIDENT: APRIL ’86 IN TEXAS
- Three weeks later, in the same hospital (and with the same tech!) as the previous accident
- Another male patient, getting treatment for skin cancer
- Again, the tech entered and corrected values before starting treatment
- This time, the intercom was working and she heard the machine making an unusual buzzing
- She rushed in; the patient was moaning
- He said he felt fire on the side of his face
- Saw a flash of light and heard sizzling like frying eggs
- Patient died three weeks later from radiation overdose to the brain

HOSPITAL INVESTIGATION
- Physicist and tech now knew for sure something was wrong, despite AECL’s claims
- Began their own investigation and eventually reproduced the error
- Determined that the error occurred when the data was entered quickly
  - The tech was very fast and could do it
  - The physicist needed practice before he could enter it fast enough
- AECL couldn’t recreate it without help from the physicist
- When they finally did, they measured the dose at 25k rads

SIXTH ACCIDENT: JAN ’87 IN WASHINGTON
- Same hospital as the third accident
- Patient was to receive 86 rads
- Machine again paused and the tech pressed “proceed”
- Now the patient complained of a burning sensation
- The console said 7 rads, but it was later determined to be between 8k and 10k rads
- It was determined that the electron beam came on in the “field light” position

RACE CONDITION BUG
- Real-time operating system gathers treatment details from the UI
- Setting the bending magnets takes 8 seconds
- Checks for data edits (again, in real time) while it is setting the magnets
- However, a cleared variable means subsequent edits are not recorded (but still show up in the UI)
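The effect of the cleared variable can be sketched with a small, deterministic simulation. This is only an illustration of the failure pattern, not the Therac-25 code: the function `run_setup`, the mode names, and the one-step-per-second model are all assumptions made for this example.

```python
def run_setup(edit_at_step, steps=8, buggy=True):
    """Return (mode shown on screen, mode the hardware is set to).

    Hypothetical model: magnet setup takes `steps` seconds, and the
    operator edits the mode from "xray" to "electron" at `edit_at_step`.
    """
    screen_mode = "xray"
    hardware_mode = screen_mode      # snapshot taken when magnet setup begins
    watching_for_edits = True        # flag meaning "still looking for edits"

    for step in range(steps):        # one step per second of magnet setup
        if step == edit_at_step:
            screen_mode = "electron"             # the UI always shows the edit
            if watching_for_edits:
                hardware_mode = screen_mode      # edit noticed and applied
        if buggy:
            watching_for_edits = False           # flag cleared after the first pass

    return screen_mode, hardware_mode


# A fast operator who edits a few seconds in sees one mode; the hardware has another.
print(run_setup(edit_at_step=3))
print(run_setup(edit_at_step=3, buggy=False))
```

The point of the sketch is that the screen and the hardware disagree only when the edit lands after the flag is cleared, which is why only a fast operator could trigger it.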

OVERFLOW BUG
- Error-checking and integrity-checking code protects the software
- One variable triggered a check whenever its value was non-zero
- But the variable was only 8 bits wide
- Every 256th pass, it would overflow back to zero
- If the tech hit “set” when this overflow happened, the check was skipped, allowing full, maximum exposure
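The arithmetic behind the overflow is easy to reproduce. The sketch below is a simplified model, assuming the flag is incremented once per pass and masked to 8 bits; the function name `count_skipped_checks` is made up for this example.

```python
def count_skipped_checks(total_passes: int) -> int:
    """Count how often an 8-bit 'run the check' flag wraps to zero."""
    flag = 0
    skipped = 0
    for _ in range(total_passes):
        flag = (flag + 1) & 0xFF     # 8-bit wraparound: 255 + 1 becomes 0
        if flag != 0:
            pass                     # the safety check would run here
        else:
            skipped += 1             # flag is zero, so the check is silently skipped
    return skipped


print(count_skipped_checks(255))   # no wraparound yet
print(count_skipped_checks(256))   # the 256th pass skips the check
```

One way to avoid this class of bug is to assign a fixed non-zero value to the flag instead of incrementing it, so the value can never wrap.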

ANALYSIS OF CAUSES
- Overconfidence in software
- Confusing reliability with safety
- Lack of defensive design (I call it “adversarial”)
- Failure to eliminate root causes
- Complacency
- Unrealistic risk assessment
- Inadequate investigations
- Inadequate software engineering practices
- Software reuse
- Safe versus user friendly

SOFTWARE ENGINEERING PRACTICES
- Software specifications and documentation should not be an afterthought
- Rigorous software QA practices and standards
- Designs should be simple, and dangerous coding practices avoided
- Software has to be designed to be testable!
- Auditing and error detection should be designed in from the start
- Extensive testing and formal analysis
- UI needs to be carefully designed
  - (Users need to understand, for example, error messages)

UNDERSTANDING FAILURES
- Everything fails. Everything.
- Don’t be like AECL (“It can’t fail that way…”)
  - I recently had engineers of a client say the exact same things
  - They couldn’t understand why I thought their software would fail
- How will your software fail?
  - You have to ensure that you fail safely
  - Some failures can never be tolerated; those features may need to be removed
- Related: Make sure you use fail-safe defaults

PLANNING FOR FAILURES (FAIL SAFELY)
- On failure, restore to a secure state (preserve safe configuration)
  - Always check return values
  - Always include a safe default on conditional checks
- Preserve confidentiality/integrity even when availability is lost
  - For example, C++ exceptions do better than most C runtime errors
- Ensure that failures do not alter access controls and other safety features
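A minimal sketch of these points, assuming a hypothetical device interface: `Beam`, `select_beam`, `Controller`, and the `write` callable (which returns True on success) are all invented for illustration.

```python
import enum


class Beam(enum.Enum):
    OFF = "off"
    LOW = "low"
    HIGH = "high"


def select_beam(requested: str) -> Beam:
    """Map operator input to a setting, with a safe default branch."""
    if requested == "low":
        return Beam.LOW
    elif requested == "high":
        return Beam.HIGH
    else:
        return Beam.OFF          # unknown input falls through to the safe value


class Controller:
    def __init__(self, write):
        self._write = write      # hypothetical device call; returns True on success
        self.state = Beam.OFF

    def request(self, requested: str) -> Beam:
        beam = select_beam(requested)
        try:
            ok = self._write(beam)   # always check the return value
        except Exception:
            ok = False               # an exception counts as a failure too
        # On any failure, fall back to the safe state instead of guessing.
        self.state = beam if ok else Beam.OFF
        return self.state
```

With this shape, `Controller(lambda beam: False).request("high")` leaves the controller in `Beam.OFF` rather than in an unknown state.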

SPECIAL: FAIL SAFE DEFAULTS
- For secure systems, deny by default!
- Access is based on permissions rather than on exclusions
- For example, firewalls should block everything by default
- Guest access should be disabled by default
- Router defaults are horrible
  - Default should be inoperable until passwords are changed
  - Another example of security v. user friendliness
  - (On the other hand, it’s still a business decision…)
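Deny-by-default can be sketched as an allow list: access is granted only when an explicit permission exists, and everything else is refused. The users, actions, and the `is_allowed` helper below are hypothetical.

```python
# Explicit permissions; anything not listed here is denied.
ALLOW_RULES = {
    ("alice", "read"),
    ("alice", "write"),
    ("bob", "read"),
}


def is_allowed(user: str, action: str, rules=ALLOW_RULES) -> bool:
    """Grant access only on an exact match in the allow list."""
    return (user, action) in rules


print(is_allowed("bob", "read"))     # listed, so permitted
print(is_allowed("guest", "read"))   # not listed, so denied
```

The opposite design, a block list of known-bad users, fails open: anyone the list forgot to mention gets in.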

RELATIONSHIP TO OTHER PRINCIPLES
- Least Privilege:
  - In systems where least privilege is followed, failures tend not to expose privileges
  - Also, the error handling system should only have access to error information
- Minimal Attack Surface:
  - Error handling code needs to be minimal and simple
  - Also, write code so that error paths are forced by the language to revert to a safe state
    - In Python, for example, always open files using the “with” construct
  - Consider wrapping ultra-critical functions in a second layer that does error handling
    - In C++, you can write special “Smart Pointers” that enforce safety
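The Python “with” point can be shown in a few lines. In this sketch, the helper names and the simulated failure are assumptions; the behavior being demonstrated is that the file is closed even though an exception interrupts the write.

```python
import os
import tempfile


def closed_after_error(path: str) -> bool:
    """Fail in the middle of a write and report whether the file got closed."""
    handle = None
    try:
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("partial")
            raise RuntimeError("simulated failure mid-write")
    except RuntimeError:
        pass                     # the error path still ran the cleanup
    return handle.closed


def demo() -> bool:
    # Use a throwaway directory so the example leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        return closed_after_error(os.path.join(tmp, "record.txt"))


print(demo())
```

The cleanup is enforced by the language rather than by a `close()` call that an error path might skip.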

CONCRETE FAILURE PROPOSAL
- You and your team should come up with your own failure strategy
- I propose this as a starting point:
  - Take your “Attack Tree” for your PLAYGROUND node
  - Identify all software failure nodes
  - Determine:
    - Which failures should just be eliminated (remove a feature)
    - For remaining failures, how to make the failure safer
  - Also, identify all “default” values. Disallow any defaults that enable an attack
- Review your “Failure Safety” plan any time you prepare to change the software

SPEAKING OF CHANGING SOFTWARE
- I’m assuming you and your group will use an appropriate design cycle
- You should have a requirements-design-implementation-test-repeat plan
- You also need policies such as:
  - “No code check-ins without a walk-through”
  - “No code check-ins without running regression tests”
  - “New features require a ‘failure safety’ review”
- I’m not going to tell you how to do this; please come up with a plan

TESTING
- Testing needs to be designed in from the start
- You will notice that the PLAYGROUND framework does not have tests
  - First of all, this code is not designed to be secure or safe
  - Second of all, it is experimental and under development
  - Third of all, I want different groups to try different approaches
- It’s hard to test from the start when you are “experimenting”
- Try a “2 system” approach
  - Prototype once for feasibility
  - Re-implement with appropriate testing

UNIT TESTING
- Create unit tests for each unit
  - Every public method should be tested
  - For inherited classes, you can create inherited test classes
- Test:
  - Boundary conditions
  - Special cases
  - Negative tests (test failures!) and fault injection
  - Representative cases
  - Known answer tests
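As a sketch of the test categories listed above, here is a small `unittest` suite for a made-up function. `clamp_dose`, its 200-unit limit, and the test names are assumptions chosen only to give each category one concrete case.

```python
import unittest


def clamp_dose(requested: int, maximum: int = 200) -> int:
    """Hypothetical unit under test: cap a requested dose at a maximum."""
    if requested < 0:
        raise ValueError("dose cannot be negative")
    return min(requested, maximum)


class ClampDoseTest(unittest.TestCase):
    def test_boundary_conditions(self):
        self.assertEqual(clamp_dose(200), 200)   # exactly at the limit
        self.assertEqual(clamp_dose(201), 200)   # one past the limit

    def test_special_case_zero(self):
        self.assertEqual(clamp_dose(0), 0)

    def test_negative_input_is_rejected(self):
        # Negative test: the failure itself is the expected behavior.
        with self.assertRaises(ValueError):
            clamp_dose(-1)

    def test_representative_case(self):
        self.assertEqual(clamp_dose(86), 86)

    def test_known_answer(self):
        self.assertEqual(clamp_dose(25000, maximum=300), 300)


def run_tests() -> bool:
    """Run the suite programmatically and report overall success."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClampDoseTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

In a real project the suite would usually be run with `python -m unittest` instead of a helper like `run_tests`.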

BLACK BOX TESTING
- A system testing method (test the system as a “black box”)
- Should be driven by “requirements analysis”
- Perform the same categories of tests as unit tests
- Fault injection is especially important

WHITE BOX TESTING
- Ensures that every “branch” of the code has been tested
- Or in other words, code coverage checks
- This is especially critical for interpreted languages like Python

PENETRATION TESTING
- A “friendly attacker” actively tries to break into the system
- The “attacker” should use knowledge of the system
  - This is obviously more powerful than the real attacker
- The attacker should also try attacking “dumb”
  - Fuzzing is a good example
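A “dumb” fuzzer can be very small. The sketch below throws random short strings at a target and records anything other than a clean rejection. The parsers `parse_dose` and `fragile_first_digit`, the character set, and the choice to treat `ValueError` as a clean rejection are all assumptions for this example.

```python
import random


def parse_dose(text: str) -> int:
    """Hypothetical hardened parser: accepts 1 to 3 ASCII digits only."""
    if not (1 <= len(text) <= 3) or not all(c in "0123456789" for c in text):
        raise ValueError("invalid dose")
    return int(text)


def fragile_first_digit(text: str) -> int:
    """Hypothetical fragile parser: forgets that the input may be empty."""
    return int(text[0])


def fuzz(target, runs: int = 1000, seed: int = 1234):
    """Feed random strings to target; return (input, error) for each crash."""
    rng = random.Random(seed)            # fixed seed so failures can be replayed
    alphabet = "0123456789-+ eE.\x00\n"
    crashes = []
    for _ in range(runs):
        length = rng.randint(0, 6)
        sample = "".join(rng.choice(alphabet) for _ in range(length))
        try:
            target(sample)
        except ValueError:
            pass                         # clean rejection of bad input
        except Exception as exc:         # anything else is a finding
            crashes.append((sample, repr(exc)))
    return crashes


print(len(fuzz(parse_dose)))
print(len(fuzz(fragile_first_digit)) > 0)
```

Each recorded crash is a ready-made regression test input once the bug is fixed.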

AUTOMATING THE TESTING
- Testing should be automated as much as possible
- Unit tests are usually the easiest
  - There are frameworks for this
- Black box tests can be automated scripts
- White box tests can use code coverage tools
- Penetration tests can use tools like Metasploit
  - These tests can be automated as well
  - Tests that succeed should definitely be automated for regression testing

REGRESSION TESTING
- When the code changes, test to ensure that new bugs are not introduced
- Also, that “fixed” bugs stay fixed
- It’s a good policy to always run regression tests before a new code check-in
- You can even set up a script that checks out code automatically, builds, and tests
  - Set this up to run once a day and have it send you the results
  - Send nasty messages to a group member that “Breaks the Build”

BUG TRACKING
- Bugs should be reported and tracked
  - Read online for “best practices” for bug report descriptions
- An automated test should be created that reliably reproduces the bug
  - If there are random values, your test system should fix a seed if possible
  - The test should include the bug number it tracks in its comments or description
- When a “fix” is checked in to the code, the bug number should be included in the check-in comments
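A regression test that follows these rules might look like the sketch below. The function `shuffle_queue`, the bug it guards against, and the placeholder bug number are all hypothetical; the pattern to note is the fixed seed and the bug reference in the test’s name and comment.

```python
import random


def shuffle_queue(items, rng):
    """Hypothetical function under test: return a shuffled copy of items."""
    result = list(items)         # copy, so the caller's list is left untouched
    rng.shuffle(result)
    return result


def test_bug_0000_shuffle_does_not_mutate_input():
    # Regression test for placeholder bug #0000 (hypothetical):
    # "shuffle_queue reorders the caller's list in place".
    rng = random.Random(42)      # fixed seed so the run is reproducible
    original = [1, 2, 3, 4, 5]
    snapshot = list(original)

    shuffled = shuffle_queue(original, rng)

    assert original == snapshot              # the input must be unchanged
    assert sorted(shuffled) == snapshot      # same elements, any order
    # The same seed must give the same order on every run.
    assert shuffled == shuffle_queue(original, random.Random(42))
```

The check-in that fixes the bug would then cite the same bug number, so the test, the fix, and the report can be traced to each other.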

SUMMARY
- This class is not a software engineering class
- Nevertheless, we have had to talk about it today because it impacts security
- I’ve only touched on topics that could be (and are!) a full course
- I *strongly* recommend that at least one of your group be the “architect”
  - If one of your group already knows it, that’s great!
  - If not, this person should spend extra time researching good SE practices