
1 EN.600.424 Lecture Notes Spring 2016 FUNDAMENTALS OF SECURE DESIGN (SOFTWARE)

2 SECURITY AND RELIABILITY
A reliable system is not necessarily secure
However, it is highly unlikely that an unreliable system is secure
Attacks are often payloads piggy-backed on vulnerabilities
Moreover, many of the principles of building a reliable system apply:
Testing
Adversarial viewpoint

3 CASE STUDY – THERAC-25
http://sunnyday.mit.edu/papers/therac.pdf
A computer-controlled medical radiation therapy machine
Between 6/'85 and 1/'87, it overdosed six patients (three died)
The problems were primarily software failures

4 ATOMIC ENERGY OF CANADA LIMITED (AECL)
In conjunction with a company named CGR, AECL built in the early '70s:
The Therac-6
The Therac-20
Afterwards, and on its own, AECL built the Therac-25 between '76 and '82

5 RELATIONSHIP BETWEEN THE THERACS
The Therac-6 and Therac-20:
Stand-alone machines with software added for convenience
Hardware safety interlocks
The Therac-25:
Designed around software from the beginning (although the code was derived from the 6 and 20)
Some hardware safety interlocks replaced with software checks

6 THE FIRST ACCIDENT: JUNE '85 IN GEORGIA
A 61-year-old woman received radiation treatment after a lumpectomy
She felt heat during the treatment and told the tech, "you burned me"
Nobody believed her; no action was taken
Shortly after, the area where she was "treated" became red and swollen
Two weeks later, reddening appeared on her back, as if a burn had gone through her
Her skin began to fall off
A physicist later estimated she received one or two doses of 15,000-20,000 rads
Typical single doses are in the 200-rad range
A 500-rad dose to the whole body will kill about 50% of those exposed

7 FIRST ACCIDENT AFTERMATH
The woman had her breast removed
Her shoulder and arm were paralyzed
She lived in constant pain
Corporate/regulatory response:
Lawsuit settled out of court
Accident not reported to the FDA until after other accidents
Other Therac-25 users not informed

8 THE SECOND ACCIDENT: JULY '85 IN ONTARIO
A 40-year-old woman received her 24th treatment for cervical cancer
The tech tried to give the dose, but the machine shut down with "NO DOSE" and "Treatment Pause"
She tried to give the dose five times before the machine suspended
NOTE: techs frequently experienced shutdowns like this with no apparent harm
The patient complained of a burning sensation in her hip

9 SECOND ACCIDENT AFTERMATH
The woman died from the cancer
But it was determined that, had she lived, she would have needed a hip replacement
An AECL tech later estimated she received 13,000-17,000 rads
Corporate response:
The FDA, users, and others were told there was a problem and to visually inspect the "turntable"
AECL investigated the problem, assumed it was with the turntable, and "fixed" it
However, AECL fully admitted it could not reproduce the failure or be sure of the cause
Still, it claimed the machine was now safer by "five orders of magnitude"
AECL was told to reduce the number of NO DOSE failures allowed before suspension; IT DID NOT
AECL was asked to install an independent turntable safety mechanism; IT DID NOT

10 THIRD ACCIDENT: DEC '85 IN WASHINGTON
Not reported until after a second incident later
The damage was much smaller and the patient lived
AECL said it could not be the fault of the Therac-25 because the machine had been "fixed"
The hospital wasn't told about the other failures and assumed the Therac-25 had a good record!

11 FOURTH ACCIDENT: MAR '86 IN TEXAS
A male patient came for his 9th treatment after removal of a tumor from his back
The tech, in a separate room, quickly entered and corrected the values and started treatment
She got a weird error (an internal error!) and a pause; she hit "proceed"
It turned out the first "error" had already sent him a huge dose; he got up to get help
The "proceed" sent a second dose (into his arm) as he was getting up
He pounded on the door to stop the procedure
He was estimated to have received between 16,500 and 25,000 rads
His entire body was damaged and he died 5 months later
An AECL tech came the next day and said the machine CAN'T overdose
He also said there were no reports of overdosed patients (!!!!!!)

12 FIFTH ACCIDENT: APRIL '86 IN TEXAS
Three weeks later, at the same hospital (and with the same tech!) as the previous accident
Another male patient was getting treatment for skin cancer
Again, the tech entered and corrected the values before starting treatment
This time the intercom was working and she heard the machine's unusual buzzing
She rushed in to where the patient was moaning
He said he felt fire on the side of his face
He saw a flash of light and heard sizzling like frying eggs
The patient died three weeks later from a radiation overdose to the brain

13 HOSPITAL INVESTIGATION
The physicist and the tech now knew for sure something was wrong, despite AECL's claims
They began their own investigation and eventually reproduced the error
They determined that the error occurred when the data was entered quickly
The tech was very fast and could trigger it
The physicist needed practice before he could enter the data fast enough
AECL couldn't recreate it without help from the physicist
When it finally did, the dose measured about 25,000 rads

14 SIXTH ACCIDENT: JAN '87 IN WASHINGTON
Same hospital as the third accident
The patient was to receive 86 rads
The machine again paused and the tech pressed "proceed"
Again, the patient complained of a burning sensation
The console said 7 rads, but it was later determined to be 8,000-10,000 rads
It was determined that the electron beam had come on in the "field light" position

15 RACE CONDITION BUG
The real-time operating system gathers treatment details from the UI
Setting the bending magnets takes about 8 seconds
The software checks for data edits (again, in real time) while it is setting the magnets
However, a prematurely cleared variable means subsequent edits are not recorded (though they show up in the UI)
A sketch of this kind of lost-edit race follows
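
A minimal sketch of the pattern, not the actual Therac-25 code (all names here are hypothetical): a slow task latches a copy of the parameters when it starts, so edits made while it runs update only the screen and never reach the copy that the setup actually uses.

```python
# Hypothetical lost-edit race: the magnet task latches the parameters once,
# so an edit made during the ~8-second magnet move changes only the UI dict.
import threading
import time

ui_params = {"mode": "xray", "dose": 200}  # what the operator sees and edits
latched = {}                               # what the magnet task actually uses

def set_bending_magnets():
    latched.update(ui_params)   # parameters copied once, at the start
    time.sleep(8)               # the long magnet-setting operation

task = threading.Thread(target=set_bending_magnets)
task.start()
time.sleep(1)
ui_params["mode"] = "electron"  # operator's quick edit: visible in the UI...
task.join()
print(latched)                  # ...but the latched copy still says "xray"
```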

16 OVERFLOW BUG
Error-checking and integrity-checking code is supposed to protect the software
One shared variable triggered a safety check whenever its value was non-zero
But the variable was incremented rather than set, and it was only 8 bits wide
So every 256th check would overflow the variable back to zero
If the tech hit "set" while this overflow happened, the skipped check allowed full, maximum exposure
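
A minimal sketch of the wraparound (the variable name and messages are invented; the real code was assembly): incrementing an 8-bit flag instead of setting it to a constant means the flag reads zero on every 256th pass, and zero is treated as "nothing to check".

```python
# Invented names; the mask emulates 8-bit storage.
flag = 0  # shared 8-bit flag; non-zero is supposed to mean "verify the setup"

def housekeeping_pass():
    global flag
    flag = (flag + 1) & 0xFF   # increment, not set: 255 + 1 wraps to 0

def press_set():
    if flag != 0:
        print("setup verified")
    else:
        print("check skipped: maximum exposure possible")

for _ in range(256):           # the 256th increment wraps flag back to 0
    housekeeping_pass()
press_set()                    # takes the unsafe branch
```

Setting the flag to a fixed non-zero value instead of incrementing it removes the wraparound entirely.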

17 ANALYSIS OF CAUSES
Overconfidence in software
Confusing reliability with safety
Lack of defensive design (I call it "adversarial" design)
Failure to eliminate root causes
Complacency
Unrealistic risk assessment
Inadequate investigations
Inadequate software engineering practices
Software reuse
Safe versus user-friendly

18 SOFTWARE ENGINEERING PRACTICES
Software specifications and documentation should not be an afterthought
Rigorous software QA practices and standards
Designs should be simple, and dangerous coding practices avoided
Software has to be designed to be testable!
Auditing and error detection should be designed in from the start
Extensive testing and formal analysis
The UI needs to be carefully designed (users need to understand, for example, error messages)

19 UNDERSTANDING FAILURES
Everything fails. Everything. Don't be like AECL ("It can't fail that way...")
I recently had engineers at a client say the exact same thing
They couldn't understand why I thought their software would fail
How will your software fail? You have to ensure that you fail safely
Some failures can never be tolerated; those features may need to be removed
Related: make sure you use fail-safe defaults

20 PLANNING FOR FAILURES (FAIL SAFELY)
On failure, restore to a secure state (preserve a safe configuration)
Always check return values
Always include a safe default on conditional checks
Preserve confidentiality/integrity even when availability is lost
For example, C++ exceptions do better here than most C runtime errors
Ensure that failures do not alter access controls and other safety features
A sketch putting these rules together follows
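
A minimal sketch, assuming a hypothetical machine interface (Machine and its methods are invented): the function starts from the safe state, checks the return value of every step, and makes sure the error path also ends in the safe state.

```python
# Invented machine interface for the sketch.
class Machine:
    def close_shutter(self):       print("shutter closed (safe state)")
    def set_magnets(self, params): return params.get("dose", 0) <= 200
    def fire(self, params):        print("treating:", params)

def run_treatment(machine, params):
    machine.close_shutter()                  # start from the safe state
    try:
        if not machine.set_magnets(params):  # always check return values
            raise RuntimeError("magnet setup rejected the parameters")
        machine.fire(params)
    except Exception:
        machine.close_shutter()              # any failure leaves the beam off
        raise                                # surface the error; never proceed silently

run_treatment(Machine(), {"mode": "xray", "dose": 200})
```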

21 SPECIAL: FAIL-SAFE DEFAULTS
For secure systems, deny by default!
Access is based on permissions rather than on exclusions
For example, firewalls should block everything by default
Guest access should be disabled by default
Router defaults are horrible; the default should be inoperable until passwords are changed
Another example of security versus user-friendliness
(On the other hand, it's still a business decision...)
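
A minimal sketch of deny-by-default access (the permission table and names are invented): anything not explicitly granted is refused, including unknown users and actions.

```python
# Invented permission table: access is granted only by explicit entry.
PERMISSIONS = {("alice", "read"), ("alice", "write"), ("bob", "read")}

def is_allowed(user, action):
    return (user, action) in PERMISSIONS   # everything else is denied

assert is_allowed("bob", "read")
assert not is_allowed("bob", "write")      # not granted, so refused
assert not is_allowed("guest", "read")     # guest access is off by default
```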

22 RELATIONSHIP TO OTHER PRINCIPLES
Least privilege: in systems where least privilege is followed, failures tend not to expose privileges
Also, the error-handling system should only have access to error information
Minimal attack surface: error-handling code needs to be minimal and simple
Also, write code so that error paths are forced by the language to revert to a safe state
In Python, for example, always open files using the "with" construct (see the sketch below)
Consider wrapping ultra-critical functions in a second layer that does error handling
In C++, you can write special "smart pointers" that enforce safety
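
A minimal sketch of both Python ideas (beam_session and Machine are invented names): the "with" construct forces the cleanup path for a file, and contextlib lets you wrap a critical resource in a second error-handling layer.

```python
from contextlib import contextmanager

# The "with" construct: the file is closed even if processing raises.
with open("settings.txt", "w") as f:
    f.write("mode=xray\n")

# A second layer around an ultra-critical resource (invented names).
@contextmanager
def beam_session(machine):
    try:
        yield machine
    finally:
        machine.close_shutter()   # the language forces this safe-state path

class Machine:
    def close_shutter(self): print("shutter closed")

with beam_session(Machine()) as m:
    pass                          # even an exception here closes the shutter
```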

23 CONCRETE FAILURE PROPOSAL
You and your team should come up with your own failure strategy
I propose this as a starting point:
Take your "attack tree" for your PLAYGROUND node
Identify all software failure nodes
Determine which failures should simply be eliminated (remove a feature)
For the remaining failures, determine how to make the failure safer
Also, identify all "default" values; disallow any defaults that enable an attack
Review your "failure safety" plan any time you prepare to change the software

24 SPEAKING OF CHANGING SOFTWARE
I'm assuming you and your group will use an appropriate design cycle
You should have a requirements-design-implementation-test-repeat plan
You also need policies such as:
"No code check-ins without a walkthrough"
"No code check-ins without running regression tests"
"New features require a 'failure safety' review"
I'm not going to tell you how to do this; please come up with a plan

25 TESTING
Testing needs to be designed in from the start
You will notice that the PLAYGROUND framework does not have tests
First, this code is not designed to be secure or safe
Second, it is experimental and under development
Third, I want different groups to try different approaches
It's hard to test from the start when you are "experimenting"
Try a "two-system" approach:
Prototype once for feasibility
Re-implement with appropriate testing

26 UNIT TESTING
Create unit tests for each unit
Every public method should be tested
For inherited classes, you can create inherited test classes
Test:
Boundary conditions
Special cases
Negative tests (test failures!) and fault injection
Representative cases
Known-answer tests
A sketch covering these categories follows
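
A minimal unittest sketch (dose_in_limits is an invented helper) hitting boundary conditions, a negative test, and a known-answer test:

```python
import unittest

def dose_in_limits(rads):
    """Invented helper: accept only doses in a prescribed range."""
    if isinstance(rads, bool) or not isinstance(rads, (int, float)):
        raise TypeError("dose must be numeric")
    return 0 < rads <= 200

class TestDoseInLimits(unittest.TestCase):
    def test_boundaries(self):        # boundary conditions
        self.assertTrue(dose_in_limits(200))
        self.assertFalse(dose_in_limits(201))
        self.assertFalse(dose_in_limits(0))

    def test_negative(self):          # negative test / fault injection
        with self.assertRaises(TypeError):
            dose_in_limits("200")

    def test_known_answer(self):      # known-answer test
        self.assertTrue(dose_in_limits(86))

if __name__ == "__main__":
    unittest.main()
```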

27 BLACK BOX TESTING
A system-testing method (test the system as a "black box")
Should be driven by the "requirements analysis"
Perform the same categories of tests as in unit testing
Fault injection is especially important

28 WHITE BOX TESTING
Ensures that every "branch" of the code has been tested
Or in other words, code-coverage checks
This is especially critical for interpreted languages like Python, where even basic errors on a branch may not surface until that branch actually runs

29 PENETRATION TESTING
A "friendly attacker" actively tries to break into the system
The "attacker" should use knowledge of the system
This obviously makes them more powerful than the real attacker
The attacker should also try attacking "dumb"
Fuzzing is a good example (see the sketch below)
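
A minimal "dumb attacker" sketch (parse_message is an invented target with a planted flaw): throw random bytes at a parser and record anything that crashes, with a fixed seed so the findings can be replayed.

```python
import random

def fuzz(target, trials=1000, seed=424):
    """Throw random byte strings at target; collect inputs that crash it."""
    rng = random.Random(seed)      # fixed seed so crashes can be replayed
    crashes = []
    for _ in range(trials):
        data = bytes(rng.randrange(256)
                     for _ in range(rng.randrange(1, 64)))
        try:
            target(data)
        except Exception as exc:   # any unhandled exception is a finding
            crashes.append((data, exc))
    return crashes

def parse_message(data):           # invented target with a planted flaw
    if data and data[0] == 0xFF:
        raise ValueError("unhandled message type")

print(len(fuzz(parse_message)), "crashing inputs found")
```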

30 AUTOMATING THE TESTING
Testing should be automated as much as possible
Unit tests are usually the easiest; there are frameworks for this
Black box tests can be automated scripts
White box tests can use code-coverage tools
Penetration tests can use tools like Metasploit; these tests can be automated as well
Tests that succeed should definitely be automated for regression testing

31 REGRESSION TESTING
When the code changes, test to ensure that new bugs are not introduced
Also, test that "fixed" bugs stay fixed
It's a good policy to always run regression tests before a new code check-in
You can even set up a script that checks out the code automatically, builds it, and tests it (a sketch follows)
Set this up to run once a day and have it email you the results
Send nasty messages to any group member who "breaks the build"
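
A sketch of such a script (the repository path and email addresses are placeholders): pull the latest code, run the test suite, and mail the results; schedule it to run daily, e.g. from cron.

```python
# Placeholder repo path and addresses; schedule daily (e.g., from cron).
import smtplib
import subprocess
from email.message import EmailMessage

def nightly(repo="playground"):
    subprocess.run(["git", "pull"], cwd=repo, check=True)   # fetch latest code
    result = subprocess.run(["python", "-m", "unittest", "discover"],
                            cwd=repo, capture_output=True, text=True)
    verdict = "PASS" if result.returncode == 0 else "FAIL"

    msg = EmailMessage()
    msg["Subject"] = "nightly regression tests: " + verdict
    msg["From"] = "buildbot@example.com"                    # placeholder
    msg["To"] = "team@example.com"                          # placeholder
    msg.set_content(result.stdout + result.stderr)          # full test output
    with smtplib.SMTP("localhost") as server:
        server.send_message(msg)

if __name__ == "__main__":
    nightly()
```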

32 BUG TRACKING
Bugs should be reported and tracked
Read online about "best practices" for bug-report descriptions
An automated test should be created that reliably reproduces the bug
If there are random values, your test system should fix a seed if possible
The test should include, in its comments or description, the bug number it tracks
When a "fix" is checked in to the code, the bug number should be included in the check-in comments
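
A sketch of a bug-pinned regression test (bug #137 and route_packet are invented for illustration): the seed is fixed so the random input that triggered the bug is replayed exactly, and the bug number appears in the test's name and docstring.

```python
import random
import unittest

def route_packet(addr):                        # invented function under test
    return addr % 256

class TestBug137(unittest.TestCase):
    def test_bug_137_random_address_routing(self):
        """Regression test for bug #137 (invented number): routing failed
        for certain randomly generated addresses."""
        random.seed(137)                       # fixed seed replays the failing case
        addr = random.randrange(1 << 16)
        self.assertEqual(route_packet(addr), addr % 256)

if __name__ == "__main__":
    unittest.main()
```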

33 SUMMARY
This class is not a software engineering class
Nevertheless, we had to talk about software engineering today because it impacts security
I've only touched on topics that could be (and are!) a full course
I *strongly* recommend that at least one member of your group be the "architect"
If someone in your group already knows this material, that's great!
If not, this person should spend extra time researching good SE practices

