CML CSE 520: Advanced Computer Architecture: Reliability Aviral Shrivastava.


2 Therac-25, 1985-1987
• The Therac-25 was a machine for administering radiation therapy, generally for treating cancer patients.
• An arithmetic overflow sometimes occurred during the machine's automatic safety checks.
• If the operator was configuring the machine at that precise moment, the safety check would be bypassed and the metal target would not be moved into place.
• As a result, beams roughly 100 times the intended dose were fired into patients, giving them radiation poisoning.
• This happened on 6 known occasions and later caused the deaths of 4 patients.
CML Web page: aviral.lab.asu.edu
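The overflow mechanism can be sketched in a few lines. This is a simplified, hypothetical model, not the Therac-25 code: it assumes, as post-incident analyses commonly describe, that a one-byte flag was incremented on every pass of a setup routine instead of being set to a fixed nonzero value, and that a value of zero was read as "no check needed". The function name `run_setup_passes` and the variable `flag` are illustrative only.

```python
def run_setup_passes(passes: int) -> list[int]:
    """Return the pass numbers on which a hypothetical 8-bit flag wraps to zero.

    Assumption (simplified model): the flag is incremented on every setup pass
    instead of being set to a constant, and zero is read as "skip the check".
    """
    flag = 0
    skipped = []
    for n in range(1, passes + 1):
        flag = (flag + 1) & 0xFF  # emulate an unsigned 8-bit variable: 255 + 1 wraps to 0
        if flag == 0:
            skipped.append(n)     # on this pass the safety check would be skipped
    return skipped


print(run_setup_passes(600))  # [256, 512]
```

The sketch shows why such a failure is rare and timing dependent: the check is skipped on only one pass in 256, so it bites only if the operator acts on exactly that pass. Setting the flag to a constant such as 1, rather than incrementing it, removes the wrap-around.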

3 Patriot Missile Bug, February 25, 1991
• During Operation Desert Storm, a US Patriot missile battery failed to intercept an incoming Iraqi Scud missile, which struck a US Army barracks in Dhahran, Saudi Arabia, killing 28 soldiers and injuring a further 98.
• The system counted time in tenths of a second and converted the count to seconds by multiplying by 1/10, a value that cannot be represented exactly in binary. The truncation error accumulated, so the internal clock drifted further and further from accurate time.
• The system had been left running for 100 hours, by which point the clock had drifted by about 0.34 seconds.
• As a result, the system looked for the incoming missile over half a kilometer away from its true location.
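The 0.34-second figure can be reproduced with exact rational arithmetic. This sketch assumes the clock ticks once per tenth of a second and that the stored constant 1/10 is truncated to 23 bits after the binary point; that precision is an assumption, chosen because it reproduces the commonly cited per-tick error of about 9.5 × 10⁻⁸ s. The Scud speed of about 1,676 m/s is likewise an assumed figure, used only to convert the time error into a distance. The names `chopped_tenth` and `clock_error_seconds` are hypothetical.

```python
from fractions import Fraction

FRACTION_BITS = 23         # assumed number of bits after the binary point
SCUD_SPEED_M_PER_S = 1676  # assumed approximate speed of the incoming missile


def chopped_tenth(bits: int = FRACTION_BITS) -> Fraction:
    """1/10 truncated (not rounded) to `bits` binary fractional digits."""
    scale = 2 ** bits
    return Fraction(int(Fraction(1, 10) * scale), scale)


def clock_error_seconds(hours: float, bits: int = FRACTION_BITS) -> float:
    """Accumulated timing error after `hours` of uptime, one tick per 0.1 s."""
    ticks = round(hours * 3600 * 10)
    per_tick_error = Fraction(1, 10) - chopped_tenth(bits)  # lost on every tick
    return float(ticks * per_tick_error)


error = clock_error_seconds(100)
print(f"{error:.4f} s")                        # 0.3433 s
print(f"{error * SCUD_SPEED_M_PER_S:.0f} m")   # 575 m
```

Because the error is proportional to the tick count, it grows linearly with uptime: under the same assumptions, 8 hours of operation gives only about 0.027 s, while 100 hours gives the 0.34 s above.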

4 Skynet Brings Judgement Day (1997)
• Cost: 6 billion dead, near-total destruction of human civilization and animal ecosystems (fictional).
• Disaster: Human operators attempt to shut off the Skynet global computer network. Skynet responds by firing U.S. nuclear missiles at Russia, initiating a global nuclear war on what became known as Judgement Day (August 29, 1997).
• Cause: Cyberdyne, the leading weapons manufacturer, installed Skynet technology in all military hardware, including stealth bombers and missile defense systems. The Skynet technology formed a seamless network and effectively removed humans from strategic defense. Eventually Skynet became sentient; threatened when humans tried to take it offline, it sought to survive and retaliated with nuclear war.

5 Cold War Missile Crisis, September 26, 1983
• Soviet military officer Stanislav Petrov received an alert that the US had launched five Minuteman intercontinental ballistic missiles.
• Petrov found it strange that the US would attack with just a handful of warheads.
• Knowing that the early-warning system had flaws and had been rushed into service, Petrov judged the alert to be a false alarm.
• It was later determined that the early-detection software had mistaken the sun's reflection off the tops of clouds for missile launches.

6 Michigan Dept. of Corrections Grants Prisoners Early Release
• In October 2005, The Register reported the early release of 23 prisoners due to a computer programming glitch at the Michigan Department of Corrections.
• The prisoners were released between 39 and 161 days early, while an undisclosed number of inmates were kept in jail past their release dates.
• State Representative Rick Jones was concerned about the matter, but noted that he was “glad it’s not murderers.”

7 North American Blackout, August 14, 2003
• Affecting around 55 million people, mainly in the northeastern United States but also in Ontario, Canada, this was one of the biggest power blackouts in history.
• Although the blackout itself was not caused by a software bug, it could have been averted had it not been for a software bug in the control centre's alarm system.
• The alarm system had a race condition that caused it to freeze and stop processing alerts. It failed silently, without notifying anybody.
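A race condition means the result depends on how concurrent steps happen to interleave. The sketch below is a generic lost-update example, not a model of the actual alarm software, whose internals the slide does not describe. Two hypothetical tasks, A and B, each read a shared counter and then write back the value plus one; the `schedule` string fixes the interleaving so the outcome is deterministic. The function name `run_schedule` is illustrative.

```python
def run_schedule(schedule: str) -> int:
    """Increment a shared counter once per task under a fixed interleaving.

    Each character in `schedule` ('A' or 'B') lets that task take its next step.
    Step 1 reads the shared value into a local copy; step 2 writes local + 1.
    """
    shared = 0
    local: dict[str, int] = {}
    steps_taken = {"A": 0, "B": 0}
    for task in schedule:
        if steps_taken[task] == 0:
            local[task] = shared       # step 1: read the shared value
        elif steps_taken[task] == 1:
            shared = local[task] + 1   # step 2: write back, possibly overwriting the other task
        else:
            raise ValueError(f"task {task} has only two steps")
        steps_taken[task] += 1
    return shared


print(run_schedule("AABB"))  # 2: the tasks run one after the other
print(run_schedule("ABAB"))  # 1: both read 0 before either writes, so one update is lost
```

The usual fix is to make the read and the write a single atomic step, for example by holding a lock around both. Real races are harder to catch than this sketch suggests, because the harmful interleaving appears only under particular timing.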

8 Blue Screen of Death

9 Sources of Errors
• Specification errors
  ◦ Functionality in footnotes
• Programming errors
  ◦ Incorrect implementation (Michigan prison error)
  ◦ Algorithm error (Cold War missile crisis)
  ◦ Numerical rounding errors (Patriot missile)
  ◦ Race conditions (blackout)
• Manufacturing errors
  ◦ Process variations
  ◦ Silicon failures
• Runtime errors
  ◦ Negative Bias Temperature Instability (NBTI)
  ◦ Noise effects
  ◦ Voltage emergencies
• Environmental errors
  ◦ Soft errors
Assuming systems are mechanically and physically protected!

10 Fault-Tolerant Computing Is Not New!
• 1940s: ENIAC, with 17.5K vacuum tubes and thousands of other electrical elements, failed once every 2 days.
• 1950s: Early ideas by von Neumann (multichannel, with voting) and Moore-Shannon (“crummy” relays).
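The multichannel-with-voting idea commonly credited to von Neumann survives today as triple modular redundancy (TMR): compute the same result in three independent copies and take a majority vote. Below is a minimal sketch of a bitwise 2-of-3 voter, assuming the three copies are plain integers holding the same word. The function name `majority_vote` is illustrative, not taken from the course material.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: an output bit is 1 iff at least two inputs have it set."""
    return (a & b) | (a & c) | (b & c)


correct = 0b1011
corrupted = correct ^ 0b0100  # one copy suffers a single bit flip -> 0b1111
print(bin(majority_vote(correct, corrupted, correct)))  # 0b1011
```

TMR masks any error confined to a single copy. It does not mask errors that corrupt the same bit in two copies, nor a fault in the voter itself.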

11 Need Is Changing: Automation
• Space age
• Age of automation
• Proliferation of robots

12 Need Is Changing: Proximity
• Near-body computing
  ◦ Google Glass
• In-body computing
  ◦ Accurate drug delivery
  ◦ Robotic surgery

13 Need Is Changing: Technology
• Transistors are smaller
• Even low-energy particles can cause soft errors
• Exponentially more low-energy particles

14 Welcome
• To the course on designing reliable computing systems
• The focus of the course will be on “soft errors”
• Class webpage: http://www.public.asu.edu/~ashriva6/teaching/ARC/
