What is Software Quality? Chapter 2
Pressman's definition of "Software Quality" Conformance to explicitly stated functional and performance requirements, explicitly documented development standards, and implicit characteristics that are expected of all professionally developed software. text page 25
IEEE Definition of "Software Quality" 1. The degree to which a system, component, or process meets specified requirements. 2. The degree to which a system, component, or process meets customer or user needs or expectations. text page 24
IEEE Definition of "Software Quality Assurance" 1. A planned and systematic pattern of all actions necessary to provide adequate confidence that an item or product conforms to established technical requirements. 2. A set of activities designed to evaluate the process by which the products are developed or manufactured. Contrast with quality control.
CMM "The Capability Maturity Model for Software developed by the SEI is a framework that describes the key elements of an effective software process. The CMM describes an evolutionary improvement path for software organizations from an ad hoc, immature process to a mature, disciplined one." "The CMM covers practices for planning, engineering, and managing software development and maintenance. When followed, these practices improve the ability of organizations to meet goals for cost, schedule, functionality, and product quality."
More Definitions "software error" "software fault" "software failure" types of errors 1. code error 2. procedure error 3. documentation error 4. software data error text sections 2.1 and 2.2
Causes of software errors 1. faulty requirements definition 2. client-developer communication failures 3. deliberate deviations from software requirements 4. logical design errors 5. coding errors 6. non-compliance with documentation and coding instructions 7. shortcomings of the testing process 8. procedure errors 9. documentation errors text section 2.3
Cost of Errors "Software bugs, or errors, are so prevalent and so detrimental that they cost the U.S. economy an estimated $59.5 billion annually, or about 0.6 percent of the gross domestic product. … Although all errors cannot be removed, more than a third of these costs, or an estimated $22.2 billion, could be eliminated by an improved testing infrastructure that enables earlier and more effective identification and removal of software defects. These are the savings associated with finding an increased percentage (but not 100 percent) of errors closer to the development stages in which they are introduced. Currently, over half of all errors are not found until "downstream" in the development process or during post-sale software use." US Dept of Commerce June 2002
Some Famous Software Errors Therac-25 Patriot Missile System NASA's Mars Polar Lander ESA's Ariane 5 Launch System 2003 Blackout many details stolen from
Therac-25 - the problem When operating in soft X-ray mode, the machine was designed to rotate three components into the path of the electron beam, in order to shape and moderate the power of the beam. … The accidents occurred when the high-energy electron beam was activated without the target having been rotated into place; the machine's software did not detect that this had occurred, and therefore neither determined that the patient was receiving a potentially lethal dose of radiation nor prevented it.
Therac-25 - the reasons The design did not have any hardware interlocks to prevent the electron beam from operating in its high-energy mode without the target in place. The engineer had reused software from older models. These models had hardware interlocks and were therefore not as vulnerable to the software defects. The hardware provided no way for the software to verify that sensors were working correctly. The equipment control task did not properly synchronize with the operator interface task, so that race conditions occurred if the operator changed the setup too quickly. This was evidently missed during testing, since it took some practice before operators were able to work quickly enough for the problem to occur. The software set a flag variable by incrementing it. Occasionally an arithmetic overflow occurred, causing the software to bypass safety checks.
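The last point, a flag "set" by incrementing it, can be illustrated with a minimal sketch. It assumes the flag lives in a single byte and that a value of zero means "no check needed"; the function names are hypothetical and this is not the actual Therac-25 code.

```python
def increment_flag(flag: int) -> int:
    """Increment a one-byte flag the way an 8-bit register would (wraps at 256)."""
    return (flag + 1) & 0xFF


def check_required(flag: int) -> bool:
    """Nonzero is meant to signal 'verify the setup before firing'."""
    return flag != 0


def count_skipped_checks(passes: int) -> int:
    """Count how often the wrapped flag reads as 'no check needed'."""
    flag = 0
    skipped = 0
    for _ in range(passes):
        flag = increment_flag(flag)  # intended to mean "set the flag"
        if not check_required(flag):
            skipped += 1  # overflow made the flag zero: the check is bypassed
    return skipped


def set_flag(_flag: int) -> int:
    """Safer alternative: assign a fixed nonzero value instead of incrementing."""
    return 1


print(count_skipped_checks(255))  # no wrap-around yet
print(count_skipped_checks(256))  # the 256th increment wraps to zero
```

Under these assumptions, every 256th pass silently skips the check, which matches the "occasionally" in the slide: the defect is rare enough to survive ordinary testing.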
Patriot Missile System On February 25, 1991, the Patriot missile battery at Dhahran, Saudi Arabia had been in operation for 100 hours, by which time the system's internal clock had drifted by one third of a second. For a target moving as fast as an inbound tactical ballistic missile (TBM), this was equivalent to a position error of 600 meters. The radar system had successfully detected the Scud and predicted where to look for it next, but because of the time error, looked in the wrong part of the sky and found no missile. With no missile, the initial detection was assumed to be a spurious track and the missile was removed from the system. No interception was attempted, and the missile struck a barracks, killing 28 soldiers.
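The one-third-second drift can be reproduced with a short calculation. The sketch below assumes the commonly cited explanation, which is not stated on the slide: time was counted in tenths of a second and multiplied by a value of 1/10 chopped to 23 binary fraction bits. The names are hypothetical.

```python
from fractions import Fraction

FRACTION_BITS = 23        # assumption: fraction bits kept for the constant 1/10
TICK = Fraction(1, 10)    # assumption: the clock counts tenths of a second


def chopped(value: Fraction, bits: int) -> Fraction:
    """Truncate (not round) a positive value to the given number of binary fraction bits."""
    scale = 2 ** bits
    return Fraction(value.numerator * scale // value.denominator, scale)


def clock_drift_seconds(hours: int, bits: int = FRACTION_BITS) -> Fraction:
    """Accumulated error after counting ticks for the given number of hours."""
    ticks = hours * 3600 * 10
    per_tick_error = TICK - chopped(TICK, bits)
    return ticks * per_tick_error


drift = clock_drift_seconds(100)
print(float(drift))        # roughly a third of a second
print(600 / float(drift))  # target speed (m/s) implied by the slide's 600 m figure
```

The per-tick error is under a ten-millionth of a second, so the defect only becomes visible after long continuous operation.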
Mars Polar Lander The last telemetry from Mars Polar Lander was sent just prior to atmospheric entry on December 3, 1999. No further signals have been received from the lander. The cause of this loss of communication is unknown. According to the investigation that followed, the most likely cause of the failure of the mission was a software error that mistakenly identified the vibration caused by the deployment of the lander's legs as being caused by the vehicle touching down on the Martian surface, resulting in the vehicle's descent engines being cut off whilst it was still 40 meters above the surface, rather than on touchdown as planned. Another possible reason for failure was inadequate preheating of the catalyst beds for the pulsing rocket thrusters.
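One way such a misidentification could arise is a touchdown indication that is latched when the legs deploy and never cleared before touchdown sensing is switched on. The sketch below illustrates that assumption only; the class, its methods, and the enable point are hypothetical and are not taken from the flight software.

```python
class TouchdownMonitor:
    """Toy model of a latched touchdown sensor (hypothetical design)."""

    def __init__(self, clear_latch_on_enable: bool) -> None:
        self.clear_latch_on_enable = clear_latch_on_enable
        self.latched = False
        self.enabled = False

    def sensor_pulse(self) -> None:
        # Any jolt on the leg sensor is recorded, even before sensing is enabled.
        self.latched = True

    def enable(self) -> None:
        # The fix: discard anything recorded before sensing was enabled.
        if self.clear_latch_on_enable:
            self.latched = False
        self.enabled = True

    def engines_should_cut(self) -> bool:
        return self.enabled and self.latched


def simulate_descent(clear_latch_on_enable: bool) -> str:
    monitor = TouchdownMonitor(clear_latch_on_enable)
    monitor.sensor_pulse()   # vibration from leg deployment, high above the surface
    monitor.enable()         # touchdown sensing switched on near 40 m
    if monitor.engines_should_cut():
        return "engines cut at 40 m"
    monitor.sensor_pulse()   # genuine touchdown
    return "engines cut on touchdown" if monitor.engines_should_cut() else "no cutoff"


print(simulate_descent(clear_latch_on_enable=False))
print(simulate_descent(clear_latch_on_enable=True))
```

In this model a single missing reset separates the two outcomes, which is why a test that feeds in the leg-deployment transient before enabling the sensor is needed to expose it.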
Ariane 5 Rocket June 4, 1996 was the first test flight of the Ariane 5 launch system. The rocket tore itself apart 37 seconds after launch, making the fault one of the most expensive computer bugs in history. The Ariane 5 software reused the specifications from the Ariane 4, but the Ariane 5's flight path was considerably different and beyond the range for which the reused code had been designed. Specifically, the Ariane 5's greater acceleration caused the back-up and primary inertial guidance computers to crash, after which the launcher's nozzles were directed by spurious data. Pre-flight tests had never been performed on the re-alignment code under simulated Ariane 5 flight conditions, so the error was not discovered before launch. Because of the different flight path, a data conversion from a 64-bit floating-point value to a 16-bit signed integer caused a hardware exception (more specifically, an arithmetic overflow, as the floating-point number had a value too large to be represented by a 16-bit signed integer). Efficiency considerations had led to the disabling of the exception handler for this error. This led to a cascade of problems, culminating in destruction of the entire flight.
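The conversion problem can be sketched in a few lines. The example below is an illustration only: the function names and the sample values are made up, and the flight software was not written in Python. It contrasts a conversion that raises on overflow with one that saturates at the 16-bit limits.

```python
INT16_MIN = -32768
INT16_MAX = 32767


def to_int16_checked(value: float) -> int:
    """Convert a finite float to a 16-bit signed integer, raising on overflow."""
    truncated = int(value)
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return truncated


def to_int16_saturating(value: float) -> int:
    """Convert a finite float, clamping to the 16-bit range instead of failing."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))


in_range = 1200.5       # hypothetical value within the old flight envelope
out_of_range = 40000.0  # hypothetical value reachable on the new flight path

print(to_int16_checked(in_range))
print(to_int16_saturating(out_of_range))
try:
    to_int16_checked(out_of_range)
except OverflowError as err:
    print("unhandled in flight, this would halt the computer:", err)
```

Neither variant is automatically correct. The point is that reused code carries range assumptions, and those assumptions need to be re-tested against the new system's operating conditions.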
2003 North America Blackout
August 14, 2003
12:15 p.m. Inaccurate data input renders a system monitoring tool in Ohio ineffective.
1:31 p.m. The Eastlake, Ohio, generating plant shuts down.
2:02 p.m. First 345-kV line in Ohio fails due to contact with a tree in Walton Hills, Ohio.
2:14 p.m. An alarm system fails at FirstEnergy's control room and is not repaired.
2:27 p.m. Second 345-kV line fails due to a tree.
3:05 p.m. A 345-kV transmission line fails in Parma, south of Cleveland, due to a tree.
3:17 p.m. Voltage dips temporarily on the Ohio portion of the grid. Controllers take no action, but power shifted by the first failure onto another 345-kV power line causes it to sag into a tree. While Midwest ISO and FirstEnergy controllers try to understand the failures, they fail to inform system controllers in nearby states.
3:39 p.m. A FirstEnergy 138-kV line fails.
3:41 and 3:46 p.m. Two breakers connecting FirstEnergy's grid with American Electric Power are tripped as a 345-kV power line and fifteen 138-kV lines fail in northern Ohio. Later analysis suggests that this could have been the last possible chance to save the grid if controllers had cut off power to Cleveland at this time.
4:06 p.m. A sustained power surge on some Ohio lines begins an uncontrollable cascade after another 345-kV line fails.
4:09:02 p.m. Voltage sags deeply as Ohio draws 2 GW of power from Michigan.
4:10:34 p.m. Many transmission lines trip out, first in Michigan and then in Ohio, blocking the eastward flow of power. Generators go down, creating a huge power deficit. In seconds, power surges out of the East, tripping East Coast generators to protect them, and the blackout is on.
4:10:37 p.m. Eastern Michigan grid disconnects from the western part of the state.
4:10:38 p.m. Cleveland separates from the Pennsylvania grid.
4:10:39 p.m. 3.7 GW of power flows from the East through Ontario to southern Michigan and northern Ohio, more than ten times the flow 30 seconds earlier, causing a voltage drop across the system.
4:10:40 p.m. Flow flips to 2 GW eastward from Michigan through Ontario, then flips westward again within half a second.
4:10:43 p.m. International connections begin failing.
4:10:45 p.m. Western Ontario separates from the east when a power line north of Lake Superior disconnects. First Ontario plants go offline in response to the unstable system. Quebec is protected because its lines are DC, not AC.
4:10:46 p.m. New York separates from the New England grid.
4:10:50 p.m. Ontario separates from the Western New York grid.
4:11:57 p.m. Last lines between Michigan and Ontario fail.
4:13 p.m. End of cascade. 256 power plants are offline; 85% went offline after the grid separations occurred, most of them on automatic controls. 50 million people are without power.
Next… Software Quality Factors how do you know quality when you see it? how do you measure quality?