Fiascos
Debbie Bartlett
November 07, 2011
Have you ever noticed …
Computers do EXACTLY what YOU SAY, NOT necessarily what YOU WANT them to do.
For example:
- The file name snowman.java is different from SnowMan.java
- SnowMan.java in your home directory is different from SnowMan.java in your public.html directory
- Wanting someone to be able to display your file won't make it happen unless the permission bits are set for read
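The three examples above can be sketched in Java. This is only an illustration, assuming a Unix-like system where file names are case-sensitive; the class name `ExactlyWhatYouSay`, its helper methods, and the `/home/student` directory are made up for this sketch and are not part of the original slides.

```java
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical helper class, written only to illustrate the slide.
final class ExactlyWhatYouSay {
    // Names are compared character by character, so case matters.
    static boolean sameName(String a, String b) {
        return a.equals(b);
    }

    // The same file name in two different directories gives two different paths.
    static boolean samePath(String dirA, String dirB, String name) {
        return Paths.get(dirA, name).equals(Paths.get(dirB, name));
    }

    // True only if the "others" read bit is set in a mode string such as "rw-r--r--".
    static boolean worldReadable(String mode) {
        Set<PosixFilePermission> perms = PosixFilePermissions.fromString(mode);
        return perms.contains(PosixFilePermission.OTHERS_READ);
    }

    public static void main(String[] args) {
        System.out.println(sameName("snowman.java", "SnowMan.java"));
        // "/home/student" is an assumed example directory.
        System.out.println(samePath("/home/student", "/home/student/public.html", "SnowMan.java"));
        System.out.println(worldReadable("rw-------"));
    }
}
```

All three calls in `main` print `false`: the computer compares exactly what it was given, not what was meant.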
This can lead to … Engineering Fiascos
According to the dictionary: “A complete failure”
Example “Fiasco”: NASA’s Mars Climate Orbiter (1999)
- Mission objective: Monitor the daily weather and atmospheric conditions of Mars
- NASA Administrator's design objective: “Faster, Better, Cheaper”
- Fate: Lost on arrival at Mars
- Why?
  - Error in the software used to control the thrusters: a NASA subcontractor used English units for navigation data while NASA used metric units
  - So the numbers were off by a factor of 4.45 (pound-force vs. newtons) relative to what the ground station expected
  - Thus, the Orbiter got too close to the surface
- Costs to NASA
  - $327.6 million for the orbiter and lander
  - Lost opportunity costs: loss of mission data
  - Loss of reputation
- Lessons learned
  - Integrated testing should have been done
  - Error checking should have been added to the software, e.g., the distance from the surface can be no less than x units; if it is, treat the number as invalid
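A minimal Java sketch of the two ideas on this slide: converting between the units explicitly, and rejecting a value that falls below a safe minimum. The class and method names are hypothetical, and the altitude numbers used in `main` are made-up placeholders for the slide's "x units", not actual mission values.

```java
// Hypothetical illustration; not the actual flight or ground software.
final class ThrusterUnits {
    // 1 pound-force is about 4.448222 newtons; the slide rounds this to 4.45.
    static final double NEWTONS_PER_POUND_FORCE = 4.448222;

    // Converts an impulse from pound-force seconds to newton seconds.
    static double poundForceSecondsToNewtonSeconds(double lbfSeconds) {
        return lbfSeconds * NEWTONS_PER_POUND_FORCE;
    }

    // Error check: an altitude below the minimum is treated as an invalid number.
    static double checkedAltitudeKm(double altitudeKm, double minSafeKm) {
        if (Double.isNaN(altitudeKm) || altitudeKm < minSafeKm) {
            throw new IllegalArgumentException(
                "Invalid altitude: " + altitudeKm + " km is below the minimum of " + minSafeKm + " km");
        }
        return altitudeKm;
    }

    public static void main(String[] args) {
        // A value of 1.0 means something 4.45 times larger once the unit is made explicit.
        System.out.println(poundForceSecondsToNewtonSeconds(1.0));
        // Placeholder numbers: 150 km requested, 80 km assumed as the minimum.
        System.out.println(checkedAltitudeKm(150.0, 80.0));
    }
}
```

Putting the unit in the method name is one cheap way to make a mismatch visible at the point where two pieces of software meet.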
Swedish ship VASA (1628)
- “Great” Swedish King ordered 4 new warships
  - One was to be a royal ship, greater than any ship ever built
- The king (not a ship designer) specified the measurements and design
  - Insisted on the strongest, heaviest northern oak: 40 acres of timber used, triple-laminated oak walls 18” thick
  - Main mast was 190 ft tall
  - Floating work of art: carved ornaments, painted red, gold, and blue
  - Two gun decks
  - Insisted on 64 bronze cannons, weighing 100 tons
  - Ballast equalled 120 tons of stone
  - Carried the additional weight of cannonballs, gunpowder, ancillary firearms, food, officers, and a crew of 133 sailors
- Launched August 10, 1628
- Fate: Sailed less than a nautical mile before capsizing
- Lessons learned: Listen to the experts and take their advice
Y2K (Year 2000 problem, millennium bug)
- A ticking time bomb
  - From the 1960s until the late 1980s, it was widespread practice in computer software to use two digits to represent a year rather than four
  - This saved computer disk and memory space, which was very expensive
  - In the year 2000, ’00’ to a computer would mean 1900, not 2000
  - When calculating the difference between dates, 01/01/2000 and 12/31/1999 would appear to be about 100 years apart rather than 1 day
- Threatened all major industries, including utilities, banking, manufacturing, telecom, and airlines
- Widespread fear, heightened by the media
  - Industry testing & results: tractor factory, waste treatment plant in California
  - Will all planes fall from the sky at midnight 01/01/2000?
- Computer and software companies spent years and billions of dollars preparing for it
  - Year 2000 compliant operating systems
  - Reading millions of lines of application source code
  - 1000s of engineers and support personnel on call at that second
- When the clock ticked over to 01/01/2000, no major problems were reported; a few minor problems were fixed within a day
- Question: was it not as bad as the media made it out to be, or did the years of prep work pay off?
- Lessons learned: Anticipate the future when designing software …
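The date arithmetic on this slide can be sketched in Java. This is an illustration only: the class `TwoDigitYear` and its methods are hypothetical, and the "legacy" methods imitate the two-digit shortcut in general rather than any specific system.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical illustration of two-digit versus four-digit year handling.
final class TwoDigitYear {
    // Legacy-style expansion: every two-digit year is assumed to be 19xx.
    static int legacyFullYear(int twoDigitYear) {
        return 1900 + twoDigitYear;
    }

    // Legacy-style difference using only the last two digits of each year.
    static int legacyYearsBetween(int fromTwoDigit, int toTwoDigit) {
        return toTwoDigit - fromTwoDigit;
    }

    // Difference in days using full four-digit dates.
    static long daysBetween(LocalDate from, LocalDate to) {
        return ChronoUnit.DAYS.between(from, to);
    }

    public static void main(String[] args) {
        System.out.println(legacyFullYear(0));          // 1900
        System.out.println(legacyYearsBetween(99, 0));  // -99
        System.out.println(daysBetween(LocalDate.of(1999, 12, 31), LocalDate.of(2000, 1, 1))); // 1
    }
}
```

With two digits, the step from '99' to '00' looks like a jump of 99 years backwards; with four digits it is the single day it really was.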
HP: Integrated disk and tape device
- Customer requested
- Highly leverageable from existing stand-alone devices
- Designed and supported for both the Basic and UNIX operating systems
- A while later, a customer support problem: corrupted Unix file system
  - Thought: What ‘incorrect’ thing DID the customer do to cause this problem?
- A couple of weeks later: a second report
  - Still: customer-caused problem? After all, several had been sold and were being used with no problem
- Third problem: This IS serious
  - Stop shipment of the Unix system (the longer the shipment hold, the greater the impact to the bottom line)
  - Investigation: what did these three have in common? Unix and the integrated disk/tape unit
  - Shipment hold only on the combination of Unix and the integrated storage device
  - What could be going on? Hours spent trying to duplicate the problem, reading code, and analyzing what was incorrectly written on disk
- Problem found
  - The Unix OS submits a write operation to the tape device; the tape device has a write error; the Unix OS says try again; but instead of putting the request on the tape device queue, it puts it on the disk drive device queue
- Lessons learned
  - Importance of thorough integration testing
  - Importance of thinking through error scenarios and testing those cases …
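The defect described above can be sketched as follows. This is a simplified model under stated assumptions, not HP's driver code: the class `RetryQueues`, the `Device` enum, and both retry methods are invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of two device queues inside one integrated unit.
final class RetryQueues {
    enum Device { DISK, TAPE }

    static final class WriteRequest {
        final Device target;
        final String data;

        WriteRequest(Device target, String data) {
            this.target = target;
            this.data = data;
        }
    }

    final Deque<WriteRequest> diskQueue = new ArrayDeque<>();
    final Deque<WriteRequest> tapeQueue = new ArrayDeque<>();

    // Buggy retry: the failed request always lands on the disk queue,
    // so data meant for the tape would be written to the disk.
    void retryBuggy(WriteRequest failed) {
        diskQueue.addLast(failed);
    }

    // Corrected retry: the request goes back to the queue of its own target device.
    void retryFixed(WriteRequest failed) {
        queueFor(failed.target).addLast(failed);
    }

    Deque<WriteRequest> queueFor(Device device) {
        return device == Device.TAPE ? tapeQueue : diskQueue;
    }

    public static void main(String[] args) {
        RetryQueues queues = new RetryQueues();
        queues.retryBuggy(new WriteRequest(Device.TAPE, "backup block"));
        System.out.println("disk queue size after buggy retry: " + queues.diskQueue.size());
    }
}
```

The normal write path never reaches the retry code, which is why a test that forces a tape write error is the kind of error-scenario test the lesson calls for.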
Lessons Learned from “Fiascos”
- Seek the input and knowledge of experts
- Anticipate what it will be like in the future
- Do design reviews
- Thoroughly test
- Include error checks in the software
- Anticipate possible error scenarios
- Computers WILL do what YOU tell them to do!