
NIST (2002): software failures cost the US an estimated $59B annually

Why do we care?  Therac-25 (1985): 6 massive radiation overdoses  Multiple space fiascos (1990s): Ariane V exploded after 40 seconds (data conversion); Mars Pathfinder computer kept turning itself off (system timing); Patriot missile misguided (floating-point accuracy)  Millennium bug (2000)  Healthcare.gov (2013)

Case Study: Healthcare.gov  Lacked project management  Requirement changes: “log in to view” added at the last minute  Lacked adequate testing  Political decision to roll out

Healthcare.gov Project Management  Too little time in the schedule from the start (a system as large as Windows XP)  No prime contractor in charge  System integration by an inexperienced government agency  Project manager didn’t know there were problems: ignored a subcontractor that said it was broken

Healthcare.gov Testing  Failed a test with 200 users; still failing with a thousand (capacity was supposed to be twice that)  Bypassed their own deployment rules for security: no tests  No end-to-end testing

Quality and testing  “Errors should be found and fixed as close to their place of origin as possible.” Fagan  “Trying to improve quality by increasing testing is like trying to lose weight by weighing yourself more often.” McConnell  and more... and more...

Testing Classification  Purpose  Scope  Access  Risk-based  Structured vs Free Form

Types of Testing (for different purposes)  Functional testing (unit, integration)  Usability testing  Acceptance testing  Performance testing  Reliability testing  Conformance testing (standards)  …

Other classifications  Scope: unit, component, system, regression, …  Time: after design/coding; before (test-driven development, agile); during (ongoing)  Code visibility: ○ Black box: code treated as an input/output function; no use is made of code structure (programming by contract) ○ White box: code structure is used to determine test cases and coverage

Best (and Worst) Testing Practices (Boris Beizer)  Unit testing to 100% coverage: necessary but not sufficient for new or changed code  Testing to requirements: test to end users AND internal users  Test execution automation: not all tests can be automated  Test design automation: implies building a model; use only if you can manage the many tests  Independent test groups: not for unit and integration testing

Best (and Worst) Testing Practices (Boris Beizer)  Integration testing: at every step, not once  System testing: AFTER unit and integration testing  Stress testing: only need to do it at the start of testing; runs itself out  Regression testing: needs to be automated and frequent  Reliability testing: not always applicable; statistics skills required  Performance testing: need to consider payoff  Usability testing: only useful if done early  Beta testing: not instead of in-house testing

How important is unit test?  The Voyager bug (sent the probe into the sun).  ’90: The AT&T bug that took out 1/3 of US telephones (crash on receipt of crash notice); the DSC bug that took out the other 1/3 a few months later.  ’93: The Intel Pentium chip bug (it was software, not hardware).  ’96: The Ariane V bug: auto-destruct (data conversion).

Life Testing  Used regularly in hardware; addresses “normal use”  n specimens put to test  Test until r failures have been observed  Choose n and r to obtain the desired statistical errors  As r and n increase, statistical errors decrease  Expected time in test ≈ μ0 (r / n), where μ0 = mean time to failure
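The slide’s formula can be checked with a short sketch. The exact expectation below assumes exponential lifetimes (a standard assumption in classical life testing, not stated on the slide); the slide’s μ0 (r / n) is its approximation for small r relative to n.

```python
def expected_test_time(mu0, n, r):
    """Slide's approximation: expected time in test ~= mu0 * (r / n)."""
    return mu0 * r / n

def exact_expected_time(mu0, n, r):
    """Exact expected time until the r-th failure among n specimens with
    i.i.d. exponential lifetimes of mean mu0: mu0 * sum_{i=0}^{r-1} 1/(n-i)."""
    return mu0 * sum(1.0 / (n - i) for i in range(r))
```

For example, with mu0 = 1000 hours, n = 100 specimens, and r = 10 failures, the approximation gives 100 hours and the exact value is only slightly larger, illustrating why the approximation is acceptable when r is a small fraction of n.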

Butler and Finelli  “The Infeasibility of Experimental Quantification of Life-Critical Software Reliability”  To establish that the probability of software failure over a 10-hour mission is low enough for life-critical use, the testing required with one computer (1990s technology) is greater than 1 million years

What are you trying to test?  Basic Functionality? many techniques  Most common actions? Cleanroom (Harlan Mills)  Most likely problem areas? Risk-based testing

Risks  Identify criteria of concern: availability, quality, performance, …  Risk of it not being met ○ likelihood ○ consequences  If I’m testing code for a grocery store, what is the impact of the code failing (or down)?  What about missile guidance?  What about nuclear power plant control?
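The questions above amount to ranking candidate test areas by likelihood times consequence. A minimal sketch; the areas, likelihoods, and consequence scores below are invented for illustration:

```python
# Hypothetical risk items: (test area, likelihood of failure, consequence score)
risks = [
    ("grocery store checkout", 0.30, 3),     # fairly likely, but recoverable
    ("missile guidance", 0.01, 1000),        # rare, but catastrophic
    ("nuclear plant control", 0.005, 1000),  # rarer still, equally catastrophic
]

def prioritize(risks):
    """Rank test areas by expected impact = likelihood * consequence."""
    return sorted(risks, key=lambda item: item[1] * item[2], reverse=True)
```

With these made-up numbers the catastrophic-consequence systems outrank the grocery store despite their much lower likelihood, which is the point of weighing both factors rather than likelihood alone.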

Mills: Cleanroom  Test based on likelihood of user input  User profile: study users, determine most probable input actions and data values  Test randomly, drawing data values from the distribution (contrast monkey testing)  Means most likely inputs occur more often in testing  Finds errors most likely to happen during user executions
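A minimal sketch of profile-driven random test selection in the Cleanroom spirit, assuming a hypothetical user profile for a shopping application (the operation names and probabilities are invented for illustration):

```python
import random

# Hypothetical operational profile: probabilities from studying real users
profile = {"search": 0.70, "add_to_cart": 0.25, "checkout": 0.05}

def draw_test_sequence(length, seed=0):
    """Draw operations at random, weighted by the user profile, so the most
    likely user actions are exercised most often. Monkey testing, by
    contrast, would draw uniformly."""
    rng = random.Random(seed)
    ops = list(profile)
    weights = list(profile.values())
    return rng.choices(ops, weights=weights, k=length)
```

In a long generated sequence, "search" dominates, so the errors most likely to occur during real user executions are the ones most likely to surface in testing.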

How to identify what to test  New features  New technology  Overworked developers  Regression  Dependencies  Complexity  Bug history  Language specific bugs  Environment changes  Late changes  Slipped in “pet” features  Ambiguity  Changing requirements  Bad publicity  Liability  Learning curve  Criticality  Popularity

Four Parts of Testing  Model  Select test cases  Execute test cases  Measure

Basic Software Model  Capabilities: input, output, storage, processing  Environment: user interfaces, APIs, operating system, files (Whittaker, How to Break Software)

Test Case Selection: Environments  What happens if a file changes out from under you?  Consider all error cases from system calls ○ (e.g., you can’t get memory)  Test on different platforms: software and hardware  Test on different versions and with different languages

Test Case Selection: Capabilities  Inputs (boundary conditions, equivalence classes)  Outputs (can I generate a bad output?)  States (reverse state exploration)  Processing

Working backwards  Here’s the case I’m worried about  How could I have gotten here? ○ Different order of entry ○ Skipping initialization ○ Reverse state traversal

From the User Interface: Inputs  Error messages  Default values  Character sets and data types  Overflow input buffers  Input interactions  Repeated inputs  Unlikely inputs  How easy is it to find bugs in Word?

Questions to Ask for Each Test  How will this test find a defect?  What kind of defect?  How powerful is this test against that type of defect? Are there more powerful ones?

Test Coverage Metrics  Statement coverage  Basic block coverage  Decision coverage: each Boolean expression  Condition coverage: each entity in Boolean expressions  Path coverage: loops, branch combinations  These are “white box” methods
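Statement coverage can be measured directly in Python with the standard `sys.settrace` hook. The sketch below shows a second test raising the count of executed lines; real coverage tools (e.g., coverage.py) do this far more robustly, and the `abs_max` function is invented for illustration:

```python
import sys

def abs_max(a, b):
    if a < 0:
        a = -a          # only runs when a is negative
    if b < 0:
        b = -b          # only runs when b is negative
    return a if a > b else b

executed = set()        # line numbers of abs_max that actually ran

def tracer(frame, event, arg):
    if event == "line" and frame.f_code.co_name == "abs_max":
        executed.add(frame.f_lineno)
    return tracer

sys.settrace(tracer)
abs_max(3, 4)           # positive inputs skip both negation statements
sys.settrace(None)
one_test = len(executed)        # 3 of the 5 statements executed

sys.settrace(tracer)
abs_max(-3, -4)         # negative inputs execute the remaining statements
sys.settrace(None)
two_tests = len(executed)       # all 5 statements now covered
```

Decision and condition coverage are stricter: here, 100% statement coverage still requires both the true and false outcomes of each `if`, which is why the single positive-input test is not enough.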

Estimating how many bugs are left  Historical data  Capture-recapture model  Error seeding

Historical Data  Lots of variants based on statistical modeling  What data should be kept?  When are releases comparable?  Dangers with a good release: test forever; adversarial relation between developers and testers  Dangers with a bad release: stop too soon

Capture-recapture model  Estimate animal populations: how many deer in the forest? ○ Tag and recount ○ If all tagged, assume you’ve seen them all  Applied to software by Basin in 1973  Number of errors = |E1| × |E2| / |E1 ∩ E2|, where En = errors found by tester n  Example, 2 testers: 25 and 27 errors found, 12 in common → 25 × 27 / 12 ≈ 56 total errors  What’s wrong with this model (aside from the fact the denominator can be 0)? Assumptions about independence of testers
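The estimator on the slide is the Lincoln-Petersen formula. A short sketch using the slide’s numbers (25 and 27 errors found, 12 seen by both testers); the concrete error sets are constructed only to reproduce those counts:

```python
def capture_recapture(errors1, errors2):
    """Lincoln-Petersen estimate of the total error population from two
    independent testers' findings: |E1| * |E2| / |E1 & E2|."""
    e1, e2 = set(errors1), set(errors2)
    overlap = e1 & e2
    if not overlap:
        raise ValueError("no common errors: estimator is undefined")
    return round(len(e1) * len(e2) / len(overlap))

# Slide's example: 25 and 27 errors, 12 found by both testers
tester1 = set(range(25))        # 25 errors
tester2 = set(range(13, 40))    # 27 errors, 12 of them shared with tester1
```

The estimate of 56 exceeds the 40 distinct errors actually seen, suggesting roughly 16 errors remain; the estimate is only as good as the independence assumption the slide questions.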

Error “seeding”  Also called mutation testing  Deliberately put errors into code  Testing finds the seeded errors as well as ones put there by developers  Percentage of seeded errors found is related to percentage of “normal” errors found
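The proportionality argument on the slide can be written out directly. A minimal sketch, assuming seeded and native errors are detected at the same rate (the numbers in the usage note are invented):

```python
def estimate_native_errors(seeded, seeded_found, native_found):
    """If `seeded_found` of `seeded` planted errors were detected, alongside
    `native_found` original errors, assume equal detection rates:
    estimated native total ~= native_found * seeded / seeded_found."""
    if seeded_found == 0:
        raise ValueError("no seeded errors found; detection rate unknown")
    return native_found * seeded / seeded_found
```

For example, finding 20 of 100 seeded errors (a 20% detection rate) together with 10 native errors suggests about 50 native errors in total, i.e., roughly 40 still latent.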

Usability Testing  Frequency with which the problem occurs Common or rare?  Impact of the problem if it occurs Easy or difficult to overcome?  Persistence of the problem One-time problem or repeated?

Wizard of Oz Testing  Inputs and outputs are as expected  How you get between the two is “anything that works”  Particularly useful when you have ○ An internal interface ○ UI choices to make  Example: “Children’s Intuitive Gestures in Vision-Based Action Games,” CACM, Jan. 2005, vol. 48, no. 1, p. 47

Number of Usability Test Users Needed  Usability problems found = N(1 − (1 − L)^n)  N = total number of usability problems in the design  L = proportion of usability problems discovered by a single user  n = number of users  L = 31%

Using as an Estimator  Found 100 problems with 10 users  Assumption: each user finds 10% of problems (L = 0.1)  How many are left?  found = N(1 − (1 − L)^n), so 100 = N(1 − (1 − 0.1)^10)  N = 100 / (1 − 0.9^10) ≈ 154  About 54 left
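The slide’s arithmetic, as a small sketch:

```python
def total_problems(found, L, n):
    """Invert found = N * (1 - (1 - L)**n) to estimate the total N."""
    return found / (1 - (1 - L) ** n)

# Slide's numbers: 100 problems found by 10 users, each finding 10%
estimate = total_problems(100, 0.10, 10)   # about 154 problems in total
remaining = estimate - 100                 # about 54 still unfound
```

The same function reproduces the previous slide’s point about diminishing returns: each additional user rediscovers mostly known problems, so the denominator grows slowly once n is large.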

Unit Test  What do they do? ○ Regression testing framework ○ Incrementally build test suites ○ Typically language specific  What is available? ○ JUnit most well known (Java) ○ SUnit was the first (Smalltalk) ○ xUnit where x is most every language known ○ Eclipse has unit test plugins for many languages
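The xUnit pattern looks much the same in every language. A minimal sketch using Python’s built-in `unittest`; the `slugify` function under test is invented for illustration:

```python
import unittest

def slugify(text):
    """Toy function under test: 'Hello World' -> 'hello-world'."""
    return "-".join(text.lower().split())

class SlugifyTest(unittest.TestCase):
    """An xUnit-style fixture: each test_* method is an independent case."""

    def test_lowercases(self):
        self.assertEqual(slugify("Hello"), "hello")

    def test_joins_words_with_hyphens(self):
        self.assertEqual(slugify("Hello World"), "hello-world")
```

Running `python -m unittest` discovers and executes the suite automatically, which is what makes these frameworks natural building blocks for the automated regression testing discussed on the next slide.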

Regression Test  Automated  Run with every build  Issues Time GUI based  Tools Wide range of capabilities and quality Avoid pixel-based tools Random vs. scripted testing

GUI Testing  Has improved  Position-based -> DOM-based for the web  Run through the steps and it remembers  One list

Performance Test Tools  What do they do? Orchestrate test scripts Simulate heavy loads  What is available? JMeter (Apache project) Grinder (Java framework)

Stress Testing  A kind of performance testing  Determines the robustness of software by testing beyond the limits of normal operation  Particularly important for "mission critical" software, where failure has high costs  “stress” makes most sense for code with timing constraints… such as servers that are made to handle, say, 1000 requests per minute, etc.  Tests demonstrate robustness, availability, and error handling under a heavy load

Fuzz Testing, or Fuzzing  Involves providing invalid, unexpected, or random data as inputs of a program  Program is monitored for exceptions such as ○ Crashes ○ Failing built-in code assertions ○ Memory leaks  Fuzzing is commonly used to test for security problems in software or computer systems
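A minimal random fuzzer in the spirit of the slide. The `parse_version` target and its missing-dot bug are invented for illustration; real fuzzers (AFL, libFuzzer) add coverage guidance and input mutation:

```python
import random
import string

def parse_version(s):
    """Toy target: parse 'X.Y' into (major, minor)."""
    parts = s.split(".")
    major, minor = parts[0], parts[1]   # bug: assumes a '.' is present
    return int(major), int(minor)

def fuzz(target, trials=500, seed=1):
    """Feed random printable strings to `target`, recording any exception
    other than the ValueError it is documented to raise on bad input."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(trials):
        s = "".join(rng.choice(string.printable)
                    for _ in range(rng.randint(0, 8)))
        try:
            target(s)
        except ValueError:
            pass                        # expected rejection of bad input
        except Exception as exc:        # unexpected crash: a fuzzing find
            crashes.append((s, exc))
    return crashes
```

Most random strings contain no '.', so the fuzzer quickly exposes the unhandled IndexError, exactly the kind of unexpected exception the slide says to monitor for.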

Other Test Tools  Tons of test tools  Web site tools

Reaction to the “Integration Phase”  Taken from Extreme Programming  Martin Fowler, Continuous Integration

 Maintain a single source repository  Automate the build  Make your build self-testing  Every commit builds on an integration machine  Keep the build fast  Test in a clone of the production environment  Make it easy to get the latest executable  Everyone can see what’s happening  Automate deployment

Continuous Testing  Automate unit tests  Automate component tests  Automate system tests  Automate functional tests  Categorize developer tests  Run faster tests first  Write tests for defects  Make component tests repeatable

Tools  Jenkins Open source Came from Oracle Cloudbee  What does it do? Integrate repository and build engine Runs tests and publishes results

Other Ways of Improving Quality  Reviews and inspections  Formal specification  Program verification and validation  Self-checking (paranoid) code  Deploy with capabilities to repair

Formal Methods and Specifications  Mathematically-based techniques for describing system properties  Used in inference systems ○ Do not require executing the program ○ Prove something about the specification not already stated ○ Formal proofs; mechanizable ○ Examples: theorem provers and proof checkers

Uses of Specifications  Requirements analysis: rigor  System design: decomposition, interfaces  Verification: specific sections  Documentation  System analysis and evaluation: reference point, uncovering bugs

Examples of Formal Methods  Abstract data types: algebras, theories, and programs ○ VDM (Praxis: UK civil aviation display system CDIS) ○ Z (Oxford and IBM: CICS) ○ Larch (MIT)  Concurrent and distributed systems: state or event sequences, transitions ○ Hoare’s CSP ○ Transition axioms ○ Lamport’s Temporal Logic  Programming languages!

Self Checking Code  Exploit redundancy  Run multiple copies of the code; vote on critical results and decisions  Identify an erroneous system via its disagreement  Develop functionally identical versions with different code compositions, by different teams  Perhaps use different hardware hosts  Used on the Space Shuttle

Self-Repairing Code  ClearView  DARPA Cyber Grand Challenge  HP BIOSphere