Predicting problems caused by component upgrades


Predicting problems caused by component upgrades

Michael Ernst, MIT Lab for Computer Science
http://pag.lcs.mit.edu/~mernst/
Joint work with Stephen McCamant

This is a technique for helping programmers cope with inadequately documented programs. Questions and comments are welcome, now or in individual meetings; this talk is only a summary and cannot cover every aspect of the research. (Daikon: an Asian radish.)

An upgrade problem

- You are a happy user of Stucco Photostand
- You install Microswat Monopoly
- Photostand stops working. Why?
- Step 2 upgraded winulose.dll
- Photostand is not compatible with the new version

Outline

- The upgrade problem
- Solution: compare observed behavior
- Capturing observed behavior
- Comparing observed behavior (details)
- Example: sorting and swap
- Case study: Currency
- Conclusion

Upgrade safety

- System S uses component C
- A new version C' is released
- Might C' cause S to misbehave? (This question is undecidable.)

Previous solutions

- Integrate the new component, then test: resource-intensive
- Vendor tests the new component: impossible to anticipate all uses, and the user, not the vendor, must make the upgrade decision (we require this)
- Static analysis to guarantee identical or subtype behavior: difficult and inadequate

Behavioral subtyping

- Subtyping guarantees type compatibility, but gives no information about behavior
- Behavioral subtyping [Liskov 94] guarantees behavioral compatibility: provable properties of the supertype are provable about the subtype
- It operates on human-supplied specifications
- It is ill-matched to the component upgrade problem

Behavioral subtyping is too strong and too weak

- Too strong: it is OK to change APIs that the application does not call, or other aspects of APIs that are not depended upon
- Too weak: the application may depend on implementation details
  - Example: component version 1 returns elements in order; the application depends on that detail; component version 2 returns elements in a different order
  - Who is at fault in this example? It doesn't matter!

A minimal Java sketch of this kind of hidden dependence appears below.
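The sketch below illustrates an implementation detail an application can silently depend on. The class and method names, and the LinkedHashSet-vs-HashSet contrast, are illustrative assumptions, not taken from the talk:

    import java.util.*;

    public class OrderDependence {
        // Version 1 of the component happens to return elements in insertion
        // order; this is an implementation detail, not a documented promise.
        static Collection<String> itemsV1() {
            return new LinkedHashSet<>(List.of("first", "second", "third"));
        }

        // Version 2 satisfies the same documented contract, but its
        // iteration order is unspecified.
        static Collection<String> itemsV2() {
            return new HashSet<>(List.of("first", "second", "third"));
        }

        public static void main(String[] args) {
            // An application that prints a report in "the" order silently
            // depends on version 1's detail and may break on upgrade,
            // even though neither party violated the specification.
            System.out.println(String.join(", ", itemsV1()));
            System.out.println(String.join(", ", itemsV2()));
        }
    }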

Outline (next: Solution: compare observed behavior)

Features of our solution

- Application-specific
- Can warn before integrating and testing
- Minimal disruption to the development process
- Requires no source code
- Requires no formal specification
- Warns regardless of who is at fault
- Accounts for internal and external behaviors
- Caveat emptor: no guarantee of (in)compatibility!

Run-time behavior comparison

- Compare run-time behaviors of components:
  - the old component, in the context of the application
  - the new component, in the context of the vendor test suite
- Compatible if the vendor tests all the functionality that the application uses
- Consider this a comparison of test suites: "behavioral subtesting"
- Vendor tests must produce the same results, of course!

Reasons for behavioral differences

Differences between the application's and the test suite's use of the component require human judgment:
- True incompatibility
- The change in behavior might not affect the application
- The change in behavior might be a bug fix
- The vendor test suite might be deficient
- It may be possible to work around the incompatibility

Operational abstraction

- An abstraction of the run-time behavior of a component
- A set of program properties: mathematical statements about the component's behavior
- Syntactically identical to a formal specification

Outline (next: Capturing observed behavior)

Dynamic invariant detection

- Goal: recover invariants from programs
- Technique: run the program, examine the values it computes
- Artifact: Daikon, http://pag.lcs.mit.edu/daikon
- Experiments demonstrate accuracy and usefulness

Goal: recover invariants

Detect invariants (as in asserts or specifications):
- x > abs(y)
- x = 16*y + 4*z + 3
- array a contains no duplicates
- for each node n, n = n.child.parent
- graph g is acyclic
- if ptr ≠ null then *ptr > i

Uses for invariants

- Write better programs [Gries 81, Liskov 86]
- Document code
- Check assumptions: convert to assert
- Maintain invariants to avoid introducing bugs
- Locate unusual conditions
- Validate test suites: value coverage
- Provide hints for higher-level profile-directed compilation [Calder 98]
- Bootstrap proofs [Wegbreit 74, Bensalem 96]

Notes: this is a laundry list, meant to motivate why one would care about invariants. Programs are better when invariants are used in their design, but the focus here is on already-constructed programs. Documentation generated this way is guaranteed to be up to date, and in assert form it is machine-checkable. Invariants established at one point may be depended on elsewhere, and maintenance is known to introduce many bugs, some (anecdotally) by violating invariants. The last four uses are arguably better served by dynamic than by static invariants: unusual or exceptional conditions may indicate bugs or special cases worth a programmer's attention; value coverage may be inadequate even when a test suite covers every line (for example, only small values); invariants can drive profile-directed compilation at a higher, possibly more valuable level (cf. "Value Profiling and Optimization" by Calder et al.); and for bootstrapping proofs, many in theorem proving consider it harder to determine what to prove than to do the proof.
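As a sketch of "convert to assert": a property reported at a procedure entry can be checked on every later run. The method below is hypothetical, wired up with two of the example invariants from the previous slide:

    // Hypothetical method; suppose the detector reported the entry
    // invariants "ptr != null" and "x > abs(y)".
    static int scale(int x, int y, int[] ptr) {
        assert ptr != null : "violated detected invariant: ptr != null";
        assert x > Math.abs(y) : "violated detected invariant: x > abs(y)";
        return x * ptr[0] + y;
    }

(Run with java -ea so the assertions are enabled.)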

Ways to obtain invariants

- Programmer-supplied
- Static analysis: examine the program text [Cousot 77, Gannod 96]
  - properties are guaranteed to be true
  - pointers are intractable in practice
- Dynamic analysis: run the program
  - complementary to static techniques

Notes: (joke) I have heard that some programs lack exhaustively specified invariants. Programmers have invariants in mind, consciously or unconsciously, when they write code, and we want access to that hidden part of the design space. Programmer-supplied invariants may need to be checked, refined, or strengthened, and the program may be modified or maintained over time (Leveson has noted that program self-checks are often ineffective). For static analysis, such as abstract interpretation: the results of a conservative, sound analysis are guaranteed to be true, but such techniques cannot report true-but-undecidable properties, and pointers make it problematic to represent program state and the heap, forcing approximations and too-weak results. Dynamic techniques are complementary to static ones.

Dynamic invariant detection

Look for patterns in the values the program computes:
1. Instrument the program to write data trace files
2. Run the program on a test suite
3. The invariant engine reads the data traces, generates potential invariants, and checks them

All of these steps are automatic; the instrumenters are source-to-source translators.
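A minimal sketch of what step 1 might produce. The trace format and helper below are invented for illustration; Daikon's actual front ends and trace format differ:

    class Instrumented {
        // Original body was just "return x + 1;". The instrumenter, a
        // source-to-source translator, inserts trace calls at entry and exit.
        static int inc(int x) {
            trace("inc:::ENTER", "x", x);
            int result = x + 1;
            trace("inc:::EXIT", "x", x);
            trace("inc:::EXIT", "return", result);
            return result;
        }

        // Writes one "program point, variable, value" record per call.
        static void trace(String programPoint, String var, long value) {
            System.out.printf("%s %s %d%n", programPoint, var, value);
        }

        public static void main(String[] args) {
            for (int i = 0; i < 5; i++) inc(i);  // stand-in for a test suite
        }
    }

The invariant engine reads such records and, for this trace, could report a property like "return == x + 1" at inc's exit.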

Checking invariants

For each potential invariant:
1. instantiate it (determine constants like a and b in y = ax + b)
2. check it against each set of variable values
3. stop checking when falsified

This is inexpensive: all three steps are cheap, there are many invariants but each one is cheap, and most checks fail quickly. One can think of a property as a generalization from the first few values, then checked against the rest: y = ax + b has two degrees of freedom, so two (x, y) pairs uniquely determine a and b. (The implementation does not do it exactly that way, for reasons of numerical stability, but that is the idea.)
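A sketch of that scheme for the linear pattern y = ax + b, assuming exact integer data; the real tool handles numerical stability and many more invariant patterns:

    class LinearInvariant {
        // Instantiate y = ax + b from the first two observed (x, y) pairs,
        // then check the remaining observations; return the invariant as a
        // string, or null as soon as it is falsified.
        static String detect(long[][] pairs) {
            if (pairs.length < 2 || pairs[1][0] == pairs[0][0]) return null;
            // Instantiate: two points determine the two degrees of freedom.
            long dx = pairs[1][0] - pairs[0][0];
            long dy = pairs[1][1] - pairs[0][1];
            if (dy % dx != 0) return null;      // no integer slope
            long a = dy / dx;
            long b = pairs[0][1] - a * pairs[0][0];
            // Check: stop at the first falsifying observation.
            for (long[] p : pairs) {
                if (p[1] != a * p[0] + b) return null;
            }
            return "y == " + a + " * x + " + b;
        }

        public static void main(String[] args) {
            long[][] obs = {{0, 3}, {1, 19}, {2, 35}, {5, 83}};
            System.out.println(detect(obs));    // prints: y == 16 * x + 3
        }
    }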

Improving invariant detection

- Add desired invariants: implicit values, unused polymorphism (overly wide types)
- Eliminate undesired invariants: unjustified properties, redundant invariants, incomparable variables
- Traverse recursive data structures
- Conditionals: compute invariants over subsets of the data (properties of the form "if x > 0 then …")

Notes: this is all completed work. The simplified picture given so far would not produce all the output in the initial examples, and a naive implementation would also produce undesirable output. The improvements are qualitative as much as quantitative, so an overall improvement figure is hard to state.

Outline (next: Comparing observed behavior)

Testing upgrade compatibility

1. The user computes the operational abstraction of the old component, in the context of the application (OAapp)
2. The vendor computes the operational abstraction of the new component, over its test suite (OAtest)
3. The vendor supplies the operational abstraction along with the new component
4. The user compares the two operational abstractions: OAapp for the old component, OAtest for the new component

New operational abstraction must be stronger

- Approximate test: OAtest ⇒ OAapp
- An operational abstraction consists of a precondition (properties observed at entry) and a postcondition (properties observed at exit)
- Per behavioral subtyping:
  - Preapp ⇒ Pretest
  - Posttest ⇒ Postapp
- This is sufficient, but not necessary

Comparing operational abstractions

- Sufficient but not necessary: Preapp ⇒ Pretest and Posttest ⇒ Postapp
- Sufficient and necessary: Preapp ∧ Posttest ⇒ Postapp

Example (an increment routine; x' denotes the value of x after the call): the application establishes that x is even, and the test suite observed only that x is an integer, so Preapp ⇒ Pretest holds. The test suite's postcondition is x' = x + 1; the application needs x' to be odd. Posttest alone does not imply Postapp, but Preapp ∧ Posttest does: x even ∧ x' = x + 1 ⇒ x' odd.
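A quick brute-force sanity check of the second implication for the increment example, over a finite range. This is only an illustration; the system discharges such implications with a theorem prover, not by enumeration:

    public class ImplicationCheck {
        public static void main(String[] args) {
            boolean holds = true;
            for (int x = -1000; x <= 1000; x++) {
                boolean preApp = (x % 2 == 0);       // Pre_app: x is even
                int xPrime = x + 1;                  // Post_test: x' = x + 1
                boolean postApp = (xPrime % 2 != 0); // Post_app: x' is odd
                if (preApp && !postApp) holds = false;
            }
            System.out.println("Pre_app && Post_test => Post_app: " + holds);
        }
    }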

Outline (next: Example: sorting and swap)

Sorting application

    // Sort the argument into ascending order
    static void bubble_sort(int[] a) {
        for (int x = a.length - 1; x > 0; x--) {
            // Compare adjacent elements in a[0..x]
            for (int y = 0; y < x; y++) {
                if (a[y] > a[y+1])
                    swap(a, y, y+1);
            }
        }
    }

Note that bubble_sort only ever calls swap with j == i + 1.

Swap component

    // Exchange the two array elements at i and j
    static void swap(int[] a, int i, int j) {
        int temp = a[i];
        a[i] = a[j];
        a[j] = temp;
    }

There is a "serious" problem with this implementation: it uses extra space!

Upgrade to swap component

    // Exchange the two array elements at i and j
    static void swap(int[] a, int i, int j) {
        a[i] ^= a[j];
        a[j] ^= a[i];
        a[i] ^= a[j];
    }

This upgrade uses less temporary space!

Compare abstractions

From the bubble_sort application (old swap):
  Precondition:
    a != null
    0 <= i < size(a[])-1
    1 <= j <= size(a[])-1
    i < j
    j == i + 1
    a[i] == a[j-1]
    a[i] > a[j]
  Postcondition:
    a'[i] == a[j]
    a'[j] == a[i]
    a'[i] == a'[j-1]
    a'[j] == a[j-1]
    a'[i] < a'[j]

From the swap test suite (new swap):
  Precondition:
    a != null
    0 <= i <= size(a[])-1
    0 <= j <= size(a[])-1
    i != j
  Postcondition:
    a'[i] == a[j]
    a'[j] == a[i]

Check 1: Preapp ⇒ Pretest. The application's precondition (in particular j == i + 1) implies the test suite's precondition (in particular i != j), so this check holds.

Check 2: Preapp ∧ Posttest ⇒ Postapp. Together with the application's precondition (for example, j == i + 1, a[i] == a[j-1], and a[i] > a[j]), the test suite's postcondition implies each application postcondition, so this check holds too.

Upgrade succeeds.

Another sorting application

    // Sort the argument into ascending order
    static void selection_sort(int[] a) {
        for (int x = 0; x <= a.length - 2; x++) {
            // Find the smallest element in a[x..]
            int min = x;
            for (int y = x; y < a.length; y++) {
                if (a[y] < a[min])
                    min = y;
            }
            swap(a, x, min);
        }
    }

Note that selection_sort may call swap with i == j, when a[x] is already the minimum.

Compare abstractions

From the selection_sort application (old swap):
  Precondition:
    a != null
    0 <= i < size(a[])-1
    i <= j <= size(a[])-1
    a[i] >= a[j]
  Postcondition:
    a'[i] == a[j]
    a'[j] == a[i]
    a'[i] <= a'[j]

From the swap test suite (new swap):
  Precondition:
    a != null
    0 <= i <= size(a[])-1
    0 <= j <= size(a[])-1
    i != j
  Postcondition:
    a'[i] == a[j]
    a'[j] == a[i]

Check 1: Preapp ⇒ Pretest fails. The application allows i == j, but the test suite's precondition requires i != j, so the vendor tests say nothing about that case. (Indeed, the XOR implementation zeroes the element when i == j.)

Upgrade fails.

Outline (next: Case study: Currency)

Currency case study

- Application: Math-Currency
- Component: Math-BigInt (versions 1.40 and 1.42)
- Both from the Comprehensive Perl Archive Network
- Our technique is needed: a wrong version of BigInt induces two errors in Currency

Downgrade from BigInt 1.42 to 1.40

- (Why downgrade? To fix bugs, or for porting.)
- An inconsistency is discovered:
  - In 1.42, bcmp returns -1, 0, or 1
  - In 1.40, bcmp returns any integer
- Do not downgrade without further examination: the application might do (a <=> b) == (c <=> d)
- (This change is not reflected in the documentation.)
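A Java analogue of why this return-value change matters (illustrative only; the talk's components are Perl). Comparators, like bcmp in 1.40, promise only the sign of the result, yet callers sometimes compare raw results:

    import java.util.*;

    public class CompareResults {
        public static void main(String[] args) {
            Comparator<String> c = String::compareTo;
            // The contract promises only negative / zero / positive,
            // not specific magnitudes such as -1, 0, 1.
            int r1 = c.compare("a", "c");   // might be -2 on this implementation
            int r2 = c.compare("a", "b");   // might be -1
            // Fragile: holds only if results are normalized to -1/0/1,
            // as in BigInt 1.42's bcmp but not 1.40's.
            System.out.println(r1 == r2);
            // Robust: compare signs instead.
            System.out.println(Integer.signum(r1) == Integer.signum(r2));
        }
    }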

Upgrade from BigInt 1.40 to 1.42

- Inconsistency:
  - In 1.40, bcmp($1.67, $1.75) → 0
  - In 1.42, bcmp($1.67, $1.75) → -1
- Our system did not discover this property directly, but it discovered differences in the behavior of other components that interacted with it
- Do not upgrade without further examination

Notes: the change was in the mantissa handling during the right-shift that truncated to whole-dollar values. We don't know whether this change introduces or fixes a bug.

Outline (next: Conclusion)

Getting to Yes: limits of the technique

- Rejecting an upgrade is easier than approving it: application postconditions may be hard to prove
- One can check the reason for a rejection; the key problem is the limits of the theorem prover
- Adjusting the grammar of operational abstractions can help: stronger or weaker properties may be provable, and weak properties may depend on strong ones

Implementation status

- Operational abstractions are generated automatically (by the Daikon invariant detector)
- Operational abstractions are compared automatically (by the Simplify theorem prover); this requires a background theory for each property
- In the Currency case study, the operational abstractions were compared by hand

Contributions

A new technique for early detection of upgrade problems: compare the run-time behavior of old and new components. The technique is:
- Application-specific
- Pre-integration
- Lightweight
- Source-free
- Specification-free
- Blame-neutral
- Output-independent
- Unvalidated

Questions?