Program Analysis via Satisfiability Modulo Path Programs


Program Analysis via Satisfiability Modulo Path Programs
William Harris, Sriram Sankaranarayanan, Franjo Ivančić, Aarti Gupta
POPL 2010
-Designing reliable software remains an open and key problem.

Assertions as Specifications
Lightweight
Often automatic from semantics:
  Null-pointer dereferences
  Buffer overflows
-Developing reliable software remains an open but important challenge.
-Assertions are a useful mechanism for specification.
-However, checking property assertions for real-world programs is hard. In our experiments, we found that real-world programs with tens of thousands of lines of code often have thousands of properties that can’t be proven using flow-sensitive analysis.

proc foo(int* p, int pLen, int mode)
  assume (pLen > 1);
  int off, L := 1, bLen := 0;
  if (p = NULL) pLen := -1;
  if (mode = 0) off := 1; else off := 0;
  while (L <= pLen)
    bLen := L - off;
    L := 2 * L;
  assert(!p || bLen <= pLen);
-To illustrate this challenge, I’ll use the following running example.
-This toy program models typical code that manipulates some buffers, after the code has been instrumented with length variables that track the buffer lengths.
-Don’t worry about all of the details of the program at this point. For now, just observe that it contains a pointer variable p and the buffer length variables pLen and bLen. Correctness is specified as an assertion that, by the end, either p is null or bLen <= pLen.
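The running example can be transliterated into Python and checked on a bounded set of inputs. This is only a sketch: the names `foo` and `check_all`, the bound `max_len`, and the use of `None` for a null pointer are assumptions made for illustration, and bounded testing is not a proof of the assertion.

```python
def foo(p, p_len, mode):
    """Run the toy program and report whether its final assertion holds."""
    assert p_len > 1                 # assume (pLen > 1)
    L, b_len = 1, 0
    if p is None:                    # if (p = NULL) pLen := -1
        p_len = -1
    off = 1 if mode == 0 else 0      # branch on mode
    while L <= p_len:
        b_len = L - off
        L = 2 * L
    # assert(!p || bLen <= pLen)
    return (p is None) or (b_len <= p_len)


def check_all(max_len=64):
    """Check the assertion for every input up to a small, arbitrary bound."""
    return all(
        foo(p, n, mode)
        for p in (None, object())    # null and non-null pointer
        for n in range(2, max_len + 1)
        for mode in (0, 1)
    )


print(check_all())
```

The four combinations of `p` and `mode` correspond to the four branch choices made before the loop is reached.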

Path Program: Left Branch, Left Branch
[Figure: CFG of foo. Edges: L := 1, bLen := 0, pLen >= 1; p = 0 / p != 0; pLen := -1; mode = 0 / mode != 0; off := 0 / off := 1; loop edges L <= pLen, bLen := L - off, L := L * 2; exit edge L > pLen; error edge p != 0 && bLen > pLen. Invariants shown along the path program: L = 1 ∧ bLen = 0 ∧ pLen ≥ 1; … ∧ p = 0; … ∧ p = 0 ∧ pLen = -1; … ∧ p = 0 ∧ mode = 0; … ∧ p = 0; False at the error node.]
-To try to prove that the assertion is always valid, we can represent the program using this CFG, where the red node denotes a state in which the assertion fails. We can then analyze the program using abstract interpretation over a relational domain, and if this analysis determines that the invariant “False” holds at the red node, then the assertion will never be violated.
-However, if we apply a flow-sensitive analysis, then after the first merge point the analysis can no longer show that p is NULL or that bLen <= pLen, and so it fails to prove the assertion. So a degree of path sensitivity is required.
-Now, instead of focusing on individual paths, our approach focuses on path programs. All that a path program is …
-Each path in this program lies in one of four path programs, each corresponding to different branches taken before reaching the loop.
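The loss of precision at the merge point can be sketched with a deliberately crude abstract domain. Here an abstract state is a set of facts read as a conjunction, and the join keeps only the facts common to both branches. This intersection-based join is an assumption made for illustration; it is much simpler than the relational domain mentioned in the talk.

```python
def join(left, right):
    """Join two abstract states: keep only the facts that hold on both branches."""
    return left & right


# State at the branch on p, as shown in the CFG figure.
entry = frozenset({"L = 1", "bLen = 0", "pLen >= 1"})

# Left branch: assume p = 0, then pLen := -1 (which invalidates pLen >= 1).
after_left = frozenset({"L = 1", "bLen = 0", "p = 0", "pLen = -1"})

# Right branch: assume p != 0, pLen is untouched.
after_right = entry | {"p != 0"}

merged = join(after_left, after_right)
print(sorted(merged))
```

After the join, nothing about p or pLen survives, which is why a flow-sensitive analysis alone cannot establish the assertion here.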

Path Program: Left Branch, Right Branch
[Figure: CFG of foo. Invariants shown along the path program: L = 1 ∧ bLen = 0 ∧ pLen ≥ 1; … ∧ p = 0; … ∧ p = 0 ∧ mode ≠ 0; … ∧ p = 0; False at the error node.]
-If we analyze the left branch, right branch path program, then we prove the invariant False at the error block, intuitively also because the path program contains the edge that assumes p = 0.

Path Program: Right Branch, Left Branch
[Figure: CFG of foo. Invariants shown along the path program: L = 1 ∧ bLen = 0 ∧ pLen ≥ 1 before the loop; bLen ≤ pLen at the loop; False at the error node.]
-Are these the right invariants at each program point?

Path Program: Right Branch, Right Branch
[Figure: CFG of foo. Invariants shown along the path program: L = 1 ∧ bLen = 0 ∧ pLen ≥ 1; L = 1 ∧ bLen = 0 ∧ p ≠ 0; L = 1 ∧ bLen = 0 ∧ pLen ≥ 1; bLen ≤ pLen at the loop; False at the error node.]
-Step through the path program of right branch, then right branch.

Control-Flow Abstraction
[Figure: CFG of foo with a set of edges highlighted in green; the invariant bLen ≤ pLen is shown after off := 1 and at the loop; False at the error node.]
-The traditional approach to this is to use predicate abstraction. In that case, we would take the predicates from the invariants found and use them to refine an abstract state space. So the predicates over state are used to generalize the proof for one path program. But in many cases, the abstract state space can grow greatly as predicates are added.
-Another scheme that has been proposed recently is control-flow abstraction. In control-flow abstraction, we use the inductive invariants to find a set of edges such that any path program that contains the set must be safe. We then use this fact to select the next path program to analyze. In the example, such a set is denoted by the green edges.

Key Issues
Need: an abstraction-refinement scheme that allows us to analyze a small set of path programs and generalize their proofs.
-Up till now, the key challenge with control-flow abstraction was to find an abstraction-refinement scheme that allows us to efficiently rule out multiple path programs at once. And that’s the main problem that this work aims to solve.

Road Map
Satisfiability Modulo Path Programs (SMPP)
Experimental Evaluation

Abstraction
Encode unproven path programs as a propositional formula.
Query a SAT solver for a solution.
From the solution, extract an unverified path program.
-Our abstraction takes advantage of the fact that modern SAT solvers are very efficient in practice.
-I’ll first describe how we construct our initial abstraction, and then describe how we refine the abstraction using the analysis of each path program.

Propositional Variables for Edges
[Figure: decomposition graph of the CFG with edges labeled q0 through q9.]
-We first collapse every loop in the original CFG into a single node to get a graph that is the strongly connected component decomposition of the original CFG. Every path in the resulting graph naturally corresponds to a path program, and we now construct a propositional formula that corresponds to these path programs.
-For every edge in the decomposition graph, we generate a fresh propositional variable. If a set of variables is true in a solution to our abstraction formula, then we want the corresponding edges to form a valid, potentially unsafe path program that doesn’t contain any branches. We do this by building up our abstraction formula from the following constraints.

Encoding Path Programs
CFG Form       Depiction   Encoding
entry edges    q0          q0 = True
error edges    q9          q9 = True
-Our initial abstraction formula is then a conjunction of the following constraints. We want a potentially unsafe path program to start at the entry node, so we constrain the propositional variable for one of the edges leaving the initial node of the CFG to be true. Similarly, we want a potentially unsafe path program to end at the error node, so we constrain the propositional variable for one of the edges entering the error node to be true.

Encoding Path Programs
q3 → exactlyOne(q4, q5)
q2 → exactlyOne(q4, q5)
q4 → exactlyOne(q3, q2)
q5 → exactlyOne(q3, q2)
[Figure: a CFG node with incoming edges q3, q2 and outgoing edges q4, q5.]
-Kirchhoff’s laws

Initial Abstraction of Example
Λ := q0 ∧ (q0 ↔ exactlyOne(q1, q2)) ∧ (q1 ↔ q3) ∧ (exactlyOne(q3, q2) ↔ exactlyOne(q4, q5)) ∧ (q4 ↔ q6) ∧ (q5 ↔ q7) ∧ (exactlyOne(q6, q7) ↔ q8) ∧ (q8 ↔ q9) ∧ q9
-Give the propositional formula for a path program to be well-formed.
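The initial abstraction of the example can be written down directly and solved by enumeration. The sketch below replaces the SAT solver with a brute-force search over all assignments to q0 through q9, which is only reasonable because the example has ten variables; the function names are assumptions made for illustration.

```python
from itertools import product


def exactly_one(a, b):
    """True when exactly one of the two edge variables is set."""
    return a != b


def abstraction(q):
    """The initial abstraction formula over edge variables q[0]..q[9]."""
    return (
        q[0]
        and q[0] == exactly_one(q[1], q[2])
        and q[1] == q[3]
        and exactly_one(q[3], q[2]) == exactly_one(q[4], q[5])
        and q[4] == q[6]
        and q[5] == q[7]
        and exactly_one(q[6], q[7]) == q[8]
        and q[8] == q[9]
        and q[9]
    )


def solutions(formula, n=10):
    """Enumerate all satisfying assignments as sets of true edge names."""
    found = []
    for bits in product([False, True], repeat=n):
        if formula(bits):
            found.append({f"q{i}" for i, bit in enumerate(bits) if bit})
    return found


path_programs = solutions(abstraction)
for edges in path_programs:
    print(sorted(edges))
```

Each satisfying assignment selects one branch at each of the two decisions before the loop, so the solutions correspond to the four path programs of the example.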

A Path Program from a SAT Solution
[Figure: decomposition graph with edges q0 through q9; edges q0, q2, q5, q7, q8, and q9 are highlighted as the extracted path program.]
TODO: get this slide right (currently is a copy of the encoding slide)
-Extracting a path program from the SAT solution.

Refinement
Apply a program analysis oracle to determine the safety of the path program.
If safe, then encode safety in the abstraction.
-The question now is how to take a safety proof of a single path program and encode this information in the abstraction, ideally in such a way that the safety proof of one path program actually proves the safety of multiple path programs.

Apply Analysis Oracle: Naïve
[Figure: CFG of foo. Invariants shown along the right branch, right branch path program: L = 1 ∧ bLen = 0 ∧ pLen ≥ 1; L = 1 ∧ bLen = 0 ∧ p ≠ 0; L = 1 ∧ bLen = 0 ∧ pLen ≥ 1; bLen ≤ pLen at the loop; False at the error node.]
TODO: consider cutting if short on time

Prop. Encoding: Naïve
[Figure: decomposition graph with edges q0 through q9.]
-Step through the path program of right branch, then right branch.

Naïve Blocking Clause
Λ := Λ ∧ ¬(q0 ∧ q2 ∧ q5 ∧ q7 ∧ q8 ∧ q9)
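The naïve refinement loop can be sketched as follows: query the abstraction for a path program, ask an oracle whether it is safe, and if so conjoin a blocking clause that rules out exactly that path program. The brute-force `find_solution` stands in for a SAT solver, and the `oracle` that accepts every path program is a placeholder assumption, not the analysis used in the talk.

```python
from itertools import product


def exactly_one(a, b):
    return a != b


def initial_abstraction(q):
    """Well-formedness formula for path programs of the running example."""
    return (
        q[0]
        and q[0] == exactly_one(q[1], q[2])
        and q[1] == q[3]
        and exactly_one(q[3], q[2]) == exactly_one(q[4], q[5])
        and q[4] == q[6]
        and q[5] == q[7]
        and exactly_one(q[6], q[7]) == q[8]
        and q[8] == q[9]
        and q[9]
    )


def find_solution(constraints, n=10):
    """Stand-in for a SAT query: return the true edge indices, or None."""
    for bits in product([False, True], repeat=n):
        if all(c(bits) for c in constraints):
            return frozenset(i for i, bit in enumerate(bits) if bit)
    return None


def blocking_clause(edges):
    """Build the clause not(q_i and q_j and ...) for the given edges."""
    return lambda q: not all(q[i] for i in edges)


def smpp_naive(oracle):
    """Refine until no unverified path program remains or the oracle fails."""
    constraints = [initial_abstraction]
    analyzed = []
    while (solution := find_solution(constraints)) is not None:
        analyzed.append(solution)
        if not oracle(solution):
            return False, analyzed       # could not prove this path program
        constraints.append(blocking_clause(solution))
    return True, analyzed


safe, analyzed = smpp_naive(lambda edges: True)
print(safe, len(analyzed))
```

With naïve blocking clauses each proof rules out only the path program that was analyzed, so all four path programs of the example are visited before the abstraction becomes unsatisfiable.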

Apply Analysis Oracle: Local Repair
[Figure: CFG of foo. Invariants shown along the right branch, right branch path program: L = 1 ∧ bLen = 0 ∧ pLen ≥ 1; L = 1 ∧ bLen = 0 ∧ pLen ≥ 1 ∧ p ≠ 0; L = 1 ∧ bLen = 0 ∧ pLen ≥ 1 ∧ p ≠ 0 ∧ mode ≠ 0; bLen ≤ pLen ∧ off = 1 at the loop; False at the error node.]
TODO: make invariants consistent
-Suppose that we’ve applied our analysis oracle to the path program.

Apply Analysis Oracle: Local Repair
[Figure: CFG of foo showing, at each program point, the original invariant next to the weakened one. Original invariants: L = 1 ∧ bLen = 0 ∧ pLen ≥ 1; … ∧ p ≠ 0; … ∧ p ≠ 0 ∧ mode ≠ 0; bLen ≤ pLen ∧ off = 1. Weakened invariants: L = 1 ∧ bLen = 0 ∧ pLen ≥ 1 before the loop; bLen ≤ pLen at the loop. False at the error node.]
-For each control-flow edge, we then do the following. We reset the invariant at the source of the edge to True, and then iteratively add conjuncts back in from the original invariant. We stop when the new invariant combined with the transition relation of the edge is strong enough to imply the invariant at the destination of the edge.
-So in the example, consider the edge going into the error node. The original invariant at this point contained two conjuncts. When we perform local repair, we reset this invariant to True, and then need only add a single conjunct, bLen <= pLen, back in to support the invariant “False” at the destination of the edge.
-Once we have these weaker invariants, we then perform the following check on each edge. If the edge is an assignment, then we check if the target of the assignment occurs in the invariant at the destination of the edge. If the edge is a condition, then we check if the new invariant at the destination of the edge supports the condition while the new invariant at the source of the edge does not. We call the set of all edges that satisfy these conditions a sufficient set, and we can show that any path program that contains a sufficient set is itself safe. The exception to this is if the path program also contains an edge that “interferes” with the invariant established by the sufficient set.
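The weakening step for the edge into the error node can be sketched as follows. Two simplifications are assumptions made for illustration: implication is checked by enumerating a small integer domain instead of calling a decision procedure, and only condition (guard) edges are handled. Conjuncts are added back in the order listed, so the result depends on that order.

```python
from itertools import product

# Small finite domain used as a stand-in for a real implication check.
DOMAIN = range(-2, 4)
VARS = ("p", "bLen", "pLen", "off")


def implies_on_domain(premises, conclusion):
    """Check that the premises imply the conclusion on every domain state."""
    for values in product(DOMAIN, repeat=len(VARS)):
        state = dict(zip(VARS, values))
        if all(f(state) for f in premises) and not conclusion(state):
            return False
    return True


def local_repair(conjuncts, edge_guard, dest_invariant):
    """Start from True and add conjuncts back until the edge is supported."""
    kept = []
    for name, pred in conjuncts:
        premises = [p for _, p in kept] + [edge_guard]
        if implies_on_domain(premises, dest_invariant):
            break                       # already strong enough
        kept.append((name, pred))
    return [name for name, _ in kept]


# Original invariant at the source of the error edge: bLen <= pLen and off = 1.
original = [
    ("bLen <= pLen", lambda s: s["bLen"] <= s["pLen"]),
    ("off = 1", lambda s: s["off"] == 1),
]
# Error edge guard: p != 0 && bLen > pLen; destination invariant: False.
guard = lambda s: s["p"] != 0 and s["bLen"] > s["pLen"]
dest_false = lambda s: False

weakened = local_repair(original, guard, dest_false)
print(weakened)
```

The conjunct off = 1 is dropped because bLen <= pLen alone already contradicts the guard of the error edge, matching the weakened invariant described in the notes.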

Prop. Encoding: Local Repair
[Figure: decomposition graph with edges q0 through q9.]
TODO: add fade effects for green text
-Emphasize support edges.
-Illustrate the analysis of the right branch, right branch path program with local repair.

Blocking Clause with Local Repair
Λ := Λ ∧ ¬(q0 ∧ q2 ∧ q5 ∧ q7 ∧ q8 ∧ q9)

One More Path Program Suffices
[Figure: CFG of foo. Invariants shown along the path program: … ∧ p = 0 ∧ pLen = -1; … ∧ p = 0 ∧ mode = 0; … ∧ p = 0; False at the error node.]
-Step through the path program of right branch, then left branch.

Experiments

Zitser Benchmarks
Real-world programs: wu-ftpd, bind, sendmail
Checked for buffer overflow bugs
BLAST proves more properties on average
SMPP completes faster (>100x)
-Dozens to hundreds of properties that can’t be proven using flow-sensitive abstract interpretation.
-Failure in BLAST caused by not finding the right loop invariant.

Larger Benchmarks
Real-world programs: thttpd, ssh-server, xvidcore
Checked function pre- and postconditions for buffer accesses
SMPP proved ~35% of thousands of properties
-thttpd: an HTTP server
-xvidcore: video codec library

Conclusion
SMPP uses a symbolic abstraction-refinement scheme for control flow.
SMPP is slightly coarser than predicate abstraction, but converges much faster.

Questions?

Predicate Abstraction
[Figure: CFG of foo in which each program point along the spurious error trace is split into two abstract states, bLen ≤ pLen and ¬(bLen ≤ pLen); False at the error node.]
TODO: cut this slide for time requirements (?)
-In general though, we can’t hope to analyze each path program, so we need a way of enumerating only a few path programs, and using the proof of safety for each to prove safety of the whole program.
-One refines an abstract state machine that overapproximates the behavior of the original program, and where every state corresponds to an evaluation of some finite set of predicates.
-So suppose that you were …
-Use atomic predicates in the invariants to refine the state space.
-Each refinement can double the state space along the spurious error trace.

SMPP: Path Program 1
[Figure: CFG of foo with some edges highlighted in green and one in orange. Invariants shown: L = 1 ∧ bLen = 0 ∧ pLen ≥ 1 before the loop; bLen ≤ pLen at the loop; False at the error node.]
-Intuitively, we’d like to establish that any path program that contains the green edges and not the orange edge is safe, just from analyzing a single path.

SMPP: Path Program 2
[Figure: CFG of foo. Invariants shown along the path program: … ∧ p = 0 ∧ pLen = -1; … ∧ p = 0 ∧ mode = 0; … ∧ p = 0; False at the error node.]
-Step through the path program of right branch, then left branch.

Problem Statement
             SMPP                          SMT
Given        a branching program           a disjunctive formula
determine    safety                        satisfiability
using        analyses for path programs.   theory solvers for conjunctive formulas.
TODO: can this flow naturally into abstraction-refinement structure slide? Is it just out of place?

Key Analog
Abstraction using a propositional encoding
Refinement using blocking clauses conjoined to the abstraction