Profs. Aiken, Barrett & Dill CS357 Lecture 3 1 Saturn Lecture 3 CS357.

Slides:



Advertisements
Similar presentations
Automated Theorem Proving Lecture 1. Program verification is undecidable! Given program P and specification S, does P satisfy S?
Advertisements

Finding bugs: Analysis Techniques & Tools Symbolic Execution & Constraint Solving CS161 Computer Security Cho, Chia Yuan.
Data-Flow Analysis II CS 671 March 13, CS 671 – Spring Data-Flow Analysis Gather conservative, approximate information about what a program.
Satisfiability Modulo Theories (An introduction)
Signals and Systems March 25, Summary thus far: software engineering Focused on abstraction and modularity in software engineering. Topics: procedures,
1 CS 201 Compiler Construction Lecture 3 Data Flow Analysis.
Compilation 2011 Static Analysis Johnni Winther Michael I. Schwartzbach Aarhus University.
Dana Nau: Lecture slides for Automated Planning Licensed under the Creative Commons Attribution-NonCommercial-ShareAlike License:
Pointer Analysis – Part I Mayur Naik Intel Research, Berkeley CS294 Lecture March 17, 2009.
Control Structures Any mechanism that departs from straight-line execution: –Selection: if-statements –Multiway-selection: case statements –Unbounded iteration:
Lecture # 21 Chapter 6 Uptill 6.4. Type System A type system is a collection of rules for assigning type expressions to the various parts of the program.
ECE 551 Digital System Design & Synthesis Lecture 08 The Synthesis Process Constraints and Design Rules High-Level Synthesis Options.
Prof. Necula CS 164 Lecture 141 Run-time Environments Lecture 8.
Automatic Predicate Abstraction of C-Programs T. Ball, R. Majumdar T. Millstein, S. Rajamani.
Constraint-Based Analysis CS void f(state *x, state *y) { result = spin_trylock( & x->lock); spin_lock( & y->lock); … if (!result) spin_unlock(
1 Basic abstract interpretation theory. 2 The general idea §a semantics l any definition style, from a denotational definition to a detailed interpreter.
Constraint Logic Programming Ryan Kinworthy. Overview Introduction Logic Programming LP as a constraint programming language Constraint Logic Programming.
Saturn1 Scalable Program Analysis Using Boolean Satisfiability: The Saturn Project Alex Aiken Stanford University.
Scalable Error Detection using Boolean Satisfiability 1 Yichen Xie and Alex Aiken Stanford University.
Next Section: Pointer Analysis Outline: –What is pointer analysis –Intraprocedural pointer analysis –Interprocedural pointer analysis (Wilson & Lam) –Unification.
Introduction to Computers and Programming Lecture 4: Mathematical Operators New York University.
CS 536 Spring Global Optimizations Lecture 23.
CS 536 Spring Run-time organization Lecture 19.
1 Predicate Abstraction of ANSI-C Programs using SAT Edmund Clarke Daniel Kroening Natalia Sharygina Karen Yorav (modified by Zaher Andraus for presentation.
Speeding Up Dataflow Analysis Using Flow- Insensitive Pointer Analysis Stephen Adams, Tom Ball, Manuvir Das Sorin Lerner, Mark Seigle Westley Weimer Microsoft.
4/25/08Prof. Hilfinger CS164 Lecture 371 Global Optimization Lecture 37 (From notes by R. Bodik & G. Necula)
22-Jun-15 Introduction to Primitives. 2 Overview Today we will discuss: The eight primitive types, especially int and double Declaring the types of variables.
Range Analysis. Intraprocedural Points-to Analysis Want to compute may-points-to information Lattice:
Intraprocedural Points-to Analysis Flow functions:
ESC Java. Static Analysis Spectrum Power Cost Type checking Data-flow analysis Model checking Program verification AutomatedManual ESC.
Prof. Fateman CS 164 Lecture 221 Global Optimization Lecture 22.
Run-time Environment and Program Organization
CS 267: Automated Verification Lecture 13: Bounded Model Checking Instructor: Tevfik Bultan.
Pointer analysis. Pointer Analysis Outline: –What is pointer analysis –Intraprocedural pointer analysis –Interprocedural pointer analysis Andersen and.
Chapter 2: Algorithm Discovery and Design
Symbolic Path Simulation in Path-Sensitive Dataflow Analysis Hari Hampapuram Jason Yue Yang Manuvir Das Center for Software Excellence (CSE) Microsoft.
Prof. Bodik CS 164 Lecture 16, Fall Global Optimization Lecture 16.
272: Software Engineering Fall 2012 Instructor: Tevfik Bultan Lecture 4: SMT-based Bounded Model Checking of Concurrent Software.
Procedure specifications CSE 331. Outline Satisfying a specification; substitutability Stronger and weaker specifications - Comparing by hand - Comparing.
 Value, Variable and Data Type  Type Conversion  Arithmetic Expression Evaluation  Scope of variable.
Lecture 8. How to Form Recursive relations 1. Recap Asymptotic analysis helps to highlight the order of growth of functions to compare algorithms Common.
CMPSC 16 Problem Solving with Computers I Spring 2014 Instructor: Tevfik Bultan Lecture 12: Pointers continued, C strings.
Inferring Specifications to Detect Errors in Code Mana Taghdiri Presented by: Robert Seater MIT Computer Science & AI Lab.
Race Checking by Context Inference Tom Henzinger Ranjit Jhala Rupak Majumdar UC Berkeley.
Type Systems CS Definitions Program analysis Discovering facts about programs. Dynamic analysis Program analysis by using program executions.
Predicate Abstraction of ANSI-C Programs Using SAT By Edmund Clarke, Daniel Kroening, Natalia Sharygina, Karen Yorav Presented by Yunho Kim Provable Software.
Saturn Overview1 An Overview of the Saturn Project.
Introduction to Problem Solving. Steps in Programming A Very Simplified Picture –Problem Definition & Analysis – High Level Strategy for a solution –Arriving.
Reasoning about programs March CSE 403, Winter 2011, Brun.
An Undergraduate Course on Software Bug Detection Tools and Techniques Eric Larson Seattle University March 3, 2006.
Dataflow Analysis for Concurrent Programs using Datarace Detection Ravi Chugh, Jan W. Voung, Ranjit Jhala, Sorin Lerner LBA Reading Group Michelle Goodstein.
Symbolic and Concolic Execution of Programs Information Security, CS 526 Omar Chowdhury 10/7/2015Information Security, CS 5261.
Random Interpretation Sumit Gulwani UC-Berkeley. 1 Program Analysis Applications in all aspects of software development, e.g. Program correctness Compiler.
© 2006 Carnegie Mellon University Introduction to CBMC: Part 1 Software Engineering Institute Carnegie Mellon University Pittsburgh, PA Arie Gurfinkel,
CS357 Lecture 13: Symbolic model checking without BDDs Alex Aiken David Dill 1.
SATURN: An Overview Shrawan Kumar
CSC3315 (Spring 2009)1 CSC 3315 Languages & Compilers Hamid Harroud School of Science and Engineering, Akhawayn University
CMPSC 16 Problem Solving with Computers I Spring 2014 Instructor: Lucas Bang Lecture 11: Pointers.
© 2006 Carnegie Mellon University Introduction to CBMC: Part 1 Software Engineering Institute Carnegie Mellon University Pittsburgh, PA Arie Gurfinkel,
Finding bugs with a constraint solver daniel jackson. mandana vaziri mit laboratory for computer science issta 2000.
Lecture 3: More Java Basics Michael Hsu CSULA. Recall From Lecture Two  Write a basic program in Java  The process of writing, compiling, and running.
Formal methods: Lecture
SS 2017 Software Verification Bounded Model Checking, Outlook
Software Testing.
GC211Data Structure Lecture2 Sara Alhajjam.
Object Oriented Programming
Run-time organization
Software Testing (Lecture 11-a)
Pointer analysis.
Predicate Abstraction
Presentation transcript:

Profs. Aiken, Barrett & Dill CS357 Lecture 3 1 Saturn Lecture 3 CS357

Goals Scale to the largest programs Be precise –path-sensitive & context-sensitive –model most operations exactly Back in 2004, this hadn’t been done. Profs. Aiken, Barrett & Dill CS357 Lecture 3 2

Limitations Give up on soundness –At least temporarily Why? Profs. Aiken, Barrett & Dill CS357 Lecture 3 3

4 void f(state *x, state *y) { result = spin_trylock( & x->lock); spin_lock( & y->lock); … if (!result) spin_unlock( & x->lock); spin_unlock( & y->lock); } Code Example Path Sensitivity result (!result) Pointers & Heap ( & x->lock); ( & y->lock); Inter- procedural Flow Sensitivity spin_trylock spin_lock spin_unlock Locked Unlocked Error unlock lock unlock lock

Profs. Aiken, Barrett & Dill CS357 Lecture 3 5 Saturn What? –SAT-based approach to static bug detection How? –Program constructs  Boolean constraints –Inference  SAT solving Why SAT? –Program states naturally expressed as bits –The theory for bits is SAT –Efficient SAT solvers available –SMT wasn’t available

Profs. Aiken, Barrett & Dill CS357 Lecture 3 6 Straight-Line Code void f(int x, int y) { int z = x & y ; assert(z == x); } x 31 …x0x0 y 31 …y0y0 == x 31  y 31 … x0y0x0y0 Bitwise-AND R y & x z == ;

Profs. Aiken, Barrett & Dill CS357 Lecture 3 7 Straight-line Code void f(int x, int y) { int z = x & y; assert(z == x); } R Query: Is-Satisfiable(  ) Answer: Yes x = [00…1] y = [00…0] Negated assertion is satisfiable. Therefore, the assertion may fail.

Primitive Operations What language primitives can be represented by boolean functions? All of them! –Every finite function is equivalent to some boolean expression Profs. Aiken, Barrett & Dill CS357 Lecture 3 8

Easy Ones Logical operations –&, |, ~, bit shifts Relational operations –==, >, <=, ! Profs. Aiken, Barrett & Dill CS357 Lecture 3 9

Slightly Harder Addition –Chain 32 1-bit adders together Multiplication by a constant k –Chain k 32-bit adders together Profs. Aiken, Barrett & Dill CS357 Lecture bit adder Add a to b with carry c Bit: (a xor b) xor c Carry: (a  b)  (b  c)  (a  c)

Casts Casts don’t change the value of any bits But –May increase or decrease the number of bits Widening casts pad Drop appropriate bits for narrowing casts Casts are easy in this representation –But a major problem for most C analyzers. Profs. Aiken, Barrett & Dill CS357 Lecture 3 11

Too Hard Some operations are too complicated –Multiplication of variables –Division, mod –Floating point operations Model by introducing unconstrained boolean variables for the result –Represents unknown outcome Profs. Aiken, Barrett & Dill CS357 Lecture 3 12

Summary Model primitive operations as boolean functions Most can be modeled exactly –No loss of information Profs. Aiken, Barrett & Dill CS357 Lecture 3 13

Profs. Aiken, Barrett & Dill CS357 Lecture 3 14 if (c) Control Flow Merges –preserve path sensitivity –select bits based on the values of incoming guards G = c, x: [a 31 …a 0 ] G =  c, x: [b 31 …b 0 ] G = c   c, x: [v 31 …v 0 ] where v i = (c  a i )  (  c  b i ) c x = a; x = b; res = x; cc if (c) x = a; else x = b; res = x; true

Profs. Aiken, Barrett & Dill CS357 Lecture 3 15 Loops Loops not allowed –Unroll loops k times, drop backedges May miss errors that are deeply buried –Bug finding, not verification –Hypothesis: Many errors surface in a few iterations Advantages –Simplicity, reduces false positives

Profs. Aiken, Barrett & Dill CS357 Lecture 3 16 Pointers – Overview May point to different locations… –Thus, use points-to sets p: { l 1,…,l n } … but path sensitive –Use guards on points-to relationships p: { (g 1, l 1 ), …, (g n, l n ) }

Profs. Aiken, Barrett & Dill CS357 Lecture 3 17 G = c, p: { (true, y) } Pointers – Example G = true, p: { (true, x) } p = & x; if (c) p = & y; res = *p; G = true, p: { (c, y); (  c, x)} if (c) res = y; else if (  c) res = x;

Profs. Aiken, Barrett & Dill CS357 Lecture 3 18 Pointers – Recap Guarded Location Sets { (g 1, l 1 ), …, (g n, l n ) } Guards –Condition under which points-to relationship holds –Collected from statement guards Pointer Dereference –Conditional assignments

Modeling the Environment f(foo *x) { … *x … } What do we get when x is dereferenced? –Build environment lazily. –When x is dereferenced, create new object (new bits) of the appropriate type Profs. Aiken, Barrett & Dill CS357 Lecture 3 19

Modeling the Environment F(foo *x) { … y = x->next … … z = y->next … } All environment objects are assumed to be unaliased. Profs. Aiken, Barrett & Dill CS357 Lecture 3 20 x y z next

Profs. Aiken, Barrett & Dill CS357 Lecture 3 21 What can we do with Saturn? int f(lock_t *l) { lock(l); … unlock(l); } if (l->state == Unlocked) l->state = Locked; else l->state = Error; if (l->state == Locked) l->state = Unlocked; else l->state = Error; Locked Unlocked Error unlock lock unlock lock

Profs. Aiken, Barrett & Dill CS357 Lecture 3 22 General FSM Checking Encode FSM in the program –State  Integer –Transition  Conditional Assignments Check code behavior –SAT queries

Profs. Aiken, Barrett & Dill CS357 Lecture 3 23 How are we doing so far? Precision: Scalability:  –SAT limit is 1M clauses –About 10 functions Solution: –Divide and conquer –Function summaries

Profs. Aiken, Barrett & Dill CS357 Lecture 3 24 Function Summaries (1 st try) Function behavior can be summarized with a set of state transitions Summary: * l: Unlocked  Unlocked Locked  Error int f(lock_t *l) { lock(l); … unlock(l); return 0; }

Computing Function Summaries int f(lock_t *l) { lock(l); … unlock(l); return 0; } SAT( Formula(body of f)  (*l starts unlocked)  (*l ends unlocked) ) Profs. Aiken, Barrett & Dill CS357 Lecture 3 25

Computing Function Summaries SAT( Formula(body of f)  (loc l starts in state x)  (loc l ends in state y) ) for every combination of l, x, and y in f State transition exists if satisfiable Profs. Aiken, Barrett & Dill CS357 Lecture 3 26

Profs. Aiken, Barrett & Dill CS357 Lecture 3 27 int f(lock_t *l) { lock(l); … if (err) return -1; … unlock(l); return 0; } A Difficulty Problem –two possible output states –distinguished by return value (retval == 0)… Summary 1. (retval == 0) *l: Unlocked  Unlocked Locked  Error 2.  (retval == 0) *l: Unlocked  Locked Locked  Error

Profs. Aiken, Barrett & Dill CS357 Lecture 3 28 FSM Function Summaries Summary representation (simplified): { P in, P out, R } User gives: –P in : predicates on initial state –P out : predicates on final state –Express interprocedural path sensitivity Saturn computes: –R: guarded state transitions –Used to simulate function behavior at call site

Profs. Aiken, Barrett & Dill CS357 Lecture 3 29 int f(lock_t *l) { lock(l); … if (err) return -1; … unlock(l); return 0; } Lock Summary (2 nd try) Output predicate: –P out = { (retval == 0) } Summary (R): 1. (retval == 0) *l: Unlocked  Unlocked Locked  Error 2.  (retval == 0) *l: Unlocked  Locked Locked  Error

Scalability How big can a summary based on FSM be? –# of FSM state pairs –times 2^(# of predicates) –times # of modeled locations –Note: Not dependent on size of the code! Locking example: –3 states, 1 predicate, 1 location = (3 * 2) * 2 * 1 = 12 SAT tests Profs. Aiken, Barrett & Dill CS357 Lecture 3 30

Profs. Aiken, Barrett & Dill CS357 Lecture 3 31 Lock Checker for Linux Parameters: –States: { Locked, Unlocked, Error } –P in = {} –P out = { (retval == 0) } Experiment: –Linux Kernel 2.6.5: 4.8MLOC –~40 lock/unlock/trylock primitives –20 hours to analyze 3.0GHz Pentium IV, 1GB memory

Profs. Aiken, Barrett & Dill CS357 Lecture 3 32 Double Locking/Unlocking static void sscape_coproc_close(…) { spin_lock_irqsave(&devc->lock, flags); if (…) sscape_write(devc, DMAA_REG, 0x20); … } static void sscape_write(struct … *devc, …) { spin_lock_irqsave(&devc->lock, flags); … }

Profs. Aiken, Barrett & Dill CS357 Lecture 3 33 Ambiguous Return State int i2o_claim_device(…) { down(&i2o_configuration_lock); if (d->owner) { up(&i2o_configuration_lock); return –EBUSY; } if (…) { return –EBUSY; } … }

Profs. Aiken, Barrett & Dill CS357 Lecture 3 34 Bugs TypeBugsFalse Pos.% Bugs Double Locking % Ambiguous State % Total % Previous Work: MC (31), CQual (18), <20% Bugs

Profs. Aiken, Barrett & Dill CS357 Lecture 3 35 Function Summary Database 63,000 functions in Linux – More than 23,000 are lock related – 17,000 with locking constraints on entry – Around 9,000 affect more than one lock – 193 lock wrappers – 375 unlock wrappers – 36 with return value/lock state correlation

The Rest Talk about some details –Strengths and weaknesses A few general principles –Illustrated using Saturn Profs. Aiken, Barrett & Dill CS357 Lecture 3 36

Bailing Out It is always useful to understand how an analysis gives up In Saturn, unanalyzable expressions are modeled by fresh bit variables –Makes constraints easier to satisfy Profs. Aiken, Barrett & Dill CS357 Lecture 3 37

Loop Unrolling Unrolling a few iterations of loops pitched as a feature –Bounded model-checking work does the same But this is not just convenient –It’s necessary! –No good way to express feedback constraints of loops/recursive functions when variables range only over 0 and 1 Profs. Aiken, Barrett & Dill CS357 Lecture 3 38

Function Side-Effects To be sound(er), must take into account side-effects of functions f(x) { … g(y) … } g(z) { … *z = w … } Saturn tracks the set of locations modified by a function –Modified locations set to unknown bits in the caller Profs. Aiken, Barrett & Dill CS357 Lecture 3 39

Conditionals There is a problem with conditionals if (p) then s 1 else s 2 Assume the guard for s 1 is g p What is the guard for s 2 ? –if g p is exact, then it is  g p –but if g p is an overapproximation, then it is not  g p Profs. Aiken, Barrett & Dill CS357 Lecture 3 40

The Heap Saturn is very strong at reasoning about –Control flow –Local variables –Expressions But it is weak at deep heap reasoning –Assumes no aliasing in environment –Can’t really express recursive data structures Profs. Aiken, Barrett & Dill CS357 Lecture 3 41

Optimization Saturn builds a formula for every bit used in a function body –But not all are used in queries! Example x = y * z a = b + c assert(x == 0) This optimization is important! –The difference between terminating and not terminating Profs. Aiken, Barrett & Dill CS357 Lecture 3 42

Lesson 1 Contrary to popular belief, there is no magic in solvers. Smaller formulas take less time to solve. Profs. Aiken, Barrett & Dill CS357 Lecture 3 43

Lesson 2 Be as lazy as possible! Work delayed long enough is never done. Profs. Aiken, Barrett & Dill CS357 Lecture 3 44

Lesson 3 Scalability requires bounding the size of formulas. Space is the critical resource, not time. Profs. Aiken, Barrett & Dill CS357 Lecture 3 45

Profs. Aiken, Barrett & Dill CS357 Lecture 3 46 Lesson 4 Analyzing in one direction is problematic –Forwards or backwards Constraints –Give a global picture of the program –Allow more efficient order of solution

Profs. Aiken, Barrett & Dill CS357 Lecture 3 47 Why SAT? (Revisited …) Moore’s Law Uniform modeling of constructs as bits Constraints –Local specification –Global solution Incremental SAT solving –makes multiple queries efficient

Profs. Aiken, Barrett & Dill CS357 Lecture 3 48 Why SAT? (Cont.) Path sensitivity is important –To find bugs –To reduce false positives –Much easier to model precisely with SAT Compositionality is important –Function summaries critical for scalability –Easy to construct with SAT queries