Symbolic Path Simulation in Path-Sensitive Dataflow Analysis Hari Hampapuram Jason Yue Yang Manuvir Das Center for Software Excellence (CSE) Microsoft.

Presentation transcript:

Symbolic Path Simulation in Path-Sensitive Dataflow Analysis
Hari Hampapuram, Jason Yue Yang, Manuvir Das
Center for Software Excellence (CSE), Microsoft Corporation

PASTE'05, Jason Yang, Microsoft

Gist of Results
Symbolic path simulation engine supporting:
1. Merge – for merge-based path-sensitive analysis
2. Function summaries – for scalable global analysis
3. Pointers – our main client is Windows

Infeasible Path → False Positive

    extern int a, b;

    void Process(int handle) {
        int x, y;
        if (a > 0) {
            CloseHandle(handle);
            x = 1;
        } else
            x = 2;
        if (b > 0)
            y = 1;
        else
            y = 2;
        if (x != 1)
            UseHandle(handle);
    }

[Figure: property state machine with states START, OPEN, CLOSE, ERROR and transitions labeled OpenHandle, UseHandle, CloseHandle, UseHandle]


Need for Merge
Merge is the "knob" for the scalability vs. precision tradeoff
– Always merge (traditional dataflow): false errors
– Always separate: exponential blow-up
Driven by client analyses

Merge Criterion for ESP
Selective merging based on property states
– Partition symbolic states into property states and everything else
– If the incoming paths differ in property states, track them separately; otherwise, merge them
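The merge criterion above can be sketched in C. This is a minimal illustration, not ESP's implementation: the types PropertyState, AbsValue and SymState, the function add_state, and the choice to model "everything else" as two constant-or-unknown variables are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical property states for a handle protocol. */
typedef enum { PS_OPEN, PS_CLOSED, PS_ERROR } PropertyState;

/* Hypothetical abstract value: a known constant, or unknown. */
typedef struct {
    bool known;
    int  value;
} AbsValue;

/* A symbolic state split into the property state and "everything else". */
typedef struct {
    PropertyState prop;
    AbsValue x;
    AbsValue y;
} SymState;

/* Join two abstract values: keep the constant only if both sides agree. */
static AbsValue join_value(AbsValue a, AbsValue b) {
    AbsValue unknown = { false, 0 };
    return (a.known && b.known && a.value == b.value) ? a : unknown;
}

/* Add an incoming state at a join point. If a tracked state has the same
 * property state, merge into it; otherwise track the new state separately.
 * Returns the new number of tracked states. */
size_t add_state(SymState *states, size_t count, size_t capacity, SymState in) {
    for (size_t i = 0; i < count; i++) {
        if (states[i].prop == in.prop) {
            states[i].x = join_value(states[i].x, in.x);
            states[i].y = join_value(states[i].y, in.y);
            return count;
        }
    }
    assert(count < capacity);
    states[count] = in;
    return count + 1;
}
```

Feeding in the four paths of the Process example (handle closed with x = 1, handle open with x = 2, each with y = 1 or y = 2) leaves two tracked states: y becomes unknown in both, while x stays correlated with the property state.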


Merge Criterion for ESP → Example

    extern int a, b;

    void Process(int handle) {
        int x, y;
        if (a > 0) {
            CloseHandle(handle);
            x = 1;
        } else
            x = 2;
        if (b > 0)
            y = 1;
        else
            y = 2;
        if (x != 1)
            UseHandle(handle);
    }

Property states change along paths → Do not merge
Property states are the same → Merge
Still maintains the needed fact: "If CloseHandle is called, branch should fail."

Need for Function Summaries

    extern int a, b;

    void Process(int handle) {
        int x, y;
        if (a > 0) {
            CloseHandle(handle);
            x = 1;
        } else
            x = 2;
        if (b > 0)
            y = Foo(b);
        else
            y = 2;
        if (x != 1)
            UseHandle(handle);
    }

Partial transfer functions
Computed on-demand
Enforced by "into-binding" and "back-binding"
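One way to picture a partial transfer function computed on demand is a per-function table that is filled in only for the entry states actually seen at call sites. The sketch below is an assumption-laden illustration: the Summary type, summary_apply, the stand-in callee analysis analyze_close, and the reduction of a simulation state to a single integer property state are all hypothetical, and into-binding and back-binding are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical handle states used as summary keys. */
enum { H_OPEN, H_CLOSED, H_ERROR, NUM_STATES };

/* Stand-in for "analyze the callee body from this entry state". */
typedef int (*AnalyzeFn)(int entry_state);

/* Partial transfer function: defined only for entry states seen so far. */
typedef struct {
    bool computed[NUM_STATES];
    int  exit_state[NUM_STATES];
    int  analyses_run;  /* how many times the callee was actually analyzed */
} Summary;

/* Hypothetical callee that closes the handle: closing an open handle is
 * fine, anything else is an error. */
int analyze_close(int entry_state) {
    return entry_state == H_OPEN ? H_CLOSED : H_ERROR;
}

/* Apply the summary at a call site, analyzing the callee only the first
 * time a given entry state is encountered. */
int summary_apply(Summary *s, AnalyzeFn analyze, int entry_state) {
    assert(entry_state >= 0 && entry_state < NUM_STATES);
    if (!s->computed[entry_state]) {
        s->exit_state[entry_state] = analyze(entry_state);
        s->computed[entry_state] = true;
        s->analyses_run++;
    }
    return s->exit_state[entry_state];
}
```

A second call with the same entry state reuses the stored exit state, and entry states that never occur at a call site are never analyzed, which is what makes the transfer function partial.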

Support for Language Features
Pointers
Field-based objects
Operator expressions
…

Symbolic Simulator Architecture
Client Application: defect detection, core dump analysis, test generation, code review, ...
Simulation Interface (SI): "semantic translator"
Simulation State Manager (SSM): "theorem prover"

Semantic Domains
Environment
– ProgramSymbol → Loc
– Managed by Simulation Interface
Store
– Loc → Val
– Managed by Simulation State Manager
Region-based model for symbolic store
– region → Loc
– value → Val

Simulation State Manager (SSM)
Tracks symbolic simulation states to answer queries about path feasibility
What should be tracked?
– Mapping of store: region → value
– Constraints on values

Regions
Variable regions vs. deref regions
– Important for pointer dereference
– Important for supporting merge and binding

    void Process(int *p, int *q) {
        int x = *p;
        int y = *q;
        if (p != q) return;
        if (*p != *q) … // Not reachable
    }

Variable regions: R(p), R(q), R(x), R(y)
Deref regions: R(*p), R(*q)

Values
Constant values (integers, floats, …)
Operator values (arithmetic, bitwise, relational)
Symbolic values (general constraint variables)
Region-initial values (constraint variables for initial values)
Pointer values (for points-to relationship)
Field-based values (for compound types)

Need for Region-Initial Values
Important for function summary
– Pre-condition: simulation state at Entry node
– Post-condition: simulation state at Exit node
– Input values vs. current values
To support lazy initialization for input values
– An input region gets region-initial values by default, unless it has been killed
– Need to maintain a kill set
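The lazy-initialization rule can be sketched as a store that consults a kill set on every read. The names Store, Value, store_read and store_assign, the fixed capacity, and the restriction to constant assignments are assumptions made for this illustration; they are not taken from the simulator.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_REGIONS 8  /* hypothetical fixed capacity */

typedef enum { VAL_REGION_INITIAL, VAL_CONSTANT } ValueKind;

typedef struct {
    ValueKind kind;
    int region;    /* VAL_REGION_INITIAL: the region whose entry value this is */
    int constant;  /* VAL_CONSTANT: the assigned constant */
} Value;

typedef struct {
    bool  killed[MAX_REGIONS];   /* the kill set */
    Value current[MAX_REGIONS];  /* meaningful only where killed[r] is true */
} Store;

/* At the Entry node nothing has been killed yet. */
void store_init(Store *s) {
    for (int r = 0; r < MAX_REGIONS; r++) {
        s->killed[r] = false;
    }
}

/* An assignment kills the region and records its current value. */
void store_assign(Store *s, int region, int constant) {
    assert(region >= 0 && region < MAX_REGIONS);
    s->killed[region] = true;
    s->current[region].kind = VAL_CONSTANT;
    s->current[region].region = region;
    s->current[region].constant = constant;
}

/* A read of a region that was never killed yields its region-initial
 * value, created lazily instead of being stored up front. */
Value store_read(const Store *s, int region) {
    assert(region >= 0 && region < MAX_REGIONS);
    if (s->killed[region]) {
        return s->current[region];
    }
    Value initial = { VAL_REGION_INITIAL, region, 0 };
    return initial;
}
```

Under this scheme the state at the Exit node can still refer to input values: any value of kind VAL_REGION_INITIAL denotes what the region held at the Entry node, regardless of later assignments to that region.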

Decision Procedures
Current implementation:
– Equality (e.g. a == b): equivalence classes
– Disequality (e.g. a != b): multi-maps between equivalence classes
– Inequality (e.g. a < b): a graph (nodes are equivalence classes and edges are inequality relations)
Can plug in other theorem provers if needed
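The equality and disequality procedures can be approximated with a union-find structure plus a list of disequal pairs. This is a sketch under stated assumptions: the Constraints type and the cs_ functions are hypothetical, a flat pair list stands in for the multi-maps mentioned above, and the inequality graph is omitted.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VALUES 16  /* hypothetical capacity for symbolic values */
#define MAX_DISEQ  32  /* hypothetical capacity for recorded disequalities */

typedef struct {
    int parent[MAX_VALUES];    /* union-find forest: equivalence classes */
    int diseq[MAX_DISEQ][2];   /* recorded pairs known to be different */
    int ndiseq;
} Constraints;

void cs_init(Constraints *c) {
    for (int v = 0; v < MAX_VALUES; v++) {
        c->parent[v] = v;
    }
    c->ndiseq = 0;
}

/* Find the class representative, halving the path on the way up. */
int cs_find(Constraints *c, int v) {
    while (c->parent[v] != v) {
        c->parent[v] = c->parent[c->parent[v]];
        v = c->parent[v];
    }
    return v;
}

/* True if some recorded disequality separates the classes of a and b. */
bool cs_known_distinct(Constraints *c, int a, int b) {
    int ra = cs_find(c, a), rb = cs_find(c, b);
    for (int i = 0; i < c->ndiseq; i++) {
        int p = cs_find(c, c->diseq[i][0]);
        int q = cs_find(c, c->diseq[i][1]);
        if ((p == ra && q == rb) || (p == rb && q == ra)) {
            return true;
        }
    }
    return false;
}

/* Assume a == b along the path. Returns false if the path is infeasible. */
bool cs_assume_equal(Constraints *c, int a, int b) {
    if (cs_known_distinct(c, a, b)) {
        return false;
    }
    c->parent[cs_find(c, a)] = cs_find(c, b);
    return true;
}

/* Assume a != b along the path. Returns false if the path is infeasible. */
bool cs_assume_distinct(Constraints *c, int a, int b) {
    if (cs_find(c, a) == cs_find(c, b)) {
        return false;
    }
    assert(c->ndiseq < MAX_DISEQ);
    c->diseq[c->ndiseq][0] = a;
    c->diseq[c->ndiseq][1] = b;
    c->ndiseq++;
    return true;
}
```

A branch is refuted when the assumption for its condition returns false. For example, after assuming $0 == $1 and $1 == $2, assuming $0 != $2 fails, so the corresponding path would be reported as infeasible.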

Merge
Moves symbolic states upwards in the lattice
– Fewer constraints on path feasibility after merge
Maps the memory graphs and the associated constraints on values

[Figure: JOIN of two memory graphs over regions R1, R2 (with copies R1', R2' and R1'', R2''), the value 0xEFD0, and symbolic values $1, $2, $3 with constraints $1 > 0, $2 > 0, $3 > 0]
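To make "fewer constraints after merge" concrete, the sketch below joins constraints of the form "value > c". It is only one plausible reading of the figure, in which two incoming values constrained to be positive are mapped to a merged value that is also constrained to be positive. The Bound type and both functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical constraint on one symbolic value: "value > lower",
 * or no constraint at all (TOP) when has_lower is false. */
typedef struct {
    bool has_lower;
    int  lower;
} Bound;

/* Join two constraints: the result must hold on both incoming paths,
 * so it is the weaker of the two bounds, or TOP if either is TOP. */
Bound join_bound(Bound a, Bound b) {
    Bound top = { false, 0 };
    if (!a.has_lower || !b.has_lower) {
        return top;
    }
    Bound joined = { true, a.lower < b.lower ? a.lower : b.lower };
    return joined;
}

/* Feasibility query: can the constrained value equal v? */
bool may_equal(Bound b, int v) {
    return !b.has_lower || v > b.lower;
}
```

With $1 > 0 on one path and $2 > 5 on the other, the merged value is only known to satisfy $3 > 0. A query such as "can the value be 3?" is refuted on the second path alone but becomes feasible after the merge, which is the precision cost of moving up the lattice.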

Example Client Analysis → ESP
Path-sensitive, context-sensitive, interprocedural defect detection tool for large C/C++ programs

Simulation Interface (SI)
Fetching regions and values
Assignments
– E.g., x = 1;
Branches
– E.g., a == b;
Procedure call (into-binding)
Call back (back-binding)

Into-Binding
Two approaches:
– Binding precise calling context into callee
  Less reasoning power needed to refute infeasible paths
  More suitable for top-down analysis
– Binding no constraints (TOP) into callee
  More reasoning power needed to refute infeasible paths
  More suitable for bottom-up analysis
Binding from caller Call node to callee Entry node
– Bind parameters
– Bind global variables
– Bind constraints

Back-Binding
Binding from callee Exit node to caller Return node
– Bind the region-initial values of input regions
– Bind values of output regions
– Bind constraints

Experiences
Security properties for a future version of Windows
Difficult to check with other tools
Scalability
– E.g., for all device drivers, found ~500 errors in 20 hours
Precision
– E.g., for Windows kernel (216,000 LOC, 9755 functions)

[Table: columns Bugs, False Positives, Time (sec); rows With Path Simulation, Without Path Simulation]

Summary
Critical for improving precision
Scalable enough for industrial programs
Other client analyses
– PSE
– Iterative refinement for ESP
Beneficial to have built-in support for merge, function summaries, and pointers

Thank You!
For more information, please visit