Program Analysis using Random Interpretation Sumit Gulwani UC-Berkeley March 2005.


1 Program Analysis
Applications in all aspects of software development, e.g.
Program correctness
– Software bugs are expensive!
Compiler optimizations
– Give people the freedom to write code the way they want (leaving performance issues to compilers).
Translation validation
– Semantic equivalence of programs before and after compilation (it is difficult to trust the output of a compiler for safety-critical systems).

2 Design choices in Program Analysis
– Completeness (precision, # of false positives)
– Computational complexity
– Ease of implementation
– Soundness = If the analysis says no bugs, it means no bugs.
What if we allow probabilistic soundness?
– We get more precise, efficient and even simpler algorithms.
– Earlier, probabilistic algorithms were used in other areas like networks, but not in program analysis.
– We obtain a new class of analyses: random interpretation.

3 Random Interpretation = Random Testing + Abstract Interpretation
Random Testing:
– Test program on random inputs
– Simple, efficient but unsound (can't prove absence of bugs)
Abstract Interpretation:
– Class of deterministic program analyses
– Interpret (analyze) an abstraction (approximation) of program
– Sound but usually complicated, expensive
Random Interpretation:
– Class of randomized program analyses
– Almost as simple, efficient as random testing
– Almost as sound as abstract interpretation

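4 Example 1
[Flowchart: two consecutive non-deterministic conditionals (*), followed by two assertions]
First conditional: a := 0; b := i;  or  a := i-2; b := 2;
Second conditional: c := b - a; d := i - 2b;  or  c := 2a + b; d := b - 2i;
Assertions: assert(c+d = 0); assert(c = a+i)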

5 Example 1: Random Testing
[Flowchart of Example 1 repeated, with one path highlighted in blue]
– Need to test the blue path to falsify the second assertion.
– Chances of choosing the blue path from the set of all 4 paths are small.
– Hence, random testing is unsound.

6 Example 1: Abstract Interpretation
[Flowchart of Example 1 repeated, annotated with an invariant at each program point]
After a := 0; b := i: a=0, b=i
After a := i-2; b := 2: a=i-2, b=2
After the first join: a+b=i
After c := b - a; d := i - 2b: a+b=i, c=b-a, d=i-2b
After c := 2a + b; d := b - 2i: a+b=i, c=2a+b, d=b-2i
After the second join: a+b=i, c=-d
– Computes an invariant at each program point.
– Operations are usually complicated and expensive.

7 Example 1: Random Interpretation
[Flowchart of Example 1 repeated]
– Choose random values for input variables.
– Execute both branches of a conditional.
– Combine values of variables at join points.
– Test the assertion.

8 Outline
Random Interpretation
– Linear arithmetic (POPL 2003)
– Uninterpreted functions (POPL 2004)
– Inter-procedural analysis (POPL 2005)
– Other applications

9 Linear relationships in programs with linear assignments
Linear relationships (e.g., x=2y+5) are useful for
– Program correctness (e.g. buffer overflows)
– Compiler optimizations (e.g., constant and copy propagation, CSE, induction variable elimination, etc.)
Restricting to programs with linear assignments does not mean inapplicability to real programs
– Abstract other program statements as non-deterministic assignments (standard practice in program analysis)

10 Basic idea in random interpretation
Generic algorithm:
– Choose random values for input variables.
– Execute both branches of a conditional.
– Combine the values of variables at join points.
– Test the assertion.

11 Idea #1: The Affine Join operation
Affine join of $v_1$ and $v_2$ w.r.t. weight $w$: $\phi_w(v_1, v_2) \equiv w \cdot v_1 + (1-w) \cdot v_2$
Example: the states before the join are (a = 2, b = 3) and (a = 4, b = 1). With w = 7, the state after the join is $a = \phi_7(2,4) = -10$, $b = \phi_7(3,1) = 15$.
– Affine join preserves common linear relationships (a+b=5).
– It does not introduce false relationships w.h.p.

12 Idea #1: The Affine Join operation
Affine join of $v_1$ and $v_2$ w.r.t. weight $w$: $\phi_w(v_1, v_2) \equiv w \cdot v_1 + (1-w) \cdot v_2$
Example: the states before the join are (a = 2, b = 3) and (a = 4, b = 1).
With w = 7: $a = \phi_7(2,4) = -10$, $b = \phi_7(3,1) = 15$
With w = 5: $a = \phi_5(2,4) = -6$, $b = \phi_5(3,1) = 11$
– Affine join preserves common linear relationships (a+b=5).
– It does not introduce false relationships w.h.p.
– Unfortunately, non-linear relationships are not preserved (e.g. $a \times (1+b) = 8$).
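
The numbers on this slide can be reproduced with a few lines of Python. This is a minimal sketch, not the authors' implementation: the helper names `affine_join` and `join_states` are hypothetical, and a state is assumed to be a plain dict from variable names to integers.

```python
def affine_join(w, v1, v2):
    """Affine join of two values with weight w: w*v1 + (1 - w)*v2."""
    return w * v1 + (1 - w) * v2


def join_states(w, s1, s2):
    """Join two states (dicts over the same variables) variable by variable."""
    return {var: affine_join(w, s1[var], s2[var]) for var in s1}


left = {"a": 2, "b": 3}   # state reaching the join from one branch
right = {"a": 4, "b": 1}  # state reaching the join from the other branch

for w in (7, 5):
    joined = join_states(w, left, right)
    linear_ok = joined["a"] + joined["b"] == 5            # holds in both inputs
    nonlinear_ok = joined["a"] * (1 + joined["b"]) == 8   # also holds in both inputs
    print(w, joined, linear_ok, nonlinear_ok)
# prints:
# 7 {'a': -10, 'b': 15} True False
# 5 {'a': -6, 'b': 11} True False
```

The linear relationship a+b=5 survives for both weights, while the non-linear relationship, although true in both input states, is lost after the join.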

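13 Geometric Interpretation of Affine Join
[Figure: the (a, b) plane with the lines a + b = 5 and b = 2; the states before the join, (a = 2, b = 3) and (a = 4, b = 1), and the state after the join are marked as points]
– The state after the join satisfies all the affine relationships that are satisfied by both states before the join (e.g. a + b = 5).
– Given any relationship that is not satisfied by any of the states before the join (e.g. b = 2), the state after the join also does not satisfy it with high probability.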

14 Example 1
[Flowchart of Example 1, annotated with the values of one random run; w1 = 5, w2 = 2]
Random input: i=3
After a := 0; b := i: i=3, a=0, b=3
After a := i-2; b := 2: i=3, a=1, b=2
After the first join (w1 = 5): i=3, a=-4, b=7
After c := b - a; d := i - 2b: i=3, a=-4, b=7, c=11, d=-11
After c := 2a + b; d := b - 2i: i=3, a=-4, b=7, c=-1, d=1
After the second join (w2 = 2): i=3, a=-4, b=7, c=23, d=-23
– Choose a random weight for each join independently.
– All choices of random weights verify the first assertion.
– Almost all choices contradict the second assertion.
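
The run on this slide can be replayed in code. The following is a minimal sketch under stated assumptions: the function name `interpret_example1` is hypothetical, the flowchart is hard-coded rather than parsed, and each join computes w*v1 + (1-w)*v2 with the a := 0 branch (respectively the c := b - a branch) as v1, which is the order that reproduces the slide's numbers.

```python
import random


def affine_join(w, v1, v2):
    """Affine join of two values with weight w: w*v1 + (1 - w)*v2."""
    return w * v1 + (1 - w) * v2


def interpret_example1(i, w1, w2):
    """Execute both branches of each conditional, join, and test both assertions."""
    # First conditional: (a := 0; b := i) joined with (a := i-2; b := 2).
    a = affine_join(w1, 0, i - 2)
    b = affine_join(w1, i, 2)
    # Second conditional, both branches evaluated on the joined state.
    c = affine_join(w2, b - a, 2 * a + b)
    d = affine_join(w2, i - 2 * b, b - 2 * i)
    state = {"i": i, "a": a, "b": b, "c": c, "d": d}
    return state, c + d == 0, c == a + i


# The run from the slide: i = 3, w1 = 5, w2 = 2.
print(interpret_example1(3, 5, 2))
# prints: ({'i': 3, 'a': -4, 'b': 7, 'c': 23, 'd': -23}, True, False)

# A few hundred runs with random inputs and weights (seed chosen arbitrarily).
rng = random.Random(0)
runs = [
    interpret_example1(rng.randrange(2**32), rng.randrange(2**32), rng.randrange(2**32))
    for _ in range(500)
]
print(sum(first for _, first, _ in runs), "of 500 runs verify the first assertion")
print(sum(second for _, _, second in runs), "of 500 runs verify the second assertion")
```

Expanding the arithmetic gives c - (a + i) = -3 * w2 * a with a = (1 - w1)(i - 2), so the second assertion survives only for the rare choices w2 = 0, w1 = 1 or i = 2.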

15 Example 2
[Flowchart: a := x + y, then the conditional x = y ?, then the assertion]
a := x + y
True branch (x = y): b := a
False branch: b := 2x
assert (b = 2x)
We need to make use of the conditional x=y on the true branch to prove the assertion.

16 Idea #2: The Adjust Operation
Execute multiple runs of the program in parallel.
Sample S = collection of states at a program point
Adjust(S, e=0) is the sample obtained by linear combination of states in S such that
– The equality conditional is satisfied.
– Original relationships are preserved.
Use Adjust(S, e=0) on the true branch of the conditional e=0

17 Geometric Interpretation of Adjust
– Program states = points
– Adjust = projection onto the hyperplane
– Adjust operation loses one point.
Algorithm to obtain S' = Adjust(S, e=0)
[Figure: the states $S_1$, $S_2$, $S_3$, $S_4$ are projected onto the hyperplane e = 0, giving the states $S'_1$, $S'_2$, $S'_3$]
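
One way to realize the projection described here is sketched below. The transcript does not preserve the algorithm from the figure, so this construction is an assumption: one pivot state with e ≠ 0 is chosen, and every other state is replaced by the point where the line through it and the pivot meets the hyperplane e = 0. The names `adjust` and `combine` are hypothetical, and exact rational arithmetic stands in for whatever arithmetic a real implementation would use.

```python
from fractions import Fraction


def combine(w, s1, s2):
    """Affine combination w*s1 + (1 - w)*s2 of two states (dicts)."""
    return {var: w * s1[var] + (1 - w) * s2[var] for var in s1}


def adjust(sample, e):
    """Move a sample onto the hyperplane e(state) = 0; one state is lost."""
    values = [e(s) for s in sample]
    pivot = next((k for k, v in enumerate(values) if v != 0), None)
    if pivot is None:
        # Every state already satisfies e = 0; drop one so sizes stay uniform.
        return list(sample[:-1])
    adjusted = []
    for k, state in enumerate(sample):
        if k == pivot:
            continue  # the pivot itself is the point that gets lost
        if values[k] == values[pivot]:
            raise ValueError("line through the two states is parallel to e = 0")
        # Weight chosen so that e vanishes on the combination (e is affine).
        w = Fraction(values[pivot], values[pivot] - values[k])
        adjusted.append(combine(w, state, sample[pivot]))
    return adjusted


# Example 2: three parallel runs after a := x + y, then the true branch of x = y.
sample = [{"x": 3, "y": 5}, {"x": 2, "y": 7}, {"x": -1, "y": 4}]
for s in sample:
    s["a"] = s["x"] + s["y"]

on_true_branch = adjust(sample, lambda s: s["x"] - s["y"])
for s in on_true_branch:
    s["b"] = s["a"]                      # b := a
    print(s["x"] == s["y"], s["a"] == s["x"] + s["y"], s["b"] == 2 * s["x"])
# prints "True True True" twice
```

Because each adjusted state is an affine combination of original states, the relationship a = x + y established before the conditional still holds, and the new relationship x = y makes the assertion b = 2x checkable on the true branch.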

18 Correctness of Random Interpreter R
Completeness: If $e_1 = e_2$, then $R \Rightarrow e_1 = e_2$
– assuming non-deterministic conditionals
Soundness: If $e_1 \neq e_2$, then $R \not\Rightarrow e_1 = e_2$
– error prob. $\leq$ [bound missing in the transcript], where
  b, j: number of branches and joins
  d: size of set from which random values are chosen
  k: number of points in the sample
– If j = b = 10, k = 15, $d \approx 2^{32}$, then error $\leq$ [value missing in the transcript]

19 Proof Methodology
Proving correctness was the most complicated part of this work. We used the following methodology:
– Design an appropriate deterministic algorithm (need not be efficient).
– Prove (by induction) that the randomized algorithm simulates each step of the deterministic algorithm with high probability.

20 Outline
Random Interpretation
– Linear arithmetic (POPL 2003)
– Uninterpreted functions (POPL 2004)
– Inter-procedural analysis (POPL 2005)
– Other applications

21 Problem: Global value numbering
Original program: a := 5; x := a*b; y := 5*b; z := b*a;
– x=y and x=z
– Reasoning about multiplication is undecidable
Abstraction: a := 5; x := F(a,b); y := F(5,b); z := F(b,a);
– only x=y
– Reasoning is decidable but tricky in presence of joins
– Axiom: If $x_1 = y_1$ and $x_2 = y_2$, then $F(x_1, x_2) = F(y_1, y_2)$
Goal: Detect expression equivalence when program operators are abstracted using uninterpreted functions
Application: Compiler optimizations, Translation validation

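22 Example
[Flowchart: a non-deterministic conditional (*) followed by two assertions]
Branches: x := a; y := a; z := F(a);  or  x := b; y := b; z := F(b);
Assertions: assert(x = y); assert(z = F(y));
After the join: $x = \phi(a,b)$, $y = \phi(a,b)$, $z = \phi(F(a), F(b))$, $F(y) = F(\phi(a,b))$
Typical algorithms treat $\phi$ as uninterpreted
– Hence cannot verify the second assertion
The randomized algorithm interprets $\phi$
– as affine join operation $\phi_w$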

23 How to execute uninterpreted functions?
Expression language: $e := y \mid F(e_1, e_2)$
Choose a random interpretation for F
Non-linear interpretation
– E.g. $F(e_1, e_2) = r_1 e_1^2 + r_2 e_2^2$
– Preserves all equivalences in straight-line code
– But not across join points
Let's try linear interpretation
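
The difference between the two kinds of interpretation across a join can be checked on the previous example (x := a; y := a; z := F(a) joined with x := b; y := b; z := F(b)). This is a minimal sketch with assumptions that are not in the slides: F is unary there, so the hypothetical interpretations used are F(v) = r*v (linear) and F(v) = r*v*v (non-linear), and the values of a, b, w and r are arbitrary stand-ins for random choices.

```python
def affine_join(w, v1, v2):
    """Affine join of two values with weight w: w*v1 + (1 - w)*v2."""
    return w * v1 + (1 - w) * v2


def run_join_example(F, a, b, w):
    """Execute both branches, join with weight w, and test both assertions."""
    x = affine_join(w, a, b)          # x := a  joined with  x := b
    y = affine_join(w, a, b)          # y := a  joined with  y := b
    z = affine_join(w, F(a), F(b))    # z := F(a)  joined with  z := F(b)
    return x == y, z == F(y)


r = 7  # stand-in for a randomly chosen coefficient


def linear(v):
    return r * v


def nonlinear(v):
    return r * v * v


print(run_join_example(linear, a=3, b=10, w=4))     # prints: (True, True)
print(run_join_example(nonlinear, a=3, b=10, w=4))  # prints: (True, False)
```

With the linear interpretation, F commutes with the affine join, so z = F(y) is verified; with the non-linear one the true equivalence is lost at the join.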

24 Random Linear Interpretation
Encode $F(e_1, e_2) = r_1 e_1 + r_2 e_2$
– Preserves all equivalences across a join point
– Introduces false equivalences in straight-line code. E.g. $e$ and $e'$ have the same encodings even though $e \neq e'$:
  $e = F(F(a,b), F(c,d))$
  $e' = F(F(a,c), F(b,d))$
Encodings:
$e = r_1(r_1 a + r_2 b) + r_2(r_1 c + r_2 d) = r_1^2(a) + r_1 r_2(b) + r_2 r_1(c) + r_2^2(d)$
$e' = r_1^2(a) + r_1 r_2(c) + r_2 r_1(b) + r_2^2(d)$
Problem: Scalar multiplication is commutative.
Solution: Choose $r_1$ and $r_2$ to be random matrices and evaluate expressions to vectors.
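
The effect of switching from scalars to matrices can be seen on the two expressions F(F(a,b), F(c,d)) and F(F(a,c), F(b,d)). The sketch below is illustrative only: the names `encode`, `mat_vec` and `vec_add` are hypothetical, expressions are nested tuples, and the coefficient matrices and variable vectors are fixed small values standing in for random choices. The scalar case is modelled as 1x1 matrices so that one encoder covers both.

```python
def mat_vec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(m[r][c] * v[c] for c in range(len(v))) for r in range(len(m))]


def vec_add(u, v):
    return [x + y for x, y in zip(u, v)]


def encode(expr, env, r1, r2):
    """Evaluate expr under F(e1, e2) = r1*e1 + r2*e2; variables map to vectors."""
    if isinstance(expr, str):
        return env[expr]
    _, e1, e2 = expr
    return vec_add(mat_vec(r1, encode(e1, env, r1, r2)),
                   mat_vec(r2, encode(e2, env, r1, r2)))


e = ("F", ("F", "a", "b"), ("F", "c", "d"))
e_prime = ("F", ("F", "a", "c"), ("F", "b", "d"))

# Scalar coefficients (1x1 matrices): the two different expressions collide.
scalar_env = {"a": [1], "b": [2], "c": [7], "d": [4]}
print(encode(e, scalar_env, [[3]], [[5]]), encode(e_prime, scalar_env, [[3]], [[5]]))
# prints: [244] [244]

# Non-commuting 2x2 matrices: the encodings differ.
r1 = [[1, 2], [3, 4]]
r2 = [[0, 1], [1, 0]]
vector_env = {"a": [1, 0], "b": [0, 1], "c": [1, 1], "d": [2, 5]}
print(encode(e, vector_env, r1, r2), encode(e_prime, vector_env, r1, r2))
# prints: [17, 26] [16, 29]
```

The collision in the scalar case comes from r1*r2 = r2*r1; with matrices that do not commute, the b and c terms are no longer interchangeable.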

25 Outline
Random Interpretation
– Linear arithmetic (POPL 2003)
– Uninterpreted functions (POPL 2004)
– Inter-procedural analysis (POPL 2005)
– Other applications

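26 Example
[Flowchart: two consecutive non-deterministic conditionals (*), followed by two assertions]
First conditional: a := 0; b := i;  or  a := i-2; b := 2;
Second conditional: c := b - a; d := i - 2b;  or  c := 2a + b; d := b - 2i;
Assertions: assert (c + d = 0); assert (c = a + i)
– The second assertion is true in the context i=2.
– Interprocedural analysis requires computing procedure summaries.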

27 Idea #1: Keep input variables symbolic
[Flowchart of the example, annotated with symbolic states; w1 = 5, w2 = 2]
After a := 0; b := i: a=0, b=i
After a := i-2; b := 2: a=i-2, b=2
After the first join (w1 = 5): a=8-4i, b=5i-8
After c := b - a; d := i - 2b: a=8-4i, b=5i-8, c=9i-16, d=16-9i
After c := 2a + b; d := b - 2i: a=8-4i, b=5i-8, c=8-3i, d=3i-8
After the second join (w2 = 2): a=8-4i, b=5i-8, c=21i-40, d=40-21i
Instantiated in the context i=2: a=0, b=2, c=2, d=-2
– Do not choose random values for input variables (to later instantiate by any context).
– The resulting program state at the end is a random procedure summary.
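
The symbolic states on this slide can be recomputed by running the same joins over linear polynomials in i instead of numbers. This is a minimal sketch, assuming a hypothetical `Lin` class that stores const + coeff*i and a hard-coded version of the example procedure; it is not taken from the original work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lin:
    """The linear polynomial const + coeff * i in the symbolic input i."""
    const: int
    coeff: int

    def __add__(self, other):
        return Lin(self.const + other.const, self.coeff + other.coeff)

    def __sub__(self, other):
        return Lin(self.const - other.const, self.coeff - other.coeff)

    def scale(self, k):
        return Lin(k * self.const, k * self.coeff)

    def at(self, i):
        """Instantiate the polynomial for a concrete context i."""
        return self.const + self.coeff * i


I = Lin(0, 1)  # the symbolic input variable i


def const(k):
    return Lin(k, 0)


def join(w, u, v):
    """Affine join w*u + (1 - w)*v on symbolic values."""
    return u.scale(w) + v.scale(1 - w)


def summarize(w1, w2):
    """Random procedure summary of the example for join weights w1, w2."""
    a = join(w1, const(0), I - const(2))
    b = join(w1, I, const(2))
    c = join(w2, b - a, a.scale(2) + b)
    d = join(w2, I - b.scale(2), b - I.scale(2))
    return a, b, c, d


a, b, c, d = summarize(5, 2)
print(a, b, c, d)
# a = 8-4i, b = 5i-8, c = 21i-40, d = 40-21i, as on the slide

for context in (2, 3):
    print(context, c.at(context) + d.at(context) == 0, c.at(context) == a.at(context) + context)
# prints:
# 2 True True
# 3 True False
```

The summary is computed once and then instantiated per calling context: the second assertion is verified for i = 2 and refuted for i = 3 without re-running the procedure.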

28 Experiments

29 Experiments
[Chart comparing the Randomized and Deterministic algorithms]
– Randomized algorithm discovers 10-70% more facts.
– Randomized algorithm is slower by a factor of 2.

30 Experimental measure of error
[Table of the % of incorrect relationships, indexed by S and N]
S = size of set from which random values are chosen
N = # of random summaries used
– The % of incorrect relationships decreases with increase in S and N.
– The experimental results are better than what is predicted by theory.

31 Outline
Random Interpretation
– Linear arithmetic (POPL 2003)
– Uninterpreted functions (POPL 2004)
– Inter-procedural analysis (POPL 2005)
– Other applications

32 Other applications of random interpretation
Model Checking
– Randomized equivalence testing algorithm for FCEDs, which represent conditional linear expressions and are a generalization of BDDs. (SAS 04)
Theorem Proving
– Randomized decision procedure for linear arithmetic and uninterpreted functions. This runs an order of magnitude faster than deterministic algorithms. (CADE 03)
Ideas for deterministic algorithms
– PTIME algorithm for global value numbering, thereby solving a 30-year-old open problem. (SAS 04)

33 Summary
Key Idea
– Linear Arithmetic: Affine Join, Adjust
– Uninterpreted Fns.: Vectors
– Interproc. Analysis: Symbolic i/p variables
Lessons Learned
– Randomization buys efficiency, simplicity at cost of prob. soundness.
– Randomization suggests ideas for deterministic algorithms.
– Combining randomized and symbolic techniques is powerful.