Global Value Numbering using Random Interpretation Sumit Gulwani George C. Necula CS Department University of California, Berkeley
2 Global Value Numbering
Problem
–To detect equivalences of expressions in a program
–To obtain a complete algorithm under the assumptions:
  Conditionals are non-deterministic
  Operators are uninterpreted: $F(e_1, e_2) = F'(e_1', e_2') \iff F = F',\ e_1 = e_1',\ e_2 = e_2'$
Existing algorithms
–Precise but expensive
–Efficient but imprecise
Use randomization to obtain a precise, efficient, but probabilistically sound algorithm
–Complements our POPL 03 algorithm, which handles only arithmetic
3 Outline
Two key ideas in the algorithm
–The affine join operation
–k-linear interpretations
Correctness of the algorithm
Termination of the algorithm
4 Example
Program (* is a non-deterministic conditional):
  One branch: x := a; y := a; z := F(a);
  Other branch: x := b; y := b; z := F(b);
  After the join: assert(x = y); assert(z = F(y));
Values at the join: $x = \phi(a, b)$, $y = \phi(a, b)$, $z = \phi(F(a), F(b))$, $F(y) = F(\phi(a, b))$
Typical algorithms treat $\phi$ as uninterpreted
–Hence cannot verify the second assertion
The randomized algorithm interprets $\phi$
–Similar to the randomized algorithm for linear arithmetic
5 Review: Randomized Algorithm for Linear Arithmetic
Example program (two non-deterministic conditionals in sequence, each with a T and an F branch):
  First conditional: a := 0; b := 1;  or  a := 1; b := 0;
  Second conditional: c := b - a; d := 1 - 2b;  or  c := 2a + b; d := b - 2;
  After the second join: assert (c + d = 0); assert (c = a + 1)
Between random testing and abstract interpretation
Choose random values for input variables
Execute both branches
Combine the values of a variable at join points using a random affine combination
6 Review: The Affine Join Operation
Affine combination of $v_1$ and $v_2$ w.r.t. weight $w$: $\phi_w(v_1, v_2) \equiv w \cdot v_1 + (1 - w) \cdot v_2$
Affine join preserves common linear relationships (e.g. a + b = 5)
It does not introduce false relationships w.h.p.
Unfortunately, non-linear relationships are not preserved (e.g. a(1 + b) = 8)
Example (w = 7): one branch sets a := 2; b := 3; the other sets a := 4; b := 1;
–After the join: $a = \phi_7(2, 4) = -10$, $b = \phi_7(3, 1) = 15$
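The slide's numbers can be checked directly. The following is a minimal sketch, where the function name `affine_join` is chosen here for illustration and is not taken from the talk:

```python
def affine_join(w, v1, v2):
    """Affine combination phi_w(v1, v2) = w*v1 + (1 - w)*v2."""
    return w * v1 + (1 - w) * v2


w = 7
a = affine_join(w, 2, 4)  # branch values of a: 2 and 4
b = affine_join(w, 3, 1)  # branch values of b: 3 and 1

print(a, b)              # -10 15
print(a + b == 5)        # True: the linear relationship survives the join
print(a * (1 + b) == 8)  # False: the non-linear relationship is lost
```

Both branches satisfy a + b = 5 and a(1 + b) = 8, but only the linear relationship still holds for the joined values.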
7 Review: Example
First conditional: a := 0; b := 1; (state a = 0, b = 1)  or  a := 1; b := 0; (state a = 1, b = 0)
Join with $w_1 = 5$: a = -4, b = 5
Second conditional: c := b - a; d := 1 - 2b; (state a = -4, b = 5, c = 9, d = -9)  or  c := 2a + b; d := b - 2; (state a = -4, b = 5, c = -3, d = 3)
Join with $w_2 = -3$: a = -4, b = 5, c = -39, d = 39
assert (c + d = 0); assert (c = a + 1)
Choose a random weight for each join independently.
All choices of random weights verify the first assertion.
Almost all choices contradict the second assertion.
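The run on this slide can be replayed in a few lines. This is a sketch only: the names `run_example` and `check` are hypothetical, and plain Python integers stand in for whatever value domain an actual implementation would use.

```python
def affine_join(w, v1, v2):
    """Affine combination phi_w(v1, v2) = w*v1 + (1 - w)*v2."""
    return w * v1 + (1 - w) * v2


def run_example(w1, w2):
    """Execute both branches of each conditional and join with weights w1, w2."""
    # First conditional: (a := 0; b := 1) or (a := 1; b := 0)
    a = affine_join(w1, 0, 1)
    b = affine_join(w1, 1, 0)
    # Second conditional: (c := b - a; d := 1 - 2b) or (c := 2a + b; d := b - 2)
    c = affine_join(w2, b - a, 2 * a + b)
    d = affine_join(w2, 1 - 2 * b, b - 2)
    return a, b, c, d


def check(w1, w2):
    """Return the truth values of the two assertions for the given weights."""
    a, b, c, d = run_example(w1, w2)
    return c + d == 0, c == a + 1


print(run_example(5, -3))  # (-4, 5, -39, 39)
print(check(5, -3))        # (True, False)
```

Working the algebra through for this program, the second assertion survives only when $w_2 = 0$ or $w_1 = 1$, which is why almost every random choice of weights refutes it.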
8 Uninterpreted Functions
$e ::= y \mid F(e_1, e_2)$
Choose a random interpretation for F
Non-linear interpretation
–E.g. $F(e_1, e_2) = r_1 e_1^2 + r_2 e_2^2$
–Preserves all equivalences in straight-line code
–But not across join points
Let's try a linear interpretation
9 (Naïve) Linear Interpretation
Encode $F(e_1, e_2) = r_1 e_1 + r_2 e_2$
Preserves all equivalences across a join point
Introduces false equivalences in straight-line code
E.g. $e = F(F(a, b), F(c, d))$ and $e' = F(F(a, c), F(b, d))$ have the same encodings even though $e \neq e'$
Encodings
–$e = r_1 (r_1 a + r_2 b) + r_2 (r_1 c + r_2 d) = r_1^2 a + r_1 r_2 b + r_2 r_1 c + r_2^2 d$
–$e' = r_1^2 a + r_1 r_2 c + r_2 r_1 b + r_2^2 d$
Problem: too few random coefficients!
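The collision is easy to reproduce. In the sketch below, `make_naive_F` is a hypothetical helper name and the concrete numbers are arbitrary:

```python
def make_naive_F(r1, r2):
    """Scalar linear encoding F(e1, e2) = r1*e1 + r2*e2."""
    def F(e1, e2):
        return r1 * e1 + r2 * e2
    return F


F = make_naive_F(3, 5)
a, b, c, d = 2, 7, 11, 13

e = F(F(a, b), F(c, d))        # F(F(a,b), F(c,d))
e_prime = F(F(a, c), F(b, d))  # F(F(a,c), F(b,d)): a different expression

print(e, e_prime)  # 613 613
```

Because scalar multiplication commutes, $r_1 r_2 = r_2 r_1$, so swapping b and c never changes the encoding, whatever coefficients are drawn.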
10 k-linear Interpretations
Encode $F(e_1, e_2) = R_1 e_1 + R_2 e_2$
–Every expression evaluates to a vector of length $k$
–$R_1$ and $R_2$ are random $k \times k$ matrices
–$2k^2$ random variables, $k = o(n)$
Works since matrix multiplication is not commutative
–$e = R_1^2 a + R_1 R_2 b + R_2 R_1 c + R_2^2 d$
–$e' = R_1^2 a + R_1 R_2 c + R_2 R_1 b + R_2^2 d$
[Figure: the components $F(e_1, e_2)_1, \ldots, F(e_1, e_2)_k$ of the result vector, computed from the components $e_{1,1}, \ldots, e_{1,k}$ of $e_1$ and $e_{2,1}, \ldots, e_{2,k}$ of $e_2$]
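A small sketch of the matrix encoding shows the two expressions from the naïve slide being told apart. Everything concrete here is an assumption made for illustration: the prime modulus `P`, the choice k = 2, the seed, and the helper names.

```python
import random

P = 2**31 - 1  # a prime modulus; an assumption made for this sketch


def mat_vec(M, v):
    """Multiply a k x k matrix by a length-k vector, modulo P."""
    return [sum(m * x for m, x in zip(row, v)) % P for row in M]


def make_F(R1, R2):
    """Return the k-linear encoding F(e1, e2) = R1*e1 + R2*e2 (mod P)."""
    def F(e1, e2):
        left, right = mat_vec(R1, e1), mat_vec(R2, e2)
        return [(x + y) % P for x, y in zip(left, right)]
    return F


def random_matrix(k, rng):
    return [[rng.randrange(P) for _ in range(k)] for _ in range(k)]


def random_vector(k, rng):
    return [rng.randrange(P) for _ in range(k)]


rng = random.Random(0)
k = 2
F = make_F(random_matrix(k, rng), random_matrix(k, rng))
a, b, c, d = (random_vector(k, rng) for _ in range(4))

e = F(F(a, b), F(c, d))
e_prime = F(F(a, c), F(b, d))
# The two encodings differ unless the random choices are unlucky.
print(e != e_prime)
```

The difference between the two encodings is $(R_1 R_2 - R_2 R_1)(b - c)$, which vanishes only when the matrices happen to commute on $b - c$.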
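11 The Random Interpreter R
$V$: Variables $\to$ Vectors
$V(e)$: defined inductively as $V(F(e_1, e_2)) = R_1 V(e_1) + R_2 V(e_2)$
$V^j(e)$: the $j$th component of vector $V(e)$
Flowchart nodes:
–Assignment y := e (state $V$ before, $V_1$ after): $V_1 = V[y \leftarrow V(e)]$
–Conditional * (state $V$ before, $V_1$ on the True branch, $V_2$ on the False branch): $V_1 = V$, $V_2 = V$
–Join (states $V_1$ and $V_2$ before, $V$ after): $V^j(y) = \phi_w(V_1^j(y), V_2^j(y))$ for all $y$, $j$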
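As a sketch of these rules in action, the program from the example slide (x := a; y := a; z := F(a) on one branch, x := b; y := b; z := F(b) on the other) can be run on random vectors. Several things here are assumptions for illustration only: F is unary in that example, so it is modeled with a single random matrix R; arithmetic is done modulo a prime `P`; and all function names are hypothetical.

```python
import random

P = 2**31 - 1  # a prime modulus; an assumption made for this sketch


def affine_join(w, v1, v2):
    """Componentwise phi_w(v1, v2) = w*v1 + (1 - w)*v2 on vectors, modulo P."""
    return [(w * x + (1 - w) * y) % P for x, y in zip(v1, v2)]


def apply_F(R, v):
    """Unary F modeled as V(F(e)) = R * V(e), modulo P (assumed encoding)."""
    return [sum(m * x for m, x in zip(row, v)) % P for row in R]


def interpret(k, rng):
    """Run both branches on random inputs and join the resulting states."""
    R = [[rng.randrange(P) for _ in range(k)] for _ in range(k)]
    a = [rng.randrange(P) for _ in range(k)]
    b = [rng.randrange(P) for _ in range(k)]
    state1 = {"x": a, "y": a, "z": apply_F(R, a)}  # x := a; y := a; z := F(a)
    state2 = {"x": b, "y": b, "z": apply_F(R, b)}  # x := b; y := b; z := F(b)
    w = rng.randrange(P)                           # one random weight for the join
    joined = {var: affine_join(w, state1[var], state2[var]) for var in state1}
    return joined, R


state, R = interpret(3, random.Random(1))
print(state["x"] == state["y"])              # True: assert(x = y)
print(state["z"] == apply_F(R, state["y"]))  # True: assert(z = F(y))
```

Both checks succeed for every choice of random values, because the affine join and the linear encoding of F commute.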
12 Outline
Two key ideas in the algorithm
–The affine join operation
–k-linear interpretations
Correctness of the algorithm
Termination of the algorithm
13 Completeness and Soundness of R
We compare the random interpreter R with a suitable abstract interpreter A
R mimics A with high probability
–R is as complete as A
–R is (probabilistically) as sound as A
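14 The Abstract Interpreter A
$S$: set of symbolic equivalences
Flowchart nodes:
–Assignment y := e (state $S$ before, $S_1$ after): $S_1 = S[y'/y] \cup \{\, y = e[y'/y] \,\}$
–Conditional * (state $S$ before, $S_1$ on the True branch, $S_2$ on the False branch): $S_1 = S$, $S_2 = S$
–Join (states $S_1$ and $S_2$ before, $S$ after): $S = \{\, e_1 = e_2 \mid S_1 \Rightarrow e_1 = e_2,\ S_2 \Rightarrow e_1 = e_2 \,\}$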
15 Completeness Theorem
If $S \Rightarrow e_1 = e_2$, then $V(e_1) = V(e_2)$
Proof:
–Uninterpreted operators are modeled as linear functions
–The affine join operation preserves linear relationships
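The two proof ingredients combine in one line of algebra. As a sketch, using the encoding $F(e_1, e_2) = R_1 e_1 + R_2 e_2$ and the affine join $\phi_w$, with $a, b, c, d$ standing for arbitrary vectors (names chosen here for illustration):

```latex
\begin{aligned}
F(\phi_w(a, b), \phi_w(c, d))
  &= R_1 \bigl(w a + (1 - w) b\bigr) + R_2 \bigl(w c + (1 - w) d\bigr) \\
  &= w \,(R_1 a + R_2 c) + (1 - w)\,(R_1 b + R_2 d) \\
  &= \phi_w\bigl(F(a, c), F(b, d)\bigr)
\end{aligned}
```

So applying F to joined values gives the same vector as joining the values of F computed on each branch, for every weight $w$.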
16 Soundness Theorem
If $S \not\Rightarrow e_1 = e_2$, then with high probability $V(e_1) \neq V(e_2)$
Error probability $\le$ [bound missing from the extracted slide]
–n: number of function applications
–d: size of set from which random values are chosen
–t: number of repetitions
If n = 100, $d \approx 2^{32}$, t = 5, then error probability $\le$ [value missing from the extracted slide]
17 Outline
Two key ideas in the algorithm
–The affine join operation
–k-linear interpretations
Correctness of the algorithm
Termination of the algorithm
18 Loops and Fixed Point Computation
The lattice of sets of equivalences has finite height n.
Thus, the abstract interpreter A converges to a fixed point.
Thus, the random interpreter R also converges (probabilistically).
We can detect convergence by comparing the sets of symbolic relationships implied by the vectors in two successive iterations.
19 Related Work
Efficient but imprecise algorithms
–Congruence partitioning [Rosen, Wegman, Zadeck, POPL 88]
–Rewrite rules [Rüthing, Knoop, Steffen, SAS 99]
–Balanced algorithms [Gargi, PLDI 2002]
Precise but inefficient algorithms
–Abstract interpretation on uninterpreted functions [Kildall 73]
Affine join operation
–Random interpretation for linear arithmetic [Gulwani, Necula, POPL 03]
20 Conclusion and Future Work
Key ideas in the paper
$\phi_w(e_1, e_2) = w e_1 + (1 - w) e_2$
–Linearity $\iff$ Preserves equivalences across a join point
$F(e_1, e_2) = R_1 e_1 + R_2 e_2$
–Vectors $\Rightarrow$ Introduce no false equivalence
Random interpretation vs. deterministic algorithms
–Linear arithmetic: $O(n^2)$ vs. $O(n^4)$ [POPL 2003]
–Uninterpreted functions: $O(n^3)$ vs. $O(n^5 \log n)$ [this talk]
Future work
–Inter-procedural analysis using random interpretation
–Random interpretation for other theories
–Combining two random interpreters