Ben Livshits
Based in part on Stanford class slides from
Lecture 1: Introduction to static analysis
Lecture 2: Introduction to runtime analysis
Lecture 3: Applications of static and runtime analysis
    reliability & bug finding
    security
    performance
Really basic stuff
    Flow Graphs
    Constant Folding
    Global Common Subexpression Elimination
Data-flow analysis
    Proving Little Theorems
    Data-Flow Equations
    Major Examples: reaching definitions
Looking forward…
A never-published Stanford technical report by Fran Allen in 1968.
Flow graphs of intermediate code.
Source code → Lexing → Parsing → IR → Analysis → Code generation → Executable code
    for (i=0; i<n; i++) A[i] = 1;

Make flow explicit by breaking the code into basic blocks: sequences of steps with entry at the beginning and exit at the end.
Intermediate code exposes optimizable constructs that we cannot see at the source-code level.

Flow graph of the intermediate code:

    i = 0

    if i>=n goto …

    t1 = 8*i
    A[t1] = 1
    i = i+1
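One common way to find basic blocks is to mark "leaders": the first instruction, every jump target, and every instruction that follows a jump. The sketch below applies that rule to the loop above. The `(text, target)` encoding, the function names, the `exit` label, and the explicit back edge are all assumptions made for illustration; the slide elides the jump target.

```python
def find_leaders(instrs):
    """Return the sorted indices of instructions that start a basic block.

    instrs is a list of (text, target) pairs; target is the index of the
    instruction a jump goes to, or None for a non-jump.
    """
    if not instrs:
        return []
    leaders = {0}                          # the first instruction is a leader
    for idx, (_text, target) in enumerate(instrs):
        if target is not None:
            leaders.add(target)            # a jump target is a leader
            if idx + 1 < len(instrs):
                leaders.add(idx + 1)       # so is the instruction after a jump
    return sorted(leaders)


def split_basic_blocks(instrs):
    """Group instruction texts into basic blocks, one list per block."""
    bounds = find_leaders(instrs) + [len(instrs)]
    return [
        [text for text, _target in instrs[start:end]]
        for start, end in zip(bounds, bounds[1:])
    ]


# The loop from the slide; the "exit" label and the back edge are assumed.
LOOP = [
    ("i = 0", None),
    ("if i>=n goto exit", 6),
    ("t1 = 8*i", None),
    ("A[t1] = 1", None),
    ("i = i+1", None),
    ("goto test", 1),
    ("exit:", None),
]

for block in split_basic_blocks(LOOP):
    print(block)
```

Under these assumptions the initialization, the test, and the loop body each end up in a block of their own, which matches the three boxes of the flow graph.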
Sometimes a variable has a known constant value at a point.
If so, replacing the variable by the constant simplifies and speeds up the code.
Easy within a basic block; harder across blocks.
Before:

    i = 0
    n = 100

    if i>=n goto …

    t1 = 8*i
    A[t1] = 1
    i = i+1

After:

    t1 = 0

    if t1>=800 goto …

    A[t1] = 1
    t1 = t1+8
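The easy case, folding inside a single basic block, can be sketched as follows. This is a minimal illustration, not the transformation shown above, which also rewrites the loop across blocks. The tuple encoding `(dest, op, left, right)`, the use of `None` as the copy operator, and the names `fold_block`, `t2`, and `m` are assumptions.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_block(block):
    """Propagate and fold constants through one basic block.

    Each statement is (dest, op, left, right). Operands are ints or
    variable names; op None means a plain copy "dest = left".
    """
    known = {}    # variable -> constant value known at this point
    result = []
    for dest, op, left, right in block:
        # Replace operands by their constant values when those are known.
        left = known.get(left, left)
        right = known.get(right, right)
        if op is None and isinstance(left, int):
            known[dest] = left
            result.append((dest, None, left, None))
        elif op in OPS and isinstance(left, int) and isinstance(right, int):
            value = OPS[op](left, right)   # fold at analysis time
            known[dest] = value
            result.append((dest, None, value, None))
        else:
            known.pop(dest, None)          # dest is no longer a known constant
            result.append((dest, op, left, right))
    return result


BLOCK = [
    ("i", None, 0, None),       # i = 0
    ("n", None, 100, None),     # n = 100
    ("t1", "*", 8, "i"),        # t1 = 8*i   becomes t1 = 0
    ("t2", "+", "t1", "m"),     # t2 = t1+m  becomes t2 = 0+m (m is unknown)
]

for stmt in fold_block(BLOCK):
    print(stmt)
```

The `known` map is reset implicitly at block boundaries because each call starts from an empty map; carrying facts across blocks is what the data-flow machinery later in the lecture is for.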
Suppose block B has a computation of x+y.
Suppose we are sure that when we reach this computation, we have:
1. Computed x+y, and
2. Not subsequently reassigned x or y.
Then we can hold the value of x+y in a temporary and use it in B.
Before (one statement per block):

    a = x+y
    b = x+y
    c = x+y

After:

    t = x+y
    a = t

    t = x+y
    b = t

    c = t
[Figure: two further flow graphs for the x+y example, built from the blocks "t = x+y; a = t", "t = x+y; b = t", "b = t" and "c = t"]
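The global version needs data-flow analysis across blocks, but the same two conditions (x+y already computed, x and y not reassigned since) can be checked directly inside one basic block. The following is a sketch of that local case only. The tuple encoding, the `"copy"` operator, and the function name are assumptions for illustration.

```python
def eliminate_local_cse(block):
    """Reuse already-computed expressions within one basic block.

    Each statement is (dest, op, left, right); op "copy" means
    "dest = left". A repeated expression is replaced by a copy from the
    variable that still holds its value.
    """
    available = {}   # (op, left, right) -> variable currently holding it
    result = []
    for dest, op, left, right in block:
        key = (op, left, right)
        if op != "copy" and key in available:
            result.append((dest, "copy", available[key], None))
        else:
            result.append((dest, op, left, right))
        # Assigning dest invalidates expressions that read dest or live in it.
        available = {
            k: v for k, v in available.items()
            if dest not in (k[1], k[2]) and v != dest
        }
        # dest now holds the expression, unless it overwrote an operand.
        if op != "copy" and dest not in (left, right):
            available.setdefault(key, dest)
    return result


BLOCK = [
    ("a", "+", "x", "y"),   # a = x+y
    ("b", "+", "x", "y"),   # b = x+y  becomes b = a
    ("x", "+", "x", 1),     # x = x+1  reassigns x
    ("c", "+", "x", "y"),   # c = x+y  must be recomputed
]

for stmt in eliminate_local_cse(BLOCK):
    print(stmt)
```

The last statement is left alone because condition 2 fails: x was reassigned after x+y was computed.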
Proving Little Theorems
Data-Flow Equations
Major Examples: reaching definitions
    boolean x = true;
    while (x) {
        ... // no change to x
    }

Let’s prove that this loop doesn’t terminate.
Proof: the only assignment to x is at the top, so x is always true.
Flow graph:

    x = true

    if x == true

    “body”
Each place where some variable x is assigned is a definition.
Ask: for this use of x, where could x last have been defined?
In our example: only at x=true.
Flow graph with reaching definitions marked on the edges:

    d1: x = true

    if x == true

    d2: a = 10

Entry block to test: d1
Test to body: d1, d2
Body back to test: d1, d2
Since d1 is the only definition of x that reaches the test x == true, x must be true at that point.
The conditional is not really a conditional and can be replaced by an unconditional branch.
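The argument above can be phrased as a small check: collect the definitions of x that reach the use, and see whether they all assign the same constant. A minimal sketch, assuming the reaching sets have already been computed; the dictionary layout and the function name are hypothetical, and `None` stands for "not a known constant".

```python
def constant_at_use(var, reaching, definitions):
    """Return the constant value of var at a use, or None if unknown.

    reaching    -- set of definition ids that reach the use
    definitions -- id -> (variable, constant value or None if not constant)
    """
    values = {definitions[d][1] for d in reaching if definitions[d][0] == var}
    if len(values) == 1:
        (value,) = values
        return value      # still None when the only definition is not constant
    return None           # no definition, or several different values


# The loop example: d1 is "x = true", d2 is "a = 10".
DEFS = {"d1": ("x", True), "d2": ("a", 10)}
print(constant_at_use("x", {"d1", "d2"}, DEFS))

# A variable with two reaching definitions, one of them not a constant.
DEFS_IJ = {"d1": ("i", 2), "d3": ("i", None)}
print(constant_at_use("i", {"d1", "d3"}, DEFS_IJ))
```

When the result is a constant, a test such as `x == true` can be decided at analysis time and replaced by a branch.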
    int i = 2;
    int j = 3;
    while (i != j) {
        if (i < j) i += 2;
        else j += 2;
    }

We’ll develop techniques for this problem, but later …
Flow graph with reaching definitions marked on the edges:

    d1: i = 2
    d2: j = 3

    if i != j

    if i < j

    d3: i = i+2          d4: j = j+2

Leaving the first block: d1, d2
At the tests: d1, d2, d3, d4
Leaving d3: d2, d3, d4
Leaving d4: d1, d3, d4
In this example, i can be defined in two places, and j in two places.
There is no obvious way to discover that i!=j is always true.
But that is OK, because reaching definitions is sufficient to catch most opportunities for constant folding (replacement of a variable by its only possible value).
It’s OK to discover a subset of the opportunities to make some code-improving transformation.
It’s not OK to think you have an opportunity that you don’t really have.
Must we always be conservative?
    boolean x = true;
    while (x) {
        ...
        *p = false;
        ...
    }

Is it possible that p points to x?
Flow graph:

    d1: x = true

    if x == true

    d2: *p = false      (another def of x)

Reaching the test: d1, d2
Just as data-flow analysis of “reaching definitions” can tell what definitions of x might reach a point, another DFA can eliminate cases where p definitely does not point to x.
Example: in a C program the only definition of p is p = &y and there is no possibility that y is an alias of x.
A definition d of a variable x is said to reach a point p in a flow graph if:
1. Some path from the entry of the flow graph to p has d on the path, and
2. After the last occurrence of d on that path, x is not definitely redefined.
A basic block can generate a definition.
For a definition that reaches it, a basic block can either:
1. Kill the definition of x if it surely redefines x, or
2. Transmit the definition if it may not redefine the same variable(s) as that definition.
Variables:
1. IN(B) = set of definitions reaching the beginning of block B.
2. OUT(B) = set of definitions reaching the end of B.
Two kinds of equations:
1. Confluence equations: IN(B) in terms of the OUTs of the predecessors of B.
2. Transfer equations: OUT(B) in terms of IN(B) and what goes on in block B.
IN(B) = ∪ OUT(P), taken over all predecessors P of B

Example: B has two predecessors, P1 and P2, whose OUT sets are {d1, d2} and {d2, d3}. Then IN(B) = {d1, d2, d3}.
Generate a definition in the block if its variable is not definitely rewritten later in the basic block.
Kill a definition if its variable is definitely rewritten in the block.
An internal definition may be both killed and generated.
Block:

    d1: y = 3
    d2: x = y+z
    d3: *p = 10
    d4: y = 5

IN = {d2(x), d3(y), d3(z), d5(y), d6(y), d7(z)}
Kill includes {d1(y), d2(x), d3(y), d5(y), d6(y),…}
Gen = {d2(x), d3(x), d3(z),…, d4(y)}
OUT = {d2(x), d3(x), d3(z),…, d4(y), d7(z)}
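For blocks without pointer stores, Gen and Kill follow mechanically from the rules above. The sketch below leaves out the `*p = 10` definition on purpose, since handling it needs may-alias information; that omission, the data layout, and the names `gen_kill`, `ALL_DEFS`, and `BLOCK` are assumptions made to keep the example small.

```python
def gen_kill(block, all_defs):
    """Compute (Gen, Kill) for one basic block.

    block    -- list of (definition id, variable) in program order
    all_defs -- definition id -> variable, for the whole flow graph
    Every definition is assumed to assign exactly one variable for sure.
    """
    last_def = {}
    for def_id, var in block:
        last_def[var] = def_id      # a later definition overrides an earlier one
    gen = set(last_def.values())    # definitions not rewritten later in the block
    assigned = set(last_def)        # variables definitely rewritten in the block
    kill = {d for d, var in all_defs.items() if var in assigned}
    return gen, kill


# Definitions of the example block (without d3) plus three from elsewhere.
ALL_DEFS = {"d1": "y", "d2": "x", "d4": "y", "d5": "y", "d6": "y", "d7": "z"}
BLOCK = [("d1", "y"), ("d2", "x"), ("d4", "y")]

gen, kill = gen_kill(BLOCK, ALL_DEFS)
print(sorted(gen))
print(sorted(kill))

# Transfer: definitions of z pass through, definitions of y from outside do not.
incoming = {"d5", "d7"}
print(sorted((incoming - kill) | gen))
```

Here d4 is both killed and generated, and d1 is killed without being generated because d4 rewrites y later in the same block.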
For any block B:

    OUT(B) = (IN(B) – Kill(B)) ∪ Gen(B)
For an n-block flow graph, there are 2n equations in 2n unknowns.
Alas, the solution is not unique.
Use iteration to get the least fixed point.
Has nice performance guarantees.
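To see why the solution need not be unique, take a loop whose body defines nothing. A definition that exists nowhere in the program can be added to every set inside the loop, and all equations still hold, because the loop keeps feeding it back to itself. The graph, the block names, and the made-up definition `dx` below are hypothetical; the checker simply tests both kinds of equation for every block.

```python
def satisfies_equations(in_sets, out_sets, preds, gen, kill):
    """Check the confluence and transfer equations for every block."""
    for b in preds:
        expected_in = set().union(*(out_sets[p] for p in preds[b]))
        expected_out = (in_sets[b] - kill[b]) | gen[b]
        if in_sets[b] != expected_in or out_sets[b] != expected_out:
            return False
    return True


# B1 holds d1, B2 is the loop test, B3 is a loop body with no definitions.
PREDS = {"B1": [], "B2": ["B1", "B3"], "B3": ["B2"]}
GEN = {"B1": {"d1"}, "B2": set(), "B3": set()}
KILL = {"B1": {"d1"}, "B2": set(), "B3": set()}

# The least solution: only d1 circulates.
LEAST_IN = {"B1": set(), "B2": {"d1"}, "B3": {"d1"}}
LEAST_OUT = {"B1": {"d1"}, "B2": {"d1"}, "B3": {"d1"}}

# A larger solution: "dx" is made up, yet the equations cannot rule it out.
BIGGER_IN = {"B1": set(), "B2": {"d1", "dx"}, "B3": {"d1", "dx"}}
BIGGER_OUT = {"B1": {"d1"}, "B2": {"d1", "dx"}, "B3": {"d1", "dx"}}

print(satisfies_equations(LEAST_IN, LEAST_OUT, PREDS, GEN, KILL))
print(satisfies_equations(BIGGER_IN, BIGGER_OUT, PREDS, GEN, KILL))
```

Both assignments pass the check. Starting the iteration from empty sets is what keeps spurious definitions like `dx` out of the answer.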
    IN(entry) = ∅;
    for each block B do OUT(B) = ∅;
    while (changes occur) do
        for each block B do {
            IN(B) = ∪ predecessors P of B OUT(P);
            OUT(B) = (IN(B) – Kill(B)) ∪ Gen(B);
        }
Flow graph:

    B1:  d1: x = 5

    B2:  if x == 10

    B3:  d2: x = 15      (loops back to B2)

IN(B1) = {}
OUT(B1) = {d1}
IN(B2) = {d1, d2}
OUT(B2) = {d1, d2}
IN(B3) = {d1, d2}
OUT(B3) = {d2}
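The iteration can be written out as a short runnable sketch and tried on a three-block graph like the one above. The block numbering is an assumption here: B1 holds d1, B2 is the test, and B3 holds d2 and loops back to B2. The function name and the dictionary encoding of the graph are also illustrative choices.

```python
def reaching_definitions(blocks, preds, gen, kill):
    """Iterate the data-flow equations, starting from empty sets.

    blocks -- block names in the order they are visited
    preds  -- block -> list of predecessor blocks
    gen, kill -- block -> set of definition ids
    Returns (IN, OUT) as dictionaries of sets.
    """
    in_sets = {b: set() for b in blocks}
    out_sets = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            # Confluence: union over the OUT sets of all predecessors.
            in_sets[b] = set().union(*(out_sets[p] for p in preds[b]))
            # Transfer: remove what the block kills, add what it generates.
            new_out = (in_sets[b] - kill[b]) | gen[b]
            if new_out != out_sets[b]:
                out_sets[b] = new_out
                changed = True
    return in_sets, out_sets


BLOCKS = ["B1", "B2", "B3"]
PREDS = {"B1": [], "B2": ["B1", "B3"], "B3": ["B2"]}
GEN = {"B1": {"d1"}, "B2": set(), "B3": {"d2"}}
KILL = {"B1": {"d1", "d2"}, "B2": set(), "B3": {"d1", "d2"}}

IN, OUT = reaching_definitions(BLOCKS, PREDS, GEN, KILL)
for b in BLOCKS:
    print(b, sorted(IN[b]), sorted(OUT[b]))
```

With this visiting order the sets stop changing after the second pass, and a third pass confirms it. Only changes to OUT are tracked, since every IN set is recomputed from the OUT sets on each pass.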
We make not only the most conservative assumption about when a def is killed or gen’d, but also the conservative assumption that any path in the flow graph can actually be taken.
Must we always be conservative?
Inter-procedural analysis
Context-sensitive analysis
Object-sensitive analysis
Pointers or references
Reflection
Dynamic code generation

Some is covered in Yannis’s lectures…
research
    f = function(x){ return x+1; }
    function compose(){ return function(x){ return f(f(x)) }; }

    compose(f)(3);
    this['compose'](f)(3)
    this['comp' + 'os' + ("E".toLowerCase())](f)(3)
    function(f){ return ; eval("compose(f)(" + arguments[0] + ");"); }()(3);