Download presentation
Presentation is loading. Please wait.
Published byHarvey White Modified over 9 years ago
1
Compilation 0368-3133 (Semester A, 2013/14) Lecture 12: Abstract Interpretation Noam Rinetzky Slides credit: Roman Manevich, Mooly Sagiv and Eran Yahav 1
2
What is a compiler? “A compiler is a computer program that transforms source code written in a programming language (source language) into another language (target language). The most common reason for wanting to transform source code is to create an executable program.” --Wikipedia 2
3
AST + Sym. Tab. Stages of compilation Source code (program) Lexical Analysis Syntax Analysis Parsing Context Analysis Portable/Retarg etable code generation Target code (executable) Assembly IR TextToken streamAST Code Generation 3
4
Instruction selection AST + Sym. Tab. Stages of Compilation Source code (program) Lexical Analysis Syntax Analysis Parsing Context Analysis Portable/Retarg etable code generation Target code (executable) Assembly IR TextToken streamAST “Naive” IR IR OptimizationIR (TAC) generation “Assembly” (symbolic registers) Peephole optimization Assembly Code Generation register allocation 4
5
Optimization points source code Front end IR Code generator target code Program Analysis Abstract interpretation today 5
6
Program Analysis In order to optimize a program, the compiler has to be able to reason about the properties of that program An analysis is called sound if it never asserts an incorrect fact about a program All the analyses we will discuss in this class are sound –(Why?) 6
7
Optimization path IR Control-Flow Graph CFG builder Program Analysis Annotated CFG Optimizing Transformation Target Code Code Generation (+optimizations) done with IR optimizations IR optimizations 7
8
Soundness int x; int y; if (y < 5) x = 137; else x = 42; Print(x); “At this point in the program, x holds some integer value” 8
9
Soundness int x; int y; if (y < 5) x = 137; else x = 42; Print(x); “At this point in the program, x is either 137 or 42” 9
10
(Un) Soundness int x; int y; if (y < 5) x = 137; else x = 42; Print(x); “At this point in the program, x is 137” 10
11
Soundness & Precision int x; int y; if (y < 5) x = 137; else x = 42; Print(x); “At this point in the program, x is either 137, 42, or 271” 11
12
Semantics-preserving optimizations An optimization is semantics-preserving if it does not alter the semantics (meaning) of the original program Eliminating unnecessary temporary variables Computing values that are known statically at compile- time instead of computing them at runtime Evaluating iteration-independent expressions outside of a loop instead of inside ✗ Replacing bubble sort with quicksort (why?) The optimizations we will consider in this class are all semantics-preserving 12
13
Types of optimizations An optimization is local if it works on just a single basic block An optimization is global if it works on an entire control-flow graph 13
14
Local optimizations _t0 = 137; y = _t0; IfZ x Goto _L0; start _t1 = y; z = _t1; _t2 = y; x = _t2; start int main() { int x; int y; int z; y = 137; if (x == 0) z = y; else x = y; } 14
15
Local optimizations y = 137; IfZ x Goto _L0; start z = y;x = y; End int main() { int x; int y; int z; y = 137; if (x == 0) z = y; else x = y; } 15
16
Global optimizations y = 137; IfZ x Goto _L0; start z = y;x = y; End int main() { int x; int y; int z; y = 137; if (x == 0) z = y; else x = y; } 16
17
Global optimizations y = 137; IfZ x Goto _L0; start z = 137;x = 137; End int main() { int x; int y; int z; y = 137; if (x == 0) z = y; else x = y; } 17
18
Global Analyses Common subexpression elimination Copy propagation Dead code elimination Live variables –Used for register allocation 18
19
Global liveness analysis a=b+c s2s3 IN[s2]IN[s3] OUT[s]=IN[s2] IN[s3] IN[s]=(OUT[s] – {a}) {b, c} 19
20
Live variable analysis- initialization Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {} 20
21
Live variable analysis Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {} {a} 21
22
Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {} {a, b, c} {a} Live variable analysis IN[s2]=(OUT[s2] – {d}) {b, c} IN[s1]=(OUT[s1] – {a}) {a, b} ? 22
23
Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {} {a, b, c} {a} {a, b, c} Live variable analysis 23
24
Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {} {b, c} {} {a, b, c} {a} {a, b, c} CFGs with loops - iteration 24
25
Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {} {b, c} {} {a, b, c} {a} {a, b, c} {b, c} CFGs with loops - iteration 25
26
Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {} {b, c} {c, d} {a, b, c} {a} {a, b, c} {b, c} {a, b, c} CFGs with loops - iteration 26
27
Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {a, b} {b, c} {c, d} {a, b, c} {a} {a, b, c} {b, c} {a, b, c} Live variable analysis 27
28
Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {a, b} {b, c} {c, d} {a, b, c} {a} {a, b, c} {b, c} {a, b, c} Live variable analysis 28
29
Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {a, b} {b, c} {c, d} {a, b, c} {a, c, d} {a, b, c} {b, c} {a, b, c} Live variable analysis 29
30
Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {a, b} {b, c} {c, d} {a, b, c} {a, c, d} {a, b, c} {b, c} {a, b, c} Live variable analysis 30
31
Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {a, b} {b, c} {c, d} {a, b, c} {a, c, d} {a, b, c} {b, c} {a, b, c} Live variable analysis 31
32
Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {a, b} {b, c} {c, d} {a, b, c} {a, c, d} {a, b, c} {b, c} {a, b, c} Live variable analysis 32
33
Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {a, b} {b, c} {c, d} {a, b, c} {a, c, d} {a, b, c} Live variable analysis 33
34
Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {a, b} {b, c} {a, c, d} {a, b, c} {a, c, d} {a, b, c} Live variable analysis 34
35
Live variable analysis Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {a, b} {b, c} {a, c, d} {a, b, c} {a, c, d} {a, b, c} 35
36
Kill Gen Global liveness analysis: Kill/Gen a=b+c s2s3 IN[s2]IN[s3] OUT[s]=IN[s2] IN[s3] IN[s]=(OUT[s] – {a}) {b, c} 36
37
Formalizing data-flow analyses Define an analysis of a basic block as a quadruple (D, V, , F, I) where –D is a direction (forwards or backwards) –V is a set of values the program can have at any point – merging (joining) information –F is a family of transfer functions defining the meaning of any expression as a function f : V V –I is the initial information at the top (or bottom) of a basic block 37
38
Liveness Analysis Direction: Backward Values: Sets of variables Transfer functions: Given a set of variable assignments V and statement a = b + c: Remove a from V (any previous value of a is now dead.) Add b and c to V (any previous value of b or c is now live.) Formally: V in = ( V out \ { a }) { b,c } Initial value: Depends on semantics of language –E.g., function arguments and return values (pushes) –Result of local analysis of other blocks as part of a global analysis Merge: = 38
39
Available Expressions Direction: Forward Values: Sets of expressions assigned to variables Transfer functions: Given a set of variable assignments V and statement a = b + c: –Remove from V any expression containing a as a subexpression –Add to V the expression a = b + c –Formally: V out = ( V in \ {e | e contains a }) {a = b + c} Initial value: Empty set of expressions Merge: = ? 39
40
Why does this work? (Live Var.) To show correctness, we need to show that –The algorithm eventually terminates, and –When it terminates, it has a sound answer Termination argument: –Once a variable is discovered to be live during some point of the analysis, it always stays live –Only finitely many variables and finitely many places where a variable can become live Soundness argument (sketch): –Each individual rule, applied to some set, correctly updates liveness in that set –When computing the union of the set of live variables, a variable is only live if it was live on some path leaving the statement 40
41
Abstract Interpretation Theoretical foundations of program analysis Cousot and Cousot 1977 Abstract meaning of programs –“Executed” at compile time Execution = Analysis 41
42
Another view of program analysis We want to reason about some property of the runtime behavior of the program Could we run the program and just watch what happens? Idea: Redefine the semantics of our programming language to give us information about our analysis 42
43
Another view of program analysis We want to reason about some property of the runtime behavior of the program Could we run the program and just watch what happens? 43
44
Another view of program analysis The only way to find out exactly what a program will actually do is to run it Problems: –The program might not terminate –The program might have some behavior we didn't see when we ran it on a particular input 44
45
Another view of program analysis The only way to find out exactly what a program will actually do is to run it Problems: –The program might not terminate –The program might have some behavior we didn't see when we ran it on a particular input Inside a basic block, it is simpler –Basic blocks contain no loops –There is only one path through the basic block 45
46
Another view of program analysis The only way to find out exactly what a program will actually do is to run it Problems: –The program might not terminate –The program might have some behavior we didn't see when we ran it on a particular input Inside a basic block, it is simpler –Basic blocks contain no loops –There is only one path through the basic block –But still … 46
47
Assigning new semantics (Local) Example: Available Expressions Redefine the statement a = b + c to mean “a now holds the value of b + c, and any variable holding the value a is now invalid” “Run” the program assuming these new semantics 47
48
Assigning new semantics (Global) Example: Available Expressions Redefine the statement a = b + c to mean “a now holds the value of b + c, and any variable holding the value a is now invalid” “Run” the program assuming these new semantics Merge information from different paths 48
49
Abstract Interpretation Example: Available Expressions Redefine the statement a = b + c to mean “a now holds the value of b + c, and any variable holding the value a is now invalid” “Run” the program assuming these new semantics Merge information from different paths Treat the optimizer as an interpreter for these new semantics 49
50
Theory of Program Analysis Building up all of the machinery to design this analysis was tricky The key ideas, however, are mostly independent of the analysis: –We need to be able to compute functions describing the behavior of each statement –We need to be able to merge several subcomputations together –We need an initial value for all of the basic blocks There is a beautiful formalism that captures many of these properties 50
51
Join semilattices A join semilattice is a ordering defined on a set of elements –0 ≤ 1 ≤ 2 ≤ … –{} ≤ {0} ≤ {0,1}, {1,2} ≤ {0,1,2} ≤ {1,2,3,4} Any two elements have some join that is the smallest element larger than both elements There is a unique bottom element, which is smaller than all other elements –The join of two elements represents combining (merging) information from two elements by an overapproximation The bottom element represents “no information yet” or “the least conservative possible answer” / 51
52
Join semilattice for liveness {} {a}{b}{c} {a, b}{a, c}{b, c} {a, b, c} Bottom element 52
53
What is the join of {b} and {c}? {} {a}{b}{c} {a, b}{a, c}{b, c} {a, b, c} 53
54
What is the join of {b} and {c}? {} {a}{b}{c} {a, b}{a, c}{b, c} {a, b, c} 54
55
What is the join of {b} and {a,c}? {} {a}{b}{c} {a, b}{a, c}{b, c} {a, b, c} 55
56
What is the join of {b} and {a,c}? {} {a}{b}{c} {a, b}{a, c}{b, c} {a, b, c} 56
57
What is the join of {a} and {a,b}? {} {a}{b}{c} {a, b}{a, c}{b, c} {a, b, c} 57
58
What is the join of {a} and {a,b}? {} {a}{b}{c} {a, b}{a, c}{b, c} {a, b, c} 58
59
Formal definitions A join semilattice is a pair (V, ), where V is a domain of elements is a join operator that is –commutative: x y = y x –associative: (x y) z = x (y z) –idempotent: x x = x If x y = z, we say that z is the join or (least upper bound) of x and y Every join semilattice has a bottom element denoted such that x = x for all x 59
60
Join semilattices and ordering {} {a}{b}{c} {a, b}{a, c}{b, c} {a, b, c} Greater Lower 60
61
Join semilattices and ordering {} {a}{b}{c} {a, b}{a, c}{b, c} {a, b, c} Least precise Most precise 61
62
Join semilattices and orderings Every join semilattice (V, ) induces an ordering relationship over its elements Define x y iff x y = y Need to prove –Reflexivity: x x –Antisymmetry: If x y and y x, then x = y –Transitivity: If x y and y z, then x z 62
63
An example join semilattice The set of natural numbers and the max function Idempotent –max{a, a} = a Commutative –max{a, b} = max{b, a} Associative –max{a, max{b, c}} = max{max{a, b}, c} Bottom element is 0: –max{0, a} = a What is the ordering over these elements? 63
64
A join semilattice for liveness Sets of live variables and the set union operation Idempotent: –x x = x Commutative: –x y = y x Associative: –(x y) z = x (y z) Bottom element: –The empty set: Ø x = x What is the ordering over these elements? 64
65
Semilattices and program analysis Semilattices naturally solve many of the problems we encounter in global analysis How do we combine information from multiple basic blocks? What value do we give to basic blocks we haven't seen yet? How do we know that the algorithm always terminates? 65
66
Semilattices and program analysis Semilattices naturally solve many of the problems we encounter in global analysis How do we combine information from multiple basic blocks? –Take the join of all information from those blocks What value do we give to basic blocks we haven't seen yet? –Use the bottom element How do we know that the algorithm always terminates? –Actually, we still don't! More on that later 66
67
A general framework A global analysis is a tuple ( D, V, , F, I ), where –D is a direction (forward or backward) The order to visit statements within a basic block, not the order in which to visit the basic blocks –V is a set of values – is a join operator over those values –F is a set of transfer functions f : V V –I is an initial value The only difference from local analysis is the introduction of the join operator 67
68
Running global analyses Assume that ( D, V, , F, I ) is a forward analysis Set OUT[s] = for all statements s Set OUT[entry] = I Repeat until no values change: –For each statement s with predecessors p 1, p 2, …, p n : Set IN[s] = OUT[p 1 ] OUT[p 2 ] … OUT[p n ] Set OUT[s] = f s (IN[s]) The order of this iteration does not matter –This is sometimes called chaotic iteration 68
69
The dataflow framework This form of analysis is called the dataflow framework Can be used to easily prove an analysis is sound With certain restrictions, can be used to prove that an analysis eventually terminates –Again, more on that later 69
70
Global constant propagation Constant propagation is an optimization that replaces each variable that is known to be a constant value with that constant An elegant example of the dataflow framework 70
71
Global constant propagation exit x = 4; z = x; w = x; y = x;z = y; x = 6; entry 71
72
Global constant propagation exit x = 4; z = x; w = x; y = x;z = y; x = 6; entry 72
73
Global constant propagation exit x = 4; z = x; w = 6; y = 6;z = y; x = 6; entry 73
74
Constant propagation analysis In order to do a constant propagation, we need to track what values might be assigned to a variable at each program point Every variable will either –Never have a value assigned to it, –Have a single constant value assigned to it, –Have two or more constant values assigned to it, or –Have a known non-constant value. –Our analysis will propagate this information throughout a CFG to identify locations where a value is constant 74
75
Properties of constant propagation For now, consider just some single variable x At each point in the program, we know one of three things about the value of x: –x is definitely not a constant, since it's been assigned two values or assigned a value that we know isn't a constant –x is definitely a constant and has value k –We have never seen a value for x Note that the first and last of these are not the same! –The first one means that there may be a way for x to have multiple values –The last one means that x never had a value at all 75
76
Defining a join operator The join of any two different constants is Not-a-Constant –(If the variable might have two different values on entry to a statement, it cannot be a constant) The join of Not a Constant and any other value is Not-a- Constant –(If on some path the value is known not to be a constant, then on entry to a statement its value can't possibly be a constant) The join of Undefined and any other value is that other value –(If x has no value on some path and does have a value on some other path, we can just pretend it always had the assigned value) 76
77
A semilattice for constant propagation One possible semilattice for this analysis is shown here (for each variable): Undefined 0-212... Not-a-constant The lattice is infinitely wide 77
78
A semilattice for constant propagation One possible semilattice for this analysis is shown here (for each variable): Undefined 0-212... Not-a-constant Note: The join of any two different constants is Not-a-Constant The join of Not a Constant and any other value is Not-a-Constant The join of Undefined and any other value is that other value 78
79
Global constant propagation exit x = 4; Undefined z = x; Undefined w = x; Undefined y = x; Undefined z = y; Undefined x = 6; Undefined entry Undefined x=Undefined y=Undefined z=Undefined w=Undefined 79
80
Global constant propagation exit x = 4; Undefined z = x; Undefined w = x; Undefined y = x; Undefined z = y; Undefined x = 6; Undefined entry Undefined 80
81
Global constant propagation exit x = 4; Undefined z = x; Undefined w = x; Undefined y = x; Undefined z = y; Undefined Undefined x = 6; Undefined entry Undefined 81
82
Global constant propagation exit x = 4; Undefined z = x; Undefined w = x; Undefined y = x; Undefined z = y; Undefined Undefined x = 6; x = 6, y=z=w= entry Undefined 82
83
Global constant propagation exit x = 4; Undefined z = x; Undefined w = x; Undefined y = x; Undefined z = y; Undefined Undefined x = 6; x = 6, y=z=w= entry Undefined 83
84
Global constant propagation exit x = 4; Undefined z = x; Undefined w = x; Undefined x=6 y = x; Undefined z = y; Undefined Undefined x = 6; x = 6 entry Undefined 84
85
Global constant propagation exit x = 4; Undefined z = x; Undefined w = x; Undefined x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 85
86
Global constant propagation exit x = 4; Undefined z = x; Undefined w = x; Undefined x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined y=6 y=Undefined gives what? 86
87
Global constant propagation exit x = 4; Undefined z = x; Undefined x=6,y=6 w = x; Undefined x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 87
88
Global constant propagation exit x = 4; Undefined z = x; Undefined x=6,y=6 w = x; Undefined x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 88
89
Global constant propagation exit x = 4; Undefined z = x; Undefined x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 89
90
Global constant propagation exit x = 4; Undefined z = x; Undefined x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 90
91
Global constant propagation exit x = 4; Undefined x=y=w=6 z = x; Undefined x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 91
92
Global constant propagation exit x = 4; Undefined x=y=w=6 z = x; x=y=w=z=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 92
93
Global constant propagation exit x = 4; Undefined x=y=w=6 z = x; x=y=w=z=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 93
94
Global constant propagation exit x=y=w=z=6 x = 4; Undefined x=y=w=6 z = x; x=y=w=z=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 94
95
Global constant propagation exit x=y=w=z=6 x = 4; x=4, y=w=z=6 x=y=w=6 z = x; x=y=w=z=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 95
96
Global constant propagation exit x=y=w=z=6 x = 4; x=4, y=w=z=6 x=y=w=6 z = x; x=y=w=z=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 96
97
Global constant propagation exit x=y=w=z=6 x = 4; x=4, y=w=z=6 x=y=w=6 z = x; x=y=w=z=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 x = 6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 97
98
Global constant propagation exit x=y=w=z=6 x = 4; x=4, y=w=z=6 x=y=w=6 z = x; x=y=w=z=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 x = 6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 98
99
Global constant propagation exit x=y=w=z=6 x = 4; x=4, y=w=z=6 x=y=w=6 z = x; x=y=w=z=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 x = 6 z = y; x = 6 Undefined x = 6; x = 6 entry Undefined 99
100
Global constant propagation exit x=y=w=z=6 x = 4; x=4, y=w=z=6 x=y=w=6 z = x; x=y=w=z=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 x = 6 z = y; x = 6 Undefined x = 6; x = 6 entry Undefined x=6 x=4 gives what? 100
101
Global constant propagation exit x=y=w=z=6 x = 4; x=4, y=w=z=6 y=w=6, x= z = x; x=y=w=z=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 x = 6 z = y; x = 6 Undefined x = 6; x = 6 entry Undefined 101
102
Global constant propagation exit x=y=w=z=6 x = 4; x=4, y=w=z=6 y=w=6,x= z = x; y=w=6,z=x= x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 x = 6 z = y; x = 6 Undefined x = 6; x = 6 entry Undefined 102
103
Global constant propagation exit x=y=w=z=6 x = 4; x=4,y=w=6,z= y=w=6,x= z = x; y=w=6,z=x= x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 x = 6 z = y; x = 6 Undefined x = 6; x = 6 entry Undefined 103
104
Global constant propagation exit y=w=6,z=x= x = 4; x=4,y=w=6,z= y=w=6,x= z = x; y=w=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 x = 6 z = y; x = 6 Undefined x = 6; x = 6 entry Undefined 104
105
Global constant propagation exit y=w=6,z=x= x = 4; x=4,y=w=6,z= y=w=6,x= z = x; y=w=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 x = 6 z = y; x = 6 Undefined x = 6; x = 6 entry Undefined 105
106
Global constant propagation exit y=w=6,z=x= x = 4; x=4,y=w=6,z= y=w=6, z=x= z = x; y=w=6, z=x= x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 x = 6 z = y; x = 6 Undefined x = 6; x = 6 entry Undefined 106
107
Global constant propagation exit y=w=6,z=x= x = 4; x=4,y=w=6,z= y=w=6, z=x= z = x; y=w=6, z=x= x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 x = 6 z = y; x = 6 Undefined x = 6; x = 6 entry Undefined Global analysis reached fixpoint 107
108
Dataflow for constant propagation Direction: Forward Semilattice: Vars {Undefined, 0, 1, -1, 2, -2, …, Not-a-Constant} –Join mapping for variables point-wise {x 1,y 1,z 1} {x 1,y 2,z Not-a-Constant} = {x 1,y Not-a-Constant,z Not-a-Constant} Transfer functions: –f x=k (V) = V| x k (update V by mapping x to k) –f x=a+b (V) = V| x ? Initial value: x is Undefined –(When might we use some other value?) 108
109
Dataflow for constant propagation Direction: Forward Semilattice: Vars {Undefined, 0, 1, -1, 2, -2, …, Not-a-Constant} –Join mapping for variables point-wise {x 1,y 1,z 1} {x 1,y 2,z Not-a-Constant} = {x 1,y Not-a-Constant,z Not-a-Constant} Transfer functions: –f x=k (V) = V| x k (update V by mapping x to k) –f x=a+b (V) = V| x ? Initial value: x is Undefined –(When might we use some other value?) 109
110
Proving termination Our algorithm for running these analyses continuously loops until no changes are detected Given this, how do we know the analyses will eventually terminate? –In general, we don‘t 110
111
Terminates? 111
112
Liveness Analysis A variable is live at a point in a program if later in the program its value will be read before it is written to again 112
113
Join semilattice definition A join semilattice is a pair (V, ), where V is a domain of elements is a join operator that is –commutative: x y = y x –associative: (x y) z = x (y z) –idempotent: x x = x If x y = z, we say that z is the join or (Least Upper Bound) of x and y Every join semilattice has a bottom element denoted such that x = x for all x 113
114
Partial ordering induced by join Every join semilattice (V, ) induces an ordering relationship over its elements Define x y iff x y = y Need to prove –Reflexivity: x x –Antisymmetry: If x y and y x, then x = y –Transitivity: If x y and y z, then x z 114
115
A join semilattice for liveness Sets of live variables and the set union operation Idempotent: –x x = x Commutative: –x y = y x Associative: –(x y) z = x (y z) Bottom element: –The empty set: Ø x = x Ordering over elements = subset relation 115
116
Join semilattice example for liveness {} {a}{b}{c} {a, b}{a, c}{b, c} {a, b, c} Bottom element 116
117
Dataflow framework A global analysis is a tuple ( D, V, , F, I ), where –D is a direction (forward or backward) The order to visit statements within a basic block, NOT the order in which to visit the basic blocks –V is a set of values (sometimes called domain) – is a join operator over those values –F is a set of transfer functions f s : V V (for every statement s) –I is an initial value 117
118
Running global analyses Assume that ( D, V, , F, I ) is a forward analysis For every statement s maintain values before - IN[s] - and after - OUT[s] Set OUT[s] = for all statements s Set OUT[entry] = I Repeat until no values change: –For each statement s with predecessors PRED[s]={p 1, p 2, …, p n } Set IN[s] = OUT[p 1 ] OUT[p 2 ] … OUT[p n ] Set OUT[s] = f s (IN[s]) The order of this iteration does not matter –Chaotic iteration 118
119
Proving termination Our algorithm for running these analyses continuously loops until no changes are detected Problem: how do we know the analyses will eventually terminate? 119
120
A non-terminating analysis The following analysis will loop infinitely on any CFG containing a loop: Direction: Forward Domain: ℕ Join operator: max Transfer function: f(n) = n + 1 Initial value: 0 120
121
A non-terminating analysis start end x++ 121
122
Initialization start end x++ 0 0 122
123
Fixed-point iteration start end x++ 0 0 123
124
Choose a block start end x ++ 0 0 124
125
Iteration 1 start end x ++ 0 0 0 125
126
Iteration 1 start end x++ 1 0 0 126
127
Choose a block start end x++ 1 0 0 127
128
Iteration 2 start end x ++ 1 0 0 128
129
Iteration 2 start end x++ 1 0 1 129
130
Iteration 2 start end x++ 2 0 1 130
131
Choose a block start end x++ 2 0 1 131
132
Iteration 3 start end x++ 2 0 1 132
133
Iteration 3 start end x++ 2 0 2 133
134
Iteration 3 start end x++ 3 0 2 134
135
Why doesn’t this terminate? Values can increase without bound Note that “increase” refers to the lattice ordering, not the ordering on the natural numbers The height of a semilattice is the length of the longest increasing sequence in that semilattice The dataflow framework is not guaranteed to terminate for semilattices of infinite height Note that a semilattice can be infinitely large but have finite height –e.g. constant propagation 0 1 2 3 4... 135
136
Height of a lattice An increasing chain is a sequence of elements a 1 a 2 … a k –The length of such a chain is k The height of a lattice is the length of the maximal increasing chain For liveness with n program variables: –{} {v 1 } {v 1,v 2 } … {v 1,…,v n } For available expressions it is the number of expressions of the form a=b op c –For n program variables and m operator types: m n 3 136
137
Another non-terminating analysis This analysis works on a finite-height semilattice, but will not terminate on certain CFGs: Direction: Forward Domain: Boolean values true and false Join operator: Logical OR Transfer function: Logical NOT Initial value: false 137
138
A non-terminating analysis start end x=!x 138
139
Initialization start end x=!x false 139
140
Fixed-point iteration start end x=!x false 140
141
Choose a block start end x=!x false 141
142
Iteration 1 start end x=!x false 142
143
Iteration 1 start end x=!x true false 143
144
Iteration 2 start end x=!x true false true 144
145
Iteration 2 start end x=!x false true 145
146
Iteration 3 start end x=!x false 146
147
Iteration 3 start end x=!x true false 147
148
Why doesn’t it terminate? Values can loop indefinitely Intuitively, the join operator keeps pulling values up If the transfer function can keep pushing values back down again, then the values might cycle forever fals e true fals e true fals e... 148
149
Why doesn’t it terminate? Values can loop indefinitely Intuitively, the join operator keeps pulling values up If the transfer function can keep pushing values back down again, then the values might cycle forever How can we fix this? fals e true fals e true fals e... 149
150
Monotone transfer functions A transfer function f is monotone iff if x y, then f(x) f(y) Intuitively, if you know less information about a program point, you can't “gain back” more information about that program point Many transfer functions are monotone, including those for liveness and constant propagation Note: Monotonicity does not mean that x f(x) –(This is a different property called extensivity) 150
151
Liveness and monotonicity A transfer function f is monotone iff if x y, then f(x) f(y) Recall our transfer function for a = b + c is –f a = b + c (V) = (V – {a}) {b, c} Recall that our join operator is set union and induces an ordering relationship X Y iff X Y Is this monotone? 151
152
Is constant propagation monotone? A transfer function f is monotone iff if x y, then f(x) f(y) Recall our transfer functions –f x=k (V) = V| x k (update V by mapping x to k) –f x=a+b (V) = V| x Not-a-Constant (assign Not-a-Constant) Is this monotone? Undefined 0-212... Not-a-constant 152
153
The grand result Theorem: A dataflow analysis with a finite- height semilattice and family of monotone transfer functions always terminates Proof sketch: –The join operator can only bring values up –Transfer functions can never lower values back down below where they were in the past (monotonicity) –Values cannot increase indefinitely (finite height) 153
154
An “optimality” result A transfer function f is distributive if f(a b) = f(a) f(b) for every domain elements a and b If all transfer functions are distributive then the fixed-point solution is the solution that would be computed by joining results from all (potentially infinite) control-flow paths –Join over all paths Optimal if we ignore program conditions 154
155
An “optimality” result If(*) A else B; If(*) C else D; AB * * CD exit AA * BB CDCD f C (f A ( ) f B ( )) f D (f A ( ) f B ( )) f C (f A ( )) f D (f A ( )) f C (f B ( )) f D (f B ( )) ?=?= 155
156
An “optimality” result A transfer function f is distributive if f(a b) = f(a) f(b) for every domain elements a and b If all transfer functions are distributive then the fixed-point solution is equal to the solution computed by joining results from all (potentially infinite) control-flow paths –Join over all paths Optimal if we pretend all control-flow paths can be executed by the program Which analyses use distributive functions? 156
157
Loop optimizations Most of a program’s computations are done inside loops –Focus optimizations effort on loops The optimizations we’ve seen so far are independent of the control structure Some optimizations are specialized to loops –Loop-invariant code motion –(Strength reduction via induction variables) Require another type of analysis to find out where expressions get their values from –Reaching definitions (Also useful for improving register allocation) 157
158
Loop invariant computation y = t * 4 x < y + z end x = x + 1 start y = … t = … z = … 158
159
Loop invariant computation y = t * 4 x < y + z end x = x + 1 start y = … t = … z = … t*4 and y+z have same value on each iteration 159
160
Code hoisting x < w end x = x + 1 start y = … t = … z = … y = t * 4 w = y + z 160
161
What reasoning did we use? y = t * 4 x < y + z end x = x + 1 start y = … t = … z = … y is defined inside loop but it is loop invariant since t*4 is loop- invariant Both t and z are defined only outside of loop constants are trivially loop- invariant 161
162
What about now? y = t * 4 x < y + z end x = x + 1 t = t + 1 start y = … t = … z = … Now t is not loop-invariant and so are t*4 and y 162
163
Loop-invariant code motion d: t = a 1 op a 2 –d is a program location a 1 op a 2 loop-invariant (for a loop L) if computes the same value in each iteration –Hard to know in general Conservative approximation –Each a i is a constant, or –All definitions of a i that reach d are outside L, or –Only one definition of of a i reaches d, and is loop-invariant itself Transformation: hoist the loop-invariant code outside of the loop 163
164
Reaching definitions analysis A definition d: t = … reaches a program location if there is a path from the definition to the program location, along which the defined variable is never redefined 164
165
Reaching definitions analysis A definition d: t = … reaches a program location if there is a path from the definition to the program location, along which the defined variable is never redefined Direction: Forward Domain: sets of program locations that are definitions ` Join operator: union Transfer function: f d: a=b op c (RD) = (RD - defs(a)) {d} f d: not-a-def (RD) = RD –Where defs(a) is the set of locations defining a (statements of the form a=...) Initial value: {} 165
166
Reaching definitions analysis d4: y = t * 4 d4:x < y + z d6: x = x + 1 d1: y = … d2: t = … d3: z = … start end {} 166
167
Reaching definitions analysis d4: y = t * 4 d4:x < y + z d5: x = x + 1 start d1: y = … d2: t = … d3: z = … end {} 167
168
Initialization d4: y = t * 4 d4:x < y + z d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} end {} 168
169
Iteration 1 d4: y = t * 4 d4:x < y + z d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} end {} 169
170
Iteration 1 d4: y = t * 4 d4:x < y + z d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} {d1} {d1, d2} {d1, d2, d3} end {} 170
171
Iteration 2 d4: y = t * 4 x < y + z end d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} {d1} {d1, d2} {d1, d2, d3} {} 171
172
Iteration 2 d4: y = t * 4 x < y + z end d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} {d1, d2, d3} {} {d1} {d1, d2} {d1, d2, d3} {} 172
173
Iteration 2 d4: y = t * 4 x < y + z end d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} {d1, d2, d3} {} {d1} {d1, d2} {d1, d2, d3} {d2, d3, d4} {} 173
174
Iteration 2 d4: y = t * 4 x < y + z end d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} {d1, d2, d3} {} {d1} {d1, d2} {d1, d2, d3} {d2, d3, d4} {} 174
175
Iteration 3 d4: y = t * 4 x < y + z end d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} {d1, d2, d3} {d2, d3, d4} {} {d1} {d1, d2} {d1, d2, d3} {d2, d3, d4} {} 175
176
Iteration 3 d4: y = t * 4 x < y + z end d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} {d1, d2, d3} {d2, d3, d4} {} {d1} {d1, d2} {d1, d2, d3} {d2, d3, d4} {d2, d3, d4, d5} 176
177
Iteration 4 d4: y = t * 4 x < y + z end d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} {d1, d2, d3} {d2, d3, d4} {} {d1} {d1, d2} {d1, d2, d3} {d2, d3, d4} {d2, d3, d4, d5} 177
178
Iteration 4 d4: y = t * 4 x < y + z end d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} {d1, d2, d3, d4, d5} {d2, d3, d4} {} {d1} {d1, d2} {d1, d2, d3} {d2, d3, d4} {d2, d3, d4, d5} 178
179
Iteration 4 d4: y = t * 4 x < y + z end d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} {d1, d2, d3, d4, d5} {d2, d3, d4} {} {d1} {d1, d2} {d1, d2, d3} {d2, d3, d4, d5} 179
180
Iteration 5 end start d1: y = … d2: t = … d3: z = … {} {d2, d3, d4, d5} {d1} {d1, d2} {d1, d2, d3} d5: x = x + 1 {d2, d3, d4} {d2, d3, d4, d5} d4: y = t * 4 x < y + z {d1, d2, d3, d4, d5} {d2, d3, d4, d5} 180
181
Iteration 6 end start d1: y = … d2: t = … d3: z = … {} {d2, d3, d4, d5} {d1} {d1, d2} {d1, d2, d3} d5: x = x + 1 {d2, d3, d4, d5} d4: y = t * 4 x < y + z {d1, d2, d3, d4, d5} {d2, d3, d4, d5} 181
182
Which expressions are loop invariant? t is defined only in d2 – outside of loop z is defined only in d3 – outside of loop y is defined only in d4 – inside of loop but depends on t and 4, both loop-invariant start d1: y = … d2: t = … d3: z = … {} {d1} {d1, d2} {d1, d2, d3} end {d2, d3, d4, d5} d5: x = x + 1 {d2, d3, d4, d5} d4: y = t * 4 x < y + z {d1, d2, d3, d4, d5} {d2, d3, d4, d5} x is defined only in d5 – inside of loop so is not a loop-invariant 182
183
Inferring loop-invariant expressions For a statement s of the form t = a 1 op a 2 A variable a i is immediately loop-invariant if all reaching definitions IN[s]={d 1,…,d k } for a i are outside of the loop LOOP-INV = immediately loop-invariant variables and constants LOOP-INV = LOOP-INV {x | d: x = a 1 op a 2, d is in the loop, and both a 1 and a 2 are in LOOP-INV} –Iterate until fixed-point An expression is loop-invariant if all operands are loop-invariants 183
184
Computing LOOP-INV end start d1: y = … d2: t = … d3: z = … {} {d2, d3, d4} {d1} {d1, d2} {d1, d2, d3} d4: y = t * 4 x < y + z d5: x = x + 1 {d1, d2, d3, d4, d5} {d2, d3, d4, d5} 184
185
Computing LOOP-INV end start d1: y = … d2: t = … d3: z = … {} {d2, d3, d4} {d1} {d1, d2} {d1, d2, d3} d4: y = t * 4 x < y + z d5: x = x + 1 {d1, d2, d3, d4, d5} {d2, d3, d4, d5} (immediately) LOOP-INV = {t} 185
186
Computing LOOP-INV end start d1: y = … d2: t = … d3: z = … {} {d2, d3, d4} {d1} {d1, d2} {d1, d2, d3} d4: y = t * 4 x < y + z d5: x = x + 1 {d1, d2, d3, d4, d5} {d2, d3, d4, d5} (immediately) LOOP-INV = {t, z} 186
187
Computing LOOP-INV end start d1: y = … d2: t = … d3: z = … {} {d2, d3, d4} {d1} {d1, d2} {d1, d2, d3} d4: y = t * 4 x < y + z d5: x = x + 1 {d1, d2, d3, d4, d5} {d2, d3, d4, d5} (immediately) LOOP-INV = {t, z} 187
188
Computing LOOP-INV end start d1: y = … d2: t = … d3: z = … {} {d2, d3, d4} {d1} {d1, d2} {d1, d2, d3} d4: y = t * 4 x < y + z d5: x = x + 1 {d1, d2, d3, d4, d5} {d2, d3, d4, d5} (immediately) LOOP-INV = {t, z} 188
189
end start d1: y = … d2: t = … d3: z = … {} {d2, d3, d4} {d1} {d1, d2} {d1, d2, d3} LOOP-INV = {t, z, 4} d4: y = t * 4 x < y + z d5: x = x + 1 {d1, d2, d3, d4, d5} {d2, d3, d4, d5} Computing LOOP-INV 189
190
Computing LOOP-INV d4: y = t * 4 x < y + z end d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} {d1, d2, d3, d4, d5} {d2, d3, d4, d5} {d2, d3, d4} {d1} {d1, d2} {d1, d2, d3} {d2, d3, d4, d5} LOOP-INV = {t, z, 4, y} 190
191
Induction variables while (i < x) { j = a + 4 * i a[j] = j i = i + 1 } i is incremented by a loop- invariant expression on each iteration – this is called an induction variable j is a linear function of the induction variable with multiplier 4 191
192
Strength-reduction j = a + 4 * i while (i < x) { j = j + 4 a[j] = j i = i + 1 } Prepare initial value Increment by multiplier 192
193
Summary of optimizations Enabled OptimizationsAnalysis Common-subexpression elimination Copy Propagation Available Expressions Constant foldingConstant Propagation Dead code elimination Register allocation Live Variables Loop-invariant code motionReaching Definitions 193
194
Ad Advanced course on program analysis and verification Workshop on compile time techniques for detecting malicious JavaScripts –With Trusteer Now IBM 194
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.