Presentation is loading. Please wait.

Presentation is loading. Please wait.

Compilation 0368-3133 (Semester A, 2013/14) Lecture 12: Abstract Interpretation Noam Rinetzky Slides credit: Roman Manevich, Mooly Sagiv and Eran Yahav.

Similar presentations


Presentation on theme: "Compilation 0368-3133 (Semester A, 2013/14) Lecture 12: Abstract Interpretation Noam Rinetzky Slides credit: Roman Manevich, Mooly Sagiv and Eran Yahav."— Presentation transcript:

1 Compilation 0368-3133 (Semester A, 2013/14) Lecture 12: Abstract Interpretation Noam Rinetzky Slides credit: Roman Manevich, Mooly Sagiv and Eran Yahav 1

2 What is a compiler? “A compiler is a computer program that transforms source code written in a programming language (source language) into another language (target language). The most common reason for wanting to transform source code is to create an executable program.” --Wikipedia 2

3 AST + Sym. Tab. Stages of compilation Source code (program) Lexical Analysis Syntax Analysis Parsing Context Analysis Portable/Retarg etable code generation Target code (executable) Assembly IR TextToken streamAST Code Generation 3

4 Instruction selection AST + Sym. Tab. Stages of Compilation Source code (program) Lexical Analysis Syntax Analysis Parsing Context Analysis Portable/Retarg etable code generation Target code (executable) Assembly IR TextToken streamAST “Naive” IR IR OptimizationIR (TAC) generation “Assembly” (symbolic registers) Peephole optimization Assembly Code Generation register allocation 4

5 Optimization points source code Front end IR Code generator target code Program Analysis Abstract interpretation today 5

6 Program Analysis In order to optimize a program, the compiler has to be able to reason about the properties of that program An analysis is called sound if it never asserts an incorrect fact about a program All the analyses we will discuss in this class are sound –(Why?) 6

7 Optimization path IR Control-Flow Graph CFG builder Program Analysis Annotated CFG Optimizing Transformation Target Code Code Generation (+optimizations) done with IR optimizations IR optimizations 7

8 Soundness int x; int y; if (y < 5) x = 137; else x = 42; Print(x); “At this point in the program, x holds some integer value” 8

9 Soundness int x; int y; if (y < 5) x = 137; else x = 42; Print(x); “At this point in the program, x is either 137 or 42” 9

10 (Un) Soundness int x; int y; if (y < 5) x = 137; else x = 42; Print(x); “At this point in the program, x is 137” 10

11 Soundness & Precision int x; int y; if (y < 5) x = 137; else x = 42; Print(x); “At this point in the program, x is either 137, 42, or 271” 11

12 Semantics-preserving optimizations An optimization is semantics-preserving if it does not alter the semantics (meaning) of the original program Eliminating unnecessary temporary variables Computing values that are known statically at compile- time instead of computing them at runtime Evaluating iteration-independent expressions outside of a loop instead of inside ✗ Replacing bubble sort with quicksort (why?) The optimizations we will consider in this class are all semantics-preserving 12

13 Types of optimizations An optimization is local if it works on just a single basic block An optimization is global if it works on an entire control-flow graph 13

14 Local optimizations _t0 = 137; y = _t0; IfZ x Goto _L0; start _t1 = y; z = _t1; _t2 = y; x = _t2; start int main() { int x; int y; int z; y = 137; if (x == 0) z = y; else x = y; } 14

15 Local optimizations y = 137; IfZ x Goto _L0; start z = y;x = y; End int main() { int x; int y; int z; y = 137; if (x == 0) z = y; else x = y; } 15

16 Global optimizations y = 137; IfZ x Goto _L0; start z = y;x = y; End int main() { int x; int y; int z; y = 137; if (x == 0) z = y; else x = y; } 16

17 Global optimizations y = 137; IfZ x Goto _L0; start z = 137;x = 137; End int main() { int x; int y; int z; y = 137; if (x == 0) z = y; else x = y; } 17

18 Global Analyses Common subexpression elimination Copy propagation Dead code elimination Live variables –Used for register allocation 18

19 Global liveness analysis a=b+c s2s3 IN[s2]IN[s3] OUT[s]=IN[s2]  IN[s3] IN[s]=(OUT[s] – {a})  {b, c} 19

20 Live variable analysis- initialization Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {} 20

21 Live variable analysis Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {} {a} 21

22 Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {} {a, b, c} {a} Live variable analysis IN[s2]=(OUT[s2] – {d})  {b, c} IN[s1]=(OUT[s1] – {a})  {a, b} ? 22

23 Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {} {a, b, c} {a} {a, b, c} Live variable analysis 23

24 Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {} {b, c} {} {a, b, c} {a} {a, b, c} CFGs with loops - iteration 24

25 Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {} {b, c} {} {a, b, c} {a} {a, b, c} {b, c} CFGs with loops - iteration 25

26 Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {} {b, c} {c, d} {a, b, c} {a} {a, b, c} {b, c} {a, b, c} CFGs with loops - iteration 26

27 Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {a, b} {b, c} {c, d} {a, b, c} {a} {a, b, c} {b, c} {a, b, c} Live variable analysis 27

28 Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {a, b} {b, c} {c, d} {a, b, c} {a} {a, b, c} {b, c} {a, b, c} Live variable analysis 28

29 Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {a, b} {b, c} {c, d} {a, b, c} {a, c, d} {a, b, c} {b, c} {a, b, c} Live variable analysis 29

30 Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {a, b} {b, c} {c, d} {a, b, c} {a, c, d} {a, b, c} {b, c} {a, b, c} Live variable analysis 30

31 Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {a, b} {b, c} {c, d} {a, b, c} {a, c, d} {a, b, c} {b, c} {a, b, c} Live variable analysis 31

32 Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {a, b} {b, c} {c, d} {a, b, c} {a, c, d} {a, b, c} {b, c} {a, b, c} Live variable analysis 32

33 Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {a, b} {b, c} {c, d} {a, b, c} {a, c, d} {a, b, c} Live variable analysis 33

34 Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {a, b} {b, c} {a, c, d} {a, b, c} {a, c, d} {a, b, c} Live variable analysis 34

35 Live variable analysis Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {a, b} {b, c} {a, c, d} {a, b, c} {a, c, d} {a, b, c} 35

36 Kill Gen Global liveness analysis: Kill/Gen a=b+c s2s3 IN[s2]IN[s3] OUT[s]=IN[s2]  IN[s3] IN[s]=(OUT[s] – {a})  {b, c} 36

37 Formalizing data-flow analyses Define an analysis of a basic block as a quadruple (D, V, , F, I) where –D is a direction (forwards or backwards) –V is a set of values the program can have at any point –  merging (joining) information –F is a family of transfer functions defining the meaning of any expression as a function f : V  V –I is the initial information at the top (or bottom) of a basic block 37

38 Liveness Analysis Direction: Backward Values: Sets of variables Transfer functions: Given a set of variable assignments V and statement a = b + c: Remove a from V (any previous value of a is now dead.) Add b and c to V (any previous value of b or c is now live.) Formally: V in = ( V out \ { a })  { b,c } Initial value: Depends on semantics of language –E.g., function arguments and return values (pushes) –Result of local analysis of other blocks as part of a global analysis Merge:  =  38

39 Available Expressions Direction: Forward Values: Sets of expressions assigned to variables Transfer functions: Given a set of variable assignments V and statement a = b + c: –Remove from V any expression containing a as a subexpression –Add to V the expression a = b + c –Formally: V out = ( V in \ {e | e contains a })  {a = b + c} Initial value: Empty set of expressions Merge:  = ? 39

40 Why does this work? (Live Var.) To show correctness, we need to show that –The algorithm eventually terminates, and –When it terminates, it has a sound answer Termination argument: –Once a variable is discovered to be live during some point of the analysis, it always stays live –Only finitely many variables and finitely many places where a variable can become live Soundness argument (sketch): –Each individual rule, applied to some set, correctly updates liveness in that set –When computing the union of the set of live variables, a variable is only live if it was live on some path leaving the statement 40

41 Abstract Interpretation Theoretical foundations of program analysis Cousot and Cousot 1977 Abstract meaning of programs –“Executed” at compile time Execution = Analysis 41

42 Another view of program analysis We want to reason about some property of the runtime behavior of the program Could we run the program and just watch what happens? Idea: Redefine the semantics of our programming language to give us information about our analysis 42

43 Another view of program analysis We want to reason about some property of the runtime behavior of the program Could we run the program and just watch what happens? 43

44 Another view of program analysis The only way to find out exactly what a program will actually do is to run it Problems: –The program might not terminate –The program might have some behavior we didn't see when we ran it on a particular input 44

45 Another view of program analysis The only way to find out exactly what a program will actually do is to run it Problems: –The program might not terminate –The program might have some behavior we didn't see when we ran it on a particular input Inside a basic block, it is simpler –Basic blocks contain no loops –There is only one path through the basic block 45

46 Another view of program analysis The only way to find out exactly what a program will actually do is to run it Problems: –The program might not terminate –The program might have some behavior we didn't see when we ran it on a particular input Inside a basic block, it is simpler –Basic blocks contain no loops –There is only one path through the basic block –But still … 46

47 Assigning new semantics (Local) Example: Available Expressions Redefine the statement a = b + c to mean “a now holds the value of b + c, and any variable holding the value a is now invalid” “Run” the program assuming these new semantics 47

48 Assigning new semantics (Global) Example: Available Expressions Redefine the statement a = b + c to mean “a now holds the value of b + c, and any variable holding the value a is now invalid” “Run” the program assuming these new semantics Merge information from different paths 48

49 Abstract Interpretation Example: Available Expressions Redefine the statement a = b + c to mean “a now holds the value of b + c, and any variable holding the value a is now invalid” “Run” the program assuming these new semantics Merge information from different paths Treat the optimizer as an interpreter for these new semantics 49

50 Theory of Program Analysis Building up all of the machinery to design this analysis was tricky The key ideas, however, are mostly independent of the analysis: –We need to be able to compute functions describing the behavior of each statement –We need to be able to merge several subcomputations together –We need an initial value for all of the basic blocks There is a beautiful formalism that captures many of these properties 50

51 Join semilattices A join semilattice is a ordering defined on a set of elements –0 ≤ 1 ≤ 2 ≤ … –{} ≤ {0} ≤ {0,1}, {1,2} ≤ {0,1,2} ≤ {1,2,3,4} Any two elements have some join that is the smallest element larger than both elements There is a unique bottom element, which is smaller than all other elements –The join of two elements represents combining (merging) information from two elements by an overapproximation The bottom element represents “no information yet” or “the least conservative possible answer” / 51

52 Join semilattice for liveness {} {a}{b}{c} {a, b}{a, c}{b, c} {a, b, c} Bottom element 52

53 What is the join of {b} and {c}? {} {a}{b}{c} {a, b}{a, c}{b, c} {a, b, c} 53

54 What is the join of {b} and {c}? {} {a}{b}{c} {a, b}{a, c}{b, c} {a, b, c} 54

55 What is the join of {b} and {a,c}? {} {a}{b}{c} {a, b}{a, c}{b, c} {a, b, c} 55

56 What is the join of {b} and {a,c}? {} {a}{b}{c} {a, b}{a, c}{b, c} {a, b, c} 56

57 What is the join of {a} and {a,b}? {} {a}{b}{c} {a, b}{a, c}{b, c} {a, b, c} 57

58 What is the join of {a} and {a,b}? {} {a}{b}{c} {a, b}{a, c}{b, c} {a, b, c} 58

59 Formal definitions A join semilattice is a pair (V,  ), where V is a domain of elements  is a join operator that is –commutative: x  y = y  x –associative: (x  y)  z = x  (y  z) –idempotent: x  x = x If x  y = z, we say that z is the join or (least upper bound) of x and y Every join semilattice has a bottom element denoted  such that   x = x for all x 59

60 Join semilattices and ordering {} {a}{b}{c} {a, b}{a, c}{b, c} {a, b, c} Greater Lower 60

61 Join semilattices and ordering {} {a}{b}{c} {a, b}{a, c}{b, c} {a, b, c} Least precise Most precise 61

62 Join semilattices and orderings Every join semilattice (V,  ) induces an ordering relationship  over its elements Define x  y iff x  y = y Need to prove –Reflexivity: x  x –Antisymmetry: If x  y and y  x, then x = y –Transitivity: If x  y and y  z, then x  z 62

63 An example join semilattice The set of natural numbers and the max function Idempotent –max{a, a} = a Commutative –max{a, b} = max{b, a} Associative –max{a, max{b, c}} = max{max{a, b}, c} Bottom element is 0: –max{0, a} = a What is the ordering over these elements? 63

64 A join semilattice for liveness Sets of live variables and the set union operation Idempotent: –x  x = x Commutative: –x  y = y  x Associative: –(x  y)  z = x  (y  z) Bottom element: –The empty set: Ø  x = x What is the ordering over these elements? 64

65 Semilattices and program analysis Semilattices naturally solve many of the problems we encounter in global analysis How do we combine information from multiple basic blocks? What value do we give to basic blocks we haven't seen yet? How do we know that the algorithm always terminates? 65

66 Semilattices and program analysis Semilattices naturally solve many of the problems we encounter in global analysis How do we combine information from multiple basic blocks? –Take the join of all information from those blocks What value do we give to basic blocks we haven't seen yet? –Use the bottom element How do we know that the algorithm always terminates? –Actually, we still don't! More on that later 66

67 A general framework A global analysis is a tuple ( D, V, , F, I ), where –D is a direction (forward or backward) The order to visit statements within a basic block, not the order in which to visit the basic blocks –V is a set of values –  is a join operator over those values –F is a set of transfer functions f : V  V –I is an initial value The only difference from local analysis is the introduction of the join operator 67

68 Running global analyses Assume that ( D, V, , F, I ) is a forward analysis Set OUT[s] =  for all statements s Set OUT[entry] = I Repeat until no values change: –For each statement s with predecessors p 1, p 2, …, p n : Set IN[s] = OUT[p 1 ]  OUT[p 2 ]  …  OUT[p n ] Set OUT[s] = f s (IN[s]) The order of this iteration does not matter –This is sometimes called chaotic iteration 68

69 The dataflow framework This form of analysis is called the dataflow framework Can be used to easily prove an analysis is sound With certain restrictions, can be used to prove that an analysis eventually terminates –Again, more on that later 69

70 Global constant propagation Constant propagation is an optimization that replaces each variable that is known to be a constant value with that constant An elegant example of the dataflow framework 70

71 Global constant propagation exit x = 4; z = x; w = x; y = x;z = y; x = 6; entry 71

72 Global constant propagation exit x = 4; z = x; w = x; y = x;z = y; x = 6; entry 72

73 Global constant propagation exit x = 4; z = x; w = 6; y = 6;z = y; x = 6; entry 73

74 Constant propagation analysis In order to do a constant propagation, we need to track what values might be assigned to a variable at each program point Every variable will either –Never have a value assigned to it, –Have a single constant value assigned to it, –Have two or more constant values assigned to it, or –Have a known non-constant value. –Our analysis will propagate this information throughout a CFG to identify locations where a value is constant 74

75 Properties of constant propagation For now, consider just some single variable x At each point in the program, we know one of three things about the value of x: –x is definitely not a constant, since it's been assigned two values or assigned a value that we know isn't a constant –x is definitely a constant and has value k –We have never seen a value for x Note that the first and last of these are not the same! –The first one means that there may be a way for x to have multiple values –The last one means that x never had a value at all 75

76 Defining a join operator The join of any two different constants is Not-a-Constant –(If the variable might have two different values on entry to a statement, it cannot be a constant) The join of Not a Constant and any other value is Not-a- Constant –(If on some path the value is known not to be a constant, then on entry to a statement its value can't possibly be a constant) The join of Undefined and any other value is that other value –(If x has no value on some path and does have a value on some other path, we can just pretend it always had the assigned value) 76

77 A semilattice for constant propagation One possible semilattice for this analysis is shown here (for each variable): Undefined 0-212... Not-a-constant The lattice is infinitely wide 77

78 A semilattice for constant propagation One possible semilattice for this analysis is shown here (for each variable): Undefined 0-212... Not-a-constant Note: The join of any two different constants is Not-a-Constant The join of Not a Constant and any other value is Not-a-Constant The join of Undefined and any other value is that other value 78

79 Global constant propagation exit x = 4; Undefined z = x; Undefined w = x; Undefined y = x; Undefined z = y; Undefined x = 6; Undefined entry Undefined x=Undefined y=Undefined z=Undefined w=Undefined 79

80 Global constant propagation exit x = 4; Undefined z = x; Undefined w = x; Undefined y = x; Undefined z = y; Undefined x = 6; Undefined entry Undefined 80

81 Global constant propagation exit x = 4; Undefined z = x; Undefined w = x; Undefined y = x; Undefined z = y; Undefined Undefined x = 6; Undefined entry Undefined 81

82 Global constant propagation exit x = 4; Undefined z = x; Undefined w = x; Undefined y = x; Undefined z = y; Undefined Undefined x = 6; x = 6, y=z=w=  entry Undefined 82

83 Global constant propagation exit x = 4; Undefined z = x; Undefined w = x; Undefined y = x; Undefined z = y; Undefined Undefined x = 6; x = 6, y=z=w=  entry Undefined 83

84 Global constant propagation exit x = 4; Undefined z = x; Undefined w = x; Undefined x=6 y = x; Undefined z = y; Undefined Undefined x = 6; x = 6 entry Undefined 84

85 Global constant propagation exit x = 4; Undefined z = x; Undefined w = x; Undefined x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 85

86 Global constant propagation exit x = 4; Undefined z = x; Undefined w = x; Undefined x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined y=6  y=Undefined gives what? 86

87 Global constant propagation exit x = 4; Undefined z = x; Undefined x=6,y=6 w = x; Undefined x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 87

88 Global constant propagation exit x = 4; Undefined z = x; Undefined x=6,y=6 w = x; Undefined x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 88

89 Global constant propagation exit x = 4; Undefined z = x; Undefined x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 89

90 Global constant propagation exit x = 4; Undefined z = x; Undefined x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 90

91 Global constant propagation exit x = 4; Undefined x=y=w=6 z = x; Undefined x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 91

92 Global constant propagation exit x = 4; Undefined x=y=w=6 z = x; x=y=w=z=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 92

93 Global constant propagation exit x = 4; Undefined x=y=w=6 z = x; x=y=w=z=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 93

94 Global constant propagation exit x=y=w=z=6 x = 4; Undefined x=y=w=6 z = x; x=y=w=z=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 94

95 Global constant propagation exit x=y=w=z=6 x = 4; x=4, y=w=z=6 x=y=w=6 z = x; x=y=w=z=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 95

96 Global constant propagation exit x=y=w=z=6 x = 4; x=4, y=w=z=6 x=y=w=6 z = x; x=y=w=z=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 96

97 Global constant propagation exit x=y=w=z=6 x = 4; x=4, y=w=z=6 x=y=w=6 z = x; x=y=w=z=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 x = 6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 97

98 Global constant propagation exit x=y=w=z=6 x = 4; x=4, y=w=z=6 x=y=w=6 z = x; x=y=w=z=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 x = 6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 98

99 Global constant propagation exit x=y=w=z=6 x = 4; x=4, y=w=z=6 x=y=w=6 z = x; x=y=w=z=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 x = 6 z = y; x = 6 Undefined x = 6; x = 6 entry Undefined 99

100 Global constant propagation exit x=y=w=z=6 x = 4; x=4, y=w=z=6 x=y=w=6 z = x; x=y=w=z=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 x = 6 z = y; x = 6 Undefined x = 6; x = 6 entry Undefined x=6  x=4 gives what? 100

101 Global constant propagation exit x=y=w=z=6 x = 4; x=4, y=w=z=6 y=w=6, x=  z = x; x=y=w=z=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 x = 6 z = y; x = 6 Undefined x = 6; x = 6 entry Undefined 101

102 Global constant propagation exit x=y=w=z=6 x = 4; x=4, y=w=z=6 y=w=6,x=  z = x; y=w=6,z=x=  x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 x = 6 z = y; x = 6 Undefined x = 6; x = 6 entry Undefined 102

103 Global constant propagation exit x=y=w=z=6 x = 4; x=4,y=w=6,z=  y=w=6,x=  z = x; y=w=6,z=x=  x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 x = 6 z = y; x = 6 Undefined x = 6; x = 6 entry Undefined 103

104 Global constant propagation exit y=w=6,z=x=  x = 4; x=4,y=w=6,z=  y=w=6,x=  z = x; y=w=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 x = 6 z = y; x = 6 Undefined x = 6; x = 6 entry Undefined 104

105 Global constant propagation exit y=w=6,z=x=  x = 4; x=4,y=w=6,z=  y=w=6,x=  z = x; y=w=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 x = 6 z = y; x = 6 Undefined x = 6; x = 6 entry Undefined 105

106 Global constant propagation exit y=w=6,z=x=  x = 4; x=4,y=w=6,z=  y=w=6, z=x=  z = x; y=w=6, z=x=  x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 x = 6 z = y; x = 6 Undefined x = 6; x = 6 entry Undefined 106

107 Global constant propagation exit y=w=6,z=x=  x = 4; x=4,y=w=6,z=  y=w=6, z=x=  z = x; y=w=6, z=x=  x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 x = 6 z = y; x = 6 Undefined x = 6; x = 6 entry Undefined Global analysis reached fixpoint 107

108 Dataflow for constant propagation Direction: Forward Semilattice: Vars  {Undefined, 0, 1, -1, 2, -2, …, Not-a-Constant} –Join mapping for variables point-wise {x  1,y  1,z  1}  {x  1,y  2,z  Not-a-Constant} = {x  1,y  Not-a-Constant,z  Not-a-Constant} Transfer functions: –f x=k (V) = V| x  k (update V by mapping x to k) –f x=a+b (V) = V| x  ? Initial value: x is Undefined –(When might we use some other value?) 108

109 Dataflow for constant propagation Direction: Forward Semilattice: Vars  {Undefined, 0, 1, -1, 2, -2, …, Not-a-Constant} –Join mapping for variables point-wise {x  1,y  1,z  1}  {x  1,y  2,z  Not-a-Constant} = {x  1,y  Not-a-Constant,z  Not-a-Constant} Transfer functions: –f x=k (V) = V| x  k (update V by mapping x to k) –f x=a+b (V) = V| x  ? Initial value: x is Undefined –(When might we use some other value?) 109

110 Proving termination Our algorithm for running these analyses continuously loops until no changes are detected Given this, how do we know the analyses will eventually terminate? –In general, we don‘t 110

111 Terminates? 111

112 Liveness Analysis A variable is live at a point in a program if later in the program its value will be read before it is written to again 112

113 Join semilattice definition A join semilattice is a pair (V,  ), where V is a domain of elements  is a join operator that is –commutative: x  y = y  x –associative: (x  y)  z = x  (y  z) –idempotent: x  x = x If x  y = z, we say that z is the join or (Least Upper Bound) of x and y Every join semilattice has a bottom element denoted  such that   x = x for all x 113

114 Partial ordering induced by join Every join semilattice (V,  ) induces an ordering relationship  over its elements Define x  y iff x  y = y Need to prove –Reflexivity: x  x –Antisymmetry: If x  y and y  x, then x = y –Transitivity: If x  y and y  z, then x  z 114

115 A join semilattice for liveness Sets of live variables and the set union operation Idempotent: –x  x = x Commutative: –x  y = y  x Associative: –(x  y)  z = x  (y  z) Bottom element: –The empty set: Ø  x = x Ordering over elements = subset relation 115

116 Join semilattice example for liveness {} {a}{b}{c} {a, b}{a, c}{b, c} {a, b, c} Bottom element 116

117 Dataflow framework A global analysis is a tuple ( D, V, , F, I ), where –D is a direction (forward or backward) The order to visit statements within a basic block, NOT the order in which to visit the basic blocks –V is a set of values (sometimes called domain) –  is a join operator over those values –F is a set of transfer functions f s : V  V (for every statement s) –I is an initial value 117

118 Running global analyses Assume that ( D, V, , F, I ) is a forward analysis For every statement s maintain values before - IN[s] - and after - OUT[s] Set OUT[s] =  for all statements s Set OUT[entry] = I Repeat until no values change: –For each statement s with predecessors PRED[s]={p 1, p 2, …, p n } Set IN[s] = OUT[p 1 ]  OUT[p 2 ]  …  OUT[p n ] Set OUT[s] = f s (IN[s]) The order of this iteration does not matter –Chaotic iteration 118

119 Proving termination Our algorithm for running these analyses continuously loops until no changes are detected Problem: how do we know the analyses will eventually terminate? 119

120 A non-terminating analysis The following analysis will loop infinitely on any CFG containing a loop: Direction: Forward Domain: ℕ Join operator: max Transfer function: f(n) = n + 1 Initial value: 0 120

121 A non-terminating analysis start end x++ 121

122 Initialization start end x++ 0 0 122

123 Fixed-point iteration start end x++ 0 0 123

124 Choose a block start end x ++ 0 0 124

125 Iteration 1 start end x ++ 0 0 0 125

126 Iteration 1 start end x++ 1 0 0 126

127 Choose a block start end x++ 1 0 0 127

128 Iteration 2 start end x ++ 1 0 0 128

129 Iteration 2 start end x++ 1 0 1 129

130 Iteration 2 start end x++ 2 0 1 130

131 Choose a block start end x++ 2 0 1 131

132 Iteration 3 start end x++ 2 0 1 132

133 Iteration 3 start end x++ 2 0 2 133

134 Iteration 3 start end x++ 3 0 2 134

135 Why doesn’t this terminate? Values can increase without bound Note that “increase” refers to the lattice ordering, not the ordering on the natural numbers The height of a semilattice is the length of the longest increasing sequence in that semilattice The dataflow framework is not guaranteed to terminate for semilattices of infinite height Note that a semilattice can be infinitely large but have finite height –e.g. constant propagation 0 1 2 3 4... 135

136 Height of a lattice An increasing chain is a sequence of elements   a 1  a 2  …  a k –The length of such a chain is k The height of a lattice is the length of the maximal increasing chain For liveness with n program variables: –{}  {v 1 }  {v 1,v 2 }  …  {v 1,…,v n } For available expressions it is the number of expressions of the form a=b op c –For n program variables and m operator types: m  n 3 136

137 Another non-terminating analysis This analysis works on a finite-height semilattice, but will not terminate on certain CFGs: Direction: Forward Domain: Boolean values true and false Join operator: Logical OR Transfer function: Logical NOT Initial value: false 137

138 A non-terminating analysis start end x=!x 138

139 Initialization start end x=!x false 139

140 Fixed-point iteration start end x=!x false 140

141 Choose a block start end x=!x false 141

142 Iteration 1 start end x=!x false 142

143 Iteration 1 start end x=!x true false 143

144 Iteration 2 start end x=!x true false true 144

145 Iteration 2 start end x=!x false true 145

146 Iteration 3 start end x=!x false 146

147 Iteration 3 start end x=!x true false 147

148 Why doesn’t it terminate? Values can loop indefinitely Intuitively, the join operator keeps pulling values up If the transfer function can keep pushing values back down again, then the values might cycle forever fals e true fals e true fals e... 148

149 Why doesn’t it terminate? Values can loop indefinitely Intuitively, the join operator keeps pulling values up If the transfer function can keep pushing values back down again, then the values might cycle forever How can we fix this? fals e true fals e true fals e... 149

150 Monotone transfer functions A transfer function f is monotone iff if x  y, then f(x)  f(y) Intuitively, if you know less information about a program point, you can't “gain back” more information about that program point Many transfer functions are monotone, including those for liveness and constant propagation Note: Monotonicity does not mean that x  f(x) –(This is a different property called extensivity) 150

151 Liveness and monotonicity A transfer function f is monotone iff if x  y, then f(x)  f(y) Recall our transfer function for a = b + c is –f a = b + c (V) = (V – {a})  {b, c} Recall that our join operator is set union and induces an ordering relationship X  Y iff X  Y Is this monotone? 151

152 Is constant propagation monotone? A transfer function f is monotone iff if x  y, then f(x)  f(y) Recall our transfer functions –f x=k (V) = V| x  k (update V by mapping x to k) –f x=a+b (V) = V| x  Not-a-Constant (assign Not-a-Constant) Is this monotone? Undefined 0-212... Not-a-constant 152

153 The grand result Theorem: A dataflow analysis with a finite- height semilattice and family of monotone transfer functions always terminates Proof sketch: –The join operator can only bring values up –Transfer functions can never lower values back down below where they were in the past (monotonicity) –Values cannot increase indefinitely (finite height) 153

154 An “optimality” result A transfer function f is distributive if f(a  b) = f(a)  f(b) for every domain elements a and b If all transfer functions are distributive then the fixed-point solution is the solution that would be computed by joining results from all (potentially infinite) control-flow paths –Join over all paths Optimal if we ignore program conditions 154

155 An “optimality” result If(*) A else B; If(*) C else D; AB * * CD exit AA * BB CDCD f C (f A (  )  f B (  ))  f D (f A (  )  f B (  ))   f C (f A (  ))  f D (f A (  ))  f C (f B (  ))  f D (f B (  )) ?=?= 155

156 An “optimality” result A transfer function f is distributive if f(a  b) = f(a)  f(b) for every domain elements a and b If all transfer functions are distributive then the fixed-point solution is equal to the solution computed by joining results from all (potentially infinite) control-flow paths –Join over all paths Optimal if we pretend all control-flow paths can be executed by the program Which analyses use distributive functions? 156

157 Loop optimizations Most of a program’s computations are done inside loops –Focus optimizations effort on loops The optimizations we’ve seen so far are independent of the control structure Some optimizations are specialized to loops –Loop-invariant code motion –(Strength reduction via induction variables) Require another type of analysis to find out where expressions get their values from –Reaching definitions (Also useful for improving register allocation) 157

158 Loop invariant computation y = t * 4 x < y + z end x = x + 1 start y = … t = … z = … 158

159 Loop invariant computation y = t * 4 x < y + z end x = x + 1 start y = … t = … z = … t*4 and y+z have same value on each iteration 159

160 Code hoisting x < w end x = x + 1 start y = … t = … z = … y = t * 4 w = y + z 160

161 What reasoning did we use? y = t * 4 x < y + z end x = x + 1 start y = … t = … z = … y is defined inside loop but it is loop invariant since t*4 is loop- invariant Both t and z are defined only outside of loop constants are trivially loop- invariant 161

162 What about now? y = t * 4 x < y + z end x = x + 1 t = t + 1 start y = … t = … z = … Now t is not loop-invariant and so are t*4 and y 162

163 Loop-invariant code motion d: t = a 1 op a 2 –d is a program location a 1 op a 2 loop-invariant (for a loop L) if computes the same value in each iteration –Hard to know in general Conservative approximation –Each a i is a constant, or –All definitions of a i that reach d are outside L, or –Only one definition of of a i reaches d, and is loop-invariant itself Transformation: hoist the loop-invariant code outside of the loop 163

164 Reaching definitions analysis A definition d: t = … reaches a program location if there is a path from the definition to the program location, along which the defined variable is never redefined 164

165 Reaching definitions analysis A definition d: t = … reaches a program location if there is a path from the definition to the program location, along which the defined variable is never redefined Direction: Forward Domain: sets of program locations that are definitions ` Join operator: union Transfer function: f d: a=b op c (RD) = (RD - defs(a))  {d} f d: not-a-def (RD) = RD –Where defs(a) is the set of locations defining a (statements of the form a=...) Initial value: {} 165

166 Reaching definitions analysis d4: y = t * 4 d4:x < y + z d6: x = x + 1 d1: y = … d2: t = … d3: z = … start end {} 166

167 Reaching definitions analysis d4: y = t * 4 d4:x < y + z d5: x = x + 1 start d1: y = … d2: t = … d3: z = … end {} 167

168 Initialization d4: y = t * 4 d4:x < y + z d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} end {} 168

169 Iteration 1 d4: y = t * 4 d4:x < y + z d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} end {} 169

170 Iteration 1 d4: y = t * 4 d4:x < y + z d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} {d1} {d1, d2} {d1, d2, d3} end {} 170

171 Iteration 2 d4: y = t * 4 x < y + z end d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} {d1} {d1, d2} {d1, d2, d3} {} 171

172 Iteration 2 d4: y = t * 4 x < y + z end d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} {d1, d2, d3} {} {d1} {d1, d2} {d1, d2, d3} {} 172

173 Iteration 2 d4: y = t * 4 x < y + z end d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} {d1, d2, d3} {} {d1} {d1, d2} {d1, d2, d3} {d2, d3, d4} {} 173

174 Iteration 2 d4: y = t * 4 x < y + z end d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} {d1, d2, d3} {} {d1} {d1, d2} {d1, d2, d3} {d2, d3, d4} {} 174

175 Iteration 3 d4: y = t * 4 x < y + z end d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} {d1, d2, d3} {d2, d3, d4} {} {d1} {d1, d2} {d1, d2, d3} {d2, d3, d4} {} 175

176 Iteration 3 d4: y = t * 4 x < y + z end d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} {d1, d2, d3} {d2, d3, d4} {} {d1} {d1, d2} {d1, d2, d3} {d2, d3, d4} {d2, d3, d4, d5} 176

177 Iteration 4 d4: y = t * 4 x < y + z end d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} {d1, d2, d3} {d2, d3, d4} {} {d1} {d1, d2} {d1, d2, d3} {d2, d3, d4} {d2, d3, d4, d5} 177

178 Iteration 4 d4: y = t * 4 x < y + z end d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} {d1, d2, d3, d4, d5} {d2, d3, d4} {} {d1} {d1, d2} {d1, d2, d3} {d2, d3, d4} {d2, d3, d4, d5} 178

179 Iteration 4 d4: y = t * 4 x < y + z end d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} {d1, d2, d3, d4, d5} {d2, d3, d4} {} {d1} {d1, d2} {d1, d2, d3} {d2, d3, d4, d5} 179

180 Iteration 5 end start d1: y = … d2: t = … d3: z = … {} {d2, d3, d4, d5} {d1} {d1, d2} {d1, d2, d3} d5: x = x + 1 {d2, d3, d4} {d2, d3, d4, d5} d4: y = t * 4 x < y + z {d1, d2, d3, d4, d5} {d2, d3, d4, d5} 180

181 Iteration 6 end start d1: y = … d2: t = … d3: z = … {} {d2, d3, d4, d5} {d1} {d1, d2} {d1, d2, d3} d5: x = x + 1 {d2, d3, d4, d5} d4: y = t * 4 x < y + z {d1, d2, d3, d4, d5} {d2, d3, d4, d5} 181

182 Which expressions are loop invariant? t is defined only in d2 – outside of loop z is defined only in d3 – outside of loop y is defined only in d4 – inside of loop but depends on t and 4, both loop-invariant start d1: y = … d2: t = … d3: z = … {} {d1} {d1, d2} {d1, d2, d3} end {d2, d3, d4, d5} d5: x = x + 1 {d2, d3, d4, d5} d4: y = t * 4 x < y + z {d1, d2, d3, d4, d5} {d2, d3, d4, d5} x is defined only in d5 – inside of loop so is not a loop-invariant 182

183 Inferring loop-invariant expressions For a statement s of the form t = a 1 op a 2 A variable a i is immediately loop-invariant if all reaching definitions IN[s]={d 1,…,d k } for a i are outside of the loop LOOP-INV = immediately loop-invariant variables and constants LOOP-INV = LOOP-INV  {x | d: x = a 1 op a 2, d is in the loop, and both a 1 and a 2 are in LOOP-INV} –Iterate until fixed-point An expression is loop-invariant if all operands are loop-invariants 183

184 Computing LOOP-INV end start d1: y = … d2: t = … d3: z = … {} {d2, d3, d4} {d1} {d1, d2} {d1, d2, d3} d4: y = t * 4 x < y + z d5: x = x + 1 {d1, d2, d3, d4, d5} {d2, d3, d4, d5} 184

185 Computing LOOP-INV end start d1: y = … d2: t = … d3: z = … {} {d2, d3, d4} {d1} {d1, d2} {d1, d2, d3} d4: y = t * 4 x < y + z d5: x = x + 1 {d1, d2, d3, d4, d5} {d2, d3, d4, d5} (immediately) LOOP-INV = {t} 185

186 Computing LOOP-INV end start d1: y = … d2: t = … d3: z = … {} {d2, d3, d4} {d1} {d1, d2} {d1, d2, d3} d4: y = t * 4 x < y + z d5: x = x + 1 {d1, d2, d3, d4, d5} {d2, d3, d4, d5} (immediately) LOOP-INV = {t, z} 186

187 Computing LOOP-INV end start d1: y = … d2: t = … d3: z = … {} {d2, d3, d4} {d1} {d1, d2} {d1, d2, d3} d4: y = t * 4 x < y + z d5: x = x + 1 {d1, d2, d3, d4, d5} {d2, d3, d4, d5} (immediately) LOOP-INV = {t, z} 187

188 Computing LOOP-INV end start d1: y = … d2: t = … d3: z = … {} {d2, d3, d4} {d1} {d1, d2} {d1, d2, d3} d4: y = t * 4 x < y + z d5: x = x + 1 {d1, d2, d3, d4, d5} {d2, d3, d4, d5} (immediately) LOOP-INV = {t, z} 188

189 end start d1: y = … d2: t = … d3: z = … {} {d2, d3, d4} {d1} {d1, d2} {d1, d2, d3} LOOP-INV = {t, z, 4} d4: y = t * 4 x < y + z d5: x = x + 1 {d1, d2, d3, d4, d5} {d2, d3, d4, d5} Computing LOOP-INV 189

190 Computing LOOP-INV d4: y = t * 4 x < y + z end d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} {d1, d2, d3, d4, d5} {d2, d3, d4, d5} {d2, d3, d4} {d1} {d1, d2} {d1, d2, d3} {d2, d3, d4, d5} LOOP-INV = {t, z, 4, y} 190

191 Induction variables while (i < x) { j = a + 4 * i a[j] = j i = i + 1 } i is incremented by a loop- invariant expression on each iteration – this is called an induction variable j is a linear function of the induction variable with multiplier 4 191

192 Strength-reduction j = a + 4 * i while (i < x) { j = j + 4 a[j] = j i = i + 1 } Prepare initial value Increment by multiplier 192

193 Summary of optimizations Enabled OptimizationsAnalysis Common-subexpression elimination Copy Propagation Available Expressions Constant foldingConstant Propagation Dead code elimination Register allocation Live Variables Loop-invariant code motionReaching Definitions 193

194 Ad Advanced course on program analysis and verification Workshop on compile time techniques for detecting malicious JavaScripts –With Trusteer Now IBM 194


Download ppt "Compilation 0368-3133 (Semester A, 2013/14) Lecture 12: Abstract Interpretation Noam Rinetzky Slides credit: Roman Manevich, Mooly Sagiv and Eran Yahav."

Similar presentations


Ads by Google