Compilation 0368-3133 (Semester A, 2013/14) Lecture 12: Abstract Interpretation Noam Rinetzky Slides credit: Roman Manevich, Mooly Sagiv and Eran Yahav.

Slides:



Advertisements
Similar presentations
Continuing Abstract Interpretation We have seen: 1.How to compile abstract syntax trees into control-flow graphs 2.Lattices, as structures that describe.
Advertisements

Data-Flow Analysis II CS 671 March 13, CS 671 – Spring Data-Flow Analysis Gather conservative, approximate information about what a program.
School of EECS, Peking University “Advanced Compiler Techniques” (Fall 2011) SSA Guo, Yao.
Lecture 11: Code Optimization CS 540 George Mason University.
Compiler Principles Winter Compiler Principles Loop Optimizations and Register Allocation Mayer Goldberg and Roman Manevich Ben-Gurion University.
Course Outline Traditional Static Program Analysis –Theory Compiler Optimizations; Control Flow Graphs Data-flow Analysis – today’s class –Classic analyses.
Data-Flow Analysis Framework Domain – What kind of solution is the analysis looking for? Ex. Variables have not yet been defined – Algorithm assigns a.
School of EECS, Peking University “Advanced Compiler Techniques” (Fall 2011) Dataflow Analysis Introduction Guo, Yao Part of the slides are adapted from.
Program Representations. Representing programs Goals.
Compilation (Semester A, 2013/14) Lecture 11: Data Flow Analysis & Optimizations Noam Rinetzky Slides credit: Roman Manevich, Mooly Sagiv and.
School of EECS, Peking University “Advanced Compiler Techniques” (Fall 2011) Partial Redundancy Elimination Guo, Yao.
1 Data flow analysis Goal : collect information about how a procedure manipulates its data This information is used in various optimizations For example,
Foundations of Data-Flow Analysis. Basic Questions Under what circumstances is the iterative algorithm used in the data-flow analysis correct? How precise.
Common Sub-expression Elim Want to compute when an expression is available in a var Domain:
Worklist algorithm Initialize all d i to the empty set Store all nodes onto a worklist while worklist is not empty: –remove node n from worklist –apply.
1 Copy Propagation What does it mean? Given an assignment x = y, replace later uses of x with uses of y, provided there are no intervening assignments.
CS 536 Spring Global Optimizations Lecture 23.
Correctness. Until now We’ve seen how to define dataflow analyses How do we know our analyses are correct? We could reason about each individual analysis.
Global optimization. Data flow analysis To generate better code, need to examine definitions and uses of variables beyond basic blocks. With use- definition.
Data Flow Analysis Compiler Design Nov. 3, 2005.
From last time: reaching definitions For each use of a variable, determine what assignments could have set the value being read from the variable Information.
4/25/08Prof. Hilfinger CS164 Lecture 371 Global Optimization Lecture 37 (From notes by R. Bodik & G. Necula)
Data Flow Analysis Compiler Design October 5, 2004 These slides live on the Web. I obtained them from Jeff Foster and he said that he obtained.
CS 412/413 Spring 2007Introduction to Compilers1 Lecture 29: Control Flow Analysis 9 Apr 07 CS412/413 Introduction to Compilers Tim Teitelbaum.
Abstract Interpretation Part I Mooly Sagiv Textbook: Chapter 4.
Administrative stuff Office hours: After class on Tuesday.
Data Flow Analysis Compiler Design Nov. 8, 2005.
1 Copy Propagation What does it mean? – Given an assignment x = y, replace later uses of x with uses of y, provided there are no intervening assignments.
Prof. Fateman CS 164 Lecture 221 Global Optimization Lecture 22.
Recap from last time: live variables x := 5 y := x + 2 x := x + 1 y := x y...
U NIVERSITY OF M ASSACHUSETTS, A MHERST D EPARTMENT OF C OMPUTER S CIENCE Advanced Compilers CMPSCI 710 Spring 2003 Data flow analysis Emery Berger University.
Machine-Independent Optimizations Ⅰ CS308 Compiler Theory1.
Global optimization. Data flow analysis To generate better code, need to examine definitions and uses of variables beyond basic blocks. With use- definition.
Direction of analysis Although constraints are not directional, flow functions are All flow functions we have seen so far are in the forward direction.
Program Analysis Mooly Sagiv Tel Aviv University Sunday Scrieber 8 Monday Schrieber.
Ben Livshits Based in part of Stanford class slides from
Prof. Bodik CS 164 Lecture 16, Fall Global Optimization Lecture 16.
Precision Going back to constant prop, in what cases would we lose precision?
Abstract Interpretation (Cousot, Cousot 1977) also known as Data-Flow Analysis.
1 CS 201 Compiler Construction Data Flow Analysis.
Solving fixpoint equations
1 Code optimization “Code optimization refers to the techniques used by the compiler to improve the execution efficiency of the generated object code”
Compiler Principles Winter Compiler Principles Global Optimizations Mayer Goldberg and Roman Manevich Ben-Gurion University.
Jeffrey D. Ullman Stanford University. 2 boolean x = true; while (x) {... // no change to x }  Doesn’t terminate.  Proof: only assignment to x is at.
Compiler Principles Fall Compiler Principles Lecture 0: Local Optimizations Roman Manevich Ben-Gurion University.
Compilation Lecture 8 Abstract Interpretation Noam Rinetzky 1.
Compiler Principles Fall Compiler Principles Lecture 11: Loop Optimizations Roman Manevich Ben-Gurion University.
1 Iterative Program Analysis Abstract Interpretation Mooly Sagiv Tel Aviv University Textbook:
Compilation Lecture 7 IR + Optimizations Noam Rinetzky 1.
Data Flow Analysis II AModel Checking and Abstract Interpretation Feb. 2, 2011.
Optimization Simone Campanoni
Program Analysis and Verification Spring 2015 Program Analysis and Verification Lecture 8: Static Analysis II Roman Manevich Ben-Gurion University.
Dataflow Analysis CS What I s Dataflow Analysis? Static analysis reasoning about flow of data in program Different kinds of data: constants, variables,
Compiler Principles Fall Compiler Principles Lecture 9: Dataflow & Optimizations 2 Roman Manevich Ben-Gurion University of the Negev.
Compiler Principles Fall Compiler Principles Lecture 8: Dataflow & Optimizations 1 Roman Manevich Ben-Gurion University of the Negev.
Code Optimization Data Flow Analysis. Data Flow Analysis (DFA)  General framework  Can be used for various optimization goals  Some terms  Basic block.
DFA foundations Simone Campanoni
Data Flow Analysis Suman Jana
Fall Compiler Principles Lecture 7: Dataflow & Optimizations 2
Fall Compiler Principles Lecture 6: Dataflow & Optimizations 1
Fall Compiler Principles Lecture 8: Loop Optimizations
IR + Optimizations Noam Rinetzky
Global optimizations.
University Of Virginia
Abstract Interpretation Noam Rinetzky
Optimizations Noam Rinetzky
Fall Compiler Principles Lecture 10: Global Optimizations
Fall Compiler Principles Lecture 10: Loop Optimizations
Data Flow Analysis Compiler Design
Fall Compiler Principles Lecture 6: Dataflow & Optimizations 1
Presentation transcript:

Compilation (Semester A, 2013/14) Lecture 12: Abstract Interpretation Noam Rinetzky Slides credit: Roman Manevich, Mooly Sagiv and Eran Yahav 1

What is a compiler? “A compiler is a computer program that transforms source code written in a programming language (source language) into another language (target language). The most common reason for wanting to transform source code is to create an executable program.” --Wikipedia 2

AST + Sym. Tab. Stages of compilation Source code (program) Lexical Analysis Syntax Analysis Parsing Context Analysis Portable/Retarg etable code generation Target code (executable) Assembly IR TextToken streamAST Code Generation 3

Instruction selection AST + Sym. Tab. Stages of Compilation Source code (program) Lexical Analysis Syntax Analysis Parsing Context Analysis Portable/Retarg etable code generation Target code (executable) Assembly IR TextToken streamAST “Naive” IR IR OptimizationIR (TAC) generation “Assembly” (symbolic registers) Peephole optimization Assembly Code Generation register allocation 4

Optimization points source code Front end IR Code generator target code Program Analysis Abstract interpretation today 5

Program Analysis In order to optimize a program, the compiler has to be able to reason about the properties of that program An analysis is called sound if it never asserts an incorrect fact about a program All the analyses we will discuss in this class are sound –(Why?) 6

Optimization path IR Control-Flow Graph CFG builder Program Analysis Annotated CFG Optimizing Transformation Target Code Code Generation (+optimizations) done with IR optimizations IR optimizations 7

Soundness int x; int y; if (y < 5) x = 137; else x = 42; Print(x); “At this point in the program, x holds some integer value” 8

Soundness int x; int y; if (y < 5) x = 137; else x = 42; Print(x); “At this point in the program, x is either 137 or 42” 9

(Un) Soundness int x; int y; if (y < 5) x = 137; else x = 42; Print(x); “At this point in the program, x is 137” 10

Soundness & Precision int x; int y; if (y < 5) x = 137; else x = 42; Print(x); “At this point in the program, x is either 137, 42, or 271” 11

Semantics-preserving optimizations An optimization is semantics-preserving if it does not alter the semantics (meaning) of the original program Eliminating unnecessary temporary variables Computing values that are known statically at compile- time instead of computing them at runtime Evaluating iteration-independent expressions outside of a loop instead of inside ✗ Replacing bubble sort with quicksort (why?) The optimizations we will consider in this class are all semantics-preserving 12

Types of optimizations An optimization is local if it works on just a single basic block An optimization is global if it works on an entire control-flow graph 13

Local optimizations _t0 = 137; y = _t0; IfZ x Goto _L0; start _t1 = y; z = _t1; _t2 = y; x = _t2; start int main() { int x; int y; int z; y = 137; if (x == 0) z = y; else x = y; } 14

Local optimizations y = 137; IfZ x Goto _L0; start z = y;x = y; End int main() { int x; int y; int z; y = 137; if (x == 0) z = y; else x = y; } 15

Global optimizations y = 137; IfZ x Goto _L0; start z = y;x = y; End int main() { int x; int y; int z; y = 137; if (x == 0) z = y; else x = y; } 16

Global optimizations y = 137; IfZ x Goto _L0; start z = 137;x = 137; End int main() { int x; int y; int z; y = 137; if (x == 0) z = y; else x = y; } 17

Global Analyses Common subexpression elimination Copy propagation Dead code elimination Live variables –Used for register allocation 18

Global liveness analysis a=b+c s2s3 IN[s2]IN[s3] OUT[s]=IN[s2]  IN[s3] IN[s]=(OUT[s] – {a})  {b, c} 19

Live variable analysis- initialization Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {} 20

Live variable analysis Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {} {a} 21

Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {} {a, b, c} {a} Live variable analysis IN[s2]=(OUT[s2] – {d})  {b, c} IN[s1]=(OUT[s1] – {a})  {a, b} ? 22

Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {} {a, b, c} {a} {a, b, c} Live variable analysis 23

Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {} {b, c} {} {a, b, c} {a} {a, b, c} CFGs with loops - iteration 24

Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {} {b, c} {} {a, b, c} {a} {a, b, c} {b, c} CFGs with loops - iteration 25

Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {} {b, c} {c, d} {a, b, c} {a} {a, b, c} {b, c} {a, b, c} CFGs with loops - iteration 26

Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {a, b} {b, c} {c, d} {a, b, c} {a} {a, b, c} {b, c} {a, b, c} Live variable analysis 27

Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {a, b} {b, c} {c, d} {a, b, c} {a} {a, b, c} {b, c} {a, b, c} Live variable analysis 28

Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {a, b} {b, c} {c, d} {a, b, c} {a, c, d} {a, b, c} {b, c} {a, b, c} Live variable analysis 29

Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {a, b} {b, c} {c, d} {a, b, c} {a, c, d} {a, b, c} {b, c} {a, b, c} Live variable analysis 30

Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {a, b} {b, c} {c, d} {a, b, c} {a, c, d} {a, b, c} {b, c} {a, b, c} Live variable analysis 31

Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {a, b} {b, c} {c, d} {a, b, c} {a, c, d} {a, b, c} {b, c} {a, b, c} Live variable analysis 32

Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {a, b} {b, c} {c, d} {a, b, c} {a, c, d} {a, b, c} Live variable analysis 33

Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {a, b} {b, c} {a, c, d} {a, b, c} {a, c, d} {a, b, c} Live variable analysis 34

Live variable analysis Exit a = a + b; d = b + c; c = a + b; a = b + c; d = a + c; b = c + d; c = c + d; Entry {a} {a, b} {b, c} {a, c, d} {a, b, c} {a, c, d} {a, b, c} 35

Kill Gen Global liveness analysis: Kill/Gen a=b+c s2s3 IN[s2]IN[s3] OUT[s]=IN[s2]  IN[s3] IN[s]=(OUT[s] – {a})  {b, c} 36

Formalizing data-flow analyses Define an analysis of a basic block as a quadruple (D, V, , F, I) where –D is a direction (forwards or backwards) –V is a set of values the program can have at any point –  merging (joining) information –F is a family of transfer functions defining the meaning of any expression as a function f : V  V –I is the initial information at the top (or bottom) of a basic block 37

Liveness Analysis Direction: Backward Values: Sets of variables Transfer functions: Given a set of variable assignments V and statement a = b + c: Remove a from V (any previous value of a is now dead.) Add b and c to V (any previous value of b or c is now live.) Formally: V in = ( V out \ { a })  { b,c } Initial value: Depends on semantics of language –E.g., function arguments and return values (pushes) –Result of local analysis of other blocks as part of a global analysis Merge:  =  38

Available Expressions Direction: Forward Values: Sets of expressions assigned to variables Transfer functions: Given a set of variable assignments V and statement a = b + c: –Remove from V any expression containing a as a subexpression –Add to V the expression a = b + c –Formally: V out = ( V in \ {e | e contains a })  {a = b + c} Initial value: Empty set of expressions Merge:  = ? 39

Why does this work? (Live Var.) To show correctness, we need to show that –The algorithm eventually terminates, and –When it terminates, it has a sound answer Termination argument: –Once a variable is discovered to be live during some point of the analysis, it always stays live –Only finitely many variables and finitely many places where a variable can become live Soundness argument (sketch): –Each individual rule, applied to some set, correctly updates liveness in that set –When computing the union of the set of live variables, a variable is only live if it was live on some path leaving the statement 40

Abstract Interpretation Theoretical foundations of program analysis Cousot and Cousot 1977 Abstract meaning of programs –“Executed” at compile time Execution = Analysis 41

Another view of program analysis We want to reason about some property of the runtime behavior of the program Could we run the program and just watch what happens? Idea: Redefine the semantics of our programming language to give us information about our analysis 42

Another view of program analysis We want to reason about some property of the runtime behavior of the program Could we run the program and just watch what happens? 43

Another view of program analysis The only way to find out exactly what a program will actually do is to run it Problems: –The program might not terminate –The program might have some behavior we didn't see when we ran it on a particular input 44

Another view of program analysis The only way to find out exactly what a program will actually do is to run it Problems: –The program might not terminate –The program might have some behavior we didn't see when we ran it on a particular input Inside a basic block, it is simpler –Basic blocks contain no loops –There is only one path through the basic block 45

Another view of program analysis The only way to find out exactly what a program will actually do is to run it Problems: –The program might not terminate –The program might have some behavior we didn't see when we ran it on a particular input Inside a basic block, it is simpler –Basic blocks contain no loops –There is only one path through the basic block –But still … 46

Assigning new semantics (Local) Example: Available Expressions Redefine the statement a = b + c to mean “a now holds the value of b + c, and any variable holding the value a is now invalid” “Run” the program assuming these new semantics 47

Assigning new semantics (Global) Example: Available Expressions Redefine the statement a = b + c to mean “a now holds the value of b + c, and any variable holding the value a is now invalid” “Run” the program assuming these new semantics Merge information from different paths 48

Abstract Interpretation Example: Available Expressions Redefine the statement a = b + c to mean “a now holds the value of b + c, and any variable holding the value a is now invalid” “Run” the program assuming these new semantics Merge information from different paths Treat the optimizer as an interpreter for these new semantics 49

Theory of Program Analysis Building up all of the machinery to design this analysis was tricky The key ideas, however, are mostly independent of the analysis: –We need to be able to compute functions describing the behavior of each statement –We need to be able to merge several subcomputations together –We need an initial value for all of the basic blocks There is a beautiful formalism that captures many of these properties 50

Join semilattices A join semilattice is a ordering defined on a set of elements –0 ≤ 1 ≤ 2 ≤ … –{} ≤ {0} ≤ {0,1}, {1,2} ≤ {0,1,2} ≤ {1,2,3,4} Any two elements have some join that is the smallest element larger than both elements There is a unique bottom element, which is smaller than all other elements –The join of two elements represents combining (merging) information from two elements by an overapproximation The bottom element represents “no information yet” or “the least conservative possible answer” / 51

Join semilattice for liveness {} {a}{b}{c} {a, b}{a, c}{b, c} {a, b, c} Bottom element 52

What is the join of {b} and {c}? {} {a}{b}{c} {a, b}{a, c}{b, c} {a, b, c} 53

What is the join of {b} and {c}? {} {a}{b}{c} {a, b}{a, c}{b, c} {a, b, c} 54

What is the join of {b} and {a,c}? {} {a}{b}{c} {a, b}{a, c}{b, c} {a, b, c} 55

What is the join of {b} and {a,c}? {} {a}{b}{c} {a, b}{a, c}{b, c} {a, b, c} 56

What is the join of {a} and {a,b}? {} {a}{b}{c} {a, b}{a, c}{b, c} {a, b, c} 57

What is the join of {a} and {a,b}? {} {a}{b}{c} {a, b}{a, c}{b, c} {a, b, c} 58

Formal definitions A join semilattice is a pair (V,  ), where V is a domain of elements  is a join operator that is –commutative: x  y = y  x –associative: (x  y)  z = x  (y  z) –idempotent: x  x = x If x  y = z, we say that z is the join or (least upper bound) of x and y Every join semilattice has a bottom element denoted  such that   x = x for all x 59

Join semilattices and ordering {} {a}{b}{c} {a, b}{a, c}{b, c} {a, b, c} Greater Lower 60

Join semilattices and ordering {} {a}{b}{c} {a, b}{a, c}{b, c} {a, b, c} Least precise Most precise 61

Join semilattices and orderings Every join semilattice (V,  ) induces an ordering relationship  over its elements Define x  y iff x  y = y Need to prove –Reflexivity: x  x –Antisymmetry: If x  y and y  x, then x = y –Transitivity: If x  y and y  z, then x  z 62

An example join semilattice The set of natural numbers and the max function Idempotent –max{a, a} = a Commutative –max{a, b} = max{b, a} Associative –max{a, max{b, c}} = max{max{a, b}, c} Bottom element is 0: –max{0, a} = a What is the ordering over these elements? 63

A join semilattice for liveness Sets of live variables and the set union operation Idempotent: –x  x = x Commutative: –x  y = y  x Associative: –(x  y)  z = x  (y  z) Bottom element: –The empty set: Ø  x = x What is the ordering over these elements? 64

Semilattices and program analysis Semilattices naturally solve many of the problems we encounter in global analysis How do we combine information from multiple basic blocks? What value do we give to basic blocks we haven't seen yet? How do we know that the algorithm always terminates? 65

Semilattices and program analysis Semilattices naturally solve many of the problems we encounter in global analysis How do we combine information from multiple basic blocks? –Take the join of all information from those blocks What value do we give to basic blocks we haven't seen yet? –Use the bottom element How do we know that the algorithm always terminates? –Actually, we still don't! More on that later 66

A general framework A global analysis is a tuple ( D, V, , F, I ), where –D is a direction (forward or backward) The order to visit statements within a basic block, not the order in which to visit the basic blocks –V is a set of values –  is a join operator over those values –F is a set of transfer functions f : V  V –I is an initial value The only difference from local analysis is the introduction of the join operator 67

Running global analyses Assume that ( D, V, , F, I ) is a forward analysis Set OUT[s] =  for all statements s Set OUT[entry] = I Repeat until no values change: –For each statement s with predecessors p 1, p 2, …, p n : Set IN[s] = OUT[p 1 ]  OUT[p 2 ]  …  OUT[p n ] Set OUT[s] = f s (IN[s]) The order of this iteration does not matter –This is sometimes called chaotic iteration 68

The dataflow framework This form of analysis is called the dataflow framework Can be used to easily prove an analysis is sound With certain restrictions, can be used to prove that an analysis eventually terminates –Again, more on that later 69

Global constant propagation Constant propagation is an optimization that replaces each variable that is known to be a constant value with that constant An elegant example of the dataflow framework 70

Global constant propagation exit x = 4; z = x; w = x; y = x;z = y; x = 6; entry 71

Global constant propagation exit x = 4; z = x; w = x; y = x;z = y; x = 6; entry 72

Global constant propagation exit x = 4; z = x; w = 6; y = 6;z = y; x = 6; entry 73

Constant propagation analysis In order to do a constant propagation, we need to track what values might be assigned to a variable at each program point Every variable will either –Never have a value assigned to it, –Have a single constant value assigned to it, –Have two or more constant values assigned to it, or –Have a known non-constant value. –Our analysis will propagate this information throughout a CFG to identify locations where a value is constant 74

Properties of constant propagation For now, consider just some single variable x At each point in the program, we know one of three things about the value of x: –x is definitely not a constant, since it's been assigned two values or assigned a value that we know isn't a constant –x is definitely a constant and has value k –We have never seen a value for x Note that the first and last of these are not the same! –The first one means that there may be a way for x to have multiple values –The last one means that x never had a value at all 75

Defining a join operator The join of any two different constants is Not-a-Constant –(If the variable might have two different values on entry to a statement, it cannot be a constant) The join of Not a Constant and any other value is Not-a- Constant –(If on some path the value is known not to be a constant, then on entry to a statement its value can't possibly be a constant) The join of Undefined and any other value is that other value –(If x has no value on some path and does have a value on some other path, we can just pretend it always had the assigned value) 76

A semilattice for constant propagation One possible semilattice for this analysis is shown here (for each variable): Undefined Not-a-constant The lattice is infinitely wide 77

A semilattice for constant propagation One possible semilattice for this analysis is shown here (for each variable): Undefined Not-a-constant Note: The join of any two different constants is Not-a-Constant The join of Not a Constant and any other value is Not-a-Constant The join of Undefined and any other value is that other value 78

Global constant propagation exit x = 4; Undefined z = x; Undefined w = x; Undefined y = x; Undefined z = y; Undefined x = 6; Undefined entry Undefined x=Undefined y=Undefined z=Undefined w=Undefined 79

Global constant propagation exit x = 4; Undefined z = x; Undefined w = x; Undefined y = x; Undefined z = y; Undefined x = 6; Undefined entry Undefined 80

Global constant propagation exit x = 4; Undefined z = x; Undefined w = x; Undefined y = x; Undefined z = y; Undefined Undefined x = 6; Undefined entry Undefined 81

Global constant propagation exit x = 4; Undefined z = x; Undefined w = x; Undefined y = x; Undefined z = y; Undefined Undefined x = 6; x = 6, y=z=w=  entry Undefined 82

Global constant propagation exit x = 4; Undefined z = x; Undefined w = x; Undefined y = x; Undefined z = y; Undefined Undefined x = 6; x = 6, y=z=w=  entry Undefined 83

Global constant propagation exit x = 4; Undefined z = x; Undefined w = x; Undefined x=6 y = x; Undefined z = y; Undefined Undefined x = 6; x = 6 entry Undefined 84

Global constant propagation exit x = 4; Undefined z = x; Undefined w = x; Undefined x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 85

Global constant propagation exit x = 4; Undefined z = x; Undefined w = x; Undefined x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined y=6  y=Undefined gives what? 86

Global constant propagation exit x = 4; Undefined z = x; Undefined x=6,y=6 w = x; Undefined x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 87

Global constant propagation exit x = 4; Undefined z = x; Undefined x=6,y=6 w = x; Undefined x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 88

Global constant propagation exit x = 4; Undefined z = x; Undefined x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 89

Global constant propagation exit x = 4; Undefined z = x; Undefined x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 90

Global constant propagation exit x = 4; Undefined x=y=w=6 z = x; Undefined x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 91

Global constant propagation exit x = 4; Undefined x=y=w=6 z = x; x=y=w=z=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 92

Global constant propagation exit x = 4; Undefined x=y=w=6 z = x; x=y=w=z=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 93

Global constant propagation exit x=y=w=z=6 x = 4; Undefined x=y=w=6 z = x; x=y=w=z=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 94

Global constant propagation exit x=y=w=z=6 x = 4; x=4, y=w=z=6 x=y=w=6 z = x; x=y=w=z=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 95

Global constant propagation exit x=y=w=z=6 x = 4; x=4, y=w=z=6 x=y=w=6 z = x; x=y=w=z=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 96

Global constant propagation exit x=y=w=z=6 x = 4; x=4, y=w=z=6 x=y=w=6 z = x; x=y=w=z=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 x = 6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 97

Global constant propagation exit x=y=w=z=6 x = 4; x=4, y=w=z=6 x=y=w=6 z = x; x=y=w=z=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 x = 6 z = y; Undefined Undefined x = 6; x = 6 entry Undefined 98

Global constant propagation exit x=y=w=z=6 x = 4; x=4, y=w=z=6 x=y=w=6 z = x; x=y=w=z=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 x = 6 z = y; x = 6 Undefined x = 6; x = 6 entry Undefined 99

Global constant propagation exit x=y=w=z=6 x = 4; x=4, y=w=z=6 x=y=w=6 z = x; x=y=w=z=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 x = 6 z = y; x = 6 Undefined x = 6; x = 6 entry Undefined x=6  x=4 gives what? 100

Global constant propagation exit x=y=w=z=6 x = 4; x=4, y=w=z=6 y=w=6, x=  z = x; x=y=w=z=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 x = 6 z = y; x = 6 Undefined x = 6; x = 6 entry Undefined 101

Global constant propagation exit x=y=w=z=6 x = 4; x=4, y=w=z=6 y=w=6,x=  z = x; y=w=6,z=x=  x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 x = 6 z = y; x = 6 Undefined x = 6; x = 6 entry Undefined 102

Global constant propagation exit x=y=w=z=6 x = 4; x=4,y=w=6,z=  y=w=6,x=  z = x; y=w=6,z=x=  x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 x = 6 z = y; x = 6 Undefined x = 6; x = 6 entry Undefined 103

Global constant propagation exit y=w=6,z=x=  x = 4; x=4,y=w=6,z=  y=w=6,x=  z = x; y=w=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 x = 6 z = y; x = 6 Undefined x = 6; x = 6 entry Undefined 104

Global constant propagation exit y=w=6,z=x=  x = 4; x=4,y=w=6,z=  y=w=6,x=  z = x; y=w=6 x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 x = 6 z = y; x = 6 Undefined x = 6; x = 6 entry Undefined 105

Global constant propagation exit y=w=6,z=x=  x = 4; x=4,y=w=6,z=  y=w=6, z=x=  z = x; y=w=6, z=x=  x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 x = 6 z = y; x = 6 Undefined x = 6; x = 6 entry Undefined 106

Global constant propagation exit y=w=6,z=x=  x = 4; x=4,y=w=6,z=  y=w=6, z=x=  z = x; y=w=6, z=x=  x=6,y=6 w = x; x=y=w=6 x=6 y = x; x=6,y=6 x = 6 z = y; x = 6 Undefined x = 6; x = 6 entry Undefined Global analysis reached fixpoint 107

Dataflow for constant propagation Direction: Forward Semilattice: Vars  {Undefined, 0, 1, -1, 2, -2, …, Not-a-Constant} –Join mapping for variables point-wise {x  1,y  1,z  1}  {x  1,y  2,z  Not-a-Constant} = {x  1,y  Not-a-Constant,z  Not-a-Constant} Transfer functions: –f x=k (V) = V| x  k (update V by mapping x to k) –f x=a+b (V) = V| x  ? Initial value: x is Undefined –(When might we use some other value?) 108

Dataflow for constant propagation Direction: Forward Semilattice: Vars  {Undefined, 0, 1, -1, 2, -2, …, Not-a-Constant} –Join mapping for variables point-wise {x  1,y  1,z  1}  {x  1,y  2,z  Not-a-Constant} = {x  1,y  Not-a-Constant,z  Not-a-Constant} Transfer functions: –f x=k (V) = V| x  k (update V by mapping x to k) –f x=a+b (V) = V| x  ? Initial value: x is Undefined –(When might we use some other value?) 109

Proving termination Our algorithm for running these analyses continuously loops until no changes are detected Given this, how do we know the analyses will eventually terminate? –In general, we don‘t 110

Terminates? 111

Liveness Analysis A variable is live at a point in a program if later in the program its value will be read before it is written to again 112

Join semilattice definition A join semilattice is a pair (V,  ), where V is a domain of elements  is a join operator that is –commutative: x  y = y  x –associative: (x  y)  z = x  (y  z) –idempotent: x  x = x If x  y = z, we say that z is the join or (Least Upper Bound) of x and y Every join semilattice has a bottom element denoted  such that   x = x for all x 113

Partial ordering induced by join Every join semilattice (V,  ) induces an ordering relationship  over its elements Define x  y iff x  y = y Need to prove –Reflexivity: x  x –Antisymmetry: If x  y and y  x, then x = y –Transitivity: If x  y and y  z, then x  z 114

A join semilattice for liveness Sets of live variables and the set union operation Idempotent: –x  x = x Commutative: –x  y = y  x Associative: –(x  y)  z = x  (y  z) Bottom element: –The empty set: Ø  x = x Ordering over elements = subset relation 115

Join semilattice example for liveness {} {a}{b}{c} {a, b}{a, c}{b, c} {a, b, c} Bottom element 116

Dataflow framework A global analysis is a tuple ( D, V, , F, I ), where –D is a direction (forward or backward) The order to visit statements within a basic block, NOT the order in which to visit the basic blocks –V is a set of values (sometimes called domain) –  is a join operator over those values –F is a set of transfer functions f s : V  V (for every statement s) –I is an initial value 117

Running global analyses Assume that ( D, V, , F, I ) is a forward analysis For every statement s maintain values before - IN[s] - and after - OUT[s] Set OUT[s] =  for all statements s Set OUT[entry] = I Repeat until no values change: –For each statement s with predecessors PRED[s]={p 1, p 2, …, p n } Set IN[s] = OUT[p 1 ]  OUT[p 2 ]  …  OUT[p n ] Set OUT[s] = f s (IN[s]) The order of this iteration does not matter –Chaotic iteration 118

Proving termination Our algorithm for running these analyses continuously loops until no changes are detected Problem: how do we know the analyses will eventually terminate? 119

A non-terminating analysis The following analysis will loop infinitely on any CFG containing a loop: Direction: Forward Domain: ℕ Join operator: max Transfer function: f(n) = n + 1 Initial value: 0 120

A non-terminating analysis start end x++ 121

Initialization start end x

Fixed-point iteration start end x

Choose a block start end x

Iteration 1 start end x

Iteration 1 start end x

Choose a block start end x

Iteration 2 start end x

Iteration 2 start end x

Iteration 2 start end x

Choose a block start end x

Iteration 3 start end x

Iteration 3 start end x

Iteration 3 start end x

Why doesn’t this terminate? Values can increase without bound Note that “increase” refers to the lattice ordering, not the ordering on the natural numbers The height of a semilattice is the length of the longest increasing sequence in that semilattice The dataflow framework is not guaranteed to terminate for semilattices of infinite height Note that a semilattice can be infinitely large but have finite height –e.g. constant propagation

Height of a lattice An increasing chain is a sequence of elements   a 1  a 2  …  a k –The length of such a chain is k The height of a lattice is the length of the maximal increasing chain For liveness with n program variables: –{}  {v 1 }  {v 1,v 2 }  …  {v 1,…,v n } For available expressions it is the number of expressions of the form a=b op c –For n program variables and m operator types: m  n 3 136

Another non-terminating analysis This analysis works on a finite-height semilattice, but will not terminate on certain CFGs: Direction: Forward Domain: Boolean values true and false Join operator: Logical OR Transfer function: Logical NOT Initial value: false 137

A non-terminating analysis start end x=!x 138

Initialization start end x=!x false 139

Fixed-point iteration start end x=!x false 140

Choose a block start end x=!x false 141

Iteration 1 start end x=!x false 142

Iteration 1 start end x=!x true false 143

Iteration 2 start end x=!x true false true 144

Iteration 2 start end x=!x false true 145

Iteration 3 start end x=!x false 146

Iteration 3 start end x=!x true false 147

Why doesn’t it terminate? Values can loop indefinitely Intuitively, the join operator keeps pulling values up If the transfer function can keep pushing values back down again, then the values might cycle forever fals e true fals e true fals e

Why doesn’t it terminate? Values can loop indefinitely Intuitively, the join operator keeps pulling values up If the transfer function can keep pushing values back down again, then the values might cycle forever How can we fix this? fals e true fals e true fals e

Monotone transfer functions A transfer function f is monotone iff if x  y, then f(x)  f(y) Intuitively, if you know less information about a program point, you can't “gain back” more information about that program point Many transfer functions are monotone, including those for liveness and constant propagation Note: Monotonicity does not mean that x  f(x) –(This is a different property called extensivity) 150

Liveness and monotonicity A transfer function f is monotone iff if x  y, then f(x)  f(y) Recall our transfer function for a = b + c is –f a = b + c (V) = (V – {a})  {b, c} Recall that our join operator is set union and induces an ordering relationship X  Y iff X  Y Is this monotone? 151

Is constant propagation monotone? A transfer function f is monotone iff if x  y, then f(x)  f(y) Recall our transfer functions –f x=k (V) = V| x  k (update V by mapping x to k) –f x=a+b (V) = V| x  Not-a-Constant (assign Not-a-Constant) Is this monotone? Undefined Not-a-constant 152

The grand result Theorem: A dataflow analysis with a finite- height semilattice and family of monotone transfer functions always terminates Proof sketch: –The join operator can only bring values up –Transfer functions can never lower values back down below where they were in the past (monotonicity) –Values cannot increase indefinitely (finite height) 153

An “optimality” result A transfer function f is distributive if f(a  b) = f(a)  f(b) for every domain elements a and b If all transfer functions are distributive then the fixed-point solution is the solution that would be computed by joining results from all (potentially infinite) control-flow paths –Join over all paths Optimal if we ignore program conditions 154

An “optimality” result If(*) A else B; If(*) C else D; AB * * CD exit AA * BB CDCD f C (f A (  )  f B (  ))  f D (f A (  )  f B (  ))   f C (f A (  ))  f D (f A (  ))  f C (f B (  ))  f D (f B (  )) ?=?= 155

An “optimality” result A transfer function f is distributive if f(a  b) = f(a)  f(b) for every domain elements a and b If all transfer functions are distributive then the fixed-point solution is equal to the solution computed by joining results from all (potentially infinite) control-flow paths –Join over all paths Optimal if we pretend all control-flow paths can be executed by the program Which analyses use distributive functions? 156

Loop optimizations Most of a program’s computations are done inside loops –Focus optimizations effort on loops The optimizations we’ve seen so far are independent of the control structure Some optimizations are specialized to loops –Loop-invariant code motion –(Strength reduction via induction variables) Require another type of analysis to find out where expressions get their values from –Reaching definitions (Also useful for improving register allocation) 157

Loop invariant computation y = t * 4 x < y + z end x = x + 1 start y = … t = … z = … 158

Loop invariant computation y = t * 4 x < y + z end x = x + 1 start y = … t = … z = … t*4 and y+z have same value on each iteration 159

Code hoisting x < w end x = x + 1 start y = … t = … z = … y = t * 4 w = y + z 160

What reasoning did we use? y = t * 4 x < y + z end x = x + 1 start y = … t = … z = … y is defined inside loop but it is loop invariant since t*4 is loop- invariant Both t and z are defined only outside of loop constants are trivially loop- invariant 161

What about now? y = t * 4 x < y + z end x = x + 1 t = t + 1 start y = … t = … z = … Now t is not loop-invariant and so are t*4 and y 162

Loop-invariant code motion d: t = a 1 op a 2 –d is a program location a 1 op a 2 loop-invariant (for a loop L) if computes the same value in each iteration –Hard to know in general Conservative approximation –Each a i is a constant, or –All definitions of a i that reach d are outside L, or –Only one definition of of a i reaches d, and is loop-invariant itself Transformation: hoist the loop-invariant code outside of the loop 163

Reaching definitions analysis A definition d: t = … reaches a program location if there is a path from the definition to the program location, along which the defined variable is never redefined 164

Reaching definitions analysis A definition d: t = … reaches a program location if there is a path from the definition to the program location, along which the defined variable is never redefined Direction: Forward Domain: sets of program locations that are definitions ` Join operator: union Transfer function: f d: a=b op c (RD) = (RD - defs(a))  {d} f d: not-a-def (RD) = RD –Where defs(a) is the set of locations defining a (statements of the form a=...) Initial value: {} 165

Reaching definitions analysis d4: y = t * 4 d4:x < y + z d6: x = x + 1 d1: y = … d2: t = … d3: z = … start end {} 166

Reaching definitions analysis d4: y = t * 4 d4:x < y + z d5: x = x + 1 start d1: y = … d2: t = … d3: z = … end {} 167

Initialization d4: y = t * 4 d4:x < y + z d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} end {} 168

Iteration 1 d4: y = t * 4 d4:x < y + z d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} end {} 169

Iteration 1 d4: y = t * 4 d4:x < y + z d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} {d1} {d1, d2} {d1, d2, d3} end {} 170

Iteration 2 d4: y = t * 4 x < y + z end d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} {d1} {d1, d2} {d1, d2, d3} {} 171

Iteration 2 d4: y = t * 4 x < y + z end d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} {d1, d2, d3} {} {d1} {d1, d2} {d1, d2, d3} {} 172

Iteration 2 d4: y = t * 4 x < y + z end d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} {d1, d2, d3} {} {d1} {d1, d2} {d1, d2, d3} {d2, d3, d4} {} 173

Iteration 2 d4: y = t * 4 x < y + z end d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} {d1, d2, d3} {} {d1} {d1, d2} {d1, d2, d3} {d2, d3, d4} {} 174

Iteration 3 d4: y = t * 4 x < y + z end d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} {d1, d2, d3} {d2, d3, d4} {} {d1} {d1, d2} {d1, d2, d3} {d2, d3, d4} {} 175

Iteration 3 d4: y = t * 4 x < y + z end d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} {d1, d2, d3} {d2, d3, d4} {} {d1} {d1, d2} {d1, d2, d3} {d2, d3, d4} {d2, d3, d4, d5} 176

Iteration 4 d4: y = t * 4 x < y + z end d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} {d1, d2, d3} {d2, d3, d4} {} {d1} {d1, d2} {d1, d2, d3} {d2, d3, d4} {d2, d3, d4, d5} 177

Iteration 4 d4: y = t * 4 x < y + z end d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} {d1, d2, d3, d4, d5} {d2, d3, d4} {} {d1} {d1, d2} {d1, d2, d3} {d2, d3, d4} {d2, d3, d4, d5} 178

Iteration 4 d4: y = t * 4 x < y + z end d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} {d1, d2, d3, d4, d5} {d2, d3, d4} {} {d1} {d1, d2} {d1, d2, d3} {d2, d3, d4, d5} 179

Iteration 5 end start d1: y = … d2: t = … d3: z = … {} {d2, d3, d4, d5} {d1} {d1, d2} {d1, d2, d3} d5: x = x + 1 {d2, d3, d4} {d2, d3, d4, d5} d4: y = t * 4 x < y + z {d1, d2, d3, d4, d5} {d2, d3, d4, d5} 180

Iteration 6 end start d1: y = … d2: t = … d3: z = … {} {d2, d3, d4, d5} {d1} {d1, d2} {d1, d2, d3} d5: x = x + 1 {d2, d3, d4, d5} d4: y = t * 4 x < y + z {d1, d2, d3, d4, d5} {d2, d3, d4, d5} 181

Which expressions are loop invariant? t is defined only in d2 – outside of loop z is defined only in d3 – outside of loop y is defined only in d4 – inside of loop but depends on t and 4, both loop-invariant start d1: y = … d2: t = … d3: z = … {} {d1} {d1, d2} {d1, d2, d3} end {d2, d3, d4, d5} d5: x = x + 1 {d2, d3, d4, d5} d4: y = t * 4 x < y + z {d1, d2, d3, d4, d5} {d2, d3, d4, d5} x is defined only in d5 – inside of loop so is not a loop-invariant 182

Inferring loop-invariant expressions For a statement s of the form t = a 1 op a 2 A variable a i is immediately loop-invariant if all reaching definitions IN[s]={d 1,…,d k } for a i are outside of the loop LOOP-INV = immediately loop-invariant variables and constants LOOP-INV = LOOP-INV  {x | d: x = a 1 op a 2, d is in the loop, and both a 1 and a 2 are in LOOP-INV} –Iterate until fixed-point An expression is loop-invariant if all operands are loop-invariants 183

Computing LOOP-INV end start d1: y = … d2: t = … d3: z = … {} {d2, d3, d4} {d1} {d1, d2} {d1, d2, d3} d4: y = t * 4 x < y + z d5: x = x + 1 {d1, d2, d3, d4, d5} {d2, d3, d4, d5} 184

Computing LOOP-INV end start d1: y = … d2: t = … d3: z = … {} {d2, d3, d4} {d1} {d1, d2} {d1, d2, d3} d4: y = t * 4 x < y + z d5: x = x + 1 {d1, d2, d3, d4, d5} {d2, d3, d4, d5} (immediately) LOOP-INV = {t} 185

Computing LOOP-INV end start d1: y = … d2: t = … d3: z = … {} {d2, d3, d4} {d1} {d1, d2} {d1, d2, d3} d4: y = t * 4 x < y + z d5: x = x + 1 {d1, d2, d3, d4, d5} {d2, d3, d4, d5} (immediately) LOOP-INV = {t, z} 186

Computing LOOP-INV end start d1: y = … d2: t = … d3: z = … {} {d2, d3, d4} {d1} {d1, d2} {d1, d2, d3} d4: y = t * 4 x < y + z d5: x = x + 1 {d1, d2, d3, d4, d5} {d2, d3, d4, d5} (immediately) LOOP-INV = {t, z} 187

Computing LOOP-INV end start d1: y = … d2: t = … d3: z = … {} {d2, d3, d4} {d1} {d1, d2} {d1, d2, d3} d4: y = t * 4 x < y + z d5: x = x + 1 {d1, d2, d3, d4, d5} {d2, d3, d4, d5} (immediately) LOOP-INV = {t, z} 188

end start d1: y = … d2: t = … d3: z = … {} {d2, d3, d4} {d1} {d1, d2} {d1, d2, d3} LOOP-INV = {t, z, 4} d4: y = t * 4 x < y + z d5: x = x + 1 {d1, d2, d3, d4, d5} {d2, d3, d4, d5} Computing LOOP-INV 189

Computing LOOP-INV d4: y = t * 4 x < y + z end d5: x = x + 1 start d1: y = … d2: t = … d3: z = … {} {d1, d2, d3, d4, d5} {d2, d3, d4, d5} {d2, d3, d4} {d1} {d1, d2} {d1, d2, d3} {d2, d3, d4, d5} LOOP-INV = {t, z, 4, y} 190

Induction variables while (i < x) { j = a + 4 * i a[j] = j i = i + 1 } i is incremented by a loop- invariant expression on each iteration – this is called an induction variable j is a linear function of the induction variable with multiplier 4 191

Strength-reduction j = a + 4 * i while (i < x) { j = j + 4 a[j] = j i = i + 1 } Prepare initial value Increment by multiplier 192

Summary of optimizations Enabled OptimizationsAnalysis Common-subexpression elimination Copy Propagation Available Expressions Constant foldingConstant Propagation Dead code elimination Register allocation Live Variables Loop-invariant code motionReaching Definitions 193

Ad Advanced course on program analysis and verification Workshop on compile time techniques for detecting malicious JavaScripts –With Trusteer Now IBM 194