Continuing Abstract Interpretation We have seen: 1.How to compile abstract syntax trees into control-flow graphs 2.Lattices, as structures that describe.

Slides:



Advertisements
Similar presentations
Overview Structural Testing Introduction – General Concepts
Advertisements

Semantics Static semantics Dynamic semantics attribute grammars
Data-Flow Analysis II CS 671 March 13, CS 671 – Spring Data-Flow Analysis Gather conservative, approximate information about what a program.
School of EECS, Peking University “Advanced Compiler Techniques” (Fall 2011) SSA Guo, Yao.
Data-Flow Analysis Framework Domain – What kind of solution is the analysis looking for? Ex. Variables have not yet been defined – Algorithm assigns a.
CS412/413 Introduction to Compilers Radu Rugina Lecture 37: DU Chains and SSA Form 29 Apr 02.
Some Properties of SSA Mooly Sagiv. Outline Why is it called Static Single Assignment form What does it buy us? How much does it cost us? Open questions.
Tutorial on Widening (and Narrowing) Hongseok Yang Seoul National University.
A Deeper Look at Data-flow Analysis Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 512 at Rice University.
Program Representations. Representing programs Goals.
Foundations of Data-Flow Analysis. Basic Questions Under what circumstances is the iterative algorithm used in the data-flow analysis correct? How precise.
CSE 231 : Advanced Compilers Building Program Analyzers.
Common Sub-expression Elim Want to compute when an expression is available in a var Domain:
Worklist algorithm Initialize all d i to the empty set Store all nodes onto a worklist while worklist is not empty: –remove node n from worklist –apply.
Representing programs Goals. Representing programs Primary goals –analysis is easy and effective just a few cases to handle directly link related things.
From last time: live variables Set D = 2 Vars Lattice: (D, v, ?, >, t, u ) = (2 Vars, µ, ;,Vars, [, Å ) x := y op z in out F x := y op z (out) = out –
Global optimization. Data flow analysis To generate better code, need to examine definitions and uses of variables beyond basic blocks. With use- definition.
From last time: Lattices A lattice is a tuple (S, v, ?, >, t, u ) such that: –(S, v ) is a poset – 8 a 2 S. ? v a – 8 a 2 S. a v > –Every two elements.
Data Flow Analysis Compiler Design Nov. 3, 2005.
From last time: reaching definitions For each use of a variable, determine what assignments could have set the value being read from the variable Information.
1 Iterative Program Analysis Mooly Sagiv Tel Aviv University Textbook: Principles of Program Analysis.
Data Flow Analysis Compiler Design October 5, 2004 These slides live on the Web. I obtained them from Jeff Foster and he said that he obtained.
Abstract Interpretation Part I Mooly Sagiv Textbook: Chapter 4.
Administrative stuff Office hours: After class on Tuesday.
Recap Let’s do a recap of what we’ve seen so far Started with worklist algorithm for reaching definitions.
1 CS 201 Compiler Construction Lecture 6 Code Optimizations: Constant Propagation & Folding.
Data Flow Analysis Compiler Design Nov. 8, 2005.
San Diego October 4-7, 2006 Over 1,000 women in computing Events for undergraduates considering careers and graduate school Events for graduate students.
Recap: Reaching defns algorithm From last time: reaching defns worklist algo We want to avoid using structure of the domain outside of the flow functions.
From last lecture x := y op z in out F x := y op z (in) = in [ x ! in(y) op in(z) ] where a op b =
Direction of analysis Although constraints are not directional, flow functions are All flow functions we have seen so far are in the forward direction.
1 Program Analysis Systematic Domain Design Mooly Sagiv Tel Aviv University Textbook: Principles.
Recap from last time: live variables x := 5 y := x + 2 x := x + 1 y := x y...
U NIVERSITY OF M ASSACHUSETTS, A MHERST D EPARTMENT OF C OMPUTER S CIENCE Advanced Compilers CMPSCI 710 Spring 2003 Data flow analysis Emery Berger University.
From last time: reaching definitions For each use of a variable, determine what assignments could have set the value being read from the variable Information.
Data Flow Analysis Compiler Design Nov. 8, 2005.
From last lecture We want to find a fixed point of F, that is to say a map m such that m = F(m) Define ?, which is ? lifted to be a map: ? = e. ? Compute.
Direction of analysis Although constraints are not directional, flow functions are All flow functions we have seen so far are in the forward direction.
Program Analysis Mooly Sagiv Tel Aviv University Sunday Scrieber 8 Monday Schrieber.
Prof. Bodik CS 164 Lecture 16, Fall Global Optimization Lecture 16.
Induction and recursion
Constant Propagation. The constant propagation framework is different from all the data-flow problems discussed so far, in that It has an unbounded set.
Precision Going back to constant prop, in what cases would we lose precision?
Abstract Interpretation (Cousot, Cousot 1977) also known as Data-Flow Analysis.
Solving fixpoint equations
Compiler Construction Lecture 16 Data-Flow Analysis.
Formalization of DFA using lattices. Recall worklist algorithm let m: map from edge to computed value at edge let worklist: work list of nodes for each.
Compiler Principles Fall Compiler Principles Lecture 11: Loop Optimizations Roman Manevich Ben-Gurion University.
1 Iterative Program Analysis Abstract Interpretation Mooly Sagiv Tel Aviv University Textbook:
1 Numeric Abstract Domains Mooly Sagiv Tel Aviv University Adapted from Antoine Mine.
CS 598 Scripting Languages Design and Implementation 9. Constant propagation and Type Inference.
Data Flow Analysis II AModel Checking and Abstract Interpretation Feb. 2, 2011.
Semilattices presented by Niko Simonson, CSS 548, Autumn 2012 Semilattice City, © 2009 Nora Shader.
Chaotic Iterations Mooly Sagiv Tel Aviv University Textbook: Principles of Program Analysis.
Chaotic Iterations Mooly Sagiv Tel Aviv University Textbook: Principles of Program Analysis.
U NIVERSITY OF D ELAWARE C OMPUTER & I NFORMATION S CIENCES D EPARTMENT Optimizing Compilers CISC 673 Spring 2011 Yet More Data flow analysis John Cavazos.
Lub and glb Given a poset (S, · ), and two elements a 2 S and b 2 S, then the: –least upper bound (lub) is an element c such that a · c, b · c, and 8 d.
Program Analysis Last Lesson Mooly Sagiv. Goals u Show the significance of set constraints for CFA of Object Oriented Programs u Sketch advanced techniques.
Lectures on Network Flows
Chapter 5. Optimal Matchings
Iterative Program Analysis Abstract Interpretation
Another example: constant prop
Program Analysis and Verification
Dataflow analysis.
Optimizations using SSA
Data Flow Analysis Compiler Design
Formalization of DFA using lattices
Formalization of DFA using lattices
Formalization of DFA using lattices
Formalization of DFA using lattices
Presentation transcript:

Continuing Abstract Interpretation We have seen: 1.How to compile abstract syntax trees into control-flow graphs 2.Lattices, as structures that describe abstractly sets of program states (facts) 3.(started) Transfer functions that describe how to update facts Reviewing and continuing with: 4.Iterative algorithm, examples, convergence

Defining Abstract Interpretation Abstract Domain D (elements are data-flow facts), describing which information to compute, e.g. – inferred types for each variable: x:C, y:D – interval for each variable x:[a,b], y:[a’,b’] – for each variable if is: initialized, constant, live Transfer Functions, [[st]] for each statement st, how this statement affects the facts – Example:

For now: domain of Intervals D = {  } U { [a,b] | a ≤ b, a,b  Int32} Int64 = {-MI,...,-1,0,1,MI-1}, MI is e.g Intervals [a,b] whose ends are machine integers a,b where a is less than or equal to b [a,b] = { x | a ≤ x ≤ b} When a is minimal int, b maximal int, we have the largest representable set, largest element of the lattice, we also denote this by T (top) Least element , bottom represents empty set

Transfer Functions for Tests if (x > 1) { y = 1 / x } else { y = 42 } Tests e.g. [x>1] come from translating if,while into CFG

Joining Data-Flow Facts if (x > 0) { y = x } else { y = -x – 50 } join

Handling Loops: Iterate Until Stabilizes x = 1 while (x < 10) { x = x + 2 }

Analysis Algorithm var facts : Map[Node,Domain] = Map.withDefault(empty) facts(entry) = initialValues while (there was change) pick edge (v1,statmt,v2) from CFG such that facts(v1) has changed facts(v2)=facts(v2) join transerFun(statmt, facts(v1)) } Order does not matter for the end result, as long as we do not permanently neglect any edge whose source was changed.

Work List Version var facts : Map[Node,Domain] = Map.withDefault(empty) var worklist : Queue[Node] = empty def assign(v1:Node,d:Domain) = if (facts(v1)!=d) { facts(v1)=d for (stmt,v2) <- outEdges(v1) { worklist.add(v2) } } assign(entry, initialValues) while (!worklist.isEmpty) { var v2 = worklist.getAndRemoveFirst update = facts(v2) for (v1,stmt) <- inEdges(v2) { update = update join transferFun(facts(v1),stmt) } assign(v2, update) }

Run range analysis, prove error is unreachable int M = 16; int[M] a; x := 0; while (x < 10) { x := x + 3; } if (x >= 0) { if (x <= 15) a[x]=7; else error; } else { error; } checks array accesses

Range analysis results int M = 16; int[M] a; x := 0; while (x < 10) { x := x + 3; } if (x >= 0) { if (x <= 15) a[x]=7; else error; } else { error; } checks array accesses

Simplified Conditions int M = 16; int[M] a; x := 0; while (x < 10) { x := x + 3; } if (x >= 0) { if (x <= 15) a[x]=7; else error; } else { error; } checks array accesses

Remove Trivial Edges, Unreachable Nodes int M = 16; int[M] a; x := 0; while (x < 10) { x := x + 3; } if (x >= 0) { if (x <= 15) a[x]=7; else error; } else { error; } checks array accesses Benefits: - faster execution (no checks) - program cannot crash with error

Apply Range Analysis and Simplify int a, b, step, i; boolean c; a = 0; b = a + 10; step = -1; if (step > 0) { i = a; } else { i = b; } c = true; while (c) { process(i); i = i + step; if (step > 0) { c = (i < b); } else { c = (i > a); } For booleans, use this lattice: D b = { {}, {false}, {true}, {false,true} } with ordering given by set subset relation. (left as exercise)

Correctness Once the iterative loop stops, there were no changes. From there it follows: All program states that flow along an edge are included in the states in the target node. As long as this condition holds, it does not matter how we computed the states, the analysis results are correct. Proof is by considering an execution sequence through CFG and showing by induction that each state in the sequence is contained in the intervals.

How Long Does Analysis Take?

Handling Loops: Iterate Until Stabilizes x = 1 while (x < n) { x = x + 2 } How many steps until it stabilizes? Compute. n =

Handling Loops: Iterate Until Stabilizes var x : BigInt = 1 while (x < n) { x = x + 2 } For unknown program inputs and unbounded domains it may be practically impossible to know how long it takes. var n : BigInt = readInput() Solutions - smaller domain, e.g. only certain intervals [a,b] where a,b in {-MI,-127,-1,0,1,127,MI-1} - widening techniques (make it less precise on demand) How many steps until it stabilizes?

Smaller domain: intervals [a,b] where a,b  {-MI,-127,0,127,MI-1} (MI denoted M)

Size of analysis domain Interval analysis: D 1 = { [a,b] | a ≤ b, a,b  {-M,-127,-1,0,1,127,M-1}} U {  } Constant propagation: D 2 = { [a,a] | a  {-M,-(M-1),…,-2,-1,0,1,2,3,…,M-1}} U { ,T} suppose M is 2 63 |D 1 | = |D 2 | = How many steps until it stabilizes, for any program with one variable?

How many steps does the analysis take to finish (converge)? Interval analysis: D 1 = { [a,b] | a ≤ b, a,b  {-M,-127,-1,0,1,127,M-1}} U {  } Constant propagation: D 2 = { [a,a] | a  {-M,-(M-1),…,-2,-1,0,1,2,3,…,M-1}} U { ,T} suppose M is 2 63 With D 1 takes at most steps. With D 2 takes at most steps.

Chain of length n A set of elements x 0,x 1,..., x n in D that are linearly ordered, that is x 0  x 1 ...  x n A lattice can have many chains. Its height is the maximum n for all the chains, if finite If there is no upper bound on lengths of chains, we say lattice has infinite height A monotonic sequence of distinct elements has length at most equal to lattice height

Termination Given by Length of Chains Interval analysis: D 1 = { [a,b] | a ≤ b, a,b  {-M,-127,-1,0,1,127,M-1}} U {  } Constant propagation: D 2 = { [a,a] | a  {-M,…,-2,-1,0,1,2,3,…,M-1}} U { ,T} suppose M is 2 63

Product Lattice for All Variables If we have N variables, we keep one element for each of them This is like N-tuple of variables Resulting lattice is product of N lattices for individual variables Size is |D| N The height is only N times the height of D

Relational Analysis Suppose we keep track of interval for each var... // x:[0,10], y:[0,10] if (x > y) { if (y > 3) { t = 100/(x - 4) } } We would not be able to prove that x-4 > 0. Relational analysis would remember the constraint x<y to make such reasoning possible. Instead of tracking bounds on x,y, it tracks also bounds on x-y and x+y.