Computer Science & Engineering, Indian Institute of Technology, Bombay Code optimization by partial redundancy elimination using Eliminatability paths.

Presentation transcript:

Code optimization by partial redundancy elimination using Eliminatability paths (E-paths)
Prof. Dhananjay M Dhamdhere
Computer Science & Engineering, Indian Institute of Technology, Bombay

These slides are based on
- D. M. Dhamdhere: “E-path_PRE---Partial redundancy elimination made easy”, SIGPLAN Notices, v 37, n 8 (2002).
- D. M. Dhamdhere: “Eliminatability path---A versatile basis for partial redundancy elimination”, 2002.
- Dheeraj Kumar: “Syntactic and Semantic Partial Redundancy elimination”, M. Tech. dissertation, I.I.T. Bombay, 2006.

Partial redundancy elimination
Partial redundancy
- An expression e in statement s is partially redundant if its value is identical with value of e in some path from start of program to s
Partial redundancy elimination
- A partially redundant occurrence of e is made totally redundant by inserting evaluations of e in some path(s) from start of the program to s
- The totally redundant occurrence of e is now eliminated
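
The two steps above can be illustrated with a small sketch. It is not taken from the slides: the `Counter` class and the functions `before` and `after` are hypothetical names, and `Counter.mul` merely stands in for an evaluation of a*b so that evaluations can be counted.

```python
class Counter:
    """Counts how many times the expression a*b is evaluated."""

    def __init__(self):
        self.n = 0

    def mul(self, a, b):
        self.n += 1
        return a * b


def before(c, a, b, cond):
    # a*b is evaluated only on the 'cond' path before the final occurrence,
    # so the final occurrence is partially redundant.
    x = c.mul(a, b) if cond else 0
    y = c.mul(a, b)
    return x + y


def after(c, a, b, cond):
    if cond:
        t = c.mul(a, b)  # existing occurrence, value saved in temporary t
        x = t
    else:
        t = c.mul(a, b)  # inserted evaluation on the path that lacked one
        x = 0
    y = t  # the occurrence is now totally redundant and is replaced by t
    return x + y


c1, c2 = Counter(), Counter()
print(before(c1, 2, 3, True), c1.n)  # 12 2
print(after(c2, 2, 3, True), c2.n)   # 12 1
```

On the path where `cond` is false, both versions evaluate a*b exactly once, so the insertion does not lengthen any path.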

An example of PRE
[Figure: flow graph with nodes 1, 2 and 3 and occurrences of a*b]
- Insert a*b in node 2 (t=a*b)
- Delete a*b from node 3 (use t instead)

Partial redundancy elimination
PRE subsumes 3 important classical optimizations:
- Common subexpression elimination (CSE): Expression e is computed along all paths reaching its occurrence
- Loop invariant movement: A loop-invariant expression is available along the looping edge. Hence it is partially redundant.
- Classical code motion: A less known optimization. It is in fact partial redundancy elimination in specific situations.

PRE subsumes 3 optimizations
[Figure: flow graph containing a=.. and occurrences of a*b]
1. CSE - a*b of node 5 is a CSE.
2. Loop invariant movement - a*b of node 4 is partially redundant
3. Code movement - a*b of node 6 can be moved to node 3.

Benefits and costs of PRE
Benefits:
- Execution efficiency through a reduction in the number of expression occurrences along a graph path
Costs:
- Use of compiler generated temporaries to hold values of expressions
- Lifetimes of compiler generated temporaries increase register pressure
- Insertion of new blocks due to edge placement
Desirable goals:
- Computational optimality and lifetime optimality

Data flow concepts used in partial redundancy elimination
- Availability : An expression e is available at a program point p if its value is computed along ALL paths from start of the program to p
- Partial availability : An expression e is partially available at a program point p if its value is computed along SOME path from start of the program to p
- Availability = Total redundancy
- Partial availability = Partial redundancy
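
A minimal sketch of how the two properties could be computed by iterating over a flow graph. The slides do not give these equations at this point; the sketch assumes the usual formulation in which the value at the exit of a node is `comp or (value at entry and transp)`, where `comp` means the node computes e and `transp` means the node does not redefine the operands of e. The function name `forward_flow` and the four-node graph are hypothetical.

```python
def forward_flow(preds, comp, transp, entry, all_paths):
    """Availability (all_paths=True) or partial availability (all_paths=False)."""
    start = all_paths  # optimistic start for ALL paths, pessimistic for SOME path
    at_in = {n: start for n in preds}
    at_out = {n: start for n in preds}
    changed = True
    while changed:
        changed = False
        for n in preds:
            if n == entry:
                new_in = False  # nothing is computed before the program starts
            else:
                outs = [at_out[p] for p in preds[n]]
                new_in = all(outs) if all_paths else any(outs)
            new_out = comp[n] or (new_in and transp[n])
            if (new_in, new_out) != (at_in[n], at_out[n]):
                at_in[n], at_out[n] = new_in, new_out
                changed = True
    return at_in, at_out


# Diamond: 1 -> 2, 1 -> 3, 2 -> 4, 3 -> 4; e is computed in nodes 2 and 4.
preds = {1: [], 2: [1], 3: [1], 4: [2, 3]}
comp = {1: False, 2: True, 3: False, 4: True}
transp = {n: True for n in preds}

avail_in, avail_out = forward_flow(preds, comp, transp, entry=1, all_paths=True)
pavail_in, pavail_out = forward_flow(preds, comp, transp, entry=1, all_paths=False)
print(avail_in[4], pavail_in[4])  # False True
```

At the entry of node 4 the expression is partially available but not available, so its occurrence in node 4 is partially redundant but not totally redundant.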


Data flow concepts used in partial redundancy elimination
- Anticipatability: An expression e is anticipatable (that is, “very busy”) at a program point p if it is computed along ALL paths from p to an exit of the program
- Safety of a computation (Kennedy 1972): An expression e is safe at a program point p if it is either available or anticipatable at p
  - Insertion of e at p is a “new” computation if e is not safe at p.
  - It increases the execution time of the program. It may also raise “new” exceptions
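
A hedged sketch of anticipatability as a backward all-paths analysis, followed by the safety test. The equations are the standard ones for very busy expressions and are an assumption here, as are the names `anticipatable` and `is_safe` and the example graph; `antloc` marks nodes whose occurrence of e is upwards exposed.

```python
def anticipatable(succs, antloc, transp):
    """Backward all-paths analysis; returns values at node entry and exit."""
    ant_in = {n: True for n in succs}
    ant_out = {n: True for n in succs}
    changed = True
    while changed:
        changed = False
        for n in succs:
            # At an exit node (no successors) nothing is anticipatable.
            new_out = bool(succs[n]) and all(ant_in[s] for s in succs[n])
            new_in = antloc[n] or (new_out and transp[n])
            if (new_in, new_out) != (ant_in[n], ant_out[n]):
                ant_in[n], ant_out[n] = new_in, new_out
                changed = True
    return ant_in, ant_out


def is_safe(avail, ant):
    """e is safe at a point if it is available or anticipatable there."""
    return avail or ant


# Node 1 branches to 2 and 3; only node 2 computes e.
succs = {1: [2, 3], 2: [], 3: []}
transp = {n: True for n in succs}
ant_in, ant_out = anticipatable(succs, {1: False, 2: True, 3: False}, transp)

# e is not available anywhere before node 2, so availability is False here.
print(is_safe(False, ant_out[1]))  # False: inserting e at the exit of node 1 is unsafe
print(is_safe(False, ant_in[2]))   # True: e is anticipatable at the entry of node 2
```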

“Safe” insertion of computations
[Figure: flow graphs with occurrences of a*b]
- Insertion of a*b in node 12 is safe, however in 22 it is unsafe
- Insertion in edge (22,23) is safe!
- a*b is anticipatable in node 12, but not anticipatable in node 22

Some partial redundancies cannot be eliminated through safe code insertion
[Figure: flow graph with nodes i, m, n, k, occurrences of a*b and t=a*b, with regions labelled “a*b available”, “a*b anticipatable” and “a*b ¬ available, ¬ anticipatable”]
- Insertion in the in-edge of node n is unsafe because a*b is not anticipatable

Performing Partial Redundancy Elimination
- Identify partially redundant occurrences of an expression e in a program
- Insert occurrences of e at some program points where e is safe
- Delete partially redundant occurrences of e which have become totally redundant
Classical PRE: Elimination of partial redundancies in a program through safe insertion of computations.
- Can be looked upon as `code movement’ from the point of original occurrence to the point of insertion
- It cannot eliminate all partial redundancies in a program!

A brief history of PRE
- Morel, Renvoise (1979): Bidirectional data flows for code placement in nodes (MRA). Lacks both computational and lifetime optimality.
- Dhamdhere (1988): Computational optimality and shorter lifetimes of temporaries than Morel-Renvoise through placement in nodes and edges (EPA).
- Knoop, Rüthing, Steffen (1992): Lazy code motion (LCM) offering computational optimality and lifetime optimality through a priori edge splitting and placement in nodes. Drechsler and Stadel (1993) reformulated LCM to handle basic blocks.
- Bodik, Gupta, Soffa (1998): Complete elimination of partial redundancies through selective code expansion (ComPRE). Based on the work by Steffen (1996).
- Kennedy et al (1999): PRE in SSA representation of programs (SSAPRE).
- Dhamdhere (2002): Eliminatability path --- A versatile basis for PRE (E-path_PRE). Develops a concept originating in Dhaneshwar, Dhamdhere (1995) and uses it for evaluation of PRE algorithms and development of new ones.
- Xue, Knoop (2006) and Dheeraj Kumar, Dhamdhere (2006)

Morel-Renvoise Algorithm (MRA)
- Performs insertions strictly in nodes of the program graph
- Placement possibility (PP) of e at entry/exit of basic blocks: whether it is feasible and safe to place expression e at entry/exit of a block
- Insert e at the exit of a basic block b if it can be placed at the exit of b but not at its entry
- Delete an existing occurrence of e in a basic block if it can be placed at the entry of that block

Morel-Renvoise Algorithm (MRA)
[Slide body, including the PPIN/PPOUT data flow equations referred to on later slides, is not preserved in the transcript]

Morel-Renvoise Algorithm (MRA)
[Figure: example flow graph with a=.., a*b, t=a*b and uses of t]
1. a*b is inserted in node 2. Insertion in node 3 would have been lifetime optimal.
2. a*b of node 4 cannot be optimized because it cannot be inserted in node 1.
3. a*b is saved in t in nodes 2 and 4. a*b of node 6 is replaced by use of t.

Edge placement algorithm (Dhamdhere 1988)
- Performs insertions both in nodes and along edges in the program graph
- An expression is hoisted as far up as possible to obtain computational optimality
- It is then subjected to sinking (without affecting computational optimality) to obtain lifetime optimality
- It is placed along an edge only if it cannot be placed in a node
- Edge placement is performed only along a critical edge, i.e., an edge from a “branch” node to a “join” node

Edge placement algorithm (Dhamdhere 1988)
[Slide body, including the data flow equations of the algorithm, is not preserved in the transcript]

Edge placement algorithm (Dhamdhere 1988)
A. Computational optimality:
- The ∏ term of PPIN is dropped. Hence PPIN can be true even if PPOUT of a predecessor is false.
- If PP is true for entry of a basic block i but PP is false for exit of a predecessor j, e is placed along the edge (j,i).
  - It is called edge placement. A basic block is inserted in the edge if e is to be placed along it.
  - Edge placement performed only along a “critical edge”, i.e. along an edge from a “branch” node to a “join” node.
- Placement into nodes is done as in MRA.
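
The edge placement rule stated above can be sketched as a simple filter over the edges of the graph. This is only an illustration under stated assumptions: the PPIN/PPOUT values are taken as already solved inputs (the equations that produce them are not reproduced here), the function name `edge_placements` is hypothetical, and the example graph and its values are made up.

```python
def edge_placements(preds, ppin, ppout):
    """Edges (j, i) where PPIN(i) holds, PPOUT(j) does not, and (j, i) is critical."""
    succ_count = {n: 0 for n in preds}
    for i, ps in preds.items():
        for j in ps:
            succ_count[j] += 1

    edges = []
    for i, ps in preds.items():
        for j in ps:
            # Critical edge: from a "branch" node to a "join" node.
            critical = succ_count[j] > 1 and len(ps) > 1
            if ppin[i] and not ppout[j] and critical:
                edges.append((j, i))
    return edges


# Hypothetical graph: 1 -> 2, 1 -> 4, 2 -> 3, 3 -> 4.
preds = {1: [], 2: [1], 3: [2], 4: [1, 3]}
ppin = {1: False, 2: False, 3: False, 4: True}
ppout = {1: False, 2: False, 3: True, 4: False}
print(edge_placements(preds, ppin, ppout))  # [(1, 4)]
```

A new basic block holding the computation would then be inserted along each returned edge.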

Edge placement algorithm (Dhamdhere 1988)
B. Reducing lifetimes of expression variables:
- Move insertion points as far down as possible without sacrificing computational optimality (it is achieved by the ∑ term)

Edge placement algorithm (Dhamdhere 1988)
EPA solution technique (“hoisting-followed-by-sinking” approach):
1. Solve the unidirectional data flow problem obtained by omitting the ∑ term from the PPIN equation. It hoists e as far up as possible. Provides computational optimality.
2. Now a second data flow is solved to incorporate the ∑ term: We examine all predecessors of a block i and change PPIN of block i from true to false if the ∑ term is false for its predecessors. It sinks the hoisted expression as far down as possible without compromising computational optimality.

Edge placement algorithm (EPA)
[Figure: example flow graph with a=.., a*b, t=a*b and uses of t]
1. a*b is inserted in node 3. However, EPA does not provide lifetime optimality in some cases.
2. a*b is inserted in edge (1,4). This is computationally optimal.

Lazy code motion (KRS 92)
- All “join” edges are split a priori by inserting blocks along them
- D-Safe-earliest points: An expression is placed at the earliest points where it is anticipatable.
- Evaluation of an expression is delayed to the latest point where it can be placed without losing computational optimality.
- Thus, it conceptually performs “hoisting-followed-by-sinking”, as in the edge placement algorithm.
- Insertion and saving is performed uniformly.
- Data flow equations are not given here. (Drechsler and Stadel reformulated them.)

Lazy code motion (KRS)
[Figure: example flow graph with a=.., a*b, uses of t, and t=a*b placed in the blocks of split edges (3,6) and (1,4)]
1. Edges (1,4), (3,6), (5,6) and (5,4) are split a priori
2. a*b is inserted in edge (3,6). LCM provides lifetime optimality
3. a*b is inserted in edge (1,4). As in EPA, this is computationally optimal
4. Empty blocks: removed

Eliminatability paths offer..
A conceptual basis for PRE:
- Identifies partial redundancies which can be eliminated through insertion of code in safe places
  * We call them eliminatable partial redundancies
- A simple method for identifying safe insertion points which offer lifetime optimality
- Thus, no “hoisting-followed-by-sinking”

Eliminatability paths offer..
Computationally optimal PRE:
- Elimination of all eliminatable partial redundancies identified by E-paths through appropriate insertions provides computational optimality

Eliminatability paths offer..
PRE with lifetime optimality:
- Insertions performed using the notion of E-paths provide lifetime optimality

Eliminatability paths offer..
A versatile basis for PRE:
- Classical PRE: PRE performed by insertion, deletion and saving of expressions over a program graph
- PRE over SSA representations of programs

Eliminatability paths offer..
Simplicity:
- Insertion, deletion and save points are identified using simple and well-known data flow concepts of availability and anticipatability

Eliminatability paths offer..
A basis for evaluating effectiveness of an approach to PRE:
- Does the approach provide computational optimality? (i.e. does it eliminate all partial redundancies which can be eliminated?)
- Does the approach provide lifetime optimality?

Eliminatability Paths (E-paths)
A path i.. k in a program control flow graph is an E-path for an expression e if
- Node i contains a locally available occurrence of e and node k contains a locally anticipatable occurrence of e
- Nodes in the path (i.. k) are empty wrt e, i.e. they do not contain an occurrence of e or a definition of any of its operands
- e is safe at the exit of each node in [i.. k), i.e., it is either available or anticipatable at the exit of each node in [i.. k).
Path [i.. k) includes node i, but excludes node k. Path (i.. k) excludes nodes i and k.
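
The three conditions of the definition can be checked mechanically once the local and global properties are known. The sketch below is an illustration only: the function name `is_e_path` and the dictionary inputs are hypothetical, the availability and anticipatability values are assumed to have been computed beforehand, and the path is assumed to follow actual edges of the flow graph.

```python
def is_e_path(path, comp, antloc, empty, avail_out, ant_out):
    """Check the E-path conditions for one expression e along path i..k."""
    if len(path) < 2:
        return False
    i, k = path[0], path[-1]
    # Node i has a locally available occurrence, node k a locally anticipatable one.
    if not (comp[i] and antloc[k]):
        return False
    # Nodes in (i..k) are empty with respect to e.
    if not all(empty[n] for n in path[1:-1]):
        return False
    # e is safe (available or anticipatable) at the exit of each node in [i..k).
    return all(avail_out[n] or ant_out[n] for n in path[:-1])


path = ["i", "m", "n", "k"]
comp = {"i": True, "k": True}
antloc = {"i": False, "k": True}
empty = {"m": True, "n": True}
avail_out = {"i": True, "m": True, "n": False}

# e is available at the exits of i and m and anticipatable at the exit of n.
print(is_e_path(path, comp, antloc, empty, avail_out,
                {"i": False, "m": False, "n": True}))   # True

# If e is neither available nor anticipatable at the exit of n, the path is not an E-path.
print(is_e_path(path, comp, antloc, empty, avail_out,
                {"i": False, "m": False, "n": False}))  # False
```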

Eliminatability Path*
[Figure: flow graph with nodes i, m, n, k and occurrences of a*b]
i.. k is an eliminatability path because
- i contains a locally available occurrence of a*b
- k contains locally anticipatable occurrence of a*b
- a*b available at exit of [i.. m]
- a*b anticipatable at exit of [n.. k)
Occurrence of a*b in node k is said to be “eliminatable”
* Dhaneshwar, Dhamdhere (1995) used eliminatability of exps, but did not define or use E-paths explicitly.

Properties of E-paths: 1
PRE using E-paths provides computational optimality
Use of this property:
- Use it to evaluate computational optimality of a PRE algorithm.
- A PRE algorithm possesses computational optimality if it can eliminate partial redundancy of e in EACH node k such that an E-path i.. k exists in G.

Properties of E-paths: 2
If i.. k is an E-path and j is a node in (i.. k]
- For each in-edge (g, j) such that node g is not in an E-path:
    if node g has a successor s which is not in an E-path
    then insert e in edge (g, j)
    else insert e in node g
- Such insertion provides lifetime optimality of the temporary variable used to hold value of e
Use of the property:
- Check whether a PRE algorithm provides lifetime optimality by comparing program points where insertions are made

Lifetime optimality using E-paths
[Figure: flow graph with nodes i, m, j, k, g1 and g2, occurrences of a*b, and inserted computations t=a*b]
- i.. k is an E-path
- Insertion in edge (g1, j) and node g2 is lifetime optimal

Evaluating MRA using E-paths
[Figure: the MRA example flow graph with a=.., a*b, t=a*b and uses of t; the node lists of two of the E-paths are not preserved in the transcript]
0. Three E-paths exist: 4.. 5, and ...
- ... is an E-path. Insertion in node 3 would have been lifetime optimal.
- ... is an E-path. Hence a*b of node 4 is eliminatable, but not eliminated!

PRE using E-paths
For an E-path i.. k
a) Insertions: For a node j in (i.. k]
   - Insert e in edge (g, j) if g is not in an E-path and has a successor which is not in an E-path
   - Insert e in predecessor g if g is not in an E-path and all its successors are in E-paths
b) Save: Save the computation of e in node i, unless i is the end-node of some E-path h.. i (in which case it would be deleted).
c) Deletion: Delete the occurrence of e in node k.
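
The insertion rule in a) could be sketched as follows. The function name `insertion_points` and the graph are hypothetical; `on_e_path` is assumed to be the set of all nodes that lie on some E-path for e, computed elsewhere.

```python
def insertion_points(path, preds, succs, on_e_path):
    """Return (edge insertions, node insertions) for the E-path i..k given in 'path'."""
    edge_ins, node_ins = set(), set()
    for j in path[1:]:  # nodes in (i..k]
        for g in preds[j]:
            if g in on_e_path:
                continue
            if any(s not in on_e_path for s in succs[g]):
                # g also leads somewhere outside E-paths: insert on the edge only.
                edge_ins.add((g, j))
            else:
                # All successors of g are in E-paths: insert in g itself.
                node_ins.add(g)
    return edge_ins, node_ins


# Hypothetical graph: E-path i -> m -> j -> k, with side entries g1 -> j and g2 -> k.
# g1 has another successor x outside the E-path; g2 has no other successor.
path = ["i", "m", "j", "k"]
preds = {"m": ["i"], "j": ["m", "g1"], "k": ["j", "g2"]}
succs = {"g1": ["j", "x"], "g2": ["k"]}
on_e_path = {"i", "m", "j", "k"}

print(insertion_points(path, preds, succs, on_e_path))
# ({('g1', 'j')}, {'g2'})
```

Inserting on the edge (g1, j) rather than in g1 avoids computing e on the path through x, where it is not needed.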

PRE using E-paths
E-path i.. k may contain 3 kinds of segments
- Avail. ¬Ant segment
- Avail. Ant segment
- ¬Avail. Ant segment : This is called the “E-path suffix”.
Find a node m : ¬Avail(m). Anticipatable(m). ∑ Avail(p), p=pred. This is the start node of the E-path suffix.
- Trace Avail. ¬Ant segment backwards from m to find node i, the start of the E-path and perform a save in it
- Trace ¬Avail. Ant segment forward from m
  a) to perform appropriate insertion for in-edges
  b) to find k and perform a deletion
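
The condition for the start node of an E-path suffix can be written as a one-line predicate. The sketch assumes that Avail(m) and Anticipatable(m) are read at the entry of m and that Avail(p) is read at the exit of each predecessor p; the slide does not state which program points are meant, so this reading, like the name `suffix_start_nodes` and the example values, is an assumption.

```python
def suffix_start_nodes(preds, avail_in, avail_out, ant_in):
    """Nodes m with ¬Avail(m) · Anticipatable(m) · ∑ Avail(p) over predecessors p."""
    return [
        m for m in preds
        if not avail_in[m]
        and ant_in[m]
        and any(avail_out[p] for p in preds[m])
    ]


# Diamond: 1 -> 2, 1 -> 3, 2 -> 4, 3 -> 4; e is computed in nodes 2 and 4.
preds = {1: [], 2: [1], 3: [1], 4: [2, 3]}
avail_in = {1: False, 2: False, 3: False, 4: False}
avail_out = {1: False, 2: True, 3: False, 4: True}
ant_in = {1: False, 2: True, 3: False, 4: True}

print(suffix_start_nodes(preds, avail_in, avail_out, ant_in))  # [4]
```

Node 2 is rejected because none of its predecessors makes e available, so no E-path can enter it from above.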

Segments in an E-Path
[Figure: an E-path for a*b divided into three segments, with the start node of the E-path suffix marked]
a) Avail · ¬Ant
b) Avail · Ant
c) ¬Avail · Ant (E-path suffix)
E-path suffix: insertions may be needed in paths joining it.

Simple data flows for E-path_PRE
- Comp: e is locally available (i.e. downwards exposed) in node
- Antloc: e is locally anticipatable (i.e. upwards exposed) in node
- Transp: node does not contain definitions of e’s operands
(Terminology is from the Morel-Renvoise algorithm.)
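For a basic block, the three local properties can be computed in one pass over its statements. This is a minimal sketch, assuming each statement is a (target, rhs) pair and that an assignment takes effect after its right-hand side is evaluated; `Stmt` and `local_properties` are hypothetical names.

```python
from typing import NamedTuple, Optional, Sequence, Tuple

Expr = Tuple[str, str, str]        # (left operand, operator, right operand)


class Stmt(NamedTuple):
    target: str                    # variable assigned by the statement
    rhs: Optional[Expr]            # None: right-hand side is not a tracked expression


def local_properties(block: Sequence[Stmt], e: Expr) -> Tuple[bool, bool, bool]:
    """Return (Comp, Antloc, Transp) of expression e for one basic block."""
    operands = {e[0], e[2]}
    comp = antloc = False
    transp = True                  # no operand of e defined so far
    for s in block:
        if s.rhs == e:
            if transp:             # e is computed before any operand definition
                antloc = True
            comp = True            # e is computed; later definitions may undo this
        if s.target in operands:   # the assignment happens after the rhs is evaluated
            transp = False
            comp = False
    return comp, antloc, transp


ab = ("a", "*", "b")
only_use = [Stmt("t", ab)]                      # t = a*b
def_then_use = [Stmt("a", None), Stmt("t", ab)] # a = ..; t = a*b
use_then_def = [Stmt("t", ab), Stmt("a", None)] # t = a*b; a = ..
```

A block such as a = a*b is upwards exposed but neither downwards exposed nor transparent, which the sketch reports as (False, True, False).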

Simple data flows for E-path_PRE
- Availability and Anticipatability (i.e. very busy exps.)
- Eps_in/Eps_out (Node is in E-path suffix)
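The equations on this slide were not recovered, so the sketch below uses the textbook forms of availability (forward, all-paths) and anticipability (backward, all-paths) for a single expression, solved by round-robin iteration. The function name `solve_avail_ant`, the dictionary layout and the assumption that every node other than the entry has a predecessor are illustrative choices, not the slide's own formulation.

```python
def solve_avail_ant(succ, entry, exit_, comp, antloc, transp):
    """Round-robin solution of availability and anticipability for one expression.

    Assumed textbook equations:
      Avin(n)   = AND of Avout(p) over predecessors p   (False at entry)
      Avout(n)  = Comp(n) or (Avin(n) and Transp(n))
      Antout(n) = AND of Antin(s) over successors s     (False at exit)
      Antin(n)  = Antloc(n) or (Antout(n) and Transp(n))
    """
    nodes = list(succ)
    pred = {n: [] for n in nodes}
    for n in nodes:
        for s in succ[n]:
            pred[s].append(n)

    # Optimistic initial values, except at the boundaries.
    av_in = {n: n != entry for n in nodes}
    av_out = {n: True for n in nodes}
    ant_in = {n: True for n in nodes}
    ant_out = {n: n != exit_ for n in nodes}

    changed = True
    while changed:
        changed = False
        for n in nodes:
            new_av_in = n != entry and all(av_out[p] for p in pred[n])
            new_av_out = comp[n] or (new_av_in and transp[n])
            new_ant_out = n != exit_ and all(ant_in[s] for s in succ[n])
            new_ant_in = antloc[n] or (new_ant_out and transp[n])
            new = (new_av_in, new_av_out, new_ant_in, new_ant_out)
            if new != (av_in[n], av_out[n], ant_in[n], ant_out[n]):
                av_in[n], av_out[n], ant_in[n], ant_out[n] = new
                changed = True
    return av_in, av_out, ant_in, ant_out


# Diamond 1 -> {2, 3} -> 4 with a*b computed in nodes 2 and 4 only.
succ = {1: [2, 3], 2: [4], 3: [4], 4: []}
uses = {1: False, 2: True, 3: False, 4: True}   # serves as both Comp and Antloc
transp = {n: True for n in succ}
av_in, av_out, ant_in, ant_out = solve_avail_ant(succ, 1, 4, uses, uses, transp)
```

In this example a*b is anticipatable but not available at entry to node 4, while it is available at exit of node 2: the occurrence in node 4 is partially redundant.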

Simple data flows for E-path_PRE
- Availability and Anticipatability
- Eps_in/Eps_out (Node is in E-path suffix)
- SA_in/SA_out (A save should be “performed above”)

Efficiency of E-path_PRE data flows
The generalized theory of bit-vector data flow analysis by Khedker, Dhamdhere (1994) defines two concepts for determining the cost of data flow analysis:
- Information flow path (ifp): a graph path along which data flow information may “flow” during data flow analysis. (Information “flow”: values of data flow properties change from ‘lattice top’ to ‘lattice bot’ during iterative data flow analysis.)
- “Width” of a graph (reduces to depth of a graph for unidirectional data flows)
The number of bit-vector operations during work-list iterative data flow analysis depends on the length of an ifp, and the number of iterations during round-robin iterative data flow analysis depends on the width of an ifp.
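A toy experiment illustrates why the number of round-robin iterations depends on how information flows relative to the node visiting order. It is only an illustration under simple assumptions (a straight-line chain, a forward availability-style flow, the hypothetical function `round_robin_passes`), not the formal definition of width from Khedker and Dhamdhere.

```python
def round_robin_passes(n, order):
    """Count round-robin passes for a forward flow on the chain 1 -> 2 -> ... -> n.

    The expression is never computed and every node is transparent, so the
    value False at the entry must travel along the whole chain.
    """
    av_out = {i: True for i in range(1, n + 1)}   # optimistic start (lattice top)
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for i in order:
            new = False if i == 1 else av_out[i - 1]
            if new != av_out[i]:
                av_out[i] = new
                changed = True
    return passes


forward = round_robin_passes(5, [1, 2, 3, 4, 5])    # visits nodes along the flow
backward = round_robin_passes(5, [5, 4, 3, 2, 1])   # visits nodes against the flow
```

Visiting along the flow, one pass propagates the information and a second confirms convergence. Visiting against the flow, each pass moves the information by a single node.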

Efficiency of E-path_PRE data flows
- The Eps_in/out data flow of E-path_PRE has been designed to have “short” information flow paths. This may also lead to a small width of a program graph.
- Short information flow paths and small width lead to smaller solution times of data flows.
- This is borne out by experimentation: a comparison with the “later” data flow of Drechsler, Stadel (1993) (Dhamdhere 2002):
  - In worklist solution: no. of bit vector operations is 80% smaller
  - In round-robin iterative solution: no. of iterations is 37% smaller

Code placement models in PRE
- Node model
  - Simple node model: each node contains a single statement
  - Basic block model: each node is a basic block
- Insertion and Saving model
  - Saving in situ: value of an expression is saved in the place where it is located
  - Saving in entry/exit of node: an expression is moved to node entry/exit if its value is to be saved
  - Insertion at entry/exit of node
  - Unified insertion and saving: this is possible only when saving is done at node entry/exit

Code placement models in PRE
- Morel-Renvoise Algorithm (MRA): basic blocks, saving in situ, insertion at exit
- Edge placement algorithm (EPA): basic blocks, saving in situ, insertions at node exit and in critical edges (edge splitting performed on a needs basis)
- Lazy Code Motion (LCM): simple nodes, unified saving and insertion, insertion at node entries and in blocks inserted in join edges in a priori edge splitting
- E-path_PRE: basic blocks, saving in situ, insertions at node exit and in critical edges
- SIM-PRE: basic blocks, saving in situ, insertion strictly along edges

Evaluation of code placement models using E-paths
- Morel-Renvoise algorithm (MRA): missed opportunities of optimization (seen before)
- Lazy code motion (LCM): performs insertion in a join edge (p, j) even if it could have been performed in node p
[Figure: flow graph with nodes 1, 2 and 3 showing where a*b is inserted]

Evaluation of code placement models using E-paths
Optimal code motion (OCM), Knoop et al. 1994
- Basic blocks, hybrid model, insertions at node entry and exit
- Hybrid: uniform insertion and saving model, but saving is performed in situ
- No insertions and savings will be performed at entry to a node (Lemmas 19 and 23). Hence this feature is redundant.

Evaluation of code placement models using E-paths
Complete elimination of partial redundancies (ComPRE), Bodik, Gupta and Soffa (1998) (when adapted to classical PRE)
- Simple nodes, unified saving and insertion only in edges
- An expression in a node is redundantly hoisted into its entry-edges
- Addressing this problem will require an additional data flow problem, making it less efficient than E-path_PRE.

Later work
- SIM-PRE by J. Xue, J. Knoop (2006): inserts only along edges

SIM-PRE by Xue and Knoop
[Figure: a data flow of SIM-PRE; not recovered]
- This data flow traces an E-path!

SIM-PRE by Xue and Knoop
[Figure: flow graph with nodes i, m, j, k, g1, g2 and l, showing a*b and t = a*b]
- i.. k is an E-path
- Insertion in edge (g1, j) and node g2 is lifetime optimal
- SIM-PRE inserts in edges (g1, j) and (g2, l)

SIM-PRE by Xue and Knoop
- SIM-PRE performs better than E-path_PRE in bit vector operations (graphic is from J. Xue, J. Knoop (2006))
- However, it adds almost 50% more new blocks than E-path_PRE (Dheeraj Kumar, 2006)

Work by Dheeraj Kumar
Simplified the data flows of E-path_PRE
- Eps_in/out data flow finds nodes {i} that belong to an E-path and have Antout_i = true
- SA_in/out data flow finds nodes {i} that belong to an E-path and have Avout_i = true
(M. Tech. dissertation, IIT Bombay)

Work by Dheeraj Kumar (2006)
Simplified data flows of E-path_PRE (Proposal 2):

Work by Dheeraj Kumar (2006)
Experimental results
- SPECcpu2000 benchmark under GCC
- Proposal 2 performance
  - Bit map operations 5.5% smaller than SIM-PRE in worklist and 15.8% smaller in iterative
  - Introduced 30% fewer blocks than SIM-PRE

Thus, eliminatability paths offer..
- A conceptual basis for PRE
- A versatile basis for PRE
- A basis for evaluating effectiveness of an approach to PRE
(Efficiency is a bonus!)