Eliminating Memory References
Joshua Dunfield, Alina Oprea



Problem Definition: Register Promotion
- Memory-reuse analysis: find loads and stores that access the same address, and the execution paths along which the reuse exists
- Program transformation: promote values from memory to registers; replace redundant loads and stores with register references

Different Approaches
- Memory-reuse analysis
  1. Using the SSAPRE algorithm
  2. Modeled as a data-flow problem
- Program transformation: formulated as a PRE problem
[Diagram: load a, load b, store c, load d on paths 1, 2, 3; a = d on path 1, b = d on path 2, c != d]

Register promotion using the SSA representation
Paper: “Register Promotion by Sparse Partial Redundancy Elimination of Loads and Stores” (Lo, Chow, Kennedy, Liu, Tu)
Register promotion = 2 problems:
1. PRE of loads
2. PRE of stores

Duality between loads and stores
- Loads: behave like ordinary expressions with respect to redundancy (the later occurrences are the ones to delete)
- Stores: the reverse (the earlier occurrences are the ones to delete)
[Diagram: load a, store a]
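The duality can be illustrated inside a single basic block. The sketch below is a simplification, not the SSAPRE algorithm from the paper: it assumes straight-line code, no aliasing (distinct address names are distinct locations), and a hypothetical `(kind, address)` tuple encoding of memory operations. Loads are scanned forward and the later occurrence is flagged; stores are scanned backward and the earlier occurrence is flagged.

```python
from typing import List, Tuple

# Hypothetical encoding: (kind, address), kind is "load" or "store".
Op = Tuple[str, str]

def redundant_loads(block: List[Op]) -> List[int]:
    """Indices of loads whose address was already accessed earlier in the block."""
    seen = set()
    redundant = []
    for i, (kind, addr) in enumerate(block):
        if kind == "load" and addr in seen:
            redundant.append(i)
        # After a load or a store, the value at addr is known.
        seen.add(addr)
    return redundant

def redundant_stores(block: List[Op]) -> List[int]:
    """Indices of stores overwritten by a later store with no load in between."""
    pending = set()  # addresses stored again later, before any load of them
    redundant = []
    for i in range(len(block) - 1, -1, -1):  # backward scan
        kind, addr = block[i]
        if kind == "store":
            if addr in pending:
                redundant.append(i)
            pending.add(addr)
        else:
            # A load needs the earlier store's value, so it is not dead.
            pending.discard(addr)
    return sorted(redundant)

block = [("load", "a"), ("load", "a"), ("store", "a"), ("store", "a")]
print(redundant_loads(block))   # the second load
print(redundant_stores(block))  # the first store
```

The forward scan for loads and the backward scan for stores mirror the "available" versus "anticipated" distinction used later in the deck.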

PRE of loads
- Replace each store x ← expr by r ← expr; x ← r, where r is a pseudo-register
- Apply the SSAPRE algorithm, but take the occurrences of stores into account
- Effect of stores on loads: x ← r followed by load x (the load can reuse r)
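A minimal sketch of the store rewrite and its effect on later loads, restricted to one basic block. This is only a local illustration under stated assumptions (no aliasing, straight-line code); it is not SSAPRE itself. The tuple IR (`"store"`, `"load"`, `"assign"`, `"copy"`) and the `r_<address>` naming of pseudo-registers are hypothetical.

```python
def promote_block(block):
    """Rewrite stores through a pseudo-register and reuse it for later loads.

    Input ops (hypothetical IR):
      ("store", x, expr)  meaning  x <- expr
      ("load", t, x)      meaning  t <- load x
    """
    available = set()  # addresses whose value currently sits in r_<address>
    out = []
    for op in block:
        if op[0] == "store":
            _, x, expr = op
            r = "r_" + x
            out.append(("assign", r, expr))  # r <- expr
            out.append(("store", x, r))      # x <- r
            available.add(x)
        else:
            _, t, x = op
            r = "r_" + x
            if x not in available:
                out.append(("load", r, x))   # first access: real load into r
                available.add(x)
            out.append(("copy", t, r))       # register reference replaces the load
    return out

print(promote_block([("store", "x", "a+b"), ("load", "t", "x")]))
```

In the example, the load of `x` after the store becomes a register copy, which is the effect of stores on loads described on the slide.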

Improving Code Motion in SSAPRE by Speculation
- Speculation = inserting computations during SSAPRE at Φ's where the computation is not down-safe (anticipated)
- Not permitted by the original SSAPRE
- 2 strategies:
  - conservative speculation (when profile data is not available)
  - profile-driven speculation

Speculation
- Conservative speculation: move loop-invariant computations out of single-entry loops (can perform worse if the body of the loop is not executed)
- Profile-driven speculation: the problem of determining the optimum code placement is undecidable (the solution lies between no speculation and full speculation)
- Heuristic: speculate at the granularity of the connected components of the SSA graph (each connected component is either not speculated or fully speculated)
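The cost of conservative speculation when the loop body never runs can be seen by counting loads. The sketch below is an illustration only: `run` and its counter are hypothetical, and the "load" is simulated by a function call that returns a fixed value.

```python
def run(trip_count, speculate):
    """Sum a loop-invariant loaded value trip_count times; return (total, loads)."""
    loads = 0

    def load_x():
        nonlocal loads
        loads += 1
        return 42  # stand-in for the value read from memory

    total = 0
    if speculate:
        r = load_x()  # hoisted above the loop: executed even if the loop is skipped
        for _ in range(trip_count):
            total += r
    else:
        for _ in range(trip_count):
            total += load_x()  # reloaded on every iteration
    return total, loads

print(run(10, speculate=False))  # 10 loads
print(run(10, speculate=True))   # 1 load
print(run(0, speculate=False))   # 0 loads
print(run(0, speculate=True))    # 1 load: speculation is worse here
```

With a zero trip count the hoisted load is pure overhead, which is why profile data helps decide where speculation pays off.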

PRE of Stores: SSU form

                   Loads                        Stores
Single             Assignment                   Use
Redundant          Available, later             Anticipated, earlier
Fully redundant    Dominated (by earlier load)  Post-dominated (by later store)
Where to factor    Merge points                 Split points
Factoring op       Φ                            Λ
Insertion points   Iterated DF                  Iterated post-DF
Movement           Backward                     Forward

Performance
- PRE of loads reduces the number of loads by 25%
- PRE of stores reduces the number of stores by 1%
  - Reason: a dead-store elimination algorithm had already been applied
- Speculation results:
  - conservative: same performance, or even worse in some cases
  - profile-driven: 2% reduction in the number of loads and 0.5% in the number of stores

Load-Reuse Analysis
Paper: “Load-Reuse Analysis: Design and Evaluation” (Bodik, Gupta, Soffa)
Modeled as a data-flow problem
Three contributions:
1. load-reuse analysis supporting indirect memory accesses
2. simulation of the dynamic amount of load reuse
3. profile-based estimators: use edge-profile information to assign a dynamic weight to the static load-reuse analysis
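One simple way a profile-based estimator could weight static results is sketched below. This is an assumption for illustration, not necessarily the formulation used in the paper: each load is described by its incoming control-flow edges, each with a profiled execution count and a flag saying whether the static analysis found reuse along that edge. The function name and data layout are hypothetical.

```python
def estimated_dynamic_reuse(loads):
    """Fraction of executed loads that the static analysis marks as reusable.

    loads: one entry per static load, each a list of
           (edge_count, reuse_available) pairs for its incoming edges.
    """
    total = 0
    reused = 0
    for incoming_edges in loads:
        for edge_count, reuse_available in incoming_edges:
            total += edge_count
            if reuse_available:
                reused += edge_count
    return reused / total if total else 0.0

# A partially redundant load (reuse on the hot edge only) and a load with no reuse.
profile = [
    [(90, True), (10, False)],
    [(50, False)],
]
print(estimated_dynamic_reuse(profile))
```

A load that is only partially redundant still contributes most of its executions when the edge carrying the reuse is the hot one, which a purely static count would not show.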

Framework

Load-Reuse Analysis
- Uses the Value Name Graph (VNG) representation, which keeps track of address expressions that compute the same value
- Traditionally, values are identified by lexical name; the VNG supports symbolic equivalences
- Enhancements to the VNG:
  - handle indirect addressing
  - develop a sparse version (more space-efficient)

VNG Example
(name = address of last load)

name = 2v + 10
u = store (v-2)
name = 2(*u) + 12
x = load (u)
name = 2x + 12
y = 2x + 8
name = y + 4
z = load (y+4)
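The names in the example arise by substituting definitions backward, starting from the address of the last load. The sketch below reproduces the first two steps (y + 4 becomes 2x + 12, then 2(*u) + 12). It is a simplification under stated assumptions: names are linear expressions only, stored as a dictionary from variable to coefficient with the hypothetical key `""` holding the constant term, and `*u` is treated as an opaque symbol for the value loaded from u.

```python
from typing import Dict

# Linear expression: variable -> coefficient; the key "" holds the constant.
Linear = Dict[str, int]

def substitute(name: Linear, var: str, definition: Linear) -> Linear:
    """Replace var in name by its defining expression (one backward step)."""
    if var not in name:
        return dict(name)
    coeff = name[var]
    result = {k: v for k, v in name.items() if k != var}
    for k, v in definition.items():
        result[k] = result.get(k, 0) + coeff * v
    # Drop terms that cancelled out.
    return {k: v for k, v in result.items() if v != 0}

name = {"y": 1, "": 4}                              # z = load (y+4): name = y + 4
name = substitute(name, "y", {"x": 2, "": 8})       # y = 2x + 8
print(name)                                         # 2x + 12
name = substitute(name, "x", {"*u": 1})             # x = load (u)
print(name)                                         # 2(*u) + 12
```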

Load-Reuse Analysis (cont.)
- When computing symbolic names, perform substitutions for w iterations of each loop
- Set the maximum number of indirection levels (0 or 1)
- Find congruence classes (names that refer to the same memory address)
- Extract a sparse VNG representation that contains only loads and stores
- Solve the data-flow problem on the sparse VNG representation
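The congruence-class step can be approximated by grouping names that normalize to the same symbolic expression. The sketch below is a hedged illustration, not the paper's VNG construction: it assumes each memory operation already has a linear symbolic name (variable to coefficient, with the hypothetical key `""` for the constant), and the labels `load1`, `load2`, `store1` are made up for the example.

```python
from collections import defaultdict

def congruence_classes(names):
    """Group operation labels whose symbolic address names are identical.

    names: dict mapping a label to a linear expression
           (variable -> coefficient, "" for the constant term).
    """
    classes = defaultdict(list)
    for label, linear in names.items():
        # Canonical key: sorted nonzero terms, so term order does not matter.
        key = tuple(sorted((k, v) for k, v in linear.items() if v != 0))
        classes[key].append(label)
    return sorted(sorted(members) for members in classes.values())

names = {
    "load1": {"x": 2, "": 12},
    "load2": {"": 12, "x": 2},   # same address, written in a different order
    "store1": {"v": 1},
}
print(congruence_classes(names))
```

Operations in the same class refer to the same memory address, so they are the candidates for reuse when the data-flow problem is solved on the sparse representation.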