Load-Reuse Analysis: Design and Evaluation
Rastislav Bodík, Rajiv Gupta, Mary Lou Soffa

Partial Redundancy Elimination (PRE)
Partially redundant = computed on some incoming paths
[Figure: control-flow graph with x := a+b, y := a+b, and a := ..]

Steps:
- find “reuse” paths,
- remove redundancy from “reuse” paths.
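The first step, finding reuse paths, can be illustrated with a minimal sketch. It assumes a hypothetical diamond-shaped control-flow graph in which one branch computes a+b and the other assigns a; the block names and the helpers `paths` and `is_reuse_path` are illustrative and do not come from the slides. It enumerates paths explicitly, which only works for tiny acyclic examples and is not how a data-flow analysis would do it.

```python
def paths(cfg, node, target, prefix=()):
    """Yield every acyclic path from node to target as a tuple of block names."""
    prefix = prefix + (node,)
    if node == target:
        yield prefix
        return
    for succ in cfg.get(node, ()):
        if succ not in prefix:
            yield from paths(cfg, succ, target, prefix)


def is_reuse_path(path, computes, kills):
    """True if a+b is still available when the last block of the path is reached."""
    available = False
    for block in path[:-1]:
        if block in computes:
            available = True
        if block in kills:
            available = False
    return available


# Hypothetical CFG: "left" holds x := a+b, "right" holds a := .., "join" holds y := a+b.
cfg = {"entry": ["left", "right"], "left": ["join"], "right": ["join"]}
computes = {"left"}
kills = {"right"}

all_paths = list(paths(cfg, "entry", "join"))
reuse_paths = [p for p in all_paths if is_reuse_path(p, computes, kills)]

# Partially redundant: reuse exists on some, but not all, incoming paths.
partially_redundant = 0 < len(reuse_paths) < len(all_paths)
print(reuse_paths, partially_redundant)
```

Under these assumptions only the path through "left" carries reuse, so y := a+b is partially redundant and the second step would remove the recomputation on that path.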

Register promotion = PRE of loads
Three steps:
- load-reuse analysis: find loads that can reuse prior loads/stores
- alias analysis: which stores may kill reuse?
- transformation: remove redundancy: PRE [PLDI ‘98]
[Figure: store a1, x; store a3; load a2; load a4]

Load-reuse analysis
Design goal: completeness (find all reuse)
To approach completeness, the analysis is
- uniform: analyze scalar, array, and pointer loads
- path-sensitive: different source of reuse on each path
Evaluation goal: how complete? Compare with ideal analysis.
Detecting all reuse is undecidable:
- no ideal algorithm exists
- instead, use simulation

Experimental framework
[Diagram: inputs: program, input; components: load-reuse analysis, simulator, estimator, transformation [PLDI ‘98], comparison; labels: data-flow solution, profile, weighted solution, reuse level]

1. Load-reuse analysis
It’s a data-flow analysis on a reuse-aware representation: the Value Name Graph (VNG) [POPL ’98]
What’s new?
- Sparse version of the VNG: up to 30 times smaller than the non-sparse version
- Analyzing indirect loads/stores; also, modeling killing stores

Naming the value
y := b+c
a := c-1
x := a+b+1
Names for the value in ‘x’: x, a+b+1, b+c
[Figure: chain of value names x, a+b+1, b+c, each annotated with data-flow value 1; GEN]
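The three names for the value in x can be reproduced by substituting backward through the assignments. The following is a minimal sketch, assuming linear expressions stored as dictionaries from variable name to coefficient, with the made-up key "const" for the constant term; `substitute` is an illustrative helper, not the VNG construction itself.

```python
def substitute(expr, var, rhs):
    """Rewrite expr, valid after `var := rhs`, into an equal expression valid before it."""
    coeff = expr.get(var, 0)
    out = {k: v for k, v in expr.items() if k != var}
    for k, v in rhs.items():
        out[k] = out.get(k, 0) + coeff * v
    # Drop terms that cancelled out.
    return {k: v for k, v in out.items() if v != 0}


# Start from the name "x" and walk backward through the program.
name_x = {"x": 1}
# Across x := a+b+1 the value is named a+b+1.
name_1 = substitute(name_x, "x", {"a": 1, "b": 1, "const": 1})
# Across a := c-1 the value is named (c-1)+b+1 = b+c.
name_2 = substitute(name_1, "a", {"c": 1, "const": -1})

print(name_1)  # a + b + 1
print(name_2)  # b + c
```

The last name, b+c, is exactly the expression stored in y by y := b+c, which is why the value of x can be reused from y.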

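Naming the value across loads
[Figure: statements .. := p->f, .. := p->next->f, *r := ..., p := p->next; value names *p and **(p+4), annotated with data-flow value 1; field offsets: f = 0, next = 4; GEN]
KILL: the store *r := ... kills the name **(p+4) if r = p+4 or r = *(p+4)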
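The kill condition can be read as: a store through r may kill a value name if r equals any address that is dereferenced while evaluating that name. A minimal sketch of that reading follows; names are plain strings, and `deref_addresses` and `may_kill` are hypothetical helpers with the alias test passed in as a function, since the slides treat alias analysis as a separate step.

```python
def deref_addresses(base, depth):
    """Addresses read when `base` is dereferenced `depth` times."""
    addrs, expr = [], base
    for _ in range(depth):
        addrs.append(expr)
        expr = f"*({expr})"
    return addrs


def may_kill(store_addr, base, depth, may_alias):
    """A store to store_addr may kill the name if it may alias any address the name reads."""
    return any(may_alias(store_addr, a) for a in deref_addresses(base, depth))


# The name **(p+4) reads address p+4 (the `next` field), then address *(p+4) (the `f` field).
print(deref_addresses("p+4", 2))  # ['p+4', '*(p+4)']

# With exact equality standing in for alias analysis:
same = lambda a, b: a == b
print(may_kill("*(p+4)", "p+4", 2, same))  # True
print(may_kill("q", "p+4", 2, same))       # False
```

A real alias analysis would replace `same` with a conservative may-alias query, so more stores would be reported as killing.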

Sparse representation
for I = 1, N { .. := A[I] + A[I-1] }
Loop body with explicit addresses:
a1 := A+I
load a1
a2 := A+I-1
load a2
I := I+1
[Figure: sparse graph with nodes load a1 and load a2; Ø, Ø, GEN]

2. The simulator algorithm
for I = 1, N { .. := A[I] + A[I-1] }
[Figure: load a1, load a2, Ø; memory access history holding A[I-1], A[I]; history length = 1 to 4]
The simulator detects all PRE-exploitable reuse (up to the given history length), but also some “noise”, e.g. due to hash table accesses.
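The slide does not spell out the simulator, so the following is only a rough sketch of one possible history-based scheme, not the authors' algorithm. It assumes that each static load or store remembers the last `history` addresses it accessed, that a dynamic load has reuse when its address is in any remembered history, and that a store invalidates older remembered copies of its address. The trace format and the function `simulate` are made up for illustration.

```python
from collections import deque


def simulate(trace, history=1):
    """trace: iterable of (instr_id, kind, address), kind is 'load' or 'store'.

    Returns (loads_with_reuse, total_loads).
    """
    recent = {}  # instr_id -> last `history` addresses accessed by that instruction
    reused = total = 0
    for instr, kind, addr in trace:
        if kind == "load":
            total += 1
            if any(addr in h for h in recent.values()):
                reused += 1
        else:
            # A store overwrites the location, so older remembered values are stale.
            for h in recent.values():
                while addr in h:
                    h.remove(addr)
        recent.setdefault(instr, deque(maxlen=history)).append(addr)
    return reused, total


# Trace of: for I = 1, N { .. := A[I] + A[I-1] } with A at address 100 and N = 4.
A, N = 100, 4
trace = []
for i in range(1, N + 1):
    trace.append(("a1", "load", A + i))      # load A[I]
    trace.append(("a2", "load", A + i - 1))  # load A[I-1]

print(simulate(trace, history=1))  # (0, 8)
print(simulate(trace, history=2))  # (3, 8)
```

In this particular scheme the reuse of A[I] by the next iteration's A[I-1] only shows up with a history of two, because load a1 has already executed again by the time load a2 runs. Whether the real simulator counts history the same way is not stated on the slide.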

Ideal amount of load reuse
65% of executed loads have reuse exploitable by PRE (intra-procedural reuse, history = 1)
[Chart: % of all dynamic loads with reuse, for history lengths 1 and 4; benchmarks: go, m88ksim, gcc, compress, li, ijpeg, vortex, tomcatv, swim, su2cor, hydro]

3. How frequent is the reuse?
Edge profile:
+ cheap and available
- cannot reconstruct frequencies of reuse paths
[Figure: load x, kill x, load x]
Path profile:
+ precise
- more expensive
Use the edge profile, but bound its inherent error: compute lower and upper bounds on reuse.

Hierarchy of estimators
Estimator: data-flow solution + edge profile -> weighted data-flow solution
Hierarchy: PRE, CMP 1, CMP c, CMP r, CMP f, in order of smaller error (but more complex)
A practical approach: is a simple estimator not precise enough? Use the next better one!

The algorithms
1. The bounds:
   generators: points generating reuse
   stealers: points with no reuse
   upper bound: all reuse consumed
   lower bound: all reuse stolen
   [Figure: load x, kill x, load x]
2. Separating uncertainty: using the CMP region defined for PRE [PLDI ‘98]
   CMP = code-motion preventing
   All error is contained in the CMP region!
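The bounds in item 1 can be illustrated on the simplest uncertain case: a point where flow that carries reuse and flow that does not carry reuse merge, and then split toward a load and toward a point with no reuse. This is one reading of the slide, sketched under that assumption; the function `reuse_bounds` and its parameters are hypothetical. An edge profile gives only the four frequencies, not which incoming flow takes which outgoing edge.

```python
def reuse_bounds(reuse_in, no_reuse_in, load_freq):
    """Bound how often the load sees reuse, given only edge frequencies.

    reuse_in:    frequency of incoming flow that carries reuse (from generators)
    no_reuse_in: frequency of incoming flow that carries no reuse
    load_freq:   frequency of the edge to the load; the remaining flow leaves
                 toward points with no reuse (stealers)
    """
    total = reuse_in + no_reuse_in
    if not 0 <= load_freq <= total:
        raise ValueError("load_freq must lie between 0 and the total inflow")
    steal_freq = total - load_freq
    upper = min(reuse_in, load_freq)        # all reuse consumed by the load
    lower = max(0, reuse_in - steal_freq)   # stealers take as much reuse as they can
    return lower, upper


# 50 executions arrive with reuse, 50 without; the load executes 60 times.
print(reuse_bounds(50, 50, 60))  # (10, 50)
```

When the two bounds coincide the edge profile is exact for that load; otherwise the gap is the uncertainty that a more precise estimator, or a path profile, would have to resolve.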

Improving precision
“one” region, connected regions, control flow reachability, network flow reachability

Estimators: precision
[Chart: estimator error for PRE, CMP 1, CMP c, CMP r, CMP f on INT and FP benchmarks; arrow labeled “smaller error”]

4. Analysis: how close to ideal?
[Chart: 100% = reuse seen by simulator. Labels: *p, **p, calls. Reuse killed by: array & pointer stores + calls; all stores + calls; ideal alias info]

Related Work
Load-reuse analysis
- makes value numbering path-sensitive
- Steffen, Knoop, Rüthing: Value Flow Graph [ESOP ‘90]; we show how to analyze indirect loads, via symbolic evaluation
Simulation-based analysis evaluation
- Diwan, McKinley, Moss [PLDI ’98]: type-based alias analysis: how powerful does it need to be?
Estimators
- Ramalingam, “Frequency Analysis” [PLDI ’96]: returns a single estimate, not its bounds

Summary
Load-reuse analysis:
- reuse across indirect memory references
- sparse representation
Estimators: three principles
- confidence: bound the edge-profile error
- separation of uncertainty: inside/outside the CMP region
- hierarchy: increasing precision and complexity
Evaluation:
- about 65% of loads are amenable to PRE
- our analysis can find about 80% of those

Combine three removal methods [PLDI ‘98]
M: code motion
S: control speculation
R: restructuring

Example: a+b
[Figure: removal methods M, S, R applied to a+b; values 10 and 50]

Relative removal power
[Chart: loads removed (dynamic count, normalized) by M, S, R, compared with path-insensitive global CSE, on INT and FP benchmarks]