Load-Reuse Analysis: design and evaluation. Rastislav Bodík, Rajiv Gupta, Mary Lou Soffa

Partial Redundancy Elimination (PRE)
Partially redundant = computed on some, but not all, incoming paths.
[Figure: control-flow graph containing x := a+b, y := a+b, and a := ..; the computation a+b is highlighted.]
Steps:
- find "reuse" paths
- remove redundancy from "reuse" paths
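The definition of partial redundancy can be made concrete on a toy example. The following is only a minimal sketch, assuming a hypothetical four-block acyclic CFG (the names `CFG`, `paths_to`, `available_on_path`, and `classify` are illustrative and do not come from the slides); it enumerates paths by brute force rather than using the data-flow formulation a real PRE would use.

```python
# Hypothetical diamond CFG: entry -> left | right -> merge.
# Each block lists its statements as (target, expression) pairs.
CFG = {
    "entry": {"stmts": [], "succs": ["left", "right"]},
    "left": {"stmts": [("x", "a+b")], "succs": ["merge"]},
    "right": {"stmts": [], "succs": ["merge"]},
    "merge": {"stmts": [("y", "a+b")], "succs": []},
}

def paths_to(cfg, start, goal):
    """Enumerate all paths from start to goal in an acyclic CFG."""
    if start == goal:
        return [[goal]]
    return [[start] + rest
            for succ in cfg[start]["succs"]
            for rest in paths_to(cfg, succ, goal)]

def available_on_path(cfg, path, expr, operands):
    """True if expr is computed before the last block of the path
    and none of its operands is reassigned afterwards."""
    avail = False
    for block in path[:-1]:
        for target, rhs in cfg[block]["stmts"]:
            if rhs == expr:
                avail = True
            if target in operands:
                avail = False  # an assignment to an operand kills the value
    return avail

def classify(cfg, entry, block, expr, operands):
    """Classify expr at the start of block by counting the paths on which it is available."""
    paths = paths_to(cfg, entry, block)
    hits = sum(available_on_path(cfg, p, expr, operands) for p in paths)
    if hits == len(paths):
        return "fully redundant"
    return "partially redundant" if hits else "not redundant"

print(classify(CFG, "entry", "merge", "a+b", {"a", "b"}))
```

In this sketch the path through `left` is a "reuse" path and the path through `right` is not, which is why the computation in `merge` is only partially redundant.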

Register promotion = PRE of loads
Three steps:
- load-reuse analysis: find loads that can reuse prior loads/stores
- alias analysis: which stores may kill reuse?
- transformation: remove the redundancy using PRE [PLDI '98]
[Figure: control-flow graph with store a1, x; store a3; load a2; load a4.]

Load-reuse analysis
Design goal: completeness, i.e. find all reuse. To approach completeness, the analysis is
- uniform: it analyzes scalar, array, and pointer loads
- path-sensitive: a different source of reuse on each path
Evaluation goal: how complete is it? Compare with an ideal analysis.
Detecting all reuse is undecidable:
- no ideal algorithm exists
- instead, use simulation

Experimental framework
[Figure: framework diagram. Boxes: program, input, load-reuse analysis, simulator, estimator, transformation [PLDI '98], comparison. Labels on the flows: data-flow solution, profile, weighted solution, reuse level. Steps 1 to 4 are marked on the diagram.]

1. Load-reuse analysis
It is a data-flow analysis on a reuse-aware representation, the Value Name Graph (VNG) [POPL '98].
What's new?
- Sparse version of the VNG: up to 30 times smaller than the non-sparse one
- Analyzing indirect loads/stores, including a model of killing stores

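Naming the value
y := b+c
a := c-1
x := a+b+1
Names for the value in 'x': x, a+b+1, b+c. Substituting a = c-1 into a+b+1 gives b+c, the expression already computed into y.
[Figure: value name graph over the names x, a+b+1, and b+c, with edges labeled 1 and a GEN marker.]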
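The backward renaming on this slide can be reproduced with a few lines of symbolic substitution. This is a minimal sketch restricted to straight-line code and linear integer expressions, which is an assumption made here for brevity; the representation (a map from variable name to coefficient, with the key `"1"` for the constant term) and the helper names are hypothetical and are not the VNG construction itself.

```python
def substitute(expr, var, replacement):
    """Replace var in expr by replacement.
    Expressions are {name: coefficient} maps; the key "1" holds the constant term."""
    if var not in expr:
        return dict(expr)
    result = {k: v for k, v in expr.items() if k != var}
    scale = expr[var]
    for name, coeff in replacement.items():
        result[name] = result.get(name, 0) + scale * coeff
    return {k: v for k, v in result.items() if v != 0}  # drop cancelled terms

def value_names(program, var):
    """Walk a straight-line program backward and collect the successive
    names of the value that var holds at the end."""
    current = {var: 1}
    names = [current]
    for target, rhs in reversed(program):
        renamed = substitute(current, target, rhs)
        if renamed != current:
            names.append(renamed)
        current = renamed
    return names

# The three statements from the slide, in program order.
PROGRAM = [
    ("y", {"b": 1, "c": 1}),          # y := b+c
    ("a", {"c": 1, "1": -1}),         # a := c-1
    ("x", {"a": 1, "b": 1, "1": 1}),  # x := a+b+1
]

names = value_names(PROGRAM, "x")
print(names)
```

The last name in the list equals the right-hand side of `y := b+c`, so in this example the value of x is already held in y.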

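Naming the value across loads
.. := p->f
.. := p->next->f
*r := ...
p := p->next
Field offsets: f at 0, next at 4. The name of p->f is therefore *p, and the name of p->next->f is **(p+4); across p := p->next, the name *p becomes **(p+4).
KILL: the store *r := ... kills the name **(p+4) if r = p+4 or r = *(p+4).
[Figure: value name graph over the names *p and **(p+4), with edges labeled 1 and GEN and KILL markers.]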
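The kill condition can be checked on concrete values: evaluating `**(p+4)` dereferences exactly two addresses, and a store through `r` changes the named value only if it writes one of them. The sketch below assumes a hypothetical flat memory modeled as a Python dict from address to contents; the analysis itself reasons symbolically and relies on alias information, which this dynamic check does not model.

```python
def addresses_read(memory, p):
    """Addresses dereferenced when evaluating **(p+4) in the given memory."""
    first = p + 4            # address of p->next (offset 4)
    second = memory[first]   # address of p->next->f (offset 0 in the next node)
    return {first, second}

def store_kills(memory, p, r):
    """True if the store *r := ... can change the value named **(p+4)."""
    return r in addresses_read(memory, p)

# Hypothetical heap: a node at 100 whose next field (address 104) points to a node at 200.
memory = {100: 1, 104: 200, 200: 7, 204: 300}

print(store_kills(memory, 100, 104))  # r = p+4
print(store_kills(memory, 100, 200))  # r = *(p+4)
print(store_kills(memory, 100, 204))  # unrelated address
```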

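Sparse representation
for I = 1, N { .. := A[I] + A[I-1] }
Loop body in terms of addresses:
a1 := A+I
load a1
a2 := A+I-1
load a2
I := I+1
[Figure: sparse value name graph with the nodes load a1, load a2, and Ø, edges labeled 1, and a GEN marker.]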
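For readers who want to see the reuse in this loop spelled out, the following sketch writes the loop twice by hand: once with two loads per iteration and once with the value of A[I] carried into the next iteration, where it serves as A[I-1]. It assumes a 0-based Python list in place of the slide's array, and the function names are hypothetical; it illustrates the effect that finding this reuse enables, not the analysis or the sparse VNG.

```python
def sums_original(A):
    """Two array reads per iteration, as in the slide's loop."""
    out = []
    for i in range(1, len(A)):
        out.append(A[i] + A[i - 1])
    return out

def sums_promoted(A):
    """One array read per iteration: the previous A[i] is kept in a local."""
    out = []
    if len(A) < 2:
        return out
    prev = A[0]                  # the only read of A[i-1], done before the loop
    for i in range(1, len(A)):
        cur = A[i]               # the only array read inside the loop
        out.append(cur + prev)
        prev = cur               # carried to the next iteration
    return out

data = [3, 1, 4, 1, 5, 9]
print(sums_original(data) == sums_promoted(data))
```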

2. The simulator algorithm
for I = 1, N { .. := A[I] + A[I-1] }
Memory access history (history length = 1 to 4):
load a1 (A[I]): 103 102 101 100
load a2 (A[I-1]): 102 101 100 99
The simulator detects all PRE-exploitable reuse (up to the given history length), but also some "noise", e.g. due to hash table accesses.
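A history-based reuse check of this kind can be sketched in a few lines. The slide does not say exactly how the history is kept or how stores are treated, so the version below makes its own assumptions: each static instruction keeps its last `history` addresses, a dynamic load counts as having reuse if its address appears in any of those histories, and stores are recorded the same way as loads. The trace format and the names `simulate` and `loop_trace` are hypothetical.

```python
from collections import defaultdict, deque

def simulate(trace, history):
    """Return (loads_with_reuse, total_loads) for a trace of
    (kind, instruction, address) events."""
    recent = defaultdict(lambda: deque(maxlen=history))
    reused = total = 0
    for kind, instr, addr in trace:
        if kind == "load":
            total += 1
            # Reuse: some instruction accessed this address recently.
            if any(addr in past for past in recent.values()):
                reused += 1
        recent[instr].append(addr)  # loads and stores both enter the history
    return reused, total

def loop_trace(first, last):
    """Address trace of the slide's loop: load a1 reads A[I], load a2 reads A[I-1]."""
    trace = []
    for i in range(first, last + 1):
        trace.append(("load", "a1", i))
        trace.append(("load", "a2", i - 1))
    return trace

print(simulate(loop_trace(100, 103), history=2))
```

With these assumptions, load a2 finds its address in the history of load a1 in every iteration except the first. A check like this also reports accidental address matches, which is one way the "noise" mentioned on the slide can arise.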

Ideal amount of load reuse
65% of executed loads have reuse exploitable by PRE (intra-procedural reuse, history = 1).
[Chart: % of all dynamic loads with reuse, for history lengths 1 and 4. Benchmarks: go, m88ksim, gcc, compress, li, ijpeg, vortex, tomcatv, swim, su2cor, hydro.]

3. How frequent is the reuse?
Edge profile:
+ cheap and available
- cannot reconstruct the frequencies of reuse paths
Path profile:
+ precise
- more expensive
Use the edge profile, but bound its inherent error: compute lower and upper bounds on reuse.
[Figure: control-flow graph with load x, kill x, and load x, annotated with edge frequencies.]

Hierarchy of estimators
Estimator: data-flow solution + edge profile → weighted data-flow solution.
The estimators PRE, CMP 1, CMP c, CMP r, CMP f are ordered toward smaller error (but more complexity).
Hierarchy, a practical approach: is a simple estimator not precise enough? Use the next better one!

The algorithms
1. The bounds:
- generators: points generating reuse
- stealers: points with no reuse
- upper bound: all reuse consumed
- lower bound: all reuse stolen
[Figure: control-flow graph with load x, kill x, and load x, annotated with edge frequencies.]
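The idea behind the two bounds can be illustrated at a single merge point. This is only a sketch under strong simplifying assumptions, not the algorithm from the talk: one merge node, incoming edges split into those that carry reuse (generators) and those that do not, outgoing edges split into the consuming load and the stealers, and made-up frequencies. The function name and parameters are hypothetical.

```python
def reuse_bounds(gen_in, other_in, consumer_out, stealer_out):
    """Bounds on how many executions of the consumer load see reuse.

    gen_in       -- frequency arriving from generators (reuse available)
    other_in     -- frequency arriving without reuse
    consumer_out -- frequency flowing on to the load that can consume reuse
    stealer_out  -- frequency flowing on to points with no reuse (stealers)
    """
    # An edge profile conserves flow at every node.
    assert gen_in + other_in == consumer_out + stealer_out
    upper = min(gen_in, consumer_out)      # all reuse consumed
    lower = max(0, gen_in - stealer_out)   # as much reuse as possible stolen
    return lower, upper

# Hypothetical frequencies, not taken from the slide's figure.
print(reuse_bounds(gen_in=75, other_in=25, consumer_out=60, stealer_out=40))
```

The gap between the two numbers is the uncertainty that an edge profile leaves at this merge; a path profile would pin down a single value.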

2. Separating uncertainty: use the CMP region defined for PRE [PLDI '98] (CMP = code-motion preventing). All error is contained in the CMP region!

Improving precision
[Figure labels: "one" region, connected regions; control flow reachability, network flow reachability.]

Estimators: precision
[Chart: error of the estimators PRE, CMP 1, CMP c, CMP r, CMP f, for the INT and FP benchmarks; the error becomes smaller along the hierarchy.]

4. Analysis: how close to ideal?
[Chart: 100% = reuse seen by the simulator. Labels: *p, **p. Reuse killed by: calls; array & pointer stores + calls; all stores + calls; ideal alias info.]

Related Work
Load-reuse analysis: makes value numbering path-sensitive.
- Steffen, Knoop, Rüthing: Value Flow Graph [ESOP '90]; we show how to analyze indirect loads, via symbolic evaluation
Simulation-based analysis evaluation:
- Diwan, McKinley, Moss [PLDI '98]: type-based alias analysis; how powerful does it need to be?
Estimators:
- Ramalingam: "Frequency Analysis" [PLDI '96]; returns a single estimate, not its bounds

Summary
Load-reuse analysis:
- reuse across indirect memory references
- sparse representation
Estimators: three principles
- confidence: bound the edge-profile error
- separation of uncertainty: inside/outside the CMP region
- hierarchy: increasing precision and complexity
Evaluation:
- about 65% of loads are amenable to PRE
- our analysis can find about 80% of those

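Combine three removal methods [PLDI '98]
- code motion (M)
- control speculation (S)
- restructuring (R)

Example: a+b
[Figure: control-flow graph with regions marked M, S, and R, and edge frequencies 10 and 50.]

Relative removal power
[Chart: loads removed (dynamic count, normalized) for M, S, and R, for the INT and FP benchmarks. Other labels: Global CSE, path-insensitive.]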
