2
Load-Reuse Analysis: Design and Evaluation
Rastislav Bodík, Rajiv Gupta, Mary Lou Soffa
3
Partial Redundancy Elimination (PRE)
Partially redundant = already computed on some (but not all) incoming paths
Example: x := a+b ... y := a+b
4
[Figure: flow graph containing the expression a+b and an assignment a := ..]
5
Steps: find “reuse” paths, remove redundancy from “reuse” paths.
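The two steps can be illustrated on a tiny example. This is a minimal sketch, not the transformation from the talk: the functions `before_pre`, `after_pre`, `add`, and `evaluations` are hypothetical names, and the counter exists only to make the removed evaluation visible.

```python
def add(a, b, counter):
    """Compute a+b and count the evaluation (counter is a one-element list)."""
    counter[0] += 1
    return a + b

def before_pre(cond, a, b, counter):
    x = None
    if cond:
        x = add(a, b, counter)   # a+b is computed on this path only
    y = add(a, b, counter)       # partially redundant: recomputed when cond holds
    return x, y

def after_pre(cond, a, b, counter):
    x = None
    if cond:
        t = add(a, b, counter)   # "reuse" path: keep the value in a temporary
        x = t
    else:
        t = add(a, b, counter)   # inserted on the path that lacked a+b
    y = t                        # the original computation becomes a copy
    return x, y

def evaluations(fn, cond):
    """Number of times a+b is evaluated by fn on the given path."""
    counter = [0]
    fn(cond, 2, 3, counter)
    return counter[0]
```

On the path where `cond` holds, `before_pre` evaluates a+b twice and `after_pre` once; on the other path both evaluate it once, so no path gets slower.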
6
Register promotion = PRE of loads
Three steps:
  1. load-reuse analysis: find loads that can reuse prior loads/stores
  2. alias analysis: which stores may kill reuse?
  3. transformation: remove redundancy: PRE [PLDI ‘98]
[Figure: store a1, x; store a3; load a2; load a4]
7
Load-reuse analysis
Design goal: completeness (find all reuse)
To approach completeness, the analysis is
  - uniform: analyzes scalar, array, and pointer loads
  - path-sensitive: a different source of reuse on each path
Evaluation goal: how complete? Compare with an ideal analysis.
Detecting all reuse is undecidable: no ideal algorithm exists; instead, use simulation
8
Experimental framework
[Figure: block diagram with four numbered components (1. load-reuse analysis, 2. simulator, 3. estimator, 4. comparison) and the transformation [PLDI ‘98]. Labels: program, input, profile, data-flow solution, weighted solution, reuse level]
9
1. Load-reuse analysis
It’s a data-flow analysis on a reuse-aware representation: the Value Name Graph (VNG) [POPL ’98]
What’s new?
  - a sparse version of the VNG, up to 30 times smaller than the non-sparse one
  - analysis of indirect loads/stores, including modeling of killing stores
10
Naming the value
  y := b+c
  a := c-1
  x := a+b+1
11
Names for the value in ‘x’: x, a+b+1, b+c
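The three names can be reproduced by walking backward through the statements and substituting each definition into the current name. The following is a minimal sketch under stated assumptions: straight-line code only, right-hand sides restricted to sums and differences of variables and integers, and hypothetical helper names (`parse`, `substitute`, `show`, `value_names`). It is not the VNG construction from the talk.

```python
def parse(expr):
    """Parse a sum/difference of variables and integers, e.g. 'a+b+1'."""
    terms, const, token, sign = {}, 0, "", 1
    for ch in expr.replace(" ", "") + "+":
        if ch in "+-":
            if token.isdigit():
                const += sign * int(token)
            elif token:
                terms[token] = terms.get(token, 0) + sign
            token, sign = "", (1 if ch == "+" else -1)
        else:
            token += ch
    return {v: c for v, c in terms.items() if c}, const

def substitute(name, var, definition):
    """Replace var in name by its definition and simplify."""
    terms, const = dict(name[0]), name[1]
    k = terms.pop(var, 0)
    for v, c in definition[0].items():
        terms[v] = terms.get(v, 0) + k * c
    return {v: c for v, c in terms.items() if c}, const + k * definition[1]

def show(name):
    """Format a name canonically, variables in sorted order."""
    terms, const = name
    out = ""
    for v in sorted(terms):
        c = terms[v]
        mag = v if abs(c) == 1 else f"{abs(c)}*{v}"
        out += ("-" if c < 0 else "+" if out else "") + mag
    if const or not out:
        out += f"{const:+d}" if out else str(const)
    return out

def value_names(program, var):
    """Names of the value held by var at the end of the program, found by
    backward substitution, plus the targets of statements computing it."""
    name = ({var: 1}, 0)
    names, generators = [show(name)], []
    for target, rhs in reversed(program):
        definition = parse(rhs)
        name = substitute(name, target, definition)
        if definition == name:          # this statement computes the value
            generators.append(target)
        if show(name) != names[-1]:
            names.append(show(name))
    return names, generators

program = [("y", "b+c"), ("a", "c-1"), ("x", "a+b+1")]
names, generators = value_names(program, "x")
```

Here `names` is `['x', 'a+b+1', 'b+c']`, and `generators` also contains `y`, because `y := b+c` computes the value under its earliest name.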
12
[Figure: the names x, a+b+1, b+c annotated with data-flow values (1) and a GEN point]
13
Naming the value across loads
Statements shown: .. := p->f; .. := p->next->f; *r := ...; p := p->next
Field offsets: f = 0, next = 4
[Figure: the names *p and **(p+4) annotated with data-flow values (1) and a GEN point]
14
KILL: the store *r := ... kills the name **(p+4) if r = p+4 or r = *(p+4)
15
Sparse representation
Source: for I = 1, N { .. := A[I] + A[I-1] }
Statements shown: a1 := A+I; load a1; a2 := A+I-1; load a2; I := I+1
16
[Figure: sparse graph over load a1 and load a2, annotated with data-flow values (1, Ø) and a GEN point]
17
2. The simulator algorithm
Example: for I = 1, N { .. := A[I] + A[I-1] }
Memory access history (history length = 1 to 4):
  load a1 (A[I]):   103 102 101 100
  load a2 (A[I-1]): 102 101 100 99
The simulator detects all PRE-exploitable reuse (up to the given history length), but also some “noise”, e.g. due to hash table accesses
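The idea of matching a load's address against recent access histories can be sketched as follows. This is an illustration under assumptions, not the simulator from the talk: the function names are hypothetical, each access site keeps its own window of the last `history` addresses, and a load counts as having reuse when its address appears in any window. The counting convention (a site's window includes its access in the current iteration) is an assumption and may differ from the one used in the talk.

```python
from collections import defaultdict, deque

def count_reuse(trace, history):
    """Count dynamic loads whose address appears in the recent history
    (last `history` addresses) of any access site."""
    recent = defaultdict(lambda: deque(maxlen=history))
    reused = 0
    for site, address in trace:
        if any(address in addrs for addrs in recent.values()):
            reused += 1
        recent[site].append(address)   # the oldest entry drops out automatically
    return reused

def array_loop_trace(base, n):
    """Trace of `for I = 1, N { .. := A[I] + A[I-1] }`:
    site a1 loads A[I], then site a2 loads A[I-1]."""
    trace = []
    for i in range(1, n + 1):
        trace.append(("a1", base + i))
        trace.append(("a2", base + i - 1))
    return trace

trace = array_loop_trace(100, 4)
reuse_with_history_1 = count_reuse(trace, 1)
reuse_with_history_2 = count_reuse(trace, 2)
```

Under this convention the load of A[I-1] finds its address as the second-most-recent entry of site a1, so a window of 2 detects reuse in every iteration but the first, while a window of 1 detects none.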
18
Ideal amount of load reuse
65% of executed loads have reuse exploitable by PRE (intra-procedural reuse, history = 1)
[Chart: % of all dynamic loads with reuse, for history lengths 1 and 4. Benchmarks: go, m88ksim, gcc, compress, li, ijpeg, vortex, tomcatv, swim, su2cor, hydro]
19
3. How frequent is the reuse?
Edge profile:
  + cheap and available
  - cannot reconstruct frequencies of reuse paths
[Figure: flow graph with load x, kill x, load x, annotated with edge frequencies]
20
Path profile:
  + precise
  - more expensive
Use the edge profile, but bound its inherent error: compute lower and upper bounds on reuse
21
Hierarchy of estimators
PRE, CMP 1, CMP c, CMP r, CMP f: smaller error along the hierarchy (but more complex)
Hierarchy: a practical approach. Is a simple estimator not precise enough? Use the next better one!
Estimator: data-flow solution + edge profile = weighted data-flow solution
22
The algorithms
1. The bounds:
  generators: points generating reuse
  stealers: points with no reuse
  upper bound: all reuse consumed
  lower bound: all reuse stolen
[Figure: flow graph with load x, kill x, load x, annotated with edge frequencies]
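Why an edge profile yields only bounds can be seen on a single merge point. The sketch below is a deliberately simplified model and an assumption of this note, not the estimator algorithm from the talk: `generated` executions arrive carrying reuse, `stolen` executions arrive without it, and `consumed` of the merged executions continue to the load. The function name and the example frequencies are hypothetical.

```python
def reuse_bounds(generated, stolen, consumed):
    """Bounds on how many executions of a load see reuse, given only
    edge frequencies around one merge point."""
    if consumed > generated + stolen:
        raise ValueError("more flow leaves the merge than enters it")
    upper = min(generated, consumed)      # all reuse is consumed by the load
    lower = max(0, consumed - stolen)     # stealers take as much flow as they can
    return lower, upper

bounds = reuse_bounds(generated=60, stolen=40, consumed=50)
```

With these frequencies the load has reuse in at least 10 and at most 50 of its 50 executions; the bounds coincide only when no stealer reaches the merge or when every merged execution reaches the load.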
23
2. Separating uncertainty: using the CMP region defined for PRE [PLDI ‘98]
CMP = code-motion preventing
All error is contained in the CMP region!
24
Improving precision
  - “one” region
  - connected regions
  - control-flow reachability
  - network-flow reachability
25
Estimators: precision
[Chart: estimator error for PRE, CMP 1, CMP c, CMP r, CMP f on INT and FP benchmarks; the error gets smaller along the hierarchy]
26
4. Analysis: how close to ideal?
[Chart: 100% = reuse seen by the simulator. Legend: *p, **p, calls. Reuse killed by: array & pointer stores + calls; all stores + calls; ideal alias info]
27
Related Work
Load-reuse analysis
  - makes value numbering path-sensitive
  - Steffen, Knoop, Rüthing: Value Flow Graph [ESOP ‘90]
  - we show how to analyze indirect loads, via symbolic evaluation
Simulation-based analysis evaluation
  - Diwan, McKinley, Moss [PLDI ’98]: type-based alias analysis; how powerful does it need to be?
Estimators
  - Ramalingam, “Frequency Analysis” [PLDI ’96]: returns a single estimate, not its bounds
28
Summary
Load-reuse analysis:
  - reuse across indirect memory references
  - sparse representation
Estimators: three principles
  - confidence: bound the edge-profile error
  - separation of uncertainty: inside/outside the CMP region
  - hierarchy: increasing precision and complexity
Evaluation:
  - about 65% of loads are amenable to PRE
  - our analysis can find about 80% of those
29
Combine three removal methods [PLDI ‘98]
  M: code motion
  S: control speculation
  R: restructuring
30
Example: a+b
[Figure: flow graph with edge frequencies 10 and 50, illustrating M, S, and R]
31
Relative removal power
[Chart: loads removed (dynamic count, normalized) by M, S, and R on INT and FP benchmarks. Other labels: global CSE, path-insensitive]