1
UPC
Dynamic Removal of Redundant Computations
Carlos Molina, Antonio González and Jordi Tubella
Universitat Politècnica de Catalunya - Barcelona
{cmolina,antonio,jordit}@ac.upc.es
ICS'99, Rhodes (Greece) - June 20-25, 1999
2
Motivation
for (i=0; i<N; i++) A[i] = B[i]+C[i];
...
R = S / T;
...
X = S / U;
...
Quasi-invariant
Quasi-common subexpression
3
Outline
- Instruction Reuse
- Related Work
- Redundant Computation Buffer
- Performance Results
- Conclusions
4
Instruction Reuse
[Diagram: pipeline with Fetch, Decode & Rename, OOO Execution and Commit stages, plus a Reuse Mechanism accessed through an index]
5
Related Work
Instruction Reuse
- Value Cache for the Tree Machine (Harbison 82)
- Result Cache (Richardson 92, Oberman et al. 95)
- Reuse Buffer (Sodani and Sohi 97)
- Physical Register Reuse (Jourdan et al. 98)
Trace Reuse
- Basic blocks (Huang and Lilja 99)
- General traces (González et al. 99)
6
Related Work
Result Cache (Richardson 92, Oberman & Flynn 95)
- Special purpose (long latency operations)
- Indexed by operand values
- No reuse chaining
- Can reuse dynamic instances of other static instructions
Reuse Buffer (Sodani & Sohi 97)
- General purpose
- Indexed by PC
- Reuse chaining
- Only reuses dynamic instances of the same static instruction
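The indexing difference between the two schemes above can be illustrated with a minimal C sketch. It is not the design from any of the cited papers: the table size, the hash, the `Entry` layout and all function names are assumptions made for illustration, and the opcode is ignored (a single operation such as division is assumed).

```c
#include <stdbool.h>
#include <stdint.h>

#define ENTRIES 64  /* hypothetical table size */

/* One direct-mapped entry: the operands seen earlier and the result produced. */
typedef struct {
    bool     valid;
    uint64_t pc;            /* used only by the PC-indexed table */
    int64_t  opnd1, opnd2;
    int64_t  result;
} Entry;

typedef struct { Entry e[ENTRIES]; } Table;

/* Hypothetical index functions. */
static unsigned index_by_pc(uint64_t pc) { return (unsigned)(pc % ENTRIES); }
static unsigned index_by_values(int64_t a, int64_t b) {
    return (unsigned)(((uint64_t)a * 31u + (uint64_t)b) % ENTRIES);
}

/* Result-cache style: indexed by operand values, so any static
   instruction with the same operands can hit. */
bool rc_lookup(const Table *t, int64_t a, int64_t b, int64_t *out) {
    const Entry *e = &t->e[index_by_values(a, b)];
    if (e->valid && e->opnd1 == a && e->opnd2 == b) {
        *out = e->result;
        return true;
    }
    return false;
}

void rc_update(Table *t, int64_t a, int64_t b, int64_t result) {
    Entry *e = &t->e[index_by_values(a, b)];
    e->valid = true; e->pc = 0;
    e->opnd1 = a; e->opnd2 = b; e->result = result;
}

/* Reuse-buffer style: indexed by PC, so only an earlier dynamic
   instance of the same static instruction can hit. */
bool rb_lookup(const Table *t, uint64_t pc, int64_t a, int64_t b, int64_t *out) {
    const Entry *e = &t->e[index_by_pc(pc)];
    if (e->valid && e->pc == pc && e->opnd1 == a && e->opnd2 == b) {
        *out = e->result;
        return true;
    }
    return false;
}

void rb_update(Table *t, uint64_t pc, int64_t a, int64_t b, int64_t result) {
    Entry *e = &t->e[index_by_pc(pc)];
    e->valid = true; e->pc = pc;
    e->opnd1 = a; e->opnd2 = b; e->result = result;
}
```

In this sketch, a result stored for the instruction at PC 10 is visible to a different instruction only through the value-indexed table; the PC-indexed table misses for any other PC even when the operands are identical.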
7
Redundant Computation Buffer
[Diagram]
- Vtable: Atable pointer
- Atable: opcode, result/address, opnd1, opnd2, pointer
- Mtable: address tag, result
- Outputs: Reuse Test, Reused Value, Reused Memory Value
8
RCB (Working Example)
while (cond) { r = s / t; ... x = s / u; }
I1: 8 / 2 = 4
[Diagram: Vtable entry 10: refers to an Atable entry holding div, opnd1 = 8, opnd2 = 2, result = 4, pointer = nil]
9
RCB (Working Example)
while (cond) { r = s / t; ... x = s / u; }
I2: 8 / 2 = 4
[Diagram: Vtable entries 10: and 20:; Atable entry holding div, opnd1 = 8, opnd2 = 2, result = 4, pointer = nil]
10
RCB (Working Example)
while (cond) { r = s / t; ... x = s / u; }
I2: 8 / 2 = 4
[Diagram: Vtable entries 10: and 20:; Atable entry holding div, opnd1 = 8, opnd2 = 2, result = 4]
11
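RCB (Working Example)
while (cond) { r = s / t; ... x = s / u; }
I1: 9 / 3 = 3
I2: 9 / 3 = 3
[Diagram: Vtable entries 10: and 20:; Atable entries holding div, opnd1 = 8, opnd2 = 2, result = 4, pointer = nil and div, opnd1 = 9, opnd2 = 3, result = 3, pointer = nil]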
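The working example can be replayed with a heavily simplified C sketch. It is one possible reading of the slides, not the authors' hardware design. The sketch assumes that the labels 10: and 20: are the PCs of the two divisions, that the first Atable probe follows the Vtable pointer, and that a second probe indexes the Atable by operand values. All names, sizes and the hash are hypothetical, and the Mtable and the Atable pointer field are left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define VT_SIZE 64   /* hypothetical Vtable size */
#define AT_SIZE 64   /* hypothetical Atable size */
#define NIL (-1)

typedef enum { OP_DIV } Opcode;

/* Atable entry: one computed instance (operands and result). */
typedef struct {
    bool    valid;
    Opcode  op;
    int64_t opnd1, opnd2, result;
} AEntry;

typedef struct {
    int    vtable[VT_SIZE];   /* per-PC pointer into the Atable, or NIL */
    AEntry atable[AT_SIZE];
} Rcb;

void rcb_init(Rcb *r) {
    for (int i = 0; i < VT_SIZE; i++) r->vtable[i] = NIL;
    for (int i = 0; i < AT_SIZE; i++) r->atable[i].valid = false;
}

/* Hypothetical hash of opcode and operand values. */
static unsigned hash_values(Opcode op, int64_t a, int64_t b) {
    return (unsigned)(((uint64_t)op * 131u + (uint64_t)a * 31u + (uint64_t)b) % AT_SIZE);
}

/* Reuse test: the stored instance matches the current opcode and operands. */
static bool matches(const AEntry *e, Opcode op, int64_t a, int64_t b) {
    return e->valid && e->op == op && e->opnd1 == a && e->opnd2 == b;
}

/* Returns true and writes *out when the result can be reused. */
bool rcb_lookup(Rcb *r, uint64_t pc, Opcode op, int64_t a, int64_t b, int64_t *out) {
    unsigned v = (unsigned)(pc % VT_SIZE);
    int p = r->vtable[v];
    /* First probe: the instance this PC used last time. */
    if (p != NIL && matches(&r->atable[p], op, a, b)) {
        *out = r->atable[p].result;
        return true;
    }
    /* Second probe: an instance created by any instruction with these values. */
    unsigned h = hash_values(op, a, b);
    if (matches(&r->atable[h], op, a, b)) {
        r->vtable[v] = (int)h;   /* remember the shared entry for this PC */
        *out = r->atable[h].result;
        return true;
    }
    return false;
}

/* Called after a miss, once the instruction has actually executed. */
void rcb_update(Rcb *r, uint64_t pc, Opcode op, int64_t a, int64_t b, int64_t result) {
    unsigned h = hash_values(op, a, b);
    AEntry *e = &r->atable[h];
    e->valid = true; e->op = op;
    e->opnd1 = a; e->opnd2 = b; e->result = result;
    r->vtable[pc % VT_SIZE] = (int)h;
}
```

Under these assumptions, the division at PC 10 misses on 8 / 2 and installs the result 4, after which the division at PC 20 reuses it even though it is a different static instruction. The same happens when the operands change to 9 / 3.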
12
Enhancements to Other Schemes
Enhanced Result Cache (accessed by operands)
- Atable: opcode, result/address, opnd1, opnd2
- Mtable: address tag, result
Enhanced Reuse Buffer (accessed by PC)
- Atable: opcode, result/address, opnd1, opnd2
- Mtable: address tag, result
13
Timing Considerations
Pipeline Stages: fetch, decode & rename, opnd read & dispatch, issue, execute, write back, commit
Latency of the Reuse Buffer: Atable lookup, reuse test
Latency of the RCB: 1st Atable lookup, reuse test, 2nd Atable lookup
Latency of the Result Cache: Atable lookup, reuse test
14
Experimental Framework
Simulator
- Alpha version of the SimpleScalar Toolset
Benchmarks
- Spec95
Maximum Optimization Level
- DEC C & F77 compilers with -non_shared -O5
Statistics
- Collected for 125 million instructions
- Skipping initializations
15
Basic Reuse Statistics
We evaluate different schemes
- Enhanced Result Cache (ERC)
- Enhanced Reuse Buffer (ERB)
- Redundant Computation Buffer (RCB)
We find the best configuration for each scheme
- Number of entries
- History depth
The best configurations will be evaluated
- Percentage of reuse
- Speedup
16
Quasi-Common Subexpressions
[Chart: 32 KB configuration]
17
Study of Reuse (ERB)
[Plot: x-axis is size in Kbytes, from 8 to 4096]
18
Study of Reuse (RCB)
[Plot: x-axis is size in Kbytes, from 8 to 4096]
19
Study of Reuse (Comparative)
[Plot: x-axis is size in Kbytes, from 8 to 4096]
20
Performance Evaluation
Two different capacities are evaluated
- 32 KB
- 200 KB
The best configuration has been chosen for each reuse scheme
We present a performance evaluation for a superscalar processor
- Speedup
- Percentage of reuse
21
Base Microarchitecture
22
Speedup (32 KB)
[Chart: speedup axis from 1.00 to 1.20]
23
Speedup (200 KB)
[Chart: speedup axis from 1.00 to 1.25]
24
Reuse (32 KB)
[Chart label: Ops ready]
25
Reuse (200 KB)
[Chart label: Ops ready]
26
Reuse by Instruction Category
[Chart categories: Load Value, Memory Address, Arithmetic, Cond Branch]
27
Hybrid Scheme
[Diagram: Atable entries with fields opcode, result/address, opnd1, opnd2 and pointer (or nil), accessed by PC and by operands (Opnds)]
28
Speedup (Hybrid Scheme)
[Chart: speedup axis from 1.00 to 1.20]
29
Reuse (Hybrid Scheme)
30
Speedup (Perfect Reuse Engine)
[Chart: speedup axis from 1.00 to 2.20]
31
Conclusions
Redundant Computation Buffer
- Quasi-invariants
- Quasi-common subexpressions
High reuse coverage and low latency
- 30% reuse
- 10% speedup
Outperforms previous schemes