UPC Dynamic Removal of Redundant Computations Carlos Molina, Antonio González and Jordi Tubella Universitat Politècnica de Catalunya - Barcelona

Dynamic Removal of Redundant Computations
Carlos Molina, Antonio González and Jordi Tubella
Universitat Politècnica de Catalunya - Barcelona
{cmolina,antonio,jordit}@ac.upc.es
ICS'99, Rhodes (Greece) - June 20-25, 1999

Motivation
    for (i=0; i<N; i++)
        A[i] = B[i]+C[i];
    .....
    R = S / T;
    .....
    X = S / U;
    .....
- Quasi-invariant
- Quasi-common subexpression

Outline
- Instruction Reuse
- Related Work
- Redundant Computation Buffer
- Performance Results
- Conclusions

Instruction Reuse
[Figure: pipeline with Fetch, Decode & Rename, OOO Execution and Commit stages, plus a Reuse Mechanism accessed through an index.]

Related Work
Instruction Reuse
- Value Cache for the Tree Machine (Harbison 82)
- Result Cache (Richardson 92, Oberman et al. 95)
- Reuse Buffer (Sodani and Sohi 97)
- Physical Register Reuse (Jourdan et al. 98)
Trace Reuse
- Basic blocks (Huang and Lilja 99)
- General traces (González et al. 99)

Related Work
Result Cache (Richardson 92, Oberman & Flynn 95)
- Special purpose (long latency operations)
- Indexed by operand values
- No reuse chaining
- Can reuse dynamic instances of other static instructions
Reuse Buffer (Sodani & Sohi 97)
- General purpose
- Indexed by PC
- Reuse chaining
- Only reuses dynamic instances of the same static instruction
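The difference between the two indexing policies can be made concrete with a small sketch. The code below is only an illustration under simplifying assumptions: a direct-mapped table of 64 entries, a made-up hash, and hypothetical names (`table_t`, `index_by_operands`, `index_by_pc`, `second_div_reused`), none of which come from the slides. It executes the same division at two different PCs and reports whether the second instance hits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ENTRIES 64  /* hypothetical table size */

typedef struct {
    bool    valid;
    int64_t op1, op2;   /* operand values recorded when the entry was filled */
    int64_t result;
} entry_t;

typedef struct { entry_t e[ENTRIES]; } table_t;

/* Result-Cache-style index: derived from the operand values (assumed hash). */
unsigned index_by_operands(int64_t a, int64_t b) {
    return (unsigned)((((uint64_t)a * 31u) ^ (uint64_t)b) % ENTRIES);
}

/* Reuse-Buffer-style index: derived from the instruction address. */
unsigned index_by_pc(uint64_t pc) {
    return (unsigned)(pc % ENTRIES);
}

/* Reuse test: the stored operand values must equal the current ones. */
bool lookup(const table_t *t, unsigned idx, int64_t a, int64_t b, int64_t *out) {
    const entry_t *e = &t->e[idx];
    if (e->valid && e->op1 == a && e->op2 == b) {
        *out = e->result;
        return true;
    }
    return false;
}

void fill(table_t *t, unsigned idx, int64_t a, int64_t b, int64_t r) {
    assert(idx < ENTRIES);
    t->e[idx] = (entry_t){ true, a, b, r };
}

/* Runs 8 / 2 at PC 10 and then again at PC 20; returns true when the
   second instance finds the result in the table. */
bool second_div_reused(bool operand_indexed) {
    table_t t = {0};
    int64_t r;
    unsigned i1 = operand_indexed ? index_by_operands(8, 2) : index_by_pc(10);
    unsigned i2 = operand_indexed ? index_by_operands(8, 2) : index_by_pc(20);

    if (!lookup(&t, i1, 8, 2, &r))
        fill(&t, i1, 8, 2, 8 / 2);      /* first instance executes and records */
    return lookup(&t, i2, 8, 2, &r);    /* second instance probes the table */
}
```

With operand indexing, both divisions map to the same entry, so the second one is reused even though it is a different static instruction. With PC indexing, the two instructions map to different entries and the second one misses.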

Redundant Computation Buffer
[Figure: RCB organization.]
- Vtable: Atable pointer
- Atable: opcode, result/address, opnd1, opnd2, pointer
- Mtable: address tag, result
- Reuse test outputs: reused value, reused memory value

RCB (Working Example)
    while (cond) {
        r = s / t;
        ......
        x = s / u;
    }
I1: 8 / 2 = 4
[Figure: Vtable entry 10 points to an Atable entry holding div, operands 8 and 2, result 4, and a nil pointer.]

RCB (Working Example)
    while (cond) {
        r = s / t;
        ......
        x = s / u;
    }
I2: 8 / 2 = 4
[Figure: Vtable entries 10 and 20; the Atable holds div, operands 8 and 2, result 4, with a nil pointer.]

RCB (Working Example)
    while (cond) {
        r = s / t;
        ......
        x = s / u;
    }
I2: 8 / 2 = 4
[Figure: Vtable entries 10 and 20 and the Atable entry div, 8, 2, 4.]

RCB (Working Example)
    while (cond) {
        r = s / t;
        ......
        x = s / u;
    }
I1: 9 / 3 = 3
I2: 9 / 3 = 3
[Figure: Vtable entries 10 and 20; the Atable holds the earlier entry div, 8, 2, 4 and the new entry div, 9, 3, 3.]
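The working example above can be replayed in software. The following is a minimal sketch, not the authors' hardware design: it assumes a direct-mapped Vtable indexed by PC that stores an Atable index, an Atable indexed by a made-up hash of the opcode and operand values, and a two-step lookup (PC path first, operand path second). The Atable pointer field and the Mtable are left out, and every name (`rcb_t`, `rcb_div`, `rcb_hash`, the table sizes) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VT_SIZE 64   /* hypothetical table sizes */
#define AT_SIZE 64
#define NIL     (-1)

typedef enum { OP_DIV } opcode_t;

typedef struct {
    bool     valid;
    opcode_t op;
    int64_t  opnd1, opnd2;
    int64_t  result;
} atable_entry_t;

typedef struct {
    int            vtable[VT_SIZE];  /* indexed by PC: Atable index or NIL */
    atable_entry_t atable[AT_SIZE];  /* indexed by a hash of the operand values */
} rcb_t;

void rcb_init(rcb_t *r) {
    for (int i = 0; i < VT_SIZE; i++) r->vtable[i] = NIL;
    for (int i = 0; i < AT_SIZE; i++) r->atable[i].valid = false;
}

/* Assumed hash; a real design would choose its own index function. */
unsigned rcb_hash(opcode_t op, int64_t a, int64_t b) {
    return (unsigned)((((uint64_t)a * 31u) ^ (uint64_t)b ^ (uint64_t)op) % AT_SIZE);
}

/* Reuse test: opcode and both operand values must match the stored ones. */
bool rcb_match(const atable_entry_t *e, opcode_t op, int64_t a, int64_t b) {
    return e->valid && e->op == op && e->opnd1 == a && e->opnd2 == b;
}

/* Returns true when the result was reused; *out always receives a / b. */
bool rcb_div(rcb_t *r, uint64_t pc, int64_t a, int64_t b, int64_t *out) {
    assert(b != 0);
    int *vptr = &r->vtable[pc % VT_SIZE];
    unsigned h = rcb_hash(OP_DIV, a, b);

    /* First lookup: follow the Vtable pointer selected by the PC. */
    if (*vptr != NIL && rcb_match(&r->atable[*vptr], OP_DIV, a, b)) {
        *out = r->atable[*vptr].result;
        return true;
    }

    /* Second lookup: index the Atable by the operand values and
       remember that entry for this PC. */
    *vptr = (int)h;
    if (rcb_match(&r->atable[h], OP_DIV, a, b)) {
        *out = r->atable[h].result;
        return true;
    }

    /* Miss: execute the division and record it. */
    *out = a / b;
    r->atable[h] = (atable_entry_t){ true, OP_DIV, a, b, *out };
    return false;
}
```

Calling `rcb_div` in the order of the slides (PC 10 with 8 / 2, PC 20 with 8 / 2, PC 10 with 9 / 3, PC 20 with 9 / 3) gives a miss, a reuse, a miss and a reuse. In each iteration I2 reuses the result I1 has just produced, which is the quasi-common subexpression case.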

Enhancements to Other Schemes
Enhanced Result Cache
- Indexed by operands
- Atable: opcode, result/address, opnd1, opnd2
- Mtable: address tag, result
Enhanced Reuse Buffer
- Indexed by PC
- Atable: opcode, result/address, opnd1, opnd2
- Mtable: address tag, result

Timing Considerations
Pipeline stages named on the slide: fetch, decode & rename, opnd read & dispatch, issue, execute, write back, commit
- Latency of the Reuse Buffer: Atable lookup, reuse test
- Latency of the RCB: 1st Atable lookup, reuse test, 2nd Atable lookup
- Latency of the Result Cache: Atable lookup, reuse test

Experimental Framework
- Simulator: Alpha version of the SimpleScalar Toolset
- Benchmarks: SPEC95
- Maximum optimization level: DEC C & F77 compilers with -non_shared -O5
- Statistics: collected for 125 million instructions, skipping initializations

Basic Reuse Statistics
We evaluate different schemes
- Enhanced Result Cache (ERC)
- Enhanced Reuse Buffer (ERB)
- Redundant Computation Buffer (RCB)
We find the best configuration for each scheme
- Number of entries
- History depth
The best configurations will be evaluated
- Percentage of reuse
- Speedup

Quasi-Common Subexpressions (32 KB)
[Figure not preserved.]

Study of Reuse (Comparative)
[Figure: x-axis is size in Kbytes, from 8 to 4096.]

Performance Evaluation
Two different capacities are evaluated
- 32 KB
- 200 KB
The best configuration has been chosen for each reuse scheme
We present a performance evaluation for a superscalar processor
- Speedup
- Percentage of reuse

Base Microarchitecture

Speedup (32 KB)
[Figure: y-axis from 1.00 to 1.20.]

Speedup (200 KB)
[Figure: y-axis from 1.00 to 1.25.]

Reuse (32 KB)
[Figure; legend: Ops ready.]

Reuse (200 KB)
[Figure; legend: Ops ready.]

Reuse by Instruction Category
- Load Value
- Memory Address
- Arithmetic
- Cond Branch

Hybrid Scheme
[Figure: Atable organizations with fields opcode, result/address, opnd1, opnd2 and, in some variants, a pointer; the tables are accessed by PC, by operands (Opnds), or by both.]

Speedup (Hybrid Scheme)
[Figure: y-axis from 1.00 to 1.20.]

Reuse (Hybrid Scheme)

Speedup (Perfect Reuse Engine)
[Figure: y-axis from 1.00 to 2.20.]

Conclusions
- Redundant Computation Buffer
  - Quasi-invariants
  - Quasi-common subexpressions
- High reuse coverage and low latency
  - 30% reuse
  - 10% speedup
- Outperforms previous schemes
