UPC Reducing Misspeculation Penalty in Trace-Level Speculative Multithreaded Architectures Carlos Molina ψ, ф Jordi Tubella ф Antonio González λ,ф ISHPC-VI,


1 UPC
Reducing Misspeculation Penalty in Trace-Level Speculative Multithreaded Architectures
Carlos Molina ψ,ф, Jordi Tubella ф, Antonio González λ,ф
ISHPC-VI, Nara City (Japan) - September 7-9, 2005
λ Intel Barcelona Research Center, Intel Labs - UPC, Barcelona, Spain, antoniox.gonzalez@intel.com
ф Dept. Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona, Spain, {antonio,cmolina,jordit}@ac.upc.edu
ψ Dept. Enginyeria Informàtica, Universitat Rovira i Virgili, Tarragona, Spain, carlos.molina@urv.net

2 Techniques to Boost Instruction Execution
Computation Repetition: Data Value Reuse, Data Value Speculation
• Avoid serialization caused by data dependences
• Determine results of instructions without executing them
• Target is to boost the execution of programs

3 Techniques to Boost Instruction Execution
Computation Repetition: Data Value Reuse, Data Value Speculation
Data Value Reuse:
• NON SPECULATIVE !!!
• Buffers previous inputs and their corresponding outputs
• Only possible if a computation has been done in the past
• Inputs have to be ready at reuse test time
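
The reuse test described on this slide can be sketched as a lookup table keyed by an operation and its input values. This is only a minimal illustration: the names `ReuseBuffer`, `reuse_test`, `record` and `execute`, and the two supported opcodes, are hypothetical and do not come from the slides.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, Tuple[int, ...]]


class ReuseBuffer:
    """Hypothetical reuse buffer: remembers outputs of past computations."""

    def __init__(self) -> None:
        self._table: Dict[Key, int] = {}

    def reuse_test(self, opcode: str, inputs: Tuple[int, ...]) -> Optional[int]:
        # The actual input values are compared, so they must be ready here.
        return self._table.get((opcode, inputs))

    def record(self, opcode: str, inputs: Tuple[int, ...], output: int) -> None:
        self._table[(opcode, inputs)] = output


def execute(buf: ReuseBuffer, opcode: str, inputs: Tuple[int, int]) -> Tuple[int, bool]:
    """Return (result, reused). A hit skips the computation entirely."""
    hit = buf.reuse_test(opcode, inputs)
    if hit is not None:
        return hit, True              # non-speculative: the result is known to be correct
    if opcode == "add":
        result = inputs[0] + inputs[1]
    elif opcode == "mul":
        result = inputs[0] * inputs[1]
    else:
        raise ValueError(f"unsupported opcode: {opcode}")
    buf.record(opcode, inputs, result)
    return result, False
```

A hit is only possible the second time the same operation sees the same inputs, which is the "computation has been done in the past" condition.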

4 Techniques to Boost Instruction Execution
Computation Repetition: Data Value Reuse, Data Value Speculation
Data Value Speculation:
• SPECULATIVE !!!
• Predicts values as a function of the past history
• Needs to confirm speculation at a later point
• Solves reuse test but introduces misspeculation penalty
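
As a rough illustration of predicting from past history and confirming later, the sketch below uses a last-value predictor. That predictor is chosen here only for brevity; the slides do not say which predictor is used, and all names are hypothetical.

```python
from typing import Dict, Optional


class LastValuePredictor:
    """Hypothetical predictor: guesses that an instruction repeats its last result."""

    def __init__(self) -> None:
        self._last: Dict[int, int] = {}

    def predict(self, pc: int) -> Optional[int]:
        # No input values are needed, so the reuse-test timing problem disappears.
        return self._last.get(pc)

    def update(self, pc: int, actual: int) -> None:
        self._last[pc] = actual


def speculate_and_confirm(pred: LastValuePredictor, pc: int, actual: int) -> str:
    """Compare an earlier guess with the real result once it becomes available."""
    guess = pred.predict(pc)
    pred.update(pc, actual)
    if guess is None:
        return "no_prediction"
    # A wrong guess is where the misspeculation penalty is paid.
    return "correct" if guess == actual else "misspeculation"
```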

5 Trace Level Speculation
• Avoids serialization caused by data dependences
• Skips multiple instructions in a row
• Predicts values based on the past
• Introduces penalties due to misspeculations
Two variants: Trace Level Speculation with Live Input Test, Trace Level Speculation with Live Output Test

6 Trace Level Speculation with Live Output Test
[Diagram: the speculative thread (ST) and the non-speculative thread (NST) connected through a buffer. Labels: Live Output Update & Trace Speculation; Trace Misspeculation Detection & Recovery Actions; Instruction Execution; Not Executed; Live Output Validation]
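
The live output validation step can be pictured as follows: the trace is executed for real on the non-speculative side, and the values it produces are compared with the live outputs that were predicted when the trace was skipped. This is a sketch under strong simplifying assumptions: every instruction is an addition, and the encoding `(dest, src_a, src_b)` and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: (dest, src_a, src_b) means regs[dest] = regs[src_a] + regs[src_b]
Instr = Tuple[int, int, int]


def run_trace(trace: List[Instr], regs: Dict[int, int]) -> Dict[int, int]:
    """Execute the trace on a copy of the register state."""
    out = dict(regs)
    for dest, a, b in trace:
        out[dest] = out[a] + out[b]
    return out


def live_output_test(trace: List[Instr], regs: Dict[int, int],
                     predicted: Dict[int, int]) -> bool:
    """True when every predicted live output matches what the trace really produces."""
    actual = run_trace(trace, regs)
    return all(actual[reg] == value for reg, value in predicted.items())
```

A failed test corresponds to the misspeculation detection and recovery path in the diagram.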

7 Motivation
Two orthogonal issues:
• microarchitecture support for trace speculation
• control and data speculation techniques
  – prediction of initial and final points
  – prediction of live output values
This work focuses on microarchitecture support (TSMA), concretely on reducing penalties due to misspeculations
Molina, González, Tubella, "Trace-Level Speculative Multithreaded Architecture (TSMA)", ICCD'02
Molina, González, Tubella, "Compiler Analysis for TSMA", INTERACT'05

8 Outline
• TSMA (Trace-level Speculative Multithreaded Architecture)
• Verification Engine
• Enhanced Verification Engine
• Experimental Framework
• Simulation Results
• Conclusions

9 TSMA Block Diagram
[Block diagram. Components: I Cache; Fetch Engine; Decode & Rename; Functional Units; Branch Predictor; Trace Speculation; NST Reorder Buffer; ST Reorder Buffer; NST Ld/St Queue; ST Ld/St Queue; NST I Window; ST I Window; Look Ahead Buffer; Verification Engine; Data Cache (L1NSDC, L2NSDC, L1SDC); NST Arch. Register File; ST Arch. Register File]

10 Verification Engine
• ST stores its committed instructions in the LAB (Look-Ahead Buffer)
• Each LAB entry (I1, I2, I3, I4, ...) holds:
  – Program Counter
  – Operation Type
  – Source & Destination Register Numbers
  – Source & Destination Register Values
  – Effective Address
• NST verifies instructions from the LAB
• Source values are tested against the non-speculative state
• If they match, the destination value is updated

11 Verification Engine
[Diagram: Look-Ahead Buffer (I1, I2, I3, I4) feeding the Verification Engine, which accesses the Non-Speculative Register File and the Non-Speculative Memory Hierarchy]
• BRANCHES: source value tested; program counter updated
  BRANCH Rsource1, Target

12 Verification Engine
• ARITH Is: source values tested; destination register updated
  Rdest ← Rsource1 OP Rsource2

13 Verification Engine
• STORES: effective address verified; destination memory updated
  M[Rsource1, literal] ← Rsource2

14 Verification Engine
• LOADS: effective address verified; memory value checked; register updated
  Rdest ← M[Rsource1, literal]
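
The four verification rules can be put together in a small software model. It is a sketch, not the hardware design: the entry layout follows the LAB fields listed earlier, but the field names, the base-plus-literal address computation, and the helper `drain_lab` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class LabEntry:
    """One LAB entry (hypothetical layout based on the fields named on the slides)."""
    pc: int
    op: str                          # "branch", "arith", "store" or "load"
    src_regs: Tuple[int, ...]        # source register numbers
    src_vals: Tuple[int, ...]        # source values observed by the ST
    dest_reg: Optional[int] = None   # destination register number
    dest_val: Optional[int] = None   # ST result (arith/load) or next PC (branch)
    eff_addr: Optional[int] = None   # effective address (load/store)
    literal: int = 0                 # displacement (load/store)


@dataclass
class NonSpecState:
    regs: Dict[int, int]
    mem: Dict[int, int]
    pc: int = 0


def verify(e: LabEntry, s: NonSpecState) -> bool:
    """Commit one entry into the non-speculative state; False means misspeculation."""
    actual = tuple(s.regs[r] for r in e.src_regs)
    if actual != e.src_vals:                          # source values tested
        return False
    if e.op == "branch":
        s.pc = e.dest_val                             # program counter updated
    elif e.op == "arith":
        s.regs[e.dest_reg] = e.dest_val               # destination register updated
    elif e.op in ("store", "load"):
        # Assumption: address = base register + literal; src_regs[0] is the base.
        if actual[0] + e.literal != e.eff_addr:       # effective address verified
            return False
        if e.op == "store":
            s.mem[e.eff_addr] = actual[1]             # destination memory updated
        else:
            if s.mem.get(e.eff_addr) != e.dest_val:   # memory value checked
                return False
            s.regs[e.dest_reg] = e.dest_val           # register updated
    else:
        raise ValueError(f"unknown operation type: {e.op}")
    return True


def drain_lab(lab: List[LabEntry], s: NonSpecState) -> int:
    """Verify entries in order; return how many are squashed at the first failure."""
    for i, entry in enumerate(lab):
        if not verify(entry, s):
            return len(lab) - i
    return 0
```

In this model the first failing entry squashes everything behind it, including entries whose own inputs were correct.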

15 Squashed Instructions from LAB
On average, up to 85 instructions are squashed from the LAB in each thread synchronization

16 Correctly Executed Instructions
On average, over 20% of the squashed instructions were correctly executed by ST

17 Our Proposal
Enhanced Verification Engine:
• does not throw away execution results of instructions that are independent of the mispredicted point
• reduces the number of instructions fetched and executed
• thread synchronizations can be delayed or even aborted
• verification of branches, loads, stores and single-cycle instructions is reconsidered

18 Related Work
• Instruction reissue [Lipasti 1997, González & González 1997, Sato 1998]
• Squash reuse [Sodani & Sohi 1997]
• Control independence in trace processors [Rotenberg et al. 1997]
• Dynamic control independence [Chou et al. 1999]
• Register integration [Roth & Sohi 2000]

19 Enhanced Verification Engine
[Diagram: Look-Ahead Buffer (I1, I2, I3, I4) feeding the Enhanced Verification Engine, which accesses the Non-Speculative Register File and the Non-Speculative Memory Hierarchy]
• BRANCHES: the branch target is validated instead of the source values
  BRANCH Rsource1, Target

20 Enhanced Verification Engine
[Diagram adds a functional unit (F.U.)]
• ARITH Is: if the source values do not match, the instruction is re-executed
  Rdest ← Rsource1 OP Rsource2

21 Enhanced Verification Engine
• STORES: if verification of the effective address fails, the address is re-computed; memory is updated with the value obtained from the non-speculative architectural state
  M[Rsource1, literal] ← Rsource2

22 Enhanced Verification Engine
• LOADS: if verification of the effective address fails, the address is re-computed; the destination value obtained from memory is committed to the register file
  Rdest ← M[Rsource1, literal]
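
The enhanced rules can be modelled the same way. Again this is a hedged sketch rather than the actual design: the field names, the branch semantics (taken when the source register is non-zero), the use of addition for every arithmetic operation, and the base-plus-literal address are all assumptions, and the one-re-execution-per-cycle limit is not modelled.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class LabEntry:
    """One LAB entry (hypothetical layout)."""
    op: str                          # "branch", "arith", "store" or "load"
    src_regs: Tuple[int, ...]        # source register numbers
    src_vals: Tuple[int, ...]        # source values observed by the ST
    dest_reg: Optional[int] = None
    dest_val: Optional[int] = None   # ST result (arith/load)
    eff_addr: Optional[int] = None   # ST effective address (load/store)
    literal: int = 0                 # displacement (load/store)
    target: int = 0                  # branch target
    fallthrough: int = 0             # next sequential PC (branch)
    next_pc: int = 0                 # path the ST actually followed (branch)


@dataclass
class NonSpecState:
    regs: Dict[int, int]
    mem: Dict[int, int]
    pc: int = 0


def verify_enhanced(e: LabEntry, s: NonSpecState) -> str:
    """Return "verified", "reexecuted", or "sync" (thread synchronization needed)."""
    actual = tuple(s.regs[r] for r in e.src_regs)
    if e.op == "branch":
        # Only the resulting target matters, not the exact source value.
        correct_pc = e.target if actual[0] != 0 else e.fallthrough
        if correct_pc != e.next_pc:
            return "sync"
        s.pc = correct_pc
        return "verified"
    if e.op == "arith":
        if actual == e.src_vals:
            s.regs[e.dest_reg] = e.dest_val
            return "verified"
        s.regs[e.dest_reg] = actual[0] + actual[1]   # re-executed with correct inputs
        return "reexecuted"
    if e.op not in ("store", "load"):
        raise ValueError(f"unknown operation type: {e.op}")
    addr = actual[0] + e.literal                     # re-computed effective address
    status = "verified" if addr == e.eff_addr else "reexecuted"
    if e.op == "store":
        s.mem[addr] = actual[1]                      # data taken from non-speculative state
    else:
        s.regs[e.dest_reg] = s.mem.get(addr, 0)      # value read from memory is committed
    return status
```

In this sketch only a branch that followed the wrong path forces a synchronization; wrong inputs to the other instruction types are repaired in place.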

23 Incorrectly Speculated Instructions
[Chart legend: Rest, Stores, Loads, Branches, Simple Is]
• On average, close to 90% of the instructions are branches, loads, stores and single-cycle instructions
• Only 1% of the instructions inserted in the LAB are incorrectly predicted

24 Experimental Framework
• Simulator: Alpha version of the SimpleScalar Toolset
• Benchmarks: SPEC2000, ref input
• Maximum Optimization Level: DEC C & F77 compilers with -non_shared -O5
• Statistics: collected for 250 million instructions, skipping an initial part of 500 million instructions

25 Simulation Parameters
Base microarchitecture:
• out-of-order machine, 4 instructions per cycle
• I cache: 16KB, D cache: 16KB, L2 shared: 256KB
• bimodal predictor
TSMA additional structures:
• each thread: I window, reorder buffer, register file
• speculative data cache: 1KB
• trace table: 128 entries, 4-way set associative
• look ahead buffer: 128 entries
• verification engine: up to 8 instructions per cycle
• only one instruction re-executed per cycle

26 Thread Synchronizations
[Chart: Conventional VE vs. Enhanced VE]
On average, thread synchronizations drop by about 10 percentage points with the Enhanced VE (from 30% to 20%)

27 Speedup
[Chart: speedup (axis from 1.00 to 1.45), Conventional VE vs. Enhanced VE]
On average, the performance improvement is around 9%

28 Reduction in Executed Instructions
On average, almost 8% fewer instructions are executed with the Enhanced VE

29 Conclusions
TSMA:
• a significant number of instructions are correctly executed, but discarded when synchronizing
• novel hardware technique to enhance TSMA
Enhanced Verification Engine:
• thread synchronizations are delayed or even aborted
• verification of branches, loads, stores and single-cycle instructions is reconsidered
Results show:
• speedup of 38% (9% improvement)
• misprediction rate of 20% (10% reduction)

30 Future Work
• Aggressive trace level predictors
• Generalization to multiple threads

31 UPC Questions & Answers ISHPC-VI, Nara City (Japan) - September 7-9, 2005

