Patrick Akl and Andreas Moshovos AENAO Research Group

1 BranchTap: Improving Performance With Very Few Checkpoints Through Adaptive Speculation Control
Patrick Akl and Andreas Moshovos AENAO Research Group Department of Electrical and Computer Engineering University of Toronto

2 What Happens on a Branch Misprediction?
Execution timeline: the processor predicts a branch outcome and continues along the predicted path. When the misprediction is discovered, it recovers the processor state, redirects fetch to the correct path, and resumes execution. We wish to make the recovery fast.

3 State-of-the-art recovery
Existing mechanisms: reorder buffer (ROB) based recovery is slow; instantaneous checkpoints are faster. Problem: we can’t have enough checkpoints. State-of-the-art solution: checkpoint prediction, which allocates the few checkpoints judiciously. Another degree of freedom is speculation control: sometimes deeper speculation means a higher recovery cost, which can hurt performance, so throttle speculation.

4 BranchTap Results / Benefits
No additional checkpoints are needed. BranchTap dynamically adapts to application behavior and improves performance for most programs: with 4 checkpoints, the misprediction performance penalty is reduced by 28% on average. BranchTap comes “for free”: it is very simple to implement and performs better than more accurate checkpoint predictors.

5 Outline: Background, BranchTap, Methodology and Results, Summary

6 State Recovery Example: Register Alias Table
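Original code: A: add r1, r2, 100; B: breq r1, E; C: sub r1, r2, r2. The RAT has # arch. regs entries, one per architectural register (indexed with Lg(# arch. regs) bits), and each entry holds a physical register. Initially r1, r2, and r3 map to p1, p2, and p3; renaming A remaps r1 to p4, and renaming C remaps it again to p5. Renamed code: A: add p4, p2, 100; B: breq p4, E; C: sub p5, p2, p2.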
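The renaming in this example can be replayed with a small Python sketch. The tuple encoding of instructions, the `rename` function, and the in-order free list of physical registers are assumptions made for illustration; a real RAT is a hardware table, not a dictionary.

```python
def rename(program, rat, free_list):
    """Rename (op, dest, srcs) tuples through the RAT `rat` (arch -> phys).

    Sources are looked up before the destination is remapped; operands that
    are not registers (immediates, branch targets) pass through unchanged.
    """
    renamed = []
    for op, dest, srcs in program:
        phys_srcs = [rat.get(s, s) for s in srcs]
        phys_dest = None
        if dest is not None:
            phys_dest = free_list.pop(0)  # allocate the next free physical register
            rat[dest] = phys_dest         # later readers of dest now see phys_dest
        renamed.append((op, phys_dest, phys_srcs))
    return renamed


# Original code from the slide (B has no destination register).
program = [
    ("add", "r1", ["r2", "100"]),   # A: add  r1, r2, 100
    ("breq", None, ["r1", "E"]),    # B: breq r1, E
    ("sub", "r1", ["r2", "r2"]),    # C: sub  r1, r2, r2
]
rat = {"r1": "p1", "r2": "p2", "r3": "p3"}
renamed = rename(program, rat, ["p4", "p5"])
for op, dest, srcs in renamed:
    print(op, ", ".join(([dest] if dest else []) + srcs))
# prints: "add p4, p2, 100", then "breq p4, E", then "sub p5, p2, p2"
```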

7 ROB: Slow, Fine-Grain Recovery
Each ROB entry, kept in program order, contains the architectural destination register and its previous RAT map. Recovery proceeds as follows: 1. A misprediction is discovered, and the RAT is now invalid. 2. Locate the newest instruction in the ROB. 3. Undo the RAT updates in reverse program order, back to the mispredicted branch. This is too slow: recovery latency is proportional to the number of instructions to squash.
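A minimal Python sketch of this ROB walk, assuming each entry records the architectural destination and its previous mapping as described on the slide; the `RobEntry` and `rob_recover` names are hypothetical. The returned step count grows with the number of squashed instructions, which is why this form of recovery is slow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    name: str                 # instruction label, e.g. "C"
    arch_dest: Optional[str]  # architectural destination register (None for branches)
    prev_map: Optional[str]   # RAT mapping of arch_dest before this instruction renamed it


def rob_recover(rob, rat, branch_index):
    """Undo the RAT updates of every instruction younger than rob[branch_index].

    Walks from the newest entry back toward the mispredicted branch, one entry
    per step, and returns the number of steps (a proxy for recovery latency).
    """
    steps = 0
    for entry in reversed(rob[branch_index + 1:]):
        if entry.arch_dest is not None:
            rat[entry.arch_dest] = entry.prev_map  # restore the previous mapping
        steps += 1
    del rob[branch_index + 1:]  # squash the wrong-path instructions
    return steps


# State after renaming A, B, C from the RAT example; B turns out to be mispredicted.
rob = [RobEntry("A", "r1", "p1"), RobEntry("B", None, None), RobEntry("C", "r1", "p4")]
rat = {"r1": "p5", "r2": "p2", "r3": "p3"}
steps = rob_recover(rob, rat, branch_index=1)
print(steps, rat["r1"])  # 1 p4
```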

8 Global Checkpoints: Fast, Coarse-Grain Recovery
Checkpoints of the RAT are taken at selected branches along the program order. When a misprediction is discovered on a branch with a global checkpoint (GC), the invalid RAT is restored directly from that checkpoint: recovery is “instantaneous”.
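Checkpoint-based recovery can be sketched in Python as follows, assuming each GC is a full copy of the RAT and that checkpoints are taken in program order (so dictionary insertion order matches branch age). The `CheckpointedRat` class and its methods are hypothetical; the point is that `recover` restores state in one step no matter how many instructions are squashed, while `take_checkpoint` fails once the few GCs are used up.

```python
class CheckpointedRat:
    """RAT with a fixed number of global checkpoints (GCs), each a full RAT copy."""

    def __init__(self, rat, num_checkpoints):
        self.rat = dict(rat)
        self.num_checkpoints = num_checkpoints
        self.checkpoints = {}  # branch tag -> RAT copy, kept in program order

    def take_checkpoint(self, branch_tag):
        """Copy the RAT at a branch; returns False when every GC is in use."""
        if len(self.checkpoints) >= self.num_checkpoints:
            return False
        self.checkpoints[branch_tag] = dict(self.rat)
        return True

    def release(self, branch_tag):
        """Free the GC once its branch is known to be correctly predicted."""
        self.checkpoints.pop(branch_tag, None)

    def recover(self, branch_tag):
        """Restore the RAT in one step, regardless of how many instructions are squashed."""
        self.rat = dict(self.checkpoints[branch_tag])
        tags = list(self.checkpoints)
        # The branch is now resolved and younger checkpoints lie on the wrong path: free them all.
        for tag in tags[tags.index(branch_tag):]:
            del self.checkpoints[tag]


crat = CheckpointedRat({"r1": "p4", "r2": "p2", "r3": "p3"}, num_checkpoints=1)
crat.take_checkpoint("B")           # GC taken at branch B
crat.rat["r1"] = "p5"               # C renames r1 on the predicted path
print(crat.take_checkpoint("X"))    # False: the only GC is already in use
crat.recover("B")                   # B mispredicted: restore directly
print(crat.rat["r1"])               # p4
```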

9 Impact of More Checkpoints
Concept vs. actual implementation: the working copy of the RAT maps each architectural register to a physical register, and each checkpoint is a copy of it. What about more checkpoints? The RAT is a power-hungry structure, and more checkpoints increase its delay. Only a few checkpoints can practically be implemented, so they cannot always cover all branches.

10 Intelligent Checkpointing
State-of-the-art solution: Checkpoint allocation: allocate checkpoints at hard-to-predict branches. Checkpoint management: release checkpoints as soon as they are no longer needed. Together, these use the few checkpoints efficiently.
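Allocating checkpoints at hard-to-predict branches requires a confidence estimator. The slides only state that a 1K-entry confidence table is used; the Python sketch below assumes a JRS-style table of resetting counters indexed by branch PC, which is one well-known design and not necessarily the one used in this work. The counter limit, threshold, PC hashing, and the `should_checkpoint` helper are illustrative assumptions.

```python
class ConfidenceTable:
    """JRS-style confidence estimator: one resetting counter per table entry."""

    def __init__(self, entries=1024, max_count=15, threshold=15):
        self.entries = entries
        self.max_count = max_count
        self.threshold = threshold
        self.counters = [0] * entries

    def _index(self, pc):
        return (pc >> 2) % self.entries  # assumes 4-byte aligned instructions

    def is_low_confidence(self, pc):
        return self.counters[self._index(pc)] < self.threshold

    def update(self, pc, mispredicted):
        i = self._index(pc)
        if mispredicted:
            self.counters[i] = 0  # any misprediction resets confidence
        else:
            self.counters[i] = min(self.counters[i] + 1, self.max_count)


def should_checkpoint(table, pc, free_checkpoints):
    """Checkpoint allocation: spend a GC only on a low-confidence branch."""
    return free_checkpoints > 0 and table.is_low_confidence(pc)


table = ConfidenceTable()
pc = 0x400100
print(should_checkpoint(table, pc, free_checkpoints=4))   # True: never seen, low confidence
for _ in range(15):
    table.update(pc, mispredicted=False)
print(should_checkpoint(table, pc, free_checkpoints=4))   # False: now high confidence
```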

11 Conventional Mechanisms: Recovery Scenarios
Misspeculation on a branch with a GC: direct, fast recovery from the checkpoint. Misspeculation on a branch without a GC: indirect, slow recovery. With intelligent checkpointing, only 30% of recoveries are indirect, yet they account for 75% of the performance loss.

12 Outline: Background, BranchTap, Methodology and Results, Summary

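13 BranchTap Motivation
No-wait scenario: the processor keeps speculating past a low-confidence branch that has no checkpoint; when the misprediction is discovered, it pays a recovery cost that grows with the work done after the branch. Wait scenario: the processor stalls after the low-confidence branch, so less work must be undone and the recovery cost is lower. Sometimes, it is better to wait if no checkpoint is available.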

14 BranchTap Concept
Key idea: stall when speculation is likely to deteriorate performance. Count the number of low-confidence branches without a checkpoint; if the count exceeds a threshold, stall. Threshold selection: a fixed threshold’s best value varies greatly across programs, and a poor choice can deteriorate performance significantly. An adaptive threshold gives robust performance, minimizing recovery cost while preserving good speculation opportunities.
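The counting rule above needs only a counter and a comparison, in line with the summary’s “few counters and comparators”. A minimal Python sketch, assuming the count covers in-flight branches from rename until they resolve or are squashed; the class and method names are hypothetical, and where the stall is applied (fetch or rename) is left open.

```python
class BranchTapThrottle:
    """Count in-flight low-confidence branches without a checkpoint; stall above a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.unprotected = 0  # in-flight low-confidence branches with no GC

    def on_branch_renamed(self, low_confidence, got_checkpoint):
        if low_confidence and not got_checkpoint:
            self.unprotected += 1

    def on_branch_resolved(self, low_confidence, got_checkpoint):
        """Call when such a branch resolves or is squashed."""
        if low_confidence and not got_checkpoint:
            self.unprotected -= 1

    def should_stall(self):
        return self.unprotected > self.threshold


tap = BranchTapThrottle(threshold=2)
for _ in range(3):
    tap.on_branch_renamed(low_confidence=True, got_checkpoint=False)
tap.on_branch_renamed(low_confidence=False, got_checkpoint=False)  # confident: not counted
print(tap.should_stall())  # True: 3 unprotected branches > threshold of 2
tap.on_branch_resolved(low_confidence=True, got_checkpoint=False)
print(tap.should_stall())  # False: back to 2
```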

15 Threshold Adaptation Policy
BranchTap adapts across and within applications
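The slide does not detail the adaptation policy. Purely to illustrate what an adaptive threshold can look like, here is a hypothetical interval-based hill-climbing rule in Python: after each interval, the threshold keeps moving in the same direction while performance improves and reverses direction otherwise. This is an assumption, not BranchTap’s actual policy; the bounds, step size, and performance metric are placeholders.

```python
class HillClimbingThreshold:
    """Hypothetical per-interval threshold adaptation by hill climbing."""

    def __init__(self, initial=4, lo=0, hi=16):
        self.threshold = initial
        self.lo, self.hi = lo, hi
        self.step = 1          # current search direction (+1 or -1)
        self.prev_perf = None  # performance measured in the previous interval

    def end_interval(self, perf):
        """perf: e.g. instructions committed in the interval (higher is better)."""
        if self.prev_perf is not None and perf < self.prev_perf:
            self.step = -self.step  # the last move hurt: reverse direction
        self.prev_perf = perf
        # Clamp the new threshold to [lo, hi].
        self.threshold = max(self.lo, min(self.hi, self.threshold + self.step))
        return self.threshold


adapt = HillClimbingThreshold(initial=4)
history = [adapt.end_interval(perf) for perf in (100, 110, 90)]
print(history)  # [5, 6, 5]
```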

16 Outline: Background, BranchTap, Methodology and Results, Summary

17 Results Overview
Performance without checkpoints: BranchTap improves performance even with just an ROB. Performance with 4 checkpoints: BranchTap improves over conventional recovery methods. Performance vs. larger checkpoint predictors: BranchTap offers better performance than a 64x larger predictor.

18 Methodology
Simulator based on SimpleScalar. 24 SPEC CPU 2000 benchmarks with reference inputs. Processor configuration: 8-way out-of-order (OoO) core with up to 1K in-flight instructions, and a 1K-entry confidence table for identifying low-confidence branches. Each run simulates 1B committed instructions after skipping the first 100B.

19 “Perfect Checkpointing” Configuration
A checkpoint is auto-magically taken at all mispredicted branches, so all recoveries are fast. We normalize our performance results to this configuration and report the “deterioration relative to perfect checkpointing”. We compare BranchTap against the obvious solution of unrestricted speculation.
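The slides do not give the exact formula, so the following Python sketch assumes deterioration means extra execution time over the perfect-checkpointing run, and reads the summary’s “misprediction penalty removed” as the fraction of the conventional mechanism’s deterioration that BranchTap eliminates. The function names and the cycle counts in the usage line are made up for illustration; they are not results from the study.

```python
def deterioration(cycles, perfect_cycles):
    """Slowdown of a configuration relative to the perfect-checkpointing run."""
    return (cycles - perfect_cycles) / perfect_cycles


def penalty_removed(baseline_cycles, branchtap_cycles, perfect_cycles):
    """Fraction of the baseline's deterioration that BranchTap eliminates."""
    base = deterioration(baseline_cycles, perfect_cycles)
    bt = deterioration(branchtap_cycles, perfect_cycles)
    return 1 - bt / base


# Hypothetical cycle counts, chosen only to show the arithmetic.
print(penalty_removed(baseline_cycles=200, branchtap_cycles=150, perfect_cycles=100))  # 0.5
```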

20 Performance with No Checkpoints
Deterioration relative to “perfect checkpointing” (lower is better). BranchTap reduces the deterioration by 39% and improves over conventional mechanisms. Adaptation leads to robust performance improvements.

21 Performance Evaluation with 4 Checkpoints
Deterioration relative to “perfect checkpointing” (lower is better). BranchTap reduces the deterioration by 28%, and BranchTap with 4 checkpoints performs better than 6 checkpoints alone.

22 BranchTap vs. Larger Checkpoint Predictors
BranchTap with a 1K-entry confidence table and 4 GCs achieves higher performance than a 64K-entry confidence table with 4 GCs, at lower complexity: it comes virtually “for free”. (Chart: deterioration vs. confidence table size; lower is better.)

23 Outline: Background, BranchTap, Methodology and Results, Summary

24 Summary
Performance with 4 (no) checkpoints: ~28% (39%) of the misprediction penalty is removed. BranchTap is robust: up to 6% (13%) better and at most 1.2% (0.1%) worse than conventional mechanisms. BranchTap is very simple to implement, needing only a few counters and comparators. BranchTap is better than other alternatives: BT + 1K predictor is better than a 64K predictor alone, and BT + 4 GCs is better than 6 GCs alone.

25 BranchTap: Improving Performance With Very Few Checkpoints Through Adaptive Speculation Control
Patrick Akl and Andreas Moshovos AENAO Research Group Department of Electrical and Computer Engineering University of Toronto {pakl,