Published by Osborn Barnett. Modified over 8 years ago.
HIPEAC 2008
TurboROB: A Low Cost Checkpoint/Restore Accelerator
Patrick Akl (1) and Andreas Moshovos
AENAO Research Group
Department of Electrical and Computer Engineering
University of Toronto
(1) Now with AMD/ATI

2
2/25 HIPEAC 2008 TurboROB We wish to make the recovery fast What Happens on a Branch Misprediction? Execution Timeline Misprediction Discovered Recover Processor State Redirect Fetch Resume Execution Predict a Branch Outcome Predicted PathCorrect Path
3
3/25 HIPEAC 2008 TurboROB ROB: –Buffer all changes –Slow Instantaneous checkpoints: –Snapshot before speculating –Fast –Problem: can’t have enough checkpoints Checkpoint prediction –Allocate the few checkpoints judiciously Speculation control –Sometimes deeper speculation = higher recovery cost Can hurt performance –Throttle speculation Recover Mechanisms Overview
4
4/25 HIPEAC 2008 TurboROB Complements or Replaces Existing Mechanisms ROB: recover at any point TurboROB: recover only at frequent points Improves performance for most programs –Misprediction performance penalty reduced by 28% on AVG BranchTap comes “for free” –Very simple to implement –Better than more accurate checkpoint predictors TurboROB Overview
5
5/25 HIPEAC 2008 TurboROB Outline Background BranchTap Methodology and Results Summary
6
6/25 HIPEAC 2008 TurboROB State Recovery Example: Register Alias Table RAT Architectural Register Physical Register # arch. regs Lg(# arch. regs) A add r1, r2, 100 B breq r1, E Csub r1, r2, r2 Original Code A add p4, p2, 100 B breq p4, E Csub r5, p2, p2 Renamed Code p1 p2 p3 p4p5 p4
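The renaming step in this example can be sketched in a few lines of Python. This is a minimal illustration, not the authors' hardware: the function name `rename`, the tuple layout for instructions, the initial mapping r1→p1, r2→p2, r3→p3 and the free list are all assumptions made here.

```python
def rename(instructions, rat, free_list):
    """Rename architectural registers to physical ones using a RAT.

    instructions: list of (op, dest, sources); dest is None for branches.
    rat: dict mapping architectural register -> physical register.
    free_list: physical registers available for allocation, in order.
    """
    renamed = []
    for op, dest, sources in instructions:
        # Sources read the current mapping; immediates and labels pass through.
        new_sources = [rat.get(s, s) for s in sources]
        new_dest = None
        if dest is not None:
            # Each register write gets a fresh physical register,
            # and the RAT is updated to point at it.
            new_dest = free_list.pop(0)
            rat[dest] = new_dest
        renamed.append((op, new_dest, new_sources))
    return renamed


# Assumed initial state: r1 -> p1, r2 -> p2, r3 -> p3; p4 and p5 are free.
rat = {"r1": "p1", "r2": "p2", "r3": "p3"}
program = [
    ("add", "r1", ["r2", 100]),    # A
    ("breq", None, ["r1", "E"]),   # B
    ("sub", "r1", ["r2", "r2"]),   # C
]
renamed = rename(program, rat, ["p4", "p5"])
```

After C is renamed, the RAT maps r1 to p5. If B turns out to be mispredicted, r1 must map to p4 again, which is the state-recovery problem the following slides address.
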
7
7/25 HIPEAC 2008 TurboROB ROB: Slow, Fine-Grain Recovery Too slow: recovery latency proportional to number of instructions to squash Reorder Buffer BBBBB 1.Misprediction discovered 2. Locate newest instruction 3. Undo RAT updates in reverse order Program Order RAT INVALID Each entry contains 1.Architectural destination register 2.Its previous RAT map
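The reverse walk can be sketched as follows. It is a simplified model under assumed names (`rob_recover`, a list of pairs standing in for ROB entries), intended only to show why the cost grows with the number of squashed instructions.

```python
def rob_recover(rob, rat, branch_index):
    """Undo the RAT updates of every ROB entry younger than the branch.

    rob: list in program order of (arch_dest, previous_mapping) pairs;
         arch_dest is None for instructions that write no register.
    branch_index: position of the mispredicted branch in the ROB.
    Returns the number of entries walked, a proxy for recovery latency.
    """
    steps = 0
    # Walk from the newest instruction back to just after the branch.
    while len(rob) > branch_index + 1:
        arch_dest, previous = rob.pop()
        if arch_dest is not None:
            # Restore the mapping that was in place before this instruction.
            rat[arch_dest] = previous
        steps += 1
    return steps


# State after renaming A, B, C from the earlier example (assumed values).
rat = {"r1": "p5", "r2": "p2"}
rob = [("r1", "p1"), (None, None), ("r1", "p4")]  # A, B (branch), C
steps = rob_recover(rob, rat, branch_index=1)
```

Here only C is squashed, so one step restores r1 to p4. With hundreds of instructions in flight behind the branch, the same loop would take hundreds of steps.
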
8
8/25 HIPEAC 2008 TurboROB Global Checkpoints: Fast, Coarse-Grain Recovery Branch w/ GC: Recovery is “Instantaneous” Reorder Buffer BBBBB 1.Misprediction discovered Program Order RAT INVALID checkpoint
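A checkpointed RAT can be modelled as a working copy plus a small, bounded set of snapshots. The class below is an illustrative sketch; `CheckpointedRAT`, its method names and the `max_checkpoints` parameter are assumptions, and real hardware copies the snapshot in parallel rather than through a dictionary copy.

```python
class CheckpointedRAT:
    """Working RAT copy plus a bounded number of global checkpoints."""

    def __init__(self, mapping, max_checkpoints):
        self.mapping = dict(mapping)
        self.max_checkpoints = max_checkpoints
        self.checkpoints = []  # (branch_id, snapshot) pairs, oldest first

    def take_checkpoint(self, branch_id):
        """Snapshot the RAT at a branch; fails when no checkpoint is free."""
        if len(self.checkpoints) >= self.max_checkpoints:
            return False
        self.checkpoints.append((branch_id, dict(self.mapping)))
        return True

    def restore(self, branch_id):
        """Recover in one step if the branch has a checkpoint."""
        for i, (bid, snapshot) in enumerate(self.checkpoints):
            if bid == branch_id:
                self.mapping = snapshot
                # This checkpoint and all younger ones are no longer needed.
                del self.checkpoints[i:]
                return True
        return False  # no checkpoint: fall back to the slow ROB walk


rat = CheckpointedRAT({"r1": "p4", "r2": "p2"}, max_checkpoints=1)
took_first = rat.take_checkpoint("B")
rat.mapping["r1"] = "p5"                 # speculative update after branch B
took_second = rat.take_checkpoint("B2")  # denied: only one checkpoint exists
restored = rat.restore("B")
```

The restore cost does not depend on how many instructions followed the branch, but the second branch could not get a checkpoint at all, which is the limitation the next slide discusses.
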
9
9/25 HIPEAC 2008 TurboROB Impact of More Checkpoints More checkpoints ? –Power hungry structure –Increased delay Only a few checkpoints can practically be implemented –Cannot always cover all branches architectural register physical register Actual Implementation Working Copy checkpoints RAT Concept
10
10/25 HIPEAC 2008 TurboROB Intelligent Checkpointing State of the art solution –Checkpoint allocation: Allocate checkpoints at hard-to- predict branches –Checkpoint management: Release checkpoints as soon as they are no longer needed Use few checkpoints efficiently
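One common way to identify hard-to-predict branches is a table of resetting counters indexed by branch address. The slides do not describe the estimator used, so the sketch below is an assumed design: the class `ConfidenceTable`, the counter limits, the modulo indexing and the helper `should_checkpoint` are all hypothetical.

```python
class ConfidenceTable:
    """Resetting-counter confidence estimator (assumed design)."""

    def __init__(self, entries=1024, max_count=15, threshold=15):
        self.counters = [0] * entries
        self.max_count = max_count
        self.threshold = threshold

    def _index(self, pc):
        # Simple modulo indexing; real designs may hash in branch history.
        return pc % len(self.counters)

    def is_low_confidence(self, pc):
        return self.counters[self._index(pc)] < self.threshold

    def update(self, pc, predicted_correctly):
        i = self._index(pc)
        if predicted_correctly:
            # Saturating increment on each correct prediction.
            self.counters[i] = min(self.counters[i] + 1, self.max_count)
        else:
            # A single misprediction resets the counter.
            self.counters[i] = 0


def should_checkpoint(table, pc, free_checkpoints):
    """Allocate a checkpoint only at low confidence branches."""
    return free_checkpoints > 0 and table.is_low_confidence(pc)


table = ConfidenceTable(entries=1024, max_count=3, threshold=3)
before = should_checkpoint(table, pc=0x400, free_checkpoints=4)
for _ in range(3):
    table.update(0x400, predicted_correctly=True)
after_streak = should_checkpoint(table, pc=0x400, free_checkpoints=4)
table.update(0x400, predicted_correctly=False)
after_miss = should_checkpoint(table, pc=0x400, free_checkpoints=4)
```

A branch earns high confidence only after a run of correct predictions, so checkpoints are spent where a misprediction is more likely.
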
11
11/25 HIPEAC 2008 TurboROB Mispeculation on a branch w/ a GC: Direct recovery Mispeculation on a branch w/o a GC: Indirect recovery With intelligent checkpointing: 30% Indirect recoveries 75% of performance loss Conventional Mechanisms: Recovery Scenarios BBB ROB BBB checkpoint Fast Recovery Slow Recovery checkpoint
BranchTap Motivation
[Figure: two ROB timelines, a "No Wait Scenario" and a "Wait Scenario". Each shows a checkpoint, a low confidence branch, the point where the misprediction is discovered, and the resulting recovery cost.]
Sometimes, it is better to wait if no checkpoint is available

BranchTap Concept
Key idea: stall when speculation is likely to deteriorate performance
  – Count the number of low confidence branches w/o a checkpoint
  – If it exceeds a threshold, stall
Threshold selection
  – Fixed: varies greatly across programs; can deteriorate performance significantly
  – Adaptive: robust performance
Minimize recovery cost while conserving good speculation opportunities

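The counting rule can be sketched with a single counter and a comparator. This is an illustration with a fixed threshold only: the class and method names are assumptions, the adaptive policy is not modelled because the slides do not give its details, and squashed branches would also need to be subtracted in a fuller model.

```python
class BranchTap:
    """Stall fetch when too many unprotected low confidence branches are in flight."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.unprotected = 0  # low confidence branches w/o a checkpoint

    def on_branch_dispatch(self, low_confidence, has_checkpoint):
        if low_confidence and not has_checkpoint:
            self.unprotected += 1

    def on_branch_resolve(self, low_confidence, has_checkpoint):
        # A resolved branch no longer contributes to the recovery risk.
        if low_confidence and not has_checkpoint:
            self.unprotected -= 1

    def should_stall(self):
        # Stall only when the count exceeds the threshold.
        return self.unprotected > self.threshold


tap = BranchTap(threshold=1)
tap.on_branch_dispatch(low_confidence=True, has_checkpoint=False)
stall_after_one = tap.should_stall()
tap.on_branch_dispatch(low_confidence=True, has_checkpoint=False)
stall_after_two = tap.should_stall()
tap.on_branch_dispatch(low_confidence=True, has_checkpoint=True)  # protected
tap.on_branch_resolve(low_confidence=True, has_checkpoint=False)
stall_after_resolve = tap.should_stall()
```

Branches that hold a checkpoint never raise the count, since their recovery is fast regardless of how far speculation proceeds.
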
Threshold Adaptation Policy
BranchTap adapts across and within applications

Results Overview
Performance w/o checkpoints
  – BranchTap improves even with just an ROB
Performance w/ 4 checkpoints
  – BranchTap improves over conventional recovery methods
Performance w/ larger checkpoint predictors
  – BranchTap offers better performance than a 64x larger predictor

Methodology
Simulator based on SimpleScalar
24 SPEC CPU 2000 benchmarks
  – Reference inputs
Processor configurations
  – 8-way OoO core
  – Up to 1K in-flight instructions
  – 1K-entry confidence table for low confidence branch identification
1B committed instructions after skipping 100B

“Perfect Checkpointing” Configuration
A checkpoint is automatically ("auto-magically") taken at all mispredicted branches
  – All recoveries are fast
We report the “deterioration relative to perfect checkpointing”

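The slides do not spell out how this metric is computed. One plausible reading, assumed here, is the fractional IPC loss against the perfect-checkpointing configuration, with a mechanism's benefit expressed as the share of that loss it removes. The function names and the IPC values below are hypothetical and serve only to show the arithmetic.

```python
def deterioration(ipc, ipc_perfect):
    """Fractional IPC loss relative to perfect checkpointing (assumed definition)."""
    return 1.0 - ipc / ipc_perfect


def deterioration_reduction(ipc_base, ipc_new, ipc_perfect):
    """Fraction of the baseline deterioration removed by the new mechanism."""
    base = deterioration(ipc_base, ipc_perfect)
    new = deterioration(ipc_new, ipc_perfect)
    return (base - new) / base


# Hypothetical IPC values, chosen only to illustrate the arithmetic.
reduction = deterioration_reduction(ipc_base=1.8, ipc_new=1.9, ipc_perfect=2.0)
```

With these made-up numbers the baseline loses about 10% against perfect checkpointing and the new mechanism about 5%, so roughly half of the deterioration is removed.
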
Performance with No Checkpoints
[Figure: deterioration relative to “perfect checkpointing”; annotation: -39% deterioration; an arrow marks the "better" direction.]
BranchTap improves over conventional mechanisms
Adaptation leads to robust performance improvements

Performance Evaluation with 4 Checkpoints
[Figure: deterioration relative to “perfect checkpointing”; annotation: -28% deterioration; an arrow marks the "better" direction.]
BranchTap with 4 checkpoints is better than 6 checkpoints alone

BranchTap vs. Larger Checkpoint Predictors
[Figure: deterioration versus confidence table size, with the BranchTap result marked; an arrow marks the "better" direction.]
BranchTap with a 1K-entry confidence table and 4 GCs:
  – Higher performance than a 64K-entry confidence table with 4 GCs
  – Lower complexity, virtually comes “for free”

Summary
Performance with 4 (no) checkpoints
  – ~28 (39)% of misprediction penalty removed
  – BranchTap is robust: up to 6 (13)% better and at most 1.2 (0.1)% worse than conventional mechanisms
BranchTap is very simple to implement
  – Few counters and comparators
BranchTap is better than other alternatives
  – BT + 1K predictor better than a 64K predictor alone
  – BT + 4 GCs better than 6 GCs alone

BranchTap: Improving Performance With Very Few Checkpoints Through Adaptive Speculation Control
Patrick Akl and Andreas Moshovos
AENAO Research Group
Department of Electrical and Computer Engineering
University of Toronto
{pakl, moshovos}@eecg.toronto.edu