Published by Junior Perkins. Modified over 9 years ago.
1
1/25 HIPEAC 2008 TurboROB
TurboROB
A Low Cost Checkpoint/Restore Accelerator
Patrick Akl and Andreas Moshovos
AENAO Research Group
Department of Electrical and Computer Engineering
University of Toronto
{pakl, moshovos}@eecg.toronto.edu
2
Recovering From Control Flow Mispredictions
Accelerate Recovery – Improve Performance
[Figure: Execution Timeline. Labels: Predict a Branch Outcome, Predicted Path, Misprediction Discovered, Recover Processor State, Redirect Fetch, Resume Execution, Correct Path]
3
State-of-the-Art Recovery
–Log of Changes (ROB); each entry records: what, old value
–State Snapshot
Scalability and/or Performance Issues
[Figure labels: Predict a Branch Outcome, Misprediction Discovered]
4
Turbo-ROB
Make common case fast:
–Recover only at branches
Store only as much as needed:
–Partial Log
[Figure labels: Predict a Branch Outcome, Misprediction Discovered; ROB: Log of Changes; Turbo-ROB: Partial Log of Changes]
5
Outline
–Control Flow Mispeculation Recovery
–TurboROB
–Methodology and Results
–Summary
6
State Recovery Example: Register Alias Table
RAT: Architectural Register to Physical Register
[Figure dimensions: # arch. regs, Lg(# arch. regs)]
Original Code:
A add r1, r2, 100
B breq r1, E
C sub r1, r2, r2
Renamed Code:
A add p4, p2, 100
B breq p4, E
C sub p5, p2, p2
[Figure: RAT contents; labels: p1, p2, p3, p4, p5, p4]
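The renaming in this example can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the initial mapping (r1 to p1, r2 to p2, r3 to p3), the free list of p4 and p5, and the function name `rename` are assumptions, and the destination of instruction C is taken to be p5.

```python
def rename(code, rat, free_list):
    """Rename instructions given as (opcode, dest, sources) tuples.

    rat maps architectural to physical registers and is updated in place.
    free_list is a hypothetical list of unused physical registers.
    """
    renamed = []
    for opcode, dest, sources in code:
        # Sources read the current mapping; immediates and labels pass through.
        new_sources = [rat.get(s, s) for s in sources]
        new_dest = None
        if dest is not None:
            # Each destination is given a fresh physical register.
            new_dest = free_list.pop(0)
            rat[dest] = new_dest
        renamed.append((opcode, new_dest, new_sources))
    return renamed


code = [
    ("add", "r1", ["r2", "100"]),   # A
    ("breq", None, ["r1", "E"]),    # B
    ("sub", "r1", ["r2", "r2"]),    # C
]
rat = {"r1": "p1", "r2": "p2", "r3": "p3"}  # assumed initial mapping
renamed = rename(code, rat, ["p4", "p5"])
```

If branch B turns out to be mispredicted, the r1 entry has to go back to p4, the mapping that was current when B was renamed.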
7
ROB: Slow, Fine-Grain Recovery
1. Misprediction discovered
2. Locate newest instruction
3. Undo RAT updates in reverse order
Each entry contains:
1. Architectural destination register
2. Its previous RAT map
Too slow: recovery latency proportional to number of instructions to squash
[Figure: Reorder Buffer holding branches (B) in Program Order; RAT labeled INVALID]
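The three recovery steps can be sketched as a backward walk over the log. This is a simplified model under stated assumptions: the ROB is a Python list of (architectural destination, previous RAT map) pairs in program order, the RAT is a dict, and `rob_recover` and `branch_index` are hypothetical names.

```python
def rob_recover(rat, rob, branch_index):
    """Undo the RAT updates of every entry younger than rob[branch_index].

    Entries are (arch_dest, prev_map) pairs in program order; arch_dest is
    None for instructions without a destination, such as branches.
    Returns the number of entries walked, a stand-in for recovery latency.
    """
    steps = 0
    while len(rob) > branch_index + 1:
        # Start from the newest instruction and move toward the branch.
        arch_dest, prev_map = rob.pop()
        if arch_dest is not None:
            rat[arch_dest] = prev_map
        steps += 1
    return steps


# Five in-flight instructions; the branch sits at index 1.
rob = [("r1", "p1"), (None, None), ("r1", "p4"), ("r2", "p2"), ("r1", "p5")]
rat = {"r1": "p7", "r2": "p6"}
steps = rob_recover(rat, rob, branch_index=1)
```

The step count grows with the number of squashed instructions, which is the latency problem named on this slide.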
8
Global Checkpoints: Fast, Coarse-Grain Recovery
1. Misprediction discovered
Branch w/ GC: Recovery is “Instantaneous”
[Figure: Reorder Buffer holding branches (B) in Program Order; a checkpoint; RAT labeled INVALID]
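A global checkpoint can be modeled as a full copy of the RAT taken when the branch is renamed. The sketch below is an illustration only; the dict-based RAT and the names `take_checkpoint` and `restore_checkpoint` are assumptions, and real hardware copies the table in place rather than allocating a new one.

```python
def take_checkpoint(rat):
    """Snapshot the whole RAT at a branch."""
    return dict(rat)


def restore_checkpoint(rat, snapshot):
    """Restore the RAT in one step, however many instructions are squashed."""
    rat.clear()
    rat.update(snapshot)


rat = {"r1": "p4", "r2": "p2"}
snapshot = take_checkpoint(rat)   # taken at the branch
rat["r1"] = "p5"                  # wrong-path updates
rat["r2"] = "p6"
rat["r1"] = "p7"
restore_checkpoint(rat, snapshot)
```

Recovery cost no longer depends on how many instructions followed the branch, but every checkpoint stores a full RAT image.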
9
Impact of More Checkpoints
More checkpoints?
–Power hungry structure
–Increased delay
Only a few checkpoints can practically be implemented
–Cannot always cover all branches
[Figure: RAT Concept (architectural register to physical register) and Actual Implementation (Working Copy plus checkpoints)]
10
Intelligent Checkpointing & BranchTap
Use Few Checkpoints Effectively
BranchTap:
–Throttle Speculation
[Figure: branches (B) with a checkpoint]
11
Conventional Mechanisms: Recovery Scenarios
[Figure: three scenarios over branches (B); labels: checkpoint, Re-Execution]
12
Outline
–Background
–Turbo-ROB
–Methodology and Results
–Summary
13
Turbo-ROB
We only need to reverse the first subsequent change for every RAT entry
[Figure: ROB Recovery after a branch (B); entries R1, R2, R1 marked useful or redundant; ~ Recovery Cost]
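The partial-log idea can be sketched for a single recovery point. This is a simplified model, not the TROB hardware: the `seen` set stands in for whatever tracking the real design uses, and the names `trob_log` and `trob_recover` are hypothetical.

```python
def trob_log(updates_after_branch):
    """Keep only the first update to each architectural register.

    updates_after_branch lists (arch_dest, prev_map) pairs in program order
    for the instructions that follow the branch.
    """
    seen = set()
    log = []
    for arch_dest, prev_map in updates_after_branch:
        if arch_dest not in seen:
            # Useful: prev_map is the mapping that was live at the branch.
            seen.add(arch_dest)
            log.append((arch_dest, prev_map))
        # Otherwise redundant: an older entry already restores this register.
    return log


def trob_recover(rat, log):
    """Undo the logged updates; returns the number of entries walked."""
    for arch_dest, prev_map in reversed(log):
        rat[arch_dest] = prev_map
    return len(log)


updates = [("r1", "p4"), ("r2", "p2"), ("r1", "p5")]  # R1, R2, R1
log = trob_log(updates)
rat = {"r1": "p7", "r2": "p6"}
steps = trob_recover(rat, log)
```

Here the second R1 update is never logged, so recovery walks two entries instead of three and still reaches the same RAT state.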
14
Turbo-ROB Replacing the ROB
[Figure: recovery scenarios over branches (B) with the TROB; label: Re-Execution]
15
Selective Turbo-ROB w/ ROB
[Figure: branches (B) with the TROB]
Selective Turbo-ROB w/ GCs
[Figure: branches (B) with the TROB and a checkpoint]
16
Outline
–Background
–TurboROB
–Methodology and Results
–Summary
17
Results Overview
TROB as an ROB replacement
–BranchTap offers better performance than ROB
–Fewer resources
–Even for smaller windows
Selective TROB as a GC reduction mechanism
–TROB reduces pressure for GCs
–Offload a critical structure: RAT
In the paper:
–Selective TROB as an ROB accelerator
–Even the smallest TROB accelerates recovery
18
Methodology
Simulator based on SimpleScalar
–Alpha/OSF
24 SPEC CPU 2000 benchmarks
Reference Inputs
Processor configurations
–4-way OoO core
–128/256/512 in-flight instructions
–1K-entry confidence table for low confidence branch identification / similar results with Anyweak
1B committed instructions after skipping 2B
19
“Perfect Checkpointing” Configuration
A checkpoint is auto-magically taken at all mispredicted branches
–All recoveries are fast
We report the “deterioration relative to perfect checkpointing”
20
TROB Replacing the ROB / 512-Entry Window
64-entry TROB == ROB on the Average
Pathological cases exist: 256-entry needed
512-Entry TROB better than ROB
[Chart; axis annotation: better]
21
TROB Replacing the ROB / 128-Entry Window
64-Entry 50% better than ROB
Fewer pathological cases
128-Entry TROB better than ROB
[Chart; axis annotation: better]
22
sTROB and Global Checkpoints / 128-Entry Window
TROB + 1 GC better than 4 GCs
[Chart; axis annotation: better]
23
Summary
TROB vs. ROB
–Replacement
  Same resources: better performance
  Fewer resources: often better performance
  –Except when accuracy is high
–Acceleration: ¼ resources, 35% improvement
TROB vs. GCs
–Reduce pressure from the critical path
–With just 1 GC match the performance of four GCs
One more alternative for designers
–Allows different area/performance/power tradeoffs