1
NC STATE UNIVERSITY
Transparent Control Independence (TCI)
Ahmed S. Al-Zawawi, Vimal K. Reddy, Eric Rotenberg, Haitham H. Akkary*
Dept. of Electrical & Computer Engineering, North Carolina State University, Raleigh, NC
*Digital Enterprise Group, Intel Corporation, Hillsboro, OR
2
Effect of branch mispredictions
Branch misprediction rates of 5%-10% are still a problem: each misprediction squashes hundreds of instructions.
Reduces performance: limits the window size.
Increases power: wasted speculative work.
© 2007 Ahmed S. Al-Zawawi, ISCA 34
3
Control independence basics
11
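Four steps for exploiting CI (control independence)
1. Identify the reconvergence point.
2. Remove the wrong control-dependent (CD) instructions and insert the correct ones.
3. Identify the control-independent, data-dependent (CIDD) instructions.
4. Repair the CIDD instructions:
   a) Fix data dependencies.
   b) Re-execute the CIDD instructions.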
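Steps 1 and 3 can be illustrated with a toy, after-the-fact classifier. This is a minimal Python sketch, not the paper's hardware: it assumes both the wrong and the correct CD paths are already known, treats every register written on either path as influenced, and walks the CI instructions in program order, marking an instruction CIDD if it reads an influenced register. The `Inst` record and `classify_ci` function are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Inst:
    name: str
    dest: Optional[str]      # architectural destination register, if any
    srcs: Tuple[str, ...]    # architectural source registers


def classify_ci(wrong_cd: Sequence[Inst], correct_cd: Sequence[Inst],
                ci: Sequence[Inst]) -> Tuple[List[str], List[str]]:
    """Split CI instructions (reconvergence point onward) into CIDD and CIDI.

    Idealized: both CD paths are known, so any register written on either
    path may hold a different value once the branch is corrected.
    """
    influenced = {i.dest for i in [*wrong_cd, *correct_cd] if i.dest is not None}
    cidd: List[str] = []
    cidi: List[str] = []
    for inst in ci:
        if any(src in influenced for src in inst.srcs):
            # Reads a CD-produced value, directly or through another CIDD.
            cidd.append(inst.name)
            if inst.dest is not None:
                influenced.add(inst.dest)
        else:
            cidi.append(inst.name)
            # A CIDI write replaces the register with a branch-independent value.
            influenced.discard(inst.dest)
    return cidd, cidi
```

For example, if the wrong path writes r1 and the correct path writes r3, a CI instruction reading r1 is CIDD, while a CI read of r3 is CIDI once an intervening CIDI instruction has overwritten r3.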
12
Conventional CI misprediction recovery
1. Identify the wrong CD instructions and the CIDD instructions.
2. Squash the wrong CD instructions.
3. Insert the correct CD instructions in the middle of the window to repair program order.
4. Re-execute the CIDD instructions, re-referencing their source values from CIDI (control-independent, data-independent) instructions.
Figure legend: R (reconvergence point), CI inst., CD inst., wrong CD instructions, CIDD instructions, CIDI-supplied source value.
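A minimal list-level model of this conventional recovery, assuming the window is a Python list in program order and that the CIDD set is already known. The function name, argument layout and instruction names are hypothetical, and ROB and rename-map details are ignored. What it shows is the splice: the correct CD path must be inserted between the branch and the reconvergence point, in the middle of the window.

```python
from typing import List, Sequence, Set, Tuple


def conventional_ci_recovery(window: Sequence[str], branch_idx: int, reconv_idx: int,
                             correct_cd: Sequence[str],
                             cidd: Set[str]) -> Tuple[List[str], List[str]]:
    """Model conventional CI recovery on a window kept in program order.

    window[branch_idx] is the mispredicted branch, window[branch_idx+1:reconv_idx]
    is the wrong CD path, and window[reconv_idx:] holds the CI instructions.
    """
    if not branch_idx < reconv_idx <= len(window):
        raise ValueError("reconvergence point must follow the branch")
    head = list(window[:branch_idx + 1])   # branch and older instructions: untouched
    ci_tail = list(window[reconv_idx:])    # CI instructions keep their slots
    # Squash the wrong CD path and splice the correct CD path into the middle
    # of the window so that program order is preserved.
    new_window = head + list(correct_cd) + ci_tail
    # Only CIDD instructions re-execute; CIDI instructions keep their results.
    reexecute = [name for name in ci_tail if name in cidd]
    return new_window, reexecute
```

Here "R" would be the first instruction at the reconvergence point, for example `conventional_ci_recovery(["B", "w1", "w2", "R", "i1", "i2", "i3"], 0, 3, ["c1"], {"i1", "i3"})`.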
13
Conventional CI limitations
1. Program order between CD and CI instructions: fine-grain retirement using the ROB requires placing the correct CD instructions in program order relative to the CI instructions.
2. Dependence order between CIDD and CIDI instructions: re-executing CIDD instructions requires preserving the CIDI instructions they reference.
Goal of selective misprediction recovery: fully decouple CIDI instructions from CD and CIDD instructions.
14
TCI misprediction recovery
No need to identify the wrong CD and CIDD instructions.
Insert the correct CD instructions like any new instructions.
Insert duplicate CIDD instructions like any new instructions.
Repair program state using a self-sufficient recovery program while relaxing program order.
Figure legend: R (reconvergence point), CI inst., CD inst., correct CD inst., duplicate CIDD inst., recovery program.
15
TCI misprediction recovery
In-order retirement is not possible when instructions are out of program order, so TCI exploits coarse-grain, checkpoint-based retirement to relax ordering constraints.
Checkpoint-based retirement enables aggressive register reclamation (e.g., CPR, Checkpoint Processing and Recovery): completed instructions free their resources.
Leverage the branch checkpoint for the correct CD instructions.
Leverage checkpointed CIDI-supplied source values for the duplicate CIDD instructions to mimic the effect of program order.
Figure labels: R, Checkpoint 1, Checkpoint 2, branch checkpoint, recovery program, correct CD inst., duplicate CIDD inst., CIDD instructions, CIDI-supplied source values.
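To see why a self-sufficient recovery program can repair state without repairing order, here is a toy Python model, an illustration under stated assumptions rather than TCI's hardware. Assumptions: every instruction computes `dest = sum(sources) + imm`; both CD paths are known; CIDD instructions are found with an influenced-register set; and any CIDD source not influenced by the branch is bound to the value it read during its first, wrong-path execution. All names (`Inst`, `execute`, `build_recovery_program`, `run_recovery_program`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

Regs = Dict[str, int]


@dataclass(frozen=True)
class Inst:
    name: str
    dest: Optional[str]
    srcs: Tuple[str, ...]
    imm: int = 0  # toy semantics: dest = sum(source values) + imm


def execute(insts: Sequence[Inst], regs: Regs) -> Tuple[Dict[str, int], Dict[str, Regs]]:
    """Run instructions in program order, updating regs in place.

    Returns each instruction's result and the source values it read."""
    results: Dict[str, int] = {}
    reads: Dict[str, Regs] = {}
    for inst in insts:
        reads[inst.name] = {s: regs[s] for s in inst.srcs}
        results[inst.name] = sum(regs[s] for s in inst.srcs) + inst.imm
        if inst.dest is not None:
            regs[inst.dest] = results[inst.name]
    return results, reads


def build_recovery_program(wrong_cd: Sequence[Inst], correct_cd: Sequence[Inst],
                           ci: Sequence[Inst],
                           first_reads: Dict[str, Regs]) -> List[Tuple[Inst, Regs]]:
    """Correct CD instructions followed by duplicate CIDD instructions.

    Each duplicate keeps the register sources still influenced by the branch
    and binds every other (CIDI-supplied) source to its checkpointed value."""
    influenced = {i.dest for i in [*wrong_cd, *correct_cd] if i.dest is not None}
    program: List[Tuple[Inst, Regs]] = [(i, {}) for i in correct_cd]
    for inst in ci:
        if any(s in influenced for s in inst.srcs):
            bound = {s: first_reads[inst.name][s] for s in inst.srcs if s not in influenced}
            program.append((inst, bound))
            if inst.dest is not None:
                influenced.add(inst.dest)
        else:
            influenced.discard(inst.dest)  # CIDI write: value no longer branch-dependent
    return program


def run_recovery_program(program: Sequence[Tuple[Inst, Regs]],
                         branch_checkpoint: Regs) -> Dict[str, int]:
    """Execute the recovery program on its own, starting from the branch checkpoint."""
    regs = dict(branch_checkpoint)
    results: Dict[str, int] = {}
    for inst, bound in program:
        results[inst.name] = sum(bound[s] if s in bound else regs[s]
                                 for s in inst.srcs) + inst.imm
        if inst.dest is not None:
            regs[inst.dest] = results[inst.name]
    return results
```

In this toy model, running only the correct CD instructions and the duplicate CIDD instructions from the branch checkpoint gives each of them the same result as re-executing the whole correct path in program order; the CIDI instructions are never touched.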
16
Transparent Control Independence
TCI repairs program state, not program order.
The TCI pipeline is recovery-free: recovery is transparent, achieved by fetching additional instructions with checkpointed source values.
The TCI pipeline is free-flowing: it leverages conventional speculation to execute correct and incorrect instructions quickly and efficiently, and completed instructions free their resources.
17
TCI microarchitecture
Add a repair rename map.
Add a selective re-execution buffer (RXB).
18
Predict the branch
Instructions execute and leave the pipeline when done.
19
Construct the recovery program
Copy duplicates of the CIDD instructions, with their source values, into the RXB.
20
Insert correct CD instructions
Load the branch checkpoint into the repair rename map, then fetch the correct CD instructions.
21
Repair and re-execute CIDD instructions
Inject the duplicate CIDD instructions with their checkpointed source values.
22
Merge repair and speculative rename maps
Copy the corrected register mappings from the repair map to the speculative map.
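The repair-map steps (load the branch checkpoint, rename the recovery program, merge into the speculative map) can be sketched with dictionaries mapping architectural to physical registers. This is a hedged model: free-list handling is simplified (no reclamation), checkpointed source values are simply left out of renaming, and the rule for which mappings to copy during the merge (registers still in the influenced set, i.e. whose youngest writer is a CD or CIDD instruction) is an assumption, not something the slides state. Function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

RenameMap = Dict[str, str]  # architectural register -> physical register
Renamed = Tuple[str, Optional[str], Tuple[str, ...]]


def rename_recovery_program(branch_checkpoint: RenameMap,
                            program: Sequence[Tuple[str, Optional[str], Tuple[str, ...]]],
                            free_list: Iterable[str]) -> Tuple[RenameMap, List[Renamed]]:
    """Rename correct CD and duplicate CIDD instructions against a repair map.

    program entries are (name, dest, reg_srcs); reg_srcs lists only the sources
    read from registers, since checkpointed source values need no renaming.
    """
    repair = dict(branch_checkpoint)   # load the branch checkpoint into the repair map
    free = list(free_list)
    renamed: List[Renamed] = []
    for name, dest, reg_srcs in program:
        phys_srcs = tuple(repair[s] for s in reg_srcs)
        phys_dest = None
        if dest is not None:
            phys_dest = free.pop(0)    # allocate a new physical register
            repair[dest] = phys_dest
        renamed.append((name, phys_dest, phys_srcs))
    return repair, renamed


def merge_rename_maps(spec_map: RenameMap, repair_map: RenameMap,
                      influenced: Iterable[str]) -> RenameMap:
    """Copy corrected mappings from the repair map into the speculative map.

    Assumption: only registers still in the influenced set are copied;
    registers later overwritten by CIDI instructions keep their speculative
    mapping."""
    merged = dict(spec_map)
    for reg in influenced:
        merged[reg] = repair_map[reg]
    return merged
```

Note that a register written only on the wrong path (r1 in a small example) falls back to its branch-checkpoint mapping, because the repair map starts as a copy of that checkpoint.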
23
TCI implementation details
1. Identifying CIDD instructions: a control-flow stack (CFS) detects nested reconvergence points, and CIDD status is tracked with an influenced register set (IRS) and branch-sets.
2. RXB reconstruction: CIDD instructions of multiple branches are co-mingled in the RXB, so a misprediction may require repairing the RXB.
3. Renaming partial programs: the recovery program must be re-renamed despite its CIDI gaps.
4. Merging the repair and speculative rename maps.
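One way to picture branch-sets for several outstanding branches is a per-register bitmask that acts as an influenced register set for all branches at once. In this minimal sketch (the representation and names are assumptions), bit k stands for unresolved branch k, each instruction's control-dependence mask is taken as given (in TCI that information comes from the control-flow stack), and only the executed path is modeled.

```python
from typing import Dict, Optional, Sequence, Tuple

# (name, cd_mask, dest, srcs): cd_mask has bit k set if the instruction is
# control dependent on unresolved branch k.
TraceEntry = Tuple[str, int, Optional[str], Tuple[str, ...]]


def compute_branch_sets(trace: Sequence[TraceEntry]) -> Dict[str, int]:
    """Return, per instruction, the bitmask of branches it is CIDD on."""
    reg_mask: Dict[str, int] = {}  # per-register mask of influencing branches
    branch_sets: Dict[str, int] = {}
    for name, cd_mask, dest, srcs in trace:
        src_mask = 0
        for s in srcs:
            src_mask |= reg_mask.get(s, 0)
        # Data dependent on branch k but not control dependent on it: CIDD on k.
        branch_sets[name] = src_mask & ~cd_mask
        if dest is not None:
            # The written value depends on every branch that controls this
            # instruction or influences its sources; a mask of 0 clears it.
            reg_mask[dest] = src_mask | cd_mask
    return branch_sets
```

With a single branch this reduces to the usual IRS rule: CD writes add a register, CIDD writes propagate it, and CIDI writes remove it.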
24
Example: constructing the RXB
B1 and B2 are branches; R1 and R2 are reconvergence points.
Rectangular instructions are CIDD on B1; oval instructions are CIDD on B2.
25
Example: reconstructing the RXB
Objective of this example: inject the recovery program for B2 and reconstruct the RXB for B1.
Roll back the RXB tail, like a complete squash.
Initiate the RXB pre-read pointer.
Start fetching the correct CD instructions.
Fetch correct CD: 11 and 12; meanwhile, pre-read 16 into the Temp Buffer.
Dispatch 11: don't insert 11 into the RXB (CIDI w.r.t. B1 & B2).
Dispatch 12: insert 12 into the RXB (CIDD w.r.t. B1).
26
Example: reconstructing the RXB
Fetch correct CD: 13 and 14; meanwhile, pre-read 18 into the Temp Buffer.
Dispatch 13: don't insert 13 into the RXB (CIDI w.r.t. B1 & B2).
Dispatch 14: insert 14 into the RXB (CIDD w.r.t. B1).
Reconvergence point detected: correct CD complete.
27
Example: reconstructing the RXB
Begin renaming CIDD instructions from the Temp Buffer; meanwhile, pre-read 20 into the Temp Buffer.
Don't dispatch 16 (not CIDD w.r.t. B2), but insert 16 into the RXB (CIDD w.r.t. B1).
Dispatch 18 (CIDD w.r.t. B2), but don't insert 18 into the RXB (not CIDD w.r.t. B1).
Dispatch 20 (CIDD w.r.t. B2) and insert 20 into the RXB (CIDD w.r.t. B1).
B2 recovery program injection complete; the B1 recovery program is maintained and compressed.
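The dispatch and insertion decisions in this example follow one rule, sketched below with bitmask branch-sets: an entry younger than the mispredicted branch is dispatched if it is CIDD on that branch, and kept in the reconstructed RXB if it is CIDD on an older, still-unresolved branch; newly fetched correct CD instructions are always dispatched and kept only if CIDD on the older branch. This is a hypothetical model (names and bitmask encoding are assumptions) that ignores timing such as the Temp Buffer pre-reads, and clearing the resolved branch's bit from surviving entries is a simplification of this sketch.

```python
from typing import List, Sequence, Tuple

Entry = Tuple[int, int]  # (instruction id, branch-set bitmask)


def reconstruct_rxb(rxb_tail: Sequence[Entry], correct_cd: Sequence[Entry],
                    mispredicted: int, older: int) -> Tuple[List[int], List[Entry]]:
    """Inject the recovery program for one branch and rebuild the RXB for older ones.

    rxb_tail:     RXB entries younger than the mispredicted branch, in program order
    correct_cd:   newly fetched correct CD instructions with their branch-sets
    mispredicted: bit of the mispredicted branch
    older:        bits of older, still-unresolved branches
    """
    dispatched: List[int] = []
    new_tail: List[Entry] = []
    for inst, bset in correct_cd:
        dispatched.append(inst)                  # correct CD: always dispatched
        if bset & older:
            new_tail.append((inst, bset & ~mispredicted))
    for inst, bset in rxb_tail:
        if bset & mispredicted:
            dispatched.append(inst)              # part of the recovery program
        if bset & older:
            new_tail.append((inst, bset & ~mispredicted))  # still CIDD on an older branch
    return dispatched, new_tail
```

With the slide's numbers (B1 = bit 0, B2 = bit 1), the dispatch order comes out as 11, 12, 13, 14, 18, 20 and the RXB for B1 holds 12, 14, 16, 20, matching the walkthrough.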
28
Simulation methodology
Baseline: checkpoint-based superscalar processor
  Issue width: 4
  Branch predictor: perceptron
  Register file: 256 registers
  Branch checkpoints: 16
  Load/store queue: 512 entries
  L1 I and L1 D: 64KB, 4-way (hit: 1 cycle)
  L2: 2MB, 8-way (hit: 10 cycles, miss: 200 cycles)
Benchmarks: 11 SPEC2000 INT + 4 SPEC95 INT
SimPoint: 10M-instruction warm-up + 100M instructions simulated
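For anyone reproducing the setup, the baseline parameters can be captured in one record. The field names below are hypothetical; only the values come from the slide, and parameters it omits (for example cache line size) are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaselineConfig:
    """Baseline checkpoint-based superscalar parameters from the methodology slide."""
    issue_width: int = 4
    branch_predictor: str = "perceptron"
    register_file_entries: int = 256
    branch_checkpoints: int = 16
    lsq_entries: int = 512
    l1_size_kb: int = 64            # applies to both L1 I and L1 D
    l1_assoc: int = 4
    l1_hit_cycles: int = 1
    l2_size_kb: int = 2048          # 2MB
    l2_assoc: int = 8
    l2_hit_cycles: int = 10
    l2_miss_cycles: int = 200
    warmup_insts: int = 10_000_000
    simulated_insts: int = 100_000_000
```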
29
CIDD instruction re-renaming models
Seq CIDD (TCI): only CIDD instructions are re-renamed and re-executed.
Seq CI [Akkary et al.] [Chou et al.] [Rotenberg et al.]: all CI instructions are re-renamed, but only CIDD instructions re-execute.
Proxy [Cher et al.] [Gandhi et al.]: proxy move instructions insulate CIDD instructions from source name changes, so only the proxies are re-renamed; both proxies and CIDD instructions re-execute from issue queue entries they continue to hold.
All models have relaxed ordering through the checkpoint-based substrate.
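The three models differ only in which instructions pass through rename again and which execute again after a misprediction. The sketch below encodes exactly those rules from the slide; the function name, model keys and proxy names are hypothetical, and the set of proxies is passed in because the slide does not say how many are created per misprediction.

```python
from typing import FrozenSet, Iterable, Tuple


def recovery_work(model: str, ci: Iterable[str], cidd: Iterable[str],
                  proxies: Iterable[str] = ()) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """Return (re-renamed, re-executed) instruction sets for one misprediction."""
    ci, cidd, proxies = frozenset(ci), frozenset(cidd), frozenset(proxies)
    if not cidd <= ci:
        raise ValueError("CIDD instructions must be a subset of CI instructions")
    if model == "seq_cidd":    # TCI: re-rename and re-execute only CIDD
        return cidd, cidd
    if model == "seq_ci":      # re-rename all CI, re-execute only CIDD
        return ci, cidd
    if model == "proxy":       # re-rename proxies, re-execute proxies and CIDD
        return proxies, proxies | cidd
    raise ValueError(f"unknown model: {model}")
```

The rename-bandwidth cost of each model is the size of the first set; Seq CI grows with the whole CI region, while Seq CIDD grows only with the CIDD instructions.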
30
Results for 32- and 64-entry issue queues (64-entry values in parentheses)
TCI average %IPC improvement is 16% (16%); TCI maximum %IPC improvement is 61% (64%).
Proxy average %IPC improvement is 6% (11%), and Proxy can degrade performance.
Seq CI can degrade performance.
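The %IPC improvement figures are read here as the per-benchmark relative IPC gain over the baseline processor, where a negative value is a slowdown, as can happen with Proxy and Seq CI. This helper is an assumption about the metric's definition, and it does not attempt the averaging step, since the slide does not say which mean is used.

```python
def pct_ipc_improvement(ipc_new: float, ipc_base: float) -> float:
    """Percent IPC improvement of a configuration over the baseline (negative = slowdown)."""
    if ipc_base <= 0:
        raise ValueError("baseline IPC must be positive")
    return (ipc_new / ipc_base - 1.0) * 100.0
```

For example, hypothetical IPCs of 1.5 (with CI) and 1.25 (baseline) give an improvement of about 20%.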
31
Varying the issue queue size
TCI is both bandwidth and resource efficient.
Proxy is bandwidth efficient, but resource inefficient.
Seq CI is bandwidth inefficient, but resource efficient.
32
Varying the RXB size
In Seq CI, the RXB limits the window size.
TCI overcomes this problem by buffering only CIDD instructions.
33
Conclusion
Recover program state, not program order: transparent branch misprediction recovery using a fully decoupled recovery program.
Resource efficient: all instructions execute, drain, and free resources quickly, based on conventional speculation.
Bandwidth efficient: TCI re-sequences only CIDD instructions.
34
Questions