1
CS 7810 Lecture 10
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors
O. Mutlu, J. Stark, C. Wilkerson, Y.N. Patt
Proceedings of HPCA-9, February 2003
2
In-Flight Windows
[Figure: Reorder Buffer entries #1 (p33) through #96 (p128), most marked c; Physical Register File split into p1-32 and p33-128; a load instruction with a 300-cycle cache miss.]
3
In-Flight Windows
[Figure: Reorder Buffer entries #1 (p33) through #96 (p128), most marked c; Physical Register File split into p1-32 and p33-128; a load instruction with a 300-cycle cache miss, followed by #97, a second load instruction with a 300-cycle cache miss.]
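The stall that these two slides illustrate can be estimated with a back-of-the-envelope model. This is only a sketch: the 96-entry window and the 300-cycle miss come from the slide, while the dispatch width of 4 and the function names are assumptions made for illustration.

```python
def cycles_until_window_full(window_entries: int, dispatch_width: int) -> int:
    """Cycles of useful dispatch before the window fills behind a blocked load."""
    # The missing load already occupies one entry at the head of the window.
    remaining = window_entries - 1
    # Ceiling division: the last cycle may dispatch a partial group.
    return -(-remaining // dispatch_width)


def stalled_cycles(miss_latency: int, window_entries: int, dispatch_width: int) -> int:
    """Cycles during which nothing new can enter the window."""
    fill = cycles_until_window_full(window_entries, dispatch_width)
    return max(0, miss_latency - fill)


WINDOW = 96  # reorder buffer entries #1..#96 (from the slide)
MISS = 300   # cache-miss latency in cycles (from the slide)
WIDTH = 4    # ASSUMED dispatch width; the slide does not give one

fill = cycles_until_window_full(WINDOW, WIDTH)
stall = stalled_cycles(MISS, WINDOW, WIDTH)
print(f"window fills after {fill} cycles, then stalls for {stall} cycles")
```

Under these assumptions the window fills long before the miss returns, so a second miss at entry #97 cannot even be issued until the first one retires.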
4
Memory Bottlenecks
Configuration                           IPC
128-entry window, real L2               0.77
128-entry window, perfect L2            1.69
2048-entry window, real L2              1.15
2048-entry window, perfect L2           2.02
128-entry window, real L2, runahead     0.94
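To put the IPC figures in perspective, the short calculation below expresses each configuration as a percentage improvement over the 128-entry, real-L2 baseline. The IPC values are taken from the slide; the helper name `speedup_percent` is illustrative.

```python
BASELINE_IPC = 0.77  # 128-entry window, real L2 (from the slide)

OTHER_CONFIGS = {
    "128-entry window, perfect L2": 1.69,
    "2048-entry window, real L2": 1.15,
    "2048-entry window, perfect L2": 2.02,
    "128-entry window, real L2, runahead": 0.94,
}


def speedup_percent(ipc: float, baseline: float = BASELINE_IPC) -> float:
    """Percentage IPC improvement over the baseline, rounded to one decimal."""
    return round((ipc / baseline - 1.0) * 100.0, 1)


for name, ipc in OTHER_CONFIGS.items():
    print(f"{name}: +{speedup_percent(ipc)}%")
```

The runahead row works out to roughly a 22% gain over the baseline, compared with roughly 49% for the 2048-entry window with a real L2.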
5
Runahead
[Figure: pipeline diagram with Trace Cache, Current Rename, IssueQ, Regfile (128), Checkpointed Regfile (32), Retired Rename, ROB, FUs, L1 D, Runahead Cache]
When the oldest instruction is a cache miss, behave as if it causes a context switch:
  – checkpoint the committed registers, rename table, return address stack, and branch history register
  – assume a bogus value and start a new thread
  – this thread cannot modify program state, but can prefetch
6
Runahead
[Figure: runahead pipeline diagram, as on the previous slide]
When the cache miss returns, copy the registers and the mapping and start executing from that ld/st instruction
  – cost of copying back and forth is not trivial
  – many instructions get executed twice
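The checkpoint-on-miss and restore-on-return steps from the last two slides can be sketched as follows. This is a toy software model, not the hardware design from the paper: the class names `ArchState` and `Core`, their fields, and the example values are all assumptions chosen for illustration.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ArchState:
    """State the slides say is checkpointed on entering runahead (hypothetical layout)."""
    registers: dict = field(default_factory=dict)
    rename_table: dict = field(default_factory=dict)
    return_address_stack: list = field(default_factory=list)
    branch_history: int = 0
    pc: int = 0


class Core:
    def __init__(self, state: ArchState):
        self.state = state
        self.checkpoint = None
        self.prefetched = []  # addresses touched while in runahead mode

    def enter_runahead(self, miss_pc: int) -> None:
        # Checkpoint the committed state; execution will resume at the missing load.
        self.checkpoint = copy.deepcopy(self.state)
        self.checkpoint.pc = miss_pc

    def runahead_load(self, addr: int) -> None:
        # Runahead work cannot change program state, but it can warm the caches.
        self.prefetched.append(addr)

    def exit_runahead(self) -> None:
        # The miss has returned: discard speculative state and restart from the load.
        self.state = self.checkpoint
        self.checkpoint = None


core = Core(ArchState(registers={"r1": 5}, pc=100))
core.enter_runahead(miss_pc=100)
core.state.registers["r1"] = 999  # speculative update made during runahead
core.state.pc = 140
core.runahead_load(0x2000)        # useful side effect: a prefetch
core.exit_runahead()
print(core.state.registers, core.state.pc, core.prefetched)
```

After `exit_runahead`, the register and PC changes made during runahead are gone and only the prefetch list survives, which mirrors the point that the copying cost and the re-execution are the price paid for the prefetches.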
7
Runahead
[Figure: runahead pipeline diagram, as on the previous slide]
Note that some values are missing:
  – Do not bother to execute instrs that have invalid inputs
  – Accelerates the thread and generates accurate prefetches
  – Unknown store addresses are ignored
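One way to picture the handling of missing values is an "invalid" marker that propagates from the missing load to everything that depends on it. The sketch below is a simplified model under assumed names (`INV`, `runahead`, the tuple-based instruction format); it is not the paper's mechanism in detail.

```python
import operator

INV = object()  # sentinel marking a register whose value is unknown during runahead


def runahead(program, regs, cache):
    """Run a tiny instruction list in runahead mode.

    Each instruction is (kind, dst, srcs, fn). Instructions with an invalid
    input are skipped and poison their destination register.
    """
    regs = dict(regs)
    prefetches = []
    skipped = 0
    for kind, dst, srcs, fn in program:
        vals = [regs[s] for s in srcs]
        if any(v is INV for v in vals):
            skipped += 1
            if dst is not None:
                regs[dst] = INV  # propagate the invalid status
            continue
        if kind == "alu":
            regs[dst] = fn(*vals)
        elif kind == "load":
            addr = vals[0]
            if addr in cache:
                regs[dst] = cache[addr]
            else:
                prefetches.append(addr)  # a miss with a valid address is a prefetch
                regs[dst] = INV
        # Stores with valid inputs would go to the runahead cache; omitted here.
    return regs, prefetches, skipped


initial = {"r1": INV, "r2": 0x100, "r3": 8}  # r1 is the destination of the missing load
program = [
    ("alu", "r4", ("r2", "r3"), operator.add),  # valid: r4 = 0x108
    ("load", "r5", ("r4",), None),              # valid address, misses: prefetch
    ("alu", "r6", ("r1", "r3"), operator.add),  # depends on r1: skipped
    ("load", "r7", ("r6",), None),              # invalid address: skipped
    ("store", None, ("r6", "r2"), None),        # unknown store address: ignored
]
final, prefetches, skipped = runahead(program, initial, cache={})
print([hex(a) for a in prefetches], skipped)
```

Only the load whose address is independent of the missing value produces a prefetch, which is why the prefetches that runahead does issue tend to be accurate.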
8
Runahead
[Figure: runahead pipeline diagram, as on the previous slide]
Runahead instrs write to registers (as before), but runahead stores write to the runahead cache:
  – Runahead cache and L1D are accessed in parallel
  – If a block gets evicted out of runahead cache, data is lost
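The behavior of the runahead cache can be illustrated with a small model. This is a sketch under loud assumptions: the class `RunaheadCache`, its word-granularity entries, and the oldest-first eviction policy are chosen for illustration and are not taken from the slides, and the parallel lookup is modeled simply as "runahead cache wins if it hits".

```python
from collections import OrderedDict


class RunaheadCache:
    """Small buffer holding runahead stores so they never reach the L1D."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()  # address -> value, oldest first

    def store(self, addr: int, value: int) -> None:
        if addr in self.entries:
            self.entries.move_to_end(addr)
        self.entries[addr] = value
        if len(self.entries) > self.capacity:
            # ASSUMED policy: evict the oldest entry; its data is simply lost.
            self.entries.popitem(last=False)

    def load(self, addr: int, l1d: dict):
        # Both structures are probed; a runahead-cache hit takes priority.
        if addr in self.entries:
            return self.entries[addr]
        return l1d.get(addr)


l1d = {0x10: 1}
rc = RunaheadCache(capacity=2)

rc.store(0x10, 99)
forwarded = rc.load(0x10, l1d)    # sees the runahead store
rc.store(0x20, 5)
rc.store(0x30, 6)                 # capacity exceeded: the 0x10 entry is evicted
after_evict = rc.load(0x10, l1d)  # falls back to the stale L1D value
print(forwarded, after_evict, l1d[0x10])
```

The L1D is never modified, so program state stays intact, but once an entry is evicted a later runahead load sees the old value, which is acceptable because runahead results are discarded anyway.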
9
Runahead
[Figure: runahead pipeline diagram, as on the previous slide]
The branch predictor gets accessed/updated twice
Cannot resolve branch mispredicts if the branch has an invalid input
10
Another Form of Runahead
[Figure: Primary Thread and Runahead Thread, with occasional state copy and re-start]
11
Methodology
80 benchmarks
  – 147 code sequences (that are memory-bound)
  – each 30M instructions
  – SPEC, Web, Media, Server, workstation, productivity
Pentium 4 hardware prefetcher
  – eight stream buffers that stay 256 bytes ahead
Also evaluate a “future baseline” with twice as many resources
Perfect memory disambiguation, 500-cycle memory access
12
Methodology
13
Results
Runahead improves performance by 22%
Synergistic interaction between prefetch & runahead – is the stream buffer not keeping up?
14
Other Results
Runahead with a 128-entry window does as well as a 384-entry window
A better front-end improves benefits from runahead
On average, 431 useful instructions per runahead and 280 after a mispredict
Without the runahead cache, only half the improvement is observed
15
Unanswered Questions
How many re-execs?
How many invalid instrs?
How much wasted power?
  – re-execs, double writes to checkpoints
How many accesses to hash tables, pointers, and branch-dependent data?
16
Alternative Approaches
Does runahead lead to excessive power and verification complexity?
Better stride prefetchers or stream buffers?
Is this the best way to support a large in-flight window (register file, issueq, ROB)?
17
Next Week’s Paper
“Delaying Physical Register Allocation Through Virtual-Physical Registers”, T. Monreal, A. Gonzalez, M. Valero, J. Gonzalez, V. Vinals, Proceedings of MICRO-32, November 1999