CS 7810 Lecture 10
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors
O. Mutlu, J. Stark, C. Wilkerson, Y.N. Patt
Proceedings of HPCA-9, February 2003

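In-Flight Windows
[Figure: reorder buffer entries #1 to #96 hold physical registers p33 to p128, while p1 to p32 hold the committed state. Entry #1 is a load instruction that misses in the cache (300 cycles); many younger entries are marked c (completed) but cannot retire behind it.]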

In-Flight Windows
[Figure: the same full window, plus instruction #97, a second load that also misses in the cache (300 cycles). The reorder buffer and physical register file are full, so #97 cannot enter the window while the first load blocks retirement.]
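The two window figures can be turned into a back-of-the-envelope count. This is a minimal sketch, assuming that every in-flight instruction holds one physical register, that the missing load blocks all retirement, and that a hypothetical 4 instructions enter the window per cycle (the slides do not give a width).

```python
PHYS_REGS = 128      # physical register file size (from the figure)
ARCH_REGS = 32       # registers holding committed state, p1-p32 (from the figure)
MISS_LATENCY = 300   # cycles for the load miss (from the figure)
FETCH_WIDTH = 4      # hypothetical: instructions entering the window per cycle


def window_stall(phys_regs=PHYS_REGS, arch_regs=ARCH_REGS,
                 miss_latency=MISS_LATENCY, fetch_width=FETCH_WIDTH):
    """Return (window_size, cycles_to_fill, stalled_cycles) for a toy model in
    which the oldest instruction is a load miss that blocks retirement and
    every in-flight instruction needs one free physical register."""
    window = phys_regs - arch_regs            # free registers bound the window
    fill = -(-window // fetch_width)          # ceiling division
    stalled = max(0, miss_latency - fill)     # cycles spent with a full, frozen window
    return window, fill, stalled


if __name__ == "__main__":
    w, f, s = window_stall()
    print(f"window={w} entries, full after {f} cycles, "
          f"stalled {s} of {MISS_LATENCY} cycles")
```

Under these assumptions the 96-entry window fills in 24 cycles and then sits idle for the remaining 276 cycles of the miss; that idle time is what runahead tries to put to use.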

Memory Bottlenecks 128-entry window, real L2  0.77 IPC 128-entry window, perfect L2  1.69 2048-entry window, real L2  1.15 2048-entry window, perfect L2  2.02 128-entry window, real L2, runahead  0.94
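Expressing the table as IPC improvement over the 128-entry, real-L2 baseline makes the comparison easier. A small sketch that uses only the numbers on the slide:

```python
# IPC values from the "Memory Bottlenecks" slide.
IPC = {
    "128-entry window, real L2": 0.77,
    "128-entry window, perfect L2": 1.69,
    "2048-entry window, real L2": 1.15,
    "2048-entry window, perfect L2": 2.02,
    "128-entry window, real L2, runahead": 0.94,
}
BASELINE = "128-entry window, real L2"


def speedups(ipc=IPC, baseline=BASELINE):
    """Percent IPC improvement of each configuration over the baseline."""
    base = ipc[baseline]
    return {name: 100.0 * (value / base - 1.0) for name, value in ipc.items()}


if __name__ == "__main__":
    for name, pct in speedups().items():
        print(f"{name:40s} {pct:+6.1f}%")
```

Runahead comes out at about +22% over the baseline, in line with the Results slide, while the 2048-entry window gains about +49%; by this measure runahead recovers roughly 45% of the large window's benefit, (0.94 − 0.77) / (1.15 − 0.77).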

Runahead
[Figure: pipeline with trace cache, current rename table, issue queue, register file (128), checkpointed register file (32), retired rename table, ROB, functional units, L1 D-cache, and runahead cache.]
When the oldest instruction is a cache miss, behave as if it causes a context switch:
– checkpoint the committed registers, rename table, return address stack, and branch history register
– assume a bogus value for the missing load and start a new thread
– this thread cannot modify program state, but it can prefetch

Runahead
When the cache miss returns, copy back the checkpointed registers and rename mapping and resume execution from the load/store that missed:
– the cost of copying state back and forth is not trivial
– many instructions get executed twice
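The enter/exit sequence on the two slides above amounts to a checkpoint and restore. This is a minimal Python sketch, not the paper's hardware: `ArchState`, `RunaheadCore`, and the `INV` marker are hypothetical names, and a per-field software copy stands in for the checkpointed register file.

```python
from dataclasses import dataclass, field

INV = object()  # hypothetical marker for the bogus value of the missing load


@dataclass
class ArchState:
    regs: list = field(default_factory=lambda: [0] * 32)   # committed registers
    rename_table: dict = field(default_factory=dict)      # retired rename map
    ras: list = field(default_factory=list)               # return address stack
    ghr: int = 0                                           # branch history register

    def copy(self):
        # Per-field copy; in hardware this transfer is the non-trivial cost.
        return ArchState(list(self.regs), dict(self.rename_table),
                         list(self.ras), self.ghr)


class RunaheadCore:
    def __init__(self):
        self.state = ArchState()
        self.checkpoint = None
        self.runahead = False
        self.miss_pc = None

    def enter_runahead(self, load_pc, dest_reg):
        """Oldest instruction is a cache miss: checkpoint, fake its result, go on."""
        self.miss_pc = load_pc
        self.checkpoint = self.state.copy()
        self.runahead = True
        self.state.regs[dest_reg] = INV       # dependents will see an invalid input

    def write_reg(self, reg, value):
        """Runahead results update the working state only, never the checkpoint."""
        self.state.regs[reg] = value

    def exit_runahead(self):
        """Miss returned: drop runahead results, restart at the missing load/store."""
        self.state = self.checkpoint          # copy back registers and mapping
        self.checkpoint = None
        self.runahead = False
        return self.miss_pc                   # everything from here re-executes


if __name__ == "__main__":
    core = RunaheadCore()
    core.state.regs[5] = 42
    core.enter_runahead(load_pc=0x400, dest_reg=5)
    core.write_reg(7, 99)                     # speculative work, later discarded
    print(hex(core.exit_runahead()), core.state.regs[5], core.state.regs[7])
```

Every instruction executed between `enter_runahead` and `exit_runahead` is fetched and executed again afterwards, which is the double execution the slide points out.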

Runahead
Note that some values are missing (invalid):
– do not bother to execute instructions that have invalid inputs
– this accelerates the runahead thread and generates accurate prefetches
– unknown store addresses are ignored
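Invalid-value propagation can be sketched over a toy trace. This assumes a made-up two-instruction ISA (`add`, `load`; stores are left out here), a dict standing in for the cache, and a hypothetical `INV` sentinel; a load with a valid address that misses becomes a prefetch and also yields INV.

```python
from dataclasses import dataclass

INV = object()  # hypothetical sentinel marking an invalid (unknown) value


@dataclass
class Inst:
    op: str          # "add" or "load" in this toy ISA
    dest: int
    srcs: tuple = ()


def run_runahead(trace, regs, cache, prefetches):
    """Execute a runahead trace in program order (toy model).

    Instructions with an INV source are not executed; their destination
    becomes INV. A load with a valid address that misses in `cache`
    records a prefetch and produces INV, since its data is not available.
    """
    for inst in trace:
        vals = [regs[s] for s in inst.srcs]
        if any(v is INV for v in vals):
            regs[inst.dest] = INV            # skip execution, propagate invalidity
            continue
        if inst.op == "add":
            regs[inst.dest] = vals[0] + vals[1]
        elif inst.op == "load":
            addr = vals[0]
            if addr in cache:
                regs[inst.dest] = cache[addr]
            else:
                prefetches.append(addr)      # valid address: an accurate prefetch
                regs[inst.dest] = INV
        else:
            raise ValueError(f"unsupported op {inst.op!r}")
    return regs


if __name__ == "__main__":
    regs = {1: INV, 2: 0x100, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0}   # r1: the missing load
    trace = [
        Inst("load", 3, (2,)),   # hits: r3 = 0x2000
        Inst("load", 4, (3,)),   # misses: prefetch 0x2000, r4 = INV
        Inst("add", 5, (1, 2)),  # r1 is INV, so r5 = INV
        Inst("load", 6, (5,)),   # INV address: skipped, no prefetch
        Inst("add", 7, (2, 3)),  # independent work still executes
    ]
    pf = []
    run_runahead(trace, regs, {0x100: 0x2000}, pf)
    print([hex(a) for a in pf])
```

Only loads with valid addresses reach memory, so the skipped work costs nothing and adds no useless prefetches.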

Runahead
Runahead instructions write to registers (as before), but runahead stores write to the runahead cache:
– the runahead cache and L1D are accessed in parallel
– if a block gets evicted out of the runahead cache, its data is lost

Runahead
– the branch predictor gets accessed/updated twice (once in runahead mode, once when the instructions re-execute)
– branch mispredicts cannot be resolved if the branch has an invalid input
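A sketch of the two branch effects, assuming a hypothetical 2-bit saturating counter for a single branch. A branch whose condition is INV just follows the prediction; in this sketch it also skips the predictor update, which is an assumption rather than something the slides state.

```python
INV = object()  # hypothetical marker for an invalid value


class CountingPredictor:
    """Toy 2-bit saturating counter for one branch; counts its updates."""

    def __init__(self):
        self.counter = 2      # weakly taken
        self.updates = 0

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        self.updates += 1
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def runahead_branch(pred, cond):
    """Return (direction followed, mispredict detected) for a runahead branch."""
    guess = pred.predict()
    if cond is INV:
        return guess, False           # cannot resolve: stay on the predicted path
    taken = bool(cond)
    pred.update(taken)                # runahead trains the predictor (first update)
    return taken, taken != guess      # a resolvable mispredict can still be fixed


if __name__ == "__main__":
    p = CountingPredictor()
    runahead_branch(p, 1)             # branch seen in runahead mode
    p.update(True)                    # same branch re-executed in normal mode
    print(p.updates)                  # the predictor was trained twice
```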

Another Form of Runahead
[Figure: a primary thread and a runahead thread, with an occasional state copy and restart between them.]

Methodology
80 benchmarks
– 147 memory-bound code sequences
– each 30M instructions
– SPEC, Web, Media, Server, Workstation, Productivity
Pentium 4 hardware prefetcher
– eight stream buffers that stay 256 bytes ahead
Also evaluate a “future baseline” with twice as many resources
Perfect memory disambiguation, 500-cycle memory access
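The baseline prefetcher is described only as eight stream buffers that stay 256 bytes ahead. The sketch below is a generic ascending stream-buffer model built on those two parameters plus an assumed 64-byte line; `StreamPrefetcher` is a hypothetical name and this is not the actual Pentium 4 prefetcher.

```python
from collections import OrderedDict


class StreamPrefetcher:
    """Generic ascending stream-buffer model (toy, not the Pentium 4 design).
    num_streams and distance come from the slide; the 64-byte line is assumed."""

    def __init__(self, num_streams=8, distance=256, line=64):
        self.num_streams = num_streams
        self.distance = distance
        self.line = line
        # stream id -> [last demand line, next line to prefetch], in LRU order
        self.streams = OrderedDict()
        self._next_id = 0

    def _advance(self, stream, line_addr):
        """Move the stream head to line_addr; prefetch until `distance` bytes ahead."""
        stream[0] = line_addr
        issued = []
        while stream[1] <= line_addr + self.distance:
            issued.append(stream[1])
            stream[1] += self.line
        return issued

    def access(self, addr):
        """Demand access to addr; returns the line addresses prefetched as a result."""
        line_addr = addr - addr % self.line
        match = next((sid for sid, s in self.streams.items()
                      if s[0] <= line_addr < s[1]), None)
        if match is not None:
            self.streams.move_to_end(match)          # most recently used stream
            return self._advance(self.streams[match], line_addr)
        if len(self.streams) == self.num_streams:
            self.streams.popitem(last=False)         # replace the LRU stream
        sid = self._next_id
        self._next_id += 1
        self.streams[sid] = [line_addr, line_addr + self.line]
        return self._advance(self.streams[sid], line_addr)


if __name__ == "__main__":
    pf = StreamPrefetcher()
    print([hex(a) for a in pf.access(0x1000)])       # opens a stream, runs 4 lines ahead
    print([hex(a) for a in pf.access(0x1040)])       # stream advances by one line
```

A prefetcher like this only covers regular, ascending access streams; the misses it cannot predict are where runahead's execution-driven prefetches add value.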

Results
Runahead improves performance by 22%
Synergistic interaction between prefetching and runahead
– is the stream buffer not keeping up?

Other Results
Runahead with a 128-entry window does as well as a 384-entry window
A better front-end increases the benefit from runahead
On average, 431 useful instructions per runahead period, and 280 after a mispredict
Without the runahead cache, only half the improvement is observed

Unanswered Questions
How many re-executions?
How many invalid instructions?
How much wasted power?
– re-executions, double writes to checkpoints
How many accesses to hash tables, pointers, and branch-dependent data?

Alternative Approaches
Does runahead lead to excessive power and verification complexity?
Would better stride prefetchers or stream buffers do as well?
Is this the best way to support a large in-flight window (register file, issue queue, ROB)?

Next Week’s Paper
“Delaying Physical Register Allocation Through Virtual-Physical Registers”, T. Monreal, A. Gonzalez, M. Valero, J. Gonzalez, V. Vinals, Proceedings of MICRO-32, November 1999
