EECS 470 Register Renaming Lecture 8 Coverage: Chapter 3.



Reorder Buffer

[Figure: pipeline with stages IF, ID, Alloc, Sched, EX, MEM, ROB, CT and the ARF; regions labeled "In-order" and "Any order". The ROB has Head and Tail pointers; each entry holds PC, Dst regID, Dst value, Except?]

Alloc
– Allocate result storage at Tail
Sched
– Get inputs (ROB T-to-H, i.e. Tail to Head, then ARF)
– Wait until all inputs ready
WB
– Write results/fault to ROB
– Indicate result is ready
CT
– Wait until Head is done
– If fault, initiate handler
– Else, write results to ARF
– Deallocate entry from ROB

Reorder Buffer (ROB)
– Circular queue of speculative state
– May contain multiple definitions of same register

A “High Complexity” Reorder Buffer

[Figure: ROB with Head and Tail pointers; each entry's regID feeds a comparator (=) and its value (val) is selected by a serial scan ("Serial scan!")]

At Sched we must access the nearest-previous definition
– Requires a serial scan of ROB
– Tail (newest) to head (oldest)
– Implemented with daisy-chain

What is the latency of ROB access?
– O(N), with N ROB entries
– Due to more wire and logic

What does this mean w.r.t. ILP?
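The serial scan described on this slide can be sketched in software. This is only an illustrative model, not the hardware daisy-chain itself: the `RobEntry` class, the `find_nearest_previous` function, and the 8-entry example ROB are all hypothetical names and values chosen for the sketch. It walks from the newest entry (just before Tail) back toward Head and reports how many entries were examined, which is the O(N) cost the slide refers to.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RobEntry:
    dst_reg: int   # destination logical register ID
    value: int     # destination value


def find_nearest_previous(rob: List[Optional[RobEntry]], head: int, tail: int,
                          reg: int) -> Tuple[Optional[int], int]:
    """Scan Tail (newest) to Head (oldest) for the latest definition of reg.

    Returns (ROB index or None, number of entries examined).
    """
    n = len(rob)
    live = (tail - head) % n      # assumes the ROB is not completely full
    idx = (tail - 1) % n          # newest entry sits just before tail
    for steps in range(1, live + 1):
        entry = rob[idx]
        if entry is not None and entry.dst_reg == reg:
            return idx, steps
        idx = (idx - 1) % n
    return None, live             # no in-flight definition: read the ARF


# Hypothetical 8-entry ROB holding four in-flight instructions (slots 2..5).
rob: List[Optional[RobEntry]] = [None] * 8
rob[2] = RobEntry(dst_reg=6, value=10)
rob[3] = RobEntry(dst_reg=8, value=20)
rob[4] = RobEntry(dst_reg=6, value=30)   # newer definition of r6
rob[5] = RobEntry(dst_reg=12, value=40)

hit = find_nearest_previous(rob, head=2, tail=6, reg=6)
miss = find_nearest_previous(rob, head=2, tail=6, reg=5)
```

In this example the lookup for r6 stops at slot 4 rather than slot 2, because slot 4 is the nearest-previous definition. The lookup for r5 examines every live entry before falling back to the ARF, which is the worst case.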

Factors that Determine t_CLK

Recall: t_CPU = N_inst * CPI * t_CLK

What defines t_CLK?
– Critical path latency (= logic + wire latency)
– Latch latency
– Clock skew
– Clock period design margins

In current and future generation designs
– Wire latency becoming dominant latency of critical path
– Due to growing side-wall capacitance
– Brings a spatial dimension to architecture optimization
   E.g., How long are the wires that will connect these two devices?
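A small worked example of the t_CPU relation may help show why a longer critical path matters. All numbers below are invented for illustration and do not come from the lecture: the sketch assumes a hypothetical design change that improves CPI by 20% but lengthens the clock period by 50%.

```python
def cpu_time_ns(n_inst: int, cpi: float, t_clk_ns: float) -> float:
    """t_CPU = N_inst * CPI * t_CLK, with t_CLK given in nanoseconds."""
    return n_inst * cpi * t_clk_ns


# Hypothetical baseline: 1M instructions, CPI of 1.0, 1.0 ns clock period.
baseline = cpu_time_ns(1_000_000, cpi=1.0, t_clk_ns=1.0)

# Hypothetical alternative: more ILP (CPI 0.8) bought with a slower
# structure on the critical path (1.5 ns clock period).
alternative = cpu_time_ns(1_000_000, cpi=0.8, t_clk_ns=1.5)

slowdown = alternative / baseline
```

Under these assumed numbers the alternative takes about 1.2x as long as the baseline, so the CPI gain is more than cancelled by the longer t_CLK.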

Determining the Latency of a Wire

[Figure: wire latency under scaling; surviving labels: "scale", "shrinks", "grows"]

Reducing Complexity with Register Renaming

Key observation
– The definition we want is the last one written

Register Renaming
– Implement a table (indexed by regID) that returns the ROB entry that holds the last definition of the register
– Translate the program from one that uses register identifiers to one that accesses reorder buffer entries directly
– Then, access ROB entry directly, no scanning for nearest-previous register definitions!

Logical vs. Physical Registers

Logical registers (aka “architected” registers)
– Register names used by programmer/compiler to identify program values
– How many do we need?

Physical registers
– Storage names implemented in the microarchitecture, used to hold actual register values
– ROB entries in our microarchitecture
   Other implementations possible, e.g., P4 physical register file
   What is the advantage/disadvantage of P4’s physical register file?
– How many do we need?

Register Translation Example

Register renaming translates program from logical register accesses to physical storage accesses

Logical Program      (rename)   Physical Program
r6 = r5 + r2                    p52 = p45 + p42
r8 = r6 + r3                    p53 = p52 + r3
r6 = r9 + r10                   p54 = r9 + r10
r12 = r8 + r6                   p55 = p53 + p54

Note: program semantics have not changed
– Only storage names have changed
– Storage names are unimportant to program semantics

Pipeline with Register Renaming

[Figure: pipeline with stages IF, ID, Alloc, REN, Sched, EX, MEM, ROB, CT and the ARF; regions labeled "In-order" and "Any order". A Rename Table maps regID to robIDX.]

REN
– Index table with source operand regID to locate ROB/ARF entry
Sched
– Get inputs from ROB/ARF entry specified by REN
– Wait until all inputs ready
CT
– Wait until Head is done
– If fault, initiate handler
– Else, write results to ROB/ARF entry specified by REN
– Deallocate entry from ROB
– Invalidate rename table dest regID iff the entry still points to ROB entry being deallocated

Rename Table
– Indexed with regID
– Returns (valid, robIDX)
– If valid, ROB does/will contain value of register
– If invalid, ARF holds value (no instruction in flight defines this register)

Why?
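The rename table behavior listed above can be sketched as a small class. This is a minimal model under stated assumptions, not the lecture's implementation: the class name `RenameTable`, its method names, and the register and ROB numbers in the example are all hypothetical. The `retire` method shows one plausible reason for the "iff the entry still points to ROB entry being deallocated" condition: a newer in-flight instruction may have redefined the same register, and its mapping must survive.

```python
from typing import List, Tuple


class RenameTable:
    """Maps a logical regID to (valid, robIDX), as on the slide."""

    def __init__(self, num_regs: int) -> None:
        self.valid: List[bool] = [False] * num_regs
        self.rob_idx: List[int] = [0] * num_regs

    def lookup(self, reg: int) -> Tuple[str, int]:
        """REN: locate the source operand in the ROB or the ARF."""
        if self.valid[reg]:
            return ("ROB", self.rob_idx[reg])   # ROB does/will hold the value
        return ("ARF", reg)                     # no in-flight definition

    def define(self, reg: int, rob_idx: int) -> None:
        """Record that the newest definition of reg lives in rob_idx."""
        self.valid[reg] = True
        self.rob_idx[reg] = rob_idx

    def retire(self, reg: int, rob_idx: int) -> None:
        """CT: invalidate only if the table still points at this ROB entry."""
        if self.valid[reg] and self.rob_idx[reg] == rob_idx:
            self.valid[reg] = False


# Hypothetical scenario: r6 is defined twice while both are in flight.
table = RenameTable(num_regs=32)
table.define(6, 52)          # older definition of r6
table.define(6, 54)          # newer definition of r6
table.retire(6, 52)          # older one commits; mapping must stay on 54
after_first_retire = table.lookup(6)
table.retire(6, 54)          # newer one commits; r6 now lives in the ARF
after_second_retire = table.lookup(6)
```

If `retire` cleared the valid bit unconditionally, the first retirement would make later readers of r6 fetch the older committed value from the ARF instead of waiting on entry 54.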

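Register Renaming Example

Before renaming:

Logical Program      Physical Program
r6 = r5 + r2
r8 = r6 + r3
r6 = r9 + r10
r12 = r8 + r6

Rename table: r2 -> p42, r5 -> p45; other entries invalid (x)

After renaming the first instruction:

Logical Program      Physical Program
r6 = r5 + r2         p52 = p45 + p42
r8 = r6 + r3
r6 = r9 + r10
r12 = r8 + r6

Rename table: r2 -> p42, r5 -> p45, r6 -> p52; other entries invalid (x)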

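Register Renaming Example

After renaming the second instruction:

Logical Program      Physical Program
r6 = r5 + r2         p52 = p45 + p42
r8 = r6 + r3         p53 = p52 + r3
r6 = r9 + r10
r12 = r8 + r6

Rename table: r2 -> p42, r5 -> p45, r6 -> p52, r8 -> p53; other entries invalid (x)

After renaming the third instruction:

Logical Program      Physical Program
r6 = r5 + r2         p52 = p45 + p42
r8 = r6 + r3         p53 = p52 + r3
r6 = r9 + r10        p54 = r9 + r10
r12 = r8 + r6

Rename table: r2 -> p42, r5 -> p45, r6 -> p54 (replacing p52), r8 -> p53; other entries invalid (x)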

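Register Renaming Example

After renaming the fourth instruction:

Logical Program      Physical Program
r6 = r5 + r2         p52 = p45 + p42
r8 = r6 + r3         p53 = p52 + r3
r6 = r9 + r10        p54 = r9 + r10
r12 = r8 + r6        p55 = p53 + p54

Rename table: r2 -> p42, r5 -> p45, r6 -> p54, r8 -> p53, r12 -> p55; other entries invalid (x)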
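The renaming walk-through can be reproduced with a short sketch. The function name `rename_program`, the tuple encoding of instructions, and the choice to keep unmapped sources under their logical names (standing for ARF reads) are assumptions made for illustration. The starting mappings r5 -> p45 and r2 -> p42 and the first free entry p52 are taken from the example itself.

```python
from typing import Dict, List, Tuple

Instr = Tuple[str, str, str]   # (dst, src1, src2), all additions as in the example


def rename_program(program: List[Instr], initial_map: Dict[str, str],
                   first_free: int) -> Tuple[List[str], Dict[str, str]]:
    """Rename each instruction in program order, one new entry per result."""
    table = dict(initial_map)          # logical name -> physical name
    next_phys = first_free
    renamed = []
    for dst, src1, src2 in program:
        # Sources are looked up BEFORE the destination mapping is updated,
        # so an instruction like r6 = r6 + r3 would read the old r6.
        s1 = table.get(src1, src1)     # unmapped: value is read from the ARF
        s2 = table.get(src2, src2)
        new_name = f"p{next_phys}"
        next_phys += 1
        table[dst] = new_name          # later readers see this definition
        renamed.append(f"{new_name} = {s1} + {s2}")
    return renamed, table


program = [
    ("r6", "r5", "r2"),
    ("r8", "r6", "r3"),
    ("r6", "r9", "r10"),
    ("r12", "r8", "r6"),
]
renamed, final_table = rename_program(program, {"r5": "p45", "r2": "p42"}, 52)
for line in renamed:
    print(line)
```

With these inputs the printed physical program matches the four renamed instructions in the example, and `final_table` ends with r6 mapped to p54 rather than p52.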

Cross-cutting Issue: Misspeculation

What are the impacts of misspeculation or exceptions?
– When instructions are flushed from the pipeline, rename mappings must be restored to the point of restart
– Otherwise, new instructions will see stale definitions

Two recovery approaches
– Simple/slow
   1. Wait until the faulting/mispredicting instruction reaches retirement
   2. Flush ALL speculative register definitions by clearing all rename table valid bits
– Complex/fast
   1. Checkpoint ENTIRE rename table anywhere recovery may be needed
   2. As soon as misspeculation is detected, recover table associated with PC
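The two recovery approaches can be contrasted in a small sketch. Everything here is an assumption made for illustration: the class name `RecoverableRenameTable`, the use of `None` to stand for a cleared valid bit, the dictionary of checkpoints keyed by PC, and the example PC and register numbers. Discarding checkpoints that belong to younger, squashed instructions is left out to keep the sketch short.

```python
from typing import Dict, List, Optional


class RecoverableRenameTable:
    def __init__(self, num_regs: int) -> None:
        # None models a cleared valid bit: the ARF holds the register value.
        self.rob_idx: List[Optional[int]] = [None] * num_regs
        self.checkpoints: Dict[int, List[Optional[int]]] = {}

    def define(self, reg: int, rob_idx: int) -> None:
        self.rob_idx[reg] = rob_idx

    def flush_all(self) -> None:
        """Simple/slow: used once the offending instruction reaches
        retirement, when the ARF holds every older result."""
        self.rob_idx = [None] * len(self.rob_idx)

    def checkpoint(self, pc: int) -> None:
        """Complex/fast, step 1: copy the ENTIRE table, tagged by PC."""
        self.checkpoints[pc] = list(self.rob_idx)

    def recover(self, pc: int) -> None:
        """Complex/fast, step 2: restore the copy associated with PC."""
        self.rob_idx = self.checkpoints.pop(pc)


# Hypothetical sequence: a branch at PC 0x40 is checkpointed, two wrong-path
# instructions are renamed, then the misprediction is detected.
table = RecoverableRenameTable(num_regs=32)
table.define(6, 52)                 # older than the branch
table.checkpoint(pc=0x40)
table.define(6, 54)                 # wrong path
table.define(8, 55)                 # wrong path
table.recover(pc=0x40)
after_recover = (table.rob_idx[6], table.rob_idx[8])

table.flush_all()
after_flush = table.rob_idx[6]
```

After `recover`, r6 points at entry 52 again and r8 is back in the ARF, without waiting for the branch to retire. The cost is one full table copy per checkpoint, which is the storage side of the trade-off.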

Discussion Points
– What are the trade-offs between rename table flush recovery and checkpointing?
– What if another instruction (being renamed) needs to access a physical storage entry after it has been overwritten?
– Can I rename memory?