1 Lecture 11: Modern Superscalar Processor Models Generic Superscalar Models, Issue Queue-based Pipeline, Multiple-Issue Design.

Presentation transcript:

1 Lecture 11: Modern Superscalar Processor Models Generic Superscalar Models, Issue Queue-based Pipeline, Multiple-Issue Design

2 Generic Superscalar Processor Models
Issue queue based: Fetch -> Rename -> Wakeup/select (schedule) -> Regfile -> FU/bypass, D-cache (execute) -> Commit
Reservation based (already studied): Fetch -> Rename -> Reg/ROB -> Wakeup/select -> FU/bypass, D-cache (execute) -> Commit
Revised from Palacharla PhD thesis, 1998

3 Issue Queue Based Pipeline
Fetch -> Rename -> Issue -> Reg-read -> Execute -> Writeback/Commit
Core structure: register mapping table
Rename: translate architectural registers into physical registers
Issue: send instructions out to register read and then execution
Commit: process mis-predictions/exceptions, update register renaming
Why study? Used in Alpha 21264, MIPS R10000, Intel P4

4 Compare Reservation Station and Issue Queue
Pipeline stage sequence
  1. RS: IF -> REN -> REG/ROB -> SCHD -> …
  2. IQ: IF -> REN -> SCHD -> REG -> …
Mapping table vs. status table
  1. RS: Status table chooses architectural register or ROB
  2. IQ: Always renames to a physical register
Register file
  1. RS: Architectural register file stores architectural state
  2. IQ: Physical register file; no architectural register file! Mapping table determines architectural state

5 Compare Reservation Station and Issue Queue
Reservation station / issue queue entry
  1. RS: busy, fu, op, Qj, Qk, Vj, Vk
  2. IQ: busy, fu, op, Pj, Pk, ReadyJ, ReadyK
ROB
  1. RS: Stores register values
  2. IQ: No register contents
Pros and cons of IQ
  Good: No copying between ROB and register file; efficient use of registers
  Bad: Complex mapping table design
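
The issue queue entry fields listed above (busy, fu, op, Pj, Pk, ReadyJ, ReadyK) can be exercised with a small wakeup/select sketch. This is only an illustration under stated assumptions: the `pd` destination-tag field, the oldest-first selection policy, and the names `IQEntry`, `wakeup`, and `select` are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IQEntry:
    op: str
    pd: Optional[str]   # destination physical register (assumed field, used as wakeup tag)
    pj: Optional[str]   # first source physical register
    pk: Optional[str]   # second source physical register (None if unused)
    ready_j: bool
    ready_k: bool
    fu: str = "ALU"
    busy: bool = True


def wakeup(queue: List[IQEntry], tag: str) -> None:
    """Broadcast a completed destination tag and mark matching sources ready."""
    for entry in queue:
        if not entry.busy:
            continue
        if entry.pj == tag:
            entry.ready_j = True
        if entry.pk == tag:
            entry.ready_k = True


def select(queue: List[IQEntry]) -> Optional[IQEntry]:
    """Pick the oldest busy entry with both operands ready (list order = age, an assumption)."""
    for entry in queue:
        if entry.busy and entry.ready_j and entry.ready_k:
            entry.busy = False  # entry leaves the queue once issued
            return entry
    return None
```

Unlike a reservation station entry, nothing here holds operand values (no Vj/Vk): an issued instruction reads the physical register file after selection.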

6 Register Mapping Table
Records the mapping from virtual (architectural) registers to physical registers
Mapping is stored in RAM or CAM memories
Arch reg (virtual) => Phy reg
  R1 => P3
  R2 => P10
  R3 => P6
  R4 => P8
  R5 => P12
  …

7 Register Renaming Examples
Loop: LW R2, 0(R1)
      ADD R2, R2, 1
      SW R2, 0(R1)
      ADD R1, R1, 4
      BNE R2, R3, LOOP
LW returns 100, R1=1000
Renamed dynamic instructions:
      …
      BNE P2, P3, LOOP
      LW P32, 0(P1)
      ADD P33, P32, 1
      SW P33, 0(P1)
      ADD P34, P1, 4
      BNE P34, P3, LOOP
      …
Assume that when the first BNE is renamed, R1-R31 are mapped to P1-P31 and P32-P127 are free
The first BNE may be predicted either correctly or not
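
The renaming of the LW, ADD, SW, ADD sequence above can be reproduced with a minimal sketch of a mapping table plus free list. It assumes the starting state given on the slide (R1-R31 mapped to P1-P31, P32-P127 free) and an in-order free list; the class name `Renamer` and the tuple encoding of instructions are hypothetical.

```python
from collections import deque


class Renamer:
    def __init__(self, num_arch=32, num_phys=128):
        # R1..R31 start mapped to P1..P31; P32..P127 are free (as assumed on the slide).
        self.map = {f"R{i}": f"P{i}" for i in range(1, num_arch)}
        self.free = deque(f"P{i}" for i in range(num_arch, num_phys))

    def rename(self, op, dest, srcs):
        """Rename one instruction; dest is None for instructions without a register output."""
        # Sources are translated with the mapping in effect BEFORE this instruction.
        phys_srcs = [self.map[s] for s in srcs]
        phys_dest = None
        if dest is not None:
            phys_dest = self.free.popleft()   # allocate a fresh physical register
            self.map[dest] = phys_dest        # later readers of dest see the new mapping
        return (op, phys_dest, phys_srcs)
```

Renaming `("LW", "R2", ["R1"])`, `("ADD", "R2", ["R2"])`, `("SW", None, ["R2", "R1"])`, and `("ADD", "R1", ["R1"])` in that order allocates P32, P33, and P34, matching the renamed instructions listed on the slide.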

8 Register Mapping Status
Mapping status at rename:
  Before LW:     R1 => P1   R2 => P2   R3 => P3  R4 => P4  R5 => P5 …
  After LW:      R1 => P1   R2 => P32  R3 => P3  R4 => P4  R5 => P5 …
  After ADD:     R1 => P1   R2 => P33  R3 => P3  R4 => P4  R5 => P5 …
  After SW:      R1 => P1   R2 => P33  R3 => P3  R4 => P4  R5 => P5 …
  After ADD:     R1 => P34  R2 => P33  R3 => P3  R4 => P4  R5 => P5 …
  After BNE:     No change
Physical register values at commit (possible sequence):
  P1=4000 P2=200 … P32=100 P33=101 P34=4004
  P1=4000 P2=200 … P32=100 P33=101 P34=4004
  P1=4000 P2=200 … P32=100 P33=? P34=4004
  P1=4000 P2=200 … P32=100 P33=101 P34=4004

9 Commit and Rollback
[Figure: sequence of mapping status tables with the commit point and the rename point marked]
  R1 => P1   R2 => P2   R3 => P3  R4 => P4  R5 => P5 …
  R1 => P1   R2 => P32  R3 => P3  R4 => P4  R5 => P5 …
  R1 => P1   R2 => P33  R3 => P3  R4 => P4  R5 => P5 …
  R1 => P1   R2 => P33  R3 => P3  R4 => P4  R5 => P5 …
  R1 => P34  R2 => P33  R3 => P3  R4 => P4  R5 => P5 …
  P1=4000 P2=200 … P32=100 P33=? P34=4004
Commit successful: make the next mapping status the committed mapping status; free the previous physical register
Mis-prediction/exception: flush the pipeline and discard the mappings that follow
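
The commit and rollback rules above can be sketched with two mapping tables, one committed and one speculative. This is a simplified model, not the hardware design: the class `RenameState`, its method names, and the choice to flush everything back to the committed mapping are assumptions made for illustration.

```python
from collections import deque


class RenameState:
    def __init__(self):
        self.committed = {f"R{i}": f"P{i}" for i in range(1, 32)}
        self.current = dict(self.committed)          # speculative mapping at the rename point
        self.free = deque(f"P{i}" for i in range(32, 128))
        self.in_flight = deque()                     # (arch, new_phys), oldest first

    def rename_dest(self, arch):
        new_phys = self.free.popleft()
        self.current[arch] = new_phys
        self.in_flight.append((arch, new_phys))
        return new_phys

    def commit_oldest(self):
        """Advance the commit point by one renamed instruction."""
        arch, new_phys = self.in_flight.popleft()
        old_phys = self.committed[arch]
        self.committed[arch] = new_phys              # next mapping becomes the committed one
        self.free.append(old_phys)                   # previous physical register is freed
        return old_phys

    def flush(self):
        """Discard all uncommitted mappings and return their registers to the free list."""
        while self.in_flight:
            _, new_phys = self.in_flight.pop()       # youngest first
            self.free.appendleft(new_phys)
        self.current = dict(self.committed)
```

Note that the register freed at commit is the one previously mapped to the same architectural register, not the one the committing instruction wrote.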

10 Program Execution Correctness
Only committed instructions write to register and memory
  Yes, from the programmer's viewpoint: only committed instructions' register output becomes visible
Maintain correct data flow: a child instruction always uses the values from its parents
  Yes, in renamed form, and not affected by speculative execution
Register/memory receives the value of the last write
  Yes, from the programmer's viewpoint: architectural mapping status is updated in program order
Note: memory correctness is not affected

11 Mapping Table Design – MIPS R10000
RAM-based structure: the mapping is saved automatically, in parallel, on branches at rename
On mis-prediction: restore the previous mapping immediately, flush the pipeline, restart fetch at the alternative PC
On commit of a branch instruction: make the corresponding mapping the committed one
Stall if the branch stack is full
[Figure: mapping tables (current mapping, committed mapping) and a branch stack holding Mapping after Br1 … Mapping after Br4 with Alternative PC1 … Alternative PC4]
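
The branch stack behavior described above (save at rename, restore on mis-prediction, stall when full) can be sketched as follows. The default capacity of four mirrors the four entries drawn on the slide; the class `BranchStack`, its method names, and the use of dictionary copies in place of a parallel hardware copy are assumptions.

```python
class BranchStack:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = []  # (branch_id, saved_mapping, alternative_pc), oldest first

    def push(self, branch_id, current_map, alt_pc):
        """Checkpoint the mapping when a branch is renamed; False means rename must stall."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append((branch_id, dict(current_map), alt_pc))
        return True

    def mispredict(self, branch_id):
        """Return (restored mapping, alternative PC); drop this and all younger checkpoints."""
        for i, (bid, saved, alt_pc) in enumerate(self.entries):
            if bid == branch_id:
                del self.entries[i:]
                return dict(saved), alt_pc
        raise KeyError(branch_id)

    def commit_oldest(self):
        """The oldest branch committed, so its checkpoint is no longer needed."""
        return self.entries.pop(0)[0]
```

Recovery cost is independent of how many instructions followed the branch, which is why this path is fast compared with undoing mappings one instruction at a time.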

12 Mapping Table Design – MIPS R10000
How about precise exceptions?
Cannot preserve the mapping status for every instruction
Solution: record the change of mapping in the ROB
ROB entry contains: destination architectural register, renamed physical register, old renamed physical register
On exception: roll back the mapping one instruction at a time, four instructions per cycle
Slow, but how frequent are exceptions?
Note: branch mis-prediction has fast recovery
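
The ROB-based rollback above can be sketched as a walk from the youngest entry back to the oldest, undoing up to four mappings per cycle. This is a simplified model that assumes the list `rob` holds only the entries being squashed, each as a `(dest_arch, new_phys, old_phys)` tuple; the function name `rollback` and the cycle counting are illustrative.

```python
def rollback(mapping, rob, width=4):
    """Undo the renames recorded in rob (oldest first), youngest entry first.

    Returns (cycles, freed), where freed lists the physical registers released.
    """
    cycles = 0
    freed = []
    while rob:
        for _ in range(min(width, len(rob))):
            arch, new_phys, old_phys = rob.pop()   # youngest remaining entry
            mapping[arch] = old_phys               # restore the previous mapping
            freed.append(new_phys)                 # squashed destination is free again
        cycles += 1
    return cycles, freed
```

With three squashed entries the walk takes one cycle; with five it takes two, which illustrates why the recovery time grows with the number of in-flight instructions.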

13 Mapping Table Design – Alpha 21264
CAM structure
Associative search on architectural register index; outputs physical register index (through an encoder)
One column represents one mapping, allocated to each instruction with a register output at rename
One pair of valid bits changes per destination renaming
Fast recovery even on exceptions
[Figure: CAM array with columns p0, p1, p2, …, pk, each holding an Arch. Reg #; valid bits for the current mapping and the committed mapping; a column is selected on match and valid]
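
The CAM organization above can be sketched with one entry per physical register plus a valid-bit vector. This is a toy model with eight physical and four architectural registers, not the actual circuit; the class `CamMapTable` and its methods are hypothetical, and the linear search stands in for the parallel match lines.

```python
class CamMapTable:
    def __init__(self, num_phys=8, num_arch=4):
        # Entry p stores the architectural register number currently held by physical register p.
        self.arch = [None] * num_phys
        self.valid = [False] * num_phys
        for r in range(num_arch):
            self.arch[r] = r
            self.valid[r] = True

    def lookup(self, r):
        """Associative search: the entry that matches r and is valid gives the physical index."""
        for p in range(len(self.arch)):
            if self.valid[p] and self.arch[p] == r:
                return p
        raise KeyError(r)

    def rename_dest(self, r, new_p):
        """Map r to new_p; exactly one pair of valid bits changes."""
        old_p = self.lookup(r)
        self.valid[old_p] = False
        self.arch[new_p] = r
        self.valid[new_p] = True
        return old_p

    def checkpoint(self):
        return list(self.valid)      # saving a mapping = saving one bit per physical register

    def restore(self, saved):
        self.valid = list(saved)     # the arch fields of older entries are still intact
```

Restoring only the valid bits works in this sketch because entries that were valid at the checkpoint are not reallocated while the checkpoint is live.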

14 Multiple Issue Pipelines
Each pipeline stage accepts k instructions: k-issue processor
  Alpha 21264 – 4-issue
  MIPS R10000 – 4-issue
  Intel P4 – 3-issue
Memory structures must have multiple ports, proportional to issue width!
What if k instructions at rename have dependences among them?
Need dependence check logic!

15 Dependence Check Logic
Any change to the first renaming?
What is the change to the second one?
Third and fourth ones?
[Figure: mapping table with inputs Rs0 Rt0 Rd0, Rs1 Rt1 Rd1, Rs2 Rt2 Rd2, Rs3 Rt3 Rd3 and outputs Ps0 Ps1 for each of the four instructions plus Pd0 Pd1 Pd2 Pd3; no dependence check yet]
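
One way to answer the questions above is sketched here: every source is first looked up in the pre-group mapping table, then each instruction's sources are compared against the destinations of older instructions in the same group, and the youngest match overrides the table result. The function `rename_group`, the `(rs, rt, rd)` tuple format, and the assumption that every instruction has two register sources are illustrative choices, not taken from the slides.

```python
def rename_group(mapping, free_list, group):
    """Rename a group of (rs, rt, rd) instructions in one step; rd may be None."""
    # Step 1: parallel table lookup (ignores dependences inside the group)
    # and allocation of one new physical register per destination.
    looked_up = [(mapping[rs], mapping[rt]) for rs, rt, _ in group]
    new_dests = [free_list.pop(0) if rd is not None else None for _, _, rd in group]

    # Step 2: dependence check against older instructions in the same group.
    renamed = []
    for i, (rs, rt, _) in enumerate(group):
        ps, pt = looked_up[i]
        for j in range(i):                  # oldest to youngest, so the youngest match wins
            rd_j = group[j][2]
            if rd_j is None:
                continue
            if rd_j == rs:
                ps = new_dests[j]
            if rd_j == rt:
                pt = new_dests[j]
        renamed.append((ps, pt, new_dests[i]))

    # Step 3: update the table in program order so the last writer of a register wins.
    for (_, _, rd), pd in zip(group, new_dests):
        if rd is not None:
            mapping[rd] = pd
    return renamed
```

The first instruction's renaming never changes, the second needs comparisons against one older destination, the third against two, and the fourth against three, so the comparator count grows roughly quadratically with issue width.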