© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 CS252 Graduate Computer Architecture Fall 2015 Lecture 6: Out-of-Order Processors Krste Asanovic

Slides:



Advertisements
Similar presentations
Krste Asanovic Electrical Engineering and Computer Sciences
Advertisements

© Krste Asanovic, 2014CS252, Spring 2014, Lecture 5 CS252 Graduate Computer Architecture Spring 2014 Lecture 5: Out-of-Order Processing Krste Asanovic.
2/28/2013 CS152, Spring 2013 CS 152 Computer Architecture and Engineering Lecture 11 - Out-of-Order Issue, Register Renaming, & Branch Prediction Krste.
Dynamic ILP: Scoreboard Professor Alvin R. Lebeck Computer Science 220 / ECE 252 Fall 2008.
1 Lecture: Out-of-order Processors Topics: out-of-order implementations with issue queue, register renaming, and reorder buffer, timing, LSQ.
ECE 552 / CPS 550 Advanced Computer Architecture I Lecture 8 Instruction-Level Parallelism – Part 1 Benjamin Lee Electrical and Computer Engineering Duke.
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture Complex Pipelining II Steve Ko Computer Sciences and Engineering University at Buffalo.
February 28, 2011CS152, Spring 2011 CS 152 Computer Architecture and Engineering Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming.
Superscalar Organization Prof. Mikko H. Lipasti University of Wisconsin-Madison Lecture notes based on notes by John P. Shen Updated by Mikko Lipasti.
Spring 2003CSE P5481 Reorder Buffer Implementation (Pentium Pro) Hardware data structures retirement register file (RRF) (~ IBM 360/91 physical registers)
CPE 731 Advanced Computer Architecture ILP: Part IV – Speculative Execution Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University.
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture ILP II Steve Ko Computer Sciences and Engineering University at Buffalo.
February 28, 2012CS152, Spring 2012 CS 152 Computer Architecture and Engineering Lecture 11 - Out-of-Order Issue, Register Renaming, & Branch Prediction.
CS 152 Computer Architecture and Engineering Lecture 13 - Out-of-Order Issue, Register Renaming, & Branch Prediction Krste Asanovic Electrical Engineering.
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture ILP III Steve Ko Computer Sciences and Engineering University at Buffalo.
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture Complex Pipelining I Steve Ko Computer Sciences and Engineering University at Buffalo.
March 11, 2010CS152, Spring 2010 CS 152 Computer Architecture and Engineering Lecture 14 - Advanced Superscalars Krste Asanovic Electrical Engineering.
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture ILP I Steve Ko Computer Sciences and Engineering University at Buffalo.
March 2, 2010CS152, Spring 2010 CS 152 Computer Architecture and Engineering Lecture 12 - Complex Pipelines Krste Asanovic Electrical Engineering and Computer.
CS 152 Computer Architecture and Engineering Lecture 14 - Advanced Superscalars Krste Asanovic Electrical Engineering and Computer Sciences University.
1 Lecture 7: Out-of-Order Processors Today: out-of-order pipeline, memory disambiguation, basic branch prediction (Sections 3.4, 3.5, 3.7)
CS 152 Computer Architecture and Engineering Lecture 15 - Advanced Superscalars Krste Asanovic Electrical Engineering and Computer Sciences University.
March 4, 2010CS152, Spring 2010 CS 152 Computer Architecture and Engineering Lecture 13 - Out-of-Order Issue, Register Renaming, & Branch Prediction Krste.
CS 152 Computer Architecture and Engineering Lecture 12 - Complex Pipelines Krste Asanovic Electrical Engineering and Computer Sciences University of California.
CS 152 Computer Architecture and Engineering Lecture 13 - Out-of-Order Issue and Register Renaming Krste Asanovic Electrical Engineering and Computer Sciences.
CS 152 Computer Architecture and Engineering Lecture 12 - Complex Pipelines Krste Asanovic Electrical Engineering and Computer Sciences University of California.
March 9, 2011CS152, Spring 2011 CS 152 Computer Architecture and Engineering Lecture 12 - Advanced Out-of-Order Superscalars Krste Asanovic Electrical.
1 IBM System 360. Common architecture for a set of machines. Robert Tomasulo worked on a high-end machine, the Model 91 (1967), on which they implemented.
1 Sixth Lecture: Chapter 3: CISC Processors (Tomasulo Scheduling and IBM System 360/91) Please recall:  Multicycle instructions lead to the requirement.
Implementing Precise Interrupts in Pipelined Processors James E. Smith Andrew R.Pleszkun Presented By: Ravikumar Source:
1 Lecture 5 Overview of Superscalar Techniques CprE 581 Computer Systems Architecture, Fall 2009 Zhao Zhang Reading: Textbook, Ch. 2.1 “Complexity-Effective.
1 Lecture 6 Tomasulo Algorithm CprE 581 Computer Systems Architecture, Fall 2009 Zhao Zhang Reading:Textbook 2.4, 2.5.
CS 152 Computer Architecture and Engineering Lecture 15 - Out-of-Order Memory, Complex Superscalars Review Krste Asanovic Electrical Engineering and Computer.
© Krste Asanovic, 2015CS252, Fall 2015, Lecture 7 CS252 Graduate Computer Architecture Spring 2014 Lecture 7: Advanced Out-of-Order Superscalar Designs.
Professor Nigel Topham Director, Institute for Computing Systems Architecture School of Informatics Edinburgh University Informatics 3 Computer Architecture.
1 Lecture 7: Speculative Execution and Recovery Branch prediction and speculative execution, precise interrupt, reorder buffer.
Implementing Precise Interrupts in Pipelined Processors James E. Smith Andrew R.Pleszkun Presented By: Shrikant G.
Samira Khan University of Virginia Feb 9, 2016 COMPUTER ARCHITECTURE CS 6354 Precise Exception The content and concept of this course are adapted from.
March 1, 2012CS152, Spring 2012 CS 152 Computer Architecture and Engineering Lecture 12 - Advanced Out-of-Order Superscalars Krste Asanovic Electrical.
CS203 – Advanced Computer Architecture ILP and Speculation.
Instruction-Level Parallelism and Its Dynamic Exploitation
IBM System 360. Common architecture for a set of machines
CS 152 Computer Architecture and Engineering Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming John Wawrzynek Electrical Engineering.
CS252 Graduate Computer Architecture Fall 2015 Lecture 5: Out-of-Order Processing Krste Asanovic
CS 152 Computer Architecture and Engineering Lecture 11 - Out-of-Order Issue, Register Renaming, & Branch Prediction John Wawrzynek Electrical Engineering.
Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming
/ Computer Architecture and Design
CS252 Graduate Computer Architecture Spring 2014 Lecture 8: Advanced Out-of-Order Superscalar Designs Part-II Krste Asanovic
CIS-550 Advanced Computer Architecture Lecture 10: Precise Exceptions
Out of Order Processors
Dr. George Michelogiannakis EECS, University of California at Berkeley
High-level view Out-of-order pipeline
Lecture 6: Advanced Pipelines
Out of Order Processors
Lecture 8: ILP and Speculation Contd. Chapter 2, Sections 2. 6, 2
Electrical and Computer Engineering
Krste Asanovic Electrical Engineering and Computer Sciences
Krste Asanovic Electrical Engineering and Computer Sciences
Lecture 8: Dynamic ILP Topics: out-of-order processors
Adapted from the slides of Prof
15-740/ Computer Architecture Lecture 5: Precise Exceptions
Krste Asanovic Electrical Engineering and Computer Sciences
Lecture 7: Dynamic Scheduling with Tomasulo Algorithm (Section 2.4)
Adapted from the slides of Prof
Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 9/30/2011
CS 152 Computer Architecture and Engineering CS252 Graduate Computer Architecture Lecture 4 – Pipelining Part II Krste Asanovic Electrical Engineering.
September 20, 2000 Prof. John Kubiatowicz
CS 152 Computer Architecture and Engineering CS252 Graduate Computer Architecture Lecture 11 – Out-of-Order Execution Krste Asanovic Electrical Engineering.
Lecture 9: Dynamic ILP Topics: out-of-order processors
Conceptual execution on a processor which exploits ILP
Presentation transcript:

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 CS252 Graduate Computer Architecture Fall 2015 Lecture 6: Out-of-Order Processors Krste Asanovic

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 Supercomputers Definitions of a supercomputer:  Fastest machine in world at given task  A device to turn a compute-bound problem into an I/O bound problem  Any machine costing $30M+  Any machine designed by Seymour Cray  CDC6600 (Cray, 1964) regarded as first supercomputer 2

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 CDC 6600 Seymour Cray, 1963  A fast pipelined machine with 60-bit words -128 Kword main memory capacity, 32 banks  Ten functional units (parallel, unpipelined) -Floating Point: adder, 2 multipliers, divider -Integer: adder, 2 incrementers,...  Hardwired control (no microcoding)  Scoreboard for dynamic scheduling of instructions  Ten Peripheral Processors for Input/Output -a fast multi-threaded 12-bit integer ALU  Very fast clock, 10 MHz (FP add in 4 clocks)  >400,000 transistors, 750 sq. ft., 5 tons, 150 kW, novel freon-based technology for cooling  Fastest machine in world for 5 years (until 7600) -over 100 sold ($7-10M each) 3 3/10/2009

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 CDC 6600: A Load/Store Architecture 4 Separate instructions to manipulate three types of reg. 8x60-bit data registers (X) 8x18-bit address registers (A) 8x18-bit index registers (B) All arithmetic and logic instructions are register-to-register Only Load and Store instructions refer to memory! Touching address registers 1 to 5 initiates a load 6 to 7 initiates a store - very useful for vector operations opcode i j k Ri   Rj op Rk opcode i j disp Ri M[Rj + disp]

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 CDC 6600: Datapath 5 Address Regs Index Regs 8 x 18-bit 8 x 18-bit Operand Regs 8 x 60-bit Inst. Stack 8 x 60-bit IR 10 Functional Units Central Memory 128K words, 32 banks, 1µs cycle result addr result operand addr

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 CDC6600 ISA designed to simplify high- performance implementation  Use of three-address, register-register ALU instructions simplifies pipelined implementation -Only 3-bit register specifier fields checked for dependencies -No implicit dependencies between inputs and outputs  Decoupling setting of address register (Ar) from retrieving value from data register (Xr) simplifies providing multiple outstanding memory accesses -Software can schedule load of address register before use of value -Can interleave independent instructions inbetween  CDC6600 has multiple parallel but unpipelined functional units -E.g., 2 separate multipliers  Follow-on machine CDC7600 used pipelined functional units -Foreshadows later RISC designs 6

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 CDC6600: Vector Addition 7 B0  - n loop:JZE B0, exit A0  B0 + a0load X0 A1  B0 + b0 load X1 X6  X0 + X1 A6  B0 + c0 store X6 B0  B0 + 1 jump loop Ai = address register Bi = index register Xi = data register

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 CDC6600 Scoreboard  Instructions dispatched in-order to functional units provided no structural hazard or WAW -Stall on structural hazard, no functional units available -Only one pending write to any register  Instructions wait for input operands (RAW hazards) before execution -Can execute out-of-order  Instructions wait for output register to be read by preceding instructions (WAR) -Result held in functional unit until register free 8

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 9 [© IBM]

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 IBM 360/91 Floating-Point Unit R. M. Tomasulo, Mult load buffers (from memory) Adder Floating-Point Regfile store buffers (to memory)... instructions Common bus ensures that data is made available immediately to all the instructions waiting for it. Match tag, if equal, copy value & set presence “p”. Distribute reservation stations to functional units ptag/datap p p p p p p p p p p p 2 p p p p p p p p p p

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 Out-of-Order Fades into Background Out-of-order processing implemented commercially in 1960s, but disappeared again until 1990s as two major problems had to be solved:  Precise traps -Imprecise traps complicate debugging and OS code -Note, precise interrupts are relatively easy to provide  Branch prediction -Amount of exploitable instruction-level parallelism (ILP) limited by control hazards Also, simpler machine designs in new technology beat complicated machines in old technology -Big advantage to fit processor & caches on one chip -Microprocessors had era of 1%/week performance scaling 11

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 Separating Completion from Commit  Re-order buffer holds register results from completion until commit -Entries allocated in program order during decode -Buffers completed values and exception state until in- order commit point -Completed values can be used by dependents before committed (bypassing) -Each entry holds program counter, instruction type, destination register specifier and value if any, and exception status (info often compressed to save hardware)  Memory reordering needs special data structures -Speculative store address and data buffers -Speculative load address and data buffers 12

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 In-Order Commit for Precise Traps  In-order instruction fetch and decode, and dispatch to reservation stations inside reorder buffer  Instructions issue from reservation stations out-of-order  Out-of-order completion, values stored in temporary buffers  Commit is in-order, checks for traps, and if none updates architectural state 13 FetchDecode Execute Commit Reorder Buffer In-order Out-of-order Trap? Kill Inject handler PC

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 Phases of Instruction Execution 14 Fetch: Instruction bits retrieved from instruction cache. I-cache Fetch Buffer Issue Buffer Functional Units Architectural State Execute: Instructions and operands issued to functional units. When execution completes, all results and exception flags are available. Decode: Instructions dispatched to appropriate issue buffer Result Buffer Commit: Instruction irrevocably updates architectural state (aka “graduation”), or takes precise trap/interrupt. PC Commit Decode/Rename

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 In-Order versus Out-of-Order Phases  Instruction fetch/decode/rename always in-order -Need to parse ISA sequentially to get correct semantics -Proposals for speculative OoO instruction fetch, e.g., Multiscalar. Predict control flow and data dependencies across sequential program segments fetched/decoded/executed in parallel, fixup if prediction wrong  Dispatch (place instruction into machine buffers to wait for issue) also always in-order -Some use “Dispatch” to mean issue, but not in these lectures 15

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 In-Order Versus Out-of-Order Issue  In-order issue: -Issue stalls on RAW dependencies or structural hazards, or possibly WAR/WAW hazards -Instruction cannot issue to execution units unless all preceding instructions have issued to execution units  Out-of-order issue: -Instructions dispatched in program order to reservation stations (or other forms of instruction buffer) to wait for operands to arrive, or other hazards to clear -While earlier instructions wait in issue buffers, following instructions can be dispatched and issued out-of-order 16

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 In-Order versus Out-of-Order Completion  All but the simplest machines have out-of-order completion, due to different latencies of functional units and desire to bypass values as soon as available  Classic RISC 5-stage integer pipeline just barely has in- order completion -Load takes two cycles, but following one-cycle integer op completes at same time, not earlier -Adding pipelined FPU immediately brings OoO completion 17

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 In-Order versus Out-of-Order Commit  In-order commit supports precise traps, standard today -Some proposals to reduce the cost of in-order commit by retiring some instructions early to compact reorder buffer, but this is just an optimized in-order commit  Out-of-order commit was effectively what early OoO machines implemented (imprecise traps) as completion irrevocably changed machine state -i.e., complete == commit in these machines 18

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 OoO Design Choices  Where are reservation stations? -Part of reorder buffer, or in separate issue window? -Distributed by functional units, or centralized?  How is register renaming performed? -Tags and data held in reservation stations, with separate architectural register file -Tags only in reservation stations, data held in unified physical register file 19

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 “Data-in-ROB” Design (HP PA8000, Pentium Pro, Core2Duo, Nehalem)  Managed as circular buffer in program order, new instructions dispatched to free slots, oldest instruction committed/reclaimed when done (“p” bit set on result)  Tag is given by index in ROB (Free pointer value)  In dispatch, non-busy source operands read from architectural register file and copied to Src1 and Src2 with presence bit “p” set. Busy operands copy tag of producer and clear “p” bit.  Set valid bit “v” on dispatch, set issued bit “i” on issue  On completion, search source tags, set “p” bit and copy data into src on tag match. Write result and exception flags to ROB.  On commit, check exception status, and copy result into architectural register file if no trap.  On trap, flush machine and ROB, set free=oldest, jump to handler TagpSrc1TagpSrc2RegpResultExcept?ivOpcodeTagpSrc1TagpSrc2RegpResultExcept?ivOpcodeTagpSrc1TagpSrc2RegpResultExcept?ivOpcodeTagpSrc1TagpSrc2RegpResultExcept?ivOpcodeTagpSrc1TagpSrc2RegpResultExcept?ivOpcode Oldest Free

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 Managing Rename for Data-in-ROB  If “p” bit set, then use value in architectural register file  Else, tag field indicates instruction that will/has produced value  For dispatch, read source operands from arch. regfile, and also read from producing instruction in ROB, bypassing as needed. Copy to ROB  Write destination arch. register entry with, to assign tag to ROB index of this instruction  On commit, update arch. regfile with  On trap, reset table (All p=1) 21 TagpValue TagpValue TagpValue TagpValue One entry per arch. register Rename table associated with architectural registers, managed in decode/dispatch

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 ROB Data Movement in Data-in-ROB Design 22 Architectural Register File Read operands during decode Source Operands Write sources in dispatch Read operands at issue Functional Units Write results at completion Read results for commit Bypass newer values at dispatch Result Data Write results at commit

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 Unified Physical Register File (MIPS R10K, Alpha 21264, Intel Pentium 4 & Sandy/Ivy Bridge)  Rename all architectural registers into a single physical register file during decode, no register values read  Functional units read and write from single unified register file holding committed and temporary registers in execute  Commit only updates mapping of architectural register to physical register, no data movement 23 Unified Physical Register File Read operands at issue Functional Units Write results at completion Committed Register Mapping Decode Stage Register Mapping

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 Lifetime of Physical Registers 24 ld x1, (x3) addi x3, x1, #4 sub x6, x7, x9 add x3, x3, x6 ld x6, (x1) add x6, x6, x3 sd x6, (x1) ld x6, (x11) ld P1, (Px) addi P2, P1, #4 sub P3, Py, Pz add P4, P2, P3 ld P5, (P1) add P6, P5, P4 sd P6, (P1) ld P7, (Pw) Rename When can we reuse a physical register? When next writer of same architectural register commits Physical regfile holds committed and speculative values Physical registers decoupled from ROB entries (no data in ROB)

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 Physical Register Management 25 opp1PR1p2PR2exuseRdPRdLPRd P5 P6 P7 P0 Pn P1 P2 P3 P4 x5 P5 x6 P6 x7 x0 P8 x1 x2 P7 x3 x4 ROB Rename Table Physical Regs Free List ld x1, 0(x3) addi x3, x1, #4 sub x6, x7, x6 add x3, x3, x6 ld x6, 0(x1) p p p P0 P1 P3 P2 P4 (LPRd requires third read port on Rename Table for each instruction) P8 p

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 Physical Register Management 26 opp1PR1p2PR2exuseRdPRdLPRd ROB ld x1, 0(x3) addi x3, x1, #4 sub x6, x7, x6 add x3, x3, x6 ld x6, 0(x1) Free List P0 P1 P3 P2 P4 P5 P6 P7 P0 Pn P1 P2 P3 P4 Physical Regs p p p P8 p x ld p P7 x1 P0 x5 P5 x6 P6 x7 x0 P8 x1 x2 P7 x3 x4 Rename Table P0 P8

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 Physical Register Management 27 opp1PR1p2PR2exuseRdPRdLPRd ROB ld x1, 0(x3) addi x3, x1, #4 sub x6, x7, x6 add x3, x3, x6 ld x6, 0(x1) Free List P0 P1 P3 P2 P4 P5 P6 P7 P0 Pn P1 P2 P3 P4 Physical Regs p p p P8 p x ld p P7 x1 P0 x5 P5 x6 P6 x7 x0 P8 x1 x2 P7 x3 x4 Rename Table P0 P8 P7 P1 x addi P0 x3 P1

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 Physical Register Management 28 opp1PR1p2PR2exuseRdPRdLPRd ROB ld x1, 0(x3) addi x3, x1, #4 sub x6, x7, x6 add x3, x3, x6 ld x6, 0(x1) Free List P0 P1 P3 P2 P4 P5 P6 P7 P0 Pn P1 P2 P3 P4 Physical Regs p p p P8 p x ld p P7 x1 P0 x5 P5 x6 P6 x7 x0 P8 x1 x2 P7 x3 x4 Rename Table P0 P8 P7 P1 x addi P0 x3 P1 P5 P3 x sub p P6 p P5 x6 P3

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 Physical Register Management 29 opp1PR1p2PR2exuseRdPRdLPRd ROB ld x1, 0(x3) addi x3, x1, #4 sub x6, x7, x6 add x3, x3, x6 ld x6, 0(x1) Free List P0 P1 P3 P2 P4 P5 P6 P7 P0 Pn P1 P2 P3 P4 Physical Regs p p p P8 p x ld p P7 x1 P0 x5 P5 x6 P6 x7 x0 P8 x1 x2 P7 x3 x4 Rename Table P0 P8 P7 P1 x addi P0 x3 P1 P5 P3 x sub p P6 p P5 x6 P3 P1 P2 x add P1 P3 x3 P2

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 Physical Register Management 30 opp1PR1p2PR2exuseRdPRdLPRd ROB ld x1, 0(x3) addi x3, x1, #4 sub x6, x7, x6 add x3, x3, x6 ld x6, 0(x1) Free List P0 P1 P3 P2 P4 P5 P6 P7 P0 Pn P1 P2 P3 P4 Physical Regs p p p P8 p x ld p P7 x1 P0 x5 P5 x6 P6 x7 x0 P8 x1 x2 P7 x3 x4 Rename Table P0 P8 P7 P1 x addi P0 x3 P1 P5 P3 x sub p P6 p P5 x6 P3 P1 P2 x add P1 P3 x3 P2 x ld P0 x6 P4P3 P4

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 Physical Register Management 31 opp1PR1p2PR2exuseRdPRdLPRd ROB x ld p P7 x1 P0 x addi P0 x3 P1 x sub p P6 p P5 x6 P3 x ld p P7 x1 P0 ld x1, 0(x3) addi x3, x1, #4 sub x6, x7, x6 add x3, x3, x6 ld x6, 0(x1) Free List P0 P1 P3 P2 P4 P5 P6 P7 P0 Pn P1 P2 P3 P4 Physical Regs p p p P8 p x5 P5 x6 P6 x7 x0 P8 x1 x2 P7 x3 x4 Rename Table P0 P8 P7 P1 P5 P3 P1 P2 x add P1 P3 x3 P2 x ld P0 x6 P4P3 P4 Execute & Commit p p p P8 x

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 Physical Register Management 32 opp1PR1p2PR2exuseRdPRdLPRd ROB x sub p P6 p P5 x6 P3 x addi P0 x3 P1 ld x1, 0(x3) addi x3, x1, #4 sub x6, x7, x6 add x3, x3, x6 ld x6, 0(x1) Free List P0 P1 P3 P2 P4 P5 P6 P7 P0 Pn P1 P2 P3 P4 Physical Regs p p p P8 x x ld p P7 x1 P0 x5 P5 x6 P6 x7 x0 P8 x1 x2 P7 x3 x4 Rename Table P0 P8 P7 P1 P5 P3 P1 P2 x add P1 P3 x3 P2 x ld P0 x6 P4P3 P4 Execute & Commit p p p P8 x p p P7

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 MIPS R10K Trap Handling  Rename table is repaired by unrenaming instructions in reverse order using the PRd/LPRd fields  The Alpha had similar physical register file scheme, but kept complete rename table snapshots for each instruction in ROB (80 snapshots total) -Flash copy all bits from snapshot to active table in one cycle 33

© Krste Asanovic, 2015CS252, Fall 2015, Lecture 6 Acknowledgements  This course is partly inspired by previous MIT and Berkeley CS252 computer architecture courses created by my collaborators and colleagues: -Arvind (MIT) -Joel Emer (Intel/MIT) -James Hoe (CMU) -John Kubiatowicz (UCB) -David Patterson (UCB) 34