Copyright 2001 UCB & Morgan Kaufmann ECE668.1 Adapted from Patterson, Katz and Kubiatowicz © UCB Csaba Andras Moritz UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Computer Architecture ECE 668 Exceptions, Reorder Buffer (ROB), Speculative Tomasulo

Exceptions - Basics
Exception = unprogrammed control transfer
System takes action to handle the exception
  » must record the address of the offending instruction
  » must record any other information necessary to return afterwards
Returns control to user
Must save & restore user state
Normal control flow: sequential, jumps, branches, calls, returns
[Figure: the user program transfers control to the system exception handler on an exception; the handler returns from the exception to the user program]

Two Types of Exceptions
Interrupts
  caused by external events:
    » network, keyboard, disk I/O, timer
  asynchronous to program execution
    » most interrupts can be disabled for brief periods of time
  may be handled between instructions
  simply suspend and resume user program
Traps
  caused by internal events
    » exceptional conditions (overflow)
    » errors (parity)
    » page faults (non-resident page)
  synchronous to program execution
  condition must be remedied by the handler
  instruction may be retried and program continued, or program may be aborted

Exceptions - Examples

Exceptions in MIPS pipeline
5 instructions, executing in 5 different pipeline stages!
Who caused the interrupt?

Stage  Possible exceptions
IF     Page fault on instruction fetch; misaligned memory access; memory-protection violation
ID     Undefined or illegal opcode
EX     Arithmetic exception
MEM    Page fault on data fetch; misaligned memory access; memory-protection violation; memory error

How do we stop the pipeline? How do we restart it?
Do we interrupt immediately or wait?

Multiple exceptions
[Figure: two pipeline timing diagrams over clock cycles 1-5, each showing a Load followed by an Add moving through the stages Ifetch, Reg, ALU, DMem, Reg. The first diagram is labeled with a data page fault and an arithmetic exception; the second with a data page fault and an instruction page fault.]

Precise Interrupts/Exceptions
Exceptions should be precise (clean), i.e., the outcome should be exactly the same as in a non-pipelined machine
  Precise: the state of the machine is preserved as if the program executed up to the offending instruction
    » all previous instructions completed
    » offending instruction and all following instructions act as if they have not even started
  Same code will work on different processor implementations
  Difficult in the presence of pipelining, out-of-order execution, ...
Imprecise: system software has to figure out what is where and put it all back together
Modern techniques for out-of-order execution and branch prediction help implement precise interrupts

Relationship between precise interrupts and speculation
Speculation: guess and check
Important for branch prediction:
  » need to “take our best shot” at predicting branch direction
If we speculate and are wrong, we need to back up and restart execution at the point where we predicted incorrectly:
  » this is exactly the same as precise exceptions!
Technique for both precise interrupts/exceptions and speculation: in-order completion or commit

Handling Exceptions
Exceptions are handled by not recognizing the exception until the instruction that caused it is ready to commit in the ROB
If a speculated instruction raises an exception, the exception is recorded in the ROB
This is why reorder buffers are in all new processors
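
The deferred handling described on this slide can be sketched as follows. This is a minimal illustration, not the slides' own implementation: the `Entry` fields, the `commit_all` helper, and the PC values are hypothetical names chosen for the example. The exception is only recorded when the instruction executes and is acted on when that entry reaches the head of the ROB.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    # Hypothetical ROB entry layout, chosen for this illustration only.
    pc: int
    dest: Optional[str]
    value: Optional[int] = None
    ready: bool = False
    exception: Optional[str] = None  # recorded at execute time, acted on at commit


def commit_all(rob, regs):
    """Commit ready entries in program order; stop at the first exception.

    Returns (pc, cause) of the faulting instruction, or None if no fault
    reached the head of the ROB.
    """
    while rob and rob[0].ready:
        head = rob[0]
        if head.exception is not None:
            rob.clear()  # squash the faulting instruction and everything younger
            return head.pc, head.exception
        if head.dest is not None:
            regs[head.dest] = head.value  # architectural state changes only here
        rob.popleft()
    return None


regs = {"F0": 0, "F2": 0, "F4": 0}
rob = deque([
    Entry(pc=100, dest="F0", value=7, ready=True),
    Entry(pc=104, dest="F2", ready=True, exception="overflow"),
    Entry(pc=108, dest="F4", value=9, ready=True),  # finished early, must not commit
])
fault = commit_all(rob, regs)
```

After the call, only the instruction older than the faulting one has updated the registers, which is the precise state the handler needs.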

Reorder Buffer (HW support for precise interrupts)
ROB = buffer for results of uncommitted instructions
An instruction commits when it completes its execution and all its predecessors have already committed
Once an instruction commits, its result is put into the register
  » therefore, it is easy to undo speculated instructions on mispredicted branches or exceptions
Supplies operands between execution complete & commit
[Figure: block diagram with FP Op Queue, Reorder Buffer, FP Regs, Res Stations, FP Adder, FP Multiplier]

More on Reorder Buffer operation
Holds instructions in FIFO order, exactly as issued
When instructions complete, results are placed into the ROB
  » supplies operands to other instructions between execution complete & commit
  » results are tagged with the ROB buffer number instead of the reservation station
Instructions commit: values at the head of the ROB are placed in registers
[Figure: block diagram with FP Op Queue, Reorder Buffer, FP Regs, Res Stations, FP Adder, and the commit path]

Another Perspective on Reorder Buffer
If instructions write results in program order, registers and memory always get the correct values
Role of ROB: to reorder out-of-order instructions into program order at the time of writing register/memory (commit)
An instruction cannot write register/memory immediately after execution, so the ROB also buffers the results
  » no such place exists in the original Tomasulo
[Figure: block diagram with Fetch Unit, IM, Decode, Rename, Regfile, ReSt, FU1, FU2, L-buf, S-buf, DM, Reorder Buffer]

ROB: Circular Buffer with Head/Tail Pointers
Entries between head and tail are valid
ROB entry is allocated when an instruction is issued
ROB entry is freed when it leaves the buffer
[Figure: three snapshots of the circular buffer showing the head and tail pointers moving as entries are allocated and freed]
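
A circular buffer with head and tail pointers can be sketched as below. The class and method names (`CircularROB`, `allocate`, `free_head`) are assumptions made for this example; the sketch keeps an explicit count so that a full buffer and an empty buffer, which both have head equal to tail, can be told apart.

```python
class CircularROB:
    """Illustrative circular reorder buffer; names are hypothetical."""

    def __init__(self, size):
        self.slots = [None] * size
        self.head = 0   # oldest valid entry (next to commit)
        self.tail = 0   # next free slot (given to the next issuing instruction)
        self.count = 0  # number of valid entries between head and tail

    def is_full(self):
        return self.count == len(self.slots)

    def allocate(self, instr):
        """Give the tail entry to an issuing instruction; return its ROB tag."""
        if self.is_full():
            return None  # no free entry: issue must stall
        tag = self.tail
        self.slots[tag] = instr
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1
        return tag

    def free_head(self):
        """Free the head entry (the oldest instruction) and return it."""
        if self.count == 0:
            raise IndexError("reorder buffer is empty")
        instr = self.slots[self.head]
        self.slots[self.head] = None
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return instr


rob = CircularROB(3)
tags = [rob.allocate(i) for i in ("LD", "ADDD", "DIVD", "BNE")]  # fourth one stalls
oldest = rob.free_head()
wrapped_tag = rob.allocate("ST")  # reuses the slot that was just freed
```

The last allocation wraps around to slot 0, which is what makes the buffer circular.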

Reorder Buffer Entry Details
Fields of a reorder buffer entry:
  » Dest reg
  » Result
  » Exceptions?
  » Program Counter
  » Branch or L/W?
  » Ready?

Organization with ROB and Associated Result Shift Register (from Smith et al. 1988)
[Figure: block diagram with Register File, Reorder Buffer, Result Shift Register, Control, Bypass Logic/Comparators, Common Result Bus, source data to functional units, and data written to the register file upon commit]
For more details read: J. Smith & A. Pleszkun, IEEE Transactions on Computers, May 1988
Result Shift Register (RSR) controls the result bus
  » stages are labeled 1 through n, where n is the length of the longest FU pipeline
  » an instruction taking i clocks reserves stage i in the RSR when it issues
  » if stage i already holds a valid instruction, the issuing instruction waits until the next clock
  » the issuing instruction places control information into the RSR
  » each clock, entries move one stage toward stage 1; the entry that reaches stage 1 supplies the control used in the next cycle
The ROB tag guides each result to the correct ROB entry
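
The reservation and shifting behaviour of the result shift register can be sketched as below. This is a simplified model under stated assumptions: the class name, the `(unit, rob_tag)` control tuple, and the ROB tags 4 and 5 are hypothetical, and an entry is treated as driving the result bus on the clock in which it leaves stage 1. The latencies 6 and 2 match the ADDF and ADD of the Smith and Pleszkun example.

```python
class ResultShiftRegister:
    """Illustrative RSR with stages 1..n; names are hypothetical."""

    def __init__(self, n):
        self.stages = [None] * (n + 1)  # index 0 is unused

    def try_reserve(self, latency, unit, rob_tag):
        """Reserve stage `latency` at issue; fail if that stage is taken."""
        if self.stages[latency] is not None:
            return False  # structural conflict: retry on the next clock
        self.stages[latency] = (unit, rob_tag)
        return True

    def clock(self):
        """Shift every entry one stage toward 1; return the entry leaving stage 1."""
        leaving = self.stages[1]
        self.stages[1:] = self.stages[2:] + [None]
        return leaving


rsr = ResultShiftRegister(6)
rsr.try_reserve(6, "fp_add", 4)                    # ADDF: 6 clocks in the FU
rsr.clock()                                        # one clock passes
first_try = rsr.try_reserve(2, "int_add", 5)       # ADD: 2 clocks in the FU
conflict = rsr.try_reserve(2, "int_add", 6)        # same stage already reserved
finished = [rsr.clock() for _ in range(5)]
```

The integer ADD reaches the result bus before the older floating-point ADD, so the results arrive out of program order and the ROB tag is what routes each one to its entry.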

Example of RSR use (see Smith et al.)

PC  Instruction      Ex_Time (in FU)
6   ADDF F10,F1,F3   6
7   ADD R9,R2,R5     2

State in RSR (control info plus ROB tag) after the ADD issues:
[Figure: Result Shift Register table with columns Stage, Functional unit source, Valid instr., and Tag, holding an Integer ADD entry and a Flt. Pt. ADD entry, with an arrow showing the direction of movement; beside it the circular reorder buffer with its Head and Tail pointers]
ROB entry at Tail is given to the issuing instruction; Tail++

Four Steps of Speculative Tomasulo Algorithm
1. Issue: get instruction from FP Op Queue
   If a reservation station, reorder buffer slot, and result shift register slot are free, issue the instruction and send the operands and the reorder buffer number for the destination (this stage is sometimes called “dispatch”)
   Actions summary: (1) decode the instruction; (2) allocate an RS, RSR, and ROB entry; (3) do source register renaming; (4) do destination register renaming; (5) read the register file; (6) dispatch the decoded and renamed instruction to the RS and ROB
2. Execution: operate on operands (EX)
   Action: when both operands are ready, execute; if not ready, watch the CDB for the result; when both are in the reservation station, execute; this takes care of RAW hazards (sometimes called “issue”)
3. Write result: finish execution (WB)
   Action: write on the Common Data Bus to all awaiting FUs & the reorder buffer; mark the reservation station available
4. Commit: update register with result from reorder buffer
   Action: when the instruction is at the head of the ROB and its result is present, update the register with the result (or store to memory) and remove the instruction from the ROB. A mispredicted branch flushes the reorder buffer (sometimes called “graduation”)
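
The four steps can be illustrated with a small register-only model. This is a sketch under several assumptions, not the algorithm as specified in the slides: the `Machine` class and its method names are hypothetical, execute and write result are merged into one call, ROB tags simply count upward instead of indexing a circular buffer, and loads, stores, branches, and the result shift register are left out.

```python
import operator
from collections import deque

OPS = {"add": operator.add, "mul": operator.mul, "div": operator.floordiv}


class Machine:
    """Illustrative model; every name here is hypothetical."""

    def __init__(self, regs):
        self.regs = dict(regs)  # committed (architectural) register values
        self.rename = {}        # register -> ROB tag of its latest producer
        self.rob = deque()      # uncommitted entries, in program order
        self.rs = []            # occupied reservation stations
        self.next_tag = 1

    def _read(self, reg):
        """Source renaming: return a value, or the tag to watch on the CDB."""
        tag = self.rename.get(reg)
        if tag is None:
            return ("val", self.regs[reg])
        entry = next(e for e in self.rob if e["tag"] == tag)
        return ("val", entry["value"]) if entry["ready"] else ("tag", tag)

    def issue(self, op, dest, src1, src2):
        tag = self.next_tag
        self.next_tag += 1
        operands = [self._read(src1), self._read(src2)]  # read before renaming dest
        self.rob.append({"tag": tag, "dest": dest, "value": None, "ready": False})
        self.rs.append({"tag": tag, "op": op, "operands": operands})
        self.rename[dest] = tag  # destination renaming
        return tag

    def execute(self, tag):
        """Execute and write result; returns False while an operand is missing."""
        station = next(s for s in self.rs if s["tag"] == tag)
        if any(kind == "tag" for kind, _ in station["operands"]):
            return False
        (_, a), (_, b) = station["operands"]
        result = OPS[station["op"]](a, b)
        self.rs.remove(station)  # reservation station becomes available
        for entry in self.rob:   # CDB write into the reorder buffer
            if entry["tag"] == tag:
                entry["value"], entry["ready"] = result, True
        for waiting in self.rs:  # CDB write to waiting stations
            waiting["operands"] = [("val", result) if o == ("tag", tag) else o
                                   for o in waiting["operands"]]
        return True

    def commit(self):
        """Retire the head entry if its result is present; return its tag."""
        if not self.rob or not self.rob[0]["ready"]:
            return None
        entry = self.rob.popleft()
        self.regs[entry["dest"]] = entry["value"]
        if self.rename.get(entry["dest"]) == entry["tag"]:
            del self.rename[entry["dest"]]
        return entry["tag"]


m = Machine({"F0": 0, "F2": 8, "F4": 2, "F6": 3, "F10": 0})
t1 = m.issue("div", "F10", "F2", "F4")
t2 = m.issue("add", "F0", "F10", "F6")   # RAW on F10, waits for t1
t3 = m.issue("mul", "F6", "F4", "F4")    # independent, overwrites F6 later
blocked = m.execute(t2)                  # operand not ready yet
m.execute(t3)                            # finishes out of order
early = m.commit()                       # head (t1) not done, nothing commits
m.execute(t1)
m.execute(t2)
committed = [m.commit(), m.commit(), m.commit()]
```

In this run the multiply finishes first but cannot commit until the divide at the head of the ROB is done, and the add still sees the old value of F6 because it captured its operand at issue.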

Tomasulo With Reorder buffer:
Reorder buffer (ROB1 = oldest, ROB7 = newest)
Entry  Dest  Instruction      Done?
ROB1   F0    LD F0,10(R2)     N

Load buffer (from memory)
Dest  Address
1     10+R2

[Figure also shows the FP Op Queue, Registers, Reservation Stations, FP adders, FP multipliers, and the path to memory]

Tomasulo With Reorder buffer:
Reorder buffer (ROB1 = oldest, ROB7 = newest)
Entry  Dest  Instruction      Done?
ROB1   F0    LD F0,10(R2)     N
ROB2   F10   ADDD F10,F4,F0   N

Reservation stations
Unit       Dest  Operation
FP adders  2     ADDD R(F4),ROB1

Load buffer (from memory)
Dest  Address
1     10+R2

Tomasulo With Reorder buffer:
Reorder buffer (ROB1 = oldest, ROB7 = newest)
Entry  Dest  Instruction      Done?
ROB1   F0    LD F0,10(R2)     N
ROB2   F10   ADDD F10,F4,F0   N
ROB3   F2    DIVD F2,F10,F6   N

Reservation stations
Unit            Dest  Operation
FP adders       2     ADDD R(F4),ROB1
FP multipliers  3     DIVD ROB2,R(F6)

Load buffer (from memory)
Dest  Address
1     10+R2

Tomasulo With Reorder buffer:
Reorder buffer (ROB1 = oldest, ROB7 = newest)
Entry  Dest  Instruction      Done?
ROB1   F0    LD F0,10(R2)     N
ROB2   F10   ADDD F10,F4,F0   N
ROB3   F2    DIVD F2,F10,F6   N
ROB4   --    BNE F2,          N
ROB5   F4    LD F4,0(R3)      N
ROB6   F0    ADDD F0,F4,F6    N

Reservation stations
Unit            Dest  Operation
FP adders       2     ADDD R(F4),ROB1
FP adders       6     ADDD ROB5,R(F6)
FP multipliers  3     DIVD ROB2,R(F6)

Load buffer (from memory)
Dest  Address
1     10+R2
5     0+R3

Tomasulo With Reorder buffer:
Reorder buffer (ROB1 = oldest, ROB7 = newest)
Entry  Dest  Value  Instruction      Done?
ROB1   F0           LD F0,10(R2)     N
ROB2   F10          ADDD F10,F4,F0   N
ROB3   F2           DIVD F2,F10,F6   N
ROB4   --           BNE F2,          N
ROB5   F4           LD F4,0(R3)      N
ROB6   F0           ADDD F0,F4,F6    N
ROB7   --    ROB5   ST 0(R3),F4      N

Reservation stations
Unit            Dest  Operation
FP adders       2     ADDD R(F4),ROB1
FP adders       6     ADDD ROB5,R(F6)
FP multipliers  3     DIVD ROB2,R(F6)

Load buffer (from memory)
Dest  Address
1     10+R2
5     0+R3

Tomasulo With Reorder buffer:
Reorder buffer (ROB1 = oldest, ROB7 = newest)
Entry  Dest  Value  Instruction      Done?
ROB1   F0           LD F0,10(R2)     N
ROB2   F10          ADDD F10,F4,F0   N
ROB3   F2           DIVD F2,F10,F6   N
ROB4   --           BNE F2,          N
ROB5   F4    M[10]  LD F4,0(R3)      Y
ROB6   F0           ADDD F0,F4,F6    N
ROB7   --    M[10]  ST 0(R3),F4      Y

Reservation stations
Unit            Dest  Operation
FP adders       2     ADDD R(F4),ROB1
FP adders       6     ADDD M[10],R(F6)
FP multipliers  3     DIVD ROB2,R(F6)

Load buffer (from memory)
Dest  Address
1     10+R2

Tomasulo With Reorder buffer:
Reorder buffer (ROB1 = oldest, ROB7 = newest)
Entry  Dest  Value  Instruction      Done?
ROB1   F0           LD F0,10(R2)     N
ROB2   F10          ADDD F10,F4,F0   N
ROB3   F2           DIVD F2,F10,F6   N
ROB4   --           BNE F2,          N
ROB5   F4    M[10]  LD F4,0(R3)      Y
ROB6   F0           ADDD F0,F4,F6    Ex
ROB7   --    M[10]  ST 0(R3),F4      Y

Reservation stations
Unit            Dest  Operation
FP adders       2     ADDD R(F4),ROB1
FP multipliers  3     DIVD ROB2,R(F6)

Load buffer (from memory)
Dest  Address
1     10+R2

Tomasulo With Reorder buffer:
Reorder buffer (ROB1 = oldest, ROB7 = newest)
Entry  Dest  Value  Instruction      Done?
ROB1   F0           LD F0,10(R2)     N
ROB2   F10          ADDD F10,F4,F0   N
ROB3   F2           DIVD F2,F10,F6   N
ROB4   --           BNE F2,          N
ROB5   F4    M[10]  LD F4,0(R3)      Y
ROB6   F0           ADDD F0,F4,F6    Ex
ROB7   --    M[10]  ST 0(R3),F4      Y

Reservation stations
Unit            Dest  Operation
FP adders       2     ADDD R(F4),ROB1
FP multipliers  3     DIVD ROB2,R(F6)

Load buffer (from memory)
Dest  Address
1     10+R2

What about memory hazards???

Avoiding Memory Hazards
WAW and WAR hazards through memory are eliminated with speculation because the actual updating of memory occurs in order, when a store is at the head of the ROB, and hence no earlier loads or stores can still be pending
RAW hazards through memory are handled by two restrictions:
  1. not allowing a load to initiate the second step of its execution if any active ROB entry occupied by a store has a Destination field that matches the value of the A field of the load, and
  2. maintaining the program order for the computation of the effective address of a load with respect to all earlier stores
These restrictions ensure that any load that accesses a memory location written to by an earlier store cannot perform the memory access until the store has written the data

Getting CPI below 1
CPI ≥ 1 if only 1 instruction issues every clock cycle
Multiple-issue processors come in many flavors, e.g.:
  1. dynamically scheduled superscalar processors (out-of-order execution), and
  2. VLIW (very long instruction word) processors
     » VLIW processors, in contrast, issue a fixed number of instructions formatted either as one large instruction or as a fixed instruction packet, with the parallelism among instructions explicitly indicated by the instruction (Intel/HP Itanium)