1 Out Of Order Execution
Zvika Guz
Slides modified from Prof. Dave Patterson, Prof. John Kubiatowicz, and Prof. Nancy Warter-Perez
2 Goal:
– Performance (IPC > 1)
How?
– Wide machine
– Speculation (branch prediction)
– Out-of-order execution
  » Essentially a dataflow execution model: operations execute as soon as their operands are available
– Eliminate name dependencies (aka false dependencies): WAR (anti-dependences) and WAW (output dependences)
  » Via register renaming
But we still want a precise interrupt model
– In-order commit
  » Via a reorder buffer (ROB)
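The dataflow model can be illustrated with a small timing sketch. It rests on simplifying assumptions: unlimited functional units, perfect branch prediction, and renaming that has already removed WAR and WAW, so an instruction waits only for its RAW producers. The `Instr` class, the `dataflow_schedule` function, and the latencies are hypothetical and chosen only for illustration. The instruction list is the deck's code example (slide 6).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Instr:
    text: str
    dest: Optional[str]       # destination register (None for a branch)
    srcs: Tuple[str, ...]     # source registers
    latency: int              # hypothetical execution latency in cycles


def dataflow_schedule(prog: List[Instr]) -> List[Tuple[int, int]]:
    """(start, finish) cycle of each instruction under a pure dataflow model:
    unlimited functional units, perfect branch prediction, and renaming that
    has removed WAR/WAW, so only RAW (true) dependences delay an instruction."""
    ready_at: Dict[str, int] = {}   # register -> cycle its newest value is ready
    times: List[Tuple[int, int]] = []
    for ins in prog:
        # Registers with no earlier producer in `prog` are ready at cycle 0.
        start = max((ready_at.get(r, 0) for r in ins.srcs), default=0)
        finish = start + ins.latency
        times.append((start, finish))
        if ins.dest is not None:
            # Later readers see this (newest) producer; an older write to the
            # same register imposes no ordering here (WAW removed by renaming).
            ready_at[ins.dest] = finish
    return times


# The deck's code example with made-up latencies: LD=2, ADDD=2, DIVD=10, BNE=1.
example = [
    Instr("LD F0,10(R2)", "F0", ("R2",), 2),
    Instr("ADDD F10,F4,F0", "F10", ("F4", "F0"), 2),
    Instr("DIVD F2,F10,F6", "F2", ("F10", "F6"), 10),
    Instr("BNE F2", None, ("F2",), 1),
    Instr("LD F4,0(R3)", "F4", ("R3",), 2),
    Instr("ADDD F0,F4,F6", "F0", ("F4", "F6"), 2),
    Instr("ADDD F0,F4,F6", "F0", ("F4", "F6"), 2),
]

for ins, (s, f) in zip(example, dataflow_schedule(example)):
    print(f"{ins.text:16} start={s:2} finish={f:2}")
```

With these made-up latencies, the second load (LD F4,0(R3)) starts at cycle 0, in parallel with the first. The DIVD cannot start before cycle 4, because it waits on the ADDD that produces F10.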
3 MIPS FPU with Tomasulo and ROB
4 Tomasulo With Reorder buffer
[Figure: the FP Op Queue feeds reservation stations for the FP adders and FP multipliers; a reorder buffer holds entries ROB1 (oldest) to ROB7 (newest), each with Instruction, Dest, Value, and Done? fields; a register file (F0, F2, F4, F10) and a RAT map each register to a ROB entry; load data arrives from memory and committed stores go to memory]
5 4 Steps of Speculative Tomasulo Algorithm
1. Issue: get an instruction from the FP Op Queue (this stage is sometimes called “dispatch”)
   If a reservation station and a reorder buffer slot are free, issue the instruction and send its operands and the reorder buffer number for its destination.
2. Execution: operate on the operands (EX) (sometimes called “issue”)
   When both operands are ready, execute; if not, watch the CDB for the result. Once both operands are in the reservation station, execute. This step checks RAW hazards.
3. Write result: finish execution (WB)
   Write the result on the Common Data Bus (CDB) to all awaiting FUs and to the reorder buffer; mark the reservation station available.
4. Commit: update the register with the reorder buffer result (sometimes called “graduation”)
   When the instruction is at the head of the reorder buffer and its result is present, update the register with the result (or store to memory) and remove the instruction from the reorder buffer. A mispredicted branch flushes the reorder buffer.
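The four steps can be sketched as a small Python model. It is a simplification, not a faithful hardware description:
- each `execute_and_write` call performs steps 2 and 3 together;
- only two-operand arithmetic (`ADD`, `MUL`, `DIV`) is modeled;
- loads, stores, and branch flushes are omitted.

The class `SpeculativeTomasulo` and its method names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional, Sequence, Tuple

# An operand is (tag, value): tag names the ROB entry still awaited, or is None
# once the value is known.
Operand = Tuple[Optional[str], Optional[float]]


@dataclass
class RobEntry:
    tag: str
    dest: Optional[str]
    value: Optional[float] = None
    done: bool = False


@dataclass
class Station:
    tag: str                  # ROB entry that receives this instruction's result
    op: str
    operands: List[Operand]

    def ready(self) -> bool:
        return all(tag is None for tag, _ in self.operands)


class SpeculativeTomasulo:
    OPS = {"ADD": lambda a, b: a + b, "MUL": lambda a, b: a * b,
           "DIV": lambda a, b: a / b}

    def __init__(self, rob_size: int, num_stations: int, regs: Dict[str, float]):
        self.rob: Deque[RobEntry] = deque()
        self.rob_size, self.num_stations = rob_size, num_stations
        self.stations: List[Station] = []
        self.regs = dict(regs)            # architectural (committed) registers
        self.rat: Dict[str, str] = {}     # register -> ROB tag of newest writer
        self.next_id = 1

    def issue(self, op: str, dest: Optional[str], srcs: Sequence[str]) -> Optional[str]:
        # Step 1: stall (return None) unless a ROB slot and a station are free.
        if len(self.rob) >= self.rob_size or len(self.stations) >= self.num_stations:
            return None
        tag = f"ROB{self.next_id}"
        self.next_id += 1
        operands: List[Operand] = []
        for r in srcs:
            producer = self.rat.get(r)
            if producer is None:
                operands.append((None, self.regs[r]))      # committed value
            else:
                entry = next(e for e in self.rob if e.tag == producer)
                # The producer may have finished but not yet committed.
                operands.append((None, entry.value) if entry.done else (producer, None))
        self.rob.append(RobEntry(tag, dest))
        self.stations.append(Station(tag, op, operands))
        if dest is not None:
            self.rat[dest] = tag                           # rename the destination
        return tag

    def execute_and_write(self, tag: str) -> bool:
        # Step 2: execute only when both operands are present (RAW check).
        st = next((s for s in self.stations if s.tag == tag), None)
        if st is None or not st.ready():
            return False
        a, b = [value for _, value in st.operands]
        result = self.OPS[st.op](a, b)
        # Step 3: free the station, write the ROB entry, broadcast on the CDB.
        self.stations.remove(st)
        entry = next(e for e in self.rob if e.tag == tag)
        entry.value, entry.done = result, True
        for other in self.stations:
            other.operands = [(None, result) if t == tag else (t, v)
                              for t, v in other.operands]
        return True

    def commit(self) -> Optional[str]:
        # Step 4: only the oldest entry may retire, and only once it is done.
        if not self.rob or not self.rob[0].done:
            return None
        head = self.rob.popleft()
        if head.dest is not None:
            self.regs[head.dest] = head.value
            if self.rat.get(head.dest) == head.tag:        # no younger writer
                del self.rat[head.dest]
        return head.tag


m = SpeculativeTomasulo(rob_size=4, num_stations=4,
                        regs={"F2": 1.0, "F4": 2.0, "F6": 4.0})
t1 = m.issue("ADD", "F0", ["F2", "F4"])    # ROB1: F0 = F2 + F4
t2 = m.issue("MUL", "F8", ["F0", "F6"])    # ROB2: RAW on F0, waits for ROB1
t3 = m.issue("ADD", "F0", ["F6", "F6"])    # ROB3: WAW on F0, renamed away
early = m.execute_and_write(t2)            # False: ROB1's value not yet broadcast
m.execute_and_write(t3)                    # ROB3 finishes out of order
blocked = m.commit()                       # None: head (ROB1) is not done
m.execute_and_write(t1)                    # CDB broadcast wakes up ROB2's station
m.execute_and_write(t2)
retired = [m.commit(), m.commit(), m.commit()]   # in program order
print(early, blocked, retired, m.regs["F0"], m.regs["F8"])
```

The third instruction finishes before the first two. Even so, the registers change only in ROB order at commit, so F0 ends up holding ROB3's value. The RAT entry for F0 is cleared only when its newest writer (ROB3) commits.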
6 Code Example
1. LD F0, 10(R2)
2. ADDD F10, F4, F0
3. DIVD F2, F10, F6
4. BNE F2, <…>
5. LD F4, 0(R3)
6. ADDD F0, F4, F6
7. ADDD F0, F4, F6
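To see which hazards the example contains, the sketch below classifies every ordered pair of instructions as RAW, WAR, or WAW by comparing destination and source registers. Some assumptions apply:
- each instruction is encoded as a `(dest, sources)` tuple, and the names `PROGRAM`, `classify`, and `dependences` are hypothetical;
- the branch has no destination, and the load base registers R2/R3 count as ordinary sources;
- every pair is checked, so some reported hazards (for example WAW 1→7) are also implied transitively through other pairs.

```python
from itertools import combinations
from typing import List, Optional, Set, Tuple

Instr = Tuple[Optional[str], Tuple[str, ...]]   # (destination, sources)

PROGRAM: List[Instr] = [
    ("F0", ("R2",)),          # 1. LD   F0, 10(R2)
    ("F10", ("F4", "F0")),    # 2. ADDD F10, F4, F0
    ("F2", ("F10", "F6")),    # 3. DIVD F2, F10, F6
    (None, ("F2",)),          # 4. BNE  F2, ...
    ("F4", ("R3",)),          # 5. LD   F4, 0(R3)
    ("F0", ("F4", "F6")),     # 6. ADDD F0, F4, F6
    ("F0", ("F4", "F6")),     # 7. ADDD F0, F4, F6
]


def classify(earlier: Instr, later: Instr) -> Set[str]:
    """Hazards that order `later` relative to `earlier`."""
    d1, s1 = earlier
    d2, s2 = later
    kinds: Set[str] = set()
    if d1 is not None and d1 in s2:
        kinds.add("RAW")   # true dependence: later reads what earlier writes
    if d2 is not None and d2 in s1:
        kinds.add("WAR")   # anti-dependence: later overwrites what earlier reads
    if d1 is not None and d1 == d2:
        kinds.add("WAW")   # output dependence: both write the same register
    return kinds


def dependences(program: List[Instr]) -> List[Tuple[int, int, str]]:
    """Every (i, j, kind) with i < j, numbered from 1 as in the listing."""
    found = []
    for (i, a), (j, b) in combinations(enumerate(program, start=1), 2):
        for kind in sorted(classify(a, b)):
            found.append((i, j, kind))
    return found


for dep in dependences(PROGRAM):
    print(dep)
```

It reports 11 pairs:
- five RAW: 1→2, 2→3, 3→4, 5→6, 5→7;
- three WAR: 2→5, 2→6, 2→7;
- three WAW: 1→6, 1→7, 6→7.

Only the RAW pairs are true dependences. The WAR and WAW pairs are the name dependences that renaming removes.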
7 Tomasulo With Reorder buffer: initial state (ROB1 to ROB7, reservation stations, and RAT all empty)
8 Tomasulo With Reorder buffer: issue LD F0,10(R2)
ROB1: LD F0,10(R2), Dest F0, Done? N
Load buffer: 1 10+R2
RAT: F0→ROB1
9 Tomasulo With Reorder buffer: issue ADDD F10,F4,F0
ROB2: ADDD F10,F4,F0, Dest F10, Done? N
Reservation stations: 2 ADDD R(F4),ROB1
Load buffer: 1 10+R2
RAT: F0→ROB1, F10→ROB2
10 Tomasulo With Reorder buffer: issue DIVD F2,F10,F6
ROB3: DIVD F2,F10,F6, Dest F2, Done? N
Reservation stations: 2 ADDD R(F4),ROB1; 3 DIVD ROB2,R(F6)
Load buffer: 1 10+R2
RAT: F0→ROB1, F2→ROB3, F10→ROB2
11 Tomasulo With Reorder buffer: issue BNE F2, LD F4,0(R3), and ADDD F0,F4,F6
ROB4: BNE F2, Dest --, Done? N
ROB5: LD F4,0(R3), Dest F4, Done? N
ROB6: ADDD F0,F4,F6, Dest F0, Done? N
Reservation stations: 2 ADDD R(F4),ROB1; 3 DIVD ROB2,R(F6); 6 ADDD ROB5,R(F6)
Load buffer: 1 10+R2; 5 0+R3
RAT: F0→ROB6, F2→ROB3, F4→ROB5, F10→ROB2
12 Tomasulo With Reorder buffer: issue the second ADDD F0,F4,F6
ROB7: ADDD F0,F4,F6, Dest F0, Done? N
Reservation stations: 2 ADDD R(F4),ROB1; 3 DIVD ROB2,R(F6); 6 ADDD ROB5,R(F6); 7 ADDD ROB5,R(F6)
Load buffer: 1 10+R2; 5 0+R3
RAT: F0→ROB7, F2→ROB3, F4→ROB5, F10→ROB2 (F0 now maps to its newest writer, ROB7)
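The issue steps can be replayed with a short sketch of renaming through the RAT. Each source register is either looked up in the RAT (giving a ROB tag) or read from the register file (written `R(Fx)`, as on the slides). Only then is the destination remapped to the new ROB entry. The function `issue_in_order` is hypothetical. It assumes the ROB starts empty and never wraps, so instruction n lands in ROBn, and it ignores the integer base registers of the loads.

```python
from typing import Dict, List, Optional, Tuple

# (dest, FP sources) for the code example. Loads take their address from an
# integer register, which this sketch assumes the RAT does not track.
PROGRAM: List[Tuple[Optional[str], Tuple[str, ...]]] = [
    ("F0", ()),              # 1. LD   F0, 10(R2)
    ("F10", ("F4", "F0")),   # 2. ADDD F10, F4, F0
    ("F2", ("F10", "F6")),   # 3. DIVD F2, F10, F6
    (None, ("F2",)),         # 4. BNE  F2, ...
    ("F4", ()),              # 5. LD   F4, 0(R3)
    ("F0", ("F4", "F6")),    # 6. ADDD F0, F4, F6
    ("F0", ("F4", "F6")),    # 7. ADDD F0, F4, F6
]


def issue_in_order(program):
    """Issue every instruction into the next ROB entry, renaming via the RAT.
    Assumes an initially empty ROB with enough entries: instruction n -> ROBn."""
    rat: Dict[str, str] = {}              # register -> newest in-flight writer
    operands: Dict[int, List[str]] = {}   # station operands per instruction
    for n, (dest, srcs) in enumerate(program, start=1):
        # Read sources through the RAT *before* remapping this instruction's dest:
        # a mapped register becomes a ROB tag, an unmapped one a register value.
        operands[n] = [rat.get(r, f"R({r})") for r in srcs]
        if dest is not None:
            rat[dest] = f"ROB{n}"         # a later writer simply overrides (WAW gone)
    return rat, operands


rat, operands = issue_in_order(PROGRAM)
print(rat)
print(operands[2], operands[3], operands[6], operands[7])
```

The final mapping is F0→ROB7, F2→ROB3, F4→ROB5, F10→ROB2. Instructions 6 and 7 both wait on ROB5, with R(F6) as their second operand, matching slide 12. The WAW hazard between them on F0 is resolved simply by the later issue overwriting the RAT entry.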
13 Tomasulo With Reorder buffer: LD F4,0(R3) writes its result first, out of order
ROB5: Value M[10], Done? Y
The CDB broadcast fills in the waiting stations: 2 ADDD R(F4),ROB1; 3 DIVD ROB2,R(F6); 6 ADDD M[10],R(F6); 7 ADDD M[10],R(F6)
Station 2 already holds the old value R(F4), so the new F4 does not disturb it (no WAR hazard).
Load buffer: 1 10+R2
RAT: F0→ROB7, F2→ROB3, F4→ROB5, F10→ROB2
14 Tomasulo With Reorder buffer: both ADDD F0,F4,F6 instructions write their results
ROB6: Done? Y; ROB7: Done? Y
Reservation stations: 2 ADDD R(F4),ROB1; 3 DIVD ROB2,R(F6)
Load buffer: 1 10+R2
RAT: F0→ROB7, F2→ROB3, F4→ROB5, F10→ROB2
15 Tomasulo With Reorder buffer: LD F0,10(R2) writes its result
ROB1: Value M[2], Done? Y
Reservation stations: 2 ADDD R(F4),M[2]; 3 DIVD ROB2,R(F6)
RAT: F0→ROB7, F2→ROB3, F4→ROB5, F10→ROB2
16 Tomasulo With Reorder buffer: ROB1 commits; ADDD F10,F4,F0 writes its result
ROB1: Done? C (committed); register F0 = M[2]
ROB2: Done? Y
Reservation stations: 3 DIVD (ROB2 result),R(F6)
RAT: F0→ROB7, F2→ROB3, F4→ROB5, F10→ROB2 (F0 still maps to ROB7, its youngest pending writer)
17 Tomasulo With Reorder buffer: ROB2 commits; DIVD F2,F10,F6 writes its result
ROB2: Done? C; register F10 updated
ROB3: Done? Y
Reservation stations: empty
RAT: F0→ROB7, F2→ROB3, F4→ROB5
18 Tomasulo With Reorder buffer: ROB3 commits
ROB3: Done? C; register F2 updated
RAT: F0→ROB7, F4→ROB5
19 Tomasulo With Reorder buffer: the branch (ROB4) commits
ROB4 BNE F2: Done? C; no entries are flushed
RAT: F0→ROB7, F4→ROB5
20 Tomasulo With Reorder buffer: ROB5 commits
ROB5 LD F4,0(R3): Done? C; register F4 = M[10]
RAT: F0→ROB7
21 Tomasulo With Reorder buffer: ROB6 commits
ROB6 ADDD F0,F4,F6: Done? C; register F0 updated
RAT: F0→ROB7 (ROB7 is still the newest writer of F0)
22 Tomasulo With Reorder buffer: ROB7 commits
ROB7 ADDD F0,F4,F6: Done? C; register F0 updated
RAT: empty; all seven instructions have committed in program order
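The walkthrough shows no cycle numbers, but the in-order commit rule can be sketched as a small timing model. Three things here are assumptions:
- the function `commit_cycles`;
- the rule that an entry may commit starting the cycle after its result is written;
- the cycle numbers themselves.

The cycles merely follow the completion order seen in the walkthrough: ROB5 first, then ROB6 and ROB7, ROB1, ROB2, ROB3, and finally the branch.

```python
from typing import Dict


def commit_cycles(complete_at: Dict[int, int], width: int = 1) -> Dict[int, int]:
    """complete_at maps ROB entry number (1 = oldest) to the cycle in which its
    result was written. An entry may commit from the following cycle onward,
    only after every older entry has committed, and at most `width` commits
    happen per cycle. Returns the commit cycle of every entry."""
    order = sorted(complete_at)          # ROB order == program order
    commit_at: Dict[int, int] = {}
    i, cycle = 0, 0
    while i < len(order):
        cycle += 1
        committed_this_cycle = 0
        while (i < len(order) and committed_this_cycle < width
               and complete_at[order[i]] < cycle):
            commit_at[order[i]] = cycle
            i += 1
            committed_this_cycle += 1
    return commit_at


# Hypothetical cycle numbers that follow the completion order of the
# walkthrough: ROB5 first, then ROB6/ROB7, ROB1, ROB2, ROB3, and the branch.
written = {1: 3, 2: 4, 3: 5, 4: 6, 5: 1, 6: 2, 7: 2}
print(commit_cycles(written, width=1))
print(commit_cycles(written, width=2))
```

With one commit per cycle, ROB5 finishes first but commits only after ROB1 through ROB4. Allowing two commits per cycle drains ROB5 to ROB7 sooner. That ties into the "how many commits in a cycle?" question in the remarks.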
23 Remarks
What about timing?
– What happens in which cycle? The figures show no cycle numbers
  » How many fetches/commits per cycle?
  » How many execution units?
– Homework assignment
Preserving the precise interrupt model
– When an interrupt occurs, we can flush everything
  » Instructions that were not committed have no effect
– Commit happens in order
– Exceptions are taken on commit
What happens if the ROB is full?
– Fetch is stopped until some instruction commits
  » A committed instruction frees its ROB entry
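Below is a sketch of how exceptions are taken at commit and how a full ROB stalls issue. It makes several simplifying assumptions:
- an exception is only recorded when the instruction finishes;
- it is acted on when that instruction reaches the head of the ROB;
- recovery flushes every in-flight entry and clears the RAT;
- transferring control to a handler is not modeled.

`PreciseRob` and its methods are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional, Tuple


@dataclass
class Entry:
    tag: str
    dest: Optional[str]
    value: Optional[float] = None
    done: bool = False
    faulted: bool = False      # execution raised an exception (e.g. divide by zero)


class PreciseRob:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries: Deque[Entry] = deque()
        self.regs: Dict[str, float] = {}    # committed state only
        self.rat: Dict[str, str] = {}

    def allocate(self, tag: str, dest: Optional[str]) -> bool:
        # A full ROB stalls the front end until a commit frees the head entry.
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append(Entry(tag, dest))
        if dest is not None:
            self.rat[dest] = tag
        return True

    def finish(self, tag: str, value: Optional[float], faulted: bool = False) -> None:
        # Write-result step: the exception is only *recorded* here, not taken.
        entry = next(e for e in self.entries if e.tag == tag)
        entry.value, entry.done, entry.faulted = value, True, faulted

    def commit(self) -> Optional[Tuple[str, str]]:
        if not self.entries or not self.entries[0].done:
            return None
        head = self.entries[0]
        if head.faulted:
            # Everything older has committed and nothing younger has, so dropping
            # all in-flight entries leaves regs exactly as of the faulting instr.
            self.entries.clear()
            self.rat.clear()
            return ("exception", head.tag)
        self.entries.popleft()
        if head.dest is not None:
            self.regs[head.dest] = head.value
            if self.rat.get(head.dest) == head.tag:
                del self.rat[head.dest]
        return ("commit", head.tag)


rob = PreciseRob(capacity=2)
rob.allocate("ROB1", "F0")
rob.allocate("ROB2", "F2")
stalled = not rob.allocate("ROB3", "F4")   # ROB is full, so issue stalls
rob.finish("ROB2", None, faulted=True)     # the younger instruction faults first
rob.finish("ROB1", 1.0)
first = rob.commit()                       # ROB1 commits normally
rob.allocate("ROB3", "F4")                 # a slot is free again
second = rob.commit()                      # ROB2's fault is taken; ROB3 is flushed
print(stalled, first, second, rob.regs)
```

In this run the younger instruction (ROB2) faults before the older one finishes, yet ROB1 still commits normally. ROB3, issued after ROB2, is discarded, and `regs` holds exactly the state after ROB1.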
24 Memory Hazards
When is memory updated?
– On commit (or later)
  » Relevant for store instructions only
WAR/WAW hazards?
– Handled by the ROB (stores write memory in order, at commit)
RAW hazards?
– Must ensure that no older in-flight store targets the same address as the load
  ST 0(R2),F1
  LD F2,0(R2)
– What about memory disambiguation?
  » Simple answer: before starting the load, we must know the addresses of all older in-flight stores
  » In real life we speculate on this
  ST 0(R2),F1
  LD F2,0(R4) //What if R4=R2?
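The conservative rule can be sketched as a check run before a load starts: every older in-flight store must have a computed address, and none of those addresses may match the load's. The `InflightStore` type, the `load_may_start` function, and the addresses in the example are hypothetical. The speculative alternative mentioned on the slide, and store-to-load forwarding, are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InflightStore:
    seq: int               # program-order position (smaller = older)
    addr: Optional[int]    # None while the store's address is not yet computed


def load_may_start(load_seq: int, load_addr: int, stores: List[InflightStore]) -> bool:
    """Conservative disambiguation: a load may start only if every OLDER
    in-flight store has a known address and none of those addresses matches."""
    for st in stores:
        if st.seq > load_seq:
            continue          # younger stores cannot supply this load's value
        if st.addr is None:
            return False      # unknown address: it might alias (R4 = R2?)
        if st.addr == load_addr:
            return False      # RAW through memory: wait for the store
    return True


# ST 0(R2),F1 (seq 1) followed by LD F2,0(R4) (seq 2), with made-up addresses.
print(load_may_start(2, 0x40, [InflightStore(1, None)]))   # store address unknown
print(load_may_start(2, 0x40, [InflightStore(1, 0x80)]))   # R4 != R2: no conflict
print(load_may_start(2, 0x40, [InflightStore(1, 0x40)]))   # R4 == R2: must wait
```

Waiting on unknown store addresses is what makes this rule safe but slow. Speculating, as the slide notes, lets the load start early at the cost of recovery when an alias is later discovered.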