Published by Eustace Daniels. Modified over 9 years ago.
1 Lecture 6: Tomasulo Algorithm. CprE 581 Computer Systems Architecture, Fall 2009. Zhao Zhang. Reading: Textbook 2.4, 2.5
2 Tomasulo Algorithm. History: 1966: scoreboarding in the CDC 6600, implementing limited dynamic scheduling. Three years later: Tomasulo's algorithm in the IBM 360/91, introducing register renaming and reservation stations. Now appearing in today’s DEC Alpha, SGI MIPS, Sun UltraSPARC, Intel Pentium, IBM PowerPC, and others. From today’s perspective: the implementation belongs to reservation-based dynamic scheduling without data forwarding (obsolete). The merit is the use of tags to identify instructions in flight: register renaming, reservation stations, tag broadcast. It is an “algorithm” first. Adapted from UCB CS252 S98, Copyright 1998 UCB
3 Assumptions. Assumptions for the time being: no branches; only register dependences are a concern; only floating-point instructions are executed out of order.
4 Register Renaming and Correctness. 1. The same set of instructions writes to user-visible registers and memory; 2. Each instruction receives the same operands as in the sequential execution; and 3. Any register or memory word receives the value of the last write, as in the sequential execution.
5 Tomasulo Organization [figure]
6 Reservation Station Components. Op: operation to perform in the unit (e.g., + or –). Vj, Vk: values of the source operands; store buffers have a V field holding the result to be stored. Qj, Qk: reservation stations producing the source registers (the value to be written). Note: there are no ready flags as in the scoreboard; Qj, Qk = 0 means ready. Store buffers only have Qi, for the RS producing the result. Busy: indicates that the reservation station or FU is busy. How are dependences represented in a reservation station?
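The fields on this slide can be pictured as one record per reservation station. The following is only a minimal sketch: the class name, the `name` label, and the `ready()` helper are illustrative assumptions, not part of the lecture. It encodes the convention that a Q field of 0 means the operand is available.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    """One reservation-station entry; field names follow the slide."""
    name: str                    # hypothetical label, e.g. "Add1"
    busy: bool = False           # Busy: the entry holds an instruction
    op: Optional[str] = None     # Op: operation to perform
    vj: Optional[float] = None   # Vj: first source value, once known
    vk: Optional[float] = None   # Vk: second source value, once known
    qj: int = 0                  # Qj: tag of the RS producing Vj; 0 = ready
    qk: int = 0                  # Qk: tag of the RS producing Vk; 0 = ready

    def ready(self) -> bool:
        # No separate ready flag: both Q fields being 0 is the ready test.
        return self.busy and self.qj == 0 and self.qk == 0


# A subtract whose second operand is still being produced by the RS tagged 2.
sub = ReservationStation("Add1", busy=True, op="SUB", vj=1.5, qk=2)
print(sub.ready())   # False: the dependence is recorded in qk
```

A dependence is therefore represented by a nonzero tag in Qj or Qk rather than by a register name.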
7 Register Result Status. A table that records, for each register, the RS index of the latest pending instruction that writes to it, if there is one. It is a form of register renaming: a register is renamed to an RS while a write to it is pending.
8 Three Stages of Tomasulo Algorithm. 1. Issue: get an instruction from the FP op queue. If a reservation station is free (no structural hazard), control issues the instruction and sends the operands (renames registers). 2. Execute: operate on operands (EX). When both operands are ready, execute; if not, watch the Common Data Bus for the result. 3. Write result: finish execution (WB). Write on the Common Data Bus to all awaiting units; mark the reservation station available. Issue builds the dependences for a new instruction; writeback wakes up dependent instructions.
9 Issue Stage and Renaming. Source renaming: rename the two source registers if there are pending writes to them. Assign a free RS. Destination renaming: update the register result status with the allocated RS. Renaming removes name dependences.
10 Issue Stage and Renaming. Example: ADD R31, R1, R2; SUB R4, R3, R31; ADD R31, R5, R6; SUB R8, R7, R31
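To make the renaming in this example concrete, here is a small sketch that walks the four instructions through issue. The function name `rename`, the tuple layout, and the "RS<n>" notation are assumptions made for illustration; the sketch also assumes a fresh RS is always available and never freed, which is enough to show the renaming itself.

```python
def rename(program):
    """Issue instructions in order, renaming sources and destinations.

    program: list of (op, dest, src1, src2) tuples.
    Returns a list of (rs_tag, op, src1, src2); each source is either a
    register name (read from the register file) or "RS<n>" (the value will
    arrive from reservation station n).
    """
    status = {}    # register result status: register -> tag of pending writer
    issued = []
    for tag, (op, dest, s1, s2) in enumerate(program, start=1):
        # Source renaming first, so an instruction never depends on itself.
        srcs = [f"RS{status[s]}" if s in status else s for s in (s1, s2)]
        # Destination renaming: this RS is now the latest writer of dest.
        status[dest] = tag
        issued.append((tag, op, *srcs))
    return issued


program = [
    ("ADD", "R31", "R1", "R2"),
    ("SUB", "R4", "R3", "R31"),
    ("ADD", "R31", "R5", "R6"),
    ("SUB", "R8", "R7", "R31"),
]
for entry in rename(program):
    print(entry)
# (1, 'ADD', 'R1', 'R2')
# (2, 'SUB', 'R3', 'RS1')
# (3, 'ADD', 'R5', 'R6')
# (4, 'SUB', 'R7', 'RS3')
```

After renaming, the two SUBs refer to different producers (RS1 and RS3), so the reuse of R31 no longer ties the two ADD/SUB pairs together.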
11 Execute Stage. Only “ready” instructions can join the competition. Select logic chooses which instructions go to the FUs for execution; some policy may be used, e.g. age-based. Non-ready instructions can be woken up during the writeback of their parent instruction.
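An age-based select policy can be sketched in a few lines. This is an illustration under assumptions, not the lecture's design: stations are plain dictionaries, `age` is a hypothetical issue sequence number (smaller means issued earlier), and only one instruction is selected per call.

```python
def select_oldest_ready(stations):
    """Return the oldest ready station, or None if nothing is ready.

    A station is ready when both of its Q tags are 0.
    """
    ready = [s for s in stations if s["qj"] == 0 and s["qk"] == 0]
    return min(ready, key=lambda s: s["age"], default=None)


stations = [
    {"name": "Add1", "age": 3, "qj": 0, "qk": 0},   # ready, issued last
    {"name": "Add2", "age": 1, "qj": 2, "qk": 0},   # oldest, but still waiting on tag 2
    {"name": "Add3", "age": 2, "qj": 0, "qk": 0},   # ready, older than Add1
]
print(select_oldest_ready(stations)["name"])   # Add3
```

Add2 is the oldest entry but does not compete because one operand is still pending; among the ready entries the older one wins.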
12 Writeback and Common Data Bus. Normal data bus: data + destination (“go to” bus). Common data bus: data + source (“come from” bus): 64 bits of data + 4 bits of source index (tag). It broadcasts to every instruction in flight. Child instructions do tag matching and, if the tag matches theirs, update their ready bits and value fields.
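The tag matching done by child instructions can be sketched as follows. The dictionary layout and the function name `broadcast` are assumptions for illustration; the point is only that every waiting entry compares the broadcast tag against its own Qj and Qk and captures the value on a match.

```python
def broadcast(stations, tag, value):
    """Deliver (tag, value) to all stations; return names that became ready."""
    woken = []
    for s in stations:
        matched = False
        if s["qj"] == tag:               # first operand was waiting on this tag
            s["vj"], s["qj"] = value, 0
            matched = True
        if s["qk"] == tag:               # second operand was waiting on this tag
            s["vk"], s["qk"] = value, 0
            matched = True
        if matched and s["qj"] == 0 and s["qk"] == 0:
            woken.append(s["name"])
    return woken


stations = [
    {"name": "Mult1", "vj": None, "vk": 4.0, "qj": 2, "qk": 0},   # waits on tag 2
    {"name": "Add1", "vj": None, "vk": None, "qj": 1, "qk": 2},   # waits on tags 1 and 2
]
print(broadcast(stations, tag=2, value=7.0))   # ['Mult1']
print(broadcast(stations, tag=1, value=3.0))   # ['Add1']
```

Because the bus carries the source tag rather than a destination, one broadcast can feed any number of consumers in the same cycle.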
13 Code Example
LD F6,34(R2)
LD F2,45(R3)
MULTD F0,F2,F4
SUBD F8,F6,F2
DIVD F10,F0,F6
ADDD F6,F8,F2
[Figure: dependence graph over LD1, LD2, MULTD, SUBD, DIVD, ADDD]
Operation latencies: load/store 2 cycles, add/sub 2 cycles, multiply 10 cycles, divide 40 cycles
14 Tomasulo Example Cycle 0
15 Tomasulo Example Cycle 1
16 Tomasulo Example Cycle 2
17 Tomasulo Example Cycle 3. Note: register names are removed (“renamed”) in the reservation stations. Load1 completing; what is waiting for Load1?
18 Tomasulo Example Cycle 4. Load2 completing; what is waiting for it?
19 Tomasulo Example Cycle 5
20 Tomasulo Example Cycle 6
21 Tomasulo Example Cycle 7. Add1 completing; what is waiting for it?
22 Tomasulo Example Cycle 8
23 Tomasulo Example Cycle 9
24 Tomasulo Example Cycle 10. Add2 completing; what is waiting for it?
25 Tomasulo Example Cycle 11. Write result of ADDD here vs. scoreboard?
26 Tomasulo Example Cycle 12
27 Tomasulo Example Cycle 13
28 Tomasulo Example Cycle 14
29 Tomasulo Example Cycle 15. Mult1 completing; what is waiting for it?
30 Tomasulo Example Cycle 16. Note: just waiting for the divide.
31 Tomasulo Example Cycle 55
32 Tomasulo Example Cycle 56. Mult2 completing; what is waiting for it?
33 Tomasulo Example Cycle 57. Again: in-order issue, out-of-order execution and completion.
34 Scheduling Results
Inst    Disp  Schd  EXE    MEM  CDB
L.D     1     2     -      3    4
L.D     2     3     -      4    5
MULT    3     4-5   6-15   -    16
SUB.D   4     5     6-7    -    8
DIV.D   5     5-11  17-56  -    57
ADD.D   6     7-8   9-10   -    11
It’s not perfect: one cycle is lost to CDB broadcasting.
35 Correctness of Tomasulo. Each instruction receives the same operands as in the sequential execution. Or: each instruction receives its operands from its parents in program order (data flow). How many possible ways of register data flow are there in Tomasulo?
36 Correctness of Tomasulo. Any register or memory word receives the value of the last write, as in the sequential execution: no stale writes. How can Tomasulo prevent stale writes?
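One way to see how stale writes are avoided is to model the register result status check at writeback. The sketch below is an illustration under assumptions (the function name `writeback` and the dictionary-based register file are not from the lecture): a broadcast result updates a register only if that register still names the broadcasting tag as its latest pending writer.

```python
def writeback(regs, status, tag, value):
    """Apply a CDB broadcast (tag, value) to the register file.

    regs:   register name -> value
    status: register name -> tag of the latest pending writer
    """
    for reg in [r for r, pending in status.items() if pending == tag]:
        regs[reg] = value
        del status[reg]     # no pending write remains for this register


regs = {"F6": 0.0}
status = {}

status["F6"] = 1            # older write to F6 issues with tag 1
status["F6"] = 2            # younger write to F6 issues with tag 2, replacing the entry

writeback(regs, status, tag=2, value=20.0)   # younger write finishes first
writeback(regs, status, tag=1, value=10.0)   # older write finishes late: no match, ignored
print(regs["F6"])           # 20.0
```

The late result with tag 1 is still broadcast, so any reservation station waiting on that tag would receive it; only the register file ignores it, which is what keeps the final register value equal to the last write in program order.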
37 Register Renaming and Correctness. 1. The same set of instructions writes to user-visible registers and memory; 2. Each instruction receives the same operands as in the sequential execution; and 3. Any register or memory word receives the value of the last write, as in the sequential execution.
38 Review Dependences. How are dependences enforced or removed in the Tomasulo algorithm? Data dependence (RAW): must be enforced, for both timing and data flow. Antidependence (WAR): a hazard to data flow; can be removed. Output dependence (WAW): a hazard of stale writes; can be removed.
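As a quick check on the three definitions, the sketch below classifies the register dependences between an earlier and a later instruction. The function name `dependences` and the (dest, src1, src2) tuple layout are assumptions for illustration; the register names are taken from the code example earlier in the lecture.

```python
def dependences(first, second):
    """Classify register dependences from `first` to the later `second`.

    Each instruction is a (dest, src1, src2) tuple of register names.
    """
    d1, *srcs1 = first
    d2, *srcs2 = second
    found = set()
    if d1 in srcs2:
        found.add("RAW")    # second reads what first writes: must be enforced
    if d2 in srcs1:
        found.add("WAR")    # second overwrites what first reads: removable
    if d1 == d2:
        found.add("WAW")    # both write the same register: removable
    return found


load_f2 = ("F2", "R3", None)       # LD F2,45(R3)
load_f6 = ("F6", "R2", None)       # LD F6,34(R2)
mult = ("F0", "F2", "F4")          # multiply into F0 from F2, F4
div = ("F10", "F0", "F6")          # DIVD F10,F0,F6
add = ("F6", "F8", "F2")           # add into F6 from F8, F2

print(dependences(load_f2, mult))  # {'RAW'}
print(dependences(div, add))       # {'WAR'}
print(dependences(load_f6, add))   # {'WAW'}
```

Only the RAW case carries data; the WAR and WAW cases come from reusing the name F6, which is why renaming can remove them.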
39 Renaming in Tomasulo. Tomasulo Example Cycle 4. Can you see the renamed instructions?
40 Modern Renaming: Use of Tag. In Tomasulo, the RS or load/store buffer index is used as the tag. Renaming assigns every new register value a unique tag. RSs store tags to preserve dependences. The CDB broadcasts the tag with the data, for data passing and wakeup. Why is the tag chosen this way? What does the tag really represent? What else can be used as a tag?
41 Tomasulo Summary. Reservation stations: increase the effective number of registers; distribute the scheduling logic. Register renaming: avoids WAR and WAW dependences. Tag + data broadcasting wakes up child instructions. Pros: can be effectively combined with speculative execution. Cons: CDB broadcasting adds a one-cycle delay (addressed in modern instruction scheduling).