Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Lecture 6 Tomasulo Algorithm CprE 581 Computer Systems Architecture, Fall 2009 Zhao Zhang Reading:Textbook 2.4, 2.5.

Similar presentations


Presentation on theme: "1 Lecture 6 Tomasulo Algorithm CprE 581 Computer Systems Architecture, Fall 2009 Zhao Zhang Reading:Textbook 2.4, 2.5."— Presentation transcript:

1 1 Lecture 6 Tomasulo Algorithm CprE 581 Computer Systems Architecture, Fall 2009 Zhao Zhang Reading:Textbook 2.4, 2.5

2 2 Tomasulo Algorithm History: 1966: scoreboarding in CDC6600, implementing limited dynamic scheduling Three years later: Tomasulo in IBM 360/91, introducing register renaming and reservation station Now appearing in todays Dec Alpha, SGI MIPS, SUN UltraSparc, Intel Pentium, IBM PowerPC, and others In today’s perspective: The implementation belongs to reservation- based dynamic scheduling without data forwarding (obsolete) The merit is the use of tag to identify instructions in the fly: register renaming, reservation station, tag broadcast It is an “algorithm” first Adapted from UCB CS252 S98, Copyright 1998 USB

3 Assumptions Assumptions for time being: No branches Only register dependence is a concern Only floating-point instructions are executed out-of-order 3

4 4 Register Renaming and Correctness 1. The same set of instructions write to user-visible register and memory; 2. Each instruction receives the same operands as in the sequential execution; and 3. Any register or memory word receives the value of the last write as in the sequential execution

5 Tomasulo Organization 5

6 6 Reservation Station Components Op—Operation to perform in the unit (e.g., + or –) Vj, Vk—Value of Source operands Store buffers has V field, result to be stored Qj, Qk—Reservation stations producing source registers (value to be written) Note: No ready flags as in Scoreboard; Qj,Qk=0 => ready Store buffers only have Qi for RS producing result Busy—Indicates reservation station or FU is busy How are dependences represented in reservation station? Adapted from UCB CS252 S98B

7 7 Register Result Status It’s a table to record the RS index of the latest pending instruction, if there is one, that writes to each register It’s kind of register renaming: rename register to RS if there is a pending write

8 8 Three Stages of Tomasulo Algorithm 1.Issue —get instruction from FP Op Queue If reservation station free (no structural hazard), control issues instr & sends operands (renames registers). 2.Execution —operate on operands (EX) When both operands ready then execute; if not ready, watch Common Data Bus for result 3.Write result —finish execution (WB) Write on Common Data Bus to all awaiting units; mark reservation station available Issue: build dependence for new inst Writeback: Wakeup dependent instructions Adapted from UCB CS252 S98

9 9 Issue Stage and Renaming Source renaming: rename the two source registers if there are pending writes Assigns a free RS Dest. Renaming: Updates register result status with the allocated RS Renaming removes name dependences

10 Issue Stage and Renaming Example: ADD R31, R1, R2 SUB R4, R3, R31 ADD R31, R5, R6 SUB R8, R7, R31 10

11 11 Execute Stage Only “ready” instructions can join the competition There is a select logic to select instructions for FU execution Some policy may be used, e.g. age based Non-ready instructions can be “waken up” during writeback of its parent inst

12 12 Writeback and Common Data Bus Normal data bus: data + destination (“go to” bus) Common data bus: data + source (“come from” bus) 64 bits of data + 4 bits of source index (tag) Does the broadcast to every instruction in the fly Child instructions do tag matching and update their ready bits and value fields (if the tag matches theirs) Adapted from UCB CS252 S98, Copyright 1998 USB

13 13 Code Example LD F6,34(R2) LD F2,45(R3) MULTI F0,F2,F4 SUBD F8,F6,F2 DIVD F10,F0,F6 ADD F6,F8,F2 LD1LD2 MULTISUBD DIVDADD Operation latencies: load/store 2 cycles, Add/sub 2 cycles, Mult 10 cycles, divide 40 cycle

14 14 Tomasulo Example Cycle 0 Adapted from UCB CS252 S98, Copyright 1998 USB

15 15 Tomasulo Example Cycle 1 Yes Adapted from UCB CS252 S98, Copyright 1998 USB

16 16 Tomasulo Example Cycle 2 Adapted from UCB CS252 S98, Copyright 1998 USB

17 17 Tomasulo Example Cycle 3 Note: registers names are removed (“renamed”) in Reservation Stations Load1 completing; what is waiting for Load1? Adapted from UCB CS252 S98, Copyright 1998 USB

18 18 Tomasulo Example Cycle 4 Load2 completing; what is waiting for it? Adapted from UCB CS252 S98, Copyright 1998 USB

19 19 Tomasulo Example Cycle 5 Adapted from UCB CS252 S98, Copyright 1998 USB

20 20 Tomasulo Example Cycle 6 Adapted from UCB CS252 S98, Copyright 1998 USB

21 21 Tomasulo Example Cycle 7 Add1 completing; what is waiting for it? Adapted from UCB CS252 S98, Copyright 1998 USB

22 22 Tomasulo Example Cycle 8 Adapted from UCB CS252 S98, Copyright 1998 USB

23 23 Tomasulo Example Cycle 9 Adapted from UCB CS252 S98, Copyright 1998 USB

24 24 Tomasulo Example Cycle 10 Add2 completing; what is waiting for it? Adapted from UCB CS252 S98, Copyright 1998 USB

25 25 Tomasulo Example Cycle 11 Write result of ADDD here vs. scoreboard? Adapted from UCB CS252 S98, Copyright 1998 USB

26 26 Tomasulo Example Cycle 12 Adapted from UCB CS252 S98, Copyright 1998 USB

27 27 Tomasulo Example Cycle 13 Adapted from UCB CS252 S98, Copyright 1998 USB

28 28 Tomasulo Example Cycle 14 Adapted from UCB CS252 S98, Copyright 1998 USB

29 29 Tomasulo Example Cycle 15 Mult1 completing; what is waiting for it? Adapted from UCB CS252 S98, Copyright 1998 USB

30 30 Tomasulo Example Cycle 16 Note: Just waiting for divide Adapted from UCB CS252 S98, Copyright 1998 USB

31 31 Tomasulo Example Cycle 55 Adapted from UCB CS252 S98, Copyright 1998 USB

32 32 Tomasulo Example Cycle 56 Mult 2 completing; what is waiting for it? Adapted from UCB CS252 S98, Copyright 1998 USB

33 33 Tomasulo Example Cycle 57 Again, in-oder issue, out-of-order execution, completion Adapted from UCB CS252 S98, Copyright 1998 USB

34 34 Scheduling Results Inst DispSchdEXEMEMCDB L.D 12-34 23-45 MULT 34-56-15-16 SUB.D 456-7-8 DIV.D 55-11 17-56 -57 ADD.D 67-89-10-11 It’s not perfect: lose one cycle for CDB broadcasting

35 35 Correctness of Tomasulo Each instruction receives the same operands as in the sequential execution Or, each instruction receives its operand from its parent in program order (data flow) How many possible ways of register data flow in Tomasulo?

36 36 Correctness of Tomasulo Any register or memory word receives the value of the last write as in the sequential execution  No stale writes How can Tomasulo prevent stale writes?

37 37 Register Renaming and Correctness 1. The same set of instructions write to user-visible register and memory; 2. Each instruction receives the same operands as in the sequential execution; and 3. Any register or memory word receives the value of the last write as in the sequential execution

38 38 Review Dependences How are dependences are enforced or removed in Tomasulo Algorithm? Data dependences (RAW) Must be enforced: timing and data flow Antidependence (WAR) Hazard to data flow, can be removed Output Dependence (WAW) Hazard of stale write, can be removed

39 39 Renaming in Tomasulo Tomasulo Example Cycle 4 Adapted from UCB CS252 S98, Copyright 1998 USB Can you see renamed instructions?

40 40 Modern Renaming: Use of Tag In Tomasulo, RS and load/store buffer index is used as tag. Renaming assign every new reg. value a unique tag RS stores tags to preserve dependences CDB broadcasts tag with data for data passing and wakeup Why tag is so chosen? What does tag really represent? What else can be used as tag?

41 41 Tomasulo Summary Reservations stations: Increases effective register number Distributes scheduling logic Register renaming: Avoids WAR and WAW dependence Tag + Data broadcasting for waking up child instructions Pros: can be effectively combined with speculative execution Cons: CDB broadcasting adds one-cycle delay (addressed in modern instruction scheduling) Adapted from UCB CS252 S98, Copyright 1998 USB


Download ppt "1 Lecture 6 Tomasulo Algorithm CprE 581 Computer Systems Architecture, Fall 2009 Zhao Zhang Reading:Textbook 2.4, 2.5."

Similar presentations


Ads by Google