Presentation is loading. Please wait.

Presentation is loading. Please wait.

CSE431 L13 SS Execute & Commit.1Irwin, PSU, 2005 CSE 431 Computer Architecture Fall 2005 Lecture 13: SS Backend (Execute, Writeback & Commit) Mary Jane.

Similar presentations


Presentation on theme: "CSE431 L13 SS Execute & Commit.1Irwin, PSU, 2005 CSE 431 Computer Architecture Fall 2005 Lecture 13: SS Backend (Execute, Writeback & Commit) Mary Jane."— Presentation transcript:

1 CSE431 L13 SS Execute & Commit.1Irwin, PSU, 2005 CSE 431 Computer Architecture Fall 2005 Lecture 13: SS Backend (Execute, Writeback & Commit) Mary Jane Irwin ( www.cse.psu.edu/~mji )www.cse.psu.edu/~mji www.cse.psu.edu/~cg431 [Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005 and Superscalar Microprocessor Design, Johnson, © 1992]

2 CSE431 L13 SS Execute & Commit.2Irwin, PSU, 2005 SS Pipeline Fetch multiple instructions Decode instructions 1 st RUU function (RUU allocation, src operand copying (or Tag field set), dst Tag field set) 3 nd RUU function (when both src operands Ready and FU free, schedule Result Bus and issue instr for execution) 2 nd RUU function (copy Result Bus data to matching src’s and to RUU dst entry) 4 th RUU function (for instr at RUU_Head (if executed), write dst Contents to RegFile) FETCHDECODE & DISPATCH ISSUE & EXECUTE WRITE BACK RESULT COMMIT In Order Out of Order In Order

3 CSE431 L13 SS Execute & Commit.3Irwin, PSU, 2005 Result Buses Utilization  Our SS model has only one Result Bus to carry results generated by the FU’s to the RUU and LSQ l Even at the high levels of performance, the utilization of the Result Bus is only about 70% (i.e., the fraction of capacity actually used)  If a FU requests for the Result Bus is not granted, instruction issue to that FU is stalled until the bus request can be granted (i.e., the FU remains “busy”) Avg # of results waiting for the Result Bus # of Result Buses From Johnson, 1992

4 CSE431 L13 SS Execute & Commit.4Irwin, PSU, 2005 Arbitrating For Result Buses  And since there are usually fewer Result Buses than FUs, the FUs must continue to arbitrate for use of the existing Result Buses l The arbiter not only decides which FU is granted use of the Result Buses, but also which of the two buses is to be used -Prioritizing old requests over new helps prevent starvation  Reducing the impact of bus contention by adding a second Result Bus improves performance (by almost 19%)  But adding a third Results Buses yields only a very small improvement in performance (less than 3%) From Johnson, 1992

5 CSE431 L13 SS Execute & Commit.5Irwin, PSU, 2005 Result Forwarding  Result forwarding supplies operands directly to the waiting instr’s in the RUU to resolve true dependencies that could not be resolved during decode l The cost of forwarding is the comparison logic in the RUU to compare the Result Bus Tag to the source operand Tags l Need a set of comparators for each Result Bus Fraction of Results # of RUU Entries Receiving Result as Input Operand  About 2/3 rd of all results are forwarded to one waiting operand, and about 1/6 th are forwarded to more than one From Johnson, 1992

6 CSE431 L13 SS Execute & Commit.6Irwin, PSU, 2005  Add a slide on impact of different number of FUs and their type?  Add a slide on impact of being able to commit more than one instruction at a time? (If any, probably small?)

7 CSE431 L13 SS Execute & Commit.7Irwin, PSU, 2005 Performance Advantages  Hardware complexity arises from four major hardware features l Out-of-order issue l Register renaming l Branch prediction l 4-way instruction fetch and decode OOIRegister Renaming Branch Prediction 4-way Fetch & Decode 52%36%30%18% From Johnson, 1992

8 CSE431 L13 SS Execute & Commit.8Irwin, PSU, 2005 ILP in a Perfect OOI-OOC Processor  The perfect processor has l An infinite number of rename registers that eliminates all storage hazards (i.e., write-before-write and write-before-read) l No (fetch, decode, dispatch, issue, FU, buses, ports) limit on the number of instr’s that can begin execution simultaneously as long as read-before-write true data hazards are not present l Perfect branch and jump (including jump register) prediction l Loads can be moved before stores (as long as the addresses are not identical) with memory address analysis l All FU’s have a 1 cycle latency l Perfect caches with 1 cycle latency From H&P, 2003

9 CSE431 L13 SS Execute & Commit.9Irwin, PSU, 2005 Effect of Instruction Window Size on ILP  Instruction window – the set of instructions that are examined simultaneously for execution From H&P, 2003

10 CSE431 L13 SS Execute & Commit.10Irwin, PSU, 2005 Effect of Realistic Branch Prediction on ILP  On a processor with an instruction window size of 2K and maximum 64-way issue capability From H&P, 2003

11 CSE431 L13 SS Execute & Commit.11Irwin, PSU, 2005 Effect of Finite Rename Registers  On a processor with an instruction window size of 2K, maximum 64-way issue capability, and a tournament branch predictor with 8K entries From H&P, 2003

12 CSE431 L13 SS Execute & Commit.12Irwin, PSU, 2005 A SS Example  Intel Pentium 4 (IA-32 ISA) l Decodes the IA-32 instructions into microoperations l Does register renaming with a RUU-like structure l Has a 20 stage pipeline T$ access (Bpredict) RUU allocation Instr dispatch RegFile access ExecutionCommit # cycles545213 RUU queue FU queues µop queue l 7 FUs: 2 integer ALUs, 1 FP ALU, 1FP move, load, store, complex l Up to 126 instructions in flight, including 48 loads and 24 stores l 4K entry branch predictor

13 CSE431 L13 SS Execute & Commit.13Irwin, PSU, 2005 Another SS Example  IBM Power4, 970 (64-bit data width) l Fetch 8 instr. per cycle, dispatch up to five, issue (execute) up to 8, commit up to 5 – 5-way superscalar – more than 200 instructions “in flight” at any given time (5x20 in the “dispatch groups”, the rest in fetch, decode, and commit) l 7 FUs: 2 integer ALUs, 2 FP ALUs, 2 load/store, vector engine l Varying pipeline depth – !9! fetch and decode stages + 16 stages for integer, 17 stages for load/store, 21 stages for floating point, and up to 25 stages for vector

14 CSE431 L13 SS Execute & Commit.14Irwin, PSU, 2005 Next Lecture and Reminders  Next lecture l Exam review  Next lecture after exam -Reading assignment – PH 6.9  Reminders l HW3, Part 2 (the SimpleScalar experiments) due Friday, October 21 th (email solution to the TA, rdas@cse.psu.edu) by 5:00pmrdas@cse.psu.edu l Evening midterm exam scheduled (Tuesday night) -Tuesday, October 18 th, 20:15 to 22:15, Location 113 IST -One student is taking the conflict exam – you know who you are and when it is scheduled l Final exam (tentatively) schedule -Tuesday, December 13 th, 2:30 to 4:20, Location TBD


Download ppt "CSE431 L13 SS Execute & Commit.1Irwin, PSU, 2005 CSE 431 Computer Architecture Fall 2005 Lecture 13: SS Backend (Execute, Writeback & Commit) Mary Jane."

Similar presentations


Ads by Google