
1 Chap 6.1 Computer Architecture Chapter 6 Enhancing Performance with Pipelining

2 Chap 6.2 Contents
 Pipelining Overview
 Pipelined Datapath
 Pipelined Control
 Hazards: Structural Hazards, Data Hazards, Branch Hazards
 Dynamic Scheduling
 Examples of Pipelining
 Summary

3 Chap 6.3 The Big Picture: Where are We Now?
 The Five Classic Components of a Computer: Processor (Control, Datapath), Memory, Input, Output
 Back to the datapath again to try to speed it up some more
-Single cycle datapath – great CPI but terrible cycle time (long critical path)
-Multiple cycle datapath – good cycle time but poor CPI
-Pipelining – get the best of both!

4 Chap 6.4 Sequential Laundry
 Sequential laundry takes 8 hours for 4 loads (tasks A, B, C, D run one after another on a 6 PM to 2 AM timeline)
 If they learned pipelining, how long would laundry take?

5 Chap 6.5 Pipelined Laundry: Start work ASAP
 Pipelined laundry takes only 3.5 hours for 4 loads! (tasks A, B, C, D overlapped on the same 6 PM timeline)
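
The arithmetic behind these two figures, as a minimal sketch; the four 30-minute stages per load (wash, dry, fold, put away) are my assumption, chosen because they reproduce the slide's 8-hour and 3.5-hour totals.

```python
# Laundry pipelining arithmetic: a minimal sketch.
# Assumption: each load passes through four 30-minute stages
# (wash, dry, fold, put away), which reproduces the slide's numbers.

STAGE_MIN = 30      # minutes per stage
STAGES = 4          # wash, dry, fold, put away
LOADS = 4           # tasks A, B, C, D

sequential = LOADS * STAGES * STAGE_MIN          # no overlap between loads
pipelined = (LOADS + STAGES - 1) * STAGE_MIN     # fill the pipe, then one stage per extra load

print(f"sequential: {sequential / 60:.1f} hours")   # 8.0 hours
print(f"pipelined:  {pipelined / 60:.1f} hours")    # 3.5 hours
```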

6 Chap 6.6 One way to break up the datapath operations
 Ifetch: Instruction Fetch – fetch the instruction from the Instruction Memory
 Reg/Dec: Register Fetch and Instruction Decode
 Exec: Calculate the memory address
 Mem: Read the data from the Data Memory
 Wr: Write the data back to the register file
 A load then occupies the datapath for five cycles: Ifetch, Reg/Dec, Exec, Mem, Wr (Cycle 1 through Cycle 5)

7 Chap 6.7 Apply pipelining
 [Pipeline diagram: Inst 0 through Inst 4 in instruction order vs. time in clock cycles; each instruction flows through Im, Reg, ALU, Dm, Reg, one stage per cycle, offset by one cycle per instruction]

8 Chap 6.8 Conventional Pipelined Execution Representation
 [Diagram: program flow vs. time; each instruction passes through IFetch, Reg, Exec, Mem, WB, with successive instructions starting one cycle later]
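
Since the diagram itself does not survive in this transcript, here is a small sketch that prints the same staircase picture; only the stage names come from the slide, the layout code is illustrative.

```python
# A minimal sketch that prints the conventional pipeline diagram:
# each instruction starts one cycle after the previous one and
# occupies one stage per clock cycle.

STAGES = ["IFetch", "Reg", "Exec", "Mem", "WB"]

def pipeline_diagram(n_instructions: int) -> None:
    total_cycles = n_instructions + len(STAGES) - 1
    print("        " + "".join(f"C{c + 1:<7}" for c in range(total_cycles)))
    for i in range(n_instructions):
        row = ["        "] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = f"{stage:<8}"
        print(f"inst {i:<3}" + "".join(row))

pipeline_diagram(4)   # 4 instructions finish in 8 cycles instead of 4 x 5 = 20
```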

9 Chap 6.9 Pipelining
 Improve performance by increasing instruction throughput
 Ideal speedup is the number of stages in the pipeline. Do we achieve this?
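
A quick way to test the "ideal speedup equals the number of stages" claim is the following sketch, assuming every stage takes one cycle and there are no hazards.

```python
# Ideal pipeline speedup, assuming all stages take one cycle and no hazards:
# unpipelined time = n * s cycles; pipelined time = n + s - 1 cycles.

def ideal_speedup(n_instructions: int, n_stages: int = 5) -> float:
    unpipelined = n_instructions * n_stages
    pipelined = n_instructions + n_stages - 1
    return unpipelined / pipelined

for n in (4, 100, 1_000_000):
    print(n, round(ideal_speedup(n), 2))
# Speedup approaches the stage count (5) only when the fill/drain
# overhead is negligible relative to the instruction count.
```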

10 Chap 6.10 Single Cycle, Multiple Cycle, vs. Pipeline
 [Timing diagrams for a Load, a Store, and an R-type over clock cycles 1–10:]
-Single Cycle Implementation: Load and Store each get one long clock cycle; the shorter instruction leaves part of its cycle as Waste
-Multiple Cycle Implementation: Load takes Ifetch, Reg, Exec, Mem, Wr in successive short cycles, then Store takes Ifetch, Reg, Exec, Mem
-Pipeline Implementation: Load, Store, and R-type overlap, each one stage (Ifetch, Reg, Exec, Mem, Wr) behind the previous instruction

11 Chap 6.11 How much improvement can pipelining give us?
 Suppose we execute 100 instructions
 Single Cycle Machine: 45 ns/cycle x 1 CPI x 100 inst = 4500 ns
 Multicycle Machine: 10 ns/cycle x 4.6 CPI (due to inst mix) x 100 inst = 4600 ns
 Ideal pipelined machine: 10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns
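
A small sketch that reproduces the three totals; all constants (45 ns, 10 ns, 4.6 CPI, 4 drain cycles) are taken from the slide.

```python
# Reproducing the slide's comparison for 100 instructions.

INSTRUCTIONS = 100

single_cycle = 45 * 1 * INSTRUCTIONS          # 45 ns/cycle, CPI = 1  -> 4500 ns
multicycle   = 10 * 4.6 * INSTRUCTIONS        # 10 ns/cycle, CPI = 4.6 -> 4600 ns
pipelined    = 10 * (1 * INSTRUCTIONS + 4)    # 10 ns/cycle, CPI = 1, 4 drain cycles -> 1040 ns

print(single_cycle, multicycle, pipelined)
print("speedup over single cycle:", round(single_cycle / pipelined, 2))   # about 4.3x
```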

12 Chap 6.12 Pipelining
 What makes it easy?
-all instructions are the same length
-just a few instruction formats
-memory operands appear only in loads and stores
 What makes it hard?
-structural hazards: suppose we had only one memory
-control hazards: need to worry about branch instructions
-data hazards: an instruction depends on a previous instruction
 We’ll build a simple pipeline and look at these issues
 We’ll talk about modern processors and what really makes it hard:
-exception handling
-trying to improve performance with out-of-order execution, etc.

13 Chap 6.13 Basic Idea
 What do we need to add to actually split the datapath into stages?

14 Chap 6.14 Pipelined Datapath
 Can you find a problem even if there are no dependencies?
 What instructions can we execute to manifest the problem?

15 Chap 6.15 Corrected Datapath

16 Chap 6.16 Graphically Representing Pipelines
 Can help with answering questions like:
-how many cycles does it take to execute this code?
-what is the ALU doing during cycle 4?
 Use this representation to help understand datapaths

17 Chap 6.17 Pipeline Control

18 Chap 6.18 Pipeline Control
 We have 5 stages. What needs to be controlled in each stage?
-Instruction Fetch and PC Increment
-Instruction Decode / Register Fetch
-Execution
-Memory Stage
-Write Back
 How would control be handled in an automobile plant?
-a fancy control center telling everyone what to do?
-should we use a finite state machine?

19 Chap 6.19 Pipeline Control
 Pass control signals along just like the data
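
A minimal sketch of that idea in Python rather than hardware: control is computed once at decode, then bundled and handed down the pipeline registers, with each stage peeling off the signals it needs. The signal names are the usual MIPS control signals and are an assumption, not read off this slide.

```python
# "Pass the control signals along just like the data": control is generated
# in the decode stage, carried through ID/EX -> EX/MEM -> MEM/WB, and each
# stage consumes its own subset. Signal names are the usual MIPS ones.

def decode_control(opcode: str) -> dict:
    if opcode == "lw":
        return {"RegDst": 0, "ALUOp": "add", "MemRead": 1,
                "MemWrite": 0, "MemtoReg": 1, "RegWrite": 1}
    if opcode == "sw":
        return {"RegDst": None, "ALUOp": "add", "MemRead": 0,
                "MemWrite": 1, "MemtoReg": None, "RegWrite": 0}
    # R-type default
    return {"RegDst": 1, "ALUOp": "func", "MemRead": 0,
            "MemWrite": 0, "MemtoReg": 0, "RegWrite": 1}

ctrl = decode_control("lw")

# EX uses its signals now; the rest ride along in the pipeline registers.
ex_signals  = {k: ctrl[k] for k in ("RegDst", "ALUOp")}
mem_signals = {k: ctrl[k] for k in ("MemRead", "MemWrite")}
wb_signals  = {k: ctrl[k] for k in ("MemtoReg", "RegWrite")}
print(ex_signals, mem_signals, wb_signals)
```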

20 Chap 6.20 Datapath with Control

21 Chap 6.21 Can pipelining get us into trouble?
 Yes: Pipeline Hazards
-structural hazards: attempt to use the same resource in two different ways at the same time
  E.g., a combined washer/dryer would be a structural hazard, or the folder is busy doing something else (watching TV)
-data hazards: attempt to use an item before it is ready
  E.g., one sock of a pair is in the dryer and one is in the washer; can’t fold until the sock gets from the washer through the dryer
  An instruction depends on the result of a prior instruction still in the pipeline
-control hazards: attempt to make a decision before the condition is evaluated
  E.g., washing football uniforms and need to get the proper detergent level; need to see the result after the dryer before starting the next load
  Branch instructions
 Can always resolve hazards by waiting
-pipeline control must detect the hazard
-and take action (or delay action) to resolve it

22 Chap 6.22 Single Memory is a Structural Hazard
 [Pipeline diagram: a Load followed by Instr 1–4; with a single memory, the Load's data access and a later instruction's fetch need the memory in the same cycle]
 Detection is easy in this case! (right half highlight means read, left half write)

23 Chap 6.23 Dependencies
 Problem with starting the next instruction before the first is finished
-dependencies that go backward in time are data hazards

24 Chap 6.24 Software Solution
 Have the compiler guarantee no hazards
 Where do we insert the nops?
sub $2, $1, $3
and $12, $2, $5
or  $13, $6, $2
add $14, $2, $2
sw  $15, 100($2)
 Problem: this really slows us down!
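
A hedged sketch of what such a compiler pass might do, assuming a register's new value becomes visible to instructions issued two or more slots later (register file written in the first half of a cycle, read in the second half); the instruction representation and helper are illustrative, not the book's.

```python
# A minimal sketch of the compiler-side fix: insert nops until no instruction
# reads a register written by one of the `gap` immediately preceding slots.

def insert_nops(program, gap=2):
    """program: list of (dest_reg, [source_regs]) tuples in issue order."""
    scheduled = []                    # tuples, with None marking a nop
    for dest, sources in program:
        # Pad with nops while any of the last `gap` slots wrote a register we read.
        while any(s is not None and s[0] in sources for s in scheduled[-gap:]):
            scheduled.append(None)    # nop
        scheduled.append((dest, sources))
    return scheduled

code = [("$2",  ["$1", "$3"]),    # sub $2, $1, $3
        ("$12", ["$2", "$5"]),    # and $12, $2, $5
        ("$13", ["$6", "$2"]),    # or  $13, $6, $2
        ("$14", ["$2", "$2"]),    # add $14, $2, $2
        (None,  ["$15", "$2"])]   # sw  $15, 100($2)

for slot in insert_nops(code):
    print("nop" if slot is None else slot)
# Under this assumption, two nops land between the sub and the and.
```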

25 Chap 6.25 Forwarding
 Use temporary results, don’t wait for them to be written
-register file forwarding to handle read/write to the same register
-ALU forwarding
 What if this $2 was $13?
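
A minimal sketch of the forwarding decision for one ALU operand, following the usual textbook conditions; the pipeline-register field names are illustrative.

```python
# Forwarding unit (one ALU input): prefer the newest value still in flight.

def forward_a(id_ex_rs, ex_mem_rd, ex_mem_regwrite, mem_wb_rd, mem_wb_regwrite):
    """Return where the first ALU operand should come from."""
    # EX hazard: the instruction just ahead will write the register we read.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_rs:
        return "EX/MEM"      # forward the previous instruction's ALU result
    # MEM hazard: the instruction two ahead will write it (no newer value exists).
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == id_ex_rs:
        return "MEM/WB"      # forward the value being written back
    return "ID/EX"           # no hazard: use the register file value

# sub $2,$1,$3 followed immediately by and $12,$2,$5:
print(forward_a(id_ex_rs=2, ex_mem_rd=2, ex_mem_regwrite=True,
                mem_wb_rd=0, mem_wb_regwrite=False))   # -> "EX/MEM"
```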

26 Chap 6.26 Forwarding

27 Chap 6.27 Can't always forward
 Load word can still cause a hazard: an instruction tries to read a register right after a load instruction that writes the same register.
 Thus, we need a hazard detection unit to stall the instruction that follows the load

28 Chap 6.28 Stalling
 We can stall the pipeline by keeping an instruction in the same stage

29 Chap 6.29 Hazard Detection Unit
 Stall by letting an instruction that won’t write anything go forward
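
A minimal sketch of the load-use check and the bubble it inserts, again following the usual textbook condition; the field names are illustrative.

```python
# Hazard detection unit: stall when the instruction in EX is a load whose
# destination register is read by the instruction currently being decoded.

def must_stall(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    return id_ex_memread and id_ex_rt in (if_id_rs, if_id_rt)

# On a stall: hold the PC and the IF/ID register (re-decode the same instruction)
# and turn the instruction in ID into a bubble by zeroing its control signals,
# i.e. an instruction that won't write anything goes forward.
if must_stall(id_ex_memread=True, id_ex_rt=2, if_id_rs=2, if_id_rt=5):
    print("stall: insert bubble, keep PC and IF/ID unchanged for one cycle")
```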

30 Chap 6.30 Control (Branch) Hazards
 When we decide to branch, other instructions are already in the pipeline!
 We are predicting branch not taken
-need to add hardware for flushing instructions if we are wrong

31 Chap 6.31 Flushing Instructions

32 Chap 6.32 Example: Control Hazard Solutions
 Stall: wait until the decision is clear
-It's possible to move the decision up to the 2nd stage by adding hardware to compare the registers as they are read
 Impact: 2 clock cycles per branch instruction => slow
 [Pipeline diagram: Add, Beq, Load; the Load after the Beq is held back until the branch outcome is known]

33 Chap 6.33 Example: Control Hazard Solutions
 Predict: guess one direction, then back up if wrong
-Predict not taken
 Impact: 1 clock cycle per branch instruction if right, 2 if wrong (right ~50% of the time)
 More dynamic scheme: keep a history of each branch (right ~90% of the time)
 [Pipeline diagram: Add, Beq, Load; the Load proceeds immediately and is flushed only if the branch is taken]
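
A minimal sketch of the per-branch history idea as a 1-bit predictor; the table size and PC indexing are illustrative assumptions.

```python
# 1-bit dynamic branch prediction: predict whatever the branch did last time,
# indexed by (part of) its PC.

class OneBitPredictor:
    def __init__(self, entries: int = 16):
        self.table = [False] * entries          # False = predict not taken

    def predict(self, pc: int) -> bool:
        return self.table[pc % len(self.table)]

    def update(self, pc: int, taken: bool) -> None:
        self.table[pc % len(self.table)] = taken

# A loop-closing branch taken 9 times then not taken: the 1-bit scheme
# mispredicts only the first and the last iteration (~90% right).
bp, mispredicts = OneBitPredictor(), 0
for taken in [True] * 9 + [False]:
    if bp.predict(pc=0x40) != taken:
        mispredicts += 1
    bp.update(pc=0x40, taken=taken)
print(mispredicts, "mispredictions out of 10")
```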

34 Chap 6.34 Example: Control Hazard Solutions
 Redefine branch behavior (the branch takes effect after the next instruction): "delayed branch"
 Impact: 1 clock cycle per branch instruction if we can find an instruction to put in the "slot" (~50% of the time)
 [Pipeline diagram: Add, Beq, Misc, Load; the Misc instruction fills the branch delay slot]

35 Chap 6.35 Dynamic Scheduling
 The hardware performs the scheduling
-hardware tries to find instructions to execute
-out-of-order execution is possible
-speculative execution and dynamic branch prediction
 All modern processors are very complicated
-DEC Alpha 21264: 9-stage pipeline, 6-instruction issue
-PowerPC and Pentium: branch history table
-Compiler technology is still important

36 Chap 6.36 Summary
 Pipelining is a fundamental concept
-multiple steps using distinct resources
 Utilize the capabilities of the Datapath by pipelined instruction processing
-start the next instruction while working on the current one
-limited by the length of the longest stage (plus fill/flush)
-detect and resolve hazards

