
1 Chap 6.1 Computer Architecture Chapter 6 Enhancing Performance with Pipelining

2 Chap 6.2 Contents
 Pipelining Overview
 Pipelined Datapath
 Pipelined Control
 Hazards: Structural Hazards, Data Hazards, Branch Hazards
 Dynamic Scheduling
 Examples of Pipelining
 Summary

3 Chap 6.3 The Big Picture: Where are We Now?
 The Five Classic Components of a Computer: Processor (Control, Datapath), Memory, Input, Output
 Back to the datapath again to try to speed it up some more
-Single cycle datapath – great CPI but terrible cycle time (long critical path)
-Multiple cycle datapath – good cycle time but poor CPI
-Pipelining – get the best of both!

4 Chap 6.4 Sequential Laundry
 Sequential laundry takes 8 hours for 4 loads (tasks A, B, C, D run one after another on a 6 PM to 2 AM timeline)
 If they learned pipelining, how long would laundry take?

5 Chap 6.5 Pipelined Laundry: Start work ASAP
 Pipelined laundry takes only 3.5 hours for 4 loads! (tasks A, B, C, D overlapped on the same 6 PM timeline)
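
The arithmetic behind these two figures, as a minimal sketch; the four 30-minute stages per load (wash, dry, fold, put away) are my assumption, chosen because they reproduce the slide's 8-hour and 3.5-hour totals.

```python
# Laundry pipelining arithmetic: a minimal sketch.
# Assumption: each load passes through four 30-minute stages
# (wash, dry, fold, put away), which reproduces the slide's numbers.

STAGE_MIN = 30      # minutes per stage
STAGES = 4          # wash, dry, fold, put away
LOADS = 4           # tasks A, B, C, D

sequential = LOADS * STAGES * STAGE_MIN          # no overlap between loads
pipelined = (LOADS + STAGES - 1) * STAGE_MIN     # fill the pipe, then one stage per extra load

print(f"sequential: {sequential / 60:.1f} hours")   # 8.0 hours
print(f"pipelined:  {pipelined / 60:.1f} hours")    # 3.5 hours
```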

6 Chap 6.6 One way to break up the datapath operations
 Ifetch: Instruction Fetch – fetch the instruction from the Instruction Memory
 Reg/Dec: Register Fetch and Instruction Decode
 Exec: Calculate the memory address
 Mem: Read the data from the Data Memory
 Wr: Write the data back to the register file
 A load then occupies the datapath for five cycles: Ifetch, Reg/Dec, Exec, Mem, Wr (Cycle 1 through Cycle 5)

7 Chap 6.7 Apply pipelining
 [Pipeline diagram: Inst 0 through Inst 4 in instruction order vs. time in clock cycles; each instruction flows through Im, Reg, ALU, Dm, Reg, one stage per cycle, offset by one cycle per instruction]

8 Chap 6.8 Conventional Pipelined Execution Representation
 [Diagram: program flow vs. time; each instruction passes through IFetch, Reg, Exec, Mem, WB, with successive instructions starting one cycle later]
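
Since the diagram itself does not survive in this transcript, here is a small sketch that prints the same staircase picture; only the stage names come from the slide, the layout code is illustrative.

```python
# A minimal sketch that prints the conventional pipeline diagram:
# each instruction starts one cycle after the previous one and
# occupies one stage per clock cycle.

STAGES = ["IFetch", "Reg", "Exec", "Mem", "WB"]

def pipeline_diagram(n_instructions: int) -> None:
    total_cycles = n_instructions + len(STAGES) - 1
    print("        " + "".join(f"C{c + 1:<7}" for c in range(total_cycles)))
    for i in range(n_instructions):
        row = ["        "] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = f"{stage:<8}"
        print(f"inst {i:<3}" + "".join(row))

pipeline_diagram(4)   # 4 instructions finish in 8 cycles instead of 4 x 5 = 20
```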

9 Chap 6.9 Pipelining
 Improve performance by increasing instruction throughput
 Ideal speedup is the number of stages in the pipeline. Do we achieve this?
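
A quick way to test the "ideal speedup equals the number of stages" claim is the following sketch, assuming every stage takes one cycle and there are no hazards.

```python
# Ideal pipeline speedup, assuming all stages take one cycle and no hazards:
# unpipelined time = n * s cycles; pipelined time = n + s - 1 cycles.

def ideal_speedup(n_instructions: int, n_stages: int = 5) -> float:
    unpipelined = n_instructions * n_stages
    pipelined = n_instructions + n_stages - 1
    return unpipelined / pipelined

for n in (4, 100, 1_000_000):
    print(n, round(ideal_speedup(n), 2))
# Speedup approaches the stage count (5) only when the fill/drain
# overhead is negligible relative to the instruction count.
```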

10 Chap 6.10 Single Cycle, Multiple Cycle, vs. Pipeline
 [Timing diagrams for a Load, a Store, and an R-type over clock cycles 1–10:]
-Single Cycle Implementation: Load and Store each get one long clock cycle; the shorter instruction leaves part of its cycle as Waste
-Multiple Cycle Implementation: Load takes Ifetch, Reg, Exec, Mem, Wr in successive short cycles, then Store takes Ifetch, Reg, Exec, Mem
-Pipeline Implementation: Load, Store, and R-type overlap, each one stage (Ifetch, Reg, Exec, Mem, Wr) behind the previous instruction

11 Chap 6.11 How much improvement can pipelining give us?
 Suppose we execute 100 instructions
 Single Cycle Machine: 45 ns/cycle x 1 CPI x 100 inst = 4500 ns
 Multicycle Machine: 10 ns/cycle x 4.6 CPI (due to inst mix) x 100 inst = 4600 ns
 Ideal pipelined machine: 10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns
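
A small sketch that reproduces the three totals; all constants (45 ns, 10 ns, 4.6 CPI, 4 drain cycles) are taken from the slide.

```python
# Reproducing the slide's comparison for 100 instructions.

INSTRUCTIONS = 100

single_cycle = 45 * 1 * INSTRUCTIONS          # 45 ns/cycle, CPI = 1  -> 4500 ns
multicycle   = 10 * 4.6 * INSTRUCTIONS        # 10 ns/cycle, CPI = 4.6 -> 4600 ns
pipelined    = 10 * (1 * INSTRUCTIONS + 4)    # 10 ns/cycle, CPI = 1, 4 drain cycles -> 1040 ns

print(single_cycle, multicycle, pipelined)
print("speedup over single cycle:", round(single_cycle / pipelined, 2))   # about 4.3x
```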

12 Chap 6.12 Pipelining
 What makes it easy?
-all instructions are the same length
-just a few instruction formats
-memory operands appear only in loads and stores
 What makes it hard?
-structural hazards: suppose we had only one memory
-control hazards: need to worry about branch instructions
-data hazards: an instruction depends on a previous instruction
 We’ll build a simple pipeline and look at these issues
 We’ll talk about modern processors and what really makes it hard:
-exception handling
-trying to improve performance with out-of-order execution, etc.

13 Chap 6.13 Basic Idea
 What do we need to add to actually split the datapath into stages?

14 Chap 6.14 Pipelined Datapath
 Can you find a problem even if there are no dependencies?
 What instructions can we execute to manifest the problem?

15 Chap 6.15 Corrected Datapath

16 Chap 6.16 Graphically Representing Pipelines
 Can help with answering questions like:
-how many cycles does it take to execute this code?
-what is the ALU doing during cycle 4?
 Use this representation to help understand datapaths

17 Chap 6.17 Pipeline Control

18 Chap 6.18 Pipeline Control
 We have 5 stages. What needs to be controlled in each stage?
-Instruction Fetch and PC Increment
-Instruction Decode / Register Fetch
-Execution
-Memory Stage
-Write Back
 How would control be handled in an automobile plant?
-a fancy control center telling everyone what to do?
-should we use a finite state machine?

19 Chap 6.19 Pipeline Control
 Pass control signals along just like the data
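
A minimal sketch of that idea in Python rather than hardware: control is computed once at decode, then bundled and handed down the pipeline registers, with each stage peeling off the signals it needs. The signal names are the usual MIPS control signals and are an assumption, not read off this slide.

```python
# "Pass the control signals along just like the data": control is generated
# in the decode stage, carried through ID/EX -> EX/MEM -> MEM/WB, and each
# stage consumes its own subset. Signal names are the usual MIPS ones.

def decode_control(opcode: str) -> dict:
    if opcode == "lw":
        return {"RegDst": 0, "ALUOp": "add", "MemRead": 1,
                "MemWrite": 0, "MemtoReg": 1, "RegWrite": 1}
    if opcode == "sw":
        return {"RegDst": None, "ALUOp": "add", "MemRead": 0,
                "MemWrite": 1, "MemtoReg": None, "RegWrite": 0}
    # R-type default
    return {"RegDst": 1, "ALUOp": "func", "MemRead": 0,
            "MemWrite": 0, "MemtoReg": 0, "RegWrite": 1}

ctrl = decode_control("lw")

# EX uses its signals now; the rest ride along in the pipeline registers.
ex_signals  = {k: ctrl[k] for k in ("RegDst", "ALUOp")}
mem_signals = {k: ctrl[k] for k in ("MemRead", "MemWrite")}
wb_signals  = {k: ctrl[k] for k in ("MemtoReg", "RegWrite")}
print(ex_signals, mem_signals, wb_signals)
```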

20 Chap 6.20 Datapath with Control

21 Chap 6.21 Can pipelining get us into trouble?
 Yes: Pipeline Hazards
-structural hazards: attempt to use the same resource in two different ways at the same time
  E.g., a combined washer/dryer would be a structural hazard, or the folder is busy doing something else (watching TV)
-data hazards: attempt to use an item before it is ready
  E.g., one sock of a pair is in the dryer and one is in the washer; can’t fold until the sock gets from the washer through the dryer
  An instruction depends on the result of a prior instruction still in the pipeline
-control hazards: attempt to make a decision before the condition is evaluated
  E.g., washing football uniforms and need to get the proper detergent level; need to see the result after the dryer before starting the next load
  Branch instructions
 Can always resolve hazards by waiting
-pipeline control must detect the hazard
-and take action (or delay action) to resolve it

22 Chap 6.22 Single Memory is a Structural Hazard
 [Pipeline diagram: a Load followed by Instr 1–4; with a single memory, the Load's data access and a later instruction's fetch need the memory in the same cycle]
 Detection is easy in this case! (right half highlight means read, left half write)

23 Chap 6.23 Dependencies
 Problem with starting the next instruction before the first is finished
-dependencies that go backward in time are data hazards

24 Chap 6.24 Software Solution
 Have the compiler guarantee no hazards
 Where do we insert the nops?
sub $2, $1, $3
and $12, $2, $5
or  $13, $6, $2
add $14, $2, $2
sw  $15, 100($2)
 Problem: this really slows us down!
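
A hedged sketch of what such a compiler pass might do, assuming a register's new value becomes visible to instructions issued two or more slots later (register file written in the first half of a cycle, read in the second half); the instruction representation and helper are illustrative, not the book's.

```python
# A minimal sketch of the compiler-side fix: insert nops until no instruction
# reads a register written by one of the `gap` immediately preceding slots.

def insert_nops(program, gap=2):
    """program: list of (dest_reg, [source_regs]) tuples in issue order."""
    scheduled = []                    # tuples, with None marking a nop
    for dest, sources in program:
        # Pad with nops while any of the last `gap` slots wrote a register we read.
        while any(s is not None and s[0] in sources for s in scheduled[-gap:]):
            scheduled.append(None)    # nop
        scheduled.append((dest, sources))
    return scheduled

code = [("$2",  ["$1", "$3"]),    # sub $2, $1, $3
        ("$12", ["$2", "$5"]),    # and $12, $2, $5
        ("$13", ["$6", "$2"]),    # or  $13, $6, $2
        ("$14", ["$2", "$2"]),    # add $14, $2, $2
        (None,  ["$15", "$2"])]   # sw  $15, 100($2)

for slot in insert_nops(code):
    print("nop" if slot is None else slot)
# Under this assumption, two nops land between the sub and the and.
```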

25 Chap 6.25 Forwarding
 Use temporary results, don’t wait for them to be written
-register file forwarding to handle read/write to the same register
-ALU forwarding
 What if this $2 was $13?
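
A minimal sketch of the forwarding decision for one ALU operand, following the usual textbook conditions; the pipeline-register field names are illustrative.

```python
# Forwarding unit (one ALU input): prefer the newest value still in flight.

def forward_a(id_ex_rs, ex_mem_rd, ex_mem_regwrite, mem_wb_rd, mem_wb_regwrite):
    """Return where the first ALU operand should come from."""
    # EX hazard: the instruction just ahead will write the register we read.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_rs:
        return "EX/MEM"      # forward the previous instruction's ALU result
    # MEM hazard: the instruction two ahead will write it (no newer value exists).
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == id_ex_rs:
        return "MEM/WB"      # forward the value being written back
    return "ID/EX"           # no hazard: use the register file value

# sub $2,$1,$3 followed immediately by and $12,$2,$5:
print(forward_a(id_ex_rs=2, ex_mem_rd=2, ex_mem_regwrite=True,
                mem_wb_rd=0, mem_wb_regwrite=False))   # -> "EX/MEM"
```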

26 Chap 6.26 Forwarding

27 Chap 6.27 Can't always forward
 Load word can still cause a hazard: an instruction tries to read a register right after a load instruction that writes the same register.
 Thus, we need a hazard detection unit to stall the instruction that follows the load

28 Chap 6.28 Stalling
 We can stall the pipeline by keeping an instruction in the same stage

29 Chap 6.29 Hazard Detection Unit
 Stall by letting an instruction that won’t write anything go forward
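
A minimal sketch of the load-use check and the bubble it inserts, again following the usual textbook condition; the field names are illustrative.

```python
# Hazard detection unit: stall when the instruction in EX is a load whose
# destination register is read by the instruction currently being decoded.

def must_stall(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    return id_ex_memread and id_ex_rt in (if_id_rs, if_id_rt)

# On a stall: hold the PC and the IF/ID register (re-decode the same instruction)
# and turn the instruction in ID into a bubble by zeroing its control signals,
# i.e. an instruction that won't write anything goes forward.
if must_stall(id_ex_memread=True, id_ex_rt=2, if_id_rs=2, if_id_rt=5):
    print("stall: insert bubble, keep PC and IF/ID unchanged for one cycle")
```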

30 Chap 6.30 Control (Branch) Hazards
 When we decide to branch, other instructions are already in the pipeline!
 We are predicting branch not taken
-need to add hardware for flushing instructions if we are wrong

31 Chap 6.31 Flushing Instructions

32 Chap 6.32 Example: Control Hazard Solutions
 Stall: wait until the decision is clear
-It's possible to move the decision up to the 2nd stage by adding hardware to compare the registers as they are read
 Impact: 2 clock cycles per branch instruction => slow
 [Pipeline diagram: Add, Beq, Load; the Load after the Beq is held back until the branch outcome is known]

33 Chap 6.33 Example: Control Hazard Solutions
 Predict: guess one direction, then back up if wrong
-Predict not taken
 Impact: 1 clock cycle per branch instruction if right, 2 if wrong (right ~50% of the time)
 More dynamic scheme: keep a history of each branch (right ~90% of the time)
 [Pipeline diagram: Add, Beq, Load; the Load proceeds immediately and is flushed only if the branch is taken]
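
A minimal sketch of the per-branch history idea as a 1-bit predictor; the table size and PC indexing are illustrative assumptions.

```python
# 1-bit dynamic branch prediction: predict whatever the branch did last time,
# indexed by (part of) its PC.

class OneBitPredictor:
    def __init__(self, entries: int = 16):
        self.table = [False] * entries          # False = predict not taken

    def predict(self, pc: int) -> bool:
        return self.table[pc % len(self.table)]

    def update(self, pc: int, taken: bool) -> None:
        self.table[pc % len(self.table)] = taken

# A loop-closing branch taken 9 times then not taken: the 1-bit scheme
# mispredicts only the first and the last iteration (~90% right).
bp, mispredicts = OneBitPredictor(), 0
for taken in [True] * 9 + [False]:
    if bp.predict(pc=0x40) != taken:
        mispredicts += 1
    bp.update(pc=0x40, taken=taken)
print(mispredicts, "mispredictions out of 10")
```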

34 Chap 6.34 Example: Control Hazard Solutions
 Redefine branch behavior (the branch takes effect after the next instruction): "delayed branch"
 Impact: 1 clock cycle per branch instruction if we can find an instruction to put in the "slot" (~50% of the time)
 [Pipeline diagram: Add, Beq, Misc, Load; the Misc instruction fills the branch delay slot]

35 Chap 6.35 Dynamic Scheduling
 The hardware performs the scheduling
-hardware tries to find instructions to execute
-out-of-order execution is possible
-speculative execution and dynamic branch prediction
 All modern processors are very complicated
-DEC Alpha 21264: 9-stage pipeline, 6-instruction issue
-PowerPC and Pentium: branch history table
-Compiler technology is still important

36 Chap 6.36 Summary
 Pipelining is a fundamental concept
-multiple steps using distinct resources
 Utilize the capabilities of the Datapath by pipelined instruction processing
-start the next instruction while working on the current one
-limited by the length of the longest stage (plus fill/flush)
-detect and resolve hazards

