Published by Ashlee Elfreda Carr. Modified over 9 years ago.
1
Chap 6.1 Computer Architecture Chapter 6 Enhancing Performance with Pipelining
2
Chap 6.2 Contents
Pipelining Overview
Pipelined Datapath
Pipelined Control
Hazards
Structural Hazards
Data Hazards
Branch Hazards
Dynamic Scheduling
Examples of Pipelining
Summary
3
Chap 6.3 The Big Picture: Where are We Now?
The Five Classic Components of a Computer: Processor (Control, Datapath), Memory, Input, Output
Back to the datapath again to try to speed it up some more
  Single cycle datapath: great CPI but terrible cycle time (long critical path)
  Multiple cycle datapath: good cycle time but poor CPI
  Pipelining: get the best of both!
4
Chap 6.4 Sequential Laundry
[Figure: timeline from 6 PM to 2 AM; loads A, B, C, D in task order, run one after another; steps labeled 30]
Sequential laundry takes 8 hours for 4 loads
If they learned pipelining, how long would laundry take?
5
Chap 6.5 Pipelined Laundry: Start work ASAP
[Figure: timeline from 6 PM to 2 AM; loads A, B, C, D in task order, overlapped; steps labeled 30]
Pipelined laundry takes 3.5 hours for 4 loads!
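The two laundry totals can be checked with a few lines of arithmetic. This is only a sketch: the slides give the totals (8 hours and 3.5 hours) and a 30-minute mark, so the assumption here is four steps of 30 minutes per load. The step count and the function names are illustrative, not taken from the slides.

```python
# Sketch of the laundry arithmetic.
# ASSUMPTION: each load passes through 4 steps of 30 minutes each.
# That choice reproduces the totals quoted on the slides; the slides
# themselves do not list the steps.

def sequential_minutes(loads: int, steps: int, step_minutes: int) -> int:
    """Each load finishes all of its steps before the next load starts."""
    return loads * steps * step_minutes


def pipelined_minutes(loads: int, steps: int, step_minutes: int) -> int:
    """A new load starts every step; the last load still needs all steps."""
    return (steps + loads - 1) * step_minutes


LOADS, STEPS, STEP_MINUTES = 4, 4, 30

seq = sequential_minutes(LOADS, STEPS, STEP_MINUTES)
pipe = pipelined_minutes(LOADS, STEPS, STEP_MINUTES)
print(f"sequential: {seq / 60} hours, pipelined: {pipe / 60} hours")
```

Note that a single load still takes the same two hours either way; only the rate at which loads finish improves.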
6
Chap 6.6 One way to break up the datapath operations
The five stages of Load:
  Ifetch: Instruction Fetch. Fetch the instruction from the Instruction Memory
  Reg/Dec: Register Fetch and Instruction Decode
  Exec: Calculate the memory address
  Mem: Read the data from the Data Memory
  Wr: Write the data back to the register file
Load: Cycle 1 Ifetch | Cycle 2 Reg/Dec | Cycle 3 Exec | Cycle 4 Mem | Cycle 5 Wr
7
Chap 6.7 Apply pipelining
[Figure: instruction order (Inst 0 to Inst 4) against time in clock cycles; each instruction passes through Im, Reg, ALU, Dm, Reg]
8
Chap 6.8 Conventional Pipelined Execution Representation
[Figure: program flow against time; six instructions, each passing through IFetch, Reg, Exec, Mem, WB]
9
Chap 6.9 Pipelining
Improve performance by increasing instruction throughput
Ideal speedup is number of stages in the pipeline. Do we achieve this?
10
Chap 6.10 Single Cycle, Multiple Cycle, vs. Pipeline
[Figure: clock timing for three implementations. Single Cycle Implementation: Load and Store in Cycle 1 and Cycle 2, with part of a cycle marked Waste. Multiple Cycle Implementation: Load (Ifetch, Reg, Exec, Mem, Wr), Store (Ifetch, Reg, Exec, Mem), R-type (Ifetch, ...) across Cycles 1 to 10. Pipeline Implementation: Load, Store, R-type, each passing through Ifetch, Reg, Exec, Mem, Wr.]
11
Chap 6.11 How much improvement can pipelining give us?
Suppose we execute 100 instructions
  Single Cycle Machine: 45 ns/cycle x 1 CPI x 100 inst = 4500 ns
  Multicycle Machine: 10 ns/cycle x 4.6 CPI (due to inst mix) x 100 inst = 4600 ns
  Ideal pipelined machine: 10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns
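The three execution times on this slide can be reproduced directly, along with the speedup they imply. The cycle times, CPI values, instruction count, and drain cycles come from the slide; the variable names are illustrative.

```python
# Reproduces the slide's arithmetic: time = cycle time x total cycles.
INSTRUCTIONS = 100

single_cycle_ns = 45 * (1 * INSTRUCTIONS)       # 45 ns/cycle, CPI 1
multicycle_ns = 10 * (4.6 * INSTRUCTIONS)       # 10 ns/cycle, CPI 4.6
pipelined_ns = 10 * (1 * INSTRUCTIONS + 4)      # 10 ns/cycle, CPI 1, 4-cycle drain

for name, t in [("single cycle", single_cycle_ns),
                ("multicycle", multicycle_ns),
                ("ideal pipeline", pipelined_ns)]:
    print(f"{name:>15}: {t:.0f} ns")

# Speedup of the ideal pipeline over each alternative.
print(f"vs single cycle: {single_cycle_ns / pipelined_ns:.2f}x")
print(f"vs multicycle:   {multicycle_ns / pipelined_ns:.2f}x")
```

Both ratios come out below 5, the number of stages, partly because of the 4 drain cycles and partly because the single-cycle clock (45 ns) is not five times the pipelined clock (10 ns).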
12
Chap 6.12 Pipelining
What makes it easy?
  all instructions are the same length
  just a few instruction formats
  memory operands appear only in loads and stores
What makes it hard?
  structural hazards: suppose we had only one memory
  control hazards: need to worry about branch instructions
  data hazards: an instruction depends on a previous instruction
We’ll build a simple pipeline and look at these issues
We’ll talk about modern processors and what really makes it hard:
  exception handling
  trying to improve performance with out-of-order execution, etc.
13
Chap 6.13 Basic Idea
What do we need to add to actually split the datapath into stages?
14
Chap 6.14 Pipelined Datapath
Can you find a problem even if there are no dependencies?
What instructions can we execute to manifest the problem?
15
Chap 6.15 Corrected Datapath
16
Chap 6.16 Graphically Representing Pipelines
Can help with answering questions like:
  how many cycles does it take to execute this code?
  what is the ALU doing during cycle 4?
  use this representation to help understand datapaths
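The two questions on this slide can also be answered mechanically for an ideal pipeline with no stalls. The sketch below assumes one instruction enters the pipeline per cycle; the stage labels, the three-instruction program, and all function names are hypothetical and chosen only for illustration.

```python
# Text version of a multi-cycle pipeline diagram.
# ASSUMPTIONS: five stages, one instruction issued per cycle, no stalls.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_at(instr_index, cycle):
    """Stage occupied by instruction instr_index (0-based) in a 1-based cycle."""
    s = cycle - 1 - instr_index
    return STAGES[s] if 0 <= s < len(STAGES) else None


def total_cycles(n_instructions):
    """Cycles until the last instruction leaves the final stage."""
    return n_instructions + len(STAGES) - 1


def in_stage(names, stage, cycle):
    """Name of the instruction occupying `stage` during `cycle`, if any."""
    for i, name in enumerate(names):
        if stage_at(i, cycle) == stage:
            return name
    return None


def diagram(names):
    """One row per instruction, one column per clock cycle."""
    n_cycles = total_cycles(len(names))
    header = " " * 18 + "".join(f"{c:>5}" for c in range(1, n_cycles + 1))
    rows = [header]
    for i, name in enumerate(names):
        cells = "".join(f"{stage_at(i, c) or '.':>5}"
                        for c in range(1, n_cycles + 1))
        rows.append(f"{name:<18}{cells}")
    return "\n".join(rows)


# Hypothetical program, used only to exercise the functions.
program = ["lw $10, 20($1)", "sub $11, $2, $3", "add $12, $3, $4"]
print(diagram(program))
print("cycles needed:", total_cycles(len(program)))
print("in EX during cycle 4:", in_stage(program, "EX", 4))
```

Reading down a column of the printed table shows what every stage is doing in that cycle, which is the same lookup `in_stage` performs.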
17
Chap 6.17 Pipeline Control
18
Chap 6.18 Pipeline control
We have 5 stages. What needs to be controlled in each stage?
  Instruction Fetch and PC Increment
  Instruction Decode / Register Fetch
  Execution
  Memory Stage
  Write Back
How would control be handled in an automobile plant?
  a fancy control center telling everyone what to do?
  should we use a finite state machine?
19
Chap 6.19 Pipeline Control
Pass control signals along just like the data
20
Chap 6.20 Datapath with Control
21
Chap 6.21 Can pipelining get us into trouble?
Yes: Pipeline Hazards
structural hazards: attempt to use the same resource two different ways at the same time
  -E.g., combined washer/dryer would be a structural hazard, or folder busy doing something else (watching TV)
data hazards: attempt to use item before it is ready
  -E.g., one sock of pair in dryer and one in washer; can’t fold until get sock from washer through dryer
  -instruction depends on result of prior instruction still in the pipeline
control hazards: attempt to make a decision before condition is evaluated
  -E.g., washing football uniforms and need to get proper detergent level; need to see after dryer before next load in
  -branch instructions
Can always resolve hazards by waiting
  pipeline control must detect the hazard
  take action (or delay action) to resolve hazards
22
Chap 6.22 Single Memory is a Structural Hazard
[Figure: instruction order (Load, Instr 1, Instr 2, Instr 3, Instr 4) against time in clock cycles; each instruction passes through Mem, Reg, ALU, Mem, Reg]
Detection is easy in this case! (right half highlight means read, left half write)
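Detection is easy here because the conflict depends only on cycle numbers: a load reaches its memory stage in the same cycle that a later instruction is being fetched from the same memory. A minimal sketch of that check follows, assuming five stages with one instruction issued per cycle; the opcode list and function name are hypothetical.

```python
# Finds cycles in which a single shared memory is needed twice:
# once for a data access (MEM stage) and once for an instruction fetch (IF).
# ASSUMPTIONS: five stages (IF, ID, EX, MEM, WB), one issue per cycle, no stalls.
MEM_STAGE_OFFSET = 3               # MEM comes three cycles after IF
USES_DATA_MEMORY = {"lw", "sw"}


def memory_conflicts(ops):
    """Return (cycle, index_of_data_access, index_of_fetch) for each clash."""
    conflicts = []
    for i, op in enumerate(ops):
        if op not in USES_DATA_MEMORY:
            continue
        fetch_cycle = i + 1                        # 1-based cycle of IF
        mem_cycle = fetch_cycle + MEM_STAGE_OFFSET
        fetch_index = mem_cycle - 1                # instruction fetched then
        if fetch_index < len(ops):
            conflicts.append((mem_cycle, i, fetch_index))
    return conflicts


# Load followed by four instructions that do not touch data memory
# (the opcodes standing in for Instr 1 to Instr 4 are made up).
ops = ["lw", "add", "add", "add", "add"]
print(memory_conflicts(ops))
```

With this sequence the only clash is in cycle 4, between the load's data access and the fetch of Instr 3.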
23
Chap 6.23 Dependencies
Problem with starting next instruction before first is finished
  dependencies that go backward in time are data hazards
24
Chap 6.24 Software Solution
Have compiler guarantee no hazards
Where do we insert the nops?
    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
Problem: this really slows us down!
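One way to answer the question on this slide is to let a small script place the nops. The sketch below assumes a five-stage pipeline with no forwarding, in which the register file is written in the first half of a cycle and read in the second half, so a consumer must sit at least three slots after its producer. That distance, the `Instr` record, and `insert_nops` are illustrative assumptions rather than anything stated on the slide.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str
    dest: Optional[str]          # register written, or None
    srcs: Tuple[str, ...]        # registers read


NOP = Instr("nop", None, ())

# ASSUMPTION: no forwarding; register file written in the first half of a
# cycle and read in the second half, so a consumer must be at least
# 3 slots after its producer.
MIN_DISTANCE = 3


def insert_nops(program):
    """Return a copy of program with enough nops to avoid data hazards."""
    out = []
    for ins in program:
        gap = 0
        # Only the previous MIN_DISTANCE - 1 slots can be too close.
        for back in range(1, MIN_DISTANCE):
            if back > len(out):
                break
            prev = out[-back]
            if prev.dest is not None and prev.dest in ins.srcs:
                gap = max(gap, MIN_DISTANCE - back)
        out.extend([NOP] * gap)
        out.append(ins)
    return out


program = [
    Instr("sub $2, $1, $3", "$2", ("$1", "$3")),
    Instr("and $12, $2, $5", "$12", ("$2", "$5")),
    Instr("or $13, $6, $2", "$13", ("$6", "$2")),
    Instr("add $14, $2, $2", "$14", ("$2", "$2")),
    Instr("sw $15, 100($2)", None, ("$15", "$2")),
]

for ins in insert_nops(program):
    print(ins.text)
```

Under these assumptions the script places two nops directly after `sub`; the later readers of `$2` are already far enough away. Five useful instructions then occupy seven issue slots, which is the slowdown the slide complains about.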
25
Chap 6.25 Forwarding
Use temporary results, don’t wait for them to be written
  register file forwarding to handle read/write to same register
  ALU forwarding
what if this $2 was $13?
26
Chap 6.26 Forwarding
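The decision a forwarding unit makes for one ALU operand can be written as a short function. The conditions below follow the common textbook formulation for a five-stage MIPS pipeline; they are an assumption here, since the slide shows only the datapath figure. The signal names and the returned labels are illustrative.

```python
# Chooses where one ALU operand should come from.
# ASSUMPTION: standard five-stage MIPS pipeline; register 0 is hardwired
# to zero and is therefore never forwarded.

def forward_select(id_ex_rs, ex_mem_regwrite, ex_mem_rd,
                   mem_wb_regwrite, mem_wb_rd):
    """Return the source for the operand read from register id_ex_rs."""
    # Most recent result first: the instruction one ahead, now in MEM.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_rs:
        return "EX/MEM"
    # Otherwise the instruction two ahead, now in WB.
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == id_ex_rs:
        return "MEM/WB"
    # No pending write to this register: use the value read in decode.
    return "REGFILE"


# 'and $12, $2, $5' in EX while 'sub $2, $1, $3' is in MEM.
print(forward_select(2, True, 2, False, 0))
# 'or $13, $6, $2' in EX: 'and' (writes $12) in MEM, 'sub' (writes $2) in WB.
print(forward_select(2, True, 12, True, 2))
```

Checking the EX/MEM register first matters when both pipeline registers hold a write to the same register, because the younger result is the one the program order requires.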
27
Chap 6.27 Can't always forward
Load word can still cause a hazard: an instruction tries to read a register following a load instruction that writes to the same register.
Thus, we need a hazard detection unit to stall the instruction that follows the load
28
Chap 6.28 Stalling
We can stall the pipeline by keeping an instruction in the same stage
29
Chap 6.29 Hazard Detection Unit
Stall by letting an instruction that won’t write anything go forward
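The test a hazard detection unit applies for the load case is small enough to write out. The sketch below uses the usual textbook condition for a five-stage MIPS pipeline, which is an assumption here because the slide shows only the figure; the signal and function names are illustrative.

```python
# Load-use hazard check, evaluated while the dependent instruction is in
# decode and the load is in execute.
# ASSUMPTION: standard five-stage MIPS pipeline; a load's destination is
# its rt field.

def must_stall(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in decode needs a load result not yet read."""
    return bool(id_ex_memread) and id_ex_rt in (if_id_rs, if_id_rt)


def stall_actions(stall):
    """What the pipeline does for one cycle when a stall is required."""
    return {
        "pc_write": not stall,         # hold the PC
        "if_id_write": not stall,      # hold the instruction in decode
        "zero_id_ex_control": stall,   # send a do-nothing bubble forward
    }


# 'lw $2, 20($1)' in EX, 'and $4, $2, $5' in decode.
print(must_stall(True, 2, 2, 5))
print(stall_actions(must_stall(True, 2, 2, 5)))
```

Zeroing the control signals is what produces the instruction that "won't write anything": it moves down the pipeline but changes neither registers nor memory.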
30
Chap 6.30 Control (Branch) Hazards
When we decide to branch, other instructions are in the pipeline!
We are predicting branch not taken
  need to add hardware for flushing instructions if we are wrong
31
Chap 6.31 Flushing Instructions
32
Chap 6.32 Example: Control Hazard Solutions
Stall: wait until decision is clear
  It's possible to move up decision to 2nd stage by adding hardware to check registers as being read
Impact: 2 clock cycles per branch instruction => slow
[Figure: instruction order (Add, Beq, Load) against time in clock cycles; each instruction passes through Mem, Reg, ALU, Mem, Reg]
33
Chap 6.33 Example: Control Hazard Solutions
Predict: guess one direction then back up if wrong
  Predict not taken
Impact: 1 clock cycle per branch instruction if right, 2 if wrong (right 50% of time)
More dynamic scheme: history of 1 branch (90%)
[Figure: instruction order (Add, Beq, Load) against time in clock cycles; each instruction passes through Mem, Reg, ALU, Mem, Reg]
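The impact figures on this slide combine into an average cost per branch. The sketch below uses the slide's numbers (1 cycle when the guess is right, 2 when it is wrong, right 50% or 90% of the time); the function name and default arguments are illustrative.

```python
# Average clock cycles per branch for a given prediction accuracy.

def cycles_per_branch(p_correct, right_cycles=1, wrong_cycles=2):
    """Expected cost of a branch when the guess is right with p_correct."""
    return p_correct * right_cycles + (1 - p_correct) * wrong_cycles


for label, accuracy in [("always stall", 0.0),
                        ("predict not taken", 0.5),
                        ("1-branch history", 0.9)]:
    print(f"{label:>18}: {cycles_per_branch(accuracy):.2f} cycles per branch")
```

An accuracy of 0 reproduces the stall case, where every branch costs 2 cycles, so prediction with these numbers is never worse than stalling.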
34
Chap 6.34 Example: Control Hazard Solutions
Redefine branch behavior (takes place after next instruction): “delayed branch”
Impact: 1 clock cycle per branch instruction if can find instruction to put in “slot” (50% of time)
[Figure: instruction order (Add, Beq, Misc, Load) against time in clock cycles; each instruction passes through Mem, Reg, ALU, Mem, Reg]
35
Chap 6.35 Dynamic Scheduling
The hardware performs the scheduling?
  hardware tries to find instructions to execute
  out of order execution is possible
  speculative execution and dynamic branch prediction
All modern processors are very complicated
  DEC Alpha 21264: 9 stage pipeline, 6 instruction issue
  PowerPC and Pentium: branch history table
  Compiler technology important
36
Chap 6.36 Summary
Pipelining is a fundamental concept
  multiple steps using distinct resources
Utilize capabilities of the Datapath by pipelined instruction processing
  start next instruction while working on the current one
  limited by length of longest stage (plus fill/flush)
  detect and resolve hazards