Download presentation
Presentation is loading. Please wait.
Published byBenedict Reynolds Modified over 9 years ago
1
ECE 232 L18.Pipeline.1 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers ECE 232 Hardware Organization and Design Lecture 18 Pipelining Maciej Ciesielski www.ecs.umass.edu/ece/labs/vlsicad/ece232/spr2002/index_232.html
2
ECE 232 L18.Pipeline.2 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Outline °Review Microprogramming °Pipelining, basic concept °Hazards Structural Data hazards Control hazards
3
ECE 232 L18.Pipeline.3 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Pipelining is Natural! Laundry Example °Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold °Washer takes 30 minutes °Dryer takes 40 minutes °“Folder” takes 20 minutes ABCD
4
ECE 232 L18.Pipeline.4 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Sequential Laundry °Sequential laundry takes 6 hours for 4 loads °If they learned pipelining, how long would laundry take? A B C D TaskOrderTaskOrder 304020304020304020304020 6 PM 789 10 11 Midnight Time
5
ECE 232 L18.Pipeline.5 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Pipelined Laundry: Start work ASAP °Pipelined laundry takes 3.5 hours for 4 loads ABCD TaskOrderTaskOrder 6 PM 789 10 11 Midnight Time 3040 20
6
ECE 232 L18.Pipeline.6 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Pipelining Lessons °Pipelining doesn’t help latency of single task, it helps throughput of entire workload °Pipeline rate limited by slowest pipeline stage °Multiple tasks operating simultaneously using different resources °Potential speedup = Number pipe stages °Unbalanced lengths of pipe stages reduces speedup °Time to “fill” pipeline and time to “drain” it reduces speedup °Stall for Dependences ABCD TaskOrderTaskOrder 6 PM 789 Time 3040 20
7
ECE 232 L18.Pipeline.7 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers The Five Stages of Load °IFetch: Instruction Fetch Fetch the instruction from the Instruction Memory °Reg/Dec: Registers Fetch and Instruction Decode °Exec: Calculate the memory address °Mem: Read the data from the Data Memory °Wr: Write the data back to the register file Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5 IfetchReg/DecExecMemWrLoad
8
ECE 232 L18.Pipeline.8 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Pipelined Execution °Utilization? °Now we just have to make it work IFetchDcdExecMemWB IFetchDcdExecMemWB IFetchDcdExecMemWB IFetchDcdExecMemWB IFetchDcdExecMemWB IFetchDcdExecMemWB Program Flow Time
9
ECE 232 L18.Pipeline.9 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Single Cycle, Multiple Cycle, vs. Pipeline Clk Cycle 1 Multiple Cycle Implementation : IfetchRegExecMemWr Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7Cycle 8Cycle 9Cycle 10 LoadIfetchRegExecMemWr IfetchRegExecMem LoadStore Pipeline Implementation : IfetchRegExecMemWrStore Clk Single Cycle Implementation : LoadStoreWaste Ifetch R-type IfetchRegExecMemWrR-type Cycle 1Cycle 2
10
ECE 232 L18.Pipeline.10 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Why Pipeline? °Suppose we execute 100 instructions °Single Cycle Machine 45 ns/cycle x 1 CPI x 100 inst = 4500 ns °Multicycle Machine 10 ns/cycle x 4.6 CPI (due to inst mix) x 100 inst = 4600 ns °Ideal pipelined machine 10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain ) = 1040 ns
11
ECE 232 L18.Pipeline.11 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Why Pipeline? Because the resources are there! Time (clock cycles) I n s t r. O r d e r Inst 0 Inst 1 Inst 2 Inst 4 Inst 3 ALU Im Reg DmReg ALU Im Reg DmReg ALU Im Reg DmReg ALU Im Reg DmReg ALU Im Reg DmReg
12
ECE 232 L18.Pipeline.12 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Can pipelining get us into trouble? °Yes: Pipeline Hazards structural hazards: attempt to use the same resource two different ways at the same time -E.g., combined washer/dryer would be a structural hazard or folder busy doing something else (watching TV) data hazards: attempt to use item before it is ready -E.g., one sock of pair in dryer and one in washer; can’t fold until get sock from washer through dryer -instruction depends on result of prior instruction still in the pipeline control hazards: attempt to make a decision before condition is evaulated -E.g., washing football uniforms and need to get proper detergent level; need to see after dryer before next load in -branch instructions °Can always resolve hazards by waiting pipeline control must detect the hazard take action (or delay action) to resolve hazards
13
ECE 232 L18.Pipeline.13 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Single Memory is a Structural Hazard Time (clock cycles) Mem I n s t r. O r d e r Load Instr 1 Instr 2 Instr 3 Instr 4 ALU Mem Reg MemReg ALU Mem Reg MemReg ALU Mem Reg MemReg ALU Reg MemReg ALU Mem Reg MemReg Detection is easy in this case! (right half highlight means read, left half write)
14
ECE 232 L18.Pipeline.14 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers °Stall: wait until decision is clear It is possible to move up decision to 2nd stage by adding hardware to check registers as being read °Impact: 2 clock cycles per branch instruction => slow Control Hazard Solutions I n s t r. O r d e r Time (clock cycles) Add Beq Load ALU Mem Reg MemReg ALU Mem Reg MemReg ALU Reg MemReg Mem
15
ECE 232 L18.Pipeline.15 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers °Predict: guess one direction then back up if wrong Predict not taken °Impact: 1 clock cycles per branch instruction if right, 2 if wrong (right 50% of time) °More dynamic scheme: history of 1 branch ( 90%) Control Hazard Solutions I n s t r. O r d e r Time (clock cycles) Add Beq Load ALU Mem Reg MemReg ALU Mem Reg MemReg Mem ALU Reg MemReg
16
ECE 232 L18.Pipeline.16 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers °Redefine branch behavior (takes place after next instruction) “delayed branch” °Impact: 0 clock cycles per branch instruction if can find instruction to put in “slot” ( 50% of time) °As launch more instruction per clock cycle, less useful Control Hazard Solutions I n s t r. O r d e r Time (clock cycles) Add Beq Misc ALU Mem Reg MemReg ALU Mem Reg MemReg Mem ALU Reg MemReg Load Mem ALU Reg MemReg
17
ECE 232 L18.Pipeline.17 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Dependencies backwards in time are hazards Data Hazard on r1: I n s t r. O r d e r Time (clock cycles) add r1,r2,r3 sub r4,r1,r3 and r6,r1,r7 or r8,r1,r9 xor r10,r1,r11 IFIF ID/R F EXEX ME M WBWB ALU Im Reg Dm Reg ALU Im Reg DmReg ALU Im Reg DmReg Im ALU Reg DmReg ALU Im Reg DmReg
18
ECE 232 L18.Pipeline.18 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers “Forward” result from one stage to another “or” OK if define read/write properly Data Hazard Solution: I n s t r. O r d e r Time (clock cycles) add r1,r2,r3 sub r4,r1,r3 and r6,r1,r7 or r8,r1,r9 xor r10,r1,r11 ALU Im Reg IFIF ID/R F EXEX ME M WBWB Dm Reg ALU Im Reg DmReg ALU Im Reg DmReg Im ALU Reg DmReg ALU Im Reg DmReg
19
ECE 232 L18.Pipeline.19 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Dependencies backwards in time are hazards Can’t solve with forwarding: Must delay/stall instruction dependent on loads Forwarding (or Bypassing): What about Loads Time (clock cycles) IFIF ID/R F EXEX ME M WBWB lw r1,0(r2) sub r4,r1,r3 ALU Im Reg Dm Reg ALU Im Reg DmReg
20
ECE 232 L18.Pipeline.20 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Designing a Pipelined Processor °Go back and examine your datapath and control diagrams °Associated resources with states °Ensure that flows do not conflict, or figure out how to resolve °Assert control in appropriate stage
21
ECE 232 L18.Pipeline.21 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers Summary °Microprogramming is a fundamental concept implement an instruction set by building a very simple processor and interpreting the instructions essential for very complex instructions and when few register transfers are possible °Pipelining is a fundamental concept multiple steps using distinct resources °Utilize capabilities of the Datapath by pipelined instruction processing start next instruction while working on the current one limited by length of longest stage (plus fill/flush) detect and resolve hazards
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.