Download presentation
Presentation is loading. Please wait.
1
1 CSE 45432 SUNY New Paltz Chapter Six Enhancing Performance with Pipelining
2
2 CSE 45432 SUNY New Paltz Sequential Laundry Sequential laundry takes 8 hours for 4 loads If they learned pipelining, how long would laundry take? 30 TaskOrderTaskOrder B C D A Time 30 6 PM 7 8 9 10 11 12 1 2 AM
3
3 CSE 45432 SUNY New Paltz Pipelined Laundry Pipelining doesn’t help latency of single task, it helps throughput of entire workload Multiple tasks operating simultaneously using different resources Potential speedup = Number pipe stages Pipeline rate limited by slowest pipeline stage Unbalanced lengths of pipe stages reduces speedup Time to “fill” pipeline and time to “drain” it reduces speedup Stall for dependencies 6 PM 789 Time B C D A 30 TaskOrderTaskOrder
4
4 CSE 45432 SUNY New Paltz Single Stage VS. Pipeline Performance Ideal speedup is number of stages in the pipeline.
5
5 CSE 45432 SUNY New Paltz Pipelining What makes it easy in MIPS –all instructions are the same length –just a few instruction formats –memory operands appear only in loads and stores What makes it hard? –structural hazards: suppose we had only one memory –control hazards: need to worry about branch instructions –data hazards: an instruction depends on a previous instruction We’ll build a simple pipeline and look at these issues
6
6 CSE 45432 SUNY New Paltz The Five Stages of Load Ifetch: Instruction Fetch –Fetch the instruction from the Instruction Memory Reg/Dec: Registers Fetch and Instruction Decode Exec: Calculate the memory address Mem: Read the data from the Data Memory Wr: Write the data back to the register file Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5 IfetchReg/DecExecMemWrLoad
7
7 CSE 45432 SUNY New Paltz Single Cycle, Multiple Cycle, vs. Pipeline Clk Cycle 1 Multiple Cycle Implementation: IfetchRegExecMemWr Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7Cycle 8Cycle 9Cycle 10 LoadIfetchRegExecMemWr IfetchRegExecMem LoadStore Pipeline Implementation: IfetchRegExecMemWrStore Clk Single Cycle Implementation: LoadStoreWaste Ifetch R-type IfetchRegExecMemWrR-type Cycle 1Cycle 2
8
8 CSE 45432 SUNY New Paltz Basic Idea xecute/ address calculation MEM: Memory accessWB: Write back
9
9 CSE 45432 SUNY New Paltz Pipelined Datapath Walk through lw instruction Walk through sw instruction The design
10
10 CSE 45432 SUNY New Paltz Corrected Datapath
11
11 CSE 45432 SUNY New Paltz Graphically Representing Pipelines Can help with answering questions like: –how many cycles does it take to execute this code? –what is the ALU doing during cycle 4? –use this representation to help understand datapaths
12
12 CSE 45432 SUNY New Paltz Why Pipeline? Because the resources are there! I n s t r. O r d e r Time (clock cycles) Inst 0 Inst 1 Inst 2 Inst 4 Inst 3 ALU Im Reg DmReg ALU Im Reg DmReg ALU Im Reg DmReg ALU Im Reg DmReg ALU Im Reg DmReg
13
13 CSE 45432 SUNY New Paltz Pipeline Control
14
14 CSE 45432 SUNY New Paltz Pass control signals along just like the data Pipeline Control
15
15 CSE 45432 SUNY New Paltz Datapath with Control
16
16 CSE 45432 SUNY New Paltz Designing a Pipelined Processor Go back and examine your datapath and control diagram associated resources with states ensure that flows do not conflict, or figure out how to resolve assert control in appropriate stage
17
17 CSE 45432 SUNY New Paltz Can pipelining get us into trouble? Yes: Pipeline Hazards –structural hazards: attempt to use the same resource two different ways at the same time –data hazards: attempt to use item before it is ready instruction depends on result of prior instruction still in the pipeline –control hazards: attempt to make a decision before condition is evaluated branch instructions Can always resolve hazards by waiting –pipeline control must detect the hazard –take action (or delay action) to resolve hazards
18
18 CSE 45432 SUNY New Paltz Problem with starting next instruction before first is finished dependencies that “go backward in time” are data hazards Data Hazards r $2: DM Reg Reg Reg Reg DM
19
19 CSE 45432 SUNY New Paltz Have compiler guarantee no hazards Where should compiler insert “nop” instructions? sub$2, $1, $3 and $12, $2, $5 or$13, $6, $2 add$14, $2, $2 sw$15, 100($2) Problem: –It happens too often to rely on compiler –It really slows us down! Software Solution
20
20 CSE 45432 SUNY New Paltz Use temporary results, don’t wait for them to be written Also, write register file during 1st half of clock and read during 2nd half Data Hazard Solution: Forwarding what if this $2 was $13? register $2 : DMReg Reg Reg Reg XXX–20XXXXXValue of EX/MEM : XXXX–20XXXXValue of MEM/WB : DM
21
21 CSE 45432 SUNY New Paltz Steer the result from precious instruction to the ALU EX hazard if (EX/MEM.RegWrite and (EX/MEM.RegisterRd = 0) and (EX /MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA = 10 if (EX/MEM.RegWrite and (EX/MEM.RegisterRd = 0) and (EX /MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB = 10 MEM hazard if (MEM/WB.RegWrite and (MEM/WB.RegisterRd = 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01 if (MEM/WB.RegWrite and (MEM/WB.RegisterRd = 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01 Hazard Conditions
22
22 CSE 45432 SUNY New Paltz Forwarding it IF/ID I n s t r u c t i o n M u x Rd EX/MEM.RegisterRd MEM/WB.RegisterRd Rt Rt Rs IF/ID.RegisterRd IF/ID.RegisterRt IF/ID.RegisterRt IF/ID.RegisterRs 00 Register file 01 Mem. or earlier ALU 10 Prior ALU
23
23 CSE 45432 SUNY New Paltz lw can still cause a hazard: –an instruction tries to read a register following a load instruction that writes to the same register. Thus, we need a hazard detection unit to “stall” the load instruction Can't always forward
24
24 CSE 45432 SUNY New Paltz Stalling We can stall the pipeline by keeping an instruction in the same stage Repeat in clock cycle 4 what they did in clock cycle 3
25
25 CSE 45432 SUNY New Paltz Hazard Detection Unit Stall by letting an instruction that won’t write anything go forward controls writing of the PC and IF/ID plus MUX
26
26 CSE 45432 SUNY New Paltz When we decide to branch, other instructions are in the pipeline! We are predicting “branch not taken” –need to add hardware for flushing instructions if we are wrong Branch Hazards Reg
27
27 CSE 45432 SUNY New Paltz Flushing Instructions Reduce branch delay
28
28 CSE 45432 SUNY New Paltz Improving Performance Superpipelining: ideal maximum speedup is related to number of stages Superscalar: start more than one instruction in the same cycle Dynamic pipeline scheduling –Try and avoid stalls! E.g., reorder these instructions: lw $t0, 0($t1) lw $t2, 4($t1) sw $t2, 0($t1) sw $t0, 4($t1)
29
29 CSE 45432 SUNY New Paltz Dynamic Scheduling The hardware performs the “scheduling” –hardware tries to find instructions to execute –out of order execution is possible –speculative execution and dynamic branch prediction All modern processors are very complicated –DEC Alpha 21264: 9 stage pipeline, 6 instruction issue –PowerPC and Pentium: branch history table –Compiler technology important
30
30 CSE 45432 SUNY New Paltz Dynamic Scheduling in PowerPC 604 and Pentium Pro Both In-order Issue, Out-of-order execution, In-order Commit Pentium Pro central reservation station for any functional units with one bus shared by a branch and an integer unit
31
31 CSE 45432 SUNY New Paltz Dynamic Scheduling in Pentium Pro PPro doesn’t pipeline 80x86 instructions PPro decode unit translates the Intel instructions into 72-bit micro-operations ( MIPS) Sends micro-operations to reorder buffer & reservation stations Takes 1 clock cycle to determine length of 80x86 instructions + 2 more to create the micro-operations Most instructions translate to 1 to 4 micro-operations Complex 80x86 instructions are executed by a conventional microprogram (8K x 72 bits) that issues long sequences of micro-operations
32
32 CSE 45432 SUNY New Paltz FYI: MIPS R3000 clocking discipline 2-phase non-overlapping clocks phi1 phi2 phi1 phi2 Edge-triggered
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.