1 Recap (Pipelining)
2 What is Pipelining?
A way of speeding up the execution of tasks.
Key idea: overlap the execution of multiple tasks.
3 Automobile Manufacturing
1. Build frame. 60 min.
2. Add engine. 50 min.
3. Build body. 80 min.
4. Paint. 40 min.
5. Finish. 45 min.
Total: 275 min.
Latency: time from start to finish for one car (smaller is better): 275 minutes per car.
Throughput: number of finished cars per time unit (larger is better): 1 car/275 min = 0.218 cars/hour.
Issues: How can we make the process better by adding?
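The latency and throughput figures above can be reproduced with a short Python sketch. The stage times are the ones listed on the slide; the dictionary and function names are hypothetical, chosen only for illustration, and the model assumes only one car is in the factory at a time.

```python
# Stage times (minutes) for building one car, as listed on the slide.
STAGE_MINUTES = {
    "build frame": 60,
    "add engine": 50,
    "build body": 80,
    "paint": 40,
    "finish": 45,
}

def unpipelined_latency(stages):
    """Latency: one car goes through every stage in sequence."""
    return sum(stages.values())

def unpipelined_throughput_per_hour(stages):
    """With one car in the factory at a time, one car finishes per latency period."""
    return 60 / unpipelined_latency(stages)

if __name__ == "__main__":
    print(unpipelined_latency(STAGE_MINUTES))                        # 275
    print(round(unpipelined_throughput_per_hour(STAGE_MINUTES), 3))  # 0.218
```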
4 An Assembly Line
[Figure: cars 1 to 4 moving through the five stations (60, 50, 80, 40, 45 min) over time]
The first two stages can't produce faster than one car/80 min, or a backlog will occur at the third stage.
The last two stages only receive one car/80 min to work on.
So every stage effectively takes 80 min.
Latency: 400 min/car (5 stages × 80 min)
Throughput: 4 cars/640 min (1 car/160 min); this will approach 1 car/80 min as time goes on.
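The assembly-line numbers follow from one rule: the line advances once per cycle of the slowest station, the first car needs one cycle per station, and each later car leaves one cycle after the previous one. A minimal sketch under that assumption (function names are hypothetical):

```python
# Station times (minutes) from the slide.
STAGE_MINUTES = [60, 50, 80, 40, 45]

def cycle_time(stages):
    # The line can only advance as fast as its slowest station.
    return max(stages)

def finish_time(car, stages):
    """Minute at which car number `car` (1-based) leaves the line."""
    # The first car needs len(stages) cycles; every later car follows one cycle behind.
    return (len(stages) + car - 1) * cycle_time(stages)

def minutes_per_car(cars, stages):
    """Average minutes per finished car after `cars` cars have left the line."""
    return finish_time(cars, stages) / cars

print(finish_time(1, STAGE_MINUTES))      # 400: latency of one car
print(finish_time(4, STAGE_MINUTES))      # 640: four cars done
print(minutes_per_car(4, STAGE_MINUTES))  # 160.0
for n in (100, 10_000):
    # Approaches the 80-minute cycle time as more cars are built.
    print(n, minutes_per_car(n, STAGE_MINUTES))
```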
5 Pipelining a Digital System
Key idea: break a big computation up into pieces.
Separate each piece with a pipeline register.
[Figure: a 1ns block of logic split into 200ps pieces separated by pipeline registers]
6 Pipelining a Digital System
Why do this? Because it's faster for repeated computations.
Non-pipelined: 1 operation finishes every 1ns.
Pipelined: 1 operation finishes every 200ps.
7 Comments about pipelining
Pipelining increases throughput, but does not reduce latency
–Answer available every 200ps, BUT
–A single computation still takes 1ns
Limitations:
–Computations must be divisible into stages of equal size
–Pipeline registers add overhead
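The throughput/latency trade-off can be made concrete by timing N back-to-back operations. This sketch assumes the 1ns computation splits into exactly five equal 200ps stages and ignores pipeline-register delay (the slide notes that registers do add overhead in practice); the constant and function names are hypothetical.

```python
UNPIPELINED_PS = 1000                 # one 1ns block of logic
STAGE_PS = 200                        # the same logic split into 200ps pieces
STAGES = UNPIPELINED_PS // STAGE_PS   # 5 stages (assumes an equal split, no register delay)

def unpipelined_total_ps(n_ops):
    # Each operation must finish before the next one starts.
    return n_ops * UNPIPELINED_PS

def pipelined_total_ps(n_ops):
    # First result after STAGES cycles, then one new result every cycle.
    return (STAGES + n_ops - 1) * STAGE_PS

latency_ps = STAGES * STAGE_PS  # a single operation still takes 1000ps
print("latency:", latency_ps)
for n in (1, 10, 1000):
    print(n, unpipelined_total_ps(n), pipelined_total_ps(n))
```

For one operation both designs take 1000ps; for long runs the pipelined version approaches one result per 200ps.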
8 Another Example
[Figure: unpipelined system, 30ns of combinational logic followed by a 3ns register, driven by a clock; timing diagram of Op1, Op2, Op3]
Delay = 33ns
Throughput = 30MHz
–One operation must complete before the next can begin
–Operations spaced 33ns apart
9 3-Stage Pipelining
[Figure: the logic split into three stages, each 10ns of combinational logic followed by a 3ns register; timing diagram of Op1 to Op4]
Delay = 39ns
Throughput = 77MHz
–Space operations 13ns apart
–3 operations occur simultaneously
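The two timing diagrams (unpipelined with operations 33ns apart, 3-stage with operations 13ns apart) can be generated as start/finish times. A minimal sketch, assuming the 30ns of logic splits evenly into three 10ns stages as on the slide; the function names are hypothetical.

```python
def unpipelined_schedule(n_ops, logic_ns=30, reg_ns=3):
    """(start, finish) in ns; each operation occupies the whole circuit."""
    period = logic_ns + reg_ns            # 33ns
    return [(i * period, (i + 1) * period) for i in range(n_ops)]

def pipelined_schedule(n_ops, stages=3, logic_ns=30, reg_ns=3):
    """(start, finish) in ns for logic split evenly into `stages` pieces."""
    period = logic_ns / stages + reg_ns   # 10 + 3 = 13ns between operations
    latency = stages * period             # 39ns per operation
    return [(i * period, i * period + latency) for i in range(n_ops)]

def in_flight(schedule, t):
    """Number of operations being processed at time t (ns)."""
    return sum(1 for start, finish in schedule if start <= t < finish)

print("unpipelined:", unpipelined_schedule(4))
print("3-stage:    ", pipelined_schedule(4))
print("in flight at t=30ns:", in_flight(pipelined_schedule(4), 30))  # 3
```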
10 Limitation: Nonuniform Pipelining
[Figure: three stages with 5ns, 15ns and 10ns of combinational logic, each followed by a 3ns register]
Clock period = slowest stage + register = 15ns + 3ns = 18ns
Delay = 18 * 3 = 54 ns
Throughput = 55MHz
–Throughput limited by slowest stage
–Delay determined by clock period * number of stages
–Must attempt to balance stages
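The rule on this slide (the slowest stage sets the clock, and every stage then costs one full clock) fits in one function. This is a sketch that assumes the same 3ns register delay after every stage, as in the figure; the function name is hypothetical.

```python
def pipeline_timing(stage_logic_ns, reg_ns=3):
    """Return (clock period ns, latency ns, throughput MHz) for the given stage delays."""
    clock_ns = max(stage_logic_ns) + reg_ns       # slowest stage sets the clock
    latency_ns = clock_ns * len(stage_logic_ns)   # every stage takes one full clock
    throughput_mhz = 1000 / clock_ns              # 1 / period, converted from ns to MHz
    return clock_ns, latency_ns, throughput_mhz

balanced = pipeline_timing([10, 10, 10])    # clock 13, latency 39, about 76.9 MHz
unbalanced = pipeline_timing([5, 15, 10])   # clock 18, latency 54, about 55.6 MHz
print(balanced)
print(unbalanced)
```

Both pipelines contain the same 30ns of logic; only the balance between stages differs.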
11 Limitation: Deep Pipelines
–Diminishing returns as more pipeline stages are added
–Register delays become the limiting factor
–Increased latency
–Small throughput gains
–More hazards
[Figure: six stages, each 5ns of combinational logic followed by a 3ns register]
Clock period = 5ns + 3ns = 8ns
Delay = 48ns, Throughput = 125MHz
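The diminishing returns can be seen by cutting the same 30ns of logic into more and more equal stages, each paying a 3ns register delay. This sketch assumes (hypothetically) that the logic can be split arbitrarily finely; real circuits cannot, and the hazard cost of deep pipelines is not modeled.

```python
def split_evenly(total_logic_ns, stages, reg_ns=3):
    """Clock period (ns), latency (ns) and throughput (MHz) for equal stages."""
    clock_ns = total_logic_ns / stages + reg_ns
    return clock_ns, clock_ns * stages, 1000 / clock_ns

for k in (1, 2, 3, 6, 10, 30):
    clock, latency, mhz = split_evenly(30, k)
    print(f"{k:2d} stages: clock {clock:5.1f} ns, latency {latency:5.1f} ns, {mhz:6.1f} MHz")
```

Throughput can never exceed 1/3ns (about 333MHz) however many stages are added, while latency keeps growing by 3ns per extra stage.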
12 Pipelining MIPS
13 MIPS 5-stage pipeline
The MIPS pipeline divides the execution of each instruction into 5 stages:
–IF - Instruction Fetch
–ID - Instruction Decode
–EX - Execute / Address Calculation
–MEM - Memory Access (read / write)
–WB - Write Back (results into register file)
Not all instructions need all the stages (e.g., an add instruction does not need the MEM stage).
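The note that not every instruction needs every stage can be tabulated for the four instruction types used in the example that follows. This is an illustrative sketch of the classic 5-stage MIPS behavior (the dictionary is hypothetical, not taken from the slides); in the pipelined datapath each instruction still passes through all five stages, the unused ones simply doing no work.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")

# Stages that do useful work for each instruction type (classic 5-stage MIPS).
USES = {
    "lw":  {"IF", "ID", "EX", "MEM", "WB"},  # address calc, memory read, register write
    "sw":  {"IF", "ID", "EX", "MEM"},        # memory write, no register result
    "add": {"IF", "ID", "EX", "WB"},         # ALU result goes straight to a register
    "sub": {"IF", "ID", "EX", "WB"},
}

def idle_stages(op):
    """Stages an instruction passes through without doing useful work."""
    return [s for s in STAGES if s not in USES[op]]

for op in USES:
    print(op, "idle in:", idle_stages(op) or "none")
```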
14 Basic MIPS Pipelined Processor
[Figure: MIPS datapath with pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB between the five stages]
15 Pipelined Example - Executing Multiple Instructions
Consider the following instruction sequence:
lw $r0, 10($r1)
sw $r3, 20($r4)
add $r5, $r6, $r7
sub $r8, $r9, $r10
16 Executing Multiple Instructions
Clock Cycle 1: LW (IF)
17 Executing Multiple Instructions
Clock Cycle 2: LW (ID), SW (IF)
18 Executing Multiple Instructions
Clock Cycle 3: LW (EX), SW (ID), ADD (IF)
19 Executing Multiple Instructions
Clock Cycle 4: LW (MEM), SW (EX), ADD (ID), SUB (IF)
20 Executing Multiple Instructions
Clock Cycle 5: LW (WB), SW (MEM), ADD (EX), SUB (ID)
21 Executing Multiple Instructions
Clock Cycle 6: SW (WB), ADD (MEM), SUB (EX)
22 Executing Multiple Instructions
Clock Cycle 7: ADD (WB), SUB (MEM)
23 Executing Multiple Instructions
Clock Cycle 8: SUB (WB)
24 Alternative View - Multicycle Diagram
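A text version of the multicycle diagram for the four-instruction example can be generated programmatically. This sketch assumes one instruction enters IF per clock cycle and no stalls (the example sequence has no register dependences); the function names are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
PROGRAM = ["lw", "sw", "add", "sub"]

def multicycle_diagram(program, stages=STAGES):
    """One row per instruction: the stage it occupies in each clock cycle."""
    cycles = len(stages) + len(program) - 1
    rows = []
    for i, instr in enumerate(program):
        # Instruction i enters IF in cycle i+1 and advances one stage per cycle.
        row = [""] * cycles
        for s, stage in enumerate(stages):
            row[i + s] = stage
        rows.append((instr, row))
    return rows

def render(rows):
    """Format the diagram as a table with clock cycles as columns."""
    cycles = len(rows[0][1])
    lines = ["      " + "".join(f"{c:>5}" for c in range(1, cycles + 1))]
    for instr, row in rows:
        lines.append(f"{instr:<6}" + "".join(f"{cell:>5}" for cell in row))
    return "\n".join(lines)

print(render(multicycle_diagram(PROGRAM)))
```

Reading down any column gives the snapshot for that clock cycle; for example, column 5 shows lw in WB, sw in MEM, add in EX and sub in ID.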
25 Processor Pipelining
There are two ways that pipelining can help:
1. Reduce the clock cycle time, and keep the same CPI
2. Reduce the CPI, and keep the same clock cycle time
CPU time = Instruction count * CPI * Clock cycle time
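Plugging the CPI and clock values from the next four slides into the CPU-time equation shows that both routes give the same 5x gain. The slides leave X symbolic, so the 100MHz base clock and the one-million instruction count here are hypothetical values chosen only to get concrete numbers.

```python
def cpu_time(instruction_count, cpi, clock_hz):
    """CPU time = Instruction count * CPI * Clock cycle time (seconds)."""
    return instruction_count * cpi / clock_hz  # clock cycle time = 1 / clock_hz

X = 100e6        # hypothetical base clock rate in Hz
IC = 1_000_000   # hypothetical instruction count

configs = {
    "CPI 1, X Hz (before route 1)":  (1, X),
    "CPI 5, 5X Hz (before route 2)": (5, 5 * X),
    "CPI 1, 5X Hz (pipelined)":      (1, 5 * X),
}
for name, (cpi, hz) in configs.items():
    print(f"{name}: {cpu_time(IC, cpi, hz) * 1e3:.1f} ms")
```

Route 1 keeps CPI = 1 and raises the clock 5x; route 2 keeps the 5X Hz clock and cuts CPI from 5 to 1. Either way CPU time drops by a factor of 5.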
26 Reduce the clock cycle time, and keep the same CPI
CPI = 1, Clock = X Hz
27 Reduce the clock cycle time, and keep the same CPI
[Figure: MIPS datapath (PC, instruction memory, register file, sign extension, ALU, data memory) divided by pipeline registers]
CPI = 1, Clock = X*5 Hz
28 Reduce the CPI, and keep the same cycle time
CPI = 5, Clock = X*5 Hz
29 Reduce the CPI, and keep the same cycle time
[Figure: MIPS datapath (PC, instruction memory, register file, sign extension, ALU, data memory) divided by pipeline registers]
CPI = 1, Clock = X*5 Hz
30 Pipeline performance
Ideally, we get a speedup (by reducing the clock cycle time or reducing the CPI) equal to the number of stages.
In practice we do not achieve that, but we get close. The shortfall comes from:
–Additional pipelining overhead (e.g., pipeline registers)
–Pipeline hazards
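Two effects keep the speedup below the stage count even without hazards: the cycles needed to fill the pipeline and the register overhead per stage. This sketch assumes perfectly balanced stages, no stalls, and (as in the earlier 30ns example) one register on the unpipelined path; the function and parameter names are hypothetical.

```python
def pipeline_speedup(n_instructions, k_stages, stage_ns=1.0, reg_ns=0.0):
    """Speedup of a k-stage pipeline over running each instruction start to finish.

    Assumes k balanced stages of `stage_ns` logic each, a register delay of
    `reg_ns` per stage, and no hazards (no stalls).
    """
    unpipelined = n_instructions * (k_stages * stage_ns + reg_ns)
    pipelined = (k_stages + n_instructions - 1) * (stage_ns + reg_ns)
    return unpipelined / pipelined

print(pipeline_speedup(1, 5))                 # 1.0: a single instruction gains nothing
print(pipeline_speedup(1_000_000, 5))         # close to 5, the ideal
print(pipeline_speedup(1_000_000, 3, 10, 3))  # close to 33/13, about 2.54, not 3
```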
31 Pipeline Hazards
Hazards are situations in pipelining that prevent the next instruction in the instruction stream from executing during its designated clock cycle.
Hazards reduce the ideal speedup gained from pipelining (they push CPI above the ideal value of 1) and are classified into three classes:
–Structural hazards
–Data hazards
–Control hazards
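A common first-order way to quantify the cost of hazards is to add the average stall cycles per instruction to the ideal CPI of 1; with balanced stages and no register overhead, the speedup over the unpipelined machine is then roughly pipeline depth / (1 + stalls per instruction). The stall rates in this sketch are hypothetical values, not measurements.

```python
def effective_cpi(stall_cycles_per_instruction, ideal_cpi=1.0):
    # Each hazard-induced stall adds whole cycles on top of the ideal CPI.
    return ideal_cpi + stall_cycles_per_instruction

def speedup_with_stalls(pipeline_depth, stall_cycles_per_instruction):
    """Approximate speedup: depth / (1 + stalls per instruction)."""
    return pipeline_depth / effective_cpi(stall_cycles_per_instruction)

# Hypothetical average stall rates for a 5-stage pipeline.
for stalls in (0.0, 0.2, 0.5, 1.0):
    print(f"stalls/instr {stalls:.1f}: CPI {effective_cpi(stalls):.1f}, "
          f"speedup {speedup_with_stalls(5, stalls):.2f}")
```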