Designing a Pipelined CPU


1 Designing a Pipelined CPU
Read 4.7 for next class.
Peer Instruction Lecture Materials for Computer Architecture by Dr. Leo Porter is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

2 Review -- Single Cycle CPU

3 Review -- Multiple Cycle CPU

4 Review -- Instruction Latencies
Single-Cycle CPU
  Load: Ifetch, Reg/Dec, Exec, Mem, Wr
Multiple Cycle CPU (Cycle 1 to Cycle 5)
  Load: Ifetch, Reg/Dec, Exec, Mem, Wr
  Add:  Ifetch, Reg/Dec, Exec, Wr
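To make the latency comparison above concrete, here is a minimal sketch. The stage delays are hypothetical (the slides give no numbers), and the function names are made up for illustration. It assumes the single-cycle clock is sized for the slowest instruction and the multi-cycle clock for the slowest stage.

```python
# Hypothetical stage delays in picoseconds; the slides do not give any numbers.
STAGE_DELAY_PS = {"Ifetch": 200, "Reg/Dec": 100, "Exec": 200, "Mem": 200, "Wr": 100}

# Stages used by each instruction, as listed on the slide.
STAGES = {
    "load": ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"],
    "add": ["Ifetch", "Reg/Dec", "Exec", "Wr"],
}


def single_cycle_latency(instr):
    # One clock period sized for the slowest instruction.
    # Every instruction pays that full period, so instr does not change the result.
    return max(sum(STAGE_DELAY_PS[s] for s in stages) for stages in STAGES.values())


def multi_cycle_latency(instr):
    # Clock period sized for the slowest stage; one cycle per stage the instruction uses.
    cycle_time = max(STAGE_DELAY_PS.values())
    return len(STAGES[instr]) * cycle_time


for instr in STAGES:
    print(instr, single_cycle_latency(instr), multi_cycle_latency(instr))
```

Under these assumed delays the multi-cycle load is slower than the single-cycle load, because every cycle is as long as the slowest stage.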

5 Instruction Latencies and Throughput
Single-Cycle CPU
  Load: Ifetch, Reg/Dec, Exec, Mem, Wr
Multiple Cycle CPU (Cycle 1 to Cycle 5)
  Load: Ifetch, Reg/Dec, Exec, Mem, Wr
Pipelined CPU (Cycle 1 to Cycle 8)
  [empty timeline]

6 Instruction Latencies and Throughput
Which of the statements below is true about a pipelined processor?
A. Instruction latency remains essentially unchanged from single-cycle (minus some overheads); instruction throughput increases
B. Instruction latency remains essentially unchanged from multi-cycle (minus some overheads); instruction throughput increases
C. Instruction latency improves by a factor of 5 over single-cycle (minus some overheads); instruction throughput increases
D. Instruction latency improves by a factor of 5 over multi-cycle (minus some overheads); instruction throughput increases
E. None of the above
Instructor note: Tails - flipped. Correct Answer - C

7 Instruction Latencies and Throughput
Single-Cycle CPU
  Load: Ifetch, Reg/Dec, Exec, Mem, Wr
Multiple Cycle CPU (Cycle 1 to Cycle 5)
  Load: Ifetch, Reg/Dec, Exec, Mem, Wr
Pipelined CPU (Cycle 1 to Cycle 8)
  Load 1: Ifetch, Reg/Dec, Exec, Mem, Wr (Cycles 1-5)

8 Instruction Latencies and Throughput
Single-Cycle CPU
  Load: Ifetch, Reg/Dec, Exec, Mem, Wr
Multiple Cycle CPU (Cycle 1 to Cycle 5)
  Load: Ifetch, Reg/Dec, Exec, Mem, Wr
Pipelined CPU (Cycle 1 to Cycle 8)
  Load 1: Ifetch, Reg/Dec, Exec, Mem, Wr (Cycles 1-5)
  Load 2: Ifetch, Reg/Dec, Exec, Mem, Wr (Cycles 2-6)

9 Instruction Latencies and Throughput
Single-Cycle CPU
  Load: Ifetch, Reg/Dec, Exec, Mem, Wr
Multiple Cycle CPU (Cycle 1 to Cycle 5)
  Load: Ifetch, Reg/Dec, Exec, Mem, Wr
Pipelined CPU (Cycle 1 to Cycle 8)
  Load 1: Ifetch, Reg/Dec, Exec, Mem, Wr (Cycles 1-5)
  Load 2: Ifetch, Reg/Dec, Exec, Mem, Wr (Cycles 2-6)
  Load 3: Ifetch, Reg/Dec, Exec, Mem, Wr (Cycles 3-7)
  Load 4: Ifetch, Reg/Dec, Exec, Mem, Wr (Cycles 4-8)
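The staggered pipelined timeline above can be generated with a small sketch. It assumes an ideal pipeline (one instruction issued per cycle, no stalls); the function names are hypothetical and not part of the lecture materials.

```python
STAGES = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]


def pipeline_schedule(n_instructions, stages=STAGES):
    """Return one dict per instruction mapping 1-based cycle -> stage name.

    Assumes an ideal pipeline: one instruction issues per cycle, no stalls.
    """
    return [
        {issue + offset + 1: stage for offset, stage in enumerate(stages)}
        for issue in range(n_instructions)
    ]


def total_cycles(n_instructions, n_stages=len(STAGES)):
    # The first instruction needs n_stages cycles; each later one
    # finishes one cycle after its predecessor.
    return n_stages + n_instructions - 1 if n_instructions else 0


def render(schedule):
    # One text row per instruction, one fixed-width column per cycle.
    last = max(max(row) for row in schedule)
    lines = []
    for i, row in enumerate(schedule, start=1):
        cells = [row.get(c, "").ljust(8) for c in range(1, last + 1)]
        lines.append(f"Load {i}: " + "".join(cells).rstrip())
    return "\n".join(lines)


print(render(pipeline_schedule(4)))
print("cycles:", total_cycles(4))
```

Four loads finish in 8 cycles, which matches the Cycle 1 to Cycle 8 timeline on the slide, while each individual load still occupies 5 cycles.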

10 Pipelining Advantages
Instructor note: PI, throughput vs. latency.
Higher maximum throughput
Higher utilization of CPU resources
But, more complicated datapath, more complex control(?)

11 Pipelining Throughput
Instructor note: PI, matching.
Peak Throughput
1. Longest instruction
2. Average instruction
3. Cycle time
Selection   SC   MC   Pipeline
A           1    2    3
B
C
D
E           None of the above

12 A Pipelined Datapath
IF: Instruction fetch
ID: Instruction decode and register fetch
EX: Execution and effective address calculation
MEM: Memory access
WB: Write back

13 Idea for a Pipelined Datapath

14 Execution in a Pipelined Datapath
[Pipeline diagram: five lw instructions over CC1-CC9, each passing through IF, ID, EX, MEM, WB and drawn with the IM, Reg, ALU, DM blocks.]

15 Execution in a Pipelined Datapath
[Pipeline diagram: five lw instructions over CC1-CC9, each passing through IF, ID, EX, MEM, WB and drawn with the IM, Reg, ALU, DM blocks; the steady state is marked.]
Instructor note: make sure to point out that steady-state CPI = 1.
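As a quick check on the steady-state claim, the sketch below computes average CPI for an ideal 5-stage pipeline, counting the cycles needed to fill it. The function name is hypothetical, and stalls and hazards are assumed away.

```python
def average_cpi(n_instructions, n_stages=5):
    """Cycles per instruction for an ideal (stall-free) pipeline, including fill time."""
    cycles = n_stages + n_instructions - 1
    return cycles / n_instructions


# The fill cost is amortized as the instruction count grows.
for n in (5, 100, 1_000_000):
    print(n, round(average_cpi(n), 4))
```

For the five lw instructions in the diagram the average is 9 cycles / 5 instructions; as the instruction count grows, the value approaches 1.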

16 Pipeline Stages
Should we force every instruction to go through all 5 stages? Can we break it up like we did for multi-cycle, with R-type taking 4 cycles instead of 5?
Selection   Yes/No   Reason (Choose BEST answer)
A           Yes      Decreasing R-type to 4 cycles improves instruction throughput
B           Yes      Decreasing R-type to 4 cycles improves instruction latency
C           No       Decreasing R-type to 4 cycles causes hazards
D           No       Decreasing R-type to 4 cycles causes hazards and doesn't impact throughput
E           No       Decreasing R-type to 4 cycles causes hazards and doesn't impact latency
Instructor note: Tails - flipped. Correct Answer - C

17 Mixed Instructions in the Pipeline
[Pipeline diagram: lw followed by add over CC1-CC6, drawn with the IM, Reg, ALU, DM blocks.]

18 Pipeline Principles
All instructions that share a pipeline must have the same stages in the same order.
  Therefore, add does nothing during the MEM stage.
  sw does nothing during the WB stage.
All intermediate values must be latched each cycle.
There is no functional block reuse.
[Diagram: stages IF, ID, EX, MEM, WB drawn with the IM, Reg, ALU, DM blocks.]
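One way to see why all instructions must share the same stages is to count which stage each instruction occupies in each cycle. The sketch below is illustrative only: the 4-stage add path is the hypothetical shortcut from the earlier question, and the helper names are made up.

```python
FIVE_STAGE = ["IF", "ID", "EX", "MEM", "WB"]
FOUR_STAGE_ADD = ["IF", "ID", "EX", "WB"]  # hypothetical shortened R-type path


def stage_usage(program, stage_lists):
    """Map (cycle, stage) -> list of instructions occupying that stage in that cycle.

    program: instruction names in issue order, one issued per cycle from cycle 1.
    stage_lists: dict from instruction name to its ordered list of stages.
    """
    usage = {}
    for issue_index, name in enumerate(program):
        for offset, stage in enumerate(stage_lists[name]):
            cycle = issue_index + 1 + offset
            usage.setdefault((cycle, stage), []).append(name)
    return usage


def conflicts(program, stage_lists):
    # A conflict is any stage that two instructions want in the same cycle.
    usage = stage_usage(program, stage_lists)
    return {key: names for key, names in usage.items() if len(names) > 1}


program = ["lw", "add"]
print(conflicts(program, {"lw": FIVE_STAGE, "add": FOUR_STAGE_ADD}))
print(conflicts(program, {"lw": FIVE_STAGE, "add": FIVE_STAGE}))
```

With the shortened add, both instructions reach WB in cycle 5 and compete for the register write port. With five stages for both, no stage is shared in any cycle.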

19 Pipelined Datapath
[Datapath figure with stages: Instruction Fetch, Instruction Decode/Register Fetch, Execute/Address Calculation, Memory Access, Write Back. Callout: registers!]

20 The Pipeline in Execution
IF: add $10, $1, $2
ID, EX, MEM, WB: empty
Instructor note: draw active datapath through example.

21 The Pipeline in Execution
IF: lw $12, 1000($4)
ID: add $10, $1, $2
EX, MEM, WB: empty

22 The Pipeline in Execution
IF: sub $15, $4, $1
ID: lw $12, 1000($4)
EX: add $10, $1, $2
MEM, WB: empty

23 The Pipeline in Execution
IF: empty
ID: sub $15, $4, $1
EX: lw $12, 1000($4)
MEM: add $10, $1, $2
WB: empty

24 The Pipeline in Execution
IF, ID: empty
EX: sub $15, $4, $1
MEM: lw $12, 1000($4)
WB: add $10, $1, $2

25 The Pipeline in Execution
IF, ID: empty
EX: sub $15, $4, $1
MEM: lw $12, 1000($4)
WB: add $10, $1, $2
Write Register is wrong. Did someone spot the error?

26 The Pipeline in Execution
IF, ID, EX: empty
MEM: sub $15, $4, $1
WB: lw $12, 1000($4)

27 The Pipeline, with controls But….

28 Pipeline Control
Control Logic
X. Combinational Logic
Y. FSM or Microprogram
Selection   SC   MC   Pipeline
A           X    Y
B
C
D
E           None of the above

29 Pipelined Control
[Figure labels: pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB; control; instruction.]

30 Pipelined Control Signals

31 The Pipeline with Control Logic

32 Is it really this simple?

33 What just happened here which is problematic (BEST ANSWER)?
[Pipeline diagram over CC1-CC8, each instruction drawn with the IM, Reg, ALU, DM blocks:]
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
A. The register file is trying to read and write the same register
B. The ALU and data memory are both active in the same cycle
C. A value is used before it is produced
D. Both A and B
E. Both A and C
Instructor note: Neg edge!

34 Data Hazards
When a result is needed in the pipeline before it is available, a “data hazard” occurs.
[Pipeline diagram over CC1-CC8, each instruction drawn with the IM, Reg, ALU, DM blocks, marking where R2 is available and where R2 is needed:]
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
Instructor note: use red and black lines!
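The timing behind this data hazard can be sketched in a few lines. The model below is an assumption, not the lecture's datapath: a 5-stage pipeline with no forwarding and no stalls, where an instruction reads registers in ID and writes in WB. The `split_cycle_regfile` flag (a hypothetical name) models a register file that writes in the first half of a cycle and reads in the second half.

```python
# (instruction text, destination register or None, source registers)
PROGRAM = [
    ("sub $2, $1, $3", "$2", ["$1", "$3"]),
    ("and $12, $2, $5", "$12", ["$2", "$5"]),
    ("or $13, $6, $2", "$13", ["$6", "$2"]),
    ("add $14, $2, $2", "$14", ["$2", "$2"]),
    ("sw $15, 100($2)", None, ["$15", "$2"]),
]


def data_hazards(program, split_cycle_regfile=True):
    """Return (producer, consumer, register) triples where the consumer reads too early.

    Assumed timing: instruction i (0-based) is fetched in cycle i + 1,
    reads registers in ID (cycle i + 2) and writes its result in WB (cycle i + 5).
    """
    hazards = []
    for j, (consumer, _, sources) in enumerate(program):
        read_cycle = j + 2
        for reg in dict.fromkeys(sources):  # de-duplicate, keep order
            # Find the most recent earlier instruction that writes this register.
            for i in range(j - 1, -1, -1):
                producer, dest, _ = program[i]
                if dest == reg:
                    write_cycle = i + 5
                    if split_cycle_regfile:
                        ok = read_cycle >= write_cycle
                    else:
                        ok = read_cycle > write_cycle
                    if not ok:
                        hazards.append((producer, consumer, reg))
                    break
    return hazards


for producer, consumer, reg in data_hazards(PROGRAM):
    print(f"{consumer!r} reads {reg} before {producer!r} writes it")
```

Under these assumptions `and` and `or` read $2 too early. `add` reads it in the same cycle that `sub` writes it, which is only safe with the split-cycle register file.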

35 Hazards continued
What happens when...
add $3, $10, $11
lw $8, 1000($3)
sub $11, $8, $7
Instructor note: draw dependencies.
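The dependencies in this three-instruction sequence can be listed mechanically. The sketch below is a simple pairwise check with hypothetical names. It labels each dependency with the usual terms read-after-write (RAW), write-after-read (WAR), and write-after-write (WAW), and it does not account for an intervening instruction redefining a register.

```python
# (instruction text, destination register or None, source registers)
PROGRAM = [
    ("add $3, $10, $11", "$3", ["$10", "$11"]),
    ("lw $8, 1000($3)", "$8", ["$3"]),
    ("sub $11, $8, $7", "$11", ["$8", "$7"]),
]


def dependencies(program):
    """List (kind, earlier, later, register) for every ordered pair of instructions."""
    found = []
    for i, (early, early_dest, early_srcs) in enumerate(program):
        for late, late_dest, late_srcs in program[i + 1:]:
            if early_dest in late_srcs:
                found.append(("RAW", early, late, early_dest))
            if late_dest in early_srcs:
                found.append(("WAR", early, late, late_dest))
            if early_dest is not None and early_dest == late_dest:
                found.append(("WAW", early, late, early_dest))
    return found


for kind, early, late, reg in dependencies(PROGRAM):
    print(f"{kind}: {late!r} depends on {early!r} through {reg}")
```

The two RAW entries (through $3 and $8) are the ones that matter for an in-order 5-stage pipeline like this one. The WAR entry on $11 is harmless there, because add reads $11 long before sub writes it.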

36 The Pipeline in Execution
IF: lw $8, 1000($3)
ID: add $3, $10, $11
EX, MEM, WB: empty

37 The Pipeline in Execution
IF: sub $11, $8, $7
ID: lw $8, 1000($3)
EX: add $3, $10, $11
MEM, WB: empty

38 The Pipeline in Execution
IF: add $10, $1, $2
ID: sub $11, $8, $7
EX: lw $8, 1000($3)
MEM: add $3, $10, $11
WB: empty

39 Pipelining Key Points
ET = IC * CPI * CT
We achieve high throughput without reducing instruction latency.
Pipelining exploits a special kind of parallelism (parallelism between functionality required in different cycles).
Pipelining uses combinational logic to generate (and registers to propagate) control signals.
Pipelining creates potential hazards.
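The ET = IC * CPI * CT relation is easy to evaluate for the three designs. Every number below is hypothetical (the slides give none): an 800 ps single-cycle clock, a 200 ps clock for the other two, an assumed multi-cycle average CPI of 4.2, and an ideal hazard-free pipeline with CPI of 1.

```python
def execution_time(instruction_count, cpi, cycle_time_ps):
    """ET = IC * CPI * CT, returned in picoseconds."""
    return instruction_count * cpi * cycle_time_ps


IC = 1_000_000  # hypothetical instruction count

# Hypothetical CPI and cycle time (ps) per design; not taken from the slides.
DESIGNS = {
    "single-cycle": {"cpi": 1.0, "ct": 800},
    "multi-cycle": {"cpi": 4.2, "ct": 200},  # assumed instruction mix
    "pipeline": {"cpi": 1.0, "ct": 200},     # ideal: no hazards, fill time ignored
}

for name, d in DESIGNS.items():
    print(name, execution_time(IC, d["cpi"], d["ct"]))
```

Under these assumptions the pipeline wins on execution time by combining the short cycle time of the multi-cycle design with the CPI of the single-cycle design. Real pipelines lose some of that to hazards.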

40 Summary comparison (MIPS designs)
Design         CPI   CT
Single Cycle
Multi-cycle
Pipeline

