Designing a Pipelined CPU

Designing a Pipelined CPU
Read 4.7 for next class Designing a Pipelined CPU Peer Instruction Lecture Materials for Computer Architecture by Dr. Leo Porter is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Review -- Single Cycle CPU

Review -- Multiple Cycle CPU

Review -- Instruction Latencies
Single-Cycle CPU Load Ifetch Reg/Dec Exec Mem Wr Multiple Cycle CPU Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Load Ifetch Reg/Dec Exec Mem Wr Add Ifetch Reg/Dec Exec Wr

Instruction Latencies and Throughput
Single-Cycle CPU Ifetch Reg/Dec Exec Mem Wr Load Multiple Cycle CPU Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Ifetch Reg/Dec Exec Mem Wr Load Pipelined CPU Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8

Which of the statements below is true about a pipelined processor? Selection Statement A Instruction latency remains essentially unchanged from single-cycle (minus some overheads); Instruction throughput increases B Instruction latency remains essentially unchanged from multi-cycle (minus some overheads); Instruction throughput increases C Instruction latency improves by a factor of 5 over single-cycle (minus some overheads); Instruction throughput increases D Instruction latency improves by a factor of 5 over multi-cycle (minus some overheads); Instruction throughput increases E None of the above Tails - flipped Correct Answer - C

Single-Cycle CPU Ifetch Reg/Dec Exec Mem Wr Load Multiple Cycle CPU Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Ifetch Reg/Dec Exec Mem Wr Load Pipelined CPU Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Ifetch Reg/Dec Exec Mem Wr Load

Single-Cycle CPU Ifetch Reg/Dec Exec Mem Wr Load Multiple Cycle CPU Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Ifetch Reg/Dec Exec Mem Wr Load Pipelined CPU Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Ifetch Reg/Dec Exec Mem Wr Load Ifetch Reg/Dec Exec Mem Wr Load

Single-Cycle CPU Ifetch Reg/Dec Exec Mem Wr Load Multiple Cycle CPU Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Ifetch Reg/Dec Exec Mem Wr Load Pipelined CPU Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Ifetch Reg/Dec Exec Mem Wr Load Ifetch Reg/Dec Exec Mem Wr Load Ifetch Reg/Dec Exec Mem Wr Load Ifetch Reg/Dec Exec Mem Wr Load

Pipelining Advantages
PI throughput Vs. latency Higher maximum throughput Higher utilization of CPU resources But, more complicated datapath, more complex control(?)

Pipelining Throughput
PI Matching Peek Throughput 1. Longest instruction 2. Average instruction 3. Cycle time Selection SC MC Pipeline A 1 2 3 B C D E None of the above

A Pipelined Datapath IF: Instruction fetch
ID: Instruction decode and register fetch EX: Execution and effective address calculation MEM: Memory access WB: Write back

Idea for a Pipelined Datapath

Execution in a Pipelined Datapath
CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8 CC9 IF ID EX MEM WB IM Reg ALU DM lw IF ID EX MEM WB IM Reg ALU DM lw IM Reg ALU DM lw IM Reg ALU DM lw IM Reg ALU DM lw

Execution in a Pipelined Datapath
CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8 CC9 Make sure to point Out Steady State CPI = 1 IF ID EX MEM WB IM Reg ALU DM lw IF ID EX MEM WB IM Reg ALU DM lw IM Reg ALU DM lw IM Reg ALU DM lw IM Reg ALU DM lw steady state

Reason (Choose BEST answer)
Pipeline Stages Should we force every instruction to go through all 5 stages? Can we break it up like we did for multi-cycle, with R-type taking 4 cycles instead of 5? Selection Yes/No Reason (Choose BEST answer) A Yes Decreasing R-type to 4 cycles improves instruction throughput B Decreasing R-type to 4 cycles improves instruction latency C No Decreasing R-type to 4 cycles causes hazards D Decreasing R-type to 4 cycles causes hazards and doesn’t impact throughput E Decreasing R-type to 4 cycles causes hazards and doesn’t impact latency Tails - flipped Correct Answer - C

Mixed Instructions in the Pipeline
CC1 CC2 CC3 CC4 CC5 CC6 IM Reg ALU DM lw add

Pipeline Principles All instructions that share a pipeline must have the same stages in the same order. therefore, add does nothing during Mem stage sw does nothing during WB stage All intermediate values must be latched each cycle. There is no functional block reuse IF ID EX MEM WB IM Reg ALU DM

Pipelined Datapath Instruction Fetch Instruction Decode/
Register Fetch Execute/ Address Calculation Memory Access Write Back registers!

The Pipeline in Execution
add $10, $1, $2 Instruction Decode/ Register Fetch Execute/ Address Calculation Memory Access Write Back Draw active datapath through example

lw $12, 1000($4) add $10, $1, $2 Execute/ Address Calculation Memory Access Write Back

sub $15, $4, $1 lw $12, 1000($4) add $10, $1, $2 Memory Access Write Back

Instruction Fetch sub $15, $4, $1 lw $12, 1000($4) add $10, $1, $2 Write Back

Instruction Fetch Instruction Decode/ Register Fetch sub $15, $4, $1 lw $12, 1000($4) add $10, $1, $2

Instruction Fetch Instruction Decode/ Register Fetch sub $15, $4, $1 lw $12, 1000($4) add $10, $1, $2 Write Register is wrong Did someone spot the error??

Instruction Fetch Instruction Decode/ Register Fetch Execute/ Address Calculation sub $15, $4, $1 lw $12, 1000($4)

The Pipeline, with controls But….

Pipeline Control X Y None of the above Selection SC MC Pipeline A B C
Control Logic X. Combinational Logic Y. FSM or Microprogram Selection SC MC Pipeline A X Y B C D E None of the above

Pipelined Control IF/ID ID/EX EX/MEM MEM/WB control instruction

Pipelined Control Signals

The Pipeline with Control Logic

Is it really this simple?

Neg edge! What just happened here which is problematic (BEST ANSWER)?
CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8 IM Reg ALU DM sub $2, $1, $3 IM Reg ALU DM and $12, $2, $5 IM Reg ALU DM or $13, $6, $2 IM Reg ALU DM add $14, $2, $2 ALU IM Reg DM sw $15, 100($2) What just happened here which is problematic (BEST ANSWER)? The register file is trying to read and write the same register The ALU and data memory are both active in the same cycle A value is used before it is produced Both A and B Both A and C Neg edge!

Data Hazards When a result is needed in the pipeline before it is available, a “data hazard” occurs. R2 Available CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8 IM Reg ALU DM sub $2, $1, $3 IM Reg ALU DM and $12, $2, $5 IM Reg ALU DM or $13, $6, $2 R2 Needed IM Reg ALU DM add $14, $2, $2 Use red and black lines! ALU IM Reg DM sw $15, 100($2)

Hazards continued What happens when... Draw dependencies
add $3, $10, $11 lw $8, 1000($3) sub $11, $8, $7 Draw dependencies

lw $8, 1000($3) add $3, $10, $11 Execute/ Address Calculation Memory Access Write Back

sub $11, $8, $7 lw $8, 1000($3) add $3, $10, $11 Memory Access Write Back

add $10, $1, $2 sub $11, $8, $7 lw $8, 1000($3) add $3, $10, $11 Write Back

Pipelining Key Points ET = IC * CPI * CT
We achieve high throughput without reducing instruction latency. Pipelining exploits a special kind of parallelism (parallelism between functionality required in different cycles). Pipelining uses combinational logic to generate (and registers to propagate) control signals. Pipelining creates potential hazards.

Summary comparison (MIPS designs)
CPI CT Single Cycle Multi-cycle Pipeline

Designing a Pipelined CPU

Similar presentations

Presentation on theme: "Designing a Pipelined CPU"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Designing a Pipelined CPU

Similar presentations

Presentation on theme: "Designing a Pipelined CPU"— Presentation transcript:

Similar presentations

About project

Feedback