Lecture 9. MIPS Processor Design – Pipelined Processor Design #2

2010 R&E Computer System Education & Research
Prof. Taeweon Suh, Computer Science Education, Korea University

Pipelined Datapath

Example for lw instruction: Instruction Fetch (IF) [Figure: pipelined datapath]

Example for lw instruction: Instruction Decode (ID) [Figure: pipelined datapath]

Example for lw instruction: Execution (EX) [Figure: pipelined datapath]

Example for lw instruction: Memory (MEM) [Figure: pipelined datapath]

Example for lw instruction: Writeback (WB) [Figure: pipelined datapath]

Example for sw instruction: Memory (MEM) [Figure: pipelined datapath]

Example for sw instruction: Writeback (WB): do nothing [Figure: pipelined datapath]

Corrected Datapath (for lw) [Figure: pipelined datapath in which the write-register number is carried through the pipeline registers to the WB stage, so lw writes the correct register]

Pipelining Example
[Figure: five instructions in flight, one per stage]
IF: add $14, $5, $6
ID: lw $13, 24($1)
EX: add $12, $3, $4
MEM: sub $11, $2, $3
WB: lw $10, 20($1)
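The figure is one snapshot of an ideal pipeline. A small sketch that prints the full stage-by-cycle diagram for the same five instructions may help; it assumes no stalls or hazards, and the helper name `pipeline_diagram` is hypothetical. Column 5 (cycle 5) reproduces the snapshot above: the oldest instruction is in WB and the newest is in IF.

```python
# Hypothetical helper: which stage each instruction occupies in each clock
# cycle of an ideal 5-stage pipeline (no stalls, no hazards assumed).
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(program):
    """Return (instruction, row) pairs; row[c] is the stage in cycle c+1 ('' if idle)."""
    n_cycles = len(program) + len(STAGES) - 1
    rows = []
    for i, instr in enumerate(program):
        row = [""] * n_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i is fetched in cycle i+1
        rows.append((instr, row))
    return rows

program = [
    "lw  $10, 20($1)",
    "sub $11, $2, $3",
    "add $12, $3, $4",
    "lw  $13, 24($1)",
    "add $14, $5, $6",
]

for instr, row in pipeline_diagram(program):
    print(f"{instr:<18}" + " ".join(f"{s:<3}" for s in row))
```

Five instructions finish in 5 + 4 = 9 cycles instead of 25, which is where the throughput gain comes from.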

Pipeline Control
Note that in this implementation, the branch instruction decides whether to branch in the MEM stage.

Pipeline Control
We have 5 stages: IF, ID, EX, MEM, WB. What needs to be controlled in each stage?
Instruction fetch and PC increment: nothing special (the same actions every cycle)
Instruction decode / operand fetch: nothing special (the same actions every cycle)
Execution stage: RegDst, ALUOp[1:0], ALUSrc
Memory stage: Branch, MemRead, MemWrite
Writeback: MemtoReg, RegWrite (note that RegWrite controls the register file, which is drawn in the ID stage)
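The per-stage grouping above can be written down as a lookup table. This is a sketch assuming the usual main-control values of the simple MIPS datapath for four instruction classes (R-type, lw, sw, beq); `None` marks a don't-care, and the names `CONTROL`, `STAGE_SIGNALS` and `signals_for` are hypothetical.

```python
# Main-control outputs per instruction class (simple MIPS datapath).
# None marks a don't-care. ALUOp is the 2-bit field ALUOp[1:0].
CONTROL = {
    "R-type": dict(RegDst=1, ALUOp=0b10, ALUSrc=0, Branch=0, MemRead=0,
                   MemWrite=0, RegWrite=1, MemtoReg=0),
    "lw": dict(RegDst=0, ALUOp=0b00, ALUSrc=1, Branch=0, MemRead=1,
               MemWrite=0, RegWrite=1, MemtoReg=1),
    "sw": dict(RegDst=None, ALUOp=0b00, ALUSrc=1, Branch=0, MemRead=0,
               MemWrite=1, RegWrite=0, MemtoReg=None),
    "beq": dict(RegDst=None, ALUOp=0b01, ALUSrc=0, Branch=1, MemRead=0,
                MemWrite=0, RegWrite=0, MemtoReg=None),
}

# Stage that consumes each signal; IF and ID need no per-instruction control.
STAGE_SIGNALS = {
    "EX": ("RegDst", "ALUOp", "ALUSrc"),
    "MEM": ("Branch", "MemRead", "MemWrite"),
    "WB": ("MemtoReg", "RegWrite"),
}

def signals_for(instr_class, stage):
    """Control values that `stage` needs for one instruction class."""
    return {name: CONTROL[instr_class][name] for name in STAGE_SIGNALS.get(stage, ())}

for cls in CONTROL:
    print(cls, {stage: signals_for(cls, stage) for stage in STAGE_SIGNALS})
```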

Pipeline Control
Extend the pipeline registers to include control information (created in ID). Pass the control signals along just like the data.

Datapath with Control

Datapath with Control, clock 1: IF: lw $10, 9($1)

Datapath with Control, clock 2: IF: sub $11, $2, $3; ID: lw $10, 9($1). Control generated for “lw”: WB = 11, M = 010, EX = 0001

Datapath with Control, clock 3: IF: and $12, $4, $5; ID: sub $11, $2, $3; EX: lw $10, 9($1). Control generated for “sub”: WB = 10, M = 000, EX = 1100

Datapath with Control, clock 4: IF: or $13, $6, $7; ID: and $12, $4, $5; EX: sub $11, $2, $3; MEM: lw $10, 9($1). “and” is decoded in ID

Datapath with Control, clock 5: IF: add $14, $8, $9; ID: or $13, $6, $7; EX: and $12, $4, $5; MEM: sub $11, …; WB: lw $10, 9($1). “or” is decoded in ID

Datapath with Control, clock 6: IF: xxxx; ID: add $14, $8, $9; EX: or $13, $6, $7; MEM: and $12, …; WB: sub $11, …. “add” is decoded in ID

Datapath with Control, clock 7: IF: xxxx; ID: xxxx; EX: add $14, $8, $9; MEM: or $13, …; WB: and $12, …

Datapath with Control, clock 8: IF: xxxx; ID: xxxx; EX: xxxx; MEM: add $14, …; WB: or $13, …

Datapath with Control, clock 9: IF: xxxx; ID: xxxx; EX: xxxx; MEM: xxxx; WB: add $14, …
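The walkthrough above can be replayed in a few lines. This sketch assumes only lw and R-type instructions, uses the WB/M/EX bit strings shown on the slides, and treats each pipeline register as a tuple; `FIELDS`, `instr_class` and `simulate` are hypothetical names. Each register keeps only the control fields that later stages still need: the EX field is dropped after EX and the M field after MEM.

```python
# Control fields as drawn on the slides (bits left to right):
#   WB = RegWrite MemtoReg, M = Branch MemRead MemWrite,
#   EX = RegDst ALUOp1 ALUOp0 ALUSrc
FIELDS = {
    "lw": ("11", "010", "0001"),
    "R-type": ("10", "000", "1100"),
}

def instr_class(instr):
    # Only lw and R-type instructions appear in this walkthrough.
    return "lw" if instr.startswith("lw") else "R-type"

def simulate(program, cycles):
    """Contents of each pipeline register after the clock edge ending each cycle."""
    if_id = id_ex = ex_mem = mem_wb = None
    history = []
    for cycle in range(cycles):
        # Update back to front so each register latches its predecessor's OLD value.
        mem_wb = None if ex_mem is None else ex_mem[:2]  # (instr, WB)
        ex_mem = None if id_ex is None else id_ex[:3]    # (instr, WB, M): EX field used up
        id_ex = None if if_id is None else (if_id, *FIELDS[instr_class(if_id)])  # made in ID
        if_id = program[cycle] if cycle < len(program) else None                  # fetch
        history.append({"IF/ID": if_id, "ID/EX": id_ex,
                        "EX/MEM": ex_mem, "MEM/WB": mem_wb})
    return history

program = ["lw $10, 9($1)", "sub $11, $2, $3", "and $12, $4, $5",
           "or $13, $6, $7", "add $14, $8, $9"]

for n, regs in enumerate(simulate(program, 9), start=1):
    print(f"end of clock {n}: {regs}")
```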

Dependencies
Problem: starting (or executing) the next instruction before the first one has finished. Dependencies cause data and control hazards.

Data Hazard - Software Solution
Dependencies that “go backward in time” are hazards. Could we have the compiler guarantee there are no hazards?
Insert nop (no operation) instructions (“0x00000000” is nop in MIPS; it encodes sll $0, $0, 0).
Code scheduling.
Where do we insert the “nops”?
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
Problem? This really slows us down!
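A sketch of the compiler-side fix, assuming no forwarding hardware. With the textbook register file (write in the first half of the cycle, read in the second half), a producer in slot i writes in cycle i+4 and a consumer in slot j reads in cycle j+1, so j must be at least i+3 (`min_distance=3`). With a register file written only on the rising edge, the consumer must be at least 4 slots later. The parser handles only the R-type, lw and sw formats on this slide, and the names `dest_and_sources` and `insert_nops` are hypothetical.

```python
import re

def dest_and_sources(instr):
    """(written register or None, registers read) for R-type, lw and sw only."""
    op, rest = instr.split(None, 1)
    ops = [x.strip() for x in rest.split(",")]
    if op in ("lw", "sw"):
        base = re.search(r"\((\$\w+)\)", ops[1]).group(1)  # offset($base)
        return (ops[0], [base]) if op == "lw" else (None, [ops[0], base])
    return ops[0], ops[1:]  # R-type: rd, rs, rt

def insert_nops(program, min_distance=3):
    """Insert nops until each reader is >= min_distance slots after its writer."""
    out = []
    for instr in program:
        _, srcs = dest_and_sources(instr)
        while True:
            window = out[max(0, len(out) - (min_distance - 1)):]
            producers = {dest_and_sources(p)[0] for p in window if p != "nop"}
            if producers.isdisjoint(srcs):
                break
            out.append("nop")
        out.append(instr)
    return out

program = ["sub $2, $1, $3", "and $12, $2, $5", "or $13, $6, $2",
           "add $14, $2, $2", "sw $15, 100($2)"]
print("\n".join(insert_nops(program)))
```

Only the and needs help here: two nops after sub with the textbook register file, three with a rising-edge-only one.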

Data Hazard - Pipeline Stalls? [Figure: pipeline diagram of sub $2, $1, $3 and the dependent instructions that follow it, with bubbles inserted (stall)]

Data Hazard - Forwarding
Use temporary results; don’t wait for them to be written.
Register file forwarding, to handle a read and a write of the same register in the same cycle.
ALU forwarding.
OK... then, do we have to do register file forwarding?
If you are asked to design the CPU using only the rising edge of the clock, then yes: a value written in WB is not visible to an ID read in the same cycle. Let’s stick to this for our project.
If the register file write occurs in the first half of the clock and the read occurs in the second half, then no: the read already sees the new value. Our textbook follows this.
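A minimal model of the question above: what does ID read in a cycle where WB writes the same register? The three timing labels ("split", "edge", "bypass") and the function name `id_read` are hypothetical; "bypass" stands for a rising-edge register file plus a forwarding mux on the read port. Register $0 is never written, as in MIPS.

```python
def id_read(regfile, rd_reg, wb_reg, wb_value, wb_enable, timing):
    """Value the ID stage reads for rd_reg in a cycle where WB writes wb_reg.

    timing: "split"  write in the first half of the cycle, read in the second half
            "edge"   register file updated only at the next rising edge, no bypass
            "bypass" "edge" plus a mux that forwards the WB value into ID
    """
    writing_same = wb_enable and wb_reg == rd_reg and wb_reg != 0  # $0 never written
    if timing in ("split", "bypass") and writing_same:
        return wb_value         # read sees the value being written this cycle
    return regfile[rd_reg]      # stored value (stale in the "edge" case)

regs = [0] * 32
regs[2] = 5                     # old contents of $2; WB is writing 99 into $2
for timing in ("split", "edge", "bypass"):
    print(timing, id_read(regs, 2, 2, 99, True, timing))
```

Only the "edge" design returns the stale value, which is why a rising-edge-only design needs register file forwarding.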

Forwarding (simplified) [Figure: ID/EX, EX/MEM and MEM/WB pipeline registers; register file, ALU, data memory, MUX]

Forwarding (from EX/MEM) [Figure: same datapath; the result held in EX/MEM is routed through a MUX back to an ALU input]

Forwarding (from MEM/WB) [Figure: same datapath; the value held in MEM/WB is routed through a MUX back to an ALU input]

Forwarding (operand selection) [Figure: a forwarding unit drives the select inputs of the MUXes in front of the ALU]

Forwarding (operand propagation) [Figure: the forwarding unit compares Rs and Rt from ID/EX with EX/MEM.Rd and MEM/WB.Rd; Rd is carried through the pipeline registers]

Forwarding [Figure: complete pipelined datapath with the forwarding unit]
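The forwarding unit in these figures compares the source registers in ID/EX with the destination registers in EX/MEM and MEM/WB. A sketch of that comparison follows. The mux-select encodings ("10" from EX/MEM, "01" from MEM/WB, "00" from the register file) follow the common textbook convention and are an assumption here, as are the names `DestInfo`, `forward_select` and `forwarding_unit`. EX/MEM is checked first because it holds the more recent result.

```python
from dataclasses import dataclass

@dataclass
class DestInfo:
    """Destination fields carried in the EX/MEM or MEM/WB pipeline register."""
    reg_write: bool
    rd: int

def forward_select(src, ex_mem, mem_wb):
    """Mux select for one ALU operand: "10" EX/MEM, "01" MEM/WB, "00" register file."""
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
        return "10"  # most recent result wins
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src:
        return "01"
    return "00"

def forwarding_unit(id_ex_rs, id_ex_rt, ex_mem, mem_wb):
    """Return (ForwardA, ForwardB) for the two ALU operands."""
    return (forward_select(id_ex_rs, ex_mem, mem_wb),
            forward_select(id_ex_rt, ex_mem, mem_wb))

# and $12, $2, $5 in EX while sub $2, $1, $3 is in MEM: forward $2 from EX/MEM.
print(forwarding_unit(2, 5, DestInfo(True, 2), DestInfo(False, 0)))
# or $13, $6, $2 in EX: and ($12) is in MEM, sub ($2) is in WB: forward from MEM/WB.
print(forwarding_unit(6, 2, DestInfo(True, 12), DestInfo(True, 2)))
```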

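Can't Always Forward
lw (load word) can still cause a hazard: an instruction tries to read a register immediately after a load instruction that writes the same register. The loaded data is not available until the end of the MEM stage, too late to forward to the ALU for the very next instruction.
Thus, we need a hazard detection unit to “stall” the pipeline after the load instruction.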

Stalling
We can stall the pipeline by keeping an instruction in the same stage. [Figure: the stalled instructions repeat a stage (ID, ID and IF, IF)]

Hazard Detection Unit
Stall by letting an instruction that won’t write anything go forward (a bubble).
Stall the pipeline if ID/EX holds a load and (ID/EX.rt = IF/ID.rs or ID/EX.rt = IF/ID.rt).
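The stall condition above fits in a few lines. This sketch also lists the actions a stall implies: hold the PC, hold IF/ID, and send zeroed control (a bubble) into ID/EX. The signal names `PCWrite`, `IFIDWrite` and `zero_control` and the function name `hazard_detection` are assumptions for illustration.

```python
def hazard_detection(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Load-use check; returns the actions for this cycle."""
    stall = id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt)
    return {
        "PCWrite": not stall,    # hold the PC so the same instruction is refetched
        "IFIDWrite": not stall,  # hold IF/ID so the dependent instruction stays in ID
        "zero_control": stall,   # bubble: all control bits into ID/EX are 0
    }

# lw $2, 20($1) in ID/EX (MemRead = 1, rt = 2); and $4, $2, $5 in IF/ID (rs = 2, rt = 5)
print(hazard_detection(True, 2, 2, 5))
```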

Control Hazards - Branch
When we decide to branch, other instructions are already in the pipeline! Assume the branch is not taken. When this assumption fails, flush 3 instructions.
In other words, we are predicting “branch not taken”, so we need to add hardware for flushing instructions if we are wrong.

Alleviate Branch Hazards
Move the branch compare to the ID stage of the pipeline.
Add an adder to calculate the branch target in the ID stage.
Add an IF.Flush signal that zeros (squashes) the instruction in the IF/ID pipeline register.
This reduces the penalty to 1 cycle.
[Figure: beq $1,$2,L1 goes through IF ID EX MEM WB; the actual condition is generated and the taken target address is known in ID; the next instruction, add $1,$2,$3, becomes a bubble; L1: sub $1,$2,$3 is fetched next]
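With predict-not-taken, the number of wrong-path instructions to flush equals the number of stages the branch passes before it is resolved. A small sketch (hypothetical helper names) reproduces the two numbers on these slides: 3 when resolved in MEM, 1 when resolved in ID. It assumes a flush happens only on a misprediction.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")

def flushed_if_taken(resolve_stage):
    """Wrong-path instructions fetched while the branch moves from IF to resolve_stage."""
    return STAGES.index(resolve_stage)

def branch_cpi(mispredict_rate, resolve_stage):
    """Average branch CPI under predict-not-taken: base 1 plus the flush penalty."""
    return 1 + mispredict_rate * flushed_if_taken(resolve_stage)

print(flushed_if_taken("MEM"), flushed_if_taken("ID"))
print(branch_cpi(0.25, "ID"), branch_cpi(0.25, "MEM"))
```

With a 25% misprediction rate, resolving in ID gives a branch CPI of 1.25 instead of 1.75.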

Flushing Instructions [Figure: pipelined datapath with hazard detection unit, forwarding unit, IF.Flush signal, and the branch comparator (=) and target adder in ID]

Flushing Instructions (cycle N)
Code: beq $1, $3, L2; and $12, $2, $5; or $13, $12, $1; …; L2: lw $4, 40($7)
[Figure: beq $1, $3, L2 is in ID; and $12, $2, $5 is in IF]

Flushing Instructions (cycle N+1)
Code: beq $1, $3, L2; and $12, $2, $5; or $13, $12, $1; …; L2: lw $4, 40($7)
[Figure: beq $1, $3, L2 is in EX; the flushed and $12, $2, $5 has become a nop in ID; lw $4, 40($7) at L2 is in IF]

Improving Performance
Try to avoid stalls! E.g., reorder these instructions:
lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t2, 0($t1)
sw $t0, 4($t1)
Add a “branch delay slot”: the next instruction after a branch is always executed. Rely on the compiler to “fill” the slot with something useful.
Superscalar: start more than one instruction in the same cycle.
Almost all processors are now pipelined and superscalar.
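Here is one way to check the reordering exercise. The sketch counts load-use stalls (one per lw whose result the very next instruction reads) before and after swapping the two stores. Like the hazard condition on the earlier slide (ID/EX.rt = IF/ID.rt), it counts sw's store-data register as a read. A design that forwards load data into the MEM stage for stores might avoid that case. The parser handles only lw/sw, and its name is hypothetical.

```python
import re

def parse(instr):
    """(op, written register or None, registers read) for lw/sw only."""
    op, rest = instr.split(None, 1)
    reg, mem = [x.strip() for x in rest.split(",")]
    base = re.search(r"\((\$\w+)\)", mem).group(1)
    return (op, reg, [base]) if op == "lw" else (op, None, [reg, base])

def load_use_stalls(program):
    """Count lw results read by the very next instruction (one stall each)."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        op, dst, _ = parse(prev)
        if op == "lw" and dst in parse(cur)[2]:
            stalls += 1
    return stalls

original = ["lw $t0, 0($t1)", "lw $t2, 4($t1)", "sw $t2, 0($t1)", "sw $t0, 4($t1)"]
reordered = ["lw $t0, 0($t1)", "lw $t2, 4($t1)", "sw $t0, 4($t1)", "sw $t2, 0($t1)"]
print(load_use_stalls(original), load_use_stalls(reordered))
```

Swapping the two stores is safe because they write different addresses, and it removes the one stall.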

Dynamic Scheduling
The hardware performs the “scheduling”: it tries to find instructions to execute. Out-of-order (OOO) execution is possible, along with speculative execution and dynamic branch prediction.
All modern processors are very complicated. DEC Alpha 21264: 9-stage pipeline, 6-instruction issue. PowerPC and Pentium: branch history table.
Compiler technology is important.
This class has given you the background you need to learn more.

Exceptions & Interrupts
The CPU has to be prepared for every situation it could face. “Unexpected” events require a change in the flow of control.
Exceptions arise within the CPU: undefined opcode, arithmetic overflow in MIPS. Some other architectures (such as x86 and ARM) do not generate an exception on arithmetic overflow; instead, they set bits in the CPU’s flag register.
Interrupts come from external I/O devices: keyboard, mouse, network card, etc.
Many architectures and authors do not distinguish between interrupts and exceptions, and often use the term “interrupt” to refer to both types of events.
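The two overflow policies above can be contrasted in a small sketch. `add_with_overflow` computes a 32-bit two's-complement sum plus an overflow flag, roughly what a flag-setting architecture records. `mips_add` models MIPS `add`, which raises an exception instead; here a Python `OverflowError` stands in for the hardware exception. MIPS `addu` does not trap. All names are hypothetical.

```python
MASK = 0xFFFFFFFF

def to_signed(x):
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x

def add_with_overflow(a, b):
    """32-bit add: returns (result bits, signed-overflow flag)."""
    total = to_signed(a) + to_signed(b)
    overflow = not (-(1 << 31) <= total < (1 << 31))
    return total & MASK, overflow

def mips_add(a, b):
    """Model of MIPS add: an overflow raises an exception instead of setting a flag."""
    result, overflow = add_with_overflow(a, b)
    if overflow:
        raise OverflowError("arithmetic overflow exception")
    return result

print(add_with_overflow(0x7FFFFFFF, 1))  # wraps to 0x80000000 with the flag set
```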

Pipelined Performance Example
Ideally CPI = 1, but we need to handle stalling (caused by loads and branches).
SPECINT2000 benchmark: 25% loads, 10% stores, 11% branches, 2% jumps, 52% R-type.
Suppose 40% of loads are used by the next instruction and 25% of branches are mispredicted. What is the average CPI?

Pipelined Performance Example
SPECINT2000 benchmark: 25% loads, 10% stores, 11% branches, 2% jumps, 52% R-type.
If there is no stall in the pipelined MIPS, how would you calculate CPI?
Average CPI = (0.25)(1) + (0.10)(1) + (0.11)(1) + (0.02)(1) + (0.52)(1) = 1
Suppose 40% of loads are used by the next instruction, 25% of branches are mispredicted, and all jumps flush the next instruction. What is the average CPI?
A load or branch takes 1 cycle when there is no stall and 2 cycles when it stalls. Thus:
CPI_lw = 1(0.6) + 2(0.4) = 1.4
CPI_beq = 1(0.75) + 2(0.25) = 1.25
CPI_jump = 2(1) = 2
Average CPI = (0.25)(1.4) + (0.1)(1) + (0.11)(1.25) + (0.02)(2) + (0.52)(1) = 1.1475 ≈ 1.15
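The weighted average above can be checked directly. This sketch uses only the mix and stall assumptions stated on the slide; the dictionary keys are just labels.

```python
mix = {"lw": 0.25, "sw": 0.10, "beq": 0.11, "j": 0.02, "R-type": 0.52}
cpi = {
    "lw": 1 * 0.6 + 2 * 0.4,     # 40% of loads feed the next instruction: 1 stall
    "sw": 1,
    "beq": 1 * 0.75 + 2 * 0.25,  # 25% mispredicted: 1 flushed instruction
    "j": 2,                      # every jump flushes the next instruction
    "R-type": 1,
}
assert abs(sum(mix.values()) - 1) < 1e-12  # the mix covers all instructions

average_cpi = sum(mix[k] * cpi[k] for k in mix)
print(f"{average_cpi:.4f}")  # 1.1475, which the slide rounds to 1.15
```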

Pipelined Performance
Critical path of the pipelined MIPS processor:
Tc = max {
  tpcq + tmem + tsetup,                              // IF stage
  2(tRFread + tmux + teq + tAND + tmux + tsetup),    // ID stage
  tpcq + tmux + tmux + tALU + tsetup,                // EX stage
  tpcq + tmemwrite + tsetup,                         // MEM stage
  2(tpcq + tmux + tRFwrite)                          // WB stage
}
Where does this “2” come from?
If you are asked to design the CPU using only the rising edge of the clock, then? Let’s stick to this for our project.
If the register file write occurs in the first half of the clock and the read occurs in the second half, then the ID and WB stages each get only half a clock cycle, so their delays are doubled. Our textbook follows this.

Pipelined Performance Example
Element                 Parameter    Delay (ps)
Register clock-to-Q     tpcq_PC      30
Register setup          tsetup       20
Multiplexer             tmux         25
ALU                     tALU         200
Memory read             tmem         250
Register file read      tRFread      150
Register file setup     tRFsetup
Equality comparator     teq          40
AND gate                tAND         15
Memory write            tmemwrite    220
Register file write     tRFwrite     100

Tc = 2(tRFread + tmux + teq + tAND + tmux + tsetup) = 2[150 + 25 + 40 + 15 + 25 + 20] ps = 550 ps
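To confirm that ID is the critical stage, here is each stage of the critical-path formula evaluated with the delays from the table, where tpcq is the tpcq_PC value. tRFsetup does not appear in the formula, so it is left out.

```python
t = {"pcq": 30, "setup": 20, "mux": 25, "ALU": 200, "mem": 250,
     "RFread": 150, "eq": 40, "AND": 15, "memwrite": 220, "RFwrite": 100}  # ps

stage_delay = {
    "IF": t["pcq"] + t["mem"] + t["setup"],
    "ID": 2 * (t["RFread"] + t["mux"] + t["eq"] + t["AND"] + t["mux"] + t["setup"]),
    "EX": t["pcq"] + t["mux"] + t["mux"] + t["ALU"] + t["setup"],
    "MEM": t["pcq"] + t["memwrite"] + t["setup"],
    "WB": 2 * (t["pcq"] + t["mux"] + t["RFwrite"]),
}
Tc = max(stage_delay.values())
critical = max(stage_delay, key=stage_delay.get)
print(stage_delay)
print(f"Tc = {Tc} ps (critical stage: {critical})")
```

The stage delays come out to IF 300, ID 550, EX 300, MEM 270 and WB 310 ps. The halved-cycle ID stage sets the clock.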

Pipelined Performance Example
For a program with 100 billion instructions executing on a pipelined MIPS processor, CPI = 1.15 and Tc = 550 ps.
Execution Time = (# instructions)(cycles/instruction)(seconds/cycle)
= (100 × 10^9)(1.15)(550 × 10^-12 s)
= 63 seconds

Processor      Execution Time (seconds)   Speedup (single-cycle is baseline)
Single-cycle   95                         1
Multicycle     133                        0.71
Pipelined      63                         1.51
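The arithmetic behind the table can be reproduced as follows. The single-cycle and multicycle times are taken from the table as given. The speedups use the rounded 63 s; the unrounded 63.25 s gives about 1.50.

```python
instructions = 100e9  # 100 billion
cpi = 1.15
tc = 550e-12          # seconds per cycle

exec_time = instructions * cpi * tc
print(f"{exec_time:.2f} s")  # 63.25 s, shown as 63 s on the slide

times = {"Single-cycle": 95, "Multicycle": 133, "Pipelined": 63}
for name, seconds in times.items():
    print(name, round(times["Single-cycle"] / seconds, 2))  # speedup vs. single-cycle
```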

Backup Slides

Exception Handling in MIPS and Handler Actions
Exception handling in MIPS hardware (CPU):
The CPU saves the PC of the offending (or interrupted) instruction in the “Exception Program Counter (EPC)” register.
The CPU saves an indication of the problem in the “Cause” register.
It jumps to the handler at 0x8000 0180.
Exception handler in software:
Read the cause, and transfer to the relevant handler.
If restartable: take corrective action, and use EPC to return to the program.
Otherwise: terminate the program, and report the error using EPC, Cause, …

Exceptions in a Pipeline
Another form of control hazard. Consider overflow on add in the EX stage: add $1, $2, $1
Prevent $1 from being clobbered.
Complete the previous instructions.
Flush the add and subsequent instructions.
Set the Cause and EPC register values.
Transfer control to the handler.
This is similar to a mispredicted branch and uses much of the same hardware.

Pipeline with Exceptions

Exception Example
Exception on add in:
40  sub $11, $2, $4
44  and $12, $2, $5
48  or $13, $2, $6
4C  add $1, $2, $1
50  slt $15, $6, $7
54  lw $16, 50($7)
…
Handler:
80000180  sw $25, 1000($0)
80000184  sw $26, 1004($0)
…
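Which instructions finish and which are flushed when the add at 0x4C overflows in EX? The sketch below answers that question. It follows the earlier slide's description that EPC receives the offending instruction's PC and that the handler is at 0x80000180. The dictionary layout and the names `take_exception` and `in_flight` are hypothetical. Older instructions (in MEM and WB) complete. The add and the younger instructions behind it are flushed, so $1 is never clobbered.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")
HANDLER_ADDR = 0x80000180

def take_exception(in_flight, faulting_stage, cause):
    """in_flight maps stage -> PC of the instruction currently in that stage."""
    k = STAGES.index(faulting_stage)
    return {
        "EPC": in_flight[faulting_stage],                    # offending instruction
        "Cause": cause,
        "next_PC": HANDLER_ADDR,                             # fetch the handler next
        "complete": [in_flight[s] for s in STAGES[k + 1:]],  # older: allowed to finish
        "flushed": [in_flight[s] for s in STAGES[:k + 1]],   # offending + younger
    }

# add $1, $2, $1 (0x4C) overflows in EX; sub at 0x40 has already left the pipeline.
in_flight = {"IF": 0x54, "ID": 0x50, "EX": 0x4C, "MEM": 0x48, "WB": 0x44}
state = take_exception(in_flight, "EX", "arithmetic overflow")
print(hex(state["EPC"]), [hex(pc) for pc in state["flushed"]], hex(state["next_PC"]))
```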
