ECE243 CPU.


IMPLEMENTING A SIMPLE CPU How are machine instructions implemented? What components are there? How are they connected and controlled?

MINI ISA: every instruction is 1 byte wide; data and address values are also 1 byte wide. Address space: byte addressable (every byte has an address); 8 addr bits => 256 byte locations. 4 registers: r0..r3. PC (resets to $80). Condition codes: Z (zero), N (negative); these are used by branches.

Some Definitions:
IMM3: a 3-bit signed immediate, 2 parts: 1 sign bit: sign(IMM3); 2-bit value: value(IMM3)
IMM4: a 4-bit signed immediate
IMM5: a 5-bit unsigned immediate
OpA, OpB: register variables, each representing one of r0..r3
SE8(X): means sign-extend value X to 8 bits
NOTE: ALL INSTS DO THIS LAST: PC = PC + 1

Mini ISA Instructions
load OpA (OpB): OpA = mem[OpB]; PC = PC + 1
store OpA (OpB): mem[OpB] = OpA; PC = PC + 1
add OpA OpB: OpA = OpA + OpB; IF (OpA == 0) Z = 1 ELSE Z = 0; IF (OpA < 0) N = 1 ELSE N = 0; PC = PC + 1
sub OpA OpB: OpA = OpA - OpB; PC = PC + 1

Mini ISA Instructions
nand OpA OpB: OpA = OpA bitwise-NAND OpB; IF (OpA == 0) Z = 1 ELSE Z = 0; IF (OpA < 0) N = 1 ELSE N = 0; PC = PC + 1
ori IMM5: r1 = r1 bitwise-OR IMM5; IF (r1 == 0) Z = 1 ELSE Z = 0; IF (r1 < 0) N = 1 ELSE N = 0; PC = PC + 1
shift OpA IMM3: IF (sign(IMM3)) OpA = OpA << value(IMM3) ELSE OpA = OpA >> value(IMM3); PC = PC + 1

Mini ISA Instructions
bz IMM4: IF (Z == 1) PC = PC + SE8(IMM4); PC = PC + 1
bnz IMM4: IF (Z == 0) PC = PC + SE8(IMM4); PC = PC + 1
bpz IMM4: IF (N == 0) PC = PC + SE8(IMM4); PC = PC + 1
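The instruction semantics above can be sketched as a small interpreter over symbolic instructions. This is a minimal sketch, not the course's reference model: the class name `MiniCPU`, the `step` signature, and the helper `se8` are hypothetical. The transcript shows Z/N updates only for add, nand and ori; the sketch assumes sub updates them too, leaves them untouched on shift, and treats the right shift as logical.

```python
MASK = 0xFF  # all values are 1 byte wide


def se8(value, bits):
    """Sign-extend a `bits`-wide value to 8 bits (returned as 0..255)."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):          # sign bit is set
        value -= 1 << bits
    return value & MASK


class MiniCPU:
    def __init__(self):
        self.r = [0, 0, 0, 0]        # r0..r3
        self.pc = 0x80               # PC resets to $80
        self.z = 0
        self.n = 0
        self.mem = [0] * 256         # 8 addr bits => 256 byte locations

    def _set_flags(self, value):
        self.z = int(value == 0)
        self.n = (value >> 7) & 1    # "value < 0" for a signed byte

    def step(self, op, a=0, b=0):
        """Execute one instruction; a and b are register numbers or raw immediates."""
        r = self.r
        if op == "load":
            r[a] = self.mem[r[b]]
        elif op == "store":
            self.mem[r[b]] = r[a]
        elif op in ("add", "sub", "nand"):
            if op == "add":
                r[a] = (r[a] + r[b]) & MASK
            elif op == "sub":        # assumption: sub sets Z/N like add
                r[a] = (r[a] - r[b]) & MASK
            else:
                r[a] = ~(r[a] & r[b]) & MASK
            self._set_flags(r[a])
        elif op == "ori":            # a is IMM5; destination is always r1
            r[1] = (r[1] | (a & 0x1F)) & MASK
            self._set_flags(r[1])
        elif op == "shift":          # b is IMM3: 1 sign bit + 2-bit value
            amount = b & 0b11
            if b & 0b100:
                r[a] = (r[a] << amount) & MASK
            else:
                r[a] = r[a] >> amount   # assumption: logical right shift
        elif op in ("bz", "bnz", "bpz"):
            taken = {"bz": self.z == 1, "bnz": self.z == 0, "bpz": self.n == 0}[op]
            if taken:
                self.pc = (self.pc + se8(a, 4)) & MASK
        else:
            raise ValueError(f"unknown instruction: {op}")
        self.pc = (self.pc + 1) & MASK   # all insts do this last
```

Note that a taken branch still performs the final `PC = PC + 1`, so the target is `PC + SE8(IMM4) + 1`.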

ENCODINGS: Inst(opcode)
Load(0000), store(0010), add(0100), sub(0110), nand(1000): bits 7-6 = OpA, bits 5-4 = OpB, bits 3-0 = opcode
Ori: bits 7-3 = IMM5, bits 2-0 = 111
Shift: bits 7-6 = OpA, bits 5-3 = IMM3, bits 2-0 = 011
BZ(0101), BNZ(1001), BPZ(1101): bits 7-4 = IMM4, bits 3-0 = opcode
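A decoder for these encodings might look like the sketch below. The field positions are an assumption read off the per-instruction datapath slides later in the deck (OpA in bits 7-6, OpB in bits 5-4, IMM5 in bits 7-3, IMM3 in bits 5-3, IMM4 in bits 7-4); the function name `decode` and its return shape are hypothetical.

```python
TWO_REG = {0b0000: "load", 0b0010: "store", 0b0100: "add",
           0b0110: "sub", 0b1000: "nand"}
BRANCH = {0b0101: "bz", 0b1001: "bnz", 0b1101: "bpz"}


def decode(inst):
    """Split one 8-bit instruction into (mnemonic, fields)."""
    inst &= 0xFF
    low3 = inst & 0b111
    low4 = inst & 0b1111
    if low3 == 0b111:                      # ori: opcode is X111
        return ("ori", {"imm5": inst >> 3})
    if low3 == 0b011:                      # shift: opcode is X011
        return ("shift", {"opa": inst >> 6, "imm3": (inst >> 3) & 0b111})
    if low4 in TWO_REG:
        return (TWO_REG[low4], {"opa": inst >> 6, "opb": (inst >> 4) & 0b11})
    if low4 in BRANCH:
        return (BRANCH[low4], {"imm4": inst >> 4})
    raise ValueError(f"unused opcode in {inst:08b}")
```

Checking the 3-bit opcodes first is safe because none of the 4-bit opcodes ends in 111 or 011.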

DESIGNING A CPU: two main components: datapath and control.
datapath: registers, functional units, muxes, wires; must be able to perform all steps of every inst
control: a finite state machine (FSM); commands the datapath; performs: fetch, decode, read, execute, write, get next inst

ECE243 CPU: basic components

REGISTERS: can always read; we assume falling-edge-triggered: in is stored if REGWrite=1 on falling clock edge; we won’t normally draw the clock input. Ports: in, out (8 bits), REGWrite?, clock.

MUXES: ‘select’ signal chooses which input to route to output. [Figure: 8-bit mux]

REGISTER FILE (r0,r1,r2,r3): ports: OpA, OpB, Rwrite (2-bit register indices), in, Out1, Out2 (8 bits), REGWrite?, clock. Out1 is the value of reg indexed by OpA; Out2 is the value of reg indexed by OpB; if REGWrite is 1 when clock goes low then the value on ‘in’ is written to reg indexed by Rwrite.

ALU (arithmetic logic unit): inputs In0, In1 (8 bits) and ALUop (3 bits); outputs out (8 bits), Z, N.
ALUop: add = 000, sub = 001, or = 010, nand = 011, shift = 100
Z = nor(out7,out6,out5…out0)
N = out bit 7 (the sign bit; 1 implies negative)
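The ALU's behaviour can be sketched as a pure function of ALUop and the two inputs. This is a sketch under assumptions: the function name `alu` is hypothetical, In0 is taken to be the register operand, the shift amount and direction are taken from the low 3 bits of In1 (the IMM3 layout from the definitions slide), and the right shift is assumed to be logical.

```python
def alu(aluop, in0, in1):
    """Return (out, Z, N) for 8-bit inputs in0 and in1."""
    if aluop == 0b000:                 # add
        out = in0 + in1
    elif aluop == 0b001:               # sub
        out = in0 - in1
    elif aluop == 0b010:               # or
        out = in0 | in1
    elif aluop == 0b011:               # nand
        out = ~(in0 & in1)
    elif aluop == 0b100:               # shift: in1 carries IMM3
        amount = in1 & 0b11
        out = in0 << amount if in1 & 0b100 else in0 >> amount
    else:
        raise ValueError(f"undefined ALUop {aluop:03b}")
    out &= 0xFF                        # keep 8 bits
    z = int(out == 0)                  # Z = nor of all output bits
    n = out >> 7                       # N = output bit 7
    return out, z, n
```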

MEMORY our CPU has two memories for simplicity: instruction memory and data memory known as a “Harvard architecture”

INSTRUCTION MEM: is read only; Iout is set to the value indexed by the address. Ports: addr (8 bits), Iout.

DATA MEMORY: can read or write, but only one in a given clock cycle. Ports: addr (8 bits), Din, Dout, MEMRead?, MEMWrite?, clock. On falling clock edge: if MEMWrite==1: value on Din is stored at addr; if MEMRead==1: value at addr is output on Dout.

SE8(x): SIGN-EXTEND TO 8 BITS assuming 4-bit input Recall: want: SE8(0100) -> 00000100 SE8(1100) -> 11111100 In bits i3,i2,i1,i0; out bits o7…o0

ZE8(x): ZERO EXTEND TO 8 bits assuming 5-bit input Recall: want ZE8(00100) -> 00000100 ZE8(11100) -> 00011100 In bits i4,i3,i2,i1,i0; out bits o7…o0
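Both extension units reduce to simple bit wiring: sign extension copies the input's top bit into every new upper bit, while zero extension ties the new upper bits to 0. A minimal sketch, with hypothetical function names, using the slide's input widths:

```python
def se8_4bit(i):
    """Sign-extend a 4-bit input: o7..o4 each copy i3, o3..o0 = i3..i0."""
    i3 = (i >> 3) & 1
    upper = 0xF0 if i3 else 0x00
    return upper | (i & 0x0F)


def ze8_5bit(i):
    """Zero-extend a 5-bit input: o7..o5 = 0, o4..o0 = i4..i0."""
    return i & 0x1F
```

Applied to the slide's own examples, `se8_4bit(0b1100)` yields `0b11111100` and `ze8_5bit(0b11100)` yields `0b00011100`.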

ECE243 CPU: Single Cycle Implementation

SINGLE CYCLE DATAPATH: each instruction executes entirely in one cycle of the cpu clock. Registers are triggered by the falling edge; new values then begin propagating through the datapath, and some values may be temporarily incorrect. The clock period is large enough to ensure that all values are correct before the next falling edge.

FETCH: needed by every instruction, i.e., every instruction must be fetched. [Figure: PC (with PCwrite?) drives the addr input of INST MEM, which outputs the 8-bit inst]

PC = PC + 1 [Figure: fetch datapath extended with a +1 adder feeding the PC]

BRANCHES: BZ IMM4 (if branch is taken does: PC = PC + IMM4 + 1). Inst encoding: bits 7-4 = IMM4, bits 3-0 = opcode. [Figure: fetch datapath extended with a second adder for the branch target]

ADD: add OpA OpB. Does OpA = OpA + OpB; same datapath for sub and nand. Inst encoding: i7-i6 = OpA, i5-i4 = OpB, i3-i0 = 0100. [Figure: datapath for add]

SHIFT: shift OpA IMM3. Inst encoding: i7-i6 = OpA, i5-i3 = IMM3, i2-i0 = 011. [Figure: datapath for shift]

ORI: ori IMM5. Does: r1 <- r1 bitwise-or IMM5. Inst encoding: i7-i3 = IMM5, i2-i0 = 111. [Figure: datapath for ori, adding ZE8 and the ALU2 mux]

Store: store OpA (OpB). Does: mem[OpB] = OpA. Inst encoding: i7-i6 = OpA, i5-i4 = OpB, i3-i0 = opcode. [Figure: datapath for store, adding the OpASel mux]

Load: load OpA (OpB). Does: OpA = mem[OpB]. Inst encoding: i7-i6 = OpA, i5-i4 = OpB, i3-i0 = opcode. [Figure: datapath for load, adding the data memory with MEMread and MEMwrite]

Final Datapath! [Figure: complete single-cycle datapath: PC, INST MEM, REG FILE, ALU, Data MEM, SE8 and ZE8 units, with control signals MEMwrite, MEMread, OpASel, REGwrite, RFin, PCwrite, PCsel, ALUop, ALU2]

DESIGNING THE CONTROL UNIT: CTRL inputs: opcode, Z, N. CONTROL SIGNALS TO GENERATE: PCsel, PCwrite, REGwrite, MEMread, MEMwrite, OpASel, ALUop, ALU2, RFin

Control Signals (one slide per instruction; each shows the final datapath figure and one row of the control table, with inputs INST, inst bits 3-0, N, Z and outputs PCSel, PCWrite, RegWrite, MemRead, OpASel, MemWrite, ALU2, RFin, ALUop):
Load OpA (OpB): LOAD 0000 X
Store OpA (OpB): STORE 0010 X
Add OpA OpB: ADD 0100 X
Sub OpA OpB: SUB 0110 X
Nand OpA OpB: NAND 1000 X
ori IMM5: ORI X111 X
Shift OpA IMM3: SHIFT X011 X
bz IMM4: BZ 0101 X 1
bnz IMM4: BNZ 1001 X 1
bpz IMM4: BPZ 1101 X 1

All Control Signals (inputs: INST, inst bits 3-0, N, Z; outputs: PCSel, PCWrite, RegWrite, MemRead, OpASel, MemWrite, ALU2, RFin, ALUop). Rows (partially filled): LOAD 0000 X 1 XXX; STORE 0010; ADD 0100 00 000; SUB 0110 001; NAND 1000 011

All Control Signals, continued (same columns). Rows (partially filled): ORI X111 X 1 01 010; SHIFT X011 10 100; BZ 0101 XXX; BNZ 1001; BPZ 1101

Building Control Logic: MemRead. Table columns: Load, Store, Add, Sub, Nand, Ori, Shift, Bz, Bnz, Bpz; inst bits i3-i0: 0000, 0010, 0100, 0110, 1000, X111, X011, 0101, 1001, 1101; rows to fill in: N, Z, MemRead

Building Control Logic: PCSel. Table columns: Load, Store, Add, Sub, Nand, Ori, Shift, Bz, Bnz, Bpz; inst bits i3-i0: 0000, 0010, 0100, 0110, 1000, X111, X011, 0101, 1001, 1101; rows to fill in: N, Z, PCSel
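The two truth tables above can be turned into combinational logic along the following lines. This is a sketch under stated assumptions, not the lecture's worked answer: MemRead is taken to be 1 only for load (don't-cares resolved to 0), and PCSel = 1 is assumed to select the branch-target adder, so it is asserted only when a branch is taken. The function names are hypothetical.

```python
def mem_read(i3, i2, i1, i0):
    """MemRead = 1 only for load (opcode 0000), i.e. NOR of the opcode bits."""
    return int(not (i3 or i2 or i1 or i0))


def pc_sel(i3, i2, i1, i0, n, z):
    """PCSel = 1 when a branch is taken (assumed polarity)."""
    opcode = (i3, i2, i1, i0)
    bz = opcode == (0, 1, 0, 1) and z == 1
    bnz = opcode == (1, 0, 0, 1) and z == 0
    bpz = opcode == (1, 1, 0, 1) and n == 0
    return int(bz or bnz or bpz)
```

Because ori (X111) and shift (X011) never have all four low bits at 0, the NOR is enough to single out load.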

ECE243 CPU: Multicycle Implementation

A Multicycle Datapath [Figure: multicycle datapath]

Key Difference #1: Only 1 Memory

Key Difference #2: Only 1 ALU

Key Difference #3: Temp Regs. What benefit are tmp regs / multicycle? Without them the critical path is long => large clock period; with them there are smaller critical paths => shorter clock period. Let’s examine these one at a time.

IR: Instruction Register: holds inst encoding

MDR: Memory Data Register: holds the value returned from Memory

OpA and OpB: hold values from the register file

ALUout: holds the result calculated by the ALU

Cycle by Cycle Operation

All Insts Cycle1: Fetch and Increment PC: IR ← mem[PC]; PC ← PC + 1 (fetch next inst into the IR, increment PC)

All Insts Cycle2: Decoding Inst & Reading Reg File: OpA ← rx; OpB ← ry. Note: not all insts need OpA and OpB

Add, Sub, Nand Cycle3: Calculate: ALUout ← OpA op OpB

Add, Sub, Nand Cycle4: Write to Reg File: rx ← ALUout

Shift Cycle3: Calculate: ALUout ← OpA op IMM3

Shift Cycle4: Write to Reg File: rx ← ALUout

ORI Cycle3: Read r1 from Reg File: OpA ← r1

ORI Cycle4: Calculate: ALUout ← OpA op IMM5

ORI Cycle5: Write to Reg File: r1 ← ALUout

Load Cycle3: addr to Mem, value into MDR: MDR ← mem[OpB]

Load Cycle4: write value into reg file: rx ← MDR

Store Cycle3: addr to Mem, value to Mem: mem[OpB] ← OpA

Branches Cycle3: if the branch condition holds: PC ← PC + SE8(IMM4)

Summary
Instructions            Single Cycle (Eg: 1 MHz)   Multicycle (Eg: 4 MHz)
Store, BZ, BNZ, BPZ     1 cycle                    3 cycles
Add, Sub, Nand, Load    1 cycle                    4 cycles
ORI                     1 cycle                    5 cycles
Example: total time to execute one of each instruction:
Single cycle: 1*4 + 1*4 + 1*1 = 9 cycles; 9 cycles / 1 MHz = 9 us
Multicycle: 3*4 + 4*4 + 1*5 = 33 cycles; 33 cycles / 4 MHz = 8.25 us
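The comparison above is just cycles divided by clock rate. The sketch below redoes the slide's arithmetic; it follows the slide's tally of nine instructions (shift does not appear in its table), and the constant and function names are hypothetical.

```python
SINGLE_CYCLE_HZ = 1_000_000   # slide's example: 1 MHz
MULTICYCLE_HZ = 4_000_000     # slide's example: 4 MHz

# Multicycle cycle counts, as listed in the summary table.
MULTICYCLE_CYCLES = {
    "store": 3, "bz": 3, "bnz": 3, "bpz": 3,
    "add": 4, "sub": 4, "nand": 4, "load": 4,
    "ori": 5,
}


def total_time_us(cycles, clock_hz):
    """Time in microseconds to run `cycles` cycles at `clock_hz`."""
    return cycles * 1e6 / clock_hz


single_cycles = len(MULTICYCLE_CYCLES)          # 1 cycle per instruction
multi_cycles = sum(MULTICYCLE_CYCLES.values())

single_us = total_time_us(single_cycles, SINGLE_CYCLE_HZ)
multi_us = total_time_us(multi_cycles, MULTICYCLE_HZ)
```

With these numbers the multicycle design needs more cycles (33 vs 9) but finishes slightly sooner (8.25 us vs 9 us) because its clock is four times faster.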

Implementing Multicycle Control
Cycle 1 (all insts): IR = mem[PC]; PC = PC + 1
Cycle 2 (all insts): OpA = RF[rx]; OpB = RF[ry]
Cycle 3:
  Add, Sub, Nand: ALUout = OpA op OpB
  Shift: ALUout = OpA shift Imm3
  Ori: OpA = RF[1]
  Load: MDR = mem[OpB]
  Store: Mem[OpB] = OpA
  Bnz, Bz, Bpz: PC = PC + SE(Imm4)
Cycle 4:
  Add, Sub, Nand, Shift: RF[rx] = ALUout
  Ori: ALUout = OpA OR Imm5
  Load: RF[rx] = MDR
  Store, Bnz, Bz, Bpz: X
Cycle 5:
  Ori: RF[1] = ALUout

Control: An FSM need a state transition diagram how many states are there? how many bits to represent state?
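One way to approach the state-count question is to give every distinct action in the multicycle cycle table its own state and count them. The sketch below does that with one possible state assignment; the state names and the choice to share a single write-back state between add/sub/nand and shift are assumptions, and other assignments would give a different count.

```python
import math

# State sequence per instruction class, following the cycle-by-cycle table.
SEQUENCES = {
    "add/sub/nand": ["fetch", "decode", "alu_op", "write_aluout"],
    "shift": ["fetch", "decode", "alu_shift", "write_aluout"],
    "ori": ["fetch", "decode", "ori_read_r1", "ori_alu", "ori_write_r1"],
    "load": ["fetch", "decode", "load_mem", "load_write"],
    "store": ["fetch", "decode", "store_mem"],
    "branch": ["fetch", "decode", "branch_pc"],
}

STATES = sorted({state for seq in SEQUENCES.values() for state in seq})
STATE_BITS = math.ceil(math.log2(len(STATES)))


def next_state(current, inst_class):
    """Advance within the instruction's sequence; wrap to fetch when done."""
    seq = SEQUENCES[inst_class]
    index = seq.index(current)
    return seq[index + 1] if index + 1 < len(seq) else "fetch"
```

Under this assignment there are 12 states, which fits in a 4-bit state register.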

Multicycle Control as an FSM

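Multicycle Control Hardware: the ctrl logic takes IR[3..0], N, Z and Current_state (from a 4-bit State Register) as inputs; it outputs the control signals (PCwrite, PCsel, ALUop, …) and Next_state, which is fed back into the State Register.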

ECE243 CPU: Adding a New Instruction

EXAMPLE QUESTION: ADDING A NEW INSTRUCTION. Implement a post-increment load: Load rx, (ry)+. Does: RF[rx] = MEM[RF[ry]]; RF[ry] = RF[ry] + 1 (ry is permanently changed to be ry+1)

Implementing: RF[rx] = MEM[RF[ry]]; RF[ry] = RF[ry] + 1. Recall the cycles of load rx, (ry): (1) IR = mem[PC], PC = PC + 1; (2) OpA = RF[rx], OpB = RF[ry]; (3) MDR = mem[OpB]; (4) RF[rx] = MDR

Modifying the Datapath: what must change to support RF[ry] = RF[ry] + 1? [Figure: multicycle datapath]

ECE243 CPU: Pipelining

A Fast-Food Sandwich Shop: steps: take order, select bun, add ingredients, wrap and bag, cash and change

With One Cook: one customer is serviced at a time; the cook performs every step for customer1 before starting on the next customer

Like the single-cycle CPU: one instruction (e.g., add r1, r2) flows through at a time [Figure: single-cycle datapath]

With Two Cooks? [Figure: the five steps divided between two cooks]

Pipelining: like an assembly line; doesn’t change the interface or result; improves performance

Pipelining a CPU (rough idea) [Figure: single-cycle datapath divided into two stages]

Pipelining Details: [Figure: datapath with the control signals MEMwrite, MEMread, OpASel, REGwrite, RFin and the N, Z flags carried between stages]

With Three Cooks? [Figure: the five steps divided among three cooks]

Pipelining a CPU (rough idea) [Figure: single-cycle datapath divided into three stages]

Visualizing Pipelining: three stages: Fetch (inst mem), Decode (reg file), Execute (ALU and data mem). [Table: rows = cycles 1..4; columns = Fetch, Decode, Execute]

Visualizing Pipelining (again): same three stages. [Table: rows = inst1..inst4; columns = cycles 1..5]
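The two tables above show the same schedule from different angles. A minimal sketch that generates it for an ideal pipeline (no stalls, one instruction entering per cycle) is given below; the function name `pipeline_schedule` is hypothetical.

```python
STAGES = ["Fetch", "Decode", "Execute"]


def pipeline_schedule(instructions):
    """Return {cycle: {stage: instruction}} for an ideal 3-stage pipeline."""
    schedule = {}
    for i, inst in enumerate(instructions):
        for s, stage in enumerate(STAGES):
            cycle = i + s + 1            # inst i enters Fetch in cycle i + 1
            schedule.setdefault(cycle, {})[stage] = inst
    return schedule


schedule = pipeline_schedule(["inst1", "inst2", "inst3", "inst4"])
```

Each cycle holds up to three instructions at once, one per stage; four instructions finish in 6 cycles instead of the 12 stage-steps they need one at a time.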

Fast Food Hazards: What if: c1 and c2 are friends, c2 has no money, and c2 needs to know how much change c1 will get before ordering (to ensure c2 can afford his order)? [Figure: three cooks serving customer1, customer2, customer3]

Fast Food Hazards [Figure: customer2 waits behind customer1]

CPU Hazards: a dependence like this between instructions in the pipeline is called a data hazard; it must be observed to ensure correct execution. There are two solutions to data hazards.

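Solution1: Stalling. Stages: Fetch (inst mem), Decode (reg file), Execute (ALU and data mem). Example sequence over cycles 1..5: add r1,r2; add r3,r1; sub r0,r2; add r2,r2 (add r3,r1 needs the r1 produced by add r1,r2). [Table: rows = cycles 1..5; columns = Fetch, Decode, Execute]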

How to insert bubbles
option1: hardware stalls the pipeline; need extra logic to do so; happens ‘automatically’ for any code
option2: compiler inserts “no-ops”; a no-op is an instruction that does nothing, ex: add r0,r0,r0 (NIOS); compiler must do it right or wrong results!
example: inserting a bubble with a no-op: add r1, r2; noop; add r3, r1
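The compiler-side option can be sketched as a pass that inserts a no-op whenever an instruction reads the register written by the instruction immediately before it. This is an illustration only: it assumes a single bubble is enough (as in the slide's example), and the tuple format, `two_operand` helper and `insert_noops` function are hypothetical.

```python
NOOP = ("noop", None, ())


def two_operand(mnemonic, opa, opb):
    """Mini-ISA style 'op OpA OpB': writes OpA, reads OpA and OpB."""
    return (mnemonic, opa, (opa, opb))


def insert_noops(program):
    """program: list of (mnemonic, dest, sources). Returns a padded copy."""
    padded = []
    prev_dest = None
    for inst in program:
        _, dest, sources = inst
        if prev_dest is not None and prev_dest in sources:
            padded.append(NOOP)          # bubble between producer and consumer
        padded.append(inst)
        prev_dest = dest
    return padded


program = [
    two_operand("add", "r1", "r2"),
    two_operand("add", "r3", "r1"),      # reads r1 written just before
    two_operand("sub", "r0", "r2"),
    two_operand("add", "r2", "r2"),
]
padded = insert_noops(program)
```

For the four-instruction sequence used on the hazard slides, only the second instruction triggers a bubble.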

Solution2: Forwarding Lines: add “forwarding” logic to pass values directly between stages. Stages: Fetch (inst mem), Decode (reg file), Execute (ALU and data mem). Example sequence over cycles 1..5: add r1,r2; add r3,r1; sub r0,r2; add r2,r2. [Table: rows = cycles 1..5; columns = Fetch, Decode, Execute]

Control Hazards: example sequence over cycles 1..5: add r1,r2; bnz -2; add r3,r1; add r2,r2.
Simple: cpu predicts each branch is not taken.
Better: predict taken. Why? Loops are common, and loop branches are usually taken.
More advanced: a “branch predictor”: a table that remembers what each branch did the last time and uses this to make a prediction next time.
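The "remember what each branch did last time" idea can be sketched as a table keyed by the branch's address. This is a simplified model: a real predictor has a fixed-size table, and the class name, the default prediction and the `count_mispredictions` helper are assumptions made for illustration.

```python
class LastOutcomePredictor:
    """Predict that each branch repeats its most recent outcome."""

    def __init__(self, default_taken=False):
        self.table = {}                  # branch address -> last outcome
        self.default_taken = default_taken

    def predict(self, pc):
        return self.table.get(pc, self.default_taken)

    def update(self, pc, taken):
        self.table[pc] = taken


def count_mispredictions(predictor, outcomes):
    """outcomes: list of (branch address, actually taken?)."""
    misses = 0
    for pc, taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# A loop branch at a hypothetical address: taken 4 times, then falls through.
loop_branch = [(0x85, True)] * 4 + [(0x85, False)]
misses = count_mispredictions(LastOutcomePredictor(), loop_branch)
```

On this loop the table mispredicts only the first and last iterations, whereas always predicting not taken would miss all four taken iterations.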

Some Real CPU Pipelines: 21264 Pipeline (Alpha), Microprocessor Report 10/28/96. Pentium IV’s Pipeline: TC nxt IP, TC fetch, Drv, Alloc, Rename, Que, Sch, Disp, RF, Ex, Flgs, BrCk

ECE243 CPU: Alternate Architectures

ANOTHER MULTICYCLE CPU [Figure: bus-based CPU: CONTROL (sends control signals to all components), IR, PC, MDR, MAR, Regs r0..r3, temp regs Y and Z, ALU (ALUop) with an input select mux, Imm3,4,5, and MEM (addr, Din, Dout, MEMRead, MEMWrite), all connected by an internal bus]

SOME CONTROL SIGNALS:
PCout: write PC value to bus
PCin: read bus value into PC
MDRinBus: read value from bus into MDR
MDRinMem: write value from Dout of MEM into MDR
MDRoutBus: write value from MDR onto bus

Ex: Ctrl: Add r1, r2 # r1 = r1 + r2 [Figure: the bus-based CPU, used to work out the control-signal sequence for this instruction]

CHARACTERIZATION OF ISAs
Attribute #1: number of explicit operands
Attribute #2: are registers general purpose?
Attribute #3: can an operand be a memory location?
Attribute #4: RISC vs CISC
Attribute #5: relation between instructions and data

att1: num of explicit operands. Focus on calculation instructions (add, sub, …). Running example: A = B + C (C-code); assume A, B, C are memory locations.

0 operands: eg., stack based (like first calculator CPUs); push and pop operations, refer to top of stack

1 operand: eg., accumulator based; accumulator is a reg inside cpu; instructions use accum as destination

2 operands: eg: 68k, ia32

3 operands: eg: MIPS, SPARC, PowerPC. How many operands is NIOS?

Att2: are regs general purpose?
if yes: you can use any register for any purpose; special registers are by convention only
if no: some registers have hardwired purposes; ex: in 68k, A7 is hardwired to be stack pointer, used implicitly for jsr, rts, link instructions
Are NIOS registers general purpose?

Att3: operand = mem location? (with respect to calculation insts: add, sub)
if yes: one operand can be in memory, the other in a register; maybe: can also write result to memory
if no: called a load/store architecture; only load/store insts can get/put memory values to/from regs
Can a NIOS operand be a mem location?

Att4: RISC vs CISC. Are there instructions with many steps? A vague and debatable question.
CISC: complex instruction set computer; many, complex instructions; can be hard to pipeline! ex: 68k, x86, PowerPC?
RISC: reduced instruction set computer; fewer, simple instructions; easy to pipeline; ex: MIPS, Alpha, PowerPC?
Which is NIOS?
Quandary: x86 is a CISC but Pentium IV has a 20-stage pipeline! How’d they do it?

Att5: Relation bet. insts & data
SISD: single instruction, single data; everything we have seen so far; an inst only writes one reg/memory location
SIMD: single instruction, multiple data; one instruction tells CPU to operate on an array of regs or memory locations; ex: multimedia extensions: MMX, SSE (Intel), 3DNow! (AMD); AltiVec (PowerPC); ex: IBM/Sony/Toshiba Cell processor (vector processor)
MIMD: multiple instruction, multiple data; ex: cluster of workstations, SMP servers, multicores, hyperthreading
Which is NIOS?