ISA Design for the Project CS 3220 Fall 2014 Hadi Esmaeilzadeh Georgia Institute of Technology Some slides adopted from Prof. Milos Prvulovic

Project ISA
• Who are the players?
  – Are we doing HW/SW co-design?
• We will be designing a processor, so we need an ISA
• What do we want in our ISA?
  – Easy to decode (you’ll have to write this in Verilog)
  – Easy to write an assembler for (you’ll have to write one)
  – Easy to write applications for (you’ll do this, too)
• Similar tradeoffs are involved in designing real CPUs
  – Plus backward compatibility
  – But for CS 3220 we don’t want backward compatibility!
  – It encourages laziness and cheating (Verilog code may already be posted somewhere)

ISA decisions
• CISC or RISC?
  – Definitely RISC (much easier to design)
• Fixed-size or variable-size instructions?
  – Definitely fixed (fetch and decode are much easier)
• How many things can be read or written per instruction?
  – Each register read beyond the first complicates the register file
  – Each register write beyond the first complicates the register file a lot!
  – Each memory read or write beyond the first creates lots of problems (memory ports, pipeline stages, hazards)

Which instructions? Memory!
• How will we access memory?
  – Do we use only LD/ST, or do we allow memory operands in other kinds of instructions?
• Only LD/ST is far simpler to implement because:
  – Memory operands in ADD, SUB, etc. require many “flavors” of each instruction, which are tough to decode (and we need to describe the entire decoding logic in Verilog)
  – We don’t want multiple memory accesses per instruction! Even one memory stage in the pipeline is complex enough
• OK, we’ll have LW and SW

Which instructions? ALU!
• Let’s have some arithmetic
  – ADD, SUB, what else?
• How about some logic?
  – Option 1: AND, OR, NOT, XOR, etc.
  – Option 2: Let’s just have one! Which one? NAND! We can “fake” the others using NAND, e.g. “NOT A” is “A NAND A”
  – Let’s use Option 1 but not go overboard: it is easier to write the assembler and easier to decode, but we leave room (unused opcodes) for more
• Comparisons? It depends…
  – Option 1: Conditional branches do comparisons
  – Option 2: Comparison instructions plus one conditional branch
  – Option 3: A mix of the two
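To see what Option 2 would cost, here is a minimal Python sketch (not part of the project ISA; all function names are hypothetical) that builds NOT, AND, OR and XOR out of a single 32-bit NAND, the way an assembler or compiler would have to if NAND were the only logic instruction.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide in this project


def nand(a: int, b: int) -> int:
    """The only logic primitive under Option 2: bitwise NAND of two 32-bit values."""
    return ~(a & b) & MASK32


def not_(a: int) -> int:
    return nand(a, a)                        # 1 instruction: NOT A = A NAND A


def and_(a: int, b: int) -> int:
    return not_(nand(a, b))                  # 2 instructions


def or_(a: int, b: int) -> int:
    return nand(not_(a), not_(b))            # 3 instructions (De Morgan)


def xor(a: int, b: int) -> int:
    t = nand(a, b)                           # 4 instructions in total
    return nand(nand(a, t), nand(b, t))


if __name__ == "__main__":
    a, b = 0x0F0F00FF, 0x00FF0F0F
    assert not_(a) == (~a & MASK32)
    assert and_(a, b) == (a & b)
    assert or_(a, b) == (a | b)
    assert xor(a, b) == (a ^ b)
    print("NAND-only versions match the native operators")
```

The instruction counts in the comments are one reason to prefer Option 1: every AND, OR or XOR would otherwise cost 2 to 4 instructions.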

Speaking of branches…
• Conditional branches
  – PC-relative, so we need a decent-sized offset operand
  – Hard to write if-then-else and loops if a branch can only go, e.g., 3 instructions forward or back
• How will we call procedures?
  – Option 1: A special branch that saves the return address
  – Option 2: Save the return address (RA) in software, use a normal branch
• How will we return from procedures?
  – Option 1: A specialized “RET”
  – Option 2: Jump to an address in a register (JR)
• Let’s have only one call/jump/return instruction for now!
  – Similar to the JALR instruction from CS 2200
  – Syntax would be JAL Rdst,Imm(Rsrc)

Conditional branches?
• Typical conditional branches:
  – BEQ R1,R2,Label ; Go to Label if R1==R2
    Can also have BLT, BLE, BNE, BGT, BGE
    Need to encode two registers in the instruction
  – BEQZ R1,Label ; Go to Label if R1==0
    Can also have BNEZ, BLEZ, etc.
    Need to encode only one register in the instruction (so we can have a 6-bit offset)
• Could have an implicit operand, e.g. always R1:
  – BEQZ Label ; If R1==0 go to Label
  – Bad: R1 won’t be very useful for anything else

How many registers?
• Need at least 2 to do ALU operations
• Plus one to be a stack pointer
• Plus one to save the return address
  – Unless we want to save it directly to memory
• Nice to have a few extra
  – One for the return value (to avoid saving it to the stack)
  – Some to pass parameters? Need at least 2 (more is even better)
• Need at least one for system use
  – We’ll work on this in the last two projects
• OK, this is already 8 or more, so let’s have 16
  – When writing code in assembler, we’ll see that more is better

Size of instruction word?
• Bits in the instruction word? Hmm, let’s see
  – Need room for the opcode: how many types of instructions do we have? Can have a secondary opcode for some (e.g. for ADD, SUB, etc.)
  – Need room for register operands: do we want 1, 2, or 3 of those? 3! With 16 registers (4 bits each), this uses 12 bits of the instruction word
  – Need room for immediate operands: the more the better, and too few bits will be a problem
• Let’s have a 32-bit instruction word
  – 8 is not really an option (not enough room)
  – 16 is very tight (with 16 registers, only 4 bits are left for the opcode)
  – So let’s do 32 (allows large offsets, more opcodes, etc.)

Register size?
• How about 8?
  – Will often need multi-word values (e.g. loop counters)
  – PC must be larger than this, so procedure calls get tricky
• Can we do with 16?
  – Most loops and programs will be OK
  – An immediate operand can load an entire constant (nice)
  – Can display an entire word on the HEX display
• But it makes sense to have 32-bit registers
  – Same as the instruction word
  – Almost never have to worry about overflows and such

Memory addressing?
• Byte-addressed or word-addressed?
• Word-addressed is simpler
  – Only need LD/ST instructions, vs. LW/SW, LB/SB, etc.
  – Don’t have to worry about alignment
• But
  – Hard to switch applications to byte addressing later
  – Can’t use, e.g., 16-bit memory locations
  – We can get most of the hardware simplicity if we require word alignment
• So we’ll have byte-addressed, aligned LW/SW only
  – Can drop the alignment limitation later if we want to
  – And can add LB/SB, LH/SH later if we want to
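A minimal Python model of the “byte-addressed, aligned LW/SW only” choice, assuming a hypothetical `DataMemory` class backed by 32-bit words (the 8kB default matches the dmem size chosen later in this deck). Requiring alignment means the two low address bits are always zero, so the word index is just the byte address shifted right by 2, which is where the hardware simplicity comes from.

```python
class DataMemory:
    """Word-organized storage accessed with byte addresses (hypothetical model)."""

    def __init__(self, size_bytes: int = 8 * 1024):
        self.words = [0] * (size_bytes // 4)   # one entry per 32-bit word

    @staticmethod
    def word_index(addr: int) -> int:
        """Convert a byte address to a word index, enforcing word alignment."""
        if addr % 4 != 0:
            raise ValueError(f"unaligned word access at byte address {addr:#x}")
        return addr >> 2                        # the low two bits are always 0

    def lw(self, addr: int) -> int:
        return self.words[self.word_index(addr)]

    def sw(self, addr: int, value: int) -> None:
        self.words[self.word_index(addr)] = value & 0xFFFFFFFF


if __name__ == "__main__":
    mem = DataMemory()
    mem.sw(0x40, 0x12345678)
    print(hex(mem.lw(0x40)))   # 0x12345678
    try:
        mem.lw(0x42)           # not a multiple of 4
    except ValueError as err:
        print(err)
```

Adding LB/SB later would mean using the two low address bits to select a byte within the word instead of rejecting them.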

ISA definition
• How many bits for the opcode?
• For instructions with 3 register operands, 12 bits are already used
  – Great, that leaves 20 bits for the opcode! But…
• For instructions with 2 registers and 1 immediate operand
  – E.g. LW R1,-4(R2), ADDI R1,R2,64, BNE R1,R2,Label
  – The immediate and the opcode must fit in 24 bits (8 are used for register numbers)
• Let’s have a 16-bit immediate and a 4-bit opcode
  – Will make register number decoding a bit easier
  – Few “reach” issues in branches and LW/SW
  – Fairly large constants in ADDI, SUBI, ANDI, etc.
  – We have only 16 opcodes (LW, SW, …), which won’t be enough
  – We will need a trick called a “secondary opcode” to support more than 16 instructions
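The reach claims can be checked with a little arithmetic. This Python sketch (helper and constant names are hypothetical) sign-extends a 16-bit immediate field and computes the resulting ranges, assuming, as the instruction-format definition later in this deck specifies, that LW/SW add sxt(imm) to a byte address while branches use PC+4+sxt(imm)*4.

```python
def sxt16(field: int) -> int:
    """Sign-extend a 16-bit immediate field (what sxt(imm) does in hardware)."""
    field &= 0xFFFF
    return field - 0x10000 if field & 0x8000 else field


IMM_MIN, IMM_MAX = sxt16(0x8000), sxt16(0x7FFF)      # -32768 .. 32767

# LW/SW: the byte offset from Rs is the sign-extended immediate itself.
MEM_OFFSET_RANGE = (IMM_MIN, IMM_MAX)

# Branches: target = PC + 4 + 4*sxt(imm), measured here relative to PC.
BRANCH_RANGE = (4 + 4 * IMM_MIN, 4 + 4 * IMM_MAX)    # roughly +/-128 KiB

if __name__ == "__main__":
    print(hex(sxt16(0xFFFC) & 0xFFFFFFFF))   # the -4 of LW R1,-4(R2) as 32 bits
    print(MEM_OFFSET_RANGE, BRANCH_RANGE)
```

With the 8kB instruction memory chosen later in the deck, every branch target in a program is within reach, which is what “few reach issues” means in practice.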

Instruction Format Thus Far
    wire [31:0] iword;    // 32-bit instruction word
    wire [3:0] op1;       // Primary opcode
    wire [3:0] rd,rs,rt;  // Register operands
    wire [15:0] imm;      // 16-bit immediate operand
    assign {op1,rd,rs,rt,imm}=iword;
• Decoding of register numbers is trivial
• But… only 16 different instructions?
  – LW, SW (and leave room for LH, SH, LB, SB)
  – ADDI, ADD, SUB, AND, OR, XOR, NOT
  – BEQZ, BNEZ, JAL
  – This is already 16
• What if we want to add more later, e.g. MUL?
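The assembler and any software simulator need the same split. This Python sketch mirrors `assign {op1,rd,rs,rt,imm}=iword;` bit for bit (op1 in bits 31:28, then rd, rs, rt, and imm in 15:0); `split`, `join` and `Fields` are hypothetical helper names.

```python
from typing import NamedTuple


class Fields(NamedTuple):
    op1: int   # iword[31:28], primary opcode
    rd: int    # iword[27:24]
    rs: int    # iword[23:20]
    rt: int    # iword[19:16]
    imm: int   # iword[15:0], raw field (not sign-extended)


def split(iword: int) -> Fields:
    """Software version of: assign {op1,rd,rs,rt,imm}=iword;"""
    return Fields(op1=(iword >> 28) & 0xF,
                  rd=(iword >> 24) & 0xF,
                  rs=(iword >> 20) & 0xF,
                  rt=(iword >> 16) & 0xF,
                  imm=iword & 0xFFFF)


def join(op1: int, rd: int, rs: int, rt: int, imm: int) -> int:
    """Inverse of split(): pack the fields the way an assembler would."""
    return ((op1 & 0xF) << 28 | (rd & 0xF) << 24 | (rs & 0xF) << 20
            | (rt & 0xF) << 16 | (imm & 0xFFFF))


if __name__ == "__main__":
    word = join(op1=0x8, rd=1, rs=2, rt=0, imm=64)
    print(hex(word), split(word))
```

Because every field sits at a fixed position regardless of the opcode, decoding the register numbers really is just wiring in hardware, or shifts and masks in software.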

Primary/Secondary Opcode
• Have a smaller primary opcode (our four bits)
  – Instructions without an imm operand have 16 “free” bits: ADD Rd,Rs,Rt uses only 16 bits for the primary opcode and registers
  – Instructions with an imm but only two registers have 4 free bits:
    LW Rd,Imm(Rs) does not use the Rt field (also ADDI Rd,Rs,Imm, SUBI, etc.)
    SW Rt,Imm(Rs) does not use the Rd field (also BEQ Rs,Rt,Imm, etc.)
• Idea: use these extra bits for a secondary opcode
  – Uses only one primary opcode for a family of ALU instructions
  – Secondary opcode => the actual operation
• Primary opcode 0000 now means “3-register ALU instruction”
  – Imm field unused => the secondary opcode can be up to 16 bits
  – We’ll use only 6 for now (enough for many instructions), e.g. one value means NOP, another means ADD, etc.
• Primary opcode 1000 now means “2-register load instruction” (in the final opcode chart, Load ends up at 1001)
  – Secondary opcode in the Rt field (4 bits), e.g. one value means LW
• …

Assign Primary Opcodes
• Does it matter which instructions get which opcode?
  – E.g. ALU Rd,Rs,Rt is 0000, ALU Rd,Rs,Imm is 0001, etc.?
• Make the decoding easy!
  – After we read the primary opcode, we need to look at the secondary opcode to finish decoding
  – Let some opcode bits tell us where the op2 is!
• Assigning opcode numbers as a list is messy
  – So we use an opcode chart


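Opcode Chart
• We have 4-bit primary opcodes (2 x 2 bits): the more significant 2 bits select the row, the less significant 2 bits select the column
• op2 (the secondary opcode) = the specific operation; the row tells us where op2 is

  op1[3:2] \ op1[1:0]  |  00   |  01   |  10   |  11
  00 (op2 in Imm)      | ALUR  |       | CMPR  |
  01 (op2 in Rd)       |       | Store | Bcond |
  10 (op2 in Rt)       | ALUI  | Load  | CMPI  |
  11                   |       |       |       |

(4 Feb 2014, Project ISA)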
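The point of the chart is that locating op2 needs only the two high bits of op1. Here is a Python sketch of that step (the function name is hypothetical), using the field positions given by the instruction formats later in the deck: row 00 keeps op2 in Imm[3:0], row 01 in the Rd field, and row 10 in the Rt field.

```python
def op2_of(iword: int) -> int:
    """Return the secondary opcode; its location depends only on op1[3:2]."""
    op1 = (iword >> 28) & 0xF
    row = op1 >> 2                       # more significant 2 bits of op1
    if row == 0b00:                      # ALUR, CMPR: {op1,rd,rs,rt,12'b0,op2}
        return iword & 0xF
    if row == 0b01:                      # Store, Bcond: {op1,op2,rs,rt,imm}
        return (iword >> 24) & 0xF
    if row == 0b10:                      # ALUI, Load, CMPI, JAL: {op1,rd,rs,op2,imm}
        return (iword >> 16) & 0xF
    raise ValueError(f"op1={op1:04b}: row 11 is unassigned")


if __name__ == "__main__":
    store_word = 0x5 << 28 | 0x3 << 24 | 2 << 20 | 1 << 16 | 0xFFFC   # op1=0101
    load_word = 0x9 << 28 | 1 << 24 | 2 << 20 | 0x3 << 16 | 0xFFFC    # op1=1001
    print(op2_of(store_word), op2_of(load_word))   # 3 3
```

The less significant bits of op1 never enter this decision, so in hardware the op2 mux is driven by just two wires.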

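Load (op1=1001) op2 Chart
• We have a 4-bit secondary opcode instead of Rt (charted like the primary opcodes: more significant 2 bits by row, less significant 2 bits by column)
• Entries: LW, LH, LB (LH and LB: will add these later)
• Where LW sits in the chart is arbitrary (“Why not here? No particular reason!”)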

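Store (op1=0101) op2 Chart
• We have a 4-bit secondary opcode instead of Rd (charted like the primary opcodes: more significant 2 bits by row, less significant 2 bits by column)
• Entries: SW, SH, SB (SH and SB: will add these later)
• Where SW sits in the chart mirrors Load (“Why not here? Symmetry with Load!”)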

ALUR (op1=0000) op2 Chart
• 16-bit secondary opcode instead of Imm
  – We’ll keep the upper 12 bits ([15:4]) at zero and use only [3:0]. Why?
• Rows (more significant 2 bits of op2), entries in left-to-right order:
  – 00: ADD, SUB
  – 01: AND, OR, XOR
  – 10: (empty)
  – 11: NAND, NOR, NXOR

ALUI (op1=1000) op2 Chart
• 4-bit secondary opcode instead of Rt
  – Where should ADDI, SUBI, etc. go in this table?
• Rows (more significant 2 bits of op2), entries in left-to-right order:
  – 00: ADDI, SUBI
  – 01: ANDI, ORI, XORI
  – 10: (empty)
  – 11: NANDI, NORI, NXORI

CMP/CMPI/Bcond op2 Chart
• 4-bit secondary opcode (instead of Rd for Bcond)
  – All three have the same op2 decoding
• Rows (more significant 2 bits of op2), entries in left-to-right order:
  – 00: F (False), EQ, LT, LTE
  – 01: EQZ, LTZ, LTEZ
  – 10: T (True), NE, GTE, GT
  – 11: NEZ, GTEZ, GTZ
• False, True? Why 0000 for EQ? Why are GTE and GT swapped here?
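One reading of this chart, offered as an assumption rather than the official encoding: op2[1:0] picks the base test (F, EQ, LT, LTE), op2[2] means “compare against zero instead of Rt”, and op2[3] negates the result, so each entry in rows 10 and 11 is the opposite of the entry above it (NOT LT is GTE, NOT LTE is GT). This Python sketch evaluates a condition under that reading, additionally assuming signed 32-bit comparisons and that each Z variant sits in the same column as its non-Z counterpart; the function names are hypothetical.

```python
def to_signed32(x: int) -> int:
    """Interpret a 32-bit register value as two's complement."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def condition(op2: int, rs: int, rt: int) -> bool:
    """Evaluate a CMPR/CMPI/Bcond condition under the assumed op2 bit meanings."""
    a = to_signed32(rs)
    b = 0 if op2 & 0b0100 else to_signed32(rt)   # op2[2]: the ...Z variants use 0
    base = op2 & 0b0011                          # op2[1:0]: which test
    if base == 0b00:
        result = False                           # F (becomes T when negated)
    elif base == 0b01:
        result = a == b                          # EQ / EQZ
    elif base == 0b10:
        result = a < b                           # LT / LTZ
    else:
        result = a <= b                          # LTE / LTEZ
    return not result if op2 & 0b1000 else result   # op2[3]: negate


if __name__ == "__main__":
    print(condition(0b1010, 7, 3))            # GTE: 7 >= 3
    print(condition(0b0110, 0xFFFFFFFF, 0))   # LTZ: -1 < 0
```

Under this reading, CMPR/CMPI would write 1 or 0 to Rd depending on `condition(...)`, and Bcond would branch when it is true, so all three can share one comparator. The code 0100 (row 01, first column) stays unused.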

Constant into register?
• How would you put a 32-bit constant into a register?
  – Start with zero in a register (easy, e.g. XOR R1,R1,R1)
  – ADDI a 16-bit constant… OK, halfway there!
  – What now? Errr… shift up 16 places!
  – ADD R1,R1,R1 is R1<<1; just do this 16 times?
  – We’ll want to have proper shift instructions
  – To load a large constant: XOR, ADDI, SLL, ADDI
• Let’s add an MVHI instruction!
  – The upper 16 bits come from the immediate operand
  – What about the lower 16 bits? Zero them out!
  – Can do MVHI then ADDI to load a 32-bit constant
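A subtlety with “MVHI then ADDI”: ADDI adds sxt(imm) (the instruction format defined later in the deck sign-extends ALUI immediates), so whenever bit 15 of the low half is set, ADDI effectively subtracts 0x10000. An assembler can compensate by adding one to the MVHI immediate in that case. The Python sketch below (hypothetical helper names) computes both immediates and checks the result by simulating the two instructions with 32-bit wraparound.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF


def sxt16(field: int) -> int:
    """Sign-extend a 16-bit immediate field."""
    field &= 0xFFFF
    return field - 0x10000 if field & 0x8000 else field


def split_constant(value: int) -> Tuple[int, int]:
    """Return (mvhi_imm, addi_imm) so that MVHI followed by ADDI yields value."""
    value &= MASK32
    lo = value & 0xFFFF
    # Adding 0x8000 before taking the upper half rounds it up exactly when
    # lo >= 0x8000, cancelling the -0x10000 that sign extension introduces.
    hi = ((value + 0x8000) >> 16) & 0xFFFF
    return hi, lo


def run_mvhi_addi(hi: int, lo: int) -> int:
    """Simulate MVHI Rd,hi then ADDI Rd,Rd,lo on a 32-bit register."""
    rd = (hi << 16) & MASK32             # MVHI: upper 16 bits from imm, lower 16 zero
    return (rd + sxt16(lo)) & MASK32     # ADDI: add the sign-extended immediate


if __name__ == "__main__":
    hi, lo = split_constant(0x12348000)
    print(hex(hi), hex(lo), hex(run_mvhi_addi(hi, lo)))   # 0x1235 0x8000 0x12348000
```

So loading 0x12348000 needs MVHI with 0x1235 rather than 0x1234. Swapping ADDI for ORI would not avoid this, because ALUI immediates are all sign-extended.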

Adding MVHI to the ALUI op2 Chart
• Rows (more significant 2 bits of op2), entries in left-to-right order:
  – 00: ADDI, SUBI
  – 01: ANDI, ORI, XORI
  – 10: MVHI
  – 11: NANDI, NORI, NXORI

JAL?
• JAL Rd,Imm(Rs)
  – Rd = PC + 4
  – Jump to Rs + Imm
• Can’t be in the Bcond op2 table!
  – Does not do a comparison… but it is similar to B (Bcond with the True condition)
  – Writes to Rd! Can’t use Rd for op2!

JAL op1?
• JAL does not use Rt => it can go in the “op2 in Rt” row: op1=1011
  – Should we have an op2 for JAL? We are unlikely to have more JAL-like instructions… BUT!
  – Don’t waste opcodes! op1=1011 (op2 in Rt), op2=0000
• Updated primary chart, row 10 (op2 in Rt): ALUI (1000), Load (1001), CMPI (1010), JAL (1011)

Instruction Format
• {op1,rd,rs,rt,12'b0,op2}
  – Used when op1 is ALUR or CMPR
  – ALUR: rd = rs OP2 rt
  – CMPR: rd = (rs OP2 rt)?1:0
    Instruction mnemonics are F (False), T (True), EQ, NE, etc.
• {op1,op2,rs,rt,imm}
  – Used when op1 is Store or Bcond
  – Store: mem[rs + sxt(imm)] = rt
  – Bcond: if (rs OP2 rt) PC = PC + 4 + (sxt(imm)*4)
    Instruction mnemonics are BF, BT, BEQ, BNE, etc.
• {op1,rd,rs,op2,imm}
  – Used when op1 is ALUI, CMPI, Load, or JAL
  – ALUI: rd = rs OP2 sxt(imm)
  – CMPI: rd = (rs OP2 sxt(imm))?1:0
    Instruction mnemonics are FI, TI, EQI, NEI, etc.
  – Load: rd = mem[rs + sxt(imm)]
  – JAL: rd <= PC+4; PC <= rs+4*sxt(imm);
    Note the <= here! What should JAL R1,0(R1) do?
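The “Note the <=” is about simultaneous update: as with Verilog non-blocking assignments, both right-hand sides are evaluated from the old register values before anything is written. A minimal Python sketch under that reading (function names are hypothetical) makes the JAL R1,0(R1) case concrete: R1 receives PC+4, and the jump goes to the old value of R1.

```python
from typing import List

MASK32 = 0xFFFFFFFF


def sxt16(field: int) -> int:
    """Sign-extend a 16-bit immediate field."""
    field &= 0xFFFF
    return field - 0x10000 if field & 0x8000 else field


def exec_jal(regs: List[int], pc: int, rd: int, rs: int, imm: int) -> int:
    """JAL: rd <= PC+4; PC <= rs+4*sxt(imm). Returns the next PC."""
    link = (pc + 4) & MASK32                       # computed from the old PC
    target = (regs[rs] + 4 * sxt16(imm)) & MASK32  # computed from the old rs
    regs[rd] = link                                # only now update the register file
    return target


def bcond_next_pc(pc: int, taken: bool, imm: int) -> int:
    """Bcond: if taken, PC = PC+4+sxt(imm)*4, otherwise fall through to PC+4."""
    return (pc + 4 + (4 * sxt16(imm) if taken else 0)) & MASK32


if __name__ == "__main__":
    regs = [0] * 16
    regs[1] = 0x200
    new_pc = exec_jal(regs, pc=0x40, rd=1, rs=1, imm=0)   # JAL R1,0(R1)
    print(hex(new_pc), hex(regs[1]))                      # 0x200 0x44
```

Had the register been written first (blocking semantics), the jump would go to 0x44, i.e. JAL R1,0(R1) would simply fall through to the next instruction.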

Assembler syntax
• Instruction opcodes and register names
  – Are reserved words (can’t be used as labels)
  – May appear in either lowercase or uppercase
  – If there is a destination register, it is listed first
• Labels
  – Created using a name followed by “:” at the start of a line
  – Correspond to the address where the label is created
• Immediate operands: a number or a label
  – If a number: hex (C format, e.g. 0xffff) or decimal (may have a - sign)
  – If a label, just use the name of the label (without “:”)
    For PC-relative instructions (branches), the immediate field is (label_addr-PC-4)/4, because Bcond multiplies the immediate by 4
    For other instructions, the immediate field is the 16 least-significant bits of label_addr
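Here is a minimal Python sketch of how an assembler might turn an immediate operand into the 16-bit field (the helper names and the `labels` dictionary are hypothetical). For PC-relative operands it divides the byte distance by 4, since Bcond computes PC+4+sxt(imm)*4 and the field therefore counts instructions, not bytes.

```python
import re
from typing import Dict, Optional


def parse_number(tok: str) -> Optional[int]:
    """C-style hex (0xffff) or decimal with an optional '-'; None if neither."""
    if re.fullmatch(r"0[xX][0-9a-fA-F]+", tok):
        return int(tok, 16)
    if re.fullmatch(r"-?[0-9]+", tok):
        return int(tok, 10)
    return None


def encode_imm(tok: str, labels: Dict[str, int], pc: int, pc_relative: bool) -> int:
    """Return the 16-bit immediate field for operand tok in the instruction at pc."""
    value = parse_number(tok)
    if value is None:                        # not a number, so it must be a label
        addr = labels[tok]                   # undefined labels raise KeyError
        if pc_relative:
            offset = addr - pc - 4           # byte distance from PC+4
            if offset % 4:
                raise ValueError(f"branch target {tok} is not word-aligned")
            value = offset // 4              # Bcond multiplies the field by 4
            if not -0x8000 <= value <= 0x7FFF:
                raise ValueError(f"branch target {tok} is out of range")
        else:
            value = addr                     # keep the 16 least-significant bits
    return value & 0xFFFF


if __name__ == "__main__":
    labels = {"Loop": 0x40, "Done": 0x60}
    print(hex(encode_imm("Loop", labels, pc=0x48, pc_relative=True)))   # 0xfffd
    print(hex(encode_imm("-4", labels, pc=0, pc_relative=False)))       # 0xfffc
```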

Register Names
• Each register has multiple names
  – R0..R3 are also A0..A3 (function arguments, caller-saved)
  – R3 is also RV (return value, caller-saved)
  – R4..R5 are also T0..T1 (temporaries, caller-saved)
  – R6..R8 are also S0..S2 (callee-saved values)
  – R9 is reserved for assembler use
  – R10..R11 are reserved for system use (we’ll see later for what)
  – R12 is GP (global pointer)
  – R13 is FP (frame pointer)
  – R14 is SP (stack pointer)
  – R15 is RA (return address)
• The stack grows down; SP points to the lowest in-use address
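An assembler needs exactly this table. A small Python sketch (the names are hypothetical) that builds it and, following the assembler-syntax rule that register names may appear in either case, looks names up case-insensitively:

```python
from typing import Dict


def build_register_names() -> Dict[str, int]:
    """Map every register name (stored upper-case) to its register number."""
    names = {f"R{i}": i for i in range(16)}             # R0..R15
    names.update({f"A{i}": i for i in range(4)})        # A0..A3 = R0..R3
    names["RV"] = 3                                     # RV = R3
    names.update({f"T{i}": 4 + i for i in range(2)})    # T0..T1 = R4..R5
    names.update({f"S{i}": 6 + i for i in range(3)})    # S0..S2 = R6..R8
    names.update({"GP": 12, "FP": 13, "SP": 14, "RA": 15})
    return names                                        # R9..R11 have no alias


REGISTERS = build_register_names()


def reg_number(tok: str) -> int:
    """Look up a register operand; names are case-insensitive."""
    try:
        return REGISTERS[tok.strip().upper()]
    except KeyError:
        raise ValueError(f"not a register name: {tok!r}") from None
```

Because register names are reserved words, the same table can also be used to reject labels such as `SP:` or `a0:`.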

Assembler syntax
• .ORG <value>
  – Changes the “current” address to <value>
• .WORD <value>
  – Places a 32-bit word at the current address
  – <value> can be a number or a label name
  – If a label name, the value is the full 32-bit label_addr
• .NAME <name>=<value>
  – Defines a name (label) with a given value (number)
  – Otherwise we would have to name constants with a trick like “.ORG 1” followed by “One:”
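These directives are handled in the assembler’s first pass, which only needs to track the current address. A minimal Python sketch with a hypothetical `first_pass` function, assuming (as our own simplifications) that `;` starts a comment as in the earlier branch examples, that `.ORG` takes a number, and that every instruction, pseudo-instruction, and `.WORD` occupies exactly 4 bytes (true for the pseudo-instructions listed in this deck, which each expand to one instruction):

```python
from typing import Dict, List


def first_pass(lines: List[str]) -> Dict[str, int]:
    """Assign addresses to labels and record .NAME constants."""
    symbols: Dict[str, int] = {}
    addr = 0
    for raw in lines:
        line = raw.split(";", 1)[0].strip()       # drop comments and whitespace
        if ":" in line:                           # "Label:" at the start of a line
            name, line = line.split(":", 1)
            symbols[name.strip()] = addr          # label = current address
            line = line.strip()
        if not line:
            continue
        parts = line.split(None, 1)
        directive = parts[0].upper()
        rest = parts[1] if len(parts) > 1 else ""
        if directive == ".ORG":
            addr = int(rest.strip(), 0)           # base 0 accepts 0x.. or decimal
        elif directive == ".NAME":
            name, value = rest.split("=", 1)
            symbols[name.strip()] = int(value.strip(), 0)
        else:                                     # .WORD or any instruction
            addr += 4
    return symbols
```

A second pass can then encode each instruction using these symbols, so forward references such as a branch to a later label resolve correctly.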

Pseudo-instructions
• Do not actually exist in the ISA
  – The assembler translates them into existing instructions
  – They can use R9 (see RET and JMP below); that’s why we reserved it for assembler use
• We will have (for now):
  – NOT Ri,Rj => NAND Ri,Rj,Rj
  – CALL Imm(Ri) => JAL RA,Imm(Ri)
  – RET => JAL R9,0(RA)
  – JMP Imm(Ri) => JAL R9,Imm(Ri)
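Because each of these pseudo-instructions maps to exactly one real instruction, the expansion can be a purely textual rewrite before encoding. A Python sketch (the function name is hypothetical; it assumes any label has already been stripped from the line):

```python
def expand_pseudo(line: str) -> str:
    """Rewrite one pseudo-instruction into the real instruction it stands for."""
    parts = line.strip().split(None, 1)
    if not parts:
        return ""
    op = parts[0].upper()
    args = [a.strip() for a in parts[1].split(",")] if len(parts) > 1 else []
    if op == "NOT" and len(args) == 2:        # NOT Ri,Rj => NAND Ri,Rj,Rj
        ri, rj = args
        return f"NAND {ri},{rj},{rj}"
    if op == "CALL" and len(args) == 1:       # CALL Imm(Ri) => JAL RA,Imm(Ri)
        return f"JAL RA,{args[0]}"
    if op == "RET" and not args:              # RET => JAL R9,0(RA)
        return "JAL R9,0(RA)"                 # R9 just absorbs the unused link value
    if op == "JMP" and len(args) == 1:        # JMP Imm(Ri) => JAL R9,Imm(Ri)
        return f"JAL R9,{args[0]}"
    return line.strip()                       # a real instruction: leave it alone
```

RET and JMP still have to write some register, because JAL always does, so they clobber R9; that is safe only because R9 is reserved for the assembler.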

Memory?
• Separate instruction and data memory?
  – Good: our design will be faster and cheaper
  – Bad: how does one load programs into memory?
• We’ll have separate imem and dmem for now
  – We’ll see later how to unify them
• How much memory?
  – There are 239,616 memory bits on-chip, so:
  – 8kB (2048 32-bit words) of imem
  – 8kB (2048 32-bit words) of dmem
  – That leaves about half of the memory bits on the FPGA chip (for the register file, debugging in SignalTap, etc.)

Input/Output?
• We want our programs to
  – Read SW, KEY (so we can interact with them)
  – Write to HEX, LEDR, LEDG
  – Maybe some more I/O
• Need instructions for this!
  – A special instruction for each device, e.g. “WRLEDG”: extensions are hard (the processor changes as each device is added)
  – Special IN/OUT instructions: assign “addresses” to devices, then use IN/OUT to read/write
  – Memory-mapped I/O (this is what we’ll use): each device gets a memory address, so LW/SW can be used for I/O. Those memory locations can’t be used as normal memory!
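To make memory-mapped I/O concrete, here is a Python sketch of the load/store path with a hypothetical address map: the device addresses (everything at or above `IO_BASE`) are placeholders, not the project’s actual addresses. Stores to device addresses update the device instead of memory, and loads from them return device state. The field widths follow the Assignment 2 description (LEDR bits 9..0, LEDG bits 7..0, 4 KEY bits, 10 SW bits); the 16-bit HEX width is an assumption based on the register-size discussion.

```python
class IoBus:
    """Routes LW/SW either to data memory or to a device (hypothetical map)."""

    IO_BASE = 0xF000_0000            # hypothetical: addresses >= this are devices
    HEX_ADDR = 0xF000_0000           # hypothetical device addresses
    LEDR_ADDR = 0xF000_0004
    LEDG_ADDR = 0xF000_0008
    KEY_ADDR = 0xF000_0010
    SW_ADDR = 0xF000_0014

    def __init__(self, mem_words: int = 2048):
        self.mem = [0] * mem_words   # 8kB of 32-bit words
        self.hex = self.ledr = self.ledg = 0
        self.keys_pressed = 0        # 4 bits, 1 = pressed (i.e. !KEY)
        self.switches = 0            # 10 bits, already debounced (SWd)

    def store(self, addr: int, value: int) -> None:
        if addr < self.IO_BASE:
            self.mem[addr >> 2] = value & 0xFFFFFFFF
        elif addr == self.HEX_ADDR:
            self.hex = value & 0xFFFF       # assumed: four hex digits
        elif addr == self.LEDR_ADDR:
            self.ledr = value & 0x3FF       # bits 9..0
        elif addr == self.LEDG_ADDR:
            self.ledg = value & 0xFF        # bits 7..0
        # stores to other device addresses are ignored in this sketch

    def load(self, addr: int) -> int:
        if addr < self.IO_BASE:
            return self.mem[addr >> 2]
        if addr == self.KEY_ADDR:
            return self.keys_pressed & 0xF  # {28'b0, !KEY}
        if addr == self.SW_ADDR:
            return self.switches & 0x3FF    # {22'b0, SWd}
        return 0                            # unmapped device address
```

The single comparison against `IO_BASE` is the whole cost of memory-mapped I/O here, and it also shows the downside from the slide: those addresses can no longer be used as normal memory.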

Prelude to Assignment 2
• Write an assembler
  – Reads an assembler listing for this project ISA, including pseudo-instructions
  – Outputs a file with 32-bit words of memory in the .mif file format (Test2.mif, Sorter2.mif)
• Verilog design of a multi-cycle processor
  – Implements this ISA; PC starts at (byte address) 0x40
  – Uses Sorter2.mif to pre-load its 8kB memory
  – SW to address 0xF displays bits as hexadecimal digits on the HEX display
  – SW to address 0xF displays bits 9..0 on LEDR
  – SW to address 0xF displays bits 7..0 on LEDG
  – LW from address 0xF reads the KEY state
    The result of LW should be 0 when no KEY is pressed and 0xF when all are pressed, so LW actually needs to return {28'b0,!KEY}
  – LW from address 0xF reads the SW state
    The 32-bit value we read should really be {22'b0,SWd}, where SWd is a debounced value of SW
• Don’t panic (yet)! We will do much of the design in lectures!
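For the assembler’s output, a .mif (Quartus Memory Initialization File) is plain text: a short header giving the word width, depth, and radixes, then `address : value;` lines between `CONTENT BEGIN` and `END;`. A minimal Python sketch (the function name is hypothetical) that turns a map of byte addresses to 32-bit words into such text, assuming 8kB = 2048 words and that MIF addresses count words, so code at byte address 0x40 lands at word 0x10:

```python
from typing import Dict


def to_mif(words: Dict[int, int], depth: int = 2048) -> str:
    """Render {byte_address: 32-bit word} as .mif text with `depth` 32-bit words."""
    by_index = {}
    for addr, value in words.items():
        if addr % 4:
            raise ValueError(f"unaligned byte address {addr:#x}")
        index = addr >> 2                             # MIF addresses count words
        if not 0 <= index < depth:
            raise ValueError(f"byte address {addr:#x} is outside the memory")
        by_index[index] = value & 0xFFFFFFFF
    lines = ["WIDTH=32;", f"DEPTH={depth};",
             "ADDRESS_RADIX=HEX;", "DATA_RADIX=HEX;", "CONTENT BEGIN"]
    for i in range(depth):                            # every word, zero by default
        lines.append(f"{i:04X} : {by_index.get(i, 0):08X};")
    lines.append("END;")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = to_mif({0x40: 0x81200040})
    print("\n".join(text.splitlines()[:6]))   # header plus the first word
```

Writing the result is then a single call, e.g. opening Test2.mif for writing and passing it the returned text.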