Han Wang CS3410, Spring 2012 Computer Science Cornell University

Han Wang CS3410, Spring 2012 Computer Science Cornell University
Processor Han Wang CS3410, Spring 2012 Computer Science Cornell University See P&H Chapter , 4.1-4

Announcements Project 1 Available Design Document due in one week.
Final Design due in three weeks. Work in group of 2. Use your resources FAQ, class notes, book, Lab sections, office hours, Piazza Make sure you Registered for class, can access CMS, have a Section, and have a project partner Check online syllabus/schedule, review slides and lecture notes, Office Hours.

MIPS Register file MIPS register file r1
32 registers, 32-bits each (with r0 wired to zero) Write port indexed via RW Writes occur on falling edge but only if WE is high Read ports indexed via RA, RB r1 r2 … r31 W A 32 32 B 32 clk 5 5 5 WE RW RA RB

MIPS Register file Registers Numbered from 0 to 31.
Each register can be referred by number or name. $0, $1, $2, $3 … $31 Or, by convention, each register has a name. $16 - $23  $s0 - $s7 $8 - $15  $t0 - $t7 $0 is always $zero. Patterson and Hennessy p121.

MIPS Memory MIPS Memory memory Up to 32-bit address
32-bit data (but byte addressed) Enable + 2 bit memory control 00: read word (4 byte aligned) 01: write byte 10: write halfword (2 byte aligned) 11: write word (4 byte aligned) memory 32 32 ≤ 32 2 addr mc E

Basic Computer System Let’s build a MIPS CPU
…but using (modified) Harvard architecture CPU Data Memory Registers ... Control data, address, control ALU Program Memory ... The Harvard architecture is a computer architecture with physically separate storage and signal pathways for instructions and data Under pure von Neumann architecture the CPU can be either reading an instruction or reading/writing data from/to the memory. Both cannot occur at the same time since the instructions and data use the same bus system. In a computer using the Harvard architecture, the CPU can both read an instruction and perform a data memory access at the same time, even without a cache. A Harvard architecture computer can thus be faster for a given circuit complexity because instruction fetches and data access do not contend for a single memory pathway . Also, a Harvard architecture machine has distinct code and data address spaces: instruction address zero is not the same as data address zero. Instruction address zero might identify a twenty-four bit value, while data address zero might indicate an eight bit byte that isn't part of that twenty-four bit value. A modified Harvard architecture machine is very much like a Harvard architecture machine, but it relaxes the strict separation between instruction and data while still letting the CPU concurrently access two (or more) memory buses. The most common modification includes separate instruction and data caches backed by a common address space. While the CPU executes from cache, it acts as a pure Harvard machine. When accessing backing memory, it acts like a von Neumann machine (where code can be moved around like data, a powerful technique). This modification is widespread in modern processors such as the ARM architecture and X86 processors. It is sometimes loosely called a Harvard architecture, overlooking the fact that it is actually "modified".

Agenda Stages of datapath Datapath Walkthroughs
Detailed design / Instruction format

Levels of Interpretation
for (i = 0; i < 10; i++) printf(“go cucs”); main: addi r2, r0, 10 addi r1, r0, 0 loop: slt r3, r1, r2 ... High Level Language C, Java, Python, Ruby, … Loops, control flow, variables Assembly Language No symbols (except labels) One operation per statement Machine Langauge Binary-encoded assembly Labels become addresses ALU, Control, Register File, … Machine Implementation

Instruction Set Architecture
Instruction Set Architecture (ISA) Different CPU architecture specifies different set of instructions. Intel x86, IBM PowerPC, Sun Sparc, MIPS, etc. MIPS ≈ 200 instructions, 32 bits each, 3 formats mostly orthogonal all operands in registers almost all are 32 bits each, can be used interchangeably ≈ 1 addressing mode: Mem[reg + imm] x86 = Complex Instruction Set Computer (ClSC) > 1000 instructions, 1 to 15 bytes each operands in special registers, general purpose registers, memory, on stack, … can be 1, 2, 4, 8 bytes, signed or unsigned 10s of addressing modes e.g. Mem[segment + reg + reg*scale + offset]

Instruction Types Arithmetic Memory Control flow
add, subtract, shift left, shift right, multiply, divide Memory load value from memory to a register store value to memory from a register Control flow unconditional jumps conditional jumps (branches) jump and link (subroutine call) Many other instructions are possible vector add/sub/mul/div, string operations manipulate coprocessor I/O

Addition and Subtraction
add $s0 $s2 $s3 (in MIPS) Equivalent to a = b + c (in C) $s0, $s2, $s3 are associated with a, b and c. Subtraction sub $s0 $s2 $s3 (in MIPS) Equivalent to d = e – f (in C) $s0, $s2, $s3 are associated with d, e and f.

Five Stages of MIPS datapath
Basic CPU execution loop Instruction Fetch Instruction Decode Execution (ALU) Memory Access Register Writeback addu $s0, $s2, $s3 slti $s0, $s2, 4 lw $s0, 20($s3) j 0xdeadbeef

Stages of datapath (1/5) Stage 1: Instruction Fetch
Fetch 32-bit instruction from memory. (Instruction cache or memory) Increment PC accordingly. +4, byte addressing +N

Stages of datapath (2/5) Stage 2: Instruction Decode
Gather data from the instruction Read opcode to determine instruction type and field length Read in data from register file for addu, read two registers. for addi, read one registers. for jal, read no registers.

Stages of datapath (3/5) Stage 3: Execution (ALU)
Useful work is done here (+, -, *, /), shift, logic operation, comparison (slt). Load/Store? lw $t2, 32($t3) Compute the address of the memory.

Stages of datapath (4/5) Stage 4: Memory access
Used by load and store instructions only. Other instructions will skip this stage. This stage is expected to be fast, why?

Stages of datapath (5/5) Stage 5:
For instructions that need to write value to register. Examples: arithmetic, logic, shift, etc, load. Store, branches, jump??

Datapath and Clocking Prog. inst Reg. File Mem ALU Data Mem PC control
+4 5 5 5 PC control Fetch Decode Execute Memory WB

Datapath walkthrough addu $s0, $s2, $s3 # s0 = s2 + s3
Stage 1: fetch instruction, increment PC Stage 2: decode to determine it is an addu, then read s2 and s3. Stage 3: add s2 and s3 Stage 4: skip Stage 5: write result to s0

Example: ADDU instruction
Prog. Mem Reg. File Data Mem inst ALU +4 5 5 5 PC control

Datapath walkthrough slti $s0, $s2, 4
#if (s2 < 4), s0 = 1, else s0 = 0 Stage 1: fetch instruction, increment PC. Stage 2: decode to determine it is a slti, then read register s2. Stage 3: compare s2 with 4. Stage 4: do nothing Stage 5: write result to s0.

Example: SLTI instruction

Datapath walkthrough lw $s0, 20($s3) # s0 = Mem[s3+20]
R[rt] = M[R[rs] + SignExtImm] Stage 1: fetch instruction, increment PC. Stage 2: decode to determine it is lw, then read register s3. Stage 3: add 20 to s3 to compute address. Stage 4: access memory, read value at memory address s3+20 Stage 5: write value to s0.

Example: LW instruction

Datapath walkthrough j 0xdeadbeef
PC = Jumpaddr Stage 1: Fetch instruction, increment PC. Stage 2: decode and determine it is a jal, update PC to 0xdeadbeef. Stage 3: skip Stage 4: skip Stage 5: skip

MIPS instruction formats
All MIPS instructions are 32 bits long, has 3 formats R-type I-type J-type op rs rt rd shamt func 6 bits 5 bits op rs rt immediate 6 bits 5 bits 16 bits op immediate (target address) 6 bits 26 bits

Arithmetic Instructions
op rs rt rd - func 6 bits 5 bits R-Type op func mnemonic description 0x0 0x21 ADDU rd, rs, rt R[rd] = R[rs] + R[rt] 0x23 SUBU rd, rs, rt R[rd] = R[rs] – R[rt] 0x25 OR rd, rs, rt R[rd] = R[rs] | R[rt] 0x26 XOR rd, rs, rt R[rd] = R[rs]  R[rt] 0x27 NOR rd, rs rt R[rd] = ~ ( R[rs] | R[rt] )

Instruction Fetch Instruction Fetch Circuit inst +4
Fetch instruction from memory Calculate address of next instruction Repeat Program Memory inst 32 32 2 00 +4 PC

Arithmetic and Logic Prog. Mem Reg. File inst ALU +4 5 5 5 PC control

Arithmetic Instructions: Shift
op - rt rd shamt func 6 bits 5 bits R-Type op func mnemonic description 0x0 SLL rd, rs, shamt R[rd] = R[rt] << shamt 0x2 SRL rd, rs, shamt R[rd] = R[rt] >>> shamt (zero ext.) 0x3 SRA rd, rs, shamt R[rd] = R[rs] >> shamt (sign ext.) ex: r5 = r3 * 8

Shift Prog. Mem Reg. File inst ALU +4 5 5 5 PC control shamt

Arithmetic Instructions: Immediates
op rs rd immediate 6 bits 5 bits 16 bits I-Type op mnemonic description 0x9 ADDIU rd, rs, imm R[rd] = R[rs] + sign_extend(imm) 0xc ANDI rd, rs, imm R[rd] = R[rs] & zero_extend(imm) 0xd ORI rd, rs, imm R[rd] = R[rs] | zero_extend(imm) op mnemonic description 0x9 ADDIU rd, rs, imm R[rd] = R[rs] + imm 0xc ANDI rd, rs, imm R[rd] = R[rs] & imm 0xd ORI rd, rs, imm R[rd] = R[rs] | imm ex: r5 += 5 ex: r9 = -1 ex: r9 = 65535

Immediates Prog. inst Reg. File Mem ALU PC control imm extend shamt 5
+4 5 5 5 PC control control imm extend shamt

Arithmetic Instructions: Immediates
op - rd immediate 6 bits 5 bits 16 bits I-Type op mnemonic description 0xF LUI rd, imm R[rd] = imm << 16 ex: r5 = 0xdeadbeef

Immediates Prog. inst Reg. File Mem ALU PC control 16 imm extend shamt
+4 5 5 5 PC control control 16 imm extend or, extra input to alu B mux from before/after extend (or, extra mux after alu) shamt

MIPS Instruction Types
Arithmetic/Logical R-type: result and two source registers, shift amount I-type: 16-bit immediate with sign/zero extension Memory Access load/store between registers and memory word, half-word and byte operations Control flow conditional branches: pc-relative addresses jumps: fixed offsets, register absolute

base + offset addressing
Memory Instructions op rs rd offset 6 bits 5 bits 16 bits I-Type base + offset addressing op mnemonic description 0x20 LB rd, offset(rs) R[rd] = sign_ext(Mem[offset+R[rs]]) 0x24 LBU rd, offset(rs) R[rd] = zero_ext(Mem[offset+R[rs]]) 0x21 LH rd, offset(rs) 0x25 LHU rd, offset(rs) 0x23 LW rd, offset(rs) R[rd] = Mem[offset+R[rs]] 0x28 SB rd, offset(rs) Mem[offset+R[rs]] = R[rd] 0x29 SH rd, offset(rs) 0x2b SW rd, offset(rs) Mem[offset+R[rs]] = R[rd] signed offsets

Memory Operations Prog. inst Reg. File Mem ALU PC Data Mem control imm
+4 addr 5 5 5 PC control Data Mem imm ext

Example int h, A[]; A[12] = h + A[8]; lw, t0, 32(s3) add t0, s2, t0
sw t0, 48(s3)

Control Flow: Absolute Jump
op immediate 6 bits 26 bits J-Type op mnemonic description 0x2 J target PC = (PC+4) || target || 00 op mnemonic description 0x2 J target PC = target || 00 op mnemonic description 0x2 J target PC = target Absolute addressing for jumps Jump from 0x to 0x ? NO Reverse? NO But: Jumps from 0x2FFFFFFF to 0x3xxxxxxx are possible, but not reverse Trade-off: out-of-region jumps vs. 32-bit instruction encoding MIPS Quirk: jump targets computed using already incremented PC

Absolute Jump long jump? computed jump? Prog. inst Reg. File Mem ALU
+4 addr 5 5 5 PC control Data Mem imm ext tgt || long jump? computed jump?

Control Flow: Jump Register
op rs - func 6 bits 5 bits R-Type op func mnemonic description 0x0 0x08 JR rs PC = R[rs]

Jump Register Prog. inst Reg. File Mem ALU PC Data Mem control imm ext
+4 addr 5 5 5 PC control Data Mem imm ext tgt ||

Examples (2) jump to 0xabcd1234 # assume 0 <= r3 <= 1 if (r3 == 0) jump to 0xdecafe00 else jump to 0xabcd1234

Examples (2) jump to 0xabcd1234 # assume 0 <= r3 <= 1 if (r3 == 0) jump to 0xdecafe0 else jump to 0xabcd1234

Control Flow: Branches
op rs rd offset 6 bits 5 bits 16 bits I-Type signed offsets op mnemonic description 0x4 BEQ rs, rd, offset if R[rs] == R[rd] then PC = PC+4 + (offset<<2) 0x5 BNE rs, rd, offset if R[rs] != R[rd] then PC = PC+4 + (offset<<2)

Examples (3) s0 s1 if (i == j) { i = i * 4; } else { j = i - j; }

Could have used ALU for branch add Could have used ALU for branch cmp
Absolute Jump Prog. Mem Reg. File inst ALU +4 addr 5 5 5 =? PC control Data Mem offset + imm ext tgt || Could have used ALU for branch add Could have used ALU for branch cmp

Control Flow: More Branches
op rs subop offset 6 bits 5 bits 16 bits almost I-Type signed offsets op subop mnemonic description 0x1 0x0 BLTZ rs, offset if R[rs] < 0 then PC = PC+4+ (offset<<2) BGEZ rs, offset if R[rs] ≥ 0 then PC = PC+4+ (offset<<2) 0x6 BLEZ rs, offset if R[rs] ≤ 0 then PC = PC+4+ (offset<<2) 0x7 BGTZ rs, offset if R[rs] > 0 then PC = PC+4+ (offset<<2)

Could have used ALU for branch cmp
Absolute Jump Prog. Mem Reg. File inst ALU +4 addr 5 5 5 =? PC control Data Mem cmp offset + imm ext tgt || Could have used ALU for branch cmp

Control Flow: Jump and Link
op immediate 6 bits 26 bits J-Type op mnemonic description 0x3 JAL target r31 = PC+8 PC = (PC+4) || target || 00

Could have used ALU for link add
Absolute Jump Prog. Mem Reg. File inst ALU +4 +4 addr 5 5 5 =? PC control Data Mem cmp offset + imm ext tgt || Could have used ALU for link add

Memory Layout 0x 0x 0x 0x 0x 0x 0x 0x 0x 0x 0x a 0x b ... 0xffffffff Examples: # r5 contains 0x5 sb r5, 2(r0) M[R[rs] + SigExtImm](7:0) = R[rd](7:0) lb r6, 2(r0) sw r5, 8(r0) lb r7, 8(r0) lb r8, 11(r0)

Endianness Endianness: Ordering of bytes within a memory word
Little Endian = least significant part first (MIPS, x86) 1000 1001 1002 1003 as 4 bytes as 2 halfwords as 1 word 0x Big Endian = most significant part first (MIPS, networks) 1000 1001 1002 1003 as 4 bytes as 2 halfwords as 1 word 0x

Next Time CPU Performance Pipelined CPU

Han Wang CS3410, Spring 2012 Computer Science Cornell University

Similar presentations

Presentation on theme: "Han Wang CS3410, Spring 2012 Computer Science Cornell University"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Han Wang CS3410, Spring 2012 Computer Science Cornell University

Similar presentations

Presentation on theme: "Han Wang CS3410, Spring 2012 Computer Science Cornell University"— Presentation transcript:

Similar presentations

About project

Feedback