CS 3410, Spring 2014 Computer Science Cornell University See P&H Chapter: 4.1-4.4, 1.4, Appendix A.

Slides:



Advertisements
Similar presentations
Datorteknik DatapathControl bild 1 Designing a Single Cycle Datapath & Datapath Control.
Advertisements

331 W08.1Spring :332:331 Computer Architecture and Assembly Language Spring 2006 Week 8: Datapath Design [Adapted from Dave Patterson’s UCB CS152.
Fast Adders See: P&H Chapter 3.1-3, C Goals: serial to parallel conversion time vs. space tradeoffs design choices.
1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6.
Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University RISC Pipeline See: P&H Chapter 4.6.
1 CS3410 Guest Lecture A Simple CPU: remaining branch instructions CPU Performance Pipelined CPU Tudor Marian.
Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University Performance See: P&H 1.4.
Fast Adders See: P&H Chapter 3.1-3, C Goals: serial to parallel conversion time vs. space tradeoffs design choices.
Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University Arithmetic See: P&H Chapter 3.1-3, C.5-6.
Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University A Processor See: P&H Chapter ,
Savio Chau Single Cycle Controller Design Last Time: Discussed the Designing of a Single Cycle Datapath Control Datapath Memory Processor (CPU) Input Output.
Computer ArchitectureFall 2007 © October 3rd, 2007 Majd F. Sakr CS-447– Computer Architecture.
ECE 232 L13. Control.1 ©UCB, DAP’ 97 ECE 232 Hardware Organization and Design Lecture 13 Control Design
CS61C L25 CPU Design : Designing a Single-Cycle CPU (1) Garcia, Fall 2006 © UCB T-Mobile’s Wi-Fi / Cell phone  T-mobile just announced a new phone that.
The Processor 2 Andreas Klappenecker CPSC321 Computer Architecture.
Inst.eecs.berkeley.edu/~cs61c UCB CS61C : Machine Structures Lecture 25 CPU design (of a single-cycle CPU) Intel is prototyping circuits that.
CS61C L25 CPU Design : Designing a Single-Cycle CPU (1) Garcia, Spring 2007 © UCB Google Summer of Code  Student applications are now open (through );
Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University See P&H Chapter: , 1.6, Appendix B.
ECE 4436ECE 5367 ISA I. ECE 4436ECE 5367 CPU = Seconds= Instructions x Cycles x Seconds Time Program Program Instruction Cycle CPU = Seconds= Instructions.
Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University See P&H Chapter: , , Appendix B.
Hakim Weatherspoon CS 3410, Spring 2010 Computer Science Cornell University A Processor See: P&H Chapter ,
Processor Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H Chapter ,
CSCI-365 Computer Organization Lecture Note: Some slides and/or pictures in the following are adapted from: Computer Organization and Design, Patterson.
CS3350B Computer Architecture Winter 2015 Lecture 5.6: Single-Cycle CPU: Datapath Control (Part 1) Marc Moreno Maza [Adapted.
Fall EE 333 Lillevik 333f06-l7 University of Portland School of Engineering Computer Organization Lecture 7 ALU design MIPS data path.
COSC 3430 L08 Basic MIPS Architecture.1 COSC 3430 Computer Architecture Lecture 08 Processors Single cycle Datapath PH 3: Sections
Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University See P&H Chapter: , 1.6, Appendix B.
CPU Performance Pipelined CPU Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H Chapters 1.4 and 4.5.
Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University CPU Performance Pipelined CPU See P&H Chapters 1.4 and 4.5.
1 COMP541 Multicycle MIPS Montek Singh Apr 4, 2012.
MIPS Pipeline Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H Chapter 4.6.
Computer Organization CS224 Fall 2012 Lesson 22. The Big Picture  The Five Classic Components of a Computer  Chapter 4 Topic: Processor Design Control.
CS 61C: Great Ideas in Computer Architecture Datapath
ECE 445 – Computer Organization
EEM 486: Computer Architecture Designing a Single Cycle Datapath.
Computer Organization & Programming Chapter 6 Single Datapath CPU Architecture.
CS2100 Computer Organisation The Processor: Datapath (AY2015/6) Semester 1.
Computer Architecture and Design – ECEN 350 Part 6 [Some slides adapted from A. Sprintson, M. Irwin, D. Paterson and others]
Kavita Bala CS 3410, Spring 2014 Computer Science Cornell University.
Datapath and Control Unit Design
CS3350B Computer Architecture Winter 2015 Lecture 5.7: Single-Cycle CPU: Datapath Control (Part 2) Marc Moreno Maza [Adapted.
Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University MIPS Pipeline See P&H Chapter 4.6.
Computer Architecture Lecture 10 MIPS Control Unit Ralph Grishman Oct NYU.
MIPS Instruction Set Architecture Prof. Sirer CS 316 Cornell University.
Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University A Processor See: P&H Chapter ,
Datapath and Control AddressInstruction Memory Write Data Reg Addr Register File ALU Data Memory Address Write Data Read Data PC Read Data Read Data.
COM181 Computer Hardware Lecture 6: The MIPs CPU.
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Single-Cycle CPU Datapath & Control Part 2 Instructors: Krste Asanovic & Vladimir Stojanovic.
Processor Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H Chapter ,
Single Cycle Controller Design
CPU Performance Pipelined CPU Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H Chapters 1.4 and 4.5.
CS 61C: Great Ideas in Computer Architecture MIPS Datapath 1 Instructors: Nicholas Weaver & Vladimir Stojanovic
Computer Architecture Lecture 6.  Our implementation of the MIPS is simplified memory-reference instructions: lw, sw arithmetic-logical instructions:
CPU Performance Pipelined CPU
Morgan Kaufmann Publishers
IT 251 Computer Organization and Architecture
Processor (I).
CSCI206 - Computer Organization & Programming
Han Wang CS3410, Spring 2012 Computer Science Cornell University
Hakim Weatherspoon CS 3410 Computer Science Cornell University
Rocky K. C. Chang 6 November 2017
The Processor Lecture 3.2: Building a Datapath with Control
COMS 361 Computer Organization
Access the Instruction from Memory
MIPS Instruction Set Architecture
Hakim Weatherspoon CS 3410 Computer Science Cornell University
The RISC-V Processor Hakim Weatherspoon CS 3410 Computer Science
COMS 361 Computer Organization
CS 3410, Spring 2014 Computer Science Cornell University
Processor: Datapath and Control
Presentation transcript:

CS 3410, Spring 2014 Computer Science Cornell University See P&H Chapter: , 1.4, Appendix A

5 ALU 55 control Reg. File PC Prog. Mem inst +4 Data Mem FetchDecodeExecuteMemoryWB

How many stages of a datapath are there in our single cycle MIPS design? A) 1 B) 2 C) 3 D) 4 E) 5

5 ALU 55 control Reg. File PC Prog. Mem inst +4 Data Mem FetchDecodeExecuteMemoryWB A Single cycle processor

5 ALU 55 control Reg. File PC Prog. Mem inst +4 Data Mem FetchDecodeExecuteMemoryWB A Single cycle processor

5 ALU 55 control Reg. File PC Prog. Mem inst +4 Data Mem FetchDecodeExecuteMemoryWB A Single cycle processor

5 ALU 55 control Reg. File PC Prog. Mem inst +4 Data Mem FetchDecodeExecuteMemoryWB A Single cycle processor

5 ALU 55 control Reg. File PC Prog. Mem inst +4 Data Mem FetchDecodeExecuteMemoryWB

The datapath for a MIPS processor has five stages: 1.Instruction Fetch 2.Instruction Decode 3.Execution (ALU) 4.Memory Access 5.Register Writeback This five stage datapath is used to execute all MIPS instructions

There are how many types of instructions in the MIPS ISA? A) 1 B) 3 C) 5 D) 200 E) 1000s

All MIPS instructions are 32 bits long, has 3 formats R-type I-type J-type oprsrtrdshamtfunc 6 bits5 bits 6 bits oprsrtimmediate 6 bits5 bits 16 bits opimmediate (target address) 6 bits 26 bits

Arithmetic/Logical R-type: result and two source registers, shift amount I-type: 16-bit immediate with sign/zero extension Memory Access load/store between registers and memory word, half-word and byte operations Control flow conditional branches: pc-relative addresses jumps: fixed offsets, register absolute

op-rtrdshamtfunc 6 bits5 bits 6 bits opfuncmnemonicdescription 0x0 SLL rd, rt, shamtR[rd] = R[rt] << shamt 0x00x2SRL rd, rt, shamtR[rd] = R[rt] >>> shamt (zero ext.) 0x00x3SRA rd, rt, shamtR[rd] = R[rt] >> shamt (sign ext.) ex: r8 = r4 * 64 # SLL r8, r4, 6 r8 = r4 << 6 R-Type

5 ALU 55 Reg. File PC Prog. Mem inst +4 shamt control r4 shamt = 6 r8 sll

mnemonicdescription 0x9ADDIU rd, rs, immR[rd] = R[rs] + sign_extend(imm) 0xcANDI rd, rs, immR[rd] = R[rs] & zero_extend(imm) 0xdORI rd, rs, immR[rd] = R[rs] | zero_extend(imm) opmnemonicdescription 0x8ADDI rd, rs, immR[rd] = R[rs] + imm 0xcANDI rd, rs, immR[rd] = R[rs] & imm 0xdORI rd, rs, immR[rd] = R[rs] | imm oprsrdimmediate 6 bits5 bits 16 bits I-Type ex: r5 += -1 ex: r5 = r5 + 5# ADDI r5, r5, 5 r5 += 5 What if immediate is negative?

5 imm 55 extend +4 shamt Reg. File PC Prog. Mem ALU inst control r5 addi

5 imm 55 extend +4 shamt Reg. File PC Prog. Mem ALU inst control r5 addi

opmnemonicdescription 0xFLUI rd, immR[rd] = imm << 16 op-rdimmediate 6 bits5 bits 16 bits I-Type ex: LUI r5, 0xdead ORI r5, r5 0xbeef What does r5 = ? r5 = 0xdeadbeef ex: r5 = 0x50000# LUI r5, 5

5 imm 55 extend +4 shamt Reg. File PC Prog. Mem ALU inst 16 control r5 lui 5

MIPS Datapath Memory layout Control Instructions Performance How fast can we make it? CPI (Cycles Per Instruction) MIPS (Instructions Per Cycle) Clock Frequency

Arithmetic/Logical R-type: result and two source registers, shift amount I-type: 16-bit immediate with sign/zero extension Memory Access load/store between registers and memory word, half-word and byte operations Control flow conditional branches: pc-relative addresses jumps: fixed offsets, register absolute

opmnemonicdescription 0x23LW rd, offset(rs)R[rd] = Mem[offset+R[rs]] 0x2bSW rd, offset(rs)Mem[offset+R[rs]] = R[rd] oprsrdoffset 6 bits5 bits 16 bits base + offset addressing I-Type signed offsets ex: = Mem[4+r5] = r1# SW r1, 4(r5)

Data Mem addr ext +4 5 imm 55 control Reg. File PC Prog. Mem ALU inst sw r5+4 4 Write Enable r5 r1 ex: = Mem[4+r5] = r1# SW r1, 4(r5)

opmnemonicdescription 0x20LB rd, offset(rs)R[rd] = sign_ext(Mem[offset+R[rs]]) 0x24LBU rd, offset(rs)R[rd] = zero_ext(Mem[offset+R[rs]]) 0x21LH rd, offset(rs)R[rd] = sign_ext(Mem[offset+R[rs]]) 0x25LHU rd, offset(rs)R[rd] = zero_ext(Mem[offset+R[rs]]) 0x23LW rd, offset(rs)R[rd] = Mem[offset+R[rs]] 0x28SB rd, offset(rs)Mem[offset+R[rs]] = R[rd] 0x29SH rd, offset(rs)Mem[offset+R[rs]] = R[rd] 0x2bSW rd, offset(rs)Mem[offset+R[rs]] = R[rd] oprsrdoffset 6 bits5 bits 16 bits

Endianness: Ordering of bytes within a memory word x Big Endian = most significant part first (MIPS, networks) Little Endian = least significant part first (MIPS, x86) as 4 bytes as 2 halfwords as 1 word x as 4 bytes as 2 halfwords as 1 word 0x780x56 0x340x12 0x56780x1234 0x120x34 0x560x78 0x12340x5678

Examples (big/little endian): # r5 contains 5 (0x ) SB r5, 2(r0) LB r6, 2(r0) # R[r6] = 0x05 SW r5, 8(r0) LB r7, 8(r0) LB r8, 11(r0) # R[r7] = 0x00 # R[r8] = 0x05 0x x x x x x x x x x x a 0x b... 0xffffffff 0x05 0x00 0x05

Arithmetic/Logical R-type: result and two source registers, shift amount I-type: 16-bit immediate with sign/zero extension Memory Access load/store between registers and memory word, half-word and byte operations Control flow conditional branches: pc-relative addresses jumps: fixed offsets, register absolute

opimmediate 6 bits26 bits J-Type ex: j 0x PC = ((PC+4) & 0xf ) | 0x opMnemonicDescription 0x2J targetPC = (PC+4)  target  00 (PC+4) target 00 4 bits26 bits2 bits (PC+4)

Absolute addressing for jumps Jump from 0x to 0x ? NO Reverse? NO –But: Jumps from 0x2FFFFFFF to 0x3xxxxxxx are possible, but not reverse Trade-off: out-of-region jumps vs. 32-bit instruction encoding MIPS Quirk: jump targets computed using already incremented PC opimmediate 6 bits26 bits J-Type (PC+4) will be the same opMnemonicDescription 0x2J targetPC = (PC+4)  target  00

tgt +4  Data Mem addr ext 555 Reg. File PC Prog. Mem ALU inst control imm J opMnemonicDescription 0x2J targetPC = (PC+4)  target  00

op rs---func 6 bits5 bits 6 bits opfuncmnemonicdescription 0x00x08JR rsPC = R[rs] R-Type ex: JR r3

+4  tgt Data Mem addr ext 555 Reg. File PC Prog. Mem ALU inst control imm opfuncmnemonicdescription 0x00x08JR rsPC = R[rs] R[r3] JR

E.g. Use Jump or Jump Register instruction to jump to 0xabcd1234 But, what about a jump based on a condition? # assume 0 <= r3 <= 1 if (r3 == 0) jump to 0xdecafe00 else jump to 0xabcd1234

opmnemonicdescription 0x4BEQ rs, rd, offsetif R[rs] == R[rd] then PC = PC+4 + (offset<<2) 0x5BNE rs, rd, offsetif R[rs] != R[rd] then PC = PC+4 + (offset<<2) oprsrdoffset 6 bits5 bits 16 bits signed offsets I-Type ex: BEQ r5, r1, 3 If(R[r5]==R[r1]) then PC = PC (i.e. 12 == 3<<2)

tgt +4  Data Mem addr ext 555 Reg. File PC Prog. Mem ALU inst control imm offset + Could have used ALU for branch add =? Could have used ALU for branch cmp opmnemonicdescription 0x4BEQ rs, rd, offsetif R[rs] == R[rd] then PC = PC+4 + (offset<<2) 0x5BNE rs, rd, offsetif R[rs] != R[rd] then PC = PC+4 + (offset<<2) R[r5] BEQ R[r1]

oprssubopoffset 6 bits5 bits 16 bits signed offsets almost I-Type opsubopmnemonicdescription 0x10x0BLTZ rs, offsetif R[rs] < 0 then PC = PC+4+ (offset<<2) 0x1 BGEZ rs, offsetif R[rs] ≥ 0 then PC = PC+4+ (offset<<2) 0x60x0BLEZ rs, offsetif R[rs] ≤ 0 then PC = PC+4+ (offset<<2) 0x70x0BGTZ rs, offsetif R[rs] > 0 then PC = PC+4+ (offset<<2) Conditional Jumps (cont.) ex: BGEZ r5, 2 If(R[r5] ≥ 0) then PC = PC (i.e. 8 == 2<<2)

opsubopmnemonicdescription 0x10x0BLTZ rs, offsetif R[rs] < 0 then PC = PC+4+ (offset<<2) 0x1 BGEZ rs, offsetif R[rs] ≥ 0 then PC = PC+4+ (offset<<2) 0x60x0BLEZ rs, offsetif R[rs] ≤ 0 then PC = PC+4+ (offset<<2) 0x70x0BGTZ rs, offsetif R[rs] > 0 then PC = PC+4+ (offset<<2) tgt +4  Data Mem addr ext 555 Reg. File PC Prog. Mem ALU inst control imm offset + Could have used ALU for branch cmp =? cmp R[r5] BEQZ

opmnemonicdescription 0x3JAL targetr31 = PC+8 (+8 due to branch delay slot) PC = (PC+4)  target  00 opimmediate 6 bits 26 bits J-Type Why? Function/procedure calls opmnemonicdescription 0x2J targetPC = (PC+4)  target  00 Discuss later ex: JAL 0x r31 = PC+8 PC = (PC+4)  0x

tgt +4  Data Mem addr ext 555 Reg. File PC Prog. Mem ALU inst control imm offset + =? cmp Could have used ALU for link add +4 opmnemonicdescription 0x3JAL targetr31 = PC+8 (+8 due to branch delay slot) PC = (PC+4) || (target << 2) R[r31]PC+8

MIPS Datapath Memory layout Control Instructions Performance How to get it? CPI (Cycles Per Instruction) MIPS (Instructions Per Cycle) Clock Frequency Pipelining Latency vs throughput

How do we measure performance? What is the performance of a single cycle CPU? How do I get performance? See: P&H 1.4

A) LW B) SW C) ADD/SUB/AND/OR/etc D) BEQ E) J

Data Mem addr ext +4 5 imm 55 control Reg. File PC Prog. Mem ALU inst lw r5+4 4 Write Enable r5 ex: = r1 = Mem[4+r5] # LW r1, 4(r5) Mem[4+r5] r1

How do I get it? Parallelism Pipelining Both!

Speed of a circuit is affected by the number of gates in series (on the critical path or the deepest level of logic) combinatorial Logic t combinatorial inputs arrive outputs expected

A3A3 B3B3 S3S3 C4C4 A1A1 B1B1 S1S1 A2A2 B2B2 S2S2 A0A0 B0B0 C0C0 S0S0 C1C1 C2C2 C3C3 First full adder, 2 gate delay Second full adder, 2 gate delay … Carry ripples from lsb to msb

Main ALU, slows us down Does it need to be this slow? Observations Have to wait for Cin Can we compute in parallel in some way? CLA carry look-ahead adder

Can we reason independent of Cin? Just based on (A,B) only When is Cout == 1, irrespective of Cin If Cin == 1, when is Cout also == 1

A B S C in C out ABC in C out S Full Adder Adds three 1-bit numbers Computes 1-bit result and 1-bit carry Can be cascaded

AB Cin R pg Create two terms: propagator, generator g = 1, generates Cout: g = AB Irrespective of Cin p = 1, propagates Cin to Cout: p = A + B p and g generated in 1 cycle delay R is 2 cycle delay after we get Cin

A B S C in C out ABC in C out S Full Adder Adds three 1-bit numbers Computes 1-bit result and 1-bit carry Can be cascaded

AB C0 pg AB pg AB pg AB pg CLA (carry look-ahead logic) CLA takes p,g from all 4 bits, C0 and generates all Cs: 2 gate delay C1 C2C3 C4 S C i = Function (C 0, p and g values)

A0B0 C0 A1B1A2B2A3B3 CLA (carry look-ahead logic) p0 g0 p1 g1 p2 g2 p3 g3 Given A,B’s, all p,g’s are generated in 1 gate delay in parallel C1 C2C3 Given all p,g’s, all C’s are generated in 2 gate delay in parallel S3 S2 S1S0 Given all C’s, all S’s are generated in 2 gate delay in parallel Sequential operation is made into parallel operation!!

Ripple carry adder vs carry lookahead adder for 8 bits 2 x 8 vs. 5

How do I get it? Parallelism Pipelining Both!