Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University See P&H Chapter: 4.1-4.4, 1.6, Appendix B.

Slides:



Advertisements
Similar presentations
Prof. Kavita Bala and Prof. Hakim Weatherspoon CS 3410, Spring 2014 Computer Science Cornell University See P&H Appendix B.7. B.8, B.10, B.11.
Advertisements

State & Finite State Machines Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University See P&H Appendix C.7. C.8, C.10, C.11.
331 W08.1Spring :332:331 Computer Architecture and Assembly Language Spring 2006 Week 8: Datapath Design [Adapted from Dave Patterson’s UCB CS152.
Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University See P&H Appendix B.7. B.8, B.10, B.11.
Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University See P&H Appendix B.7. B.8, B.10, B.11.
Fast Adders See: P&H Chapter 3.1-3, C Goals: serial to parallel conversion time vs. space tradeoffs design choices.
Arithmetic Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University See P&H 2.4 (signed), 2.5, 2.6, C.6, and Appendix C.6.
1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6.
Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University RISC Pipeline See: P&H Chapter 4.6.
1 CS3410 Guest Lecture A Simple CPU: remaining branch instructions CPU Performance Pipelined CPU Tudor Marian.
Pipeline Hazards Hakim Weatherspoon CS 3410, Spring 2011 Computer Science Cornell University See P&H Appendix 4.7.
Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University Performance See: P&H 1.4.
Fast Adders See: P&H Chapter 3.1-3, C Goals: serial to parallel conversion time vs. space tradeoffs design choices.
Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University Arithmetic See: P&H Chapter 3.1-3, C.5-6.
Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University A Processor See: P&H Chapter ,
Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University See P&H Chapter: 1.6,
CS 3410, Spring 2014 Computer Science Cornell University See P&H Chapter: , 1.4, Appendix A.
Computer ArchitectureFall 2007 © October 3rd, 2007 Majd F. Sakr CS-447– Computer Architecture.
CS 61C L17 Control (1) A Carle, Summer 2006 © UCB inst.eecs.berkeley.edu/~cs61c/su06 CS61C : Machine Structures Lecture #17: CPU Design II – Control
Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University See P&H Chapter: , 1.6, Appendix B.
ECE 4436ECE 5367 ISA I. ECE 4436ECE 5367 CPU = Seconds= Instructions x Cycles x Seconds Time Program Program Instruction Cycle CPU = Seconds= Instructions.
Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University See P&H Chapter: , , Appendix B.
Hakim Weatherspoon CS 3410, Spring 2010 Computer Science Cornell University A Processor See: P&H Chapter ,
Processor Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H Chapter ,
Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University See: P&H Chapter 2.4, 3.2, B.2, B.5, B.6.
Chapter 4 Sections 4.1 – 4.4 Appendix D.1 and D.2 Dr. Iyad F. Jafar Basic MIPS Architecture: Single-Cycle Datapath and Control.
Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University See P&H Chapter: 1.6,
Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University See P&H Appendix B.8 (register files) and B.9.
CSCI-365 Computer Organization Lecture Note: Some slides and/or pictures in the following are adapted from: Computer Organization and Design, Patterson.
Fall EE 333 Lillevik 333f06-l7 University of Portland School of Engineering Computer Organization Lecture 7 ALU design MIPS data path.
COSC 3430 L08 Basic MIPS Architecture.1 COSC 3430 Computer Architecture Lecture 08 Processors Single cycle Datapath PH 3: Sections
Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University Memory See: P&H Appendix C.8, C.9.
Memory Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University.
Numbers and Arithmetic Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See: P&H Chapter , 3.2, C.5 – C.6.
CPU Performance Pipelined CPU Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H Chapters 1.4 and 4.5.
Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University CPU Performance Pipelined CPU See P&H Chapters 1.4 and 4.5.
MIPS Pipeline Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H Chapter 4.6.
Computer Organization CS224 Fall 2012 Lesson 22. The Big Picture  The Five Classic Components of a Computer  Chapter 4 Topic: Processor Design Control.
ECE 445 – Computer Organization
CS2100 Computer Organisation
CDA 3101 Fall 2013 Introduction to Computer Organization
Kavita Bala CS 3410, Spring 2014 Computer Science Cornell University.
Datapath and Control Unit Design
1. Building A CPU  We’ve built a small ALU l Add, Subtract, SLT, And, Or l Could figure out Multiply and Divide  What about the rest l How do.
Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University MIPS Pipeline See P&H Chapter 4.6.
Computer Architecture Lecture 10 MIPS Control Unit Ralph Grishman Oct NYU.
Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University MIPS Pipeline See P&H Chapter 4.6.
CSE331 W10.1Irwin&Li Fall 2006 PSU CSE 331 Computer Organization and Design Fall 2006 Week 10 Section 1: Mary Jane Irwin (
Datapath and Control AddressInstruction Memory Write Data Reg Addr Register File ALU Data Memory Address Write Data Read Data PC Read Data Read Data.
MIPS Instruction Set Architecture Prof. Sirer CS 316 Cornell University.
Prof. Kavita Bala and Prof. Hakim Weatherspoon CS 3410, Spring 2014 Computer Science Cornell University See P&H Appendix B.7. B.8, B.10, B.11.
Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University A Processor See: P&H Chapter ,
COM181 Computer Hardware Lecture 6: The MIPs CPU.
Processor Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H Chapter ,
CPU Performance Pipelined CPU Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University See P&H Chapters 1.4 and 4.5.
CPU Performance Pipelined CPU
Pipeline Hazards Hakim Weatherspoon CS 3410, Spring 2012
State & Finite State Machines
Pipeline Control Hazards and Instruction Variations
Prof. Kavita Bala and Prof. Hakim Weatherspoon
Hakim Weatherspoon CS 3410 Computer Science Cornell University
Han Wang CS3410, Spring 2012 Computer Science Cornell University
Pipeline Control Hazards and Instruction Variations
Hakim Weatherspoon CS 3410 Computer Science Cornell University
Rocky K. C. Chang 6 November 2017
MIPS Instruction Set Architecture
Hakim Weatherspoon CS 3410 Computer Science Cornell University
Hakim Weatherspoon CS 3410 Computer Science Cornell University
MIPS Pipeline Hakim Weatherspoon CS 3410, Spring 2013 Computer Science
The RISC-V Processor Hakim Weatherspoon CS 3410 Computer Science
Presentation transcript:

Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University See P&H Chapter: , 1.6, Appendix B

Project Partner finding assignment on CMS No official office hours over break Lab1 due tomorrow HW1 Help Sessions Wed, Feb 18 and Sun, Feb 21

Make sure to go to your Lab Section this week Lab2 due in class this week (it is not homework) Lab1: Completed Lab1 due tomorrow Friday, Feb 13 th, before winter break Note, a Design Document is due when you submit Lab1 final circuit Work alone Save your work! Save often. Verify file is non-zero. Periodically save to Dropbox, . Beware of MacOSX 10.5 (leopard) and 10.6 (snow-leopard) Homework1 is out Due a week before prelim1, Monday, February 23rd Work on problems incrementally, as we cover them in lecture (i.e. part 1) Office Hours for help Work alone Work alone, BUT use your resources Lab Section, Piazza.com, Office Hours Class notes, book, Sections, CSUGLab

Check online syllabus/schedule Slides and Reading for lectures Office Hours Pictures of all TAs Homework and Programming Assignments Dates to keep in Mind Prelims: Tue Mar 3rd and Thur April 30th Lab 1: Due this Friday, Feb 13th before Winter break Proj2: Due Thur Mar 26th before Spring break Final Project: Due when final would be (not known until Feb 14th) Schedule is subject to change

“Black Board” Collaboration Policy Can discuss approach together on a “black board” Leave and write up solution independently Do not copy solutions Late Policy Each person has a total of four “slip days” Max of two slip days for any individual assignment Slip days deducted first for any late assignment, cannot selectively apply slip days For projects, slip days are deducted from all partners 25% deducted per day late after slip days are exhausted Regrade policy Submit written request to lead TA, and lead TA will pick a different grader Submit another written request, lead TA will regrade directly Submit yet another written request for professor to regrade.

MIPS Datapath Memory layout Control Instructions Performance How fast can we make it? CPI (Cycles Per Instruction) MIPS (Instructions Per Cycle) Clock Frequency

Arithmetic/Logical R-type: result and two source registers, shift amount I-type: 16-bit immediate with sign/zero extension Memory Access load/store between registers and memory word, half-word and byte operations Control flow conditional branches: pc-relative addresses jumps: fixed offsets, register absolute

opmnemonicdescription 0x23LW rd, offset(rs)R[rd] = Mem[offset+R[rs]] 0x2bSW rd, offset(rs)Mem[offset+R[rs]] = R[rd] oprsrdoffset 6 bits5 bits 16 bits base + offset addressing I-Type signed offsets

Data Mem addr ext +4 5 imm 55 control Reg. File PC Prog. Mem ALU inst

opmnemonicdescription 0x20LB rd, offset(rs)R[rd] = sign_ext(Mem[offset+R[rs]]) 0x24LBU rd, offset(rs)R[rd] = zero_ext(Mem[offset+R[rs]]) 0x21LH rd, offset(rs)R[rd] = sign_ext(Mem[offset+R[rs]]) 0x25LHU rd, offset(rs)R[rd] = zero_ext(Mem[offset+R[rs]]) 0x23LW rd, offset(rs)R[rd] = Mem[offset+R[rs]] 0x28SB rd, offset(rs)Mem[offset+R[rs]] = R[rd] 0x29SH rd, offset(rs)Mem[offset+R[rs]] = R[rd] 0x2bSW rd, offset(rs)Mem[offset+R[rs]] = R[rd] oprsrdoffset 6 bits5 bits 16 bits

Endianness: Ordering of bytes within a memory word x Big Endian = most significant part first (MIPS, networks) Little Endian = least significant part first (MIPS, x86) as 4 bytes as 2 halfwords as 1 word x as 4 bytes as 2 halfwords as 1 word

Examples (big/little endian): # r5 contains 5 (0x ) SB r5, 2(r0) LB r6, 2(r0) SW r5, 8(r0) LB r7, 8(r0) LB r8, 11(r0) 0x x x x x x x x x x x a 0x b... 0xffffffff

Arithmetic/Logical R-type: result and two source registers, shift amount I-type: 16-bit immediate with sign/zero extension Memory Access load/store between registers and memory word, half-word and byte operations Control flow conditional branches: pc-relative addresses jumps: fixed offsets, register absolute

opMnemonicDescription 0x2J targetPC = target  00 opimmediate 6 bits26 bits J-Type opMnemonicDescription 0x2J targetPC = (PC+4)  target  00

tgt +4  Data Mem addr ext 555 Reg. File PC Prog. Mem ALU inst control imm

op rs---func 6 bits5 bits 6 bits opfuncmnemonicdescription 0x00x08JR rsPC = R[rs] R-Type

+4  tgt Data Mem addr ext 555 Reg. File PC Prog. Mem ALU inst control imm JR

E.g. Use Jump or Jump Register instruction to jump to 0xabcd1234 But, what about a jump based on a condition? # assume 0 <= r3 <= 1 if (r3 == 0) jump to 0xdecafe00 else jump to 0xabcd1234

opmnemonicdescription 0x4BEQ rs, rd, offsetif R[rs] == R[rd] then PC = PC+4 + (offset<<2) 0x5BNE rs, rd, offsetif R[rs] != R[rd] then PC = PC+4 + (offset<<2) oprsrdoffset 6 bits5 bits 16 bits signed offsets I-Type

tgt +4  Data Mem addr ext 555 Reg. File PC Prog. Mem ALU inst control imm offset + =?

oprssubopoffset 6 bits5 bits 16 bits signed offsets almost I-Type opsubopmnemonicdescription 0x10x0BLTZ rs, offsetif R[rs] < 0 then PC = PC+4+ (offset<<2) 0x1 BGEZ rs, offsetif R[rs] ≥ 0 then PC = PC+4+ (offset<<2) 0x60x0BLEZ rs, offsetif R[rs] ≤ 0 then PC = PC+4+ (offset<<2) 0x70x0BGTZ rs, offsetif R[rs] > 0 then PC = PC+4+ (offset<<2) Conditional Jumps (cont.)

tgt +4  Data Mem addr ext 555 Reg. File PC Prog. Mem ALU inst control imm offset + =? cmp

opmnemonicdescription 0x3JAL targetr31 = PC+8 (+8 due to branch delay slot) PC = (PC+4)  target  00 opimmediate 6 bits 26 bits J-Type Discuss later Function/procedure callsWhy?

tgt +4  Data Mem addr ext 555 Reg. File PC Prog. Mem ALU inst control imm offset + =? cmp +4

MIPS Datapath Memory layout Control Instructions Performance How to get it? CPI (Cycles Per Instruction) MIPS (Instructions Per Cycle) Clock Frequency Pipelining Latency vs throughput

How do we measure performance? What is the performance of a single cycle CPU? How do I get performance? See: P&H 1.6

How do I get it? Parallelism Pipelining Both!

Speed of a circuit is affected by the number of gates in series (on the critical path or the deepest level of logic) combinatorial Logic t combinatorial inputs arrive outputs expected

A3A3 B3B3 S3S3 C4C4 A1A1 B1B1 S1S1 A2A2 B2B2 S2S2 A0A0 B0B0 C0C0 S0S0 C1C1 C2C2 C3C3 First full adder, 2 gate delay Second full adder, 2 gate delay … Carry ripples from lsb to msb

Main ALU, slows us down Does it need to be this slow? Observations Have to wait for C in Can we compute in parallel in some way? CLA carry look-ahead adder

Can we reason C out independent of C in ? Just based on (A,B) only When is C out == 1, irrespective of C in ? If C in == 1, when is C out also == 1

A B S C in C out ABC in C out S Full Adder Adds three 1-bit numbers Computes 1-bit result and 1-bit carry Can be cascaded

AB C in S pg Create two terms: propagator, generator g = 1, generates C out : g = AB Irrespective of C in p = 1, propagates C in to C out : p = A + B p and g generated in 1 gate delay S is 2 gate delay after we get C in

AB C0C0 pg AB pg AB pg AB pg CLA (carry look-ahead logic) C1C1 C2C2 C3C3 C4 S

How do I get it? Parallelism Pipelining Both!

MIPS Datapath Memory layout Control Instructions Performance How to get it? Parallelism and Pipeline! CPI (Cycles Per Instruction) MIPS (Instructions Per Cycle) Clock Frequency Pipelining Latency vs throughput Next Time