John Kubiatowicz (http.cs.berkeley.edu/~kubitron)

Slides:



Advertisements
Similar presentations
1 Lecture 3: MIPS Instruction Set Today’s topic:  More MIPS instructions  Procedure call/return Reminder: Assignment 1 is on the class web-page (due.
Advertisements

CEG3420 Lec2.1 ©UCB Fall 1997 ISA Review CEG3420 Computer Design Lecture 2.
ECE 232 L6.Assemb.1 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers ECE 232 Hardware Organization and Design Lecture 6 MIPS Assembly.
ELEN 468 Advanced Logic Design
Savio Chau Single Cycle Controller Design Last Time: Discussed the Designing of a Single Cycle Datapath Control Datapath Memory Processor (CPU) Input Output.
ENEE350 Spring07 1 Ankur Srivastava University of Maryland, College Park Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005.”
ECE 232 L13. Control.1 ©UCB, DAP’ 97 ECE 232 Hardware Organization and Design Lecture 13 Control Design
Lecture 5 Sept 14 Goals: Chapter 2 continued MIPS assembly language instruction formats translating c into MIPS - examples.
CS61C L26 CPU Design : Designing a Single-Cycle CPU II (1) Garcia, Fall 2006 © UCB Lecturer SOE Dan Garcia inst.eecs.berkeley.edu/~cs61c.
IT253: Computer Organization Lecture 5: Assembly Language and an Introduction to MIPS Tonga Institute of Higher Education.
6.S078 - Computer Architecture: A Constructive Approach Introduction to SMIPS Li-Shiuan Peh Computer Science & Artificial Intelligence Lab. Massachusetts.
Chapter 10 The Assembly Process. What Assemblers Do Translates assembly language into machine code. Assigns addresses to all symbolic labels (variables.
Computer Organization CS224 Fall 2012 Lesson 22. The Big Picture  The Five Classic Components of a Computer  Chapter 4 Topic: Processor Design Control.
EEM 486: Computer Architecture Designing a Single Cycle Datapath.
MIPS Instructions Instructions: Language of the Machine
Chapter 2 — Instructions: Language of the Computer — 1 Conditional Operations Branch to a labeled instruction if a condition is true – Otherwise, continue.
EEL5708/Bölöni Lec 3.1 Fall 2006 Sept 1, 2006 Lotzi Bölöni EEL 5708 High Performance Computer Architecture Lecture 3 Review: Instruction Sets.
COM181 Computer Hardware Lecture 6: The MIPs CPU.
EEM 486: Computer Architecture Lecture 3 Designing Single Cycle Control.
Single Cycle Controller Design
Instructor: Prof. Hany H Ammar, LCSEE, WVU
Digital Logic Design Alex Bronstein
CS161 – Design and Architecture of Computer Systems
CS 230: Computer Organization and Assembly Language
Single-Cycle Datapath and Control
Computer Organization and Design Instruction Sets - 2
Instructions: Language of the Computer
IT 251 Computer Organization and Architecture
Lecture 4: MIPS Instruction Set
ELEN 468 Advanced Logic Design
Computer Organization and Design Instruction Sets - 2
(Chapter 5: Hennessy and Patterson) Winter Quarter 1998 Chris Myers
RISC Concepts, MIPS ISA Logic Design Tutorial 8.
32-bit MIPS ISA.
ENGR 3410 – Computer Architecture Mark L. Chang Fall 2006
Processor (I).
CS/COE0447 Computer Organization & Assembly Language
CpE242 Computer Architecture and Engineering Designing a Single Cycle Datapath Start: X:40.
Computer Organization and Design Instruction Sets
Single Cycle CPU Design
Instructions - Type and Format
Appendix A Classifying Instruction Set Architecture
Single-Cycle CPU DataPath.
Lecture 4: MIPS Instruction Set
Instructors: Randy H. Katz David A. Patterson
The University of Adelaide, School of Computer Science
ECE232: Hardware Organization and Design
Today’s Lecture Quick Review of Last Lecture
John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
The Single Cycle Datapath
Instruction encoding The ISA defines Format = Encoding
CS152 Computer Architecture and Engineering Lecture 8 Designing a Single Cycle Datapath Start: X:40.
Lecture 5: Procedure Calls
Guest Lecturer TA: Shreyas Chand
COMS 361 Computer Organization
Computer Instructions
Computer Architecture
Flow of Control -- Conditional branch instructions
inst.eecs.berkeley.edu/~cs61c-to
UCSD ECE 111 Prof. Farinaz Koushanfar Fall 2018
MIPS Instruction Set Architecture
Flow of Control -- Conditional branch instructions
Instructors: Randy H. Katz David A. Patterson
CS352H Computer Systems Architecture
Reduced Instruction Set Computer (RISC)
MIPS Arithmetic and Logic Instructions
COMS 361 Computer Organization
What You Will Learn In Next Few Sets of Lectures
Designing a Single-Cycle Processor
Processor: Datapath and Control
Presentation transcript:

John Kubiatowicz (http.cs.berkeley.edu/~kubitron) CS152 Computer Architecture and Engineering Lecture 2 Review of MIPS ISA and Design Concepts January 26, 2004 John Kubiatowicz (http.cs.berkeley.edu/~kubitron) lecture slides: http://inst.eecs.berkeley.edu/~cs152/

All computers consist of five components Review: Organization Control Datapath Memory Processor Input Output Let me summarize what I have said so far. The most important thing I want you to remember is that: all computers, no matter how complicated or expensive, can be divided into five components: (1) The datapath and (2) control that make up the processor. (3) The memory system that supplies data to the processor. And last but not least, the (4) input and (5) output devices that get data in and out of the computer. One thing about memory is that Not all “memory” are created equally. Some memory are faster but more expensive and we place them closer to the processor and call them “cache.” The main memory can be slower than the cache so we usually use less expensive parts so we can have more of them. Finally as you can see from the last few slides, the input and output devices usually has the messiest organization. There are several reasons for it: (1) First of all, I/O devices can have a wide range of speed. (2) Then I/O devices also have a wide range of requirements. (s) Finally to make matters worse, historically I/O has attracted the least amount of research interest. But hopefully this is changing. In this class, you will learn about all these five components and we will try to make this as enjoyable as possible. So have fun. All computers consist of five components Processor: (1) datapath and (2) control (3) Memory I/O: (4) Input devices and (5) Output devices Datapath and Control typically on on chip 1/26/03 ©UCB Spring 2004

The Instruction Set: a Critical Interface software instruction set hardware 1/26/03 ©UCB Spring 2004

ISA Choices 1/26/03 ©UCB Spring 2004

Data Types Bit: 0, 1 Bit String: sequence of bits of a particular length 4 bits is a nibble 8 bits is a byte 16 bits is a half-word 32 bits is a word 64 bits is a double-word Character: ASCII 7 bit code UNICODE 16 bit code Decimal: digits 0-9 encoded as 0000b thru 1001b two decimal digits packed per 8 bit byte Integers: 2's Complement Floating Point: Single Precision Double Precision Extended Precision exponent How many +/- #'s? Where is decimal pt? How are +/- exponents represented? E M x R base 1/26/03 mantissa ©UCB Spring 2004

Instruction Set Architecture: What Must be Specified? Fetch Decode Operand Execute Result Store Next Instruction Format or Encoding how is it decoded? Location of operands and result where other than memory? how many explicit operands? how are memory operands located? which can or cannot be in memory? Data type and Size Operations what are supported Successor instruction jumps, conditions, branches fetch-decode-execute is implicit! 1/26/03 ©UCB Spring 2004

Top 10 80x86 Instructions 1/26/03 ©UCB Spring 2004

move register-register, and, shift, compare equal, compare not equal, Operation Summary Support these simple instructions, since they will dominate the number of instructions executed: load, store, add, subtract, move register-register, and, shift, compare equal, compare not equal, branch, jump, call, return; 1/26/03 ©UCB Spring 2004

Methods of Testing Condition Condition Codes Processor status bits are set as a side-effect of arithmetic instructions (possibly on Moves) or explicitly by compare or test instructions. ex: add r1, r2, r3 bz label Condition Register Ex: cmp r1, r2, r3 bgt r1, label Compare and Branch Ex: bgt r1, r2, label Branches will be the bane of our existence in the future! 1/26/03 ©UCB Spring 2004

2 questions for design of ISA: Memory Addressing Processor Memory: Continuous Linear Address Space? Since 1980 almost every machine uses addresses to level of 8-bits (byte) 2 questions for design of ISA: How do byte addresses map onto words? Can a word be placed on any byte boundary? • 1/26/03 ©UCB Spring 2004

Addressing Objects: Endianess and Alignment Big Endian: address of most significant byte = word address (xx00 = Big End of word) IBM 360/370, Motorola 68k, MIPS, Sparc, HP PA Little Endian: address of least significant byte = word address (xx00 = Little End of word) Intel 80x86, DEC Vax, DEC Alpha (Windows NT) little endian byte 0 3 2 1 0 msb lsb 0 1 2 3 0 1 2 3 Aligned big endian byte 0 Alignment: require that objects fall on address that is multiple of their size. Not Aligned 1/26/03 ©UCB Spring 2004

Why Post-increment/Pre-decrement? Scaled? Addressing Modes Addressing mode Example Meaning Register Add R4,R3 R4 ¬ R4+R3 Immediate Add R4,#3 R4 ¬ R4+3 Displacement Add R4,100(R1) R4 ¬ R4+Mem[100+R1] Register indirect Add R4,(R1) R4 ¬ R4+Mem[R1] Indexed / Base Add R3,(R1+R2) R3 ¬ R3+Mem[R1+R2] Autoinc/dec change source register! Why auto inc/dec Why reverse order of inc/dec? Scaled multiples by d, a small constant Why? Hint: byte addressing Direct or absolute Add R1,(1001) R1 ¬ R1+Mem[1001] Memory indirect Add R1,@(R3) R1 ¬ R1+Mem[Mem[R3]] Post-increment Add R1,(R2)+ R1 ¬ R1+Mem[R2]; R2 ¬ R2+d Pre-decrement Add R1,–(R2) R2 ¬ R2–d; R1 ¬ R1+Mem[R2] Scaled Add R1,100(R2)[R3] R1 ¬ R1+Mem[100+R2+R3*d] Why Post-increment/Pre-decrement? Scaled? 1/26/03 ©UCB Spring 2004

Addressing Mode Usage? 3 programs measured on machine with all address modes (VAX) --- Displacement: 42% avg, 32% to 55% 75% --- Immediate: 33% avg, 17% to 43% 85% --- Register deferred (indirect): 13% avg, 3% to 24% --- Scaled: 7% avg, 0% to 16% --- Memory indirect: 3% avg, 1% to 6% --- Misc: 2% avg, 0% to 3% 75% displacement & immediate 85% displacement, immediate & register indirect 1/26/03 ©UCB Spring 2004

MIPS I Instruction set 1/26/03 ©UCB Spring 2004

MIPS R3000 Instruction Set Architecture (Summary) Register Set 32 general 32-bit registers Register zero ($R0) always zero Hi/Lo for multiplication/division Instruction Categories Load/Store Computational Integer/Floating point Jump and Branch Memory Management Special 3 Instruction Formats: all 32 bits wide Registers R0 - R31 PC HI LO OP rs rt rd sa funct OP rs rt immediate OP jump target 1/26/03 ©UCB Spring 2004

MIPS Addressing Modes/Instruction Formats All instructions 32 bits wide Register (direct) op rs rt rd register Immediate op rs rt immed Base+index op rs rt immed Memory register + PC-relative op rs rt immed Memory PC + Register Indirect? 1/26/03 ©UCB Spring 2004

MIPS I Operation Overview Arithmetic Logical: Add, AddU, Sub, SubU, And, Or, Xor, Nor, SLT, SLTU AddI, AddIU, SLTI, SLTIU, AndI, OrI, XorI, LUI SLL, SRL, SRA, SLLV, SRLV, SRAV Memory Access: LB, LBU, LH, LHU, LW, LWL,LWR SB, SH, SW, SWL, SWR 1/26/03 ©UCB Spring 2004

Multiply / Divide Start multiply, divide MULT rs, rt MULTU rs, rt DIV rs, rt DIVU rs, rt Move result from multiply, divide MFHI rd MFLO rd Move to HI or LO MTHI rd MTLO rd Why not Third field for destination? (Hint: how many clock cycles for multiply or divide vs. add?) Registers HI LO 1/26/03 ©UCB Spring 2004

MIPS arithmetic instructions Instruction Example Meaning Comments add add $1,$2,$3 $1 = $2 + $3 3 operands; exception possible subtract sub $1,$2,$3 $1 = $2 – $3 3 operands; exception possible add immediate addi $1,$2,100 $1 = $2 + 100 + constant; exception possible add unsigned addu $1,$2,$3 $1 = $2 + $3 3 operands; no exceptions subtract unsigned subu $1,$2,$3 $1 = $2 – $3 3 operands; no exceptions add imm. unsign. addiu $1,$2,100 $1 = $2 + 100 + constant; no except multiply mult $2,$3 Hi, Lo = $2 x $3 64-bit signed product multiply unsigned multu$2,$3 Hi, Lo = $2 x $3 64-bit unsigned product divide div $2,$3 Lo = $2 ÷ $3, Lo = quotient, Hi = remainder Hi = $2 mod $3 divide unsigned divu $2,$3 Lo = $2 ÷ $3, Unsigned quotient & remainder Move from Hi mfhi $1 $1 = Hi Used to get copy of Hi Move from Lo mflo $1 $1 = Lo Used to get copy of Lo Which add for address arithmetic? Which add for integers? 1/26/03 ©UCB Spring 2004

MIPS logical instructions Instruction Example Meaning Comment and and $1,$2,$3 $1 = $2 & $3 3 reg. operands; Logical AND or or $1,$2,$3 $1 = $2 | $3 3 reg. operands; Logical OR xor xor $1,$2,$3 $1 = $2 Å $3 3 reg. operands; Logical XOR nor nor $1,$2,$3 $1 = ~($2 |$3) 3 reg. operands; Logical NOR and immediate andi $1,$2,10 $1 = $2 & 10 Logical AND reg, constant or immediate ori $1,$2,10 $1 = $2 | 10 Logical OR reg, constant xor immediate xori $1, $2,10 $1 = ~$2 &~10 Logical XOR reg, constant shift left logical sll $1,$2,10 $1 = $2 << 10 Shift left by constant shift right logical srl $1,$2,10 $1 = $2 >> 10 Shift right by constant shift right arithm. sra $1,$2,10 $1 = $2 >> 10 Shift right (sign extend) shift left logical sllv $1,$2,$3 $1 = $2 << $3 Shift left by variable shift right logical srlv $1,$2, $3 $1 = $2 >> $3 Shift right by variable shift right arithm. srav $1,$2, $3 $1 = $2 >> $3 Shift right arith. by variable 1/26/03 ©UCB Spring 2004

MIPS data transfer instructions Instruction Comment SW 500($4), $3 Store word SH 502($2), $3 Store half SB 41($3), $2 Store byte LW $1, 30($2) Load word LH $1, 40($3) Load halfword LHU $1, 40($3) Load halfword unsigned LB $1, 40($3) Load byte LBU $1, 40($3) Load byte unsigned LUI $1, 40 Load Upper Immediate (16 bits shifted left by 16) Why need LUI? LUI $5 $5 0000 … 0000 1/26/03 ©UCB Spring 2004

When does MIPS sign extend? When value is sign extended, copy upper bit to full value: Examples of sign extending 8 bits to 16 bits: 00001010  00000000 00001010 10001100  11111111 10001100 When is an immediate operand sign extended? Arithmetic instructions (add, sub, etc.) always sign extend immediates even for the unsigned versions of the instructions! Logical instructions do not sign extend immediates (They are zero extended) Load/Store address computations always sign extend immediates Multiply/Divide have no immediate operands however: “unsigned”  treat operands as unsigned The data loaded by the instructions lb and lh are extended as follows (“unsigned”  don’t extend): lbu, lhu are zero extended lb, lh are sign extended 1/26/03 ©UCB Spring 2004

Administrative Matters CS152 news group: ucb.class.cs152 (email cs152@cory with specific questions) Slides and handouts available via web: http://inst.eecs.berkeley.edu/~cs152 Sign up to the cs152-announce mailing list: Link on front page of web site Sections are on Thursday: Starting this Thursday (1/29)! 2:00 – 4:00 3107 Etcheverry 4:00 – 6:00 3107 Etcheverry Get Cory key card/card access to Cory 119 Homework #1 due Monday at beginning of lecture It is up there now – really! Lab 1 due on following Wednesday (2/4), so get moving! Prerequisite quiz will also be on Monday 1/29: CS 61C, CS150 Review Chapters 1-4, Ap, B of COD, Second Edition Look at previous versions of prerequisite quiz off handouts page Don’t forget to petition if you are on the waiting list! Available on third-floor, near front office Must turn in soon Turn in survey forms with photo before Prereq quiz! 1/26/03 ©UCB Spring 2004

MIPS Compare and Branch BEQ rs, rt, offset if R[rs] == R[rt] then PC-relative branch BNE rs, rt, offset <> Compare to zero and Branch BLEZ rs, offset if R[rs] <= 0 then PC-relative branch BGTZ rs, offset > BLT < BGEZ >= BLTZAL rs, offset if R[rs] < 0 then branch and link (into R 31) BGEZAL >=! Remaining set of compare and branch ops take two instructions Almost all comparisons are against zero! 1/26/03 ©UCB Spring 2004

MIPS jump, branch, compare instructions Instruction Example Meaning branch on equal beq $1,$2,100 if ($1 == $2) go to PC+4+100 Equal test; PC relative branch branch on not eq. bne $1,$2,100 if ($1!= $2) go to PC+4+100 Not equal test; PC relative set on less than slt $1,$2,$3 if ($2 < $3) $1=1; else $1=0 Compare less than; 2’s comp. set less than imm. slti $1,$2,100 if ($2 < 100) $1=1; else $1=0 Compare < constant; 2’s comp. set less than uns. sltu $1,$2,$3 if ($2 < $3) $1=1; else $1=0 Compare less than; natural numbers set l. t. imm. uns. sltiu $1,$2,100 if ($2 < 100) $1=1; else $1=0 Compare < constant; natural numbers jump j 10000 go to 10000 Jump to target address jump register jr $31 go to $31 For switch, procedure return jump and link jal 10000 $31 = PC + 4; go to 10000 For procedure call 1/26/03 ©UCB Spring 2004

Signed vs. Unsigned Comparison After executing these instructions: slt r4,r2,r1 ; if (r2 < r1) r4=1; else r4=0 slt r5,r3,r1 ; if (r3 < r1) r5=1; else r5=0 sltu r6,r2,r1 ; if (r2 < r1) r6=1; else r6=0 sltu r7,r3,r1 ; if (r3 < r1) r7=1; else r7=0 What are values of registers r4 - r7? Why? r4 = ; r5 = ; r6 = ; r7 = ; two two two 1/26/03 ©UCB Spring 2004

Branch & Pipelines Time li r3, #7 execute sub r4, r4, 1 ifetch execute bz r4, LL ifetch execute Branch addi r5, r3, 1 ifetch execute Delay Slot LL: slt r1, r3, r5 ifetch execute Branch Target By the end of Branch instruction, the CPU knows whether or not the branch will take place. However, it will have fetched the next instruction by then, regardless of whether or not a branch will be taken. Why not execute it? 1/26/03 ©UCB Spring 2004

 Delay Slot Instruction Delayed Branches li r3, #7 sub r4, r4, 1 bz r4, LL addi r5, r3, 1 subi r6, r6, 2 LL: slt r1, r3, r5  Delay Slot Instruction In the “Raw” MIPS, the instruction after the branch is executed even when the branch is taken This is hidden by the assembler for the MIPS “virtual machine” allows the compiler to better utilize the instruction pipeline (???) Jump and link (jal inst): Put the return addr. Into link register (R31): PC+4 (logical architecture) PC+8 physical (“Raw”) architecture  delay slot executed Then jump to destination address 1/26/03 ©UCB Spring 2004

Filling Delayed Branches Inst Fetch Dcd & Op Fetch Execute execute successor even if branch taken! Inst Fetch Dcd & Op Fetch Execute Then branch target or continue Inst Fetch Single delay slot impacts the critical path add r3, r1, r2 sub r4, r4, 1 bz r4, LL NOP ... LL: add rd, ... Compiler can fill a single delay slot with a useful instruction 50% of the time. try to move down from above jump move up from target, if safe Is this violating the ISA abstraction? 1/26/03 ©UCB Spring 2004

Miscellaneous MIPS I instructions break A breakpoint trap occurs, transfers control to exception handler syscall A system trap occurs, transfers control to exception handler coprocessor instrs. Support for floating point TLB instructions Support for virtual memory: discussed later restore from exception Restores previous interrupt mask & kernel/user mode bits into status register load word left/right Supports misaligned word loads store word left/right Supports misaligned word stores 1/26/03 ©UCB Spring 2004

MIPS: Software conventions for Registers 0 zero constant 0 1 at reserved for assembler 2 v0 expression evaluation & 3 v1 function results 4 a0 arguments 5 a1 6 a2 7 a3 8 t0 temporary: caller saves . . . (callee can clobber) 15 t7 16 s0 callee saves . . . (callee must save) 23 s7 24 t8 temporary (cont’d) 25 t9 26 k0 reserved for OS kernel 27 k1 28 gp Pointer to global area 29 sp Stack pointer 30 fp frame pointer 31 ra Return Address (HW) 1/26/03 ©UCB Spring 2004

Calls: Why Are Stacks So Great? Stacking of Subroutine Calls & Returns and Environments: A A: CALL B CALL C C: RET B: A B A B C A B A Some machines provide a memory stack as part of the architecture (e.g., VAX) Sometimes stacks are implemented via software convention (e.g., MIPS) 1/26/03 ©UCB Spring 2004

Memory Stacks Useful for stacked environments/subroutine call & return even if operand stack not part of architecture Stacks that Grow Up vs. Stacks that Grow Down: inf. Big 0 Little Next Empty? grows up grows down Memory Addresses c b Last Full? SP a 0 Little inf. Big How is empty stack represented? Big  Little: Last Full POP: Read from Mem(SP) Increment SP PUSH: Decrement SP Write to Mem(SP) Big  Little: Next Empty POP: Increment SP Read from Mem(SP) PUSH: Write to Mem(SP) Decrement SP 1/26/03 ©UCB Spring 2004

Call-Return Linkage: Stack Frames High Mem ARGS Reference args and local variables at fixed (positive) offset from FP Callee Save Registers (old FP, RA) Local Variables FP Grows and shrinks during expression evaluation SP Low Mem Many variations on stacks possible (up/down, last pushed / next ) Compilers normally keep scalar variables in registers, not memory! 1/26/03 ©UCB Spring 2004

MIPS / GCC Calling Conventions fact: addiu $sp, $sp, -32 sw $ra, 20($sp) sw $fp, 16($sp) addiu $fp, $sp, 32 . . . sw $a0, 0($fp) ... lw $ra, 20($sp) lw $fp, 16($sp) addiu $sp, $sp, 32 jr $ra FP SP ra low address FP SP ra ra old FP FP SP ra old FP First four arguments passed in registers Result passed in $v0/$v1 1/26/03 ©UCB Spring 2004

Logic Design: Combinational And Sequential 1/26/03 ©UCB Spring 2004

Finite State Machines: System state is explicit in representation Transitions between states represented as arrows with inputs on arcs. Output may be either part of state or on arcs Alpha/ Delta/ 2 Beta/ 1 “Mod 3 Machine” Input (MSB first) 106 Mod 3 1 1 1 0 1 0 1 0 0 1 2 2 1 1 1/26/03 ©UCB Spring 2004

Implementation as Combinational logic + Latch Alpha/ Delta/ 2 Beta/ 1 0/0 1/0 1/1 0/1 Latch Combinational Logic “Moore Machine” “Mealey Machine” 1/26/03 ©UCB Spring 2004

Again: The loop of control (is there a statemachine here?) Instruction Fetch Decode Operand Execute Result Store Next Instruction Format or Encoding how is it decoded? Location of operands and result where other than memory? how many explicit operands? how are memory operands located? which can or cannot be in memory? Data type and Size Operations what are supported Successor instruction jumps, conditions, branches fetch-decode-execute is implicit! 1/26/03 ©UCB Spring 2004

Remember The Single-Cycle Datapath? Control Ideal Instruction Memory Control Signals Instruction Conditions Rd Rs Rt 5 5 5 Instruction Address A Data Address Data Out One thing you may noticed from our last slide is that almost all instructions, except Jump, require reading some registers, do some computation, and then do something else. Therefore our datapath will look something like this. For example, if we have an add instruction (points to the output of Instruction Memory), we will read the registers from the register file (Ra, Rb and then busA and busB). Add the two numbers together (ALU) and then write the result back to the register file. On the other hand, if we have a load instruction, we will first use the ALU to calculate the memory address. Once the address is ready, we will use it to access the Data Memory. And once the data is available on Data Memory’s output bus, we will write the data to the register file. Well, this is simple enough. But if it is this simple, you probably won’t need to take this class. So in today’s lecture, I will show you how to turn this abstract datapath into a real datapath by making it slightly (JUST slightly) more complicated so it can do real work for you. But before we do that, let’s do a quick review of the clocking methodology +3 = 16 (X:56) 32 Clk PC Rw Ra Rb ALU 32 32 Ideal Data Memory 32 32-bit Registers Next Address Data In B Clk Clk 32 Datapath If you don’t fully remember this, it is ok! (Don’t need for prereq quiz) 1/26/03 ©UCB Spring 2004

More detail: A Single Cycle Datapath Rs, Rt, Rd and Imed16 hardwired into datapath from Fetch Unit We have everything except ontrol signals (underline) Today’s lecture will show you how to generate the control signals Instruction<31:0> nPC_sel Instruction Fetch Unit Rd Rt <21:25> <16:20> <11:15> <0:15> Clk RegDst 1 Mux Rs Rt Rt Rs Rd Imm16 RegWr ALUctr The result of the last lecture is this single-cycle datapath. +1 = 6 min. (X:46) 5 5 5 Zero MemWr MemtoReg busA Rw Ra Rb busW 32 32 32-bit Registers ALU 32 busB 32 Clk 32 Mux Mux 32 WrEn Adr 1 1 Data In 32 Data Memory imm16 Extender 32 16 Clk ALUSrc 1/26/03 ExtOp ©UCB Spring 2004

PLA Implementation of the Main Control op<0> op<5> . <0> R-type ori lw sw beq jump RegWrite ALUSrc Similarly, for ALUSrc, we need to OR the ori, load, and store terms together because we need to assert the ALUSrc signals whenever we have the Ori, load, or store instructions. The RegDst, MemtoReg, MemWrite, Branch, and Jump signals are very simple. They don’t need to OR any product terms together because each is asserted for only one instruction. For example, RegDst is asserted ONLY for R-type instruction and MemtoReg is asserted ONLY for load instruction. ExtOp, on the other hand, needs to be set to 1 for both the load and store instructions so the immediate field is sign extended properly. Therefore, we need to OR the load and store terms together to form the signal ExtOp. Finally, we have the ALUop signals. But clever encoding of the ALUop field, we are able to keep them simple so that no OR gates is needed. If you don’t already know, this regular structure with an array of AND gates followed by another array of OR gates is called a Programmable Logic Array, or PLA for short. It is one of the most common ways to implement logic function and there are a lot of CAD tools available to simplify them. +3 = 70 min. (Y:50) RegDst MemtoReg MemWrite Branch Jump ExtOp ALUop<2> ALUop<1> ALUop<0> 1/26/03 ©UCB Spring 2004

Recap: An Abstract View of the Critical Path (Load) Register file and ideal memory: The CLK input is a factor ONLY during write operation During read operation, behave as combinational logic: Address valid => Output valid after “access time.” Critical Path (Load Operation) = PC’s Clk-to-Q + Instruction Memory’s Access Time + Register File’s Access Time + ALU to Perform a 32-bit Add + Data Memory Access Time + Setup Time for Register File Write + Clock Skew Ideal Instruction Memory Instruction Now with the clocking methodology back in your mind, we can think about how the critical path of our “abstract” datapath may look like. One thing to keep in mind about the Register File and Ideal Memory (points to both Instruction and Data) is that the Clock input is a factor ONLY during the write operation. For read operation, the CLK input is not a factor. The register file and the ideal memory behave as if they are combinational logic. That is you apply an address to the input, then after certain delay, which we called access time, the output is valid. We will come back to these points (point to the “behave” bullets) later in this lecture. But for now, let’s look at this “abstract” datapath’s critical path which occurs when the datapath tries to execute the Load instruction. The time it takes to execute the load instruction are the sum of: (a) The PC’s clock-to-Q time. (b) The instruction memory access time. (c) The time it takes to read the register file. (d) The ALU delay in calculating the Data Memory Address. (e) The time it takes to read the Data Memory. (f) And finally, the setup time for the register file and clock skew. +3 = 21 (Y:01) Rd Rs Rt Imm 5 5 5 16 Instruction Address A Data Address Rw Ra Rb ALU 32 Clk PC 32 32 Ideal Data Memory 32 32-bit Registers Next Address Data In B Clk Clk 32 1/26/03 ©UCB Spring 2004

Worst Case Timing (Load) Clk Clk-to-Q PC Old Value New Value Instruction Memory Access Time Rs, Rt, Rd, Op, Func Old Value New Value Delay through Control Logic ALUctr Old Value New Value ExtOp Old Value New Value ALUSrc Old Value New Value This timing diagram shows the worst case timing of our single cycle datapath which occurs at the load instruction. Clock to Q time after the clock tick, PC will present its new value to the Instruction memory. After a delay of instruction access time, the instruction bus (Rs, Rt, ...) becomes valid. Then three things happens in parallel: (a) First the Control generates the control signals (Delay through Control Logic). (b) Secondly, the regiser file is access to put Rs onto busA. (c) And we have to sign extended the immediate field to get the second operand (busB). Here I asuume register file access takes longer time than doing the sign extension so we have to wait until busA valid before the ALU can start the address calculation (ALU delay). With the address ready, we access the data memory and after a delay of the Data Memory Access time, busW will be valid. And by this time, the control unit whould have set the RegWr signal to one so at the next clock tick, we will write the new data coming from memory (busW) into the register file. +3 = 77 min. (Y:57) MemtoReg Old Value New Value Register Write Occurs RegWr Old Value New Value Register File Access Time busA Old Value New Value Delay through Extender & Mux busB Old Value New Value ALU Delay Address Old Value New Value Data Memory Access Time busW 1/26/03 Old Value ©UCB Spring 2004 New

Ultimately: It’s all about communication Pentium III Chipset Proc Caches Busses Memory I/O Devices: Controllers adapters Disks Displays Keyboards Networks All have interfaces & organizations Um…. It’s the network stupid???! 1/26/03 ©UCB Spring 2004

Summary: Salient features of MIPS I 32-bit fixed format inst (3 formats) 32 32-bit GPR (R0 contains zero) and 32 FP registers (and HI LO) partitioned by software convention 3-address, reg-reg arithmetic instr. Single address mode for load/store: base+displacement no indirection, scaled 16-bit immediate plus LUI Simple branch conditions compare against zero or two registers for =, no integer condition codes Delayed branch execute instruction after a branch (or jump) even if the branch is taken (Compiler can fill a delayed branch with useful work about 50% of the time) 1/26/03 ©UCB Spring 2004