Download presentation
Presentation is loading. Please wait.
1
Processor (I)
2
The Processor: Datapath & Control
We're ready to look at an implementation of the MIPS Simplified ISA to contain only: Memory-reference instructions: lw, sw Arithmetic-logical instructions: add, sub, and, or, slt Control flow instructions: beq, j Generic implementation: Use the program counter (PC) to supply instruction address Get the instruction from memory Read registers Use the instruction to decide exactly what to do All instructions use the ALU after reading the registers Memory-reference? arithmetic? control flow?
3
More Implementation Details
Abstract / Simplified View: Five steps to execute an instruction Fetch, Decode, Execute, Memory, Writeback add ALU Registers 4 PC Instruction memory address instruction data reg# Data
4
State Elements State elements are needed to contain the state (value)
Clocks used in synchronous logic When should an element that contains state be updated? Unclocked state element: set-reset latch C l o c k p e r i d R s n g F a (cycle time) R S Q S R Output No change Q = 1 Q = 0 Invalid
5
D-latch vs. D flip-flop D-latch D flip-flop
Output changes only on the clock edge Q C D _ D C Q D C l a t c h Q
6
Our Implementation An edge triggered methodology Typical execution:
Read contents of some state elements, Send values through some combinational logic Write results to one or more state elements S t a e l m n 1 2 C o b i g c k y
7
The Steps of Designing a Processor
1. Instruction Set Architecture used for high-level specification or Register-Transfer Level (RTL) model Includes major organizational decisions Examples: number and type of functional units, number of register file ports 2. Datapath-RTL refined to specify functional unit behavior and interfaces Datapath components Datapath interconnect Associated datapath “control points” 3. Control structure defined and Control-RTL behavioral representation created 4. RTL datapath and control design are refined to track physical design and functional validation Changes made for timing and bug fixes Amount of work varies with capabilities of CAD tools and degree of optimization for cost and performance
8
Example RTL add rd, rs, rt lw rt, imm16(rs)
Instruction mem[PC]; Fetch instruction from memory R[rd] R[rs] + R[rt]; ADD operation PC PC + 4; Calculate next address lw rt, imm16(rs) Addr R[rs] + SignExt(imm16); Compute memory Addr R[rt] Mem[Addr]; Load data into register op rs rt rd shamt funct op rs rt imm16
9
Combinational Logic Elements
Combinational logic does not use a clock Adder MUX ALU Adder 32 A B Sum Carry CarryIn 32 A B Y Select MUX ALU 32 A B Result OP 3
10
Logic Abstraction Make sure you understand the abstractions!
Simpler version is easier to understand than actual implementation M u x S e l c t B 3 1 A C … M u x C S e l c t 3 2 B A
11
Register File Built using D flip-flops Data ports Select register by
Two read ports connected to two 32-bit buses One write port connected to on 32-bit bus Select register by Read register 1, Read register 2, Write register Register File Read Data 1 Data 2 Read register 2 Read register 1 Write register Write data Write
12
Register File: Read Ports
Two read ports Two source operands are read from two ports R e a d r g i s t r 1 1 . n – 2 M u x r 2 Do you understand? What is the “Mux” above?
13
Register File: Write Port
We still use the real clock to determine when to write 1 n - t o 2 d e c r – R g i s C D . write register write write data
14
Storage Element: Memory
Two ports and buses for memory One input bus (Data In) connected to one input port (Write data) One output bus (Data Out) connected to one output port (Read data) Memory word is selected by Address If MemRead = 1 then Address selects the word to put on Data Out If MemWrite = 1 then Address selects the memory word to be written via the Data In bus MemRead Data In Memory MemWrite 32 DataOut Address Read data Write
15
Instruction Fetch Unit
Common RTL operations Fetch the Instruction: Instruction mem[PC] Update the program counter: Sequential Code: PC PC + 4 Branch and Jump: PC ”something else” Instruction Memory
16
R-type ALU Operations Register File Read Data 1 Data 2 Read register 2 Read register 1 Write register Write data Write A L U o p e r a t i o n 4 Instruction Z e r o A L U op rs rt rd shamt funct A L U r e s u l t Registers[Write register] ALU operation (Read data 1, Read data 2) R e g W r i t e
17
Load & Store R e a d r g i s t 1 2 W D A L U Z o u l p n 4 Instruction m y M S x 6 3 (=add) op rs rt imm16 Addresses are calculated from ALU (ALU operation is add) Load ALU result ALU operation(Read data1, Sign extend(Instruction[0:15])) Registers[Write register] MEM[ALU result] Store MEM[ALU result] Read data 2
18
Branch imm16 rt rs op Zero ALU operation (Read data 1, Read data 2)
4 (=subtract) op rs rt imm16 Condition is calculated from ALU (ALU operation is subtract) Zero ALU operation (Read data 1, Read data 2) Branch target Add(PC+4, Shift left2(Sign extend(Instruction[0:15]))) Branch control Zero Depending on condition select appropriate target address Even for fall-through case, ALU needs to calculate target address for taken case
19
Simple Implementation
Include the functional units we need for each instruction P C I n s t r u c i o a d e m y A S . b g A d r e s R a t D m o y . u n i W M b S g - x 1 6 3 2 R e a d r g i s t 1 2 W D A L U Z o u l . b 5 n m p 4
20
Building the Datapath Use MUXs to stitch them together R e a d r g i s
1 2 W A L U Z o M m P C S c p n 4 x 6 3 I u l D y h f Use MUXs to stitch them together
21
Control Selecting the operations to perform (ALU, read/write, etc.)
Controlling the flow of data (multiplexor inputs) Information comes from the 32 bits of the instruction Example: add $8, $17, $18 ALU's operation based on instruction type and function code 000000 10001 10010 01000 00000 100000 op rs rt rd shamt funct
22
ALU Control What should the ALU do with this instruction
Example: lw $1, 100($2) ALU control input AND OR add subtract set-on-less-than NOR Why is the code for subtract 0110 and not 0011? 35 2 1 100 op rs rt 16 bit offset
23
ALU Control (cont’d) Must describe hardware to compute 4-bit ALU control input Given instruction type (opcode) decides the ALUOp ALUOp = 00 (lw, sw instructions) (beq instruction) (all arithmetic-logical instructions) Function code is meaningful only for arithmetic type (10) Describe it using a truth table (can turn into gates): Input bits output bits
24
Combinational Logic for ALU Control
00 (lw, sw instructions) 01 (beq instruction) 10 (arithmetic-logical) O p e r a t i o n 2 1 A L U F 3 ( 5 – ) c l b k 0000 AND 0001 OR 0010 add 0110 subtract 0111 slt 1100 NOR
25
Main Control Instruction encoding determines the main control
Arithmetic-logical instructions: add, sub, and, or, slt Memory instructions & branch: lw, sw, beq Jump instructions: j op rs rt rd shamt funct R-type 31:26 25:21 20:16 15:11 10:6 5:0 op rs rt 16 bit address I-type 31:26 25:21 20:16 15:0 lw: rt <- Mem[rs + Imm16] – the only case for writing a register op 26 bit address J-type 31:26 25:0
26
R e a d r g i s t 1 2 W A L U Z o S n x 6 3 I u c [ – ] l M D m y h f 4 P C 5 B O p
27
Control R - type I w s b e q O p 1 2 3 4 5 n u t g D A L U S r c M m o
1 2 3 4 5 n u t g D A L U S r c M m o W i a d B h
28
Implementing Jump Unconditional jump to target address
Target address is calculated by using pseudo-direct addressing PCnext = (PC+4)[31-28] + (Instruction[25-0] << 2) Additional logic to calculate the target address (Figure 5.24 in textbook) Reuse adder for PC+4 Extra shifter: to calculate << 2 operation Extra mux: to select the target address op 26 bit address J-type 31:26 25:0
29
Our Simple Control Structure
All of the logic is combinational Wait for everything to settle down and right thing to be done ALU might not produce “right answer” right away We use write signals along with clock to determine when to write Cycle time determined by length of the longest path S t a e l m n 1 2 C o b i g c k y We are ignoring some details like setup and hold times
30
Single Cycle Implementation (example)
Calculate cycle time under the following condition Operation time of major units (assume no delay for the others) Memory (200ps), ALU and adders (100ps), Register file access (50ps) Compare with a machine with variable clock cycles Assume the following instruction mix: load(25%), store(10%), ALU instr(45%), branch(15%), jump(5%) lw: 200(instruction fetch) + 50(register) + 100(adder) + 200(memory read) +50(register write)
31
Where we are headed Single Cycle Problems: One Solution:
What if we had a more complicated instruction like floating point? Wasteful of area One Solution: Use a “smaller” cycle time Have different instructions take different numbers of cycles A “multicycle” datapath: D a t R e g i s r # P C A d I n u c o M m y L U O B
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.