Download presentation
Presentation is loading. Please wait.
1
EI 209 Computer Organization Fall 2017
Chapter 4B: The Processor, Control and Multi-cycle Datapath [Adapted from Computer Organization and Design, 4th Edition, Patterson & Hennessy, © 2012, MK] Combinations to AV system, etc (1988 in 113 IST) Call AV hot line at
2
Recap: A Summary of Control Signals
inst Register Transfer ADD R[rd] <– R[rs] + R[rt]; PC <– PC + 4 ALUsrc = RegB, ALUctr = “add”, RegDst = rd, RegWr, nPC_sel = “+4” SUB R[rd] <– R[rs] – R[rt]; PC <– PC + 4 ALUsrc = RegB, ALUctr = “sub”, RegDst = rd, RegWr, nPC_sel = “+4” ORi R[rt] <– R[rs] + zero_ext(Imm16); PC <– PC + 4 ALUsrc = Im, Extop = “Z”, ALUctr = “or”, RegDst = rt, RegWr, nPC_sel = “+4” LOAD R[rt] <– MEM[ R[rs] + sign_ext(Imm16)]; PC <– PC + 4 ALUsrc = Im, Extop = “Sn”, ALUctr = “add”, MemtoReg, RegDst = rt, RegWr, nPC_sel = “+4” STORE MEM[ R[rs] + sign_ext(Imm16)] <– R[rs]; PC <– PC + 4 ALUsrc = Im, Extop = “Sn”, ALUctr = “add”, MemWr, nPC_sel = “+4” BEQ if ( R[rs] == R[rt] ) then PC <– PC + sign_ext(Imm16)] || 00 else PC <– PC + 4 nPC_sel = “Br”, ALUctr = “sub”
3
Step 5: Assemble Control logic
Instruction<31:0> Inst Memory <21:25> <21:25> <16:20> <11:15> <0:15> Adr Op Fun Rt Rs Rd Imm16 Decoder nPC_sel RegWr RegDst ExtOp ALUSrc ALUctr MemWr MemtoReg Equal DATA PATH
4
A Summary of the Control Signals
func We Don’t Care :-) op add sub ori lw sw beq jump RegDst ALUSrc MemtoReg RegWrite MemWrite Branch Jump ExtOp ALUctr<2:0> 1 x Add Subtr Or xxx Here is a table summarizing the control signals setting for the seven (add, sub, ...) instructions we have looked at. Instead of showing you the exact bit values for the ALU control (ALUctr), I have used the symbolic values here. The first two columns are unique in the sense that they are R-type instrucions and in order to uniquely identify them, we need to look at BOTH the op field as well as the func fiels. Ori, lw, sw, and branch on equal are I-type instructions and Jump is J-type. They all can be uniquely idetified by looking at the opcode field alone. Now let’s take a more careful look at the first two columns. Notice that they are identical except the last row. So we can combine these two rows here if we can “delay” the generation of ALUctr signals. This lead us to something called “local decoding.” +3 = 42 min. (Y:22) op rs rt rd shamt func 6 11 16 21 26 31 R-type add, sub I-type op rs rt immediate ori, lw, sw, beq J-type op target address jump
5
The Concept of Local Decoding
Two levels of decoding: Main Control and ALU Control R-type ori lw sw beq jump RegDst ALUSrc MemtoReg RegWrite MemWrite Branch Jump ExtOp ALUctr 1 x Add/Subtr Or Add Subtr xxx op That is, instead of asking the Main Control to generates the ALUctr signals directly (see the diagram with the ALU), the main control will generate a set of signals called ALUop. For all I and J type instructions, ALUop will tell the ALU Control exactly what the ALU needs to do (Add, Subtract, ...) . But whenever the Main Control sees a R-type instructions, it simply throws its hands up and say: “Wow, I don’t know what the ALU has to do but I know it is a R-type instruction” and let the Local Control Block, ALU Control to take care of the rest. Notice that this save us one column from the table we had on the last slide. But let’s be honest, if one column is the ONLY thing we save, we probably will not do it. But when you have to design for the entire MIPS instruction set, this column will used for ALL R-type instructions, which is more than just Add and Subtract I showed you here. Another advantage of this table over the last one, besides being smaller, is that we can uniquely identify each column by looking at the Op field only. Therefore, as I will show you later, the Main Control ONLY needs to look at the Opcode field. How many bits do we need for ALUop? +3 = 45 min. (Y:25) ALUctr is determined by ALUop and func, while other control signals are determined by op。 func Main Control op 6 ALU (Local) N=? ALUop ALUctr 3 How many bits will N need? 3, why? R、I-ori、I-lw/sw、I-beq、J
6
The Decoding of the “func” Field
Main Control op 6 ALU (Local) func N ALUop ALUctr 3 Encoding ALUop as follows R-type ori lw sw beq jump ALUop (Symbolic) “R-type” Or Add Subtr xxx ALUop<2:0> 1 xx 0 10 0 00 0x1 Could ALUop use just 2 bits? Yes!Since jump is X,we could use 2 bits:R:11, I-ori:10, I-beq:01, I-lw/sw:00, J-xx R-Type is1xx,could identify R-instruction by using 1 bit! 000000 rs rt rd shamt func 6 11 16 21 26 31 R-type What this table and diagram implies is that if the ALU Control receives ALUop = 100, it has to decode the instruction’s “func” field to figure out what the ALU needs to do. Based on the MIPS encoding in Appendix A of your text book, we know we have a Add instruction if the func field is If the func field is , we know we have a subtract operation and so on. Notice that the bit 5 and bit 4 of this field is the same for all these operations so as far as the ALU control is concerned, these bits are don’t care. Now recall from your ALU homework, the ALUctr signals has the following meaning (point to the table): 000 means Add, 001 means subtract, ... etc. Based on these three tables (point to the last row of the top table and then the two other tables) and the fact that bit 5 and bit 4 of the “func” field are don’t care, we can derive the following truth table for ALUctr. +2 = 48 min. (Y:28) func<5:0> Instruction Operation add subtract and or set-on-less-than ALUctr<2:0> ALU Operation 000 001 100 101 010 Add Subtract And Or ALUctr ALU
7
The Truth Table for ALUctr
R-type Instructions determined by funct Non-R-type Instructions determined by ALUop funct<3:0> Instruction Op. 0000 add R-type ori lw sw beq ALUop (Symbolic) “R-type” Or Add Subtr ALUop<2:0> 1 00 0 10 0 00 0 x1 0010 subtract 0100 and 0101 or 1010 set-on-less-than ALUop func bit2 bit1 bit0 bit<2> bit<1> bit<0> bit<3> x ALUctr ALU Operation Add 1 Subtract Or And That is, whenever ALUop is 000, we don’t care anything about the func field because we know we need the ALU to do an ADD operation (point to Add column). Whenever the ALUop bit<2> is 0 and bit<0> is 1, we know we want the ALU to perform a Subtract regardless of what func field is. Bit<1> is a don’t care because for our encoding here, ALUop<1> will never be equal to 1 whenever bit<0> is 1 and bit<2> is 0. Similarly, whenever ALUop bit<2> is 0 and bit<1> is 1, we need the ALU to perform Or. The tricky part occurs when the ALUOp bit<2> equals to 1. In that case, we have a R-type instruction and we need to look at the Func field. In any case, once we have this Symbolic column, we can get this actual bit columns by referring to our ALU table on the last slide (use the last table of last slide if time permits). +2 = 30 min. (Y:30)
8
The Logic Equation for ALUctr<0>
Choose the rows with ALUctr[0]=1 ALUop func bit<2> bit<1> bit<0> bit<3> ALUctr<0> x 1 This makes func<3> a don’t care ALUctr<0> = !ALUop<2> & ALUop<0> + ALUop<2> & !func<2> & func<1> & !func<0> From the truth table we had before the break, we can derive the logic equation for ALUctr bit 2 by collecting all the rows that has ALUCtr bit 2 equals to 1 and this table is the result. Each row becomes a product term and we need to OR the product terms together. Notice that the last row are identical except the bit<3> of the func fields. One is zero and the other is one. Together, they make bit<3> a don’t care term. With all these don’t care terms, the logic equation is rather simple. The first product term is: not ALUOp<2> and ALUOp<0>. The second product term, after we making Func<3> a don’t care becomes ... +2 = 57 min. (Y:37)
9
The Logic Equation for ALUctr<1>
Choose the rows with ALUctr[1]=1 ALUop func bit<2> bit<1> bit<0> bit<3> bit<2> bit<1> bit<0> ALUctr<1> 1 x x x x 1 1 x x 1 1 1 x x 1 1 1 Here is the truth table when we collect all the rows where ALCctr bit<1> equals to 1. Once again, we can simplify the table by noticing that the first two rows are different only at the ALUop bit<0> position. We can make ALUop bit<0> into a don’t care. Similarly, the last three rows can be combined to make Func bit<3> and bit<1> into don’t cares. I cannot understand it. Consequently, the logic equation for ALUctr bit<1> becomes ... +2 = 59 min. (Y:39) ALUctr<1> = !ALUop<2> & ALUop<1> & ! ALUop<0> + ALUop<2> & !func<3> & func<2> & !func<1>
10
The Logic Equation for ALUctr<2>
Choose the rows with ALUctr[2]=1 ALUop func bit<2> bit<1> bit<0> bit<3> bit<2> bit<1> bit<0> ALUctr<2> 1 x x x x 1 1 x x 1 1 1 Finally, after we gather all the rows where ALUctr bit 0 are 1’s, we have this truth table. Well, we are out of luck here. I don’t see any simple way to simplify these product terms by just looking at them. There may be some if you draw out the 7 dimension K map but I am not going to try it. So I just write down the logic equations as it is. +2 = 61 min. (Y:41) ALUctr<2> = !ALUop<2> & ALUop<1> & !ALUop<0> + ALUop<2> & !func<3> & func<2> & !func<1> & func<0>
11
Summery of Control Logic of Local Control
ALU Control (Local) func 3 6 ALUop ALUctr ALUctr<0> = !ALUop<2> & ALUop<0> + ALUop<2> & !func<2> & func<1> & !func<0> ALUctr<1> = !ALUop<2> & ALUop<1> & !ALUop<0> + ALUop<2> & !func<3> & func<2> & !func<1> ALUctr<2> = !ALUop<2> & ALUop<1> & !ALUop<0> + ALUop<2> & !func<3> & func<2> & !func<1> & func<0> With all the logic equations available, you should be able to implement this logic block without any problem. +1 = 62 min. (Y:42)
12
The “Truth Table” for the Main Control
op 6 ALU (Local) func 3 ALUop ALUctr RegDst ALUSrc : Output of Main Control Input of main control op R-type ori lw sw beq jump RegDst 1 x x x ALUSrc 1 1 1 x MemtoReg 1 x x x RegWrite 1 1 1 MemWrite 1 Now that we have taken care of the Local Control (ALU Control), let’s refocus our attention to the Mian Controller. The job of the Main Control is to look at the Opcode field of the instruction and generate these control signals for the datapath (RegDst, ... ExtOp) as well as the 3-bit ALUop field for the ALU Control. Here, I have shown you the symbolic value of the ALUop field as well as the actual bit assignment. For example here (2nd column), the R-type ALUop is encode as 100 and the Add operation (3rd column) is encoded as 000.. This is call a quote “Truth Table” unquote because if you think about it, this is like having the truth table rotates 90 degrees. Let me show you what I mean by that. +3 = 65 min. (Y:45) Branch 1 Jump 1 ExtOp x 1 1 x x ALUop (Symbolic) “R-type” Or Add Add Subtr xxx ALUop <2> 1 x ALUop <1> x 1 x x ALUop <0> x 1 x
13
The “Truth Table” for RegWrite
op R-type ori lw sw beq jump RegWrite 1 1 1 RegWrite = R-type + ori + lw = !op<5> & !op<4> & !op<3> & !op<2> & !op<1> & !op<0> (R-type) + !op<5> & !op<4> & op<3> & op<2> & !op<1> & op<0> (ori) + op<5> & !op<4> & !op<3> & !op<2> & op<1> & op<0> (lw) op<0> op<5> . <0> R-type ori lw sw beq jump Decoder For example, consider the control signal RegWrite. If we treat all the don’t cares as zeros, this row here means RegDest has to be equal to one whenever we have a R-type, or an OR immediate, or a load instruction. Since we can determine whether we have any of these instructions (point to the column headers) by looking at the bits in the “OP” field, we can transform this symbolic equation to this binary logic equation. For example, the first product term here say we have a R-type instruction whenever all the bits in the “OP” field are zeros. So each of these big AND gates implements one of the columns (R-type, ori, ...) in our table. Or in more technical terms, each AND gate implements a product term. In order to finish implementing this logic equation, we have to OR the proper terms together. In the case of the RegWrite signal, we need to OR the R-type, ORi, and load terms together. +2 = 67 min. (Y:47) RegWrite
14
The “Truth Table” for RegWrite
Decoder . . . . . . op<5> . op<5> . op<5> . op<5> . op<5> . op<5> . <0> <0> <0> <0> <0> op<0> R-type ori lw sw beq jump RegWrite ALUSrc RegDst MemtoReg Similarly, for ALUSrc, we need to OR the ori, load, and store terms together because we need to assert the ALUSrc signals whenever we have the Ori, load, or store instructions. The RegDst, MemtoReg, MemWrite, Branch, and Jump signals are very simple. They don’t need to OR any product terms together because each is asserted for only one instruction. For example, RegDst is asserted ONLY for R-type instruction and MemtoReg is asserted ONLY for load instruction. ExtOp, on the other hand, needs to be set to 1 for both the load and store instructions so the immediate field is sign extended properly. Therefore, we need to OR the load and store terms together to form the signal ExtOp. Finally, we have the ALUop signals. But clever encoding of the ALUop field, we are able to keep them simple so that no OR gates is needed. If you don’t already know, this regular structure with an array of AND gates followed by another array of OR gates is called a Programmable Logic Array, or PLA for short. It is one of the most common ways to implement logic function and there are a lot of CAD tools available to simplify them. +3 = 70 min. (Y:50) MemWrite Branch Jump ExtOp ALUop<2> ALUop<1> ALUop<0>
15
The Complete Single Cycle Data Path with Control
ALUop ALU Control ALUctr 3 RegDst func op Main Control 3 Instr<5:0> 6 ALUSrc 6 : Instr<31:26> Branch Instruction<31:0> Instruction Fetch Unit Jump Rd Rt <21:25> <16:20> <11:15> <0:15> RegDst Clk 1 Mux Rs Rt Rt Rs Rd Imm16 RegWr ALUctr 5 5 5 MemtoReg busA Zero MemWr Rw Ra Rb busW 32 32 32-bit Registers ALU OK, now that we have the Main Control implemented, we have everything we needed for the single cycle processor and here it is. The Instruction Fetch Unit gives us the instruction. The OP field is fed to the Main Control for decode and the Func field is fed to the ALU Control for local decoding. The Rt, Rs, Rd, and Imm16 fields of the instruction are fed to the data path. Based on the OP field of the instruction, the Main Control will set the control signals RegDst, ALUSrc, .... etc properly as I showed you earlier using separate slides. Furthermore, the ALUctr use the ALUop from the Main control and the func field of the instruction to generate the ALUctr signals to ask the ALU to do the right thing: Add, Subtract, Or, and so on. This processor will execute each of the MIPS instruction in the subset in one cycle. There is, however, a couple of subtle differences between this single-cycle processor and a real MIPS processor in terms of instruction execution. +2 = 72 min (Y:52) 32 busB 32 32 Mux Clk Mux 32 32 WrEn Adr 1 1 Data In imm16 Ext Data Memory 32 16 Instr<15:0> Clk ALUSrc ExtOp
16
Time Delay for LW: Critical Path
Clk PC PC Clk-to-Q PC Old Value New Value PC+4 Instruction Memory Access Time Rs, Rt, Rd, Op, Func Old Value New Value Delay through Control Logic ALUctr Old Value New Value ExtOp Old Value New Value ALUSrc Old Value New Value MemtoReg Old Value New Value Register Write Occurs RegWr Old Value New Value This timing diagram shows the worst case timing of our single cycle datapath which occurs at the load instruction. Clock to Q time after the clock tick, PC will present its new value to the Instruction memory. After a delay of instruction access time, the instruction bus (Rs, Rt, ...) becomes valid. Then three things happens in parallel: (a) First the Control generates the control signals (Delay through Control Logic). (b) Secondly, the register file access is to put Rs onto busA. (c) And we have to sign extended the immediate field to get the second operand (busB). Here I assume register file access takes longer time than doing the sign extension so we have to wait until busA valid before the ALU can start the address calculation (ALU delay). With the address ready, we access the data memory and after a delay of the Data Memory Access time, busW will be valid. And by this time, the control unit would have set the RegWr signal to one so at the next clock tick, we will write the new data coming from memory (busW) into the register file. +3 = 77 min. (Y:57) Register File Access Time busA Old Value New Value Delay through Extender & Mux busB Old Value New Value ALU Delay Address Old Value New Value Data Memory Access Time busW Old Value New
17
Performance of Single-Cycle Machine
Assume that the operation times for the major functional units in this implementation are the following Memory units: 200 picoseconds ALU and adders: 100 ps Register file (read or write): 50 ps Assuming that the multiplexors, control unit, PC accesses, sign extension unit, and wires have no delay, which of the following implementations would be faster and by how much? An implementation in which every instruction operates in 1 clock cycle of a fixed length. An implementation where every instruction executes in 1 clock cycle using a variable-length clock. To compare the performance, assume the following instruction mix: 25% loads, 10% stores, 45% ALU instructions, 15 % branches, and 5% jumps.
18
What’s wrong with our CPI=1 processor?
Arithmetic & Logical PC Inst Memory Reg File ALU mux mux setup Load PC Inst Memory Reg File ALU Data Mem mux mux setup Critical Path Store PC Inst Memory Reg File ALU Data Mem mux Branch PC Inst Memory Reg File cmp mux
19
Instruction Critical Paths
Calculate cycle time assuming negligible delays (for muxes, control unit, sign extend, PC access, shift left 2, wires, setup and hold times) except: Instruction and Data Memory (200 ps) ALU and adders (100 ps) Register File access (reads or writes) (50 ps) Instr. I Mem Reg Rd ALU Op D Mem Reg Wr Total R-type load store beq jump 200 50 100 400 200 50 100 600 For lecture Note that PC is updated during I Mem read 200 50 100 550 200 50 100 350 200
20
Comparison of clock cycles for two implementation
An implementation in which every instruction operates in 1 clock cycle of a fixed length. CPU clock cycle=600 ps (longest instruction) An implementation where every instruction executes in 1 clock cycle using a variable-length clock. CPU clock cycle=600*25%+550*10%+400*45%+350*15%+200*5% =447.5 ps
21
Single Cycle Disadvantages & Advantages
Uses the clock cycle inefficiently – the clock cycle must be timed to accommodate the slowest instruction especially problematic for more complex instructions like floating point multiply May be wasteful of area since some functional units (e.g., adders) must be duplicated since they can not be shared during a clock cycle but Is simple and easy to understand Clk lw sw Waste Cycle 1 Cycle 2 In the Single Cycle implementation, the cycle time is set to accommodate the longest instruction, the Load instruction. Since the cycle time has to be long enough for the load instruction, it is too long for the store instruction so the last part of the cycle here is wasted.
22
Multicycle Implementation Overview
Each instruction step takes 1 clock cycle Therefore, an instruction takes more than 1 clock cycle to complete Not every instruction takes the same number of clock cycles to complete Multicycle implementations allow faster clock rates different instructions to take a different number of clock cycles functional units to be used more than once per instruction as long as they are used on different clock cycles, as a result only need one memory only need one ALU/adder
23
The Multicycle Datapath – A High Level View
Registers have to be added after every major functional unit to hold the output value until it is used in a subsequent clock cycle IR MDR A B Address Memory Read Addr 1 PC Read Data 1 Register File Read Addr 2 ALUout Read Data (Instr. or Data) ALU Write Addr Write Data Read Data 2 Write Data Note extra registers added – IR, MDR, A, B, and ALUout
24
Clocking the Multicycle Datapath
System Clock MemWrite RegWrite clock cycle Address Read Data (Instr. or Data) Memory PC Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Data 2 ALU IR MDR A B ALUout Note extra registers added – IR, MDR, A, B, and ALUout
25
Our Multicycle Approach
Break up the instructions into steps where each step takes a clock cycle while trying to balance the amount of work to be done in each step use only one major functional unit per clock cycle At the end of a clock cycle Store values needed in a later clock cycle by the current instruction in a state element (internal register not visible to the programmer) IR – Instruction Register MDR – Memory Data Register A and B – Register File read data registers ALUout – ALU output register All (except IR) hold data only between a pair of adjacent clock cycles (so they don’t need a write control signal) Data used by subsequent instructions are stored in programmer visible state elements (i.e., Register File, PC, or Memory)
26
The Complete Multicycle Data with Control
PCWriteCond PCWrite PCSource IorD ALUOp MemRead Control ALUSrcB MemWrite ALUSrcA MemtoReg RegWrite IRWrite RegDst PC[31-28] Instr[31-26] Shift left 2 28 Instr[25-0] 2 1 Address Memory PC Read Addr 1 A IR Read Data 1 Register File 1 1 zero Read Addr 2 Read Data (Instr. or Data) ALUout ALU Write Addr Write Data 1 Read Data 2 For class handout B MDR 1 Write Data 4 1 2 Instr[15-0] Sign Extend Shift left 2 3 32 ALU control Instr[5-0]
27
Our Multicycle Approach, con’t
Reading from or writing to any of the internal registers, Register File, or the PC occurs (quickly) at the beginning (for read) or the end of a clock cycle (for write) Reading from the Register File takes ~50% of a clock cycle since it has additional control and access overhead (but reading can be done in parallel with decode) Had to add multiplexors in front of several of the functional unit input ports (e.g., Memory, ALU) because they are now shared by different clock cycles and/or do multiple jobs All operations occurring in one clock cycle occur in parallel This limits us to one ALU operation, one Memory access, and one Register File access per clock cycle
28
Five Instruction Steps
Instruction Fetch Instruction Decode and Register Fetch R-type Instruction Execution, Memory Read/Write Address Computation, Branch Completion, or Jump Completion Memory Read Access, Memory Write Completion or R-type Instruction Completion Memory Read Completion (Write Back) INSTRUCTIONS TAKE FROM CYCLES!
29
RTL for Instructions Common Steps: Decode and Register reading
Instr fetch IR = Memory[PC]; PC Updating PC = PC + 4; Decode and Register reading A = Reg[IR[25-21]]; B = Reg[IR[20-16]]; Instruction Dependent operation
30
RTL for Instructions Instruction Dependent operation Execute
ALUOut = A op B; ALUOut = A + sign-extend (IR[15-0]); if (A==B) PC = ALUOut; ALUOut = PC +(sign-extend(IR[15-0])<< 2); PC = PC[31-28] ||(IR[25-0] << 2); Memory access Reg[IR[15-11]] = ALUOut; MDR = Memory[ALUOut]; or Memory[ALUOut] = B; Write-back Reg[IR[20-16]] = MDR;
31
Group Discussion 1 Four-six members form a group and discuss the following questions related to multi-cycle datapath? 1. What are the task for each step?(balance the load of each step to minimize the number of clock cycles) Recommend one member to present your results?
32
Multi-cycle datapath 2 1 1 1 1 1 4 1 2 3 PCWriteCond PCWrite PCSource
IorD ALUOp MemRead Control ALUSrcB MemWrite ALUSrcA MemtoReg RegWrite IRWrite RegDst PC[31-28] Instr[31-26] Shift left 2 28 Instr[25-0] 2 1 Address Memory PC Read Addr 1 A IR Read Data 1 Register File 1 1 zero Read Addr 2 Read Data (Instr. or Data) ALUout ALU Write Addr Write Data 1 Read Data 2 For class handout B MDR 1 Write Data 4 1 2 Instr[15-0] Sign Extend Shift left 2 3 32 ALU control Instr[5-0]
33
RTL Summary Step R-type Mem Ref Branch Jump Instr fetch Decode Execute
IR = Memory[PC]; PC = PC + 4; Decode A = Reg[IR[25-21]]; B = Reg[IR[20-16]]; ALUOut = PC +(sign-extend(IR[15-0])<< 2); Execute ALUOut = A op B; ALUOut = A + sign-extend (IR[15-0]); if (A==B) PC = ALUOut; PC = PC[31-28] ||(IR[25-0] << 2); Memory access Reg[IR[15-11]] = ALUOut; MDR = Memory[ALUOut]; or Memory[ALUOut] = B; Write-back Reg[IR[20-16]] = MDR;
34
Group Discussion 2 Four-six members form a group and discuss the following questions related to multi-cycle datapath? 1. What are data paths and control signals for each step of each instruction? Recommend one member to present your results?
35
Multi-cycle datapath 2 1 1 1 1 1 4 1 2 3 PCWriteCond PCWrite PCSource
IorD ALUOp MemRead Control ALUSrcB MemWrite ALUSrcA MemtoReg RegWrite IRWrite RegDst PC[31-28] Instr[31-26] Shift left 2 28 Instr[25-0] 2 1 Address Memory PC Read Addr 1 A IR Read Data 1 Register File 1 1 zero Read Addr 2 Read Data (Instr. or Data) ALUout ALU Write Addr Write Data 1 Read Data 2 For class handout B MDR 1 Write Data 4 1 2 Instr[15-0] Sign Extend Shift left 2 3 32 ALU control Instr[5-0]
36
Step 1: Instruction Fetch
Use PC to get instruction from the memory and put it in the Instruction Register Increment the PC by 4 and put the result back in the PC Can be described succinctly using the RTL "Register-Transfer Language“ IR = Memory[PC]; PC = PC + 4; Note that the incremented PC is also stored in the ALUout register, but this action is benign Can we figure out the values of the control signals? What is the advantage of updating the PC now?
37
Datapath Activity During Instruction Fetch
PCWriteCond PCWrite PCSource IorD ALUOp MemRead Control ALUSrcB MemWrite ALUSrcA MemtoReg RegWrite IRWrite RegDst PC[31-28] Instr[31-26] Shift left 2 28 Instr[25-0] 2 1 Address Memory PC Read Addr 1 A IR Read Data 1 Register File 1 1 zero Read Addr 2 Read Data (Instr. or Data) ALUout ALU Write Addr Write Data 1 Read Data 2 For lecture B MDR 1 Write Data 4 1 2 00 Instr[15-0] Sign Extend Shift left 2 3 32 ALU control Instr[5-0]
38
Fetch Control Signals Settings
IorD=0 MemRead;IRWrite ALUSrcA=0 ALUsrcB=01 PCSource,ALUOp=00 PCWrite Instr Fetch Unless otherwise assigned PCWrite,IRWrite, MemWrite,RegWrite=0 others=X Start For lecture
39
Step 2: Instruction Decode and Register Fetch
Don’t know what the instruction is yet, so can only Read registers rs and rt in case we need them Compute the branch address in case the instruction is a branch The RTL: A = Reg[IR[25-21]]; B = Reg[IR[20-16]]; ALUOut = PC (sign-extend(IR[15-0])<< 2); Note we aren't setting any control lines based on the instruction (since we don’t know what it is (the control logic is busy "decoding" the op code bits))
40
Datapath Activity During Instruction Decode
PCWriteCond PCWrite PCSource IorD ALUOp MemRead Control ALUSrcB MemWrite ALUSrcA MemtoReg RegWrite IRWrite RegDst PC[31-28] Instr[31-26] Shift left 2 28 Instr[25-0] 2 1 Address Memory PC Read Addr 1 A IR Read Data 1 Register File 1 1 zero Read Addr 2 Read Data (Instr. or Data) ALUout ALU Write Addr Write Data 1 Read Data 2 For lecture B MDR 1 Write Data 4 1 2 00 Instr[15-0] Sign Extend Shift left 2 3 32 ALU control Instr[5-0]
41
Decode Control Signals Settings
IorD=0 MemRead;IRWrite ALUSrcA=0 ALUsrcB=01 PCSource,ALUOp=00 PCWrite Instr Fetch Unless otherwise assigned PCWrite,IRWrite, MemWrite,RegWrite=0 others=X ALUSrcA=0 ALUSrcB=11 ALUOp=00 PCWriteCond=0 Start For lecture
42
Step 3 (instruction dependent)
ALU is performing one of four functions, based on instruction type Memory reference (lw and sw): ALUOut = A + sign-extend(IR[15-0]); R-type: ALUOut = A op B; Branch: if (A==B) PC = ALUOut; Jump: PC = PC[31-28] || (IR[25-0] << 2);
43
Datapath Activity During lw & sw Execute
PCWriteCond PCWrite PCSource IorD ALUOp MemRead Control ALUSrcB MemWrite ALUSrcA MemtoReg RegWrite IRWrite RegDst PC[31-28] Instr[31-26] Shift left 2 28 Instr[25-0] 2 1 Address Memory PC Read Addr 1 A IR Read Data 1 Register File 1 1 zero Read Addr 2 Read Data (Instr. or Data) ALUout ALU Write Addr Write Data 1 Read Data 2 For lecture B MDR 1 Write Data 4 1 2 00 Instr[15-0] Sign Extend Shift left 2 3 32 ALU control Instr[5-0]
44
Datapath Activity During R-type Execute
PCWriteCond PCWrite PCSource IorD ALUOp MemRead Control ALUSrcB MemWrite ALUSrcA MemtoReg RegWrite IRWrite RegDst PC[31-28] Instr[31-26] Shift left 2 28 Instr[25-0] 2 1 Address Memory PC Read Addr 1 A IR Read Data 1 Register File 1 1 zero Read Addr 2 Read Data (Instr. or Data) ALUout ALU Write Addr Write Data 1 Read Data 2 For lecture B MDR 1 Write Data 4 1 2 10 Instr[15-0] Sign Extend Shift left 2 3 32 ALU control Instr[5-0]
45
Datapath Activity During beq Execute
PCWriteCond PCWrite PCSource IorD ALUOp MemRead Control ALUSrcB MemWrite ALUSrcA MemtoReg RegWrite IRWrite RegDst PC[31-28] Instr[31-26] Shift left 2 28 Instr[25-0] 2 1 Address Memory PC Read Addr 1 A IR Read Data 1 Register File 1 1 zero Read Addr 2 Read Data (Instr. or Data) ALUout ALU Write Addr Write Data 1 Read Data 2 For lecture B MDR 1 Write Data 4 1 2 01 Instr[15-0] Sign Extend Shift left 2 3 32 ALU control Instr[5-0]
46
Datapath Activity During j Execute
PCWriteCond PCWrite PCSource IorD ALUOp MemRead Control ALUSrcB MemWrite ALUSrcA MemtoReg RegWrite IRWrite RegDst PC[31-28] Instr[31-26] Shift left 2 28 Instr[25-0] 2 1 Address Memory PC Read Addr 1 A IR Read Data 1 Register File 1 1 zero Read Addr 2 Read Data (Instr. or Data) ALUout ALU Write Addr Write Data 1 Read Data 2 For lecture B MDR 1 Write Data 4 1 2 Instr[15-0] Sign Extend Shift left 2 3 32 ALU control Instr[5-0]
47
Execute Control Signals Settings
Decode IorD=0 MemRead;IRWrite ALUSrcA=0 ALUsrcB=01 PCSource,ALUOp=00 PCWrite Instr Fetch Unless otherwise assigned PCWrite,IRWrite, MemWrite,RegWrite=0 others=X ALUSrcA=0 ALUSrcB=11 ALUOp=00 PCWriteCond=0 Start (Op = R-type) (Op = beq) (Op = lw or sw) (Op = j) ALUSrcA=1 ALUSrcB=10 ALUOp=00 PCWriteCond=0 ALUSrcA=1 ALUSrcB=00 ALUOp=01 PCSource=01 PCWriteCond ALUSrcA=1 ALUSrcB=00 ALUOp=10 PCWriteCond=0 PCSource=10 PCWrite Execute For lecture
48
Step 4 (also instruction dependent)
Memory reference: MDR = Memory[ALUOut]; lw or Memory[ALUOut] = B; sw R-type instruction completion Reg[IR[15-11]] = ALUOut; Remember, the register write actually takes place at the end of the cycle on the clock edge
49
Datapath Activity During lw Memory Access
PCWriteCond PCWrite PCSource IorD ALUOp MemRead Control ALUSrcB MemWrite ALUSrcA MemtoReg RegWrite IRWrite RegDst PC[31-28] Instr[31-26] Shift left 2 28 Instr[25-0] 2 1 Address Memory PC Read Addr 1 A IR Read Data 1 Register File 1 1 zero Read Addr 2 Read Data (Instr. or Data) ALUout ALU Write Addr Write Data 1 Read Data 2 For lecture B 1 Write Data 4 MDR 1 2 Instr[15-0] Sign Extend Shift left 2 3 32 ALU control Instr[5-0]
50
Datapath Activity During sw Memory Access
PCWriteCond PCWrite PCSource IorD ALUOp MemRead Control ALUSrcB MemWrite ALUSrcA MemtoReg RegWrite IRWrite RegDst PC[31-28] Instr[31-26] Shift left 2 28 Instr[25-0] 2 1 Address Memory PC Read Addr 1 A IR Read Data 1 Register File 1 1 zero Read Addr 2 Read Data (Instr. or Data) ALUout ALU Write Addr Write Data 1 Read Data 2 For lecture B 1 Write Data 4 MDR 1 2 Instr[15-0] Sign Extend Shift left 2 3 32 ALU control Instr[5-0]
51
Datapath Activity During R-type Completion
PCWriteCond PCWrite PCSource IorD ALUOp MemRead Control ALUSrcB MemWrite ALUSrcA MemtoReg RegWrite IRWrite RegDst PC[31-28] Instr[31-26] Shift left 2 28 Instr[25-0] 2 1 Address Memory PC Read Addr 1 A IR Read Data 1 Register File 1 1 zero Read Addr 2 Read Data (Instr. or Data) ALUout ALU Write Addr Write Data 1 Read Data 2 For lecture B 1 Write Data 4 MDR 1 2 Instr[15-0] Sign Extend Shift left 2 3 32 ALU control Instr[5-0]
52
Memory Access Control Signals Settings
Decode IorD=0 MemRead;IRWrite ALUSrcA=0 ALUsrcB=01 PCSource,ALUOp=00 PCWrite Instr Fetch Unless otherwise assigned PCWrite,IRWrite, MemWrite,RegWrite=0 others=X ALUSrcA=0 ALUSrcB=11 ALUOp=00 PCWriteCond=0 Start (Op = R-type) (Op = beq) (Op = lw or sw) (Op = j) ALUSrcA=1 ALUSrcB=10 ALUOp=00 PCWriteCond=0 ALUSrcA=1 ALUSrcB=00 ALUOp=01 PCSource=01 PCWriteCond ALUSrcA=1 ALUSrcB=00 ALUOp=10 PCWriteCond=0 PCSource=10 PCWrite Execute (Op = lw) (Op = sw) Memory Access RegDst=1 RegWrite MemtoReg=0 PCWriteCond=0 MemRead IorD=1 PCWriteCond=0 MemWrite IorD=1 PCWriteCond=0 For lecture
53
Step 5: Memory Read Completion (Write Back)
All we have left is the write back into the register file the data just read from memory for lw instruction Reg[IR[20-16]]= MDR; What about all the other instructions?
54
Datapath Activity During lw Write Back
PCWriteCond PCWrite PCSource IorD ALUOp MemRead Control ALUSrcB MemWrite ALUSrcA MemtoReg RegWrite IRWrite RegDst PC[31-28] Instr[31-26] Shift left 2 28 Instr[25-0] 2 1 Address Memory PC Read Addr 1 A IR Read Data 1 Register File 1 1 zero Read Addr 2 Read Data (Instr. or Data) ALUout ALU Write Addr Write Data 1 Read Data 2 For lecture B MDR 1 Write Data 4 1 2 Instr[15-0] Sign Extend Shift left 2 3 32 ALU control Instr[5-0]
55
Write Back Control Signals Settings
Decode IorD=0 MemRead;IRWrite ALUSrcA=0 ALUsrcB=01 PCSource,ALUOp=00 PCWrite Instr Fetch Unless otherwise assigned PCWrite,IRWrite, MemWrite,RegWrite=0 others=X ALUSrcA=0 ALUSrcB=11 ALUOp=00 PCWriteCond=0 Start (Op = R-type) (Op = beq) (Op = lw or sw) (Op = j) ALUSrcA=1 ALUSrcB=10 ALUOp=00 PCWriteCond=0 ALUSrcA=1 ALUSrcB=00 ALUOp=01 PCSource=01 PCWriteCond ALUSrcA=1 ALUSrcB=00 ALUOp=10 PCWriteCond=0 PCSource=10 PCWrite Execute (Op = lw) (Op = sw) Memory Access RegDst=1 RegWrite MemtoReg=0 PCWriteCond=0 MemRead IorD=1 PCWriteCond=0 MemWrite IorD=1 PCWriteCond=0 For lecture RegDst=0 RegWrite MemtoReg=1 PCWriteCond=0 Write Back
56
RTL Summary Step R-type Mem Ref Branch Jump Instr fetch Decode Execute
IR = Memory[PC]; PC = PC + 4; Decode A = Reg[IR[25-21]]; B = Reg[IR[20-16]]; ALUOut = PC +(sign-extend(IR[15-0])<< 2); Execute ALUOut = A op B; ALUOut = A + sign-extend (IR[15-0]); if (A==B) PC = ALUOut; PC = PC[31-28] ||(IR[25-0] << 2); Memory access Reg[IR[15-11]] = ALUOut; MDR = Memory[ALUOut]; or Memory[ALUOut] = B; Write-back Reg[IR[20-16]] = MDR;
57
Example Using the following instruction mix, what is the CPI, assuming that each state in the multicycle CPU requires 1 clock cycle? Load 25% store 10% branches 11% jumps 2% ALU 52%
58
Answer: Loads:5 Stores:4 ALU: 4 Branches:3 Jumps:3
CPI=0.25*5+0.10*4+0.52*4+0.11*3+0.02*3=4.12
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.