Winter 2017 S. Areibi School of Engineering University of Guelph

1 ENG3380 Computer Organization and Architecture
“MIPS: Data Path Design Part 1”
Winter 2017, S. Areibi, School of Engineering, University of Guelph

2 Topics
Introduction
Data Path Design
Register File
Memory
Arithmetic Logic Unit
MIPS ALU
Summary
With thanks to W. Stallings, Hamacher, J. Hennessy, and M. J. Irwin for lecture slide contents. Many slides are adapted from the PPT slides accompanying the textbook and the CSE331 course.

3 References
“Computer Organization and Architecture: Designing for Performance”, 10th edition, by William Stallings, Pearson.
“Computer Organization and Design: The Hardware/Software Interface”, 4th edition, by D. Patterson and J. Hennessy, Morgan Kaufmann.
“Computer Organization and Architecture: Themes and Variations”, 2014, by Alan Clements, Cengage Learning.

4 Introduction

5 Processor’s building blocks
The PC provides the instruction address.
The instruction is fetched into the IR.
The instruction address generator updates the PC.
Control circuitry interprets the instruction and generates the control signals needed to perform the required actions.
The register file stores the operands that are manipulated by the ALU.
The ALU performs arithmetic and logic operations, and more.

6 Parts of CPU
Datapath: registers, multiplexers, adders, subtractors, and the combinational logic that performs operations on them.
Control unit: generates signals to control the datapath and accepts status signals to perform sequencing.
[Figure: control unit and data path]

7 Datapaths
Guiding principles for basic datapaths:
The set of registers
  Collection of individual registers
  A set of registers with common access resources, called a register file
  A combination of the above
Microoperation implementation
  One or more shared resources for implementing microoperations
  Buses: shared transfer paths
  Arithmetic-Logic Unit (ALU): shared resource for implementing arithmetic and logic microoperations
  Shifter: shared resource for implementing shift microoperations

8 Recall
A simple bus-based data path: four registers, an ALU, and a shifter.
Each register is connected to two multiplexers that form ALU input buses A and B (the register file).
Another mux chooses between the registers and a constant.
Functional unit: an ALU and a shifter.
Another mux chooses between the functional unit and external data (memory).

9 Register File

10 Register File
A simple register file: four registers.
Each register is connected to two multiplexers that form ALU input buses A and B.

11 Hardware components: Register file
A 2-port register file is needed to read the two source registers at the same time. It may be implemented using a 2-port memory.

12 Alternative implementation of 2-port register file
Using two single-ported memory blocks.

13 A conceptual view – computational instructions
Both source operands and the destination location are in the register file.
[RA] and [RB] denote the values of the registers identified by addresses A and B.
new [RC] denotes the result that is stored to the register identified by address C.
[Figure: register file with outputs [RA] and [RB] and input new [RC]]

14 A conceptual view – immediate instructions
One of the source operands is the immediate value in the IR.
[Figure: register file with output [RA] and input new [RC]]

15 Behavioral Description of a Register File
[Figure: register file of 32 words x 32 bits with inputs src1_addr, src2_addr, dst_addr, write_data, write_cntrl and outputs src1_data, src2_data]
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;

entity regfile is
  port(write_data: in std_logic_vector(31 downto 0);
       dst_addr, src1_addr, src2_addr: in UNSIGNED(4 downto 0);
       write_cntrl: in std_logic;
       src1_data, src2_data: out std_logic_vector(31 downto 0));
end entity regfile;

16 Behavioral Description of a Register File, con’t
architecture process_behavior of regfile is
  type reg_array is array(0 to 31) of std_logic_vector(31 downto 0);
begin
  regfile_process: process(src1_addr, src2_addr, write_cntrl)
    -- Initial register contents; the hex values did not survive on the slide.
    variable data_array: reg_array := ( (X" "), . . . (X" ") );
    variable addrofsrc1, addrofsrc2, addrofdst: integer;
  begin
    -- Convert the 5-bit addresses to array indices.
    addrofsrc1 := conv_integer(src1_addr);
    addrofsrc2 := conv_integer(src2_addr);
    addrofdst := conv_integer(dst_addr);
    -- Write first, so a read of the same register sees the new value.
    if write_cntrl = '1' then
      data_array(addrofdst) := write_data;
    end if;
    src1_data <= data_array(addrofsrc1) after 10 ns;
    src2_data <= data_array(addrofsrc2) after 10 ns;
  end process regfile_process;
end architecture process_behavior;

17 VHDL Implementation
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity register_file is
  port (
    DataOut1    : out std_logic_vector(15 downto 0);
    DataOut2    : out std_logic_vector(15 downto 0);
    DataIn      : in std_logic_vector(15 downto 0);
    writeEnable : in std_logic;
    ReadAddr1   : in std_logic_vector(3 downto 0);
    ReadAddr2   : in std_logic_vector(3 downto 0);
    WriteAddr   : in std_logic_vector(3 downto 0);
    Clk         : in std_logic
  );
end register_file;

18 VHDL Implementation
architecture behavioral of register_file is
  type registerFile is array(0 to 15) of std_logic_vector(15 downto 0);
  signal registers : registerFile;
begin
  regFile : process (clk) is
  begin
    if rising_edge(clk) then
      -- Read A and B before bypass
      DataOut1 <= registers(to_integer(unsigned(ReadAddr1)));
      DataOut2 <= registers(to_integer(unsigned(ReadAddr2)));
      -- Write and bypass
      if WriteEnable = '1' then
        registers(to_integer(unsigned(WriteAddr))) <= DataIn; -- Write
        if ReadAddr1 = WriteAddr then -- Bypass for read A
          DataOut1 <= DataIn;
        end if;
        if ReadAddr2 = WriteAddr then -- Bypass for read B
          DataOut2 <= DataIn;
        end if;
      end if;
    end if;
  end process;
end behavioral;
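The read, write, and bypass behavior of the register file above can be mimicked in software. The following is a minimal Python sketch, not the course's VHDL; the class name `RegisterFile` and the method `clock` are hypothetical names chosen for illustration.

```python
class RegisterFile:
    """Behavioral model (illustrative only): two read ports, one write port."""

    def __init__(self, num_regs=16, width=16):
        self.mask = (1 << width) - 1
        self.registers = [0] * num_regs
        self.data_out1 = 0
        self.data_out2 = 0

    def clock(self, read_addr1, read_addr2, write_addr, data_in, write_enable):
        """Model one rising clock edge and return the two read outputs."""
        # Reads sample the stored values before the write takes effect.
        out1 = self.registers[read_addr1]
        out2 = self.registers[read_addr2]
        if write_enable:
            data_in &= self.mask
            self.registers[write_addr] = data_in
            # Bypass: a read of the register being written sees the new data.
            if read_addr1 == write_addr:
                out1 = data_in
            if read_addr2 == write_addr:
                out2 = data_in
        self.data_out1, self.data_out2 = out1, out2
        return out1, out2
```

Without the bypass, a read of the register being written in the same cycle would return the stale value, which is what the two inner comparisons prevent.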

19 Memory

20 Memory and I/O
Control Unit + Data Path + Memory + Input/Output = Micro-computer System
[Figure: memory, input and output]

21 Behavioral Description of Memory
The VHDL code below implements a single-port RAM.
When you synthesize this design, XST uses Block RAM by default to implement the memory.
If you want the memory to be implemented using distributed RAM, add the following:
attribute ram_style: string;
attribute ram_style of ram : signal is "distributed";

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity ram_example is
  port (Clk     : in std_logic;
        address : in integer;
        we      : in std_logic;
        data_i  : in std_logic_vector(7 downto 0);
        data_o  : out std_logic_vector(7 downto 0)
       );
end ram_example;
[Figure: memory block with inputs data_i, we, address, Clk and output data_o]

22 Cont … Behavioral Description of Memory
architecture Behavioral of ram_example is
  -- Declaration of type and signal of a 256 element RAM
  -- with each element being 8 bit wide.
  type ram_t is array (0 to 255) of std_logic_vector(7 downto 0);
  signal ram : ram_t := (others => (others => '0'));
begin
  -- Process for read and write operation.
  PROCESS(Clk)
  BEGIN
    if (rising_edge(Clk)) then
      if (we = '1') then
        ram(address) <= data_i;
      end if;
      data_o <= ram(address);
    end if;
  END PROCESS;
end Behavioral;
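One detail of the RAM process above is easy to miss: because `ram` is a signal, `data_o` picks up the value stored before the write on the same clock edge. The following minimal Python sketch models that behavior; the class name `SinglePortRam` and the method `clock` are hypothetical names chosen for illustration.

```python
class SinglePortRam:
    """Behavioral model (illustrative only) of a synchronous single-port RAM."""

    def __init__(self, depth=256, width=8):
        self.mask = (1 << width) - 1
        self.ram = [0] * depth
        self.data_o = 0

    def clock(self, address, we, data_i):
        """Model one rising clock edge and return data_o."""
        # The read samples the old contents, as a VHDL signal read would.
        old_value = self.ram[address]
        if we:
            self.ram[address] = data_i & self.mask
        self.data_o = old_value
        return self.data_o
```

A write followed by a read of the same address therefore takes two clock edges before the new data appears on `data_o`.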

23 Busses

24 Bus-Based Transfers
A bus is a shared transfer path.
It is characterized by a set of common lines: (i) data, (ii) control, and (iii) status.
The control signals for the logic select a single source and one or more destinations on any clock cycle.
[Figure: sources SRC1 and SRC2 and destinations DEST1 and DEST2 on a shared bus]

25 Simple Case: using Muxes!
Signals S2, S1, S0 select the source.
Signals L0, L1, L2 enable loading of the registers.
The single bus (on the right): one mux, one output bus.
Capabilities?

26 Three-State Bus
Remember that three-state drivers allow multiple outputs to share a wire.
Note that the small inverted triangle denotes the three-state output of the register.
A bus can be constructed with three-state buffers.
Many three-state buffer outputs can be connected together to form a bit line of a bus.
This gives less delay than multiplexer-based systems.

27 Same Example with 3-State
Notice that both systems in the figure have the same capability in terms of transfers. However, the 3-state bus has:
Fewer wires
Easier expansion

28 Bus An example of an interconnection network.
When functional units are connected to a common bus, tri-state drivers are needed.

29 A 3-bus interconnection network

30 Memory Transfer
Point to an address in memory.
Read data from the memory and write it into a register (D2, D1, D0).

31 ALU Design

32 Arithmetic/Logic Unit (ALU)
The ALU is a combinational circuit that performs a set of basic arithmetic and logic operations. An adder can perform addition, subtraction, … Select lines are used to determine the operation to be performed.

33 ALU Design using Hierarchy
The ALU will have:
  2 control lines, S0 and S1, for operation selection
  1 control line, S2, to select logic versus arithmetic operations
Start designing in parts.

34 Single Stage ALU
Design a 1-bit arithmetic unit.
Design a 1-bit logic unit.
Combine the two units to form a 1-bit arithmetic/logic unit.
Replicate it as many times as needed to form an n-bit ALU.

35 Arithmetic Circuit
The basic component of an arithmetic circuit is an n-bit ripple carry adder (parallel adder).
By controlling the data inputs to the parallel adder, it is possible to obtain different types of arithmetic operations (Cin is also an input).
Select lines S0 and S1 can be used to control input Y. Why?

36 Looking Inside
What functionality can I achieve if I control the ‘Y’ value to the n-bit adder?
[Figure: B input logic producing Y from B and B’, with a table of the resulting functionality]
How to design the B input logic?

37 Design of B Select Logic
Use an 8-to-1 mux (the straightforward solution).
Or use a 4-to-1 mux!
Can we do better? YES: simplify the expression from the truth table using a K-map.

38 1-bit (Single Stage) Arithmetic Circuit
The B logic is nothing but a 2-to-1 Mux instead of the 4-to-1 Mux

39 4-Bit Arithmetic Circuit
Duplicating the one stage four times will produce a 4-bit circuit
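The idea of steering the adder's Y input with S1 and S0 can be sketched in software. The transcript does not preserve the function table, so the select encoding below (00 gives Y = 0, 01 gives Y = B, 10 gives Y = B', 11 gives Y = all ones) is an assumption borrowed from the common textbook form Y_i = B_i·S0 + B_i'·S1; the function name `arithmetic_unit` is hypothetical.

```python
def arithmetic_unit(a, b, s1, s0, cin, n=4):
    """Return (result, carry_out) of an n-bit adder computing A + Y + Cin.

    Assumed B-input logic (not taken from the slides): Y_i = B_i*S0 + B_i'*S1.
    """
    mask = (1 << n) - 1
    s0_mask = mask if s0 else 0
    s1_mask = mask if s1 else 0
    # Apply the per-bit B-input logic to all n bits at once.
    y = (b & s0_mask) | (~b & mask & s1_mask)
    total = (a & mask) + y + cin
    return total & mask, total >> n
```

Under that assumed encoding the three control bits give eight arithmetic functions: transfer A, A + 1, A + B, A + B + 1, A - B - 1, A - B, A - 1, and transfer A again.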

40 Logic Section Design Generous number of operations

41 Arithmetic/Logic Unit
The logic circuit can be combined with the arithmetic circuit to produce an ALU. Selection variables S1 and S0 can be common to both circuits. A third selection variable, S2, can be used to differentiate between the logic and arithmetic operations.

42 One Stage Arithmetic Circuit

43 One Stage Logic Circuit

44 One Stage ALU Mux to choose Arithmetic or Logic

45 n-bit ALU Duplicate the one stage n times!!

46 Resulting Control The one stage ALU can provide 8 arithmetic, and
4 logic operations.

47 How to extend the ALU to support MIPS ISA?
Need to support the set-on-less-than instruction (slt)
  Uses subtraction to determine if (a – b) < 0 (implies a < b)
Need to support test for equality (bne, beq)
  Again use subtraction: (a – b) = 0 implies a = b
Need to add overflow detection hardware
  Overflow detection enabled only for add, addi, sub
Immediates are sign extended outside the ALU with wiring (i.e., no logic needed)

48 MIPS Data Path

49 Arithmetic
Where we've been
  Abstractions
  Instruction Set Architecture (ISA)
  Assembly and machine language
What's up ahead
  Implementing the architecture (in VHDL)
[Figure: ALU with 32-bit inputs A and B, 4-bit input m (operation), 32-bit output result, and 1-bit outputs zero and ovf]

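50 ALU VHDL Representation
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all; -- provides "+" on unsigned

entity ALU is
  port(A, B: in std_logic_vector(31 downto 0);
       m: in std_logic_vector(3 downto 0);
       result: out std_logic_vector(31 downto 0);
       zero: out std_logic;
       ovf: out std_logic);
end entity ALU;

architecture process_behavior of ALU is
  . . .
begin
  ALU: process(A, B, m)
  begin
    -- result is a port (signal), so it is assigned with <=
    result <= std_logic_vector(unsigned(A) + unsigned(B));
  end process ALU;
end architecture process_behavior;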

51 Design the MIPS Arithmetic Logic Unit (ALU)
[Figure: ALU with 32-bit inputs A and B, 4-bit input m (operation), 32-bit output result, and 1-bit outputs zero and ovf]
Must support the arithmetic/logic operations of the ISA:
  add, addi, addiu, addu
  sub, subu
  mult, multu, div, divu
  sqrt
  and, andi, nor, or, ori, xor, xori
  beq, bne, slt, slti, sltiu, sltu
With special handling for:
  sign extend – addi, addiu, slti, sltiu
  zero extend – andi, ori, xori
  overflow detection – add, addi, sub
Tradeoffs of cost and speed are based on frequency of occurrence and hardware budget.

52 MIPS Arithmetic and Logic Instructions
Field boundaries (bit positions): 31, 25, 20, 15, 5
R-type: op | Rs | Rt | Rd | funct
I-type: op | Rs | Rt | Immed 16
I-type instructions (funct: xx): ADDI, ADDIU, SLTI, SLTIU, ANDI, ORI, XORI, LUI
R-type instructions: ADD, ADDU, SUB, SUBU, AND, OR, XOR, NOR, SLT, SLTU
funct = m (b3, b2, b1, b0), where
  b0 tells whether it's signed (0) or not (1), i.e., whether overflow is activated;
  b1 tells whether it's an add (0) or subtract (1) operation;
  b2 tells whether it's a logic operation (1) or an arithmetic operation (0);
  b3 tells whether it's an immediate operation (1) or not (0), except for slt.

53 Design Trick: Divide & Conquer
Break the problem into simpler problems, solve them, and glue together the solution.
Example: assume the immediates have been taken care of before the ALU.
We are now down to 10 operations, which can be encoded in 4 bits:
  0  add
  1  addu
  2  sub
  3  subu
  4  and
  5  or
  6  xor
  7  nor
  a  slt
  b  sltu
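The 4-bit operation encoding listed above can be exercised with a small behavioral model. This is a Python sketch rather than the course's VHDL; the function names `alu` and `to_signed` are hypothetical, and the outputs mirror the result, zero, and ovf ports described in the deck.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(a, b, m):
    """Return (result, zero, ovf) for 32-bit operands a, b and 4-bit op code m."""
    a &= MASK32
    b &= MASK32
    ovf = 0
    if m in (0x0, 0x1, 0x2, 0x3):            # add, addu, sub, subu
        if m in (0x2, 0x3):
            exact = to_signed(a) - to_signed(b)
        else:
            exact = to_signed(a) + to_signed(b)
        result = exact & MASK32
        if m in (0x0, 0x2):                   # overflow only for add and sub
            ovf = int(not (-(1 << 31) <= exact < (1 << 31)))
    elif m == 0x4:
        result = a & b
    elif m == 0x5:
        result = a | b
    elif m == 0x6:
        result = a ^ b
    elif m == 0x7:
        result = ~(a | b) & MASK32
    elif m == 0xA:                            # slt: signed comparison
        result = int(to_signed(a) < to_signed(b))
    elif m == 0xB:                            # sltu: unsigned comparison
        result = int(a < b)
    else:
        raise ValueError("unsupported operation code")
    zero = int(result == 0)
    return result, zero, ovf
```

For instance, `alu(0x7FFFFFFF, 1, 0x0)` sets ovf while `alu(0x7FFFFFFF, 1, 0x1)` produces the same bit pattern with ovf clear, which is the only difference between add and addu.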

54 Addition & Subtraction
Just like in grade school (carry/borrow 1s).
Two's complement operations are easy: do subtraction by negating and then adding.
Overflow (result too large for finite computer word): e.g., adding two n-bit numbers does not yield an n-bit number.
[Worked 4-bit examples on the slide; surviving fragments: 0101, + 1010, 0001, 1000]
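Subtraction by negating and then adding can be checked with a few lines of Python. This is a minimal sketch for 4-bit words; the function names `negate` and `subtract` are hypothetical.

```python
def negate(x, n=4):
    """Two's complement negation: invert all bits, then add 1."""
    mask = (1 << n) - 1
    return (~x + 1) & mask


def subtract(a, b, n=4):
    """Compute a - b as a + (-b), keeping only the low n bits."""
    mask = (1 << n) - 1
    return (a + negate(b, n)) & mask
```

For example, 7 - 6 becomes 0111 + 1010, whose low four bits are 0001. By contrast, 0111 + 0001 gives 1000, which reads as -8 in 4-bit two's complement: an overflow.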

55 Building a 1-bit Binary Adder
[Figure: 1-bit full adder with inputs A, B, carry_in and outputs S, carry_out, next to its truth table]
S = A xor B xor carry_in
carry_out = A&B | A&carry_in | B&carry_in (majority function)
How can we use it to build a 32-bit adder?
How can we modify it easily to build an adder/subtractor?
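The two full-adder equations can be written directly as bitwise operations. A minimal Python sketch, where the function name `full_adder` is a hypothetical name chosen for illustration:

```python
def full_adder(a, b, carry_in):
    """Return (s, carry_out) for single-bit inputs a, b, carry_in."""
    s = a ^ b ^ carry_in
    # Majority function: the carry is 1 when at least two inputs are 1.
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return s, carry_out
```

Enumerating all eight input combinations confirms that `s + 2 * carry_out` always equals `a + b + carry_in`.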

56 Building 32-bit Adder
Just connect the carry-out of the least significant bit FA to the carry-in of the next least significant bit, and so on.
[Figure: chain of 1-bit FAs with inputs A0/B0 through A31/B31, sums S0 through S31, c0 = carry_in, internal carries c1 through c31, and c32 = carry_out]
Ripple Carry Adder (RCA)
  Advantage: simple logic, so small (low cost)
  Disadvantage: slow, and lots of glitching (so lots of energy consumption)

57 A 32-bit Ripple Carry Adder/Subtractor
Remember, 2’s complement is just:
  complement all the bits
  add a 1 in the least significant bit
The add/sub control line (0 = add, 1 = sub) drives c0 = carry_in and selects the B input of each FA:
  B0 if control = 0
  !B0 if control = 1
[Figure: chain of 1-bit FAs with inputs A0 through A31, sums S0 through S31, internal carries c1 through c31, and c32 = carry_out]
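The adder/subtractor described here can be modeled bit by bit. The Python sketch below is illustrative only; `full_adder` and `add_sub` are hypothetical names, and the XOR on each B bit plays the role of the B / !B selection.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return s, carry_out


def add_sub(a, b, control, n=32):
    """Ripple-carry add (control=0) or subtract (control=1); returns (result, carry_out)."""
    carry = control               # c0 = control supplies the "+1" for subtraction
    result = 0
    for i in range(n):
        a_i = (a >> i) & 1
        b_i = ((b >> i) & 1) ^ control   # B_i when adding, !B_i when subtracting
        s, carry = full_adder(a_i, b_i, carry)
        result |= s << i
    return result, carry
```

The loop makes the performance cost visible: bit i cannot finish until the carry from bit i - 1 is known, so the delay grows linearly with n.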

58 Overflow Detection and Effects
Overflow: the result is too large to represent in the number of bits allocated.
When adding operands with different signs, overflow cannot occur!
Overflow occurs when:
  adding two positives yields a negative, or
  adding two negatives gives a positive, or
  subtracting a negative from a positive gives a negative, or
  subtracting a positive from a negative gives a positive.
On overflow, an exception (interrupt) occurs:
  Control jumps to a predefined address for the exception.
  The interrupted address (the address of the instruction causing the overflow) is saved for possible resumption.
We don't always want to detect (interrupt on) overflow.
Notes: Recall from some earlier slides that the biggest positive number you can represent using 4 bits is 7 and the smallest negative number you can represent is negative 8. So any time your addition results in a number bigger than 7 or less than negative 8, you have an overflow. Keep in mind that whenever you add two numbers that have different signs, that is, a negative number and a positive number, overflow can NOT occur. Overflow occurs when you add two positive numbers and the sum has a negative sign, or when you add two negative numbers and the sum has a positive sign. If you spend some time, you can convince yourself that if the carry into the most significant bit is NOT the same as the carry coming out of the MSB, you have an overflow.

59 New MIPS Instructions
Category                      Instr                           Op Code   Example               Meaning
Arithmetic (R & I format)     add unsigned                    0 and 21  addu $s1, $s2, $s3    $s1 = $s2 + $s3
                              sub unsigned                    0 and 23  subu $s1, $s2, $s3    $s1 = $s2 - $s3
                              add imm. unsigned               9         addiu $s1, $s2, 6     $s1 = $s2 + 6
Data Transfer                 ld byte unsigned                24        lbu $s1, 25($s2)      $s1 = Mem($s2+25)
                              ld half unsigned                25        lhu $s1, 25($s2)      $s1 = Mem($s2+25)
Cond. Branch (I & R format)   set on less than unsigned       0 and 2b  sltu $s1, $s2, $s3    if ($s2<$s3) $s1=1 else $s1=0
                              set on less than imm unsigned   b         sltiu $s1, $s2, 6     if ($s2<6) $s1=1 else $s1=0
The similarity of the binary representation of related instructions simplifies the hardware design.
Sign extend – addi, addiu, slti, sltiu
Zero extend – andi, ori, xori
Overflow detected – add, addi, sub
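The difference between sign extension and zero extension of a 16-bit immediate can be shown in a few lines. A minimal Python sketch; `sign_extend16` and `zero_extend16` are hypothetical helper names.

```python
def sign_extend16(imm):
    """Extend a 16-bit immediate to 32 bits by replicating bit 15."""
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if (imm & 0x8000) else imm


def zero_extend16(imm):
    """Extend a 16-bit immediate to 32 bits by filling the upper half with zeros."""
    return imm & 0xFFFF
```

The immediate 0xFFFA becomes 0xFFFFFFFA (that is, -6) when sign extended, as for addi, and stays 0x0000FFFA when zero extended, as for ori.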

60 Review: MIPS Arithmetic Instructions
Field boundaries (bit positions): 31, 25, 20, 15, 5
R-type: op | Rs | Rt | Rd | funct
I-type: op | Rs | Rt | Immed 16
[Figure: ALU with 32-bit inputs A and B, 4-bit input m (operation), 32-bit output result, and 1-bit outputs zero and ovf]
Expand immediates to 32 bits before the ALU.
10 operations, so they can be encoded in 4 bits: 0 add, 1 addu, 2 sub, 3 subu, 4 and, 5 or, 6 xor, 7 nor, a slt, b sltu
R-type instructions: ADD, ADDU, SUB, SUBU, AND, OR, XOR, NOR, SLT, SLTU

61 Review: A 32-bit Adder/Subtractor
Built out of 32 full adders (FAs).
S = A xor B xor carry_in
carry_out = A&B | A&carry_in | B&carry_in (majority function)
[Figure: chain of 1-bit FAs with inputs A0/B0 through A31/B31, sums S0 through S31, an add/subt control driving c0 = carry_in, internal carries c1 through c31, and c32 = carry_out]
Small but slow!

62 Tailoring the ALU to the MIPS ISA
Also need to support the logic operations (and, nor, or, xor)
  Bit-wise operations (no carry operation involved)
  Need a logic gate for each function and a mux to choose the output
Also need to support the set-on-less-than instruction (slt)
  Uses subtraction to determine if (a – b) < 0 (implies a < b)
Also need to support test for equality (bne, beq)
  Again use subtraction: (a – b) = 0 implies a = b
Also need to add overflow detection hardware
  Overflow detection enabled only for add, addi, sub
Immediates are sign extended outside the ALU with wiring (i.e., no logic needed)

63 A Simple ALU Cell with Logic Op Support
[Figure: 1-bit ALU cell with inputs A, B, add/subt, carry_in, and op, a 1-bit FA, and outputs result and carry_out]
How many bits does op need to be?
Notes: The old book shows the B input to the logic gates as the output of the inverter mux (an xor in our case). That way, by setting the add/subt control appropriately, you can also get A and !B, A or !B, A xor !B (which is A xnor B), and !(A or !B) (which is !A and B), in addition to A and B, A or B, A xor B, and A nor B. Wouldn’t it be better to pull it directly from the B input? Yes, so I modified the design from the one presented in the (old) book. This leads to simpler decoding of the m bits (in ALUlogic) into the add_subt and op control lines.

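64 Modifying the ALU Cell for slt
[Figure: 1-bit ALU cell with inputs A, B, less, add/subt, carry_in, and op; a 1-bit FA; an output mux with inputs numbered 0 to 7 (the less input on 7); and outputs result and carry_out]
Remember that the “slt” instruction sets a register value to
  1 if $S1 < $S2
  0 otherwise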

65 Modifying the ALU for slt
First perform the subtraction $S1 - $S2, i.e., A - B.
Make the result 1 if the subtraction yields a negative result, i.e., A < B.
Make the result 0 if the subtraction yields a non-negative result, i.e., A >= B.
Tie the most significant sum bit (the sign bit, labelled set) to the low-order less input. Why?
[Figure: 32 ALU cells with inputs A0/B0 through A31/B31, a less input on each cell, outputs result0 through result31, and the set output of cell 31 fed back to the less input of cell 0]

66 Modifying the ALU for Zero
First perform the subtraction.
Insert additional logic to detect when all result bits are zero.
Note that zero is a 1 when the result is all zeros.
[Figure: 32 ALU cells with inputs A0/B0 through A31/B31, op and add/subt controls, less inputs, outputs result0 through result31, the set output of cell 31, and a zero output derived from all result bits]
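The zero output is simply a NOR over all result bits, and an equality test (beq, bne) reuses it after a subtraction. A minimal Python sketch; `zero_flag` and `equal` are hypothetical names.

```python
def zero_flag(result, n=32):
    """Return 1 when all n result bits are 0 (a wide NOR), else 0."""
    any_bit_set = 0
    for i in range(n):
        any_bit_set |= (result >> i) & 1
    return any_bit_set ^ 1


def equal(a, b, n=32):
    """Test a == b the way the ALU does: subtract, then check the zero flag."""
    mask = (1 << n) - 1
    difference = (a - b) & mask
    return zero_flag(difference, n)
```

Overflow during the subtraction does not matter here, since the difference is all zeros only when the two bit patterns are identical.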

67 Overflow Detection
Overflow occurs when the result is too large to represent in the number of bits allocated:
  adding two positives yields a negative, or
  adding two negatives gives a positive, or
  subtracting a negative from a positive gives a negative, or
  subtracting a positive from a negative gives a positive.
On your own: prove you can detect overflow by: carry into MSB xor carry out of MSB.
[Worked 4-bit examples on the slide: 7 + 3 gives –6; –4 + –5 gives 7]
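The "carry into MSB xor carry out of MSB" rule can be checked exhaustively for small word sizes. The following Python sketch is a brute-force check rather than a proof; `add_with_overflow`, `to_signed`, and `rule_holds` are hypothetical names.

```python
def to_signed(x, n=4):
    """Interpret an n-bit pattern as a two's complement integer."""
    return x - (1 << n) if x >> (n - 1) else x


def add_with_overflow(a, b, n=4):
    """Return (n-bit sum, overflow) using carry-in xor carry-out of the MSB."""
    mask = (1 << n) - 1
    low_mask = mask >> 1
    carry_into_msb = ((a & low_mask) + (b & low_mask)) >> (n - 1)
    total = (a & mask) + (b & mask)
    carry_out_of_msb = total >> n
    return total & mask, carry_into_msb ^ carry_out_of_msb


def rule_holds(n=4):
    """Compare the carry rule against the true signed sum for every operand pair."""
    for a in range(1 << n):
        for b in range(1 << n):
            _, ovf = add_with_overflow(a, b, n)
            exact = to_signed(a, n) + to_signed(b, n)
            expected = int(not (-(1 << (n - 1)) <= exact < (1 << (n - 1))))
            if ovf != expected:
                return False
    return True
```

With 4-bit words, 7 + 3 produces the pattern 1010 (-6) and -4 + -5 produces 0111 (7); the carry rule flags both cases.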

68 Modifying the ALU for Overflow
Modify the most significant cell to determine the overflow output setting.
Enable overflow bit setting for signed arithmetic (add, addi, sub).
[Figure: 32 ALU cells with inputs A0/B0 through A31/B31, op and add/subt controls, less inputs, outputs result0 through result31, the zero output, and set and overflow outputs on cell 31]
Notes: For slt (and slti, sltiu, and sltu): “No integer overflow exception occurs under any circumstances. The comparison is valid even if the subtraction used during the comparison overflows.” The way I read this is that if the result overflows during the subtraction, no attempt is made to correct the set line to reflect that! Otherwise, you would need to add logic in front of the set line to do the correction in case of overflow, such as XORing the overflow bit with the sign bit.
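The correction mentioned in the notes, XORing the overflow bit with the sign bit of A - B, can be sketched as follows. This is a minimal Python model of that idea only; the function name `slt` and its argument names are hypothetical.

```python
def slt(a, b, n=32):
    """Signed set-on-less-than built from an n-bit subtraction."""
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    difference = (a - b) & mask
    sign = difference >> (n - 1)
    a_sign = a >> (n - 1)
    b_sign = b >> (n - 1)
    # A - B overflows when the operand signs differ and the result's sign
    # differs from A's sign.
    overflow = (a_sign ^ b_sign) & (a_sign ^ sign)
    # Without the XOR, an overflowing subtraction would invert the answer.
    return sign ^ overflow
```

Comparing the most negative value with 1 shows why the correction matters: the raw difference 0x7FFFFFFF has a clear sign bit, yet the left operand is smaller.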

69 But What about Performance?
The critical path of an n-bit ripple-carry adder is n*CP.
Design trick: throw hardware at it (carry lookahead).
[Figure: four chained 1-bit ALUs with inputs A0/B0 through A3/B3, CarryIn0 through CarryIn3, CarryOut0 through CarryOut3, and outputs Result0 through Result3]
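Carry lookahead is only named on this slide. As a hedged illustration of the general technique, not of the circuit used in the course, the Python sketch below computes all carries of a 4-bit adder directly from generate (g = a and b) and propagate (p = a xor b) signals; `carry_lookahead_add4` is a hypothetical name.

```python
def carry_lookahead_add4(a, b, c0=0):
    """4-bit addition with carries computed from generate/propagate terms."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]   # generate
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(4)]   # propagate
    # Each carry is a two-level function of g, p and c0, with no rippling.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    result = 0
    for i in range(4):
        result |= (p[i] ^ carries[i]) << i    # sum bit = p_i xor c_i
    return result, c4
```

The carry expressions grow quickly with the bit position, which is the hardware cost paid for removing the serial carry chain.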

70 Multiplication
More complicated than addition
  Can be accomplished via shifting and adding
[Figure: multiplicand times multiplier, the partial product array, and the product]
Double precision product produced
More time and more area to compute
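Multiplication by shifting and adding can be sketched for unsigned operands as below. This is a minimal Python illustration; `shift_add_multiply` is a hypothetical name, and signed operands are not handled.

```python
def shift_add_multiply(multiplicand, multiplier, n=32):
    """Unsigned n-bit multiply producing a 2n-bit (double precision) product."""
    product = 0
    for i in range(n):
        # Each 1 bit of the multiplier adds one shifted copy of the
        # multiplicand: one row of the partial product array.
        if (multiplier >> i) & 1:
            product += multiplicand << i
    return product & ((1 << (2 * n)) - 1)
```

Two n-bit operands need up to 2n bits for the product, which is why the result is double precision.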

71 MIPS Multiply Instruction
Multiply produces a double precision product:
  mult $s0, $s1 # hi||lo = $s0 * $s1
  Fields: op | rs | rt | rd | shamt | funct
The low-order word of the product is left in processor register lo and the high-order word is left in register hi.
Instructions mfhi rd and mflo rd are provided to move the product to (user accessible) registers in the register file.
multu does multiply unsigned.
Notes: Both multiplies ignore overflow, so it's up to the software to check whether the product is too big to fit into 32 bits. There is no overflow if hi is 0 for multu, or the replicated sign of lo for mult.
Multiplies are done by fast, dedicated hardware and are much more complex (and slower) than adders.
Hardware dividers are even more complex and even slower; ditto for hardware square root.
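The software overflow check described in the notes can be illustrated as follows. This is a Python sketch that models only the hi/lo split; the names `mult`, `multu`, `mult_fits`, and `multu_fits` are hypothetical helpers.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def multu(a, b):
    """Unsigned multiply: return (hi, lo) words of the 64-bit product."""
    product = (a & MASK32) * (b & MASK32)
    return product >> 32, product & MASK32


def mult(a, b):
    """Signed multiply: return (hi, lo) words of the 64-bit product."""
    product = (to_signed(a) * to_signed(b)) & ((1 << 64) - 1)
    return product >> 32, product & MASK32


def multu_fits(hi, lo):
    """Unsigned product fits in 32 bits when hi is 0."""
    return hi == 0


def mult_fits(hi, lo):
    """Signed product fits in 32 bits when hi replicates the sign bit of lo."""
    return hi == (MASK32 if lo & 0x80000000 else 0)
```

The same bit patterns can fit under one interpretation and overflow under the other: 0xFFFFFFFF times 2 fits as the signed product -2 but not as an unsigned product.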

72 Division
Division is just a bunch of quotient digit guesses, left shifts, and subtracts.
[Figure: n-bit divisor, dividend, n-bit quotient, the partial remainder array, and n-bit remainder]
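In binary, each quotient digit "guess" is just 0 or 1, so division reduces to a shift, a compare, and a conditional subtract per bit. The following is a minimal Python sketch for unsigned operands; `shift_subtract_divide` is a hypothetical name.

```python
def shift_subtract_divide(dividend, divisor, n=32):
    """Unsigned n-bit division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient = 0
    remainder = 0
    for i in range(n - 1, -1, -1):
        # Shift the partial remainder left and bring down the next dividend bit.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        # Quotient digit is 1 only when the divisor fits.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder
```

For example, dividing 100 by 7 this way yields quotient 14 and remainder 2.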

73 MIPS Divide Instruction
Divide generates the remainder in hi and the quotient in lo:
  div $s0, $s1 # lo = $s0 / $s1
               # hi = $s0 mod $s1
  Fields: op | rs | rt | rd | shamt | funct
Instructions mflo rd and mfhi rd are provided to move the quotient and remainder to (user accessible) registers in the register file.
Notes: It seems odd to me that the machine doesn’t support a double precision dividend in hi || lo, but it looks like it doesn’t.
As with multiply, divide ignores overflow, so software must determine if the quotient is too large. Software must also check the divisor to avoid division by 0.

74 Shift Operations
Shifts move all the bits in a word left or right:
  sll $t2, $s0, 8 # $t2 = $s0 << 8 bits
  srl $t2, $s0, 8 # $t2 = $s0 >> 8 bits
  sra $t2, $s0, 8 # $t2 = $s0 >> 8 bits
  Fields: op | rs | rt | rd | shamt | funct
Notice that a 5-bit shamt field is enough to shift a 32-bit value 2^5 – 1, or 31, bit positions.
Logical shifts fill with zeros; arithmetic right shifts fill with the sign bit.
Notes: An arithmetic shift (sra) maintains the arithmetic correctness of the shifted value (i.e., a number shifted right one bit should be ½ of its original value; a number shifted left should be 2 times its original value).
The shift operation is implemented by hardware separate from the ALU, using a barrel shifter (which would take lots of gates in discrete logic, but is pretty easy to implement in VLSI).
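The three shifts differ only in what fills the vacated bit positions. A minimal Python sketch on 32-bit patterns, with shamt assumed to be in the range 0 to 31; the function names mirror the MIPS mnemonics but are otherwise hypothetical helpers.

```python
MASK32 = 0xFFFFFFFF


def sll(x, shamt):
    """Shift left logical: zeros enter from the right."""
    return (x << shamt) & MASK32


def srl(x, shamt):
    """Shift right logical: zeros enter from the left."""
    return (x & MASK32) >> shamt


def sra(x, shamt):
    """Shift right arithmetic: copies of the sign bit enter from the left."""
    x &= MASK32
    if x & 0x80000000:
        fill = (MASK32 << (32 - shamt)) & MASK32   # shamt leading ones
        return (x >> shamt) | fill
    return x >> shamt
```

Shifting 0xFFFFFFF8 (that is, -8) right by one gives 0xFFFFFFFC (-4) with sra but 0x7FFFFFFC with srl, which is why only sra preserves the value of a signed number.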

75 Wrap-Up
We can build an ALU to support the MIPS ISA:
  we can efficiently perform subtraction using two’s complement
  we can replicate a 1-bit ALU to produce a 32-bit ALU
Important points about hardware:
  all of the gates are always working (concurrently)
  the speed of a gate is affected by the number of inputs to the gate (fan-in) and the number of gates that the output is connected to (fan-out)
  the speed of a circuit is affected by the speed and number of gates in series (on the “critical path”, or the “number of levels of logic”) and by the length of the wires interconnecting the gates
Our primary focus is comprehension; however, clever changes to organization can improve performance (similar to using better algorithms in software).

76 End Slides

