ECE 448 – FPGA and ASIC Design with VHDL
Lecture 13: Multipliers, Timing Parameters
Required reading: S. Brown and Z. Vranesic, Fundamentals of Digital Logic with VHDL Design, Chapter 10.2.3, Shift-and-Add Multiplier
Shift-and-Add Multiplier
An algorithm for multiplication
(a) Manual method
                          Binary   Decimal
Multiplicand, A             1101        13
Multiplier, B             × 1011      × 11
                            ----
                            1101
                           1101
                          0000
                         1101
                        --------
Product, P              10001111       143
(b) Pseudo-code
P = 0;
for i = 0 to n - 1 do
    if b_i = 1 then
        P = P + A;
    end if;
    Left-shift A;
end for;
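The pseudo-code can be checked against the manual example with a short software model. This is a minimal sketch in Python, not part of the lecture's VHDL design; the function name `shift_and_add` and the explicit operand width `n` are assumptions made for illustration.

```python
def shift_and_add(a: int, b: int, n: int) -> int:
    """Multiply two unsigned n-bit integers following the shift-and-add pseudo-code."""
    if not (0 <= a < (1 << n) and 0 <= b < (1 << n)):
        raise ValueError("operands must be unsigned n-bit values")
    p = 0                       # P = 0
    for i in range(n):          # for i = 0 to n - 1
        if (b >> i) & 1:        # if b_i = 1
            p += a              # P = P + A
        a <<= 1                 # left-shift A
    return p


if __name__ == "__main__":
    # The manual example: 13 x 11 with 4-bit operands
    print(format(shift_and_add(0b1101, 0b1011, 4), "08b"))  # prints 10001111 (143)
```

Note that A grows by one bit per iteration, which is why the hardware version keeps A in a 2n-bit register.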
Expected behavior of the multiplier
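Datapath for the multiplier
[Figure: datapath. A is a 2n-bit shift-left register (inputs DataA, LA, EA); B is an n-bit shift-right register (inputs DataB, LB, EB); an adder forms Sum = A + P; 2n 2-to-1 multiplexers controlled by Psel select between 0 and Sum to form DataP; register P is enabled by EP. Status signals: z (B = 0) and b0 (least significant bit of B). All registers share Clock.]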
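ASM chart for the multiplier
[Figure: ASM chart with states S1, S2, S3. Reset enters S1, where P ← 0 and A and B can be loaded (Load A, Load B) while s = 0; s = 1 moves to S2. In S2, A shifts left and B shifts right each cycle, and P ← P + A when b0 = 1; the chart stays in S2 until B = 0, then moves to S3. S3 asserts Done and returns to S1 when s = 0.]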
ASM chart for the multiplier control circuit
VHDL code of multiplier circuit (1)
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
USE ieee.std_logic_unsigned.all ;
USE work.components.all ;

ENTITY multiply IS
    GENERIC ( N : INTEGER := 8; NN : INTEGER := 16 ) ;
    PORT ( Clock     : IN  STD_LOGIC ;
           Resetn    : IN  STD_LOGIC ;
           LA, LB, s : IN  STD_LOGIC ;
           DataA     : IN  STD_LOGIC_VECTOR(N-1 DOWNTO 0) ;
           DataB     : IN  STD_LOGIC_VECTOR(N-1 DOWNTO 0) ;
           P         : OUT STD_LOGIC_VECTOR(NN-1 DOWNTO 0) ;  -- product is 2N bits wide
           Done      : OUT STD_LOGIC ) ;
END multiply ;
VHDL code of multiplier circuit (2)
ARCHITECTURE Behavior OF multiply IS
    TYPE State_type IS ( S1, S2, S3 ) ;
    SIGNAL y : State_type ;
    SIGNAL Psel, z, EA, EB, EP, Zero : STD_LOGIC ;
    SIGNAL B, N_Zeros : STD_LOGIC_VECTOR(N-1 DOWNTO 0) ;
    -- PF is driven by the NN-bit register RegP, so it must be NN bits wide
    SIGNAL A, Ain, DataP, Sum, PF : STD_LOGIC_VECTOR(NN-1 DOWNTO 0) ;
BEGIN
    FSM_transitions: PROCESS ( Resetn, Clock )
    BEGIN
        IF Resetn = '0' THEN
            y <= S1 ;
        ELSIF (Clock'EVENT AND Clock = '1') THEN
            CASE y IS
                WHEN S1 =>  -- wait for start
                    IF s = '0' THEN y <= S1 ; ELSE y <= S2 ; END IF ;
                WHEN S2 =>  -- iterate until B = 0
                    IF z = '0' THEN y <= S2 ; ELSE y <= S3 ; END IF ;
                WHEN S3 =>  -- hold Done until s is deasserted
                    IF s = '1' THEN y <= S3 ; ELSE y <= S1 ; END IF ;
            END CASE ;
        END IF ;
    END PROCESS ;
VHDL code of multiplier circuit (3)
    FSM_outputs: PROCESS ( y, s, B(0) )
    BEGIN
        -- Default values; overridden per state below
        EP <= '0' ; EA <= '0' ; EB <= '0' ; Done <= '0' ; Psel <= '0' ;
        CASE y IS
            WHEN S1 =>
                EP <= '1' ;  -- with Psel = '0', P is loaded with zeros
            WHEN S2 =>
                EA <= '1' ; EB <= '1' ; Psel <= '1' ;
                -- Accumulate A into P only when the current multiplier bit is 1
                IF B(0) = '1' THEN EP <= '1' ; ELSE EP <= '0' ; END IF ;
            WHEN S3 =>
                Done <= '1' ;
        END CASE ;
    END PROCESS ;
VHDL code of multiplier circuit (4)
    -- Define the datapath circuit
    Zero <= '0' ;
    N_Zeros <= (OTHERS => '0') ;
    Ain <= N_Zeros & DataA ;  -- zero-extend DataA to NN bits
    ShiftA: shiftlne GENERIC MAP ( N => NN )
        PORT MAP ( Ain, LA, EA, Zero, Clock, A ) ;
    ShiftB: shiftrne GENERIC MAP ( N => N )
        PORT MAP ( DataB, LB, EB, Zero, Clock, B ) ;
    z <= '1' WHEN B = N_Zeros ELSE '0' ;
    Sum <= A + PF ;
    P <= PF ;
    -- Define the 2n 2-to-1 multiplexers for DataP
    GenMUX: FOR i IN 0 TO NN-1 GENERATE
        Muxi: mux2to1 PORT MAP ( Zero, Sum(i), Psel, DataP(i) ) ;
    END GENERATE ;
    RegP: regne GENERIC MAP ( N => NN )
        PORT MAP ( DataP, Resetn, EP, Clock, PF ) ;
END Behavior ;
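To see how many clock cycles the control circuit spends in S2, the state machine and datapath can be mimicked cycle by cycle in software. The following is a rough Python sketch, not a replacement for VHDL simulation. It assumes that A and B were already loaded and P was cleared in S1, and it starts at the point where s = 1 has moved the FSM into S2. The function name `simulate_multiplier` is hypothetical.

```python
def simulate_multiplier(data_a: int, data_b: int, n: int = 8):
    """Return (product, clock cycles spent in S2) for unsigned n-bit operands."""
    mask_n = (1 << n) - 1
    mask_2n = (1 << (2 * n)) - 1
    # Assumed result of state S1: operands loaded, P cleared
    a = data_a & mask_n     # A: 2n-bit shift-left register, zero-extended
    b = data_b & mask_n     # B: n-bit shift-right register
    p = 0                   # P: 2n-bit product register
    s2_cycles = 0
    state = "S2"
    while state == "S2":
        z = (b == 0)                                # z <= '1' WHEN B = N_Zeros
        b0 = b & 1
        # Values captured by the registers at the next rising clock edge
        next_p = (p + a) & mask_2n if b0 else p     # EP follows B(0), Psel = '1'
        next_a = (a << 1) & mask_2n                 # EA = '1': shift left
        next_b = b >> 1                             # EB = '1': shift right
        next_state = "S3" if z else "S2"
        p, a, b, state = next_p, next_a, next_b, next_state
        s2_cycles += 1
    return p, s2_cycles


if __name__ == "__main__":
    print(simulate_multiplier(13, 11))  # prints (143, 5)
```

Under these assumptions the number of cycles in S2 depends on the position of the most significant 1 in B, plus one final cycle in which z = 1 is observed, so the latency of this multiplier is data dependent.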
Array Multiplier
Notation
a    Multiplicand       $a_{k-1} a_{k-2} \ldots a_1 a_0$
x    Multiplier         $x_{k-1} x_{k-2} \ldots x_1 x_0$
p    Product (a × x)    $p_{2k-1} p_{2k-2} \ldots p_2 p_1 p_0$
Unsigned Multiplication
                                     a4   a3   a2   a1   a0
  x                                  x4   x3   x2   x1   x0
  ----------------------------------------------------------
  a x0 2^0                           a4x0 a3x0 a2x0 a1x0 a0x0
  a x1 2^1                      a4x1 a3x1 a2x1 a1x1 a0x1
  a x2 2^2                 a4x2 a3x2 a2x2 a1x2 a0x2
  a x3 2^3            a4x3 a3x3 a2x3 a1x3 a0x3
+ a x4 2^4       a4x4 a3x4 a2x4 a1x4 a0x4
  ----------------------------------------------------------
            p9   p8   p7   p6   p5   p4   p3   p2   p1   p0
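Written as a formula, the table of partial products is the following double sum over the bits of a and x, using the notation defined earlier (k = 5 in the table):

```latex
p = a \cdot x
  = \sum_{j=0}^{k-1} x_j \, a \, 2^{j}
  = \sum_{j=0}^{k-1} \sum_{i=0}^{k-1} a_i \, x_j \, 2^{i+j}
```

Each term $a_i x_j$ is a single AND of two bits, and its weight $2^{i+j}$ gives its column in the table.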
5 x 5 Array Multiplier
Array Multiplier - Basic Cell
[Figure: full adder (FA) with inputs x, y, cin and outputs s, cout.]
Array Multiplier - Modified Basic Cell
[Figure: full adder (FA) cell with inputs a_m, x_n, s_{i-1}, c_i and outputs s_i, c_{i+1}.]
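The role of the modified cell can be illustrated with a small bit-level model. The sketch below is in Python and rests on two assumptions that the slide text does not spell out: the cell ANDs a_m with x_n and feeds the result into the full adder together with s_{i-1} and c_i, and the cells are arranged in one row per multiplier bit with carries rippling along each row. The wiring in the lecture's figure may differ. The names `modified_cell` and `array_multiply` are hypothetical.

```python
def modified_cell(a_m: int, x_n: int, s_in: int, c_in: int):
    """One cell: AND gate plus full adder. Returns (s_out, c_out)."""
    pp = a_m & x_n                                   # partial-product bit a_m * x_n
    s_out = pp ^ s_in ^ c_in                         # full-adder sum
    c_out = (pp & s_in) | (pp & c_in) | (s_in & c_in)  # full-adder carry
    return s_out, c_out


def array_multiply(a: int, x: int, k: int = 5) -> int:
    """Multiply two unsigned k-bit integers with a k x k array of modified cells."""
    if not (0 <= a < (1 << k) and 0 <= x < (1 << k)):
        raise ValueError("operands must be unsigned k-bit values")
    a_bits = [(a >> i) & 1 for i in range(k)]
    x_bits = [(x >> j) & 1 for j in range(k)]
    sums = [0] * (2 * k)                 # running sum bits, least significant first
    for j in range(k):                   # one row of cells per multiplier bit x_j
        carry = 0
        for i in range(k):               # the cell in column i + j
            sums[i + j], carry = modified_cell(a_bits[i], x_bits[j], sums[i + j], carry)
        sums[j + k] = carry              # the row's carry-out becomes the next sum bit
    return sum(bit << pos for pos, bit in enumerate(sums))


if __name__ == "__main__":
    print(array_multiply(13, 11))  # prints 143
```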
5 x 5 Array Multiplier with modified cells
Pipelined 5 x 5 Multiplier
Array Multiplier - Modified Basic Cell with Flip-flops
[Figure: the modified cell (inputs a_m, x_n, s_{i-1}, c_i; outputs s_i, c_{i+1}) with flip-flops added for pipelining.]
Timing parameters
parameter          definition                                      units
delay              time from point → point                         ns
clock period T     time from rising edge → rising edge of clock    ns
clock frequency    1 / clock period                                MHz
latency            time from input → output                        ns
throughput         # output bits / time unit                       Mbits/s
Latency
[Figure: top-level entity with an 8-bit input and an 8-bit output, built from registers (D Q) separated by two blocks of combinational logic and clocked by a 100 MHz clk. Timing diagram: input(0), input(1), input(2), ... and output(0), output(1), ... against clk.]
- Latency is the time between input(n) and output(n), i.e. the time it takes from first input to first output, second input to second output, etc.
- Latency is usually constant for a system (but not always)
- Also called input-to-output latency
- Count the number of rising edges of the clock!
- In this example, there are 2 rising edges from input to output, so latency is 2 cycles
- Latency is measured in clock cycles and then translated to units of time (nanoseconds)
- In this example, say the clock period is 10 ns; then latency is 20 ns
Throughput
[Figure: the same top-level entity with an 8-bit input and an 8-bit output. Timing diagram: input(0), input(1), input(2), ... and output(0), output(1), ... against clk, with 1 cycle between output samples.]
- Throughput = (bits per output sample) / (time between consecutive output samples)
- Bits per output sample: in this example, 8 bits per output sample
- Time between consecutive output samples: clock cycles from output(n) to output(n+1)
- Can be measured in clock cycles, then translated to time
- In this example, time between consecutive output samples = 1 clock cycle = 10 ns
- Throughput = (8 bits per output sample) / (10 ns) = 0.8 bits/ns = 800 Mbits/s
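The latency and throughput definitions reduce to two one-line formulas. The Python sketch below reproduces the numbers from the examples (2 cycles, 10 ns clock period, 8-bit samples); the function and parameter names are chosen here for illustration only.

```python
def latency_ns(cycles: int, clock_period_ns: float) -> float:
    """Latency in ns: clock cycles from input(n) to output(n) times the clock period."""
    return cycles * clock_period_ns


def throughput_mbits_per_s(bits_per_sample: int,
                           cycles_between_samples: int,
                           clock_period_ns: float) -> float:
    """Throughput in Mbits/s: bits per output sample over the time between samples."""
    bits_per_ns = bits_per_sample / (cycles_between_samples * clock_period_ns)
    return bits_per_ns * 1000.0          # 1 bit/ns equals 1000 Mbits/s


if __name__ == "__main__":
    print(latency_ns(2, 10.0))                  # latency of the example, in ns
    print(throughput_mbits_per_s(8, 1, 10.0))   # throughput of the example, in Mbits/s
```

Keeping both quantities in clock cycles until the last step makes it easy to re-evaluate them when the clock period changes.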
Pipelining: Conceptual
[Figure: before, one block of combinational logic between two registers; after, a pipeline register splits the logic in half into two blocks with tLOGICA = 5 ns and tLOGICB = 5 ns.]
- The purpose of pipelining is to reduce the critical path of the circuit by inserting an additional register (called a pipeline register)
- This splits the combinational logic in half
- The critical path delay is now 5 ns, so the maximum clock frequency is 200 MHz, double the original clock frequency
- Area is increased due to the additional register
- In general, pipelining increases throughput at the cost of increased area/power and a minor increase in latency
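The effect on clock frequency can be worked through numerically. This is an idealized Python sketch: it assumes the unpipelined critical path is tLOGICA + tLOGICB = 10 ns, and, like the slide, it ignores register setup and clock-to-output delays. The helper name `max_clock_mhz` is hypothetical.

```python
def max_clock_mhz(stage_delays_ns):
    """Maximum clock frequency in MHz when the slowest stage sets the clock period."""
    return 1000.0 / max(stage_delays_ns)   # a period of 1 ns corresponds to 1000 MHz


if __name__ == "__main__":
    t_logic_a, t_logic_b = 5.0, 5.0
    before = max_clock_mhz([t_logic_a + t_logic_b])   # one 10 ns stage
    after = max_clock_mhz([t_logic_a, t_logic_b])     # two 5 ns stages
    print(before, after)  # prints 100.0 200.0
```

In this idealized model the latency in nanoseconds does not change (one 10 ns cycle versus two 5 ns cycles). The minor latency increase mentioned above comes from the register overhead that the model leaves out.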