George Mason University ECE 645 – Computer Arithmetic Introduction to FPGA Devices
2ECE 645 – Computer Arithmetic World of Integrated Circuits Integrated Circuits Full-Custom ASICs Semi-Custom ASICs User Programmable PLDFPGA PALPLAPML LUT (Look-Up Table) MUXGates
3ECE 645 – Computer Arithmetic designs must be sent for expensive and time consuming fabrication in semiconductor foundry bought off the shelf and reconfigured by designers themselves Two competing implementation approaches ASIC Application Specific Integrated Circuit FPGA Field Programmable Gate Array designed all the way from behavioral description to physical layout no physical layout design; design ends with a bitstream used to configure a device
4ECE 645 – Computer Arithmetic Block RAMs Configurable Logic Blocks I/O Blocks What is an FPGA? Block RAMs
5ECE 645 – Computer Arithmetic Which Way to Go? Off-the-shelf Low development cost Short time to market Reconfigurability High performance ASICsFPGAs Low power Low cost in high volumes
6ECE 645 – Computer Arithmetic Other FPGA Advantages Manufacturing cycle for ASIC is very costly, lengthy and engages lots of manpower Mistakes not detected at design time have large impact on development time and cost FPGAs are perfect for rapid prototyping of digital circuits Easy upgrades like in case of software Unique applications reconfigurable computing
7ECE 645 – Computer Arithmetic Major FPGA Vendors SRAM-based FPGAs Xilinx, Inc. Altera Corp. Atmel Lattice Semiconductor Flash & antifuse FPGAs Actel Corp. Quick Logic Corp. Share over 60% of the market
8ECE 645 – Computer Arithmetic Xilinx Primary products: FPGAs and the associated CAD software Main headquarters in San Jose, CA Fabless* Semiconductor and Software Company UMC (Taiwan) {*Xilinx acquired an equity stake in UMC in 1996} Seiko Epson (Japan) TSMC (Taiwan ) Programmable Logic Devices ISE Alliance and Foundation Series Design Software
9ECE 645 – Computer Arithmetic Xilinx FPGA Families Old families XC3000, XC4000, XC5200 Old 0.5µm, 0.35µm and 0.25µm technology. Not recommended for modern designs. High-performance families Virtex (0.22µm) Virtex-E, Virtex-EM (0.18µm) Virtex-II, Virtex-II PRO (0.13µm) Virtex-4 (0.09µm) Low Cost Family Spartan/XL – derived from XC4000 Spartan-II – derived from Virtex Spartan-IIE – derived from Virtex-E Spartan-3
10ECE 645 – Computer Arithmetic
11ECE 645 – Computer Arithmetic Xilinx FPGA Block Diagram
12ECE 645 – Computer Arithmetic CLB Structure
13ECE 645 – Computer Arithmetic CLB Slice Structure Each slice contains two sets of the following: Four-input LUT Any 4-input logic function, or 16-bit x 1 sync RAM or 16-bit shift register Carry & Control Fast arithmetic logic Multiplier logic Multiplexer logic Storage element Latch or flip-flop Set and reset True or inverted inputs Sync. or async. control
14ECE 645 – Computer Arithmetic LUT (Look-Up Table) Functionality Look-Up tables are primary elements for logic implementation Each LUT can implement any function of 4 inputs
15ECE 645 – Computer Arithmetic 5-Input Functions implemented using two LUTs One CLB Slice can implement any function of 5 inputs Logic function is partitioned between two LUTs F5 multiplexer selects LUT
16ECE 645 – Computer Arithmetic 5-Input Functions implemented using two LUTs LUT OUT
17ECE 645 – Computer Arithmetic RAM16X1S O D WE WCLK A0 A1 A2 A3 RAM32X1S O D WE WCLK A0 A1 A2 A3 A4 RAM16X2S O1 D0 WE WCLK A0 A1 A2 A3 D1 O0 = = LUT or LUT RAM16X1D SPO D WE WCLK A0 A1 A2 A3 DPRA0DPO DPRA1 DPRA2 DPRA3 or Distributed RAM CLB LUT configurable as Distributed RAM A LUT equals 16x1 RAM Implements Single and Dual- Ports Cascade LUTs to increase RAM size Synchronous write Synchronous/Asynchronous read Accompanying flip-flops used for synchronous read
18ECE 645 – Computer Arithmetic DQ CE DQ DQ DQ LUT IN CE CLK DEPTH[3:0] OUT LUT = Shift Register Each LUT can be configured as shift register Serial in, serial out Dynamically addressable delay up to 16 cycles For programmable pipeline Cascade for greater cycle delays Use CLB flip-flops to add depth
19ECE 645 – Computer Arithmetic Shift Register Register-rich FPGA Allows for addition of pipeline stages to increase throughput Data paths must be balanced to keep desired functionality 64 Operation A 4 Cycles8 Cycles Operation B 3 Cycles Operation C Cycles 3 Cycles 9-Cycle imbalance
20ECE 645 – Computer Arithmetic COUT D Q CK S R EC D Q CK R EC O G4 G3 G2 G1 Look-Up Table Carry & Control Logic O YB Y F4 F3 F2 F1 XB X Look-Up Table F5IN BY SR S Carry & Control Logic CIN CLK CE SLICE Carry & Control Logic
21ECE 645 – Computer Arithmetic Each CLB contains separate logic and routing for the fast generation of sum & carry signals Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters Carry logic is independent of normal logic and routing resources Fast Carry Logic LSB MSB Carry Logic Routing
22ECE 645 – Computer Arithmetic Accessing Carry Logic All major synthesis tools can infer carry logic for arithmetic functions Addition (SUM <= A + B) Subtraction (DIFF <= A - B) Comparators (if A < B then…) Counters (count <= count +1)
23ECE 645 – Computer Arithmetic Block RAM Spartan-II True Dual-Port Block RAM Port A Port B Block RAM Most efficient memory implementation Dedicated blocks of memory Ideal for most memory requirements 4 to 104 memory blocks 18 kbits = 18,432 bits per block Use multiple blocks for larger memories Builds both single and true dual-port RAMs
24ECE 645 – Computer Arithmetic Spartan-3 Block RAM Amounts
25ECE 645 – Computer Arithmetic Block RAM Port Aspect Ratios
26ECE 645 – Computer Arithmetic Block RAM Port Aspect Ratios 0 16, , , k x 1 8k x 2 4k x 4 2k x (8+1) 1024 x (16+2)
27ECE 645 – Computer Arithmetic Dual Port Block RAM
28ECE 645 – Computer Arithmetic RAMB4_S4_S16 Port A Out 18-Bit Width Port B In 2k-Bit Depth Port A In 1K-Bit Depth Port B Out 9-Bit Width DOA[17:0] DOB[8:0] WEA ENA RSTA ADDRA[9:0] CLKA DIA[17:0] WEB ENB RSTB ADDRB[8:0] CLKB DIB[15:0] Dual-Port Bus Flexibility Each port can be configured with a different data bus width Provides easy data width conversion without any additional logic
29ECE 645 – Computer Arithmetic VCC, ADDR[12:0] GND, ADDR[12:0] RAMB4_S1_S1 Port B Out 1-Bit Width DOA[0] DOB[0] WEA ENA RSTA ADDRA[12:0] CLKA DIA[0] WEB ENB RSTB ADDRB[12:0] CLKB DIB[0] Port B In 8K-Bit Depth Port A Out 1-Bit Width Port A In 8K-Bit Depth Two Independent Single-Port RAMs To access the lower RAM Tie the MSB address bit to Logic Low To access the upper RAM Tie the MSB address bit to Logic High Added advantage of True Dual- Port No wasted RAM Bits Can split a Dual-Port 16K RAM into two Single-Port 8K RAM Simultaneous independent access to each RAM
30ECE 645 – Computer Arithmetic New 18 x 18 Embedded Multiplier Fast arithmetic functions Optimized to implement multiply / accumulate modules
31ECE 645 – Computer Arithmetic 18 x 18 Multiplier Embedded 18-bit x 18-bit multiplier 2’s complement signed operation Multipliers are organized in columns 18 x 18 Multiplier Output (36 bits) Data_A (18 bits) Data_B (18 bits) Note: See Virtex-II Data Sheet for updated performances
32ECE 645 – Computer Arithmetic Basic I/O Block Structure D EC Q SR D EC Q SR D EC Q SR Three-State Control Output Path Input Path Three-State Output Clock Set/Reset Direct Input Registered Input FF Enable
33ECE 645 – Computer Arithmetic IOB Functionality IOB provides interface between the package pins and CLBs Each IOB can work as uni- or bi-directional I/O Outputs can be forced into High Impedance Inputs and outputs can be registered advised for high-performance I/O Inputs can be delayed
34ECE 645 – Computer Arithmetic Routing Resources PSM CLB PSM CLB Programmable Switch Matrix
35ECE 645 – Computer Arithmetic Clock Distribution
36ECE 645 – Computer Arithmetic Spartan-3 FPGA Family Members
37ECE 645 – Computer Arithmetic FPGA Nomenclature
38ECE 645 – Computer Arithmetic Device Part Marking We’re Using: XC3S100-4FG256
39ECE 645 – Computer Arithmetic
40ECE 645 – Computer Arithmetic Virtex-II 1.5V Architecture C onfigurable L ogic B lock Block RAMs I / O B lock Multipliers 18 x 18 Block RAMs Multipliers 18 x 18 Block RAMs Multipliers 18 x 18 Block RAMs Multipliers 18 x 18
41ECE 645 – Computer Arithmetic Virtex-II 1.5V DeviceCLB Array SlicesMaximum I/O BlockRAM (18kb) Multiplier Blocks Distributed RAM bits XC2V408x ,192 XC2V8016x ,384 XC2V25024x161, ,152 XC2V50032x243, ,304 XC2V100040x325, ,840 XC2V150048x407, ,760 XC2V200056x4810, ,064 XC2V300064x5614, ,752 XC2V400080x7223, ,280 XC2V600096x8833,7921, ,081,344 XC2V x10446,5921, ,490,944
42ECE 645 – Computer Arithmetic Virtex-II Block SelectRAM Virtex-II BRAM is 18 kbits Additional “parity” bits available in selected configurations WidthDepthAddressDataParity 116,386[13:0][0]N/A 28,192[12:0][1:0]N/A 44,096[11:0][3:0]N/A 92,048[10:0][7:0][0] 181,024[9:0][15:0][1:0] 36512[8:0][31:0][3:0]
George Mason University ECE 645 – Computer Arithmetic Using Library Components in VHDL Code
44ECE 645 – Computer Arithmetic RAM 16x1 (1) library IEEE; use IEEE.STD_LOGIC_1164.all; library UNISIM; use UNISIM.all; entity RAM_16X1_DISTRIBUTED is port( CLK : in STD_LOGIC; WE : in STD_LOGIC; ADDR : in STD_LOGIC_VECTOR(3 downto 0); DATA_IN : in STD_LOGIC; DATA_OUT : out STD_LOGIC ); end RAM_16X1_DISTRIBUTED;
45ECE 645 – Computer Arithmetic RAM 16x1 (2) architecture RAM_16X1_DISTRIBUTED_STRUCTURAL of RAM_16X1_DISTRIBUTED is attribute INIT : string; attribute INIT of RAM16X1_S_1: label is "F0C1"; -- Component declaration of the "ram16x1s(ram16x1s_v)" unit -- File name contains "ram16x1s" entity:./src/unisim_vital.vhd component ram16x1s generic( INIT : BIT_VECTOR(15 downto 0) := X"0000"); port( O : out std_ulogic; A0 : in std_ulogic; A1 : in std_ulogic; A2 : in std_ulogic; A3 : in std_ulogic; D : in std_ulogic; WCLK : in std_ulogic; WE : in std_ulogic); end component;
46ECE 645 – Computer Arithmetic RAM 16x1 (3) begin RAM_16X1_S_1: ram16x1s generic map (INIT => X"F0C1") port map (O=>DATA_OUT, A0=>ADDR(0), A1=>ADDR(1), A2=>ADDR(2), A3=>ADDR(3), D=>DATA_IN, WCLK=>CLK, WE=>WE ); end RAM_16X1_DISTRIBUTED_STRUCTURAL;
47ECE 645 – Computer Arithmetic RAM 16x8 (1) library IEEE; use IEEE.STD_LOGIC_1164.all; library UNISIM; use UNISIM.all; entity RAM_16X8_DISTRIBUTED is port( CLK : in STD_LOGIC; WE : in STD_LOGIC; ADDR : in STD_LOGIC_VECTOR(3 downto 0); DATA_IN : in STD_LOGIC_VECTOR(7 downto 0); DATA_OUT : out STD_LOGIC_VECTOR(7 downto 0) ); end RAM_16X8_DISTRIBUTED;
48ECE 645 – Computer Arithmetic RAM 16x8 (2) architecture RAM_16X8_DISTRIBUTED_STRUCTURAL of RAM_16X8_DISTRIBUTED is attribute INIT : string; attribute INIT of RAM16X1_S_1: label is "0000"; -- Component declaration of the "ram16x1s(ram16x1s_v)" unit -- File name contains "ram16x1s" entity:./src/unisim_vital.vhd component ram16x1s generic( INIT : BIT_VECTOR(15 downto 0) := X"0000"); port( O : out std_ulogic; A0 : in std_ulogic; A1 : in std_ulogic; A2 : in std_ulogic; A3 : in std_ulogic; D : in std_ulogic; WCLK : in std_ulogic; WE : in std_ulogic); end component;
49ECE 645 – Computer Arithmetic RAM 16x8 (3) begin GENERATE_MEMORY: for I in 0 to 7 generate RAM_16X1_S_1: ram16x1s generic map (INIT => X"0000") port map (O=>DATA_OUT(I), A0=>ADDR(0), A1=>ADDR(1), A2=>ADDR(2), A3=>ADDR(3), D=>DATA_IN(I), WCLK=>CLK, WE=>WE ); end generate; end RAM_16X8_DISTRIBUTED_STRUCTURAL;
50ECE 645 – Computer Arithmetic ROM 16x1 (1) library IEEE; use IEEE.STD_LOGIC_1164.all; library UNISIM; use UNISIM.all; entity ROM_16X1_DISTRIBUTED is port( ADDR : in STD_LOGIC_VECTOR(3 downto 0); DATA_OUT : out STD_LOGIC ); end ROM_16X1_DISTRIBUTED;
51ECE 645 – Computer Arithmetic ROM 16x1 (2) architecture ROM_16X1_DISTRIBUTED_STRUCTURAL of ROM_16X1_DISTRIBUTED is attribute INIT : string; attribute INIT of ROM16X1_S_1: label is "F0C1"; component ram16x1s generic( INIT : BIT_VECTOR(15 downto 0) := X"0000"); port( O : out std_ulogic; A0 : in std_ulogic; A1 : in std_ulogic; A2 : in std_ulogic; A3 : in std_ulogic; D : in std_ulogic; WCLK : in std_ulogic; WE : in std_ulogic); end component; signal Low : std_ulogic := ‘0’;
52ECE 645 – Computer Arithmetic ROM 16x1 (3) begin ROM_16X1_S_1: ram16x1s generic map (INIT => X"F0C1") port map (O=>DATA_OUT, A0=>ADDR(0), A1=>ADDR(1), A2=>ADDR(2), A3=>ADDR(3), D=>Low, WCLK=>Low, WE=>Low ); end ROM_16X1_DISTRIBUTED_STRUCTURAL;