ECE 448 Lecture 7 FPGA Devices ECE 448 – FPGA and ASIC Design with VHDL
Reading
Required: P. Chu, FPGA Prototyping by VHDL Examples, Chapter 2.2, FPGA
Recommended: S. Brown and Z. Vranesic, Fundamentals of Digital Logic with VHDL Design, Chapter 3.6.5, Field-Programmable Gate Arrays
Recommended Reading
Xilinx, Inc., Spartan-3E FPGA Family
- Module 1: Introduction (Features, Architectural Overview, Package Marking)
- Module 2: Configurable Logic Block (CLB) and Slice Resources; Dedicated Multipliers
Required Reading
Xilinx, Inc., Spartan-3 Generation FPGA User Guide: Extended Spartan-3A, Spartan-3E, and Spartan-3 FPGA Families
- Chapter 5: Using Configurable Logic Blocks (CLBs)
- Chapter 6: Using Look-Up Tables as Distributed RAM
- Chapter 7: Using Look-Up Tables as Shift Registers (SRL16) [up to Library Primitives]
Two competing implementation approaches
ASIC (Application Specific Integrated Circuit):
- designed all the way from behavioral description to physical layout
- designs must be sent for expensive and time-consuming fabrication in a semiconductor foundry
FPGA (Field Programmable Gate Array):
- no physical layout design; the design ends with a bitstream used to configure a device
- bought off the shelf and reconfigured by designers themselves
What is an FPGA?
[Figure: FPGA architecture showing Configurable Logic Blocks, I/O Blocks, and Block RAMs]
Which Way to Go?
ASICs: high performance; low power; low cost in high volumes
FPGAs: off-the-shelf; low development cost; short time to market; reconfigurability
Other FPGA Advantages
- The manufacturing cycle for an ASIC is very costly and lengthy, and it engages lots of manpower
- Mistakes not detected at design time have a large impact on development time and cost
- FPGAs are perfect for rapid prototyping of digital circuits
- Easy upgrades, as in the case of software
- Unique applications: reconfigurable computing
Major FPGA Vendors
SRAM-based FPGAs:
- Xilinx, Inc. (~51% of the market)
- Altera Corp. (~34% of the market)
- Lattice Semiconductor
- Atmel
Xilinx and Altera together: ~85% of the market
Flash & antifuse FPGAs:
- Actel Corp. (Microsemi SoC Products Group)
- QuickLogic Corp.
Xilinx
- Primary products: FPGAs and the associated CAD software (Programmable Logic Devices; ISE Alliance and Foundation Series Design Software)
- Main headquarters in San Jose, CA
- Fabless* semiconductor and software company; devices are manufactured by UMC (Taiwan), Seiko Epson (Japan), TSMC (Taiwan), and Samsung (Korea)
*Xilinx acquired an equity stake in UMC in 1996
Xilinx FPGA Families
High-performance families:
- Virtex (220 nm)
- Virtex-E, Virtex-EM (180 nm)
- Virtex-II (130 nm)
- Virtex-II PRO (130 nm)
- Virtex-4 (90 nm)
- Virtex-5 (65 nm)
- Virtex-6 (40 nm)
- Virtex-7 (28 nm)
Low-cost families:
- Spartan/XL – derived from XC4000
- Spartan-II – derived from Virtex
- Spartan-IIE – derived from Virtex-E
- Spartan-3 (90 nm)
- Spartan-3E (90 nm) – logic optimized
- Spartan-3A (90 nm) – I/O optimized
- Spartan-3AN (90 nm) – non-volatile
- Spartan-3A DSP (90 nm) – DSP optimized
- Spartan-6 (45 nm)
- Artix-7 (28 nm)
CLB Structure
General structure of an FPGA
The Design Warrior’s Guide to FPGAs Devices, Tools, and Flows. ISBN 0750676043 Copyright © 2004 Mentor Graphics Corp. (www.mentor.com)
Xilinx Spartan 3E CLB
The Design Warrior’s Guide to FPGAs Devices, Tools, and Flows. ISBN 0750676043 Copyright © 2004 Mentor Graphics Corp. (www.mentor.com)
CLB Slice = 2 Logic Cells
[Figure: one slice with two look-up tables (inputs F1-F4 and G1-G4), carry & control logic, and two storage elements (D, Q, CK, EC, SR); slice signals include F5IN, BY, CIN, COUT, CLK, and CE]
Each slice contains two 4-input look-up tables (LUTs), carry & control logic, and two storage elements. In Spartan-3 and Spartan-3E devices, a CLB contains four slices. Older Xilinx families such as Virtex and Spartan-II used two slices per CLB and associated two 3-state buffers with each CLB, accessible by all CLB outputs. These dedicated resources for on-chip 3-state bussing (which Xilinx was then the only major FPGA vendor to provide) could increase performance and lower CLB utilization for wide multiplexing functions, and the internal bus could also be extended off chip.
Xilinx Multipurpose LUT (MLUT) 16 x 1 ROM (logic) The Design Warrior’s Guide to FPGAs Devices, Tools, and Flows. ISBN 0750676043 Copyright © 2004 Mentor Graphics Corp. (www.mentor.com)
CLB Structure
CLB Slice Structure
Each slice contains two sets of the following:
- Four-input LUT: any 4-input logic function, or 16 x 1 synchronous RAM (SLICEM only), or 16-bit shift register (SLICEM only)
- Carry & control: fast arithmetic logic, multiplier logic, multiplexer logic
- Storage element: latch or flip-flop; set and reset; true or inverted inputs; synchronous or asynchronous control
In Spartan-3 generation devices, four slices form a CLB. The slices can be used independently or together for wider logic functions. Within each slice, the LUT and the flip-flop can be used for the same function or for independent functions. The flip-flops do not restrict designers to having only a set or only a clear, and for more ASIC-like flows a flip-flop can be used as a latch, so designers do not need to re-code the design for the device architecture.
Multipurpose Look-Up Table (MLUT)
MLUT as 16x1 ROM 16 x 1 ROM (logic) The Design Warrior’s Guide to FPGAs Devices, Tools, and Flows. ISBN 0750676043 Copyright © 2004 Mentor Graphics Corp. (www.mentor.com)
LUT (Look-Up Table) in the Basic ROM Mode
- Look-up tables are the primary elements for logic implementation
- Each LUT can implement any function of 4 inputs: the 4 inputs act as an address into 16 stored bits (a 16 x 1 ROM) that hold the function's truth table
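A minimal VHDL sketch of the ROM mode, assuming an example function (4-input odd parity) and an entity name (`lut4_rom`) chosen only for illustration: the function is written directly as its 16-entry truth table, which is what the LUT stores, and the four inputs serve as the read address.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: a 4-input function stored as a 16 x 1 truth table,
-- which is what a LUT holds in its basic ROM mode.
entity lut4_rom is
  port (
    a : in  std_logic_vector(3 downto 0);  -- the four LUT inputs (read address)
    y : out std_logic);                     -- the LUT output
end entity lut4_rom;

architecture rtl of lut4_rom is
  -- Bit i of INIT is the function value for input combination a = i.
  -- Here: odd parity, y = a(3) xor a(2) xor a(1) xor a(0)  (INIT = x"6996").
  constant INIT : std_logic_vector(15 downto 0) := "0110100110010110";
begin
  y <= INIT(to_integer(unsigned(a)));  -- table look-up
end architecture rtl;
```

Changing only the INIT constant changes the implemented function while the hardware stays the same; in practice a synthesis tool derives such a table itself from an ordinary Boolean expression.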
5-Input Functions Implemented Using Two LUTs
- One CLB slice can implement any function of 5 inputs
- The logic function is partitioned between two LUTs, each computing a function of the same 4 inputs
- The F5 multiplexer, controlled by the fifth input, selects the output of one of the LUTs
5-Input Functions Implemented Using Two LUTs
[Figure: two LUTs whose outputs feed the F5 multiplexer, which produces OUT]
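The partitioning is Shannon's expansion on the fifth input, $f = \overline{e}\,f_0 + e\,f_1$, where $f_0$ and $f_1$ are 4-input functions. A sketch under assumed names (`maj5_f5mux`, `INIT_F`, `INIT_G`), using a 5-input majority function as the example: LUT F holds the cofactor for e = '0', LUT G holds the cofactor for e = '1', and the final conditional assignment plays the role of the F5 multiplexer (MUXF5 in Xilinx libraries).

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: a 5-input majority function built the way a slice
-- builds any 5-input function: two 4-input LUTs plus the F5 multiplexer.
entity maj5_f5mux is
  port (
    abcd : in  std_logic_vector(3 downto 0);  -- inputs shared by both LUTs
    e    : in  std_logic;                     -- fifth input, drives the F5MUX select
    y    : out std_logic);
end entity maj5_f5mux;

architecture rtl of maj5_f5mux is
  -- Cofactor for e = '0': at least 3 of a, b, c, d are '1'   (x"E880").
  constant INIT_F : std_logic_vector(15 downto 0) := "1110100010000000";
  -- Cofactor for e = '1': at least 2 of a, b, c, d are '1'   (x"FEE8").
  constant INIT_G : std_logic_vector(15 downto 0) := "1111111011101000";
  signal f_lut, g_lut : std_logic;
begin
  f_lut <= INIT_F(to_integer(unsigned(abcd)));  -- LUT F
  g_lut <= INIT_G(to_integer(unsigned(abcd)));  -- LUT G
  y     <= g_lut when e = '1' else f_lut;       -- F5 multiplexer
end architecture rtl;
```

Any other 5-input function fits the same structure; only the two INIT constants change.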
MLUT as 16x1 RAM 16 x 1 ROM (logic) The Design Warrior’s Guide to FPGAs Devices, Tools, and Flows. ISBN 0750676043 Copyright © 2004 Mentor Graphics Corp. (www.mentor.com)
Distributed RAM
[Figure: CLB LUT configurable as distributed RAM; library primitives RAM16X1S (single-port 16x1), RAM32X1S (single-port 32x1), RAM16X2S (single-port 16x2), and RAM16X1D (dual-port 16x1, read address DPRA0-DPRA3, outputs SPO and DPO)]
- A single LUT equals 16x1 RAM
- Two LUTs implement single- and dual-port RAMs
- Cascade LUTs to increase RAM size
- Synchronous write
- Synchronous/asynchronous read
- Accompanying flip-flops used for synchronous read
When the CLB LUT is configured as memory, one LUT implements a 16x1 single-port RAM, and two LUTs implement a 16x1 dual-port RAM. LUTs can be cascaded for the desired memory depth and width. The write operation is synchronous. The read operation is asynchronous, and it can be made synchronous by using the flip-flops that accompany the LUTs in the slice. Distributed RAM is compact and fast, which makes it ideal for small RAM-based functions.
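Distributed RAM is usually obtained by inference from a behavioral description. The sketch below assumes a 16-word RAM with a hypothetical entity name (`dist_ram16`) and an 8-bit default width: the write is synchronous and the read is a plain concurrent assignment, i.e. asynchronous, which is the combination a LUT RAM supports directly. With WIDTH = 8, a tool could map it onto eight 16x1 LUT RAMs, one per bit.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 16 x WIDTH RAM: synchronous write, asynchronous read.
entity dist_ram16 is
  generic (WIDTH : positive := 8);
  port (
    clk  : in  std_logic;
    we   : in  std_logic;                           -- write enable
    addr : in  std_logic_vector(3 downto 0);        -- 16 words
    din  : in  std_logic_vector(WIDTH-1 downto 0);
    dout : out std_logic_vector(WIDTH-1 downto 0));
end entity dist_ram16;

architecture rtl of dist_ram16 is
  type ram_t is array (0 to 15) of std_logic_vector(WIDTH-1 downto 0);
  signal ram : ram_t := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(unsigned(addr))) <= din;  -- synchronous write
      end if;
    end if;
  end process;

  dout <= ram(to_integer(unsigned(addr)));       -- asynchronous read
end architecture rtl;
```

Registering `dout` in the clocked process instead would give the synchronous read that uses the slice flip-flops.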
MLUT as 16-bit Shift Register (SRL16) 16 x 1 ROM (logic) The Design Warrior’s Guide to FPGAs Devices, Tools, and Flows. ISBN 0750676043 Copyright © 2004 Mentor Graphics Corp. (www.mentor.com)
Shift Register
[Figure: LUT configured as a shift register with inputs IN, CE, CLK, and DEPTH[3:0] and output OUT, optionally followed by the slice flip-flop (Q)]
Each LUT can be configured as a shift register:
- Serial in, serial out
- Dynamically addressable delay up to 16 cycles
- For programmable pipelines
- Cascade for greater cycle delays
- Use CLB flip-flops to add depth
The LUT can be configured as a shift register (serial in, serial out) whose length is programmable from 1 to 16 bits. The output is taken from stage DEPTH + 1; for example, DEPTH[3:0] = 0010 (binary) means that the shift register is 3 bits long. In the simplest case, a 16-bit shift register can be implemented in a single LUT, eliminating the need for 16 flip-flops and for the extra routing resources that would otherwise lower performance.
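A behavioral sketch of a dynamically addressable SRL16, assuming hypothetical names (`srl16_dyn`, `sr`): the register has no reset, shifts only when CE is high, and the DEPTH input selects the tap, so DEPTH = k delays the input by k + 1 clock cycles (DEPTH = 0010 gives the 3-cycle delay described above).

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical addressable shift register with SRL16-style behavior.
entity srl16_dyn is
  port (
    clk   : in  std_logic;
    ce    : in  std_logic;                     -- clock enable
    d     : in  std_logic;                     -- serial input
    depth : in  std_logic_vector(3 downto 0);  -- DEPTH[3:0]: tap select
    q     : out std_logic);                    -- serial output
end entity srl16_dyn;

architecture rtl of srl16_dyn is
  -- No reset: a reset would prevent mapping the 16 stages into one LUT.
  signal sr : std_logic_vector(15 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if ce = '1' then
        sr <= sr(14 downto 0) & d;  -- sr(0) is stage 1, sr(15) is stage 16
      end if;
    end if;
  end process;

  -- Tap DEPTH = k is the output of stage k + 1: d delayed by k + 1 cycles.
  q <= sr(to_integer(unsigned(depth)));
end architecture rtl;
```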
Using Multipurpose Look-Up Tables in the Shift Register Mode (SRL16)
Inferred from behavioral descriptions in VHDL of shift registers with:
- one serial input and one serial output
- no reset and no set
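The simplest case, a fixed 16-bit serial-in, serial-out register, can be written as follows. This is a sketch with an assumed entity name (`shift16`) that follows the conditions listed above: one serial input, one serial output, no reset and no set, so a synthesis tool can map all 16 stages onto one LUT instead of 16 flip-flops.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 16-bit serial-in, serial-out shift register (SRL16 candidate).
entity shift16 is
  port (
    clk  : in  std_logic;
    sin  : in  std_logic;   -- serial input
    sout : out std_logic);  -- serial output
end entity shift16;

architecture behavioral of shift16 is
  -- No reset and no set; only the last stage is visible outside.
  signal r : std_logic_vector(15 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      r <= r(14 downto 0) & sin;  -- shift one position per clock
    end if;
  end process;

  sout <= r(15);  -- input delayed by 16 clock cycles
end architecture behavioral;
```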
Cascading LUT Shift Registers into Shift Registers Longer than 16 bits
Shift Register
[Figure: register-rich FPGA; parallel data paths through Operations A, B, and C (figure labels: 64; 4, 8, 3, and 12 cycles) with a 9-cycle imbalance between the paths]
- A register-rich FPGA allows for the addition of pipeline stages to increase throughput
- Data paths must be balanced to keep the desired functionality
In this example there is a cycle imbalance that must be fixed: as seen on the slide, the logic will be off by nine clock cycles. A shift register can fix this by delaying the shorter path by the missing nine cycles.
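One way to remove the 9-cycle imbalance is a delay line of nine reset-free register stages on the shorter path. A sketch assuming an 8-bit path and hypothetical names (`delay_line`, `DEPTH`, `WIDTH`): because there is no reset and only the last stage is read, each bit of the delay can map onto a single SRL16, so the fix could cost about eight LUTs rather than 72 flip-flops. Delays longer than 16 cycles would be built by cascading LUTs.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical pipeline-balancing delay line.
entity delay_line is
  generic (
    DEPTH : positive := 9;   -- delay in clock cycles
    WIDTH : positive := 8);  -- data path width
  port (
    clk  : in  std_logic;
    din  : in  std_logic_vector(WIDTH-1 downto 0);
    dout : out std_logic_vector(WIDTH-1 downto 0));
end entity delay_line;

architecture rtl of delay_line is
  type pipe_t is array (0 to DEPTH-1) of std_logic_vector(WIDTH-1 downto 0);
  signal pipe : pipe_t := (others => (others => '0'));  -- no reset
begin
  process (clk)
  begin
    if rising_edge(clk) then
      pipe(0) <= din;
      for i in 1 to DEPTH-1 loop
        pipe(i) <= pipe(i-1);  -- advance every stage by one
      end loop;
    end if;
  end process;

  dout <= pipe(DEPTH-1);  -- din delayed by DEPTH cycles
end architecture rtl;
```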
Logic Cell = ½ of a CLB Slice
CLB Slice = 2 Logic Cells
Examples: Determine the amount of Spartan 3 resources needed to implement a given circuit
Circuit 1: Top level
[Figure: top-level block diagram; signals m, run, clk, a, b, c, d, w, y; registers R0-R7; function block F]
Circuit 1: F – function
[Figure: internal structure of F, built from a 2-to-4 decoder (inputs w1, w0, En), a full adder (inputs x, y, cin; outputs s, cout), and <<<3 blocks; signals a-h, x3-x0, y3-y0]
Circuit 2: Top level
[Figure: top-level block diagram; signals run, clk, a, b, c, d, e, y, z; registers R0-R15; function block F]
Circuit 2: F – function
[Figure: internal structure of F, built from a priority encoder (inputs w3-w0; outputs y1, y0, z), a half adder (inputs x, y; outputs s, cout), and >>2 blocks; signals a-i, x3-x0, y3-y0]
Carry & Control Logic
[Figure: CLB slice diagram with the carry & control logic highlighted]
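Full-adder
[Figure: full adder FA with inputs x, y, cin and outputs s, cout]
$x + y + c_{in} = (c_{out}\,s)_2 = 2\,c_{out} + s$

x y cin | cout s
0 0 0   |  0   0
0 0 1   |  0   1
0 1 0   |  0   1
0 1 1   |  1   0
1 0 0   |  0   1
1 0 1   |  1   0
1 1 0   |  1   0
1 1 1   |  1   1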
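The equation $x + y + c_{in} = 2\,c_{out} + s$ translates directly into VHDL. A minimal dataflow sketch with an assumed entity name (`full_adder`): the sum is the XOR of the three inputs, and the carry out is their majority function.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical dataflow full adder.
entity full_adder is
  port (
    x, y, cin : in  std_logic;
    s, cout   : out std_logic);
end entity full_adder;

architecture dataflow of full_adder is
begin
  s    <= x xor y xor cin;                          -- sum bit
  cout <= (x and y) or (x and cin) or (y and cin);  -- majority = carry out
end architecture dataflow;
```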
Full-adder: alternative implementations
[Figure: alternative gate-level implementations of the full adder using cin as a select signal; inputs x, y, cin; outputs cout, s]
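Full-adder: alternative implementations
Implementation used to generate fast carry logic in Xilinx FPGAs
[Figure: a LUT (inputs A2, A1 driven by x, y) computes p; a 2-to-1 multiplexer selected by p passes cin (p = 1) or g (p = 0) to Cout; an XOR gate produces S]
$p = x \oplus y$
$g = y$
$s = p \oplus c_{in} = x \oplus y \oplus c_{in}$
$c_{out} = p\,c_{in} + \overline{p}\,g$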
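The same equations, written the way the Xilinx carry logic computes them, give a sketch that can be checked against the arithmetic definition. The entity name `fa_carry_mux` is hypothetical; in the device, p comes from the LUT, while the multiplexer and the final XOR correspond to the dedicated carry primitives (MUXCY and XORCY in Xilinx libraries).

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical full adder in propagate/multiplexer form.
entity fa_carry_mux is
  port (
    x, y, cin : in  std_logic;
    s, cout   : out std_logic);
end entity fa_carry_mux;

architecture rtl of fa_carry_mux is
  signal p, g : std_logic;
begin
  p    <= x xor y;                  -- propagate, computed in the LUT
  g    <= y;                        -- when p = '0', x = y, so y is the carry
  s    <= p xor cin;                -- XOR stage (XORCY)
  cout <= cin when p = '1' else g;  -- carry multiplexer (MUXCY)
end architecture rtl;
```

Only p needs a LUT; the carry itself passes through a single 2-to-1 multiplexer per bit, which is what keeps the chain fast.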
Carry & Control Logic in Spartan 3 FPGAs
[Figure: slice carry & control logic, showing the LUT and the hardwired (fast) carry logic]
Critical Path for an Adder Implemented Using Xilinx Spartan 3/Spartan 3E FPGAs
Number and Length of Carry Chains for Spartan 3E FPGAs
Bottom Operand Input to Carry Out Delay: T_OPCYF = 0.9 ns for Spartan 3
Carry Propagation Delay: t_BYP = 0.2 ns for Spartan 3
Carry Input to Top Sum Combinational Output Delay: T_CINY = 1.2 ns for Spartan 3
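A rough worked estimate of the adder's critical path, as a sketch assuming two adder bits per slice (so an n-bit adder, n even and at least 4, spans n/2 slices), that the carry enters the chain through the bottom operand of the first slice and leaves through the top sum output of the last slice, and that routing delays into and out of the chain are ignored:

$$t_{add}(n) \approx T_{OPCYF} + \left(\frac{n}{2} - 2\right) t_{BYP} + T_{CINY}$$

For a 16-bit adder on Spartan-3 this gives $0.9 + 6 \times 0.2 + 1.2 = 3.3$ ns. Each additional pair of bits adds only one $t_{BYP} = 0.2$ ns, which is why the dedicated carry chain keeps wide adders fast.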
Fast Carry Logic
- Each CLB contains separate logic and routing for the fast generation of sum & carry signals
- Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters
- Carry logic is independent of normal logic and routing resources
[Figure: dedicated carry logic and routing running through the CLBs from the LSB to the MSB]
Accessing Carry Logic
All major synthesis tools can infer carry logic for arithmetic functions:
- Addition (SUM <= A + B)
- Subtraction (DIFF <= A - B)
- Comparators (if A < B then …)
- Counters (count <= count + 1)
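A sketch showing each of the four constructs on 8-bit operands, assuming a hypothetical entity `arith_examples` and `numeric_std` unsigned types; written this way, each operation is a candidate for the dedicated carry chain. The sum and difference keep 8 bits, so they wrap modulo 256.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical collection of carry-chain candidates.
entity arith_examples is
  port (
    clk    : in  std_logic;
    a, b   : in  unsigned(7 downto 0);
    sum    : out unsigned(7 downto 0);
    diff   : out unsigned(7 downto 0);
    a_lt_b : out std_logic;
    count  : out unsigned(7 downto 0));
end entity arith_examples;

architecture rtl of arith_examples is
  signal cnt : unsigned(7 downto 0) := (others => '0');
begin
  sum    <= a + b;                      -- adder
  diff   <= a - b;                      -- subtractor
  a_lt_b <= '1' when a < b else '0';    -- comparator

  process (clk)
  begin
    if rising_edge(clk) then
      cnt <= cnt + 1;                   -- counter
    end if;
  end process;

  count <= cnt;
end architecture rtl;
```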
Embedded Multipliers
RAM Blocks and Multipliers in Xilinx FPGAs
The Design Warrior’s Guide to FPGAs Devices, Tools, and Flows. ISBN 0750676043 Copyright © 2004 Mentor Graphics Corp. (www.mentor.com)
Combinational and Registered Multiplier
Dedicated Multiplier Block
Interface of a Dedicated Multiplier
3 Ways to Use Dedicated Hardware
Three (3) ways to use dedicated (embedded) hardware:
- Inference
- Instantiation
- CORE Generator
Inferred Multiplier
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mult18x18 is
  generic (
    word_size   : natural := 17;
    signed_mult : boolean := true);
  port (
    clk : in  std_logic;
    a   : in  std_logic_vector(1*word_size-1 downto 0);
    b   : in  std_logic_vector(1*word_size-1 downto 0);
    c   : out std_logic_vector(2*word_size-1 downto 0));
end entity mult18x18;

architecture infer of mult18x18 is
begin
  -- Registered multiplier: the product is captured on the rising clock edge.
  process(clk)
  begin
    if rising_edge(clk) then
      -- signed_mult selects a two's complement or an unsigned interpretation
      if signed_mult then
        c <= std_logic_vector(signed(a) * signed(b));
      else
        c <= std_logic_vector(unsigned(a) * unsigned(b));
      end if;
    end if;
  end process;
end architecture infer;
Unsigned vs. Signed Multiplication
Unsigned: 1111 x 1111 = 11100001 (15 x 15 = 225)
Signed: 1111 x 1111 = 00000001 (-1 x -1 = 1)
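The difference comes only from how the same bit patterns are interpreted. A minimal sketch (the entity name `mult4_demo` is hypothetical) that multiplies one pair of 4-bit vectors both ways with `numeric_std`; in both cases the product width is the sum of the operand widths.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical demo: the same bits multiplied as unsigned and as signed.
entity mult4_demo is
  port (
    a, b : in  std_logic_vector(3 downto 0);
    p_u  : out std_logic_vector(7 downto 0);   -- a, b read as unsigned
    p_s  : out std_logic_vector(7 downto 0));  -- a, b read as two's complement
end entity mult4_demo;

architecture rtl of mult4_demo is
begin
  p_u <= std_logic_vector(unsigned(a) * unsigned(b));  -- 4 + 4 = 8 bits
  p_s <= std_logic_vector(signed(a) * signed(b));      -- 4 + 4 = 8 bits
end architecture rtl;
```

With a = b = "1111", p_u is "11100001" and p_s is "00000001", matching the slide.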
Forcing a particular implementation in VHDL
Synthesis tool: Xilinx XST
attribute MULT_STYLE : string;
attribute MULT_STYLE of mult18x18 : entity is "block";
Allowed values of the attribute:
- block – dedicated multiplier
- lut – LUT-based multiplier
- pipe_block – pipelined dedicated multiplier
- pipe_lut – pipelined LUT-based multiplier
- auto – automatic choice by the synthesis tool
CORE Generator
FPGA Block RAM
Block RAM
[Figure: Spartan-3 dual-port block RAM with independent Port A and Port B]
- Most efficient memory implementation
- Dedicated blocks of memory
- Ideal for most memory requirements
- 4 to 36 memory blocks in Spartan-3E
- 18 kbits = 18,432 bits per block (16 kbits = 16,384 bits without parity bits)
- Use multiple blocks for larger memories
- Builds both single-port and true dual-port RAMs
- Synchronous write and read (different from distributed RAM)
The block RAM is true dual-port, which means it has two independent read/write ports that can be read and/or written simultaneously, independently of each other. All control logic is implemented within the RAM, so no additional CLB logic is required to implement a dual-port configuration. The Altera 10KE and ACEX 1K families have only simple two-port RAM (one read port and one write port); to emulate true dual-port capability, they need twice the number of memory blocks, at half the performance.
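Unlike distributed RAM, block RAM reads synchronously, so an inferred block RAM registers the read inside the clocked process. A sketch assuming a 1024 x 16 single-port RAM (which fits in one block in the 1024 x (16+2) configuration, leaving the parity bits unused) and hypothetical names (`bram_sp`, `ADDR_BITS`, `WIDTH`). Because the read and the write sit in the same clocked process, a read during a write returns the old contents (read-first behavior).

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical single-port RAM written in a block-RAM-friendly style.
entity bram_sp is
  generic (
    ADDR_BITS : positive := 10;   -- 1024 words
    WIDTH     : positive := 16);  -- 16 data bits per word
  port (
    clk  : in  std_logic;
    we   : in  std_logic;
    addr : in  std_logic_vector(ADDR_BITS-1 downto 0);
    din  : in  std_logic_vector(WIDTH-1 downto 0);
    dout : out std_logic_vector(WIDTH-1 downto 0));
end entity bram_sp;

architecture rtl of bram_sp is
  type ram_t is array (0 to 2**ADDR_BITS-1) of std_logic_vector(WIDTH-1 downto 0);
  signal ram : ram_t := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(unsigned(addr))) <= din;  -- synchronous write
      end if;
      dout <= ram(to_integer(unsigned(addr)));   -- synchronous (registered) read
    end if;
  end process;
end architecture rtl;
```

If the read assignment were moved outside the clocked process, the description would no longer match block RAM, and a tool would typically fall back to distributed RAM.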
Spartan-3E Block RAM Amounts
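Block RAM can have various configurations (port aspect ratios)
Organization   | Data width (+ parity) | Address range
16k x 1        | 1                     | 0 to 16,383
8k x 2         | 2                     | 0 to 8,191
4k x 4         | 4                     | 0 to 4,095
2k x (8+1)     | 8 + 1                 | 0 to 2,047
1024 x (16+2)  | 16 + 2                | 0 to 1,023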
Block RAM Port Aspect Ratios
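Single-Port Block RAM
[Figure: single-port block RAM primitive; data input DI[w-p-1:0] and data output DO[w-p-1:0], where w is the port width including parity and p is the number of parity bits]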
Dual-Port Block RAM
[Figure: dual-port block RAM primitive; Port A data DIA[wA-pA-1:0] and DOA[wA-pA-1:0]; Port B data DIB[wB-pB-1:0] and DOB[wB-pB-1:0]]
Input/Output Blocks (IOBs)
Basic I/O Block Structure
[Figure: IOB with three paths. Three-state path: flip-flop with enable (EC) and set/reset (SR) producing the three-state control. Output path: flip-flop with enable and set/reset. Input path: direct input, or registered input through a flip-flop with enable and set/reset]
IOB Functionality
- The IOB provides the interface between the package pins and the CLBs
- Each IOB can work as a uni- or bi-directional I/O
- Outputs can be forced into high impedance
- Inputs and outputs can be registered (advised for high-performance I/O)
- Inputs can be delayed
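A sketch of a registered bidirectional pin, assuming hypothetical names (`bidir_pin`, `pad`, `oe`): the output value, the three-state control, and the input sample are each held in a flip-flop, matching the three IOB paths, and the pin is released to high impedance ('Z') when the registered output enable is low. Whether these registers are actually packed into the IOB flip-flops depends on the synthesis and implementation options or constraints used.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical registered bidirectional I/O.
entity bidir_pin is
  port (
    clk      : in    std_logic;
    oe       : in    std_logic;   -- output enable ('1' = drive the pin)
    data_out : in    std_logic;   -- value to drive from the fabric
    data_in  : out   std_logic;   -- registered value read from the pin
    pad      : inout std_logic);  -- the package pin
end entity bidir_pin;

architecture rtl of bidir_pin is
  signal out_reg, oe_reg, in_reg : std_logic := '0';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      out_reg <= data_out;  -- output path register
      oe_reg  <= oe;        -- three-state path register
      in_reg  <= pad;       -- input path register
    end if;
  end process;

  pad     <= out_reg when oe_reg = '1' else 'Z';  -- high impedance when disabled
  data_in <= in_reg;
end architecture rtl;
```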
Spartan-3E Family Attributes
Spartan-3E FPGA Family Members
FPGA Nomenclature
FPGA device present on the Digilent Basys2 board: XC3S100E-4CP132
- XC3S100E: Spartan-3E family, 100 k equivalent logic gates
- -4: speed grade (standard performance)
- CP132: package type, 132 pins