1
Xilinx FPGA Architecture Overview
This customer presentation provides an overview of the Xilinx FPGA architecture. The example used is the Virtex/Spartan-II architecture, but the information is generic enough for any Xilinx FPGA family.
2
Virtex/Spartan-II Top-level Architecture
Gate-array-like architecture
Configurable logic blocks
  Implement logic here!
I/O blocks
  16 signal standards
Block RAM
  On-chip memory for higher performance
Clocks & Delay-Locked Loop
Interconnect resources
Three-state internal buses
3
Logic Cell Capacity
A better first-order alternative to gate counting
  Better comparisons among different FPGAs
Logic cell definition: 4-input look-up table + dedicated flip-flop
Logic cells per CLB:
  XC4000/Spartan (2 4-LUTs, 1 3-LUT, 2 FFs)
  Virtex/Spartan-II (4 4-LUTs, 1 F5MUX, 4 FFs)
Counting CLBs provides a method of comparing different members of a common family, but counting logic cells is more effective for comparing across different architectures. A logic cell is the combination of a lookup table and a flip-flop, which is the common building block of all leading FPGAs.
4
Configurable Logic Block (CLB)
Combinational logic generated in a lookup table (LUT)
  Any function of available inputs
LUT output feeds CLB output or D input of flip-flop
[Figure: inputs feed a combinational logic function (LUT), which drives the flip-flop and the outputs]
5
Virtex/Spartan-II Function Generators
Four 4-input function generators
  Independent inputs (4 functions of 4 inputs)
MUXF5 combines 2 LUTs to form a 4x1 multiplexer
  Or any 5-input function
MUXF6 combines 2 slices to form an 8x1 multiplexer
  Or any 6-input function
[Figure: CLB with two slices; each slice contains two LUTs and a MUXF5, and a MUXF6 combines the two slices]
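The MUXF5 idea can be sketched in software: split a 5-input function on its fifth input, store each half in its own 16-entry table, and let a 2:1 multiplexer pick between the two table outputs. This is a minimal illustration only; the helper names (`build_lut4`, `read_lut4`, `five_input`) are hypothetical and do not correspond to any Xilinx primitive or tool API.

```python
from itertools import product

def build_lut4(func):
    """Precompute the 16-entry truth table of a 4-input function."""
    return [func(*[(addr >> i) & 1 for i in range(4)]) & 1 for addr in range(16)]

def read_lut4(table, a, b, c, d):
    """Look up the output: the four inputs form the table address."""
    return table[a | (b << 1) | (c << 2) | (d << 3)]

def five_input(func5):
    """Build a 5-input function from two 4-input tables plus a 2:1 mux."""
    lut_lo = build_lut4(lambda a, b, c, d: func5(a, b, c, d, 0))  # half with e = 0
    lut_hi = build_lut4(lambda a, b, c, d: func5(a, b, c, d, 1))  # half with e = 1

    def evaluate(a, b, c, d, e):
        lo = read_lut4(lut_lo, a, b, c, d)
        hi = read_lut4(lut_hi, a, b, c, d)
        return hi if e else lo  # plays the role of the MUXF5 select

    return evaluate

# Example: 5-input parity, which cannot fit in a single 4-input table.
parity5 = five_input(lambda a, b, c, d, e: a ^ b ^ c ^ d ^ e)
all_match = all(parity5(*bits) == sum(bits) % 2
                for bits in product((0, 1), repeat=5))
```

The same split applied once more, on a sixth input, is the software analogue of MUXF6 combining two slices.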
6
Lookup Table
Generates any function of its inputs
Typically 4 inputs
Logically equivalent to a 16 x 1 ROM
[Figure: LUT with inputs and one output]
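The "16 x 1 ROM" view can be made concrete with a short sketch: the four inputs form an address, and the stored bit at that address is the output. The function names here (`make_lut`, `lut_read`) and the example function are illustrative assumptions, not part of any Xilinx tool.

```python
def make_lut(func, n_inputs=4):
    """Precompute the ROM contents: one output bit per input combination."""
    table = []
    for addr in range(2 ** n_inputs):
        bits = [(addr >> i) & 1 for i in range(n_inputs)]  # bits[0] is input 0
        table.append(func(*bits) & 1)
    return table

def lut_read(table, *inputs):
    """Evaluate the function by addressing the table with the input bits."""
    addr = 0
    for i, bit in enumerate(inputs):
        addr |= (bit & 1) << i
    return table[addr]

# Example function with an inverted input: (a AND b) XOR (c OR NOT d).
# The inversion costs nothing extra; it only changes the stored bits.
table = make_lut(lambda a, b, c, d: (a & b) ^ (c | (1 - d)))
```

However complicated the expression, the table always holds exactly 16 bits, which is why the limit is on the number of inputs rather than on complexity.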
7
Targeting LUT-based Logic
LUT limit is on inputs, not complexity
Reducing inputs/function (fan-in) to fit CLBs improves density and speed
  Automatically done by Xilinx synthesis and implementation tools
Inverters are free
[Figure: CLB lookup table]
8
Duplicating Logic Can Improve Results
Collapsing of logic into CLBs affects the number of levels required, and therefore speed
The gates you use will determine mapping
Nets with a fanout >1 may be outside a CLB
  N1 must go to two places, so O1 may require a second level of logic
Duplicating the first gate allows N1A to always be collapsed inside a single lookup table
[Figure: input I1, net N1 and output O1 before duplication; nets N1A and N1B after duplication]
9
Defining Lookup Tables With Gate Primitives
Example of gate primitive: AND2
Up to five inputs with all combinations of inversion
  AND2B1 indicates 1 “bubbled” or inverted input
Up to nine inputs non-inverted
  Add external INV primitives if desired
10
Flip-Flops
Stores data (D) on rising edge of clock (K)
Clock enable (CE)
Asynchronous clear (C)

K    CE   C    D    Q
x    x    1    x    0
↑    1    0    d    d
0    x    0    x    q

[Figure: flip-flop symbol with D, K, CE and C inputs and Q output]
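The behavior described on this slide (capture D on the rising edge of K when CE is high, with an asynchronous clear that overrides everything) can be modeled in a few lines. This is a behavioral sketch only; the class and argument names are hypothetical and it ignores setup and hold timing.

```python
class DFlipFlop:
    """Behavioral model: rising-edge D flip-flop with clock enable and async clear."""

    def __init__(self):
        self.q = 0
        self._prev_clk = 0  # previous clock level, used to detect a rising edge

    def update(self, clk, d, ce=1, clr=0):
        """Apply one set of input levels and return the resulting Q."""
        if clr:
            self.q = 0                      # clear wins regardless of the clock
        elif clk and not self._prev_clk and ce:
            self.q = d & 1                  # rising edge with CE high: capture D
        self._prev_clk = clk                # otherwise Q simply holds its value
        return self.q

ff = DFlipFlop()
trace = [
    ff.update(clk=0, d=1),          # clock low: Q holds its initial 0
    ff.update(clk=1, d=1),          # rising edge: Q captures 1
    ff.update(clk=0, d=0),          # falling edge: Q holds
    ff.update(clk=1, d=0, ce=0),    # rising edge but CE low: Q holds
    ff.update(clk=1, d=1, clr=1),   # asynchronous clear: Q forced to 0
]
```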
11
Additional Flip-Flop Controls
Reset (Clear) and/or Set
Global initialization (GSR)
  Use to initialize all flip-flops
Programmable clock polarity
Clock enable can be left unconnected
12
Virtex/Spartan-II CLB Slice
1 CLB holds 2 slices
Each slice has two sets of:
  Four-input LUT
    Any 4-input logic function
    Or 16-bit x 1 RAM
    Or 16-bit shift register
  Carry & Control
    Fast arithmetic logic
    Multiplier logic
    Multiplexer logic
  Storage element
    Latch or flip-flop
    Set and reset
    True or inverted inputs
    Sync. or Async. Control
13
Dedicated Multiplier Logic
Highly efficient ‘Shift & Add’ implementation
For a 16x16 multiplier:
  30% reduction in area
  One fewer logic level
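For readers unfamiliar with the term, shift-and-add multiplication forms one partial product per multiplier bit and sums the shifted copies. The sketch below shows only the arithmetic; it does not model the dedicated CLB multiplier logic or carry chain, and the function name is an illustrative assumption.

```python
def shift_add_multiply(a, b, width=16):
    """Multiply two unsigned width-bit values by summing shifted partial products."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    product = 0
    for i in range(width):
        if (b >> i) & 1:          # partial product is 'a' when bit i of b is set
            product += a << i     # shifted left by the bit position, then added
    return product

result = shift_add_multiply(1234, 5678)
```

In hardware, each partial product is an AND of one operand with a single bit of the other, so the cost is dominated by the adders, which is where a dedicated gate in the carry path would save area and a logic level.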
14
On-chip RAM
All Xilinx FPGAs use RAM-based programming
Adding Write Enable to LUT creates on-chip SelectRAM memory
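A small behavioral sketch of that idea: the same 16 storage bits that hold a logic function become a 16 x 1 memory once a write path gated by a write enable is added. The class below is a hypothetical model, with each `write` call standing in for one write-clock edge; it is not the SelectRAM primitive itself.

```python
class Ram16x1:
    """Behavioral model of a 16 x 1 memory built from LUT storage."""

    def __init__(self, init=0):
        # 'init' packs the 16 initial bits, bit i being the cell at address i.
        self.cells = [(init >> i) & 1 for i in range(16)]

    def read(self, addr):
        """Reading is the ordinary LUT lookup: address in, stored bit out."""
        return self.cells[addr & 0xF]

    def write(self, addr, data, we):
        """One write-clock edge: the cell changes only if write enable is high."""
        if we:
            self.cells[addr & 0xF] = data & 1

ram = Ram16x1(init=0x0000)
ram.write(addr=5, data=1, we=1)   # stored
ram.write(addr=6, data=1, we=0)   # ignored, write enable is low
```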
15
SelectRAM Benefits
Single-Port
Dual-Port
Synchronous
  Simple timing
[Figure: single-port SelectRAM with Data, Write Enable, Write Clock, Address and Output; dual-port SelectRAM with Data, Write Enable, Write Clock, Write Address/Single-Port Read Address, Single-Port Output, Dual-Port Read Address and Dual-Port Output]
16
Memory Bandwidth and Flexibility
Virtex/Spartan-II On-Chip SelectRAM+ Memory
[Figure: Memory Continuum, labeled 200 MHz]
Distributed RAM (bytes), shallow/wide
  16x1
  DSP coefficients, small FIFOs
Block RAM (kilobytes), deep/wide
  4Kx1, 2Kx2, 1Kx4, 512x8, 256x16
  Large FIFOs, packet buffers, video line buffers, cache tag memory
External RAM (megabytes)
  SDRAM, ZBTRAM, SSRAM, SGRAM
17
Spartan-II Memory
CLB LUTs provide small distributed RAM (16 bits/LUT)
Block RAM provides 4K bits each
Dual read/write port. Each port has:
  Independent Clock, R/W, and Enable
  Independently configurable data width from 4K x 1 to 256 x 16
[Figure: Spartan-II dual read/write port block RAM, with Port A and Port B each supporting reads (R) and writes (W)]
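The quoted data widths all describe the same 4K bits, just sliced differently: each doubling of the width halves the depth and removes one address bit. A minimal sketch of that arithmetic follows; the function name and the tuple layout are illustrative assumptions.

```python
BLOCK_RAM_BITS = 4096  # 4K bits per block, as stated on the slide

def aspect_ratios(total_bits=BLOCK_RAM_BITS, widths=(1, 2, 4, 8, 16)):
    """Return (depth, width, address_bits) for each supported data width."""
    ratios = []
    for width in widths:
        depth = total_bits // width
        address_bits = depth.bit_length() - 1  # log2 of a power-of-two depth
        ratios.append((depth, width, address_bits))
    return ratios

ratios = aspect_ratios()
```

Because each port's width is configured independently, the two ports can view the same block with different `(depth, width)` pairs from this list.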
18
I/O Block (IOB)
Periphery of identical I/O blocks
  Input, output, or bi-directional
  Direct or registered (or latched input)
  Pullup/Pulldown
  Programmable slew rate
  Three-state output
  Programmable thresholds
[Figure: IOB with I, O and TS signals, clocks, and a pad bonded to a package pin]
19
Use Special IOB Primitives
User explicitly defines what resources in the IOB are to be used
I/Os are defined with:
  1 pad primitive
  At least 1 function primitive
    1 input element, 1 output element, or both
Inverters may also be pulled into IOBs
[Figure: IPAD driving an IBUF]
20
Locking Down I/O Locations
LOC=Pxx attribute defines I/O pad location(s)
Avoid locking IOBs early
  Makes routing more difficult
Use IOB LOC= to lock pins late in the design cycle, once the PCB is built
Can lock IOBs if floorplanning the connected CLBs
21
Use Pullups/Pulldowns
Pullup automatically connected on unused IOBs
User can specify PULLUP or PULLDOWN primitive on used IOBs
Inputs should not be left floating
  Add a pullup to design inputs that may be left floating, to reduce power and noise
[Figure: IPAD driving an IBUF]
22
Faster Setup With NODELAY
Delay included by default
  Compensates for clock routing delay to prevent a hold-time requirement
NODELAY attribute removes the delay element
  Creates a hold-time requirement
[Figure: example IOB in which external data passes through the pad, input buffer and delay element to the D input of the flip-flop, while the external clock reaches the flip-flop through the pad and clock routing delay]
23
Slew Rate Control
Slew rate controls output speed
Default slow slew rate reduces noise & ground bounce
Use fast slew rate wherever speed is important
  FAST parameter on output logic primitive
[Figure: OBUF with the FAST attribute driving an OPAD]
24
Output Three-State Control
Free inverter on output buffer control
Use OBUFE macro for active-high enable
Use OBUFT primitive for active-low enable
[Figure: OBUFE with OE input; OBUFT with T input]
25
Global Three-State
3-state control either local and/or via a dedicated global net
Global three-state controlled by STARTUP... primitive
[Figure: STARTUP symbol with GTS and GSR inputs]
26
Virtex/Spartan-II I/O Block (Simplified)
27
Multiple I/O Interface Standards
16 to 20 I/O interface standards supported
  CMOS, HSTL, SSTL, GTL, CTT, PCI
As many as eight banks on a device
  Package dependent
Different banks can support different standards at the same time
  Logic level translation
  Boards with mixed standards
28
High Performance Routing
Hierarchical routing
  Singles, Hexes, Longs
  Sparse connections on longer interconnects for high speed
Routing delay depends primarily on distance
  Direction independent
  Device-size independent
  Predictable for early design analysis
[Figure: Vector Based Interconnect across the CLB array, with 2 ns delays marked]
29
Flexible General-Purpose Interconnect
Flexible, but slow if a net crosses many channels
Programmable switch matrix at each channel crossing
  Connects across, changes direction, or fans out
30
Switch Matrix
Bidirectional pass transistors
High routing flexibility
31
Reduce Fanout
Higher fanout nets (>16 loads) are harder to route & slower
Consider duplicating the source in the schematic to improve routing or speed
[Figure: flip-flop (D, Q) driving net fn1]
32
Long Lines for High Fanout Nets
Metal lines that traverse length & width of chip
Lowest skew
  Ideal for high fan-out signals
  Ideal for clocking
Requires vertical or horizontal alignment of loads
[Figure: long lines running across the CLB array]
33
Internal Three-State Buses
Two 3-state drivers per CLB
OR-AND logic implementation in place of 3-state drivers
  With no drivers enabled, bus is a logic 1
  Low power
  No danger of contention when multiple BUFTs enabled
  No physical pullups or large capacitance to drive
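One way to picture the OR-AND scheme: each driver contributes `data OR disabled`, and the bus is the AND of all contributions. A disabled driver therefore contributes a 1 and drops out of the AND, and two enabled drivers that disagree simply resolve to 0 instead of fighting electrically. The sketch below assumes an active-low three-state control `t` (0 means the driver is enabled); the function name is hypothetical.

```python
def bus_value(drivers):
    """Resolve an OR-AND emulated three-state bus.

    drivers: iterable of (data, t) pairs, where t = 0 enables the driver
    and t = 1 disables it (assumed active-low control).
    """
    value = 1                      # with nothing enabled, the bus reads logic 1
    for data, t in drivers:
        value &= (data | t) & 1    # OR stage per driver, AND stage across drivers
    return value

idle = bus_value([(0, 1), (1, 1)])        # no driver enabled
single = bus_value([(0, 0), (1, 1)])      # one driver enabled, driving 0
conflict = bus_value([(0, 0), (1, 0)])    # two drivers enabled with different data
```

The `conflict` case resolves to a defined logic level rather than a contention current, which is the property the slide highlights.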
34
General Clock Support
Use clock buffers for highest fanout clocks
  Drive high-speed long line resources
  Lowest skew across a device
  No internal hold times
Use generic BUFG primitive
  Allows software to choose best type of buffer
  Allows easy migration across families
Four dedicated global low-skew buffers
  Dedicated input pin (clock distribution only)
Additional shared resources (i.e., long lines)
  Distribute low-skew/high-fanout signals (10 ns max.)
Four delay-locked loops on each device
  All-digital implementation
  Two global buffers associated with each DLL pair
35
Configuration
Schematic or HDL description is converted to a configuration file by the Xilinx development system
Configuration file is loaded into FPGA on power-up
  Stored in configuration latches
  Controls CLBs, IOBs, interconnect, etc.
Configuration is the process of programming the FPGA. The programming file is often maintained in a PROM on the board and loaded into the FPGA on power-up.
36
Configuration Bitstream
Binary programming file
Length depends only on device, not utilization
Typically 1 µs per bit (total from a few ms to <1 s)
FPGA can load its configuration automatically on power-up, or under microprocessor control
Can be loaded directly into the device or into a configuration PROM
The programming file is called a bitstream. The FPGA programs very quickly after power-up.
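Because bitstream length depends only on the device, configuration time is simple arithmetic: number of bits times time per bit. The sketch below takes the per-bit time as roughly one microsecond, an assumption consistent with the stated total of a few milliseconds to under one second; the bitstream lengths in the example are hypothetical round numbers, not figures for any specific device.

```python
def config_time_seconds(bitstream_bits, seconds_per_bit=1e-6):
    """Estimate serial configuration time from bitstream length.

    seconds_per_bit defaults to 1 microsecond, an assumed typical value.
    """
    return bitstream_bits * seconds_per_bit

# Hypothetical bitstream lengths, chosen only to span the quoted range.
small_device = config_time_seconds(5_000)      # about 5 ms
large_device = config_time_seconds(500_000)    # about 0.5 s
```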
37
Configuration Modes
Bit-serial configuration
  Simple, uses few device pins
  Controlled by FPGA (Master) or externally (Slave)
  Xilinx serial PROMs available
Byte-parallel configuration
  Can drive PROM addresses (Master)
  Can be microprocessor-controlled
The user can select one of several configuration methods, according to the needs of the system. The Xilinx device can program itself from an external serial or parallel PROM, or be programmed under microprocessor control. Note that parallel configuration modes are not available in the Spartan Series.
38
Configuration Pins
Configuration starts on power-up
Mode pin(s) checked to determine method
  Usable as extra I/O after configuration
All I/O not used for configuration are disabled
Reconfiguration possible by pulling PROGRAM pin low
Three MODE pins on the device are driven high or low at power-up to determine the configuration mode. At power-up and during configuration, all I/O pins are disabled and all flip-flops are initialized. The device can be re-programmed by pulling the PROGRAM pin low.
39
Readback
Configuration data can be read back serially
  Allows verification of programming
Readback data can include user-register values
  Allows in-circuit functional verification
Requires READBACK... symbol
[Figure: READBACK symbol with CLK, TRIG, DATA and RIP pins]
40
Boundary Scan
IEEE 1149.1-compatible boundary scan (JTAG)
Available before configuration
Configuration & readback possible via boundary scan logic
IEEE-compatible boundary scan is provided to simplify board-level testing.
41
Power Consumption
CMOS SRAM technology provides low standby power
Operating power is mostly dynamic
  Proportional to transition frequency of internal nodes
Xilinx segmented interconnect minimizes amount of metal capacitance to switch, minimizing power
FPGA power is almost entirely due to switching of capacitive metal. Xilinx segmented interconnect minimizes the amount of metal used to create a net, which also minimizes power requirements.
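The proportionality to switching frequency and capacitance follows from the standard CMOS dynamic power relation P = C x V^2 x f, where f counts full charge/discharge cycles per second. The sketch below applies it to a single net; the capacitance, voltage and toggle rate are hypothetical illustration values, not measured figures for any Xilinx device.

```python
def dynamic_power_watts(capacitance_f, voltage_v, toggle_rate_hz):
    """Dynamic power of one switched node: P = C * V^2 * f."""
    return capacitance_f * voltage_v ** 2 * toggle_rate_hz

# Hypothetical net: 10 pF of metal, 2.5 V swing, 100 MHz toggle rate.
long_net = dynamic_power_watts(10e-12, 2.5, 100e6)

# Same net routed with half the metal capacitance switches half the power.
short_net = dynamic_power_watts(5e-12, 2.5, 100e6)
```

This is why routing a net over less metal, as segmented interconnect aims to do, reduces power in direct proportion to the capacitance removed.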