1
Xilinx FPGA Architecture
Gate-array-like architecture. Programmable logic, I/O & interconnect. [Figure labels: Programmable Interconnect, I/O Blocks (IOBs), Configurable Logic Blocks (CLBs).]
2
Logic Cell Capacity
A better first-order alternative to gate counting. Better comparisons among different FPGAs. Logic cell definition: 4-input look-up table + dedicated flip-flop. Logic cells per CLB: XC (2 4-LUTs, 1 3-LUT, 2 FFs); Spartan (2 4-LUTs, 1 3-LUT, 2 FFs); Virtex (4 4-LUTs, 1 F5MUX, 4 FFs); XC (4 4-LUTs, 4 FFs).
Counting CLBs provides a method of comparing different members of a common family, but counting logic cells is more effective for comparing across different architectures. A logic cell is the combination of a lookup table and a flip-flop, which is the common building block of all leading FPGAs.
3
Configurable Logic Block (CLB)
Combinational logic generated in a lookup table (LUT). Any function of available inputs. LUT output feeds CLB output or D input of flip-flop. [Figure: inputs feed the combinational logic function (LUT), which feeds the flip-flop and the outputs.]
4
XC4000/Spartan Series Function Generators
Two 4-input function generators: independent inputs (2 functions of 4 inputs). One 3-input function generator: independent inputs; combines 4-input functions to make any 5-input function & some 9-input functions. [Figure: function generators F, G, and H.]
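One way to see why two 4-input generators plus a 3-input generator cover any 5-input function is Shannon expansion: the two 4-input tables hold the function's two halves, and the 3-input table selects between them on the fifth input. The following is a minimal Python sketch of that idea, not a model of the actual CLB wiring; the helper names and the `target` function are made up for illustration.

```python
from itertools import product

def make_lut(func, n_inputs):
    """Tabulate func over every input combination (entry 0 = all inputs low)."""
    return [func(*bits) & 1 for bits in product((0, 1), repeat=n_inputs)]

def read_lut(table, *bits):
    """Return the stored bit; the first argument is the most significant address bit."""
    addr = 0
    for b in bits:
        addr = (addr << 1) | b
    return table[addr]

def target(a, b, c, d, e):
    # Arbitrary 5-input function, chosen only for illustration.
    return (a & b) ^ (c | d) ^ e

# F and G hold the two cofactors of target with respect to e.
F = make_lut(lambda a, b, c, d: target(a, b, c, d, 0), 4)
G = make_lut(lambda a, b, c, d: target(a, b, c, d, 1), 4)
# H sees (f, g, e) and behaves as a 2-to-1 multiplexer selected by e.
H = make_lut(lambda f, g, e: g if e else f, 3)

def five_input(a, b, c, d, e):
    """Evaluate the 5-input function using only the three lookup tables."""
    return read_lut(H, read_lut(F, a, b, c, d), read_lut(G, a, b, c, d), e)
```

Only some 9-input functions fit, because with independent inputs on F and G the third table can combine their two outputs with just one further signal.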
5
Lookup Table
Generates any function of its inputs. Typically 4 inputs. Logically equivalent to a 16x1 ROM. [Figure: LUT block with inputs and output.]
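The 16x1 ROM view can be sketched in a few lines of Python: the four inputs form an address, and the stored bit at that address is the output. The function names and the bit ordering (input `i0` as the least significant address bit) are assumptions made for this illustration, not a Xilinx convention taken from the slides.

```python
def lut4_init(func):
    """Pack the truth table of a 4-input function into a 16-bit integer."""
    init = 0
    for addr in range(16):
        i3, i2, i1, i0 = (addr >> 3) & 1, (addr >> 2) & 1, (addr >> 1) & 1, addr & 1
        if func(i3, i2, i1, i0):
            init |= 1 << addr  # one stored bit per address
    return init

def lut4_read(init, i3, i2, i1, i0):
    """Read the ROM: the inputs form the address, the stored bit is the output."""
    addr = (i3 << 3) | (i2 << 2) | (i1 << 1) | i0
    return (init >> addr) & 1

and4 = lut4_init(lambda a, b, c, d: a & b & c & d)  # 0x8000: only address 15 is set
xor4 = lut4_init(lambda a, b, c, d: a ^ b ^ c ^ d)  # 0x6996: odd-parity addresses
```

A 4-input AND and a 4-input XOR each occupy exactly one table, which is the sense in which the limit is the number of inputs rather than the complexity of the function.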
6
Targeting LUT-based Logic
LUT limit is on inputs, not complexity. Reducing inputs per function (fan-in) to fit CLBs improves density and speed. Automatically done by Xilinx synthesis and implementation tools. Inverters are free. [Figure: logic mapped into a CLB lookup table.]
7
Duplicating Logic Can Improve Results
Collapsing of logic into CLBs affects number of levels required and therefore speed. The gates you use will determine mapping. Nets with a fanout >1 may be outside a CLB. N1 must go to two places, so O1 may require a second level of logic. Duplicating first gate allows N1A to always be collapsed inside a single lookup table. [Figure: input I1, net N1, and output O1 before duplication; nets N1A and N1B after duplication.]
8
Defining Lookup Tables with Gate Primitives
Example of gate primitive: AND2. Up to five inputs with all combinations of inversion; AND2B1 indicates 1 “bubbled” or inverted input. Up to nine inputs non-inverted. Add external INV primitives if desired.
9
Flip-Flops
Stores data (D) on rising edge of clock (K). Clock Enable (CE). Asynchronous Clear (C).
K    CE   C    D    Q
X    X    1    X    0
↑    1    0    D    D
0    X    0    X    Q
[Figure: flip-flop symbol with pins D, Q, CE, K, C.]
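The flip-flop behavior described on this slide can be sketched as a small Python model. It assumes an active-high clock enable and clear and a power-up value of 0; the class and method names are illustrative only.

```python
class FlipFlop:
    """D flip-flop with clock enable and asynchronous clear (behavioral sketch)."""

    def __init__(self):
        self.q = 0          # assumed initial state
        self._last_k = 0    # previous clock level, used to detect a rising edge

    def update(self, k, ce, c, d):
        """Apply one set of pin levels and return the resulting Q."""
        if c:
            self.q = 0                        # asynchronous clear overrides the clock
        elif k and not self._last_k and ce:
            self.q = d                        # rising edge of K with CE high stores D
        self._last_k = k
        return self.q
```

Calling `update(1, 0, 0, d)` after the clock was low leaves Q unchanged, which is the hold behavior expected when the clock enable is inactive.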
10
Additional Flip-Flop Controls
Reset (Clear) or Set. Global initialization (GSR). Programmable clock polarity. Clock enable can be left unconnected.
11
Use Global Reset
All flip-flops initialized on configuration and via the global net. Source of global net specified via STARTUP component. [Figure: STARTUP symbol with pins GSR, GTS, CLK, Q2, Q3, Q1Q4, DoneIn.]
12
Direct Input
Direct Input bypasses LUT and goes directly to flip-flop. Provides higher speed if no logic is required. Frees LUT for other functions. [Figure: DIN routed around the LUT to the flip-flop D input.]
13
On-Chip RAM
All Xilinx FPGAs use RAM-based programming. Adding Write Enable to LUT creates on-chip SelectRAM memory.
14
SelectRAM Benefits
Asynchronous: compatible with original XC4000. Synchronous: simpler timing. Dual-Port.
[Figure: three RAM blocks. Asynchronous: Data, Write Enable, Address, Output. Synchronous: Data, Write Enable, Write Clock, Address, Output. Dual-Port: Data, Write Enable, Write Clock, Write Address/Single-Port Read Address, Dual-Port Read Address, Single-Port Output, Dual-Port Output.]
15
Write Clock
Same clock as for flip-flops. Programmable polarity, independent of flip-flop polarity. Self-timed write: latches data, write enable, address on edge; generates write pulse. No effect on read operation.
The clock used for synchronous RAM is the same as the one used for the flip-flops in the CLB. The clock edge generates the necessary pulse to write data into the block of RAM. The read function is still asynchronous to the clock.
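The split between a clocked write and an unclocked read can be sketched behaviorally in Python. The sketch assumes a 16 x 1 memory and a rising-edge write clock (the slide notes the polarity is programmable); the class and method names are hypothetical.

```python
class SyncWriteRam16x1:
    """16 x 1 RAM with edge-triggered write and asynchronous read (behavioral sketch)."""

    def __init__(self):
        self.cells = [0] * 16
        self._last_clk = 0

    def read(self, addr):
        """Asynchronous read: the output follows the address with no clock involved."""
        return self.cells[addr & 0xF]

    def tick(self, clk, we, addr, data):
        """Sample write enable, address, and data on the rising clock edge only."""
        if clk and not self._last_clk and we:
            self.cells[addr & 0xF] = data & 1
        self._last_clk = clk
```

Holding the clock high while changing `addr` or `data` writes nothing, which mirrors the point that the edge alone triggers the self-timed write.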
16
Supported RAM Modes
Per CLB:
              16 x 1   16 x 2   32 x 1   Edge-Triggered Timing   Level-Sensitive Timing
Single-Port     X        X        X                X                       X
Dual-Port       X                                  X
17
I/O Block (IOB)
Periphery of identical I/O blocks. Input, output, or bidirectional. Direct or registered (or latched input). Pullup/pulldown. Programmable slew rate. Three-state output. Programmable thresholds. [Figure: IOB between the pad, bonded to a package pin, and the internal signals I, O, TS, and clocks.]
18
Use Special IOB Primitives
User explicitly defines what resources in the IOB are to be used. I/Os are defined with 1 pad primitive and at least 1 function primitive: 1 input element, 1 output element, or both. Inverters may also be pulled into IOBs. [Figure: IPAD driving IBUF.]
19
Locking Down I/O Locations
LOC=Pxx attribute defines I/O pad location(s). Avoid locking IOBs early: it makes routing more difficult. Use IOB LOC= to lock pins late in design cycle once PCB is built. Can lock IOBs if floorplanning the connected CLBs.
20
Use Pullups/Pulldowns
Pullup automatically connected on unused IOBs. User can specify PULLUP or PULLDOWN primitive on used IOBs. Inputs should not be left floating. Add pullup to design inputs that may be left floating to reduce power and noise. [Figure: IPAD driving IBUF.]
21
Faster Setup with NODELAY
Delay included by default: compensates for clock routing delay to prevent a hold-time requirement. NODELAY attribute removes delay element: faster setup, but creates a hold-time requirement. [Figure: example IOB. External data passes from its pad through the input buffer and delay element to the flip-flop D input; the external clock passes from its pad through the routing delay and arrives as the routed clock.]
22
Slew Rate Control
Slew rate controls output speed. Default slow slew rate reduces noise & ground bounce. Use fast slew rate wherever speed is important. FAST parameter on output logic primitive. [Figure: OBUF marked FAST driving OPAD.]
23
Output Three-State Control
Free inverter on output buffer control. Use OBUFE macro for active-high enable. Use OBUFT primitive for active-low enable. [Figure: OBUFE with enable input OE; OBUFT with control input T.]
24
Global Three-State
Three-state control is local, via a dedicated global net, or both. Global three-state controlled by STARTUP primitive. [Figure: STARTUP symbol with GTS and GSR pins.]
25
I/O Thresholds
5V devices have globally selectable TTL or CMOS I/O thresholds. Inputs and outputs separately controllable. Default is TTL. 3V devices can interface to 5V or 3V logic. 2.5V Virtex devices have programmable interfaces.
5V devices can be set to CMOS or TTL thresholds according to an option set when the configuration file is produced.
26
Programmable Interconnect
Resources to create arbitrary interconnection networks. Various types of interconnect: flexible general-purpose interconnect; low-skew long lines; internal three-state buffers. [Figure: CLBs connected through long lines, general-purpose interconnect, and a switch matrix.]
27
Interconnect
Single-length, double-length, and long lines. Clock buffers and dedicated long lines. Global set/reset and global three-state.
Each channel has three types of general interconnect, which are metal segments of different lengths. The clock buffers drive dedicated routing resources to distribute clocks with high speed and low skew. Global initialization of the flip-flops and disabling of the outputs use dedicated resources and do not interfere with logic routing.
28
Fast Direct Interconnect
Direct connections from CLB to adjacent CLB or IOB. Fastest interconnect: < 1 ns delay. Carry logic uses direct interconnect.
29
Flexible General-Purpose Interconnect
Flexible, but slow if a net crosses many channels. Programmable switch matrix at each channel crossing: connects across, changes direction, or fans out. Single-length lines. Double-length lines skip every other switch matrix.
30
Switch Matrix
Bidirectional pass transistors. High routing flexibility.
31
Reduce Fanout
Higher fanout nets (>16 loads) are harder to route & slower. Consider duplicating source in schematic to improve routing or speed. [Figure: fn1 logic driving D flip-flops, before and after duplicating fn1.]
32
Long Lines for High Fanout Nets
Metal lines that traverse length & width of chip. Lowest skew. Ideal for high fan-out signals. Ideal for clocking. Requires vertical or horizontal alignment of loads. [Figure: CLBs aligned along a long line.]
33
Advantages of Vertical Orientation
Bidirectional data bus lines run horizontally. Enable lines run vertically. Large registered functions align vertically. Clock lines run vertically. Most non-clock, non-BUFT long lines run vertically. Carry logic runs vertically. [Figure: array of eight D flip-flops.]
34
Use Global Clock Buffers
Use clock buffers for highest fanout clocks. Drive high-speed long line resources. <2ns skew across a device. No internal hold times. Use generic BUFG primitive: allows software to choose best type of buffer; allows easy migration across families.
XC4000/Spartan has eight clock buffers, but four is the recommended limit for the best placement & routing. Routing delays from the clock buffers to the flip-flops are balanced to reduce skew.
BUFG primitive: technology-independent; uses BUFGS as first choice (XC4000/Spartan).
35
Using a Clock Generated Off-Chip
Connect IPAD directly to clock buffer primitive. Required for BUFGP. Place & route uses special fast input pin. Provides higher speed and uses fewer routing resources. [Figure: IPAD driving BUFG, which clocks a D flip-flop.]
36
Internal Oscillator
Oscillator used to generate configuration clock can be used after configuration as part of design. +/- 50% frequency range. Can be divided down to desired frequency range.
An internal oscillator provides the clock for the configuration logic, and can also be used after configuration as a user clock. It is not very precise but is useful as a heartbeat clock if needed.
37
Use BUFT for Buses
BUFT references internal three-state buffers. Use to multiplex signals onto long routing lines to use as buses. Multiplexer macros use lookup tables (M4_1E, etc.). [Figure: BUFTs controlled by _ENABLE_A and _ENABLE_B drive A3..A0 or B3..B0 onto BUS<3>..BUS<0>.]
38
BUFT Output Net Never Floats
Cross-coupled inverters remember last value to ensure that line never floats. Valid signal is always read from output of BUFT. No need to reference “keeper” circuit.
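The effect of the keeper on a three-state bus line can be sketched in Python: when a buffer is enabled the line takes its value, and when none is enabled the line retains the last value instead of floating. The function name and the abstract `enabled` flag (which ignores the actual polarity of the BUFT control pin) are assumptions for illustration.

```python
def resolve_bus(drivers, kept):
    """Resolve one bus line.

    drivers: list of (enabled, value) pairs, one per three-state buffer.
    kept:    the value the keeper is currently holding.
    Returns the new line value, which the keeper then remembers.
    """
    active = [value for enabled, value in drivers if enabled]
    if len(set(active)) > 1:
        raise ValueError("bus contention: enabled drivers disagree")
    return active[0] if active else kept

line = 0
line = resolve_bus([(True, 1), (False, 0)], line)   # first buffer drives a 1
line = resolve_bus([(False, 1), (False, 0)], line)  # nobody drives: keeper holds 1
```

The contention check is an addition of this sketch; the slide itself only describes the hold behavior.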
39
Special Resources
Arithmetic/counter carry logic. Wide decode or cascade functions. Configuration. Boundary scan (JTAG).
This section describes some of the other resources in the architecture.
40
Carry Logic
Use carry logic in CLBs to increase arithmetic speed. High density via serial implementation of carry. Carry propagates in upward direction. Use library’s carry-based macros (RPMs) or LogiBLOX synthesis. [Figure: CLBs linked by carry.]
41
Wide Decoders
Decoder is effectively a dedicated wired-AND. 4 decoder lines per edge. Direct inputs from all IOBs on an edge. Half as many general inputs. Useful for address decoding.
42
Using Wide Decoders
Use DECODEx macro. Diamond indicates open-drain. Can tie multiple outputs together. Must use a PULLUP primitive. [Figure: DECODE8 symbol with inputs A0..A7 and output O.]
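Why tied open-drain outputs need a pull-up, and why the result is an AND, can be sketched in Python: each driver can only pull the shared line low, so the line is high only when every driver releases it. The function names and the address-match use case are illustrative assumptions, not the behavior of a specific DECODEx macro.

```python
def wired_and(inputs):
    """Shared open-drain line with a pull-up: high only if no driver pulls it low."""
    pulls_low = [bit == 0 for bit in inputs]  # a 0 input turns its driver on
    return 0 if any(pulls_low) else 1         # otherwise the pull-up wins

def address_match(addr, expected, width=8):
    """Decode one address: invert the bits expected to be 0, then AND everything."""
    bits = []
    for i in range(width):
        a = (addr >> i) & 1
        e = (expected >> i) & 1
        bits.append(a if e else 1 - a)
    return wired_and(bits)
```

Without the pull-up the released line would have no defined high level, which is why the slide requires a PULLUP primitive.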
43
Wide Wired-AND Using Three-State Buffers
Use WANDx symbol. [Figure: WAND8 symbol with inputs I1..I8 and output O; the underlying implementation uses BUFTs driving a horizontal long line.]
44
Configuration
Schematic or HDL description is converted to a configuration file by the Xilinx development system. Configuration file is loaded into FPGA on power-up. Stored in configuration latches. Controls CLBs, IOBs, interconnect, etc.
Configuration is the process of programming the FPGA. The programming file is often maintained in a PROM on the board and loaded into the FPGA on power-up.
45
Configuration Bitstream
Binary programming file. Length depends only on device, not utilization. Typically 1 µs per bit (total from a few ms to <1s). FPGA can load its configuration automatically on power-up, or under microprocessor control. Can be loaded directly into device/configuration PROM.
The programming file is called a bitstream. The FPGA programs very quickly after power-up.
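A rough estimate of configuration time follows directly from the bitstream length, since the length is fixed per device. The Python sketch below assumes about one microsecond per bit, the rate consistent with the quoted total of a few ms to under 1 s; the bit counts are hypothetical round numbers, not figures for any particular device.

```python
def config_time_seconds(bitstream_bits, seconds_per_bit=1e-6):
    """Estimate serial configuration time for a bitstream of the given length."""
    return bitstream_bits * seconds_per_bit

# Hypothetical bitstream lengths, for illustration only.
small_device = config_time_seconds(10_000)    # about 10 ms
large_device = config_time_seconds(500_000)   # about half a second
```

Because utilization does not change the bitstream length, a nearly empty design and a full one take the same time to load on the same device.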
46
Configuration Modes
Bit-serial configuration: simple, uses few device pins; controlled by FPGA (Master) or externally (Slave); Xilinx Serial PROMs available. Byte-parallel configuration: can drive PROM addresses (Master); can be microprocessor-controlled (Peripheral).
The user can select one of several configuration methods, according to the needs of the system. The Xilinx device can program itself from an external serial or parallel PROM, or be programmed under microprocessor control. Note that parallel configuration modes are not available in the Spartan Series.
47
Configuration Pins
Configuration starts on power-up. Mode pin(s) checked to determine method; usable as extra I/O after configuration. All I/O not used for configuration are disabled. Reconfiguration possible by pulling PROGRAM pin Low. No partial configuration.
Three MODE pins on the device are driven high or low at power-up to determine the configuration mode. At power-up and during configuration, all I/O pins are disabled and all flip-flops are initialized. The device can be re-programmed by pulling the PROGRAM pin low.
48
Readback
Configuration data can be read back serially: allows verification of programming. Readback data can include user-register values: allows in-circuit functional verification. Requires READBACK symbol. [Figure: READBACK symbol with pins CLK, TRIG, DATA, RIP.]
49
Boundary Scan
IEEE 1149.1-compatible boundary scan (JTAG). Available before configuration. Configuration & readback possible via boundary scan logic.
IEEE 1149.1-compatible boundary scan is provided to simplify board-level testing.
50
Power Consumption
CMOS SRAM technology provides low standby power. Operating power is mostly dynamic: proportional to transition frequency of internal nodes. Xilinx segmented interconnect minimizes amount of metal capacitance to switch, minimizing power.
FPGA power is almost entirely due to switching of capacitive metal. Xilinx segmented interconnect minimizes the amount of metal used to create a net, which also minimizes power requirements.
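The dependence on switched capacitance and transition frequency matches the usual first-order CMOS estimate, P = a · C · V² · f, where a is the fraction of clock cycles in which the node switches. The Python sketch below applies that standard estimate; the formula is a general approximation rather than a Xilinx power model, and every number in it is a hypothetical value.

```python
def dynamic_power_watts(c_farads, v_volts, f_hz, activity=1.0):
    """First-order dynamic power of switched capacitance: activity * C * V^2 * f."""
    return activity * c_farads * v_volts ** 2 * f_hz

# Hypothetical example: 1 nF of total switched capacitance at 5 V and 1 MHz.
base = dynamic_power_watts(1e-9, 5.0, 1e6)          # about 25 mW
shorter_nets = dynamic_power_watts(0.5e-9, 5.0, 1e6)  # half the metal, half the power
```

Halving the capacitance of the routed nets halves the estimate, which is the argument the slide makes for segmented interconnect.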