Xilinx FPGA Architecture Overview

Presentation transcript:

Xilinx FPGA Architecture Overview This customer presentation provides an overview of the Xilinx FPGA architecture. The example used is the Virtex/Spartan-II architecture, but the information is generic enough for any Xilinx FPGA family.

Virtex/Spartan-II Top-level Architecture
- Gate-array-like architecture
- Configurable logic blocks: implement logic here
- I/O blocks: 16 signal standards
- Block RAM: on-chip memory for higher performance
- Clocks and delay-locked loops
- Interconnect resources
- Three-state internal buses

Logic Cell Capacity
- A better first-order alternative to gate counting
  - Better comparisons among different FPGAs
- Logic cell definition: 4-input lookup table + dedicated flip-flop
- Logic cells per CLB:
  - XC4000/Spartan: 2.375 (2 4-LUTs, 1 3-LUT, 2 FFs)
  - Virtex/Spartan-II: 4.5 (4 4-LUTs, 1 F5MUX, 4 FFs)

Counting CLBs provides a method of comparing different members of a common family, but counting logic cells is more effective for comparing across different architectures. A logic cell is the combination of a lookup table and a flip-flop, which is the common building block of all leading FPGAs.
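The logic-cell comparison is simple arithmetic: multiply the CLB count by the per-CLB figure for the family. A minimal Python sketch, using the two per-CLB values quoted on the slide; the function name and the 28 x 42 CLB array are assumptions chosen for illustration, not figures from this presentation.

```python
# Logic cells per CLB, as quoted on the slide.
LOGIC_CELLS_PER_CLB = {
    "XC4000/Spartan": 2.375,
    "Virtex/Spartan-II": 4.5,
}


def logic_cells(family: str, clb_count: int) -> float:
    """Return the logic cell capacity of a device with clb_count CLBs."""
    return LOGIC_CELLS_PER_CLB[family] * clb_count


# Assumed example device: a 28 x 42 array of CLBs (hypothetical input).
print(logic_cells("Virtex/Spartan-II", 28 * 42))
```

Comparing two families then means comparing the returned logic-cell totals rather than the raw CLB counts.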

Configurable Logic Block (CLB)
- Combinational logic is generated in a lookup table (LUT)
  - Any function of the available inputs
- The LUT output feeds the CLB output or the D input of a flip-flop
[Figure: inputs feed a combinational logic function (LUT), followed by a flip-flop, to the outputs]

Virtex/Spartan-II Function Generators
- Four 4-input function generators
  - Independent inputs (4 functions of 4 inputs)
- MUXF5 combines 2 LUTs to form a 4x1 multiplexer
  - Or any 5-input function
- MUXF6 combines 2 slices to form an 8x1 multiplexer
  - Or any 6-input function
[Figure: a CLB with two slices; each slice holds two LUTs and a MUXF5, and a MUXF6 combines the slices]
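Why two 4-input LUTs plus a multiplexer can realize any 5-input function follows from Shannon expansion: the fifth input selects between the function with that input fixed at 0 and the function with it fixed at 1. A minimal Python sketch of the idea; the helper names and the majority function are illustrative assumptions, and the sketch models the logic only, not the actual slice wiring.

```python
from itertools import product


def lut4(func):
    """Truth table of a 4-input function, keyed by the tuple of input bits."""
    return {bits: func(*bits) & 1 for bits in product((0, 1), repeat=4)}


def five_input_function(func5):
    """Split func5(s, a, b, c, d) into two 4-LUTs selected by s."""
    lut_s0 = lut4(lambda a, b, c, d: func5(0, a, b, c, d))
    lut_s1 = lut4(lambda a, b, c, d: func5(1, a, b, c, d))

    def evaluate(s, a, b, c, d):
        # The 2:1 multiplexer plays the role of MUXF5: s picks one LUT output.
        return lut_s1[(a, b, c, d)] if s else lut_s0[(a, b, c, d)]

    return evaluate


def majority5(s, a, b, c, d):
    """Example 5-input function: 1 when at least three inputs are 1."""
    return int(s + a + b + c + d >= 3)


mapped = five_input_function(majority5)
# The LUT-plus-mux version agrees with the original on all 32 input patterns.
assert all(mapped(*bits) == majority5(*bits) for bits in product((0, 1), repeat=5))
```

Applying the same split once more, with a sixth input selecting between two such 5-input halves, gives the 6-input case attributed to MUXF6.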

Lookup Table
- Generates any function of its inputs
- Typically 4 inputs
- Logically equivalent to a 16 x 1 ROM

Inputs  Output
0000    0
0001    1
0010    1
0011    0
(first four of the 16 entries)
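The ROM view of a lookup table can be sketched in a few lines of Python: the 16 stored bits are the truth table, and the four inputs form the address. The helper names are hypothetical, and the 4-input parity function is an assumption picked only because it agrees with the four table rows shown on the slide.

```python
from itertools import product


def make_lut(func, n_inputs=4):
    """Precompute the ROM contents: one output bit per input combination."""
    rom = []
    for bits in product((0, 1), repeat=n_inputs):  # 0000, 0001, ..., 1111
        rom.append(func(*bits) & 1)
    return rom


def read_lut(rom, *bits):
    """Use the input bits (most significant first) as the ROM address."""
    addr = 0
    for b in bits:
        addr = (addr << 1) | (b & 1)
    return rom[addr]


# Assumed example function: parity (XOR) of the four inputs.
parity = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d)
print(len(parity), parity[:4])
```

Any other 4-input function costs exactly the same 16 bits, which is why the LUT limit is on the number of inputs rather than on the complexity of the function.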

Targeting LUT-based Logic
- The LUT limit is on inputs, not complexity
- Reducing inputs per function (fan-in) to fit CLBs improves density and speed
  - Done automatically by Xilinx synthesis and implementation tools
- Inverters are free

Duplicating Logic Can Improve Results
- Collapsing logic into CLBs affects the number of logic levels required, and therefore speed
- The gates you use determine the mapping
- Nets with a fanout >1 may be outside a CLB
  - N1 must go to two places, so O1 may require a second level of logic
- Duplicating the first gate allows N1A to always be collapsed inside a single lookup table
[Figure: schematic with nets I1, N1 and O1, and the duplicated nets N1A and N1B]

Defining Lookup Tables With Gate Primitives
- Example of a gate primitive: AND2
- Up to five inputs with all combinations of inversion
  - AND2B1 indicates 1 “bubbled” or inverted input
- Up to nine inputs non-inverted
  - Add external INV primitives if desired

Flip-Flops
- Stores data (D) on the rising edge of the clock (K)
- Clock enable (CE)
- Asynchronous clear (C)

K  CE  C  D  Q
x  x   1  x  0
↑  1   0  d  d
0  x   0  x  q

[Figure: flip-flop symbol with D, K, CE and C inputs and Q output]
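The flip-flop behavior described here (capture D on a rising clock edge when the clock enable is high, with an asynchronous clear that overrides everything) can be modeled as a small Python class. This is a behavioral sketch only; the class and method names are assumptions, and the initial state of 0 is an arbitrary choice.

```python
class DFlipFlop:
    """Behavioral model: rising-edge D flip-flop with CE and async clear."""

    def __init__(self):
        self.q = 0          # assumed initial state
        self._prev_k = 0    # previous clock level, used to detect edges

    def update(self, k, ce, c, d):
        """Apply one set of input levels and return the resulting Q."""
        rising = self._prev_k == 0 and k == 1
        self._prev_k = k
        if c:                    # asynchronous clear wins over the clock
            self.q = 0
        elif rising and ce:      # enabled rising edge: capture D
            self.q = d
        # otherwise Q holds its previous value
        return self.q


ff = DFlipFlop()
ff.update(k=0, ce=1, c=0, d=1)   # clock low: Q holds
ff.update(k=1, ce=1, c=0, d=1)   # rising edge with CE high: Q takes D
```

Calling `update` with `ce=0` on a later rising edge leaves Q unchanged, and any call with `c=1` forces Q to 0 regardless of the clock.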

Additional Flip-Flop Controls Reset (Clear) and/or Set Global initialization (GSR) Use to initialize all flip-flops Programmable clock polarity Clock enable can be left unconnected

Virtex/Spartan-II CLB Slice
- 1 CLB holds 2 slices
- Each slice has two sets of:
  - Four-input LUT
    - Any 4-input logic function
    - Or 16-bit x 1 RAM
    - Or 16-bit shift register
  - Carry & control
    - Fast arithmetic logic
    - Multiplier logic
    - Multiplexer logic
  - Storage element
    - Latch or flip-flop
    - Set and reset
    - True or inverted inputs
    - Sync. or async. control

Dedicated Multiplier Logic
- Highly efficient ‘shift & add’ implementation
- For a 16x16 multiplier:
  - 30% reduction in area
  - One fewer logic level

On-chip RAM All Xilinx FPGAs use RAM-based programming Adding Write Enable to LUT creates on-chip SelectRAM memory

SelectRAM Benefits
- Single-port or dual-port
- Synchronous: simple timing
[Figure: single-port RAM with Data, Write Enable, Write Clock and Address inputs and one Output; dual-port RAM with Data, Write Enable, Write Clock, Write Address/Single-Port Read Address and Dual-Port Read Address inputs, and Single-Port and Dual-Port Outputs]
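A LUT-based RAM of this kind can be pictured as 16 one-bit cells with a clocked write and two read addresses. The Python sketch below is a behavioral model under stated assumptions: the class and method names are hypothetical, and reads are modeled as combinational (returned immediately from the current cell contents), which the slide does not state.

```python
class DistributedRam16x1:
    """Behavioral model of a 16 x 1 dual-port LUT RAM."""

    def __init__(self):
        self.cells = [0] * 16

    def clock(self, write_enable, write_addr, data):
        """Model one write-clock edge: store data only if write_enable is set."""
        if write_enable:
            self.cells[write_addr & 0xF] = data & 1

    def read(self, write_addr, dual_port_addr):
        """Return (single-port output, dual-port output).

        The single-port output shares its address with the write port;
        the dual-port output has an independent read address.
        """
        return self.cells[write_addr & 0xF], self.cells[dual_port_addr & 0xF]


ram = DistributedRam16x1()
ram.clock(write_enable=1, write_addr=5, data=1)
ram.clock(write_enable=0, write_addr=9, data=1)   # ignored: write enable low
print(ram.read(write_addr=5, dual_port_addr=9))
```

Because writes happen only on the clock call, setup and hold are defined relative to a single edge, which is the "simple timing" benefit of a synchronous write.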

Memory Bandwidth and Flexibility
Virtex/Spartan-II on-chip SelectRAM+ memory: a 200 MHz memory continuum
- Distributed RAM (bytes): 16x1; shallow/wide; DSP coefficients, small FIFOs
- Block RAM (kilobytes): 4Kx1, 2Kx2, 1Kx4, 512x8, 256x16; large FIFOs, packet buffers, video line buffers, cache tag memory
- External RAM (megabytes): SDRAM, ZBTRAM, SSRAM, SGRAM; deep/wide

Spartan-II Memory
- CLB LUTs provide small distributed RAM (16 bits/LUT)
- Block RAM provides 4K bits each
- Dual read/write ports; each port has:
  - Independent clock, R/W, and enable
  - Independently configurable data width, from 4K x 1 to 256 x 16
[Figure: Spartan-II dual-R/W-port block RAM with read/write Port A and Port B]
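The configurable widths all describe the same 4K-bit block: doubling the data width halves the depth. A short Python sketch of that arithmetic; the function name is an assumption, and the power-of-two width list is inferred from the stated 4K x 1 and 256 x 16 endpoints.

```python
BLOCK_RAM_BITS = 4096  # "4K bits each", as stated on the slide


def aspect_ratios(total_bits=BLOCK_RAM_BITS, widths=(1, 2, 4, 8, 16)):
    """Return (depth, width) pairs that each use the whole block."""
    return [(total_bits // width, width) for width in widths]


for depth, width in aspect_ratios():
    print(f"{depth} x {width}")
```

Since each port's width is set independently, one port could use one of these shapes while the other uses a different one over the same 4,096 bits.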

I/O Block (IOB)
- Periphery of identical I/O blocks
  - Input, output, or bidirectional
  - Direct or registered (or latched input)
  - Pullup/pulldown
  - Programmable slew rate
  - Three-state output
  - Programmable thresholds
[Figure: IOB with I, O, TS and clock signals, connected to a pad that is bonded to a package pin]

Use Special IOB Primitives
- The user explicitly defines which resources in the IOB are to be used
- I/Os are defined with:
  - 1 pad primitive
  - At least 1 function primitive: 1 input element, 1 output element, or both
- Inverters may also be pulled into IOBs
[Figure: IPAD driving an IBUF]

Locking Down I/O Locations LOC=Pxx attribute defines I/O pad location(s) Avoid locking IOBs early Makes routing more difficult Use IOB LOC= to lock pins late in design cycle once PCB is built Can lock IOBs if floorplanning the connected CLBs

Use Pullups/Pulldowns
- A pullup is automatically connected on unused IOBs
- The user can specify a PULLUP or PULLDOWN primitive on used IOBs
- Inputs should not be left floating
  - Add a pullup to design inputs that may be left floating, to reduce power and noise
[Figure: IPAD driving an IBUF]

Faster Setup With NODELAY
- A delay is included by default
  - Compensates for clock routing delay to prevent a hold-time requirement
- The NODELAY attribute removes the delay element
  - Creates a hold-time requirement
[Figure: example IOB in which external data passes through the pad, input buffer and delay element to the D input of a flip-flop, while the external clock reaches the flip-flop through its own pad and the clock routing delay]

Slew Rate Control
- Slew rate controls output speed
- The default slow slew rate reduces noise and ground bounce
- Use the fast slew rate wherever speed is important
  - FAST parameter on the output logic primitive
[Figure: OBUF with the FAST parameter driving an OPAD]

Output Three-State Control
- Free inverter on the output buffer control
- Use the OBUFE macro for an active-high enable
- Use the OBUFT primitive for an active-low enable
[Figure: OBUFE with its control labeled OE; OBUFT with its control labeled T]
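The two buffers differ only in the polarity of the control input, which is what the free inverter provides. A minimal Python sketch of the two polarities; the function names mirror the primitive names for readability, but the `"Z"` marker for a high-impedance output is an assumption of this model.

```python
HIGH_Z = "Z"  # modeling choice for a high-impedance (undriven) output


def obuft(i, t):
    """Active-low enable: the pad is driven only while t is 0."""
    return HIGH_Z if t else i


def obufe(i, e):
    """Active-high enable: the same buffer with its control inverted."""
    return obuft(i, 1 - e)


print(obuft(1, 0), obuft(1, 1), obufe(1, 1), obufe(1, 0))
```

Choosing between the two is therefore a matter of which enable polarity the design already has available, not of cost.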

Global Three-State
- Three-state control can be local, via a dedicated global net, or both
- Global three-state is controlled by the STARTUP... primitive
[Figure: STARTUP symbol with GTS and GSR pins]

Virtex/Spartan-II I/O Block (Simplified)

Multiple I/O Interface Standards 16 to 20 I/O interface standards supported CMOS, HSTL, SSTL, GTL, CTT, PCI As many as eight banks on a device Package dependent Different banks can support different standards at the same time Logic level translation Boards with mixed standards

High Performance Routing
- Hierarchical routing: singles, hexes, longs
  - Sparse connections on longer interconnects for high speed
- Routing delay depends primarily on distance
  - Direction independent
  - Device-size independent
  - Predictable for early design analysis
[Figure: vector-based interconnect across the CLB array, with routes labeled 2 ns]

Flexible General-Purpose Interconnect Flexible but slow if crosses many channels Programmable switch matrix at each channel crossing Connects across, changes direction or fans out

Switch Matrix Bidirectional pass transistors High routing flexibility

Reduce Fanout
- Higher-fanout nets (>16 loads) are harder to route and slower
- Consider duplicating the source in the schematic to improve routing or speed
[Figure: flip-flop (D, Q) driving net fn1]

Long Lines for High Fanout Nets
- Metal lines that traverse the length and width of the chip
- Lowest skew
- Ideal for high-fanout signals
- Ideal for clocking
- Requires vertical or horizontal alignment of loads

Internal Three-State Buses Two 3-state drivers per CLB OR-AND logic implementation in place of 3-state drivers With no drivers enabled, bus is a logic 1 Low power No danger of contention when multiple BUFTs enabled No physical pullups or large capacitance to drive
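The stated bus properties (reads as logic 1 with no driver enabled, no contention when several drivers are enabled) both fall out of an OR-AND structure. The Python sketch below is one way to model it and is an assumption about the arrangement, not a description of the silicon: each driver ORs its data with "not enabled", and the bus is the AND of all those terms.

```python
def bus_value(drivers):
    """Return the bus level for an iterable of (enable, data) bit pairs."""
    value = 1                        # an undriven bus reads as logic 1
    for enable, data in drivers:
        term = data | (1 - enable)   # OR stage: a disabled driver contributes 1
        value &= term                # AND stage combines all drivers
    return value


print(bus_value([(0, 0), (0, 1)]))   # no driver enabled
print(bus_value([(1, 0), (0, 1)]))   # one driver enabled, driving 0
print(bus_value([(1, 1), (1, 0)]))   # two drivers enabled with different data
```

With two enabled drivers that disagree, the result is simply the AND of their data, a well-defined logic level rather than an electrical conflict.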

General Clock Support
- Use clock buffers for the highest-fanout clocks
  - Drive high-speed long line resources
  - Lowest skew across a device
  - No internal hold times
- Use the generic BUFG primitive
  - Allows software to choose the best type of buffer
  - Allows easy migration across families
- Four dedicated global low-skew buffers
  - Dedicated input pin (clock distribution only)
- Additional shared resources (i.e., long lines)
  - Distribute low-skew/high-fanout signals (10 ns max.)
- Four delay-locked loops on each device
  - All-digital implementation
  - Two global buffers associated with each DLL pair

Configuration
- A schematic or HDL description is converted to a configuration file by the Xilinx development system
- The configuration file is loaded into the FPGA on power-up
  - Stored in configuration latches
  - Controls CLBs, IOBs, interconnect, etc.

Configuration is the process of programming the FPGA. The programming file is often maintained in a PROM on the board and loaded into the FPGA on power-up.

Configuration Bitstream
- Binary programming file
  - Length depends only on the device, not on utilization
  - Typically 1 µs per bit (total from a few ms to <1 s)
- The FPGA can load its configuration automatically on power-up, or under microprocessor control
- Can be loaded directly into the device or a configuration PROM

The programming file is called a bitstream. The FPGA programs very quickly after power-up.
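The load time follows directly from the bitstream length and the serial bit rate. A minimal Python sketch, assuming about one bit per microsecond (a rate chosen here because it reproduces the quoted totals of a few ms to under 1 s) and illustrative bitstream lengths that are not taken from any data sheet; the function name is hypothetical.

```python
def configuration_time_s(bitstream_bits, bits_per_second=1_000_000):
    """Estimate the serial configuration time in seconds."""
    return bitstream_bits / bits_per_second


# Hypothetical bitstream lengths, for illustration only.
for bits in (10_000, 200_000, 1_000_000):
    print(f"{bits:>9} bits -> {configuration_time_s(bits) * 1e3:.0f} ms")
```

Because the length depends only on the device, the estimate is the same for an almost empty design and a full one.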

Configuration Modes
- Bit-serial configuration
  - Simple, uses few device pins
  - Controlled by the FPGA (Master) or externally (Slave)
  - Xilinx serial PROMs available
- Byte-parallel configuration
  - Can drive PROM addresses (Master)
  - Can be microprocessor-controlled

The user can select one of several configuration methods, according to the needs of the system. The Xilinx device can program itself from an external serial or parallel PROM, or be programmed under microprocessor control. Note that parallel configuration modes are not available in the Spartan Series.

Configuration Pins
- Configuration starts on power-up
  - Mode pin(s) are checked to determine the method
  - Usable as extra I/O after configuration
- All I/O not used for configuration are disabled
- Reconfiguration is possible by pulling the PROGRAM pin low

Three MODE pins on the device are driven high or low at power-up to determine the configuration mode. At power-up and during configuration, all I/O pins are disabled and all flip-flops are initialized. The device can be reprogrammed by pulling the PROGRAM pin low.

Readback
- Configuration data can be read back serially
  - Allows verification of programming
- Readback data can include user-register values
  - Allows in-circuit functional verification
- Requires the READBACK... symbol
[Figure: READBACK symbol with RIP, DATA, TRIG and CLK pins]

Boundary Scan IEEE 1149.1-compatible boundary scan (JTAG) Available before configuration Configuration & readback possible via boundary scan logic IEEE 1149.1-compatible boundary scan is provided to simplify board-level testing.

Power Consumption CMOS SRAM technology provides low standby power Operating power is mostly dynamic Proportional to transition frequency of internal nodes Xilinx segmented interconnect minimizes amount of metal capacitance to switch, minimizing power FPGA power is almost entirely due to switching of capacitive metal. Xilinx segmented interconnect minimizes the amount of metal used to create a net, which also minimizes power requirements.
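The link between switched metal capacitance and power can be made concrete with the standard first-order CMOS estimate, P = a * C * V^2 * f, where a is the fraction of clock cycles in which a node switches. This formula is not given in the presentation, and all names and numbers in the Python sketch below are hypothetical values chosen only to show the proportionality.

```python
def dynamic_power_w(activity, capacitance_f, vdd_v, clock_hz):
    """First-order CMOS dynamic power estimate in watts."""
    return activity * capacitance_f * vdd_v ** 2 * clock_hz


# Hypothetical numbers: 1 nF of switched capacitance, 2.5 V, 100 MHz, 10% activity.
base = dynamic_power_w(0.1, 1e-9, 2.5, 100e6)
# The same design routed with half the metal capacitance.
halved = dynamic_power_w(0.1, 0.5e-9, 2.5, 100e6)
print(f"{base:.3f} W -> {halved:.3f} W")
```

In this model, halving the capacitance that must be switched halves the dynamic power, which is the argument made for segmented interconnect.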