Introduction To VIRTEX II Architecture Presented By: Ankur Agarwal
Xilinx Design Flow: Plan & Budget; Create Code/Schematic; HDL RTL Simulation; Synthesize to create netlist; Functional Simulation; Implement (Translate, Map, Place & Route); Attain Timing Closure; Timing Simulation; Create Bit File.
Xilinx Architecture Features: High performance at 2.5 V, 3.3 V, and 5 V. Technology independence: EDIF, VHDL, Verilog, and SDF interfaces. Footprint compatibility: devices within each family are compatible with each other. Pin locking.
VIRTEX: Up to 2 million system gates at 100+ MHz. Features: distributed and block RAM available; low power; delay-locked loops; 2.5 V internal operation with support of common power
Naming Conventions: XC4028XL-3-BG256. Family: XC40 (families: 4000, 9500; Spartan starts with XCS). No. of gates: 28. Sub-family: XL (3 V = XL, 5 V = no XL). Speed grade: -3. Package: BG256.
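The field breakdown above can be sketched as a small parser. This is only an illustration for part numbers shaped like the slide's example; the regular expression, the function name `parse_part`, and the dictionary keys are assumptions, not an official Xilinx grammar.

```python
import re

# Hypothetical pattern covering only names shaped like the slide's example,
# e.g. "XC4028XL-3-BG256". It is not a complete Xilinx part-number grammar.
PART_RE = re.compile(
    r"^XC(?P<family>\d{2})(?P<gates>\d+)(?P<sub>XL)?"
    r"-(?P<speed>\d+)-(?P<package>[A-Z]+\d+)$"
)

def parse_part(name):
    """Split a part number into the fields named on the slide."""
    m = PART_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized part number: {name!r}")
    return {
        "family": m["family"] + "00",           # "40" -> "4000"
        "gates_code": m["gates"],               # raw digits after the family
        "supply": "3V" if m["sub"] else "5V",   # XL = 3 V, no XL = 5 V
        "speed_grade": m["speed"],
        "package": m["package"],
    }
```

For example, `parse_part("XC4028XL-3-BG256")` returns family `"4000"`, gates code `"28"`, a 3 V sub-family, speed grade `"3"`, and package `"BG256"`.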
Complex Programmable Logic Device (CPLD) and Field-Programmable Gate Array (FPGA)
Architecture: CPLD: PAL/22V10-like, more combinational. FPGA: gate array-like, more registers + RAM.
Density: CPLD: low-to-medium, K logic gates. FPGA: medium-to-high, 1K to 3.2M system gates.
Performance: CPLD: predictable timing, up to 250 MHz today. FPGA: application dependent, up to 200 MHz.
Interconnect: CPLD: “crossbar switch”. FPGA: incremental.
Overview of Xilinx FPGA Architecture: Programmable Interconnect; I/O Blocks (IOBs); Configurable Logic Blocks (CLBs); Tristate Buffers; Global Resources.
Block Diagram of Virtex-II Architecture. (Figure labels: 18 Kb BRAM, multiplier, distributed RAM, shift registers, DCM, FIFO, CAM, LVDS, BLVDS backplane, PCI, PCI-X, SONET/SDH, DDR SDRAM, QDR SRAM, DDR CAM.)
CLB Resources: The basic resource unit is the logic cell. One CLB contains a number of logic cells that depends on the device family. Logic cell = 4-input look-up table (LUT) + D flip-flop. LUT capacity is limited by the number of inputs, not by the complexity of the function. LUTs can be used as ROM or synchronous RAM. The flip-flop can be configured as a transparent latch in Virtex and Spartan-II.
Closer Look at a CLB Structure: Each slice has two LUT-FF pairs with associated carry logic. Two 3-state buffers (BUFT) are associated with each CLB and are accessible by all CLB outputs.
Interconnect Technology Offered by Virtex-II: The interconnect is an array of switch matrices. All Virtex-II features can access routing resources through the switch matrix. This simplifies design and place & route. (Figure: switch matrices attached to the CLB, IOB, DCM, 18 Kb BRAM, and MULT 18x18 blocks.)
Simplified Slice Structure: Each slice has four outputs: two registered and two non-registered. Two BUFTs are associated with each CLB and are accessible by all 16 CLB outputs. Carry logic for fast addition. Two independent carry chains per CLB.
Fast Carry Logic: Each CLB contains separate logic and routing for the fast generation of carry signals. This increases the efficiency and performance of adders, subtractors, accumulators, comparators, and counters. Carry logic is independent of normal logic and routing resources. (Figure: carry logic and routing running from LSB to MSB.)
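A bit-level model can make the carry chain concrete. The sketch below assumes the common propagate/select arrangement (a LUT computes the propagate signal, a dedicated multiplexer forwards or regenerates the carry, and a dedicated XOR forms the sum); the slide does not show the circuit, so treat the structure and the function name `carry_chain_add` as assumptions.

```python
def carry_chain_add(a, b, width=8, cin=0):
    """Add two unsigned integers bit by bit along a modeled carry chain.

    Returns (sum modulo 2**width, carry out of the MSB).
    """
    carry = cin
    total = 0
    for i in range(width):               # walk from LSB to MSB
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        propagate = ai ^ bi              # would be computed in the LUT
        sum_bit = propagate ^ carry      # dedicated XOR on the carry path
        # Carry multiplexer: pass the incoming carry when the bits differ,
        # otherwise the carry out equals either operand bit.
        carry = carry if propagate else ai
        total |= sum_bit << i
    return total, carry
```

With `width=8`, `carry_chain_add(200, 100)` gives `(44, 1)`, that is, 300 modulo 256 with a carry out.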
CLB (Configurable Logic Blocks): Each CLB is connected to one switch matrix, providing access to general routing resources. High level of logic integration. Wide-input functions: 16:1 multiplexer in 1 CLB or any function; 32:1 multiplexer in 2 CLBs (1 level of LUT). Fast arithmetic functions: 2 look-ahead carry chains per CLB column. Addressable shift registers in LUT: 16-bit shift register in 1 LUT; 128-bit shift register in 1 CLB (dedicated shift chain). (Figure: one CLB with a switch matrix, TBUFs, fast connects, CIN/COUT carry and SHIFT chains, and four slices: S0 at X0Y0, S1 at X0Y1, S2 at X1Y0, S3 at X1Y1.)
Four-Input LUT: Implements combinatorial logic. Any 4-input logic function. Cascaded for wide-input functions. (Figure: truth table with inputs A, B, C, D and output Z; LUT = 4-input logic function.)
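The idea that a LUT is just a stored truth table can be sketched in a few lines. The helper name `make_lut4` and the bit ordering (A as the most significant address bit) are illustrative assumptions.

```python
def make_lut4(func):
    """Build a 4-input LUT by precomputing a 16-entry truth table.

    `func` takes four bits (a, b, c, d) and returns 0 or 1.
    """
    table = [
        func((i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1) & 1
        for i in range(16)
    ]

    def lut(a, b, c, d):
        # The inputs only form an address; no logic is evaluated here.
        return table[(a << 3) | (b << 2) | (c << 1) | d]

    return lut
```

Any function of four bits costs the same 16 stored entries, which is why capacity depends on the number of inputs rather than on how complex the function is.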
Multiplexers: MUXF5 combines 2 LUTs to create a 4x1 multiplexer, or any 5-input function (LUT5), or selected functions up to 9 inputs. MUXF6 combines 2 slices to form an 8x1 multiplexer, or any 6-input function (LUT6), or selected functions up to 19 inputs. Dedicated muxes are faster and more space efficient. (Figure: a CLB in which each slice's two LUTs feed a MUXF5, and a MUXF6 combines two slices.)
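Why two LUTs plus one multiplexer cover any 5-input function follows from Shannon expansion on the fifth input. The sketch below is a behavioral illustration only; `lut4_table` and `make_lut5` are hypothetical helper names.

```python
from itertools import product

def lut4_table(func4):
    """Truth table of a 4-input function, indexed with `a` as the MSB."""
    return [func4(*bits) & 1 for bits in product((0, 1), repeat=4)]

def make_lut5(func5):
    """Realize a 5-input function as two 4-input LUTs and a 2:1 mux."""
    lo = lut4_table(lambda a, b, c, d: func5(a, b, c, d, 0))  # cofactor e = 0
    hi = lut4_table(lambda a, b, c, d: func5(a, b, c, d, 1))  # cofactor e = 1

    def lut5(a, b, c, d, e):
        idx = (a << 3) | (b << 2) | (c << 1) | d
        # The fifth input drives the select of the modeled MUXF5.
        return hi[idx] if e else lo[idx]

    return lut5
```

The same step repeated one level up, with a second select input choosing between two such structures, gives the 6-input case.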
CLB Multiplexers: MUXF6 combines slices X0Y0 and X0Y1. MUXF6 combines slices X1Y0 and X1Y1. MUXF7 combines the 2 MUXF6 outputs. MUXF8 combines the 2 MUXF7 outputs (two CLBs). (Figure: CLB multiplexer location, showing the F5, F6, F7, and F8 multiplexers across slices S0 to S3.)
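The multiplexer hierarchy can be sketched as a tree that builds a 16:1 multiplexer from the resources of one CLB. The model assumes each of the eight LUTs implements a 2:1 multiplexer and that the select bits are assigned from the LUT level upward; both choices and the function names are illustrative.

```python
def mux2(sel, d0, d1):
    """2:1 multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0

def mux16(data, sel):
    """16:1 multiplexer built as LUT -> MUXF5 -> MUXF6 -> MUXF7.

    `data` is a sequence of 16 bits and `sel` is a 4-bit integer.
    """
    s = [(sel >> i) & 1 for i in range(4)]
    # Eight LUTs, each used as a 2:1 mux on sel[0].
    lut_out = [mux2(s[0], data[2 * i], data[2 * i + 1]) for i in range(8)]
    # Four MUXF5s (one per slice) form four 4:1 muxes on sel[1].
    f5 = [mux2(s[1], lut_out[2 * i], lut_out[2 * i + 1]) for i in range(4)]
    # Two MUXF6s each combine a pair of slices into an 8:1 mux on sel[2].
    f6 = [mux2(s[2], f5[0], f5[1]), mux2(s[2], f5[2], f5[3])]
    # One MUXF7 combines the two MUXF6 outputs on sel[3].
    return mux2(s[3], f6[0], f6[1])
```

Adding a MUXF8 across two such CLBs would extend the same pattern to 32:1.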
Horizontal Cascade Chain: Wide AND-OR functions (sum of products, SOP). (Figure: three CLBs, each with slices S0 to S3, joined by a horizontal SOP chain with CY and ORCY elements.)
Shift Register: Each LUT can be configured as a shift register. Serial in, serial out. Dynamically addressable delay of up to 16 cycles, for programmable pipelines. Cascade for greater cycle delays. Use CLB flip-flops to add depth. (Figure: a LUT drawn as a chain of D flip-flops with pins IN, CE, CLK, DEPTH[3:0], and OUT.)
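A behavioral model of the addressable shift register may help. This is a sketch, not the primitive's exact timing: the class name `SRL16` mirrors the primitive named later in the deck, while the method names and the convention that address 0 selects the most recently shifted bit are assumptions.

```python
class SRL16:
    """Behavioral sketch of a 16-bit shift register held in one LUT."""

    def __init__(self):
        self.bits = [0] * 16

    def clock(self, din, ce=1):
        """Shift one bit in on a clock edge when clock enable is high."""
        if ce:
            self.bits = [din & 1] + self.bits[:15]

    def out(self, depth):
        """Read the tap selected by the 4-bit DEPTH address (0 to 15)."""
        return self.bits[depth & 0xF]
```

Changing `depth` between clock edges changes the delay without reloading the register, which is what makes the delay dynamically addressable.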
Shift Register: FPGA registers allow for the addition of pipeline stages to increase throughput. Data paths must be balanced to keep the desired functionality. (Figure: Operation A, 4 cycles; Operation B, 8 cycles; Operation C, 3 cycles; 9-cycle imbalance.)
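The balancing arithmetic can be sketched as follows. It assumes that operations A and B sit in series on one path and operation C on the other, which is consistent with the 9-cycle figure but is not stated on the slide; the function name and the 16-cycle depth per LUT are taken as parameters of the illustration.

```python
import math

def balance(long_path, short_path, srl_depth=16):
    """Return (cycle imbalance, shift-register LUTs needed to absorb it).

    Each path is a list of per-operation latencies in clock cycles.
    """
    imbalance = sum(long_path) - sum(short_path)
    if imbalance < 0:
        raise ValueError("long_path must not be shorter than short_path")
    # Each LUT-based shift register can delay by up to `srl_depth` cycles.
    return imbalance, math.ceil(imbalance / srl_depth)
```

Under that assumption, `balance([4, 8], [3])` returns `(9, 1)`: a single LUT-based shift register on the short path absorbs the imbalance.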
Shift Register Look-Up Table: High-density integration of shift registers. DSP applications use SRL16 for delay matching. CDMA wireless and video applications require shift registers. Multiple SRLC16s can be cascaded to any length.
Digital Clock Manager: High-speed 420 MHz clock generation. Clock de-skew on-chip and off-chip.
Digital Clock Manager (DCM): Delay-Locked Loop. Clock phase de-skew. Duty cycle correction. Temperature compensation. RST input, LOCKED output. Up to 4 clock outputs per DCM. Attributes: DUTY_CYCLE_CORRECTION; DLL_FREQUENCY_MODE; CLKDV_DIVIDE = 1.5 to 16.0; STARTUP_WAIT; CLK_FEEDBACK = CLK0 or CLK2X. (Figure: DCM symbol with pins CLKIN, CLKFB, RST, DSSEN, PSINCDEC, PSEN, PSCLK, CLK0, CLK90, CLK180, CLK270, CLK2X, CLK2X180, CLKDV, CLKFX, CLKFX180, LOCKED, STATUS[7:0], PSDONE; legend: clock signal, control signal.)
Advanced Frequency Synthesis: CLKFX is any M/D product of the CLKIN frequency, with M = 2 to 32 and D = 1 to 32. Default: M = 4, D = 1 (4x CLKIN). Always nominal 50/50 duty cycle. Attributes: CLKFX_MULTIPLY (integer); CLKFX_DIVIDE (integer); DFS_FREQUENCY_MODE. After LOCKED: Freq(CLKFX) = (M/D) x Freq(CLKIN).
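The frequency relationship and the attribute ranges can be sketched as a small calculator. The function name `clkfx_mhz` is hypothetical, and the sketch checks only the M and D ranges given on the slide, not the device's supported input or output frequency ranges.

```python
from fractions import Fraction

def clkfx_mhz(clkin_mhz, m=4, d=1):
    """Return the CLKFX frequency in MHz: (M / D) x CLKIN.

    Defaults follow the slide: M = 4, D = 1.
    """
    if not 2 <= m <= 32:
        raise ValueError("CLKFX_MULTIPLY must be between 2 and 32")
    if not 1 <= d <= 32:
        raise ValueError("CLKFX_DIVIDE must be between 1 and 32")
    # Fraction keeps the ratio exact instead of rounding in floating point.
    return clkin_mhz * Fraction(m, d)
```

For example, a 100 MHz input with M = 7 and D = 5 gives 140 MHz, and the defaults turn 50 MHz into 200 MHz.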
High Resolution Phase Shifting: Fine phase shifting applies to all CLK outputs. Phase shift = a fraction of the CLKIN period. Fixed or variable modes. Inputs in variable mode: PSINCDEC = increase/decrease; PSEN = enable phase shift; PSCLK synchronizes the phase shift. Output: PSDONE. Attributes: CLKOUT_PHASE_SHIFT = NONE, FIXED, VARIABLE; PHASE_SHIFT (signed integer) = -255 to +255.
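As a rough illustration of how the PHASE_SHIFT value maps to time, the sketch below assumes the shift equals PHASE_SHIFT/256 of the CLKIN period. The slide only says "fraction of the CLKIN period", so the denominator of 256 is an assumption to check against the device data sheet, and the function name is hypothetical.

```python
def phase_shift_ns(phase_shift, clkin_mhz):
    """Convert a PHASE_SHIFT attribute value to a delay in nanoseconds.

    Assumes shift = (PHASE_SHIFT / 256) x CLKIN period (an assumption).
    """
    if not -255 <= phase_shift <= 255:
        raise ValueError("PHASE_SHIFT must be between -255 and +255")
    period_ns = 1000.0 / clkin_mhz
    return phase_shift / 256 * period_ns
```

Under that assumption, PHASE_SHIFT = 64 on a 100 MHz input corresponds to a quarter period, or 2.5 ns.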
Global Clocks: Up to 16 dedicated low-skew clocks.
Clock Distribution: 16 global clock multiplexers, eight on the top and eight on the bottom. Switch “glitch free” from one clock to the other. 8 clocks selectable per quadrant (NW, NE, SW, SE). Unused branches are disabled (power saving). (Figure: 8 BUFGMUX at the top and 8 BUFGMUX at the bottom drive 16 clocks, with 8 max per quadrant.)
Use Global Buffers to Reduce Clock Skew: Global buffers are connected to dedicated routing. This routing network is balanced to minimize skew. All Xilinx FPGAs have global buffers. (Figure captions: design contains 2 clock signals; introduces clock skew between CLK1 and CLK2; uses an extra BUFG to reduce skew on CLK2.)
Global Clocks: BUFGMUX. Three modes: (1) Clock buffer: low-skew clock distribution; BUFG primitive. (2) Clock enable: stop the clock High or Low; BUFGCE (stop Low). (3) Clock multiplexer: “glitch-free” switch from one clock to another, including unrelated clocks; BUFGMUX primitive. No pulse width shorter than 1/2 of the period. (Figure: BUFG with pins I and O; BUFGCE with pins I, CE, and O; BUFGMUX with pins I0, I1, S, and O.)
Memory: Terabit Memory Continuum
On-chip SelectRAM™ memory, distributed RAM: bytes; 128x1; DSP coefficients, small FIFOs, shallow/wide CAM.
On-chip SelectRAM™ memory, block RAM: kilobytes; 18 kb blocks; large FIFOs, packet buffers, video line buffers, cache tag memory, deep/wide CAM.
External RAM/CAM: megabytes; up to 400 Mbps/pin; DDR & QDR.
Embedded 18 kb Block RAM: Up to 3 Mb of on-chip block RAM. High internal buffering bandwidth. Reduced I/O count and more embedded memory.
Distributed RAM: A CLB LUT is configurable as distributed RAM. A LUT equals a 16x1 RAM. Implements single- and dual-port RAM. Cascade LUTs to increase RAM size. Synchronous write. Synchronous or asynchronous read; the accompanying flip-flops are used for synchronous read. (Figure: primitives RAM16X1S with pins D, WE, WCLK, A0 to A3, O; RAM32X1S with pins D, WE, WCLK, A0 to A4, O; RAM16X2S with pins D0, D1, WE, WCLK, A0 to A3, O0, O1; RAM16X1D with pins D, WE, WCLK, A0 to A3, DPRA0 to DPRA3, SPO, DPO.)
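The write and read behavior described above can be sketched with a small model of the dual-port primitive. The class name follows the RAM16X1D primitive on the slide; the method names are assumptions, and the model ignores timing.

```python
class RAM16X1D:
    """Behavioral sketch of a 16x1 dual-port distributed RAM."""

    def __init__(self):
        self.mem = [0] * 16

    def write_clock(self, we, a, d):
        """Synchronous write: on a WCLK edge, store d at address a if WE is high."""
        if we:
            self.mem[a & 0xF] = d & 1

    def spo(self, a):
        """Asynchronous read at the write-port address (SPO output)."""
        return self.mem[a & 0xF]

    def dpo(self, dpra):
        """Asynchronous read at the independent read address (DPO output)."""
        return self.mem[dpra & 0xF]
```

Registering `spo` or `dpo` in the accompanying flip-flop would turn the asynchronous read into a synchronous one.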
18 x 18 Embedded Multiplier: Fast arithmetic functions. Optimized to implement multiply/accumulate modules.
18 x 18 Multiplier: Embedded 18-bit x 18-bit multiplier. 2's complement signed operation. Multipliers are organized in columns. (Figure: inputs Data_A (18 bits) and Data_B (18 bits); output (36 bits).)
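The signed operation and the output width can be checked with a short model. The function names `to_signed` and `mult18x18` are hypothetical; inputs and outputs are raw bit patterns, as they would appear on the buses.

```python
def to_signed(value, bits):
    """Interpret the low `bits` of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def mult18x18(data_a, data_b):
    """Multiply two 18-bit two's complement patterns into a 36-bit pattern."""
    product = to_signed(data_a, 18) * to_signed(data_b, 18)
    return product & ((1 << 36) - 1)
```

The largest-magnitude product, (-2^17) x (-2^17) = 2^34, fits within the 36-bit signed output.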
Basic I/O Block Structure
I/O Signal Types. Single-ended: LVTTL, LVCMOS, HSTL, SSTL. Differential: LVDS, Bus LVDS, LVPECL. Note: only the popular I/O types are shown here.
IOB: Double Data Rate Registers. DDR registers can be clocked by: clock and not(clock), if the duty cycle is 50/50; or the CLK0 and CLK180 DLL outputs. (Figure: waveform with CLK; DATA_1 = D1A, D1B, D1C; DATA_2 = D2A, D2B, D2C; dual data rate output = D1A, D2A, D1B, D2B, D1C.)
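The interleaving shown in the waveform can be sketched as below. Which clock edge carries which stream is an assumption here (DATA_1 on the rising edge, DATA_2 on the falling edge), and the function name is hypothetical.

```python
def ddr_output(data_1, data_2):
    """Interleave two single-rate streams into one double-rate stream."""
    out = []
    for d1, d2 in zip(data_1, data_2):
        out.append(d1)  # assumed to be driven on the rising edge (CLK0)
        out.append(d2)  # assumed to be driven on the falling edge (CLK180)
    return out
```

Two values leave the pin per clock period, so the output stream is twice as long as either input stream.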
Built-In HSTL II Support: What is the advantage of using HSTL Class II? High-speed I/O interface. Bidirectional. Double parallel termination. (Figure: Z0 = 50 Ω, Vtt = 0.75 V, Vref = 0.75 V, R = 50 Ω.)
Digitally Controlled Impedance: Dynamically adjusted termination resistors. Provides drivers that are matched to the impedance of the traces. Provides on-chip termination for the transmitter or receiver. On-chip termination advantages: no termination resistors on the board; improved signal integrity by eliminating stub reflection; no need for source termination (single-ended I/O); fewer board routing headaches and a lower component count.
Virtex-II Family: Four and Six Columns Block RAM & Multiplier Device XC2V250
Virtex-II Family Members: devices with 6, 4, or 2 columns of BRAM and multipliers.
Virtex-II Packaging: FF and BF are flip-chip ball grid array packages. Pinout compatibility applies inside the same color rectangle.