Introduction To VIRTEX II Architecture Presented By: Ankur Agarwal.

Presentation transcript:


Xilinx Design Flow
1. Plan & Budget
2. Create Code/Schematic
3. HDL RTL Simulation
4. Synthesize to create netlist
5. Functional Simulation
6. Implement: Translate, Map, Place & Route
7. Attain Timing Closure
8. Timing Simulation
9. Create Bit File

Xilinx Architecture Features
- High performance at 2.5 V, 3.3 V, and 5 V
- Technology independence: EDIF, VHDL, Verilog, SDF interface
- Footprint compatibility: devices within each family are compatible with each other
- Pin locking

VIRTEX
- Up to 2 million system gates at 100+ MHz
- Features:
  - Distributed and block RAM available
  - Low power
  - Delay-Locked Loops (DLLs)
  - 2.5 V internal operation with support of common power

Naming Conventions
Example: XC4028XL-3-BG256
- XC40: family (4000, 9500)
- 28: number of gates
- XL: sub-family (3V = XL, 5V = no XL)
- 3: speed grade
- BG256: package
- Spartan part names start with XCS
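
The naming fields above can be pulled apart mechanically. The following is a minimal sketch that only handles names shaped like the slide's example; the regular expression, the function name `parse_part`, and the dictionary keys are illustrative assumptions, not a Xilinx tool or an exhaustive grammar for part numbers.

```python
import re

# Hypothetical pattern: covers only names shaped like XC4028XL-3-BG256.
PART_RE = re.compile(
    r"^XC(?P<family>\d{2})(?P<gates>\d+)(?P<sub>[A-Z]*)"
    r"-(?P<speed>\d+)-(?P<package>[A-Z]+\d+)$"
)

def parse_part(name):
    """Split a part name into the fields listed on the slide."""
    match = PART_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized part name: {name!r}")
    sub = match.group("sub")
    # Slide rule: 3V = XL, 5V = no XL. Anything else is left unknown.
    if sub == "XL":
        supply = "3V"
    elif sub == "":
        supply = "5V"
    else:
        supply = "unknown"
    return {
        "family": match.group("family"),
        "gates": match.group("gates"),
        "sub_family": sub,
        "supply": supply,
        "speed_grade": match.group("speed"),
        "package": match.group("package"),
    }

print(parse_part("XC4028XL-3-BG256"))
```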

CPLD and FPGA
Complex Programmable Logic Device (CPLD) compared with Field-Programmable Gate Array (FPGA)
- Architecture
  - CPLD: PAL/22V10-like; more combinational
  - FPGA: gate array-like; more registers + RAM
- Density
  - CPLD: low-to-medium; K logic gates
  - FPGA: medium-to-high; 1K to 3.2M system gates
- Performance
  - CPLD: predictable timing; up to 250 MHz today
  - FPGA: application dependent; up to 200 MHz
- Interconnect
  - CPLD: "crossbar switch"
  - FPGA: incremental

Overview of Xilinx FPGA Architecture
- Programmable interconnect
- I/O Blocks (IOBs)
- Configurable Logic Blocks (CLBs)
- Tristate buffers
- Global resources

Block Diagram of VIRTEX-II Architecture
[Figure labels: 18 Kb BRAM, CAM, multiplier, BLVDS backplane, PCI-X, DDR CAM, QDR SRAM, DDR SDRAM, distributed RAM, LVDS, shift registers, DCM, FIFO, PCI, SONET/SDH]

CLB Resources
- The basic resource unit is the logic cell
- A CLB contains several logic cells; the number depends on the device family
- Logic cell = 4-input look-up table (LUT) + D flip-flop
- LUT capacity is limited by the number of inputs, not by the complexity of the function
- LUTs can be used as ROM or synchronous RAM
- The flip-flop can be configured as a transparent latch in Virtex and Spartan-II

Closer Look at a CLB Structure
- Each slice has 2 LUT-FF pairs with associated carry logic
- Two 3-state buffers (BUFT) are associated with each CLB and are accessible by all CLB outputs

Interconnect Technology Offered by VIRTEX-II
- The interconnect is an array of switch matrices
- All Virtex-II features can access routing resources through the switch matrix
- This simplifies design and place & route
- [Figure: a switch matrix is attached to each CLB, IOB, DCM, 18 Kb BRAM, and 18x18 multiplier]

Simplified SLICE Structure
- Each slice has four outputs: two registered and two non-registered
- Two BUFTs are associated with each CLB, accessible by all 16 CLB outputs
- Carry logic for fast addition
- Two independent carry chains per CLB

Fast Carry Logic
- Each CLB contains separate logic and routing for the fast generation of carry signals
- Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters
- Carry logic is independent of normal logic and routing resources
- [Figure: carry logic and routing run from LSB to MSB]
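
To see why a dedicated carry path helps adders, here is a minimal behavioral sketch of a carry chain running from LSB to MSB. The per-bit split into a propagate signal (formed in the LUT), a carry multiplexer, and a sum XOR is an assumption about how such a chain is commonly organized; the slide itself only states that carry logic is separate from normal logic and routing. The function name `carry_chain_add` is hypothetical.

```python
def carry_chain_add(a, b, width, carry_in=0):
    """Add two unsigned values bit by bit along a carry chain (LSB first)."""
    carry = carry_in
    total = 0
    for i in range(width):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        propagate = a_i ^ b_i          # would be computed in the LUT
        sum_i = propagate ^ carry      # sum bit: XOR with the incoming carry
        # Carry mux: pass the incoming carry when the bits differ,
        # otherwise generate (both 1) or kill (both 0) the carry.
        carry = carry if propagate else a_i
        total |= sum_i << i
    return total, carry                # (sum, carry out of the MSB)

print(carry_chain_add(5, 3, 4))    # 4-bit addition of 5 and 3
print(carry_chain_add(15, 1, 4))   # overflow shows up in the carry out
```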

CLB (Configurable Logic Blocks)
- Four slices per CLB: S0 (X0Y0), S1 (X0Y1), S2 (X1Y0), S3 (X1Y1), linked by fast connects
- Each CLB is connected to one switch matrix, providing access to general routing resources
- High level of logic integration
- Wide-input functions
  - 16:1 multiplexer in 1 CLB or any function
  - 32:1 multiplexer in 2 CLBs (1 level of LUT)
- Fast arithmetic functions
  - 2 look-ahead carry chains per CLB column
- Addressable shift registers in LUT
  - 16-bit shift register in 1 LUT
  - 128-bit shift register in 1 CLB (dedicated shift chain)

Four-Input LUT
- Implements combinatorial logic
- Any 4-input logic function
- Cascaded for wide-input functions
- LUT = truth table of a 4-input logic function (inputs A, B, C, D; output Z)
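
A 4-input LUT is simply a 16-entry truth table addressed by its inputs, which is why capacity depends on input count rather than on how complicated the function is. The sketch below models that idea in software; `make_lut4` is a hypothetical helper name, and the input ordering (A as the most significant address bit) is an arbitrary choice for illustration.

```python
def make_lut4(func):
    """Precompute the 16-entry truth table of a 4-input Boolean function."""
    table = []
    for index in range(16):
        a = (index >> 3) & 1
        b = (index >> 2) & 1
        c = (index >> 1) & 1
        d = index & 1
        table.append(int(bool(func(a, b, c, d))))

    def lut(a, b, c, d):
        # Evaluation is a pure table lookup, whatever the function was.
        return table[(a << 3) | (b << 2) | (c << 1) | d]

    return lut

# Two very different functions cost exactly one LUT each.
parity = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
and_or = make_lut4(lambda a, b, c, d: (a & b) | (c & d))

print(parity(1, 0, 1, 1), and_or(1, 1, 0, 0))
```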

Multiplexers
- MUXF5 combines 2 LUTs to create
  - a 4x1 multiplexer
  - or any 5-input function (LUT5)
  - or selected functions up to 9 inputs
- MUXF6 combines 2 slices to form
  - an 8x1 multiplexer
  - or any 6-input function (LUT6)
  - or selected functions up to 19 inputs
- Dedicated muxes are faster and more space efficient
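
The reason two LUTs plus MUXF5 give any 5-input function is that the fifth input can select between two 4-input cofactors. The following sketch, assuming nothing beyond that idea, builds both a generic 5-input function and a 4x1 multiplexer this way. The helper names (`build_lut4`, `read_lut4`, `make_func5`, `mux4`) are hypothetical.

```python
from itertools import product

def build_lut4(func):
    """Return the 16-entry truth table of a 4-input Boolean function."""
    return [int(bool(func(a, b, c, d)))
            for a, b, c, d in product((0, 1), repeat=4)]

def read_lut4(table, a, b, c, d):
    return table[(a << 3) | (b << 2) | (c << 1) | d]

def make_func5(func5):
    """Any 5-input function: one LUT per value of e, selected by MUXF5."""
    lut_e0 = build_lut4(lambda a, b, c, d: func5(a, b, c, d, 0))
    lut_e1 = build_lut4(lambda a, b, c, d: func5(a, b, c, d, 1))

    def evaluate(a, b, c, d, e):
        low = read_lut4(lut_e0, a, b, c, d)
        high = read_lut4(lut_e1, a, b, c, d)
        return high if e else low      # the MUXF5 step
    return evaluate

def mux4(d0, d1, d2, d3, s1, s0):
    """4x1 mux: each LUT is a 2x1 mux on s0, MUXF5 selects on s1."""
    mux2 = build_lut4(lambda x0, x1, s, unused: x1 if s else x0)
    low = read_lut4(mux2, d0, d1, s0, 0)
    high = read_lut4(mux2, d2, d3, s0, 0)
    return high if s1 else low

majority5 = make_func5(lambda a, b, c, d, e: (a + b + c + d + e) >= 3)
print(majority5(1, 1, 0, 0, 1), mux4(0, 1, 0, 0, 0, 1))
```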

CLB Multiplexers
CLB multiplexer locations:
- Each slice (S0 to S3) has an F5 multiplexer
- MUXF6 combines slices X0Y0 & X0Y1
- MUXF6 combines slices X1Y0 & X1Y1
- MUXF7 combines the 2 MUXF6 outputs
- MUXF8 combines the 2 MUXF7 outputs (two CLBs)

Horizontal Cascade Chain
- Wide AND-OR functions (sum of products, SOP)
- [Figure: the SOP chain passes horizontally through slices S0 to S3 of adjacent CLBs, using the carry chain (CY) and ORCY]

Shift Register
- Each LUT can be configured as a shift register
  - Serial in, serial out
- Dynamically addressable delay up to 16 cycles
  - For programmable pipeline
- Cascade for greater cycle delays
- Use CLB flip-flops to add depth
- [Figure: LUT as a chain of D flip-flops with IN, CE, CLK, DEPTH[3:0], and OUT]
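
A behavioral sketch of a LUT used as a 16-stage shift register with a dynamically addressable tap may help here. The class name `ShiftRegisterLUT` and its methods are hypothetical, and the convention that address 0 selects a one-cycle delay and address 15 a 16-cycle delay is an assumption chosen to match the "up to 16 cycles" statement above.

```python
class ShiftRegisterLUT:
    """Software model of a 16-bit serial-in, serial-out shift register."""

    def __init__(self):
        self.bits = [0] * 16

    def clock(self, data_in, ce=1):
        """One rising clock edge; shifts only when clock enable is high."""
        if ce:
            self.bits = [data_in & 1] + self.bits[:-1]

    def out(self, depth):
        """Read the tap selected by the 4-bit DEPTH address."""
        return self.bits[depth & 0xF]

srl = ShiftRegisterLUT()
for bit in (1, 0, 0, 0, 0):
    srl.clock(bit)

# The 1 shifted in five clocks ago is now visible at address 4.
print(srl.out(4), srl.out(3))
```

Changing the address between clocks changes the delay without reloading the register, which is what makes the pipeline depth programmable.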

Shift Register
- The FPGA's registers allow pipeline stages to be added to increase throughput
- Data paths must be balanced to keep desired functionality
- [Figure: on a 64-bit data path, Operation A (4 cycles) followed by Operation B (8 cycles) runs in parallel with Operation C (3 cycles), leaving a 9-cycle imbalance that a shift register makes up]
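
The balancing arithmetic on this slide can be written out as a short sketch. It assumes the reading that Operations A and B run in series (4 + 8 cycles) against Operation C (3 cycles), that "64" is the data-path width in bits, and that each bit of delay is built from 16-stage LUT shift registers; the function names are hypothetical.

```python
import math

def imbalance(path_a_cycles, path_b_cycles):
    """Cycles of delay to add to the shorter path so both paths align."""
    return abs(path_a_cycles - path_b_cycles)

def luts_for_delay(delay_cycles, width_bits, stages_per_lut=16):
    """LUT shift registers needed to delay a bus by delay_cycles."""
    if delay_cycles == 0:
        return 0
    return math.ceil(delay_cycles / stages_per_lut) * width_bits

long_path = 4 + 8      # Operation A then Operation B
short_path = 3         # Operation C
delay = imbalance(long_path, short_path)

print(delay, luts_for_delay(delay, 64))
```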

Shift Register Look-Up Table
- High-density integration of shift registers
- DSP applications use SRL16 for delay matching
- CDMA wireless and video applications require shift registers
- Multiple SRLC16 are cascadable to any length

Digital Clock Manager High-Speed 420 MHz clock generation: Clock de-skew on-chip and off-chip

Digital Clock Manager: DCM
Delay-Locked Loop
- Clock phase de-skew
- Duty cycle correction
- Temperature compensation
- RST input, LOCKED output
- Up to 4 clock outputs per DCM
- Attributes: DUTY_CYCLE_CORRECTION, DLL_FREQUENCY_MODE, CLKDV_DIVIDE = 1.5 to 16.0, STARTUP_WAIT, CLK_FEEDBACK = CLK0 or CLK2X
- [Figure: DCM block with ports CLKIN, CLKFB, RST, DSSEN, PSINCDEC, PSEN, PSCLK, CLK0, CLK90, CLK180, CLK270, CLK2X, CLK2X180, CLKDV, CLKFX, CLKFX180, LOCKED, STATUS[7:0], PSDONE; clock and control signals are distinguished]

Advanced Frequency Synthesis
Frequency synthesis
- CLKFX is any M / D product of the CLKIN frequency
  - M = 2 to 32, D = 1 to 32
  - Default: M = 4, D = 1 (4X CLKIN)
- Always nominal 50/50 duty cycle
- Attributes: CLKFX_MULTIPLY (integer), CLKFX_DIVIDE (integer), DFS_FREQUENCY_MODE
- After LOCKED: Freq CLKFX = (M/D) x Freq CLKIN
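
The M/D relation above is easy to explore numerically. This is a minimal sketch that applies only the ranges given on the slide (M = 2 to 32, D = 1 to 32); it does not check any device-specific limits on input or output frequency, and the function names `clkfx_mhz` and `best_md` are hypothetical.

```python
from fractions import Fraction

def clkfx_mhz(clkin_mhz, m=4, d=1):
    """CLKFX frequency after LOCKED: (M / D) x CLKIN."""
    if not (2 <= m <= 32 and 1 <= d <= 32):
        raise ValueError("M must be 2..32 and D must be 1..32")
    return clkin_mhz * Fraction(m, d)

def best_md(clkin_mhz, target_mhz):
    """Search all (M, D) pairs for the one closest to the target.

    Ties are broken in favor of the smallest M.
    """
    candidates = ((m, d) for m in range(2, 33) for d in range(1, 33))
    return min(
        candidates,
        key=lambda md: (abs(clkin_mhz * Fraction(md[0], md[1]) - target_mhz),
                        md[0]),
    )

print(clkfx_mhz(50))        # default M=4, D=1
print(best_md(50, 70))      # multiplier and divider for 70 MHz from 50 MHz
```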

High Resolution Phase Shifting
Fine phase shifting
- Applies to all CLK outputs
- Phase shift = fraction of the CLKIN period
- Fixed or variable modes
- Inputs in variable mode:
  - PSINCDEC = increase/decrease
  - PSEN = enable phase shift
  - PSCLK synchronizes the phase shift
- PSDONE output
- Attributes: CLKOUT_PHASE_SHIFT = NONE, FIXED, or VARIABLE; PHASE_SHIFT (signed integer) = -255 to +255
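
As a worked example of "phase shift = fraction of the CLKIN period", the sketch below converts a PHASE_SHIFT setting into time. It assumes the fraction is PHASE_SHIFT / 256 of the CLKIN period, which is consistent with the -255 to +255 range but is not stated on the slide; the function name is hypothetical.

```python
def phase_shift_ns(phase_shift, clkin_period_ns, steps_per_period=256):
    """Convert a signed PHASE_SHIFT value into a delay in nanoseconds.

    steps_per_period=256 is an assumption, not taken from the slide.
    """
    if not -255 <= phase_shift <= 255:
        raise ValueError("PHASE_SHIFT must be within -255..+255")
    return phase_shift / steps_per_period * clkin_period_ns

# 100 MHz CLKIN has a 10 ns period.
print(phase_shift_ns(64, 10.0))     # a quarter period later
print(phase_shift_ns(-128, 10.0))   # half a period earlier
```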

Global Clocks
- Up to 16 dedicated low-skew clocks

Clock Distribution
- 16 global clock multiplexers (BUFGMUX)
  - Eight on the top
  - Eight on the bottom
- Switch "glitch free" from one clock to the other
- 8 clocks selectable per quadrant (NW, NE, SW, SE)
- Unused branches are disabled (power saving)

Use Global Buffers to Reduce Clock Skew
- Global buffers are connected to dedicated routing
- This routing network is balanced to minimize skew
- All Xilinx FPGAs have global buffers
- Example from the figure:
  - The design contains 2 clock signals
  - Without buffering, this introduces clock skew between CLK1 and CLK2
  - Using an extra BUFG reduces skew on CLK2

Global Clocks: BUFGMUX
Three modes:
- Clock buffer
  - Low-skew clock distribution
  - BUFG primitive (ports I, O)
- Clock enable
  - Stop the clock High or Low
  - BUFGCE (stop Low; ports I, CE, O)
- Clock multiplexer
  - "Glitch-free" switch from one clock to another
  - BUFGMUX (ports I0, I1, S, O), works with unrelated clocks
  - No pulse width shorter than 1/2 of the period

Memory
Terabit Memory Continuum
- On-chip SelectRAM™ memory
  - Distributed RAM (bytes): 128x1, DSP coefficients, small FIFOs, shallow/wide CAM
  - Block RAM (kilobytes): 18 kb blocks, large FIFOs, packet buffers, video line buffers, cache tag memory, deep/wide CAM
- External RAM/CAM (megabytes): up to 400 Mbps/pin, DDR & QDR

Embedded 18 kb Block RAM
- Up to 3 Mb on-chip block RAM
- High internal buffering bandwidth
- Reduced I/O count and more embedded memory

Distributed RAM
- CLB LUT configurable as distributed RAM
  - A LUT equals a 16x1 RAM
  - Implements single and dual ports
  - Cascade LUTs to increase RAM size
- Synchronous write
- Synchronous/asynchronous read
  - Accompanying flip-flops used for synchronous read
- Primitives shown in the figure:
  - RAM16X1S: D, WE, WCLK, A0 to A3; output O
  - RAM32X1S: D, WE, WCLK, A0 to A4; output O
  - RAM16X2S: D0, D1, WE, WCLK, A0 to A3; outputs O0, O1
  - RAM16X1D: D, WE, WCLK, A0 to A3, DPRA0 to DPRA3; outputs SPO, DPO
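
The write and read behavior listed above can be illustrated with a small software model of a 16x1 dual-port memory: writes happen only on a clock edge with WE high, while both read ports are plain lookups. The class and method names are hypothetical, and the model is a behavioral sketch built on the port names of RAM16X1D, not a description of the primitive's timing.

```python
class DualPortRam16x1:
    """Behavioral model: synchronous write, asynchronous dual-port read."""

    def __init__(self):
        self.cells = [0] * 16

    def clock(self, we, addr, data):
        """Rising WCLK edge: store data at A[3:0] when WE is high."""
        if we:
            self.cells[addr & 0xF] = data & 1

    def spo(self, addr):
        """Single-port output: reads the write address A[3:0]."""
        return self.cells[addr & 0xF]

    def dpo(self, dpra):
        """Dual-port output: reads the independent address DPRA[3:0]."""
        return self.cells[dpra & 0xF]

ram = DualPortRam16x1()
ram.clock(we=1, addr=5, data=1)
ram.clock(we=0, addr=6, data=1)   # WE low, so nothing is stored

print(ram.spo(5), ram.dpo(5), ram.dpo(6))
```

Registering `spo` or `dpo` in the accompanying flip-flop would turn the read into a synchronous one.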

18 x 18 Embedded Multiplier
- Fast arithmetic functions
- Optimized to implement multiply/accumulate modules

18 x 18 Multiplier
- Embedded 18-bit x 18-bit multiplier
- 2's complement signed operation
- Multipliers are organized in columns
- Inputs: Data_A (18 bits), Data_B (18 bits); output: 36 bits
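
The width relationship (two 18-bit signed operands, one 36-bit signed product) can be checked with a short sketch. It models only the arithmetic, treating inputs and outputs as raw bit patterns; the function names are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a 2's complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value

def mult18x18(data_a, data_b):
    """Multiply two 18-bit patterns, returning the 36-bit product pattern."""
    product = to_signed(data_a, 18) * to_signed(data_b, 18)
    return product & ((1 << 36) - 1)

# 0x3FFFF is -1 in 18-bit 2's complement.
raw = mult18x18(0x3FFFF, 2)
print(hex(raw), to_signed(raw, 36))
```

The most negative operand squared, (-2^17)^2 = 2^34, still fits in 36 signed bits, so no product can overflow the output.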

Basic I/O Block Structure

I/O Signal Types
- Single-ended: LVTTL, LVCMOS, HSTL, SSTL
- Differential: LVDS, Bus LVDS, LVPECL
NOTE: Only the popular I/O types are shown here

IOB: Double Data Rate Registers
- DDR registers can be clocked by
  - Clock and not(Clock), if the duty cycle is 50/50
  - CLK0 and CLK180 DLL outputs
- [Figure: DATA_1 carries D1A, D1B, D1C and DATA_2 carries D2A, D2B, D2C; the dual data rate output is D1A, D2A, D1B, D2B, D1C]
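
The interleaving shown in the figure can be sketched as follows: one word from DATA_1 goes out on the rising edge of the clock and one word from DATA_2 on the rising edge of the inverted (or CLK180) clock. The function name `ddr_output` is hypothetical and the model ignores timing entirely.

```python
def ddr_output(data_1, data_2):
    """Interleave two single-rate streams into one double-rate stream."""
    stream = []
    for first, second in zip(data_1, data_2):
        stream.append(first)    # launched on the CLK0 rising edge
        stream.append(second)   # launched on the CLK180 rising edge
    return stream

out = ddr_output(["D1A", "D1B", "D1C"], ["D2A", "D2B", "D2C"])
print(out)
```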

Built-In HSTL II Support
What is the advantage of using HSTL Class II?
- High-speed I/O interface
- Bi-directional
- Double parallel termination
- [Figure: Z0 = 50 Ω, Vtt = 0.75 V, Vref = 0.75 V, R = 50 Ω]

Digitally Controlled Impedance
- Dynamically adjusted termination resistors
- Provides drivers that are matched to the impedance of the traces
- Provides on-chip termination for transmitter or receiver
- On-chip termination advantages:
  - No termination resistors on the board
  - Improves signal integrity by eliminating stub reflection
  - Eliminates the need for source termination (single-ended I/O)
  - Reduces board routing headaches and component count

Virtex-II Family: Four and Six Columns Block RAM & Multiplier Device XC2V250

Virtex-II Family Members
- 6 columns of BRAM & multipliers
- 4 columns of BRAM & multipliers
- 2 columns of BRAM & multipliers

VIRTEX-II Packaging
- FF and BF are flip-chip ball grid array packages
- Pinouts are compatible inside the same color rectangle