Designing for 100+ MHz with Xilinx Virtex

Presentation transcript:

Designing for 100+ MHz with Xilinx Virtex

1999 Designs Demand...
- Higher system speed
- Higher integration
  - smaller size, less power, better reliability
- Lower cost
- Shorter development time
- Better product differentiation

Traditional Multi-Chip Boards
- Discrete design components
  - CPU, memory
  - bus transceivers, PCI controller, FIFOs
  - Ethernet controller, graphics accelerator, MPEG, DSP, etc.
  - programmable logic as glue and custom function
- Advantages:
  - well-documented, sophisticated functions
  - readily available as IP in silicon

Multi-Chip Board Problems
- Physical size
- Power consumption and reliability
- PC board signal integrity
- Limited flexibility
  - prevents design modifications and upgrades
  - prevents product diversification
  - prevents product customization
- Poor product differentiation
  - standard parts = standard architecture

The FPGA Solution
- 4th-generation FPGA: logic + memory + routing
- Multi-standard SelectI/O
- Temperature sensing
- Delay-Locked Loop for fast clock and I/O
- 3.3 ns synchronous dual-port SRAM
- 500 Mbps SelectMAP configuration

DLLs Maximize I/O Speed
- Clock-to-output time plus set-up time determines the I/O speed and data bandwidth
  - min clock period = max clock-to-out + max set-up
- Traditional solution:
  - use highly buffered, balanced clock trees
    - needed to reduce internal clock skew
    - cannot totally eliminate the delay
- The Virtex solution:
  - use a Delay-Locked Loop (DLL)
    - aligns the internal and external clocks
    - effectively eliminates the clock-distribution delay
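
The period formula on this slide can be turned into a quick estimate. The sketch below is illustrative only: the function name and the timing values are assumptions, not figures from the Virtex data sheet.

```python
def max_io_frequency_mhz(clock_to_out_ns: float, setup_ns: float) -> float:
    """Highest clock rate at which clock-to-out plus set-up fits in one period."""
    min_period_ns = clock_to_out_ns + setup_ns  # min clock period from the slide
    return 1000.0 / min_period_ns               # a period in ns gives MHz


# Hypothetical numbers chosen only to show the effect of a shorter clock-to-out.
SETUP_NS = 2.0
without_dll = max_io_frequency_mhz(4.0, SETUP_NS)
with_dll = max_io_frequency_mhz(1.5, SETUP_NS)
print(f"without DLL: {without_dll:.0f} MHz, with DLL: {with_dll:.0f} MHz")
```

Removing the clock-distribution delay shortens the clock-to-out term, which is the only term the DLL changes in this simple model.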

Virtex Has 4 Independent DLLs
- DLLs adjust the clock delay to align the internal and external clocks
  - digital closed-loop control
  - 25 to 200 MHz range, 35-picosecond resolution
Diagram labels: Clock, Data, Comparator, Error, Delay, CLB, IOB

LVTTL Data Rate with DLL
- 1.4 ns measured clock-to-output delay
- Output standard = LVTTL Fast 16 mA (OBUF_F_16)
- Temp = 100 C, Vdd = 2.375 V, Vcco = 3.3 V
- Waveforms: 1: CLKIN, 2: DATA OUT (no DLL), 3: DATA OUT (DLL deskewed)

Timing    w/o DLL   w/ DLL
r->r      3.9 ns    1.4 ns
r->f      3.9 ns    1.4 ns

Other DLL Functions
- Double the incoming clock frequency
  - fast internal operation with a slow external clock
- Clock mirroring to the PCB
- Divide clock by 1.5, 2, 2.5, 3, 4, 5, 8, or 16
- Adjust clock duty cycle to 50%
- Create four quadrature clock phases
  - input four sequential bits per clock period

Duty Cycle Correction
- ~25% duty cycle in, 50% duty cycle out
Diagram: 25 MHz clock at 25% duty cycle into the Virtex FPGA DLL (1X); 25 MHz clock at 50% duty cycle out

Clock Doubling and Mirroring
- Clock mirror with less than 100 ps skew
  - simplifies PCB clock distribution
Diagram (actual HDTV customer example): a 37 MHz system clock with 1 input load drives DLL 1 and DLL 2 in the Virtex FPGA through a zero-delay internal clock buffer; outputs are 74 MHz #1 and 74 MHz #2 to the SDRAMs, plus 74 MHz and 37 MHz internal clocks, exactly aligned

Precise Clock Mirroring
- 2x system clock for board use
Diagram: 66 MHz clock into the Virtex FPGA DLL (2X); 132 MHz clock out

Clock Division
- Divide clock by 1.5, 2, 2.5, 3, 4, 5, 8, or 16
  - maintain synchronous edges
Diagram: CLKIN 200 MHz, CLKOUT 200 MHz, CLKDV 12.5 MHz
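
A small sketch of the divider arithmetic, using the divisor list from the slide. The function and constant names are hypothetical; this models only the frequency relationship, not the DLL itself.

```python
from fractions import Fraction

# Divisors listed on the slide: 1.5, 2, 2.5, 3, 4, 5, 8, 16.
CLKDV_DIVISORS = [Fraction(3, 2), Fraction(2), Fraction(5, 2), Fraction(3),
                  Fraction(4), Fraction(5), Fraction(8), Fraction(16)]


def clkdv_mhz(clkin_mhz, divisor):
    """Divided clock frequency for one of the supported divisors."""
    d = Fraction(divisor)
    if d not in CLKDV_DIVISORS:
        raise ValueError(f"unsupported divisor: {divisor}")
    return float(Fraction(clkin_mhz) / d)


print(clkdv_mhz(200, 16))  # the slide's example: 200 MHz in, 12.5 MHz out
```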

Multi-Standard SelectI/O
- Standards: GTL+, SSTL, LVTTL
- Voltages: 1.8 V, 2.5 V, 3.3 V, 5 V (5-V tolerant)
- Interfaces: microprocessor, SRAM, SDRAM, FLASH, DSP, mixed signal, busses/backplanes (3/5 V PCI, ISA, GTL…)

Mix & Match Output Standards
- User-supplied voltages determine output swing
  - 3.3 V, 2.5 V, 1.5 V
  - one voltage per bank
  - a bank is half of a chip edge
- Output characteristics are programmable on a per-pin basis
  - push-pull or open-drain
  - LVTTL drive strength
    - 2-mA to 24-mA sink and source current
  - LVTTL slew rate

Mix & Match Input Standards
- Internal or user-supplied threshold voltage
  - selectable on a per-pin basis
  - one user-supplied threshold voltage per bank
- Programmable over-voltage protection
  - 5-V tolerant or diode clamp to VCCO
  - selectable on a per-pin basis
Diagram labels: Internal Reference, VREF Input, VREF

SSTL Clock-to-Out With DLL
- 200 MHz inter-chip data rate
  - SSTL 3, Class II (Stub Series Terminated Logic)
  - IOB register to IOB register
Diagram: Clock, DLL, and D/Q registers between Virtex FPGAs, with delays of 2.8 ns, 1.9 ns, and 0.3 ns

SSTL Data Rate with DLL
- 1.3 ns measured clock-to-output delay
  - much lower noise than LVTTL
- Output standard = SSTL 3 Class 2 (OBUF_SSTL3_II)
- Temp = 100 C, Vdd = 2.375 V, Vcco = 3.3 V, Vtt = 1.5 V
- Waveforms: 1: CLKIN, 2: DATA OUT (no DLL), 3: DATA OUT (DLL deskewed)

Timing    w/o DLL   w/ DLL
r->r      3.5 ns    1.1 ns
r->f      3.8 ns    1.3 ns

‘Redefining the FPGA’: From FPGA to System Component
"Virtex moves FPGAs from glue to system component" - Ron Neale, EE
Diagram: Virtex as chip 1 with x1 CLK and x2 CLK, connected to a high-speed system backplane (GTL+), a low-voltage CPU (LVCMOS), SDRAM at 133 MHz (LVTTL), and cache SRAM of Mbytes (SSTL3)

Power and Thermal Issues
- Power and heat are serious concerns
- All CMOS power consumption is dynamic
  - proportional to VCC squared
  - proportional to capacitance
  - proportional to frequency
- Virtex conserves power
  - 2.5-V supply voltage
  - small geometries and short interconnects reduce capacitance
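
The three proportionalities combine into the usual P ∝ C · V² · f relation. The sketch below shows how much the supply voltage alone matters; the 3.3 V comparison point is an assumption chosen for illustration, and the function name is hypothetical.

```python
def dynamic_power_ratio(v_new, v_old, c_ratio=1.0, f_ratio=1.0):
    """Relative dynamic power when P is proportional to C * V**2 * f."""
    return c_ratio * (v_new / v_old) ** 2 * f_ratio


# Assumed comparison: a 3.3 V supply versus the 2.5 V supply named on the slide,
# with capacitance and frequency held equal.
ratio = dynamic_power_ratio(2.5, 3.3)
print(f"{ratio:.2f}")  # 0.57, i.e. a bit over half the power
```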

Virtex Power Consumption
- Virtex is designed to conserve power
  - 100 MHz 16-bit counters
    - 12.5 MHz average transition rate
    - 6.5 mW per counter including clock distribution
  - 100 MHz 8-bit counters
    - 25 MHz average transition rate
    - 5 mW per counter including clock distribution
Chart values for XCV300 and XCV1000 (counter counts not preserved): 2.5 W, 3.7 W, 9.8 W, and 14.7 W total
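
The average transition rates quoted here follow from how a binary counter toggles: bit i changes once every 2^i clocks. A minimal sketch of that arithmetic, assuming a plain binary up-counter (the function name is hypothetical):

```python
def average_transition_rate_mhz(bits: int, clock_mhz: float) -> float:
    """Average per-bit toggle rate of a binary counter."""
    # Bit i (LSB = 0) changes once every 2**i clock cycles.
    toggles_per_clock = sum(1 / 2**i for i in range(bits))
    return clock_mhz * toggles_per_clock / bits


print(round(average_transition_rate_mhz(16, 100), 1))  # about 12.5 MHz
print(round(average_transition_rate_mhz(8, 100), 1))   # about 24.9 MHz
```

The whole counter sees just under two toggles per clock regardless of width, so the per-bit average roughly halves each time the width doubles.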

Thermal Management
- Temperature-sensing diode
  - matched to the Maxim MAX1617 A/D
  - programmable alarms
  - similar to the Pentium II solution
Diagram: Virtex FPGA DXP and DXN pins to the Maxim MAX1617, which provides SMBCLK, SMBDATA, and ALERT

Power Supply Decoupling
- CMOS power-supply current is dynamic
  - current pulse every active clock edge
- Peak current can be 5x the average current
  - instantaneous current peaks can only be supplied by decoupling capacitors
- Use one 0.1 µF ceramic chip capacitor for each power-supply pin
  - low L and R are more important than high C
  - double up for lower L and R if necessary
  - use direct vias to the supply planes, close to the power-supply pins

Virtex Configuration
- New byte-wide SelectMAP mode
  - up to 528 Mbps at 66 MHz
    - simple handshake protocol
  - up to 400 Mbps at 50 MHz
    - no handshake required
- Configuration bit-stream length
  - 0.5 Mbits to 6.1 Mbits
Diagram: configuration EPROM and control logic (EPLD) driving the Virtex FPGA; signals Address, Data, WE, CS, Busy
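
The SelectMAP rates are simply the byte-wide bus multiplied by the clock rate, and they give a lower bound on configuration time. The sketch below ignores any protocol overhead or start-up time, and the function names are hypothetical.

```python
def selectmap_rate_mbps(clock_mhz: float, bus_width_bits: int = 8) -> float:
    """Raw transfer rate of a byte-wide configuration port."""
    return clock_mhz * bus_width_bits


def config_time_ms(bitstream_mbits: float, rate_mbps: float) -> float:
    """Best-case time to shift in a bit-stream, ignoring all overhead."""
    return 1000.0 * bitstream_mbits / rate_mbps


fast = selectmap_rate_mbps(66)   # 528 Mbps, with handshake
slow = selectmap_rate_mbps(50)   # 400 Mbps, no handshake
print(f"{config_time_ms(6.1, fast):.1f} ms for the largest bit-stream at {fast:.0f} Mbps")
print(f"{config_time_ms(0.5, slow):.2f} ms for the smallest bit-stream at {slow:.0f} Mbps")
```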

Volts, Amps, and Watts: Recap
- PCB design issues
  - minimize capacitance for higher speed
  - terminate transmission lines to reduce ringing
- Chip inputs and outputs
  - use DLLs to maximize I/O bandwidth
  - use SelectI/O to interface with different standards
- Power and thermal considerations
  - use the sensing diode to manage chip temperature
  - decouple the power supply well
- Configuration
  - configure faster with the SelectMAP mode

Spending the 10 ns Budget
- Fast logic requires fast function generators
  - signals often pass through several function generators
- Routing delays must also be kept short
  - there are routing delays between every function generator
- Arithmetic delays are important
  - carry chains often create critical paths

You Don’t Have To Be An Expert
- You don’t have to be an FPGA architecture expert to implement high-performance designs
  - the benefits of a good architecture are automatic
    - all the logic goes faster
    - software provides easy access to the features
- You can achieve high performance only with a good FPGA architecture
  - a good FPGA empowers its users
- You’ll design better if you know the architecture
  - matching your design style to the available features increases performance and/or lowers cost

Virtex CLB
- Logic and arithmetic delay reduction demands improvements in the CLB
- Virtex CLB is divided into two slices, each with:
  - 2 function generators
  - 2 flip-flops
  - 2 bits of carry logic
Diagram: four function generators, each paired with carry logic

Fast Function Generators
- Each function generator emulates 2 to 3 levels of logic
  - a 10-level logic path typically requires 3 to 5 function generators in series
  - at 100 MHz, they must be less than 2 ns each including the routing
- Virtex has 0.6-ns function generators
  - leaves 1.4 ns for each route
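
The budget on this slide is straightforward division. A minimal sketch, assuming (as the slide does) that the whole clock period is shared equally among the function generators in series; register set-up and clock-to-out are ignored, and the function name is hypothetical.

```python
def routing_budget_ns(clock_mhz: float, generators_in_series: int,
                      generator_delay_ns: float = 0.6) -> float:
    """Routing time left per stage after the function generator delay."""
    period_ns = 1000.0 / clock_mhz
    per_stage_ns = period_ns / generators_in_series
    return per_stage_ns - generator_delay_ns


for n in (3, 4, 5):
    print(n, "generators:", round(routing_budget_ns(100, n), 2), "ns per route")
```

With five generators in series at 100 MHz, each stage gets 2 ns, which leaves the 1.4 ns per route quoted on the slide.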

Connecting Function Generators
- Some functions need several function generators
  - F5 MUXs connect pairs of function generators
    - functions with 5 to 9 inputs
  - F6 MUXs connect all 4 function generators
    - functions with 6 to 17 inputs
Diagram: four function generators combined through F5 and F6 MUXs

Fast Local Routing
- Local routing provides fast interconnects
  - in a CLB, function generators connect with minimal routing delays
  - fast paths between adjacent CLBs increase flexibility
Diagram: two adjacent CLBs, each with four function generators and carry logic

Use Pipelining for Speed
- Shorter clock periods mean doing less each period
  - create a pipeline structure
  - pipeline stages operate concurrently
  - more functions are done at the same time
  - throughput increases
- All function generators have output flip-flops
  - most pipeline support is “free”

16-Bit Pipeline in One LUT
- In directly cascaded pipelines the flip-flops are not free
- One SRLUT can implement up to 16 bits of delay
  - shift data in and select the appropriate tap
Diagram: 16-bit shift register with Input, Output, and Delay Select
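
A behavioral sketch of the shift-register LUT idea: data shifts in one bit per clock, and a 4-bit select picks which stage drives the output. This is a software model under assumed naming (`SRL16`, `shift_in`, `tap`), not a description of the actual circuit.

```python
class SRL16:
    """Software model of a 16-stage shift register with a selectable tap."""

    def __init__(self):
        self.bits = [0] * 16  # stage 0 holds the most recent bit

    def shift_in(self, bit: int) -> None:
        # One clock: the new bit enters stage 0 and the oldest bit falls off.
        self.bits = [bit & 1] + self.bits[:15]

    def tap(self, address: int) -> int:
        # Address a reads the bit shifted in a + 1 clocks ago.
        return self.bits[address]


srl = SRL16()
for bit in (1, 0, 0, 0):
    srl.shift_in(bit)
print(srl.tap(3))  # the 1 that entered four clocks ago
```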

Fast Logic Needs Fast Routing
- Our typical design with 3 to 5 CLBs needed an average routing delay of 1.4 ns or less
  - the Virtex routing architecture delivers this performance
- Delay is independent of direction
  - dependably short delays

Go Farther, Faster
- Virtex achieves its speed through a hierarchy of highly buffered routing resources
  - wires span 1, 2, or 6 CLBs
- The Virtex routing architecture is designed for large arrays
  - today’s FPGAs are big… but tomorrow’s will be even bigger
- Virtex is designed to maintain its performance even in very large arrays

No Routing Congestion
- For high-speed applications, routing must be dependably fast
  - not just capable of being fast
- In the past, high device utilization has caused routing congestion
  - critical nets might be forced to meander
- Virtex minimizes these problems
  - abundant resources prevent congestion
If it needs to be fast, it will be fast, automatically!

Built-in Tri-State Busses
- Bi-directional busses are supported directly by tri-state buffers built into each CLB
  - two drivers per CLB
  - segmentable every four CLB columns

Arithmetic: A Special Case
- Adders, accumulators, counters, and comparators all depend on carry chains
- Carry-chain logic is usually much deeper than the rest of the design
  - 32 levels for a 16-bit ripple adder
  - too deep to use function generators at 100 MHz
  - arithmetic delays would limit performance
- Dedicated carry logic provides the desired speed
  - 16-bit adders can operate at up to 200 MHz register-to-register

Wide Arithmetic
- 64-bit adders would require 128 levels of logic
  - expensive complex carry schemes would be needed to preserve performance
- Virtex minimizes the carry propagation delay
  - 100 ps per bit pair
  - zero routing delay between CLBs
- Minimal performance loss for each extra bit
  - 16-bit adders operate at up to 200 MHz
  - 64-bit adders operate at up to 135 MHz
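
The two adder speeds and the carry delay quoted here can be checked against each other. The sketch below assumes the entire difference in clock period between the 16-bit and 64-bit adders is carry propagation; that attribution is an assumption, and the function name is hypothetical.

```python
def carry_delay_ps_per_bit_pair(narrow_bits: int, narrow_mhz: float,
                                wide_bits: int, wide_mhz: float) -> float:
    """Implied carry delay per 2 bits from two adder widths and clock rates."""
    extra_period_ps = 1e6 / wide_mhz - 1e6 / narrow_mhz  # added time per cycle
    extra_bit_pairs = (wide_bits - narrow_bits) / 2
    return extra_period_ps / extra_bit_pairs


print(round(carry_delay_ps_per_bit_pair(16, 200, 64, 135)))  # close to 100 ps
```

Going from 5 ns to about 7.4 ns for 24 extra bit pairs works out to roughly 100 ps per pair, which agrees with the figure on the slide.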

Fast Address Decoders
- Wide address decoders could slow operation
  - wide AND gates with invertible inputs
- Virtex carry-chain MUXs can act as AND gates
  - combine function generator ANDs
- 64-bit decoders operate at up to 155 MHz

Speed Is Never Wasted
- You can never have too much performance
  - excess performance can always be traded for size and cost reduction
- Replace single-cycle functions with smaller multi-cycle versions
  - a 2-cycle multiplier is half the cost of a single-cycle multiplier
Reduce costs by designing down to the performance you need

Creating a High-Speed Clock
- Logic sometimes needs to operate faster than the available clock
  - multiple RAM accesses in a single cycle
  - low-speed PCB clock distribution for power or noise reduction
- Virtex DLLs can double and redouble incoming clocks
Diagram: two DLLs (DLL1 and DLL2), each 2X, with clocks of 45 MHz, 90 MHz, and 180 MHz

Optimized for the Future
- Deep sub-micron technology permits larger and larger array sizes
  - poses new circuit-design challenges
  - changes the rules of FPGA architecture
- Across-chip routing is the most vulnerable
  - could easily limit design performance
- Virtex is designed for long-term growth
  - even long, across-chip routes will remain fast
Virtex is tomorrow’s FPGA … today!

10 ns is Long Enough
- Virtex CLBs can implement relatively complex functions in 10 ns
  - 0.6 ns per 4-input function generator
- Virtex offers fast interconnections
  - even across-chip when fully utilized
  - fast tri-state buses
- Support for very fast arithmetic operations
  - 16-bit adders at 200 MHz

Implement Designs Automatically
- You don’t have to be an FPGA wizard to use Virtex
- Virtex is optimized for automated implementation
  - uniform structure
    - efficient mapping/synthesis
  - ample routing
    - simple placement and no congestion
  - predictable performance
    - effective synthesis
- IP cores speed design even more
  - validated functionality with guaranteed performance

100+ MHz Memory
- Virtex memory operates up to 200 MHz
- High-speed memory has two benefits
  - data storage
    - “work-in-progress”
    - input/output buffers, FIFOs
  - accelerating complex functions
    - store pre-computed values in look-up tables

Data Storage Hierarchy
Virtex supports 3 levels of memory hierarchy
- On-chip SelectRAM+
  - small-to-medium memories
  - 0.6-ns read access time
- On-chip Block SelectRAM+
  - larger memories
  - true dual-ported operation
  - 3.3-ns read access time
- Fast SelectI/O interfaces to external RAM
  - DLL boosts memory bandwidth

SelectRAM+
- SelectRAM+ uses CLB LUTs as user memory
  - 16-deep RAMs
  - 32-deep RAMs
  - 16-deep dual-ported RAMs
  - 16-deep shift registers
- Cascadable for larger memories
  - 128 or more words deep
  - uses logic resources for expansion

Block SelectRAM+
- Up to 32 dual-ported 4096-bit RAM blocks
  - synchronous read and write
- True dual-port memory
  - each port has full read and write capability
  - different clocks for each port
- Configurable aspect ratio
  - trade width for depth
    - 4096 x 1 bit to 256 x 16 bits
  - separate configurations for each port
- Dedicated routing for memory expansion
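
Trading width for depth keeps the block at 4096 bits. The sketch below enumerates the shapes between the two endpoints given on the slide, assuming the intermediate widths are the powers of two in between; the function name is hypothetical.

```python
def aspect_ratios(block_bits: int = 4096, max_width: int = 16):
    """(depth, width) pairs for a fixed-size block, doubling the width each step."""
    ratios = []
    width = 1
    while width <= max_width:
        ratios.append((block_bits // width, width))
        width *= 2
    return ratios


for depth, width in aspect_ratios():
    print(f"{depth} x {width}")
```

Configuring the two ports with different shapes, for example 512 x 8 on one side and 256 x 16 on the other, is what gives the width/rate conversion mentioned for I/O buffers.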

High-Speed Memory Interfaces
- SelectI/O and DLLs together provide fast access to many types of external memory
- Xilinx currently offers two reference designs
  - fully synthesized
  - automatic placement and routing
- SDRAM … up to 125 MHz
- ZBTRAM (Zero Bus Turn-around) … up to 143 MHz

Input/Output Data Buffers
- High-performance systems need data buffers to decouple internal operation from I/O activity
  - I/O may be sporadic (burst-mode busses)
  - I/O may be faster or slower
  - I/O may be wider or narrower
- I/O buffers can take several forms
  - dual-ported RAMs
  - ping-pong buffers
  - FIFOs

Dual-ported I/O Buffers
- Block SelectRAM+ is ideal for I/O buffers
  - dual-ported operation
    - independent clocks and controls
    - bridges between clock domains
    - simultaneous read and write
  - port-specific aspect-ratio control
    - built-in rate/width conversions
- SelectRAM+ provides similar benefits on a smaller scale

Ping Pong Buffers
- Ping-pong buffers are pairs of blocks that alternate between input and processing
- SRLUT for small buffers
  - self-addressing input
  - 0.6-ns read access
- Larger buffers can use the dual-ported Block RAM
  - one address bit alternates read/write areas
  - 3.3-ns read access
Diagram: two 16-bit shift registers with Input, Select, Read Address, and Output

Small FIFOs in SRLUTs
- Small FIFOs can be implemented in SRLUTs
  - word count addresses the output data
  - increment and enable SRLUT to push
  - decrement to pop
  - enable only for both
- 16-byte FIFO in 4 CLBs
  - 16 x 16 in 6 CLBs
  - 200+ MHz
- Expandable for deeper FIFOs
Diagram: 16-bit shift register with Input and Output, addressed by an up/down word counter driven by Push and Pop
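
The push/pop rules listed here can be modeled in a few lines. This is a behavioral sketch under assumed names (`SrlFifo`, `clock`), with the output read at address count - 1 so that the oldest word comes out first; error handling for empty and full conditions is an addition for the model, not something the slide specifies.

```python
class SrlFifo:
    """Software model of a FIFO built from a shift register and a word counter."""

    DEPTH = 16

    def __init__(self):
        self.cells = [None] * self.DEPTH  # cell 0 holds the newest word
        self.count = 0                    # number of valid words

    def clock(self, push=False, pop=False, data=None):
        """One clock cycle; returns the popped word, or None when not popping."""
        if pop and self.count == 0:
            raise IndexError("pop from empty FIFO")
        if push and not pop and self.count == self.DEPTH:
            raise OverflowError("push to full FIFO")
        out = self.cells[self.count - 1] if pop else None  # count addresses output
        if push:
            self.cells = [data] + self.cells[:-1]          # enable the shift register
        if push and not pop:
            self.count += 1                                # push: increment and enable
        elif pop and not push:
            self.count -= 1                                # pop: decrement only
        return out                                         # both: enable only


fifo = SrlFifo()
for word in (10, 20, 30):
    fifo.clock(push=True, data=word)
print(fifo.clock(pop=True))                      # 10
print(fifo.clock(push=True, pop=True, data=40))  # 20
```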

Large FIFOs in Block RAM
- Large FIFOs can use the dual-ported block RAM
  - add read and write address counters
- Asynchronous push and pop
- Different port sizes give rate-for-width conversion
- Block RAM FIFOs can operate at up to 170 MHz including flag logic
Diagram: Block SelectRAM+ with Input and Output data ports, address counters enabled by Push and Pop, WE, and control logic producing Full and Empty flags

Pre-computing for Speed
- Some functions are too complex for 10-ns logic implementation
  - pipelining is not always possible
- An alternative is to pre-compute all the possible results and store them in memory
  - select a result according to the inputs
- Function time is independent of complexity
  - 0.6 ns SelectRAM+ access time
  - 3.3 ns Block SelectRAM+ access time
- The function table can be smaller than the logic

Multiplication By A Constant
- Sometimes, data has to be “scaled”
  - multiplied by a constant value
- A full multiplier is too expensive
  - it can multiply by a variable
  - unnecessarily general and too complex
- Storing all multiples of the constant is a better alternative
  - smaller and much faster
Diagram: a multiplier array with Constant and Input producing Scaled Data, compared with a product table addressed by Input producing Scaled Data

- A word product table is impractical
  - partition the input into nibbles
    - use 16-word LUTs for nibble products
    - combine the partial products in adders
- Roughly half the CLBs of a full multiplier
  - for a 16-bit coefficient: 36 CLBs vs. 62 CLBs
- Pipeline the adders for extra speed
Diagram labels: Scaled Data, Input, LUT, x16, x256, x bit Scaler
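
The nibble-partitioned scaler can be sketched in software: one 16-entry table of multiples serves every nibble, and the partial products are shifted into place and added. The function names are hypothetical, and the model ignores output width limits and pipelining.

```python
def make_product_table(constant: int):
    """All 16 multiples of the constant, one per possible nibble value."""
    return [constant * n for n in range(16)]


def scale(value: int, table, width: int = 16) -> int:
    """Multiply by the table's constant using one lookup per input nibble."""
    result = 0
    for shift in range(0, width, 4):
        nibble = (value >> shift) & 0xF
        result += table[nibble] << shift  # weight the partial product by 16**k
    return result


table = make_product_table(1234)
print(scale(0xBEEF, table) == 0xBEEF * 1234)  # True
```

A 16-bit input needs four lookups and three additions, which is why the table approach needs fewer resources than a general multiplier.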

Changing the Constant
- The SRLUT mode can be used to update the table
  - “push-only” stack
  - last 16 bits loaded define the table
- A simple accumulator computes all products of a new constant
Diagram: constant register and accumulator register with Load, Clear, and Constant Change controls feeding the Input of a 16-bit shift register with Output

Large Function Tables
- Larger functions can be implemented in the Block SelectRAM+
  - 12-input functions
  - micro-coded state machines
- Data tables can also be implemented
  - sine/cosine tables for DSP, for example
  - dual-ported access gives the sine and cosine simultaneously
  - a simple address offset gives 90º phase shift for accessing sine and cosine from a single table
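
The address-offset trick relies on cos(x) = sin(x + 90º). A minimal sketch, assuming a table that covers one full period; the table size and names are illustrative choices, not values from the slides.

```python
import math

N = 1024  # assumed table depth, covering one full period
SINE = [math.sin(2 * math.pi * i / N) for i in range(N)]


def sin_cos(address: int):
    """Read sine and cosine from one table using two ports a quarter period apart."""
    sine = SINE[address % N]
    cosine = SINE[(address + N // 4) % N]  # +90 degrees is a quarter of the table
    return sine, cosine


s, c = sin_cos(100)
print(abs(c - math.cos(2 * math.pi * 100 / N)) < 1e-9)  # True
```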

Block RAM/ROM Creation
- CORE Generator software creates RAMs and ROMs
  - simple GUI
- Initialization file is loaded into RAMs and ROMs at configuration time

Memory Summary
- Virtex has two kinds of internal memory
  - distributed SelectRAM+ for small RAMs
  - Block SelectRAM+ for larger RAMs
- SelectRAM+
  - 0.6 ns read access time
  - 16- and 32-word RAMs
  - 16-word dual-ported RAMs
  - 16-word shift registers
    - sequential write/random read
    - FIFOs, pipelining, LUT functions
- Dual-ported 4096-bit Block SelectRAM+
  - 3.3 ns read access time
  - true dual-ported operation
    - both ports are read/write
    - ports can be clocked asynchronously
  - configurable aspect ratio
    - 4096 x 1 bit to 256 x 16 bits
    - configure ports differently for width/rate conversion
- High-speed SelectI/O access to external RAM

Designing for 100+ MHz
- Volts, Amps, and Watts
  - DLLs and flexible I/O standards
  - fast inter-chip communication
  - simple rules for good signal integrity
- Ones and zeros
  - fast logic and fast interconnect
  - dependable high performance
- Bits and bytes
  - distributed SelectRAM+
  - dual-ported Block SelectRAM+

The Virtex Family The complete Virtex Data Sheet is on your AppLinx CD-ROM and at