® E is the Edge.

DLLs

The Need for Clock Management. As system speeds increase, we can no longer ignore clock skew and noise problems. A 2 ns clock skew matters more with a 6 ns clock than it does with a 20 ns clock. We need a way to control clock skew and decrease the effect of noise on the clock. (Diagram labels: Higher Speed, Skew & Noise Problems.) Notes: Purpose: Introduce the customer to the need for clock management. Why do we need to control the clock delay? The next slide talks about ways of controlling the clock.
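
To put numbers on the skew comparison above, here is a minimal sketch that computes the share of each clock period consumed by a fixed 2 ns skew. The function name is illustrative and does not come from the slides.

```python
def skew_fraction(skew_ns: float, period_ns: float) -> float:
    """Return the share of one clock period consumed by clock skew."""
    if period_ns <= 0:
        raise ValueError("period must be positive")
    return skew_ns / period_ns


# The two cases from the slide: a 2 ns skew on a 20 ns and on a 6 ns clock.
for period_ns in (20.0, 6.0):
    share = skew_fraction(2.0, period_ns)
    print(f"{period_ns:.0f} ns clock: 2 ns skew uses {share:.0%} of the period")
```

The same 2 ns takes a tenth of the 20 ns period but a third of the 6 ns period, which is why skew control matters more as clocks get faster.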

Ways to Manage the Clock. DLLs: all digital; triggered by the incoming clock edge; create output jitter of less than 50 ps; less susceptible to analog noise; easily transferable from one process technology to another. PLLs: use an analog VCO; can suppress incoming clock jitter; add undefined output jitter; susceptible to analog noise; not easily transferable from one process technology to another. Notes: Purpose: Introduce the customer to the two most common ways of managing a clock: DLLs and PLLs. Don't go into too much detail on the differences. This is just a high-level overview, not a competitive comparison with Altera. Next slide: technical DLL info.

DLL Basics. A DLL works by inserting delay on the clock net until the next clock input rising edge is in phase with the clock feedback rising edge. It requires a well-designed, low-skew clock distribution network so that the clock edges arrive simultaneously everywhere in the part. (Block diagram labels: CLKIN, Delay, Phase Delay Control, CLKOUT, Clock Distribution Network, CLKFB.) Notes: The DLL inserts delay until the delayed feedback clock aligns with the input clock. At that point the DLL is locked.
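
The locking behaviour described above can be sketched as a simple search over delay-line taps. This is a toy model, not the Virtex-E circuit: the 50 ps tap size, the lock criterion (within half a tap), and the function name are all assumptions made for illustration.

```python
def dll_lock(period_ps, distribution_delay_ps, tap_ps=50, max_taps=10_000):
    """Add delay-line taps until the feedback edge aligns with an input edge.

    Returns (inserted_delay_ps, residual_error_ps) once the model is locked.
    """
    for taps in range(max_taps):
        inserted_ps = taps * tap_ps
        # Position of the feedback edge within one input clock period.
        phase_ps = (inserted_ps + distribution_delay_ps) % period_ps
        # Distance to the nearest input rising edge, early or late.
        error_ps = min(phase_ps, period_ps - phase_ps)
        if 2 * error_ps <= tap_ps:  # within half a tap: treat as locked
            return inserted_ps, error_ps
    raise RuntimeError("delay line too short to reach lock")


# 100 MHz clock (10,000 ps period) with 3,200 ps of distribution delay.
print(dll_lock(10_000, 3_200))  # prints (6800, 0)
```

In this model the inserted delay plus the distribution delay adds up to one full clock period, so the clock at the flip-flops appears to have zero delay relative to CLKIN.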

DLL Functions Virtex Clock Phase Synthesis For Use Internally Or Externally Clock Mirror Zero-Delay Board Clock Buffer Virtex Speedup Tc2o Zero-Delay Internal Clock Buffer Clock Multiplication & Division

DLL Tclock-to-out Speedup. (Diagram labels: CLKext, DLL, CLKint, D/Q flip-flop, OUT; Tclock = 0 ns; Tc2q + Tout = Tc2o.) Nullify clock delay: fast Tc2o on XCV1000. The external CLKext pin and the internal CLKint pin are aligned. 2.5 ns setup / 0.0 ns hold and 3.5 ns Tc2o on all devices. Optional duty cycle correction: 50/50 duty cycle correction is applied when specified.

DLL Multiplication. Generate 2x and 4x clocks. (Diagram labels: 16, 16, 32, Data Buffer, Internal Logic, IO, 2x CLK.) Reduce board EMI and trace concerns by routing low-frequency clocks externally and multiplying internally. Cross clock domains without worry: multiplied and divided clocks have synchronized edges. No external clock drift and minimal external clock skew.

DLL Division. Selectable division values: 1.5, 2, 2.5, 3, 4, 5, 8, or 16. 50/50 duty cycle correction available. Use a DLL pair to combine functions. (Diagram: 30 MHz input; outputs labelled 180, 2X and DV2. Phase shift: 30 MHz with 180° phase shift. Clock divide: 15 MHz (divide by 2). Clock multiply: 60 MHz (multiply by 2). A 30 MHz 180°-shifted clock is used for DLL feedback.)

System Synchronization. Synchronize all devices. Eliminate board clock skew. Nullifies clock input and board delay in addition to internal distribution delay. Removes chip-to-chip race conditions. Increases chip-to-chip interface speed: 240 MHz for Virtex-E. (Diagram: CLK distributed to the DLLs of FPGA 1, FPGA 2, FPGA 3 ... FPGA N.)

DLL Applications. Clock-to-out speedup: high-speed memory interfaces; high-speed chip-to-chip requirements. Clock multiplication/division: multiply the clock internally so that the external clock is slower, thus decreasing signal integrity problems on the board. Clock phase shift and duty cycle correction: double data rate applications; generation of multiple clocks. Clock mirroring: generate extra external clocks for fanout issues; board-level clock management.

Virtex-E DLL Modes. Low frequency: input frequency range 25 MHz to 160 MHz; maximum output frequency 320 MHz; minimum high/low time 2.0 ns*; all 6 outputs available for use internally and externally (CLK0, CLK90, CLK180, CLK270, CLK2X, CLKDV). High frequency: input frequency range 60 MHz to 320 MHz; minimum high/low time 1.3 ns*; 3 outputs available for use internally and externally (CLK0, CLK180 and CLKDV). Both modes are supported with simple design primitives. VHDL and Verilog simulation support is available. * Varies with frequency.
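
As a quick way to check a clocking plan against the mode limits above, the following sketch encodes the two modes and the division values from the DLL Division slide. The table layout and function name are hypothetical, and the sketch assumes that every listed division value is usable in both modes, which the slides do not state.

```python
DLL_MODES = {
    "low": {"fin_mhz": (25.0, 160.0),
            "outputs": {"CLK0", "CLK90", "CLK180", "CLK270", "CLK2X", "CLKDV"}},
    "high": {"fin_mhz": (60.0, 320.0),
             "outputs": {"CLK0", "CLK180", "CLKDV"}},
}
DIVIDE_VALUES = (1.5, 2, 2.5, 3, 4, 5, 8, 16)


def dll_output_mhz(mode, fin_mhz, output, divide=2):
    """Return the frequency of one DLL output, or raise if the plan is invalid."""
    spec = DLL_MODES[mode]
    lowest, highest = spec["fin_mhz"]
    if not lowest <= fin_mhz <= highest:
        raise ValueError(f"{fin_mhz} MHz is outside the {mode}-frequency input range")
    if output not in spec["outputs"]:
        raise ValueError(f"{output} is not available in {mode}-frequency mode")
    if output == "CLK2X":
        return fin_mhz * 2
    if output == "CLKDV":
        if divide not in DIVIDE_VALUES:
            raise ValueError(f"unsupported division value {divide}")
        return fin_mhz / divide
    return fin_mhz  # CLK0/CLK90/CLK180/CLK270 only shift the phase


print(dll_output_mhz("low", 30.0, "CLK2X"))     # 60.0, as on the division slide
print(dll_output_mhz("low", 30.0, "CLKDV", 2))  # 15.0
```

Requesting CLK2X in high-frequency mode raises an error in this sketch, matching the slide's list of only three outputs for that mode.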

DLL Software Support. Use the BUFGDLL macro for common clock usage. Build complex structures using the CLKDLL primitive. (Diagram, BUFGDLL equivalent structure: PAD, IBUFG, DLL with FB, BUFG, to the distributed clock network at 0 ns. CLKDLL pins: CLKIN, CLKFB, RST, CLK0, CLK90, CLK180, CLK270, CLK2X, CLKDV, LOCKED.)

What happens if the CLKIN phase shifts? The outputs will phase shift 1-4 clock edges after the CLKIN shifts. Due to this delay, inter-chip communication could have problems, since the clock sources are not aligned. LOCKED will stay asserted and the control logic will remain at the previous setting. Advice: Keep the phase shift to a longer LOW pulse.

What happens if the CLKIN changes frequency? The control logic may not be able to catch period changes of 1.0 ns or more. The outputs may start to destabilize as the control logic tries to adjust the delay lines to compensate. What to do: Make sure that a change of frequency is followed by a reset of the CLKDLL. The LOCKED signal may or may not change.

What happens if the operating temperature changes? The DLL will automatically adjust for temperature variance DLL specs are guaranteed for chip temperatures between 0ºC and 85ºC

Why can’t I mux the CLKIN line? The CLKIN input must come from an IBUFG, a BUFG driven from another CLKDLL, or a DLLIOB. If a LUT or other route is placed in the circuit, the CLKDLL cannot adjust for this unknown delay. What to do: Route the net out of the chip and into an IBUFG or DLLIOB.

DLL Information XAPP132: Using the Virtex DLL XAPP400: DLL usage in Software http://www.xilinx.com/apps/virtexapp.htm

Differential Signaling LVDS, LVPECL, BusLVDS

Moore’s Law at Work: Blasting Thru the 100M Transistor Barrier. XCV1000: 75M transistors. XCV2000E: 125M transistors. XCV3200E: 211M transistors. (Chart axes: 100M and 200M transistors; 1998, 1999, 2000.)

I/O Bandwidth Trends. (Chart: bandwidth in MB/s, with axis ticks at 10, 100, 1,000 and 10,000, plotted against year from 1986 to 2002 for Ethernet, SCSI, PCI, PCI-X and Internet Backbone.) Notes: Purpose: Introduce the customer to the rising I/O bandwidth trends. Next slide: the problem of noise.

I/O Signaling. Single-ended: TTL, HSTL, SSTL. Differential: LVDS, BLVDS, LVPECL. Notes: Purpose: intro to I/O standards. Two big divisions: single-ended and differential. Next slide: the problem that single-ended I/Os have.

The Problem. As the process shrinks, the absolute I/O noise margin shrinks as well. (Chart: logic 1 and logic 0 levels on a 1 V to 5 V scale for 5 V CMOS, 3.3 V CMOS and 1.8 V CMOS, with 1.6 V, 1.0 V and 0.86 V marked.) Notes: Purpose: Educate the customer on the problem of noise on the board. 1. As speeds go up, process technology shrinks. As the process shrinks, power supply voltages go down, as do logic level thresholds. This also lowers the absolute noise margin on the I/Os. 2. Most customers have chips of different process technologies on the same board, which means multiple power supplies at different voltages. Each device generates noise on the board, up to its power supply value. The 5 V and 3.3 V devices will affect the 1.8 V devices greatly.

Differential Signaling: The Solution. Differential I/O signaling has higher noise immunity. The data is transmitted in the voltage difference between two lines. Noise affects both lines, but the voltage difference stays about the same, which means that the data is not affected by the noise. Notes: Purpose: intro to differential signaling. 1. The idea behind differential signaling is higher noise immunity. 2. Explain the third bullet: Data (1 or 0) is 'coded' into the voltage difference between the two lines. Noise affects a line in such a way that the voltage on the line rises or falls. For single-ended transmission, the noise might change the voltage enough to cross the threshold and change the logic value (from 1 to 0 and vice versa). Differential lines are affected by noise as well. The noise raises or lowers the voltage levels of both lines equally (if the lines are close enough on the board), so the voltage difference between the lines stays about the same (very little variance). Since the voltage difference is the same, the data has not changed value (from 1 to 0 and vice versa).
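
The argument above can be illustrated with a small numeric sketch. The differential levels (+/- 175 mV around a 1.25 V midpoint) are taken from the Virtex-E LVDS Signaling slide; the 0.9 V single-ended threshold, the 1.0 V noise amplitude, and the function names are assumptions chosen only to make the effect visible.

```python
def read_single_ended(voltage_v, threshold_v):
    """A single-ended receiver compares the line against a fixed threshold."""
    return 1 if voltage_v > threshold_v else 0


def read_differential(v_pos, v_neg):
    """A differential receiver only looks at the sign of the difference."""
    return 1 if v_pos > v_neg else 0


NOISE_V = -1.0  # hypothetical common-mode noise hitting every line equally

# A logic 1 sent on a 1.8 V single-ended line (assumed 0.9 V threshold).
single = read_single_ended(1.8 + NOISE_V, threshold_v=0.9)

# A logic 1 sent differentially: 1.25 V midpoint, +/- 175 mV swing.
differential = read_differential(1.425 + NOISE_V, 1.075 + NOISE_V)

print(single, differential)  # prints 0 1
```

The single-ended receiver misreads the bit because the noise pushes the line below its threshold, while the differential receiver still sees the positive line 350 mV above the negative one.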

Differential Signaling: The Benefits. High noise immunity: a huge benefit. Low power. High-speed I/O transfer. Low EMI: noise due to switching cancels between the two lines, since both lines switch at the same time, in opposite directions. Notes: Purpose: Introduce the customer to the benefits of differential signals. 1. Low EMI is a nice side effect of differential signaling. Since the differential lines switch in opposite directions at the same time, the EMI from the two lines cancels out.

Differential Configurations Multidrop Point to Point Multi-Point Notes: Point to Point: One driver driving one receiver (simplest termination) Multidrop: One driver drives multiple receivers Multi-point: Every device is capable of driving and receiving from every other device

Signal Interconnect Classification. Dual-pin differential (diagram labels: 30 Ω transmission lines, 50 Ω transmission lines). Point-to-point: LVDS, LVPECL. Multi-drop: Bus LVDS, LVPECL; typically found in backplanes. Multi-point: Bus LVDS, LVPECL; typically found in backplanes.

VIRTEX-E as a Differential Receiver. Point-to-point configuration. (Diagram: data in, LVDS/LVPECL line driver with outputs Q and QB, Zo = 50 Ω lines, termination Rt, Virtex-E FPGA inputs IN and INX, data out.) VIRTEX-E can be driven by any standard LVDS or LVPECL driver. The VIRTEX-E receiver complies with the LVDS and LVPECL specs. Notes: Purpose: Show that it is easy to connect to any other LVDS or LVPECL device. This is the termination circuit when using Virtex-E as a receiver. It applies to both LVDS and LVPECL; the only difference is in the resistor values, which are given in the AppNotes. Don't go into too much detail; the end of this section includes PCB design guidelines.

VIRTEX-E as a Differential Driver. Point-to-point configuration. (Diagram: data in, Virtex-E FPGA outputs OUT and OUTX, resistors Rs and Rdiv, Zo = 50 Ω lines, termination Rt, receiver inputs Q and QB, data out. The receiver is a standard LVDS or LVPECL receiver, or a VIRTEX-E LVDS or LVPECL receiver.) Capable of driving any standard LVDS or LVPECL receiver. Notes: Purpose: Show the customer that Virtex-E can drive any LVDS or LVPECL device. This circuit applies to both LVDS and LVPECL; the resistor values are different. Zo is the line impedance. For a typical line it is 50 Ohms.

LVDS. LVDS stands for Low Voltage Differential Signaling. It is a way of communicating using a low voltage swing (~350 mV) over two differential connections. The big motivation for developing LVDS is the need for noise immunity in board-to-board communication. Notes: 1. Low voltage means less power consumption. 2. Differential means good noise immunity.

BLVDS. BLVDS stands for Bus LVDS. Bidirectional LVDS: the device can transmit and receive LVDS signals through the same pins. It requires different termination than LVDS.

Virtex-E LVDS Signaling. (Scope waveform: Q and Q_ swing +/- 175 mV around a 1.25 V midpoint; vertical scale 0.0 V to 1.5 V; computed signal differential 2 x (Q-QB).) Very low power. Notes: Seeing is believing. This is an LVDS waveform. You can see the negative and the positive lines switching in opposite directions at the same time. The voltage swing is about 175 mV on each line.

LVDS Standards
Parameter                 RS-422      PECL               LVDS
Driver output voltage     ~2 - 5 V    ~600 - 1,000 mV    ~250 - 450 mV
Receiver input threshold  ~200 mV     ~200 - 300 mV      ~100 mV
Data rate                 < 30 Mbps   > 400 Mbps         > 400 Mbps
Dynamic power             Low         High               Low
Noise                     Low         Low                Low
Cost                      Medium      High               Low

LVDS Characteristics: Termination. The transmission medium must be terminated with 100 Ω + 20 Ω. The resistor is placed across the differential inputs. With this termination an LVDS driver can drive signals over several meters at speeds in excess of 155.5 Mbps (77.7 MHz). The real limits on speed are how fast data can be delivered to the driver and the bandwidth performance of the selected media. The simple LVDS termination is easy to implement; ECL and PECL require more complex termination schemes. Notes: The resistor should be placed as close as possible to the receiver input. PECL drivers commonly require 220 Ω pull-down resistors from each driver output, along with 100 Ω across the receiver input. The simple LVDS termination is easy to implement in most applications.

LVDS Advantages: Saving Power. LVDS technology saves power in several important ways. Power dissipation at the terminator is ~1.2 mW. An RS-422 driver delivers 3 V across a termination of 100 Ω, for 90 mW power consumption: 75 times more than LVDS! Due to the current-mode driver design, the frequency component of Icc is greatly reduced, in contrast to TTL/CMOS transceivers, where the dynamic power consumption increases exponentially with frequency.
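
The power figures above follow from P = V²/R at the termination resistor. The sketch below reproduces them, assuming the ~350 mV LVDS swing quoted on the LVDS slide appears across the 100 Ω terminator; the function name is illustrative.

```python
def termination_power_mw(voltage_v: float, resistance_ohm: float) -> float:
    """Power dissipated in a termination resistor, in milliwatts (P = V^2 / R)."""
    return voltage_v ** 2 / resistance_ohm * 1000.0


lvds_mw = termination_power_mw(0.350, 100.0)   # ~350 mV LVDS swing
rs422_mw = termination_power_mw(3.0, 100.0)    # 3 V RS-422 drive

print(f"LVDS:   {lvds_mw:.3f} mW")
print(f"RS-422: {rs422_mw:.1f} mW")
print(f"Ratio:  {rs422_mw / lvds_mw:.1f}x")
```

The unrounded LVDS figure is about 1.225 mW, giving a ratio near 73; the slide's "75 times" comes from using the rounded 1.2 mW value.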

LVDS Advantages: Save Money. High performance can be achieved using off-the-shelf FPGAs. LVDS consumes less power, so one can use cheaper power supplies or fewer fans. LVDS is low noise, so no more EMI headaches (saves time). Since LVDS is much faster than CMOS/TTL, LVDS signals can be serialized. This results in smaller packages, simpler connectors, etc.

Virtex-E LVDS. All I/Os have LVDS capability. IOBs configured as LVDS can be synchronous or asynchronous, and input or output. Two IOBs (a pair) form one LVDS signal: one IOB functions as + or P, the other as - or N. LVDS pin pairs are indicated in the datasheet. Maximum number of LVDS pin pairs: 344.

LVPECL LVPECL stands for Low Voltage Positive Emitter Coupled Logic Well known industry standard for fast clocking Voltage swing (~750 mV) over two differential connections. Virtex-E offers easy interface with other standard LVPECL chips Notes: 1. Positive = Uses Positive Power supply, as opposed to negative used for typical ECL 2. ECL = Emitter Coupled Logic, classical high speed bipolar technology used in mainframes, telecom, and instrumentation

LVPECL Clocking. TTL is not the most desirable clocking technique for clock frequencies higher than 150 MHz. (Chart: system clock speed, with TTL below 150 MHz and LVPECL above.) Notes: As system clock frequencies get higher and higher, the standard TTL clocking techniques are no longer efficient. At around 150 MHz, we cross over to the LVPECL realm. LVPECL can support system clocks at much higher speeds. Having LVPECL on VIRTEX-E chips allows customers to move up in system speed and design for applications that require high speeds.

Clock Sources. TTL oscillator: TTL/CMOS output, up to ~135 MHz. Generic LVPECL oscillator: LVPECL output, up to ~250 MHz (example: Saronix SEL3400 series). LVPECL clock synthesizer: LVPECL output, up to ~400 MHz (examples: Motorola MC12429, Synergy SY89429V). The diagram also shows a quartz crystal, 16 MHz nominal. Notes: Purpose: Familiarize the customer with the clock sources available on the market. This also shows that LVPECL is used for high-frequency clocking.

Virtex-E 300+ MHz LVPECL Clocking. (Diagram: an LVPECL clock source drives an LVPECL clock distributor, which feeds Virtex-E 1, Virtex-E 2 ... Virtex-E n over equal-length point-to-point LVPECL PCB clock traces, with no LVPECL-TTL translator. Example distributor devices: Motorola MC10/100E111, Synergy SY10E111LE.) Typical discrete solution: Motorola MC100EPT23 dual differential PECL-to-TTL translator, TPD = 2.0 ns. Virtex-E eliminates PECL-to-TTL converters, and with them 2 ns of delay and skew. Notes: Here is an example of an LVPECL interface at the board level, using VIRTEX-E devices as receivers of the LVPECL clock. VIRTEX-E can connect directly to an LVPECL clock distributor, eliminating the clock delay of PECL-to-TTL converters. A designer must be aware that, in order to have a fully synchronous system, the distances from the LVPECL clock distributor to the VIRTEX-E devices must be equal. Typical trace delays are 185 ps/inch.
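
The 185 ps/inch figure above makes it easy to estimate how much skew a trace-length mismatch introduces. The following is a rough sketch under that single assumption; the helper names and the example lengths are hypothetical.

```python
TRACE_DELAY_PS_PER_INCH = 185.0  # typical value quoted in the notes above


def trace_delay_ps(length_in: float) -> float:
    """Propagation delay of a PCB clock trace of the given length."""
    return length_in * TRACE_DELAY_PS_PER_INCH


def clock_skew_ps(length_a_in: float, length_b_in: float) -> float:
    """Skew between two receivers fed by traces of different lengths."""
    return abs(trace_delay_ps(length_a_in) - trace_delay_ps(length_b_in))


def equivalent_trace_in(delay_ps: float) -> float:
    """Trace length that would add the same delay as a discrete component."""
    return delay_ps / TRACE_DELAY_PS_PER_INCH


print(clock_skew_ps(4.0, 5.0))                # one inch of mismatch: 185.0 ps
print(round(equivalent_trace_in(2000.0), 1))  # 2 ns translator delay in inches
```

By this estimate, the 2.0 ns delay of a discrete PECL-to-TTL translator is comparable to nearly 11 inches of extra trace, which is the delay the direct LVPECL input removes.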

Virtex-E LVPECL Clock Conversion. Receive and convert high-speed clocks with zero delay. Zero-delay local clock generation to any of the Virtex-E I/O standards. (Diagram: an LVPECL clock enters the Virtex-E DLL, which drives SSTL and TTL clocks to external RAM, etc.) Notes: Not only can VIRTEX-E receive LVPECL clocks, it can also act as a clock converter for other devices on the board that cannot receive LVPECL signals. Each I/O bank can be configured to comply with a different I/O standard. Therefore, as shown on this slide, VIRTEX-E can take in LVPECL clocks and send out non-LVPECL clocks. Using the internal DLLs, these output clocks have zero delay from the input LVPECL clock. Again, the VIRTEX-E device increases the system speed by eliminating the LVPECL-to-TTL (or other I/O standard) converter.

Putting it All Together ... (Diagram: an LVPECL clock source drives an LVPECL clock distributor, which feeds Virtex-E 1, Virtex-E 2 ... Virtex-E n over equal-length point-to-point LVPECL PCB clock traces, with no LVPECL-TTL translator; each Virtex-E in turn clocks other devices. Example distributor devices: Motorola MC10/100E111, Synergy SY10E111LE.) Notes: If we put the two previous slides together, we end up with a complete system in which the VIRTEX-E devices act as key elements on the board: they interface directly to other LVPECL devices, and they convert LVPECL clocks to any other I/O standard used by the non-LVPECL devices.

Designing With LVDS and LVPECL Some Facts Impedance Matching is VERY important Discontinuities in impedance WILL create reflections. Reflections degrade signals and show up as Common Mode Noise. Common Mode Noise cancels the magnetic shield effect of differential lines and radiates as EMI. Do not make sharp turns since this causes impedance discontinuities. Keep stubs and uncontrolled tracks < 10 mm. Notes: - Impedance matching is very important, even for short traces

Designing With LVDS and LVPECL (Continued). PCB guidelines: Use at least 4 PCB layers (LVDS signals, ground, power, TTL/CMOS signals). Separate TTL/CMOS signals from the LVDS signals. Keep LVDS driver/receiver connections as close to the connectors as possible. Decouple the power supply as well as possible. Connect all the VCC and ground pins of the component. Make power and ground tracks as wide as possible. Connect to power and ground tracks with multiple vias.

Designing With LVDS and LVPECL (Continued). PCB guidelines: Match the tracks to the impedance of your transmission medium and termination resistor. Run differential tracks as close together as possible as soon as they leave the IC. Use microstrip or stripline for tracks. Match the electrical length of tracks to reduce skew. Keep the spacing between a pair of tracks as constant as possible to avoid discontinuities in impedance. Notes: If any stubs are used, they should be less than 7 mm. Skew between a pair of tracks results in phase shift. This destroys magnetic field cancellation and results in EMI.

Designing With LVDS and LVPECL (Continued). PCB guidelines: Use a good, matching termination resistor. LVDS will not work without resistor termination. Typically a single resistor at the receiver is OK. Surface-mount resistors are best: stubs are short, the distance between receiver and termination is short, and there are no component leads. At extra cost you can use the center tap capacitance termination scheme. (Diagram labels: R/2, R, C, R/2.)

More LVDS and LVPECL Info At Xilinx’ website: http://www.xilinx.com/apps/xapp.htm Look at AppNotes XAPP230, XAPP231, XAPP232

Memory Interfaces ZBT RAM, SDRAM, DDR SDRAM

Virtex-E and High Speed Memory Interfaces Features needed for interface to high speed memory Fast I/Os Clock management capabilities Virtex-E has both: SSTL2, HSTL, LVDS, LVPECL and many more 8 on-chip DLLs - use for Clk-to-Out speed up, clock deskew, clock multiplication/division

Benefits of using an FPGA for the Memory Interface Easy to implement Can add functionality in the future easily ASIC is a one-time-deal Combine multiple discrete devices into the FPGA Save space, money, and power Notes: 1. The memory interface designs don’t take much space on the FPGA, so the designer can use the rest of the FPGA for other designs as well. 2. One can change the functionality of the memory controller easily in the future. ASICs are not flexible

High Speed Memory Interfaces ZBT RAM Interface SDRAM Interface DDR SDRAM Interface

Zero Bus Turn-around SRAM. Extremely high bandwidth. Other non-cache applications in telecom, test equipment, DSP and embedded memory. ZBT stands for "Zero Bus Turnaround": no idle cycles between read-to-write and write-to-read, giving 100% bus use. Previous architectures had a turnaround cycle. Completely deterministic timing simplifies system design. Any cycle can perform any operation. Notes: This is a general description of the ZBT RAM and ZBT applications. Networking and communications: routers, switches and hubs, fully utilizing a system's ability to read and write data throughout the network. What distinguishes ZBT from other SRAMs is that ZBT has no idle cycles between a read and a write, and vice versa. This means that when you have been reading from the RAM and then change the command to a "write", the data to be written to the ZBT RAM can be put on the data bus on the next clock cycle. Therefore, the timing can be determined easily.

ZBT SRAM Parameters. Densities: 2, 4 and 8 Mbits. Data bus widths: 18, 32 and 36 bits. I/O voltage and standards: 2.5 V, 3.3 V, LVTTL. Flow-through speed: 8, 10 ns (clock cycle time). Pipeline speed: 5, 6, 7.5 ns (clock cycle time). Notes: These are general ZBT SRAM parameters.

ZBT Flow-Through Timing. Read operation: data is available after a single clock of latency. Write operation ("late write"): the data to be written is presented on the next clock. (Waveform rows for each operation: Clk cycles 1 and 2, Control, Address, Data.) Notes: Thing to point out: data is available on the next clock cycle.

ZBT Pipelined Timing. Read operation: data is available after two clocks of latency. Write operation ("late write"): data is written 2 cycles later. (Waveform rows for each operation: Clk cycles 1 to 3, Control, Address, Data.) Things to point out: Initially (only on the first clock cycle) there is a latency of 2 clock cycles (2-stage pipeline); after that, data is available on every cycle. Looking at the waveforms of the pipelined and the flow-through ZBT RAMs, the pipelined version seems to be at a disadvantage, since it has an initial 2-clock-cycle latency. However, the pipelined version is much faster than the flow-through version.

ZBT 100% Bus Use. Write/Write/Read/Write/Read/Burst Read. (Waveform rows: Clock; Command: Write1, Write2, Read1, Write3, Read2, RdBrst; Address: Addw1, Addw2, AddR1, Addw3, AddR2; DQ: Dout w1, Dout w2, Din R1, Dout w3, Din R2, Din R2+1.) This waveform illustrates the timing of the pipelined ZBT RAM. The data for each command (write or read) appears on the data bus 2 clock cycles after the command has been put on the command bus.
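
The "100% bus use" claim can be checked with a toy schedule: if every command's data occupies the bus a fixed number of cycles after the command, back-to-back commands never leave a gap, whatever the read/write mix. This is an illustrative model only; the function name is hypothetical and the burst read is left out for simplicity.

```python
def data_bus_schedule(commands, latency=2):
    """Return, per clock cycle, which command's data occupies the data bus.

    latency=2 models the pipelined part; latency=1 models flow-through.
    """
    bus = [None] * (len(commands) + latency)
    for cycle, command in enumerate(commands):
        bus[cycle + latency] = command  # data follows its command by `latency`
    return bus


commands = ["Write1", "Write2", "Read1", "Write3", "Read2"]
bus = data_bus_schedule(commands)
idle_cycles = bus[2:].count(None)  # idle slots once the pipeline has filled

print(bus)
print("idle cycles after pipeline fill:", idle_cycles)
```

Because reads and writes share the same command-to-data latency, switching direction costs no turnaround cycle in this model; only the initial pipeline fill is idle.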

Virtex-E ZBT Bandwidth 800 Mbytes/sec @ 32bits wide These are the speeds at which Virtex-E can interface to ZBT RAMs. Very High Performance Synchronous, Static Memory

ZBT Interface Reference Design. (Block diagram labels: XCV300-E, CLKin, DLL 1, DLL 2, Clk2x, Tester, Controller, ZBT SRAM, Data in, Data out, Data, Addr, Reset, Error, RW#.) Notes: Things to point out: 1. DLLs are used for clock deskew and 2X multiplication.

ZBT Interface Application Note. 7.2 Gbit/s @ 36 bits wide, 200 MHz. Synthesizable HDL controller design. XCV300-E, -6 speed grade.
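
The ZBT bandwidth figures quoted on this slide and the previous one are simply clock rate times bus width, since a ZBT RAM can transfer data on every cycle. A minimal check, using integer arithmetic and an illustrative function name:

```python
def bandwidth_bits_per_s(clock_mhz: int, width_bits: int) -> int:
    """Peak bandwidth with one transfer per clock cycle, in bits per second."""
    return clock_mhz * 1_000_000 * width_bits


wide_36 = bandwidth_bits_per_s(200, 36)        # 36-bit bus at 200 MHz
wide_32_bytes = bandwidth_bits_per_s(200, 32) // 8  # 32-bit bus, in bytes/s

print(wide_36 / 1e9, "Gbit/s")         # 7.2 Gbit/s
print(wide_32_bytes / 1e6, "Mbyte/s")  # 800.0 Mbyte/s
```

Both quoted numbers therefore correspond to a 200 MHz interface, with the 800 Mbyte/s figure counting only 32 data bits.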

ZBT Bus Contention - Real World. (Scope shot, 143 MHz: Clock, R/W, Address [0], Data [0].) Notes: The R/W (read/write) command is switching on every clock cycle. With other RAMs, this would cause some contention on the data bus. As we can see, with ZBT RAM the contention is not noticeable, and the data bus is not affected. This is the benefit of ZBT RAM: 100% bus use! Scope shot taken directly from the ZBT controller reference board.

Virtex-E High Speed SDRAM Interface SDRAM Overview Features Virtex-E SDRAM controller Block diagram Timing

SDRAM Features: Synchronous interface (frees the system from wait states). Burst mode access (reduces CAS access time). Multiple banks (parallel processing: access one bank while precharging or refreshing the other). LVTTL, 3.3 V. Programmable burst length and CAS latency. (Waveform, CAS latency = 2, burst length = 4: Clock; Command: READ; Address: Col; DQ: D1, D2, D3, D4.) Notes: SDRAMs are synchronous dynamic RAMs, which means that they need to be refreshed. They support programmable CAS latency and burst mode (shown on the waveforms).

SDRAM Controller Application Note Synthesizable Verilog/VHDL Programmable burst length (1, 2, 4, 8) Programmable CAS latency (2, 3) Automatically issues refresh commands Supports LOAD_MR, AUTO_REFRESH, PRECHARGE, ACT_ROW, READA, WRITEA, BURST_STOP, NOP Interfaces with SDRAM at 125MHz (Virtex-E, -6 speed) Uses 2 DLLs and 165 CLB slices (5% of XCV300E) Notes: - Xilinx’ Appnote supports programmable CAS and burst length - It also automatically issues a refresh command

SDRAM controller system. This is a block diagram of the appnote. (Diagram labels: system, 62.5 MHz clock, controls, AD bus (32 bits), data_addr_n; XCV300-E -6; 125 MHz clock, controls, addr (11 bits), data (32 bits), SDRAM.)

SDRAM controller. Things to point out: A DLL is used for 2X multiplication, so the customer doesn't have to worry about bringing in a fast clock. The system interface (blue) contains a MUX for the row and column addresses. The controller (red) contains the refresh counter, which is used in issuing the refresh command.

SDRAM controller IO timing. The read cycle is the critical timing: SDRAM-8 clk-to-out = 6.0 ns; Virtex-E (-6 speed grade) setup = 1.7 ns; at 125 MHz operation (8 ns cycle), 300 ps is left for board routing on the data lines. Write cycle: Virtex-E (-6 speed grade) clk-to-out = 3.9 ns; SDRAM-8 setup = 2.0 ns; at 125 MHz operation (8 ns cycle), 2.1 ns is left for board routing. Notes: This is an analysis of the critical timing for the design. The READ cycle is the critical timing for this design. Looking at the numbers, we only have 300 ps for board routing delays. That's not much. The options are to select a faster FPGA or a faster SDRAM. The WRITE cycle allows for more board delay (2.1 ns).
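
The routing margins quoted above come from subtracting the sender's clock-to-out and the receiver's setup time from the clock period. The sketch below reproduces both budgets; the function name is illustrative, and the model deliberately ignores clock skew, as the slide does.

```python
def board_margin_ns(period_ns: float, clk_to_out_ns: float, setup_ns: float) -> float:
    """Time left for board routing after clock-to-out and setup are paid."""
    return period_ns - clk_to_out_ns - setup_ns


PERIOD_NS = 8.0  # 125 MHz operation

read_margin = board_margin_ns(PERIOD_NS, clk_to_out_ns=6.0, setup_ns=1.7)
write_margin = board_margin_ns(PERIOD_NS, clk_to_out_ns=3.9, setup_ns=2.0)

print(f"read margin:  {read_margin:.1f} ns")   # 0.3 ns, i.e. 300 ps
print(f"write margin: {write_margin:.1f} ns")  # 2.1 ns
```

Swapping in the clock-to-out of a faster SDRAM or the setup time of a faster FPGA speed grade shows directly how much read margin each option would buy.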

Virtex-E DDR-SDRAM Interface DDR SDRAM Overview Features Differences from SDRAM Virtex-E SDRAM controller Block diagram Timing Board layout guideline

DDR SDRAM Features: Next generation SDRAM DDR data I/O (twice the bandwidth at the same clock frequency as SDRAM) Peak bandwidth: 1.6 GBytes/s (64-bit @ 100MHz) 2.5V, SSTL2, 100/133MHz Advantages over RDRAM cost, package, open industry spec, compatible with existing spec Supported by major vendors Micron, Samsung, IBM, Fujitsu, Hitachi, Hyundai, Toshiba,... General DDR SDRAM Overview

DDR SDRAM Differences compared to standard SDRAM: All IOs are SSTL2, 2.5V (reduce power and noise) Differential clock (CLK and CLKB). Positive edge clock is the crossing of CLK going high and CLKB going low. Bidirectional data strobe (clock-to-data skew is eliminated) Double Data Rate data transfer Differences over standard SDRAM

Write Cycle [Waveforms: SDRAM (clk, cmd, addr, data) and DDR SDRAM (clk, clkb, cmd, addr, dqs, data); cmd sequence ACT, NOP, WRITE; addr sequence ROW, COL; data D1 D2 D3 D4] Notes: - The Command and Address buses are running at the ‘normal’ clock rate - The Data bus is running at double data rate (twice the ‘normal’ clock)

Read Cycle [Waveforms: SDRAM (clk, cmd, addr, data) and DDR SDRAM (clk, clkb, cmd, addr, dqs, data); cmd sequence ACT, NOP, READ; addr sequence ROW, COL; data D1 D2 D3 D4]

DDR SDRAM controller Application Note Synthesizable Verilog Virtex-E, -6 speed grade: 100 MHz Clk 200 MHz Data rate 1.6 Giga-Bytes/S bandwidth @ 64 bits wide Programmable CAS latency, burst length 2 DLLs, 474 slices (15% of XCV300-E) Uses “Logic Accessible Clock” technique Uses Clock to latch Read Data, instead of DQS
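The 1.6 GBytes/s figure follows from the bus width, the clock rate, and two transfers per clock. A small Python sketch of that calculation; the function name is hypothetical:

```python
# Peak bandwidth = bytes per transfer * clock frequency * transfers per clock.
# The function name is an illustrative assumption; the numbers are the slide's.

def peak_bandwidth_bytes_per_s(bus_width_bits, clock_hz, transfers_per_clock):
    """Theoretical peak transfer rate of a memory data bus."""
    return (bus_width_bits // 8) * clock_hz * transfers_per_clock

# 64-bit bus at 100 MHz: single data rate vs. double data rate.
sdr_peak = peak_bandwidth_bytes_per_s(64, 100_000_000, 1)
ddr_peak = peak_bandwidth_bytes_per_s(64, 100_000_000, 2)

print(f"SDR: {sdr_peak / 1e9:.1f} GBytes/s")
print(f"DDR: {ddr_peak / 1e9:.1f} GBytes/s")
```

This is a peak number only; refresh, precharge, and command overhead reduce the sustained rate.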

DDR SDRAM controller Virtex-E

DDR SDRAM IO timing Data Lines: Read Cycle Read cycle is critical. Data is strobed by clk, instead of DQS [Waveform: ddr_clk; -0.8ns minimum DDR clk-out; -0.4ns minimum Virtex-E hold time] Minimum trace delay on data = 0.8ns - 0.4ns - clock skew between ddr_clk & fpga_clk = 0.4ns - clock skew

DDR SDRAM IO timing Addr/Cntrl Lines Address and Control lines are generated on the negative edge of the clock, to guarantee DDR hold time ddr_clk 2.4ns 1.2ns Virtex-E clk_out (max) DDR setup time 5ns Maximum trace delay on Addr/Cntrl = 5ns - 2.4ns - 1.2ns - clock skew = 1.4ns - clock skew
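The two DDR trace-delay limits (minimum on the data lines, maximum on the address/control lines) can be checked with the same kind of arithmetic. The Python sketch below uses the slides' numbers in picoseconds. The function names are made up, and the pairing of 2.4ns with Virtex-E clk_out and 1.2ns with DDR setup is an assumption read from the order of the labels on the slide:

```python
# Trace-delay bounds for the DDR interface, per the slides' formulas.
# Values in picoseconds. Names are hypothetical, and the pairing
# 2.4ns = Virtex-E clk_out (max), 1.2ns = DDR setup is an assumption.

def min_data_trace_ps(clock_skew_ps):
    """Read cycle: 0.8ns - 0.4ns - skew between ddr_clk and fpga_clk."""
    return 800 - 400 - clock_skew_ps

def max_addr_trace_ps(clock_skew_ps, window_ps=5000,
                      clk_out_ps=2400, setup_ps=1200):
    """Addr/Cntrl lines: 5ns - 2.4ns - 1.2ns - clock skew."""
    return window_ps - clk_out_ps - setup_ps - clock_skew_ps

# Example with a hypothetical 100 ps of clock skew.
skew = 100
print("min data trace delay:", min_data_trace_ps(skew), "ps")
print("max addr trace delay:", max_addr_trace_ps(skew), "ps")
```

With zero skew the bounds reduce to the slides' 0.4ns and 1.4ns.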

DDR SDRAM IO timing Summary The I/O spec for DDR is very tight Carefully calculate data and address trace delays to guarantee setup and hold times The minimum trace delay on the data lines can be eliminated by delaying the ddr_clk Since DDR has negative tAC(min), delaying the ddr_clk helps meet Virtex-E’s hold time requirement

Board Layout Guideline All high speed memory interfaces Virtex device and the memory chips must be placed close to each other Consider/Simulate board level signal integrity and timing, pay particular attention to clocks Use matched impedance traces DDR All bi-directional signals use IOBUF_SSTL2_II (data & data strobes) other output signals use OBUF_SSTL2_I DQ lines must be closely matched, and kept short to minimize cross talk DQS trace lengths should match DQ CLK and CLKB delays and loads should match (CLKB can also be routed back to an unused IOB near the feedback pin)

Memory Interface Application Notes ZBT RAM: XAPP136 SDRAM: XAPP134 DDR SDRAM: XAPP200 http://www.xilinx.com/apps/virtexapp.htm

CAM in Virtex-E

CAM Overview Content Addressable Memory Storage Array (like RAM) Find a location of a particular stored value Compare input against data in memory If Match found, output the Address Maximum performance, if match in a single clock cycle Notes: Explain the basic stuff about CAM.

CAM Overview Simple RAM and CAM compared [Diagram: RAM 1024 x 8 (Add [9:0] in, Dout [7:0] out) and CAM 1024 x 8 (Din [7:0] in, Add [9:0] and Match out)] Notes: By comparing the input against the data in memory, a CAM determines if an input value matches one or more values stored in the array. If the comparison is done simultaneously, the CAM is said to be at maximum efficiency. A match, when it exists, is found in one clock cycle. Similar to a RAM, a CAM stores words in an array. The write mode is comparable, but the read mode is different. In a RAM, the word in a specific location is read by the address. In a CAM, the data on the input is compared against the stored words to look for a match. When a match is found, the output is the address in the array.
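The RAM/CAM contrast can be sketched as a purely behavioral Python model. The class and method names are made up for illustration, and the loop stands in for what hardware does in parallel:

```python
# Behavioral contrast between a RAM read and a CAM match.
# SimpleCam and its method names are hypothetical, for illustration only.

class SimpleCam:
    def __init__(self, depth):
        self.words = [None] * depth  # None marks an empty location

    def write(self, addr, data):
        """Write mode is comparable to a RAM: store data at an address."""
        self.words[addr] = data

    def ram_read(self, addr):
        """RAM-style read: address in, data out."""
        return self.words[addr]

    def match(self, data):
        """CAM-style read: data in, matching address(es) out.
        Hardware compares all locations at once; this loop only
        models the result, not the timing."""
        return [addr for addr, word in enumerate(self.words) if word == data]

cam = SimpleCam(depth=1024)
cam.write(0x012, 0xA5)
cam.write(0x3FF, 0x5A)

print(cam.ram_read(0x012))  # data stored at that address
print(cam.match(0x5A))      # address(es) holding that data
print(cam.match(0x77))      # empty list: no match
```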

CAM Applications Telecommunications Networking Ethernet ATM Protocol

CAM Overview CAM features: Word Size (width) Number of Words (depth) Match or Compare Time (read) Significance of Write Speed Clock Frequency Masks Decoded and/or Encoded Address (outputs) Notes: Basic CAM features. The time it takes to write to the CAM, is not as important as the time it takes to read from the CAM

CAMs in Virtex-E Flexible CAM designs in Virtex and Virtex-E CAM implemented in a LUT CAM implemented in a Block SelectRAM A Content Addressable Memory is a storage array designed to quickly find the location of a particular stored value. To determine the correct CAM implementation for a particular application, the following features should be investigated. Virtex devices allow different approaches to designing an optimal CAM. There is not a specific CAM type to fit all CAM applications; therefore, different approaches are necessary to achieve optimal results. A small fast read and write CAM can be implemented in Block SelectRAM+. Large CAMs can be implemented in slices configured either as 16-bit shift registers or distributed SelectRAM+ 16x1.

Designing CAM in Virtex slices XAPP203: “Designing Flexible, Fast CAMs with Virtex Family FPGAs”: VHDL and Verilog Reference Designs available Features 4 bits per LUT 16-word x 4-bit organization Match in one clock cycle 16 Write clock cycles Decoded address output Generic word width from 4 bits up to any multiple of 4 Generic number of 16-word CAM blocks Cascadable Address Encoder in logic or tri-state buffers (TBUF) Notes: This is the first way of implementing CAMs in Virtex. This method uses SRL16 as a basic module. - Read is in 1 clock cycle - Write is in 16 clock cycles. - Encoded address is available

CAM in a LUT Match Operation Reconfigurable 8-bit Word Comparator 8 LUT SRL16 D Q A[0:3] “1” Wide AND FF CLK MATCH_SIGNAL 1 slice 4 DATA_IN Notes: This is the schematic allowing us to see the Match Operation, for an 8-bit wide CAM. Since this fits in one slice, it is clear that the Match operation can be done in one clock cycle.

Match Waveforms for CAM in a LUT 16WORDS ENCODE MATCH DATA_IN MATCH_ENABLE R_MATCH_ADDR R_MATCH_OK “…1001” “xxxx xxxx xxxx xxxx” “0000 0000 0000 0100” “xxxx” “0010” CLK Match_cycle Encode_cycle Notes: We can also see from the waveforms that the match is found in one clock cycle. The Encoded address is available one clock cycle after the match.

CAM in a LUT Write Operation Counter 4-bit Compare Reconfigurable 8-bit Word Comparator 4 8 DATA_IN LUT SRL16 D Q A[0:3] 1 slice MSB LSB Notes: The Write operation takes 16 Clock Cycles. For most applications the Write cycle is not as important as the Match (Read) cycle.
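Why the write takes 16 clocks while the match takes one can be seen in a behavioral model of one 8-bit CAM word built from two SRL16s. The Python below is a sketch under assumptions, not the XAPP203 reference design: the down-counting direction and all names are chosen for illustration.

```python
# Behavioral sketch of one 8-bit CAM word built from two SRL16 LUTs.
# Assumption: a 4-bit counter steps from 15 down to 0, and on each clock the
# result of (counter == nibble) is shifted into the SRL16. After 16 clocks
# the LUT holds a single 1 at the position equal to the stored nibble.

def shift_in_nibble(nibble):
    """Return the SRL16 contents after the 16-clock write of one nibble."""
    srl = [0] * 16
    for count in range(15, -1, -1):
        bit = 1 if count == nibble else 0
        srl = [bit] + srl[:15]  # new bit enters at position 0, others move up
    return srl

class CamWord8:
    def __init__(self):
        self.lo = [0] * 16  # SRL16 holding the low nibble (LSB)
        self.hi = [0] * 16  # SRL16 holding the high nibble (MSB)

    def write(self, data):
        """Write an 8-bit word; returns the number of clocks used."""
        self.lo = shift_in_nibble(data & 0xF)
        self.hi = shift_in_nibble((data >> 4) & 0xF)
        return 16

    def match(self, data_in):
        """Match is a plain LUT read: DATA_IN addresses both LUTs and the
        two outputs are ANDed, so it needs only one clock in hardware."""
        return self.lo[data_in & 0xF] & self.hi[(data_in >> 4) & 0xF]

word = CamWord8()
clocks = word.write(0x9A)
print(clocks, word.match(0x9A), word.match(0x9B))
```

The match path is just a lookup plus an AND, which is why it is so much cheaper than the write.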

Cascading CAMs in LUTs CAM match path (1 CLK) & encode (1 CLK) [Diagram: DATA_IN (8) into an array of N x CAM_16WORDS blocks; per block 16 match lines, FFs, Encode to 4 LSBs; Encode MSB; outputs MATCH_ADDR, MATCH_OK; inputs CLK, MATCH_ENABLE] Notes: Cascading the CAMs does not add to the time it takes to perform the operations, since the CAMs operate in parallel, as shown on the picture. The match operation is done in 1 clock cycle, and the encoded address is available 1 clock cycle later (2 clock cycles in total)

CAM in Block SelectRAM XAPP204: “Using Block SelectRAM+ for High-Performance Read/Write CAMs”: VHDL and Verilog Reference Designs available Features 128 bits per Block SelectRAM+ 16-word x 8-bit organization Match in one clock cycle Write in one clock cycle (and Erase in one clock cycle) Decoded address output Fully synchronous match and write ports (Independent) Cascadable Address Encoder in logic or tri-state buffers (TBUF) Notes: This is another way of implementing CAM in Virtex. This way uses the Block RAM. Match and Write in one clock cycle.

CAM in a Block SelectRAM+ CAM 16x8 Macro in 1 Block SelectRAM+ MATCH[15:0] DATA_WRITE[7:0] ADDR[3:0] ERASE_WRITE CLK_WRITE DATA_MATCH[7:0] WRITE_ENABLE MATCH_ENABLE MATCH_RST CLK_MATCH RAMB4_S1_S16 DOB[15:0] DOA N.C. DIA[0] ADDRA[11:0] WEA ENA RSTA CLKA DIB[15:0] ADDRB[7:0] WEB ENB RSTB CLKB “0000….0000” “0” 12 8 4 PORT A PORT B Notes: There is a macro available to use in the software. It automatically infers a 16X8 CAM
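The dual-port trick behind the 16x8 macro can be modeled behaviorally. Port A sees the block as 4096 x 1 and port B sees the same bits as 256 x 16. The Python sketch below assumes that the port A address is formed as {DATA_WRITE[7:0], ADDR[3:0]}, which is how the 12/8/4 bus widths on the slide appear to combine; the class and method names are hypothetical.

```python
# Behavioral sketch of a 16-word x 8-bit CAM in one dual-port block RAM.
# Assumptions: port A address = {DATA_WRITE[7:0], ADDR[3:0]} (4096 x 1),
# port B address = DATA_MATCH[7:0], returning 16 decoded match bits (256 x 16).

class BlockRamCam16x8:
    def __init__(self):
        self.bits = [0] * 4096  # the 4096 storage bits shared by both ports

    def write(self, addr, data, erase=False):
        """Port A: one clock sets (or, with erase, clears) a single bit."""
        self.bits[(data << 4) | addr] = 0 if erase else 1

    def match(self, data):
        """Port B: one clock reads the 16 bits selected by the data value.
        Bit i is 1 when word i of the CAM currently holds that value."""
        base = data << 4
        return self.bits[base:base + 16]

cam = BlockRamCam16x8()
cam.write(addr=5, data=0xA7)
print(cam.match(0xA7))  # decoded address: only bit 5 set

# Replacing a word takes an erase of the old value, then a write of the new.
cam.write(addr=5, data=0xA7, erase=True)
cam.write(addr=5, data=0x3C)
print(cam.match(0xA7), cam.match(0x3C))
```

In this model the old value must be known in order to erase it, so a design would typically keep a copy of what was written at each CAM address.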

Cascading Block SelectRAM+ CAMs for bigger depth CAM 64-word x 8-bit in Read Mode CAM (16x8) 16 32 48 64 MATCH[63:0] DATA_MATCH[7:0] CLK_MATCH 8 [15:0] [31:16] [47:32] [63:48] Notes: The CAMs can be cascaded for bigger depth, without compromising the speed. This is possible since the CAMs are connected in parallel.

Cascading Block SelectRAM+ CAMs for higher width CAM 16-word x 16-bit in Read Mode CAM (16x8) DATA_MATCH[15:0] CLK_MATCH [15:0] [15:8] [7:0] MATCH[15:0] [0] [1] [15] Notes: The CAMs can also be cascaded to increase the width of the CAM. In this case, the match bus (address bus) can be generated by feeding the outputs of the CAMs through AND gates.

CAM in Block SelectRAM+ The final picture CAM16x8 Macro Match flag and encoded outputs DATA[7:0] Write port A (4096 x 1) Read port B (256 x 16) MATCH[15:0] CLKB CLK_MATCH ADDRB[7:0] DOB[15:0] Decoded Address 16 FF D Q ENCODE MATCH_ADDR[3:0] 4 MATCH_SIGNAL Notes: This is the CAM design as a whole, with the match flag and encoded address.

CAM in Virtex FPGAs Basic decoder/comparator block designed using: Virtex slices configured as 16-bit shift registers (8 bits per slice) Virtex dual port block SelectRAM+ (128 bits per block) Use an array of basic blocks to implement a CAM [Chart: CAM sizes in XCV2000E; axes: Width (bits) vs. CAM depth in words; regions labeled Size = 20,480 bits and Size = 122,880 bits] Notes: This diagram shows the different CAM sizes, designed in XCV2000E.

XILINX CAMs comparison

SRL16

SelectShift D Q CE LUT IN CLK ADDR[3:0] OUT Slice CLB Dynamically addressable Shift Registers, implemented in one LUT 1 2 15

SelectShift Features Serial In, Serial Out Does not require an address counter Programmable cycle delay from 1 to 16 Addr[3:0] specifies the desired delay Cascade for cycle delays greater than 16 CLB Flip-Flops can be used to add depth
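The addressable-delay behavior can be illustrated with a small behavioral model. This Python sketch models the shift register with clock enable; the class name `Srl16e` and its methods are made up for illustration and are not the primitive's HDL interface.

```python
# Behavioral sketch of a 16-bit shift register LUT with clock enable.
# Names are hypothetical; this models behavior only, not timing.

class Srl16e:
    def __init__(self):
        self.bits = [0] * 16

    def clock(self, d, ce=1):
        """One active clock edge: shift D in at position 0 when CE is high."""
        if ce:
            self.bits = [d & 1] + self.bits[:15]

    def q(self, addr):
        """Output tap: ADDR[3:0] selects which stage drives Q, so the
        same LUT gives a delay of addr + 1 clocks, from 1 to 16."""
        return self.bits[addr & 0xF]

srl = Srl16e()
sequence = [1, 0, 1, 1, 0, 0, 1, 0]
for bit in sequence:
    srl.clock(bit)

# Tap 0 holds the most recent bit, tap 7 the bit shifted in 8 clocks ago.
print(srl.q(0), srl.q(1), srl.q(7))
```

Because the tap address can change while the register runs, the delay is selectable without any address counter.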

Software Support Primitives available in software Positive or negative clock edge triggered Clock Enable optional Available for VHDL or Verilog instantiations D CLK A3 A2 A1 A0 Q SRL16 16-bit Shift Register Look-Up-Table D CLK A3 A2 A1 A0 Q SRL16E CE 16-bit Shift Register Look-Up-Table with Clock Enable

SRL16 Applications Shift Registers Delayed Signal Generation Linear Feedback Shift Registers (LFSRs) CRC circuits

Virtex- E Configuration

Agenda Review of configuration Modes Startup Sequence Serial, Parallel, JTAG Startup Sequence XC1800 PROM interfacing Daisy Chaining Tips in debugging configuration issues JTAG Configuration

Operation Flow Configuration Data stored in a PROM or downloaded through a cable Configuration time depends on: device size, type of configuration, clock speed POWER UP CONFIGURATION (Serial Mode, Parallel Mode, JTAG) Device Operational

Configuration Modes Serial Modes Parallel Mode JTAG Master Slave SelectMAP JTAG

Serial Mode Configuration Master Serial Configuration Mode PROM CLK DATA /CE /RESET/OE Virtex-E CCLK DIN DONE /INIT Serial Configuration Master mode: the Virtex-E device is initiating the configuration Slave mode: the Virtex-E device is waiting for some other device to start the configuration

Serial Mode Configuration Data is loaded serially, one bit per CCLK A Virtex-E device in Master Serial Mode produces its own CCLK CCLK rate is controllable in software Mode used with a PROM In Slave Serial Mode, the Virtex-E device needs a CCLK provided by another device All download cables do this

Parallel Mode Configuration SelectMAP Microprocessor Virtex-E CCLK D0-D7 DONE /CS /WRITE PROG One byte loaded per CCLK Designed to be driven by other logic device Another FPGA or CPLD Processor Microcontroller MultiLinx Cable SelectMAP is a parallel configuration mode 1. Fastest 2. Best to use with microprocessors
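Since serial modes load one bit per CCLK and SelectMAP loads one byte per CCLK, a first-order configuration time estimate is the bitstream size divided by the bits moved per second. In the Python sketch below, the bitstream size and CCLK rate are hypothetical example values, not datasheet figures, and the function name is made up:

```python
# First-order configuration time estimate.
# The example bitstream size and CCLK frequency are hypothetical values
# chosen for illustration; real sizes come from the device datasheet.

def config_time_s(bitstream_bits, cclk_hz, bits_per_cclk):
    """Time to clock the whole bitstream in, ignoring startup overhead."""
    cycles = -(-bitstream_bits // bits_per_cclk)  # ceiling division
    return cycles / cclk_hz

EXAMPLE_BITS = 2_000_000     # hypothetical bitstream size
EXAMPLE_CCLK_HZ = 10_000_000  # hypothetical CCLK rate

serial_time = config_time_s(EXAMPLE_BITS, EXAMPLE_CCLK_HZ, bits_per_cclk=1)
selectmap_time = config_time_s(EXAMPLE_BITS, EXAMPLE_CCLK_HZ, bits_per_cclk=8)

print(f"serial:    {serial_time * 1e3:.1f} ms")
print(f"SelectMAP: {selectmap_time * 1e3:.1f} ms")
```

At the same CCLK rate the byte-wide mode is eight times faster, which is the sense in which SelectMAP is the fastest mode.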

Important Signals in SelectMAP Data (D0-D7)- bi-directional data bus D0 is the MSB /WRITE- direction of data on the bus Low for configuration (Write) High for readback /CS- enable for the data bus a High will ignore CCLK transitions BUSY- output that indicates when data can be received Not needed for CCLK < 50 MHz All pins shown in pinout tables in the datasheet 1. When /WRITE is high, the device is in READBACK mode 2. BUSY is used for handshaking when the CCLK is fast (> 50 MHz)
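Because D0 is the MSB, a driving device that numbers its data bits with bit 0 as the LSB may need to reverse the bit order of each configuration byte before putting it on D0-D7. Whether this is needed depends on how the board wires the bus, which is an assumption here. A small Python sketch of the per-byte reversal, with a made-up function name:

```python
# Reverse the bit order within one byte (bit 7 <-> bit 0, bit 6 <-> bit 1, ...).
# Whether a given design needs this depends on its bus wiring; that is an
# assumption of this sketch, and the function name is hypothetical.

def reverse_bits(byte):
    """Return the 8-bit value with its bit order mirrored."""
    result = 0
    for i in range(8):
        if byte & (1 << i):
            result |= 1 << (7 - i)
    return result

def swap_for_selectmap(data):
    """Apply the per-byte bit reversal to a whole block of bytes."""
    return bytes(reverse_bits(b) for b in data)

print(swap_for_selectmap(bytes([0x01, 0xAA, 0xF0])).hex())
```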

SelectMAP- Things to Know Initialization needed after /INIT goes high 3 CCLKs needed If /CS and /WRITE are asserted early, no data will be transferred on the first CCLK To strobe data, use /CS, not /WRITE If a CCLK rising edge occurs when /CS is asserted and /WRITE is de-asserted, an ABORT will occur Need to reload Sync Word and redo last packet Purpose- Special things about SelectMAP Notes: - The initialization is needed for Serial Modes too - More information on ABORT in the datasheet

Virtex-E Bitstream Format 10 internal configuration registers Bitstream is actually a set sequence of writes into those registers Configuration data still broken into frames All data is encapsulated into packets- Type I and Type II When migrating from Virtex to Virtex-E a new bitstream is needed Purpose- explain the main idea of virtex bitstreams Notes: Configuration Logic acts kind of like a processor, with registers, and writes to those registers A Frame represents one vertical line of configuration bits in the device- that includes IOB bits, CLB bits, and routing bits.

Configuration Registers Each register has a 5-bit address Detailed information in XAPP 138

Configuration Startup Sequence Four signals to control GWE (Global Write Enable) GSR (Global Set/Reset) GTS (Global 3-State) DONE (External Done Pin) Six phases to select assertion/de-assertion (1-6) Sequencer will wait in the DONE phase until DONE goes high Can create “Sync-To-Done” behavior by setting GTS, GSR, and GWE to same cycle as DONE Purpose- explain what the startup sequence does and what aspects of it can be changed Notes: Between the configuration data being loaded and the device being functional, it needs to go through the startup sequence This sequence brings the device through a state machine where certain signals are activated and deactivated The user can choose where these things happen in Bitgen

Startup Sequence Phase 0 1 2 3 4 5 6 7 StartupClk DONE Default Phase in Bold GTS GSR GWE

Virtex-E and XC1800 PROMs Can program via serial or SelectMAP mode serial vs. parallel controlled in software Purpose- The new 1800 PROMs are particularly useful for Virtex device download. They can be used for serial or SelectMAP download, so they can be faster than other PROMs. The PROMs are reprogrammable via JTAG

Daisy Chaining Available only in Serial or JTAG Mode [Diagram: PROM -> Virtex-E #1 (Master, DIN/DOUT) -> Virtex-E #2 (Slave, DIN/DOUT) -> Virtex/4kX #3 (Slave, DIN)] Concatenation of bitstreams does not work Use the software to generate the necessary bitstreams (PROMGen)

Debugging Tips and Info What causes /INIT to go low? CRC check fails Internal error, e.g. data loaded too fast When will an error stay undetected? A bit is missed or added- this will misalign the instructions, and the CRC check won’t happen Mode pin considerations Internal pullups are guaranteed Make sure pulldown is strong enough (4.7k) Virtex does not have indicator pins (HDC, /LDC) like the 4k families, so there is less information on what went wrong Some signals need to be looked at if there is a configuration problem Notes: - DONE and INIT are the most useful - If INIT goes low there was a CRC or internal error

JTAG Configuration

What is JTAG? JTAG - Joint Test Action Group Developed as standard testing interface Boundary Scan, IEEE STD 1149.1 Four Dedicated Pins Required: TDI, TDO, TMS, and TCK TRST is an optional 5th pin that Xilinx does not use Notes: Originally developed for testing We use the JTAG standard for programming, not testing

JTAG Standard JTAG Standard - 16 State, State Machine TAP (Test Access Port) IR (Instruction Register) DR (Data Register) Bypass Register Notes The JTAG standard defines that it has to be implemented in dedicated hardware, which must be able to perform the functions listed

JTAG Tap Controller [State diagram: Test-Logic-Reset, Run-Test/Idle, Select-DR-Scan, Capture-DR, Shift-DR, Exit1-DR, Pause-DR, Exit2-DR, Update-DR, Select-IR-Scan, Capture-IR, Shift-IR, Exit1-IR, Update-IR] The Reset state is entered by the 9500; it resets the TAP controller and loads the IR and DR registers with benign values. Idle is the wait state of the controller. (This state triggers the execution of program and erase instructions.) Select-DR is a transition decision state. Capture loads the active register with a pre-determined value. The Shift state is used to shift the value into either the DR or the IR. Update is the last state of each DR/IR transfer; in this state, the value that was shifted into either the DR or the IR is actually loaded.
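The TAP controller is a 16-state machine stepped by TMS on each TCK. The Python sketch below encodes the IEEE 1149.1 transition table and walks it with a TMS sequence. The table follows the standard; the helper names `step` and `walk` are made up for illustration.

```python
# IEEE 1149.1 TAP controller transitions: state -> (next if TMS=0, next if TMS=1).
# The helper names step() and walk() are hypothetical.

TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}

def step(state, tms):
    """Advance one TCK rising edge with the given TMS value (0 or 1)."""
    return TAP_NEXT[state][tms]

def walk(state, tms_bits):
    """Apply a sequence of TMS values and return the final state."""
    for tms in tms_bits:
        state = step(state, tms)
    return state

# From reset, TMS = 0,1,0,0 reaches Shift-DR; TMS = 0,1,1,0,0 reaches Shift-IR.
print(walk("Test-Logic-Reset", [0, 1, 0, 0]))
print(walk("Test-Logic-Reset", [0, 1, 1, 0, 0]))

# Holding TMS high for five clocks returns to Test-Logic-Reset from any state.
print(all(walk(s, [1] * 5) == "Test-Logic-Reset" for s in TAP_NEXT))
```

The five-clock reset property is why the optional TRST pin is not strictly required to reset the controller.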

JTAG TAP Controller: Architecture Notes: This is the JTAG Tap controller architecture There are 3 registers for every pin: for a “0”, “1”, and tri-state. The Bypass register is used to bypass the device, and send a JTAG bitstream to the next device

BSDL Files Boundary Scan Description Language BSDL Files define the hardware Description of the die, with pins and scan chain order Information about the size of the various chip specific registers (e.g. instruction register length) Unconfigured BSDL files are provided Assumes all I/Os are bidirectional Notes: Unconfigured files are OK, since we are using them for configuration of the device, and we don’t know if the I/Os are going to be inputs or outputs. If the customer wants to use JTAG for testing (which is not the topic here), then he/she has to modify the BSDL files to make them configured. Xilinx provides unconfigured files only.

BSDL Availability Files on the web are continuously updated Current software does not always have most recent BSDL file HTTP://support.xilinx.com -> Software

JTAG Programmer Software Support for Virtex-E JTAG Software Support in M2.1i SP3 Non invasive: Idcode, Bypass, Usercode SVF file generation Stay current with the download tools Service packs Web Pack (pc only) Foundation or Alliance software updates at: http://support.xilinx.com/support/techsup/sw_updates/ JTAG Programmer at: http://www.xilinx.com/sxpresso/webpack.htm

Cables Provided by Xilinx Multilinx Parallel Cable III XChecker Supported in 2.1i sp2 JTAG Programmer USB or Serial ports Win 98 only Parallel Cable III XChecker

Cables: JTAG Connections This shows a chain of devices and how TDI -> TDO is connected through the chain. How many devices? Virtex-E in a 5 volt chain = bad * If there is a TRST trace on the board, it should be tied high

JTAG Debugging Tips Debug Chain Software Tool (Logic Probe) /TRST pin should be tied high on 3rd party chips Noise or bad parallel port ISP Checklist app note XAPP104 Know all devices in chain and the order Virtex-E does not tolerate 5V signals directly

Good References Virtex-E Datasheet- basic information on configuration modes XAPP138- Configuration modes, packets and readback XAPP151- Detailed bitwise explanation of configuration registers, partial reconfiguration hints and advanced concepts in readback XAPP139 - Detailed information on JTAG configuration and readback for VIRTEX devices XAPP153 - Status and Control register information for partial reconfiguration information http://www.xilinx.com/apps/virtexapp.htm