® E is the Edge.

1 E is the Edge

2 DLLs

3 The Need for Clock Management
- Higher speed means skew and noise problems: as system speeds increase, we can no longer ignore clock skew and noise.
- A 2 ns clock skew matters more with a 6 ns clock than it does with a 20 ns clock.
- We need a way to control clock skew and decrease the effect of noise on the clock.
Notes: Purpose: Introduce the customer to the need for clock management. Why do we need to control the clock delay? Next slide: ways of controlling the clock.
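To put numbers on the skew example, the short Python sketch below (an illustration, not part of the original deck) computes what fraction of each clock period a fixed 2 ns skew consumes.

```python
def skew_fraction(skew_ns: float, period_ns: float) -> float:
    """Return the fraction of one clock period consumed by clock skew."""
    if period_ns <= 0:
        raise ValueError("period must be positive")
    return skew_ns / period_ns


for period_ns in (20.0, 6.0):
    share = skew_fraction(2.0, period_ns)
    print(f"{period_ns:4.1f} ns clock: 2 ns of skew uses {share:.0%} of the period")
```

The same 2 ns costs 10% of a 20 ns period but a third of a 6 ns period, more than three times the timing margin.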

4 Ways to Manage the Clock
DLLs:
- All digital
- Triggered by the incoming clock edge
- Output jitter less than 50 ps
- Less susceptible to analog noise
- Easily transferable from one process technology to another
PLLs:
- Use an analog VCO
- Can suppress incoming clock jitter
- Add undefined output jitter
- Susceptible to analog noise
- Not easily transferable from one process technology to another
Notes: Purpose: Introduce the customer to the two most common ways of managing a clock: DLLs and PLLs. Don't go into too much detail on the differences; keep it a high-level overview. This is not a competitive comparison with Altera. Next slide: technical DLL info.

5 DLL Basics
A DLL works by inserting delay on the clock net until the next rising edge of the clock input (CLKIN) is in phase with the rising edge of the clock feedback (CLKFB).
This requires a well-designed, low-skew clock distribution network so that the clock edges arrive simultaneously everywhere in the part.
[Figure: block diagram with CLKIN, delay line, phase delay control, CLKOUT, clock distribution network, and CLKFB]
Key Notes: The DLL inserts delay until the delayed feedback clock aligns with the input clock. At that point, the DLL is locked.
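The locking behavior can be illustrated with a simplified behavioral model in Python. It assumes a delay line of identical taps (the 50 ps tap size and 1024-tap length are hypothetical values, not Virtex-E specifications) and a control loop that adds one tap at a time until the feedback edge lands within half a tap of a CLKIN edge. Real DLLs use coarser and finer adjustment steps; this sketch shows only the principle.

```python
def dll_lock(period_ns: float, dist_delay_ns: float,
             tap_ns: float = 0.05, max_taps: int = 1024) -> tuple[int, float]:
    """Simplified DLL lock model (hypothetical tap size and delay-line length).

    CLKFB lags CLKIN by the clock-distribution delay plus the delay-line
    setting. The control logic adds one tap per step until the CLKFB rising
    edge is within half a tap of a CLKIN rising edge.
    Returns (taps inserted, total CLKIN-to-CLKFB delay in ns).
    """
    for taps in range(max_taps + 1):
        total_ns = dist_delay_ns + taps * tap_ns
        # Time from the CLKFB edge to the next CLKIN edge (0 means aligned).
        error_ns = (-total_ns) % period_ns
        # Accept a residual of up to half a tap on either side of the edge.
        if min(error_ns, period_ns - error_ns) <= tap_ns / 2:
            return taps, total_ns
    raise RuntimeError("no lock: delay line too short for this clock period")


# 100 MHz clock (10 ns period) with a hypothetical 3.2 ns distribution delay.
taps, total = dll_lock(10.0, 3.2)
print(f"locked after {taps} taps; CLKFB lags CLKIN by {total:.2f} ns (one full period)")
```

Once locked, the distribution delay plus the inserted delay equals a whole clock period, so the clock at the loads lines up with the next CLKIN edge.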

6 DLL Functions
- Virtex clock phase synthesis, for use internally or externally
- Clock mirror: zero-delay board clock buffer
- Virtex Tc2o speedup: zero-delay internal clock buffer
- Clock multiplication & division

7 DLL Tclock-to-out Speedup
[Figure: CLKext pin drives the DLL; CLKint clocks a D flip-flop driving OUT; Tclock = 0 ns, so Tc2q + Tout = Tc2o]
- Nullifies clock delay: fast Tc2o on the XCV1000
- The external CLKext pin and the internal CLKint pin are aligned
- 2.5 ns setup / 0.0 ns hold and 3.5 ns Tc2o on all devices
- Optional duty-cycle correction: 50/50 duty-cycle correction is applied when specified
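The clock-to-out relationship on this slide can be written out explicitly. Here Tclock is read as the clock-distribution delay from the CLKext pin to the flip-flop (our interpretation of the slide's notation); the DLL cancels it, leaving only the flip-flop and output-path terms.

```latex
\begin{aligned}
T_{c2o}\big|_{\text{no DLL}} &= T_{clock} + T_{c2q} + T_{out} \\
T_{c2o}\big|_{\text{DLL}}    &= T_{c2q} + T_{out}, \qquad T_{clock} = 0\ \text{ns}
\end{aligned}
```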

8 DLL Multiplication
- Generate 2x & 4x clocks
- Reduce board EMI and trace concerns by routing low-frequency clocks externally and multiplying internally
- Cross clock domains without worry: multiplied & divided clocks have synchronized edges
- No external clock drift & minimal external clock skew
[Figure: data buffer between I/O and internal logic, clocked by a 2x CLK (bus widths 16, 16, 32)]

9 DLL Division
- Selectable division values: 1.5, 2, 2.5, 3, 4, 5, 8, or 16
- 50/50 duty-cycle correction available
- Use a DLL pair to combine functions (clock multiply & clock divide, each with a 180° phase shift)
[Figure: two cascaded DLLs fed by a 30 MHz input. Outputs include 30 MHz with a 180° phase shift, 15 MHz (divide by 2), and 60 MHz (multiply by 2); the 30 MHz (180° shift) clock is used for feedback (FB)]
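A small Python helper makes the frequency and phase arithmetic of the cascaded-DLL example explicit. The function names are hypothetical, not part of any Xilinx tool: CLKDV accepts only the listed divide values, CLK2X doubles the input (two cascaded DLLs give 4x), and a phase output such as CLK180 shifts the clock by the corresponding fraction of its period.

```python
CLKDV_DIVIDES = (1.5, 2, 2.5, 3, 4, 5, 8, 16)  # selectable division values from the slide


def clkdv_mhz(f_in_mhz: float, divide: float) -> float:
    """Frequency of the CLKDV output for a given divide value."""
    if divide not in CLKDV_DIVIDES:
        raise ValueError(f"unsupported CLKDV divide value: {divide}")
    return f_in_mhz / divide


def clk2x_mhz(f_in_mhz: float) -> float:
    """Frequency of the CLK2X output."""
    return 2 * f_in_mhz


def phase_delay_ns(f_mhz: float, phase_deg: float) -> float:
    """Time shift of a phase output (CLK90, CLK180, CLK270) at frequency f_mhz."""
    period_ns = 1000.0 / f_mhz
    return period_ns * phase_deg / 360.0


f_in = 30.0
print(clkdv_mhz(f_in, 2))                   # 15.0 MHz (divide by 2)
print(clk2x_mhz(f_in))                      # 60.0 MHz (multiply by 2)
print(clk2x_mhz(clk2x_mhz(f_in)))           # 120.0 MHz: 4x from two cascaded DLLs
print(round(phase_delay_ns(f_in, 180), 2))  # 16.67 ns for a 180 degree shift at 30 MHz
```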

10 System Synchronization
- Synchronize all devices and eliminate board clock skew
- Nullifies clock input & board delay in addition to internal distribution delay
- Removes chip-to-chip race conditions
- Increases chip-to-chip interface speed: 240 MHz for Virtex-E
[Figure: a board clock (CLK) feeds DLLs in FPGA 1, FPGA 2, FPGA 3, ... FPGA N]

11 DLL Applications
- Clock-to-out speedup: high-speed memory interfaces; high-speed chip-to-chip requirements
- Clock multiplication/division: multiply the clock internally so that the external clock is slower, decreasing signal-integrity problems on the board
- Clock phase shift and duty-cycle correction: double data rate applications; generation of multiple clocks
- Clock mirroring: generate extra external clocks for fanout issues; board-level clock management

12 Virtex-E DLL Modes
Low Frequency:
- Input frequency range: 25 MHz to 160 MHz
- Maximum output frequency: MHz
- Minimum high/low time: ns*
- All 6 outputs available for use internally & externally: CLK0, CLK90, CLK180, CLK270, CLK2X, CLKDV
High Frequency:
- Input frequency range: 60 MHz to 320 MHz
- Minimum high/low time: ns*
- 3 outputs available for use internally & externally: CLK0, CLK180 & CLKDV
Both modes are supported with simple design primitives; VHDL & Verilog simulation support is available.
* Varies with frequency
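Because the two input ranges overlap (60 to 160 MHz), a given input clock may be usable in either mode. The Python sketch below encodes only the input ranges and output lists from this slide; the maximum output frequency and minimum high/low times are not recoverable from the slide, so they are not checked. The dictionary layout and function name are hypothetical.

```python
LOW_FREQ_MODE = {"range_mhz": (25.0, 160.0),
                 "outputs": ("CLK0", "CLK90", "CLK180", "CLK270", "CLK2X", "CLKDV")}
HIGH_FREQ_MODE = {"range_mhz": (60.0, 320.0),
                  "outputs": ("CLK0", "CLK180", "CLKDV")}


def usable_modes(f_in_mhz: float) -> dict:
    """Return the DLL modes whose input range covers f_in_mhz, with their outputs."""
    modes = {}
    for name, mode in (("low", LOW_FREQ_MODE), ("high", HIGH_FREQ_MODE)):
        lo, hi = mode["range_mhz"]
        if lo <= f_in_mhz <= hi:
            modes[name] = mode["outputs"]
    return modes


print(usable_modes(100.0))  # both modes; only low-frequency mode offers CLK90/CLK270/CLK2X
print(usable_modes(200.0))  # high-frequency mode only
print(usable_modes(20.0))   # {} (below both ranges)
```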

13 DLL Software Support
- Use the BUFGDLL macro for common clock usage
- Build complex structures using the CLKDLL primitive
[Figure: BUFGDLL equivalent structure: PAD → IBUFG → DLL → BUFG → distributed clock network (0 ns), with FB taken from the BUFG output. CLKDLL primitive pins: CLKIN, CLKFB, RST, CLK0, CLK90, CLK180, CLK270, CLK2X, CLKDV, LOCKED]

14 What happens if the CLKIN phase shifts?
The outputs will phase shift 1 to 4 clock edges after CLKIN shifts. Because of this delay, inter-chip communication can have problems, since the clock sources are not aligned. LOCKED stays asserted, and the control logic remains at its previous setting. Advice: Keep the phase shift to a longer LOW pulse.

15 What happens if the CLKIN changes frequency?
- The control logic may not be able to catch period changes of 1.0 ns or more.
- The outputs may start to destabilize as the control logic tries to adjust the delay lines to compensate.
What to do: Make sure that a change of frequency is followed by a reset of the CLKDLL. The LOCKED signal may or may not change.

16 What happens if the operating temperature changes?
The DLL automatically adjusts for temperature variation. DLL specs are guaranteed for chip temperatures between 0 °C and 85 °C.

17 Why can’t I mux the CLKIN line?
The CLKIN input must come from an IBUFG, a BUFG driven by another CLKDLL, or a DLLIOB. If a LUT or other routing is placed in the path, the CLKDLL cannot adjust for this unknown delay. What to do: Route the net out of the chip and back in through an IBUFG or DLLIOB.

18 DLL Information
- XAPP132: Using the Virtex DLL
- XAPP400: DLL usage in software

19 Differential Signaling
LVDS, LVPECL, BusLVDS

20 Moore’s Law at Work Blasting Thru the 100M Transistor Barrier
[Figure: transistor counts, 1998 to 2000: XCV1000 75M, XCV2000E 125M, XCV3200E 211M transistors, crossing the 100M and approaching the 200M mark]

21 I/O Bandwidth Trends
[Figure: bandwidth (MB/s, log scale from 10 to 10,000) versus year, 1986 to 2002, for PCI, PCI-X, SCSI, Ethernet, and the Internet backbone]
Notes: Purpose: Introduce the customer to the rising I/O bandwidth trends. Next slide: the problem of noise.

22 I/O Signaling
- Single-ended I/O signaling: TTL, HSTL, SSTL
- Differential: LVDS, BLVDS, LVPECL
Notes: Purpose: Introduction to I/O standards, which fall into two big divisions: single-ended and differential. Next slide: the problem that single-ended I/Os have.

23 The Problem
As the process shrinks, the absolute I/O noise margin shrinks as well.
[Figure: logic 0 and logic 1 levels for 5V, 3.3V, and 1.8V CMOS on a 1V to 5V scale, with absolute noise margins of 1.6 V, 1.0 V, and 0.86 V respectively]
Notes: Purpose: Educate the customer on the problem of noise on the board.
1. As speeds go up, process technology shrinks. As the process shrinks, power supply voltages go down, as do logic-level thresholds. This also lowers the absolute noise margin on the I/Os.
2. Most customers have chips from different process technologies on the same board, which means multiple power supplies at different voltages. Each device generates noise on the board, up to its power supply value. The 5V and 3.3V devices will affect the 1.8V devices greatly.

24 Differential Signaling The Solution
- Differential I/O signaling has higher noise immunity.
- The data is transmitted as the voltage difference between two lines.
- Noise affects both lines, but the voltage difference stays about the same, which means the data is not affected by the noise.
Notes: Purpose: Introduction to differential signaling.
1. The idea behind differential signaling is higher noise immunity.
2. Explain the third bullet: Data (1 or 0) is ‘coded’ into the voltage difference of the two lines. Noise affects a line by raising or lowering its voltage. For single-ended transmission, the noise might change the voltage enough to cross the threshold and change the logic value (from 1 to 0 and vice versa). Differential lines are affected by noise as well, but the noise raises or lowers the voltage of both lines equally (if the lines are close enough on the board), so the voltage difference between the lines stays about the same (very little variance). Since the voltage difference is the same, the data does not change value.
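The toy Python model below illustrates the point with hypothetical voltages and noise values. It contrasts a single-ended receiver (1.8 V swing, 0.9 V threshold) with a differential receiver (±175 mV per line about a 1.25 V midpoint, the LVDS figures shown later in this deck) when the same noise couples onto both lines. It assumes perfectly equal coupling and ignores the receiver's common-mode input range, so it is a sketch of the principle, not a circuit simulation.

```python
# Hypothetical signal levels and noise, chosen only to illustrate the principle.
SE_HIGH_V, SE_LOW_V, SE_THRESHOLD_V = 1.8, 0.0, 0.9   # single-ended 1.8 V CMOS
DIFF_MID_V, DIFF_SWING_V = 1.25, 0.175                # LVDS-like: +/-175 mV about 1.25 V

bits = [1, 0, 1, 1, 0, 0, 1, 0]
noise_v = [0.0, 1.2, -1.0, 0.3, 1.1, -0.2, -0.95, 0.95]  # noise coupled onto the line(s)


def single_ended_rx(bit: int, noise: float) -> int:
    """Receive one bit on a single wire: noise shifts the level against a fixed threshold."""
    v = (SE_HIGH_V if bit else SE_LOW_V) + noise
    return 1 if v > SE_THRESHOLD_V else 0


def differential_rx(bit: int, noise: float) -> int:
    """Receive one bit on a pair: the same noise lands on both lines and cancels."""
    s = DIFF_SWING_V if bit else -DIFF_SWING_V
    v_p = DIFF_MID_V + s + noise
    v_n = DIFF_MID_V - s + noise
    return 1 if v_p - v_n > 0 else 0


se = [single_ended_rx(b, n) for b, n in zip(bits, noise_v)]
diff = [differential_rx(b, n) for b, n in zip(bits, noise_v)]
print("single-ended errors:", sum(r != b for r, b in zip(se, bits)))    # 5
print("differential errors:", sum(r != b for r, b in zip(diff, bits)))  # 0
```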

25 Differential Signaling The Benefits
- High noise immunity: a huge benefit
- Low power
- High-speed I/O transfer
- Low EMI: noise due to switching cancels between the two lines, since both lines switch at the same time in opposite directions
Notes: Purpose: Introduce the customer to the benefits of differential signals.
1. Low EMI is a nice side effect of differential signaling. Since the differential lines switch in opposite directions at the same time, the EMI from each line cancels the other out.

26 Differential Configurations
[Figure: multidrop, point-to-point, and multi-point configurations]
Notes:
- Point-to-point: one driver driving one receiver (simplest termination)
- Multidrop: one driver drives multiple receivers
- Multi-point: every device is capable of driving and receiving from every other device

27 Signal Interconnect Classification: Dual-Pin Differential
[Figure: + and - lines drawn as 30 Ω and 50 Ω transmission lines]
- Point-to-point: LVDS, LVPECL
- Multi-drop bus: LVDS, LVPECL; typically found in backplanes
- Multi-point bus: LVDS, LVPECL; typically found in backplanes

28 VIRTEX-E as a Differential Receiver Point-to-point configuration
[Figure: an LVDS/LVPECL line driver (Q, QB) drives a Zo = 50 Ω line into the Virtex-E FPGA inputs IN and INX, terminated by Rt]
Notes: Purpose: Show that it is easy to connect to any other LVDS or LVPECL device. This is the termination circuit when using Virtex-E as a receiver. It applies to both LVDS and LVPECL; the only difference is in the resistor values, which are given in the app notes. Don't go into too much detail; the end of this section includes PCB design guidelines.
- Virtex-E can be driven by any standard LVDS or LVPECL driver.
- The Virtex-E receiver complies with the LVDS and LVPECL specs.

29 VIRTEX-E as a Differential Driver: Point-to-point configuration
[Figure: Virtex-E FPGA outputs OUT and OUTX (Q, QB) drive a Zo = 50 Ω line through Rs and Rdiv to Rt at a standard LVDS or LVPECL receiver, or a Virtex-E LVDS or LVPECL receiver]
Notes: Purpose: Show the customer that Virtex-E can drive any LVDS or LVPECL device. This circuit applies to both LVDS and LVPECL; only the resistor values differ. Zo is the line impedance; for a typical line it is 50 Ω.
- Capable of driving any standard LVDS or LVPECL receiver.

30 LVDS
LVDS stands for Low Voltage Differential Signaling. It is a way of communicating using a low voltage swing (~350 mV) over two differential connections. The big motivation for developing LVDS is the need for noise immunity in board-to-board communication.
Notes: 1. Low voltage means less power consumption. 2. Differential means good noise immunity.

31 BLVDS
BLVDS stands for Bus LVDS (bidirectional LVDS). The device can transmit and receive LVDS signals through the same pins. It requires different termination than LVDS.

32 Virtex-E LVDS Signaling
[Figure: LVDS waveform on a 0.0 V to 1.5 V scale: Q and Q_ each swing +/- 175 mV about a 1.25 V midpoint; the computed differential signal is 2 x (Q-QB)]
Notes: Seeing is believing. This is an LVDS waveform. You can see the positive and negative lines switching in opposite directions at the same time. The voltage swing is about 175 mV on each line, so power is very low.

33 LVDS Standards
Parameter                  RS-422     PECL       LVDS
Driver output voltage      ~2-5 V     ~ mV       ~ mV
Receiver input threshold   ~200 mV    ~ mV       ~100 mV
Data rate                  <30 Mbps   >400 Mbps  >400 Mbps
Dynamic power              Low        High       Low
Noise                      Low        Low        Low
Cost                       Medium     High       Low

34 LVDS Characteristics Termination
- The transmission medium must be terminated with a 100 Ω ± 20 Ω resistor placed across the differential inputs.
- With this termination, an LVDS driver can drive signals over several meters at speeds in excess of Mbps (77.7 MHz).
- The real limitations on speed are how fast data can be delivered to the driver and the bandwidth performance of the selected media.
- The simple LVDS termination is easy to implement; ECL and PECL require more complex termination schemes.
Notes:
- The resistor should be placed as close as possible to the receiver input.
- PECL drivers commonly require 220 Ω pull-down resistors on each driver output, along with 100 Ω across the receiver input.
- The simple LVDS termination is easy to implement in most applications.

35 LVDS Advantages Saving Power
LVDS technology saves power in several important ways:
- Power dissipation at the terminator is ~1.2 mW. An RS-422 driver delivers 3 V across a 100 Ω termination, for 90 mW of power consumption: about 75 times more than LVDS (90 mW / 1.2 mW)!
- Because of the current-mode driver design, the frequency component of Icc is greatly reduced. Compare TTL/CMOS transceivers, whose dynamic power consumption rises in proportion to frequency.
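The terminator figures follow from P = V²/R. A short Python check using the ~350 mV LVDS swing quoted earlier in the deck gives 1.225 mW (the slide's ~1.2 mW) and a ratio of about 73 against RS-422; using the rounded 1.2 mW gives 75. The function name is illustrative only.

```python
def termination_power_mw(swing_v: float, r_ohms: float = 100.0) -> float:
    """Power dissipated in a resistive termination, P = V^2 / R, in milliwatts."""
    return swing_v ** 2 / r_ohms * 1000.0


lvds_mw = termination_power_mw(0.35)   # ~350 mV LVDS swing
rs422_mw = termination_power_mw(3.0)   # 3 V RS-422 drive
print(f"LVDS:   {lvds_mw:.3f} mW")           # 1.225 mW
print(f"RS-422: {rs422_mw:.1f} mW")          # 90.0 mW
print(f"ratio:  {rs422_mw / lvds_mw:.0f}x")  # 73x
```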

36 LVDS Advantages Save Money
- High performance can be achieved using off-the-shelf FPGAs.
- LVDS consumes less power, so one can use cheaper power supplies or fewer fans.
- LVDS is low noise, so there are fewer EMI headaches (saving time).
- Since LVDS is much faster than CMOS/TTL, LVDS signals can be serialized. This results in smaller packages, simpler connectors, etc.

37 Virtex-E LVDS
- All I/Os have LVDS capability.
- IOBs configured as LVDS can be synchronous or asynchronous, input or output.
- Two IOBs (a pair) form one LVDS signal: one IOB functions as + (P), the other as - (N).
- LVDS pin pairs are indicated in the datasheet.
- Maximum number of LVDS pin pairs: 344

38 LVPECL
LVPECL stands for Low Voltage Positive Emitter Coupled Logic.
- Well-known industry standard for fast clocking
- Voltage swing (~750 mV) over two differential connections
- Virtex-E offers an easy interface with other standard LVPECL chips
Notes: 1. Positive: uses a positive power supply, as opposed to the negative supply used for typical ECL. 2. ECL = Emitter Coupled Logic, the classical high-speed bipolar technology used in mainframes, telecom, and instrumentation.

39 LVPECL Clocking
TTL is not the most desirable clocking technique for clock frequencies above 150 MHz.
[Figure: system clock speed axis, with TTL below and LVPECL above 150 MHz]
Notes: As system clock frequencies get higher and higher, standard TTL clocking techniques are no longer efficient. At around 150 MHz, we cross over into the LVPECL realm; LVPECL can support system clocks at much higher speeds. Having LVPECL on Virtex-E chips allows customers to move up in system speed and to design for applications that require high speeds.

40 Clock Sources
- TTL oscillator: TTL/CMOS, up to ~135 MHz
- Generic LVPECL oscillator: LVPECL, up to ~250 MHz. Example: Saronix SEL3400 Series
- LVPECL clock synthesizer (16 MHz nominal quartz crystal reference): LVPECL, up to ~400 MHz. Example: Motorola MC Synergy SY89429V
Notes: Purpose: Familiarize the customer with the clock sources available on the market. This also shows that LVPECL is used for high-frequency clocking.

41 Virtex-E 300+ MHz LVPECL Clocking
[Figure: an LVPECL clock source drives an LVPECL clock distributor (example devices: Motorola MC10/100E111, Synergy SY10E111LE), which drives Virtex-E 1, Virtex-E 2, ... Virtex-E n over equal-length point-to-point LVPECL PCB clock traces, with no LVPECL-TTL translator]
- Virtex-E eliminates PECL-to-TTL converters, and with them a 2 ns delay & skew. (Typical discrete solution: Motorola MC100EPT23 Dual Differential PECL to TTL Translator, TPD = 2.0 ns.)
Notes: Here is an example of an LVPECL interface at the board level, using Virtex-E devices as receivers of an LVPECL clock. Virtex-E can connect directly to an LVPECL clock distributor, eliminating clock delay by not needing PECL-to-TTL converters. A designer must be aware that, to have a fully synchronous system, the distances from the LVPECL clock distributor to the Virtex-E devices must be equal. Typical trace delays are 185 ps/inch.
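The equal-length requirement can be quantified with the 185 ps/inch figure. The Python sketch below computes the clock skew produced by unequal trace lengths; the lengths are hypothetical. A 1.5 inch mismatch costs about 0.28 ns, small next to the 2 ns translator delay but not negligible at 300+ MHz.

```python
TRACE_DELAY_PS_PER_INCH = 185.0  # typical figure quoted on the slide


def clock_skew_ps(trace_lengths_in: list[float]) -> float:
    """Skew between clock destinations = spread of their trace delays."""
    delays_ps = [length * TRACE_DELAY_PS_PER_INCH for length in trace_lengths_in]
    return max(delays_ps) - min(delays_ps)


# Hypothetical trace lengths (inches) from the distributor to three Virtex-E devices.
print(clock_skew_ps([4.0, 4.0, 4.0]))  # 0.0 ps: equal lengths, no skew
print(clock_skew_ps([4.0, 4.0, 5.5]))  # 277.5 ps from a 1.5 inch mismatch
```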

42 Virtex-E LVPECL Clock Conversion
Receive and convert high-speed clocks with zero delay.
[Figure: an LVPECL clock enters Virtex-E; the DLL generates zero-delay local clocks in any Virtex-E I/O standard (SSTL, TTL) for external RAM, etc.]
Notes: Not only can Virtex-E receive LVPECL clocks, it can also act as a clock converter for other devices on the board that cannot receive LVPECL signals. Each I/O bank can be configured to comply with a different I/O standard. Therefore, as shown on this slide, Virtex-E can take in LVPECL clocks and output non-LVPECL clocks. Using the internal DLLs, these output clocks have zero delay from the input LVPECL clock. Again, the Virtex-E device increases system speed by eliminating the LVPECL-to-TTL (or other I/O standard) converter.

43 Putting it All Together ...
[Figure: LVPECL clock source and LVPECL clock distributor (example devices: Motorola MC10/100E111, Synergy SY10E111LE) driving Virtex-E 1 ... Virtex-E n over equal-length point-to-point LVPECL PCB clock traces with no LVPECL-TTL translator; each Virtex-E clocks a further device]
Notes: If we put the two previous slides together, we end up with a complete system in which the Virtex-E devices act as key elements on the board: they interface directly to other LVPECL devices, and they convert LVPECL clocks to any other I/O standard used by the non-LVPECL devices.

44 Designing With LVDS and LVPECL
Some facts:
- Impedance matching is VERY important. Discontinuities in impedance WILL create reflections.
- Reflections degrade signals and show up as common-mode noise.
- Common-mode noise cancels the magnetic shielding effect of differential lines and radiates as EMI.
- Do not make sharp turns, since they cause impedance discontinuities.
- Keep stubs and uncontrolled tracks < 10 mm.
Notes: Impedance matching is very important, even for short traces.

45 Designing With LVDS and LVPECL (Continued)
PCB guidelines:
- Use at least 4 PCB layers (LVDS signals, ground, power, TTL/CMOS signals).
- Separate TTL/CMOS signals from the LVDS signals.
- Keep LVDS driver/receiver connections as close to the connectors as possible.
- Decouple the power supply as well as possible.
- Connect all VCC and ground pins of the component.
- Make power and ground tracks as wide as possible.
- Connect to power and ground tracks with multiple vias.

46 Designing With LVDS and LVPECL (Continued)
PCB guidelines:
- Match the tracks to the impedance of your transmission medium and termination resistor.
- Run differential tracks as close together as possible as soon as they leave the IC.
- Use microstrip or stripline for tracks.
- Match the electrical length of tracks to reduce skew.
- Keep the spacing within a pair of tracks as constant as possible to avoid impedance discontinuities.
Notes:
- If any stubs are used, they should be shorter than 7 mm.
- Skew between a pair of tracks results in phase shift. This destroys magnetic field cancellation and results in EMI.

47 Designing With LVDS and LVPECL (Continued)
PCB guidelines:
- Use a well-matched termination resistor. LVDS will not work without resistor termination. Typically, a single resistor at the receiver is OK.
- Surface-mount resistors are best: stubs are short, the distance between receiver and termination is short, and there are no component leads.
- At extra cost, you can use the center-tap capacitance termination scheme.
[Figure: a single termination resistor R compared with the center-tap scheme: two R/2 resistors in series across the pair, with capacitor C from the center tap]

48 More LVDS and LVPECL Info
On the Xilinx website, see app notes XAPP230, XAPP231, and XAPP232.

49 Memory Interfaces ZBT RAM, SDRAM, DDR SDRAM

50 Virtex-E and High Speed Memory Interfaces
Features needed to interface to high-speed memory:
- Fast I/Os
- Clock management capabilities
Virtex-E has both:
- SSTL2, HSTL, LVDS, LVPECL, and many more
- 8 on-chip DLLs: use for clock-to-out speedup, clock deskew, clock multiplication/division

51 Benefits of using an FPGA for the Memory Interface
- Easy to implement
- Functionality can easily be added in the future; an ASIC is a one-time deal
- Combine multiple discrete devices into the FPGA: save space, money, and power
Notes: 1. Memory interface designs don't take much space on the FPGA, so the designer can use the rest of the FPGA for other designs as well. 2. One can easily change the functionality of the memory controller in the future. ASICs are not flexible.

52 High Speed Memory Interfaces
- ZBT RAM interface
- SDRAM interface
- DDR SDRAM interface

53 Zero Bus Turn-around SRAM
- ZBT stands for “Zero Bus Turnaround”: no idle cycles between read-to-write and write-to-read, for 100% bus use. Previous architectures had a turnaround cycle.
- Completely deterministic timing simplifies system design: any cycle can perform any operation.
- Extremely high bandwidth
- Other non-cache applications in telecom, test equipment, DSP, and embedded memory
Notes: This is a general description of ZBT RAM and ZBT applications.
- Networking and communications: routers, switches, and hubs. ZBT fully utilizes a system's ability to read and write data throughout the network.
- What distinguishes ZBT from other SRAMs is that ZBT has no idle cycles between a read and a write, and vice versa. When you have been reading from the RAM and then change the command to a “write”, the data to be written can be put on the data bus on the next clock cycle.
- Therefore, the timing can be determined easily.

54 ZBT SRAM Parameters
- Densities: 2, 4, and 8 Mbits
- Data bus widths: 18, 32, and 36 bits
- I/O voltage and standards: 2.5V, 3.3V, LVTTL
- Flow-through speed: 8, 10 ns (clock cycle time)
- Pipelined speed: 5, 6, 7.5 ns (clock cycle time)
Notes: These are general ZBT SRAM parameters.

55 ZBT Flow-Through Timing
[Figure: Clk, Address, Control, and Data waveforms for a read operation (data available after a single clock of latency) and a write operation (“late write”: the data to be written is presented on the next clock)]
Notes: Point out that data is available on the next clock cycle.

56 ZBT Pipelined Timing
[Figure: Clk, Address, Control, and Data waveforms for a read operation (data available after two clocks of latency) and a write operation (“late write”: data is written 2 cycles later)]
Things to point out:
- Initially (only on the first clock cycle), there is a latency of 2 clock cycles (a 2-stage pipeline).
- After that, data is available on every cycle.
Looking at the waveforms of the pipelined and flow-through ZBT RAMs, the pipelined version seems to be at a disadvantage, since it has an initial 2-clock-cycle latency. However, the pipelined version is much faster than the flow-through version (5 to 7.5 ns versus 8 to 10 ns clock cycle times).

57 ZBT 100% Bus Use Write/Write/Read/Write/Read/Burst Read
[Figure: pipelined ZBT waveforms: Clock; Command (Write1, Write2, Read1, Write3, Read2, RdBrst); Address (AddW1, AddW2, AddR1, AddW3, AddR2); DQ carrying w1, w2, R1, w3, R2, R2+1 back to back]
Notes: This is a waveform for the pipelined ZBT RAM; the pipelined part's timing is illustrated above. The data for each command (write or read) appears on the bus 2 clock cycles after the command has been put on the command bus.
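The 100% bus-use claim can be checked with a toy cycle-level model in Python, written as a sketch under stated assumptions: every command (including the burst continue) moves one data word a fixed latency after it is issued, and a conventional SRAM is modeled with a hypothetical one idle cycle whenever the bus direction changes. With the command sequence from this slide, ZBT leaves no idle cycles on DQ.

```python
def dq_schedule(commands: list[str], latency: int, turnaround: int = 0) -> list[tuple[int, str]]:
    """Return (cycle, command) pairs giving when each command's data is on DQ.

    Toy model: each command moves one data word `latency` cycles after issue.
    `turnaround` idle cycles are inserted whenever the bus direction changes
    (0 for ZBT; a hypothetical 1 for a conventional SRAM).
    """
    schedule = []
    cycle = 0
    prev_dir = None
    for cmd in commands:
        direction = "write" if cmd.startswith("WRITE") else "read"  # RDBRST counts as a read
        if prev_dir is not None and direction != prev_dir:
            cycle += turnaround
        schedule.append((cycle + latency, cmd))
        prev_dir = direction
        cycle += 1
    return schedule


def idle_cycles(schedule: list[tuple[int, str]]) -> list[int]:
    """Cycles between the first and last data word on which DQ carries nothing."""
    busy = {c for c, _ in schedule}
    return [c for c in range(min(busy), max(busy) + 1) if c not in busy]


cmds = ["WRITE1", "WRITE2", "READ1", "WRITE3", "READ2", "RDBRST"]
print(idle_cycles(dq_schedule(cmds, latency=2)))                # [] -> 100% bus use
print(idle_cycles(dq_schedule(cmds, latency=2, turnaround=1)))  # [4, 6, 8]
```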

58 Virtex-E ZBT Bandwidth
800 Mbytes/sec @ 32 bits wide: very high-performance synchronous static memory.
These are the speeds at which Virtex-E can interface to ZBT RAMs.

59 ZBT Interface Reference Design
[Figure: XCV300-E with DLL 1 and DLL 2 generating Clk2x from CLKin; a tester and a controller drive the ZBT SRAM (Addr, Data, RW#), with Reset, Data in, Data out, and Error signals]
Notes: Things to point out: the DLLs are used for clock deskew and 2X multiplication.

60 ZBT Interface Application Note
- bits wide, 200 MHz
- Synthesizable HDL controller design
- XCV300-E, -6 speed grade

61 ZBT Bus Contention - Real World
[Figure: scope shot at 143 MHz: Clock, R/W, Address [0], Data [0]]
Notes:
- The R/W (read/write) command switches on every clock cycle.
- With other RAMs, this would cause contention on the data bus.
- As we can see, with ZBT RAM the contention is not noticeable, and the data bus is not affected.
- This is the benefit of ZBT RAM: 100% bus use!
Scope shot taken directly from the ZBT controller reference board.

62 Virtex-E High Speed SDRAM Interface
- SDRAM overview: features
- Virtex-E SDRAM controller: block diagram, timing

63 SDRAM Features
- Synchronous interface (frees the system from wait states)
- Burst mode access (reduces CAS access time)
- Multiple banks (parallel processing: access one bank while precharging/refreshing the other)
- LVTTL, 3.3V
- Programmable burst length and CAS latency
[Figure: READ command with CAS latency = 2 and burst length = 4: Clock, Command, Address (Col), DQ (D1, D2, D3, D4)]
Notes:
- SDRAMs are synchronous dynamic RAMs, which means they need to be refreshed.
- They support programmable CAS latency and burst mode (shown in the waveforms).

64 SDRAM Controller Application Note
- Synthesizable Verilog/VHDL
- Programmable burst length (1, 2, 4, 8)
- Programmable CAS latency (2, 3)
- Automatically issues refresh commands
- Supports LOAD_MR, AUTO_REFRESH, PRECHARGE, ACT_ROW, READA, WRITEA, BURST_STOP, NOP
- Interfaces with SDRAM at 125 MHz (Virtex-E, -6 speed)
- Uses 2 DLLs and 165 CLB slices (5% of XCV300E)
Notes: The Xilinx app note supports programmable CAS latency and burst length. It also automatically issues refresh commands.
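To connect the programmable CAS latency and burst length to the read waveform on the previous slide (CAS latency = 2, burst length = 4), the Python sketch below lists the clock cycles on which read data appears after a READ command. It is a simplified single-data-rate model that ignores row activation and precharge timing; the function name is hypothetical.

```python
BURST_LENGTHS = (1, 2, 4, 8)   # programmable values listed for the app note
CAS_LATENCIES = (2, 3)


def read_data_cycles(read_cycle: int, cas_latency: int, burst_length: int) -> list[int]:
    """Clock cycles on which DQ carries read data after a READ command (SDR SDRAM)."""
    if cas_latency not in CAS_LATENCIES or burst_length not in BURST_LENGTHS:
        raise ValueError("unsupported CAS latency or burst length")
    first = read_cycle + cas_latency
    return list(range(first, first + burst_length))


print(read_data_cycles(0, cas_latency=2, burst_length=4))  # [2, 3, 4, 5] -> D1..D4
print(read_data_cycles(0, cas_latency=3, burst_length=8))  # [3, 4, 5, 6, 7, 8, 9, 10]
```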

65 SDRAM Controller System
[Figure: block diagram of the app note: an XCV300-E (-6) with a 62.5 MHz system clock and a 125 MHz clock; system interface and controller blocks pass controls, data_addr_n, 11-bit addr, and 32-bit data between the system and the SDRAM]

66 SDRAM Controller
Things to point out:
- The DLL is used for 2X multiplication, so the customer doesn't have to worry about bringing in a fast clock.
- The system interface (blue) contains a MUX for the row and column addresses.
- The controller (red) contains the refresh counter, which is used to issue the refresh command.

67 SDRAM controller IO timing
Read cycle (the critical timing):
- SDRAM (-8) clock-to-out = 6.0 ns
- Virtex (-6) setup = 1.7 ns
- 125 MHz operation (8 ns cycle) leaves 300 ps for board routing on the data lines
Write cycle:
- Virtex clock-to-out = 3.9 ns
- SDRAM (-8) setup = 2.0 ns
- 125 MHz operation (8 ns cycle) leaves 2.1 ns for board routing
Notes: This is an analysis of the critical timing for the design.
- The READ cycle is the critical timing for this design. Looking at the numbers, we only have 300 ps for board routing delays. That's not much. Options are to select a faster FPGA or a faster SDRAM.
- The WRITE cycle allows for more board delay (2.1 ns).
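The board-routing budgets above are the clock period minus the launching device's clock-to-out and the capturing device's setup time. A minimal Python check, which like the slide ignores clock skew between the devices:

```python
def board_budget_ns(freq_mhz: float, clk_to_out_ns: float, setup_ns: float) -> float:
    """Time left for board routing: clock period minus launch clock-to-out and capture setup."""
    period_ns = 1000.0 / freq_mhz
    return period_ns - clk_to_out_ns - setup_ns


read_ns = board_budget_ns(125.0, clk_to_out_ns=6.0, setup_ns=1.7)   # SDRAM -> Virtex
write_ns = board_budget_ns(125.0, clk_to_out_ns=3.9, setup_ns=2.0)  # Virtex -> SDRAM
print(f"read budget:  {read_ns * 1000:.0f} ps")  # 300 ps
print(f"write budget: {write_ns:.1f} ns")        # 2.1 ns
```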

68 Virtex-E DDR-SDRAM Interface
- DDR SDRAM overview: features, differences from SDRAM
- Virtex-E DDR SDRAM controller: block diagram, timing, board layout guideline

69 DDR SDRAM Features
General DDR SDRAM overview:
- Next-generation SDRAM
- DDR data I/O (twice the bandwidth of SDRAM at the same clock frequency)
- Peak bandwidth: 1.6 GBytes/s (64 bits @ 100 MHz)
- 2.5V, SSTL2, 100/133 MHz
- Advantages over RDRAM: cost, package, open industry spec, compatible with existing spec
- Supported by major vendors: Micron, Samsung, IBM, Fujitsu, Hitachi, Hyundai, Toshiba, ...
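The "twice the bandwidth at the same clock" claim and the 1.6 GBytes/s peak figure follow from clock rate x transfers per clock x bus width. The Python sketch below assumes the 64-bit bus used by the DDR controller app note later in this deck, and counts 1 GByte as 1000 MBytes.

```python
def peak_bandwidth_mb_s(clock_mhz: float, bus_bits: int, transfers_per_clock: int) -> float:
    """Peak bandwidth in MBytes/s: clock x transfers per clock x bytes per transfer."""
    return clock_mhz * transfers_per_clock * bus_bits / 8


sdr = peak_bandwidth_mb_s(100, 64, transfers_per_clock=1)  # SDRAM: one transfer per clock
ddr = peak_bandwidth_mb_s(100, 64, transfers_per_clock=2)  # DDR: data on both clock edges
print(sdr, ddr)  # 800.0 1600.0 -> 1.6 GBytes/s for DDR
```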

70 DDR SDRAM
Differences compared to standard SDRAM:
- All I/Os are SSTL2, 2.5V (reduces power and noise)
- Differential clock (CLK and CLKB): the positive clock edge is the crossing of CLK going high and CLKB going low
- Bidirectional data strobe (eliminates clock-to-data skew)
- Double data rate data transfer

71 Write Cycle
[Figure: write-cycle waveforms. SDRAM: clk, cmd (ACT, NOP, WRITE), addr (ROW, COL), data (D1 to D4). DDR SDRAM: clk, clkb, cmd (ACT, NOP, WRITE), addr (ROW, COL), dqs, data (D1 to D4)]
Notes:
- The command and address buses run at the ‘normal’ clock rate.
- The data bus runs at double data rate (twice the ‘normal’ clock).

72 Read Cycle
[Figure: read-cycle waveforms. SDRAM: clk, cmd (ACT, NOP, READ), addr (ROW, COL), data (D1 to D4). DDR SDRAM: clk, clkb, cmd (ACT, NOP, READ), addr (ROW, COL), dqs, data (D1 to D4)]

73 DDR SDRAM controller Application Note
- Synthesizable Verilog
- Virtex-E, -6 speed grade: 100 MHz clock, 200 MHz data rate
- 1.6 GBytes/s, 64 bits wide
- Programmable CAS latency and burst length
- 2 DLLs, 474 slices (15% of XCV300-E)
- Uses the “Logic Accessible Clock” technique
- Uses the clock to latch read data, instead of DQS

74 DDR SDRAM controller Virtex-E

75 DDR SDRAM IO timing Data Lines: Read Cycle
Read cycle is critical. Data is strobed by the clock instead of DQS.
- DDR clock-to-out (relative to ddr_clk): -0.8 ns minimum
- Virtex-E hold time: -0.4 ns minimum
- Minimum trace delay on data = 0.8 ns - 0.4 ns - (clock skew between ddr_clk & fpga_clk) = 0.4 ns - clock skew

76 DDR SDRAM IO timing Addr/Cntrl Lines
Address and control lines are generated on the negative edge of the clock, to guarantee DDR hold time.
- Time from the negative edge to the next ddr_clk rising edge: 5 ns
- Virtex-E clk_out (max): 2.4 ns
- DDR setup time: 1.2 ns
- Maximum trace delay on Addr/Cntrl = 5 ns - 2.4 ns - 1.2 ns - clock skew = 1.4 ns - clock skew

77 DDR SDRAM IO timing Summary
- The I/O spec for DDR is very tight.
- Carefully calculate data and address trace delays to guarantee setup and hold times.
- The minimum trace delay on the data lines can be eliminated by delaying ddr_clk: since DDR has a negative tAC(min), delaying ddr_clk helps meet the Virtex-E hold time requirement.
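The two trace-delay limits from the read-cycle and address/control slides can be written as small Python functions. The defaults are the slide values; the 5 ns half period assumes the 100 MHz clock, and clock skew is simply subtracted in both formulas, following the slides' own convention. Function names are hypothetical.

```python
def ddr_min_data_trace_ns(t_ac_min_ns: float = -0.8, fpga_hold_ns: float = -0.4,
                          clock_skew_ns: float = 0.0) -> float:
    """Read cycle (hold check): minimum board delay on the DQ lines."""
    # 0.8 ns - 0.4 ns - skew  ==  (-0.4) - (-0.8) - skew
    return fpga_hold_ns - t_ac_min_ns - clock_skew_ns


def ddr_max_addr_trace_ns(half_period_ns: float = 5.0, fpga_clk_out_max_ns: float = 2.4,
                          ddr_setup_ns: float = 1.2, clock_skew_ns: float = 0.0) -> float:
    """Address/control (setup check, launched on the negative edge): maximum board delay."""
    return half_period_ns - fpga_clk_out_max_ns - ddr_setup_ns - clock_skew_ns


for skew in (0.0, 0.2):
    print(f"skew {skew:.1f} ns: data trace >= {ddr_min_data_trace_ns(clock_skew_ns=skew):.1f} ns, "
          f"addr/cntrl trace <= {ddr_max_addr_trace_ns(clock_skew_ns=skew):.1f} ns")
```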

78 Board Layout Guideline
All high-speed memory interfaces:
- The Virtex device and the memory chips must be placed close to each other.
- Consider/simulate board-level signal integrity and timing; pay particular attention to clocks.
- Use matched-impedance traces.
DDR:
- All bidirectional signals (data & data strobes) use IOBUF_SSTL2_II; other output signals use OBUF_SSTL2_I.
- DQ lines must be closely matched and kept short to minimize crosstalk.
- DQS trace lengths should match DQ.
- CLK and CLKB delays and loads should match (CLKB can also be routed back to an unused IOB near the feedback pin).

79 Memory Interface Application Notes
ZBT RAM: XAPP136 SDRAM: XAPP134 DDR SDRAM: XAPP200

80 CAM in Virtex-E

81 CAM Overview
Content Addressable Memory:
- Storage array (like RAM)
- Finds the location of a particular stored value
- Compares the input against the data in memory; if a match is found, outputs the address
- Maximum performance: match in a single clock cycle
Notes: Explain the basics of CAM.

82 CAM Overview
Simple RAM and CAM compared
[Figure: RAM 1024 x 8 with Add [9:0] in and Dout [7:0] out; CAM 1024 x 8 with Din [7:0] in and Add [9:0] and Match out]
Notes: By comparing the input against the data in memory, a CAM determines whether an input value matches one or more values stored in the array. If the comparison is done simultaneously, the CAM is said to be at maximum efficiency: a match, when it exists, is found in one clock cycle. Similar to a RAM, a CAM stores words in an array. The write mode is comparable, but the read mode is different. In a RAM, the word in a specific location is read by its address. In a CAM, the data on the input is looked up for a match; when a match is found, the output is its address in the array.
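The RAM/CAM contrast can be sketched as a small behavioral model in Python. The class and method names are hypothetical; writes work like a RAM, while a match takes data in and returns every address that holds it. Hardware compares all words simultaneously; the loop here only models the result, not the timing.

```python
from typing import List, Optional


class SimpleCAM:
    """Behavioral CAM model: RAM-like writes, content-based lookups."""

    def __init__(self, depth: int = 1024, width: int = 8):
        self.width = width
        self.words: List[Optional[int]] = [None] * depth  # None = never written

    def write(self, addr: int, value: int) -> None:
        """Write mode works like a RAM: store `value` at `addr`."""
        if not 0 <= value < (1 << self.width):
            raise ValueError("value does not fit in the word width")
        self.words[addr] = value

    def read(self, addr: int) -> Optional[int]:
        """RAM-style read, for comparison: address in, data out."""
        return self.words[addr]

    def match(self, value: int) -> List[int]:
        """CAM-style read: data in, every matching address out."""
        return [addr for addr, word in enumerate(self.words) if word == value]


cam = SimpleCAM()
cam.write(5, 0x3C)
cam.write(700, 0x3C)
cam.write(6, 0x01)
print(cam.match(0x3C))  # [5, 700]
print(cam.match(0xFF))  # [] -> no match
```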

83 CAM Applications
- Telecommunications
- Networking: Ethernet, ATM protocol

84 CAM Overview
CAM features:
- Word size (width)
- Number of words (depth)
- Match or compare time (read)
- Significance of write speed
- Clock frequency
- Masks
- Decoded and/or encoded address (outputs)
Notes: Basic CAM features. The time it takes to write to the CAM is not as important as the time it takes to read from it.

85 CAMs in Virtex-E
Flexible CAM designs in Virtex and Virtex-E:
- CAM implemented in a LUT
- CAM implemented in a Block SelectRAM
A Content Addressable Memory is a storage array designed to quickly find the location of a particular stored value. To determine the correct CAM implementation for a particular application, the CAM features listed above should be investigated. Virtex devices allow different approaches to designing an optimal CAM. No single CAM type fits all CAM applications, so different approaches are necessary to achieve optimal results. A small CAM with fast read and write can be implemented in Block SelectRAM+. Large CAMs can be implemented in slices configured either as 16-bit shift registers or as distributed SelectRAM+ 16x1.

86 Designing CAM in Virtex slices
XAPP203: “Designing Flexible, Fast CAMs with Virtex Family FPGAs”: VHDL and Verilog reference designs available
Features:
- 4 bits per LUT
- 16-word x 4-bit organization
- Match in one clock cycle
- 16 write clock cycles
- Decoded address output
- Generic word width from 4 bits up to any multiple of 4
- Generic number of 16-word CAM blocks
- Cascadable
- Address encoder in logic or tri-state buffers (TBUF)
Notes: This is the first way of implementing CAMs in Virtex. This method uses the SRL16 as its basic module.
- Read takes 1 clock cycle.
- Write takes 16 clock cycles.
- An encoded address is available.

87 CAM in a LUT: Match Operation
[Figure: reconfigurable 8-bit word comparator in 1 slice: the 8-bit DATA_IN is split 4 bits per SRL16 LUT (address inputs A[0:3]); the LUT outputs feed a wide AND (with a “1” input), and a flip-flop on CLK registers the result as MATCH_SIGNAL.] Notes: This schematic shows the match operation for an 8-bit-wide CAM word. Since the comparator fits in one slice, the match operation completes in one clock cycle.
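A minimal sketch of this match path, assuming (as in the usual SRL16 CAM technique) that each LUT holds a one-hot 16-bit truth table, so it outputs 1 only when its 4-bit address equals the stored nibble. The function and class names are hypothetical, and timing is reduced to a single `clock()` call per edge.

```python
def lut_contents_for(nibble: int) -> int:
    """One-hot truth table: a 4-input LUT holding this outputs 1 only at address `nibble`."""
    return 1 << (nibble & 0xF)


def lut_output(contents: int, addr: int) -> int:
    """A 4-input LUT returns the stored bit selected by its address inputs A[0:3]."""
    return (contents >> (addr & 0xF)) & 1


class WordComparator8:
    """One 8-bit CAM word in one slice: two 4-bit LUT comparators, a wide AND, a flip-flop."""

    def __init__(self, stored: int) -> None:
        self.lut_lo = lut_contents_for(stored & 0xF)         # low nibble of the word
        self.lut_hi = lut_contents_for((stored >> 4) & 0xF)  # high nibble of the word
        self.match_signal = 0                                # registered MATCH_SIGNAL

    def clock(self, data_in: int) -> int:
        # Both LUTs look up their nibble of DATA_IN in parallel; the wide AND
        # combines them and the flip-flop captures the result on this edge.
        lo = lut_output(self.lut_lo, data_in & 0xF)
        hi = lut_output(self.lut_hi, (data_in >> 4) & 0xF)
        self.match_signal = lo & hi
        return self.match_signal


word = WordComparator8(0xA7)
print(word.clock(0xA7))  # 1
print(word.clock(0xA6))  # 0 (low nibble differs)
print(word.clock(0x77))  # 0 (high nibble differs)
```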

88 Match Waveforms for CAM in a LUT
[Waveform: CLK, DATA_IN, MATCH_ENABLE, 16WORDS, ENCODE, MATCH, R_MATCH_ADDR, and R_MATCH_OK across a Match_cycle followed by an Encode_cycle; example values include DATA_IN “…1001” and R_MATCH_ADDR “0010”.] Notes: The waveforms also show that the match is found in one clock cycle. The encoded address is available one clock cycle after the match.
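The one-cycle lag between the match and the encoded address can be sketched as a two-register pipeline. This is an illustrative model, not the XAPP203 encoder: the names are hypothetical, and the lowest-index-wins rule for multiple matches is an assumption.

```python
from typing import Tuple


def encode16(match_vector: int) -> Tuple[int, bool]:
    """Encode a 16-bit decoded match vector into (4-bit address, match flag).
    Assumption: if several bits are set, the lowest index wins."""
    match_vector &= 0xFFFF
    if match_vector == 0:
        return 0, False
    lowest = (match_vector & -match_vector).bit_length() - 1  # index of lowest set bit
    return lowest, True


class MatchEncodePipeline:
    """Match cycle registers the decoded match; encode cycle registers the address."""

    def __init__(self) -> None:
        self.r_match = 0          # registered 16-bit decoded match
        self.r_match_addr = 0     # R_MATCH_ADDR
        self.r_match_ok = False   # R_MATCH_OK

    def clock(self, decoded_match: int) -> None:
        # Both registers update on the same edge: the encoder sees the value
        # captured on the previous edge, so the address lags by one cycle.
        self.r_match_addr, self.r_match_ok = encode16(self.r_match)
        self.r_match = decoded_match & 0xFFFF


p = MatchEncodePipeline()
p.clock(0b0000_0000_0000_0100)       # match cycle: word 2 matched
print(p.r_match_addr, p.r_match_ok)  # 0 False (not encoded yet)
p.clock(0)                           # encode cycle
print(format(p.r_match_addr, "04b"), p.r_match_ok)  # 0010 True
```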

89 CAM in a LUT: Write Operation
[Figure: a 4-bit counter, 4-bit compare logic, and the 8-bit DATA_IN (MSB and LSB nibbles) drive the SRL16 LUTs (D, Q, A[0:3]) of the reconfigurable 8-bit word comparator in 1 slice.] Notes: The write operation takes 16 clock cycles. For most applications, the write cycle is not as important as the match (read) cycle.
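The 16-cycle write can be sketched as follows: on each clock a 4-bit counter value is compared with a nibble of DATA_IN, and the compare result is shifted into that nibble's SRL16. After 16 shifts the LUT holds a one-hot truth table that answers "does my address equal the stored nibble?". This is a behavioral sketch under stated assumptions: the class and function names are hypothetical, and the down-counting direction is chosen here so the single 1 lands at the right address with this shift convention.

```python
class SRL16:
    """16-bit shift-register LUT: shift() once per clock, read(addr) selects a tap."""

    def __init__(self) -> None:
        self.bits = 0  # bit 0 is the most recently shifted-in value

    def shift(self, d: int) -> None:
        self.bits = ((self.bits << 1) | (d & 1)) & 0xFFFF

    def read(self, addr: int) -> int:
        return (self.bits >> (addr & 0xF)) & 1


def write_word(lut_lo: SRL16, lut_hi: SRL16, data_in: int) -> int:
    """Load an 8-bit word into a two-LUT comparator; returns the clocks used."""
    cycles = 0
    for counter in range(15, -1, -1):  # hypothetical down-counting 4-bit counter
        # The compare result (counter == nibble) is the SRL16 D input.
        lut_lo.shift(1 if counter == (data_in & 0xF) else 0)
        lut_hi.shift(1 if counter == ((data_in >> 4) & 0xF) else 0)
        cycles += 1
    return cycles


def match(lut_lo: SRL16, lut_hi: SRL16, data_in: int) -> int:
    """Match path: each LUT is addressed by its nibble of DATA_IN; outputs are ANDed."""
    return lut_lo.read(data_in & 0xF) & lut_hi.read((data_in >> 4) & 0xF)


lo, hi = SRL16(), SRL16()
print(write_word(lo, hi, 0x3C))  # 16
print(match(lo, hi, 0x3C))       # 1
print(match(lo, hi, 0x3D))       # 0
write_word(lo, hi, 0x81)         # a new write replaces all 16 bits
print(match(lo, hi, 0x3C), match(lo, hi, 0x81))  # 0 1
```

Because every write shifts all 16 bits through the LUT, no separate erase is needed here, which fits the slide's point that the match speed matters more than the write speed.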

90 Cascading CAMs in LUTs
CAM match path (1 CLK) and encode (1 CLK)
[Figure: DATA_IN (8 bits) drives an array of N x CAM_16WORDS blocks in parallel, gated by MATCH_ENABLE. Each block's 16 match lines feed a 4-bit LSB encoder, an MSB encode stage selects the matching block, and flip-flops on CLK register MATCH_ADDR and MATCH_OK.] Notes: Cascading the CAMs does not add to the time it takes to perform the operations, because the CAMs operate in parallel, as shown in the picture. The match operation takes 1 clock cycle and the encode operation 1 more, so the encoded address is available 2 clock cycles after the data is applied.
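How the cascaded address is assembled can be sketched in a few lines: each CAM_16WORDS block encodes its own 4 LSBs, and the index of the matching block supplies the upper bits of MATCH_ADDR. The function name is hypothetical, and the lowest-address-wins priority is an assumption; the sketch models only the address composition, not the parallel timing.

```python
from typing import List, Tuple


def encode_cascade(block_matches: List[int]) -> Tuple[int, bool]:
    """block_matches[i] is the 16-bit decoded match vector of CAM_16WORDS block i.
    Returns (MATCH_ADDR, MATCH_OK). Assumption: the lowest matching address wins."""
    for block, vec in enumerate(block_matches):
        vec &= 0xFFFF
        if vec:
            lsb = (vec & -vec).bit_length() - 1  # 4-bit LSB from this block's encoder
            return (block << 4) | lsb, True      # block index supplies the MSBs
    return 0, False


print(encode_cascade([0, 0, 1 << 9]))  # (41, True): block 2, word 9
print(encode_cascade([0, 0, 0]))       # (0, False)
```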

91 CAM in Block SelectRAM
XAPP204: “Using Block SelectRAM+ for High-Performance Read/Write CAMs”: VHDL and Verilog reference designs available. Features: 128 bits per Block SelectRAM+; 16-word x 8-bit organization; match in one clock cycle; write in one clock cycle (and erase in one clock cycle); decoded address output; fully synchronous, independent match and write ports; cascadable; address encoder in logic or tri-state buffers (TBUF). Notes: This is another way of implementing a CAM in Virtex, using the block RAM. Both match and write take one clock cycle.

92 CAM in a Block SelectRAM+
[Figure: CAM 16x8 macro in one Block SelectRAM+ (RAMB4_S1_S16). Port A (write, 1 bit wide): DIA[0], ADDRA[11:0] formed from DATA_WRITE[7:0] and ADDR[3:0], WEA, ENA, RSTA, CLKA, with control signals ERASE_WRITE, WRITE_ENABLE, and CLK_WRITE; DOA not connected. Port B (match, 16 bits wide): ADDRB[7:0] from DATA_MATCH[7:0], DOB[15:0] = MATCH[15:0], ENB, RSTB, CLKB, with MATCH_ENABLE, MATCH_RST, and CLK_MATCH; DIB[15:0] tied to “0000….0000” and WEB to “0”.] Notes: A macro that implements a 16x8 CAM is available in the software.
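A behavioral sketch of why one block RAM gives one-cycle write and match. It assumes the write port is viewed as 4096 x 1 with ADDRA = {data[7:0], addr[3:0]} and the match port as 256 x 16 with ADDRB = data, so that bit i of word `data` on port B is the same cell as port A address data*16 + i. That mapping, and all names here, are assumptions for illustration, not the XAPP204 source.

```python
class BlockRamCam16x8:
    """16-word x 8-bit CAM modeled as a 4096-bit memory with two aspect ratios."""

    def __init__(self) -> None:
        self.mem = [0] * 4096  # one bit per location

    def write(self, addr: int, data: int, erase: bool = False) -> None:
        # One-clock write (or erase) on port A: set or clear the single bit
        # meaning "CAM word <addr> holds value <data>".
        self.mem[((data & 0xFF) << 4) | (addr & 0xF)] = 0 if erase else 1

    def match(self, data: int) -> int:
        # One-clock match on port B: reading the 16-bit row at address <data>
        # yields the decoded MATCH[15:0] directly (bit i set = word i holds <data>).
        base = (data & 0xFF) << 4
        return sum(self.mem[base + i] << i for i in range(16))


cam = BlockRamCam16x8()
cam.write(5, 0x42)
cam.write(9, 0x42)
print(cam.match(0x42))                    # 544, i.e. bits 5 and 9 set
cam.write(5, 0x42, erase=True)            # erasing needs the old data value
cam.write(5, 0x17)
print(cam.match(0x42), cam.match(0x17))   # 512 32
```

In this model a write only sets a bit, so replacing a word's value means erasing the old value first, which is one way to read the separate one-cycle erase listed among the features.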

93 Cascading Block SelectRAM+ CAMs for bigger depth
[Figure: 64-word x 8-bit CAM in read mode built from four CAM (16x8) blocks that share DATA_MATCH[7:0] and CLK_MATCH; the blocks drive MATCH[15:0], [31:16], [47:32], and [63:48] of MATCH[63:0].] Notes: The CAMs can be cascaded for greater depth without compromising speed, because they are connected in parallel.

94 Cascading Block SelectRAM+ CAMs for higher width
[Figure: 16-word x 16-bit CAM in read mode built from two CAM (16x8) blocks on CLK_MATCH; DATA_MATCH[15:8] feeds one block and DATA_MATCH[7:0] the other, and each pair of match bits [0] through [15] is ANDed to form MATCH[15:0].] Notes: The CAMs can also be cascaded to increase the width of the CAM. In this case, the match bus (address bus) is generated by feeding the corresponding outputs of the CAMs through AND gates.
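Both cascading schemes from the last two slides reduce to simple bit operations on the 16-bit match vectors, sketched below. Each 16x8 CAM is modeled as a dictionary of address to stored byte; the function names and data are hypothetical. Depth cascading places each block's vector in its own 16-bit slice of MATCH[63:0], and width cascading ANDs the vectors of CAMs that each see one byte of the search word.

```python
from typing import Dict, List


def cam16x8_match(words: Dict[int, int], data: int) -> int:
    """Decoded MATCH[15:0] of one 16x8 CAM; `words` maps address (0-15) to an 8-bit value."""
    return sum(1 << addr for addr, value in words.items() if 0 <= addr < 16 and value == data)


def cam64x8_match(blocks: List[Dict[int, int]], data: int) -> int:
    """Depth cascade: four 16x8 CAMs share DATA_MATCH; block k drives MATCH[16k+15:16k]."""
    return sum(cam16x8_match(block, data) << (16 * k) for k, block in enumerate(blocks))


def cam16x16_match(low: Dict[int, int], high: Dict[int, int], data: int) -> int:
    """Width cascade: each 16x8 CAM sees one byte; corresponding match bits are ANDed."""
    return cam16x8_match(low, data & 0xFF) & cam16x8_match(high, (data >> 8) & 0xFF)


blocks = [{}, {3: 0x7E}, {}, {0: 0x7E}]
print(cam64x8_match(blocks, 0x7E) == ((1 << 19) | (1 << 48)))  # True

low = {4: 0x34, 6: 0x34}    # low bytes of words 4 and 6
high = {4: 0x12, 6: 0x99}   # high bytes of words 4 and 6
print(cam16x16_match(low, high, 0x1234))  # 16: only word 4 matches all 16 bits
```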

95 CAM in Block SelectRAM+: The Final Picture
[Figure: CAM16x8 macro with match flag and encoded outputs. DATA[7:0] enters write port A (4096 x 1); read port B (256 x 16) takes ADDRB[7:0] and CLKB (CLK_MATCH) and returns DOB[15:0] = MATCH[15:0], the decoded address. The 16 match lines pass through flip-flops and an encoder to produce MATCH_ADDR[3:0] (4 bits) and MATCH_SIGNAL.] Notes: This is the complete CAM design, with the match flag and encoded address.

96 CAM in Virtex FPGAs
Basic decoder/comparator block designed using either Virtex slices configured as 16-bit shift registers (8 bits per slice) or Virtex dual-port block SelectRAM+ (128 bits per block). Use an array of basic blocks to implement a CAM.
[Chart: CAM width (bits) vs. CAM depth in words for an XCV2000E, with sizes of 20,480 bits and 122,880 bits marked.] Notes: This diagram shows the different CAM sizes that can be designed in an XCV2000E.
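A quick capacity check on the two chart figures. Assuming the XCV2000E has 160 block SelectRAM+ blocks (confirm against the Virtex-E data sheet), 160 × 128 = 20,480 bits, which matches the smaller figure and suggests it is the block RAM CAM size. At 8 CAM bits per slice, the 122,880-bit figure corresponds to 15,360 slices. The helper names are hypothetical.

```python
def block_ram_cam_bits(num_blocks: int, bits_per_block: int = 128) -> int:
    """CAM capacity when each block SelectRAM+ holds one 16-word x 8-bit CAM."""
    return num_blocks * bits_per_block


def slices_for_cam_bits(cam_bits: int, bits_per_slice: int = 8) -> int:
    """Slices needed for an SRL16-based CAM at 8 CAM bits per slice (rounded up)."""
    return -(-cam_bits // bits_per_slice)


print(block_ram_cam_bits(160))       # 20480 (assumes 160 blocks)
print(slices_for_cam_bits(122_880))  # 15360
```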

97 XILINX CAMs comparison

98 SRL16

99 SelectShift
[Figure: a LUT in a CLB slice used as a shift register, with inputs IN, CE, CLK, and ADDR[3:0] and output OUT; stages labeled 1, 2, ..., 15.]
Dynamically addressable shift registers, implemented in one LUT

100 SelectShift Features
Serial in, serial out; does not require an address counter; programmable cycle delay from 1 to 16, where Addr[3:0] specifies the desired delay (address value + 1); cascade for cycle delays greater than 16; CLB flip-flops can be used to add depth.
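A behavioral sketch of SelectShift: data shifts in serially, and the 4-bit address picks which stage drives the output, giving a delay of address + 1 clocks without any address counter. The class name is hypothetical; this models behavior only, not the primitive's timing.

```python
class SelectShift:
    """Behavioral model of an SRL16E-style addressable shift register (16 stages)."""

    def __init__(self) -> None:
        self.stages = [0] * 16  # stages[0] holds the most recently shifted-in bit

    def clock(self, d: int, ce: int = 1) -> None:
        # With CE high, every stage moves one place and D enters stage 0;
        # with CE low, the contents are held.
        if ce:
            self.stages = [d & 1] + self.stages[:-1]

    def q(self, addr: int) -> int:
        # ADDR[3:0] selects the tap: address a gives a delay of a + 1 clocks.
        return self.stages[addr & 0xF]


sr = SelectShift()
outputs = []
for d in [1, 0, 0, 0, 0, 0, 0]:
    sr.clock(d)
    outputs.append(sr.q(4))  # ADDR = 4 -> 5-cycle delay
print(outputs)  # [0, 0, 0, 0, 1, 0, 0]
```

For delays beyond 16, the last tap (ADDR = 15) of one instance can feed the D input of the next, mirroring the cascade bullet above.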

101 Software Support
Primitives available in software: positive or negative clock edge triggered; clock enable optional; available for VHDL or Verilog instantiations.
SRL16: 16-bit shift register look-up table (pins D, CLK, A3, A2, A1, A0, Q).
SRL16E: 16-bit shift register look-up table with clock enable (pins D, CE, CLK, A3, A2, A1, A0, Q).

102 SRL16 Applications
Shift registers, delayed signal generation, linear feedback shift registers (LFSRs), CRC circuits
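As an example of the LFSR application, here is a minimal Fibonacci LFSR sketch in Python. In hardware the shift register is what an SRL16 (or a chain of them) would hold; here it is just an integer. The function name is hypothetical, and the 4-bit polynomial x^4 + x^3 + 1 is a standard maximal-length choice picked for illustration, not one taken from the slides.

```python
from typing import List, Sequence


def lfsr_states(seed: int, width: int = 4, taps: Sequence[int] = (3, 2)) -> List[int]:
    """Fibonacci LFSR: XOR the tapped bits and shift the result into bit 0.
    taps=(3, 2) with width=4 implements x^4 + x^3 + 1 (maximal length 15).
    The taps must include bit width-1 so the state map is a permutation and
    the loop below always returns to the seed."""
    mask = (1 << width) - 1
    start = state = seed & mask
    states = []
    while True:
        states.append(state)
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = ((state << 1) | feedback) & mask
        if state == start:
            return states


states = lfsr_states(0b0001)
print(len(states))                           # 15
print(sorted(states) == list(range(1, 16)))  # True: every nonzero state appears
```

A CRC circuit uses the same shift-and-XOR structure, with the incoming data bit also XORed into the feedback.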

103 Virtex-E Configuration

104 Agenda
Review of configuration modes (Serial, Parallel, JTAG); startup sequence; XC1800 PROM interfacing; daisy chaining; tips for debugging configuration issues; JTAG configuration

105 Operation Flow
Configuration data is stored in a PROM or downloaded through a cable. Configuration time depends on device size, type of configuration, and clock speed.
[Figure: POWER UP -> CONFIGURATION (Serial Mode, Parallel Mode, or JTAG) -> Device Operational]

106 Configuration Modes
Serial modes (Master, Slave); parallel mode (SelectMAP); JTAG

107 Serial Mode Configuration
[Figure: Master Serial configuration mode: a PROM (CLK, DATA, /CE, /RESET/OE) connected to a Virtex-E (CCLK, DIN, DONE, /INIT).] Serial configuration: in Master mode, the Virtex-E device initiates the configuration; in Slave mode, the Virtex-E device waits for some other device to start the configuration.

108 Serial Mode Configuration
Data is loaded serially, one bit per CCLK. A Virtex-E device in Master Serial mode produces its own CCLK, and the CCLK rate is controllable in software; this mode is used with a PROM. In Slave Serial mode, the Virtex-E device needs a CCLK provided by another device; all download cables do this.

109 Parallel Mode Configuration: SelectMAP
[Figure: a microprocessor connected to a Virtex-E through CCLK, D0-D7, DONE, /CS, /WRITE, and PROG.] One byte is loaded per CCLK. SelectMAP is designed to be driven by another logic device: another FPGA or CPLD, a processor or microcontroller, or the MultiLinx cable. SelectMAP is a parallel configuration mode: 1. it is the fastest; 2. it is the best to use with microprocessors.
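The "one bit per CCLK" versus "one byte per CCLK" difference translates directly into load time. The sketch below is an idealized estimate that ignores startup, initialization CCLKs, and BUSY handshaking; the bitstream size and CCLK frequency are hypothetical placeholders, not real device figures.

```python
def config_time_s(bitstream_bits: int, cclk_hz: float, bits_per_cclk: int) -> float:
    """Idealized load time: ignores startup, initialization CCLKs, and BUSY handshaking."""
    return bitstream_bits / (bits_per_cclk * cclk_hz)


BITS = 1_000_000  # hypothetical bitstream size, not a real device figure
CCLK = 20e6       # hypothetical 20 MHz CCLK

serial_s = config_time_s(BITS, CCLK, 1)     # serial: one bit per CCLK
selectmap_s = config_time_s(BITS, CCLK, 8)  # SelectMAP: one byte per CCLK
print(f"serial {serial_s * 1e3:.2f} ms, SelectMAP {selectmap_s * 1e3:.2f} ms")
# serial 50.00 ms, SelectMAP 6.25 ms
```

At the same CCLK rate SelectMAP is eight times faster; in practice the achievable CCLK, and whether BUSY must be monitored above 50 MHz, also affect the result.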

110 Important Signals in SelectMAP
Data (D0-D7): bidirectional data bus; D0 is the MSB. /WRITE: direction of data on the bus; Low for configuration (write), High for readback. /CS: enable for the data bus; when High, CCLK transitions are ignored. BUSY: output that indicates when data can be received; not needed for CCLK < 50 MHz. All pins are shown in the pinout tables in the datasheet. 1. When /WRITE is High, the device is in READBACK mode. 2. BUSY is used for handshaking when CCLK is fast (above 50 MHz).
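Because D0 is the MSB on the SelectMAP bus, a host that numbers its data bits the other way (bit 0 as the LSB, wired straight to D0) has to bit-reverse each configuration byte before writing it. Whether a swap is actually needed depends on the board wiring and on how the configuration file was generated, so treat this as an illustrative sketch; the function names are hypothetical.

```python
def swap_bits(byte: int) -> int:
    """Reverse the bit order of one byte (bit 7 <-> bit 0, bit 6 <-> bit 1, ...)."""
    out = 0
    for i in range(8):
        out = (out << 1) | ((byte >> i) & 1)
    return out


def swap_stream(data: bytes) -> bytes:
    """Bit-reverse every byte of a configuration stream before driving D0-D7."""
    return bytes(swap_bits(b) for b in data)


print(hex(swap_bits(0x01)))                     # 0x80
print(swap_stream(bytes([0x01, 0xF0])).hex())   # 800f
```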

111 SelectMAP: Things to Know
Initialization is needed after /INIT goes High: 3 CCLKs are needed. If /CS and /WRITE are asserted early, no data will be transferred on the first CCLK. To strobe data, use /CS, not /WRITE. If a CCLK rising edge occurs when /CS is asserted and /WRITE is de-asserted, an ABORT will occur; you then need to reload the Sync Word and redo the last packet. Purpose: special things to know about SelectMAP. Notes: The initialization is needed for serial modes too. More information on ABORT is in the datasheet.

112 Virtex-E Bitstream Format
10 internal configuration registers. The bitstream is actually a set sequence of writes into those registers. Configuration data is still broken into frames. All data is encapsulated into packets: Type I and Type II. When migrating from Virtex to Virtex-E, a new bitstream is needed. Purpose: explain the main idea of Virtex bitstreams. Notes: The configuration logic acts somewhat like a processor, with registers and writes to those registers. A frame represents one vertical line of configuration bits in the device, including IOB bits, CLB bits, and routing bits.

113 Configuration Registers
Each register has a 5-bit address. Detailed information is in XAPP138.

114 Configuration Startup Sequence
Four signals to control: GWE (Global Write Enable), GSR (Global Set/Reset), GTS (Global 3-State), and DONE (external DONE pin). Six phases (1-6) to select for assertion/de-assertion. The sequencer waits in the DONE phase until DONE goes High. You can create “Sync-To-Done” behavior by setting GTS, GSR, and GWE to the same cycle as DONE. Purpose: explain what the startup sequence does and which aspects of it can be changed. Notes: Between loading the configuration data and becoming functional, the device must go through the startup sequence. This sequence steps the device through a state machine in which certain signals are activated and deactivated. The user chooses where these events happen in Bitgen.

115 Startup Sequence
[Figure: default startup sequence over StartupClk phases 0 through 7 for DONE, GTS, GSR, and GWE; the default phase is shown in bold.]

116 Virtex-E and XC1800 PROMs
Can program via serial or SelectMAP mode; serial vs. parallel is controlled in software. Purpose: the new XC1800 PROMs are particularly useful for Virtex device download. They can be used for serial or SelectMAP download, so they can be faster than other PROMs. The PROMs are reprogrammable via JTAG.

117 Daisy Chaining
Available only in Serial or JTAG mode.
[Figure: PROM -> Virtex-E #1 (Master, DIN/DOUT) -> Virtex-E #2 (Slave, DIN/DOUT) -> Virtex/4kX #3 (Slave, DIN).] Concatenation of bitstreams does not work; use the software (PROMGen) to generate the necessary bitstreams.

118 Debugging Tips and Info
What causes /INIT to go Low? A failed CRC check, or an internal error (for example, data loaded too fast). When will an error stay undetected? When a bit is missed or added: this misaligns the instructions, and the CRC check won’t happen. Mode pin considerations: internal pullups are guaranteed; make sure any pulldown is strong enough (4.7k). Virtex does not have indicator pins (HDC, /LDC) like the 4k families, so there is less information on what went wrong. Some signals need to be looked at if there is a configuration problem. Notes: DONE and INIT are the most useful. If INIT goes Low, there was a CRC or internal error.

119 JTAG Configuration

120 What is JTAG? JTAG: Joint Test Action Group
Developed as a standard testing interface (Boundary Scan, IEEE STD). Four dedicated pins required: TDI, TDO, TMS, and TCK. TRST is an optional fifth pin that Xilinx does not use. Notes: JTAG was originally developed for testing; we use the JTAG standard for programming, not testing.

121 JTAG Standard
JTAG standard: 16-state state machine; TAP (Test Access Port); IR (Instruction Register); DR (Data Register); Bypass Register. Notes: The JTAG standard requires this to be implemented in dedicated hardware that can perform the functions listed.

122 JTAG TAP Controller
[Figure: TAP controller state diagram with Test-Logic-Reset, Run-Test/Idle, and the DR and IR scan paths (Select, Capture, Shift, Exit1, Pause, Exit2, Update), transitions labeled by TMS.] The Test-Logic-Reset state is entered by the 9500; it resets the TAP controller and loads the IR and DR registers with benign values. Run-Test/Idle is the wait state of the controller (this state triggers the execution of program and erase instructions). Select-DR-Scan is a transition decision state. Capture loads the active register with a predetermined value. The Shift state is used to place the value into either the DR or the IR. Update is the last state of each DR/IR transfer; in this state, the value shifted into the DR or the IR is actually loaded.
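The 16-state TAP controller is small enough to write out as a transition table. The sketch below encodes the IEEE 1149.1 transitions as (next state for TMS = 0, next state for TMS = 1), one step per TCK rising edge; the table and function names are hypothetical. It also checks a property programmers rely on: five TCKs with TMS held High reach Test-Logic-Reset from any state.

```python
from typing import Dict, Iterable, Tuple

# Next state for (TMS = 0, TMS = 1), sampled on each rising edge of TCK.
TAP_NEXT: Dict[str, Tuple[str, str]] = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def walk(state: str, tms_bits: Iterable[int]) -> str:
    """Apply a sequence of TMS values, one per TCK, and return the final state."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms & 1]
    return state


print(walk("Test-Logic-Reset", [0, 1, 0, 0]))  # Shift-DR
print(walk("Run-Test/Idle", [1, 1, 0, 0]))     # Shift-IR
# Five TCKs with TMS = 1 reach Test-Logic-Reset from any state.
print(all(walk(s, [1] * 5) == "Test-Logic-Reset" for s in TAP_NEXT))  # True
```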

123 JTAG TAP Controller: Architecture
Notes: This is the JTAG TAP controller architecture. There are 3 boundary-scan register bits for every pin: input, output, and tri-state control. The Bypass register is used to bypass the device and pass a JTAG bitstream on to the next device.
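The effect of the Bypass register on a chain can be sketched as a string of one-bit delays: with every device in BYPASS, data presented at the first TDI emerges from the last TDO one TCK later per device. This is a simplified model (TDO edge timing is ignored) with a hypothetical function name; it assumes each bypass flip-flop has captured 0 before shifting starts.

```python
from typing import List


def shift_through_bypass(tdi_bits: List[int], num_devices: int) -> List[int]:
    """TDO bits seen at the end of a chain whose devices are all in BYPASS.
    Each BYPASS register is one flip-flop, captured as 0 before shifting."""
    if num_devices < 1:
        raise ValueError("need at least one device in the chain")
    regs = [0] * num_devices          # regs[k] = bypass bit of device k (device 0 sees TDI)
    tdo_bits = []
    for bit in tdi_bits:              # one TCK per bit while in Shift-DR
        tdo_bits.append(regs[-1])     # the last device drives the chain's TDO
        regs = [bit & 1] + regs[:-1]  # every device shifts by one position
    return tdo_bits


print(shift_through_bypass([1, 0, 1, 1, 0, 0, 0], 3))  # [0, 0, 0, 1, 0, 1, 1]
```

Knowing this one-bit-per-device delay is one reason the debugging tips stress knowing every device in the chain and its order.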

124 BSDL Files: Boundary Scan Description Language
BSDL files define the hardware: a description of the die, with pins and scan chain order, and information about the size of the various chip-specific registers (e.g. instruction register length). Unconfigured BSDL files are provided; they assume all I/Os are bidirectional. Notes: Unconfigured files are OK, since we are using them to configure the device and we don’t know whether the I/Os are going to be inputs or outputs. If the customer wants to use JTAG for testing (which is not the topic here), then they must modify the BSDL files to make them configured. Xilinx provides unconfigured files only.

125 BSDL Availability
Files on the web are continuously updated. Current software does not always have the most recent BSDL file -> Software

126 JTAG Programmer Software Support for Virtex-E
JTAG software support in M2.1i SP3. Non-invasive: IDCODE, BYPASS, USERCODE. SVF file generation. Stay current with the download tools: service packs, WebPACK (PC only). Foundation or Alliance software updates at: JTAG Programmer at:

127 Cables Provided by Xilinx: MultiLinx, Parallel Cable III, XChecker
MultiLinx: supported in 2.1i SP2 JTAG Programmer; USB or serial ports; Win 98 only. Parallel Cable III. XChecker.

128 Cables: JTAG Connections
This shows a chain of devices and how TDI connects to TDO through the chain. How many devices are there? A Virtex-E in a 5 V chain is bad. If there is a TRST trace on the board, it should be tied High.

129 JTAG Debugging Tips
Debug Chain software tool (logic probe). The /TRST pin should be tied High on third-party chips. Watch for noise or a bad parallel port. ISP checklist: app note XAPP104. Know all devices in the chain and their order. Virtex-E does not tolerate 5 V signals directly.

130 Good References
Virtex-E Datasheet: basic information on configuration modes
XAPP138: configuration modes, packets, and readback
XAPP151: detailed bitwise explanation of configuration registers, partial reconfiguration hints, and advanced concepts in readback
XAPP139: detailed information on JTAG configuration and readback for Virtex devices
XAPP153: status and control register information for partial reconfiguration

