FPGA Design Techniques I

Ruth Markenson: Version changed; original slides 11, 14, 23 removed

Objectives
After completing this module, you will be able to:
- Use hierarchy effectively
- Increase circuit reliability and performance by applying synchronous design techniques

Hierarchical Design
- Using hierarchy improves design readability, reuse, and debug
[Figure: hierarchy tree. Top Level of Design (infer or instantiate I/O here) above blocks divided by logic type. Labels in the drawing: State Machines (one-hot, binary, enumerated); Data Paths (pipelining, muxing/de-muxing, arithmetic); Counters; Adders/Subtractors; Bit Shifters; Accumulators; Building Blocks; Standard widths; Pipeline RAMs; Specific Functions; LogiBLOX RAM; Other IP/Cores; Coregen (FIFOs, FIR filters, RAM); Parameterizable functions; Technology Specific Functions]

A hierarchical design is composed of several large blocks. Each of these blocks may be composed of smaller blocks, and so on. Designs that do not have this type of structure are called flat designs. We will use the term block in this section to mean any of the following:
- A macro or symbol on a schematic
- An entity in VHDL
- A module in Verilog
One way to divide your design into hierarchical blocks is by logic type, as this drawing illustrates. We will see in a later slide why this kind of division is recommended for HDL designs.

Benefits of Using Hierarchy
- Use the best design entry method for each type of logic
- Design readability
  - Easier to understand design functionality and data flow
  - Easier to debug
- Easy to reuse parts of a design
- Synthesis tool benefits
  - Covered in a later section

Here is a list of some benefits of hierarchical design. The next several slides will go into more detail on each benefit. The potential synthesis benefits of hierarchical design will be covered in the "Introduction to Efficient Synthesis" module. One major benefit of hierarchical design, not listed here, is team-based design.

Design Entry Methods
- Use HDL for:
  - State machines
  - Control logic
  - Bused functions
- Use schematics for:
  - Top-level design
  - Manually optimized logic
- "Mixed mode" designs can utilize the best of both worlds
  - However, they are less portable than an all-HDL design

HDLs are becoming more and more popular, and synthesis vendors are constantly improving their tools to produce better results. HDL entry is great for state machines and control logic because you do not have to manually optimize all of the decoding logic. Wide-bused functions are as easy to build as single-bit functions. There are still times when it is easier, or better, to use schematics. You may find it easier to understand the overall structure of a design from a schematic top level. You should also use schematics when you need highly optimized logic that your synthesis tool cannot produce. Designs that use both schematic and HDL blocks are called mixed-mode designs. A mixed-mode design might be 90 percent HDL blocks, with just a few schematic blocks for critical functions. Many mixed-mode designs are entirely HDL except for the top level.

Design Readability Tips
- Choose hierarchical blocks that have:
  - Minimum of routing between blocks
  - Logical data flow between blocks
- Choose descriptive labels for blocks and signals
- Keep clock domains separated
  - Makes the interaction between clocks very clear
- Keep the length of each source file manageable
  - Easier to read, synthesize, and debug

File length: This recommendation is strictly for readability of your code. Your synthesis vendor may have additional recommendations on block size.
Descriptive labels: Good naming conventions will help you locate trouble spots in your design during simulation and implementation. A name like "U1" for an instantiated component may not be as useful as a name like "FIFO_CTRL."

Design Reuse Tips
- Build a set of blocks that are available to all designers
  - Register banks
  - FIFOs
  - Other standard functions
  - Custom functions commonly used in your applications
- Name blocks by function and target Xilinx family
  - Easy to locate the block you want
  - Example: REG_4X8_SP (bank of four eight-bit registers targeting Spartan)
- Store in a separate directory from the Xilinx tools
  - Prevents accidental deletion when updating tools

Why Synchronous Design?
- Synchronous circuits are more reliable
  - Events are triggered by clock edges that occur at well-defined intervals
  - Outputs from one logic stage have a full clock cycle to propagate to the next stage
  - Skew between data arrival times is tolerated within the same clock period
- Asynchronous circuits are less reliable
  - Delay may need to be a specific amount (for example, 12 ns)
  - Multiple delays may need to hold a specific relationship (for example, DATA arrives 5 ns before SELECT)

Xilinx strongly recommends synchronous design for reliability reasons. Asynchronous circuits may work correctly in one device but fail in another "identical" device due to normal variations of delays inside the chip.

Asynchronous Design: Case Studies
- "The design I did two years ago no longer works. What did Xilinx change in their FPGAs?"
  - SRAM process improvements and geometry shrinks increase speed
  - Normal variations between wafer lots
- "My design passes a timing simulation test but fails in circuit. Is the timing simulation accurate?"
  - YES
  - Timing simulation uses worst-case delays
  - Actual board-level conditions are usually better

Clock Skew
- This shift register will not work because of clock skew!
[Figure: three-stage shift register (INPUT -> Q_A -> Q_B -> Q_C) with flip-flops A, B, and C clocked from CLOCK over separate routes; delay labels 3.0, 3.1, 3.3, and 12.5 appear in the drawing. Waveforms: expected operation (Clock, Q_A, Q_B, Q_C; 3 cycles) and the clock-skewed version (A & C Clock, B Clock, Q_A, Q_B, Q_C; 2 cycles)]

In synchronous circuits, events are controlled by the clock signal. If the clock arrives at different flip-flops at different times, your circuit may not function correctly. This is an example of a clock skew problem.
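
The skew failure described here can be sketched with a small behavioral model. This is an illustration only: the function names, the arrival times, and the single prop_delay value are assumptions chosen for the example (the mapping of the drawing's delay labels to particular nets is not recoverable from this transcript). The model shows a late clock at flip-flop B letting data pass through two stages on one edge.

```python
def clock_edge(state, data_in, arrival, prop_delay):
    """Apply one clock edge to a three-stage shift register (A -> B -> C).

    state      -- (q_a, q_b, q_c) before the edge
    arrival    -- (t_a, t_b, t_c): when the edge reaches each flip-flop
    prop_delay -- clock-to-Q plus routing delay between adjacent stages
    All timing values are hypothetical and share one arbitrary time unit.
    """
    q_a, q_b, q_c = state
    t_a, t_b, t_c = arrival
    new_a = data_in
    # B captures A's NEW value if that value arrives before B's clock edge.
    new_b = new_a if t_a + prop_delay < t_b else q_a
    # Same check for the B -> C stage.
    new_c = new_b if t_b + prop_delay < t_c else q_b
    return (new_a, new_b, new_c)


def edges_until_output(arrival, prop_delay):
    """Count clock edges until a single 1 on INPUT appears on Q_C."""
    state, edges = (0, 0, 0), 0
    data = [1, 0, 0, 0]
    while state[2] == 0:
        state = clock_edge(state, data[edges], arrival, prop_delay)
        edges += 1
    return edges


balanced = edges_until_output(arrival=(3.0, 3.0, 3.0), prop_delay=3.3)
skewed = edges_until_output(arrival=(3.0, 12.5, 3.1), prop_delay=3.3)
print(balanced, skewed)
```

With a balanced clock the bit needs three edges to reach Q_C; with B's clock arriving late it needs only two, which matches the "3 cycles" versus "2 cycles" annotations on the waveforms.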

Use Global Buffers to Reduce Clock Skew
- Global buffers are connected to dedicated routing
  - This routing network is balanced to minimize skew
- All Xilinx FPGAs have global buffers
  - Virtex-II devices have sixteen BUFGMUXes
  - Virtex and Spartan-II have four BUFGs
- You can always use a BUFG symbol, and the software will choose an appropriate buffer type
  - All major synthesis tools can infer global buffers onto clock signals that come from off-chip

Traditional Clock Divider
- Introduces clock skew between CLK1 and CLK2
- Uses an extra BUFG to reduce skew on CLK2
[Figure: a divide-by-two flip-flop clocked by CLK1 through a BUFG; its output, CLK2, passes through a second BUFG and clocks the downstream flip-flop]

This is a typical circuit that divides CLK1 by two. The potential problem with this circuit is that there is skew between CLK1 and CLK2. If this skew does not cause a problem in your design, it is okay to use this circuit.

Recommended Clock Divider
- No clock skew between flip-flops
[Figure: CLK1 clocks every flip-flop through one BUFG; the divider flip-flop output, CLK2_CE, drives the clock enable (CE) of the downstream flip-flop]

This implementation is very similar, but it eliminates the skew because all flip-flops are clocked with CLK1.
Note: If you are using a Virtex or Spartan-II device, you can use the DLL to generate a divided clock with no skew.
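
As a rough behavioral sketch of the clock-enable approach, the model below clocks both flip-flops from the same CLK1 edge and lets the toggle flip-flop's output act as CLK2_CE. The function name run_divider and the reset values of zero are assumptions made for the example; it is a software illustration, not HDL.

```python
def run_divider(data_in):
    """Model the clock-enable divider: every flip-flop is clocked by CLK1.

    data_in -- one input sample per CLK1 edge
    Returns the downstream flip-flop's Q after each CLK1 edge.
    """
    clk2_ce = 0   # output of the toggle flip-flop (the CLK2_CE net)
    q = 0         # output of the downstream flip-flop with CE
    trace = []
    for d in data_in:
        # Both flip-flops sample on the same CLK1 edge, so the next values
        # are computed from the current (pre-edge) state.
        next_q = d if clk2_ce else q
        next_ce = 1 - clk2_ce
        q, clk2_ce = next_q, next_ce
        trace.append(q)
    return trace


print(run_divider([1, 2, 3, 4, 5, 6]))
```

The downstream flip-flop updates on every second CLK1 edge, so it behaves as if it ran at half the rate without a second clock net.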

Avoid Clock Glitches
- Because flip-flops in today's FPGAs are very fast, they can respond to very narrow clock pulses
- Never source a clock signal from combinatorial logic
  - Also known as "gating the clock"
[Figure: a binary counter whose MSB and LSB are decoded by combinatorial logic that drives the clock pin of a flip-flop (FF); "Glitch may occur here" marks the decoded clock. Annotations attribute the glitch to a faster MSB (shorter routing)]

Again, because it is such an important signal, you must avoid glitches on the clock. In the past, flip-flops were slower and would ignore brief glitches on the clock. Today's flip-flops are so fast that they can react to clock glitches that are less than 1 ns wide.

Avoid Clock Glitches
- This circuit creates the same function, but without glitches on the clock
[Figure: counter outputs Q3, Q2, Q1, and Q0 feed an AND gate that drives the CE pin of a flip-flop (FF); INPUT drives D; CLOCK drives both the counter and the flip-flop]

Notice two things:
1. The output of the AND gate is connected to the clock enable of the flip-flop instead of to the clock pin.
2. The AND gate is decoding the 1110 state of the counter. This is so that the flip-flop will be enabled during the same clock edge that the counter reaches 1111.
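
The two points above can be checked with a short behavioral sketch. The function name run_enabled_capture, the 4-bit counter width, and the zero reset values are assumptions for the example; the only behavior taken from the notes is that the AND gate decodes 1110 and drives the clock enable.

```python
def run_enabled_capture(inputs, width=4):
    """Capture INPUT once per counter cycle using a clock enable.

    The AND gate decodes the state just before all-ones (1110 for 4 bits),
    so the flip-flop is enabled on the edge that takes the counter to 1111.
    Returns a list of (counter_after_edge, q_after_edge) pairs.
    """
    counter, q = 0, 0
    decode_state = (1 << width) - 2      # 0b1110 when width == 4
    trace = []
    for d in inputs:
        ce = 1 if counter == decode_state else 0   # AND-gate output
        # Counter and flip-flop share CLOCK; update both from pre-edge values.
        next_q = d if ce else q
        counter = (counter + 1) % (1 << width)
        q = next_q
        trace.append((counter, q))
    return trace


trace = run_enabled_capture(list(range(100, 120)))
print(trace[13], trace[14], trace[19])
```

The flip-flop output changes on exactly the edge where the counter becomes 1111, and the clock pin itself never sees decoded logic.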

Avoid Set/Reset Glitches
- Glitches on asynchronous Set/Reset inputs can lead to incorrect circuit behavior
[Figure: "Asynchronous Reset". Binary counter outputs Q[x] and Q[0] are decoded into RESET, which drives the CLR input of a flip-flop (FF) with data INPUT and output Q; CLOCK drives the circuit]

Besides the clock, there may be other signals in your design that need to be glitch free. One example is asynchronous sets or resets. In this circuit, a glitch could occur on RESET during the 01 -> 10 transition of the counter output (01 -> 11 -> 10).

Avoid Set/Reset Glitches
- Convert to synchronous set/reset when possible
[Figure: "Synchronous Reset". Counter outputs Q[x] and Q[0] are decoded into RESET, which drives the synchronous R input of a flip-flop (FF) with data INPUT and output Q; CLOCK drives both the counter and the flip-flop]

Convert to synchronous reset by placing a different library component on your schematic (FDR and FDS have synchronous control; FDC and FDP have asynchronous control), or by modifying your HDL code. Notice that the lower AND gate is decoding the 10 state of the counter instead of the 11 state, so the flip-flop resets when the counter transitions to 11.

Review Questions
- List three benefits of hierarchical design
- Why should you use global buffers for your clock signals?

Answers
- List three benefits of hierarchical design
  - Allows you to use different design entry methods
  - Design readability
  - Design reuse
- Why should you use global buffers for your clock signals?
  - To reduce clock skew

Summary
- Proper use of hierarchy aids design readability and debug
- Synchronous designs are more reliable than asynchronous designs
- FPGA design tips
  - Global clock buffers and DLLs eliminate skew
  - Avoid glitches on clocks and asynchronous set/resets

Where Can I Learn More?
- Application notes on http://support.xilinx.com
  - Under the Design tab, click App Notes
- Software documentation
  - Development System Reference Guide, Chapter 2 "Design Flow," FPGA Design Techniques section
  - Libraries Guide
- Documentation for your synthesis tool

FPGA Design Techniques II
Version 4

Objectives
After completing this module, you will be able to:
- Increase design performance by duplicating flip-flops
- Increase design performance by adding pipeline stages
- Increase board performance by using I/O flip-flops
- Build reliable synchronization circuits

Duplicating Flip-Flops
- High-fanout nets can be slow and hard to route
- Duplicating flip-flops can fix both problems
  - Reduced fanout shortens net delays
  - Each flip-flop can fan out to a different physical region of the chip to help with routing
- Design tradeoffs
  - Gain routability and performance
  - Design area increases
[Figure: three flip-flops (D, Q), each labeled fn1]

Which Flip-Flops Should I Duplicate?
- Duplicate flip-flops that source:
  - Address and control lines to large memory arrays
  - Clock enables or output enables
  - Synchronous reset signals
- Do not duplicate flip-flops that are sourced by asynchronous signals
  - Synchronize the signal first
  - Feed synchronized signal to multiple flip-flops

"Synchronize the signal first": We will talk about synchronization circuits later in this module.

Tips on Duplicating Flip-Flops
- Name duplicated flip-flops _a, _b, NOT _1, _2
  - Numbered flip-flops are mapped by default into the same CLB
  - You want duplicated flip-flops to be separated
    - Especially if the loads are spread across the chip
- Synthesis tips
  - Most synthesis tools have fanout-control features
    - However, they do not always pick the best implementation
  - Xilinx recommends explicitly creating duplicate flip-flops in HDL code
  - Many synthesis tools will optimize out duplicated flip-flops
    - Tell your synthesis tool to keep redundant logic

Synthesis tools: For example, FPGA Express has an option called "Merge Duplicate Registers," which needs to be deselected to prevent optimization of your duplicated flip-flops.

Duplicating Flip-Flops Example
- The source flip-flop drives two register banks that are constrained to different regions of the chip
- The source flip-flop and pad are not constrained
- PERIOD = 5 ns timing constraint
- Implemented with default options
  - Longest path = ns
  - Fails to meet timing constraint

In this simple design, the source flip-flop is "trapped" between the two sets of loads. Moving the source flip-flop closer to one register moves it farther away from the other register. The overall result is that timing cannot be met.

Duplicating Flip-Flops Example
- The source flip-flop has been duplicated
  - Each flip-flop drives a region of the chip
  - Each flip-flop can be placed closer to the register that it is driving
  - Shorter routing delays
- Longest path = ns
  - Meets timing constraint

By duplicating the source flip-flop, the tools are able to move each flip-flop closer to its set of loads. There is a trade-off: the paths from the input pad to the duplicated flip-flops are longer. This design does not contain an OFFSET IN constraint. If you have an OFFSET IN requirement, you must consider how much slack you have on the OFFSET before deciding to duplicate the flip-flop. You may also consider duplicating the input pad that is connected to the flip-flops. This will allow the implementation tools to keep the input setup time short, while improving the internal clock frequency.

Pipelining Concept
[Figure: original circuit with two logic levels between flip-flops (fMAX = n MHz) and pipelined circuit with one logic level between each pair of flip-flops (fMAX = 2n MHz)]

Adding a pipeline stage will not exactly double fMAX. The flip-flop that is added to the circuit has an input setup time and a clock-to-Q time that make the pipelined circuit run at less than double the original frequency. We will see a more detailed example of increasing performance by pipelining later in this section.

Pipelining
- Inserting flip-flops into a datapath is called pipelining
- Pipelining increases circuit performance by reducing the number of logic levels between flip-flops
  - Fewer logic levels = shorter delays between flip-flops = faster clock speeds
- Xilinx FPGA architectures support pipelining
  - Many flip-flops are available
  - Slice structure is a Look-Up Table (logic level) followed by a flip-flop

Pipelining Considerations
- Are there enough flip-flops available?
  - Refer to the Map Report
  - You generally will not run out of FFs
- Are there multiple logic levels between flip-flops?
  - If there is only one logic level between flip-flops, pipelining will not improve performance
    - Exception: Long carry-logic chains can benefit from pipelining
  - Refer to the Post-Map Static Timing Report or Post-Place & Route Static Timing Report
- Can the system tolerate latency?

The Design Summary section of the Map Report contains resource utilization information. In most cases, you will have enough flip-flops available to add pipeline stages. Timing reports show which paths in your design are the longest and how many logic levels are in each path. The timing report counts delays, such as clock-to-Q delays, as a logic level. Therefore, look at the detailed path analysis section of the report and count the number of look-up table delays (tILO) to determine the true number of logic levels in the path. Latency is explained in more detail on the next page.

Latency in Pipelines
- Each pipeline stage adds one clock cycle of delay before the first output will be available
  - Also called "filling the pipeline"
- After the pipeline is filled, a new output is available every clock cycle

This is an example of a pipelined circuit. Follow the data as it goes through the multiplication and then the addition. Latency in pipelines can be visualized as a factory assembly line. Raw materials enter the assembly line, then go through several stations. At each station one production step is performed. When you start the assembly line, you have to wait a while before the first finished product is produced. After that waiting period, you get a constant stream of products coming off the assembly line.
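
A minimal software sketch of "filling the pipeline" follows. The slide's circuit is not preserved in this transcript, so the operand arrangement (a * b in the first stage, + c in the second) and the function name pipelined_multiply_add are assumptions for illustration.

```python
def pipelined_multiply_add(samples):
    """Two-stage pipeline: stage 1 multiplies, stage 2 adds.

    samples -- list of (a, b, c) tuples, one per clock; computes a * b + c
    Returns the output register value after each clock (None while filling).
    """
    stage1 = None      # (a * b, c) held in the registers after the multiplier
    stage2 = None      # registered sum
    outputs = []
    for a, b, c in samples:
        # All registers update on the same clock edge from pre-edge values.
        next_stage2 = None if stage1 is None else stage1[0] + stage1[1]
        next_stage1 = (a * b, c)   # c is registered too, so it stays aligned
        stage1, stage2 = next_stage1, next_stage2
        outputs.append(stage2)
    return outputs


print(pipelined_multiply_add([(1, 2, 3), (4, 5, 6), (7, 8, 9)]))
```

The first result appears one clock later than the first input was captured, and after that a new result is produced on every clock.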

Pipelining Example
- Original circuit
  - Two logic levels between SOURCE_FFS and DEST_FF
  - fMAX = ~137 MHz
[Figure: SOURCE_FFS drive a column of three LUTs in parallel, followed by a single LUT that feeds DEST_FF]

This path has two logic levels between SOURCE_FFS and DEST_FF. The first logic level is the column of three LUTs in parallel. The second logic level is the single LUT on the right. The delays in this path are:
- SOURCE_FFS clock-to-Q
- Net delay
- LUT delay
- DEST_FF setup time
Estimating routing delays to be short (~2 ns), this circuit could run at about 117 MHz in a Virtex device (slowest speed grade).

Pipelining Example
- Pipelined circuit
  - One logic level between each set of flip-flops
  - fMAX = ~228 MHz
[Figure: SOURCE_FFS -> three LUTs in parallel -> PIPE_FFS -> single LUT -> DEST_FF]

After adding a pipeline stage, the circuit has been split into two paths. The first path is from SOURCE_FFS through one logic level to PIPE_FFS. The second path is from PIPE_FFS through one logic level to DEST_FF. The delays in either path are:
- Starting FF clock-to-Q
- Net delay
- LUT delay
- Ending FF setup time
Estimating routing delays to be approximately 2 ns, as in the original circuit, this circuit could run at about 178 MHz (52 percent increase). In this example, the performance increase was much less than double because the net delays were estimated to be short. If we estimated routing delays to be 4 ns, the performance would jump from 80 MHz to 133 MHz (66 percent increase).
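
The arithmetic behind these estimates can be sketched as follows. The device delays are not given in the notes, so t_ff_ns (clock-to-Q plus setup) and t_lut_ns are assumed values back-fitted so that the results land close to the figures quoted in the notes; the model also assumes one net delay per logic level. Treat it as an illustration of the method rather than data-sheet timing.

```python
def fmax_mhz(logic_levels, net_ns, t_ff_ns=2.7, t_lut_ns=0.9):
    """Estimate fMAX for a register-to-register path.

    logic_levels -- LUTs in series between the two flip-flops
    net_ns       -- estimated routing delay per logic level
    t_ff_ns      -- clock-to-Q plus setup time (assumed, back-fitted value)
    t_lut_ns     -- delay of one LUT (assumed, back-fitted value)
    """
    period_ns = t_ff_ns + logic_levels * (t_lut_ns + net_ns)
    return 1000.0 / period_ns


for net in (2.0, 4.0):
    original = fmax_mhz(2, net)     # two logic levels
    pipelined = fmax_mhz(1, net)    # one logic level per stage
    gain = 100.0 * (pipelined / original - 1.0)
    print(f"net = {net} ns: {original:.1f} MHz -> {pipelined:.1f} MHz (+{gain:.0f}%)")
```

Because the flip-flop overhead is paid once per stage regardless of logic depth, the pipelined fMAX stays below twice the original, and the gain grows as routing delay takes up a larger share of the period.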

Pipelining Questions
- Given the original circuit, what is wrong with the pipelined circuit?
- How can the problem be corrected?
[Figure: Original Circuit and Pipelined Circuit]

This simple circuit is a multiplexed adder where the SELECT signal determines whether the output is (A + B) or (C + D). The designer decided to add a pipeline stage to the circuit to improve performance, but something is not quite right. What is the correct way to pipeline this circuit?

Pipelining Answers
- What is wrong with the pipelined circuit?
  - Latency mismatch
    - Older data is mixed with newer data
    - Circuit output is incorrect
- How can the problem be corrected?
  - Add a flip-flop on SELECT
    - All data inputs now experience the same amount of latency

When the designer simulated the circuit on the previous page, the designer saw that the MUX was not selecting the correct sum. The designer forgot to add a pipeline stage to the SELECT input. This concept of matching latency in pipelines can easily be overlooked, especially when you have multiple pipelined circuits in your design. If the outputs of two different pipelines need to be combined, you must make sure that the pipelines have the same amount of latency.
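
The latency mismatch can be reproduced with a small behavioral sketch. The function name mux_adder, the sample values, and the convention that SELECT = 1 picks (C + D) are assumptions for illustration; the structure (registered sums feeding a mux) follows the description in the notes.

```python
def mux_adder(samples, register_select):
    """Pipelined multiplexed adder: output is (A + B) or (C + D).

    samples         -- list of (a, b, c, d, select) tuples, one per clock;
                       select == 0 picks A + B, select == 1 picks C + D
    register_select -- True adds a pipeline flip-flop on SELECT
    Returns the mux output observed during each clock period that follows
    a captured sample.
    """
    outputs = []
    regs = None  # (a + b, c + d, select) captured at the previous clock edge
    for a, b, c, d, select in samples:
        if regs is not None:
            sum_ab, sum_cd, sel_reg = regs
            # Without the extra flip-flop the mux sees the CURRENT select,
            # which belongs to newer data than the registered sums.
            sel = sel_reg if register_select else select
            outputs.append(sum_cd if sel else sum_ab)
        regs = (a + b, c + d, select)
    return outputs


data = [(1, 2, 10, 20, 0), (3, 4, 30, 40, 1), (5, 6, 50, 60, 0)]
print(mux_adder(data, register_select=False))  # sums paired with newer SELECT
print(mux_adder(data, register_select=True))   # sums paired with their own SELECT
```

With SELECT registered, each sum is chosen by the SELECT value that arrived with its operands; without it, the mux picks the wrong sum whenever SELECT changes.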

I/O Flip-Flop Overview
- Each Virtex-II I/O Block contains six flip-flops
- For single data rate:
  - IFD on the input
  - OFD on the output
  - ENBFF on the three-state enable (not shown)
[Figure: OFD (D = data from FPGA, CE = output clock enable, Clock, Q to the I/O pad) and IFD (D from the I/O pad, CE = input clock enable, Clock, Q = data to FPGA)]

In Virtex devices all three flip-flops share the same clock signal but have separate clock enables. The XC4000 and Spartan families have only the IFD and OFD flip-flops. The flip-flops share the same clock enable but have separate clocks.

Benefits of Using I/O Flip-Flops
- Guaranteed setup, hold, and clock-to-out times
  - When the clock signal comes from a BUFG
- Programmable delay on input flip-flop
- Programmable slew rate on output flip-flop
[Figure: OFD (D = data from FPGA, CE = output clock enable, Clock, Q to the I/O pad) and IFD (D from the I/O pad, CE = input clock enable, Clock, Q = data to FPGA)]

Programmable Input Delay
- Delay is added to the D input of the IFD or ILD
  - Guarantees 0-ns hold time when using a global buffer
  - Trades an increase in setup time for a decrease in hold time
- Delay is controlled by attributes
  - NONE is the default if the clk input of the IFD/ILD is driven by a DCM clk output
  - IFD is the default if the clk input of the IFD/ILD is not driven by a DCM clk output
[Figure: IOB input path into an IFD (D, CE, Q) clocked through a BUFG, with a selectable delay element. NONE = no delay; IFD = use delay when an input flip-flop or latch is used in the IOB]

DLL clock outputs include: clk0, clk90, clk180, clk270, clk2x, clk2x180, and clkdv. The IOBDELAY attribute is used for Virtex/II/Pro and Spartan-II/IIE. Virtex-E uses the NODELAY attribute, which is similar for applications using the IOB input register.

Programmable Input Delay Example
XC2V250-5 (taken from timing analysis report with v1.90 Virtex-II speed files)
- Clk = DCM output (default is NONE)
  - IFD delay: tSU = 4.0 ns, tH = 0.0 ns
  - NONE: tSU = 1.3 ns, tH = 0.0 ns
- Clk = NOT a DCM output (default is IFD delay)
  - IFD delay: tSU = 1.2 ns, tH = 0.0 ns
  - NONE: tSU = -1.5 ns, tH = 1.7 ns
[Figure: IFD (D, CE, Q) clocked through a BUFG with a selectable delay (IFD delay or NONE); second drawing: IBUF path with a selectable delay (NONE or IBUF)]

In Virtex/II/Pro and Spartan-II/IIE, a delay can also be specified for an IOB input that does NOT use the IFD/ILD. This delay is applied to the IBUF input, as the second drawing shows. Virtex-E does not allow this delay to be applied to the IBUF path. For that IOB circuit, IOBDELAY = NONE or IBUF. Sample values for the pad to I output of the IOB:
- NONE delay = 0.8 ns
- IBUF delay = 3.5 ns

Programmable Output Slew Rate
- Slew rate can be SLOW or FAST
  - By default the slew rate is SLOW
- Virtex/E/II/Pro: Only the LVTTL I/O standard supports programmable slew rate
- Slew rate is controlled by attributes
  - Included in the input design, or specified in the Constraints Editor

The FAST slew rate saves you between 1.5 and 3.0 ns on output delay times, depending on the target device family and speed grade. See the "IOB Output Switching Characteristics" table in the device data sheet for exact numbers. Using the FAST slew rate option will generally have little effect for purely resistive loads, but for capacitive loads it will reduce the delays. The most accurate delay estimates come from simulating the propagation delays with IBIS models.

How to Access I/O Flip-Flops
- During synthesis
  - Based on the input/output timing requirements, the synthesis tool will move a flip-flop into the IOB
  - The designer may specify which registers should be placed in the IOB
  - The synthesis tool will specify an IOB=True attribute on a given register in the netlist for both cases above

Most synthesis tools can infer I/O flip-flop primitives for the XC4000 and Spartan families but not for Virtex and derivatives. Refer to the documentation for your synthesis tool or the Xilinx Answers Database for more information. You can also instantiate IOB register macros for Virtex-II (IFD, OFD, etc.).

How to Access I/O Flip-Flops
- During implementation
  - The mapper can move eligible flip-flops into the I/O blocks
    - In the Implementation Options dialog, select the Optimize and Map tab
    - Pack I/O Registers/Latches into IOBs for (-pr): [Off | Inputs Only | Outputs Only | Inputs and Outputs]
    - The -Timing option will map registers into the IOB for timing-critical paths
  - Using the Xilinx Constraint Editor's Misc. tab
    - Specify individual registers that should be placed in the IOB

IOB Flip-Flop Requirements
- There are some restrictions to accessing the I/O flip-flops:
  - Input and output registers cannot have combinatorial logic between pad and register
  - For three-state I/O, all registers must share the same reset
    - Can have separate clocks and clock enable
  - For three-state flip-flop, output must be active-low
    - The input to the three-state flip-flop can be inverted, but there is no inverter between the flip-flop and the three-state buffer

Synchronization Circuits
- What is a synchronization circuit?
  - Captures an asynchronous input signal and outputs it onto a clock edge
- Why do I need synchronization circuits?
  - Prevent setup and hold time violations
  - End result is a more reliable design
- When do I need synchronization circuits?
  - Signals cross between unrelated clock domains
  - Chip inputs that are fully asynchronous (not controlled by any clock)

For clock domains that have a clearly defined and constant phase relationship, proper timing constraints ensure that there are no setup or hold time violations. Refer to the Timing Constraints II module for more information on constraining paths between clock domains.

Setup and Hold Time Violations
- Violations occur when the flip-flop input changes too close to a clock edge
- Three possible results:
  - Flip-flop clocks in old data value
  - Flip-flop clocks in new data value
  - Flip-flop output becomes metastable

Metastability
- What is it?
  - Flip-flop output enters a transitory state
    - Neither a valid zero nor a valid one logic state
    - The value may be interpreted as a zero by some loads and as a one by others
  - The output remains in this state for an unpredictable length of time before settling to a valid zero or one
    - Only statistical analysis of metastability is possible
- When does it occur?
  - Result of a transition on the input during the tiny timing window where the flip-flop accepts new data
    - Even smaller than the setup-hold-time window

When a signal is at a metastable value, it may be interpreted as a logic zero by some parts of the circuit, and as a logic one by other parts. This inconsistency will never be shown during simulation and can be very hard to track down during board-level testing. When the flip-flop leaves the metastable state, it can go to a logic zero or one. There is no known "correct" state when the flip-flop input changes so close to the clock edge. It is better to have an "incorrect" value propagating through your circuit than to have a metastable value.

Guarding Against Metastability
- Due to its statistical nature, the occurrence of metastable events can only be reduced, not eliminated
- Mean Time Between Failures (MTBF) is exponentially related to the length of time the flip-flop is given to recover
  - A few extra ns of recovery time can dramatically reduce the chances of a metastable event
- The circuits shown in this section allow a full clock cycle for metastable recovery
[Figure: FF1 feeding FF2, both clocked by CLK1]

A formal definition of recovery time is:
<recovery time> = <time before the data is used> - <datapath delay>
Another term students may have heard for recovery time is "slack."
Example: In this circuit, <time before data is used> = <period of CLK1>, and <datapath delay> = <FF1 clock-to-Q + logic delay + FF2 setup time>. If CLK1 has a period of 50 ns, and the datapath delay is 45 ns, then the recovery time of FF1 is 50 ns - 45 ns = 5 ns.

Asynchronous Inputs
- This refers to two types of signals
  - Signals not controlled by any clock
  - Signals controlled by a clock that is not seen by the FPGA
- Two cases to consider
  - Input pulses are always at least one clock period wide
  - Input pulses may be less than one clock period wide

Synchronization Circuit #1
- Use this circuit when input pulses will always be at least one clock period wide
- The "extra" flip-flop guards against metastability
[Figure: asynchronous input -> FF1 -> FF2 -> synchronized signal, with both flip-flops clocked by CLK and the annotation "Guards against metastability"]

This circuit is a simple two-bit shift register.

Sample Waveform
[Figure: Synchronization Circuit #1 (FF1, FF2, "Guards against metastability") with waveforms for CLK, the asynchronous input, and the synchronized signal]

If the input to this circuit changes close to the clock edge, FF1 might go metastable. If this happens, the flip-flop has almost a full clock cycle to settle (minus the routing delay and setup time of FF2) before being clocked into FF2.
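
The structure of Synchronization Circuit #1 can be sketched as a two-bit shift register in software. The function name two_flop_synchronizer and the zero reset values are assumptions; metastability itself is not modeled, so the sketch only shows the sampling and the added latency.

```python
def two_flop_synchronizer(async_samples):
    """Behavioral model of Synchronization Circuit #1.

    async_samples -- value of the asynchronous input at each CLK edge
    Returns the synchronized signal (FF2 output) after each CLK edge.
    """
    ff1 = ff2 = 0
    out = []
    for sample in async_samples:
        # Shift-register update: FF2 takes FF1's pre-edge value, which has
        # had a full clock period to settle.
        ff1, ff2 = sample, ff1
        out.append(ff2)
    return out


print(two_flop_synchronizer([0, 1, 1, 0, 0]))
```

The synchronized signal follows the input one clock later than FF1 does, which is the cost of giving FF1 a clock period to recover.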

Synchronization Circuit #2
- Use this circuit when input pulses may be less than one clock period wide
  - FF1 captures short pulses
[Figure: FF1 (D tied to VCC, clocked by the asynchronous input, with asynchronous CLR driven by an AND gate) -> FF2 -> FF3 -> synchronized signal; FF2 and FF3 are clocked by CLK, with the annotation "Guards against metastability"]

Add a flip-flop that is clocked by the asynchronous input to the front of Circuit #1 and an AND gate to get this circuit. FF1 is a flip-flop with asynchronous clear. The AND gate prevents FF1 from being reset if the input to the circuit is still high. This allows for long input pulses as well as short ones. FF2 and FF3 act in the same way as FF1 and FF2 in Circuit #1.
To avoid using this circuit, which uses the input as a clock signal (probably not on a global buffer), you can use a faster clock signal to synchronize inputs (allowing you to use Circuit #1). Then use a divided version of the same clock signal in the rest of the design. Because the clocks have a fixed phase relationship, you will not need to resynchronize the signals. If you are targeting Virtex, use the DLL's CLK2X output to get a faster clock signal for synchronizing inputs.

Sample Waveform
[Figure: Synchronization Circuit #2 (FF1 with VCC on D and CLR, FF2, FF3, "Guards against metastability") with waveforms for CLK, the asynchronous input, and the synchronized signal]

If multiple short pulses occur on the input within a space of three clock cycles, only the first pulse will be seen by this circuit.
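
A rough event-based sketch of the pulse-capturing circuit follows. The exact AND-gate wiring is not preserved in this transcript, so the model assumes FF1 is cleared when the synchronized signal (FF3) is high and the input is low; the function name pulse_catcher and the event format are also assumptions for illustration.

```python
def pulse_catcher(events):
    """Behavioral model of the short-pulse synchronization circuit.

    events -- sequence of ("in", 0 or 1) input changes and ("clk",) edges,
              listed in time order
    Returns the synchronized signal (FF3 output) after each CLK edge.
    Assumption: the AND gate clears FF1 when FF3 is high and the input is low.
    """
    inp = ff1 = ff2 = ff3 = 0
    out = []
    for event in events:
        if event[0] == "in":
            value = event[1]
            if value == 1 and inp == 0:
                ff1 = 1            # rising input edge clocks VCC into FF1
            inp = value
        else:
            ff2, ff3 = ff1, ff2    # FF2 and FF3 shift on the CLK edge
            out.append(ff3)
        if ff3 and not inp:
            ff1 = 0                # asynchronous clear through the AND gate
    return out


one_pulse = [("clk",), ("in", 1), ("in", 0)] + [("clk",)] * 4
two_pulses = [("clk",), ("in", 1), ("in", 0), ("clk",),
              ("in", 1), ("in", 0)] + [("clk",)] * 4
print(pulse_catcher(one_pulse))
print(pulse_catcher(two_pulses))
```

Under these assumptions a pulse much shorter than the clock period still produces a synchronized output, and a second short pulse that arrives before FF1 is cleared is absorbed into the first.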

Capture Asynchronous Bus
- Leading edge detector
[Figure: Circuit 1. The n-bit bus is registered by Asynchronous Input Clk. FF1 and FF2, clocked by CLK ("Guards against metastability"), synchronize Asynchronous Input Clk, and an AND gate generates a one-shot enable that drives the CE of the register producing the synchronized bus inputs]

For a leading-edge detector (or falling-edge), first the data bus is registered by the asynchronous clock; second, the one-shot enable (synchronized to the "system" clock) signals the internal circuit, through the clock enable, that data has been captured.
Circuit 1: Use this circuit when input pulses will always be at least one clock period wide.
Circuit 2: Use this circuit when input pulses may be less than one clock period wide.
A falling-edge detector can be designed in a couple of different ways:
1. Easiest method: invert Asynchronous Input Clk.
2. a) For Circuit 1: Change the AND gate to have the bubble on top instead of on the bottom.
   b) For Circuit 2: Change the one-shot enable AND gate to have the bubble on top instead of on the bottom, remove the Reset AND gate bubble, and, finally, change VCC to GND.

Capture Asynchronous Bus
- Leading edge detector
[Figure: Circuit 2. The n-bit bus is registered by Asynchronous Input Clk. FF1 (VCC on D, with CLR), FF2, and FF3 ("Guards against metastability") synchronize Asynchronous Input Clk to CLK, and the one-shot enable drives the CE of the register producing the synchronized bus inputs]

Use a FIFO to cross domains
[Figure: Block RAM FIFO. Port A (write port): Data, Wr, and Write Addr, clocked by Asynchronous Input Clk. Port B (read port): Data, Rd, and Read Addr, clocked by System Clk. FIFO flags: Full, AlmostFull, HalfFull, Empty, AlmostEmpty, HalfEmpty]

You must still synchronize FIFO status flags (full, half_full, almost_full, empty, half_empty, almost_empty) based on read and write operations, synchronized to read and write clocks respectively. This can generally be accomplished by synchronizing the slower enable (read/write) to the faster clock using Synchronization Circuit #1. If synchronizing to the slower clock, use Synchronization Circuit #2.


Review Questions
- High fanout is one reason to duplicate a flip-flop. What is another reason?
- Give an example of a time when you do not need to resynchronize a signal that crosses between clock domains

Answers
- High fanout is one reason to duplicate a flip-flop. What is another reason?
  - Loads are divided between multiple locations on the chip
- Give an example of a time when you do not need to resynchronize a signal that crosses between clock domains
  - Well-defined phase relationship between the clocks
  - Example: Clocks are the same frequency, 180º out of phase, net delays will meet timing

Summary
- You can increase circuit performance by
  - Duplicating flip-flops
  - Adding pipeline stages
  - Using I/O flip-flops
- Some tradeoffs
  - Duplicating flip-flops increases circuit area
  - Pipelining introduces latency
  - I/O flip-flop programmable delay trades off setup time and hold time
- Synchronization circuits increase reliability

Where Can I Learn More?
- Online Documentation
  - Virtex-II Data Book: Chapter 2 > Virtex-II Switching Characteristics
  - Virtex-II Data Book: Chapter 3 > Detailed Description > Input/Output Blocks (IOBs)
  - XAPP 094: Metastability Recovery
