1
FPGA and ASIC Technology Comparison
Part 1
2
Curriculum Path for ASIC Design
- FPGA and ASIC Technology Comparison (FREE)
- FPGA vs. ASIC Design Flow (FREE)
- ASIC to FPGA Coding Conversion (FREE)
- Virtex-5 Coding Techniques, Spartan-3 Coding Techniques (FREE)
- Intro to VHDL or Intro to Verilog (3 days)
- Fundamentals of FPGA Design (1 day)
- Designing for Performance (2 days)
- Minimum: 6 months design experience
- Advanced FPGA Implementation (2 days)

Don’t forget to listen to these FREE RELs: FPGA and ASIC Technology Comparison, Part 2; FPGA vs. ASIC Design Flow; ASIC to FPGA Coding Conversion, Part 1 and 2; Virtex-5 Coding Techniques, Part 1 and 2; Spartan-3 Coding Techniques, Part 1 and 2. Fundamentals is an essential course if you are new to FPGA design. I recommend that all customers take this course every 3-5 years, since the tools change every year.
3
Welcome
- If you are an experienced ASIC designer transitioning to FPGAs, this course will help you shorten your learning curve by leveraging your ASIC experience
- Careful attention to how FPGAs differ from ASICs will help you create a fast and reliable FPGA design

FPGAs are not usually described by gate count, as ASICs are. This is because programmable logic is built from LUTs, which use many more gates to build the programmability into the architecture. The delay of each routing resource is fixed, but the tools select an optimum route based on the source and destination of the logic on the FPGA. A route may use multiple lengths of interconnect to carry a signal from a source to a destination, depending on many factors. Performance depends on your HDL coding style and on whether you instantiate dedicated hardware resources. Proper use of the ISE tools, which is covered in Fundamentals, also affects the performance of your design.
4
Objectives After completing this training you will be able to:
- Describe the differences between ASIC and FPGA architectures
- Explain the features of Xilinx FPGA architecture
- Benefit from the Xilinx dedicated resources
5
Contrasting Architectures
- ASIC architecture compared to the Xilinx FPGA architecture
  - Gates versus LUTs
  - Delays
  - Performance
- Fundamental part selection considerations
  - Cost
  - Size
  - Volume
  - Analog circuitry
  - Time to market
  - Reprogrammability
6
Standard Cell (Custom ASICs)
Advantages
- Lowest price for high-volume production (greater than 200K per year)
- Fastest clock frequency (performance)
- Unlimited size
- Integrated analog functions
- Low power
Disadvantages
- Highest non-recurring engineering costs
- Longest design cycle
- Limited vendor IP with high cost
- High cost for engineering change orders
7
Embedded Array
Advantages
- Low price for medium-volume to high-volume production
- Performance only slightly slower than a standard cell
- 50+ million gates
- Custom macro support
- More flexibility than an FPGA
- Low power
Disadvantages
- High non-recurring engineering costs
- Design cycle longer than an FPGA
- Vendor IP has high cost
- Generally digital only
8
Xilinx FPGAs Field-Programmable Gate Arrays
Advantages
- Lowest cost for low-volume to medium-volume production
- No non-recurring engineering costs
- Standard product
- Fastest time to market
- Xilinx has an extensive library of IP
  - Inexpensive compared to ASIC vendors
- Ability to make bug fixes quickly and inexpensively
Disadvantages
- Slower performance
- Size limited to ~25 million system gates
- Digital only
9
Field-Programmable Gate Arrays
- Xilinx FPGAs are made using SRAM
  - Today’s FPGAs use a 65-nm copper CMOS process
- Potential to accommodate 25M system gates
  - Includes RAM and logic gates
- Performance up to 550 MHz
- Integrated synthesis, simulation, and place & route tools
  - PC and UNIX
  - Inexpensive: $2500 or less for the ISE Design Suite
  - Use of third-party tools will increase costs
  - Free ISE WebPACK is available

Because FPGAs are SRAM based, they need an external nonvolatile memory to hold the configuration data (usually a PROM or Flash device). Performance of your chosen device will vary by family and speed grade. The quoted performance is usually obtained with a single logic level (LUT) between two registers. This performance model is very simple, and it is not realistic unless the design has been optimized for the architecture. The ISE Design Suite listed here does not include ISIM, the Xilinx simulator (extra). Xilinx also offers an on-chip logic analysis tool called ChipScope. Evaluation copies are available.
10
Configuration Introduction
- When does configuration happen?
  - On power up
  - On demand
- Why do FPGAs need to be configured?
  - FPGA configuration memory is volatile
  - Configuration data is stored in a PROM or other external data source
- What do you need to know about FPGA configuration?
  - What happens during configuration
  - How to set up various configuration modes and daisy chains

To learn more about configuration, refer to your device data sheet or check out the free REL. A daisy chain is used by customers who have multiple FPGAs configured from a single memory device.
11
Configuration
- Cost of ownership is reduced with the ability to reconfigure the hardware, extending the life of the product
  - Reduces the costly physical deployment of repair technicians
  - Upgrades
  - Bug fixes
  - Adding additional functionality
- Faster time to market
- Partial reconfiguration

Many customers prototype their ASIC design using FPGAs. This lets them try out the design, and if it fails, the FPGA is not wasted.
12
FPGA Configuration Methods
- Xilinx PROMs: Slave/Master Serial, Slave/Master SelectMAP
- Xilinx cables: JTAG, Slave Serial, Slave SelectMAP
- Microprocessor: JTAG, Slave Serial, Slave SelectMAP
- Commodity Flash: Slave SelectMAP, SPI*, BPI*
* SPI and BPI support is available in the newer Virtex™-5 and Spartan™-3E families

Xilinx cables allow you to configure an FPGA with just a computer (prototyping and debugging). Configuration with a microprocessor or microcontroller involves some other control logic managing the configuration of the FPGA from a common memory device (SDRAM or Flash, usually). The Xilinx Platform Flash PROM uses the FPGA’s built-in configuration control logic to manage the programming of the FPGA. The SPI (Serial Peripheral Interface) and BPI (Byte-Wide Peripheral Interface) modes allow users to purchase off-the-shelf commodity flash devices for configuration.
13
Xilinx FPGAs: Five Primary Elements
- Configurable logic blocks
- Dedicated blocks
- Input and output blocks
- Routing
- Clocking resources

There are a variety of dedicated blocks in the FPGA, including PCI, EMACs, Block RAM, DSP slices, and processors. The clocking resources include PLLs, DLLs, and global routing resources (for clock signal transmission). I/O blocks include some SERDES resources for encoding and decoding clock information from input signals.
14
Logic Cells
- Logic cells include combinatorial logic, arithmetic logic, and a register
- Combinatorial logic is implemented using Look-Up Tables (LUTs)
- Register functions can include latches, JK, SR, D, and T-type flip-flops
- Arithmetic logic is a dedicated carry chain for implementing fast arithmetic operations
[Figure: logic cell with a LUT, a carry chain (carry in, carry out), and a D flip-flop with set/reset]

LUTs are made of gates and are used primarily for combinatorial logic. A LUT can also implement a 32-bit memory and emulate a shift register. Any register type can be built from the available flip-flops; however, only D flip-flops exist in the silicon, so implementing another flip-flop type uses extra gates to convert it. Inferring latches is not recommended unless the designer is aware of the caveats. Xilinx recommends that proper synchronous design practices always be used. Synchronous design techniques are covered in the Fundamentals of FPGA Design course. Large arithmetic functions can also be implemented with the DSP slice for best performance.
15
Combinatorial Logic
- LUTs function as a ROM
  - They generate the output value for a given set of inputs
- Constant delay through a LUT
- Limited by the number of inputs and outputs, not by complexity
[Figure: combinatorial logic with inputs A, B, C, D, E, F and output Z mapped into a single LUT]

A LUT can perform any combinatorial function of up to 6 inputs with constant delay. A LUT has one output.
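The "LUT as a ROM" idea can be sketched in a few lines of Python. This is only a behavioral model, not a description of the silicon: the helper names `make_lut` and `lut_read` are made up for illustration, and the model simply stores one output bit per input combination.

```python
def make_lut(func, num_inputs=6):
    """Precompute the truth table for func, like loading LUT configuration bits."""
    if not 1 <= num_inputs <= 6:
        raise ValueError("this sketch models LUTs with 1 to 6 inputs")
    table = []
    for address in range(2 ** num_inputs):
        # Bit i of the address is the value of input i.
        bits = [(address >> i) & 1 for i in range(num_inputs)]
        table.append(func(*bits) & 1)
    return table


def lut_read(table, *inputs):
    """Evaluate the LUT: the inputs form an address, the output is the stored bit."""
    address = sum(bit << i for i, bit in enumerate(inputs))
    return table[address]


# Two very different 6-input functions occupy the same 64-entry table,
# which is why LUT delay does not depend on the complexity of the function.
and6 = make_lut(lambda a, b, c, d, e, f: a & b & c & d & e & f)
parity6 = make_lut(lambda a, b, c, d, e, f: a ^ b ^ c ^ d ^ e ^ f)

print(len(and6), len(parity6))               # 64 64
print(lut_read(and6, 1, 1, 1, 1, 1, 1))      # 1
print(lut_read(parity6, 1, 0, 1, 1, 0, 0))   # 1
```

Both functions cost exactly one table read, which mirrors the slide's point that a LUT is limited by its number of inputs and outputs rather than by the logic it implements.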
16
Wide Input Functions
- For wider input functions, LUTs can be combined using a multiplexer
- These muxes are dedicated, so they are fast
[Figure: several LUTs feeding a dedicated MUX]

Dedicated muxes save designers from wasting LUTs on this function and improve the speed of large muxes.
17
LUT-Based Memory
- Can store 64 bits of memory as either a RAM or a ROM
  - Fundamentally, the LUT is a ROM
  - Can become RAM with activation of the configuration write strobe
- Combine multiple LUTs for larger memories, larger in both depth and width
  - 128 x 8 is not uncommon
- A 6-input LUT contains two 5-input LUTs, which adds more flexibility

While LUT-based memory is not as commonly used, it is still designed into the device. Many customers use the shift-register LUT implementation to delay a pipeline stage. Pipelining is a good way to improve the speed of long combinatorial paths. This is covered in the Designing for Performance course.
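As a rough sizing aid for the "combine multiple LUTs" bullet, the sketch below counts how many 64-bit LUTs the storage of a small memory would occupy. The function name `luts_for_memory` is hypothetical, and the estimate assumes the output selection between LUTs is absorbed by dedicated muxes, so it covers storage only.

```python
import math


def luts_for_memory(depth, width, bits_per_lut=64):
    """Estimate the LUTs needed to store a depth x width memory.

    Each LUT holds bits_per_lut x 1 bits, so one column of LUTs is needed per
    data bit and one row per bits_per_lut words of depth.
    """
    luts_deep = math.ceil(depth / bits_per_lut)
    return luts_deep * width


print(luts_for_memory(128, 8))  # 16
```

For the 128 x 8 case mentioned on the slide, the storage alone works out to 16 LUTs under these assumptions.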
18
Carry Logic
- The carry logic chain is dedicated logic that computes high-speed arithmetic logic functions
- The carry chain generally consists of a multiplexer and an XOR gate
  - The LUT computes the multiplexer selector
  - The multiplexer determines the carry-out
  - The XOR gate computes the addition

Carry logic is easily inferred by all synthesis tools. Synthesis tools look for an arithmetic operator to target this resource. This resource improves the speed of almost all arithmetic functions.
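The LUT, multiplexer, and XOR roles described above can be modeled bit by bit. The following Python sketch is a behavioral illustration of one common way such a chain implements addition (the LUT output is taken to be a XOR b); the function name `carry_chain_add` and the signal names are assumptions, not Xilinx primitives.

```python
def carry_chain_add(a, b, width=8, carry_in=0):
    """Add two unsigned integers using a LUT + mux + XOR model of a carry chain."""
    carry = carry_in
    total = 0
    for i in range(width):
        a_bit = (a >> i) & 1
        b_bit = (b >> i) & 1
        # LUT: computes the multiplexer selector (propagate when the bits differ).
        propagate = a_bit ^ b_bit
        # XOR gate: computes the sum bit from the selector and the incoming carry.
        sum_bit = propagate ^ carry
        # Multiplexer: passes the incoming carry along, or generates one from a_bit.
        carry = carry if propagate else a_bit
        total |= sum_bit << i
    return total, carry


print(carry_chain_add(200, 100))  # (44, 1), that is 300 = 256 + 44
```

Only the mux sits on the carry path from one bit to the next, which is why a dedicated chain is faster than building the same adder from general LUTs and routing.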
19
Memory Blocks
- Support single- and dual-port synchronous operations
  - In dual-port mode, these RAM blocks support fully independent ports for both reading and writing
- Each RAM block can be configured for 36 kb
  - Can be used as 2 independent 18-kb RAMs
  - Dedicated cascade logic allows 2 RAM blocks to be configured as 72 kb
- Blocks of memory are generally spread out across the die
- Dedicated FIFO logic enables each RAM to be configured as a FIFO

Every device has at least some Block RAM resources. Each Block RAM can be used as 2 independent 18-kb Block RAMs if your memory is small. It is important that you try to take advantage of as many of these dedicated resources (Block RAM, etc.) as possible to ensure optimum system speed. The easiest way to add this resource is with the Architecture Wizard utility (covered in Fundamentals).
20
Block RAM Configurations
- Configurations available on each port
- Independent configurations on ports A and B, read and write
  - Supports data-width conversion, including parity bits

Configuration   Depth   Data Bits   Parity Bits
32k x 1         32K     1           0
16k x 2         16K     2           0
8k x 4          8K      4           0
4k x 9          4K      8           1
2k x 18         2K      16          2
1k x 36         1K      32          4

[Figure: data-width conversion, with 8-bit data written in on port A and 32-bit data read out on port B]

Each Block RAM is configurable as a dual-port Block RAM (2 input and 2 output ports). This memory space can be accessed as one large memory or two separate memories.
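The aspect ratios in the configuration table all describe the same storage, which a short Python check makes visible. This is a sketch under the assumption that parity bits appear only in the x9, x18, and x36 widths (one parity bit per nine bits of width); the names `split_width` and `writes_per_read` are illustrative.

```python
# (depth in words, port width in bits) for each configuration in the table.
BLOCK_RAM_CONFIGS = [
    (32 * 1024, 1),
    (16 * 1024, 2),
    (8 * 1024, 4),
    (4 * 1024, 9),
    (2 * 1024, 18),
    (1 * 1024, 36),
]


def split_width(width):
    """Split a port width into (data bits, parity bits)."""
    parity_bits = width // 9  # 0 for x1/x2/x4, then 1, 2, 4 for x9/x18/x36
    return width - parity_bits, parity_bits


def writes_per_read(write_width, read_width):
    """Narrow-port writes needed to fill one wide-port word (data-width conversion)."""
    if read_width % write_width:
        raise ValueError("widths must be integer multiples of each other")
    return read_width // write_width


for depth, width in BLOCK_RAM_CONFIGS:
    data_bits, parity_bits = split_width(width)
    print(f"{depth // 1024}k x {width}: "
          f"{depth * data_bits // 1024} kb data, "
          f"{depth * parity_bits // 1024} kb parity")

print(writes_per_read(8, 32))  # 4
```

Every row holds 32 kb of data, and the three widest rows add 4 kb of parity to reach the full 36 kb. The last line corresponds to the figure's example of writing 8-bit data on port A and reading 32-bit words on port B.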
21
IOB Element
- Input path
  - Two DDR registers
- Output path
  - Two 3-state enable DDR registers
- Each path can be combinatorial or registered
- Separate clocks and clock enables for I and O
- Set and reset signals are shared

There is also a tri-state buffer associated with each IOB (not shown). Using these registers is the fastest and best way to bring data into and out of the FPGA. The setup and hold times on these registers are programmable. If your design does not use these registers (you bypass them), they will be wasted.
22
IOB Element
- Default I/O standard varies by family
  - Fast and slow slew rate
  - Programmable drive strength
- Other I/O standards
- Built-in SERDES functionality
  - ISERDES divides input data by up to 10
  - OSERDES multiplies output data by up to 10

This SERDES functionality is not available with every IOB on the FPGA. Use of the SERDES resources will require optimizing the design through instantiation. Every FPGA defaults to slow slew rate. Fast slew rate should be used carefully, since it makes the device more susceptible to ground bounce. Over 40 I/O standards are supported with the newest FPGAs. Also note that the newest devices have additional registers to support DDR.
23
DSP Slice
- 25x18 multiply
- ALU mode
- Dedicated A cascading
- Pattern detection
- Independent C input

This resource is, of course, ideal for DSP applications. However, it is also very useful for large arithmetic functions like multipliers, accumulators, etc. The DSP slice also has a dedicated carry chain that connects to the neighboring slice. This enables even larger arithmetic functions. The easiest way to instantiate this resource is to use the Core Generator.
24
Routing
- A combination of programmable and dedicated routing lines
- Global clocks with predefined clock tree
- Regional clocks and IO clocks
- Global low-skew routing resources for other high-fanout signals
- Carry chain routing
- Dedicated routing among other dedicated resources
- General interconnect
  - Routing of local signals between CLBs and IOBs

In general, the routing of signals is invisible to the user. Xilinx FPGAs are designed to use interconnect and switch matrices to connect signals from a source to any destination. Besides general interconnect, there are low-skew routing resources for local high-speed signals, which are often useful in SERDES applications. There are also high-fanout routing resources designed for distributing high-fanout control signals, such as clocks, CE, sets, and resets.
25
Clock Management
- Dedicated clock trees are pre-optimized clock networks that balance the skew and minimize delay
  - Virtex-5 FPGA has 32 separate clock networks
  - Spartan-3 FPGA has 8 separate clock networks
- Each can be configured for a built-in clock enable (BUFGCE) or switching clock sources (BUFGMUX)
- Local clock routing includes regional (BUFR) and SERDES (BUFIO)

The global routing resources are often called dedicated clock trees. The easiest way to add a global routing resource is with the Clocking Wizard, which most users employ to customize their DCMs and PLLs. Virtex-5 also allows the extra clock resources to be used for high-fanout secondary control signals.
26
Clock Management
- Clock Management Tile (CMT) contains a PLL and Digital Clock Managers
- Digital Clock Manager (DCM) consists of
  - Digital Delay Locked Loop (DLL)
  - Digital Frequency Synthesis (DFS)
  - Digital Phase Shifter (DPS)

PLLs are typically used for reducing clock jitter and for some generation of clock frequencies. DCMs are typically used for generating clock frequencies, correcting clock duty cycles, and phase shifting clocks. There are up to 6 Clock Management Tiles (CMT) per device. There is a large amount of programmable functionality associated with the DCMs and PLLs that is not shown here.
27
I/O Translators
- Programmable input and output thresholds
- Supported standards include LVCMOS (several classes), LVPECL, HSTL (several classes), SSTL (several classes), PCI, PCI-X, LVDS (several classes), GTL, GTL+, and HyperTransport™ (LDT) technology
  - Supported standards vary; check your data sheet
- Different I/O standards require a separate input and output reference voltage for each bank supporting a separate I/O standard
  - Generally, each bank can support several standards, as long as they share the same VREF (input) or VCCO (output)

For a complete list of the I/O standards supported by your device, refer to your device data sheet. The newest FPGAs support over 40 I/O standards, including several differential standards, but this will vary by device family.
28
Dedicated and Special Resources
- Clock management (CMT)
  - DCM and PLL
  - Dedicated clock trees (not shown)
- Test logic
  - Built-in JTAG
- I/O translators
  - Supporting many different thresholds
- Other resources
  - Dual-Data Rate (DDR) registers in IOB
  - SERDES resources
- Dedicated cores
  - Block RAM
  - DSP slices
  - Gigabit transceivers, MGTs (all devices)
  - Tri-mode Ethernet MAC (all devices)
  - PCI Express® core (all devices)
- Additional FXT cores
  - PowerPC® 440 processors (not shown)
  - Faster GTX transceiver (not shown)

This slide is designed to give you an idea of the other dedicated resources that exist in the silicon. Note that it shows all of the possible dedicated resources for Virtex-5; the available resources and their quantity will vary by family and density.
29
Other Resources
- Embedded processor cores
  - 32-bit PowerPC 440 processor core (hard)
  - MicroBlaze processor core (soft)
- Digitally controlled termination resistance (DCI)

The Virtex-II Pro and Virtex-4 device families support the PPC 405 processor. MicroBlaze can be used in any device family and density, as long as there are sufficient resources available. It is a simple microprocessor and does not have all the functionality of a high-end processor core like the PPC. DCI saves users from having to place many termination resistors by replicating the impedance on each IOB in each I/O bank.
30
Summary
- FPGA flexibility
  - Reconfigurable logic
  - Time to market
  - Lowest “cost of change”
- Xilinx combinatorial resources use flexible LUTs
- Xilinx slices also contain registers, carry logic, clocking resources, and dedicated muxes to improve the performance for all applications
- Xilinx FPGAs have dedicated resources for DSP, RAM, PCI, EMAC, and I/O that make these critical paths equivalent to a custom ASIC
31
Where Can I Learn More?
- Xilinx online documents
  - Software manuals
  - Data sheets
  - Application notes
  - User guides
- Xilinx Education Services courses
  - Xilinx tools and architecture courses
  - Hardware description language courses
- Free videos
32
FPGA and ASIC Technology Comparison
Part 2
35
Objectives After completing this training you will be able to:
- Describe how a simple logic implementation can differ between ASICs and FPGAs
- Recognize gate counts as an estimation of design size
- Explain some of the FPGA design practices you must follow to get peak performance in your FPGA
36
Gate Comparison
- In retargeting HDL code for an ASIC design to an FPGA, gate conversion is rarely one to one
  - A 0.13-µ standard cell can have up to 100K gates per mm²
  - A Virtex®-5 FPGA has about 20K usable gates per mm²
- Why the difference? Xilinx has programmable logic in addition to the functional logic
  - Routing
  - Multiplexers
  - Configuration memory registers
- This means built-in design flexibility!
37
Gate Translation
- Separate out logic, flip-flops, RAM, cores, and I/O
  - Partition cores into logic and RAM
- Assume 6 to 24 gates per LUT (depending on the number of inputs used)
- RAM bits are equivalent
- Up to 100 ASIC gates per I/O; translate to IOBs
- 7 gates per register
- So what design strategy do you think you need to use?
  - To get the most out of the FPGA, try to use as many features as possible, especially the FPGA’s dedicated hardware
38
Example
ASIC
- 250K logic gates
- Four 32-kb blocks of RAM
- 243 pads, including power and ground
FPGA
- 20,800 to 41,600 LUTs
- Equivalent RAM bits
- Equivalent number of pins
Depending on the number of LUTs needed, this design could use a Virtex-5 LX30, LX50, or LX85 FPGA

In this example, LUTs are counted as between 6 and 12 gates each (based on the number of inputs and functionality). Block RAM bits are equivalent. You must have the same number of pins even though you will use extra gates. This technique does not factor in meeting your timing objectives. Roughly, if you need high performance, plan on using 80% of the device or less. If you are operating at low speed (150 MHz or less), you may be able to pack more logic into the FPGA. Timing estimation is also an inexact science.
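The LUT range in this example follows from dividing the ASIC gate count by the 6 to 12 gates per LUT quoted in the notes. A minimal Python sketch of that arithmetic, with the 80% utilization guideline applied afterwards, is shown here; the function names are made up for illustration and the result is only as rough as the gate-counting method itself.

```python
def lut_estimate(logic_gates, gates_per_lut_low=6, gates_per_lut_high=12):
    """Return (fewest, most) LUTs for a given ASIC logic gate count."""
    fewest = logic_gates // gates_per_lut_high  # every LUT replaces many gates
    most = logic_gates // gates_per_lut_low     # every LUT replaces few gates
    return fewest, most


def device_luts_needed(luts, utilization_percent=80):
    """LUT capacity to look for if only utilization_percent of the device is used."""
    return -(-luts * 100 // utilization_percent)  # ceiling division


fewest, most = lut_estimate(250_000)
print(fewest, most)  # 20833 41666
print(device_luts_needed(fewest), device_luts_needed(most))  # 26042 52083
```

The first line reproduces the slide's 20,800 to 41,600 range before rounding. The second line shows how much the target device grows once headroom is left for meeting timing.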
39
Gate counts are influenced by
- Coding style
- Metal layers
- Process geometry
- Library quality
- Placement and routing algorithms
- Core contents (RAM versus gates)
- I/O requirements
- Special features
CONCLUSION: All estimates are rough estimates when it comes to gate counting. You will have to optimize your design. More on this later…

Any ASIC-to-FPGA gate counting method is only a rough estimate. Taking ASIC code directly to an FPGA will not utilize the dedicated resources of the FPGA.
40
AND Gate Example
8-input AND gate

VHDL, for vec(7 downto 0):
    and_out <= vec(0) AND vec(1) AND vec(2) AND vec(3) AND
               vec(4) AND vec(5) AND vec(6) AND vec(7);

Verilog, for vec[7:0]:
    assign and_out = &vec;

Simple 8-input AND gate.
41
ASIC Implementation
- 8-input AND gate
- Two four-input NAND gates feeding a two-input NOR gate
- Approximate gate count = 14
- Approximate delay in a standard-cell ASIC with 0.13-µ process = 0.47 ns
- Beware of ASIC libraries with very wide gate types!

Note that the ASIC library quality will dictate how many gates are needed. It is very important to understand that high fan-in combinatorial functions built for an ASIC must be optimized for the FPGA. Specifically, they will need to be re-synthesized and probably pipelined. For example, if this were a 32-input AND gate, the ASIC version would be fine, but the FPGA version would have to fit into 4-input or 6-input LUTs and thus would require several LUTs in series.
42
Xilinx Implementation
- 8-input AND gate implemented in three 4-input LUTs and two logic levels
- Approximate max delay in a Spartan®-3 FPGA = ns
  - Approximate gate count = 18 gates
- Approximate max delay in a Virtex-5 FPGA = ns
  - Approximate gate count = 18 gates

This is an optimum utilization of the 4-input LUT, and is usually what synthesis tools will do. When this was targeted to Virtex-5 (6-input LUT), it synthesized to the same result. This is correct.
43
Question
- How many 4-input LUTs would be required to implement a 32-input OR gate?
- How many logic levels would they generate?
44
Answer
- How many 4-input LUTs would be required to implement a 32-input OR gate? 11
- How many logic levels would they generate? 3
- If net delays are ~0.3 ns and LUT delays are ~0.2 ns, then the total delay would be 2(0.3) + 3(0.2) ≈ 1.2 ns in a Spartan®-3 FPGA
- How do you think this would be implemented in Virtex-5 with a 6-input LUT? (Answer: 7 LUTs and 2 logic levels)
[Figure: tree of LUTs feeding a single output LUT]

This is an optimum utilization of the 4-input LUT, and is usually what synthesis tools will do. You can now see that if you have, for example, four 32-input OR gates feeding a 4-input OR gate, it is going to generate a very long delay. This kind of path would need to be pipelined. Beware of ASIC libraries that have very wide gate types.
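The counts in this answer can be reproduced with a small calculation. The Python sketch below is an estimate for a single wide AND/OR gate only: it assumes each k-input LUT merges up to k signals into one, and it reuses the slide's ~0.2 ns LUT and ~0.3 ns net delays. The names `lut_tree` and `path_delay_ns` are hypothetical.

```python
def lut_tree(num_inputs, lut_size):
    """Return (LUT count, logic levels) for one wide AND/OR gate built from LUTs."""
    if num_inputs < 2 or lut_size < 2:
        raise ValueError("need at least 2 inputs and a LUT with at least 2 inputs")
    # Each LUT replaces up to lut_size signals with one, removing lut_size - 1.
    luts = -(-(num_inputs - 1) // (lut_size - 1))  # ceiling division
    # Levels: the smallest tree depth whose total fan-in covers every input.
    levels, reach = 0, 1
    while reach < num_inputs:
        reach *= lut_size
        levels += 1
    return luts, levels


def path_delay_ns(levels, lut_ns=0.2, net_ns=0.3):
    """Delay through `levels` LUTs joined by levels - 1 nets, as on the slide."""
    return levels * lut_ns + (levels - 1) * net_ns


print(lut_tree(32, 4))                    # (11, 3)
print(lut_tree(32, 6))                    # (7, 2)
print(round(path_delay_ns(3), 2), "ns")   # 1.2 ns
```

The same function gives three 4-input LUTs in two levels for the 8-input AND gate shown earlier. A real synthesis result can differ when the logic is shared with neighboring functions.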
45
Tri-State Busses
- Some ASIC designs have large tri-state busses
  - There are no tri-state buffers associated with each slice in the newest FPGAs
  - These busses will have to be re-synthesized and mapped to LUTs and the F7 and F8 dedicated muxes
  - You may need to code these with a CASE statement and a high-Z output
- The F7 can implement an 8-to-1 mux
- The F8 can implement a 16-to-1 mux

Most customers will just re-synthesize their tri-state bus and use their schematic viewer to verify that the dedicated multiplexers were inferred. Check your synthesis tool’s solutions database to see whether you need an attribute, a case statement, or a high-Z output.
46
Registered AND gate

VHDL:
    process (clk)
    begin
      if rising_edge(clk) then
        vec_q <= vec;
        and_out <= vec_q(0) AND vec_q(1) AND vec_q(2) AND vec_q(3) AND
                   vec_q(4) AND vec_q(5) AND vec_q(6) AND vec_q(7);
      end if;
    end process;

Verilog:
    always @(posedge clk)
    begin
      vec_q <= vec;
      and_out <= &vec_q;
    end

Now we have inferred input and output registers for the 8-input AND gate.
47
Performance Comparison
- A comparison of the achieved performance for the registered 8-input AND gate
  - Virtex-5 FPGA: ~550 MHz, ~88 gates
  - 0.13-µ standard cell ASIC: ~850 MHz, ~77 gates
- Typical high-performance frequencies (no optimization for the FPGA)
  - ~275 MHz for four levels of LUT (combinatorial) logic
  - ~550 MHz for equivalent logic
- Don’t forget to optimize your HDL code!

Note that the typical system speed of a non-optimized design in the newest FPGAs is about half of the speed in an ASIC. If you optimize the HDL and the design for the FPGA, you will get a lot closer to ASIC performance.
48
ASIC versus FPGA
- Combinatorial logic implemented in an ASIC is typically faster than in an FPGA implementation
  - The fine-grain architecture of an ASIC allows wider input functions to be implemented with significantly less delay
  - ASICs have a dedicated routing structure rather than a programmable routing structure
- Critical paths typically include I/O, RAM, PCI™ technology, EMAC, and DSP resources
  - Xilinx has dedicated FPGA resources to implement these functions, making these paths equivalent to an ASIC implementation
  - Remember: Xilinx Virtex-5 devices are cutting-edge ASICs
- Don’t forget to include Xilinx-dedicated resources in your design!

Instantiate dedicated resources with the Core Generator and the Architecture Wizard. This is taught in the Fundamentals of FPGA Design course.
49
Pipelining
[Figure: two flip-flops separated by two logic levels, fMAX = n MHz; with a flip-flop inserted so that there is one logic level between flip-flops, fMAX ≈ 2n MHz]

Inserting flip-flops into a datapath is called pipelining. Pipelining increases performance by reducing the number of logic levels (LUTs) between flip-flops. All Xilinx FPGA device families support pipelining. Adding a pipeline stage, as shown in this example, will not exactly double fMAX. The added flip-flop has an input setup time and a clock-to-Q time, so the pipelined circuit runs at less than double the original frequency. Also be aware that this method increases your register usage and requires you to balance your system latency. Pipelining is not as necessary for designs targeting Virtex-5, because it has a 6-input LUT architecture.
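Why the pipelined path falls short of 2n MHz can be shown with a simple timing budget. The Python sketch below assumes one net plus one LUT per logic level and uses purely illustrative delay values: the 0.2 ns LUT and 0.3 ns net figures are borrowed from the earlier OR gate answer, while the 0.4 ns clock-to-Q and 0.3 ns setup times are made-up placeholders, not data sheet numbers.

```python
def fmax_mhz(logic_levels, lut_ns=0.2, net_ns=0.3, clk_to_q_ns=0.4, setup_ns=0.3):
    """Estimate fMAX for a register-to-register path with the given logic levels."""
    # Minimum clock period: leave the source flip-flop, cross each level of
    # routing and LUT, then meet setup at the destination flip-flop.
    period_ns = clk_to_q_ns + logic_levels * (net_ns + lut_ns) + setup_ns
    return 1000.0 / period_ns


two_levels = fmax_mhz(2)
one_level = fmax_mhz(1)
print(f"two logic levels: {two_levels:.0f} MHz")
print(f"one logic level:  {one_level:.0f} MHz")
print(f"speed-up: {one_level / two_levels:.2f}x")
```

Because the clock-to-Q and setup times are paid once per stage no matter how little logic sits between the flip-flops, halving the logic levels gives a speed-up that stays below 2x for any nonzero flip-flop overhead.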
50
Sequential Design
How do you get high performance from an FPGA?
- Pipelining
  - For large combinatorial paths, additional registers may need to be inferred to break up the paths and increase performance
  - This technique increases the size of the design
  - This is not as likely to be needed for Virtex-5 FPGA designs because the Virtex-5 FPGA has a 6-input LUT
- Evaluate the number of logic levels your design has by generating a timing report from the ISE® Design Suite or your synthesis tool
  - Usually the registers are added at a hierarchical boundary
- Don’t forget to evaluate the number of logic levels for your timing-critical paths!

Evaluate your timing-critical data paths with your synthesis tool’s schematic viewer and timing analyzer. You can use the XST schematic viewer and the ISE Design Suite’s Timing Analyzer.
51
Timing Constraints
How do you get high performance from an FPGA?
- Timing constraints communicate the performance goals to the implementation tools
  - Global timing constraints constrain virtually all the paths in your design based on your system frequency, input, and output times (PERIOD, OFFSET IN, OFFSET OUT)
  - Path-specific timing constraints need to be added to constrain multi-cycle paths and false paths
- Adding timing constraints is essential if you want good system speed!

Timing constraints are taught in the Fundamentals of FPGA Design and Designing for Performance courses.
52
Coding Style
How do you get high performance out of an FPGA?
- Coding style has a large impact on performance
  - Because FPGA combinatorial and routing resources are inherently slower than their ASIC equivalents, the HDL coding style needs to be adapted to the FPGA
  - Write your code to limit the number of logic levels inferred
- Learn about proper HDL coding styles by listening to the REL modules
- Don’t waste time! Evaluate your HDL!
53
Synchronous Design
How do you get reliability out of an FPGA?
- Always build a synchronous design
  - Asynchronous circuits are less reliable
  - Lot variations exist for all FPGAs, which means that your design has to be able to work on faster devices
- Timing constraints cannot fix asynchronous design problems; only you can
- Never sacrifice design reliability. Remove all asynchronous design practices.

Asynchronous design will usually fail with ALL FPGAs. This topic is taught in the Fundamentals of FPGA Design course. Lot variations occur frequently. You won’t be able to tell from the packaging what speed grade device you actually have. We tell our customers that the device MAY be faster than its marked speed grade (but it is not guaranteed). So build your design to work synchronously so that your system still works if it gets a faster FPGA.
54
Synchronous Design Methodology
Use one clock (or at least as few as possible).
Use one edge (all flip-flops use the rising edge, or all use the falling edge).
Use D-type flip-flops.
Register the outputs of each behavioral block.
In place of multiple clocks, use clock enables.
Synchronize asynchronous signals to the “single” clock (synchronization circuits).
Do NOT create gated, derived, or divided clocks.
Do NOT create local asynchronous set/reset.
Avoid global asynchronous set/reset.
This content is covered in detail in the Fundamentals of FPGA Design course. I, personally, have had to fix customer designs that had built in the last three problems. Get it right the first time!
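The clock-enable idea can be illustrated with a small cycle-by-cycle behavioral model. This is a sketch, not HDL: `simulate_clock_enable` is a hypothetical function, and each loop iteration stands for one rising edge of the single system clock. Instead of dividing the clock, a counter raises an enable once every `divide_by` cycles, and the register captures data only on those edges.

```python
def simulate_clock_enable(data_in, divide_by):
    """Model a register with a clock enable on a single clock.

    Returns the register output after each rising clock edge.
    """
    count = 0    # enable counter state, clocked by the same clock
    q = 0        # register state (assumed to start at 0)
    trace = []
    for d in data_in:                       # one iteration per clock edge
        enable = (count == divide_by - 1)   # decoded from the current count
        if enable:
            q = d                           # capture only when enabled
        count = 0 if enable else count + 1
        trace.append(q)
    return trace


if __name__ == "__main__":
    print(simulate_clock_enable([1, 2, 3, 4, 5, 6, 7, 8], divide_by=4))
```

With `divide_by=4`, the register updates on every fourth edge. Every flip-flop still sees the same clock, so no second clock is created and the whole circuit stays covered by one period requirement.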
55
Summary
Don’t worry too much about gate-counting methodologies. They are only rough estimates, anyway.
Optimize your HDL coding style.
Instantiate dedicated Xilinx hardware resources in your design to improve your system speed and maximize what you get from your FPGA.
Pipeline your timing-critical paths.
Timing constraints are a primary means for improving system speed.
Get your design to work properly the first time by designing synchronously.
56
Where Can I Learn More? Xilinx Answers Browser
Answers Browser window: enter keywords like “pipelining” or “period constraint”.
Xilinx Education Services courses (Xilinx tools and architecture courses):
Fundamentals of FPGA Design: learn about synchronous design, global timing constraints, the Architecture Wizard, and the CORE Generator™ tool.
Designing for Performance: learn about avoiding metastability, path-specific timing constraints, and the Timing Analyzer.
Free Video-based Training: learn about proper HDL coding techniques.
57
Trademark Information
Xilinx is disclosing this Document and Intellectual Property (hereinafter “the Design”) to you for use in the development of designs to operate on, or interface with Xilinx FPGAs. Except as stated herein, none of the Design may be copied, reproduced, distributed, republished, downloaded, displayed, posted, or transmitted in any form or by any means including, but not limited to, electronic, mechanical, photocopying, recording, or otherwise, without the prior written consent of Xilinx. Any unauthorized use of the Design may violate copyright laws, trademark laws, the laws of privacy and publicity, and communications regulations and statutes. Xilinx does not assume any liability arising out of the application or use of the Design; nor does Xilinx convey any license under its patents, copyrights, or any rights of others. You are responsible for obtaining any rights you may require for your use or implementation of the Design. Xilinx reserves the right to make changes, at any time, to the Design as deemed desirable in the sole discretion of Xilinx. Xilinx assumes no obligation to correct any errors contained herein or to advise you of any correction if such be made. Xilinx will not assume any liability for the accuracy or correctness of any engineering or technical support or assistance provided to you in connection with the Design. THE DESIGN IS PROVIDED “AS IS" WITH ALL FAULTS, AND THE ENTIRE RISK AS TO ITS FUNCTION AND IMPLEMENTATION IS WITH YOU. YOU ACKNOWLEDGE AND AGREE THAT YOU HAVE NOT RELIED ON ANY ORAL OR WRITTEN INFORMATION OR ADVICE, WHETHER GIVEN BY XILINX, OR ITS AGENTS OR EMPLOYEES. XILINX MAKES NO OTHER WARRANTIES, WHETHER EXPRESS, IMPLIED, OR STATUTORY, REGARDING THE DESIGN, INCLUDING ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE, AND NONINFRINGEMENT OF THIRD-PARTY RIGHTS. 
IN NO EVENT WILL XILINX BE LIABLE FOR ANY CONSEQUENTIAL, INDIRECT, EXEMPLARY, SPECIAL, OR INCIDENTAL DAMAGES, INCLUDING ANY LOST DATA AND LOST PROFITS, ARISING FROM OR RELATING TO YOUR USE OF THE DESIGN, EVEN IF YOU HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. THE TOTAL CUMULATIVE LIABILITY OF XILINX IN CONNECTION WITH YOUR USE OF THE DESIGN, WHETHER IN CONTRACT OR TORT OR OTHERWISE, WILL IN NO EVENT EXCEED THE AMOUNT OF FEES PAID BY YOU TO XILINX HEREUNDER FOR USE OF THE DESIGN. YOU ACKNOWLEDGE THAT THE FEES, IF ANY, REFLECT THE ALLOCATION OF RISK SET FORTH IN THIS AGREEMENT AND THAT XILINX WOULD NOT MAKE AVAILABLE THE DESIGN TO YOU WITHOUT THESE LIMITATIONS OF LIABILITY. The Design is not designed or intended for use in the development of on-line control equipment in hazardous environments requiring fail-safe controls, such as in the operation of nuclear facilities, aircraft navigation or communications systems, air traffic control, life support, or weapons systems (“High-Risk Applications”). Xilinx specifically disclaims any express or implied warranties of fitness for such High-Risk Applications. You represent that use of the Design in such High-Risk Applications is fully at your risk. © 2012 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated brands included herein are trademarks of Xilinx, Inc. PowerPC is a trademark of IBM, Inc.