Basic FPGA Architectures

Presentation on theme: "Basic FPGA Architectures"— Presentation transcript:

1 Basic FPGA Architectures
This material exempt per Department of Commerce license exception TSU

2 Objectives After completing this module, you will be able to:
Describe the basic slice resources available in Spartan-6 FPGAs
Identify the basic I/O resources available in Spartan-6 FPGAs
List some of the dedicated hardware features of Spartan-6 FPGAs
Differentiate the Virtex-6 family of devices from the Spartan-6 family
Identify the latest members of the Virtex-7 device family

3 Outline Overview Logic Resources I/O Resources Memory and DSP48
Clocking Resources Latest Families Virtex-6 Family Virtex-7 Family Summary

4 Overview All Xilinx FPGAs contain the same basic resources
Logic resources
  Slices (grouped into CLBs): contain combinatorial logic and register resources
  Memory
  Multipliers
Interconnect resources
  Programmable interconnect
  IOBs: interface between the FPGA and the outside world
Other resources
  Global clock buffers
  Boundary scan logic

5 Spartan-6 FPGA
Figure: device floorplan showing CLBs, memory controllers, I/O, CMTs, MGTs, BUFG, BUFIO, PCIe endpoint, block RAM, and DSP48 slices.

6 Spartan-6 Lowest Total Power
45 nm technology
Static power reductions: process and architectural innovations
Dynamic power reduction: lower node capacitance and architectural innovations
More hard IP functionality
  Integrated transceivers and other logic reduce power
  Hard IP uses less current and power than soft IP
Lower I/O power
Low-power option (-1L) reduces power even further
Fewer supply rails reduce power
Two families: LX and LXT

7 Spartan-6 LX / LXT FPGAs
These are the planned product offerings for the LX (base) and LXT (high-speed serial) platforms. Note that you do not lose I/O as you migrate to larger devices within the same package. The smaller devices come in 0.8 mm packages and the larger devices in 1 mm packages, which helps contain overall system cost as routing complexity and board layer count increase. ** All memory controllers support a x16 interface, except in the CS225 package, where only x8 is supported.

8 Outline Overview Logic Resources I/O Resources Memory and DSP48
Clocking Resources Latest Families Virtex-6 Family Virtex-7 Family Summary

9 Spartan-6 FPGA CLB
CLB contains two slices
Connected to switch matrix for routing to other FPGA resources
Carry chain runs vertically through Slice0 only
Figure: a switch matrix connected to Slice0 and Slice1, with the carry chain (CIN to COUT) passing through Slice0.

10 Three Types of Slices in Spartan-6 FPGAs
SLICEM: Full slice
  LUT can be used for logic and memory/SRL
  Has wide multiplexers and carry chain
SLICEL: Logic and arithmetic only
  LUT can only be used for logic (not memory)
SLICEX: Logic only
  No wide multiplexers or carry chain
Figure: each CLB pairs a SLICEX with either a SLICEM or a SLICEL.
In the Spartan-6 FPGA, ¼ of slices are SLICEM, ¼ are SLICEL, and ½ are SLICEX. One slice in each CLB is a SLICEX; the other alternates between SLICEL and SLICEM in adjacent columns. The carry chain exists in the SLICEM or SLICEL half of each CLB.

11 Spartan-6 CLB Logic Slices
Slice type   Share   Contents
SliceM       25%     LUT6, 8 registers, carry logic, wide function muxes, distributed RAM / SRL logic
SliceL       25%     LUT6, 8 registers, carry logic, wide function muxes
SliceX       50%     LUT6 (optimized for logic), 8 registers
Each CLB has 2 side-by-side slices, for a total of 8 LUTs and 16 flip-flops
Each slice has 4 six-input LUTs and 8 flip-flops with common clock, CE, and S/R
Each LUT has 2 flip-flops, one of which can be configured as a latch
  Note: the latch option is seldom used
25% of slices provide carry logic and memory / SRL (SLICEM)
An additional 25% of slices provide carry logic but no memory (SLICEL)
The remaining 50% of slices provide neither memory nor carry (SLICEX)
Eliminating carry in 50% of the slices saves area and thus cost
  Carry is needed only for arithmetic, accumulators, and counters
Slice mix chosen for the optimal balance of cost, power, and performance
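
The per-slice figures above can be turned into a quick resource estimate. The following is a minimal Python sketch, not vendor tooling: the function name `slice_resources` is hypothetical, and it assumes the exact 25% / 25% / 50% split from the slide plus the 64 bits of storage per SLICEM LUT mentioned for distributed RAM.

```python
def slice_resources(total_slices: int) -> dict:
    """Estimate Spartan-6 fabric resources from a slice count.

    Assumes the exact 25% SLICEM / 25% SLICEL / 50% SLICEX mix from the
    slide; a real device may deviate slightly from this ratio.
    """
    slicem = total_slices // 4
    slicel = total_slices // 4
    slicex = total_slices - slicem - slicel
    return {
        "SLICEM": slicem,
        "SLICEL": slicel,
        "SLICEX": slicex,
        "LUT6": total_slices * 4,          # four 6-input LUTs per slice
        "flip_flops": total_slices * 8,    # eight registers per slice
        "carry_slices": slicem + slicel,   # only SLICEM and SLICEL have carry logic
        "dist_ram_bits": slicem * 4 * 64,  # each SLICEM LUT can store 64 bits
    }


estimate = slice_resources(1000)
print(estimate)
```

For a hypothetical 1,000-slice region this gives 4,000 LUTs and 8,000 flip-flops, but only 500 slices with carry logic and 250 that can hold memory or SRLs.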

12 Spartan-6 FPGA SLICE
Four LUTs
Eight storage elements
  Four flip-flop/latches
  Four flip-flops
F7MUX and F8MUX
  Connect LUT outputs to create wide functions
  Output can drive the flip-flop/latches
Carry chain (Slice0 only)
  Connected to the LUTs and the four flip-flop/latches
Each LUT also has some associated logic that includes carry logic. The carry chain propagates a carry signal between corresponding bits when implementing arithmetic functions such as accumulators, subtractors, or comparators. This enables high performance and efficient device utilization. The dedicated multiplexers, called the F7 and F8 multiplexers, allow for the implementation of wider logic. If two LUT6s and the associated F7MUX are used, any arbitrary 7-input combinatorial function can be implemented. Similarly, if all four LUTs, the F7MUX resources, and the F8MUX are used, an arbitrary 8-input combinatorial function can be implemented. These multiplexers can also be used to build larger multiplexers. Because a 4-input multiplexer can be implemented in one LUT6 (4 data inputs and 2 control inputs), a 16:1 multiplexer can be implemented using all 4 LUTs, the F7MUX resources, and the F8MUX. Logic that uses these built-in multiplexers will be significantly faster than logic built using only LUTs.
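
The 16:1 multiplexer construction described above can be sketched behaviorally. This is an illustrative Python model only, assuming the low two select bits go to the LUTs, the third to the F7 stage, and the fourth to the F8 stage; the function names are hypothetical and do not correspond to any primitive or tool API.

```python
def mux2(a: int, b: int, sel: int) -> int:
    """2:1 multiplexer: returns a when sel is 0, b when sel is 1."""
    return b if sel else a


def lut_mux4(data: list, sel: int) -> int:
    """4:1 multiplexer that fits in one LUT6 (4 data inputs + 2 select inputs)."""
    return data[sel & 0b11]


def mux16(data: list, sel: int) -> int:
    """16:1 multiplexer built from four LUT6 4:1 muxes, two F7 muxes, one F8 mux."""
    if len(data) != 16:
        raise ValueError("mux16 needs exactly 16 data inputs")
    # Each LUT handles four data inputs, all sharing select bits 1:0.
    lut_out = [lut_mux4(data[4 * i:4 * i + 4], sel) for i in range(4)]
    # Each F7 mux combines one pair of LUT outputs using select bit 2.
    f7_low = mux2(lut_out[0], lut_out[1], (sel >> 2) & 1)
    f7_high = mux2(lut_out[2], lut_out[3], (sel >> 2) & 1)
    # The F8 mux combines the two F7 outputs using select bit 3.
    return mux2(f7_low, f7_high, (sel >> 3) & 1)


inputs = list(range(100, 116))
selected = [mux16(inputs, s) for s in range(16)]
```

Every select value returns the matching input, so the whole function fits in a single slice without using general routing between the stages.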

13 6-Input LUT with Dual Output
6-input LUT can be two 5-input LUTs with common inputs
  Minimal speed impact compared to a 6-input LUT
  One or two outputs
  Any function of six variables or two independent functions of five variables
Figure: a 6-LUT built from two 5-LUTs, with inputs A1 to A6 and outputs O6 and O5.
Each 6-input LUT can be configured as two 5-input LUTs that share their inputs, which gives the device a great deal of flexibility to build an efficient design: a single LUT can implement any function of six variables or two independent functions of five variables. A LUT can perform any combinatorial function, limited only by the number of inputs. It is the primary combinatorial logic resource in the device and the standard building block across the FPGA industry.
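
A dual-output LUT can be modeled as a 64-bit truth table split into two 32-bit halves. The Python sketch below is a behavioral illustration under stated assumptions: bit i of the table holds the output for input value i, O5 reads the lower half, and A6 selects which half drives O6. The names `lut6_2` and `build_init` are hypothetical helpers for this example.

```python
def lut6_2(init: int, a: list) -> tuple:
    """Evaluate a dual-output LUT6.

    init -- 64-bit truth table (assumed: bit i is the output for input value i)
    a    -- six input bits in the order [A1, A2, A3, A4, A5, A6]
    Returns (O6, O5).
    """
    addr5 = sum(bit << i for i, bit in enumerate(a[:5]))  # A1..A5 are shared
    lower = (init >> addr5) & 1         # 5-LUT stored in table bits 31:0
    upper = (init >> (32 + addr5)) & 1  # 5-LUT stored in table bits 63:32
    o5 = lower
    o6 = upper if a[5] else lower       # A6 picks between the two 5-LUTs
    return o6, o5


def build_init(f_o5, f_o6) -> int:
    """Pack two 5-input functions into one 64-bit truth table."""
    init = 0
    for value in range(32):
        bits = [(value >> i) & 1 for i in range(5)]
        init |= f_o5(bits) << value
        init |= f_o6(bits) << (32 + value)
    return init


and5 = lambda bits: int(all(bits))   # 5-input AND on O5
parity5 = lambda bits: sum(bits) & 1  # 5-input XOR on O6
table = build_init(and5, parity5)
print(lut6_2(table, [1, 1, 1, 1, 1, 1]))
```

With A6 held high, O6 and O5 deliver two different functions of the same five inputs from one LUT.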

14 Slice Flip-Flop and Flip-Flop/Latch Control
All flip-flops and flip-flop/latches share the same CLK, SR, and CE signals
  This is referred to as the “control set” of the flip-flops
  CE and SR are active high
  CLK can be inverted at the slice boundary
Set/Reset (SR) signal can be configured as synchronous or asynchronous
  All four flip-flop/latches are configured the same
  All four flip-flops are configured the same
SR will cause the flip-flop to be set to the state specified by the SRINIT attribute
Figure: the storage elements AFF through DFF and AFF/LATCH through DFF/LATCH, each with D, CE, SR, and CK inputs and a Q output.
The four flip-flops in each slice are named AFF, BFF, CFF, and DFF. The four FF/LATCH elements are named AFF/LATCH, BFF/LATCH, CFF/LATCH, and DFF/LATCH. The SRINIT of a flip-flop is set by the software depending on the reset state of the flip-flop. It will be set to SRLOW if the flip-flop is set to 0 during the reset condition, or SRHIGH if the flip-flop is set to 1.
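
The control signals can be illustrated with a small behavioral model. This Python sketch covers only the synchronous SR configuration and makes two labeled assumptions that the slide does not state: SR takes priority over CE, and the power-up value equals SRINIT. The class name `SliceFlipFlop` is hypothetical.

```python
class SliceFlipFlop:
    """Behavioral model of one slice flip-flop with a synchronous SR input."""

    def __init__(self, srinit: int = 0):
        self.srinit = srinit  # 0 models SRLOW, 1 models SRHIGH
        self.q = srinit       # assumption: power-up value equals SRINIT

    def clock(self, d: int, ce: int = 1, sr: int = 0) -> int:
        """Apply one active clock edge and return the new Q."""
        if sr:        # active-high set/reset, modeled with priority over CE
            self.q = self.srinit
        elif ce:      # active-high clock enable
            self.q = d
        return self.q


ff = SliceFlipFlop(srinit=1)    # SRHIGH: SR drives Q to 1
after_load = ff.clock(0)            # CE high: Q follows D
after_hold = ff.clock(1, ce=0)      # CE low: Q holds its value
after_reset = ff.clock(0, ce=0, sr=1)  # SR high: Q returns to SRINIT
```

Because CLK, CE, and SR are shared across a slice, flip-flops that need different enables or resets cannot be packed into the same slice.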

15 Configuring LUTs as a Shift Register (SRL)
Figure: an SRL drawn as a chain of registers with D, CE, and CLK inputs, a Q output selected by address A[4:0], and a Q31 cascade output.
In the SLICEM slices, the LUT can also be configured as a dynamically addressable shift register, or SRL. This component basically acts as a programmable delay element. The diagram seems to imply that each LUT has a number of registers as part of its construction, but the component only allows you to load data in serially and then make it available a few clock cycles later. As new data is loaded, the previously loaded data is shifted down. There are no set or reset capabilities, the SRL is not loadable, and data can only be read serially, so the SRL does not behave exactly like a shift register implemented with registers.
So what does this imply about inferring the shift register LUT? Its contents cannot all be read at once because there is no parallel read functionality; remember that it is serial in, serial out. Likewise, if you coded for a shift register that was initialized, it could not be mapped to the SRL primitive. Note that most synthesis tools require this coding style and the use of an attribute to infer an SRL.
There is a maximum delay of 32 clock cycles per LUT. SRLs can be cascaded to other LUTs or CLBs for longer shift registers. The shift register length can also be changed asynchronously by changing the address A, which means that you can dynamically change the delay associated with an SRL.
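
The SRL behavior described in these notes (serial load, no reset, one addressable tap, cascade output) can be captured in a short behavioral model. This is a hedged Python sketch, not a primitive definition; the class name `SRL32` and its method names are chosen for illustration.

```python
class SRL32:
    """Behavioral model of a 32-deep shift register LUT.

    There is no set/reset and no parallel load or read: data enters
    serially and exactly one tap is visible at a time.
    """

    def __init__(self):
        self.bits = [0] * 32  # bits[0] is the most recently shifted-in value

    def clock(self, d: int, ce: int = 1) -> None:
        """Active clock edge: shift in d when CE is high."""
        if ce:
            self.bits = [d] + self.bits[:-1]

    def q(self, addr: int) -> int:
        """Addressable output, equivalent to addr + 1 flip-flop stages."""
        return self.bits[addr & 0x1F]

    @property
    def q31(self) -> int:
        """Cascade output that would feed the next SRL in a longer chain."""
        return self.bits[31]


srl = SRL32()
tap = []
for sample in range(1, 41):    # shift in the values 1..40
    srl.clock(sample)
    tap.append(srl.q(16))      # address 16 models a 17-stage delay
```

Changing the address passed to `q` changes the delay without reloading the contents, which is the dynamic addressing the notes refer to.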

16 Shift Register LUT Example
Operation D (NOP) must add 17 pipeline stages of 64 bits each
  1,088 flip-flops (hence 136 slices) or
  64 SRLs (hence 16 slices)
Figure: a 64-bit bus feeds two paths. One path runs Operation A (8 cycles) and Operation B (12 cycles), 20 cycles in total. The other runs Operation C (3 cycles) and Operation D, a NOP (17 cycles), also 20 cycles in total. The paths are statically balanced.
The SRL can be used as a programmable delay element (or No Operation, NOP). In this example, a 64-bit bus is processed through operations A, B, and C. A has a delay of eight cycles, B has a delay of twelve cycles, and C has a delay of three cycles. Because the processed data is combined at the output with a multiplexer, the datapaths must be synchronized so that the appropriate data arrives at the multiplexer together. To do this, the SRL can be used to delay the output of operation C by 17 clock cycles. If you were to do this with registers, it would require 1,088 registers. If you use the SRL functionality instead, you only need 64 LUTs, each programmed for 17 clock cycles of delay. So, this example uses 64 LUTs to replace 1,088 flip-flops and the associated routing resources (pretty good justification for using the SRL, right?).
Because there are so many registers in FPGAs, pipelining is an effective way to increase design performance. And because pipelines can become unbalanced when one path needs more logic stages than another, it is sometimes necessary to delay some of the signals. One of the best uses of the SRL is to add delay to balance pipelines.
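
The resource counts in this example follow directly from the per-slice figures (8 flip-flops and 4 LUTs per slice) and the 32-cycle limit per SRL. A small Python sketch of the arithmetic, with a hypothetical function name:

```python
def balance_cost(long_path_cycles: int, short_path_cycles: int, bus_width: int) -> dict:
    """Compare flip-flop and SRL cost of padding the shorter pipeline path."""
    delay = long_path_cycles - short_path_cycles
    flip_flops = delay * bus_width       # one flip-flop per bit per stage
    srls_per_bit = -(-delay // 32)       # ceiling: one SRL delays up to 32 cycles
    srl_luts = bus_width * srls_per_bit
    return {
        "delay_cycles": delay,
        "flip_flops": flip_flops,
        "ff_slices": -(-flip_flops // 8),  # 8 flip-flops per slice
        "srl_luts": srl_luts,
        "srl_slices": -(-srl_luts // 4),   # 4 LUTs per slice (SLICEM only)
    }


cost = balance_cost(long_path_cycles=20, short_path_cycles=3, bus_width=64)
print(cost)
```

Note that the 16 slices on the SRL side must all be SLICEMs, which make up only about a quarter of the fabric.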

17 Outline Overview Logic Resources I/O Resources Memory and DSP48
Clocking Resources Latest Families Virtex-6 Family Virtex-7 Family Summary

18 I/O Block Diagram
Figure: each I/O pair (P and N pads, with optional LVDS termination) has electrical resources at the pads and logical resources (master and slave IOLOGIC blocks, each with IOSERDES and IODELAY) that interconnect to the FPGA fabric.
The electrical resources include the I/O pads and buffers. The logical resources include single data rate and double data rate register resources, a SERDES converter, and a programmable I/O delay line.

19 Spartan-6 FPGA Supports 40+ Standards
Each input can be 3.3 V compatible LVCMOS (3.3 V, 2.5 V, 1.8 V, 1.5 V, and 1.2 V) LVCMOS_JEDEC LVPECL (3.3 V, 2.5 V) PCI I2C* HSTL (1.8 V, 1.5 V; Classes I, II, III, IV) DIFF_HSTL_I, DIFF_HSTL_I_18 DIFF_HSTL_II* SSTL (2.5 V, 1.8 V; Classes I, II) DIFF_SSTL_I, DIFF_SSTL18_I DIFF_SSTL_II* LVDS, Bus LVDS RSDS_25 (point-to-point) Easier and More Flexible I/O Design! Use the I/O Planner to assign your I/O standards and ensure that your pinout follows the I/O banking rules (more on this in next slide). * Newly added standards

20 Spartan-6 FPGA I/O Bank Structure
All I/Os are on the edges of the chip
I/Os are grouped into banks
  30 to 83 I/Os per bank
  Eight clock pins per edge
  Common VCCO and VREF restrict the mixture of standards in one bank
The differential driver is only available in Bank0 and Bank2
The differential receiver is available in all banks
On-chip termination is available in all banks
Chip view (LX45/T and smaller): four banks, BANK 0 to BANK 3
Chip view (LX100/T and larger): six banks, BANK 0 to BANK 5

21 I/O Logical Resources
Two IOLOGIC blocks per I/O pair
  Master and slave
  Can operate independently or concatenated
Each IOLOGIC contains
  IOSERDES
    Parallel-to-serial converter (serializer)
    Serial-to-parallel converter (deserializer)
  IODELAY
    Selectable fine-grained delay
  SDR and DDR resources
Figure: master and slave IOLOGIC blocks, each with IOSERDES and IODELAY, between the pads and the interconnect to the FPGA fabric.
The primary use of the IOBs is for registering data. Incoming and outgoing data can be registered using a simple single data rate (SDR) flip-flop or a double data rate (DDR) flip-flop. High-speed incoming serial data can also be deserialized using the SERDES capability within the IOB. Similarly, outgoing parallel data can be serialized onto a single output pin.
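
The serializer and deserializer roles can be illustrated with a round trip in software. This Python sketch is purely conceptual: the LSB-first bit order and the 4-bit width are assumptions made for the example, and the function names are hypothetical.

```python
def serialize(word: int, width: int) -> list:
    """Parallel-to-serial: emit `width` bits of `word`, LSB first (assumed order)."""
    return [(word >> i) & 1 for i in range(width)]


def deserialize(bits: list, width: int) -> list:
    """Serial-to-parallel: regroup a bit stream into `width`-bit words, LSB first."""
    words = []
    for start in range(0, len(bits) - width + 1, width):
        chunk = bits[start:start + width]
        words.append(sum(bit << i for i, bit in enumerate(chunk)))
    return words


tx_words = [0xA, 0x3, 0xF]
stream = [bit for word in tx_words for bit in serialize(word, 4)]
rx_words = deserialize(stream, 4)
```

The fabric side handles `width` bits per slow clock while the pin toggles `width` times faster, which is why the SERDES lets the fabric run at a fraction of the serial bit rate.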

22 Outline Overview Logic Resources I/O Resources Memory and DSP48
Clocking Resources Latest Families Virtex-6 Family Virtex-7 Family Summary

23 SLICEM Used as Distributed SelectRAM Memory
Uses the same storage that is used for the look-up table function
Synchronous write, asynchronous read
  Can be converted to synchronous read using the flip-flops available in the slice
Various configurations
  Single port
    One LUT6 = 64x1 or 32x2 RAM
    Cascadable up to 256x1 RAM
  Dual port (D)
    1 read / write port + 1 read-only port
  Simple dual port (SDP)
    1 write-only port + 1 read-only port
  Quad port (Q)
    1 read / write port + 3 read-only ports
Configuration      Available sizes
Single port        32x2, 32x4, 32x6, 32x8, 64x1, 64x2, 64x3, 64x4, 128x1, 128x2, 256x1
Dual port          32x2D, 32x4D, 64x1D, 64x2D, 128x1D
Simple dual port   32x6SDP, 64x3SDP
Quad port          32x2Q, 64x1Q
The look-up table functionality is essentially a small memory containing the desired output value for each combination of input values. These storage cells are programmed at configuration time, and the look-up itself is done by using the inputs as the control for a wide multiplexer. By allowing these storage elements to be modified using FPGA fabric resources, the LUT can be used for the implementation of a small distributed memory. Each LUT can be a single-ported 64-bit RAM with synchronous write and asynchronous read. LUTs in slices can be combined to create small dual-port and multi-port RAMs. In the Virtex-6 and Spartan-6 FPGAs, approximately one quarter of slices are SLICEMs in which the LUTs can be programmed as distributed RAMs (this varies with family). Simple dual-port configurations can be used to implement LUT FIFOs and MicroBlaze™ processor register files. Each port has independent address inputs.
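
The synchronous-write, asynchronous-read behavior of a single-port 64x1 distributed RAM can be sketched as follows. This Python model is illustrative only; the class name `DistRam64x1` is hypothetical, and the registered read is modeled as a slice flip-flop that captures the read data present just before the clock edge.

```python
class DistRam64x1:
    """Behavioral model of a 64x1 single-port LUT RAM."""

    def __init__(self):
        self.mem = [0] * 64  # the LUT's 64 storage cells
        self.q_reg = 0       # optional slice flip-flop for a synchronous read

    def read(self, addr: int) -> int:
        """Asynchronous read: the output follows the address with no clock."""
        return self.mem[addr & 0x3F]

    def clock(self, addr: int, d: int = 0, we: int = 0) -> None:
        """Active clock edge: register the read data, then write if WE is high."""
        self.q_reg = self.mem[addr & 0x3F]  # flip-flop samples the pre-edge value
        if we:
            self.mem[addr & 0x3F] = d & 1


ram = DistRam64x1()
ram.clock(addr=5, d=1, we=1)   # write 1 to address 5
async_value = ram.read(5)      # visible immediately on the asynchronous port
registered_before = ram.q_reg  # the flip-flop captured the old contents
ram.clock(addr=5)              # one more clock brings the new data to q_reg
registered_after = ram.q_reg
```

Adding the output flip-flop costs one cycle of latency but removes the LUT read delay from the downstream timing path.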

24 Spartan-6 FPGA Block RAM Features
18 kb size Can be split into two independent 9-kb memories Performance up to 300 MHz Multiple configuration options True dual-port, simple dual-port, single-port Two independent ports access common data Individual address, clock, write enable, clock enable Independent widths for each port Byte-write enable 18k Memory Dual-Port BRAM

25 Better, More BRAM
More block RAMs
  2x higher BRAM-to-logic-cell ratio than the Spartan-3A platform
More port flexibility
  An 18K block can be split into two 9K BRAM blocks that are addressed independently
Improves buffering, caching, and data storage
  Excellent for embedded processing and communication protocols
  Enables DSP blocks to provide more efficient video and surveillance algorithms
Lower static power
For Spartan-6, we nearly doubled the BRAM-to-logic-cell ratio compared to the large Spartan-3E/A devices. This is a significant increase, to as much as 4.8 Mb. We also made the memory more efficient by adding the ability to break the 18K blocks into 9K blocks. This is a critical enhancement because it accommodates small memory requirements without exhausting the ports and wasting resources. To lower static power, we redesigned the memory circuitry to reduce leakage. Lower power was the goal of many of the newly designed features in Spartan-6.

26 Memory Controller
Only low-cost FPGA with a “hard” memory controller
Guaranteed memory interface performance, providing reduced engineering and board design time
DDR, DDR2, DDR3, and LPDDR support
  Up to 12.8 Gbps bandwidth for each memory controller
Automatic calibration features
Multiport structure for user interface
  Six 32-bit programmable ports from fabric
  Controller interface to 4-, 8-, or 16-bit memory devices

27 Spartan-6 Hard Memory Controller
New Hard Block Memory Controller
  Up to 4 controllers per device
Why a Hard Memory Block?
  Very common design component
  Multiple customer benefits
Customer request      Spartan-6 hard block memory controller benefit
Higher performance    Up to 800 Mbps
Lower cost            Saves soft logic, smaller die
Lower power           Dedicated logic
Easier designs        Timing closure no longer an issue; configurable multiport user interface; CoreGen/MIG wizard and EDK support
Transcript: Okay, so let's look now a little bit at the features that we have. I talked about why hard block. Well, because you have to meet the minimum frequency, but there are a lot of benefits to hard block. It's dedicated logic, so you consume less power. You have more fabric logic at your disposal to do other things. Also, you have to think about the cost: if you have to implement a soft controller in fabric, then you take up a lot of logic, so you may end up with a bigger device than you really need. So these things are really helpful.
We have actually implemented multiple controllers. I'll show you in a minute: in the mid-size devices you have two controllers, in the larger devices you have four. And you can interface on a 16-bit bus or an 8-bit bus, depending on whether you use a x16 or x8 device. So then a question would come: well, okay, my customer wants to have more than the bandwidth you can give with the x16 device. In that case, you can use two controllers. Or in the more extreme cases with the larger parts, you can use four controllers. And you can have soft logic take data from two or four controllers and use that. So that's an option we thought about. And actually we're thinking of implementing a reference design that would hook up two of these controllers; that's something to look at in the future. And of course everything will go through the same tool flow as the soft controllers.
For the non-embedded applications, we have the MIG, and for embedded you can use EDK and have the MPMC support. So I won't go over data rates: it's 800 Mbps for DDR3 and DDR2, and as fast as DDR and low-power DDR can go, which is 400 Mbps.
Author’s Original Notes: Yes, we are integrating a hard block memory controller into the Spartan-6 family. In fact, we will have up to 4 memory controller blocks per device. The block will support DDR3, DDR2, DDR, and LPDDR at the rates shown here on the right. And why did we choose to integrate a hard memory controller? Well, like most other hard blocks, we integrate them when we think they can be defined in a way that handles the vast majority of applications and when we know that a significant percentage of the customer base is using such a block in their designs. Memory controllers are a very common design component, and in the Spartan space a DRAM controller with the most common capabilities and features will address a big chunk of what customers need. Furthermore, we can hit much higher data rates (800 Mbps) with a hard solution to provide about 2X the memory bandwidth of prior-generation soft solutions. The hard block also conserves FPGA resources for the “secret sauce” of the user design and potentially allows the user to get into a smaller device. And of course the hardened solution will save on power as well by only using the transistors necessary to do the job. Finally, the hard block is considerably easier to design with, because the block is tested to guarantee performance, so there are no concerns about meeting timing. And the CoreGen MIG wizard, or alternatively the EDK IP configurator, guides the user through the complete design implementation.
Expected Questions: What if a customer needs more interfaces, or the MCB doesn’t support the interface or features needed? Will we still do soft IP solutions for Spartan-6?
Answer: We believe that the hard MCB blocks will be appropriate for most situations, and so all current development effort is directed at IP offerings based on the MCB. However, we are always willing to hear input about customer needs and will adjust our IP solution roadmaps when there is clear demand for additional offerings.
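
The bandwidth figures quoted for the memory controller follow from the per-pin data rate and the interface width. A small Python sketch of that arithmetic; the function name is hypothetical, simply summing controllers is an illustrative assumption, and sustained throughput in a real system will be lower than this peak figure.

```python
def peak_bandwidth_gbps(data_rate_mbps: float, bus_width_bits: int, controllers: int = 1) -> float:
    """Peak bandwidth in Gb/s: per-pin rate x data bus width x controller count."""
    return data_rate_mbps * bus_width_bits * controllers / 1000.0


ddr3_x16 = peak_bandwidth_gbps(800, 16)     # DDR2/DDR3 at 800 Mbps, x16 interface
lpddr_x16 = peak_bandwidth_gbps(400, 16)    # DDR/LPDDR at 400 Mbps, x16 interface
two_mcbs = peak_bandwidth_gbps(800, 16, 2)  # two controllers combined in soft logic
print(ddr3_x16, lpddr_x16, two_mcbs)
```

A single x16 controller at 800 Mbps peaks at 12.8 Gb/s, which is why the notes suggest combining two or four controllers when a design needs more.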

28 Spartan-6 FPGA DSP48A1 Slice
Figure: DSP48A1 slice block diagram (labels: A, B, C, D, BCIN, BCOUT, PCIN, PCOUT, CIN, CCOUT, CFOUT, MFOUT, M, P, X and Z multiplexers, OPMODE[7:0], dual B, D register with pre-adder).
18x18 signed multiplier
48-bit add/subtract/accumulate
Pipeline registers for high speed
Cascade paths for wide functions
The DSP slice is designed for DSP applications and large arithmetic operations. DSP designers who traditionally use the FPGA fabric for arithmetic applications will find that much of their job is done for them internally to this block. All they need to do is configure the block by using OPMODE inputs, which control the flow of data in the block. The DSP slice has an 18x18 2’s complement multiplier with a pre-adder on one of the inputs. It also has a 2-input adder/subtractor following the multiplier, which can be used to create several different arithmetic operations. Cascade pins are included to support complex functions with no speed penalty. This allows you to implement larger arithmetic operations by linking multiple slices together, which is especially useful for DSP applications. There are optional pipeline registers at several points within the DSP slice to maximize performance. Most designers target this resource through the CORE Generator™ software. Refer to the data sheet and user guides for more information about this resource.
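
The datapath described above (pre-adder, 18x18 multiplier, 48-bit post-adder) can be approximated in a few lines. This Python sketch is a simplified behavioral model, not the full DSP48A1: it ignores the C input, carry logic, pipeline registers, and the actual OPMODE encoding, and the keyword arguments used to select operations are hypothetical stand-ins for OPMODE bits.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def dsp_slice(a: int, b: int, d: int = 0, z: int = 0,
              use_preadder: bool = False, pre_sub: bool = False,
              post_sub: bool = False) -> int:
    """Compute P = Z +/- A * (D +/- B) with 18-bit operands and a 48-bit result."""
    a, b, d = to_signed(a, 18), to_signed(b, 18), to_signed(d, 18)
    if use_preadder:
        # The pre-adder sits on the B input and wraps to 18 bits.
        b = to_signed(d - b if pre_sub else d + b, 18)
    m = a * b                         # 18x18 signed multiply, 36-bit product
    p = z - m if post_sub else z + m  # post-adder/subtractor stage
    return to_signed(p, 48)           # result wraps to 48 bits


# Multiply-accumulate: feed P back in as Z on each step.
acc = 0
for coeff, sample in [(3, 4), (-2, 5), (7, -1)]:
    acc = dsp_slice(coeff, sample, z=acc)

# Symmetric filter tap: the pre-adder sums two samples sharing one coefficient.
symmetric_tap = dsp_slice(3, 4, d=5, use_preadder=True)
```

Feeding the result back as Z models the accumulate mode, while the pre-adder halves the number of multipliers needed for a symmetric filter.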

29 Outline Overview Logic Resources I/O Resources Memory and DSP48
Clock Resources Latest Families Virtex-6 Family Virtex-7 Family Summary

30 Spartan-6 FPGA Global Clock Network
16 global clock buffers in the Spartan-6 FPGA allow clocks to be distributed to potentially every clocked element on the die 16 HCLK lines connect clock signals to logic resources in each row HCLK lines can be driven by Global clock buffers DCM outputs PLL outputs Each BUFG and HCLK row can only drive the clock and reset ports of each synchronous element (flip-flop or DSP slice, for example). This means that besides global clocks, only global resets are going to be routed on BUFGs. All secondary control signals (CE, Set, and Reset) will be routed on general interconnect. The global clock network in Spartan-6 FPGAs is driven by 16 BUFGMUX resources located in the center of the device. Clocks in each row of the FPGA are driven by 16 HCLK lines. These HCLK lines can be driven by either global clock buffers, or by the PLL and DCM signals generated within the adjacent clock management tile (CMT).

31 Spartan-6 FPGA I/O Clock Network
Special clock network dedicated to I/O logical resources
  Independent of global clock resources
  Speeds up to 1 GHz
Multiple sources for clocking I/O logic
  BUFIO2: for high-speed dedicated I/O clock signals
  BUFPLL: for clocks driven by the PLL in the CMT
Figure: an I/O bank with its IOLOGIC and P/N pads, clocked through BUFIO2 or through BUFPLL from the PLL in the CMT.
Each I/O bank has two I/O clock regions. There are four high-speed I/O clock networks (BUFIO2) in every I/O clock region, driven by four dedicated clock input pins.

32 Spartan-6 FPGA Clock Management Tile (CMT)
Figure: CMT block diagram. Two DCMs (CLKOUT<9:0> each, shown as dcm1_clkout<9:0> and dcm2_clkout<9:0>) and one PLL (CLKOUT<5:0>, shown as pll_clkout<5:0>) take their CLKIN and CLKFB inputs from the GCLK inputs, clocks from BUFG, and feedback clocks from BUFIO2FB.
The Spartan-6 FPGA clock management tile includes two digital clock managers (DCM) and one phase locked loop (PLL). There are dedicated routing connections between components within the same tile, as well as connections to the global clock buffers and HCLK lines. The DCM can remove clock insertion delay using the DLL feature, as well as perform digital phase shifting and frequency synthesis. The PLL can perform more complex frequency synthesis and can filter clock jitter.

33 Outline Overview Slice Resources I/O Resources Memory and DSP48
Clocking Resources Latest Families Virtex-6 Family Virtex-7 Family Summary

34 Designers Eccentrics Higher System Performance Lower System Cost
More design margin to simplify designs Higher integrated functionality Lower System Cost Reduce BOM Implement design in a smaller device & lower speed-grade Lower Power Help meet power budgets Eliminate heat sinks & fans Prevent thermal runaway Basic Architecture 34

35 Architecture Alignment
Virtex-6 FPGAs Spartan-6 FPGAs 760K Logic Cell Device 150K Logic Cell Device Common Resources LUT-6 CLB BlockRAM DSP Slices High-performance Clocking FIFO Logic Parallel I/O Hardened Memory Controllers Tri-mode EMAC HSS Transceivers* 3.3 Volt compatible I/O System Monitor PCIe® Interface *Optimized for target application in each family Enables IP Portability, Protects Design Investments Basic Architecture 35

36 Virtex-6 and Spartan-6 FPGA Sub-Families
Virtex-6 CXT FPGA: up to 3.75 Gbps serial connectivity and corresponding logic performance
Virtex-6 LXT FPGA: high logic density, high-speed serial connectivity
Virtex-6 SXT FPGA: high logic density, high-speed serial connectivity, enhanced DSP
Virtex-6 HXT FPGA: high logic density, ultra high-speed serial connectivity
Spartan-6 LX FPGA: lowest cost logic
Spartan-6 LXT FPGA: lowest cost logic, low-cost serial connectivity
Resource categories compared for each sub-family: logic, block RAM, DSP, parallel I/O, serial I/O
Note that a hard processor core is NOT available in any of the Spartan-6 or Virtex-6 devices. In addition to the CXT shown above, there are three Virtex-6 sub-families (LXT, SXT, and HXT) and two Spartan-6 sub-families. The Spartan-6 traditional logic (LX) sub-family contains block RAM, memory controllers, and DSP slice resources. It is targeted at general logic applications. The Spartan-6 LXT sub-family includes low-cost serial gigabit transceivers (GTP) and PCI Express® cores. The Virtex-6 devices all contain high-performance serial transceivers (GTX) and PCI Express cores, as well as Tri-mode Ethernet MAC cores. The SXT sub-family has more block RAM and DSP slices than other sub-families and is ideal for DSP applications. The HXT sub-family has ultra-high-speed serial transceivers (GTH).

37 Outline Overview Slice Resources I/O Resources Memory and DSP48
Clocking Resources Latest Families Virtex-6 Family Virtex-7 Family Summary

38 Virtex® Product & Process Evolution
Virtex: 220 nm
Virtex-E: 180 nm
Virtex-II: 150 nm
Virtex-II Pro: 130 nm
Virtex-4: 90 nm
Virtex-5: 65 nm
Virtex-6: 40 nm
Generations shown: 1st through 6th
Delivering Balanced Performance, Power, and Cost

39 Strong Focus on Power Reduction
Static Power Reduction
  Higher distribution of low-leakage transistors
Dynamic Power Reduction
  Reduced capacitance through device shrink
Reduced Core Voltage Devices
  Lower overall power
  VCCINT = 0.9 V option allows power / performance tradeoff
I/O Power Improvements
  Dynamic termination
System Monitor
  Allows sophisticated monitoring of temperature and voltage
Up to 50% power reduction vs. previous generation

40 Virtex-6 Logic Fabric
Virtex-6 Configurable Logic Block (CLB)
  Each CLB contains two slices
  Each slice contains four 6-input lookup tables (LUT6)
  Slices implement logic functions (SLICEL)
  Slices for memories and shift registers (SLICEM)
LUT6 implements
  All functions of up to 6 variables
  Two functions of up to 5 variables each
  Shift registers up to 32 stages long
  Memories of 64 bits
  Multiple configurations within a slice
Power consumption benefits: shift register mode greatly reduces power consumption over FF implementation
Performance benefits: increased ratio of SLICEM, so memories are available closer to the source or target logic
Cost benefits: can pack logic and memory functions more efficiently

41 Higher DSP Performance
Most advanced DSP architecture
  New optional pre-adder for symmetric filters
  25x18 multiplier
    High resolution filters
    Efficient floating point support
  ALU-like second stage enables mapping of advanced operations
    Programmable op-code
    SIMD support
    Addition / Subtraction / Logic functions
  Pattern detector
Lowest power consumption
Highest DSP slice capacity
  Up to 2K DSP Slices

42 Outline Overview Slice Resources I/O Resources Memory and DSP48
Clocking Resources Latest Families Virtex-6 Family Virtex-7 Family Summary

43 Power, Performance and Productivity Drive Market Trends
Lower Power: Legislation and Regulations
  Flat panel/TV, Central Office, Server Farms, Portable Medical, Portable Consumer
Higher Performance: System Capacity and Performance
  Wired Infrastructure, Wireless, Broadcast, 300G+ Networks, Aerospace and Defense, High Performance Computing
Improved Productivity: Reduce Capital and Operating Expenses (OPEX, CAPEX)
  All Market Segments
Some applications see “performance per unit of power” as most critical. Others consider “cost per unit of power” most critical. Others see both, and have more universal needs for low power that can be summarized as “capability per unit of power.”
What customers are telling us:
Power
  Simpler heat sinks and airflow
  Overall power reduction with fewer and lower cost power supplies
  Excessive system operating expenses
  Mandated Energy Star compliance
  Handheld/battery products require low static power
System Performance
  Systems continue to drive more bandwidth for chip-to-chip and box-to-box
  Need to interface with cutting-edge interface technology
  Increase the amount of parallel processing
Productivity
  Lower development costs
  Lower BOM costs
  Integrated functionality allowing decreased device count
  Improved product reliability
  Need options for further cost reduction
Cost
  Strained R&D budgets: do more with less time and less money
  Leverage prior investments
#1 Customer Problem: Lower Power enables better Cost, Performance, and Capability

44 The Unified Architecture Advantage
Common elements enable easy IP reuse for quick design portability across all 7 series families
  Design scalability from low-cost to high-performance
  Expanded eco-system support
  Quickest TTM
Figure: the Artix™-7, Kintex™-7, and Virtex®-7 FPGAs share these building blocks:
  Logic Fabric: LUT-6 CLB
  Precise, Low Jitter Clocking: MMCMs
  On-Chip Memory: 36Kbit/18Kbit Block RAM
  Enhanced Connectivity: PCIe® Interface Blocks
  DSP Engines: DSP48E1 Slices
  Hi-perf. Parallel I/O Connectivity: SelectIO™ Technology
  Hi-performance Serial I/O Connectivity: Transceiver Technology
Simplified design reuse and migration
  Common building blocks minimize time for coding, simulation, and debug
  Common hard IP for familiarity and reliability
  Common, optimized interconnect for improved place and route
Quickly scale designs to address adjacent markets
  Minimum design and deployment effort lowers development costs
  Simplified design migration enables designers to carry forward today’s investment to future platforms

45 The Xilinx 7 Series FPGAs Industry’s First Unified Architecture
Industry’s Lowest Power and First Unified Architecture Spanning Low-Cost to Ultra High-End applications Three new device families with breakthrough innovations in power efficiency, performance-capacity and price-performance Xilinx 7 series FPGAs comprise three unified FPGA families that offer a breakthrough 50% reduction in power and address the complete range of system requirements, ranging from low cost and small form factor packaging for cost-sensitive, high-volume applications to ultra-high end connectivity bandwidth, logic capacity, and signal processing capability for the most demanding high-performance applications. Basic Architecture 45

46 Virtex-7 Sub-Families The Virtex-7 family has several sub-families
Virtex-7: General logic Virtex-7XT: Rich DSP and block RAM Virtex-7HT: Highest serial bandwidth Virtex-7 FPGA Virtex-7 XT FPGA Virtex-7 HT FPGA Logic Block RAM DSP Parallel I/O Serial I/O High Logic Density High-Speed Serial Connectivity High Logic Density High-Speed Serial Connectivity Enhanced DSP High Logic Density Ultra High-Speed Serial Connectivity

47 Outline Overview Logic Resources I/O Resources Memory and DSP48
Clocking Resources Latest Families Virtex-6 Family Virtex-7 Family Summary

48 Summary
The Spartan-6 FPGA slices contain four 6-input LUTs, eight registers, and carry logic
  LUTs can perform any combinatorial function of up to six inputs
  LUTs are connected with dedicated multiplexers and carry logic
  Some LUTs can be configured as shift registers or memories
The Spartan-6 FPGA IOBs contain DDR registers as well as SERDES resources
The SelectIO™ interfaces enable direct connection to multiple I/O standards
The Spartan-6 FPGA includes dedicated block RAM and DSP slice resources
The Spartan-6 FPGA includes dedicated DCMs, PLLs, and routing resources to improve your system clock performance and generation capability
The latest families are architected for power efficiency
  They consist of Artix, Kintex, and Virtex devices

