Programmable Logic- How do they do that?


1 Programmable Logic- How do they do that?
Class 3: Specialized Functions 1/14/2015 Warren Miller

2 This Week's Agenda
1/12/15 An Introduction to Programmable Logic
1/13/15 Switches and Logic
1/14/15 Specialized Functions
1/15/15 Adding Processors
1/16/15 Software Tools

3 Course Description
Often we don't think about the details of how a particular device or technology is implemented; we just use them in our designs. However, sometimes you can't help but wonder: "How did they do that?" This course will dig into the details of how programmable logic devices and the associated tools are implemented, so you can better understand some of the 'How' behind the common trade-offs you face in your designs. Programmable logic starts with the technology used to implement the configurable logic that makes up a programmable logic device. This class will review the primary technology used to implement the configurable elements common to all programmable logic devices.

4 Today's Topics
Goals and Objectives
What Functions are Inefficient for the Base Fabric? Memory; Counters, Decoders; Adders, Multipliers
How Are These Specialized Functions Implemented? Memory, Counters; Adders, Multipliers, DSP Blocks
How Does Software Identify and Use These Functions? Synthesis; Place and Route
The general-purpose switches and logic elements of programmable logic are very flexible, but they are inefficient for implementing the common high-level building blocks of most digital sub-systems. Most programmable logic devices add some fixed-function elements to avoid these inefficiencies, and this class will describe the most common ones.

5 Goals and Objectives
Understand how and why FPGAs have fixed-function blocks:
Architecture (Logic, Interconnect)
Efficiency (compared to programmable fabric)
Software

6 FPGA Fabric- Review
IO Blocks
Programmable Interconnect: switches and signal lines
Logic Blocks: LUTs plus 'stuff' (carry look-ahead, RAM, ROM, shift registers)
Interconnect is limited
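
To make the "LUTs plus 'stuff'" idea concrete, the following is a minimal behavioral sketch (not vendor code) of how a k-input LUT works: the configuration simply fills a 2^k-entry truth table, and evaluating the logic function is a memory lookup indexed by the input bits.

```python
# Minimal behavioral sketch of a k-input LUT (illustrative only, not vendor code).
# The "configuration" is a truth table with 2**k entries; evaluating the LUT is
# simply indexing that table with the packed input bits.

def make_lut(truth_table):
    """Return a function that evaluates the LUT described by truth_table."""
    k = len(truth_table).bit_length() - 1          # number of inputs
    assert len(truth_table) == 2 ** k, "table must have 2**k entries"

    def lut(*inputs):
        assert len(inputs) == k
        index = 0
        for bit in inputs:                         # pack inputs into an index
            index = (index << 1) | (bit & 1)
        return truth_table[index]

    return lut

# Example: a 2-input XOR programmed into a 2-input LUT.
xor2 = make_lut([0, 1, 1, 0])
assert xor2(0, 1) == 1 and xor2(1, 1) == 0
```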

7 FPGA Fabric- Efficient Use?
State Machines: one-hot encoding
Delay Counters: feedback shift registers
Limited Fanout: duplicate logic if needed; add retiming registers
Register rich, limited inputs, limited outputs: Rent's Rule!
In the 1960s, E. F. Rent, an IBM employee, found a remarkable trend between the number of pins (terminals, T) at the boundaries of integrated circuit designs at IBM and the number of internal components (g), such as logic gates or standard cells. On a log-log plot, these data points fell on a straight line, implying a power-law relation T = t g^p, where t and p are constants (p < 1.0, and generally 0.5 < p < 0.8). Rent disclosed his findings in IBM-internal memoranda that were published in the IBM Journal of Research and Development in 2005 (IBM J. Res. & Dev., Vol. 49, No. 4/5, July/September 2005, pp. 777–803), but the relation was described in 1971 by Landman and Russo.[1] They performed a hierarchical circuit partitioning in such a way that at each hierarchical level (top-down) the least number of interconnections had to be cut to partition the circuit (into more or less equal parts). At each partitioning step they noted the number of terminals and the number of components in each partition, and then partitioned the sub-partitions further. They found that the power-law rule applied to the resulting T versus g plot and named it "Rent's rule". Rent's rule is an empirical result based on observations of existing designs, and it is therefore less applicable to the analysis of non-traditional circuit architectures. However, it provides a useful framework with which to compare similar architectures.
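
As a worked example of the power-law relation T = t g^p quoted above, this short script estimates terminal counts for a few block sizes; the constants t = 2.5 and p = 0.6 are illustrative assumptions chosen inside the typical ranges mentioned, not measured values.

```python
# Rent's rule: T = t * g**p, with illustrative constants (t and p are assumptions
# chosen inside the typical ranges quoted above, not measurements).
t, p = 2.5, 0.6

for g in (100, 1_000, 10_000, 100_000):            # internal components (gates)
    T = t * g ** p
    print(f"{g:>7} gates -> ~{T:,.0f} terminals")
# Terminal count grows much more slowly than gate count (p < 1), which is the
# empirical reason logic blocks end up input/output limited relative to the
# amount of logic they contain.
```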

8 Example: Register Rich, Logic Lean
Adjust fan-in to reduce logic levels
Adjust fan-out to reduce routing delay
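
The two rules of thumb on this slide can be put into rough numbers. The helpers below are a back-of-the-envelope sketch under my own simplifying assumptions (logic depth grows roughly with log base k of the fan-in for k-input LUTs, and a register driving more than some threshold of loads is a candidate for duplication); they are not vendor timing or replication rules.

```python
import math

# Back-of-the-envelope helpers for the two rules of thumb above.
# Illustrative simplifications, not vendor timing or replication models.

def logic_levels(fan_in: int, lut_inputs: int = 6) -> int:
    """Rough number of LUT levels needed to reduce fan_in inputs to one output."""
    return max(1, math.ceil(math.log(fan_in, lut_inputs)))

def duplicates_needed(fan_out: int, max_loads: int = 16) -> int:
    """How many copies of a register to instantiate if each copy drives max_loads."""
    return math.ceil(fan_out / max_loads)

print(logic_levels(64))        # a 64-input function in 6-input LUTs -> 3 levels
print(duplicates_needed(40))   # 40 loads at 16 loads per copy -> 3 copies
```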

9 FPGA Fabric- What’s Inefficient
Large Memory Blocks
Multiplication, Division, Floating Point Operations
Standard Interfaces (PCIe, Ethernet, etc.)
Priority Encoders
Register Lean, Many Inputs, Many Outputs

10 Architecture for Fixed Blocks
Xilinx Series 7 Example: previous approach vs. the new ASMBL approach (column based)
Can create families: Artix (Cost Sensitive), Kintex (Efficient), Virtex (Performance and Capacity)
Xilinx created the Advanced Silicon Modular Block (ASMBL) architecture to enable FPGA platforms with varying feature mixes optimized for different application domains. Through this innovation, Xilinx offers a greater selection of devices, enabling customers to select the FPGA with the right mix of features and capabilities for their specific design. Figure 2-1 in the Xilinx documentation provides a high-level description of the different types of column-based resources.

11 Xilinx Series 7 Block RAM
Same as the Virtex-6 SRAM block
36K/18K blocks, configurable from 32K x 1 to 512 x 72
Simple dual-port and true dual-port modes
Built-in FIFO
64-bit ECC per block
Adjacent blocks combine to 64K x 1 without using fabric
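
As a rough behavioral picture of the features listed above, here is a hypothetical Python model of a simple dual-port memory with a selectable aspect ratio and a registered read; it only illustrates the idea and ignores ECC, the FIFO mode, block cascading, and timing.

```python
# Behavioral sketch of a simple dual-port block RAM with a selectable aspect
# ratio (depth x width). Illustrative only: no ECC, FIFO logic, or timing.

class SimpleDualPortRAM:
    def __init__(self, depth: int, width: int):
        self.depth, self.width = depth, width
        self.mem = [0] * depth
        self._read_reg = 0                      # registered (synchronous) read

    def write(self, addr: int, data: int):      # port A: write only
        self.mem[addr % self.depth] = data & ((1 << self.width) - 1)

    def read_clock(self, addr: int) -> int:     # port B: synchronous read
        value = self._read_reg                  # data appears one "clock" later
        self._read_reg = self.mem[addr % self.depth]
        return value

# Example: a 512 x 72 configuration (one of the aspect ratios listed above).
ram = SimpleDualPortRAM(depth=512, width=72)
ram.write(3, 0xABCDEF)
ram.read_clock(3)               # first read returns the stale pipeline register
print(hex(ram.read_clock(3)))   # now returns 0xabcdef
```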

12 Xilinx Series 7 DSP Block
25 x 18 Multiplier
25-bit Pre-adder
Pipeline, Cascade and Carry
96-bit MAC
SIMD Support
48-bit ALU
Pattern Detect
17-bit Shifter
Dynamic Operation (cycle by cycle)
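
To show how the listed pieces fit together, here is a loose model of one multiply-accumulate step using the pre-adder, multiplier, and accumulator from the bullet list; the (d + a) x b dataflow, the port names, and the wrap-at-48-bits behavior are illustrative assumptions, not the exact Xilinx slice definition.

```python
# Loose model of a DSP-slice multiply-accumulate with pre-adder:
#   acc <= acc + (d + a) * b, truncated to the accumulator width.
# Widths follow the slide's bullet list; port names and wrap behavior are
# assumptions for illustration, not the exact Xilinx slice definition.

ACC_BITS = 48
ACC_MASK = (1 << ACC_BITS) - 1

def mac_step(acc: int, a: int, d: int, b: int) -> int:
    pre = d + a                          # pre-adder path
    product = pre * b                    # 25 x 18-style multiply
    return (acc + product) & ACC_MASK    # accumulate, wrap at 48 bits

acc = 0
for a, d, b in [(3, 1, 10), (5, 2, -4)]:
    acc = mac_step(acc, a, d, b)
print(acc)                               # 12
```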

13 Altera Arria 10 FPGAs
Arria 10: column based
Core Logic Fabric
DSP Blocks
Memory Blocks
Memory Controllers
PCIe Core
Transceiver PCS
Clocking

14 Altera Arria 10 Logic Module
Adaptive LUT: 8 inputs, 8 outputs
Full adders, registers, carry in/out
The adaptive LUT is 'fracturable': made up of smaller LUTs connected with muxes

15 Altera Arria 10 Logic Module
Two 4-input LUTs and four 3-input LUTs
Muxes to combine them in multiple ways
Shared inputs (Dabcd, Def) and separate inputs
Control signals

16 Altera Arria 10 Logic Module
Combinations: dual 4-input LUTs; 5-input and 3-input; 5-input and 4-input; 5-input and 5-input; 6-input; 6-input and 6-input; cascaded 4-input and 3-input
Software impact, performance impact
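
The 'fracturable' idea behind these combinations can be illustrated with Shannon expansion: two smaller LUTs plus a 2:1 mux on one extra input behave like a single larger LUT. The sketch below shows that general principle only; it is not Altera's exact ALM mux network.

```python
# Generic illustration of LUT fracturing via Shannon expansion:
# a (k+1)-input function f(s, x1..xk) can be built from two k-input LUTs
# (one for s=0, one for s=1) combined by a 2:1 mux selected by s.
# This shows the principle only; it is not Altera's exact ALM mux network.

def fractured_lut(lut_low, lut_high):
    """Combine two k-input LUT functions into one (k+1)-input function."""
    def f(select, *inputs):
        return lut_high(*inputs) if select else lut_low(*inputs)
    return f

# Two 4-input "LUTs": AND of four inputs, and OR of four inputs.
and4 = lambda a, b, c, d: a & b & c & d
or4  = lambda a, b, c, d: a | b | c | d

# A 5-input function: select=0 -> AND4, select=1 -> OR4.
f5 = fractured_lut(and4, or4)
assert f5(0, 1, 1, 1, 1) == 1 and f5(1, 0, 0, 1, 0) == 1
```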

17 Altera Arria 10 Interconnect
Row, Column, and Local interconnect
Variable Speed and Length
Block and IO Connects
ALM, LAB, MLAB
Carry Chains
Control Signals and Clocks

18 Arria 10 Block RAM

19 Altera DSP Block
Features for floating-point arithmetic:
• Multiplication, addition, subtraction, multiply-add, and multiply-subtract
• Multiplication with accumulation capability and a dynamic accumulator reset control
• Multiplication with cascade summation capability
• Multiplication with cascade subtraction capability
• Complex multiplication
• Direct vector dot product
• Systolic FIR filter
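
Several of the floating-point modes above (multiply-add, accumulation, vector dot product) reduce to the same fused multiply-add pattern. The snippet below demonstrates that pattern with ordinary Python floats as a conceptual stand-in; the hardened block itself operates in IEEE single precision.

```python
# Conceptual stand-in for the hardened floating-point modes listed above:
# multiply-add, multiply-accumulate, and a direct vector dot product all
# reduce to the same a*b + c pattern chained across elements.
# Plain Python floats are used here; the block works in IEEE single precision.

def multiply_add(a: float, b: float, c: float) -> float:
    return a * b + c

def dot_product(xs, ys) -> float:
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = multiply_add(x, y, acc)     # chained multiply-accumulate
    return acc

print(dot_product([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))   # 32.0
```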

20 Altera DSP Block
Features for fixed-point arithmetic:
• High-performance, power-optimized, and fully registered multiplication operations
• 18-bit and 27-bit word lengths
• Two 18 x 19 multipliers or one 27 x 27 multiplier per DSP block
• Built-in addition, subtraction, and 64-bit double accumulation register to combine multiplication results
• Cascading 19-bit or 27-bit when the pre-adder is disabled, and cascading 18-bit when the pre-adder is used, to form the tap-delay line for filtering applications
• Cascading 64-bit output bus to propagate output results from one block to the next block without external logic support
• Hard pre-adder supported in 19-bit and 27-bit modes for symmetric filters
• Internal coefficient register bank in both 18-bit and 27-bit modes for filter implementation
• 18-bit and 27-bit systolic finite impulse response (FIR) filters with distributed output adder
• Biased rounding support
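
As a rough picture of the fixed-point path, the sketch below models one block performing two 18 x 19 multiplies whose sum feeds a 64-bit accumulation register; the two's-complement clamping and wrap-at-64-bits details are illustrative assumptions, not the exact Arria 10 behavior.

```python
# Rough fixed-point model of one DSP block: two 18 x 19 multiplies, summed,
# feeding a 64-bit accumulation register. Bit widths follow the feature list;
# the clamp/wrap details are illustrative assumptions, not the exact hardware.

ACC_MASK = (1 << 64) - 1

def clamp(value: int, bits: int) -> int:
    """Wrap a signed value into a 'bits'-wide two's-complement range."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= (1 << (bits - 1)) else value

def dsp_block_step(acc: int, a0: int, b0: int, a1: int, b1: int) -> int:
    p0 = clamp(a0, 18) * clamp(b0, 19)       # first 18 x 19 multiplier
    p1 = clamp(a1, 18) * clamp(b1, 19)       # second 18 x 19 multiplier
    return (acc + p0 + p1) & ACC_MASK        # 64-bit accumulation register

acc = 0
acc = dsp_block_step(acc, 1000, 2000, -3, 7)
print(acc)                                   # 1999979
```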

21 Conclusion: Fixed Functions, Architecture, Interconnect, Software

22 Additional Resources
Max Maxfield: Bebop to the Boolean Boogie
What is Programmable Logic?
Programmable Logic Wikibook (work in progress; want to help?)
Altera, Lattice, Microsemi, and Xilinx web sites for data sheets, user manuals, and software downloads

23 This Week's Agenda
1/12/15 An Introduction to Programmable Logic
1/13/15 Switches and Logic
1/14/15 Specialized Functions
1/15/15 Adding Processors
1/16/15 Software Tools

