Commercial FPGAs: Altera Stratix Family Dr. Philip Brisk Department of Computer Science and Engineering University of California, Riverside CS 223.

Notes on These Slides
Altera has disclosed the details of its devices in both online documentation and academic papers
The academic papers evaluate different design decisions and tradeoffs; the experiments are too specialized for this course
– Please do not overemphasize the experimentation in your studies

The Stratix™ Routing and Logic Architecture
D.M. Lewis, et al., International Symposium on FPGAs, 2003
Online documentation

Altera Stratix FPGA

Stratix Logic Element (LE)

Register Feedback Mode

Register Cascade (Shift Regs.)

Logic Array Block (LAB)

Directionally Biased Routing
Long vertical wires require power drivers
– Fewer vertical wires
More rows than columns
– More demand for horizontal wires

The Stratix II Logic and Routing Architecture
D.M. Lewis, et al., International Symposium on FPGAs, 2005
Online documentation

Logic Array Block (LAB)

Adaptive Logic Module (ALM)

Four ALM Operating Modes
– Normal Mode
– Extended LUT Mode
– Arithmetic Mode
– Shared Arithmetic Mode

Normal Mode

LUT Input Utilization

Extended LUT Mode Some 7-input logic functions

Arithmetic Mode

Arithmetic Mode Example
R = (X < Y) ? Y : X
(X < Y)
– Compute X - Y using the carry chain
– Only look at the carry output
Use the carry output to select either X or Y accordingly
– Configure the LUTs to pass X through unmodified, and ignore the carry chain outputs
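
The comparison step above can be illustrated in software. This is a minimal sketch, not a model of the actual ALM circuit: it assumes unsigned operands of a fixed width, and the function name `carry_chain_max` and the `width` parameter are made up for illustration. It forms X - Y as X + ~Y + 1, discards the sum bits, and uses only the carry-out to choose the result.

```python
def carry_chain_max(x: int, y: int, width: int = 8) -> int:
    """Return R = (X < Y) ? Y : X for unsigned width-bit values.

    Hypothetical helper: only the carry-out of X - Y is inspected,
    mirroring the slide's "only look at the carry output".
    """
    mask = (1 << width) - 1
    x &= mask
    y &= mask
    # X - Y as an adder would compute it: X + (bitwise NOT Y) + 1.
    total = x + (~y & mask) + 1
    # Carry-out of 1 means no borrow occurred, i.e. X >= Y.
    carry_out = (total >> width) & 1
    # The sum bits in `total` are ignored; the carry-out is the select.
    return x if carry_out else y


if __name__ == "__main__":
    for x, y in [(5, 3), (3, 5), (4, 4), (0, 255)]:
        print(x, y, carry_chain_max(x, y))
```

For signed operands the select condition would also need the sign bits, which this sketch leaves out.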

Shared Arithmetic Mode (3-input Add)
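
A three-input add is commonly built by first compressing the three operands into a sum word and a carry word, then adding those two words on the carry chain. The sketch below shows that arithmetic identity only; the assumption that the LUTs produce the per-bit sum and carry is ours, and the name `three_input_add` is hypothetical.

```python
def three_input_add(a: int, b: int, c: int, width: int = 8) -> int:
    """Add three unsigned width-bit values via a 3:2 compression step."""
    mask = (1 << width) - 1
    a, b, c = a & mask, b & mask, c & mask
    # Per-bit sum of the three inputs (what a LUT could compute per bit).
    sum_bits = a ^ b ^ c
    # Per-bit carry: the majority of the three input bits.
    carry_bits = (a & b) | (a & c) | (b & c)
    # A single two-input addition finishes the job; the carry word
    # has twice the weight of the sum word.
    return sum_bits + (carry_bits << 1)


if __name__ == "__main__":
    print(three_input_add(7, 9, 20))
```

The compression step has no carry propagation, so only the final two-input addition needs the carry chain.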

Register Chain (Shift Registers)
Separates logic and shift register functions
– Cycle 1: Combinational logic
– Cycles 2..k+1: Shift by k
…

ALM Benefits
Reduced LAB area by 2.6% compared to Stratix
15% performance improvement
When shrinking from a 0.13 µm (Stratix) to a 90 nm (Stratix II) technology node
– 51% performance improvement
– 50% area decrease

TriMatrix Embedded Memories

M512 RAM Block
576 RAM bits (32 x 18), includes parity bits
Functions
– 1-port RAM
– 2-port RAM
– FIFO
– ROM
– Shift Register

M4K RAM Block
4,608 RAM bits (128 x 36), includes parity bits
Functions
– 1-port RAM
– 2-port RAM
– True 2-port RAM
– FIFO
– ROM
– Shift Register

M-RAM Block
589,824 RAM bits (4K x 144), includes parity bits
Functions
– 1-port RAM
– 2-port RAM
– True 2-port RAM
– FIFO
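
The bit counts quoted for the three block types follow directly from the depth and width given in parentheses. The short check below reproduces them. The split into data and parity bits assumes one parity bit per eight data bits (nine-bit groups), which is our assumption and is not stated on the slides; the names `BLOCKS`, `total_bits`, and `data_bits` are illustrative.

```python
# (depth in words, width in bits) as given on the slides; 4K = 4096.
BLOCKS = {
    "M512": (32, 18),
    "M4K": (128, 36),
    "M-RAM": (4096, 144),
}


def total_bits(depth: int, width: int) -> int:
    """Total RAM bits, parity included."""
    return depth * width


def data_bits(depth: int, width: int) -> int:
    """Data bits only, ASSUMING 1 parity bit per 9-bit group."""
    return depth * width * 8 // 9


if __name__ == "__main__":
    for name, (depth, width) in BLOCKS.items():
        print(name, total_bits(depth, width), data_bits(depth, width))
```

Under that assumption the data capacities come out to 512, 4,096, and 524,288 bits, which matches the M512 and M4K block names.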

MRAM LAB Interface

DSP Blocks
– Eight 9x9 multipliers
– Four 18x18 multipliers
– One 36x36 multiplier
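
The ratio of four 18x18 multipliers to one 36x36 multiplier matches the usual way a wide product is assembled from half-width partial products. The sketch below shows that decomposition for unsigned operands. It illustrates the arithmetic only and is not a description of the block's internal wiring; `mul18` and `mul36` are hypothetical names.

```python
HALF = 18
HALF_MASK = (1 << HALF) - 1


def mul18(a: int, b: int) -> int:
    """Stand-in for one unsigned 18x18 multiplier."""
    assert 0 <= a <= HALF_MASK and 0 <= b <= HALF_MASK
    return a * b


def mul36(x: int, y: int) -> int:
    """Unsigned 36x36 product built from four 18x18 partial products."""
    assert 0 <= x < (1 << 36) and 0 <= y < (1 << 36)
    x_lo, x_hi = x & HALF_MASK, x >> HALF
    y_lo, y_hi = y & HALF_MASK, y >> HALF
    # x*y = hi*hi*2^36 + (hi*lo + lo*hi)*2^18 + lo*lo
    return (
        (mul18(x_hi, y_hi) << (2 * HALF))
        + ((mul18(x_hi, y_lo) + mul18(x_lo, y_hi)) << HALF)
        + mul18(x_lo, y_lo)
    )


if __name__ == "__main__":
    print(mul36(123456789, 987654321) == 123456789 * 987654321)
```

Signed operands need sign handling on the high halves, which this unsigned sketch omits.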

DSP Block Internals
Add/Sub/Accum
Functions
– Multiplier
– Multiply-Accum
– AB + CD
– AB + CD + EF + GH

DSP Block Interconnect Interface

Architectural Enhancements in Stratix-III™ and Stratix-IV™
D.M. Lewis, et al., International Symposium on FPGAs, 2009
Online documentation (Stratix III)
Online documentation (Stratix IV)

New Features
– Programmable power management
– LUT-RAM
– LUT-Register Mode
– Enhanced DSP Block

Programmable Body Bias Control
Large regions
– Less body bias control circuitry
Small regions
– Fine-grained power management

Power Efficiency

LUT-RAM
[Figure labels: SRAM, x, y]
Idea
– Use the SRAM bits as memory
– Granularity is LAB-wide
What is needed?
– Write capability
– Signals for address and data for the write path

LUT-RAM Architecture Supports one read + one write in a single cycle
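
The idea of reusing a LUT's configuration bits as a small memory can be modelled behaviourally. In the sketch below, the class `LutRam`, its `cycle` method, and the one-bit data width are all illustrative assumptions. It also assumes that a read and a write to the same address in the same cycle return the old value, which the slides do not specify.

```python
from typing import Optional


class LutRam:
    """Behavioural model of LUT SRAM bits used as a 1-bit-wide memory."""

    def __init__(self, address_bits: int = 2) -> None:
        # A k-input LUT holds 2**k SRAM bits; here they start at zero.
        self.cells = [0] * (1 << address_bits)

    def cycle(self, read_addr: int,
              write_addr: Optional[int] = None,
              write_data: int = 0) -> int:
        """Perform one read and, optionally, one write in a single cycle."""
        # The read path is the ordinary LUT lookup: the address plays
        # the role of the LUT inputs.
        value = self.cells[read_addr]
        # The write path is the addition: separate address and data signals.
        if write_addr is not None:
            self.cells[write_addr] = write_data & 1
        return value


if __name__ == "__main__":
    ram = LutRam(address_bits=2)
    print(ram.cycle(0, write_addr=0, write_data=1))  # old contents
    print(ram.cycle(0))                              # value just written
```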

MLAB vs. LAB

ALM LUT-Register Mode

ALM LUT-Register Mode

DSP Block Capabilities
– High-performance, power-optimized, fully registered and pipelined multiplication operations
– Natively supported 9-bit, 12-bit, 18-bit, and 36-bit wordlengths
– Natively supported 18-bit complex multiplications
– Efficiently supported floating-point arithmetic formats (24-bit for single precision and 53-bit for double precision)
– Signed and unsigned input support
– Built-in addition, subtraction, and accumulation units to combine multiplication results efficiently
– Cascading 18-bit input bus to form tap-delay line for filtering applications
– Cascading 44-bit output bus to propagate output results from one block to the next block without external logic support
– Rich and flexible arithmetic rounding and saturation units
– Efficient barrel shifter support
– Loopback capability to support adaptive filtering

DSP Block Overview

Multiply-Add

4-Multiply Add w/Accumulation

Cascading Output for FIR Filters
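
A FIR filter fits this cascade structure because each stage multiplies one delayed sample by one coefficient and adds the product to the running sum passed in from the previous stage. The sketch below is a plain software model of that data flow, with a shared tap-delay line and a chained sum. It does not model bus widths, pipelining, or rounding, and the name `fir_cascade` is hypothetical.

```python
from typing import List, Sequence


def fir_cascade(samples: Sequence[int], coeffs: Sequence[int]) -> List[int]:
    """Direct-form FIR: one multiply per tap, sums chained stage to stage."""
    delay_line = [0] * len(coeffs)  # tap-delay line, newest sample first
    outputs = []
    for sample in samples:
        # Shift the new sample into the delay line.
        delay_line = [sample] + delay_line[:-1]
        cascade_sum = 0
        for tap, coeff in zip(delay_line, coeffs):
            # Each stage adds its product to the sum from the prior stage.
            cascade_sum = cascade_sum + tap * coeff
        outputs.append(cascade_sum)
    return outputs


if __name__ == "__main__":
    # An impulse input reproduces the coefficients at the output.
    print(fir_cascade([1, 0, 0, 0], [1, 2, 3]))
```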

Full DSP Block

Half-DSP Block Architecture

Four 9-bit Independent Half-DSP Multiplier Mode

Three 12-bit Independent Half-DSP Multiplier Mode

Two 18-bit Independent Half-DSP Multiplier Mode

36-bit Half-DSP Multiplier Mode

54x54-bit Multiplier Mode Used for double-precision floating-point

Architectural Enhancements in Stratix-V™
D.M. Lewis, et al., International Symposium on FPGAs, 2013
Online documentation

Larger MLAB/LUT-RAM

4 Flip-Flops per ALM

Embedded Memories with Error Correction Codes (ECC)