MIT Lincoln Laboratory 999999-1 XYZ 3/11/2005
Hardware Based Floating Point Processing

Presentation transcript:

Hardware Based Floating Point Processing

All the ingredients for FPGA based floating point
  – 28 nm Variable Precision DSP Block
      28 nm silicon enhancements for floating point
  – IP: Fused Datapath: efficient floating point in Altera FPGAs
  – Tools: DSP Builder Advanced Blockset
      Matlab/Simulink native support is floating point
Cholesky decomposition / matrix inversion design
  – Programmable in both matrix size and vector size
      Ex: 20x20 with vector size of 10, 240x240 with vector size of 60
  – Up to 1 million matrices per second in a single core for time-interleaved 20x20 matrices; multiple cores possible in a single device
Floating point roadmap
  – Silicon / Tools / IP
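As a software illustration of the Cholesky decomposition / matrix inversion design named on this slide, here is a minimal pure-Python sketch. It is not the FPGA implementation. The function names (`chunked_dot`, `cholesky`, `invert_lower`, `cholesky_inverse`) are hypothetical, and the reading of "vector size" as the number of elements a dot-product unit consumes per step is an assumption, not something the slide states.

```python
import math


def chunked_dot(x, y, vector_size):
    """Dot product accumulated in chunks of `vector_size` elements.

    Assumption: this mimics a dot-product unit of fixed width.
    """
    total = 0.0
    for start in range(0, len(x), vector_size):
        xs = x[start:start + vector_size]
        ys = y[start:start + vector_size]
        total += sum(a * b for a, b in zip(xs, ys))
    return total


def cholesky(A, vector_size):
    """Return lower-triangular L with A = L * L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = A[j][j] - chunked_dot(L[j][:j], L[j][:j], vector_size)
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = chunked_dot(L[i][:j], L[j][:j], vector_size)
            L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def invert_lower(L, vector_size):
    """Invert a lower-triangular matrix by forward substitution."""
    n = len(L)
    M = [[0.0] * n for _ in range(n)]
    for j in range(n):
        M[j][j] = 1.0 / L[j][j]
        for i in range(j + 1, n):
            col = [M[k][j] for k in range(j, i)]
            M[i][j] = -chunked_dot(L[i][j:i], col, vector_size) / L[i][i]
    return M


def cholesky_inverse(A, vector_size=2):
    """Invert A via A^-1 = (L^-1)^T * (L^-1)."""
    n = len(A)
    M = invert_lower(cholesky(A, vector_size), vector_size)
    cols = [[M[k][i] for k in range(n)] for i in range(n)]
    return [[chunked_dot(cols[i], cols[j], vector_size) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    A = [[4.0, 2.0], [2.0, 3.0]]
    print(cholesky_inverse(A, vector_size=2))
```

Every inner loop reduces to a dot product, which is why a design of this kind can be parameterized by both matrix size and dot-product width. Changing `vector_size` here alters only how the sums are grouped, not the mathematical result.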

Single Channel Results: Single Core

Single Channel Cholesky Inversion Core
Columns: Matrix Size | Vector Size | Dot Product Utilization | Throughput (matrices/sec) | Latency (us)
[Table values not recoverable from the transcript]

Multiple single precision Cholesky cores may be implemented within a single FPGA

Multi-channel Results: Single Core

Multi-Channel Cholesky Inversion Core
Columns: Matrix Size | Vector Size | Number of channels | Dot Product Utilization | Throughput (matrices/sec) | Latency (us)
[Most table values not recoverable from the transcript; the last row shows a throughput of 1,000,000 matrices/sec and a latency of 96 us]

Multiple single precision Cholesky cores may be implemented within a single FPGA

Cholesky Throughput per FPGA device

Cholesky Throughput per Stratix V FPGA
Device: Stratix V 5SGSD8 (single die, 700 kLE, 4000 mults)

Matrix Size | Vector Size | Number of cores per FPGA | Throughput per core (matrices/sec) | Throughput per FPGA device (matrices/sec)
20x20 | 20 | 18 | 1M | 18 Million
240x… | … | … | …,000 | 18 Thousand

Multiple Cholesky cores implemented within a single FPGA
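The per-device figure for the 20x20 case appears to be the per-core throughput multiplied by the number of cores. The short sketch below checks that arithmetic. It assumes the flattened row reads as 18 cores at 1M matrices/sec each, and that throughput scales linearly with core count. The function name `device_throughput` is hypothetical.

```python
def device_throughput(cores_per_fpga, throughput_per_core):
    """Aggregate matrices/sec, assuming cores run independently in parallel."""
    return cores_per_fpga * throughput_per_core


if __name__ == "__main__":
    # 20x20 case as read from the table: 18 cores, 1M matrices/sec per core.
    print(device_throughput(18, 1_000_000))
```

Under those assumptions the result matches the 18 Million matrices/sec listed for the device.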