By: Daniel Barsky, Natalie Pistunovich Supervisors: Rolf Hilgendorf, Ina Rivkin Characterization Sub Nyquist Implementation Optimization 11/04/2010.

Presentation transcript:


Agenda: Project Overview; Hardware Features; Project Goals & Agenda; Design Overview – End to End; Expander Module; CTF Module; DSP & SCD Modules; Memory Module; Debug Module; Gantt Chart

Project Overview: This project is part of the Sub-Nyquist Sampling & Reconstruction card. The design is to be implemented on a card consisting of 4 Altera Stratix-III FPGAs and a set of DDR memories. The currently suggested implementation requires significant resources and occupies 3 FPGAs. The design consists of 5 separate blocks, designed by 8 groups. Data is represented in 18-bit fixed point with a 16-bit fraction.

Hardware Features

Hardware Features (cont.)

HIDDEN (block diagram). Analog Back-End: Analog System + A/D. Expand Sequences 4:12: a 4:12 expander that also sends slices of 2 MHz. Memory: stores Y so that Z = A†Y can be computed later. Support Change Detector: detects a change in the Y's. DSP: reconstruction unit, computes A†. CTF: 1) builds Q, 2) computes an energy vector U from Q. Controller. Signal labels: Samples Bundle; Support; u; y; 288 bits; Z; 12bits 20MHz; 12bits 10 x 2MHz; Request for iteration; If change.

Project Goals & Agenda: Reducing the design to 2 FPGAs, at the expense of latency. Studying each group's activity: algorithms, implementations, resources utilized, etc. Pointing out possible efficiency improvements: resources that can be reused, implementations that exceed requirements, and hardware idleness. Implementing the improvements. Ultimately, suggesting the optimal architecture to be implemented in an ASIC.

Design Overview – End to End (block diagram with the responsible groups). Blocks: Analog Back-End (Analog System + A/D); Expand Sequences 4:12 (Morad, Amir); Memory; CTF; Support Change Detector (Omer, Daniel); DSP (Omer, Daniel); Controller. Further labels on the diagram: Architecture; Eli, Tzvika; Yoni. Signals: Samples Bundle; Support; A†.

Expander Module Description: In Normal Operation Mode: receives 4 channels at 60 MHz, expands each into 3 slices of 20 MHz (12 channels in total), and sends them to the Memory block (for later reconstruction) as well as to the CTF and the Support Change Detector. In Iteration Mode: creates 10 slices of 2 MHz out of each 20 MHz slice and sends them to the CTF block for support calculation in iterations, a different slice each cycle: 12 slices per iteration, 10 iterations required.

Expander Module Algorithm: Modulate (if needed): multiply by sine/cosine coefficients. LPF: a polyphase FIR filter with a Kaiser window, 240 taps. FIR filters are used for their stability and linear phase. Polyphase filters are used for efficient filtering and decimation with minimal resources (multipliers).
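The saving from a polyphase structure can be illustrated with a minimal sketch. It is not the project's implementation: the function names, tap values and input data are hypothetical, and the filter is a generic FIR rather than the 240-tap Kaiser design. It only shows that decimating by m with m sub-filters gives the same output as filtering at the full rate and discarding samples, while doing the multiplications only for outputs that are kept.

```python
def fir_decimate_direct(x, h, m):
    """Filter x with taps h at the input rate, then keep every m-th output."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(len(h)):
            if 0 <= n - k < len(x):
                acc += h[k] * x[n - k]
        y.append(acc)
    return y[::m]


def fir_decimate_polyphase(x, h, m):
    """Same result, computed only at the kept output instants.

    The taps are split into m sub-filters h[p::m]; sub-filter p sees the
    input delayed by p samples and runs at the output rate.
    """
    phases = [h[p::m] for p in range(m)]
    out = []
    for n in range(0, len(x), m):
        acc = 0.0
        for p, sub in enumerate(phases):
            for j, coeff in enumerate(sub):
                idx = n - p - j * m  # h[p + j*m] multiplies x[n - p - j*m]
                if 0 <= idx < len(x):
                    acc += coeff * x[idx]
        out.append(acc)
    return out


# Hypothetical data: decimate by 3, as in the 60 MHz to 20 MHz stage.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
h = [1, -2, 3, 0, 5, 1, 2]
direct = fir_decimate_direct(x, h, 3)
poly = fir_decimate_polyphase(x, h, 3)
```

Per input sample the polyphase form needs len(h)/m multiplications instead of len(h), which is where a term such as 240/3 in a multiplier count comes from.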

Expander Module (cont.) Total Resource Utilization: 4·8 = 32 18x18 multipliers at the modulators; 4·3·240/3 = 960 18x18 multipliers at the 60 MHz → 20 MHz filters; 4·3·2·400/10 = 960 18x18 multipliers at the 20 MHz → 2 MHz filters. Total: 32 + 960 + 960 = 1952 multipliers. There are 448 multipliers per FPGA!
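The multiplier budget on this slide can be re-derived with a few lines of arithmetic. The variable names are illustrative; the factors are the ones given on the slide, and the last line simply compares the total with the 448 multipliers quoted per FPGA.

```python
from math import ceil

# Factors as given on the slide.
modulators = 4 * 8                      # modulator multipliers
stage_60_to_20 = 4 * 3 * 240 // 3       # 240-tap polyphase filters, decimation by 3
stage_20_to_2 = 4 * 3 * 2 * 400 // 10   # 400-tap polyphase filters, decimation by 10

total = modulators + stage_60_to_20 + stage_20_to_2
multipliers_per_fpga = 448

# Number of FPGAs' worth of multipliers the expander alone would consume.
fpgas_worth = ceil(total / multipliers_per_fpga)
print(total, round(total / multipliers_per_fpga, 2), fpgas_worth)
```

The ratio is about 4.36, so the expander as specified needs more multipliers than four devices provide, which motivates the improvements that follow.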

Expander Module (cont.) Possible improvements: Reducing the number of filter taps by widening the transition band: π. Reducing the stop band ripple: -70dB. Operating at a higher clock frequency: with a clock several times faster than the sample rate, the same filter can be time-shared between several parallel channels.
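How the tap count reacts to the transition band and the stopband attenuation can be estimated with Kaiser's empirical formula, N ≈ (A − 7.95) / (2.285·Δω) + 1, where A is the attenuation in dB and Δω the transition width in radians per sample. The sketch below is an estimate under that formula only; the function name and the transition widths used in the example are hypothetical, since the slide's own values did not survive.

```python
from math import ceil, pi


def kaiser_taps(atten_db, transition_width_rad):
    """Estimated Kaiser-window FIR length for a given stopband attenuation
    (dB) and transition width (radians per sample)."""
    return ceil((atten_db - 7.95) / (2.285 * transition_width_rad)) + 1


# Hypothetical transition widths; -70 dB is the figure quoted on the slide.
narrow = kaiser_taps(70, 0.05 * pi)
wide = kaiser_taps(70, 0.10 * pi)
stricter = kaiser_taps(80, 0.05 * pi)
print(narrow, wide, stricter)
```

Doubling the transition width roughly halves the estimated length, and every extra dB of attenuation adds taps, so both knobs trade filter quality directly against multipliers.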

CTF – Q-Frame Description: Calculates Q frame out of y: Multiplies by Q is Hermitian:

CTF – Q-Frame Algorithm: For non-diagonal elements : Calculates 9 required products: Calculates elements of Q using the above products: For diagonal elements, using 6 products:
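The formulas on these two slides were images and are missing from the transcript. As a hedged sketch, assume the usual CTF frame construction, in which Q accumulates the outer products y[n]·y[n]^H over the sample vectors. Because such a Q is Hermitian, only the diagonal and the upper triangle need to be computed and the rest follows by conjugation. The function name and the test data are hypothetical.

```python
def q_frame(samples):
    """Accumulate Q = sum over n of y[n] * y[n]^H (assumed definition).

    samples: list of sample vectors y[n], each a list of m complex numbers.
    Only the diagonal and upper triangle are computed; the lower triangle
    is filled in from Hermitian symmetry, Q[i][j] = conj(Q[j][i]).
    """
    m = len(samples[0])
    q = [[0j] * m for _ in range(m)]
    for y in samples:
        for i in range(m):
            for j in range(i, m):
                q[i][j] += y[i] * y[j].conjugate()
    for i in range(m):
        for j in range(i):
            q[i][j] = q[j][i].conjugate()
    return q


# Hypothetical 3-channel example with two sample vectors.
Q = q_frame([[1 + 1j, 2, -1j], [0.5, 1j, 1]])
```

For m = 12 channels this leaves 12·13/2 = 78 distinct entries, the same number as the 42 + 36 multiplier units listed under resource utilization, although the slides do not state that mapping explicitly.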

CTF – Q-Frame (cont.) Total Resource Utilization: Total multiplier requirements: 42 basic multipliers and 36 two-multiply adders. (Figures: Basic Multiplier; Two-Multiply Adder.)

CTF – OMP Description: Receives the Q frame. Calculates U using Orthogonal Matching Pursuit (OMP). Gets the support from U.

CTF – OMP (cont.) Algorithm: A residue matrix R is loaded with Q. U is calculated using iterations as follows:
1. The matrix A is projected on the residue matrix R.
2. The energy of each row in the projection is calculated.
3. The row A_i with the maximum projection energy is added to the support.
4. An orthogonal vector is constructed from A_i using the Gram-Schmidt process.
5. The projection of R on the orthogonal vector is subtracted from R.
6. The energy of the residue matrix R is calculated.
7. If the energy of the residue is greater than a predefined threshold, continue to the next iteration.
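The iteration above can be sketched in plain Python. This is an illustration under assumptions, not the hardware design: the candidate vectors A_i are passed as a list called atoms, the residue is stored as a list of columns, and the names omp_support, threshold and max_iter are hypothetical. The exact projection formulas from the slide are not in the transcript, so the standard inner-product form is used.

```python
from math import sqrt


def inner(u, v):
    """Hermitian inner product <u, v> = sum(conj(u_k) * v_k)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def omp_support(atoms, residue_cols, threshold, max_iter):
    """Greedy support recovery following the steps listed above."""
    residue = [list(col) for col in residue_cols]
    support, basis = [], []
    for _ in range(max_iter):
        # Steps 1-2: project every candidate on the residue, measure energy.
        energy = [sum(abs(inner(a, col)) ** 2 for col in residue) for a in atoms]
        for i in support:
            energy[i] = -1.0  # never pick the same candidate twice
        # Step 3: the candidate with maximum projection energy joins the support.
        best = max(range(len(atoms)), key=energy.__getitem__)
        # Step 4: Gram-Schmidt against the vectors already chosen.
        q = list(atoms[best])
        for b in basis:
            c = inner(b, q)
            q = [x - c * y for x, y in zip(q, b)]
        norm = sqrt(sum(abs(x) ** 2 for x in q))
        if norm < 1e-12:
            break  # candidate adds nothing new
        q = [x / norm for x in q]
        support.append(best)
        basis.append(q)
        # Step 5: remove the component along q from every residue column.
        for col in residue:
            c = inner(q, col)
            for k in range(len(col)):
                col[k] -= c * q[k]
        # Steps 6-7: stop once the residue energy falls to the threshold.
        if sum(abs(x) ** 2 for col in residue for x in col) <= threshold:
            break
    return sorted(support)


# Hypothetical example: the data are combinations of candidates 1 and 2.
s = 1 / sqrt(2)
atoms = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [s, s, 0, 0]]
found = omp_support(atoms, [[0, 2, 1, 0], [0, 1, -3, 0]], 1e-9, 4)
```

Each pass costs one projection of all candidates plus one rank-one update of the residue, which is why the row-by-matrix multiplier dominates the resource count for this block.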

CTF – OMP (cont.) Total Resource Utilization: Row by matrix multiplier, x18 complex multipliers 18-bit, operations operation: 12 18x18 complex multipliers bit adders Total Hardware requirements approximation: 18x18 complex multipliers: 156

CTF Possible improvements: Increasing the clock frequency to speed up support calculation. Using fewer multipliers for the calculations at the cost of additional latency (pipelining). Sharing multipliers with the DSP pseudo-inverse block (the two never operate simultaneously).

DSP & SCD Description: Receives the support from the CTF. Calculates A†, the Moore-Penrose pseudo-inverse of A. Reconstructs the original signal. Detects a change in the support.

DSP Algorithm – Pseudo inverse & Reconstruction:
1. Receive the support S from the CTF block.
2. Create A_S from the columns of A that are in the support.
3. Decompose A_S into an orthogonal matrix Q and an upper-triangular matrix R using QR decomposition (computed using Householder reflections).
4. Invert R using the upper-triangular matrix inversion algorithm.
5. Calculate the pseudo-inverse by $A_S^{\dagger} = R^{-1} Q^{H}$.
6. Reconstruct z[n] by matrix multiplication: $z[n] = A_S^{\dagger}\, y[n]$.
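The chain from QR decomposition to reconstruction can be sketched as follows. One substitution is made for brevity and should be read as such: the QR step uses modified Gram-Schmidt instead of the Householder reflections named on the slide. Both give a factorization A_S = QR with R upper-triangular for a full-column-rank A_S. All function names and the example matrix are hypothetical.

```python
from math import sqrt


def qr_gram_schmidt(cols):
    """Thin QR of a matrix given as a list of n columns (modified
    Gram-Schmidt, used here in place of Householder reflections)."""
    n = len(cols)
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j, col in enumerate(cols):
        v = list(col)
        for i, q in enumerate(q_cols):
            r[i][j] = sum(a.conjugate() * b for a, b in zip(q, v))
            v = [b - r[i][j] * a for a, b in zip(q, v)]
        r[j][j] = sqrt(sum(abs(x) ** 2 for x in v))
        q_cols.append([x / r[j][j] for x in v])
    return q_cols, r


def invert_upper(r):
    """Invert an upper-triangular matrix by back substitution."""
    n = len(r)
    x = [[0.0] * n for _ in range(n)]
    for j in range(n):  # solve R x = e_j, one column at a time
        for i in range(j, -1, -1):
            rhs = (1.0 if i == j else 0.0) - sum(
                r[i][k] * x[k][j] for k in range(i + 1, j + 1))
            x[i][j] = rhs / r[i][i]
    return x


def pinv_qr(cols):
    """Pseudo-inverse R^-1 Q^H, returned as a list of n rows of length m."""
    q_cols, r = qr_gram_schmidt(cols)
    r_inv = invert_upper(r)
    n, m = len(cols), len(cols[0])
    return [[sum(r_inv[i][k] * q_cols[k][t].conjugate() for k in range(n))
             for t in range(m)] for i in range(n)]


def reconstruct(pinv, y):
    """z = pinv * y for one sample vector y."""
    return [sum(p * v for p, v in zip(row, y)) for row in pinv]


# Hypothetical 3x2 A_S and a sample vector y = 2*col0 - 1*col1.
a_s = [[1, 0, 0], [1, 1, 0]]
z = reconstruct(pinv_qr(a_s), [1, -1, 0])
```

Only the inversion of R and the product with Q^H are needed once per support. The per-sample work is the final matrix-vector product, which is the part that must run in real time.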

SCD Algorithm – Support Change Detection: Add an extra support to the matrix A_S. After the pseudo-inverse is computed, create a control vector. Multiply the control vector by 12 samples and sum up the result. If the energy level is high, a support change has occurred: instruct the CTF to calculate a new support. If the support has failed several times in Normal Operation Mode, instruct the CTF to switch to Iteration Mode. If the support has failed several times in Iteration Mode, indicate that there is a problem.
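The decision logic described here behaves like a small state machine, sketched below. The slide does not say how many failures count as "several" or what the energy threshold is, so max_failures and threshold are hypothetical parameters, as are the function name and the returned action strings.

```python
def scd_step(control, samples, threshold, state, max_failures=3):
    """One support-change check.

    control, samples: equal-length vectors (12 entries in the design).
    state: dict with keys "mode" ("normal" or "iteration") and "failures".
    Returns the action to take.
    """
    energy = abs(sum(c * s for c, s in zip(control, samples))) ** 2
    if energy <= threshold:
        state["failures"] = 0
        return "ok"
    state["failures"] += 1
    if state["failures"] < max_failures:
        return "recompute_support"
    if state["mode"] == "normal":
        state["mode"] = "iteration"
        state["failures"] = 0
        return "switch_to_iteration"
    return "error"


# Hypothetical run: the control vector picks up energy from the first sample.
state = {"mode": "normal", "failures": 0}
quiet = scd_step([1, 0], [0, 5], 1.0, state, max_failures=2)
actions = [scd_step([1, 0], [5, 0], 1.0, state, max_failures=2) for _ in range(4)]
```

Resetting the failure counter on a quiet check keeps isolated noise bursts from escalating to Iteration Mode.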

DSP & SCD (cont.) Total Resource Utilization: QR decomposition x18 Complex multipliers Matrix Pseudo-Inverse x18 Complex multipliers Matrix Multiplication – 24 18x18 Complex multipliers Sample Multiplication – 48 18x18 Complex multipliers

DSP & SCD (cont.) Possible Improvements: Increasing the clock frequency to speed up non-realtime calculations (pseudo-inverse, matrix multiplication). Using fewer multipliers for the calculations at the cost of additional latency (pipelining). Sharing multipliers with the CTF block (the two never operate simultaneously). Examining other decompositions (SVD, LQ, Cholesky, etc.).

Memory Description: The memory block is designed as a FIFO that stores the sampled channels. It delays the input long enough to calculate a new support and a new A†. Possible Improvements: If on-chip memory runs short, an external DDR memory chip can be considered.
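The FIFO size follows directly from the delay it has to cover. The sketch below assumes one 18-bit word per sample and 12 channels at 20 MHz, as described for the expander output. The 1 ms latency is purely hypothetical, because the transcript does not give the time needed to compute a new support and pseudo-inverse.

```python
def fifo_bits(channels, rate_mhz, bits_per_sample, latency_us):
    """Bits that must be buffered to delay all channels by latency_us.

    rate_mhz * latency_us is the number of samples per channel, since
    1 MHz over 1 microsecond is exactly one sample.
    """
    return channels * rate_mhz * bits_per_sample * latency_us


# Hypothetical latency of 1000 microseconds (1 ms).
needed = fifo_bits(12, 20, 18, 1000)
print(needed / 1e6, "Mbit")
```

Comparing this figure with the available on-chip RAM is what decides whether the external DDR option is needed.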

Debug Modules Description: Designed to debug each block of the design separately. Each consists of a signal generator for the input of the block and a FIFO memory to hold the output. Possible Improvements: If these modules are expensive in hardware, two firmware versions can be prepared: a compact version without the debug modules and a complete one with them.

Gantt Chart

Hidden

Thank You!