Published by Charity Patrick. Modified over 9 years ago.
1
Sub-Nyquist Sampling DSP & SCD Modules
Presented by: Omer Kiselov, Daniel Primor
Supervised by: Ina Rivkin, Moshe Mishali
Winter 2010
High Speed Digital Systems Lab, Electrical Engineering Faculty, Technion – Israel Institute of Technology
2
Outline Overview – Goals and discussion Algorithm review Implementation in hardware Changes for Adaptation to hardware Evaluation Possible Optimization & Future Work
3
Overview
The goal system
[Block diagram: Interface, Memory, CTF (Support recovery), DSP (Baseband), Analog Back-end (Realtime), Detector, Expand 1:q]
The module's objectives
[Block diagram: DSP & Support Change Detector, with the following signals]
– A matrix vector: 432 bits
– Support analysis vector: 101 bits
– First Beta (for QR decomposition): 36 bits
– Samples bundle: 432 bits
– Support changed: 1 bit
– Valid supports: 1 bit
– A matrix address: 9 bits
– Valid samples: 1 bit
4
Outline Overview – Goals and discussion Algorithm review Implementation in hardware Changes for Adaptation to hardware Evaluation Possible Optimization & Future Work
5
Algorithm Review Pseudo-Inverse – Matrix Decomposition – Matrix Inversion – Matrix Multiplication Support Change Detection – Support threshold evaluation attempt
6
Algorithm Review – Pseudo Inverse Matrix Decomposition QR Decomposition Using Householder Reflections
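A minimal software sketch of QR decomposition by Householder reflections may help when reading the hardware slides. It is not the authors' VHDL: the function name `householder_qr` and the floating-point arithmetic are assumptions made for illustration, and the scalar `beta = 2 / (v·v)` is the textbook reflection coefficient, which the slides' "Beta" signal presumably corresponds to.

```python
import math

def householder_qr(a):
    """Return (q, r) with a = q * r, q orthogonal (m x m), r upper triangular (m x n)."""
    m, n = len(a), len(a[0])
    r = [row[:] for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for k in range(min(m - 1, n)):
        # Column k from the diagonal down is the vector to be reflected.
        x = [r[i][k] for i in range(k, m)]
        norm = math.sqrt(sum(t * t for t in x))
        if norm == 0.0:
            continue  # nothing to eliminate in this column
        # Reflection vector v = x + sign(x0) * ||x|| * e1 (sign chosen to avoid cancellation).
        v = x[:]
        v[0] += math.copysign(norm, x[0])
        beta = 2.0 / sum(t * t for t in v)
        # r <- H r with H = I - beta * v * v^T, applied to rows k..m-1.
        for j in range(n):
            dot = sum(v[i] * r[k + i][j] for i in range(len(v)))
            for i in range(len(v)):
                r[k + i][j] -= beta * v[i] * dot
        # q <- q H, applied to columns k..m-1, so that q accumulates H_1 H_2 ...
        for i in range(m):
            dot = sum(q[i][k + j] * v[j] for j in range(len(v)))
            for j in range(len(v)):
                q[i][k + j] -= beta * dot * v[j]
    return q, r
```

Each iteration needs one square root (the norm) and one division (beta), which fits the later remark that the beta calculation is arithmetic-heavy.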
7
Algorithm Review – Pseudo Inverse
Matrix Inversion – Gaussian Elimination
Matrix Multiplication
[Block diagrams: Matrix Multiplier, Vector Multiplier, Matrix Multiplier's Common Interface]
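The remaining two steps can be sketched in software as follows, assuming A has full column rank so that the pseudo-inverse is R⁻¹Qᵀ restricted to the first n columns of Q. The names `invert_upper` and `pinv_from_qr` are hypothetical, and back substitution is used here as the form Gaussian elimination takes for an upper triangular matrix.

```python
def invert_upper(r):
    """Invert an n x n upper triangular matrix by back substitution, one column at a time."""
    n = len(r)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):                 # solve R x = e_j for column j of the inverse
        for i in range(j, -1, -1):     # entries below row j stay zero
            s = 1.0 if i == j else 0.0
            s -= sum(r[i][k] * inv[k][j] for k in range(i + 1, j + 1))
            inv[i][j] = s / r[i][i]
    return inv

def pinv_from_qr(q, r):
    """Pseudo-inverse of A = Q R (A is m x n, full column rank): R1^-1 * Q1^T."""
    m, n = len(q), len(r[0])
    r_inv = invert_upper([row[:n] for row in r[:n]])
    # Entry (i, j) is sum_k r_inv[i][k] * Q[j][k], i.e. R1^-1 times the transpose of Q's first n columns.
    return [[sum(r_inv[i][k] * q[j][k] for k in range(n)) for j in range(m)]
            for i in range(n)]
```

Column j of the inverse depends only on rows 0..j of R, so the columns are independent of each other and could be computed in any order.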
8
Algorithm Review - SCD
The support change detector is a vector multiplier: it multiplies one row of the pseudo-inverse of the A matrix by the signal to check whether that row carries any energy other than noise.
Threshold generation attempt:
– If there was no support change: (equation not preserved in the transcript)
– If we replace W with the average: (equation not preserved in the transcript)
– The generated value does not produce any false alarms, but it may miss detections in some cases where the SNR is low.
*Eventually the threshold was defined as a user input. Our estimated threshold for the AM demo is 000001000110010100 (~0.3).
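The slide does not state the fixed-point format of the 18-bit threshold word. As a rough check, the sketch below decodes it as an unsigned value under a few assumed fractional-bit counts. The helper name `decode_unsigned_fixed` and the candidate formats are assumptions, not something the slides confirm. With 14 fractional bits the word decodes to about 0.27, the candidate closest to the quoted ~0.3.

```python
THRESHOLD_BITS = "000001000110010100"  # 18-bit word quoted on the slide

def decode_unsigned_fixed(bits: str, frac_bits: int) -> float:
    """Interpret a binary string as an unsigned fixed-point number.

    frac_bits is an ASSUMPTION: the slides do not give the Q format.
    """
    return int(bits, 2) / (1 << frac_bits)

if __name__ == "__main__":
    for frac_bits in (13, 14, 15):
        print(frac_bits, decode_unsigned_fixed(THRESHOLD_BITS, frac_bits))
```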
9
DSP & SCD system operation
[Block diagram. Pseudo-inverse path: QR Decomposition (auxiliary multiplications, reflections creation, reflection multiplication) producing R and Q'; upper triangular matrix inverse producing R inversed; matrix multiplier producing A dagger. Real-time path: A Matrix RAM, Delay FIFO, Ping-Pong Buffer (RAM), Real Time Matrix-Samples Multiplier, Support Change Detector. Signals: CTF support indexes, A_s, control vector, samples from Expand, reconstructed signal, '1'.]
10
Outline Overview – Goals and discussion Algorithm review Implementation in hardware Changes for Adaptation to hardware Evaluation Possible Optimization & Future Work
11
Implementation In Hardware
Block (Entities) Definition – Pseudo Inverse
[Block diagram: QR Decomposition; Matrix Inversion (inverting an upper triangular matrix); Matrix Multiplier]
12
Implementation In Hardware
Block (Entities) Definition – Pseudo Inverse
[Block diagram of the QR Decomposition entity: Phase 1, Phase 2, Aux 2, 24 multipliers, Beta calculation unit]
13
Implementation In Hardware
Block (Entities) Definition – Pseudo Inverse
[Block diagram of the Matrix Inversion Unit: Vector Inversion Unit, Vector Inverter, FIFO for the original R matrix]
14
Implementation In Hardware
[Block diagram: Matrix Multiplier, RAM, SCD, Real Time Mult]
15
Outline - Adaptation to Hardware Overview – Goals and discussion Algorithm review Implementation in hardware Adaptation to hardware – Complex Enhance – Normalizing the Input – Resolution (Overflow) discussion – SCD – running average – Timing issues Evaluation Possible Optimization & Future Work
16
Complex Enhance
To avoid all complex multiplications, we changed the structure of the matrix. The resulting real matrix is 4 times bigger. Every complex vector multiplication can then be carried out as an ordinary real vector-by-vector multiplication and still give the correct result.
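One standard way to get this effect is the real block embedding, where a complex matrix with real part Ar and imaginary part Ai becomes the real matrix [[Ar, -Ai], [Ai, Ar]], which has twice the rows and twice the columns (4 times the entries). The sketch below assumes that layout; the slides do not show the exact arrangement the authors used, and the function names are hypothetical.

```python
def real_embedding(a):
    """Map a complex m x n matrix to the real 2m x 2n matrix [[Re, -Im], [Im, Re]]."""
    top = [[z.real for z in row] + [-z.imag for z in row] for row in a]
    bottom = [[z.imag for z in row] + [z.real for z in row] for row in a]
    return top + bottom

def stack_real_imag(x):
    """Map a complex vector to the real vector [Re(x); Im(x)]."""
    return [z.real for z in x] + [z.imag for z in x]

def matvec(m, v):
    """Ordinary real matrix-vector product, one dot product per row."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

if __name__ == "__main__":
    a = [[1 + 2j, 3 - 1j], [0 + 1j, 2 + 0j]]
    x = [1 - 1j, 2 + 3j]
    # The first half of the result holds the real parts of A x, the second half the imaginary parts.
    print(matvec(real_embedding(a), stack_real_imag(x)))
```

Only real multipliers and adders are needed, at the cost of storing each complex entry's real and imaginary parts twice.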
17
Normalizing the Input
Accuracy falls with a smaller mantissa.
Matrices can be normalized before the inverse and again after the inverse.
Motivation:
– The real data differed from the synthetic data we were given, so 18 bits are not enough (we need to represent both a number and 1 divided by that number).
– Normalizing the matrix lets us adjust the fraction to minimize error and underflow.
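The idea rests on the identity 1/x = (1/(x/s))/s: scale the input by s before inverting, then undo the scale afterwards. The sketch below illustrates it on a single value, under assumptions not taken from the slides: a simple round-to-nearest quantizer, 10 fractional bits, and a power-of-two scale.

```python
def quantize(x: float, frac_bits: int) -> float:
    """Round x to the nearest multiple of 2**-frac_bits (toy fixed-point model)."""
    step = 1.0 / (1 << frac_bits)
    return round(x / step) * step

def reciprocal_direct(x: float, frac_bits: int) -> float:
    """Quantize x as-is, then invert. Small x loses most of its significant bits."""
    return 1.0 / quantize(x, frac_bits)

def reciprocal_normalized(x: float, scale: float, frac_bits: int) -> float:
    """Normalize before the inverse (x / scale) and after it (divide by scale again)."""
    return (1.0 / quantize(x / scale, frac_bits)) / scale

if __name__ == "__main__":
    x, frac_bits, scale = 0.003, 10, 2.0 ** -8   # illustrative values only
    exact = 1.0 / x
    print(abs(reciprocal_direct(x, frac_bits) - exact) / exact)
    print(abs(reciprocal_normalized(x, scale, frac_bits) - exact) / exact)
```

A power-of-two scale is convenient in hardware because both normalization steps reduce to bit shifts.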
18
Support Change Detection – with running average
[Block diagram: Vector multiplier, Cycle counter, Control vector, RAM, Samples, MUX, registers REG1 to REG8, adder (+), Detection > Threshold]
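A behavioral sketch of the diagram, under stated assumptions: the eight registers hold the magnitudes of the last eight vector-multiplier outputs, the adder sums them, and detection fires when their average exceeds the threshold. The class name, the use of the absolute value, and dividing by the window length are illustrative choices; the slides do not specify them.

```python
from collections import deque

class RunningAverageDetector:
    """Toy model of a support change detector with an 8-deep running average."""

    def __init__(self, row, threshold, window=8):
        self.row = row                       # one row of the pseudo-inverse (assumed input)
        self.threshold = threshold
        self.history = deque(maxlen=window)  # models REG1..REG8

    def step(self, samples):
        """Feed one samples vector; return True when a support change is flagged."""
        product = sum(a * b for a, b in zip(self.row, samples))
        self.history.append(abs(product))
        # Dividing by the full window length treats unfilled registers as zero.
        average = sum(self.history) / self.history.maxlen
        return average > self.threshold

if __name__ == "__main__":
    det = RunningAverageDetector(row=[1.0, 0.0], threshold=0.3)
    quiet = [det.step([0.1, 0.0]) for _ in range(8)]
    loud = [det.step([1.0, 0.0]) for _ in range(3)]
    print(quiet, loud)
```

Averaging over several cycles means a single noisy sample cannot trigger detection, at the cost of a few cycles of detection latency.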
19
Timing
Deep pipeline
– We added a deeper pipeline so that the module can run at the high desired frequency. Quartus currently reports that the module can run only up to the given frequency. This can be raised by adding pipeline stages at the bottlenecks found in the design.
Clocks
– Main clock: 20 MHz, may rise to 70 MHz
– Working clock for the pseudo-inverse: 100 MHz, currently not flexible
Hardware reuse
– The matrix multiplier and the inverse unit each reuse a single vector-sized unit over many iterations, which makes them the bottlenecks.
20
Bottlenecks in the design
– Matrix inverse
– Matrix multiplier
– Beta calculation in the QR: heavy arithmetic operations take place here.
If we replace the arithmetic units within these entities with more deeply pipelined units (the division is 23 cycles, the square root is 11 cycles and the multiplier is 2), the maximal frequency will rise. There is no real reason to run at a higher clock unless on-chip memory is too scarce for the delay FIFO or speed is an actual necessity.
21
Resource Consumption Total numbers taken from Stratix III FPGA EP3SE260F1152C2
22
DSP – Runtime Analysis
The worst-case pseudo-inverse timing (for 11 support vectors) is a delay of 0.5 milliseconds, so an appropriately sized delay FIFO is required. The SCD and the reconstruction multiplier work in real time (1 cycle, 50 ns).
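The FIFO size implied by these figures can be estimated with a short calculation. It assumes that one 432-bit samples bundle arrives on every 50 ns cycle. The slides give the bundle width and the cycle time but do not state the FIFO depth or width, so the result is an estimate.

```python
def delay_fifo_depth(delay_ns: int, clock_period_ns: int) -> int:
    """Number of clock cycles (FIFO entries) needed to cover the delay, rounded up."""
    return -(-delay_ns // clock_period_ns)

WORST_CASE_DELAY_NS = 500_000   # 0.5 ms worst-case pseudo-inverse delay (from the slide)
MAIN_CLOCK_PERIOD_NS = 50       # 1 cycle = 50 ns, i.e. the 20 MHz main clock
BUNDLE_BITS = 432               # samples bundle width; one bundle per cycle is an ASSUMPTION

depth = delay_fifo_depth(WORST_CASE_DELAY_NS, MAIN_CLOCK_PERIOD_NS)
total_bits = depth * BUNDLE_BITS

if __name__ == "__main__":
    print(depth, total_bits)
```

Under these assumptions the FIFO needs 10,000 entries, or about 4.3 Mbit. That is consistent with the remark that on-chip memory for the delay FIFO is the main reason to consider a faster pseudo-inverse clock.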
23
Outline Overview – Goals and discussion Algorithm review Implementation in hardware Changes for Adaptation to hardware Evaluation – Testing method – Results – discussion – Conclusions Possible Optimization & Future Work
24
Evaluation - Testing
Logical Testing: Matlab (fixed point) = VHDL
[Block diagram: Input text files (expanded samples, CTF output support), VHDL test bench (A matrix memory, status parser, functional module DSPSCD), output text files]
25
Evaluation - Testing
On Chip Testing: analysis and comparison to Modelsim
[Block diagram: Input text files (expanded samples, CTF output support), debug environment (A matrix RAM, CTF model & FIFO ctrl, functional module DSPSCD), output text files]
26
Evaluation - Results Results of the run on FPGA with the following signals – Fm259_252_sin824_809 – Fm259_252_am872.697 – Am_872.697_sin824 SCD test
27
Evaluation - Results FPGA output Matlab simulation
28
Evaluation - Results FPGA output Matlab simulation
29
Evaluation - Results FPGA output Matlab simulation
30
Evaluation - Results Support changed Support Change experiment
31
Evaluation - Discussion
Correctness was checked by comparison to Matlab, using:
– The maximal MSE of the calculated pseudo-inverse matrix values
– The maximal and average difference between the Matlab simulation results and the actual results
– Visual inspection of the differences
The SCD experiment was composed of two sample bundles with different supports put together, in order to inspect correctness and to draw further conclusions about the support threshold.
32
Evaluation – conclusions
The MSE measured for the inverted matrix is 10^-3.
The MSE for the reconstructed signal:
– Maximal: 0.04
– Average: ~10^-6
No firm conclusions were reached about the support change function: its behavior is predictable only at the support changes.
33
Outline Overview – Goals and discussion Algorithm review Implementation in hardware Changes for Adaptation to hardware Evaluation Possible Optimization & Future Work
34
Future Work
Possible optimizations
– Modifying the inversion algorithm for higher parallelism.
– Scaling the hardware to increase performance.
– Possibly changing the resolution of the calculations to 22 or more bits for better accuracy, at a great cost in hardware.
Integration
35
Summary
– We managed to run the DSP and SCD module on the FPGA and obtained sufficient results.
– We introduced an algorithm for calculating the support threshold.
– We changed most of the architecture to support pipelining and to use minimal hardware (vector resolution).
– We changed the debug environment to support a different FPGA.