Published by Augustine Cobb. Modified over 9 years ago.
1
By: Daniel Barsky, Natalie Pistunovich
Supervisors: Rolf Hilgendorf, Ina Rivkin
Characterization
Sub Nyquist Implementation Optimization
11/04/2010
2
Agenda
- Project Overview
- Hardware Features
- Project Goals & Agenda
- Design Overview – End to End
- Expander Module
- CTF Module
- DSP & SCD Modules
- Memory Module
- Debug Module
- Gantt Chart
3
Project Overview
- This project is part of the Sub-Nyquist Sampling & Reconstruction card.
- The design is to be implemented on a card consisting of 4 Altera Stratix-III FPGAs and a set of DDR memories.
- The currently suggested implementation requires significant resources and occupies 3 FPGAs.
- The design consists of 5 separate blocks, designed by 8 groups.
- Data is represented in 18-bit fixed point with 16 fractional bits.
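The 18-bit fixed-point format can be illustrated with a minimal Python sketch. It assumes a signed two's-complement word (1 sign bit, 1 integer bit, 16 fractional bits) with saturation on overflow; the slides state only the word length and the fraction width, so the sign convention, the saturation behaviour, and the names `to_fixed` and `to_float` are assumptions made for illustration.

```python
WORD_BITS = 18   # total word length, from the slide
FRAC_BITS = 16   # fractional bits, from the slide

# Assumption: signed two's complement, so the raw integer range is
# [-2**17, 2**17 - 1], i.e. real values in [-2.0, 2.0 - 2**-16].
RAW_MIN = -(1 << (WORD_BITS - 1))
RAW_MAX = (1 << (WORD_BITS - 1)) - 1


def to_fixed(x: float) -> int:
    """Quantize a real value to the raw 18-bit integer, saturating on overflow."""
    raw = round(x * (1 << FRAC_BITS))
    return max(RAW_MIN, min(RAW_MAX, raw))


def to_float(raw: int) -> float:
    """Convert a raw 18-bit integer back to its real value."""
    return raw / (1 << FRAC_BITS)


if __name__ == "__main__":
    for value in (0.5, 1.0, -2.0, 5.0):
        raw = to_fixed(value)
        print(value, raw, to_float(raw))
```

Under these assumptions the resolution is 2^-16 and any value outside [-2, 2) is clipped, which is one reason operands would need scaling before they reach an 18x18 multiplier.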
4
Hardware Features
5
Hardware Features (cont.)
6
HIDDEN
[Block diagram; labels translated from Hebrew]
- Analog Back-End: Analog System + A/D, Controller
- Expand Sequences 4:12: expands 4:12 and also sends slices of 2 MHz
- Memory: stores Y in order to compute Z = A†Y later
- Support Change Detector: detects a change in the Y's
- DSP: reconstruction unit, computes A†
- CTF: 1) builds Q; 2) computes an energy vector U from Q
- Signal labels: Samples Bundle, Support, u, y, 288 bits, Z, 12 bits, 20 MHz, 12 bits, 10 x 2 MHz, Request for iteration, If change
8
Project Goals & Agenda
Reducing the design to 2 FPGAs, at the expense of latency:
- Studying each group's activity: algorithms, implementations, resources utilized, etc.
- Pointing out possible efficiency improvements:
  - Resources that can be reused
  - Implementations that exceed requirements
  - Hardware idleness
- Implementing improvements
- Ultimately, suggesting the optimal architecture to be implemented in an ASIC
9
Design Overview – End to End
[Block diagram. Blocks: Analog Back-End (Analog System + A/D, Controller); Expand Sequences 4:12 (Morad, Amir); Memory; Architecture; CTF; Support Change Detector (Omer, Daniel); DSP (Omer, Daniel), producing A†. Other names on the slide: Eli, Tzvika, Yoni. Signal labels: Samples Bundle, Support.]
10
Expander Module
Description:
- In Normal Operation Mode: receives 4 channels at 60 MHz, expands each to 3 slices of 20 MHz (a total of 12 channels), and sends them to the Memory block (for later reconstruction) as well as to the CTF & Support Change Detector.
- In Iteration Mode: creates 10 slices of 2 MHz out of each 20 MHz slice and sends them to the CTF block for support calculation, in iterations, with a different slice each cycle. A total of 12 slices per iteration; 10 iterations are required.
11
Expander Module
Algorithm:
- Modulate (if needed): multiply by sine/cosine coefficients
- LPF: a polyphase FIR Kaiser filter, 240 taps
  - FIR filters are used for added stability and linear phase
  - Polyphase filters are used for efficient filtering and decimation using minimal resources (multipliers)
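The saving from the polyphase structure can be seen in a small pure-Python sketch. It is not the project's implementation: the function names are hypothetical, the coefficients in the demo are arbitrary, and the decimation factor of 3 is taken from the 60 MHz to 20 MHz step. The direct version runs the full FIR at the input rate and discards two of every three outputs; the polyphase version splits the taps into 3 sub-filters and computes only the outputs that are kept.

```python
def direct_decimate(x, h, M):
    """Reference: full-rate FIR filtering, keeping every M-th output."""
    out = []
    for n in range(0, len(x), M):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]
        out.append(acc)
    return out


def polyphase_decimate(x, h, M):
    """Same result, computed with M sub-filters running at the output rate."""
    # Branch p holds taps h[p], h[p + M], h[p + 2M], ...
    branches = [h[p::M] for p in range(M)]
    n_out = (len(x) + M - 1) // M
    out = []
    for m in range(n_out):
        acc = 0.0
        for p, taps in enumerate(branches):
            for r, coeff in enumerate(taps):
                idx = (m - r) * M - p      # input sample feeding this tap
                if 0 <= idx < len(x):
                    acc += coeff * x[idx]
        out.append(acc)
    return out


if __name__ == "__main__":
    x = [float(i % 7) for i in range(30)]      # arbitrary test signal
    h = [1.0, 2.0, 3.0, 2.0, 1.0, 0.5]          # arbitrary taps, not a Kaiser design
    print(direct_decimate(x, h, 3) == polyphase_decimate(x, h, 3))
```

Because each output needs every tap exactly once and outputs are produced at one third of the input rate, a multiplier clocked at the input rate can serve 3 taps, which is consistent with a 240/3 multiplier count per filter.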
12
Expander Module (cont.)
Total Resource Utilization:
- 4·8 18x18 multipliers at the modulators
- 4·3·240/3 18x18 multipliers at the 60 MHz → 20 MHz filters
- 4·3·2·400/10 18x18 multipliers at the 20 MHz → 2 MHz filters
- Total: 960 + 960 + 32 = 1952 multipliers
There are 448 multipliers per FPGA!
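The multiplier budget can be checked with a few lines of Python. The products are copied from the slide; the variable names and the comments on what each factor stands for are a reading of the surrounding slides, not something the source spells out.

```python
import math

MULTIPLIERS_PER_FPGA = 448               # figure quoted on the slide

modulators = 4 * 8                        # 4 channels, 8 multipliers each
filters_60_to_20 = 4 * 3 * 240 // 3       # presumably 4 channels x 3 slices, 240 taps, decimation by 3
filters_20_to_2 = 4 * 3 * 2 * 400 // 10   # factors as on the slide; their meaning is not stated there

total = modulators + filters_60_to_20 + filters_20_to_2
fpgas_needed = math.ceil(total / MULTIPLIERS_PER_FPGA)

if __name__ == "__main__":
    print("total multipliers:", total)
    print("FPGAs needed for the expander alone:", fpgas_needed)
```

On these numbers the expander alone would need more multipliers than the whole 4-FPGA card provides, which is what motivates the improvements that follow.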
13
Expander Module (cont.)
Possible improvements:
- Reducing the number of filter taps by:
  - Widening the transition band: 0.044π
  - Reducing the stop band ripple: -70 dB
- Operating at a higher clock frequency: each channel can be sampled several times, and thus the same filter can be reused for several parallel channels
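The effect of these two parameters on filter length can be estimated with Kaiser's empirical formula, N ≈ (A − 7.95) / (2.285·Δω), where A is the stop-band attenuation in dB and Δω the transition width in radians per sample (valid for A above about 21 dB). The sketch below assumes the slide's figures (0.044π, 70 dB) are the target specification; the function name is hypothetical.

```python
import math


def kaiser_taps(atten_db: float, transition_width_rad: float) -> int:
    """Estimate the number of taps of a Kaiser-window FIR low-pass filter.

    atten_db: stop-band attenuation in dB (positive number, > 21).
    transition_width_rad: transition band width in radians per sample.
    """
    order = (atten_db - 7.95) / (2.285 * transition_width_rad)
    return math.ceil(order) + 1   # taps = filter order + 1


if __name__ == "__main__":
    # Figures from the slide, read as the relaxed specification (assumption).
    print(kaiser_taps(70.0, 0.044 * math.pi))
    # Halving the transition band roughly doubles the filter length.
    print(kaiser_taps(70.0, 0.022 * math.pi))
```

The estimate scales inversely with the transition width and linearly with the attenuation, so both knobs named on the slide trade filter quality for multipliers.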
14
CTF – Q-Frame
Description:
- Calculates the Q frame out of y: multiplies [operands not preserved in the transcript]
- Q is Hermitian: Q = Q^H
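The slide's formula for Q did not survive extraction. The sketch below assumes the usual CTF construction, Q = Σ_n y[n]·y[n]^H, which is Hermitian by construction; that formula, the function name, and the frame layout (a list of sample vectors) are assumptions rather than something the transcript confirms. Only the upper triangle is accumulated, and the lower triangle is filled in by conjugation, which is the saving that the Hermitian property allows.

```python
def q_frame(samples):
    """Accumulate Q = sum over n of y[n] * y[n]^H (assumed CTF frame definition).

    samples: list of equal-length vectors y[n] with complex entries.
    Returns Q as a list of lists of complex numbers.
    """
    m = len(samples[0])
    Q = [[0j] * m for _ in range(m)]
    for y in samples:
        for i in range(m):
            for j in range(i, m):                 # upper triangle only
                Q[i][j] += y[i] * complex(y[j]).conjugate()
    for i in range(m):
        for j in range(i):                        # mirror: Q[i][j] = conj(Q[j][i])
            Q[i][j] = Q[j][i].conjugate()
    return Q


if __name__ == "__main__":
    frame = [[1 + 1j, 2], [0, 1j]]
    for row in q_frame(frame):
        print(row)
```

For an m-channel input this needs m(m+1)/2 accumulators instead of m², and the diagonal entries are real.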
15
CTF – Q-Frame
Algorithm:
- For non-diagonal elements: calculates the 9 required products [equations not preserved], then calculates the elements of Q using these products [equations not preserved]
- For diagonal elements: uses 6 products [equation not preserved]
16
CTF – Q-Frame (cont.)
Total Resource Utilization:
Total multiplier requirements:
- 42 basic multipliers
- 36 two-multiply adders
[Figures: Basic Multiplier; Two-Multiply Adder]
17
CTF – OMP
Description:
- Receives the Q frame
- Calculates U using Orthogonal Matching Pursuit (OMP)
- Gets the support from U
18
CTF – OMP (cont.)
Algorithm:
- A residue matrix R is loaded with Q
- U is calculated using iterations as follows:
  1. The matrix A is projected on the residue matrix R
  2. The energy of each row in the projection is calculated
  3. The row A_i with the maximum projection energy is added to the support
  4. An orthogonal vector is constructed from A_i using the Gram-Schmidt process
  5. The projection of R on the orthogonal vector is subtracted from R
  6. The energy of the residue matrix R is calculated
  7. If the energy of the residue is greater than a predefined threshold, continue to the next iteration
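The iteration above can be sketched in pure Python as follows. This is an illustrative floating-point version, not the hardware design: the function names, the `max_iter` guard, and the choice to index candidates as columns of A (equivalently rows of A^H, which may be what the slide calls "row A_i") are assumptions.

```python
def inner(u, w):
    """Complex inner product <u, w> = sum(conj(u_i) * w_i)."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, w))


def energy(R):
    """Total energy (squared Frobenius norm) of a matrix."""
    return sum(abs(v) ** 2 for row in R for v in row)


def omp_support(A, Q, threshold, max_iter):
    """Greedy support recovery following the steps on the slide."""
    m, n, p = len(A), len(A[0]), len(Q[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    R = [list(row) for row in Q]            # residue starts as Q
    support, basis = [], []
    for _ in range(max_iter):
        if energy(R) <= threshold:           # stop once the residue is small
            break
        r_cols = [[R[i][c] for i in range(m)] for c in range(p)]
        # Projection energy of each candidate; chosen ones are excluded.
        scores = [-1.0 if j in support else
                  sum(abs(inner(a, rc)) ** 2 for rc in r_cols)
                  for j, a in enumerate(cols)]
        best = max(range(n), key=scores.__getitem__)
        # Gram-Schmidt: orthogonalize the chosen vector against the basis.
        v = list(cols[best])
        for b in basis:
            coeff = inner(b, v)
            v = [vi - coeff * bi for vi, bi in zip(v, b)]
        norm = sum(abs(vi) ** 2 for vi in v) ** 0.5
        if norm < 1e-12:                      # nothing new to add
            break
        v = [vi / norm for vi in v]
        support.append(best)
        basis.append(v)
        # Subtract the projection of R on v from R.
        proj = [inner(v, rc) for rc in r_cols]
        R = [[R[i][c] - v[i] * proj[c] for c in range(p)] for i in range(m)]
    return sorted(support)


if __name__ == "__main__":
    s = 3 ** -0.5
    A = [[1, 0, 0, s], [0, 1, 0, s], [0, 0, 1, s]]
    Q = [[2.0], [0.0], [1.0]]                 # built from columns 0 and 2
    print(omp_support(A, Q, threshold=1e-9, max_iter=3))
```

Step 1 dominates the cost: every iteration multiplies each candidate vector by the whole residue, which is the row-by-matrix product that the resource count on the next slide is sized for.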
19
CTF – OMP (cont.)
Total Resource Utilization:
- Row-by-matrix multiplier: 144 18x18 complex multipliers, 18-bit [operation and count not preserved]
- [Operation name not preserved]: 12 18x18 complex multipliers, 12 18-bit adders
- Total hardware requirements (approximation): 156 18x18 complex multipliers
20
CTF
Possible improvements:
- Increasing the clock frequency to speed up support calculation
- Using fewer multipliers for the calculations at the cost of additional latency (pipelining)
- Sharing multipliers with the DSP pseudo-inverse block (the two never work simultaneously)
21
DSP & SCD
Description:
- Receives the support from the CTF
- Calculates A†, the Moore-Penrose pseudo-inverse of A
- Reconstructs the original signal
- Detects a change in the support
22
DSP
Algorithm – Pseudo-inverse & Reconstruction:
1. Receive the support S from the CTF block
2. Create A_S from the columns of A that are in the support
3. Decompose A_S into an orthogonal matrix Q and an upper-triangular matrix R using QR decomposition (computed using Householder reflections)
4. Invert R using the upper-triangular matrix inversion algorithm
5. Calculate the pseudo-inverse by A_S† = R^-1 Q^H
6. Reconstruct z[n] by matrix multiplication: z[n] = A_S† y[n]
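The chain QR decomposition, triangular inversion, pseudo-inverse can be sketched in pure Python. To keep the example short it uses real-valued data and modified Gram-Schmidt for the QR step instead of the Householder reflections named on the slide; both yield a thin QR factorization, but Householder is the numerically more robust choice. The function names are hypothetical, and A_S is assumed to have full column rank.

```python
def thin_qr(A):
    """Thin QR by modified Gram-Schmidt (swapped in for Householder).

    Returns (Q_cols, R): Q_cols[k] is the k-th orthonormal column of Q,
    R is n x n upper-triangular. Assumes full column rank.
    """
    m, n = len(A), len(A[0])
    Q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [A[i][j] for i in range(m)]
        for k in range(j):
            R[k][j] = sum(Q_cols[k][i] * v[i] for i in range(m))
            v = [v[i] - R[k][j] * Q_cols[k][i] for i in range(m)]
        R[j][j] = sum(x * x for x in v) ** 0.5
        Q_cols.append([x / R[j][j] for x in v])
    return Q_cols, R


def invert_upper(R):
    """Invert an upper-triangular matrix by back-substitution, column by column."""
    n = len(R)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(n - 1, -1, -1):
            rhs = (1.0 if i == j else 0.0) - sum(
                R[i][k] * inv[k][j] for k in range(i + 1, n))
            inv[i][j] = rhs / R[i][i]
    return inv


def pseudo_inverse(A):
    """A† = R^-1 Q^T for a real, full-column-rank matrix A (n x m result)."""
    Q_cols, R = thin_qr(A)
    R_inv = invert_upper(R)
    n, m = len(R), len(A)
    return [[sum(R_inv[i][k] * Q_cols[k][r] for k in range(n))
             for r in range(m)] for i in range(n)]


if __name__ == "__main__":
    A_S = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
    for row in pseudo_inverse(A_S):
        print(row)
```

Multiplying the result by A_S should give the identity, which is a convenient self-check; reconstruction is then one matrix-vector product per sample vector.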
23
SCD
Algorithm – Support Change Detection:
1. Add an extra support to the matrix A_S
2. After the pseudo-inverse, create a control vector from [source not preserved in the transcript]
3. Multiply the control vector by 12 samples and sum up the result
4. If the energy level is high, a support change has occurred:
   - Instruct the CTF to calculate a new support
   - If the support has failed several times in Normal Operation Mode, instruct the CTF to switch to Iteration Mode
   - If the support has failed several times in Iteration Mode, indicate that there is a problem
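The escalation logic can be sketched as a small state machine. The slide says only "several times" and "high" energy, so `max_failures`, `energy_threshold`, the reset of the failure counter on a quiet check, and all names below are hypothetical choices made for illustration.

```python
NORMAL, ITERATION, FAULT = "normal", "iteration", "fault"


class SupportChangeDetector:
    """Tracks consecutive support failures and escalates the CTF mode."""

    def __init__(self, energy_threshold, max_failures=3):
        self.energy_threshold = energy_threshold   # hypothetical tuning value
        self.max_failures = max_failures           # stands in for "several times"
        self.mode = NORMAL
        self.failures = 0

    def update(self, control_vector, samples):
        """Check one bundle of 12 samples.

        Returns True when the CTF should calculate a new support.
        """
        if self.mode == FAULT:
            return False
        # Multiply the control vector by the samples and sum the result.
        acc = sum(c * s for c, s in zip(control_vector, samples))
        if abs(acc) ** 2 <= self.energy_threshold:
            self.failures = 0                      # support still valid
            return False
        self.failures += 1
        if self.failures >= self.max_failures:
            self.failures = 0
            self.mode = ITERATION if self.mode == NORMAL else FAULT
        return self.mode != FAULT


if __name__ == "__main__":
    det = SupportChangeDetector(energy_threshold=1.0, max_failures=2)
    loud = [1.0] * 12
    for _ in range(4):
        print(det.update([1.0] * 12, loud), det.mode)
```

In this sketch the detector moves from Normal Operation Mode to Iteration Mode after `max_failures` consecutive high-energy checks, and reports a fault after the same number of failures in Iteration Mode.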
24
DSP & SCD (cont.)
Total Resource Utilization:
- QR decomposition: 51 18x18 complex multipliers
- Matrix pseudo-inverse: 20 18x18 complex multipliers
- Matrix multiplication: 24 18x18 complex multipliers
- Sample multiplication: 48 18x18 complex multipliers
25
DSP & SCD (cont.)
Possible improvements:
- Increasing the clock frequency to speed up non-real-time calculations (pseudo-inverse, matrix multiplication)
- Using fewer multipliers for the calculations at the cost of additional latency (pipelining)
- Sharing multipliers with the CTF block (the two never work simultaneously)
- Examining other decompositions (SVD, LQ, Cholesky, etc.)
26
Memory
Description:
- Memory block designed as a FIFO to store the sampled channels
- Designed to delay the input long enough to calculate a new support and a new A†
Possible improvements:
- If there is a shortage of on-chip memory, an external DDR memory chip can be considered
27
Debug Modules
Description:
- Designed to debug each block of the design separately
- Each consists of a signal generator for the input of the block and a FIFO memory to hold the output
Possible improvements:
- If these modules are expensive in hardware, two firmware versions can be prepared: a compact version without the debug modules, and a complete one with them
28
Gantt Chart
29
Hidden
30
Thank You!