1
A COMPARATIVE STUDY OF MULTIPLY ACCUMULATE IMPLEMENTATIONS ON FPGAS Using Distributed Arithmetic and Residue Number System
2
Project Scope
To compare the implementation efficiencies (area times delay) of Distributed Arithmetic (DA), Residue Number System (RNS), and DA-RNS based parallel multiply accumulate architectures on FPGAs
3
Background and Context
FPGAs are increasingly used for DSP computations
FPGAs have potential for parallelism
Exploitation of the FPGA architecture (LUT based)
Novel MAC architectures especially suitable for FPGAs
4
Some More Background
In DSP, MACs use constant coefficients (fixed multiplicand)
A full multiplier implementation is not required
Not all multiplier architectures are efficient on FPGAs
5
Motivation
Distributed Arithmetic and Residue Arithmetic are LUT-based techniques
Explore the “synergy” between the FPGA architecture and these techniques
6
Distributed Arithmetic Overview
7
Basic Serial Architecture
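The slide's diagram is not part of this transcript, so the following is only a minimal sketch of a bit-serial DA inner product under stated assumptions: unsigned inputs, LSB-first processing, and a plain shift-and-add standing in for the scaling accumulator. The names `build_da_lut` and `da_mac` are hypothetical.

```python
def build_da_lut(coeffs):
    """Precompute the DA table: entry `addr` holds the sum of coeffs[k]
    for every bit k that is set in `addr`."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]


def da_mac(coeffs, samples, width):
    """Bit-serial DA inner product of constant `coeffs` with unsigned
    `width`-bit `samples` (assumption: no sign bit is handled here)."""
    lut = build_da_lut(coeffs)
    acc = 0
    for bit in range(width):  # one LUT access per input bit, LSB first
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> bit) & 1) << k  # bit slice of all inputs forms the address
        acc += lut[addr] << bit  # weight the partial sum by 2**bit
    return acc
```

The table has 2^N entries for N taps, which is why an N-tap filter is usually partitioned into smaller LUTs in practice.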
8
Residue Arithmetic Overview
$(z_1, z_2, \ldots, z_n) = (x_1, x_2, \ldots, x_n) \circ (y_1, y_2, \ldots, y_n)$
$z_i = (x_i \circ y_i) \bmod m_i$
where $\circ$ denotes any of the modulo operations of addition, subtraction or multiplication
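As an illustration of the component-wise rule, here is a small sketch. The moduli (7, 11, 13) and the helper names `to_rns` and `rns_op` are assumptions chosen for the example, not values from the presentation.

```python
import operator

MODULI = (7, 11, 13)  # hypothetical pairwise-coprime moduli, dynamic range 7*11*13 = 1001


def to_rns(x, moduli=MODULI):
    """Represent integer x by its residues with respect to each modulus."""
    return tuple(x % m for m in moduli)


def rns_op(op, xs, ys, moduli=MODULI):
    """Apply op (add, sub or mul) independently in every residue channel."""
    return tuple(op(x, y) % m for x, y, m in zip(xs, ys, moduli))
```

Because no carries pass between channels, `rns_op(operator.mul, to_rns(25), to_rns(17))` equals `to_rns(25 * 17)` as long as the true result stays within the dynamic range.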
9
Modulo Adder
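The adder diagram is not in the transcript. A common way to build a modulo adder, shown here only as an assumed structure, is one addition, one subtraction of the modulus, and a select on the sign of the difference. The name `mod_add` is hypothetical.

```python
def mod_add(a, b, m):
    """Add residues a and b, both assumed to lie in [0, m), modulo m.

    Mirrors an add / subtract-modulus / select datapath instead of
    using a general remainder operation."""
    s = a + b      # plain binary sum, at most 2*m - 2
    d = s - m      # candidate result if the sum overflowed the modulus
    return d if d >= 0 else s  # select on the sign of the difference
```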
10
Modulo Constant Multiplier
Due to the small sizes of residues and a constant multiplicand, a direct LUT-based implementation is very efficient
[Figure: 4-bit constant modulo multiplier, LUT address inputs A0 to A3 driven by X[3:0]; 5-bit constant modulo multiplier, LUT address inputs A0 to A4 driven by X[4:0]]
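A direct LUT implementation can be sketched as a table indexed by the input residue. The coefficient 9, the modulus 13, and the name `build_const_mod_mult_lut` are illustrative assumptions.

```python
def build_const_mod_mult_lut(coeff, m, width):
    """Table of (coeff * x) mod m for every `width`-bit address x.

    With width = 4 this is a 16-entry table, i.e. one 4-input LUT
    per output bit of the residue."""
    return [(coeff * x) % m for x in range(1 << width)]


lut = build_const_mod_mult_lut(coeff=9, m=13, width=4)
product = lut[5]  # (9 * 5) mod 13 = 6, obtained with a single lookup
```

The multiplication and the modulo reduction are both folded into the table contents, so no multiplier or reduction logic is needed at run time.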
11
RNS MAC Architecture
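Since the architecture figure is missing from the transcript, the following is only a behavioural sketch of an RNS MAC: each modulus channel multiplies and accumulates its own residues, independently of the others. The function name `rns_mac` and the example moduli are assumptions.

```python
def rns_mac(coeffs, samples, moduli):
    """Return the accumulated residue of sum(c * x) for each modulus.

    Every channel only ever handles values smaller than its modulus,
    which is what keeps the per-channel hardware small."""
    result = []
    for m in moduli:
        acc = 0
        for c, x in zip(coeffs, samples):
            # constant modulo multiply followed by a modulo add
            acc = (acc + (c % m) * (x % m)) % m
        result.append(acc)
    return tuple(result)
```

The tuple returned here still has to go through reverse conversion before it can be used as a binary number.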
12
Conversion Issues in RNS
Binary-to-RNS and RNS-to-binary conversions are significant overheads
Binary-to-RNS conversion is relatively simple
RNS-to-binary conversion using a direct CRT implementation requires modulo-M adders
13
Forward Conversion
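One LUT-friendly way to perform binary-to-RNS conversion, given here as an assumption rather than as the scheme on the slide, is to precompute the residue of each power of two and add the entries for the bits that are set. The name `forward_convert` is hypothetical.

```python
def forward_convert(x, moduli, width):
    """Convert an unsigned `width`-bit integer x to its residues.

    For each modulus, the residues of 2**k are constants; the residue
    of x is their modulo sum over the set bits of x."""
    residues = []
    for m in moduli:
        pow_table = [(1 << k) % m for k in range(width)]  # constants, fixed at design time
        r = 0
        for k in range(width):
            if (x >> k) & 1:
                r = (r + pow_table[k]) % m  # one modulo add per set bit
        residues.append(r)
    return tuple(residues)
```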
14
Reverse Conversion
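A direct Chinese Remainder Theorem (CRT) reconstruction can be sketched as follows. The name `crt_reverse` is hypothetical, the moduli are assumed to be pairwise coprime, and `math.prod` and `pow(..., -1, m)` require Python 3.8 or later.

```python
from math import prod


def crt_reverse(residues, moduli):
    """Recover x in [0, M) from its residues, where M is the product of the moduli."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m                # product of all the other moduli
        inv = pow(Mi, -1, m)       # multiplicative inverse of Mi modulo m
        total = (total + r * inv * Mi) % M  # every addition is modulo the large M
    return total
```

The running sum is reduced modulo M, not modulo a small channel modulus, which is the source of the wide modulo adders that make reverse conversion costly.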
15
DA-RNS Coupling
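The coupling diagram is not in the transcript. One plausible reading, offered only as an assumption, is that DA is applied inside each residue channel: the DA table stores coefficient sums already reduced modulo the channel modulus, and the accumulator doubles and adds modulo that modulus. The name `da_rns_channel` is hypothetical.

```python
def da_rns_channel(coeffs, sample_residues, m):
    """Bit-serial DA inner product inside one residue channel.

    `sample_residues` are the inputs already reduced modulo m, so only
    (m - 1).bit_length() bit slices need to be processed."""
    n = len(coeffs)
    width = (m - 1).bit_length()
    # DA table with every entry pre-reduced modulo m
    lut = [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1) % m
           for addr in range(1 << n)]
    acc = 0
    for bit in reversed(range(width)):  # MSB first
        addr = 0
        for k, r in enumerate(sample_residues):
            addr |= ((r >> bit) & 1) << k
        acc = (2 * acc + lut[addr]) % m  # modular doubling replaces the binary shift
    return acc
```

Under this reading, the number of DA cycles depends on the residue width instead of the full input word length.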
16
Scaling Accumulator Design
17
DA Implementation: 8 Bits, 8 Taps, 12-Bit Coefficients
18
Critical Path Results
Source: PSC8_0_PSC_0/I_Q7 (FF)
Destination: SACC24_REG2/I_Q3 (FF)
Data Path: PSC8_0_PSC_0/I_Q7 to SACC24_REG2/I_Q3