Applications of Distributed Arithmetic to Digital Signal Processing:

Slides:



Advertisements
Similar presentations
© 2003 Xilinx, Inc. All Rights Reserved Course Wrap Up DSP Design Flow.
Advertisements

CSE 246: Computer Arithmetic Algorithms and Hardware Design Instructor: Prof. Chung-Kuan Cheng Fall 2006 Lecture 11 Cordic, Log, Square, Exponential Functions.
Distributed Arithmetic
1 KU College of Engineering Elec 204: Digital Systems Design Lecture 9 Programmable Configurations Read Only Memory (ROM) – –a fixed array of AND gates.
Design of a Power-Efficient Interleaved CIC Architecture for Software Defined Radio Receivers By J.Luis Tecpanecatl-Xihuitl, Ruth Aguilar-Ponce, Ashok.
ENGIN112 L4: Number Codes and Registers ENGIN 112 Intro to Electrical and Computer Engineering Lecture 4 Number Codes and Registers.
ELEC692 VLSI Signal Processing Architecture Lecture 9 VLSI Architecture for Discrete Cosine Transform.
CS 151 Digital Systems Design Lecture 4 Number Codes and Registers.
Distributed arithmetic SYSC5603 (ELG6163) Digital Signal Processing Microprocessors, Software and Applications Miodrag Bolic All the slides have been copied.
Digital Kommunikationselektronik TNE027 Lecture 3 1 Multiply-Accumulator (MAC) Compute Sum of Product (SOP) Linear convolution y[n] = f[n]*x[n] = Σ f[k]
Computer ArchitectureFall 2008 © August 25, CS 447 – Computer Architecture Lecture 3 Computer Arithmetic (1)
Distributed Arithmetic: Implementations and Applications
A COMPARATIVE STUDY OF MULTIPLY ACCCUMULATE IMPLEMENTATIONS ON FPGAS Using Distributed Arithmetic and Residue Number System.
Chapter 7 – Registers and Register Transfers Part 1 – Registers, Microoperations and Implementations Logic and Computer Design Fundamentals.
Chapter 1_4 Part II Counters
GPGPU platforms GP - General Purpose computation using GPU
Prepared by: Hind J. Zourob Heba M. Matter Supervisor: Dr. Hatem El-Aydi Faculty Of Engineering Communications & Control Engineering.
1 Miodrag Bolic ARCHITECTURES FOR EFFICIENT IMPLEMENTATION OF PARTICLE FILTERS Department of Electrical and Computer Engineering Stony Brook University.
A Bit-Serial Method of Improving Computational Efficiency of Dot-Products 1.
DSP Processors We have seen that the Multiply and Accumulate (MAC) operation is very prevalent in DSP computation computation of energy MA filters AR filters.
D ISTRIBUTED A RITHMETIC (DA) 1. D EFINITION DA is basically (but not necessarily) a bit- serial computational operation that forms an inner (dot) product.
ACCESS IC LAB Graduate Institute of Electronics Engineering, NTU CORDIC (Coordinate rotation digital computer) Ref: Y. H. Hu, “CORDIC based VLSI architecture.
Applications of Distributed Arithmetic to Digital Signal Processing:
Automatic Evaluation of the Accuracy of Fixed-point Algorithms Daniel MENARD 1, Olivier SENTIEYS 1,2 1 LASTI, University of Rennes 1 Lannion, FRANCE 2.
1. Adaptive System Identification Configuration[2] The adaptive system identification is primarily responsible for determining a discrete estimation of.
Recursive Architectures for 2DLNS Multiplication RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS - UNIVERSITY OF WINDSOR 11 Recursive Architectures for 2DLNS.
ELEC692 VLSI Signal Processing Architecture Lecture 12 Numerical Strength Reduction.
COP3502: Introduction to Computer Science Yashas Shankar Hardware.
CS151 Introduction to Digital Design Chapter 4: Arithmetic Functions and HDLs 4-1: Iterative Combinational Circuits 4-2: Binary Adders 1Created by: Ms.Amany.
CORDIC (Coordinate rotation digital computer)
Unit IV Finite Word Length Effects
Invitation to Computer Science, C++ Version, Fourth Edition
Combinational Logic Design
CORDIC (Coordinate rotation digital computer)
Computer Architecture & Operations I
ELECTRONICS AND COMMUNICATION ENGINEERING
EEE4176 Applications of Digital Signal Processing
3.1 Introduction Why do we need also a frequency domain analysis (also we need time domain convolution):- 1) Sinusoidal and exponential signals occur.
Research Institute for Future Media Computing
Overview of Residue Number System (RNS) for Advanced VLSI Design and VLSI Signal Processing NTUEE 吳安宇.
KU College of Engineering Elec 204: Digital Systems Design
Outline Introduction Floating Point Arithmetic Adder Multiplier.
Invitation to Computer Science, Java Version, Third Edition
Recent from Dr. Dan Lo regarding 12/11/17 Dept Exam
Boolean Algebra and Digital Logic
Chapter One: Introduction
Tree and Array Multipliers
Arithmetic Where we've been:
Text Book Computer Organization and Architecture: Designing for Performance, 7th Ed., 2006, William Stallings, Prentice-Hall International, Inc.
EE 445S Real-Time Digital Signal Processing Lab Spring 2014
Lect5 A framework for digital filter design
CSE 370 – Winter 2002 – Comb. Logic building blocks - 1
Computer Organization
Multiplier-less Multiplication by Constants
Programmable Configurations
Applications of Distributed Arithmetic to Digital Signal Processing:
Overview Part 1 - Registers, Microoperations and Implementations
Overview Part 1 – Design Procedure Part 2 – Combinational Logic
University of Texas at Austin
Data Wordlength Reduction for Low-Power Signal Processing Software
C Model Sim (Fixed-Point) -A New Approach to Pipeline FFT Processor
ECE 352 Digital System Fundamentals
Prof. Giancarlo Succi, Ph.D., P.Eng.
Recent from Dr. Dan Lo regarding 12/11/17 Dept Exam
Memory System Performance Chapter 3
Lecture #17 INTRODUCTION TO THE FAST FOURIER TRANSFORM ALGORITHM
Instruction execution and ALU
Computer Architecture
Computer Architecture
Presentation transcript:

Applications of Distributed Arithmetic to Digital Signal Processing: A Tutorial Review Ref: Stanley A. White, “Applications of Distributed Arithmetic to Digital Signal Processing: A Tutorial Review,” IEEE ASSP Magazine, July, 1989

Outline: Distributed Arithmetic (DA) Introduction Technical overview of DA Increasing the speed of DA multiplication Application of DA to a biquadratic digital filter Conclusions

Distributed Arithmetic (DA, 1974) Introduction The most-often encountered form of computation in DSP: Sum of product Dot-product Inner-product Executed most efficiently by DA By careful design one may reduce gate count in a signal processing arithmetic unit by a number seldom smaller then 50% and often as large as 80%

Technical overview of DA Advantage of DA: efficiency of mechanization A frequently argued: slowness because of its inherent bit-serial nature (not true) Some modifications to increase the speed by employing techniques: Plus more arithmetic operations expense of exponentially increased memory

Technique overview -continued I The sum of product: where xk is a 2’s-complement binary number scaled such that | xk |<1 Ak is fixed coefficients xk : {bk0, bk1, bk2……, bk(N-1) } , word length=N where bk0 is the sign bit Express each xk as: (2)代入(1) => (1) (2) (3)

Technique overview -continued II Critical step where K=No. of taps (inputs), N: Wordlength of Data (3) (4)

Technique overview -continued III Consider the equation (4) has only 2K possible values We can store it in a lookup-table(ROM): size=2*2K

Technique overview -continued IV Example Let no. of taps K=4 and the fixed coefficients are A1=0.72, A2=-0.3, A3=0.95, A4=0.11 We need 22K = 32-word ROM (k=4)

Example Unfolding

Example -continued I Hardware architecture

Example -continued II Shorten the table eq. (4) (5)

Example -continued II Hardware architecture Only 16 words of ROM are required,now.

Technique overview -advanced Input (6) 2‘s-complement (7)

Technique overview -advanced I 代入 Where and

Technique overview -advanced II Hardware architecture

Increase the speed of DA multiplication The way I: Plus more arithmetic operations Initial condition Even part Odd part

Increase the speed of DA multiplication Hardware architecture of way I -at the expense of linearly increased memory & arithmetic operation Odd part(sign} Even part Initial Condition 1/2*Q(0)

Increase the speed of DA multiplication The way II: - at the expense of exponentially increased memory ROM : 2*7 words => 1*128 words Hardware architecture

Application of DA to a biquadratic digital filter Transfer function of a typical biquadratic digital filter: The time-domain description:

Application of DA to a biquadratic digital filter Hardware architecture

Application of DA to a biquadratic digital filter The vector matrix equation The relationships between two equations

Application of DA to a biquadratic digital filter Normal form biquadratic filter

Application of DA to a biquadratic digital filter DA realization

Application of DA to a biquadratic digital filter Increase speed 1*3 BAAT DA structure 4*3 BAAT DA structure

Application of DA to a biquadratic digital filter Increase speed II 2048 word/ROM require 4 ROM 4 operations

Application of DA to a biquadratic digital filter Increase speed III 32 word/ROM require 8 ROM 8 operations

Conclusions DA is a very efficient means to mechanize computations that are dominated by inner products If performance/cost ratio is critical, DA should be seriously considered as a contender When a many computing methods are compared, DA should be considered. It is not always (but often) best, and never poorly.