Applications of Distributed Arithmetic to Digital Signal Processing:

Slides:

Advertisements

Similar presentations

© 2003 Xilinx, Inc. All Rights Reserved Course Wrap Up DSP Design Flow.

Advertisements

CSE 246: Computer Arithmetic Algorithms and Hardware Design Instructor: Prof. Chung-Kuan Cheng Fall 2006 Lecture 11 Cordic, Log, Square, Exponential Functions.

Distributed Arithmetic

1 KU College of Engineering Elec 204: Digital Systems Design Lecture 9 Programmable Configurations Read Only Memory (ROM) – –a fixed array of AND gates.

Design of a Power-Efficient Interleaved CIC Architecture for Software Defined Radio Receivers By J.Luis Tecpanecatl-Xihuitl, Ruth Aguilar-Ponce, Ashok.

ENGIN112 L4: Number Codes and Registers ENGIN 112 Intro to Electrical and Computer Engineering Lecture 4 Number Codes and Registers.

ELEC692 VLSI Signal Processing Architecture Lecture 9 VLSI Architecture for Discrete Cosine Transform.

CS 151 Digital Systems Design Lecture 4 Number Codes and Registers.

Distributed arithmetic SYSC5603 (ELG6163) Digital Signal Processing Microprocessors, Software and Applications Miodrag Bolic All the slides have been copied.

Digital Kommunikationselektronik TNE027 Lecture 3 1 Multiply-Accumulator (MAC) Compute Sum of Product (SOP) Linear convolution y[n] = f[n]*x[n] = Σ f[k]

Computer ArchitectureFall 2008 © August 25, CS 447 – Computer Architecture Lecture 3 Computer Arithmetic (1)

Distributed Arithmetic: Implementations and Applications

A COMPARATIVE STUDY OF MULTIPLY ACCCUMULATE IMPLEMENTATIONS ON FPGAS Using Distributed Arithmetic and Residue Number System.

Chapter 7 – Registers and Register Transfers Part 1 – Registers, Microoperations and Implementations Logic and Computer Design Fundamentals.

Chapter 1_4 Part II Counters

GPGPU platforms GP - General Purpose computation using GPU

Prepared by: Hind J. Zourob Heba M. Matter Supervisor: Dr. Hatem El-Aydi Faculty Of Engineering Communications & Control Engineering.

1 Miodrag Bolic ARCHITECTURES FOR EFFICIENT IMPLEMENTATION OF PARTICLE FILTERS Department of Electrical and Computer Engineering Stony Brook University.

A Bit-Serial Method of Improving Computational Efficiency of Dot-Products 1.

DSP Processors We have seen that the Multiply and Accumulate (MAC) operation is very prevalent in DSP computation computation of energy MA filters AR filters.

D ISTRIBUTED A RITHMETIC (DA) 1. D EFINITION DA is basically (but not necessarily) a bit- serial computational operation that forms an inner (dot) product.

ACCESS IC LAB Graduate Institute of Electronics Engineering, NTU CORDIC (Coordinate rotation digital computer) Ref: Y. H. Hu, “CORDIC based VLSI architecture.

Applications of Distributed Arithmetic to Digital Signal Processing:

Automatic Evaluation of the Accuracy of Fixed-point Algorithms Daniel MENARD 1, Olivier SENTIEYS 1,2 1 LASTI, University of Rennes 1 Lannion, FRANCE 2.

1. Adaptive System Identification Configuration[2] The adaptive system identification is primarily responsible for determining a discrete estimation of.

Recursive Architectures for 2DLNS Multiplication RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS - UNIVERSITY OF WINDSOR 11 Recursive Architectures for 2DLNS.

ELEC692 VLSI Signal Processing Architecture Lecture 12 Numerical Strength Reduction.

COP3502: Introduction to Computer Science Yashas Shankar Hardware.

CS151 Introduction to Digital Design Chapter 4: Arithmetic Functions and HDLs 4-1: Iterative Combinational Circuits 4-2: Binary Adders 1Created by: Ms.Amany.

CORDIC (Coordinate rotation digital computer)

Unit IV Finite Word Length Effects

Invitation to Computer Science, C++ Version, Fourth Edition

Combinational Logic Design

CORDIC (Coordinate rotation digital computer)

Computer Architecture & Operations I

ELECTRONICS AND COMMUNICATION ENGINEERING

EEE4176 Applications of Digital Signal Processing

3.1 Introduction Why do we need also a frequency domain analysis (also we need time domain convolution):- 1) Sinusoidal and exponential signals occur.

Research Institute for Future Media Computing

Overview of Residue Number System (RNS) for Advanced VLSI Design and VLSI Signal Processing NTUEE 吳安宇.

KU College of Engineering Elec 204: Digital Systems Design

Outline Introduction Floating Point Arithmetic Adder Multiplier.

Invitation to Computer Science, Java Version, Third Edition

Recent from Dr. Dan Lo regarding 12/11/17 Dept Exam

Boolean Algebra and Digital Logic

Chapter One: Introduction

Tree and Array Multipliers

Arithmetic Where we've been:

Text Book Computer Organization and Architecture: Designing for Performance, 7th Ed., 2006, William Stallings, Prentice-Hall International, Inc.

EE 445S Real-Time Digital Signal Processing Lab Spring 2014

Lect5 A framework for digital filter design

CSE 370 – Winter 2002 – Comb. Logic building blocks - 1

Computer Organization

Multiplier-less Multiplication by Constants

Programmable Configurations

Applications of Distributed Arithmetic to Digital Signal Processing:

Overview Part 1 - Registers, Microoperations and Implementations

Overview Part 1 – Design Procedure Part 2 – Combinational Logic

University of Texas at Austin

Data Wordlength Reduction for Low-Power Signal Processing Software

C Model Sim (Fixed-Point) -A New Approach to Pipeline FFT Processor

ECE 352 Digital System Fundamentals

Prof. Giancarlo Succi, Ph.D., P.Eng.

Recent from Dr. Dan Lo regarding 12/11/17 Dept Exam

Memory System Performance Chapter 3

Lecture #17 INTRODUCTION TO THE FAST FOURIER TRANSFORM ALGORITHM

Instruction execution and ALU

Computer Architecture

Computer Architecture

Presentation transcript:

Applications of Distributed Arithmetic to Digital Signal Processing: A Tutorial Review Ref: Stanley A. White, “Applications of Distributed Arithmetic to Digital Signal Processing: A Tutorial Review,” IEEE ASSP Magazine, July, 1989

Outline: Distributed Arithmetic (DA) Introduction Technical overview of DA Increasing the speed of DA multiplication Application of DA to a biquadratic digital filter Conclusions

Distributed Arithmetic (DA, 1974) Introduction The most-often encountered form of computation in DSP: Sum of product Dot-product Inner-product Executed most efficiently by DA By careful design one may reduce gate count in a signal processing arithmetic unit by a number seldom smaller then 50% and often as large as 80%

Technical overview of DA Advantage of DA: efficiency of mechanization A frequently argued: slowness because of its inherent bit-serial nature (not true) Some modifications to increase the speed by employing techniques: Plus more arithmetic operations expense of exponentially increased memory

Technique overview -continued I The sum of product: where xk is a 2’s-complement binary number scaled such that | xk |<1 Ak is fixed coefficients xk : {bk0, bk1, bk2……, bk(N-1) } , word length=N where bk0 is the sign bit Express each xk as: (2)代入(1) => (1) (2) (3)

Technique overview -continued II Critical step where K=No. of taps (inputs), N: Wordlength of Data (3) (4)

Technique overview -continued III Consider the equation (4) has only 2K possible values We can store it in a lookup-table(ROM): size=2*2K

Technique overview -continued IV Example Let no. of taps K=4 and the fixed coefficients are A1=0.72, A2=-0.3, A3=0.95, A4=0.11 We need 22K = 32-word ROM (k=4)

Example Unfolding

Example -continued I Hardware architecture

Example -continued II Shorten the table eq. (4) (5)

Example -continued II Hardware architecture Only 16 words of ROM are required,now.

Technique overview -advanced Input (6) 2‘s-complement (7)

Technique overview -advanced I 代入 Where and

Technique overview -advanced II Hardware architecture

Increase the speed of DA multiplication The way I: Plus more arithmetic operations Initial condition Even part Odd part

Increase the speed of DA multiplication Hardware architecture of way I -at the expense of linearly increased memory & arithmetic operation Odd part(sign} Even part Initial Condition 1/2*Q(0)

Increase the speed of DA multiplication The way II: - at the expense of exponentially increased memory ROM : 2*7 words => 1*128 words Hardware architecture

Application of DA to a biquadratic digital filter Transfer function of a typical biquadratic digital filter: The time-domain description:

Application of DA to a biquadratic digital filter Hardware architecture

Application of DA to a biquadratic digital filter The vector matrix equation The relationships between two equations

Application of DA to a biquadratic digital filter Normal form biquadratic filter

Application of DA to a biquadratic digital filter DA realization

Application of DA to a biquadratic digital filter Increase speed 1*3 BAAT DA structure 4*3 BAAT DA structure

Application of DA to a biquadratic digital filter Increase speed II 2048 word/ROM require 4 ROM 4 operations

Application of DA to a biquadratic digital filter Increase speed III 32 word/ROM require 8 ROM 8 operations

Conclusions DA is a very efficient means to mechanize computations that are dominated by inner products If performance/cost ratio is critical, DA should be seriously considered as a contender When a many computing methods are compared, DA should be considered. It is not always (but often) best, and never poorly.