Applications of Distributed Arithmetic to Digital Signal Processing:

Slides:

Advertisements

Similar presentations

© 2003 Xilinx, Inc. All Rights Reserved Course Wrap Up DSP Design Flow.

Advertisements

CSE 246: Computer Arithmetic Algorithms and Hardware Design Instructor: Prof. Chung-Kuan Cheng Fall 2006 Lecture 11 Cordic, Log, Square, Exponential Functions.

Lecture 4 Introduction to Digital Signal Processors (DSPs) Dr. Konstantinos Tatas.

Distributed Arithmetic

1 KU College of Engineering Elec 204: Digital Systems Design Lecture 9 Programmable Configurations Read Only Memory (ROM) – –a fixed array of AND gates.

Design of a Power-Efficient Interleaved CIC Architecture for Software Defined Radio Receivers By J.Luis Tecpanecatl-Xihuitl, Ruth Aguilar-Ponce, Ashok.

CENG536 Computer Engineering Department Çankaya University.

ENGIN112 L4: Number Codes and Registers ENGIN 112 Intro to Electrical and Computer Engineering Lecture 4 Number Codes and Registers.

Digital Signal Processing (DSP) Fundamentals. Overview What is DSP? Converting Analog into Digital –Electronically –Computationally How Does It Work?

ELEC692 VLSI Signal Processing Architecture Lecture 9 VLSI Architecture for Discrete Cosine Transform.

ACCESS IC LAB Graduate Institute of Electronics Engineering, NTU Why Systolic Architecture ? VLSI Signal Processing 台灣大學電機系吳安宇.

Distributed arithmetic SYSC5603 (ELG6163) Digital Signal Processing Microprocessors, Software and Applications Miodrag Bolic All the slides have been copied.

Distributed Arithmetic: Implementations and Applications

A COMPARATIVE STUDY OF MULTIPLY ACCCUMULATE IMPLEMENTATIONS ON FPGAS Using Distributed Arithmetic and Residue Number System.

Using Programmable Logic to Accelerate DSP Functions 1 Using Programmable Logic to Accelerate DSP Functions “An Overview“ Greg Goslin Digital Signal Processing.

GPGPU platforms GP - General Purpose computation using GPU

ACCESS IC LAB Graduate Institute of Electronics Engineering, NTU A New Algorithm to Compute the Discrete Cosine Transform VLSI Signal Processing 台灣大學電機系.

Prepared by: Hind J. Zourob Heba M. Matter Supervisor: Dr. Hatem El-Aydi Faculty Of Engineering Communications & Control Engineering.

ACCESS IC LAB Graduate Institute of Electronics Engineering, NTU Multirate Processing of Digital Signals: Fundamentals VLSI Signal Processing 台灣大學電機系吳安宇.

ACCESS IC LAB Graduate Institute of Electronics Engineering, NTU Adaptive Signal Processing -Equalization- Ben Advisor: Prof. An-Yeu Wu 2008/01/22.

1 Miodrag Bolic ARCHITECTURES FOR EFFICIENT IMPLEMENTATION OF PARTICLE FILTERS Department of Electrical and Computer Engineering Stony Brook University.

A Bit-Serial Method of Improving Computational Efficiency of Dot-Products 1.

ACCESS IC LAB Graduate Institute of Electronics Engineering, NTU Digital System Design Course Introduction Lecturer ：吳安宇 Date ： 2004/02/20.

Fixed-Point Arithmetics: Part II

Introduction to Compressive Sensing

Fall 2012: FCM 708 Foundation I Lecture 2 Prof. Shamik Sengupta

ACCESS IC LAB Graduate Institute of Electronics Engineering, NTU Digital System Design Course Introduction Lecturer ：吳安宇 Date ： 2005/2/25.

DSP Processors We have seen that the Multiply and Accumulate (MAC) operation is very prevalent in DSP computation computation of energy MA filters AR filters.

D ISTRIBUTED A RITHMETIC (DA) 1. D EFINITION DA is basically (but not necessarily) a bit- serial computational operation that forms an inner (dot) product.

Chapter 0 deSiGn conCepTs EKT 221 / 4 DIGITAL ELECTRONICS II.

ACCESS IC LAB Graduate Institute of Electronics Engineering, NTU Undergraduate Projects Speaker: Wes Adviser: Prof. An-Yeu Wu Date: 2015/09/22 Lab.

Arithmetic Logic Unit (ALU) Anna Kurek CS 147 Spring 2008.

ACCESS IC LAB Graduate Institute of Electronics Engineering, NTU CORDIC (Coordinate rotation digital computer) Ref: Y. H. Hu, “CORDIC based VLSI architecture.

ACCESS IC LAB Graduate Institute of Electronics Engineering, NTU Under-Graduate Project Case Study: Single-path Delay Feedback FFT Speaker: Yu-Min.

Automatic Evaluation of the Accuracy of Fixed-point Algorithms Daniel MENARD 1, Olivier SENTIEYS 1,2 1 LASTI, University of Rennes 1 Lannion, FRANCE 2.

1. Adaptive System Identification Configuration[2] The adaptive system identification is primarily responsible for determining a discrete estimation of.

ACCESS IC LAB Graduate Institute of Electronics Engineering, NTU Low-Power CMOS Design For Advanced VLSI Design and VLSI Signal Processing Courses

Overview of Residue Number System (RNS) for Advanced VLSI Design and VLSI Signal Processing NTUEE 吳安宇.

Speaker: Darcy Tsai Advisor: Prof. An-Yeu Wu Date: 2013/10/31

Recursive Architectures for 2DLNS Multiplication RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS - UNIVERSITY OF WINDSOR 11 Recursive Architectures for 2DLNS.

ACCESS IC LAB Graduate Institute of Electronics Engineering, NTU CORDIC (COordinate Rotation DIgital Computer) For Advanced VLSI and VLSI Signal Processing.

Chapter 6 Discrete-Time System. 2/90  Operation of discrete time system 1. Discrete time system where and are multiplier D is delay element Fig. 6-1.

ACCESS IC LAB Graduate Institute of Electronics Engineering, NTU 99-1 Under-Graduate Project Design of Datapath Controllers Speaker: Shao-Wei Feng Adviser:

ACCESS IC LAB Graduate Institute of Electronics Engineering, NTU Multirate Processing of Digital Signals (II): Short-Length FIR Filter VLSI Signal Processing.

ELEC692 VLSI Signal Processing Architecture Lecture 12 Numerical Strength Reduction.

ACCESS IC LAB Graduate Institute of Electronics Engineering, NTU Brief Overview of Residue Number System (RNS) VLSI Signal Processing 台灣大學電機系吳安宇.

EE141 Arithmetic Circuits 1 Chapter 14 Arithmetic Circuits Rev /12/2003 Rev /05/2003.

CS151 Introduction to Digital Design Chapter 4: Arithmetic Functions and HDLs 4-1: Iterative Combinational Circuits 4-2: Binary Adders 1Created by: Ms.Amany.

FFT VLSI Implementation

VLSI SP Course 2001 台大電機吳安宇 1 Why Systolic Architecture ? H. T. Kung Carnegie-Mellon University.

Analog-Digital Conversion. Other types of ADC i. Dual Slope ADCs use a capacitor connected to a reference voltage. the capacitor voltage starts at zero.

CORDIC (Coordinate rotation digital computer)

Combinational Logic Design

CORDIC (Coordinate rotation digital computer)

ELECTRONICS AND COMMUNICATION ENGINEERING

Introduction to Microprocessors

Adaptation Behavior of Pipelined Adaptive Filters

ECE 434 Advanced Digital System L13

Brief Overview of Residue Number System (RNS)

EE 445S Real-Time Digital Signal Processing Lab Spring 2014

Applications of Distributed Arithmetic to Digital Signal Processing:

Chapter 6 Discrete-Time System

University of Texas at Austin

C Model Sim (Fixed-Point) -A New Approach to Pipeline FFT Processor

FFT VLSI Implementation

1.Introduction to Advanced Digital Design (14 marks)

Fixed-point Analysis of Digital Filters

Chapter 14 Arithmetic Circuits (II): Multiplier Rev /12/2003

Applications of Distributed Arithmetic to Digital Signal Processing:

Computer Architecture

Presentation transcript:

Applications of Distributed Arithmetic to Digital Signal Processing: A Tutorial Review VLSI Signal Processing 台灣大學電機系吳安宇 Ref: Stanley A. White, “Applications of Distributed Arithmetic to Digital Signal Processing: A Tutorial Review,” IEEE ASSP Magazine, July, 1989

Outline Distributed Arithmetic (DA) Introduction Technical overview of DA Increasing the speed of DA multiplication Application of DA to a biquadratic digital filter Conclusions

Distributed Arithmetic (DA, 1974) Introduction The most-often encountered form of computation in DSP: Sum of product Dot-product Inner-product Executed most efficiently by DA By careful design one may reduce gate count in a signal processing arithmetic unit by a number seldom smaller then 50% and often as large as 80%

Technical overview of DA Advantage of DA: efficiency of mechanization A frequently argued: Slowness because of its inherent bit-serial nature (not true) Some modifications to increase the speed by employing techniques: Plus more arithmetic operations Expense of exponentially increased memory

Technique overview -continued I The sum of product: where xk is a 2’s-complement binary number scaled such that | xk |<1 Ak is fixed coefficients xk : {bk0, bk1, bk2……, bk(N-1) } , word length=N where bk0 is the sign bit Express each xk as: (2)代入(1) => (1) (2) (3)

Technique overview -continued II Critical step where K=No. of taps (inputs), N: Wordlength of Data (3) (4)

Technique overview -continued III Consider the equation (4) has only 2K possible values We can store it in a lookup-table(ROM): size=2*2K

Technique overview -continued IV Example Let no. of taps K=4 and the fixed coefficients are A1=0.72, A2=-0.3, A3=0.95, A4=0.11 We need 22K = 32-word ROM (k=4)

Example Unfolding

Example -continued I Hardware architecture

Example -continued II Shorten the table eq. (4) (5)

Example -continued II Hardware architecture Only 16 words of ROM are required,now.

Technique overview -advanced Input (6) 2‘s-complement (7)

Technique overview -advanced I 代入 Where and

Technique overview -advanced II Hardware architecture

Increase the speed of DA multiplication The way I: Plus more arithmetic operations Initial condition Even part Odd part

Increase the speed of DA multiplication Hardware architecture of way I -at the expense of linearly increased memory & arithmetic operation Initial Condition 1/2*Q(0) Odd part(sign} Even part

Increase the speed of DA multiplication The way II: - at the expense of exponentially increased memory ROM : 2*7 words => 1*128 words Hardware architecture

Application of DA to a biquadratic digital filter Transfer function of a typical biquadratic digital filter: The time-domain description:

Application of DA to a biquadratic digital filter Hardware architecture

Application of DA to a biquadratic digital filter The vector matrix equation The relationships between two equations

Application of DA to a biquadratic digital filter Normal form biquadratic filter

Application of DA to a biquadratic digital filter DA realization

Application of DA to a biquadratic digital filter Increase speed 1*3 BAAT DA structure 4*3 BAAT DA structure

Application of DA to a biquadratic digital filter Increase speed II 2048 word/ROM require 4 ROM 4 operations

Application of DA to a biquadratic digital filter Increase speed III 32 word/ROM require 8 ROM 8 operations

Conclusions DA is a very efficient means to mechanize computations that are dominated by inner products If performance/cost ratio is critical, DA should be seriously considered as a contender When a many computing methods are compared, DA should be considered. It is not always (but often) best, and never poorly.