Divide-and-Conquer Design

Slides:



Advertisements
Similar presentations
DFT & FFT Computation.
Advertisements

David Hansen and James Michelussi
Recent Developments in Theory and Implementation of Parallel Prefix Adders Neil Burgess Division of Electronics Cardiff School of Engineering Cardiff University.
Fast Fourier Transform for speeding up the multiplication of polynomials an Algorithm Visualization Alexandru Cioaca.
UNIVERSITY OF MASSACHUSETTS Dept
Parallel Fast Fourier Transform Ryan Liu. Introduction The Discrete Fourier Transform could be applied in science and engineering. Examples: ◦ Voice recognition.
Institute of Applied Microelectronics and Computer Engineering College of Computer Science and Electrical Engineering, University of Rostock Slide 1 Spezielle.
Fast Fourier Transform Lecture 6 Spoken Language Processing Prof. Andrew Rosenberg.
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Parallel Programming in C with MPI and OpenMP Michael J. Quinn.
1 A Timing-Driven Synthesis Approach of a Fast Four-Stage Hybrid Adder in Sum-of-Products Sabyasachi Das University of Colorado, Boulder Sunil P. Khatri.
1 Design of a Parallel-Prefix Adder Architecture with Efficient Timing-Area Tradeoff Characteristic Sabyasachi Das University of Colorado, Boulder Sunil.
Reconfigurable Computing S. Reda, Brown University Reconfigurable Computing (EN2911X, Fall07) Lecture 16: Application-Driven Hardware Acceleration (1/4)
Fast Fourier Transform (FFT) (Section 4.11) CS474/674 – Prof. Bebis.
Lecture 17: Adders.
Introduction to Algorithms
Lecture 12b: Adders. CMOS VLSI DesignCMOS VLSI Design 4th Ed. 17: Adders 2 Generate / Propagate  Equations often factored into G and P  Generate and.
Parallel Prefix Adders A Case Study
Introduction to CMOS VLSI Design Lecture 11: Adders David Harris Harvey Mudd College Spring 2004.
Fast Fourier Transform Irina Bobkova. Overview I. Polynomials II. The DFT and FFT III. Efficient implementations IV. Some problems.
Fast Fourier Transforms
Institute of Applied Microelectronics and Computer Engineering College of Computer Science and Electrical Engineering, University of Rostock Slide 1 Spezielle.
Abdullah Aldahami ( ) Feb26, Introduction 2. Feedback Switch Logic 3. Arithmetic Logic Unit Architecture a.Ripple-Carry Adder b.Kogge-Stone.
CS 6068 Parallel Computing Fall 2013 Lecture 10 – Nov 18 The Parallel FFT Prof. Fred Office Hours: MWF.
FFT USING OPEN-MP Done by: HUSSEIN SALIM QASIM & Tiba Zaki Abdulhameed
Hossein Sameti Department of Computer Engineering Sharif University of Technology.
5.6 Convolution and FFT. 2 Fast Fourier Transform: Applications Applications. n Optics, acoustics, quantum physics, telecommunications, control systems,
The Fast Fourier Transform
Karatsuba’s Algorithm for Integer Multiplication
Radix-2 2 Based Low Power Reconfigurable FFT Processor Presented by Cheng-Chien Wu, Master Student of CSIE,CCU 1 Author: Gin-Der Wu and Yi-Ming Liu Department.
Fast Adders: Parallel Prefix Network Adders, Conditional-Sum Adders, & Carry-Skip Adders ECE 645: Lecture 5.
Institute of Applied Microelectronics and Computer Engineering College of Computer Science and Electrical Engineering, University of Rostock Slide 1 Spezielle.
Applied Symbolic Computation1 Applied Symbolic Computation (CS 300) Karatsuba’s Algorithm for Integer Multiplication Jeremy R. Johnson.
The Fast Fourier Transform and Applications to Multiplication
ACCESS IC LAB Graduate Institute of Electronics Engineering, NTU Under-Graduate Project Case Study: Single-path Delay Feedback FFT Speaker: Yu-Min.
Block p and g Generators. Carry Determination as Prefix Computations Two Contiguous (or Overlapping) Blocks (g’, p’) and (g’’, p’’) Merged Block (g, p)
Unrolling Carry Recurrence
Conditional-Sum Adders Parallel Prefix Network Adders
1 Mathematical Algorithms 1. Arithmetic, (Pseudo) Random Numbers and all that 2. Evaluation & Multiplication of Polynomials I 3. Multiplication of Large.
EE466: VLSI Design Lecture 13: Adders
Adding the Superset Adder to the DesignWare IP Library
Fast Fourier Transforms. 2 Discrete Fourier Transform The DFT pair was given as Baseline for computational complexity: –Each DFT coefficient requires.
Low Power Design for a 64 point FFT Processor
Lecture 3: Parallel Algorithm Design
CORDIC (Coordinate rotation digital computer)
Conditional-Sum Adders Parallel Prefix Network Adders
DIGITAL SIGNAL PROCESSING ELECTRONICS
7.1 What is a Sorting Network?
Data Structures and Algorithms (AT70. 02) Comp. Sc. and Inf. Mgmt
Polynomial + Fast Fourier Transform
Adder wrap-up Lecture
Part II Circuit-Level Parallelism
VLSI Arithmetic Lecture 5
Fast Fourier Transform (FFT) (Section 4.11)
FAST FOURIER TRANSFORM ALGORITHMS
A New Approach to Pipeline FFT Processor
DFT and FFT By using the complex roots of unity, we can evaluate and interpolate a polynomial in O(n lg n) An example, here are the solutions to 8 =
Lecture #17 INTRODUCTION TO THE FAST FOURIER TRANSFORM ALGORITHM
Parallelism in Computer Arithmetic:
Real-time 1-input 1-output DSP systems
Multiplier-less Multiplication by Constants
Lecture 6 More Divide-Conquer and Paradigm #4 Data Structure.
Chapter 9 Computation of the Discrete Fourier Transform
Parallel Processing, Extreme Models
Part III The Arithmetic/Logic Unit
Lecture 9 Digital VLSI System Design Laboratory
The Cooley-Tukey decimation-in-time algorithm
Speaker: Chris Chen Advisor: Prof. An-Yeu Wu Date: 2014/10/28
Lecture #17 INTRODUCTION TO THE FAST FOURIER TRANSFORM ALGORITHM
Conditional-Sum Adders Parallel Prefix Network Adders
Authors: Ding-Yuan Lee, Ching-Che Wang, An-Yeu Wu Publisher: 2019 VLSI
Presentation transcript:

Divide-and-Conquer Design Ladner-Fischer construction T(n) = T(n/2) + 1 = log2n C(n) = 2C(n/2) + n/2 = (n/2) log2n Simple Ladner-Fisher Parallel prefix network (its delay is optimal, but has fan-out issues if implemented directly) Prefix sum network built of two n/2-input networks and n/2 adders. Fall 2008 Parallel Processing, Extreme Models

8.4 Parallel Prefix Networks T(n) = T(n/2) + 2 = 2 log2n – 1 C(n) = C(n/2) + n – 1 = 2n – 2 – log2n This is the Brent-Kung Parallel prefix network (its delay is actually 2 log2n – 2) Fig. 8.6 Prefix sum network built of one n/2-input network and n – 1 adders. Fall 2008 Parallel Processing, Extreme Models

Example of Brent-Kung Parallel Prefix Network Originally developed by Brent and Kung as part of a VLSI-friendly carry lookahead adder One level of latency T(n) = 2 log2n – 2 C(n) = 2n – 2 – log2n Brent–Kung parallel prefix graph for n = 16. Fall 2008 Parallel Processing, Extreme Models

Example of Kogge-Stone Parallel Prefix Network T(n) = log2n C(n) = (n – 1) + (n – 2) + (n – 4) + . . . + n/2 = n log2n – n + 1 Optimal in delay, but too complex in number of cells and wiring pattern Kogge-Stone parallel prefix graph for n = 16. Fall 2008 Parallel Processing, Extreme Models

Comparison and Hybrid Parallel Prefix Networks Brent/Kung 6 levels 26 cells Kogge/Stone 4 levels 49 cells Fig. 8.10 A hybrid Brent–Kung / Kogge–Stone parallel prefix graph for n = 16. Han/Carlson 5 levels 32 cells Fall 2008 Parallel Processing, Extreme Models

Another Divide-and-Conquer Algorithm T(p) = T(p/2) + 1 T(p) = log2 p Strictly optimal algorithm, but requires commutativity Each vertical line represents a location in shared memory Another divide-and-conquer scheme for parallel prefix computation. Fall 2010 Parallel Processing, Shared-Memory Parallelism

کاربردهایی از شبکه جمع پیشوندی شماره بیتهای یک کشف بیت یک دارای اولویت

n-th Root of Unity Xn = 1 wnk+n/2 = -wnk wndkd = wnk

DFT, DFT-1 O(n2)-time naïve algorithm using Horner-evaluation

Application of DFT to Smoothing or Filtering Fall 2008 Parallel Processing, Extreme Models

Application of DFT to Spectral Analysis Fall 2008 Parallel Processing, Extreme Models

Multiplying Two Polynomial

Multiplying Two Polynomial

Fast Fourier Transform (FFT)

Parallel Processing, Extreme Models Fast Fourier Transform (FFT) yi = ui + wni vi (0  i < n/2) yi+n/2 = ui + wni+n/2 vi = ui - wni vi T(n) = 2T(n/2) + n = n log2n sequentially T(n) = T(n/2) + 1 = log2n in parallel Fall 2008 Parallel Processing, Extreme Models

Computation Scheme for 16-Point FFT   Fall 2008 Parallel Processing, Extreme Models

Parallel Architectures for FFT Butterfly network for an 8-point FFT. Parallel Processing, Extreme Models