Divide-and-Conquer Design


1 Divide-and-Conquer Design
Ladner-Fischer construction
$T(n) = T(n/2) + 1 = \log_2 n$
$C(n) = 2C(n/2) + n/2 = (n/2) \log_2 n$
Simple Ladner-Fischer parallel prefix network (its delay is optimal, but it has fan-out issues if implemented directly).
Prefix sum network built of two n/2-input networks and n/2 adders.
Fall 2008 Parallel Processing, Extreme Models
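The recurrences above can be checked with a small software model. The following is a minimal sketch, not the slide author's code: the function name `ladner_fischer_prefix` and its return convention are assumptions made for illustration. It computes the prefix results with two half-size sub-networks plus one adder per right-half output, and counts cells and delay levels along the way.

```python
def ladner_fischer_prefix(xs, op=lambda a, b: a + b):
    """Simple recursive prefix scheme (hypothetical helper).

    Returns (prefix results, number of adder cells, delay in levels).
    """
    n = len(xs)
    if n == 1:
        return list(xs), 0, 0
    half = n // 2
    # Two independent sub-networks, one per half of the inputs.
    left, c_left, t_left = ladner_fischer_prefix(xs[:half], op)
    right, c_right, t_right = ladner_fischer_prefix(xs[half:], op)
    # The last output of the left half fans out to every output of the
    # right half; this is the fan-out issue noted on the slide.
    carry = left[-1]
    combined = left + [op(carry, r) for r in right]
    cells = c_left + c_right + len(right)
    delay = max(t_left, t_right) + 1
    return combined, cells, delay


if __name__ == "__main__":
    results, cells, delay = ladner_fischer_prefix(list(range(1, 17)))
    print(results, cells, delay)
```

For n = 16 the counts agree with the closed forms: (n/2) log2 n cells and log2 n levels.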

2 8.4 Parallel Prefix Networks
$T(n) = T(n/2) + 2 = 2 \log_2 n - 1$
$C(n) = C(n/2) + n - 1 = 2n - 2 - \log_2 n$
This is the Brent-Kung parallel prefix network (its delay is actually $2 \log_2 n - 2$).
Figure: Prefix sum network built of one n/2-input network and $n - 1$ adders.
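The construction described here (pair up inputs, recurse on n/2 values, then fill in the remaining outputs) can be modelled as follows. This is a sketch under stated assumptions: `brent_kung_prefix` is a hypothetical name, n is assumed to be a power of two, and each signal carries its own depth so that the actual delay, rather than the recurrence's upper bound, can be observed.

```python
def brent_kung_prefix(xs, depths=None, op=lambda a, b: a + b):
    """Recursive Brent-Kung-style prefix scheme (hypothetical helper).

    Returns (prefix results, per-output depth, number of adder cells).
    Assumes len(xs) is a power of two.
    """
    n = len(xs)
    assert n >= 1 and n & (n - 1) == 0, "n must be a power of two"
    if depths is None:
        depths = [0] * n
    if n == 1:
        return list(xs), list(depths), 0
    half = n // 2
    # Stage 1: n/2 adders combine adjacent input pairs.
    pairs = [op(xs[2 * i], xs[2 * i + 1]) for i in range(half)]
    pair_depths = [max(depths[2 * i], depths[2 * i + 1]) + 1 for i in range(half)]
    # Stage 2: one n/2-input prefix network on the pair sums.
    sub, sub_depths, cells = brent_kung_prefix(pairs, pair_depths, op)
    # Stage 3: odd outputs come straight from the sub-network; even
    # outputs (except output 0) need one more adder each: n/2 - 1 adders.
    out = [None] * n
    out_depths = [0] * n
    out[0], out_depths[0] = xs[0], depths[0]
    for i in range(half):
        out[2 * i + 1] = sub[i]
        out_depths[2 * i + 1] = sub_depths[i]
        if i >= 1:
            out[2 * i] = op(sub[i - 1], xs[2 * i])
            out_depths[2 * i] = max(sub_depths[i - 1], depths[2 * i]) + 1
    cells += half + (half - 1)
    return out, out_depths, cells


if __name__ == "__main__":
    results, out_depths, cells = brent_kung_prefix(list(range(1, 17)))
    print(results, max(out_depths), cells)
```

The measured delay comes out one level below the recurrence because the innermost 2-input network has an empty final stage (n/2 - 1 = 0 adders).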

3 Example of Brent-Kung Parallel Prefix Network
Originally developed by Brent and Kung as part of a VLSI-friendly carry-lookahead adder.
One level of latency
$T(n) = 2 \log_2 n - 2$
$C(n) = 2n - 2 - \log_2 n$
Brent–Kung parallel prefix graph for n = 16.

4 Example of Kogge-Stone Parallel Prefix Network
$T(n) = \log_2 n$
$C(n) = (n - 1) + (n - 2) + (n - 4) + \cdots + n/2 = n \log_2 n - n + 1$
Optimal in delay, but too complex in number of cells and wiring pattern.
Kogge-Stone parallel prefix graph for n = 16.
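A level-by-level model makes the cell count concrete: at the level with span d, every output i >= d combines with the value d positions to its left, which costs n - d cells. The sketch below is illustrative only; the name `kogge_stone_prefix` is an assumption, not taken from the slides.

```python
def kogge_stone_prefix(xs, op=lambda a, b: a + b):
    """Kogge-Stone-style prefix computation (hypothetical helper).

    Returns (prefix results, number of cells, number of levels).
    """
    cur = list(xs)
    n = len(cur)
    cells = 0
    levels = 0
    dist = 1
    while dist < n:
        # All positions update simultaneously from the previous level.
        cur = [op(cur[i - dist], cur[i]) if i >= dist else cur[i]
               for i in range(n)]
        cells += n - dist
        levels += 1
        dist *= 2
    return cur, cells, levels


if __name__ == "__main__":
    print(kogge_stone_prefix(list(range(1, 17))))
```

For n = 16 the per-level cell counts are 15, 14, 12 and 8, which sum to n log2 n - n + 1.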

5 Comparison and Hybrid Parallel Prefix Networks
Brent/Kung: 6 levels, 26 cells
Kogge/Stone: 4 levels, 49 cells
Han/Carlson: 5 levels, 32 cells
Figure: A hybrid Brent–Kung / Kogge–Stone parallel prefix graph for n = 16.

6 Another Divide-and-Conquer Algorithm
$T(p) = T(p/2) + 1$
$T(p) = \log_2 p$
Strictly optimal algorithm, but requires commutativity.
Each vertical line represents a location in shared memory.
Another divide-and-conquer scheme for parallel prefix computation.
Fall 2010 Parallel Processing, Shared-Memory Parallelism
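The slide does not spell out the scheme, so the following sketch rests on an assumption: that the even-indexed and odd-indexed inputs are prefix-summed separately (in parallel) and then merged in a single step. That reading matches the recurrence T(p) = T(p/2) + 1 and explains the commutativity requirement, since the merge adds terms out of their original order. The name `even_odd_prefix` is hypothetical.

```python
def even_odd_prefix(xs, op=lambda a, b: a + b):
    """Even/odd divide-and-conquer prefix (assumed reading of the slide).

    Returns (prefix results, delay in levels). `op` must be associative
    AND commutative, because even and odd partial results are interleaved.
    """
    p = len(xs)
    if p == 1:
        return list(xs), 0
    # The two half-size problems are independent and could run in parallel.
    evens, t_even = even_odd_prefix(xs[0::2], op)  # x0, x0+x2, ...
    odds, t_odd = even_odd_prefix(xs[1::2], op)    # x1, x1+x3, ...
    out = []
    for k in range(p):
        i = k // 2
        if k == 0:
            out.append(evens[0])
        elif k % 2 == 0:
            # s[2i] = (x0 + x2 + ... + x[2i]) + (x1 + ... + x[2i-1])
            out.append(op(evens[i], odds[i - 1]))
        else:
            # s[2i+1] = (x0 + ... + x[2i]) + (x1 + ... + x[2i+1])
            out.append(op(evens[i], odds[i]))
    return out, max(t_even, t_odd) + 1


if __name__ == "__main__":
    print(even_odd_prefix(list(range(1, 17))))
```

With a non-commutative operator such as string concatenation the outputs would come out scrambled, which is the restriction the slide points to.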

7 Applications of the Prefix Sum Network
Numbering the 1 bits
Detecting the priority 1 bit

8 n-th Root of Unity
$x^n = 1$
$\omega_n^{k+n/2} = -\omega_n^k$
$\omega_{dn}^{dk} = \omega_n^k$
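These identities are easy to confirm numerically. The sketch below assumes the convention omega_n = e^{2*pi*i/n} (some texts use the negative exponent; the identities hold either way). The helper names `omega` and `check_root_properties` are hypothetical.

```python
import cmath


def omega(n, k=1):
    """k-th power of the principal n-th root of unity (assumed e^{2*pi*i/n})."""
    return cmath.exp(2j * cmath.pi * k / n)


def check_root_properties(n, k, d, tol=1e-9):
    """Return three booleans, one per identity on the slide (n must be even)."""
    is_root = abs(omega(n, k) ** n - 1) < tol                  # x^n = 1
    halving = abs(omega(n, k + n // 2) + omega(n, k)) < tol    # w^(k+n/2) = -w^k
    cancel = abs(omega(d * n, d * k) - omega(n, k)) < tol      # w_dn^dk = w_n^k
    return is_root, halving, cancel


if __name__ == "__main__":
    print(check_root_properties(n=8, k=3, d=5))
```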

9 DFT, DFT$^{-1}$
$O(n^2)$-time naïve algorithm using Horner evaluation
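As a rough illustration of the naïve approach: the DFT evaluates the polynomial with the given coefficients at each of the n roots of unity, and each evaluation takes O(n) steps with Horner's rule, for O(n^2) in total. The function names and the sign convention (positive exponent for the forward transform) are assumptions of this sketch.

```python
import cmath


def horner(coeffs, x):
    """Evaluate coeffs[0] + coeffs[1]*x + ... at x using Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def naive_dft(coeffs, inverse=False):
    """O(n^2) DFT: n Horner evaluations of O(n) each (hypothetical helper).

    Forward: y_k = A(w^k) with w = e^{2*pi*i/n} (assumed convention).
    Inverse: evaluate at w^{-k} and divide by n.
    """
    n = len(coeffs)
    sign = -1 if inverse else 1
    values = [horner(coeffs, cmath.exp(sign * 2j * cmath.pi * k / n))
              for k in range(n)]
    return [v / n for v in values] if inverse else values


if __name__ == "__main__":
    y = naive_dft([1, 2, 3, 4])
    print(y)
    print(naive_dft(y, inverse=True))
```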

10 Application of DFT to Smoothing or Filtering

11 Application of DFT to Spectral Analysis

12 Multiplying Two Polynomials

13 Multiplying Two Polynomials

14

15 Fast Fourier Transform (FFT)

16 Fast Fourier Transform (FFT)
$y_i = u_i + \omega_n^i v_i \quad (0 \le i < n/2)$
$y_{i+n/2} = u_i + \omega_n^{i+n/2} v_i = u_i - \omega_n^i v_i$
$T(n) = 2T(n/2) + n = n \log_2 n$ sequentially
$T(n) = T(n/2) + 1 = \log_2 n$ in parallel
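The butterfly equations above translate directly into a recursive procedure. The sketch below assumes that u and v are the n/2-point transforms of the even- and odd-indexed inputs (the slide does not define them), that n is a power of two, and that omega_n = e^{2*pi*i/n}. The name `fft` is used only for illustration.

```python
import cmath


def fft(a):
    """Recursive radix-2 FFT sketch; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    u = fft(a[0::2])  # transform of even-indexed inputs (assumed meaning of u)
    v = fft(a[1::2])  # transform of odd-indexed inputs (assumed meaning of v)
    y = [0] * n
    for i in range(n // 2):
        t = cmath.exp(2j * cmath.pi * i / n) * v[i]  # w_n^i * v_i
        y[i] = u[i] + t            # y_i       = u_i + w_n^i v_i
        y[i + n // 2] = u[i] - t   # y_{i+n/2} = u_i - w_n^i v_i
    return y


if __name__ == "__main__":
    print(fft([1, 2, 3, 4, 0, 0, 0, 0]))
```

Run sequentially, the two recursive calls plus the O(n) combining loop give the n log2 n bound. With the two calls and all n/2 butterflies executed in parallel, only the log2 n recursion depth remains.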

17

18

19

20 Computation Scheme for 16-Point FFT

21 Parallel Architectures for FFT
Butterfly network for an 8-point FFT.

