Published by Trevor Miller. Modified over 8 years ago.
1
18.337 Parallel FFT in Julia
2
18.337 Review of FFT
3
18.337 Review of FFT (cont.)
4
18.337 Review of FFT (cont.)
5
18.337 Sequential FFT Pseudocode
Recursive-FFT ( array )
- arrayEven = even-indexed elements of array
- arrayOdd = odd-indexed elements of array
- Recursive-FFT ( arrayEven )
- Recursive-FFT ( arrayOdd )
- Combine results using Cooley-Tukey butterfly
- Bit reversal, could be done either before, after or in between
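The pseudocode above can be sketched in Julia roughly as follows. This is a minimal radix-2 version, assuming the input length is a power of 2; the name `recursive_fft` is illustrative and not from the slides. Because the even/odd split is done by slicing at every level, no separate bit-reversal pass is needed in this form.

```julia
# Radix-2 decimation-in-time FFT sketch; assumes length(x) is a power of 2.
function recursive_fft(x::AbstractVector{<:Number})
    n = length(x)
    @assert n >= 1 && ispow2(n) "length must be a power of 2"
    n == 1 && return ComplexF64[x[1]]
    # 1-based slices: x[1:2:end] holds the even-indexed samples in 0-based terms.
    even = recursive_fft(x[1:2:end])
    odd  = recursive_fft(x[2:2:end])
    half = n ÷ 2
    y = Vector{ComplexF64}(undef, n)
    for k in 0:half-1
        # Cooley-Tukey butterfly with twiddle factor exp(-2πi k / n).
        t = exp(-2π * im * k / n) * odd[k+1]
        y[k+1]        = even[k+1] + t
        y[k+1+half]   = even[k+1] - t
    end
    return y
end
```

For example, `recursive_fft([1.0, 0.0, 0.0, 0.0])` returns a vector of four ones, the transform of a unit impulse.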
6
18.337 Parallel FFT Algorithms
- Binary Exchange Algorithm
- Transpose Algorithm
7
18.337 Binary Exchange Algorithm
8
18.337 Binary Exchange Algorithm
9
18.337 Binary Exchange Algorithm
10
18.337 Parallel Transpose Algorithm
11
18.337 Parallel Transpose Algorithm
12
18.337 Julia implementations
- Input is represented as distributed arrays.
- Assumptions: N, P are powers of 2 for the sake of simplicity
- Focus is on minimizing communication overhead rather than on computational optimizations
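To make the N and P assumption concrete, here is a small sketch of index ownership. It assumes a contiguous block layout in which each of the P processors holds N ÷ P consecutive elements; the slides do not state the layout, and `block_owner` and `local_range` are hypothetical helper names.

```julia
# Assumed layout: processor p (1-based) holds N ÷ P consecutive elements.
# With N and P both powers of 2 and P <= N, the division is exact.
block_owner(i::Int, N::Int, P::Int) = (i - 1) ÷ (N ÷ P) + 1

# Range of global (1-based) indices stored on processor p under the same layout.
local_range(p::Int, N::Int, P::Int) = ((p - 1) * (N ÷ P) + 1):(p * (N ÷ P))
```

With N = 16 and P = 4, element 5 belongs to processor 2, whose local range is `5:8`.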
13
18.337 Easy
Recursive-FFT ( array )
- ………….
- @spawn Recursive-FFT ( arrayEven )
- @spawn Recursive-FFT ( arrayOdd )
- ………….
Same as FFTW parallel implementation for 1D input using Cilk.
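A rough sketch of this spawning pattern in present-day Julia might look like the following. It assumes the `Distributed` standard library, where `@spawnat :any` lets the scheduler pick a process much as the older `@spawn` did. The function name `spawn_fft` and the `depth` cutoff are additions for illustration; the cutoff only keeps the sketch from spawning a task for every tiny sub-array.

```julia
using Distributed          # start Julia with `-p 4` or call addprocs first to get workers
@everywhere using Distributed

@everywhere function spawn_fft(x::Vector{ComplexF64}, depth::Int=2)
    n = length(x)
    n == 1 && return copy(x)
    # Slice before spawning so only the half that is needed gets shipped.
    even, odd = x[1:2:end], x[2:2:end]
    if depth > 0
        # Placement is left to the scheduler, so data may move to an arbitrary process.
        fe = @spawnat :any spawn_fft(even, depth - 1)
        fo = @spawnat :any spawn_fft(odd, depth - 1)
        e, o = fetch(fe), fetch(fo)
    else
        # Below the cutoff, recurse locally without spawning.
        e, o = spawn_fft(even, 0), spawn_fft(odd, 0)
    end
    w = [exp(-2π * im * k / n) for k in 0:(n ÷ 2)-1]   # twiddle factors
    return vcat(e .+ w .* o, e .- w .* o)
end
```

Without any worker processes the spawned calls simply run on the main process, so the sketch still produces the transform, just without parallelism.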
14
18.337 Not so fast
Too much unnecessary overhead because of random spawning. Better:
Recursive-FFT ( array )
- ………….
- @spawnat owner Recursive-FFT ( arrayEven )
- @spawnat owner Recursive-FFT ( arrayOdd )
- ………….
15
18.337 Binary Exchange Implementation
FFT_Parallel ( array )
- Bit reverse input array and distribute
- @spawnat owner Recursive-FFT ( first half )
- @spawnat owner Recursive-FFT ( last half )
- Combine results
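The steps above can be sketched as follows. After bit reversal, the first half of the array holds the even-indexed inputs and the last half the odd-indexed ones, each already in bit-reversed order, so the two halves can be transformed independently with in-place butterflies and then combined in one final stage. This is only an illustration under assumptions: it uses `@spawnat :any` instead of an owner process because no distributed array is involved, and the names `bitreverse_permute`, `butterflies!` and `fft_parallel_sketch` are not from the slides.

```julia
using Distributed

# In-place butterfly stages for data that is already in bit-reversed order.
@everywhere function butterflies!(y::Vector{ComplexF64})
    n = length(y)
    len = 2
    while len <= n
        half = len ÷ 2
        for start in 1:len:n, k in 0:half-1
            w = exp(-2π * im * k / len)
            a, b = y[start+k], w * y[start+k+half]
            y[start+k], y[start+k+half] = a + b, a - b
        end
        len *= 2
    end
    return y
end

# Move element i (0-based) to the position whose index has i's bits reversed.
function bitreverse_permute(x::AbstractVector)
    n = length(x)
    bits = trailing_zeros(n)            # log2(n) when n is a power of 2
    y = similar(x)
    for i in 0:n-1
        r = 0
        for b in 0:bits-1
            r |= ((i >> b) & 1) << (bits - 1 - b)
        end
        y[r+1] = x[i+1]
    end
    return y
end

function fft_parallel_sketch(x::Vector{ComplexF64})
    n = length(x)
    @assert n >= 2 && ispow2(n) "length must be a power of 2, at least 2"
    y = bitreverse_permute(x)
    h = n ÷ 2
    first_half, last_half = y[1:h], y[h+1:end]
    fe = @spawnat :any butterflies!(first_half)   # FFT of even-indexed inputs
    fo = @spawnat :any butterflies!(last_half)    # FFT of odd-indexed inputs
    e, o = fetch(fe), fetch(fo)
    w = [exp(-2π * im * k / n) for k in 0:h-1]
    return vcat(e .+ w .* o, e .- w .* o)         # final combine stage
end
```

In a real distributed-array setting the two spawns would target the processes that already hold each half, which is the point of `@spawnat owner` on the slide.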
16
18.337 Binary Exchange Implementation
17
18.337 Binary Exchange Implementation
18
18.337 Binary Exchange Implementation
19
18.337 Binary Exchange Implementation
20
18.337 Binary Exchange Implementation
21
18.337 Alternate approach – Black box
- Initially similar to parallel transpose method: data is distributed so that each sub-problem is locally contained within one node
FFT_Parallel ( array )
- Bit reverse input array and distribute equally
- for each processor
  - @spawnat proc FFT-Sequential ( local array )
- Redistribute data and combine locally
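One way to picture the black-box idea is the sketch below. It assumes a strided split, so that sub-problem p receives elements p, p+P, p+2P, ... (0-based), runs any sequential FFT on each piece, and then combines with the identity X[k] = Σ_p w^(p·k) · F_p[k mod (N/P)], where w = exp(-2πi/N). The direct O(N·P) combine and the names `blackbox_fft` and `naive_dft` are illustrative assumptions, with `naive_dft` standing in for a library routine; the slides' own redistribution and combine steps are not shown in the source.

```julia
using Distributed

# Hypothetical stand-in for a black-box sequential FFT such as a library routine.
@everywhere function naive_dft(x)
    n = length(x)
    return [sum(x[j+1] * exp(-2π * im * j * k / n) for j in 0:n-1) for k in 0:n-1]
end

function blackbox_fft(x::Vector{ComplexF64}, P::Int; local_fft=naive_dft)
    N = length(x)
    @assert ispow2(N) && ispow2(P) && P <= N
    M = N ÷ P
    # Sub-problem p gets x[p], x[p+P], ... (0-based). This input order is what
    # makes each local result reusable in the combine step.
    parts = [x[p+1:P:end] for p in 0:P-1]
    futures = map(parts) do part
        @spawnat :any local_fft(part)      # one independent sequential FFT per piece
    end
    F = [fetch(f) for f in futures]
    # Combine the P local transforms of length M into the length-N result.
    return [sum(exp(-2π * im * p * k / N) * F[p+1][mod(k, M)+1] for p in 0:P-1)
            for k in 0:N-1]
end
```

Called as `blackbox_fft(ComplexF64.(1:16), 4)`, this runs four independent length-4 transforms and merges them; passing a different `local_fft` swaps in another sequential routine without touching the rest.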
22
18.337 Alternate approach
23
18.337 Alternate approach – Black box
Pros:
- Eliminates the need for redundant spawning, which is expensive
- Can leverage black box packages such as FFTW for the local sequential run, or black box combiners
- Warning: order of input to sub-problems is important
Note:
- Have not tried FFTW due to configuration issues
24
18.337 Benchmark Caveats
array = redist(array, 1, 5)
25
18.337 Benchmark Results
FFTW | Binary Exchange | Black Box | Communication | Black Box 8p | Communication 8p
0.0002 | 11.938756 | 0.10948204 | 0.150836 | 0.37363982 | 0.4737592
0.0009 | 193.102961 | 0.18138098 | 0.1402709 | 0.59792995 | 0.416522
0.128 | A year? | 8.28874588 | 1.19512519.86574506 | 1.701771
6.3023 | Apocalypse | 290.63422230.29218314.7306944.75283
26
18.337 Benchmark Results
27
18.337 Communication issues
28
18.337 New Results
FFTW | Black Box Improved Redistribution | Communication for Improved Redistribution | Black Box Improved Redistribution 8p | Communication for Improved Redistribution 8p
0.0002 | 0.10238204 | 0.08972 | 0.154734 | 0.145179
0.0009 | 0.247728 | 2.703539 | 0.284916 | 0.191254
0.128 | 8.389301 | 0.783675 | 8.717208 | 1.014697
6.3023 | 287.016621.3901287.400226.08323
29
18.337 New Results
30
18.337 More issues and considerations
- Communication cost: where and how. Better redistribution method.
- Leverage of sequential FFTW on black box problems
- A separate algorithm, better data distribution?
31
18.337 Questions