Published by Blaze Collins. Modified over 9 years ago.
1
FFT Accelerator Project
Rohit Prakash, Anand Silodia
Date: June 7th, 2007
2
Objectives
Analysis using random input points
Percentage improvement (over the previous implementations)
Cache profiling
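A measurement of this kind needs reproducible random input and a user-time clock. The following is a minimal sketch of both, not the authors' harness: the helper names `fill_random` and `user_seconds` are hypothetical, and the use of POSIX `getrusage` for user time is an assumption (the slides only say "user time").

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Fill re[] and im[] with n pseudo-random points in [-1, 1].
   A fixed seed keeps runs comparable across implementations.
   (Hypothetical helper, not taken from the slides.) */
void fill_random(float *re, float *im, size_t n, unsigned seed)
{
    assert(re != NULL && im != NULL);
    srand(seed);
    for (size_t i = 0; i < n; i++) {
        re[i] = (float)(2.0 * rand() / ((double)RAND_MAX + 1.0) - 1.0);
        im[i] = (float)(2.0 * rand() / ((double)RAND_MAX + 1.0) - 1.0);
    }
}

/* User-mode CPU time consumed by this process so far, in seconds. */
double user_seconds(void)
{
    struct rusage ru;
    int rc = getrusage(RUSAGE_SELF, &ru);
    assert(rc == 0);
    (void)rc; /* silence unused warning when NDEBUG is defined */
    return (double)ru.ru_utime.tv_sec + (double)ru.ru_utime.tv_usec * 1e-6;
}
```

A run would call `user_seconds()` before and after the transform and report the difference, repeating the transform enough times that the interval is well above the clock resolution.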
3
Improvements
Calls to sine/cosine decreased
Separate arrays for powers and some other terms
–Division decreased
–Multiplications decreased
Error from last time corrected (FFTW floating point)
4
System Configuration
Intel Pentium 4 (HT), 3.0 GHz
RAM: 1 GB
Cache: 1 MB L2
O.S.: Fedora Core 3
Compiler: icc
Flags used: -xW, -O3, -ipo-prec-div, -static
5
User time vs. FFTW (single precision)
Radix-4 runs 1.5 times slower than FFTW
Radix-8 runs 1.6 times slower than FFTW
6
User time: previous (double) vs. new (float)
Approximately 20% improvement
7
User time: previous (double) vs. new (float)
Approximately 19% improvement
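The slides do not say how these figures were derived. One common convention, assumed here purely for illustration, measures improvement relative to the previous user time and slowdown as a ratio of user times; the symbols below are not from the original.

```latex
\[
\text{improvement (\%)} = \frac{t_{\text{prev}} - t_{\text{new}}}{t_{\text{prev}}} \times 100,
\qquad
\text{slowdown} = \frac{t_{\text{ours}}}{t_{\text{FFTW}}}
\]
```

Under that reading, a 20% improvement means the new version needs 0.8 times the previous user time, and a slowdown of 1.5 means the implementation takes 1.5 times as long as FFTW.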
8
Cache Organization
Cache level   Size    Associativity   Line size (bytes)
L2            1 MB    8-way           64
I1            16 KB   4-way           64
D1            16 KB   4-way           64
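The cache parameters above fix the number of sets in each cache, which matters for FFTs because power-of-two strides tend to map to the same set. A small sketch of that arithmetic follows, assuming line sizes are in bytes and 1 KB = 1024 bytes; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Geometry of a set-associative cache (hypothetical helper type). */
typedef struct {
    size_t size_bytes;  /* total capacity */
    size_t ways;        /* associativity */
    size_t line_bytes;  /* line size */
} cache_geometry;

/* Number of sets = capacity / (associativity * line size). */
size_t cache_num_sets(cache_geometry c)
{
    assert(c.ways > 0 && c.line_bytes > 0);
    assert(c.size_bytes % (c.ways * c.line_bytes) == 0);
    return c.size_bytes / (c.ways * c.line_bytes);
}

/* Set that a byte address maps to, using simple modulo indexing. */
size_t cache_set_index(cache_geometry c, size_t addr)
{
    return (addr / c.line_bytes) % cache_num_sets(c);
}
```

For the configuration in the table this gives 2048 sets for L2 and 64 sets for D1, so in D1 any two addresses 4096 bytes apart land in the same set and compete for its four ways.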
9
Radix-4 L2 misses
Approximately 30% fewer L2 misses
10
Radix-4 D1 misses
Approximately 1.6% fewer D1 misses
11
Radix-8 L2 misses
Approximately 13.6% fewer L2 misses
12
Radix-8 D1 misses
Approximately 0.96% fewer D1 misses
13
Profiling results: using vtune
14
Profiling results: using gprof
15
Profiling results: using vtune
16
Profiling results: using gprof
17
Profiling results: using vtune
18
Profiling results: using gprof
19
Profiling results: using vtune
20
Profiling results: using gprof
21
Profiling results: using vtune
27
Profiling results: using gprof
29
Further improvements: use SSE instructions
Vectorize the loop
T = A[r]
U = w*A[r+p]
V = w*w*A[r+2*p]
W = w*w*w*A[r+3*p]
----------------------------------
Complex temp[4];
for (i = 1; i < 4; i++) {
    /* temp[0] is not needed: A[r] is used without a twiddle multiply */
    temp[i] = twiddle[i*p] * A[r + i*l];
}
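The twiddle multiplication in the loop above is the part that maps naturally onto SSE. As a rough sketch only, assuming the operands have first been gathered into contiguous split real/imaginary arrays, the complex multiply can be written so that a vectorizing compiler can turn it into packed single-precision instructions. The function name `twiddle_mul` and the split layout are assumptions, not the authors' design.

```c
#include <assert.h>
#include <stddef.h>

/* out[j] = w[j] * a[j] for j in [0, n), complex multiply with real and
   imaginary parts in separate arrays. Unit stride and restrict-qualified
   pointers give the compiler a loop it can vectorize. */
void twiddle_mul(size_t n,
                 const float *restrict w_re, const float *restrict w_im,
                 const float *restrict a_re, const float *restrict a_im,
                 float *restrict out_re, float *restrict out_im)
{
    assert(w_re && w_im && a_re && a_im && out_re && out_im);
    for (size_t j = 0; j < n; j++) {
        /* (wr + i*wi) * (ar + i*ai) */
        out_re[j] = w_re[j] * a_re[j] - w_im[j] * a_im[j];
        out_im[j] = w_re[j] * a_im[j] + w_im[j] * a_re[j];
    }
}
```

Keeping real and imaginary parts in separate arrays avoids the shuffles that an interleaved complex layout usually needs, at the cost of a gather step if the data is not already stored that way.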
30
Thank You