Accelerating a Software Radio Astronomy Correlator. By Andrew Woods. Supervisors: Prof. Inggs & Dr Langman.


1 Accelerating a Software Radio Astronomy Correlator
By Andrew Woods
Supervisors: Prof. Inggs & Dr Langman

2 Correlator
– Radio telescopes have many separate antennas.
– A correlator combines their signals to produce high-resolution images.
– It does this by correlating the signals from the antennas against each other.
– Working in the frequency domain is better for large inputs.

3 FPGA
Used 2x Nallatech H101 boards (HPRC cards).
– Each has a Virtex-4 LX100 (V4LX100), a PCI-X interface, 16MB SRAM and 512MB DDR2.
– Programmed with the Dime-C tools; Dime-C is a C-like language. Aimed at software acceleration.
Con: the FPGA achieved clock rates of only around 100MHz.
Pro: custom hardware can be created for the application.
– Parallel execution
– Pipelining

4 GPUs
– Processing monsters, achieved by using little cache and control logic.
– Used to be fixed-function; recently became programmable.
– People started using pixel shaders for general-purpose processing (GPP).
– Nvidia has released CUDA, a language specifically for general-purpose (GP) computing.
– Used an Nvidia 8800 GT
  – 112 pixel shaders @ 1.5GHz

5 FX Correlator
– Three steps for each antenna: an FFT, then multiplication with every other antenna, then integration.
– The multiplication dominates the computation, so it was the function implemented on the FPGA and GPU.
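
The three steps on this slide can be illustrated with a minimal Python sketch. It is not the code from the talk: the function names, the data layout, the use of a naive DFT in place of a real FFT, and the choice to form only cross-correlations (each antenna with every other antenna, i < j) are assumptions made for illustration. The multiplication uses the complex conjugate of the second antenna's spectrum, as is usual for an FX correlator.

```python
import cmath

def dft(samples):
    """Naive O(M^2) discrete Fourier transform; stands in for the FFT ("F") step."""
    m = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / m) for t in range(m))
        for k in range(m)
    ]

def fx_correlate(blocks):
    """blocks[t][a] holds the time samples of antenna a in time block t (hypothetical layout).

    Returns acc[(i, j)][k]: the product of antenna i and the conjugate of
    antenna j at frequency channel k, summed (integrated) over all blocks.
    Only pairs with i < j are formed.
    """
    n_ant = len(blocks[0])
    n_chan = len(blocks[0][0])
    acc = {(i, j): [0j] * n_chan
           for i in range(n_ant) for j in range(i + 1, n_ant)}
    for block in blocks:
        spectra = [dft(x) for x in block]             # F: transform each antenna
        for (i, j), sums in acc.items():              # X: multiply every pair
            for k in range(n_chan):
                # Integration: accumulate the product over time blocks
                sums[k] += spectra[i][k] * spectra[j][k].conjugate()
    return acc
```

The innermost loop is the multiply-and-integrate step that the slide identifies as the dominant cost and the part offloaded to the FPGA and GPU.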

6 Correlation Graphically [1]
[Figure: correlation shown for frequency channels Freq 0 … Freq M. Labels: N^2/2; N^2/2 x int length; N^2/2 x int length x Freq]
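
The labels on this slide read as a multiplication count that grows with the number of antennas N, the integration length, and the number of frequency channels. A small sketch of that arithmetic follows; the function names and the example values (64 antennas, 1000 time steps, 512 channels) are hypothetical, and N^2/2 is taken to be the slide's approximation of the number of antenna pairs.

```python
def approx_multiplies(n_antennas, int_length, n_freq):
    """Approximate multiply count from the slide: N^2/2 x int length x Freq."""
    return n_antennas ** 2 / 2 * int_length * n_freq

def exact_cross_pairs(n_antennas):
    """Exact number of distinct antenna pairs, N(N-1)/2, for comparison."""
    return n_antennas * (n_antennas - 1) // 2

# Hypothetical configuration, not taken from the talk
example = approx_multiplies(n_antennas=64, int_length=1000, n_freq=512)
```

For large N the approximation N^2/2 and the exact pair count N(N-1)/2 differ by only N/2, which is why the slide can use the simpler form.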

7 FPGA Design
– We were able to implement 96 floating-point units.
– Created a pipelined engine that computes a single output for three time steps and integrates them.
– Four of these engines fit, so four frequencies could be computed at a time.
– Speedup of ~3x vs. a 3GHz Xeon (SSE).
– Reaching ~85% of theoretical peak (excluding transfers).
[Figure: engines for Freq 0, Freq 1, Freq 2 and Freq 3 running in parallel over clock cycle 0, clock cycle 1, …, clock cycle N^2/2]
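
As a rough back-of-the-envelope check on the "theoretical peak" figure, the sketch below multiplies the 96 floating-point units by the clock rate of around 100MHz quoted on the FPGA slide. The slides do not say how the peak was defined, so the assumption that every unit completes one floating-point operation per clock cycle, and the function name, are illustrative only.

```python
def peak_gflops(fp_units, clock_mhz):
    """Peak throughput in GFLOP/s, assuming one operation per unit per cycle."""
    return fp_units * clock_mhz / 1000.0

peak = peak_gflops(fp_units=96, clock_mhz=100)   # 96 units at ~100MHz
sustained = 0.85 * peak                          # ~85% of peak, excluding transfers
```

Under these assumptions the peak is about 9.6 GFLOP/s and the sustained rate about 8.2 GFLOP/s.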

8 GPU Design [1]
– Works on thread parallelism; each thread executes on a pixel shader.
– CUDA uses lightweight threads.
  – Created a thread for each output (plus redundant ones), then integrated.
– Speedup of ~5x vs. a 3GHz Xeon (SSE).
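
One plausible way to end up with "a thread for each output plus redundant ones" is to launch a full N x N grid of threads and let only those with i < j do work. The slides do not describe the actual indexing, so the Python simulation below is an assumed scheme for illustration, not the kernel from the talk; `launch_grid` is a hypothetical name.

```python
def launch_grid(n_antennas):
    """Simulate one thread per (i, j) cell of an n_antennas x n_antennas grid.

    Threads with i < j each own one correlator output; the remaining
    threads are redundant and would return immediately on a real GPU.
    """
    active = []
    redundant = 0
    for i in range(n_antennas):
        for j in range(n_antennas):
            if i < j:
                active.append((i, j))   # this thread computes output (i, j)
            else:
                redundant += 1          # this thread has nothing to do
    return active, redundant

active, redundant = launch_grid(4)
```

A square grid keeps the index arithmetic trivial at the cost of idle threads, which is affordable when threads are as lightweight as the slide describes.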

9 Findings
GPU vs. Nallatech FPGA:
– The GPU required considerably less effort,
– performed better,
– was much cheaper (~20x),
– and still has a lot of areas where more performance can be squeezed out (Chris Harris).
In defense of FPGAs:
– The Virtex-5 can achieve a higher clock rate (up to 500MHz).
– 96 multipliers on the V4LX100 is not enough; the V5SX240 has 1,056.
– About 25% of the time was spent on transfers via the older PCI-X bus.
– More power efficient.

10 References
[1] Chris Harris et al., The University of Western Australia (UWA), "GPU Accelerated Radio Astronomy Signal Convolution", Experimental Astronomy, 2008.

11 Questions

