Accelerating a Software Radio Astronomy Correlator
By Andrew Woods. Supervisors: Prof. Inggs & Dr Langman.

Presentation transcript:

Correlator
– Radio telescope arrays have many separate antennas.
– A correlator combines their signals to produce high-resolution images.
– It does this by cross-correlating the signals from pairs of antennas.
– Working in the frequency domain is better for large inputs.
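The frequency-domain point rests on the correlation theorem: a circular cross-correlation in time equals an element-wise product of spectra. The sketch below checks that equivalence on a toy input. It is illustrative only: the function names are made up here, and the naive O(n^2) DFT stands in for the FFT a real correlator would use to get the actual speed benefit.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                for t in range(n))
        out.append(s / n if inverse else s)
    return out


def circular_xcorr_time(a, b):
    """Direct circular cross-correlation: r[lag] = sum_t a[t + lag] * conj(b[t])."""
    n = len(a)
    return [sum(a[(t + lag) % n] * complex(b[t]).conjugate() for t in range(n))
            for lag in range(n)]


def circular_xcorr_freq(a, b):
    """Same quantity via the frequency domain: IDFT(A[k] * conj(B[k]))."""
    spec_a, spec_b = dft(a), dft(b)
    product = [x * y.conjugate() for x, y in zip(spec_a, spec_b)]
    return dft(product, inverse=True)


if __name__ == "__main__":
    a = [1, 2, 3, 4]
    b = [0, 1, 0, 0]
    print(circular_xcorr_time(a, b))
    print(circular_xcorr_freq(a, b))
```

Both functions return the same values up to floating-point rounding; only the frequency-domain form reduces to one multiplication per frequency channel.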

FPGA
– Used 2x Nallatech H101 boards (pictured: HPRC card)
  – Has a Virtex-4 LX100 (V4LX100), PCI-X interface, 16 MB SRAM and 512 MB DDR2
  – Used the DIME-C tools, a C-like programming language aimed at software acceleration
– Con: the FPGA achieved clock rates of only around 100 MHz
– Pro: custom hardware can be created for the application
  – Parallel execution
  – Pipelining

GPUs
– Processing monsters
– Achieved by using little cache and control logic
– Used to be fixed-function; recently became programmable
– People started using pixel shaders for general-purpose processing (GPP)
– Nvidia has released CUDA, a language specifically for general-purpose (GP) computing
– Used an Nvidia 8800 GT
  – 112 pixel shaders at 1.5 GHz

FX Correlator
– Three steps for each antenna: an FFT, then multiplication with every other antenna, then integration
– The multiplication dominates the computation, so it was the function implemented on the FPGA and GPU
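The three steps can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation from the talk: `fx_correlate` and `fft_len` are hypothetical names, a naive DFT stands in for the FFT, and only pairs with i < j are formed (no autocorrelations), which is an assumption about what "every other antenna" means.

```python
import cmath


def dft(x):
    """Naive DFT, standing in for the FFT ("F") step."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fx_correlate(antennas, fft_len):
    """Return {(i, j): [integrated product per frequency channel]} for i < j.

    antennas: list of equal-length sample lists, one per antenna.
    fft_len:  samples per transform block (hypothetical parameter).
    """
    n_ant = len(antennas)
    n_blocks = len(antennas[0]) // fft_len
    vis = {(i, j): [0j] * fft_len
           for i in range(n_ant) for j in range(i + 1, n_ant)}
    for b in range(n_blocks):
        # Step 1 (F): transform one block of samples from every antenna.
        spectra = [dft(sig[b * fft_len:(b + 1) * fft_len]) for sig in antennas]
        for (i, j), acc in vis.items():
            for f in range(fft_len):
                # Step 2 (X): multiply antenna i by the conjugate of antenna j.
                # Step 3: integrate by accumulating over successive blocks.
                acc[f] += spectra[i][f] * spectra[j][f].conjugate()
    return vis


if __name__ == "__main__":
    same = [1, 0, 0, 0, 1, 0, 0, 0]
    print(fx_correlate([same, same], fft_len=4))
```

The innermost multiply-accumulate runs once per antenna pair, per channel, per block, which is why it dominates the cost as the antenna count grows.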

Correlation Graphically [1]
[Figure: frequency channels Freq 0 … Freq M; operation counts labelled N^2/2, N^2/2 x int length, and N^2/2 x int length x Freq]
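One way to read the three labels on this slide is as a running count of complex multiplications. The symbols are assumptions made here for illustration: N is the number of antennas, T the integration length in time steps, and F the number of frequency channels. The exact pair count is N(N-1)/2 without autocorrelations or N(N+1)/2 with them; either way it approaches N^2/2 for large N.

```latex
\begin{align}
  \text{per channel, per time step:} \quad & \frac{N(N-1)}{2} \approx \frac{N^2}{2} \\
  \text{per channel, per integration:} \quad & \frac{N^2}{2} \times T \\
  \text{all channels, per integration:} \quad & \frac{N^2}{2} \times T \times F
\end{align}
```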

FPGA Design
– We were able to implement 96 floating-point units.
– Created a pipelined engine that computes a single output for three time steps and integrates.
– Four of these engines fit on the device, so four frequencies can be computed at a time.
– Speedup of ~3x vs. a 3 GHz Xeon (SSE).
– Reaching ~85% of theoretical peak (excluding transfers).
[Figure: four engines processing Freq 0, Freq 1, Freq 2 and Freq 3 in parallel over clock cycles 0, 1, …, N^2/2]
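The figures on this slide can be cross-checked with some back-of-envelope arithmetic. The sketch below is not from the talk and rests on three labelled assumptions: each floating-point unit completes one operation per clock cycle, a complex multiply-accumulate costs 8 real operations (4 multiplies and 4 adds), and the clock is the "around 100 MHz" quoted on the FPGA slide. How the talk itself defined "theoretical peak" is not stated.

```python
FP_UNITS = 96        # from the slide
ENGINES = 4          # from the slide
CLOCK_HZ = 100e6     # "around 100 MHz" from the FPGA slide
EFFICIENCY = 0.85    # "~85% theoretical peak (excluding transfers)"

# Assumption: one complex multiply-accumulate = 4 real multiplies + 4 real adds.
REAL_OPS_PER_CMAC = 8


def peak_gflops(fp_units, clock_hz):
    """Peak rate, assuming every unit finishes one operation per clock cycle."""
    return fp_units * clock_hz / 1e9


units_per_engine = FP_UNITS // ENGINES                       # 24
cmacs_per_engine = units_per_engine // REAL_OPS_PER_CMAC     # 3
peak = peak_gflops(FP_UNITS, CLOCK_HZ)
sustained = peak * EFFICIENCY

if __name__ == "__main__":
    print("FP units per engine:", units_per_engine)
    print("complex MACs per engine per cycle:", cmacs_per_engine)
    print("assumed peak GFLOPS:", peak)
    print("at 85% efficiency:", sustained)
```

Under these assumptions, 24 units per engine corresponds to three complex multiply-accumulates per cycle, which is consistent with an engine that handles three time steps at once.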

GPU Design [1]
– Works on thread parallelism; each thread executes on a pixel shader.
– CUDA uses lightweight threads.
  – Created a thread for each output (plus redundant ones), then integrated.
– Speedup of ~5x vs. a 3 GHz Xeon (SSE).
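The "thread for each output (plus redundant ones)" idea can be emulated in plain Python. This is a sketch of one plausible indexing scheme, not CUDA code and not necessarily the scheme used in the talk or in [1]: it assumes an N x N grid of threads for one frequency channel, where only threads with i < j produce an output and the rest exit immediately. The names `thread_body` and `launch` are hypothetical.

```python
def thread_body(tid, n_ant, spectra, out):
    """Work done by one emulated thread in an n_ant x n_ant grid.

    spectra[step][antenna] holds one frequency channel's value per time step.
    Returns True if the thread produced an output, False if it was redundant.
    """
    i, j = divmod(tid, n_ant)
    if j <= i:
        return False  # redundant thread: this pair is covered elsewhere
    acc = 0j
    for step in spectra:
        # Multiply antenna i by the conjugate of antenna j, then integrate.
        acc += step[i] * step[j].conjugate()
    out[(i, j)] = acc
    return True


def launch(n_ant, spectra):
    """Run every emulated thread in sequence; a GPU would run them in parallel."""
    out = {}
    active = sum(thread_body(tid, n_ant, spectra, out)
                 for tid in range(n_ant * n_ant))
    return out, active


if __name__ == "__main__":
    one_step = [1 + 0j, 1j, 2 + 0j]
    results, active = launch(3, [one_step, one_step])
    print(active, "active threads out of", 3 * 3)
    print(results)
```

A square grid keeps the index arithmetic trivial at the cost of idle threads, a reasonable trade when threads are lightweight.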

Findings
– GPU vs. Nallatech FPGA
  – The GPU required considerably less effort
  – Performed better
  – Much cheaper (~20x)
  – Still many areas where more performance can be squeezed out (Chris Harris)
– In defense of FPGAs
  – Virtex-5 can achieve a higher clock rate (up to 500 MHz)
  – The 96 multipliers on the V4LX100 are not enough; the V5SX240 has 1,056
  – About 25% of the time was spent on transfers over the older PCI-X bus
  – More power efficient

References
[1] Chris Harris et al., The University of Western Australia (UWA), "GPU Accelerated Radio Astronomy Signal Convolution", Experimental Astronomy, 2008.

Questions