by Arjun Radhakrishnan supervised by Prof. Michael Inggs

Presentation transcript:

Accelerating Coherent Pulsar De-dispersion on Graphics Processing Units, by Arjun Radhakrishnan, supervised by Prof. Michael Inggs

Outline: Graphics Processing Units (GPUs); Pulsars; Pulsar De-dispersion; Motivation; Implementation; Results; Conclusion & Future Work

Graphics Processing Units: GPUs are massively parallel processors found on consumer graphics cards. They are generally used to render 3D objects on screen and to calculate the colour of each pixel to display. They are mass-market products thanks to the video game industry. Performance tracks Moore's Law, since the majority of on-chip space is devoted to compute units, as opposed to cache on CPUs. (Image source: [7])

Why Use GPUs? Figure 1: Peak floating point performance of NVIDIA GPUs vs Intel CPUs [2]

Pulsars: Highly magnetised, rapidly rotating neutron stars formed after a supernova. Pulsars emit beams of electromagnetic radiation from their magnetic poles. The beams sweep in a circular path, known as the “lighthouse effect”. Periodic pulses are observed when a beam sweeps past Earth. Figure 2: Pulsar Model [3]

Pulsar Dispersion: Pulsar emissions are distorted upon passing through the ionised Interstellar Medium (ISM). Lower frequency components of the pulse are delayed more than higher frequencies.

Correct for the dispersion by shifting the received signal in time by the frequency-dependent delay. Figure 3: Pulsar De-dispersion [4]
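The size of the shift follows from the standard cold-plasma dispersion relation, in which the delay scales with the Dispersion Measure (DM) and with the inverse square of the observing frequency. The sketch below is illustrative only and is not taken from the slides: the function name is hypothetical, and the dispersion constant is the commonly quoted value (see e.g. [1]). It evaluates the delay across the band edges of the test data described later in the talk.

```python
# Dispersion constant in MHz^2 pc^-1 cm^3 s (commonly quoted value, see e.g. [1]).
DISPERSION_CONSTANT = 4.148808e3


def dispersion_delay_s(dm, f_lo_mhz, f_hi_mhz):
    """Delay in seconds of f_lo relative to f_hi for a DM given in pc/cm^3.

    Hypothetical helper for illustration; frequencies are in MHz.
    """
    return DISPERSION_CONSTANT * dm * (f_lo_mhz ** -2 - f_hi_mhz ** -2)


# Assumed band edges: 100 MHz centred on 1450 MHz, with DM = 50 pc/cm^3.
delay = dispersion_delay_s(50.0, 1400.0, 1500.0)
print(f"Delay across the band: {delay * 1e3:.2f} ms")  # about 13.64 ms
```

For these parameters the lowest frequency arrives roughly 13.6 ms after the highest, which is the smearing that de-dispersion has to remove.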

Coherent De-dispersion: Coherent de-dispersion is the most accurate method of removing the dispersion effects of the Interstellar Medium. It preserves the amplitude and phase information of the received signal. The voltage signal is convolved with the inverse transfer function of the ISM. This transfer function depends on the Dispersion Measure (DM) of the signal, which is obtained from models of the galactic electron density. In practice we use the Fast Fourier Transform (FFT) to turn the convolution into a multiplication in the frequency domain, and then apply an inverse FFT.
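The FFT, multiply, inverse FFT sequence can be sketched on the CPU in plain Python. This is a minimal illustration under stated assumptions, not the GPU implementation from the talk: the function names are hypothetical, the signal is assumed to be complex baseband data sampled at the bandwidth, the chirp follows the form commonly given in the literature (e.g. [1]), and its sign convention depends on the FFT convention and on how the band was down-converted.

```python
import cmath
import math

D_CONST = 4.148808e3  # dispersion constant, MHz^2 pc^-1 cm^3 s (see e.g. [1])


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. Unnormalised."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1.0 if inverse else -1.0
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def ifft(x):
    """Inverse FFT with the 1/n normalisation applied."""
    n = len(x)
    return [v / n for v in fft(x, inverse=True)]


def ism_transfer_function(n, dm, f0_mhz, bw_mhz):
    """Assumed ISM chirp H, sampled on the FFT grid of a complex baseband signal."""
    h = []
    for k in range(n):
        # FFT bin k corresponds to an offset f (MHz) from the band centre f0.
        f = (k if k < n // 2 else k - n) * bw_mhz / n
        phase = 2.0 * math.pi * 1e6 * D_CONST * dm * f * f / ((f0_mhz + f) * f0_mhz ** 2)
        h.append(cmath.exp(1j * phase))
    return h


def dedisperse(voltage, dm, f0_mhz, bw_mhz):
    """FFT the voltages, multiply by H^-1, and transform back."""
    h = ism_transfer_function(len(voltage), dm, f0_mhz, bw_mhz)
    spectrum = fft(voltage)
    # |H| = 1, so the inverse transfer function is the complex conjugate of H.
    return ifft([v * hk.conjugate() for v, hk in zip(spectrum, h)])


# Round trip: disperse a single impulse with H, then recover it with H^-1.
n, dm, f0, bw = 256, 50.0, 1450.0, 100.0
impulse = [0j] * n
impulse[10] = 1 + 0j
h = ism_transfer_function(n, dm, f0, bw)
dispersed = ifft([v * hk for v, hk in zip(fft(impulse), h)])
recovered = dedisperse(dispersed, dm, f0, bw)
```

Because the same chirp is used to disperse and to de-disperse, the round trip recovers the impulse regardless of the sign convention chosen. A real pipeline would use an optimised FFT library and transforms long enough to contain the full dispersion delay.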

Motivation: Why study pulsars? They are a major SKA science driver: detection of gravitational waves and tests of strong-field relativity; analysing black holes. GPU acceleration for MeerKAT: large frequency range (low: 0.5 – 2.5 GHz, high: 8 – 14.5 GHz); high bandwidth per polarisation (4 GHz final); large number of channels (16384); >10 GB of data per second. This is even more important for the SKA, since precision will be a high priority and data storage is not feasible.

Implementation Considerations: Both CPU and GPU were tested with single-precision floating point. A bottleneck for GPU computing is the time taken to send data to the GPU from main memory, so transfers are minimised as far as possible: use asynchronous data transfers to hide the latency; re-calculate values on the GPU rather than copying them across; use shared memory on the GPU for calculations and store to global memory at the end. The source data file is simulated dual-polarisation data generated with a DM of 50 pc/cm³ and 100 MHz bandwidth centred on 1450 MHz.

Basic Program Flow (host and device): read in data; allocate memory on GPU; copy to GPU memory; initiate GPU kernel; begin de-dispersion; parallel FFTs; multiply V(f0) · H^-1(f0), V(f1) · H^-1(f1), ..., V(fn) · H^-1(fn); inverse FFTs; combine (+) into the output array; send data back to host; receive de-dispersed signal; free memory. Figure 4: Program flow

Results: Figure 5: Left: Overall speedup (5x). Right: Kernel speedup (12x).

Results: Was able to coherently de-disperse 50 MHz of bandwidth on 1 GPU. Used 2 GPUs for the full 100 MHz. Scaling across multiple GPUs was linear. Using larger transfer functions was found to increase performance, since there was less memory-access overhead.

Conclusion: GPUs are significantly faster than CPUs for de-dispersion (5x overall and 12x for the kernel in these tests). They enabled real-time coherent de-dispersion for the dataset used. Coherent de-dispersion of a 100 MHz bandwidth signal requires multiple GPUs at present. Faster memory access would greatly improve overall speedup. Currently testing with real undetected pulsar data.

Thank You! Questions?

References
[1] D. R. Lorimer and M. Kramer, Handbook of Pulsar Astronomy, Cambridge University Press, 2005
[2] NVIDIA CUDA Programming Guide
[3] D. Manchester, “CSIRO ATNF Pulsar Education Page”
[4] Jim Cordes, “The SKA as a Radio Synoptic Survey Telescope: Widefield Surveys for Transients, Pulsars and ETI”, SKA Memo 97
[5] John Rowe Animation/Australia Telescope National Facility, CSIRO [Online]. http://www.atnf.csiro.au/research/pulsar/array/gallery.html
[6] Cornell University Dept. of Astronomy, “Legacy Pulsars: Homepage” [Online]. http://arecibo.tc.cornell.edu/legacypulsardata/Default.aspx
[7] VR-Zone, “The NVIDIA GeForce GTX 280 1GB bare” [Online]. http://vr-zone.com/articles/nvidia-geforce-gtx-280-preview/5872.html?doc=5872