PRESTO: PulsaR Exploration and Search TOolkit. Jintao Luo, NRAO-CV. Credit: Bill Saxton, NRAO/AUI/NSF.

A newbie
NRAO: NANOGrav, mainly pulsar instrumentation
SHAO (Shanghai Astronomical Observatory), China: VLBI backend, correlator, observations, pulsar instrumentation
JIVE (Joint Institute for VLBI in Europe), Netherlands: VLBI correlator, pulsar instrumentation

Outline
Pulsar
PRESTO
GPU
Future work

Pulsar
Spinning neutron star
Precise period
Dispersion (lower radio frequencies arrive later)
Stable integrated profile
Weak signals
Uses: timekeeping, navigation, measuring gravitational waves (NANOGrav)
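
Dispersion delays a pulse at lower frequencies by an amount proportional to the dispersion measure (DM) and to 1/f^2. A minimal Python sketch of the standard cold-plasma delay formula, delta_t ≈ 4.149 ms × DM × (f_lo^-2 − f_hi^-2) with frequencies in GHz, is below; the function name is hypothetical and the DM and band edges are illustrative values, not from any observation in this talk.

```python
# Cold-plasma dispersion constant, in ms * GHz^2 / (pc cm^-3).
K_DM_MS_GHZ2 = 4.148808


def dispersion_delay_ms(dm, f_lo_ghz, f_hi_ghz):
    """Extra arrival delay (ms) at the low band edge relative to the high edge.

    dm: dispersion measure in pc cm^-3 (integrated free-electron column).
    f_lo_ghz, f_hi_ghz: edges of the observing band in GHz.
    """
    return K_DM_MS_GHZ2 * dm * (f_lo_ghz ** -2 - f_hi_ghz ** -2)


# Illustrative values only: DM = 100 pc cm^-3 across a 1-2 GHz band.
delay = dispersion_delay_ms(100.0, 1.0, 2.0)  # about 311 ms
```

Delays of hundreds of milliseconds, far longer than a millisecond-pulsar period, are why the data must be de-dispersed before searching.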

PRESTO
PulsaR Exploration and Search TOolkit
Developed by Scott Ransom
A large suite of pulsar search and analysis software
One of the best pulsar-search packages in the world
Pulsars found with PRESTO include the fastest-spinning pulsar known, PSR J ad, with a 716-Hz spin frequency

(From PRESTO_search_tutorial)

Data preparation: interference detection and removal, de-dispersion, barycentering
Searching: Fourier-domain acceleration, single-pulse, and phase-modulation (sideband) searches
Folding: candidate optimization, time-of-arrival (TOA) generation
Misc: data exploration, de-dispersion planning, data conversion…
My work is to speed up the Fourier-domain acceleration search, accelsearch, with a GPU.
Why a GPU? GPUs are built for compute-intensive, highly parallel workloads like this one.
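
De-dispersion, listed above under data preparation, can be illustrated with incoherent shift-and-add de-dispersion. The numpy sketch below assumes filterbank data stored as a 2-D array of channels × time samples; the function and variable names are hypothetical, and this is a generic textbook version rather than PRESTO's own implementation.

```python
import numpy as np

# Dispersion constant in s * MHz^2 / (pc cm^-3).
K_DM_S_MHZ2 = 4.148808e3


def dedisperse(filterbank, freqs_mhz, dm, tsamp_s):
    """Incoherent de-dispersion of a (n_chan, n_samp) filterbank array.

    Each channel is shifted earlier by its dispersion delay relative to the
    highest-frequency channel, then all channels are summed into one series.
    np.roll wraps samples around the end; a real pipeline would pad or trim.
    """
    f_ref = max(freqs_mhz)
    series = np.zeros(filterbank.shape[1])
    for chan, f in enumerate(freqs_mhz):
        delay_s = K_DM_S_MHZ2 * dm * (f ** -2 - f_ref ** -2)
        shift = int(round(delay_s / tsamp_s))        # delay in whole samples
        series += np.roll(filterbank[chan], -shift)  # undo the delay
    return series


# Synthetic check: one pulse at sample 100 in the top channel, delayed in the others.
freqs = [1400.0, 1350.0, 1300.0, 1250.0]
fb = np.zeros((len(freqs), 256))
for chan, f in enumerate(freqs):
    lag = int(round(K_DM_S_MHZ2 * 50.0 * (f ** -2 - freqs[0] ** -2) / 1e-3))
    fb[chan, 100 + lag] = 1.0
ts = dedisperse(fb, freqs, dm=50.0, tsamp_s=1e-3)  # pulse re-aligned at sample 100
```

A search repeats this for many trial DMs, which is one reason the downstream searching stage sees so many time series.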

GPU
Graphics Processing Unit: the chip in computer video cards, PlayStation 3, Xbox, etc.
Two major vendors: NVIDIA and ATI (now AMD)
GPUs are massively multithreaded many-core chips
(From …)

(From NVIDIA CUDA_C_Programming_Guide)

GPU Capabilities (From NVIDIA CUDA_C_Programming_Guide)
The GPU is specialized for compute-intensive, highly parallel computation
The GPU devotes more transistors to data processing, rather than to data caching and flow control

Core computation: FFT_MUL_IFFT
(Diagram: Data → FFT → multiply by each of Kernel_0, Kernel_1, …, Kernel_n-1 → IFFT)
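
By the convolution theorem, transforming a block once, multiplying its spectrum by each precomputed kernel spectrum, and inverse-transforming gives one circular convolution per kernel in O(n log n) rather than O(n^2) work. The numpy sketch below is a CPU reference under stated assumptions, not the author's GPU code: it assumes each kernel has the same length n as the data block and that the kernel FFTs are computed once up front. Whether accelsearch conjugates the kernels (correlation rather than convolution) or uses overlap-save for long inputs is not stated on the slide, and the function name is hypothetical. The per-kernel loop mirrors the "loops of Mul" structure the talk discusses.

```python
import numpy as np


def fft_mul_ifft(data, kernel_ffts):
    """Apply every kernel to one data block via FFT -> pointwise multiply -> IFFT.

    data:        complex array of length n (one block of input).
    kernel_ffts: sequence of length-n complex arrays, the kernels' FFTs,
                 computed once up front and reused for every data block.
    Returns an array of shape (n_kernels, n): one circular convolution per kernel.
    """
    data_fft = np.fft.fft(data)                # a single forward FFT of the data
    out = []
    for kf in kernel_ffts:                     # one Mul + IFFT per kernel
        out.append(np.fft.ifft(data_fft * kf))
    return np.array(out)


# Illustrative usage with random data and three random kernels.
rng = np.random.default_rng(0)
n = 16
data = rng.standard_normal(n) + 1j * rng.standard_normal(n)
kernels = [rng.standard_normal(n) + 1j * rng.standard_normal(n) for _ in range(3)]
result = fft_mul_ifft(data, [np.fft.fft(k) for k in kernels])

# Cross-check kernel 0 against a direct O(n^2) circular convolution.
direct = np.array([sum(data[j] * kernels[0][(i - j) % n] for j in range(n))
                   for i in range(n)])
assert np.allclose(result[0], direct)
```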

Diagram of the realization
Data & kernel preparation (on CPU) → copy to GPU memory → run FFT_Mul_IFFT (on GPU) → copy to CPU memory → combination and following processing (on CPU; planned to move partly to the GPU)
Memory copy operations are time-consuming.

Testbench: GPU vs. CPU (excluding memory copies): the GPU is ~100X faster
(Plot: GPU runtime vs. CPU runtime)

Accel_search: GPU vs. CPU (whole program, including memory copies)
Under nearly the heaviest workload seen in practical use:
GPU version run time: 18.15 s
CPU version run time: 60.18 s
Only about 3.3 times faster. We want ~20X. How?
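
Amdahl's law offers one way to see why a ~100X kernel speedup yields only ~3.3X overall. The Python sketch below assumes, as an idealization not stated on the slides, that the 100X testbench speedup applies uniformly to the GPU-resident part while everything else runs at its original speed; the function names are hypothetical.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the original run time is sped up s times."""
    return 1.0 / ((1.0 - p) + p / s)


def accelerated_fraction(overall, s):
    """Invert Amdahl's law: fraction p that must be sped up s times to reach `overall`."""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / s)


observed = 60.18 / 18.15                         # measured whole-program speedup, about 3.3X
p_now = accelerated_fraction(observed, 100.0)    # about 0.71 of the CPU run time
ceiling = 1.0 / (1.0 - p_now)                    # limit with an infinitely fast GPU part, about 3.4X
p_needed = accelerated_fraction(20.0, 100.0)     # about 0.96 needed for 20X
```

Under this model the fitted p is only an effective value, since time added by memory copies is lumped into the unaccelerated remainder. Even so, it shows that a faster GPU kernel alone cannot go much past ~3.4X; reaching 20X means shrinking the non-GPU share from roughly 29% to about 4% of the original run time.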

Where the time goes:
1. Memory copies
2. Following processing on the CPU
3. Loops of Mul on the GPU
There is room for improvement!

An improvement: MulIFFT
The run time of Mul has been reduced by eliminating the loop
The FFT run time stays at the same level
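
The slide does not show how the loop was removed. One plausible reading, sketched here in numpy as an assumption, is to hold all kernel spectra in one 2-D array so that a single broadcast multiply covers every kernel and one batched inverse FFT transforms every row. On a GPU the analogue would be one elementwise kernel launch plus one batched cuFFT plan (for example created with cufftPlanMany) instead of one launch per kernel; that mapping is illustrative, not the author's code, and the function name is hypothetical.

```python
import numpy as np


def fft_mul_ifft_batched(data, kernel_ffts):
    """FFT -> Mul -> IFFT for all kernels at once, with no Python-level kernel loop.

    data:        complex array of length n.
    kernel_ffts: complex array of shape (n_kernels, n), one kernel FFT per row.
    """
    data_fft = np.fft.fft(data)                        # shape (n,)
    products = kernel_ffts * data_fft[np.newaxis, :]   # broadcast: one Mul over every row
    return np.fft.ifft(products, axis=-1)              # batched IFFT over all rows


rng = np.random.default_rng(1)
n, n_kernels = 32, 5
data = rng.standard_normal(n) + 1j * rng.standard_normal(n)
kernel_ffts = np.fft.fft(rng.standard_normal((n_kernels, n)), axis=-1)
batched = fft_mul_ifft_batched(data, kernel_ffts)

# Same answer as transforming the rows one at a time.
looped = np.array([np.fft.ifft(np.fft.fft(data) * kf) for kf in kernel_ffts])
assert np.allclose(batched, looped)
```

Batching changes how the multiply is issued, not how much FFT work is done, which fits the observation that FFT run time stayed at the same level.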

Future work: faster
Memory copies: reduce the number of memory copy operations
Following processes: move more of them to the GPU
Mul loops: use only one loop; use GPU texture memory, etc.
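
Reducing the number of memory copy operations helps because each host-device transfer carries a fixed overhead on top of its bandwidth-limited cost. The first-order Python model below uses hypothetical latency and bandwidth figures chosen only for illustration, not measurements from this work, to show the saving from coalescing many small copies into one.

```python
def transfer_time_s(n_copies, total_bytes, latency_s, bandwidth_bytes_per_s):
    """First-order model: each copy pays a fixed overhead, plus bytes / bandwidth."""
    return n_copies * latency_s + total_bytes / bandwidth_bytes_per_s


# Hypothetical parameters, not measurements from this work.
LATENCY_S = 10e-6        # per-copy overhead
BANDWIDTH = 6e9          # bytes per second across the host-device link
TOTAL = 512 * 2**20      # 512 MiB moved in total

many_small = transfer_time_s(4096, TOTAL, LATENCY_S, BANDWIDTH)
one_large = transfer_time_s(1, TOTAL, LATENCY_S, BANDWIDTH)
saved = many_small - one_large   # (4096 - 1) * 10 us, about 41 ms
```

The model ignores pinned versus pageable host memory and overlapping copies with computation, which are other common ways to cut effective copy time.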

Summary
PRESTO's accelsearch has been made faster with a GPU, but not yet fast enough
It could be even faster, ~20X
Using an FPGA, for example a ROACH board?