Using GPGPUs as Correlators

Dominic Ford, Jongsoo Kim & Paul Alexander

Introduction

Cross correlators are a highly computationally intensive part of any radio interferometer. Although the number of operations per data sample is small, correlation occurs before time integration, so the sample rate is very high: it must be at least twice the radio-frequency bandwidth of the telescope. The number of operations is proportional both to the bandwidth of the telescope and to the number of baselines being correlated. For SKA 1, an operation rate of many petaflops will be required in the correlator alone.
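As a rough illustration of this scaling (the numbers below are illustrative assumptions of ours, not an SKA specification): in an FX correlator the complex sample rate leaving the F-stage equals the bandwidth B, each baseline costs one complex multiply-accumulate (8 real operations) per complex sample per polarisation product, and an N-antenna array has N(N+1)/2 baselines including autocorrelations. The required operation rate is therefore roughly

    R ≈ 8 × N_pol × B × N(N+1)/2  operations per second.

For N = 512 antennas, B = 1 GHz and N_pol = 4 polarisation products, this gives R ≈ 4 × 10^15 op/s, i.e. ~4 PFLOP/s, already in the "many petaflops" regime.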

Figure 1: A schematic FX correlator: antennas → time-domain FFT → cross correlation → time integration → spatial FFT and imaging. The early stages handle a high data rate at low complexity; the later stages a lower sample rate at higher complexity.

Software Correlators

At present, FPGA- and/or ASIC-based correlators are widely used. These offer highly transistor-efficient solutions, as their logic units are hard-wired to solve the particular correlation problem required. However, this efficiency comes at the expense of flexibility. Software correlators, running on general-purpose CPUs or GPUs, offer the ability to easily add antennas, change the number of baselines correlated, or tune the spectral resolution of the telescope as required. The specification of the telescope can then be scaled with time as more computing power becomes available.

NVIDIA Tesla cards

Graphics Processing Units (GPUs) may offer the computational power needed in a correlator for the SKA. Today, NVIDIA's Tesla cards already offer hundreds of GFLOP/s per card, and a maximum data rate of 64 Gbit/s through the card's 16-lane PCI-Express 2.0 connection. They easily surpass more traditional CPUs in both cost per FLOP and power consumption per FLOP.

Figure 4: An NVIDIA C1060 Tesla card, offering a theoretical peak performance of 933 GFLOP/s in single precision.

Figure 5: The memory model of an NVIDIA Tesla card.

Prototype systems

Several software-correlator implementations already exist. LOFAR uses an IBM BlueGene/P supercomputer as its correlator. An MPI-based correlator for use on x86 compute clusters has been developed by Deller et al. (2007) at Swinburne and is in use at the LBA. The MWA has already trialled a GPU-based correlator (Wayth et al. 2007).

Figure 2: LOFAR uses an IBM BlueGene/P as its correlator (Romein et al. 2010).

Figure 3: The MWA has already trialled a GPU-based correlator (Wayth et al. 2007).

In Cambridge, we are developing a CUDA-based correlator to run on NVIDIA GPUs. We hope to deploy a small-scale trial system on the Arcminute Microkelvin Imager (AMI). This will allow us to optimise our use of the memory architecture of the Tesla cards and the distribution of data between cards. It will also allow us to assess the performance which could be achieved in a correlator for SKA 1.

The correlation problem is sufficiently parallel that such highly parallel architectures can readily be used. Moreover, GPU architectures are optimised for exactly the multiply-and-add operations that cross correlation requires; a minimal kernel sketch is given below. For telescopes with fewer than around 300 antennas, a GPU correlator would be throttled by the speed of PCI-Express data transport, but for larger systems such as SKA 1 GPUs are an attractive option; a rough scaling argument follows the kernel sketch.
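To make the multiply-and-add structure concrete, the sketch below shows a minimal CUDA kernel for the cross-multiply-accumulate (X) stage on a single frequency channel: one thread per baseline, accumulating X_ij = Σ_t v_i(t) · conj(v_j(t)). This is an illustrative sketch, not the kernel used in our prototype; the data layout and names are assumptions, and a production kernel would tile antennas through shared memory so each sample is reused across many baselines.

#include <cuComplex.h>

// One thread per baseline (i, j), i <= j: accumulate
//   X_ij = sum over t of v_i(t) * conj(v_j(t))
// for one frequency channel. "in" holds F-stage output, laid out
// [time][antenna]; "out" is the packed upper triangle of the
// correlation matrix. Layout and names are illustrative only.
__global__ void cmac(const cuFloatComplex *in, cuFloatComplex *out,
                     int nant, int ntime)
{
    int b = blockIdx.x * blockDim.x + threadIdx.x;   // flat baseline index
    int nbase = nant * (nant + 1) / 2;
    if (b >= nbase) return;

    // Unpack the flat baseline index into an antenna pair (i, j).
    int i = 0, rem = b;
    while (rem >= nant - i) { rem -= nant - i; ++i; }
    int j = i + rem;

    cuFloatComplex acc = make_cuFloatComplex(0.0f, 0.0f);
    for (int t = 0; t < ntime; ++t)                  // time integration
        acc = cuCaddf(acc, cuCmulf(in[t * nant + i],
                                   cuConjf(in[t * nant + j])));
    out[b] = acc;
}

Launched with one thread per baseline (e.g. <<<(nbase + 255)/256, 256>>>), the kernel performs exactly the eight real operations per baseline per sample counted in the introduction; on the F side, the time-domain FFTs could be supplied by cuFFT.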
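The PCI-Express figure quoted above can be motivated by a rough arithmetic-intensity argument (back-of-envelope assumptions of ours, not a measurement). With dual-polarisation complex-float samples (8 bytes each), N antennas deliver 16·N·B bytes per second over the bus, while the cross-multiply stage performs 8 × 4 × B × N(N+1)/2 = 16·B·N·(N+1) operations per second, so the arithmetic intensity is

    I ≈ N + 1  operations per byte,

growing linearly with the number of antennas. A C1060 at its 933 GFLOP/s peak, fed through a 16-lane PCI-Express 2.0 link at ~8 GB/s, needs I ≳ 117 to be compute-bound rather than transfer-bound, i.e. an array of order a hundred antennas under these idealised assumptions. The exact crossover depends on the sample format and on achieved rather than peak rates on both sides, but it lies at the scale of a few hundred antennas, in line with the figure quoted above.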