GPMAD: A beam dynamics code using Graphics Processing Units


H. Rafique, S. Alexander, R. Appleby, H. Owen

GPMAD (GPU Processed Methodical Accelerator Design) utilises Graphics Processing Units (GPUs) to perform beam dynamics simulations. Understanding modern particle accelerators requires simulating the transport of charged particles through the machine elements, and these simulations can be time consuming because many-particle transport is computationally expensive. Modern GPUs can run such simulations with a significant increase in performance at an affordable price; here the NVidia CUDA architecture is used. The speed gains of GPMAD were documented in [1]; building on this, GPMAD has been upgraded and a space charge algorithm included. GPMAD is benchmarked against MAD-X [2] and ASTRA [3], using the DIAMOND Booster-to-Storage (BTS) lattice at RAL and the ALICE transfer line at Daresbury as test cases. Particle transport and space charge calculations are found to be well suited to the GPU, and large performance increases are possible in both cases.

Particles are treated as 6-vectors in phase space:
x = transverse horizontal position
p_x = transverse horizontal momentum
y = transverse vertical position
p_y = transverse vertical momentum
τ = time of flight relative to the ideal reference particle
p_t = ΔE/(p_s c), where ΔE is the energy relative to the ideal reference particle and p_s is the nominal momentum of an on-energy particle

GPMAD uses TRANSPORT [4] maps, as used in MAD-X, to transport particles through magnetic elements. The full Taylor expansion is truncated at second order, with the first-order R terms represented by 6x6 matrices. This method assumes that particles do not interact with each other, which holds in the ultra-relativistic limit.

Particle transport and space charge effects are performed on the GPU (the device) via kernel functions; the remainder of the code runs on the CPU (the host). To handle memory efficiently, two sets of particle data are kept on the GPU, denoted by the superscripts {1} and {2}. As an example, transporting N particles through a simple drift element followed by a quadrupole produces particle data sets labelled 0 (initial), 1 and 2 (after one and two magnetic elements respectively).

When operating at ultra-relativistic energies, space charge forces may be omitted; GPMAD then performs the complete transport of all particles in a single kernel function, minimising the time spent copying memory to and from the GPU. When operating at lower energies (of the order of the particle mass), the space charge algorithm may be included and three kernel functions are launched per magnetic element:

Copy particle data from host to device
Loop over magnetic elements:
  Half Matrix Kernel
  Space Charge Kernel
  Half Matrix Kernel
Copy particle data from device to host

(A short illustrative CUDA sketch of the matrix transport kernel is given after the figure list below.)

Figure 1: TWISS parameters for the DIAMOND BTS - GPMAD compared to MAD-X (no space charge)
Figure 2: Run times for the DIAMOND BTS - GPMAD compared to MAD-X (no space charge)
Figure 3: Stability under magnetic element splitting - GPMAD with space charge for a 1 m quadrupole
Figure 4: Transverse beam emittance for the ALICE transfer line - GPMAD compared to ASTRA (with space charge)

Without space charge, the TWISS parameters (the optical parameters that characterise the particle bunch) are identical to those from MAD-X; in fact the raw particle data agree to 10 significant figures.
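As a concrete, simplified picture of the transport step described above, the sketch below applies one element's first-order 6x6 R matrix to N particles with one CUDA thread per particle, in the spirit of the Half Matrix Kernel. This is a minimal sketch under assumed names and data layouts (transportKernel, d_R, a flat array of 6-vectors); it is not the GPMAD source, and the second-order T terms are omitted.

// Hedged sketch (not GPMAD source): apply a first-order 6x6 TRANSPORT R matrix
// to N particles, one CUDA thread per particle.

#include <cuda_runtime.h>

// d_state holds N particles as consecutive 6-vectors (x, px, y, py, tau, pt).
// d_R holds one element's 6x6 R matrix in row-major order, placed in constant
// memory so that every thread reads the same coefficients.
__constant__ double d_R[36];

__global__ void transportKernel(double* d_state, int nParticles)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // one particle per thread
    if (i >= nParticles) return;

    double in[6], out[6];
    for (int k = 0; k < 6; ++k) in[k] = d_state[6 * i + k];

    // out = R * in (first-order map; second-order T terms omitted here)
    for (int r = 0; r < 6; ++r) {
        double acc = 0.0;
        for (int c = 0; c < 6; ++c) acc += d_R[6 * r + c] * in[c];
        out[r] = acc;
    }

    for (int k = 0; k < 6; ++k) d_state[6 * i + k] = out[k];
}

In the ultra-relativistic case a single launch of such a kernel can chain all the element maps; with space charge enabled, two half-element launches of this kind bracket the space charge kernel for every element, as in the flow list above.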
Figure 2 compares the performance of GPMAD with that of MAD-X: GPMAD provides an accurate particle tracking code with a considerable performance improvement over the MAD-X tracking algorithm. Figure 3 shows that GPMAD's space charge algorithm is stable under magnetic element splitting; here a 1 metre quadrupole is split into 10, 20 and 100 parts. Figure 4 shows that GPMAD's space charge algorithm gives emittance growth behaviour similar to ASTRA's for identical initial particle distributions. Figure 5 illustrates the performance benefit of GPMAD over ASTRA, which is even more apparent in Figure 6, where a logarithmic scale shows that GPMAD is around 100 times faster than ASTRA. The performance gains of GPMAD scale with the GPU that is used: newer models offer more processors and therefore better performance.

GPMAD is a proof of principle: for parallel problems such as many-particle transport, the GPU offers an affordable and mobile solution. The algorithm implemented here exploits the parallel nature of the GPU and, in doing so, offers performance comparable with HPC at a substantial monetary saving.

References:
[1] M.D. Salt, R.B. Appleby, D.S. Bailey, "Beam Dynamics using Graphical Processing Units", EPAC'08, TUPP085.
[2] MAD-X, Methodical Accelerator Design, mad.web.cern.ch/mad/
[3] ASTRA, A Space Charge Tracking Algorithm.
[4] K. Brown, "A First- and Second-Order Matrix Theory for the Design of Beam Transport Systems and Charged Particle Spectrometers", SLAC-75.

CPU vs GPU:
The CPU operates sequentially. To transport N particles through a single magnetic element, each particle is transported one after the other; the time taken scales like ~N². In practice N is large, so such simulations require either a long run time or expensive hardware (HPC).
The GPU is a parallel processor. Using the Single Instruction Multiple Data (SIMD) framework, N particles can be transported through the same magnetic element simultaneously; the time taken scales like ~N. This allows parallel problems such as particle tracking to be performed quickly and inexpensively. (A short illustrative code sketch follows the figure captions below.)

Figure 5: Run times for the ALICE transfer line - GPMAD compared to ASTRA (with space charge)
Figure 6: Run times for the ALICE transfer line - GPMAD compared to ASTRA (with space charge, logarithmic scale)
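To make the CPU versus GPU contrast above concrete, the following self-contained sketch (again not GPMAD code; the names driftCPU and driftKernel and the toy drift map x -> x + L*p_x, y -> y + L*p_y are invented for illustration) transports N particles through a single drift of length L, first with a sequential CPU loop and then with one CUDA launch in which every particle is handled by its own thread.

// Hedged sketch (not GPMAD source): N particles through one drift of length L,
// once sequentially on the CPU and once in parallel on the GPU.

#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

struct Particle { double x, px, y, py, tau, pt; };

// CPU version: particles are transported one after the other.
void driftCPU(std::vector<Particle>& bunch, double L)
{
    for (Particle& p : bunch) {
        p.x += L * p.px;
        p.y += L * p.py;
    }
}

// GPU version: the same map, but every particle gets its own thread.
__global__ void driftKernel(Particle* bunch, int n, double L)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    bunch[i].x += L * bunch[i].px;
    bunch[i].y += L * bunch[i].py;
}

int main()
{
    const int N = 1 << 20;             // about a million particles
    const double L = 1.0;              // 1 m drift
    std::vector<Particle> bunch(N, Particle{1e-3, 1e-4, -1e-3, 2e-4, 0.0, 0.0});
    std::vector<Particle> gpuBunch(bunch);   // copy for the GPU path

    // GPU path: copy to device, launch one thread per particle, copy back.
    Particle* d_bunch = nullptr;
    cudaMalloc(&d_bunch, N * sizeof(Particle));
    cudaMemcpy(d_bunch, gpuBunch.data(), N * sizeof(Particle), cudaMemcpyHostToDevice);
    const int threads = 256;
    const int blocks = (N + threads - 1) / threads;
    driftKernel<<<blocks, threads>>>(d_bunch, N, L);
    cudaMemcpy(gpuBunch.data(), d_bunch, N * sizeof(Particle), cudaMemcpyDeviceToHost);
    cudaFree(d_bunch);

    // CPU path, for comparison.
    driftCPU(bunch, L);
    std::printf("x after drift: CPU %g, GPU %g\n", bunch[0].x, gpuBunch[0].x);
    return 0;
}

The host loop touches the particles one after another, while the device launch spreads the same map across many threads at once; in GPMAD the kernels are the R-matrix and space charge maps rather than this toy drift.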