Beam Dynamic Calculation by NVIDIA® CUDA Technology
E. Perepelkin, V. Smirnov, and S. Vorozhtsov
JINR, Dubna, 7 July 2009


Introduction
Cyclotron beam dynamics problems [1]:
- Losses on the geometry
- Space charge effects
- Optimization of the central region [2]
CBDA [3] code calculations:
- OpenMP (on CPU)
- CUDA (on GPU)
__________________________________________________________________
[1] A. Goto, "Beam injection and extraction of RIKEN AVF cyclotron", CNS-RIKEN Workshop on Upgrade of AVF Cyclotron, CNS Wako Campus, 3-4 March 2008.
[2] E. Perepelkin, A. Vorozhtsov, S. Vorozhtsov, P. Beličev, V. Jocić, N. Nešković, et al., "Spiral inflectors and electrodes in the central region of the VINCY cyclotron", Cyclotrons and Their Applications 2007, Eighteenth International Conference.
[3] E. Perepelkin, S. Vorozhtsov, "CBDA - Cyclotron Beam Dynamics Analysis code", RuPAC 2008, Zvenigorod, Russia.

Computer model of the cyclotron (figure: injection line, ESD, dee, magnet sectors)

Regions of the field maps (figure: inflector electric field; axial channel magnetic field; G1 magnetic field)

Axial injection line

Cyclotron

Central region optimization (figure: φ RF = 13°, 15°, 28°, 10°)

Particle losses

Bunch acceleration

Optimization process (figure: stages S0, S1, S2, S3, S4)

Acceleration field map

A very time-consuming problem
- At least about 5 different variants have to be studied.
- Many ion species are accelerated.
- Very complicated structure.
- Multi-macro-particle simulations for SC-dominated beams.
- One run requires several days of computer time.

Open Multi-Processing (OpenMP)
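The slides do not show the CBDA OpenMP code itself; the following is a minimal sketch, assuming a hypothetical particle structure and a placeholder pusher, of how the CPU tracking loop can be parallelized over particles with OpenMP:

#include <omp.h>

/* Hypothetical particle structure; the actual CBDA data layout is not shown
 * in the slides. */
typedef struct { double x[3], v[3]; } Particle;

/* Placeholder for one integration step in the E/B field maps. */
static void push(Particle *p, double dt)
{
    for (int d = 0; d < 3; ++d)
        p->x[d] += p->v[d] * dt;   /* drift only, stands in for the real pusher */
}

void track_all(Particle *particles, int n_particles, int n_steps, double dt)
{
    /* Particles are independent during tracking, so the outer loop can be
     * distributed across CPU threads with OpenMP. */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n_particles; ++i)
        for (int s = 0; s < n_steps; ++s)
            push(&particles[i], dt);
}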

Spiral inflector

Beam phase space projections at the inflector entrance

Beam phase space projections at the inflector exit. Blue points: PIC by FFT (grid 2^5 x 2^5 x 2^5); red points: PP.

Calculation time (10,000 particles, no geometry losses)

Method                  Without OpenMP   With OpenMP   Computer platform
PP                      4 h 53 min       2 h 34 min    AMD Turion 64×2, 1.60 GHz
PP                      4 h 38 min       1 h 25 min    Intel Core Quad, 2.4 GHz
PIC, 2^5 x 2^5 x 2^5    ~11 min          ~6 min        AMD Turion 64×2, 1.60 GHz
PIC, 2^5 x 2^5 x 2^5    7 min            ~2 min        Intel Core Quad, 2.4 GHz

Compute Unified Device Architecture (CUDA)

GeForce 8800 GTX (price ~$300)

GPU structure: 128 SPs (streaming processors)

Kernel functions
__global__ void Track(field maps, particle coordinates): calculates the particle motion in the electromagnetic field maps.
__global__ void Losses(geometry, particle coordinates): calculates the particle losses on the structure.
__global__ void Rho(particle coordinates): produces the charge density for the SC effects.

Kernel functions (continued)
__global__ FFT(charge density): FFT method (analysis / synthesis).
__global__ PoissonSolver(Fourier coefficients): finds the solution of the Poisson equation.
__global__ E_SC(electric potential): calculates the electric field as E = -grad(U).

__global__ void Track()
The function has many parameters, so they are placed in __constant__ memory:
__device__ __constant__ float d_float[200];
__device__ __constant__ int d_int[80];
The particle number corresponds to the thread index: int n = threadIdx.x + blockIdx.x*blockDim.x;
The number of "if, goto, for" statements should be kept to a minimum (to limit thread divergence).
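A minimal sketch of how such a kernel can be laid out is shown below; the argument names, the meaning of d_float[0] and d_int[0], and the field-map layout are assumptions for illustration, not the actual CBDA interface:

// Scalar parameters are kept in constant memory, as stated on the slide.
__device__ __constant__ float d_float[200];
__device__ __constant__ int   d_int[80];

__global__ void Track(const float *field_maps,   // field maps (layout is illustrative)
                      float *coords)             // packed particle coordinates
{
    int n = threadIdx.x + blockIdx.x * blockDim.x;   // one thread per particle
    if (n >= d_int[0]) return;        // d_int[0] assumed to hold the particle count

    float dt = d_float[0];            // d_float[0] assumed to hold the time step
    // Interpolate E and B from field_maps at the particle position and advance
    // the coordinates by one step; branching ("if, goto, for") is kept minimal
    // to avoid thread divergence. A trivial placeholder update is shown here:
    coords[n] = coords[n] + dt * field_maps[0];
}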

__global__ void Losses()
The geometry is described by triangles. The triangle coordinates are staged in __shared__ variables, which gave a drastic increase in performance.
int tid = threadIdx.x; is used for copying the data to shared memory in parallel.
The particle number corresponds to int n = threadIdx.x + blockIdx.x*blockDim.x;
Each particle is then checked against the triangles for an intersection.
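A sketch of this tiling scheme is given below; the triangle layout, the tile size, and the Moller-Trumbore segment/triangle test are illustrative choices, since the slides do not show the actual intersection test used in CBDA:

#define TRI_TILE 128   // triangles held in shared memory per tile (illustrative)

__device__ bool hits_triangle(float3 p0, float3 p1, float3 a, float3 b, float3 c)
{
    // Moller-Trumbore segment/triangle intersection test (illustrative).
    float3 d  = make_float3(p1.x - p0.x, p1.y - p0.y, p1.z - p0.z);
    float3 e1 = make_float3(b.x - a.x, b.y - a.y, b.z - a.z);
    float3 e2 = make_float3(c.x - a.x, c.y - a.y, c.z - a.z);
    float3 pv = make_float3(d.y*e2.z - d.z*e2.y, d.z*e2.x - d.x*e2.z, d.x*e2.y - d.y*e2.x);
    float det = e1.x*pv.x + e1.y*pv.y + e1.z*pv.z;
    if (fabsf(det) < 1e-12f) return false;
    float inv = 1.0f / det;
    float3 s  = make_float3(p0.x - a.x, p0.y - a.y, p0.z - a.z);
    float  u  = (s.x*pv.x + s.y*pv.y + s.z*pv.z) * inv;
    if (u < 0.f || u > 1.f) return false;
    float3 qv = make_float3(s.y*e1.z - s.z*e1.y, s.z*e1.x - s.x*e1.z, s.x*e1.y - s.y*e1.x);
    float  v  = (d.x*qv.x + d.y*qv.y + d.z*qv.z) * inv;
    if (v < 0.f || u + v > 1.f) return false;
    float  t  = (e2.x*qv.x + e2.y*qv.y + e2.z*qv.z) * inv;
    return t >= 0.f && t <= 1.f;            // hit within the step p0 -> p1
}

__global__ void Losses(const float3 *tri_verts, int n_triangles,
                       const float3 *pos_old, const float3 *pos_new,
                       int n_particles, int *lost)
{
    __shared__ float3 s_tri[3 * TRI_TILE];              // staged triangle vertices
    int tid = threadIdx.x;                              // used for the cooperative copy
    int n   = threadIdx.x + blockIdx.x * blockDim.x;    // particle index

    for (int base = 0; base < n_triangles; base += TRI_TILE) {
        int tile = min(TRI_TILE, n_triangles - base);
        // All threads of the block copy the current tile into shared memory.
        for (int t = tid; t < 3 * tile; t += blockDim.x)
            s_tri[t] = tri_verts[3 * base + t];
        __syncthreads();

        if (n < n_particles && !lost[n])
            for (int t = 0; t < tile; ++t)
                if (hits_triangle(pos_old[n], pos_new[n],
                                  s_tri[3*t], s_tri[3*t + 1], s_tri[3*t + 2]))
                    lost[n] = 1;                        // particle hit the structure
        __syncthreads();
    }
}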

__global__ void Rho()
Calculates the charge contribution to the surrounding mesh nodes from the particle with index int n = threadIdx.x + blockIdx.x*blockDim.x; (figure: the cells adjacent to a mesh node).
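The slides do not show how simultaneous updates of the same node are resolved; the sketch below uses cloud-in-cell weighting with atomicAdd on floats (which requires a newer GPU than the GeForce 8800 GTX), purely as an illustration of the per-particle deposition. All names are assumptions:

__global__ void Rho(const float3 *pos, float q_macro, int n_particles,
                    float *rho, int NX, int NY,
                    float hx, float hy, float hz,
                    float x_min, float y_min, float z_min)
{
    int n = threadIdx.x + blockIdx.x * blockDim.x;   // one thread per particle
    if (n >= n_particles) return;

    // Cell indices and weights; the particle is assumed to lie inside the mesh.
    float fx = (pos[n].x - x_min) / hx;
    float fy = (pos[n].y - y_min) / hy;
    float fz = (pos[n].z - z_min) / hz;
    int i = (int)fx, j = (int)fy, k = (int)fz;
    float wx = fx - i, wy = fy - j, wz = fz - k;

    // Distribute the macro-particle charge over the 8 nodes of its cell.
    for (int dk = 0; dk < 2; ++dk)
        for (int dj = 0; dj < 2; ++dj)
            for (int di = 0; di < 2; ++di) {
                float w = (di ? wx : 1.f - wx)
                        * (dj ? wy : 1.f - wy)
                        * (dk ? wz : 1.f - wz);
                int ind = (i + di) + (j + dj) * (NX + 1)
                        + (k + dk) * (NX + 1) * (NY + 1);
                atomicAdd(&rho[ind], q_macro * w);   // several particles may hit the same node
            }
}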

__global__ FFT()
A real FFT with sin(πn/N) basis functions is used. The 3-D transform consists of three 1-D FFTs, one along each axis (X, Y, Z). For the X pass, each thread handles one mesh line:
int n = threadIdx.x + blockIdx.x*blockDim.x;   // n = j + k*(NY+1)
k = (int)(n/(NY+1));
j = n - k*(NY+1);
m = j*(NX+1) + k*(NX+1)*(NY+1);
FFT_X[i+1] = Rho[i+m];
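To make the per-thread line decomposition concrete, the sketch below lets one thread process one mesh line along X; a naive O(N^2) sine transform stands in for the real FFT used in the code, and the names (NX, NY, n_lines) are illustrative:

__global__ void SineTransformX(const float *rho, float *rho_hat,
                               int NX, int NY, int n_lines)
{
    int n = threadIdx.x + blockIdx.x * blockDim.x;   // n = j + k*(NY+1)
    if (n >= n_lines) return;                        // n_lines = (NY+1)*(NZ+1)

    int k = n / (NY + 1);
    int j = n - k * (NY + 1);
    int m = j * (NX + 1) + k * (NX + 1) * (NY + 1);  // flat offset of node (0, j, k)

    // Expand the line over the sin(pi*p*i/NX) basis (interior points only);
    // the real code replaces this double loop with a 1-D real FFT.
    for (int p = 1; p < NX; ++p) {
        float s = 0.f;
        for (int i = 1; i < NX; ++i)
            s += rho[m + i] * sinf(3.14159265f * p * i / NX);
        rho_hat[m + p] = s;
    }
}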

__global__ PoissonSolver()
Each thread scales one Fourier coefficient: int n = threadIdx.x + blockIdx.x*blockDim.x;
U[ind(i,j,k)] = U[ind(i,j,k)] / (kx_i^2 + ky_j^2 + kz_k^2), with ind(i,j,k) = i + j*(NX+1) + k*(NX+1)*(NY+1).
The indices are recovered from n as
k = (int)(n/((NX+1)*(NY+1)));
j = (int)((n - k*(NX+1)*(NY+1))/(NX+1));
i = n - j*(NX+1) - k*(NX+1)*(NY+1);
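A minimal sketch of such a kernel, assuming the squared wave numbers of the sine basis are passed in as precomputed arrays (argument names are illustrative):

__global__ void PoissonSolver(float *U, const float *kx2, const float *ky2,
                              const float *kz2, int NX, int NY, int NZ)
{
    int n   = threadIdx.x + blockIdx.x * blockDim.x;   // one thread per coefficient
    int nxy = (NX + 1) * (NY + 1);
    if (n >= nxy * (NZ + 1)) return;

    // Recover (i, j, k) from the flat index n = i + j*(NX+1) + k*(NX+1)*(NY+1).
    int k = n / nxy;
    int j = (n - k * nxy) / (NX + 1);
    int i = n - j * (NX + 1) - k * nxy;

    float denom = kx2[i] + ky2[j] + kz2[k];
    if (denom > 0.f)
        U[n] = U[n] / denom;     // U_hat(i,j,k) /= (kx_i^2 + ky_j^2 + kz_k^2)
    else
        U[n] = 0.f;              // guard for a possible zero mode
}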

__global__ E_SC()
int n = threadIdx.x + blockIdx.x*blockDim.x + st_ind;
The electric field E = -grad(U) at node n is computed by central differences from the neighbouring potentials U_{n±1} (X), U_{n±(NX+1)} (Y), and U_{n±(NX+1)(NY+1)} (Z).
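A minimal sketch of the corresponding kernel, assuming st_ind and n_nodes restrict the threads to interior nodes and that the field components are stored in separate arrays (names and layout are illustrative):

__global__ void E_SC(const float *U, float *Ex, float *Ey, float *Ez,
                     int NX, int NY, float hx, float hy, float hz,
                     int st_ind, int n_nodes)
{
    int n = threadIdx.x + blockIdx.x * blockDim.x + st_ind;  // st_ind skips boundary nodes
    if (n >= n_nodes) return;

    int sx = 1;                        // neighbour offset along X
    int sy = NX + 1;                   // neighbour offset along Y
    int sz = (NX + 1) * (NY + 1);      // neighbour offset along Z

    // E = -grad(U) by central differences, using the stencil shown on the slide.
    Ex[n] = -(U[n + sx] - U[n - sx]) / (2.f * hx);
    Ey[n] = -(U[n + sy] - U[n - sy]) / (2.f * hy);
    Ez[n] = -(U[n + sz] - U[n - sz]) / (2.f * hz);
}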

Performance

Functions*      CPU time [msec]   GPU time [msec]   Ratio [x]
Track
Losses
Rho             796               14
Poisson/FFT     353               13
E_SC
Total

* Mesh size: 2^5 x 2^5 x 2^5; particles: 100,000; triangles: 2054.

Comparison

Number of particles   CPU time            GPU time   Rate [x]
1,000                 3 min 19 sec        12 sec     17
10,000                34 min 14 sec       42 sec     49
100,000               5 h 41 min          ~6 min     56
1,000,000             2 days 8 h 53 min   1 h        60

SC effect (I = 4 mA): without space charge the losses are 24%; with space charge the losses are 94%.

Conclusions
- A very cheap technology.
- The performance increase of roughly 1.5 orders of magnitude made the complex cyclotron modelling possible.
- Careful programming is required.
- The method can be extended to the calculation of beam halo and other problems.