Beam Dynamic Calculation by NVIDIA® CUDA Technology E. Perepelkin, V. Smirnov, and S. Vorozhtsov JINR, Dubna 7 July 2009.


Introduction
Cyclotron beam dynamics problems [1]:
  Losses on geometry
  Space charge effects
  Optimization of the central region [2]
CBDA [3] code calculations:
  OpenMP (on CPU)
  CUDA (on GPU)

[1] A. Goto, "Beam injection and extraction of RIKEN AVF cyclotron", CNS-RIKEN Workshop on Upgrade of AVF Cyclotron, CNS Wako Campus, 3-4 March 2008.
[2] E. Perepelkin, A. Vorozhtsov, S. Vorozhtsov, P. Beličev, V. Jocić, N. Nešković, et al., "Spiral inflectors and electrodes in the central region of the VINCY cyclotron", Cyclotrons and Their Applications 2007, Eighteenth International Conference.
[3] E. Perepelkin, S. Vorozhtsov, "CBDA - Cyclotron Beam Dynamics Analysis code", RuPAC 2008, Zvenigorod, Russia.

Computer model of the cyclotron Injection line ESD Dee Magnet sectors

Regions of the field maps Inflector Electric field Axial channel Magnetic field G1 Magnetic field

Axial injection line

Cyclotron

Central region optimization (figure panels: φ_RF = 13°, φ_RF = 15°, φ_RF = 28°, φ_RF = 10°)

Particle losses

Bunch acceleration

Optimization process (figure labels: S0, S1, S2, S3, S4)

Acceleration field map

Very time-consuming problem
  At least about 5 different variants
  Many accelerated ion species
  Very complicated structure
  Multi-macro-particle simulations for SC-dominated beams
  One run requires several days of computer time

Open Multi-Processing ( Open MP )

Spiral inflector

Beam phase space projections at the inflector entrance

Beam phase space projections at the inflector exit
  Blue points: PIC by FFT (grid: 2^5 x 2^5 x 2^5)
  Red points: PP

Calculation time (10,000 particles, no geometry losses)

Method                Without OpenMP   With OpenMP   Computer platform
PP                    4 h 53 min       2 h 34 min    AMD Turion 64×2, 1.60 GHz
PP                    4 h 38 min       1 h 25 min    Intel Core Quad, 2.4 GHz
PIC 2^5 x 2^5 x 2^5   ~11 min          ~6 min        AMD Turion 64×2, 1.60 GHz
PIC 2^5 x 2^5 x 2^5   7 min            ~2 min        Intel Core Quad, 2.4 GHz

Compute Unified Device Architecture ( CUDA )

GeForce 8800 GTX ( price ~ $300 )

GPU structure 128 SP ( Streaming Processors )

Kernel functions
  __global__ void Track ( field maps, particle coordinates )
    Calculates particle motion in the electromagnetic field maps
  __global__ void Losses ( geometry, particle coordinates )
    Calculates particle losses on the structure
  __global__ void Rho ( particle coordinates )
    Produces the charge density for SC effects

Kernel functions
  __global__ void FFT ( charge density )
    FFT method (analysis / synthesis)
  __global__ void PoissonSolver ( Fourier coefficients )
    Finds the solution of the Poisson equation
  __global__ void E_SC ( electric potential )
    Calculates the electric field by E = -grad( U )

__global__ void Track ( )
The function takes many parameters, so they are passed through __constant__ variables:
  __device__ __constant__ float d_float[200];
  __device__ __constant__ int d_int[80];
The particle number corresponds to
  int n = threadIdx.x+blockIdx.x*blockDim.x;
The number of "if", "goto" and "for" statements should be reduced.
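The mapping of one thread to one particle, with parameters read from constant memory, might look like the sketch below. The layout of the tables (d_int[0] as the particle count, d_float[0] as the time step), the drift update, and the names TrackSketch and blocksFor are assumptions made for illustration; the slide does not show the body of Track.

```cuda
#include <cuda_runtime.h>

// Parameter tables in constant memory, sized as on the slide.
__device__ __constant__ float d_float[200];
__device__ __constant__ int   d_int[80];

// Hypothetical layout: d_int[0] = number of particles, d_float[0] = time step.
__global__ void TrackSketch(float *x, const float *vx)
{
    int n = threadIdx.x + blockIdx.x * blockDim.x;  // particle number
    if (n >= d_int[0]) return;                      // guard threads past the last particle
    x[n] += vx[n] * d_float[0];                     // placeholder drift step, not the real integrator
}

// Host helper (hypothetical): number of blocks needed to cover nParticles.
inline int blocksFor(int nParticles, int threadsPerBlock)
{
    return (nParticles + threadsPerBlock - 1) / threadsPerBlock;
}
```

On the host, the tables would typically be filled with cudaMemcpyToSymbol before the launch, and the grid size taken from blocksFor so that every particle gets a thread.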

__global__ void Losses ( )
The geometry structure consists of triangles.
Triangle coordinates are stored in __shared__ variables. This gave a drastic increase in performance.
  int tid = threadIdx.x;
is used for copying data to shared memory in parallel.
The particle number corresponds to
  int n = threadIdx.x+blockIdx.x*blockDim.x;
Each particle is checked against each triangle.
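A minimal sketch of the shared-memory pattern described here: the threads of a block copy a tile of triangles into __shared__ storage in parallel, then each thread tests its own particle against the tile. The tile size, the flat 9-floats-per-triangle layout, and the test function hitsPlaceholder are assumptions; the real particle/triangle check is not given on the slide and is replaced by a trivial stand-in.

```cuda
#include <cuda_runtime.h>

#define TRI_FLOATS 9   // 3 vertices x 3 coordinates per triangle (assumed layout)
#define TILE_TRIS  32  // hypothetical number of triangles per shared-memory tile

// Stand-in for the real intersection test: compares the particle z
// with the z coordinate of the triangle's first vertex only.
__host__ __device__ inline bool hitsPlaceholder(const float *t, float z)
{
    return z < t[2];
}

__global__ void LossesSketch(const float *tri, int nTri,
                             const float *pz, int nPart, int *lost)
{
    __shared__ float s_tri[TILE_TRIS * TRI_FLOATS];
    int tid = threadIdx.x;                          // index used for the parallel copy
    int n   = threadIdx.x + blockIdx.x * blockDim.x; // particle number

    for (int base = 0; base < nTri; base += TILE_TRIS) {
        int count = min(TILE_TRIS, nTri - base);
        // Parallel copy: threads stride over the floats of the current tile.
        for (int i = tid; i < count * TRI_FLOATS; i += blockDim.x)
            s_tri[i] = tri[base * TRI_FLOATS + i];
        __syncthreads();                            // tile is complete

        if (n < nPart)
            for (int t = 0; t < count; ++t)
                if (hitsPlaceholder(&s_tri[t * TRI_FLOATS], pz[n]))
                    lost[n] = 1;
        __syncthreads();                            // done reading before the next tile overwrites
    }
}
```

No thread returns early, so every thread of the block reaches both __syncthreads() calls even when it has no particle to check.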

__global__ void Rho ( )
Calculates the charge contribution to the mesh nodes from the particle with number
  int n = threadIdx.x+blockIdx.x*blockDim.x;
(Figure: a mesh node and its adjacent cells.)
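One common way to deposit charge with one thread per particle is trilinear (cloud-in-cell) weighting onto the eight nodes of the cell containing the particle. The sketch below assumes that scheme, a uniform spacing h with the mesh origin at 0, and float atomicAdd, which needs a newer GPU than the GeForce 8800 GTX; the slide does not say which weighting or which conflict-resolution strategy the authors used. All names are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <cmath>

// Flattened node index on an (NX+1) x (NY+1) x (NZ+1) mesh.
__host__ __device__ inline int nodeIndex(int i, int j, int k, int NX, int NY)
{
    return i + j * (NX + 1) + k * (NX + 1) * (NY + 1);
}

// Trilinear weight of cell corner (di,dj,dk) for fractional offsets in [0,1).
__host__ __device__ inline float cornerWeight(int di, int dj, int dk,
                                              float fx, float fy, float fz)
{
    return (di ? fx : 1.0f - fx) * (dj ? fy : 1.0f - fy) * (dk ? fz : 1.0f - fz);
}

__global__ void RhoSketch(const float *x, const float *y, const float *z, int nPart,
                          float q, float h, int NX, int NY, int NZ, float *rho)
{
    int n = threadIdx.x + blockIdx.x * blockDim.x;  // particle number
    if (n >= nPart) return;

    float gx = x[n] / h, gy = y[n] / h, gz = z[n] / h;  // position in mesh units
    int i = (int)floorf(gx), j = (int)floorf(gy), k = (int)floorf(gz);
    if (i < 0 || j < 0 || k < 0 || i >= NX || j >= NY || k >= NZ) return;  // outside mesh

    float fx = gx - i, fy = gy - j, fz = gz - k;
    for (int c = 0; c < 8; ++c) {                   // eight corners of the cell
        int di = c & 1, dj = (c >> 1) & 1, dk = (c >> 2) & 1;
        // Several particles may hit the same node, hence the atomic update.
        atomicAdd(&rho[nodeIndex(i + di, j + dj, k + dk, NX, NY)],
                  q * cornerWeight(di, dj, dk, fx, fy, fz));
    }
}
```

The eight weights sum to one, so the total deposited charge equals q for every particle inside the mesh.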

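__global__ void FFT ( )
A real FFT with sin(πn/N) basis functions is used.
The 3D transform consists of three 1D FFTs, one along each axis: X, Y, Z.
For the transform along X, each thread handles one line of nodes; the thread number enumerates the (j, k) pairs of the NY x NZ plane, n = j + k*(NY+1):
  int n = threadIdx.x+blockIdx.x*blockDim.x;
  k=(int)(n/(NY+1));
  j=n-k*(NY+1);
  m=j*(NX+1)+k*(NX+1)*(NY+1);
  FFT_X[i+1]=Rho[i+m];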
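The line-per-thread indexing can be sketched as follows. The helper lineOffsetX reproduces the index arithmetic from the slide; the kernel then applies a naive O(N^2) sine sum along X as a stand-in for the fast transform, which the slide does not show. The names lineOffsetX and SineTransformX and the separate output array are assumptions.

```cuda
#include <cuda_runtime.h>

// Offset of the first node of the X line handled by thread n,
// where n = j + k*(NY+1) enumerates the lines.
__host__ __device__ inline int lineOffsetX(int n, int NX, int NY)
{
    int k = n / (NY + 1);
    int j = n - k * (NY + 1);
    return j * (NX + 1) + k * (NX + 1) * (NY + 1);
}

// One thread per X line. Naive sine sum over interior nodes,
// used here only in place of the fast algorithm.
__global__ void SineTransformX(const float *rho, float *out, int NX, int NY, int NZ)
{
    int n = threadIdx.x + blockIdx.x * blockDim.x;
    if (n >= (NY + 1) * (NZ + 1)) return;           // no such line

    int m = lineOffsetX(n, NX, NY);
    const float pi = 3.14159265358979f;
    for (int p = 1; p < NX; ++p) {                  // harmonic number
        float s = 0.0f;
        for (int i = 1; i < NX; ++i)                // interior nodes of the line
            s += rho[i + m] * sinf(pi * p * i / NX);
        out[p + m] = s;
    }
    out[m] = 0.0f;                                  // boundary coefficients are zero
    out[NX + m] = 0.0f;
}
```

The same pattern, with a different offset and stride, would serve the Y and Z passes.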

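__global__ void PoissonSolver ( )
Each Fourier coefficient is divided by the sum of the squared wave numbers:
  U_{ind(i,j,k)} = U_{ind(i,j,k)} / ( kx_i^2 + ky_j^2 + kz_k^2 )
  ind(i,j,k)=i+j*(NX+1)+k*(NX+1)*(NY+1);
The thread number is split back into the indices i, j, k:
  int n = threadIdx.x+blockIdx.x*blockDim.x;
  k=(int)(n/((NX+1)*(NY+1)));
  j=(int)(n-k*(NX+1)*(NY+1))/(NX+1);
  i=n-j*(NX+1)-k*(NX+1)*(NY+1);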
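A sketch of this step under stated assumptions: the squared wave numbers are supplied by the caller as three precomputed arrays (kx2, ky2, kz2), since their definition is not given on the slide, and modes with a zero denominator are set to zero. Note that the division by (NX+1)*(NY+1) must be parenthesized to recover k. The names splitIndex and PoissonSketch are hypothetical.

```cuda
#include <cuda_runtime.h>

// Inverse of ind(i,j,k) = i + j*(NX+1) + k*(NX+1)*(NY+1).
__host__ __device__ inline void splitIndex(int n, int NX, int NY,
                                           int *i, int *j, int *k)
{
    *k = n / ((NX + 1) * (NY + 1));
    *j = (n - *k * (NX + 1) * (NY + 1)) / (NX + 1);
    *i = n - *j * (NX + 1) - *k * (NX + 1) * (NY + 1);
}

// One thread per Fourier coefficient: divide by the sum of squared wave numbers.
__global__ void PoissonSketch(float *U, const float *kx2, const float *ky2,
                              const float *kz2, int NX, int NY, int NZ)
{
    int n = threadIdx.x + blockIdx.x * blockDim.x;
    if (n >= (NX + 1) * (NY + 1) * (NZ + 1)) return;

    int i, j, k;
    splitIndex(n, NX, NY, &i, &j, &k);
    float d = kx2[i] + ky2[j] + kz2[k];
    U[n] = (d > 0.0f) ? U[n] / d : 0.0f;            // avoid division by zero
}
```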

__global__ void E_SC ( )
  int n = threadIdx.x+blockIdx.x*blockDim.x+st_ind;
Stencil (figure): the field at node U_n is computed from its six neighbours:
  U_{n+1}, U_{n-1} (along X)
  U_{n+(NX+1)}, U_{n-(NX+1)} (along Y)
  U_{n+(NX+1)(NY+1)}, U_{n-(NX+1)(NY+1)} (along Z)
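With the six-neighbour stencil, E = -grad( U ) can be approximated by central differences. The sketch below assumes uniform mesh spacings hx, hy, hz, treats st_ind as a starting offset passed by the caller, and simply skips boundary nodes; none of these details are stated on the slide, and the names are hypothetical.

```cuda
#include <cuda_runtime.h>

// Central-difference field component: -(U_plus - U_minus) / (2h).
__host__ __device__ inline float centralDiffField(float uPlus, float uMinus, float h)
{
    return -(uPlus - uMinus) / (2.0f * h);
}

__global__ void ESCSketch(const float *U, float *Ex, float *Ey, float *Ez,
                          int NX, int NY, int NZ,
                          float hx, float hy, float hz, int st_ind)
{
    int n  = threadIdx.x + blockIdx.x * blockDim.x + st_ind;
    int sy = NX + 1;                                // index stride along Y
    int sz = (NX + 1) * (NY + 1);                   // index stride along Z
    if (n >= sz * (NZ + 1)) return;

    int k = n / sz;
    int j = (n - k * sz) / sy;
    int i = n - j * sy - k * sz;
    // Boundary nodes lack one neighbour; they are skipped in this sketch.
    if (i == 0 || i == NX || j == 0 || j == NY || k == 0 || k == NZ) return;

    Ex[n] = centralDiffField(U[n + 1],  U[n - 1],  hx);
    Ey[n] = centralDiffField(U[n + sy], U[n - sy], hy);
    Ez[n] = centralDiffField(U[n + sz], U[n - sz], hz);
}
```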

Performance

Function*     CPU time, ms   GPU time, ms   Ratio, x
Track         486            30             16
Losses        6997           75             93
Rho           79             6              14
Poisson/FFT   35             3              13
E_SC          1.2            0.8            1.4
Total         7598           114            67

* Mesh size: 2^5 x 2^5 x 2^5. Particles: 100,000. Triangles: 2054.

Comparison

Number of particles   CPU calculation time   GPU calculation time   Rate, x
1,000                 3 min 19 sec           12 sec                 17
10,000                34 min 14 sec          42 sec                 49
100,000               5 h 41 min             ~6 min                 56
1,000,000             2 days 8 h 53 min      1 h                    60

SC effect (I = 4 mA)
  No SC: losses 24%
  SC: losses 94%

Conclusions
  Very cheap technology
  A performance increase of about 1.5 orders of magnitude makes complex cyclotron modeling feasible
  Careful programming is required
  The method can be extended to the calculation of beam halo, etc.
