The Parallel Models of Coronal Polarization Brightness Calculation
Jiang Wenqian
DCABES 2009, China University of Geosciences
Outline
Part Ⅰ. Introduction
Part Ⅱ. pB Calculation Formula
Part Ⅲ. Serial pB Calculation Process
Part Ⅳ. Parallel pB Calculation Models
Conclusion
Part Ⅰ. Introduction
Space weather forecasting requires an accurate solar wind model covering the solar atmosphere and interplanetary space. A global model of the corona and heliosphere is the foundation of numerical space weather forecasting and the observational basis for explaining the relevant physical relations. Three-dimensional numerical magnetohydrodynamic (MHD) simulation is one of the most common numerical methods for studying the corona and solar wind.
In addition, converting the computed coronal electron density into coronal polarization brightness (pB) is the key means of comparison with observations, and is therefore important for validating MHD models. Because of the massive data volume and the complexity of the pB model, the computation takes far too long on a single CPU (or core) to visualize the pB data in near real time.
Considering the characteristics of a CPU/GPU computing environment, we analyze the pB conversion algorithm, implement two parallel models of pB calculation with MPI and CUDA, and compare the efficiency of the two models.
Part Ⅱ. pB Calculation Formula
pB arises from photospheric radiation Thomson-scattered by coronal electrons. It can be used in the inversion of the coronal electron density and to validate numerical models. Taking limb darkening into account, the pB contribution of a small coronal volume element is given by Eqs. (1)-(3) below.
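The equations themselves appear only as images on the original slide. A standard form of the limb-darkened Thomson-scattering expression that Eqs. (1)-(3) presumably follow (after van de Hulst 1950 and Billings 1966; the constants and notation here are our own choice, not a transcription of the slide) is

$$ \mathrm{d}(pB) = \frac{\pi\sigma}{2}\,B_{\odot}\,N_e(r)\,\bigl[(1-u)\,A(r) + u\,B(r)\bigr]\left(\frac{\rho}{r}\right)^{2}\mathrm{d}l, \qquad (1) $$

$$ A(r) = \cos\Omega\,\sin^{2}\Omega, \qquad (2) $$

$$ B(r) = -\frac{1}{8}\left[\,1 - 3\sin^{2}\Omega - \frac{\cos^{2}\Omega}{\sin\Omega}\,\bigl(1 + 3\sin^{2}\Omega\bigr)\,\ln\frac{1+\sin\Omega}{\cos\Omega}\right], \qquad (3) $$

where $\sin\Omega = R_{\odot}/r$, $N_e$ is the electron density, $u$ is the limb-darkening coefficient, $\sigma$ is the Thomson-scattering cross-section constant, $B_{\odot}$ is the photospheric brightness, $r$ is the heliocentric distance of the volume element, $\rho$ is the impact distance of the line of sight, and $\mathrm{d}l$ is the path element along the line of sight.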
The polarization brightness image to be compared with coronagraph observations is generated by integrating the electron-density contribution along each line of sight.
(Figure: density integration in the process of pB calculation)
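In the same assumed notation as above, the value at the pixel whose line of sight passes the Sun at impact distance $\rho$ is the integral of Eq. (1) along that line of sight:

$$ pB(\rho) = \frac{\pi\sigma}{2}\,B_{\odot}\int_{\mathrm{LOS}} N_e(r)\,\bigl[(1-u)\,A(r) + u\,B(r)\bigr]\left(\frac{\rho}{r}\right)^{2}\mathrm{d}l . $$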
Part Ⅲ. Serial pB Calculation Process
The steps of the serial pB calculation on the CPU with the experimental data are shown below.
(Figure: the serial process of pB calculation)
Following the serial process above, we implemented it with G95 on Linux and with Visual Studio 2005 on Windows XP. Measuring the time cost of each step shows that the most time-consuming part of the whole program is the calculation of the pB values, which accounts for 98.05% and 99.05% of the total time in the two environments, respectively.
Therefore, to meet the requirement of obtaining the coronal polarization brightness in near real time, we should optimize the pB calculation step. Since the density integration along the line of sight is independent for every point above the solar limb, the pB calculation is very well suited to parallel computation; a sketch of the per-pixel loop follows.
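As a minimal sketch only (not the authors' code; the grid mapping, the step size, and the helpers interp_density() and thomson_weight() are placeholders), the serial pB loop has the following shape, with one independent line-of-sight integration per image pixel:

```cpp
// Minimal sketch of the serial pB loop. interp_density() and thomson_weight()
// stand in for the real interpolation of the (r, theta, phi) density data and
// for the limb-darkened Thomson factor of Eqs. (1)-(3); both are reduced to
// dummies here so the sketch compiles on its own.
#include <vector>

const int    NX = 321, NY = 321;   // plane-of-sky pixels
const int    NZ = 481;             // samples along each line of sight
const double STEP = 0.05;          // hypothetical grid spacing (solar radii)

double interp_density(double, double, double) { return 1.0; }  // dummy stand-in
double thomson_weight(double, double, double) { return 1.0; }  // dummy stand-in

void compute_pb_serial(std::vector<double>& pB)   // pB must hold NX * NY values
{
    for (int iy = 0; iy < NY; ++iy) {
        for (int ix = 0; ix < NX; ++ix) {
            const double x = (ix - NX / 2) * STEP;
            const double y = (iy - NY / 2) * STEP;
            double sum = 0.0;
            for (int iz = 0; iz < NZ; ++iz) {              // integrate along the LOS
                const double z = (iz - NZ / 2) * STEP;
                sum += interp_density(x, y, z) * thomson_weight(x, y, z) * STEP;
            }
            pB[iy * NX + ix] = sum;   // each pixel depends only on its own LOS
        }
    }
}
```

The innermost loop is the 98-99% hot spot measured above, and the two outer loops carry no dependence between pixels, which is exactly what both parallel models exploit.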
Part Ⅳ. Parallel pB Calculation Models
Currently, parallelized MHD numerical calculation is based mainly on MPI. With the development of high-performance computing, using the GPU architecture for computation-intensive work shows clear advantages, so implementing the parallel calculation on the GPU is also an attractive solution. We therefore implement two parallel models, based on MPI and on CUDA respectively.
Experiment Environment
Experimental data:
  42 × 42 × 82 (r, θ, φ) density data (den)
  321 × 321 × 481 (x, y, z) Cartesian coordinate grid
  321 × 321 pB values to be generated
Hardware:
  Intel(R) Xeon(R) CPU, 2.00 GHz (8 CPUs), 1 GB memory
  NVIDIA Quadro FX 4600 GPU with 760 MB of GDDR3 global memory (G80 architecture, 12 multiprocessors, 96 streaming processors)
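For a sense of scale (assuming one density interpolation and one accumulation per Cartesian grid point on each line of sight, which is what the grid dimensions above suggest):

$$ 321 \times 321 = 103\,041 \ \text{pB pixels}, \qquad 103\,041 \times 481 \approx 4.96 \times 10^{7} \ \text{interpolate-and-accumulate steps}. $$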
Compiling environment:
  CUDA-based parallel model: Visual Studio 2005 on Windows XP, CUDA 1.1 SDK
  MPI-based parallel model: G95 on Linux, MPICH2
MPI-based Parallelized Implementation
How the experiment decomposes the computing domain into sub-domains in the MPI environment is shown below.
(Figure: decomposition of the computing domain into sub-domains among the MPI processes)
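For illustration only (the actual partitioning is the one in the figure; compute_pb_pixel() is a hypothetical wrapper around the per-pixel LOS integration of the serial sketch above), a row-block decomposition of the 321 × 321 pB image over the MPI processes could look like this:

```cpp
// Illustrative row-block MPI decomposition of the pB image (not the authors'
// actual partitioning). Each rank computes a contiguous block of image rows
// and rank 0 gathers the full image with MPI_Gatherv.
#include <mpi.h>
#include <algorithm>
#include <vector>

const int NX = 321, NY = 321;

double compute_pb_pixel(int, int) { return 0.0; }  // dummy: the real LOS integration goes here

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Rows handled by this rank.
    const int rows = (NY + size - 1) / size;
    const int y0 = std::min(NY, rank * rows);
    const int y1 = std::min(NY, y0 + rows);

    std::vector<double> local((y1 - y0) * NX, 0.0);
    for (int iy = y0; iy < y1; ++iy)
        for (int ix = 0; ix < NX; ++ix)
            local[(iy - y0) * NX + ix] = compute_pb_pixel(ix, iy);

    // Row counts differ between ranks, so gather with explicit counts/offsets.
    std::vector<int> counts(size), displs(size);
    for (int r = 0; r < size; ++r) {
        const int a = std::min(NY, r * rows), b = std::min(NY, a + rows);
        counts[r] = (b - a) * NX;
        displs[r] = a * NX;
    }
    std::vector<double> pB(rank == 0 ? NX * NY : 1);
    MPI_Gatherv(local.data(), static_cast<int>(local.size()), MPI_DOUBLE,
                pB.data(), counts.data(), displs.data(), MPI_DOUBLE,
                0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```

A static split like this is reasonable because every pixel costs roughly the same amount of work, which is consistent with the balanced processor utilization reported below.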
The final result shows that the MPI-based parallel model reaches a speed-up of 5.8. Since the experiment runs on a platform with 8 CPU cores, this speed-up is close to the theoretical value (a parallel efficiency of about 5.8 / 8 ≈ 72%). It also shows that the MPI-based solution balances processor utilization against inter-processor communication.
CUDA-based Parallelized Implementation
According to the serial pB calculation process and the CUDA architecture, the calculation part should be placed in a kernel function. Since the density interpolation and the cumulative sum involved in each pB value are independent of those of every other pB value, the calculation can be distributed over many CUDA threads, with each thread computing one pB value.
However, the number of pB values to be calculated is much larger than the number of threads the GPU can run, so each thread has to calculate several pB values. Under the experimental conditions, the thread count is set to 256 per block to maximize the use of the computing resources, and the block count follows from the ratio of the number of pB values to the thread count. In addition, since accesses to global memory are slow, some independent data are placed in shared memory to reduce the data-access time.
The data placed in shared memory amount to about 7 KB, less than the 16 KB the GPU provides, so this arrangement is feasible. Moreover, the data-length array is read-only and is accessed very frequently, so it is migrated from shared memory into constant memory to further improve its access efficiency. The CUDA-based parallel calculation process is shown below.
(Figure: the CUDA-based parallel calculation process)
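The following is a sketch under stated assumptions rather than the authors' kernel: the 256-thread blocks and the constant-memory placement of the read-only data-length array follow the slides, while the coordinate mapping, the per-sample integrand, and the contents of the data-length array (taken here to be the 481 LOS step lengths) are hypothetical. Single precision is used because a G80-class GPU has no double-precision units, and a grid-stride loop lets each thread take several pixels when fewer blocks than pixels/256 are launched.

```cuda
// Sketch of the CUDA pB kernel: one (or more) pB pixels per thread,
// 256 threads per block, read-only LOS step lengths in constant memory.
#include <cuda_runtime.h>

#define NX 321
#define NY 321
#define NZ 481                      // samples along each line of sight

__constant__ float d_seg_len[NZ];   // read-only "data-length" array (assumed meaning)

__device__ float interp_density(float, float, float) { return 1.0f; }  // dummy stand-in
__device__ float thomson_weight(float, float, float) { return 1.0f; }  // dummy stand-in

__global__ void pb_kernel(float* pB)
{
    const int npix = NX * NY;
    // Grid-stride loop: if fewer threads than pixels are launched,
    // each thread simply loops over several pixels.
    for (int p = blockIdx.x * blockDim.x + threadIdx.x; p < npix;
         p += gridDim.x * blockDim.x) {
        const int   ix = p % NX, iy = p / NX;
        const float x = (ix - NX / 2) * 0.05f;   // hypothetical plane-of-sky mapping
        const float y = (iy - NY / 2) * 0.05f;
        float sum = 0.0f;
        for (int iz = 0; iz < NZ; ++iz) {        // accumulate along the LOS
            const float z = (iz - NZ / 2) * 0.05f;
            sum += interp_density(x, y, z) * thomson_weight(x, y, z) * d_seg_len[iz];
        }
        pB[p] = sum;
    }
}

int main()
{
    float* d_pB = 0;
    cudaMalloc(&d_pB, NX * NY * sizeof(float));

    float seg[NZ];
    for (int i = 0; i < NZ; ++i) seg[i] = 0.05f;          // uniform step as a placeholder
    cudaMemcpyToSymbol(d_seg_len, seg, sizeof(seg));      // ~1.9 KB, well inside constant memory

    const int threads = 256;                              // per-block thread count from the slide
    const int blocks  = (NX * NY + threads - 1) / threads;
    pb_kernel<<<blocks, threads>>>(d_pB);
    cudaDeviceSynchronize();

    // ... copy d_pB back with cudaMemcpy and write the image ...
    cudaFree(d_pB);
    return 0;
}
```

The shared-memory staging mentioned on the previous slide is omitted here for brevity; it would cache the per-block reusable parts of the density data before the LOS loop.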
Experiment results
The pB calculation time of the two models is shown in Table 1.

Table 1. pB calculation time of the serial and parallel models and their speed-up ratios

                                                 MPI (G95)    CUDA (Visual Studio 2005)
  pB calculation time of the serial model (s)
  pB calculation time of the parallel model (s)
  Speed-up ratio
The total performance of the two models is shown in Table 2.

Table 2. Total running time of the two parallel models and the speed-up ratios compared with their serial models

                   MPI (G95) (s)    CUDA (Visual Studio 2005) (s)    Speed-up ratio of running time
  Serial model
  Parallel model
Finally, using the calculated data, we draw the coronal polarization brightness image shown below.
(Figure: the coronal polarization brightness image generated from the calculated data)
Conclusion
Under the same environment, both the MPI-based and the CUDA-based parallel models reduce the pB calculation time well below that of their serial counterparts; the measured times and speed-up ratios are those given in Tables 1 and 2. The total running time of the CUDA-based model is 2.84 times that of the MPI-based model.
We find that the CUDA-based parallel model is more suitable for the pB calculation, and that it provides a better solution for post-processing and visualizing MHD numerical calculation results.
Thank you!