R. Rastogi, A. Srivastava, K. Sirasala, H. Chavhan, K. Khonde

Experience of Porting and Optimization of Seismic Modelling on Multi and Many Cores of Hybrid Computing Cluster
We P4 14

Introduction
Seismic modelling is a technique for simulating the seismic response for a given geological subsurface model and shot-receiver geometry. It is based on a finite-difference solution of the second-order wave equation. Until the last decade, finite-difference-based seismic modelling applications scaled fairly well on single-processor-based parallel clusters using MPI. With the advent of accelerators such as Nvidia's GPUs and many-core coprocessors such as Intel's Xeon Phi, the MPI-only programming model became inefficient in terms of performance. Hybrid programming models and various optimizations are needed to enhance application performance.

In this paper, we report our experience of porting and optimizing a legacy 2D acoustic modelling application on the hybrid architecture of PARAM Yuva II. The application solves the seismic acoustic wave equation using a finite-difference method that is second-order accurate in time and fourth-order accurate in space. The initial application was MPI-based and used a domain decomposition approach for parallelization. The optimization and porting details on Xeon and Xeon Phi are presented here, along with the results of a comparative performance study.

Methodology
Initially, the application used MPI for parallelization, dividing the computation among processors by domain decomposition. The following steps were taken to run the application on the hybrid architecture and to compare performance on the multiple cores of the Xeon cluster and the many cores of the Xeon Phi coprocessor:
1. The application was profiled to identify hotspots. The analysis, using a 2×2 domain decomposition, indicated that the wave propagation function is the most compute-intensive part of the application: computation accounts for 80% and MPI communication for 17% of the total compute time.
2. OpenMP was introduced at the wave propagation loop to achieve data decomposition across the multiple cores of Xeon.
3. Various optimization techniques were applied to enhance the performance of the application.
4. The optimized application was ported to the many-core Xeon Phi in native and symmetric modes.

The System – PARAM Yuva II
A hybrid computing cluster with peak performance of TF. Each node has two Intel Xeon E (Sandy Bridge) processors and Xeon Phi 5110P coprocessors.

Optimizations on Xeon
Figure: Compute time of the application on Xeon as optimizations are applied cumulatively, with the relative speedup.

Porting on Xeon Phi
Figure: Compute time of the application on Xeon Phi in native and symmetric modes.

Scalability and Efficiency on Xeon and Xeon Phi
Figure panels: Before Optimization; After Optimization.

Application Outcome
Figure: Seismic modelling outcome: (a) input velocity model, (b) modelling parameters, (c) synthetic seismogram for a single shot location, (d) wave propagation snapshots at 0.1 s, 0.2 s, 0.3 s and 0.4 s.

Conclusions
Optimization and porting of a legacy finite-difference-based seismic modelling application on Xeon and Xeon Phi were successfully demonstrated on PARAM Yuva II. A speedup of 5.5x was achieved on Xeon through optimizations. Performance on Xeon was better than on Xeon Phi, but after optimization the compute times of Xeon Phi in native mode and of a single Xeon node were comparable for the 2×8 domain decomposition. The maximum efficiency achieved was 46% on Xeon and 8% on Xeon Phi, and it did not improve further as the domain decomposition increased. Compute times are presented for different symmetric and native executions on Xeon Phi. In symmetric mode, compute times were comparable to or better than those on Xeon for a few domain decompositions. Since seismic modelling is a key component of advanced applications such as RTM and FWI, further exploration of performance gains for such applications on similar hardware platforms is required.

Acknowledgements
The authors are thankful to the Centre for Development of Advanced Computing (CDAC), Pune, for permission to publish this work, and are grateful to Mr Arvind Amin from Intel for his expert advice and initial guidance.

References
Dongarra, J. et al. [2011] The international exascale software project roadmap. Int. J. High Perform. Comput. Appl., 25(1), 3–60.
Fang, J., Sips, H., Zhang, L., Xu, C., Che, Y. and Varbanescu, A.L. [2014] Test-driving Intel Xeon Phi. Proceedings of the 5th ACM/SPEC International Conference on Performance Engineering, 137–148.
Subrata, C., Sudhakar, Y., Suhas, P. and Dheeraj, B. [2003] Parallelization strategies for seismic modeling algorithms. J. Ind. Geophys. Union, 7(1), 11–14.
Sudhakar, Y., Dheeraj, B., Subrata, C. and Suhas, P. [2002] Finite difference forward modeling for complex geological models. SEG Technical Program Expanded Abstracts, 1987–1990.
Zhebel, E., Minisini, S., Kononov, A. and Mulder, W. [2013] Performance and scalability of finite-difference and finite-element wave-propagation modeling on Intel's Xeon Phi. SEG Technical Program Expanded Abstracts, 3386–3390.
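
Illustrative sketch
The poster describes a 2D acoustic finite-difference scheme that is second-order accurate in time and fourth-order accurate in space, but it does not show the kernel itself. The following is a minimal serial sketch of such a scheme in pure Python, not the authors' code. The function names (`ricker`, `step`, `simulate`), the grid size, the constant velocity, the Ricker source and the fixed zero boundary are all assumptions made for illustration. The nested loop in `step` corresponds to the kind of wave propagation loop that the poster reports as the hotspot and as the place where OpenMP was introduced.

```python
import math

# Fourth-order central-difference weights for a second derivative,
# for grid offsets -2, -1, 0, +1, +2 (they sum to zero).
C = (-1.0 / 12.0, 4.0 / 3.0, -5.0 / 2.0, 4.0 / 3.0, -1.0 / 12.0)


def ricker(t, f0):
    """Ricker wavelet with peak frequency f0 (Hz), delayed by 1/f0 s (assumed source)."""
    a = (math.pi * f0 * (t - 1.0 / f0)) ** 2
    return (1.0 - 2.0 * a) * math.exp(-a)


def step(p_prev, p_curr, vel, dt, h):
    """Advance the pressure field by one time step (second order in time)."""
    nz, nx = len(p_curr), len(p_curr[0])
    p_next = [[0.0] * nx for _ in range(nz)]
    # The outer two rows and columns stay zero: a simple fixed boundary,
    # not the absorbing boundary a production code would need.
    for iz in range(2, nz - 2):
        for ix in range(2, nx - 2):
            lap = 0.0
            for k in range(-2, 3):
                # Sum the x-direction and z-direction stencil contributions.
                lap += C[k + 2] * (p_curr[iz][ix + k] + p_curr[iz + k][ix])
            lap /= h * h
            p_next[iz][ix] = (2.0 * p_curr[iz][ix] - p_prev[iz][ix]
                              + (vel[iz][ix] * dt) ** 2 * lap)
    return p_next


def simulate(nz=41, nx=41, h=10.0, dt=0.001, nt=100, v=2000.0, f0=25.0):
    """Propagate a point source at the grid centre of a constant-velocity model."""
    vel = [[v] * nx for _ in range(nz)]
    p_prev = [[0.0] * nx for _ in range(nz)]
    p_curr = [[0.0] * nx for _ in range(nz)]
    sz, sx = nz // 2, nx // 2
    for it in range(nt):
        p_next = step(p_prev, p_curr, vel, dt, h)
        # Inject the source at the centre cell, scaled like the Laplacian term.
        p_next[sz][sx] += (v * dt) ** 2 * ricker(it * dt, f0)
        p_prev, p_curr = p_curr, p_next
    return p_curr
```

Calling `simulate()` returns the pressure field after 100 time steps as a list of rows. The default values keep `v * dt / h` small so that the explicit scheme stays stable. In a domain-decomposed version, each MPI rank would run `step` on its own subgrid and exchange a two-cell-wide halo with its neighbours before every step; that exchange is omitted here.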
