EGI-InSPIRE RI: GPGPU Status of Deployment and User Requirements
EGI GPGPU-VT (slides by John Walsh, Karolis Eigelis, Sophie Ferry)
Motivation
- Anecdotal evidence that GPGPUs and other accelerator devices will drive many future scientific and commercial computational needs:
  - Moore's law limitations on single-core performance
  - The data deluge
  - Top 500 HPC centre deployments
- GPGPUs are (somewhat) disruptive
- However, there is no direct Grid support (yet)
Learning from history
- MPI support on EDG/EGEE was initially poor:
  - RCs implemented different solutions
  - Fragmentation, no standardisation
- Establishment of the MPI-WG (circa 2006), which developed standardised mechanisms for:
  - S/W + H/W advertisement in the GlueSchema
  - Improved resource allocation at RCs
  - Improved WMS support
- Lesson: get interested parties involved early!
NILs and Virtual Teams
- Discussion between EGI members on how to support GPGPUs (ongoing for > 12 months)
- Wanted to gauge the usage and potential impact of GPGPUs on EGI
- Formal EGI-InSPIRE structures: NILs + Virtual Teams (VTs)
  - VTs investigate issues/problems/solutions
  - Short lived (3 to 6 months)
- Proposed the set-up of a short-lived GPGPU VT
- The first of many steps?
The GPGPU VT
- Proposed after the EGI CF 2012
- Started June 2012, with a 3-month lifetime
- 22 members (including MAPPER members)
- Method: determine the 5W+H (who, what, where, when, why, how)
  - Gather use cases (on the wiki)
  - Develop two surveys:
    - Resource Centre administrators
    - User communities, both Grid and non-Grid
Part II: Resource Centre Survey
RC Survey
- Aim: see how RC administrators deploy, or intend to deploy, GPGPUs and other accelerator devices and integrate them into the Grid (or hybrid Grid/Cloud infrastructures)
- 13 questions; many allowed respondents to add further optional details
- 44 responses, several from NGIs acting on behalf of their RCs
Survey Format
- Current and future deployment plans: Q1, Q2, Q3
- Batch system and H/W profile: Q4, Q5, Q6, Q7, Q9, Q10, Q11
- User community profile: Q8
- Respondent information: Q12
- Other miscellaneous information: Q13
Current Deployments
Q1: Does your site currently provide GPGPU resources?
- Answered by 43 respondents: Yes 13 (30.2%), No 30 (69.8%)
- Comments from "No" respondents include:
  - 1x will deploy GPGPUs
  - 1x current hardware not capable
  - 1x not within the scope of the project, but would like to
  - 1x users have no use/interest
GPGPU Expansion at Current Sites
Q2: Do you plan to further extend the amount of GPGPU compute capacity offered in the coming 24 months?
- Answered by the 13 "Yes" respondents of Q1: Yes 11 (84.6%), No 2 (15.4%)
- Comments:
  - 2x yes, depending on user requirements
  - 1x no, due to financial reasons
GPGPU Expansion at Sites
Q3: Will your site provide GPGPU resources in the coming 24 months?
- Answered by 41 respondents: Yes 23 (56.1%), No 18 (43.9%)
- Comments:
  - 4x no, due to budget constraints or site decommissioning
  - 1x no, not in our plans
  - 1x no, but this could change
Q1/Q2/Q3 Conclusions
- Of the sites that responded affirmatively to Q1 (30.2%), most (84.6%) intend to increase their offering in the next 2 years, although this may depend on user requirements
- Over 50% of sites expect to have some GPGPU resources within 2 years
Batch System Usage
Q4: Which LRMS is/will be used to provide access to GPGPUs?
- Answered by 22 respondents; some RCs/NGIs selected multiple batch systems
- Torque 18 (71%), SLURM 6 (21.8%), Other 4 (14%), the latter including Maui
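Torque and SLURM expose GPUs through different resource-request syntaxes, which is part of why the fragmentation lesson from the MPI experience applies here too. As a rough sketch (the resource names, counts, and the application name `my_gpu_app` are illustrative assumptions; exact syntax depends on the LRMS version and site configuration), a two-GPU job request might look like:

```shell
#!/bin/sh
# Illustrative GPU resource requests for the two most common LRMSs in
# the survey. Syntax is version- and site-dependent; treat as a sketch.

# Torque/PBS: GPUs are requested as part of the nodes specification.
cat <<'EOF' > gpu_job.pbs
#PBS -l nodes=1:ppn=1:gpus=2
#PBS -l walltime=01:00:00
./my_gpu_app
EOF

# SLURM: GPUs are requested as generic resources (GRES).
cat <<'EOF' > gpu_job.slurm
#SBATCH --nodes=1
#SBATCH --gres=gpu:2
#SBATCH --time=01:00:00
srun ./my_gpu_app
EOF
```

The scripts would then be submitted with `qsub gpu_job.pbs` or `sbatch gpu_job.slurm` respectively.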
Q5: How are/will the GPU resources (be) seen from the LRMS point of view?
Q6: Does/will every node in your cluster have a GPU?
- Answered by 23 respondents: Yes 5 (21.7%), No 18 (78.3%)
User/Group Access
Q7: Is access to GPGPU-enabled hosts restricted to specific users or groups?
- Answered by 14 respondents: Yes 7 (50.0%), No 7 (50.0%)
Scientific Disciplines (1)
Q8: Which user communities/projects are making use of GPGPUs at your site?
- Answered by 12 of 13 respondents
- Shows a large number of diverse disciplines already using the infrastructure
Physical Setup
Q9: Please provide information about the GPGPU hardware deployed at your site
- 13 responses
- Typically 1, 2 or 4 GPGPUs per physical machine; 2 GPGPUs per node was most common
- We will need to contact some respondents for clarification
- NVIDIA dominates current deployments
User Job Submission Support
Q10: Please provide information for the end-user, or a URL to the documentation, for accessing your GPGPU resources (e.g. how to submit a GPU job)
- The 8 responses indicate either:
  - a lack of user documentation for job submission, or
  - the use of custom grid/batch submission techniques
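For context, a sketch of one kind of "custom grid submission technique" sites may be using: a JDL file that steers the job towards GPGPU-capable sites via a software tag published in the GlueSchema, echoing the advertisement mechanism the MPI-WG standardised. The tag name "GPGPU" and the script name `gpu_job.sh` are assumptions, not an agreed standard; at the time of the survey no such standard existed.

```shell
#!/bin/sh
# Sketch of a JDL file that selects sites advertising a (hypothetical)
# "GPGPU" software tag in their GlueSchema runtime environment.
cat <<'EOF' > gpu.jdl
Executable    = "gpu_job.sh";
StdOutput     = "std.out";
StdError      = "std.err";
InputSandbox  = {"gpu_job.sh"};
OutputSandbox = {"std.out", "std.err"};
Requirements  = Member("GPGPU",
    other.GlueHostApplicationSoftwareRunTimeEnvironment);
EOF
# The actual submission would then be, e.g.:
#   glite-wms-job-submit -a gpu.jdl
```

Without an agreed tag, each site inventing its own name is exactly the fragmentation the "learning from history" slide warns about.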
Managed Access Customisations
Q11: Did you implement any additional mechanism to manage access to GPU devices (e.g. custom prolog/epilog scripts, wrappers, etc.)?
- Answered by 8 respondents: Yes 2, No 6
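A minimal sketch of the kind of prolog-based mechanism Q11 asks about: on Torque, the scheduler lists a job's assigned GPUs in the file named by $PBS_GPUFILE (lines of the form host-gpu/N), and a prolog can translate that list into CUDA_VISIBLE_DEVICES so the job only sees its own devices. The hostname and device indices below are fabricated for illustration.

```shell
#!/bin/sh
# Prolog-style sketch: map the GPUs Torque assigned to a job onto
# CUDA_VISIBLE_DEVICES. The example gpufile contents are fabricated.
printf 'node01-gpu/0\nnode01-gpu/2\n' > pbs_gpufile.example
PBS_GPUFILE=pbs_gpufile.example

# Strip everything up to the final '/' to get the device indices,
# then join them with commas (here: "0,2").
CUDA_VISIBLE_DEVICES=$(sed 's|.*/||' "$PBS_GPUFILE" | paste -sd, -)
export CUDA_VISIBLE_DEVICES
echo "$CUDA_VISIBLE_DEVICES"
```

A CUDA application started under this environment would only enumerate devices 0 and 2, which is one simple way to enforce the access restrictions reported in Q7 and Q11.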
Part III: User Survey
Grid/Cloud Users?
Q1: Do you currently use grid or cloud technologies?
- Answered by 44 respondents: Yes 37 (84.1%), No 7 (15.9%)
- A significant number of respondents are non-grid/cloud users; some use HPC resources, some use desktop machines
Interested in Using GPGPUs on the Grid?
Q2: Would you be interested in accessing remote GPGPU-based resources through a computing infrastructure, such as a National Grid Infrastructure or the European Grid Infrastructure (EGI)?
- Answered by 42 respondents: Yes 39 (92.9%), No 3 (7.1%)
- The "No" responses cited complex job submission, etc.
GPGPU Users
Q3: Do you use GPGPU-based applications for your scientific computations?
- Answered by 47 respondents: Yes 30 (63.8%), No 17 (36.2%)
- Almost two-thirds of respondents currently use GPGPUs
Q1/Q2/Q3 Conclusion
- The vast majority of surveyed users would be interested in accessing GPGPUs through the Grid.
User Requirements: Summary Highlights
Users showed preferences for:
- The CUBLAS and CUSPARSE libraries (93%, 60%)
- Double-precision floating-point arithmetic (70.6%)
  - This has an impact on RC choice of GPGPU hardware; some indicate it is desirable/preferable
- CUDA and OpenCL development APIs (94.1%, 41.2%); OpenACC given an honourable mention
- Some will use MPI and/or multiple GPGPUs per WN
- Exclusive access to the Worker Node (64.7%)
- A fast network connection is desirable
Benchmarks
- Respondents were asked if they could quantify the overall speed-up (measured against a typical single-core CPU); 20 responses
- Evidence is mostly anecdotal, although some papers have been published by users
- Heavily dependent on the application:
  - the most frequent answer was about 10x
  - many were in the 50x-100x range
  - the maximum reported was a 500x increase
User Survey: RC Implications
- Users would like to use GPGPUs on EGI
- Double-precision H/W is preferable
- Users would like exclusive access to the W/N
- CUDA will be the main development platform (this is not unexpected)
- RCs may also be expected to install extra libraries (e.g. CUBLAS)
Conclusions
- GPGPU deployment and the user base are expected to increase in the next 24 months, predominantly on NVIDIA hardware
- Users would like to be able to access these resources on EGI
- Users prefer full-node allocation, potentially with many GPGPUs per physical host
Next Steps?
- Reasonably good evidence to further investigate the technical issues of GPGPU integration
- An EGI SA1 Interest Group/Technical Group?
- Use VT members + survey contacts to develop this group?
VT-GPGPU Members
Karolis Eigelis (EGI.eu), John Walsh (TCD), Emanouil Atanassov (IICT-BAS), Radosław Januszewski (EGI.eu), Marek Blazewicz (ICBP), Aneta Karaivanova (IICT-BAS), Miguel Cárdenas-Montes (CIEMAT), Jan Just Keijser (FOM), Abdeslem Djaoui (STFC), Pierrick Micout (CEA), Tiziana Ferrari (EGI.eu), Mariusz Mamoński (ICBP), Nuno L. Ferreira (EGI.eu), Pablo Briongos Rabadán (CSIC), Sophie Ferry (CEA), Andrea Sartirana (CNRS), Maciej Filocha (UWAR), Oleksandr Savytskyi (IMBG/NAS), Andrea Giachetti (CIRMMP), Mariusz Sterzel (CYFRONET), Todor Gurov (IICT-BAS), Hardi Teder (EENet)