
1 HPC computing at CERN – use cases from the engineering and physics communities
Michal HUSEJKO, Ioannis AGTZIDIS (IT/PES/ES)

2 Agenda
– Introduction: where we are now
– Applications used at CERN that require HPC infrastructure
– Use cases (engineering): Ansys Mechanical, Ansys Fluent
– Physics HPC applications
– Next steps
– Q&A

4 Introduction
– Some 95% of our applications are served well by bread-and-butter machines
– We (CERN IT) have invested heavily in the Agile Infrastructure (AI), including a layered approach to responsibilities, virtualization, and a private cloud
– There are certain applications, traditionally called HPC applications, which have different requirements
– Even though these applications sail under the common HPC name, they differ from each other and have different requirements
– These applications need detailed requirements analysis

5 Scope of this talk
– We contacted our user community and began gathering user requirements on a continuous basis
– We have started a detailed system analysis of our HPC applications to understand their behavior
– In this talk I would like to present the progress and the next steps
– At a later stage, we will look at how the HPC requirements can fit into the IT infrastructure

6 HPC applications
Engineering applications:
– Used at CERN in different departments to model and design parts of the LHC machine
– The IT-PES-ES section supports the user community of these tools
– Tools used for: structural analysis, fluid dynamics, electromagnetics, multiphysics
– Major commercial tools: Ansys, Fluent, HFSS, Comsol, CST – but also open source: OpenFOAM (fluid dynamics)
Physics simulation applications:
– PH-TH Lattice QCD simulations
– BE LINAC4 plasma simulations
– BE beam simulations (CLIC, LHC, etc.)
– HEP simulation applications for theory and accelerator physics

7 Agenda
– Introduction: where we are now
– Applications used at CERN that require HPC infrastructure
– Use cases (engineering): Ansys Mechanical, Ansys Fluent
– Physics HPC applications
– Next steps
– Q&A

8 Use case 1: Ansys Mechanical
– Where? LINAC4 Beam Dump System
– Who? Ivo Vicente Leitao, Mechanical Engineer (EN/STI/TCD)
– How? Ansys Mechanical for design modeling and simulations (stress and thermal structural analysis)

9 How does it work?
Ansys Mechanical:
– Structural analysis: stress and thermal, steady and transient
– Finite Element Method (FEM): we have a physical problem defined by differential equations. It is impossible to solve it analytically for a complicated structure, so we divide the problem domain into subdomains (elements), solve the differential equations numerically at selected points (nodes), and then project the solution onto the global structure by means of approximation functions
– The example has 6.0 million mesh nodes (the "6M0" case): compute intensive and memory intensive
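The assemble-and-solve pattern described above can be sketched in a few lines. This is a deliberately tiny, illustrative 1D Poisson example with linear elements in NumPy, not how Ansys works internally; at 6 million nodes and in 3D, exactly this pattern is what becomes compute and memory intensive.

```python
import numpy as np

def fem_1d_poisson(n_elements, f=lambda x: 1.0):
    """Assemble and solve -u'' = f on [0,1] with u(0)=u(1)=0,
    using linear (hat) elements on a uniform mesh."""
    n_nodes = n_elements + 1
    h = 1.0 / n_elements
    x = np.linspace(0.0, 1.0, n_nodes)

    # Global stiffness matrix and load vector
    K = np.zeros((n_nodes, n_nodes))
    b = np.zeros(n_nodes)

    # Element-by-element assembly: each element contributes a 2x2 block
    k_local = (1.0 / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])
    for e in range(n_elements):
        idx = [e, e + 1]
        K[np.ix_(idx, idx)] += k_local
        # Midpoint quadrature for the load term
        xm = 0.5 * (x[e] + x[e + 1])
        b[idx] += f(xm) * h / 2.0

    # Dirichlet boundary conditions: eliminate the first and last node
    u = np.zeros(n_nodes)
    u[1:-1] = np.linalg.solve(K[1:-1, 1:-1], b[1:-1])
    return x, u
```

In a production solver, K is sparse and the dense `np.linalg.solve` call is replaced by a sparse direct or iterative solver; that solver step is where the memory and compute demands discussed in these slides come from.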

11 Simulation results
Measurement hardware configuration:
– 2x HP DL580 G7 servers (4x E7-8837, 512 GB RAM, 32 cores each), 10 Gb low-latency Ethernet link
Time to obtain a single-cycle 6M0 solution:
– 8 cores -> 63 hours to finish the simulation, 60 GB RAM used during the simulation
– 64 cores -> 17 hours to finish the simulation, 2 x 200 GB RAM used (200 GB per node)
– The user is interested in 50 cycles: about 131 days on 8 cores, or about 35 days on 64 cores
It is impossible to get simulation results for this case in a reasonable time on a standard engineering workstation.
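A quick sanity check of the scaling figures quoted above (63 h on 8 cores, 17 h on 64 cores, 50 cycles requested):

```python
# Figures from the measurement: single-cycle runtime at two core counts
hours_8c, hours_64c, cycles = 63.0, 17.0, 50

speedup = hours_8c / hours_64c        # ~3.7x from 8x more cores
efficiency = speedup / (64 / 8)       # parallel efficiency, ~46%
days_8c = cycles * hours_8c / 24      # ~131 days for 50 cycles
days_64c = cycles * hours_64c / 24    # ~35 days for 50 cycles
```

The efficiency figure (under 50%) is why the later slides look at MPI, memory bandwidth, and file I/O: the extra cores are far from fully used.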

12 Challenges
Why do we care?
– Every day we face users asking us how to speed up some engineering application
Challenges:
– Problem size and complexity are overwhelming user workstations in terms of computing power, memory size, and file I/O
– This can be extrapolated to other engineering HPC applications
How to solve the problem?
– Can we use the current infrastructure to provide a platform for these demanding applications?
– … or do we need something completely new?
– … and if something new, how could it fit into our IT infrastructure?
So, let's have a look at what is happening behind the scenes.

13 Analysis tools
Standard Linux performance monitoring tools used:
– Memory usage: sar
– Memory bandwidth: Intel PCM (Performance Counter Monitor, open source)
– CPU usage: iostat, dstat
– Disk I/O: dstat
– Network traffic monitoring: netstat
Monitoring scripts are started from the same node where the simulation job is started; collection of the measurement results is done automatically by our tools.
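A wrapper of the kind described above can be sketched as follows. This is a hypothetical reconstruction, not the actual CERN scripts: it starts the listed monitors alongside a job and stops them afterwards. Tool availability (sar, iostat, dstat) and the sampling intervals are assumptions.

```python
import shlex
import subprocess

# One monitor command per metric, each writing to its own log file.
# Commands and 5 s intervals are illustrative choices.
MONITORS = {
    "memory": "sar -r 5",     # memory usage every 5 s
    "cpu":    "iostat -c 5",  # CPU utilisation
    "disk":   "dstat -d 5",   # disk I/O
}

def monitor_commands(outdir="."):
    """Build (name, argv, logfile) triples for each monitor."""
    return [(name, shlex.split(cmd), f"{outdir}/{name}.log")
            for name, cmd in MONITORS.items()]

def run_with_monitoring(job_cmd, outdir="."):
    """Start all monitors, run the simulation job, then stop the monitors."""
    procs = []
    for name, argv, logfile in monitor_commands(outdir):
        with open(logfile, "w") as log:
            procs.append(subprocess.Popen(argv, stdout=log))
    try:
        subprocess.run(shlex.split(job_cmd), check=True)
    finally:
        for p in procs:
            p.terminate()
            p.wait()
```

Usage would look like `run_with_monitoring("ansys_job.sh", outdir="/tmp/run1")`, after which the per-tool logs can be collected and parsed.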

14 Multi-core scalability
Measurement info:
– LINAC4 beam dump system, single-cycle simulation
– 64 cores @ 1 TB: 2 nodes of (quad-socket Westmere E7-8837, 512 GB), 10 Gb iWARP
Results:
– The Ansys Mechanical simulation scales well beyond a single multi-core box
– Greatly improved number of jobs/week, or simulation cycles/week
Next steps: scale to more than two nodes and measure the impact of MPI
Conclusion:
– Multi-core platforms are needed to finish the simulation in a reasonable time
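Why "scale on more than two nodes" needs measuring rather than assuming can be illustrated with Amdahl's law. The 95% parallel fraction below is purely an assumption for illustration, not a measured property of Ansys Mechanical:

```python
def amdahl_speedup(parallel_fraction, n):
    """Projected speedup on n cores if a fraction p of the work
    parallelises perfectly and the rest stays serial."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n)

# Even with 95% of the work parallel, scaling flattens quickly:
projected = {n: amdahl_speedup(0.95, n) for n in (8, 64, 256)}
# roughly 5.9x at 8 cores, 15.4x at 64 cores, 18.6x at 256 cores
```

This is why the plan is to measure MPI behaviour directly: the serial fraction (and communication cost, which Amdahl's law ignores) determines whether adding nodes keeps paying off.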

15 Memory requirements
In-core/out-of-core simulations (avoiding costly file I/O):
– In-core = most of the temporary data is stored in RAM (the solver can still write to disk during the simulation)
– Out-of-core = uses files on the file system to store temporary data
– In-core is the preferred mode, as it avoids costly disk I/O accesses, but it requires more RAM and more memory bandwidth
Ansys Mechanical (and some other engineering applications) has limited scalability:
– It depends heavily on the solver and the user's problem
All commercial engineering applications use some licensing scheme, which can skew the choice of platform.
Conclusions:
– We are investigating whether we can spread the required memory over multiple dual-socket systems, or whether 4-socket systems are necessary for some HPC applications
– Certain engineering simulations appear to be limited by memory bandwidth; this also has to be considered when choosing a platform

16 Disk I/O impact
Ansys Mechanical – BE CLIC test system:
– Two Supermicro servers (dual E5-2650, 128 GB), 10 Gb iWARP back to back
Disk I/O impact on speedup, two configurations compared:
– Measured with sar and iostat
– The application spends a lot of time in iowait
– Using an SSD instead of an HDD increases jobs/week by almost 100%
Conclusion:
– We need to investigate more cases to see whether this is a marginal case or something more common
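The jobs/week metric used above translates directly from per-job runtime. The HDD runtime below is a made-up placeholder; only the roughly 100% improvement is from the measurement:

```python
def jobs_per_week(hours_per_job):
    """Throughput for back-to-back jobs in a 168-hour week."""
    return 168.0 / hours_per_job

hdd_hours = 10.0             # hypothetical per-job runtime on HDD
ssd_hours = hdd_hours / 2.0  # SSD roughly halving an iowait-bound runtime
improvement = jobs_per_week(ssd_hours) / jobs_per_week(hdd_hours) - 1.0
# improvement == 1.0, i.e. the ~100% jobs/week gain quoted above
```

The point is that for an iowait-dominated job, storage latency shows up one-for-one in throughput, which is why the slide flags this for wider investigation.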

17 Agenda
– Introduction: where we are now
– Applications used at CERN that require HPC infrastructure
– Use cases (engineering): Ansys Mechanical, Ansys Fluent
– Physics HPC applications
– Next steps
– Q&A

18 Use case 2: Fluent
– Computational Fluid Dynamics (CFD) application, Fluent (now provided by Ansys)
– Beam dump system at the PS Booster: heat is generated inside the dump, and it must be cooled to prevent it from melting or breaking under mechanical stress
– Extensively parallelized MPI-based software
– Performance characteristics similar to other MPI-based software: low latency matters for short messages, bandwidth matters for medium and large messages

19 Interconnect network latency impact
Ansys Fluent:
– CFD "heavy" test case from the CFD group (EN-CV-PJ)
– 64 cores @ 1 TB: 2 nodes of (quad-socket Westmere E7-8837, 512 GB), 10 Gb iWARP
Speedup beyond a single node can be diminished by a high-latency interconnect:
– The graph shows good scalability beyond a single box for the 10 Gb low-latency link, and dips in performance when node-to-node MPI is switched to 1 Gb
Next step: perform statistical MPI analysis (size and type of messages, computation vs. communication)
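The latency-versus-bandwidth point can be made concrete with the usual alpha-beta model of message transfer time. The latency figures below are illustrative assumptions, not measurements of the links in the slide:

```python
def transfer_time_us(size_bytes, latency_us, bandwidth_gbps):
    """Alpha-beta model: time = latency + size / bandwidth."""
    bytes_per_us = bandwidth_gbps * 1e9 / 8 / 1e6  # link rate in bytes/us
    return latency_us + size_bytes / bytes_per_us

small, large = 64, 4 * 1024 * 1024  # a short MPI message vs a bulk one

# Assumed: ~5 us for a low-latency 10 Gb link, ~50 us for plain 1 Gb Ethernet
t_fast_small = transfer_time_us(small, 5.0, 10.0)   # latency-dominated
t_slow_small = transfer_time_us(small, 50.0, 1.0)   # ~10x slower
t_fast_large = transfer_time_us(large, 5.0, 10.0)   # bandwidth-dominated
t_slow_large = transfer_time_us(large, 50.0, 1.0)
```

For 64-byte messages almost all the time is latency, which is why switching node-to-node MPI to a high-latency link produces the performance dips seen in the graph even when bandwidth would suffice.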

20 Memory bandwidth impact
Ansys Fluent:
– Measured with Intel PCM at the memory controller level
– Supermicro Sandy Bridge server (dual E5-2650), 102.4 GB/s peak memory bandwidth
– We observed peaks of a few seconds demanding 57 GB/s, over a 5 s sampling period; this is very close to the numbers measured with the STREAM synthetic benchmark on this platform
Next step: check the impact of memory speed on solution time
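The STREAM comparison mentioned above can be approximated in Python. This is only a rough proxy for the real C benchmark (NumPy cannot fuse the two operations into one pass, so the true traffic is somewhat higher than the 3-array count used here), but it gives the same order of magnitude:

```python
import time
import numpy as np

def triad_bandwidth_gbs(n=10_000_000, repeats=5):
    """STREAM-style triad a = b + 3.0*c; returns the best observed GB/s.
    Counts 3 arrays of 8-byte doubles per pass, as STREAM does."""
    b = np.random.rand(n)
    c = np.random.rand(n)
    a = np.empty_like(b)
    best = 0.0
    for _ in range(repeats):
        t0 = time.perf_counter()
        np.multiply(c, 3.0, out=a)  # a = 3.0 * c
        np.add(a, b, out=a)         # a = b + 3.0 * c
        dt = time.perf_counter() - t0
        best = max(best, 3 * n * 8 / dt / 1e9)
    return best
```

Comparing this figure against the platform peak (102.4 GB/s here) shows how much headroom, if any, an application peak like the observed 57 GB/s leaves.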

21 Analysis done so far
We have invested time in building a first generation of tools to monitor different system parameters:
– Multi-core scalability (Ansys Mechanical)
– Memory size requirements (Ansys Mechanical)
– Memory bandwidth requirements (Fluent)
– Interconnect network (Fluent)
– File I/O (Ansys Mechanical)
Redo some measurements:
– Westmere 4-socket -> Sandy Bridge 4-socket
Next steps:
– Start detailed interconnect monitoring using MPI tracing tools (Intel Trace Analyzer and Collector)

22 Agenda
– Introduction: where we are now
– Applications used at CERN that require HPC infrastructure
– Use cases (engineering): Ansys Mechanical, Ansys Fluent
– Physics HPC applications
– Next steps
– Q&A

23 Physics HPC applications
PH-TH:
– Lattice QCD simulations
BE LINAC4 plasma simulations:
– Plasma formation in the Linac4 ion source
BE CLIC simulations:
– Preservation of the luminosity over time under the effects of dynamic imperfections, such as vibrations, ground motion, and failures of accelerator components

24 Lattice QCD
– MPI-based application with inline assembly in the most time-critical parts of the program
Main objectives are to investigate:
– The impact of memory bandwidth on performance
– The impact of the interconnection network on performance (comparison of 10 Gb iWARP and InfiniBand QDR)

25 BE LINAC4 plasma studies
– MPI-based application
– Users are requesting a system with 250 GB of RAM for 48 cores
Main objective is to investigate:
– The scalability of the application beyond 48 cores, in order to spread the memory requirement over more cores
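How such a 250 GB / 48-core request maps onto cluster nodes is a simple calculation. The 128 GB / 16-core dual-socket node below matches the test clusters on the next slide only loosely and is an illustrative assumption:

```python
import math

def nodes_needed(total_ram_gb, total_cores, node_ram_gb, node_cores):
    """Nodes required so that both the RAM and the core demand are met."""
    return max(math.ceil(total_ram_gb / node_ram_gb),
               math.ceil(total_cores / node_cores))

# Assumed dual-socket node: 128 GB RAM, 16 cores
n = nodes_needed(250, 48, 128, 16)
# n == 3: here the core count, not the RAM, drives the node count,
# leaving ~83 GB of the request per node, well under 128 GB
```

This is exactly why scalability beyond 48 cores matters: if the application scales, the memory demand spreads over ordinary dual-socket nodes instead of requiring a single large-memory machine.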

26 Clusters
To better understand the requirements of CERN physics HPC applications, two clusters have been prepared to:
– Investigate scalability
– Investigate the importance of interconnect, memory bandwidth, and file I/O
Test configuration:
– 20x Sandy Bridge dual-socket nodes with a 10 Gb iWARP low-latency link
– 16x Sandy Bridge dual-socket nodes with Quad Data Rate (40 Gb/s) InfiniBand

27 Agenda
– Introduction: where we are now
– Applications used at CERN that require HPC infrastructure
– Use cases (engineering): Ansys Mechanical, Ansys Fluent
– Physics HPC applications
– Next steps
– Q&A

28 Next steps
– An activity has started to better understand the requirements of CERN HPC applications
– The standard Linux performance monitoring tools give us a very detailed overview of system behavior for different applications
Next steps are to:
– Refine our approach and our scripts to work at a higher scale (first target: 20 nodes)
– Gain more knowledge about the impact of the interconnection network on MPI jobs

29 Thank you – Q&A
