1 Towards the Implementation of Wind Turbine Simulations on Many-Core Systems
I. E. Venetis¹, N. Nikoloutsakos¹, E. Gallopoulos¹, John Ekaterinaris² (¹University of Patras, Greece; ²Embry-Riddle Aeronautical University, FL, USA)

2 Introduction
Many systems are modelled by PDEs. To simulate them on a computer:
Discretization of the underlying PDEs, e.g. with the Finite Element Method (FEM)
Construction of a system of linear or non-linear equations
Solution of the system of equations
Typically very time consuming, hence the use of HPC systems

3 Target
Accelerate FSI (fluid-structure interaction) simulations of next-generation wind turbine blades
FSI application by J. A. Ekaterinaris
Use GPU computing power to reduce execution time

4 Typical FEM Workflow
Discretization of the application domain by applying a grid of elements
Numerical integration, which includes calculation of the local stiffness matrix (LSM) for each element
Matrix assembly, constructing the global stiffness matrix from the local matrices
Repeat: solve the system of linear equations described by the large, sparse matrix computed in the previous step
Only the solution step is repeated, so improvements in calculating the LSMs have little impact on the overall execution time

5 Wind Turbine Simulation Application
Next-generation wind turbines are large
The blowing wind applies forces that:
Cause deformation of the blades that can no longer be ignored
Cause movement of the turbine that can no longer be ignored
As a result, parts of the turbine no longer correspond to the elements of the original discretization and simulation results become inaccurate
Solution: the local stiffness matrix of each element has to be recalculated after each time step

6 Workflow in Wind Turbine Simulation Application
Discretization of the application domain by applying a grid of elements
Repeat for each time step:
Numerical integration, which includes calculation of the local stiffness matrix (LSM) for each element
Matrix assembly, constructing the global stiffness matrix from the local matrices
Solve the system of linear equations described by the large, sparse matrix computed in the previous step
Since the LSMs are now recomputed in every time step, accelerating their construction is worth the effort (a sketch of this loop follows below)
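A minimal host-side sketch of this time-stepping loop, using hypothetical function names (discretize_domain, compute_lsm, assemble_global, solve_system, update_geometry) that only illustrate the structure and are not the application's actual API:

/* Hypothetical outline of the wind turbine simulation workflow. */
discretize_domain(&mesh);                    /* done once */
for (step = 0; step < num_steps; step++) {
    for (k = 0; k < mesh.elnum; k++)         /* LSM recomputed every step */
        compute_lsm(&mesh.el[k]);
    assemble_global(&mesh, &A);              /* global stiffness matrix */
    solve_system(&A, &b, &x);                /* large, sparse linear system */
    update_geometry(&mesh, &x);              /* blade deformation / turbine motion */
}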

7 Recent activity

8 GPUs as accelerators
GPUs have evolved into extremely flexible and powerful processors
Contemporary GPUs provide large numbers of cores (e.g., 2880 cores on the NVIDIA Tesla K40)
High throughput-to-cost ratio
NVIDIA GPUs are programmable using CUDA, a set of extensions to industry-standard programming languages

9 GPUs as accelerators

10 LSM calculations on the GPU
Calculation of the LSM of each element does not depend on any other calculation, making it an ideal candidate for computing on the GPU
There is typically a large number of elements, which is naturally handled by the programming model of the GPU
However, there might be insufficient memory to store all the elements on the GPU (see the sketch after this list)
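One common way to cope with limited device memory, given here only as an assumed illustration and not necessarily the authors' scheme, is to process the elements in fixed-size batches; the Element type, the batch size and compute_lsm_kernel are placeholders:

/* Hypothetical batching: copy a chunk of elements to the GPU, compute
   their LSMs there, copy the results back, then reuse the buffer. */
size_t batch = 4096;                                  /* assumed batch size */
Element *d_el;
cudaMalloc(&d_el, batch * sizeof(Element));
for (size_t start = 0; start < elnum; start += batch) {
    size_t n = (elnum - start < batch) ? elnum - start : batch;
    cudaMemcpy(d_el, &el[start], n * sizeof(Element), cudaMemcpyHostToDevice);
    compute_lsm_kernel<<<(unsigned)n, 128>>>(d_el, (int)n);   /* placeholder kernel */
    cudaMemcpy(&el[start], d_el, n * sizeof(Element), cudaMemcpyDeviceToHost);
}
cudaFree(d_el);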

11 LSM construction pseudocode
Hexahedral elements, second-order expansion: NVB = 27, NQP = 5

// Iterate over all elements
for (k = 0; k < elnum; k++) {
    // Iterate over pairs of polynomial basis functions
    for (m = 0; m < NVB; m++) {
        for (n = 0; n < NVB; n++) {
            row, col = getrowcol(m, n);
            // Iterate over integration (quadrature) points
            for (x = 0; x < NQP; x++) {
                for (y = 0; y < NQP; y++) {
                    for (z = 0; z < NQP; z++) {
                        el[k].lsm[row][col] += "elasticity equation";
                    }
                }
            }   // x, y, z
        }
    }   // m, n
}   // k

12 Mapping of calculations on the GPU
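The figure on this slide is not part of the transcript. Purely as an illustration of one plausible mapping, and not necessarily the one used by the authors, each element can be assigned to a CUDA thread block and each (m, n) pair of basis functions to a thread; Element, getrowcol and elasticity_term below are placeholder names:

/* Hypothetical mapping: blockIdx.x selects the element, the threads of
   the block enumerate the NVB*NVB (m, n) basis-function pairs. */
__global__ void compute_lsm_kernel(Element *el, int elnum) {
    int k = blockIdx.x;
    if (k >= elnum) return;
    for (int p = threadIdx.x; p < NVB * NVB; p += blockDim.x) {
        int m = p / NVB, n = p % NVB;
        int row, col;
        getrowcol(m, n, &row, &col);              /* placeholder device helper */
        double acc = 0.0;
        for (int x = 0; x < NQP; x++)
            for (int y = 0; y < NQP; y++)
                for (int z = 0; z < NQP; z++)
                    acc += elasticity_term(&el[k], m, n, x, y, z);   /* placeholder */
        el[k].lsm[row][col] = acc;
    }
}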

13 Improvements introduced for a single GPU
Overlap calculations with data transfers to/from the host (see the sketch below)
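A common way to obtain such overlap in CUDA, shown here only as a hedged illustration with placeholder buffer names and kernel, is to combine pinned host memory, two device buffers and asynchronous copies in separate streams:

/* Hypothetical double-buffered pipeline: while one stream computes a
   batch, the other stream transfers the next one. h_el must be pinned
   host memory (cudaHostAlloc) for the copies to be truly asynchronous. */
cudaStream_t s[2];
for (int i = 0; i < 2; i++) cudaStreamCreate(&s[i]);
for (size_t b = 0; b < nbatches; b++) {
    int i = b % 2;
    cudaMemcpyAsync(d_el[i], h_el + b * batch, batch * sizeof(Element),
                    cudaMemcpyHostToDevice, s[i]);
    compute_lsm_kernel<<<(unsigned)batch, 128, 0, s[i]>>>(d_el[i], (int)batch);
    cudaMemcpyAsync(h_el + b * batch, d_el[i], batch * sizeof(Element),
                    cudaMemcpyDeviceToHost, s[i]);
}
cudaDeviceSynchronize();
for (int i = 0; i < 2; i++) cudaStreamDestroy(s[i]);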

14 Improvements introduced for a single GPU
All valid mappings of the loop nest onto a given number of threads have been tested for our configuration
Input data were reordered in memory to become GPU memory-friendly (see the sketch below)
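One standard form of such reordering, offered only as an assumed illustration, is switching from an array-of-structures to a structure-of-arrays layout, so that consecutive threads access consecutive addresses and memory transactions coalesce; NNODES and the field names are placeholders:

/* Array-of-structures: the data of one element are contiguous, but when
   thread k reads field x[j] of element k, consecutive threads stride
   through memory by sizeof(ElementAoS). */
struct ElementAoS { double x[NNODES], y[NNODES], z[NNODES]; };

/* Structure-of-arrays (GPU-friendly): the j-th x value of all elements is
   stored contiguously, e.g. x[j * elnum + k], so consecutive threads read
   consecutive addresses. */
struct ElementsSoA { double *x, *y, *z; };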

15 Results for single GPU approach
The approach provides a large improvement in execution time
LSM calculations only: up to 98.1%
Total execution time: up to 76.2%
Extension: single GPU → multi-GPU → multi-node

16 Multi-GPU

17 Multi-Node & Multi-GPU

18 Computing platform
We thank the LinkSCEEM-2 project, funded by the European Commission under the 7th Framework Programme through Capacities Research Infrastructure, INFRA Virtual Research Communities, Combination of Collaborative Project and Coordination and Support Actions (CP-CSA), under grant agreement no. RI.

19 Parameters of application/experiments
8 degrees of freedom
4 cases for the number of elements: 256, 1024, 4096, 16384
Used up to 8 nodes of the cluster, for a total of up to 16 GPUs

20 Speedup of LSMs calculation using 1 GPU per node

21 Speedup of LSMs calculation using 2 GPUs per node

22 Special case: 65536 elements
Does not fit into the memory of 1 GPU, but does fit into the memory of 2 GPUs
Speedup is therefore measured against the execution time on 1 node using 2 GPUs (a distribution sketch follows below)
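A straightforward way to split such a case across the two GPUs of a node, given purely as an assumed sketch and not as the authors' implementation, is to give each device a contiguous half of the elements; since the LSM computations are independent, no inter-GPU communication is needed during this phase:

/* Hypothetical split of the elements over the GPUs of one node;
   d_el[g] is assumed to have been allocated on device g beforehand. */
int ngpus = 2;
size_t per_gpu = (elnum + ngpus - 1) / ngpus;
for (int g = 0; g < ngpus; g++) {
    cudaSetDevice(g);
    size_t start = g * per_gpu;
    size_t n = (start + per_gpu > elnum) ? elnum - start : per_gpu;
    cudaMemcpyAsync(d_el[g], &el[start], n * sizeof(Element),
                    cudaMemcpyHostToDevice);
    compute_lsm_kernel<<<(unsigned)n, 128>>>(d_el[g], (int)n);   /* placeholder */
    cudaMemcpyAsync(&el[start], d_el[g], n * sizeof(Element),
                    cudaMemcpyDeviceToHost);
}
for (int g = 0; g < ngpus; g++) { cudaSetDevice(g); cudaDeviceSynchronize(); }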

23 Conclusion
LSM calculations are highly parallelizable
Significant overall improvement in execution time

24 Future Work
Execute on a larger cluster
Allow larger numbers of elements, for which the elements do not fit into GPU memory
Reorganize the representation of elements in memory to better fit the architectural characteristics of GPUs
Parallelize more functions
Include a CUDA parallel solver; currently PETSc is used for this purpose (see the sketch below), and the available CUDA solvers seem to have poor performance
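As a hedged illustration of the kind of call sequence the PETSc-based solution step involves, and not necessarily the application's exact configuration, a Krylov solve of the assembled global system looks roughly like this:

/* Hypothetical helper: solve A x = b with a PETSc Krylov solver
   (PETSc >= 3.5 interface). Error checking omitted for brevity. */
#include <petscksp.h>

static PetscErrorCode solve_global_system(Mat A, Vec b, Vec x)
{
    KSP ksp;
    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, A, A);    /* operator and preconditioning matrix */
    KSPSetFromOptions(ksp);        /* method/preconditioner chosen at run time */
    KSPSolve(ksp, b, x);
    KSPDestroy(&ksp);
    return 0;
}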

25 Acknowledgements
This research has been co-financed by the European Union (European Social Fund, ESF) and Greek national funds through the Operational Program "Education and Lifelong Learning" of the National Strategic Reference Framework (NSRF), Research Funding Program THALES: "Reinforcement of the interdisciplinary and/or inter-institutional research and innovation" (MIS, "Expertise development for the aeroelastic analysis and the design-optimization of wind turbines"). Support was also provided by the LinkSCEEM-2 project, funded by the European Commission under the 7th Framework Programme through Capacities Research Infrastructure, INFRA Virtual Research Communities, Combination of Collaborative Project and Coordination and Support Actions (CP-CSA), under grant agreement no. RI.


