1 Towards the Implementation of Wind Turbine Simulations on Many-Core Systems
I. E. Venetis¹, N. Nikoloutsakos¹, E. Gallopoulos¹, John Ekaterinaris² (¹University of Patras, Greece; ²Embry-Riddle Aeronautical University, FL, USA)

2 Introduction
Many systems are modelled by PDEs. To simulate them on a computer:
Discretization of the underlying PDEs, e.g. with the Finite Element Method (FEM)
Construction of a system of linear or non-linear equations
Solution of the system of equations
Typically very time consuming, hence the use of HPC systems

3 Target
Accelerate FSI (fluid-structure interaction) simulations of next-generation wind turbine blades
FSI application by J. A. Ekaterinaris
Use GPU computing power to reduce execution time

4 Typical FEM Workflow
Discretization of the application domain by applying a grid of elements
Numerical integration, which includes calculation of the local stiffness matrix (LSM) for each element
Matrix assembly, constructing the global stiffness matrix from the local matrices
Repeat: solve the system of linear equations described by the large, sparse matrix computed in the previous step
Only the solution step is repeated, so improvements in calculating the LSMs have little impact on the overall execution time

5 Wind Turbine Simulation Application
Next-generation wind turbines are large
The blowing wind applies forces that:
Cause deformation of the blades that can no longer be ignored
Cause movement of the turbine that can no longer be ignored
As a result, parts of the turbine no longer correspond to the elements of the original discretization and simulation results become inaccurate
Solution: the local stiffness matrix of each element has to be recalculated after each time step

6 Workflow in Wind Turbine Simulation Application
Discretization of the application domain by applying a grid of elements
Repeat for each time step:
Numerical integration, which includes calculation of the local stiffness matrix (LSM) for each element
Matrix assembly, constructing the global stiffness matrix from the local matrices
Solve the system of linear equations described by the large, sparse matrix computed in the previous step
Since the LSMs are now recomputed in every time step, accelerating their construction is worth the effort (a sketch of this loop follows below)
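A minimal host-side sketch of this time-stepping loop, using hypothetical function names (discretize_domain, compute_lsm, assemble_global, solve_system, update_geometry) that only illustrate the structure and are not the application's actual API:

/* Hypothetical outline of the wind turbine simulation workflow. */
discretize_domain(&mesh);                    /* done once */
for (step = 0; step < num_steps; step++) {
    for (k = 0; k < mesh.elnum; k++)         /* LSM recomputed every step */
        compute_lsm(&mesh.el[k]);
    assemble_global(&mesh, &A);              /* global stiffness matrix */
    solve_system(&A, &b, &x);                /* large, sparse linear system */
    update_geometry(&mesh, &x);              /* blade deformation / turbine motion */
}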

7 Recent activity

8 GPUs as accelerators
GPUs have evolved into extremely flexible and powerful processors
Contemporary GPUs provide large numbers of cores (e.g., 2880 cores on the NVIDIA Tesla K40)
High throughput-to-cost ratio
NVIDIA GPUs are programmable using CUDA, a set of extensions to industry-standard programming languages

9 GPUs as accelerators

10 LSM calculations on the GPU
Calculation of the LSM of each element does not depend on any other calculation, making it an ideal candidate for computing on the GPU
There is typically a large number of elements, which is naturally handled by the programming model of the GPU
However, there might be insufficient memory to store all the elements on the GPU (see the sketch after this list)
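One common way to cope with limited device memory, given here only as an assumed illustration and not necessarily the authors' scheme, is to process the elements in fixed-size batches; the Element type, the batch size and compute_lsm_kernel are placeholders:

/* Hypothetical batching: copy a chunk of elements to the GPU, compute
   their LSMs there, copy the results back, then reuse the buffer. */
size_t batch = 4096;                                  /* assumed batch size */
Element *d_el;
cudaMalloc(&d_el, batch * sizeof(Element));
for (size_t start = 0; start < elnum; start += batch) {
    size_t n = (elnum - start < batch) ? elnum - start : batch;
    cudaMemcpy(d_el, &el[start], n * sizeof(Element), cudaMemcpyHostToDevice);
    compute_lsm_kernel<<<(unsigned)n, 128>>>(d_el, (int)n);   /* placeholder kernel */
    cudaMemcpy(&el[start], d_el, n * sizeof(Element), cudaMemcpyDeviceToHost);
}
cudaFree(d_el);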

11 LSM construction pseudocode
Hexahedral elements, second-order expansion: NVB = 27, NQP = 5

// Iterate over all elements
for (k = 0; k < elnum; k++) {
    // Iterate over pairs of polynomial basis functions
    for (m = 0; m < NVB; m++) {
        for (n = 0; n < NVB; n++) {
            row, col = getrowcol(m, n);
            // Iterate over integration (quadrature) points
            for (x = 0; x < NQP; x++) {
                for (y = 0; y < NQP; y++) {
                    for (z = 0; z < NQP; z++) {
                        el[k].lsm[row][col] += "elasticity equation";
                    }
                }
            }   // x, y, z
        }
    }   // m, n
}   // k

12 Mapping of calculations on the GPU
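The figure on this slide is not part of the transcript. Purely as an illustration of one plausible mapping, and not necessarily the one used by the authors, each element can be assigned to a CUDA thread block and each (m, n) pair of basis functions to a thread; Element, getrowcol and elasticity_term below are placeholder names:

/* Hypothetical mapping: blockIdx.x selects the element, the threads of
   the block enumerate the NVB*NVB (m, n) basis-function pairs. */
__global__ void compute_lsm_kernel(Element *el, int elnum) {
    int k = blockIdx.x;
    if (k >= elnum) return;
    for (int p = threadIdx.x; p < NVB * NVB; p += blockDim.x) {
        int m = p / NVB, n = p % NVB;
        int row, col;
        getrowcol(m, n, &row, &col);              /* placeholder device helper */
        double acc = 0.0;
        for (int x = 0; x < NQP; x++)
            for (int y = 0; y < NQP; y++)
                for (int z = 0; z < NQP; z++)
                    acc += elasticity_term(&el[k], m, n, x, y, z);   /* placeholder */
        el[k].lsm[row][col] = acc;
    }
}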

13 Improvements introduced for a single GPU
Overlap calculations with data transfers to/from the host (see the sketch below)
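A common way to obtain such overlap in CUDA, shown here only as a hedged illustration with placeholder buffer names and kernel, is to combine pinned host memory, two device buffers and asynchronous copies in separate streams:

/* Hypothetical double-buffered pipeline: while one stream computes a
   batch, the other stream transfers the next one. h_el must be pinned
   host memory (cudaHostAlloc) for the copies to be truly asynchronous. */
cudaStream_t s[2];
for (int i = 0; i < 2; i++) cudaStreamCreate(&s[i]);
for (size_t b = 0; b < nbatches; b++) {
    int i = b % 2;
    cudaMemcpyAsync(d_el[i], h_el + b * batch, batch * sizeof(Element),
                    cudaMemcpyHostToDevice, s[i]);
    compute_lsm_kernel<<<(unsigned)batch, 128, 0, s[i]>>>(d_el[i], (int)batch);
    cudaMemcpyAsync(h_el + b * batch, d_el[i], batch * sizeof(Element),
                    cudaMemcpyDeviceToHost, s[i]);
}
cudaDeviceSynchronize();
for (int i = 0; i < 2; i++) cudaStreamDestroy(s[i]);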

14 Improvements introduced for a single GPU
All valid mappings of the loop nest onto a given number of threads have been tested for our configuration
Input data were reordered in memory to become GPU memory-friendly (see the sketch below)
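One standard form of such reordering, offered only as an assumed illustration, is switching from an array-of-structures to a structure-of-arrays layout, so that consecutive threads access consecutive addresses and memory transactions coalesce; NNODES and the field names are placeholders:

/* Array-of-structures: the data of one element are contiguous, but when
   thread k reads field x[j] of element k, consecutive threads stride
   through memory by sizeof(ElementAoS). */
struct ElementAoS { double x[NNODES], y[NNODES], z[NNODES]; };

/* Structure-of-arrays (GPU-friendly): the j-th x value of all elements is
   stored contiguously, e.g. x[j * elnum + k], so consecutive threads read
   consecutive addresses. */
struct ElementsSoA { double *x, *y, *z; };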

15 Results for single GPU approach
The approach provides a large improvement in execution time
LSM calculations only: up to 98.1%
Total execution time: up to 76.2%
Extension: single GPU → multi-GPU → multi-node

16 Multi-GPU

17 Multi-Node & Multi-GPU

18 Computing platform
We thank the LinkSCEEM-2 project, funded by the European Commission under the 7th Framework Programme through Capacities Research Infrastructure, INFRA Virtual Research Communities, Combination of Collaborative Project and Coordination and Support Actions (CP-CSA), under grant agreement no. RI.

19 Parameters of application/experiments
8 degrees of freedom
4 cases for the number of elements: 256, 1024, 4096, 16384
Used up to 8 nodes of the cluster, for a total of up to 16 GPUs

20 Speedup of LSMs calculation using 1 GPU per node

21 Speedup of LSMs calculation using 2 GPUs per node

22 Special case: 65536 elements
Does not fit into the memory of 1 GPU, but does fit into the memory of 2 GPUs
Speedup is therefore measured against the execution time on 1 node using 2 GPUs (a distribution sketch follows below)
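A straightforward way to split such a case across the two GPUs of a node, given purely as an assumed sketch and not as the authors' implementation, is to give each device a contiguous half of the elements; since the LSM computations are independent, no inter-GPU communication is needed during this phase:

/* Hypothetical split of the elements over the GPUs of one node;
   d_el[g] is assumed to have been allocated on device g beforehand. */
int ngpus = 2;
size_t per_gpu = (elnum + ngpus - 1) / ngpus;
for (int g = 0; g < ngpus; g++) {
    cudaSetDevice(g);
    size_t start = g * per_gpu;
    size_t n = (start + per_gpu > elnum) ? elnum - start : per_gpu;
    cudaMemcpyAsync(d_el[g], &el[start], n * sizeof(Element),
                    cudaMemcpyHostToDevice);
    compute_lsm_kernel<<<(unsigned)n, 128>>>(d_el[g], (int)n);   /* placeholder */
    cudaMemcpyAsync(&el[start], d_el[g], n * sizeof(Element),
                    cudaMemcpyDeviceToHost);
}
for (int g = 0; g < ngpus; g++) { cudaSetDevice(g); cudaDeviceSynchronize(); }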

23 Conclusion
LSM calculations are highly parallelizable
Significant overall improvement in execution time

24 Future Work
Execute on a larger cluster
Allow larger numbers of elements, for which the elements do not fit into GPU memory
Reorganize the representation of elements in memory to better fit the architectural characteristics of GPUs
Parallelize more functions
Include a CUDA parallel solver; currently PETSc is used for this purpose (see the sketch below), and the available CUDA solvers seem to have poor performance
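As a hedged illustration of the kind of call sequence the PETSc-based solution step involves, and not necessarily the application's exact configuration, a Krylov solve of the assembled global system looks roughly like this:

/* Hypothetical helper: solve A x = b with a PETSc Krylov solver
   (PETSc >= 3.5 interface). Error checking omitted for brevity. */
#include <petscksp.h>

static PetscErrorCode solve_global_system(Mat A, Vec b, Vec x)
{
    KSP ksp;
    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, A, A);    /* operator and preconditioning matrix */
    KSPSetFromOptions(ksp);        /* method/preconditioner chosen at run time */
    KSPSolve(ksp, b, x);
    KSPDestroy(&ksp);
    return 0;
}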

25 Acknowledgements
This research has been co-financed by the European Union (European Social Fund, ESF) and Greek national funds through the Operational Program "Education and Lifelong Learning" of the National Strategic Reference Framework (NSRF), Research Funding Program THALES: "Reinforcement of the interdisciplinary and/or inter-institutional research and innovation" (MIS, "Expertise development for the aeroelastic analysis and the design-optimization of wind turbines"). Support was also provided by the LinkSCEEM-2 project, funded by the European Commission under the 7th Framework Programme through Capacities Research Infrastructure, INFRA Virtual Research Communities, Combination of Collaborative Project and Coordination and Support Actions (CP-CSA), under grant agreement no. RI.


