Towards the Implementation of Wind Turbine Simulations on Many-Core Systems
I. E. Venetis¹, N. Nikoloutsakos¹, E. Gallopoulos¹, John Ekaterinaris²
¹University of Patras, Greece
²Embry-Riddle Aeronautical University, FL, USA
Introduction
- Many systems are modelled by PDEs
- To simulate on a computer:
  - Discretization of the underlying PDEs, e.g., with the Finite Element Method (FEM)
  - Construct a system of linear or non-linear equations
  - Solve the system of equations
- Typically very time consuming: use of HPC systems
Target
- Accelerate fluid-structure interaction (FSI) simulations of next-generation wind turbine blades
- FSI application by J. A. Ekaterinaris
- Use GPU computing power to reduce execution time
Typical FEM Workflow
1. Discretization of the application domain by applying a grid of elements
2. Numerical integration, which includes calculation of the local stiffness matrix (LSM) for each element
3. Matrix assembly, constructing the global stiffness matrix from the local matrices (a sketch of this step follows below)
4. Repeat: solve the system of linear equations described by the large, sparse matrix computed in the previous step
- In this workflow, improvements in calculating the LSMs have little impact on the overall execution time
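To make step 3 concrete, here is a minimal assembly sketch in C. The connectivity array conn (mapping each element's local basis functions to global degrees of freedom) and the dense storage of the global matrix are assumptions for illustration; the application stores the global matrix in a sparse format.

/* Minimal assembly sketch: scatter each element's local stiffness
   matrix (LSM) into the global stiffness matrix K. The names conn,
   NVB and elnum are illustrative; a real code uses sparse storage. */
#define NVB 27  /* basis functions per hexahedral element */

void assemble(double *K, int ndof,       /* global matrix, ndof x ndof */
              double lsm[][NVB][NVB],    /* one NVB x NVB LSM per element */
              int conn[][NVB],           /* local -> global DOF map    */
              int elnum)
{
    for (int k = 0; k < elnum; k++)
        for (int m = 0; m < NVB; m++)
            for (int n = 0; n < NVB; n++)
                K[conn[k][m] * ndof + conn[k][n]] += lsm[k][m][n];
}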
Wind Turbine Simulation Application
- Next-generation wind turbines are large
- The blowing wind applies forces, which:
  - cause deformation of the blades that can no longer be ignored
  - cause movement of the turbine that can no longer be ignored
- Parts of the turbine then no longer correspond to the elements of the discretization, so simulation results are not accurate
- Solution: the local stiffness matrix of each element has to be recalculated after each time step
Workflow in Wind Turbine Simulation Application
1. Discretization of the application domain by applying a grid of elements
2. Repeat at every time step:
   a. Numerical integration, which includes calculation of the local stiffness matrix (LSM) for each element
   b. Matrix assembly, constructing the global stiffness matrix from the local matrices
   c. Solve the system of linear equations described by the large, sparse matrix computed in the previous step
- Here, accelerating the construction of the LSMs is worth the effort (the loop is sketched below)
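A schematic of this loop in C-style pseudocode; all function names are placeholders, not the application's actual API:

/* Schematic time loop for the wind turbine application: because the
   geometry deforms, the LSMs are recomputed inside the loop. */
discretize_domain();                /* done once                         */
for (int step = 0; step < nsteps; step++) {
    compute_lsms();                 /* numerical integration per element */
    assemble_global_matrix();       /* scatter LSMs into sparse matrix   */
    solve_linear_system();          /* currently done through PETSc      */
    update_geometry();              /* apply deformation and movement    */
}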
Recent activity
GPUs as accelerators
- GPUs have evolved into extremely flexible and powerful processors
- Contemporary GPUs provide large numbers of cores (2880 cores on the NVIDIA Tesla K40)
- High throughput-to-cost ratio
- NVIDIA GPUs are programmable using CUDA: extensions to industry-standard programming languages (a minimal example follows below)
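For illustration only (this toy example is not part of the application), a minimal CUDA kernel showing the kind of language extensions meant here:

#include <cuda_runtime.h>

/* Toy CUDA kernel: each GPU thread scales one array element. The
   __global__ qualifier and the <<<...>>> launch syntax are CUDA's
   extensions to standard C/C++. */
__global__ void scale(float *a, float s, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        a[i] *= s;
}

/* launch with one thread per element:
   scale<<<(n + 255) / 256, 256>>>(d_a, 2.0f, n); */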
LSM calculations on the GPU
- The calculation of the LSM of each element does not depend on other calculations: an ideal candidate for computing on the GPU
- Typically there is a large number of elements, which can naturally be handled by the programming model of the GPU
- However, there might be insufficient memory to store all the elements on the GPU (a batching remedy is sketched below)
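One common remedy for the memory limitation is to process the elements in batches. A hedged sketch, assuming an element type Element, a device buffer d_el of BATCH elements, a tuning parameter BATCH, and an LSM kernel like the one sketched after the mapping slide below (all illustrative names, not the application's actual code):

/* Sketch: process elements in batches when they do not all fit in
   GPU memory. d_el is allocated once with cudaMalloc. */
for (int start = 0; start < elnum; start += BATCH) {
    int count = (elnum - start < BATCH) ? elnum - start : BATCH;
    cudaMemcpy(d_el, &el[start], count * sizeof(Element),
               cudaMemcpyHostToDevice);
    lsm_kernel<<<count, dim3(NVB, NVB)>>>(d_el, count);
    cudaMemcpy(&el[start], d_el, count * sizeof(Element),
               cudaMemcpyDeviceToHost);
}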
LSM construction pseudocode
Hexahedral elements, second-order expansion: NVB = 27, NQP = 5

// Iterate over all elements
for (k = 0; k < elnum; k++) {
    // Iterate over pairs of polynomial basis functions
    for (m = 0; m < NVB; m++) {
        for (n = 0; n < NVB; n++) {
            row, col = getrowcol(m, n);
            // Iterate over integration points
            for (x = 0; x < NQP; x++) {
                for (y = 0; y < NQP; y++) {
                    for (z = 0; z < NQP; z++) {
                        el[k].lsm[row][col] += "elasticity equation";
                    }
                }
            } // x, y, z
        }
    } // m, n
} // k
Mapping of calculations on the GPU
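The slide's figure is not reproduced here. One plausible mapping, consistent with the pseudocode above, assigns one thread block per element and one thread per (m, n) basis pair; the stub bodies of getrowcol and elasticity_term are placeholders so the sketch compiles, not the application's actual functions:

#include <cuda_runtime.h>

#define NVB 27   /* basis functions per element       */
#define NQP 5    /* integration points per direction  */

struct Element { double lsm[NVB][NVB]; /* plus geometry data */ };

/* Placeholder stubs; the real functions implement the row/column
   mapping and the elasticity integrand. */
__device__ void getrowcol(int m, int n, int *row, int *col)
{ *row = m; *col = n; }
__device__ double elasticity_term(const Element *e, int m, int n,
                                  int x, int y, int z)
{ return 0.0; }

/* Block k handles element k; thread (m, n) accumulates one LSM entry
   over all NQP^3 integration points. */
__global__ void lsm_kernel(Element *el, int elnum)
{
    int k = blockIdx.x;
    int m = threadIdx.x, n = threadIdx.y;
    if (k >= elnum) return;

    int row, col;
    getrowcol(m, n, &row, &col);

    double acc = 0.0;
    for (int x = 0; x < NQP; x++)
        for (int y = 0; y < NQP; y++)
            for (int z = 0; z < NQP; z++)
                acc += elasticity_term(&el[k], m, n, x, y, z);
    el[k].lsm[row][col] = acc;
}

/* launch: lsm_kernel<<<elnum, dim3(NVB, NVB)>>>(d_el, elnum);
   NVB * NVB = 729 threads per block, within the 1024-thread limit */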
Improvements introduced for a single GPU
- Overlap calculations with data transfers from/to the host (a streams sketch follows below)
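Such overlap is typically expressed with CUDA streams and asynchronous copies. A minimal sketch, assuming the host array el[] is in pinned memory (allocated with cudaMallocHost) and two device buffers d_el[0] and d_el[1] exist; both are assumptions for illustration:

/* Sketch: double-buffer batches across two CUDA streams so that the
   copies of one batch overlap the kernel of the other. Pinned host
   memory is required for truly asynchronous copies. */
cudaStream_t s[2];
cudaStreamCreate(&s[0]);
cudaStreamCreate(&s[1]);

for (int b = 0; b < nbatches; b++) {
    int buf = b % 2;
    cudaMemcpyAsync(d_el[buf], &el[b * BATCH], BATCH * sizeof(Element),
                    cudaMemcpyHostToDevice, s[buf]);
    lsm_kernel<<<BATCH, dim3(NVB, NVB), 0, s[buf]>>>(d_el[buf], BATCH);
    cudaMemcpyAsync(&el[b * BATCH], d_el[buf], BATCH * sizeof(Element),
                    cudaMemcpyDeviceToHost, s[buf]);
}
cudaDeviceSynchronize();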
Improvements introduced for a single GPU (cont.)
- All valid mappings of the loop nest onto a given number of threads have been tested for our configuration
- Input data reordered in memory to become GPU memory-friendly (sketched below)
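"GPU memory-friendly" typically means converting an array-of-structures layout into structure-of-arrays, so that consecutive threads access consecutive memory locations (coalescing). A sketch with illustrative field names; the application's actual reordering is not detailed on the slide:

/* Sketch: AoS -> SoA reordering for coalesced GPU accesses.
   The field names coord and weight are illustrative only. */
struct ElementAoS  { double coord; double weight; /* ... */ };
struct ElementsSoA {
    double *coord;    /* all elements' coordinates, contiguous        */
    double *weight;   /* all elements' quadrature weights, contiguous */
};

void reorder(struct ElementsSoA *soa, const struct ElementAoS *el, int elnum)
{
    for (int k = 0; k < elnum; k++) {
        soa->coord[k]  = el[k].coord;   /* thread k reads coord[k]:  */
        soa->weight[k] = el[k].weight;  /* consecutive and coalesced */
    }
}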
Results for single GPU approach
- The approach provides a large improvement in execution time
  - LSM calculations only: up to 98.1%
  - Total execution time: up to 76.2%
- Extension path: single GPU → multi-GPU → multi-node
Multi-GPU
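The slide's content is a figure, not reproduced here. On one node, the elements can be split across the available GPUs; a hedged sketch using the CUDA runtime (the even split is an assumption):

/* Sketch: distribute elements evenly across the GPUs of one node. */
int ngpus;
cudaGetDeviceCount(&ngpus);
int per_gpu = (elnum + ngpus - 1) / ngpus;

for (int g = 0; g < ngpus; g++) {
    cudaSetDevice(g);
    int start = g * per_gpu;
    int count = (elnum - start < per_gpu) ? elnum - start : per_gpu;
    if (count <= 0) break;
    /* asynchronous copy + lsm_kernel launch for elements
       [start, start + count), as in the single-GPU sketches */
}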
Multi-Node & Multi-GPU
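Across nodes, the usual pattern combines MPI with the per-node scheme above: each MPI rank selects a node-local GPU and processes its own slice of the elements. A hedged sketch; the rank-to-GPU mapping and the even split are assumptions:

#include <mpi.h>
#include <cuda_runtime.h>

/* Sketch: one MPI rank per GPU. Each rank selects a node-local
   device and computes LSMs for its own contiguous element slice.
   (Runs after MPI_Init(&argc, &argv).) */
int rank, size, ngpus;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
cudaGetDeviceCount(&ngpus);
cudaSetDevice(rank % ngpus);        /* map rank to a local GPU */

int per_rank = (elnum + size - 1) / size;
int start    = rank * per_rank;
/* ... compute LSMs for elements [start, start + per_rank) ... */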
Computing platform
We thank the LinkSCEEM-2 project, funded by the European Commission under the 7th Framework Programme through Capacities Research Infrastructure, INFRA Virtual Research Communities, Combination of Collaborative Project and Coordination and Support Actions (CP-CSA), under grant agreement no. RI
Parameters of application/experiments
- 8 degrees of freedom
- 4 cases for the number of elements: 256, 1024, 4096, 16384
- Used up to 8 nodes of the cluster, for a total of up to 16 GPUs
Speedup of the LSM calculation using 1 GPU per node
Speedup of the LSM calculation using 2 GPUs per node
Special case: 65536 elements
- Does not fit into the memory of 1 GPU, but does fit into the memory of 2 GPUs (a run-time check is sketched below)
- Speedup is measured against the execution time on 1 node using 2 GPUs
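Whether a case fits on one GPU can be checked at run time; a small sketch (the per-element size estimate is an assumption):

/* Sketch: decide at run time whether all elements fit on one GPU. */
size_t free_bytes, total_bytes;
cudaMemGetInfo(&free_bytes, &total_bytes);

if ((size_t)elnum * sizeof(Element) > free_bytes) {
    /* fall back to splitting across 2 GPUs, or to batching */
}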
Conclusion
- LSM calculations are highly parallelizable
- Significant overall improvement in execution time
Future Work
- Execute on a larger cluster
- Allow larger numbers of elements, i.e., cases where the elements do not fit into GPU memory
- Reorganize the representation of elements in memory to better fit the architectural characteristics of GPUs
- Parallelize more functions
- Include a CUDA parallel solver; currently PETSc is used for this purpose, and the available CUDA solvers seem to have poor performance (see the note below)
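For reference, PETSc itself can offload parts of the solve to the GPU through runtime options; whether this helps here is exactly the open question above. Options like the following exist in CUDA-enabled PETSc builds, but availability depends on the PETSc version, and the binary name is hypothetical:

mpiexec -n 8 ./windturbine_sim -ksp_type cg \
        -vec_type cuda -mat_type aijcusparse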
Acknowledgements
This research has been co-financed by the European Union (European Social Fund - ESF) and Greek national funds through the Operational Program "Education and Lifelong Learning" of the National Strategic Reference Framework (NSRF) - Research Funding Program THALES: "Reinforcement of the interdisciplinary and/or inter-institutional research and innovation" (MIS, "Expertise development for the aeroelastic analysis and the design-optimization of wind turbines"). Support was also provided by the LinkSCEEM-2 project, funded by the European Commission under the 7th Framework Programme through Capacities Research Infrastructure, INFRA Virtual Research Communities, Combination of Collaborative Project and Coordination and Support Actions (CP-CSA), under grant agreement no. RI