
High Performance Computational Fluid-Thermal Sciences & Engineering Lab
GenIDLEST Co-Design
AFOSR-BRI Workshop, Virginia Tech, December 19, 2014
Amit Amritkar, Keyur Joshi, Long He & Danesh Tafti
Collaborators: Wu-chun Feng, Paul Sathre, Kaixi Hou, Sriram Chivukula, Hao Wang, Tom Scogland, Eric de Sturler & Kasia Swirydowicz

Recap
- GPU version of GenIDLEST (CUDA Fortran)
- Validation studies of the GPU code
  - Turbulent channel flow
  - Turbulent pipe flow
- Application: bat flight

Goals
- Improve GenIDLEST performance
- GPU computing
  - Port of the code (linear solvers) with CUDA Fortran
  - Optimization / MetaMorph library
  - Use OpenACC to accelerate the code
- Linear solvers
  - Recycling of Krylov subspaces
  - Preconditioners
- Parallel fluid-structure interaction (FSI) in GenIDLEST
  - Nonlinear Finite Element Method (FEM) for the structure
  - Unstructured grid
  - Immersed Boundary Method (IBM)
  - Interface tracking
  - Parallelization challenges

GPU computing co-design with CS team
Amit Amritkar, Danesh Tafti, Wu-chun Feng, Paul Sathre, Kaixi Hou, Sriram Chivukula, Hao Wang, Tom Scogland
- Manual CUDA code optimization: speedup improved from 5x to 10x
- OpenACC version of the code
  - OpenACC vs. CUDA performance: OpenACC currently at 0.6x of CUDA
- Integration with MetaMorph (see the sketch below)
  - Dot product
  - Inter-mesh-block communication
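As an illustration of the directive-based approach mentioned above, here is a minimal OpenACC dot product in Fortran; the routine name and arguments are illustrative, not taken from GenIDLEST or MetaMorph.

```fortran
! Minimal OpenACC dot product sketch; names are illustrative,
! not from GenIDLEST or MetaMorph.
function dot_acc(n, x, y) result(s)
  implicit none
  integer, intent(in) :: n
  real(8), intent(in) :: x(n), y(n)
  real(8) :: s
  integer :: i
  s = 0.0d0
  !$acc parallel loop reduction(+:s) copyin(x(1:n), y(1:n))
  do i = 1, n
     s = s + x(i)*y(i)
  end do
end function dot_acc
```

A single directive offloads the loop; this is the kind of kernel where a hand-tuned CUDA version can still outperform OpenACC, consistent with the 0.6x figure above.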

Solver co-design with Math team
Amit Amritkar, Danesh Tafti, Eric de Sturler, Katarzyna Swirydowicz
- Solution of the pressure Poisson equation: the most time-consuming function (50 to 90% of total time)
- Solving multiple linear systems Ax = b
  - A remains constant from one time step to the next in many CFD calculations
- rGCROT/rGCRODR algorithms
  - Recycling of vectors from one time step to the subsequent ones
- Hybrid approach (sketched below)
  - rGCROT to build the recycle space initially
  - rBiCGStab for subsequent systems for faster performance
- Left vs. right preconditioning
  - Right preconditioning suited for BiCGStab
  - Similar performance for rGCROT and GMRES(m)
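The hybrid strategy can be summarized in a short driver loop. This is a schematic sketch only: the solver interfaces rgcrot() and rbicgstab(), the routine assemble_rhs(), and the recycle-space argument U are hypothetical placeholders, not the actual GenIDLEST API.

```fortran
! Schematic of the hybrid recycling strategy; rgcrot, rbicgstab,
! assemble_rhs and the recycle space U are hypothetical placeholders.
do istep = 1, nsteps
   call assemble_rhs(b)              ! A stays fixed across time steps
   if (istep <= nbuild) then
      ! Early steps: rGCROT solves Ax = b and enriches the recycle space U
      call rgcrot(A, x, b, U, tol)
   else
      ! Later steps: cheaper rBiCGStab reuses the now-frozen recycle space
      call rbicgstab(A, x, b, U, tol)
   end if
end do
```

Because A is constant, the subspace recycled from the early solves keeps paying off in every subsequent time step.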

Future work
- Code acceleration (CS team)
  - Integrate with the MetaMorph library
  - Assess performance on multiple architectures
  - Overlap computations with communications
  - Evaluate OpenMP 4.0 for accelerator programming
- Linear solvers/preconditioners (Math team)
  - Study recycling algorithms on different classes of problems
  - Convergence based on quantities of interest (e.g., turbulence statistics) instead of primary variables (pressure & velocity)
  - Multilevel preconditioner

Publications

Journal
- Amit Amritkar, Eric de Sturler, Katarzyna Swirydowicz, Danesh Tafti and Kapil Ahuja. "Recycling Krylov subspaces for CFD application." To be submitted to Computer Methods in Applied Mechanics and Engineering.
- Amit Amritkar and Danesh Tafti. "CFD computations using a preconditioned Krylov solver on GPUs." Journal of Fluids Engineering, under review.

Conference
- Katarzyna Swirydowicz, Amit Amritkar, Eric de Sturler and Danesh Tafti. "Recycling Krylov subspaces for CFD application." Presentation at the ASME 2014 Fluids Engineering Division Summer Meeting, August 3-7, 2014, Chicago, Illinois, USA.
- Amit Amritkar and Danesh Tafti. "CFD computations using preconditioned Krylov solver on GPUs." Proceedings of the ASME 2014 Fluids Engineering Division Summer Meeting, August 3-7, 2014, Chicago, Illinois, USA.
- Amit Amritkar, Danesh Tafti, Paul Sathre, Kaixi Hou, Sriram Chivukula and Wu-chun Feng. "Accelerating Bio-Inspired MAV Computations using GPUs." Proceedings of the AIAA Aviation and Aeronautics Forum and Exposition 2014, June 2014, Atlanta, Georgia.

Fluid Structure Interaction
Long He, Keyur Joshi, Danesh Tafti
- Immersed Boundary Method
- Finite Element Solver
- Fluid-structure interaction coupling
- Benchmark simulation results

Immersed Boundary Method
[Figures: body-conforming grid vs. immersed boundary grid]

Immersed Boundary Method
[Figures: curvilinear body-fitted grid around a circular surface; body non-conforming Cartesian grid with an immersed boundary]

Immersed Boundary Method
Types of nodes and domains: fluid, solid, and fluid IB
Nodetype: solid is 0, fluid is 1, fluid IB node is 2

Immersed Boundary Method
1. Based on the immersed boundary provided by the surface grid, every node in the background grid is assigned one of the following nodetypes: fluid node, solid node, fluid IB node, solid IB node (see the tagging sketch below).
2. The governing equations are solved for all fluid nodes in the domain.
3. The IB node values are modified so that the fluid and solid nodes see the presence of the immersed boundary.
Nodetype: solid is 0, fluid is 1, fluid IB node is 2
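A minimal sketch of the tagging step in Fortran. The signed-distance helper dist_to_boundary(), the grid coordinate arrays, and the one-cell IB band are illustrative assumptions; the slides do not show the actual GenIDLEST classification routine.

```fortran
! Illustrative nodetype tagging on the background Cartesian grid.
! dist_to_boundary() (negative inside the solid) and the grid arrays
! x, y, z are hypothetical helpers, not from GenIDLEST.
integer, parameter :: SOLID = 0, FLUID = 1, FLUID_IB = 2
integer :: i, j, k
real(8) :: d

do k = 1, nk
   do j = 1, nj
      do i = 1, ni
         d = dist_to_boundary(x(i), y(j), z(k))
         if (d < 0.0d0) then
            nodetype(i,j,k) = SOLID
         else if (d < dx) then
            ! first layer of fluid nodes adjacent to the immersed boundary
            nodetype(i,j,k) = FLUID_IB
         else
            nodetype(i,j,k) = FLUID
         end if
      end do
   end do
end do
```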

Nonlinear Structural FE Code
- Capable of large deformation, large strain, and large rotation (geometric nonlinearity)
- Total Lagrangian as well as Updated Lagrangian formulation
- 3D as well as 2D elements
- Extensible to material nonlinearity (hyperelasticity, plasticity)
- Extensible to active materials such as piezo-ceramics
[Figures: linear model vs. nonlinear model]

Nonlinear Structural FE Code
- Special sparse matrix storage: stores only the nonzero elements
- Preconditioned Conjugate Gradient (PCG) method
- Nonlinear iterations through Newton-Raphson; modified Newton-Raphson and initial-stress updates are also supported
- Newmark method for time integration: unconditionally stable and introduces no numerical damping (see the sketch below)
- Parallelized through OpenMP and extensible to MPI
- Exploring METIS for mesh partitioning and mesh adaptation
- Node renumbering
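The stability and damping properties quoted above correspond to the average-acceleration Newmark scheme (beta = 1/4, gamma = 1/2). A self-contained sketch of that update, with illustrative array names, assuming the new acceleration a_new has already been obtained from the Newton-Raphson/PCG equilibrium solve:

```fortran
! Average-acceleration Newmark update (beta = 0.25, gamma = 0.5),
! applied once a_new has been computed from the equilibrium solve
! at t_{n+1}. Array names are illustrative.
subroutine newmark_update(n, dt, a_new, u, v, a)
  implicit none
  integer, intent(in)    :: n
  real(8), intent(in)    :: dt, a_new(n)
  real(8), intent(inout) :: u(n), v(n), a(n)
  real(8), parameter     :: beta = 0.25d0, gamma = 0.5d0

  ! displacement and velocity at t_{n+1} from the standard Newmark formulas
  u = u + dt*v + dt*dt*((0.5d0 - beta)*a + beta*a_new)
  v = v + dt*((1.0d0 - gamma)*a + gamma*a_new)
  a = a_new
end subroutine newmark_update
```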

Fluid-structure interaction coupling
[Diagram: structural solver (OpenMP) coupled to the fluid solver (OpenMP/MPI)]

Turek-Hron FSI Benchmark
[Figure: channel with inlet, outlet, and walls; the elastic structure forms the fluid-structure interface]

FSI Case 2

Bubble in channel flow

Parallelization of FSI Code
- Currently the FSI code is parallelized for shared-memory architectures (OpenMP)
- How can the code be extended to MPI or GPU computing to maximize parallel efficiency?
- Challenges (ghost exchange sketched below):
  - Partitioning the unstructured structural FE mesh (using ParMETIS/Scotch)
  - Creating ghost cells at partition boundaries
  - Creating gather and scatter algorithms to minimize communication cost
  - Synchronization
  - Debugging on parallel architectures
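The ghost-cell and gather/scatter items above follow a standard MPI exchange pattern. The sketch below uses real MPI calls, but the neighbor lists and index arrays (neighbor, nsend/nrecv, send_idx/ghost_idx) are hypothetical stand-ins for what a partitioner such as ParMETIS would produce.

```fortran
! Illustrative ghost-node exchange for a partitioned FE mesh.
! Assumes 'use mpi' and the declarations live in the host scope;
! neighbor lists and index arrays are hypothetical placeholders.
nreq = 0
do p = 1, nneighbors
   ! pack owned values that neighbor p holds as ghosts
   sendbuf(1:nsend(p), p) = u(send_idx(1:nsend(p), p))
   nreq = nreq + 1
   call MPI_Irecv(recvbuf(:,p), nrecv(p), MPI_DOUBLE_PRECISION, &
                  neighbor(p), 0, MPI_COMM_WORLD, req(nreq), ierr)
   nreq = nreq + 1
   call MPI_Isend(sendbuf(:,p), nsend(p), MPI_DOUBLE_PRECISION, &
                  neighbor(p), 0, MPI_COMM_WORLD, req(nreq), ierr)
end do
call MPI_Waitall(nreq, req, MPI_STATUSES_IGNORE, ierr)
! unpack ghost values received from each neighbor
do p = 1, nneighbors
   u(ghost_idx(1:nrecv(p), p)) = recvbuf(1:nrecv(p), p)
end do
```

Nonblocking sends and receives let packing and communication overlap, which addresses the communication-cost concern listed above.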

Parallelization Scenarios: Scenario 1
- Scenario 1: the structural computation is less demanding and is localized to one processor
- Proposed solution: restrict the structure to 1 MPI process with several OpenMP/GPU threads, while the fluid is spawned on multiple MPI processes (communicator sketch below)
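One way such a split could be arranged is with MPI_Comm_split, giving the structure rank and the fluid ranks separate communicators. This is a sketch of the idea under that assumption, not the GenIDLEST implementation.

```fortran
! Sketch: 1-rank structure group and an (nprocs-1)-rank fluid group
! carved out of MPI_COMM_WORLD. Not the GenIDLEST code.
program fsi_split
  use mpi
  implicit none
  integer :: rank, nprocs, color, subcomm, ierr

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! rank 0 runs the structural solver (OpenMP threads inside);
  ! all remaining ranks run the fluid solver
  if (rank == 0) then
     color = 1            ! structure group
  else
     color = 2            ! fluid group
  end if
  call MPI_Comm_split(MPI_COMM_WORLD, color, rank, subcomm, ierr)

  ! ... fluid solves on subcomm; interface data is exchanged
  !     between rank 0 and the fluid ranks via MPI_COMM_WORLD ...

  call MPI_Finalize(ierr)
end program fsi_split
```

Scenarios 2 and 3 below generalize the same idea: more colors for independent structural units, or several ranks sharing one structural color.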

Parallelization Scenarios: Scenario 2
- Scenario 2: the structure has multiple units that can be solved independently, e.g., a quadcopter
- Proposed solution: solve the structural units on independent MPI processes, each using several OpenMP/GPU threads, while the fluid is spawned on multiple MPI processes irrespective of the structural partition

Parallelization Scenarios: Scenario 3
- Scenario 3: the structural computations themselves are demanding
- Proposed solution: break the structural unit itself into multiple MPI processes, each using several OpenMP/GPU threads, while the fluid is spawned on multiple MPI processes irrespective of the structural partition

Future Applications
- Deformation of elastic wings: energy harvesting applications
- Shape memory alloys
- Biomechanical engineering
- Airfoil aerodynamics
- Aero-elasticity

Future work
- Structural solver (co-design with Math team)
  - Evaluate different preconditioners (currently point Jacobi) for the non-diagonally-dominant matrix
  - Evaluate time-stepping algorithms
- Parallelization of FSI
  - Collaborate with CS/Math teams to identify the most effective ways to parallelize the FSI problem
  - Evaluate use of the PGAS (partitioned global address space) model (global arrays)

Left vs Right preconditioner
- Turbulent channel flow case
- Application to BiCGStab: flow through porous media, 10 time steps
- Compared solvers: rGCROT and the hybrid approach (rGCROT + rBiCGStab)
[Table: average iterations and time (s) over the time steps for right preconditioning, left preconditioning, and left preconditioning (PC residual); the numerical entries did not survive extraction]
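For reference, the two preconditioning variants compared above transform the system Ax = b in the standard way (M is the preconditioner):

```latex
\text{Left:}\quad  M^{-1} A\, x = M^{-1} b
\qquad\qquad
\text{Right:}\quad A M^{-1} y = b, \;\; x = M^{-1} y
```

Left preconditioning makes the solver monitor the preconditioned residual (the "PC residual" column above), while right preconditioning preserves the true residual of Ax = b, which is one reason it pairs naturally with BiCGStab.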